Golang高并发抓取HTML图片
Golang高并发抓取HTML图片
使用准备
1.安装Golang
2.下载爬虫包
go get -v github.com/hunterhug/marmot/util
go get -v github.com/hunterhug/marmot/tool
程序
该程序只能抓取HTML中src="http"
中的图片, 必须带有协议头http(s)
, 其他如data-src
和混淆在JS中的无法抓取
See: https://github.com/hunterhug/marmot/blob/master/example/practice/pictures/main.go
package main
import (
"fmt"
"github.com/hunterhug/marmot/util"
"github.com/hunterhug/marmot/tool"
)
// Num of miner, We can run it at the same time to crawl data fast
var MinerNum = 5
// You can update this decide whether to proxy
var ProxyAddress interface{}
func main() {
// You can Proxy!
// ProxyAddress = "socks5://127.0.0.1:1080"
fmt.Println(`Welcome: Input "url" and picture keep "dir"`)
fmt.Println("---------------------------------------------")
url := util.Input(`URL(Like: "http://publicdomainarchive.com")`, "http://publicdomainarchive.com")
dir := util.Input(`DIR(Default: "./picture")`, "./picture")
fmt.Printf("You will keep %s picture in dir %s\n", url, dir)
fmt.Println("---------------------------------------------")
// Start Catch
err := tool.DownloadHTMLPictures(url, dir, MinerNum, ProxyAddress)
if err != nil {
fmt.Println("Error:" + err.Error())
}
}
解释均写, 运行后:
Welcome: Input "url" and picture keep "dir"
---------------------------------------------
URL(Like: "http://publicdomainarchive.com")
DIR(Default: "./picture")
You will keep http://publicdomainarchive.com picture in dir ./picture
---------------------------------------------
SP0: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_02_03_modern.jpg
SP4: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_03_03_google_dark.png
SP2: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_09_03_free-stock-photos-public-domain-images-003-667x1000-192684_667x675.jpg
SP4: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_05_03_powered-by-wp-engine.png
SP4: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_05_03_divi.png
SP2: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_09_03_free-stock-photos-public-domain-images-002-1000x667.jpg
SP4: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_05_03_public-domain-mark.png
SP3: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_01_03_public-domain-images-free-stock-photos008-1000x625.jpg
SP0: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_09_03_Weekly.jpg
SP1: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_11_03_free-stock-photos-public-domain-images-054-1000x667.jpg
SP3: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_10_03_instagram_dark.png
SP0: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_02_03_vintage.jpg
SP2: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_09_03_free-stock-photos-public-domain-images-001-1000x667.jpg
SP1: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_11_03_free-stock-photos-public-domain-images-070-1000x667.jpg
SP3: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_03_03_twitter02_dark.png
SP2: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_01_03_public-domain-images-free-stock-photos001-1000x750-167066_1000x675.jpg
SP1: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_09_03_free-stock-photos-public-domain-images-035-1000x667.jpg
SP3: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_03_03_facebook_dark.png
SP0: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_11_03_free-stock-photos-public-domain-images-060-1000x667.jpg
SP1: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_09_03_free-stock-photos-public-domain-images-013-1000x667.jpg
---------------------------------------------
URL(Like: "http://publicdomainarchive.com")
SP4: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_05_03_powered-by-wp-engine.png
SP4: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_05_03_divi.png
SP2: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_09_03_free-stock-photos-public-domain-images-002-1000x667.jpg
SP4: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_05_03_public-domain-mark.png
SP3: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_01_03_public-domain-images-free-stock-photos008-1000x625.jpg
SP0: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_09_03_Weekly.jpg
SP1: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_11_03_free-stock-photos-public-domain-images-054-1000x667.jpg
SP3: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_10_03_instagram_dark.png
SP0: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_02_03_vintage.jpg
SP2: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_09_03_free-stock-photos-public-domain-images-001-1000x667.jpg
SP1: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_11_03_free-stock-photos-public-domain-images-070-1000x667.jpg
SP3: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_03_03_twitter02_dark.png
SP2: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_01_03_public-domain-images-free-stock-photos001-1000x750-167066_1000x675.jpg
SP1: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_09_03_free-stock-photos-public-domain-images-035-1000x667.jpg
SP3: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2014_03_03_03_facebook_dark.png
SP0: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_11_03_free-stock-photos-public-domain-images-060-1000x667.jpg
SP1: Keep in ./picture/http_04__03__03_publicdomainarchive.com_03_wp-content_03_uploads_03_2017_03_09_03_free-stock-photos-public-domain-images-013-1000x667.jpg
---------------------------------------------
URL(Like: "http://publicdomainarchive.com")
Golang高并发抓取HTML图片的更多相关文章
- android高仿抖音、点餐界面、天气项目、自定义view指示、爬取美女图片等源码
Android精选源码 一个爬取美女图片的app Android高仿抖音 android一个可以上拉下滑的Ui效果 android用shape方式实现样式源码 一款Android上的新浪微博第三方轻量 ...
- Scrapy爬取美女图片第三集 代理ip(上) (原创)
首先说一声,让大家久等了.本来打算那天进行更新的,可是一细想,也只有我这样的单身狗还在做科研,大家可能没心思看更新的文章,所以就拖到了今天.不过忙了521,522这一天半,我把数据库也添加进来了,修复 ...
- Scrapy爬取美女图片续集 (原创)
上一篇咱们讲解了Scrapy的工作机制和如何使用Scrapy爬取美女图片,而今天接着讲解Scrapy爬取美女图片,不过采取了不同的方式和代码实现,对Scrapy的功能进行更深入的运用.(我的新书< ...
- Python爬虫学习(6): 爬取MM图片
为了有趣我们今天就主要去爬取以下MM的图片,并将其按名保存在本地.要爬取的网站为: 大秀台模特网 1. 分析网站 进入官网后我们发现有很多分类: 而我们要爬取的模特中的女模内容,点进入之后其网址为:h ...
- Scrapy爬取美女图片 (原创)
有半个月没有更新了,最近确实有点忙.先是华为的比赛,接着实验室又有项目,然后又学习了一些新的知识,所以没有更新文章.为了表达我的歉意,我给大家来一波福利... 今天咱们说的是爬虫框架.之前我使用pyt ...
- 百度图片爬虫-python版-如何爬取百度图片?
上一篇我写了如何爬取百度网盘的爬虫,在这里还是重温一下,把链接附上: http://www.cnblogs.com/huangxie/p/5473273.html 这一篇我想写写如何爬取百度图片的爬虫 ...
- php远程抓取网站图片并保存
以前看到网上别人说写程序抓取网页图片的,感觉挺神奇,心想什么时候我自己也写一个抓取图片的方法! 刚好这两天没什么事,就参考了网上一个php抓取图片代码,重点借鉴了 匹配img标签和其src属性正则的写 ...
- 百度UEditor编辑器关闭抓取远程图片功能(默认开启)
这个坑娘的功能,开始时居然不知道如何触发,以为有个按钮,点击一下触发,翻阅了文档,没有发现,然后再网络上看到原来是复制粘贴非白名单内的图片到编辑框时触发,坑娘啊............... 问题又来 ...
- Scrapy-多层爬取天堂图片网
1.根据图片分类对爬取的图片进行分类 开发者选项 --> 找到分类地址 爬取每个分类的地址通过回调函数传入下一层 name = 'sky'start_urls = ['http: ...
随机推荐
- springlcoud中使用consul作为注册中心
好久没写博客了,从今天开始重新杨帆起航............................................ springlcoud中使用consul作为注册中心. 我们先对比下注册 ...
- 100-continue
https://wiki.open.qq.com/wiki/技术优化原则#1._.E7.A8.8B.E5.BA.8F.E8.AE.BE.E8.AE.A1.E6.97.B6.E9.9C.80.E8.A6 ...
- python 设计模式之装饰器模式 Decorator Pattern
#写在前面 已经有一个礼拜多没写博客了,因为沉醉在了<妙味>这部小说里,里面讲的是一个厨师苏秒的故事.现实中大部分人不会有她的天分.我喜欢她的性格:总是想着去解决问题,好像从来没有怨天尤人 ...
- VS Code 通过文件名查询文件并打开
On Windows press Ctrl+p or Ctrl+e On Mac press Cmd+p on the Linux press also Ctrl+p works Older Mac ...
- 【FreeMarker】FreeMarker使用(三)
搭建一个 1.FreeMarker取值 <!DOCTYPE html> <html> <head> <meta charset="UTF-8&quo ...
- Hadoop记录-部署hadoop环境shell实现
#!/bin/bash menu() { echo "---欢迎使用hadoop部署管理程序---" echo "# 1.初始化Linux环境" echo &q ...
- pandas绘制矩阵散点图(scatter_matrix)的方法
以 sklearn的iris样本为数据集 import matplotlib.pyplot as plt from scipy import sparse import numpy as np imp ...
- 邪淫真正的可怕危害 (转自学佛网:http://www.xuefo.net/nr/article54/544414.html)
邪淫真正的可怕危害 邪淫的害处可能很快就显现,也可能是逐渐的表现出来.但往往后者的害处更大.因为当积累了多年的邪淫果报一旦显现,后悔就已经晚了. 我本人就是一个很好的例子.回想自己这40多年的日子,邪 ...
- yaml文件实例:nginx+ingress
[root@lab3 nginx]# cat nginx-test.yaml apiVersion: extensions/v1beta1 kind: Deployment metadata: nam ...
- MySQL 8中使用全文检索示例
首先建议张册测试用的表test,并使用fulltext说明将title和body两列的数据加入全文检索的索引列中: drop table if exists test; create table te ...