A Baidu Image Crawler Written in Python

After learning a bit about Python regular expressions, I wrote a Baidu image crawler for fun.
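The regular-expression step at the heart of the crawler can be tried on its own. This is a minimal sketch: it reuses the `"objURL":"(.*?)",` pattern from the script in this post, while the helper name `extract_image_urls` and the `sample` string are hypothetical and only imitate the shape of the data embedded in a result page.

```python
import re

# Same pattern the script uses; the non-greedy group stops at the first '",'.
OBJ_URL = re.compile(r'"objURL":"(.*?)",')


def extract_image_urls(html):
    """Return every objURL value found in the page source, in order."""
    return OBJ_URL.findall(html)


# Hypothetical fragment shaped like the data embedded in a result page.
sample = (
    '{"objURL":"http://example.com/a.jpg","w":1},'
    '{"objURL":"http://example.com/b.png","w":2}'
)

print(extract_image_urls(sample))
```

The non-greedy `.*?` matters here: a greedy `.*` would run from the first URL to the last `",` on the line and return one long, useless match.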

When technology meets someone with less-than-pure intentions, you get a goofball like me.

Developed with Python 3.6. The program is already packaged; download it from http://pan.baidu.com/s/1bpalugf (password: kfk4).

#!/usr/bin/env python3
# Requires the third-party packages requests and Pillow (PIL).
from tkinter import *
import hashlib
import os
import re
import threading
import requests
from PIL import Image
 
class Application(Frame):
	def __init__(self, master=None):
		Frame.__init__(self, master)
		self.school=threading.local()
		self.pack()
		self.createWidgets()
 
	def createWidgets(self):
		self.nameLabel=Label(self,text='Enter a keyword:')
		self.nameLabel.grid(row=0,sticky=W)
 
		self.nameInput = Entry(self)
		self.nameInput.grid(row=0,column=1)
 
		self.picys=IntVar()
		self.Checkbutton = Checkbutton(self,text='Compress images',variable=self.picys)
		self.Checkbutton.grid(row=1,column=0,columnspan=2,sticky=W)
 
		self.alertButton = Button(self, text='Download',command=self.gorun)
		self.alertButton.grid(row=1,column=1,sticky=E)
 
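	def cddir(self):
		# Work inside a folder named after the keyword on the current user's Desktop
		# (the original hard-coded C:\Users\Administrator\Desktop).
		keyword=self.nameInput.get()
		os.chdir(os.path.join(os.path.expanduser('~'),'Desktop'))
		if not os.path.exists(keyword):
			os.mkdir(keyword)
		os.chdir(keyword)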
 
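	def gorun(self):
		self.cddir()
		word=self.nameInput.get()
		threads=[]
		# Start 5 workers at result offsets 0, 20, 40, 60 and 80.
		for i in range(5):
			t=threading.Thread(target=self.xiazai,args=(i*20,word))
			t.start()
			threads.append(t)
		# Wait for every worker, not just the last one, before post-processing.
		# Note that this blocks the GUI until all downloads finish.
		for t in threads:
			t.join()
		self.delfile()
		if self.picys.get() == 1:
			self.suoxiao()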
	def xiazai(self,page,word):
		# Each worker starts at its own offset (page) and advances 100 results per request.
		baidupn=self.school.student=page
		num=1
		url='https://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8'
		regular='"objURL":"(.*?)",'
		for i in range(50):
			payload={'word':word,'pn':baidupn}
			baidupn+=100
			try:
				html = requests.get(url,params=payload,timeout=60).text
			except requests.exceptions.RequestException:
				continue
			pic=re.findall(regular,html)
 
			for tu in pic:
				pic_name=os.path.basename(tu)
				if ('?' in pic_name) or ('&' in pic_name) or ('.' not in pic_name):
					# Generated name; the page prefix keeps workers from overwriting each other.
					pic_name='%s_%s.jpg' %(page,num)
					num+=1
				elif os.path.exists(pic_name):
					# Already downloaded (testing membership in os.walk() never matched).
					continue
				try:
					dl=requests.get(tu,timeout=60)
					with open(pic_name,"wb") as code:
						code.write(dl.content)
					dl.close()
				except (requests.exceptions.RequestException,OSError):
					# Skip images that fail to download or cannot be saved.
					continue
 
	def suoxiao(self):
		# Halve the dimensions of every JPEG wider than 500 px, in place.
		self.cddir()
		for root,dirs,files in os.walk(os.getcwd()):
			for tplb in files:
				if tplb.lower().endswith(('.jpg','.jpeg')):
					path=os.path.join(root,tplb)
					try:
						im=Image.open(path)
						w,h=im.size
						if w > 500:
							im.thumbnail((w//2,h//2))
							im.save(path,'jpeg')
						im.close()
					except OSError:
						print('Skipping this file')
 
	def md5sum(self,filename):
		# Hash the file in chunks so large images are not read into memory at once.
		md5=hashlib.md5()
		with open(filename, 'rb') as f:
			while True:
				fb = f.read(8192)
				if not fb:
					break
				md5.update(fb)
		return md5.hexdigest()
 
	def delfile(self):
		# Delete every file whose content hash has already been seen.
		all_md5=set()
		self.cddir()
		for root,dirs,files in os.walk(os.getcwd()):
			for tlie in files:
				path=os.path.join(root,tlie)
				digest=self.md5sum(path)
				if digest in all_md5:
					os.remove(path)
				else:
					all_md5.add(digest)
 
app=Application()
app.master.title('Image Downloader')
app.mainloop()
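The de-duplication step can also be written without changing the working directory, which makes it easier to test outside the GUI. This is a minimal sketch, not the script's own method: `remove_duplicates` and `demo` are hypothetical names, and it only looks at the top level of the given folder.

```python
import hashlib
import os
import tempfile


def md5sum(path, chunk_size=8192):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def remove_duplicates(folder):
    """Delete files whose content was already seen; return the removed names."""
    seen = set()
    removed = []
    # Sorted order makes it predictable which copy of a duplicate is kept.
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        file_hash = md5sum(path)
        if file_hash in seen:
            os.remove(path)
            removed.append(name)
        else:
            seen.add(file_hash)
    return removed


def demo():
    """Create three small files (two identical) and de-duplicate them."""
    with tempfile.TemporaryDirectory() as folder:
        files = [('a.jpg', b'same'), ('b.jpg', b'same'), ('c.jpg', b'other')]
        for name, data in files:
            with open(os.path.join(folder, name), 'wb') as f:
                f.write(data)
        removed = remove_duplicates(folder)
        return removed, sorted(os.listdir(folder))


print(demo())
```

Keeping the hashes in a set makes each lookup constant-time, and passing the folder explicitly avoids relying on `os.chdir`, which affects every thread in the process.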

A shameless request for sponsorship.
