Scraping GIFs from budejie.com with Python (error: urllib.error.HTTPError: HTTP Error 403: Forbidden)

Scraping budejie GIFs (failing version)

# -*- coding:utf-8 -*-
#__author__ :kusy
#__content__: file description
#__date__:2018/7/23 17:01
import urllib.request
import re

def getHtml(url):
    page = urllib.request.urlopen(url)
    html = page.read()
    # print(html)
    return html

def getImg(reg,savePath):
    iCnt = 0
    def giveImg(html):
        imgre = re.compile(reg)
        imglist = re.findall(imgre, html.decode('utf-8'))
        nonlocal iCnt
        for imgurl in imglist:
            urllib.request.urlretrieve(imgurl, savePath + '%s.gif' % iCnt)
            iCnt += 1
    return giveImg

# html = getHtml("http://pic.sogou.com/")
# reg = r'"image":"(.+?)"'  #sogou

reg = r'data-original="(.+?\.gif)"'
savePath = 'image/gif/'
g = getImg(reg,savePath)

for i in range(10):
    if i >1:
        print("http://www.budejie.com/" + str(i))
        html = getHtml("http://www.budejie.com/" + str(i))
    else:
        html = getHtml("http://www.budejie.com/")
    g(html)

The error is as follows:

E:\kusy\python\venv\Scripts\python.exe E:/kusy/python/getJpg.py
http://www.budejie.com/2
Traceback (most recent call last):
  File "E:/kusy/python/getJpg.py", line 35, in <module>
    html = getHtml("http://www.budejie.com/" + str(i))
  File "E:/kusy/python/getJpg.py", line 9, in getHtml
    page = urllib.request.urlopen(url)
  File "C:\Users\jingjing\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\jingjing\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "C:\Users\jingjing\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\jingjing\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "C:\Users\jingjing\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Users\jingjing\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Process finished with exit code 1
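
Before changing the request, it can help to confirm what the server actually sent back instead of letting the script die on the traceback. The following is only a minimal sketch and is not part of the original script: the names describe_http_error and fetch_or_report are hypothetical, and it uses nothing beyond the standard urllib.error.HTTPError attributes code and reason.

```python
import urllib.error
import urllib.request


def describe_http_error(err):
    """Summarise an HTTPError as '<status code> <reason>'."""
    return '%s %s' % (err.code, err.reason)


def fetch_or_report(url):
    """Return the page body as bytes, or None if the server replies with an HTTP error status.

    Hypothetical helper for illustration; the original getHtml has no error handling.
    """
    try:
        with urllib.request.urlopen(url) as page:
            return page.read()
    except urllib.error.HTTPError as err:
        # A 403 here means the server understood the request but refused to serve it.
        print(url, '->', describe_http_error(err))
        return None
```

With a helper like this, the loop could skip a refused page and carry on instead of stopping at the first 403.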

A quick search on Baidu turned up the fix:

# -*- coding:utf-8 -*-
#__author__ :kusy
#__content__: file description
#__date__:2018/7/23 17:01
import urllib.request
import re

def getHtml(url):
    # Without the headers below, the request fails with
    # urllib.error.HTTPError: HTTP Error 403: Forbidden.
    # The likely cause is that the site blocks crawlers and rejects urllib's
    # default User-Agent. Sending a browser-like User-Agent header makes the
    # request look like a browser visit; the exact string can be looked up with
    # Firefox's FireBug plugin or the browser's built-in developer tools.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
    req = urllib.request.Request(url=url,headers=headers)
    page = urllib.request.urlopen(req)
    html = page.read()
    # print(html)
    return html

def getImg(reg,savePath):
    # Returns a closure that downloads every URL matched by reg.
    # iCnt lives in the enclosing scope, so file names keep counting up across pages.
    iCnt = 0
    def giveImg(html):
        imgre = re.compile(reg)
        imglist = re.findall(imgre, html.decode('utf-8'))
        nonlocal iCnt
        for imgurl in imglist:
            urllib.request.urlretrieve(imgurl, savePath + '%s.gif' % iCnt)
            iCnt += 1
    return giveImg

# html = getHtml("http://pic.sogou.com/")
# reg = r'"image":"(.+?)"'  #sogou

reg = r'data-original="(.+?\.gif)"'
savePath = 'image/gif/'
g = getImg(reg,savePath)

# Start at 1 so the home page is fetched only once
# (range(10) requested it for both i == 0 and i == 1).
for i in range(1, 10):
    if i >1:
        print("http://www.budejie.com/" + str(i))
        html = getHtml("http://www.budejie.com/" + str(i))
    else:
        html = getHtml("http://www.budejie.com/")
    g(html)

The download succeeded.
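
One detail worth noting: the headers dictionary is only attached to the page request made in getHtml, while urllib.request.urlretrieve still sends urllib's default User-Agent when it fetches each image. That evidently worked here, but if the image host also refused the default agent, one option would be to install a global opener so that every urlopen and urlretrieve call carries the browser-like header. The sketch below illustrates that idea under stated assumptions: install_browser_opener and download_gifs are hypothetical names, and creating the target folder with os.makedirs is an addition that the original script does not perform.

```python
import os
import urllib.request

# Same User-Agent string as in the script above.
USER_AGENT = ('Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
              'Gecko/20100101 Firefox/23.0')


def install_browser_opener(user_agent=USER_AGENT):
    """Hypothetical helper: make urlopen and urlretrieve send a browser-like User-Agent."""
    opener = urllib.request.build_opener()
    # Replaces the default header list, which advertises Python-urllib.
    opener.addheaders = [('User-Agent', user_agent)]
    urllib.request.install_opener(opener)
    return opener


def download_gifs(img_urls, save_path, start=0):
    """Save each URL as <n>.gif under save_path and return the next free counter value."""
    # Assumption: the folder may not exist yet, so create it before downloading.
    os.makedirs(save_path, exist_ok=True)
    count = start
    for img_url in img_urls:
        urllib.request.urlretrieve(img_url, os.path.join(save_path, '%s.gif' % count))
        count += 1
    return count
```

Calling install_browser_opener() once before the loop would be enough; download_gifs(imglist, savePath, iCnt) could then stand in for the urlretrieve loop inside giveImg.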
