Python 3 urllib Examples

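Python 3 merges urllib and urllib2 into a single package named urllib (split into submodules such as urllib.request, urllib.parse, and urllib.error), which I find more sensible. It lets us read data from the web much as we would read a local file. I wrapped it in a class for easier reuse later, and attached a number of application examples.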
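To make the "read the web like a local file" idea concrete, here is a minimal sketch. The helper name `read_text` is made up for illustration and is not part of the class below. It uses a `data:` URL so that it runs without network access; an `http://` URL would be opened the same way.

```python
import urllib.request

def read_text(url, charset="utf-8"):
    """Open a URL like a file and return its decoded body."""
    # The response object is a context manager, so `with` closes it for us,
    # just as it would close a local file.
    with urllib.request.urlopen(url) as response:
        return response.read().decode(charset)

# A data: URL carries its own payload, so no network access is needed here.
text = read_text("data:text/plain;charset=utf-8,hello%20urllib")
print(text)
```

The body comes back as bytes, which is why the explicit `decode` step is needed before treating it as text.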

1. The wrapper class

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import sys
import gzip
import socket
import urllib.request, urllib.parse, urllib.error
import http.cookiejar

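class HttpTester:
    '''A small wrapper around urllib for GET/POST requests and downloads.'''

    def __init__(self, timeout=10, addHeaders=True):
        socket.setdefaulttimeout(timeout)   # global default timeout for new sockets, in seconds
        self.__opener = urllib.request.build_opener()
        urllib.request.install_opener(self.__opener)    # urlopen() and urlretrieve() now use this opener
        if addHeaders:
            self.__addHeaders()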

    def __error(self, e):
        '''Report an error.'''
        print(e)

    def __addHeaders(self):
        '''Add the default headers.'''
        self.__opener.addheaders = [
            ('User-Agent', 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'),
            # urllib overrides this with "Connection: close" when it sends the request.
            ('Connection', 'keep-alive'),
            ('Cache-Control', 'no-cache'),
            ('Accept-Language', 'zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3'),
            # Only gzip is advertised, because __decode() only handles gzip.
            ('Accept-Encoding', 'gzip'),
            ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
        ]

    def __decode(self, webPage, charset):
        '''Decompress gzip data if needed, then decode the page with the given charset.'''
        if webPage.startswith(b'\x1f\x8b'):     # gzip magic number
            return gzip.decompress(webPage).decode(charset)
        else:
            return webPage.decode(charset)

    def addCookiejar(self):
        '''Add a cookiejar handler to self.__opener.'''
        cj = http.cookiejar.CookieJar()
        self.__opener.add_handler(urllib.request.HTTPCookieProcessor(cj))

    def addProxy(self, host, type='http'):
        '''Set a proxy. type is the URL scheme to proxy, host is the proxy address.'''
        proxy = urllib.request.ProxyHandler({type: host})
        self.__opener.add_handler(proxy)

    def addAuth(self, url, user, pwd):
        '''Add HTTP basic authentication for the given URL.'''
        pwdMgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        pwdMgr.add_password(None, url, user, pwd)   # None means any realm
        auth = urllib.request.HTTPBasicAuthHandler(pwdMgr)
        self.__opener.add_handler(auth)

    def get(self, url, params=None, headers=None, charset='UTF-8'):
        '''HTTP GET. Returns the decoded page, or None after an HTTP error.'''
        if params:
            url += '?' + urllib.parse.urlencode(params)
        request = urllib.request.Request(url)
        for k, v in (headers or {}).items():    # add headers for this request only
            request.add_header(k, v)
        try:
            response = urllib.request.urlopen(request)
        except urllib.error.HTTPError as e:
            self.__error(e)
        else:
            return self.__decode(response.read(), charset)

    def post(self, url, params=None, headers=None, charset='UTF-8'):
        '''HTTP POST. Returns the decoded page, or None after an HTTP error.'''
        params = urllib.parse.urlencode(params or {})
        request = urllib.request.Request(url, data=params.encode(charset))  # a request with data is sent as POST
        for k, v in (headers or {}).items():
            request.add_header(k, v)
        try:
            response = urllib.request.urlopen(request)
        except urllib.error.HTTPError as e:
            self.__error(e)
        else:
            return self.__decode(response.read(), charset)

    def download(self, url, savefile):
        '''Download a file or a web page to savefile.'''
        # Temporarily drop the Accept-Encoding header so the file is saved uncompressed.
        saved_headers = self.__opener.addheaders
        self.__opener.addheaders = [h for h in saved_headers if h[0] != 'Accept-Encoding']

        perLen = 0
        def reporthook(a, b, c):    # a: blocks transferred so far; b: block size in bytes; c: total size of the remote file
            nonlocal perLen
            if c > 1000000:         # only show progress for files larger than about 1 MB
                per = min(100.0, (100.0 * a * b) / c)
                per = '{:.2f}%'.format(per)
                print('\b' * perLen, per, end='')   # backspace over the previous value, then print the new percentage
                sys.stdout.flush()
                perLen = len(per) + 1

        print('--> {}\t'.format(url), end='')
        try:
            urllib.request.urlretrieve(url, savefile, reporthook)   # reporthook is a callback used to show download progress
        except urllib.error.HTTPError as e:
            self.__error(e)
        finally:
            self.__opener.addheaders = saved_headers    # restore the original headers
            print()
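
Two ideas in the class can be tried without any network access: detecting gzip by its magic number before decoding, and the way urllib chooses between GET and POST depending on whether the request carries data. The following is a standalone sketch, not the class itself. The helper names `decode_body` and `build_request` and the example.com URLs are assumptions made for illustration.

```python
import gzip
import urllib.parse
import urllib.request

def decode_body(body, charset="utf-8"):
    """Gunzip the body if it starts with the gzip magic number, then decode it."""
    if body.startswith(b"\x1f\x8b"):
        body = gzip.decompress(body)
    return body.decode(charset)

def build_request(url, params=None, post=False, charset="utf-8"):
    """Build a GET request with a query string, or a POST request with a form body."""
    query = urllib.parse.urlencode(params or {})
    if post:
        # A Request that carries data is sent as POST.
        return urllib.request.Request(url, data=query.encode(charset))
    if query:
        url += "?" + query
    return urllib.request.Request(url)

page = "<p>hello</p>"
print(decode_body(gzip.compress(page.encode("utf-8"))))   # compressed input
print(decode_body(page.encode("utf-8")))                  # plain input

# Nothing is sent here; the requests are only constructed and inspected.
get_req = build_request("http://example.com/search", {"q": "urllib", "page": 2})
post_req = build_request("http://example.com/login", {"user": "alice"}, post=True)
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

Checking the first two bytes is a heuristic. A stricter variant would look at the response's Content-Encoding header instead.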

2. Application examples

Posting a tweet (动弹) on OSC
