[Repost] A detailed guide to urllib in Python 3 (headers, proxies, timeouts, authentication, exception handling)

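urllib is Python's standard-library package for fetching URLs (Uniform Resource Locators). It can be used to download remote data and save it. The sections below collect common usage patterns: headers, proxies, timeouts, authentication, and exception handling.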

N ways to fetch web resources with Python 3

1. The simplest way

import urllib.request

response = urllib.request.urlopen('http://python.org/')
html = response.read()
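Note that read() returns bytes, so the result usually has to be decoded before use. A minimal sketch, assuming a data: URL as a stand-in for a real page so that it runs without network access; the fetch_text helper is a hypothetical name, not part of urllib:

```python
import urllib.request


def fetch_text(url, default_charset="utf-8"):
    """Open url and return its body decoded to str."""
    # The response object is a context manager, so it is closed automatically.
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes
        # Prefer the charset declared in the Content-Type header, if any.
        charset = response.headers.get_content_charset() or default_charset
    return raw.decode(charset)


# A data: URL is handled by urllib itself, so no network is needed here.
text = fetch_text("data:text/plain;charset=utf-8,hello%20urllib")
print(text)
```

For a real site, the same helper can be called with an http:// or https:// URL.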

2. Using Request

import urllib.request

req = urllib.request.Request('http://python.org/')
response = urllib.request.urlopen(req)
the_page = response.read()

3. Sending data

#!/usr/bin/env python3
import urllib.parse
import urllib.request

url = 'http://localhost/login.php'
values = {'act': 'login', 'login[email]': 'abc@abc.com', 'login[password]': '123456'}

# urlencode() returns a str; in Python 3 the request body must be bytes
data = urllib.parse.urlencode(values).encode('utf-8')

# Passing data turns the request into a POST
req = urllib.request.Request(url, data)
req.add_header('Referer', 'http://www.python.org/')

response = urllib.request.urlopen(req)
the_page = response.read()
print(the_page.decode("utf8"))
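The effect of the data argument can be checked without sending anything, because building a Request does not open a connection. A small sketch, reusing the localhost URL from the listing above purely as a placeholder:

```python
import urllib.parse
import urllib.request

values = {"act": "login", "login[email]": "abc@abc.com"}

# urlencode() percent-encodes keys and values and returns a str;
# encode() turns it into the bytes that Request expects as a body.
data = urllib.parse.urlencode(values).encode("utf-8")

# Building a Request does not send anything yet.
req = urllib.request.Request("http://localhost/login.php", data)

# With a body attached the method defaults to POST, otherwise GET.
method = req.get_method()
print(method)
print(data)
```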

4. Sending data and headers

#!/usr/bin/env python3
import urllib.parse
import urllib.request

url = 'http://localhost/login.php'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'act': 'login', 'login[email]': 'abc@abc.com', 'login[password]': '123456'}
headers = {'User-Agent': user_agent}

# The request body must be bytes in Python 3
data = urllib.parse.urlencode(values).encode('utf-8')
req = urllib.request.Request(url, data, headers)

response = urllib.request.urlopen(req)
the_page = response.read()
print(the_page.decode("utf8"))
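Headers set through the constructor and through add_header() end up in the same place and can be inspected before the request is sent. A brief sketch; the User-Agent string is a made-up example value:

```python
import urllib.request

headers = {"User-Agent": "Mozilla/5.0 (compatible; example-client)"}
req = urllib.request.Request("http://localhost/login.php", headers=headers)
req.add_header("Referer", "http://www.python.org/")

# Request normalises header names with str.capitalize(),
# so "User-Agent" is stored and looked up as "User-agent".
user_agent = req.get_header("User-agent")
has_referer = req.has_header("Referer")
print(req.header_items())
```

The capitalisation only affects lookups on the Request object; HTTP header names are case-insensitive on the wire.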

5. HTTP errors

#!/usr/bin/env python3
import urllib.error
import urllib.request

req = urllib.request.Request('http://python.org/')
try:
    urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
    # e.code is the HTTP status; the error object can also be read like a response
    print(e.code)
    print(e.read().decode("utf8"))

6. Exception handling (1)

#!/usr/bin/env python3
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

req = Request('http://www.python.org/')
try:
    response = urlopen(req)
except HTTPError as e:
    # HTTPError is a subclass of URLError, so it must be caught first
    print("The (www.python.org) server couldn't fulfill the request.")
    print('Error code: ', e.code)
except URLError as e:
    print('We failed to reach a server.')
    print('Reason: ', e.reason)
else:
    print("good!")
    print(response.read().decode("utf8"))
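The ordering of the two except clauses can be tried out against a throwaway local server instead of a real site. A sketch under stated assumptions: the AlwaysMissing handler and the classify helper are hypothetical names, and the empty ProxyHandler only ensures that the loopback request ignores any proxy configured in the environment.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class AlwaysMissing(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with 404."""

    def do_GET(self):
        self.send_error(404, "Not Found")

    def log_message(self, *args):
        pass  # keep the demo quiet


# Ignore proxies from the environment so the request goes straight to loopback.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def classify(url):
    """Return a short label describing how the request ended."""
    try:
        with opener.open(url, timeout=5) as response:
            return "ok %d" % response.status
    except urllib.error.HTTPError as e:  # subclass, so it must come first
        return "http error %d" % e.code
    except urllib.error.URLError as e:
        return "unreachable: %s" % e.reason


server = HTTPServer(("127.0.0.1", 0), AlwaysMissing)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    result = classify("http://127.0.0.1:%d/missing" % server.server_port)
finally:
    server.shutdown()
    server.server_close()
print(result)
```

If the two except clauses were swapped, the URLError branch would also catch the 404, and the status code would never be reported.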

7. Exception handling (2)

#!/usr/bin/env python3
from urllib.request import Request, urlopen
from urllib.error import URLError

req = Request("http://www.python.org/")
try:
    response = urlopen(req)
except URLError as e:
    # HTTPError has both 'code' and 'reason', so test for 'code' first
    if hasattr(e, 'code'):
        print("The server couldn't fulfill the request.")
        print('Error code: ', e.code)
    elif hasattr(e, 'reason'):
        print('We failed to reach a server.')
        print('Reason: ', e.reason)
else:
    print("good!")
    print(response.read().decode("utf8"))

8. HTTP authentication

#!/usr/bin/env python3
import urllib.request

# Create a password manager.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Add the username and password.
# If we knew the realm, we could use it instead of None.
top_level_url = "https://www.python.org/"
password_mgr.add_password(None, top_level_url, 'rekfan', 'xxxxxx')

handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# Create an "opener" (OpenerDirector instance).
opener = urllib.request.build_opener(handler)

# Use the opener to fetch a URL.
a_url = "https://www.python.org/"
x = opener.open(a_url)
print(x.read())

# Install the opener.
# Now all calls to urllib.request.urlopen use our opener.
urllib.request.install_opener(opener)
a = urllib.request.urlopen(a_url).read().decode('utf8')
print(a)
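How the password manager decides which credentials apply can be checked without contacting a server. A minimal sketch, reusing the placeholder credentials from the listing above; the realm name "Some Realm" is an arbitrary example:

```python
import urllib.request

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# Placeholder credentials, for illustration only.
password_mgr.add_password(None, "https://www.python.org/", "rekfan", "xxxxxx")

# Registering with realm None makes the entry a fallback for any realm,
# and it applies to every URL below the registered one.
inside = password_mgr.find_user_password("Some Realm", "https://www.python.org/docs/")

# A different host does not match, so no credentials are returned.
outside = password_mgr.find_user_password("Some Realm", "https://example.com/")
print(inside, outside)
```

This is why the credentials are only sent to URLs under top_level_url, even after the opener has been installed globally.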

9. Using a proxy

#!/usr/bin/env python3
import urllib.request

# Keys are the scheme of the target URL ('http', 'https'), not the proxy type.
# urllib has no built-in SOCKS support, so this assumes an HTTP proxy on localhost:1080.
proxy_support = urllib.request.ProxyHandler({'http': 'http://localhost:1080'})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)

a = urllib.request.urlopen("http://www.python.org/").read().decode("utf8")
print(a)
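Whether a request really goes through the proxy can be observed with a local stand-in. This is only a sketch: EchoProxy is a hypothetical handler that does not forward anything and simply replies with the URL it was asked for, and example.invalid is a reserved name that never resolves, so a successful reply shows the request was sent to the proxy.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoProxy(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a proxy: replies with the requested URL."""

    def do_GET(self):
        # A client talking to a proxy puts the full URL in the request line.
        body = self.path.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


proxy = HTTPServer(("127.0.0.1", 0), EchoProxy)  # port 0 = any free port
threading.Thread(target=proxy.serve_forever, daemon=True).start()

# Keys are the scheme of the target URL; values are the proxy address.
handler = urllib.request.ProxyHandler(
    {"http": "http://127.0.0.1:%d" % proxy.server_port}
)
opener = urllib.request.build_opener(handler)
try:
    with opener.open("http://example.invalid/hello", timeout=5) as response:
        seen_by_proxy = response.read().decode("utf-8")
finally:
    proxy.shutdown()
    proxy.server_close()
print(seen_by_proxy)
```

Using opener.open() instead of install_opener() keeps the proxy setting local to this opener rather than changing every later urlopen() call.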

10. Timeouts

#!/usr/bin/env python3
import socket
import urllib.request

# Timeout in seconds
timeout = 2
socket.setdefaulttimeout(timeout)

# This call to urllib.request.urlopen now uses the default timeout
# we have set in the socket module.
req = urllib.request.Request('http://www.python.org/')
a = urllib.request.urlopen(req).read()
print(a)
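socket.setdefaulttimeout() affects every socket created afterwards in the process. urlopen() and OpenerDirector.open() also accept a timeout argument that applies to a single call. A sketch against a deliberately slow local server; SlowHandler is a hypothetical name, and the delays are arbitrary example values:

```python
import socket
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that waits before answering."""

    def do_GET(self):
        time.sleep(0.5)
        try:
            self.send_response(200)
            self.end_headers()
        except OSError:
            pass  # the client has already given up

    def log_message(self, *args):
        pass  # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), SlowHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

# Ignore proxies from the environment so the request goes straight to loopback.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

try:
    # The timeout applies to this call only; other sockets are unaffected.
    opener.open(url, timeout=0.1).read()
    outcome = "finished"
except (urllib.error.URLError, socket.timeout):
    # Depending on where the wait happens, the timeout may arrive
    # wrapped in URLError or as a bare socket.timeout.
    outcome = "timed out"
finally:
    server.shutdown()
    server.server_close()
print(outcome)
```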

Source: http://www.cnblogs.com/ifso/p/4707135.html
