urllib2的简单介绍
参考网址:http://www.voidspace.org.uk/python/articles/urllib2.shtml

Fetching URLs
The simplest way to use urllib2 is as follows :
1、
import urllib2
response = urllib2.urlopen('http://python.org/')
html = response.read()

2、
import urllib2
req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)
the_page = response.read()
3、
req = urllib2.Request('ftp://example.com/')
4、
import urllib
import urllib2

url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
'location' : 'Northampton',
'language' : 'Python' }

data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
5、
Data can also be passed in an HTTP GET request by encoding it in the URL itself.

>>> import urllib2
>>> import urllib
>>> data = {}
>>> data['name'] = 'Somebody Here'
>>> data['location'] = 'Northampton'
>>> data['language'] = 'Python'
>>> url_values = urllib.urlencode(data)
>>> print url_values
name=Somebody+Here&language=Python&location=Northampton
>>> url = 'http://www.example.com/example.cgi'
>>> full_url = url + '?' + url_values
>>> data = urllib2.urlopen(full_url)
6、
import urllib
import urllib2

url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name' : 'Michael Foord',
'location' : 'Northampton',
'language' : 'Python' }
headers = { 'User-Agent' : user_agent }

data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()

7、Handling Exceptions
1)URLError
>>> req = urllib2.Request('http://www.jianshu.com/p/5c7a1af4aa531')
>>> try: urllib2.urlopen(req)
>>> except URLError, e:
>>> print e.reason
>>> print e,e.code #分别表示凡返回错误类型,错误代码和类型,错误代码

2)HTTPError is the subclass of URLError raised in the specific case of HTTP URLs.
HTTPError
Every HTTP response from the server contains a numeric "status code". Sometimes the status code indicates that the server is unable to fulfil the request
3)Error Codes
100: ('Continue', 'Request received, please continue'),
101: ('Switching Protocols',
'Switching to new protocol; obey Upgrade header'),

200: ('OK', 'Request fulfilled, document follows'),
201: ('Created', 'Document created, URL follows'),
202: ('Accepted',
'Request accepted, processing continues off-line'),
203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
204: ('No Content', 'Request fulfilled, nothing follows'),
205: ('Reset Content', 'Clear input form for further input.'),
206: ('Partial Content', 'Partial content follows.'),

300: ('Multiple Choices',
'Object has several resources -- see URI list'),
301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
302: ('Found', 'Object moved temporarily -- see URI list'),
303: ('See Other', 'Object moved -- see Method and URL list'),
304: ('Not Modified',
'Document has not changed since given time'),
305: ('Use Proxy',
'You must use proxy specified in Location to access this '
'resource.'),
307: ('Temporary Redirect',
'Object moved temporarily -- see URI list'),

400: ('Bad Request',
'Bad request syntax or unsupported method'),
401: ('Unauthorized',
'No permission -- see authorization schemes'),
402: ('Payment Required',
'No payment -- see charging schemes'),
403: ('Forbidden',
'Request forbidden -- authorization will not help'),
404: ('Not Found', 'Nothing matches the given URI'),
405: ('Method Not Allowed',
'Specified method is invalid for this server.'),
406: ('Not Acceptable', 'URI not available in preferred format.'),
407: ('Proxy Authentication Required', 'You must authenticate with '
'this proxy before proceeding.'),
408: ('Request Timeout', 'Request timed out; try again later.'),
409: ('Conflict', 'Request conflict.'),
410: ('Gone',
'URI no longer exists and has been permanently removed.'),
411: ('Length Required', 'Client must specify Content-Length.'),
412: ('Precondition Failed', 'Precondition in headers is false.'),
413: ('Request Entity Too Large', 'Entity is too large.'),
414: ('Request-URI Too Long', 'URI is too long.'),
415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
416: ('Requested Range Not Satisfiable',
'Cannot satisfy request range.'),
417: ('Expectation Failed',
'Expect condition could not be satisfied.'),

500: ('Internal Server Error', 'Server got itself in trouble'),
501: ('Not Implemented',
'Server does not support this operation'),
502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
503: ('Service Unavailable',
'The server cannot process the request due to a high load'),
504: ('Gateway Timeout',
'The gateway server did not receive a timely response'),
505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
4)
例子1:
from urllib2 import Request, urlopen, URLError, HTTPError
req = Request(someurl)
try:
response = urlopen(req)
except HTTPError, e:
print 'The server couldn\'t fulfill the request.'
print 'Error code: ', e.code
except URLError, e:
print 'We failed to reach a server.'
print 'Reason: ', e.reason
else:

注意:HTTPError 是URLError的子类,要写在前面
例子2:
from urllib2 import Request, urlopen, URLError
req = Request(someurl)
try:
response = urlopen(req)
except URLError, e:
if hasattr(e, 'reason'):
print 'We failed to reach a server.'
print 'Reason: ', e.reason
elif hasattr(e, 'code'):
print 'The server couldn\'t fulfill the request.'
print 'Error code: ', e.code
else:
# everything is fine

例子3:
from urllib2 import Request, urlopen
req = Request(someurl)
try:
response = urlopen(req)
except IOError, e:
if hasattr(e, 'reason'):
print 'We failed to reach a server.'
print 'Reason: ', e.reason
elif hasattr(e, 'code'):
print 'The server couldn\'t fulfill the request.'
print 'Error code: ', e.code
else:
# everything is fine
注意:URLError是IOError的子类,极少数情况下可能会报socket.error

8、info and geturl
geturl:
这将返回所获取页面的真实URL。这很有用,因为urlopen(或者使用的打开器对象)可能已经遵循了重定向。所获取的页面的URL可能与所请求的URL不相同
info:
这会返回一个类似于字典的对象,描述所获取的页面,特别是服务器发送的头部。它现在是一个httplib。HTTPMessage实例.
例子:
from urllib2 import Request,urlopen,URLError,HTTPError
url = 'https://passport.baidu.com/center?_t=1510744860'
req = Request(url)
response = urlopen(req)
print response.info()
print response.geturl()

9、Openers and Handlers
Openers:

  当你获取一个URL你使用一个opener(一个urllib2.OpenerDirector的实例)。正常情况下,我们使用默认opener:通过urlopen。但你能够创建个性的openers。可以用build_opener来创建opener对象。一般可用于需要处理cookie或者不想进行redirection的应用场景(You will want to create openers if you want to fetch URLs with specific handlers installed, for example to get an opener that handles cookies, or to get an opener that does not handle redirections.)

  以下是用代理ip模拟登录时(需要处理cookie)使用handler和opener的具体流程。

1 self.proxy = urllib2.ProxyHandler({'http': self.proxy_url})
2 self.cookie = cookielib.LWPCookieJar()
3 self.cookie_handler = urllib2.HTTPCookieProcessor(self.cookie)
4 self.opener = urllib2.build_opener(self.cookie_handler, self.proxy, urllib2.HTTPHandler)
Handles:

  Openers使用处理器handlers,所有的“繁重”工作由handlers处理。每个handlers知道如何通过特定协议打开URLs,或者如何处理URL打开时的各个方面。例如HTTP重定向或者HTTP cookies。

更多关于Openers和Handlers的信息。http://www.voidspace.org.uk/python/articles/urllib2.shtml#openers-and-handlers

10、Proxies
proxy代理ip创建opener

Note:Currently urllib2 does not support fetching of https locations through a proxy. This can be a problem.
(http://www.voidspace.org.uk/python/articles/urllib2.shtml#proxies)

例子:
1 import urllib2
2 proxy——handler = urllib2.ProxyHandler({'http': '54.186.78.110:3128'})#注意要确保该代理ip可用
3 opener = urllib2.build_opener(proxy_handler)
4 request = urllib2.Request(url, post_data, login_headers)#该例中还需要提交post_data和header信息
5 response = opener.open(request)
6 print response.read().encode('utf-8')

11、Sockets and Layers
例子:
import socket
import urllib2

# timeout in seconds
timeout = 10
socket.setdefaulttimeout(timeout)

# this call to urllib2.urlopen now uses the default timeout
# we have set in the socket module
req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)

12、Cookie
urllib2 对 Cookie 的处理也是自动的。如果需要得到某个 Cookie 项的值,可以这么做:
例子:
import urllib2
import cookielib
cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open('http://www.baidu.com')
for item in cookie:
print 'Name = '+item.name
print 'Value = '+item.value

运行之后就会输出访问百度的Cookie值:
Name = BAIDUID
Value = C664216C4F7BD6B98DB0B300292E0A23:FG=1
Name = BIDUPSID
Value = C664216C4F7BD6B98DB0B300292E0A23
Name = H_PS_PSSID
Value = 1464_21099_17001_24879_22159
Name = PSTM
Value = 1510747061
Name = BDSVRTM
Value = 0
Name = BD_HOME
Value = 0
13、对付"反盗链"
某些站点有所谓的反盗链设置,其实说穿了很简单,
就是检查你发送请求的header里面,referer站点是不是他自己,
所以我们只需要像把headers的referer改成该网站即可,以baidu为例:
#...
headers = {
'Referer':'http://www.baidu.com/'
}
#...
headers是一个dict数据结构,你可以放入任何想要的header,来做一些伪装。
例如,有些网站喜欢读取header中的X-Forwarded-For来看看人家的真实IP,可以直接把X-Forwarde-For改了。

python urllib2库的简单总结的更多相关文章

  1. python requests库的简单使用

    requests是python的一个HTTP客户端库,跟urllib,urllib2类似,但比urllib,urllib2更加使用简单. 1. requests库的安装在你的终端中运行pip安装命令即 ...

  2. python第三方库requests简单介绍

    一.发送请求与传递参数 简单demo: import requests r = requests.get(url='http://www.itwhy.org') # 最基本的GET请求 print(r ...

  3. Python Pexpect库的简单使用

    Python Pexpect库的使用 简介 最近需要远程操作一个服务器并执行该服务器上的一个python脚本,查到可以使用Pexpect这个库.记录一下. 什么是Pexpect?Pexpect能够产生 ...

  4. Python turtle库绘制简单图形

    一.简介 Python中的turtle库是一个直观有趣的图形绘制函数库.turtle库绘制图形有一个基本框架:一个小海龟在坐标系中爬行,其爬行轨迹形成了绘制图形. 二.简单的图形列举 1.绘制4个不同 ...

  5. python requests库的简单运用

    python requests的简单运用 使用pycharm获取requests包 ctrl+alt+s Project:pythonProject pythoninterpreter 点+号搜索 使 ...

  6. python urllib2练习发送简单post

    import urllib2 import urllib url = 'http://localhost/1.php' while True: data = raw_input('(ctrl+c ex ...

  7. python第三方库,你要的这里都有

    Python的第三方库多的超出我的想象. python 第三方模块 转 https://github.com/masterpy/zwpy_lst   Chardet,字符编码探测器,可以自动检测文本. ...

  8. python常用库(转)

    转自http://www.west999.com/info/html/wangluobiancheng/qita/20180729/4410114.html Python常用的库简单介绍一下 fuzz ...

  9. Python全部库整理

    库名称简介 Chardet字符编码探测器,可以自动检测文本.网页.xml的编码. colorama主要用来给文本添加各种颜色,并且非常简单易用. Prettytable主要用于在终端或浏览器端构建格式 ...

随机推荐

  1. SSH 框架整合总结

    1. 搭建Struts2 环境 创建 struts2 的配置文件: struts.xml; 在 web.xml 中配置 struts2 的核心过滤器; // struts.xml <?xml v ...

  2. 白话Redis分布式锁

    redis分布式 简单来说就是,操作redis实例时,不是常规(单机)操作一个实例,而是操作两台或多台,也就是基于分布式集群: 而且redis内部是单进程.单线程,是数据安全的(只有自己的线程在操作数 ...

  3. OVN实战---《A Primer on OVN》翻译

    overview 在本文中,我们将在三个host之间创建一个简单的二层overlay network.首先,我们来简单看一下,整个系统是怎么工作的.OVN基于分布式的control plane,其中各 ...

  4. Django - ModelForm组件

    一.ModelForm组件 这是一个神奇的组件,通过名字我们可以看出来,这个组件的功能就是把model和form组合起来,先来一个简单的例子来看一下这个东西怎么用:比如我们的数据库中有这样一张学生表, ...

  5. 配置 Docker 镜像下载的本地 mirror 服务

    Docker registry 工具如今已经非常好的支持了 mirror 功能,使用它能够配置一个本地的 mirror 服务.将 pull 过的镜像 cache 在本地.这样其他主机再次 pull 的 ...

  6. oracle入门(1)——安装oracle 11g x64 for windows

    [本文简介] 最近因为一个项目的需要,从零学习起了oracle,现在把学到的东西记录分享一下. 首先是安装篇,在win8 装10G 一直失败,网上各种方法都试过了,最后不得不放弃,选择了11G. 11 ...

  7. Hurst指数以及MF-DFA

    转:https://uqer.io/home/ https://uqer.io/community/share/564c3bc2f9f06c4446b48393 写在前面 9月的时候说想把arch包加 ...

  8. Hadoop的eclipse1.1.2插件的安装和配置

    我的集群使用的hadoop版本是hadoop-1.1.2.对应的eclipse版本也是:hadoop-eclipse-plugin-1.1.2_20131021200005 (1)在eclipse的d ...

  9. ES6的十个新特性

    这里只讲 ES6比较突出的特性,因为只能挑出十个,所以其他特性请参考官方文档: /** * Created by zhangsong on 16/5/20. *///    ***********Nu ...

  10. HDOJ 2203 亲和串 【KMP】

    HDOJ 2203 亲和串 [KMP] Time Limit: 3000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others) ...