requests

Python's standard library provides urllib, urllib2, httplib, and similar modules for making HTTP requests, but their APIs are painful to use. They were built for a different era and a different internet, and even the simplest task demands an enormous amount of work, including overriding various methods.

Requests is an Apache2-licensed HTTP library written in Python. It builds a high-level wrapper on top of the built-in modules, making network requests far more pleasant for Python developers; with Requests you can easily perform virtually any operation a browser can.

1. GET Requests

# 1. GET without parameters

import requests

ret = requests.get('https://github.com/timeline.json')

print(ret.url)
print(ret.text)


# 2. GET with parameters

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.get("http://httpbin.org/get", params=payload)

print(ret.url)
print(ret.text)
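
Beyond url and text, the Response object also exposes the status code, the response headers, and a JSON decoder. A minimal sketch against httpbin.org (an assumption; any JSON-returning endpoint works):

import requests

ret = requests.get("http://httpbin.org/get", params={'key1': 'value1'})

print(ret.status_code)               # e.g. 200
print(ret.headers['Content-Type'])   # e.g. application/json
print(ret.json())                    # decode a JSON body into a dict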

2. POST Requests

# 1. Basic POST

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.post("http://httpbin.org/post", data=payload)

print(ret.text)


# 2. POST with custom headers and a JSON body

import requests
import json

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}

ret = requests.post(url, data=json.dumps(payload), headers=headers)

print(ret.text)
print(ret.cookies)
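
In newer versions of requests, the json= parameter performs the json.dumps call and sets the Content-Type header for you. A minimal sketch of the same idea, using httpbin.org as a stand-in endpoint:

import requests

# json= serializes the dict and sets Content-Type: application/json automatically.
ret = requests.post("http://httpbin.org/post", json={'some': 'data'})
print(ret.json())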

3. Other Request Methods

requests.get(url, params=None, **kwargs)
requests.post(url, data=None, json=None, **kwargs)
requests.put(url, data=None, **kwargs)
requests.head(url, **kwargs)
requests.delete(url, **kwargs)
requests.patch(url, data=None, **kwargs)
requests.options(url, **kwargs)
  
# All of the above methods are built on top of:
requests.request(method, url, **kwargs)
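
Since the verb-specific helpers simply forward to requests.request, the two calls below are equivalent (a quick sketch against httpbin.org):

import requests

r1 = requests.get('http://httpbin.org/get')
r2 = requests.request('GET', 'http://httpbin.org/get')
print(r1.status_code, r2.status_code)  # both 200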

4. More Parameters

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How long to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
      <Response [200]>
    """
import requests


def param_method_url():
    # requests.request(method='get', url='http://127.0.0.1:8000/test/')
    # requests.request(method='post', url='http://127.0.0.1:8000/test/')
    pass


def param_param():
    # params can be:
    # - a dict
    # - a string
    # - bytes (ASCII-encodable only)

    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params={'k1': 'v1', 'k2': '水电费'})

    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params="k1=v1&k2=水电费&k3=v3&k3=vv3")

    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params=bytes("k1=v1&k2=k2&k3=v3&k3=vv3", encoding='utf8'))

    # Wrong: bytes containing non-ASCII characters
    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params=bytes("k1=v1&k2=水电费&k3=v3&k3=vv3", encoding='utf8'))
    pass


def param_data():
    # data can be:
    # - a dict
    # - a string
    # - bytes
    # - a file object

    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data={'k1': 'v1', 'k2': '水电费'})

    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data="k1=v1; k2=v2; k3=v3; k3=v4"
    #                  )

    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data="k1=v1;k2=v2;k3=v3;k3=v4",
    #                  headers={'Content-Type': 'application/x-www-form-urlencoded'}
    #                  )

    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data=open('data_file.py', mode='r', encoding='utf-8'),  # file contains: k1=v1;k2=v2;k3=v3;k3=v4
    #                  headers={'Content-Type': 'application/x-www-form-urlencoded'}
    #                  )
    pass


def param_json():
    # The json argument is serialized to a string via json.dumps(...)
    # and sent in the request body, with Content-Type: application/json.
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     json={'k1': 'v1', 'k2': '水电费'})


def param_headers():
    # Send custom request headers to the server
    # (here overriding the Content-Type that json= would otherwise set).
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     json={'k1': 'v1', 'k2': '水电费'},
                     headers={'Content-Type': 'application/x-www-form-urlencoded'}
                     )
import requests


def param_cookies():
    # Send cookies to the server.
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     data={'k1': 'v1', 'k2': 'v2'},
                     cookies={'cook1': 'value1'},
                     )
    # A CookieJar also works (the dict form is a wrapper around it).
    from http.cookiejar import CookieJar
    from http.cookiejar import Cookie

    obj = CookieJar()
    obj.set_cookie(Cookie(version=0, name='c1', value='v1', port=None, domain='', path='/', secure=False, expires=None,
                          discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False,
                          port_specified=False, domain_specified=False, domain_initial_dot=False, path_specified=False)
                   )
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     data={'k1': 'v1', 'k2': 'v2'},
                     cookies=obj)


def param_files():
    # Upload a file
    # file_dict = {
    #     'f1': open('readme', 'rb')
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)

    # Upload a file with a custom filename
    # file_dict = {
    #     'f1': ('test.txt', open('readme', 'rb'))
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)

    # Upload in-memory content under a custom filename
    # file_dict = {
    #     'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf")
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)

    # Upload with a custom filename, content type, and extra per-file headers
    # file_dict = {
    #     'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf", 'application/text', {'k1': '0'})
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)

    pass


def param_auth():
    from requests.auth import HTTPBasicAuth, HTTPDigestAuth

    ret = requests.get('https://api.github.com/user', auth=HTTPBasicAuth('wupeiqi', 'sdfasdfasdf'))
    print(ret.text)

    # ret = requests.get('http://192.168.1.1',
    #                    auth=HTTPBasicAuth('admin', 'admin'))
    # ret.encoding = 'gbk'
    # print(ret.text)

    # ret = requests.get('http://httpbin.org/digest-auth/auth/user/pass', auth=HTTPDigestAuth('user', 'pass'))
    # print(ret)


def param_timeout():
    # ret = requests.get('http://google.com/', timeout=1)
    # print(ret)

    # ret = requests.get('http://google.com/', timeout=(5, 1))
    # print(ret)
    pass


def param_allow_redirects():
    ret = requests.get('http://127.0.0.1:8000/test/', allow_redirects=False)
    print(ret.text)
import requests


def param_proxies():
    # proxies = {
    #     "http": "61.172.249.96:80",
    #     "https": "http://61.185.219.126:3128",
    # }

    # proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}

    # ret = requests.get("http://www.proxy360.cn/Proxy", proxies=proxies)
    # print(ret.headers)

    # from requests.auth import HTTPProxyAuth
    #
    # proxyDict = {
    #     'http': '77.75.105.165',
    #     'https': '77.75.105.165'
    # }
    # auth = HTTPProxyAuth('username', 'mypassword')
    #
    # r = requests.get("http://www.google.com", proxies=proxyDict, auth=auth)
    # print(r.text)

    pass


def param_stream():
    ret = requests.get('http://127.0.0.1:8000/test/', stream=True)
    print(ret.content)
    ret.close()

    # from contextlib import closing
    # with closing(requests.get('http://httpbin.org/get', stream=True)) as r:
    #     # Process the response here.
    #     for i in r.iter_content():
    #         print(i)


def requests_session():
    import requests

    session = requests.Session()

    ### 1. Visit any page first to obtain a cookie.
    i1 = session.get(url="http://dig.chouti.com/help/service")

    ### 2. Log in, carrying the cookie from step 1; the backend authorizes the gpsd value in the cookie.
    i2 = session.post(
        url="http://dig.chouti.com/login",
        data={
            'phone': "8615131255089",
            'password': "xxxxxx",
            'oneMonth': ""
        }
    )

    i3 = session.post(
        url="http://dig.chouti.com/link/vote?linksId=8589623",
    )
    print(i3.text)

Notes: among the request parameters, files={'f1': open('xxx', 'rb')} uploads a file, and the auth parameter enables basic authentication.

  The timeout parameter sets the timeout in seconds. timeout=2 means the request is abandoned if no connection is established within 2 seconds. timeout can also take two values: with timeout=(3, 2), the first value is the connect timeout and the second is the read timeout.
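
  A minimal sketch showing the tuple form and how a timeout surfaces as an exception (httpbin.org/delay is used here as a deliberately slow endpoint):

import requests

try:
    # (connect timeout, read timeout) in seconds
    ret = requests.get('http://httpbin.org/delay/10', timeout=(3, 2))
    print(ret.status_code)
except requests.exceptions.Timeout:
    print('request timed out')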

  The proxies parameter sets proxy servers. proxies = { "http": "61.172.249.96:80", "https": "http://61.185.219.126:3128" } means HTTP requests go through the proxy mapped to "http" and HTTPS requests go through the proxy mapped to "https".

  proxies can also be written another way: proxies = {'http://<target URL>': 'http://10.10.1.10:5323'} maps a specific site to the proxy used for it.

  If the proxy requires authentication, use:

    from requests.auth import HTTPProxyAuth

    proxyDict = {  'http': '77.75.105.165',  'https': '77.75.105.165'  }   # proxy IPs

    auth = HTTPProxyAuth('username', 'mypassword')

    r = requests.get("http://www.google.com", proxies=proxyDict, auth=auth)

  The cert parameter supplies a client SSL certificate (a .pem file); when two values are given, they form a ('cert', 'key') pair with the private key in a separate file. It is used together with the verify parameter (which controls certificate verification).
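
  A sketch of both parameters; the certificate paths are placeholders, not real files:

import requests

# verify: check the server certificate against a custom CA bundle.
ret = requests.get('https://example.com/', verify='/path/to/ca_bundle.pem')

# cert: present a client certificate; a two-value tuple is a ('cert', 'key') pair.
ret = requests.get('https://example.com/', cert=('/path/to/client.pem', '/path/to/client.key'))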

  

  Note that every URL starting with https involves a certificate.

  

  Each request should carry the cookies from previous requests. The requests module wraps this up in Session, which manages cookies and headers for us: every request automatically carries the cookies accumulated so far, and when a response arrives, any new cookies are merged in.

  Usage: session = requests.Session(), then session.get(...).
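
  A minimal sketch against httpbin.org, which can set a cookie and then echo back whatever cookies the client sends:

import requests

session = requests.Session()

# The Set-Cookie from this response is stored on the session...
session.get('http://httpbin.org/cookies/set/token/abc123')

# ...and sent automatically with every later request.
ret = session.get('http://httpbin.org/cookies')
print(ret.text)  # {"cookies": {"token": "abc123"}}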

Official documentation: http://cn.python-requests.org/zh_CN/latest/user/quickstart.html#id4
