Crawler Essentials: requests

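Requests is an HTTP library for Python released under the Apache 2.0 license. It wraps Python's lower-level HTTP machinery (it is built on urllib3) in a much friendlier API, which makes issuing network requests far more pleasant. With Requests you can easily reproduce most of what a browser does at the HTTP level: sending GET and POST requests, carrying headers and cookies, uploading files, following redirects, and so on.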

1. GET requests

 # 1. Without parameters
 import requests

 ret = requests.get('https://github.com/timeline.json')
 print(ret.url)
 print(ret.text)

 # 2. With query parameters
 import requests

 payload = {'key1': 'value1', 'key2': 'value2'}
 ret = requests.get("http://httpbin.org/get", params=payload)
 print(ret.url)   # the parameters are appended to the URL as a query string
 print(ret.text)
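
To see how params ends up in the URL without touching the network, a request can be prepared but not sent. The following is a minimal sketch; the helper name build_get_url is made up for illustration, and the repeated key k3 mirrors the query strings used later in this article.

```python
import requests


def build_get_url(base_url, params):
    """Return the final URL requests would use, without sending anything."""
    prepared = requests.Request("GET", base_url, params=params).prepare()
    return prepared.url


# A plain dict becomes key=value pairs joined by '&'.
simple_url = build_get_url("http://httpbin.org/get",
                           {"key1": "value1", "key2": "value2"})
print(simple_url)

# A list value repeats the key once per item.
repeated_url = build_get_url("http://httpbin.org/get",
                             {"k1": "v1", "k3": ["v3", "vv3"]})
print(repeated_url)
```

Preparing a request this way is handy when debugging a crawler, because the exact URL can be inspected before any traffic is generated.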

2. POST requests

 # 1. Basic POST
 import requests

 payload = {'key1': 'value1', 'key2': 'value2'}
 ret = requests.post("http://httpbin.org/post", data=payload)
 print(ret.text)

 # 2. Sending request headers together with a JSON body
 import requests
 import json

 url = 'https://api.github.com/some/endpoint'
 payload = {'some': 'data'}
 headers = {'content-type': 'application/json'}

 ret = requests.post(url, data=json.dumps(payload), headers=headers)
 print(ret.text)
 print(ret.cookies)
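
The two POST examples differ in what actually goes into the request body. The sketch below prepares three requests without sending them and compares the resulting Content-Type and body; describe_body is a hypothetical helper written for this illustration.

```python
import json

import requests


def describe_body(**body_kwargs):
    """Prepare (but do not send) a POST; return (Content-Type, body as text)."""
    prepared = requests.Request(
        "POST", "http://httpbin.org/post", **body_kwargs
    ).prepare()
    body = prepared.body
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    return prepared.headers.get("Content-Type"), body


payload = {"key1": "value1", "key2": "value2"}

# data=<dict>: form-encoded body, Content-Type set automatically.
form_type, form_body = describe_body(data=payload)

# json=<dict>: JSON body, Content-Type set automatically.
json_type, json_body = describe_body(json=payload)

# data=<str>: the string is sent as-is and no Content-Type is added,
# which is why the second POST example passes the header explicitly.
raw_type, raw_body = describe_body(data=json.dumps(payload))

print(form_type, form_body)
print(json_type, json_body)
print(raw_type, raw_body)
```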

3. Other request methods

 requests.get(url, params=None, **kwargs)
 requests.post(url, data=None, json=None, **kwargs)
 requests.put(url, data=None, **kwargs)
 requests.head(url, **kwargs)
 requests.delete(url, **kwargs)
 requests.patch(url, data=None, **kwargs)
 requests.options(url, **kwargs)

 # All of the functions above are built on top of this one
 requests.request(method, url, **kwargs)
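
One way to convince yourself that the helpers and request() are interchangeable is to route both through an offline transport adapter and compare what arrives. This is only a sketch: EchoAdapter is a hypothetical class written for this illustration, it sets the private Response._content attribute to fake a body, and it uses Session methods as a stand-in for the module-level functions (which create a temporary Session internally).

```python
import requests
from requests.adapters import BaseAdapter


class EchoAdapter(BaseAdapter):
    """Hypothetical offline adapter: replies with the HTTP verb it received."""

    def send(self, request, **kwargs):
        response = requests.Response()
        response.status_code = 200
        response.url = request.url
        response.request = request
        response.encoding = "utf-8"
        # Private attribute, set directly only to fake a body for this demo.
        response._content = request.method.encode("ascii")
        return response

    def close(self):
        pass


session = requests.Session()
session.mount("http://", EchoAdapter())  # no real network traffic from here on

url = "http://127.0.0.1:8000/test/"
via_helpers = [
    session.get(url).text,
    session.post(url).text,
    session.put(url).text,
    session.delete(url).text,
]
via_request = [
    session.request(method, url).text
    for method in ("get", "post", "put", "delete")
]
print(via_helpers)
print(via_request)
```

Both lists contain the same verbs, and the method name is upper-cased during preparation regardless of how it was spelled in the call.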

4. Request parameters

 def request(method, url, **kwargs):
     """Constructs and sends a :class:`Request <Request>`.

     :param method: method for the new :class:`Request` object.
     :param url: URL for the new :class:`Request` object.
     :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
     :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
     :param json: (optional) json data to send in the body of the :class:`Request`.
     :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
     :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
     :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
         ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
         or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
         defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
         to add for the file.
     :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
     :param timeout: (optional) How long to wait for the server to send data
         before giving up, as a float, or a :ref:`(connect timeout, read
         timeout) <timeouts>` tuple.
     :type timeout: float or tuple
     :param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.
     :type allow_redirects: bool
     :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
     :param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.
     :param stream: (optional) if ``False``, the response content will be immediately downloaded.
     :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
     :return: :class:`Response <Response>` object
     :rtype: requests.Response

     Usage::

       >>> import requests
       >>> req = requests.request('GET', 'http://httpbin.org/get')
       >>> req
       <Response [200]>
     """


5. Parameter examples

 def param_method_url():
     # requests.request(method='get', url='http://127.0.0.1:8000/test/')
     # requests.request(method='post', url='http://127.0.0.1:8000/test/')
     pass

 def param_param():
     # params can be:
     # - a dict
     # - a string
     # - bytes (ASCII characters only)

     # A dict; '水电费' is a sample non-ASCII value (Chinese for "utility bills")
     # requests.request(method='get',
     #                  url='http://127.0.0.1:8000/test/',
     #                  params={'k1': 'v1', 'k2': '水电费'})

     # A string
     # requests.request(method='get',
     #                  url='http://127.0.0.1:8000/test/',
     #                  params="k1=v1&k2=水电费&k3=v3&k3=vv3")

     # Bytes containing only ASCII characters
     # requests.request(method='get',
     #                  url='http://127.0.0.1:8000/test/',
     #                  params=bytes("k1=v1&k2=k2&k3=v3&k3=vv3", encoding='utf8'))

     # Wrong: bytes containing non-ASCII characters
     # requests.request(method='get',
     #                  url='http://127.0.0.1:8000/test/',
     #                  params=bytes("k1=v1&k2=水电费&k3=v3&k3=vv3", encoding='utf8'))
     pass

 def param_data():
     # data can be:
     # - a dict
     # - a string
     # - bytes
     # - a file object

     # A dict
     # requests.request(method='POST',
     #                  url='http://127.0.0.1:8000/test/',
     #                  data={'k1': 'v1', 'k2': '水电费'})

     # A string
     # requests.request(method='POST',
     #                  url='http://127.0.0.1:8000/test/',
     #                  data="k1=v1; k2=v2; k3=v3; k3=v4"
     #                  )

     # A string with an explicit Content-Type
     # requests.request(method='POST',
     #                  url='http://127.0.0.1:8000/test/',
     #                  data="k1=v1;k2=v2;k3=v3;k3=v4",
     #                  headers={'Content-Type': 'application/x-www-form-urlencoded'}
     #                  )

     # A file object; the file content is: k1=v1;k2=v2;k3=v3;k3=v4
     # requests.request(method='POST',
     #                  url='http://127.0.0.1:8000/test/',
     #                  data=open('data_file.py', mode='r', encoding='utf-8'),
     #                  headers={'Content-Type': 'application/x-www-form-urlencoded'}
     #                  )
     pass

 def param_json():
     # The value passed as json is serialized to a string (as json.dumps(...) would do),
     # sent in the request body, and the Content-Type header is set to application/json
     requests.request(method='POST',
                      url='http://127.0.0.1:8000/test/',
                      json={'k1': 'v1', 'k2': '水电费'})

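 def param_headers():
     # Send request headers to the server.
     # Note: an explicit Content-Type takes precedence over the one implied by json=,
     # so here the JSON body is labelled as form data.
     requests.request(method='POST',
                      url='http://127.0.0.1:8000/test/',
                      json={'k1': 'v1', 'k2': '水电费'},
                      headers={'Content-Type': 'application/x-www-form-urlencoded'}
                      )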

 def param_cookies():
     # Send cookies to the server
     requests.request(method='POST',
                      url='http://127.0.0.1:8000/test/',
                      data={'k1': 'v1', 'k2': 'v2'},
                      cookies={'cook1': 'value1'},
                      )

     # A CookieJar can also be used (the dict form is a convenience wrapper around it)
     from http.cookiejar import CookieJar
     from http.cookiejar import Cookie

     obj = CookieJar()
     obj.set_cookie(Cookie(version=0, name='c1', value='v1', port=None, domain='', path='/', secure=False, expires=None,
                           discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False,
                           port_specified=False, domain_specified=False, domain_initial_dot=False, path_specified=False)
                    )
     requests.request(method='POST',
                      url='http://127.0.0.1:8000/test/',
                      data={'k1': 'v1', 'k2': 'v2'},
                      cookies=obj)

 def param_files():
     # Upload a file
     # file_dict = {
     #     'f1': open('readme', 'rb')
     # }
     # requests.request(method='POST',
     #                  url='http://127.0.0.1:8000/test/',
     #                  files=file_dict)

     # Upload a file under a custom file name
     # file_dict = {
     #     'f1': ('test.txt', open('readme', 'rb'))
     # }
     # requests.request(method='POST',
     #                  url='http://127.0.0.1:8000/test/',
     #                  files=file_dict)

     # Upload string content under a custom file name
     # file_dict = {
     #     'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf")
     # }
     # requests.request(method='POST',
     #                  url='http://127.0.0.1:8000/test/',
     #                  files=file_dict)

     # Upload string content with a custom file name, content type and extra headers
     # file_dict = {
     #     'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf", 'application/text', {'k1': '0'})
     # }
     # requests.request(method='POST',
     #                  url='http://127.0.0.1:8000/test/',
     #                  files=file_dict)
     pass
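
What a files upload looks like on the wire can be checked offline by preparing the request instead of sending it. A minimal sketch, assuming a hypothetical helper build_upload and placeholder content:

```python
import requests


def build_upload(url, field, filename, content, content_type="text/plain"):
    """Prepare (but do not send) a multipart upload for a single field."""
    files = {field: (filename, content, content_type)}
    return requests.Request("POST", url, files=files).prepare()


prepared = build_upload("http://127.0.0.1:8000/test/",
                        "f1", "test.txt", "hello upload")

# requests picks a random boundary and announces it in the Content-Type header.
upload_type = prepared.headers["Content-Type"]
upload_body = prepared.body  # bytes: one part per entry in the files dict

print(upload_type)
print(upload_body.decode("utf-8"))
```

The printed body shows the form field name, the custom file name and the per-part Content-Type, which correspond to the tuple elements in file_dict above.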

 def param_auth():
     from requests.auth import HTTPBasicAuth, HTTPDigestAuth

     ret = requests.get('https://api.github.com/user', auth=HTTPBasicAuth('wupeiqi', 'sdfasdfasdf'))
     print(ret.text)

     # ret = requests.get('http://192.168.1.1',
     #                    auth=HTTPBasicAuth('admin', 'admin'))
     # ret.encoding = 'gbk'
     # print(ret.text)

     # ret = requests.get('http://httpbin.org/digest-auth/auth/user/pass', auth=HTTPDigestAuth('user', 'pass'))
     # print(ret)
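
HTTP Basic authentication is nothing more than an Authorization header carrying the base64-encoded "user:password" pair. The sketch below prepares a request offline and decodes the header again; the credentials are placeholders.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

prepared = requests.Request(
    "GET",
    "http://127.0.0.1:8000/test/",
    auth=HTTPBasicAuth("user", "pass"),  # placeholder credentials
).prepare()

auth_header = prepared.headers["Authorization"]
scheme, token = auth_header.split(" ", 1)
decoded = base64.b64decode(token).decode("latin1")

print(scheme)   # the authentication scheme
print(decoded)  # the credentials are only encoded, not encrypted
```

Because the credentials are merely encoded, Basic authentication should only be used over HTTPS.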

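 def param_timeout():
     # A single number applies to both the connect timeout and the read timeout
     # ret = requests.get('http://google.com/', timeout=1)
     # print(ret)

     # A tuple is (connect timeout, read timeout)
     # ret = requests.get('http://google.com/', timeout=(5, 1))
     # print(ret)
     pass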

 def param_allow_redirects():
     ret = requests.get('http://127.0.0.1:8000/test/', allow_redirects=False)
     print(ret.text)

 def param_proxies():
     # proxies = {
     #     "http": "61.172.249.96:80",
     #     "https": "http://61.185.219.126:3128",
     # }

     # proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}

     # ret = requests.get("http://www.proxy360.cn/Proxy", proxies=proxies)
     # print(ret.headers)

     # from requests.auth import HTTPProxyAuth
     #
     # proxyDict = {
     #     'http': '77.75.105.165',
     #     'https': '77.75.105.165'
     # }
     # auth = HTTPProxyAuth('username', 'mypassword')
     #
     # r = requests.get("http://www.google.com", proxies=proxyDict, auth=auth)
     # print(r.text)
     pass

 def param_stream():
     ret = requests.get('http://127.0.0.1:8000/test/', stream=True)
     print(ret.content)
     ret.close()

     # from contextlib import closing
     # with closing(requests.get('http://httpbin.org/get', stream=True)) as r:
     #     # process the response here
     #     for i in r.iter_content():
     #         print(i)

 def requests_session():
     import requests

     session = requests.Session()

     ### 1. Visit any page first to obtain a cookie
     i1 = session.get(url="http://dig.chouti.com/help/service")

     ### 2. Log in; the session sends the cookie from step 1, and the server authorizes the gpsd value in it
     i2 = session.post(
         url="http://dig.chouti.com/login",
         data={
             'phone': "",
             'password': "xxxxxx",
             'oneMonth': ""
         }
     )

     ### 3. Vote for a link, still through the same (now authorized) session
     i3 = session.post(
         url="http://dig.chouti.com/link/vote?linksId=8589623",
     )
     print(i3.text)
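
The reason a Session works for this login flow is that its cookie jar is attached to every request it prepares. The sketch below shows this offline; the cookie value is a placeholder standing in for whatever the server would have set in step 1.

```python
import requests

session = requests.Session()

# Pretend an earlier response set this cookie; the value is a placeholder.
session.cookies.set("gpsd", "example-value")

# Every request prepared through the session carries the stored cookie.
first = session.prepare_request(
    requests.Request("GET", "http://127.0.0.1:8000/a"))
second = session.prepare_request(
    requests.Request("POST", "http://127.0.0.1:8000/b", data={"k": "v"}))

# A request prepared outside the session knows nothing about it.
standalone = requests.Request("GET", "http://127.0.0.1:8000/a").prepare()

session_cookie_headers = [first.headers.get("Cookie"),
                          second.headers.get("Cookie")]
standalone_cookie_header = standalone.headers.get("Cookie")

print(session_cookie_headers)
print(standalone_cookie_header)
```

With the module-level functions the same effect requires passing cookies= by hand on every call.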


6. Simulating a GitHub login with requests

 import requests
 from bs4 import BeautifulSoup

 def login_github():
     """
     Simulate a browser login to GitHub with the requests module.
     :return:
     """
     # Fetch the login page to obtain the CSRF token
     r1 = requests.get('https://github.com/login')   # response to the GET request
     s1 = BeautifulSoup(r1.text, 'html.parser')      # parse the HTML with bs4
     token = s1.find('input', attrs={'name': 'authenticity_token'}).get('value')     # the login token, i.e. the CSRF token
     get_cookies = r1.cookies.get_dict()     # cookies from the GET request; they must be sent with the POST

     # Send the login POST
     '''
     Login form fields as shown by the browser
     commit    Sign+in
     utf8    ✓
     authenticity_token    E961jQMIyC9NPwL54YPj70gv2hbXWJ…fTUd+e4lT5RAizKbfzQo4eRHsfg==
     login    JackUpDown (user name)
     password    ********** (password)
     '''
     r2 = requests.post(
         'https://github.com/session',
         data={
             'commit': 'Sign in',    # requests form-encodes the space as '+'; a literal '+' would be sent as %2B
             'utf8': '✓',
             'authenticity_token': token,
             'login': 'JackUpDown',
             'password': '**********'
         },
         cookies=get_cookies     # carry the cookies from the GET request
     )
     login_cookies = r2.cookies.get_dict()   # cookies returned after login; send them to reach pages that require login

     # Request a page that requires login, carrying the post-login cookies
     r3 = requests.get('https://github.com/settings/emails', cookies=login_cookies)
     print(r3.text)
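
Two details of the login flow can be tried without contacting GitHub: pulling the hidden token out of the form, and checking how a form value is encoded. The sketch below uses only the standard library as a stand-in (html.parser instead of BeautifulSoup, urllib.parse.urlencode for the form encoding that requests applies to a data dict); SAMPLE_HTML and its token value are invented for the illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class TokenParser(HTMLParser):
    """Remember the value of <input name="authenticity_token">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("name") == "authenticity_token":
            self.token = attributes.get("value")


def extract_token(html):
    """Return the hidden token found in the HTML, or None if it is absent."""
    parser = TokenParser()
    parser.feed(html)
    return parser.token


# Invented sample page; a real login page is far larger.
SAMPLE_HTML = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="sample-token-123">
  <input type="text" name="login">
</form>
"""

token = extract_token(SAMPLE_HTML)
print(token)

# A space is encoded as '+', while a literal '+' is percent-encoded.
with_space = urlencode({"commit": "Sign in"})
with_plus = urlencode({"commit": "Sign+in"})
print(with_space)
print(with_plus)
```

The second half shows why form values should be written in their decoded form: the browser's "Sign+in" is already the encoded representation of "Sign in".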

Reposted from:

1. http://www.cnblogs.com/wupeiqi/articles/6283017.html
