Python Web Scraping: Requests
The Requests Module
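Python's standard library provides modules such as urllib, urllib2 and httplib for making HTTP requests (in Python 3 these live in urllib.request and http.client), but their APIs are clumsy.
They were built for a different era and a different internet, and they demand a huge amount of work, even overriding methods, to accomplish the simplest tasks.
Requests is an HTTP library written in Python and released under the Apache2 License. It is a high-level wrapper around urllib3, which in turn builds on Python's built-in modules;
as a result, making network requests becomes much more convenient for Pythonistas, and Requests can easily reproduce almost any HTTP operation a browser can perform.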
GET requests
- # 1. Example without parameters
- import requests
- ret = requests.get('https://github.com/timeline.json')
- print(ret.url)
- print(ret.text)
- # 2. Example with parameters
- import requests
- payload = {'key1': 'value1', 'key2': 'value2'}
- ret = requests.get("http://httpbin.org/get", params=payload)
- print(ret.url)
- print(ret.text)
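To see how `params` ends up in the URL without touching the network, a request can be prepared but not sent. The following is a minimal sketch; `build_get_url` is a hypothetical helper name used only for illustration, not part of Requests.

```python
import requests


def build_get_url(base_url, params):
    """Return the URL Requests would use for a GET, without sending anything."""
    prepared = requests.Request('GET', base_url, params=params).prepare()
    return prepared.url


url = build_get_url('http://httpbin.org/get', {'key1': 'value1', 'key2': 'value2'})
print(url)
```

The dictionary is URL-encoded and appended as the query string, which is the same value `ret.url` reports after a real request.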
POST requests
- # 1. Basic POST example
- import requests
- payload = {'key1': 'value1', 'key2': 'value2'}
- ret = requests.post("http://httpbin.org/post", data=payload)
- print(ret.text)
- # 2. Example sending request headers and data
- import requests
- import json
- url = 'https://api.github.com/some/endpoint'
- payload = {'some': 'data'}
- headers = {'content-type': 'application/json'}
- ret = requests.post(url, data=json.dumps(payload), headers=headers)
- print(ret.text)
- print(ret.cookies)
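The second example serializes the payload by hand and sets the header itself. Assuming a reasonably recent Requests release, the `json=` parameter performs both steps. The sketch below prepares both variants without sending them and compares the results; the endpoint is the placeholder URL from the example above.

```python
import json

import requests

url = 'https://api.github.com/some/endpoint'  # placeholder endpoint, never contacted here
payload = {'some': 'data'}

# Variant 1: serialize by hand and set the Content-Type header explicitly.
manual = requests.Request(
    'POST', url,
    data=json.dumps(payload),
    headers={'content-type': 'application/json'},
).prepare()

# Variant 2: let Requests serialize the payload and choose the header.
automatic = requests.Request('POST', url, json=payload).prepare()

for prepared in (manual, automatic):
    print(prepared.headers['Content-Type'], json.loads(prepared.body))
```

Both prepared requests carry the same JSON document and the same `Content-Type`, so the shorter `json=` form is usually enough.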
Other request methods
- requests.get(url, params=None, **kwargs)
- requests.post(url, data=None, json=None, **kwargs)
- requests.put(url, data=None, **kwargs)
- requests.head(url, **kwargs)
- requests.delete(url, **kwargs)
- requests.patch(url, data=None, **kwargs)
- requests.options(url, **kwargs)
- # All of the methods above are built on top of this one
- requests.request(method, url, **kwargs)
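One way to check that the helpers delegate to `requests.request` is to replace that function with a mock and record what each helper passes to it. This sketch relies on an implementation detail (the helpers living in the `requests.api` module), so treat it as an illustration rather than a supported interface; no request is actually sent.

```python
from unittest import mock

import requests

URL = 'http://127.0.0.1:8000/test/'  # never contacted, the call is mocked
helpers = [requests.get, requests.post, requests.put, requests.head,
           requests.delete, requests.patch, requests.options]

seen = []
with mock.patch('requests.api.request') as fake_request:
    for helper in helpers:
        helper(URL)
        # The first positional argument each helper forwards is the HTTP method name.
        seen.append(fake_request.call_args[0][0])

print(seen)
```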
More parameters
- def request(method, url, **kwargs):
- """Constructs and sends a :class:`Request <Request>`.
- :param method: method for the new :class:`Request` object.
- :param url: URL for the new :class:`Request` object.
- :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
- :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
- :param json: (optional) json data to send in the body of the :class:`Request`.
- :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
- :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
- :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
- ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
- or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
- defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
- to add for the file.
- :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
- :param timeout: (optional) How long to wait for the server to send data
- before giving up, as a float, or a :ref:`(connect timeout, read
- timeout) <timeouts>` tuple.
- :type timeout: float or tuple
- :param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.
- :type allow_redirects: bool
- :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
- :param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.
- :param stream: (optional) if ``False``, the response content will be immediately downloaded.
- :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
- :return: :class:`Response <Response>` object
- :rtype: requests.Response
- Usage::
- >>> import requests
- >>> req = requests.request('GET', 'http://httpbin.org/get')
- >>> req
- <Response [200]>
- """
Parameter list
- def param_method_url():
- # requests.request(method='get', url='http://127.0.0.1:8000/test/')
- # requests.request(method='post', url='http://127.0.0.1:8000/test/')
- pass
- def param_param():
- # - can be a dict
- # - can be a string
- # - can be bytes (ASCII characters only)
- # requests.request(method='get',
- # url='http://127.0.0.1:8000/test/',
- # params={'k1': 'v1', 'k2': '水电费'})
- # requests.request(method='get',
- # url='http://127.0.0.1:8000/test/',
- # params="k1=v1&k2=水电费&k3=v3&k3=vv3")
- # requests.request(method='get',
- # url='http://127.0.0.1:8000/test/',
- # params=bytes("k1=v1&k2=k2&k3=v3&k3=vv3", encoding='utf8'))
- # Wrong: the bytes contain non-ASCII characters
- # requests.request(method='get',
- # url='http://127.0.0.1:8000/test/',
- # params=bytes("k1=v1&k2=水电费&k3=v3&k3=vv3", encoding='utf8'))
- pass
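The dict form of `params` can be inspected offline by preparing the request. The sketch below is an illustration under the assumption that a list value is used for the repeated key `k3`; it shows that non-ASCII values are percent-encoded and that the list expands into repeated keys.

```python
from urllib.parse import parse_qs, urlsplit

import requests

params = {'k1': 'v1', 'k2': '水电费', 'k3': ['v3', 'vv3']}
prepared = requests.Request('GET', 'http://127.0.0.1:8000/test/', params=params).prepare()

# The final URL contains only ASCII: the Chinese value is percent-encoded as UTF-8.
print(prepared.url)

# Decoding the query string recovers the original values, with k3 repeated.
query = parse_qs(urlsplit(prepared.url).query)
print(query)
```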
- def param_data():
- # can be a dict
- # can be a string
- # can be bytes
- # can be a file object
- # requests.request(method='POST',
- # url='http://127.0.0.1:8000/test/',
- # data={'k1': 'v1', 'k2': '水电费'})
- # requests.request(method='POST',
- # url='http://127.0.0.1:8000/test/',
- # data="k1=v1; k2=v2; k3=v3; k3=v4"
- # )
- # requests.request(method='POST',
- # url='http://127.0.0.1:8000/test/',
- # data="k1=v1;k2=v2;k3=v3;k3=v4",
- # headers={'Content-Type': 'application/x-www-form-urlencoded'}
- # )
- # requests.request(method='POST',
- # url='http://127.0.0.1:8000/test/',
- # data=open('data_file.py', mode='r', encoding='utf-8'), # file content: k1=v1;k2=v2;k3=v3;k3=v4
- # headers={'Content-Type': 'application/x-www-form-urlencoded'}
- # )
- pass
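The reason some of the calls above pass an explicit `Content-Type` can be seen by preparing the request offline. In this sketch (`describe` is a hypothetical helper, and `&` is used as the conventional form separator), a dict is form-encoded and labeled automatically, while a string or bytes body is passed through untouched and gets no `Content-Type` at all.

```python
import requests

URL = 'http://127.0.0.1:8000/test/'  # never contacted, requests are only prepared


def describe(data):
    """Return (Content-Type header or None, body) for a prepared POST."""
    prepared = requests.Request('POST', URL, data=data).prepare()
    return prepared.headers.get('Content-Type'), prepared.body


from_dict = describe({'k1': 'v1', 'k2': 'v2'})
from_str = describe('k1=v1&k2=v2')
from_bytes = describe(b'k1=v1&k2=v2')

for label, result in (('dict', from_dict), ('str', from_str), ('bytes', from_bytes)):
    print(label, result)
```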
- def param_json():
- # Serializes the value passed as json into a string with json.dumps(...),
- # then sends it in the request body with the header {'Content-Type': 'application/json'}
- requests.request(method='POST',
- url='http://127.0.0.1:8000/test/',
- json={'k1': 'v1', 'k2': '水电费'})
- def param_headers():
- # Send request headers to the server
- requests.request(method='POST',
- url='http://127.0.0.1:8000/test/',
- json={'k1': 'v1', 'k2': '水电费'},
- headers={'Content-Type': 'application/x-www-form-urlencoded'}
- )
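When `json=` and an explicit `Content-Type` header are combined as in `param_headers`, the explicit header wins while the body stays JSON. The sketch below checks this offline; it also shows that header lookup on a prepared request is case-insensitive. Whether a given server accepts a JSON body labeled as form data depends on that server.

```python
import json

import requests

prepared = requests.Request(
    'POST', 'http://127.0.0.1:8000/test/',  # never contacted
    json={'k1': 'v1'},
    headers={'Content-Type': 'application/x-www-form-urlencoded'},
).prepare()

# Lookup works with any capitalization of the header name.
print(prepared.headers['content-type'])
# The body is still the JSON document, despite the header above.
print(json.loads(prepared.body))
```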
- def param_cookies():
- # Send cookies to the server
- requests.request(method='POST',
- url='http://127.0.0.1:8000/test/',
- data={'k1': 'v1', 'k2': 'v2'},
- cookies={'cook1': 'value1'},
- )
- # A CookieJar also works (the dict form is a wrapper around it)
- from http.cookiejar import CookieJar
- from http.cookiejar import Cookie
- obj = CookieJar()
- obj.set_cookie(Cookie(version=0, name='c1', value='v1', port=None, domain='', path='/', secure=False, expires=None,
- discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False,
- port_specified=False, domain_specified=False, domain_initial_dot=False, path_specified=False)
- )
- requests.request(method='POST',
- url='http://127.0.0.1:8000/test/',
- data={'k1': 'v1', 'k2': 'v2'},
- cookies=obj)
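Both cookie forms end up as a single `Cookie` request header. As a shorter alternative to constructing `http.cookiejar.Cookie` by hand, the sketch below uses `RequestsCookieJar.set`, and prepares the requests offline to show the resulting header.

```python
import requests
from requests.cookies import RequestsCookieJar

URL = 'http://127.0.0.1:8000/test/'  # never contacted, requests are only prepared

# Dict form: Requests builds a cookie jar from the dict.
from_dict = requests.Request('POST', URL, cookies={'cook1': 'value1'}).prepare()

# Jar form: set() fills in the many Cookie constructor arguments with defaults.
jar = RequestsCookieJar()
jar.set('c1', 'v1', path='/')
from_jar = requests.Request('POST', URL, cookies=jar).prepare()

print(from_dict.headers['Cookie'])
print(from_jar.headers['Cookie'])
```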
- def param_files():
- # Send a file
- # file_dict = {
- # 'f1': open('readme', 'rb')
- # }
- # requests.request(method='POST',
- # url='http://127.0.0.1:8000/test/',
- # files=file_dict)
- # Send a file with a custom filename
- # file_dict = {
- # 'f1': ('test.txt', open('readme', 'rb'))
- # }
- # requests.request(method='POST',
- # url='http://127.0.0.1:8000/test/',
- # files=file_dict)
- # Send string content as a file with a custom filename
- # file_dict = {
- # 'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf")
- # }
- # requests.request(method='POST',
- # url='http://127.0.0.1:8000/test/',
- # files=file_dict)
- # Send string content with a custom filename, content type and extra headers
- # file_dict = {
- # 'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf", 'application/text', {'k1': '0'})
- # }
- # requests.request(method='POST',
- # url='http://127.0.0.1:8000/test/',
- # files=file_dict)
- pass
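What `files=` produces on the wire can be inspected without a server. The sketch below uses an in-memory `io.BytesIO` object in place of a real file (an assumption made so the example runs anywhere) and prints the multipart body Requests builds from the 3-tuple form.

```python
import io

import requests

# 3-tuple form: (filename, file-like object, content type)
file_dict = {'f1': ('test.txt', io.BytesIO(b'hello upload'), 'text/plain')}
prepared = requests.Request(
    'POST', 'http://127.0.0.1:8000/test/',  # never contacted
    files=file_dict,
).prepare()

content_type = prepared.headers['Content-Type']
print(content_type)                   # multipart/form-data with a generated boundary
print(prepared.body.decode('utf-8'))  # one part named f1, carrying the custom filename
```

With real files, opening them in a `with` block ensures they are closed once the request has been sent.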
- def param_auth():
- from requests.auth import HTTPBasicAuth, HTTPDigestAuth
- ret = requests.get('https://api.github.com/user', auth=HTTPBasicAuth('wupeiqi', 'sdfasdfasdf'))
- print(ret.text)
- # ret = requests.get('http://192.168.1.1',
- # auth=HTTPBasicAuth('admin', 'admin'))
- # ret.encoding = 'gbk'
- # print(ret.text)
- # ret = requests.get('http://httpbin.org/digest-auth/auth/user/pass', auth=HTTPDigestAuth('user', 'pass'))
- # print(ret)
- #
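`HTTPBasicAuth` only adds an `Authorization` header. The sketch below prepares a request offline with placeholder credentials (`user` / `pass`) and decodes that header, which makes it clear that Basic auth is base64 encoding rather than encryption and should be used over HTTPS.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

prepared = requests.Request(
    'GET', 'https://api.github.com/user',  # never contacted
    auth=HTTPBasicAuth('user', 'pass'),    # placeholder credentials
).prepare()

scheme, token = prepared.headers['Authorization'].split(' ', 1)
decoded = base64.b64decode(token).decode('latin1')
print(scheme, decoded)
```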
- def param_timeout():
- # ret = requests.get('http://google.com/', timeout=1)
- # print(ret)
- # ret = requests.get('http://google.com/', timeout=(5, 1))
- # print(ret)
- pass
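The tuple form separates the connect timeout from the read timeout, and each raises its own exception. The sketch below provokes a read timeout locally by pointing Requests at a socket that accepts connections but never answers; `fetch_with_timeout` is a hypothetical helper, and the behavior of the silent socket is assumed to match a typical Linux or macOS TCP stack.

```python
import socket

import requests


def fetch_with_timeout(url, connect_timeout, read_timeout):
    """Return the status code, or a short label describing which timeout fired."""
    session = requests.Session()
    session.trust_env = False  # ignore proxy settings from the environment
    try:
        return session.get(url, timeout=(connect_timeout, read_timeout)).status_code
    except requests.exceptions.ConnectTimeout:
        return 'connect timeout'
    except requests.exceptions.ReadTimeout:
        return 'read timeout'
    finally:
        session.close()


# A listening socket: the OS completes the TCP handshake, but nobody ever replies.
silent = socket.socket()
silent.bind(('127.0.0.1', 0))
silent.listen(1)
port = silent.getsockname()[1]

result = fetch_with_timeout(f'http://127.0.0.1:{port}/', 1.0, 0.3)
silent.close()
print(result)
```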
- def param_allow_redirects():
- ret = requests.get('http://127.0.0.1:8000/test/', allow_redirects=False)
- print(ret.text)
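The effect of `allow_redirects` is easiest to see against a server that actually redirects. The sketch below starts a throwaway local server (the `/old` and `/new` paths and the handler are made up for this illustration) and fetches the same URL with and without redirect following.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'final page'
            self.send_response(200)
            self.send_header('Content-Type', 'text/plain; charset=utf-8')
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(('127.0.0.1', 0), RedirectHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f'http://127.0.0.1:{server.server_port}'

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment
not_followed = session.get(base + '/old', allow_redirects=False)
followed = session.get(base + '/old')  # redirects are followed by default for GET
session.close()
server.shutdown()
server.server_close()

print(not_followed.status_code, not_followed.headers['Location'])
print(followed.status_code, followed.content, [r.status_code for r in followed.history])
```

With `allow_redirects=False` the 302 response itself is returned; otherwise the final response is returned and the intermediate hops are kept in `history`.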
- def param_proxies():
- # proxies = {
- # "http": "61.172.249.96:80",
- # "https": "http://61.185.219.126:3128",
- # }
- # proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}
- # ret = requests.get("http://www.proxy360.cn/Proxy", proxies=proxies)
- # print(ret.headers)
- # from requests.auth import HTTPProxyAuth
- #
- # proxyDict = {
- # 'http': '77.75.105.165',
- # 'https': '77.75.105.165'
- # }
- # auth = HTTPProxyAuth('username', 'mypassword')
- #
- # r = requests.get("http://www.google.com", proxies=proxyDict, auth=auth)
- # print(r.text)
- pass
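The `proxies` dictionary may be keyed by scheme or by scheme plus host, and the more specific key takes priority. The sketch below uses `select_proxy` from `requests.utils`, a helper Requests uses internally, so treat it as an implementation detail; all proxy addresses are placeholders and nothing is contacted.

```python
from requests.utils import select_proxy

proxies = {
    'http': 'http://10.10.1.10:3128',                # placeholder: default for http URLs
    'https': 'http://10.10.1.10:1080',               # placeholder: default for https URLs
    'http://10.20.1.128': 'http://10.10.1.10:5323',  # applies to this one host only
}

targets = ['http://10.20.1.128/page', 'http://example.com/', 'https://example.com/']
chosen = {url: select_proxy(url, proxies) for url in targets}
for url, proxy in chosen.items():
    print(url, '->', proxy)
```

For proxies that need credentials, the Requests documentation describes embedding them in the proxy URL, in the form `http://user:password@host:port`.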
- def param_stream():
- ret = requests.get('http://127.0.0.1:8000/test/', stream=True)
- print(ret.content)
- ret.close()
- # from contextlib import closing
- # with closing(requests.get('http://httpbin.org/get', stream=True)) as r:
- # # Process the response here.
- # for i in r.iter_content():
- # print(i)
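With `stream=True` the body is read only as it is consumed. Note that `iter_content()` without arguments uses a chunk size of 1 byte, so passing an explicit `chunk_size` is usually preferable. The sketch below streams 10,000 bytes from a throwaway local server (the handler and payload are made up for this illustration) and uses the response as a context manager, which assumes a reasonably recent Requests release.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAYLOAD = b'x' * 10000


class BlobHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'application/octet-stream')
        self.send_header('Content-Length', str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(('127.0.0.1', 0), BlobHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/'

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment
total = 0
# Leaving the with block releases the connection even if processing fails.
with session.get(url, stream=True) as response:
    for chunk in response.iter_content(chunk_size=1024):
        total += len(chunk)
session.close()
server.shutdown()
server.server_close()

print(total)
```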
- def requests_session():
- import requests
- session = requests.Session()
- ### 1. Visit any page first to obtain a cookie
- i1 = session.get(url="http://dig.chouti.com/help/service")
- ### 2. Log in, carrying the cookie from the previous request; the backend authorizes the gpsd value in that cookie
- i2 = session.post(
- url="http://dig.chouti.com/login",
- data={
- 'phone': "",
- 'password': "xxxxxx",
- 'oneMonth': ""
- }
- )
- i3 = session.post(
- url="http://dig.chouti.com/link/vote?linksId=8589623",
- )
- print(i3.text)
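The session example works because a `Session` stores cookies from earlier responses and sends them back on later requests. The sketch below reproduces that flow against a throwaway local server; the `/login` and `/vote` paths, the handler, and the cookie value `abc123` are made up for this illustration, and only the cookie name `gpsd` comes from the example above.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class CookieHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # /vote echoes whatever Cookie header the client sent.
        body = (self.headers.get('Cookie') or 'no cookie').encode('utf-8')
        self.send_response(200)
        if self.path == '/login':
            self.send_header('Set-Cookie', 'gpsd=abc123; Path=/')
            body = b'logged in'
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(('127.0.0.1', 0), CookieHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f'http://127.0.0.1:{server.server_port}'

# A session that logs in first keeps the cookie for the next request.
logged_in = requests.Session()
logged_in.trust_env = False  # ignore proxy settings from the environment
logged_in.get(base + '/login')
with_session = logged_in.get(base + '/vote').text
logged_in.close()

# A fresh session has no cookie to send.
fresh_session = requests.Session()
fresh_session.trust_env = False
fresh = fresh_session.get(base + '/vote').text
fresh_session.close()

server.shutdown()
server.server_close()
print(with_session, '|', fresh)
```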