01: The requests module

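1.1 Introduction to the requests module

  1. About the requests module

    1. The Python standard library provides modules such as urllib, urllib2, and httplib for making HTTP requests (these are the Python 2 names; in Python 3 they are urllib.request and http.client), but their APIs are verbose and awkward to use.
    2. Requests is an HTTP library written in Python and released under the Apache2 License. It is a high-level wrapper over the lower-level HTTP modules.
    3. This makes network requests far more pleasant to write, and with Requests you can reproduce nearly anything a browser does over HTTP.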

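  2. Request headers and request body

    1. An HTTP request consists of a header section and a body. The two are separated by "\r\n\r\n", and individual header lines are separated from each other by a single "\r\n". A GET request normally has only headers and no body, while a POST request has both.
    2. When a request involves cookies, they are sent in the request headers (the Cookie header). When the server writes a cookie back, it does so in the response headers, using the Set-Cookie header.
    3. If the server redirects the request:
       - the response headers contain a Location header giving the target URL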
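
The layout described above can be sketched with plain strings, with no library involved. The host name, path, and form fields here are made up for illustration; only the "\r\n" and "\r\n\r\n" separators come from the text.

```python
# Build the raw text of a POST request by hand to show where the separators go.
body = "user=alex&pwd=123"  # hypothetical form data

header_lines = [
    "POST /login HTTP/1.1",            # request line (hypothetical path)
    "Host: example.com",               # hypothetical host
    "Cookie: c1=v1",                   # cookies travel in the request headers
    "Content-Type: application/x-www-form-urlencoded",
    f"Content-Length: {len(body)}",
]

# Header lines are joined with "\r\n"; a blank line ("\r\n\r\n") precedes the body.
raw_request = "\r\n".join(header_lines) + "\r\n\r\n" + body

# Splitting on the first blank line recovers the two parts.
head, _, parsed_body = raw_request.partition("\r\n\r\n")
request_line, *headers = head.split("\r\n")

print(request_line)
print(headers)
print(parsed_body)
```

A GET request would look the same up to the blank line, with nothing after it.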

  3. Common requests usage

    1. pip install requests
    2. response = requests.get('http://www.baidu.com/')  # fetch the page at the given URL
    3. response.text  # the body decoded to str
    4. response.content  # the body as raw bytes
    5. response.encoding = 'utf-8'  # decode response.text as UTF-8
       response.encoding = response.apparent_encoding  # use the encoding detected from the page content
    6. response.cookies  # the cookies set by the response
       response.cookies.get_dict()  # the cookies as a plain dict
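
To see why setting response.encoding matters, here is a small sketch that runs against a throwaway local server instead of a real site. The handler class, the page text, and the use of `trust_env = False` (to bypass any proxy settings in the environment) are assumptions made for the demo. The server sends UTF-8 bytes but does not declare a charset, which is the situation where response.text tends to come out garbled.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAGE = "你好，世界"  # sample page text, sent as UTF-8 bytes


class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGE.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html")  # no charset declared
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), PageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo
response = session.get(f"http://127.0.0.1:{server.server_port}/")

default_encoding = response.encoding  # what requests assumed from the headers
raw_bytes = response.content          # bytes, unaffected by the encoding setting

response.encoding = "utf-8"           # tell requests how to decode the bytes
decoded_text = response.text

print(default_encoding)
print(decoded_text)
server.shutdown()
```

response.content never changes; only the way response.text decodes it does.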

1.2 Sending GET requests with requests

  1. Common parameters for GET requests

# Example: GET request without parameters
import requests

ret = requests.get('https://github.com/timeline.json')
print(ret.url)   # the URL that was requested
print(ret.text)  # the response body as text

# Example: GET request with parameters
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.get("http://httpbin.org/get", params=payload)
print(ret.url)   # the requested URL: http://httpbin.org/get?key1=value1&key2=value2
print(ret.text)  # the response body as text
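
If you want to check how params ends up in the URL without touching the network, one option is to prepare the request instead of sending it. This is only a sketch; it uses requests.Request(...).prepare(), which builds the request object but does not send it.

```python
import requests

# Prepare the request without sending it, so no network access is needed.
prepared = requests.Request(
    "GET",
    "http://httpbin.org/get",
    params={"key1": "value1", "key2": "value2"},
).prepare()

print(prepared.method)  # GET
print(prepared.url)     # http://httpbin.org/get?key1=value1&key2=value2
print(prepared.body)    # None: the parameters went into the URL, not a body
```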

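# Example: common parameters for a GET request
import requests

ret = requests.get(
    url='http://www.baidu.com',
    params={'k1': 123, 'k2': 456},   # sent as http://www.baidu.com/?k1=123&k2=456
    cookies={'c1': '', 'c2': ''},    # requests puts these into the Cookie request header
    headers={                        # anti-crawler checks usually look at these three headers
        'User-Agent': '',            # set this to a browser's User-Agent string
        # Some sites check the Referer, i.e. whether the previous page
        # visited was on this site, before they allow a login.
        'Referer': 'http://dig.chouti.com/',
        'X-Requested-With': 'XMLHttpRequest',  # marker commonly sent with AJAX requests
    }
)
print(ret.text)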
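
The claim that cookies are placed into the request headers can be checked the same way, by preparing a request and printing its headers. The cookie value and the User-Agent string below are placeholders chosen for the demo.

```python
import requests

prepared = requests.Request(
    "GET",
    "http://www.baidu.com",
    cookies={"c1": "v1"},  # placeholder cookie
    headers={
        "Referer": "http://dig.chouti.com/",
        "User-Agent": "demo-agent/1.0",  # placeholder User-Agent
    },
).prepare()

# The cookies argument has been turned into a Cookie header.
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```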

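  2. Other Requests parameters

    1. auth: sends a username and password using HTTP authentication. With HTTPBasicAuth the credentials are Base64-encoded, which is an encoding and not encryption.
       Note: some sites have no HTML login page and show a browser pop-up dialog instead; this parameter handles that kind of login.

import requests
from requests.auth import HTTPBasicAuth, HTTPDigestAuth

ret = requests.get('https://api.github.com/user', auth=HTTPBasicAuth('wupeiqi', 'abc'))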
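
A quick way to see what HTTPBasicAuth actually sends is to prepare a request and decode the resulting Authorization header. The username and password here are dummy values; nothing is sent over the network.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

prepared = requests.Request(
    "GET",
    "https://api.github.com/user",
    auth=HTTPBasicAuth("user", "pass"),  # dummy credentials
).prepare()

header = prepared.headers["Authorization"]
print(header)

# The part after "Basic " is plain Base64 and can be decoded by anyone.
scheme, encoded = header.split(" ", 1)
decoded = base64.b64decode(encoded).decode()
print(decoded)
```

Because the credentials are trivially reversible, basic auth is normally only appropriate over HTTPS.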

    2. timeout: how many seconds to wait for the server before giving up

import requests

ret = requests.get('http://google.com/', timeout=1)
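
When the timeout is exceeded, requests raises an exception instead of returning a response, so in practice the call is usually wrapped in try/except. Here is a sketch against a deliberately slow local server; the handler class, the one-second delay, and `trust_env = False` are assumptions made for the demo.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(1)  # respond more slowly than the client is willing to wait
        try:
            self.send_response(200)
            self.send_header("Content-Length", "2")
            self.end_headers()
            self.wfile.write(b"ok")
        except OSError:
            pass  # the client has already given up and closed the connection

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo

try:
    session.get(f"http://127.0.0.1:{server.server_port}/", timeout=0.2)
    timed_out = False
except requests.exceptions.Timeout:
    timed_out = True

print("timed out:", timed_out)
server.shutdown()
```

timeout also accepts a (connect, read) tuple if the two phases need different limits.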

    3. allow_redirects: whether redirects are followed automatically

import requests

ret = requests.get('http://127.0.0.1:8000/test/', allow_redirects=False)
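
The difference between the two settings is easiest to see side by side. The sketch below uses a small local server that redirects /old to /new; the paths, handler class, and `trust_env = False` are assumptions made for the demo.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")  # where the client should go next
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"new page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo

# Redirects disabled: we get the 302 itself and can read its Location header.
not_followed = session.get(base + "/old", allow_redirects=False)
print(not_followed.status_code, not_followed.headers["Location"])

# Redirects enabled (the default for GET): we land on the final page.
followed = session.get(base + "/old")
history_codes = [r.status_code for r in followed.history]
print(followed.status_code, followed.url, history_codes)

server.shutdown()
```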

    4. proxies: send the request through a proxy (a proxy server is required)

import requests
from requests.auth import HTTPProxyAuth

# Keys may be a scheme ('http') or a scheme plus host ('http://10.20.1.128').
proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}
ret = requests.get("http://www.proxy360.cn/Proxy", proxies=proxies)
print(ret.headers)

# A proxy that requires a username and password
proxyDict = {'http': '77.75.105.165', 'https': '77.75.105.165'}
auth = HTTPProxyAuth('username', 'mypassword')
r = requests.get("http://www.google.com", proxies=proxyDict, auth=auth)
print(r.text)

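    5. stream: download a little at a time and save as you go (for example, when memory is too small to hold a large file)

import requests

ret = requests.get('http://127.0.0.1:8000/test/', stream=True)
# With stream=True the body is not downloaded until it is accessed.
# Note that ret.content reads the whole body into memory at once;
# use ret.iter_content() to read it in chunks.
print(ret.content)
ret.close()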
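
To actually download a little at a time, the body is usually read with iter_content() and written to disk chunk by chunk. The following is a sketch against a local server that serves 100,000 bytes; the payload, chunk size, file name, and `trust_env = False` are all assumptions made for the demo.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAYLOAD = b"x" * 100_000  # stand-in for a large file


class FileHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), FileHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo

chunk_count = 0
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "download.bin")
    # The with-block closes the response, so no explicit close() is needed.
    with session.get(f"http://127.0.0.1:{server.server_port}/", stream=True) as response:
        with open(path, "wb") as fh:
            for chunk in response.iter_content(chunk_size=8192):
                fh.write(chunk)  # only one chunk is held in memory at a time
                chunk_count += 1
    saved_size = os.path.getsize(path)

print(saved_size, chunk_count)
server.shutdown()
```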

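    6. verify: skip certificate verification when accessing an HTTPS page

import requests

requests.get(
    url='https:xxxx',
    verify=False,                          # do not verify the server's certificate
    # cert='client.pem',                   # a client certificate you created yourself (single file with key and certificate)
    # cert=('client.crt', 'client.key'),   # a client certificate issued by a trusted third-party CA (certificate file, key file)
)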

1.3 Sending POST requests with requests

  1. Basic POST example

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.post("http://httpbin.org/post", data=payload)
print(ret.text)

  2. Example: sending request headers and data

import json
import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}

# The payload is serialized to a JSON string and sent as the request body.
ret = requests.post(url, data=json.dumps(payload), headers=headers)
print(ret.text)
print(ret.cookies)
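
It can help to compare what ends up in the request body for the different ways of passing a payload. The sketch below prepares three requests without sending them: a dict passed as data, the same dict passed as json, and a hand-serialized JSON string with an explicit Content-Type, as in the example above.

```python
import json

import requests

url = "http://httpbin.org/post"
payload = {"some": "data"}

# 1. data=dict: form-encoded body
form = requests.Request("POST", url, data=payload).prepare()

# 2. json=dict: requests serializes the dict and sets the Content-Type itself
as_json = requests.Request("POST", url, json=payload).prepare()

# 3. data=str plus an explicit header: the string is sent unchanged
manual = requests.Request(
    "POST",
    url,
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
).prepare()

for prepared in (form, as_json, manual):
    print(prepared.headers["Content-Type"], prepared.body)
```

The second and third forms carry the same JSON document; json= just saves the manual dumps() call and header.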

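  3. Common parameters for POST requests

import requests

requests.request(
    method='POST',                        # HTTP method
    url='http://www.oldboyedu.com',       # target URL
    data={'user': 'alex', 'pwd': ''},     # form data sent in the request body
    # json={'user': 'alex', 'pwd': '123', 'extra': {'k1': 'v1', 'k2': 'v2'}},
    # Both json and data send data in the request body. json serializes the dict
    # as JSON, so it can contain nested dicts ('extra' is an illustrative key).
    cookies={'cook1': 'value1'},          # cookies to send to the server
    headers={
        # Some sites check the Referer, i.e. whether the previous page
        # visited was on this site, before they allow a login.
        'Referer': 'http://dig.chouti.com/',
        'User-Agent': 'Mozilla/5.0Safari/537.36',  # pose as a browser client (here, Chrome)
    },
)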

  4. Uploading files with requests.post

import requests

requests.post(
    url='xxx',
    files={
        'f1': open('s1.py', 'rb'),                 # uploads s1.py to the URL above
        # Tuple form: the first element is the file name reported to the server.
        'f2': ('ssss1.py', open('s1.py', 'rb')),
    }
)
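
To see what a file upload looks like on the wire, the request can be prepared and its body printed. In this sketch an in-memory io.BytesIO object stands in for open('s1.py', 'rb'), and the file content is made up; nothing is sent.

```python
import io

import requests

# In-memory stand-in for open('s1.py', 'rb'); the content is made up.
fake_file = io.BytesIO(b"print('hello')\n")

prepared = requests.Request(
    "POST",
    "http://httpbin.org/post",
    files={"f2": ("ssss1.py", fake_file)},  # (file name sent to the server, file object)
).prepare()

content_type = prepared.headers["Content-Type"]
print(content_type)             # multipart/form-data with a generated boundary
print(prepared.body.decode())   # field name, file name, and file content
```

The file name in the tuple is what appears in the multipart body, regardless of what the local file is called.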

1.4 Parameters of requests.request()

  1. About requests.request()

    1. requests.post() and requests.get(), used above, simply call requests.request() with a different method argument.

  2. Commonly used parameters of requests.request() (cookies are sent as part of the headers)

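import requests

requests.request(
    method='POST',                        # HTTP method
    url='http://www.oldboyedu.com',       # target URL
    params={'k1': 'v1', 'k2': 'v2'},      # data passed in the URL query string, as with GET
    data={'user': 'alex', 'pwd': ''},     # form data sent in the request body, as with POST
    # json={'user': 'alex', 'pwd': '123', 'extra': {'k1': 'v1', 'k2': 'v2'}},
    # Both json and data send data in the request body. json serializes the dict
    # as JSON, so it can contain nested dicts ('extra' is an illustrative key).
    cookies={'cook1': 'value1'},          # cookies to send to the server
    headers={
        # Some sites check the Referer, i.e. whether the previous page
        # visited was on this site, before they allow a login.
        'Referer': 'http://dig.chouti.com/',
        'User-Agent': 'Mozilla/5.0Safari/537.36',  # pose as a browser client (here, Chrome)
    },
)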

  3. requests.Session() stores cookies and sends them back automatically, which keeps you logged in

import requests

session = requests.Session()

# 1. Visit any page first to obtain a cookie.
i1 = session.get(url="http://dig.chouti.com/help/service")

# 2. Log in. The session sends the cookie from step 1, and the server
#    authorizes the gpsd value in that cookie.
i2 = session.post(
    url="http://dig.chouti.com/login",
    data={
        'phone': "",
        'password': "7481079xl",
        'oneMonth': ""
    })

# 3. URL for upvoting a news item. With the authorized cookie,
#    this request acts as the logged-in user.
i3 = session.post(url="http://dig.chouti.com/link/vote?linksId=15055231")
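
The cookie handling that Session does can be watched in isolation with a tiny local server. In this sketch the server sets a cookie on /first and echoes back whatever Cookie header it receives; the paths, the cookie value, the handler class, and `trust_env = False` are assumptions made for the demo.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class CookieHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Echo the Cookie header the client sent (empty if there was none).
        body = (self.headers.get("Cookie") or "").encode()
        self.send_response(200)
        if self.path == "/first":
            self.send_header("Set-Cookie", "gpsd=abc123; Path=/")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), CookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo

first = session.get(base + "/first")    # server sets the cookie here
second = session.get(base + "/second")  # session sends it back automatically

stored = session.cookies.get_dict()
print(first.content)   # nothing was sent on the first request
print(second.content)  # the cookie came back on the second request
print(stored)

server.shutdown()
```

With plain requests.get() calls instead of a Session, the second request would not carry the cookie unless it were passed explicitly.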

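  4. Other request methods

    requests.get('https://github.com/timeline.json')  # GET: retrieve a resource from the server
    requests.post("http://httpbin.org/post")          # POST: create a new resource on the server
    requests.put("http://httpbin.org/put")            # PUT: update a resource (the client supplies the complete updated resource)
    requests.delete("http://httpbin.org/delete")      # DELETE: remove a resource from the server
    requests.head("http://httpbin.org/get")           # HEAD: like GET, but the response contains headers only
    requests.options("http://httpbin.org/get")        # OPTIONS: ask which methods the resource supports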