Python Crawler Notes (4): Basic Usage of the Requests Library
Official documentation: http://docs.python-requests.org/en/master
Installation
Install from the command line with pip3 install requests. See https://www.cnblogs.com/cthon/p/9388304.html for details.
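As a quick sanity check after installing, you can print the installed version (a minimal sketch; the exact version string will differ on your machine):
import requests

print(requests.__version__)  # e.g. '2.19.1', whatever version pip installed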
1. What is Requests?
Requests is a simple, elegant HTTP library for Python built on top of urllib3; it handles common crawler tasks (GET/POST requests, cookies, sessions, file uploads) with far less boilerplate than the standard-library urllib.
An introductory example
import requests

response = requests.get('https://www.baidu.com')
print(type(response))
print(response.status_code)
print(type(response.text))
print(response.text)
print(response.cookies)
Various request methods
import requests
requests.post('http://httpbin.org/post')
requests.put('http://httpbin.org/put')
requests.delete('http://httpbin.org/delete')
requests.get('http://httpbin.org/get')
requests.options('http://httpbin.org/get')
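All of these helper functions are thin wrappers around requests.request, so the following sketch is equivalent to the requests.get call above:
import requests

# every helper (get, post, put, ...) delegates to requests.request(method, url, **kwargs)
response = requests.request('GET', 'http://httpbin.org/get')
print(response.status_code)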
Sending requests
Basic GET requests
Basic usage
import requests

response = requests.get('http://httpbin.org/get')
print(response.text)
GET request with parameters
import requests
response = requests.get('http://httpbin.org/get?name=jack&age=22')
print(response.text)
import requests

data = {
    'name': 'jack',
    'age': 22
}
response = requests.get('http://httpbin.org/get',params=data)
print(response.text)
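To confirm that the params dict is encoded into the query string, you can inspect response.url (an illustrative check, not required for the request to work):
import requests

response = requests.get('http://httpbin.org/get', params={'name': 'jack', 'age': 22})
print(response.url)  # http://httpbin.org/get?name=jack&age=22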
Parsing JSON
import requests
import json

# https://github.com/get does not return JSON; use an httpbin endpoint so .json() works
response = requests.get('http://httpbin.org/get')
print(type(response.text))
print(response.json())
print(json.loads(response.text))
print(type(response.json()))
Fetching binary data
import requests

response = requests.get('https://github.com/favicon.ico')
print(type(response.text),type(response.content))
print(response.text)
print(response.content)
import requests

# saves the raw response body to a file; the with block closes the file automatically
response = requests.get('https://www.bilibili.com/video/av24028845/?p=9')
with open('q.avi', 'wb') as f:
    f.write(response.content)
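For large files it is usually better not to keep the whole body in memory; a minimal sketch using Requests' stream=True and iter_content (chunk size chosen arbitrarily here):
import requests

response = requests.get('https://github.com/favicon.ico', stream=True)
with open('favicon.ico', 'wb') as f:
    # write the body chunk by chunk instead of loading it all at once
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)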
Adding headers
import requests

# without a User-Agent header, Zhihu rejects this request, which is why headers are added below
response = requests.get('https://zhihu.com/explore')
print(response.text)
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36'}
response = requests.get('https://www.zhihu.com/explore',headers=headers)
print(response.text)
Basic POST requests
import requests

data = {'name': 'jack', 'age': '22'}
response = requests.post('https://httpbin.org/post',data=data)
print(response.text)
print(response.json())
import requests

data = {'name': 'jack', 'age': '22'}
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36'}
response = requests.post('https://httpbin.org/post',data=data,headers=headers)
print(response.text)
print(response.json())
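Besides form data, Requests can send a JSON body through the json parameter; a small sketch against the same httpbin endpoint:
import requests

# json= serializes the dict and sets the Content-Type header to application/json
response = requests.post('https://httpbin.org/post', json={'name': 'jack', 'age': 22})
print(response.json()['json'])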
Responses
Response attributes
import requests

response = requests.get('http://www.jianshu.com')
print(type(response.status_code),response.status_code)
print(type(response.headers),response.headers)
print(type(response.cookies),response.cookies)
print(type(response.url),response.url)
print(type(response.history),response.history)
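Two further attributes worth knowing are encoding and content; this is just an illustrative addition for the same page:
import requests

response = requests.get('http://www.jianshu.com')
print(response.encoding)       # encoding guessed from the response headers
print(type(response.content))  # raw bytes, as opposed to the decoded response.text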
Checking the status code
import requests

response = requests.get('http://www.cnblogs.com/cthon/p/9383778.html')
exit() if not response.status_code == requests.codes.not_found else print('404 Not Found')

import requests

response = requests.get('http://www.cnblogs.com/cthon/p/9383778.html')
exit() if not response.status_code == 200 else print('Request Successfully')
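Requests can also raise the error for you: raise_for_status() throws an HTTPError for any 4xx/5xx response, which is often simpler than comparing codes by hand (a hedged sketch, same URL as above):
import requests
from requests.exceptions import HTTPError

response = requests.get('http://www.cnblogs.com/cthon/p/9383778.html')
try:
    response.raise_for_status()  # raises HTTPError on 4xx/5xx status codes
    print('Request Successfully')
except HTTPError as e:
    print('Bad status:', e)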
Status codes
These name aliases come from requests.status_codes and can be used through requests.codes.<name>:
# Informational.
100: ('continue',),
101: ('switching_protocols',),
102: ('processing',),
103: ('checkpoint',),
122: ('uri_too_long', 'request_uri_too_long'),
200: ('ok', 'okay', 'all_ok', 'all_okay', 'all_good', '\\o/', '✓'),
201: ('created',),
202: ('accepted',),
203: ('non_authoritative_info', 'non_authoritative_information'),
204: ('no_content',),
205: ('reset_content', 'reset'),
206: ('partial_content', 'partial'),
207: ('multi_status', 'multiple_status', 'multi_stati', 'multiple_stati'),
208: ('already_reported',),
226: ('im_used',),
# Redirection.
300: ('multiple_choices',),
301: ('moved_permanently', 'moved', '\\o-'),
302: ('found',),
303: ('see_other', 'other'),
304: ('not_modified',),
305: ('use_proxy',),
306: ('switch_proxy',),
307: ('temporary_redirect', 'temporary_moved', 'temporary'),
308: ('permanent_redirect', 'resume_incomplete', 'resume'),  # These 2 to be removed in 3.0
# Client Error.
400: ('bad_request', 'bad'),
401: ('unauthorized',),
402: ('payment_required', 'payment'),
403: ('forbidden',),
404: ('not_found', '-o-'),
405: ('method_not_allowed', 'not_allowed'),
406: ('not_acceptable',),
407: ('proxy_authentication_required', 'proxy_auth', 'proxy_authentication'),
408: ('request_timeout', 'timeout'),
409: ('conflict',),
410: ('gone',),
411: ('length_required',),
412: ('precondition_failed', 'precondition'),
413: ('request_entity_too_large',),
414: ('request_uri_too_large',),
415: ('unsupported_media_type', 'unsupported_media', 'media_type'),
416: ('requested_range_not_satisfiable', 'requested_range', 'range_not_satisfiable'),
417: ('expectation_failed',),
418: ('im_a_teapot', 'teapot', 'i_am_a_teapot'),
421: ('misdirected_request',),
422: ('unprocessable_entity', 'unprocessable'),
423: ('locked',),
424: ('failed_dependency', 'dependency'),
425: ('unordered_collection', 'unordered'),
426: ('upgrade_required', 'upgrade'),
428: ('precondition_required', 'precondition'),
429: ('too_many_requests', 'too_many'),
431: ('header_fields_too_large', 'fields_too_large'),
444: ('no_response', 'none'),
449: ('retry_with', 'retry'),
450: ('blocked_by_windows_parental_controls', 'parental_controls'),
451: ('unavailable_for_legal_reasons', 'legal_reasons'),
499: ('client_closed_request',),
# Server Error.
500: ('internal_server_error', 'server_error', '/o\\', '✗'),
501: ('not_implemented',),
502: ('bad_gateway',),
503: ('service_unavailable', 'unavailable'),
504: ('gateway_timeout',),
505: ('http_version_not_supported', 'http_version'),
506: ('variant_also_negotiates',),
507: ('insufficient_storage',),
509: ('bandwidth_limit_exceeded', 'bandwidth'),
510: ('not_extended',),
511: ('network_authentication_required', 'network_auth', 'network_authentication'),
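These aliases are exposed through requests.codes, so you can compare against names instead of bare numbers:
import requests

print(requests.codes.ok)         # 200
print(requests.codes.not_found)  # 404
response = requests.get('http://httpbin.org/get')
print(response.status_code == requests.codes.ok)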
Advanced operation: file upload
import requests

files = {'file': open('favicon.ico', 'rb')}
response = requests.post('http://httpbin.org/post',files=files)
print(response.text)
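If you need to control the uploaded filename or content type, the value in files can also be a tuple; the names below are purely illustrative:
import requests

# (filename, file object, content type) overrides what the server sees for the upload
files = {'file': ('favicon.ico', open('favicon.ico', 'rb'), 'image/x-icon')}
response = requests.post('http://httpbin.org/post', files=files)
print(response.status_code)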
Getting cookies
import requests

response = requests.get('http://www.baidu.com')
print(response.cookies)
for key, value in response.cookies.items():
    print(key + '=' + value)
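Cookies can also be attached to a single request explicitly via the cookies parameter; the key and value here are just examples:
import requests

# httpbin echoes back whatever cookies it received
response = requests.get('http://httpbin.org/cookies', cookies={'number': '123456789'})
print(response.text)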
Session persistence
import requests

# two independent requests.get calls do not share cookies, so the second request below sees none
requests.get('http://httpbin.org/cookies/set/number/123456789')
response=requests.get('http://httpbin.org/cookies')
print(response.text)
import requests

s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
response=s.get('http://httpbin.org/cookies')
print(response.text)
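A Session can also carry default settings (headers, for instance) that are sent with every request made through it; a brief sketch with a made-up User-Agent:
import requests

s = requests.Session()
s.headers.update({'User-Agent': 'my-crawler/0.1'})  # sent on every request from this session
response = s.get('http://httpbin.org/headers')
print(response.text)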
Certificate verification
# 12306 uses an invalid certificate, so this request fails
import requests

response = requests.get('https://www.12306.cn/')
print(response.status_code)
import requests
from requests.packages import urllib3
urllib3.disable_warnings()
response = requests.get('https://www.12306.cn',verify = False)
print(response.status_code)
import requests

# client certificate: pass the certificate and key paths via the cert parameter
response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
print(response.status_code)
Proxy settings
HTTP proxy
import requests

proxies = {
    'http': 'http://127.0.0.1:9743',
    'https': 'https://127.0.0.1:9743'
}
response = requests.get('https://www.taobao.com',proxies=proxies)
print(response.status_code)
import requests

# a proxy that requires authentication: put the credentials in the proxy URL
proxies = {
    'http': 'http://user:password@127.0.0.1:9743'
}
response = requests.get('https://www.taobao.com',proxies=proxies)
print(response.status_code)
SOCKS proxy (requires the SOCKS extra: pip3 install "requests[socks]")
import requests

proxies = {
    'http': 'socks5://127.0.0.1:9742',
    'https': 'socks5://127.0.0.1:9742'
}
response = requests.get('https://www.taobao.com',proxies=proxies)
print(response.status_code)
Timeout settings
import requests
from requests.exceptions import ReadTimeout
try:
    response = requests.get('http://www.baidu.com', timeout=0.01)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
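timeout can also be a (connect, read) tuple when you want separate limits for establishing the connection and for reading the response; the values below are illustrative:
import requests
from requests.exceptions import ConnectTimeout, ReadTimeout

try:
    # 3 seconds to connect, 7 seconds to read
    response = requests.get('http://httpbin.org/get', timeout=(3, 7))
    print(response.status_code)
except (ConnectTimeout, ReadTimeout):
    print('Timeout')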
Authentication
import requests
from requests.auth import HTTPBasicAuth

r = requests.get('http://120.27.34.24:9001', auth=HTTPBasicAuth('user', '123'))
print(r.status_code)
import requests

r = requests.get('http://120.27.34.24:9001', auth=('user', '123'))
print(r.status_code)
Exception handling
import requests
from requests.exceptions import ReadTimeout, HTTPError, ConnectionError, RequestException

try:
    response = requests.get('http://www.baidu.com', timeout=0.1)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except HTTPError:
    print('Http error')
except ConnectionError:
    print('Connection Error')
except RequestException:
    print('Error')