Using requests for Python web scraping

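1 Sending a GET request to fetch a page

 import requests

 # 1 URL of the page to scrape
 url = 'http://www.baidu.com'

 # 2 Send a GET request and receive the response
 response = requests.get(url=url)

 # 3 Get the response body as text (two ways)

 # Option 1: response.content is bytes; decode() converts it to str (UTF-8 by default)
 html1 = response.content.decode()
 print(html1)

 # Option 2: response.text decodes automatically with a guessed encoding.
 # The guess is sometimes wrong and the text comes out garbled,
 # so set response.encoding before reading response.text.
 response.encoding = 'utf8'
 html2 = response.text
 print(html2)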
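
Both approaches come down to which encoding is used to turn bytes into text. A minimal sketch using only the standard library, with a made-up byte string standing in for response.content (the sample text and variable names are illustrative, not from the original):

```python
# Stand-in for response.content: the bytes a server might send for a UTF-8 page.
raw = '百度一下'.encode('utf-8')

# Decoding with the right encoding recovers the text
# (bytes.decode() uses UTF-8 when no encoding is given).
good = raw.decode()

# Decoding with the wrong encoding produces garbled text, which is what
# a wrong guess for response.encoding looks like.
garbled = raw.decode('latin-1')

print(good)
print(garbled == good)
```

If response.text looks garbled, the encoding that was guessed most likely differs from the one the page was actually written in.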

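2 Sending a POST request to fetch a page

 import requests

 # 1 URL of the page to scrape
 url = 'http://www.baidu.com'

 # 2 Send a POST request and receive the response
 response = requests.post(url=url)

 # 3 Get the response body as text (two ways)

 # Option 1: response.content is bytes; decode() converts it to str (UTF-8 by default)
 html1 = response.content.decode()
 print(html1)

 # Option 2: response.text decodes automatically with a guessed encoding.
 # The guess is sometimes wrong and the text comes out garbled,
 # so set response.encoding before reading response.text.
 response.encoding = 'utf8'
 html2 = response.text
 print(html2)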

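3 Posing as a browser by sending request headers

 import requests

 # Customize the request headers: adding a User-Agent makes the request look like it comes from a browser
 headers = {
     'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
     # Cookies can be passed separately via cookies=, or placed in the headers like this
     #'Cookie':'put the cookie string here'
 }

 # 1 URL of the page to scrape
 url = 'http://www.baidu.com'

 # 2 Send a GET request with the custom headers and receive the response
 response = requests.get(url=url, headers=headers)

 # 3 Get the response body as text
 html = response.content.decode()  # response.content is bytes; decode() converts it to str (UTF-8 by default)
 print(html)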
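
To see why the header matters, it helps to compare the default User-Agent with the one set above. The sketch below builds a request without sending it, so it needs no network access; using `Request(...).prepare()` for inspection is my own choice here, not something the original does.

```python
import requests

# The User-Agent that requests sends when none is supplied.
default_ua = requests.utils.default_user_agent()
print(default_ua)  # identifies the client as python-requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
}

# Build the request without sending it, to inspect the headers that would go out.
prepared = requests.Request('GET', 'http://www.baidu.com', headers=headers).prepare()
print(prepared.headers['User-Agent'])
```

A site that filters on User-Agent can easily reject the default value, which is why the browser-like string is substituted.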

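4 Sending data with a request (for example, to log in)

 import requests

 # To simulate a login, pass the form fields as a dict
 data = {
     # put the required key: value pairs here
     #'key':'value',
 }

 # 1 URL of the page to scrape
 url = 'http://www.baidu.com'

 # 2 Send the request and receive the response (use one of the two calls below)

 # GET takes params=, which appends key=value&key=value to the URL
 response = requests.get(url=url, params=data)

 # POST takes data=, which sends the values in the request body
 response = requests.post(url=url, data=data)

 # 3 Get the response body as text
 html = response.content.decode()  # response.content is bytes; decode() converts it to str (UTF-8 by default)
 print(html)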
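
The difference between params and data can be checked without sending anything. This is a small sketch assuming two hypothetical form fields (`wd` and `pn` are example names, not from the original); it prepares both requests and prints where the values end up.

```python
import requests

url = 'http://www.baidu.com'
data = {'wd': 'python', 'pn': '10'}  # hypothetical example parameters

# Prepare the requests without sending them so the result can be inspected.
get_req = requests.Request('GET', url, params=data).prepare()
post_req = requests.Request('POST', url, data=data).prepare()

print(get_req.url)                       # the values are in the query string
print(post_req.url)                      # no query string here
print(post_req.body)                     # the values are in the form-encoded body
print(post_req.headers['Content-Type'])  # set automatically for form data
```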

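5 Using a proxy

 import requests

 # List the proxy servers here: the key is the protocol, the value is the proxy's address and port.
 # requests picks the proxy that matches the scheme (http or https) of the URL being requested.
 # Note: many proxies expect an http:// address even under the 'https' key.
 proxies = {
     'http':'http://127.0.0.1:80',
     'https':'https://127.0.0.1:80'
 }

 # 1 URL of the page to scrape
 url = 'http://www.baidu.com'

 # 2 Send a GET request through the proxy and receive the response
 response = requests.get(url=url, proxies=proxies)

 # 3 Get the response body as text
 html = response.content.decode()  # response.content is bytes; decode() converts it to str (UTF-8 by default)
 print(html)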
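
Which entry of the dict applies to a given URL can be checked offline. The sketch below uses `requests.utils.select_proxy`, a helper in the requests package that I am calling here purely for inspection; ordinary scraping code does not need to call it.

```python
import requests

proxies = {
    'http': 'http://127.0.0.1:80',
    'https': 'https://127.0.0.1:80',
}

# select_proxy returns the proxy entry that matches the URL's scheme.
http_proxy = requests.utils.select_proxy('http://www.baidu.com', proxies)
https_proxy = requests.utils.select_proxy('https://www.baidu.com', proxies)

print(http_proxy)
print(https_proxy)
```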

6 Sending cookies

 import requests

 # Cookies can be passed as a dict via cookies=
 cookies = {
     #'key':'value',
 }

 # Alternatively, put the cookie string in the request headers
 headers = {
     'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
     #'Cookie':'put the cookie string here'
 }

 # 1 URL of the page to scrape
 url = 'http://www.baidu.com'

 # 2 Send a GET request with the cookies and receive the response
 response = requests.get(url=url, cookies=cookies)
 #response = requests.get(url=url, headers=headers)

 # 3 Get the response body as text
 html = response.content.decode()  # response.content is bytes; decode() converts it to str (UTF-8 by default)
 print(html)
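
A cookie copied from the browser usually arrives as one `k1=v1; k2=v2` string, while cookies= wants a dict. One possible way to convert it is sketched below with the standard library; the helper name `cookie_header_to_dict` and the sample cookie values are made up for illustration, and unusual cookie values may need a more forgiving parser.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(raw_cookie):
    """Parse a 'k1=v1; k2=v2' Cookie header string into a plain dict."""
    jar = SimpleCookie()
    jar.load(raw_cookie)
    return {name: morsel.value for name, morsel in jar.items()}


cookies = cookie_header_to_dict('sessionid=abc123; theme=dark')
print(cookies)
```

The resulting dict can be passed as the cookies= argument shown above.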

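7 Keeping a session: cookies set by responses are stored and sent on later requests

 import requests

 # Create a Session object to send requests; it is used the same way as the requests module
 session = requests.Session()

 url = 'http://www.baidu.com'

 # Send the request through the session
 response = session.get(url=url)

 # Get the page
 html = response.content.decode()
 print(html)

 # Inspect the cookies set by this response (everything the session has collected is in session.cookies)
 print(response.cookies)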
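
What the session carries over can be observed without touching the network. In this sketch the header value and the cookie are hypothetical, and `prepare_request` is used only to inspect what a later request through the same session would contain.

```python
import requests

session = requests.Session()
session.headers.update({'User-Agent': 'my-crawler/0.1'})  # hypothetical value
session.cookies.set('token', 'abc')                        # hypothetical cookie

# Build (but do not send) a request through the session to see what it inherits.
prepared = session.prepare_request(requests.Request('GET', 'http://www.baidu.com'))

print(prepared.headers['User-Agent'])
print(prepared.headers.get('Cookie'))
```

Cookies returned by a real server are stored in session.cookies the same way, which is how a login performed in one request stays valid for the next.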

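8 Setting a timeout

 import requests

 # Create a Session object to send requests; it is used the same way as the requests module
 session = requests.Session()

 url = 'http://www.baidu.com'

 # Timeout of 3 seconds, applied both to connecting and to waiting for data;
 # requests.exceptions.Timeout is raised when it is exceeded
 response = session.get(url=url, timeout=3)

 # Get the page
 html = response.content.decode()
 print(html)

 # Inspect the cookies set by this response
 print(response.cookies)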
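
A timeout is only useful if the resulting exception is handled. Below is a minimal sketch of one way to do that. The `fetch` function, its `get` parameter, and the `always_times_out` stand-in are all hypothetical and exist so the example can run without a network.

```python
import requests


def fetch(url, timeout=(3, 10), get=requests.get):
    """Return the page text, or None if the request times out.

    timeout=(3, 10) allows 3 s to connect and 10 s of waiting for data.
    The get parameter exists only so the function can be tried offline.
    """
    try:
        response = get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        return None
    return response.content.decode()


def always_times_out(url, timeout):
    # Hypothetical stand-in for requests.get that simulates a slow server.
    raise requests.exceptions.Timeout('simulated timeout')


result = fetch('http://www.baidu.com', get=always_times_out)
print(result)
```

In real use the `get` argument is simply omitted, and the caller decides what a None result means (skip the page, log it, or try again).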

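9 Skipping SSL certificate verification for HTTPS

 import requests

 # Create a Session object to send requests; it is used the same way as the requests module
 session = requests.Session()

 url = 'http://www.baidu.com'

 # verify=False skips certificate verification: if the site's HTTPS certificate is invalid,
 # the error is ignored and the request continues.
 # It only matters for https:// URLs; with the http:// URL above it has no effect.
 response = session.get(url=url, verify=False)

 # Get the page
 html = response.content.decode()
 print(html)

 # Inspect the cookies set by this response
 print(response.cookies)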

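10 Retrying a failed request

 import requests
 from retrying import retry  # third-party package: pip install retrying

 # Retry when the call raises (for example on a timeout), up to 3 attempts in total
 @retry(stop_max_attempt_number=3)
 def get(url):
     response = requests.get(url=url, timeout=3)
     return response.content.decode()

 url = 'http://www.baidu.com'

 html = get(url)
 print(html)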
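
The idea behind the decorator can be sketched in plain Python. This is a simplified illustration of retry-on-exception, not the retrying library's implementation; the `retry` function, `max_attempts`, and `flaky` below are hypothetical names.

```python
import functools


def retry(max_attempts=3):
    """Simplified stand-in for a retry decorator: re-run the function on any exception."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: let the last error propagate
        return wrapper
    return decorator


calls = []


@retry(max_attempts=3)
def flaky():
    # Fails twice, then succeeds, simulating two timeouts followed by a response.
    calls.append(1)
    if len(calls) < 3:
        raise TimeoutError('simulated timeout')
    return 'ok'


outcome = flaky()
print(outcome, len(calls))
```

Because the request inside the decorated function has its own timeout, each attempt fails quickly instead of hanging, and the decorator decides how many times to try.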
