Python 3 Lagou crawler: dealing with "您操作太频繁，请稍后访问" ("You are operating too frequently, please try again later")

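Have you ever gotten this response from Lagou: "您操作太频繁，请稍后访问"?

Then you are in the right place.
It happens because the job listings have to be requested with a POST that carries the relevant cookies.
So here is a simple script for crawling Lagou.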

import time

import requests


def main():
    # Search results page; loading it first makes the site issue the cookies
    # that the Ajax endpoint expects.
    url_start = "https://www.lagou.com/jobs/list_运维?city=%E6%88%90%E9%83%BD&cl=false&fromSearch=true&labelWords=&suginput="
    # Ajax endpoint that returns the job listings as JSON.
    url_parse = "https://www.lagou.com/jobs/positionAjax.json?city=成都&needAddtionalResult=false"
    headers = {
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Referer': 'https://www.lagou.com/jobs/list_%E8%BF%90%E7%BB%B4?city=%E6%88%90%E9%83%BD&cl=false&fromSearch=true&labelWords=&suginput=',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'
    }
    # Fields printed for each job posting.
    fields = ("companyFullName", "positionName", "salary", "companySize",
              "skillLables", "createTime", "district", "stationname")
    for x in range(1, 5):  # pages 1 to 4
        data = {
            'first': 'true',
            'pn': str(x),
            'kd': '运维'
        }
        s = requests.Session()  # create a fresh session object for this page
        s.get(url_start, headers=headers, timeout=3)  # GET the search page so the session picks up cookies
        # The session sends the cookies it just received along with the POST.
        response = s.post(url_parse, data=data, headers=headers, timeout=3)  # fetch this page of results
        time.sleep(5)  # pause between pages
        response.encoding = response.apparent_encoding
        text = response.json()
        info = text["content"]["positionResult"]["result"]
        for i in info:
            for field in fields:
                print(i[field])


if __name__ == '__main__':
    main()
