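Scraping Qzone with selenium and Xueqiu data with requests

1. Scraping friends' feed data from Qzone

# Scrape friends' status updates from Qzone (post text and friend name), filtering out ads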

from selenium import webdriver

from time import sleep

from lxml import etree

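# Drive the browser automatically
# Note: executable_path and find_element_by_id are the Selenium 3 API;
# Selenium 4 uses Service(...) and find_element(By.ID, ...) instead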

bro = webdriver.Chrome(executable_path=r'D:\爬虫+数据分析\tools\chromedriver.exe')

bro.get('https://qzone.qq.com/')

sleep(3)

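# Note: to locate an element nested inside an iframe with the find_* functions, you must first switch into that iframe:

bro.switch_to.frame('login_frame')  # the argument is the iframe's id attribute; the login form is nested inside this iframe

bro.find_element_by_id('switcher_plogin').click()  # click the element with id switcher_plogin, i.e. choose account/password login

sleep(3)  # wait for the page to load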

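# Enter the username and password automatically to log in to Qzone

bro.find_element_by_id('u').send_keys('')  # fill in your QQ number

bro.find_element_by_id('p').send_keys('QQ password')  # fill in your QQ password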

sleep(3)

bro.find_element_by_id('login_button').click()

sleep(3)

# Click "Personal Center" to open the friends' feed

bro.find_element_by_id('aIcenter').click()

sleep(3)

# Scroll to the bottom of the page three times, pausing after each scroll, so that more feed items are loaded

bro.execute_script('window.scrollTo(0,document.body.scrollHeight)')

sleep(3)

bro.execute_script('window.scrollTo(0,document.body.scrollHeight)')

sleep(3)

bro.execute_script('window.scrollTo(0,document.body.scrollHeight)')

sleep(3)

# Get the source of the page currently shown in the browser

page_text = bro.page_source

# Parse the data

tree = etree.HTML(page_text)

li_list = tree.xpath('//ul[@id="feed_friend_list"]/li')

for li in li_list:

    user_name_list = li.xpath(".//div[@class='user-info']/div[@class='f-nick']/a/text()")

    text_list = li.xpath('.//div[@class="f-info"]/text()|.//div[@class="f-info qz_info_cut"]//text()')  # posts that need expanding use a different class name

    for tu in zip(user_name_list,text_list):

        text = '\n'.join(tu)

        print(text+'\n\n')

bro.quit()  # close the browser and end the WebDriver session
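
The three identical scroll-and-sleep pairs in the script above could be folded into a small loop. The following is only a sketch: `scroll_to_bottom` is a hypothetical helper (it is not part of selenium), and it assumes `driver` is any object with an `execute_script` method, such as the `bro` object created above.

```python
import time


def scroll_to_bottom(driver, times=3, pause=3.0):
    """Scroll to the bottom of the page `times` times, pausing after each scroll.

    Hypothetical helper, not part of selenium. `driver` only needs an
    execute_script(js) method, which a selenium WebDriver provides.
    """
    for _ in range(times):
        # Same JavaScript call the script above issues three times
        driver.execute_script('window.scrollTo(0,document.body.scrollHeight)')
        time.sleep(pause)  # give the page time to load the next batch of feed items
```

With this in place, `scroll_to_bottom(bro)` would stand in for the three scroll/sleep pairs, and the number of scrolls becomes a parameter rather than copied lines.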

2. Scraping news titles, authors, and sources from Xueqiu

import requests

import json

headers = {

    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.12 Safari/537.36',

}

url_index = 'https://xueqiu.com'

url = 'https://xueqiu.com/v4/statuses/public_timeline_by_category.json?since_id=-1&max_id=-1&count=10&category=-1'

# Create a session object

session = requests.Session()

# Send a request through the session: this fetches the cookies and stores them in the session

session.get(url_index,headers=headers)

# Fetch the JSON response

json_dic = session.get(url=url,headers=headers).json()

for dic in json_dic["list"]:

    data = dic["data"]

    data_dic = json.loads(data)

    title = data_dic["title"]

#     description = data_dic["description"]

    column = dic["column"]

    author =  data_dic["user"]["screen_name"]

    print(f"Title: {title}\nSource: {column}\nAuthor: {author}\n")
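
The loop above decodes JSON twice: the response itself is JSON, and each item's `data` field is a JSON string that needs its own `json.loads`. The sketch below isolates that step so it can be tried without a network request. The `parse_timeline` function and the sample payload are made up for illustration; the payload mirrors only the fields the script reads and is not real Xueqiu data.

```python
import json


def parse_timeline(payload):
    """Return (title, column, author) tuples from a timeline-style dict.

    Hypothetical helper: it assumes the same fields the script above reads.
    """
    rows = []
    for item in payload["list"]:
        inner = json.loads(item["data"])  # "data" is a JSON string, so decode it again
        rows.append((inner["title"], item["column"], inner["user"]["screen_name"]))
    return rows


# Made-up payload for illustration only
sample = {
    "list": [
        {
            "column": "Headlines",
            "data": json.dumps(
                {"title": "Example title", "user": {"screen_name": "example_user"}}
            ),
        }
    ]
}

for title, column, author in parse_timeline(sample):
    print(f"Title: {title}\nSource: {column}\nAuthor: {author}\n")
```

Keeping the parsing in a function that takes a plain dict means it could be fed `session.get(url=url,headers=headers).json()` directly, while staying testable offline.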
