Little Crawler 5: Key Review && Mobile Data Scraping 1
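1. Key review
(1) What is selenium?
- A module for browser automation
(2) Why use selenium in a crawler, and how does it relate to crawling?
- It makes it easy to obtain dynamically loaded data
- It can simulate login
(3) List common selenium methods and what they do
- get(url) # open the page at url
- the find family of functions locates tags # remember the few most common ones
- send_keys('key') # type a value into the located input
- click() # click
- execute_script('jsCode') # run JavaScript code
- page_source # get the page's source data
- switch_to.frame('iframeID') # tags inside an iframe require switching into it first
- quit() # close the browser
- save_screenshot(filename) # save the screen contents to a file
- a = ActionChains(bro) # instantiate an action chain object
- a.click_and_hold(tag) # click and hold the given tag
- a.move_by_offset(x, y).perform() # move by the given offset and execute the chain
(4) What does the loop do?
Multiple task objects can be registered with the loop.
The loop then executes the task objects asynchronously by cycling without interruption.
(5) How do multi-task async coroutines achieve asynchrony?
- coroutines
- task objects
- the loop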
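The relationship between coroutines, task objects and the loop described in (4) and (5) can be sketched with the standard library alone. This is a minimal illustration, not the code used later in these notes: `fake_request` is a hypothetical stand-in for a network call, and `asyncio.sleep` replaces real I/O so the sketch runs without network access.

```python
import asyncio

async def fake_request(name, delay):
    # Hypothetical stand-in for a non-blocking network call
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # Wrap each coroutine in a task object; the running loop schedules them all
    tasks = [asyncio.create_task(fake_request(f"task{i}", 0.1)) for i in range(3)]
    # gather waits for every task and returns results in submission order
    return await asyncio.gather(*tasks)

# asyncio.run creates the event loop, runs main() to completion, then closes the loop
results = asyncio.run(main())
print(results)
```

Because all three tasks wait at the same time, the total run time is close to one delay rather than the sum of the three.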
2. Review of single-threaded multi-task async coroutines
# Author: studybrother sun
import asyncio
import aiohttp

# The body of this coroutine must not contain code from non-async (blocking) modules
async def request(url):
    async with aiohttp.ClientSession() as s:
        async with s.get(url=url) as response:
            page_text = await response.text()  # text of the page, ready to be parsed
            return page_text

def callback(task):  # callback, invoked when the task finishes
    print(task.result())

def callback1(task):
    print(task.result())

# Event loop object (created explicitly so this also works on recent Python versions)
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)

c = request('https://www.baidu.com')
c1 = request('https://www.sogou.com')

task = asyncio.ensure_future(c)
task.add_done_callback(callback)

task1 = asyncio.ensure_future(c1)
task1.add_done_callback(callback1)

tasks = [task, task1]
loop.run_until_complete(asyncio.wait(tasks))

Running it produces the following output:
<html>
<head>
<script>
location.replace(location.href.replace("https://","http://"));
</script>
</head>
<body>
<noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>
</body>
</html>
<!DOCTYPE html>
<html lang="cn">
<head>
<script>window._speedMark = new Date();
window.lead_ip = '221.218.208.77';window.now = ;</script> <meta charset="utf-8">
<link rel="dns-prefetch" href="//img01.sogoucdn.com"><link rel="dns-prefetch" href="//img02.sogoucdn.com"><link rel="dns-prefetch" href="//img03.sogoucdn.com"><link rel="dns-prefetch" href="//img04.sogoucdn.com"><link rel="dns-prefetch" href="//dlweb.sogoucdn.com">
<title>搜狗搜索引擎 - 上网从搜狗开始</title>
<link rel="shortcut icon" href="/images/logo/new/favicon.ico?v=4" type="image/x-icon">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="搜狗搜索">
<meta name="keywords" content="搜狗搜索,网页搜索,微信搜索,视频搜索,图片搜索,音乐搜索,新闻搜索,软件搜索,问答搜索,百科搜索,购物搜索">
<meta name="description" content="搜狗搜索是全球第三代互动式搜索引擎,支持微信公众号和文章搜索、知乎搜索、英文搜索及翻译等,通过自主研发的人工智能算法为用户提供专业、精准、便捷的搜索服务。"> <link rel="stylesheet" type="text/css" href="/web/index/css/base.v.1.4.12.css">
<style>.wrapper .suggestion{border: 1px solid #e8e8e8; width:622px;-moz-box-shadow: 0px 1px 8px rgba(,,,0.1);-webkit-box-shadow: 0px 1px 8px rgba(,,,0.1);box-shadow: 0px 1px 8px rgba(,,,0.1);border-top-left-radius: 0px;border-top-right-radius: 0px;border-bottom-right-radius: 2px;border-bottom-left-radius: 2px; top:43px;} .wrapper .suglist{width: 206px;} .wrapper .suglist .keyword {color: #7a77c8;} .big-scn .suggestion {width: 654px;} .big-scn .suglist{width:236px;} .wrapper .suglist{ padding:4px }</style></head>
<body >
<div class="bg-gj-w" id="settings-mask" style="display: none;"></div>
<div class="gjss" id="settings-advanced" style="display: none;top:-240px;">
<div class="hf-box" id="settings-save-layer">
<div class="hf-def">已保存设置</div>
</div>
<div class="gjss-tab">
<a uigs-id="tab_set" href="javascript:void(0);" class="js-settings-tab tab-a cur">搜索设置</a>
<a uigs-id="tab_adv" href="javascript:void(0);" class="js-settings-tab tab-a">高级搜索</a>
<a href="javascript:void(0);" class="close-btn" id="settings-close"></a>
</div>
<div class="gjss-main">
<div class="gjss-sz js-settings-content">
<p class="gjss-err js-settings-mask" style="display: none;">搜索设置暂不可用,请启用浏览器的Cookie功能,然后刷新本页。</p>
<div class="bg-wkq js-settings-mask" id="settings-tips" style="display: none;"></div> <dl class="js-as-select">
<dt>搜索结果显示条数</dt>
<dd>
<a href="javascript:void(0);" class="xz" id="settings-number" data-value="">每页显示10条</a>
<ul id="settings-number-list">
<li><a uigs-id="set_10" href="javascript:void(0);" data-value="">每页显示10条</a></li>
<li><a uigs-id="set-20" href="javascript:void(0);" data-value="">每页显示20条</a></li>
<li><a uigs-id="set-50" href="javascript:void(0);" data-value="">每页显示50条</a></li>
<li><a uigs-id="set-100" href="javascript:void(0);" data-value="">每页显示100条</a></li>
</ul>
</dd>
<input type="hidden" name="pageNum" id="settings-show-number" value="">
</dl>
<p class="enter" style="padding-top: 20px;">
<a href="javascript:void(0);" id="settings-save" uigs-id="set-save" class="a1">保存</a>
<a href="javascript:void(0);" id="settings-reset" uigs-id="set-reset" class="a2">恢复默认</a>
</p>
</div>
<div class="gjss-sz js-settings-content" style="display: none;">
<form action="/web" target="_blank" id="advanced-search-form">
<input type="hidden" name="query" value="">
<input name="fieldtitle" type="hidden" value=""/>
<input name="fieldcontent" type="hidden" value=""/>
<input name="fieldstripurl" type="hidden" value=""/>
<input name="bstype" type="hidden" value=""/>
<input name="ie" type="hidden" value="utf8"/>
<dl>
<dt>搜索关键词</dt>
<dd class="js-as-radio">
<div class="input-box js-input-box" id="advanced-query-box">
<input name="q" type="text" must="" size="" maxlength="" autocomplete="off" placeholder="例如:搜狗真棒(多个关键词可用空格区分)">
<span class="err-word">* 请输入搜索关键词</span>
</div>
<a uigs-id="adv_split-query" href="javascript:void(0);" data-value="checkbox" class="dk-btn cur">拆分关键词</a>
<a uigs-id="adv_no-split-query" href="javascript:void(0);" data-value="" class="dk-btn">不拆分关键词</a>
<input type="hidden" name="include" value="checkbox">
</dd>
</dl>
<dl>
<dt>在指定站内搜索</dt>
<dd>
<div class="input-box js-input-box"><input name="sitequery" type="text" size="" autocomplete="off" placeholder="例如:www.sogou.com"></div>
</dd>
</dl>
<dl class="js-as-select" style="padding-top:16px">
<dt>搜索词位于</dt>
<dd>
<a href="javascript:void(0);" class="xz">网页中任何地方</a>
<ul>
<li><a href="javascript:void(0);" data-value="">网页中任何地方</a></li>
<li><a href="javascript:void(0);" data-value="">仅在标题中</a></li>
<li><a href="javascript:void(0);" data-value="">仅在正文中</a></li>
<li><a href="javascript:void(0);" data-value="">仅在网址中</a></li>
</ul>
</dd>
<input type="hidden" name="located" value="">
</dl>
<dl class="js-as-select" style="padding-top:16px">
<dt>需要搜索的文件格式</dt>
<dd >
<a href="javascript:void(0);" class="xz">全部网页</a>
<ul>
<li><a href="javascript:void(0);" data-value="">全部网页</a></li>
<li><a href="javascript:void(0);" data-value="doc">Microsoft Word (.doc)</a></li>
<li><a href="javascript:void(0);" data-value="xls">Microsoft Excel (.xls)</a></li>
<li><a href="javascript:void(0);" data-value="ppt">Microsoft Powerpoint (.ppt)</a></li>
<li><a href="javascript:void(0);" data-value="pdf">Adobe Acrobat PDF (.pdf)</a></li>
<li><a href="javascript:void(0);" data-value="rtf">RTF (.rtf)</a></li>
<li><a href="javascript:void(0);" data-value="all">全部文档</a></li>
</ul>
</dd>
<input type="hidden" name="filetype" value="">
</dl>
<dl>
<dt>搜索结果排序方式</dt>
<dd class="js-as-radio">
<a uigs-id="adv_relevance-ranking" href="javascript:void(0);" data-value="off" class="dk-btn cur">按相关性排序</a>
<a uigs-id="adv_time-sort" href="javascript:void(0);" data-value="on" class="dk-btn">按时间排序</a>
<input type="hidden" name="tro" value="off">
</dd>
</dl>
<p class="enter"><input id="adv-search-btn" uigs-id="adv_search-btn" type="submit" class="a1" value="开始搜索"></p>
</form>
</div>
</div>
</div>
<div class="wrapper" id="wrap">
<div class="header">
<div class="top-nav">
<ul>
<li><a onclick="st(this,'40030300','news')" href="http://news.sogou.com" uigs-id="nav_news" id="news">新闻</a></li>
<li class="cur"><span>网页</span></li>
<li><a onclick="st(this,'73141200','weixin')" href="http://weixin.sogou.com/" uigs-id="nav_weixin" id="weixinch">微信</a></li>
<li><a onclick="st(this,'40051200','zhihu')" href="http://zhihu.sogou.com/" uigs-id="nav_zhihu" id="zhihu">知乎</a></li>
<li><a onclick="st(this,'40030500','pic')" href="http://pic.sogou.com" uigs-id="nav_pic" id="pic">图片</a></li>
<li><a onclick="st(this,'40030600','video')" href="https://v.sogou.com/" uigs-id="nav_v" id="video">视频</a></li>
<li><a href="http://mingyi.sogou.com?fr=common_index_nav" uigs-id="nav_mingyi" id="mingyi" onclick="st(this,'','myingyi')">明医</a></li>
<li><a href="http://english.sogou.com?fr=pcweb_index_nav" uigs-id="nav_overseas" id="overseas" onclick="st(this,'','overseas')" >英文</a></li>
<li><a onclick="st(this,'web2ww','wenwen')" href="https://wenwen.sogou.com/?ch=websearch" uigs-id="nav_wenwen" id="index_more_wenwen">问问</a></li>
<li><a href="http://scholar.sogou.com?fr=common_index_nav" uigs-id="nav_scholar" id="scholar" onclick="st(this,'','scholar')">学术</a></li>
<li class="show-more">
<a href="javascript:void(0);" id="more-product">更多<i class="m-arr"></i></a>
<div class="pos-more" id="products-box" style="top: 40px;">
<span class="ico-san"></span> <a onclick="st(this,'40031000')" href="http://map.sogou.com" uigs-id="nav_map" id="map">地图</a>
<a onclick="st(this,'40031500')" href="http://gouwu.sogou.com/" uigs-id="nav_gouwu" id="index_more_gouwu">购物</a>
<a onclick="st(this,'40051203')" href="http://baike.sogou.com/Home.v" uigs-id="nav_baike" id="index_more_baike">百科</a>
<a onclick="st(this)" href="http://zhishi.sogou.com" uigs-id="nav_zhishi" id="index_more_zhishi">知识</a>
<a onclick="st(this,'40051205')" href="http://as.sogou.com/" uigs-id="nav_app" id="index_more_appli">应用</a>
<a onclick="st(this,'40051205','fanyi')" href="http://fanyi.sogou.com?fr=common_index_nav_pc" uigs-id="nav_fanyi" id="index_more_fanyi">翻译</a>
<a href="http://index.sogou.com" uigs-id="nav_index" id="index_more_index">指数</a>
<a href="http://dangjian.sogou.com" uigs-id="nav_dangjian" id="dangjian" onclick="st(this,'','dangjian')">党建</a>
<span class="all"><a onclick="st(this,'40051206')" href="http://www.sogou.com/docs/more.htm?v=1" uigs-id="nav_all" target="_blank">全部</a></span>
</div>
</li>
</ul>
</div> <div class="user-box">
<div class="local-weather" id="local-weather">
<div class="wea-box" id="cur-weather" style="display: none;"></div>
<div class="pos-more" id="detail-weather" style="top:40px;"></div>
</div>
<span class="line" id="user-box-line" style="display: none;"></span>
<div class="user-enter">
<a href="javascript:void(0);" id="show-card" style="display: none" uigs-id="settings_show-card">显示卡片</a>
<a href="javascript:void(0);" uigs-id="settings_change-skin" id="changeSkinBtn" >换肤</a>
<span class="s-dw">
<a href="javascript:void(0);" id="settings">设置</a>
<div class="pos-more" id="settings-box" style="top:40px;">
<span class="ico-san"></span>
<a href="javascript:void(0);" id="search-settings" uigs-id="settings_config">搜索设置</a>
<a href="javascript:void(0);" id="advanced-search" uigs-id="settings_advanced">高级搜索</a>
<a href="http://help.sogou.com/?w=01091500&v=1" uigs-id="settings_help">帮助</a>
</div>
</span>
<a href="javascript:void(0);" class="enter" id="loginBtn">登录</a> </div>
</div>
</div>
<div class="content" id="content">
<div class="pos-header" id="top-float-bar">
<div class="part-one"></div>
<div class="part-two" id="card-tab-layer">
<div class="c-top" id="top-card-tab"></div>
</div>
</div>
<div class="logo2" id="logo-s"><span></span></div> <div class="logo" id="logo-l"><span></span></div> <div class="search-box" id="search-box">
<form action="/web" name="sf" id="sf">
<span class="sec-input-box">
<input type="text" class="sec-input active" name="query" id="query" maxlength="" len="" autocomplete="off" />
</span>
<span class="enter-input"><input type="submit" value="" id="stb"></span>
<input type="hidden" name="_asf" value="www.sogou.com" />
<input type="hidden" name="_ast" />
<input type="hidden" name="w" value="" />
<input type="hidden" name="p" value="" />
<input type="hidden" name="ie" value="utf8" />
<input type="hidden" name="from" value="index-nologin" />
<input type="hidden" name="s_from" value="index" />
<div class="keywords-tips" id="keywordsTips" style="display:none">
<i></i><p>搜狗的查询限制在"<strong>40个汉字</strong>"以内。</p>
</div>
</form>
</div>
</div>
<div class="card-box" id="card-box" style="display: none;">
<div class="card-box2" id="card-box2">
<div class="c-top" id="card-tab-box">
<a href="javascript:void(0);" id="card-settings" uigs-id="settings_settings-btn" class="shezhi"></a>
<div class="pos-more" id="card-options">
<span class="ico-san"></span>
<a href="javascript:void(0);" uigs-id="settings_close-card" id="close-card">关闭卡片</a>
</div>
</div>
<div class="c-main" id="card-content"></div>
</div>
</div>
<div class="loog-more" id="scroll-more" style="display: none;">
<a href="javascript:void(0);" uigs-id="scroll-more">滚动查看更多<br><span class="ico_san"></span></a>
</div> <div class="ft" id="footer" style="display: none;">
<a href="http://fuwu.sogou.com/" target="_blank" uigs-id="footer_tuiguang">企业推广</a><span class="line"></span><a href="http://corp.sogou.com/" target="_blank" uigs-id="footer_about">关于搜狗</a><span class="line"></span><a href="http://ir.sogou.com/" target="_blank" uigs-id="footer_aboutEnglish">About Sogou</a><span class="line"></span><a href="http://www.sogou.com/docs/terms.htm?v=1" target="_blank" uigs-id="footer_disclaimer">免责声明</a><span class="line"></span><a href="http://fankui.help.sogou.com/index.php/web/web/index/type/4" target="_blank" uigs-id="footer_feedback">意见反馈及投诉</a><span class="line"></span><a href="http://corp.sogou.com/private.html" target="_blank" uigs-id="footer_private">隐私政策</a><br>
© - Sogou.com / <span class="g">京网文 () -852号</span> / <a href="http://www.miibeian.gov.cn" target="_blank" class="g">京ICP证050897号</a><br>
<span class="g">(京)-经营性--</span> / <a href="http://www.miibeian.gov.cn/" target="_blank" class="g">京ICP备11001839号-</a> / <a href="http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=11000002000025" class="ba" target="_blank">京公网安备11000002000025号</a>
</div>
<div class="ft-v1" id="QRcode-footer" style="padding-bottom:53px; ">
<div class="erwm-box">
<span class="ewm"></span>
<div class="erwx">
<p>搜狗搜索APP</p>
<p class="p2">搜你所想</p>
</div>
</div>
<div class="ft-info">
<a uigs-id="mid_pinyin" href="http://pinyin.sogou.com/" target="_blank"><i class="i1"></i>搜狗输入法</a><span class="line"></span><a uigs-id="mid_liulanqi" href="http://ie.sogou.com/" target="_blank"><i class="i2"></i>浏览器</a><span class="line"></span><a uigs-id="mid_daohang" href="http://123.sogou.com/" target="_blank"><i class="i3"></i>网址导航</a><br> <a href="http://corp.sogou.com/" target="_blank" class="g">关于搜狗</a> - <a href="http://ir.sogou.com/" target="_blank" class="g">About Sogou</a> - <a href="http://fuwu.sogou.com/" target="_blank" class="g">企业推广</a> - <a href="http://www.sogou.com/docs/terms.htm?v=1" target="_blank" class="g">免责声明</a> - <a href="http://fankui.help.sogou.com/index.php/web/web/index/type/4" target="_blank" class="g">意见反馈及投诉</a> - <a href="http://corp.sogou.com/private.html" target="_blank" class="g" uigs-id="footer_private">隐私政策</a><br>
© - Sogou.com / <span class="g">京网文 () -852号</span> / <span class="g">(京)-经营性--</span><br>
<a href="http://www.miibeian.gov.cn" target="_blank" class="g">京ICP证050897号</a> / <a href="http://www.miibeian.gov.cn/" target="_blank" class="g">京ICP备11001839号-</a> / <a href="http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=11000002000025" class="ba" target="_blank">京公网安备11000002000025号</a>
</div>
</div> <div class="kuozhan" id="QRcode-box" style="display: none;">
<a href="javascript:void(0);" id="miniQRcode"></a>
<span id="QRcode"></span>
</div>
<a href="javascript:void(0);" class="back-top" id="back-top"></a> </div>
<script>
var SugPara, uigs_para,
msBrowserName = navigator.userAgent.toLowerCase(),
msIsSe = false,
msIsMSearch = false,
hasDoodle = false,
queryinput = document.getElementById('query'); uigs_para={
"uigs_productid": "webapp",
"type": "webindex_new",
"stype": "nologin",
"scrnwi": screen.width,
"scrnhi": screen.height,
"uigs_pbtag": "A",
"uigs_cookie": "SUID,sct",
"protocol": location.protocol.toLowerCase() == "https:" ? "https" : "http"
}; SugPara = {"enableSug":true,"sugType":"web","domain":"w.sugg.sogou.com","productId":"web","sugFormName":"sf","inputid":"query","submitId":"stb","suggestRid":"","normalRid":"","useParent": ,"sugglocation":"index","showVr":true,"showHotwords":true,"suggAbtestObject":{"suggestHistoryStrategy1":"","suggestHistoryStrategy2":"0|1|2|3|4|5|6|7|8","suggHistoryAbtest":""}}; function mk_con() {
try {
window.external.metasearch('make_connection', 'www.google.com.hk');
} catch (e) {}
} if (/se \.x/i.test(msBrowserName)) {
msIsSe = true;
} if (/metasr/i.test(msBrowserName)) {
msIsMSearch = true;
} if (queryinput) {
if (msIsSe && msIsMSearch) {
if (queryinput.addEventListener) {
queryinput.addEventListener('keypress', mk_con, false);
queryinput.addEventListener('keydown', mk_con, false)
} else if (queryinput.attachEvent) {
queryinput.attachEvent('onkeypress', mk_con);
queryinput.attachEvent('onkeydown', mk_con);
} else {
queryinput.onkeypress = mk_con;
queryinput.onkeydown = mk_con;
}
}
}
function getDomain(){
var domainName = document.domain;
if(domainName.indexOf("sogou.com")==(domainName.length-)){
return ".sogou.com";
}else if(domainName.indexOf("soso.com")==(domainName.length-)){
return ".soso.com";
}else if(domainName.indexOf("sogo.com") != -){
return ".sogo.com"
}
}
window.m_s_index = function() {
var w = document.sf.query,
c = Math.round((new Date().getTime() + Math.random()) * ); w.focus(); if(new RegExp("kw=([^&]+)").test(location.search)) {
if(w.value.length == ) {
w.value = decodeURIComponent(RegExp.$);
}
} if (document.cookie.indexOf("SUV=") < ) {
document.cookie = "SUV=" + c + ";path=/;expires=Sun, 29 July 2026 00:00:00 UTC;domain="+getDomain();
} (new Image).src = '//pb6.sogou.com/v6'; }; function st(self, p, product, anchor) {
var searchBox = document.sf.query,
query = encodeURIComponent(searchBox.value), productUrl = {
"news": 'http://news.sogou.com/news?ie=utf8&query=',
"web": 'web?ie=utf8&query=',
"weixin": 'http://weixin.sogou.com/weixin?type=2&ie=utf8&query=',
"zhihu": 'http://zhihu.sogou.com/zhihu?ie=utf8&query=',
"pic": 'http://pic.sogou.com/pics?ie=utf8&query=',
"video": 'https://v.sogou.com/v?ie=utf8&query=',
"myingyi": 'https://www.sogou.com/web?m2web=mingyi.sogou.com&ie=utf8&query=',
"overseas": 'http://english.sogou.com?b_o_e=1&ie=utf8&fr=pcweb_index_nav&query=',
"scholar": 'http://scholar.sogou.com?ie=utf8&fr=common_index_nav&query=',
"fanyi": 'http://fanyi.sogou.com/?fr=common_index_nav_pc&ie=utf8&keyword=',
"wenwen":'http://wenwen.sogou.com/s/?ch=websearch&w=',
"dangjian":'http://dangjian.sogou.com/dangjian?query='
},
newHref = productUrl[product] || self.href; function getConnectSymbol(url) {
return url.indexOf("?") > - ? '&' : '?';
} if(searchBox && searchBox.value !== ''){ if(productUrl[product]) {
newHref = productUrl[product] + query;
} else if(newHref.indexOf("kw=") > ) {
newHref = newHref.replace(new RegExp("kw=[^&$]*"), "kw=" + query)
} else {
newHref += getConnectSymbol(newHref) + 'kw=' + query;
}
} if(p){
newHref += getConnectSymbol(newHref) + "p=" + p;
} if (anchor && anchor.length > ){
newHref += "#" + anchor;
} if (searchBox && searchBox.value == '' && (product == 'wenwen' || product == 'dangjian')){//问问首页链接单独处理
newHref = self.href;
} self.href = newHref;
} window.cid = function(o, p) {
var w = document.sf.query,
q = encodeURIComponent(w.value); if (!q) {
o.href += "?cid=" + p
} else {
if (p === "web2ww") {
o.href += "s/?cid=web2ww&w=" + q
} else if (p === "web2bk") {
o.href += "Search.e?sp=S" + q + "&cid=web2bk"
}
}
}; window.m_s_index();
</script>
<script src="//dlweb.sogoucdn.com/common/lib/jquery/jquery-1.11.0.min.js"></script>
<script charset="gbk" type="text/javascript" src="/js/sugg_new.v.104.js"></script>
<script src="/js/pb_v.1.9.6.min.js"></script>
<script src="/js/lib/jquery.mousewheel.min.js"></script>
<script src="/js/lib/juicer-min.js"></script>
<script src="/js/common/widget/login_new.min.v.0.5.js"></script>
<script src="//account.sogou.com/static/api/passport-async.js"></script>
<script src="/web/index/js/base.v.1.1.14.js"></script>
<script src="/web/js/voice.min.v.0.0.6.js"></script>
<script src="/web/js/taspeed.min.v.0.0.1.js"></script>
</body>
</html>
<!--zly-->
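3. Mobile data scraping && environment setup
Experiment: refer to the blog below
https://www.cnblogs.com/bobo-zhang/p/10068994.html
- Mobile data scraping:
- Packet-capture tools (by definition, proxy servers):
Windows: Fiddler, mitmproxy (both are proxy servers)
Mac: Charles (nicknamed 青花瓷)
- Installing the certificate on the phone:
- 1. Have the PC open a Wi-Fi hotspot and connect the phone to it (so the phone and the PC are on the same network segment)
- In the phone's browser, visit ip:8888 and tap the link to download the certificate
- Enable the proxy on the phone: set the proxy IP to the IP of the machine running Fiddler and the proxy port to Fiddler's port
(1) Send the certificate to the phone
(2) In Fiddler, click Tools => Options =>
Next, "allow" other devices to connect => "OK" => OK
In a browser, visit http://localhost:8888/
This gives the following result (screenshot not included here)
The certificate can be downloaded from the last line of the page shown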
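The proxy settings entered on the phone (the IP of the machine running Fiddler plus port 8888) can also be expressed in Python, which is a convenient way to check from the PC that requests go through the capture tool. The sketch below only builds the configuration and sends nothing. The address 192.168.1.10 and the helper names `proxy_settings` and `build_opener_via_proxy` are assumptions made for illustration; only the port 8888 comes from the notes above.

```python
import urllib.request

FIDDLER_PORT = 8888  # port mentioned in the notes above (ip:8888)

def proxy_settings(host, port=FIDDLER_PORT):
    # Same values the phone needs: proxy IP = capture machine, port = Fiddler's port
    proxy = f"http://{host}:{port}"
    return {"http": proxy, "https": proxy}

def build_opener_via_proxy(host, port=FIDDLER_PORT):
    # ProxyHandler routes every request made with this opener through the proxy
    handler = urllib.request.ProxyHandler(proxy_settings(host, port))
    return urllib.request.build_opener(handler)

# 192.168.1.10 is a hypothetical LAN address of the machine running Fiddler
settings = proxy_settings("192.168.1.10")
opener = build_opener_via_proxy("192.168.1.10")
print(settings)
```

Calling `opener.open(url)` would then send the request through the proxy so that it shows up in the capture tool. HTTPS traffic can only be decrypted there once the tool's certificate is trusted by the client, which is why the certificate step above matters.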