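A quick crawl with Python
1. Installing Requests
Windows: pip install requests
Linux: sudo pip install requests
If installation is slow from within mainland China, download the package manually from:
http://www.lfd.uci.edu/~gohlke/pythonlibs/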
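Search for requests on that page and download it.
Rename the file's .whl extension to .zip and extract it, then copy the requests folder into Python's Lib directory.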
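Whichever installation route you take, it helps to confirm that the package is importable before going further. A minimal sketch using only the standard library; the helper name is_installed is made up for illustration:

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be imported from the current environment."""
    return importlib.util.find_spec(module_name) is not None


print("requests installed:", is_installed("requests"))
```

If this prints False, the requests folder is probably not on the path of the interpreter you are running.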
2. Fetching a page's content
import requests
useragent = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36'}
html = requests.get("http://tieba.baidu.com/f?ie=utf-8&kw=python",headers=useragent)
print(html.text)
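The URL above has its query string written out by hand, which works for an ASCII keyword like python. For a forum name with non-ASCII characters the query string has to be percent-encoded. A minimal sketch with the standard library's urllib.parse; tieba_url is a hypothetical helper, and requests performs an equivalent encoding when you pass a dict through its params= argument:

```python
from urllib.parse import urlencode

BASE_URL = "http://tieba.baidu.com/f"  # same endpoint as the snippet above


def tieba_url(keyword):
    """Build the forum URL for `keyword` (hypothetical helper, not part of requests)."""
    query = urlencode({"ie": "utf-8", "kw": keyword})  # percent-encodes as UTF-8
    return BASE_URL + "?" + query


print(tieba_url("python"))
print(tieba_url("爬虫"))
```

The first call reproduces the URL used above; the second shows the keyword turned into %XX escapes.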
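3. Submitting data to a web page
GET retrieves data from the server.
POST sends data to the server.
GET works by building its parameters into the URL.
POST carries its data in the request body, not in the URL or the headers.
Data that a page loads via AJAX does not appear in the page source; to get it, you have to send the request the page makes (often a POST) yourself.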
data={
'type':'',
'sort':'',
'currentPage':''
}
html_text = requests.post("http://xxxxxx/student/courses/searchCourses",data=data)
print(html_text.text)
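To see where the form data ends up in each case, the two request types can be built side by side without sending anything. This is a sketch with the standard library's urllib.request rather than requests, chosen so it runs offline; the value "1" for currentPage is an example, and the host is left as the placeholder from the snippet above:

```python
from urllib.parse import urlencode
from urllib.request import Request

form = {"type": "", "sort": "", "currentPage": "1"}  # "1" is an example value
endpoint = "http://xxxxxx/student/courses/searchCourses"  # placeholder host, as above

# GET: the parameters become part of the URL; there is no request body.
get_req = Request(endpoint + "?" + urlencode(form))

# POST: the URL stays unchanged; the encoded form travels in the request body.
post_req = Request(endpoint, data=urlencode(form).encode("ascii"))

# Constructing a Request does not open a connection, so nothing is sent here.
for req in (get_req, post_req):
    print(req.get_method(), req.full_url, req.data)
```

With requests, the same split is made by passing the dict as params= (URL) or data= (body).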
---------------------------------------------------------------------------------------------
A small example: these are notes I took while following a Jikexueyuan (jikexueyuan.com) video.
# -*- coding: utf-8 -*-
import re

import requests


class spider(object):

    def changepage(self, url, total_page):
        # Build the page URLs from the page named in url through total_page.
        now_page = int(re.search(r'pageNum=(\d+)', url, re.S).group(1))
        page_group = []
        for i in range(now_page, total_page + 1):
            link = re.sub(r'pageNum=\d+', 'pageNum=%s' % i, url)
            page_group.append(link)
        return page_group

    def getsource(self, url):
        html = requests.get(url)
        return html.text

    def geteveryclass(self, source):
        # Each course sits in its own <li id=...> element.
        everyclass = re.findall('(<li id=.*?</li>)', source, re.S)
        return everyclass

    def getinfo(self, eachclass):
        info = {}
        info['title'] = re.search('alt="(.*?)"', eachclass, re.S).group(1)
        info['content'] = re.search('display: none;">(.*?)</p>', eachclass, re.S).group(1)
        timeandlevel = re.findall('<em>(.*?)</em>', eachclass, re.S)
        info['classtime'] = timeandlevel[0]
        info['classlevel'] = timeandlevel[1]
        info['learnnum'] = re.search('"learn-number">(.*?)</em>', eachclass, re.S).group(1)
        return info

    def saveinfo(self, classinfo):
        # open(path + filename, mode). Common modes: r read-only, r+ read/write,
        # w create (overwrites an existing file), a append, b binary.
        with open('info.txt', 'a', encoding='utf-8') as f:
            for each in classinfo:
                f.write('title:' + each['title'] + '\n')
                # f.write('content:' + each['content'] + '\n')
                # f.write('classtime:' + each['classtime'] + '\n')
                # f.write('classlevel:' + each['classlevel'] + '\n')
                # f.write('learnnum:' + each['learnnum'] + '\n\n')


if __name__ == '__main__':
    classinfo = []  # a list that will hold one dict per course
    url = 'http://www.jikexueyuan.com/course/?pageNum=1'
    jikespider = spider()  # instantiate
    all_links = jikespider.changepage(url, 2)  # URLs for pages 1 through 2
    for link in all_links:
        print('Reading: ' + link)
        html = jikespider.getsource(link)  # fetch the current page
        everyclass = jikespider.geteveryclass(html)  # every <li> on the current page
        for each in everyclass:
            info = jikespider.getinfo(each)  # extract the fields
            classinfo.append(info)  # add to the list
    jikespider.saveinfo(classinfo)  # write to file
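The regular expressions in getinfo can be tried out offline before pointing the spider at the live site. A minimal sketch, assuming a course entry shaped roughly like the fragment below: SAMPLE_LI is invented to fit the patterns and is not copied from the real page, and first_group is a hypothetical helper. It also shows one way to avoid the AttributeError that re.search(...).group(1) raises when a pattern finds nothing:

```python
import re

# Invented fragment shaped to fit the patterns used in getinfo (an assumption).
SAMPLE_LI = (
    '<li id="course-1"><img src="cover.png" alt="Intro to Python">'
    '<p style="display: none;">Basics of the language</p>'
    '<em>30 min</em><em>Beginner</em>'
    '<em class="learn-number">1234</em></li>'
)


def first_group(pattern, text, default=""):
    """Return the first capture group of `pattern` in `text`, or `default` if nothing matches."""
    match = re.search(pattern, text, re.S)
    return match.group(1) if match else default


info = {
    "title": first_group(r'alt="(.*?)"', SAMPLE_LI),
    "content": first_group(r'display: none;">(.*?)</p>', SAMPLE_LI),
    "learnnum": first_group(r'"learn-number">(.*?)</em>', SAMPLE_LI),
    # This attribute is absent from the fragment, so the default is used.
    "price": first_group(r'data-price="(.*?)"', SAMPLE_LI, default="n/a"),
}
print(info)
```

If the site changes its markup, a fallback like this lets the crawl continue with an empty field instead of stopping on the first course that fails to match.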