A quick crawl with Python
1. Installing Requests
Windows: pip install requests
Linux: sudo pip install requests
If installation is slow from within mainland China, download the package from:
http://www.lfd.uci.edu/~gohlke/pythonlibs/
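Search that page for requests and download it.
Rename the downloaded file's extension from .whl to .zip, extract it, and copy the requests folder into the Lib directory of your Python installation.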
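Whichever installation route you take, it may be worth confirming that the package can actually be found before moving on. A minimal sketch using only the standard library; `is_installed` is a hypothetical helper name chosen for illustration, not part of Requests:

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be found by the import system."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Prints True once Requests is visible to this Python interpreter.
    print("requests installed:", is_installed("requests"))
```

If this prints False after a manual copy, the folder most likely went into the Lib directory of a different Python installation than the one running the script.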
2. Fetching a page's content
- import requests
- useragent = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36'}
- html = requests.get("http://tieba.baidu.com/f?ie=utf-8&kw=python",headers=useragent)
- print(html.text)
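The snippet above assumes the request succeeds and that the text is decoded correctly. A slightly more defensive sketch is shown below. `fetch_text` and its `session` parameter are hypothetical names introduced here for illustration; the calls it relies on (`get` with `timeout`, `raise_for_status`, `encoding`, `apparent_encoding`) are standard Requests features.

```python
import requests


def fetch_text(url, headers=None, timeout=10, session=None):
    """Fetch a URL and return its body as text, raising on HTTP errors."""
    # Use the supplied session if there is one, otherwise the module-level API.
    client = session or requests
    response = client.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    # When the server declares no charset, Requests falls back to ISO-8859-1 for text,
    # which garbles Chinese pages; use the encoding guessed from the body instead.
    if response.encoding is None or response.encoding.lower() == "iso-8859-1":
        response.encoding = response.apparent_encoding
    return response.text


if __name__ == "__main__":
    useragent = {"User-Agent": "Mozilla/5.0"}
    print(fetch_text("http://tieba.baidu.com/f?ie=utf-8&kw=python", headers=useragent))
```

Setting a timeout matters in a crawler: without one, a single unresponsive server can block the whole run.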
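3. Submitting data to a page
GET retrieves data from the server.
POST sends data to the server.
GET works by putting its parameters in the URL.
POST submits its data in the request body, not in the URL.
Data loaded through AJAX does not appear in the page source. To get it, send the same request the page makes in the background, which is often a POST.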
- data = {
-     'type': '',
-     'sort': '',
-     'currentPage': ''
- }
- html_text = requests.post("http://xxxxxx/student/courses/searchCourses", data=data)
- print(html_text.text)
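To make the GET/POST distinction concrete, the sketch below encodes the same kind of key/value data both ways using only the standard library. `build_get_url` and `build_post_body` are hypothetical helper names, and the value "1" for `currentPage` is an assumed example value. Requests does the equivalent encoding for you when you pass `params=` or `data=`.

```python
from urllib.parse import urlencode


def build_get_url(base_url, params):
    """GET: the parameters are encoded into the URL's query string."""
    return base_url + "?" + urlencode(params)


def build_post_body(data):
    """POST (form-encoded): the same encoding, but sent as the request body."""
    return urlencode(data).encode("utf-8")


if __name__ == "__main__":
    # The parameters end up visible in the URL itself.
    print(build_get_url("http://tieba.baidu.com/f", {"ie": "utf-8", "kw": "python"}))
    # The form fields travel in the body; the URL stays unchanged.
    print(build_post_body({"type": "", "sort": "", "currentPage": "1"}))
```

This is why a POST cannot be reproduced just by pasting a URL into the browser: the form fields are not part of the address.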
---------------------------------------------------------------------------------------------
A small example. These are notes taken from a Jikexueyuan video.
- # -*- coding: utf-8 -*-
- import re
- import requests
- class spider(object):
-     def changepage(self, url, total_page):
-         # Build the URL of every page, from the one in `url` through total_page.
-         now_page = int(re.search(r'pageNum=(\d+)', url, re.S).group(1))
-         page_group = []
-         for i in range(now_page, total_page + 1):
-             # re.sub's 4th positional argument is count, not flags, so re.S is not passed here.
-             link = re.sub(r'pageNum=\d+', 'pageNum=%s' % i, url)
-             page_group.append(link)
-         return page_group
-     def getsource(self, url):
-         # Download one page and return its HTML as text.
-         html = requests.get(url)
-         return html.text
-     def geteveryclass(self, source):
-         # Each course sits in its own <li id=...> ... </li> block.
-         everyclass = re.findall(r'(<li id=.*?</li>)', source, re.S)
-         return everyclass
-     def getinfo(self, eachclass):
-         # Extract the individual fields from one <li> block.
-         info = {}
-         info['title'] = re.search(r'alt="(.*?)"', eachclass, re.S).group(1)
-         info['content'] = re.search(r'display: none;">(.*?)</p>', eachclass, re.S).group(1)
-         timeandlevel = re.findall(r'<em>(.*?)</em>', eachclass, re.S)
-         info['classtime'] = timeandlevel[0]
-         info['classlevel'] = timeandlevel[1]
-         info['learnnum'] = re.search(r'"learn-number">(.*?)</em>', eachclass, re.S).group(1)
-         return info
-     def saveinfo(self, classinfo):
-         # open(path + file name, mode). Common modes: r read-only, r+ read/write, w create (overwrites an existing file), a append, b binary.
-         # encoding='utf-8' assumes Python 3; it keeps Chinese titles from failing on systems with another default encoding.
-         with open('info.txt', 'a', encoding='utf-8') as f:
-             for each in classinfo:
-                 f.write('title:' + each['title'] + '\n')
-                 # f.write('content:' + each['content'] + '\n')
-                 # f.write('classtime:' + each['classtime'] + '\n')
-                 # f.write('classlevel:' + each['classlevel'] + '\n')
-                 # f.write('learnnum:' + each['learnnum'] + '\n\n')
- if __name__ == '__main__':
-     classinfo = []  # a list that will hold one dict per course
-     url = 'http://www.jikexueyuan.com/course/?pageNum=1'
-     jikespider = spider()  # instantiate
-     all_links = jikespider.changepage(url, 2)  # URLs for pages 1 through 2; raise the second argument to crawl more pages
-     for link in all_links:
-         print('Reading page: ' + link)
-         html = jikespider.getsource(link)  # fetch the source of the current page
-         everyclass = jikespider.geteveryclass(html)  # all <li> blocks on the current page
-         for each in everyclass:
-             info = jikespider.getinfo(each)  # extract the fields
-             classinfo.append(info)  # add to the list
-     jikespider.saveinfo(classinfo)  # write to file
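The regular expressions in the spider above can be tried offline, without hitting the site. The sketch below runs the same patterns against a hand-written HTML fragment. `SAMPLE` is an invented snippet shaped to match those patterns, not real markup from the site, and `page_urls` and `parse_courses` are hypothetical function names.

```python
import re

# Invented fragment, shaped to match the patterns used by the spider.
SAMPLE = '''
<ul>
<li id="1">
  <img src="a.png" alt="Python basics">
  <p style="display: none;">Intro to Python syntax</p>
  <em>1h 30m</em><em>Beginner</em>
  <em class="learn-number">1024</em>
</li>
</ul>
'''


def page_urls(url, total_page):
    """Return the URLs from the page named in `url` through total_page."""
    start = int(re.search(r'pageNum=(\d+)', url).group(1))
    return [re.sub(r'pageNum=\d+', 'pageNum=%d' % i, url)
            for i in range(start, total_page + 1)]


def parse_courses(source):
    """Extract one dict per <li id=...> block found in the HTML source."""
    courses = []
    for block in re.findall(r'(<li id=.*?</li>)', source, re.S):
        # Only attribute-free <em> tags match here, so the learn-number tag is skipped.
        time_and_level = re.findall(r'<em>(.*?)</em>', block, re.S)
        courses.append({
            'title': re.search(r'alt="(.*?)"', block, re.S).group(1),
            'content': re.search(r'display: none;">(.*?)</p>', block, re.S).group(1),
            'classtime': time_and_level[0],
            'classlevel': time_and_level[1],
            'learnnum': re.search(r'"learn-number">(.*?)</em>', block, re.S).group(1),
        })
    return courses


if __name__ == "__main__":
    print(page_urls('http://www.jikexueyuan.com/course/?pageNum=1', 3))
    print(parse_courses(SAMPLE))
```

Patterns like these depend on the exact markup, so they break when the site changes its layout. For anything beyond notes, an HTML parser is usually the more robust choice.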