1. Installing Requests
Windows: pip install requests
Linux: sudo pip install requests
If installation is slow from within China, download the package manually from:
http://www.lfd.uci.edu/~gohlke/pythonlibs/

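Find requests on that page and download the .whl file.
Rename the .whl extension to .zip, extract the archive, and copy the requests folder into Python's Lib directory.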
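
After copying the folder by hand, it is worth checking that Python can actually find the package. A minimal sketch using only the standard library; the helper name is_installed is made up for this note:

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate `package_name` on its import path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == '__main__':
    print('requests found:', is_installed('requests'))
```

If this prints False, the folder most likely went into the Lib directory of a different Python installation than the one running the script.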

2. Fetching a page's content

import requests
useragent = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36'}
html = requests.get("http://tieba.baidu.com/f?ie=utf-8&kw=python",headers=useragent)
print(html.text)
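
The call above assumes the request succeeds. A slightly more defensive sketch adds a timeout and a status check. The names fetch_text and get are hypothetical, introduced for this note; the get parameter exists only so the function can be tried with a stand-in instead of the network:

```python
def fetch_text(url, headers=None, timeout=10, get=None):
    """Return the body of `url` as text, raising on HTTP error statuses."""
    if get is None:
        import requests  # imported here so a stand-in `get` works without the network
        get = requests.get
    response = get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raise for 4xx/5xx instead of returning an error page
    return response.text
```

Called as fetch_text(url, headers=useragent), it behaves like the snippet above but fails loudly when the server answers with an error or does not answer within the timeout.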

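3. Submitting data to a web page
GET retrieves data from the server.
POST sends data to the server.
GET passes its parameters in the query string of the URL.
POST carries its data in the request body, not in the headers.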
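
To make the difference concrete, the sketch below uses only the standard library to show where the same parameters end up in each case. The example.com URL and the parameter values are placeholders. With Requests, passing params= to requests.get or data= to requests.post performs this encoding for you.

```python
from urllib.parse import urlencode

params = {'ie': 'utf-8', 'kw': 'python'}

# GET: the parameters become the query string of the URL.
get_url = 'http://example.com/search?' + urlencode(params)

# POST: the URL stays bare and the same encoded string travels in the
# request body, announced by a Content-Type header.
post_url = 'http://example.com/search'
post_body = urlencode(params)
post_headers = {'Content-Type': 'application/x-www-form-urlencoded'}

print(get_url)
print(post_body)
```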

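Data that a page loads via AJAX does not appear in the page source. To get it, send the same request the page itself sends, which is often a POST: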

data={
'type':'',
'sort':'',
'currentPage':''
}
html_text = requests.post("http://xxxxxx/student/courses/searchCourses",data=data)
print(html_text.text)
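
Endpoints like this are usually paged, so the same POST is repeated with a different currentPage. A sketch under that assumption: fetch_pages, the post parameter and the 1-based page numbers are hypothetical, and what the blank type and sort fields should contain depends on the site.

```python
def fetch_pages(url, total_pages, post=None):
    """POST the form once per page and return the response bodies in order."""
    if post is None:
        import requests  # real network call unless a stand-in is supplied
        post = requests.post
    pages = []
    for page in range(1, total_pages + 1):
        data = {'type': '', 'sort': '', 'currentPage': str(page)}
        response = post(url, data=data, timeout=10)
        pages.append(response.text)
    return pages
```

Passing the posting function in as an argument keeps the loop easy to try out without touching the server.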

---------------------------------------------------------------------------------------------

A small example, from notes taken while watching a Jikexueyuan (极客学院) video:

# -*- coding: utf-8 -*-
import re

import requests


class spider(object):
    def changepage(self, url, total_page):
        # Build the page URLs from the current pageNum up to total_page.
        now_page = int(re.search(r'pageNum=(\d+)', url, re.S).group(1))
        page_group = []
        for i in range(now_page, total_page + 1):
            # Pass flags by keyword: the 4th positional argument of re.sub is count.
            link = re.sub(r'pageNum=\d+', 'pageNum=%s' % i, url, flags=re.S)
            page_group.append(link)
        return page_group

    def getsource(self, url):
        # Download one page and return its HTML.
        html = requests.get(url)
        return html.text

    def geteveryclass(self, source):
        # Each course sits in its own <li id=...>...</li> block.
        everyclass = re.findall(r'(<li id=.*?</li>)', source, re.S)
        return everyclass

    def getinfo(self, eachclass):
        # Extract the fields of a single course into a dict.
        info = {}
        info['title'] = re.search(r'alt="(.*?)"', eachclass, re.S).group(1)
        info['content'] = re.search(r'display: none;">(.*?)</p>', eachclass, re.S).group(1)
        timeandlevel = re.findall(r'<em>(.*?)</em>', eachclass, re.S)
        info['classtime'] = timeandlevel[0]
        info['classlevel'] = timeandlevel[1]
        info['learnnum'] = re.search(r'"learn-number">(.*?)</em>', eachclass, re.S).group(1)
        return info

    def saveinfo(self, classinfo):
        # open(path + filename, mode). Common modes: r read-only, r+ read/write,
        # w create (overwrites an existing file), a append, b binary.
        with open('info.txt', 'a', encoding='utf-8') as f:
            for each in classinfo:
                f.write('title:' + each['title'] + '\n')
                # f.write('content:' + each['content'] + '\n')
                # f.write('classtime:' + each['classtime'] + '\n')
                # f.write('classlevel:' + each['classlevel'] + '\n')
                # f.write('learnnum:' + each['learnnum'] + '\n\n')


if __name__ == '__main__':
    classinfo = []  # list that will hold one dict per course
    url = 'http://www.jikexueyuan.com/course/?pageNum=1'
    jikespider = spider()  # instantiate
    all_links = jikespider.changepage(url, 2)  # URLs for pages 1 to 2
    for link in all_links:
        print('Reading: ' + link)
        html = jikespider.getsource(link)  # source of the current page
        everyclass = jikespider.geteveryclass(html)  # every <li> block on the page
        for each in everyclass:
            info = jikespider.getinfo(each)  # extract the fields
            classinfo.append(info)  # add to the list
    jikespider.saveinfo(classinfo)  # write everything to the file
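
The regular expressions are the fragile part of this spider, so it helps to try them on a saved fragment before hitting the site. The HTML below is invented to fit the patterns used in getinfo, and parse_course is a hypothetical stand-alone version of that method; the real page's markup may differ.

```python
import re

# Invented sample of one course entry, shaped to fit the patterns in getinfo.
SAMPLE = '''<li id="course_1">
<img src="a.png" alt="Python basics">
<p style="display: none;">An introductory course.</p>
<em>2 hours</em><em>Beginner</em>
<em class="learn-number">1234</em>
</li>'''


def parse_course(fragment):
    """Extract the same fields as getinfo from one <li> block."""
    time_and_level = re.findall(r'<em>(.*?)</em>', fragment, re.S)
    return {
        'title': re.search(r'alt="(.*?)"', fragment, re.S).group(1),
        'content': re.search(r'display: none;">(.*?)</p>', fragment, re.S).group(1),
        'classtime': time_and_level[0],
        'classlevel': time_and_level[1],
        'learnnum': re.search(r'"learn-number">(.*?)</em>', fragment, re.S).group(1),
    }


blocks = re.findall(r'(<li id=.*?</li>)', SAMPLE, re.S)
for block in blocks:
    print(parse_course(block))
```

If a pattern finds nothing, re.search returns None and the .group(1) call raises AttributeError, which is the usual symptom of the site having changed its markup.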
