Scraping Sina Weather with XPath

References:

http://cuiqingcai.com/1052.html

http://cuiqingcai.com/2621.html

http://www.cnblogs.com/jixin/p/5131040.html

Full code (Python 2):

 # -*- coding:utf-8 -*-
 import urllib2
 from lxml import etree
 # Send a browser User-Agent so the site serves the normal page
 user_agent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0"
 headers = {'User-Agent': user_agent}
 base_url = 'http://weather.sina.com.cn/'
 city = raw_input("Enter the pinyin of the city you want the weather for, e.g. beijing\n").strip()
 url = base_url + city
 req = urllib2.Request(url, headers=headers)
 response = urllib2.urlopen(req)
 text = response.read()
 html = etree.HTML(text)
 # Each forecast day normally has two icons and two time spans (day and night),
 # 20 entries in total. Sometimes today's entries number only 19, so the first
 # entry is kept on its own and the remaining ones are joined in pairs.
 # (The original version used range(20, 2), which is empty, and compared the
 # argument against the global lists; both bugs are fixed here.)
 def pair_entries(values, sep):
     paired = []
     start = 0
     if len(values) % 2 == 1:
         paired.append(values[0])
         start = 1
     for i in range(start, len(values) - 1, 2):
         paired.append(values[i] + sep + values[i + 1])
     return paired
 note1 = html.xpath('//*[@class="wt_tt0_note"]')
 note2 = html.xpath('//*[@class="wt_tt0_note"]/..')
 dates = html.xpath('//*[@class="wt_fc_c0_i_date"]')
 # The weekday is the element immediately after each date
 days = html.xpath('//*[@class="wt_fc_c0_i_date"]/following-sibling::*[1]')
 # u'转' ("turning to") joins day and night weather, e.g. 晴转多云 (sunny turning cloudy)
 icons = pair_entries(html.xpath('//*[@class="wt_fc_c0_i_icons clearfix"]/img/@alt'), u'转')
 # u'到' ("to") joins the start and end of each time span
 times = pair_entries([span.text for span in html.xpath('//*[@class="wt_fc_c0_i_times"]/span')], u'到')
 temps = html.xpath('//*[@class="wt_fc_c0_i_temp"]')
 tips = html.xpath('//*[@class="wt_fc_c0_i_tip"]')
 ls = html.xpath('//*[@class="l"]')  # PM2.5 values
 rs = html.xpath('//*[@class="r"]')  # air quality
 print note1[0].text, note2[0].text
 # PM2.5 and air quality are only available for the first 7 days
 for i in range(7):
     print dates[i].text, days[i].text, times[i], icons[i], temps[i].text, tips[i].text, u'PM2.5:' + ls[i].text, u'Air quality:' + rs[i].text
 for i in range(7, 10):
     print dates[i].text, days[i].text, times[i], icons[i], temps[i].text, tips[i].text
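The script above targets Python 2 (`urllib2`, `raw_input`, `print` statements), none of which exist in Python 3. The sketch below shows one way the request step could be written with Python 3's `urllib.request`; `build_request` and `fetch_page` are hypothetical helper names, and it has not been checked against the live site, whose layout and class names (such as `wt_fc_c0_i_date`) may have changed since this post was written.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://weather.sina.com.cn/"
# Same browser-like User-Agent as the Python 2 version
USER_AGENT = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0")


def build_request(city):
    """Build a GET request for a city given in pinyin, e.g. 'beijing' (hypothetical helper)."""
    # quote() percent-encodes spaces or other unsafe characters in the user input
    url = BASE_URL + urllib.parse.quote(city.strip())
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_page(city, timeout=10):
    """Download the page and return its raw bytes (hypothetical helper, needs network access)."""
    with urllib.request.urlopen(build_request(city), timeout=timeout) as response:
        return response.read()
```

Usage would be `text = fetch_page(input("City pinyin, e.g. beijing\n"))`; `lxml.etree.HTML` accepts the returned bytes, so the XPath queries could then be reused unchanged.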
