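Python learning: scraping web page information

Note: for now the script only prints the lines of the saved page that contain a given keyword. I still need to learn regular expressions, and will refine this feature with them later.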

```python
# -*- coding: utf-8 -*-
import urllib.request


# Fetch a web page and save its raw HTML to the given file
def getHtml(url, fname):
    # fname = 'C:\\Users\\cuiliting\\Desktop\\weather_forecast.txt'
    with urllib.request.urlopen(url) as page:
        # assumes the page is UTF-8 encoded; undecodable bytes are replaced
        html = page.read().decode('utf-8', errors='replace')
    with open(fname, 'w', encoding='utf-8') as fobj:
        fobj.write(html)


# Read the saved file and print every line that contains the keyword
def getWeather(fname, weath_keyword):
    with open(fname, 'r', encoding='utf-8') as fobj:
        for eachline in fobj:
            if weath_keyword in eachline:
                print(eachline, end='')


if __name__ == '__main__':
    # url_input = input("please enter url:")
    # fname_input = input("please enter fname:")
    # weath_keyword_input = input("please enter keywords:")
    url_input = 'http://www.weather.com.cn/weather/101010100.shtml'
    # adjust to a writable path on your machine
    fname_input = 'C:\\Users\\Desktop\\weather_forecast.txt'
    weath_keyword_input = '<h1>10日（明天）</h1>'
    getHtml(url_input, fname_input)
    getWeather(fname_input, weath_keyword_input)
```
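As a first step toward the regex-based version mentioned in the note, here is a minimal sketch that pulls out only the text inside `<h1>` tags instead of printing whole matching lines. The helper name `extract_h1` and the sample HTML snippet are hypothetical illustrations; the snippet is invented and is not the actual markup of weather.com.cn. It uses a non-greedy pattern (`.*?`) so that each match stops at the nearest closing `</h1>`.

```python
import re

# Non-greedy pattern: capture everything between <h1> and the nearest </h1>.
# re.S lets '.' also match newlines, in case a heading spans several lines.
H1_PATTERN = re.compile(r'<h1>(.*?)</h1>', re.S)


def extract_h1(html):
    """Hypothetical helper: return the inner text of all <h1> tags, in document order."""
    return [text.strip() for text in H1_PATTERN.findall(html)]


if __name__ == '__main__':
    # invented sample markup, for illustration only
    sample = '<li><h1>10日（明天）</h1><p class="wea">晴</p></li>'
    print(extract_h1(sample))  # ['10日（明天）']
```

To apply it to the downloaded page, read the file saved by `getHtml()` into a string and pass it to `extract_h1()`. Regular expressions are fragile against real-world HTML (attributes, nesting, line breaks); for anything beyond a quick extraction, Python's standard `html.parser` module is a sturdier choice.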
