There are many programs that can be used to extract bulk information from a web site, including browser extensions and some web services. Depending on your browser, tools like Readability (which helps extract text from a page) or DownThemAll (which allows you to download many files at once) will help you automate some tedious tasks, while Chrome’s Scraper extension was explicitly built to extract tables from web sites. Developer extensions like FireBug (for Firefox, the same thing is already included in Chrome, Safari and IE) let you track exactly how a web site is structured and what communications happen between your browser and the server.

ScraperWiki is a web site that allows you to code scrapers in a number of different programming languages, including Python, Ruby and PHP. If you want to get started with scraping without the hassle of setting up a programming environment on your computer, this is the way to go. Other web services, such as Google Spreadsheets and Yahoo! Pipes also allow you to perform some extraction from other web sites.

- See more at: http://datajournalismhandbook.org/1.0/en/getting_data_3.html#sthash.l3Zv6bi9.dpuf

Tools that help you scrape web data----帮助你收集web数据的工具的更多相关文章

  1. 关于将dede织梦data目录迁移出web目录

    关于将dede织梦data目录迁移出web目录织梦官方提供了一个教程,但是如果你是按照他们提供的教程做的话会出现很多问题.比如验证码问题,图片显示问题等等一大堆.织梦官方这种是很不负责任的,因为那个教 ...

  2. Python Web-第二周-正则表达式(Using Python to Access Web Data)

    0.课程地址与说明 1.课程地址:https://www.coursera.org/learn/python-network-data/home/welcome 2.课程全名:Using Python ...

  3. 【Python学习笔记】Coursera课程《Using Python to Access Web Data》 密歇根大学 Charles Severance——Week6 JSON and the REST Architecture课堂笔记

    Coursera课程<Using Python to Access Web Data> 密歇根大学 Week6 JSON and the REST Architecture 13.5 Ja ...

  4. 【Python学习笔记】Coursera课程《Using Python to Access Web Data 》 密歇根大学 Charles Severance——Week2 Regular Expressions课堂笔记

    Coursera课程<Using Python to Access Web Data > 密歇根大学 Charles Severance Week2 Regular Expressions ...

  5. web.input()和web.data() 遇到特殊字符

    使用web.py的时候,web.input()和web.data() 都可以接收用户从浏览器端输入的参数. web.input()方法返回一个包含从url(GET方法)或http header(POS ...

  6. Dynamic Data linq to SQL Web Application

    微软提供了一个数据驱动网站模板,可以自动生成CRUD页面,使用过程中碰到些问题 1.首先是如何应用,只需要创建个context并且在Global.asax里面加入下面这一句就可以了 DefaultMo ...

  7. 《Using Python to Access Web Data》 Week5 Web Services and XML 课堂笔记

    Coursera课程<Using Python to Access Web Data> 密歇根大学 Week5 Web Services and XML 13.1 Data on the ...

  8. 《Using Python to Access Web Data》Week4 Programs that Surf the Web 课堂笔记

    Coursera课程<Using Python to Access Web Data> 密歇根大学 Week4 Programs that Surf the Web 12.3 Unicod ...

  9. 《Using Python to Access Web Data》 Week3 Networks and Sockets 课堂笔记

    Coursera课程<Using Python to Access Web Data> 密歇根大学 Week3 Networks and Sockets 12.1 Networked Te ...

随机推荐

  1. 技术名词解释——Camus

    由LinkedIn公司开发的消息队列同步框架,提供将Kafka(一种消息队列框架)的数据装载到Hadoop分布式文件系统(HDFS)的功能. 英文版原文出处:http://docs.confluent ...

  2. rman全备份异机恢复

    一.测试环境 [oracle@localhost ~]$ uname -a Linux localhost.localdomain -.el6.x86_64 # SMP Tue May :: EDT ...

  3. hdu 3234 Exclusive-OR

    Exclusive-OR 题意:输入n个点和Q次操作(1 <= n <= 20,000, 2 <= Q <= 40,000).操作和叙述的点标号k(0 < k < ...

  4. C++ map映射的使用方法

    今天考试做了道题,用上了map,这是一道提高组联赛难度的题目,先发题目: ****************************** 1. A-B problem( dec.c/cpp/pas) . ...

  5. [转载]C#读取Excel几种方法的体会

    C#读取Excel几种方法的体会 转载地址:http://developer.51cto.com/art/201302/380622.htm (1) OleDb: 用这种方法读取Excel速度还是非常 ...

  6. http://sofar.blog.51cto.com/353572/1540874

    http://sofar.blog.51cto.com/353572/1540874 http://singlefly.blog.51cto.com/4658189/1368579 http://ww ...

  7. 汇编 db,dw,dd的区别

    db定义字节类型变量,一个字节数据占1个字节单元,读完一个,偏移量加1 dw定义字类型变量,一个字数据占2个字节单元,读完一个,偏移量加2 dd定义双字类型变量,一个双字数据占4个字节单元,读完一个, ...

  8. ANDROID_MARS学习笔记_S01原始版_005_ProgressBar

    一.代码 1.xml(1)main.xml <?xml version="1.0" encoding="utf-8"?> <LinearLay ...

  9. SSH框架中配置Hibernate使用proxool连接池

    一.导入proxool.jar包 案例用的是proxool-0.8.3.jar,一般通过MyEclipse配置的SSH都会包含这个jar,如果没有,就去网上搜下下载导入就好了. 二.新建Proxool ...

  10. HDU 4009 不定根最小树形图

    讲一下建图过程,首先建立一个超级源点S,对于这个源点,向每个HOUSE连一条有向边,权值为该HOUSE建立WELL的费用,即高度*X. 然后每个可以连边的WELL之间,费用为曼哈顿距离*Y,然后考虑两 ...