MiniCrawler

GitHub Path:

https://github.com/LixinZhang/miniCrowler

Introduction:

MiniCrawler is a simple web crawler implemented in Python.
It uses a thread pool to speed up page fetching.
To configure the crawler, edit config.py; to start a crawl job, run python run.py.
Fetched web pages are stored in the pages folder.
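
The thread-pool fetching described above can be sketched roughly as follows. This is a minimal illustration built on the standard library's concurrent.futures, not MiniCrawler's actual code; the names fetch_page and fetch_all and the max_workers default are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Download one page and return its raw bytes (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch=fetch_page, max_workers=8):
    """Fetch every URL on a pool of worker threads.

    Returns (pages, errors): pages maps url -> content,
    errors maps url -> the exception raised while fetching it.
    """
    pages, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per URL and remember which future belongs to which URL.
        futures = {pool.submit(fetch, url): url for url in urls}
        # Collect results in completion order, so slow hosts do not block fast ones.
        for future in as_completed(futures):
            url = futures[future]
            try:
                pages[url] = future.result()
            except Exception as exc:
                # One failed download should not abort the whole job.
                errors[url] = exc
    return pages, errors
```

Calling fetch_all(list_of_urls) returns the downloaded pages together with any per-URL errors. Because fetching is mostly network-bound, threads can overlap the waiting time even under Python's global interpreter lock.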
check_status.py lets you check the job's status, for example:

Rank            Hostname        Times
----------------------------------------
   1             buaa.edu.cn        40
   2             baixing.com        32
   3             cnblogs.com        29
   4              hao123.com         5
   5           xinhuanet.com         2
   6          visionplaza.cn         2
   7           people.com.cn         2
   8                  org.cn         2
   9                 news.cn         2
  10             most.gov.cn         2
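
A report like the one above could be produced by tallying fetched URLs per hostname. The sketch below is only an assumption about how such a ranking might be computed; the source does not show how check_status.py works, and the names rank_hosts and format_report are hypothetical. It counts by full hostname, whereas the table above appears to group by a shortened domain.

```python
from collections import Counter
from urllib.parse import urlparse


def rank_hosts(urls, top=10):
    """Count URLs per hostname and return the most frequent (host, count) pairs."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname  # None when the string has no network location
        if host:
            counts[host] += 1
    return counts.most_common(top)


def format_report(ranking):
    """Render the ranking as a plain-text table with right-aligned columns."""
    lines = ["Rank            Hostname        Times", "-" * 40]
    for rank, (host, times) in enumerate(ranking, start=1):
        lines.append(f"{rank:>4}{host:>24}{times:>10}")
    return "\n".join(lines)
```

For example, print(format_report(rank_hosts(fetched_urls))) would print the ten most frequent hostnames, where fetched_urls is whatever list of URLs the crawl recorded.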

More Details

You can find more details in my Chinese blog post, "Python 多线程抓取网页" (Fetching Web Pages with Multiple Threads in Python).
