Python抓取淘宝IP地址数据

def fetch(ip):
    url = 'http://ip.taobao.com/service/getIpInfo.php?ip=' + ip
    result = []
    try:
        response = urllib.urlopen(url).read()
        jsondata = json.loads(response)
        if jsondata[u'code'] == 0:
            result.append(jsondata[u'data'][u'ip'].encode('utf-8'))
            result.append(jsondata[u'data'][u'country'].encode('utf-8'))
            result.append(jsondata[u'data'][u'country_id'].encode('utf-8'))
            result.append(jsondata[u'data'][u'area'].encode('utf-8'))
            result.append(jsondata[u'data'][u'area_id'].encode('utf-8'))
            result.append(jsondata[u'data'][u'region'].encode('utf-8'))
            result.append(jsondata[u'data'][u'region_id'].encode('utf-8'))
            result.append(jsondata[u'data'][u'city'].encode('utf-8'))
            result.append(jsondata[u'data'][u'city_id'].encode('utf-8'))
            result.append(jsondata[u'data'][u'county'].encode('utf-8'))
            result.append(jsondata[u'data'][u'county_id'].encode('utf-8'))
            result.append(jsondata[u'data'][u'isp'].encode('utf-8'))
            result.append(jsondata[u'data'][u'isp_id'].encode('utf-8'))
        else:
            return 0, result
    except:
        logging.exception("Url open failed:" + url)
        return 0, result
    return 1, result
 
def worker(ratelimit, jobs, results, progress):
    global cancel
    while not cancel:
        try:
            ratelimit.ratecontrol()
            ip = jobs.get(timeout=2) # Wait 2 seconds
            ok, result = fetch(ip)
            if not ok:
                logging.error("Fetch information failed, ip:{}".format(ip))
                progress.put("") # Notify the progress even it failed
            elif result is not None:
                results.put(" ".join(result))
            jobs.task_done()    # Notify one item
        except Queue.Empty:
            pass
        except:
            logging.exception("Unknown Error!")

def process(target, results, progress):
    global cancel
    while not cancel:
        try:
            line = results.get(timeout=5)
        except Queue.Empty:
            pass
        else:
            print >>target, line
            progress.put("")
            results.task_done()

def progproc(progressbar, count, progress):
    """
    Since ProgressBar is not a thread-safe class, we use a Queue to do the counting job, like
    two other threads. Use this thread do the printing of progress bar. By the way, it will
    print to stderr, which does not conflict with the default result output(stdout).
    """
    idx = 1
    while True:
        try:
            progress.get(timeout=5)
        except Queue.Empty:
            pass
        else:
            progressbar.update(idx)
            idx += 1

Python抓取淘宝IP地址数据的更多相关文章

Python 爬取淘宝商品数据挖掘分析实战
Python 爬取淘宝商品数据挖掘分析实战项目内容本案例选择>> 商品类目:沙发: 数量:共100页 4400个商品: 筛选条件:天猫.销量从高到低.价格500元以上. 爬取淘宝商品 ...
简单的抓取淘宝关键字信息、图片的Python爬虫|Python3中级玩家：淘宝天猫商品搜索爬虫自动化工具（第一篇）
Python3中级玩家:淘宝天猫商品搜索爬虫自动化工具(第一篇) 淘宝改字段,Bugfix,查看https://github.com/hunterhug/taobaoscrapy.git 由于Gith ...
python(27) 抓取淘宝买家秀
selenium 是Web应用测试工具,可以利用selenium和python,以及chromedriver等工具实现一些动态加密网站的抓取.本文利用这些工具抓取淘宝内衣评价买家秀图片. 准备工作下 ...
一次Python爬虫的修改，抓取淘宝MM照片
这篇文章是2016-3-2写的,时隔一年了,淘宝的验证机制也有了改变.代码不一定有效,保留着作为一种代码学习. 崔大哥这有篇>>小白爬虫第一弹之抓取妹子图不失为学python爬虫的绝佳教 ...
Python爬虫实战四之抓取淘宝MM照片
原文:Python爬虫实战四之抓取淘宝MM照片其实还有好多,大家可以看 Python爬虫学习系列教程福利啊福利,本次为大家带来的项目是抓取淘宝MM照片并保存起来,大家有没有很激动呢? 本篇目标 1. ...
Python爬虫实战八之利用Selenium抓取淘宝匿名旺旺
更新其实本文的初衷是为了获取淘宝的非匿名旺旺,在淘宝详情页的最下方有相关评论,含有非匿名旺旺号,快一年了淘宝都没有修复这个. 可就在今天,淘宝把所有的账号设置成了匿名显示,SO,获取非匿名旺旺号已经 ...
Python爬虫之一 PySpider 抓取淘宝MM的个人信息和图片
ySpider 是一个非常方便并且功能强大的爬虫框架,支持多线程爬取.JS动态解析,提供了可操作界面.出错重试.定时爬取等等的功能,使用非常人性化. 本篇通过做一个PySpider 项目,来理解 Py ...
甜咸粽子党大战，Python爬取淘宝上的粽子数据并进行分析
前言本文的文字及图片来源于网络,仅供学习.交流使用,不具有任何商业用途,版权归原作者所有,如有问题请及时联系我们以作处理. 爬虫爬取淘宝数据,本次采用的方法是:Selenium控制Chrome浏览 ...
芝麻HTTP：Python爬虫实战之抓取淘宝MM照片
本篇目标 1.抓取淘宝MM的姓名,头像,年龄 2.抓取每一个MM的资料简介以及写真图片 3.把每一个MM的写真图片按照文件夹保存到本地 4.熟悉文件保存的过程 1.URL的格式在这里我们用到的URL ...

随机推荐

yii2 AR需要注意的地方
$model::find(['id'=>1])->one();和$model::findOne(1); 返回的都是一个Obj不能使用foreach遍历,其他都是返回对象数组可以用forea ...
PHP源码阅读笔记一（explode和implode函数分析）
PHP源码阅读笔记一一.explode和implode函数array explode ( string separator, string string [, int limit] )此函数返回由字符 ...
c# .net使用SqlDataReader注意的几点
转自:http://blog.knowsky.com/258608.htm 1.当SqlDataReader没有关闭之前,数据库连接会一直保持open状态,所以在使用SqlDataReader时,使用 ...
JSP页面的五种跳转方法
①RequestDispatcher.forward() 是在服务器端起作用,当使用forward()时,Servlet engine传递HTTP请求从当前的Servlet or JSP到另外一个Se ...
单片微机原理P0：80C51结构原理
本来我真的不想让51的东西出现在我的博客上的,因为51这种东西真的太low了,学了最多就所谓的垃圾科创利用一下,但是想一下这门课我也要考试,还是写一点东西顺便放博客上吧. 这一系列主要参考<单片 ...
数据库之--- SQLite 语句
一. 基础创表操作: 1. 创建表 CREATE TABLE IF NOT EXISTS t_dog(name text, age bolb, weight real); 2. 插入记录 INSERT ...
Rewrite Path in Asp.Net MVC Project
// Change the current path so that the Routing handler can correctly interpret // the request, then ...
IntelliJ 一键添加双引号
IntelliJ IDEA(目前最新版本为2016.2.5) 不直接支持一键向选取内容添加双引号,但可以通过配置 Live Template 实现此功能. 以下为配置方法(针对2016.2.5版,其它 ...
解决方案:安装wordpress出现500 Internal Server Error
做一个资讯站点的时候遇到一个wordpress不知道算不算常见的问题:程序安装的时候提示500 Internal Server Error 那么最终百度谷歌找到以下解决方案: 安装新版本wordpre ...
剖析ECMALL的登录机制
在ecmall.php文件中实例化控制器类,每一个控制器类,必须继承(extends)upload\admin\app\backend.base.php文件.在继承中调用方法是谁先被继承谁的方法被先调 ...

Python抓取淘宝IP地址数据

Python抓取淘宝IP地址数据的更多相关文章

随机推荐

热门专题