A scheduled crawler for the day's free apps: Scrapy + Tkinter + LaunchControl

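I spent a weekend learning Scrapy, and since I had long wanted to buy MindNode, I wrote a small crawler on the side that fetches the daily list of limited-time free apps from ifanr (爱范儿).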

Thinking

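The rough idea is to have LaunchControl run the crawler script at a fixed time every day (say 9:50 a.m., shortly after I get to the office). If an app I am interested in turns up as free for a limited time, the script pops up a notice with Tkinter. The scheduling could also be handled on the Scrapy side, but I will leave that for later.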

Coding

Scrapy + Tkinter

# -*- coding: utf-8 -*-
# Ported to Python 3: the original used the Python 2 module name `Tkinter`
# and the deprecated `response.body_as_unicode()`.
import json
import tkinter

import scrapy

# Names of the apps we are interested in, in lowercase
I_want_apps = {'mindnode pro', 'u.memory'}


class XianmianSpider(scrapy.Spider):
    name = "xianmian"
    user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'
    allowed_domains = ["app.so"]
    start_urls = (
        'http://app.so/api/v1.1/appso/discount/?platform=web&limit=10',
    )

    def parse(self, response):
        # The endpoint returns JSON; the app list sits under 'objects'
        jsonresponse = json.loads(response.text)
        apps = jsonresponse['objects']
        appTitles = {item['display_name'].lower() for item in apps}
        self.logger.info("today's apps are: " + str(appTitles))
        # Intersect today's titles with the wish list
        the_apps = appTitles & I_want_apps
        if the_apps:
            self.showMsg('found the apps: {}'.format(list(the_apps)))

    def showMsg(self, msg):
        # Show a small centered window; blocks until the window is closed
        root = tkinter.Tk()
        root.title('Freebie alert!')
        label = tkinter.Label(root, text=msg)
        label.pack()
        center_window(root, 300, 240)
        root.maxsize(600, 400)
        root.minsize(300, 240)
        root.mainloop()


def get_screen_size(window):
    return window.winfo_screenwidth(), window.winfo_screenheight()


def get_window_size(window):
    return window.winfo_reqwidth(), window.winfo_reqheight()


def center_window(root, width, height):
    screenwidth, screenheight = get_screen_size(root)
    # Integer division keeps the offsets whole numbers for the geometry string
    size = '%dx%d+%d+%d' % (width, height, (screenwidth - width) // 2, (screenheight - height) // 2)
    root.geometry(size)
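The matching step in parse can be tried without Scrapy or a network connection. The following is only a minimal sketch: the sample payload is made up for illustration and merely mirrors the 'objects' / 'display_name' shape the spider reads, and the helper name find_wanted_apps is hypothetical.

```python
import json

# Hypothetical payload: it only mirrors the shape the spider expects,
# a top-level 'objects' list whose items carry a 'display_name'.
SAMPLE_BODY = json.dumps({
    "objects": [
        {"display_name": "MindNode Pro"},
        {"display_name": "Some Other App"},
    ]
})

# Wish list, in lowercase so the comparison is case-insensitive
WANTED = {"mindnode pro", "u.memory"}


def find_wanted_apps(body, wanted):
    """Return the wanted app names that appear in the JSON body."""
    apps = json.loads(body)["objects"]
    titles = {item["display_name"].lower() for item in apps}
    return titles & wanted


matches = find_wanted_apps(SAMPLE_BODY, WANTED)
print(sorted(matches))
```

Lowercasing the titles before the set intersection is what lets 'MindNode Pro' in the response match 'mindnode pro' in the wish list.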

LaunchControl

LaunchControl is fairly intuitive to use. Of course, you can also use launchctl, which ships with macOS; see the launchctl documentation for details.
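For the launchctl route, the job is described by a property list file. As a rough sketch, such a file for the 9:50 a.m. schedule could be generated with Python's plistlib. The label, the scrapy path, and the spider file path below are placeholders I made up, not values from my setup; Label, ProgramArguments, and StartCalendarInterval are standard launchd keys.

```python
import plistlib

# Hypothetical values: the label and both paths are placeholders
# and need to be adapted to the local installation.
job = {
    "Label": "com.example.xianmian",
    "ProgramArguments": [
        "/usr/local/bin/scrapy",
        "runspider",
        "/path/to/xianmian_spider.py",
    ],
    # Run once a day at 09:50
    "StartCalendarInterval": {"Hour": 9, "Minute": 50},
}

# Serialize the job description to XML plist bytes
plist_bytes = plistlib.dumps(job)
print(plist_bytes.decode("utf-8"))
```

The output would typically be saved under ~/Library/LaunchAgents with a .plist extension and then loaded with launchctl.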
