A Brief Look at APScheduler

Introduction

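APScheduler, short for Advanced Python Scheduler, is a lightweight task-scheduling framework for Python. It lets you schedule jobs to run periodically, much like crontab on Linux, and a job can be any Python function or other callable object.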

Installation

(ENV1) [eason@localhost]$ pip install apscheduler
Collecting apscheduler
  Downloading APScheduler-3.3.0-py2.py3-none-any.whl (57kB)
    100% |████████████████████████████████| 61kB 81kB/s
Collecting pytz (from apscheduler)
  Downloading pytz-2016.10-py2.py3-none-any.whl (483kB)
    100% |████████████████████████████████| 491kB 52kB/s
Collecting funcsigs; python_version == "2.7" (from apscheduler)
  Downloading funcsigs-1.0.2-py2.py3-none-any.whl
Requirement already satisfied: six>=1.4.0 in /home/eason/ENV1/lib/python2.7/site-packages (from apscheduler)
Collecting tzlocal>=1.2 (from apscheduler)
  Downloading tzlocal-1.3.tar.gz
Requirement already satisfied: setuptools>=0.7 in /home/eason/ENV1/lib/python2.7/site-packages (from apscheduler)
Collecting futures; python_version == "2.7" (from apscheduler)
  Downloading futures-3.0.5-py2-none-any.whl
Building wheels for collected packages: tzlocal
  Running setup.py bdist_wheel for tzlocal ... done
  Stored in directory: /home/eason/.cache/pip/wheels/80/19/a8/635ad9f4ad8a63b49d073c55cbca31fb5898ce2560ed145a69
Successfully built tzlocal
Installing collected packages: pytz, funcsigs, tzlocal, futures, apscheduler
Successfully installed apscheduler-3.3.0 funcsigs-1.0.2 futures-3.0.5 pytz-2016.10 tzlocal-1.3
(ENV1) [eason@localhost]$

Basic Concepts

APScheduler is built from four kinds of components:

triggers
job stores
executors
schedulers

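Triggers contain the scheduling logic. Each job has its own trigger, which determines when the job runs next. Apart from their initial configuration, triggers are completely stateless.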

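Job stores hold the scheduled jobs. The default job store simply keeps jobs in memory, while the other job stores save them in a database. When a job is saved to a persistent job store, its data is serialized, and it is deserialized again when loaded. A job store must never be shared between schedulers.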

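Executors handle running the jobs. They typically do this by submitting the callable specified in the job to a thread pool or a process pool. When the job finishes, the executor notifies the scheduler.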

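Schedulers tie the other components together. Job stores and executors are configured through the scheduler, and jobs are added, modified and removed through it as well. Different schedulers suit different application environments, and there are seven to choose from: BlockingScheduler, BackgroundScheduler, AsyncIOScheduler, GeventScheduler, TornadoScheduler, TwistedScheduler and QtScheduler.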
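The following toy model in plain Python illustrates how the four components divide the work. It is a simplified sketch, not APScheduler's implementation:

- Every class name (ToyIntervalTrigger, ToyMemoryJobStore, ToyInlineExecutor, ToyScheduler) is hypothetical.
- Time is simulated rather than read from a clock.
- The executor runs jobs inline instead of in a pool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ToyIntervalTrigger:
    """Trigger: holds only its configuration; the previous fire time is passed in."""
    interval: timedelta

    def next_fire_time(self, previous: Optional[datetime], now: datetime) -> datetime:
        return (previous or now) + self.interval


@dataclass
class ToyJob:
    job_id: str
    func: Callable[[], None]
    trigger: ToyIntervalTrigger
    next_run_time: datetime


class ToyMemoryJobStore:
    """Job store: keeps jobs in a plain dict, like an in-memory store."""

    def __init__(self) -> None:
        self.jobs: Dict[str, ToyJob] = {}

    def due_jobs(self, now: datetime) -> List[ToyJob]:
        return [job for job in self.jobs.values() if job.next_run_time <= now]


class ToyInlineExecutor:
    """Executor: runs the callable, then notifies the scheduler that it finished."""

    def submit(self, job: ToyJob, on_done: Callable[[str], None]) -> None:
        job.func()
        on_done(job.job_id)


class ToyScheduler:
    """Scheduler: the only object user code talks to; wires the other parts together."""

    def __init__(self) -> None:
        self.store = ToyMemoryJobStore()
        self.executor = ToyInlineExecutor()
        self.finished: List[str] = []

    def add_job(self, func: Callable[[], None], interval: timedelta,
                job_id: str, now: datetime) -> None:
        trigger = ToyIntervalTrigger(interval)
        first = trigger.next_fire_time(None, now)
        self.store.jobs[job_id] = ToyJob(job_id, func, trigger, first)

    def tick(self, now: datetime) -> None:
        """One pass of the scheduling loop at simulated time `now`."""
        for job in self.store.due_jobs(now):
            self.executor.submit(job, self.finished.append)
            job.next_run_time = job.trigger.next_fire_time(job.next_run_time, now)


# Usage: simulate 20 seconds of wall-clock time, one tick per second
runs: List[str] = []
sched = ToyScheduler()
t0 = datetime(2016, 12, 13, 14, 41, 0)
sched.add_job(lambda: runs.append("job1 ran"), timedelta(seconds=5), "job1", now=t0)
for s in range(21):
    sched.tick(t0 + timedelta(seconds=s))
```

The trigger never stores when it last fired; the scheduler passes that in. This matches the point above that triggers are stateless apart from their configuration.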

Choosing a scheduler

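BlockingScheduler : use when the scheduler is the only thing running in your process.
BackgroundScheduler : use when you are not using any of the frameworks below and want the scheduler to run in the background inside your application.
AsyncIOScheduler : use when your application uses asyncio (an asynchronous framework).
GeventScheduler : use when your application uses gevent (a high-performance Python concurrency framework).
TornadoScheduler : use when your application is built on Tornado (a web framework).
TwistedScheduler : use when your application uses Twisted (an asynchronous framework).
QtScheduler : use when your application is a Qt application.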
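The practical difference between the first two schedulers is where the scheduling loop runs. The following sketch is a rough analogy in plain threading, not APScheduler code, and scheduler_loop is a hypothetical name. BlockingScheduler is like running the loop in the calling thread, while BackgroundScheduler is like running it in a separate thread so the rest of the program keeps going:

```python
import threading
import time
from typing import Callable


def scheduler_loop(stop: threading.Event, interval: float, job: Callable[[], None]) -> None:
    """Run `job` every `interval` seconds until `stop` is set."""
    # Event.wait() returns False on timeout (run the job) and True once stop is set
    while not stop.wait(interval):
        job()


ticks = []
stop = threading.Event()

# "Background" style: the loop lives in a daemon thread, so the main thread stays free
worker = threading.Thread(
    target=scheduler_loop,
    args=(stop, 0.01, lambda: ticks.append(time.monotonic())),
    daemon=True,
)
worker.start()

main_result = sum(range(1000))  # the main thread carries on with its own work
time.sleep(0.2)                 # ...while the loop ticks in the background
stop.set()
worker.join()

# "Blocking" style would call scheduler_loop(stop, ...) directly in the main thread;
# that call only returns after another thread or a signal handler sets `stop`.
```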

Choosing a job store

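If your application recreates its jobs every time it starts, the default job store (MemoryJobStore) is enough. If you need jobs to survive a scheduler restart or an application crash, pick a persistent job store that suits your environment, for example the MongoDB job store or the SQLAlchemy job store (which supports most RDBMSs).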
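The sketch below shows why a persistent job store survives a restart, using only the standard library (sqlite3 and json). The table layout and the helpers save_job, load_jobs and resolve are hypothetical. APScheduler's own SQLAlchemy job store pickles job state rather than using JSON.

What the sketch shares with APScheduler is that the callable is stored as a reference it can import again, not as code. Here that reference is a "module:attribute" string, the same form APScheduler accepts as a textual reference for a job's function. As a consequence, a persisted job's callable must be importable at module level, and a lambda cannot be restored this way.

```python
import importlib
import json
import os
import sqlite3
import tempfile


def resolve(ref: str):
    """Turn a textual 'module:attribute' reference back into the object it names."""
    module_name, _, attr_path = ref.partition(":")
    obj = importlib.import_module(module_name)
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj


def save_job(conn: sqlite3.Connection, job_id: str, func_ref: str, args: list) -> None:
    """Serialize the job (callable by reference) into a hypothetical 'jobs' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS jobs (id TEXT PRIMARY KEY, state TEXT)")
    state = json.dumps({"func": func_ref, "args": args})
    conn.execute("INSERT OR REPLACE INTO jobs (id, state) VALUES (?, ?)", (job_id, state))
    conn.commit()


def load_jobs(conn: sqlite3.Connection) -> dict:
    """Deserialize every stored job back into a (callable, args) pair."""
    jobs = {}
    for job_id, state in conn.execute("SELECT id, state FROM jobs"):
        data = json.loads(state)
        jobs[job_id] = (resolve(data["func"]), data["args"])
    return jobs


# Usage: save a job, "restart" by reopening the database file, then load it back
db_path = os.path.join(tempfile.mkdtemp(), "jobs.sqlite")
first_run = sqlite3.connect(db_path)
save_job(first_run, "sqrt_job", "math:sqrt", [16])
first_run.close()

second_run = sqlite3.connect(db_path)
restored = load_jobs(second_run)
second_run.close()
func, args = restored["sqrt_job"]
result = func(*args)
```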

About executors

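Which executor to use depends on which of the frameworks above you are using. In most cases the default ThreadPoolExecutor is sufficient. If your jobs involve CPU-intensive work, consider ProcessPoolExecutor so that more CPU cores can be used. You can also use both at once, adding ProcessPoolExecutor as a second executor.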
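APScheduler's ThreadPoolExecutor and ProcessPoolExecutor are built on the standard concurrent.futures pools. On Python 2.7 that means the futures backport, which is why the installation log pulls in futures.

The sketch below uses the standard library directly, not APScheduler, to show the submit-then-notify cycle described earlier. The names io_job and notify_scheduler are hypothetical. It uses only a thread pool. A concurrent.futures.ProcessPoolExecutor has the same submit() interface but requires picklable, module-level callables.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Tuple

events: List[Tuple[str, str, object]] = []  # what the "scheduler" hears back


def io_job(n: int) -> int:
    """Stand-in for an I/O-bound job (for example an HTTP request or a DB query)."""
    return n * 2


def notify_scheduler(job_id: str) -> Callable[[Future], None]:
    """Build a completion callback that reports success or failure for job_id."""
    def callback(future: Future) -> None:
        exc = future.exception()
        if exc is None:
            events.append((job_id, "done", future.result()))
        else:
            events.append((job_id, "error", repr(exc)))
    return callback


# Executors looked up by alias; "default" mirrors the name of the default executor
executors = {"default": ThreadPoolExecutor(max_workers=4)}

future = executors["default"].submit(io_job, 21)
future.add_done_callback(notify_scheduler("job-1"))

# shutdown(wait=True) joins the worker threads, so the callback has run by now
executors["default"].shutdown(wait=True)
```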

About triggers

When you schedule a job, you choose a trigger for it, which describes when the job will fire. APScheduler has three built-in trigger types:

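date: run once at a specific date/time
interval: run at a fixed interval, optionally only within a given time range
cron: modeled on Linux crontab syntax (with extras such as a seconds field); the most powerful of the three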

date is the most basic kind of schedule: the job runs only once. Its parameters are:

run_date (datetime|str) – the date/time to run the job at
timezone (datetime.tzinfo|str) – the time zone to use

Examples:

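from datetime import date, datetime

# Assumes sched is a scheduler instance and job_function accepts one argument

# Run job_function once on 2016-12-12 (a bare date means 00:00:00 that day)
sched.add_job(job_function, 'date', run_date=date(2016, 12, 12), args=['text'])

# Run job_function once at 2016-12-12 12:00:00
sched.add_job(job_function, 'date', run_date=datetime(2016, 12, 12, 12, 0, 0), args=['text'])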
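Here is a sketch of the date trigger's behavior in plain Python. First, the run date is normalized to a time-zone-aware datetime. Then the trigger yields that single time once and nothing afterwards. The helpers to_run_datetime and date_next_fire_time are hypothetical; they mirror the semantics described above rather than APScheduler's code. The fixed UTC+8 zone is an assumption for the demo.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Optional, Union

# Assumption for the demo: a fixed UTC+8 offset (China Standard Time)
CST = timezone(timedelta(hours=8))


def to_run_datetime(run_date: Union[date, datetime], tz: timezone) -> datetime:
    """A bare date means midnight of that day; a naive datetime gets the given zone."""
    if not isinstance(run_date, datetime):
        run_date = datetime.combine(run_date, time())
    return run_date if run_date.tzinfo is not None else run_date.replace(tzinfo=tz)


def date_next_fire_time(run_date: datetime,
                        previous_fire_time: Optional[datetime]) -> Optional[datetime]:
    """A one-shot trigger: fire at run_date if it has not fired yet, otherwise never."""
    return run_date if previous_fire_time is None else None


run_at = to_run_datetime(date(2016, 12, 12), CST)
first = date_next_fire_time(run_at, None)     # the single scheduled run
second = date_next_fire_time(run_at, first)   # nothing after that
```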

interval runs a job periodically at a fixed interval. Its parameters are:

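weeks (int) – number of weeks to wait
days (int) – number of days to wait
hours (int) – number of hours to wait
minutes (int) – number of minutes to wait
seconds (int) – number of seconds to wait
start_date (datetime|str) – starting point for the interval calculation
end_date (datetime|str) – latest possible date/time to trigger on
timezone (datetime.tzinfo|str) – the time zone to use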

Example:

# Run job_function every two hours

sched.add_job(job_function, 'interval', hours=2)
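The five unit parameters add up to a single step. The minimal sketch below makes these assumptions:

- Fire times are start_date, start_date + step, and so on.
- end_date is inclusive.
- When start_date is omitted, APScheduler's first run comes one interval after the job is added; the sketch always takes an explicit start.
- Time zones and daylight-saving transitions are ignored.

The helper interval_fire_times is hypothetical and not part of APScheduler.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def interval_fire_times(start: datetime, end: Optional[datetime], *, weeks: int = 0,
                        days: int = 0, hours: int = 0, minutes: int = 0,
                        seconds: int = 0, limit: int = 10) -> List[datetime]:
    """List start, start + step, ... up to end (inclusive) or at most `limit` items."""
    step = timedelta(weeks=weeks, days=days, hours=hours, minutes=minutes, seconds=seconds)
    if step <= timedelta(0):
        raise ValueError("the interval must be positive")
    times: List[datetime] = []
    current = start
    while (end is None or current <= end) and len(times) < limit:
        times.append(current)
        current += step
    return times


# Every two hours between 00:00 and 06:00 on 2016-12-13
every_two_hours = interval_fire_times(
    datetime(2016, 12, 13, 0, 0), datetime(2016, 12, 13, 6, 0), hours=2)
```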

The cron trigger takes the following parameters:

year (int|str) – 4-digit year
month (int|str) – month (1-12)
day (int|str) – day of the month (1-31)
week (int|str) – ISO week (1-53)
day_of_week (int|str) – number or name of the weekday (0-6, where 0 is Monday, or mon,tue,wed,thu,fri,sat,sun)
hour (int|str) – hour (0-23)
minute (int|str) – minute (0-59)
second (int|str) – second (0-59)
start_date (datetime|str) – earliest possible date/time to trigger on (inclusive)
end_date (datetime|str) – latest possible date/time to trigger on (inclusive)
timezone (datetime.tzinfo|str) – the time zone to use

Expression syntax:

Expression	Field	Description
*	any	Fire on every value
*/a	any	Fire every a values, starting from the minimum
a-b	any	Fire on any value within the a-b range (a must be smaller than b)
a-b/c	any	Fire every c values within the a-b range
xth y	day	Fire on the x-th occurrence of weekday y within the month
last x	day	Fire on the last occurrence of weekday x within the month
last	day	Fire on the last day within the month
x,y,z	any	Fire on any matching expression; can combine any number of any of the above expressions
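The numeric forms in the table (*, */a, a-b, a-b/c and x,y,z) can be illustrated with a small expander. The helper expand_field is hypothetical and written only for illustration; it is not APScheduler's parser, which also understands weekday names and the day-only expressions (xth y, last x, last).

```python
from typing import List


def expand_field(expr: str, minimum: int, maximum: int) -> List[int]:
    """Expand a numeric cron field expression into the sorted values it matches.

    Handles '*', '*/a', 'a-b', 'a-b/c', plain numbers and comma-separated lists.
    """
    values = set()
    for part in expr.split(","):
        rng, has_step, step_text = part.partition("/")
        step = int(step_text) if has_step else 1
        if step <= 0:
            raise ValueError(f"step must be positive in {part!r}")
        if rng == "*":
            first, last = minimum, maximum
        elif "-" in rng:
            first, last = (int(x) for x in rng.split("-", 1))
        elif has_step:
            raise ValueError(f"a step needs '*' or a range in {part!r}")
        else:
            first = last = int(rng)
        if not minimum <= first <= last <= maximum:
            raise ValueError(f"{part!r} is outside {minimum}-{maximum}")
        values.update(range(first, last + 1, step))
    return sorted(values)
```

For example, expand_field("6-8,11-12", 1, 12) gives the months used in the first cron example below.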

Examples:

# Run job_function at 00:00, 01:00, 02:00 and 03:00 on the third Friday of June, July, August, November and December

sched.add_job(job_function, 'cron', month='6-8,11-12', day='3rd fri', hour='0-3')

# Run job_function at 05:30 every Monday to Friday, until 2016-12-31 00:00:00

sched.add_job(job_function, 'cron', day_of_week='mon-fri', hour=5, minute=30, end_date='2016-12-31')
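The standard calendar module is enough to see which days an expression such as '3rd fri' selects. The helpers weekday_dates, nth_weekday and last_weekday are hypothetical, not APScheduler functions.

In the first example above, hour='0-3' is the least significant field given, so minute and second default to their minimum (0). The job therefore fires at 00:00, 01:00, 02:00 and 03:00 on each of these days.

```python
import calendar
from datetime import date
from typing import List

WEEKDAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]  # index 0 = Monday


def weekday_dates(year: int, month: int, weekday: str) -> List[date]:
    """All dates in the month that fall on the given weekday name."""
    wd = WEEKDAYS.index(weekday)
    _, days_in_month = calendar.monthrange(year, month)
    return [date(year, month, d) for d in range(1, days_in_month + 1)
            if calendar.weekday(year, month, d) == wd]


def nth_weekday(year: int, month: int, n: int, weekday: str) -> date:
    """The day matched by an 'xth y' expression, e.g. n=3, weekday='fri' for '3rd fri'."""
    matches = weekday_dates(year, month, weekday)
    if not 1 <= n <= len(matches):
        raise ValueError(f"{year}-{month:02d} has no occurrence #{n} of {weekday}")
    return matches[n - 1]


def last_weekday(year: int, month: int, weekday: str) -> date:
    """The day matched by a 'last y' expression."""
    return weekday_dates(year, month, weekday)[-1]


# Days matched by month='6-8,11-12', day='3rd fri' in 2016
third_fridays = [nth_weekday(2016, m, 3, "fri") for m in (6, 7, 8, 11, 12)]
```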

Practice

There are two ways to add a job: call the add_job() method, or use the scheduled_job() decorator.

Using add_job()

from apscheduler.schedulers.blocking import BlockingScheduler
import datetime

def my_job1():
    print('my_job1 is running, Now is %s' % datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

def my_job2():
    print('my_job2 is running, Now is %s' % datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

sched = BlockingScheduler()

# Run my_job1 every 5 seconds, counted from when the job is added (interval trigger)
sched.add_job(my_job1, 'interval', seconds=5, id='my_job1')

# Run my_job2 whenever the wall-clock second is a multiple of 5 (cron trigger)
sched.add_job(my_job2, 'cron', second='*/5', id='my_job2')

# BlockingScheduler.start() blocks the current thread until the scheduler is shut down
sched.start()

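Output (my_job2 uses a cron trigger and fires whenever the wall-clock second is a multiple of 5, while my_job1 uses an interval trigger and fires every 5 seconds counted from when it was added, so the two jobs interleave):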

my_job2 is running, Now is 2016-12-13 14:41:10
my_job1 is running, Now is 2016-12-13 14:41:12
my_job2 is running, Now is 2016-12-13 14:41:15
my_job1 is running, Now is 2016-12-13 14:41:17
my_job2 is running, Now is 2016-12-13 14:41:20
my_job1 is running, Now is 2016-12-13 14:41:22
my_job2 is running, Now is 2016-12-13 14:41:25
my_job1 is running, Now is 2016-12-13 14:41:27
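The interleaving in this output follows from how the two triggers count. The minimal sketch below uses hypothetical helpers (interval_runs, cron_every_n_seconds). It assumes both jobs were added at 14:41:07, which is consistent with the first interval run at 14:41:12, and it ignores execution delays.

```python
from datetime import datetime, timedelta
from typing import List

# Assumption: both jobs were added at 14:41:07
added_at = datetime(2016, 12, 13, 14, 41, 7)


def interval_runs(start: datetime, seconds: int, count: int) -> List[datetime]:
    """Interval trigger: every `seconds`, counted from when the job was added."""
    return [start + timedelta(seconds=seconds * (i + 1)) for i in range(count)]


def cron_every_n_seconds(start: datetime, n: int, count: int) -> List[datetime]:
    """Cron second='*/n': wall-clock seconds divisible by n, at or after start."""
    t = start.replace(microsecond=0)
    if t < start:  # round a fractional start time up to the next whole second
        t += timedelta(seconds=1)
    runs: List[datetime] = []
    while len(runs) < count:
        if t.second % n == 0:
            runs.append(t)
        t += timedelta(seconds=1)
    return runs


my_job1_runs = interval_runs(added_at, 5, 4)
my_job2_runs = cron_every_n_seconds(added_at, 5, 4)
```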

Using the scheduled_job() decorator

from apscheduler.schedulers.blocking import BlockingScheduler
import datetime

sched = BlockingScheduler()

# Run my_job1 every 5 seconds, counted from when the job is added (interval trigger)
@sched.scheduled_job('interval', seconds=5, id='my_job1')
def my_job1():
    print('my_job1 is running, Now is %s' % datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

# Run my_job2 whenever the wall-clock second is a multiple of 5 (cron trigger)
@sched.scheduled_job('cron', second='*/5', id='my_job2')
def my_job2():
    print('my_job2 is running, Now is %s' % datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

sched.start()

Output:

my_job2 is running, Now is 2016-12-13 15:09:00
my_job1 is running, Now is 2016-12-13 15:09:03
my_job2 is running, Now is 2016-12-13 15:09:05
my_job1 is running, Now is 2016-12-13 15:09:08
my_job2 is running, Now is 2016-12-13 15:09:10
my_job1 is running, Now is 2016-12-13 15:09:13
my_job2 is running, Now is 2016-12-13 15:09:15
my_job1 is running, Now is 2016-12-13 15:09:18
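Conceptually, the decorator is a thin wrapper around add_job(). It registers the function with the given trigger arguments and hands the function back unchanged, so the function can still be called directly. The toy sketch below shows that pattern. ToyScheduler is hypothetical and only records jobs; it is not APScheduler.

```python
from typing import Callable, List, Tuple


class ToyScheduler:
    """Hypothetical stand-in that only records jobs; it never runs them."""

    def __init__(self) -> None:
        self.jobs: List[Tuple[Callable, str, dict]] = []

    def add_job(self, func: Callable, trigger: str, **trigger_args) -> None:
        self.jobs.append((func, trigger, trigger_args))

    def scheduled_job(self, trigger: str, **trigger_args) -> Callable:
        """Decorator form: register func via add_job(), then return it unchanged."""
        def decorator(func: Callable) -> Callable:
            self.add_job(func, trigger, **trigger_args)
            return func
        return decorator


sched = ToyScheduler()


@sched.scheduled_job('interval', seconds=5, id='my_job1')
def my_job1() -> str:
    return 'my_job1 ran'
```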

Storing jobs with the SQLAlchemy job store

from apscheduler.schedulers.blocking import BlockingScheduler
from datetime import datetime
import logging

sched = BlockingScheduler()

def my_job():
    print('my_job is running, Now is %s' % datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

# Use the SQLAlchemy job store (requires SQLAlchemy and the MySQLdb driver for this URL)
url = 'mysql+mysqldb://root:123456@localhost:3306/scrapy?charset=utf8'
sched.add_jobstore('sqlalchemy', url=url)

# Add the job. A persistent store keeps it across restarts, so replace_existing=True
# avoids a ConflictingIdError for id 'myjob' the next time the script runs.
sched.add_job(my_job, 'interval', id='myjob', seconds=5, replace_existing=True)

# Log executor activity to the console (use logging.DEBUG for more detail)
log = logging.getLogger('apscheduler.executors.default')
log.setLevel(logging.INFO)

# Set the log format
fmt = logging.Formatter('%(levelname)s:%(name)s:%(message)s')
h = logging.StreamHandler()
h.setFormatter(fmt)
log.addHandler(h)

sched.start()

Output:

$ python scheduler.py
INFO:apscheduler.executors.default:Running job "my_job (trigger: interval[0:00:05], next run at: 2016-12-13 21:26:45 CST)" (scheduled at 2016-12-13 21:26:45.067157+08:00)
my_job is running, Now is 2016-12-13 21:26:45
INFO:apscheduler.executors.default:Job "my_job (trigger: interval[0:00:05], next run at: 2016-12-13 21:26:50 CST)" executed successfully
INFO:apscheduler.executors.default:Running job "my_job (trigger: interval[0:00:05], next run at: 2016-12-13 21:26:50 CST)" (scheduled at 2016-12-13 21:26:50.067157+08:00)
my_job is running, Now is 2016-12-13 21:26:50
INFO:apscheduler.executors.default:Job "my_job (trigger: interval[0:00:05], next run at: 2016-12-13 21:26:50 CST)" executed successfully
INFO:apscheduler.executors.default:Running job "my_job (trigger: interval[0:00:05], next run at: 2016-12-13 21:26:55 CST)" (scheduled at 2016-12-13 21:26:55.067157+08:00)
my_job is running, Now is 2016-12-13 21:26:55
INFO:apscheduler.executors.default:Job "my_job (trigger: interval[0:00:05], next run at: 2016-12-13 21:26:55 CST)" executed successfully
INFO:apscheduler.executors.default:Running job "my_job (trigger: interval[0:00:05], next run at: 2016-12-13 21:27:00 CST)" (scheduled at 2016-12-13 21:27:00.067157+08:00)
my_job is running, Now is 2016-12-13 21:27:00
INFO:apscheduler.executors.default:Job "my_job (trigger: interval[0:00:05], next run at: 2016-12-13 21:27:05 CST)" executed successfully
INFO:apscheduler.executors.default:Running job "my_job (trigger: interval[0:00:05], next run at: 2016-12-13 21:27:05 CST)" (scheduled at 2016-12-13 21:27:05.067157+08:00)
my_job is running, Now is 2016-12-13 21:27:05
INFO:apscheduler.executors.default:Job "my_job (trigger: interval[0:00:05], next run at: 2016-12-13 21:27:05 CST)" executed successfully
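The example above attaches a handler only to the apscheduler.executors.default logger. APScheduler's loggers share the apscheduler prefix; for instance, the scheduler itself logs to apscheduler.scheduler. Attaching the handler to the parent apscheduler logger therefore captures all of them through normal logger propagation.

The standard-library sketch below logs to an in-memory stream so the result can be checked. The two messages are placeholders, not real APScheduler log lines.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))

# Configure the parent 'apscheduler' logger instead of one child logger
parent = logging.getLogger('apscheduler')
parent.setLevel(logging.INFO)
parent.addHandler(handler)

# Records from any 'apscheduler.*' child logger propagate up to this handler
logging.getLogger('apscheduler.executors.default').info('executor message')
logging.getLogger('apscheduler.scheduler').info('scheduler message')

output = stream.getvalue()
```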