[Source Code Analysis] The Parallel Distributed Task Queue Celery: What Is a Task?

0x00 Abstract

Celery is a simple, flexible, and reliable distributed system for processing large volumes of messages: an asynchronous task queue focused on real-time processing that also supports task scheduling. The goal of this article is to examine what a Celery task actually is, and, if we wanted to build a task mechanism from scratch, what we would need to watch out for and how we should handle it.

Because tasks are closely tied to Consumer-side consumption, this article overlaps somewhat with the previous one for the sake of a clearer explanation; please bear with us.

0x01 Starting Questions

Let us first sketch out a few questions; they are the starting points and focal points of the analysis below.

  • What exactly is a task?
  • What kinds of tasks are there?
  • Are there built-in tasks?
  • How are tasks registered with the system?
  • How are user-defined tasks registered with the system?

We will answer these questions one by one below.

0x02 Sample Code

The server-side sample code is as follows; a decorator wraps the task to be executed.

A Task is just user-defined business code; the task here is a simple addition.

    from celery import Celery

    app = Celery('myTest', broker='redis://localhost:6379')

    @app.task
    def add(x, y):
        print(x + y)
        return x + y

    if __name__ == '__main__':
        app.worker_main(argv=['worker'])

The sending code is as follows:

    from myTest import add

    re = add.apply_async((2, 17))
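One note on this example: the app above configures only a broker, so (as the variable dump in section 0x03 will show) the result backend is DisabledBackend, and calling re.get() here would raise. A minimal sketch of enabling result retrieval, assuming a local Redis instance also serves as the backend:

    # server side: same app, but with a result backend configured (assumption: local Redis)
    app = Celery('myTest',
                 broker='redis://localhost:6379',
                 backend='redis://localhost:6379')

    # client side
    re = add.apply_async((2, 17))
    print(re.get(timeout=10))  # blocks until the worker stores the result; prints 19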

0x03 What Is a Task?

To understand what a task is, let's first print out the runtime variable, picking out the main member variables:

    self = {add} <@task: myTest.add of myTest at 0x7faf35f0a208>
     Request = {str} 'celery.worker.request:Request'
     Strategy = {str} 'celery.worker.strategy:default'
     app = {Celery} <Celery myTest at 0x7faf35f0a208>
     backend = {DisabledBackend} <celery.backends.base.DisabledBackend object at 0x7faf364aea20>
     from_config = {tuple: 9} (('serializer', 'task_serializer'), ('rate_limit', 'task_default_rate_limit'), ('priority', 'task_default_priority'), ('track_started', 'task_track_started'), ('acks_late', 'task_acks_late'), ('acks_on_failure_or_timeout', 'task_acks_on_failure_or_timeout'), ('reject_on_worker_lost', 'task_reject_on_worker_lost'), ('ignore_result', 'task_ignore_result'), ('store_errors_even_if_ignored', 'task_store_errors_even_if_ignored'))
     name = {str} 'myTest.add'
     priority = {NoneType} None
     request = {Context} <Context: {}>
     request_stack = {_LocalStack: 0} <celery.utils.threads._LocalStack object at 0x7faf36405e48>
     serializer = {str} 'json'

As you can see, 'myTest.add' is a Task object.

So we need to see what Task is. You will find two implementations of Task in Celery:

  • one in celery/app/task.py;

  • the other in celery/task/base.py.

They are related: you can think of the first as the externally exposed interface and the second as the concrete implementation.

0x04 The Celery Application and Tasks

Tasks are an indispensable part of Celery; a task can be any callable object. Each task is identified by a unique name, which the worker uses to look the task up. Tasks can be registered with the app.task decorator; note that when a function has multiple decorators, app.task must be the outermost one for Celery to work correctly.

Within the Celery application, a Task is what the corresponding message consumer is started around.

In its most basic form a task is a function. The most direct approach to publishing a task would be for the client to package up the code of the function to execute and publish it to the broker. The distributed computing framework Spark uses this approach (Spark's idea is simple: move the computation, not the data). Celery before 2.0 also supported this style of task publishing.

An obvious downside of this approach is that the amount of data passed to the broker can be large. The fix is easy to imagine: tell the worker about the task's code ahead of time. This is exactly the role of the global set and decorator-based registration.

When we "tell the worker about our custom task in advance", a task is defined like this:

    @app.task(name='hello_task')
    def hello():
        print('hello')

Here app is the application inside the worker; the decorator registers the task function.

The app maintains a dictionary whose keys are task names (here, hello_task) and whose values are the addresses of the corresponding functions. Task names must be unique, but the name parameter itself is optional; if it is omitted, Celery automatically generates a task name from the module path and function name.
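For instance, with the add task from our sample code, the auto-generated name is the module path plus the function name (a quick check, assuming the function lives in myTest.py):

    @app.task            # no explicit name given
    def add(x, y):
        return x + y

    print(add.name)      # -> 'myTest.add', generated from __module__ and __name__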

With this approach, the client only needs to supply the task name and its arguments to publish a task; it does not need to ship any task code:

    # client side
    app.send_task('hello_task')

Note: after the client publishes a task, the task is written to the broker queue as a message carrying the task name and related parameters, waiting for a worker to pick it up. Publishing is completely independent of the worker side; the message is written to the queue even if no worker is running.
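Arguments travel along with the name; for example, our add task can also be invoked by name alone, without the client importing any of its code:

    # client side: only the registered name and the arguments go to the broker
    result = app.send_task('hello_task')                # no arguments
    result = app.send_task('myTest.add', args=(2, 17))  # positional arguments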

This approach also has an obvious downside: all task code must be registered on the worker side ahead of time, which couples the client and the worker more tightly.

So let's start from Celery application startup.

4.1 The Global Callback Set and Built-in Tasks

Celery startup begins in celery/_state.py.

A global set is created here to collect all tasks:

    #: Global set of functions to call whenever a new app is finalized.
    #: Shared tasks, and built-in tasks are created by adding callbacks here.
    _on_app_finalizers = set()

At startup, the system adds tasks by calling the following function:

    def connect_on_app_finalize(callback):
        """Connect callback to be called when any app is finalized."""
        _on_app_finalizers.add(callback)
        return callback
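This is also the mechanism behind @shared_task: it registers a constructor callback so that every app finalized later picks the task up. A minimal sketch:

    from celery import shared_task

    @shared_task            # registered via connect_on_app_finalize under the hood
    def ping():
        return 'pong'

    # any Celery app finalized after this point will contain the task,
    # under its module-based name (e.g. 'tasks.ping' if defined in tasks.py)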

First, celery/app/builtins.py defines many built-in tasks, each of which is added to the global callback set:

    @connect_on_app_finalize
    def add_map_task(app):
        from celery.canvas import signature

        @app.task(name='celery.map', shared=False, lazy=False)
        def xmap(task, it):
            task = signature(task, app=app).type
            return [task(item) for item in it]
        return xmap

Next, the flow reaches our custom task and registers it in the global callback set.

In other words: after Celery starts, it finds the classes or functions in the code that use the @task decorator, then registers them in the global callback set.

    @app.task
    def add(x, y):
        print(x + y)
        return x + y

4.2 The @app.task Decorator

Following @app.task brings us to the Celery application itself.

The code lives in celery/app/base.py.

@app.task returns _create_task_cls, which builds a task proxy, appends it to the application's pending queue, and adds it to the global callback set via connect_on_app_finalize(cons).

    _create_task_cls = {function} <function Celery.task.<locals>.inner_create_task_cls.<locals>._create_task_cls at 0x7ff1a7b118c8>

The code is as follows:

    def task(self, *args, **opts):
        if USING_EXECV and opts.get('lazy', True):
            from . import shared_task
            return shared_task(*args, lazy=False, **opts)

        def inner_create_task_cls(shared=True, filter=None, lazy=True, **opts):
            _filt = filter

            def _create_task_cls(fun):
                if shared:
                    def cons(app):
                        return app._task_from_fun(fun, **opts)
                    cons.__name__ = fun.__name__
                    connect_on_app_finalize(cons)  # the key step: add to the global callback set
                if not lazy or self.finalized:
                    ret = self._task_from_fun(fun, **opts)
                else:
                    # return a proxy object that evaluates on first use
                    ret = PromiseProxy(self._task_from_fun, (fun,), opts,
                                       __doc__=fun.__doc__)
                    self._pending.append(ret)  # append to the application's pending queue
                if _filt:
                    return _filt(ret)
                return ret
            return _create_task_cls

        if len(args) == 1:
            if callable(args[0]):
                return inner_create_task_cls(**opts)(*args)
        return inner_create_task_cls(**opts)

4.2.1 Creating the Proxy Instance

With the call in our example, Celery returns a proxy instance (a PromiseProxy, a subclass of Proxy) wrapping _task_from_fun.

Let's look at the Proxy class, located in celery/local.py.

    class Proxy(object):
        """Proxy to another object."""

        # Code stolen from werkzeug.local.Proxy.
        __slots__ = ('__local', '__args', '__kwargs', '__dict__')

        def __init__(self, local,
                     args=None, kwargs=None, name=None, __doc__=None):
            object.__setattr__(self, '_Proxy__local', local)        # store the local argument on _Proxy__local
            object.__setattr__(self, '_Proxy__args', args or ())    # store positional arguments
            object.__setattr__(self, '_Proxy__kwargs', kwargs or {})  # store keyword arguments
            if name is not None:
                object.__setattr__(self, '__custom_name__', name)
            if __doc__ is not None:
                object.__setattr__(self, '__doc__', __doc__)
        ...
        def _get_current_object(self):
            """Get current object.

            This is useful if you want the real
            object behind the proxy at a time for performance reasons or because
            you want to pass the object into a different context.
            """
            loc = object.__getattribute__(self, '_Proxy__local')  # fetch the local passed at construction
            if not hasattr(loc, '__release_local__'):  # if it has no __release_local__ attribute
                return loc(*self.__args, **self.__kwargs)  # it is a callable: call it with the stored arguments
            try:  # pragma: no cover
                # not sure what this is about
                return getattr(loc, self.__name__)  # fetch the attribute named by __name__
            except AttributeError:  # pragma: no cover
                raise RuntimeError('no object bound to {0.__name__}'.format(self))
        ...
        def __getattr__(self, name):
            if name == '__members__':
                return dir(self._get_current_object())
            return getattr(self._get_current_object(), name)  # delegate attribute access to the real object

        def __setitem__(self, key, value):
            self._get_current_object()[key] = value  # item assignment

        def __delitem__(self, key):
            del self._get_current_object()[key]  # item deletion

        def __setslice__(self, i, j, seq):
            self._get_current_object()[i:j] = seq  # slice assignment

        def __delslice__(self, i, j):
            del self._get_current_object()[i:j]

        def __setattr__(self, name, value):
            setattr(self._get_current_object(), name, value)  # attribute assignment

        def __delattr__(self, name):
            delattr(self._get_current_object(), name)  # attribute deletion

We have shown only part of the class. As analyzed above, the proxy decides based on the local passed in (is it a function? does it have __release_local__?) whether to invoke it as a callable or to resolve it as an attribute.
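To make the deferred-evaluation idea concrete, here is a minimal, self-contained sketch (not Celery code) of a PromiseProxy-style object: it holds a factory, builds the real object on first access, and delegates attribute lookups to it:

    class LazyProxy:
        """Minimal sketch of the idea behind celery.local.Proxy/PromiseProxy."""

        def __init__(self, factory):
            object.__setattr__(self, '_factory', factory)
            object.__setattr__(self, '_obj', None)

        def _get_current_object(self):
            obj = object.__getattribute__(self, '_obj')
            if obj is None:  # first access: evaluate and cache, like PromiseProxy
                obj = object.__getattribute__(self, '_factory')()
                object.__setattr__(self, '_obj', obj)
            return obj

        def __getattr__(self, name):
            # only reached when normal lookup fails: delegate to the real object
            return getattr(self._get_current_object(), name)

    def make_list():
        print('evaluated now')
        return [1, 2, 3]

    p = LazyProxy(make_list)   # nothing is evaluated yet
    print(p.count(2))          # prints 'evaluated now', then 1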

4.2.2 Adding to Pending

In the code above, the following line appends the task to the Celery application's pending queue:

    self._pending.append(ret)

_pending is defined as follows; it is just a deque:

    class Celery:
        """Celery application."""

        def __init__(self, main=None, loader=None, backend=None,
                     amqp=None, events=None, log=None, control=None,
                     set_as_current=True, tasks=None, broker=None, include=None,
                     changes=None, config_source=None, fixups=None, task_cls=None,
                     autofinalize=True, namespace=None, strict_typing=True,
                     **kwargs):
            self._pending = deque()

At this point the global set looks like this:

    _on_app_finalizers = {set: 10}
     {function} <function add_chunk_task at 0x7fc200a81400>
     {function} <function add_backend_cleanup_task at 0x7fc200a81048>
     {function} <function add_starmap_task at 0x7fc200a81488>
     {function} <function add_group_task at 0x7fc200a812f0>
     {function} <function add_map_task at 0x7fc200a81510>
     {function} <function Celery.task.<locals>.inner_create_task_cls.<locals>._create_task_cls.<locals>.cons at 0x7fc200af4510>
     {function} <function add_accumulate_task at 0x7fc200aa0158>
     {function} <function add_chain_task at 0x7fc200a81378>
     {function} <function add_unlock_chord_task at 0x7fc200a81598>
     {function} <function add_chord_task at 0x7fc200aa01e0>

The logic so far is shown below:

                     +------------------------------+
                     |  _on_app_finalizers = set()  |
                     +--------------+---------------+
                                    |
            connect_on_app_finalize |
         +------------+             |
         | builtins.py| +---------> |
         +------------+             |
                                    |
            connect_on_app_finalize |
         +-------------+            |
         |User Function| +--------> |
         +-------------+            |
                                    v
    +----------------------------------------------------------------------------------------------------+
    | _on_app_finalizers                                                                                  |
    |                                                                                                     |
    |   <function add_chunk_task>                                                                         |
    |   <function add_backend_cleanup_task>                                                               |
    |   <function add_starmap_task>                                                                       |
    |   <function add_group_task>                                                                         |
    |   <function add_map_task>                                                                           |
    |   <function Celery.task.<locals>.inner_create_task_cls.<locals>._create_task_cls.<locals>.cons>     |
    |   <function add_accumulate_task>                                                                    |
    |   <function add_chain_task>                                                                         |
    |   <function add_unlock_chord_task>                                                                  |
    |   <function add_chord_task>                                                                         |
    +----------------------------------------------------------------------------------------------------+

At this point we have a global set, _on_app_finalizers, that collects all the tasks.


4.3 Celery Worker Startup

At this point Celery knows which tasks exist and has collected them, but it does not yet know their semantics. You could put it this way: Celery knows only the classes, not instances of those classes.

Since consuming tasks is Celery's core function, we inevitably have to review worker startup again, but here we focus on the parts of the worker related to tasks.

Essentially this means processing the global set _on_app_finalizers: tying these so-far-meaningless tasks to the Celery application.

Concretely, this means:

  • generating a task instance from each task's concrete class;
  • tying these task instances to the Celery app, so that, for example, a task instance can be found by its name;
  • configuring each instance's various attributes.

4.3.1 The Worker Instance

The Worker here is the worker instance Celery uses for consumption.

So let's go straight to the worker.

The code lives in celery/bin/worker.py:

    @click.pass_context
    @handle_preload_options
    def worker(ctx, hostname=None, pool_cls=None, app=None, uid=None, gid=None,
               loglevel=None, logfile=None, pidfile=None, statedb=None,
               **kwargs):
        """Start worker instance."""
        app = ctx.obj.app
        worker = app.Worker(
            hostname=hostname, pool_cls=pool_cls, loglevel=loglevel,
            logfile=logfile,  # node format handled by celery.app.log.setup
            pidfile=node_format(pidfile, hostname),
            statedb=node_format(statedb, hostname),
            no_color=ctx.obj.no_color,
            **kwargs)  # execution reaches here
        worker.start()
        return worker.exitcode

4.3.2 WorkController

Inside worker = app.Worker, we find that the call indirectly reaches WorkController.

Execution is now in celery/worker/worker.py.

Some initialization work is done here; let's keep digging:

    class WorkController:
        """Unmanaged worker instance."""

        def __init__(self, app=None, hostname=None, **kwargs):
            self.app = app or self.app
            self.hostname = default_nodename(hostname)
            self.startup_time = datetime.utcnow()
            self.app.loader.init_worker()
            self.on_before_init(**kwargs)  # execution reaches here

4.3.3 Worker(WorkController)

Execution is now in celery/apps/worker.py.

trace.setup_worker_optimizations is called here, and tasks appear immediately:

    class Worker(WorkController):
        """Worker as a program."""

        def on_before_init(self, quiet=False, **kwargs):
            self.quiet = quiet
            trace.setup_worker_optimizations(self.app, self.hostname)

4.3.4 trace: Making the Task Connection

Execution is now in celery/app/trace.py.

It calls app.finalize(), whose purpose is to settle all tasks before startup:

    def setup_worker_optimizations(app, hostname=None):
        """Setup worker related optimizations."""
        global trace_task_ret

        hostname = hostname or gethostname()

        # make sure custom Task.__call__ methods that calls super
        # won't mess up the request/task stack.
        _install_stack_protection()

        app.set_default()

        # evaluate all task classes by finalizing the app.
        app.finalize()

4.3.5 Tying Tasks to the Application

After all this effort, we finally arrive at the key logic.

app.finalize() adds the tasks to the Celery application.

That is: the system has collected all tasks into the global set _on_app_finalizers, but the tasks in this set carry no semantics yet; they have to be tied to a Celery application, and that association is established here.

The call stack is as follows:

    _task_from_fun, base.py:450
    _create_task_cls, base.py:425
    add_chunk_task, builtins.py:128
    _announce_app_finalized, _state.py:52
    finalize, base.py:511
    setup_worker_optimizations, trace.py:643
    on_before_init, worker.py:90
    __init__, worker.py:95
    worker, worker.py:326
    caller, base.py:132
    new_func, decorators.py:21
    invoke, core.py:610
    invoke, core.py:1066
    invoke, core.py:1259
    main, core.py:782
    start, base.py:358
    worker_main, base.py:374

The code is as follows:

    def finalize(self, auto=False):
        """Finalize the app.

        This loads built-in tasks, evaluates pending task decorators,
        reads configuration, etc.
        """
        with self._finalize_mutex:
            if not self.finalized:
                if auto and not self.autofinalize:
                    raise RuntimeError('Contract breach: app not finalized')
                self.finalized = True
                _announce_app_finalized(self)  # the key step: establish the association

                pending = self._pending
                while pending:
                    maybe_evaluate(pending.popleft())

                for task in self._tasks.values():
                    task.bind(self)

                self.on_after_finalize.send(sender=self)
4.3.5.1 Adding Tasks

_announce_app_finalized(self) runs the callbacks in the global callback set _on_app_finalizers to obtain the task instances, then adds them to Celery's task registry so that a task instance can be looked up by its name.

    def _announce_app_finalized(app):
        callbacks = set(_on_app_finalizers)
        for callback in callbacks:
            callback(app)

For our user-defined task the callback is _create_task_cls, so adding the task means running _create_task_cls:

    def inner_create_task_cls(shared=True, filter=None, lazy=True, **opts):
        _filt = filter

        def _create_task_cls(fun):
            if shared:
                def cons(app):
                    return app._task_from_fun(fun, **opts)
                cons.__name__ = fun.__name__
                connect_on_app_finalize(cons)
            if not lazy or self.finalized:
                ret = self._task_from_fun(fun, **opts)  # here

So during initialization, adding the task to each app calls app._task_from_fun(fun, **options).

Inside _task_from_fun, the following line adds the task to the Celery app; this establishes the association:

    self._tasks[task.name] = task

self._tasks then becomes:

    _tasks = {TaskRegistry: 10}
     NotRegistered = {type} <class 'celery.exceptions.NotRegistered'>
     'celery.starmap' = {xstarmap} <@task: celery.starmap of myTest at 0x25da0ca0d88>
     'celery.chord' = {chord} <@task: celery.chord of myTest at 0x25da0ca0d88>
     'celery.accumulate' = {accumulate} <@task: celery.accumulate of myTest at 0x25da0ca0d88>
     'celery.chunks' = {chunks} <@task: celery.chunks of myTest at 0x25da0ca0d88>
     'celery.chord_unlock' = {unlock_chord} <@task: celery.chord_unlock of myTest at 0x25da0ca0d88>
     'celery.group' = {group} <@task: celery.group of myTest at 0x25da0ca0d88>
     'celery.map' = {xmap} <@task: celery.map of myTest at 0x25da0ca0d88>
     'myTest.add' = {add} <@task: myTest.add of myTest at 0x25da0ca0d88>
     'celery.backend_cleanup' = {backend_cleanup} <@task: celery.backend_cleanup of myTest at 0x25da0ca0d88>
     'celery.chain' = {chain} <@task: celery.chain of myTest at 0x25da0ca0d88>
     __len__ = {int} 10

The full code is as follows:

    def _task_from_fun(self, fun, name=None, base=None, bind=False, **options):
        if not self.finalized and not self.autofinalize:
            raise RuntimeError('Contract breach: app not finalized')
        name = name or self.gen_task_name(fun.__name__, fun.__module__)
        base = base or self.Task

        if name not in self._tasks:
            run = fun if bind else staticmethod(fun)
            task = type(fun.__name__, (base,), dict({
                'app': self,
                'name': name,
                'run': run,
                '_decorated': True,
                '__doc__': fun.__doc__,
                '__module__': fun.__module__,
                '__annotations__': fun.__annotations__,
                '__header__': staticmethod(head_from_fun(fun, bound=bind)),
                '__wrapped__': run}, **options))()
            self._tasks[task.name] = task
            task.bind(self)  # connects task to this app
            add_autoretry_behaviour(task, **options)
        else:
            task = self._tasks[name]
        return task
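The heart of _task_from_fun is the three-argument type() call: it manufactures a new Task subclass at runtime from the decorated function and instantiates it exactly once. A stripped-down sketch of the same pattern (illustrative only; the names here are ours, not Celery's):

    class BaseTask:
        def __call__(self, *args, **kwargs):
            return self.run(*args, **kwargs)

    def task_from_fun(fun, base=BaseTask, name=None):
        name = name or f'{fun.__module__}.{fun.__name__}'
        # type(classname, bases, namespace) builds a class object at runtime;
        # staticmethod keeps `run` unbound, matching the bind=False case above
        task_cls = type(fun.__name__, (base,), {'name': name,
                                                'run': staticmethod(fun)})
        return task_cls()          # one instance per task, as Celery does

    def add(x, y):
        return x + y

    task = task_from_fun(add)
    print(task.name)               # e.g. '__main__.add'
    print(task(2, 17))             # 19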
4.3.5.2 bind

By default the task base is celery.app.task:Task. After the instance is created dynamically, task.bind(self) is called, which wires up various app-related attributes:

    @classmethod
    def bind(cls, app):
        was_bound, cls.__bound__ = cls.__bound__, True
        cls._app = app                  # set the class's _app attribute
        conf = app.conf                 # fetch the app's configuration
        cls._exec_options = None        # clear option cache

        if cls.typing is None:
            cls.typing = app.strict_typing

        for attr_name, config_name in cls.from_config:  # fill in class-level defaults
            if getattr(cls, attr_name, None) is None:   # if the attribute is unset
                setattr(cls, attr_name, conf[config_name])  # use the default from app config

        # decorate with annotations from config.
        if not was_bound:
            cls.annotate()

            from celery.utils.threads import LocalStack
            cls.request_stack = LocalStack()  # per-thread stack holding request data

        # PeriodicTask uses this to add itself to the PeriodicTask schedule.
        cls.on_bound(app)

        return app
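The from_config loop is what turns the class-level None defaults into values taken from app configuration. A hedged example (task_serializer and task_acks_late are real setting names from the from_config tuple shown earlier):

    app.conf.task_serializer = 'pickle'
    app.conf.task_acks_late = True

    @app.task
    def noop():
        pass

    # after bind(), attributes that were still None were filled from app.conf
    print(noop.serializer)   # 'pickle'  (from task_serializer)
    print(noop.acks_late)    # True      (from task_acks_late)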
4.3.5.3 Processing the Pending Queue

Execution returns to Celery; the code is now in celery/app/base.py.

The variables are:

    pending = {deque: 1} deque([<@task: myTest.add of myTest at 0x7fd907623550>])
    self = {Celery} <Celery myTest at 0x7fd907623550>

Tasks popped off pending are then processed. As mentioned earlier, some tasks have deferred ("pending") work; it is executed here.

The code lives in celery/local.py:

    def __maybe_evaluate__(self):
        return self._get_current_object()

    def _get_current_object(self):
        try:
            return object.__getattribute__(self, '__thing')

At this point self is the task itself:

    self = {add} <@task: myTest.add of myTest at 0x7fa09ee1e320>

The return value is the myTest.add task itself.

4.3.6 Multiprocessing vs. Tasks

We have now obtained all the tasks, each with its own instance that can be invoked.

Since task consumption uses multiprocessing, we first need a rough look at how the process pool starts.

Let's continue with the Celery worker startup.

During startup, the Celery worker starts several bootsteps; for the Worker the corresponding steps are: [<step: Hub>, <step: Pool>, <step: Consumer>].

    start, bootsteps.py:116
    start, worker.py:204
    worker, worker.py:327
    caller, base.py:132
    new_func, decorators.py:21
    invoke, core.py:610
    invoke, core.py:1066
    invoke, core.py:1259
    main, core.py:782
    start, base.py:358
    worker_main, base.py:374

The code lives in celery/bootsteps.py:

    def start(self, parent):
        self.state = RUN
        if self.on_start:
            self.on_start()
        for i, step in enumerate(s for s in parent.steps if s is not None):
            self.started = i + 1
            step.start(parent)

The variables are:

    parent.steps = {list: 3}
     0 = {Hub} <step: Hub>
     1 = {Pool} <step: Pool>
     2 = {Consumer} <step: Consumer>
     __len__ = {int} 3

The startup of the actual task-handling logic happens inside Pool.

In Pool(bootsteps.StartStopStep), the line w.process_task = w._process_task installs the callback on the pool: when the pool is notified that it has a chance to run, it knows which callback to use to fetch and execute a task.

    class Pool(bootsteps.StartStopStep):
        """Bootstep managing the worker pool.

        Describes how to initialize the worker pool, and starts and stops
        the pool during worker start-up/shutdown.

        Adds attributes:
            * autoscale
            * pool
            * max_concurrency
            * min_concurrency
        """

        def create(self, w):
            procs = w.min_concurrency
            w.process_task = w._process_task  # install the callback here

The method is shown below; as you might expect, req.execute_using_pool(self.pool) is where execution will later reach the process pool:

    def _process_task(self, req):
        """Process task by sending it to the pool of workers."""
        req.execute_using_pool(self.pool)

The variables at this point are:

    self = {Pool} <step: Pool>
    semaphore = {NoneType} None
    threaded = {bool} False
    w = {Worker} celery

4.3.7 Summary

In the end we arrive at the following; this TaskRegistry is used later when tasks are executed:

    self._tasks = {TaskRegistry: 10}
     NotRegistered = {type} <class 'celery.exceptions.NotRegistered'>
     'celery.chunks' = {chunks} <@task: celery.chunks of myTest at 0x7fb652da5fd0>
     'celery.backend_cleanup' = {backend_cleanup} <@task: celery.backend_cleanup of myTest at 0x7fb652da5fd0>
     'celery.chord_unlock' = {unlock_chord} <@task: celery.chord_unlock of myTest at 0x7fb652da5fd0>
     'celery.group' = {group} <@task: celery.group of myTest at 0x7fb652da5fd0>
     'celery.map' = {xmap} <@task: celery.map of myTest at 0x7fb652da5fd0>
     'celery.chain' = {chain} <@task: celery.chain of myTest at 0x7fb652da5fd0>
     'celery.starmap' = {xstarmap} <@task: celery.starmap of myTest at 0x7fb652da5fd0>
     'celery.chord' = {chord} <@task: celery.chord of myTest at 0x7fb652da5fd0>
     'myTest.add' = {add} <@task: myTest.add of myTest at 0x7fb652da5fd0>
     'celery.accumulate' = {accumulate} <@task: celery.accumulate of myTest at 0x7fb652da5fd0>
     __len__ = {int} 10
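This registry is exposed as app.tasks, so any code holding the app can resolve a task purely by its registered name:

    task = app.tasks['myTest.add']      # look the task instance up by name
    result = task.apply_async((2, 17))  # equivalent to add.apply_async((2, 17))
    # an unknown name raises celery.exceptions.NotRegistered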

The diagram is as follows:

                     +------------------------------+
                     |  _on_app_finalizers = set()  |
                     +--------------+---------------+
                                    |
            connect_on_app_finalize |
         +------------+             |
         | builtins.py| +---------> |
         +------------+             |
                                    |
            connect_on_app_finalize |
         +-------------+            |
         |User Function| +--------> |
         +-------------+            |
                                    v
    +----------------------------------------------------------------------------------------------------+
    | _on_app_finalizers                                                                                  |
    |                                                                                                     |
    |   <function add_chunk_task>                                                                         |
    |   <function add_backend_cleanup_task>                                                               |
    |   <function add_starmap_task>                                                                       |
    |   <function add_group_task>                                                                         |
    |   <function add_map_task>                                                                           |
    |   <function Celery.task.<locals>.inner_create_task_cls.<locals>._create_task_cls.<locals>.cons>     |
    |   <function add_accumulate_task>                                                                    |
    |   <function add_chain_task>                                                                         |
    |   <function add_unlock_chord_task>                                                                  |
    |   <function add_chord_task>                                                                         |
    +----------------------------+-----------------------------------------------------------------------+
                                 |
                                 |  finalize
                                 v
      +---------------------------+
      |                           |
      |  Celery                   |
      |                           |          +--------------------------------------------------------------------------------------------+
   _process_task <---+ process_task          |  TaskRegistry                                                                              |
      |                           |          |                                                                                            |
      |  _tasks +--------------------------> |  NotRegistered = {type} <class 'celery.exceptions.NotRegistered'>                          |
      |                           |          |  'celery.chunks' = {chunks} <@task: celery.chunks of myTest>                               |
      +---------------------------+          |  'celery.backend_cleanup' = {backend_cleanup} <@task: celery.backend_cleanup of myTest>   |
                                             |  'celery.chord_unlock' = {unlock_chord} <@task: celery.chord_unlock of myTest>            |
                                             |  'celery.group' = {group} <@task: celery.group of myTest>                                 |
                                             |  'celery.map' = {xmap} <@task: celery.map of myTest>                                      |
                                             |  'celery.chain' = {chain} <@task: celery.chain of myTest>                                 |
                                             |  'celery.starmap' = {xstarmap} <@task: celery.starmap of myTest>                          |
                                             |  'celery.chord' = {chord} <@task: celery.chord of myTest>                                 |
                                             |  'myTest.add' = {add} <@task: myTest.add of myTest>                                       |
                                             |  'celery.accumulate' = {accumulate} <@task: celery.accumulate of myTest>                  |
                                             +--------------------------------------------------------------------------------------------+


Alternatively, we can rearrange the diagram and look at it from another angle.

    +------------------------------+
    |  _on_app_finalizers = set()  |
    +--------------+---------------+
                   |
                   |
                   |   connect_on_app_finalize    +------------+
                   | <---------------------------+| builtins.py|
                   |                              +------------+
                   |
                   |   connect_on_app_finalize    +-------------+
                   | <---------------------------+|User Function|
                   |                              +-------------+
                   v
    +------------------------------------------------------------------------------------------------+
    | _on_app_finalizers                                                                              |
    |                                                                                                 |
    |   <function add_chunk_task>                                                                     |
    |   <function add_backend_cleanup_task>                                                           |
    |   <function add_starmap_task>                                                                   |
    |   <function add_group_task>                                                                     |
    |   <function add_map_task>                                                                       |
    |   <function Celery.task.<locals>.inner_create_task_cls.<locals>._create_task_cls.<locals>.cons> |
    |   <function add_accumulate_task>                                                                |
    |   <function add_chain_task>                                                                     |
    |   <function add_unlock_chord_task>                                                              |
    |   <function add_chord_task>                                                                     |
    +--------------------------+---------------------------------------------------------------------+
                               |
                               |
                     finalize  |
                               v
                +--------------+------------+
                |                           |
                |  Celery                   |
                |                           |
                |  _tasks                   |
                |     +                     |
                |     |                     |
                +---------------------------+
                      |
                      |
                      v
    +--------------------------------------------------------------------------------------------+
    |                                                                                            |
    |  TaskRegistry                                                                              |
    |                                                                                            |
    |  NotRegistered = {type} <class 'celery.exceptions.NotRegistered'>                          |
    |  'celery.chunks' = {chunks} <@task: celery.chunks of myTest>                               |
    |  'celery.backend_cleanup' = {backend_cleanup} <@task: celery.backend_cleanup of myTest>    |
    |  'celery.chord_unlock' = {unlock_chord} <@task: celery.chord_unlock of myTest>             |
    |  'celery.group' = {group} <@task: celery.group of myTest>                                  |
    |  'celery.map' = {xmap} <@task: celery.map of myTest>                                       |
    |  'celery.chain' = {chain} <@task: celery.chain of myTest>                                  |
    |  'celery.starmap' = {xstarmap} <@task: celery.starmap of myTest>                           |
    |  'celery.chord' = {chord} <@task: celery.chord of myTest>                                  |
    |  'myTest.add' = {add} <@task: myTest.add of myTest>                                        |
    |  'celery.accumulate' = {accumulate} <@task: celery.accumulate of myTest>                   |
    |                                                                                            |
    +--------------------------------------------------------------------------------------------+


0x05 The Task Definition

The Task definition lives in celery/app/task.py.

From its member variables, the rough functional categories are clear:

Basic information, for example:

  • the Celery application it belongs to;
  • the task name;
  • functional class information;

Error-handling information, for example:

  • rate limiting;
  • maximum number of retries;
  • retry interval;
  • error handling during retries;

Business controls, for example:

  • whether to ack late;
  • ack error handling;
  • auto-registration;
  • backend storage information;
  • what to do when the worker fails;

Task controls, for example:

  • the request stack;
  • the default request;
  • priority;
  • expiry;
  • execution options;

The concrete definition is as follows:

    @abstract.CallableTask.register
    class Task:
        __trace__ = None
        __v2_compat__ = False  # set by old base in celery.task.base

        MaxRetriesExceededError = MaxRetriesExceededError
        OperationalError = OperationalError

        Strategy = 'celery.worker.strategy:default'
        Request = 'celery.worker.request:Request'

        _app = None
        name = None
        typing = None
        max_retries = 3
        default_retry_delay = 3 * 60
        rate_limit = None
        ignore_result = None
        trail = True
        send_events = True
        store_errors_even_if_ignored = None
        serializer = None
        time_limit = None
        soft_time_limit = None
        backend = None
        autoregister = True
        track_started = None
        acks_late = None
        acks_on_failure_or_timeout = None
        reject_on_worker_lost = None
        throws = ()
        expires = None
        priority = None
        resultrepr_maxsize = 1024
        request_stack = None
        _default_request = None
        abstract = True
        _exec_options = None
        __bound__ = False

        from_config = (
            ('serializer', 'task_serializer'),
            ('rate_limit', 'task_default_rate_limit'),
            ('priority', 'task_default_priority'),
            ('track_started', 'task_track_started'),
            ('acks_late', 'task_acks_late'),
            ('acks_on_failure_or_timeout', 'task_acks_on_failure_or_timeout'),
            ('reject_on_worker_lost', 'task_reject_on_worker_lost'),
            ('ignore_result', 'task_ignore_result'),
            ('store_errors_even_if_ignored', 'task_store_errors_even_if_ignored'),
        )

        _backend = None  # set by backend property.
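Most of these class attributes can be overridden per task via keyword arguments to the decorator, which flow into the **options dict of _task_from_fun; for example:

    @app.task(name='myTest.add_robust',
              max_retries=5,            # override the class default of 3
              default_retry_delay=30,   # seconds to wait between retries
              acks_late=True,           # ack only after the task has run
              ignore_result=True)       # do not store the return value
    def add_robust(x, y):
        return x + y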

0x06 Consumer

Because tasks are invoked through the Consumer, we have to look at the task-related parts of the Consumer: connecting tasks and the consumer so that the Consumer can actually invoke a task.

6.1 Consumer steps

When the Consumer starts, it too runs a number of steps:

    parent.steps = {list: 8}
     0 = {Connection} <step: Connection>
     1 = {Events} <step: Events>
     2 = {Heart} <step: Heart>
     3 = {Mingle} <step: Mingle>
     4 = {Gossip} <step: Gossip>
     5 = {Tasks} <step: Tasks>
     6 = {Control} <step: Control>
     7 = {Evloop} <step: event loop>
     __len__ = {int} 8

6.2 The Tasks Bootstep

The consumer starts the Tasks bootstep, which:

  • update_strategies: installs the callback for each task, e.g. 'celery.chunks' = {function} <function default.<locals>.task_message_handler at 0x7fc5a47d5a60>;
  • task_consumer = c.app.amqp.TaskConsumer: this connects tasks to amqp.Consumer;
  • sets QoS;
  • sets the prefetch count.

With that, task callbacks are connected to amqp.Consumer and the message path is complete.

The code lives in celery/worker/consumer/tasks.py:

    class Tasks(bootsteps.StartStopStep):
        """Bootstep starting the task message consumer."""

        requires = (Mingle,)

        def __init__(self, c, **kwargs):
            c.task_consumer = c.qos = None
            super().__init__(c, **kwargs)

        def start(self, c):
            """Start task consumer."""
            c.update_strategies()  # install the callback for every task

            # - RabbitMQ 3.3 completely redefines how basic_qos works...
            # This will detect if the new qos semantics is in effect,
            # and if so make sure the 'apply_global' flag is set on qos updates.
            qos_global = not c.connection.qos_semantics_matches_spec

            # set initial prefetch count
            c.connection.default_channel.basic_qos(
                0, c.initial_prefetch_count, qos_global,
            )

            c.task_consumer = c.app.amqp.TaskConsumer(
                c.connection, on_decode_error=c.on_decode_error,
            )  # tasks are now connected to amqp.Consumer

            def set_prefetch_count(prefetch_count):
                return c.task_consumer.qos(
                    prefetch_count=prefetch_count,
                    apply_global=qos_global,
                )
            c.qos = QoS(set_prefetch_count, c.initial_prefetch_count)

6.2.1 Strategies

Running tasks actually involves a scheduling strategy, which can also be seen as a form of load balancing. The strategies are:

    SCHED_STRATEGY_FCFS = 1
    SCHED_STRATEGY_FAIR = 4

    SCHED_STRATEGIES = {
        None: SCHED_STRATEGY_FAIR,
        'default': SCHED_STRATEGY_FAIR,
        'fast': SCHED_STRATEGY_FCFS,
        'fcfs': SCHED_STRATEGY_FCFS,
        'fair': SCHED_STRATEGY_FAIR,
    }
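Which entry of SCHED_STRATEGIES is used is controlled by the worker's -O optimization option (so 'default' maps to fair scheduling). A hedged sketch of selecting it at startup:

    # choosing the scheduling strategy when launching the worker (the -O flag):
    #   celery -A myTest worker -O fair   # SCHED_STRATEGY_FAIR (also the default)
    #   celery -A myTest worker -O fast   # SCHED_STRATEGY_FCFS
    app.worker_main(argv=['worker', '-O', 'fair'])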

6.2.2 Updating Strategies

update_strategies installs each task's scheduling strategy and callback, e.g. 'celery.chunks' = {function} <function default.<locals>.task_message_handler at 0x7fc5a47d5a60>.

The call stack is:

    update_strategies, consumer.py:523
    start, tasks.py:26
    start, bootsteps.py:116
    start, consumer.py:311
    start, bootsteps.py:365
    start, bootsteps.py:116
    start, worker.py:204
    worker, worker.py:327
    caller, base.py:132
    new_func, decorators.py:21
    invoke, core.py:610
    invoke, core.py:1066
    invoke, core.py:1259
    main, core.py:782
    start, base.py:358
    worker_main, base.py:374

The code lives in celery/worker/consumer/consumer.py:

    def update_strategies(self):
        loader = self.app.loader  # the app's loader
        for name, task in items(self.app.tasks):  # iterate over all tasks
            self.strategies[name] = task.start_strategy(self.app, self)  # key: task name; value: return of task.start_strategy
            task.__trace__ = build_tracer(name, task, loader, self.hostname,
                                          app=self.app)  # the function that handles execution results

The app.tasks variable looks like this; these are all the tasks currently registered with Celery:

    self.app.tasks = {TaskRegistry: 10}
     NotRegistered = {type} <class 'celery.exceptions.NotRegistered'>
     'celery.chunks' = {chunks} <@task: celery.chunks of myTest at 0x7fc5a36e8160>
     'celery.backend_cleanup' = {backend_cleanup} <@task: celery.backend_cleanup of myTest at 0x7fc5a36e8160>
     'celery.chord_unlock' = {unlock_chord} <@task: celery.chord_unlock of myTest at 0x7fc5a36e8160>
     'celery.group' = {group} <@task: celery.group of myTest at 0x7fc5a36e8160>
     'celery.map' = {xmap} <@task: celery.map of myTest at 0x7fc5a36e8160>
     'celery.chain' = {chain} <@task: celery.chain of myTest at 0x7fc5a36e8160>
     'celery.starmap' = {xstarmap} <@task: celery.starmap of myTest at 0x7fc5a36e8160>
     'celery.chord' = {chord} <@task: celery.chord of myTest at 0x7fc5a36e8160>
     'myTest.add' = {add} <@task: myTest.add of myTest at 0x7fc5a36e8160>
     'celery.accumulate' = {accumulate} <@task: celery.accumulate of myTest at 0x7fc5a36e8160>
     __len__ = {int} 10

Now let's look at task.start_strategy:

    def start_strategy(self, app, consumer, **kwargs):
        return instantiate(self.Strategy, self, app, consumer, **kwargs)  # instantiate the strategy

The default value of self.Strategy is celery.worker.strategy:default:

    def default(task, app, consumer,
                info=logger.info, error=logger.error, task_reserved=task_reserved,
                to_system_tz=timezone.to_system, bytes=bytes, buffer_t=buffer_t,
                proto1_to_proto2=proto1_to_proto2):
        """Default task execution strategy.

        Note:
            Strategies are here as an optimization, so sadly
            it's not very easy to override.
        """
        hostname = consumer.hostname                     # consumer-related info
        connection_errors = consumer.connection_errors   # error values
        _does_info = logger.isEnabledFor(logging.INFO)

        # task event related
        # (optimized to avoid calling request.send_event)
        eventer = consumer.event_dispatcher
        events = eventer and eventer.enabled
        send_event = eventer.send
        task_sends_events = events and task.send_events

        call_at = consumer.timer.call_at
        apply_eta_task = consumer.apply_eta_task
        rate_limits_enabled = not consumer.disable_rate_limits
        get_bucket = consumer.task_buckets.__getitem__
        handle = consumer.on_task_request
        limit_task = consumer._limit_task
        body_can_be_buffer = consumer.pool.body_can_be_buffer
        Req = create_request_cls(Request, task, consumer.pool, hostname, eventer)  # build a request class

        revoked_tasks = consumer.controller.state.revoked

        def task_message_handler(message, body, ack, reject, callbacks,
                                 to_timestamp=to_timestamp):
            if body is None:
                body, headers, decoded, utc = (
                    message.body, message.headers, False, True,
                )
                if not body_can_be_buffer:
                    body = bytes(body) if isinstance(body, buffer_t) else body
            else:
                body, headers, decoded, utc = proto1_to_proto2(message, body)  # parse the received data

            req = Req(
                message,
                on_ack=ack, on_reject=reject, app=app, hostname=hostname,
                eventer=eventer, task=task, connection_errors=connection_errors,
                body=body, headers=headers, decoded=decoded, utc=utc,
            )  # instantiate the request
            if (req.expires or req.id in revoked_tasks) and req.revoked():
                return

            if task_sends_events:
                send_event(
                    'task-received',
                    uuid=req.id, name=req.name,
                    args=req.argsrepr, kwargs=req.kwargsrepr,
                    root_id=req.root_id, parent_id=req.parent_id,
                    retries=req.request_dict.get('retries', 0),
                    eta=req.eta and req.eta.isoformat(),
                    expires=req.expires and req.expires.isoformat(),
                )  # send the task-received event if needed

            if req.eta:  # ETA handling
                try:
                    if req.utc:
                        eta = to_timestamp(to_system_tz(req.eta))
                    else:
                        eta = to_timestamp(req.eta, timezone.local)
                except (OverflowError, ValueError) as exc:
                    # (this except clause was elided in the original excerpt)
                    error("Couldn't convert ETA %r to timestamp: %r. Task: %r",
                          req.eta, exc, req.info(safe=True), exc_info=True)
                    req.reject(requeue=False)
                else:
                    consumer.qos.increment_eventually()
                    call_at(eta, apply_eta_task, (req,), priority=6)
            else:
                if rate_limits_enabled:  # rate limiting
                    bucket = get_bucket(task.name)
                    if bucket:
                        return limit_task(req, bucket, 1)
                task_reserved(req)
                if callbacks:
                    [callback(req) for callback in callbacks]
                handle(req)  # process the received request
        return task_message_handler

The handler here is w.process_task, passed in when the consumer was initialized:

    def _process_task(self, req):
        """Process task by sending it to the pool of workers."""
        req.execute_using_pool(self.pool)

After this step, every task has its callback strategy, so when the multiprocessing machinery is invoked it knows how to call the task: for each of our current tasks, once a task message is taken off the broker, task_message_handler is called.

    strategies = {dict: 10}
     'celery.chunks' = {function} <function default.<locals>.task_message_handler at 0x7fc5a47d5a60>
     'celery.backend_cleanup' = {function} <function default.<locals>.task_message_handler at 0x7fc5a4878400>
     'celery.chord_unlock' = {function} <function default.<locals>.task_message_handler at 0x7fc5a4878598>
     'celery.group' = {function} <function default.<locals>.task_message_handler at 0x7fc5a4878840>
     'celery.map' = {function} <function default.<locals>.task_message_handler at 0x7fc5a4878ae8>
     'celery.chain' = {function} <function default.<locals>.task_message_handler at 0x7fc5a4878d90>
     'celery.starmap' = {function} <function default.<locals>.task_message_handler at 0x7fc5a487b0d0>
     'celery.chord' = {function} <function default.<locals>.task_message_handler at 0x7fc5a487b378>
     'myTest.add' = {function} <function default.<locals>.task_message_handler at 0x7fc5a487b620>
     'celery.accumulate' = {function} <function default.<locals>.task_message_handler at 0x7fc5a487b8c8>
     __len__ = {int} 10

6.2.3 Request

Within celery.worker.strategy:default, this piece of code deserves a closer look:

    Req = create_request_cls(Request, task, consumer.pool, hostname, eventer)  # build a request class

In the strategy, its purpose is to build a Request from the task instance, thereby tying together the broker message, the consumer, and the process pool.

Concretely, Request.execute_using_pool is where the multiprocessing handling begins, e.g. by connecting to the consumer's pool.

    Req = create_request_cls(Request, task, consumer.pool, hostname, eventer)

The task instance is:

    myTest.add[863cf9b2-8440-4ea2-8ac4-06b3dcd2fd1f]

The code that obtains the Request class is:

    def create_request_cls(base, task, pool, hostname, eventer,
                           ref=ref, revoked_tasks=revoked_tasks,
                           task_ready=task_ready, trace=trace_task_ret):
        default_time_limit = task.time_limit
        default_soft_time_limit = task.soft_time_limit
        apply_async = pool.apply_async
        acks_late = task.acks_late
        events = eventer and eventer.enabled

        class Request(base):

            def execute_using_pool(self, pool, **kwargs):
                task_id = self.task_id
                if (self.expires or task_id in revoked_tasks) and self.revoked():
                    raise TaskRevokedError(task_id)

                time_limit, soft_time_limit = self.time_limits
                result = apply_async(
                    trace,
                    args=(self.type, task_id, self.request_dict, self.body,
                          self.content_type, self.content_encoding),
                    accept_callback=self.on_accepted,
                    timeout_callback=self.on_timeout,
                    callback=self.on_success,
                    error_callback=self.on_failure,
                    soft_timeout=soft_time_limit or default_soft_time_limit,
                    timeout=time_limit or default_time_limit,
                    correlation_id=task_id,
                )
                # cannot create weakref to None
                # pylint: disable=attribute-defined-outside-init
                self._apply_result = maybe(ref, result)
                return result

            def on_success(self, failed__retval__runtime, **kwargs):
                failed, retval, runtime = failed__retval__runtime
                if failed:
                    if isinstance(retval.exception, (
                            SystemExit, KeyboardInterrupt)):
                        raise retval.exception
                    return self.on_failure(retval, return_ok=True)
                task_ready(self)

                if acks_late:
                    self.acknowledge()

                if events:
                    self.send_event(
                        'task-succeeded', result=retval, runtime=runtime,
                    )

        return Request

6.2.4 How the Process Pool Is Reached

The earlier callback task_message_handler contains req = Req(...); this is where multiprocessing comes into play, via the Request class.

    def task_message_handler(message, body, ack, reject, callbacks,
                             to_timestamp=to_timestamp):
        req = Req(
            message,
            on_ack=ack, on_reject=reject, app=app, hostname=hostname,
            eventer=eventer, task=task, connection_errors=connection_errors,
            body=body, headers=headers, decoded=decoded, utc=utc,
        )  # instantiate the request
        if req.eta:  # ETA handling
            ...     # (elided)
        else:
            task_reserved(req)
            if callbacks:
                [callback(req) for callback in callbacks]
            handle(req)  # process the received request
    return task_message_handler

Note:

The handle in handle(req) is w.process_task, passed in when the consumer was initialized:

    def _process_task(self, req):
        """Process task by sending it to the pool of workers."""
        req.execute_using_pool(self.pool)

So handle(req) actually calls the Request's execute_using_pool method, which brings us to the process pool.

The code is:

    class Request(base):

        def execute_using_pool(self, pool, **kwargs):
            task_id = self.task_id  # fetch the task id
            if (self.expires or task_id in revoked_tasks) and self.revoked():  # check expiry / revocation
                raise TaskRevokedError(task_id)

            time_limit, soft_time_limit = self.time_limits  # fetch the time limits
            result = apply_async(  # run the corresponding func on the pool and return the result
                trace,
                args=(self.type, task_id, self.request_dict, self.body,
                      self.content_type, self.content_encoding),
                accept_callback=self.on_accepted,
                timeout_callback=self.on_timeout,
                callback=self.on_success,
                error_callback=self.on_failure,
                soft_timeout=soft_time_limit or default_soft_time_limit,
                timeout=time_limit or default_time_limit,
                correlation_id=task_id,
            )
            # cannot create weakref to None
            # pylint: disable=attribute-defined-outside-init
            self._apply_result = maybe(ref, result)
            return result

6.3 Summary

There is a lot going on here, so the summary is split across three diagrams.

6.3.1 Strategy

The strategy logic is as follows:

    +-----------------------+          +---------------------------+
    |  Celery               |          |  Consumer                 |
    |                       |          |                           |
    |  consumer +--------------------> |                           |          +---------------+
    |                       |          |  task_consumer +--------------------> | amqp.Consumer |
    |  _tasks               |          |                           |          +---------------+
    |     +                 |          |                           |
    |     |                 |          |  strategies +----------------+
    +-----------------------+          |                           |   |
          |                            +---------------------------+   |
          |                                                            |
          v                                                            v
    +---------------------------------------------------------------------------------------------+
    | TaskRegistry                                                                                 |
    |                                                                                              |
    |  NotRegistered = {type} <class 'celery.exceptions.NotRegistered'>                            |
    |  'celery.chunks' = {chunks} <@task: celery.chunks of myTest>                                 |
    |  'celery.backend_cleanup' = {backend_cleanup} <@task: celery.backend_cleanup of myTest>      |
    |  'celery.chord_unlock' = {unlock_chord} <@task: celery.chord_unlock of myTest>               |
    |  'celery.group' = {group} <@task: celery.group of myTest>                                    |
    |  'celery.map' = {xmap} <@task: celery.map of myTest>                                         |
    |  'celery.chain' = {chain} <@task: celery.chain of myTest>                                    |
    |  'celery.starmap' = {xstarmap} <@task: celery.starmap of myTest>                             |
    |  'celery.chord' = {chord} <@task: celery.chord of myTest>                                    |
    |  'myTest.add' = {add} <@task: myTest.add of myTest>                                          |
    |  'celery.accumulate' = {accumulate} <@task: celery.accumulate of myTest>                     |
    +---------------------------------------------------------------------------------------------+

    +-----------------------------------------------------------------------------+
    | strategies = {dict: 10}                                                     |
    |  'celery.chunks' = function default.<locals>.task_message_handler          |
    |  'celery.backend_cleanup' = function default.<locals>.task_message_handler |
    |  'celery.chord_unlock' = function default.<locals>.task_message_handler    |
    |  'celery.group' = function default.<locals>.task_message_handler           |
    |  'celery.map' = function default.<locals>.task_message_handler             |
    |  'celery.chain' = function default.<locals>.task_message_handler           |
    |  'celery.starmap' = function default.<locals>.task_message_handler         |
    |  'celery.chord' = function default.<locals>.task_message_handler           |
    |  'myTest.add' = function default.<locals>.task_message_handler             |
    |  'celery.accumulate' = function default.<locals>.task_message_handler      |
    +-----------------------------------------------------------------------------+


6.3.2 Task Registration Logic

The logic of tasks registered with the Celery application is as follows:

                     +------------------------------+
                     |  _on_app_finalizers = set()  |
                     +--------------+---------------+
                                    |
            connect_on_app_finalize |
         +------------+             |
         | builtins.py| +---------> |
         +------------+             |
                                    |
            connect_on_app_finalize |
         +-------------+            |
         |User Function| +--------> |
         +-------------+            |
                                    v
    +----------------------------------------------------------------------------------------------------+
    | _on_app_finalizers                                                                                  |
    |                                                                                                     |
    |   <function add_chunk_task>                                                                         |
    |   <function add_backend_cleanup_task>                                                               |
    |   <function add_starmap_task>                                                                       |
    |   <function add_group_task>                                                                         |
    |   <function add_map_task>                                                                           |
    |   <function Celery.task.<locals>.inner_create_task_cls.<locals>._create_task_cls.<locals>.cons>     |
    |   <function add_accumulate_task>                                                                    |
    |   <function add_chain_task>                                                                         |
    |   <function add_unlock_chord_task>                                                                  |
    |   <function add_chord_task>                                                                         |
    +----------------------------+-----------------------------------------------------------------------+
                                 |
                                 |  finalize
                                 v
      +---------------------------+
      |                           |
      |  Celery                   |
      |                           |          +--------------------------------------------------------------------------------------------+
   _process_task <---+ process_task          |  TaskRegistry                                                                              |
      |                           |          |                                                                                            |
      |  _tasks +--------------------------> |  NotRegistered = {type} <class 'celery.exceptions.NotRegistered'>                          |
      |                           |          |  'celery.chunks' = {chunks} <@task: celery.chunks of myTest>                               |
    +---------------+             |          |  'celery.backend_cleanup' = {backend_cleanup} <@task: celery.backend_cleanup of myTest>   |
    | amqp.Consumer | <--+ task_consumer     |  'celery.chord_unlock' = {unlock_chord} <@task: celery.chord_unlock of myTest>            |
    +---------------+             |          |  'celery.group' = {group} <@task: celery.group of myTest>                                 |
      +---------------------------+          |  'celery.map' = {xmap} <@task: celery.map of myTest>                                      |
                                             |  'celery.chain' = {chain} <@task: celery.chain of myTest>                                 |
                                             |  'celery.starmap' = {xstarmap} <@task: celery.starmap of myTest>                          |
                                             |  'celery.chord' = {chord} <@task: celery.chord of myTest>                                 |
                                             |  'myTest.add' = {add} <@task: myTest.add of myTest>                                       |
                                             |  'celery.accumulate' = {accumulate} <@task: celery.accumulate of myTest>                  |
                                             +--------------------------------------------------------------------------------------------+


6.3.3 Task Processing Logic

After a message is fetched from the broker, the task-processing logic is as follows:

     Consumer
        +
        |  message
        v                    strategy      +--------------------------------------+
    +---+----------------+                 |  strategies                          |
    |  on_task_received  | <-------------+ |                                      |
    |                    |                 |  [myTest.add : task_message_handler] |
    +---+----------------+                 +--------------------------------------+
        |
        |  strategy
        |
        v                                   Request [myTest.add]
    +---+---------------------+            +---------------------+
    |  task_message_handler   | <--------+ |  create_request_cls |
    |                         |            +---------------------+
    +---+---------------------+
        |  _process_task_sem
        |
    +--------------------------------------------------------------------------------------+
     Worker
        |  req [{Request} myTest.add]
        v
    +---+----------------+
    |  WorkController    |
    |                    |
    |  pool +---------------------------+
    +---+----------------+              |
        |                               |
        |  apply_async                  v
    +---+------------------+      +-----+-----+
    | {Request} myTest.add | +--> |  TaskPool |
    +----------------------+      +-----------+
                                    myTest.add


This concludes the analysis of Celery startup. Next we will walk through a complete example: how a message goes from being sent to being consumed.

0xFF References

celery源码分析-Task的初始化与发送任务

Celery 源码解析三: Task 对象的实现

Celery-4.1 用户指南: Application(应用)
