Python profiling

Profiling（性能调试）是我一直很感兴趣的一个话题，之前给大家介绍过Datadog这个工具，今天我们来看看Python语言中有哪些方法来做Profiling。

Poorman's Profiler

最基础的就是使用time.time()来计时，这个方法简单有效，也许所有写过Python代码的人都用过。我们可以创建一个decorator使它用起来更方便。

import time

import logging

import functools

def simple_profiling(func):

    @wraps.functools(func)

    def wrapped(*args, **kwargs):

        start_time = time.time()

        result = func(*args, **kwargs)

        time_spent = time.time() - start_time

        fullname = '{}.{}'.format(func.__module__, func.func_name)

        logging.info('%s[args=%s kwargs=%s] completed in %.4fs',

                     fullname, args, kwargs, time_spent)

        return result

@simple_profiling

def foo(sec):

    time.sleep(sec)

这个方法的优点是简单，额外开效非常低（大部分情况下可以忽略不计）。但缺点也很明显，除了总用时，没有任何其他信息。

cProfile

cProfile是Python标准库中的一个模块，它可以非常仔细地分析代码执行过程中所有函数调用的用时和次数。cProfile最简单的用法是用cProfile.run来执行一段代码，或是用python -m cProfile myscript.py来执行一个脚本。例如，

import cProfile

import requests

cProfile.run('requests.get("http://tech.glowing.com")')

以上代码的执行结果如下

         5020 function calls (5006 primitive calls) in 1.751 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.000    0.000    1.751    1.751 <string>:1(<module>)

        4    0.000    0.000    0.000    0.000 <string>:8(__new__)

       13    0.000    0.000    0.000    0.000 Queue.py:107(put)

       14    0.000    0.000    0.000    0.000 Queue.py:150(get)

        1    0.000    0.000    0.000    0.000 Queue.py:234(_init)

       27    0.000    0.000    0.000    0.000 Queue.py:237(_qsize)

       13    0.000    0.000    0.000    0.000 Queue.py:240(_put)

    ...

        1    1.027    1.027    1.027    1.027 {_socket.getaddrinfo}

        3    0.715    0.238    0.715    0.238 {method 'recv' of '_socket.socket' objects}

    ...

完整的结果很长，这里我只截取了一部分。结果中每一行包含一个函数的执行信息，每个字段的含义如下

ncalls: 该函数被调用的次数
tottime: 该函数的总耗时，子函数的调用时间不计算在内
percall: tottime / ncalls
cumtime: 该函数加上其所有子函数的总耗时
percall: cumtime / ncalls

这里我们看到requests.get的时间主要是花费在_socket.getaddrinfo和_socket.socket.recv这两个子函数上，其中_socket.getaddrinfo被调用了一次，而_socket.socket.recv被调用了3次。

虽然cProfile.run用起来非常直观，但在实际调试时并不灵活，并且最后打印出来的信息过于冗长。我们可以通过写一个decorator让cProfile的使用更方便，但这里我想介绍Python中另一个非常好用的特性，那就是contextmanager。Decorator用来包裹一个函数，而contextmanager则用来包裹一个代码块。例如，我们希望用cProfile来调试一个代码块，可以创建如下的contextmanager

import cProfile

import pstats

from contextlib import contextmanager

import requests

@contextmanager

def profiling(sortby='cumulative', limit=20):

    pr = cProfile.Profile()

    pr.enable()

    yield

    pr.disable()

    ps = pstats.Stats(pr).sort_stats(sortby)

    ps.print_stats(limit)

with profiling(sortby='tottime', limit=5):

    requests.get('http://tech.glowing.com')

这样就可以很方便地用with语法对一段代码进行性能调试，以上代码的执行结果如下

   5992 function calls (5978 primitive calls) in 0.293 seconds

   Ordered by: internal time

   List reduced from 383 to 5 due to restriction <5>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        6    0.276    0.046    0.276    0.046 {method 'recv' of '_socket.socket' objects}

        5    0.002    0.000    0.002    0.000 {_socket.gethostbyname}

        1    0.001    0.001    0.001    0.001 {_socket.getaddrinfo}

        5    0.001    0.000    0.001    0.000 {_scproxy._get_proxy_settings}

        8    0.001    0.000    0.001    0.000 /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py:1376(getproxies_environment)

因为这里的结果按tottime排序，并且只显示最费时的5个函数，所以看起来要比之前的结果清晰很多。

Line Profiler

与cProfile相比，Line Profiler的结果更加直观，它可以告诉你一个函数中每一行的耗时。Line Profiler并不在标准库中，需要用pip来安装

pip install line_profiler

Line Profiler的使用方法与cProfile类似，同样建议为其创建decorator或是contextmanager来方便使用。具体代码可以看这里。

例如，我们用Line Profiler来调试urlparse.parse_qsl函数，其输出结果如下

Line #  % Time  Line Contents

=============================

   390          def parse_qsl(qs, keep_blank_values=0, strict_parsing=0):

   ...

   409    15.3      pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]

   410     0.0      r = []

   411     3.2      for name_value in pairs:

   412     2.8          if not name_value and not strict_parsing:

   413                      continue

   414     8.2          nv = name_value.split('=', 1)

   415     5.7          if len(nv) != 2:

   416                      if strict_parsing:

   417                          raise ValueError, "bad query field: %r" % (name_value,)

   418                      # Handle case of a control-name with no equal sign

   419                      if keep_blank_values:

   420                          nv.append('')

   421                      else:

   422                          continue

   423     5.3          if len(nv[1]) or keep_blank_values:

   424    32.0              name = unquote(nv[0].replace('+', ' '))

   425    22.4              value = unquote(nv[1].replace('+', ' '))

   426     4.6              r.append((name, value))

   427

   428     0.4      return r

Line Profiler的结果虽然直观，但由于它不能显示子函数运行的详细信息，所以实际可以使用的场景比较有限。

定制低开销的Profiler

cProfile虽然是非常不错的性能调试工具，应用场景也很广泛，但它最主要的缺点是有很大的额外开销（overhead）。这使cProfile不适合在生产环境中使用，而且过大的额外开销有时会导致Profiler打开与关闭状态下代码的性能瓶颈不一致。cProfile高开销来自于它对每一个函数调用进行计时，但很多时候，其实我们只关心某些类库的执行时间。以典型的网络移动应用的服务程序为例，它们的大部分时间花费在I/O上，我们在做性能分析时，关心的是在响应用户请求时，服务器在RPC, Cache和DB这些I/O操作上各花费了多少时间。如果更细致一些，我们也许会关心在一次用户请求中，DB的select/insert/update各被调用了多少次，花费了多少时间。

如果大家熟悉如New Relic之类的APM服务（Application Performance Monitoring），它们可以将每一个web请求的响应时间分解成RPC/Cache/DB的开销，同时它并不会对服务器造成太多额外的负担。我们也可以自己编写这类的Profiler，需要做的就是那一些特定的I/O函数进行monkey-patch，将原先的函数替换成带有计时器的函数。比如，我们想对所有的requests.get函数计时，但又不能更改requests模块的代码，我们可以这样做

import requests

setattr(requests, 'get', simple_profiling(requests.get))

如果你想对requests模块下所有的公有API计时，我们可以写一个patch_module辅助函数

import types

import requests

def patch_module(module):

    methods = [m for m in dir(module) if not m.startswith('_')

                and isinstance(getattr(module, m), types.FunctionType)]

    for name in methods:

        func = getattr(module, name)

        setattr(module, name, simple_profiling(func))

patch_module(requests)

那要是我们想对所有redis的访问计时呢？这时我们需要对redis.StrictRedis这个类下的所有方法进行monkey-patch，我们可以写一个patch_class辅助函数

import types

import redis

def patch_class(cls):

    methods = [m for m in dir(cls) if not m.startswith('_')

                and isinstance(getattr(cls, m), types.MethodType)]

    for name in methods:

        method = getattr(cls, name)

        patched_func = simple_profiling(method.im_func)

        if isinstance(method, classmethod):

            setattr(cls, name, classmethod(patched_func))

        elif isinstance(method, staticmethod):

            setattr(cls, name, staticmethod(patched_func))

        else:

            setattr(cls, name, patched_func)

patch_class(redis.StrictRedis)

在Github上，我们给出了一个完整的自定义Profiler的例子，它可以patch一个模块或是一个类，对不同的方法归类，profiling结束时打印出各个方法的时间，以及各个类别的总耗时。同时这个Profiler也是线程安全的。由于篇幅有限，这里只给出使用这个Profiler的例子，实现部分请查阅源代码

patch_module(requests, category='http')

patch_class(redis.StrictRedis, category='redis')

with profiling('Custom profiler demo', verbose=True):

    resp = requests.get('http://tech.glowing.com')

    redis_conn = redis.StrictRedis()

    redis_conn.set('foo', 'bar')

    redis_conn.get('foo')

其执行结果如下

Performance stats for Custom profiler demo

CATEGROY           TIME

----------  -----------

http        0.405049

redis       0.000648975

CALL                                 TIME

----------------------------  -----------

requests.api.get              0.405049

redis.client.StrictRedis.set  0.000494003

redis.client.StrictRedis.get  0.000154972

CALL HISTORY  www.90168.org

    0.4050s requests.api.get[args=('http://tech.glowing.com',) kwargs={}]

    0.0005s redis.client.StrictRedis.set[args=(<redis.client.StrictRedis object at 0x7ffddb574710>, 'foo', 'bar') kwargs={}]

    0.0002s redis.client.StrictRedis.get[args=(<redis.client.StrictRedis object at 0x7ffddb574710>, 'foo') kwargs={}]

小结

性能调试是一个很大的题目，这里介绍的只是非常初级的知识。在最后一小节给出的例子也是希望大家不要局限于现有的工具，而是能根据自己的需求，打造最合适的工具。欢迎大家交流讨论！