Exploring Python Code Objects

https://late.am/post/2012/03/26/exploring-python-code-objects.html

Inspired by David Beazley's Keynote at PyCon, I've been digging around in code objects in Python lately. I don't have a particular axe to grind, nor some particular task to solve (yet?), so consider this post just some notes and ramblings that might be of interest (and my apologies if not).

Disclaimer: This post is about CPython version 2.7, though much of it is also likely true for other CPython versions (including 3.x). I make no claims to its accuracy or applicability to PyPy, Jython, IronPython, etc.

Step 0: What?

So first of all, what is a code object? Many people (particularly Python haters) claim that Python is an interpreted language, but all your Python code is actually compiled to bytecode before it is ever executed. This goes even for code you write interactively in the Python shell. CPython implements a virtual machine that executes a stack-based bytecode. At runtime, executable things (functions, methods, modules, class bodies, lambdas, statements, expressions, etc.) are all executed as bytecode by the Python virtual machine.

Code objects, then, are Python objects which represent some piece of bytecode, along with all that it needs to execute: a declaration of the expected argument names and counts, a list (not dictionary! more about which later) of locals, information about the source code from which the bytecode was generated (for debugging and printing stack traces), etc. -- oh, and also (perhaps obviously), the bytecode itself, as a str (or, in Python 3, bytes).

Though code objects represent some piece of executable code, they are not, by themselves, directly callable. To execute a code object, you must use the exec statement (a function in Python 3) or the eval() function.
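Here's a minimal sketch of that, written so it also runs on Python 3 (where exec is a function). The names source and namespace are just mine for illustration; running the code in an explicit dictionary keeps its variables out of the current scope.

```python
# Compile a small snippet, then run it with exec in an explicit namespace.
source = "result = 6 * 7"
code_obj = compile(source, "<string>", "exec")

namespace = {}
exec(code_obj, namespace)
print(namespace["result"])

# A code object on its own cannot be called like a function.
print(callable(code_obj))
```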

Step 1: Make some Code

Most of the time, you won't encounter code objects in ordinary Python programming. When you do, there's a very good chance that they were created and are managed for you by Python, without any special attention. In some cases, you might want to create code objects yourself, like in this post where we'll be experimenting with them:

>>> code_str = """
... print "Hello, world"
... """
>>> code_obj = compile(code_str, '<string>', 'exec')
>>> code_obj
<code object <module> at 0x1054c74b0, file "<string>", line 2>

Woohoo, your first code object!

The first argument to compile() is the string of Python code to be compiled, which should be obvious. The second defines the "filename" of the piece of code (here, as is conventional, we use <string> to indicate code that came from a string rather than from a file). The third is the type of compilation, which most often will be exec as you see here. The other choices for mode are eval, which is used for strings containing only a single expression, or single, in which the generated code object is expected to contain a single statement, and the value of an expression statement is printed if it is not None (like in the interactive shell).

When using eval mode, if the code contains statements (as our example above does, it contains a print statement), compilation will fail with a syntax error:

>>> code_str = """
... print "Hello, world"
... """
>>> code_obj = compile(code_str, '<string>', 'eval')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 2
    print "Hello, world"
        ^
SyntaxError: invalid syntax

When using single, only a single statement is processed; in Python 2.7, any statements after the first are silently ignored (recent Python 3 versions raise a SyntaxError instead):

>>> code_str = """
... print "Hello, world"
... print "Goodbye, world"
... """
>>> code_obj = compile(code_str, '<string>', 'single')
>>> exec code_obj
Hello, world

What happened to my "goodbye"?

For the rest of the post, we'll stick with exec, which is the type of compilation Python does for you when importing modules.
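To put the three modes side by side, here is a small sketch. It is written for Python 3 (it uses io.StringIO and contextlib.redirect_stdout to capture what single mode echoes), and the variable names are my own.

```python
import contextlib
import io

# 'eval' mode: a single expression; eval() returns its value.
expr_code = compile("1 + 2", "<string>", "eval")
value = eval(expr_code)

# 'exec' mode: any sequence of statements; results live in the namespace.
stmt_code = compile("a = 1\nb = a + 1", "<string>", "exec")
namespace = {}
exec(stmt_code, namespace)

# 'single' mode: one interactive statement; a non-None expression result
# is echoed, as it would be at the >>> prompt.
single_code = compile("1 + 2", "<string>", "single")
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    exec(single_code, {})
echoed = captured.getvalue()

print(value, namespace["b"], repr(echoed))
```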

Step 2: Open 'er Up

Let's go back to our first example, and have a look inside the code object to see what we have:

>>> code_str = """
... print "Hello, world"
... """
>>> code_obj = compile(code_str, '<string>', 'exec')
>>> dir(code_obj)
# dunder attributes excluded for readability
['co_argcount', 'co_cellvars', 'co_code', 'co_consts', 'co_filename',
'co_firstlineno', 'co_flags', 'co_freevars', 'co_lnotab', 'co_name',
'co_names', 'co_nlocals', 'co_stacksize', 'co_varnames']

These attributes are documented in the documentation for the inspect module, but I'll highlight a few cool ones here:

First, we can see where our second argument to compile() ended up:

>>> code_obj.co_filename
'<string>'

And, perhaps surprisingly, our code represents an anonymous module (code compiled with exec mode is always treated as module-level code, though, of course, it can contain function or class definitions, or any other valid Python):

>>> code_obj.co_name
'<module>'
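As a small illustration of module-level code containing a function definition: the function body gets its own code object, which the module's code object carries in co_consts. This sketch should run on Python 2.7 and 3; the names module_code and nested are mine.

```python
import types

source = """
def foo(x, y):
    return x + y
"""
module_code = compile(source, "<string>", "exec")

# Pick out any constants that are themselves code objects.
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]

print(module_code.co_name)
print([c.co_name for c in nested])
```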

And, as we expect, a code object representing a Python module (that's effectively what our code string was -- a series of statements at the top-most level, that is, not indented at all) takes no arguments:

>>> code_obj.co_argcount
0
>>> code_obj.co_varnames
()

If we were to take a piece of code from a function which does have arguments, we'd see them here:

>>> def foo(x, y):
...     print x, y
...
>>> foo.func_code
<code object foo at 0x1054b9830, file "<stdin>", line 1>
>>> foo.func_code.co_varnames
('x', 'y')
>>> foo.func_code.co_argcount
2
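One detail worth a quick sketch: co_varnames holds the arguments first and then any other local names, so co_argcount tells you where the arguments stop. The example below uses foo.__code__, which is the Python 3 spelling of func_code (it also works on 2.6 and later); the extra local named total is just for illustration.

```python
def foo(x, y):
    total = x + y  # 'total' is a local, not an argument
    return total

code = foo.__code__

print(code.co_argcount)
print(code.co_varnames)
# Slicing by co_argcount recovers just the argument names.
print(code.co_varnames[:code.co_argcount])
```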

If you're curious, you can also see the raw bytecode that will be processed by the Python virtual machine:

>>> code_obj.co_code
'd\x00\x00GHd\x01\x00S'

I don't recommend trying to learn to read that directly, there's an easier way (hint: see the next section).

Finally, we have one constant object within scope, the string "Hello, world", which is printed by our code:

>>> code_obj.co_consts
('Hello, world', None)

Wait. Where's that None coming from?

A Detour into Code Disassembly

We can see exactly what's going on in our code object with the dis module, which disassembles code objects into a human readable series of bytecode instructions:

>>> import dis
>>> dis.dis(code_obj)
  2           0 LOAD_CONST               0 ('Hello, world')
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               1 (None)
              8 RETURN_VALUE

Reading disassembled Python code requires a bit of experience, so let me walk you through it. The LOAD_CONST instruction reads a value from the co_consts tuple, and pushes it onto the top of the stack. The PRINT_ITEM instruction pops the top of the stack, and prints the string representation. PRINT_NEWLINE should be pretty self-explanatory.

Next we see the mysterious None. It turns out this is a bit of a quirk of the implementation details of the CPython virtual machine. Since function calls in Python (including "hidden" function calls, like those behind an import statement) are implemented with function calls in C in the Python virtual machine, modules actually have a return value -- this indicates to the Python virtual machine that execution of the module has completed, and control can be returned to the calling scope (i.e. the module in which the import statement appeared). I won't embarrass myself by trying to explain this further -- if you are interested, see Larry Hastings's PyCon presentation Stepping through CPython around 44:22 -- that video covers Python 3.x, but Python 2.7 does the same thing. If you're interested in this sort of implementation detail, then you should definitely watch the entirety of this video, and David Beazley's keynote as well.
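One way to see that return value from Python itself: eval() also accepts code objects, and for a code object compiled in exec mode it hands back None once the code has run. A small sketch (the variable names are mine, and it should behave the same on 2.7 and 3.x):

```python
code_obj = compile('greeting = "Hello, world"', "<string>", "exec")

namespace = {}
# eval() runs the module-level code and returns whatever it "returned".
result = eval(code_obj, namespace)

print(result)
print(namespace["greeting"])
```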

Step 3: Interesting Internals

Many of the features we've looked at are clearly useful for a running Python virtual machine, but what about the human side of the story? What if we want to interactively debug code (using pdb or a similar tool), or get helpful, readable tracebacks from exceptions?

It turns out, code objects support this as well. As we've already seen, code objects indicate from which file they were generated, and this will obviously help in looking up source code; they also indicate the line number on which the source code for this code object begins:

>>> code_obj.co_firstlineno
2

Then there is the mysterious co_lnotab attribute. To illustrate its purpose, we'll need a larger code snippet:

>>> code_str = """
... x = 1
... y = 2
... print x + y
... """
>>> code_obj = compile(code_str, '<string>', 'exec')
>>> code_obj.co_lnotab
'\x06\x01\x06\x01'

Hm, so what are we to make of this? Perhaps the dis module can help here again:

>>> dis.dis(code_obj)
  2           0 LOAD_CONST               0 (1)
              3 STORE_NAME               0 (x)

  3           6 LOAD_CONST               1 (2)
              9 STORE_NAME               1 (y)

  4          12 LOAD_NAME                0 (x)
             15 LOAD_NAME                1 (y)
             18 BINARY_ADD
             19 PRINT_ITEM
             20 PRINT_NEWLINE
             21 LOAD_CONST               2 (None)
             24 RETURN_VALUE

At the far left of (some) lines is the line number of the Python source from which this code object was created (notice that 2 here corresponds to the value of code_obj.co_firstlineno). The next column is the offset into the code of the bytecode instruction, 0 bytes for the first instruction, 3 bytes for the second, and so on. The third column is the instruction name itself, and the fourth is the argument to the instruction, if any, along with the value of the argument in parentheses.
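If you'd rather get at those columns programmatically than read them off the printed table, Python 3.4 and later offer dis.get_instructions() (it is not available in 2.7). A rough sketch follows; the exact opcodes and offsets vary between CPython versions, so don't expect them to match the 2.7 listing above.

```python
import dis

code_obj = compile("x = 1", "<string>", "exec")

# Each Instruction carries the columns that dis.dis() prints:
# byte offset, opcode name, raw argument, and the resolved argument value.
for instr in dis.get_instructions(code_obj):
    print(instr.offset, instr.opname, instr.arg, instr.argval)

# The assignment shows up as a STORE_NAME whose resolved argument is 'x'.
stores = [i for i in dis.get_instructions(code_obj) if i.opname == "STORE_NAME"]
print([i.argval for i in stores])
```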

Now we can put this together with the co_lnotab (which stands for "line number table", by the way) to see how Python makes sense of the code objects' relation to their original source code:

>>> code_obj.co_lnotab
'\x06\x01\x06\x01'

After a little tinkering and trial and error, I realized that this is a series of pairs of bytes: the first byte of each pair is an increment to the bytecode offset (6 bytes, which advances us to the second LOAD_CONST as seen in our disassembly), and the second is the number of source lines to advance at that point.
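That reading of the byte pairs can be sketched as a small decoder. This is my own illustration rather than CPython's code: decode_lnotab is a hypothetical helper, and it assumes the unsigned Python 2.7 format described here (later CPython versions changed the encoding).

```python
def decode_lnotab(lnotab, first_line):
    """Turn (offset increment, line increment) byte pairs into a list of
    (bytecode offset, source line) entries, starting at first_line."""
    pairs = bytearray(lnotab)  # gives integers on both Python 2 and 3
    table = [(0, first_line)]
    offset, line = 0, first_line
    for i in range(0, len(pairs), 2):
        offset += pairs[i]
        line += pairs[i + 1]
        if table[-1][0] == offset:
            # A zero offset increment only moves the line number forward.
            table[-1] = (offset, line)
        else:
            table.append((offset, line))
    return table

# The co_lnotab and co_firstlineno values from the example above.
print(decode_lnotab(b"\x06\x01\x06\x01", 2))
```

The result pairs offsets 0, 6 and 12 with lines 2, 3 and 4, which are the same line starts shown in the left-hand column of the disassembly.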

We can confirm this theory by slightly modifying our source code, recompiling, and examining the co_lnotab attribute of the resulting code object:

>>> code_str2 = """
... x = 1
...
... y = 2
... print x + y
... """
>>> code_obj2 = compile(code_str2, '<string>', 'exec')
>>> code_obj2.co_lnotab
'\x06\x02\x06\x01'

We've moved the second assignment down one line, so we see that in the second byte of co_lnotab, we are incrementing the "current line number" by two instead of by one.

We can also verify that the bytecode resulting from these two (slightly) different source codes is identical:

>>> code_obj2.co_code == code_obj.co_code
True
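The same check can be made without reading co_lnotab by hand, using dis.findlinestarts(), which yields (bytecode offset, line number) pairs. This is a sketch with made-up source strings; it is meant to run on Python 3 as well, where some entries may carry no real line number and are skipped.

```python
import dis

source_a = "x = 1\ny = 2\n"
source_b = "x = 1\n\ny = 2\n"  # same statements, one extra blank line

code_a = compile(source_a, "<string>", "exec")
code_b = compile(source_b, "<string>", "exec")

def source_lines(code):
    # Collect the distinct source lines that the bytecode maps back to.
    return sorted({line for _, line in dis.findlinestarts(code) if line})

# Identical bytecode, different line tables.
print(code_a.co_code == code_b.co_code)
print(source_lines(code_a), source_lines(code_b))
```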

Since both the bytecode offset and line number offset are single (unsigned) bytes, one might wonder what happens if you have, say, 257 (or more) blank lines between statements in a Python source file? Let's see:

>>> thousand_blanks = '\n' * 1000
>>> code_str = """
... x = 1
... """ + thousand_blanks + """
... y = 2
... print x + y
... """
>>> code_obj = compile(code_str, '<string>', 'exec')
>>> code_obj.co_lnotab
'\x06\xff\x00\xff\x00\xff\x00\xec\x06\x01'

Since both the bytecode offsets and line offsets are, well, offsets, having large empty spaces just means that some of the interleaved offsets are 0-length offsets. Here we have a 6-byte offset into bytecode, followed by a 255-line offset into the source code, then a 0-byte offset into the bytecode, another 255 lines of source, another 0 bytes into the bytecode, yet another 255 lines of source, one more 0-byte offset into bytecode, and a final 236 lines of offset into the source code (then the usual, expected 6 bytes of bytecode and 1 line of source code for the final print statement). Neat!
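The splitting can be sketched in the other direction too: given one bytecode increment and a line increment too large for a byte, emit a run of pairs. The helper split_line_gap below is hypothetical (my own sketch of the idea, not CPython's compiler code), and it assumes the unsigned 2.7 format.

```python
def split_line_gap(byte_incr, line_incr):
    """Split one (bytecode increment, line increment) step into pairs
    whose line increments each fit in a single unsigned byte."""
    pairs = []
    while line_incr > 255:
        pairs.append((byte_incr, 255))
        byte_incr = 0        # later pairs add lines but no bytecode
        line_incr -= 255
    pairs.append((byte_incr, line_incr))
    return pairs

# 6 bytes of bytecode followed by a jump of 1001 source lines.
pairs = split_line_gap(6, 1001)
encoded = bytes(bytearray(b for pair in pairs for b in pair))
print(pairs)
print(repr(encoded))
```

With 6 and 1001 as inputs this reproduces the first eight bytes of the co_lnotab shown above: 255 + 255 + 255 + 236 lines.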
