python对象序列化之pickle

本片文章主要是对pickle官网的阅读记录。

The pickle module implements binary protocols for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” [1] or “flattening”; however, to avoid confusion, the terms used here are “pickling” and “unpickling”.

pickle是python标准模块之一，不需要再额外安装。

pickle用来序列化和反序列化 Python object structure。其实就是一种数据存储方式，将python的数据结构以特定的形式保存下来。另外，经过pickle序列化后的数据不是human-readable的。

这里提一下老外对事物的命名习惯，pickle是腌制的意思，那么对python object的"腌制"，其实就是一种数据处理，至于数据处理的规则是什么，这里暂时不做进一步介绍。

“Pickling” 就是将有层次结构的python object转换成字节流；“unpickling” 就是相反的过程。

说明: 如果碰到“Pickling” “serialization”, “marshalling,” or “flattening”，都是表达相同的意思，翻译成"序列化"就好了；如果单词前加了un，就翻成“反序列化”。

Warning：The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

不要去序列化错误的或者恶意的结构化数据，也不要去反序列化不受信任或未授权的数据源。意思就是“序列化”和“反序列化”要按照pickle模块的规则来进行。

Data stream format

The data format used by pickle is Python-specific. This has the advantage that there are no restrictions imposed by external standards such as JSON or XDR (which can’t represent pointer sharing); however it means that non-Python programs may not be able to reconstruct pickled Python objects.

pickle使用的数据格式是Python语言特有的。非Python程序可能不能重构被序列化的数据。

By default, the pickle data format uses a relatively compact binary representation. If you need optimal size characteristics, you can efficiently compress pickled data.

默认，pickle的序列化数据格式是一种相对紧凑的二进制表示。如果对数据大小有更高要求，可以压缩已序列化的数据。

The module pickletools contains tools for analyzing data streams generated by pickle. pickletools source code has extensive comments about opcodes used by pickle protocols.

pickletools包含很多用来解析已序列化数据的工具。

There are currently 5 different protocols which can be used for pickling. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.

Protocol version 0 is the original “human-readable” protocol and is backwards compatible with earlier versions of Python.
Protocol version 1 is an old binary format which is also compatible with earlier versions of Python.
Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.
Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. This is the default protocol, and the recommended protocol when compatibility with other Python 3 versions is required.
Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. Refer to PEP 3154 for information about improvements brought by protocol 4.

Note：

Serialization is a more primitive notion than persistence; although pickle reads and writes file objects, it does not handle the issue of naming persistent objects, nor the (even more complicated) issue of concurrent access to persistent objects. The pickle module can transform a complex object into a byte stream and it can transform the byte stream into an object with the same internal structure. Perhaps the most obvious thing to do with these byte streams is to write them onto a file, but it is also conceivable to send them across a network or store them in a database. The shelve module provides a simple interface to pickle and unpickle objects on DBM-style database files.

Module Interface

To serialize an object hierarchy, you simply call the dumps() function. Similarly, to de-serialize a data stream, you call the loads() function. However, if you want more control over serialization and de-serialization, you can create a Pickler or an Unpickler object, respectively.

通过dumps()进行序列化，通过loads()进行反序列化.

The pickle module provides the following constants:

　　pickle.HIGHEST_PROTOCOL 即指定协议版本号为最高版本号。

　　　　An integer, the highest protocol version available. This value can be passed as a protocol value to functions dump() and dumps() as well as the Pickler constructor.

　　pickle.DEFAULT_PROTOCOL 即指定默认版本号。当前的默认版本号是version3

　　　　An integer, the default protocol version used for pickling. May be less than HIGHEST_PROTOCOL. Currently the default protocol is 3, a new protocol designed for Python 3.

The pickle module provides the following functions to make the pickling process more convenient:

　　pickle.dump(obj, file, protocol=None, *, fix_imports=True) 即将数据obj写进文件

Write a pickled representation of obj to the open file object file. This is equivalent to Pickler(file, protocol).dump(obj).

The optional protocol argument, an integer, tells the pickler to use the given protocol; supported protocols are 0 to HIGHEST_PROTOCOL. If not specified, the default is DEFAULT_PROTOCOL. If a negative number is specified, HIGHEST_PROTOCOL is selected.

The file argument must have a write() method that accepts a single bytes argument. It can thus be an on-disk file opened for binary writing, an io.BytesIO instance, or any other custom object that meets this interface.

If fix_imports is true and protocol is less than 3, pickle will try to map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream is readable with Python 2.

pickle.dumps(obj, protocol=None, *, fix_imports=True)

Return the pickled representation of the object as a bytes object, instead of writing it to a file.

Arguments protocol and fix_imports have the same meaning as in dump().

pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict")

Read a pickled object representation from the open file object file and return the reconstituted object hierarchy specified therein. This is equivalent to Unpickler(file).load().

The protocol version of the pickle is detected automatically, so no protocol argument is needed. Bytes past the pickled object’s representation are ignored.

The argument file must have two methods, a read() method that takes an integer argument, and a readline() method that requires no arguments. Both methods should return bytes. Thus file can be an on-disk file opened for binary reading, an io.BytesIO object, or any other custom object that meets this interface.

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.

pickle.loads(bytes_object, *, fix_imports=True, encoding="ASCII", errors="strict")

Read a pickled object hierarchy from a bytes object and return the reconstituted object hierarchy specified therein.

The protocol version of the pickle is detected automatically, so no protocol argument is needed. Bytes past the pickled object’s representation are ignored.

The pickle module defines three exceptions:

exception pickle.PickleError: Common base class for the other pickling exceptions. It inherits Exception.

exception pickle.PicklingError

Error raised when an unpicklable object is encountered by Pickler. It inherits PickleError.

Refer to What can be pickled and unpickled? to learn what kinds of objects can be pickled.

exception pickle.UnpicklingError

Error raised when there is a problem unpickling an object, such as a data corruption or a security violation. It inherits PickleError.

Note that other exceptions may also be raised during unpickling, including (but not necessarily limited to) AttributeError, EOFError, ImportError, and IndexError.

What can be pickled and unpickled?

The following types can be pickled:

None, True, and False
integers, floating point numbers, complex numbers
strings, bytes, bytearrays
tuples, lists, sets, and dictionaries containing only picklable objects
functions defined at the top level of a module (using def, not lambda)
built-in functions defined at the top level of a module
classes that are defined at the top level of a module
instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section Pickling Class Instances for details).