Python Talk Notes 1

References:

1. The Clean Architecture in Python (Brandon Rhodes)

2. Python Best Practice Patterns (Vladimir Keleshev)

3. Transforming Code into Beautiful, Idiomatic Python (Raymond Hettinger)

4. How to Write Reusable Code (Greg Ward)

5. How to write actually object-oriented python (Per Fagrell)

I recently watched several Python talks and found them very illuminating.

1. The Clean Architecture in Python (Brandon Rhodes)

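We habitually use subroutines to hide messy I/O instead of truly decoupling it from the logic, so it is better to hoist the I/O from the bottom of the program to the top.

Listing 1 calls the API, tries to read the Definition field, and returns it.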

# Listing 1

import requests

from urllib.parse import urlencode  # Python 2: from urllib import urlencode

def find_definition(word):

    q = 'define ' + word

    url = 'http://api.duckduckgo.com/?'

    url += urlencode({'q': q, 'format': 'json'})

    response = requests.get(url)    # I/O

    data = response.json()          # I/O

    definition = data[u'Definition']

    if definition == u'':

        raise ValueError('that is not a word')

    return definition

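Listing 2 recognizes that I/O should be separated from logic, so call_json_api is introduced. On the surface the I/O is hidden, but it has not actually been separated from the logic.

If we want to test find_definition now, can we bypass the I/O? No: the I/O and the logic are still tightly coupled.

Look again at what find_definition actually does: it builds the URL, performs I/O, and checks the result. Decouple along those lines.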

# Listing 2

def find_definition(word):

    q = 'define ' + word

    url = 'http://api.duckduckgo.com/?'

    url += urlencode({'q': q, 'format': 'json'})

    data = call_json_api(url)

    definition = data[u'Definition']

    if definition == u'':

        raise ValueError('that is not a word')

    return definition

def call_json_api(url):

    response = requests.get(url)

    data = response.json()

    return data

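In Listing 3 the code itself is unchanged but rearranged: building the URL and checking the result are split out and made independent of the I/O.

I think keeping the I/O in call_json_api would also be fine here; the speaker probably inlined it to emphasize hoisting the I/O from the bottom of the program to the top level.

The key point is that build_url and pluck_definition are fully decoupled from I/O, so they can be tested freely, and they are fast functions.

Testing find_definition itself is also easier than with the versions in Listings 1 and 2.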

# Listing 3

def find_definition(word):

    url = build_url(word)

    data = requests.get(url).json()    # I/O

    return pluck_definition(data)

def build_url(word):

    q = 'define ' + word

    url = 'http://api.duckduckgo.com/?'

    url += urlencode({'q': q, 'format': 'json'})

    return url

def pluck_definition(data):

    definition = data[u'Definition']

    if definition == u'':

        raise ValueError('that is not a word')

    return definition

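Functions without side effects are called pure functions, and pure functions are easier to test. Dependency injection and monkey patching compensate for a flawed program structure, which Python code can largely avoid.

build_url and pluck_definition are pure functions, whereas the find_definition of Listings 1 and 2 can only be tested in Python by monkey patching.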

def test_build_url():

    assert build_url('word') == (

        'http://api.duckduckgo.com/'

        '?q=define+word&format=json'

    )

def test_build_url_with_punctuation():

    assert build_url('what?!') == (

        'http://api.duckduckgo.com/'

        '?q=define+what%3F%21&format=json'

    )

def test_build_url_with_hyphen():

    assert build_url('hyphen-ate') == (

        'http://api.duckduckgo.com/'

        '?q=define+hyphen-ate&format=json'

    )
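pluck_definition can be tested in the same style, with plain dictionaries standing in for the decoded API response. The following is only a sketch: the function body is copied from Listing 3, while the sample payloads and test names are made up for illustration.

```python
def pluck_definition(data):
    # Copied from Listing 3: pure logic, no I/O.
    definition = data[u'Definition']
    if definition == u'':
        raise ValueError('that is not a word')
    return definition


def test_pluck_definition():
    # Hypothetical payload; the real API response has more fields.
    data = {u'Definition': u'a unit of language'}
    assert pluck_definition(data) == u'a unit of language'


def test_pluck_definition_rejects_empty():
    try:
        pluck_definition({u'Definition': u''})
    except ValueError as error:
        assert str(error) == 'that is not a word'
    else:
        raise AssertionError('expected ValueError')


test_pluck_definition()
test_pluck_definition_rejects_empty()
```

No network access or patching is needed, which is the practical payoff of keeping the logic pure.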

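The biggest advantage of functional programming may not be immutable data structures but the fact that it works on data we can picture; the speaker illustrates this with shell commands.

Compare the two versions of find_definition again. The one in Listing 3 is clearly easier to read than the one in Listing 1. Why?

word -> url -> data are all concrete data. As in a shell pipeline, data flows from one stage to the next, and we can clearly picture its shape at every step.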

# Listing 1

def find_definition(word):

    q = 'define ' + word

    url = 'http://api.duckduckgo.com/?'

    url += urlencode({'q': q, 'format': 'json'})

    response = requests.get(url)    # I/O

    data = response.json()          # I/O

    definition = data[u'Definition']

    if definition == u'':

        raise ValueError('that is not a word')

    return definition

# Listing 3

def find_definition(word):

    url = build_url(word)

    data = requests.get(url).json()    # I/O

    return pluck_definition(data)

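Summary

The talk is titled The Clean Architecture in Python, and what makes an architecture unclean is I/O. I/O is usually awkward to deal with, so it gets wrapped in a subroutine, which looks like a solution. But with this out-of-sight, out-of-mind approach the I/O and the logic remain tightly coupled, and no part of the logic can be tested separately from the I/O. To work around this, static languages introduced dependency injection and dynamic languages use monkey patching, but neither is as good as facing the I/O from the start and decoupling it from the logic. Functions that contain only logic are pure functions. Data flows through them much like through a shell pipeline, with a concrete representation at every step, and that is what makes a clean architecture.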

2. Python Best Practice Patterns (Vladimir Keleshev)

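Each function should have one well-defined purpose, and the operations inside it should all sit at the same level of abstraction. Following this principle, a program naturally becomes a collection of many small functions, each perhaps only a few lines long.

The example is a boiler safety check: once temperature and pressure reach their limits the boiler shuts down automatically, and a failed shutdown triggers an alarm.

safety_check covers reading and computing temperature and pressure, checking the limits, shutting down, and raising the alarm. These steps are at the same level, but their internal details are not, so decouple along those lines.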

class Boiler(object):

    # ...

    def safety_check(self):

        # Convert fixed-point to floating-point:

        temperature = self.modbus.read_holding()

        pressure_psi = self.abb_f100.register / F100_FACTOR

        if psi_to_pascal(pressure_psi) > MAX_PRESSURE:

            if temperature > MAX_TEMPERATURE:

                # Shutdown!

                self.pnoz.relay[15] &= MASK_POWER_COIL

                self.pnoz.port.write("$PL,15\0")

                sleep(RELAY_RESPONSE_DELAY)

                # Successful shutdown?

                if self.pnoz.relay[16] & MASK_POWER_OFF:

                    # Play alarm:

                    with open(BUZZER_MP3_FILE) as f:

                        play_sound(f.read())

Worth noting is the use of @property and all; besides all there is also any.

class Boiler(object):

    # ...

    def alarm(self):

        with open(BUZZER_MP3_FILE) as f:

            play_sound(f.read())

    def shutdown(self):

        self.pnoz.relay[15] &= MASK_POWER_COIL

        self.pnoz.port.write("$PL,15\0")

        sleep(RELAY_RESPONSE_DELAY)

        return not (self.pnoz.relay[16] & MASK_POWER_OFF)

    def safety_check(self):

        if all((self.pressure > MAX_PRESSURE,

                self.temperature > MAX_TEMPERATURE)):

            if not self.shutdown():

                self.alarm()

    @property

    def temperature(self):

        return self.modbus.read_holding()

    @property

    def pressure(self):

        pressure_psi = self.abb_f100.register / F100_FACTOR

        return psi_to_pascal(pressure_psi)

Everything an instance needs for initialization should be passed to the initializer (__init__).

# wrong

point = Point()

point.x = 12

point.y = 5

# better

point = Point(x=12, y=5)

point = Point.polar(r=13, theta=22.6)

class Point(object):

    def __init__(self, x, y):

        self.x, self.y = x, y

    @classmethod

    def polar(cls, r, theta):

        return cls(r * cos(theta),

                   r * sin(theta))
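One detail to watch: math.cos and math.sin take radians, while theta=22.6 in the call above reads like degrees (atan(5/12) is about 22.62 degrees). The sketch below is a runnable version under that assumption; the parameter name theta_degrees and the conversion step are mine, not from the talk.

```python
from math import cos, sin, radians, isclose


class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    @classmethod
    def polar(cls, r, theta_degrees):
        # Assumption: the angle arrives in degrees, so convert it
        # before calling cos/sin, which expect radians.
        theta = radians(theta_degrees)
        return cls(r * cos(theta), r * sin(theta))


point = Point.polar(r=13, theta_degrees=22.62)
# 22.62 degrees is roughly atan(5 / 12), so this lands near (12, 5).
assert isclose(point.x, 12, abs_tol=0.01)
assert isclose(point.y, 5, abs_tol=0.01)
```

The classmethod acts as an alternative constructor, so callers still receive a fully initialized Point in a single call.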

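A function takes many parameters and has many temporary variables inside. How can it be improved?

For a complex task, later code depends on temporaries such as processed, copied, and executed, which in turn depend on the parameters task, job, and obligation.

Suppose send_task can be decomposed into prepare, process, and execute.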

def send_task(task, job, obligation):

    ...

    processed = ...

    ...

    copied = ...

    ...

    executed = ...

    ...

    # 100 more lines

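A first attempt extracts the preparation that produces processed, copied, and executed, but this is not much of an improvement. If process and execute also depend on task, job, and obligation, the number of function parameters becomes a problem.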

def prepare(task, job, obligation):

    ...

    return processed, copied, executed

def process(processed, copied, executed):

    ...

    return processed, copied, executed

def execute(processed, copied, executed):

    ...

def send_task(task, job, obligation):

    execute(*process(*prepare(task, job, obligation)))

If several functions share the same data, they should be a class, because a class is exactly a bundle of data and functions.

class TaskSender(object):

    def __init__(self, task, job, obligation):

        self.task = task

        self.job = job

        self.obligation = obligation

        self.processed = []

        self.copied = []

        self.executed = []

    def __call__(self):

        self.prepare()

        self.process()

        self.execute()

    ...
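To make the shape concrete, here is one way the elided methods might be filled in. The three step bodies are placeholders invented for this sketch; only the constructor and __call__ follow the class above.

```python
class TaskSender(object):
    def __init__(self, task, job, obligation):
        self.task = task
        self.job = job
        self.obligation = obligation
        self.processed = []
        self.copied = []
        self.executed = []

    def __call__(self):
        self.prepare()
        self.process()
        self.execute()

    # Placeholder steps (not from the talk): each one reads and writes
    # shared state on self instead of passing long argument lists.
    def prepare(self):
        self.copied = [self.task, self.job, self.obligation]

    def process(self):
        self.processed = [item.upper() for item in self.copied]

    def execute(self):
        self.executed = ['sent ' + item for item in self.processed]


sender = TaskSender('task', 'job', 'obligation')
sender()
print(sender.executed)  # ['sent TASK', 'sent JOB', 'sent OBLIGATION']
```

None of the step methods takes a parameter other than self, which is the point of moving the shared data onto the instance.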

Some actions must always happen together. How do we guarantee that?

Use a context manager.

# not good

f = open('file.txt', 'w')

f.write('hi')

f.close()

# better

with open('file.txt', 'w') as f:

    f.write('hi')

with SomeProtocol(host, port) as protocol:

    protocol.send(['get', signal])

    result = protocol.receive()

class SomeProtocol(object):

    def __init__(self, host, port):

        self.host, self.port = host, port

    def __enter__(self):

        self._client = socket()

        self._client.connect((self.host, self.port))

        return self

    def __exit__(self, exception, value, traceback):

        self._client.close()

    def send(self, payload): ...

    def receive(self): ...
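The guarantee that matters is that __exit__ runs even when the body of the with block raises. A small sketch of that behaviour follows, using a made-up FakeConnection class that records events instead of opening a real socket.

```python
class FakeConnection(object):
    """Stand-in for a socket so the sketch runs without a network."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('connect')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and also when the body raises.
        self.events.append('close')
        return False  # do not swallow the exception

    def send(self, payload):
        self.events.append('send')
        raise IOError('simulated failure')


connection = FakeConnection()
try:
    with connection as protocol:
        protocol.send(['get', 'signal'])
except IOError:
    pass

print(connection.events)  # ['connect', 'send', 'close']
```

Returning False from __exit__ lets the exception propagate to the caller after the cleanup has run.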

str and repr

When debugging, you can print the instance directly instead of building a string from its attributes.

# default

>>> Point(12, 5)

<__main__.Point instance at 0x100b4a758>

# __repr__

>>> Point(12, 5)

Point(x=12, y=5)

# __str__

>>> print(Point(12, 5))

(12, 5)

class Point(object):

    ...

    def __str__(self):

        return '({}, {})'.format(self.x, self.y)

    def __repr__(self):

        return '{}(x={}, y={})'.format(self.__class__.__name__,

                                       self.x, self.y)
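Putting the pieces together gives a runnable version of the class. The constructor is repeated from the earlier Point example, and the last line shows that containers use repr for their elements.

```python
class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __str__(self):
        # Human-readable form, used by print() and str().
        return '({}, {})'.format(self.x, self.y)

    def __repr__(self):
        # Unambiguous form, used by the interactive prompt and repr().
        return '{}(x={}, y={})'.format(self.__class__.__name__,
                                       self.x, self.y)


point = Point(12, 5)
print(repr(point))  # Point(x=12, y=5)
print(str(point))   # (12, 5)
print([point])      # [Point(x=12, y=5)]
```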

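A comment is information that should have been expressed in the code but is missing from it; from this angle, a comment is no different from a bug.

This echoes an Extreme Programming idea mentioned in talk 1. The point here, too, is that some comments are better expressed in the code itself.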

# not good

if self.flags & 0b1000:    # Am I visible?

    ...

# better

@property

def is_visible(self):

    return self.flags & 0b1000

if self.is_visible:

    ...

# Tell my station to process me

self.station.process(self)

Avoid unnecessary conditionals

To some extent, if/else can be replaced by good design.

# normal

if type(entry) is Film:

    responsible = entry.producer

else:

    responsible = entry.author

# better

class Film(object):

    ...

    @property

    def responsible(self):

        return self.producer

entry.responsible
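For the caller to drop the type check, every kind of entry has to offer the same attribute. A sketch with a hypothetical Book class standing in for the non-film case (the notes only show Film):

```python
class Film(object):
    def __init__(self, producer):
        self.producer = producer

    @property
    def responsible(self):
        return self.producer


class Book(object):
    # Hypothetical counterpart to Film, not shown in the notes.
    def __init__(self, author):
        self.author = author

    @property
    def responsible(self):
        return self.author


entries = [Film(producer='Alice'), Book(author='Bob')]
names = [entry.responsible for entry in entries]
print(names)  # ['Alice', 'Bob']
```

The loop never inspects the type of an entry; each class decides for itself who is responsible.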

Class variables

Where a class variable will do, avoid a global variable. Inside instance methods, class variables are accessed as Classname.variable.
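A minimal sketch of this advice; the Boiler class and its limit value are invented for illustration.

```python
class Boiler(object):
    # Hypothetical limit kept on the class instead of in a module global.
    MAX_PRESSURE = 100

    def __init__(self, pressure):
        self.pressure = pressure

    def is_over_pressure(self):
        # Instance methods reach the class variable through the class name.
        return self.pressure > Boiler.MAX_PRESSURE


print(Boiler(120).is_over_pressure())  # True
print(Boiler(80).is_over_pressure())   # False
```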

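Iterators

Implement iteration through __iter__. I do not yet fully understand the benefits of iterators, but using one does feel better: beyond cleaner code, exposing fewer of Department's attributes is probably another advantage.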

# normal

class Department(object):

    def __init__(self, *employees):

        self.employees = employees

for employee in department.employees:

    ...

# use __iter__

class Department(object):

    def __init__(self, *employees):

        self._employees = employees

    def __iter__(self):

        return iter(self._employees)

for employee in department:

    ...
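One concrete benefit, sketched below with made-up employee names: once __iter__ is defined, anything that consumes an iterable works on the department directly, without knowing how the employees are stored.

```python
class Department(object):
    def __init__(self, *employees):
        self._employees = employees

    def __iter__(self):
        return iter(self._employees)


department = Department('Ann', 'Bo', 'Cy')
print(sorted(department, reverse=True))  # ['Cy', 'Bo', 'Ann']
print('Bo' in department)                # True
print(', '.join(department))             # Ann, Bo, Cy
```

If the internal storage later changes from a tuple to, say, a database query, only __iter__ needs to change.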

Set and Concatenating Streams

Using set, and the wonderfully useful itertools library.

item in a_set

item not in a_set

a_set <= other

a_set.issubset(other)

a_set | other

a_set.union(other)

a_set & other

a_set.intersection(other)

a_set - other

a_set.difference(other)

# not good

for each in big_list + another_big_list:

    ...

# better

for each in itertools.chain(big_list,

                            another_big_list):

    ...
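The operators and methods listed above can be tried with small sample values (chosen here purely for illustration). Results are passed through sorted() because set ordering is not guaranteed.

```python
import itertools

a_set = {1, 2}
other = {1, 2, 3}

print(a_set <= other)          # True
print(a_set.issubset(other))   # True
print(sorted(a_set | other))   # [1, 2, 3]
print(sorted(a_set & other))   # [1, 2]
print(sorted(other - a_set))   # [3]

big_list = [1, 2, 3]
another_big_list = [4, 5]

# chain yields items lazily instead of building a third, combined list.
total = 0
for each in itertools.chain(big_list, another_big_list):
    total += each
print(total)                   # 15
```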

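Summary:

The separation of I/O and logic from talk 1 gets a more general, practical formulation here: every function should have one well-defined purpose, the code inside it should sit at a single level of abstraction, and decoupling follows from that. The talk also covers large methods whose code shares many parameters and temporary variables: once split, the resulting functions take too many parameters, and that is when a class, a bundle of data and methods, is the right tool. These two points answer the question of how code should be organized. The rest are techniques such as context managers, iterators, set, and itertools. Also worth repeating: a comment is information that should have been in the code but is not, much like a bug. This is not an argument against writing comments; it says the code itself should be more explicit.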

3. Transforming Code into Beautiful, Idiomatic Python (Raymond Hettinger)

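Idioms covered: reversed, sorted, enumerate, izip, iter, partial, .iteritems(), dict(izip(list1, list2)), dict(enumerate(list1)), defaultdict, .setdefault, popitem (which is atomic), namedtuple, deque. (izip and .iteritems() are Python 2 names; Python 3 uses zip and .items().)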
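A few of these idioms in Python 3 form, as a rough sketch. The sample data is illustrative, and zip replaces the Python 2 izip.

```python
from collections import defaultdict, deque, namedtuple

names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue']

# enumerate instead of indexing with range(len(...))
indexed = [(i, name) for i, name in enumerate(names)]

# zip (izip in Python 2) pairs two sequences; dict() builds a mapping
favorite = dict(zip(names, colors))

# defaultdict groups items without checking for missing keys
by_length = defaultdict(list)
for name in names:
    by_length[len(name)].append(name)

# namedtuple gives tuple fields readable names
Color = namedtuple('Color', ['red', 'green', 'blue'])
teal = Color(red=0, green=128, blue=128)

# deque has fast appends and pops at both ends
queue = deque(names)
queue.appendleft('mark')
last = queue.pop()

print(indexed[0])          # (0, 'raymond')
print(favorite['rachel'])  # green
print(by_length[7])        # ['raymond', 'matthew']
print(teal.green)          # 128
print(list(queue), last)   # ['mark', 'raymond', 'rachel'] matthew
```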

4. How to Write Reusable Code (Greg Ward)

OOP was not a silver bullet

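Object orientation, functional programming, and coroutines each solve particular problems. There is no silver bullet.

OOP is one approach among many; neither worshipping nor rejecting it is a good attitude.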

Fewer Classes More Functions

If a function does the job elegantly, do not use a class.

If many functions share some variables, that is a class, which agrees with talk 2.

Functions ≠ Procedures

Pascal's best idea: functions compute stuff, procedures do stuff

rule of thumb: every function should either return a value or have a side effect: never both!

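During Q&A someone asked: if a function performs a side effect and then returns a boolean, this rule leaves raising an exception as the only way to report failure, yet in many cases the failure does not feel exceptional. What should one do?

The speaker's answer was that this rule is no silver bullet either; the right choice depends on the situation.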
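The rule of thumb can be illustrated with two hypothetical helpers that mirror the built-in pair sorted() and list.sort(): one computes and returns, the other mutates and returns nothing.

```python
def sorted_names(names):
    # Function: computes and returns a new value, no side effect.
    return sorted(names)


def sort_names(names):
    # Procedure: changes its argument in place and returns None.
    names.sort()


names = ['cy', 'ann', 'bo']
print(sorted_names(names))  # ['ann', 'bo', 'cy']
print(names)                # ['cy', 'ann', 'bo']
print(sort_names(names))    # None
print(names)                # ['ann', 'bo', 'cy']
```

Because the procedure returns None, a caller cannot mistake it for the pure version.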

Extensibility ≠ Reusability

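Merely writing class Foo does not make code extensible or reusable.

Do not obsess over extensibility; it may well turn out to be just a story.

Summary:

Python offers many temptations: functional style, object orientation, design patterns, the claim that dynamic languages need no design patterns... Remember that there is no silver bullet and no free lunch.

A function should choose between a side effect and a return value, and functions that share variables should be grouped into a class. Both points extend the ideas of talks 1 and 2.

Implicitly the talk gives a second principle as well: do not worry too much about extensibility.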

5. How to write actually object-oriented python (Per Fagrell)

Single Responsibility Principle

Code should have one and only one reason to change.

Managing the connection and transferring data are different responsibilities, so they should be split.

# not good

class Modem(object):

    def call(self, number):

        pass

    def disconnect(self):

        pass

    def send_data(self, data):

        pass

    def recv_data(self):

        pass

# better

class ConnectionManager(object):

    def call(self, number):

        pass

    def disconnect(self):

        pass

class DataTransceiver(object):

    def send_data(self, data):

        pass

    def recv_data(self):

        pass

Business logic and data persistence should be split as well.

# not good

class Person(object):

    def calculate_pay(self):

        ...

    def save(self):

        ...

# better

class Person(object):

    def calculate_pay(self):

        ...

class DbPersistMixin(object):

    def save(self):

        ...
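How the two pieces are recombined is not shown in the notes. One possibility is multiple inheritance, sketched here with invented details: the StoredPerson name, the pay formula, and a class-level dict standing in for a database are all assumptions.

```python
class Person(object):
    def __init__(self, name, hours, rate):
        self.name, self.hours, self.rate = name, hours, rate

    def calculate_pay(self):
        # Placeholder business rule for the sketch.
        return self.hours * self.rate


class DbPersistMixin(object):
    # Hypothetical storage: a class-level dict instead of a database.
    storage = {}

    def save(self):
        self.storage[self.name] = vars(self).copy()


class StoredPerson(Person, DbPersistMixin):
    """Combines business logic with persistence; the name is made up."""


person = StoredPerson('Ann', hours=10, rate=20)
print(person.calculate_pay())                  # 200
person.save()
print(DbPersistMixin.storage['Ann']['hours'])  # 10
```

Person can now change for business reasons and DbPersistMixin for storage reasons, independently of each other.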

Open/Closed Principle

Code should be open to extension but closed to modification.

# normal

def validate_link(self, links):

    for link in links:

        track = Track(link)

        self.validate(track)

# when modify

def validate_link(self, links):

    for link in links:

        if link.startswith("spotify:album:"):

            uri = Album(link)

        else:

            uri = Track(link)

        self.validate(uri)

# better

def validate_link(self, links):

    for link in links:

        self.validate(uri_factory(link))
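uri_factory itself is not shown in the notes. A possible shape is sketched below, with stub Track and Album classes and a hypothetical prefix table, so that supporting a new URI type means adding a row instead of editing validate_link.

```python
class Track(object):
    def __init__(self, link):
        self.link = link


class Album(object):
    def __init__(self, link):
        self.link = link


# Hypothetical prefix table; extend it to support more URI types.
URI_TYPES = [
    ('spotify:album:', Album),
    ('spotify:track:', Track),
]


def uri_factory(link):
    for prefix, uri_class in URI_TYPES:
        if link.startswith(prefix):
            return uri_class(link)
    # Assumption: fall back to Track, as the if/else version above does.
    return Track(link)


print(type(uri_factory('spotify:album:abc')).__name__)  # Album
print(type(uri_factory('spotify:track:xyz')).__name__)  # Track
```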

Liskov Substitution Principle

Anywhere you use a base class, you should be able to use a subclass and not know it.

Python duck typing

Interface Segregation Principle

Don't force clients to use interfaces they don't need.

Dependency Inversion Principle

High-level modules shouldn't rely on low-level modules. Both should rely on abstractions.

Together, these five principles are SOLID.

Tell, Don't Ask

Tell objects to do the work, don't ask them for their data.

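I understand the idea, but the speaker's example is somewhat contrived; the finer split only makes sense if computing the cost is not the whole purpose of calculate.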

def calculate(self):

    cost = 0

    for line_item in self.bill.items:

        cost += line_item.cost

def calculate(self):

    cost = self.bill.total_cost()

    ...
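For the second version to work, the bill has to do the summing itself. A minimal sketch; the Bill and LineItem classes are assumptions, since the notes only show the caller.

```python
class LineItem(object):
    def __init__(self, cost):
        self.cost = cost


class Bill(object):
    def __init__(self, items):
        self._items = items

    def total_cost(self):
        # The bill owns its items, so it does the summing itself.
        return sum(item.cost for item in self._items)


bill = Bill([LineItem(3), LineItem(4.5)])
print(bill.total_cost())  # 7.5
```

Callers no longer need to know that a bill stores line items or that each item has a cost attribute.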

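Summary:

The talk mainly presents the SOLID principles of object-oriented design, with emphasis on SRP. SRP essentially extends to classes the ideas of separating I/O from logic and keeping a function's code at one level of abstraction. The difference is that a class is organized around a single responsibility, which is conceptually somewhat broader than a function's purpose. When the talk separates application logic from persistence, the persistence part actually relies on multiple inheritance, which in Python is a typical case of diamond inheritance. Diamond inheritance requires care with super, and one school of thought holds that it should not exist at all and that composition should be used instead. The last principle, which the speaker added himself, is close in spirit to the familiar example in which a person drives a car but the drive method lives on the Car class.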