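Iterators and Generators

Iteration is one of Python's most powerful features. At first glance it looks like just a way of processing the items in a sequence, but there is much more to it than that.

Manually Consuming an Iterator

You want to traverse an iterable without using a for loop.

Call next() yourself and catch the StopIteration exception in your code.

StopIteration signals the end of iteration. Alternatively, pass a default value as the second argument to next(); that value is then returned at the end instead of the exception being raised.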

l = next(iterator, None)
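
Both approaches can be sketched as follows. This is only an illustration: the function names collect_with_try and collect_with_default are made up here, and the second one uses a private sentinel object rather than None so that None can still appear as a real item.

```python
def collect_with_try(iterable):
    """Consume an iterable by hand, stopping when StopIteration is raised."""
    it = iter(iterable)
    items = []
    try:
        while True:
            items.append(next(it))
    except StopIteration:
        pass  # the iterator is exhausted
    return items


def collect_with_default(iterable):
    """Same loop, but next() returns a sentinel instead of raising."""
    it = iter(iterable)
    sentinel = object()  # unique marker, so None can still be a real item
    items = []
    while True:
        item = next(it, sentinel)
        if item is sentinel:
            break
        items.append(item)
    return items


print(collect_with_try('abc'))              # ['a', 'b', 'c']
print(collect_with_default([1, None, 2]))   # [1, None, 2]
```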

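Delegating Iteration

You have built a custom container object and want to iterate over it.

Just define an __iter__() method that delegates iteration to the object held inside the container.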

class Node:
    def __init__(self, value):
        self._value = value
        self._children = []
    def __repr__(self):
        return 'Node({!r})'.format(self._value)
    def add_child(self, node):
        self._children.append(node)
    def __iter__(self):
        return iter(self._children)
# Example
if __name__ == '__main__':
    root = Node(0)
    child1 = Node(1)
    child2 = Node(2)
    root.add_child(child1)
    root.add_child(child2)
    # Outputs Node(1), Node(2)
    for ch in root:
        print(ch)

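The iterator protocol requires __iter__() to return an iterator object that implements __next__(). Here iter(self._children) returns the list's own iterator, so Node does not need to implement __next__() itself.

Creating New Iteration Patterns with Generators

A function becomes a generator function simply by containing a yield statement. Unlike a normal function, a generator only runs in response to iteration.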
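
As a small sketch of a custom iteration pattern, the generator below produces a range of floating-point values. The name frange and its parameters are chosen for illustration only.

```python
def frange(start, stop, increment):
    """Yield start, start + increment, ... for as long as the value is below stop."""
    x = start
    while x < stop:
        yield x
        x += increment


gen = frange(0, 2, 0.5)   # nothing runs yet; this only creates the generator
first = next(gen)         # the body runs up to the first yield
rest = list(gen)          # iteration drives the remaining steps
print(first, rest)        # 0 [0.5, 1.0, 1.5]
```

Calling frange() does not execute any of its body. The code only advances when something iterates over the generator, whether that is next(), a for loop, list() or sum().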

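Implementing the Iterator Protocol

The easiest way is to write a generator function. Otherwise you have to implement __iter__() and __next__() yourself, keep track of the traversal state, and handle StopIteration.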

class Node:
    def __init__(self, value):
        self._value = value
        self._children = []
    def __repr__(self):
        return 'Node({!r})'.format(self._value)
    def add_child(self, node):
        self._children.append(node)
    def __iter__(self):
        return iter(self._children)
    def depth_first(self):
        yield self
        for c in self:
            yield from c.depth_first()
# Example
if __name__ == '__main__':
    root = Node(0)
    child1 = Node(1)
    child2 = Node(2)
    root.add_child(child1)
    root.add_child(child2)
    child1.add_child(Node(3))
    child1.add_child(Node(4))
    child2.add_child(Node(5))
    for ch in root.depth_first():
        print(ch)
    # Outputs Node(0), Node(1), Node(3), Node(4), Node(2), Node(5)

class Node2:
    def __init__(self, value):
        self._value = value
        self._children = []
    def __repr__(self):
        return 'Node({!r})'.format(self._value)
    def add_child(self, node):
        self._children.append(node)
    def __iter__(self):
        return iter(self._children)
    def depth_first(self):
        return DepthFirstIterator(self)
class DepthFirstIterator(object):
    '''
    Depth-first traversal
    '''
    def __init__(self, start_node):
        self._node = start_node
        self._children_iter = None
        self._child_iter = None
    def __iter__(self):
        return self
    def __next__(self):
        # Return myself if just started; create an iterator for children
        if self._children_iter is None:
            self._children_iter = iter(self._node)
            return self._node
        # If processing a child, return its next item
        elif self._child_iter:
            try:
                nextchild = next(self._child_iter)
                return nextchild
            except StopIteration:
                self._child_iter = None
                return next(self)
        # Advance to the next child and start its iteration
        else:
            self._child_iter = next(self._children_iter).depth_first()
            return next(self)

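In practice nobody wants to write the second, more complex version, which has to maintain state and handle exceptions by hand, so a generator is the better choice.

Iterating in Reverse

Use the built-in reversed() function. Note that reverse iteration only works if the object has a known size (that is, it supports the sequence protocol with __len__() and __getitem__()) or implements the __reversed__() method. If neither is the case, the object has to be converted to a list first.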

# Print a file backwards
with open('somefile') as f:
    for line in reversed(list(f)):
        print(line, end='')

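Also note that converting an iterable with many items into a list can consume a lot of memory. Defining __reversed__() on your own class avoids building that list, as the Countdown class below does.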

class Countdown:
    def __init__(self, start):
        self.start = start
    # Forward iterator
    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1
    # Reverse iterator
    def __reversed__(self):
        n = 1
        while n <= self.start:
            yield n
            n += 1
for rr in reversed(Countdown(30)):
    print(rr)
for rr in Countdown(30):
    print(rr)

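Generator Functions with Extra State

If you want a generator function that also exposes some state to the user, the simplest approach is to implement it as a class and put the generator code in the __iter__() method.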

from collections import deque
class linehistory:
    def __init__(self, lines, histlen=3):
        self.lines = lines
        self.history = deque(maxlen=histlen)
    def __iter__(self):
        for lineno, line in enumerate(self.lines, 1):
            self.history.append((lineno, line))
            yield line
    def clear(self):
        self.history.clear()
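
A possible way to use such a class is sketched below. The class is repeated so the snippet runs on its own, and the sample lines and the 'python' search term are invented for the example.

```python
from collections import deque


class linehistory:
    def __init__(self, lines, histlen=3):
        self.lines = lines
        self.history = deque(maxlen=histlen)

    def __iter__(self):
        for lineno, line in enumerate(self.lines, 1):
            self.history.append((lineno, line))
            yield line

    def clear(self):
        self.history.clear()


lines = ['alpha\n', 'python one\n', 'beta\n', 'python two\n']
hist = linehistory(lines, histlen=2)
matches = []
for line in hist:
    if 'python' in line:
        # hist.history holds the last two (lineno, line) pairs seen so far
        matches.append(list(hist.history))

print(matches[0])  # [(1, 'alpha\n'), (2, 'python one\n')]
print(matches[1])  # [(3, 'beta\n'), (4, 'python two\n')]
```

Because the generator lives in __iter__(), the instance is iterable but is not itself an iterator. To step through it manually, call iter(hist) first and pass the result to next().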

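Slicing an Iterator

You want a slice of the data produced by an iterator or generator. Ordinary slicing does not work on them, so use itertools.islice() instead. Keep in mind that islice() consumes the underlying iterator: the items it skips or returns cannot be read again.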

>>> def count(n):
...     while True:
...         yield n
...         n += 1
...
>>> c = count(0)
>>> c[10:20]
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not subscriptable
>>> # Now using islice()
>>> import itertools
>>> for x in itertools.islice(c, 10, 20):
...     print(x)
...
10
11
12
13
14
15
16
17
18
19
>>>

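Skipping the First Part of an Iterable

Use itertools.dropwhile(). It takes a function and an iterable and returns an iterator that discards the leading items for as long as the function returns True, then yields everything that follows. Unlike filter(), it stops testing once the first item fails the test, so later items for which the function would return True are kept.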

>>> from itertools import dropwhile
>>> with open('/etc/passwd') as f:
...     for line in dropwhile(lambda line: line.startswith('#'), f):
...         print(line, end='')
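
The difference between dropping a leading run and filtering every item can be seen in a short sketch. The sample lines and the helper is_comment are invented for the illustration.

```python
from itertools import dropwhile

lines = ['# header', '# more header', 'data 1', '# inline comment', 'data 2']


def is_comment(line):
    return line.startswith('#')


# dropwhile() only skips the leading run of comment lines
skipped_prefix = list(dropwhile(is_comment, lines))

# filtering tests every line, so the later comment disappears as well
no_comments = [line for line in lines if not is_comment(line)]

print(skipped_prefix)  # ['data 1', '# inline comment', 'data 2']
print(no_comments)     # ['data 1', 'data 2']
```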

If you know exactly how many items to skip, you can use islice() to discard the first n items instead.

>>> from itertools import islice
>>> items = ['a', 'b', 'c', 1, 4, 10, 15]
>>> for x in islice(items, 3, None):
...     print(x)
...
1
4
10
15
>>>

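Passing None as the stop argument means "everything from index 3 onward", just like the slice [3:].

Iterating over Permutations and Combinations

Sometimes you need to iterate over all possible permutations or combinations of the items in a collection.

The itertools module provides three functions for this kind of problem.

permutations() takes an iterable and an optional length, and produces tuples holding every permutation of that length.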

>>> items = ['a', 'b', 'c']
>>> from itertools import permutations
>>> for p in permutations(items):
...     print(p)
...
('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')
>>> for p in permutations(items, 2):
...     print(p)
...
('a', 'b')
('a', 'c')
('b', 'a')
('b', 'c')
('c', 'a')
('c', 'b')
>>>

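combinations() takes an iterable and a required length, and produces tuples of combinations. Order does not matter here, so ('a', 'b') and ('b', 'a') count as the same combination and only one of them is produced.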

>>> from itertools import combinations
>>> for c in combinations(items, 3):
...     print(c)
...
('a', 'b', 'c')
>>> for c in combinations(items, 2):
...     print(c)
...
('a', 'b')
('a', 'c')
('b', 'c')
>>> for c in combinations(items, 1):
...     print(c)
...
('a',)
('b',)
('c',)
>>>

To allow the same item to be chosen more than once, use combinations_with_replacement().
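
A brief sketch of combinations_with_replacement(), using a three-item list like the one in the examples above:

```python
from itertools import combinations_with_replacement

items = ['a', 'b', 'c']
pairs = list(combinations_with_replacement(items, 2))
for pair in pairs:
    print(pair)
# ('a', 'a')
# ('a', 'b')
# ('a', 'c')
# ('b', 'b')
# ('b', 'c')
# ('c', 'c')
```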

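Iterating with an Index

You want to keep track of the index of each item while iterating.

Use the built-in enumerate() function. It takes an iterable and an optional start value, and returns an iterator of (index, item) pairs.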
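
For instance, a minimal sketch with made-up data that numbers items from 1 and also unpacks nested tuples:

```python
my_list = ['a', 'b', 'c']

# Start counting at 1 instead of the default 0
numbered = list(enumerate(my_list, 1))
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# When the items are tuples themselves, parenthesize the inner unpacking
points = [(1, 2), (3, 4)]
sums = [(n, x + y) for n, (x, y) in enumerate(points)]
print(sums)      # [(0, 3), (1, 7)]
```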

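Iterating over Multiple Sequences Simultaneously

Use zip(). Note that it stops as soon as the shortest input is exhausted; items beyond that length are never produced.

Alternatively, use itertools.zip_longest(), which takes any number of iterables plus a fillvalue keyword argument for the padding value, and keeps going until the longest input is exhausted.

Note that zip() returns an iterator, not a list.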
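
The two behaviours can be compared in a short sketch with made-up lists of unequal length:

```python
from itertools import zip_longest

a = [1, 2, 3]
b = ['w', 'x', 'y', 'z']

short = list(zip(a, b))                       # stops at the shorter input
padded = list(zip_longest(a, b, fillvalue=0)) # pads the shorter input

print(short)   # [(1, 'w'), (2, 'x'), (3, 'y')]
print(padded)  # [(1, 'w'), (2, 'x'), (3, 'y'), (0, 'z')]
```

Since zip() produces an iterator, wrap it in list() or dict() when the paired data needs to be stored or reused.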

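Iterating over Items in Separate Containers

To avoid writing the same loop several times, use itertools.chain() to combine multiple iterables into a single new iterator.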

>>> from itertools import chain
>>> a = [1, 2, 3, 4]
>>> b = ['x', 'y', 'z']
>>> for x in chain(a, b):
...     print(x)
...
1
2
3
4
x
y
z
>>>

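Data Processing Pipelines

Use generator functions to build a pipeline. One generator opens the files, one reads their lines, one processes those lines, and a final loop or consuming function drives the whole chain, forming a data pipeline. Pay attention to the difference between yield and yield from: yield produces a single value, while yield from produces every value of another iterable in turn.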
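
A pipeline of this shape might look like the sketch below. To keep it self-contained it works on in-memory lists instead of files, and the stage names gen_lines, gen_grep and gen_lengths as well as the sample log lines are assumptions made for the example.

```python
import re


def gen_lines(sources):
    """Stage 1: chain several line sources into one stream of lines."""
    for source in sources:
        yield from source   # emit every line of this source


def gen_grep(pattern, lines):
    """Stage 2: keep only the lines that match a regular expression."""
    pat = re.compile(pattern)
    for line in lines:
        if pat.search(line):
            yield line      # emit a single matching line


def gen_lengths(lines):
    """Stage 3: turn each line into its length."""
    for line in lines:
        yield len(line.strip())


logs = [['GET /a 200', 'GET /b 404'], ['POST /c 200']]
lines = gen_lines(logs)
ok_lines = gen_grep(r' 200$', lines)
total = sum(gen_lengths(ok_lines))   # sum() pulls data through all stages
print(total)  # 21
```

No stage does any work until sum() starts asking for values, so only one line is in flight at a time regardless of how much input there is.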

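Flattening a Nested Sequence

yield from followed by an iterable yields all of that iterable's items, which avoids writing an extra loop by hand and reads more cleanly.

Remember to check whether each item is iterable before recursing into it. yield from is useful whenever a generator needs to call other generators.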
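
A recursive flatten along these lines is sketched below. The ignore_types parameter is an assumption added so that strings and bytes are treated as single values rather than being expanded character by character.

```python
from collections.abc import Iterable


def flatten(items, ignore_types=(str, bytes)):
    """Yield the leaf values of an arbitrarily nested iterable."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_types):
            # Recurse and re-yield everything the nested call produces
            yield from flatten(x, ignore_types)
        else:
            yield x


nested = [1, 2, [3, 4, [5, 6], 7], 8]
print(list(flatten(nested)))  # [1, 2, 3, 4, 5, 6, 7, 8]

names = ['Dave', ['Paula', 'Thomas']]
print(list(flatten(names)))   # ['Dave', 'Paula', 'Thomas']
```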

Merging Sorted Sequences into a Sorted Iterable

Sometimes you need to merge several sorted sequences into one sorted sequence. heapq.merge() does exactly that.

>>> import heapq
>>> a = [1, 4, 7, 10]
>>> b = [2, 5, 6, 11]
>>> for c in heapq.merge(a, b):
...     print(c)
...
1
2
4
5
6
7
10
11

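Because heapq.merge() is lazy, it never reads the input sequences in full up front, so it adds little overhead even on very long inputs. Note that the inputs must already be sorted: heapq.merge() does not check their order. It simply compares the items currently at the front of each input, emits the smallest one, and repeats until all inputs are exhausted.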
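
A small sketch with made-up lists shows what happens when that precondition is violated: no error is raised, and the output is simply not sorted.

```python
import heapq

sorted_result = list(heapq.merge([1, 4, 7], [2, 5, 6]))
print(sorted_result)    # [1, 2, 4, 5, 6, 7]

# The first input is NOT sorted; merge() only ever compares the front items
unsorted_result = list(heapq.merge([1, 10, 2], [3]))
print(unsorted_result)  # [1, 3, 10, 2]
```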

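Replacing Infinite while Loops with Iterators

This technique relies on a feature of iter() to write infinite or conditional loops. In its two-argument form, iter() takes a zero-argument callable and a sentinel value, and calls the callable repeatedly until its return value equals the sentinel.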

>>> bool(iter(int, 1))
True

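int() returns 0, which never equals the sentinel 1, so the iterator created by iter(int, 1) never ends. The bool() result itself is True simply because iterator objects are truthy by default; it does not depend on whether the iterator would ever finish.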

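CHUNKSIZE = 8192
# process_data() is a placeholder for the caller's own handling
# Conventional version: loop until recv() returns an empty bytes object
def reader(s):
    while True:
        data = s.recv(CHUNKSIZE)
        if data == b'':
            break
        process_data(data)
# Same loop written with iter() and the sentinel b''
def reader2(s):
    for chunk in iter(lambda: s.recv(CHUNKSIZE), b''):
        process_data(chunk)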
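
The same pattern can be tried without a socket. The sketch below substitutes an in-memory io.BytesIO stream and a small chunk size, both chosen only to make the example runnable.

```python
import io
from functools import partial

CHUNKSIZE = 4
stream = io.BytesIO(b'abcdefghij')

# read(CHUNKSIZE) is called again and again until it returns the sentinel b''
chunks = list(iter(partial(stream.read, CHUNKSIZE), b''))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

functools.partial plays the same role as the lambda above: it turns a call that needs an argument into the zero-argument callable that iter() expects.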
