itertools: a module for working with iterables

Merging and splitting iterators

chain

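chain accepts any number of iterables (or iterators) as arguments and returns a single iterator that yields the items of each input in turn, as if they all came from one iterable. It is similar in spirit to ChainMap in collections, which presents several dictionaries as one mapping; chain does the same for iterables.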

import itertools

c = itertools.chain([1, 2, 3], "abc", {"k1": "v1", "k2": "v2"})

# Printing it directly only shows the chain object

print(c)  # <itertools.chain object at 0x00000000029745F8>

for i in c:

    print(i, end=" ")  # 1 2 3 a b c k1 k2

# chain.from_iterable takes a single iterable whose items are themselves iterables

c = itertools.chain.from_iterable([[1, 2, 3], "abc", {"k1": "v1", "k2": "v2"}])

for i in c:

    print(i, end=" ")  # 1 2 3 a b c k1 k2
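
Because chain.from_iterable reads its argument lazily, the outer iterable can itself be a generator. The sketch below assumes a hypothetical read_batches() data source (not part of the original text) and shows that only as many items are pulled as are actually requested.

```python
import itertools

def read_batches():
    # Hypothetical data source that produces one batch at a time.
    for start in (0, 3, 6):
        yield range(start, start + 3)

flat = itertools.chain.from_iterable(read_batches())

# Only the first four items are pulled from the batches here
first_four = list(itertools.islice(flat, 4))
print(first_four)  # [0, 1, 2, 3]

# The same iterator continues where the slice stopped
rest = list(flat)
print(rest)  # [4, 5, 6, 7, 8]
```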

zip_longest

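As the name suggests, zip_longest is related to the built-in zip: both pair up the elements at matching positions of several iterables, like a zipper. The built-in zip stops as soon as the shortest input is exhausted, whereas zip_longest keeps going until the longest input is exhausted.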

import itertools

# zip combines the elements at matching positions of several iterables into tuples

name = ["古明地觉", "椎名真白", "雪之下雪乃"]

where = ["东方地灵殿", "樱花张的宠物女孩", "春物"]

z = zip(name, where)

print(z)  # <zip object at 0x0000000001DC03C8>

print(list(z))  # [('古明地觉', '东方地灵殿'), ('椎名真白', '樱花张的宠物女孩'), ('雪之下雪乃', '春物')]

# "zip" means zipper, which is an apt image: corresponding elements are paired up

# But what if the inputs have different lengths?

name = ["古明地觉", "椎名真白", "雪之下雪乃", "xxx"]

where = ["东方地灵殿", "樱花张的宠物女孩", "春物"]

print(list(zip(name, where)))  # [('古明地觉', '东方地灵殿'), ('椎名真白', '樱花张的宠物女孩'), ('雪之下雪乃', '春物')]

# When the lengths differ, pairing stops as soon as one input runs out.

# To pair up to the longest input instead, use zip_longest from itertools

print(list(itertools.zip_longest(name, where)))  # [('古明地觉', '东方地灵殿'), ('椎名真白', '樱花张的宠物女孩'), ('雪之下雪乃', '春物'), ('xxx', None)]

# Missing values are filled with None by default; a different fill value can be specified

print(list(itertools.zip_longest(name, where, fillvalue="你输入的是啥啊")))

# [('古明地觉', '东方地灵殿'), ('椎名真白', '樱花张的宠物女孩'), ('雪之下雪乃', '春物'), ('xxx', '你输入的是啥啊')]
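
When None is itself a legitimate value in the data, the default fill value cannot tell "missing" apart from "present but None". One possible approach, sketched here with made-up score lists and an assumed sentinel name MISSING, is to pass a unique object as fillvalue.

```python
import itertools

MISSING = object()  # unique sentinel; the name is an assumption for this sketch

scores_a = [90, None, 75]  # None is a real value here
scores_b = [88, 60]

pairs = list(itertools.zip_longest(scores_a, scores_b, fillvalue=MISSING))

# Indexes of the pairs in which one side was filled in by zip_longest
padded = [i for i, (a, b) in enumerate(pairs) if a is MISSING or b is MISSING]
print(padded)  # [2]
```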

islice

When an iterator holds many elements and only some of them are needed, islice selects elements by index and returns the selection as another iterator.

import itertools

num = range(20)

# Select from index 5 up to index 10 (exclusive)

s = itertools.islice(num, 5, 10)

print(list(s))  # [5, 6, 7, 8, 9]

# Select from the start up to index 5 (exclusive)

s = itertools.islice(num, 5)

print(list(s))  # [0, 1, 2, 3, 4]

# Select from index 5 up to index 15 (exclusive) with a step of 3

s = itertools.islice(num, 5, 15, 3)

print(list(s))  # [5, 8, 11, 14]

'''

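So, besides the iterable itself:

with one more argument, e.g. 5, islice selects from index 0 up to index 5 (exclusive)

with two more arguments, e.g. 5 and 10, it selects from index 5 up to index 10 (exclusive)

with three more arguments, e.g. 5, 10 and 2, it selects from index 5 up to index 10 (exclusive) with a step of 2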

'''

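# Are negative indexes supported? No: the length of an iterator is unknown until it has been fully consumed, and at that point one might as well convert it to a list and slice that.

# islice exists to pick out a portion without reading everything, so it only works front to back, never from the end.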
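
The points above can be checked with a short sketch. It also shows that islice advances the underlying iterator, and uses collections.deque with maxlen as one common way to get the last few items (which, as noted, still has to read everything). The variable names are illustrative only.

```python
import collections
import itertools

it = iter(range(20))

head = list(itertools.islice(it, 3))
print(head)  # [0, 1, 2]

# islice advanced the underlying iterator, so the next slice continues from 3
following = list(itertools.islice(it, 3))
print(following)  # [3, 4, 5]

# Negative indexes are rejected when the islice object is created
negative_error = None
try:
    itertools.islice(it, -1)
except ValueError as e:
    negative_error = e
    print("negative index rejected")

# A deque with maxlen keeps only the last items, but it reads the whole iterator
tail = list(collections.deque(it, maxlen=3))
print(tail)  # [17, 18, 19]
```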

tee

import itertools

r = [1, 2, 3, 4, 5]

# n must be passed positionally; tee(r, n=2) raises TypeError in CPython

i1, i2 = itertools.tee(r, 2)

print(list(i1))  # [1, 2, 3, 4, 5]

print(list(i2))  # [1, 2, 3, 4, 5]
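
tee splits one iterator into n independent iterators that each yield the same items. A minimal sketch of a typical use, forming adjacent pairs, follows; it assumes the original iterator (here called source) is not touched again after the call, since advancing it directly would make the copies miss items.

```python
import itertools

source = iter([1, 2, 3, 4, 5])
a, b = itertools.tee(source)  # n defaults to 2

# Advance only b by one step; a is unaffected
next(b, None)

adjacent = list(zip(a, b))
print(adjacent)  # [(1, 2), (2, 3), (3, 4), (4, 5)]
```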

转换输入

starmap

import itertools

'''

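The built-in map(function, iterable) returns an iterator that calls the function on each item of the iterable and yields the results.

map() stops once the input iterable is exhausted.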

'''

l = [1, 2, 3]

map_l = map(lambda x: str(x) + "a", l)

print(list(map_l))  # ['1a', '2a', '3a']

l1 = [(0, 5), (1, 6), (2, 7)]

'''

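Note that when map is given a single iterable, each item is passed to the function as one argument, so a two-parameter function fails as soon as the result is consumed:

map_l1 = map(lambda x, y: x*y, l1)

Indexing into the tuple works instead: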

'''

map_l1 = map(lambda x: x[0] * x[1], l1)

print(list(map_l1))  # [0, 6, 14]

# itertools.starmap(), on the other hand, unpacks each tuple into separate arguments

l2 = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

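# The number of parameters the function needs is determined by the items passed in; each tuple here has three elements, so the function takes three parameters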

map_l1 = itertools.starmap(lambda x, y, z: f"{x} + {y} + {z} = {x + y + z}", l2)

print(list(map_l1))  # ['1 + 2 + 3 = 6', '4 + 5 + 6 = 15', '7 + 8 + 9 = 24']

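# With map over the same list, the function would have to index into each tuple: lambda x: x[0] + x[1] + x[2]

# starmap therefore needs every item to be iterable, as in [(), (), ()]; [1, 2, 3] raises TypeError, while [(1, ), (2, ), (3, )] works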
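
To make the relationship between map and starmap concrete, the sketch below computes the same products three ways. It relies on the fact that map accepts several iterables and passes one item from each to the function; operator.mul stands in for the lambda used earlier.

```python
import itertools
import operator

pairs = [(0, 5), (1, 6), (2, 7)]

# starmap unpacks each tuple into positional arguments
products = list(itertools.starmap(operator.mul, pairs))
print(products)  # [0, 6, 14]

# map does the same when the columns are passed as separate iterables
xs, ys = zip(*pairs)
same_products = list(map(operator.mul, xs, ys))
print(same_products)  # [0, 6, 14]

# Items that are not iterable cannot be unpacked
unpack_failed = False
try:
    list(itertools.starmap(lambda x: x, [1, 2, 3]))
except TypeError:
    unpack_failed = True
    print("starmap needs iterable items")
```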

Generating new values

count

import itertools

'''

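count(start=0, step=1) returns an iterator that yields evenly spaced values without end: start, start + step, start + 2 * step, ...

It takes two arguments: the start value (default 0) and the step (default 1); neither has to be an integer.

Roughly equivalent to: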

def count(firstval=0, step=1):

    x = firstval

    while 1:

        yield x

        x += step

'''

# Start at 5 with a step of 2

c1 = itertools.count(5, 2)

print(list(itertools.islice(c1, 5)))  # [5, 7, 9, 11, 13]
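
Since count never ends, it is usually paired with something finite. The sketch below shows two such pairings: zip, which stops when the finite input runs out, and islice with a non-integer step. The sample names are made up.

```python
import itertools

names = ["a", "b", "c"]

# zip stops when names is exhausted, so the infinite counter is safe here
numbered = list(zip(itertools.count(1), names))
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# start and step may be floats as well
halves = list(itertools.islice(itertools.count(0.0, 0.5), 4))
print(halves)  # [0.0, 0.5, 1.0, 1.5]
```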

cycle

import itertools

'''

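cycle(iterable) returns an iterator that repeats the items of the iterable endlessly. It keeps a copy of the items, so its own memory use is bounded by the input; only collecting its output without a limit would exhaust memory.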

'''

c2 = itertools.cycle("abc")

print(list(itertools.islice(c2, 10)))  # ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a']
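
A typical use of cycle is round-robin assignment. In this sketch the worker and task names are hypothetical; zip ends when the finite task list is exhausted, so the endless cycle never runs away.

```python
import itertools

workers = ["w1", "w2", "w3"]
tasks = ["t1", "t2", "t3", "t4", "t5"]

# tasks comes first so that zip stops as soon as the tasks run out
assignment = list(zip(tasks, itertools.cycle(workers)))
print(assignment)
# [('t1', 'w1'), ('t2', 'w2'), ('t3', 'w3'), ('t4', 'w1'), ('t5', 'w2')]
```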

repeat

import itertools

'''

repeat(obj, times=None) yields obj over and over, endlessly unless times is given.

'''

# Repeat 3 times

print(list(itertools.repeat("abc", 3)))  # ['abc', 'abc', 'abc']
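
Without times, repeat is handy as a constant argument stream for map. The following sketch squares a range by feeding pow an endless supply of the exponent 2; map stops when the shorter input, the range, is exhausted.

```python
import itertools

# pow(0, 2), pow(1, 2), ... ; repeat(2) supplies the constant second argument
squares = list(map(pow, range(5), itertools.repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```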

Filtering

dropwhile

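dropwhile discards items from the start of the input for as long as the predicate is true. Once the predicate fails for the first time, that item and everything after it are yielded without further testing.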

import itertools

l = [1, 2, 3, 4, 5]

drop_l = itertools.dropwhile(lambda x: x < 3, l)

# Still returns an iterator

print(drop_l)  # <itertools.dropwhile object at 0x000001AD63AD0488>

# The leading items smaller than 3 have been dropped

print(list(drop_l))  # [3, 4, 5]
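
With sorted input, dropwhile looks like a plain filter. An input in which a matching item reappears later makes the difference visible; the sample lines below are invented for illustration.

```python
import itertools

lines = ["# header", "# generated", "x = 1", "# inline note", "y = 2"]

# Only the leading comment lines are skipped; the later one is kept
body = list(itertools.dropwhile(lambda s: s.startswith("#"), lines))
print(body)  # ['x = 1', '# inline note', 'y = 2']
```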

takewhile

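takewhile yields items for as long as the predicate is true and stops at the first item that fails it. filter, by contrast, tests every element; the two give the same result in the example below only because the list is sorted.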

import itertools

l = [1, 2, 3, 4, 5]

take_l = itertools.takewhile(lambda x: x < 3, l)

print(take_l)  # <itertools.takewhile object at 0x000001D37F512948>

print(list(take_l))  # [1, 2]

filter_l = filter(lambda x: x < 3, l)

print(list(filter_l))  # [1, 2]
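
On unsorted input the two functions diverge. A small sketch with made-up data:

```python
import itertools

data = [1, 2, 5, 1, 2]

# takewhile stops for good at 5, the first item that fails the test
taken = list(itertools.takewhile(lambda x: x < 3, data))
print(taken)  # [1, 2]

# filter keeps testing and also picks up the later matches
filtered = list(filter(lambda x: x < 3, data))
print(filtered)  # [1, 2, 1, 2]
```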

filterfalse

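filterfalse is the opposite of filter: it tests every element and keeps those for which the predicate is false. For this sorted list the result happens to match that of dropwhile.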

import itertools

l = [1, 2, 3, 4, 5]

filterfalse_l = itertools.filterfalse(lambda x: x < 3, l)

print(list(filterfalse_l))  # [3, 4, 5]

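filter and filterfalse test every element: filter keeps those for which the predicate is true, filterfalse those for which it is false.
takewhile and dropwhile split the input at the first element that fails the predicate: takewhile yields what comes before it, dropwhile yields that element and everything after it.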
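
Because filter and filterfalse are exact complements, together they can split one input into two groups. The partition helper below is a hypothetical name for this sketch, not an itertools function; it uses tee so that both filters can read the same input. The last line shows that dropwhile on the same data gives a different answer.

```python
import itertools

def partition(pred, iterable):
    # Hypothetical helper: returns (items passing pred, items failing pred)
    t1, t2 = itertools.tee(iterable)
    return filter(pred, t1), itertools.filterfalse(pred, t2)

data = [1, 4, 2, 5, 1]
small_it, large_it = partition(lambda x: x < 3, data)
small, large = list(small_it), list(large_it)
print(small)  # [1, 2, 1]
print(large)  # [4, 5]

# dropwhile only skips the leading 1, so it is not the same as filterfalse
dropped = list(itertools.dropwhile(lambda x: x < 3, data))
print(dropped)  # [4, 2, 5, 1]
```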

compress

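compress offers another way to filter an iterable: it takes a second iterable of selectors and keeps the data items whose selector is truthy.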

import itertools

condition = [True, False, True, True, False]

data = [1, 2, 3, 4, 5]

print(list(itertools.compress(data, condition)))  # [1, 3, 4]

# Besides True and False, values of any other Python type can serve as selectors; their truth value decides

condition = [1, 0, "x", "x", {}]

print(list(itertools.compress(data, condition)))  # [1, 3, 4]
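
The selectors do not have to be written out by hand; they can be computed from a parallel sequence. The names and ages in this sketch are made up. It also shows that compress stops at the end of the shorter input.

```python
import itertools

names = ["ann", "bob", "cy", "dee"]
ages = [31, 17, 45, 12]

# Selectors computed lazily from a parallel list
adults = list(itertools.compress(names, (age >= 18 for age in ages)))
print(adults)  # ['ann', 'cy']

# With fewer selectors than data items, the remaining items are ignored
truncated = list(itertools.compress(names, [1, 1]))
print(truncated)  # ['ann', 'bob']
```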

Combining inputs

accumulate

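accumulate processes the input sequence and yields its running totals: the first item, then the sum of the first two items, then the sum of the first three, and so on.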

import itertools

print(list(itertools.accumulate(range(5))))  # [0, 1, 3, 6, 10]

print(list(itertools.accumulate("abcde")))  # ["a", "ab", "abc", "abcd", "abcde"]

# So what "addition" means depends on the element type

try:

    print(list(itertools.accumulate([[1, 2], (3, 4)])))

except TypeError as e:

    print(e)  # can only concatenate list (not "tuple") to list

    # The error says that a list and a tuple cannot be concatenated

# A custom two-argument function can be passed as well

data = [1, 2, 3, 4, 5]

method = lambda x, y: x * y

print(list(itertools.accumulate(data, method)))  # [1, 2, 6, 24, 120]

# The result is now a running product instead of a running sum
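
Any two-argument function works as the accumulator, including built-ins such as max. The sketch below also uses the initial keyword, which assumes Python 3.8 or later; the sample numbers are invented.

```python
import itertools

readings = [3, 1, 4, 1, 5, 9, 2]

# Running maximum: each output is the largest value seen so far
running_max = list(itertools.accumulate(readings, max))
print(running_max)  # [3, 3, 4, 4, 5, 9, 9]

# initial= (Python 3.8+) puts a starting value in front of the results
balance = list(itertools.accumulate([50, -20, 30], initial=100))
print(balance)  # [100, 150, 130, 160]
```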

product

product combines several iterables into their Cartesian product.

import itertools

print(list(itertools.product([1, 2, 3], [2, 3])))  # [(1, 2), (1, 3), (2, 2), (2, 3), (3, 2), (3, 3)]
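
product yields tuples in the same order as nested for loops, with the rightmost input varying fastest. The sketch below checks that against a comprehension and shows the repeat keyword for taking the product of an iterable with itself.

```python
import itertools

# Equivalent nested loops, written as a comprehension
nested = [(x, y) for x in [1, 2, 3] for y in [2, 3]]
from_product = list(itertools.product([1, 2, 3], [2, 3]))
print(nested == from_product)  # True

# repeat=2 is the same as product("01", "01")
bits = list(itertools.product("01", repeat=2))
print(bits)  # [('0', '0'), ('0', '1'), ('1', '0'), ('1', '1')]
```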

permutations

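permutations yields every possible ordering of the input elements, as tuples of a given length. By default the length is that of the input, so the full set of permutations is generated.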

import itertools

data = [1, 2, 3, 4]

print(list(itertools.permutations(data)))

# By the permutation formula this is P(4, 4) = 4 * 3 * 2 * 1 = 24 permutations

'''

[(1, 2, 3, 4), (1, 2, 4, 3), (1, 3, 2, 4), (1, 3, 4, 2), (1, 4, 2, 3), (1, 4, 3, 2),

(2, 1, 3, 4), (2, 1, 4, 3), (2, 3, 1, 4), (2, 3, 4, 1), (2, 4, 1, 3), (2, 4, 3, 1),

(3, 1, 2, 4), (3, 1, 4, 2), (3, 2, 1, 4), (3, 2, 4, 1), (3, 4, 1, 2), (3, 4, 2, 1),

(4, 1, 2, 3), (4, 1, 3, 2), (4, 2, 1, 3), (4, 2, 3, 1), (4, 3, 1, 2), (4, 3, 2, 1)]

'''

print(list(itertools.permutations(data, 2)))

# [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4), (3, 1), (3, 2), (3, 4), (4, 1), (4, 2), (4, 3)]
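
The counts can be cross-checked against math.perm, which assumes Python 3.8 or later. The sketch also shows that permutations distinguishes elements by position rather than by value, so repeated input values produce repeated tuples.

```python
import itertools
import math

data = [1, 2, 3, 4]

counts = {}
for r in (2, 4):
    counts[r] = sum(1 for _ in itertools.permutations(data, r))
    print(r, counts[r], math.perm(len(data), r))
# 2 12 12
# 4 24 24

# Equal values at different positions are treated as different elements
dupes = list(itertools.permutations("aab"))
print(len(dupes), len(set(dupes)))  # 6 3
```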

combinations

permutations treats order as significant, like P (arrangements) in combinatorics; combinations looks only at which elements are chosen, regardless of order, like C.

import itertools

# permutations counts every different ordering as a separate result; combinations yields each set of elements only once

data = "abcd"

print(list(itertools.combinations(data, 3)))  # [('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'd'), ('b', 'c', 'd')]

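# In terms of drawing balls from a bag, combinations draws without replacement: no single input element is repeated

# Sometimes combinations that repeat an element are wanted too, which corresponds to drawing with replacement.

# For that case, use combinations_with_replacement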

print(list(itertools.combinations_with_replacement(data, 3)))

'''

[('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'a', 'd'), ('a', 'b', 'b'),

('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'c'), ('a', 'c', 'd'), ('a', 'd', 'd'),

('b', 'b', 'b'), ('b', 'b', 'c'), ('b', 'b', 'd'), ('b', 'c', 'c'), ('b', 'c', 'd'),

('b', 'd', 'd'), ('c', 'c', 'c'), ('c', 'c', 'd'), ('c', 'd', 'd'), ('d', 'd', 'd')]

'''
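
Both result sizes follow the standard counting formulas: C(n, r) without replacement and C(n + r - 1, r) with replacement. A quick check with math.comb, assuming Python 3.8 or later:

```python
import itertools
import math

data = "abcd"
n, r = len(data), 3

n_comb = len(list(itertools.combinations(data, r)))
n_multi = len(list(itertools.combinations_with_replacement(data, r)))

print(n_comb, math.comb(n, r))           # 4 4
print(n_multi, math.comb(n + r - 1, r))  # 20 20
```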