Python: Process Pools, Thread Pools, and Coroutines

I. Process Pools and Thread Pools

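There are two common ways to achieve concurrency: multithreading and multiprocessing. Note: concurrency means that multiple tasks appear to run at the same time; it comes down to switching between tasks and saving their state.

When the number of concurrent tasks exceeds the number of CPU cores, keep in mind that an operating system cannot start processes and threads without limit. A common rule is to start about as many processes as there are cores. Starting too many processes adds scheduling and memory overhead, so the multi-core advantage is wasted and efficiency drops. This is where process pools and thread pools come in.

The job of a pool is to limit the number of processes or threads that are started.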
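
As a minimal sketch of this limiting effect, the following runs six tasks on a thread pool capped at two workers and records which threads did the work. The names `record_worker` and `run_capped_pool` are hypothetical, chosen only for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def record_worker(n):
    """Pretend to do some work, then report which thread ran task n."""
    time.sleep(0.05)  # stand-in for real work
    return threading.current_thread().name


def run_capped_pool(task_count=6, max_workers=2):
    """Run task_count tasks on a pool of at most max_workers threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        names = list(pool.map(record_worker, range(task_count)))
    return set(names)  # the distinct worker threads that were used


if __name__ == '__main__':
    workers = run_capped_pool()
    print(len(workers), 'worker threads handled 6 tasks')
```

However many tasks are submitted, the set of worker threads never grows beyond `max_workers`; the extra tasks simply wait in the pool's queue.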

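The concurrent.futures module:

The concurrent.futures module provides a high-level interface for asynchronous calls.

ProcessPoolExecutor: a process pool that provides asynchronous calls.

p = ProcessPoolExecutor(max_workers): if max_workers is omitted, the process pool defaults to the number of CPUs on the machine (for example, 4 on a 4-core machine).

ThreadPoolExecutor: a thread pool that provides asynchronous calls.
p = ThreadPoolExecutor(max_workers): if max_workers is omitted, the thread pool defaults to the number of CPUs * 5 on Python 3.5 to 3.7, and to min(32, number of CPUs + 4) from Python 3.8 onward.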

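Supplement:

Two ways to submit a task:
# Synchronous call: after submitting a task, wait in place until it has run to completion and returned its result, and only then execute the next line of code. Tasks therefore run serially.
# Asynchronous call: after submitting a task, do not wait in place for the result; execute the next line of code right away and fetch the result later from the returned future. Tasks therefore run concurrently.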
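
The difference between the two styles can be sketched with a thread pool and a task that sleeps briefly. The helper names `slow_square`, `run_sync`, `run_async`, and `timed` are hypothetical, and the 0.2-second sleep is only a stand-in for a blocking operation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    time.sleep(0.2)  # stand-in for a blocking operation
    return n * n


def run_sync(pool, numbers):
    """Synchronous style: wait for each result before submitting the next task."""
    return [pool.submit(slow_square, n).result() for n in numbers]


def run_async(pool, numbers):
    """Asynchronous style: submit everything first, collect the results afterwards."""
    futures = [pool.submit(slow_square, n) for n in numbers]
    return [f.result() for f in futures]


def timed(fn, *args):
    """Return (result, elapsed seconds) for a call to fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(timed(run_sync, pool, range(4)))   # tasks run one after another
        print(timed(run_async, pool, range(4)))  # tasks overlap
```

Both functions return the same results; only the elapsed time differs, because the synchronous version never has more than one task in flight.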

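A process pool creates its worker processes from scratch, then keeps using those same processes to run every task; it does not start any additional processes.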

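# Basic methods

#submit(fn, *args, **kwargs)

Submits a task asynchronously and returns a Future.

#map(func, *iterables, timeout=None, chunksize=1)

Replaces a for loop of submit calls.

#shutdown(wait=True)

Equivalent to pool.close() + pool.join() on a multiprocessing.Pool.

wait=True: block until every task in the pool has finished and its resources have been reclaimed.

wait=False: return immediately without waiting for the tasks in the pool to finish.

Regardless of the value of wait, the program as a whole does not exit until all tasks have finished.

submit and map must be called before shutdown.

#result(timeout=None)

Called on the Future returned by submit; returns the task's result.

#add_done_callback(fn)

Called on a Future; registers a callback function that runs once the task is done.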
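
The methods above can be exercised together in one short sketch. It uses a thread pool to keep things light; `double` and `demo_basic_methods` are hypothetical names, not part of the library.

```python
from concurrent.futures import ThreadPoolExecutor


def double(n):
    return n * 2


def demo_basic_methods():
    collected = []  # filled in by the callback

    pool = ThreadPoolExecutor(max_workers=2)
    future = pool.submit(double, 21)            # submit returns a Future
    future.add_done_callback(lambda f: collected.append(f.result()))
    mapped = list(pool.map(double, [1, 2, 3]))  # replaces a loop of submit calls
    pool.shutdown(wait=True)                    # block until all tasks finish

    try:
        pool.submit(double, 1)                  # new tasks are refused after shutdown
        rejected = False
    except RuntimeError:
        rejected = True

    return future.result(timeout=1), mapped, collected, rejected


if __name__ == '__main__':
    print(demo_basic_methods())
```

The `RuntimeError` branch illustrates why submit and map have to come before shutdown.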

from concurrent.futures import ProcessPoolExecutor
import os
import time

import requests  # third-party: pip3 install requests

def get(url):
    print('%s GET %s' % (os.getpid(), url))
    time.sleep(3)  # simulate a slow download
    response = requests.get(url)
    if response.status_code == 200:
        res = response.text
    else:
        res = 'download failed'
    return res

def parse(future):
    # Callback: runs in the main process once the task behind future is done
    time.sleep(1)
    res = future.result()
    print('%s parsed result length: %s' % (os.getpid(), len(res)))

if __name__ == '__main__':
    urls = [
        'https://www.baidu.com',
        'https://www.sina.com.cn',
        'https://www.tmall.com',
        'https://www.jd.com',
        'https://www.python.org',
        'https://www.openstack.org',
        'https://www.baidu.com',
        'https://www.baidu.com',
        'https://www.baidu.com',
    ]
    p = ProcessPoolExecutor(9)
    start = time.time()
    for url in urls:
        future = p.submit(get, url)
        # Asynchronous call: after submitting a task, do not wait in place but go
        # straight to the next line, so tasks run concurrently. The future object
        # is passed to the callback automatically once the task has finished.
        future.add_done_callback(parse)  # parse fires when the task is done and receives the future
    p.shutdown(wait=True)
    print('main', time.time() - start)
    print('main', os.getpid())


A thread pool is used the same way as a process pool, for synchronous and asynchronous calls alike:

from concurrent.futures import ThreadPoolExecutor
from threading import current_thread
import time

import requests  # third-party: pip3 install requests

def get(url):
    print('%s GET %s' % (current_thread().name, url))
    time.sleep(3)  # simulate a slow download
    response = requests.get(url)
    if response.status_code == 200:
        res = response.text
    else:
        res = 'download failed'
    return res

def parse(future):
    # Callback: receives the finished future
    time.sleep(1)
    res = future.result()
    print('%s parsed result length: %s' % (current_thread().name, len(res)))

if __name__ == '__main__':
    urls = [
        'https://www.baidu.com',
        'https://www.sina.com.cn',
        'https://www.tmall.com',
        'https://www.jd.com',
        'https://www.python.org',
        'https://www.openstack.org',
        'https://www.baidu.com',
        'https://www.baidu.com',
        'https://www.baidu.com',
    ]
    p = ThreadPoolExecutor(4)
    for url in urls:
        future = p.submit(get, url)
        future.add_done_callback(parse)
    p.shutdown(wait=True)
    print('main', current_thread().name)


The map function:

# map works on the same principle as calling p.submit(task, i) in a loop,
# so it can replace that loop and shorten the code.

from concurrent.futures import ProcessPoolExecutor
import os, time, random

def task(n):
    print('[%s] is running' % os.getpid())
    time.sleep(random.randint(1, 3))  # I/O-bound work: threads are usually the better fit; processes take longer here
    return n ** 2

if __name__ == '__main__':
    p = ProcessPoolExecutor()
    obj = p.map(task, range(10))
    p.shutdown()  # equivalent to close() followed by join()
    print('=' * 30)
    print(obj)  # map returns an iterator
    print(list(obj))
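
To make the equivalence concrete, here is a small sketch that computes the same results once with a submit loop and once with map. It uses ThreadPoolExecutor so it runs anywhere without a `__main__` guard; ProcessPoolExecutor exposes the same two methods. The names `with_submit` and `with_map` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def task(n):
    time.sleep(0.05 * (3 - n))  # smaller n sleeps longer, so it finishes later
    return n ** 2


def with_submit(numbers):
    """Explicit loop: submit each task, then read each future in submission order."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(task, n) for n in numbers]
        return [f.result() for f in futures]


def with_map(numbers):
    """Same work expressed with map; results come back in input order."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        return list(pool.map(task, numbers))


if __name__ == '__main__':
    print(with_submit(range(3)))
    print(with_map(range(3)))
```

Even though the tasks finish in reverse order here, map yields the results in the order of the inputs.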

Callback functions (Zhihu): https://www.zhihu.com/question/19801131/answer/27459821

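II. Coroutines

Coroutines achieve concurrency within a single thread.

Switching to another task whenever the current one hits I/O cuts the time a single thread spends waiting on I/O, which gets the most out of that thread.

Concurrency means making multiple tasks appear to run at the same time (switching plus saving state). While running a task, the CPU moves on to another task in two situations: when the task hits an I/O operation, and when the task has been computing for too long. Switching in the second situation does not improve efficiency; for compute-bound tasks, concurrency actually lowers it, because the switches add overhead without saving any waiting time.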

# Serial execution
import time

def func1():
    for i in range(10000000):
        i+1

def func2():
    for i in range(10000000):
        i+1

start = time.time()
func1()
func2()
stop = time.time()
print(stop - start)  # 1.675490379333496


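# Concurrent execution based on yield
import time

def func1():
    while True:
        print('func1')
        100000+1
        yield  # pause here and hand control back to func2

def func2():
    g = func1()
    for i in range(10000000):
        print('func2')
        time.sleep(100)  # blocks the whole thread: yield cannot switch away on I/O by itself
        i+1
        next(g)  # switch to func1 until its next yield

start = time.time()
func2()
stop = time.time()
print(stop - start)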


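yield review:

As soon as a function body contains yield, the function becomes a generator function. Calling it does not run the body; the call returns a generator object.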

def yield_test(n):
    for i in range(n):
        yield call(i)
        print("i=", i)
    # do some other things
    print("do something.")
    print("end.")

def call(i):
    return i*2

# iterate with a for loop
for i in yield_test(5):
    print(i, ",")
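
The point that calling a generator function does not run its body can be checked directly. This is a small sketch; `gen` and `demo_generator` are hypothetical names.

```python
def gen(log):
    log.append('body started')
    yield 1
    log.append('resumed')
    yield 2


def demo_generator():
    log = []
    g = gen(log)        # creates the generator object; the body has not run yet
    before = list(log)  # snapshot taken before the first next(): still empty
    first = next(g)     # runs the body up to the first yield
    second = next(g)    # resumes after the first yield, runs to the second
    return before, first, second, log


if __name__ == '__main__':
    print(demo_generator())
```

Each `next()` call resumes the body exactly where the previous yield paused it, with local state intact. That saved state is what makes yield usable for task switching.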


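In essence, a coroutine means that within a single thread, the user's own code controls switching from a task that has hit an I/O operation to another task, which improves efficiency.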
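
A minimal sketch of user-controlled switching, using only plain generators: each task yields at the point where it would otherwise wait, and a small scheduler resumes the tasks in turn. The names `worker` and `run_round_robin` are hypothetical, and no real I/O is involved; the bare yield only marks where a wait would be.

```python
from collections import deque


def worker(name, steps, log):
    for i in range(steps):
        log.append('%s step %d' % (name, i))
        yield  # hand control back to the scheduler (where an I/O wait would be)


def run_round_robin(tasks):
    """Resume each generator in turn until all of them are exhausted."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)      # run the task up to its next yield
        except StopIteration:
            continue           # task finished; drop it
        queue.append(current)  # not finished; schedule it again


def demo():
    log = []
    run_round_robin([worker('A', 2, log), worker('B', 3, log)])
    return log


if __name__ == '__main__':
    print(demo())
```

The two tasks interleave inside one thread. What this sketch cannot do is notice real blocking I/O on its own; detecting it and switching automatically is the part a library such as gevent takes care of.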

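Gevent:

gevent is a third-party library that implements coroutines on top of greenlet. Its basic idea is:

When a greenlet hits an I/O operation, such as a network request, gevent automatically switches to another greenlet, then switches back at a suitable time once the I/O has completed. I/O is slow and often leaves a program waiting; with gevent switching coroutines automatically, a greenlet that is ready to run gets the CPU instead of the thread sitting idle on I/O.

Because the switch happens automatically on I/O operations, gevent has to modify some of Python's standard library modules. This is done at startup with a monkey patch:

Here we simulate blocking I/O with a wait. Inside gevent, the wait is written as gevent.sleep(5). To use time.sleep() instead, put from gevent import monkey;monkey.patch_all() at the very top of the file. Without that line, time.sleep() blocks the whole thread and the single-threaded concurrency is lost.

Note: the monkey patch has to run on the first line, before the modules it patches are imported.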

from gevent import monkey;monkey.patch_all()
from gevent import spawn,joinall  # pip3 install gevent
import time

def play(name):
    print('%s play 1' % name)
    time.sleep(5)  # patched by monkey.patch_all(), so it yields to other greenlets
    print('%s play 2' % name)

def eat(name):
    print('%s eat 1' % name)
    time.sleep(3)
    print('%s eat 2' % name)

start = time.time()
g1 = spawn(play, 'lxx')
g2 = spawn(eat, 'lxx')
# g1.join()
# g2.join()
joinall([g1, g2])
print('main', time.time() - start)


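The gevent.spawn() method creates a new greenlet object and schedules it to run. The gevent.joinall() method waits until all the greenlets passed to it have finished before returning; it accepts a timeout argument, in seconds.

Socket concurrency within a single thread: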

# server
from gevent import monkey;monkey.patch_all()
from socket import *
from gevent import spawn

def communicate(conn):
    while True:  # communication loop
        try:
            data = conn.recv(1024)
            if len(data) == 0: break  # client closed the connection
            conn.send(data.upper())
        except ConnectionResetError:
            break
    conn.close()

def server(ip, port, backlog=5):
    server = socket(AF_INET, SOCK_STREAM)
    server.bind((ip, port))
    server.listen(backlog)
    while True:  # connection loop
        conn, client_addr = server.accept()
        print(client_addr)
        # handle this client in its own greenlet
        spawn(communicate, conn)

if __name__ == '__main__':
    g1 = spawn(server, '127.0.0.1', 8080)
    g1.join()

# client
from threading import Thread,current_thread
from socket import *

def client():
    client = socket(AF_INET, SOCK_STREAM)
    client.connect(('127.0.0.1', 8080))
    n = 0
    while True:
        msg = '%s say hello %s' % (current_thread().name, n)
        n += 1
        client.send(msg.encode('utf-8'))
        data = client.recv(1024)
        print(data.decode('utf-8'))

if __name__ == '__main__':
    for i in range(500):
        t = Thread(target=client)
        t.start()