Python Concurrency Series (6): Coroutine Functions and Their Implementation (gevent)
Documentation: http://sdiehl.github.io/gevent-tutorial/
1. Implementing coroutines
Threads vs. coroutines
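As mentioned earlier, coroutines are also called micro-threads. Here is how they compare with threads:
- Context switches between threads are relatively expensive, especially when many threads are running. Switching between coroutines is very cheap.
- The operating system decides when threads switch. Our own code decides when a coroutine runs.
The figure below makes this easier to see: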


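As the figure shows, coroutines only switch among themselves inside a single thread. This is much like threads, which switch among themselves inside a single process. That is probably why coroutines are called micro-threads.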
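You can see "switching inside a single thread" without installing anything by using plain Python generators as stand-ins for coroutines. This is an illustration only; greenlets are a different mechanism. In the sketch below, the calling code alternates between two generators, and each step records the ID of the OS thread it ran on.

```python
import threading

def task(name, trace):
    # A generator-based "coroutine": each `yield` suspends it and returns control to the caller
    for step in range(3):
        trace.append((name, step, threading.get_ident()))
        yield

trace = []
a, b = task("A", trace), task("B", trace)
for _ in range(3):
    # The caller decides which coroutine runs next; no OS scheduling is involved
    next(a)
    next(b)

order = [(name, step) for name, step, _ in trace]
threads = {ident for _, _, ident in trace}
```

The steps interleave as A0, B0, A1, B1, A2, B2, yet every step reports the same thread ID. The concurrency comes from the caller choosing which coroutine to resume, not from the operating system.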
Let's take a closer look at coroutines:

Since gevent is built on greenlet, the figure below helps explain how greenlet works:

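Every greenlet has a parent, and the root of the tree is the main greenlet of the current thread. In gevent, greenlets created with gevent.spawn() have the hub as their parent. The hub is a special greenlet that runs the event loop. Whenever a greenlet hits I/O, it hands control to the hub. The hub checks which greenlet's I/O event has completed and switches control back to that greenlet.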
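A toy scheduler can model what the hub does. This sketch is a simplified model built on assumptions, not gevent's real hub, which is driven by libev/libuv I/O and timer events. Generators stand in for greenlets, and a task "blocks" by yielding how many seconds it wants to wait. The hypothetical run_hub() keeps waiting tasks in a heap ordered by wake-up time on a virtual clock, and always resumes whichever task is due next.

```python
import heapq
import itertools

def run_hub(tasks):
    # Toy "hub": a heap of (wake_time, tie_breaker, task) entries on a virtual clock
    clock = 0.0
    order = itertools.count()  # tie-breaker keeps spawn order for equal wake times
    timers = [(0.0, next(order), task) for task in tasks]
    heapq.heapify(timers)
    while timers:
        wake_at, _, task = heapq.heappop(timers)
        clock = max(clock, wake_at)  # advance the virtual clock; nothing really waits
        try:
            delay = next(task)  # run the task until it "blocks" by yielding a delay
        except StopIteration:
            continue  # task finished; the hub simply forgets it
        heapq.heappush(timers, (clock + delay, next(order), task))
    return clock

def func(name, log):
    log.append("start " + name)
    yield 1.0  # plays the role of gevent.sleep(1)
    log.append("end " + name)

log = []
total = run_hub([func("func1", log), func("func2", log)])
```

Both tasks wait one second, yet the run ends at virtual time 1.0 rather than 2.0, because the hub interleaves their waits.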
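A basic greenlet example:

```python
from greenlet import greenlet

def test1(x, y):
    z = gr2.switch(x + y)  # jump into gr2, passing "helloworld"
    print(z)               # resumed later by gr1.switch(42), so this prints 42

def test2(u):
    print(u)               # prints "helloworld"
    gr1.switch(42)         # jump back into gr1; gr2 is never resumed

gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch("hello", 'world')  # when test1 returns, control goes back to gr1's parent (main)
```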
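greenlet(run=None, parent=None): create a greenlet instance.
gr.parent: every greenlet has a parent greenlet. When a greenlet finishes, execution continues in its parent. By default, the parent is the greenlet that created it.
gr.run: the code the greenlet actually runs. When run returns, the greenlet is finished.
gr.switch(*args, **kwargs): switch to greenlet gr. On the first switch, the arguments are passed to run. After that, they become the return value of the switch() call that gr is suspended in.
gr.throw(): switch to greenlet gr and immediately raise an exception in it (greenlet.GreenletExit if no exception is given).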
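Python's built-in generators offer a rough analogue of switch() and throw() that you can try without installing greenlet. gen.send(value) resumes a suspended generator and makes value the result of its pending yield. gen.throw(exc) resumes it by raising exc at that yield. This is only an analogy: a generator can only yield back to whoever resumed it, while gr.switch() can jump to any greenlet.

```python
def child():
    # The first next() runs up to here; the value later passed to send() becomes `received`
    received = yield "child started"
    try:
        yield "child got %r" % (received,)
    except KeyError as exc:
        # gen.throw() raises the exception at the suspended yield, much like gr.throw()
        yield "child caught %r" % (exc.args[0],)

gen = child()
first = next(gen)                    # run up to the first yield
second = gen.send(42)                # resume; 42 is delivered as `received`
third = gen.throw(KeyError("boom"))  # resume by raising KeyError inside child()
```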
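Here is a gevent example:

```python
import gevent

def func1():
    print("start func1")
    gevent.sleep(1)  # yields to the hub instead of blocking the thread
    print("end func1")

def func2():
    print("start func2")
    gevent.sleep(1)
    print("end func2")

# Start both greenlets and wait for them; the two 1-second sleeps overlap,
# so the whole run takes about 1 second rather than 2
gevent.joinall([
    gevent.spawn(func1),
    gevent.spawn(func2),
])
```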
2. Multiple coroutines
Simple multiple coroutines
joinall(greenlets, timeout=None, raise_error=False, count=None)
Wait for the greenlets to finish.
- Parameters: greenlets, a sequence of greenlets to wait for; timeout, if given, the maximum number of seconds to wait.
- Returns: a sequence of the greenlets that finished before the timeout (if any) expired.

wait(objects=None, timeout=None, count=None)
Wait for objects to become ready or for the event loop to finish.
Communication between coroutines
```python
import gevent
from gevent.queue import Queue

tasks = Queue()

def worker(n):
    while not tasks.empty():
        task = tasks.get()
        print('Worker %s got task %s' % (n, task))
        gevent.sleep(0)  # yield so the other workers get a turn

    print('Quitting time!')

def boss():
    for i in range(1, 25):
        tasks.put_nowait(i)

# Fill the queue first, then let three workers drain it
gevent.spawn(boss).join()

gevent.joinall([
    gevent.spawn(worker, 'steve'),
    gevent.spawn(worker, 'john'),
    gevent.spawn(worker, 'nancy'),
])
```

Output:

```
Worker steve got task 1
Worker john got task 2
Worker nancy got task 3
Worker steve got task 4
Worker john got task 5
Worker nancy got task 6
Worker steve got task 7
Worker john got task 8
Worker nancy got task 9
Worker steve got task 10
Worker john got task 11
Worker nancy got task 12
Worker steve got task 13
Worker john got task 14
Worker nancy got task 15
Worker steve got task 16
Worker john got task 17
Worker nancy got task 18
Worker steve got task 19
Worker john got task 20
Worker nancy got task 21
Worker steve got task 22
Worker john got task 23
Worker nancy got task 24
Quitting time!
Quitting time!
Quitting time!
```
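The strictly alternating steve/john/nancy order comes from gevent.sleep(0), which yields so the next ready worker can run. A plain round-robin over generators reproduces the same schedule. Here each bare yield plays the role of gevent.sleep(0), and a collections.deque stands in for the gevent queue. This sketches the scheduling pattern, not gevent internals.

```python
from collections import deque

def worker(name, tasks, log):
    # Drain the shared deque; each bare `yield` stands in for gevent.sleep(0)
    while tasks:
        task = tasks.popleft()
        log.append('Worker %s got task %s' % (name, task))
        yield
    log.append('Quitting time!')

def run_round_robin(coroutines):
    # Resume ready coroutines in FIFO order; drop the ones that have finished
    ready = deque(coroutines)
    while ready:
        co = ready.popleft()
        try:
            next(co)
        except StopIteration:
            continue
        ready.append(co)

tasks = deque(range(1, 25))  # the boss's 24 tasks, queued up front
log = []
run_round_robin([worker(n, tasks, log) for n in ('steve', 'john', 'nancy')])
```

The resulting log has the same 27 lines, in the same order, as the output above.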
- full(): Return True if the queue is full, False otherwise. Queue(None) is never full.
- get(block=True, timeout=None): Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).
- get_nowait(): Remove and return an item from the queue without blocking. Only get an item if one is immediately available. Otherwise raise the Empty exception.
- peek(block=True, timeout=None): Return an item from the queue without removing it. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).
- peek_nowait(): Return an item from the queue without blocking. Only return an item if one is immediately available. Otherwise raise the Empty exception.
- put(item, block=True, timeout=None): Put an item into the queue. If optional arg block is true and timeout is None (the default), block if necessary until a free slot is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Full exception if no free slot was available within that time. Otherwise (block is false), put an item on the queue if a free slot is immediately available, else raise the Full exception (timeout is ignored in that case).
- put_nowait(item): Put an item into the queue without blocking. Only enqueue the item if a free slot is immediately available. Otherwise raise the Full exception.
- qsize(): Return the size of the queue.
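gevent.queue.Queue is modeled on the standard library's queue.Queue; peek() and peek_nowait() are gevent additions. That means you can try the blocking and non-blocking rules above with the stdlib class. This sketch assumes the two behave the same for the calls used here.

```python
import queue

q = queue.Queue(maxsize=2)   # bounded queue with 2 free slots
q.put_nowait("a")
q.put("b", block=False)      # same as put_nowait("b")
was_full = q.full()          # True: no free slot left
size_when_full = q.qsize()   # 2

put_failed = False
try:
    q.put_nowait("c")        # no free slot, so Full is raised immediately
except queue.Full:
    put_failed = True

first = q.get_nowait()       # "a": items come out in FIFO order
second = q.get(timeout=0.1)  # "b" is available, so this returns at once

get_timed_out = False
try:
    q.get(timeout=0.1)       # queue is empty: wait at most 0.1 s, then raise Empty
except queue.Empty:
    get_timed_out = True
```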
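3. Coroutine pools
The example below actually uses gevent.threadpool.ThreadPool, which runs each function in a native OS thread, so blocking calls such as time.sleep do not stall the event loop. The greenlet-based pool is gevent.pool.Pool, which the pool version of the crawler in section 4 uses.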
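```python
from __future__ import print_function
import time
import gevent
from gevent.threadpool import ThreadPool

pool = ThreadPool(3)  # at most 3 native threads
start = time.time()
for _ in range(4):
    # each call runs time.sleep(1) in a worker thread, so the hub is not blocked
    pool.spawn(time.sleep, 1)
gevent.wait()  # wait until the event loop has nothing left to do
delay = time.time() - start
print('Running "time.sleep(1)" 4 times with 3 threads. Should take about 2 seconds: %.3fs' % delay)
```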
spawn(func, *args, **kwargs)
Add a new task to the threadpool that will run func(*args, **kwargs).
Waits until a slot is available. Creates a new native thread if necessary.
join()
Waits until all outstanding tasks have been completed.
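The "about 2 seconds" estimate is simple arithmetic. With 3 threads, the first 3 sleeps run in parallel and the 4th must wait for a free thread, so the batch takes ceil(4/3) = 2 rounds of 1 second. The sketch below checks that reasoning with the standard library's concurrent.futures.ThreadPoolExecutor, swapped in for gevent.threadpool.ThreadPool, and a shorter 0.2 s sleep. The helper name run_batch is hypothetical.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor

def run_batch(n_tasks, n_threads, duration):
    # Submit n_tasks blocking sleeps to a pool of n_threads and time the whole batch
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for _ in range(n_tasks):
            pool.submit(time.sleep, duration)
    # leaving the `with` block waits for all submitted tasks to finish
    return time.perf_counter() - start

expected = math.ceil(4 / 3) * 0.2  # 2 "rounds" of 0.2 s each
elapsed = run_batch(4, 3, 0.2)
```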
4. A coroutine-based crawler
Plain multi-coroutine version
```python
# Patch the standard library first, so the blocking sockets used by requests
# yield to other greenlets instead of stalling the whole program
from gevent import monkey
monkey.patch_all()

import json
import math

import gevent
import requests
from gevent.queue import Queue, Empty

HEADERS = {
    # 'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
}

def start_urls(tasks, total_page):
    # Producer: put one URL per comment page into the task queue
    url = "https://api.bilibili.com/x/v2/reply?jsonp=jsonp&pn={}&type=1&oid=455312953&sort=2&_=1587372277524"
    for i in range(1, total_page + 1):
        tasks.put(url.format(i))
    return tasks

def init_start():
    # Fetch page 1 to find out how many comment pages there are
    url = "https://api.bilibili.com/x/v2/reply?jsonp=jsonp&pn=1&type=1&oid=455312953&sort=2&_=1587372277524"
    content = downloader(url)
    data = json.loads(content.text)
    total_page = math.ceil(int(data['data']['page']['count']) / int(data['data']['page']['size']))
    print(total_page)
    return total_page

def downloader(url):
    # Download task
    content = requests.get(url, headers=HEADERS)
    print(content.status_code, type(content.status_code))
    return content

def work(tasks, n):
    # Consumer
    while not tasks.empty():
        gevent.sleep(1)  # throttle requests; also lets other greenlets run
        try:
            # another worker may have taken the last URL while we slept, so do not block
            url = tasks.get_nowait()
        except Empty:
            break
        print(url)
        data = downloader(url)

if __name__ == '__main__':
    total_page = init_start()
    tasks = Queue()
    task_urls = start_urls(tasks, total_page)
    gevent.joinall([gevent.spawn(work, task_urls, i) for i in range(3)])
```
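The producer side boils down to arithmetic. If the API reports count comments with size comments per page, the crawler needs ceil(count / size) pages. For example, 45 comments at 20 per page give 3 pages, the last one partly filled. The network-free sketch below isolates that step from init_start() and start_urls(). The numbers, the helper names, and the example.com URL template are hypothetical.

```python
import math

# Hypothetical URL template standing in for the API URL in start_urls(); {} is the page number
URL_TEMPLATE = "https://example.com/reply?pn={}"

def page_count(count, size):
    # Round up so that a final, partially filled page is still fetched
    return math.ceil(count / size)

def page_urls(count, size):
    # Pages are numbered from 1, as in start_urls()
    return [URL_TEMPLATE.format(pn) for pn in range(1, page_count(count, size) + 1)]

urls = page_urls(45, 20)
```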
Coroutine pool version
Note: https://www.v2ex.com/t/308276
```python
# Patch first, so time.sleep and the sockets used by requests become cooperative
from gevent import monkey
monkey.patch_all()

import json
import math
import time

import requests
from gevent.pool import Pool
from gevent.queue import Queue, Empty

HEADERS = {
    # 'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
}

def start_urls(tasks, total_page):
    # Producer: put one URL per comment page into the task queue
    url = "https://api.bilibili.com/x/v2/reply?jsonp=jsonp&pn={}&type=1&oid=455312953&sort=2&_=1587372277524"
    for i in range(1, total_page + 1):
        tasks.put(url.format(i))
    return tasks

def init_start():
    # Fetch page 1 to find out how many comment pages there are
    url = "https://api.bilibili.com/x/v2/reply?jsonp=jsonp&pn=1&type=1&oid=455312953&sort=2&_=1587372277524"
    content = downloader(url)
    data = json.loads(content.text)
    total_page = math.ceil(int(data['data']['page']['count']) / int(data['data']['page']['size']))
    print(total_page)
    return total_page

def downloader(url):
    # Download task
    content = requests.get(url, headers=HEADERS)
    print(content.status_code, type(content.status_code))
    return content

def work(tasks, n):
    # Consumer
    while not tasks.empty():
        time.sleep(1)  # patched by monkey.patch_all(), so only this greenlet pauses
        try:
            url = tasks.get_nowait()
        except Empty:
            break
        print(url)
        data = downloader(url)

if __name__ == '__main__':
    total_page = init_start()
    tasks = Queue()
    task_urls = start_urls(tasks, total_page)
    workers = Pool(3)  # at most 3 greenlets run at the same time
    for i in range(3):
        workers.spawn(work, task_urls, i)
    workers.join()
```
5. Web server and client implementation