Notes: Python standard library, 17.1 threading

1. threading

source code: Lib/threading.py

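This module builds higher-level threading interfaces on top of the lower-level _thread module.

On platforms where the _thread module is missing, threading cannot be used and dummy_threading serves as a drop-in replacement. Note that dummy_threading was deprecated in Python 3.7 and removed in 3.9, because threading is now always available.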

1.1. functions

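threading.active_count()

Returns the number of Thread objects currently alive. The result equals the length of the list returned by enumerate().

threading.current_thread() returns the current Thread object, i.e. the one corresponding to the caller's thread of control.
threading.get_ident() returns the "thread identifier" of the current thread. It is a nonzero integer with no direct meaning, and the value may be recycled after the thread exits.
threading.enumerate()

Returns a list of all Thread objects currently alive.

threading.main_thread()

Returns the main Thread object. Under normal conditions, the main thread is the thread from which the Python interpreter was started.

threading.settrace(func) sets a trace function for all threads started from the threading module.
threading.setprofile(func) sets a profile function for all threads started from the threading module.
threading.stack_size([size])

Returns the thread stack size used when creating new threads. The optional size argument sets the stack size for threads created afterwards; it must be 0 (use the platform default) or a positive integer of at least 32,768 (32 KiB), depending on the platform.

threading.TIMEOUT_MAX

The maximum value allowed for the timeout parameter of blocking functions such as Lock.acquire(), RLock.acquire() and Condition.wait(). A larger timeout raises OverflowError.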
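The module-level functions in this section can be tried out with a small sketch like the one below. The worker function, the two Event objects and the thread name "demo-worker" are illustrative choices, not part of the module.

```python
import threading

started = threading.Event()   # set by the worker once it is running
finish = threading.Event()    # set by the main thread to let the worker exit
seen = {}

def worker():
    # Record how this thread sees itself, then wait to be released.
    seen["name"] = threading.current_thread().name
    seen["ident"] = threading.get_ident()
    started.set()
    finish.wait()

t = threading.Thread(target=worker, name="demo-worker")
t.start()
started.wait()

# While the worker is blocked it is alive, so enumerate() lists it and
# active_count() matches the length of that list.
listed = t in threading.enumerate()
same_length = threading.active_count() == len(threading.enumerate())
different_ident = seen["ident"] != threading.get_ident()

finish.set()
t.join()

print(seen["name"], listed, same_length, different_ident)
print(threading.main_thread().name)
```

After join() returns, the worker no longer appears in enumerate(), and its identifier may be reused by a later thread.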

1.2. thread-local data

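Thread-local data is data whose values are specific to each thread. To manage thread-local data, create an instance of local (or a subclass) and store attributes on it:

mydata = threading.local()
mydata.x = 1

The values stored on the instance are different for each thread.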

class threading.local

A class that represents thread-local data.

For more details and extensive examples, see the documentation string of the _threading_local module.
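A minimal sketch of the per-thread behaviour follows. The worker function and the results dictionary are only there for illustration; the point is that an attribute set in one thread is invisible in the others.

```python
import threading

mydata = threading.local()
mydata.x = 1                      # visible to the main thread only
results = {}

def worker(value):
    # A new thread starts with no attributes on the local object.
    results[(value, "had_x")] = hasattr(mydata, "x")
    mydata.x = value
    results[(value, "x")] = mydata.x

threads = [threading.Thread(target=worker, args=(v,)) for v in (10, 20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(mydata.x)   # the main thread's value is untouched
print(results)
```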

1.3. Thread objects

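The Thread class represents an activity that runs in a separate thread of control.

There are two ways to specify the activity: pass a callable object to the constructor, or override the run() method in a subclass. Note that a subclass should only override __init__() and run().

Once a thread object has been created, its start() method must be called to start it.

Once started, the thread is considered "alive". It stops being alive when its run() method terminates, either normally or by raising an unhandled exception. The is_alive() method tests whether the thread is alive.

Other threads can call a thread's join() method. This blocks the calling thread until the thread whose join() method was called has terminated.

A thread has a name, which can be read and changed through the name attribute.

A thread can be flagged as a "daemon thread". The significance of this flag is that the entire Python program exits when only daemon threads are left. Put differently: once every non-daemon thread has finished, the program ends and any daemon threads still running are stopped abruptly.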

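class threading.Thread(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)

Parameters:

group: should be None; it can be regarded as a reserved parameter.
target: the callable object to be invoked by run(); defaults to None.
name: the thread name. By default a unique name of the form Thread-N is constructed, where N is a small decimal number.
args: the argument tuple for the target invocation.
kwargs: a dictionary of keyword arguments for the target invocation.
daemon: if not None, sets explicitly whether the thread is a daemon thread; if None (the default), the daemonic property is inherited from the creating thread.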

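Some of the methods and attributes of the class:

is_alive(): returns whether the thread is running, that is, after it has been started and before it has terminated. The older spelling isAlive() was removed in Python 3.9.

name: attribute used to get or set the thread name. The older getName()/setName(name) methods are deprecated since Python 3.10.

start(): starts the thread. The thread becomes ready and waits to be scheduled on the CPU; start() may be called at most once per thread object.

daemon: boolean attribute that tells whether the thread is a daemon (background) thread; for threads created from the main thread the default is False (foreground thread). It must be set before start() is called. The older isDaemon()/setDaemon(bool) methods are deprecated since Python 3.10 in favour of using the daemon attribute directly.

If a thread is a daemon thread, it runs alongside the main thread, and when the main thread finishes the program exits and the daemon thread is stopped whether or not it has completed its work (provided no other non-daemon threads remain).

If a thread is a non-daemon thread, it runs alongside the main thread, and when the main thread finishes the program waits for that thread to complete before it exits.

join([timeout]): blocks the calling thread until the thread whose join() method is called terminates, or until the optional timeout elapses.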

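Every process starts one thread by default, which is called the main thread. The main thread can in turn start new threads. The current_thread() function of the threading module always returns the instance of the calling thread. The main thread instance is named MainThread; the name of a child thread is given at creation time, and if no name is given Python names the threads automatically as Thread-1, Thread-2, and so on.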
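The naming rules described above can be observed with a short sketch. The function name record_name and the thread name "worker-A" are made up for the example; the exact automatic name may carry a suffix in newer Python versions, so only its Thread- prefix should be relied on.

```python
import threading

names = []

def record_name():
    # current_thread() returns the Thread object of whichever thread calls it.
    names.append(threading.current_thread().name)

named = threading.Thread(target=record_name, name="worker-A")
unnamed = threading.Thread(target=record_name)   # gets an automatic Thread-N name

# Started and joined one at a time here only to keep the order of `names` fixed.
for t in (named, unnamed):
    t.start()
    t.join()

print(threading.current_thread().name)   # "MainThread" when run as a script
print(names)
```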

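Note: in CPython, because of the GIL (Global Interpreter Lock), only one thread can execute Python code at any one time.

To make better use of the machine's computing resources, use multiprocessing or concurrent.futures.ProcessPoolExecutor. Threading remains a suitable model for running several I/O-bound tasks at the same time.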

1.3.1. Creating threads

Method 1: pass a callable object to the constructor

import threading
import time

def action(arg):
    time.sleep(1)
    print('the arg is:%s\n' % arg)

for i in range(4):
    t = threading.Thread(target=action, args=(i,))
    t.start()

# Printed first: the worker threads are still sleeping at this point.
print('the main thread end!')

Method 2: subclass Thread and override run()

import threading
import time

class MyThread(threading.Thread):
    def __init__(self, arg):
        super().__init__()  # always call the base constructor first
        self.arg = arg

    def run(self):
        time.sleep(2)
        print('the arg is:%s\n' % self.arg)

for i in range(4):
    t = MyThread(i)
    t.start()

print('the main thread end!')

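1.3.2. Daemon threads

Code:

import threading
import time

def action(arg):
    time.sleep(2)
    print('the arg is:%s\n' % arg)
    time.sleep(2)

for i in range(4):
    t = threading.Thread(target=action, args=(i,))
    t.daemon = True  # must be set before start(); replaces the deprecated setDaemon(True)
    t.start()

print('the main thread end!')
print(t.is_alive())

Output:

the main thread end!
True

Note that the script has to be run from the command line to get the output above. When it is run inside IDLE, the interpreter process keeps running, so the output of all the threads is printed afterwards as well.

For the main thread, "finished" means that all non-daemon threads in its process have finished; only then does the program exit.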

1.3.3. join

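Code:

import threading
import time

def action(arg):
    time.sleep(5)
    print('the arg is:%s\n' % arg)
    time.sleep(2)

thread_list = []
for i in range(4):
    t = threading.Thread(target=action, args=(i,))
    print(t)
    # t.daemon = True
    thread_list.append(t)
print(thread_list)

# Start every thread first ...
for t in thread_list:
    t.start()

# ... and only then wait for each of them.
for t in thread_list:
    t.join()

print('the main thread end!')
print(t.is_alive())  # False: every thread has been joined

Note that the threads must not be started and joined one after another (start(), join(), start(), join(), ...): each join() would block before the next thread starts, and the threads would no longer run concurrently.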
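The difference between the two join patterns can be made visible by timing them. This is only a rough sketch: the task function, the 0.2 second sleep and the helper names are illustrative, and the measured times will vary slightly from run to run.

```python
import threading
import time

def task():
    time.sleep(0.2)   # stands in for some blocking work

def run_joined_at_end(n):
    # Start every thread first, then wait for all of them.
    threads = [threading.Thread(target=task) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

def run_joined_immediately(n):
    # Start and join each thread in turn: effectively sequential.
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=task)
        t.start()
        t.join()   # blocks before the next thread is even created
    return time.perf_counter() - start

concurrent = run_joined_at_end(4)
serial = run_joined_immediately(4)
print(f"start all, then join: {concurrent:.2f}s")
print(f"start and join one by one: {serial:.2f}s")
```

With four threads the first pattern should take roughly one sleep period, while the second takes about four.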

1.4. Lock objects

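A primitive lock is the lowest-level synchronization primitive currently available in Python.

A lock has two basic states, locked and unlocked, and two basic methods, acquire() and release().

When several threads are blocked in acquire() waiting for the lock, only one of them proceeds when the lock is released. Which one is not defined and may vary across implementations.

Basic methods:

All methods are executed atomically.

acquire(blocking=True, timeout=-1)

blocking defaults to True: the call blocks until the lock is unlocked, then sets it to locked and returns True.

If blocking is False, the call does not block; it returns False immediately if the lock cannot be acquired.

A timeout of -1 means wait indefinitely; a positive value x means wait at most x seconds and return False if the lock could not be acquired in that time. A timeout must not be specified when blocking is False.

release()

Releases the lock. It can be called from any thread, not only from the thread that acquired the lock. Calling it on an unlocked lock raises RuntimeError.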

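1.4.1. Code example

import threading

lock = threading.Lock()

a = lock.acquire()
print(a)  # True: the lock was free

# A Lock is not reentrant: this blocks for 5 seconds, then gives up.
a = lock.acquire(timeout=5)
print(a)  # False

a = lock.release()
print(a)  # None: release() has no return value

a = lock.acquire()
print(a)  # True

a = lock.release()
print(a)  # None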
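A typical use of a lock is to protect a read-modify-write sequence on shared data. The sketch below assumes a hypothetical shared counter and an add_many helper; it also shows a non-blocking acquire() on a lock that is already held.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with counter_lock:        # acquire() on entry, release() on exit
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)

# Non-blocking acquire: returns False immediately while the lock is held.
counter_lock.acquire()
got_it = counter_lock.acquire(blocking=False)
counter_lock.release()
print(got_it, counter_lock.locked())
```

Because every increment happens while the lock is held, no update is lost and the final count is exactly the number of increments requested.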

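1.5. RLock objects

A reentrant lock (RLock) can be acquired multiple times by the same thread.

acquire(blocking=True, timeout=-1)

When invoked without arguments: if the calling thread already owns the lock, the recursion level is incremented by one and the call returns immediately. If another thread owns the lock, the call blocks until the lock is unlocked and then takes ownership.

If more than one thread is waiting for the lock to be unlocked, only one at a time can obtain it. The documentation states that there is no return value in this case, although in CPython 3 the call returns True once the lock has been acquired.

Otherwise it behaves the same as Lock.acquire().

release()

Decrements the recursion level by one. If the level is zero after the decrement, the lock is reset to unlocked.

If the level is still nonzero after the decrement, the lock remains locked and owned by the calling thread. Unlike a Lock, an RLock may only be released by the thread that owns it; otherwise RuntimeError is raised.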

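1.5.1. Code example

import threading

rlock = threading.RLock()

a = rlock.acquire()
print(a)  # True, recursion level 1

a = rlock.acquire()
print(a)  # True, recursion level 2 (same thread, no blocking)

a = rlock.release()
print(a)  # None, recursion level back to 1

a = rlock.release()
print(a)  # None, level 0: the lock is unlocked again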
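Reentrancy matters when a function that holds the lock calls another function (or itself) that takes the same lock. The following sketch uses a made-up recursive countdown function to show this, and contrasts it with a plain Lock.

```python
import threading

rlock = threading.RLock()
plain = threading.Lock()

def countdown(n):
    # Each recursive call re-acquires the lock that the caller already holds.
    with rlock:
        if n == 0:
            return 0
        return 1 + countdown(n - 1)

depth = countdown(3)

# With a plain Lock the second acquire from the same thread cannot succeed.
with plain:
    second_try = plain.acquire(blocking=False)

print(depth, second_try)
```

Had countdown() used the plain Lock with a blocking acquire, the first recursive call would have deadlocked.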

1.6. condition objects

class threading.Condition(lock=None)

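This class implements condition variable objects. A condition variable allows one or more threads to wait until they are notified by another thread.

If the lock argument is given, it must be a Lock or RLock object, and it is used as the underlying lock. Otherwise, a new RLock object is created and used as the underlying lock.

methods:

acquire(*args) calls the corresponding method of the underlying lock.
release() calls the corresponding method of the underlying lock.
wait(timeout=None)

Waits until notified or until a timeout occurs. If the calling thread has not acquired the lock when this method is called, RuntimeError is raised.

The method releases the underlying lock and then blocks until it is awakened by a notify() or notify_all() call on the same condition variable in another thread, or until the optional timeout occurs. Once awakened or timed out, it re-acquires the lock and returns.

timeout specifies the timeout in seconds.

When the underlying lock is an RLock, it is not released with its release() method, because that may not actually unlock it if it was acquired recursively several times. An internal interface of the RLock class is used instead, which really unlocks the lock.

wait_for(predicate, timeout=None)

Waits until the condition evaluates to true. predicate is a callable whose result is interpreted as a boolean; the method returns the last value of the predicate, which is False if the call timed out.

notify(n=1)

By default, wakes up one thread waiting on the condition variable. If the calling thread has not acquired the lock when this method is called, RuntimeError is raised.

notify_all()

Like notify(), but wakes up all the threads waiting on the condition variable instead of one.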
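As a small illustration of wait_for() and of the RuntimeError rule, consider the sketch below. The items list, the consumer and producer functions and the 5 second timeout are assumptions made for the example.

```python
import threading

items = []
received = []
cond = threading.Condition()

def consumer():
    with cond:
        # wait_for() re-checks the predicate after every wakeup and returns
        # its final value, so it also works if the item is already there.
        if cond.wait_for(lambda: len(items) > 0, timeout=5):
            received.append(items.pop())

def producer():
    with cond:
        items.append("job")
        cond.notify()

c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start()
p.start()
c.join()
p.join()

# Calling notify() without holding the lock is an error.
try:
    cond.notify()
    notify_error = None
except RuntimeError as exc:
    notify_error = type(exc).__name__

print(received, notify_error)
```

Using wait_for() with a predicate avoids the hand-written "while not condition: wait()" loop and makes the order in which the two threads start irrelevant.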

1.6.1. Code example 1

A simple implementation

# producer and consumer
import threading
import time

product = None
con = threading.Condition()

# Producer
def produce():
    global product
    with con:  # hold the condition's lock for the whole loop
        while True:
            if product is None:
                print('produce...')
                product = 'anything'
                # Tell the consumer that a product is available
                con.notify()
            # Wait for a notification (the lock is released while waiting)
            con.wait()
            time.sleep(2)

# Consumer
def consume():
    global product
    with con:
        while True:
            if product is not None:
                print('consume...')
                product = None
                # Tell the producer that the product is gone
                con.notify()
            # Wait for a notification
            con.wait()
            time.sleep(2)

# The two threads alternate forever.
t1 = threading.Thread(target=produce)
t2 = threading.Thread(target=consume)
t2.start()
t1.start()

1.6.2. Multiple producers and consumers

import threading
import time

condition = threading.Condition()
products = 0

class Producer(threading.Thread):
    def run(self):
        global products
        while True:
            # The with block releases the lock on both branches. notify() does
            # not release the lock itself; wait() releases it only while waiting.
            with condition:
                if products < 10:
                    products += 1
                    print("Producer(%s):deliver one, now products:%s" % (self.name, products))
                    condition.notify()
                else:
                    print("Producer(%s):already 10, stop deliver, now products:%s" % (self.name, products))
                    condition.wait()
            time.sleep(2)

class Consumer(threading.Thread):
    def run(self):
        global products
        while True:
            with condition:
                if products > 1:
                    products -= 1
                    print("Consumer(%s):consume one, now products:%s" % (self.name, products))
                    condition.notify()
                else:
                    print("Consumer(%s):only 1, stop consume, products:%s" % (self.name, products))
                    condition.wait()
            time.sleep(2)

if __name__ == "__main__":
    for _ in range(2):
        Producer().start()
    for _ in range(3):
        Consumer().start()

1.6.3. Example 3

# condition 3
import threading

alist = None
condition = threading.Condition()

def doSet():
    with condition:
        print(threading.current_thread())
        # Wait until doCreate() has built the list
        while alist is None:
            condition.wait()
        for i in range(len(alist))[::-1]:
            alist[i] = 1

def doPrint():
    with condition:
        print(threading.current_thread())
        while alist is None:
            condition.wait()
        for i in alist:
            print(i)

def doCreate():
    global alist
    with condition:
        print(threading.current_thread())
        if alist is None:
            alist = [i for i in range(10)]
            # Wake up both waiting threads
            condition.notify_all()

tset = threading.Thread(target=doSet, name='tset')
tprint = threading.Thread(target=doPrint, name='tprint')
tcreate = threading.Thread(target=doCreate, name='tcreate')

tset.start()
tprint.start()
tcreate.start()

1.7. semaphore objects

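A semaphore object manages an internal counter that is decremented by each acquire() call and incremented by each release() call. The counter never goes below zero: when it is zero, acquire() blocks until another thread calls release().

class threading.Semaphore(value=1)

Implements semaphore objects. value is the initial value of the counter; a value below 0 raises ValueError.

Methods:

acquire(blocking=True, timeout=None)

When invoked without arguments:

If the counter is greater than zero, decrement it by one and return True.

If the counter is zero, block until awakened by a call to release(). The order in which threads are awakened should not be relied on.

When invoked with arguments:

With blocking=False, do not block: if a call without arguments would block, return False immediately; otherwise behave as above and return True.

With a timeout, block for at most timeout seconds and return False if the semaphore could not be acquired in that time.

release()

Increments the counter by one; if it was zero and other threads are waiting, one of them is awakened.

class threading.BoundedSemaphore(value=1)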

Class implementing bounded semaphore objects. A bounded semaphore checks to make sure its current value doesn’t exceed its initial value. If it does, ValueError is raised. In most situations semaphores are used to guard resources with limited capacity. If the semaphore is released too many times it’s a sign of a bug. If not given, value defaults to 1.

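The following code sketches the use of a bounded semaphore (connectdb() is a placeholder for the code that opens a connection).

from threading import BoundedSemaphore

maxconnections = 5
# ...
pool_sema = BoundedSemaphore(value=maxconnections)

Once the semaphore has been created, the worker threads use it like this:

with pool_sema:
    conn = connectdb()
    try:
        pass  # ... use connection ...
    finally:
        conn.close()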
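The difference between the two semaphore classes shows up when release() is called once too often. A minimal sketch, with illustrative variable names:

```python
import threading

plain = threading.Semaphore(2)
bounded = threading.BoundedSemaphore(2)

plain.release()            # silently raises the counter to 3
try:
    bounded.release()      # would exceed the initial value of 2
    over_release = "allowed"
except ValueError:
    over_release = "ValueError"

# Non-blocking acquire reports whether a slot was still available.
grabbed = [bounded.acquire(blocking=False) for _ in range(3)]
print(over_release, grabbed)
```

The bounded variant turns an unbalanced release() into an immediate error, which is usually what is wanted when the semaphore guards a fixed pool of resources.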

1.7.1. Example code

Another example:

import threading
import time

def run(n):
    with semaphore:  # acquire on entry, release on exit
        time.sleep(1)
        print("run the thread:%s\n" % n)

semaphore = threading.BoundedSemaphore(5)  # at most 5 threads run at the same time

threads = []
for i in range(22):
    t = threading.Thread(target=run, args=("t-%s" % i,))
    t.start()
    threads.append(t)

# Wait for the workers with join() instead of busy-polling active_count().
for t in threads:
    t.join()
print('-----all threads done-----')

1.8. event objects

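This is one of the simplest mechanisms for communication between threads: one thread signals an event and other threads wait for it.

class threading.Event

The event class. An event object manages an internal flag that is initially false.

Methods:

is_set() returns True if the internal flag is true.
set() sets the internal flag to true and wakes up all threads waiting for it; threads that call wait() once the flag is true do not block at all.
clear() resets the internal flag to false; threads that call wait() afterwards block until set() is called again.
wait(timeout=None) blocks until the internal flag is true or the optional timeout (in seconds) elapses; it returns True if the flag is set and False on timeout.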
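The return value of wait() distinguishes a timeout from a real signal. The sketch below uses two events, ready and timed_out, whose names and timeouts are chosen only for illustration.

```python
import threading

ready = threading.Event()
timed_out = threading.Event()   # lets the main thread know the first wait ended
log = []

def waiter():
    log.append(ready.wait(timeout=0.05))   # nobody has called set() yet
    timed_out.set()
    log.append(ready.wait(timeout=5))      # returns as soon as set() is called

t = threading.Thread(target=waiter)
t.start()

timed_out.wait()
ready.set()        # wakes the waiter
t.join()

ready.clear()      # reset the flag for later use
print(log, ready.is_set())
```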

1.8.1. Example code

import threading
import time

event = threading.Event()

def lighter():
    count = 0
    event.set()  # start with a green light
    while True:
        if 5 < count <= 10:
            event.clear()  # red light: clear the flag
            print("\033[41;1mred light is on...\033[0m")
        elif count > 10:
            event.set()  # green light: set the flag
            count = 0
        else:
            print("\033[42;1mgreen light is on...\033[0m")
        time.sleep(1)
        count += 1

def car(name):
    while True:
        if event.is_set():  # is the light green?
            print("[%s] running..." % name)
            time.sleep(1)
        else:
            print("[%s] sees red light,waiting..." % name)
            event.wait()
            print("[%s] green light is on,start going..." % name)

light = threading.Thread(target=lighter)
light.start()

# Use a separate name so that the car() function is not overwritten.
car_thread = threading.Thread(target=car, args=("MINI",))
car_thread.start()

1.9. timer objects

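This class represents an action that is run only after a certain amount of time has passed. Timer is a subclass of Thread.

Note that the interval the timer actually waits before running its action may not be exactly the interval specified by the user.

class threading.Timer(interval, function, args=None, kwargs=None)

Creates a timer that runs function with the arguments args and keyword arguments kwargs after interval seconds have passed. Like any thread, a timer is started by calling its start() method.

Methods:

cancel()

Stops the timer and cancels the execution of its action. This only works while the timer is still in its waiting stage.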
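The effect of cancel() can be seen by starting two timers and cancelling one of them before it fires. The hello function, its argument and the intervals are illustrative choices.

```python
import threading

fired = []

def hello(who):
    fired.append(who)

fast = threading.Timer(0.1, hello, args=("fast",))
slow = threading.Timer(30.0, hello, args=("slow",))
fast.start()
slow.start()

fast.join()      # Timer is a Thread subclass, so join() waits until it has fired
slow.cancel()    # still waiting, so the action is cancelled
slow.join()      # returns promptly once the timer has been cancelled
print(fired)
```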

1.9.1. Example code

from threading import Timer

def hello():
    print("hello, world")

t = Timer(30.0, hello)
t.start()  # after 30 seconds, "hello, world" will be printed

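1.10. with statement

All the objects in this module that have acquire() and release() methods can be used as context managers. Currently these are Lock, RLock, Condition, Semaphore and BoundedSemaphore.

They can therefore be used in a with statement:

with some_lock:
    # do something...

is equivalent to

some_lock.acquire()
try:
    # do something...
finally:
    some_lock.release()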
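The practical benefit of the try/finally equivalence is that the lock is released even when the protected code raises. A short sketch, with a made-up risky function:

```python
import threading

some_lock = threading.Lock()

def risky():
    with some_lock:
        # The exception leaves the with block, which still releases the lock.
        raise ValueError("failure while holding the lock")

try:
    risky()
    outcome = "no error"
except ValueError:
    outcome = "ValueError"

print(outcome, some_lock.locked())
```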

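2. Appendix

2.1. GIL

1. What is the GIL?

GIL stands for Global Interpreter Lock. It goes back to a decision made in Python's early design to keep interpreter data safe.

The GIL is not a feature of the Python language; it was introduced by the implementation of the CPython interpreter.

2. How the GIL affects Python:

Each CPU can only execute one thread at a time. (On a single-core CPU, multithreading is only concurrency, not parallelism. Seen from a distance, both concurrency and parallelism are about handling several requests at once, but they differ: parallelism means that two or more events happen at the same instant, whereas concurrency means that two or more events happen within the same time interval.)

With multiple Python threads, each thread runs like this:

Acquire the GIL

Execute code until it sleeps or the Python virtual machine suspends it.

Release the GIL

In Python 2.x, the GIL is released when the current thread hits an I/O operation or when the ticks counter reaches 100 (ticks can be regarded as a counter of Python's own that exists just for the GIL; it is reset to zero after each release, and the threshold can be adjusted with sys.setcheckinterval).

In Python 3.x (since 3.2), the GIL no longer uses the ticks counter but a timer: the current thread releases the GIL once its execution time reaches a threshold. This is friendlier to CPU-bound programs, but it still does not solve the problem that the GIL allows only one thread to execute at a time, so multi-core efficiency remains unsatisfactory.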
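In CPython 3.2 and later, the timer threshold mentioned above is exposed as the "switch interval" in the sys module. The sketch below reads it, changes it briefly and restores it; the value 0.001 is an arbitrary choice for the demonstration.

```python
import sys

# Seconds a thread may run before the interpreter asks it to give up the GIL.
original = sys.getswitchinterval()
print(f"current switch interval: {original} s")

sys.setswitchinterval(0.001)          # request more frequent thread switches
shorter = sys.getswitchinterval()

sys.setswitchinterval(original)       # restore the previous setting
restored = sys.getswitchinterval()
print(shorter, restored)
```

A shorter interval makes threads take turns more often at the cost of more switching overhead; it does not let two threads execute Python code at the same time.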

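3. Why not drop the GIL?

The GIL protects access to things such as the current thread state and heap-allocated objects used for garbage collection. There is nothing about the Python language itself that requires a GIL, however; it is an artifact of one particular implementation. There are other Python interpreters (and compilers) that do not use a GIL.

So why not get rid of it? In 1999, for Python 1.5, the often-cited but little-understood "free threading" patch by Greg Stein tried to do exactly that. The patch removed the GIL completely and replaced it with fine-grained locking. The removal came at a cost to the execution speed of single-threaded programs, which ran roughly 40% slower. Using two threads showed a speed-up, but beyond that the gain did not grow linearly with the number of cores. Because of the slowdown, the patch was rejected and has been largely forgotten.

The Python community nevertheless keeps working hard on improving the GIL and even on removing it, and successive releases have made progress; CPython 3.13, for example, offers an experimental free-threaded build without the GIL (PEP 703).

4. How to work around the problem

CPython has a GIL, but not every Python interpreter does. Jython (running on the JVM) and IronPython (implemented on the .NET framework) have no GIL.

If the GIL is unacceptable, another Python implementation may be worth trying. The drawback of the other interpreters is that fewer libraries may be supported.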