Python 3 threads: the threading module

The most basic use of threads

import threading, time

value = 0
lock = threading.Lock()

def change(n):
    global value
    value += n
    value -= n

def thread_fun(n):
    for i in range(1000):
        lock.acquire()
        try:
            change(n)
        finally:
            lock.release()

t1 = threading.Thread(target=thread_fun, args=(2,))
t2 = threading.Thread(target=thread_fun, args=(3,))
t1.start()
t2.start()
t1.join()
t2.join()
print(value)  # output: 0
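
For contrast, a variant without the lock (a sketch of my own, not in the original post): nothing stops the two threads from interleaving inside change(), so the final value can end up nonzero.

def thread_fun_unsafe(n):
    for i in range(1000):
        change(n)  # unlocked: += and -= can interleave across threads

t1 = threading.Thread(target=thread_fun_unsafe, args=(2,))
t2 = threading.Thread(target=thread_fun_unsafe, args=(3,))
t1.start()
t2.start()
t1.join()
t2.join()
print(value)  # may be nonzero; raise the loop count to see it more often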

Using a class (the recommended approach)

import threading, time

class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        global n, lock
        for i in range(100):
            lock.acquire()
            try:
                print(n, self.name)
                n += 1
            finally:
                lock.release()

if "__main__" == __name__:
    n = 1
    ThreadList = []
    lock = threading.Lock()
    for i in range(1, 5):
        t = MyThread()
        ThreadList.append(t)
    for t in ThreadList:
        t.start()
    for t in ThreadList:
        t.join()

Performance

Python GIL performance

Start a single thread running a dead loop and CPU usage reaches 100%.

Start as many such threads as there are CPU cores, and on a 4-core machine you can observe CPU usage of only about 160%, that is, less than two cores.

Even with 100 threads running, usage sits around 170%, still under two cores.

Rewrite the same dead loop in C, C++, or Java, however, and all the cores saturate: 400% on 4 cores, 800% on 8. Why can't Python do the same?
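
A minimal sketch of that experiment (my own reconstruction, in the spirit of the tutorial credited below); run it and watch CPU usage in top:

import threading
import multiprocessing

def loop():
    x = 0
    while True:
        x = x ^ 1  # pure-Python busy work, never exits

# one dead-loop thread per CPU core
for i in range(multiprocessing.cpu_count()):
    t = threading.Thread(target=loop)
    t.start()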

The reason: although Python threads are real OS threads, the interpreter takes a lock before executing any code, the GIL (Global Interpreter Lock). A Python thread must acquire the GIL before it can run, and after every 100 bytecodes the interpreter automatically releases it to give other threads a chance. The GIL effectively serializes the execution of all Python code, so multiple threads can only take turns: even 100 threads on a 100-core CPU will use just one core.

The GIL is a historical artifact of the interpreter's design. The interpreter nearly everyone uses is the official CPython; to truly exploit multiple cores, you would have to rewrite an interpreter without a GIL.

So in Python you can use multiple threads, but don't expect them to make effective use of multiple cores. If you absolutely must drive multiple cores from threads, the only route is a C extension, at the cost of Python's simplicity.
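
The standard workaround is processes rather than threads, since each process gets its own GIL. A sketch of mine, reusing the dead loop from above: run one process per core and every core saturates.

import multiprocessing

def loop():
    x = 0
    while True:
        x = x ^ 1

if __name__ == '__main__':
    for i in range(multiprocessing.cpu_count()):
        p = multiprocessing.Process(target=loop)
        p.start()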

How to decide whether to use threads, and whether to use Python or C

IO-bound tasks are a good fit for multithreading. Any task involving network or disk IO is IO-bound; such tasks consume little CPU and spend most of their time waiting for IO to complete (IO is far slower than CPU and memory). For IO-bound work, more concurrent tasks means higher CPU utilization, up to a limit. Most everyday workloads are IO-bound, web applications for example.

While an IO-bound task runs, 99% of its time is spent on IO and very little on the CPU, so replacing a slow scripting language like Python with blazing-fast C gains almost nothing. For IO-bound tasks, the most suitable language is the one with the highest development efficiency (the least code): a scripting language is the first choice, and C the worst.

Note: the section above draws on Liao Xuefeng's tutorial.


Appendix: an article kept here for reference

Python GIL performance in detail

(For the sake of focus, I only describe CPython here—not Jython, PyPy, or IronPython. CPython is the Python implementation that working programmers overwhelmingly use.)

Behold, the global interpreter lock

Here it is:

static PyThread_type_lock interpreter_lock = 0; /* This is the GIL */

This line of code is in ceval.c, in the CPython 2.7 interpreter's source code. Guido van Rossum's comment, "This is the GIL," was added in 2003, but the lock itself dates from his first multithreaded Python interpreter in 1997. On Unix systems, PyThread_type_lock is an alias for the standard C lock, mutex_t. It is initialized when the Python interpreter begins:

void
PyEval_InitThreads(void)
{
    interpreter_lock = PyThread_allocate_lock();
    PyThread_acquire_lock(interpreter_lock);
}

All C code within the interpreter must hold this lock while executing Python. Guido first built Python this way because it is simple, and every attempt to remove the GIL from CPython has cost single-threaded programs too much performance to be worth the gains for multithreading.

The GIL's effect on the threads in your program is simple enough that you can write the principle on the back of your hand: "One thread runs Python, while N others sleep or await I/O." Python threads can also wait for a threading.Lock or other synchronization object from the threading module; consider threads in that state to be "sleeping," too.


When do threads switch? Whenever a thread begins sleeping or awaiting network I/O, there is a chance for another thread to take the GIL and execute Python code. This is cooperative multitasking. CPython also has preemptive multitasking: if a thread runs uninterrupted for 100 bytecode instructions in Python 2, or for 5 milliseconds in Python 3 (the default switch interval), then it gives up the GIL and another thread may run. Think of this like time slicing in the olden days when we had many threads but one CPU. I will discuss these two kinds of multitasking in detail.

Think of Python as an old mainframe; many tasks share one CPU.

Cooperative multitasking

When it begins a task, such as network I/O, that is of long or uncertain duration and does not require running any Python code, a thread relinquishes the GIL so another thread can take it and run Python. This polite conduct is called cooperative multitasking, and it allows concurrency; many threads can wait for different events at the same time.

Say that two threads each connect a socket:

import socket
import threading

def do_connect():
    s = socket.socket()
    s.connect(('python.org', 80))  # drop the GIL

for i in range(2):
    t = threading.Thread(target=do_connect)
    t.start()

Only one of these two threads can execute Python at a time, but once the thread has begun connecting, it drops the GIL so the other thread can run. This means that both threads could be waiting for their sockets to connect concurrently, which is a good thing. They can do more work in the same amount of time.

Let's pry open the box and see how a Python thread actually drops the GIL while it waits for a connection to be established, in socketmodule.c:

/* s.connect((host, port)) method */
static PyObject *
sock_connect(PySocketSockObject *s, PyObject *addro)
{
    sock_addr_t addrbuf;
    int addrlen;
    int res;

    /* convert (host, port) tuple to C address */
    getsockaddrarg(s, addro, SAS2SA(&addrbuf), &addrlen);

    Py_BEGIN_ALLOW_THREADS
    res = connect(s->sock_fd, addr, addrlen);
    Py_END_ALLOW_THREADS

    /* error handling and so on .... */
}

The Py_BEGIN_ALLOW_THREADS macro is where the thread drops the GIL; it is defined simply as:

PyThread_release_lock(interpreter_lock);

And of course Py_END_ALLOW_THREADS reacquires the lock. A thread might block at this spot, waiting for another thread to release the lock; once that happens, the waiting thread grabs the GIL back and resumes executing your Python code. In short: While N threads are blocked on network I/O or waiting to reacquire the GIL, one thread can run Python.

Below, see a complete example that uses cooperative multitasking to fetch many URLs quickly. But before that, let's contrast cooperative multitasking with the other kind of multitasking.

Preemptive multitasking

A Python thread can voluntarily release the GIL, but it can also have the GIL seized from it preemptively.

Let's back up and talk about how Python is executed. Your program is run in two stages. First, your Python text is compiled into a simpler binary format called bytecode. Second, the Python interpreter's main loop, a function mellifluously named PyEval_EvalFrameEx(), reads the bytecode and executes the instructions in it one by one.

While the interpreter steps through your bytecode it periodically drops the GIL, without asking permission of the thread whose code it is executing, so other threads can run:

for (;;) {
    if (--ticker < 0) {
        ticker = check_interval;

        /* Give another thread a chance */
        PyThread_release_lock(interpreter_lock);

        /* Other threads may run now */
        PyThread_acquire_lock(interpreter_lock, 1);
    }

    bytecode = *next_instr++;
    switch (bytecode) {
        /* execute the next instruction ... */
    }
}

By default the check interval is 100 bytecodes. All threads run this same code and have the lock taken from them periodically in the same way. In Python 3 the GIL's implementation is more complex, and the check interval is not a fixed number of bytecodes but 5 milliseconds. For your code, however, these differences are not significant.
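
In Python 3 you can inspect and tune that interval through the sys module; a quick check (standard-library calls, defaults from a stock CPython 3):

import sys

print(sys.getswitchinterval())  # 0.005 seconds by default
sys.setswitchinterval(0.010)    # ask for less frequent preemption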

Thread safety in Python

Weaving together multiple threads requires skill.

If a thread can lose the GIL at any moment, you must make your code thread-safe. Python programmers think differently about thread safety than C or Java programmers do, however, because many Python operations are atomic.

An example of an atomic operation is calling sort() on a list. A thread cannot be interrupted in the middle of sorting, and other threads never see a partly sorted list, nor see stale data from before the list was sorted. Atomic operations simplify our lives, but there are surprises. For example, += seems simpler than sort(), but += is not atomic. How can you know which operations are atomic and which are not?

Consider this code:

n = 0

def foo():
    global n
    n += 1

We can see the bytecode to which this function compiles, with Python's standard dis module:

>>> import dis
>>> dis.dis(foo)
LOAD_GLOBAL 0 (n)
LOAD_CONST 1 (1)
INPLACE_ADD
STORE_GLOBAL 0 (n)

One line of code, n += 1, has been compiled to four bytecodes, which do four primitive operations:

load the value of n onto the stack

load the constant 1 onto the stack

sum the two values at the top of the stack

store the sum back into n

Remember that every 100 bytecodes a thread is interrupted by the interpreter taking the GIL away. If the thread is unlucky, this might happen between the time it loads the value of n onto the stack and the time it stores it back. How this leads to lost updates is easy to see:

threads = []
for i in range(100):
    t = threading.Thread(target=foo)
    threads.append(t)

for t in threads:
    t.start()

for t in threads:
    t.join()

print(n)

Usually this code prints 100, because each of the 100 threads has incremented n. But sometimes you see 99 or 98, if one of the threads' updates was overwritten by another.

So, despite the GIL, you still need locks to protect shared mutable state:

n = 0
lock = threading.Lock()

def foo():
    global n
    with lock:
        n += 1

What if we were using an atomic operation like sort() instead?:

lst = [4, 1, 3, 2]

def foo():
    lst.sort()

This function's bytecode shows that sort() cannot be interrupted, because it is atomic:

>>> dis.dis(foo)
LOAD_GLOBAL 0 (lst)
LOAD_ATTR 1 (sort)
CALL_FUNCTION 0

The one line compiles to three bytecodes:

load the value of lst onto the stack

load its sort method onto the stack

call the sort method

Even though the line lst.sort() takes several steps, the sort call itself is a single bytecode, and thus there is no opportunity for the thread to have the GIL seized from it during the call. We could conclude that we don't need to lock around sort(). Or, to avoid worrying about which operations are atomic, follow a simple rule: Always lock around reads and writes of shared mutable state. After all, acquiring a threading.Lock in Python is cheap.

Although the GIL does not excuse us from the need for locks, it does mean there is no need for fine-grained locking. In a free-threaded language like Java, programmers make an effort to lock shared data for the shortest time possible, to reduce thread contention and allow maximum parallelism. Because threads cannot run Python in parallel, however, there's no advantage to fine-grained locking. So long as no thread holds a lock while it sleeps, does I/O, or some other GIL-dropping operation, you should use the coarsest, simplest locks possible. Other threads couldn't have run in parallel anyway.
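
As a sketch of that coarse-grained style (a hypothetical class of my own, not from the article), it is fine to guard all of an object's state with one lock:

import threading

class Stats:
    """All mutable state guarded by a single coarse lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0
        self._total = 0

    def add(self, value):
        # One lock for both fields is enough: other threads could not
        # have run this Python code in parallel anyway.
        with self._lock:
            self._count += 1
            self._total += value

    def mean(self):
        with self._lock:
            return self._total / self._count if self._count else 0.0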

Finishing sooner with concurrency

I wager what you really came for is to optimize your programs with multi-threading. If your task will finish sooner by awaiting many network operations at once, then multiple threads help, even though only one of them can execute Python at a time. This is concurrency, and threads work nicely in this scenario.

This code runs faster with threads:

import threading
import requests

urls = [...]

def worker():
    while True:
        try:
            url = urls.pop()
        except IndexError:
            break  # Done.

        requests.get(url)

for _ in range(10):
    t = threading.Thread(target=worker)
    t.start()

As we saw above, these threads drop the GIL while waiting for each socket operation involved in fetching a URL over HTTP, so they finish the work sooner than a single thread could.
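
One detail the sketch leaves out: if the program must wait for every fetch before moving on, keep references to the threads and join them (my own addition, not in the article):

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # block until every worker has drained the url list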

Parallelism

What if your task will finish sooner only by running Python code simultaneously? This kind of scaling is called parallelism, and the GIL prohibits it. You must use multiple processes, which can be more complicated than threading and requires more memory, but it will take advantage of multiple CPUs.

This example finishes sooner by forking 10 processes than it could with only one, because the processes run in parallel on several cores. But it wouldn't run faster with 10 threads than with one, because only one thread can execute Python at a time:

import os
import sys

nums = [1 for _ in range(1000000)]
chunk_size = len(nums) // 10
readers = []

while nums:
    chunk, nums = nums[:chunk_size], nums[chunk_size:]
    reader, writer = os.pipe()
    if os.fork():
        readers.append(reader)  # Parent.
    else:
        subtotal = 0
        for i in chunk:  # Intentionally slow code.
            subtotal += i

        print('subtotal %d' % subtotal)
        os.write(writer, str(subtotal).encode())
        sys.exit(0)

# Parent.
total = 0
for reader in readers:
    subtotal = int(os.read(reader, 1000).decode())
    total += subtotal

print("Total: %d" % total)

Because each forked process has a separate GIL, this program can parcel the work out and run multiple computations at once.
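
The same map-reduce shape can be written more portably with the standard library's process pool; a sketch of mine, not from the article:

import multiprocessing

def subtotal(chunk):
    total = 0
    for i in chunk:  # the same intentionally slow Python-level loop
        total += i
    return total

if __name__ == '__main__':
    nums = [1 for _ in range(1000000)]
    chunk_size = len(nums) // 10
    chunks = [nums[i:i + chunk_size]
              for i in range(0, len(nums), chunk_size)]

    with multiprocessing.Pool(processes=10) as pool:
        print("Total: %d" % sum(pool.map(subtotal, chunks)))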

(Jython and IronPython provide single-process parallelism, but they are far from full CPython compatibility. PyPy with Software Transactional Memory may some day be fast. Try these interpreters if you're curious.)

Conclusion

Now that you've opened the music box and seen the simple mechanism, you know all you need to write fast, thread-safe Python. Use threads for concurrent I/O, and processes for parallel computation. The principle is plain enough that you might not even need to write it on your hand.

A. Jesse Jiryu Davis will be speaking at PyCon 2017, which will be held May 17-25 in Portland, Oregon. Catch his talk, Grok the GIL: Write Fast and Thread-Safe Python, on Friday, May 19.

Original article: https://opensource.com/article/17/4/grok-gil
