Using Python's multiprocessing Module
1. A simple example of starting multiple processes, where the worker function takes no arguments

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing


def worker():
    print('working')


if __name__ == '__main__':
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        p.start()

multiprocessing_simple.py
Output:
[root@ mnt]# python3 multiprocessing_simple.py
working
working
working
working
working
2. A simple example of starting multiple processes, where the worker function takes arguments

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing


def worker(num):
    print('worker id: %s' % num)


if __name__ == '__main__':
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        p.start()

multiprocessing_simple_args.py
Output (the ids 0-4 appear in whatever order the processes are scheduled):
[root@ mnt]# python3 multiprocessing_simple_args.py
worker id: 0
worker id: 1
worker id: 2
worker id: 3
worker id: 4
3. Running a task imported from another module in multiple processes

#!/usr/bin/env python
# -*- coding: utf-8 -*-
def worker():
    print('working')
    return

multiprocessing_import_worker.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing

import multiprocessing_import_worker

if __name__ == '__main__':
    for i in range(5):
        p = multiprocessing.Process(
            target=multiprocessing_import_worker.worker,
        )
        p.start()

multiprocessing_import_main.py
Output:
[root@ mnt]# python3 multiprocessing_import_main.py
working
working
working
working
working
4. Giving processes custom names

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import logging
import multiprocessing
import time

logging.basicConfig(
    level=logging.DEBUG,
    format="(%(threadName)-10s) %(message)s",
)


def worker():
    name = multiprocessing.current_process().name
    logging.debug('%s start' % name)
    time.sleep(3)
    logging.debug('%s end' % name)


def my_service():
    name = multiprocessing.current_process().name
    logging.debug('%s start' % name)
    time.sleep(3)
    logging.debug('%s end' % name)


if __name__ == '__main__':
    service = multiprocessing.Process(
        name='my_service',
        target=my_service,
    )
    worker_1 = multiprocessing.Process(
        name='worker_1',
        target=worker,
    )
    worker_2 = multiprocessing.Process(  # no name, so it gets a default one
        target=worker,
    )

    service.start()
    worker_1.start()
    worker_2.start()

multiprocessing_names.py
Output (the unnamed process falls back to the default "Process-N" name; it is the third Process created, hence Process-3):
[root@ mnt]# python3 multiprocessing_names.py
(MainThread) worker_1 start
(MainThread) Process-3 start
(MainThread) my_service start
(MainThread) worker_1 end
(MainThread) Process-3 end
(MainThread) my_service end
5. Daemon processes without waiting

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import logging
import multiprocessing
import time

logging.basicConfig(
    level=logging.DEBUG,
    format='(%(threadName)-10s) %(message)s',
)


def daemon():
    p = multiprocessing.current_process()
    logging.debug('%s %s start' % (p.name, p.pid))
    time.sleep(2)
    logging.debug('%s %s end' % (p.name, p.pid))


def no_daemon():
    p = multiprocessing.current_process()
    logging.debug('%s %s start' % (p.name, p.pid))
    logging.debug('%s %s end' % (p.name, p.pid))


if __name__ == '__main__':
    daemon_obj = multiprocessing.Process(
        target=daemon,
        name='daemon'
    )
    daemon_obj.daemon = True

    no_daemon_obj = multiprocessing.Process(
        target=no_daemon,
        name='no_daemon'
    )
    no_daemon_obj.daemon = False

    daemon_obj.start()
    time.sleep(1)
    no_daemon_obj.start()

multiprocessing_daemon.py
Output (each line also includes the child's pid; the daemon never prints its "end" line, because the main process exits first and the daemon is killed with it):
[root@ mnt]# python3 multiprocessing_daemon.py
(MainThread) daemon start
(MainThread) no_daemon start
(MainThread) no_daemon end
6. Waiting for all processes, including the daemon, with join()
Output:
[root@ mnt]# python3 multiprocessing_daemon_join.py
(MainThread) daemon start
(MainThread) no_daemon start
(MainThread) no_daemon end
(MainThread) daemon end
7. Waiting for a daemon process with a join() timeout

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import logging
import multiprocessing
import time

logging.basicConfig(
    level=logging.DEBUG,
    format='(%(threadName)-10s) %(message)s',
)


def daemon():
    p = multiprocessing.current_process()
    logging.debug('%s %s start' % (p.name, p.pid))
    time.sleep(2)
    logging.debug('%s %s end' % (p.name, p.pid))


def no_daemon():
    p = multiprocessing.current_process()
    logging.debug('%s %s start' % (p.name, p.pid))
    logging.debug('%s %s end' % (p.name, p.pid))


if __name__ == '__main__':
    daemon_obj = multiprocessing.Process(
        target=daemon,
        name='daemon'
    )
    daemon_obj.daemon = True

    no_daemon_obj = multiprocessing.Process(
        target=no_daemon,
        name='no_daemon'
    )
    no_daemon_obj.daemon = False

    daemon_obj.start()
    time.sleep(1)
    no_daemon_obj.start()

    daemon_obj.join(1)  # wait at most 1 second, less than the daemon's sleep
    logging.debug('daemon_obj.is_alive():%s' % daemon_obj.is_alive())
    no_daemon_obj.join()

multiprocessing_daemon_join_timeout.py
Output (the timeout expires before the daemon finishes, so the daemon is still alive when the main process exits):
[root@ mnt]# python3 multiprocessing_daemon_join_timeout.py
(MainThread) daemon start
(MainThread) no_daemon start
(MainThread) no_daemon end
(MainThread) daemon_obj.is_alive():True
8. Terminating a process. Note: after calling terminate(), call join() on the process to make sure it has actually been cleaned up.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import logging
import multiprocessing
import time

logging.basicConfig(
    level=logging.DEBUG,
    format='(%(threadName)-10s) %(message)s',
)


def slow_worker():
    print('starting work')
    time.sleep(0.1)
    print('finished work')


if __name__ == '__main__':
    p = multiprocessing.Process(
        target=slow_worker
    )
    logging.debug('before start, is_alive: %s' % p.is_alive())
    p.start()
    logging.debug('running, is_alive: %s' % p.is_alive())
    p.terminate()
    logging.debug('after terminate(), is_alive: %s' % p.is_alive())
    p.join()
    logging.debug('after join(), is_alive: %s' % p.is_alive())

multiprocessing_terminate.py
Output:
[root@ mnt]# python3 multiprocessing_terminate.py
(MainThread) before start, is_alive: False
(MainThread) running, is_alive: True
(MainThread) after terminate(), is_alive: True
(MainThread) after join(), is_alive: False
9. Process exit codes

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import sys
import time


def exit_error():
    sys.exit(1)


def exit_ok():
    return


def return_value():
    return 1


def raises():
    raise RuntimeError('runtime error')


def terminated():
    time.sleep(3)


if __name__ == '__main__':
    jobs = []
    funcs = [
        exit_error,
        exit_ok,
        return_value,
        raises,
        terminated,
    ]
    for func in funcs:
        print('starting process for function %s' % func.__name__)
        j = multiprocessing.Process(
            target=func,
            name=func.__name__
        )
        jobs.append(j)
        j.start()

    jobs[-1].terminate()

    for j in jobs:
        j.join()
        print('{:>15}.exitcode={}'.format(j.name, j.exitcode))

multiprocessing_exitcode.py
Output:
[root@ mnt]# python3 multiprocessing_exitcode.py
starting process for function exit_error
starting process for function exit_ok
starting process for function return_value
starting process for function raises
starting process for function terminated
Process raises:
     exit_error.exitcode=1
        exit_ok.exitcode=0
   return_value.exitcode=0
Traceback (most recent call last):
  File "/usr/local/Python-3.6.6/lib/python3.6/multiprocessing/process.py", line , in _bootstrap
    self.run()
  File "/usr/local/Python-3.6.6/lib/python3.6/multiprocessing/process.py", line , in run
    self._target(*self._args, **self._kwargs)
  File "multiprocessing_exitcode.py", line , in raises
    raise RuntimeError('runtime error')
RuntimeError: runtime error
# note: a process that dies with an unhandled exception exits with code 1
         raises.exitcode=1
     terminated.exitcode=-15
10. Enabling module-wide logging for multiprocessing

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import logging
import multiprocessing
import sys


def worker():
    print('working...')
    sys.stdout.flush()


if __name__ == '__main__':
    multiprocessing.log_to_stderr(logging.DEBUG)
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()

multiprocessing_log_to_stderr.py
Output:
[root@ mnt]# python3 multiprocessing_log_to_stderr.py
[INFO/Process-1] child process calling self.run()
working...
[INFO/Process-1] process shutting down
[DEBUG/Process-1] running all "atexit" finalizers with priority >= 0
[DEBUG/Process-1] running the remaining "atexit" finalizers
[INFO/Process-1] process exiting with exitcode 0
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[DEBUG/MainProcess] running the remaining "atexit" finalizers
11. Setting the display level for multiprocessing logging

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import logging
import multiprocessing
import sys


def worker():
    print('working...')
    sys.stdout.flush()


if __name__ == '__main__':
    multiprocessing.log_to_stderr()
    logger = multiprocessing.get_logger()
    logger.setLevel(logging.INFO)
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()

multiprocessing_get_logger.py
Output (the DEBUG lines from the previous example are now filtered out):
[root@ mnt]# python3 multiprocessing_get_logger.py
[INFO/Process-1] child process calling self.run()
working...
[INFO/Process-1] process shutting down
[INFO/Process-1] process exiting with exitcode 0
[INFO/MainProcess] process shutting down
12. Subclassing multiprocessing.Process to run a task that takes no arguments

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing


class Worker(multiprocessing.Process):
    def run(self):
        print('current process name: %s' % self.name)


if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = Worker()
        jobs.append(p)
        p.start()

    for j in jobs:
        j.join()

multiprocessing_subclass.py
Output (order may vary):
[root@ mnt]# python3 multiprocessing_subclass.py
current process name: Worker-1
current process name: Worker-2
current process name: Worker-3
current process name: Worker-4
current process name: Worker-5
13. Using the multiprocessing.Queue() queue

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing


class MyFancyClass(object):
    def __init__(self, name):
        self.name = name

    def do_something(self):
        proc_name = multiprocessing.current_process().name
        print('current process: %s, instance name: %s' % (proc_name, self.name))


def worker(q):
    obj = q.get()
    obj.do_something()


if __name__ == '__main__':
    queue = multiprocessing.Queue()

    # Start the worker and hand it the queue; the queue is empty at this
    # point, so the worker blocks waiting for data to arrive.
    p = multiprocessing.Process(
        target=worker,
        args=(queue,)
    )
    p.start()

    # Put data on the queue
    queue.put(MyFancyClass('Mrs Suk'))

    queue.close()
    # Wait until the queue's feeder thread has flushed everything
    queue.join_thread()
    p.join()

multiprocessing_queue.py
Output:
[root@ mnt]# python3 multiprocessing_queue.py
current process: Process-1, instance name: Mrs Suk
14. Using multiprocessing.JoinableQueue(). Example: multiply numbers, put the results on a result queue, then read them back and print them.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import time


class Consumer(multiprocessing.Process):
    """Consumer class"""

    def __init__(self, task_queue, result_queue, *args, **kwargs):
        super(Consumer, self).__init__(*args, **kwargs)
        self.task_queue = task_queue
        self.result_queue = result_queue

    def run(self):
        proc_name = self.name  # process name
        while True:
            next_task = self.task_queue.get()
            if next_task is None:
                # A None sentinel tells this consumer to shut down
                print('%s exiting' % proc_name)
                self.task_queue.task_done()
                break
            print('{}:{}'.format(proc_name, next_task))
            answer = next_task()  # invokes Task.__call__
            # Report the item as done; without a task_done() for every
            # get(), tasks.join() in the main process would block forever.
            self.task_queue.task_done()
            self.result_queue.put(answer)  # store the result


class Task(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __call__(self, *args, **kwargs):
        time.sleep(0.1)
        return '{self.a} * {self.b} = {product}'.format(
            self=self, product=self.a * self.b)

    def __str__(self):
        return '{self.a} * {self.b}'.format(self=self)


if __name__ == '__main__':
    # JoinableQueue adds two methods over Queue: task_done() and join()
    tasks = multiprocessing.JoinableQueue()
    # Queue the results are stored on
    results = multiprocessing.Queue()

    # Scale the number of consumers with the number of CPU cores
    num_consumers = multiprocessing.cpu_count() * 2
    print('creating {} consumers'.format(num_consumers))
    consumers = [
        Consumer(tasks, results) for i in range(num_consumers)
    ]

    # Start the consumer processes
    for w in consumers:
        w.start()

    # Enqueue the jobs
    num_jobs = 10
    for i in range(10):
        tasks.put(Task(i, i))

    # One None sentinel per consumer
    for i in range(num_consumers):
        tasks.put(None)

    # Wait for all of the tasks to be processed
    tasks.join()

    # Print the results
    while num_jobs:
        result = results.get()
        print('result:', result)
        num_jobs -= 1

multiprocessing_producer_consumer.py
Output (representative; the test machine reports one core, so cpu_count() * 2 creates two consumers, and the interleaving varies between runs):
[root@ mnt]# python3 multiprocessing_producer_consumer.py
creating 2 consumers
Consumer-1:0 * 0
Consumer-2:1 * 1
Consumer-1:2 * 2
Consumer-2:3 * 3
Consumer-1:4 * 4
Consumer-2:5 * 5
Consumer-1:6 * 6
Consumer-2:7 * 7
Consumer-1:8 * 8
Consumer-2:9 * 9
Consumer-1 exiting
Consumer-2 exiting
result: 0 * 0 = 0
result: 1 * 1 = 1
result: 2 * 2 = 4
result: 3 * 3 = 9
result: 4 * 4 = 16
result: 5 * 5 = 25
result: 6 * 6 = 36
result: 7 * 7 = 49
result: 8 * 8 = 64
result: 9 * 9 = 81
15. Synchronizing processes with an Event

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import time


def wait_for_event(event_obj):
    print('waiting for event, no timeout')
    event_obj.wait()
    print('blocking wait, event state:', event_obj.is_set())


def wait_for_event_timeout(event_obj, timeout):
    print('waiting for event, with timeout')
    event_obj.wait(timeout)
    print('non-blocking wait, event state:', event_obj.is_set())


if __name__ == '__main__':
    event_obj = multiprocessing.Event()

    block_task = multiprocessing.Process(
        name='block_task',
        target=wait_for_event,
        args=(event_obj,)
    )
    block_task.start()

    non_block_task = multiprocessing.Process(
        name='non_block_task',
        target=wait_for_event_timeout,
        args=(event_obj, 2)
    )
    non_block_task.start()

    print('sleeping 3 seconds so both processes have started')
    time.sleep(3)
    event_obj.set()
    print('event state set to set()=True')

multiprocessing_event.py
Output (the 2-second timeout expires before the event is set, so the timed wait reports False; the blocking wait sees True):
[root@ mnt]# python3 multiprocessing_event.py
sleeping 3 seconds so both processes have started
waiting for event, with timeout
waiting for event, no timeout
non-blocking wait, event state: False
event state set to set()=True
blocking wait, event state: True
16. Controlling access to a shared resource with a Lock

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import sys


def worker_with(lock, stream):
    with lock:
        stream.write('lock acquired via with\n')


def worker_no_with(lock, stream):
    lock.acquire()
    try:
        stream.write('lock acquired via lock.acquire()\n')
    finally:
        lock.release()


if __name__ == '__main__':
    lock = multiprocessing.Lock()

    w = multiprocessing.Process(
        target=worker_with,
        args=(lock, sys.stdout,)
    )
    nw = multiprocessing.Process(
        target=worker_no_with,
        args=(lock, sys.stdout,)
    )

    w.start()
    nw.start()

    w.join()
    nw.join()

multiprocessing_lock.py
Output:
[root@ mnt]# python3 multiprocessing_lock.py
lock acquired via lock.acquire()
lock acquired via with
17. Synchronizing processes with multiprocessing.Condition()

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import time


def task_1(condition_obj):
    proc_name = multiprocessing.current_process().name
    print('starting %s' % proc_name)
    with condition_obj:
        print('%s finished, waking up the task_2 processes' % proc_name)
        condition_obj.notify_all()


def task_2(condition_obj):
    proc_name = multiprocessing.current_process().name
    print('starting %s' % proc_name)
    with condition_obj:
        condition_obj.wait()
        print('task_2 %s finished' % proc_name)


if __name__ == '__main__':
    condition_obj = multiprocessing.Condition()

    s1 = multiprocessing.Process(name='s1',
                                 target=task_1,
                                 args=(condition_obj,))
    s2_clients = [
        multiprocessing.Process(
            name='task_2[{}]'.format(i),
            target=task_2,
            args=(condition_obj,),
        ) for i in range(1, 3)
    ]

    for c in s2_clients:
        c.start()
    time.sleep(1)
    s1.start()

    s1.join()
    for c in s2_clients:
        c.join()

multiprocessing_condition.py
Output:
[root@ mnt]# python3 multiprocessing_condition.py
starting task_2[1]
starting task_2[2]
starting s1
s1 finished, waking up the task_2 processes
task_2 task_2[1] finished
task_2 task_2[2] finished
18. Limiting concurrent access to a resource with multiprocessing.Semaphore()

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import random
import multiprocessing
import time


class ActivePool:

    def __init__(self, *args, **kwargs):
        super(ActivePool, self).__init__(*args, **kwargs)
        self.mgr = multiprocessing.Manager()
        self.active = self.mgr.list()
        self.lock = multiprocessing.Lock()

    def makeActive(self, name):
        with self.lock:
            self.active.append(name)

    def makeInactive(self, name):
        with self.lock:
            self.active.remove(name)

    def __str__(self):
        with self.lock:
            return str(self.active)


def worker(s, pool):
    name = multiprocessing.current_process().name
    with s:
        pool.makeActive(name)
        print('Activating {} now running {}'.format(
            name, pool))
        time.sleep(random.random())
        pool.makeInactive(name)


if __name__ == '__main__':
    pool = ActivePool()
    s = multiprocessing.Semaphore(3)
    jobs = [
        multiprocessing.Process(
            target=worker,
            name=str(i),
            args=(s, pool),
        )
        for i in range(10)
    ]

    for j in jobs:
        j.start()

    while True:
        alive = 0
        for j in jobs:
            if j.is_alive():
                alive += 1
                j.join(timeout=0.1)
                print('Now running {}'.format(pool))
        if alive == 0:
            # all done
            break

multiprocessing_semaphore.py
Output (abridged and representative; the exact interleaving varies between runs, but the semaphore keeps at most three workers active at any time):
[root@ mnt]# python3 multiprocessing_semaphore.py
Activating 0 now running ['0']
Activating 1 now running ['0', '1']
Activating 2 now running ['0', '1', '2']
Now running ['0', '1', '2']
Activating 3 now running ['1', '2', '3']
Now running ['1', '2', '3']
Activating 4 now running ['2', '3', '4']
Now running ['2', '3', '4']
...
Now running ['8', '9']
Now running ['9']
Now running []
19. Sharing a dict or list across processes with multiprocessing.Manager()

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing


def worker(dict_obj, key, value):
    dict_obj[key] = value


if __name__ == '__main__':
    # Create a dict shared across processes; every process sees its contents
    mgr = multiprocessing.Manager()
    mgr_dict = mgr.dict()
    jobs = [
        multiprocessing.Process(
            target=worker,
            args=(mgr_dict, i, i * 2),
        )
        for i in range(10)
    ]

    # Start the worker tasks
    for j in jobs:
        j.start()

    # Wait for the worker tasks to finish
    for j in jobs:
        j.join()

    print('result:', mgr_dict)

multiprocessing_manager_dict.py
Output (key order may vary):
[root@ mnt]# python3 multiprocessing_manager_dict.py
result: {0: 0, 1: 2, 2: 4, 3: 6, 4: 8, 5: 10, 6: 12, 7: 14, 8: 16, 9: 18}
20. Sharing a namespace via multiprocessing.Manager(): attribute assignment is visible globally

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import time


def producer(namespace_obj, event):
    """Producer"""
    namespace_obj.value = 'value set via the namespace: 1234'
    event.set()


def consumer(namespace_obj, event):
    """Consumer.

    When the producer and consumer processes first start,
    namespace_obj.value does not exist yet, so the first access
    raises an exception. Once the producer calls event.set(),
    event.wait() stops blocking and the consumer reads the value.
    """
    try:
        print('value before the event: {}'.format(namespace_obj.value))
    except Exception as err:
        print('error before the event:', str(err))
    event.wait()
    print('value after the event:', namespace_obj.value)


if __name__ == '__main__':
    # Create a shared manager
    mgr = multiprocessing.Manager()
    # Create a shared Namespace
    namespace = mgr.Namespace()
    # Create an event shared across processes
    event = multiprocessing.Event()

    p = multiprocessing.Process(
        target=producer,
        args=(namespace, event),
    )
    c = multiprocessing.Process(
        target=consumer,
        args=(namespace, event),
    )

    c.start()
    time.sleep(1)
    p.start()

    c.join()
    p.join()

multiprocessing_namespace.py
Output:
[root@ mnt]# python3 multiprocessing_namespace.py
error before the event: 'Namespace' object has no attribute 'value'
value after the event: value set via the namespace: 1234
21. Sharing a namespace via multiprocessing.Manager(): in-place updates to mutable values are NOT visible globally

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import time


def producer(namespace_obj, event):
    """Producer"""
    # This mutates a local copy of the list returned by the proxy;
    # the change is never sent back to the manager process.
    namespace_obj.my_list.append('value set via the namespace: 1234')
    event.set()


def consumer(namespace_obj, event):
    """Consumer: prints the shared list before and after the event."""
    try:
        print('value before the event: {}'.format(namespace_obj.my_list))
    except Exception as err:
        print('error before the event:', str(err))
    event.wait()
    print('value after the event:', namespace_obj.my_list)


if __name__ == '__main__':
    # Create a shared manager
    mgr = multiprocessing.Manager()
    # Create a shared Namespace
    namespace = mgr.Namespace()
    # A list mutated in place on the namespace is not propagated globally
    namespace.my_list = []
    # Create an event shared across processes
    event = multiprocessing.Event()

    p = multiprocessing.Process(
        target=producer,
        args=(namespace, event),
    )
    c = multiprocessing.Process(
        target=consumer,
        args=(namespace, event),
    )

    c.start()
    p.start()

    c.join()
    p.join()

multiprocessing_namespace_mutable.py
Output (the append() is silently lost):
[root@ mnt]# python3 multiprocessing_namespace_mutable.py
value before the event: []
value after the event: []
22. Using a process pool to transform a list of numbers

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing


def do_calculation(data):
    return data * 2


def start_process():
    print('starting', multiprocessing.current_process().name)


if __name__ == '__main__':
    inputs = list(range(10))
    print('inputs :', inputs)

    # Compute with the built-in map() for comparison
    builtin_outputs = map(do_calculation, inputs)
    print('Built-in:', list(builtin_outputs))

    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(
        processes=pool_size,
        initializer=start_process,
    )
    # Compute with the process pool
    pool_outputs = pool.map(do_calculation, inputs)
    pool.close()
    pool.join()
    print('Pool :', pool_outputs)

multiprocessing_pool.py
Output (the test machine reports a single core, so cpu_count() * 2 gives a pool of two workers):
[root@ mnt]# python3 multiprocessing_pool.py
inputs : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Built-in: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
starting ForkPoolWorker-1
starting ForkPoolWorker-2
Pool : [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
23. Limiting how many tasks a pool worker runs before it is restarted (maxtasksperchild). This prevents long-lived worker processes from accumulating system resources.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing


def do_calculation(data):
    return data * 2


def start_process():
    print('starting', multiprocessing.current_process().name)


if __name__ == '__main__':
    inputs = list(range(100))
    print('inputs :', inputs)

    # Compute with the built-in map() for comparison
    builtin_outputs = map(do_calculation, inputs)
    print('Built-in:', list(builtin_outputs))

    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(
        processes=pool_size,
        initializer=start_process,
        maxtasksperchild=2
    )
    # Compute with the process pool
    pool_outputs = pool.map(do_calculation, inputs)
    pool.close()
    pool.join()
    print('Pool :', pool_outputs)

multiprocessing_pool_maxtasksperchild.py
Output (lists abridged here; each worker is replaced after two task batches, so extra workers are started over the run):
[root@ mnt]# python3 multiprocessing_pool_maxtasksperchild.py
inputs : [0, 1, 2, ..., 99]
Built-in: [0, 2, 4, ..., 198]
starting ForkPoolWorker-1
starting ForkPoolWorker-2
starting ForkPoolWorker-3
starting ForkPoolWorker-4
Pool : [0, 2, 4, ..., 198]
24. A simple MapReduce built on a process pool. The example reads a file, splits it into words, and counts them.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import collections
import itertools
import multiprocessing


class SimpleMapReduce:

    def __init__(self, map_func, reduce_func, num_workers=None):
        """
        :param map_func: the map function, here file_to_words(filename)
        :param reduce_func: the reduce function, here count_words(item)
        :param num_workers: number of pool workers (defaults to cpu_count())
        """
        self.map_func = map_func
        self.reduce_func = reduce_func
        self.pool = multiprocessing.Pool(num_workers)

    def partition(self, mapped_values):
        """Group the mapped (key, value) pairs by key into a dict of lists."""
        partitioned_data = collections.defaultdict(list)
        for key, value in mapped_values:
            partitioned_data[key].append(value)
        return partitioned_data.items()

    def __call__(self, inputs, chunksize=1):
        """
        :param inputs: the filenames to process
        :param chunksize: how many inputs each worker takes per batch
        :return: the reduced values
        """
        # map_responses is a list of [(word, 1), ...] lists, one per file
        map_responses = self.pool.map(
            self.map_func,
            inputs,
            chunksize=chunksize,
        )
        # Flatten the per-file lists and group the pairs by word
        partitioned_data = self.partition(
            itertools.chain(*map_responses)
        )
        # Feed each (word, [1, 1, ...]) item to count_words(item),
        # where the sum() aggregation takes effect
        reduced_values = self.pool.map(
            self.reduce_func,
            partitioned_data,
        )
        return reduced_values

multiprocessing_mapreduce.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import string

from multiprocessing_mapreduce import SimpleMapReduce


def file_to_words(filename):
    """Read a file and return a list of (word, 1) pairs."""
    # Words to skip when counting
    STOP_WORDS = set([
        'a', 'an', 'and', 'are', 'as', 'be', 'by', 'for', 'if',
        'in', 'is', 'it', 'of', 'or', 'py', 'rst', 'that', 'the',
        'to', 'with',
    ])
    TR = str.maketrans({
        p: ' '
        for p in string.punctuation
    })
    print('process: {} reading file: {}'.format(
        multiprocessing.current_process().name, filename))
    output = []

    with open(filename, 'rt', encoding='utf-8') as f:
        for line in f:
            # Skip reST comment lines starting with ".."
            if line.lstrip().startswith('..'):
                continue
            line = line.translate(TR)  # strip the punctuation in TR
            for word in line.split():  # split on whitespace
                word = word.lower()
                if word.isalpha() and word not in STOP_WORDS:
                    output.append((word, 1))
    return output


def count_words(item):
    """Reduce function: sum the counts for one word."""
    word, occurences = item
    return (word, sum(occurences))


if __name__ == '__main__':
    import operator
    import glob

    # Find files in the current directory ending in *.rst
    input_files = glob.glob('*.rst')

    # Build the MapReduce object; calling it invokes SimpleMapReduce.__call__
    mapper = SimpleMapReduce(file_to_words, count_words)
    word_counts = mapper(input_files)

    word_counts.sort(key=operator.itemgetter(1))  # sort by count
    word_counts.reverse()  # descending

    print('\nTOP 20 WORDS BY FREQUENCY\n')
    top20 = word_counts[:20]
    longest = max(len(word) for word, count in top20)
    for word, count in top20:
        print('{word:<{len}}: {count:5}'.format(
            len=longest + 1,
            word=word,
            count=count)
        )

multiprocessing_wordcount.py
If there is a relationship() from Parent to Child, but there is not a reverse-relationship that links a particular Child to each Parent, SQLAlchemy will not have any awareness that when deleting this particular Child object, it needs to maintain the “secondary” table that links it to the Parent. No delete of the “secondary” table will occur. If there is a relationship that links a particular Child to each Parent, suppose it’s called Child.parents, SQLAlchemy by default will load in the Child.parents collection to locate all Parent objects, and remove each row from the “secondary” table which establishes this link. Note that this relationship does not need to be bidirectional; SQLAlchemy is strictly looking at every relationship() associated with the Child object being deleted. A higher performing option here is to use ON DELETE CASCADE directives with the foreign keys used by the database. Assuming the database supports this feature, the database itself can be made to automatically delete rows in the “secondary” table as referencing rows in “child” are deleted. SQLAlchemy can be instructed to forego actively loading in the Child.parents collection in this case using the passive_deletes directive on relationship(); see Using Passive Deletes for more details on this. Note again, these behaviors are only relevant to the secondary option used with relationship(). If dealing with association tables that are mapped explicitly and are not present in the secondary option of a relevant relationship(), cascade rules can be used instead to automatically delete entities in reaction to a related entity being deleted - see Cascades for information on this feature.
test.rst (the sample input shown above)
Output (the counts did not survive in this copy; the words are listed from most to least frequent):
[root@python-mysql mnt]# python3 multiprocessing_wordcount.py
process: SpawnPoolWorker- reading file: test.rst

TOP 20 WORDS BY FREQUENCY

child        :
relationship :
this         :
parent       :
on           :
delete       :
table        :
sqlalchemy   :
not          :
can          :
database     :
used         :
option       :
deleted      :
parents      :
will         :
each         :
particular   :
links        :
there        :