Implementing non-blocking stdout reads and a non-blocking wait by reading the Python subprocess source

http://blog.chinaunix.net/uid-23504396-id-4661783.html

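Launching a process with subprocess is not the hard part.
The hard part is reading the output the process prints, to confirm that it did the right thing, without blocking.
We also need the exit status of the process, to confirm that it terminated correctly, again without blocking.

Let's take the two problems one at a time,
starting with the first: capturing the output.

Code that captures output usually starts like this:

sub_process = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)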

To understand how subprocess obtains the child's stdout, first look at what subprocess.PIPE is.
In the source, subprocess.PIPE is simply the int -1.
Now look at how code found online usually reads the output of a subprocess:

lines = sub_process.stdout.readline()

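If subprocess.PIPE is just -1, how does the stdout attribute of the Popen object become an object that has a readline method?
Printing its type shows that Popen.stdout is a file object, so let's see what subprocess does internally.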
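Both observations are easy to check. The following is a minimal sketch, assuming a POSIX-like system and using `sys.executable` as a stand-in child command (my choice, not something from the original post). Under Python 2.7 the printed type is `file`; under Python 3 it is a buffered reader object instead.

```python
import subprocess
import sys

# subprocess.PIPE is only a marker value.
print(subprocess.PIPE)  # -1

# Hypothetical child command: a Python one-liner that prints one line.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,
)

# The marker has been replaced by a real file-like object backed by a pipe.
print(type(proc.stdout))
stdout_fd = proc.stdout.fileno()

line = proc.stdout.readline()
proc.stdout.close()
returncode = proc.wait()
```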
Start with Popen's __init__ method (Python 2.7.8).

stdout is passed to _get_handles, which converts the arguments into (p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite):

(p2cread, p2cwrite,
 c2pread, c2pwrite,
 errread, errwrite) = self._get_handles(stdin, stdout, stderr)

These six descriptors are then passed to _execute_child which, as the name suggests, is the function that actually launches the child:

self._execute_child(args, executable, preexec_fn, close_fds,
                    cwd, env, universal_newlines,
                    startupinfo, creationflags, shell,
                    p2cread, p2cwrite,
                    c2pread, c2pwrite,
                    errread, errwrite)

After the descriptors have been handed to _execute_child, the parent-side ends are converted into file objects with os.fdopen and become stdin, stdout and stderr:

if p2cwrite is not None:
    self.stdin = os.fdopen(p2cwrite, 'wb', bufsize)
if c2pread is not None:
    if universal_newlines:
        self.stdout = os.fdopen(c2pread, 'rU', bufsize)
    else:
        self.stdout = os.fdopen(c2pread, 'rb', bufsize)
if errread is not None:
    if universal_newlines:
        self.stderr = os.fdopen(errread, 'rU', bufsize)
    else:
        self.stderr = os.fdopen(errread, 'rb', bufsize)
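The fdopen step can be tried in isolation. This is a small sketch, not the subprocess code itself: it wraps both ends of a plain os.pipe() in file objects (the names `reader` and `writer` are mine) and shows that readline then works on a bare descriptor.

```python
import os

# A raw pipe: two integer file descriptors.
r, w = os.pipe()

# Wrap them the same way Popen wraps c2pread / p2cwrite (unbuffered, binary).
reader = os.fdopen(r, "rb", 0)
writer = os.fdopen(w, "wb", 0)

writer.write(b"first line\nsecond line\n")
writer.close()  # closing the write end lets the reader see EOF

first = reader.readline()
rest = reader.read()
reader.close()
```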

First look at _get_handles. Part of it is shown below (only the stdin branch):

def _get_handles(self, stdin, stdout, stderr):
    """Construct and return tuple with IO objects:
    p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite
    """
    p2cread, p2cwrite = None, None
    c2pread, c2pwrite = None, None
    errread, errwrite = None, None
    if stdin is None:
        pass
    elif stdin == PIPE:
        p2cread, p2cwrite = self.pipe_cloexec()
    elif isinstance(stdin, int):
        p2cread = stdin
    else:
        # Assuming file-like object
        p2cread = stdin.fileno()

Following the call into pipe_cloexec:

def pipe_cloexec(self):
    """Create a pipe with FDs set CLOEXEC."""
    # Pipes' FDs are set CLOEXEC by default because we don't want them
    # to be inherited by other subprocesses: the CLOEXEC flag is removed
    # from the child's FDs by _dup2(), between fork() and exec().
    # This is not atomic: we would need the pipe2() syscall for that.
    r, w = os.pipe()
    self._set_cloexec_flag(r)
    self._set_cloexec_flag(w)
    return r, w

So when stdout is set to subprocess.PIPE (that is, -1), subprocess calls os.pipe() internally to create a pipe and gets back the pipe's read and write file descriptors.

os.pipe()
Create a pipe. Return a pair of file descriptors (r, w) usable for reading and writing, respectively.

There is no need to read _set_cloexec_flag in detail for now: it just uses fcntl to set or clear the close-on-exec flag on the file descriptor.
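For reference, here is a rough standalone equivalent of pipe_cloexec, assuming a POSIX system where the fcntl module is available. The helper name `set_cloexec_flag` mirrors the private method but is my own reimplementation, not a copy of the subprocess source.

```python
import fcntl
import os


def set_cloexec_flag(fd, cloexec=True):
    """Set or clear FD_CLOEXEC on fd (sketch of what the private helper does)."""
    old = fcntl.fcntl(fd, fcntl.F_GETFD)
    if cloexec:
        new = old | fcntl.FD_CLOEXEC
    else:
        new = old & ~fcntl.FD_CLOEXEC
    fcntl.fcntl(fd, fcntl.F_SETFD, new)


r, w = os.pipe()
set_cloexec_flag(r)
set_cloexec_flag(w)
flags = fcntl.fcntl(r, fcntl.F_GETFD)

# The pipe itself is just a byte channel between the two descriptors.
os.write(w, b"ping\n")
data = os.read(r, 1024)
os.close(r)
os.close(w)
```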

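From this we can see that when subprocess.PIPE is passed for stdout and friends, the value only serves as a marker. Once subprocess sees it, it creates a pipe with os.pipe() to carry the input or output.
Because the capacity of the pipe that subprocess creates is not under our control (a child that writes more than the pipe can hold blocks until the parent reads), the recommended alternative is to pass a real file object instead. Note that a StringIO object does not work for this: it has no underlying file descriptor, so its fileno() cannot be used. A temporary file such as tempfile.TemporaryFile() does have one. Reference article:
http://backend.blog.163.com/blog/static/2022941262014016710912/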
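A minimal sketch of the file-based alternative, assuming tempfile.TemporaryFile as the backing file and a stand-in child (a Python one-liner, my choice) that prints more data than a typical pipe buffer holds:

```python
import subprocess
import sys
import tempfile

# Hypothetical child: writes 200000 characters, more than a typical pipe buffer.
child_cmd = [sys.executable, "-c", "print('x' * 200000)"]

with tempfile.TemporaryFile() as out_file:
    # subprocess takes out_file.fileno() and hands it to the child as fd 1,
    # so the child never blocks on a full pipe.
    proc = subprocess.Popen(child_cmd, stdout=out_file)
    returncode = proc.wait()

    # The parent reads the result back from the file afterwards.
    out_file.seek(0)
    output = out_file.read()
```

The trade-off is that the output is only convenient to read after the child has finished, so this suits batch-style commands better than interactive ones.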

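The one remaining question is how this pipe ends up carrying the child's input and output. That depends on what _execute_child does.
I have put the explanations directly into the source below as comments, and summarize afterwards.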

def _execute_child(self, args, executable, preexec_fn, close_fds,
                   cwd, env, universal_newlines,
                   startupinfo, creationflags, shell,
                   p2cread, p2cwrite,
                   c2pread, c2pwrite,
                   errread, errwrite):
    """Execute program (POSIX version)"""
    if isinstance(args, types.StringTypes):
        args = [args]
    else:
        args = list(args)
    if shell:
        args = ["/bin/sh", "-c"] + args
        if executable:
            args[0] = executable
    if executable is None:
        executable = args[0]
    # Another pipe is created here. It only carries an exception raised in
    # the child before exec back to the parent; it is not the child's stderr.
    errpipe_read, errpipe_write = self.pipe_cloexec()
    try:
        try:
            gc_was_enabled = gc.isenabled()
            # gc is disabled around fork(). The upstream comment gives the reason:
            # it avoids a hang (gc -> file_dealloc -> write to stderr),
            # see http://bugs.python.org/issue1336
            gc.disable()
            try:
                self.pid = os.fork()
            except:
                if gc_was_enabled:
                    gc.enable()
                raise
            self._child_created = True
            if self.pid == 0:
                # pid == 0 means we are the child and run the code below
                # (the parent receives the child's PID and skips this branch).
                # How parent/child communication over pipe() works: pipe() creates an
                # unnamed file (no path) identified only by the descriptors it returns.
                # Only the process that called pipe() and its descendants know these
                # descriptors and can use the pipe. Once no process uses the pipe any
                # more, the kernel reclaims its inode.
                # If stdin, stdout and stderr were all subprocess.PIPE when the Popen
                # object was created, three pipes were created before fork() and
                # their descriptors are passed in here.
                try:
                    # Close the pipe ends inherited from the parent that the child does not need
                    if p2cwrite is not None:
                        os.close(p2cwrite)
                    if c2pread is not None:
                        os.close(c2pread)
                    if errread is not None:
                        os.close(errread)
                    os.close(errpipe_read)
                    # The descriptor duplication below is what routes the child's output
                    # to the parent: it binds the child's stdin, stdout and stderr
                    # (fds 0, 1, 2) to the descriptors passed in from the parent.
                    # When duping fds, if there arises a situation
                    # where one of the fds is either 0, 1 or 2, it
                    # is possible that it is overwritten (#12607).
                    if c2pwrite == 0:
                        c2pwrite = os.dup(c2pwrite)
                    if errwrite == 0 or errwrite == 1:
                        errwrite = os.dup(errwrite)
                    # Dup fds for child
                    def _dup2(a, b):
                        # dup2() removes the CLOEXEC flag but
                        # we must do it ourselves if dup2()
                        # would be a no-op (issue #10806).
                        if a == b:
                            self._set_cloexec_flag(a, False)
                        elif a is not None:
                            os.dup2(a, b)
                    _dup2(p2cread, 0)
                    _dup2(c2pwrite, 1)
                    _dup2(errwrite, 2)
                    # { None } is a set literal, new in 2.7 (a syntax error in 2.6). It is a
                    # set, not a dict, and prevents closing the same fd twice.
                    # Pipe ends inherited from the parent are closed if their fd is above 2.
                    closed = { None }
                    for fd in [p2cread, c2pwrite, errwrite]:
                        if fd not in closed and fd > 2:
                            os.close(fd)
                            closed.add(fd)
                    # With close_fds set, close every other fd except errpipe_write,
                    # which is still needed to report an exec failure
                    if close_fds:
                        self._close_fds(but=errpipe_write)
                    # Change to the requested working directory before exec
                    if cwd is not None:
                        os.chdir(cwd)
                    if preexec_fn:
                        preexec_fn()
                    # Finally the command is executed through execvp/execvpe
                    if env is None:
                        os.execvp(executable, args)
                    else:
                        os.execvpe(executable, args, env)
                except:
                    exc_type, exc_value, tb = sys.exc_info()
                    # Save the traceback and attach it to the exception object
                    exc_lines = traceback.format_exception(exc_type,
                                                           exc_value,
                                                           tb)
                    exc_value.child_traceback = ''.join(exc_lines)
                    # The child writes the pickled exception to the write end of the error pipe
                    os.write(errpipe_write, pickle.dumps(exc_value))
                # The child exits here
                os._exit(255)
            # Parent: re-enable gc once the child has been forked
            if gc_was_enabled:
                gc.enable()
        finally:
            # The parent closes the write end of the error pipe
            os.close(errpipe_write)
        # The parent also closes the pipe ends it does not need
        if p2cread is not None and p2cwrite is not None:
            os.close(p2cread)
        if c2pwrite is not None and c2pread is not None:
            os.close(c2pwrite)
        if errwrite is not None and errread is not None:
            os.close(errwrite)
        # Read at most 1 MB from the read end of the error pipe
        data = _eintr_retry_call(os.read, errpipe_read, 1048576)
    finally:
        # The parent closes the read end of the error pipe
        os.close(errpipe_read)
    # If the child reported an exception, reap the child and re-raise the
    # exception in the parent; __init__ catches it and handles it accordingly
    if data != "":
        try:
            _eintr_retry_call(os.waitpid, self.pid, 0)
        except OSError as e:
            if e.errno != errno.ECHILD:
                raise
        child_exception = pickle.loads(data)
        raise child_exception
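The effect of the error pipe can be observed from the outside: when exec fails in the child, the exception surfaces in the parent at the Popen() call. A small sketch, assuming the path below does not exist on the machine (it is a made-up name). In 2.7 the exception travels pickled, as in the listing above; later Python versions differ in the internal details, but the parent still sees an OSError.

```python
import errno
import subprocess

raised = None
try:
    # Hypothetical path that is assumed not to exist.
    subprocess.Popen(["/nonexistent/definitely-not-a-command"])
except OSError as exc:
    # The failure happened in the child (exec), yet it is raised here, in the parent.
    raised = exc

print(raised)
```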

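To summarize: when we create a Popen object and pass subprocess.PIPE,
subprocess creates one to three pipes internally with os.pipe().
The forked child inherits the pipes' file descriptors and binds its own stdin, stdout and stderr to them.
The parent opens its ends of the pipes as file objects with os.fdopen
and assigns them to self.stdin, self.stdout and self.stderr.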
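The whole mechanism fits in a few lines once the error handling is stripped away. The following is a deliberately simplified sketch for POSIX systems, not the subprocess implementation: `run_and_capture` is a name I made up, it captures stdout only, and it has none of the gc, CLOEXEC or error-pipe handling shown above.

```python
import os
import sys


def run_and_capture(argv):
    """Fork, bind the child's stdout to a pipe, exec argv, return (output, status)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: drop the read end, make the write end fd 1, then exec.
        try:
            os.close(r)
            os.dup2(w, 1)
            if w > 2:
                os.close(w)
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed.
            os._exit(255)
    # Parent: drop the write end so EOF arrives when the child exits.
    os.close(w)
    chunks = []
    while True:
        chunk = os.read(r, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(r)
    _, status = os.waitpid(pid, 0)
    return b"".join(chunks), status


output, status = run_and_capture([sys.executable, "-c", "print('hello')"])
```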

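Because these are file objects, we can read the output directly with read, readline, readlines and so on.
But those methods all block, so we can do the following instead:
start a new thread that does the reading and appends each line it reads to a list, while the main thread repeatedly checks the last element of that list.
The reader thread's delay between appends must be longer than the main thread's polling interval. Otherwise a line may be superseded by the next one before the main thread has had a chance to look at it.

The function that reads the child's output: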

def stdout_theard(end_mark, cur_stdout, stdout_lock, string_list):
    # Thread body that reads the subprocess's stdout so the caller never blocks.
    # cur_stdout is a file object; end_mark is a random string, and seeing it
    # in the list means reading has finished.
    # Pause for 0.01 seconds first
    time.sleep(0.01)
    for i in range(3000):
        try:
            out_put = cur_stdout.readline()
            if not out_put:
                # EOF: append the end marker
                stdout_lock.acquire()
                string_list.append(end_mark)
                stdout_lock.release()
                break
            if out_put == end_mark:
                # Special case: the output happens to equal end_mark
                continue
            # The caller clears string_list once it has seen the expected
            # text, so access to the list must be locked
            stdout_lock.acquire()
            string_list.append(out_put.strip())
            stdout_lock.release()
            time.sleep(0.03)
        except:
            print 'wtffff!!!!!!tuichule !!'
            break

Starting the thread from the main thread:

stdout_list = []
stdout_lock = threading.Lock()
end_mark = 'end9c2nfxz'
cur_stdout_thread = threading.Thread(target=stdout_theard, args=(end_mark, sub_process.stdout, stdout_lock, stdout_list))
# Pass the boolean True; the original string 'True' only worked because it is truthy
cur_stdout_thread.setDaemon(True)
cur_stdout_thread.start()

Checking in the main thread whether the child's output is what we expect.
My example feeds every command in command_reload_list to an erl process and checks, and records, whether each command produces ok_str.

for command_reload_dict in command_reload_list:
    sub_process.stdin.write(command_reload_dict['com'] + '\r\n')
    # After each command, the last element of the list filled by the reader
    # thread is the most recent line of output.
    # A line equal to ok_str means success; otherwise keep polling for at
    # most 300 * 0.01 seconds in total.
    ok_str = 'load module %s true' % command_reload_dict['mod']
    for i in xrange(300):
        if len(stdout_list) > 0:
            # Got the expected reply: stop polling
            if stdout_list[-1] == ok_str:
                # Record that this module was hot-reloaded successfully
                command_reload_dict['res'] = 'ok'
                break
            if stdout_list[-1] == end_mark:
                # end_mark means the reader thread has finished, so something
                # went wrong: return immediately
                return_value['msg'] += 'reload mod process has been exit in [%s]' % command_reload_dict['mod']
                return return_value
        time.sleep(0.01)
    # Clear the output produced by the previous reload command
    stdout_lock.acquire()
    del stdout_list[:]
    stdout_lock.release()
# Send the quit command to the child
sub_process.stdin.write('q().\r\n')
# Wait for the temporary erl process to exit
for i in xrange(300):
    if len(stdout_list) > 0:
        if stdout_list[-1] == end_mark:
            break
    time.sleep(0.01)
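A variation on the same idea, offered only as a sketch: replacing the shared list and lock with a queue removes the need to tune the two sleep intervals against each other, because no line can be overwritten before it is consumed. This assumes Python 3 (in 2.7 the module is named Queue), and the child command is a stand-in of my own.

```python
import queue
import subprocess
import sys
import threading


def reader(stream, out_queue):
    """Blocking reads happen here, off the main thread."""
    for line in iter(stream.readline, b""):
        out_queue.put(line.rstrip())
    out_queue.put(None)  # sentinel: the child closed its stdout


# Hypothetical child that prints two lines and exits.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('one'); print('two')"],
    stdout=subprocess.PIPE,
)
lines_queue = queue.Queue()
thread = threading.Thread(target=reader, args=(proc.stdout, lines_queue))
thread.daemon = True
thread.start()

received = []
while True:
    try:
        # Wait at most 5 seconds per line instead of blocking forever.
        item = lines_queue.get(timeout=5)
    except queue.Empty:
        break  # nothing arrived in time; the caller decides what to do
    if item is None:
        break
    received.append(item)
proc.wait()
```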

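=======================================Second problem=========================================
Getting the exit status to confirm that the process terminated correctly, without blocking.
I ran into this problem before, at a time when I had not yet read the subprocess source closely:
http://blog.chinaunix.net/uid-23504396-id-4471612.html

How I write it now: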

if stop_process.poll() is None:
    try:
        if stop_process.stdout:
            stop_process.stdout.close()
        if stop_process.stderr:
            stop_process.stderr.close()
        stop_process.terminate()
        time.sleep(0.5)
        if stop_process.poll() is None:
            stop_process.kill()
            time.sleep(0.2)
            if stop_process.poll() is None:
                print 'wtf!!!!'
            else:
                stop_process.wait()
        else:
            stop_process.wait()
    except:
        print 'wtf?'

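This code always left me with a question: what if something goes wrong and the process still has not exited after poll()?
sub_process.wait() blocks, so could calling sub_process.wait() right after poll() hang as well?
What does subprocess's wait actually call?

Of course I could do what I did for the output: start a thread, and have the main thread loop a bounded number of times waiting for wait to return.
But that is too roundabout, so let's go straight to the source and understand wait properly.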

def poll(self):
    return self._internal_poll()

def _internal_poll(self, _deadstate=None, _waitpid=os.waitpid,
                   _WNOHANG=os.WNOHANG, _os_error=os.error, _ECHILD=errno.ECHILD):
    """Check if child process has terminated. Returns returncode
    attribute.
    This method is called by __del__, so it cannot reference anything
    outside of the local scope (nor can any methods it calls).
    """
    if self.returncode is None:
        try:
            pid, sts = _waitpid(self.pid, _WNOHANG)
            if pid == self.pid:
                self._handle_exitstatus(sts)
        except _os_error as e:
            if _deadstate is not None:
                self.returncode = _deadstate
            if e.errno == _ECHILD:
                # This happens if SIGCLD is set to be ignored or
                # waiting for child processes has otherwise been
                # disabled for our process. This child is dead, we
                # can not get the status.
                # http://bugs.python.org/issue15756
                self.returncode = 0
    return self.returncode

Now look at the code of wait:

def wait(self):
    """Wait for child process to terminate. Returns returncode
    attribute."""
    while self.returncode is None:
        try:
            pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
        except OSError as e:
            if e.errno != errno.ECHILD:
                raise
            # This happens if SIGCLD is set to be ignored or waiting
            # for child processes has otherwise been disabled for our
            # process. This child is dead, we can not get the status.
            pid = self.pid
            sts = 0
        # Check the pid and loop as waitpid has been known to return
        # 0 even without WNOHANG in odd situations. issue14396.
        if pid == self.pid:
            self._handle_exitstatus(sts)
    return self.returncode

Now it is clear: poll and wait both end up calling os.waitpid, but poll passes WNOHANG and therefore does not block, whereas wait passes 0 and blocks.
The Python documentation says:

os.waitpid(pid, options)
The details of this function differ on Unix and Windows.
On Unix: Wait for completion of a child process given by process id pid, and return a tuple containing its process id and exit status indication (encoded as for wait()). The semantics of the call are affected by the value of the integer options, which should be 0 for normal operation.
os.WNOHANG
The option for waitpid() to return immediately if no child process status is available immediately. The function returns (0, 0) in this case.
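The difference between the two option values can be seen directly with os.fork. This is a POSIX-only sketch; the half-second sleep and the exit code 7 are arbitrary values chosen for the demonstration.

```python
import os
import time

pid = os.fork()
if pid == 0:
    # Child: stay alive briefly, then exit with a recognizable code.
    time.sleep(0.5)
    os._exit(7)

# Parent: with WNOHANG the call returns at once while the child is still running.
first = os.waitpid(pid, os.WNOHANG)

# With options=0 the call blocks until the child has exited.
done_pid, status = os.waitpid(pid, 0)
exit_code = os.WEXITSTATUS(status)
```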

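So after sending the kill signal, once poll() returns a value other than None there is no need to call wait(): poll() has already collected the child's status through os.waitpid and stored it in returncode.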
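Putting that conclusion to work, the stop sequence can be written with poll() alone and a deadline in place of fixed sleeps. This is a sketch under my own assumptions: the helper names `poll_until` and `terminate_with_deadline`, the 0.5 second grace period and the sleeping child are all illustrative, not part of subprocess.

```python
import subprocess
import sys
import time


def poll_until(proc, timeout, interval=0.05):
    """Poll proc until it exits or timeout elapses; return returncode or None."""
    deadline = time.time() + timeout
    while True:
        returncode = proc.poll()
        if returncode is not None:
            return returncode
        if time.time() >= deadline:
            return None
        time.sleep(interval)


def terminate_with_deadline(proc, grace=0.5):
    """Ask politely with terminate(), escalate to kill(); never call wait()."""
    if proc.poll() is None:
        proc.terminate()
        if poll_until(proc, grace) is None:
            proc.kill()
            return poll_until(proc, grace)
    return proc.returncode


# Hypothetical long-running child.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
result = terminate_with_deadline(proc)
```

A return value of None from `terminate_with_deadline` means the child was still alive after both signals, which the caller can then log or retry.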
