1、问题的发现

  今天,一个在windows上运行良好的python脚本放到linux下报错,提示错误 BrokenPipeError: [Errno 32]Broken pipe。经调查是subprocess.run方法的timeout参数在linux上的表现和windows上不一致导致的。

try:
ret = subprocess.run(cmd, shell=True, check=True, timeout=5,
stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
except Exception as e:
logging.debug(f"Runner FAIL")

2、问题描述

 为了描述这个问题,做了下面这个例子。subprocess.run调用了1个需要10s才能执行完的程序,但是却设定了1s的超时时间。理论上这段代码应该在1s后因超时退出,但事实并不如此。

import subprocess
import time t = time.perf_counter()
args = 'python -c "import time; time.sleep(10)"'
try:
p = subprocess.run(args, shell=True, check=True,timeout=1,stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
except Exception as e:
print(f"except is {e}")
print(f'coast:{time.perf_counter() - t:.8f}s')

 在windows上测试:

PS C:\Users\peng\Desktop> Get-ComputerInfo | select WindowsProductName, WindowsVersion, OsHardwareAbstractionLayer

WindowsProductName WindowsVersion OsHardwareAbstractionLayer
------------------ -------------- --------------------------
Windows 10 Pro 2009 10.0.19041.2251
PS C:\Users\peng\Desktop> python .\test_subprocess.py
except is Command 'python -c "import time; time.sleep(10)"' timed out after 1 seconds
coast:10.03642740s
PS C:\Users\peng\Desktop>

 在linux上测试:

21:51:31 wp@PowerEdge:~/bak$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
21:56:45 wp@PowerEdge:~/bak$ python test_subprocess.py
except is Command 'python -c "import time; time.sleep(10)"' timed out after 1 seconds
coast:1.00303393s
21:57:02 wp@PowerEdge:~/bak$

 可见,subprocess.run的timeout参数在windows下并没有生效。subprocess.run执行指定的命令,等待命令执行完成后返回一个包含执行结果的CompletedProcess类的实例。这个函数的原型为:

subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None, capture_output=False,
shell=False, cwd=None, timeout=None, check=False, encoding=None, errors=None, text=None, env=None, universal_newlines=None)
  • args:表示要执行的命令。必须是一个字符串,字符串参数列表。
  • stdin、stdout 和 stderr:子进程的标准输入、输出和错误。其值可以是 subprocess.PIPE、subprocess.DEVNULL、一个已经存在的文件描述符、已经打开的文件对象或者 None。subprocess.PIPE 表示为子进程创建新的管道。subprocess.DEVNULL 表示使用 os.devnull。默认使用的是 None,表示什么都不做。另外,stderr 可以合并到 stdout 里一起输出。
  • timeout:设置命令超时时间。如果命令执行时间超时,子进程将被杀死,并弹出 TimeoutExpired 异常。
  • check:如果该参数设置为 True,并且进程退出状态码不是 0,则弹 出 CalledProcessError 异常。
  • encoding: 如果指定了该参数,则 stdin、stdout 和 stderr 可以接收字符串数据,并以该编码方式编码。否则只接收 bytes 类型的数据。
  • shell:如果该参数为 True,将通过操作系统的 shell 执行指定的命令。

3、问题分析

subprocess.run 会等待进程终止并处理TimeoutExpired异常。在POSIX上,异常对象包含读取部分的stdoutstderr字节。上面测试在windows上失效的主要问题是使用了shell模式,启动了管道,管道句柄可能由一个或多个后代进程继承(如通过shell=True),所以当超时发生时,即使关闭了shell程序,而由shell启动的其他程序,本例中是python程序依然在运行中,所以阻止了subprocess.run退出直至使用管道的所有进程退出。如果改为shell=False,则在windows上也出现1s的结果:

python  .\test_subprocess.py
except is Command 'python -c "import time; time.sleep(10)"' timed out after 1 seconds
coast:1.00460970s

  可以说这是windows实现上的一个缺陷,具体的可见:

https://github.com/python/cpython/issues/87512

[subprocess] run() sometimes ignores timeout in Windows #87512


subprocess.run() handles TimeoutExpired by terminating the process and waiting on it. On POSIX, the exception object contains the partially read stdout and stderr bytes. For example:

cmd = 'echo spam; echo eggs >&2; sleep 2'
try: p = subprocess.run(cmd, shell=True, capture_output=True,
text=True, timeout=1)
except subprocess.TimeoutExpired as e: ex = e >>> ex.stdout, ex.stderr
(b'spam\n', b'eggs\n')

 On Windows, subprocess.run() has to finish reading output with a second communicate() call, after which it manually sets the exception's stdout and stderr attributes.

The poses the problem that the second communicate() call may block indefinitely, even though the child process has terminated.

 The primary issue is that the pipe handles may be inherited by one or more descendant processes (e.g. via shell=True), which are all regarded as potential writers that keep the pipe from closing. Reading from an open pipe that's empty will block until data becomes available. This is generally desirable for efficiency, compared to polling in a loop. But in this case, the downside is that run() in Windows will effectively ignore the given timeout.

 Another problem is that _communicate() writes the input to stdin on the calling thread with a single write() call. If the input exceeds the pipe capacity (4 KiB by default -- but a pipesize 'suggested' size could be supported), the write will block until the child process reads the excess data. This could block indefinitely, which will effectively ignore a given timeout. The POSIX implementation, in contrast, correctly handles a timeout in this case.

 Also, Popen.exit() closes the stdout, stderr, and stdin files without regard to the _communicate() worker threads. This may seem innocuous, but if a worker thread is blocked on synchronous I/O with one of these files, WinAPI CloseHandle() will also block if it's closing the last handle for the file in the current process. (In this case, the kernel I/O manager has a close procedure that waits to acquire the file for the current thread before performing various housekeeping operations, primarily in the filesystem, such as clearing byte-range locks set by the current process.) A blocked close() is easy to demonstrate. For example:

args = 'python -c "import time; time.sleep(99)"'
p = subprocess.Popen(args, shell=True, stdout=subprocess.PIPE)
try: p.communicate(timeout=1)
except: pass p.kill() # terminates the shell process -- not python.exe
with p: pass # stdout.close() blocks until python.exe exits

The Windows implementation of Popen._communicate() could be redesigned as follows:

  • read in chunks, with a size from 1 byte up to the maximum available,

    as determined by _winapi.PeekNamedPipe()
  • write to the child's stdin on a separate thread
  • after communicate() has started, ensure that synchronous I/O in worker

    threads has been canceled via CancelSynchronousIo() before closing

    the pipes.

    The _winapi module would need to wrap OpenThread() and CancelSynchronousIo(), plus define the TERMINATE_THREAD (0x0001) access right.

With the proposed changes, subprocess.run() would no longer special case TimeoutExpired on Windows.

BrokenPipeError错误和subprocess.run()超时参数在Windows上无效的更多相关文章

  1. MySQL 各种超时参数的含义

    MySQL 各种超时参数的含义 今日在查看锁超时的设置时,看到show variables like '%timeout%';语句输出结果中的十几种超时参数时突然想整理一下,不知道大家有没有想过,这么 ...

  2. php-fpm nginx 超时参数设置

    php-fpm:request_terminate_timeout = 30php.ini:max_execution_time = 30 request_terminate_timeout 适用于, ...

  3. Flink run提交参数

    折腾了好几天,终于搞定了Flink run提交参数,记录一下. 背景: 之前一直报错,akka,AskTimeoutException,尝试添加akka.ask.timeout=120000s, 依然 ...

  4. Delphi HTTPRIO控件怎么设置超时参数

    HTTPRIO控件怎么设置超时参数 //HTTPRIO1: THTTPRIO  设置5分钟超时 HTTPRIO1.HTTPWebNode.ConnectTimeout := 5000; Connect ...

  5. subprocess.run()用法python3.7

    def run(*popenargs, input=None, capture_output=False, timeout=None, check=False, **kwargs): "&q ...

  6. 如何查看docker run启动参数命令

    通过runlike去查看一个容器的docker run启动参数 安装pip yum install -y python-pip 安装runlike pip install runlike 查看dock ...

  7. Docker run 命令参数及使用

    Docker run 命令参数及使用 Docker run :创建一个新的容器并运行一个命令 语法 docker run [OPTIONS] IMAGE [COMMAND] [ARG...] OPTI ...

  8. Oracle 统计量NO_INVALIDATE参数配置(上)

    转载:http://blog.itpub.net/17203031/viewspace-1067312/ Oracle统计量对于CBO执行是至关重要的.RBO是建立在数据结构的基础上的,DDL结构.约 ...

  9. Http multipart/form-data多参数Post方式上传数据

    最近,工作中遇到需要使用java实现http发送get.post请求,简单的之前经常用到,但是这次遇到了上传文件的情况,之前也没深入了解过上传文件的实现,这次才知道通过post接口也可以,是否还有其他 ...

  10. HTTP 错误 404.2 - Not Found 由于 Web 服务器上的“ISAPI 和 CGI 限制”列表设置,无法提供您请求的页面

    详细错误:HTTP 错误 404.2 - Not Found. 由于 Web 服务器上的“ISAPI 和 CGI 限制”列表设置,无法提供您请求的页面. 出现环境:win7 + IIS7.0 解决办法 ...

随机推荐

  1. Taurus.MVC 微服务框架 入门开发教程:项目集成:6、微服务间的调用方式:Rpc.StartTaskAsync。

    系统目录: 本系列分为项目集成.项目部署.架构演进三个方向,后续会根据情况调整文章目录. 开源地址:https://github.com/cyq1162/Taurus.MVC 本系列第一篇:Tauru ...

  2. Python数据科学手册-Pandas数据处理之简介

    Pandas是在Numpy基础上建立的新程序库,提供了一种高效的DataFrame数据结构 本质是带行标签 和 列标签.支持相同类型数据和缺失值的 多维数组 增强版的Numpy结构化数组 行和列不在只 ...

  3. 记录一下对jdk8后的接口的一些理解

    对于jdk8后的接口,接口中加入了可以定义默认方法和静态方法. 为什么要这样设计呢? 是为了在给接口扩展方法的时候,不会影响已经实现了该接口的类 加入默认方法可以解决:在添加方法的同时,不影响现有的实 ...

  4. 使用 Loki 进行日志报警(二)

    转载自:https://mp.weixin.qq.com/s?__biz=MzU4MjQ0MTU4Ng==&mid=2247492374&idx=1&sn=d09f6db623 ...

  5. 部署一个生产级别的 Kubernetes 应用(以Wordpress为例)

    文章转载自:https://mp.weixin.qq.com/s?__biz=MzU4MjQ0MTU4Ng==&mid=2247487811&idx=1&sn=67b39b73 ...

  6. logstash 读取MySQL数据到elasticsearch 相差8小时解决办法

    logstash和elasticsearch是按照UTC时间的,kibana却是按照正常你所在的时区显示的,是因为kibana中可以配置时区信息. 具体看这个: logstash 的配置文件添加 fi ...

  7. PAT (Basic Level) Practice 1023 组个最小数 分数 20

    给定数字 0-9 各若干个.你可以以任意顺序排列这些数字,但必须全部使用.目标是使得最后得到的数尽可能小(注意 0 不能做首位).例如:给定两个 0,两个 1,三个 5,一个 8,我们得到的最小的数就 ...

  8. Debian+Wine For Termux,兼容Windows on arm的安卓手机子系统!

    如果已经安装了termux,先删掉. 安装方法 下载安装我提供的termux 链接: https://pan.baidu.com/s/13hbp6igps18V2RJcOxgQIg 提取码: 1irn ...

  9. PAT甲级英语单词整理

    proper 正确 合适 vertex(vertices)顶点 respectively 个别 分别 indices 指标 索引 shipping 运输 incompatible 不相容 oxidiz ...

  10. GitLab CI/CD 自动化部署入门

    前言:因为找了B站内推,测试开发,正好知道内部使用GitLab做自动化测试,所以简单学了一下,有错误的地方请指正. 入门 初始化 cp: 无法获取'/root/node-v12.9.0-linux-x ...