error msg:

System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException: Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token) at System.Net.Http.HttpConnection.ReadAsync(Memory`1 destination) at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) at System.Net.Security.SslStreamInternal.<FillBufferAsync>g__InternalFillBufferAsync|38_0[TReadAdapter](TReadAdapter adap, ValueTask`1 task, Int32 min, Int32 initial) at System.Net.Security.SslStreamInternal.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer) at System.Net.Http.HttpConnection.FillAsync() at System.Net.Http.HttpConnection.ReadNextResponseHeaderLineAsync(Boolean foldedHeadersAllowed) at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken) --- End of inner exception stack trace ---

相关bug

1.

Most / all of these requests were using POST operations (or other methods that have method bodies).
We suspect, but aren't sure, that this might be related to connection pooling in some manner.

2.

That error message is for SocketError.OperationAborted, which likely means the socket was closed while the send or receive was in progress.

3.

Is there anything here that we can look at?
To reproduce this we need to schedule > 5 runs of 4,000+ tests that takes an hour plus.
In such scenario, we typically see 1 - 3 such failures using SocketsHttpHandler.

Note that we never use SocketsHttpHandler directly, so I don't think that we improperly using it.
We got through HttpClient, a shared instance that is re-used among threads, as recommended.

The code is actually doing something like:

try
{
return await httpHandler.SendAsync(...);
}
catch(HttpRequestException e)
{
}

So we can catch the exception and then check state / logging / something.

Anything you can recommend that we'll look at to get to the root cause?

4.

Is it possible you're hitting a timeout? It shouldn't manifest this way, but it would give us a place to start looking. For example, if you change your HttpClient's Timeout to Timeout.InfiniteTimeSpan, does this still happen?

Or if it's potentially related to connection pooling, what happens if you change the SocketHttpHandler's PooledConnectionLifetime and PooledConnectionIdleTimeout to infinite... does it still repro?

5.

And... I have a repro. On my machine, this reproduce the scenario instantly.
You were correct about the timeout causing this.

using System;
using System.Net.Http;
using System.Threading.Tasks; namespace ConsoleApp15
{
class Program
{
static async Task Main(string[] args)
{ // http://slowwly.robertomurray.co.uk/delay/3000/url/http://www.google.co.uk var client = new HttpClient(new SocketsHttpHandler
{
})
{
Timeout = TimeSpan.FromMilliseconds(250)
}; var tasks = new[]
{
Run(client),
Run(client),
Run(client),
Run(client),
Run(client),
Run(client),
Run(client),
Run(client),
Run(client),
}; await Task.WhenAll(tasks); Console.WriteLine("Done"); } public static async Task Run(HttpClient client)
{
for (int i = 0; i < 25; i++)
{
try
{
await client.GetStringAsync("http://slowwly.robertomurray.co.uk/delay/3000/url/http://www.google.co.uk");
Console.WriteLine("Should timeout");
}
catch (Exception e)
{
}
await client.GetStringAsync("http://slowwly.robertomurray.co.uk/delay/100/url/http://www.google.co.uk");
} }
}
}

  

6.

That's hitting a timeout and throwing an OperationCanceledException... that's expected.

What's not expected is getting an IOException (not wrapped in an OperationCanceledException) out of SendAsync and when nothing on the request is being canceled / timing out.

7.

I captured System.Net logs and Wireshark traces of another failure. Here's where I see the failure in Wireshark:

1990370 214.439692    127.0.0.1 → 127.0.0.1    TCP 84 51686 → 51687 [FIN, ACK] Seq=1 Ack=10334254 Win=525568 Len=0
1990371 214.439736 127.0.0.1 → 127.0.0.1 TCP 84 51687 → 51686 [ACK] Seq=10334254 Ack=2 Win=525568 Len=0
1990408 274.618073 127.0.0.1 → 127.0.0.1 TCP 84 51687 → 51686 [RST, ACK] Seq=10334254 Ack=2 Win=0 Len=0

The RST looks unusual to me, especially since it's occurring after the connection should have been closed. The timestamp on the RST makes it look like is was sent 60 seconds after the connection was closed, but I'm still investigating whether or not that's accurate or just an issue with the logs I collected.

I'm going to try capture another network trace to see if the issue occurs after a FIN again, since that could be valuable info.

8.

We are seeing the exact same thing when running our applications on Linux. A temporary "workaround" for us has been to set the timeout to 100 seconds(it's not an acceptable work around).

When we have the timeout to something more reasonable, like 10 seconds or something, we get the error every 500 request or something like that.

The httpendpoint that we are calling are under our control and when the error happens, we can't even see that the request has reached the service. So it seems like the request doesn't leave the box.
We are using dotnet core 2.1.

I've started a stopwatch for each requests and then Im catching TaskCanceledException and logs the following(this is when we have the timeout set to 15 seconds):

Task was cancelled after 15000ms System.Threading.Tasks.TaskCanceledException:
The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException: Operation canceled
--- End of inner exception stack trace ---
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error)
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
at System.Net.Security.SslStreamInternal.g__InternalFillBufferAsync|38_0[TReadAdapter](TReadAdapter adap, ValueTask`1 task, Int32 min, Int32 initial)

9.

Thanks for the additional info. It's possible the behavior you're seeing is related, but it sounds more like an actual timeout. In this case there are two differences from what I've observed. (1) in all other cases I've seen the connection is actually sending the data, as observed via wireshark and (2) in other cases the failure occurs well before a timeout should have occurred.

There's an open issue right now (#20296) tracking the fact that we throw TaskCanceledException on an actual request timeout. I think that's probably the actual root cause here.

If you think that your issue isn't actually being caused by timeouts though, feel free to open a new issue with all of the info you shared here.

10.

Interesting development. We have encountered this issue after adding to our test suite a test that spawns a process with powershell script (very simple one - writing 1 line to a file). ~30 seconds after this tests there is a failure in 1-3 random subsequent tests.

Because of that we thought that this might be related to processes inheriting handles from the parent one and it appears that we are right.

We have made a PR that unskipps those tests and switches the implementation of Process to the one that sets 'inherit handles' to 'false'. After doing so everything seems to be fine, it looks like the issue is with handle inheritance and the behavior of those handles after that child process ends.

The PR in question is here: ravendb/ravendb#8028

Our idea is that, at least as a workaround, it would be good to be able to set the

retVal = Interop.Kernel32.CreateProcess(
null, // we don't need this since all the info is in commandLine
commandLine, // pointer to the command line string
ref unused_SecAttrs, // address to process security attributes, we don't need to inherit the handle
ref unused_SecAttrs, // address to thread security attributes.
true, // handle inheritance flag
creationFlags, // creation flags
(IntPtr)environmentBlockPtr, // pointer to new environment block
workingDirectory, // pointer to current directory name
ref startupInfo, // pointer to STARTUPINFO
ref processInfo // pointer to PROCESS_INFORMATION
);

  

11.

That's very interesting -- thanks for sharing that info. I briefly looked into the possibility that this was related to socket handles a few weeks ago after observing similar behavior with the h2spec process spawned by the Kestrel tests. It's probably worth looking into in more detail if you're seeing the same behavior.

12.

Note that as of 3.0, socket handles are no longer inherited: #32903

13.

that would probably solve the issue, but:

  1. Can this be applied to 2.1?

or

  1. Can you add private bool _inheritHandles = true field to Process.cs that we would be able to set via reflection to false at least?

14.

If that actually "fixes" it, I have trouble believing it's anything other than masking a symptom, and we should actually find the root cause and fix that.

Unless we believe the root cause really is that the socket handle is getting inherited and somehow that child process then closing is causing problems for the socket? In which case we should be able to repro that pretty easily, and if that's actually the problem, then I agree it would be a candidate for back porting.

15.

Fixing the root cause would be best, but till then could you consider applying point 2. so we can workaround this?

16.

Applying it where... as a servicing fix to 2.1? That's unlikely.

17.

Adding that private field in Process.cs (probably Process.Windows.cs) and using it instead of

would be a problem?

18.

The bar for servicing fixes is very high. You're suggesting adding unexposed, unvalidated, functionality-accessible-only-via-reflection-and-that-we-encourage-developers-to-use-via-reflection as a servicing fix. Hence "unlikely". If you need it immediately, you can of course make the change in your own local fork.

Let's get to the bottom of the problem and then figure out the right fix.

19.

I agree with @stephentoub.
I feel we are getting closer. @rmkerr invested already weeks of his time trying to get to the bottom of this (incl. extensive discussion with winsock team in Windows). Let's see if this new information can help root cause it and address the root cause.
If it doesn't help too much, we can think about either porting the change #32903 to 2.1, or doing something to enable people work around the suspectible root cause.

20.

I've been trying to build a repro that launches a process and then has several sockets sending and receiving data, but I have not been able to force the issue to occur. @ppekrol, would it be possible to set up a project with just the affected test and the code you're using to spawn the process?

If not I'll put some more time into the repro when I'm back in the office next week.

21.

Hi @rmkerr
Apologies for the delay, I really doubt that this would be doable. We already spend significant amount of hours to isolate the issue, create a smaller subset of tests, but for an unknown reason, it only appears for us on a larger set of tests e.g. our full suite. I do not have an idea what else could be done to isolate this. We tried moving that from SlowTests project (takes around 20 min to complete) to FastsTests (around 90 seconds), but it does not reproduce then.

22.

No problem @ppekrol. The info you shared is still valuable. I'm back in the office today, and will spend some time trying to cut down the size of the repro.

23.

I'm able to reproduce the behavior described by @ppekrol with a modified version of the kestrel test suite.

To start, I disabled all of the tests except for the H2Spec tests (which are launches as a separate process) andConnectionNotClosedWhenClientSatisfiesMinimumDataRateGivenLargeResponseHeaders, which was the test that I observed failing the most frequently. In order to reproduce the issue more consistently, I set up the ConnectionNotClosed... test to run five times.

I was able to further minimize the repro by replacing the H2Spec tests with the following code:

[Fact]
public void LaunchProcess()
{
for (int i = 0; i < 100; ++i)
{
//Thread.Sleep(100);
Process.Start("Powershell.exe", @"""C:\Tools\HelloWorld.ps1""").WaitForExit();
}
}

The targeted script just prints "hello world" to the console.

When running the test suites after this modification, I was able to reproduce the failure on 6 out of 10 runs. After replacing Process.Start with Thread.Sleep (as commented out above), the tests passed on 10 out of 10 runs.

Obviously this is pretty rough testing, and there isn't a lot of data. But I do think the behavior is interesting.

HttpClient exception:ExceptionType:System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation ca的更多相关文章

  1. .Net多线程编程—System.Threading.Tasks.Parallel

    System.Threading.Tasks.Parallel类提供了Parallel.Invoke,Parallel.For,Parallel.ForEach这三个静态方法. 1 Parallel. ...

  2. 转载 Net多线程编程—System.Threading.Tasks.Parallel

    .Net多线程编程—System.Threading.Tasks.Parallel   System.Threading.Tasks.Parallel类提供了Parallel.Invoke,Paral ...

  3. System.Threading.Tasks.Task 引起的 IIS 应用池崩溃

    接口服务运行一段时间后,IIS应用池就会突然挂掉,事件查看日志,会有事件日志Event ID为5011的错误 为应用程序池“PokeIn”提供服务的进程在与 Windows Process Activ ...

  4. System.Threading.Tasks.Task引起的IIS应用程序池崩溃

    问题现象 IIS应用程序池崩溃(Crash)的特征如下: 1. 从客户端看,浏览器一直处于连接状态,Web服务器无响应. 2. 从服务器端看(Windows Server 2008 + IIS 7.0 ...

  5. 一、并行编程 - 数据并行 System.Threading.Tasks.Parallel 类

    一.并行概念 1.并行编程 在.NET 4中的并行编程是依赖Task Parallel Library(后面简称为TPL) 实现的.在TPL中,最基本的执行单元是task(中文可以理解为"任 ...

  6. System.Threading.Tasks.Task 任务引起的IIS应用程序池崩溃

    转载:http://www.cnblogs.com/aaa6818162/p/4421305.html 问题现象 IIS应用程序池崩溃(Crash)的特征如下: 1. 从客户端看,浏览器一直处于连接状 ...

  7. using System.Threading.Tasks;

    using System.Threading.Tasks; .Net并行库介绍——Task1

  8. vs2013c#测试using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Threading.Tasks; namespace ConsoleApplication1_CXY { class Program { stati

    首先安装Unit Test Generator.方法为:工具->扩展和更新->联机->搜索“图标为装有蓝色液体的小试管.Unit Test Generator”, 编写代码,生成一个 ...

  9. System.Threading.Tasks并发和异步代码使用

    main.cs System.Threading.Tasks.Parallel.For(0, 10, i =>            {                TestLock test ...

随机推荐

  1. Python title()、upper()、lower()方法--string

    描述 title()方法: 将字符串中的单词“标题化”,即首字母大写,其余字母转化为小写. upper()方法:将字符串中的小写字母转化为大写字母. lower()方法:将字符串中的大写字母转化为小写 ...

  2. Mac ssh启动和停止

    原文地址:http://blog.csdn.net/cwj649956781/article/details/37913637 mac本身安装了ssh服务,默认情况下不会开机自启 1.启动sshd服务 ...

  3. 红黑树与AVL特性

    红黑树:比较平衡的二叉树,没有一条路径会比其他路径长2倍,因而是近似平衡的.所以相对于严格要求平衡的AVL树来说,它的旋转保持平衡次数较少.插入删除次数多的情况下我们就用红黑树来取代AVL. 红黑树规 ...

  4. 2019春第八周作业Compile Summarize

    这个作业属于那个课程 C语言程序设计II 这个作业要求在哪里 在这里 我在这个课程的目标是 能更加进一步的够熟练掌握指针的用法 这个作业在那个具体方面帮助我实现目标 指针对于高阶题目的做法 参考文献与 ...

  5. C# 字典Dictionary

    Dictionary<TKey, TValue> 泛型类提供了从一组键到一组值的映射.通过键来检索值的速度是非常快的,接近于 O(1),这是因为 Dictionary<TKey, T ...

  6. Golang 之协程详解

    转自:https://www.cnblogs.com/liang1101/p/7285955.html 一.Golang 线程和协程的区别 备注:需要区分进程.线程(内核级线程).协程(用户级线程)三 ...

  7. BIML 101 - ETL数据清洗 系列 - BIML 快速入门教程 - 序

    BIML 101 - BIML 快速入门教程 做大数据的项目,最花时间的就是数据清洗. 没有一个相对可靠的数据,数据分析就是无木之舟,无水之源. 如果你已经进了ETL这个坑,而且预算有限,并且有大量的 ...

  8. Failed to set MokListRT: Invalid Parameter Something as gone seriously wrong: import_mok_state() failed: Invalid Parameter

    今天yum update升级centos7,重启后发现开不了机,报错如下: Failed to set MokListRT: Invalid ParameterSomething as gone se ...

  9. Git使用之pull request

    一直对git的使用都不熟,由于工作需要经常需要在github上pull request,第一次还是有些麻烦的,写个笔记记录下 1. fork源项目到自己的github仓库中 fork之后自己也会多出一 ...

  10. java中进程与线程的区别

    进程是一个正在运行的应用程序.一个进程包含一个或多个线程.它可以是一段完整的代码或部分程序的动态执行.系统资源分配与调度的基本单位.而线程是CPU调度与运行的基本单位,它是一组指令的集合或是程序的特殊 ...