让你的saga更具有可伸缩性(Scaling NServiceBus Sagas)
https://lostechies.com/jimmybogard/2013/03/26/scaling-nservicebus-sagas/
当我们使用NServiceBus sagas (process managers)的时候,特别是在一个存在大量消息的情况下,我们常常会碰到下面两个问题:
- 死锁
- 资源不足
这是因为saga的设计师:
- 一个saga对应一个saga实体(导致死锁)
- 一个saga处理的所有的消息全部传递到同一个终点(endpoint )(导致性能瓶颈)
每个问题对应各自的解决方式,但是和之前一样,我们也会用真实世界中的例子来观察我们的问题。这些问题都被记录在企业集成模式()中,所以你如果遇到这些问题不必要惊讶。
死锁
如果我们的saga的并行需求突然激增就很容易产生死锁。回到我们的麦当劳例子,我们发布了一个消息然后等待所有的工作台报告:
When looking at NServiceBus sagas (process managers), especially at high volume of messages, we often run into two main problems:
- Deadlocks
- Starvation
This is because of the fundamental (default) design of sagas is that:
- A single saga shares a single saga entity (causing deadlocks)
- All messages handled by a saga are delivered to the same endpoint (causing a bottleneck)
Two very different problems, with very different solutions. But, as per usual, we’ll look at how we might handle these situations in the real life to see how our virtual world should behave. Both of these problems (amongst others) are called out in the Enterprise Integration Patterns book, so it shouldn’t be a surprise if you run into these issues. Not as clear, however, is how to deal with them.
Deadlocks
Deadlocks in sagas are pretty easy to run into if we start pumping up the concurrency on our sagas. Back in our McDonald’s example, we published a message out and waited for all of the stations to report back:
![]()
In our restaurant, there’s only room for one person to examine a single tray at a time. When someone finishes a food item, they have to go look at all the trays. But what if someone is putting food on my tray? Do I push them out of the way?
Probably not, I need to wait for them to finish. It turns out that this is what happens in our sagas – we can only really allow one person at a time to affect a saga instance. The way NServiceBus can handle this by default is by adjusting the isolation level for Serializable transactions. This ensures that only one person looks at a tray at any given time.
What we don’t want to have happen is the fries person look at the tray, see a missing sandwich, and decide that the order isn’t done. The sandwich person does the same thing in reverse – look at the tray, see missing fries, and decide the order isn’t done.
It turns out that our Serializable isolation level has unintended consequences – we might lock more trays (or ALL of the trays) unintentionally. If there are more than one message that can start our saga, then effectively every employee has to look at all the trays to see if our tray is already on the counter, or we’re the first one and need to put a new one out. But if someone is also doing the same thing – multiple trays for the same order might go out!
We don’t want to block the entire counter for our order, so we can adjust our scheme slightly. If we implement a simple versioning scheme, we can simply have the employees look at a version number on the receipt, make their changes, and bump the version number with a little tally mark. If the tally mark has gone up since we last looked at the tray, we’ll just re-do our work – something has changed, so let’s reevaluate our decisions.
To get around the problem of too many trays for the same order, we can rely on a third party to ensure we don’t have too many trays. If when anyone places a tray down, the supervisor enforces this rule that only one tray per order can exist, they’ll ask us to put our tray back up and redo our work.
Two things we want to do then:
- Introduce a unique constraint on the correlation identifier to prevent duplicates
- Use optimistic concurrency to allow us to isolate just our own tray (and not affect anyone else)
This is a rather straightforward fix – and in fact, NServiceBus defaults to using optimistic concurrency in 4.0 (now in beta). This was a rather easy fix, but what about our other problem of the bottleneck?
Starvation and bottlenecks
Back in my early days of school, I worked at a fast food restaurant during breaks (surprise, surprise). I worked the graveyard shift of a 24-hour burger joint, and it was just me and one other person manning the entire place. From 10PM to 6AM, just two of us.
One thing that I found pretty early was that we had quite a few “regulars”. These were folks that every morning, came in and ordered the same thing at around the same time. Because it was quite early, they could rely on little to no contention for resources, and the queue was more or less empty.
Most of the time.
But every once in a while, we would get someone ordering food for a large group of people. We didn’t do catering, but we did sell breakfast tacos. Handling one or two at a time wasn’t a problem, but someone coming in and ordering 50 tacos – that’s a problem. Our process looked something like this:
![]()
The problem was, because there were only two of us, most messages flowed through exactly one channel. In NServiceBus Sagas, this is what happens as well – all messages for a saga are delivered to a single endpoint. We might have the situation like this:
Saga Queue
Prep Taco – #1
Prep Taco – #1
Prep Taco – #1
Prep Taco – #1
Prep Taco – #1
Prep Taco – #1
Prep Taco – #1
Prep Pancakes – #2
My cook has dozens of tacos to go through before getting to that second order, and because that person packs all the food and preps all the food, all items from the first order must be both prepped and packed before order #2 is complete:
Saga Queue
Pack Taco – #1
Pack Taco – #1
Pack Taco – #1
Pack Taco – #1
Pack Taco – #1
Pack Taco – #1
Pack Taco – #1
Pack Pancakes – #2
That’s not a huge problem, except we have both packing and prepping done by a single person. Another order comes in, the prepping for that one interferes with the others. Additionally, items are sitting in the saga queue waiting to get packed while prepping finishes.
In many NServiceBus sagas, the sagas themselves don’t perform the actual work. They’re instead the controllers/coordinators, making decisions about next steps. But all the messages are delivered to the exact same queue, effectively creating a bottleneck. I’m waiting for all prepping to be done before moving on to any packing. The time it takes to this entire set of orders is basically the time it takes to prep all items plus the time it takes to pack all items.
The problem is our saga is modeled rather strangely. In the real world, long-running business processes, split into multiple activities/steps, are performed many times by different people. Effectively, different channels/endpoints/queues. But our saga shoved everyone back into one queue!
What we did in our taco tragedy above was that I came off the line from taking orders and performed the prep stage. For N items, we took a total time of N+1 or so. We had two queues going for the one saga:
Prep Queue
Prep Taco – #1
Prep Pancakes – #2
Pack Queue
Pack Taco – #1
Pack Taco – #1
The total number of items in the Pack queue was always small – packing was easy and quick. By splitting the saga handling into separate queues, we ensured that an overload in one type of message didn’t choke out all the others. This isn’t the default way of managing long-running processes in NServiceBus though, we have to set this up ourselves.
When dealing with the Process Manager pattern, it’s important to keep in mind what this pattern brings to the table. We’re able to support complex message flow, but at the price of potential bottlenecks and deadlocks.
Next time – a better pattern for linear processes that doesn’t bring in deadlocks and bottlenecks, the routing slip pattern.
让你的saga更具有可伸缩性(Scaling NServiceBus Sagas)的更多相关文章
- 【NServiceBus】什么是Saga,Saga能做什么
前言 Saga单词翻译过来是指尤指古代挪威或冰岛讲述冒险经历和英雄业绩的长篇故事,对,这里强调长篇故事.许多系统都存在长时间运行的业务流程,NServiceBus使用基于事件驱动的 ...
- Saga的实现模式——观察者(Saga implementation patterns – Observer)
https://lostechies.com/jimmybogard/2013/03/11/saga-implementation-patterns-observer/ 侵删. NServiceBus ...
- NServiceBus+Saga开发分布式应用
前言 当你在处理异步消息时,每个单独的消息处理程序都是一个单独的handler,每个handler之间互不影响.这时如果一个消息依赖另一个消息的状态呢? 这时业务逻辑怎么处理? ...
- Python cPickle模块
新博客地址:http://gorthon.sinaapp.com/ 持久性就是指保持对象,甚至在多次执行同一程序之间也保持对象.通过本文,您会对 Python对象的各种持久性机制(从关系数据库到 Py ...
- ]Java 5|6 并发包介绍
ava.util.concurrent 包含许多线程安全.测试良好.高性能的并发构建块.不客气地说,创建 java.util.concurrent 的目的就是要实现 Collection 框架对数据结 ...
- Java多线程与并发面试题
1,什么是线程? 线程是操作系统能够进行运算调度的最小单位,它被包含在进程之中,是进程中的实际运作单位.程序员可以通过它进行多处理器编程,你可以使用多线程对运算密集型任务提速.比如,如果一个线程完成一 ...
- Java面试题之基础篇概览
Java面试题之基础篇概览 1.一个“.java”源文件中是否可以包含多个类(不是内部类)?有什么限制? 可以有多个类,但只能有一个public的类,且public的类名必须与文件名相一致. 2.Ja ...
- [ Java面试题 ]并发篇
1.Java内存模型是什么? Java内存模型规定和指引Java程序在不同的内存架构.CPU和操作系统间有确定性地行为.它在多线程的情况下尤其重要.Java内存模型对一个线程所做的变动能被其它线程可见 ...
- Java多线程面试题整理
部分一:多线程部分: 1) 什么是线程? 线程是操作系统能够进行运算调度的最小单位,它被包含在进程之中,是进程中的实际运作单位.程序员可以通过它进行多处理器编程,你可以使用多线程对运算密集型任务提速. ...
随机推荐
- Java并发(10)- 简单聊聊JDK中的七大阻塞队列
引言 JDK中除了上文提到的各种并发容器,还提供了丰富的阻塞队列.阻塞队列统一实现了BlockingQueue接口,BlockingQueue接口在java.util包Queue接口的基础上提供了pu ...
- [bzoj4602][Sdoi2016]齿轮——dfs
题目 现有一个传动系统,包含了N个组合齿轮和M个链条.每一个链条连接了两个组合齿轮u和v,并提供了一个传动比x : y.即如果只考虑这两个组合齿轮,编号为u的齿轮转动x圈,编号为v的齿轮会转动y圈.传 ...
- springmvc对于前台date类型注意点
springmvc,可以自动将数据注入到: “name”值相同,便注入,比如String Integer 还有我们自定义的bean,比如User. 但是date类型的数据,如果前台传的是用" ...
- 在表达式和脚本中将bean实例暴露出来
默认情况下,在activiti.cfg.xml中的所有bean和Spring配置文件中的所有bean都可以用于表达式和脚本.如果要限制配置文件中的bean的可见性,可以在流程引擎配置文件中配置一个名为 ...
- MySQL阅读笔记
左连接:包含所有的左边表中的记录甚至是右边表中没有和它匹配的记录.右连接:包含所有的右边表中的记录甚至是左边表中没有和它匹配的记录. select ename,deptname from emp le ...
- hit2739
好题,回路的问题一般都要转化为度数来做若原图的基图不连通,或者存在某个点的入度或出度为0则无解.统计所有点的入度出度之差di对于di>0的点,加边(s,i,di,0):对于di<0的点,加 ...
- 设置iSCSI的发起程序(客户端)(三)
iSCSI 发起程序是一种用于同 iSCSI 目标器认证并访问服务器上共享的LUN的客户端.我们可以在本地挂载的硬盘上部署任何操作系统,只需要安装一个包来与目标器验证. 初始器客户端设置 功能 可以处 ...
- Web.Config文件配置之限制上传文件大小和时间
在邮件发送系统或者其他一些传送文件的网站中,用户传送文件的大小是有限制的,因为这样不但可以节省服务器的空间,还可以提高传送文件的速度.下面介绍如何在Web.Config文件中配置限制上传文件大小与时间 ...
- scrapy xpath 从response中获取li,然后再获取li中img的src
lis = response.xpath("//ul/li") for li in lis: src = li.xpath("img/@src") # 如果xp ...
- Codeforces 626 B. Cards (8VC Venture Cup 2016-Elimination Round)
B. Cards time limit per test 2 seconds memory limit per test 256 megabytes input standard input ...