RabbitMQ and batch processing (batch submission)
http://rabbitmq.1065348.n5.nabble.com/RabbitMQ-and-batch-processing-td35634.html
I mentioned this on Twitter and a couple of people have requested that I bring this up on the mailing list.
It seems to be a given that RabbitMQ was not designed for the batch processing use case (i.e. using RabbitMQ as a buffer between large serial steps). We have a system in place that attempts to do just that, however.
I have been working with the developers of the software involved to help them redesign around a more suitable use of RabbitMQ, or to move to a different bus altogether (a database, or something like Kafka). Some of them have been able to simply operate in smaller batch sizes, which keeps their queues relatively small.
However, I cannot stem the tide of improper RabbitMQ use.
When things go poorly, millions of messages end up in the queues.
In 3.1.x we saw this regularly cause our clusters to partition.
In 3.1.x and 3.2.x, deleting large queues (5+ million messages enqueued) caused the cluster to become unresponsive, run out of memory, and then crash.
During the 3.1 -> 3.2 upgrade, we had to completely rebuild our clusters. When 3.2 came up, it soon crashed.
In the most recent upgrade, we saw a 3.2.3 cluster in our dev environment crash. I performed an opportunistic upgrade to 3.3.1, because hey... downtime already, so let's see if 3.3.1 addresses some of the issues we've been seeing.
After the upgrade, 3.3.1 would not start up at all. I removed /var/lib/rabbitmq/mnesia on all of the nodes and brought RabbitMQ back up.
3.3.1 has been up and running all right so far, but we haven't done another end-to-end test in our development environment in a while. On average, one of these tests leaves at least a million messages in the queue over the course of the run.
So, I guess my question is:
If I know that I have people using RabbitMQ like this, and there is nothing I can do to change that fact... what do I do?
_______________________________________________
I'll respond inline with our experience:
On Sun, May 18, 2014 at 2:55 PM, Greg Poirier <[hidden email]> wrote:
It is not a 'given' as far as we are concerned. We have some processes that result in a million or more messages being queued within a minute or so. These messages are processed over the ensuing several minutes (for 'dismissals' of news items from individual devices) to several hours (for lower-priority individualized 'offers'). This is the new 'batch'.
We put large message bodies in S3 and pass them by reference. We never use RabbitMQ persistence and compensate for that with replication. For 'real' persistence we use Cassandra. Most importantly, none of our internal users know this, as we provide them with an abstracted interface.
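Passing large bodies by reference could be sketched roughly as follows. This is only an illustration under stated assumptions: `BlobStore` is a hypothetical in-memory stand-in for S3, and the `inline_limit` threshold and the JSON envelope are invented for the example, not something described above.

```python
import base64
import json
import uuid


class BlobStore:
    """Hypothetical in-memory stand-in for an object store such as S3."""

    def __init__(self):
        self._objects = {}

    def put(self, body: bytes) -> str:
        key = uuid.uuid4().hex
        self._objects[key] = body
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


def to_reference_message(store: BlobStore, body: bytes, inline_limit: int = 1024) -> bytes:
    """Build the small payload that would actually be published to the broker."""
    if len(body) <= inline_limit:
        # Small bodies travel inline (base64 so arbitrary bytes survive JSON).
        envelope = {"inline": base64.b64encode(body).decode("ascii")}
    else:
        # Large bodies go to the store; only the key and size are published.
        envelope = {"ref": store.put(body), "size": len(body)}
    return json.dumps(envelope).encode("utf-8")


def resolve_message(store: BlobStore, payload: bytes) -> bytes:
    """Consumer side: recover the original body from the published payload."""
    envelope = json.loads(payload)
    if "ref" in envelope:
        return store.get(envelope["ref"])
    return base64.b64decode(envelope["inline"])
```

The point of the design is that the broker only ever holds small messages, so queue memory stays roughly proportional to message count rather than payload size.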
We try to make it easier to use us than not. We work hard to be the most reliable, fastest, most scalable, most flexible and cheapest component of our customers' technology mix.
We target zero length queues. If they grow unexpectedly we: 1) autoscale, 2) shift load, 3) start new regions - usually all those. Then we diagnose.
We have never had a partition in production because we always overprovision RabbitMQ so it can maintain cluster communications. We basically avoid disk IO due to the risk of IO wait interfering with the cluster heartbeat.
When we have tested situations like this, we found it best to just wipe out the cluster and restart. Before doing this, we shift the load to other regions operating in parallel.
We have not had that problem.
We are not yet in production with 3.3.1, but 3.2.4 is running solidly in stage and we will upgrade stage to 3.3.1 this coming week.
A million is not that many - depending on size of course. As I said - our target is 0, but really the question is: what's your rate of change? I try to have enough 'headroom' to easily handle the surges - volumes can vary 20 to 1 depending on the news of the moment etc. If a queue builds and stays high we add resources until it goes down and then investigate.
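The "rate of change" idea can be made concrete with a small sketch. Everything here is an assumption for illustration: the `Sample` type, the `high_water` threshold, and the decision labels are hypothetical, and real depth samples would come from whatever monitoring you already have.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float    # time of the observation, in seconds
    depth: int  # messages waiting in the queue at that time


def growth_rate(samples):
    """Average change in queue depth per second between first and last sample."""
    first, last = samples[0], samples[-1]
    elapsed = last.t - first.t
    if elapsed <= 0:
        return 0.0
    return (last.depth - first.depth) / elapsed


def decide(samples, high_water=100_000):
    """Return a coarse action: 'ok', 'draining', or 'add_capacity'."""
    depth = samples[-1].depth
    if depth == 0:
        return "ok"            # the target state: an empty queue
    if growth_rate(samples) > 0:
        return "add_capacity"  # the queue is still building
    if depth >= high_water:
        return "add_capacity"  # not growing, but staying high
    return "draining"          # shrinking on its own; investigate later
```

A queue that is large but shrinking quickly is treated differently from one that is smaller but still growing, which is the distinction the paragraph above is drawing.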
You need enough resource. And it is good to be able to autoscale.
A specific suggestion I would make for any internal service provider is to use an AMQP proxy. We locate proxy clusters that we control in our internal customers' computing environments. They publish to and subscribe from these proxies. We control the shoveling/federation of the proxies to/from our core pipelines in regions, redirecting as needed. The proxies are an additional buffer and also allow us to 'launder' incoming messages, e.g. by forcing persistence off.
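As a rough sketch of what "laundering" might look like at the proxy: in AMQP 0-9-1 the `delivery-mode` property is 1 for non-persistent and 2 for persistent messages, so forcing persistence off amounts to overwriting that property before forwarding. The `launder` function and the plain-dict representation of message properties are assumptions made for the example, not a description of the actual proxy.

```python
NON_PERSISTENT = 1  # AMQP 0-9-1 delivery-mode value for transient messages
PERSISTENT = 2      # AMQP 0-9-1 delivery-mode value for persistent messages


def launder(properties: dict) -> dict:
    """Return a copy of the message properties with persistence forced off.

    The incoming dict is not modified, so the original properties can still
    be logged or audited before the message is forwarded.
    """
    cleaned = dict(properties)
    cleaned["delivery_mode"] = NON_PERSISTENT
    return cleaned
```

For example, `launder({"delivery_mode": PERSISTENT, "content_type": "application/json"})` keeps the content type and returns a delivery mode of 1, whatever the publisher asked for.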
We also track and account for every message using metadata, and can charge back... We are cheap but not free.
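One simple way to account for messages by metadata is to tally counts and bytes per owner from a message header. This is a minimal sketch only: the `tenant` header name and the `(headers, body)` tuple shape are hypothetical, chosen for the example.

```python
from collections import Counter


def tally_usage(messages, header="tenant"):
    """Count messages and body bytes per owner, keyed by a metadata header.

    `messages` is an iterable of (headers, body) pairs, where headers is a
    dict and body is bytes. Messages without the header are booked under
    'unknown' so that nothing goes unaccounted for.
    """
    counts = Counter()
    volume = Counter()
    for headers, body in messages:
        owner = headers.get(header, "unknown")
        counts[owner] += 1
        volume[owner] += len(body)
    return counts, volume
```

The resulting per-owner totals are the kind of figures a chargeback report could be built from.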
Anyway, I hope this helps.
ml
_______________________________________________