打开Replication Monitor,在Subscription Watch List Tab中,发现有大量的status= “Performance critical” 的黄色Warning,Latency 非常高,第六感告诉我,出事了,无法求援,只能强迫自己淡定下来,既来之,则安之。

1,查看status= “Performance critical” 的Subscription的Detail,在Distributor to subscriber history tab中查看Action messages。

时间最早的Action Message 详细信息如下:

The replication agent has not logged a progress message in 10 minutes. This might indicate an unresponsive agent or high system activity. Verify that records are being replicated to the destination and that connections to the subscriber. Publisher and distributor are still active.

原因是The replication agent 在10 分钟里没有记录一个progress message,不是很明白。

从MSDN上查到 MSSQL_ENG020554 跟上述Action Message 的描述相同

Message Details 

The replication agent has not logged a progress message in %ld minutes. This might indicate an unresponsive agent or high system activity. Verify that records are being replicated to the destination and that connections to the Subscriber, Publisher, and Distributor are still active.

Explanation   

The Replication agents checkup job runs at a specified interval (10 minutes by default) to check on the status of each replication agent. If an agent has not logged any progress messages since the last time the agent checkup job ran, error MSSQL_ENG020554 can be raised. The agent is expected at least to log history messages even if no other replication activity is occurring. Although the replication agent is not responding as expected, it has not necessarily stopped or failed (if an agent has failed, error MSSQL_ENG020536 should be raised).

The following issues can cause error MSSQL_ENG020554 to be raised:

  • The agent is busy.

    If the agent is too busy to respond when polled by the agent checkup job, the agent checkup job cannot report whether the replication agent is functioning properly. There are a number of reasons why the replication agent could be busy: there might be a lot of data being replicated, or there might be application design or configuration issues that result in processes that run for a long time.

  • The agent cannot log in to one of the computers in the topology.

    All agents have a parameter -LoginTimeOut (set to 15 seconds by default), which governs how long an agent attempts to log in to a replication node, such as a Merge Agent logging in to the Publisher. If the -LoginTimeOut value is set higher than the interval at which the replication agent checkup job runs, a login problem could be the root cause of the error: error MSSQL_ENG020554 is raised before the agent is able to raise a more specific error.

我想起来了,前一天,我更新了一张大表dbo.dt_Test,更新的数据量大概有1.6 亿条,可能是Replication推送的数据量过大,导致agent 太busy。

查看DW正在运行的语句,发现sql server 正在执行一个Replication的sp :sp_MSupd_dbodt_test ,该sp用于更新表dbo.dt_Test。

Root cause 找到后,Google了一下,找到解决方案:

This error message gets generated because of the Distribution heartbeat interval property. This property governs how long an agent can run without logging a progress message. If your replication agents are not reporting an error message and you are seeing the above message, then you could change your heartbeat interval to a higher value. One of the option could be that you changed the history logging option for your replication agent so that it doesn’t log any message.

  1. exec sp_changedistributor_property
  2. @property = 'heartbeat_interval',
  3. @value = <value in minutes>;
  4.  
  5. USE master
  6. GO
  7.  
  8. exec sp_changedistributor_property
  9. @property = 'heartbeat_interval',
  10. @value = 30;

set 30 minutes and check. if you still face the issue change it to 60 minutes

参考文档:

The replication agent has not logged a progress message in 10 minutes.的更多相关文章

  1. Replication:The replication agent has not logged a progress message in 10 minutes.

    打开Replication Monitor,在Subscription Watch List Tab中,发现有大量的status= “Performance critical” 的黄色Warning, ...

  2. 错误排查:Cloudera Manager Agent 的 Parcel 目录位于可用空间小于 10.0 吉字节 的文件系统上。 /opt/cloudera/parcels

    临时解决办法: 点击右上角的抑制,选中抑制复选框,然后重启服务即可.

  3. op.go

    package } ) : : : ,: ,: : : ,: ,: : : ,: ,: ;; ] )} } minutes when there is no incoming events. // P ...

  4. 如何用SQL脚本在SQL Server Replication中创建合并复制,以及怎么创建分区合并复制

    假设我们要创建合并复制的发布端数据库是EFDemo其中有四张表,订阅端数据库是EFDemoSubscription,如下图所示: 首先创建发布端快照代理Sql agent job:"EFDe ...

  5. ProxySQL Tutorial : setup in a MySQL replication topology

    ProxySQL Tutorial : setup in a MySQL replication topology 时间 2015-09-15 05:23:20 ORACLE数据库技术文刊 原文  h ...

  6. OEMCC 13.3 主机agent部署问题排查

    部署安装 具体的安装过程可参考,Alfred Zhao的文章,非常详细,文章是OEMCC13.2的部署过程.OEMCC13.3没有太大差别. https://www.cnblogs.com/jyzha ...

  7. RSyslog Windows Agent 安装配置

    下载地址:https://www.rsyslog.com/windows-agent/windows-agent-download/ 安装过程: 1.双击rsyslogwa安装包,开始进行安装 2.一 ...

  8. MySQL主从复制之半同步(semi-sync replication)

    GreatSQL社区原创内容未经授权不得随意使用,转载请联系小编并注明来源. 半同步简介 MASTER节点在执行完客户端提交的事务后不是立刻返回结果给客户端,而是等待至少一个SLAVE节点接收并写到r ...

  9. RAC的QA

    RAC: Frequently Asked Questions [ID 220970.1]   修改时间 13-JAN-2011     类型 FAQ     状态 PUBLISHED   Appli ...

随机推荐

  1. unity3D项目中如何避免硬代码(C#)

    平时做项目,代码中是不允许出现硬代码的,一般我们是怎么处理的呢? 那么硬代码又是什么呢? 我们俗称的硬代码:eg:   label.text = "欢迎来到梦幻岛";  这样我们俗 ...

  2. 网站开发网页广告条不显示,出现ERR_BLOCKED_BY_CLIENT

    原因: 原因很简单,是因为被360浏览器的ad block给屏蔽掉了. 现象: 解决方案: 更换图片文件名,去掉ad,改成其他的字符.

  3. 弱省互测#2 t2

    题意 给两个树,大小分别为n和m,现在两棵树各选一些点(包括1),使得这棵树以1号点为根同构(同构就是每个点的孩子数目相同),求最大的同构树.(n, m<=500) 分析 我们从两棵树中各取出一 ...

  4. Python3.5 Day1作业:实现用户密码登录,输错三次锁定。

    作业需求: 1.输入用户名密码 2.认证成功后显示欢迎信息 3.输错三次后锁定 实现思路: 1.判断用户是否在黑名单,如果在黑名单提示账号锁定. 2.判断用户是否存在,如果不存在提示账号不存在. 3. ...

  5. iOS开发之单元测试

    开始之前 本文侧重讲述如何在iOS程序的开发过程中使用单元测试.使用Xcode自带的OCUnit作为测试框架. 一.单元测试概述 单元测试作为敏捷开发实践的组成之一,其目的是提高软件开发的效率,维持代 ...

  6. elasticsearch常用的概念整理

    节点node 节点(node)是一个运行着的Elasticsearch实例 集群中一个节点会被选举为主节点(master),它将临时管理集群级别的一些变更,例如新建或删除索引.增加或移除节点等.主节点 ...

  7. 关于learntorank http://qiita.com/rockhopper/items/bb3d46f01df5f6499123

    一.数据转换 如何对于训练数据做pairwise的transform,比如你原始数据是要么点击要么不点击,如何对这些样本数据做pairwise的transform? 下面的方法主要是做组合的方法,就是 ...

  8. CF2.D

    D. Santa Claus and a Palindrome time limit per test 2 seconds memory limit per test 256 megabytes in ...

  9. jquery如何获取第一个或最后一个子元素?

    通过children方法,children("input:first-child") 1 2 $(this).children("input:first-child&qu ...

  10. 【实战Java高并发程序设计 2】无锁的对象引用:AtomicReference

    AtomicReference和AtomicInteger非常类似,不同之处就在于AtomicInteger是对整数的封装,而AtomicReference则对应普通的对象引用.也就是它可以保证你在修 ...