The people using PostgreSQL and the Streaming Replication feature seem to ask many of the same questions:

1. How best to monitor Streaming Replication?

2. What is the best way to do that?

3. Are there alternatives, when monitoring on Standby, to using the pg_stat_replication view on Master?

4. How should I calculate replication lag-time, in seconds, minutes, etc.?

In light of these commonly asked questions, I thought a blog would help. The following are some methods I’ve found to be useful.

Monitoring is critical for large infrastructure deployments where you have Streaming Replication for:

1. Disaster recovery

2. Streaming Replication is for High Availability

3. Load balancing, when using Streaming Replication with Hot Standby

PostgreSQL has some building blocks for replication monitoring, and the following are some important functions and views which can be use for monitoring the replication:

1. pg_stat_replication view on master/primary server.

This view helps in monitoring the standby on Master. It gives you the following details:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
pid:              Process id of walsender process
usesysid:         OID of user which is used for Streaming replication.
usename:          Name of user which is used for Streaming replication
application_name: Application name connected to master
client_addr:      Address of standby/streaming replication
client_hostname:  Hostname of standby.
client_port:      TCP port number on which standby communicating with WAL sender
backend_start:    Start time when SR connected to Master.
state:            Current WAL sender state i.e streaming
sent_location:    Last transaction location sent to standby.
write_location:   Last transaction written on disk at standby
flush_location:   Last transaction flush on disk at standby.
replay_location:  Last transaction flush on disk at standby.
sync_priority:    Priority of standby server being chosen as synchronous standby
sync_state:       Sync State of standby (is it async or synchronous).

e.g.:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+---------------------------------
pid              | 1114
usesysid         | 16384
usename          | repuser
application_name | walreceiver
client_addr      | 172.17.0.3
client_hostname  |
client_port      | 52444
backend_start    | 15-MAY-14 19:54:05.535695 -04:00
state            | streaming
sent_location    | 0/290044C0
write_location   | 0/290044C0
flush_location   | 0/290044C0
replay_location  | 0/290044C0
sync_priority    | 0
sync_state       | async

2. pg_is_in_recovery() : Function which tells whether standby is still in recovery mode or not.

e.g.

1
2
3
4
5
postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

3. pg_last_xlog_receive_location: Function which tells location of last transaction log which was streamed by Standby and also written on standby disk.

e.g.

1
2
3
4
5
postgres=# select pg_last_xlog_receive_location();
 pg_last_xlog_receive_location
-------------------------------
 0/29004560
(1 row)

4. pg_last_xlog_replay_location: Function which tells last transaction replayed during recovery process. e.g is given below:

1
2
3
4
5
postgres=# select pg_last_xlog_replay_location();
 pg_last_xlog_replay_location
------------------------------
 0/29004560
(1 row)

5. pg_last_xact_replay_timestamp: This function tells about the time stamp of last transaction which was replayed during recovery. Below is an example:

1
2
3
4
5
postgres=# select pg_last_xact_replay_timestamp();
  pg_last_xact_replay_timestamp
----------------------------------
 15-MAY-14 20:54:27.635591 -04:00
(1 row)

Above are some important functions/views, which are already available in PostgreSQL for monitoring the streaming replication.

So, the logical next question is, “What’s the right way to monitor the Hot Standby with Streaming Replication on Standby Server?”

If you have Hot Standby with Streaming Replication, the following are the points you should monitor:

1. Check if your Hot Standby is in recovery mode or not:

For this you can use pg_is_in_recovery() function.

2.Check whether Streaming Replication is working or not.

And easy way of doing this is checking the pg_stat_replication view on Master/Primary. This view gives information only on master if Streaming Replication is working.

3. Check If Streaming Replication is not working and Hot standby is recovering from archived WAL file.

For this, either the DBA can use the PostgreSQL Log file to monitor it or utilize the following functions provided in PostgreSQL 9.3:

1
2
pg_last_xlog_replay_location();
pg_last_xact_replay_timestamp();

4. Check how far off is the Standby from Master.

There are two ways to monitor lag for Standby.



   i. Lags in Bytes: For calculating lags in bytes, users can use thepg_stat_replication view on the master with the functionpg_xlog_location_diff function. Below is an example:

1
pg_xlog_location_diff(pg_stat_replication.sent_location, pg_stat_replication.replay_location)

which gives the lag in bytes.

  ii. Calculating lags in Seconds. The following is SQL, which most people uses to find the lag in seconds:

1
2
3
4
SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
              THEN 0
            ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
       END AS log_delay;

Including the above into your repertoire can give you good monitoring for PostgreSQL.

I will in a future post include the script that can be used for monitoring the Hot Standby with PostgreSQL streaming replication.

Since 9.6 this is a lot easier as it introduced the function pg_blocking_pids() to find the sessions that are blocking another session.

So you can use something like this:

How to check blocking processes in postgresql

select pid,
usename,
pg_blocking_pids(pid) as blocked_by,
query as blocked_query
from pg_stat_activity
where cardinality(pg_blocking_pids(pid)) > 0;

14 - How to check replication status的更多相关文章

  1. java.lang.IllegalStateException: Failed to check the status of the service

    java.lang.IllegalStateException: Failed to check the status of the service com.pinyougou.sellergoods ...

  2. DTM initialization: failure during startup recovery, retry failed, check segment status (cdbtm.c:1603)

    安装greenplum集群出现以下错误: 20160315:13:49:16:025696 gpinitsystem:h95:jason-[INFO]:-Checking configuration ...

  3. check failed status == cudnn_status_success (4 vs. 0) cudnn_status_internal_error

    Check failed: error == cudaSuccess (30 vs. 0) unknown error  这个有可能是显存不足造成的,或者网络参数不对造成的 check failed ...

  4. 关于Failed to check the status of the service com.taotao.service.ItemService. No provider available fo

    原文:http://www.bubuko.com/infodetail-2250226.html 项目中用dubbo发生: Failed to check the status of the serv ...

  5. Check failed: status == CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPPING_ERROR

    I0930 21:23:15.115576 30918 solver.cpp:281] Learning Rate Policy: multistepF0930 21:23:17.263314 310 ...

  6. CUDA报错: Cannot create Cublas handle. Cublas won't be available. 以及:Check failed: status == CUBLAS_STATUS_SUCCESS (1 vs. 0) CUBLAS_STATUS_NOT_INITIALIZED

    Error描述: aita@aita-Alienware-Area-51-R5:~/AITA2/daisida/ssd-github/caffe$ make runtest -j8 .build_re ...

  7. node js fcoin api 出现 api key check fail : {"status":1090,"msg":"Illegal API signature"}

    //主区://ft / btc 不支持市价 买入数量不能小于5个FT 买//ft / eth 支持市价 最小买入eth不能小于0.01 买//ft / usdt 支持市价 最小买入usdt不能小于10 ...

  8. dubbo Failed to check the status of the service com.user.service.UserService. No provider available for the service

    Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'u ...

  9. springboot 报错nested exception is java.lang.IllegalStateException: Failed to check the status of the service xxxService No provider available for the service

    spring: dubbo:#关闭所有服务的启动时检查:(没有提供者时报错) consumer: check: false timeout: 3000

随机推荐

  1. mysql 分库分表 ~ 柔性事务

    一 定义 TCC方案是可能是目前最火的一种柔性事务方案二 具体 内容 TCC=try(预设)-confrim(应用确认)-canal(回滚取消)三 目的 解决跨服务调用场景下的分布式事务问题,避免使用 ...

  2. codeforces 915E - Physical Education Lessons 动态开点线段树

    题意: 最大$10^9$的区间, $3*10^5$次区间修改,每次操作后求整个区间的和 题解: 裸的动态开点线段树,计算清楚数据范围是关键... 经过尝试 $2*10^7$会$MLE$ $10^7$会 ...

  3. 【easy】349. Intersection of Two Arrays

    找两个数组的交集(不要多想,考虑啥序列之类的,就是简单的两堆数求交集啊!!!最后去个重就好了) //LeetCode里用到很多现成函数的时候,苦手だな- //这个题的思路是,先sort,把两个vect ...

  4. 【原创】大数据基础之Marathon(1)简介、安装、使用

    marathon 1.6.322 官方:https://mesosphere.github.io/marathon/ 一 简介 Marathon is a production-grade conta ...

  5. iOS rebuild from bitcode对ipa大小的影响

    https://developer.apple.com/library/content/technotes/tn2432/_index.html 为了测试一下rebuild from bitcode的 ...

  6. Redis中bitmap的妙用

    BitMap是什么 就是通过一个bit位来表示某个元素对应的值或者状态,其中的key就是对应元素本身.我们知道8个bit可以组成一个Byte,所以bitmap本身会极大的节省储存空间. Redis中的 ...

  7. liunx 下WebBench 安装与压力测试

    安装: wget http://blog.zyan.cc/soft/linux/webbench/webbench-1.5.tar.gz tar zxvf webbench-1.5.tar.gz cd ...

  8. node中间层转发请求

    前台页面: $.get("/api/hello?name=leyi",function(rps){ console.info(rps); }); node中间层(比如匹配api开头 ...

  9. select2 下拉搜索控件

    1.添加相应的script链接 jquery: <script type="text/javascript" src="http://cdn.bootcss.com ...

  10. SQL反模式学习笔记12 存储图片或其他多媒体大文件

    目标:存储图片或其他多媒体大文件 反模式:图片存储在数据库外的文件系统中,数据库表中存储文件的对应的路径和名称. 缺点:     1.文件不支持Delete操作.使用SQL语句删除一条记录时,对应的文 ...