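Redis Master-Slave Replication and Sentinel, Part 02

I. Why replicate

1. Keep multiple copies of the data so that the service can be highly available
2. Improve read performance by spreading read requests across the slaves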

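II. Key points and difficulties of replication

1. How to specify the source to replicate from
2. Full or incremental copy, and how to implement the incremental one
3. Replicating without disrupting front-end business operations
4. What to do when the network is interrupted
5. How to keep data that has been sent from being lost before it reaches the slave
6. How to detect that the replication source has changed, which would corrupt the data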

III. Replication steps

Full synchronization -> incremental synchronization -> command propagation

3.1 Specify the master

1. Set slaveof in the configuration file
2. Run the slaveof command on the slave node

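3.2 Establish the socket connection

Based on the slaveof setting from the configuration file or from the command line, the slave creates a socket connection to the master.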

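3.3 Send the PING command (right after the connection is created)

1. PING checks that the socket can be read from and written to normally
2. It also checks that the master can process command requests normally
3. If the slave gets no reply to PING within the allowed time, the network is considered unhealthy; the slave closes the socket and reconnects
4. If the master returns an error, for example "BUSY Redis is busy running a script. You can ...", the slave disconnects and reconnects
5. If the reply is PONG, everything is fine and the slave moves on to the next step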

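3.4 Authentication

1. If the slave has the masterauth option set, it authenticates; otherwise this step is skipped
2. Authentication is done by sending the AUTH command to the master: auth passwd
3. If the master has no requirepass set, it replies with a "no password is set" error
4. If the master's password differs from the one configured on the slave, it replies with an "invalid password" error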

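3.5 Send the port information

1. The slave runs REPLCONF listening-port <port-number> to tell the master which port the slave listens on for commands
2. The port is reported so that running info on the master shows each slave's port; in other words, the slave proactively tells the master its listening port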

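3.6 Synchronization

From this point the master and the slave are clients of each other: each can send commands to the other and reply to them.

3.7 Command propagation

After the master executes a write command, it sends that command to the slave.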
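
The handshake in 3.3 to 3.6 can be summarized as an ordered list of commands. The following is a minimal sketch, not Redis source code: handshake_commands and its parameters are hypothetical names chosen for illustration, and the function only builds the commands a slave would send, without opening a socket or checking replies. PSYNC ? -1 is the form a slave uses when it has no cached master.

```python
from typing import List, Optional, Tuple


def handshake_commands(
    listening_port: int,
    masterauth: Optional[str] = None,
    cached_replid: Optional[str] = None,
    next_offset: int = -1,
) -> List[Tuple[str, ...]]:
    """Return the commands a slave sends after connecting, in order.

    Hypothetical helper for illustration only; a real slave also checks
    each reply before moving on to the next step.
    """
    commands: List[Tuple[str, ...]] = [("PING",)]  # 3.3: check the link
    if masterauth is not None:
        commands.append(("AUTH", masterauth))  # 3.4: only if masterauth is set
    # 3.5: tell the master which port this slave listens on
    commands.append(("REPLCONF", "listening-port", str(listening_port)))
    # 3.6: ask for a partial resync if a master is cached, else a full one
    if cached_replid is None:
        commands.append(("PSYNC", "?", "-1"))
    else:
        commands.append(("PSYNC", cached_replid, str(next_offset)))
    return commands


if __name__ == "__main__":
    for cmd in handshake_commands(6379, masterauth="foobared"):
        print(" ".join(cmd))
```

Keeping the steps as data makes the order easy to see: the slave never asks for synchronization before the link check, the optional authentication, and the port announcement.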

IV. Synchronization process log

V. Configuration reference

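slaveof <masterip> <masterport>

# The data source (master) to replicate from

masterauth <master-password>

# Password used to authenticate with the master

slave-serve-stale-data yes

# yes: while the link to the master is down or a sync is in progress, the slave still serves client requests; the drawback is that clients may read stale data

# no: the slave rejects client requests and returns the error "SYNC with master in progress"

slave-read-only yes

# Whether the slave is read-only; if it is writable, its data can diverge from the master's

repl-timeout 60

# Replication timeout in seconds. It applies to:

# the bulk data transfer during SYNC, as seen by the slave

# master timeout as seen by the slave (no data and no PING received)

# slave timeout as seen by the master (no REPLCONF ACK pings received)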

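repl-disable-tcp-nodelay no

# yes: Redis sends data to slaves using fewer TCP packets and less bandwidth by packing more data into each packet, at the cost of some delay, up to 40 ms with a default Linux kernel configuration

# no: packets are used less efficiently, but latency is lower

repl-backlog-size 1mb

# Fixed-size buffer on the master that holds the most recent part of the replication stream. After a network interruption it decides whether the slave needs a full sync: if the data the slave is missing is still in the backlog, an incremental sync is enough; if that data has been evicted or the backlog has been released after the timeout, a full sync takes place

repl-backlog-ttl 3600

# Seconds the master waits after its slaves have disconnected before it releases the backlog

slave-priority 100

# Priority used by Redis Sentinel when it picks a slave to promote after the master stops working. A lower value means a higher priority, and a slave with priority 0 is never promoted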

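# When the conditions below are not met, the master refuses write requests from clients

min-slaves-to-write 3

# Minimum number of slaves that must be online; the default is 0, which disables the feature

min-slaves-max-lag 10

# Maximum lag in seconds; a slave whose last heartbeat is older than this does not count as online, and writes stop when too few slaves remain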
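
The interaction between min-slaves-to-write and min-slaves-max-lag is easier to follow as a small function. This is a sketch of the rule as described above, not the Redis implementation; master_accepts_writes and slave_lags are hypothetical names, and the lag values stand for the lag field that info replication reports for each slave.

```python
from typing import Iterable


def master_accepts_writes(
    slave_lags: Iterable[int],
    min_slaves_to_write: int = 3,
    min_slaves_max_lag: int = 10,
) -> bool:
    """Return True if the master should accept a write.

    slave_lags holds, for each connected slave, the number of seconds
    since its last heartbeat (the lag field of INFO replication).
    """
    # Setting either option to 0 disables the check entirely.
    if min_slaves_to_write == 0 or min_slaves_max_lag == 0:
        return True
    # Only slaves whose lag is within the limit count as "good".
    good_slaves = sum(1 for lag in slave_lags if lag <= min_slaves_max_lag)
    return good_slaves >= min_slaves_to_write


if __name__ == "__main__":
    print(master_accepts_writes([0, 1, 2]))   # three healthy slaves
    print(master_accepts_writes([0, 1, 30]))  # one slave lags too far
```

With the values from the configuration above, a master with three slaves stops accepting writes as soon as any one of them falls more than 10 seconds behind.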

VI. Synchronization flow

VII. Full synchronization

7.1 Run slaveof on the slave

415:S 20 Nov 14:17:17.330 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.

415:S 20 Nov 14:17:17.331 * SLAVE OF 172.16.10.140:6379 enabled (user request from 'id=4 addr=127.0.0.1:55027 fd=11 name= age=198 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=slaveof')

415:S 20 Nov 14:17:17.586 * Connecting to MASTER 172.16.10.140:6379

415:S 20 Nov 14:17:17.586 * MASTER <-> SLAVE sync started

415:S 20 Nov 14:17:17.586 * Non blocking connect for SYNC fired the event.

415:S 20 Nov 14:17:17.587 * Master replied to PING, replication can continue...

415:S 20 Nov 14:17:17.587 * Trying a partial resynchronization (request 572caecf4c0bf264880b2e3899a3dae52e7704e9:1).

415:S 20 Nov 14:17:17.592 * Full resync from master: 030a3c44c4f64eb9a02c3b36f3891226fc2074fe:0

415:S 20 Nov 14:17:17.592 * Discarding previously cached master state.

415:S 20 Nov 14:17:17.681 * MASTER <-> SLAVE sync: receiving 201 bytes from master

415:S 20 Nov 14:17:17.698 * MASTER <-> SLAVE sync: Flushing old data

415:S 20 Nov 14:17:19.605 * MASTER <-> SLAVE sync: Loading DB in memory

415:S 20 Nov 14:17:19.605 * MASTER <-> SLAVE sync: Finished with success

415:S 20 Nov 14:17:19.606 * Background append only file rewriting started by pid 687

415:S 20 Nov 14:17:19.631 * AOF rewrite child asks to stop sending diffs.

687:C 20 Nov 14:17:19.631 * Parent agreed to stop sending diffs. Finalizing AOF...

687:C 20 Nov 14:17:19.631 * Concatenating 0.00 MB of AOF diff received from parent.

687:C 20 Nov 14:17:19.632 * SYNC append only file rewrite performed

687:C 20 Nov 14:17:19.632 * AOF rewrite: 2 MB of memory used by copy-on-write

415:S 20 Nov 14:17:19.707 * Background AOF rewrite terminated with success

415:S 20 Nov 14:17:19.707 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)

415:S 20 Nov 14:17:19.707 * Background AOF rewrite finished successfully

7.2 Master log

10110:M 20 Nov 14:17:16.884 * Slave 172.16.10.141:6379 asks for synchronization

10110:M 20 Nov 14:17:16.884 * Partial resynchronization not accepted: Replication ID mismatch (Slave asked for '572caecf4c0bf264880b2e3899a3dae52e7704e9', my replication IDs are '14f0fbdd33f13d8e6d07c13bb0a184ba7a43c258' and 'ff98eda832c57bef003947b34ae024063689ca44')

10110:M 20 Nov 14:17:16.885 * Starting BGSAVE for SYNC with target: disk

10110:M 20 Nov 14:17:16.888 * Background saving started by pid 11565

11565:C 20 Nov 14:17:16.891 * DB saved on disk

11565:C 20 Nov 14:17:16.891 * RDB: 6 MB of memory used by copy-on-write

10110:M 20 Nov 14:17:16.978 * Background saving terminated with success

10110:M 20 Nov 14:17:16.978 * Synchronization with slave 172.16.10.141:6379 succeeded

7.3 Shut down the master

# Master log

12519:C 20 Nov 14:22:10.243 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo

12519:C 20 Nov 14:22:10.243 # Redis version=4.0.8, bits=64, commit=00000000, modified=0, pid=12519, just started

12519:C 20 Nov 14:22:10.243 # Configuration loaded

12520:M 20 Nov 14:22:10.245 * Increased maximum number of open files to 10032 (it was originally set to 1024).

12520:M 20 Nov 14:22:10.245 # Creating Server TCP listening socket *:6379: bind: Address already in use

10110:M 20 Nov 14:23:36.032 # User requested shutdown...

10110:M 20 Nov 14:23:36.032 * Calling fsync() on the AOF file.

10110:M 20 Nov 14:23:36.032 * Removing the pid file.

10110:M 20 Nov 14:23:36.032 # Redis is now ready to exit, bye bye...

# Slave log

415:S 20 Nov 14:23:36.736 # Connection with master lost.

415:S 20 Nov 14:23:36.736 * Caching the disconnected master state.

415:S 20 Nov 14:23:37.456 * Connecting to MASTER 172.16.10.140:6379

415:S 20 Nov 14:23:37.456 * MASTER <-> SLAVE sync started

415:S 20 Nov 14:23:37.456 # Error condition on socket for SYNC: Connection refused

415:S 20 Nov 14:23:38.458 * Connecting to MASTER 172.16.10.140:6379

415:S 20 Nov 14:23:38.459 * MASTER <-> SLAVE sync started

415:S 20 Nov 14:23:38.459 # Error condition on socket for SYNC: Connection refused

415:S 20 Nov 14:23:39.462 * Connecting to MASTER 172.16.10.140:6379

7.4 Start the master

# Slave log

415:S 20 Nov 14:24:39.625 # Error condition on socket for SYNC: Connection refused

415:S 20 Nov 14:24:40.626 * Connecting to MASTER 172.16.10.140:6379

415:S 20 Nov 14:24:40.626 * MASTER <-> SLAVE sync started

415:S 20 Nov 14:24:40.627 * Non blocking connect for SYNC fired the event.

415:S 20 Nov 14:24:40.627 * Master replied to PING, replication can continue...

415:S 20 Nov 14:24:40.628 * Trying a partial resynchronization (request 030a3c44c4f64eb9a02c3b36f3891226fc2074fe:702).

415:S 20 Nov 14:24:40.629 * Full resync from master: 1e1b4acf86e7882c044eb952136e04e5a70b077b:0

415:S 20 Nov 14:24:40.629 * Discarding previously cached master state.

415:S 20 Nov 14:24:40.712 * MASTER <-> SLAVE sync: receiving 216 bytes from master

415:S 20 Nov 14:24:40.712 * MASTER <-> SLAVE sync: Flushing old data

415:S 20 Nov 14:24:40.712 * MASTER <-> SLAVE sync: Loading DB in memory

415:S 20 Nov 14:24:40.712 * MASTER <-> SLAVE sync: Finished with success

415:S 20 Nov 14:24:40.713 * Background append only file rewriting started by pid 1102

415:S 20 Nov 14:24:40.737 * AOF rewrite child asks to stop sending diffs.

1102:C 20 Nov 14:24:40.737 * Parent agreed to stop sending diffs. Finalizing AOF...

1102:C 20 Nov 14:24:40.737 * Concatenating 0.00 MB of AOF diff received from parent.

1102:C 20 Nov 14:24:40.737 * SYNC append only file rewrite performed

1102:C 20 Nov 14:24:40.738 * AOF rewrite: 2 MB of memory used by copy-on-write

415:S 20 Nov 14:24:40.829 * Background AOF rewrite terminated with success

415:S 20 Nov 14:24:40.829 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)

415:S 20 Nov 14:24:40.829 * Background AOF rewrite finished successfully

# Master log: the master's run_id (replication ID) changed after the restart, so a full sync is performed

12992:M 20 Nov 14:24:39.924 * Slave 172.16.10.141:6379 asks for synchronization

12992:M 20 Nov 14:24:39.925 * Partial resynchronization not accepted: Replication ID mismatch (Slave asked for '030a3c44c4f64eb9a02c3b36f3891226fc2074fe', my replication IDs are '510ae8234d41a712b9c60fe63a4cf193fc3a9fe2' and '0000000000000000000000000000000000000000')

12992:M 20 Nov 14:24:39.925 * Starting BGSAVE for SYNC with target: disk

12992:M 20 Nov 14:24:39.925 * Background saving started by pid 13002

13002:C 20 Nov 14:24:39.927 * DB saved on disk

13002:C 20 Nov 14:24:39.927 * RDB: 6 MB of memory used by copy-on-write

12992:M 20 Nov 14:24:40.008 * Background saving terminated with success

12992:M 20 Nov 14:24:40.008 * Synchronization with slave 172.16.10.141:6379 succeeded

VIII. Incremental replication after a disconnect

Restarting the slave

8.1 Shut down the slave

# The master logs the lost connection

12992:M 20 Nov 14:30:33.092 # Connection with slave 172.16.10.141:6379 lost.

8.2 The master keeps accepting writes

127.0.0.1:6379> set k11 v11

OK

127.0.0.1:6379> set k22 v22

OK

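8.3 Start the slave. A restarted slave also goes through a full sync: its run_id has changed too, so it no longer presents the master's replication ID (the log below shows it requesting an ID the master does not know, at offset 1)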

# Slave log

1520:S 20 Nov 14:31:55.315 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.

1520:S 20 Nov 14:31:55.315 * SLAVE OF 172.16.10.140:6379 enabled (user request from 'id=2 addr=127.0.0.1:55195 fd=10 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=slaveof')

1520:S 20 Nov 14:31:55.712 * Connecting to MASTER 172.16.10.140:6379

1520:S 20 Nov 14:31:55.712 * MASTER <-> SLAVE sync started

1520:S 20 Nov 14:31:55.712 * Non blocking connect for SYNC fired the event.

1520:S 20 Nov 14:31:55.712 * Master replied to PING, replication can continue...

1520:S 20 Nov 14:31:55.713 * Trying a partial resynchronization (request 3a389f3b7dc9e3a394e6fdac5b7028e59aa635a8:1).

1520:S 20 Nov 14:31:55.715 * Full resync from master: 1e1b4acf86e7882c044eb952136e04e5a70b077b:575

1520:S 20 Nov 14:31:55.715 * Discarding previously cached master state.

1520:S 20 Nov 14:31:55.784 * MASTER <-> SLAVE sync: receiving 235 bytes from master

1520:S 20 Nov 14:31:55.785 * MASTER <-> SLAVE sync: Flushing old data

1520:S 20 Nov 14:31:55.785 * MASTER <-> SLAVE sync: Loading DB in memory

1520:S 20 Nov 14:31:55.785 * MASTER <-> SLAVE sync: Finished with success

1520:S 20 Nov 14:31:55.786 * Background append only file rewriting started by pid 1533

1520:S 20 Nov 14:31:55.809 * AOF rewrite child asks to stop sending diffs.

1533:C 20 Nov 14:31:55.809 * Parent agreed to stop sending diffs. Finalizing AOF...

1533:C 20 Nov 14:31:55.809 * Concatenating 0.00 MB of AOF diff received from parent.

1533:C 20 Nov 14:31:55.809 * SYNC append only file rewrite performed

1533:C 20 Nov 14:31:55.809 * AOF rewrite: 6 MB of memory used by copy-on-write

1520:S 20 Nov 14:31:55.812 * Background AOF rewrite terminated with success

1520:S 20 Nov 14:31:55.812 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)

1520:S 20 Nov 14:31:55.812 * Background AOF rewrite finished successfully

# Master log

12992:M 20 Nov 14:31:55.010 * Slave 172.16.10.141:6379 asks for synchronization

12992:M 20 Nov 14:31:55.010 * Partial resynchronization not accepted: Replication ID mismatch (Slave asked for '3a389f3b7dc9e3a394e6fdac5b7028e59aa635a8', my replication IDs are '1e1b4acf86e7882c044eb952136e04e5a70b077b' and '0000000000000000000000000000000000000000')

12992:M 20 Nov 14:31:55.010 * Starting BGSAVE for SYNC with target: disk

12992:M 20 Nov 14:31:55.011 * Background saving started by pid 14369

14369:C 20 Nov 14:31:55.013 * DB saved on disk

14369:C 20 Nov 14:31:55.013 * RDB: 6 MB of memory used by copy-on-write

12992:M 20 Nov 14:31:55.081 * Background saving terminated with success

12992:M 20 Nov 14:31:55.081 * Synchronization with slave 172.16.10.141:6379 succeeded

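Slave link drops, followed by an incremental sync (the data is still in the backlog)

1. The slave loses its network link while the master keeps accepting writes

# slave

systemctl stop network && sleep 60 && systemctl start network &

2. After the slave comes back online

# Master log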

12992:M 20 Nov 15:17:37.019 # Disconnecting timedout slave: 172.16.10.141:6379

12992:M 20 Nov 15:17:37.019 # Connection with slave 172.16.10.141:6379 lost.

12992:M 20 Nov 15:17:38.092 * Slave 172.16.10.141:6379 asks for synchronization

12992:M 20 Nov 15:17:38.093 * Partial resynchronization request from 172.16.10.141:6379 accepted. Sending 165 bytes of backlog starting from offset 4388.

# Slave log

1705:S 20 Nov 15:17:38.792 # MASTER timeout: no data nor PING received...

1705:S 20 Nov 15:17:38.793 # Connection with master lost.

1705:S 20 Nov 15:17:38.793 * Caching the disconnected master state.

1705:S 20 Nov 15:17:38.793 * Connecting to MASTER 172.16.10.140:6379

1705:S 20 Nov 15:17:38.794 * MASTER <-> SLAVE sync started

1705:S 20 Nov 15:17:38.795 * Non blocking connect for SYNC fired the event.

1705:S 20 Nov 15:17:38.795 * Master replied to PING, replication can continue...

1705:S 20 Nov 15:17:38.795 * Trying a partial resynchronization (request 1e1b4acf86e7882c044eb952136e04e5a70b077b:4388).

1705:S 20 Nov 15:17:38.796 * Successful partial resynchronization with master.

1705:S 20 Nov 15:17:38.796 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.
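
The difference between the rejected and the accepted requests in the logs above comes down to two checks on the master. The sketch below is a simplified model under stated assumptions: can_partial_resync is a hypothetical name, the secondary replication ID (master_replid2) is ignored, and the backlog length of 4552 in the usage lines is an assumption derived from the log line "Sending 165 bytes of backlog starting from offset 4388".

```python
def can_partial_resync(
    req_replid: str,
    req_offset: int,
    master_replid: str,
    backlog_first_offset: int,
    backlog_histlen: int,
) -> bool:
    """Decide whether a PSYNC request can be served from the backlog."""
    # The slave must be asking for this master's replication history.
    if req_replid != master_replid:
        return False
    # The requested offset must still be inside the backlog, or be the
    # very next byte after it (meaning the slave missed nothing).
    backlog_end = backlog_first_offset + backlog_histlen
    return backlog_first_offset <= req_offset <= backlog_end


if __name__ == "__main__":
    master_id = "1e1b4acf86e7882c044eb952136e04e5a70b077b"
    # Request from the slave that only lost its network link.
    print(can_partial_resync(master_id, 4388, master_id, 1, 4552))
    # Request from the restarted slave, which presents an unknown ID.
    restarted_id = "3a389f3b7dc9e3a394e6fdac5b7028e59aa635a8"
    print(can_partial_resync(restarted_id, 1, master_id, 1, 4552))
```

The second check is why repl-backlog-size matters: if the master writes more than the backlog can hold while the slave is away, the requested offset falls out of the window and the slave is sent a full copy.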

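IX. If the master is stopped and started again, is there another full sync?

Yes. The restarted master has a new run_id (replication ID), so the slave's partial resync request is rejected and a full sync takes place, as the logs in 7.4 show.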

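X. Heartbeat

By default the slave sends the master a heartbeat command once per second: REPLCONF ACK <replication_offset>

The heartbeat reveals the state of the network. The lag field in the info output shows the master-slave delay in seconds, that is, the time since the slave's last heartbeat; it is normally 0 or 1
The heartbeat carries the slave's current replication offset. If commands sent by the master were lost on the way, this frequent heartbeat quickly exposes the offset mismatch, and the master can resend the missing commands to the slave
The heartbeat is also what makes the min-slaves feature possible: when the master-slave state is unhealthy, the master refuses writes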
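
The lag and offset reported through the heartbeat can be read from the slaveN lines of info replication. Below is a minimal parsing sketch; parse_slave_line and bytes_behind are hypothetical helper names, and the sample line is the one shown in the info replication output later in this article.

```python
from typing import Dict


def parse_slave_line(line: str) -> Dict[str, str]:
    """Parse a line such as 'slave0:ip=...,port=...,state=...,offset=...,lag=...'."""
    name, _, fields = line.partition(":")
    info = {"name": name}
    for pair in fields.split(","):
        key, _, value = pair.partition("=")
        info[key] = value
    return info


def bytes_behind(master_repl_offset: int, slave_line: str) -> int:
    """Bytes of the replication stream the slave has not acknowledged yet."""
    return master_repl_offset - int(parse_slave_line(slave_line)["offset"])


if __name__ == "__main__":
    line = "slave0:ip=172.16.10.141,port=6379,state=online,offset=5266,lag=0"
    print(parse_slave_line(line))
    print(bytes_behind(5266, line))
```

Comparing the slave's acknowledged offset with master_repl_offset gives the gap in bytes, while lag gives the gap in time; both come from the same REPLCONF ACK heartbeat.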

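XI. Problems that Redis high availability has to solve

1. Multiple nodes hold the same data
- Replication
2. How a new master is produced after the master goes down
3. How the slaves automatically connect to the new master after the master goes down
4. How to decide that the master is down
5. What to do with the old master once it recovers
6. How to monitor the health of all Redis nodes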

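XII. What is Sentinel

1. It is part of the Redis program itself
2. Main functions
- 2.1 Monitoring: checks the health of the Redis nodes
- 2.2 Notification: reports detected changes to related systems or Redis instances, implemented with the Redis publish/subscribe mechanism
- 2.3 Automatic failover: when the master goes down, a new master is elected
- 2.4 Configuration management: Redis instances can obtain certain shared information through Sentinel
3. Sentinel itself is distributed, which removes it as a single point of failure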

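12.1 Install and configure Sentinel

1. Copy the configuration for the slave

port 6380

logfile "/home/liubx/redisdata/slave1/logs/redis.log"

pidfile /var/run/redis.pid

# The pidfile path must differ from the master's

dir /home/liubx/redisdata/slave1

slaveof localhost 6379

2. Sentinel configuration

The Redis installation directory contains a configuration file named sentinel.conf

daemonize yes

logfile "/home/liubx/sentinel/sentinel.log"

sentinel monitor mymaster 127.0.0.1 6379 1

# master name, IP, port, number of votes (quorum)

# One Sentinel can monitor several masters

3. Start Sentinel with either of the following
- redis-sentinel ../sentinel.conf
- redis-server ../sentinel.conf --sentinel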

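12.2 HA steps

1. Subjectively decide that the master is down
2. Objectively decide that the master is down
3. The Sentinels elect the one that will carry out the failover (several Sentinels monitor the same master)
4. Failover
- Select the new master
- Repoint the other slaves to it
- Turn the old master into a slave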

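12.3 Subjective down

1. By default Sentinel sends a PING command once per second to check whether the instances it knows about are online

This covers the master, the master's slaves, and the other Sentinels
A reply of +PONG, -LOADING, or -MASTERDOWN means the instance is online; anything else means it is not

2. If no valid PING reply arrives for a configured period, the instance is marked subjectively down

The period is set with sentinel down-after-milliseconds master 50000, in milliseconds
The setting affects not only the master but also all of its slaves and every other Sentinel that monitors the same master
- For example, the master is at 1.1 and this Sentinel is at 1.2; slaves 1.3 and 1.4 both point to master 1.1, and another Sentinel at 1.5 also monitors 1.1. If Sentinel 1.2 is configured with 10000 milliseconds, then it uses 10000 milliseconds to judge 1.1, 1.3, 1.4, and 1.5 subjectively down
Different Sentinels may use different values for this setting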
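
The subjective-down rule can be modeled as a timer that only a valid reply resets. This is a sketch of the rule as described above, not Sentinel's actual code; PingTracker and its methods are hypothetical names, and times are plain millisecond integers supplied by the caller.

```python
VALID_REPLIES = {"+PONG", "-LOADING", "-MASTERDOWN"}


class PingTracker:
    """Tracks the last valid PING reply from one monitored instance."""

    def __init__(self, down_after_ms: int, now_ms: int) -> None:
        self.down_after_ms = down_after_ms
        self.last_valid_reply_ms = now_ms

    def on_reply(self, reply: str, now_ms: int) -> None:
        # Only the three replies listed above count as "online".
        if reply in VALID_REPLIES:
            self.last_valid_reply_ms = now_ms

    def is_subjectively_down(self, now_ms: int) -> bool:
        # Down once no valid reply has arrived for longer than the period.
        return now_ms - self.last_valid_reply_ms > self.down_after_ms


if __name__ == "__main__":
    tracker = PingTracker(down_after_ms=10000, now_ms=0)
    tracker.on_reply("+PONG", now_ms=1000)
    print(tracker.is_subjectively_down(now_ms=5000))
    print(tracker.is_subjectively_down(now_ms=12000))
```

Because each Sentinel keeps its own timer with its own down-after-milliseconds value, two Sentinels can legitimately disagree for a while about whether the same instance is down, which is why the objective-down step exists.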

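12.4 Objective down

When enough Sentinels agree that the master is down, this Sentinel marks the master as objectively down

The required number is the num argument of sentinel monitor master ip port num; the Sentinel's own judgment counts toward it, as the "#quorum 1/1" log line later in this article shows

Sentinels open connections to each other and exchange commands to learn what the other Sentinels have decided

The command sent is sentinel is-master-down-by-addr <ip> <port> <current_epoch> <runid>
- ip is the IP address of the master that this Sentinel considers subjectively down
- port is the port of the master considered down
- current_epoch is the configuration epoch, which can also be read as an election round counter
- runid is a Sentinel run ID; * means the command only asks whether the master is down, while a concrete ID means the command is a request to elect the leader Sentinel
A Sentinel that receives the command replies with three values
- down_state: 1 means the master is down, 0 means it is not
- leader_runid: * means the reply only concerns the down state; a concrete value is the run ID of the local leader Sentinel
- leader_epoch: when leader_runid is a concrete run ID, this is that instance's configuration epoch, similar to a configuration version; when leader_runid is *, it is 0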
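
Counting the down_state replies against the quorum looks roughly like this. It is a sketch under the assumption, supported by the "#quorum 1/1" log line later in this article, that the asking Sentinel counts its own judgment; is_objectively_down is a hypothetical name.

```python
from typing import Iterable


def is_objectively_down(
    self_sdown: bool, other_down_states: Iterable[int], quorum: int
) -> bool:
    """other_down_states holds the down_state field (1 or 0) of each reply."""
    if not self_sdown:
        # A Sentinel only asks the others after its own subjective down.
        return False
    # One vote for this Sentinel plus one per agreeing reply.
    votes = 1 + sum(1 for state in other_down_states if state == 1)
    return votes >= quorum


if __name__ == "__main__":
    # Quorum 1: the Sentinel's own judgment is enough.
    print(is_objectively_down(True, [], quorum=1))
    # Quorum 2 with one agreeing and one disagreeing reply.
    print(is_objectively_down(True, [1, 0], quorum=2))
```

A quorum of 1, as used in the configuration examples in this article, means a single Sentinel can declare the master objectively down on its own, which is convenient for a test setup but fragile in production.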

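12.5 Electing the leader Sentinel

Any Sentinel that finds the master objectively down can start an election
A Sentinel casts only one vote per election, first come, first served
After each round of voting, whether or not it succeeds, the voting round, that is the epoch, is incremented by one
A Sentinel that collects more than half of the votes becomes the leader Sentinel and carries out the failover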

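12.6 Election example

Scenario: three Sentinels numbered 1, 2, and 3; the master is at 192.168.1.110, port 6379

Steps:

Sentinel 1 is the first to mark the master subjectively down
Sentinel 1 sends sentinel is-master-down-by-addr 192.168.1.110 6379 1 * to Sentinels 2 and 3
From the replies, Sentinel 1 finds that the condition for objective down is met
Sentinel 1 starts an election by sending sentinel is-master-down-by-addr 192.168.1.110 6379 1 ab12cd34 (its own run ID) to Sentinels 2 and 3
Sentinel 2 receives the request; because the request from 1 is the first one it has received, it votes for 1. Its reply contains 1, ab12cd34, 1, meaning that the master is down, the run ID of the Sentinel it voted for is ab12cd34, and the election epoch is 1
Once Sentinel 1 receives the reply from 2, it holds more than half of the votes, so it becomes the leader and performs the failover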
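
The example above can be replayed with a toy model of the voting rule. This is an illustrative sketch, not Sentinel's implementation: the Sentinel class, request_vote, and elect_leader are hypothetical names, the run IDs other than ab12cd34 are made up, and network messages are replaced by direct method calls.

```python
from typing import Dict, List, Optional


class Sentinel:
    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.voted: Dict[int, str] = {}  # epoch -> run ID voted for

    def request_vote(self, epoch: int, candidate_run_id: str) -> str:
        """The first request seen in an epoch wins this Sentinel's vote."""
        return self.voted.setdefault(epoch, candidate_run_id)


def elect_leader(
    candidate: Sentinel, others: List[Sentinel], epoch: int
) -> Optional[str]:
    """Return the candidate's run ID if it wins a majority, else None."""
    votes = 0
    # The candidate votes for itself first, then asks everyone else.
    for sentinel in [candidate] + others:
        if sentinel.request_vote(epoch, candidate.run_id) == candidate.run_id:
            votes += 1
    total = len(others) + 1
    return candidate.run_id if votes > total // 2 else None


if __name__ == "__main__":
    s1, s2, s3 = Sentinel("ab12cd34"), Sentinel("sentinel-2"), Sentinel("sentinel-3")
    print(elect_leader(s1, [s2, s3], epoch=1))
    # A late candidate in the same epoch finds every vote already taken.
    print(elect_leader(s2, [s1, s3], epoch=1))
```

Because a vote is bound to an epoch, a candidate that loses has to wait for the next epoch before it can try again, which is what prevents two leaders from being elected in the same round.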

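12.7 Failover

1. Select the new master

Drop every slave of the old master that is itself marked down
Drop slaves that have not replied to the Sentinel's info command within the last 5 seconds
Drop slaves whose link to the master has been down for more than down-after-milliseconds * 10 milliseconds
Sort the remaining slaves by priority; the higher the priority (that is, the lower the slave-priority value), the more likely a slave is to be chosen
If priorities are equal, sort by replication offset; a larger offset means more recent data
Send slaveof no one to the chosen slave to turn it into a master
Send info once per second; when the reply contains role:master, the promotion has succeeded

2. Repoint the other slaves

Sending the slaveof command to the other slaves is enough

3. Turn the old master into a slave

The old master is offline, so nothing can be done to it yet; the Sentinel records in its internal state that the old master has become a slave, and sends it a slaveof command once it reconnects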
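
The filtering and sorting rules of step 1 can be sketched as follows. The SlaveInfo fields and select_new_master are hypothetical names introduced for illustration; the thresholds are the ones listed above, and any further tie-breaking that real Sentinel applies is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlaveInfo:
    name: str
    sdown: bool               # marked down by this Sentinel
    last_info_reply_ms: int   # ms since the last INFO reply
    master_link_down_ms: int  # ms the link to the old master has been down
    priority: int             # slave-priority: lower is preferred, 0 = never
    repl_offset: int          # replication offset: larger means fresher data


def select_new_master(
    slaves: List[SlaveInfo], down_after_ms: int
) -> Optional[SlaveInfo]:
    """Apply the filters from step 1, then pick the best remaining slave."""
    candidates = [
        s for s in slaves
        if not s.sdown
        and s.last_info_reply_ms <= 5000
        and s.master_link_down_ms <= down_after_ms * 10
        and s.priority != 0
    ]
    if not candidates:
        return None
    # Lowest slave-priority value first, then the largest offset.
    return min(candidates, key=lambda s: (s.priority, -s.repl_offset))


if __name__ == "__main__":
    slaves = [
        SlaveInfo("a", False, 1000, 0, 100, 100),
        SlaveInfo("b", False, 1000, 0, 100, 200),
        SlaveInfo("c", True, 1000, 0, 50, 300),
    ]
    chosen = select_new_master(slaves, down_after_ms=30000)
    print(chosen.name if chosen else None)
```

Filtering before sorting matters: a slave with the best priority is still skipped if it is down or has been cut off from the master for too long, because promoting it would risk losing recent writes.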

XIII. Sentinel

13.1

1) Current master-slave state

127.0.0.1:6379> info replication

# Replication

role:master

connected_slaves:1

slave0:ip=172.16.10.141,port=6379,state=online,offset=5266,lag=0

master_replid:1e1b4acf86e7882c044eb952136e04e5a70b077b

master_replid2:0000000000000000000000000000000000000000

master_repl_offset:5266

second_repl_offset:-1

repl_backlog_active:1

repl_backlog_size:1048576

repl_backlog_first_byte_offset:1

repl_backlog_histlen:5266

2) Configure Sentinel on the two nodes

vi /usr/local/redis/etc/sentinel.conf

dir "/usr/local/redis/work"

logfile "/usr/local/redis/sentinel.log"

daemonize yes

protected-mode no

sentinel monitor mymaster 172.16.10.140 6379 1

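# The name mymaster above is arbitrary, but the monitor line must come before the line below that refers to the name; otherwise Sentinel reports that the name cannot be found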

sentinel auth-pass mymaster foobared

3) Start Sentinel monitoring: redis-sentinel /usr/local/redis/etc/sentinel.conf

25401:X 20 Nov 15:30:06.428 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo

25401:X 20 Nov 15:30:06.428 # Redis version=4.0.8, bits=64, commit=00000000, modified=0, pid=25401, just started

25401:X 20 Nov 15:30:06.428 # Configuration loaded

25402:X 20 Nov 15:30:06.430 * Increased maximum number of open files to 10032 (it was originally set to 1024).

25402:X 20 Nov 15:30:06.431 * Running mode=sentinel, port=26379.

25402:X 20 Nov 15:30:06.431 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.

25402:X 20 Nov 15:30:06.432 # Sentinel ID is 20c0b9c989a852c87c59d913cd1c17c5b7bc2414

25402:X 20 Nov 15:30:06.432 # +monitor master mymaster 172.16.10.140 6379 quorum 1

25402:X 20 Nov 15:30:06.433 * +slave slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379

25402:X 20 Nov 15:30:06.902 * +sentinel sentinel ff661bc57580186ec6bd2c5162925381e0eef451 172.16.10.141 26379 @ mymaster 172.16.10.140 6379

5778:X 20 Nov 15:30:03.530 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo

5778:X 20 Nov 15:30:03.530 # Redis version=4.0.8, bits=64, commit=00000000, modified=0, pid=5778, just started

5778:X 20 Nov 15:30:03.530 # Configuration loaded

5779:X 20 Nov 15:30:03.532 * Increased maximum number of open files to 10032 (it was originally set to 1024).

5779:X 20 Nov 15:30:03.533 * Running mode=sentinel, port=26379.

5779:X 20 Nov 15:30:03.534 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.

5779:X 20 Nov 15:30:03.535 # Sentinel ID is ff661bc57580186ec6bd2c5162925381e0eef451

5779:X 20 Nov 15:30:03.535 # +monitor master mymaster 172.16.10.140 6379 quorum 1

5779:X 20 Nov 15:30:03.537 * +slave slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379

5779:X 20 Nov 15:30:09.198 * +sentinel sentinel 20c0b9c989a852c87c59d913cd1c17c5b7bc2414 172.16.10.140 26379 @ mymaster 172.16.10.140 6379

4) Shut down the master 172.16.10.140

# After a short while the slave becomes the master

5779:X 20 Nov 15:32:42.152 # +sdown master mymaster 172.16.10.140 6379

5779:X 20 Nov 15:32:42.152 # +odown master mymaster 172.16.10.140 6379 #quorum 1/1

5779:X 20 Nov 15:32:42.152 # +new-epoch 1

5779:X 20 Nov 15:32:42.152 # +try-failover master mymaster 172.16.10.140 6379

5779:X 20 Nov 15:32:42.153 # +vote-for-leader ff661bc57580186ec6bd2c5162925381e0eef451 1

5779:X 20 Nov 15:32:42.155 # 20c0b9c989a852c87c59d913cd1c17c5b7bc2414 voted for ff661bc57580186ec6bd2c5162925381e0eef451 1

5779:X 20 Nov 15:32:42.253 # +elected-leader master mymaster 172.16.10.140 6379

5779:X 20 Nov 15:32:42.253 # +failover-state-select-slave master mymaster 172.16.10.140 6379

5779:X 20 Nov 15:32:42.308 # +selected-slave slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379

5779:X 20 Nov 15:32:42.308 * +failover-state-send-slaveof-noone slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379

5779:X 20 Nov 15:32:42.366 * +failover-state-wait-promotion slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379

5779:X 20 Nov 15:32:43.095 # +promoted-slave slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379

5779:X 20 Nov 15:32:43.095 # +failover-state-reconf-slaves master mymaster 172.16.10.140 6379

5779:X 20 Nov 15:32:43.147 # +failover-end master mymaster 172.16.10.140 6379

5779:X 20 Nov 15:32:43.147 # +switch-master mymaster 172.16.10.140 6379 172.16.10.141 6379

5779:X 20 Nov 15:32:43.147 * +slave slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379

5779:X 20 Nov 15:33:13.204 # +sdown slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379

# Sentinel log on the original master host

25402:X 20 Nov 15:32:41.451 # +new-epoch 1

25402:X 20 Nov 15:32:41.452 # +vote-for-leader ff661bc57580186ec6bd2c5162925381e0eef451 1

25402:X 20 Nov 15:32:41.459 # +sdown master mymaster 172.16.10.140 6379

25402:X 20 Nov 15:32:41.459 # +odown master mymaster 172.16.10.140 6379 #quorum 1/1

25402:X 20 Nov 15:32:41.459 # Next failover delay: I will not start a failover before Tue Nov 20 15:38:42 2018

25402:X 20 Nov 15:32:42.445 # +config-update-from sentinel ff661bc57580186ec6bd2c5162925381e0eef451 172.16.10.141 26379 @ mymaster 172.16.10.140 6379

25402:X 20 Nov 15:32:42.445 # +switch-master mymaster 172.16.10.140 6379 172.16.10.141 6379

25402:X 20 Nov 15:32:42.446 * +slave slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379

25402:X 20 Nov 15:33:12.473 # +sdown slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379

5) The master role has moved to one of the slaves
6) Redis log on the new master

1705:S 20 Nov 15:32:41.912 * MASTER <-> SLAVE sync started

1705:S 20 Nov 15:32:41.912 # Error condition on socket for SYNC: Connection refused

1705:M 20 Nov 15:32:42.366 # Setting secondary replication ID to 1e1b4acf86e7882c044eb952136e04e5a70b077b, valid up to offset: 22832. New replication ID is 6e7a0afb3aa5dbfc2c5b6c4f78afe8a9f0d0035c

1705:M 20 Nov 15:32:42.366 * Discarding previously cached master state.

1705:M 20 Nov 15:32:42.366 * MASTER MODE enabled (user request from 'id=60 addr=172.16.10.141:55503 fd=11 name=sentinel-ff661bc5-cmd age=159 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')

1705:M 20 Nov 15:32:42.367 # CONFIG REWRITE executed with success.

7) Start the failed master again

# Original master

25402:X 20 Nov 15:50:25.990 # -sdown slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379

25402:X 20 Nov 15:50:35.967 * +convert-to-slave slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379

# New master

5779:X 20 Nov 15:50:26.744 # -sdown slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379

8) sentinel.conf is rewritten automatically

dir "/usr/local/redis/work"

logfile "/usr/local/redis/work/sentinel.log"

daemonize yes

protected-mode no

sentinel myid 20c0b9c989a852c87c59d913cd1c17c5b7bc2414

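# The name mymaster is arbitrary, but the monitor line must come before any line that refers to the name; otherwise Sentinel reports that the name cannot be found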

sentinel monitor mymaster 172.16.10.141 6379 1

# Generated by CONFIG REWRITE

port 26379

sentinel auth-pass mymaster foobared

sentinel config-epoch mymaster 1

sentinel leader-epoch mymaster 1

sentinel known-slave mymaster 172.16.10.140 6379

sentinel known-sentinel mymaster 172.16.10.141 26379 ff661bc57580186ec6bd2c5162925381e0eef451

sentinel current-epoch 1

dir "/usr/local/redis/work"

logfile "/usr/local/redis/work/sentinel.log"

daemonize yes

protected-mode no

sentinel myid ff661bc57580186ec6bd2c5162925381e0eef451

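# The name mymaster is arbitrary, but the monitor line must come before any line that refers to the name; otherwise Sentinel reports that the name cannot be found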

sentinel monitor mymaster 172.16.10.141 6379 1

# Generated by CONFIG REWRITE

port 26379

sentinel auth-pass mymaster foobared

sentinel config-epoch mymaster 1

sentinel leader-epoch mymaster 1

sentinel known-slave mymaster 172.16.10.140 6379

sentinel known-sentinel mymaster 172.16.10.140 26379 20c0b9c989a852c87c59d913cd1c17c5b7bc2414

sentinel current-epoch 1

9) Notes

21239:X 29 Mar 16:43:12.722 # +try-failover master mymaster 172.16.3.140 6379

21239:X 29 Mar 16:43:12.724 # +vote-for-leader 863c1c8c627415dbc3004deb529d27df2299c2df 95

21239:X 29 Mar 16:43:23.438 # -failover-abort-not-elected master mymaster 172.16.3.140 6379

21239:X 29 Mar 16:43:23.497 # Next failover delay: I will not start a failover before Thu Mar 29 16:49:13 2018

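If failover cannot complete after the master is stopped, as in the log above, one of the two fixes below applies; I used the first one

1) If the Redis instances are not configured with

protected-mode yes

bind 192.168.98.136

then add

protected-mode no

to the Sentinel configuration file

2) If the Redis instances are configured with

protected-mode yes

bind 192.168.98.136

then add

protected-mode yes

bind 192.168.98.136

to the Sentinel configuration file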
