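Redis Master-Replica and Sentinel, Part 02
1. Why replicate
- 1. Keep multiple copies of the data, which makes the service highly available
- 2. Improve read performance by spreading read requests across replicas
2. Key points and difficulties of replication
- 1. How to specify the source to replicate from
- 2. Full or incremental copy, and how to implement incremental copy
- 3. Replication must not disturb front-end traffic
- 4. What to do after the network is interrupted
- 5. How to keep data that was sent from being lost before it reaches the replica
- 6. How to detect that the replication source has changed, which would otherwise corrupt the data
3. Replication steps
graph LR
full_sync--incremental sync-->command_propagation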
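3.1 Specify the master
- 1. In the configuration file, with the slaveof directive
- 2. On the replica at runtime, with the slaveof command
3.2 Establish the socket connection
- Following the slaveof directive or command, the replica opens a socket to the master
3.3 Send PING (once the connection is established)
- 1. PING checks that the socket can be read and written normally
- 2. It also checks that the master can process command requests
- 3. If the replica gets no reply to PING within the timeout, the network is considered faulty; the replica closes the socket and reconnects
- 4. If the master returns an error, for example BUSY Redis is busy running a script. You can..., the replica disconnects and reconnects
- 5. If the reply is PONG, everything is fine and the replica moves on to the next step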
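3.4 Authentication
- 1. If the replica has the masterauth option set, it authenticates; otherwise it skips this step
- 2. Authentication is done by sending the AUTH command to the master: auth passwd
- 3. If the master has no requirepass set, it answers with an error saying that no password is set
- 4. If the password sent by the replica differs from the master's, the master answers with an invalid password error
3.5 Send port information
- 1. The replica runs REPLCONF listening-port <port-number> to tell the master which port it listens on
- 2. The master uses this port only for display: INFO on the master can then show each replica's port. In other words, the replica proactively reports its own listening port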
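The handshake commands in 3.3 to 3.5 travel over the socket in the Redis serialization protocol (RESP), where a command is an array of bulk strings. The sketch below only builds the byte frames and does not open a connection; the function name encode_command, the password passwd, and the port 6380 are placeholders chosen for illustration.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


# Frames in the order a replica sends them during the handshake.
# "passwd" and "6380" are placeholder values, not taken from a real setup.
handshake = [
    encode_command("PING"),
    encode_command("AUTH", "passwd"),
    encode_command("REPLCONF", "listening-port", "6380"),
]

for frame in handshake:
    print(frame)
```

Each frame starts with the argument count and prefixes every argument with its byte length, which is why the master can parse the commands without any delimiter scanning.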
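3.6 Synchronization
- Master and replica act as clients of each other and can send commands and replies in both directions
3.7 Command propagation
- After executing a write command, the master forwards it to the replicas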
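4. Synchronization log
5. Configuration reference
slaveof <masterip> <masterport>
# The source to replicate from
masterauth <master-password>
# Password used to authenticate against the replication source
slave-serve-stale-data yes
# yes: while the link to the master is down or a sync is in progress, the replica still serves client requests; the drawback is that clients may read stale data
# no: the replica rejects client requests with the error "SYNC with master in progress"
slave-read-only yes
# Whether the replica is read-only; writes accepted by a writable replica can make it diverge from the master
repl-timeout 60
# Replication timeout in seconds. It covers:
# - the bulk transfer during SYNC, as seen by the replica
# - master timeout as seen by the replica (no data and no PING received)
# - replica timeout as seen by the master (no REPLCONF ACK received)
repl-disable-tcp-nodelay no
# yes: Redis sends data to replicas in fewer TCP packets and with less bandwidth by packing several writes into one packet; this adds latency, up to 40 ms with the default Linux kernel configuration
# no: packets are less full, but latency is lower
repl-backlog-size 1mb
# Fixed-size buffer on the master. It decides whether a replica needs a full sync after a network interruption: if the data the replica is missing is still in the backlog, an incremental sync is done; if that data has been overwritten or the backlog has been released, a full sync is done
repl-backlog-ttl 3600
# Seconds after the last replica disconnects before the master frees the backlog
slave-priority 100
# Replica priority. When the master fails, Redis Sentinel uses it to pick the replica to promote. A lower value means a higher priority; 0 means the replica is never promoted
# The master stops accepting front-end writes when fewer than min-slaves-to-write replicas have a lag of at most min-slaves-max-lag seconds
min-slaves-to-write 3
# Minimum number of online replicas; the default 0 disables the feature
min-slaves-max-lag 10
# Maximum allowed lag in seconds; a replica lagging more than this does not count as online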
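The interaction of min-slaves-to-write and min-slaves-max-lag can be sketched as a small predicate. This is an illustrative model rather than Redis source code; the function name master_accepts_writes and the list of per-replica lag values are assumptions made for the example.

```python
def master_accepts_writes(replica_lags, min_slaves_to_write=3, min_slaves_max_lag=10):
    """Return True if the master would accept a write.

    replica_lags holds, for each connected replica, the seconds since its
    last REPLCONF ACK (the lag value shown by INFO replication).
    """
    # Setting either option to 0 disables the check.
    if min_slaves_to_write == 0 or min_slaves_max_lag == 0:
        return True
    good_replicas = sum(1 for lag in replica_lags if lag <= min_slaves_max_lag)
    return good_replicas >= min_slaves_to_write


print(master_accepts_writes([0, 1, 2]))    # three replicas within the lag limit
print(master_accepts_writes([0, 1, 30]))   # only two replicas within the lag limit
```

With the values from the configuration above, losing or delaying a single one of three replicas is enough to make the master reject writes, so the threshold is usually set below the total number of replicas.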
6. Synchronization flow
7. Full synchronization
- 7.1 Run slaveof on the replica
415:S 20 Nov 14:17:17.330 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
415:S 20 Nov 14:17:17.331 * SLAVE OF 172.16.10.140:6379 enabled (user request from 'id=4 addr=127.0.0.1:55027 fd=11 name= age=198 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=slaveof')
415:S 20 Nov 14:17:17.586 * Connecting to MASTER 172.16.10.140:6379
415:S 20 Nov 14:17:17.586 * MASTER <-> SLAVE sync started
415:S 20 Nov 14:17:17.586 * Non blocking connect for SYNC fired the event.
415:S 20 Nov 14:17:17.587 * Master replied to PING, replication can continue...
415:S 20 Nov 14:17:17.587 * Trying a partial resynchronization (request 572caecf4c0bf264880b2e3899a3dae52e7704e9:1).
415:S 20 Nov 14:17:17.592 * Full resync from master: 030a3c44c4f64eb9a02c3b36f3891226fc2074fe:0
415:S 20 Nov 14:17:17.592 * Discarding previously cached master state.
415:S 20 Nov 14:17:17.681 * MASTER <-> SLAVE sync: receiving 201 bytes from master
415:S 20 Nov 14:17:17.698 * MASTER <-> SLAVE sync: Flushing old data
415:S 20 Nov 14:17:19.605 * MASTER <-> SLAVE sync: Loading DB in memory
415:S 20 Nov 14:17:19.605 * MASTER <-> SLAVE sync: Finished with success
415:S 20 Nov 14:17:19.606 * Background append only file rewriting started by pid 687
415:S 20 Nov 14:17:19.631 * AOF rewrite child asks to stop sending diffs.
687:C 20 Nov 14:17:19.631 * Parent agreed to stop sending diffs. Finalizing AOF...
687:C 20 Nov 14:17:19.631 * Concatenating 0.00 MB of AOF diff received from parent.
687:C 20 Nov 14:17:19.632 * SYNC append only file rewrite performed
687:C 20 Nov 14:17:19.632 * AOF rewrite: 2 MB of memory used by copy-on-write
415:S 20 Nov 14:17:19.707 * Background AOF rewrite terminated with success
415:S 20 Nov 14:17:19.707 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
415:S 20 Nov 14:17:19.707 * Background AOF rewrite finished successfully
- 7.2 Master log
10110:M 20 Nov 14:17:16.884 * Slave 172.16.10.141:6379 asks for synchronization
10110:M 20 Nov 14:17:16.884 * Partial resynchronization not accepted: Replication ID mismatch (Slave asked for '572caecf4c0bf264880b2e3899a3dae52e7704e9', my replication IDs are '14f0fbdd33f13d8e6d07c13bb0a184ba7a43c258' and 'ff98eda832c57bef003947b34ae024063689ca44')
10110:M 20 Nov 14:17:16.885 * Starting BGSAVE for SYNC with target: disk
10110:M 20 Nov 14:17:16.888 * Background saving started by pid 11565
11565:C 20 Nov 14:17:16.891 * DB saved on disk
11565:C 20 Nov 14:17:16.891 * RDB: 6 MB of memory used by copy-on-write
10110:M 20 Nov 14:17:16.978 * Background saving terminated with success
10110:M 20 Nov 14:17:16.978 * Synchronization with slave 172.16.10.141:6379 succeeded
- 7.3 Master shutdown
# Master log
12519:C 20 Nov 14:22:10.243 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
12519:C 20 Nov 14:22:10.243 # Redis version=4.0.8, bits=64, commit=00000000, modified=0, pid=12519, just started
12519:C 20 Nov 14:22:10.243 # Configuration loaded
12520:M 20 Nov 14:22:10.245 * Increased maximum number of open files to 10032 (it was originally set to 1024).
12520:M 20 Nov 14:22:10.245 # Creating Server TCP listening socket *:6379: bind: Address already in use
10110:M 20 Nov 14:23:36.032 # User requested shutdown...
10110:M 20 Nov 14:23:36.032 * Calling fsync() on the AOF file.
10110:M 20 Nov 14:23:36.032 * Removing the pid file.
10110:M 20 Nov 14:23:36.032 # Redis is now ready to exit, bye bye...
# Replica log
415:S 20 Nov 14:23:36.736 # Connection with master lost.
415:S 20 Nov 14:23:36.736 * Caching the disconnected master state.
415:S 20 Nov 14:23:37.456 * Connecting to MASTER 172.16.10.140:6379
415:S 20 Nov 14:23:37.456 * MASTER <-> SLAVE sync started
415:S 20 Nov 14:23:37.456 # Error condition on socket for SYNC: Connection refused
415:S 20 Nov 14:23:38.458 * Connecting to MASTER 172.16.10.140:6379
415:S 20 Nov 14:23:38.459 * MASTER <-> SLAVE sync started
415:S 20 Nov 14:23:38.459 # Error condition on socket for SYNC: Connection refused
415:S 20 Nov 14:23:39.462 * Connecting to MASTER 172.16.10.140:6379
- 7.4 Master restart
# Replica log
415:S 20 Nov 14:24:39.625 # Error condition on socket for SYNC: Connection refused
415:S 20 Nov 14:24:40.626 * Connecting to MASTER 172.16.10.140:6379
415:S 20 Nov 14:24:40.626 * MASTER <-> SLAVE sync started
415:S 20 Nov 14:24:40.627 * Non blocking connect for SYNC fired the event.
415:S 20 Nov 14:24:40.627 * Master replied to PING, replication can continue...
415:S 20 Nov 14:24:40.628 * Trying a partial resynchronization (request 030a3c44c4f64eb9a02c3b36f3891226fc2074fe:702).
415:S 20 Nov 14:24:40.629 * Full resync from master: 1e1b4acf86e7882c044eb952136e04e5a70b077b:0
415:S 20 Nov 14:24:40.629 * Discarding previously cached master state.
415:S 20 Nov 14:24:40.712 * MASTER <-> SLAVE sync: receiving 216 bytes from master
415:S 20 Nov 14:24:40.712 * MASTER <-> SLAVE sync: Flushing old data
415:S 20 Nov 14:24:40.712 * MASTER <-> SLAVE sync: Loading DB in memory
415:S 20 Nov 14:24:40.712 * MASTER <-> SLAVE sync: Finished with success
415:S 20 Nov 14:24:40.713 * Background append only file rewriting started by pid 1102
415:S 20 Nov 14:24:40.737 * AOF rewrite child asks to stop sending diffs.
1102:C 20 Nov 14:24:40.737 * Parent agreed to stop sending diffs. Finalizing AOF...
1102:C 20 Nov 14:24:40.737 * Concatenating 0.00 MB of AOF diff received from parent.
1102:C 20 Nov 14:24:40.737 * SYNC append only file rewrite performed
1102:C 20 Nov 14:24:40.738 * AOF rewrite: 2 MB of memory used by copy-on-write
415:S 20 Nov 14:24:40.829 * Background AOF rewrite terminated with success
415:S 20 Nov 14:24:40.829 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
415:S 20 Nov 14:24:40.829 * Background AOF rewrite finished successfully
# Master log: the restarted master has a new replication ID, so a full sync is performed
12992:M 20 Nov 14:24:39.924 * Slave 172.16.10.141:6379 asks for synchronization
12992:M 20 Nov 14:24:39.925 * Partial resynchronization not accepted: Replication ID mismatch (Slave asked for '030a3c44c4f64eb9a02c3b36f3891226fc2074fe', my replication IDs are '510ae8234d41a712b9c60fe63a4cf193fc3a9fe2' and '0000000000000000000000000000000000000000')
12992:M 20 Nov 14:24:39.925 * Starting BGSAVE for SYNC with target: disk
12992:M 20 Nov 14:24:39.925 * Background saving started by pid 13002
13002:C 20 Nov 14:24:39.927 * DB saved on disk
13002:C 20 Nov 14:24:39.927 * RDB: 6 MB of memory used by copy-on-write
12992:M 20 Nov 14:24:40.008 * Background saving terminated with success
12992:M 20 Nov 14:24:40.008 * Synchronization with slave 172.16.10.141:6379 succeeded
8. Incremental replication after a disconnect
Replica restart
- 8.1 Shut down the replica
# The master logs the lost connection
12992:M 20 Nov 14:30:33.092 # Connection with slave 172.16.10.141:6379 lost.
- 8.2 The master keeps accepting writes
127.0.0.1:6379> set k11 v11
OK
127.0.0.1:6379> set k22 v22
OK
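- 8.3 Start the replica. A restarted replica also goes through a full sync: the replication ID it asks for after the restart no longer matches the master's, as the logs below show
# Replica log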
1520:S 20 Nov 14:31:55.315 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1520:S 20 Nov 14:31:55.315 * SLAVE OF 172.16.10.140:6379 enabled (user request from 'id=2 addr=127.0.0.1:55195 fd=10 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=slaveof')
1520:S 20 Nov 14:31:55.712 * Connecting to MASTER 172.16.10.140:6379
1520:S 20 Nov 14:31:55.712 * MASTER <-> SLAVE sync started
1520:S 20 Nov 14:31:55.712 * Non blocking connect for SYNC fired the event.
1520:S 20 Nov 14:31:55.712 * Master replied to PING, replication can continue...
1520:S 20 Nov 14:31:55.713 * Trying a partial resynchronization (request 3a389f3b7dc9e3a394e6fdac5b7028e59aa635a8:1).
1520:S 20 Nov 14:31:55.715 * Full resync from master: 1e1b4acf86e7882c044eb952136e04e5a70b077b:575
1520:S 20 Nov 14:31:55.715 * Discarding previously cached master state.
1520:S 20 Nov 14:31:55.784 * MASTER <-> SLAVE sync: receiving 235 bytes from master
1520:S 20 Nov 14:31:55.785 * MASTER <-> SLAVE sync: Flushing old data
1520:S 20 Nov 14:31:55.785 * MASTER <-> SLAVE sync: Loading DB in memory
1520:S 20 Nov 14:31:55.785 * MASTER <-> SLAVE sync: Finished with success
1520:S 20 Nov 14:31:55.786 * Background append only file rewriting started by pid 1533
1520:S 20 Nov 14:31:55.809 * AOF rewrite child asks to stop sending diffs.
1533:C 20 Nov 14:31:55.809 * Parent agreed to stop sending diffs. Finalizing AOF...
1533:C 20 Nov 14:31:55.809 * Concatenating 0.00 MB of AOF diff received from parent.
1533:C 20 Nov 14:31:55.809 * SYNC append only file rewrite performed
1533:C 20 Nov 14:31:55.809 * AOF rewrite: 6 MB of memory used by copy-on-write
1520:S 20 Nov 14:31:55.812 * Background AOF rewrite terminated with success
1520:S 20 Nov 14:31:55.812 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
1520:S 20 Nov 14:31:55.812 * Background AOF rewrite finished successfully
# Master log
12992:M 20 Nov 14:31:55.010 * Slave 172.16.10.141:6379 asks for synchronization
12992:M 20 Nov 14:31:55.010 * Partial resynchronization not accepted: Replication ID mismatch (Slave asked for '3a389f3b7dc9e3a394e6fdac5b7028e59aa635a8', my replication IDs are '1e1b4acf86e7882c044eb952136e04e5a70b077b' and '0000000000000000000000000000000000000000')
12992:M 20 Nov 14:31:55.010 * Starting BGSAVE for SYNC with target: disk
12992:M 20 Nov 14:31:55.011 * Background saving started by pid 14369
14369:C 20 Nov 14:31:55.013 * DB saved on disk
14369:C 20 Nov 14:31:55.013 * RDB: 6 MB of memory used by copy-on-write
12992:M 20 Nov 14:31:55.081 * Background saving terminated with success
12992:M 20 Nov 14:31:55.081 * Synchronization with slave 172.16.10.141:6379 succeeded
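Replica loses the network, then syncs incrementally (the missing data is still in the backlog)
- 1. While the replica is offline, the master keeps accepting writes
# slave
systemctl stop network && sleep 60 && systemctl start network &
- 2. After the replica comes back online
# Master log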
12992:M 20 Nov 15:17:37.019 # Disconnecting timedout slave: 172.16.10.141:6379
12992:M 20 Nov 15:17:37.019 # Connection with slave 172.16.10.141:6379 lost.
12992:M 20 Nov 15:17:38.092 * Slave 172.16.10.141:6379 asks for synchronization
12992:M 20 Nov 15:17:38.093 * Partial resynchronization request from 172.16.10.141:6379 accepted. Sending 165 bytes of backlog starting from offset 4388.
# Replica log
1705:S 20 Nov 15:17:38.792 # MASTER timeout: no data nor PING received...
1705:S 20 Nov 15:17:38.793 # Connection with master lost.
1705:S 20 Nov 15:17:38.793 * Caching the disconnected master state.
1705:S 20 Nov 15:17:38.793 * Connecting to MASTER 172.16.10.140:6379
1705:S 20 Nov 15:17:38.794 * MASTER <-> SLAVE sync started
1705:S 20 Nov 15:17:38.795 * Non blocking connect for SYNC fired the event.
1705:S 20 Nov 15:17:38.795 * Master replied to PING, replication can continue...
1705:S 20 Nov 15:17:38.795 * Trying a partial resynchronization (request 1e1b4acf86e7882c044eb952136e04e5a70b077b:4388).
1705:S 20 Nov 15:17:38.796 * Successful partial resynchronization with master.
1705:S 20 Nov 15:17:38.796 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.
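The difference between the rejected and the accepted requests in the logs above comes down to two checks on the master: the replication ID must match, and the requested offset must still be inside the backlog. The following is a simplified sketch of that decision, not the Redis implementation; the class and function names are hypothetical, and the backlog length 4552 is an assumed value chosen so that 165 bytes remain after offset 4388, as in the log.

```python
from dataclasses import dataclass


@dataclass
class MasterReplState:
    replid: str                     # current replication ID
    replid2: str                    # previous replication ID (after a failover)
    second_repl_offset: int         # replid2 is valid up to this offset
    backlog_first_byte_offset: int  # offset of the oldest byte in the backlog
    backlog_histlen: int            # number of bytes currently in the backlog


def can_partial_resync(m: MasterReplState, req_replid: str, req_offset: int) -> bool:
    """Decide whether a 'PSYNC <replid> <offset>' request can be served from the backlog."""
    same_history = req_replid == m.replid or (
        req_replid == m.replid2 and req_offset <= m.second_repl_offset
    )
    if not same_history:
        return False  # replication ID mismatch: full resync
    first = m.backlog_first_byte_offset
    return first <= req_offset <= first + m.backlog_histlen


def backlog_bytes_to_send(m: MasterReplState, req_offset: int) -> int:
    """Bytes of backlog the master sends after accepting a partial resync."""
    return m.backlog_first_byte_offset + m.backlog_histlen - req_offset


master = MasterReplState(
    replid="1e1b4acf86e7882c044eb952136e04e5a70b077b",
    replid2="0" * 40,
    second_repl_offset=-1,
    backlog_first_byte_offset=1,
    backlog_histlen=4552,  # assumed value, see the lead-in
)

print(can_partial_resync(master, master.replid, 4388))
print(backlog_bytes_to_send(master, 4388))
print(can_partial_resync(master, "3a389f3b7dc9e3a394e6fdac5b7028e59aa635a8", 1))
```

The last call models the restarted replica from 8.3: its replication ID is unknown to the master, so the request falls through to a full resync regardless of the offset.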
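9. Does restarting the master trigger another full sync?
- Yes. The restarted master has a new replication ID (run_id), so the replica's partial resync request is rejected and a full sync follows
10. Heartbeat
By default the replica sends a heartbeat command to the master once per second: REPLCONF ACK <replication_offset> (the 10-second interval, repl-ping-slave-period, applies to the PINGs the master sends to its replicas)
- The heartbeat reveals the state of the network. The lag field shown by INFO is the replication delay in seconds and is normally 0 or 1
- The heartbeat carries the replica's current replication offset. If commands sent by the master were lost, this frequent heartbeat lets the master notice the stale offset quickly and resend the missing commands
- The heartbeat is what makes the min-slaves feature possible: when the replication state is unhealthy, the master refuses writes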
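The lag and offset reported through the heartbeat show up in the slaveN lines of INFO replication. Below is a minimal sketch of parsing such a line and deriving how many bytes the replica is behind; the function name parse_replica_line is an assumption, and the sample line and master offset are the ones that appear in the INFO output in section 13.1.

```python
def parse_replica_line(line: str) -> dict:
    """Parse a 'slaveN:ip=...,port=...,state=...,offset=...,lag=...' INFO line."""
    _, _, fields = line.partition(":")
    info = dict(field.split("=", 1) for field in fields.split(","))
    for key in ("port", "offset", "lag"):
        info[key] = int(info[key])
    return info


replica = parse_replica_line(
    "slave0:ip=172.16.10.141,port=6379,state=online,offset=5266,lag=0"
)
master_repl_offset = 5266  # master_repl_offset from the same INFO output

# Seconds since the last REPLCONF ACK, and bytes not yet acknowledged.
bytes_behind = master_repl_offset - replica["offset"]
print(replica["lag"], bytes_behind)
```

A monitoring script could raise an alert when lag stays above a few seconds or when bytes_behind keeps growing between samples.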
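11. What problems Redis high availability has to solve
- 1. Several nodes hold the same data
- Replication
- 2. How a new master is produced when the master goes down
- 3. How replicas connect to the new master automatically after the master goes down
- 4. How to decide that the master is down
- 5. What to do with the old master once it recovers
- 6. How to monitor the health of all Redis nodes
12. What Sentinel is
- 1. It ships as part of the Redis program itself
- 2. Main functions
- 2.1 Monitoring: checks the health of Redis nodes
- 2.2 Notification: reports detected changes to related systems or Redis instances through the Redis publish/subscribe mechanism
- 2.3 Automatic failover: when the master goes down, a new master is elected
- 2.4 Configuration management: Redis instances can obtain some shared information through Sentinel
- 3. Sentinel is itself distributed, which removes it as a single point of failure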
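12.1 Installing and configuring Sentinel
- 1. Copy the configuration for the slave
port 6380
logfile "/home/liubx/redisdata/slave1/logs/redis.log"
pidfile /var/run/redis.pid
# The pidfile path must differ from the master's
dir /home/liubx/redisdata/slave1
slaveof localhost 6379
- 2. Sentinel configuration
The Redis installation directory contains a configuration file named sentinel.conf
daemonize yes
logfile "/home/liubx/sentinel/sentinel.log"
sentinel monitor mymaster 127.0.0.1 6379 1
# Master name, IP, port, quorum
# One Sentinel can monitor several masters
- 3. Start Sentinel with either of the following commands
- redis-sentinel ../sentinel.conf
- redis-server ../sentinel.conf --sentinel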
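12.2 HA steps
- 1. Subjectively decide that the master is down
- 2. Objectively decide that the master is down
- 3. The Sentinels elect the one that will carry out the failover (several Sentinels monitor the same master)
- 4. Failover
- Pick the new master
- Repoint the remaining replicas
- Turn the old master into a replica
12.3 Subjective down
1. By default Sentinel sends PING once per second to check whether the related nodes are online
- Targets: the master, the master's replicas, and the other Sentinels
- A reply of +PONG, -LOADING, or -MASTERDOWN means the node is online; any other reply, or no reply, means it is not
2. If the PING replies are invalid for a configured period, the node is marked subjectively down
- The period is set with sentinel down-after-milliseconds master 50000, in milliseconds
- The setting affects not only the master but also all of its replicas and the other Sentinels monitoring the same master
- Example: the master is 1.1, this Sentinel is 1.2, replicas 1.3 and 1.4 both replicate from 1.1, and another Sentinel at 1.5 also monitors 1.1. If Sentinel 1.2 is configured with 10000 milliseconds, it uses 10000 milliseconds to judge 1.1, 1.3, 1.4, and 1.5 subjectively down
- Different Sentinels may use different values for this setting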
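The subjective-down rule above can be sketched as a small tracker that remembers when the last valid reply arrived. This is a simplified model under stated assumptions, not Sentinel's code: the class name PingTracker is hypothetical, and times are plain millisecond integers supplied by the caller.

```python
# Replies that count as "the node is online".
ONLINE_REPLY_PREFIXES = ("+PONG", "-LOADING", "-MASTERDOWN")


def is_valid_reply(reply) -> bool:
    """None models a missing reply; any other string is checked by prefix."""
    return reply is not None and reply.startswith(ONLINE_REPLY_PREFIXES)


class PingTracker:
    def __init__(self, down_after_ms: int, now_ms: int):
        self.down_after_ms = down_after_ms
        self.last_valid_ms = now_ms

    def record(self, reply, now_ms: int) -> None:
        # Only valid replies refresh the timestamp.
        if is_valid_reply(reply):
            self.last_valid_ms = now_ms

    def is_subjectively_down(self, now_ms: int) -> bool:
        return now_ms - self.last_valid_ms > self.down_after_ms


tracker = PingTracker(down_after_ms=10000, now_ms=0)
tracker.record("+PONG", 1000)
tracker.record(None, 6000)            # no reply
tracker.record("-ERR unknown", 7000)  # invalid reply
print(tracker.is_subjectively_down(11000), tracker.is_subjectively_down(11001))
```

Because the same tracker logic applies to masters, replicas, and peer Sentinels, one down-after-milliseconds value governs all of them, which matches the example with nodes 1.1 to 1.5.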
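12.4 Objective down
When the number of Sentinels that judge the master down (this Sentinel included) reaches a configured threshold, this Sentinel marks the master as objectively down
- The threshold is the last argument (the quorum) of sentinel monitor <master-name> <ip> <port> <quorum>
Sentinels open connections to each other and send commands to learn the other Sentinels' verdicts
- They send sentinel is-master-down-by-addr <ip> <port> <current_epoch> <runid>
- ip: IP address of the master that this Sentinel judged subjectively down
- port: port of that master
- current_epoch: the configuration epoch, which can be thought of as an election round counter
- runid: a Sentinel run ID. The value * means the command only asks whether the master is down; a concrete ID means the sender is asking for a vote in the leader election
- A Sentinel that receives the command replies with three values
- down_state: 1 means the master is down, 0 means it is not
- leader_runid: * means the reply only concerns the down state; a concrete value is the run ID of the Sentinel that the replier voted for as leader
- leader_epoch: when leader_runid is a concrete run ID, the configuration epoch of that vote, similar to a configuration version; when leader_runid is *, this value is 0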
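Given the down_state values collected from the other Sentinels, the objective-down decision reduces to a count against the quorum. A minimal sketch, assuming the asking Sentinel counts its own verdict; the function name is hypothetical.

```python
def is_objectively_down(own_sdown: bool, peer_down_states, quorum: int) -> bool:
    """peer_down_states holds the down_state field (1 or 0) from each peer reply."""
    if not own_sdown:
        return False  # a Sentinel only asks its peers after its own sdown verdict
    agreeing = 1 + sum(1 for state in peer_down_states if state == 1)
    return agreeing >= quorum


# Three Sentinels, quorum 2: one peer agrees, one does not.
print(is_objectively_down(True, [1, 0], quorum=2))
# Quorum 1: the Sentinel's own verdict is enough.
print(is_objectively_down(True, [], quorum=1))
```

The second call corresponds to a quorum of 1, where a single Sentinel's verdict already reaches the threshold.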
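12.5 Electing the leader Sentinel
- Any Sentinel that finds the master objectively down may start an election
- A Sentinel casts only one vote per election round, first come first served
- After each round, successful or not, the round counter (the epoch) is incremented by one
- A Sentinel that collects more than half of the votes becomes the leader and carries out the failover
12.6 Election example
Scenario: three Sentinels numbered 1, 2, and 3; the master is at 192.168.1.110, port 6379
Steps:
- Sentinel 1 is the first to mark the master subjectively down
- Sentinel 1 sends sentinel is-master-down-by-addr 192.168.1.110 6379 1 * to Sentinels 2 and 3
- From the replies, Sentinel 1 sees that the condition for objective down is met
- Sentinel 1 starts an election by sending sentinel is-master-down-by-addr 192.168.1.110 6379 1 ab12cd34 (its own run ID) to Sentinels 2 and 3
- Sentinel 2 has not voted yet in this epoch and Sentinel 1's request arrives first, so it votes for Sentinel 1 and replies with 1, ab12cd34, 1: the master is down, the elected Sentinel's run ID is ab12cd34, and the election epoch is 1
- After Sentinel 2's reply, Sentinel 1 holds more than half of the votes, so it becomes the leader and carries out the failover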
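The "one vote per epoch, first come first served" rule and the majority check from the example can be sketched as follows. The class and function names are hypothetical, the run IDs other than ab12cd34 are made up for the example, and the sketch leaves out networking and timeouts.

```python
class SentinelVoter:
    """Tracks which candidate this Sentinel voted for in each epoch."""

    def __init__(self):
        self.votes = {}  # epoch -> run ID of the candidate voted for

    def request_vote(self, epoch: int, candidate_runid: str) -> str:
        # The first request seen in an epoch wins; later ones get the same answer.
        return self.votes.setdefault(epoch, candidate_runid)


def has_majority(votes_received: int, total_sentinels: int) -> bool:
    return votes_received > total_sentinels // 2


sentinel2 = SentinelVoter()
first = sentinel2.request_vote(1, "ab12cd34")   # Sentinel 1 asks first
second = sentinel2.request_vote(1, "ef56ab78")  # a later candidate, same epoch
print(first, second)

# Sentinel 1 holds its own vote plus Sentinel 2's, out of three Sentinels.
print(has_majority(2, 3))
```

Returning the already chosen run ID to a late candidate is what tells that candidate it lost the round, so it waits for the next epoch instead of starting a competing failover.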
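12.7 Failover
1. Pick the new master
- Discard replicas of the old master that are themselves marked down
- Discard replicas that have not replied to Sentinel's INFO command within the last 5 seconds
- Discard replicas whose link to the master has been down for more than down-after-milliseconds*10 milliseconds
- Sort the remaining replicas by slave-priority; a lower value means a higher priority and is preferred
- On equal priority, sort by replication offset; a larger offset means more recent data
- Send slaveof no one to the chosen replica to change its role
- Send INFO once per second; when the reply contains role:master, the promotion has succeeded
2. Repoint the other replicas
- Send the slaveof command, pointing at the new master, to the other replicas
3. Turn the old master into a replica
- The old master is offline, so nothing can be done to it right away; Sentinel records in its own state that the old master is now a replica and sends it slaveof once it reconnects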
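The filtering and ordering rules for choosing the new master can be sketched as a filter followed by a sort key. This is an illustrative model, not Sentinel's source; the Replica fields and function name are assumptions. Two details go beyond the list above and reflect documented Sentinel behavior: a slave-priority of 0 excludes a replica, and a remaining tie is broken by the lexicographically smaller run ID.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    is_down: bool             # marked down by Sentinel
    info_reply_age_ms: int    # time since the last INFO reply
    master_link_down_ms: int  # how long the link to the master has been down
    priority: int             # slave-priority, lower is preferred
    repl_offset: int          # replication offset, higher is preferred
    run_id: str


def select_new_master(replicas: List[Replica], down_after_ms: int) -> Optional[Replica]:
    candidates = [
        r for r in replicas
        if not r.is_down
        and r.info_reply_age_ms <= 5000
        and r.master_link_down_ms <= down_after_ms * 10
        and r.priority != 0
    ]
    if not candidates:
        return None
    # Lower priority value first, then larger offset, then smaller run ID.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


replicas = [
    Replica("a", False, 1000, 0, 100, 5000, "aaa"),
    Replica("b", False, 1000, 0, 100, 5200, "bbb"),  # same priority, newer data
    Replica("c", True, 1000, 0, 10, 9999, "ccc"),    # best priority, but down
    Replica("d", False, 9000, 0, 10, 9999, "ddd"),   # INFO reply too old
]
chosen = select_new_master(replicas, down_after_ms=30000)
print(chosen.name)
```

Replicas c and d would win on priority alone, which is why the health filters run before the sort.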
13. Sentinel in practice
13.1
- 1) Current master-replica state
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=172.16.10.141,port=6379,state=online,offset=5266,lag=0
master_replid:1e1b4acf86e7882c044eb952136e04e5a70b077b
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:5266
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:5266
- 2) Configure Sentinel on both nodes
vi /usr/local/redis/etc/sentinel.conf
dir "/usr/local/redis/work"
logfile "/usr/local/redis/sentinel.log"
daemonize yes
protected-mode no
sentinel monitor mymaster 172.16.3.140 6379 1
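# The name mymaster is arbitrary, but the sentinel monitor line that defines it must come before any line that refers to it; otherwise Sentinel reports that the name cannot be found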
sentinel auth-pass mymaster foobared
- 3) Start Sentinel monitoring
redis-sentinel /usr/local/redis/etc/sentinel.conf
25401:X 20 Nov 15:30:06.428 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
25401:X 20 Nov 15:30:06.428 # Redis version=4.0.8, bits=64, commit=00000000, modified=0, pid=25401, just started
25401:X 20 Nov 15:30:06.428 # Configuration loaded
25402:X 20 Nov 15:30:06.430 * Increased maximum number of open files to 10032 (it was originally set to 1024).
25402:X 20 Nov 15:30:06.431 * Running mode=sentinel, port=26379.
25402:X 20 Nov 15:30:06.431 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
25402:X 20 Nov 15:30:06.432 # Sentinel ID is 20c0b9c989a852c87c59d913cd1c17c5b7bc2414
25402:X 20 Nov 15:30:06.432 # +monitor master mymaster 172.16.10.140 6379 quorum 1
25402:X 20 Nov 15:30:06.433 * +slave slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379
25402:X 20 Nov 15:30:06.902 * +sentinel sentinel ff661bc57580186ec6bd2c5162925381e0eef451 172.16.10.141 26379 @ mymaster 172.16.10.140 6379
5778:X 20 Nov 15:30:03.530 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
5778:X 20 Nov 15:30:03.530 # Redis version=4.0.8, bits=64, commit=00000000, modified=0, pid=5778, just started
5778:X 20 Nov 15:30:03.530 # Configuration loaded
5779:X 20 Nov 15:30:03.532 * Increased maximum number of open files to 10032 (it was originally set to 1024).
5779:X 20 Nov 15:30:03.533 * Running mode=sentinel, port=26379.
5779:X 20 Nov 15:30:03.534 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
5779:X 20 Nov 15:30:03.535 # Sentinel ID is ff661bc57580186ec6bd2c5162925381e0eef451
5779:X 20 Nov 15:30:03.535 # +monitor master mymaster 172.16.10.140 6379 quorum 1
5779:X 20 Nov 15:30:03.537 * +slave slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379
5779:X 20 Nov 15:30:09.198 * +sentinel sentinel 20c0b9c989a852c87c59d913cd1c17c5b7bc2414 172.16.10.140 26379 @ mymaster 172.16.10.140 6379
- 4) Shut down the master 172.16.10.140
# After a moment the slave becomes the master
5779:X 20 Nov 15:32:42.152 # +sdown master mymaster 172.16.10.140 6379
5779:X 20 Nov 15:32:42.152 # +odown master mymaster 172.16.10.140 6379 #quorum 1/1
5779:X 20 Nov 15:32:42.152 # +new-epoch 1
5779:X 20 Nov 15:32:42.152 # +try-failover master mymaster 172.16.10.140 6379
5779:X 20 Nov 15:32:42.153 # +vote-for-leader ff661bc57580186ec6bd2c5162925381e0eef451 1
5779:X 20 Nov 15:32:42.155 # 20c0b9c989a852c87c59d913cd1c17c5b7bc2414 voted for ff661bc57580186ec6bd2c5162925381e0eef451 1
5779:X 20 Nov 15:32:42.253 # +elected-leader master mymaster 172.16.10.140 6379
5779:X 20 Nov 15:32:42.253 # +failover-state-select-slave master mymaster 172.16.10.140 6379
5779:X 20 Nov 15:32:42.308 # +selected-slave slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379
5779:X 20 Nov 15:32:42.308 * +failover-state-send-slaveof-noone slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379
5779:X 20 Nov 15:32:42.366 * +failover-state-wait-promotion slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379
5779:X 20 Nov 15:32:43.095 # +promoted-slave slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379
5779:X 20 Nov 15:32:43.095 # +failover-state-reconf-slaves master mymaster 172.16.10.140 6379
5779:X 20 Nov 15:32:43.147 # +failover-end master mymaster 172.16.10.140 6379
5779:X 20 Nov 15:32:43.147 # +switch-master mymaster 172.16.10.140 6379 172.16.10.141 6379
5779:X 20 Nov 15:32:43.147 * +slave slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379
5779:X 20 Nov 15:33:13.204 # +sdown slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379
# Sentinel log on the original master host
25402:X 20 Nov 15:32:41.451 # +new-epoch 1
25402:X 20 Nov 15:32:41.452 # +vote-for-leader ff661bc57580186ec6bd2c5162925381e0eef451 1
25402:X 20 Nov 15:32:41.459 # +sdown master mymaster 172.16.10.140 6379
25402:X 20 Nov 15:32:41.459 # +odown master mymaster 172.16.10.140 6379 #quorum 1/1
25402:X 20 Nov 15:32:41.459 # Next failover delay: I will not start a failover before Tue Nov 20 15:38:42 2018
25402:X 20 Nov 15:32:42.445 # +config-update-from sentinel ff661bc57580186ec6bd2c5162925381e0eef451 172.16.10.141 26379 @ mymaster 172.16.10.140 6379
25402:X 20 Nov 15:32:42.445 # +switch-master mymaster 172.16.10.140 6379 172.16.10.141 6379
25402:X 20 Nov 15:32:42.446 * +slave slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379
25402:X 20 Nov 15:33:12.473 # +sdown slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379
- 5) The master role has moved to one of the slaves
- 6) Redis log on the new master
1705:S 20 Nov 15:32:41.912 * MASTER <-> SLAVE sync started
1705:S 20 Nov 15:32:41.912 # Error condition on socket for SYNC: Connection refused
1705:M 20 Nov 15:32:42.366 # Setting secondary replication ID to 1e1b4acf86e7882c044eb952136e04e5a70b077b, valid up to offset: 22832. New replication ID is 6e7a0afb3aa5dbfc2c5b6c4f78afe8a9f0d0035c
1705:M 20 Nov 15:32:42.366 * Discarding previously cached master state.
1705:M 20 Nov 15:32:42.366 * MASTER MODE enabled (user request from 'id=60 addr=172.16.10.141:55503 fd=11 name=sentinel-ff661bc5-cmd age=159 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')
1705:M 20 Nov 15:32:42.367 # CONFIG REWRITE executed with success.
- 7) Start the failed master again
# Original master
25402:X 20 Nov 15:50:25.990 # -sdown slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379
25402:X 20 Nov 15:50:35.967 * +convert-to-slave slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379
# New master
5779:X 20 Nov 15:50:26.744 # -sdown slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379
- 8) sentinel.conf is rewritten automatically
dir "/usr/local/redis/work"
logfile "/usr/local/redis/work/sentinel.log"
daemonize yes
protected-mode no
sentinel myid 20c0b9c989a852c87c59d913cd1c17c5b7bc2414
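# The name mymaster is arbitrary, but the sentinel monitor line that defines it must come before any line that refers to it; otherwise Sentinel reports that the name cannot be found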
sentinel monitor mymaster 172.16.10.141 6379 1
# Generated by CONFIG REWRITE
port 26379
sentinel auth-pass mymaster foobared
sentinel config-epoch mymaster 1
sentinel leader-epoch mymaster 1
sentinel known-slave mymaster 172.16.10.140 6379
sentinel known-sentinel mymaster 172.16.10.141 26379 ff661bc57580186ec6bd2c5162925381e0eef451
sentinel current-epoch 1
dir "/usr/local/redis/work"
logfile "/usr/local/redis/work/sentinel.log"
daemonize yes
protected-mode no
sentinel myid ff661bc57580186ec6bd2c5162925381e0eef451
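# The name mymaster is arbitrary, but the sentinel monitor line that defines it must come before any line that refers to it; otherwise Sentinel reports that the name cannot be found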
sentinel monitor mymaster 172.16.10.141 6379 1
# Generated by CONFIG REWRITE
port 26379
sentinel auth-pass mymaster foobared
sentinel config-epoch mymaster 1
sentinel leader-epoch mymaster 1
sentinel known-slave mymaster 172.16.10.140 6379
sentinel known-sentinel mymaster 172.16.10.140 26379 20c0b9c989a852c87c59d913cd1c17c5b7bc2414
sentinel current-epoch 1
- 9) Note
21239:X 29 Mar 16:43:12.722 # +try-failover master mymaster 172.16.3.140 6379
21239:X 29 Mar 16:43:12.724 # +vote-for-leader 863c1c8c627415dbc3004deb529d27df2299c2df 95
21239:X 29 Mar 16:43:23.438 # -failover-abort-not-elected master mymaster 172.16.3.140 6379
21239:X 29 Mar 16:43:23.497 # Next failover delay: I will not start a failover before Thu Mar 29 16:49:13 2018
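If failover cannot complete after the master is stopped, as in the log above, one of the two fixes below applies (I used the first)
1) If the Redis instance is not configured with
protected-mode yes
bind 192.168.98.136
add the following to the Sentinel configuration file
protected-mode no
2) If the Redis instance is configured with
protected-mode yes
bind 192.168.98.136
add the following to the Sentinel configuration file
protected-mode yes
bind 192.168.98.136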