An analysis of pgpool-II's master-slave mode
Polishing the pearls of technology, practicing the way of data, pursuing outstanding value
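Description of the symptom:
A customer wrote in to ask why the following happens when Pgpool-II runs in master-slave mode:
The connection between one pgpool-II child process and the slave DB node is cut by the L4SW (layer-4 switch) after a long idle period, yet no failover takes place. Meanwhile the commit sent to the master DB node has already taken effect, but an error is returned to the client right away.
In short, this happens because Pgpool-II was developed without considering the case where the private connection of a single child process is deliberately cut.
In that situation, if fail_over_on_backend_error is true, the failover procedure is also triggered.
If fail_over_on_backend_error is false, and the pgpool-II main process keeps running its health check and can still reach the slave DB node normally, the failover procedure is not triggered.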
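The two cases above can be condensed into a small decision function. This is only a sketch of the behaviour described in the text: `outcome_t`, `on_child_backend_read_error` and its parameters are hypothetical names chosen for illustration and are not part of pgpool-II.

```c
#include <stdbool.h>

/* Hypothetical outcomes, for illustration only (not pgpool-II identifiers). */
typedef enum {
    OUTCOME_FAILOVER,        /* the failover procedure is triggered        */
    OUTCOME_CHILD_EXIT_ONLY  /* only the affected child process terminates */
} outcome_t;

/*
 * Models what follows when ONE child process hits a read error on its own
 * connection to a backend node.
 *
 * fail_over_on_backend_error: value of the configuration parameter.
 * health_check_ok: whether the main process's health check still reaches
 *                  the node (assumption: a failing health check leads to
 *                  failover through its own, separate path).
 */
outcome_t on_child_backend_read_error(bool fail_over_on_backend_error,
                                      bool health_check_ok)
{
    if (fail_over_on_backend_error)
        return OUTCOME_FAILOVER;      /* the child itself reports the node */
    if (!health_check_ok)
        return OUTCOME_FAILOVER;      /* the health check notices instead  */
    return OUTCOME_CHILD_EXIT_ONLY;   /* the case the customer observed    */
}
```

With `fail_over_on_backend_error` set to false and a healthy slave node, the function yields `OUTCOME_CHILD_EXIT_ONLY`, which matches the customer's report: an error on the client side, but no failover.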
Now the code.
Source code outline A:
- /*
- * child main loop
- */
- void do_child(int unix_fd, int inet_fd)
- {
- …
- for (;;)
- {
- …
- /* perform accept() */
- frontend = do_accept(unix_fd, inet_fd, &timeout);
- if (frontend == NULL) /* connection request from frontend timed out */
- {
- /* check select() timeout */
- if (connected && pool_config->child_life_time > 0 &&
- timeout.tv_sec == 0 && timeout.tv_usec == 0)
- {
- pool_debug("child life %d seconds expired", pool_config->child_life_time);
- /*
- * Doesn't need to call this. child_exit() calls it.
- * send_frontend_exits();
- */
- child_exit();
- }
- continue;
- }
- …
- /*
- * Ok, negotiaton with frontend has been done. Let's go to the
- * next step. Connect to backend if there's no existing
- * connection which can be reused by this frontend.
- * Authentication is also done in this step.
- */
- …
- /*
- * if there's no connection associated with user and database,
- * we need to connect to the backend and send the startup packet.
- */
- /* look for existing connection */
- found = 0;
- backend = pool_get_cp(sp->user, sp->database, sp->major, );
- …
- /* Mark this connection pool is conncted from frontend */
- pool_coninfo_set_frontend_connected(pool_get_process_context()->proc_id, pool_pool_index());
- /* query process loop */
- for (;;)
- {
- POOL_STATUS status;
- status = pool_process_query(frontend, backend, 0);
- sp = MASTER_CONNECTION(backend)->sp;
- switch (status)
- {
- …
- }
- if (status != POOL_CONTINUE)
- break;
- }
- …
- }
- child_exit();
- }
- /*
- * Main module for query processing
- * reset_request: if non 0, call reset_backend to execute reset queries
- */
- POOL_STATUS pool_process_query(POOL_CONNECTION *frontend,
- POOL_CONNECTION_POOL *backend,
- int reset_request)
- {
- …
- for (;;)
- {
- …
- /*
- * If we are prcessing query, process it.
- */
- if (pool_is_query_in_progress())
- {
- status = ProcessBackendResponse(frontend, backend, &state, &num_fields);
- if (status != POOL_CONTINUE)
- return status;
- }
- /*
- * If frontend and all backends do not have any pending data in
- * the receiving data cache, then issue select(2) to wait for new
- * data arrival
- */
- else if (is_cache_empty(frontend, backend))
- {
- bool cont = true;
- ① status = read_packets_and_process(frontend, backend, reset_request,
- &state, &num_fields, &cont);
- if (status != POOL_CONTINUE)
- return status;
- else if (!cont) /* Detected admin shutdown */
- return status;
- }
- else
- {
- …
- }
- …
- }
- return POOL_CONTINUE;
- }
- /*
- * Read packet from either frontend or backend and process it.
- */
- static POOL_STATUS read_packets_and_process(POOL_CONNECTION *frontend,
- POOL_CONNECTION_POOL *backend, int reset_request, int *state, short *num_fields, bool *cont)
- {
- …
- if (!reset_request)
- {
- if (FD_ISSET(frontend->fd, &exceptmask))
- return POOL_END;
- else if (FD_ISSET(frontend->fd, &readmask))
- {
- ② status = ProcessFrontendResponse(frontend, backend);
- if (status != POOL_CONTINUE)
- return status;
- }
- }
- …
- return POOL_CONTINUE;
- }
- POOL_STATUS ProcessFrontendResponse(POOL_CONNECTION *frontend,
- POOL_CONNECTION_POOL *backend)
- {
- …
- switch (fkind)
- {
- …
- case 'X': /* Terminate */
- free(contents);
- return POOL_END;
- case 'Q': /* Query */
- allow_close_transaction = ;
- ③ status = SimpleQuery(frontend, backend, len, contents);
- break;
- …
- default:
- pool_error("ProcessFrontendResponse: unknown message type %c(%02x)", fkind, fkind);
- status = POOL_ERROR;
- }
- free(contents);
- if (status != POOL_CONTINUE)
- status = POOL_ERROR;
- return status;
- }
- /*
- * Process Query('Q') message
- * Query messages include an SQL string.
- */
- POOL_STATUS SimpleQuery(POOL_CONNECTION *frontend,
- POOL_CONNECTION_POOL *backend, int len, char *contents)
- {
- …
- /* log query to log file if necessary */
- if (pool_config->log_statement)
- {
- pool_log("statement: %s", contents);
- }
- else
- {
- pool_debug("statement2: %s", contents);
- }
- …
- if (parse_tree_list != NIL)
- {
- …
- /*
- * Decide where to send query
- */
- ④ pool_where_to_send(query_context, query_context->original_query,
- query_context->parse_tree);
- …
- }
- …
- /* switch memory context */
- pool_memory_context_switch_to(old_context);
- return POOL_CONTINUE;
- }
- /*
- * Decide where to send queries(thus expecting response)
- */
- void pool_where_to_send(POOL_QUERY_CONTEXT *query_context, char *query, Node *node)
- {
- …
- /*
- * In raw mode, we send only to master node. Simple enough.
- */
- if (RAW_MODE)
- {
- pool_set_node_to_be_sent(query_context, REAL_MASTER_NODE_ID);
- }
- else if (MASTER_SLAVE && query_context->is_multi_statement)
- {
- …
- }
- else if (MASTER_SLAVE)
- {
- POOL_DEST dest;
- POOL_MEMORY_POOL *old_context;
- old_context = pool_memory_context_switch_to(query_context->memory_context);
- ⑤ dest = send_to_where(node, query);
- pool_memory_context_switch_to(old_context);
- pool_debug("send_to_where: %d query: %s", dest, query);
- /* Should be sent to primary only? */
- if (dest == POOL_PRIMARY)
- {
- pool_set_node_to_be_sent(query_context, PRIMARY_NODE_ID);
- }
- /* Should be sent to both primary and standby? */
- else if (dest == POOL_BOTH)
- {
- pool_setall_node_to_be_sent(query_context);
- }
- /*
- * Ok, we might be able to load balance the SELECT query.
- */
- else
- {
- …
- }
- }
- else if (REPLICATION || PARALLEL_MODE)
- {
- …
- }
- else
- {
- pool_error("pool_where_to_send: unknown mode");
- return;
- }
- …
- return;
- }
- /*
- * From syntactically analysis decide the statement to be sent to the
- * primary, the standby or either or both in master/slave+HR/SR mode.
- */
- static POOL_DEST send_to_where(Node *node, char *query)
- {
- if (bsearch(&nodeTag(node), nodemap, sizeof(nodemap)/sizeof(nodemap[0]),
- sizeof(NodeTag), compare) != NULL)
- {
- /*
- * SELECT INTO
- * SELECT FOR SHARE or UPDATE
- */
- if (IsA(node, SelectStmt))
- {
- /* SELECT INTO or SELECT FOR SHARE or UPDATE ? */
- if (pool_has_insertinto_or_locking_clause(node))
- return POOL_PRIMARY;
- return POOL_EITHER;
- }
- …
- /*
- * Transaction commands
- */
- else if (IsA(node, TransactionStmt))
- {
- /*
- * Check "BEGIN READ WRITE" "START TRANSACTION READ WRITE"
- */
- if (is_start_transaction_query(node))
- {
- /* But actually, we send BEGIN to standby if it's
- BEGIN READ WRITE or START TRANSACTION READ WRITE */
- if (is_read_write((TransactionStmt *)node))
- return POOL_BOTH;
- /* Other TRANSACTION start commands are sent to both primary
- and standby */
- else
- return POOL_BOTH;
- }
- /* SAVEPOINT related commands are sent to both primary and standby */
- else if (is_savepoint_query(node))
- return POOL_BOTH;
- /*
- * 2PC commands
- */
- else if (is_2pc_transaction_query(node))
- return POOL_PRIMARY;
- else
- /* COMMIT etc. */
- return POOL_BOTH;
- }
- …
- /*
- * EXECUTE
- */
- else if (IsA(node, ExecuteStmt))
- {
- /* This is temporary decision. where_to_send will inherit
- * same destination AS PREPARE.
- */
- return POOL_PRIMARY;
- }
- …
- /*
- * Other statements are sent to primary
- */
- return POOL_PRIMARY;
- }
- /*
- * All unknown statements are sent to primary
- */
- return POOL_PRIMARY;
- }
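The analysis is as follows:
In send_to_where, when running in master/slave mode, data-modifying statements (INSERT, DELETE, UPDATE) are sent only to the primary DB.
Transaction-related commands such as BEGIN and COMMIT are sent to both the master and the slave.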
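The routing rules just described can be restated as a compact lookup. The sketch below is a simplification of what send_to_where does in outline A. The enums `stmt_kind_t` and `dest_t` and the function `route_statement` are hypothetical stand-ins for pgpool-II's parse-tree node tags and `POOL_DEST` values, not the real definitions.

```c
/* Hypothetical statement kinds, standing in for parse-tree node tags. */
typedef enum {
    STMT_SELECT,
    STMT_SELECT_FOR_UPDATE,
    STMT_INSERT,
    STMT_UPDATE,
    STMT_DELETE,
    STMT_BEGIN,
    STMT_COMMIT,
    STMT_PREPARE_TRANSACTION   /* a two-phase-commit command */
} stmt_kind_t;

/* Hypothetical destinations, mirroring POOL_PRIMARY / POOL_EITHER / POOL_BOTH. */
typedef enum { DEST_PRIMARY, DEST_EITHER, DEST_BOTH } dest_t;

/* Decide where a statement goes in master/slave mode (simplified). */
dest_t route_statement(stmt_kind_t kind)
{
    switch (kind) {
    case STMT_SELECT:
        return DEST_EITHER;    /* plain SELECT may be load balanced      */
    case STMT_BEGIN:
    case STMT_COMMIT:
        return DEST_BOTH;      /* transaction control goes to both nodes */
    case STMT_SELECT_FOR_UPDATE:
    case STMT_PREPARE_TRANSACTION:
    case STMT_INSERT:
    case STMT_UPDATE:
    case STMT_DELETE:
    default:
        return DEST_PRIMARY;   /* everything else goes to the primary    */
    }
}
```

The point that matters for this analysis is the `DEST_BOTH` result for `STMT_COMMIT`: a COMMIT depends on the child's connection to the slave even though the data changes themselves never went there.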
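Now look at source code outline B.
As the analysis above shows, following the call chain from pool_process_query down to send_to_where, a COMMIT is sent to both the master and the slave. However:
Because the network connection between the child process and the slave has been cut, pool_read fails, read_kind_from_backend returns POOL_ERROR, and do_child calls child_exit, so the child process terminates.
By that time the COMMIT already sent to the primary DB has succeeded and cannot be undone.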
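The sequence described above can be replayed with a toy model. This is a sketch under stated assumptions, not pgpool-II code: `node_t`, `send_commit`, `read_reply` and `commit_on_both` are hypothetical names, and a boolean flag stands in for the state of the TCP connection.

```c
#include <stdbool.h>

/* Toy model of one backend node as seen from a single child process. */
typedef struct {
    bool link_alive;        /* false once the switch has dropped the idle link */
    bool processed_commit;  /* true once the node has executed COMMIT          */
} node_t;

typedef enum { RESULT_CONTINUE, RESULT_ERROR } result_t;

/* Deliver COMMIT to a node; it only arrives if the link is still alive. */
static void send_commit(node_t *n)
{
    if (n->link_alive)
        n->processed_commit = true;
}

/* Read the node's reply: 0 on success, -1 if the link is gone. */
static int read_reply(const node_t *n)
{
    return n->link_alive ? 0 : -1;
}

/* COMMIT is routed to both nodes, then a reply is read from each. */
result_t commit_on_both(node_t *primary, node_t *standby)
{
    send_commit(primary);
    send_commit(standby);

    if (read_reply(primary) < 0)
        return RESULT_ERROR;
    if (read_reply(standby) < 0)
        return RESULT_ERROR;   /* primary has already committed by now */
    return RESULT_CONTINUE;
}
```

Calling `commit_on_both` with a live primary and a dead standby link returns `RESULT_ERROR` while `primary->processed_commit` is true. That is the combination the customer saw: the commit is durable on the master, yet the client receives an error.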
- /*
- * child main loop
- */
- void do_child(int unix_fd, int inet_fd)
- {
- …
- for (;;)
- {
- …
- /* query process loop */
- for (;;)
- {
- POOL_STATUS status;
- status = pool_process_query(frontend, backend, 0);
- …
- switch (status)
- {
- …
- /* error occured. discard backend connection pool
- and disconnect connection to the frontend */
- case POOL_ERROR:
- pool_log("do_child: exits with status 1 due to error");
- child_exit(1);
- break;
- …
- default:
- break;
- }
- if (status != POOL_CONTINUE)
- break;
- }
- …
- }
- child_exit();
- }
- /*
- * Do house keeping works when pgpool child process exits
- */
- void child_exit(int code)
- {
- …
- /* let backend know now we are exiting */
- send_frontend_exits();
- exit(code);
- }
- /*
- * send frontend exiting messages to all connections. this is called
- * in any case when child process exits, for example failover, child
- * life time expires or child max connections expires.
- */
- static void send_frontend_exits(void)
- {
- …
- for (i=0;i<pool_config->max_pool;i++, p++)
- {
- /* At this point no exit message is sent to the connections associated with the master DB */
- if (!MASTER_CONNECTION(p))
- continue;
- if (!MASTER_CONNECTION(p)->sp)
- continue;
- if (MASTER_CONNECTION(p)->sp->user == NULL)
- continue;
- pool_send_frontend_exits(p);
- }
- POOL_SETMASK(&oldmask);
- }
- /*
- * send "terminate"(X) message to all backends, indicating that
- * backend should prepare to close connection to frontend (actually
- * pgpool). Note that caller must be protecedt from a signal
- * interruption while calling this function. Otherwise the number of
- * valid backends might be changed by failover/failback.
- */
- void pool_send_frontend_exits(POOL_CONNECTION_POOL *backend)
- {
- …
- for (i=0;i<NUM_BACKENDS;i++)
- {
- …
- if (VALID_BACKEND(i) && CONNECTION_SLOT(backend, i))
- {
- …
- pool_set_nonblock(CONNECTION(backend, i)->fd);
- pool_flush_it(CONNECTION(backend, i));
- pool_unset_nonblock(CONNECTION(backend, i)->fd);
- }
- }
- }
- /*
- * flush write buffer
- */
- int pool_flush_it(POOL_CONNECTION *cp)
- {
- …
- for (;;)
- {
- …
- if (sts > 0)
- {
- …
- }
- else if (errno == EAGAIN || errno == EINTR)
- {
- continue;
- }
- else
- {
- /* If this is the backend stream, report error. Otherwise
- * just report debug message.
- */
- if (cp->isbackend)
- pool_error("pool_flush_it: write failed to backend (%d). reason: %s offset: %d wlen: %d",
- cp->db_node_id, strerror(errno), offset, wlen);
- else
- pool_debug("pool_flush_it: write failed to frontend. reason: %s offset: %d wlen: %d",
- strerror(errno), offset, wlen);
- cp->wbufpo = 0;
- return -1;
- }
- }
- …
- return 0;
- }
- /*
- * Main module for query processing
- * reset_request: if non 0, call reset_backend to execute reset queries
- */
- POOL_STATUS pool_process_query(POOL_CONNECTION *frontend,
- POOL_CONNECTION_POOL *backend,
- int reset_request)
- {
- …
- for (;;)
- {
- …
- /*
- * If we are prcessing query, process it.
- */
- if (pool_is_query_in_progress())
- {
- status = ProcessBackendResponse(frontend, backend, &state, &num_fields);
- if (status != POOL_CONTINUE)
- return status;
- }
- …
- }
- return POOL_CONTINUE;
- }
- POOL_STATUS ProcessBackendResponse(POOL_CONNECTION *frontend,
- POOL_CONNECTION_POOL *backend,
- int *state, short *num_fields)
- {
- …
- status = read_kind_from_backend(frontend, backend, &kind);
- if (status != POOL_CONTINUE)
- return status;
- …
- }
- /*
- * read_kind_from_backend: read kind from backends.
- * the "frontend" parameter is used to send "kind mismatch" error message to the frontend.
- * the out parameter "decided_kind" is the packet kind decided by this function.
- * this function uses "decide by majority" method if kinds from all backends do not agree.
- */
- POOL_STATUS read_kind_from_backend(POOL_CONNECTION *frontend,
- POOL_CONNECTION_POOL *backend, char *decided_kind)
- {
- …
- for (i=0;i<NUM_BACKENDS;i++)
- {
- …
- if (VALID_BACKEND(i))
- {
- …
- do
- {
- char *p, *value;
- int len;
- if (pool_read(CONNECTION(backend, i), &kind, 1) < 0)
- {
- pool_error("read_kind_from_backend: failed to read kind from %d th backend", i);
- return POOL_ERROR;
- }
- …
- } while (kind == 'S');
- …
- }
- else
- kind_list[i] = 0;
- }
- …
- return POOL_CONTINUE;
- }
- /*
- * read len bytes from cp
- * returns 0 on success otherwise -1.
- */
- int pool_read(POOL_CONNECTION *cp, void *buf, int len)
- {
- …
- while (len > 0)
- {
- …
- if (cp->ssl_active > 0) {
- readlen = pool_ssl_read(cp, readbuf, READBUFSZ);
- } else {
- readlen = read(cp->fd, readbuf, READBUFSZ);
- }
- …
- if (readlen == -1)
- {
- …
- pool_error("pool_read: read failed (%s)", strerror(errno));
- if (cp->isbackend)
- {
- /* if fail_over_on_backend_erro is true, then trigger failover */
- if (pool_config->fail_over_on_backend_error)
- {
- notice_backend_error(cp->db_node_id);
- child_exit();
- }
- else
- return -1;
- }
- else
- {
- return -1;
- }
- }
- else if (readlen == 0)
- {
- if (cp->isbackend)
- {
- pool_error("pool_read: EOF encountered with backend");
- return -1;
- }
- else
- {
- /*
- * if backend offers authentication method, frontend could close connection
- */
- return -1;
- }
- }
- …
- }
- return 0;
- }
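The `readlen == 0` branch of pool_read corresponds to an end-of-file on the socket. The sketch below reproduces that condition with plain POSIX calls. It uses a local UNIX socket pair as a stand-in for the TCP connection to a backend, and `read_after_peer_close` is a hypothetical helper name. A connection dropped silently by a switch may instead surface as a read error such as ECONNRESET or as a timeout, which would correspond to the `readlen == -1` branch.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a connected socket pair, close one end, and read from the other.
 * Returns the value read() reports (0 means EOF), or -2 if the socket
 * pair could not be created.
 */
long read_after_peer_close(void)
{
    int fds[2];
    char kind;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -2;

    close(fds[1]);                /* the peer goes away            */
    n = read(fds[0], &kind, 1);   /* try to read one "kind" byte   */
    close(fds[0]);

    return (long) n;
}
```

In pool_read an EOF from a backend is logged and reported as -1 to the caller without consulting fail_over_on_backend_error, so this path ends in the child exiting rather than in a failover.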