Polishing technical gems, practicing the way of data, pursuing outstanding value

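Symptom:

A customer emailed to ask why the following happens when Pgpool-II runs in master-slave mode:

The connection between one pgpool-II child process and the slave DB node is cut by the L4 switch (L4SW) after a long idle period, yet no failover occurs. Meanwhile the COMMIT sent to the master DB node has already taken effect, but an error message is returned immediately.

In short, this is because Pgpool-II was developed without considering the case where the individual connection of a single child process is deliberately cut.

In this situation, if fail_over_on_backend_error is true, the failover procedure is also triggered.

If fail_over_on_backend_error is false, and the pgpool-II main process keeps running its health check and can still reach the slave DB node normally, the failover procedure is not triggered.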
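
The two cases above can be summarized as a small decision function. This is only a sketch of the behavior described in this post, not pgpool-II code: the enum, the function name and the `health_check_ok` parameter are hypothetical, and the branch for a failing health check is an assumption about general pgpool-II behavior rather than something shown in the source outlines.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch; not pgpool-II identifiers. */
typedef enum {
    NO_FAILOVER_SESSION_ERROR, /* only the affected child's session fails */
    FAILOVER_TRIGGERED         /* the node is reported and failover starts */
} Outcome;

/*
 * Outcome when one child process hits a read error on its own
 * connection to a backend node.
 *   fail_over_on_backend_error: the pgpool.conf setting discussed above
 *   health_check_ok: whether the main process's health check still
 *                    reaches the node over its own, separate connection
 */
Outcome outcome_on_child_read_error(bool fail_over_on_backend_error,
                                    bool health_check_ok)
{
    if (fail_over_on_backend_error)
        return FAILOVER_TRIGGERED;     /* the child itself reports the node */
    if (!health_check_ok)
        return FAILOVER_TRIGGERED;     /* assumption: a failing health check leads to failover */
    return NO_FAILOVER_SESSION_ERROR;  /* the customer's case */
}
```

Calling `outcome_on_child_read_error(false, true)` corresponds to the customer's report: the session fails, but the node stays attached because the health check uses a different connection that the L4 switch has not cut.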

Now for the code:

Source outline A:

    /*
     * child main loop
     */
    void do_child(int unix_fd, int inet_fd)
    {

        for (;;)
        {

            /* perform accept() */
            frontend = do_accept(unix_fd, inet_fd, &timeout);
            if (frontend == NULL)   /* connection request from frontend timed out */
            {
                /* check select() timeout */
                if (connected && pool_config->child_life_time > 0 &&
                    timeout.tv_sec == 0 && timeout.tv_usec == 0)
                {
                    pool_debug("child life %d seconds expired", pool_config->child_life_time);
                    /*
                     * Doesn't need to call this. child_exit() calls it.
                     * send_frontend_exits();
                     */
                    child_exit();
                }
                continue;
            }

            /*
             * Ok, negotiation with frontend has been done. Let's go to the
             * next step. Connect to backend if there's no existing
             * connection which can be reused by this frontend.
             * Authentication is also done in this step.
             */

            /*
             * if there's no connection associated with user and database,
             * we need to connect to the backend and send the startup packet.
             */
            /* look for existing connection */
            found = 0;
            backend = pool_get_cp(sp->user, sp->database, sp->major, );

            /* Mark this connection pool is connected from frontend */
            pool_coninfo_set_frontend_connected(pool_get_process_context()->proc_id, pool_pool_index());
            /* query process loop */
            for (;;)
            {
                POOL_STATUS status;
                status = pool_process_query(frontend, backend, 0);
                sp = MASTER_CONNECTION(backend)->sp;
                switch (status)
                {

                }
                if (status != POOL_CONTINUE)
                    break;
            }

        }
        child_exit();
    }

    /*
     * Main module for query processing
     * reset_request: if non 0, call reset_backend to execute reset queries
     */
    POOL_STATUS pool_process_query(POOL_CONNECTION *frontend,
                                   POOL_CONNECTION_POOL *backend,
                                   int reset_request)
    {

        for (;;)
        {

            /*
             * If we are processing query, process it.
             */
            if (pool_is_query_in_progress())
            {
                status = ProcessBackendResponse(frontend, backend, &state, &num_fields);
                if (status != POOL_CONTINUE)
                    return status;
            }
            /*
             * If frontend and all backends do not have any pending data in
             * the receiving data cache, then issue select(2) to wait for new
             * data arrival
             */
            else if (is_cache_empty(frontend, backend))
            {
                bool cont = true;
                status = read_packets_and_process(frontend, backend, reset_request,
                                                  &state, &num_fields, &cont);
                if (status != POOL_CONTINUE)
                    return status;
                else if (!cont)     /* Detected admin shutdown */
                    return status;
            }
            else
            {

            }

        }
        return POOL_CONTINUE;
    }

    /*
     * Read packet from either frontend or backend and process it.
     */
    static POOL_STATUS read_packets_and_process(POOL_CONNECTION *frontend,
        POOL_CONNECTION_POOL *backend, int reset_request, int *state, short *num_fields, bool *cont)
    {

        if (!reset_request)
        {
            if (FD_ISSET(frontend->fd, &exceptmask))
                return POOL_END;
            else if (FD_ISSET(frontend->fd, &readmask))
            {
                status = ProcessFrontendResponse(frontend, backend);
                if (status != POOL_CONTINUE)
                    return status;
            }
        }

        return POOL_CONTINUE;
    }

    POOL_STATUS ProcessFrontendResponse(POOL_CONNECTION *frontend,
                                        POOL_CONNECTION_POOL *backend)
    {

        switch (fkind)
        {

            case 'X':   /* Terminate */
                free(contents);
                return POOL_END;
            case 'Q':   /* Query */
                allow_close_transaction = ;
                status = SimpleQuery(frontend, backend, len, contents);
                break;

            default:
                pool_error("ProcessFrontendResponse: unknown message type %c(%02x)", fkind, fkind);
                status = POOL_ERROR;
        }
        free(contents);
        if (status != POOL_CONTINUE)
            status = POOL_ERROR;
        return status;
    }

    /*
     * Process Query('Q') message
     * Query messages include an SQL string.
     */
    POOL_STATUS SimpleQuery(POOL_CONNECTION *frontend,
                            POOL_CONNECTION_POOL *backend, int len, char *contents)
    {

        /* log query to log file if necessary */
        if (pool_config->log_statement)
        {
            pool_log("statement: %s", contents);
        }
        else
        {
            pool_debug("statement2: %s", contents);
        }

        if (parse_tree_list != NIL)
        {

            /*
             * Decide where to send query
             */
            pool_where_to_send(query_context, query_context->original_query,
                               query_context->parse_tree);

        }

        /* switch memory context */
        pool_memory_context_switch_to(old_context);
        return POOL_CONTINUE;
    }

    /*
     * Decide where to send queries (thus expecting response)
     */
    void pool_where_to_send(POOL_QUERY_CONTEXT *query_context, char *query, Node *node)
    {

        /*
         * In raw mode, we send only to master node. Simple enough.
         */
        if (RAW_MODE)
        {
            pool_set_node_to_be_sent(query_context, REAL_MASTER_NODE_ID);
        }
        else if (MASTER_SLAVE && query_context->is_multi_statement)
        {

        }
        else if (MASTER_SLAVE)
        {
            POOL_DEST dest;
            POOL_MEMORY_POOL *old_context;
            old_context = pool_memory_context_switch_to(query_context->memory_context);
            dest = send_to_where(node, query);
            pool_memory_context_switch_to(old_context);
            pool_debug("send_to_where: %d query: %s", dest, query);
            /* Should be sent to primary only? */
            if (dest == POOL_PRIMARY)
            {
                pool_set_node_to_be_sent(query_context, PRIMARY_NODE_ID);
            }
            /* Should be sent to both primary and standby? */
            else if (dest == POOL_BOTH)
            {
                pool_setall_node_to_be_sent(query_context);
            }
            /*
             * Ok, we might be able to load balance the SELECT query.
             */
            else
            {

            }
        }
        else if (REPLICATION || PARALLEL_MODE)
        {

        }
        else
        {
            pool_error("pool_where_to_send: unknown mode");
            return;
        }

        return;
    }

    /*
     * From syntactic analysis, decide whether the statement is to be sent to
     * the primary, the standby, either, or both in master/slave+HR/SR mode.
     */
    static POOL_DEST send_to_where(Node *node, char *query)
    {
        if (bsearch(&nodeTag(node), nodemap, sizeof(nodemap)/sizeof(nodemap[0]),
                    sizeof(NodeTag), compare) != NULL)
        {
            /*
             * SELECT INTO
             * SELECT FOR SHARE or UPDATE
             */
            if (IsA(node, SelectStmt))
            {
                /* SELECT INTO or SELECT FOR SHARE or UPDATE ? */
                if (pool_has_insertinto_or_locking_clause(node))
                    return POOL_PRIMARY;
                return POOL_EITHER;
            }

            /*
             * Transaction commands
             */
            else if (IsA(node, TransactionStmt))
            {
                /*
                 * Check "BEGIN READ WRITE" "START TRANSACTION READ WRITE"
                 */
                if (is_start_transaction_query(node))
                {
                    /* But actually, we send BEGIN to standby if it's
                       BEGIN READ WRITE or START TRANSACTION READ WRITE */
                    if (is_read_write((TransactionStmt *)node))
                        return POOL_BOTH;
                    /* Other TRANSACTION start commands are sent to both primary
                       and standby */
                    else
                        return POOL_BOTH;
                }
                /* SAVEPOINT related commands are sent to both primary and standby */
                else if (is_savepoint_query(node))
                    return POOL_BOTH;
                /*
                 * 2PC commands
                 */
                else if (is_2pc_transaction_query(node))
                    return POOL_PRIMARY;
                else
                    /* COMMIT etc. */
                    return POOL_BOTH;
            }

            /*
             * EXECUTE
             */
            else if (IsA(node, ExecuteStmt))
            {
                /* This is temporary decision. where_to_send will inherit
                 * same destination AS PREPARE.
                 */
                return POOL_PRIMARY;
            }

            /*
             * Other statements are sent to primary
             */
            return POOL_PRIMARY;
        }

        /*
         * All unknown statements are sent to primary
         */
        return POOL_PRIMARY;
    }

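Analysis:

 In send_to_where, under master/slave mode, data-modifying statements (INSERT, DELETE, UPDATE) are sent only to the primary DB.
 Transaction-related statements such as BEGIN/COMMIT are sent to both the master and the slave.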
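
The routing rules read off from send_to_where can be condensed into a small table-like function. This is a minimal sketch, assuming a simplified set of statement classes: `StmtKind`, `Dest` and `route_statement` are hypothetical names invented for illustration and do not exist in pgpool-II, which works on parse-tree nodes instead.

```c
/* Hypothetical, simplified statement classes (not pgpool-II NodeTag values). */
typedef enum {
    STMT_SELECT,              /* plain SELECT */
    STMT_SELECT_LOCKING,      /* SELECT INTO, SELECT ... FOR UPDATE / FOR SHARE */
    STMT_INSERT,
    STMT_UPDATE,
    STMT_DELETE,
    STMT_BEGIN,
    STMT_SAVEPOINT,
    STMT_COMMIT,
    STMT_PREPARE_TRANSACTION  /* two-phase commit command */
} StmtKind;

/* Hypothetical destinations mirroring primary / either / both. */
typedef enum { DEST_PRIMARY, DEST_EITHER, DEST_BOTH } Dest;

/* Decide where a statement goes in master/slave mode, per the outline above. */
Dest route_statement(StmtKind kind)
{
    switch (kind) {
    case STMT_SELECT:
        return DEST_EITHER;    /* candidate for load balancing */
    case STMT_BEGIN:
    case STMT_SAVEPOINT:
    case STMT_COMMIT:
        return DEST_BOTH;      /* transaction control reaches every node */
    case STMT_PREPARE_TRANSACTION:
        return DEST_PRIMARY;   /* 2PC commands go to the primary only */
    default:
        return DEST_PRIMARY;   /* writes, locking SELECTs, everything else */
    }
}
```

The important row for this problem is `STMT_COMMIT`: because it maps to `DEST_BOTH`, the child process must get a response from the slave as well as from the master before the COMMIT counts as finished.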

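Next, source outline B:

 From the analysis above and the call chain pool_process_query → send_to_where,
 COMMIT is sent to both the master and the slave. However,
 because the network connection between the child process and the slave has been cut, pool_read fails and the child process exits.
 By that point the COMMIT already sent to the primary DB has succeeded and cannot be undone.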

    /*
     * child main loop
     */
    void do_child(int unix_fd, int inet_fd)
    {

        for (;;)
        {

            /* query process loop */
            for (;;)
            {
                POOL_STATUS status;
                status = pool_process_query(frontend, backend, 0);

                switch (status)
                {

                    /* error occurred. discard backend connection pool
                       and disconnect connection to the frontend */
                    case POOL_ERROR:
                        pool_log("do_child: exits with status 1 due to error");
                        child_exit(1);
                        break;

                    default:
                        break;
                }
                if (status != POOL_CONTINUE)
                    break;
            }

        }
        child_exit();
    }

    /*
     * Do housekeeping work when pgpool child process exits
     */
    void child_exit(int code)
    {

        /* let backend know now we are exiting */
        send_frontend_exits();
        exit(code);
    }

    /*
     * send frontend exiting messages to all connections. this is called
     * in any case when child process exits, for example failover, child
     * life time expires or child max connections expires.
     */
    static void send_frontend_exits(void)
    {

        for (i=0;i<pool_config->max_pool;i++, p++)
        {
            /* Note: here, the exit signal is not sent to master-DB-related connections */
            if (!MASTER_CONNECTION(p))
                continue;
            if (!MASTER_CONNECTION(p)->sp)
                continue;
            if (MASTER_CONNECTION(p)->sp->user == NULL)
                continue;
            pool_send_frontend_exits(p);
        }
        POOL_SETMASK(&oldmask);
    }

    /*
     * send "terminate"(X) message to all backends, indicating that
     * backend should prepare to close connection to frontend (actually
     * pgpool). Note that caller must be protected from a signal
     * interruption while calling this function. Otherwise the number of
     * valid backends might be changed by failover/failback.
     */
    void pool_send_frontend_exits(POOL_CONNECTION_POOL *backend)
    {

        for (i=0;i<NUM_BACKENDS;i++)
        {

            if (VALID_BACKEND(i) && CONNECTION_SLOT(backend, i))
            {

                pool_set_nonblock(CONNECTION(backend, i)->fd);
                pool_flush_it(CONNECTION(backend, i));
                pool_unset_nonblock(CONNECTION(backend, i)->fd);
            }
        }
    }

    /*
     * flush write buffer
     */
    int pool_flush_it(POOL_CONNECTION *cp)
    {

        for (;;)
        {

            if (sts > 0)
            {

            }
            else if (errno == EAGAIN || errno == EINTR)
            {
                continue;
            }
            else
            {
                /* If this is the backend stream, report error. Otherwise
                 * just report debug message.
                 */
                if (cp->isbackend)
                    pool_error("pool_flush_it: write failed to backend (%d). reason: %s offset: %d wlen: %d",
                               cp->db_node_id, strerror(errno), offset, wlen);
                else
                    pool_debug("pool_flush_it: write failed to frontend. reason: %s offset: %d wlen: %d",
                               strerror(errno), offset, wlen);
                cp->wbufpo = 0;
                return -1;
            }
        }

        return 0;
    }

    /*
     * Main module for query processing
     * reset_request: if non 0, call reset_backend to execute reset queries
     */
    POOL_STATUS pool_process_query(POOL_CONNECTION *frontend,
                                   POOL_CONNECTION_POOL *backend,
                                   int reset_request)
    {

        for (;;)
        {

            /*
             * If we are processing query, process it.
             */
            if (pool_is_query_in_progress())
            {
                status = ProcessBackendResponse(frontend, backend, &state, &num_fields);
                if (status != POOL_CONTINUE)
                    return status;
            }

        }
        return POOL_CONTINUE;
    }

    POOL_STATUS ProcessBackendResponse(POOL_CONNECTION *frontend,
                                       POOL_CONNECTION_POOL *backend,
                                       int *state, short *num_fields)
    {

        status = read_kind_from_backend(frontend, backend, &kind);
        if (status != POOL_CONTINUE)
            return status;

    }

    /*
     * read_kind_from_backend: read kind from backends.
     * the "frontend" parameter is used to send "kind mismatch" error message to the frontend.
     * the out parameter "decided_kind" is the packet kind decided by this function.
     * this function uses "decide by majority" method if kinds from all backends do not agree.
     */
    POOL_STATUS read_kind_from_backend(POOL_CONNECTION *frontend,
                                       POOL_CONNECTION_POOL *backend, char *decided_kind)
    {

        for (i=0;i<NUM_BACKENDS;i++)
        {

            if (VALID_BACKEND(i))
            {

                do
                {
                    char *p, *value;
                    int len;
                    if (pool_read(CONNECTION(backend, i), &kind, 1) < 0)
                    {
                        pool_error("read_kind_from_backend: failed to read kind from %d th backend", i);
                        return POOL_ERROR;
                    }

                } while (kind == 'S');

            }
            else
                kind_list[i] = 0;
        }

        return POOL_CONTINUE;
    }

    /*
     * read len bytes from cp
     * returns 0 on success otherwise -1.
     */
    int pool_read(POOL_CONNECTION *cp, void *buf, int len)
    {

        while (len > 0)
        {

            if (cp->ssl_active > 0) {
                readlen = pool_ssl_read(cp, readbuf, READBUFSZ);
            } else {
                readlen = read(cp->fd, readbuf, READBUFSZ);
            }

            if (readlen == -1)
            {

                pool_error("pool_read: read failed (%s)", strerror(errno));
                if (cp->isbackend)
                {
                    /* if fail_over_on_backend_error is true, then trigger failover */
                    if (pool_config->fail_over_on_backend_error)
                    {
                        notice_backend_error(cp->db_node_id);
                        child_exit();
                    }
                    else
                        return -1;
                }
                else
                {
                    return -1;
                }
            }
            else if (readlen == 0)
            {
                if (cp->isbackend)
                {
                    pool_error("pool_read: EOF encountered with backend");
                    return -1;
                }
                else
                {
                    /*
                     * if backend offers authentication method, frontend could close connection
                     */
                    return -1;
                }
            }

        }
        return 0;
    }
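
The failure sequence traced through outline B can be reproduced with a toy model. This is a sketch under stated assumptions, not pgpool-II code: `Backend`, `SessionResult` and `commit_on_both` are hypothetical names, and a cut connection is modeled simply as a node that never receives the COMMIT and whose response read fails.

```c
#include <stdbool.h>

/* Hypothetical model of one backend as seen from a single child process. */
typedef struct {
    bool link_up;    /* false once the L4 switch has cut this idle connection */
    bool committed;  /* true once the node has applied the COMMIT */
} Backend;

typedef enum { SESSION_OK, SESSION_ERROR } SessionResult;

/*
 * Forward COMMIT to both nodes, then read one response per node,
 * in the same node-by-node order as the read loop in outline B.
 */
SessionResult commit_on_both(Backend *primary, Backend *standby)
{
    Backend *nodes[2] = { primary, standby };
    int i;

    /* Step 1: a reachable node receives and applies the COMMIT. */
    for (i = 0; i < 2; i++) {
        if (nodes[i]->link_up)
            nodes[i]->committed = true;
    }

    /* Step 2: reading the response fails on a cut connection. */
    for (i = 0; i < 2; i++) {
        if (!nodes[i]->link_up)
            return SESSION_ERROR;  /* the child reports an error and exits */
    }
    return SESSION_OK;
}
```

With `primary = { true, false }` and `standby = { false, false }`, the call returns `SESSION_ERROR` while `primary.committed` is already `true`. That is the combination the customer observed: an error comes back to the client even though the COMMIT has taken effect on the master.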
