One day, a developer reported that the cluster in the test environment had failed to start.

The error output was as follows:

[gpadmin@hadoop-test2:/root]
$ gpstart
:::: gpstart:hadoop-test2:gpadmin-[INFO]:-Starting gpstart with args:
:::: gpstart:hadoop-test2:gpadmin-[INFO]:-Gathering information and validating the environment...
:::: gpstart:hadoop-test2:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 5.0.0 build dev'
:::: gpstart:hadoop-test2:gpadmin-[INFO]:-Greenplum Catalog Version: ''
:::: gpstart:hadoop-test2:gpadmin-[INFO]:-Starting Master instance in admin mode
:::: gpstart:hadoop-test2:gpadmin-[CRITICAL]:-Failed to start Master instance in admin mode
:::: gpstart:hadoop-test2:gpadmin-[CRITICAL]:-Error occurred: non-zero rc:
Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /home/gpadmin/gpdata/gpmaster/gpseg-1 -l /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_log/startup.log
-w -t 600 -o " -p 2346 --gp_dbid=1 --gp_num_contents_in_cluster=0 --silent-mode=true -i -M master --gp_contentid=-1 -x 0 -c gp_role=utility " start'
rc=, stdout='waiting for server to start...................................................................................................................................
...........................................................................................................................................................................
...........................................................................................................................................................................
.................................................................................................................................. stopped waiting
', stderr='could not change directory to "/root"
pg_ctl: could not start server
Examine the log output.

The startup log showed the following:

vim /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_log/startup.log
-- ::24.067241 GMT,,,p5464,th-,,,,,,,seg-,,,,,"WARNING","","""work_mem"": setting is deprecated, and may be removed in a future release.",,,,,,,,"set_config_option","guc.c",,
-- ::24.067612 GMT,,,p5464,th-,,,,,,,seg-,,,,,"WARNING","","""work_mem"": setting is deprecated, and may be removed in a future release.",,,,,,,,"set_config_option","guc.c",,
-- ::24.083813 GMT,,,p5465,th-,,,,,,,seg-,,,,,"LOG","","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",,
-- ::24.098673 GMT,,,p5465,th-,,,,,,,seg-,,,,,"FATAL","XX000","could not create shared memory segment: Invalid argument (pg_shmem.c:183)","Failed system call was shmget(key=2346001, size=177586016, 03600).","This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter. You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 177586016 bytes), reduce PostgreSQL's shared_buffers parameter (currently 4000) and/or its max_connections parameter (currently 253).
If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for.
The PostgreSQL documentation contains more information about shared memory configuration.",,,,,,"InternalIpcMemoryCreate","pg_shmem.c",183,1

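In essence, the message says that the shared memory segment PostgreSQL requested (177586016 bytes) exceeded the kernel parameter shmmax set in /etc/sysctl.conf, so startup failed.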
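The comparison behind that diagnosis can be sketched in a few lines of Python. This is only an illustration, not part of Greenplum: the function names are hypothetical, the regular expression assumes the "Failed system call was shmget(...)" wording shown in the log above, and /proc/sys/kernel/shmmax is assumed to be readable (Linux only).

```python
import re
from pathlib import Path
from typing import Optional

# Matches the detail text PostgreSQL prints when shmget() fails,
# e.g. "shmget(key=2346001, size=177586016, 03600)".
SHMGET_RE = re.compile(r"shmget\(key=(\d+), size=(\d+), (\d+)\)")


def requested_shm_bytes(log_text: str) -> Optional[int]:
    """Return the size of the first failed shmget() request in the log, or None."""
    match = SHMGET_RE.search(log_text)
    return int(match.group(2)) if match else None


def read_shmmax(path: str = "/proc/sys/kernel/shmmax") -> int:
    """Read the kernel's current SHMMAX limit in bytes (assumes Linux)."""
    return int(Path(path).read_text().strip())


def exceeds_shmmax(requested: int, shmmax: int) -> bool:
    """True when a single segment of `requested` bytes is larger than SHMMAX."""
    return requested > shmmax


if __name__ == "__main__":
    sample = "Failed system call was shmget(key=2346001, size=177586016, 03600)."
    requested = requested_shm_bytes(sample)
    if requested is not None and Path("/proc/sys/kernel/shmmax").exists():
        limit = read_shmmax()
        print(f"requested={requested} shmmax={limit} "
              f"too_large={exceeds_shmmax(requested, limit)}")
```

Running it on the master host with the startup.log contents in place of `sample` shows whether the request is larger than the limit currently in effect.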

The configuration in /etc/sysctl.conf was as follows:

kernel.shmmax =
kernel.shmmni =
kernel.shmall =
kernel.sem =
kernel.sysrq =
kernel.core_uses_pid =
kernel.msgmnb =
kernel.msgmax =
kernel.msgmni =
net.ipv4.tcp_syncookies =
net.ipv4.ip_forward =
net.ipv4.conf.default.accept_source_route =
net.ipv4.tcp_tw_recycle =
net.ipv4.tcp_max_syn_backlog =
net.ipv4.conf.all.arp_filter =
net.ipv4.ip_local_port_range =
net.core.netdev_max_backlog =
net.core.rmem_max =
net.core.wmem_max =
vm.overcommit_memory =

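Comparing these values with the settings recommended in the official documentation, the parameter definitions, and the amount of data already in the cluster confirmed that they were indeed too small. The values were changed to the officially recommended settings and the cluster was started again.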
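A check like the one described above can be scripted. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the page size is assumed to be 4096 bytes, and the numbers in any call are illustrative rather than the values from this cluster. It relies on two facts about the Linux parameters: kernel.shmmax is the maximum size of one segment in bytes, and kernel.shmall is the system-wide total in pages.

```python
from typing import Dict, List


def parse_sysctl_conf(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and # or ; comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line
        settings[key.strip()] = value.strip()
    return settings


def _to_int(value: str) -> int:
    """Treat a missing or blank value as 0 so that it is reported as too small."""
    return int(value) if value.strip() else 0


def shm_limits_too_small(settings: Dict[str, str], required_bytes: int,
                         page_size: int = 4096) -> List[str]:
    """Return the names of limits that cannot hold a segment of required_bytes."""
    problems: List[str] = []
    if _to_int(settings.get("kernel.shmmax", "")) < required_bytes:
        problems.append("kernel.shmmax")  # per-segment limit, in bytes
    if _to_int(settings.get("kernel.shmall", "")) * page_size < required_bytes:
        problems.append("kernel.shmall")  # system-wide limit, in pages
    return problems
```

For example, `shm_limits_too_small(parse_sysctl_conf(open("/etc/sysctl.conf").read()), 177586016)` lists the limits that are below the request size reported in the startup log. After editing the file, the new values still have to be loaded into the running kernel, for instance with `sysctl -p`; the original post does not say which method was used.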

:::: gpstart:hadoop-test2:gpadmin-[INFO]:-----------------------------------------------------
:::: gpstart:hadoop-test2:gpadmin-[INFO]:- Successful segment starts =
:::: gpstart:hadoop-test2:gpadmin-[INFO]:- Failed segment starts =
:::: gpstart:hadoop-test2:gpadmin-[INFO]:- Skipped segment starts (segments are marked down in configuration) =
:::: gpstart:hadoop-test2:gpadmin-[INFO]:-----------------------------------------------------
:::: gpstart:hadoop-test2:gpadmin-[INFO]:-Successfully started of segment instances
:::: gpstart:hadoop-test2:gpadmin-[INFO]:-----------------------------------------------------
:::: gpstart:hadoop-test2:gpadmin-[INFO]:-Starting Master instance hadoop-test2 directory /home/gpadmin/gpdata/gpmaster/gpseg-
:::: gpstart:hadoop-test2:gpadmin-[INFO]:-Command pg_ctl reports Master hadoop-test2 instance active
:::: gpstart:hadoop-test2:gpadmin-[INFO]:-No standby master configured. skipping...
:::: gpstart:hadoop-test2:gpadmin-[INFO]:-Database successfully started

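The cluster started successfully.

Summary: when the kernel parameters that PostgreSQL depends on at startup do not match the actual environment, startup fails. The detailed messages in the startup log point to the root cause.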
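Because the log is written as CSV with quoted, possibly multi-line fields, finding the relevant entry is easier with a CSV parser than with a line-based search. The sketch below is one possible approach, not a Greenplum tool: the function name is hypothetical, and it assumes only that the severity field is followed by the SQLSTATE code and then the message, as in the excerpt above.

```python
import csv
import io
from typing import List, Tuple

SEVERITIES = {"ERROR", "FATAL", "PANIC"}


def severe_entries(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (severity, sqlstate, message) for each ERROR/FATAL/PANIC row."""
    found: List[Tuple[str, str, str]] = []
    # csv.reader joins quoted fields that span several physical lines.
    for row in csv.reader(io.StringIO(log_text)):
        for i, field in enumerate(row):
            if field in SEVERITIES and i + 2 < len(row):
                found.append((field, row[i + 1], row[i + 2]))
                break  # one severity field per log row
    return found


if __name__ == "__main__":
    # Path taken from the post; adjust it for other installations.
    path = "/home/gpadmin/gpdata/gpmaster/gpseg-1/pg_log/startup.log"
    try:
        with open(path, newline="") as handle:
            for severity, code, message in severe_entries(handle.read()):
                print(severity, code, message)
    except OSError as exc:
        print(f"could not read {path}: {exc}")
```

Matching on the severity value rather than a fixed column index avoids depending on the exact column layout of a particular Greenplum version.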

References:

1. Officially recommended settings: http://gpdb.docs.pivotal.io/4380/prep_os-system-params.html#topic3

2. Meaning of the kernel parameters: http://www.oicqzone.com/pc/2012091612901.html
