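In short, GPDB (Greenplum Database) is distributed database software that manages and processes massive amounts of data spread across multiple hosts. A single GPDB database instance is actually made up of several independent PostgreSQL instances that run on different physical hosts and work together, so that the user sees what appears to be one database. The Master is the entry point to the GPDB system: it handles client connections and SQL commands and coordinates the work of the other instances (Segments) in the system. The Segments store and process the user data.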

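Environment:
Operating system: CentOS Linux release 7.6.1810 (Core), 64-bit
One Master (the primary node in the architecture diagram), one Standby (the secondary node in the architecture diagram), and two Segment hosts: 4 servers in total.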

I. Steps on the Master host as the root user

1. Edit /etc/hosts and add the entries below (note: the same configuration applies to all 4 servers)

vim /etc/hosts

192.168.18.130 gp-master
192.168.18.131 gp-standby
192.168.18.132 gp-node1
192.168.18.133 gp-node2
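
Because the same four entries have to exist on every server, the edit can be scripted so that running it twice does not create duplicates. The following is a minimal sketch, assuming the addresses and host names listed above; the function name add_host_entry is hypothetical, and the target file is a parameter so the snippet can be tried on a scratch copy before touching /etc/hosts.

```shell
# add_host_entry FILE IP NAME  (hypothetical helper, not part of Greenplum)
# Appends "IP NAME" to FILE unless a non-comment line already maps NAME.
add_host_entry() {
    local file=$1 ip=$2 name=$3
    # Match NAME as a whole whitespace-delimited field after the address.
    if ! grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}
```

As root, a call such as `add_host_entry /etc/hosts 192.168.18.130 gp-master` would then be repeated for each of the four hosts.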

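2. Disable SELinux on the servers and open the firewall between the 4 servers; in a test environment the firewall can simply be turned off. (Note: the same configuration applies to all 4 servers.)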

Stop and disable firewalld

systemctl stop firewalld
systemctl disable firewalld

Disable SELinux permanently

vim /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three two values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted

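Note: check the current SELinux state with getenforce. To change it for the running system only (not persistent), use setenforce 0 (0 switches to permissive mode, 1 to enforcing mode). The SELINUX=disabled setting in the file takes effect after a reboot.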
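
The manual edit of /etc/selinux/config can also be done non-interactively, which is convenient when the same change is needed on four servers. This is only a sketch: the function name set_selinux_disabled is hypothetical, and the file path is passed in so the substitution can be checked on a copy first.

```shell
# set_selinux_disabled FILE  (hypothetical helper)
# Rewrites the SELINUX= line in FILE to SELINUX=disabled.
# The pattern requires "=" directly after SELINUX, so SELINUXTYPE= is untouched.
set_selinux_disabled() {
    local file=$1 tmp rc
    tmp=$(mktemp) || return 1
    # Write to a temporary file, then copy the content back so that the
    # original file keeps its owner and permissions.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

As root this would be called as `set_selinux_disabled /etc/selinux/config`, followed by `setenforce 0` for the running system.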

3. Operating system parameters

vim /etc/sysctl.conf   (note: the same configuration applies to all 4 servers)

kernel.shmmax =
kernel.shmmni =
kernel.shmall =
kernel.sem =
kernel.sysrq =
kernel.core_uses_pid =
kernel.msgmnb =
kernel.msgmax =
net.ipv4.tcp_syncookies =
net.ipv4.ip_forward =
net.ipv4.conf.default.accept_source_route =
net.ipv4.tcp_tw_recycle =
net.ipv4.tcp_max_syn_backlog =
net.ipv4.conf.all.arp_filter =
net.ipv4.conf.default.arp_filter =
net.core.netdev_max_backlog =
vm.overcommit_memory =
kernel.msgmni =
net.ipv4.ip_local_port_range =

vim /etc/security/limits.conf  (note: the same configuration applies to all 4 servers)

* soft nofile
* hard nofile
* soft nproc
* hard nproc

Disk read-ahead setting and switching to the deadline I/O scheduler (note: the same configuration applies to all 4 servers)

blockdev --setra  /dev/sda
echo deadline > /sys/block/sda/queue/scheduler

Note: replace the device name sda with the one that matches your own system.
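
After writing to the scheduler file it is worth confirming which scheduler is actually active. The kernel lists all available schedulers in that file and marks the active one with square brackets, for example `noop [deadline] cfq`. The sketch below extracts the bracketed name; the function name active_scheduler is hypothetical.

```shell
# active_scheduler  (hypothetical helper)
# Reads a scheduler line such as "noop [deadline] cfq" from standard input
# and prints only the name enclosed in square brackets.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}
```

On a real host it would be used as `active_scheduler < /sys/block/sda/queue/scheduler`, again with sda replaced by the actual device. Values written under /sys do not survive a reboot, so the setting has to be reapplied at boot time.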

Software download page: https://network.pivotal.io/products/pivotal-gpdb. Download greenplum-db-5.21.1-rhel7-x86_64.rpm.

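Install the Greenplum binaries on the Master host, that is, the server whose host name is gp-master. (Note: installing on the Master is enough; the remaining servers are installed in a batch later.)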

rpm -ivh greenplum-db-5.21.1-rhel7-x86_64.rpm

Note: the default installation directory is /usr/local.

Add the gpadmin user on the Master

adduser gpadmin
echo gpadmin | passwd --stdin gpadmin

Note: the password is set so that gpssh-exkeys -f hostfile_allhosts can be used later.

Grant sudo privileges to the gpadmin user on the Master

[root@gp-master ~]# visudo
gpadmin ALL=(ALL) ALL
gpadmin ALL=(ALL) NOPASSWD:ALL

On the Master host, give the gpadmin user ownership of the Greenplum directory

chown -R gpadmin.gpadmin /usr/local/greenplum-db*

II. Steps on the Master host as the gpadmin user

Prepare the host files used for the batch software installation and the later cluster initialization: hostfile_allhosts, hostfile_segments, and hostfile_mshosts. Store them in /home/gpadmin.

su - gpadmin

vim hostfile_allhosts

gp-master
gp-standby
gp-node1
gp-node2

vim hostfile_segments

gp-node1
gp-node2

vim hostfile_mshosts

gp-master
gp-standby
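
The three host files overlap, so they can be generated from one place instead of being typed three times. A minimal sketch, assuming the four host names used in this guide; the function name write_hostfiles is hypothetical.

```shell
# write_hostfiles DIR  (hypothetical helper)
# Creates the three host lists used in this guide under DIR.
write_hostfiles() {
    local dir=$1
    printf '%s\n' gp-master gp-standby > "$dir/hostfile_mshosts"
    printf '%s\n' gp-node1 gp-node2 > "$dir/hostfile_segments"
    # The all-hosts list is the master/standby list followed by the segment list.
    cat "$dir/hostfile_mshosts" "$dir/hostfile_segments" > "$dir/hostfile_allhosts"
}
```

Running `write_hostfiles /home/gpadmin` as gpadmin produces the same files as the vim steps above.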

Set up passwordless SSH login between all hosts

gpssh-exkeys -f hostfile_allhosts

Note: you will be prompted for the gpadmin user's password, which in this setup is gpadmin.

Set the permissions on the directory into which Greenplum is installed

gpssh -f hostfile_allhosts
=> sudo chown gpadmin.gpadmin /usr/local
=> exit

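Create the metadata directory on the master and standby hosts and assign ownership (mkdir -p is used because the parent directory /data/greenplum_data does not exist yet)

gpssh -f hostfile_mshosts
=>sudo mkdir -p /data/greenplum_data/gpmaster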
=>sudo chown -R gpadmin.gpadmin /data
=>exit

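Create the data directories on the Segment hosts and assign ownership (again with mkdir -p so that the parent directory is created as well)

gpssh -f hostfile_segments
=>sudo mkdir -p /data/greenplum_data/{primary,mirror}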
=>sudo chown -R gpadmin.gpadmin /data
=>exit
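
The directory layout differs by role: master and standby hosts get gpmaster, while segment hosts get primary and mirror. The sketch below captures that rule in one function so it can be rehearsed under a scratch directory; make_data_dirs and its role argument are hypothetical names, not Greenplum tooling.

```shell
# make_data_dirs BASE ROLE  (hypothetical helper)
# Creates the directories this guide uses for the given role under BASE.
make_data_dirs() {
    local base=$1 role=$2
    case $role in
        master)  mkdir -p "$base/gpmaster" ;;                 # master and standby hosts
        segment) mkdir -p "$base/primary" "$base/mirror" ;;   # segment hosts
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
}
```

With the paths from this guide, BASE would be /data/greenplum_data, and the chown step shown above still has to follow.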

Batch-install the software (Greenplum)

cd /home/gpadmin/
source /usr/local/greenplum-db/greenplum_path.sh
gpseginstall -f hostfile_allhosts -u gpadmin -p gpadmin

Configure NTP synchronization

Install the NTP server with yum; skip this step if it is already installed

sudo yum install ntp -y

If the following error appears, see the fix in the next step

There was a problem importing one of the Python modules
required to run yum. The error leading to this problem was:

   No module named yum

Please install a package which provides this module, or
verify that the module is installed correctly.

It's possible that the above module doesn't match the
current version of Python, which is:
2.7. (r266:, Jan , ::)
[GCC 4.4. (Red Hat 4.4.-)]

If you cannot solve this problem yourself, please go to
the yum faq at:
  http://yum.baseurl.org/wiki/Faq

Fix:

unset PYTHONHOME
unset PYTHONPATH
unset LD_LIBRARY_PATH

Once the yum installation has finished, restore the environment so that Greenplum works normally

source /usr/local/greenplum-db/greenplum_path.sh

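Note on the cause: once the Greenplum environment is loaded on the master node (by sourcing greenplum_path.sh), the variables PYTHONHOME, PYTHONPATH, and LD_LIBRARY_PATH are added and the original PATH is modified. yum is a Python program, so it then runs against Greenplum's bundled Python, which has no yum module.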

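Additional note: the LD_LIBRARY_PATH environment variable lists extra directories, beyond the default paths, that are searched for shared (dynamically linked) libraries.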
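
An alternative to unsetting the variables and sourcing greenplum_path.sh again is to strip them for a single command only, leaving the current shell untouched. This is a sketch of that idea, not something Greenplum ships; the function name run_without_gp_env is hypothetical, and it relies on the `-u` option of `env`.

```shell
# run_without_gp_env CMD [ARGS...]  (hypothetical helper)
# Runs CMD with the three Greenplum-related variables removed from its
# environment. The calling shell keeps its own values.
run_without_gp_env() {
    env -u PYTHONHOME -u PYTHONPATH -u LD_LIBRARY_PATH "$@"
}
```

As root, for example: `run_without_gp_env yum install ntp -y`. Because nothing is unset in the shell itself, there is no need to source greenplum_path.sh again afterwards.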


On each Segment host, edit /etc/ntp.conf. Point the first server parameter at the Master host and the second server parameter at the Standby host, as shown below:

sudo vim /etc/ntp.conf

server gp-master prefer
server gp-standby

On the Standby host, edit /etc/ntp.conf. Point the first server parameter at the Master host and the second at the data center's time server.

sudo vim /etc/ntp.conf

server gp-master prefer

On the Master host, use the NTP daemon to synchronize the system clocks of all Segment hosts. For example, with gpssh:

gpssh -f hostfile_allhosts -v -e 'ntpd'

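Output like the following indicates success (this sample was captured with a host file named all_hosts and hosts named mdw, smdw, sdw1, and sdw2):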

[root@gp-master gpadmin]# gpssh -f all_hosts -v -e 'ntpd'
[WARN] Reference default values as $MASTER_DATA_DIRECTORY/gpssh.conf could not be found
Using delaybeforesend 0.05 and prompt_validation_timeout 1.0 [Reset ...]
[INFO] login mdw
[INFO] login smdw
[INFO] login sdw1
[INFO] login sdw2
[ mdw] ntpd
[smdw] ntpd
[sdw1] ntpd
[sdw2] ntpd
[INFO] completed successfully [Cleanup...]

Configure the Greenplum initialization file

cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config   /home/gpadmin/gpinitsystem_config
chmod gpinitsystem_config

The relevant configuration is as follows:

[gpadmin@gp-master ~]$ cat gpinitsystem_config
# FILE NAME: gpinitsystem_config

# Configuration file needed by the gpinitsystem

################################################
#### REQUIRED PARAMETERS
################################################

#### Name of this Greenplum system enclosed in quotes.
ARRAY_NAME="Greenplum Data Platform"

#### Naming convention for utility-generated data directories.
SEG_PREFIX=gpseg

#### Base number by which primary segment port numbers
#### are calculated.
PORT_BASE=

#### File system location(s) where primary segment data directories
#### will be created. The number of locations in the list dictate
#### the number of primary segments that will get created per
#### physical host (if multiple addresses for a host are listed in
#### the hostfile, the number of segments will be spread evenly across
#### the specified interface addresses).
declare -a DATA_DIRECTORY=(/data/greenplum_data/primary)

#### OS-configured hostname or IP address of the master host.
MASTER_HOSTNAME=gp-master

#### File system location where the master data directory
#### will be created.
MASTER_DIRECTORY=/data/greenplum_data/gpmaster

#### Port number for the master instance.
MASTER_PORT=

#### Shell utility used to connect to remote hosts.
TRUSTED_SHELL=ssh

#### Maximum log file segments between automatic WAL checkpoints.
CHECK_POINT_SEGMENTS=

#### Default server-side character set encoding.
ENCODING=UTF-

################################################
#### OPTIONAL MIRROR PARAMETERS
################################################

#### Base number by which mirror segment port numbers
#### are calculated.
MIRROR_PORT_BASE=

#### Base number by which primary file replication port
#### numbers are calculated.
REPLICATION_PORT_BASE=

#### Base number by which mirror file replication port
#### numbers are calculated.
MIRROR_REPLICATION_PORT_BASE=

#### File system location(s) where mirror segment data directories
#### will be created. The number of mirror locations must equal the
#### number of primary locations as specified in the
#### DATA_DIRECTORY parameter.
declare -a MIRROR_DATA_DIRECTORY=(/data/greenplum_data/mirror)

################################################
#### OTHER OPTIONAL PARAMETERS
################################################

#### Create a database of this name after initialization.
DATABASE_NAME=testDB

#### Specify the location of the host address file here instead of
#### with the -h option of gpinitsystem.
MACHINE_LIST_FILE=/home/gpadmin/hostfile_segments
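
Parameters such as PORT_BASE and MASTER_PORT must carry a value before gpinitsystem is run. A quick way to catch a forgotten one is to list every upper-case KEY= assignment in the file whose right-hand side is empty. This is a sketch with a hypothetical function name, empty_params; it only inspects the text of the file and does not know which parameters gpinitsystem treats as optional.

```shell
# empty_params FILE  (hypothetical helper)
# Prints the name of every upper-case KEY= assignment in FILE whose value
# is empty (optionally followed by a trailing comment).
empty_params() {
    grep -E '^[A-Z_]+=[[:space:]]*(#.*)?$' "$1" | cut -d= -f1
}
```

Running `empty_params /home/gpadmin/gpinitsystem_config` should print nothing once every port and the encoding have been filled in for your environment.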

Run the initialization utility to initialize the database

source /usr/local/greenplum-db/greenplum_path.sh
gpinitsystem -c gpinitsystem_config

Initialization log:

20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-Review options for gpinitstandby
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-------------------------------------------------------
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-The Master /data/master/gpseg-1/pg_hba.conf post gpinitsystem
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-has been configured to allow all hosts within this new
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-array to intercommunicate. Any hosts external to this
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-new array must be explicitly added to this file
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-Refer to the Greenplum Admin support guide which is
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-located in the /usr/local/greenplum-db/./docs directory
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-------------------------------------------------------

At this point the cluster has 1 master and 2 segments but no standby, so the next step is to add the standby to the cluster.

Run on the Master server

gpinitstandby -s gp-standby

The output is as follows:

[gpadmin@mdw ~]$ gpinitstandby -s smdw
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Checking for filespace directory /data/master/gpseg- on smdw
:::: gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master initialization parameters
:::: gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master hostname = mdw
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master data directory = /data/master/gpseg-
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master port =
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master hostname = smdw
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master port =
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master data directory = /data/master/gpseg-
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum update system catalog = On
:::: gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
:::: gpinitstandby:mdw:gpadmin-[INFO]:- Filespace locations
:::: gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
:::: gpinitstandby:mdw:gpadmin-[INFO]:-pg_system -> /data/master/gpseg-
Do you want to continue with standby master initialization? Yy|Nn (default=N):
> y
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Syncing Greenplum Database extensions to standby
:::: gpinitstandby:mdw:gpadmin-[INFO]:-The packages on smdw are consistent.
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Adding standby master to catalog...
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Database catalog updated successfully.
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Updating pg_hba.conf file...
:::: gpinitstandby:mdw:gpadmin-[INFO]:-pg_hba.conf files updated successfully.
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Updating filespace flat files...
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Filespace flat file updated successfully.
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Starting standby master
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Checking if standby master is running on host: smdw in directory: /data/master/gpseg-
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Cleaning up pg_hba.conf backup files...
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Backup files of pg_hba.conf cleaned up successfully.
:::: gpinitstandby:mdw:gpadmin-[INFO]:-Successfully created standby master on gp-standby

Check the running processes:

[gpadmin@gp-master ~]$ ps -ef | grep postgres
gpadmin : ? :: /usr/local/greenplum-db-5.21./bin/postgres -D /data/greenplum_data/gpmaster/gpseg- -p --gp_dbid= --gp_num_contents_in_cluster= --silent-mode=true -i -M master --gp_contentid=- -x -E
gpadmin : ? :: postgres: , master logger process
gpadmin : ? :: postgres: , stats collector process
gpadmin : ? :: postgres: , writer process
gpadmin : ? :: postgres: , checkpointer process
gpadmin : ? :: postgres: , seqserver process
gpadmin : ? :: postgres: , ftsprobe process
gpadmin : ? :: postgres: , sweeper process
gpadmin : ? :: postgres: , stats sender process
gpadmin : ? :: postgres: , wal writer process
gpadmin : ? :: postgres: , wal sender process gpadmin 192.168.18.131() streaming /C05A028
gpadmin : pts/ :: grep --color=auto postgres

Set the environment variables for the gpadmin user; this is required on both the Master and the Standby.

vim /home/gpadmin/.bashrc

[gpadmin@gp-master ~]$ cat .bashrc
# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=

# User specific aliases and functions
source /usr/local/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/data/greenplum_data/gpmaster/gpseg-
export PGPORT=
export PGDATABASE=testDB
[gpadmin@gp-master ~]$ scp .bashrc gp-standby:`pwd`

Start and stop the database to verify that it starts up and shuts down normally. The commands are:

gpstart
gpstop

Greenplum is now deployed. Some simple tests follow.

Log in to the database: psql -d postgres

Create a table, insert a row, and query it

postgres=# create table student ( no int primary key,student_name varchar(40),age int);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "student_pkey" for table "student"
CREATE TABLE
postgres=# insert into student values(1,'yayun',18);
INSERT 0 1
postgres=# select * from student;
no | student_name | age
----+--------------+-----
1 | yayun | 18
(1 row)
