About Greenplum Database

Greenplum Database is an MPP SQL Database based on PostgreSQL.  Its used in production in hundreds of large corporations and government agencies around the world and including the open source has over thousands of deployments globally.

Greenplum Database scales to multi-petabyte data sizes with ease and allows a cluster of powerful servers to work together to provide a single SQL interface to the data.

In addition to using SQL for analyzing structured data, Greenplum provides modules and extensions on top of the PostgreSQL abstractions for in database machine learning and AI, Geospatial analytics, Text Search (with Apache Solr) and Text Analytics with Python and Java, and the ability to create user-defined functions with Python, R, Java, Perl, C or C++.

Greenplum Database Ubuntu Distribution

Greenplum Database is the only open source product in its category that has a large install base, and now with the release of Greenplum Database 5.3, Ready to Install binaries are hosted for the Ubuntu Operating System to make installation and deployment easy.
Ubuntu is a popular operating system in cloud-native environments and is based on the very well respected Debian Linux distribution.

In this article, I will demonstrate how to install the Open Source Greenplum Database binaries on the Ubuntu Operating System.

Greenplum Database binaries for Ubuntu are hosted on the Personal Package Archive system, which allows the community to contribute readily to install packages that can be installed from any internet connected system.

So let’s get right to it!

Greenplum OSS on Ubuntu Installation Instructions

First, ensure you have a supported Ubuntu OS version.  At the time of this writing, Ubuntu builds of Greenplum are built for the 16.04 LTS (long-term support) release version of Ubuntu.  Check the PPA page, for current information about which OS version is available.

Add the Greenplum PPA repository to your Ubuntu System, like this:

sudo add-apt-repository ppa:greenplum/db

Update your Ubuntu system to retrieve information from the recently added repository, like this:

sudo apt-get update

Install the Greenplum Database software, like this:

sudo apt-get install greenplum-db-oss

The above command will install the Greenplum Database software and any required dependencies on the system automatically and put the resulting software in /opt/gpdb.

Load Greenplum Database software into your environment with the following command:

$ . /opt/gpdb/greenplum_path.sh
$ which gpssh
/opt/gpdb/bin/gpssh

You can see the software is on the path by testing using the which command as above.  Now you can copy a Greenplum cluster configuration file template into your local directory for editing like this:

cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_singlenode .

Edit gpinitsystem Configuration File

The following edits can be made for the most simple cluster configuration running locally.

Create this file and put only your hostname into the file:
MACHINE_LIST_FILE=./hostlist_singlenode

Update this line to have a directory you want to use for primaries for example:
declare -a DATA_DIRECTORY=(/gpdata1 /gpdata2)
declare -a DATA_DIRECTORY=(/home/inovick/primary /home/inovick/primary)
And make sure the directory mentioned above exists.

Update this line to have the hostname of your machine, in my case, the hostname is ‘ubuntu’:
MASTER_HOSTNAME=hostname_of_machine
MASTER_HOSTNAME=ubuntu

Update the master data directory entry in the file and ensure it exists by making the directory:
MASTER_DIRECTORY=/home/inovick/master

That’s enough to get the database initialized and up running, so close the file and let’s initialize the cluster. We will have a master segment instance and two primary segment instances with this configuration. In more advanced setups you would configure a standby master and segment mirrors on additional hosts, and the data would be automatically both sharded (distributed) between the primary segments and mirrored from primaries to mirrors.

Run gpinitsystem

First, let’s make sure ssh keys are exchanged by running the following command.  Screenshot from my system is shown below:

gpssh-exchkeys -f hostlist_singlenode

Ok, we need to start the cluster, let’s get started. Run the following command:

gpinitsystem -c gpinitsystem_singlenode

The utility will print out what its going to do and then ask you to confirm before proceeding.  Here is an example below:

Once it finishes you are good to go, you can create a database, login and start doing queries and inserting data as shown below:

To really get the full benefit, you will want to do some of the following things:

  • Allocate enough hardware to process large amounts of data in your cluster
  • Check the official Greenplum Database documentation
  • Watch some of the Greenplum Videos on YouTube
  • Load a lot of data using the high speed parallel load of gpload or external tables with gpfdist, PXF, or S3

That’s it for this tutorial, enjoy Greenplum OSS on Ubuntu.

PPA description

Installation into /opt/gpdb
---------------------------

sudo apt-get install -y software-properties-common
sudo add-apt-repository ppa:greenplum/db
sudo apt-get update
sudo apt-get install -y greenplum-db-oss

Initialize the cluster
----------------------

1. Install Greenplum on all the nodes you will include in your cluster as described in : https://gpdb.docs.pivotal.io/latest/install_guide/prep_os_install_gpdb.html

2. On all nodes, create unix user "gpadmin" as described at https://gpdb.docs.pivotal.io/latest/admin_guide/roles_privs.html . A convenient script for this is in Greenplum source at https://raw.githubusercontent.com/greenplum-db/gpdb/master/concourse/scripts/setup_gpadmin_user.bash

3. Change ownership of gpdb installed files to the gpadmin user, and do all the following initialization as that gpadmin user:

chown -R gpadmin:gpadmin /opt/gpdb
     su - gpadmin
     source /opt/gpdb/greenplum_path.sh

4. Follow instructions to initialize cluster at https://gpdb.docs.pivotal.io/latest/install_guide/init_gpdb.html

Adding this PPA to your system

You can update your system with unsupported packages from this untrusted PPA by adding ppa:greenplum/db to your system's Software Sources. (Read about installing)

sudo add-apt-repository ppa:greenplum/db
sudo apt-get update

Install Greenplum OSS on Ubuntu的更多相关文章

  1. Install Google Pinyin on Ubuntu 14.04

    Install Google Pinyin on Ubuntu 14.04 I've been spending more and more time on Ubuntu and I'm not us ...

  2. HOWTO install Oracle 11g on Ubuntu Linux 12.04 (Precise Pangolin) 64bits

    安装了Ubuntu 12.04 64bit, 想在上面安装Oracle 11gr2,网上找了好多文档都没成功,最后完全参考了MordicusEtCubitus的文章. 成功安装的关键点:install ...

  3. Install a Redmine on Ubuntu system

    # How to install a Redmine on Ubuntu system Ref to: https://www.linode.com/docs/applications/project ...

  4. Install LAMP Stack On Ubuntu 16.04

    原文:http://www.unixmen.com/how-to-install-lamp-stack-on-ubuntu-16-04/ LAMP is a combination of operat ...

  5. How do you install Google Chrome on Ubuntu?

    https://askubuntu.com/questions/510056/how-to-install-google-chrome sudo apt-get install chromium-br ...

  6. Install eclipse ns3 in ubuntu 14.04

    1. NS3 install 参考NS3 tutorial即可. 2.eclipse 2.1下载 下载地址:http://www.eclipse.org/downloads/              ...

  7. install dns server on ubuntu

    参考 CSDN/Ubuntu环境下安装和配置DNS服务器 在 Ubuntu 上安裝 DNS server Install BIND 9 on Ubuntu and Configure It for U ...

  8. [译]How to Install Node.js on Ubuntu 14.04 如何在ubuntu14.04上安装node.js

    原文链接为 http://www.hostingadvice.com/how-to/install-nodejs-ubuntu-14-04/ 由作者Jacob Nicholson 发表于October ...

  9. Install latest R for ubuntu

    ### delete old version rm -rf /usr/local/lib/R /usr/lib/R ~/**/R sudo apt-get autoremove rstudio sud ...

随机推荐

  1. react-router 从 v3 版本升到 v4 版本,升级小记

    react-router v4 跟 react 一样拆成了两部分,核心的 react-router 和依运行环境而定的 react-router-dom 或 react-router-native(跟 ...

  2. ThinkPHP5与JQuery实现图片上传和预览效果

    内容正文 这篇文章主要为大家详细介绍了thinkphp上传图片功能,和jquery预览图片效果,具有一定的参考价值,感兴趣的小伙伴们可以参考一下 先上效果图: html和js代码如下: <!DO ...

  3. 亿级用户百TB级数据的AIOps 技术实践之路

    关于面临的挑战 "因为专业性强,我认为反而让交互方式变简单了,打个点餐的比方,软件1.0阶段是,我要吃鱼香肉丝,我要吃辣的或是素一点的,根据清晰的接口上菜.而软件2.0阶段就是,我今天想吃开 ...

  4. java 泛型与通配符(?)

    泛型应用于泛型类或泛型方法的声明. 如类GenericTest public class GenericTest<T> { private T item; public void set( ...

  5. 中文自然语言处理工具hanlp隐马角色标注详解

    本文旨在介绍如何利用HanLP训练分词模型,包括语料格式.语料预处理.训练接口.输出格式等. 目前HanLP内置的训练接口是针对一阶HMM-NGram设计的,另外附带了通用的语料加载工具,可以通过少量 ...

  6. new和delete重载

    1. 简介 new/delete关键字,其本质是预定义的操作符,因此支持重载 默认new和delete的行为: new:    ①获取内存空间(默认为堆空间):②在获取的空间中调用构造函数创建对象 d ...

  7. Win10+Ubuntu1604双系统

    原本电脑有一块固态硬盘和机械硬盘,用来跑win10的,现在想直接在ubuntu上跑tensorflow,所以加了块320G的机械硬盘单独跑ubuntu. 一.准备 1.ubuntu-16.04.3-d ...

  8. php 文件读取方式

    整理了一下PHP中读取文件的几个方法,方便以后查阅. 1.fread string fread ( int $handle , int $length ) fread() 从 handle 指向的文件 ...

  9. 【git】之clone(克隆)

    直接克隆 git clone https://github.com/gyjx/test.git 指定克隆某个分支 git clone -b dev https://github.com/gyjx/te ...

  10. C++11--正则表达式<regex>

    /* 正则表达式:一个指定的模式用来对文本中的字符串提供精确且灵活的匹配 */ #include <regex> using namespace std; int main() { str ...