The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

The project includes these modules:

Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

This article walks through installing and configuring a single-node Hadoop cluster on CentOS, step by step.

Install Java

Before installing Hadoop, make sure Java is installed on your system. Use this command to check the installed Java version.

java -version

java version "1.7.0_75"

Java(TM) SE Runtime Environment (build 1.7.0_75-b13)

Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
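The version check above can also be scripted. The following is a minimal sketch, not part of the original procedure: it extracts the quoted version from `java -version` style output and compares it with a required major.minor value. The helper names `java_version_of` and `java_at_least` are hypothetical, and the comparison assumes 1.x style version strings like the one shown above.

```shell
# Sketch: parse and compare a Java version string.
# java_version_of and java_at_least are illustrative helper names.

java_version_of() {
    # Reads `java -version` text on stdin and prints e.g. 1.7.0_75
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

java_at_least() {
    # $1 = found version such as 1.7.0_75
    # $2 = required major.minor such as 1.7
    local found_major found_minor want_major want_minor
    found_major=${1%%.*}
    found_minor=${1#*.}
    found_minor=${found_minor%%.*}
    want_major=${2%%.*}
    want_minor=${2#*.}
    [ "$found_major" -gt "$want_major" ] ||
        { [ "$found_major" -eq "$want_major" ] &&
          [ "$found_minor" -ge "$want_minor" ]; }
}
```

Because `java -version` writes to standard error, a typical call would be `java_at_least "$(java -version 2>&1 | java_version_of)" 1.7 && echo "Java is new enough"`.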

To install or update Java, follow the steps below.

The first step is to download the latest version of Java from the official Oracle website.

cd /opt/

wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-x64.tar.gz"

tar xzf jdk-7u79-linux-x64.tar.gz

Next, use alternatives to make the system use the newer version of Java:

cd /opt/jdk1.7.0_79/

alternatives --install /usr/bin/java java /opt/jdk1.7.0_79/bin/java 2

alternatives --config java

There are 3 programs which provide 'java'.

Selection    Command

-----------------------------------------------

*  1           /opt/jdk1.7.0_60/bin/java

+ 2           /opt/jdk1.7.0_72/bin/java

3           /opt/jdk1.7.0_79/bin/java

Enter to keep the current selection[+], or type selection number: 3 [Press Enter]

You may also need to set up the javac and jar command paths using alternatives.

alternatives --install /usr/bin/jar jar /opt/jdk1.7.0_79/bin/jar 2

alternatives --install /usr/bin/javac javac /opt/jdk1.7.0_79/bin/javac 2

alternatives --set jar /opt/jdk1.7.0_79/bin/jar

alternatives --set javac /opt/jdk1.7.0_79/bin/javac

The next step is to configure the environment variables. Use the following commands to set them up properly.

Setup JAVA_HOME Variable

export JAVA_HOME=/opt/jdk1.7.0_79

Setup JRE_HOME Variable

export JRE_HOME=/opt/jdk1.7.0_79/jre

Setup PATH Variable

export PATH=$PATH:/opt/jdk1.7.0_79/bin:/opt/jdk1.7.0_79/jre/bin
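The exports above only last for the current shell session. As a hedged sketch, the function below writes the same three lines to a file of your choice; on CentOS, scripts in /etc/profile.d/ are sourced by login shells, so that is one possible target. The name `write_java_env` is hypothetical.

```shell
# Sketch: persist the Java environment variables in a profile script.
# write_java_env is an illustrative name, not a standard tool.

write_java_env() {
    local jdk_dir="$1" out_file="$2"
    # Unquoted EOF so $jdk_dir expands now; \$PATH is escaped so it
    # stays literal and is expanded by the shell that sources the file.
    cat > "$out_file" <<EOF
export JAVA_HOME=$jdk_dir
export JRE_HOME=$jdk_dir/jre
export PATH=\$PATH:$jdk_dir/bin:$jdk_dir/jre/bin
EOF
}
```

For example, running `write_java_env /opt/jdk1.7.0_79 /etc/profile.d/java.sh` as root would make the settings available in new login shells.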

Installing Apache Hadoop

With the Java environment set up, let's start installing Apache Hadoop.

The first step is to create a system user account for the Hadoop installation.

useradd hadoop

passwd hadoop

Now configure SSH keys for the hadoop user. Use the following commands to enable SSH login without a password.

su - hadoop

ssh-keygen -t rsa -P ''

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

chmod 0600 ~/.ssh/authorized_keys

exit
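If passwordless login still prompts for a password, file permissions are a common cause, because sshd in its default strict mode ignores key files that other users can write to. The sketch below checks for the conventional modes (700 on the directory, 600 on authorized_keys). The name `check_ssh_perms` is hypothetical, and `stat -c` is the GNU coreutils form found on CentOS.

```shell
# Sketch: confirm the .ssh directory and authorized_keys have the
# conventional modes. check_ssh_perms is an illustrative name.

check_ssh_perms() {
    local ssh_dir="$1" ok=0
    [ "$(stat -c %a "$ssh_dir")" = "700" ] ||
        { echo "$ssh_dir should be mode 700"; ok=1; }
    [ "$(stat -c %a "$ssh_dir/authorized_keys")" = "600" ] ||
        { echo "$ssh_dir/authorized_keys should be mode 600"; ok=1; }
    return "$ok"
}
```

Run it as the hadoop user with `check_ssh_perms ~/.ssh`, then confirm the login itself with `ssh localhost`.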

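Now download the latest available version of Hadoop from its official site, hadoop.apache.org. Run these commands as the hadoop user (su - hadoop) so that the files land in /home/hadoop, the location the later steps expect.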

cd ~

wget http://apache.claz.org/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz

tar xzf hadoop-2.6.0.tar.gz

mv hadoop-2.6.0 hadoop
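Since the tarball comes from a mirror, it may be worth comparing its digest with the one published on the Apache release page before unpacking. This is an optional sketch and not part of the original steps. The name `verify_sha256` is hypothetical, and the expected digest is something you have to look up yourself, since it is not given in this article.

```shell
# Sketch: compare a file's SHA-256 digest with a trusted value.
# verify_sha256 is an illustrative name; the expected digest must be
# obtained separately from the project's release page.

verify_sha256() {
    local file="$1" expected actual
    # Published digests are sometimes uppercase; sha256sum prints lowercase.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}
```

A call would look like `verify_sha256 hadoop-2.6.0.tar.gz <published-digest> || echo "digest mismatch"`, where the placeholder stands for the value you looked up.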

The next step is to set the environment variables used by Hadoop.

Edit the ~/.bashrc file and add the following lines at the end of the file.

export HADOOP_HOME=/home/hadoop/hadoop

export HADOOP_INSTALL=$HADOOP_HOME

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export YARN_HOME=$HADOOP_HOME

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Then apply the changes to the current environment:

source ~/.bashrc

Edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME environment variable:

export JAVA_HOME=/opt/jdk1.7.0_79/

Now configure a basic single-node Hadoop cluster.

First, go to the Hadoop configuration directory and make the following changes to the configuration files.

 cd /home/hadoop/hadoop/etc/hadoop

Let’s start by editing core-site.xml

<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://localhost:9000</value>

</property>

</configuration>

Then edit hdfs-site.xml:

<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

<property>

<name>dfs.name.dir</name>

<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>

</property>

<property>

<name>dfs.data.dir</name>

<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>

</property>

</configuration>

Next, edit mapred-site.xml (Hadoop 2.6.0 ships only mapred-site.xml.template, so copy it to mapred-site.xml first):

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

</configuration>

Finally, edit yarn-site.xml:

<configuration>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

</configuration>
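The four files above share one layout, so they can also be generated from a script instead of edited by hand. The function below is a rough sketch under that assumption: it takes an output path followed by name=value pairs. The name `write_site_xml` is hypothetical, the function overwrites the target file, and it does no XML escaping, which is sufficient for the plain values used in this article.

```shell
# Sketch: write a Hadoop *-site.xml file from name=value arguments.
# write_site_xml is an illustrative name; it overwrites the target file
# and performs no XML escaping.

write_site_xml() {
    local out_file="$1" pair
    shift
    {
        echo '<?xml version="1.0"?>'
        echo '<configuration>'
        for pair in "$@"; do
            echo '  <property>'
            echo "    <name>${pair%%=*}</name>"
            echo "    <value>${pair#*=}</value>"
            echo '  </property>'
        done
        echo '</configuration>'
    } > "$out_file"
}
```

For instance, `write_site_xml core-site.xml fs.default.name=hdfs://localhost:9000` produces the core-site.xml shown earlier, plus an XML declaration line.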

Now format the namenode using the following command:

hdfs namenode -format

To start all Hadoop services, use the following commands:

cd /home/hadoop/hadoop/sbin/

start-dfs.sh

start-yarn.sh

To check that all services started properly, use the jps command:

jps

You should see output like this:

26049 SecondaryNameNode

25929 DataNode

26399 Jps

26129 ResourceManager

26249 NodeManager

25807 NameNode
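Reading the jps listing by eye gets tedious, so here is a small sketch that reports which daemons are missing. It assumes the Hadoop 2.x setup configured above, where start-dfs.sh launches NameNode, DataNode and SecondaryNameNode, and start-yarn.sh launches ResourceManager and NodeManager. The name `missing_daemons` is hypothetical.

```shell
# Sketch: list expected Hadoop 2.x daemons that are absent from jps output.
# missing_daemons is an illustrative name; the list assumes HDFS plus YARN.

missing_daemons() {
    # Reads `jps` output ("<pid> <name>" per line) on stdin.
    local seen daemon
    seen=$(awk '{print $2}')
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -x matches whole lines, so SecondaryNameNode does not count as NameNode.
        printf '%s\n' "$seen" | grep -qx "$daemon" || echo "$daemon"
    done
}
```

Running `jps | missing_daemons` prints nothing when all five daemons are up, and one name per line otherwise.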

Now you can open the following URLs in your browser:

Verify all applications for the cluster: http://your-ip-address:8088/

Verify all Hadoop services: http://your-ip-address:50070/

Thanks!!!

Reference: http://www.unixmen.com/setup-apache-hadoop-centos

----------------------------

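After installing Hadoop, the following warning often appears:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Many articles attribute this to the word size of the operating system; I am using 64-bit CentOS 6.5.
A couple of days ago, while building a Docker image, I came across a step that resolves the problem. I tried it myself, and the warning no longer appears.
First download hadoop-native-64-2.4.0.tar: http://dl.bintray.com/sequenceiq/sequenceiq-bin/hadoop-native-64-2.4.0.tar If you are on Hadoop 2.6, download this one instead: http://dl.bintray.com/sequenceiq/sequenceiq-bin/hadoop-native-64-2.6.0.tar

After downloading, extract it into Hadoop's native directory, overwriting the existing files, as follows:

tar -xf hadoop-native-64-2.4.0.tar -C hadoop/lib/native/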

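1. jps: command not found on Linux

After starting Hadoop, running jps reports that the command cannot be found:

jps
-bash: jps: command not found

jps is an executable in the bin directory of the JDK. I checked my JDK directory and the jps executable was there; it simply was not on the PATH. You can inspect the PATH with echo $PATH.

So you have to add it yourself. As root, run vi /etc/profile and add the line export PATH="/usr/java/jdk160_05/bin:$PATH" at the end, replacing /usr/java/jdk160_05 with the directory where your JDK is installed. Save and exit.

Then run source /etc/profile. If no error is reported, it worked, and running jps again shows the processes.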
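To avoid typing the wrong directory into /etc/profile, the PATH line can be generated only after checking that the JDK directory really contains jps. This is a minimal sketch; the name `jps_path_line` is hypothetical.

```shell
# Sketch: print the export line for a JDK's bin directory, but only if
# that directory contains an executable jps.
# jps_path_line is an illustrative name.

jps_path_line() {
    local jdk_dir="$1"
    if [ -x "$jdk_dir/bin/jps" ]; then
        # Single quotes keep $PATH literal; %s is filled with the JDK path.
        printf 'export PATH="%s/bin:$PATH"\n' "$jdk_dir"
    else
        echo "no executable jps under $jdk_dir/bin" >&2
        return 1
    fi
}
```

As root, `jps_path_line /opt/jdk1.7.0_79 >> /etc/profile` would append the line for the JDK installed earlier in this article.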

Start sshd:

service sshd start

Turn off the firewall:

systemctl stop firewalld.service # stop firewalld

systemctl disable firewalld.service # prevent firewalld from starting at boot

firewall-cmd --state # check the firewall state (shows "not running" when stopped, "running" when active)

How To Install HBASE

Download and unpack the latest release.

Choose a download site from this list of Apache Download Mirrors. Download the latest stable or fresh release, hbase-0.96.1.1-hadoop2-bin.tar.gz.

Upload the file to /opt, then decompress and untar it:

tar xvfz hbase-0.96.1.1-hadoop2-bin.tar.gz

Extract and set up HBase

Similar to what you did for Hadoop, extract HBase and rename the folder. Run the following commands from the location where you saved the HBase files:

$ tar xvfz hbase-0.96.0-hadoop2-bin.tar.gz

$ mv hbase-0.96.0-hadoop2 hbase

Note: if you are using HBase 0.96
HBase 0.96 ships with an older version of the Hadoop libraries: beta builds of the Hadoop common jar files, which are not compatible with the Hadoop 2.2 release we downloaded. If you continue with them, you will get errors like
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcServerException): Unknown out of band call #xxxxxx
in the HBase log files. To fix the problem, remove all the beta jar files from HBase and use the correct set of files. Start by removing the incompatible files:

$ cd /home/hdtest/hbase/lib

$ rm -rf hadoop*.jar

Once the files are removed, copy the correct files from the Hadoop installation:

$ cd /home/hdtest/hbase/lib

$ cp $HADOOP_HOME/share/hadoop/common/hadoop*.jar .

$ cp $HADOOP_HOME/share/hadoop/hdfs/hadoop*.jar .

$ cp $HADOOP_HOME/share/hadoop/mapreduce/hadoop*.jar .

$ cp $HADOOP_HOME/share/hadoop/tools/lib/hadoop*.jar .

$ cp $HADOOP_HOME/share/hadoop/yarn/hadoop*.jar .
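Deleting the bundled jars outright makes the change hard to undo. A more cautious variant of the same procedure, sketched below, moves them into a backup folder before copying the new ones. The function name `swap_hadoop_jars` and the backup folder name hadoop-jars.bak are hypothetical; the source subdirectories are the ones used in the commands above.

```shell
# Sketch: move HBase's bundled hadoop*.jar files aside, then copy in the
# jars from the local Hadoop install.
# swap_hadoop_jars and hadoop-jars.bak are illustrative names.

swap_hadoop_jars() {
    local hbase_lib="$1" hadoop_home="$2" sub jar
    mkdir -p "$hbase_lib/hadoop-jars.bak"
    # Keep the originals so the change can be reverted.
    for jar in "$hbase_lib"/hadoop*.jar; do
        if [ -e "$jar" ]; then mv "$jar" "$hbase_lib/hadoop-jars.bak/"; fi
    done
    for sub in common hdfs mapreduce tools/lib yarn; do
        for jar in "$hadoop_home/share/hadoop/$sub"/hadoop*.jar; do
            if [ -e "$jar" ]; then cp "$jar" "$hbase_lib/"; fi
        done
    done
}
```

With the paths used in this article, the call would be `swap_hadoop_jars /home/hdtest/hbase/lib "$HADOOP_HOME"`.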

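Update the HBase configuration file at hbase/conf/hbase-site.xml. Add the following content between the <configuration> tags. The host and port in hbase.rootdir must match the NameNode address set in Hadoop's core-site.xml.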

 <property>

    <name>hbase.rootdir</name>

    <value>hdfs://127.0.0.1:54310/sample</value>

  </property>

  <property>

    <name>hbase.zookeeper.property.dataDir</name>

    <value>hdfs://127.0.0.1:54310/zookeeper</value>

  </property>

  <property>

    <name>hbase.zookeeper.property.clientPort</name>

    <value>2181</value>

    <description>Property from ZooKeeper's config zoo.cfg.

                 The port at which the clients will connect.</description>

  </property>

If you want to use a local file system instead of HDFS, replace the URL with file:///your/preferred/path/. Now let's start the HBase instance by running the following commands.

$ cd /home/hdtest/hbase/bin

$ ./start-hbase.sh

HBase will not start automatically in this configuration; you have to run this command again after every reboot. Once the command has finished and you are back at the shell prompt, check the log files for problems. The log files are under the /home/hdtest/hbase/logs folder. If you don't see any issues, let's try using HBase. Open the HBase shell and run a simple list command.

$ ./hbase shell

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 0.96.0-hadoop2, r1531434, Fri Oct 11 15:28:08 PDT 2013

hbase(main):001:0> list

TABLE

0 row(s) in 3.1580 seconds

=> []

hbase(main):002:0>

There are no tables in HBase yet, which is why the output is []. Now let's create a table and run the list command again.

hbase(main):003:0> create 'sample', 'r'

0 row(s) in 0.5550 seconds

=> Hbase::Table - sample

hbase(main):004:0> list

TABLE

sample

1 row(s) in 0.0600 seconds

=> ["sample"]

hbase(main):005:0>

Congratulations! You now have Hadoop and HBase running.


How to Install Hive

HIVE INSTALLATION

This section refers to the installation settings of Hive on a standalone system as well as on a system existing as a node in a cluster.

INTRODUCTION

Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analysis. Hive supports analysis of large datasets stored in Hadoop's HDFS and in compatible file systems such as Amazon S3. It provides an SQL-like language called HiveQL (Hive Query Language) while maintaining full support for map/reduce.

Installing Hive:

Browse to http://apache.claz.org/hive/stable/ and click apache-hive-0.13.0-bin.tar.gz.

Save the file and extract it.

Commands

user@ubuntu:~$  cd  /usr/lib/

user@ubuntu:~$  sudo mkdir hive

user@ubuntu:~$  cd ~/Downloads

user@ubuntu:~$  sudo mv apache-hive-0.13.0-bin /usr/lib/hive

Setting Hive environment variable:

Commands

user@ubuntu:~$  cd

user@ubuntu:~$  sudo gedit  ~/.bashrc

Copy and paste the following lines at end of the file

# Set HIVE_HOME

export HIVE_HOME="/usr/lib/hive/apache-hive-0.13.0-bin"

PATH=$PATH:$HIVE_HOME/bin

export PATH

Setting HADOOP_HOME in hive-config.sh

Commands

user@ubuntu:~$ cd  /usr/lib/hive/apache-hive-0.13.0-bin/bin

user@ubuntu:~$ sudo gedit hive-config.sh

Go to the lines that read:

# Allow alternate conf dir location.

HIVE_CONF_DIR="${HIVE_CONF_DIR:-$HIVE_HOME/conf}"

export HIVE_CONF_DIR=$HIVE_CONF_DIR

export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH

Below these lines, add the following:

export HADOOP_HOME=/usr/local/hadoop    # use the path where Hadoop is installed

Create Hive directories within HDFS

Command

user@ubuntu:~$   hadoop fs -mkdir -p /usr/hive/warehouse

Setting READ/WRITE permission for table

Command

user@ubuntu:~$  hadoop fs -chmod g+w /usr/hive/warehouse

HIVE launch

Command

user@ubuntu:~$  hive

The Hive shell prompt will appear:

OUTPUT

The shell will look like this:

Logging initialized using configuration in jar:file:/usr/lib/hive/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties

hive> show databases;

Creating a database

Command

hive> create database mydb;

OUTPUT

OK

Time taken: 0.369 seconds

hive>

Configuring hive-site.xml:

Open hive-site.xml with a text editor and change the following properties:

<property>

    <name>hive.metastore.local</name>

    <value>TRUE</value>

    <description>controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM</description>

</property>

<property>

    <name>javax.jdo.option.ConnectionURL</name>

    <value>jdbc:mysql://usr/lib/hive/apache-hive-0.13.0-bin/metastore_db?createDatabaseIfNotExist=true</value>

    <description>JDBC connect string for a JDBC metastore</description>

</property>

<property>

    <name>javax.jdo.option.ConnectionDriverName</name>

    <value>com.mysql.jdbc.Driver</value>

    <description>Driver class name for a JDBC metastore</description>

</property>

<property>

    <name>hive.metastore.warehouse.dir</name>

    <value>/usr/hive/warehouse</value>

    <description>location of default database for the warehouse</description>

 </property>

Writing a Script

Open a new terminal (CTRL+ALT+T)

user@ubuntu:~$      sudo gedit sample.sql

create database sample;

use sample;

create table product(product int, productname string, price float) row format delimited fields terminated by ',';

describe product;

load data local inpath '/home/hduser/input_to_product.txt' into table product;

select * from product;

SAVE and CLOSE

user@ubuntu:~$ sudo gedit input_to_product.txt

user@ubuntu:~$ cd /usr/lib/hive/apache-hive-0.13.0-bin/

user@ubuntu:~$ bin/hive -f /home/hduser/sample.sql
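The article does not show what input_to_product.txt should contain. Since the product table is declared with fields terminated by ',', each line needs three comma-separated values: an integer id, a name and a price. The sketch below writes such a file. The name `write_sample_products` is hypothetical and the three rows are made-up sample data, not taken from the original.

```shell
# Sketch: create a comma-separated input file matching the product table
# (product int, productname string, price float).
# write_sample_products is an illustrative name; the rows are invented.

write_sample_products() {
    local out_file="$1"
    cat > "$out_file" <<'EOF'
1,pen,1.5
2,notebook,3.25
3,stapler,7.0
EOF
}
```

Calling `write_sample_products /home/hduser/input_to_product.txt` before running the script gives the load data statement something to read.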

When starting Hive, the following error appears:

Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected

Solution:

Delete the following file:

$HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar
