Hadoop Ecosystem
There are many open-source Hadoop-related projects that are widely used by many companies. This article walks through installing them:
- Install JDK
- Install Hadoop
- Install HBase
- Install Hive
- Install Spark
- Install Impala
- Install Sqoop
- Install Alluxio
Install JDK
Step 1: download the package from the official site and choose the appropriate version.
Step 2: unzip the package and copy to destination folder
tar zxf jdk-8u111-linux-x64.tar.gz
cp -R jdk1.8.0_111 /usr/share
Step 3: set PATH and JAVA_HOME
vi ~/.bashrc
export JAVA_HOME=/usr/share/jdk1.8.0_111
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
source ~/.bashrc
Step 4: source ~/.bashrc (or log out and back in) to make the changes take effect
Step 5: check java version
java -version
javac -version
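Steps 3 and 4 edit ~/.bashrc by hand, and the same pattern recurs in every section below. A small helper can make those edits idempotent, so re-running an install script does not duplicate export lines. This is only a sketch: the append_once helper is a hypothetical convenience, and it writes to a temp file here instead of the real ~/.bashrc.

```shell
# Append a line to a profile file only if it is not already present,
# so repeated runs do not accumulate duplicate export lines.
append_once() {
  line="$1"; file="$2"
  # -x matches the whole line, -F treats it as a fixed string
  grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

profile="$(mktemp)"   # stand-in for ~/.bashrc in this sketch
append_once 'export JAVA_HOME=/usr/share/jdk1.8.0_111' "$profile"
append_once 'export JAVA_HOME=/usr/share/jdk1.8.0_111' "$profile"  # second call is a no-op
```

The same helper works for the HADOOP_HOME, HIVE_HOME, and SQOOP_HOME exports later in this article.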
Install Hadoop
Follow the steps below to install Hadoop in pseudo-distributed mode.
Step 1: download package from apache site
Step 2: unzip the package and copy to destination folder
tar zxf hadoop-2.7.3.tar.gz
mkdir -p /usr/share/hadoop
cp -R hadoop-2.7.3/* /usr/share/hadoop
Step 3: create a 'hadoop' folder under /home
mkdir /home/hadoop
Step 4: set PATH and HADOOP_HOME
vi ~/.bashrc
export HADOOP_HOME=/usr/share/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME
source ~/.bashrc
Step 5: check hadoop version
hadoop version
Step 6: configure HDFS, core-site, YARN and MapReduce
cd $HADOOP_HOME/etc/hadoop
vi hadoop-env.sh
export JAVA_HOME=/usr/share/jdk1.8.0_111
vi core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
vi hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
</property>
vi yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
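The XML fragments in Step 6 can also be generated from a script rather than typed by hand, which helps when repeating the setup on several machines. A minimal sketch, with the NameNode URI as a parameter; the output goes to an illustrative temp file here, whereas a real install would target $HADOOP_HOME/etc/hadoop/core-site.xml.

```shell
# Generate the minimal core-site.xml from Step 6, parameterised by the
# NameNode URI. The temp file stands in for the real config path.
namenode_uri="hdfs://localhost:9000"
conf="$(mktemp)"
cat > "$conf" <<EOF
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>${namenode_uri}</value>
  </property>
</configuration>
EOF
cat "$conf"
```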
Step 7: initialize hadoop namenode
hdfs namenode -format
Step 8: start hadoop
start-dfs.sh
start-yarn.sh
Step 9: check hadoop site to see if it works
http://localhost:50070/
http://localhost:8088/
Install HBase
Follow the steps below to install HBase in standalone mode.
Step 1: check that Hadoop is installed
hadoop version
Step 2: download version 1.2.4 of hbase from apache site
Step 3: unzip the package and copy to destination folder
tar zxf hbase-1.2.4-bin.tar.gz
mkdir -p /usr/share/hbase
cp -R hbase-1.2.4/* /usr/share/hbase
Step 4: configure hbase env
cd /usr/share/hbase/conf
vi hbase-env.sh
export JAVA_HOME=/usr/share/jdk1.8.0_111
Step 5: modify hbase-site.xml
vi hbase-site.xml
<configuration>
<!-- Set the path where you want HBase to store its files. -->
<property>
<name>hbase.rootdir</name>
<value>file:/home/hadoop/HBase/HFiles</value>
</property>
<!-- Set the path where you want HBase to store its built-in ZooKeeper files. -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
</configuration>
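Each HBase setting above is the same shape: a name/value pair wrapped in a property element. A tiny helper can emit such blocks so hbase-site.xml stays well-formed when more settings are added. The prop function is a hypothetical convenience for illustration, not part of HBase.

```shell
# Emit one <property> block per name/value pair, matching the
# hbase-site.xml layout shown above.
prop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

prop hbase.rootdir file:/home/hadoop/HBase/HFiles
prop hbase.zookeeper.property.dataDir /home/hadoop/zookeeper
```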
Step 6: start hbase and check hbase directory in hdfs
cd /usr/share/hbase/bin
start-hbase.sh
hadoop fs -ls /hbase
Step 7: check hbase via web interface
http://localhost:16010
Install Hive
Step 1: download version 1.2.1 of hive from apache site
Step 2: unzip the package and copy to destination folder
tar zxf apache-hive-1.2.1-bin.tar.gz
mkdir -p /usr/share/hive
cp -R apache-hive-1.2.1-bin/* /usr/share/hive
Step 3: set HIVE_HOME
vi ~/.bashrc
export HIVE_HOME=/usr/share/hive
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/share/hadoop/lib/*:.
export CLASSPATH=$CLASSPATH:/usr/share/hive/lib/*:.
source ~/.bashrc
Step 4: configure env for hive
cd $HIVE_HOME/conf
cp hive-env.sh.template hive-env.sh
vi hive-env.sh
export HADOOP_HOME=/usr/share/hadoop
Step 5: download version 10.12.1.1 of Apache Derby from apache site
Step 6: unzip derby package and copy to destination folder
tar zxf db-derby-10.12.1.1-bin.tar.gz
mkdir -p /usr/share/derby
cp -R db-derby-10.12.1.1-bin/* /usr/share/derby
Step 7: setup DERBY_HOME
vi ~/.bashrc
export DERBY_HOME=/usr/share/derby
export PATH=$PATH:$DERBY_HOME/bin
export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar
source ~/.bashrc
Step 8: create a directory to store metastore
mkdir $DERBY_HOME/data
Step 9: configure the metastore of Hive
cd $HIVE_HOME/conf
cp hive-default.xml.template hive-site.xml
vi hive-site.xml
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
Step 10: create a file named jpox.properties and add the following content into it
touch jpox.properties
vi jpox.properties
javax.jdo.PersistenceManagerFactoryClass = org.jpox.PersistenceManagerFactoryImpl
org.jpox.validateTables = false
org.jpox.validateColumns = false
org.jpox.validateConstraints = false
org.jpox.storeManagerType = rdbms
org.jpox.autoCreateSchema = true
org.jpox.autoStartMechanismMode = checked
org.jpox.transactionIsolation = read_committed
javax.jdo.option.DetachAllOnCommit = true
javax.jdo.option.NontransactionalRead = true
javax.jdo.option.ConnectionDriverName = org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL = jdbc:derby://hadoop1:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName = APP
javax.jdo.option.ConnectionPassword = mine
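Both hive-site.xml and jpox.properties embed the same Derby network-server JDBC URL, so it is easy for the two copies to drift apart. A sketch of assembling the URL from host, port, and database name in one place; the derby_url helper is illustrative, and "hadoop1" in jpox.properties would be your metastore host.

```shell
# Assemble the Derby client JDBC URL used by the Hive metastore config.
# ";create=true" tells Derby to create the database on first connection.
derby_url() {
  printf 'jdbc:derby://%s:%s/%s;create=true' "$1" "$2" "$3"
}

derby_url localhost 1527 metastore_db
```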
Step 11: enter into hive shell and execute command 'show tables'
cd $HIVE_HOME/bin
hive
hive> show tables;
Install Spark
Step 1: download version 2.12.0 of scala from scala site
Step 2: unzip the package and copy to destination folder
tar zxf scala-2.12.0.tgz
mkdir -p /usr/share/scala
cp -R scala-2.12.0/* /usr/share/scala
Step 3: set PATH for scala
vi ~/.bashrc
export PATH=$PATH:/usr/share/scala/bin
source ~/.bashrc
Step 4: check scala version
scala -version
Step 5: download version 2.0.2 of spark from apache site
Step 6: unzip the package and copy to destination folder
tar zxf spark-2.0.2-bin-hadoop2.7.tgz
mkdir -p /usr/share/spark
cp -R spark-2.0.2-bin-hadoop2.7/* /usr/share/spark
Step 7: setup PATH
vi ~/.bashrc
export PATH=$PATH:/usr/share/spark/bin
source ~/.bashrc
Step 8: enter into spark-shell to see if spark is installed successfully
spark-shell
Install Impala
Step 1: download version 2.7.0 of impala from impala site
Step 2: unzip the package and copy to destination folder
tar zxf apache-impala-incubating-2.7.0.tar.gz
mkdir -p /usr/share/impala
cp -R apache-impala-incubating-2.7.0/* /usr/share/impala
Step 3: set PATH and IMPALA_HOME
vi ~/.bashrc
export IMPALA_HOME=/usr/share/impala
export PATH=$PATH:/usr/share/impala
source ~/.bashrc
Step 4: to be continued...
Install Sqoop
Preconditions: Hadoop (HDFS and MapReduce) should already be installed
Step 1: download version 1.4.6 of sqoop from apache site
Step 2: unzip the package and copy to destination folder
tar zxf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
mkdir -p /usr/share/sqoop
cp -R sqoop-1.4.6.bin__hadoop-2.0.4-alpha/* /usr/share/sqoop
Step 3: set SQOOP_HOME and PATH
vi ~/.bashrc
export SQOOP_HOME=/usr/share/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
source ~/.bashrc
Step 4: configure sqoop
cd $SQOOP_HOME/conf
mv sqoop-env-template.sh sqoop-env.sh
vi sqoop-env.sh
export HADOOP_COMMON_HOME=/usr/share/hadoop
export HADOOP_MAPRED_HOME=/usr/share/hadoop
Step 5: download version 5.1.40 of mysql-connector-java from site
Step 6: unzip the package and move related jar file into destination folder
tar zxf mysql-connector-java-5.1.40.tar.gz
cd mysql-connector-java-5.1.40
mv mysql-connector-java-5.1.40-bin.jar /usr/share/sqoop/lib
Step 7: verify if sqoop is installed successfully
cd $SQOOP_HOME/bin
sqoop-version
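Once sqoop-version succeeds, a typical import pulls a MySQL table into HDFS through the connector jar installed in Step 6. The database name, table, credentials, and target directory below are placeholders, and the script only prints the command rather than running it, since a live MySQL instance and running HDFS are assumed.

```shell
# Build a representative sqoop import command line. Only printed here;
# run it for real once MySQL and HDFS are up.
cmd="sqoop import --connect jdbc:mysql://localhost:3306/testdb --username root --table employees --target-dir /user/hadoop/employees -m 1"
echo "$cmd"
```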
Install Alluxio
Step 1: download version 1.3.0 of alluxio from site
Step 2: unzip the package and move it to destination folder
tar zxf alluxio-1.3.0-hadoop2.7-bin.tar.gz
mkdir -p /usr/share/alluxio
cp -R alluxio-1.3.0-hadoop2.7-bin/* /usr/share/alluxio
Step 3: create alluxio-env
cd /usr/share/alluxio
bin/alluxio bootstrapConf localhost local
vi conf/alluxio-env.sh
export ALLUXIO_UNDERFS_ADDRESS=/tmp
Step 4: format alluxio file system and start alluxio
cd /usr/share/alluxio
bin/alluxio format
bin/alluxio-start.sh local
Step 5: verify if alluxio is running by visiting http://localhost:19999
Step 6: run predefined tests
cd /usr/share/alluxio
bin/alluxio runTests
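Every section above follows the same unpack-and-copy pattern, where the cp -R source is the directory the tarball unpacks to. That name can usually be derived by stripping the archive suffix; a sketch in pure string handling, with no real archives involved. Note the caveat: some archives (HBase's -bin tarball, for example) unpack to a shorter name, so tar tzf is the authoritative check.

```shell
# Derive the directory a tarball typically unpacks to by stripping the
# archive suffix. Always verify with `tar tzf` for exceptions.
extract_dir() {
  case "$1" in
    *.tar.gz) printf '%s' "${1%.tar.gz}" ;;
    *.tgz)    printf '%s' "${1%.tgz}" ;;
    *)        printf '%s' "$1" ;;
  esac
}

extract_dir hadoop-2.7.3.tar.gz   # hadoop-2.7.3
extract_dir scala-2.12.0.tgz      # scala-2.12.0
```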