Hadoop+HBase Cluster Setup
1. Environment Preparation
Note: this walkthrough uses CentOS 7.5 with Hadoop 3.1.1 (HBase 1.4.9 is added in section 6).
1.1 Cluster Layout
The cluster consists of three machines:
Hostname | IP | Roles
hadoop01 | 10.0.0.10 | DataNode, NodeManager, NameNode
hadoop02 | 10.0.0.11 | DataNode, NodeManager, ResourceManager, SecondaryNameNode
hadoop03 | 10.0.0.12 | DataNode, NodeManager
1.2 Machine Configuration
[clsn@hadoop01 /home/clsn]
$cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
[clsn@hadoop01 /home/clsn]
$uname -r
3.10.0-862.el7.x86_64
[clsn@hadoop01 /home/clsn]
$sestatus
SELinux status: disabled
[clsn@hadoop01 /home/clsn]
$systemctl status firewalld.service
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: man:firewalld(1)
[clsn@hadoop01 /home/clsn]
$id clsn
uid=1000(clsn) gid=1000(clsn) groups=1000(clsn)
[clsn@hadoop01 /home/clsn]
$cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.0.10 hadoop01
10.0.0.11 hadoop02
10.0.0.12 hadoop03
Note: every daemon in this cluster is started by the clsn user.
1.3 Passwordless SSH
ssh-keygen                                   # generate a key pair, accepting the defaults
ssh-copy-id -i ~/.ssh/id_rsa.pub 127.0.0.1   # authorize the key for the local account
scp -rp ~/.ssh hadoop02:/home/clsn           # reuse the same key pair on the other nodes
scp -rp ~/.ssh hadoop03:/home/clsn
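A quick way to confirm the trust works (a sanity check, not part of the original steps; hostnames as defined in /etc/hosts below):
for i in hadoop01 hadoop02 hadoop03
do
ssh -o BatchMode=yes $i hostname   # should print each hostname without a password prompt
done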
1.4 JDK Setup
Run on all three machines:
tar xf jdk-8u191-linux-x64.tar.gz -C /usr/local/
ln -s /usr/local/jdk1.8.0_191 /usr/local/jdk
# back up /etc/profile (.ori suffix), then append the JAVA_HOME, PATH and CLASSPATH exports
sed -i.ori '$a export JAVA_HOME=/usr/local/jdk\nexport PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH\nexport CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar' /etc/profile
. /etc/profile
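Verify the JDK is active (a quick check):
java -version      # should report 1.8.0_191
echo $JAVA_HOME    # should print /usr/local/jdk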
2. Install Hadoop
2.1 Download the Binary Package
wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
2.2 Install
tar xf hadoop-3.1.1.tar.gz -C /usr/local/
ln -s /usr/local/hadoop-3.1.1 /usr/local/hadoop
sudo chown -R clsn.clsn /usr/local/hadoop-3.1.1/
3. Hadoop Configuration
All configuration files are under /usr/local/hadoop/etc/hadoop.
3.1 hadoop-env.sh
Prepend . /etc/profile at the top of the file so the Hadoop scripts inherit JAVA_HOME:
[clsn@hadoop01 /usr/local/hadoop/etc/hadoop]
$ head hadoop-env.sh
. /etc/profile
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
3.2 core-site.xml
[clsn@hadoop01 /usr/local/hadoop/etc/hadoop]
$ cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- RPC address of the NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop01:9000</value>
</property>
<!-- Base directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/data/tmp</value>
</property>
</configuration>
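Once the installation is distributed, the effective value can be confirmed with hdfs getconf (a quick check):
/usr/local/hadoop/bin/hdfs getconf -confKey fs.defaultFS   # should print hdfs://hadoop01:9000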
3.3 hdfs-site.xml
[clsn@hadoop01 /usr/local/hadoop/etc/hadoop]
$ cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- HTTP address of the NameNode web UI -->
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop01:50070</value>
</property>
<!-- HTTP address of the SecondaryNameNode web UI -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop02:50090</value>
</property>
<!-- Local path where the NameNode stores its metadata -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/name</value>
</property>
<!-- Number of block replicas -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- Local path where DataNodes store blocks -->
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/datanode</value>
</property>
<!-- Disable HDFS permission checks (convenient for a test cluster, not recommended in production) -->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
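Note: Hadoop 3.x moved the default NameNode web UI port from 50070 to 9870 and the SecondaryNameNode port from 50090 to 9868; because both addresses are set explicitly above, the familiar 2.x ports keep working on this cluster.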
3.4 mapred-site.xml
[clsn@hadoop01 /usr/local/hadoop/etc/hadoop]
$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Tell the MapReduce framework to run on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
/usr/local/hadoop/etc/hadoop,
/usr/local/hadoop/share/hadoop/common/*,
/usr/local/hadoop/share/hadoop/common/lib/*,
/usr/local/hadoop/share/hadoop/hdfs/*,
/usr/local/hadoop/share/hadoop/hdfs/lib/*,
/usr/local/hadoop/share/hadoop/mapreduce/*,
/usr/local/hadoop/share/hadoop/mapreduce/lib/*,
/usr/local/hadoop/share/hadoop/yarn/*,
/usr/local/hadoop/share/hadoop/yarn/lib/*
</value>
</property>
</configuration>
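The classpath entries above must exist on every node; a loop like this can confirm it (a sanity check, not part of the original steps):
for d in /usr/local/hadoop/share/hadoop/common /usr/local/hadoop/share/hadoop/hdfs /usr/local/hadoop/share/hadoop/mapreduce /usr/local/hadoop/share/hadoop/yarn
do
ls -d $d || echo "missing: $d"
done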
3.5 yarn-site.xml
[clsn@hadoop01 /usr/local/hadoop/etc/hadoop]
$ cat yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop02</value>
</property>
<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
</configuration>
3.6 workers
Hadoop 3.x renamed the old slaves file to workers; it lists the hosts on which start-dfs.sh and start-yarn.sh launch the DataNode and NodeManager daemons. (The masters file is no longer consulted; the SecondaryNameNode location comes from dfs.namenode.secondary.http-address in hdfs-site.xml.) Per the layout in 1.1, all three nodes run DataNode and NodeManager:
echo 'hadoop01
hadoop02
hadoop03' > /usr/local/hadoop/etc/hadoop/workers
3.7 Startup Script Changes
The startup scripts are under /usr/local/hadoop/sbin:
(1) Add to start-dfs.sh and stop-dfs.sh:
HDFS_DATANODE_USER=clsn
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=clsn
HDFS_SECONDARYNAMENODE_USER=clsn
(2) Add to start-yarn.sh and stop-yarn.sh:
YARN_RESOURCEMANAGER_USER=clsn
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=clsn
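Note: the HADOOP_SECURE_DN_USER lines only matter when DataNodes run in secure mode, and Hadoop 3.x reports that variable as deprecated (replaced by HDFS_DATANODE_SECURE_USER); on this non-secure cluster it is the *_USER=clsn assignments that the scripts actually check.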
4. Pre-start Preparation
4.1 Create the Data Directories
mkdir -p /data/tmp
mkdir -p /data/name
mkdir -p /data/datanode
chown -R clsn.clsn /data
Create these on every machine in the cluster; alternatively, copy the directory tree:
for i in hadoop02 hadoop03
do
sudo scp -rp /data $i:/
done
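Whichever way the directories are created, ownership should be verified on each node (a sanity check):
for i in hadoop01 hadoop02 hadoop03
do
ssh $i 'ls -ld /data/tmp /data/name /data/datanode'
done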
4.2 Copy the Hadoop Installation to the Other Machines
for i in hadoop02 hadoop03
do
sudo scp -rp /usr/local/hadoop-3.1.1 $i:/usr/local/
done
4.3 Start the Hadoop Cluster
(1) Format the NameNode before the first start:
/usr/local/hadoop/bin/hdfs namenode -format
(2) Start the cluster:
cd /usr/local/hadoop/sbin
./start-all.sh
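Besides jps, the daemons can be verified from the command line (a quick check; both commands ship with Hadoop):
/usr/local/hadoop/bin/hdfs dfsadmin -report   # should list three live DataNodes
/usr/local/hadoop/bin/yarn node -list         # should list three running NodeManagers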
5. Verify the Cluster
(1) Use jps to confirm the daemons on each node match the plan:
[clsn@hadoop01 /home/clsn]
$ pssh -ih cluster "`which jps`"
[1] 11:30:31 [SUCCESS] hadoop03
7947 DataNode
8875 Jps
8383 NodeManager
[2] 11:30:31 [SUCCESS] hadoop01
20193 DataNode
20665 NodeManager
21017 NameNode
22206 Jps
[3] 11:30:31 [SUCCESS] hadoop02
8896 DataNode
9427 NodeManager
10883 Jps
9304 ResourceManager
10367 SecondaryNameNode
(2) Browse to http://hadoop02:8088/cluster/nodes
This is the ResourceManager web UI; it should show the cluster's three active nodes.
(3) Browse to http://hadoop01:50070/dfshealth.html#tab-datanode
This is the NameNode web UI.
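A simple HDFS smoke test rounds out the verification (a sketch; the file names are arbitrary):
echo clsn > /tmp/test.txt
/usr/local/hadoop/bin/hdfs dfs -mkdir -p /test
/usr/local/hadoop/bin/hdfs dfs -put /tmp/test.txt /test/
/usr/local/hadoop/bin/hdfs dfs -cat /test/test.txt   # should print clsn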
6. HBase Configuration
6.1 Deploy the HBase Package
cd /opt/
wget http://mirrors.tuna.tsinghua.edu.cn/apache/hbase/1.4.9/hbase-1.4.9-bin.tar.gz
tar xf hbase-1.4.9-bin.tar.gz -C /usr/local/
ln -s /usr/local/hbase-1.4.9 /usr/local/hbase
6.2 Modify the Configuration Files
6.2.1 hbase-env.sh
# Add this line so the HBase scripts inherit JAVA_HOME
. /etc/profile
6.2.2 hbase-site.xml
[clsn@hadoop01 /usr/local/hbase/conf]
$ cat hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<!-- HDFS directory where HBase stores its data -->
<value>hdfs://hadoop01:9000/hbase/hbase_db</value>
<!-- Host and port must match fs.defaultFS in Hadoop's core-site.xml -->
</property>
<property>
<name>hbase.cluster.distributed</name>
<!-- Run in fully distributed mode -->
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<!-- Nodes that run the ZooKeeper ensemble; an odd count is recommended -->
<value>hadoop01,hadoop02,hadoop03</value>
</property>
<property>
<!-- Local directory for ZooKeeper data and logs; must already exist -->
<name>hbase.zookeeper.property.dataDir</name>
<value>/data/hbase/zookeeper</value>
</property>
<property>
<!-- HBase master web UI port -->
<name>hbase.master.info.port</name>
<value>16610</value>
</property>
</configuration>
Note on ZooKeeper sizing:
A ZooKeeper ensemble remains available as long as a majority of its members are working.
With 2 servers, losing 1 leaves no majority, so a 2-node ensemble tolerates 0 failures.
With 3 servers, losing 1 still leaves a majority of 2, so a 3-node ensemble tolerates 1 failure.
Listing a few more: 2->0, 3->1, 4->1, 5->2, 6->2. The pattern: ensembles of 2n and 2n-1 nodes both tolerate n-1 failures, so the extra even-numbered node adds cost without adding resilience.
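In other words, the tolerance for n nodes is (n-1)/2 rounded down, which a one-line loop confirms (illustration only):
for n in 2 3 4 5 6 7; do echo "$n nodes -> tolerates $(( (n-1)/2 )) failures"; done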
6.2.3 regionservers
[clsn@hadoop01 /usr/local/hbase/conf]
$ cat regionservers
hadoop01
hadoop02
hadoop03
6.2.4 Distribute HBase to the Other Nodes
for i in hadoop02 hadoop03
do
sudo scp -rp /usr/local/hbase-1.4.9 $i:/usr/local/
done
6.3 Start the HBase Cluster
6.3.1 Start HBase
[clsn@hadoop01 /usr/local/hbase/bin]
$ sudo ./start-hbase.sh
hadoop03: running zookeeper, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-zookeeper-hadoop03.out
hadoop02: running zookeeper, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-zookeeper-hadoop02.out
hadoop01: running zookeeper, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-zookeeper-hadoop01.out
running master, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-master-hadoop01.out
hadoop02: running regionserver, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-regionserver-hadoop02.out
hadoop03: running regionserver, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-regionserver-hadoop03.out
hadoop01: running regionserver, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-regionserver-hadoop01.out
Browse to http://hadoop01:16610/master-status to check the HBase master's status.
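The same check can be scripted from the shell (a sketch using curl):
curl -s -o /dev/null -w '%{http_code}\n' http://hadoop01:16610/master-status   # 200 means the master UI is up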
6.3.2 Start the HBase Shell
[clsn@hadoop01 /usr/local/hbase/bin]
$ ./hbase shell # start the HBase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hbase-1.4.9/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.9, rd625b212e46d01cb17db9ac2e9e927fdb201afa1, Wed Dec 5 11:54:10 PST 2018
hbase(main):001:0> create 'clsn','cf' # create table clsn with a single column family cf
0 row(s) in 7.8790 seconds
=> Hbase::Table - clsn
hbase(main):003:0> list # list all HBase tables
TABLE
clsn
1 row(s) in 0.0860 seconds
=> ["clsn"]
hbase(main):004:0> put 'clsn','1000000000','cf:name','clsn' # put a cell into table clsn: rowkey 1000000000, column cf:name
0 row(s) in 0.3390 seconds
hbase(main):005:0> put 'clsn','1000000000','cf:sex','male' # same rowkey, column cf:sex
0 row(s) in 0.0300 seconds
hbase(main):006:0> put 'clsn','1000000000','cf:age','24' # same rowkey, column cf:age
0 row(s) in 0.0290 seconds
hbase(main):007:0> count 'clsn'
1 row(s) in 0.2100 seconds
=> 1
hbase(main):008:0> get 'clsn','cf' # 'cf' is read as a rowkey here, and no such row exists, so no cells come back
COLUMN CELL
0 row(s) in 0.1050 seconds
hbase(main):009:0> get 'clsn','1000000000' # fetch all cells of the row
COLUMN CELL
cf:age timestamp=1545710530665, value=24
cf:name timestamp=1545710495871, value=clsn
cf:sex timestamp=1545710509333, value=male
1 row(s) in 0.0830 seconds
hbase(main):010:0> list
TABLE
clsn
1 row(s) in 0.0240 seconds
=> ["clsn"]
hbase(main):011:0> drop clsn # table names must be quoted; a bare clsn raises a NameError
NameError: undefined local variable or method `clsn' for #<Object:0x6f731759>
hbase(main):012:0> drop 'clsn'
ERROR: Table clsn is enabled. Disable it first.
Here is some help for this command:
Drop the named table. Table must first be disabled:
hbase> drop 't1'
hbase> drop 'ns1:t1'
hbase(main):013:0> list
TABLE
clsn
1 row(s) in 0.0330 seconds
=> ["clsn"]
hbase(main):015:0> disable 'clsn'
0 row(s) in 2.4710 seconds
hbase(main):016:0> list
TABLE
clsn
1 row(s) in 0.0210 seconds
=> ["clsn"]
7. References
https://hadoop.apache.org/releases.html
https://my.oschina.net/orrin/blog/1816023
https://www.yiibai.com/hadoop/
http://blog.fens.me/hadoop-family-roadmap/
http://www.cnblogs.com/Springmoon-venn/p/9054006.html
https://github.com/googlehosts/hosts
http://abloz.com/hbase/book.html