
Part One: Preparation

Start with three Linux machines and build a Hadoop cluster on them.

eddy01: runs the HDFS master (NameNode) and the YARN master (ResourceManager); its processes include NodeManager, ResourceManager, NameNode, and SecondaryNameNode.

eddy02: (DataNode, NodeManager)

eddy03: (DataNode, NodeManager)

Configuration files (only eddy01 needs the configurations below; eddy02 and eddy03 just need Hadoop installed. Map each node's IP to its hostname in eddy01's hosts file, as sketched below, and list the workers in Hadoop's slaves file. When YARN, i.e. the ResourceManager, is started on eddy01, it will SSH into eddy02 and eddy03 to start their daemons.)
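A minimal /etc/hosts mapping on eddy01 might look like this sketch (the 192.168.1.x addresses are placeholders; substitute your cluster's real ones):

# /etc/hosts on eddy01 -- the IP addresses below are assumptions
192.168.1.101   eddy01
192.168.1.102   eddy02
192.168.1.103   eddy03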

The configuration files:

core-site.xml

<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- RPC address of the HDFS master (NameNode) -->
<property>
<name>fs.defaultFS</name>
<!-- <value>hdfs://ns1</value> -->
<value>hdfs://eddy01:9000</value>
</property>
<!-- Directory where Hadoop stores the files it generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/eddy/hadoop-2.4.1/tmp</value>
</property>
<!-- ZooKeeper quorum address (only needed for an HA setup; left commented out here)
<property>
<name>ha.zookeeper.quorum</name>
<value>eddy01:2181,eddy02:2181,eddy03:2181</value>
</property>
-->
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Number of HDFS block replicas -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- Where the NameNode stores its metadata (dfs.name.dir is the deprecated alias of dfs.namenode.name.dir in Hadoop 2.x) -->
<property>
<name>dfs.name.dir</name>
<value>/usr/local/eddy/hadoop-2.4.1/tmp/name1/</value>
</property>
</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Tell the MapReduce framework to run on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

yarn-site.xml

<?xml version="1.0"?>
<configuration>
<!-- Site specific YARN configuration properties -->
<!-- Hostname of the YARN master (ResourceManager) -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>eddy01</value>
</property>
<!-- Reducers fetch map output via mapreduce_shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

The slaves file (for a detailed explanation of the slaves file, see: http://www.tuicool.com/articles/zINvYbf); a sketch for pushing the configuration out to the workers follows the list:

eddy01
eddy02
eddy03
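With eddy01 configured, one way to push the same configuration to the workers is scp; a sketch, assuming Hadoop is installed at /usr/local/eddy/hadoop-2.4.1 on every node and passwordless SSH is already set up:

# run on eddy01; the install path is this post's location, adjust as needed
for host in eddy02 eddy03; do
  scp /usr/local/eddy/hadoop-2.4.1/etc/hadoop/{core-site.xml,hdfs-site.xml,mapred-site.xml,yarn-site.xml,slaves} \
    $host:/usr/local/eddy/hadoop-2.4.1/etc/hadoop/
done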

On eddy01:

Start HDFS:

1. Format HDFS

hdfs namenode -format    (equivalently, the older form: hadoop namenode -format)

2. Start HDFS

start-dfs.sh

If it starts successfully, the jps command will show the following processes:

[root@eddy01 sbin]# jps
Jps
NameNode
SecondaryNameNode

3. Start YARN

start-yarn.sh

The processes:

[root@eddy01 sbin]# jps
Jps
NodeManager
ResourceManager
NameNode
SecondaryNameNode
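You can also check the web UIs: in Hadoop 2.x the NameNode UI listens on port 50070 and the ResourceManager UI on port 8088 by default. A quick probe from the shell:

# default Hadoop 2.x web UI ports
curl -s -o /dev/null http://eddy01:50070 && echo "NameNode UI is up"
curl -s -o /dev/null http://eddy01:8088 && echo "ResourceManager UI is up"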

On eddy02 and eddy03, respectively:

Start the DataNode and NodeManager (yarn-daemon.sh starts only the local NodeManager; running start-yarn.sh on a worker would also try to bring up a ResourceManager there):

hadoop-daemon.sh start datanode
yarn-daemon.sh start nodemanager
[root@eddy02 eddy]# jps
DataNode
NodeManager
Jps
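To confirm the DataNodes actually registered with the NameNode, run a report from any node:

hdfs dfsadmin -report
# the "Live datanodes" section should list the workers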

Part Two: Installing Hive

1. Download Hive, upload it to the Linux machine, and extract the package into /usr/local/eddy/hive/:

tar -zxvf hive-0.9..tar.gz -C /usr/local/eddy/hive/

2. Install MySQL; for reference see:

http://www.cnblogs.com/Eddyer/p/4993990.html

To fix MySQL's garbled-character encoding problem:

http://www.cnblogs.com/Eddyer/p/4995056.html

After the installation completes, set the root password:

[root@eddy01 etc]# mysqladmin -u root password "root"

Grant privileges to the root user:

After installing Hive and MySQL, copy the MySQL connector jar into the $HIVE_HOME/lib directory.
If a permissions problem comes up, grant privileges in MySQL (run this on the machine where MySQL is installed):
mysql -uroot -p
# run the statement below; *.* = every table in every database, % = connections from any IP or host
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'root' WITH GRANT OPTION;
FLUSH PRIVILEGES;
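To verify the grant works, try connecting from one of the worker nodes (this assumes MySQL is running on eddy01):

# run on eddy02 or eddy03
mysql -h eddy01 -uroot -proot -e "SELECT 1;"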

While using it, this problem appeared:

create table years (year string, event string) row format delimited fields terminated by '\t';
FAILED: Execution Error, return code from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)

The fix:

1. Open up the permissions in HDFS (777 is the permissive mode commonly used for this fix):

hadoop fs -chmod -R 777 /tmp
hadoop fs -chmod -R 777 /user/hive/warehouse

2. Manually create the hive database in MySQL:

create database hive;

Change its encoding:

mysql> alter database hive character set latin1;
Query OK, 1 row affected (0.00 sec)
mysql> exit

Then start Hadoop and Hive again.

3. Configure the Hive files:

(a) Configure the environment: vi conf/hive-env.sh and set $HADOOP_HOME in it (a minimal sketch follows the XML below).
(b) Configure the metastore connection: vi hive-site.xml and add the following:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
<description>password to use against metastore database</description>
</property>
</configuration>
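For step (a), a minimal hive-env.sh sketch, assuming the Hadoop install path used throughout this post (the Hive conf path is also an assumption):

# conf/hive-env.sh
export HADOOP_HOME=/usr/local/eddy/hadoop-2.4.1
# optional: point Hive at its own conf directory (path is an assumption)
export HIVE_CONF_DIR=/usr/local/eddy/hive/conf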

4. Jar conflict:

The jline jar versions conflict: copy jline-2.12.jar from Hive's lib directory over the older jline that ships with Hadoop, e.g.:
/home/hadoop/app/hadoop-2.6./share/hadoop/yarn/lib/jline-0.9..jar
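A sketch of the swap, using $HIVE_HOME and $HADOOP_HOME to stand in for the actual install paths:

# remove Hadoop's old jline, then copy in Hive's newer one
rm $HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.*.jar
cp $HIVE_HOME/lib/jline-2.12.jar $HADOOP_HOME/share/hadoop/yarn/lib/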

Start Hive:

[root@eddy01 bin]# ./hive

Logging initialized using configuration in jar:file:/usr/local/eddy/hive-1.2./lib/hive-common-1.2..jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/eddy/hadoop-2.4./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/eddy/hadoop-2.4./share/hadoop/mapreduce/hadoop.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive>
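A quick smoke test once the hive> prompt appears (the table name here is arbitrary):

hive> show databases;
hive> create table smoke_test (id int);
hive> show tables;
hive> drop table smoke_test;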

Appendix: notes

Hive only needs to be installed on a single node.

1. Upload the tar package.

2. Extract it:
tar -zxvf hive-0.9..tar.gz -C /cloud/

3. Install the MySQL database (switch to the root user). There is no restriction on where it lives, as long as that node can reach the Hadoop cluster. The MySQL steps below are for reference only; each MySQL version has its own install procedure:
rpm -qa | grep mysql
rpm -e mysql-libs-5.1.-.el6_3.i686 --nodeps
rpm -ivh MySQL-server-5.1.-.glibc23.i386.rpm
rpm -ivh MySQL-client-5.1.-.glibc23.i386.rpm
Change the MySQL password:
/usr/bin/mysql_secure_installation
(Note: delete the anonymous users and allow remote connections.)
Log in to MySQL:
mysql -u root -p
4. Configure Hive: (a) set $HADOOP_HOME in conf/hive-env.sh, and (b) add the metastore connection properties to hive-site.xml, both exactly as in step 3 of Part Two above.

5. With Hive and MySQL installed, copy the MySQL connector jar into $HIVE_HOME/lib; if a permissions problem appears, run the GRANT statement from Part Two on the machine where MySQL is installed.

6. Fix the jline version mismatch as described in step 4 of Part Two.

Start Hive:
bin/hive

7. Create a table (a managed/internal table by default):

create table trade_detail(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';

Create a partitioned table:

create table td_part(id bigint, account string, income double, expenses double, time string) partitioned by (logdate string) row format delimited fields terminated by '\t';

Create an external table:

create external table td_ext(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location '/td_ext';

8. Creating partitioned tables. The difference between an ordinary table and a partitioned table: when large amounts of data keep arriving, build a partitioned table.

create table book (id bigint, name string) partitioned by (pubdate string) row format delimited fields terminated by '\t';

Load data into a partition:

load data local inpath './book.txt' overwrite into table book partition (pubdate='2010-08-22');
load data local inpath '/root/data.am' into table beauties partition (nation="USA");
select nation, avg(size) from beauties group by nation order by avg(size);
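After loading, the partition metadata can be checked from the hive> prompt (using the book table above):

hive> show partitions book;
-- expected: pubdate=2010-08-22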




