1. zookeeper

  • Configuration:
  • cp app/ochadoop-och3.0.0-SNAPSHOT/zookeeper-3.4.5-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/zoo_sample.cfg app/ochadoop-och3.0.0-SNAPSHOT/zookeeper-3.4.5-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/zoo.cfg
  • vim app/ochadoop-och3.0.0-SNAPSHOT/zookeeper-3.4.5-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/zoo.cfg
  • dataDir=/home/cdh5/tmp/zookeeper
  • clientPort=2183
  • server.1=ocdata09:2888:3888
  • mkdir -p /home/cdh5/tmp/zookeeper
  • Initialization: write this server's id into the myid file (vim /home/cdh5/tmp/zookeeper/myid, or simply):
  • echo "1" > /home/cdh5/tmp/zookeeper/myid

or run the initialize script remotely:

./runRemoteCmd.sh '~/och200/zookeeper/bin/zkServer-initialize.sh --myid=1' zoo

  • Distribute the configuration:
  • ./deploy.sh app/ochadoop-och3.0.0-SNAPSHOT/zookeeper-3.4.5-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/zoo.cfg app/ochadoop-och3.0.0-SNAPSHOT/zookeeper-3.4.5-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/ zoo
  • Start:
  • ./runRemoteCmd.sh "app/ochadoop-och3.0.0-SNAPSHOT/zookeeper-3.4.5-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/bin/zkServer.sh start" zoo
  • Verify:
  • ./runRemoteCmd.sh 'echo ruok | nc localhost 2183' zoo
  • Stop:
  • ./runRemoteCmd.sh "app/ochadoop-och3.0.0-SNAPSHOT/zookeeper-3.4.5-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/bin/zkServer.sh stop" zoo
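With only one server line (server.1=ocdata09) the ensemble above is a single node. If you later grow it, every server.N line must appear in zoo.cfg on all nodes and each node's myid must match its N. A small sketch of generating both — the three-host list here is hypothetical, adjust it to your cluster:

```shell
# Hypothetical host list for a 3-node ensemble; adjust to your cluster.
hosts="ocdata07 ocdata08 ocdata09"
id=1
for h in $hosts; do
  # Lines to append to zoo.cfg on every node:
  echo "server.$id=$h:2888:3888"
  # On host $h you would also run: echo $id > /home/cdh5/tmp/zookeeper/myid
  id=$((id + 1))
done
```

The id assigned by the loop order is exactly the value each host must write into its own myid file.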

2. HDFS

  • Configure hadoop
  • hdfs-site.xml:

<property>
<name>dfs.nameservices</name>
<value>cdh5cluster</value>
<description>
Comma-separated list of nameservices.
</description>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:50011</value>
<description>
The datanode server address and port for data transfer.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:50076</value>
<description>
The datanode http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:50021</value>
<description>
The datanode ipc server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<!-- Unique identifiers for the NameNodes in the nameservice, comma-separated.
     This lets DataNodes know every NameNode in the cluster; currently at most
     two NameNodes can be configured per nameservice. -->
<name>dfs.ha.namenodes.cdh5cluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data1/cdh5/dfs/name</value>
<description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.</description>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data1/cdh5/dfs/data,file:///data2/cdh5/dfs/data,file:///data3/cdh5/dfs/data</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<!-- Deprecated form of dfs.permissions.enabled; note it conflicts with the
     dfs.permissions.enabled=false entry below, so keep only one of the two. -->
<name>dfs.permissions</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
<description>
Boolean which enables backend datanode-side support for the experimental DistributedFileSystem#getFileVBlockStorageLocations API.
</description>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
<description>
If "true", enable permission checking in HDFS.
If "false", permission checking is turned off,
but all other behavior is unchanged.
Switching from one parameter value to the other does not change the mode,
owner or group of files or directories.
</description>
</property>
<property>
<!-- RPC address each NameNode listens on -->
<name>dfs.namenode.rpc-address.cdh5cluster.nn1</name>
<value>ocdata09:8030</value>
<description>RPC address of NameNode nn1</description>
</property>
<property>
<name>dfs.namenode.rpc-address.cdh5cluster.nn2</name>
<value>ocdata08:8030</value>
<description>RPC address of NameNode nn2</description>
</property>
<property>
<name>dfs.namenode.http-address.cdh5cluster.nn1</name>
<value>ocdata09:50082</value>
<description>HTTP address of NameNode nn1</description>
</property>
<property>
<name>dfs.namenode.http-address.cdh5cluster.nn2</name>
<value>ocdata08:50082</value>
<description>HTTP address of NameNode nn2</description>
</property>
<property>
<!-- URI through which the NameNodes read and write the shared edit log on the
     JournalNodes. Format: "qjournal://host1:port1;host2:port2;host3:port3/journalId",
     where host1..host3 are JournalNode addresses (an odd number, at least 3) and
     journalId is a unique identifier for the cluster; federated namespaces share
     the same journalId. -->
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://ocdata05:8488;ocdata06:8488;ocdata07:8488/cdh5cluster</value>
<description>Three journalnode nodes store the edit log; these are their hosts and ports</description>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/cdh5/journaldata/jn</value>
<description>Storage path for journalnode data</description>
</property>
<property>
<name>dfs.journalnode.rpc-address</name>
<value>0.0.0.0:8488</value>
</property>
<property>
<name>dfs.journalnode.http-address</name>
<value>0.0.0.0:8483</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.cdh5cluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
<description>Class used by clients to determine which namenode is active</description>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>10000</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
<description>
Whether automatic failover is enabled. See the HDFS High
Availability documentation for details on automatic HA
configuration.
</description>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>ocdata09:2183</value>
<description>Single zookeeper node</description>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>4096</value>
<description>
Specifies the maximum number of threads to use for transferring data
in and out of the DN.
</description>
</property>
<property>
<name>dfs.blocksize</name>
<value>64m</value>
<description>
The default block size for new files, in bytes.
You can use the following suffix (case insensitive):
k(kilo), m(mega), g(giga), t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g, etc.),
Or provide complete size in bytes (such as 134217728 for 128 MB).
</description>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>20</value>
<description>The number of server threads for the namenode.</description>
</property>

  • core-site.xml:

<property>
<!-- URI of the default filesystem (scheme, host, port). Every machine in the
     cluster needs to know the NameNode address: DataNodes register with it, and
     standalone client programs use this URI to obtain file block lists. -->
<name>fs.defaultFS</name>
<value>hdfs://cdh5cluster</value>
</property>
<property>
<!-- Base directory many Hadoop filesystem paths depend on; if the namenode and
     datanode storage locations are not configured in hdfs-site.xml, they default
     to locations under this path. -->
<name>hadoop.tmp.dir</name>
<value>/home/cdh5/tmp/hadoop/hadoop-${user.name}</value>
</property>
<property>
<!-- Load the native libraries if present -->
<name>io.native.lib.available</name>
<value>true</value>
<description>Should native hadoop libraries, if present, be used.</description>
</property>
<property>
<!-- Comma-separated list of compression/decompression codec classes, loaded via
     the java ServiceLoader; null if unset -->
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
  • slaves:
  • ocdata05
  • ocdata06
  • ocdata07
  • ocdata08
  • ocdata09
  • masters:
  • ocdata05
  • ocdata06
  • hadoop-env.sh:
  • export JAVA_HOME=/home/cdh5/app/jdk1.7.0_21
  • Distribute:
  • ./deploy.sh app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc/hadoop app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc all
  • Initialize HDFS:

Run on the primary node:

app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/bin/hdfs zkfc -formatZK

./runRemoteCmd.sh 'app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin/hadoop-daemon.sh start journalnode' jn

app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/bin/hdfs namenode -format -initializeSharedEdits

app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin/hadoop-daemon.sh start namenode

Run on the standby node:

app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/bin/hdfs namenode -bootstrapStandby

Finish up (stop the daemons started for initialization):

app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin/hadoop-daemon.sh stop namenode

./runRemoteCmd.sh 'app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin/hadoop-daemon.sh stop journalnode' jn
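The dfs.namenode.shared.edits.dir URI must list every JournalNode; a mismatch between it and the nodes where journalnode was started is a common cause of "-initializeSharedEdits" failures. A small sketch of deriving the URI from the JournalNode host list, using the hosts, port, and nameservice id configured above:

```shell
# JournalNode hosts, RPC port, and nameservice id from hdfs-site.xml above.
jns="ocdata05 ocdata06 ocdata07"
port=8488
nsid=cdh5cluster

uri="qjournal://"
sep=""
for h in $jns; do
  uri="$uri$sep$h:$port"   # hosts are ';'-separated inside the URI
  sep=";"
done
uri="$uri/$nsid"
echo "$uri"   # qjournal://ocdata05:8488;ocdata06:8488;ocdata07:8488/cdh5cluster
```

The same host list is what the runRemoteCmd.sh "jn" group should contain.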

  • Start HDFS:
  • app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin/start-dfs.sh
  • Verify:
  • http://10.1.253.99:50082/dfshealth.html (active)
  • http://10.1.253.98:50082/dfshealth.html (standby)
  • http://10.1.253.97:8483/journalstatus.jsp
  • http://10.1.253.96:8483/journalstatus.jsp
  • http://10.1.253.95:8483/journalstatus.jsp
  • Stop HDFS:
  • app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin/stop-dfs.sh

3. Yarn

Configure YARN

  • mapred-site.xml:
  • cp app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc/hadoop/mapred-site.xml.template app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc/hadoop/mapred-site.xml
  • vim app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc/hadoop/mapred-site.xml

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.shuffle.port</name>
<value>8350</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>0.0.0.0:10121</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19868</value>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>0.0.0.0:50330</value>
</property>
<property>
<name>mapreduce.tasktracker.http.address</name>
<value>0.0.0.0:50360</value>
</property>

  • yarn-site.xml:
  • vim app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc/hadoop/yarn-site.xml

<!-- Resource Manager Configs -->
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-rm-cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<description>Id of the current ResourceManager. Must be set explicitly on each ResourceManager to the appropriate value.</description>
<name>yarn.resourcemanager.ha.id</name>
<value>rm1</value>
<!-- set to rm1 on rm1 and to rm2 on rm2 -->
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk.state-store.address</name>
<value>ocdata09:2183</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>ocdata09:2183</value>
</property>
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<!-- RM1 configs -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>ocdata08:23140</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>ocdata08:23130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>ocdata08:23188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>ocdata08:23125</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>ocdata08:23141</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm1</name>
<value>ocdata08:23142</value>
</property>
<!-- RM2 configs -->
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>ocdata09:23140</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>ocdata09:23130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>ocdata09:23188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>ocdata09:23125</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>ocdata09:23141</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm2</name>
<value>ocdata09:23142</value>
</property>
<!-- Node Manager Configs -->
<property>
<description>Address where the localizer IPC is.</description>
<name>yarn.nodemanager.localizer.address</name>
<value>0.0.0.0:23344</value>
</property>
<property>
<description>NM Webapp address.</description>
<name>yarn.nodemanager.webapp.address</name>
<value>0.0.0.0:23999</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/tmp/pseudo-dist/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/tmp/pseudo-dist/yarn/log</value>
</property>

  • Distribute:
  • ./deploy.sh app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc/hadoop app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc all
  • Start/stop YARN: no initialization is required. Log in to the primary node and run:
  • app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin/start-yarn.sh

In cdh5, YARN HA requires starting the standby ResourceManager manually:

./runRemoteCmd.sh "cd app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin; ./yarn-daemon.sh start resourcemanager" rm2

Verify:

http://10.1.253.98:23188/cluster (node list shown; active)

http://10.1.253.99:23188/cluster (no node list; standby)

app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/bin/hadoop jar app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -write -nrFiles 40 -fileSize 20MB

Stop YARN:

app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin/stop-yarn.sh

Manually stop the standby ResourceManager:

./runRemoteCmd.sh "cd app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin; ./yarn-daemon.sh stop resourcemanager" rm2
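The twelve rm1/rm2 address properties in yarn-site.xml differ only in host name and port, and the two ResourceManager configs must stay consistent. A sketch that generates them (in compact one-line form) from the two hosts and the port map used above, so they can be pasted into both files:

```shell
# Emit the yarn.resourcemanager.*.rmN address properties for one RM.
emit_rm_props() {
  rmid=$1; host=$2
  # property-suffix:port pairs taken from the yarn-site.xml above
  for pair in address:23140 scheduler.address:23130 webapp.address:23188 \
              resource-tracker.address:23125 admin.address:23141 ha.admin.address:23142; do
    suffix=${pair%:*}; port=${pair#*:}
    printf '<property><name>yarn.resourcemanager.%s.%s</name><value>%s:%s</value></property>\n' \
      "$suffix" "$rmid" "$host" "$port"
  done
}
emit_rm_props rm1 ocdata08
emit_rm_props rm2 ocdata09
```

This prints 12 property elements; only yarn.resourcemanager.ha.id still has to be set per node by hand.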

4. hive

  • Configuration:
  • cp hive-env.sh.template hive-env.sh
  • vim hive-env.sh
  • export HADOOP_HOME=/home/cdh5/app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT
  • cp hive-default.xml.template hive-site.xml
  • vim hive-site.xml

Delete all other configuration entries and keep only:

<property>

<!-- Metastore database connection, typically MySQL -->

<name>javax.jdo.option.ConnectionURL</name>

<value>jdbc:mysql://10.1.252.69:3306/cdh5?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>

</property>

<property>

<!-- JDBC driver for the metastore database -->

<name>javax.jdo.option.ConnectionDriverName</name>

<value>com.mysql.jdbc.Driver</value>

<description>Driver class name for a JDBC metastore</description>

</property>

<property>

<!-- Metastore database username -->

<name>javax.jdo.option.ConnectionUserName</name>

<value>cdh5</value>

<description>username to use against metastore database</description>

</property>

<property>

<!-- Metastore database password -->

<name>javax.jdo.option.ConnectionPassword</name>

<value>cdh5</value>

<description>password to use against metastore database</description>

</property>

  • Metastore database setup (run in MySQL):
  • CREATE USER cdh5 IDENTIFIED BY 'cdh5';
  • CREATE DATABASE cdh5;
  • alter database cdh5 character set latin1;
  • grant all privileges on *.* to cdh5@"%" identified by "cdh5";
  • flush privileges;
  • Upload the JDBC jar:
  • scp mysql-connector-java-5.1.26.jar cdh5@10.1.253.99:/home/cdh5/app/ochadoop-och3.0.0-SNAPSHOT/hive-0.12.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/lib/
  • Distribute:
  • ./deploy.sh app/ochadoop-och3.0.0-SNAPSHOT/hive-0.12.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT app/ochadoop-och3.0.0-SNAPSHOT/ hive
  • Start:
  • nohup ./hiveserver2 &
  • Verify over jdbc:
  • URL: jdbc:hive2://10.1.253.99:10000/default
  • Driver: org.apache.hive.jdbc.HiveDriver
  • lib: all jar packages under Hadoop and hive
  • !connect jdbc:hive2://10.1.253.99:10000/default
  • Enter username: dmp
  • Enter password: dmp
  • show tables;
  • +--------------+
  • |   tab_name   |
  • +--------------+
  • | shaoaq_test  |
  • +--------------+
  • select * from shaoaq_test;
  • +-----+
  • | id  |
  • +-----+
  • +-----+
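The interactive session above can also be scripted; building the JDBC URL first makes it easy to point the same check at another HiveServer2. Host, port, and database are the ones used above; the beeline invocation is a sketch and assumes beeline is on the PATH:

```shell
# Build the HiveServer2 JDBC URL used in the verification step.
host=10.1.253.99
port=10000
db=default
url="jdbc:hive2://$host:$port/$db"
echo "$url"   # jdbc:hive2://10.1.253.99:10000/default

# Non-interactive check (sketch; requires beeline on the PATH):
# beeline -u "$url" -n dmp -p dmp -e 'show tables;'
```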

5. hbase

  • Configuration:
  • vim app/ochadoop-och3.0.0-SNAPSHOT/hbase-0.96.1.1-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/regionservers
  • ocdata05
  • ocdata06
  • ocdata07
  • vim app/ochadoop-och3.0.0-SNAPSHOT/hbase-0.96.1.1-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/backup-masters
  • ocdata08
  • vim app/ochadoop-och3.0.0-SNAPSHOT/hbase-0.96.1.1-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/hbase-site.xml

<property>
<name>hbase.rootdir</name>
<value>hdfs://cdh5cluster/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>ocdata09</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2183</value>
</property>
<property>
<name>hbase.regionserver.port</name>
<value>60328</value>
</property>
<property>
<name>hbase.regionserver.info.port</name>
<value>62131</value>
</property>

  • vim app/ochadoop-och3.0.0-SNAPSHOT/hbase-0.96.1.1-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/hbase-env.sh
  • export JAVA_HOME=/home/cdh5/app/jdk1.7.0_51
  • export HBASE_CLASSPATH=/home/cdh5/app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc/hadoop
  • export HBASE_HOME=/home/cdh5/app/hbase
  • export HADOOP_HOME=/home/cdh5/app/hadoop
  • export HADOOP_CONF_DIR=${HADOOP_HOME}/conf
  • export HBASE_LIBRARY_PATH=${HBASE_HOME}/lib/native
  • export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HBASE_HOME}/lib/native
  • export HBASE_MANAGES_ZK=false
  • Distribute the configuration:
  • ./deploy.sh app/ochadoop-och3.0.0-SNAPSHOT/hbase-0.96.1.1-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf app/ochadoop-och3.0.0-SNAPSHOT/hbase-0.96.1.1-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/ all
  • Start:
  • ./start-hbase.sh
  • Verify:
  • ./hbase shell
  • create 'hb_test', 'cf'
  • put 'hb_test','row1','cf:a','123'
  • get 'hb_test','row1'
  • COLUMN                            CELL
  • cf:a                             timestamp=1395204538429, value=123
  • 1 row(s) in 0.0490 seconds
  • quit
  • Stop:
  • ./stop-hbase.sh
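The regionservers and backup-masters files edited above are plain host lists, one host per line, and must be identical on every node (the deploy.sh step takes care of that). A sketch that writes both from variables — a temporary directory stands in for the real conf directory here:

```shell
# Write the regionservers and backup-masters host lists.
# Using a temp dir as a stand-in; point confdir at the real HBase conf directory instead.
confdir=$(mktemp -d)
regionservers="ocdata05 ocdata06 ocdata07"
backup_masters="ocdata08"

printf '%s\n' $regionservers > "$confdir/regionservers"   # one host per line
printf '%s\n' $backup_masters > "$confdir/backup-masters"

cat "$confdir/regionservers"
```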

6. spark

Spark currently works straight out of the unpacked archive; in yarn-client mode nothing needs to be distributed, only a few client-side settings need to be changed.

  • Notes on the main spark-1.1.0 on yarn settings:
  • vim spark-env.sh
  • MASTER: deploy mode, yarn-client/yarn-cluster/local
  • HADOOP_CONF_DIR: (required) hadoop configuration file directory
  • SCALA_HOME: scala installation path
  • SPARK_EXECUTOR_INSTANCES: total number of yarn workers Spark requests
  • SPARK_EXECUTOR_CORES: number of vcores requested per worker
  • SPARK_EXECUTOR_MEMORY: amount of memory requested per worker
  • SPARK_DRIVER_MEMORY: amount of memory requested for the Spark appMaster
  • SPARK_YARN_APP_NAME: name of the Spark job as shown in yarn
  • SPARK_YARN_QUEUE: queue for Spark jobs
  • SPARK_SUBMIT_LIBRARY_PATH: library directories needed while Spark jobs run, e.g. hadoop's native directory
  • SPARK_CLASSPATH: classpath for Spark jobs
  • SPARK_JAVA_OPTS: JVM options, e.g. GC type, GC logging, heap dump output
  • SPARK_HISTORY_OPTS: spark history-server options; usually set the webUI port, number of retained applications, and the event log directory
  • vim spark-defaults.conf
  • spark.local.dir: local scratch directory used while Spark jobs run
  • spark.yarn.executor.memoryOverhead: off-heap memory per worker in MB; configure it in yarn mode to guard against running out of memory
  • spark.eventLog.enabled: whether to log Spark events, used to reconstruct the webUI after the application has finished
  • spark.eventLog.dir: path where the event log information is saved; may be an hdfs:// HDFS path or a file:// local path, and must be created in advance
  • spark.eventLog.compress: whether to compress the logged Spark events; requires spark.eventLog.enabled to be true, snappy by default
  • Start/stop the thrift-server:

Before using the spark-sql/thrift-server components, copy hive-site.xml into the $SPARK_HOME/conf directory so that Hive's metastore and settings (such as the server port) are picked up; you may need to remove some redundant or unsupported entries from it, so review the file.

$SPARK_HOME/sbin/start-thriftserver.sh

$SPARK_HOME/sbin/stop-thriftserver.sh

  • Start/stop the history-server:
  • $SPARK_HOME/sbin/start-history-server.sh
  • $SPARK_HOME/sbin/stop-history-server.sh
  • Notes:
  1. If lzo compression is enabled in hadoop, copy hadoop-lzo-*.jar into SPARK_HOME/lib/;
  2. In the SPARK-1.1.0 release, spark-examples-*.jar links against a thrift version inconsistent with spark-assembly-*.jar and needs to be deleted;
  • Sample configuration:

spark-env.sh

MASTER="yarn-client"

SPARK_HOME=/home/ochadoop/app/spark

SCALA_HOME=/home/ochadoop/app/scala

JAVA_HOME=/home/ochadoop/app/jdk

HADOOP_HOME=/home/ochadoop/app/hadoop

HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

SPARK_EXECUTOR_INSTANCES=50

SPARK_EXECUTOR_CORES=2

SPARK_EXECUTOR_MEMORY=4G

SPARK_DRIVER_MEMORY=3G

SPARK_YARN_APP_NAME="Spark-1.1.0"

#export SPARK_YARN_QUEUE="default"

SPARK_SUBMIT_LIBRARY_PATH=$SPARK_LIBRARY_PATH:$HADOOP_HOME/lib/native

SPARK_JAVA_OPTS="-verbose:gc -XX:-UseGCOverheadLimit -XX:+UseCompressedOops -XX:-PrintGCDetails -XX:+PrintGCTimeStamps $SPARK_JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/ochadoop/app/spark/`date +%m%d%H%M%S`.hprof"

export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=1000 -Dspark.history.fs.logDirectory=hdfs://testcluster/eventLog"

spark-defaults.conf

spark.serializer                    org.apache.spark.serializer.KryoSerializer

spark.local.dir                     /data2/ochadoop/data/pseudo-dist/spark/local,/data3/ochadoop/data/pseudo-dist/spark/local

spark.io.compression.codec          snappy

spark.speculation                   false

spark.yarn.executor.memoryOverhead  512

#spark.storage.memoryFraction       0.4

spark.eventLog.enabled              true

spark.eventLog.dir                  hdfs://testcluster/eventLog

spark.eventLog.compress             true
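With the sample values above, YARN has to fit 50 executor containers of SPARK_EXECUTOR_MEMORY plus spark.yarn.executor.memoryOverhead each, plus one driver container. A quick sanity check of the total request (values from the sample config; the driver's own overhead is ignored here for simplicity):

```shell
# Per-executor container = executor memory + off-heap overhead (MB).
executor_mb=4096   # SPARK_EXECUTOR_MEMORY=4G
overhead_mb=512    # spark.yarn.executor.memoryOverhead
instances=50       # SPARK_EXECUTOR_INSTANCES
driver_mb=3072     # SPARK_DRIVER_MEMORY=3G

per_container=$((executor_mb + overhead_mb))
total=$((per_container * instances + driver_mb))
echo "per-executor container: ${per_container} MB"   # 4608 MB
echo "total cluster request:  ${total} MB"           # 233472 MB
```

If the cluster cannot supply that much memory, YARN will simply hold back executor containers, so it is worth checking before lowering SPARK_EXECUTOR_INSTANCES by trial and error.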

Run all of the following commands as root, or prefix them with sudo.
Install via yum:
yum install mysql         # install the mysql client
yum install mysql-server  # install the mysql server
Check whether MySQL is installed:
chkconfig --list | grep mysql
Start the mysql service:
service mysqld start   (or: /etc/init.d/mysqld start)
Check that the mysql service is running:
/etc/init.d/mysqld status
Make MySQL start at boot:
chkconfig mysqld on
Verify the boot-time setting:
chkconfig --list | grep mysql
Runlevels 2, 3, 4 and 5 should show "on".
Set the root administrator password:
mysqladmin -uroot password root
Log in:
mysql -uroot -proot
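Once MySQL is running, the metastore database and user from the hive section can be created non-interactively. A sketch that writes the statements to a file and would be applied with `mysql -uroot -proot < init_metastore.sql`; it uses IF NOT EXISTS and relies on GRANT ... IDENTIFIED BY creating the user (a slight variation on the interactive statements in the hive section — adjust names and passwords to your setup):

```shell
# Write the Hive metastore bootstrap SQL (mirrors the hive section's statements).
cat > init_metastore.sql <<'EOF'
CREATE DATABASE IF NOT EXISTS cdh5;
ALTER DATABASE cdh5 CHARACTER SET latin1;
GRANT ALL PRIVILEGES ON *.* TO 'cdh5'@'%' IDENTIFIED BY 'cdh5';
FLUSH PRIVILEGES;
EOF
cat init_metastore.sql
```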
