Hadoop Distributed Environment Setup
- Installation overview
| IP | Host Name | Software | Node |
|----|-----------|----------|------|
|    | ae01 | JDK 1.7 | NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker |
|    | ae02 | JDK 1.7 | DataNode, TaskTracker |
|    | ae03 | JDK 1.7 | DataNode, TaskTracker |
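- If the machines are virtual machines, you can install samba and smbfs to make managing their files easier.
- Operating system: ubuntu-12.04.2-server-amd64
- Installation directory: /usr/local/ae
- JDK installation directory: export JAVA_HOME=/usr/local/ae/jdk1.7.0_51
- Hadoop version: hadoop-1.2.1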
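In Hadoop 1.x the roles in the table above map onto two plain-text files in $HADOOP_HOME/conf: masters lists the host(s) where start-all.sh starts the SecondaryNameNode, and slaves lists the hosts that run a DataNode and a TaskTracker. The NameNode and JobTracker run on the machine where start-all.sh is invoked (ae01 here). A minimal sketch, assuming the table above; CONF_DIR is an assumption of this sketch and defaults to a scratch directory, so set it to $HADOOP_HOME/conf once Hadoop is unpacked.

```shell
#!/bin/sh
# Sketch: write conf/masters and conf/slaves from the role table.
# CONF_DIR is an assumption: it defaults to a throwaway directory;
# point it at $HADOOP_HOME/conf on ae01 for a real install.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# masters: host(s) that run the SecondaryNameNode.
printf '%s\n' ae01 > "$CONF_DIR/masters"

# slaves: hosts that run a DataNode and a TaskTracker.
# ae01 is listed too because the table gives it both roles.
printf '%s\n' ae01 ae02 ae03 > "$CONF_DIR/slaves"

echo "wrote $CONF_DIR/masters and $CONF_DIR/slaves"
```

Listing ae01 in slaves is what makes it run a DataNode and TaskTracker alongside the master daemons, as the table specifies.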
- Install SSH
user@ae01:~$ sudo apt-get install openssh-server
user@ae01:~# ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx user@ae01
The key's randomart image is:
(randomart image omitted)
- Configure passwordless SSH login
Save a copy of ae01's public key as id_rsa_ae01.pub and send it to ae02:
user@ae01:~/.ssh$ sudo cp id_rsa.pub id_rsa_ae01.pub
user@ae01:~/.ssh$ scp ./id_rsa_ae01.pub user@
Log in to ae02 and append id_rsa_ae01.pub to authorized_keys:
user@ae02:~/.ssh$ cat id_rsa_ae01.pub >> authorized_keys
Back on ae01, check that ssh ae02 now logs in without a password prompt:
user@ae01:~/.ssh$ ssh ae02
Welcome to Ubuntu 12.04.2 LTS ...
...
Last login: ...

Repeat these steps on every machine so that each pair of machines can log in to each other over SSH without a password.
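The manual scp plus cat >> authorized_keys steps can be scripted. The sketch below swaps in ssh-copy-id (normally installed with the OpenSSH client), which appends a public key to the remote ~/.ssh/authorized_keys; HOSTS, KEY and the DRY_RUN switch are assumptions of this sketch, not part of the original setup.

```shell
#!/bin/sh
# Sketch: push this node's public key to every node, then check that
# key-based login works. Run it once on each of ae01, ae02 and ae03.
# HOSTS and KEY are assumptions; override them to match your cluster.
HOSTS="${HOSTS:-ae01 ae02 ae03}"
KEY="${KEY:-$HOME/.ssh/id_rsa.pub}"

# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# distribute_key: append $KEY to ~/.ssh/authorized_keys on each host
# (ssh-copy-id asks for that host's password one last time).
distribute_key() {
  for h in $HOSTS; do
    run ssh-copy-id -i "$KEY" "$h"
  done
}

# verify_ssh: BatchMode=yes makes ssh fail instead of prompting,
# so any host that still wants a password is reported as FAILED.
verify_ssh() {
  for h in $HOSTS; do
    if run ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true; then
      echo "ok: $h"
    else
      echo "FAILED: $h" >&2
    fi
  done
}
```

For example, run `distribute_key && verify_ssh` on each node; `DRY_RUN=1 distribute_key` only prints the commands it would run.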
3. Install Hadoop
- Edit the hosts file and add entries for the three servers:
user@ae01:/usr/local/ae$ sudo vim /etc/hosts
127.0.0.1    localhost
<ae01 IP>    ae01
<ae02 IP>    ae02
<ae03 IP>    ae03
- Extract Hadoop
Copy hadoop-1.2.1.tar.gz to /usr/local/ae and extract it:
user@ae01:/usr/local/ae$ sudo tar -zxvf hadoop-1.2.1.tar.gz
- Add the Hadoop environment variables:
export HADOOP_HOME=/usr/local/ae/hadoop-1.2.1
export PATH=$PATH:$HADOOP_HOME/bin
- Configure Hadoop
core-site.xml holds the cluster-wide settings, while hdfs-site.xml and mapred-site.xml hold the settings specific to HDFS and MapReduce respectively.
Edit $HADOOP_HOME/conf/hadoop-env.sh and set JAVA_HOME:
export JAVA_HOME=/usr/local/ae/jdk1.7.0_51
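Editing hadoop-env.sh by hand works, but on several nodes it can be scripted. A minimal sketch, assuming the file may already contain a (possibly commented-out) `export JAVA_HOME=` line: such lines are replaced with the active setting, and one is appended if none exists. set_java_home is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: make hadoop-env.sh export the given JAVA_HOME.
# Any existing "export JAVA_HOME=" line, commented out or not, is
# replaced; if there is none, the setting is appended.
set_java_home() {
  env_file=$1
  java_home=$2
  if grep -qE '^[#[:space:]]*export JAVA_HOME=' "$env_file"; then
    # '|' is the sed delimiter because the path contains '/'.
    sed -E "s|^[#[:space:]]*export JAVA_HOME=.*|export JAVA_HOME=$java_home|" \
      "$env_file" > "$env_file.tmp" && mv "$env_file.tmp" "$env_file"
  else
    printf 'export JAVA_HOME=%s\n' "$java_home" >> "$env_file"
  fi
}

# Usage on each node (paths from this guide):
# set_java_home "$HADOOP_HOME/conf/hadoop-env.sh" /usr/local/ae/jdk1.7.0_51
```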
Edit $HADOOP_HOME/conf/core-site.xml and add the following properties inside the <configuration> element:
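<property>
  <name>hadoop.tmp.dir</name>
  <value>…</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://ae01:9000</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
<property>
  <name>fs.checkpoint.period</name>
  <value>…</value>
  <description>The number of seconds between two periodic checkpoints.</description>
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>…</value>
  <description>The size of the current edit log (in bytes) that triggers a periodic checkpoint even if the fs.checkpoint.period hasn't expired.</description>
</property>
Edit $HADOOP_HOME/conf/hdfs-site.xml and add the following properties inside the <configuration> element: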
<property>
  <name>dfs.name.dir</name>
  <value>…</value>
  <description>Determines where on the local filesystem the DFS name node should store the name table(fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
</property>
<property>
  <name>…</name>
  <value>…</value>
</property>
<property>
  <name>dfs.http.address</name>
  <value>…</value>
  <description>The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.</description>
</property>
<property>
  <name>dfs.permissions</name>
  <value>…</value>
  <description>If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories.</description>
</property>
<property>
  <name>dfs.replication</name>
  <value>…</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
Edit $HADOOP_HOME/conf/mapred-site.xml and add the following property inside the <configuration> element:
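<property>
  <name>mapred.job.tracker</name>
  <value>ae01:9001</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>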
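Every node needs identical *-site.xml files, so generating them from one place can help. A sketch that writes only the two values this guide states explicitly (fs.default.name=hdfs://ae01:9000 and mapred.job.tracker=ae01:9001); the other properties can be passed as extra name/value pairs in the same way. site_file is a hypothetical helper, and CONF_DIR defaults to a scratch directory as an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helper: write a Hadoop *-site.xml from name/value pairs.
# Usage: site_file FILE name1 value1 [name2 value2 ...]
# Values are not XML-escaped, so avoid '&' and '<' in them;
# a trailing name without a value is ignored.
site_file() {
  out=$1
  shift
  {
    echo '<?xml version="1.0"?>'
    echo '<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>'
    echo '<configuration>'
    while [ "$#" -ge 2 ]; do
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
      shift 2
    done
    echo '</configuration>'
  } > "$out"
}

# CONF_DIR is an assumption: a scratch directory unless you point it
# at $HADOOP_HOME/conf.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
site_file "$CONF_DIR/core-site.xml"   fs.default.name    hdfs://ae01:9000
site_file "$CONF_DIR/mapred-site.xml" mapred.job.tracker ae01:9001
```

The generated files can then be copied unchanged to ae02 and ae03, which keeps the NameNode and JobTracker addresses consistent across the cluster.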
The value of fs.default.name, hdfs://ae01:9000, determines where the NameNode runs.
The value of mapred.job.tracker, ae01:9001, determines where the JobTracker runs.
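Because these two values decide where every node looks for the NameNode and the JobTracker, a quick consistency check is to read them back from each node's files. A rough sketch, assuming each <name> and <value> element sits on its own line with <value> directly after <name>; it is a line-oriented sed match, not a real XML parser, and get_prop is a hypothetical helper name.

```shell
#!/bin/sh
# Rough sketch: print the <value> on the line right after a given
# <name> in a Hadoop *-site.xml. Prints nothing if the name is absent.
# Not a general XML parser: it relies on one element per line.
get_prop() {
  file=$1
  prop=$2
  sed -n "/<name>$prop<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}" "$file"
}

# Usage, e.g. on each node:
#   nn=$(get_prop "$HADOOP_HOME/conf/core-site.xml" fs.default.name)
#   echo "NameNode: ${nn#hdfs://}"      # strips the scheme, leaving host:port
#   get_prop "$HADOOP_HOME/conf/mapred-site.xml" mapred.job.tracker
```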
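The hosts listed in $HADOOP_HOME/conf/slaves are the ones that run a DataNode and a TaskTracker.
Create the directory /usr/local/ae/storage/hadoop and give the hadoop folder sufficient permissions:
user@ae01:/usr/local/ae$ sudo mkdir -p ./storage/hadoop
user@ae01:/usr/local/ae$ sudo chmod 777 ./storage/hadoop/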
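The commands above are shown on ae01 only, but ae02 and ae03 also run a DataNode and a TaskTracker, so they need the same Hadoop directory (including conf/) and the same storage directory. One way to push them over the passwordless SSH set up earlier is sketched below, assuming identical paths and the same user on every node, and that this user can write to /usr/local/ae on the targets. SLAVES, HADOOP_DIR, STORAGE_DIR and DRY_RUN are assumptions of this sketch.

```shell
#!/bin/sh
# Sketch: copy the Hadoop install (including conf/) from ae01 to the
# other nodes and create the storage directory on each of them.
# SLAVES, HADOOP_DIR and STORAGE_DIR are assumptions based on this
# guide's layout; the remote user must be able to write to them.
SLAVES="${SLAVES:-ae02 ae03}"
HADOOP_DIR="${HADOOP_DIR:-/usr/local/ae/hadoop-1.2.1}"
STORAGE_DIR="${STORAGE_DIR:-/usr/local/ae/storage/hadoop}"

# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

sync_nodes() {
  parent=$(dirname "$HADOOP_DIR")
  for h in $SLAVES; do
    # Same storage directory and permissions as on ae01.
    run ssh "$h" mkdir -p "$STORAGE_DIR" "$parent"
    run ssh "$h" chmod 777 "$STORAGE_DIR"
    # Recursive, quiet copy of the whole install into the same parent.
    run scp -rq "$HADOOP_DIR" "$h:$parent/"
  done
}
```

`DRY_RUN=1 sync_nodes` prints the commands for review; plain `sync_nodes` runs them.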
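Format HDFS from ae01 (only before the first start, since formatting again erases the existing HDFS metadata), then start all daemons:
user@ae01:~$ hadoop namenode -format
user@ae01:~$ start-all.sh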
Check the running daemons with jps on each node.
ae01:
user@ae01:/usr/local/ae$ jps
26239 JobTracker
26158 SecondaryNameNode
36052 Jps
26468 TaskTracker
25687 NameNode
25926 DataNode

ae02:
user@ae02:~$ jps
25021 Jps
18999 TaskTracker
18791 DataNode

ae03:
user@ae03:~$ jps
3901 DataNode
9485 Jps
4106 TaskTracker
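The jps listings above can be checked mechanically: each node should show the daemons the role table assigns to it. A small sketch, assuming jps output (a PID followed by a class name) on stdin; expect_daemons is a hypothetical helper that prints each missing daemon and returns non-zero if any is missing.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output on stdin and report which of
# the expected daemons are missing. Returns 1 if any are missing.
expect_daemons() {
  running=$(awk '{print $2}')
  missing=0
  for d in "$@"; do
    if ! printf '%s\n' "$running" | grep -qx "$d"; then
      echo "missing: $d"
      missing=1
    fi
  done
  return "$missing"
}

# Usage, per the role table:
#   on ae01:        jps | expect_daemons NameNode SecondaryNameNode DataNode JobTracker TaskTracker
#   on ae02, ae03:  jps | expect_daemons DataNode TaskTracker
```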
