Install RHadoop with Hadoop 2.2 – Red Hat Linux
Prerequisite
Hadoop 2.2 has been installed (and the below installation steps should be applied on each of Hadoop node)
Step 1. Install R (by yum)
[hadoop@c0046220 yum.repos.d]$ sudo yum update
[hadoop@c0046220 yum.repos.d]$ yum search r-project
[hadoop@c0046220 yum.repos.d]$ sudo yum install R
...
Installed:
R.x86_64 0:3.0.2-1.el6
Dependency Installed:
R-core.x86_64 0:3.0.2-1.el6 R-core-devel.x86_64 0:3.0.2-1.el6 R-devel.x86_64 0:3.0.2-1.el6 R-java.x86_64 0:3.0.2-1.el6
R-java-devel.x86_64 0:3.0.2-1.el6 bzip2-devel.x86_64 0:1.0.5-7.el6_0 fontconfig-devel.x86_64 0:2.8.0-3.el6 freetype-devel.x86_64 0:2.3.11-14.el6_3.1
java-1.6.0-openjdk-devel.x86_64 1:1.6.0.0-1.62.1.11.11.90.el6_4 kpathsea.x86_64 0:2007-57.el6_2 libRmath.x86_64 0:3.0.2-1.el6 libRmath-devel.x86_64 0:3.0.2-1.el6
libXft-devel.x86_64 0:2.3.1-2.el6 libXmu.x86_64 0:1.1.1-2.el6 libXrender-devel.x86_64 0:0.9.7-2.el6 libicu.x86_64 0:4.2.1-9.1.el6_2
netpbm.x86_64 0:10.47.05-11.el6 netpbm-progs.x86_64 0:10.47.05-11.el6 pcre-devel.x86_64 0:7.8-6.el6 psutils.x86_64 0:1.17-34.el6
tcl.x86_64 1:8.5.7-6.el6 tcl-devel.x86_64 1:8.5.7-6.el6 tex-preview.noarch 0:11.85-10.el6 texinfo.x86_64 0:4.13a-8.el6
texinfo-tex.x86_64 0:4.13a-8.el6 texlive.x86_64 0:2007-57.el6_2 texlive-dvips.x86_64 0:2007-57.el6_2 texlive-latex.x86_64 0:2007-57.el6_2
texlive-texmf.noarch 0:2007-38.el6 texlive-texmf-dvips.noarch 0:2007-38.el6 texlive-texmf-errata.noarch 0:2007-7.1.el6 texlive-texmf-errata-dvips.noarch 0:2007-7.1.el6
texlive-texmf-errata-fonts.noarch 0:2007-7.1.el6 texlive-texmf-errata-latex.noarch 0:2007-7.1.el6 texlive-texmf-fonts.noarch 0:2007-38.el6 texlive-texmf-latex.noarch 0:2007-38.el6
texlive-utils.x86_64 0:2007-57.el6_2 tk.x86_64 1:8.5.7-5.el6 tk-devel.x86_64 1:8.5.7-5.el6 zlib-devel.x86_64 0:1.2.3-29.el6
Complete!
Validation:
[hadoop@c0046220 yum.repos.d]$ R
R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
>
Step 2. Install RHadoop
2.1 Getting RHadoop Packages
Download packages rhdfs, rhbase and rmr2 from https://github.com/RevolutionAnalytics/RHadoop/wiki/Downloads and then run the R code below.
[hadoop@c0046220 RHadoop]$ cd /tmp
[hadoop@c0046220 tmp]$ mkdir RHadoop
[hadoop@c0046220 tmp]$ cd RHadoop
[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rhdfs/master/build/rhdfs_1.0.8.tar.gz
[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rmr2/3.1.0/build/rmr2_3.1.0.tar.gz
[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rhbase/master/build/rhbase_1.2.0.tar.gz
2.2 Install R packages that RHadoop depends on.
[hadoop@c0046220 java]$ echo $JAVA_HOME
/usr/java/jdk1.8.0_05
[hadoop@c0046220 java]$ sudo -i
[root@c0046220 ~]# export JAVA_HOME=/usr/java/jdk1.8.0_05
[root@c0046220 ~]# R CMD javareconf
[root@c0046220 ~]# R
...
> .libPaths();
[1] "/usr/lib64/R/library" "/usr/share/R/library"
> install.packages(c("rJava", "Rcpp", "RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2", "caTools"))
> #install.packages("caTools") #needed for rmr2
2.3 Install RHadoop
Set environment variables
[hadoop@c0046220 ~]$ vi ~/.bashrc
# set HADOOP locations for RHADOOP
export HADOOP_CMD=$HADOOP_HOME/bin/hadoop
export HADOOP_STREAMING=/opt/hadoop/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar
[hadoop@c0046220 ~]$ source .bashrc
[hadoop@c0040084 R]$ sudo -i
[root@c0040084 ~]# R
...
> Sys.setenv(HADOOP_HOME="/opt/hadoop/hadoop-2.2.0");
> Sys.setenv(HADOOP_CMD="/opt/hadoop/hadoop-2.2.0/bin/hadoop");
> Sys.setenv(HADOOP_STREAMING="/opt/hadoop/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar");
> install.packages(pkgs="/tmp/RHadoop/rhdfs_1.0.8.tar.gz",repos=NULL);
> install.packages(pkgs="/tmp/RHadoop/rmr2_3.1.0.tar.gz",repos=NULL);
Step 3. Validation
Load and initialize the rhdfs package, and execute some simple commands as below:
library(rhdfs)
hdfs.init()
hdfs.ls("/")
[hadoop@c0046220 ~]$ R
...
> library(rhdfs)
Loading required package: rJava
...
Be sure to run hdfs.init()
> hdfs.init()
14/05/15 10:02:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> hdfs.ls("/")
permission owner group size modtime file
1 drwxr-xr-x hadoop supergroup 0 2014-05-14 03:05 /apps
2 drwxr-xr-x hadoop supergroup 0 2014-05-12 09:40 /data
3 drwxr-xr-x hadoop supergroup 0 2014-05-12 09:45 /output
4 drwxrwx--- hadoop supergroup 0 2014-05-15 10:02 /tmp
5 drwxr-xr-x hadoop supergroup 0 2014-05-14 05:48 /user
6 drwxr-xr-x hadoop supergroup 0 2014-05-13 06:43 /usr
Load and initialize the rmr2 package, and execute some simple commands as below:
library(rmr2)
from.dfs(to.dfs(1:100))
from.dfs(mapreduce(to.dfs(1:100)))
[hadoop@c0046220 ~]$ R
...
> library(rmr2)
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: bitops
Loading required package: digest
Loading required package: functional
Loading required package: reshape2
Loading required package: stringr
Loading required package: plyr
Loading required package: caTools
> from.dfs(to.dfs(1:100))
...
$key
NULL
$val
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
> from.dfs(mapreduce(to.dfs(1:100)))
...
$key
NULL
$val
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
library(rmr2)
input<- '/user/hadoop/tmp.txt'
wordcount = function(input, output = NULL, pattern = " "){
wc.map = function(., lines) {
keyval(unlist( strsplit( x = lines,split = pattern)),1)
}
wc.reduce =function(word, counts ) {
keyval(word, sum(counts))
}
mapreduce(input = input ,output = output, input.format = "text",
map = wc.map, reduce = wc.reduce,combine = T)
}
wordcount(input)
> library(rmr2)
> input<- '/user/hadoop/tmp.txt'
> wordcount = function(input, output = NULL, pattern = " "){
+ wc.map = function(., lines) {
+ keyval(unlist( strsplit( x = lines,split = pattern)),1)
+ }
+
+ wc.reduce =function(word, counts ) {
+ keyval(word, sum(counts))
+ }
+
+ mapreduce(input = input ,output = output, input.format = "text",
+ map = wc.map, reduce = wc.reduce,combine = T)
+ }
>
> wordcount(input)
...
14/05/15 10:18:40 INFO mapreduce.Job: Job job_1399887026053_0013 completed successfully
14/05/15 10:18:40 INFO mapreduce.Job: Counters: 45
File System Counters
FILE: Number of bytes read=11018
FILE: Number of bytes written=278566
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2004
HDFS: Number of bytes written=11583
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Failed reduce tasks=1
Launched map tasks=2
Launched reduce tasks=2
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=23412
Total time spent by all reduces in occupied slots (ms)=13859
Map-Reduce Framework
Map input records=24
Map output records=112
Map output bytes=10522
Map output materialized bytes=11024
Input split bytes=208
Combine input records=112
Combine output records=114
Reduce input groups=105
Reduce shuffle bytes=11024
Reduce input records=114
Reduce output records=112
Spilled Records=228
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=569
CPU time spent (ms)=3700
Physical memory (bytes) snapshot=574214144
Virtual memory (bytes) snapshot=6258499584
Total committed heap usage (bytes)=365953024
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1796
File Output Format Counters
Bytes Written=11583
rmr
reduce calls=110
14/05/15 10:18:40 INFO streaming.StreamJob: Output directory: /tmp/file612355aa2e35
function ()
{
fname
}
<environment: 0x37d70d0>
>
>
> from.dfs("/tmp/file612355aa2e35")
$key
[1] "-"
[2] "of"
[3] "Hong"
[4] "Paul's"
[5] "School"
[6] "College"
[7] "Graduate"
...
References
https://s3.amazonaws.com/RHadoop/RHadoop2.0.2u2_Installation_Configuration_for_RedHat.pdf

http://cran.r-project.org/doc/manuals/r-devel/R-admin.html#Installing-R-under-Unix_002dalikes
http://www.rdatamining.com/tutorials/rhadoop
http://blog.fens.me/rhadoop-rhadoop/
http://datamgmt.com/installing-r-and-rstudio-on-redhat-or-centos-linux/
https://github.com/RevolutionAnalytics/RHadoop/wiki
https://github.com/RevolutionAnalytics/RHadoop/wiki/Which-Hadoop-for-rmr
Install RHadoop with Hadoop 2.2 – Red Hat Linux的更多相关文章
- Red hat Linux(Centos 5/6)安装R语言
Red hat Linux(Centos 5/6)安装R语言1 wget http://cran.rstudio.com/src/base/R-3/R-3.0.2.tar.gz2 tar xzvf R ...
- red hat Linux 使用CentOS yum源更新
red hat linux是商业版软件,没有经过注册是无法使用红帽 yum源更新软件的,使用CentOS源更新操作如下: 1.删除red hat linux 原有的yum 源 rpm -aq | gr ...
- red hat linux之Samba、DHCP、DNS、FTP、Web的安装与配置
本教程是在red hat linux 6.0环境下简单测试!教程没有图片演示,需要具有一定Linux基础知识,很多地方的配置需要根据自己的情况修改,照打不一定可以配置成功.(其他不足后续修改添加) y ...
- Red Hat Linux认证
想系统的学习一下Linux,了解了一些关于Red Hat Linux认证的信息.整理如下. 当前比较常见的是RHCE认证,即Red Hat Certified Engineer.最高级别的是RHCA ...
- Red Hat linux 如何增加swap空间
按步骤介绍 Red Hat linux 如何增加swap空间 方法/步骤 第一步:确保系统中有足够的空间来用做swap交换空间,我使用的是KVM,准备在一个独立的文件系统中添加一个swap交换文件,在 ...
- 分享red hat linux 6上安装oracle11g时遇到的gcc: error trying to exec 'cc1': execvp: No such file or directory的问题处理过程
安装环境:Red Hat Linux 6.5_x64.oracle11g 64bit 报错详情: 安装到68%时弹窗报错: 调用makefile '/test/app/Administrators/p ...
- Red Hat Linux 挂载外部资源
在我们安装的Red Hat Linux 中.当中一半机器为最主要的server配置,没有桌面环境.在从U盘上复制文件的时候可就犯难了.在网上查了查才知道.要訪问U盘就必须先将它们挂载到Linux系统的 ...
- Red Hat Linux 安装 (本地、网络安装)
Red Hat Linux 安装 (本地.网络安装) 650) this.width=650;" onclick='window.open("http://blog.51cto.c ...
- 在Red Hat Linux服务器端假设NSF Server来进行Linux系统安装全过程
本教程讲述了通过在Red Hat Linux服务器端假设NSF Server来进行Linux系统安装的过程,并详细介绍了如何制作网络启动盘的细节.演示直观,讲解通俗易懂,特别适合初学者 ...
随机推荐
- J2EE 关于WebLogic下应用使用URL.openConnection获取连接返回 HttpsURLConnection与SOAPHttpsURLConnection的问题
J2EE 关于WebLogic下应用使用URL.openConnection获取连接返回 HttpsURLConnection与SOAPHttpsURLConnection的问题 2012年03月09 ...
- Java 二分查找
public int binarySearch(int[] nums, int target) { int low = 0; int high = nums.length; while (low &l ...
- input页面居中,软键盘覆盖input
input框位于底部 对于ios的软键盘遮盖input输入框问题,网上已经有了一些解决办法,无非就是改变布局,再加scroll.js插件实现滚动, input框位于顶部 这种情况不会出现问题, inp ...
- 排序(5)---------高速排序(C语言实现)
继shell发明了shell排序过后呢,各位计算机界的大牛们又開始不爽了,为什么他能发明.我就不能发明呢.于是又有个哥们蹦出来了.哎...那么多排序,就木有一个排序是中国人发明的.顺便吐槽一下,一百年 ...
- android蓝牙4.0(BLE)开发之ibeacon初步
一个april beacon里携带的信息如下 ? 1 <code class=" hljs ">0201061AFF4C0002159069BDB88C11416BAC ...
- lvs+heartbeat搭建负载均衡高可用集群
[172.25.48.1]vm1.example.com [172.25.48.4]vm4.example.com 集群依赖软件:
- python-Pickle序列化
[Python之旅]第三篇(二):Pickle序列化 python 序列化 pickle 摘要: 说明:关于Pickle的说明 作如下说明: 1 2 3 4 5 6 7 序列化的概念很简单 ...
- 线程技术 ☞ Future模式
线程技术可以让我们的程序同时做多件事情,线程的工作模式有很多,常见的一种模式就是处理网站的并发,今天我来说说线程另一种很常见的模式,这个模式和前端里的ajax类似:浏览器一个主线程执行javascri ...
- hash练习
/* poj 1200 Crazy Search 字符串hash O(n)枚举起点 然后O(1)查询子串hash值 然后O(n)找不一样的个数 复杂度是线性的 */ #include<iostr ...
- WEB前端开发规范文档(转)
http://codeguide.bootcss.com/ 编写灵活.稳定.高质量的 HTML 和 CSS 代码的规范上面的文档 再结合下面的规范: 无论是从技术角度还是开发视角,对于web前端开发 ...