Install RHadoop with Hadoop 2.2 – Red Hat Linux
Prerequisite
Hadoop 2.2 is already installed. The installation steps below must be applied on every Hadoop node.
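Before going further, it can help to confirm on each node that the hadoop command is actually reachable. The following is only a minimal sketch, assuming hadoop is expected to be on the PATH; the helper name require_cmd is hypothetical and not part of Hadoop or RHadoop.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command can be found on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Example use on a node (left commented out so the sketch has no side effects):
# require_cmd hadoop && hadoop version
```

If the check fails, the usual cause is that the Hadoop bin directory has not been added to the PATH of the current user.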
Step 1. Install R (by yum)
[hadoop@c0046220 yum.repos.d]$ sudo yum update
[hadoop@c0046220 yum.repos.d]$ yum search r-project
[hadoop@c0046220 yum.repos.d]$ sudo yum install R
...
Installed:
R.x86_64 0:3.0.2-1.el6
Dependency Installed:
R-core.x86_64 0:3.0.2-1.el6 R-core-devel.x86_64 0:3.0.2-1.el6 R-devel.x86_64 0:3.0.2-1.el6 R-java.x86_64 0:3.0.2-1.el6
R-java-devel.x86_64 0:3.0.2-1.el6 bzip2-devel.x86_64 0:1.0.5-7.el6_0 fontconfig-devel.x86_64 0:2.8.0-3.el6 freetype-devel.x86_64 0:2.3.11-14.el6_3.1
java-1.6.0-openjdk-devel.x86_64 1:1.6.0.0-1.62.1.11.11.90.el6_4 kpathsea.x86_64 0:2007-57.el6_2 libRmath.x86_64 0:3.0.2-1.el6 libRmath-devel.x86_64 0:3.0.2-1.el6
libXft-devel.x86_64 0:2.3.1-2.el6 libXmu.x86_64 0:1.1.1-2.el6 libXrender-devel.x86_64 0:0.9.7-2.el6 libicu.x86_64 0:4.2.1-9.1.el6_2
netpbm.x86_64 0:10.47.05-11.el6 netpbm-progs.x86_64 0:10.47.05-11.el6 pcre-devel.x86_64 0:7.8-6.el6 psutils.x86_64 0:1.17-34.el6
tcl.x86_64 1:8.5.7-6.el6 tcl-devel.x86_64 1:8.5.7-6.el6 tex-preview.noarch 0:11.85-10.el6 texinfo.x86_64 0:4.13a-8.el6
texinfo-tex.x86_64 0:4.13a-8.el6 texlive.x86_64 0:2007-57.el6_2 texlive-dvips.x86_64 0:2007-57.el6_2 texlive-latex.x86_64 0:2007-57.el6_2
texlive-texmf.noarch 0:2007-38.el6 texlive-texmf-dvips.noarch 0:2007-38.el6 texlive-texmf-errata.noarch 0:2007-7.1.el6 texlive-texmf-errata-dvips.noarch 0:2007-7.1.el6
texlive-texmf-errata-fonts.noarch 0:2007-7.1.el6 texlive-texmf-errata-latex.noarch 0:2007-7.1.el6 texlive-texmf-fonts.noarch 0:2007-38.el6 texlive-texmf-latex.noarch 0:2007-38.el6
texlive-utils.x86_64 0:2007-57.el6_2 tk.x86_64 1:8.5.7-5.el6 tk-devel.x86_64 1:8.5.7-5.el6 zlib-devel.x86_64 0:1.2.3-29.el6
Complete!
Validation:
[hadoop@c0046220 yum.repos.d]$ R
R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
>
Step 2. Install RHadoop
2.1 Getting RHadoop Packages
Download the rhdfs, rhbase and rmr2 packages from https://github.com/RevolutionAnalytics/RHadoop/wiki/Downloads, for example with wget into a working directory such as /tmp/RHadoop:
[hadoop@c0046220 RHadoop]$ cd /tmp
[hadoop@c0046220 tmp]$ mkdir RHadoop
[hadoop@c0046220 tmp]$ cd RHadoop
[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rhdfs/master/build/rhdfs_1.0.8.tar.gz
[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rmr2/3.1.0/build/rmr2_3.1.0.tar.gz
[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rhbase/master/build/rhbase_1.2.0.tar.gz
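A failed download can leave a truncated file or an HTML error page behind, so it may be worth checking each archive before installing it. This is a sketch under the assumption that the files were saved in /tmp/RHadoop as above; check_tarball is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file is a readable gzip-compressed tar archive.
check_tarball() {
    # "tar -tzf" lists the archive contents and fails on a corrupt or non-gzip file.
    if tar -tzf "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "bad: $1" >&2
        return 1
    fi
}

# Example use (commented out; the path is the working directory used above):
# for f in /tmp/RHadoop/*.tar.gz; do check_tarball "$f"; done
```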
2.2 Install R packages that RHadoop depends on.
[hadoop@c0046220 java]$ echo $JAVA_HOME
/usr/java/jdk1.8.0_05
[hadoop@c0046220 java]$ sudo -i
[root@c0046220 ~]# export JAVA_HOME=/usr/java/jdk1.8.0_05
[root@c0046220 ~]# R CMD javareconf
[root@c0046220 ~]# R
...
> .libPaths();
[1] "/usr/lib64/R/library" "/usr/share/R/library"
> install.packages(c("rJava", "Rcpp", "RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2", "caTools"))
> #install.packages("caTools") #needed for rmr2
2.3 Install RHadoop
Set environment variables
[hadoop@c0046220 ~]$ vi ~/.bashrc
# set HADOOP locations for RHADOOP
export HADOOP_CMD=$HADOOP_HOME/bin/hadoop
export HADOOP_STREAMING=/opt/hadoop/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar
[hadoop@c0046220 ~]$ source .bashrc
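rhdfs and rmr2 rely on HADOOP_CMD, and rmr2 additionally on HADOOP_STREAMING, so a quick check that both variables point at real files can save a confusing error later. The following is a minimal sketch; the function name check_rhadoop_env is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: verify the two variables RHadoop reads from the environment.
check_rhadoop_env() {
    rc=0
    # HADOOP_CMD should point at the executable hadoop launcher script.
    if [ -x "${HADOOP_CMD:-}" ]; then
        echo "HADOOP_CMD ok: $HADOOP_CMD"
    else
        echo "HADOOP_CMD is unset or not executable" >&2
        rc=1
    fi
    # HADOOP_STREAMING should point at the hadoop-streaming jar file.
    if [ -f "${HADOOP_STREAMING:-}" ]; then
        echo "HADOOP_STREAMING ok: $HADOOP_STREAMING"
    else
        echo "HADOOP_STREAMING is unset or not a file" >&2
        rc=1
    fi
    return $rc
}

# Example use after sourcing ~/.bashrc (commented out):
# check_rhadoop_env
```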
[hadoop@c0040084 R]$ sudo -i
[root@c0040084 ~]# R
...
> Sys.setenv(HADOOP_HOME="/opt/hadoop/hadoop-2.2.0");
> Sys.setenv(HADOOP_CMD="/opt/hadoop/hadoop-2.2.0/bin/hadoop");
> Sys.setenv(HADOOP_STREAMING="/opt/hadoop/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar");
> install.packages(pkgs="/tmp/RHadoop/rhdfs_1.0.8.tar.gz",repos=NULL);
> install.packages(pkgs="/tmp/RHadoop/rmr2_3.1.0.tar.gz",repos=NULL);
Step 3. Validation
Load and initialize the rhdfs package, then run a few simple commands:
library(rhdfs)
hdfs.init()
hdfs.ls("/")
[hadoop@c0046220 ~]$ R
...
> library(rhdfs)
Loading required package: rJava
...
Be sure to run hdfs.init()
> hdfs.init()
14/05/15 10:02:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> hdfs.ls("/")
permission owner group size modtime file
1 drwxr-xr-x hadoop supergroup 0 2014-05-14 03:05 /apps
2 drwxr-xr-x hadoop supergroup 0 2014-05-12 09:40 /data
3 drwxr-xr-x hadoop supergroup 0 2014-05-12 09:45 /output
4 drwxrwx--- hadoop supergroup 0 2014-05-15 10:02 /tmp
5 drwxr-xr-x hadoop supergroup 0 2014-05-14 05:48 /user
6 drwxr-xr-x hadoop supergroup 0 2014-05-13 06:43 /usr
Load the rmr2 package, then run a few simple commands:
library(rmr2)
from.dfs(to.dfs(1:100))
from.dfs(mapreduce(to.dfs(1:100)))
[hadoop@c0046220 ~]$ R
...
> library(rmr2)
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: bitops
Loading required package: digest
Loading required package: functional
Loading required package: reshape2
Loading required package: stringr
Loading required package: plyr
Loading required package: caTools
> from.dfs(to.dfs(1:100))
...
$key
NULL
$val
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
> from.dfs(mapreduce(to.dfs(1:100)))
...
$key
NULL
$val
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
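Finally, run a word count job over a text file stored in HDFS:
library(rmr2)
input <- '/user/hadoop/tmp.txt'
wordcount = function(input, output = NULL, pattern = " ") {
  # map: split each line on the pattern and emit (word, 1)
  wc.map = function(., lines) {
    keyval(unlist(strsplit(x = lines, split = pattern)), 1)
  }
  # reduce: sum the counts for each word
  wc.reduce = function(word, counts) {
    keyval(word, sum(counts))
  }
  # combine = TRUE also uses the reduce function as a combiner
  mapreduce(input = input, output = output, input.format = "text",
            map = wc.map, reduce = wc.reduce, combine = TRUE)
}
wordcount(input)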
> library(rmr2)
> input<- '/user/hadoop/tmp.txt'
> wordcount = function(input, output = NULL, pattern = " "){
+ wc.map = function(., lines) {
+ keyval(unlist( strsplit( x = lines,split = pattern)),1)
+ }
+
+ wc.reduce =function(word, counts ) {
+ keyval(word, sum(counts))
+ }
+
+ mapreduce(input = input ,output = output, input.format = "text",
+ map = wc.map, reduce = wc.reduce,combine = T)
+ }
>
> wordcount(input)
...
14/05/15 10:18:40 INFO mapreduce.Job: Job job_1399887026053_0013 completed successfully
14/05/15 10:18:40 INFO mapreduce.Job: Counters: 45
File System Counters
FILE: Number of bytes read=11018
FILE: Number of bytes written=278566
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2004
HDFS: Number of bytes written=11583
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Failed reduce tasks=1
Launched map tasks=2
Launched reduce tasks=2
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=23412
Total time spent by all reduces in occupied slots (ms)=13859
Map-Reduce Framework
Map input records=24
Map output records=112
Map output bytes=10522
Map output materialized bytes=11024
Input split bytes=208
Combine input records=112
Combine output records=114
Reduce input groups=105
Reduce shuffle bytes=11024
Reduce input records=114
Reduce output records=112
Spilled Records=228
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=569
CPU time spent (ms)=3700
Physical memory (bytes) snapshot=574214144
Virtual memory (bytes) snapshot=6258499584
Total committed heap usage (bytes)=365953024
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1796
File Output Format Counters
Bytes Written=11583
rmr
reduce calls=110
14/05/15 10:18:40 INFO streaming.StreamJob: Output directory: /tmp/file612355aa2e35
function ()
{
fname
}
<environment: 0x37d70d0>
>
>
> from.dfs("/tmp/file612355aa2e35")
$key
[1] "-"
[2] "of"
[3] "Hong"
[4] "Paul's"
[5] "School"
[6] "College"
[7] "Graduate"
...
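For a small input file, the counts returned by from.dfs can be cross-checked locally with standard Unix tools. This is only an approximate sketch: it splits on spaces like the R job, but it drops empty tokens, which the R version may count. The function name local_wordcount is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count space-separated words read from standard input.
local_wordcount() {
    # Put each token on its own line, drop empty lines, then count duplicates
    # and print "word<TAB>count".
    tr -s ' ' '\n' | grep -v '^$' | sort | uniq -c | awk '{print $2 "\t" $1}'
}

# Example use against the HDFS file from the job above (commented out):
# hadoop fs -cat /user/hadoop/tmp.txt | local_wordcount
```

The totals per word should agree with the $val entries of the MapReduce result, apart from the empty-token difference noted above.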
References
https://s3.amazonaws.com/RHadoop/RHadoop2.0.2u2_Installation_Configuration_for_RedHat.pdf
http://cran.r-project.org/doc/manuals/r-devel/R-admin.html#Installing-R-under-Unix_002dalikes
http://www.rdatamining.com/tutorials/rhadoop
http://blog.fens.me/rhadoop-rhadoop/
http://datamgmt.com/installing-r-and-rstudio-on-redhat-or-centos-linux/
https://github.com/RevolutionAnalytics/RHadoop/wiki
https://github.com/RevolutionAnalytics/RHadoop/wiki/Which-Hadoop-for-rmr