Running the official WordCount example with Hadoop 2.7.3 in local mode
Basic environment:
Host OS: Windows 7
Virtualization: VirtualBox
Guest OS: CentOS 7
Hadoop version: 2.7.3
This first article runs Hadoop in standalone (local) mode.
1 Installing Hadoop
Java environment
yum install java-1.8.0-openjdk
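A quick check that the JDK is actually in place (the exact build string depends on the OpenJDK package that yum installs):
java -version
# openjdk version "1.8.0_111" (or whatever build the repository currently ships)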
Download and unpack the Hadoop tarball
mkdir ~/hadoop/
cd ~/hadoop/
# http://apache.fayea.com/hadoop/common/hadoop-2.7.3/
curl http://apache.fayea.com/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz -O
# If the download is interrupted, resume it with the -C option
ls -l
#-rw-rw-r--. 1 jungle jungle 165297920 Jan 6 13:10 hadoop-2.7.3.tar.gz
curl http://apache.fayea.com/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz -C 165297920 -O
# ** Resuming transfer from byte position 165297920 ……
# download checksum
curl http://apache.fayea.com/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz.mds -O
# check
cat hadoop-2.7.3.tar.gz.mds
md5sum hadoop-2.7.3.tar.gz
sha256sum hadoop-2.7.3.tar.gz
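The .mds file contains the digests published by Apache. As a minimal sketch (the exact layout of the .mds file may differ), compare the locally computed MD5 against the published one:
md5sum hadoop-2.7.3.tar.gz | awk '{print toupper($1)}'   # local digest, uppercased for easier comparison
grep -i -A1 MD5 hadoop-2.7.3.tar.gz.mds                  # published digest
# the two hex strings must match before the archive is unpacked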
tar -zxf hadoop-2.7.3.tar.gz
mv hadoop-2.7.3 hadoop-local
2 Configuring the environment
Since Hadoop runs in local mode here, very little configuration is needed; only environment variables are involved.
# java path
whereis java
java: /usr/bin/java /usr/lib/java /etc/java /usr/share/java
ls -l /usr/bin/java
lrwxrwxrwx. 1 root root 22 Dec 30 12:26 /usr/bin/java -> /etc/alternatives/java
ls -l /etc/alternatives/java
lrwxrwxrwx. 1 root root 73 Dec 30 12:26 /etc/alternatives/java -> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/jre/bin/java
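Instead of following the symlink chain by hand, readlink can resolve it in one step (coreutils is part of a default CentOS 7 install):
readlink -f /usr/bin/java
# /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/jre/bin/java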
Add the following three lines to ~/.bashrc:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/jre
export HADOOP_INSTALL=/home/jungle/hadoop/hadoop-local
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
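Reload the shell configuration so the variables take effect in the current session (the paths assume the user jungle from this walkthrough; adjust them to your own layout):
source ~/.bashrc
echo $JAVA_HOME
# /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/jre
which hadoop
# /home/jungle/hadoop/hadoop-local/bin/hadoop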
Confirm that hadoop works:
hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /home/jungle/hadoop/hadoop-local/share/hadoop/common/hadoop-common-2.7.3.jar
3 Testing against the Linux filesystem
The tests use the Linux filesystem directly, i.e. without any hadoop fs commands.
3.1 wordcount
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
# ...
wordcount: A map/reduce program that counts the words in the input files.
# ...
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount
# Usage: wordcount <in> [<in>...] <out>
3.2 Preparing the data
mkdir -p dataLocal/input/
cd dataLocal/input/
echo "hello world, I am jungle. bye world" > file1.txt
echo "hello hadoop. hello jungle. bye hadoop." > file2.txt
echo "the great software is hadoop." >> file2.txt
3.3 Running the job
cd /home/jungle/hadoop/hadoop-local/
hadoop jar /home/jungle/hadoop/hadoop-local/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount dataLocal/input/ dataLocal/outout
# dataLocal/outout does not exist beforehand; it is created by the job
echo $?
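Note that MapReduce refuses to write into an output directory that already exists. To repeat the run (a hypothetical re-run, not part of the session above), remove the old output first:
rm -rf dataLocal/outout
hadoop jar /home/jungle/hadoop/hadoop-local/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount dataLocal/input/ dataLocal/outout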
ls -la dataLocal/outout/
total 12
drwxrwxr-x. 2 jungle jungle 84 Jan 6 16:53 .
drwxrwxr-x. 4 jungle jungle 31 Jan 6 16:53 ..
-rw-r--r--. 1 jungle jungle 82 Jan 6 16:53 part-r-00000
-rw-r--r--. 1 jungle jungle 12 Jan 6 16:53 .part-r-00000.crc
-rw-r--r--. 1 jungle jungle 0 Jan 6 16:53 _SUCCESS
-rw-r--r--. 1 jungle jungle 8 Jan 6 16:53 ._SUCCESS.crc
# results
cat dataLocal/outout/part-r-00000
I 1
am 1
bye 2
great 1
hadoop. 3
hello 3
is 1
jungle. 2
software 1
the 1
world. 2
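As a rough cross-check, the same tallies can be reproduced with ordinary shell tools, since in local mode the input and output are just regular files; with only spaces and newlines in the input, this simple tokenization matches what wordcount does:
cat dataLocal/input/*.txt | tr -s ' ' '\n' | sort | uniq -c | sort -k2
#  1 I
#  1 am
#  2 bye
#  ... same counts as part-r-00000, with the count printed first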
3.4 The job log
The log shows a number of common runtime parameters and configuration values.
17/01/06 16:53:26 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/01/06 16:53:26 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/01/06 16:53:26 INFO input.FileInputFormat: Total input paths to process : 2
17/01/06 16:53:26 INFO mapreduce.JobSubmitter: number of splits:2
17/01/06 16:53:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1147390429_0001
17/01/06 16:53:27 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/01/06 16:53:27 INFO mapreduce.Job: Running job: job_local1147390429_0001
17/01/06 16:53:27 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/01/06 16:53:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/01/06 16:53:27 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Waiting for map tasks
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Starting task: attempt_local1147390429_0001_m_000000_0
17/01/06 16:53:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/01/06 16:53:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/01/06 16:53:27 INFO mapred.MapTask: Processing split: file:/home/jungle/hadoop/hadoop-local/dataLocal/input/file2.txt:0+70
17/01/06 16:53:27 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/01/06 16:53:27 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/01/06 16:53:27 INFO mapred.MapTask: soft limit at 83886080
17/01/06 16:53:27 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/01/06 16:53:27 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/01/06 16:53:27 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/01/06 16:53:27 INFO mapred.LocalJobRunner:
17/01/06 16:53:27 INFO mapred.MapTask: Starting flush of map output
17/01/06 16:53:27 INFO mapred.MapTask: Spilling map output
17/01/06 16:53:27 INFO mapred.MapTask: bufstart = 0; bufend = 114; bufvoid = 104857600
17/01/06 16:53:27 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214356(104857424); length = 41/6553600
17/01/06 16:53:27 INFO mapred.MapTask: Finished spill 0
17/01/06 16:53:27 INFO mapred.Task: Task:attempt_local1147390429_0001_m_000000_0 is done. And is in the process of committing
17/01/06 16:53:27 INFO mapred.LocalJobRunner: map
17/01/06 16:53:27 INFO mapred.Task: Task 'attempt_local1147390429_0001_m_000000_0' done.
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Finishing task: attempt_local1147390429_0001_m_000000_0
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Starting task: attempt_local1147390429_0001_m_000001_0
17/01/06 16:53:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/01/06 16:53:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/01/06 16:53:27 INFO mapred.MapTask: Processing split: file:/home/jungle/hadoop/hadoop-local/dataLocal/input/file1.txt:0+37
17/01/06 16:53:27 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/01/06 16:53:27 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/01/06 16:53:27 INFO mapred.MapTask: soft limit at 83886080
17/01/06 16:53:27 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/01/06 16:53:27 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/01/06 16:53:27 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/01/06 16:53:27 INFO mapred.LocalJobRunner:
17/01/06 16:53:27 INFO mapred.MapTask: Starting flush of map output
17/01/06 16:53:27 INFO mapred.MapTask: Spilling map output
17/01/06 16:53:27 INFO mapred.MapTask: bufstart = 0; bufend = 65; bufvoid = 104857600
17/01/06 16:53:27 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214372(104857488); length = 25/6553600
17/01/06 16:53:27 INFO mapred.MapTask: Finished spill 0
17/01/06 16:53:27 INFO mapred.Task: Task:attempt_local1147390429_0001_m_000001_0 is done. And is in the process of committing
17/01/06 16:53:27 INFO mapred.LocalJobRunner: map
17/01/06 16:53:27 INFO mapred.Task: Task 'attempt_local1147390429_0001_m_000001_0' done.
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Finishing task: attempt_local1147390429_0001_m_000001_0
17/01/06 16:53:27 INFO mapred.LocalJobRunner: map task executor complete.
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Waiting for reduce tasks
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Starting task: attempt_local1147390429_0001_r_000000_0
17/01/06 16:53:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/01/06 16:53:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/01/06 16:53:27 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@2aa26fdb
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
17/01/06 16:53:27 INFO reduce.EventFetcher: attempt_local1147390429_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
17/01/06 16:53:27 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1147390429_0001_m_000000_0 decomp: 98 len: 102 to MEMORY
17/01/06 16:53:27 INFO reduce.InMemoryMapOutput: Read 98 bytes from map-output for attempt_local1147390429_0001_m_000000_0
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 98, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->98
17/01/06 16:53:27 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1147390429_0001_m_000001_0 decomp: 68 len: 72 to MEMORY
17/01/06 16:53:27 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/01/06 16:53:27 INFO reduce.InMemoryMapOutput: Read 68 bytes from map-output for attempt_local1147390429_0001_m_000001_0
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 68, inMemoryMapOutputs.size() -> 2, commitMemory -> 98, usedMemory ->166
17/01/06 16:53:27 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
17/01/06 16:53:27 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/01/06 16:53:27 INFO mapred.LocalJobRunner: 2 / 2 copied.
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: finalMerge called with 2 in-memory map-outputs and 0 on-disk map-outputs
17/01/06 16:53:27 INFO mapred.Merger: Merging 2 sorted segments
17/01/06 16:53:27 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 156 bytes
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: Merged 2 segments, 166 bytes to disk to satisfy reduce memory limit
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: Merging 1 files, 168 bytes from disk
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
17/01/06 16:53:27 INFO mapred.Merger: Merging 1 sorted segments
17/01/06 16:53:27 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 160 bytes
17/01/06 16:53:27 INFO mapred.LocalJobRunner: 2 / 2 copied.
17/01/06 16:53:27 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
17/01/06 16:53:27 INFO mapred.Task: Task:attempt_local1147390429_0001_r_000000_0 is done. And is in the process of committing
17/01/06 16:53:27 INFO mapred.LocalJobRunner: 2 / 2 copied.
17/01/06 16:53:27 INFO mapred.Task: Task attempt_local1147390429_0001_r_000000_0 is allowed to commit now
17/01/06 16:53:27 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1147390429_0001_r_000000_0' to file:/home/jungle/hadoop/hadoop-local/dataLocal/outout/_temporary/0/task_local1147390429_0001_r_000000
17/01/06 16:53:27 INFO mapred.LocalJobRunner: reduce > reduce
17/01/06 16:53:27 INFO mapred.Task: Task 'attempt_local1147390429_0001_r_000000_0' done.
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Finishing task: attempt_local1147390429_0001_r_000000_0
17/01/06 16:53:27 INFO mapred.LocalJobRunner: reduce task executor complete.
17/01/06 16:53:28 INFO mapreduce.Job: Job job_local1147390429_0001 running in uber mode : false
17/01/06 16:53:28 INFO mapreduce.Job: map 100% reduce 100%
17/01/06 16:53:28 INFO mapreduce.Job: Job job_local1147390429_0001 completed successfully
17/01/06 16:53:28 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=889648
FILE: Number of bytes written=1748828
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=3
Map output records=18
Map output bytes=179
Map output materialized bytes=174
Input split bytes=256
Combine input records=18
Combine output records=14
Reduce input groups=11
Reduce shuffle bytes=174
Reduce input records=14
Reduce output records=11
Spilled Records=28
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=43
Total committed heap usage (bytes)=457912320
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=107
File Output Format Counters
Bytes Written=94
The two EBADF: Bad file descriptor warnings above can, based on reports found online, be safely ignored.
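To keep the full job log for later inspection, the console output can be captured when launching the job (a hypothetical variation on the run above, assuming the previous output directory has been removed; job.log is just an example filename):
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount dataLocal/input/ dataLocal/outout 2>&1 | tee job.log
grep "io.sort.mb" job.log
# 17/01/06 16:53:27 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100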