hadoop debug script
A Hadoop job may consist of many map tasks and reduce tasks. Therefore, debugging a
Hadoop job is often a complicated process. It is a good practice to first test a Hadoop job
using unit tests by running it with a subset of the data.
However, sometimes it is necessary to debug a Hadoop job in a distributed mode. To support
such cases, Hadoop provides a mechanism called debug scripts. This recipe explains how to
use debug scripts.
A debug script is a shell script, and Hadoop executes the script whenever a task encounters
an error. The script will have access to the $script, $stdout, $stderr, $syslog, and
$jobconfproperties, as environment variables populated by Hadoop. You can find a
sample script from resources/chapter3/debugscript. We can use the debug scripts
to copy all the logfiles to a single location, e-mail them to a single e-mail account, or perform
some analysis.
LOG_FILE=HADOOP_HOME/error.log
echo "Run the script" >> $LOG_FILE
echo $script >> $LOG_FILE
echo $stdout>> $LOG_FILE
echo $stderr>> $LOG_FILE
echo $syslog >> $LOG_FILE
echo $jobconf>> $LOG_FILE
when you execute this, you should pay attention to the execute path, or else it will not found debug script.
package chapter3; import java.net.URI; import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; public class WordcountWithDebugScript {
private static final String scriptFileLocation = "resources/chapter3/debugscript";
private static final String HDFS_ROOT = "/debug"; public static void setupFailedTaskScript(JobConf conf) throws Exception { // create a directory on HDFS where we'll upload the fail scripts
FileSystem fs = FileSystem.get(conf);
// Path debugDir = new Path("/debug");
Path debugDir = new Path(HDFS_ROOT); // who knows what's already in this directory; let's just clear it.
if (fs.exists(debugDir)) {
fs.delete(debugDir, true);
} // ...and then make sure it exists again
fs.mkdirs(debugDir); // upload the local scripts into HDFS
fs.copyFromLocalFile(new Path(scriptFileLocation), new Path(HDFS_ROOT
+ "/fail-script")); FileStatus[] list = fs.listStatus(new Path(HDFS_ROOT));
if (list == null || list.length == 0) {
System.out.println("No File found");
} else {
for (FileStatus f : list) {
System.out.println("File found " + f.getPath());
}
} conf.setMapDebugScript("./fail-script");
conf.setReduceDebugScript("./fail-script");
// this create a simlink from the job directory to cache directory of
// the mapper node
DistributedCache.createSymlink(conf); URI fsUri = fs.getUri(); String mapUriStr = fsUri.toString() + HDFS_ROOT
+ "/fail-script#fail-script";
System.out.println("added " + mapUriStr + "to distributed cache 1");
URI mapUri = new URI(mapUriStr);
// Following copy the map uri to the cache directory of the job node
DistributedCache.addCacheFile(mapUri, conf);
} public static void main(String[] args) throws Exception {
JobConf conf = new JobConf();
setupFailedTaskScript(conf);
Job job = new Job(conf, "word count"); job.setJarByClass(FaultyWordCount.class);
job.setMapperClass(FaultyWordCount.TokenizerMapper.class);
job.setReducerClass(FaultyWordCount.IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileSystem.get(conf).delete(new Path(args[1]), true);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
} }
digest from mapreduce cookbook
hadoop debug script的更多相关文章
- Hadoop官方文档翻译——MapReduce Tutorial
MapReduce Tutorial(个人指导) Purpose(目的) Prerequisites(必备条件) Overview(综述) Inputs and Outputs(输入输出) MapRe ...
- hadoop mapreduce核心功能描述
核心功能描述 应用程序通常会通过提供map和reduce来实现 Mapper和Reducer接口,它们组成作业的核心. Mapper Mapper将输入键值对(key/value pair)映射到一组 ...
- hadoop之 hadoop 2.2.X 弃用的配置属性名称及其替换名称对照表
Deprecated Properties 弃用属性 The following table lists the configuration property names that are depr ...
- Hadoop专业解决方案-第5章 开发可靠的MapReduce应用
本章主要内容: 1.利用MRUnit创建MapReduce的单元测试. 2.MapReduce应用的本地实例. 3.理解MapReduce的调试. 4.利用MapReduce防御式程序设计. 在WOX ...
- Hadoop Map/Reduce教程
原文地址:http://hadoop.apache.org/docs/r1.0.4/cn/mapred_tutorial.html 目的 先决条件 概述 输入与输出 例子:WordCount v1.0 ...
- 一步一步跟我学习hadoop(5)----hadoop Map/Reduce教程(2)
Map/Reduce用户界面 本节为用户採用框架要面对的各个环节提供了具体的描写叙述,旨在与帮助用户对实现.配置和调优进行具体的设置.然而,开发时候还是要相应着API进行相关操作. 首先我们须要了解M ...
- VS2015/2013/2012 IIS Express Debug Classic ASP
参考资料: https://msdn.microsoft.com/en-us/library/ms241740(v=vs.100).aspx When you attach to an ASP Web ...
- cdh版本的hue安装配置部署以及集成hadoop hbase hive mysql等权威指南
hue下载地址:https://github.com/cloudera/hue hue学习文档地址:http://archive.cloudera.com/cdh5/cdh/5/hue-3.7.0-c ...
- Hadoop出现错误:WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable,解决方案
安装Hadoop的时候直接用的bin版本,根据教程安装好之后运行的时候发现出现了:WARN util.NativeCodeLoader: Unable to load native-hadoop li ...
随机推荐
- 用c#写的一个局域网聊天客户端 类似小飞鸽
用c#写的一个局域网聊天客户端 类似小飞鸽 摘自: http://www.cnblogs.com/yyl8781697/archive/2012/12/07/csharp-socket-udp.htm ...
- sql 两列相加存到另一列
假设表table1有a.b两个列,想生成另一个列为a列值+b列值计算列添加语句如下ALTER TABLE table1ADD c AS a+b
- socket调用流程的函数及数据结构
如有错误,欢迎指正. 如果需要,可以提供visio原文件. 参考: 1. <追踪Linux TCPIP代码运行--基于2.6内核> 2. Linux Kernel 2.6.26
- PHP学习笔记:APACHE配置虚拟目录、一个站点使用多域名配置方式
我用的是xmapp lite2016的集成包,配置虚拟目录教程如下: 找到httpd-vhosts.conf这个文件,这个文件一般是在xampp\apache\conf\extra这个路径下面,找不到 ...
- javascript实现排序算法
准备好好学习js了,js写的第一个排序 先推荐一个js在线编辑工具,RunJS,还不错. 冒泡排序 var arr = [2,4,1,5,3]; function handle(arr){ for(v ...
- [ERROR] Plugin 'InnoDB' init function returned error
今天一大早到公司,计划把开发环境的mysql升级到5.7.15,干净关闭系统后,把目录从5.6指向到5.7,一切正常,重新指向5.6启动时,报下列错误: 2016-10-31 08:13:14 869 ...
- 个人收集整理的5Ucms标签
{field:cid} 当前栏目id {field:id} 当前页面id {field:content} 当前页面内容 [List:Modifytime $format=yy-mm-dd] 文章发布 ...
- [Tips] Useful link ... on going
1. CSS Bootstrap http://getbootstrap.com/ Bootstrap 中文文档 http://getbootstrap.com/2.3.2/ 最全的 Twitter ...
- 转:NLog之:文件类型目标(File target)
转:http://www.cnblogs.com/RitchieChen/archive/2012/07/16/2594308.html 英文原文[http://nlog-project.org/wi ...
- MyBatis入门(三)---多个参数
一.建立表 1.1.建立表,并插入数据 /* SQLyog Enterprise v12.09 (64 bit) MySQL - 5.6.27-log : Database - mybatis *** ...