最近再hue 集群查询任务经常失败,经过几天的观察,终于找到原因,报错如下

Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1514128895713_0770_1_00, diagnostics=[Task failed, taskId=task_1514128895713_0770_1_00_000006, diagnostics=[TaskAttempt 0 failed, info=[Container container_1514128895713_0770_01_000008 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 1 failed, info=[Container container_1514128895713_0770_01_000026 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 2 failed, info=[Container container_1514128895713_0770_01_000036 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 3 failed, info=[Container container_1514128895713_0770_01_000042 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1514128895713_0770_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1514128895713_0770_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:6, Vertex vertex_1514128895713_0770_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1514128895713_0770_1_00, diagnostics=[Task failed, taskId=task_1514128895713_0770_1_00_000006, diagnostics=[TaskAttempt 0 failed, info=[Container container_1514128895713_0770_01_000008 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 1 failed, info=[Container container_1514128895713_0770_01_000026 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 2 failed, info=[Container container_1514128895713_0770_01_000036 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 3 failed, info=[Container container_1514128895713_0770_01_000042 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1514128895713_0770_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1514128895713_0770_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:6, Vertex vertex_1514128895713_0770_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1

分析:

  taskId=task_1514128895713_0770_1_00_000006 失败了几次,失败的原因是container被高优先级的任务抢占了。而task最大的失败次数默认是4.当集群上的任务比较多时,比较容易出现这个问题。还有一个原因我认为container设置的内存太小,我这是1.5G 不过没有修改,我设置了以下参数就解决了:

解决方案:

  

  问题解决了,希望帮到你

  

hive on tez 任务失败的更多相关文章

  1. hive on tez配置

    1.Tez简介 Tez是Hontonworks开源的支持DAG作业的计算框架,它可以将多个有依赖的作业转换为一个作业从而大幅提升MapReduce作业的性能.Tez并不直接面向最终用户--事实上它允许 ...

  2. hive on tez 错误记录

    1.执行过程失败,报 Container killed on request. Exit code is 143 如下图: 分析:造成这种原因是由于总内存不多,而容器在jvm中占比过高,修改tez-s ...

  3. 配置 Hive On Tez

    配置 Hive On Tez 标签(空格分隔): hive Tez 部署底层应用 简单介绍 介绍:tez 是基于hive 之上,可以将sql翻译解析成DAG计算的引擎.基于DAG 与mr 架构本身的优 ...

  4. hive on spark VS SparkSQL VS hive on tez

    http://blog.csdn.net/wtq1993/article/details/52435563 http://blog.csdn.net/yeruby/article/details/51 ...

  5. Hive on Tez 中 Map 任务的数量计算

    Hive on Tez Mapper 数量计算 在Hive 中执行一个query时,我们可以发现Hive 的执行引擎在使用 Tez 与 MR时,两者生成mapper数量差异较大.主要原因在于 Tez ...

  6. hive on tez

    hive运行模式 hive on mapreduce 离线计算(默认) hive on tez  YARN之上支持DAG作业的计算框架 hive on spark 内存计算 hive on tez T ...

  7. Hive执行count函数失败,Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException)

    Hive执行count函数失败 1.现象: 0: jdbc:hive2://192.168.137.12:10000> select count(*) from emp; INFO : Numb ...

  8. hive中创建表失败

    使用create table命令创建表失败,如下错误信息: hive> create table test(id int,name string,age int,sex string); FAI ...

  9. Hive配置Tez引擎踩坑

    框架版本 Hadoop 2.7.7 Hive 2.3.7 Tez 0.9.2 保证hadoop集群启动,hive元数据服务启动 上传tez到HDFS tar -zxvf apache-tez-0.9. ...

随机推荐

  1. Git 实习一个月恍然大悟合集

    从开始实习到现在大概有一个月了,这个月时间接触了很多新东西,其中就包括了git版本控制.分支管理等等.我在这段时间里,深深地感受到了git对公司项目代码管理和控制.团队合作带来的益处和其重要性.其实在 ...

  2. mac 安装 php7 及扩展

    mac 版本号:10.12.3 (16D30) 安装内容 php7.0.18(配置apache),composer,phpunit,xdebug扩展,docopts,mongo和redis扩展 php ...

  3. Centos7:dubbo监控中心安装,配置和使用

    制作dubbo-admin.war文件 下载dubbo-admin https://github.com/alibaba/dubbo 注:2.6版本后源码中不包含dubbo-admin工程 在dubb ...

  4. TKmybatis和mybatisplus哪个好用

    文档连接 :http://baomidou.oschina.io/mybatis-plus-doc/#/?id=%E7%AE%80%E4%BB%8B https://gitee.com/hengboy ...

  5. Swift(二)基础部分

    数据类型 Swift 包含了 C 和 Objective-C 上所有基础数据类型.它还增加了 Objective-C 中没有的高阶数据类型比如元组(Tuple) 1.基础类型 Int整形和UInt无符 ...

  6. PYTHON的程序在LINUX后台运行

    1.nohup 命令 nohup nohup 命令 用途:LINUX命令用法,不挂断地运行命令. 语法:nohup Command [ Arg ... ] [ & ] 描述:nohup 命令运 ...

  7. 什么是DDoS攻击?

    本文转载自知道创宇云安全的知乎回答:DDoS 的肉鸡都是哪来的? 说到DDoS攻击,我们就不得不说“肉鸡”. “肉鸡”可谓是DDoS攻击的核心大杀器,作为一个要发起DDoS攻击的黑客来说,没有肉鸡就是 ...

  8. 关于select的取值

    这篇博客,主要是记录我我所犯的错误,或者自己的写法不规范导致了错误,大家可以引以引以为鉴. 我要获取select当前的值,在IE9上我可以直接写document.getElementById(&quo ...

  9. 点击切换JS

    $(function(){ var tabnav = $("#tab-nav ul li"); tabnav.click(function(){ $(this).addClass( ...

  10. Java面试01

    一.谈谈你对java的理解 1.平台无关性,一次编译到处运行 2.GC 3.语言特性 4.面向对象 5.类库 6.异常处理 二.Java如何做到一次编译到处运行?(如何做到平台无关性) 首先我们先来编 ...