最近再hue 集群查询任务经常失败,经过几天的观察,终于找到原因,报错如下

Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1514128895713_0770_1_00, diagnostics=[Task failed, taskId=task_1514128895713_0770_1_00_000006, diagnostics=[TaskAttempt 0 failed, info=[Container container_1514128895713_0770_01_000008 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 1 failed, info=[Container container_1514128895713_0770_01_000026 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 2 failed, info=[Container container_1514128895713_0770_01_000036 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 3 failed, info=[Container container_1514128895713_0770_01_000042 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1514128895713_0770_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1514128895713_0770_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:6, Vertex vertex_1514128895713_0770_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1514128895713_0770_1_00, diagnostics=[Task failed, taskId=task_1514128895713_0770_1_00_000006, diagnostics=[TaskAttempt 0 failed, info=[Container container_1514128895713_0770_01_000008 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 1 failed, info=[Container container_1514128895713_0770_01_000026 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 2 failed, info=[Container container_1514128895713_0770_01_000036 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 3 failed, info=[Container container_1514128895713_0770_01_000042 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1514128895713_0770_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1514128895713_0770_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:6, Vertex vertex_1514128895713_0770_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1

分析:

  taskId=task_1514128895713_0770_1_00_000006 失败了几次,失败的原因是container被高优先级的任务抢占了。而task最大的失败次数默认是4.当集群上的任务比较多时,比较容易出现这个问题。还有一个原因我认为container设置的内存太小,我这是1.5G 不过没有修改,我设置了以下参数就解决了:

解决方案:

  

  问题解决了,希望帮到你

  

hive on tez 任务失败的更多相关文章

  1. hive on tez配置

    1.Tez简介 Tez是Hontonworks开源的支持DAG作业的计算框架,它可以将多个有依赖的作业转换为一个作业从而大幅提升MapReduce作业的性能.Tez并不直接面向最终用户--事实上它允许 ...

  2. hive on tez 错误记录

    1.执行过程失败,报 Container killed on request. Exit code is 143 如下图: 分析:造成这种原因是由于总内存不多,而容器在jvm中占比过高,修改tez-s ...

  3. 配置 Hive On Tez

    配置 Hive On Tez 标签(空格分隔): hive Tez 部署底层应用 简单介绍 介绍:tez 是基于hive 之上,可以将sql翻译解析成DAG计算的引擎.基于DAG 与mr 架构本身的优 ...

  4. hive on spark VS SparkSQL VS hive on tez

    http://blog.csdn.net/wtq1993/article/details/52435563 http://blog.csdn.net/yeruby/article/details/51 ...

  5. Hive on Tez 中 Map 任务的数量计算

    Hive on Tez Mapper 数量计算 在Hive 中执行一个query时,我们可以发现Hive 的执行引擎在使用 Tez 与 MR时,两者生成mapper数量差异较大.主要原因在于 Tez ...

  6. hive on tez

    hive运行模式 hive on mapreduce 离线计算(默认) hive on tez  YARN之上支持DAG作业的计算框架 hive on spark 内存计算 hive on tez T ...

  7. Hive执行count函数失败,Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException)

    Hive执行count函数失败 1.现象: 0: jdbc:hive2://192.168.137.12:10000> select count(*) from emp; INFO : Numb ...

  8. hive中创建表失败

    使用create table命令创建表失败,如下错误信息: hive> create table test(id int,name string,age int,sex string); FAI ...

  9. Hive配置Tez引擎踩坑

    框架版本 Hadoop 2.7.7 Hive 2.3.7 Tez 0.9.2 保证hadoop集群启动,hive元数据服务启动 上传tez到HDFS tar -zxvf apache-tez-0.9. ...

随机推荐

  1. TP5使用phpoffice phpexcel包操作excel(导出)

    安装composer(window版本) 安装composer(MAC版本) 安装composer(Linux版本) 在PhpStorm配置 导出excel 1.使用composer安装phpoffi ...

  2. ITCAST-C# 委托

    using System; using System.Collections.Generic; using System.Linq; using System.Text; namespace _12委 ...

  3. springboot(二十)-配置文件 bootstrap和application区别

    用过 Spring Boot 的都知道在 Spring Boot 中有以下两种配置文件 bootstrap (.yml 或者 .properties) application (.yml 或者 .pr ...

  4. 运行期优化 Java内存模型与线程 线程安全与优化

  5. ffmpeg3.3.2命令行参数笔记

    组成: 1.libavformat:用于各种音视频封装格式的生成和解析,包括获取解码所需信息以生成解码上下文结构和读取音视频帧等功能,包含demuxers和muxer库: 2.libavcodec:用 ...

  6. 求二叉搜索树的第k小的节点

    题目描述: /** * 给定一棵二叉搜索树,请找出其中的第k小的结点. * 例如, (5,3,7,2,4,6,8)中, * 按结点数值大小顺序第三小结点的值为4. * 这是层序遍历: * 5 * 3 ...

  7. 标准C语言(2)

    字符类型名称是char,这个类型里一共包含256个不同的整数,每个整数代表一个字符(例如'a', '&'等),这些整数和字符可以互相替代,ASCII码表记录了所有整数和字符之间的对应关系 'a ...

  8. php + mysql 存储过程

    实例一:无参的存储过程$conn = mysql_connect('localhost','root','root') or die ("数据连接错误!!!");mysql_sel ...

  9. spark源码本地调试

    1.前提条件: 1)安装jdk 版本: 2)安装scala 版本: 3)安装sbt 版本: 4)安装maven 5)安装git 版本: 6)安装idea,并配置好sbt.git.maven 2.从gi ...

  10. JULY-Record-update

    2019/07/26~2019/07/29,关于学习的一些记录 神经网络和深度学习neural networks and deep-learning-中文_ALL(1) 张景,逻辑派,组织派,行为主义 ...