最近再hue 集群查询任务经常失败,经过几天的观察,终于找到原因,报错如下

Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1514128895713_0770_1_00, diagnostics=[Task failed, taskId=task_1514128895713_0770_1_00_000006, diagnostics=[TaskAttempt 0 failed, info=[Container container_1514128895713_0770_01_000008 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 1 failed, info=[Container container_1514128895713_0770_01_000026 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 2 failed, info=[Container container_1514128895713_0770_01_000036 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 3 failed, info=[Container container_1514128895713_0770_01_000042 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1514128895713_0770_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1514128895713_0770_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:6, Vertex vertex_1514128895713_0770_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1514128895713_0770_1_00, diagnostics=[Task failed, taskId=task_1514128895713_0770_1_00_000006, diagnostics=[TaskAttempt 0 failed, info=[Container container_1514128895713_0770_01_000008 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 1 failed, info=[Container container_1514128895713_0770_01_000026 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 2 failed, info=[Container container_1514128895713_0770_01_000036 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 3 failed, info=[Container container_1514128895713_0770_01_000042 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1514128895713_0770_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1514128895713_0770_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:6, Vertex vertex_1514128895713_0770_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1

分析:

  taskId=task_1514128895713_0770_1_00_000006 失败了几次,失败的原因是container被高优先级的任务抢占了。而task最大的失败次数默认是4.当集群上的任务比较多时,比较容易出现这个问题。还有一个原因我认为container设置的内存太小,我这是1.5G 不过没有修改,我设置了以下参数就解决了:

解决方案:

  

  问题解决了,希望帮到你

  

hive on tez 任务失败的更多相关文章

  1. hive on tez配置

    1.Tez简介 Tez是Hontonworks开源的支持DAG作业的计算框架,它可以将多个有依赖的作业转换为一个作业从而大幅提升MapReduce作业的性能.Tez并不直接面向最终用户--事实上它允许 ...

  2. hive on tez 错误记录

    1.执行过程失败,报 Container killed on request. Exit code is 143 如下图: 分析:造成这种原因是由于总内存不多,而容器在jvm中占比过高,修改tez-s ...

  3. 配置 Hive On Tez

    配置 Hive On Tez 标签(空格分隔): hive Tez 部署底层应用 简单介绍 介绍:tez 是基于hive 之上,可以将sql翻译解析成DAG计算的引擎.基于DAG 与mr 架构本身的优 ...

  4. hive on spark VS SparkSQL VS hive on tez

    http://blog.csdn.net/wtq1993/article/details/52435563 http://blog.csdn.net/yeruby/article/details/51 ...

  5. Hive on Tez 中 Map 任务的数量计算

    Hive on Tez Mapper 数量计算 在Hive 中执行一个query时,我们可以发现Hive 的执行引擎在使用 Tez 与 MR时,两者生成mapper数量差异较大.主要原因在于 Tez ...

  6. hive on tez

    hive运行模式 hive on mapreduce 离线计算(默认) hive on tez  YARN之上支持DAG作业的计算框架 hive on spark 内存计算 hive on tez T ...

  7. Hive执行count函数失败,Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException)

    Hive执行count函数失败 1.现象: 0: jdbc:hive2://192.168.137.12:10000> select count(*) from emp; INFO : Numb ...

  8. hive中创建表失败

    使用create table命令创建表失败,如下错误信息: hive> create table test(id int,name string,age int,sex string); FAI ...

  9. Hive配置Tez引擎踩坑

    框架版本 Hadoop 2.7.7 Hive 2.3.7 Tez 0.9.2 保证hadoop集群启动,hive元数据服务启动 上传tez到HDFS tar -zxvf apache-tez-0.9. ...

随机推荐

  1. 封装CURD

    <?php include ('ft.php'); $db=Danli::show(); //查询 //$re=$db->table('stree')->where(['name'= ...

  2. 如何决定使用 HashMap 还是 TreeMap? (转)

    问:如何决定使用 HashMap 还是 TreeMap? 介绍 TreeMap<K,V>的Key值是要求实现java.lang.Comparable,所以迭代的时候TreeMap默认是按照 ...

  3. 附录3:RMA算法原理

    RMA算法分三步: 一.背景校正(没精力写了) 二.归一化(没精力写了) 三.计算表达值 假设有5张芯片,这些芯片的某个探针组包含5个探针,它们的表达值如下: GeneChip 4 8 6 9 7 3 ...

  4. 关于redis的几件小事(七)redis缓存雪崩与穿透

    1.缓存雪崩 (1)什么是缓存雪崩 缓存雪崩指的是在同一时刻,缓存大量失效,导致大量的请求直接到了数据库,数据库压力剧增,引起系统崩溃.可能出现的情况有: ①大量的key设置了相同的过期时间,导致在缓 ...

  5. Python内置函数清单

    作者:Vamei 出处:http://www.cnblogs.com/vamei Python内置(built-in)函数随着python解释器的运行而创建.在Python的程序中,你可以随时调用这些 ...

  6. java网络编程+通讯协议的理解

    参考: http://blog.csdn.net/sunyc1990/article/details/50773014 网络编程对于很多的初学者来说,都是很向往的一种编程技能,但是很多的初学者却因为很 ...

  7. sftp及两种连接模式简介

    sftp是ssh内含的协议,只要sshd服务器启动了,它就可用,它本身不需要ftp服务器启动. FTP服务器和客户端要进行文件传输,就需要通过端口来进行.FTP协议需要的端口一般包括两种: 控制链路- ...

  8. 简单了解 TCP TCP/IP HTTP HTTPS

    一. 什么是TCP连接的三次握手 第一次握手:客户端发送syn包(syn=j)到服务器,并进入SYN_SEND状态,等待服务器确认: 第二次握手:服务器收到syn包,必须确认客户的SYN(ack=j+ ...

  9. composer入门教程

    初始化项目 使用composer初始化工作目录,在项目的根目录命令行输入 composer init 安装项目 在composer.json文件所在目录命令行下执行如下命令 php composer. ...

  10. PAT Basic 1011 A+B 和 C (15 分)

    给定区间 [−] 内的 3 个整数 A.B 和 C,请判断 A+B 是否大于 C. 输入格式: 输入第 1 行给出正整数 T (≤),是测试用例的个数.随后给出 T 组测试用例,每组占一行,顺序给出  ...