I. Problem

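In Hive 1.2, running SQL through either the hive CLI or beeline shows progress information. After upgrading to Hive 2.0, only the hive CLI still does. beeline runs the SQL completely silently, so while you wait for the result you have no idea how far execution has gotten.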

1 Running SQL with the hive CLI (progress is shown)

hive> select count(1) from test_table;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20181227162003_bd82e3e2-2736-42b4-b1da-4270ead87e4d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1544593827645_22873, Tracking URL = http://rm1:8088/proxy/application_1544593827645_22873/
Kill Command = /export/App/hadoop-2.6.1/bin/hadoop job -kill job_1544593827645_22873
2018-12-27 16:20:27,650 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 116.9 sec
MapReduce Total cumulative CPU time: 1 minutes 56 seconds 900 msec
Ended Job = job_1544593827645_22873
MapReduce Jobs Launched:
Stage-Stage-1: Map: 29 Reduce: 1 Cumulative CPU: 116.9 sec HDFS Read: 518497 HDFS Write: 197 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 56 seconds 900 msec
OK
104
Time taken: 24.437 seconds, Fetched: 1 row(s)

2 Running SQL with beeline (no progress information)

0: jdbc:hive2://thrift1:10000> select count(1) from test_table;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
+------+--+
| c0 |
+------+--+
| 104 |
+------+--+
1 row selected (23.965 seconds)

II. Code Analysis

For the full details of how hive executes a SQL statement, see: https://www.cnblogs.com/barneywill/p/10185168.html

Every SQL statement executed in hive eventually reaches Driver.run, and run calls execute. Let's go straight to the execute code:

org.apache.hadoop.hive.ql.Driver

  public int execute(boolean deferClose) throws CommandNeedRetryException {
...
      if (jobs > 0) {
        logMrWarning(mrJobs);
        console.printInfo("Query ID = " + queryId);
        console.printInfo("Total jobs = " + jobs);
      }
...

  private void logMrWarning(int mrJobs) {
    if (mrJobs <= 0 || !("mr".equals(HiveConf.getVar(conf, ConfVars.HIVE_EXECUTION_ENGINE)))) {
      return;
    }
    String warning = HiveConf.generateMrDeprecationWarning();
    LOG.warn(warning);
    warning = "WARNING: " + warning;
    console.printInfo(warning);
    // Propagate warning to beeline via operation log.
    OperationLog ol = OperationLog.getCurrentOperationLog();
    if (ol != null) {
      ol.writeOperationLog(LoggingLevel.EXECUTION, warning + "\n");
    }
  }

So the progress messages you see in the hive CLI are all printed through console.printInfo.
Note one detail: although beeline shows no progress information, it does show one warning:

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

In beeline, this warning comes from the following lines, which write it to the OperationLog:

    OperationLog ol = OperationLog.getCurrentOperationLog();
    if (ol != null) {
      ol.writeOperationLog(LoggingLevel.EXECUTION, warning + "\n");
    }

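So for beeline to show progress information too, the progress messages would have to be written out the same way, i.e. also written to the OperationLog.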
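To make this concrete, here is a minimal, self-contained sketch of the "print to the console and also write to the operation log" pattern that logMrWarning uses. It does not use Hive's real classes: FakeOperationLog is a hypothetical stand-in for OperationLog (a per-thread log that HiveServer2 lets beeline fetch), and ProgressEcho.printInfo is a hypothetical replacement for console.printInfo. In Hive itself the second write would be ol.writeOperationLog(LoggingLevel.EXECUTION, msg + "\n"), as in the excerpt above.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these classes only mimic Hive's console / OperationLog split.
class ProgressEcho {

    /** Hypothetical stand-in for Hive's per-query OperationLog, bound to the current thread. */
    static final class FakeOperationLog {
        private static final ThreadLocal<FakeOperationLog> CURRENT = new ThreadLocal<>();
        private final List<String> lines = new ArrayList<>();

        static FakeOperationLog getCurrentOperationLog() { return CURRENT.get(); }
        static void setCurrentOperationLog(FakeOperationLog log) { CURRENT.set(log); }

        void writeOperationLog(String msg) { lines.add(msg); }
        List<String> lines() { return lines; }
    }

    private final PrintStream console;

    ProgressEcho(PrintStream console) { this.console = console; }

    /** Print locally (what the hive CLI shows) and, if an operation log is
     *  bound to this thread, append the same message there (what beeline reads). */
    void printInfo(String msg) {
        console.println(msg);
        FakeOperationLog ol = FakeOperationLog.getCurrentOperationLog();
        if (ol != null) {
            ol.writeOperationLog(msg + "\n");
        }
    }

    public static void main(String[] args) {
        FakeOperationLog ol = new FakeOperationLog();
        FakeOperationLog.setCurrentOperationLog(ol);
        ProgressEcho echo = new ProgressEcho(System.err);
        echo.printInfo("Total jobs = 1");
        echo.printInfo("Launching Job 1 out of 1");
        System.out.println(ol.lines().size()); // prints 2
    }
}
```

The design point is that every progress message goes through one helper, so the hive CLI and beeline see the same text; when no operation log is bound (plain CLI), the helper simply behaves like the original console.printInfo.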

III. Where hive prints its progress messages

The familiar progress messages are printed in the following places:

org.apache.hadoop.hive.ql.Driver

  public int execute(boolean deferClose) throws CommandNeedRetryException {
...
        console.printInfo("Query ID = " + queryId);
        console.printInfo("Total jobs = " + jobs);
...

  private TaskRunner launchTask(Task<? extends Serializable> tsk, String queryId, boolean noName,
      String jobname, int jobs, DriverContext cxt) throws HiveException {
...
      console.printInfo("Launching Job " + cxt.getCurJobNo() + " out of " + jobs);

org.apache.hadoop.hive.ql.exec.mr.MapRedTask

  private void setNumberOfReducers() throws IOException {
    ReduceWork rWork = work.getReduceWork();
    // this is a temporary hack to fix things that are not fixed in the compiler
    Integer numReducersFromWork = rWork == null ? 0 : rWork.getNumReduceTasks();
    if (rWork == null) {
      console
          .printInfo("Number of reduce tasks is set to 0 since there's no reduce operator");
    } else {
      if (numReducersFromWork >= 0) {
        console.printInfo("Number of reduce tasks determined at compile time: "
            + rWork.getNumReduceTasks());
      } else if (job.getNumReduceTasks() > 0) {
        int reducers = job.getNumReduceTasks();
        rWork.setNumReduceTasks(reducers);
        console
            .printInfo("Number of reduce tasks not specified. Defaulting to jobconf value of: "
            + reducers);
      } else {
        if (inputSummary == null) {
          inputSummary =  Utilities.getInputSummary(driverContext.getCtx(), work.getMapWork(), null);
        }
        int reducers = Utilities.estimateNumberOfReducers(conf, inputSummary, work.getMapWork(),
                                                          work.isFinalMapRed());
        rWork.setNumReduceTasks(reducers);
        console
            .printInfo("Number of reduce tasks not specified. Estimated from input data size: "
            + reducers);
      }
      console
          .printInfo("In order to change the average load for a reducer (in bytes):");
      console.printInfo("  set " + HiveConf.ConfVars.BYTESPERREDUCER.varname
          + "=<number>");
      console.printInfo("In order to limit the maximum number of reducers:");
      console.printInfo("  set " + HiveConf.ConfVars.MAXREDUCERS.varname
          + "=<number>");
      console.printInfo("In order to set a constant number of reducers:");
      console.printInfo("  set " + HiveConf.ConfVars.HADOOPNUMREDUCERS
          + "=<number>");
    }
  }
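The "Estimated from input data size" branch depends on Utilities.estimateNumberOfReducers. As far as I understand Hive's implementation, its core rule is ceil(input bytes / hive.exec.reducers.bytes.per.reducer), clamped to the range [1, hive.exec.reducers.max]. The sketch below shows only that rule and leaves out other adjustments (for example optional power-of-two rounding). ReducerEstimate is a hypothetical class, and the 256 MB and 1009 values in main are assumed settings, not values read from a cluster.

```java
// Hypothetical sketch of the reducer-count rule; not Hive's actual code.
final class ReducerEstimate {

    /** ceil(inputBytes / bytesPerReducer), clamped to [1, maxReducers]. */
    static int estimateReducers(long inputBytes, long bytesPerReducer, int maxReducers) {
        if (bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("bytesPerReducer and maxReducers must be positive");
        }
        // Integer ceiling division, so no floating-point rounding is involved.
        long reducers = (inputBytes + bytesPerReducer - 1) / bytesPerReducer;
        reducers = Math.max(1, reducers);           // always at least one reducer
        reducers = Math.min(maxReducers, reducers); // never more than the configured maximum
        return (int) reducers;
    }

    public static void main(String[] args) {
        long bytesPerReducer = 256L * 1024 * 1024; // assumed hive.exec.reducers.bytes.per.reducer
        int maxReducers = 1009;                    // assumed hive.exec.reducers.max
        System.out.println(estimateReducers(500L * 1024 * 1024, bytesPerReducer, maxReducers));        // 2
        System.out.println(estimateReducers(10L * 1024 * 1024 * 1024, bytesPerReducer, maxReducers)); // 40
    }
}
```

This is also why the three "set ..." hints follow the reducer message: the first two settings feed this formula, and mapreduce.job.reduces bypasses it via the jobconf branch.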

Most of them are in the following class:

org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper

  public void jobInfo(RunningJob rj) {
    if (ShimLoader.getHadoopShims().isLocalMode(job)) {
      console.printInfo("Job running in-process (local Hadoop)");
    } else {
      if (SessionState.get() != null) {
        SessionState.get().getHiveHistory().setTaskProperty(queryState.getQueryId(),
            getId(), Keys.TASK_HADOOP_ID, rj.getID().toString());
      }
      console.printInfo(getJobStartMsg(rj.getID()) + ", Tracking URL = "
          + rj.getTrackingURL());
      console.printInfo("Kill Command = " + HiveConf.getVar(job, HiveConf.ConfVars.HADOOPBIN)
          + " job  -kill " + rj.getID());
    }
  }

  private MapRedStats progress(ExecDriverTaskHandle th) throws IOException, LockException {
...
      StringBuilder report = new StringBuilder();
      report.append(dateFormat.format(Calendar.getInstance().getTime()));
      report.append(' ').append(getId());
      report.append(" map = ").append(mapProgress).append("%, ");
      report.append(" reduce = ").append(reduceProgress).append('%');
...
      String output = report.toString();
...
      console.printInfo(output);
...

  public static String getJobEndMsg(JobID jobId) {
    return "Ended Job = " + jobId;
  }
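For reference, this sketch reproduces the progress-line formatting from progress() outside of Hive, assuming the timestamp pattern yyyy-MM-dd HH:mm:ss,SSS (inferred from the CLI output in section I). ProgressLine is a hypothetical class, and the "Cumulative CPU" suffix appended in the elided part of the method is left out. A string built this way is exactly what would have to be written to the OperationLog for beeline to display it.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch mirroring the StringBuilder logic in progress(); not Hive's class.
final class ProgressLine {

    // Assumed pattern, matching timestamps such as "2018-12-27 16:20:27,650".
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    static String format(LocalDateTime time, String stageId, int mapProgress, int reduceProgress) {
        StringBuilder report = new StringBuilder();
        report.append(FMT.format(time));
        report.append(' ').append(stageId);
        report.append(" map = ").append(mapProgress).append("%, ");
        // "%, " followed by " reduce" yields two spaces, just like the excerpt above.
        report.append(" reduce = ").append(reduceProgress).append('%');
        return report.toString();
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2018, 12, 27, 16, 20, 27, 650_000_000);
        // prints "2018-12-27 16:20:27,650 Stage-1 map = 100%,  reduce = 100%"
        System.out.println(format(t, "Stage-1", 100, 100));
    }
}
```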

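It looks like this change would take quite a bit of work, since every one of these console.printInfo calls would also need to write to the OperationLog, haha.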

[Original] Uncle's Experience Sharing (18): no progress information when running SQL through beeline in Hive 2.0 and later