hive 2.1

1. Problem

I recently had a scenario that required writing data into multiple partitions of one table. To shorten the execution time, I ran multiple SQL statements concurrently, each writing a different partition, with dynamic partitioning enabled:

set hive.exec.dynamic.partition=true;

insert overwrite table test_table partition(dt) select * from test_table_another where dt = 1;
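Concretely, the concurrent pattern looked roughly like the sketch below; the dt values are illustrative, and hive.exec.dynamic.partition.mode=nonstrict is assumed, since a fully dynamic partition spec like partition(dt) requires it:

    -- session 1 (assumed settings)
    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    insert overwrite table test_table partition(dt) select * from test_table_another where dt = 1;

    -- session 2, running at the same time against a different partition
    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    insert overwrite table test_table partition(dt) select * from test_table_another where dt = 2;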

The result: only one SQL actually ran, while the others all hung.
The hive thrift server thread dumps showed the hung requests blocked inside DbTxnManager. The key hive settings were:

hive.support.concurrency=true
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

The corresponding defaults and their comments:

org.apache.hadoop.hive.conf.HiveConf

    HIVE_SUPPORT_CONCURRENCY("hive.support.concurrency", false,
        "Whether Hive supports concurrency control or not. \n" +
        "A ZooKeeper instance must be up and running when using zookeeper Hive lock manager "),

    HIVE_TXN_MANAGER("hive.txn.manager",
        "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager",
        "Set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager as part of turning on Hive\n" +
        "transactions, which also requires appropriate settings for hive.compactor.initiator.on,\n" +
        "hive.compactor.worker.threads, hive.support.concurrency (true), hive.enforce.bucketing\n" +
        "(true), and hive.exec.dynamic.partition.mode (nonstrict).\n" +
        "The default DummyTxnManager replicates pre-Hive-0.13 behavior and provides\n" +
        "no transactions."),

2. Code Analysis

The full path hive takes to execute a SQL statement is described at https://www.cnblogs.com/barneywill/p/10185168.html.

Every SQL statement hive executes eventually reaches Driver.run, which calls runInternal. Let's go straight to runInternal:

org.apache.hadoop.hive.ql.Driver

    private CommandProcessorResponse runInternal(String command, boolean alreadyCompiled)
        throws CommandNeedRetryException {
      ...
      if (requiresLock()) {
        // a checkpoint to see if the thread is interrupted or not before an expensive operation
        if (isInterrupted()) {
          ret = handleInterruption("at acquiring the lock.");
        } else {
          ret = acquireLocksAndOpenTxn(startTxnImplicitly);
        }
      ...

    private boolean requiresLock() {
      if (!checkConcurrency()) {
        return false;
      }
      // Lock operations themselves don't require the lock.
      if (isExplicitLockOperation()) {
        return false;
      }
      if (!HiveConf.getBoolVar(conf, ConfVars.HIVE_LOCK_MAPRED_ONLY)) {
        return true;
      }
      Queue<Task<? extends Serializable>> taskQueue = new LinkedList<Task<? extends Serializable>>();
      taskQueue.addAll(plan.getRootTasks());
      while (taskQueue.peek() != null) {
        Task<? extends Serializable> tsk = taskQueue.remove();
        if (tsk.requireLock()) {
          return true;
        }
      ...

    private boolean checkConcurrency() {
      boolean supportConcurrency = conf.getBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY);
      if (!supportConcurrency) {
        LOG.info("Concurrency mode is disabled, not creating a lock manager");
        return false;
      }
      return true;
    }

    private int acquireLocksAndOpenTxn(boolean startTxnImplicitly) {
      ...
      txnMgr.acquireLocks(plan, ctx, userFromUGI);
      ...

runInternal calls requiresLock to decide whether a lock is needed. requiresLock makes two checks:

  • it calls checkConcurrency, which requires hive.support.concurrency=true for a lock to be needed;
  • it calls Task.requireLock; only some tasks require a lock.

If a lock is needed, runInternal calls acquireLocksAndOpenTxn, which in turn calls HiveTxnManager.acquireLocks to acquire it.

1) First, let's see which tasks require a lock:

org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer

    private void analyzeAlterTablePartMergeFiles(ASTNode ast,
        String tableName, HashMap<String, String> partSpec)
        throws SemanticException {
      ...
      DDLWork ddlWork = new DDLWork(getInputs(), getOutputs(), mergeDesc);
      ddlWork.setNeedLock(true);
      ...

So DDL operations require one.

2) Next, let's see how the lock is acquired:

org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

    public void acquireLocks(QueryPlan plan, Context ctx, String username) throws LockException {
      try {
        acquireLocksWithHeartbeatDelay(plan, ctx, username, 0);
      ...

    void acquireLocksWithHeartbeatDelay(QueryPlan plan, Context ctx, String username, long delay) throws LockException {
      LockState ls = acquireLocks(plan, ctx, username, true);
      ...

    LockState acquireLocks(QueryPlan plan, Context ctx, String username, boolean isBlocking) throws LockException {
      ...
      switch (output.getType()) {
        case DATABASE:
          compBuilder.setDbName(output.getDatabase().getName());
          break;

        case TABLE:
        case DUMMYPARTITION: // in case of dynamic partitioning lock the table
          t = output.getTable();
          compBuilder.setDbName(t.getDbName());
          compBuilder.setTableName(t.getTableName());
          break;

        case PARTITION:
          compBuilder.setPartitionName(output.getPartition().getName());
          t = output.getPartition().getTable();
          compBuilder.setDbName(t.getDbName());
          compBuilder.setTableName(t.getTableName());
          break;

        default:
          // This is a file or something we don't hold locks for.
          continue;
      }
      ...
      LockState lockState = lockMgr.lock(rqstBuilder.build(), queryId, isBlocking, locks);
      ctx.setHiveLocks(locks);
      return lockState;
    }

So when dynamic partitioning is enabled, the write target is a DUMMYPARTITION and the lock granularity is DbName + TableName: of the concurrent SQLs writing the same table, only one can take the lock and the others have to wait. Contrast this with the PARTITION case, where the lock also carries the partition name, so statically specified writes to different partitions would not conflict.
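This is easy to observe while one of the inserts is running: from another session, SHOW LOCKS lists the current locks. A sketch (the exact output columns vary by version, so treat this as illustrative):

    -- while one dynamic-partition insert holds the lock:
    show locks test_table;
    -- with DbTxnManager the waiting queries appear blocked on a lock whose
    -- granularity is database + table, with no partition component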

3. Summary

There are several ways to fix the problem:

  1. Turn off dynamic partitioning: set hive.exec.dynamic.partition=false
  2. Turn off concurrency: set hive.support.concurrency=false
  3. Turn off transactions: set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager

Any one of the three works. The first is recommended, because in the scenario above dynamic partitioning is not actually needed: each SQL already knows which partition it writes, as shown in the sketch below.
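A minimal sketch of option 1, assuming each statement's target partition is known up front (the column names are hypothetical, since with a static partition spec the select list must exclude the partition column dt):

    set hive.exec.dynamic.partition=false;
    -- each session writes its own, statically specified partition, so the lock
    -- carries the partition name and the sessions no longer block each other
    insert overwrite table test_table partition(dt=1)
    select col1, col2   -- hypothetical non-partition columns of test_table_another
    from test_table_another where dt = 1;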
