Hive metastore source code reading (Part 3)

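Last time I wrote about the lifecycle of a partition in the Hive metastore, but only skimmed over the alter_partition operation, so I'll fill it in here. As the project went deeper I found that alter_partition is involved in quite a few places: for example, INSERT INTO calls alter_partition when the partition path already exists, and INSERT OVERWRITE calls it as well.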

The entry point is again the Hive.java class:

    public void alterPartition(String dbName, String tblName, Partition newPart)
        throws InvalidOperationException, HiveException {
      try {
        // Remove the DDL time so that it gets refreshed
        if (newPart.getParameters() != null) {
          newPart.getParameters().remove(hive_metastoreConstants.DDL_TIME);
        }
        newPart.checkValidity();
        getMSC().alter_partition(dbName, tblName, newPart.getTPartition());
      } catch (MetaException e) {
        throw new HiveException("Unable to alter partition. " + e.getMessage(), e);
      } catch (TException e) {
        throw new HiveException("Unable to alter partition. " + e.getMessage(), e);
      }
    }
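
Hive.alterPartition deliberately removes DDL_TIME before sending the partition, and the server code further down (the top of alterHandler.alterPartition) stamps the current time in seconds whenever that parameter is missing or 0. Below is a minimal sketch of that round trip, assuming the partition parameters can be modelled as a plain Map<String, String>. DdlTimeSketch, clientStrip and serverStamp are hypothetical names, not Hive APIs; the key string is the value of hive_metastoreConstants.DDL_TIME.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the DDL-time round trip; parameters are a plain Map,
// not the Thrift Partition object used by Hive.
final class DdlTimeSketch {
    // Same key string as hive_metastoreConstants.DDL_TIME.
    static final String DDL_TIME = "transient_lastDdlTime";

    // Client side (cf. Hive.alterPartition): drop the old value so the server refreshes it.
    static void clientStrip(Map<String, String> params) {
        if (params != null) {
            params.remove(DDL_TIME);
        }
    }

    // Server side (cf. the top of alterHandler.alterPartition): if the value is
    // missing or "0", store the given time in seconds. A null map is replaced by
    // a new one, roughly like the Thrift putToParameters creating the map on demand.
    static Map<String, String> serverStamp(Map<String, String> params, long nowMillis) {
        Map<String, String> result = (params == null) ? new HashMap<>() : params;
        String current = result.get(DDL_TIME);
        if (current == null || Long.parseLong(current) == 0L) {
            result.put(DDL_TIME, Long.toString(nowMillis / 1000));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(DDL_TIME, "1400000000");
        clientStrip(params);                              // old value removed
        serverStamp(params, System.currentTimeMillis());  // fresh value stamped
        System.out.println(params.get(DDL_TIME));
    }
}
```

The sketch parses with Long.parseLong where the original uses Integer.parseInt; for second-resolution timestamps before 2038 the two behave the same.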

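From there, HiveMetaStoreClient sends an alter_partition request to the server, passing the new partition as a parameter, and the server in turn calls rename_partition. I won't go into that again, since the previous post roughly covered it; here we start directly from alterHandler.alterPartition, where the partition is actually altered: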

    public Partition alterPartition(final RawStore msdb, Warehouse wh, final String dbname,
        final String name, final List<String> part_vals, final Partition new_part)
        throws InvalidOperationException, InvalidObjectException, AlreadyExistsException,
        MetaException {
      boolean success = false;
      Path srcPath = null;
      Path destPath = null;
      FileSystem srcFs = null;
      FileSystem destFs = null;
      Partition oldPart = null;
      String oldPartLoc = null;
      String newPartLoc = null;
      // Set DDL time to now if not specified
      if (new_part.getParameters() == null ||
          new_part.getParameters().get(hive_metastoreConstants.DDL_TIME) == null ||
          Integer.parseInt(new_part.getParameters().get(hive_metastoreConstants.DDL_TIME)) == 0) {
        new_part.putToParameters(hive_metastoreConstants.DDL_TIME, Long.toString(System
            .currentTimeMillis() / 1000));
      }
      Table tbl = msdb.getTable(dbname, name);
      //alter partition
      if (part_vals == null || part_vals.size() == 0) {
        try {
          oldPart = msdb.getPartition(dbname, name, new_part.getValues());
          if (MetaStoreUtils.requireCalStats(hiveConf, oldPart, new_part, tbl)) {
            MetaStoreUtils.updatePartitionStatsFast(new_part, wh, false, true);
          }
          updatePartColumnStats(msdb, dbname, name, new_part.getValues(), new_part);
          msdb.alterPartition(dbname, name, new_part.getValues(), new_part);
        } catch (InvalidObjectException e) {
          throw new InvalidOperationException("alter is not possible");
        } catch (NoSuchObjectException e){
          //old partition does not exist
          throw new InvalidOperationException("alter is not possible");
        }
        return oldPart;
      }
      ......

From this code we can see:

1. Table tbl = msdb.getTable(dbname, name); fetches the object that wraps all of the table's metadata.

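2. Next, oldPart = msdb.getPartition(dbname, name, new_part.getValues()); fetches the partition's metadata by dbName, tableName and values, where the values are the partition values of the new partition, e.g. (2017-09-11). Then MetaStoreUtils.requireCalStats(hiveConf, oldPart, new_part, tbl) checks whether the partition's basic statistics are present and up to date; if they are not, updatePartitionStatsFast is called to recompute them. (I won't go into detail here, because I don't yet know what the StatsSetupConst parameters inside are for, haha, a bit embarrassing. One step at a time.)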

3. Then updatePartColumnStats is called to bring the partition's column statistics in line with the new partition. Let's go through it step by step; the code is:

    private void updatePartColumnStats(RawStore msdb, String dbName, String tableName,
        List<String> partVals, Partition newPart) throws MetaException, InvalidObjectException {
      dbName = HiveStringUtils.normalizeIdentifier(dbName);
      tableName = HiveStringUtils.normalizeIdentifier(tableName);
      String newDbName = HiveStringUtils.normalizeIdentifier(newPart.getDbName());
      String newTableName = HiveStringUtils.normalizeIdentifier(newPart.getTableName());
      Table oldTable = msdb.getTable(dbName, tableName);
      if (oldTable == null) {
        return;
      }
      try {
        String oldPartName = Warehouse.makePartName(oldTable.getPartitionKeys(), partVals);
        String newPartName = Warehouse.makePartName(oldTable.getPartitionKeys(), newPart.getValues());
        if (!dbName.equals(newDbName) || !tableName.equals(newTableName)
            || !oldPartName.equals(newPartName)) {
          msdb.deletePartitionColumnStatistics(dbName, tableName, oldPartName, partVals, null);
        } else {
          Partition oldPartition = msdb.getPartition(dbName, tableName, partVals);
          if (oldPartition == null) {
            return;
          }
          if (oldPartition.getSd() != null && newPart.getSd() != null) {
            List<FieldSchema> oldCols = oldPartition.getSd().getCols();
            if (!MetaStoreUtils.areSameColumns(oldCols, newPart.getSd().getCols())) {
              updatePartColumnStatsForAlterColumns(msdb, oldPartition, oldPartName, partVals, oldCols, newPart);
            }
          }
        }
      } catch (NoSuchObjectException nsoe) {
        LOG.debug("Could not find db entry." + nsoe);
        //ignore
      } catch (InvalidInputException iie) {
        throw new InvalidObjectException("Invalid input to update partition column stats." + iie);
      }
    }

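4. Table oldTable = msdb.getTable(dbName, tableName); fetches all of the old table's metadata. makePartName then builds the partition names of the old and new partitions (e.g. dt=2017-09-11/hour=1), which mirror the partitions' relative HDFS directories, so that the two can be compared. Because alterPartition can be reached through operations such as ALTER TABLE or a table rename, the old dbName, tableName or partition name may differ from the new one; in that case the column statistics recorded for the old partition no longer apply, and all of them are deleted (deletePartitionColumnStatistics with a null column name).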

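5. If all three are the same, the partition was not renamed, so the only change that can invalidate its statistics is a change to its columns. MetaStoreUtils.areSameColumns(oldCols, newPart.getSd().getCols()) checks whether the column lists are identical (I won't paste its internals).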
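
The branching in updatePartColumnStats boils down to a single decision: drop all of the old partition's column statistics, or go on to compare columns. A minimal sketch under simplifying assumptions: makePartName here only joins key=value pairs with "/", whereas the real Warehouse.makePartName also escapes special characters; normalize only approximates HiveStringUtils.normalizeIdentifier (trim plus lower-case, to my knowledge); PartStatsDecision and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.StringJoiner;

// Hypothetical sketch of the decision made in updatePartColumnStats.
final class PartStatsDecision {
    enum Action { DROP_ALL_STATS, CHECK_COLUMNS }

    // Approximation of HiveStringUtils.normalizeIdentifier: trim and lower-case.
    static String normalize(String identifier) {
        return identifier.trim().toLowerCase(Locale.ROOT);
    }

    // Simplified makePartName: "k1=v1/k2=v2". Unlike Warehouse.makePartName,
    // it does not escape special characters in keys or values.
    static String makePartName(List<String> keys, List<String> vals) {
        if (keys.size() != vals.size()) {
            throw new IllegalArgumentException("partition keys and values differ in length");
        }
        StringJoiner joiner = new StringJoiner("/");
        for (int i = 0; i < keys.size(); i++) {
            joiner.add(keys.get(i) + "=" + vals.get(i));
        }
        return joiner.toString();
    }

    // A different database, table or partition name means the old stats are dropped
    // entirely; otherwise only a column change can make them stale.
    static Action decide(String oldDb, String oldTable, String oldPartName,
                         String newDb, String newTable, String newPartName) {
        boolean renamed = !normalize(oldDb).equals(normalize(newDb))
                || !normalize(oldTable).equals(normalize(newTable))
                || !oldPartName.equals(newPartName);
        return renamed ? Action.DROP_ALL_STATS : Action.CHECK_COLUMNS;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("dt", "hour");
        String oldName = makePartName(keys, Arrays.asList("2017-09-11", "1"));
        String newName = makePartName(keys, Arrays.asList("2017-09-12", "1"));
        System.out.println(oldName);                                            // dt=2017-09-11/hour=1
        System.out.println(decide("db1", "t1", oldName, "db1", "t1", newName)); // DROP_ALL_STATS
    }
}
```

As in the original, only the database and table names are normalized; partition names are compared as built.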

6. If the columns differ, updatePartColumnStatsForAlterColumns is called to update the column statistics. This code is worth pasting too, so we can play with it together:

    private void updatePartColumnStatsForAlterColumns(RawStore msdb, Partition oldPartition,
        String oldPartName, List<String> partVals, List<FieldSchema> oldCols, Partition newPart)
        throws MetaException, InvalidObjectException {
      String dbName = oldPartition.getDbName();
      String tableName = oldPartition.getTableName();
      try {
        List<String> oldPartNames = Lists.newArrayList(oldPartName);
        List<String> oldColNames = new ArrayList<String>(oldCols.size());
        for (FieldSchema oldCol : oldCols) {
          oldColNames.add(oldCol.getName());
        }
        List<FieldSchema> newCols = newPart.getSd().getCols();
        List<ColumnStatistics> partsColStats = msdb.getPartitionColumnStatistics(dbName, tableName,
            oldPartNames, oldColNames);
        assert (partsColStats.size() <= 1);
        for (ColumnStatistics partColStats : partsColStats) { //actually only at most one loop
          List<ColumnStatisticsObj> statsObjs = partColStats.getStatsObj();
          for (ColumnStatisticsObj statsObj : statsObjs) {
            boolean found = false;
            for (FieldSchema newCol : newCols) {
              if (statsObj.getColName().equals(newCol.getName())
                  && statsObj.getColType().equals(newCol.getType())) {
                found = true;
                break;
              }
            }
            if (!found) {
              msdb.deletePartitionColumnStatistics(dbName, tableName, oldPartName, partVals,
                  statsObj.getColName());
            }
          }
        }
      } catch (NoSuchObjectException nsoe) {
        LOG.debug("Could not find db entry." + nsoe);
        //ignore
      } catch (InvalidInputException iie) {
        throw new InvalidObjectException
            ("Invalid input to update partition column stats in alter table change columns" + iie);
      }
    }

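Here we can see that it queries the metastore for ColumnStatistics objects, which mainly wrap information such as tableName, partName and colName, and then compares the statistics of each old column against the new columns. Note that both the column name and the column type are compared: if no new column matches on both, the statistics for that old column are deleted.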
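
The matching loop above can be isolated as a small function. A sketch, assuming each statistics entry and each new column can be reduced to a (name, type) pair; StaleColumnStats and Col are hypothetical stand-ins for ColumnStatisticsObj and FieldSchema, not Hive classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the inner loop of updatePartColumnStatsForAlterColumns.
final class StaleColumnStats {
    // Stand-in for the (name, type) part of FieldSchema / ColumnStatisticsObj.
    static final class Col {
        final String name;
        final String type;

        Col(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Returns the columns whose stats would be deleted: those for which no new
    // column has both the same name and the same type.
    static List<String> staleStatColumns(List<Col> statsCols, List<Col> newCols) {
        List<String> stale = new ArrayList<>();
        for (Col stat : statsCols) {
            boolean found = false;
            for (Col newCol : newCols) {
                if (stat.name.equals(newCol.name) && stat.type.equals(newCol.type)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                stale.add(stat.name);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<Col> stats = Arrays.asList(new Col("id", "int"), new Col("name", "string"),
                new Col("price", "double"));
        List<Col> newCols = Arrays.asList(new Col("id", "bigint"), new Col("name", "string"));
        System.out.println(staleStatColumns(stats, newCols)); // [id, price]
    }
}
```

In the usage example, "id" is dropped because its type changed from int to bigint, and "price" because it no longer exists; "name" keeps its statistics.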

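Good: at this point all stale statistics of the old partition have been deleted. Back in alterPartition, msdb.alterPartition(dbname, name, new_part.getValues(), new_part) is then called to write the new partition into the metastore. Everything above covers only the case where rename_partition is called with part_vals null, which is the path alter_partition takes. What happens when part_vals is not null? Feeling hopeless yet? Let's torture ourselves slowly, haha...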

7. When part_vals is not null, oldPart is looked up by dbName, tableName and part_vals, and validated.

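8. The table type is then checked. For an external table, the new partition simply keeps the old partition's location: the location in oldPart's storage descriptor (oldPart.getSd().getLocation()), i.e. its HDFS path, is copied into new_part, so the files are not moved. deletePartitionColumnStatistics is then called to delete the old partition's column statistics directly.

9. For a managed (internal) table, a new location under the table directory is computed and checked, and then the old partition's column statistics are deleted as well. (There are parts in the middle I haven't fully understood yet, haha... and it's too late tonight, so I'll fill them in later....) The code is as follows: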

      try {
        destPath = new Path(wh.getTablePath(msdb.getDatabase(dbname), name),
          Warehouse.makePartName(tbl.getPartitionKeys(), new_part.getValues()));
        destPath = constructRenamedPath(destPath, new Path(new_part.getSd().getLocation()));
      } catch (NoSuchObjectException e) {
        LOG.debug(e);
        throw new InvalidOperationException(
          "Unable to change partition or table. Database " + dbname + " does not exist"
            + " Check metastore logs for detailed stack." + e.getMessage());
      }
      if (destPath != null) {
        newPartLoc = destPath.toString();
        oldPartLoc = oldPart.getSd().getLocation();
        srcPath = new Path(oldPartLoc);
        LOG.info("srcPath:" + oldPartLoc);
        LOG.info("descPath:" + newPartLoc);
        srcFs = wh.getFs(srcPath);
        destFs = wh.getFs(destPath);
        // check that src and dest are on the same file system
        if (!FileUtils.equalsFileSystem(srcFs, destFs)) {
          throw new InvalidOperationException("table new location " + destPath
            + " is on a different file system than the old location "
            + srcPath + ". This operation is not supported");
        }
        try {
          srcFs.exists(srcPath); // check that src exists and also checks
          if (newPartLoc.compareTo(oldPartLoc) != 0 && destFs.exists(destPath)) {
            throw new InvalidOperationException("New location for this table "
              + tbl.getDbName() + "." + tbl.getTableName()
              + " already exists : " + destPath);
          }
        } catch (IOException e) {
          throw new InvalidOperationException("Unable to access new location "
            + destPath + " for partition " + tbl.getDbName() + "."
            + tbl.getTableName() + " " + new_part.getValues());
        }
        new_part.getSd().setLocation(newPartLoc);
        if (MetaStoreUtils.requireCalStats(hiveConf, oldPart, new_part, tbl)) {
          MetaStoreUtils.updatePartitionStatsFast(new_part, wh, false, true);
        }
        String oldPartName = Warehouse.makePartName(tbl.getPartitionKeys(), oldPart.getValues());
        try {
          //existing partition column stats is no longer valid, remove
          msdb.deletePartitionColumnStatistics(dbname, name, oldPartName, oldPart.getValues(), null);
          ......
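
The fragment computes the default new location as the table directory plus the partition name, passes it through constructRenamedPath together with new_part's current location, and then runs two checks: same file system, and destination not already taken. A sketch using java.net.URI in place of Hadoop's Path and FileSystem. Everything here is hypothetical: constructRenamedPath below reimplements what, as far as I recall from upstream Hive, the private helper of the same name does (keep the new path, take scheme and authority from the current location); sameFileSystem only compares scheme and authority, an approximation of FileUtils.equalsFileSystem; and a Set<String> of existing paths stands in for destFs.exists.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the location checks in the rename branch, with
// java.net.URI standing in for Hadoop's Path and FileSystem.
final class RenamedPathSketch {

    private static URI build(String scheme, String authority, String path) {
        try {
            return new URI(scheme, authority, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot build URI for " + path, e);
        }
    }

    // Default new location: <table dir>/<partition name>.
    static URI defaultPartPath(URI tableDir, String partName) {
        String dir = tableDir.getPath();
        String path = dir.endsWith("/") ? dir + partName : dir + "/" + partName;
        return build(tableDir.getScheme(), tableDir.getAuthority(), path);
    }

    // Keep the path of the default location, but take scheme and authority from
    // the partition's current location (assumed behaviour of constructRenamedPath).
    static URI constructRenamedPath(URI defaultNewPath, URI currentPath) {
        return build(currentPath.getScheme(), currentPath.getAuthority(), defaultNewPath.getPath());
    }

    // Approximation of the same-file-system check: same scheme and authority.
    static boolean sameFileSystem(URI a, URI b) {
        return Objects.equals(a.getScheme(), b.getScheme())
                && Objects.equals(a.getAuthority(), b.getAuthority());
    }

    // The two checks from the fragment; 'existing' plays the role of destFs.exists(...).
    static void validate(URI src, URI dest, Set<String> existing) {
        if (!sameFileSystem(src, dest)) {
            throw new IllegalStateException("new location " + dest
                    + " is on a different file system than the old location " + src);
        }
        if (!dest.toString().equals(src.toString()) && existing.contains(dest.toString())) {
            throw new IllegalStateException("new location already exists: " + dest);
        }
    }

    public static void main(String[] args) {
        URI tableDir = URI.create("hdfs://nn1:8020/user/hive/warehouse/db1.db/t1");
        URI oldLoc = URI.create("hdfs://nn1:8020/user/hive/warehouse/db1.db/t1/dt=2017-09-11");
        // Here new_part's current location is assumed to equal the old location.
        URI dest = constructRenamedPath(defaultPartPath(tableDir, "dt=2017-09-12"), oldLoc);
        validate(oldLoc, dest, new HashSet<>()); // passes: same FS, destination not taken
        System.out.println(dest);
    }
}
```

Keeping the current scheme and authority means a partition whose location lives on another cluster is "renamed" on that cluster; with the checks as sketched, a move across file systems is rejected rather than attempted.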

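Overall, on the alter_partition path (part_vals null), alterPartition is not coupled with any physical file operation: it only looks up, updates and deletes column statistics in the metastore, and for the partition's own metadata it updates the partition's sd (storage descriptor), including the all-important location. The rename branch for managed tables, by contrast, does access the file system, as the checks above show.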

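There are quite a lot of related operations, and this is only a rough analysis written while reading the source. If anything is wrong, please point it out, thanks~ Off to bed now~ Tomorrow it's back to working myself to death~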
