Merging small files in Hive with the Tez engine

Tags (space-separated): Hive



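set hive.exec.dynamic.partition=true; -- allow dynamic partition inserts
set hive.exec.dynamic.partition.mode=nonstrict; -- every partition column may be dynamic
set hive.exec.max.dynamic.partitions=3000;
set hive.exec.max.dynamic.partitions.pernode=500;
set hive.tez.container.size=6656; -- in MB
set hive.tez.java.opts=-Xmx5120m; -- heap must fit inside the container
set hive.merge.tezfiles=true; -- merge small files at the end of a Tez job
set hive.merge.smallfiles.avgsize=1280000000; -- about 1.28 GB
set hive.merge.size.per.task=1280000000; -- about 1.28 GB
set hive.execution.engine=tez;

-- Rewrite the table onto itself so that the merge step compacts every partition.
insert overwrite table zhaobo_test.lazy_st_rpt_priv_occupation_new partition (pt) select * from zhaobo_test.lazy_st_rpt_priv_occupation_new;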
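The numeric settings above are easier to sanity-check once they are converted to readable units. The following is a minimal sketch, not part of the original recipe: `bytes_to_mb` and `heap_pct` are hypothetical helper names, and the conversion assumes decimal megabytes (1 MB = 1,000,000 bytes).

```shell
#!/bin/sh
# Hypothetical helpers for checking the values used in the settings above.

# Convert a byte count to decimal megabytes (integer division).
bytes_to_mb() {
    echo $(( $1 / 1000000 ))
}

# Heap size as a whole-number percentage of the Tez container size.
# Usage: heap_pct CONTAINER_MB HEAP_MB
heap_pct() {
    echo $(( $2 * 100 / $1 ))
}

echo "merge sizes in this block: $(bytes_to_mb 1280000000) MB"
echo "merge sizes in the Tez merge section: $(bytes_to_mb 128000000) MB"
echo "heap as share of container: $(heap_pct 6656 5120)%"
```

The first block therefore asks for merged files about ten times larger than the 128 MB used in the section that follows; which value suits a given table depends on the cluster and is left as a choice for the reader.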



============= Tez merge ========

Switch to the Tez execution engine and enable hive.merge.tezfiles. You will probably also want to set the size thresholds.

set hive.execution.engine=tez; -- Tez execution engine
set hive.merge.tezfiles=true; -- request a merge step
set hive.merge.smallfiles.avgsize=128000000; -- 128 MB
set hive.merge.size.per.task=128000000; -- 128 MB

================ MR merge ============

If you prefer the MR engine, add the following settings (I haven't tried this personally):

set hive.merge.mapredfiles=true; -- request a merge step
set hive.merge.smallfiles.avgsize=128000000; -- 128 MB
set hive.merge.size.per.task=128000000; -- 128 MB

These settings spawn one extra step that merges the output files, so each resulting part file should be roughly 128 MB.
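How the two size settings interact can be sketched with a little arithmetic. This is a simplified model based on the documented meaning of the parameters (a merge step is triggered when the average output file size is below hive.merge.smallfiles.avgsize, and merged files target hive.merge.size.per.task); it is not Hive's actual code, and the function names and the example figures are made up for illustration.

```shell
#!/bin/sh
# Simplified model of the merge decision; names and numbers are hypothetical.

# Succeeds when the average output file size is below the threshold.
# Usage: needs_merge TOTAL_BYTES FILE_COUNT AVGSIZE_THRESHOLD
needs_merge() {
    avg=$(( $1 / $2 ))
    [ "$avg" -lt "$3" ]
}

# Rough number of files after merging (ceiling division).
# Usage: merged_file_estimate TOTAL_BYTES SIZE_PER_TASK
merged_file_estimate() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Example: 200 output files totalling 2,000,000,000 bytes (10 MB on average).
if needs_merge 2000000000 200 128000000; then
    echo "merge step expected, about $(merged_file_estimate 2000000000 128000000) files"
fi
```

With these example numbers the average file is 10 MB, well under the 128 MB threshold, so the model predicts a merge down to roughly 16 files.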

Get the partition list.

beeline -u jdbc:hive2://10.111.55.163:10000 -n deploy --showHeader=false --outputformat=tsv2 --silent=true -e "show partitions ods.t_city" > found_partitions.txt

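Run the merge.

#!/bin/bash
# Concatenate the files of every partition listed in found_partitions.txt.
# Replace database.table with the table whose partitions were listed.
# Note: ALTER TABLE ... CONCATENATE applies to RCFile and ORC tables.
for line in $(cat found_partitions.txt)
do
    echo "the next partition is $line"
    # Rewrite e.g. k1=v1/k2=v2 as k1='v1',k2='v2'
    partition="$(echo "$line" | sed -e 's/\//,/g' -e "s/=/='/g" -e "s/,/',/g")'"
    beeline -u jdbc:hive2://10.111.55.163:10000 -n deploy -e "alter table database.table partition($partition) concatenate"
done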
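The sed rewrite inside the loop can also be isolated into a small function, which makes it easy to try on a single line before running the whole loop. This is a sketch under assumptions: `to_partition_spec` is a hypothetical name, the partition values used below are invented examples, and the function assumes values contain no `/` characters.

```shell
#!/bin/sh
# Hypothetical helper: turn "k1=v1/k2=v2" (one line of SHOW PARTITIONS
# output) into "k1='v1',k2='v2'" for use in a PARTITION(...) clause.
# The body runs in a subshell, so IFS and set -f do not leak to the caller.
to_partition_spec() (
    IFS='/'      # split the path on slashes
    set -f       # disable globbing while the argument is split
    spec=""
    for kv in $1; do
        # key is everything before the first '=', value everything after it
        spec="${spec:+$spec,}${kv%%=*}='${kv#*=}'"
    done
    printf '%s\n' "$spec"
)

to_partition_spec "pt=2020-01-01/hr=01"
```

For the example input this prints `pt='2020-01-01',hr='01'`, the same result the three sed expressions produce.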
