https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO

LanguageManual LZO

Created by Lefty Leverenz, last modified on Sep 19, 2017

LZO Compression

LZO Compression

General LZO Concepts

LZO is a lossless data compression library that favors speed over compression ratio. See http://www.oberhumer.com/opensource/lzo and http://www.lzop.org for general information about LZO and see Compressed Data Storage for information about compression in Hive.

Imagine a simple data file that has three columns

id
first name
last name

Let's populate a data file containing 4 records:

19630001     john          lennon

19630002     paul          mccartney

19630003     george        harrison

19630004     ringo         starr

Let's call the data file /path/to/dir/names.txt.

In order to make it into an LZO file, we can use the lzop utility and it will create a names.txt.lzo file.

Now copy the file names.txt.lzo to HDFS.

Prerequisites

Lzo/Lzop Installations

lzo and lzop need to be installed on every node in the Hadoop cluster. The details of these installations are beyond the scope of this document.

`core-site.xml`

Add the following to your core-site.xml:

com.hadoop.compression.lzo.LzoCodec
com.hadoop.compression.lzo.LzopCodec

For example:

<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>

<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

Next we run the command to create an LZO index file:

hadoop jar /path/to/jar/hadoop-lzo-cdh4-0.4.15-gplextras.jar com.hadoop.compression.lzo.LzoIndexer  /path/to/HDFS/dir/containing/lzo/files

This creates names.txt.lzo on HDFS.

Table Definition

The following hive -e command creates an LZO-compressed external table:

hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS hive_table_name (column_1  datatype_1......column_N datatype_N)

         PARTITIONED BY (partition_col_1 datatype_1 ....col_P  datatype_P)

         ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

         STORED AS INPUTFORMAT  \"com.hadoop.mapred.DeprecatedLzoTextInputFormat\"

                   OUTPUTFORMAT \"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat\";

Note: The double quotes have to be escaped so that the 'hive -e' command works correctly.

See CREATE TABLE and Hive CLI for information about command syntax.

Hive Queries

Option 1: Directly Create LZO Files

Directly create LZO files as the output of the Hive query.
Use lzop command utility or your custom Java to generate .lzo.index for the .lzo files.

Hive Query Parameters

SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec

SET hive.exec.compress.output=true

SET mapreduce.output.fileoutputformat.compress=true

For example:

hive -e "SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec; SET hive.exec.compress.output=true;SET mapreduce.output.fileoutputformat.compress=true; <query-string>"

Note: If the data sets are large or number of output files are large , then this option does not work.

Option 2: Write Custom Java to Create LZO Files

Create text files as the output of the Hive query.
Write custom Java code to
1. convert Hive query generated text files to .lzo files
2. generate .lzo.index files for the .lzo files generated above

Hive Query Parameters

Prefix the query string with these parameters:

SET hive.exec.compress.output=false

SET mapreduce.output.fileoutputformat.compress=false

For example:

hive -e "SET hive.exec.compress.output=false;SET mapreduce.output.fileoutputformat.compress=false;<query-string>"

Write Custom Java to Create LZO Files的更多相关文章

How to create PDF files in a Python/Django application using ReportLab
https://assist-software.net/blog/how-create-pdf-files-python-django-application-using-reportlab CONT ...
转载Java NIO中的Files类的使用
Java NIO中的Files类(java.nio.file.Files)提供了多种操作文件系统中文件的方法. Files.exists() Files.exits()方法用来检查给定的Path在文件 ...
Java使用JSP Tag Files & JSP EL Functions打造你自己的页面模板
1. 简单说明:在JSP 2.0后, 你不再需要大刀阔斧地定义一堆TagSupport或BodyTagSupport, 使用JSP Tag Files技术可以实现功能强大的页面模板技术. 在这里抛砖引 ...
Java NIO.2 使用Files类遍历文件夹
在以前的Java版本中,如果要遍历某个文件夹下所有的子文件.子文件夹,需要我们自己写递归,很麻烦. 在Java7以后,我们可以NIO.2中的Files工具类来遍历某个文件夹(会自动递归). 大致用法: ...
使用Java内存映射(Memory-Mapped Files)处理大文件
>>NIO中的内存映射 (1)什么是内存映射文件内存映射文件,是由一个文件到一块内存的映射,可以理解为将一个文件映射到进程地址,然后可以通过操作内存来访问文件数据.说白了就是使用虚拟内存将 ...
Using Custom Java code in ODI
在ODI中调用jar包java方法的过程如下: 1.编写Java代码如下代码写hello world字符串到一个文件. package odi; import java.io.File; impor ...
Create XML Files Out Of SQL Server With SSIS And FOR XML Syntax
So you want to spit out some XML from SQL Server into a file, how can you do that? There are a coupl ...
[Tools] Batch Create Markdown Files from a Template with Node.js and Mustache
Creating Markdown files from a template is a straightforward process with Node.js and Mustache. You ...
Cognos权限Custom Java Provider表结构实例
select * from org_user;USER_ID USER_CODE USER_NAME FULL_NAME EMAIL PWD2 889 zhangsan 张三 123@126.com ...

随机推荐

【Visual Studio】The project appears to be under source control, but the associated source control plug-in is not installed on this computer
[问题描述]用 Visual Studio 2013打开一个项目时,出现下面错误: [问题原因]参考 http://codeverge.com/asp.net.web-forms/the-projec ...
标准C程序设计七---73
Linux应用编程深入语言编程标准C程序设计七---经典C11程序设计以下内容为阅读: <标准C程序设计>(第7版) 作者 ...
js-关于微信页面分享（取消或打开）
在微信二次开发中,我们会遇到页面可以分享或不能分享的情况(私人隐私页面不能.禁止分享) 1.禁止页面分享(取消微信开打页面的分享功能) <script> function onBridge ...
HDU 6206 Apple
题目链接:http://acm.hdu.edu.cn/showproblem.php?pid=6206 判断给定一点是否在三角形外接圆内. 给定三角形三个顶点的坐标,如何求三角形的外心的坐标呢? 知乎 ...
zookeeper源码 — 四、session建立
目录 session建立的主要过程客户端发起连接服务端创建session session建立的主要过程用一张图来说明session建立过程中client和server的交互主要流程服务端启动 ...
在路上：安全公司“跨界”SD-WAN
编者按:本文是SDNLAB“企业+”特别报道之一.“企业+”是SDNLAB重点打造的栏目,汇聚信息行业运营商.设备商.互联网公司.软件公司.集成公司.融创投资公司.科研院所等企业,重新定义IT行业撮合 ...
提高速度 history 的利用
history的介绍history是shell的内置命令,其内容在系统默认的shell的man手册中.history是显示在终端输入并执行的过命令,系统默认保留1000条.[root@localhos ...
this.class.getClassLoader().getResourceAsStream
this.getClass().getClassLoader().getResource("template"); 首先,调用对象的getClass()方法是获得对象当前的类 ...
迅雷中Peer连接信息中的状态解释（转）
在标准 Peer-to-Peer(P2P 点对点网络)中,以"Flags"表示 Peer Status(Peer 状态).其中: D - 正从 Peer 下载(感兴趣:解阻塞)搜索 ...
android showmessage
package com.example.yanlei.yl6; import android.annotation.TargetApi; import android.app.Activity; im ...

Write Custom Java to Create LZO Files

LanguageManual LZO

LZO Compression

General LZO Concepts

Prerequisites

Lzo/Lzop Installations

core-site.xml

Table Definition

Hive Queries

Option 1: Directly Create LZO Files

Option 2: Write Custom Java to Create LZO Files

Write Custom Java to Create LZO Files的更多相关文章

随机推荐

热门专题

`core-site.xml`