Compressing and Decompressing Files in HDFS

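Compressing files has two major benefits: (1) it reduces the disk space needed to store them, and (2) it speeds up data transfer over the network and to and from disk. Both matter a great deal when processing large volumes of data.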
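Both benefits follow from one fact: a compressed file contains fewer bytes. The sketch below is a rough local illustration that needs no HDFS cluster. It uses the JDK's own `java.util.zip` gzip streams (not Hadoop's `GzipCodec`) to compress a hypothetical 520-byte repetitive string, print both sizes, and check that decompression restores the original. The class name `GzipRatioDemo` and the sample text are made up for illustration, and the code assumes Java 11 or later (for `String.repeat` and `InputStream.readAllBytes`).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: compares sizes before and after gzip, using only the JDK.
public class GzipRatioDemo {
    // Compress a byte array with gzip and return the compressed bytes.
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream flushes the deflate data and writes the gzip trailer.
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(data);
        }
        return buffer.toByteArray();
    }

    // Decompress gzip bytes back into the original byte array.
    public static byte[] gunzip(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample input: 13 bytes repeated 40 times = 520 bytes of repetitive text.
        byte[] original = "hello hadoop\n".repeat(40).getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(original);
        byte[] restored = gunzip(compressed);

        System.out.println("original:   " + original.length + " bytes");
        System.out.println("compressed: " + compressed.length + " bytes");
        System.out.println("round trip ok: " + Arrays.equals(original, restored));
    }
}
```

Repetitive text like this compresses unusually well. Data that is already compressed, such as video, typically shrinks very little.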

The following example compresses the file /user/hadoop/aa.txt with the gzip codec, producing /user/hadoop/text.gz.

 1 package com.hdfs;
 2
 3 import java.io.IOException;
 4 import java.io.InputStream;
 5 import java.io.OutputStream;
 6 import java.net.URI;
 7
 8 import org.apache.hadoop.conf.Configuration;
 9 import org.apache.hadoop.fs.FSDataInputStream;
10 import org.apache.hadoop.fs.FSDataOutputStream;
11 import org.apache.hadoop.fs.FileSystem;
12 import org.apache.hadoop.fs.Path;
13 import org.apache.hadoop.io.IOUtils;
14 import org.apache.hadoop.io.compress.CompressionCodec;
15 import org.apache.hadoop.io.compress.CompressionCodecFactory;
16 import org.apache.hadoop.io.compress.CompressionInputStream;
17 import org.apache.hadoop.io.compress.CompressionOutputStream;
18 import org.apache.hadoop.util.ReflectionUtils;
19
20 public class CodecTest {
21     // Compress /user/hadoop/aa.txt into /user/hadoop/text.gz using the given codec class
22     public static void compress(String codecClassName) throws Exception {
23         Class<?> codecClass = Class.forName(codecClassName);
24         Configuration conf = new Configuration();
25         FileSystem fs = FileSystem.get(conf);
26         CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
27         // Path of the compressed output file
28         FSDataOutputStream outputStream = fs.create(new Path("/user/hadoop/text.gz"));
29         // Path of the file to be compressed
30         FSDataInputStream in = fs.open(new Path("/user/hadoop/aa.txt"));
31         // Wrap the raw HDFS output stream in a compressing output stream
32         CompressionOutputStream out = codec.createOutputStream(outputStream);
33         IOUtils.copyBytes(in, out, 4096, false); // copy without closing; streams are closed below
34         IOUtils.closeStream(in);
35         IOUtils.closeStream(out); // closing finishes the compressed stream and closes the file
36     }
37
38     // Decompress a gzip file and print its contents to the console
39     public static void uncompress(String fileName) throws Exception {
40         Class<?> codecClass = Class.forName("org.apache.hadoop.io.compress.GzipCodec");
41         Configuration conf = new Configuration();
42         FileSystem fs = FileSystem.get(conf);
43         CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
44         FSDataInputStream inputStream = fs.open(new Path(fileName));
45         // Decompress the file's data and write it to standard output
46         InputStream in = codec.createInputStream(inputStream);
47         IOUtils.copyBytes(in, System.out, 4096, false); // false: do not close System.out
48         IOUtils.closeStream(in);
49     }
50
51     // Infer the codec from the file extension, then decompress next to the original file
52     public static void uncompress1(String uri) throws IOException {
53         Configuration conf = new Configuration();
54         FileSystem fs = FileSystem.get(URI.create(uri), conf);
55
56         Path inputPath = new Path(uri);
57         CompressionCodecFactory factory = new CompressionCodecFactory(conf);
58         CompressionCodec codec = factory.getCodec(inputPath);
59         if (codec == null) {
60             System.err.println("no codec found for " + uri);
61             System.exit(1);
62         }
63         String outputUri = CompressionCodecFactory.removeSuffix(uri, codec.getDefaultExtension());
64         InputStream in = null;
65         OutputStream out = null;
66         try {
67             in = codec.createInputStream(fs.open(inputPath));
68             out = fs.create(new Path(outputUri));
69             IOUtils.copyBytes(in, out, 4096, false); // streams are closed in the finally block
70         } finally {
71             IOUtils.closeStream(out);
72             IOUtils.closeStream(in);
73         }
74     }
75
76     public static void main(String[] args) throws Exception {
77         //compress("org.apache.hadoop.io.compress.GzipCodec");
78         //uncompress("/user/hadoop/text.gz");
79         uncompress1("hdfs://master:9000/user/hadoop/text.gz");
80     }
81
82 }

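First uncomment and run line 77 to compress the file, then run line 78 to decompress it. Line 78 writes the decompressed data to standard output, so the contents of /user/hadoop/aa.txt appear in the console. Running line 79 instead decompresses the file into /user/hadoop/text: the codec is chosen from the extension of /user/hadoop/text.gz, and the output path is the input path with that extension removed.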
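How the codec and output path are derived from the file name can be sketched in plain Java. This is a simplified, hypothetical model, not Hadoop's implementation: the real `CompressionCodecFactory` builds its lookup from the codecs registered with it (for example through the `io.compression.codecs` property) and matches on the file name's suffix. The class `SuffixLookupSketch` and its three-entry table are assumptions for illustration; the extensions listed are the default extensions of those codecs, and `removeSuffix` is written to behave like `CompressionCodecFactory.removeSuffix`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of extension-based codec lookup (no Hadoop dependency).
public class SuffixLookupSketch {
    // Extension -> codec class name; an assumed subset of the codecs Hadoop ships.
    static final Map<String, String> CODECS_BY_EXTENSION = new LinkedHashMap<>();
    static {
        CODECS_BY_EXTENSION.put(".gz", "org.apache.hadoop.io.compress.GzipCodec");
        CODECS_BY_EXTENSION.put(".bz2", "org.apache.hadoop.io.compress.BZip2Codec");
        CODECS_BY_EXTENSION.put(".deflate", "org.apache.hadoop.io.compress.DefaultCodec");
    }

    // Return the codec whose extension is the longest suffix of the file name, or null if none.
    public static String codecFor(String fileName) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, String> e : CODECS_BY_EXTENSION.entrySet()) {
            String ext = e.getKey();
            if (fileName.endsWith(ext) && ext.length() > bestLength) {
                best = e.getValue();
                bestLength = ext.length();
            }
        }
        return best;
    }

    // Strip the suffix if the name ends with it; otherwise return the name unchanged.
    public static String removeSuffix(String fileName, String suffix) {
        return fileName.endsWith(suffix)
                ? fileName.substring(0, fileName.length() - suffix.length())
                : fileName;
    }

    public static void main(String[] args) {
        String uri = "hdfs://master:9000/user/hadoop/text.gz";
        System.out.println(codecFor(uri));                      // org.apache.hadoop.io.compress.GzipCodec
        System.out.println(removeSuffix(uri, ".gz"));           // hdfs://master:9000/user/hadoop/text
        System.out.println(codecFor("/user/hadoop/aa.txt"));    // null (the "no codec found" case)
    }
}
```

A null result corresponds to the `codec == null` branch in `uncompress1`, which is why running it on an uncompressed file such as aa.txt prints "no codec found".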

After compressing the file, run ./hadoop fs -ls /user/hadoop/ to list the directory:

1 [hadoop@master bin]$ ./hadoop fs -ls /user/hadoop/
2 Found 7 items
3 -rw-r--r--   3 hadoop supergroup   76805248 2013-06-17 23:55 /user/hadoop/aa.mp4
4 -rw-r--r--   3 hadoop supergroup        520 2013-06-17 22:29 /user/hadoop/aa.txt
5 drwxr-xr-x   - hadoop supergroup          0 2013-06-16 17:19 /user/hadoop/input
6 drwxr-xr-x   - hadoop supergroup          0 2013-06-16 19:32 /user/hadoop/output
7 drwxr-xr-x   - hadoop supergroup          0 2013-06-18 17:08 /user/hadoop/test
8 drwxr-xr-x   - hadoop supergroup          0 2013-06-18 19:45 /user/hadoop/test1
9 -rw-r--r--   3 hadoop supergroup         46 2013-06-19 20:09 /user/hadoop/text.gz

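Line 4 is the file before compression, 520 bytes; line 9 is the compressed file, 46 bytes. This shows both benefits described at the start: the file takes far less disk space, and there are far fewer bytes to move over the network or read from disk.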
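In numbers, using the two sizes from the listing, the compressed file is under 9% of the original, a saving of roughly 91%:

```latex
\text{ratio} = \frac{46\ \text{bytes}}{520\ \text{bytes}} \approx 0.088,
\qquad
\text{saved} = 1 - \frac{46}{520} = \frac{474}{520} \approx 91.2\%
```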
