About the write method used for reduce output

Notes on custom output in Hadoop

OutputFormat describes the output-specification for a Map-Reduce job.

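First, extend the abstract class OutputFormat<K, V>, which defines the output specification of a Map-Reduce job.

Implement its method:

RecordWriter<KeyBaseDimension, BaseStatsValueWritable> getRecordWriter

The database connection can be opened inside this method, and the method must return a RecordWriter.

Next, extend the RecordWriter class and implement its write method so that each record is stored through JDBC.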
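The lifecycle described above (open the connection in getRecordWriter, store rows in write, flush on close) can be sketched without any Hadoop dependency. This is only a minimal sketch under assumptions: SimpleRecordWriter is a local stand-in for Hadoop's RecordWriter, RowSink is a hypothetical placeholder for the JDBC connection, and the batch size of 2 is arbitrary. None of these names come from the original code.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for Hadoop's RecordWriter<K, V>; NOT the real class.
abstract class SimpleRecordWriter<K, V> {
    abstract void write(K key, V value);
    abstract void close();
}

// Hypothetical placeholder for a JDBC connection that executes one batch.
interface RowSink {
    void executeBatch(List<String> rows);
}

// Buffers rows in write() and flushes them in batches, the way a JDBC-backed
// writer would use addBatch()/executeBatch() on a PreparedStatement.
final class BatchingWriter extends SimpleRecordWriter<String, Integer> {
    private final RowSink sink;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();

    BatchingWriter(RowSink sink, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
    }

    @Override
    void write(String key, Integer value) {
        buffer.add(key + "=" + value);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    @Override
    void close() {
        flush(); // push whatever is left when the task finishes
    }

    private void flush() {
        if (!buffer.isEmpty()) {
            sink.executeBatch(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}

final class BatchingWriterDemo {
    // Plays the role of getRecordWriter(): the place to open the connection.
    static SimpleRecordWriter<String, Integer> getRecordWriter(RowSink sink) {
        return new BatchingWriter(sink, 2);
    }

    public static void main(String[] args) {
        List<List<String>> batches = new ArrayList<>();
        SimpleRecordWriter<String, Integer> writer = getRecordWriter(batches::add);
        writer.write("a", 1);
        writer.write("b", 2);
        writer.write("c", 3);
        writer.close();
        System.out.println(batches);
    }
}
```

Flushing in close matters in this sketch: without it, the last partial batch would never reach the database.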

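The write method called when the reduce side emits output

The implementing class is TaskInputOutputContextImpl:

  private RecordWriter<KEYOUT,VALUEOUT> output;

  public void write(KEYOUT key, VALUEOUT value
                    ) throws IOException, InterruptedException {
    output.write(key, value);
  }

In the end, context.write simply delegates to the RecordWriter's write method, so the records ultimately reach the write method of the custom RecordWriter returned by getRecordWriter.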
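The delegation can be illustrated with a small dependency-free sketch. KeyValueWriter and OutputContext are local stand-ins (assumptions for illustration) for Hadoop's RecordWriter and TaskInputOutputContextImpl, and the reduce method here is a simplified, hypothetical reducer that emits one sum per key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local stand-in for Hadoop's RecordWriter; NOT the real class.
interface KeyValueWriter<K, V> {
    void write(K key, V value);
}

// Mirrors the quoted fragment: the context keeps a writer and forwards to it.
final class OutputContext<K, V> {
    private final KeyValueWriter<K, V> output;

    OutputContext(KeyValueWriter<K, V> output) {
        this.output = output;
    }

    void write(K key, V value) {
        output.write(key, value);
    }
}

final class DelegationDemo {
    // Plays the role of a reduce() call that emits one total per key.
    static void reduce(String key, List<Integer> values,
                       OutputContext<String, Integer> context) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        context.write(key, sum);
    }

    public static void main(String[] args) {
        List<String> stored = new ArrayList<>();
        // Custom writer: it only records rows here; a real one would store them via JDBC.
        KeyValueWriter<String, Integer> custom = (k, v) -> stored.add(k + "=" + v);
        reduce("pv", Arrays.asList(1, 2, 3), new OutputContext<>(custom));
        System.out.println(stored);
    }
}
```

The reducer never sees the writer directly; swapping the writer changes where the output goes without touching the reduce logic.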

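An MR utility class for reading HBase on the map side

 * Use this before submitting a TableMap job. It will appropriately set up
 * the job.

TableMapReduceUtil is an important class. It sets up a job that reads HBase tables, and before the job is submitted you can attach a series of filters to the scans it will run.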

FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);

        filterList.addFilter(

                new SingleColumnValueFilter(EventLogConstants.EVENT_LOGS_FAMILY_NAME_BYTES,

                        Bytes.toBytes(EventLogConstants.LOG_COLUMN_NAME_EVENT_NAME),

                        CompareOp.EQUAL, Bytes.toBytes(EventLogConstants.EventEnum.BC_SX.alias)));
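What the filter list above expresses can be modelled in plain Java. This is a simplified sketch, not HBase code: rows are plain maps, the column name "en" and the value "e_sx" are hypothetical, and the filterIfMissing flag imitates the setting of the same name on SingleColumnValueFilter, whose default (false) lets rows that lack the column pass.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class FilterSketch {
    // Simplified model of a single-column equality filter over a row (column -> value).
    static Predicate<Map<String, String>> columnEquals(
            String column, String expected, boolean filterIfMissing) {
        return row -> {
            String actual = row.get(column);
            if (actual == null) {
                return !filterIfMissing; // lenient by default: a missing column passes
            }
            return actual.equals(expected);
        };
    }

    // MUST_PASS_ALL: a row is kept only if every filter accepts it (logical AND).
    static boolean mustPassAll(List<Predicate<Map<String, String>>> filters,
                               Map<String, String> row) {
        for (Predicate<Map<String, String>> filter : filters) {
            if (!filter.test(row)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Predicate<Map<String, String>>> filters = new ArrayList<>();
        filters.add(columnEquals("en", "e_sx", false)); // hypothetical column and value

        Map<String, String> matching = new HashMap<>();
        matching.put("en", "e_sx");
        Map<String, String> other = new HashMap<>();
        other.put("en", "e_pv");
        Map<String, String> missing = new HashMap<>();

        System.out.println(mustPassAll(filters, matching));
        System.out.println(mustPassAll(filters, other));
        System.out.println(mustPassAll(filters, missing));
    }
}
```

If rows without the event-name column should be dropped as well, the real filter would need setFilterIfMissing(true).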

The multi-scan overload of TableMapReduceUtil.initTableMapperJob has this signature:

public static void initTableMapperJob(List<Scan> scans,
      Class<? extends TableMapper> mapper,
      Class<?> outputKeyClass,
      Class<?> outputValueClass, Job job,
      boolean addDependencyJars,
      boolean initCredentials) throws IOException

Attach the filter to each scan before the scan runs:

        List<Scan> scanList = new ArrayList<Scan>();
        // conn and admin are declared outside this excerpt
        try {
            conn = ConnectionFactory.createConnection(conf);
            admin = conn.getAdmin();
            String tableName = EventLogConstants.HBASE_NAME_AUDIT_SX + GlobalConstants.UNDERLINE + statDate.replaceAll(GlobalConstants.KEY_SEPARATOR, "");
            if (admin.tableExists(TableName.valueOf(tableName))) {
                Scan scan = new Scan();
                // Setting up scans over multiple tables (from the HBase source):
                // If an application wants to use multiple scans over different tables each scan must
                // define this attribute with the appropriate table name by calling
                // scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(tableName))
                // static public final String SCAN_ATTRIBUTES_TABLE_NAME = "scan.attributes.table.name";
                scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(tableName));
                scan.setFilter(filterList);
                scanList.add(scan);
            }
        } finally {
            // close admin and conn here (this part is not shown in the original excerpt)
        }
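The table name in the excerpt is derived from a prefix plus the statistics date with its separator removed. Below is a minimal sketch of that naming step, assuming a date such as 2019-05-31, a "-" separator, and the prefix "audit_sx"; the real constant values are not shown in the source, so all three are hypothetical. The sketch uses String.replace, which treats the separator literally, whereas replaceAll interprets its first argument as a regular expression.

```java
import java.util.ArrayList;
import java.util.List;

final class DailyTableNames {
    // Builds "<prefix>_<date without separators>", e.g. one table per day.
    static String tableNameFor(String prefix, String statDate, String separator) {
        return prefix + "_" + statDate.replace(separator, "");
    }

    // One table name per date, the same way one Scan per table would be built.
    static List<String> tableNamesFor(String prefix, List<String> statDates,
                                      String separator) {
        List<String> names = new ArrayList<>();
        for (String date : statDates) {
            names.add(tableNameFor(prefix, date, separator));
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical prefix and separator, for illustration only.
        System.out.println(tableNameFor("audit_sx", "2019-05-31", "-"));

        // With a "." separator, replaceAll would match every character,
        // because "." is a regex wildcard; replace does not have this problem.
        System.out.println("2019.05.31".replaceAll(".", "").isEmpty());
        System.out.println(tableNameFor("audit_sx", "2019.05.31", "."));
    }
}
```

With a "-" separator both methods behave the same, so the original replaceAll call is fine as long as KEY_SEPARATOR contains no regex metacharacters.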
Finally, pass both the scan list and the job to initTableMapperJob:

TableMapReduceUtil.initTableMapperJob(scanList, AuditorSXMapper.class,

AuditorDimensionKey.class, Text.class, job, false);

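Some notes on Storm

 Storm Trident: each(Fields(), function, Fields()), that is, the input fields, the function to apply, and the output fields

 filter: implement the Filter interface and its isKeep method

 partitionAggregate: aggregates within a partition. Implement Aggregator<class that holds the aggregation state>; its aggregate method carries the aggregation logic, and its complete method emits the result through the TridentCollector: collector.emit(new Values(aggregated value))

 A typical key-concatenation function: implement the execute method of the Function interface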
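The filter and partitionAggregate notes above can be pictured with a dependency-free sketch. PartitionAggregator is a simplified local stand-in for Trident's Aggregator (init, then aggregate per tuple, then complete), a plain list stands in for the TridentCollector, and the "keep non-negative values" rule is a hypothetical filter playing the role of isKeep. None of this is Storm code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for Trident's Aggregator<T>; NOT the real interface.
interface PartitionAggregator<S> {
    S init();                                              // once per partition
    void aggregate(S state, int tuple, List<Integer> out); // once per tuple
    void complete(S state, List<Integer> out);             // once at the end
}

// The class that holds the aggregation state.
final class SumState {
    int total;
}

final class PartitionSum implements PartitionAggregator<SumState> {
    @Override
    public SumState init() {
        return new SumState();
    }

    @Override
    public void aggregate(SumState state, int tuple, List<Integer> out) {
        state.total += tuple; // aggregation logic lives here
    }

    @Override
    public void complete(SumState state, List<Integer> out) {
        out.add(state.total); // emit one aggregated value per partition
    }
}

final class PartitionAggregateDemo {
    // Hypothetical filter rule, playing the role of Filter.isKeep.
    static boolean isKeep(int tuple) {
        return tuple >= 0;
    }

    static <S> List<Integer> run(List<List<Integer>> partitions,
                                 PartitionAggregator<S> aggregator) {
        List<Integer> emitted = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            S state = aggregator.init();
            for (int tuple : partition) {
                if (isKeep(tuple)) {
                    aggregator.aggregate(state, tuple, emitted);
                }
            }
            aggregator.complete(state, emitted);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = new ArrayList<>();
        partitions.add(Arrays.asList(1, 2, -5));
        partitions.add(Arrays.asList(10));
        System.out.println(run(partitions, new PartitionSum()));
    }
}
```

Each partition produces its own result, so two partitions yield two emitted values rather than one global total.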

 HBaseMapState.Options optsWait = new HBaseMapState.Options();

     TridentState amtOfWaitState = partStream.project(new Fields("waitingTotalOfPartDay","dayAndContType"))

                .groupBy(new Fields("dayAndContType"))

                .persistentAggregate(

                        factoryWait,

                        new Fields("waitingTotalOfPartDay"),new Sum(),

                        new Fields("waitingGlobalOfDay")

                );

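 persistentAggregate is the persisting aggregation: it sums across all partitions, taking the per-partition values as input and storing the global total as output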
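The effect of grouping by a key and persisting a sum can be sketched in plain Java. This is only a model under assumptions: an in-memory map stands in for the HBase-backed state, the per-partition partial sums are given as maps, and the key "20190531_A" is hypothetical. It is not Trident code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GlobalSumSketch {
    // Combines two partial sums, the step a sum-style combiner performs.
    static long combine(long a, long b) {
        return a + b;
    }

    // state: group key -> running total, standing in for the persistent map state.
    // Each element of partitionPartials holds one partition's sums per group key.
    static void persistentAggregate(Map<String, Long> state,
                                    List<Map<String, Long>> partitionPartials) {
        for (Map<String, Long> partial : partitionPartials) {
            for (Map.Entry<String, Long> entry : partial.entrySet()) {
                state.merge(entry.getKey(), entry.getValue(), GlobalSumSketch::combine);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Long> state = new HashMap<>();

        Map<String, Long> partition0 = new HashMap<>();
        partition0.put("20190531_A", 3L); // hypothetical "dayAndContType" key
        Map<String, Long> partition1 = new HashMap<>();
        partition1.put("20190531_A", 4L);

        List<Map<String, Long>> batch = new ArrayList<>();
        batch.add(partition0);
        batch.add(partition1);

        persistentAggregate(state, batch); // first batch
        persistentAggregate(state, batch); // a later batch adds to the stored total
        System.out.println(state);
    }
}
```

Because the state outlives a single batch, later batches keep adding to the stored total instead of starting from zero.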
