Accumulator<Long> implements of JavaSparkContext in Spark1.x

As we all know , up to Spark 1.6.2, JavaSparkContext only provides two kinds of accumulators: Integer and Double.

However, unfortunately I've met with problems of Integer overflow and the program returned me a negative number.

So I have to use original sparkcontext to implement the Long accumulator.

public static class LongAccumulatorParam implements AccumulatorParam<Long>,Serializable {

        @Override

        public Long addAccumulator(final Long r, final Long t) {

            return r + t;

        }

        @Override

        public Long addInPlace(final Long r1, final Long r2) {

            return  r1 + r2;

        }

        @Override

        public Long zero(final Long initialValue) {

            return 0L;

        }

    }

final Accumulator<Long> acc = jsc.sc().accumulator(new Long(0), new LongAccumulatorParam());

Actually it is pretty simple. I haven't looked into Spark 2 yet, hope the developers have fixed this issue.

Accumulator<Long> implements of JavaSparkContext in Spark1.x的更多相关文章

java使用spark/spark-sql处理schema数据(spark1.6)
1.spark是什么? Spark是基于内存计算的大数据并行计算框架. 1.1 Spark基于内存计算相比于MapReduce基于IO计算,提高了在大数据环境下数据处理的实时性. 1.2 高容错性和 ...
【Spark Java API】broadcast、accumulator
转载自:http://www.jianshu.com/p/082ef79c63c1 broadcast 官方文档描述: Broadcast a read-only variable to the cl ...
spark-2.2.0-bin-hadoop2.6和spark-1.6.1-bin-hadoop2.6发行包自带案例全面详解（java、python、r和scala）之Basic包下的JavaPageRank.java（图文详解）
不多说,直接上干货! spark-1.6.1-bin-hadoop2.6里Basic包下的JavaPageRank.java /* * Licensed to the Apache Software ...
spark 变量使用 broadcast、accumulator
broadcast 官方文档描述: Broadcast a read-only variable to the cluster, returning a [[org.apache.spark.broa ...
Spark1.6.2 java实现读取json数据文件插入MySql数据库
public class Main implements Serializable { /** * */ private static final long serialVersionUID = -8 ...
Spark1.6.2 java实现读取txt文件插入MySql数据库代码
package com.gosun.spark1; import java.util.ArrayList;import java.util.List;import java.util.Properti ...
flink - accumulator
读accumlator JobManager 在job finish的时候会汇总accumulator的值, newJobStatus match { case JobStatus.FINISHE ...
spark1.4的本地模式编程练习（1）
spark编程练习申明:以下代码仅作学习参考使用,勿使用在商业用途. Wordcount UserMining TweetMining HashtagMining InvertedIndex Tes ...
Spark1.0.x入门指南
1 节点说明 IP Role 192.168.1.111 ActiveNameNode 192.168.1.112 StandbyNameNode,Master,Worker 192.168.1. ...

随机推荐

python生成随机日期字符串
python生成随机日期字符串生成随机的日期字符串,用于插入数据库. 通过时间元组设定一个时间段,开始和结尾时间转换成时间戳. 时间戳中随机取一个,再生成时间元组,再把时间元组格式化输出为字符串 # ...
Django之setting文件
Django之setting文件转载:https://www.jb51.net/article/128678.htm 目录设置语言.时区 app路径数据库配置静态文件配置中间件 sessio ...
005.Ceph文件系统基础使用
一 Ceph文件系统 1.1 概述 CephFS也称ceph文件系统,是一个POSIX兼容的分布式文件系统. 实现ceph文件系统的要求: 需要一个已经正常运行的ceph集群: 至少包含一个ceph元 ...
log4j平稳升级到log4j2
一.前言公司中的项目虽然已经用了很多的新技术了,但是日志的底层框架还是log4j,个人还是不喜欢用这个的.最近项目再生产环境上由于log4j引起了一场血案,于是决定升级到log4j2. 二.现象虽 ...
redis5.0.0.版设置开机自启
数据源、数据集、同步任务、数据仓库、元数据、数据目录、主题、来源系统、标签、增量识别字段、修改同步、ES索引、HBase列族、元数据同步、
数据源.数据集.同步任务.数据仓库.元数据.数据目录.主题.来源系统.标签. 增量识别字段.修改同步.ES索引.HBase列族.元数据同步.DS.ODS.DW.DM.zk集群地址 == 数据源数据源 ...
Window通过zip安装并启动mariadb
下载解压后进入bin目录使用mysql_install_db.exe工具:https://mariadb.com/kb/en/mariadb/mysql_install_dbexe/ 安装完成后,在 ...
Codeforces.666E.Forensic Examination(广义后缀自动机线段树合并)
题目链接 \(Description\) 给定串\(S\)和\(m\)个串\(T_i\).\(Q\)次询问,每次询问\(l,r,p_l,p_r\),求\(S[p_l\sim p_r]\)在\(T_l\ ...
2017-9-17-Windows Live Writer常用快捷键
Window Live Writer常用的快捷键: 将当前文章发布到博客:CTRL+SHIFT+P 打开新文章: CTRL+N 将草稿保存到计算机上: CTRL+S 切换到"普通" ...
Ubuntu安装python3虚拟环境
大多数Linux自带python2.7,而Ubuntu1.6也自带python3.x,本文章主要记录virtualenv+vitualenvwrapper使用python3虚拟环境虚拟环境好处不多说 ...

Accumulator<Long> implements of JavaSparkContext in Spark1.x

Accumulator<Long> implements of JavaSparkContext in Spark1.x的更多相关文章

随机推荐

热门专题