Using a Flume data source in Spark Streaming

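There are two approaches. In the first, the Spark Streaming application starts a listener and Flume pushes data to it. In the second, Spark Streaming polls Flume on a timed schedule and pulls the data itself.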

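At first I thought only the first approach existed. The trouble is that the node where the listener comes up is unpredictable, so every time I restarted the streaming job I had to edit the Flume sinks to match, which was a real pain. Only later did I discover the second approach. Both versions of the code are listed here; they differ very little. (The code is taken from the official GitHub repository.)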

Approach 1: listen on a port and let Flume push:

package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._
import org.apache.spark.util.IntParam

/**
 *  Produces a count of events received from Flume.
 *
 *  This should be used in conjunction with an AvroSink in Flume. It will start
 *  an Avro server at the requested host:port address and listen for requests.
 *  Your Flume AvroSink should be pointed to this address.
 *
 *  Usage: FlumeEventCount <host> <port>
 *    <host> is the host the Flume receiver will be started on - a receiver
 *           creates a server and listens for flume events.
 *    <port> is the port the Flume receiver will listen on.
 *
 *  To run this example:
 *    `$ bin/run-example org.apache.spark.examples.streaming.FlumeEventCount <host> <port> `
 */
object FlumeEventCount {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println(
        "Usage: FlumeEventCount <host> <port>")
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()

    val Array(host, IntParam(port)) = args

    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size
    val sparkConf = new SparkConf().setAppName("FlumeEventCount")
    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Create a flume stream: starts an Avro server on host:port that Flume pushes to
    val stream = FlumeUtils.createStream(ssc, host, port, StorageLevel.MEMORY_ONLY_SER_2)

    // Print out the count of events received from this server in each batch
    stream.count().map(cnt => "Received " + cnt + " flume events." ).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
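
For the push approach, the Flume side needs an Avro sink pointed at the same host and port that are passed to FlumeEventCount. A minimal sketch of that sink configuration follows. The agent name `agent`, the sink name `avroSink`, the channel name `memoryChannel`, and the host and port values are all placeholders assumed for illustration; none of them come from the original setup.

```properties
# Hypothetical Flume agent fragment for the push approach.
# All names and values here are placeholders, not the author's real config.
agent.sinks = avroSink
agent.sinks.avroSink.type = avro
agent.sinks.avroSink.channel = memoryChannel
# These two must match the <host> and <port> given to FlumeEventCount.
agent.sinks.avroSink.hostname = spark-receiver-host
agent.sinks.avroSink.port = 41414
```

Because the sink hard-codes the listener's hostname, this is the setting that has to be edited whenever the listener comes up on a different node.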

Approach 2: poll Flume and pull the data:

package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._
import org.apache.spark.util.IntParam
import java.net.InetSocketAddress

/**
 *  Produces a count of events received from Flume.
 *
 *  This should be used in conjunction with the Spark Sink running in a Flume agent. See
 *  the Spark Streaming programming guide for more details.
 *
 *  Usage: FlumePollingEventCount <host> <port>
 *    `host` is the host on which the Spark Sink is running.
 *    `port` is the port at which the Spark Sink is listening.
 *
 *  To run this example:
 *    `$ bin/run-example org.apache.spark.examples.streaming.FlumePollingEventCount [host] [port] `
 */
object FlumePollingEventCount {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println(
        "Usage: FlumePollingEventCount <host> <port>")
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()

    val Array(host, IntParam(port)) = args

    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size
    val sparkConf = new SparkConf().setAppName("FlumePollingEventCount")
    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Create a flume stream that polls the Spark Sink running in a Flume agent
    val stream = FlumeUtils.createPollingStream(ssc, host, port)

    // Print out the count of events received from this server in each batch
    stream.count().map(cnt => "Received " + cnt + " flume events." ).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
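
For the pull approach, the Flume agent runs the Spark Sink and Spark Streaming connects to it, so the address in the Flume config is the agent's own and does not depend on where the Spark job lands. A minimal sketch of that sink configuration follows. The agent name `agent`, the sink name `spark`, the channel name `memoryChannel`, and the host and port values are placeholders assumed for illustration.

```properties
# Hypothetical Flume agent fragment for the pull approach.
# All names and values here are placeholders, not the author's real config.
agent.sinks = spark
agent.sinks.spark.type = org.apache.spark.streaming.flume.sink.SparkSink
agent.sinks.spark.channel = memoryChannel
# Address the Spark Sink listens on; pass the same values as
# <host> and <port> to FlumePollingEventCount.
agent.sinks.spark.hostname = flume-agent-host
agent.sinks.spark.port = 41415
```

The Spark Sink class is not part of Flume itself, so its jar has to be available on the Flume agent's classpath; the Spark Streaming programming guide mentioned in the code comment covers the details.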
