有两种方式,一种是sparkstreaming中的driver起监听,flume来推数据;另一种是sparkstreaming按照时间策略轮训的向flume拉数据。

最开始我以为只有第一种方法,但是尼玛问题在于driver起来的结点是没谱的,所以每次我重启streaming后发现尼玛每次都要修改flume的sinks,蛋疼死了,后来才发现有后面的方法,好吧,把不同的方法代码写出来,其实变化不大。(代码转自官方的githup)

第一种,监听端口:

package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._
import org.apache.spark.util.IntParam /**
* Produces a count of events received from Flume.
*
* This should be used in conjunction with an AvroSink in Flume. It will start
* an Avro server on at the request host:port address and listen for requests.
* Your Flume AvroSink should be pointed to this address.
*
* Usage: FlumeEventCount <host> <port>
* <host> is the host the Flume receiver will be started on - a receiver
* creates a server and listens for flume events.
* <port> is the port the Flume receiver will listen on.
*
* To run this example:
* `$ bin/run-example org.apache.spark.examples.streaming.FlumeEventCount <host> <port> `
*/
object FlumeEventCount {
def main(args: Array[String]) {
if (args.length < 2) {
System.err.println(
"Usage: FlumeEventCount <host> <port>")
System.exit(1)
} StreamingExamples.setStreamingLogLevels() val Array(host, IntParam(port)) = args val batchInterval = Milliseconds(2000) // Create the context and set the batch size
val sparkConf = new SparkConf().setAppName("FlumeEventCount")
val ssc = new StreamingContext(sparkConf, batchInterval) // Create a flume stream
val stream = FlumeUtils.createStream(ssc, host, port, StorageLevel.MEMORY_ONLY_SER_2) // Print out the count of events received from this server in each batch
stream.count().map(cnt => "Received " + cnt + " flume events." ).print() ssc.start()
ssc.awaitTermination()
}
}

第二种是轮训主动向flume拿数据

package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._
import org.apache.spark.util.IntParam
import java.net.InetSocketAddress /**
* Produces a count of events received from Flume.
*
* This should be used in conjunction with the Spark Sink running in a Flume agent. See
* the Spark Streaming programming guide for more details.
*
* Usage: FlumePollingEventCount <host> <port>
* `host` is the host on which the Spark Sink is running.
* `port` is the port at which the Spark Sink is listening.
*
* To run this example:
* `$ bin/run-example org.apache.spark.examples.streaming.FlumePollingEventCount [host] [port] `
*/
object FlumePollingEventCount {
def main(args: Array[String]) {
if (args.length < 2) {
System.err.println(
"Usage: FlumePollingEventCount <host> <port>")
System.exit(1)
} StreamingExamples.setStreamingLogLevels() val Array(host, IntParam(port)) = args val batchInterval = Milliseconds(2000) // Create the context and set the batch size
val sparkConf = new SparkConf().setAppName("FlumePollingEventCount")
val ssc = new StreamingContext(sparkConf, batchInterval) // Create a flume stream that polls the Spark Sink running in a Flume agent
val stream = FlumeUtils.createPollingStream(ssc, host, port) // Print out the count of events received from this server in each batch
stream.count().map(cnt => "Received " + cnt + " flume events." ).print() ssc.start()
ssc.awaitTermination()
}
}

  

spark streaming中使用flume数据源的更多相关文章

  1. Spark Streaming中向flume拉取数据

    在这里看到的解决方法 https://issues.apache.org/jira/browse/SPARK-1729 请是个人理解,有问题请大家留言. 其实本身flume是不支持像KAFKA一样的发 ...

  2. Spark Streaming中的操作函数分析

    根据Spark官方文档中的描述,在Spark Streaming应用中,一个DStream对象可以调用多种操作,主要分为以下几类 Transformations Window Operations J ...

  3. Spark Streaming中的操作函数讲解

    Spark Streaming中的操作函数讲解 根据根据Spark官方文档中的描述,在Spark Streaming应用中,一个DStream对象可以调用多种操作,主要分为以下几类 Transform ...

  4. spark streaming中维护kafka偏移量到外部介质

    spark streaming中维护kafka偏移量到外部介质 以kafka偏移量维护到redis为例. redis存储格式 使用的数据结构为string,其中key为topic:partition, ...

  5. Spark Streaming中动态Batch Size实现初探

    本期内容 : BatchDuration与 Process Time 动态Batch Size Spark Streaming中有很多算子,是否每一个算子都是预期中的类似线性规律的时间消耗呢? 例如: ...

  6. flink和spark Streaming中的Back Pressure

    Spark Streaming的back pressure 在讲flink的back pressure之前,我们先讲讲Spark Streaming的back pressure.Spark Strea ...

  7. spark streaming中使用checkpoint

    从官方的Programming Guides中看到的 我理解streaming中的checkpoint有两种,一种指的是metadata的checkpoint,用于恢复你的streaming:一种是r ...

  8. Spark Streaming数据限流简述

      Spark Streaming对实时数据流进行分析处理,源源不断的从数据源接收数据切割成一个个时间间隔进行处理:   流处理与批处理有明显区别,批处理中的数据有明显的边界.数据规模已知:而流处理数 ...

  9. Apache Spark 2.2.0 中文文档 - Spark Streaming 编程指南 | ApacheCN

    Spark Streaming 编程指南 概述 一个入门示例 基础概念 依赖 初始化 StreamingContext Discretized Streams (DStreams)(离散化流) Inp ...

随机推荐

  1. 如何在word里面插入目录

    点击“引用”->插入目录

  2. Ubuntu无法关机解决办法

    说明:如果不成功请参考一下文章最后的内容,也许会有帮助. 其实不止在ubuntu里面,fedora里面我也遇到了这个问题,就是电脑可以重启,但是不能直接关机,否则就一直停在关机界面,需手动关机.郁闷很 ...

  3. 百度编辑器Ueditor 初始化加载内容失败解决办法

    项目上有用到百度文本编辑器ueditor,在页面加载的时候初始化编辑器内容时候,使用 $.document.ready(function() { UE.getEditor('editor').setC ...

  4. Altera的几个常用的Synthesis attributes

    各厂商综合工具,对HDL综合时都定义了一些综合属性这些属性可指定a declaration,a module item,a statement, or a port connection 不同的综合方 ...

  5. Win 7 下制作 mac 系统启动U盘

    Win 7 下制作 mac 系统启动U盘 前几天因为工作需要,在mac 上安装了win7.后来因为习惯问题将win7 分区了,后来就是进不去mac os,只能进入win7 .可恶. 苹果客服说只能用m ...

  6. NGUI 新版操作教程

    http://www.tasharen.com/forum/index.php?topic=6754

  7. Iphone和iPad适配, 横竖屏

    竖屏情况下: [UIScreen mainScreen].bounds.size.width = 320 [UIScreen mainScreen].bounds.size.width = 568 横 ...

  8. FATAL: ActionView::Template::Error (application.css isn't precompiled):

    iwangzheng.com tty:[0] jobs:[0] cwd:[/opt/logs/m]13:02 [root@a02.cmsapi$ tail thin\ server\ \(0.0.0. ...

  9. svn报错 400 Bad Request

    MyEclipse中的svn,commit经常报错 Error: Commit failed (details follow):  Error: At least one property chang ...

  10. HDU 4435 charge-station bfs图论问题

    E - charge-station Time Limit:1000MS     Memory Limit:32768KB     64bit IO Format:%I64d & %I64u ...