Overview

Flume：一个分布式的，可靠的，可用的服务，用于有效地收集、聚合、移动大规模日志数据
我们搭建一个flume + Spark Streaming的平台来从Flume获取数据，并处理它。
有两种方法实现：使用flume-style的push-based方法，或者使用自定义的sink来实现pull-based方法。

Approach 1: Flume-style Push-based Approach

flume被设计用来在Flume agents之间推信息，在这种方式下，Spark Streaming安装一个receiver that acts like an Avro agent for Flume, to which Flume can push the data.

General Requirement

当你启动flume + spark streaming应用时，该机器上必须运行一个Spark workers。
flume可以向该机器的某一个port push数据。
基于这种push机制，streaming应用必须有一个receiver scheduled and listening on the chosen port.

Configuring Flume

配置flume以向Avro sink发送数据

agent.sinks = avroSink

agent.sinks.avroSink.type = avro

agent.sinks.avroSink.channel = memoryChannel

agent.sinks.avroSink.hostname = <chosen machine's hostname>

agent.sinks.avroSink.port = <chosen port on the machine>

Configuring Spark Streaming Application

Linking: 在maven项目中配置依赖

<dependency>

    <groupId>org.apache.spark</groupId>

    <artifactId>spark-streaming-flume-sink_2.10</artifactId>

    <version>2.1.0</version>

</dependency>

　　2. Programming：import FlumeUtils, 创建input DStream

 import org.apache.spark.streaming.flume._

 val flumeStream = FlumeUtils.createStream(streamingContext, [chosen machine's hostname], [chosen port])

注意：应该与cluster中的resourceManager使用同一个hostname，这样的话资源分配可以匹配names，并在正确的机器上launch receiver
一个简单的Spark Streaming统计Flume event个数的demo代码：

object FlumeEventCount {

  def main(args: Array[String]) {

    if (args.length < 2) {

      System.err.println(

        "Usage: FlumeEventCount <host> <port>")

      System.exit(1)

    }

    StreamingExamples.setStreamingLogLevels()

    val Array(host, IntParam(port)) = args

    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size

    val sparkConf = new SparkConf().setAppName("FlumeEventCount")

    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Create a flume stream

    val stream = FlumeUtils.createStream(ssc, host, port, StorageLevel.MEMORY_ONLY_SER_2)

    // Print out the count of events received from this server in each batch

    stream.count().map(cnt => "Received " + cnt + " flume events." ).print()

    ssc.start()

    ssc.awaitTermination()

  }

}

<Spark Streaming><Flume><Integration>的更多相关文章

简单物联网：外网访问内网路由器下树莓派Flask服务器
最近做一个小东西,大概过程就是想在教室,宿舍控制实验室的一些设备. 已经在树莓上搭了一个轻量的flask服务器,在实验室的路由器下,任何设备都是可以访问的:但是有一些限制条件,比如我想在宿舍控制我种花 ...
利用ssh反向代理以及autossh实现从外网连接内网服务器
前言最近遇到这样一个问题,我在实验室架设了一台服务器,给师弟或者小伙伴练习Linux用,然后平时在实验室这边直接连接是没有问题的,都是内网嘛.但是回到宿舍问题出来了,使用校园网的童鞋还是能连接上,使 ...
外网访问内网Docker容器
外网访问内网Docker容器本地安装了Docker容器,只能在局域网内访问,怎样从外网也能访问本地Docker容器? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装并启动Docker容器 ...
外网访问内网SpringBoot
外网访问内网SpringBoot 本地安装了SpringBoot,只能在局域网内访问,怎样从外网也能访问本地SpringBoot? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装Java 1 ...
外网访问内网Elasticsearch WEB
外网访问内网Elasticsearch WEB 本地安装了Elasticsearch,只能在局域网内访问其WEB,怎样从外网也能访问本地Elasticsearch? 本文将介绍具体的实现步骤. 1. ...
怎样从外网访问内网Rails
外网访问内网Rails 本地安装了Rails,只能在局域网内访问,怎样从外网也能访问本地Rails? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装并启动Rails 默认安装的Rails端口 ...
怎样从外网访问内网Memcached数据库
外网访问内网Memcached数据库本地安装了Memcached数据库,只能在局域网内访问,怎样从外网也能访问本地Memcached数据库? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装 ...
怎样从外网访问内网CouchDB数据库
外网访问内网CouchDB数据库本地安装了CouchDB数据库,只能在局域网内访问,怎样从外网也能访问本地CouchDB数据库? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装并启动Cou ...
怎样从外网访问内网DB2数据库
外网访问内网DB2数据库本地安装了DB2数据库,只能在局域网内访问,怎样从外网也能访问本地DB2数据库? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装并启动DB2数据库默认安装的DB2 ...
怎样从外网访问内网OpenLDAP数据库
外网访问内网OpenLDAP数据库本地安装了OpenLDAP数据库,只能在局域网内访问,怎样从外网也能访问本地OpenLDAP数据库? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装并启动 ...

随机推荐

微信小程序选择视频，视频上传，视频播放
请查看链接地址看具体详情: 选择视频: https://mp.weixin.qq.com/debug/wxadoc/dev/api/media-video.html#wxchoosevideoobje ...
php算法，冒泡排序
冒泡排序 /*** *从小到大排列 * 逻辑分析假设数组 $arr=[a,b,c,d]; * 总数=4; * 比较对象第几个元素比较次数 * a 1 3 * b 2 2 * c 3 1 **/ ...
118. Pascal's Triangle (java)
问题描述: Given numRows, generate the first numRows of Pascal's triangle. For example, given numRows = 5 ...
Roman To Integer leetcode java
问题描述: Given a roman numeral, convert it to an integer. Input is guaranteed to be within the range fr ...
WDA基础十二：FREE PROGRAM SH (WDA TREE)
一个需要用TREE展示搜索帮助的需求: 1.创建WDA程序:ZCATEGORY 2.Component Controller中添加节点: (说明,此节点仅在搜索帮助程序中使用,可以不用interfac ...
Beta阶段——第2篇 Scrum 冲刺博客
Beta阶段--第2篇 Scrum 冲刺博客标签:软件工程一.站立式会议照片二.每个人的工作 (有work item 的ID) 昨日已完成的工作人员工作林羽晴完成https安全连接的问题 ...
[LeetCode] 99. Recover Binary Search Tree(复原BST) ☆☆☆☆☆
Recover Binary Search Tree leetcode java https://leetcode.com/problems/recover-binary-search-tree/di ...
InnoDB存储引擎表的主键
在InnoDB存储引擎中,表是按照主键顺序组织存放的.在InnoDB存储引擎表中,每张表都有主键(primary key),如果在创建表时没有显式地定义主键,则InnoDB存储引擎会按如下方式选择或创 ...
小程序BindTap快速连续点击页面跳转多次
原因: 手机端点击Tap基础事件解决300ms延迟解决办法: success 里面加一个延迟300ms能解决 setTimeout goRob(e) { const that = this retu ...
sql中，如何获取一个数的整数部分和余数部分
我们测试一下,我要得到的结果是多少周(整数),多少天(余数) 1.获取指定日期到当前日期之间的天数首先用DATEDIFF() 函数获取指定日期到当前日期的天数 --获取指定日期到当前日期的天数 se ...

<Spark Streaming><Flume><Integration>

Overview

Approach 1: Flume-style Push-based Approach

General Requirement

Configuring Flume

Configuring Spark Streaming Application

<Spark Streaming><Flume><Integration>的更多相关文章

随机推荐

热门专题