I. Download and build flume-ng-sql-source

Source: https://github.com/keedio/flume-ng-sql-source.git . Build it and copy the jar by following the installation instructions in the README.

If you would rather skip the build, a prebuilt jar can be downloaded from CSDN: http://download.csdn.net/detail/chongxin1/9892184

At the time of writing, the latest version was flume-ng-sql-source-1.4.3.jar. This jar is the key dependency that lets Flume connect to a database.

II. Put flume-ng-sql-source-1.4.3.jar into Flume's lib directory

 

III. Put the Oracle JDBC driver into Flume's lib directory (this demo uses an Oracle database)

The Oracle JDBC driver ships with the Oracle installation, under: D:\app\product\11.2.0\dbhome_1\jdbc\lib

Copy ojdbc5.jar from there into Flume's lib directory.

IV. Run the demo

1. Create the database table


  create table flume_ng_sql_source (
    id         varchar2(32) primary key,
    msg        varchar2(32),
    createTime date not null
  );

  insert into flume_ng_sql_source(id,msg,createTime) values('1','Test increment Data',to_date('2017-08-01 07:06:20','yyyy-mm-dd hh24:mi:ss'));
  insert into flume_ng_sql_source(id,msg,createTime) values('2','Test increment Data',to_date('2017-08-02 07:06:20','yyyy-mm-dd hh24:mi:ss'));
  insert into flume_ng_sql_source(id,msg,createTime) values('3','Test increment Data',to_date('2017-08-03 07:06:20','yyyy-mm-dd hh24:mi:ss'));
  insert into flume_ng_sql_source(id,msg,createTime) values('4','Test increment Data',to_date('2017-08-04 07:06:20','yyyy-mm-dd hh24:mi:ss'));
  insert into flume_ng_sql_source(id,msg,createTime) values('5','Test increment Data',to_date('2017-08-05 07:06:20','yyyy-mm-dd hh24:mi:ss'));
  insert into flume_ng_sql_source(id,msg,createTime) values('6','Test increment Data',to_date('2017-08-06 07:06:20','yyyy-mm-dd hh24:mi:ss'));
  commit;

2. Create flume-sql.conf

Create flume-sql.conf in the /usr/local/flume directory:

  touch /usr/local/flume/flume-sql.conf
  sudo gedit /usr/local/flume/flume-sql.conf

Enter the following content into flume-sql.conf:

  agentTest.channels = channelTest
  agentTest.sources = sourceTest
  agentTest.sinks = sinkTest
  ###########sql source#################
  # For each one of the sources, the type is defined
  agentTest.sources.sourceTest.type = org.keedio.flume.source.SQLSource
  # Hibernate Database connection properties
  agentTest.sources.sourceTest.hibernate.connection.url = jdbc:oracle:thin:@192.168.168.100:1521/orcl
  agentTest.sources.sourceTest.hibernate.connection.user = flume
  agentTest.sources.sourceTest.hibernate.connection.password = 1234
  agentTest.sources.sourceTest.hibernate.connection.autocommit = true
  agentTest.sources.sourceTest.hibernate.dialect = org.hibernate.dialect.Oracle10gDialect
  agentTest.sources.sourceTest.hibernate.connection.driver_class = oracle.jdbc.driver.OracleDriver
  agentTest.sources.sourceTest.run.query.delay=1
  agentTest.sources.sourceTest.status.file.path = /usr/local/flume
  agentTest.sources.sourceTest.status.file.name = agentTest.sqlSource.status
  # Custom query
  agentTest.sources.sourceTest.start.from = '2017-07-31 07:06:20'
  agentTest.sources.sourceTest.custom.query = SELECT CHR(39)||TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS')||CHR(39) AS INCREMENTAL,ID,MSG FROM FLUME_NG_SQL_SOURCE WHERE TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS') > $@$ ORDER BY INCREMENTAL ASC
  agentTest.sources.sourceTest.batch.size = 6000
  agentTest.sources.sourceTest.max.rows = 1000
  agentTest.sources.sourceTest.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider
  agentTest.sources.sourceTest.hibernate.c3p0.min_size=1
  agentTest.sources.sourceTest.hibernate.c3p0.max_size=10
  ##############################
  agentTest.channels.channelTest.type = memory
  agentTest.channels.channelTest.capacity = 10000
  agentTest.channels.channelTest.transactionCapacity = 10000
  agentTest.channels.channelTest.byteCapacityBufferPercentage = 20
  agentTest.channels.channelTest.byteCapacity = 1600000
  agentTest.sinks.sinkTest.type = org.apache.flume.sink.kafka.KafkaSink
  agentTest.sinks.sinkTest.topic = TestTopic
  agentTest.sinks.sinkTest.brokerList = 192.168.168.200:9092
  agentTest.sinks.sinkTest.requiredAcks = 1
  agentTest.sinks.sinkTest.batchSize = 20
  agentTest.sinks.sinkTest.channel = channelTest
  agentTest.sources.sourceTest.channels=channelTest
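Hand-edited Flume configs easily pick up duplicated keys or a source/sink wired to a channel that was never declared. The following is a minimal Python pre-flight check, not part of Flume. It assumes the simple one-`key = value`-per-line layout used in this file (full Java properties syntax also allows `:` separators and line continuations, which this sketch ignores), and `parse_properties`/`check_agent` are hypothetical names. It relies on the Flume convention that a source lists its `channels` (plural) while a sink names a single `channel`.

```python
from collections import Counter

def parse_properties(text):
    """Simplified parser: one 'key = value' per line, lines starting with '#' are comments."""
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            pairs.append((key.strip(), value.strip()))
    return pairs

def check_agent(text, agent):
    """Report duplicated keys and source/sink channel wiring problems for one agent."""
    pairs = parse_properties(text)
    problems = [f"duplicate key: {k}" for k, n in Counter(k for k, _ in pairs).items() if n > 1]
    props = dict(pairs)  # later lines win, as with java.util.Properties
    declared = set(props.get(f"{agent}.channels", "").split())
    # Sources use the plural 'channels' key, sinks the singular 'channel' key
    for kind, suffix in (("sources", "channels"), ("sinks", "channel")):
        for name in props.get(f"{agent}.{kind}", "").split():
            key = f"{agent}.{kind}.{name}.{suffix}"
            wired = props.get(key, "").split()
            if not wired:
                problems.append(f"{kind[:-1]} {name} has no channel ({key})")
            problems += [f"{kind[:-1]} {name} uses undeclared channel {ch}"
                         for ch in wired if ch not in declared]
    return problems

SAMPLE = """
agentTest.channels = channelTest
agentTest.sources = sourceTest
agentTest.sinks = sinkTest
agentTest.sinks.sinkTest.channel = channelTest
agentTest.sinks.sinkTest.channel = channelTest
agentTest.sources.sourceTest.channels=channelTest
"""

print(check_agent(SAMPLE, "agentTest"))
# ['duplicate key: agentTest.sinks.sinkTest.channel']
```

A duplicated identical line is harmless at runtime (the last value wins), but flagging it helps catch copy-paste edits where the two values differ.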

3. Start flume-sql.conf with flume-ng and test

Start a Kafka console consumer listening on the topic:


  kafka-console-consumer.sh --zookeeper 192.168.168.200:2181 --topic TestTopic

Start the agent defined in flume-sql.conf with flume-ng:


  flume-ng agent --conf conf --conf-file /usr/local/flume/flume-sql.conf --name agentTest -Dflume.root.logger=INFO,console

The TestTopic consumer console prints:


  [root@master ~]# kafka-console-consumer.sh --zookeeper 192.168.168.200:2181 --topic TestTopic
  "'2017-08-01 07:06:20'","1","Test increment Data"
  "'2017-08-02 07:06:20'","2","Test increment Data"
  "'2017-08-03 07:06:20'","3","Test increment Data"
  "'2017-08-04 07:06:20'","4","Test increment Data"
  "'2017-08-05 07:06:20'","5","Test increment Data"
  "'2017-08-06 07:06:20'","6","Test increment Data"

Next, check the status file /usr/local/flume/agentTest.sqlSource.status. Its location comes from these two settings:

  agentTest.sources.sourceTest.status.file.path = /usr/local/flume
  agentTest.sources.sourceTest.status.file.name = agentTest.sqlSource.status

Its content:

  {"SourceName":"sourceTest","URL":"jdbc:oracle:thin:@192.168.168.100:1521\/orcl","LastIndex":"'2017-08-06 07:06:20'","Query":"SELECT CHR(39)||TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS')||CHR(39) AS INCREMENTAL,ID,MSG FROM FLUME_NG_SQL_SOURCE WHERE TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS') > $@$ ORDER BY INCREMENTAL ASC"}

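The entry "LastIndex":"'2017-08-06 07:06:20'" shows that the last incremental value read is '2017-08-06 07:06:20'. In other words, the next time WHERE TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS') > $@$ runs, $@$ is replaced with '2017-08-06 07:06:20', so only newer rows are returned.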
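To make the $@$ mechanism concrete, here is a minimal Python simulation that uses an in-memory SQLite table in place of Oracle. It is a sketch under stated assumptions, not the library's code: SQLite's `char(39)` stands in for Oracle's `CHR(39)`, createTime is stored as 'YYYY-MM-DD HH:MM:SS' text so that plain string comparison behaves like the TO_CHAR comparison, and `insert_days`/`poll` are hypothetical helpers. Each round substitutes the current index for $@$ as text, then takes the first column of the last returned row as the new index, which matches the LastIndex values recorded in the status file.

```python
import sqlite3

# Modeled on the Query recorded in the status file; SQLite's char(39) stands in for
# Oracle's CHR(39), and createTime is text in 'YYYY-MM-DD HH:MM:SS' form.
QUERY = ("SELECT char(39) || createTime || char(39) AS INCREMENTAL, id, msg "
         "FROM flume_ng_sql_source WHERE createTime > $@$ ORDER BY INCREMENTAL ASC")

def insert_days(conn, days):
    """Hypothetical helper: insert one demo row per August 2017 day number."""
    conn.executemany(
        "INSERT INTO flume_ng_sql_source (id, msg, createTime) "
        "VALUES (?, 'Test increment Data', ?)",
        [(str(d), f"2017-08-{d:02d} 07:06:20") for d in days])

def poll(conn, last_index, max_rows=1000):
    """One polling round: replace $@$ textually, run the query, advance the index."""
    rows = conn.execute(QUERY.replace("$@$", last_index)).fetchmany(max_rows)
    # The new index is the first column of the last row; keep the old one if nothing new.
    new_index = rows[-1][0] if rows else last_index
    return rows, new_index

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flume_ng_sql_source "
             "(id TEXT PRIMARY KEY, msg TEXT, createTime TEXT NOT NULL)")

index = "'2017-07-31 07:06:20'"  # same value as start.from, quotes included
rounds = []
for new_days in (range(1, 7), [], range(7, 11)):  # initial load, idle round, new inserts
    insert_days(conn, new_days)
    rows, index = poll(conn, index)
    rounds.append((len(rows), index))

print(rounds)
# [(6, "'2017-08-06 07:06:20'"), (0, "'2017-08-06 07:06:20'"), (4, "'2017-08-10 07:06:20'")]
```

This also shows why start.from and the selected column both carry literal single quotes: the index is pasted into the SQL as-is, so it must already be a valid string literal. Plain text substitution is only acceptable here because the value comes from the database itself.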

Insert incremental data into the flume_ng_sql_source table:


  insert into flume_ng_sql_source(id,msg,createTime) values('7','Test increment Data',to_date('2017-08-07 07:06:20','yyyy-mm-dd hh24:mi:ss'));
  insert into flume_ng_sql_source(id,msg,createTime) values('8','Test increment Data',to_date('2017-08-08 07:06:20','yyyy-mm-dd hh24:mi:ss'));
  insert into flume_ng_sql_source(id,msg,createTime) values('9','Test increment Data',to_date('2017-08-09 07:06:20','yyyy-mm-dd hh24:mi:ss'));
  insert into flume_ng_sql_source(id,msg,createTime) values('10','Test increment Data',to_date('2017-08-10 07:06:20','yyyy-mm-dd hh24:mi:ss'));
  commit;

The TestTopic consumer console now prints:


  [root@master ~]# kafka-console-consumer.sh --zookeeper 192.168.168.200:2181 --topic TestTopic
  "'2017-08-01 07:06:20'","1","Test increment Data"
  "'2017-08-02 07:06:20'","2","Test increment Data"
  "'2017-08-03 07:06:20'","3","Test increment Data"
  "'2017-08-04 07:06:20'","4","Test increment Data"
  "'2017-08-05 07:06:20'","5","Test increment Data"
  "'2017-08-06 07:06:20'","6","Test increment Data"
  "'2017-08-07 07:06:20'","7","Test increment Data"
  "'2017-08-08 07:06:20'","8","Test increment Data"
  "'2017-08-09 07:06:20'","9","Test increment Data"
  "'2017-08-10 07:06:20'","10","Test increment Data"

Check the status file /usr/local/flume/agentTest.sqlSource.status again:


  {"SourceName":"sourceTest","URL":"jdbc:oracle:thin:@192.168.168.100:1521\/orcl","LastIndex":"'2017-08-10 07:06:20'","Query":"SELECT CHR(39)||TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS')||CHR(39) AS INCREMENTAL,ID,MSG FROM FLUME_NG_SQL_SOURCE WHERE TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS') > $@$ ORDER BY INCREMENTAL ASC"}

"LastIndex":"'2017-08-10 07:06:20'" shows that the index has advanced to the last inserted row.

At this point, flume-ng-sql-source is successfully reading incremental data from Oracle.
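The status file is plain JSON (the \/ in the URL is just an escaped slash), so the current watermark can be inspected from a script. This is a minimal Python sketch; `parse_status` is a hypothetical helper, and the sample text is copied from the status file shown above.

```python
import json

# Content of agentTest.sqlSource.status as printed above
SAMPLE_STATUS = r'''{"SourceName":"sourceTest","URL":"jdbc:oracle:thin:@192.168.168.100:1521\/orcl","LastIndex":"'2017-08-10 07:06:20'","Query":"SELECT CHR(39)||TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS')||CHR(39) AS INCREMENTAL,ID,MSG FROM FLUME_NG_SQL_SOURCE WHERE TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS') > $@$ ORDER BY INCREMENTAL ASC"}'''

def parse_status(text):
    """Return (LastIndex, URL) from the JSON status written by the SQL source."""
    status = json.loads(text)
    return status["LastIndex"], status["URL"]

last_index, url = parse_status(SAMPLE_STATUS)
print(last_index)  # '2017-08-10 07:06:20'
print(url)         # jdbc:oracle:thin:@192.168.168.100:1521/orcl
```

On the Flume host, the same function can be fed the real file, e.g. `parse_status(open("/usr/local/flume/agentTest.sqlSource.status").read())`.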

V. Configuration reference

Source: https://github.com/keedio/flume-ng-sql-source

Configuration of SQL Source (the upstream README marks the mandatory properties in bold):

| Property Name | Default | Description |
|---|---|---|
| channels | - | Connected channel names |
| type | - | The component type name; must be org.keedio.flume.source.SQLSource |
| hibernate.connection.url | - | URL to connect to the remote database |
| hibernate.connection.user | - | Username to connect to the database |
| hibernate.connection.password | - | Password to connect to the database |
| table | - | Table to export data from |
| status.file.name | - | Local file name used to save the last row number read |
| status.file.path | /var/lib/flume | Path to save the status file |
| start.from | 0 | Start value to import data from |
| delimiter.entry | , | Delimiter of incoming entries |
| enclose.by.quotes | true | Whether quotes are applied to all values in the output |
| columns.to.select | * | Which columns of the table will be selected |
| run.query.delay | 10000 | Milliseconds to wait between query runs |
| batch.size | 100 | Batch size for sending events to the Flume channel |
| max.rows | 10000 | Maximum rows to import per query |
| read.only | false | Sets a read-only session with the database |
| custom.query | - | Custom query to force a special request to the DB; be careful. See the explanation of this property below. |
| hibernate.connection.driver_class | - | Driver class for Hibernate to use; if not specified, the framework will assign one automatically |
| hibernate.dialect | - | Dialect for Hibernate to use; if not specified, the framework will assign one automatically. See https://docs.jboss.org/hibernate/orm/4.3/manual/en-US/html/ch03.html#configuration-optional-dialects for a complete list of available dialects |
| hibernate.connection.provider_class | - | Set to org.hibernate.connection.C3P0ConnectionProvider to use the C3P0 connection pool (recommended for production) |
| hibernate.c3p0.min_size | - | Minimum connection pool size |
| hibernate.c3p0.max_size | - | Maximum connection pool size |
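The delimiter.entry and enclose.by.quotes settings explain the shape of the Kafka messages in the demo: every value is wrapped in double quotes and joined with commas, while the single quotes around the timestamp come from the CHR(39) concatenation in the query. The sketch below reproduces that formatting with Python's standard csv module. It is an illustration under that assumption, not the library's own writer, and `to_event_body` is a hypothetical name.

```python
import csv
import io

def to_event_body(row, delimiter=",", enclose_by_quotes=True):
    """Join one result row into a line, quoting every value when enclose_by_quotes is set."""
    buf = io.StringIO()
    quoting = csv.QUOTE_ALL if enclose_by_quotes else csv.QUOTE_MINIMAL
    csv.writer(buf, delimiter=delimiter, quoting=quoting, lineterminator="").writerow(row)
    return buf.getvalue()

row = ["'2017-08-01 07:06:20'", "1", "Test increment Data"]  # first row of the demo
print(to_event_body(row))
# "'2017-08-01 07:06:20'","1","Test increment Data"
```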

Custom Query

Custom queries let you use the full SQL language. This is powerful but risky, so be careful with the custom queries you use.

To avoid exporting rows more than once, use the special $@$ placeholder in the WHERE clause; this incrementally exports the rows not yet processed as well as newly inserted ones.

IMPORTANT: For a custom query to work properly, make sure the incremental field is returned in the first position of the query result.

Example:


  agent.sources.sql-source.custom.query = SELECT incrementalField,field2 FROM table1 WHERE incrementalField > $@$
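Why the column order matters: per the IMPORTANT note above and the LastIndex values in the demo's status file, the value saved for $@$ is the first column of the last row returned. The hypothetical helper below renders the next query for the README example both ways, to show what goes wrong when the incremental field is not first.

```python
QUERY = "SELECT incrementalField,field2 FROM table1 WHERE incrementalField > $@$"

def render_next_query(rows, current_index, query=QUERY):
    """Substitute $@$ with the first column of the last row (or keep the old index)."""
    last_index = rows[-1][0] if rows else current_index
    return query.replace("$@$", str(last_index))

# Incremental field first, as the README requires
good_rows = [("'2017-08-09 07:06:20'", "9"), ("'2017-08-10 07:06:20'", "10")]
# Same rows with the incremental field moved to the second position
bad_rows = [(ident, ts) for ts, ident in good_rows]

print(render_next_query(good_rows, "'2017-08-08 07:06:20'"))
# SELECT incrementalField,field2 FROM table1 WHERE incrementalField > '2017-08-10 07:06:20'
print(render_next_query(bad_rows, "'2017-08-08 07:06:20'"))
# SELECT incrementalField,field2 FROM table1 WHERE incrementalField > 10
```

With the wrong order, the id value ends up as the watermark, so the next query compares timestamps against an unrelated number and either fails or silently re-reads or skips rows.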

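In short: to avoid problems, put the incremental field in the first position of the SELECT list. As the status files above show, LastIndex is taken from the first column of the last row read, which is why the demo query selects the quoted CREATETIME first.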
