StreamSets Origins Overview
An origin is the entry point, or source, of a StreamSets pipeline; a pipeline can use only one origin. Pipelines running in different execution modes can use different origins:
- Standalone mode
- Cluster mode
- Edge mode (agent)
- Development mode (for easy testing)
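To make the rule above concrete, here is a minimal sketch in plain Python. It is purely illustrative: real StreamSets pipelines are configured through the Data Collector UI or REST API, not code. The mode table below uses only a small subset of the origins listed later in this post, and the function names are my own.

```python
# Illustrative sketch only, not StreamSets code. It models the rule stated
# above: a pipeline declares exactly one origin, and that origin must be
# supported by the pipeline's execution mode.
ORIGINS_BY_MODE = {
    "standalone": {"Directory", "HTTP Client", "Kafka Consumer", "JDBC Query Consumer"},
    "cluster": {"Hadoop FS", "Kafka Consumer", "MapR FS", "MapR Streams Consumer"},
    "edge": {"Directory", "File Tail", "HTTP Client", "MQTT Subscriber"},
}

def validate_pipeline(mode, origins):
    """Check that exactly one origin is declared and that it matches the mode."""
    if len(origins) != 1:
        raise ValueError("a pipeline can use only one origin")
    origin = origins[0]
    if origin not in ORIGINS_BY_MODE.get(mode, set()):
        raise ValueError(f"{origin!r} is not available in {mode} mode")
    return origin

print(validate_pipeline("standalone", ["Kafka Consumer"]))  # Kafka Consumer
```

Note that Kafka Consumer appears under both standalone and cluster mode, which matches the docs: the same origin name can exist in several execution modes, sometimes as distinct stage versions.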
Standalone Mode Origins
In standalone pipelines, you can use the following origins:
- Amazon S3 - Reads objects from Amazon S3.
- Amazon SQS Consumer - Reads data from queues in Amazon Simple Queue Services (SQS).
- Azure IoT/Event Hub Consumer - Reads data from Microsoft Azure Event Hub. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- CoAP Server - Listens on a CoAP endpoint and processes the contents of all authorized CoAP requests. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- Directory - Reads fully-written files from a directory. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- Elasticsearch - Reads data from an Elasticsearch cluster. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- File Tail - Reads lines of data from an active file after reading related archived files in the directory.
- Google BigQuery - Executes a query job and reads the result from Google BigQuery.
- Google Cloud Storage - Reads fully written objects from Google Cloud Storage.
- Google Pub/Sub Subscriber - Consumes messages from a Google Pub/Sub subscription. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- Hadoop FS Standalone - Reads fully-written files from HDFS. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- HTTP Client - Reads data from a streaming HTTP resource URL.
- HTTP Server - Listens on an HTTP endpoint and processes the contents of all authorized HTTP POST and PUT requests. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- HTTP to Kafka (Deprecated) - Listens on an HTTP endpoint and writes the contents of all authorized HTTP POST requests directly to Kafka.
- JDBC Multitable Consumer - Reads database data from multiple tables through a JDBC connection. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- JDBC Query Consumer - Reads database data using a user-defined SQL query through a JDBC connection.
- JMS Consumer - Reads messages from JMS.
- Kafka Consumer - Reads messages from a single Kafka topic.
- Kafka Multitopic Consumer - Reads messages from multiple Kafka topics. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- Kinesis Consumer - Reads data from Kinesis Streams. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- MapR DB CDC - Reads changed MapR DB data that has been written to MapR Streams. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- MapR DB JSON - Reads JSON documents from MapR DB JSON tables.
- MapR FS - Reads files from MapR FS.
- MapR FS Standalone - Reads fully-written files from MapR FS. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- MapR Multitopic Streams Consumer - Reads messages from multiple MapR Streams topics. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- MapR Streams Consumer - Reads messages from MapR Streams.
- MongoDB - Reads documents from MongoDB.
- MongoDB Oplog - Reads entries from a MongoDB Oplog.
- MQTT Subscriber - Subscribes to a topic on an MQTT broker to read messages from the broker.
- MySQL Binary Log - Reads MySQL binary logs to generate change data capture records.
- Omniture - Reads web usage reports from the Omniture reporting API.
- OPC UA Client - Reads data from an OPC UA server.
- Oracle CDC Client - Reads LogMiner redo logs to generate change data capture records.
- PostgreSQL CDC Client - Reads PostgreSQL WAL data to generate change data capture records.
- RabbitMQ Consumer - Reads messages from RabbitMQ.
- Redis Consumer - Reads messages from Redis.
- REST Service - Listens on an HTTP endpoint, parses the contents of all authorized requests, and sends responses back to the originating REST API. Creates multiple threads to enable parallel processing in a multithreaded pipeline. Use as part of a microservice pipeline.
- Salesforce - Reads data from Salesforce.
- SDC RPC - Reads data from an SDC RPC destination in an SDC RPC pipeline.
- SDC RPC to Kafka (Deprecated) - Reads data from an SDC RPC destination in an SDC RPC pipeline and writes it to Kafka.
- SFTP/FTP Client - Reads files from an SFTP or FTP server.
- SQL Server CDC Client - Reads data from Microsoft SQL Server CDC tables. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- SQL Server Change Tracking - Reads data from Microsoft SQL Server change tracking tables and generates the latest version of each record. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- TCP Server - Listens at the specified ports and processes incoming data over TCP/IP connections. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- UDP Multithreaded Source - Reads messages from one or more UDP ports. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- UDP Source - Reads messages from one or more UDP ports.
- UDP to Kafka (Deprecated) - Reads messages from one or more UDP ports and writes the data to Kafka.
- WebSocket Client - Reads data from a WebSocket server endpoint.
- WebSocket Server - Listens on a WebSocket endpoint and processes the contents of all authorized WebSocket client requests. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
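Many of the origins above note that they "create multiple threads to enable parallel processing in a multithreaded pipeline." The sketch below is generic Python, not StreamSets internals; it only illustrates the underlying pattern, where each worker thread consumes its own slice of the input (for example, one Kafka partition or one directory shard) and hands records to a shared downstream queue.

```python
# Hedged concept sketch of a multithreaded origin: N reader threads, each
# consuming an independent partition in parallel, feeding one output queue.
# This is NOT how StreamSets implements its origins internally.
import queue
import threading

def run_multithreaded_origin(partitions):
    """Read every record from every partition, one thread per partition."""
    out = queue.Queue()

    def reader(records):
        for record in records:
            out.put(record)  # hand the record to the rest of the pipeline

    threads = [threading.Thread(target=reader, args=(p,)) for p in partitions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Thread interleaving makes arrival order nondeterministic; sort only to
    # produce a stable result for demonstration.
    return sorted(out.queue)

print(run_multithreaded_origin([["a1", "a2"], ["b1"], ["c1", "c2"]]))
# ['a1', 'a2', 'b1', 'c1', 'c2']
```

The key property to notice: parallelism comes from partitioning the source, so records within one partition keep their order while records across partitions may interleave, which is the same trade-off the multithreaded StreamSets origins document.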
Cluster Mode Origins
In cluster pipelines, you can use the following origins:
- Hadoop FS - Reads data from HDFS, Amazon S3, or other file systems using the Hadoop FileSystem interface.
- Kafka Consumer - Reads messages from Kafka. Use the cluster version of the origin.
- MapR FS - Reads data from MapR FS.
- MapR Streams Consumer - Reads messages from MapR Streams.
Edge Mode Origins
In edge pipelines, you can use the following origins:
- Directory - Reads fully-written files from a directory.
- File Tail - Reads lines of data from an active file after reading related archived files in the directory.
- HTTP Client - Reads data from a streaming HTTP resource URL.
- HTTP Server - Listens on an HTTP endpoint and processes the contents of all authorized HTTP POST and PUT requests.
- MQTT Subscriber - Subscribes to a topic on an MQTT broker to read messages from the broker.
- System Metrics - Reads system metrics from the edge device where SDC Edge is installed.
- WebSocket Client - Reads data from a WebSocket server endpoint.
- Windows Event Log - Reads data from a Microsoft Windows event log located on a Windows machine.
Development Origins
To help create or test pipelines, you can use the following development origins:
- Dev Data Generator
- Dev Random Source
- Dev Raw Data Source
- Dev SDC RPC with Buffering
- Dev Snapshot Replaying
- Sensor Reader
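These development origins emit synthetic records so you can exercise a pipeline before a real source exists. As a rough stand-in for what something like Dev Data Generator produces, here is a small Python sketch; the field names and ranges are my own invention, since the real stage's output is whatever fields you configure in the Data Collector UI.

```python
# Concept sketch only: the real Dev Data Generator origin is configured in
# the Data Collector UI (record fields, batch size, delay). This stand-in
# just shows the kind of synthetic batches such an origin emits for testing.
import random

def dev_data_generator(batch_size, seed=None):
    """Return one batch of synthetic records, like a development origin would."""
    rng = random.Random(seed)  # seedable, so test runs are reproducible
    return [
        {"id": i, "value": rng.randint(0, 100), "active": rng.choice([True, False])}
        for i in range(batch_size)
    ]

batch = dev_data_generator(batch_size=3, seed=42)
print(len(batch), sorted(batch[0].keys()))  # 3 ['active', 'id', 'value']
```

Seeding the generator mirrors a useful testing practice: deterministic fake data makes pipeline previews and regression checks repeatable.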
我们通常使用的 R 版本是单线程的,即只使用一个 CPU 线程运行所有 R 代码.这样的好处是运行模型比较简单且安全,但是它并没有利用多核计算.Microsoft R Open(MRO,https:/ ...