An origin is the source entry point of a StreamSets pipeline, and a pipeline can use only one origin (a minimal sketch follows the mode list below).
Pipelines running in different execution modes support different origins:

  • Standalone mode
  • Cluster mode
  • Edge mode (agent)
  • Development mode (for testing)
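
To make the one-origin rule concrete, here is a minimal sketch using the StreamSets SDK for Python. It assumes the SDK is installed and a Data Collector instance is reachable at the URL shown; the pipeline name is illustrative:

```python
# Minimal sketch of the one-origin rule, using the StreamSets SDK for Python.
# Assumes the SDK is installed and a Data Collector runs at the URL below.
from streamsets.sdk import DataCollector

data_collector = DataCollector('http://localhost:18630')  # illustrative URL
builder = data_collector.get_pipeline_builder()

# A pipeline has exactly one origin; everything downstream hangs off it.
origin = builder.add_stage('Dev Raw Data Source')
trash = builder.add_stage('Trash')  # a destination that discards records
origin >> trash                     # connect origin -> destination

pipeline = builder.build('single-origin-demo')
data_collector.add_pipeline(pipeline)
```

The lists below enumerate which origins are available in each execution mode.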

Standalone mode origins

In standalone pipelines, you can use the following origins:

  • Amazon S3 - Reads objects from Amazon S3.
  • Amazon SQS Consumer - Reads data from queues in Amazon Simple Queue Services (SQS).
  • Azure IoT/Event Hub Consumer - Reads data from Microsoft Azure Event Hub. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
  • CoAP Server - Listens on a CoAP endpoint and processes the contents of all authorized CoAP requests. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
  • Directory - Reads fully-written files from a directory. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
  • Elasticsearch - Reads data from an Elasticsearch cluster. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
  • File Tail - Reads lines of data from an active file after reading related archived files in the directory.
  • Google BigQuery - Executes a query job and reads the result from Google BigQuery.
  • Google Cloud Storage - Reads fully written objects from Google Cloud Storage.
  • Google Pub/Sub Subscriber - Consumes messages from a Google Pub/Sub subscription. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
  • Hadoop FS Standalone - Reads fully-written files from HDFS. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
  • HTTP Client - Reads data from a streaming HTTP resource URL.
  • HTTP Server - Listens on an HTTP endpoint and processes the contents of all authorized HTTP POST and PUT requests. Creates multiple threads to enable parallel processing in a multithreaded pipeline. (See the client sketch after this list.)
  • HTTP to Kafka (Deprecated) - Listens on an HTTP endpoint and writes the contents of all authorized HTTP POST requests directly to Kafka.
  • JDBC Multitable Consumer - Reads database data from multiple tables through a JDBC connection. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
  • JDBC Query Consumer - Reads database data using a user-defined SQL query through a JDBC connection.
  • JMS Consumer - Reads messages from JMS.
  • Kafka Consumer - Reads messages from a single Kafka topic.
  • Kafka Multitopic Consumer - Reads messages from multiple Kafka topics. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
  • Kinesis Consumer - Reads data from Kinesis Streams. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
  • MapR DB CDC - Reads changed MapR DB data that has been written to MapR Streams. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
  • MapR DB JSON - Reads JSON documents from MapR DB JSON tables.
  • MapR FS - Reads files from MapR FS.
  • MapR FS Standalone - Reads fully-written files from MapR FS. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
  • MapR Multitopic Streams Consumer - Reads messages from multiple MapR Streams topics. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
  • MapR Streams Consumer - Reads messages from MapR Streams.
  • MongoDB - Reads documents from MongoDB.
  • MongoDB Oplog - Reads entries from a MongoDB Oplog.
  • MQTT Subscriber - Subscribes to a topic on an MQTT broker to read messages from the broker.
  • MySQL Binary Log - Reads MySQL binary logs to generate change data capture records.
  • Omniture - Reads web usage reports from the Omniture reporting API.
  • OPC UA Client - Reads data from an OPC UA server.
  • Oracle CDC Client - Reads LogMiner redo logs to generate change data capture records.
  • PostgreSQL CDC Client - Reads PostgreSQL WAL data to generate change data capture records.
  • RabbitMQ Consumer - Reads messages from RabbitMQ.
  • Redis Consumer - Reads messages from Redis.
  • REST Service - Listens on an HTTP endpoint, parses the contents of all authorized requests, and sends responses back to the originating REST API. Creates multiple threads to enable parallel processing in a multithreaded pipeline. Use as part of a microservice pipeline.
  • Salesforce - Reads data from Salesforce.
  • SDC RPC - Reads data from an SDC RPC destination in an SDC RPC pipeline.
  • SDC RPC to Kafka (Deprecated) - Reads data from an SDC RPC destination in an SDC RPC pipeline and writes it to Kafka.
  • SFTP/FTP Client - Reads files from an SFTP or FTP server.
  • SQL Server CDC Client - Reads data from Microsoft SQL Server CDC tables. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
  • SQL Server Change Tracking - Reads data from Microsoft SQL Server change tracking tables and generates the latest version of each record. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
  • TCP Server - Listens at the specified ports and processes incoming data over TCP/IP connections. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
  • UDP Multithreaded Source - Reads messages from one or more UDP ports. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
  • UDP Source - Reads messages from one or more UDP ports.
  • UDP to Kafka (Deprecated) - Reads messages from one or more UDP ports and writes the data to Kafka.
  • WebSocket Client - Reads data from a WebSocket server endpoint.
  • WebSocket Server - Listens on a WebSocket endpoint and processes the contents of all authorized WebSocket client requests. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
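
As a concrete usage example for one of the listening origins above, the sketch below pushes a JSON record to an HTTP Server origin. It assumes the origin is already running and configured to listen on port 8000 with application ID 'myAppId'; the port, application ID, and payload are all illustrative:

```python
# Hedged sketch: push one JSON record to a running HTTP Server origin.
# Port and application ID are illustrative; both are set in the origin's
# configuration inside the pipeline.
import requests

response = requests.post(
    'http://localhost:8000',
    headers={
        'X-SDC-APPLICATION-ID': 'myAppId',  # must match the origin's App ID
        'Content-Type': 'application/json',
    },
    data='{"sensor": "s1", "value": 42}',
)
print(response.status_code)  # 200 means the origin accepted the payload
```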

Cluster mode origins

In cluster pipelines, you can use the following origins:

  • Hadoop FS - Reads data from HDFS, Amazon S3, or other file systems using the Hadoop FileSystem interface.
  • Kafka Consumer - Reads messages from Kafka. Use the cluster version of the origin.
  • MapR FS - Reads data from MapR FS.
  • MapR Streams Consumer - Reads messages from MapR Streams.

Edge mode origins

In edge pipelines, you can use the following origins:

  • Directory - Reads fully-written files from a directory.
  • File Tail - Reads lines of data from an active file after reading related archived files in the directory.
  • HTTP Client - Reads data from a streaming HTTP resource URL.
  • HTTP Server - Listens on an HTTP endpoint and processes the contents of all authorized HTTP POST and PUT requests.
  • MQTT Subscriber - Subscribes to a topic on an MQTT broker to read messages from the broker.
  • System Metrics - Reads system metrics from the edge device where SDC Edge is installed.
  • WebSocket Client - Reads data from a WebSocket server endpoint.
  • Windows Event Log - Reads data from a Microsoft Windows event log located on a Windows machine.

Development origins

To help create or test pipelines, you can use the following development origins (a usage sketch follows the list):

  • Dev Data Generator
  • Dev Random Source
  • Dev Raw Data Source
  • Dev SDC RPC with Buffering
  • Dev Snapshot Replaying
  • Sensor Reader
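
These origins generate or replay data so a pipeline can be exercised without any external system. As a hedged sketch (again via the StreamSets SDK for Python), the snippet below feeds one fixed JSON record through a Dev Raw Data Source; the attribute names mirror the origin's configuration properties and should be treated as assumptions to verify against your SDK version:

```python
# Hedged sketch: use Dev Raw Data Source as a self-contained test fixture.
# Attribute names (data_format, raw_data, stop_after_first_batch) mirror the
# origin's configuration labels and are assumptions; verify in your SDK.
from streamsets.sdk import DataCollector

data_collector = DataCollector('http://localhost:18630')  # illustrative URL
builder = data_collector.get_pipeline_builder()

dev_source = builder.add_stage('Dev Raw Data Source')
dev_source.data_format = 'JSON'                    # parse raw_data as JSON
dev_source.raw_data = '{"id": 1, "name": "test"}'  # fixed test record
dev_source.stop_after_first_batch = True           # emit one batch, then stop

trash = builder.add_stage('Trash')
dev_source >> trash

pipeline = builder.build('dev-origin-test')
data_collector.add_pipeline(pipeline)
data_collector.start_pipeline(pipeline)  # run the single test batch
```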

References

https://streamsets.com/documentation/datacollector/latest/help/datacollector/UserGuide/Origins/Origins_overview.html#concept_hpr_twm_jq__section_tvn_4bc_f2b
