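A processor represents one kind of data processing operation, and a pipeline can apply multiple processors.
Depending on the execution mode, processors fall into standalone, cluster, and edge (agent) groups,
plus a set of development processors that help with testing.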

Standalone pipelines only

  • Record Deduplicator - Removes duplicate records.
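
To make "removes duplicate records" concrete, here is a minimal conceptual sketch in plain Python. It is not StreamSets code: the function name `deduplicate`, the `fields` parameter, and the choice of hashing a canonical JSON form are all assumptions made for illustration.

```python
import hashlib
import json


def deduplicate(records, fields=None):
    """Split records into (unique, duplicates).

    Hypothetical helper: compares only `fields` when given, otherwise
    compares every field of the record.
    """
    seen = set()
    unique, duplicates = [], []
    for record in records:
        compared = record if fields is None else {f: record.get(f) for f in fields}
        # Hash a canonical JSON form so that key order does not matter.
        canonical = json.dumps(compared, sort_keys=True, default=str)
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates.append(record)
        else:
            seen.add(digest)
            unique.append(record)
    return unique, duplicates


records = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"name": "a", "id": 1},  # same content as the first record
]
unique, duplicates = deduplicate(records)
by_name_unique, by_name_duplicates = deduplicate(records, fields=["name"])
```

The sketch keeps the set of seen hashes in memory for the whole run, which may be one reason this kind of processing is tied to a single instance rather than a cluster.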

Standalone and cluster pipelines

  • Aggregator - Performs aggregations and displays the results in Monitor mode and writes the results to events when enabled. This processor does not update the records being evaluated.
  • Base64 Field Decoder - Decodes Base64 encoded data to binary data.
  • Base64 Field Encoder - Encodes binary data using Base64.
  • Data Parser - Parses NetFlow or syslog data embedded in a field.
  • Delay - Delays passing a batch to the rest of the pipeline.
  • Expression Evaluator - Performs calculations on data. Can also add or modify record header attributes.
  • Field Flattener - Flattens nested fields.
  • Field Hasher - Uses an algorithm to encode sensitive data.
  • Field Masker - Masks sensitive string data.
  • Field Merger - Merges fields in complex lists or maps.
  • Field Order - Orders fields in a map or list-map root field type and outputs the fields into a list-map or list root field type.
  • Field Pivoter - Pivots data in a list, map, or list-map field and creates a record for each item in the field.
  • Field Remover - Removes fields from a record.
  • Field Renamer - Renames fields in a record.
  • Field Replacer - Replaces field values.
  • Field Splitter - Splits the string values in a field into different fields.
  • Field Type Converter - Converts the data types of fields.
  • Field Zip - Merges list data from two fields.
  • Geo IP - Returns geolocation and IP intelligence information for a specified IP address.
  • Groovy Evaluator - Processes records based on custom Groovy code.
  • HBase Lookup - Performs key-value lookups in HBase to enrich records with data.
  • Hive Metadata - Works with the Hive Metastore destination as part of the Drift Synchronization Solution for Hive.
  • HTTP Client - Sends requests to an HTTP resource URL and writes the results to a field.
  • JavaScript Evaluator - Processes records based on custom JavaScript code.
  • JDBC Lookup - Performs lookups in a database table through a JDBC connection.
  • JDBC Tee - Writes data to a database table through a JDBC connection, and enriches records with data from generated database columns.
  • JSON Generator - Serializes data from a field to a JSON-encoded string.
  • JSON Parser - Parses a JSON object embedded in a string field.
  • Jython Evaluator - Processes records based on custom Jython code.
  • Kudu Lookup - Performs lookups in Kudu to enrich records with data.
  • Log Parser - Parses log data in a field based on the specified log format.
  • PostgreSQL Metadata - Tracks structural changes in source data then creates and alters PostgreSQL tables as part of the Drift Synchronization Solution for PostgreSQL.
  • Redis Lookup - Performs key-value lookups in Redis to enrich records with data.
  • Salesforce Lookup - Performs lookups in Salesforce to enrich records with data.
  • Schema Generator - Generates a schema for each record and writes the schema to a record header attribute.
  • Spark Evaluator - Processes data based on a custom Spark application.
  • SQL Parser - Parses SQL queries in a string field.
  • Static Lookup - Performs key-value lookups in local memory.
  • Stream Selector - Routes data to different streams based on conditions.
  • Value Replacer (Deprecated) - Replaces existing nulls or specified values with constants or nulls.
  • Whole File Transformer - Transforms Avro files to Parquet.
  • XML Flattener - Flattens XML data in a string field.
  • XML Parser - Parses XML data in a string field.
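
As an illustration of what one of these field-level operations does, the sketch below flattens nested fields the way the Field Flattener entry describes. It is plain Python rather than StreamSets code; the function name `flatten`, the `.` separator, and the use of list indexes in the flattened names are assumptions of this sketch, not confirmed processor behavior.

```python
def flatten(value, prefix="", sep="."):
    """Flatten nested dicts and lists into a single-level dict.

    Hypothetical helper: nested keys are joined with `sep`, and list
    items are addressed by their index.
    """
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        # A leaf value: emit it under the name built so far.
        return {prefix: value}

    flat = {}
    for key, child in items:
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(child, name, sep))
    return flat


record = {"user": {"name": "ann", "tags": ["x", "y"]}, "id": 7}
flat_record = flatten(record)
```

Note that in this sketch an empty map or list contributes no output fields at all.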

Edge pipelines

  • Expression Evaluator - Performs calculations on data. Can also add or modify record header attributes.
  • Field Remover - Removes fields from a record.
  • JavaScript Evaluator - Processes records based on custom JavaScript code.
  • Stream Selector - Routes data to different streams based on conditions.
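
Condition-based routing, as in the Stream Selector entry, can be sketched in a few lines of plain Python. This is not StreamSets code: the function name `route`, the `default` stream name, and the rule that a record goes to every stream whose condition matches (and to the default stream only when none match) are assumptions of this sketch.

```python
def route(records, conditions):
    """Route each record to the streams whose condition it satisfies.

    Hypothetical helper: `conditions` maps a stream name to a predicate.
    Records matching no condition go to the "default" stream.
    """
    streams = {name: [] for name in conditions}
    streams["default"] = []
    for record in records:
        matched = [name for name, predicate in conditions.items() if predicate(record)]
        for name in matched or ["default"]:
            streams[name].append(record)
    return streams


conditions = {
    "errors": lambda r: r["level"] == "ERROR",
    "slow": lambda r: r["ms"] > 1000,
}
records = [
    {"level": "INFO", "ms": 20},
    {"level": "ERROR", "ms": 5000},
    {"level": "INFO", "ms": 1500},
]
streams = route(records, conditions)
```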

Test processors

  • Dev Identity
  • Dev Random Error
  • Dev Record Creator

References

https://streamsets.com/documentation/datacollector/latest/help/datacollector/UserGuide/Processors/Processors_overview.html#concept_hpr_twm_jq

 
 
 
 
