Spring Batch Framework– introduction chapter(下)
Extract,Transform, and load(ETL)
Briefly stated, ETL is a process in the database anddata-warehousing world that performs the following steps:
- Extracts data from an external data source
- Transforms the extracted data to match a specific purpose
- Loads the transformed data into a data target; a database or data warehouse.
Many products, both free and commercial, can help create ETLprocesses. This is a bigger topic than we can address here, bt it isn’t alwaysas simple as these three steps. Writing an ETL process can present its own setof challenges invloving parallel processing, rerunnability, and recoverability.The ETL community has developeed its own set of best practices to meet theseand other requirements.
For the prurpose of our discussion, this ETL process is ablack box; it could be implemented with an ETL tool(like Talend) or even withanother Spring Batch job.
Spring Batch includes many ready-to-use components to readfrom and write to daa stores like files and databases.
Chunk Processing is particularly well suited to handle largedata operations because a job handles itenms in small chunks instead ofprocessing them all at once. Practically speaking, a large file won’t be loadedin memory; instead it’s streamed, which is more efficient in terms of memory consumption.Chunk processing allows more flexibility to manage the data flow in a job.Spring Batch also handles transactions and errors around read and writeoperations.
Spring Batch provides the FlatFileItemReader class to readrecords from a flat file. To use a FlatFileItemReader, you need to configuraresome Spring beans and implement a component that creates domain objects fromwhat the FlatFileItemReader reads;Spring Batch will handle the rest.
Choosing a chunk size and commit interval
First, the size of a chunk and the commit interval are thesame thing! Second, there’s no definitive value to choose. Our recommendationis a value between 10 and 200. Too small a chunk size creates too many transactions,which is costly and makes the job run slowly. Too alrge a chunk size makestransactional resources-like databases-run slowly too, because a database mustbe able to roll back operations. The best value for the commit interval dependson many factors:data, processing, nature of the resources, and so on. Thecommit interval is a parameter in Spring Batch, so don’t hesitate to change itto find the most appropriate value for your jobs.
Decompressing a file isn’t a read-write step, but Springbatch is flexible enough to implement such a task as part of a job.A 1-GB flatfile can compress to 100MB, which is a more reasonable size for file transfersover the internet.
Note that you could encrypt the file as well, ensuring thatno one could read the product data if the file were intercepted duringtransfer. The encryption could be done before the compression or as part of it.Spring Batch provides an extension point to handle processing in a batchprocess step: The Tasklet. You implement a Tasklet that decompresses a ZIParchive into its source flat file.
How does a job refer to the job repository?
You may have noticed that we say a job needs the jobrepository to run but we don’t make any reference to the job repository bean inthe job configuration. The XML step element can have its job-repositoryattribute refer to a job repository bean. This attribute isn’t mandatory,because by default the job uses a jobRepository bean. As long as you declare ajobRepository bean of type JobRepository, you don’t need to explicitly refer toit in your job configuration.
Spring Batch Framework– introduction chapter(下)的更多相关文章
- Spring Batch 使用场景
一个标准的批处理程序通常会从数据库,文件或者队列中读取大量的数据和记录,然后对获取的数据进行处理,然后将修改后的格式写回到数据库中. 通常 Spring Batch 在离线模式下进行工作,不需要用户干 ...
- Spring Batch Concepts Chapter
Spring Batch Concepts Chapter The below figure shows two kinds of Spring Batch components:infrastruc ...
- Spring Boot下Spring Batch入门实例
一.About Spring Batch是什么能干什么,网上一搜就有,但是就是没有入门实例,能找到的例子也都是2.0的,看文档都是英文无从下手~~~,使用当前最新的版本整合网络上找到的例子. 关于基础 ...
- 初探Spring Batch
此系列博客皆为学习Spring Batch时的一些笔记: 为什么我们需要批处理? 我们不会总是想要立即得到需要的信息,批处理允许我们在请求处理之前就一个既定的流程开始搜集信息:比如说一个银行对账单,我 ...
- Spring Batch的事务-Part 1:基础
原文 https://blog.codecentric.de/en/2012/03/transactions-in-spring-batch-part-1-the-basics/ This is th ...
- Spring Batch 批处理框架介绍
前言 在大型的企业应用中,或多或少都会存在大量的任务需要处理,如邮件批量通知所有将要过期的会员,日终更新订单信息等.而在批量处理任务的过程中,又需要注意很多细节,如任务异常.性能瓶颈等等.那么,使用一 ...
- [Spring Batch 系列] 第一节 初识 Spring Batch
距离开始使用 Spring Batch 有一段时间了,一直没有时间整理,现在项目即将完结,整理下这段时间学习和使用经历. 官网地址:http://projects.spring.io/spring-b ...
- Spring Batch(0)——控制Step执行流程
Conditional Flow in Spring Batch I just announced the new Learn Spring course, focused on the fundam ...
- Spring Batch在大型企业中的最佳实践
在大型企业中,由于业务复杂.数据量大.数据格式不同.数据交互格式繁杂,并非所有的操作都能通过交互界面进行处理.而有一些操作需要定期读取大批量的数据,然后进行一系列的后续处理.这样的过程就是" ...
随机推荐
- Oracle数据库的下载和安装
那天分享一下Oracle的下载和安装的过程,有需要的朋友可以借鉴参考一下.如有雷同不胜感激! 首先可以到Oracle的官网下载Oracle的最次年版本的Oracle数据库.一下是个人下载的数据库版本百 ...
- c语言的自动类型转换
转自c语言的自动类型转换 自动转换遵循以下规则: 1) 若参与运算量的类型不同,则先转换成同一类型,然后进行运算. 2) 转换按数据长度增加的方向进行,以保证精度不降低.如 ...
- edX Devstack 汉化(i18n)
操练了几日edx Devstack后,发现自己e文还是那么poor,如果和我一样,继续往下看,否则可以轻轻的飘过- 1.运行起 edx Devstack cd /devstack vagrant up ...
- Android 使控件各占屏幕的一半
在xml中将两个要占屏幕一半的控件都加上android:layout_weight="1": 注意:weight只能用在LinearLayout布局中. 在LinearLayout ...
- Xcode6新建的工程没有Frameworks文件夹了?!原来是这样
http://stackoverflow.com/questions/24181062/default-frameworks-missing-in-xcode-6-beta They are impo ...
- Chrome已原生支持“Chrome To Mobile”
完成PC和手机端Chrome的同gmail帐号绑定后,即可按如下操作进行: 已知在版本“19.0.1084.15”中,这个功能默认未开启,需要进入“chrome://flags/”进行手工启用(早几期 ...
- 谈谈分布式事务之一:SOA需要怎样的事务控制方式
在一个基于SOA架构的分布式系统体系中,服务(Service)成为了基本的功能提供单元,无论与业务流程无关的基础功能,还是具体的业务逻辑, 均实现在相应的服务之中.服务对外提供统一的接口,服务之间采用 ...
- NET中的引用类型和值类型 zt
.NET中的类型分为值类型和引用类型,他们在内存布局,分配,相等性,赋值,存储以及一些其他的特性上有很多不同,这些不同将会直接影响到我们应用程序 的效率.本文视图对.NET 基础类型中的值类型和引用类 ...
- 数据结构:二级指针与Stack的数组实现
[简介] Stack,栈结构,即传统的LIFO,后进先出,常用的实现方法有数组法和链表法两种.如果看过我上一篇文章<数据结构:二级指针与不含表头的单链表>,一定会看到其中的关键在于,利用v ...
- 【原】Comparator和Comparable的联系与区别
1.知识点了解 Comparator和Comparable都是用用来实现集合中元素的比较.排序的,所以,经常在集合外定义Comparator接口的方法和集合内实现Comparable接口的方法中实现排 ...