What can Reactive Streams offer EE4J?
https://developer.lightbend.com/blog/2018-02-06-reactive-streams-ee4j/index.html
By James Roper (@jroper), February 6, 2018
In my current role at Lightbend I’m investigating and pursuing opportunities where Reactive Streams can make the lives of EE4J (the new Java EE) developers better. In this blog post I’m going to share some of the ideas that we’ve had for Reactive Streams in EE4J, and how these ideas will benefit developers.
Reactive Streams was adopted by the JDK in the form of the java.util.concurrent.Flow API. It allows two different libraries that support asynchronous streaming to connect to each other, with well specified semantics about how each should behave, so that backpressure, completion, cancellation and error handling are predictably propagated between the two libraries. There is a rich ecosystem of open source libraries that support Reactive Streams, and since its inclusion in JDK9, a number of implementations targeting the JDK itself have adopted it, including the incubating JDK9 HTTP Client and the Asynchronous Database Adapter (ADBA) effort.
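To make those semantics concrete, here is a minimal, self-contained sketch of my own (not from any spec) that wires a JDK9 Flow.Publisher to a Flow.Subscriber, using the JDK's built-in SubmissionPublisher:

import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class FlowDemo {
    public static void main(String[] args) throws Exception {
        // SubmissionPublisher is the JDK's built-in Flow.Publisher implementation
        SubmissionPublisher<String> publisher = new SubmissionPublisher<>();
        publisher.subscribe(new Flow.Subscriber<String>() {
            private Flow.Subscription subscription;
            @Override
            public void onSubscribe(Flow.Subscription subscription) {
                this.subscription = subscription;
                // Backpressure: signal demand for one element
                subscription.request(1);
            }
            @Override
            public void onNext(String item) {
                System.out.println("Received: " + item);
                // Only request the next element once this one is handled
                subscription.request(1);
            }
            @Override
            public void onError(Throwable error) {
                error.printStackTrace();
            }
            @Override
            public void onComplete() {
                System.out.println("Stream complete");
            }
        });
        publisher.submit("hello");
        publisher.submit("world");
        publisher.close();
        Thread.sleep(500); // crude wait for the asynchronous consumer (demo only)
    }
}

The subscription.request(1) calls are the backpressure mechanism: the subscriber pulls one element at a time, and the publisher may never send more than has been requested.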
High level use case
Before I jump into the specific parts of EE4J where Reactive Streams would be useful, it’s worth talking about the overall picture. Reactive Streams is an integration API, or more specifically, a Service Provider Interface (SPI). It’s not intended that application developers implement the reactive streams interfaces directly themselves, rather, it is intended that the various streaming data sources and sinks provided by libraries, database connectors, clients and so on, implement reactive streams, so that application developers can then easily plumb those sources and sinks together.
Doing a quick off-the-top-of-my-head count based on my knowledge of existing specs that do streaming in EE4J and the JDK, there exist no less than ten different APIs offered by the various specs for streaming data either synchronously or asynchronously. These range from, of course, InputStream and OutputStream in the JDK, to NIO Channels, to the Servlet 3.1 ReadListener and WriteListener extensions, to the JDBC ResultSet, JSR 356 @OnMessage annotations, Message Driven Beans and JMS, CDI events using @Observes, Java collection Stream and Iterator based APIs, and finally, the new JDK9 Flow API. Each of these APIs streams data in some way, and offers varying levels of capability - some are synchronous, some are asynchronous, some offer backpressure and some don't, some have well defined error handling semantics and others don't.
The problem arises when I want to connect two of these together. For example, if I want to emit CDI events on a WebSocket. Or if I want to plumb a stream of messages into a database. Or if I want to stream data from an HTTP client response to a servlet response. When the two APIs for streaming are different, I have to write non-trivial boilerplate code to connect them together. This even applies to connecting an InputStream and an OutputStream: I can't simply pass an InputStream to an OutputStream and say "here's your source of data, write it". I have to allocate a buffer, read and write in a loop while making sure I get the semantics and argument ordering correct, and I have to ensure that I properly wrap things in try-with-resources blocks so that everything is cleaned up correctly even when there are errors. And all this for what should be the simplest of all use cases.
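To illustrate just how much ceremony the blocking case already demands, here is roughly what that boilerplate looks like (a minimal sketch; real code would also need to decide on buffer sizing and error reporting):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    // The boilerplate needed just to pump one blocking stream into another
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // buffer size is our problem to choose
        int read;
        // read returns -1 at end of stream; note the (buffer, offset, length)
        // argument ordering on write - an easy detail to get wrong
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }
}

And at the call site, try-with-resources is needed so that both streams are closed even when the copy fails part way through (source() and sink() are placeholders for wherever the streams actually come from):

try (InputStream in = source(); OutputStream out = sink()) {
    StreamCopy.copy(in, out);
}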
It gets even more complex when we start integrating asynchronous APIs. What if the source is producing too much data for the sink - how do I tell the source to slow down? Some APIs don't even offer a backpressure mechanism; others do, but the mechanisms differ: we have interest-based event APIs like NIO, on-ready callback based APIs like Servlet 3.1, and token based backpressure APIs like JDK9 Flow. Writing and maintaining the glue to connect these - especially given that these asynchronous APIs require careful concurrent programming - is an unreasonable thing to expect application developers to do for each permutation of APIs that need to interoperate.
And so the high level reason for supporting Reactive Streams in EE4J is in two parts. Firstly, it facilitates integration of EE4J APIs with each other. Secondly, it facilitates integration of EE4J APIs with third party libraries. Connecting a Reactive Streams Publisher to a Reactive Streams Subscriber is one line of code - publisher.subscribe(subscriber); - and in this single line of code, a developer can be confident that data flow, backpressure, completion handling and error handling are correctly handled.
So as we consider the individual specs within EE4J and the use cases associated with them that Reactive Streams helps with, we should remember that each additional place where Reactive Streams is used in EE4J increases the usefulness of Reactive Streams in other parts of the spec. In my use cases below I’ve tried to focus on use cases that are interesting today in and of themselves, assuming no other changes are made to support Reactive Streams. For each use case that does get implemented, the usefulness of Reactive Streams will multiply.
Servlet IO
It is a common use case for an application to store or transfer files, sometimes large files, perhaps hundreds of megabytes in size. And a very common place to store these is in an object storage service, such as Amazon S3. Let’s imagine you have an expense reporting application, and each expense must have an associated scan or photo of a receipt, so your application offers the ability to upload and download these receipts, using Amazon S3 to store them.
A unique thing about a request to upload or download large files is that such a request is long running. It can take minutes to upload or download a single file - consider someone trying to use the dodgy airport wifi to upload a receipt for a meal they purchased at the airport, uploading their 4MB image at 20kB/s. At that rate, the single upload will take over 3 minutes (4MB ÷ 20kB/s ≈ 200 seconds). Using the existing blocking APIs offered by the servlet API, each upload consumes a thread from the servlet container's thread pool for the duration of the upload - all 3 minutes of it. Threads are a very limited resource. Each thread requires up to a few megabytes of memory allocated for its stack (depending on configuration and use), so you can't just go and create more threads when you need them - they are expensive. For this reason, servlet containers use pools of threads, and a typical configuration is a pool of 200 threads. With 200 threads each tied up for around 200 seconds, just one upload of a receipt on dodgy wifi per second is enough to exhaust the thread pool and start rejecting requests - and that's before counting any other requests the server has to handle. This is dismal performance, and a very inefficient use of resources.
The solution to the problem is to use asynchronous IO to handle these file uploads and downloads. Asynchronous IO allows the server to only assign a thread to a connection when it actually needs it - when there's data available to read and write. There are a number of asynchronous HTTP clients out there; for this use case we'll choose the JDK9 HTTP client. Servlet 3.1 introduced asynchronous IO, so we could use that to receive the uploaded data - however, connecting the JDK9 HTTP client to the Servlet 3.1 asynchronous IO APIs is not at all trivial: it takes around 180 lines of concurrent code to adapt Servlet IO to the Reactive Streams API offered by the JDK9 HTTP client. But if HttpServletRequest offered a method that allowed getting a Publisher<ByteBuffer> for consuming the request, this is what a servlet that handled file uploads to S3 would look like:
public class S3UploadServlet extends HttpServlet {
    private final HttpClient client = HttpClient.newHttpClient();
    private final URI S3_UPLOAD_URL = ...;

    @Override
    public void doPost(HttpServletRequest req, HttpServletResponse resp) {
        AsyncContext ctx = req.startAsync();
        client.sendAsync(HttpRequest.newBuilder(S3_UPLOAD_URL)
                // pipe the incoming request bytes directly to S3
                // (getPublisher() is the proposed API this post argues for)
                .POST(BodyPublisher.fromPublisher(req.getPublisher()))
                .build(),
            // we only need the status code, so discard S3's response body
            BodyHandler.discard(null)
        ).thenAccept(s3Response -> {
            resp.setStatus(s3Response.statusCode());
            ctx.complete();
        });
    }
}
As you can see, the line of code needed to connect the servlet request body stream to the HTTP client request body stream was .POST(BodyPublisher.fromPublisher(req.getPublisher())). That's 180 lines of non-trivial concurrent code in the example I linked to above, down to one line of trivial code, with built in handling of backpressure and completion/error propagation. Furthermore, the code reads like the high level task that the developer is trying to achieve - literally "post the body published from the request".
Multipart request handling
Servlet 3.0 introduced support for handling multipart/form-data requests, however this support involves buffering to disk, and does not offer any asynchronous streaming capabilities. As an extension of the previous use case, a developer might like to use multipart/form-data to upload files. A Reactive Streams based API to handle this might expose the parts of the form as a stream of parts, and each part might itself be a sub-stream of ByteBuffers. This is what the code might look like, using Akka Streams to handle the outer stream:
public class S3UploadServlet extends HttpServlet {
    private final HttpClient client = HttpClient.newHttpClient();
    private final ActorSystem system = ActorSystem.create();
    private final Materializer materializer = ActorMaterializer.create(system);
    private final URI S3_UPLOAD_URL = ...;

    @Override
    public void doPost(HttpServletRequest req, HttpServletResponse resp) {
        AsyncContext ctx = req.startAsync();
        Source.fromPublisher(req.getPartPublisher())
            // Filter to only get the file part
            .filter(part -> part.getName().equals("file"))
            // Handle one part at a time asynchronously
            .mapAsync(1, part ->
                // Post the file part to S3, plumbing the publisher for the
                // part (containing its bytes) to the HTTP client request body
                client.sendAsync(HttpRequest.newBuilder(S3_UPLOAD_URL)
                        .POST(BodyPublisher.fromPublisher(part.getPublisher()))
                        .build(),
                    BodyHandler.discard(null))
            )
            // Collect the result of the upload
            .runWith(Sink.head(), materializer)
            // Attach a callback to the resulting completion stage
            .thenAcceptAsync(s3Response -> {
                resp.setStatus(s3Response.statusCode());
                ctx.complete();
            });
    }
}
Similarly, other implementations of Reactive Streams can easily be used to handle the multipart request in the same way. The important thing that Reactive Streams allows here is that a developer can select whatever tool they want to handle whichever part of the stream they want, and be assured that end to end, all data, backpressure, completion and errors are consistently propagated.
Messaging/JMS
So far we’ve only looked at use cases that deal with streaming bytes for IO. Reactive Streams is also very useful for streaming high level messages, such as those produced and consumed by message brokers.
Here’s an example of what it might look like to subscribe to a queue using a Reactive Streams compatible messaging API in EE4J, using Akka Streams to handle the stream and save message content using an asynchronous database API:
@MessageSubscriber(topic = "mytopic")
public Subscriber<MessageEnvelope<MyEvent>> handleMyTopic() {
    // Create an Akka stream that will materialize into a Subscriber
    // that can subscribe to the events
    Subscriber<MessageEnvelope<MyEvent>> handler =
        Source.<MessageEnvelope<MyEvent>>asSubscriber()
            // Handle each message by saving it to the database
            .mapAsync(1, msg ->
                saveToDatabase(msg.data())
                    // return the message rather than the result of the database op
                    .thenApply(result -> msg)
            )
            // Commit the message once handled
            .map(msg -> msg.commit())
            // Feed into an ignoring sink, since everything is now handled
            .to(Sink.ignore())
            // And run it to get the subscriber (the materializer is assumed
            // to be available as a field, as in the earlier servlet example)
            .run(materializer);
    return handler;
}
This is a fairly trivial example, but using Akka Streams, we could have the messages fanned out to multiple other consumers, we could have cycles in the processing that aggregate state, etc.
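As a taste of that flexibility, here is a small sketch of mine (not from any spec) of fanning a stream out to multiple consumers using Akka's BroadcastHub; eventPublisher, auditSink and searchIndexSink are imagined stand-ins, and an ActorSystem/materializer is assumed as in the earlier examples:

// Run the upstream into a BroadcastHub, producing a Source that can be
// materialized many times, once per downstream consumer
Source<MyEvent, NotUsed> events =
    Source.fromPublisher(eventPublisher)
        .runWith(BroadcastHub.of(MyEvent.class, 256), materializer);

// Each consumer attaches independently; the hub handles the fan-out
events.runWith(auditSink, materializer);
events.runWith(searchIndexSink, materializer);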
WebSockets
WebSockets is another type of long lived connection where synchronous IO is not appropriate. The current EE4J spec for WebSockets, JSR-356, does offer asynchronous handling of messages, however it does not support backpressure on receiving (so if the other end is sending too much data, there is no way to tell it to slow down, you must buffer or fail). It is also a purpose built API just for WebSockets, so any integration with other asynchronous data sources or sinks must be implemented manually by the end user.
Now imagine we wanted to implement a chat room, perhaps using Apache Kafka as our chatroom backend, with one single-partition topic per room. To implement this today, a reasonable amount of boilerplate is required, not just to transfer messages from the client to Apache Kafka and back, but also to propagate errors. Furthermore, if a client produces messages at a high rate - faster than Apache Kafka is willing to consume them - those messages will buffer on the server, eventually causing it to run out of memory.
There exist, however, a number of Reactive Streams implementations for Apache Kafka. If JSR-356 were to support Reactive Streams, perhaps in the form of an @OnStream annotation that can be used as an alternative to @OnMessage, this is what it might look like to implement such a chat room:
@OnStream
public Publisher<ChatMessage> joinRoom(
    @PathParam("room") String room,
    Publisher<ChatMessage> incomingMessages
) {
    // The Kafka subscriber will send any messages it receives to Kafka
    // (from Kafka's point of view, it wraps a producer for the room's topic)
    Subscriber<ChatMessage> kafkaSubscriber = createKafkaSubscriberForRoom(room);
    // The Kafka publisher will emit any messages it receives from Kafka
    // (it wraps a consumer of the room's topic)
    Publisher<ChatMessage> kafkaPublisher = createKafkaPublisherForRoom(room);
    // We now connect the incoming chat messages to the subscriber
    incomingMessages.subscribe(kafkaSubscriber);
    // And return the room's publisher to be published to the WebSocket
    return kafkaPublisher;
}
The createKafka* methods would be code specific to connecting to Kafka; the only JSR-356 specific code would be the method above.
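For the curious, here is a rough sketch of what those helpers might look like using the Akka Streams Kafka connector (Alpakka Kafka). The producerSettings and consumerSettings fields, the "chat-" topic naming, and the ChatMessage JSON conversion are all assumptions of mine, not part of any spec:

private Subscriber<ChatMessage> createKafkaSubscriberForRoom(String room) {
    // A Source that materializes into a Reactive Streams Subscriber,
    // feeding everything it receives into a Kafka producer sink
    return Source.<ChatMessage>asSubscriber()
        .map(msg -> new ProducerRecord<String, String>("chat-" + room, msg.toJson()))
        .to(Producer.plainSink(producerSettings))
        .run(materializer);
}

private Publisher<ChatMessage> createKafkaPublisherForRoom(String room) {
    // A Kafka consumer source, exposed as a Reactive Streams Publisher;
    // WITH_FANOUT allows more than one subscriber to attach to it
    return Consumer.plainSource(consumerSettings, Subscriptions.topics("chat-" + room))
        .map(record -> ChatMessage.fromJson(record.value()))
        .runWith(Sink.asPublisher(AsPublisher.WITH_FANOUT), materializer);
}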
CDI
Contexts and Dependency Injection (CDI) could take advantage of Reactive Streams for event publishing and subscribing, with many use cases similar to the ones described above. But another opportunity for CDI that we think is important is something that would facilitate not just CDI, but all asynchronous processing in general.
The problem comes with CDI implementations using thread locals to propagate context, for example, the context for the current request, which might include the current authenticated user, cached authorization information for the user, etc. The assumption in CDI is that the container will always be in control of the threads that operate within this context. However, with asynchronous processing, that is not the case.
A good example of this is in a chat room. Each user connected to the chat room has an active WebSocket request, which has a particular CDI context that goes with it. Processing of messages received from these users might happen in the right CDI context, but what happens when User A sends a message to the chat room, and it has to be sent on to User B? The thread that is propagating the message to User B now needs the request context for User B's request in order to correctly handle the message, but that thread will have User A's request context, because it was the receipt of a message from User A that initiated the current processing.
The only answer to this in CDI currently is to not use CDI contexts at all: capture all context at the start of the request, and from then on use a non-CDI mechanism (such as simply passing the context around manually) to make that context available wherever it's needed. We think CDI can offer a better solution, by allowing context to be captured, then set up and torn down on another thread.
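To make the idea concrete, here is a purely hypothetical sketch of such an API - none of these types or methods exist in CDI today, they simply illustrate the shape of the capability:

// Hypothetical API: capture the CDI context active on the current thread
CapturedContext context = contextService.captureCurrentContext();

// Later, possibly on a completely different thread (e.g. a stream callback),
// run a task with that context set up before it and torn down after it
CompletableFuture.runAsync(() ->
    context.runInContext(() -> sendMessageToUser(userB, message))
);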
JPA
Before JPA can support any asynchronous operations, a standard for asynchronous database drivers is needed. Fortunately, there is currently a lot of active development going on towards this, so I expect we'll see progress here in the next 12 months. Once that support exists, here are some examples of how JPA could be modified to support asynchronous streaming:
Streamed queries
Database exports are one obvious example for query streaming where you don't want to load the entire result set into memory, but I think more interesting use cases will arrive in the not too distant future, particularly around event logging, as different services may want to stream events from a database to be brought up to date on demand. Here's an example of serving one such event stream through a Reactive Streams WebSocket API, using an imagined getResultPublisher method on TypedQuery:
@OnStream
public Publisher<Event> streamEvents(
    @QueryParam("since") Date since
) {
    Publisher<Event> events = entityManager.createQuery(
            "select e from Event e where e.timestamp >= :since",
            Event.class
        ).setParameter("since", since)
        // getResultPublisher() is the imagined addition to TypedQuery
        .getResultPublisher();
    return events;
}
Once again, backpressure, completion and error handling are all done for you.
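And to come back to the database export example: the same imagined getResultPublisher could feed Akka Streams' file IO directly. A sketch, where the toCsvLine method is an assumption of mine and the materializer is assumed as before:

Source.fromPublisher(
        entityManager.createQuery("select e from Event e", Event.class)
            .getResultPublisher())
    // Render each entity as a line of CSV (toCsvLine is imagined)
    .map(event -> ByteString.fromString(event.toCsvLine() + "\n"))
    // Stream to disk with constant memory usage, however large the table
    .runWith(FileIO.toPath(Paths.get("events.csv")), materializer);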
Streamed ingestion
An example of streamed ingestion might be persisting logs. Here's an example where logs are pushed in via WebSockets, using an imagined persistSubscriber method on EntityManager:
@OnStream
public void ingestLogs(
    Publisher<Log> logs
) {
    Subscriber<Log> ingester = entityManager.persistSubscriber(Log.class);
    logs.subscribe(ingester);
}
Streaming combinators
In the use cases above, when plumbing streams, there’s often a need to transform the stream in some way, whether it’s a simple map or filter, or perhaps collecting a stream into a single value, or maybe concatenating, broadcasting or merging streams. For Reactive Streams, this functionality is provided by a number of implementations such as Akka Streams, Reactor and RxJava. While these do the job well, it may be desirable for EE4J developers to have common functionality available to them out of the box.
Furthermore, a standard library for streaming combinators allows other APIs to use this to provide more fluent APIs to developers than what the bare Reactive Streams Publisher and Subscriber interfaces offer, without having to depend on a third party library.
Finally, it would allow EE4J libraries to assume a basic set of streaming operators that they can use in their own implementations, since they could not and should not depend on 3rd party libraries. This is very important, as otherwise the burden of reimplementing many operators would fall into the hands of each and every library wanting to provide streaming capabilities, sidetracking them from the core issue they’re attempting to solve, be it database access, WebSockets or anything else.
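As a purely illustrative sketch, a built-in combinator API might let developers write something like the following - the ReactiveStreams entry point and its method names are invented for this example:

// Wrap a bare Publisher, apply a few standard combinators, and expose
// the result as a Publisher again, all without a third party dependency
Publisher<String> names = getNamesPublisher(); // some Reactive Streams source
Publisher<String> upperCased =
    ReactiveStreams.fromPublisher(names)
        .filter(name -> !name.isEmpty())
        .map(String::toUpperCase)
        .buildPublisher();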
New possibilities
All of the use cases above only cover changes to existing streaming usages in EE4J. However, adopting a robust streaming standard like Reactive Streams opens up new possibilities for features and technologies that EE4J could support.
Here are some examples:
Event sourcing
Event sourcing itself may not be that interesting from a streaming perspective, but the ability to consume the event log as a stream of messages is incredibly interesting. Basing the event log stream on Reactive Streams would allow all sorts of consumers to be plugged in, such as message brokers, WebSockets, CDI event publishers, and databases.
Real time message distribution
Applications are tending to provide more responsive user interfaces, with updates being pushed to users as soon as they happen. While WebSockets may offer the mechanism to communicate with the client, EE4J doesn't yet offer any backend solution for communicating pushes between clusters of machines in the backend. Message brokers can often be too heavyweight to achieve this; a lighter-weight inter-node communication mechanism would be better. Reactive Streams is the perfect API to build this on, as it then allows seamless plumbing to WebSockets and other stream sinks and sources.
Data processing pipelines
Current EE4J features tend to be focussed on short lived transactional processes, which have a definite start and definite end. There is a shift in the industry to more stream based approaches, which is seen in stream processing projects like Spark, Flink and Kafka Streams. EE4J could provide a mechanism for integrating EE4J applications into these pipelines, and Reactive Streams would be the perfect API to offer this integration.
Summary
In this article we've looked at a broad array of possibilities for what Reactive Streams brings to the table for EE4J. Each possibility brings value in its own right, but the more possibilities that are implemented overall, the more value each brings. It's our hope at Lightbend that we can now start collaborating with the EE4J community to make all of these possibilities a reality.