What can Reactive Streams offer EE4J?
https://developer.lightbend.com/blog/2018-02-06-reactive-streams-ee4j/index.html
By James Roper (@jroper), February 6, 2018
In my current role at Lightbend I’m investigating and pursuing opportunities where Reactive Streams can make the lives of EE4J (the new Java EE) developers better. In this blog post I’m going to share some of the ideas that we’ve had for Reactive Streams in EE4J, and how these ideas will benefit developers.
Reactive Streams was adopted by the JDK in the form of the java.util.concurrent.Flow API. It allows two different libraries that support asynchronous streaming to connect to each other, with well specified semantics about how each should behave, so that backpressure, completion, cancellation and error handling is predictably propagated between the two libraries. There is a rich ecosystem of open source libraries that support Reactive Streams, and since its inclusion in JDK9, there are a few in development implementations that are targetting the JDK, including the incubating JDK9 HTTP Client, and the Asynchronous Database Adapter (ADBA) effort that have also adopted it.
High level use case
Before I jump into the specific parts of EE4J where Reactive Streams would be useful, it’s worth talking about the overall picture. Reactive Streams is an integration API, or more specifically, a Service Provider Interface (SPI). It’s not intended that application developers implement the reactive streams interfaces directly themselves, rather, it is intended that the various streaming data sources and sinks provided by libraries, database connectors, clients and so on, implement reactive streams, so that application developers can then easily plumb those sources and sinks together.
Doing a quick off the top of my head count based on my knowledge of existing specs that do streaming in EE4J and the JDK, there exist no less than 10 different APIs that are offered by the various specs for streaming data either synchronously or asynchronously. These range from, of course, InputStream
and OutputStream
in the JDK, to NIO Channel
’s, to the Servlet 3.1ReadListener
and WriteListener
extensions, to the JDBC ResultSet
, JSR 356 @OnMessage
annotations, Message Driven Beans and JMS, CDI events using @Observes
, Java collectionStream
and Iterator
based APIs, and finally, the new JDK9 Flow
API. Each of these APIs streams data in some way, and offers varying levels of capability - some are synchronous, some are asynchronus, some offer backpressure and some don’t, some have well defined error handling semantics and others don’t.
The problem arises when I want to connect two of these together. For example, if I want to emit CDI events on a WebSocket. Or if I want to plumb a stream of messages into a database. Or if I want to stream data from an HTTP client response to a servlet response. When the two APIs for streaming are different, then I have to write non trivial boiler plate code to connect them together. This even applies to connecting an InputStream
and OutputStream
, I can’t simply pass an InputStream
to an OutputStream
and say “here’s your source of data, write it”, I have to allocate a buffer, read and write in a loop making sure that I get the semantics and argument ordering correct, I have to ensure that I properly wrap things in try-with-resources blocks to ensure that things are cleaned up correctly even when there are errors, and so on. And all this for what should be the simplest of all use cases.
It gets even more complex when we start integrating asynchronous APIs. What if the source is producing too much data for the sink, how do I tell the source to slow down? In some cases, the API’s don’t even offer a backpressure mechanism, in others, they do, but the mechanisms differ, we have interest based event APIs like NIO, we have on ready callback based APIs like Servlet 3.1, and we have token based backpressure APIs like JDK9 Flow. Connecting these together, especially given that these asynchronous APIs require concurrent programming, is unreasonable to expect application developers to implement and maintain for each permutation of APIs that need to interoperate.
And so the high level reason for supporting Reactive Streams in EE4J is in two parts. Firstly, it facilitates integration of EE4J APIs with each other. Secondly, it facilitates integration of EE4J APIs with third party libraries. To connect a Reactive Streams publisher to a Reactive Streams Subscriber is one line of code, publisher.subscribe(subscriber);
, and in this single line of code, a developer can be confident that data flow, backpressure, completion handling and error handling are correctly handled.
So as we consider the individual specs within EE4J and the use cases associated with them that Reactive Streams helps with, we should remember that each additional place where Reactive Streams is used in EE4J increases the usefulness of Reactive Streams in other parts of the spec. In my use cases below I’ve tried to focus on use cases that are interesting today in and of themselves, assuming no other changes are made to support Reactive Streams. For each use case that does get implemented, the usefulness of Reactive Streams will multiply.
Servlet IO
It is a common use case for an application to store or transfer files, sometimes large files, perhaps hundreds of megabytes in size. And a very common place to store these is in an object storage service, such as Amazon S3. Let’s imagine you have an expense reporting application, and each expense must have an associated scan or photo of a receipt, so your application offers the ability to upload and download these receipts, using Amazon S3 to store them.
A unique thing about a request to upload or download large files is that such a request is long running. It can take minutes to upload or download a single file - consider someone trying to use the dodgy airport wifi to upload a receipt for a meal they purchased at the airport, uploading their 4mB image at 20kB/s. That single upload will take over 3 minutes. Using the existing blocking APIs offered by the servlet API requires each upload consuming a thread from the servlet containers thread pool for the duration of the upload, 3 minutes. Threads are a very limited resource. Each thread requires up to a few megabytes of memory allocated for it stack (depending on configuration and use), you can’t just go create more threads when you need them because they are expensive. For this reason, servlet containers use pools of threads, a typical configuration is to have a pool of 200 threads. With 200 threads, that server would only be able to serve one upload of a receipt on dodgy wifi per second before the server exhausts its thread pool and starts rejecting requests. And that’s not including any other requests the server has to handle. This is dismal performance, and a very inefficient use of resources.
The solution to the problem is to use asynchronous IO to handle these file uploads and downloads. Asynchronous IO allows the server to only assign a thread to a connection when it actually needs it - when there’s data available to read and write. There are a number of asynchronous HTTP clients out there, for this use case we’ll choose the JDK9 HTTP client. Servlet 3.1 introduced asynchronous IO, so we could use that to receive the uploaded data, however, connecting the JDK9 HTTP client to the Servlet 3.1 asynchronous IO APIs is not at all trivial. These 180 lines of concurrent code are what it takes to write code that adapts Servlet IO to the Reactive Streams API offered by the JDK9 HTTP client. However, if HttpServletRequest
offered a method that allowed getting a Publisher<ByteBuffer>
for consuming a request, this is what a servlet that handled file uploads to S3 would look like:
public class S3UploadServlet extends HttpServlet {
private final HttpClient client = HttpClient.newHttpClient();
private final String S3_UPLOAD_URL = ...;
public void doPost(HttpServletRequest req, HttpServletResponse resp) {
AsyncContext ctx = req.startAsync();
client.sendAsync(HttpRequest.newBuilder(S3_UPLOAD_URL)
// pipe the incoming request bytes directly to S3
.POST(BodyPublisher.fromPublisher(req.getPublisher()))
.build()
).thenAccept(s3Response -> {
resp.setStatus(s3Response.statusCode());
ctx.complete();
});
}
}
As you can see, the line of code needed to connect the servlet request body stream to the HTTP client request body stream was.POST(BodyPublisher.fromPublisher(req.getPublisher()))
. That’s 180 lines of non trivial concurrent code in the example I linked to above down to one line of trivial code, with built in handling of backpressure and completion/error propogation. Furthermore the code reads like the high level task that the developer is trying to achieve, literally “Post the body published from the request”.
Multipart request handling
Servlet 3.0 introduced support for handling multipart/form-data
requests, however this support involves buffering to disk, and does not offer any asynchronous streaming capabilities. As an extension of the previous use case, a developer might like to use multipart/form-data
to upload files. A reactive streams based API to handle this might expose the parts of the form as a stream of parts, and each part might be a ByteBuffer
sub stream itself. This is what the code might look like, using Akka Streams to handle the outer stream:
public class S3UploadServlet extends HttpServlet {
private final HttpClient client = HttpClient.newHttpClient();
private final ActorSystem system = ActorSystem.create();
private final Materializer materializer = ActorMaterializer.create(system);
private final String S3_UPLOAD_URL = ...;
public void doPost(HttpServletRequest req, HttpServletResponse resp) {
AsyncContext ctx = req.startAsync();
Source.fromPublisher(req.getPartPublisher())
// Filter to only get the file part
.filter(part -> part.getName().equals("file"))
// Handle one part at a time asynchronously
.mapAsync(1, part -> {
// Post the file part to S3
client.sendAsync(HttpRequest.newBuilder(S3_UPLOAD_URL)
// Here we plumb the publisher for the part (containing the
// bytes) to the HTTP client request body to S3.
.POST(BodyPublisher.fromPublisher(part.getPublisher()))
.build())
})
// Collect the the result of the upload
.runWith(Sink.head(), materializer)
// Attach a callback to the resulting completion stage
.thenAcceptAsync(s3Response -> {
resp.setStatus(s3Response.statusCode());
ctx.complete();
});
}
}
Similarly, other implementations of Reactive Streams can easily be used to handle the multipart request in the same way. The important thing that Reactive Streams allows here is that a developer can select whatever tool they want to handle which ever part of the stream they want, and be assured that end to end, all data, backpressure, completion and errors are consistently propogated.
Messaging/JMS
So far we’ve only looked at use cases that deal with streaming bytes for IO. Reactive Streams is also very useful for streaming high level messages, such as those produced and consumed by message brokers.
Here’s an example of what it might look like to subscribe to a queue using a Reactive Streams compatible messaging API in EE4J, using Akka Streams to handle the stream and save message content using an asynchronous database API:
@MessageSubscriber(topic = "mytopic")
public Subscriber<MessageEnvelope<MyEvent>> handleMyTopic() {
// Create an Akka stream that will materialize into a Subscriber
// that can subscribe to the events
Subscriber<MessageEvelope<MyEvent>> handler = Source.asSubscriber()
// Handle each message by saving it to the database
.mapAsync(1, msg ->
saveToDatabase(msg.data())
// return the message rather than the result of the database op
.thenApply(result -> msg)
// Commit the message once handled
).map(msg -> msg.commit())
// Feed into an ignoring sink, since everything is now handled
.to(Sink.ignore())
// And run it to get the subscriber
.run(materializer);
return handler;
}
This is a fairly trivial example, but using Akka Streams, we could have the messages fanned out to multiple other consumers, we could have cycles in the processing that aggregate state, etc.
WebSockets
WebSockets is another type of long lived connection where synchronous IO is not appropriate. The current EE4J spec for WebSockets, JSR-356, does offer asynchronous handling of messages, however it does not support backpressure on receiving (so if the other end is sending too much data, there is no way to tell it to slow down, you must buffer or fail). It is also a purpose built API just for WebSockets, so any integration with other asynchronous data sources or sinks must be implemented manually by the end user.
Now imagine we wanted to implement a chat room, perhaps we’re going to use Apache Kafka as our chatroom backend, with one single partition topic per room. To implement this today, a reasonable amount of boilerplate is required, not just to transfer messages from the client to Apache Kafka and back, but also to propogate errors. Furthermore, if a client was producing messages at a high rate - faster than Apache Kafka is willing to consume, these messages are going to buffer on the server, causing it to run out of memory.
There exist however a number of Reactive Streams implementations for Apache Kafka. If JSR-356 were to support Reactive Streams, perhaps in the form of an @OnStream
annotation that can be used as an alternative to @OnMessage
, this is what it might look like to implement such a chat room:
@OnStream
public Publisher<ChatMessage> joinRoom(
@PathParam("room") room,
Publisher<ChatMessage> incomingMessages
) {
// The Kafka subscriber will send any messages it receives to Kafka
Subscriber<ChatMessage> kafkaSubscriber = createKafkaConsumerForRoom(room);
// The Kafka publisher will emit any messages it receives from Kafka
Publisher<ChatMessage> kafkaPublisher = createKafkaProducerForRoom(room);
// We now connect the incoming chat messages to the subscriber
incomingMessages.subscribe(kafkaSubscriber);
// And return the rooms publisher to be published to the WebSocket
return kafkaPublisher;
}
The createKafka*
methods would be code specific for connecting to Kafka, the only JSR-356 specific code would be the above method itself.
CDI
Contexts and Dependency Injection (CDI) could take advantage of Reactive Streams for event publishing and subscribing, with many use cases similar to the ones described above, but another opportunity for CDI that we think is important is rather something that facilitates not just CDI, but all asynchronous processing in general.
The problem comes with CDI implementations using thread locals to propagate context, for example, the context for the current request, which might include the current authenticated user, cached authorization information for the user, etc. The assumption in CDI is that the container will always be in control of the threads that operate within this context. However, with asynchronous processing, that is not the case.
A good example of this is in a chat room. Each user connected to the chat room has an active WebSocket request, which has a particular CDI context that goes with it. Processing of messages received from these users might happen in the right CDI context, but what happens when User A sends a message to the chat room, and then it has to be sent to User B? The thread that is propogating the message to User B now needs the request context for User B’s request, in order to correctly handle the message, but that thread will have User A’s request context, because it was the receipt of a message from User A that initiated the current processing.
The only answers to this in CDI currently is to not use CDI contexts, rather, to capture all context at the start of the request, and from then on use a non CDI mechanism (such as simply passing the context every manually) to use that context when it’s needed. We think CDI can offer a better solution to this, by allowing context to be captured, and set up/torn down in another thread.
JPA
Before JPA can support any asynchronous operations, a standard for asynchronous database drivers is needed. Fortunately, there is currently a lot of active development going on towards this, so we can expect to see progress here I think in the next 12 months. Once that support exists, here are some examples of how JPA could be modified to support asynchronous streaming:
Streamed queries
Database exports are one obvious example for query streaming where you don’t want to load the entire result set into memory, but I think more interesting use cases will arrive in the not too distant future, particularly around event logging, as different services may want to stream events from a database to be brought up to date on demand. Here’s an example of serving one such event stream through a Reactive Streams WebSocket API, using an imaginedgetResultPublisher
method on TypedQuery
:
@OnStream
public Publisher<Event> streamEvents(
@QueryParam("since") Date since
) {
Publisher<Event> events = entityManager.createQuery(
"select e from events where e.timestamp >= :since",
Event.class
).setParameter("since", since)
.getResultPublisher();
return events;
}
Once again, backpressure, completion and error handling is all done for you.
Streamed ingestion
An example of streamed ingestion might be persisting logs, here’s an example where logs a pushed in via WebSockets, using an imagined persistSubscriber
method on EntityManager
:
@OnStream
public void ingestLogs(
Publisher<Log> logs
) {
Subscriber<Log> ingester = entityManager.persistSubscriber(Log.class);
logs.subscribe(ingester);
}
Streaming combinators
In the use cases above, when plumbing streams, there’s often a need to transform the stream in some way, whether it’s a simple map or filter, or perhaps collecting a stream into a single value, or maybe concatenating, broadcasting or merging streams. For Reactive Streams, this functionality is provided by a number of implementations such as Akka Streams, Reactor and RxJava. While these do the job well, it may be desirable for EE4J developers to have common functionality available to them out of the box.
Furthermore, a standard library for streaming combinators allows other APIs to use this to provide more fluent APIs to developers than what the bare Reactive Streams Publisher and Subscriber interfaces offer, without having to depend on a third party library.
Finally, it would allow EE4J libraries to assume a basic set of streaming operators that they can use in their own implementations, since they could not and should not depend on 3rd party libraries. This is very important, as otherwise the burden of reimplementing many operators would fall into the hands of each and every library wanting to provide streaming capabilities, sidetracking them from the core issue they’re attempting to solve, be it database access, WebSockets or anything else.
New possibilities
All of the use cases above only cover changes to existing streaming usages in EE4J, however, adoption of a robust streaming standard like Reactive Streams facilitates new possibilities for features and technologies that EE4J could support.
Here are some examples:
Event sourcing
Event sourcing itself may not be that interesting from a streaming perspective, but the ability to consume the event log as a stream of messages is incredibly interesting. Basing the event log stream on Reactive Streams would allow all sorts of consumers to be plugged in, such as message brokers, WebSockets, CDI event publishers, and databases.
Real time message distribution
Applications are tending to provide more responsive user interfaces, with updates being pushed to users as soon as they happen. While WebSockets may offer the mechanism to communicate with the client, EE4J doesn’t yet offer any backend solution for communicating pushes between clusters of machines in the backend. Message brokers can often be too heavy weight to achieve this, and instead a lighter-weight, inter-node communication mechanism would be better. Reactive Streams is the perfect API for building this on, as it then allows seamless plumbing to WebSockets and other stream sinks/sources.
Data processing pipelines
Current EE4J features tend to be focussed on short lived transactional processes, which have a definite start and definite end. There is a shift in the industry to more stream based approaches, which is seen in stream processing projects like Spark, Flink and Kafka Streams. EE4J could provide a mechanism for integrating EE4J applications into these pipelines, and Reactive Streams would be the perfect API to offer this integration.
Summary
In this article we’ve looked at a broad array of possibilities of what Reactive Streams brings to the table for EE4J. Each possibility brings value in its own right, but the more possibilities that are implemented overall, the more value each brings. It’s our hope at Lightbend that we can now start collaborating with the EE4J community in make all of these possibilities a reality.
Next: sbt 1.2.0 roadmap
What can Reactive Streams offer EE4J?的更多相关文章
- Java 9 揭秘(17. Reactive Streams)
Tips 做一个终身学习的人. 在本章中,主要介绍以下内容: 什么是流(stream) 响应式流(Reactive Streams)的倡议是什么,以及规范和Java API 响应式流在JDK 中的AP ...
- JVM平台上的响应式流(Reactive Streams)规范
// Reactive Streams // 响应式流是一个倡议,用来为具有非阻塞后压的异步流处理提供一个标准.大家努力的目标集中在运行时环境(JVM和JavaScript)和网络协议上. 注:响应式 ...
- Spring Boot Reactive Streams
1 响应式编程规范 目标:provide a standard for asynchronous stream processing with non-blocking backpressure ht ...
- FunDA(6)- Reactive Streams:Play with Iteratees、Enumerator and Enumeratees
在上一节我们介绍了Iteratee.它的功能是消耗从一些数据源推送过来的数据元素,不同的数据消耗方式代表了不同功能的Iteratee.所谓的数据源就是我们这节要讨论的Enumerator.Enumer ...
- FunDA(5)- Reactive Streams:Play with Iteratees
FunDA的设计目标就是把后台数据库中的数据搬到内存里,然后进行包括并行运算的数据处理,最后可能再对后台数据库进行更新.如果需要把数据搬到内存的话,那我们就必须考虑内存是否能一次性容纳所有的数据,有必 ...
- FunDA(7)- Reactive Streams to fs2 Pull Streams
Reactive-Stream不只是简单的push-model-stream, 它还带有“拖式”(pull-model)性质.这是因为在Iteratee模式里虽然理论上由Enumerator负责主动推 ...
- oracle 19c jdbc之Reactive Streams Ingestion (RSI) Library
19c jdbc新特性 https://blogs.oracle.com/dev2dev/whats-new-in-193-and-183-jdbc-and-ucp jdbc实现直接路径加载 http ...
- Akka Stream文档翻译:Quick Start Guide: Reactive Tweets
Quick Start Guide: Reactive Tweets 快速入门指南: Reactive Tweets (reactive tweets 大概可以理解为“响应式推文”,在此可以测试下GF ...
- reactive stream: 响应式编程
既然 Reactive Stream 和 Java 8 引入的 Stream 都叫做流,它们之间有什么关系呢?有一点关系,Java 8 的 Stream 主要关注在流的过滤,映射,合并,而 Reac ...
随机推荐
- pojo类自动生成序列化ID
自动生成序列化ID
- java随笔2 变量类定义
如果要定义变量为对象,就要创建此对象对应的java类, 且定义的类型为类名,且都为private
- yml中driver-class-name: com.mysql.jdbc.Driver 解析不到的问题
当在idea中使用springboot的快捷创建方式时,选中了mysql 和jdbc 那么pom文件中会直接有 <dependency> <groupId>mysql</ ...
- Guava Cache源码详解
目录 一.引子 二.使用方法 2.1 CacheBuilder有3种失效重载模式 2.2 测试验证 三.源码剖析 3.1 简介 3.2 源码剖析 四.总结 优点: 缺点: 正文 回到顶部 一.引子 缓 ...
- web攻擊
一.dos攻擊 向服務器發送數量龐大的合法數據,讓服務器分不清是不是正常請求,導致服務器接收所有的請求.海量的數據請求會使得服務器停止服務和拒絕服務. 防禦:阿里云或其它資源服務器有專門web應用防火 ...
- 天虎云商wap和微信话项目总结
1:架构:以后要采用项目分模块的方式写代码了,不能写一个公用的controller包,每个模块分包,分别建立service,dao,但是模块同级的有个功能的baseDao, BaseSe ...
- Nginx 4层反向代理
L112 是基于TCP POST_ACCEPT阶段 在建立连接后所做的事情 PREACCESS阶段 limit_conn 限流 与HTTP类似 ACCESS阶段 类似HTTP模块用于控制访问权限 S ...
- Android热修复原理
参考:https://www.cnblogs.com/popfisher/p/8543973.html 一. AndFix AndFix的原理就是方法的替换,把有bug的方法替换成补丁文件中的方法. ...
- BZOJ2616 SPOJ PERIODNI(笛卡尔树+树形dp)
考虑建一棵小根堆笛卡尔树,即每次在当前区间中找到最小值,以最小值为界分割区间,由当前最小值所在位置向两边区间最小值所在位置连边,递归建树.那么该笛卡尔树中的一棵子树对应序列的一个连续区间,且根的权值是 ...
- java json转换(一)
主要使用了2个类 JsonConvert.class 和 ConvertHelper.class 由于常规转json.只要model牵涉到复杂的关联实体对象.那么就会出现 深度循环的错误. 因此这里通 ...