The missing piece between MQTT and a SQL database in a M2M landscape

Message Queue Telemetry Transport (MQTT) is awesome when it comes to Machine-to-Machine (M2M) communication. Thanks to its publish/subscribe pattern it offers great scalability, even with thousands of connected devices.

The picture above shows a classic M2M landscape with a few publishers and a few subscribers.

From the perspective of a provider of M2M services (which you are as soon as you host your own broker, e.g. for home automation or for your own applications), you typically have additional needs in order to generate added value for yourself or your customers. So let's say you want to store all MQTT messages published to the broker in a SQL database for later analysis.

The concrete use case

In our concrete use case we want to store every message in a SQL database, let's say a MySQL/MariaDB. The following simple database schema will be used:
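
The following is a minimal sketch of such a table, matching the INSERT statement used in the code below; the surrogate id column and the exact column types are assumptions:

    CREATE TABLE `Messages` (
      `id` BIGINT NOT NULL AUTO_INCREMENT,
      `message` BLOB,
      `topic` VARCHAR(255),
      `quality_of_service` TINYINT,
      PRIMARY KEY (`id`)
    );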

Implementation with a wildcard subscriber

The easiest way to achieve this is to add an additional client which subscribes to the wildcard topic (which happens to be # in MQTT). This ensures that the client receives all messages distributed by the broker. The client can then persist each message to the MySQL database as it arrives.

This would look like this:

We chose to implement the client with the Eclipse Paho library. For brevity only the relevant callback, which is invoked on message arrival, is shown here. The full source code can be found here.

 
 
    ......
    private static final String SQL_INSERT = "INSERT INTO `Messages` (`message`,`topic`,`quality_of_service`) VALUES (?,?,?)";
     ......
 
    @Override
    public void messageArrived(MqttTopic topic, MqttMessage message) throws Exception {
 
        //Let's assume we have a prepared statement with the SQL.
        try {
            statement.setBytes(1, message.getPayload());
            statement.setString(2, topic.getName());
            statement.setInt(3, message.getQos());
 
            //Ok, let's persist to the database
            statement.executeUpdate();
        } catch (SQLException e) {
            log.error("Error while inserting", e);
        }
 
    }

So we essentially just implemented the messageArrived method, which is called every time a new message arrives. Then we just persist it with a plain ol’ JDBC Prepared Statement. That’s all.
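
For completeness, here is a minimal sketch of how such a wildcard subscriber could be wired up with the Paho client (assuming the same Paho version as above). The broker URL, the client id and the PersistingCallback class (a hypothetical MqttCallback implementation containing the messageArrived method shown above) are assumptions for illustration:

import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttException;

public class WildcardSubscriber {

    public static void main(String[] args) throws MqttException {

        //Hypothetical broker URL and client id
        final MqttClient client = new MqttClient("tcp://localhost:1883", "wildcard-subscriber");

        //PersistingCallback is a hypothetical MqttCallback implementation with the messageArrived method from above
        client.setCallback(new PersistingCallback());

        client.connect();

        //Subscribe to the wildcard topic so every message distributed by the broker arrives at this client
        client.subscribe("#");
    }
}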

Gotchas and Limitations

This approach works well in some scenarios but has some downsides. Some of the challenges we will face with that approach could be:

  • What happens if the wildcard subscriber disconnects? What happens when it reconnects?
  • Isn’t the wildcard subscriber some kind of bottleneck?
  • Do we need different wildcard subscribers when we want to integrate e.g. a second database?
  • Is there a way to ensure that each message will be sent only once?

Let’s look into these questions in more detail.

What happens on subscriber disconnect or reconnect?

A tough problem is how to handle disconnects of the wildcard subscriber. In a nutshell: every message the broker distributes while the wildcard subscriber is disconnected will never be received by it. In our case that means we cannot persist these messages to the database.

Another challenge are retained messages. A retained message is stored at the broker and delivered as soon as a client subscribes to the topic of the retained message. In our case these messages should not be written to the database again, because we most likely already received them with the original ("normal") publish. To avoid both shortcomings, the wildcard subscriber can connect with clean session = false, so the broker remembers the client's subscriptions and queues missed QoS 1 and 2 messages while it is offline. Since the client does not need to re-subscribe after a reconnect, the retained messages are not delivered to it a second time either.
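
In the wildcard subscriber sketch above, this boils down to connecting with MqttConnectOptions (from org.eclipse.paho.client.mqttv3) instead of the plain connect() call. Again just a sketch:

    //Replaces the plain client.connect() in the sketch above
    final MqttConnectOptions options = new MqttConnectOptions();
    options.setCleanSession(false); //broker keeps the session and queues QoS 1/2 messages while the client is offline

    client.connect(options);
    client.subscribe("#", 2); //subscribe with QoS 2 so queued messages are not downgraded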

Isn’t the wildcard subscriber some kind of bottleneck?

Short answer: Yes, most likely.

Slightly longer answer: It depends. In scenarios with very low message throughput a wildcard subscriber is no problem from a performance perspective. When you are dealing with thousands, tens of thousands or even hundreds of thousands of publishing clients, there is a chance that the client library cannot handle the load or slows down the overall system throughput. Another key factor is that all messages from the broker to the wildcard subscriber have to go over the network, which can result in unnecessary traffic. It is of course possible to run the subscribing client on the same machine as the broker. This solves the traffic problem, but then the broker and the subscriber share the same system resources and both applications carry the messaging overhead, which is not optimal. This is even more serious in a clustered broker environment.

Do we need different wildcard subscribers when we want to integrate a second database?

It depends on your use case and your expected message throughput. If, for example, all your writes to the different databases are blocking, you will probably hit the bottleneck problem earlier than with just one integrated database. To distribute the database load, it can be a smart idea to have different subscribers for different databases. If your writes are non-blocking, you can handle this with a single wildcard subscriber, for example by handing the writes off to a worker pool as sketched below.
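
A minimal sketch of such a non-blocking variant of the messageArrived callback, assuming a hypothetical persist helper that executes the prepared INSERT statement shown earlier:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.MqttTopic;

public class AsyncPersistingCallback {

    //Small worker pool so the MQTT callback thread is not blocked by JDBC
    private final ExecutorService dbWriters = Executors.newFixedThreadPool(4);

    public void messageArrived(MqttTopic topic, MqttMessage message) throws Exception {

        final String topicName = topic.getName();
        final byte[] payload = message.getPayload();
        final int qos = message.getQos();

        //Hand the blocking database write off to the worker pool
        dbWriters.submit(new Runnable() {
            @Override
            public void run() {
                persist(topicName, payload, qos);
            }
        });
    }

    private void persist(String topic, byte[] payload, int qos) {
        //Hypothetical helper: execute the prepared INSERT statement shown earlier
    }
}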

Is there a way to ensure that each message will be only sent once?

This can only be achieved when all publishers publish with MQTT Quality of Service 2, which guarantees that each message is delivered to the broker exactly once. If the subscribing client also subscribes with Quality of Service 2, it is guaranteed that every message arrives exactly once at the subscriber. This approach has two problems: it is unlikely that you can make sure all publishers send with Quality of Service 2, and Quality of Service 2 makes it much harder to scale.
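
As a sketch of the publisher side (topic, payload and client id are made up for illustration), publishing with Quality of Service 2 with Paho looks like this:

import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttException;

public class Qos2Publisher {

    public static void main(String[] args) throws MqttException {

        final MqttClient client = new MqttClient("tcp://localhost:1883", "temperature-sensor-1");
        client.connect();

        //QoS 2 (exactly once), not retained
        client.publish("sensors/temperature", "21.5".getBytes(), 2, false);

        client.disconnect();
    }
}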

Implementation with the HiveMQ plugin system

To overcome these problems, we designed the HiveMQ MQTT broker with a powerful plugin system. This plugin system lets you hook into HiveMQ with custom code to extend the broker with additional functionality, enabling deep integration into existing systems and the implementation of individual use cases in an elegant and simple manner. Let us see how the SQL integration can be solved with the HiveMQ plugin system.

In this scenario, the plugin system of HiveMQ takes care of persisting the messages. No subscriber (and no publisher) is aware of the persistence mechanism, which essentially solves all the problems we identified. But let us first look at how this is implemented:

Implementation

 
 
public class MessageStoreCallback implements OnPublishReceivedCallback {

    private static final String SQL_INSERT = "INSERT INTO `Messages` (`message`,`topic`,`quality_of_service`) VALUES (?,?,?)";

    private final BoneCP connectionPool;

    @Inject
    public MessageStoreCallback(BoneCP connectionPool) {
        this.connectionPool = connectionPool;
    }

    @Override
    public void onPublishReceived(PUBLISH publish, String clientId) throws OnPublishReceivedException {

        //try-with-resources returns the connection to the pool and closes the statement even if the insert fails
        try (final Connection connection = connectionPool.getConnection();
             final PreparedStatement preparedStatement = connection.prepareStatement(SQL_INSERT)) {

            preparedStatement.setBytes(1, publish.getPayload());
            preparedStatement.setString(2, publish.getTopic());
            preparedStatement.setInt(3, publish.getQoS().getQosNumber());

            preparedStatement.executeUpdate();

        } catch (SQLException e) {
            throw new OnPublishReceivedException(e, false); //We do not disconnect the publishing client here
        }
    }
}

Looking at the code, we can see that it is almost the same as the wildcard subscriber implementation. The difference is that we get much more information about the publish message than before. We can access all attributes a publish message consists of (like the retained and duplicate flags) and we get information about the client which published the message. This enables finer control over what we want to persist (what about only persisting messages from a specific client?). Additionally, it is possible to disconnect a client when something wrong or illegal was published. This can be achieved with the OnPublishReceivedException.

For better performance we inject a BoneCP connection pool to obtain the database connections. Since all plugins can hook into HiveMQ and reuse its components via dependency injection, plugins stay easy to test. It is of course possible to write plugins without dependency injection, however, it is not recommended.
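
As a sketch, such a pool could be configured like this. The JDBC URL, credentials and pool sizes are placeholders; in a real plugin the pool would typically be provided via dependency injection rather than constructed inline:

import java.sql.SQLException;

import com.jolbox.bonecp.BoneCP;
import com.jolbox.bonecp.BoneCPConfig;

public class ConnectionPoolProvider {

    public BoneCP createPool() throws SQLException {

        final BoneCPConfig config = new BoneCPConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/mqtt"); //placeholder database
        config.setUsername("hivemq");                          //placeholder credentials
        config.setPassword("secret");

        //Modest pool sizing for the sketch
        config.setMinConnectionsPerPartition(2);
        config.setMaxConnectionsPerPartition(10);

        return new BoneCP(config);
    }
}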

Key benefits

All the problems we identified with Wildcard subscribers are solved with the plugin system:

  • No messages are lost since the broker takes care of the message handling.
  • There is no bottleneck. All plugin executions are completely asynchronous and do not slow down the broker.
  • We can choose if we write different plugins for different use cases (e.g. a second database) but we do not need to.
  • Every plugin execution for a message occurs exactly once, so we do not have to worry about duplicates.

These benefits also hold in a clustered HiveMQ environment. With the HiveMQ plugin system we are not only able to write MQTT messages to a MySQL database in an efficient way, we can also use the same mechanism to integrate HiveMQ into an existing software landscape. It is easy to integrate an Enterprise Service Bus (ESB), call REST APIs, connect your billing system or even publish new MQTT messages when specific messages occur.

Summary

We discussed two ways to handle the storage of MQTT messages in an existing SQL database. We looked at the downsides of using a wildcard subscriber MQTT client and why this approach does not scale well. We learned that the HiveMQ plugin system solves these problems and allows you to deeply integrate the HiveMQ broker with existing systems (which happens to be a SQL database in our example).

More information about the plugin system will follow soon! Don't hesitate to contact us if you want to learn more about how HiveMQ and its plugin system can help you.

As a final note it is worth mentioning that a SQL database can quickly become a bottleneck at high message throughput. We recommend using a NoSQL store for such tasks, but this will be discussed in a follow-up blog post.
