The New InfluxDB Storage Engine: Time Structured Merge Tree

by Paul Dix | Oct 7, 2015 | InfluxDB | 0 comments

For more than a year we’ve been talking about potentially making a storage engine purpose-built for our use case of time series data. Today I’m excited to announce that we have the first version of our new storage engine available in a nightly build for testing. We’re calling it the Time Structured Merge Tree or TSM Tree for short.

In our early testing we’ve seen up to a 45x improvement in disk space usage over 0.9.4 and we’ve written 10,000,000,000 (10B) data points (divided over 100k unique series written in 5k point batches) at a sustained rate greater than 300,000 per second on an EC2 c3.8xlarge instance. This new engine uses up to 98% less disk space to store the same data as 0.9.4 with no reduction in query performance.

In this post I’ll talk a little bit about the new engine and give pointers to more detailed write-ups and instructions on how to get started with testing it out.

Columnar storage and unlimited fields

The new storage engine is a columnar format, which means that having multiple fields won’t negatively affect query performance. For this engine we’ve also lifted the limitation on the number of fields you can have in a measurement. For instance, you could have MySQL as the thing you’re measuring and represent each of the few hundred different metrics that you gather from MySQL as separate fields.

Even though the engine isn’t optimized for updates, the new columnar format also means that it’s possible to update single fields without touching the other fields for a given data point.

Compression

We use multiple compression techniques which vary depending on the data type of the field and the precision of the timestamps. Timestamp precision matters because you can represent them down to nanosecond scale. For timestamps we use delta encoding, scaling and compression using simple8b, run-length encoding or falling back to no compression if the deltas are too large. Timestamps in which the deltas are small and regular compress best. For instance, we can get great compression on nanosecond timestamps if they’re only 10ns apart each. We’d achieve the same level of compression for second precision timestamps that are 10s apart.

We use the same delta encoding for floats mentioned in Facebook’s Gorilla paper, bits for booleans, delta encoding for integers, and Snappy compression for strings. We’re also considering adding dictionary style compression for strings, which is very efficient if you have repeated strings.

Depending on the shape of your data, the total size for storage including all tag metadata can range from 2 bytes per point on the low end to more for random data. We found that random floats with second level precision in series sampled every 10 seconds take about 3 bytes per point. For reference, Graphite’s Whisper storage uses 12 bytes per point. Real world data will probably look a bit better since there are often repeated values or small deltas.

LSM Tree similarities

The new engine has similarities with LSM Trees (like LevelDB and Cassandra’s underlying storage). It has a write ahead log, index files that are read only, and it occasionally performs compactions to combine index files. We’re calling it a Time Structured Merge Tree because the index files keep contiguous blocks of time and the compactions merge those blocks into larger blocks of time.

Compression of the data improves as the index files are compacted. Once a shard becomes cold for writes it will be compacted into as few files as possible, which yield the best compression.

转自:https://www.influxdata.com/new-storage-engine-time-structured-merge-tree/

 

InfluxDB存储引擎Time Structured Merge Tree——本质上和LSM无异,只是结合了列存储压缩,其中引入fb的float压缩,字串字典压缩等的更多相关文章

  1. MySQL存储引擎 InnoDB/ MyISAM/ MERGE/ BDB 的区别

    MyISAM:默认的MySQL插件式存储引擎,它是在Web.数据仓储和其他应用环境下最常使用的存储引擎之一.注意,通过更改 STORAGE_ENGINE 配置变量,能够方便地更改MySQL服务器的默认 ...

  2. mysql 开发基础系列11 存储引擎memory和merge介绍

    一. memory存储引擎 memoery存储引擎是在内存中来创建表,每个memory表只实际对应一个磁盘文件格式是.frm.   该引擎的表访问非常得快,因为数据是放在内存中,且默认是hash索引, ...

  3. HBase底层存储原理——我靠,和cassandra本质上没有区别啊!都是kv 列存储,只是一个是p2p另一个是集中式而已!

    理解HBase(一个开源的Google的BigTable实际应用)最大的困难是HBase的数据结构概念究竟是什么?首先HBase不同于一般的关系数据库,它是一个适合于非结构化数据存储的数据库.另一个不 ...

  4. facebook Presto SQL分析引擎——本质上和spark无异,分解stage,task,MR计算

    Presto 是由 Facebook 开源的大数据分布式 SQL 查询引擎,适用于交互式分析查询,可支持众多的数据源,包括 HDFS,RDBMS,KAFKA 等,而且提供了非常友好的接口开发数据源连接 ...

  5. MySQL(逻辑分层,存储引擎,sql优化,索引优化以及底层实现(B+Tree))

    一 , 逻辑分层 连接层:连接与线程处理,这一层并不是MySQL独有,一般的基于C/S架构的都有类似组件,比如连接处理.授权认证.安全等. 服务层:包括缓存查询.解析器.优化器,这一部分是MySQL核 ...

  6. [转][译] 存储引擎原理:LSM

    原译文地址:http://www.tuicool.com/articles/qqQV7za http://www.zhihu.com/question/19887265 http://blog.csd ...

  7. Log Structured Merge Trees(LSM) 算法

    十年前,谷歌发表了 “BigTable” 的论文,论文中很多很酷的方面之一就是它所使用的文件组织方式,这个方法更一般的名字叫 Log Structured-Merge Tree. LSM是当前被用在许 ...

  8. 第 3 章 MySQL 存储引擎简介

    第 3 章 MySQL 存储引擎简介 前言 3.1 MySQL 存储引擎概述 MyISAM 存储引擎是 MySQL 默认的存储引擎,也是目前 MySQL 使用最为广泛的存储引擎之一.他的前身就是我们在 ...

  9. 《MySQL技术内幕:InnoDB存储引擎(第2版)》书摘

    MySQL技术内幕:InnoDB存储引擎(第2版) 姜承尧 第1章 MySQL体系结构和存储引擎 >> 在上述例子中使用了mysqld_safe命令来启动数据库,当然启动MySQL实例的方 ...

随机推荐

  1. [bzoj2806][Ctsc2012]Cheat(后缀自动机(SAM)+二分答案+单调队列优化dp)

    偷懒直接把bzoj的网页内容ctrlcv过来了 2806: [Ctsc2012]Cheat Time Limit: 20 Sec  Memory Limit: 256 MBSubmit: 1943   ...

  2. [LUOGU] P1316 丢瓶盖

    题目描述 陶陶是个贪玩的孩子,他在地上丢了A个瓶盖,为了简化问题,我们可以当作这A个瓶盖丢在一条直线上,现在他想从这些瓶盖里找出B个,使得距离最近的2个距离最大,他想知道,最大可以到多少呢? 输入输出 ...

  3. KBE_创建项目和基本常识

    此笔记参考官方文档 第一个项目 资产库:是每一个项目文件夹的名称,使用KBE提供的生成工具生成一个最小资产库,其中包含了很多常用的工具,默认名server_assets: res:放置一些资源(入地图 ...

  4. HDU 6446 Tree and Permutation(赛后补题)

    >>传送门<< 分析:这个题是结束之后和老师他们讨论出来的,很神奇:刚写的时候一直没有注意到这个是一个树这个条件:和老师讨论出来的思路是,任意两个结点出现的次数是(n-1)!, ...

  5. 3D标签云

    一.圆的坐标表达式 for(var i = 0;i < len;i++){ degree = (2*(k+1)-1)/len - 1;a = Math.acos(degree);//这样取得弧度 ...

  6. HDU 4465 递推与double的精确性

    题目大意不多说了 这里用dp[i][0] 代表取完第一个盒子后第二个盒子剩 i 个的概率,对应期望就是dp[i][0] *i dp[i][1] 就代表取完第二个盒子后第一个盒子剩 i 个的概率 dp[ ...

  7. 【HDOJ6146】Pokémon GO(DP,计数)

    题意:一个2*n的矩阵,从任意一格出发,不重复且不遗漏地走遍所有格子,问方案数 mo 10^9+7 n<=10000 思路:因为OEIS搜出来的两个数列都是错误的,所以考虑DP 设B[i]为2* ...

  8. 【BZOJ4591】超能粒子炮·改(Lucas定理,组合计数)

    题意: 曾经发明了脑洞治疗仪&超能粒子炮的发明家SHTSC又公开了他的新发明:超能粒子炮·改--一种可以发射威力更加 强大的粒子流的神秘装置.超能粒子炮·改相比超能粒子炮,在威力上有了本质的提 ...

  9. transaction transaction transaction 最大费用最大流转化到SPFA最长路

    //当时比赛的时候没有想到可以用SPFA做,TLE! Problem Description Kelukin is a businessman. Every day, he travels aroun ...

  10. sqlserver2008 存储过程使用表参数

    ----首先,我们定义一个表值参数类型,其实就是一个表变量   Create type dbo.tp_Demo_MultiRowsInsert as Table   (   [PName] [Nvar ...