Querying or Reading Data

OpenTSDB offers a number of ways to extract data, such as CLI tools, an HTTP API, and GnuPlot graphs. Querying with OpenTSDB's tag-based system can be a bit tricky, so read through this document and check out the following pages for deeper information. Example queries on this page follow the HTTP API format.

Query Components

OpenTSDB's query language is fairly simple but flexible. Each query has the following components:

Parameter | Data Type | Required | Description | Example
--- | --- | --- | --- | ---
Start Time | String or Integer | Yes | Starting time for the query. This may be an absolute or relative time. See Dates and Times for details. | 24h-ago
End Time | String or Integer | No | An end time for the query. If the end time is not supplied, the current time on the TSD will be used. See Dates and Times for details. | 1h-ago
Metric | String | Yes | The full name of a metric in the system. Must be the complete name. Case sensitive. | sys.cpu.user
Aggregation Function | String | Yes | A mathematical function to use in combining multiple time series. | sum
Tags | String | No | An optional set of tags for filtering or grouping. | host=*,dc=lax
Downsampler | String | No | An optional interval and function to reduce the number of data points returned. | 1h-avg
Rate | String | No | An optional flag to calculate the rate of change for the result. | rate
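
As a sketch of how these components fit together, the snippet below assembles a complete HTTP API query string in Python. The localhost:4242 address is an assumption; point it at your own TSD.

from urllib.parse import urlencode

TSD = "http://localhost:4242"  # assumed address of a running TSD

params = {
    "start": "24h-ago",                      # required: absolute or relative
    "end": "1h-ago",                         # optional: defaults to "now" on the TSD
    "m": "sum:sys.cpu.user{host=*,dc=lax}",  # aggregator:metric{tags}
}

# Prints the full query URL (the tag braces are percent-encoded).
print(TSD + "/api/query?" + urlencode(params))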

Times

Absolute time stamps are supported in human readable format or Unix style integers. Relative times may be used for refreshing dashboards. Currently, all queries are able to cover a single time span. In the future we hope to provide an offset query parameter that would allow for aggregations or graphing of a metric over different time periods, such as comparing last week to 1 year ago. See Dates and Times for details on what is permissible.

While OpenTSDB can store data with millisecond resolution, most queries will return the data with second resolution to provide backwards compatibility for existing tools. Unless a down sampling algorithm has been specified with a query, the data will automatically be down sampled to 1 second using the same aggregation function specified in a query. This way, if multiple data points are stored for a given second, they will be aggregated and returned in a normal query correctly.

To extract data with millisecond resolution, use the /api/query endpoint and specify the msResolution JSON parameter or ms query string flag and it will bypass down sampling (unless specified) and return all timestamps in Unix epoch millisecond resolution. Also, the scan command-line utility will return the timestamp as written in storage.
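
For example, a minimal sketch of a millisecond-resolution request against the JSON endpoint might look like the following; the localhost:4242 address is an assumption, and the metric name is taken from the examples below.

import json
import urllib.request

body = {
    "start": "1h-ago",
    "msResolution": True,  # return timestamps in epoch milliseconds
    "queries": [
        {"aggregator": "sum", "metric": "sys.cpu.user"},
    ],
}

req = urllib.request.Request(
    "http://localhost:4242/api/query",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # dps keys will be millisecond timestamps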

Tags

Every time series is comprised of a metric and one or more tag name/value pairs. Since tags are optional in queries, if you request only the metric name, then every time series for that metric, regardless of the number or value of its tags, will be included in the aggregated results. For example, if we have a stored data set:

sys.cpu.user host=webserver01,cpu=0 1356998400 1
sys.cpu.user host=webserver01,cpu=1 1356998400 4
sys.cpu.user host=webserver02,cpu=0 1356998400 2
sys.cpu.user host=webserver02,cpu=1 1356998400 1

and simply craft a query start=1356998400&m=sum:sys.cpu.user, we will get a value of 8 at 1356998400 that incorporates all 4 time series.

If we want to aggregate the results for a specific group, we can filter on the host tag. The query start=1356998400&m=sum:sys.cpu.user{host=webserver01} will return a value of 5, incorporating only the time series where host=webserver01. To drill down to a specific time series, you must include all of the tags for the series, e.g. start=1356998400&m=sum:sys.cpu.user{host=webserver01,cpu=0} will return 1.
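
The arithmetic in these examples is easy to verify with a small self-contained sketch; the helper below reproduces the three results (8, 5, and 1) from the stored data set above.

series = [
    ({"host": "webserver01", "cpu": "0"}, 1),
    ({"host": "webserver01", "cpu": "1"}, 4),
    ({"host": "webserver02", "cpu": "0"}, 2),
    ({"host": "webserver02", "cpu": "1"}, 1),
]

def query_sum(tag_filter):
    """Sum every series whose tags match all of the filter pairs."""
    return sum(value for tags, value in series
               if all(tags.get(k) == v for k, v in tag_filter.items()))

print(query_sum({}))                                   # 8: all four series
print(query_sum({"host": "webserver01"}))              # 5: filter on host
print(query_sum({"host": "webserver01", "cpu": "0"}))  # 1: full tag set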

Grouping

A query can also aggregate time series with multiple tags into groups based on a tag value. Two special characters can be passed to the right of the equals symbol in a query:

  • * - The asterisk will return a separate result for each unique tag value
  • | - The pipe will return a separate result only for the exact tag values specified

Let's take the following data set as an example:

sys.cpu.user host=webserver01,cpu=0 1356998400 1
sys.cpu.user host=webserver01,cpu=1 1356998400 4
sys.cpu.user host=webserver02,cpu=0 1356998400 2
sys.cpu.user host=webserver02,cpu=1 1356998400 1
sys.cpu.user host=webserver03,cpu=0 1356998400 5
sys.cpu.user host=webserver03,cpu=1 1356998400 3

If we want to query for the average CPU time across each server we can craft a query like start=1356998400&m=avg:sys.cpu.user{host=*}. This will give us three results:

  1. The aggregated average for sys.cpu.user host=webserver01,cpu=0 and sys.cpu.user host=webserver01,cpu=1
  2. The aggregated average for sys.cpu.user host=webserver02,cpu=0 and sys.cpu.user host=webserver02,cpu=1
  3. The aggregated average for sys.cpu.user host=webserver03,cpu=0 and sys.cpu.user host=webserver03,cpu=1

However if we have many web servers in the system, this could create a ton of results. To filter on only the hosts we want you can use the pipe operator to select a subset of time series. For example start=1356998400&m=avg:sys.cpu.user{host=webserver01|webserver03} will return results only for webserver01 and webserver03.
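
The two operators are easy to mimic with a group-by; the sketch below (a toy model, not OpenTSDB's implementation) computes the per-host averages for the six-point data set above, first for every host (*) and then for the listed subset (|).

from collections import defaultdict

series = [
    ({"host": "webserver01", "cpu": "0"}, 1),
    ({"host": "webserver01", "cpu": "1"}, 4),
    ({"host": "webserver02", "cpu": "0"}, 2),
    ({"host": "webserver02", "cpu": "1"}, 1),
    ({"host": "webserver03", "cpu": "0"}, 5),
    ({"host": "webserver03", "cpu": "1"}, 3),
]

def group_avg(tag, values=None):
    """Average per tag value; values=None mimics '*', a list mimics '|'."""
    groups = defaultdict(list)
    for tags, v in series:
        if values is None or tags[tag] in values:
            groups[tags[tag]].append(v)
    return {k: sum(vs) / len(vs) for k, vs in groups.items()}

print(group_avg("host"))                                  # * : 2.5, 1.5, 4.0
print(group_avg("host", ["webserver01", "webserver03"]))  # | : 2.5, 4.0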

Aggregation

A powerful feature of OpenTSDB is the ability to perform on-the-fly aggregations of multiple time series into a single set of data points. The original data is always available in storage but we can quickly extract the data in meaningful ways. Aggregation functions are means of merging two or more data points for a single time stamp into a single value. See Aggregators for details.

Interpolation

When performing an aggregation, what happens if the time stamps of the data points for each time series fail to line up? Say we record the temperature every 5 minutes in different regions around the world. A sensor in Paris may send a temperature of 27c at 1356998400. Then a sensor in San Francisco may send a value of 18c at 1356998430, 30 seconds later. Antarctica may report -29c at 1356998529. If we run a query requesting the average temperature, we want all of the data points averaged together into a single point. This is where interpolation comes into play. See Aggregators for details.
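
As a rough illustration of the idea (OpenTSDB performs this alignment internally), linear interpolation estimates what a series would have read at a timestamp that falls between two of its actual points. The second Paris reading below (25c at 1356998700) is an assumed value for the sake of the example.

def lerp(t, t1, v1, t2, v2):
    """Linearly interpolate the value at time t between (t1, v1) and (t2, v2)."""
    return v1 + (v2 - v1) * (t - t1) / (t2 - t1)

# Paris reported 27 at 1356998400; to average it against San Francisco's
# reading at 1356998430 we estimate what Paris would have read then.
print(lerp(1356998430, 1356998400, 27, 1356998700, 25))  # 26.8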

Downsampling

OpenTSDB can ingest a large amount of data, even a data point every second for a given time series. Thus queries may return a large number of data points. Accessing the results of a query with a large number of points from the API can eat up bandwidth. High frequencies of data can easily overwhelm Javascript graphing libraries, hence the choice to use GnuPlot. Graphs created by the GUI can be difficult to read, resulting in thick, overlapping lines.

Down sampling can be used at query time to reduce the number of data points returned so that you can extract better information from a graph or pass less data over a connection. Down sampling requires an aggregation function and a time interval. The aggregation function is used to compute a new data point across all of the data points in the specified interval with the proper mathematical function. For example, if the aggregation sum is used, then all of the data points within the interval will be summed together into a single value. If avg is chosen, then the average of all data points within the interval will be returned.
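
As a minimal sketch of the mechanics (a toy model, not OpenTSDB's actual implementation), downsampling buckets the points by interval and applies the aggregation function within each bucket; the sample points here are hypothetical.

from collections import defaultdict

def downsample(points, interval_ms, agg):
    """points: [(timestamp_ms, value)]; returns one aggregated point per bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - (ts % interval_ms)].append(value)  # normalized bucket key
    return sorted((ts, agg(values)) for ts, values in buckets.items())

points = [(1388550980000, 2), (1388551040000, 4), (1388554700000, 6)]
print(downsample(points, 3600000, sum))                           # hourly sum
print(downsample(points, 3600000, lambda vs: sum(vs) / len(vs)))  # hourly avg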

Intervals are specified by a number and a unit of time. For example, 30m will aggregate data points every 30 minutes. 1h will aggregate across an hour. See Dates and Times for valid relative time units. Do not add the -ago to a down sampling query.

Using down sampling we can clean up such a graph and arrive at something much more useful.

As of 2.1, downsampled timestamps are normalized based on the remainder of the original data point timestamp divided by the downsampling interval in milliseconds, i.e. the modulus. In Java the code is timestamp - (timestamp % interval_ms). For example, given a timestamp of 1388550980000, or 1/1/2014 04:36:20 UTC and an hourly interval that equates to 3600000 milliseconds, the resulting timestamp will be rounded to 1388548800000. All data points between 4 and 5 UTC will wind up in the 4 AM bucket. If you query for a day's worth of data downsampling on 1 hour, you will receive 24 data points (assuming there is data for all 24 hours).

Normalization works very well for common queries such as a day's worth of data downsampled to 1 minute or 1 hour. However if you try to downsample on an odd interval, such as 36 minutes, then the timestamps may look a little strange due to the nature of the modulus calculation. Given an interval of 36 minutes and our example above, the interval would be 2160000 milliseconds and the resulting timestamp 1388549520000, or 04:12:00 UTC. All data points between 04:12 and 04:48 would wind up in a single bucket. Also note that OpenTSDB cannot currently normalize on non-UTC times and it cannot normalize on weekly or monthly boundaries.
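
The normalization rule is easy to check with the numbers from the text:

def normalize(timestamp_ms, interval_ms):
    """Snap a timestamp to the start of its downsampling bucket."""
    return timestamp_ms - (timestamp_ms % interval_ms)

ts = 1388550980000             # 1/1/2014 04:36:20 UTC
print(normalize(ts, 3600000))  # 1388548800000: the 04:00 UTC bucket
print(normalize(ts, 2160000))  # 1388549520000: the 04:12 UTC bucket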

Rate

A number of data sources return values as constantly incrementing counters. One example is a web site hit counter. When you start a web server, it may have a hit counter of 0. After five minutes the value may be 1,024. After another five minutes it may be 2,048. The graph for a counter will be a somewhat straight line angling up to the right and isn't always very useful. OpenTSDB provides the rate key word that calculates the rate of change in values over time. This will transform counters into lines with spikes to show you when activity occurred and can be much more useful.

The rate is the first derivative of the values. It's defined as (v2 - v1) / (t2 - t1). Therefore you will get the rate of change per second. Currently the rate of change between millisecond values defaults to a per second calculation.
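
In code the definition is a one-liner; the hit-counter numbers below come from the example above (0 hits, then 1,024 after five minutes, then 2,048 after ten).

def rate(t1, v1, t2, v2):
    """Rate of change per second between two points; timestamps in seconds."""
    return (v2 - v1) / (t2 - t1)

print(rate(0, 0, 300, 1024))       # ~3.41 hits/second
print(rate(300, 1024, 600, 2048))  # ~3.41 hits/second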

OpenTSDB 2.0 provides support for special monotonically increasing counter data handling including the ability to set a "rollover" value and suppress anomalous fluctuations. When the counterMax value is specified in a query, if a data point approaches this value and the point after is less than the previous, the max value will be used to calculate an accurate rate given the two points. For example, if we were recording an integer counter in 2 bytes, the maximum value would be 65,535. If the value at t0 is 64000 and the value at t1 is 1000, the resulting rate would usually be calculated as -63000. However we know that it's likely the counter rolled over, so we can set the max to 65535 and now the calculation will be 65535 - t0 + t1 to give us 2535.

Systems that track data in counters often revert to 0 when restarted. When that happens, we could get a spurious result when using the max counter feature. For example, if the counter has reached 2000 at t0 and someone reboots the server, the next value may be 500 at t1. If we set our max to 65535 the result would be 65535 - 2000 + 500 to give us 64035. If the normal rate is a few points per second, this particular spike, with 30s between points, would create a rate spike of 2,134.5! To avoid this, we can set the resetValue which will, when the rate exceeds this value, return a data point of 0 so as to avoid spikes in either direction. For the example above, if we know that our rate almost never exceeds 100, we could configure a resetValue of 100 and when the data point above is calculated, it will return 0 instead of 2,134.5. The default value of 0 means the reset value will be ignored and no rates will be suppressed.
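
Putting the two knobs together, here is a sketch of the logic as described above (not OpenTSDB's internal code): counterMax corrects for rollover, and resetValue suppresses the spurious spike after a restart.

def counter_rate(t1, v1, t2, v2, counter_max=None, reset_value=0):
    """Per-second rate with rollover correction and spike suppression."""
    if counter_max is not None and v2 < v1:
        delta = counter_max - v1 + v2  # assume the counter rolled over
    else:
        delta = v2 - v1
    r = delta / (t2 - t1)
    # A reset_value of 0 (the default) means no suppression at all.
    return 0 if reset_value and r > reset_value else r

# Rollover: a 2-byte counter reads 64000, then 1000, 30 seconds later.
print(counter_rate(0, 64000, 30, 1000, counter_max=65535))  # 2535 / 30 = 84.5
# Restart: 2000 then 500; the naive max-corrected rate spikes to 2134.5.
print(counter_rate(0, 2000, 30, 500, counter_max=65535))    # 2134.5
print(counter_rate(0, 2000, 30, 500, counter_max=65535, reset_value=100))  # 0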

Order of operations

Understanding the order of operations is important. When returning query results the following is the order in which processing takes place:

  1. Grouping
  2. Down Sampling
  3. Interpolation
  4. Aggregation
  5. Rate Calculations
