Querying or Reading Data

OpenTSDB offers several ways to extract data: CLI tools, an HTTP API, and GnuPlot graphs. Querying with OpenTSDB's tag-based system can be a bit tricky, so read through this document and check out the following pages for deeper information. Example queries on this page follow the HTTP API format.

In short: data can be pulled out through the CLI tools, the HTTP API, or as GnuPlot graphs. Tag-based querying takes some getting used to, so the sections below go through it piece by piece.

Query Components

OpenTSDB's query language is fairly simple but flexible. Each query has the following components:

In short: the query language is simple yet flexible, and every query is built from the following parts:

| Parameter | Data Type | Required | Description | Example |
|---|---|---|---|---|
| Start Time | String or Integer | Yes | Starting time for the query. Both absolute and relative times are supported. See Dates and Times for details. | 24h-ago |
| End Time | String or Integer | No | An end time for the query. If the end time is not supplied, the current time on the TSD will be used. See Dates and Times for details. | 1h-ago |
| Metric | String | Yes | The full name of a metric in the system. Must be the complete name. Case sensitive. | sys.cpu.user |
| Aggregation Function | String | Yes | A mathematical function used to combine multiple time series. | sum |
| Tags | String | No | An optional set of tags for filtering or grouping. | host=*,dc=lax |
| Downsampler | String | No | An optional interval and function to reduce the number of data points returned. | 1h-avg |
| Rate | String | No | An optional flag to calculate the rate of change for the result. | rate |
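As a rough illustration of how these components fit together, the sketch below assembles them into an HTTP API query string. The host and port (`localhost:4242`), the helper names, and the ordering of `rate` before the downsampler inside the `m` value are assumptions made for this example; check them against the HTTP API reference for your version. Only `/api/query`, `start`, and `m` appear in the text on this page.

```python
from urllib.parse import urlencode


def build_metric_spec(aggregator, metric, tags=None, downsample=None, rate=False):
    """Join the query components into a colon-separated value for the m parameter."""
    parts = [aggregator]
    if rate:
        parts.append("rate")          # optional rate flag
    if downsample:
        parts.append(downsample)      # optional downsampler such as "1h-avg"
    tag_part = ""
    if tags:
        tag_part = "{" + ",".join(f"{k}={v}" for k, v in tags.items()) + "}"
    parts.append(metric + tag_part)
    return ":".join(parts)


def build_query_url(base_url, start, metric_spec, end=None):
    """Build a query URL; base_url is a hypothetical TSD address."""
    params = {"start": start}
    if end is not None:
        params["end"] = end
    params["m"] = metric_spec
    # Leave the characters used by the metric spec unescaped so the URL stays readable.
    return f"{base_url}/api/query?" + urlencode(params, safe=":{}=*,|")


spec = build_metric_spec(
    "sum", "sys.cpu.user", tags={"host": "*", "dc": "lax"}, downsample="1h-avg", rate=True
)
print(build_query_url("http://localhost:4242", "24h-ago", spec, end="1h-ago"))
```

Some HTTP clients and proxies require the braces to be percent-encoded; dropping the `safe` argument produces a fully escaped URL instead.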

Times

Absolute timestamps are supported in human-readable format or as Unix-style integers. Relative times may be used for refreshing dashboards. Currently, a query can cover only a single time span. In the future we hope to provide an offset query parameter that would allow for aggregations or graphing of a metric over different time periods, such as comparing last week to 1 year ago. See Dates and Times for details on what is permissible.

In short: both human-readable and Unix-style timestamps are accepted, and relative times are mainly useful for refreshing dashboards.

Every query currently covers one time range. A future offset parameter may allow comparing different periods, for example last week against a year ago.

While OpenTSDB can store data with millisecond resolution, most queries will return the data with second resolution to provide backwards compatibility for existing tools. Unless a down sampling algorithm has been specified with a query, the data will automatically be down sampled to 1 second using the same aggregation function specified in a query. This way, if multiple data points are stored for a given second, they will be aggregated and returned in a normal query correctly.

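In short: although OpenTSDB can store data at millisecond resolution, most queries return data at second resolution. Unless the query specifies its own downsampler, the data is automatically downsampled to 1 second using the query's aggregation function.

If several data points fall within the same second, they are aggregated into one value and returned by a normal query.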

To extract data with millisecond resolution, use the /api/query endpoint and specify the msResolution JSON parameter or the ms query string flag. This bypasses down sampling (unless a downsampler is specified) and returns all timestamps in Unix epoch millisecond resolution. Also, the scan command-line utility will return the timestamp as written in storage.

In short: to get millisecond data, call /api/query with the msResolution JSON parameter or the ms query string flag; timestamps then come back as Unix epoch milliseconds.

The scan command-line utility returns timestamps exactly as they are stored.
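The automatic 1-second down sampling described above can be pictured with a small sketch. It assumes points are `(timestamp_ms, value)` pairs and that the query's aggregation function is `sum`; the function name and sample data are made up for illustration, and this is not the TSD's actual implementation.

```python
from collections import defaultdict


def collapse_to_seconds(points_ms, aggregate=sum):
    """Group (timestamp_ms, value) pairs by whole second and aggregate each group."""
    buckets = defaultdict(list)
    for ts_ms, value in points_ms:
        buckets[ts_ms // 1000].append(value)   # drop the millisecond part
    return [(second, aggregate(values)) for second, values in sorted(buckets.items())]


# Two points share second 1356998400, so they are merged into one value.
points = [(1356998400100, 1), (1356998400600, 2), (1356998401050, 5)]
print(collapse_to_seconds(points))  # [(1356998400, 3), (1356998401, 5)]
```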

Tags

Every time series consists of a metric and one or more tag name/value pairs. Since tags are optional in queries, if you request only the metric name, then every time series for that metric, with any number or value of tags, will be included in the aggregated results. For example, if we have a stored data set:

sys.cpu.user host=webserver01,cpu=0 1356998400 1
sys.cpu.user host=webserver01,cpu=1 1356998400 4
sys.cpu.user host=webserver02,cpu=0 1356998400 2
sys.cpu.user host=webserver02,cpu=1 1356998400 1

and simply craft a query start=1356998400&m=sum:sys.cpu.user, we will get a value of 8 at 1356998400 that incorporates all 4 time series.

In short: every time series is made up of a metric plus one or more tag name/value pairs.

Tags are optional in a query. If only the metric name is given, every series of that metric is folded into the result, whatever its tags.

With the four stored data points above, the query start=1356998400&m=sum:sys.cpu.user combines all four time series and returns the value 8.

If we want to aggregate the results for a specific group, we can filter on the host tag. The query start=1356998400&m=sum:sys.cpu.user{host=webserver01} will return a value of 5, incorporating only the time series where host=webserver01. To drill down to a specific time series, you must include all of the tags for the series, e.g. start=1356998400&m=sum:sys.cpu.user{host=webserver01,cpu=0} will return 1.

In short: to aggregate one particular group, filter on the host tag. The query start=1356998400&m=sum:sys.cpu.user{host=webserver01} returns 5 because only the host=webserver01 series are combined. To reach a single time series, supply all of its tags: start=1356998400&m=sum:sys.cpu.user{host=webserver01,cpu=0} returns 1.
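The three results quoted in this section (8, 5, and 1) can be reproduced with a few lines of code. This is only a sketch of tag filtering over the sample data set at timestamp 1356998400; the in-memory layout and helper names are assumptions for illustration.

```python
# Each entry is (tags, value) for the data point stored at 1356998400.
SERIES = [
    ({"host": "webserver01", "cpu": "0"}, 1),
    ({"host": "webserver01", "cpu": "1"}, 4),
    ({"host": "webserver02", "cpu": "0"}, 2),
    ({"host": "webserver02", "cpu": "1"}, 1),
]


def matches(tags, filters):
    """A series matches when every filter tag is present with the requested value."""
    return all(tags.get(name) == value for name, value in filters.items())


def sum_series(series, filters=None):
    """Sum the values of all series that pass the tag filters."""
    filters = filters or {}
    return sum(value for tags, value in series if matches(tags, filters))


print(sum_series(SERIES))                                      # 8
print(sum_series(SERIES, {"host": "webserver01"}))             # 5
print(sum_series(SERIES, {"host": "webserver01", "cpu": "0"})) # 1
```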

Grouping

A query can also aggregate time series with multiple tags into groups based on a tag value. Two special characters can be passed to the right of the equals symbol in a query:

  • * - The asterisk will return a separate result for each unique tag value
  • | - The pipe will return a separate result only for the exact tag values specified

Let's take the following data set as an example:

sys.cpu.user host=webserver01,cpu=0 1356998400 1
sys.cpu.user host=webserver01,cpu=1 1356998400 4
sys.cpu.user host=webserver02,cpu=0 1356998400 2
sys.cpu.user host=webserver02,cpu=1 1356998400 1
sys.cpu.user host=webserver03,cpu=0 1356998400 5
sys.cpu.user host=webserver03,cpu=1 1356998400 3

If we want to query for the average CPU time across each server we can craft a query like start=1356998400&m=avg:sys.cpu.user{host=*}. This will give us three results:

  1. The aggregated average for sys.cpu.user host=webserver01,cpu=0 and sys.cpu.user host=webserver01,cpu=1
  2. The aggregated average for sys.cpu.user host=webserver02,cpu=0 and sys.cpu.user host=webserver02,cpu=1
  3. The aggregated average for sys.cpu.user host=webserver03,cpu=0 and sys.cpu.user host=webserver03,cpu=1

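In short: the * operator returns one result per unique tag value. This example has three unique host values, so it returns three results.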

However, if we have many web servers in the system, this could create a very large number of results. To restrict the output to the hosts we want, use the pipe operator to select a subset of time series. For example, start=1356998400&m=avg:sys.cpu.user{host=webserver01|webserver03} will return results only for webserver01 and webserver03.

In short: the | operator acts as a logical OR over the listed tag values.
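The grouping behaviour of `*` and `|` can be sketched over the six-point data set above. The helper below is a hypothetical illustration rather than OpenTSDB code: it averages the values at timestamp 1356998400 per value of the grouping tag.

```python
from collections import defaultdict

# (tags, value) for each data point stored at 1356998400.
SERIES = [
    ({"host": "webserver01", "cpu": "0"}, 1),
    ({"host": "webserver01", "cpu": "1"}, 4),
    ({"host": "webserver02", "cpu": "0"}, 2),
    ({"host": "webserver02", "cpu": "1"}, 1),
    ({"host": "webserver03", "cpu": "0"}, 5),
    ({"host": "webserver03", "cpu": "1"}, 3),
]


def group_avg(series, tag, wanted="*"):
    """Average values per tag value; wanted is '*' or a '|'-separated list of values."""
    allowed = None if wanted == "*" else set(wanted.split("|"))
    groups = defaultdict(list)
    for tags, value in series:
        tag_value = tags.get(tag)
        if tag_value is None:
            continue                       # series without the tag cannot be grouped
        if allowed is not None and tag_value not in allowed:
            continue                       # pipe filter: keep only the listed values
        groups[tag_value].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(groups.items())}


print(group_avg(SERIES, "host"))                            # one result per host
print(group_avg(SERIES, "host", "webserver01|webserver03")) # only the two listed hosts
```

With `host=*` the sketch yields 2.5, 1.5, and 4.0 for the three hosts; with the pipe form only webserver01 and webserver03 remain.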

Aggregation

A powerful feature of OpenTSDB is the ability to perform on-the-fly aggregations of multiple time series into a single set of data points. The original data is always available in storage but we can quickly extract the data in meaningful ways. Aggregation functions are means of merging two or more data points for a single time stamp into a single value. See Aggregators for details.

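In short: a powerful feature of OpenTSDB is merging multiple time series into a single set of data points on the fly. The original data stays in storage and can be extracted in different ways.

An aggregation function merges two or more data points that share a timestamp into a single value.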

Interpolation

When performing an aggregation, what happens if the timestamps of the data points for each time series fail to line up? Say we record the temperature every 5 minutes in different regions around the world. A sensor in Paris may send a temperature of 27c at 1356998400. Then a sensor in San Francisco may send a value of 18c at 1356998430, 30 seconds later. Antarctica may report -29c at 1356998529. If we run a query requesting the average temperature, we want all of the data points averaged together into a single point. This is where interpolation comes into play. See Aggregators for details.

In short: when aggregating, what happens if the timestamps of the different series do not line up?

For example, temperatures are recorded every 5 minutes around the world: Paris reports 27c at 1356998400, San Francisco reports 18c at 1356998430 (30 seconds later), and Antarctica reports -29c at 1356998529.

To compute the average temperature, all of these points need to be averaged into a single point. Interpolation is what makes that possible.
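One common way to fill in a missing value is linear interpolation between the two neighbouring points of a series. The sketch below assumes that approach and invents a second Paris reading (28c at 1356998700, five minutes after the first) purely for illustration; the Aggregators page describes which aggregators interpolate and how.

```python
def interpolate(p0, p1, t):
    """Linearly interpolate a value at time t between points p0 and p1."""
    (t0, v0), (t1, v1) = p0, p1
    if not t0 <= t <= t1:
        raise ValueError("t must lie between the two points")
    if t1 == t0:
        return v0
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Paris: 27c at 1356998400 (from the text) and a hypothetical 28c five minutes later.
paris_prev = (1356998400, 27.0)
paris_next = (1356998700, 28.0)

# San Francisco reports 18c at 1356998430, where Paris has no data point.
sf_time, sf_value = 1356998430, 18.0
paris_at_sf_time = interpolate(paris_prev, paris_next, sf_time)

# Average of the two series at the San Francisco timestamp.
average = (paris_at_sf_time + sf_value) / 2
print(paris_at_sf_time, average)  # approximately 27.1 and 22.55
```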

Downsampling

OpenTSDB can ingest a large amount of data, even a data point every second for a given time series. Thus queries may return a large number of data points. Accessing the results of a query with a large number of points from the API can eat up bandwidth. High frequencies of data can easily overwhelm Javascript graphing libraries, hence the choice to use GnuPlot. Graphs created by the GUI can be difficult to read, resulting in thick lines such as the graph below:

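In short: OpenTSDB can ingest a lot of data, up to a data point every second per time series, so a query may return a very large result.

Fetching such a result through the API consumes a lot of bandwidth.

High-frequency data can also overwhelm JavaScript graphing libraries, which is why GnuPlot is used for graphs.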

Down sampling can be used at query time to reduce the number of data points returned so that you can extract better information from a graph or pass less data over a connection. Down sampling requires an aggregation function and a time interval. The aggregation function is used to compute a new data point across all of the data points in the specified interval with the proper mathematical function. For example, if the aggregation sum is used, then all of the data points within the interval will be summed together into a single value. If avg is chosen, then the average of all data points within the interval will be returned.

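In short: down sampling reduces the amount of data returned, so a graph becomes easier to read and less data travels over the connection.

Down sampling requires an aggregation function and a time interval.

The aggregation function computes one new data point from all of the points that fall within each interval.

Typical functions are sum and avg.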

Intervals are specified by a number and a unit of time. For example, 30m will aggregate data points every 30 minutes. 1h will aggregate across an hour. See Dates and Times for valid relative time units. Do not add the -ago to a down sampling query.

In short: an interval is a number followed by a unit of time, such as 30m or 1h, written without the -ago suffix.
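The interval-plus-function idea can be sketched as follows. The parser only understands the `s`, `m`, `h`, and `d` units and the `sum` and `avg` functions, which is a deliberate simplification; the unit table, function names, and sample points are assumptions for this example, and the full list of valid units is on the Dates and Times page.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}  # subset of units, for illustration
REDUCERS = {"sum": sum, "avg": lambda values: sum(values) / len(values)}


def parse_downsample(spec):
    """Split a spec such as '30m-avg' into (interval_seconds, function_name)."""
    interval, _, func = spec.partition("-")
    return int(interval[:-1]) * UNIT_SECONDS[interval[-1]], func


def downsample(points, spec):
    """Reduce (timestamp_seconds, value) points to one point per interval."""
    interval, func = parse_downsample(spec)
    buckets = {}
    for ts, value in points:
        buckets.setdefault(ts - ts % interval, []).append(value)
    return [(ts, REDUCERS[func](values)) for ts, values in sorted(buckets.items())]


# Four points spread over one hour starting at 1356998400.
points = [(1356998400, 1), (1356999000, 3), (1357000200, 5), (1357001400, 7)]
print(downsample(points, "30m-avg"))  # [(1356998400, 2.0), (1357000200, 6.0)]
print(downsample(points, "1h-sum"))   # [(1356998400, 16)]
```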

Using down sampling we can clean up the previous graph to arrive at something much more useful:

In short: after down sampling, the previous graph becomes the following:

As of 2.1, downsampled timestamps are normalized based on the remainder of the original data point timestamp divided by the downsampling interval in milliseconds, i.e. the modulus. In Java the code is timestamp - (timestamp % interval_ms). For example, given a timestamp of 1388550980000, or 1/1/2014 04:36:20 UTC and an hourly interval that equates to 3600000 milliseconds, the resulting timestamp will be rounded to 1388548800000. All data points between 4 and 5 UTC will wind up in the 4 AM bucket. If you query for a day's worth of data downsampling on 1 hour, you will receive 24 data points (assuming there is data for all 24 hours).

Normalization works very well for common queries such as a day's worth of data downsampled to 1 minute or 1 hour. However, if you try to downsample on an odd interval, such as 36 minutes, then the timestamps may look a little strange due to the nature of the modulus calculation. Given an interval of 36 minutes and our example above, the interval would be 2160000 milliseconds and the resulting timestamp 1388549520000, or 04:12:00 UTC. All data points between 04:12 and 04:48 would wind up in a single bucket. Also note that OpenTSDB cannot currently normalize on non-UTC times and it cannot normalize on weekly or monthly boundaries.

In short: the two paragraphs above describe how downsampled timestamps are normalized.
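The modulus arithmetic can be checked directly. The sketch below applies the `timestamp - (timestamp % interval_ms)` formula quoted above to the example timestamp, for both the hourly and the 36-minute interval; the helper names are made up for the example, and all timestamps are in milliseconds.

```python
from datetime import datetime, timezone


def normalize(timestamp_ms, interval_ms):
    """Round a millisecond timestamp down to the start of its downsampling bucket."""
    return timestamp_ms - (timestamp_ms % interval_ms)


def utc_clock(timestamp_ms):
    """Format a millisecond timestamp as HH:MM:SS in UTC."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).strftime("%H:%M:%S")


ts = 1388550980000                   # 1/1/2014 04:36:20 UTC
hourly = normalize(ts, 3600000)      # 1 hour in milliseconds
odd = normalize(ts, 2160000)         # 36 minutes in milliseconds

print(hourly, utc_clock(hourly))     # 1388548800000 04:00:00
print(odd, utc_clock(odd))           # 1388549520000 04:12:00
```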

Rate

A number of data sources return values as constantly incrementing counters. One example is a web site hit counter. When you start a web server, it may have a hit counter of 0. After five minutes the value may be 1,024. After another five minutes it may be 2,048. The graph for a counter will be a somewhat straight line angling up to the right and isn't always very useful. OpenTSDB provides the rate key word that calculates the rate of change in values over time. This will transform counters into lines with spikes to show you when activity occurred and can be much more useful.

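In short: the rate keyword computes how fast values change over time, turning an ever-rising counter line into one with spikes that show when activity actually happened.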

The rate is the first derivative of the values. It's defined as (v2 - v1) / (t2 - t1). Therefore you will get the rate of change per second. Currently the rate of change between millisecond values defaults to a per second calculation.

In short: the formula is (v2 - v1) / (t2 - t1), which gives the change per second.
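Applied to the hit counter example (0, then 1,024 after five minutes, then 2,048 after another five), the formula works out as in the sketch below. Timestamps are in seconds relative to server start, and the function name is an assumption for illustration.

```python
def rate(points):
    """Return (t2, (v2 - v1) / (t2 - t1)) for each pair of consecutive points."""
    return [
        (t2, (v2 - v1) / (t2 - t1))
        for (t1, v1), (t2, v2) in zip(points, points[1:])
    ]


# Counter readings taken every 300 seconds.
hits = [(0, 0), (300, 1024), (600, 2048)]
for timestamp, per_second in rate(hits):
    print(timestamp, round(per_second, 3))  # about 3.413 hits per second in both intervals
```

A series of n points yields n - 1 rate values, since each value needs a previous point to compare against.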

OpenTSDB 2.0 provides support for special monotonically increasing counter data handling, including the ability to set a "rollover" value and suppress anomalous fluctuations. When the counterMax value is specified in a query, if a data point approaches this value and the point after is less than the previous, the max value will be used to calculate an accurate rate given the two points. For example, if we were recording an integer counter on 2 bytes, the maximum value would be 65,535. If the value at t0 is 64000 and the value at t1 is 1000, the resulting change would usually be calculated as -63000. However, we know that it's likely the counter rolled over, so we can set the max to 65535 and now the calculation will be 65535 - 64000 + 1000 to give us 2535.

In short: OpenTSDB 2.0 adds special handling for monotonically increasing counters through counterMax.

With 64000 at t0 and 1000 at t1, the naive difference is -63000.

Setting counterMax to 65535 tells the TSD that the counter wrapped around: it climbed 1535 from 64000 up to the maximum, rolled over, and then climbed another 1000, which the formula counts as 65535 - 64000 + 1000 = 2535.

Systems that track data in counters often revert to 0 when restarted. When that happens, we could get a spurious result when using the max counter feature. For example, if the counter has reached 2000 at t0 and someone reboots the server, the next value may be 500 at t1. If we set our max to 65535, the result would be 65535 - 2000 + 500 to give us 64035. If the normal rate is a few points per second, this particular jump, with 30s between points, would create a rate spike of 2,134.5! To avoid this, we can set the resetValue which will, when the rate exceeds this value, return a data point of 0 so as to avoid spikes in either direction. For the example above, if we know that our rate almost never exceeds 100, we could configure a resetValue of 100 and when the data point above is calculated, it will return 0 instead of 2,134.5. The default value of 0 means the reset value will be ignored and no rates will be suppressed.
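The rollover and reset arithmetic can be sketched as below. The parameter names `counter_max` and `reset_value` mirror counterMax and resetValue from the text, but the function itself is a simplified assumption, not the TSD's code: it only handles a positive spike, and the 1-second gap in the rollover example is chosen so that the rate equals the 2535 difference quoted above.

```python
def counter_rate(p0, p1, counter_max=None, reset_value=0):
    """Per-second rate between two counter readings with optional rollover handling."""
    (t0, v0), (t1, v1) = p0, p1
    delta = v1 - v0
    if delta < 0 and counter_max is not None:
        # The counter went backwards: assume it rolled over at counter_max.
        delta = counter_max - v0 + v1
    result = delta / (t1 - t0)
    if reset_value > 0 and result > reset_value:
        # Rate is implausibly high: treat it as a counter reset and suppress it.
        return 0.0
    return result


# Rollover example: 64000 then 1000, one second apart (the gap is an assumption).
print(counter_rate((0, 64000), (1, 1000)))                     # -63000.0
print(counter_rate((0, 64000), (1, 1000), counter_max=65535))  # 2535.0

# Restart example: 2000 then 500, 30 seconds apart.
print(counter_rate((0, 2000), (30, 500), counter_max=65535))                   # 2134.5
print(counter_rate((0, 2000), (30, 500), counter_max=65535, reset_value=100))  # 0.0
```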

Order of operations

Understanding the order of operations is important. When returning query results the following is the order in which processing takes place:

  1. Grouping
  2. Down Sampling
  3. Interpolation
  4. Aggregation
  5. Rate Calculations

In short: understanding this order of operations is very important!
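To make the ordering concrete, here is a toy pipeline that applies the steps in the listed order to two hypothetical series from one host. Everything in it (the data, the helper names, the choice of avg for down sampling and sum for aggregation) is an assumption for illustration. The interpolation step is a no-op here because, after down sampling, the two series already share the same timestamps.

```python
from collections import defaultdict


def downsample_avg(points, interval):
    """Average (timestamp, value) points into buckets of the given width."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % interval].append(value)
    return {ts: sum(vals) / len(vals) for ts, vals in buckets.items()}


def run_query(series, group_tag, interval):
    # 1. Grouping: collect the series under each value of the grouping tag.
    groups = defaultdict(list)
    for tags, points in series:
        groups[tags[group_tag]].append(points)

    results = {}
    for key, members in groups.items():
        # 2. Down sampling: reduce each series on its own.
        sampled = [downsample_avg(points, interval) for points in members]
        # 3. Interpolation: skipped, the bucket timestamps already line up.
        # 4. Aggregation: sum the series at each timestamp.
        totals = defaultdict(float)
        for one_series in sampled:
            for ts, value in one_series.items():
                totals[ts] += value
        ordered = sorted(totals.items())
        # 5. Rate: change per second between consecutive aggregated points.
        results[key] = [
            (t2, (v2 - v1) / (t2 - t1))
            for (t1, v1), (t2, v2) in zip(ordered, ordered[1:])
        ]
    return results


series = [
    ({"host": "web01", "cpu": "0"}, [(0, 10), (30, 20), (60, 40), (90, 60)]),
    ({"host": "web01", "cpu": "1"}, [(0, 0), (30, 10), (60, 30), (90, 30)]),
]
print(run_query(series, "host", 60))  # {'web01': [(60, 1.0)]}
```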
