Why Time Series Data Is Unique

A time series is a series of data points indexed in time. The fact that time series data is ordered makes it unique in the data space because it often displays serial dependence序列依赖. Serial dependence occurs when the value of a datapoint at one time is statistically dependent on another datapoint in another time. However, this attribute of time series data violates违反 one of the fundamental assumptions of many statistical analyses — that data is statistically independent.

What Is Autocorrelation?

Autocorrelation is a type of serial dependence. Specifically, autocorrelation is when a time series is linearly related to a lagged version of itself. By contrast, correlation is simply when two independent variables are linearly related.

Why Autocorrelation Matters

Often, one of the first steps in any data analysis is performing regression analysis. However, one of the assumptions of regression analysis is that the data has no autocorrelation. This can be frustrating because if you try to do a regression analysis on data with autocorrelation, then your analysis will be misleading.

Additionally, some time series forecasting methods (specifically regression modeling) rely on the assumption that there isn’t any autocorrelation in the residuals (the difference between the fitted model and the data). People often use the residuals to assess whether their model is a good fit while ignoring that assumption that the residuals have no autocorrelation (or that the errors are independent and identically distributed or i.i.d). This mistake can mislead people into believing that their model is a good fit when in fact it isn’t. I highly recommend reading this article about How (not) to use Machine Learning for time series forecasting: Avoiding the pitfalls in which the author demonstrates how the increasingly popular LSTM (Long Short Term Memory) Network can appear to be an excellent univariate time series predictor, when in reality it’s just overfitting the data. He goes further to explain how this misconception is the result of accuracy metrics failing due to the presence of autocorrelation.

Finally, perhaps the most compelling aspect of autocorrelation analysis is how it can help us uncover hidden patterns in our data and help us select the correct forecasting methods. Specifically, we can use it to help identify seasonality and trend in our time series data. Additionally, analyzing the autocorrelation function (ACF) and partial autocorrelation function (PACF) in conjunction is necessary for selecting the appropriate ARIMA model for your time series prediction.

How to Determine if Your Time Series Data Has Autocorrelation by python

For this exercise, I’m using InfluxDB and the InfluxDB Python CL. I am using available data from the National Oceanic and Atmospheric Administration’s (NOAA) Center for Operational Oceanographic Products and Services. Specifically, I will be looking at the water levels and water temperatures of a river in Santa Monica.

More.

Autocorrelation in Time Series Data的更多相关文章

  1. 3.1.7. Cross validation of time series data

    3.1.7. Cross validation of time series data Time series data is characterised by the correlation bet ...

  2. 增长中的时间序列存储(Scaling Time Series Data Storage) - Part I

    本文摘译自 Netflix TechBlog : Scaling Time Series Data Storage - Part I 重点:扩容.缓存.冷热分区.分块. 时序数据 - 会员观看历史 N ...

  3. 时间序列大数据平台建设(Time Series Data,简称TSD)

    来源:https://blog.csdn.net/bluishglc/article/details/79277455 引言在大数据的生态系统里,时间序列数据(Time Series Data,简称T ...

  4. vehicle time series data analysis

    以HADOOP为代表的云计算提供的仅仅是一个算法执行环境,为大数据的并行计算提供了在现有软硬件水平下最好的(近似)方法.并不能解决大数据应用中的全部问题.从详细应用而言,通过物联网方式接入IT圈的数据 ...

  5. PP: Tripoles: A new class of relationships in time series data

    Problem: ?? mining relationships in time series data; A new class of relationships in time series da ...

  6. Time Series data 与 sequential data 的区别

    It is important to note the distinction between time series and sequential data. In both cases, the ...

  7. PP: Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data

    From: Stanford University; Jure Leskovec, citation 6w+; Problem: subsequence clustering. Challenging ...

  8. Anomaly Detection for Time Series Data with Deep Learning——本质分类正常和异常的行为,对于检测异常行为,采用预测正常行为方式来做

    A sample network anomaly detection project Suppose we wanted to detect network anomalies with the un ...

  9. Time series data mining

    from here 论文Timeseries data mining(2012)中提出:时间序列数据挖掘包括7个基本任务和3个基础问题: 7 tasks: query by content clust ...

随机推荐

  1. H5网页布局+css代码美化

    HTML5的结构化标签,对搜索引擎更友好 li 标签对不利于搜索引擎的收录,尽量少用 banner图片一般拥有版权,不需要搜索引擎收录,因此可以使用ul + li <samp></s ...

  2. 转:Flutter开发中踩过的坑

    记录一下入手Flutter后实际开发中踩过的一些坑,这些坑希望后来者踩的越少越好.本文章默认读者已经掌握Flutter初步开发基础. 坑1问题:在debug模式下,App启动第一个页面会很慢,甚至是黑 ...

  3. Markdown数学公式如何打出回归符号

    来源:https://blog.csdn.net/garfielder007/article/details/51646604 函数.符号及特殊字符 语法 效果 语法 效果 语法 效果 \bar{x} ...

  4. C# 工具类LogHelper

    一.创建一个WinForm的项目,并通过NuGet安装log4net. 二.创建LogHelper类以及log4net.config配置文件. 三.编写相关代码. 1.LogHelper类 using ...

  5. js 字符串中"\"

    var a = '\a' console.log(a) // a ???? js 字符串中"\" 有特殊功能,反斜杠是一个转义字符 js 中 遇到字符串中有'\'时候需要注意 '\ ...

  6. cookies欺骗-bugkuctf

    解题思路: 打开链接是一串没有意义的字符串,查看源码没有发现什么,然后查看url,发现 filename的值是base64编码的,拿去解码 发现是一个文件,那么我们这里应该可以读取当前目录下的本地文件 ...

  7. sql-labs 18-20(sqlmap注入)

    这三题主要是关于HTTP头部的注入 常见的HTTP注入点产生位置为 [Referer].[X-Forwarded-For].[Cookie].[X-Real-IP].[Accept-Language] ...

  8. Linux 基础操作命令

    关机和注销 shutdown -h now 立刻关机 shutdown -r now 立刻重启 shutdown -h + 1分钟后关机(重启同样用法) shutdown -h : 11点钟关机(重启 ...

  9. Docker构建镜像过于缓慢解决-----Docker构建服务之部署和备份jekyll网站

    参考原文链接:https://www.jianshu.com/p/e6b7e68f2ba7 来自<第一本Docker书>,我觉得很有趣,就记录一下 准备国内ubuntu镜像 每次构建Ubu ...

  10. Maven修改test/rsource的output folder报错Test source folder 'src/test/java'... is not also used for main s

    eclipse新建maven项目时候,只出来三个文件夹,然后大都督手动添加了缺失的src/test/resource 的文件夹,最后想修改一下 Output folder的路径为 (原来是     d ...