Time Series Analysis

Best MSE (Mean Square Error) Predictor

对于所有可能的预测函数 \(f(X_{n})\),找到一个使 \(\mathbb{E}\big[\big(X_{n} - f(X_{n})\big)^{2} \big]\) 最小的 \(f\) 的 predictor。这样的 predictor 假设记为 \(m(X_{n})\), 称作 best MSE predictor,i.e.,

\[m(X_{n}) = \mathop{\arg\min}\limits_{f} \mathbb{E}\big[ \big( X_{n+h} - f(X_{n}) \big)^{2} \big]
\]

我们知道:\(\mathop{\arg\min}\limits_{f} \mathbb{E}\big[ \big( X_{n+h} - f(X_{n}) \big)^{2} \big]\) 的解即为:

\[\mathbb{E}\big[ X_{n+h} ~ \big| ~ X_{n} \big]
\]

证明:

基于 \(X_{n}\) 求 \(\mathbb{E}\big[ \big( X_{n+h} - f(X_{n}) \big)^{2} \big]\) 的最小值,实际上:

\[\mathop{\arg\min}\limits_{f} \mathbb{E}\big[ \big( X_{n+h} - f(X_{n}) \big)^{2} \big] \iff \mathop{\arg\min}\limits_{f} \mathbb{E}\big[ \big( X_{n+h} - f(X_{n}) \big)^{2} ~ \big| ~ X_{n} \big]
\]

  • 私以为更严谨的写法是 \(\mathop{\text{argmin}}\limits_{f} ~ \mathbb{E}\Big[\Big(X_{n+h} - f\big( X_{n}\big)\Big)^{2} ~ | ~ \mathcal{F}_{n}\Big]\),其中 \(\left\{ \mathcal{F}_{t}\right\}_{t\geq 0}\) 为 \(\left\{ X_{t} \right\}_{t\geq 0}\) 相关的 natural filtration,but whatever。

等式右侧之部分:

\[\begin{align*}
\mathbb{E}\big[ \big( X_{n+h} - f(X_{n}) \big)^{2} ~ \big| ~ X_{n} \big] & = \mathbb{E}[X_{n+h}^{2} ~ | ~ X_{n}] - 2f(X_{n})\mathbb{E}[X_{n+h} ~ | ~ X_{n}] + f^{2}(X_{n}) \\
\end{align*}
\]

其中由于:

\[\begin{align*}
Var(X_{n+h} ~ | ~ X_{n}) & = \mathbb{E}\Big[ \big( X_{n+h} - \mathbb{E}\big[ X_{n+h}^{2} ~ | ~ X_{n} \big] \big)^{2} ~ \Big| ~ X_{n} \Big] \\
& = \mathbb{E}\big[ X_{n+h}^{2} ~ \big| ~ X_{n} \big] - 2\mathbb{E}^{2}\big[ X_{n+h}^{2} ~ \big| ~ X_{n} \big] + \mathbb{E}^{2}\big[ X_{n+h}^{2} ~ \big| ~ X_{n} \big] \\
& = \mathbb{E}\big[ X_{n+h}^{2} ~ \big| ~ X_{n} \big] - \mathbb{E}^{2}\big[ X_{n+h}^{2} ~ \big| ~ X_{n} \big]
\end{align*}
\]

which gives that:

\[\implies Var(X_{n+h} ~ | ~ X_{n}) = \mathbb{E}\big[ X_{n+h}^{2} ~ \big| ~ X_{n} \big] - \mathbb{E}^{2}\big[ X_{n+h} ~ \big| ~ X_{n} \big]
\]

因此,

\[\begin{align*}
\mathbb{E}\big[ \big( X_{n+h} - f(X_{n}) \big)^{2} ~ \big| ~ X_{n} \big] & = Var(X_{n+h} ~ | ~ X_{n}) + \mathbb{E}^{2}\big[ X_{n+h} ~ \big| ~ X_{n}\big] - 2f(X_{n})\mathbb{E}[X_{n+h} ~ | ~ X_{n}] + f^{2}(X_{n}) \\
& = Var(X_{n+h} ~ | ~ X_{n}) + \Big( \mathbb{E}\big[ X_{n+h} ~ \big| ~ X_{n}\big] - f(X_{n}) \Big)^{2}
\end{align*}
\]

方差 \(Var(X_{n+h} ~ | ~ X_{n})\) 为定值,那么 optimal solution \(m(X_{n})\) 显而易见:

\[m(X_{n}) = \mathbb{E}\big[ X_{n+h} ~ \big| ~ X_{n} \big]
\]

此时 \(\left\{ X_{t} \right\}\) 为一个 Stationary Gaussian Time Series, i.e.,

\[\begin{pmatrix}
X_{n+h}\\
X_{n}
\end{pmatrix} \sim N \begin{pmatrix}
\begin{pmatrix}
\mu \\
\mu
\end{pmatrix}, ~ \begin{pmatrix}
\gamma(0) & \gamma(h) \\
\gamma(h) & \gamma(0)
\end{pmatrix}
\end{pmatrix}
\]

那么我们有:

\[X_{n+h} ~ | ~ X_{n} \sim N\Big( \mu + \rho(h)\big(X_{n} - \mu\big), ~ \gamma(0)\big(1 - \rho^{2}(h)\big) \Big)
\]

其中 \(\rho(h)\) 为 \(\left\{ X_{t} \right\}\) 的 ACF,因此,

\[\mathbb{E}\big[ X_{n+h} ~ \big| ~ X_{n} \big] = m(X_{n}) = \mu + \rho(h) \big( X_{n} - \mu \big)
\]

注意:

若 \(\left\{ X_{t} \right\}\) 是一个 Gaussian time series,则一定能计算 best MSE predictor。而若 \(\left\{ X_{t} \right\}\) 并非 Gaussian time series,则计算通常十分复杂。

因此,我们通常不找 best MSE predictor,而寻找 best linear predictor。


Best Linear Predictor (BLP)

在 BLP 假设下,我们寻找一个形如 \(f(X_{n}) \propto aX_{n} + b\) 的 predictor。

则目标为:

\[\text{minimize: } ~ S(a,b) = \mathbb{E} \big[ \big( X_{n+h} - aX_{n} -b \big)^{2} \big]
\]

推导:

分别对 \(a, b\) 求偏微分:

\[\begin{align*}
\frac{\partial}{\partial b} S(a, b) & = \frac{\partial}{\partial b} \mathbb{E} \big[ \big( X_{n+h} - aX_{n} -b \big)^{2} \big] \\
& = -2 \mathbb{E} \big[ X_{n+h} - aX_{n} - b \big] \\
\end{align*}
\]

令:

\[\frac{\partial}{\partial b} S(a, b) = 0
\]

则:

\[\begin{align*}
-2 \cdot & \mathbb{E} \big[ X_{n+h} - aX_{n} - b \big] = 0 \\
\implies & \qquad \mathbb{E}[X_{n+h}] - a\mathbb{E}[X_{n}] - b = 0\\
\implies & \qquad \mu - a\mu - b = 0 \\
\implies & \qquad b^{\star} = (1 - a^{\star}) \mu
\end{align*}
\]

回代并 take partial derivative on \(a\):

\[\begin{align*}
\frac{\partial}{\partial a} S(a, b) & = \frac{\partial}{\partial a} \mathbb{E} \big[ \big( X_{n+h} - aX_{n} - (1 - a)\mu \big)^{2} \big] \\
& = \frac{\partial}{\partial a} \mathbb{E} \Big[ \Big( \big(X_{n+h} - \mu \big) - \big( X_{n} - \mu \big) a \Big)^{2} \Big] \\
& = \mathbb{E} \Big[ - \big( X_{n} - \mu \big) \Big( \big(X_{n+h} - \mu \big) - \big( X_{n} - \mu \big) a \Big)\Big] \\
\end{align*}
\]

令:

\[\frac{\partial}{\partial a} S(a, b) = 0
\]

则:

\[\begin{align*}
& \mathbb{E} \Big[ - \big( X_{n} - \mu \big) \Big( \big(X_{n+h} - \mu \big) - \big( X_{n} - \mu \big) a \Big)\Big] = 0 \\
\implies & \qquad \mathbb{E} \Big[\big( X_{n} - \mu \big) \Big( \big(X_{n+h} - \mu \big) - \big( X_{n} - \mu \big) a \Big)\Big] = 0 \\
\implies & \qquad \mathbb{E} \Big[\big( X_{n} - \mu \big) \big(X_{n+h} - \mu \big) - a \big( X_{n} - \mu \big) \big( X_{n} - \mu \big) \Big] = 0 \\
\implies & \qquad \mathbb{E} \Big[\big( X_{n} - \mu \big) \big(X_{n+h} - \mu \big) \Big] = a \cdot \mathbb{E} \Big[\big( X_{n} - \mu \big) \big( X_{n} - \mu \big) \Big] \\
\implies & \qquad \mathbb{E} \Big[\big( X_{n} - \mathbb{E}[X_{n}] \big) \big(X_{n+h} - \mathbb{E}[X_{n+h}] \big) \Big] = a \cdot \mathbb{E} \Big[\big( X_{n} - \mathbb{E}[X_{n}] \big)^{2} \Big] \\
\implies & \qquad \text{Cov}(X_{n}, X_{n+h}) = a \cdot \text{Var}(X_{n}) \\
\implies & \qquad a^{\star} = \frac{\gamma(h)}{\gamma(0)} = \rho(h)
\end{align*}
\]

综上,time series \(\left\{ X_{n} \right\}\) 的 BLP 为:

\[f(X_{n}) = l(X_{n}) = \mu + \rho(h) \big( X_{n} - \mu \big)
\]

且 BLP 相关的 MSE 为:

\[\begin{align*}
\text{MSE} & = \mathbb{E}\big[ \big( X_{n+h} - l(X_{n}) \big)^{2} \big] \\
& = \mathbb{E} \Big[ \Big( X_{n+h} - \mu - \rho(h) \big( X_{n} - \mu \big) \Big)^{2} \Big] \\
& = \rho(0) \cdot \big( 1 - \rho^{2}(h) \big)
\end{align*}
\]

Time Series Analysis (Best MSE Predictor & Best Linear Predictor)的更多相关文章

  1. PP: Multilevel wavelet decomposition network for interpretable time series analysis

    Problem: the important frequency information is lack of effective modelling. ?? what is frequency in ...

  2. A New Recurrence-Network-Based Time Series Analysis Approach for Characterizing System Dynamics - Guangyu Yang, Daolin Xu * and Haicheng Zhang

    Purpose: characterize the evolution of dynamical systems. In this paper, a novel method based on eps ...

  3. survey on Time Series Analysis Lib

    (1)I spent my 4th year Computing project on implementing time series forecasting for Java heap usage ...

  4. time series analysis

    1 总体介绍 在以下主题中,我们将回顾有助于分析时间序列数据的技术,即遵循非随机顺序的测量序列.与在大多数其他统计数据的上下文中讨论的随机观测样本的分析不同,时间序列的分析基于数据文件中的连续值表示以 ...

  5. predict.glm -> which class does it predict?

    Jul 10, 2009; 10:46pm predict.glm -> which class does it predict? 2 posts Hi, I have a question a ...

  6. Visibility Graph Analysis of Geophysical Time Series: Potentials and Possible Pitfalls

    Tasks: invest papers  3 篇. 研究主动权在我手里.  I have to.  1. the benefit of complex network: complex networ ...

  7. Regression analysis

    Source: http://wenku.baidu.com/link?url=9KrZhWmkIDHrqNHiXCGfkJVQWGFKOzaeiB7SslSdW_JnXCkVHsHsXJyvGbDv ...

  8. Bayesian generalized linear model (GLM) | 贝叶斯广义线性回归实例

    一些问题: 1. 什么时候我的问题可以用GLM,什么时候我的问题不能用GLM? 2. GLM到底能给我们带来什么好处? 3. 如何评价GLM模型的好坏? 广义线性回归啊,虐了我快几个月了,还是没有彻底 ...

  9. Time Series data 与 sequential data 的区别

    It is important to note the distinction between time series and sequential data. In both cases, the ...

  10. 7、RNAseq Downstream Analysis

    Created by Dennis C Wylie, last modified on Jun 29, 2015 Machine learning methods (including cluster ...

随机推荐

  1. Java继承Frame画一个窗口显示图片

    将图片显示到窗口上. 在工程目录下准备好图片5.png 运行代码: import javax.imageio.ImageIO; import java.awt.*; import java.awt.e ...

  2. 【云原生 · Kubernetes】Jenkins+Gitlab+Rancher+Docker 实现自动构建镜像的 CI 平台(一)

    1 准备 Jenkins+Gitlab 实验环境 1.1 准备实验环境:恢复到以一下快照:该环境已经配置好 jenkins+gitlab+sonar-配置通 主机角色: IP 地址 运行的服务 硬件配 ...

  3. 大前端html基础学习02

    CSS核心属性 一.css属性和属性值的定义 属性:属性是指定选择符所具有的属性,它是css的核心. 属性值:属性值包括法定属性值及常见的数值加单位,如25px,或颜色值等. 二.CSS文本属性 1. ...

  4. <二>派生类的构造过程

    派生类从继承可以继承来所有的成员(变量和方法) 除了构造函数和析构函数 派生类怎么初始化从基类继承来的成员变量的呢?通过调用基类的构造函数来初始化 派生类的构造函数和析构函数,负责初始化和清理派生类部 ...

  5. Datawhale组队学习_Task03:详读西瓜书+南瓜书第4章

    第4章 决策树 4.1 基本流程 #输入:训练集D={${(x_1,y_1),(x_2,y_2),...,(x_m,y_m)}$}; #属性集A=${{a_1,a_2,...,a_d}}$. #过程: ...

  6. 一文理解什么是DevOps,通俗易懂白话文

    一文理解什么是DevOps,通俗易懂白话文 devops是什么❝DevOps维基百科定义 DevOps(Development和Operations的组合词)是一种重视"软件开发人员(Dev ...

  7. 漫谈计算机网络:番外篇 ------网络安全相关知识——>公钥与私钥、防火墙与入侵检测

    <漫谈计算机网络>上次已经完结啦,今天出一个番外篇! 2022-12-06 今天我们来聊一聊网络安全 废话不多说直接进入正题 网络安全问题概述 计算机网络面临的安全性威胁 两大类威胁:被动 ...

  8. Qt对象跨线程出现的问题记录,以及解决方案

    Qt在跨线程开发的时候可能会出现不少问题,在这里记录一下 Qt目前用下来还是非常强大的,虽然只是用在桌面端程序开发上,但是其强大的桌面开发库真的挺好用的(Layout除外,你妈死了). Qt除了UI, ...

  9. 线程、GIL全局解释器锁、进程池与线程池

    目录 多进程实现TCP服务端并发 互斥锁代码实操 线程理论 创建线程的两种方式 多线程实现TCP服务端并发 线程的诸多特性 GIL全局解释器锁 验证GIL的存在 GIL与普通互斥锁 python多线程 ...

  10. C++进阶(map+set容器模拟实现)

    关联式容器 关联式容器也是用来存储数据的,与序列式容器(如vector.list等)不同的是,其里面存储的是<key,value>结构的键值对,在数据检索时比序列式容器效率更高.今天要介绍 ...