From: Stanford University; Jure Leskovec (60,000+ citations).

Problem:

subsequence clustering.

Challenges:

Discovering patterns is challenging because it requires simultaneous segmentation and clustering of the time series, and because interpreting the cluster results is difficult.

Why is discovering time series patterns a challenge? Think it through yourself! There are already many distance measures (DTW, manifold distance) and clustering methods (kNN, k-means, etc.). I do agree, though, that interpretation is difficult.

Introduction:

Long time series --(break down)--> a sequence of states/patterns --> the time series can be expressed as a sequential timeline of a few key states --> discover repeated patterns, understand trends, detect anomalies, and better interpret large, high-dimensional datasets.

Key steps: simultaneously segment and cluster the time series.
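
To make "segment and cluster at the same time" concrete, here is a minimal sketch of the assignment half of such a method. It assumes each time step already has a cost for every cluster (for example a negative log-likelihood under that cluster's model) and that changing cluster between consecutive steps costs a fixed penalty. The names `assign_clusters`, `costs`, and `beta` are my own illustration and the cost values are made up; this is not the TICC reference code.

```python
def assign_clusters(costs, beta):
    """Pick one cluster per time step, minimizing total cost plus beta per switch.

    costs[t][k]: hypothetical cost of placing time step t in cluster k.
    beta: hypothetical penalty paid whenever consecutive steps change cluster.
    """
    num_steps, num_clusters = len(costs), len(costs[0])
    best = list(costs[0])   # best[k]: cheapest path so far that ends in cluster k
    back = []               # back-pointers, one list per step after the first
    for t in range(1, num_steps):
        cheapest = min(range(num_clusters), key=lambda k: best[k])
        new_best, pointers = [], []
        for k in range(num_clusters):
            stay = best[k]
            switch = best[cheapest] + beta
            if stay <= switch:
                new_best.append(stay + costs[t][k])
                pointers.append(k)
            else:
                new_best.append(switch + costs[t][k])
                pointers.append(cheapest)
        best = new_best
        back.append(pointers)
    # Walk the back-pointers from the cheapest final cluster.
    k = min(range(num_clusters), key=lambda j: best[j])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    return path[::-1]


# Made-up costs: step 2 slightly prefers cluster 1, steps 4-6 clearly prefer it.
costs = [[0, 1], [0, 1], [1, 0.9], [0, 1], [1, 0], [1, 0], [1, 0]]
smooth = assign_clusters(costs, beta=0.5)
greedy = assign_clusters(costs, beta=0.0)
```

With a positive penalty the short blip at step 2 is absorbed into the surrounding segment, so the labels form contiguous segments; with `beta=0.0` every step simply takes its cheapest cluster. This is why segmentation and clustering can come out of one optimization instead of two separate passes.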

Unsupervised learning: the results are hard to interpret; after clustering, you still have to inspect the data itself.

How can we discover interpretable structure in the data?

Traditional clustering methods are not particularly well-suited to discovering interpretable structure in the data, because they typically rely on distance-based metrics such as DTW.

Distance-based algorithms are at a disadvantage on multivariate time series: they cannot see subtle structural similarities in the data.

The paper proposes a new method for multivariate time series clustering, TICC:

  • Define each cluster as a dependency network showing the relationships between the different sensors in a short subsequence.
  • Each cluster is a Markov random field (MRF).
  • In these MRFs, an edge represents a partial correlation between two variables.
  • Learn each cluster's MRF by estimating a sparse Gaussian inverse covariance matrix.
  • This network has multiple layers.
  • The number of layers corresponds to the window size of a short subsequence.
  • The inverse covariance matrix defines the adjacency matrix of the MRF dependency network.
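
The link between an inverse covariance matrix and a dependency network can be sketched with NumPy. This is only an illustration under my own assumptions: the three toy sensors, the helper names `stack_windows` and `partial_correlations`, the `ridge` value, and the 0.2 threshold are all hypothetical, and a plain ridge-regularized matrix inverse stands in for the sparse estimation the paper uses.

```python
import numpy as np


def stack_windows(series, w):
    """Turn a (T, n) series into (T - w + 1, n * w) rows of w consecutive observations."""
    num_steps = series.shape[0]
    return np.stack([series[t:t + w].reshape(-1) for t in range(num_steps - w + 1)])


def partial_correlations(samples, ridge=1e-3):
    """Partial correlations read off the inverse of a ridge-regularized covariance.

    Assumption: a plain inverse replaces sparse estimation, so nothing is
    driven exactly to zero here.
    """
    cov = np.cov(samples, rowvar=False)
    theta = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))  # inverse covariance
    scale = np.sqrt(np.diag(theta))
    pcorr = -theta / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


# Toy sensors forming a chain: a drives b, b drives c.
rng = np.random.default_rng(0)
num_steps = 2000
a = rng.normal(size=num_steps)
b = a + 0.5 * rng.normal(size=num_steps)
c = b + 0.5 * rng.normal(size=num_steps)
series = np.column_stack([a, b, c])

# Window size 1: one layer, three nodes.
pcorr = partial_correlations(stack_windows(series, w=1))
adjacency = np.abs(pcorr) > 0.2   # hypothetical threshold for drawing an edge

# Window size 2: each row holds two consecutive observations, i.e. two layers.
two_layer = stack_windows(series, w=2)   # shape (1999, 6)
```

In this toy chain the edges a-b and b-c appear, while a-c does not: a and c are strongly correlated, but not once b is accounted for. That difference between correlation and partial correlation is what makes the resulting network interpretable.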

Related work:

time series clustering and convex optimization;

variations of DTW; symbolic representations; rule-based motif discovery;

However, these methods generally rely on distance-based metrics.

TICC is a model-based clustering method, like those built on ARMA, Gaussian mixture, or hidden Markov models.

  • Define each cluster by a Gaussian inverse covariance.
  • The Gaussian inverse covariance therefore defines a Markov random field encoding the structural representation.
  • K clusters, hence K inverse covariances.

Selecting the number of clusters: cross-validation; normalized mutual information; BIC or silhouette score.
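
As a rough illustration of the BIC option, the sketch below scores a few candidate values of K with the standard formula BIC = p ln(n) - 2 ln(L) and keeps the lowest. The log-likelihoods and parameter counts are made-up numbers for illustration only, not results from the paper, and `bic` is a hypothetical helper.

```python
import math


def bic(log_likelihood, num_params, num_samples):
    """Bayesian information criterion; lower is better."""
    return num_params * math.log(num_samples) - 2.0 * log_likelihood


# Made-up (log-likelihood, parameter count) pairs for K = 2, 3, 4.
candidates = {2: (-1250.0, 20), 3: (-1100.0, 30), 4: (-1090.0, 40)}
num_samples = 1000

scores = {k: bic(ll, p, num_samples) for k, (ll, p) in candidates.items()}
best_k = min(scores, key=scores.get)
```

Here going from K = 3 to K = 4 barely improves the fit, so the extra parameters are not worth their penalty and K = 3 has the lowest score.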

I could not follow this part T_T

Supplementary knowledge:

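1. For unsupervised learning, interpreting the results and choosing intermediate parameters currently rely entirely on experience.

2. Aarhus data, Martin: multivariate time series forecasting.

3. Toeplitz matrices: matrices that are constant along each diagonal.

4. TICC code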
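
For item 3, a small sketch of what "constant along each diagonal" means. The helpers `toeplitz` and `is_toeplitz` are written here for illustration; as far as I understand, TICC applies the same idea block-wise to the inverse covariance, which this scalar example does not show.

```python
def toeplitz(first_col, first_row):
    """Build a Toeplitz matrix: entry (i, j) depends only on i - j."""
    assert first_col[0] == first_row[0], "corner element must agree"
    return [
        [first_col[i - j] if i >= j else first_row[j - i]
         for j in range(len(first_row))]
        for i in range(len(first_col))
    ]


def is_toeplitz(matrix):
    """Check that every entry equals its upper-left neighbour."""
    return all(
        matrix[i][j] == matrix[i - 1][j - 1]
        for i in range(1, len(matrix))
        for j in range(1, len(matrix[0]))
    )


M = toeplitz([1, 2, 3], [1, 4, 5])
# M is [[1, 4, 5],
#       [2, 1, 4],
#       [3, 2, 1]]
```

Each diagonal carries a single value, so an n x n Toeplitz matrix is described by 2n - 1 numbers instead of n squared.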

Reference:

1. How to explain the conditional random field (CRF) model with a simple, easy-to-understand example?
