Paper: Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data

From: Stanford University; Jure Leskovec (60,000+ citations)

Problem:

subsequence clustering.

Challenging:

Discovering patterns is challenging because it requires simultaneous segmentation and clustering of the time series, and because interpreting the clustering results is difficult.

Why is discovering time series patterns a challenge? Think for yourself! There are already many distance measures (DTW, manifold distance) and clustering methods (kNN, k-means, etc.). But I admit that interpretation is difficult.

Introduction:

long time series --(break down)--> a sequence of states/patterns --> the time series can be expressed as a sequential timeline of a few key states --> discover repeated patterns / understand trends / detect anomalies / better interpret large, high-dimensional datasets.

Key steps: simultaneously segment and cluster the time series.
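To make the "timeline of a few key states" idea concrete, here is a minimal sketch that collapses a per-timestep cluster assignment into contiguous segments. It is not the TICC algorithm: it only post-processes an assignment that some clustering step has already produced. The function name `labels_to_segments` and the toy labels are made up for illustration.

```python
from itertools import groupby

def labels_to_segments(labels):
    """Collapse per-timestep cluster labels into (start, end, cluster) runs.

    `end` is inclusive. This does not learn clusters; it only summarizes
    an existing assignment as a timeline of states.
    """
    segments = []
    start = 0
    for cluster, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length - 1, cluster))
        start += length
    return segments

# Hypothetical assignment for a 10-step series with two states, A and B.
toy_labels = ["A", "A", "A", "B", "B", "A", "A", "A", "A", "B"]
segments = labels_to_segments(toy_labels)
for start, end, cluster in segments:
    print(f"t={start}..{end}: state {cluster}")
```

Reading the series as four segments over two states is the kind of compact description the notes above are after.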

Unsupervised learning: hard to interpret; after clustering, you have to look at the data itself.

How to discover interpretable structure in the data?

Traditional clustering methods are not particularly well suited to discovering interpretable structure in the data. This is because they typically rely on distance-based metrics such as DTW.

Distance-based algorithms are at a disadvantage on multivariate time series: they cannot see subtle structural similarities in the data.
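For reference, this is what a distance-based metric looks like in practice: a minimal sketch of the classic dynamic time warping (DTW) recurrence for two univariate sequences, with absolute difference as the local cost. The function name `dtw_distance` and the toy inputs are illustrative assumptions.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two 1-D sequences.

    Local cost is |a_i - b_j|; no window constraint is applied.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # repeat b[j-1]
                cost[i][j - 1],      # repeat a[i-1]
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]

# Same shape at a different speed vs. a reversed shape (toy numbers).
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
different = dtw_distance([1, 2, 3], [3, 2, 1])
print(same_shape, different)
```

The result is a single number comparing raw values. It says nothing about how the sensors in a multivariate series depend on each other, which is the structure the notes say distance-based methods miss.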

Propose a new method for multivariate time series clustering, TICC (Toeplitz Inverse Covariance-based Clustering):

  • Define each cluster as a dependency network showing the relationships between the different sensors in a short subsequence.
  • Each cluster is a Markov random field (MRF).
  • In these MRFs, an edge represents a partial correlation between two variables.
  • Learn each cluster's MRF by estimating a sparse Gaussian inverse covariance matrix.
  • This network has multiple layers.
  • The number of layers corresponds to the window size of a short subsequence.
  • The inverse covariance matrix defines the adjacency matrix of the MRF dependency network.
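The link between an inverse covariance matrix and an MRF can be sketched as follows. Given a precision matrix Θ, variables i and j share an edge when Θ_ij is nonzero, and the implied partial correlation is -Θ_ij / sqrt(Θ_ii · Θ_jj). The sketch assumes Θ is already known. It does not perform the sparse estimation step, and the 3-sensor matrix and function names are made up for illustration.

```python
import math

def partial_correlations(theta):
    """Partial correlations implied by a precision (inverse covariance) matrix.

    rho_ij = -theta_ij / sqrt(theta_ii * theta_jj) for i != j.
    """
    n = len(theta)
    rho = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                rho[i][j] = -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])
    return rho

def mrf_edges(theta, tol=1e-8):
    """Edges (i, j), i < j, where the off-diagonal precision entry is nonzero."""
    n = len(theta)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if abs(theta[i][j]) > tol
    ]

# Hypothetical sparse precision matrix for three sensors (toy numbers).
theta = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
edges = mrf_edges(theta)
rho = partial_correlations(theta)
print(edges)      # sensors 0 and 2 are not directly connected
print(rho[0][1])
```

Here sensors 0 and 2 are conditionally independent given sensor 1, so the dependency network is the chain 0 - 1 - 2. Sparsity in Θ is what keeps such a network readable.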

Related work:

time series clustering and convex optimization;

variations of DTW; symbolic representations; rule-based motif discovery;

However, these methods generally rely on distance-based metrics.

TICC: a model-based clustering method, like ARMA, Gaussian mixture, or hidden Markov models.

  • Define each cluster by a Gaussian inverse covariance.
  • The Gaussian inverse covariance thus defines a Markov random field encoding the structural representation.
  • K clusters, each with its own inverse covariance.

Selecting the number of clusters: cross-validation, normalized mutual information, BIC, or silhouette score.
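As one example of these criteria, BIC trades goodness of fit against model size: BIC = k · ln(n) - 2 · ln(L), where k is the number of free parameters, n the number of samples, and L the maximized likelihood. Lower is better. The sketch below picks K from a handful of candidate fits. The log-likelihoods and parameter counts are invented toy numbers, and how parameters should be counted for TICC is not covered in these notes.

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower values are preferred."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood

# Hypothetical results of fitting a model with K = 2, 3, 4 clusters:
# K -> (maximized log-likelihood, number of free parameters). Toy numbers.
candidates = {
    2: (-1250.0, 40),
    3: (-1100.0, 60),
    4: (-1090.0, 80),
}
n_samples = 1000

scores = {k: bic(ll, p, n_samples) for k, (ll, p) in candidates.items()}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

In this toy setup, going from K = 3 to K = 4 barely improves the likelihood, so the extra parameters are not worth their penalty.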

I can't follow this part T T

Supplementary knowledge:

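1. For unsupervised learning, interpreting the results and choosing intermediate parameters currently relies entirely on experience.

2. Aarhus data, Martin: multivariate time series forecasting.

3. Toeplitz matrices: matrices with constant diagonals.

4. TICC code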
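For item 3, a minimal sketch of what "constant diagonals" means: entry (i, j) of a Toeplitz matrix depends only on the offset i - j. The helper names are made up for illustration, and the example uses scalar entries. In TICC, as I understand it, the same constraint is applied block-wise, with each block describing the sensor relationships at a given time lag inside the window.

```python
def toeplitz_from_first_column(col):
    """Symmetric Toeplitz matrix: entry (i, j) depends only on |i - j|."""
    n = len(col)
    return [[col[abs(i - j)] for j in range(n)] for i in range(n)]

def is_toeplitz(matrix):
    """True if every top-left to bottom-right diagonal is constant."""
    return all(
        matrix[i][j] == matrix[i - 1][j - 1]
        for i in range(1, len(matrix))
        for j in range(1, len(matrix[i]))
    )

t = toeplitz_from_first_column([4, 1, 0])
for row in t:
    print(row)
print(is_toeplitz(t))
```

Because only the offset matters, a 3 x 3 symmetric Toeplitz matrix has 3 free values instead of 6. That is also why the constraint makes the learned structure time-invariant within a window.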

Reference:

1. How can the conditional random field (CRF) model be explained with a simple, easy-to-understand example?
