PP: Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data

keeps_you_warm 2024-08-29 02:22:34 原文

From: Stanford University; Jure Leskovec, citation 6w+;

Problem:

subsequence clustering.

Challenging:

discover patterns is challenging because it requires simultaneous segmentation and clustering of the time series + interpreting the cluster results is difficult.

Why discover time series patterns is a challenge?? thinking by yourself!! there are already so many distance measures(DTW, manifold distance) and clustering methods(knn,k-means etc.). But I admit the interpretation is difficult.

Introduction:

long time series ----breakdown-----> a sequence of states/patterns ------> so time series can be expressed as a sequential timeline of a few key states. -------> discover repeated patterns/ understand trends/ detect anomalies/ better interpret large and high-dimensional datasets.

Key steps: simultaneously segment and cluster the time series.

Unsupervised learning: hard to interpretation, after clustering, you have to view data itself.

how to discover interpretable structure in the data?

Traditional clustering methods are not particularly well-suited to discover interpretable structure in the data. This is because they typically rely on distance-based metrics

distance-based metrics, DTW.

距离式的算法，在处理multivariate time series上有劣势，看不到细微的数据结构相似性。

Propose a new method for multivariate time series clustering TICC:

define each cluster as a dependency network showing the relationships between the different sensors in a short subsequence.
each cluster is a markov random field.
In thes MRFs, an edge represents a partial correlation between two variables.
learn each cluster's MRF by estimating a sparse Gaussian inverse covariance matrix.
This network has multiple layers.
the number of layers corresponds to the window size of a short subsequence.
逆协方差矩阵定义了MRF dependency network 的adjaccency matrix.

Related work:

time series clustering and convex optimization;

variations of dtw; symbolic representations; rule-based motif discovery;

However, these methods generally rely on distance-based metrics.

TICC ------ a model-based clustering method, like ARMA, Gaussian mixture or hidden markov models.

define each cluster by a Gaussian inverse covariance.
so the Gaussian inverse covariance defines a Markov random field encoding the structural representation.
K clusters/ inverse covariances.

selecting the number of clusters: cross-validation; mornalized mutual information; BIC or silhouette score.

看不懂哇 T T

Supplementary knowledge:

1. 对于unsupervised learning, 目前对结果的解释或者中间参数的选取，全是靠经验。

2. Aarhus data, Martin, 做多变量time series 预测。

3. Toeplitz Matrices: 常对角矩阵。

Reference:

1. 如何用简单易懂的例子解释条件随机场（CRF）模型？

PP: Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data的更多相关文章

PP: Tripoles: A new class of relationships in time series data
Problem: ?? mining relationships in time series data; A new class of relationships in time series da ...
图Lasso求逆协方差矩阵(Graphical Lasso for inverse covariance matrix)
图Lasso求逆协方差矩阵(Graphical Lasso for inverse covariance matrix) 作者:凯鲁嘎吉 - 博客园 http://www.cnblogs.com/ka ...
PP: Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network
PROBLEM: OmniAnomaly multivariate time series anomaly detection + unsupervised 主体思想: input: multivar ...
PP: Deep r -th Root of Rank Supervised Joint Binary Embedding for Multivariate Time Series Retrieval
from: Dacheng Tao 悉尼大学 PROBLEM: time series retrieval: given the current multivariate time series se ...
PP: Unsupervised deep embedding for clustering analysis
Problem: unsupervised clustering represent data in feature space; learn a non-linear mapping from da ...
[转]Multivariate Time Series Forecasting with LSTMs in Keras
1. Air Pollution Forecasting In this tutorial, we are going to use the Air Quality dataset. This is ...
PP: A dual-stage attention-based recurrent neural network for time series prediction
Problem: time series prediction The nonlinear autoregressive exogenous model: The Nonlinear autoregr ...
PP: Deep clustering based on a mixture of autoencoders
Problem: clustering A clustering network transforms the data into another space and then selects one ...
PP: Time series clustering via community detection in Networks
Improvement can be done in fulture:1. the algorithm of constructing network from distance matrix. 2. ...

随机推荐

unity 教程Tanks中的Transform.InverseTransformPoint理解
Tanks教程中在处理摄像机缩放的时候使用了下面的函数,取两个坦克的中心点之后,根据两个坦克之间的距离,保证两个坦克都在屏幕中,然后进行缩放. private float FindRequiredSi ...
jmeter请求参数的两种方式
Jmeter做接口测试,Body与Parameters的选取 1.普通的post请求和上传接口,选择Parameters. 2.json和xml请求接口,选择Body. 注意: 在做接口测试时注意下请 ...
day19 几个模块的学习
# 模块本质上就是一个 .py 文件# 数据类型# 列表.元组# 字典# 集合.frozenset# 字符串# 堆栈:特点:先进后出# 队列:先进先出 FIFO # from collections ...
剑指offer-面试题61-扑克牌中的顺子-数组
/* 题目: 从扑克牌中随机抽取n个数字,判断他们是否连续,扑克牌从A~K,大小王可代替任意数字. */ #include<iostream> #include<cstdlib> ...
Scheduled和HttpClient的连环坑
首页 > JAVA > @Scheduled和HttpClient的连环坑 @Scheduled和HttpClient的连环坑 2018-03-22 曾经踩过一个大坑: 由于业务特殊性,会 ...
OSI七层协议大白话解读
参考链接:https://www.cnblogs.com/zx125/p/11295985.html 国际标准化组织(ISO)制定了osi七层模型,iso规定了各种各样的协议,并且分了7层应用层应 ...
SpringBoot整合NoSql--（二）MongoDB
简介: MongoDB 是一个基于分布式文件存储的数据库.由 C++ 语言编写.旨在为 WEB 应用提供可扩展的高性能数据存储解决方案.MongoDB 是一个介于关系数据库和非关系数据库之间的产品,是 ...
day 9 深浅拷贝
浅copy 现有数据 data = { "name":"alex", "age":18, "scores":{ &quo ...
MySQL优化、锁
1. MySQL优化-查看执行记录 MySQL 提供了一个 EXPLAIN 命令, 它可以对 SELECT 语句进行分析, 并输出 SELECT 执行的详细信息, 以供开发人员针对性优化. 使用ex ...
cf1067b
题意简述:判断所给图是不是一个k递归图这是一个2递归图题解:仔细观察发现中心点一定是直径的中点,因此找到直径中点之后进行bfs判断即可,这里注意判断递归层次太大也不符合 const int max ...