How to Use Convolutional Neural Networks for Time Series Classification

2019-10-08 12:09:35

This blog is from: https://towardsdatascience.com/how-to-use-convolutional-neural-networks-for-time-series-classification-56b1b0a07a57

Introduction

A large amount of data is stored in the form of time series: stock indices, climate measurements, medical tests, etc. Time series classification has a wide range of applications: from identification of stock market anomalies to automated detection of heart and brain diseases.

There are many methods for time series classification. Most of them consist of two major stages: in the first stage, you either use some algorithm for measuring the difference between the time series you want to classify (dynamic time warping is a well-known one), or you use whatever tools are at your disposal (simple statistics, advanced mathematical methods, etc.) to represent your time series as feature vectors. In the second stage, you use some algorithm to classify your data. It can be anything from k-nearest neighbors and SVMs to deep neural network models. One thing unites these methods: they all require some kind of feature engineering as a separate step before classification is performed.
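As a purely hypothetical illustration of this two-stage approach, the sketch below represents each series by a handful of summary statistics and then classifies those feature vectors with scikit-learn's k-nearest neighbors; the data arrays, shapes, and the choice of statistics are placeholders, not part of the original article.

# Two-stage baseline sketch: hand-crafted statistical features + k-nearest neighbors.
# The X arrays of shape (n_samples, n_timesteps) and the labels are placeholders.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def extract_features(X):
    # stage 1: represent every series by a few simple statistics
    return np.column_stack([X.mean(axis=1), X.std(axis=1), X.min(axis=1), X.max(axis=1)])

X_train = np.random.randn(100, 512)
y_train = np.random.randint(0, 2, size=100)
X_test = np.random.randn(20, 512)

# stage 2: classify the feature vectors
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(extract_features(X_train), y_train)
predictions = clf.predict(extract_features(X_test))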

Fortunately, there are models that not only incorporate feature engineering in one framework, but also eliminate any need to do it manually: they are able to extract features and create informative representations of time series automatically. These models are recurrent and convolutional neural networks (CNNs).

Research has shown that using CNNs for time series classification has several important advantages over other methods. They are highly noise-resistant models, and they are able to extract very informative, deep features that are independent of time. In this article, we will first examine in detail how 1-D convolution works on time series. Then, I will give an overview of a more sophisticated model proposed by researchers from Washington University in St. Louis. Finally, we will look at a simplified multi-scale CNN code example.

1-D Convolution for Time Series

Imagine a time series of length n and width k. The length is the number of timesteps, and the width is the number of variables in a multivariate time series. For electroencephalography, for example, it is the number of channels (electrodes on a person's head), and for a weather time series it can be variables such as temperature, pressure, and humidity.

The convolution kernels always have the same width as the time series, while their length can be varied. This way, the kernel moves in one direction from the beginning of a time series towards its end, performing convolution. It does not move to the left or to the right as it does when the usual 2-D convolution is applied to images.

 

[Figure: 1-D Convolution for Time Series. Source: [2] (modified).]

The elements of the kernel get multiplied by the corresponding elements of the time series that they cover at a given point. Then the results of the multiplication are added together and a nonlinear activation function is applied to the value. The resulting value becomes an element of a new “filtered” univariate time series, and then the kernel moves forward along the time series to produce the next value. The number of new “filtered” time series is the same as the number of convolution kernels. Depending on the length of the kernel, different aspects, properties, “features” of the initial time series get captured in each of the new filtered series.

The next step is to apply global max-pooling to each of the filtered time series vectors: the largest value is taken from each vector. A new vector is formed from these values, and this vector of maximums is the final feature vector that can be used as an input to a regular fully connected layer. This whole process is illustrated in the picture above.
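To make the mechanics concrete, here is a minimal NumPy sketch (not from the original article) of a single kernel sliding along a multivariate series, with the multiply-sum-activate step and the final global max-pooling spelled out; all lengths and values are illustrative.

# One convolution kernel sliding over a multivariate time series,
# followed by global max-pooling. Shapes and values are illustrative.
import numpy as np

n, k = 100, 3                 # series length and width (number of variables)
m = 5                         # kernel length; its width always equals k
series = np.random.randn(n, k)
kernel = np.random.randn(m, k)

filtered = []
for t in range(n - m + 1):
    window = series[t:t + m, :]               # the slice currently covered by the kernel
    value = np.tanh(np.sum(window * kernel))  # multiply, add up, apply a nonlinearity
    filtered.append(value)

feature = max(filtered)       # global max-pooling: one number per kernel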

Let’s take it to another level

With this simple example in mind, let’s examine the model of a multi-scale convolutional neural network for time series classification [1].

The multi-scale nature of this model lies in its architecture: in the first convolutional layer, the convolution is performed on 3 parallel, independent branches. Each branch extracts features of a different nature from the data, operating at different time and frequency scales.

The framework of this network consists of 3 consecutive stages: transformation, local convolution, and full convolution.

 

[Figure: Multi-Scale Convolutional Neural Network Architecture [1].]

Transformation

At this stage, different transformations are applied to the original time series on 3 separate branches. The first branch transformation is the identity mapping, meaning that the original time series remains intact.

The second branch transformation is smoothing the original time series with moving averages of various window sizes. This way, several new time series with different degrees of smoothness are created. The idea is that each new time series consolidates information from a different frequency range of the original data.

Finally, the third branch transformation is down-sampling the original time series with various down-sampling coefficients. The smaller the coefficient, the more detailed the new time series is, and, therefore, it consolidates information about the time series features on a smaller time scale. Down-sampling with larger coefficients results in less detailed new time series which capture and emphasize those features of the original data that exhibit themselves on larger time scales.
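The short sketch below illustrates all three branch transformations on a toy univariate series; the window sizes and down-sampling coefficients are arbitrary choices for the example, not values from the paper.

# Transformation stage: identity, moving-average smoothing, and down-sampling.
import numpy as np

series = np.random.randn(1024)      # placeholder univariate series

def moving_average(x, window):
    return np.convolve(x, np.ones(window) / window, mode="valid")

identity = series                                              # branch 1: unchanged
smoothed = [moving_average(series, w) for w in (5, 10, 20)]    # branch 2: several degrees of smoothness
downsampled = [series[::c] for c in (2, 4, 8)]                 # branch 3: several down-sampling coefficients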

Local Convolution

At this stage, the 1-D convolution that we discussed earlier is applied to the time series with different filter sizes. Each convolutional layer is followed by a max-pooling layer. In the previous, simpler example, global max-pooling was used. Here the pooling is not global, but the pooling kernel is still extremely large, much larger than the sizes you are used to when working with image data. More specifically, the pooling kernel size is determined by the formula n/p, where n is the length of the time series and p is a pooling factor, typically chosen from the values {2, 3, 5}. This stage is called local convolution because each branch is processed independently.
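For illustration, a single local-convolution branch could be sketched in Keras as below; the filter count, kernel size, series length n, and pooling factor p are assumptions made for the example, and only the n // p pooling kernel reflects the formula above.

# One local-convolution branch: Conv1D followed by max-pooling with
# an unusually large pooling kernel of size n // p.
from keras.layers import Conv1D, Input, MaxPooling1D

n, p = 1024, 5                                        # series length and pooling factor
branch_input = Input(shape=(n, 19))
convolved = Conv1D(32, 16, padding="same", activation="relu")(branch_input)
pooled = MaxPooling1D(pool_size=n // p)(convolved)    # pooling kernel of size n/p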

Full Convolution

At this stage, the outputs of the local convolution stage from all 3 branches are concatenated. Then several more convolutional and max-pooling layers are added. After all the transformations and convolutions, you are left with a flat vector of deep, complex features that capture information about the original time series across a wide range of frequencies and time scales. This vector is then used as the input to fully connected layers with a softmax activation on the last layer.
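A rough sketch of this stage, assuming that all three branches produce outputs with the same number of filters (so they can be concatenated along the time axis) and assuming 4 classes, might look like the following; the layer sizes are made up for illustration.

# Full-convolution stage: concatenate the branch outputs, add more
# convolution and pooling, flatten, and classify with softmax.
from keras.layers import Concatenate, Conv1D, Dense, Flatten, MaxPooling1D

def full_convolution(branch_outputs, n_classes=4):
    merged = Concatenate(axis=1)(branch_outputs)           # join the branches along the time axis
    x = Conv1D(64, 8, padding="same", activation="relu")(merged)
    x = MaxPooling1D(pool_size=2)(x)
    x = Flatten()(x)                                       # flat vector of deep features
    x = Dense(128, activation="relu")(x)
    return Dense(n_classes, activation="softmax")(x)       # classification output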

Keras Example

from keras.layers import Conv1D, Dense, Dropout, Input, Concatenate, GlobalMaxPooling1D
from keras.models import Model

# this base model is one branch of the main model
# it takes a time series as an input, performs 1-D convolution,
# and returns it as an output ready for concatenation
def get_base_model(input_len, fsize):
    # the input is a time series of length n and width 19
    input_seq = Input(shape=(input_len, 19))
    # choose the number of convolution filters
    nb_filters = 10
    # 1-D convolution and global max-pooling
    convolved = Conv1D(nb_filters, fsize, padding="same", activation="tanh")(input_seq)
    processed = GlobalMaxPooling1D()(convolved)
    # dense layer with dropout regularization
    compressed = Dense(50, activation="tanh")(processed)
    compressed = Dropout(0.3)(compressed)
    model = Model(inputs=input_seq, outputs=compressed)
    return model

# this is the main model
# it takes the original time series and its down-sampled versions as inputs,
# and returns the result of classification as an output
def main_model(inputs_lens=[512, 1024, 3480], fsizes=[8, 16, 24]):
    # the inputs to the branches are the original time series and its down-sampled versions
    input_smallseq = Input(shape=(inputs_lens[0], 19))
    input_medseq = Input(shape=(inputs_lens[1], 19))
    input_origseq = Input(shape=(inputs_lens[2], 19))
    # the more down-sampled the time series, the shorter the corresponding filter
    base_net_small = get_base_model(inputs_lens[0], fsizes[0])
    base_net_med = get_base_model(inputs_lens[1], fsizes[1])
    base_net_original = get_base_model(inputs_lens[2], fsizes[2])
    embedding_small = base_net_small(input_smallseq)
    embedding_med = base_net_med(input_medseq)
    embedding_original = base_net_original(input_origseq)
    # concatenate all the outputs
    merged = Concatenate()([embedding_small, embedding_med, embedding_original])
    out = Dense(1, activation='sigmoid')(merged)
    model = Model(inputs=[input_smallseq, input_medseq, input_origseq], outputs=out)
    return model

This model is a much simpler version of the multi-scale convolutional neural network.

It takes the original time series and 2 down-sampled versions of it (medium and small length) as inputs. The first branch of the model processes the original time series of length 3480 and width 19; the corresponding convolution filter length is 24. The second branch processes the medium-length (1024 timesteps) down-sampled version of the time series, with a filter length of 16. The third branch processes the shortest version (512 timesteps) of the time series, with a filter length of 8. This way, every branch extracts features on a different time scale.

After convolutional and global max-pooling layers, dropout regularization is added, and all the outputs are concatenated. The last fully connected layer returns the result of classification.
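To try the model out, a minimal usage sketch might look like the following, assuming binary labels and random placeholder data with 19 channels and the three lengths expected by the branches; the optimizer and training settings are arbitrary.

# Minimal usage sketch with placeholder data.
import numpy as np

model = main_model()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

x_small = np.random.randn(32, 512, 19)    # most down-sampled version
x_med = np.random.randn(32, 1024, 19)     # medium down-sampled version
x_orig = np.random.randn(32, 3480, 19)    # original time series
y = np.random.randint(0, 2, size=(32, 1))

model.fit([x_small, x_med, x_orig], y, epochs=2, batch_size=8)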

Conclusion

In this article, I tried to explain how deep convolutional neural networks can be used to classify time series. It is worth mentioning that the approach described here is not the only one. Time series can also be presented in the form of images (for example, as spectrograms), to which a regular 2-D convolution can be applied.

Thank you very much for reading this article. I hope it was helpful to you, and I would really appreciate your feedback.

References:

[1] Z. Cui, W. Chen, Y. Chen, Multi-Scale Convolutional Neural Networks for Time Series Classification (2016), https://arxiv.org/abs/1603.06995.

[2] Y. Kim, Convolutional Neural Networks for Sentence Classification (2014), https://arxiv.org/abs/1408.5882.
