Weather recognition plays an important role in our daily lives and many computer vision applications. However, recognizing weather conditions from a single image remains challenging and has not been studied thoroughly. Generally, most previous works treat weather recognition as a single-label classification task, namely, determining whether an image belongs to a specific weather class or not. This treatment is not always appropriate, since more than one weather condition may appear simultaneously in a single image. To address this problem, we make the first attempt to view weather recognition as a multi-label classification task, i.e., assigning an image more than one label according to the displayed weather conditions. Specifically, a CNN–RNN based multi-label classification approach is proposed in this paper. The convolutional neural network (CNN) is extended with a channel-wise attention model to extract the most correlated visual features. The recurrent neural network (RNN) further processes the features and excavates the dependencies among weather classes. Finally, the weather labels are predicted step by step. Besides, we construct two datasets for the weather recognition task and explore the relationships among different weather conditions. Experimental results demonstrate the superiority and effectiveness of the proposed approach. The newly constructed datasets will be available on the project website.

1. Introduction

 

Weather conditions influence our daily lives and production in many ways [1], such as clothing choices, traveling, solar technologies and so on. Therefore, acquiring weather conditions automatically is important to a variety of human activities. A possible solution to weather recognition is utilizing various kinds of hardware. However, such equipment is usually expensive and needs professionals to maintain. An alternative scheme is to recognize weather conditions from color images using computer vision techniques [2,3]. Nowadays, surveillance cameras are ubiquitous, which makes the computer vision solution feasible. Apart from its guiding significance to our daily lives, weather recognition is also an important function for many other computer vision applications [4–7], such as image retrieval [8], image restoration [9], and the reliability improvement of outdoor surveillance systems [3]. Robotic vision [10,11] and vehicle assistant driving systems [12,13] can also benefit from the results of weather recognition. Thus, we can draw a simple conclusion that weather recognition from outdoor images has great research significance.

1.1. Motivation and overview

 

Although weather recognition is of remarkable value, only a few studies have been published to tackle this problem. Several previous works [12,14–16] concentrated on recognizing weather conditions from images captured by in-vehicle cameras. Several other papers [1,17,18] exploited weather recognition from single outdoor images. All of these works treated weather recognition as a single-label classification task (a weather label means a weather category in this paper), namely, determining whether an image belongs to a specific weather category or not.

However, it is not always appropriate to view weather recognition as a single-label classification problem, for mainly two reasons. The first reason can be summarized as uncertainty, i.e., the class boundaries among some weather categories are essentially ambiguous. As can be seen from Fig. 1, the changes from Fig. 1(a)–(f) demonstrate that there is a series of states between a purely sunny weather (like Fig. 1(a)) and an obviously cloudy weather (as illustrated in Fig. 1(f)). It is hard to determine whether the category is sunny or cloudy when referring to an intermediate weather state like Fig. 1(c), (d) and (e) [2]. Thus, the uncertainty of such boundary samples makes it difficult to determine ground-truth labels even from the perspective of human beings, and few previous works present solutions to this problem. The second drawback of treating weather recognition as a single-label classification task can be summarized as incompleteness, namely, a single weather label may not describe the weather conditions comprehensively for a given image. For example, the visual effect of haze is obvious in Fig. 1(g), (h) and (i). Nevertheless, it can be seen from the comparisons among these three images that Fig. 1(g) seems more sunny while Fig. 1(h) seems more overcast, and Fig. 1(i) seems snowy. Therefore, a haze label alone cannot reveal the differences among these three images.

Motivated by the aforementioned two reasons, we propose to view weather recognition as a multi-label classification problem, i.e., assigning multiple labels to an image according to the displayed weather conditions. Specifically, it is achieved by a CNN–RNN architecture. The intuition lies in two aspects. On one hand, most of the previous works focused on exploiting hand-crafted weather features [1,20], but these features did not achieve the desired results in the weather recognition task. Inspired by the great success of convolutional neural networks (CNNs) in recent years, we utilize a CNN as the weather feature extractor. On the other hand, labels exhibit strong co-occurrence dependencies in the weather domain. For example, snowy and cloudy usually occur together while rainy and sunny almost never co-occur. Inspired by the success of recurrent neural networks (RNNs) in dependency modeling [21,22], we propose to use an RNN to model the dependencies among labels and predict weather labels step by step. In such a way, when predicting subsequent labels, the network can refer to the previous hidden states that incorporate the historical information implicitly.

For weather recognition, different image regions have different importance when predicting labels. As shown in Fig. 2, the blue sky is crucial for judging a sunny day, and snow on the ground is significant for estimating snowy weather. Lu et al. [2] also emphasized that such weather cues are critical. Therefore, it is necessary to make the weather cues discriminative and preserve the spatial information of the image. To achieve this goal, a channel-wise attention model is designed to exploit more discriminative features for the weather recognition task.

Besides, we use a convolutional long short-term memory (LSTM) [23] instead of a vanilla RNN in our CNN–RNN architecture to preserve the spatial information. The convolutional LSTM uses convolution operations in both state-to-state and input-to-state transformations, which captures spatio-temporal information better than the fully connected LSTM (FC-LSTM) [23].

In addition, considering the lack of datasets for the weather recognition task, two new datasets are constructed in this paper. The first consists of about 8K images from seven weather categories and is transformed from an existing transient attribute dataset [19]. The second is built from scratch and contains 10K images from five weather categories.

1.2. Contributions

 

In summary, there are three main contributions of this work:

(1) We propose to treat weather recognition as a multi-label classification task by analyzing the drawbacks of classifying images with a single weather label and the co-occurrence relationships among different weather conditions.

(2) We present a CNN–RNN architecture to tackle the multi-label weather classification task. It is composed of a CNN to extract features, a channel-wise attention model to recalibrate feature responses, and a convolutional LSTM to model the relationships among different weather labels.

(3) We build a new multi-label weather classification dataset and transform an existing transient attribute dataset [19] for the weather recognition task. The datasets will be available on the project website.

1.3. Organization

 

The remainder of this paper is structured as follows: In Section 2, related works on weather recognition are reviewed. In Section 3, we describe the proposed approach in detail. In Section 4, we first present the construction of the new multi-label weather image dataset and the modification of the transient attribute dataset [19], and then analyze the experimental results on these two datasets. In Section 5, the conclusion of this paper is drawn.

2. Related work

 

We roughly divide weather recognition works into two subcategories in this paper. One category focuses on designing hand-crafted weather features, and the other attempts to use CNNs to solve the weather recognition task.

2.1. Weather recognition with hand-crafted features

 

Many vehicle assistant driving systems use weather recognition to improve road safety. For example, they can set speed limits in extreme weather conditions, automatically activate the wipers on rainy days and so forth. Hand-crafted features are popular in these works. Kurihata et al. [12,24] proposed that raindrops are strong cues for the presence of rainy weather and developed a rain feature to detect raindrops on the windshield. Roser et al. [15] defined several regions of interest (ROI) and developed various types of histogram features for rainy weather recognition. Yan et al. [13] utilized a gradient amplitude histogram, an HSV color histogram as well as road information for the classification task among sunny, cloudy and rainy categories. Besides, several methods were proposed specially for fog detection. Hautière et al. [14] used Koschmieder's law [25] to detect the presence of fog and estimate the visibility distance. Bronte et al. [26] utilized many techniques, including a Sobel-based sunny-foggy detector, edge binarization, Hough line detection, vanishing point detection and road/sky segmentation. Gallen et al. [27] focused on night fog detection by detecting the backscattered veil caused by the vehicle's own lights or halos around street lights. Pavlić et al. [16,28] transformed images into the frequency domain and detected the presence of fog by training Gabor filters of different scales and orientations in the power spectrum. Although the aforementioned approaches have shown good performance, they are usually limited to the in-vehicle perspective and cannot be applied to a wider range of applications.

There are also several studies devoted to weather recognition from common outdoor images. Li et al. [29] proposed a photometric stereo-based approach to estimate the weather condition of a given site. Zhao et al. [9] pointed out that the pixel-wise intensities of dynamic weather conditions (rainy, snowy, etc.) fluctuate over time while those of static weather conditions (sunny, foggy, etc.) stay almost unchanged. They proposed a two-stage classification scheme which first distinguishes between the two conditions and then utilizes several spatio-temporal and chromatic features to further estimate the weather category. In [17], several global features were extracted for weather classification, such as inflection point information, power spectral slope, edge gradient energy, saturation, contrast and image noise. Li et al. [18] also utilized several features from [17], and constructed a decision tree according to the distance between features. Beyond regular global features, [1] proposed multiple weather cues including reflection, shadow and a sky descriptor for two-class weather recognition. They also exploited a collaborative learning strategy in which voters closer to the test image have larger weights. Zhang et al. [20,30] proposed a sunny feature, a rainy feature, a snowy feature and a haze feature individually for each weather class as well as two global features. Furthermore, a multiple kernel learning approach was proposed in [30] to fuse these features. In [31], both spatial appearance and temporal dynamics were investigated on short video clips to recognize several weather types.

Although researchers have elaborately designed many features for weather recognition, these features are usually limited to specific perspectives or weather classes, and cannot be applied to a wider range of applications.

2.2. Weather recognition with CNNs

 

In recent years, convolutional neural networks have shown overwhelming performance in a variety of computer vision tasks, such as image classification [32], object detection [33], and semantic segmentation [34]. Several excellent CNN architectures have been proposed, including AlexNet [32], VGGNet [35] and ResNet [36], which outperform traditional approaches to a large extent. Inspired by the great success of CNNs, a few works have attempted to apply CNNs to the weather recognition task. Elhoseiny et al. [3] directly fine-tuned AlexNet [32] on a two-class weather classification dataset released by Lu et al. [1], and achieved a better result. Lu et al. [2] combined hand-crafted weather features with CNN-extracted features, and further improved the classification performance. However, as discussed in [2], there are no closed boundaries among weather classes; multiple weather conditions may appear simultaneously. Therefore, all the above approaches suffer from information loss when they treat weather recognition as a single-label classification problem. Li et al. [37] proposed to use auxiliary semantic segmentation of weather cues to comprehensively describe the weather conditions. This strategy can alleviate the problem of information loss, but the segmentation mask is not intuitive for humans.

3. Our approach

 

In this paper, to comprehensively describe the weather conditions, we propose to treat weather recognition as a multi-label classification problem. Furthermore, a CNN–RNN model is developed for this task, which formulates the multi-label classification as a step-wise prediction. Fig. 3 demonstrates the architecture of the proposed approach. It is mainly composed of three parts, i.e., the basic CNN, a channel-wise attention model and a convolutional LSTM. The CNN extracts the preliminary features of a given outdoor image. Specifically, the first five groups of convolutional/pooling layers of VGGNet [35] are utilized in this paper. The channel-wise attention model adaptively calculates the channel-wise attention weights and recalibrates the feature responses. The convolutional LSTM uses the visual features and the hidden state to predict weather labels one by one, which implicitly models the co-occurrence dependency among labels by maintaining context information in its internal memory states.
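The step-wise pipeline can be summarized compactly. The following minimal Python sketch of the forward pass assumes hypothetical helper functions (vgg_features, channel_attention, conv_lstm_step, predict_label) that stand in for the components detailed in the rest of this section; it is an illustration, not the authors' implementation.

```python
import numpy as np

def forward(image, num_labels, vgg_features, channel_attention,
            conv_lstm_step, predict_label):
    x = vgg_features(image)      # CNN feature map, shape (W, H, C)
    h = np.zeros_like(x)         # initial hidden state
    c = np.zeros_like(x)         # initial cell state
    probs = []
    for t in range(num_labels):  # one weather label per step
        x_t = channel_attention(x, h)     # recalibrate channels (Eqs. (3)-(6))
        h, c = conv_lstm_step(x_t, h, c)  # convolutional LSTM (Eq. (2))
        probs.append(predict_label(h))    # sigmoid prediction (Eq. (7))
    return probs
```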

3.1. The convolutional LSTM in the CNN–RNN architecture

 

Recurrent neural networks, especially the LSTM, have recently achieved overwhelming success in sequence modeling tasks, such as image/video captioning [38] and neural machine translation [39]. Without loss of generality, the LSTM can be formulated as follows [40].

$$
\begin{aligned}
i_t &= \sigma(W_{iw} x_t + U_{ih} h_{t-1} + b_i),\\
f_t &= \sigma(W_{fw} x_t + U_{fh} h_{t-1} + b_f),\\
o_t &= \sigma(W_{ow} x_t + U_{oh} h_{t-1} + b_o),\\
g_t &= \tanh(W_{gw} x_t + U_{gh} h_{t-1} + b_g),\\
c_t &= f_t \circ c_{t-1} + i_t \circ g_t,\\
h_t &= o_t \circ \tanh(c_t),
\end{aligned} \tag{1}
$$

where the subscript t indicates the t-th step of the LSTM, x_t denotes the input data, h_t stands for the hidden state, and c_t is the cell state. i_t, f_t and o_t are the input gate, forget gate and output gate of the LSTM, respectively. The W's, U's and b's are weights and biases to be learned. σ, tanh and ∘ represent the sigmoid function, the hyperbolic tangent function and element-wise multiplication, respectively. As shown in Eq. (1), at each step, the data x_t and the previous hidden state h_{t-1} are taken as the input of the current LSTM unit, and the historical information is recorded in the hidden state h_t, such that the LSTM can exploit the temporal dependency.
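For concreteness, Eq. (1) transcribes almost line by line into code. Below is a minimal NumPy sketch of one LSTM step; the parameter containers W, U, b are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b are dicts holding the parameters of the four gates.
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])  # input gate
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])  # forget gate
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])  # output gate
    g_t = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])  # candidate state
    c_t = f_t * c_prev + i_t * g_t   # element-wise products (the "∘" in Eq. (1))
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```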

Although the standard LSTM has demonstrated its powerful capability in sequence modeling tasks, the spatial information is ignored when processing images [23]. As can be seen from Eq. (1), full connections are used in the state-to-state and input-to-state transformations. Generally, if the input image data x_t ∈ ℝ^{W×H×C}, it will be flattened to a 1D vector before being input to the LSTM. However, this process suffers from the loss of spatial information. To overcome this drawback, the convolutional LSTM is employed in our approach [23], which can be formulated as follows.

$$
\begin{aligned}
i_t &= \sigma(W_{iw} * x_t + U_{ih} * h_{t-1} + b_i),\\
f_t &= \sigma(W_{fw} * x_t + U_{fh} * h_{t-1} + b_f),\\
o_t &= \sigma(W_{ow} * x_t + U_{oh} * h_{t-1} + b_o),\\
g_t &= \tanh(W_{gw} * x_t + U_{gh} * h_{t-1} + b_g),\\
c_t &= f_t \circ c_{t-1} + i_t \circ g_t,\\
h_t &= o_t \circ \tanh(c_t),
\end{aligned} \tag{2}
$$

where * denotes the convolution operator and the other symbols are the same as in Eq. (1). It should be noted that the input feature x_t, the cell state c_t, the hidden state h_t and the gates i_t, f_t, o_t of the convolutional LSTM are all 3D tensors, and convolution operations are used in the state-to-state and input-to-state transformations. Therefore, the spatial information of the features is preserved. Furthermore, the convolution operation actually has an implicit spatial attention mechanism, since regions corresponding to the target label usually have higher activation responses. In the experiments, we also find that the convolutional LSTM pays attention to several critical regions for weather label prediction, and achieves better results than the common LSTM with or without a spatial attention model.
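The only change from Eq. (1) to Eq. (2) is that matrix products become convolutions, so the states remain spatial maps. A minimal single-channel NumPy/SciPy sketch follows; a real convolutional LSTM sums over multiple input channels, and this single-channel simplification is ours.

```python
import numpy as np
from scipy.signal import convolve2d

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_lstm_step(x_t, h_prev, c_prev, W, U, b):
    # x_t, h_prev, c_prev: 2D maps; W, U: dicts of 2D kernels; b: dict of biases.
    def gate(name, act):
        pre = (convolve2d(x_t, W[name], mode='same')
               + convolve2d(h_prev, U[name], mode='same') + b[name])
        return act(pre)
    i_t = gate('i', sigmoid)
    f_t = gate('f', sigmoid)
    o_t = gate('o', sigmoid)
    g_t = gate('g', np.tanh)
    c_t = f_t * c_prev + i_t * g_t   # element-wise, as in Eq. (2)
    h_t = o_t * np.tanh(c_t)         # h_t keeps the spatial layout
    return h_t, c_t
```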

3.2. Channel-wise attention model in the CNN–RNN architecture

 

Usually, different regions are activated in different channels of the feature map, and different image regions have different importance when estimating various weather conditions. In our CNN–RNN architecture, each step of the convolutional LSTM predicts one weather label. Inspired by Hu et al. [41], we propose a channel-wise attention model for the CNN–RNN architecture to adaptively recalibrate the feature responses when predicting different weather labels. The illustration of the proposed channel-wise attention model is shown in Fig. 4.

As discussed in [41], exploiting global information is a popular method in feature engineering works. To calculate the attention weight of each feature map channel, we adopt a similar strategy, i.e., global average pooling is used to generate channel-wise statistics, which can be viewed as a descriptor of the channel-wise global spatial information. Different from [41], in our multi-label weather classification task, we want to adaptively obtain the channel-wise attention weights according to the previously predicted weather label. So we also take into account the channel-wise statistics information encoded in the hidden state of the convolutional LSTM. The two kinds of statistics information are formulated as follows.

$$
a_k = f_a(x_k) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} x_k(i, j), \tag{3}
$$

$$
d_k = f_a(h_{t-1,k}) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} h_{t-1,k}(i, j), \tag{4}
$$

where x_k and h_{t−1,k} denote the visual feature and the previous hidden state of the convolutional LSTM at the k-th channel (k = 1, 2, ..., C), respectively. f_a represents the global average pooling function, and a_k and d_k denote the statistics information of the visual feature and the hidden state at the k-th channel. W and H stand for the width and height of the visual features. It should be noted that, in our approach, the visual features and hidden states have the same dimensions.

After the statistics information of the visual features and hidden states is obtained, the channel-wise attention weights are calculated by

$$
z_k = \sigma\big(w_2 \, \delta(w_1 [a_k, d_k] + b_1) + b_2\big), \tag{5}
$$

where the w's and b's are weights and biases to be learned, δ represents the ReLU [42] function that is utilized to learn the non-linear mapping, [·, ·] is the concatenation operation, and σ indicates the sigmoid function which normalizes the attention weight to the range of 0–1. Finally, the recalibrated features are obtained by rescaling the original features with the attention weights,

$$
\tilde{x} = \sum_{k=1}^{C} z_k x_k. \tag{6}
$$
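A minimal NumPy sketch of Eqs. (3)–(6) is given below, under the assumption that the weights w1, w2 and biases b1, b2 are shared across channels; the last line applies the per-channel rescaling terms z_k x_k.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

def channel_attention(x, h_prev, w1, b1, w2, b2):
    # x, h_prev: feature map and previous hidden state, shape (W, H, C)
    a = x.mean(axis=(0, 1))        # Eq. (3): global average pooling
    d = h_prev.mean(axis=(0, 1))   # Eq. (4): same pooling on the hidden state
    z = np.empty_like(a)
    for k in range(x.shape[2]):
        s = np.array([a[k], d[k]])                    # [a_k, d_k] concatenation
        z[k] = sigmoid(w2 @ relu(w1 @ s + b1) + b2)   # Eq. (5)
    return x * z                   # rescale each channel by its weight (Eq. (6))
```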

3.3. Inference

In this paper, the weather labels are predicted in a fixed order. Practically, the order of the weather labels is set according to their co-occurrence relationships; details are depicted in Section 4.2.

In each step of the convolutional LSTM, the 3D hidden state is flattened to a 1D vector, which is then used to predict the weather label,

$$
p_t = \sigma(w_p h_t + b_p), \tag{7}
$$

where p_t ∈ [0, 1] is the output probability of the t-th weather label, h_t is the flattened hidden state, and w_p and b_p are the learned weight and bias.

The loss of each prediction step is determined by the following function,

$$
loss_t = -\frac{1}{N} \sum_{i=1}^{N} \Big[ p_{i,t} \log \tilde{p}_{i,t} + (1 - p_{i,t}) \log(1 - \tilde{p}_{i,t}) \Big], \tag{8}
$$

where N denotes the number of training samples, p_{i,t} indicates the ground-truth label of the i-th sample on the t-th weather class, and \tilde{p}_{i,t} is the corresponding predicted label. Finally, the total loss is formulated as follows,

$$
Loss = \sum_{t=1}^{T} loss_t, \tag{9}
$$

where T represents the number of all weather classes.
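A minimal NumPy sketch of the step-wise prediction and loss of Eqs. (7)–(9) follows; the array shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step_losses(h_list, labels, w_p, b_p):
    # h_list: T flattened hidden states, each of shape (N, D), for a batch of N
    # labels: ground-truth matrix of shape (N, T) with entries in {0, 1}
    total = 0.0
    for t, h_t in enumerate(h_list):
        p_tilde = sigmoid(h_t @ w_p + b_p)    # Eq. (7), shape (N,)
        p = labels[:, t]
        # Eq. (8): per-step binary cross-entropy, averaged over the N samples
        loss_t = -np.mean(p * np.log(p_tilde)
                          + (1 - p) * np.log(1 - p_tilde))
        total += loss_t                       # Eq. (9): sum over the T steps
    return total
```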

3.4. Training details

The open source library TensorFlow is used to implement the proposed approach. To accelerate convergence, we adopt a two-stage training strategy. In the first stage, the basic CNN of our approach (i.e., the first five groups of convolutional/pooling layers of VGGNet [35]) is trained. Specifically, we transform VGGNet into a multi-label classification framework by replacing the output layer with T neurons (T represents the number of weather classes), and train it with a multi-label sigmoid cross-entropy loss function. The VGGNet model pre-trained on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is used for fine-tuning. In the second stage, we remove the fully connected layers of VGGNet and fix the other parameters. Then, the convolutional LSTM and the channel-wise attention model are trained from scratch based on the CNN-extracted features. The Xavier initialization method is employed in this stage. The Adam [43] optimization approach is used to minimize the loss functions in both stages, where the first and second momentum are set to 0.9 and 0.999, respectively. To avoid overfitting, the dropout [44] operation is used after the fully connected layers in both stages, and L2 regularization is also employed for all weight parameters. We set the dropout ratio and the weight of L2 regularization to 0.5 and 0.0005 during the entire training process. The learning rate is initialized as 0.0001 and drops by a factor of 10 after the loss becomes stable. Besides, we also attempted to fine-tune all parameters after the second training stage, i.e., to unfix the parameters of the basic CNN, but experiments showed that this strategy does not bring performance improvements.

Before training, each sample is resized into a 256 × 256 image. Random flip, random crop and random noise are used for data augmentation. We adopt a stochastic mini-batch training strategy: images are randomly shuffled and constitute mini-batches of size 50 before each training epoch. Table 1 shows the detailed shapes of several critical components of the proposed CNN–RNN architecture; the shapes of all biases can be easily inferred.
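The stated optimizer settings map directly onto standard APIs. The paper implements training in TensorFlow; the following PyTorch-style sketch merely illustrates the same hyperparameters, with `model` a hypothetical stage-two network (convolutional LSTM plus channel-wise attention).

```python
import torch

def make_optimizer(model):
    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=1e-4,                # initial learning rate 0.0001
        betas=(0.9, 0.999),     # first and second momentum
        weight_decay=5e-4,      # L2 regularization weight 0.0005
    )
    # Drop the learning rate by a factor of 10 once the loss plateaus.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, factor=0.1)
    return optimizer, scheduler
```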

4. Experiments

Since this is the first work to treat weather recognition as a multi-label classification problem, there are no existing datasets for this task. Therefore, to evaluate the proposed approach, we construct two datasets, where one is a modification of the transient attribute dataset [19] and the other is created from scratch. In this section, we first introduce the construction procedure and details of the two datasets. Then, the co-occurrence relationships among weather labels are explored. Finally, the evaluation metrics, comparison approaches and experimental results are presented in the subsequent subsections.

4.1. Dataset description

 

4.1.1. The transient attribute dataset

The first dataset is transformed from an existing transient attribute dataset [19], which was originally built for outdoor scene understanding and editing. Although the transient attribute dataset is not specially designed for weather recognition, it presents many appealing properties. First, images are captured across many outdoor scenes including mountains, cities, towns and urban sceneries. Images in this dataset are of different scales and views, which enhances the diversity across scenes. Second, images are elaborately selected to ensure they exhibit various appearances of the same scene. Moreover, the authors of [19] defined 40 transient attributes for this dataset, including weather-related attributes (e.g., 'sunny', 'rain', 'fog', etc.). For each image, the weather-related attributes are annotated non-exclusively, which is important for our multi-label weather recognition experiments. Several examples of the transient attribute dataset are illustrated in Fig. 5.

For weather recognition, six weather-related attributes among all 40 transient attributes are selected, i.e., 'sunny', 'cloudy', 'fog', 'snow', 'moist' and 'rain'; the others are ignored in our experiments. Besides, we find that there exist a few examples in which all weather attribute strengths are very low. Some of them are captured at dawn or dusk, and others do not show obvious features corresponding to any weather category. Therefore, we add an 'other' class to represent those examples where every attribute strength is lower than 0.5. It is noteworthy that a strength lower than 0.5 indicates that the annotation workers do not think the image exhibits the corresponding attribute. In this paper, for the weather recognition task, weather attributes greater and lower than 0.5 are set to 1 and 0, respectively. Finally, the dataset contains seven weather classes and 8571 images in total. The detailed statistics of the dataset are displayed in Table 2.

4.1.2. The multi-label weather classification dataset

To further evaluate the proposed approach, we construct a new dataset from scratch, which contains 10,000 images from five weather classes, i.e., 'sunny', 'cloudy', 'foggy', 'rainy' and 'snowy'. All images are elaborately selected from the Internet. Compared to other weather recognition datasets, our dataset has the following advantages. First, most existing datasets focus on only two or three weather classes, while our dataset covers all common weather conditions in daily life. Second, the newly constructed dataset contains many different scenes including cities, villages, urban areas and so on, as depicted in Fig. 6. In addition, this dataset also exhibits different scales and views. Third, in our dataset, the weather labels are not mutually exclusive, which provides more weather information.

The annotation of the multi-label weather classification dataset was completed by a crowd-sourced task. The annotation workers are asked to determine the weather attribute strengths non-exclusively for a given image. The range of strengths is from 0 to 1, with 0.5 as a demarcation point: a weather attribute strength lower than 0.5 indicates that the image cannot be judged as the corresponding weather condition (even if the image contains the corresponding attribute). In this dataset, each image is annotated by at least five workers, and the average value of each attribute strength is selected as the result. To ensure the effectiveness of the annotation task, we also calculate the variance of each attribute strength for a given image. If the variance is bigger than a threshold, the result is re-determined by discussion. Finally, to generate the weather labels, all attribute strengths greater than or equal to 0.5 are set to 1, and the others are set to 0.
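A minimal NumPy sketch of the described aggregation follows: average the per-worker strengths, flag high-variance attributes for re-discussion, and threshold at 0.5 to obtain binary weather labels. The variance threshold value here is our assumption; the paper does not state it.

```python
import numpy as np

def aggregate(worker_strengths, var_threshold=0.05):
    # worker_strengths: shape (num_workers, num_attributes), values in [0, 1]
    s = np.asarray(worker_strengths)
    mean = s.mean(axis=0)
    needs_discussion = s.var(axis=0) > var_threshold  # re-determine these
    labels = (mean >= 0.5).astype(int)  # strengths >= 0.5 become label 1
    return labels, needs_discussion

# Five hypothetical workers rating three attributes of one image:
labels, flags = aggregate([[0.8, 0.1, 0.6],
                           [0.9, 0.0, 0.4],
                           [0.7, 0.2, 0.7],
                           [0.8, 0.1, 0.5],
                           [0.9, 0.1, 0.6]])
```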

Fig. 7 shows the weather label distribution on the two experimental datasets. The detailed statistics can also be found in Table 2. In both datasets, cloudy is the class with the largest number of samples. This is because cloudy usually co-occurs with other weather conditions. Apart from cloudy, the newly constructed dataset is more balanced than the transient attribute dataset. Besides, it can be observed from Table 2 that over half of the samples have multiple weather labels in both datasets, which also verifies the validity of taking weather recognition as a multi-label classification task.

4.2. Co-occurrence relationships

We have qualitatively argued that more than one weather condition may occur simultaneously in one image. A quantitative analysis of the co-occurrence relationships among different weather conditions is also conducted according to the following equation,

$$
R(i, j) = \frac{\sum_{Q} conc(i, j)}{\sum_{Q} I(i)}, \tag{10}
$$

where both i and j denote a kind of weather condition, R(i, j) indicates the measurement of the co-occurrence relationship between i and j, and Q represents all the samples in the dataset. conc(i, j) and I(i) are indicator functions which are defined as follows,

$$
conc(i, j) = \begin{cases} 1, & Arr(i) \geq 0.5 \ \wedge \ Arr(j) \geq 0.5 \\ 0, & \text{otherwise} \end{cases}, \tag{11}
$$

$$
I(i) = \begin{cases} 1, & Arr(i) \geq 0.5 \\ 0, & \text{otherwise} \end{cases}, \tag{12}
$$

where Arr(i) denotes the attribute strength of weather condition i, and ∧ represents the conjunction symbol. In summary, Eq. (10) indicates the ratio between the co-occurrence number of the two weather conditions and the occurrence number of weather condition i over all images. Therefore, Σ_j R(i, j) and Σ_j R(j, i) represent the influence and dependence of label i on the others, respectively. To exploit the dependencies when predicting the weather labels, it is natural to predict the most influential label first and the most dependent label last. Based on this, the following equation is utilized to rank the weather labels,

$$
r = \frac{\sum_j R(i, j)}{\sum_j R(j, i)}. \tag{13}
$$

Obviously, the label with a higher score of r should rank first.

The analytical result is depicted in Fig. 8, from which we can draw the following conclusions. First, in accordance with our intuition, there are strong co-occurrence relationships between certain weather conditions, such as rainy and cloudy, snowy and foggy, etc. The corresponding samples are usually near the category boundary. In this paper, we propose to use the combination of labels to represent these samples. Second, there are indeed label dependencies in the weather recognition task, and it is necessary to consider them when predicting multiple weather labels. In this paper, the convolutional LSTM is employed to capture the dependencies among different weather labels, and the labels are predicted step by step. According to Eq. (13), the order of weather labels is fixed as moist → cloudy → others → sunny → snowy → foggy → rainy on the transient attribute dataset, and cloudy → sunny → foggy → rainy → snowy on our multi-label weather classification dataset. Practically, we have also tried several other label orders; they achieve comparable performance, and the above two achieve the best in most occasions.
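A minimal NumPy sketch of Eqs. (10)–(13) is given below, assuming the trivial diagonal terms R(i, i) = 1 are excluded from the sums of Eq. (13).

```python
import numpy as np

def label_order(strengths, names):
    # strengths: shape (num_samples, num_labels), attribute strengths in [0, 1]
    occ = (np.asarray(strengths) >= 0.5).astype(float)  # I(i) per sample, Eq. (12)
    conc = occ.T @ occ                   # co-occurrence counts over Q, Eq. (11)
    R = conc / occ.sum(axis=0)[:, None]  # Eq. (10): normalize by occurrences of i
    influence = R.sum(axis=1) - 1.0      # sum_j R(i, j), dropping R(i, i) = 1
    dependence = R.sum(axis=0) - 1.0     # sum_j R(j, i)
    r = influence / dependence           # Eq. (13)
    return [names[k] for k in np.argsort(-r)]  # higher r ranks first
```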

4.3. Evaluation metrics and comparison approaches

Per-class precision and recall are first computed as evaluation metrics. Per-class means that, for a given weather label, the prediction result is true as long as the current label is correctly predicted. Then, the average precision (AP) and average recall (AR) are calculated, which are defined as the average values of the per-class precision and recall, respectively.

Besides, sample-wise evaluation metrics are also adopted, which are defined as the overall precision (OP) and overall recall (OR),

$$
OP = \frac{\sum_{n=1}^{N} \sum_{i=1}^{K} f(p_{n,i}, \tilde{p}_{n,i})}{N \cdot K}, \tag{14}
$$

$$
OR = \frac{\sum_{n=1}^{N} \sum_{i=1}^{K} f(p_{n,i}, \tilde{p}_{n,i})}{\sum_{n=1}^{N} \sum_{i=1}^{K} p_{n,i}}, \tag{15}
$$

where N denotes the number of samples in the dataset, K represents the number of weather classes, and p_{n,i} and \tilde{p}_{n,i} indicate the ground-truth label and predicted label of the n-th sample on the i-th weather class, respectively. f(·) is an indicator function which is defined as follows,

$$
f(p, \tilde{p}) = \begin{cases} 1, & p = \tilde{p} \\ 0, & \text{otherwise} \end{cases}. \tag{16}
$$

Finally, the F1 scores (including AF1 and OF1) are computed, which are the harmonic means of precision and recall.

Since there are no other multi-label weather recognition approaches, we compare with the multi-label versions of AlexNet [32] and VGGNet [35]. To verify the effectiveness of the convolutional LSTM and the channel-wise attention model in this paper, we also compare with some other CNN–RNN frameworks, including CNN–LSTM, CNN–LSTM with a spatial attention model (CLA), CNN–GRU with a spatial attention model (CGA), and CNN–ConvLSTM without the channel-wise attention model. Besides, two widely used general multi-label approaches are also employed as comparison methods, i.e., ML-KNN [45] and ML-ARAM [46]. ML-KNN is a multi-label lazy learning method that adapts the traditional K-nearest neighbor (KNN) algorithm to the multi-label setting. ML-ARAM extends the Adaptive Resonance Associative Map neural network to multi-label classification tasks. In our experiments, we test these two approaches using the implementations in the popular scikit-multilearn library. For fair comparison, all CNN–RNN frameworks use the same CNN (i.e., VGGNet) as our approach. Features input to ML-KNN and ML-ARAM are also extracted by VGGNet (the last fully connected layer) pre-trained on the two experimental datasets. The proposed approach is referred to as CNN-Att-ConvLSTM.
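A minimal NumPy sketch of the sample-wise metrics of Eqs. (14)–(16) follows; p and p_tilde are binary ground-truth and prediction matrices.

```python
import numpy as np

def overall_metrics(p, p_tilde):
    # p, p_tilde: binary arrays of shape (N, K)
    match = (p == p_tilde)             # f(p, p_tilde) of Eq. (16)
    OP = match.sum() / match.size      # Eq. (14): matches over N * K
    OR = match.sum() / p.sum()         # Eq. (15): matches over positives
    OF1 = 2 * OP * OR / (OP + OR)      # harmonic mean of OP and OR
    return OP, OR, OF1
```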

4.4. Results on the transient attribute dataset

For the transient attribute dataset, 1000 images are randomly selected for testing, another 1000 images are selected for validation, and the remainder are used for training. The experimental results are shown in Table 3, from which we can see that the proposed approach CNN-Att-ConvLSTM achieves the best results on OP, OR and OF1, and comparable results with the state-of-the-art on AP, AR and AF1. CNN–LSTM with the spatial attention model (CLA) also obtains good results, while without the spatial attention model, CNN–LSTM suffers from serious performance degradation. This indicates the importance of certain key regions in the weather recognition task. To evaluate the influence of the LSTM in the CNN–RNN framework, we also test CNN–GRU with the spatial attention model (CGA), and find that CGA achieves almost the same results as CLA. CNN–ConvLSTM also obtains results similar to CLA, which denotes the effectiveness of the convolutional LSTM in extracting information from key regions. Overall, the proposed approach performs better than the multi-label versions of AlexNet and VGGNet, the general multi-label approaches ML-KNN and ML-ARAM, and the other CNN–RNN methods, which proves the superiority of our approach.

Regarding the per-class results, all these methods perform worse on the 'rainy' and 'other' classes. This is because most images in the transient attribute dataset present distant views, and it is difficult to recognize rainy weather from such distant views. In addition, samples of the 'other' class are very rare, and can easily be misclassified as sunny or cloudy in this dataset.

4.5. Results on the multi-label weather classification dataset

 

For the multi-label weather classification dataset, 2000 images are randomly selected for testing, 1000 images for validation, and the remainder for training. As presented in Table 4, CNN-Att-ConvLSTM performs the best on almost all the evaluation metrics, which demonstrates the effectiveness of the proposed approach again.

To analyze the effectiveness of our approach, some weather recognition examples are presented in Fig. 9, including the classification results, activation maps and attention weights from our approach. The results of VGGNet are used for comparison, since our approach also uses it as the deep feature extractor.

Specifically, our approach works well on the first three images. From the selected activation maps and their attention weights, we can see that our approach attends to the most correlated weather cues when predicting different weather labels, while the results of VGGNet are not so satisfactory. For example, the first image is annotated as sunny and foggy; correspondingly, the blue sky, the bright area and the region of haze have stronger responses in our activation maps, and the attention weights of the corresponding activation maps are relatively high when predicting different labels. However, the ground is mistakenly activated by VGGNet, which leads to the wrong label, i.e., rainy. Besides, our approach fails on the remaining two images. The fourth image is annotated as sunny and cloudy, which means an intermediate state between sunny and cloudy. However, only the cloud regions are activated, and the sunny label is missed by our approach, mainly because the sunny label is somewhat ambiguous. The fifth image is annotated as cloudy and rainy; however, since the wet ground is not obvious, it is misclassified as cloudy and foggy by our approach. Overall, the results in Fig. 9 indicate that our approach performs well in most cases, but sometimes fails when the annotation is ambiguous and the weather cues are not obvious. This is reasonable since our approach is based only on visual features, and better performance might be achieved with other modality information, such as humidity, which will be taken into consideration in our future work.

5. Conclusion

 

Considering that more than one weather condition may occur simultaneously in one image, we first analyze the drawbacks of taking weather recognition as a single-label classification task, and then propose a multi-label classification framework for the weather recognition task. It allows one image to belong to multiple weather categories, which provides a more comprehensive description of the weather conditions. Specifically, it is a CNN–RNN architecture, where the CNN is extended with a channel-wise attention model to extract the most correlated visual features, and a convolutional LSTM is utilized to predict the weather labels step by step while maintaining the spatial information of the visual features. Besides, we build two datasets for the weather recognition task to make up for the lack of training data. The experimental results have verified the effectiveness of the proposed approach.

In future work, we plan to introduce the distribution prediction task for weather recognition [47–50], which can not only classify an image with multiple labels, but also predict the strengths of different weather classes, so as to describe the weather conditions more comprehensively. Besides, other modality information, such as humidity and temperature, can also be utilized in future work.
