MILAD: Robust Anomaly Detection for Electric Vehicles with Label Noise

One of the bottlenecks restricting the development of the electric vehicle industry is safety. Although numerous anomaly detection algorithms for electric vehicles have been proposed, most of them may perform poorly due to the complexity and unpredictability of real scenes. We observe that the battery system of an electric vehicle may already harbor a latent safety hazard before, during, and after a fault in real scenes; in other words, the nominally normal data around a fault carry label noise. To solve this problem, we propose a Multi-Instance Learning based Anomaly Detection (MILAD) framework that performs anomaly detection for electric vehicles under label noise. Extensive cross-validation experiments verify that the framework can effectively detect abnormal conditions in multivariate time series data in the presence of label noise.


Introduction
Electric vehicles have developed rapidly over the past few years and have received considerable attention. However, accidents caused by thermal runaway of lithium batteries in electric vehicles are common, which raises many discussions about the safety of electric vehicles. Scholars all over the world have proposed a large number of methods for the anomaly detection problem of electric vehicles. These methods can be roughly divided into two categories: model-based approaches and data-driven approaches [11][13]. Model-based approaches attempt to construct a model of the internal state of each power battery, through which the health status of each battery can be accurately monitored, thereby realizing anomaly detection for electric vehicles. Data-driven approaches do not explicitly construct a model for each battery; they use machine learning techniques to explore patterns in battery data and then provide information on the safety of electric vehicles.
Although previous algorithms have achieved some success on their own datasets, a large portion of them are based on ideal data constructed under laboratory conditions, and their reliability is verified by manually constructed abnormal conditions, such as heating the battery or loosening battery component screws. Such algorithms face great limitations when transferred to real-world applications, because they do not take into account that electric vehicles in the real scene differ greatly from the laboratory scene due to its complexity and unpredictability, nor do they consider that the failure of a power battery is a gradual accumulation rather than a sudden process.
We posit that the battery system itself already carries a certain degree of safety hazard right before the failure of an electric vehicle in the real scene, i.e., the failure of the power battery should be a gradual accumulation rather than a sudden process. In nature, this is a unidirectional label noise problem for multivariate time series data: the nominally normal data (negative samples) within a period of time around a failure (positive samples) may contain implicit anomalies, i.e., noisy negative labels.
In this paper, we propose MILAD, a Multi-Instance Learning based Anomaly Detection framework, to solve the label noise problem of precision equipment such as electric vehicles in real scenarios. Based on the idea of multi-instance learning, we let the data in a temporal bag share the same label, forcing the model to pay attention to the label noise in the segment of data before the failure, instead of simply treating it as completely normal data, and then to learn a more compact decision boundary between the normal mode and the failure mode. Cross-validation experiments confirm that this framework can effectively detect abnormal conditions in multivariate time series data with label noise.

Related Work
Multivariate time series anomaly detection is an active topic in both academia and industry. Most existing anomaly detection algorithms are unsupervised or semi-supervised, owing to the scarcity of labeled anomalies [4]. Technically, these label-free algorithms can be divided into the following categories: discord detection, dissimilarity-based, and prediction-model-based [2]. The first and most straightforward idea, discord detection, compares each subsequence with the others [5]. Dissimilarity-based methods rely on the direct comparison of subsequences or their representations, using a reference of normality [3]. Prediction-model-based methods build a prediction model that captures the dynamics of the series using past data, and detect abnormal temporal segments whose prediction error exceeds a selected threshold [7].
These label-free algorithms have achieved superior performance on various public datasets and real-world applications. However, considering that one of the most challenging problems in anomaly detection is how to define an "anomaly" with little prior information, some recent work is implemented in a weakly-supervised fashion and utilizes a small number of known anomalies. [8][10] show that properly leveraging a few known anomalies can enable much more accurate anomaly detection. Similar to our work, some scholars have applied weakly-supervised MIL-based anomaly detection techniques to surveillance videos [12][14], and [14] devised a graph convolutional network to correct noisy labels in the MIL framework. However, these methods are specifically designed for videos and cannot simply be transferred to electric vehicles.

Problem Statement
Suppose that we are given a normal time series set $\mathcal{D}_n = \{T_1, \dots, T_p\}$ and an abnormal time series set $\mathcal{D}_a = \{T_{p+1}, \dots, T_q\}$. For a time series $T = (X, Y)$, we have $X = (x_1, \dots, x_L) \in \mathbb{R}^{d \times L}$ as its features and $Y = (y_1, \dots, y_L) \in \{0,1\}^L$ as its labels, where $d$ is the feature dimension and $L$ is its length. For a time frame $x_t$, $y_t \in \{0,1\}$ indicates that this time frame is normal or abnormal, respectively. For convenience, we denote $W_t = (x_{t-w+1}, \dots, x_t)$ as a time window (or temporal segment) of fixed length $w$ at time $t$, with label $y_t$. We now define a bag (or batch) with start time $t$ as $B_t = (W_t, W_{t+1}, \dots, W_{t+b-1})$, where $b$ is the bag (or batch) size; the bag-level label is $Y_{B_t} = \max(y_t, y_{t+1}, \dots, y_{t+b-1})$. Our goal is to train an anomaly score ranking model $f: W \to [0,1]$ on the given data set $\mathcal{D}_n \cup \mathcal{D}_a$. For a testing time window $W$, $f(W) \in [0,1]$ indicates how abnormal $W$ is.
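The window and bag labeling above can be sketched as follows (a minimal illustration with names of our own choosing, not from the paper):

```python
def window_label(y, t):
    """A window W_t inherits the label y_t of its last time frame."""
    return int(y[t])

def bag_label(y, start, b):
    """Bag-level label Y_B = max of the b window labels in the bag."""
    return max(window_label(y, start + i) for i in range(b))

# toy label sequence with one labeled anomaly at t = 5
y = [0, 0, 0, 0, 0, 1, 0, 0]
print(bag_label(y, start=3, b=3))  # windows ending at t = 3, 4, 5 -> 1
print(bag_label(y, start=0, b=3))  # windows ending at t = 0, 1, 2 -> 0
```

Note that a bag is positive as soon as any one of its windows is labeled abnormal, which is exactly the multi-instance assumption MILAD exploits.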

Proposed Framework
In response to this problem, we propose a novel anomaly detection framework. Our main idea derives from multi-instance learning: we let the data in a bag share the same label, forcing the model to pay attention to the label noise in the segment of data around the failure, instead of simply treating it as completely normal data. The model then learns a more compact decision boundary between the normal mode and the failure mode.

Figure 1. Overview of our proposed framework. The bag generation module first divides the original time series of electric vehicle data into temporal segments and packs consecutive ones into a bag/batch. Then the anomaly scoring network, composed of an encoder, a decoder and a regressor, captures the temporal dependence of features, yields discriminant representations and calculates the anomaly score for each temporal segment. Finally, all the anomaly scores of the time series segments in the bag are compared with the label of the bag to calculate the multi-instance loss.
Our proposed framework is shown in Figure 1. It is composed of three parts: a bag generation module, an anomaly scoring module, and a multi-instance loss module. It works as follows: (1) The bag generation module receives the original real-world data of the electric vehicle as input and outputs a bag to downstream tasks through data processing, data sampling, and packing data into a bag; (2) The anomaly scoring module takes each time series segment in the bag from upstream as its input, and calculates an anomaly score for each of these segments through a recurrent neural network architecture; (3) Finally, all the anomaly scores of the time series segments in the bag are integrated and compared with the label of the bag to calculate the multi-instance loss, and the parameters of the neural network are updated by gradient descent through back-propagation.

Bag Generation
To solve the problem of potential label noise around fault data, we adopt a multi-instance learning based method. The first step is to pack the original real-world data of electric vehicles into bags.
The recurrent neural network takes a clip of the time series as each input and extracts temporal representations from it. Therefore, we divide each time series into overlapping temporal segments of fixed length to match the input shape of the recurrent neural network. Formally, consider a time series $X = (x_1, \dots, x_L)$ where each step $x_t \in \mathbb{R}^d$ is a $d$-dimensional vector. For each time step $t$, the data within the $w$ steps up to $x_t$, i.e., the window $W_t = (x_{t-w+1}, \dots, x_t)$, along with its corresponding label $y_t$, is sent to the anomaly scoring network to model the temporal mapping mode, where $w$ is the fixed window length. Considering the label noise problem, we treat each batch as a bag, in which all temporal segments share the same label: the maximum of the labels of the corresponding time steps.
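The overlapping segmentation can be sketched as below (a hypothetical helper of ours, using NumPy for brevity):

```python
import numpy as np

def make_windows(X, w):
    """Split a (L, d) time series into overlapping windows of length w.

    Returns an array of shape (L - w + 1, w, d); window i ends at step i + w - 1.
    """
    L = len(X)
    return np.stack([X[t - w + 1:t + 1] for t in range(w - 1, L)])

X = np.arange(12, dtype=float).reshape(6, 2)  # L = 6 steps, d = 2 features
W = make_windows(X, w=3)
print(W.shape)  # (4, 3, 2): four overlapping windows of three steps each
```

Consecutive windows then get grouped into bags of size $b$ as described above.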
Finally, we consider the problem of class imbalance, which we address with a simple downsampling trick: in each epoch, we traverse all abnormal bags, randomly sample an equal number of normal bags from all normal bags, and output them for downstream tasks.
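The per-epoch downsampling trick amounts to the following sketch (function and variable names are ours, for illustration only):

```python
import random

def balanced_epoch(bags, labels, seed=0):
    """Keep all abnormal bags; sample an equal number of normal bags per epoch."""
    rng = random.Random(seed)
    abnormal = [b for b, y in zip(bags, labels) if y == 1]
    normal = [b for b, y in zip(bags, labels) if y == 0]
    sampled = rng.sample(normal, k=min(len(abnormal), len(normal)))
    epoch = abnormal + sampled
    rng.shuffle(epoch)  # mix positive and negative bags within the epoch
    return epoch

bags = list(range(10))
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # 2 abnormal, 8 normal
print(len(balanced_epoch(bags, labels)))  # 4: two abnormal + two sampled normal
```

Because a fresh normal sample is drawn every epoch, the model still sees all normal bags over the course of training while each epoch stays balanced.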

Anomaly Scoring Network
The anomaly scoring network is fed with a temporal segment of length $w$ and outputs a value $s_t \in [0,1]$, named the anomaly score, indicating the degree to which the input segment is abnormal. The anomaly scoring network maps each temporal segment in the bag from upstream to an anomaly score, thereby converting the anomaly detection problem into a regression problem. We formulate the mapping function of the anomaly scoring network as: $s_t = f(W_t; \theta)$ (1), where $\theta$ denotes the parameters of the anomaly scoring network, $W_t$ is the input temporal segment (time window) at time $t$ from the bag generated upstream, and $s_t$ is the anomaly score.
To extract discriminant representations and achieve better regression performance simultaneously, we adopt the GRU-AE architecture, which is shown in Figure 2. The anomaly scoring network can further be divided into three parts: an encoder, a decoder and a regressor, each of which has a GRU layer inside to model the temporal dependence. The encoder first encodes each input temporal segment into a low-dimensional space as an embedded representation, compressing the features in the raw data. Then the decoder decodes the dense representation back into the initial feature space to reconstruct the last time frame in $W_t$. Finally, the regressor maps the extracted temporal representation to a scalar for each time frame as the anomaly score.
We formulate these parts as follows: $z_t = E(W_t)$, $\hat{x}_t = D(z_t)$, $o_t = R(z_t)$, where $E$, $D$, $R$ denote the encoder, decoder and regressor, respectively; $z_t$ is the dense representation extracted from $W_t$ by the encoder, $\hat{x}_t$ is the reconstructed last time frame in $W_t$, and $o_t$ is the output scalar. The output scalar is then normalized to $s_t \in [0,1]$, which can be seen as the ranking score or anomaly score indicating the degree of abnormality of the time frame.
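To make the GRU-based encoding concrete, a minimal NumPy sketch of a single GRU cell and the encoder/regressor path is given below. The weights are randomly initialized stand-ins for learned parameters, the sizes are illustrative, and the regressor is reduced to a single linear layer followed by a sigmoid; this is a shape-level sketch, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 4, 8  # feature dimension and hidden size (illustrative)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# randomly initialized GRU-cell parameters (stand-ins for learned weights)
Wz, Uz = rng.normal(size=(h, d)), rng.normal(size=(h, h))
Wr, Ur = rng.normal(size=(h, d)), rng.normal(size=(h, h))
Wn, Un = rng.normal(size=(h, d)), rng.normal(size=(h, h))

def gru_step(x, hprev):
    z = sigmoid(Wz @ x + Uz @ hprev)        # update gate
    r = sigmoid(Wr @ x + Ur @ hprev)        # reset gate
    n = np.tanh(Wn @ x + Un @ (r * hprev))  # candidate state
    return (1 - z) * n + z * hprev          # new hidden state

def encode(W_t):
    """Encoder E: run the GRU over a (w, d) window; final state is z_t."""
    state = np.zeros(h)
    for x in W_t:
        state = gru_step(x, state)
    return state

z_t = encode(rng.normal(size=(5, d)))      # window of length w = 5
s_t = sigmoid(rng.normal(size=h) @ z_t)    # regressor R: scalar score in (0, 1)
print(z_t.shape, 0.0 < s_t < 1.0)
```

The decoder $D$ would symmetrically map $z_t$ back to $\mathbb{R}^d$ to reconstruct the last time frame; it is omitted here for brevity.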

Multi-Instance Loss
As mentioned above, we adopt a multi-instance loss to make the learning model focus on the overall maximum anomaly score. Consider a batch (or bag) of temporal segments $(W_1, W_2, \dots, W_b)$ and the corresponding label sequence $(y_1, \dots, y_b)$; the bag-level label is $Y_B = \max_t y_t$. To ensure that the model can learn both normal patterns and abnormal patterns in bags with different bag labels, we adopt different forms of the loss function w.r.t. the bag label. If the bag label is positive, our proposed multi-instance loss is $\mathcal{L}_{pos} = \mathcal{L}_{AD} + \mathcal{L}_{MIL}$, where $\mathcal{L}_{AD}$ denotes the Anomaly Discrimination Loss and $\mathcal{L}_{MIL}$ denotes the Multi-Instance Learning Loss. As we expect the model to have superior discrimination on anomalies, we directly optimize the squared error for all labeled anomalies, formulated as $\mathcal{L}_{AD} = \sum_{t:\, y_t = 1} (s_t - 1)^2$. As for normal samples in abnormal bags, we consider it dangerous to minimize their output anomaly scores, because there is underlying label noise in normal samples around anomalies, i.e., in abnormal bags. Inspired by the success of MIL in surveillance videos [12][14], we optimize their anomaly scores with the MIL loss $\mathcal{L}_{MIL} = \big(\max_{t:\, y_t = 0} s_t - 1\big)^2$, which pushes up only the highest-scoring nominally normal instance instead of forcing all of them toward zero.
When it comes to a negative bag, i.e., $Y_B = 0$, the loss function is $\mathcal{L}_{neg} = \mathcal{L}_{ND} + \mathcal{L}_{rec}$, where $\mathcal{L}_{ND}$ denotes the Normality Discrimination Loss, $\mathcal{L}_{rec}$ denotes the Reconstruction Loss, and $\|\cdot\|_2$ refers to the L2-norm. For a negative bag composed entirely of normal instances, the predicted anomaly score for each instance should be as low as possible. We achieve this by minimizing the squared error $\mathcal{L}_{ND} = \sum_t s_t^2$. We also minimize the reconstruction loss of negative bags, formulated as $\mathcal{L}_{rec} = \sum_t \|x_t - \hat{x}_t\|_2^2$. Note that we do not optimize this objective on positive bags, because we expect the AE to be trained on completely normal instances and learn the normal patterns implied in the data, so that abnormal and normal instances show greater differences w.r.t. both reconstruction error and anomaly score.
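The two bag-level losses can be computed as in the sketch below. The exact instance-level form of $\mathcal{L}_{MIL}$ is not fully recoverable from the text; following the standard MIL convention from the surveillance-video literature, we take the maximum score over the nominally normal instances, so treat this as our reconstruction rather than the paper's exact formula:

```python
import numpy as np

def positive_bag_loss(scores, labels):
    """L_pos = L_AD + L_MIL (our reconstruction of the positive-bag loss).

    L_AD pushes labeled anomalies toward score 1; L_MIL pushes only the
    *maximum* score among unlabeled (possibly noisy) normal instances
    toward 1, instead of forcing them all to 0.
    """
    scores, labels = np.asarray(scores), np.asarray(labels)
    l_ad = np.sum((scores[labels == 1] - 1.0) ** 2)
    normal = scores[labels == 0]
    l_mil = (normal.max() - 1.0) ** 2 if normal.size else 0.0
    return l_ad + l_mil

def negative_bag_loss(scores, X, X_rec):
    """L_neg = L_ND + L_rec: all scores toward 0, plus AE reconstruction error."""
    l_nd = np.sum(np.asarray(scores) ** 2)
    l_rec = np.sum((np.asarray(X) - np.asarray(X_rec)) ** 2)
    return l_nd + l_rec

print(positive_bag_loss([0.9, 0.2, 0.6], [1, 0, 0]))  # (0.9-1)^2 + (0.6-1)^2
print(negative_bag_loss([0.1, 0.2], [[1.0], [2.0]], [[1.1], [1.9]]))
```

In training, the gradient of these losses would flow back through the anomaly scoring network; here the scores are plain numbers purely to show the arithmetic.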

Dataset Details
To evaluate the performance of our proposed framework, we conduct extensive experiments on the Huawei dataset. This dataset is composed of the real runtime data of 7 electric vehicles. All of these vehicles have experienced an overheating fault, which is the target anomaly to be detected. The anomaly ratio of the whole dataset is 2.99%, a low level suitable for weakly-supervised learning.

Competing Methods
This section compares MILAD with state-of-the-art anomaly detection algorithms. First, we choose iForest [6] as the baseline, as it is simple and effective. However, iForest is unsupervised and stationary, and deep learning methods can acquire diverse and discriminant representations to achieve better performance [9]. For this reason, we also study DAGMM [15], USAD [1] and DevNet [10], all of which are based on deep architectures. DAGMM jointly trains a Deep Autoencoder (AE) and a Gaussian Mixture Model (GMM) to estimate the density of the extracted latent representations, but it is not designed for multivariate time series. USAD is based on AEs and trained in an adversarial fashion inspired by Generative Adversarial Networks (GANs), which makes it effective at detecting anomalies in multivariate time series in an unsupervised fashion. DevNet is designed to leverage a few labelled anomalies with a prior to fulfil end-to-end differentiable learning of anomaly scores.

MILAD efficiently leverages the limited labelled anomalies. iForest, DAGMM and USAD are all unsupervised and lack prior information about anomalies. Although we remove abnormal data in the training phase to enhance their discriminant ability, they still lose to MILAD and DevNet.

MILAD is aware of the label noise problem. Inspired by the success of MIL in surveillance videos, MILAD adopts the MIL loss to model the label noise in normal segments around anomalies. By optimizing the MIL loss, the model gains better discrimination and finally surpasses DevNet.

Conclusions
This paper introduces MILAD, a Multi-Instance Learning based Anomaly Detection framework for electric vehicles. We take the lead in considering the problem of label noise in real scenarios, and we propose a MIL loss to convert the label noise problem into a weakly-supervised learning problem. The GRU layers enable our model to capture temporal dependence and extract discriminant representations, and the MIL loss leads our model to focus on the label noise in normal data. We compared MILAD with four other state-of-the-art anomaly detection methods on the Huawei dataset, and the experimental results demonstrate that MILAD achieves superior performance over the other methods.