Fine-Grained Multivariate Time Series Anomaly Detection in IoT

Sensors produce a large amount of multivariate time series data to record the states of Internet of Things (IoT) systems. Multivariate time series timestamp anomaly detection (TSAD) can identify the timestamps of attacks and malfunctions. However, a more detailed diagnosis also requires determining which sensor or indicator is abnormal, a process referred to as fine-grained anomaly detection (FGAD). Although TSAD methods can be extended to FGAD, existing works do not provide a quantitative evaluation, so the performance of such extensions is unknown. Therefore, to tackle the FGAD problem, this paper first verifies that TSAD methods achieve low performance when applied directly to the FGAD task because they fuse features excessively and ignore the dynamic changes in the relationships between indicators. Accordingly, this paper proposes a multivariate time series fine-grained anomaly detection (MFGAD) framework. To avoid excessive fusion of features, MFGAD constructs two sub-models, instead of a single model, to independently identify the abnormal timestamp and the abnormal indicator, and then combines the two kinds of abnormal results to detect the fine-grained anomaly. Based on this framework, an algorithm based on Graph Attention Neural Network (GAT) and Attention Convolutional Long-Short Term Memory (A-ConvLSTM) is proposed, in which GAT learns temporal features of multiple indicators to detect abnormal timestamps and A-ConvLSTM captures the dynamic relationship between indicators to identify abnormal indicators. Extensive simulations on a real-world dataset demonstrate that the proposed algorithm achieves a higher F1 score and hit rate than the extensions of existing TSAD methods, owing to its two independent sub-models for timestamp and indicator detection.


Introduction
Many sensor devices in the Internet of Things (IoT) produce a significant amount of time series data to record the states of the IoT system dynamically. Anomalies in the time series of states indicate a malfunction or an attack. Detecting and localizing anomalies [1,2] in the time series is therefore an essential way to detect malfunctions and attacks. When an anomaly is detected, countermeasures can be taken to reduce financial losses. Anomaly detection thus plays a vital role in the secure management of IoT.
In practice, sensors in the IoT often generate multiple indicators and form multivariate time series (MTS). For example, the indicators used in waterworks systems include the water level, water flow, valve status, water pressure, etc. The MTS in Fig. 1 contains five indicators: the flow meter, ultra filtration (UF) feed pump, oxidation-reduction potential (ORP) meter, motorized valve, and level transmitter. MTS can reflect different aspects of a physical device or system and contain more information than univariate time series data. Therefore, MTS anomaly detection (MTSAD) has become an attractive field of research. There are several tasks involved in MTSAD. Existing multivariate time series anomaly detection techniques [3][4][5][6][7][8][9][10][11] focus primarily on the "timestamp anomaly detection (TSAD)" task. The TSAD task aims to identify the timestamps at which the system behavior deviates from the norm because of errors or attacks. However, it is also necessary to determine which specific indicator is experiencing anomalies at the abnormal timestamp. Identifying abnormal indicators helps with finding the root causes and applying the correct treatment more rapidly to reduce losses; this task is referred to as "fine-grained anomaly detection (FGAD)," "anomaly interpretation [6]," or "anomaly diagnosis [11,12]." Taking Fig. 1 as an example, the multivariate time series in waterworks systems contains five indicators and 27000 timestamps. Two abnormal timestamps are highlighted in red. TSAD identifies the two abnormal timestamps. FGAD identifies the ORP meter as the abnormal indicator at the two abnormal timestamps.
Because FGAD can locate root causes, this paper focuses on the fine-grained anomaly detection task. The aims of the TSAD and FGAD tasks are different, and existing methods mostly solve the TSAD task. Therefore, multivariate time series fine-grained anomaly detection (MFGAD) remains an open problem and still faces several challenges.
• Although some works [6,11] have noted that FGAD can be performed by extending existing TSAD techniques, they do not provide a quantitative evaluation, and the performance of these extension methods is unknown.
• Some works [13] can identify abnormal indicators; however, they do so only within a period, meaning that the exact abnormal timestamp cannot be identified.
• Only a few indicators are abnormal within an abnormal timestamp, and most of them are normal. The anomaly ratio of the FGAD task is substantially lower than that of the TSAD task. The imbalance problem is therefore more serious, which makes the FGAD problem more difficult.
Therefore, this paper designs an MFGAD framework and an algorithm based on Graph Attention Neural Networks (GAT) and Attention-based Convolutional Long-Short Term Memory Networks (A-ConvLSTM) to tackle the FGAD problem. The significant contributions can be summarized as follows:
• This paper first verifies that extending the TSAD methods [11] does not work well on the FGAD task. The performance of these extended techniques on the FGAD task is much lower than that of the original techniques on the TSAD task. The main reason is that these models are prone to excessive mixing of the indicator-wise features and ignore the dynamic changes in the relationships between indicators, so the learned features are insufficient to distinguish indicators.
• A multivariate time series fine-grained anomaly detection framework is proposed to avoid excessive fusion of features. It constructs two sub-models, instead of a single model, to independently identify the abnormal timestamp and the abnormal indicator, and then combines the two kinds of abnormal results to detect the fine-grained anomaly.
• Based on the framework, a fine-grained anomaly detection algorithm is implemented with GAT and A-ConvLSTM. GAT learns temporal features of multiple indicators to detect abnormal timestamps. A-ConvLSTM captures the dynamic relationship between indicators and extracts distinct indicator features to identify abnormal indicators.
• Extensive simulations on a real-world dataset demonstrate that the proposed framework and algorithm achieve a higher F1 score and hit rate than the extensions of state-of-the-art methods.
The remainder of this paper is organized as follows. The related work is reviewed in Section 2. The problem description and motivation are presented in Section 3. The MFGAD framework and the detailed steps are outlined in Section 4. The performance of the proposed framework is evaluated via experiments in Section 5. Section 6 concludes this work and discusses future work.

Related Works
Nowadays, anomaly detection methods are mainly based on deep learning [14]. Although several anomaly detection methods are designed for log data [15], network traffic data [16,17], or video data [18,19], they cannot be applied to MTS data because of the different data structures. MTSAD is usually classified into three tasks, as shown in Fig. 2: TSAD, indicator anomaly detection (IAD), and FGAD. Indicator anomaly detection identifies abnormal indicators but does not point out the exact timestamp at which the indicators are abnormal; the abnormal indicator is located only within the full time range or a duration. In the following, this paper reviews the work related to the three tasks.

[3] use LSTM to achieve high prediction performance and provide a nonparametric, dynamic, and unsupervised anomaly threshold approach to detect anomalies.
2) Generation-based methods: Generative models are widely applied to TSAD for reconstructing the time series. LSTM-based Variational AutoEncoder (LSTM-VAE) [4] projects multimodal observation and temporal dependencies into a latent space and reconstructs the expected distribution. Deep Autoencoding Gaussian Mixture Model (DAGMM) [5] trains a deep autoencoder and a Gaussian mixture model simultaneously to produce a low-dimensional representation and reconstruction error. OmniAnomaly [6] exploits a stochastic recurrent neural network to capture robust representations of normal patterns and reconstruct the observations. Multivariate Anomaly Detection with Generative Adversarial Networks (MAD-GAN) [7] exploits LSTM as the base model in the generative adversarial network framework to capture the temporal correlation and detects anomalies using discrimination and reconstruction. Unsupervised Anomaly Detection (USAD) [10] uses an encoder-decoder framework within adversarial training to facilitate fast and energy-efficient training. Adversarial Autoencoder Anomaly Detection Interpretation (DAEMON) [20] exploits two discriminators to adversarially train an autoencoder that learns the normal patterns of the multivariate time series. InterFusion [21] uses a hierarchical Variational AutoEncoder (VAE) to model the inter-metric and temporal dependence, then exploits a Markov chain Monte Carlo-based method to obtain reasonable embeddings and reconstructions of abnormal parts. Static and Dynamic Factorized VAE (SDFVAE) [22] exploits BiLSTM and recurrent VAE to explicitly decompose the latent variables into dynamic and static parts to learn the representation of time series.
3) Graph-based methods: Graph attention networks are applied to model the correlations between indicators and the temporal dependencies for predicting future behavior. Multivariate Time series Anomaly Detection using Temporal pattern and Feature pattern (MTAD-TF) [8] exploits multiscale convolution and graph attention networks to capture temporal patterns. Multivariate Timeseries Anomaly Detection via Graph Attention Network (MTAD-GAT) [9] models the correlations between different univariate time series and the temporal dependencies of each time series via GAT. Graph Deviation Network (GDN) [11] learns the dependence relationships between time series and predicts future behavior by GAT. The prediction error is used to detect deviations. Graph learning with Transformer for Anomaly detection (GTA) [23] combines temporal convolutional networks and graph convolutional networks to extract temporal and spatial features and then exploits a Transformer to predict the following value and detect anomalies. Graph Relational Learning Network (GReLeN) [24] employs graph relationship learning to capture the dependencies between sensors and graph neural networks to reconstruct values for anomaly detection. However, the above methods only identify the timestamps at which the system has failed or been attacked and cannot solve the FGAD problem directly.

Indicator Anomaly Detection of MTS
MSCRED [13] addresses the anomaly detection and diagnosis problems simultaneously. This approach can detect an anomaly, identify the root cause, and interpret anomaly severity by using an attention-based convolutional LSTM network as an encoder and decoder for reconstruction. He et al. [25] identify the abnormal indicator streams from among all irregular streams. However, these methods identify abnormal indicators without the exact timestamp.

Fine-Grained Anomaly Detection of MTS
Xie et al. [26,27] apply matrix decomposition and tensor decomposition to detect anomalies in network data. The data is decomposed into a low-rank tensor and a sparse tensor, the latter of which can be treated as the anomaly. Xie et al. [28] use graphs to improve accuracy, while Xie et al. [29] employ sliding window reuse to speed up online anomaly detection. Garg et al. [12] conduct an evaluation of anomaly detection and diagnosis in MTS. Anomaly diagnosis is performed by ranking the indicator-wise anomaly scores generated by TSAD and returning the top-ranked indicator.
However, the extension of existing works [12] achieves low performance when applied to the FGAD task. This phenomenon is verified experimentally in Section 3.2. Although many works addressing multivariate time series anomaly detection have been proposed, no method has been designed specifically for fine-grained anomaly detection. Thus, fine-grained anomaly detection on multivariate time series remains an open problem.

Problem Description and Motivation
The definition of multivariate time series fine-grained anomaly detection and the motivation are presented in this section.

Multivariate Time Series Fine-grained Anomaly Detection Problem Definition
A multivariate time series with $K$ indicators and $N$ timestamps can be denoted by $X = [x^1, x^2, \ldots, x^K]^T \in \mathbb{R}^{K \times N}$. The $i$-th indicator can be represented by $x^i = [x^i_1, x^i_2, \ldots, x^i_N]$, and the observation of all indicators at the $t$-th timestamp by $x_t = [x^1_t, x^2_t, \ldots, x^K_t]^T$. The goal of timestamp anomaly detection is to identify whether the observation $x_t$ at timestamp $t$ is anomalous. All existing related works [3][4][5][6][7][8][9][10][11] solve this problem. In contrast, this paper focuses on multivariate time series fine-grained anomaly detection, which determines whether or not the $i$-th indicator at the $t$-th timestamp, $x^i_t$, is abnormal. A label $y^i_t = 1$ means that $x^i_t$ is abnormal.

Motivation
For the TSAD task, most existing methods train a model to predict or reconstruct normal data, so that the model outputs substantial errors when encountering abnormal data. These methods achieve high performance on the TSAD task. Taking GDN [11] as an example, the basic process is as follows. GDN predicts all indicator values at each timestamp through graph neural networks, and the error between the predicted value and the observed value of each indicator is normalized into an indicator-wise anomaly score. All indicator-wise anomaly scores are then transformed and aggregated into a single anomaly score per timestamp. When this single anomaly score is higher than a given threshold, the timestamp is abnormal, as shown in Fig. 3.

Figure 3: ATop and PTop extension methods
The indicator-wise anomaly scores before aggregation can be exploited to detect fine-grained anomalies by ranking them and returning the top-ranked indicators [12]. As shown in Fig. 3, the specific extension methods are as follows:
• The first method, referred to as ATop, ranks all indicator-wise anomaly scores on all timestamps and directly takes the top M indicators on their timestamps as anomalies.
• The second method, referred to as PTop, ranks all indicator-wise anomaly scores on the abnormal timestamps identified by GDN and takes the top M indicators on the abnormal timestamps as anomalies.
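The two extension rules can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the `scores[i][t]` layout, and the toy values are assumptions made here, not part of the evaluated implementations.

```python
def top_m_points(scores, m, allowed_timestamps=None):
    """Return the m (indicator, timestamp) pairs with the highest scores.

    scores[i][t] is the indicator-wise anomaly score of indicator i at
    timestamp t (hypothetical layout). If allowed_timestamps is given, only
    those timestamps are ranked (PTop); otherwise all are ranked (ATop).
    """
    candidates = [
        (scores[i][t], i, t)
        for i in range(len(scores))
        for t in range(len(scores[i]))
        if allowed_timestamps is None or t in allowed_timestamps
    ]
    candidates.sort(key=lambda c: c[0], reverse=True)
    return {(i, t) for _, i, t in candidates[:m]}


def atop(scores, m):
    # ATop: rank every (indicator, timestamp) point.
    return top_m_points(scores, m)


def ptop(scores, m, abnormal_timestamps):
    # PTop: rank only points on timestamps flagged by the TSAD method.
    return top_m_points(scores, m, set(abnormal_timestamps))


scores = [
    [0.1, 0.9, 0.2, 0.1],  # indicator 0
    [0.2, 0.3, 0.8, 0.1],  # indicator 1
    [0.1, 0.2, 0.7, 0.6],  # indicator 2
]
atop_result = atop(scores, m=2)
ptop_result = ptop(scores, m=2, abnormal_timestamps=[2, 3])
```

In this toy case ATop still selects a point at timestamp 1, whereas PTop is restricted to the timestamps already flagged as abnormal.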
This research extends five TSAD baseline methods with ATop and PTop to verify the performance of TSAD methods in the FGAD task. The baseline methods and experimental parameters are described in Section 5. The results are shown in Table 1. The F1 score of all methods in the TSAD task is over 76%. However, when applied to the FGAD task, the performance of all methods with both kinds of extension drops sharply. Taking GDN as an example, there are two reasons for this phenomenon. First, GDN abstracts the dynamic relationship between indicators into a static graph structure. Therefore, the feature relationship of all indicators extracted by GDN does not change over time. Second, the loss function is based on the mean squared error between the predicted output and the observed data, so the features of all indicators are fused at each timestamp. GDN over-integrates the features of the indicators, meaning that it cannot effectively distinguish between them. Because of the static feature relationship and the lack of distinct indicator features, the extended model cannot effectively detect fine-grained anomalies. Therefore, it is necessary to design a dedicated scheme for fine-grained anomaly detection.

Our Proposed Framework
An overview of the proposed framework and the details of each part are presented in this section.

Overview
This paper proposes an MFGAD framework to solve the FGAD problem. It contains a basic TSAD sub-model that learns temporal features and detects abnormal timestamps. In addition, it introduces an IAD sub-model that learns more spatial features to identify the abnormal indicators within a duration; this sub-model captures the dynamic relationship between indicators and compensates for the excessive mixing of the indicator features in the TSAD sub-model. MFGAD therefore constructs two sub-models instead of a single model. It first independently identifies the abnormal timestamp and the abnormal indicator and then combines the two kinds of results to diagnose the anomaly. MFGAD comprises four parts: data preprocessing, TSAD sub-model, IAD sub-model, and anomaly diagnosis (see the illustration in Fig. 4). This paper designs an algorithm that uses a GAT-based prediction model and an A-ConvLSTM-based reconstruction model to implement the TSAD and IAD sub-models, respectively. In the following, this paper uses the exact names of the algorithms rather than the names of the sub-models.

Data preprocessing:
This component processes the original data to provide a convenient input form for each of the sub-models. For the prediction model, the multivariate time series $X \in \mathbb{R}^{K \times N}$ is divided via a sliding window of size $w_p$ to generate a feature tensor $\mathcal{X} \in \mathbb{R}^{H \times K \times w_p}$ consisting of multiple feature matrices, and an adjacency matrix $A \in \mathbb{R}^{K \times K}$ is built; together they define a graph structure and form the input of the prediction model. For the reconstruction model, the multivariate time series $X$ is divided by a fixed window into multivariate sub-sequences. A product matrix $A_h \in \mathbb{R}^{K \times K}$ is obtained via the inner product between indicators in a multivariate sub-sequence and is then used as the input of the reconstruction model. The product matrix stores the similarity relationship between indicators.

GAT-based prediction model:
To detect abnormal timestamps, this component predicts the value at each timestamp. The feature matrix $X_t \in \mathbb{R}^{K \times w_p}$ and the adjacency matrix $A$ are fed into the GAT model to predict the value $\hat{x}_t \in \mathbb{R}^{K \times 1}$ at each timestamp. The aggregated error between the ground truth and the predicted value is taken as an anomaly score for identifying abnormal timestamps. The notations in this paper are shown in Table 2. In the following sub-sections, each part is described in more detail.

Data Preprocessing
This paper aims to construct a graph structure representing the relationship between the indicators. The graph contains K nodes, each of which represents an indicator and has its own features. The edges represent the relationship between indicators. The data for the prediction and reconstruction models are processed independently.

Preprocessing for Prediction Model
The multivariate time series is divided into a set of multivariate sub-sequences by a sliding window with window size $w_p$ and step size 1. The $t$-th multivariate sub-sequence, denoted as $X_t = [x_{t-w_p}, x_{t-w_p+1}, \ldots, x_{t-1}] \in \mathbb{R}^{K \times w_p}$, is used to predict the value at timestamp $t$, where $t = w_p + 1, w_p + 2, \ldots, N$. There are a total of $H$ multivariate sub-sequences, where $H = N - w_p$ (one for each predicted timestamp). All multivariate sub-sequences constitute a feature tensor $\mathcal{X} \in \mathbb{R}^{H \times K \times w_p}$.
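The windowing step can be illustrated with a short Python sketch. The function name, the 0-based indexing, and the `series[i][t]` layout are assumptions made for illustration; each sample pairs a window of $w_p$ past observations with the value to be predicted.

```python
def sliding_windows(series, w_p):
    """Split a K x N series into (window, target) samples with step size 1.

    series[i][t] holds indicator i at timestamp t (0-based, hypothetical
    layout). Each window covers timestamps t - w_p .. t - 1 and is paired
    with the K observed values at timestamp t.
    """
    n = len(series[0])
    samples = []
    for t in range(w_p, n):
        window = [row[t - w_p:t] for row in series]  # K x w_p feature matrix
        target = [row[t] for row in series]          # K values at timestamp t
        samples.append((window, target))
    return samples


series = [
    [1, 2, 3, 4, 5],       # indicator 0
    [10, 20, 30, 40, 50],  # indicator 1
]
samples = sliding_windows(series, w_p=3)
```

With five timestamps and a window of three, only two timestamps have a full history, so two samples are produced.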
Based on the multivariate time series $X$, the cosine similarity is used to measure the similarity between indicators. An adjacency matrix $A \in \mathbb{R}^{K \times K}$ is constructed from the top $S$ similarities: the entries corresponding to the top $S$ similarities are set to 1 to indicate that the two nodes are adjacent, while the remaining entries are set to 0. Here, $A_{ij}$ represents the adjacency relationship between the $i$-th and $j$-th indicators.
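A minimal sketch of the top-$S$ construction follows. It assumes that the top $S$ neighbors are selected separately for each indicator and that self-similarity is excluded; the text does not state either detail, so both are assumptions of this example, as are the function names.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def top_s_adjacency(series, s):
    """Build a K x K 0/1 adjacency matrix from cosine similarity.

    Assumption: for each indicator i, the s most similar other indicators
    are marked adjacent (row-wise top-S, no self-loops), so the matrix is
    not necessarily symmetric.
    """
    k = len(series)
    adjacency = [[0] * k for _ in range(k)]
    for i in range(k):
        sims = [(cosine_similarity(series[i], series[j]), j)
                for j in range(k) if j != i]
        sims.sort(key=lambda pair: pair[0], reverse=True)
        for _, j in sims[:s]:
            adjacency[i][j] = 1
    return adjacency


series = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.1],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, -1.0, 1.0, -1.0],
]
adjacency = top_s_adjacency(series, s=1)
```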

Preprocessing for Reconstruction Model
The multivariate time series is divided into a set of multivariate sub-sequences by a fixed window with window size $w_r$. The $h$-th multivariate sub-sequence is denoted by $X_h = [x_{h \times w_r}, \ldots, x_{(h+1) \times w_r - 1}] \in \mathbb{R}^{K \times w_r}$, where $h = 0, 1, \ldots, H - 1$ and $H = N / w_r$ is the total number of multivariate sub-sequences. The related $H$ product matrices form the product tensor $\mathcal{A} \in \mathbb{R}^{H \times K \times K}$.
For each multivariate sub-sequence, this paper uses the inner product between indicator pairs to represent the adjacency relationship between the indicator pairs in the given window, which can be represented by a product matrix $A_h \in \mathbb{R}^{K \times K}$. The element in the $i$-th row and $j$-th column of the $h$-th product matrix is formulated as follows:

$$A_h^{ij} = \sum_{\tau = h \times w_r}^{(h+1) \times w_r - 1} x^i_\tau \, x^j_\tau$$

that is, the inner product of $[x^i_{h \times w_r}, \ldots, x^i_{(h+1) \times w_r - 1}]^T$ and $[x^j_{h \times w_r}, \ldots, x^j_{(h+1) \times w_r - 1}]^T$, which represent the sub-sequences of the $i$-th and $j$-th indicators in the $h$-th multivariate sub-sequence.
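The following sketch computes the product matrices for non-overlapping windows. It assumes a plain, unnormalized inner product and drops any trailing timestamps that do not fill a complete window; both choices, and the function name, are assumptions of this example.

```python
def product_matrices(series, w_r):
    """Return one K x K inner-product matrix per fixed window of size w_r.

    series[i][t] holds indicator i at timestamp t (hypothetical layout).
    Assumption: no normalization is applied to the inner product.
    """
    k, n = len(series), len(series[0])
    matrices = []
    for h in range(n // w_r):
        start, stop = h * w_r, (h + 1) * w_r
        segment = [row[start:stop] for row in series]
        matrix = [
            [sum(a * b for a, b in zip(segment[i], segment[j]))
             for j in range(k)]
            for i in range(k)
        ]
        matrices.append(matrix)
    return matrices


series = [
    [1, 2, 3, 4],  # indicator 0
    [0, 1, 0, 1],  # indicator 1
]
matrices = product_matrices(series, w_r=2)
```

Each matrix is symmetric, and its diagonal holds the squared norm of each indicator's sub-sequence in that window.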

GAT-based Prediction Model
To predict the following values and detect abnormal timestamps, this paper adopts a feature extractor based on GAT. GAT fuses the features of an individual node with those of its neighbors according to the graph structure obtained from data preprocessing. In more detail, the aggregated representation $z^i_t$ of node $i$ at timestamp $t$ is obtained as an attention-weighted combination of the transformed features of node $i$ and of its neighbors. Here, $x^i_{t-w_p}$ is the value of node $i$ at timestamp $t - w_p$, $N(i) = \{j, A_{ij} > 0\}$ is the neighbor node set of node $i$, $W \in \mathbb{R}^{d \times w_p}$ is the weight matrix to be trained, $d$ is the dimension of the hidden features, and $\alpha_{ij}$ is the attention coefficient. In the computation of $\alpha_{ij}$, $\oplus$ represents concatenation, while $a$ is the learnable coefficient vector of the attention mechanism. This paper uses LeakyReLU as the non-linear activation to calculate the attention coefficient and the softmax function to normalize the attention coefficient in Eq. (7).
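For readers unfamiliar with graph attention, the aggregation can be sketched as below. This follows the standard single-head GAT formulation and is not necessarily the exact form used in this paper: including node $i$ itself in the softmax, the LeakyReLU slope of 0.2, and the absence of an output activation are all assumptions, and every function name is hypothetical.

```python
import math


def matvec(w, x):
    return [sum(w_q * x_q for w_q, x_q in zip(row, x)) for row in w]


def leaky_relu(value, slope=0.2):
    return value if value > 0 else slope * value


def gat_aggregate(features, adjacency, w, a):
    """Single-head graph attention aggregation (standard GAT form).

    features[i]: length-w_p window of node i; w: d x w_p weight matrix;
    a: length-2d attention vector. Returns one length-d vector per node.
    """
    hidden = [matvec(w, x) for x in features]  # W x^i for every node
    outputs = []
    for i in range(len(features)):
        # Assumption: node i attends to itself and to its neighbors N(i).
        nodes = [i] + [j for j, flag in enumerate(adjacency[i])
                       if flag > 0 and j != i]
        # List concatenation hidden[i] + hidden[j] plays the role of ⊕.
        logits = [
            leaky_relu(sum(c * v for c, v in zip(a, hidden[i] + hidden[j])))
            for j in nodes
        ]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(logit - peak) for logit in logits]
        total = sum(weights)
        alphas = [weight / total for weight in weights]
        z = [sum(alpha * hidden[j][q] for alpha, j in zip(alphas, nodes))
             for q in range(len(w))]
        outputs.append(z)
    return outputs


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
z = gat_aggregate(features, adjacency, identity, a=[0.0, 0.0, 0.0, 0.0])
```

With a zero attention vector all attention coefficients are equal, so each node's representation reduces to the mean over itself and its neighbors, which makes the toy result easy to check by hand.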
The aggregated representations $z^i_t$ of all nodes at timestamp $t$ are fed into stacked fully connected layers with output dimension $K$ to predict the value at timestamp $t$, denoted by $\hat{x}_t$. The mean squared error between the predicted output $\hat{x}_t$ and the ground truth $x_t$ is taken as the loss function. Based on the learned relationships, this paper can detect and explain anomalies that deviate from them. The predicted value and the ground truth value are compared to obtain an error value $Err^i_t$ at node $i$ and timestamp $t$:

$$Err^i_t = \left| x^i_t - \hat{x}^i_t \right|$$

where $x^i_t$ and $\hat{x}^i_t$ are the ground truth and the prediction of node $i$ at timestamp $t$, and $|\cdot|$ is the absolute value function. Subsequently, a max function is used to obtain an overall time anomaly score (TAS) at timestamp $t$ by aggregating the errors of all nodes:

$$TAS_t = \max_{i} Err^i_t$$

Finally, when $TAS_t$ exceeds a fixed threshold $\theta$, timestamp $t$ is identified as anomalous. Different methods can be used to set the threshold, such as extreme value theory [30]. This paper uses the maximum anomaly score on the validation dataset as the threshold.
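The scoring and thresholding steps are simple enough to show directly. In this sketch the data layout `truth[t][i]`, the function name, and the toy values are assumptions; the threshold is taken as the maximum score on a validation split, as described above.

```python
def timestamp_anomaly_scores(truth, predicted):
    """TAS per timestamp: the maximum absolute prediction error over nodes.

    truth[t][i] and predicted[t][i] hold the observed and predicted values
    of indicator i at timestamp t (hypothetical layout).
    """
    return [
        max(abs(x - x_hat) for x, x_hat in zip(row, row_hat))
        for row, row_hat in zip(truth, predicted)
    ]


# Threshold: maximum anomaly score observed on the validation data.
validation_tas = timestamp_anomaly_scores(
    [[1.0, 2.0], [1.5, 2.5]],
    [[1.1, 2.0], [1.5, 2.2]],
)
theta = max(validation_tas)

test_tas = timestamp_anomaly_scores(
    [[1.0, 2.0], [5.0, 2.0]],
    [[1.0, 2.1], [1.2, 2.0]],
)
abnormal_timestamps = [t for t, score in enumerate(test_tas) if score > theta]
```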

A-ConvLSTM-based Reconstruction Model
This component extracts the relationship among indicators via convolutional neural networks and the temporal features of that relationship via attention-based convolutional LSTM networks (A-ConvLSTM) to capture the dynamic relationship among indicators. This model uses a convolutional encoder to encode the inter-correlation between indicators and an A-ConvLSTM to capture the temporal patterns of the inter-correlations, as shown in Fig. 5. Subsequently, a convolutional decoder is used to reconstruct the input based on the feature maps of the inter-correlation and temporal patterns. After the decoder, the reconstruction error is used to detect and diagnose abnormal indicators.
This model aggregates the information in the m input product matrices and reconstructs the product matrices themselves.

Convolutional Encoder
To capture the inter-correlations among indicators, a four-layer convolutional neural network is applied to the input product tensor $G_h$. The convolution operation is expressed as follows:

$$G^l_h = \sigma\left(W^l * G^{l-1}_h + b^l\right), \quad G^0_h = G_h$$

where $*$ is the convolution operation, $\sigma$ is the activation function, $G^l_h$ is the spatial feature tensor of the $h$-th product tensor after the $l$-th convolution, and $W^l$, $b^l$ denote the convolutional kernel and bias in the $l$-th layer.

A-ConvLSTM
The spatial feature tensors in the convolutional encoder are temporally dependent on previous time steps. Inspired by ConvLSTM, A-ConvLSTM is used to capture the temporal information in the sequence of spatial feature tensors. Reference [31] gives further details of ConvLSTM.
Given the spatial feature tensor $G^l_h$ from the $l$-th convolutional layer and the previous hidden state, ConvLSTM computes the current hidden state $H^l_h$. Not all previous steps are equally correlated to the current state. This paper therefore combines $H^l_h$ with the previous hidden states through the attention mechanism to form a refined hidden representation.

CMC, 2023, vol.75, no.3

Because of four convolutional layers, it generates four refined hidden representations
where α is the attention coefficient, σ is the activation function, and a is the learnable coefficient vector of the attention mechanism.

Convolutional Decoder
To decode the feature tensors and reconstruct the product matrices, the convolutional decoder works in reverse order. This paper first deconvolves the refined hidden representation $H^4_h$ of the fourth layer to obtain $\hat{H}^3_h$. Then, in the third, second, and first layers, $\hat{H}^l_h$ is concatenated with the refined hidden representation $H^l_h$ of the previous layer to generate the matrix representation $\hat{H}^{l-1}_h$. $\hat{H}^0_h$ is the reconstructed product matrix. Here, $\otimes$ is the deconvolution operation, $\oplus$ is the concatenation operation, $\hat{W}^l$, $\hat{b}^l$ are the deconvolutional kernel and bias in the $l$-th layer, $\sigma$ is the activation function (the same as in the encoder), $\mathrm{Last}(\cdot)$ refers to taking the last matrix of the tensor, and $\hat{A}_h \in \mathbb{R}^{K \times K}$ is the reconstruction of the $h$-th product matrix $A_h$. The loss function is the root mean square error between the reconstructed product matrix and the raw product matrix.

Anomaly Score
The error matrix $E_h \in \mathbb{R}^{K \times K}$ consists of the element-wise differences between the raw product matrix $A_h$ and the reconstructed product matrix $\hat{A}_h$. Two thresholds are used to identify abnormal indicators: an anomaly threshold $\lambda$ and an anomaly probability threshold $\gamma$. If an element in the error matrix is greater than the anomaly threshold $\lambda$, it is deemed abnormal. The indicator anomaly score (IAS) is defined as the ratio of abnormal elements in a row of the error matrix for a given window.
Here, $IAS^i_h$ is the indicator anomaly score of the $i$-th indicator in the $h$-th product matrix, $\mathrm{Count}(\cdot)$ counts the elements that meet the condition, and $E^i_h$ is the $i$-th row of the $h$-th error matrix. When $IAS^i_h$ exceeds the anomaly probability threshold $\gamma$, the $i$-th indicator in the $h$-th time window is deemed abnormal.
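As an illustration, the score can be computed as below. The sketch assumes that the error matrix stores non-negative error magnitudes and that the ratio is taken over the $K$ entries of a row; the function name and the threshold values are likewise assumptions chosen for the example.

```python
def indicator_anomaly_scores(error_matrix, lam):
    """IAS per indicator: fraction of a row's errors that exceed lam.

    Assumption: error_matrix[i][j] is a non-negative error magnitude and the
    denominator is the row length K.
    """
    return [
        sum(1 for error in row if error > lam) / len(row)
        for row in error_matrix
    ]


error_matrix = [
    [0.9, 0.8, 0.1],  # indicator 0: two of three entries are large
    [0.8, 0.0, 0.1],  # indicator 1: only its pairing with indicator 0 is large
    [0.1, 0.1, 0.0],  # indicator 2: no large entries
]
ias = indicator_anomaly_scores(error_matrix, lam=0.5)
gamma = 0.5  # hypothetical anomaly probability threshold
abnormal_indicators = [i for i, score in enumerate(ias) if score > gamma]
```

Indicator 1 has one large entry only because of its pairing with indicator 0, and the ratio keeps it below the threshold.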

Anomaly Diagnosis
The GAT-based prediction model detects the abnormal timestamp $x_t$, and the A-ConvLSTM-based model detects the abnormal sub-indicator. Combining the abnormal timestamp and the abnormal sub-indicator yields the fine-grained anomaly. To this end, it must be determined whether the abnormal timestamp lies within the time window located by the abnormal sub-indicator. If the timestamp $t$ belongs to window $h$ ($t \in \{h \times w_r, \ldots, (h + 1) \times w_r - 1\}$), there is an intersection between the timestamp and the sub-indicator. This intersection is a fine-grained anomaly, which means that indicator $i$ is abnormal at timestamp $t$ ($\hat{y}^i_t = 1$). The anomaly diagnosis algorithm is presented in Algorithm 1.
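The intersection rule can be sketched in Python as follows. The window of a timestamp is found directly as `t // w_r` instead of scanning all windows, which is an implementation choice of this sketch; the function name, argument layout, and toy values are assumptions.

```python
def diagnose(tas, ias, theta, gamma, w_r, num_indicators):
    """Combine abnormal timestamps and abnormal windowed indicators.

    tas[t] is the time anomaly score at timestamp t, and ias[h][i] is the
    indicator anomaly score of indicator i in window h (hypothetical layout).
    Returns labels[i][t] = 1 where indicator i is abnormal at timestamp t.
    """
    n = len(tas)
    labels = [[0] * n for _ in range(num_indicators)]
    for t in range(n):
        if tas[t] <= theta:
            continue          # timestamp t is not abnormal
        h = t // w_r          # window h covers h*w_r .. (h+1)*w_r - 1
        if h >= len(ias):
            continue          # no complete window covers timestamp t
        for i in range(num_indicators):
            if ias[h][i] > gamma:
                labels[i][t] = 1
    return labels


tas = [0.1, 0.9, 0.1, 0.8, 0.1, 0.1]
ias = [[0.7, 0.1], [0.2, 0.3], [0.9, 0.9]]
labels = diagnose(tas, ias, theta=0.5, gamma=0.5, w_r=2, num_indicators=2)
```

In the toy data, window 2 contains abnormal indicators but no abnormal timestamp, and timestamp 3 is abnormal but its window has no abnormal indicator, so neither produces a fine-grained anomaly.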

Algorithm 1 Anomaly Diagnosis
Input: IAS, TAS
Output: $\hat{y}$
1: The size of the fixed window is $w_r$.
2: for $t \in (0, N)$ do // t represents the timestamp id.
3:   if $TAS_t > \theta$ then
4:     for $h \in (0, N/w_r)$ do // h represents the window id.
5:       if $h \times w_r \le t \le (h + 1) \times w_r - 1$ then
6:         for $i \in (0, K)$ do // i represents the indicator id.
7:           if $IAS^i_h > \gamma$ then
8:             $\hat{y}^i_t = 1$

The Secure Water Treatment (SWaT) dataset is derived from the water treatment testbed coordinated with the Public Utilities Board of Singapore. It is a realistic Industrial Internet of Things system that requires protection from malicious attacks. SWaT contains examples of real-life attack scenarios. In total, 23 attacks are launched in SWaT. Table 3 presents the statistics of SWaT. Due to the huge amount of raw data, downsampling is performed by taking the median value of the raw data every 10 s. If an anomaly occurs at any point within the 10 seconds, the downsampled point is labeled as abnormal. There are 46 indicators in SWaT. The test dataset contains 44990 timestamps and 2069540 (= 46 × 44990) fine-grained points.
There are 4541 abnormal timestamps, and the anomaly rate is 10.1%. Moreover, the number of fine-grained anomalies is 9850, and the anomaly rate is only 0.476%.

• Temporal convolutional network AE (TCN AE): It exploits the TCN model [32] as the encoder and decoder of AE.
• OmniAnomaly [6]: It uses a Gated Recurrent Unit (GRU) to capture complex temporal correlations between multivariate time series.
• MSCRED [13]: It detects the anomaly, identifies the root cause, and interprets anomaly severity by an attention-based convolutional LSTM.
• GDN [11]: It uses an attention-based graph neural network to learn the dependence relationships between time series and predict future behavior.
Two extension methods are applied to all five baseline methods for the FGAD task.
• ATop: It directly takes the top M anomaly score of all indicators on all timestamps as anomalies.
• PTop: It takes the top M anomaly score of all indicators on the abnormal timestamps identified by the timestamp anomaly detection method as anomalies.

Parameters Settings and Evaluation Metrics
For the GAT-based prediction model, the Adam optimizer is used for training, the learning rate is $1 \times 10^{-3}$, the window size $w_p$ is 30, and the number of neighbors $S$ is set to 15. For the A-ConvLSTM-based reconstruction model, the Adam optimizer is used for training with a learning rate of $1 \times 10^{-4}$. The window size $w_r$ is 5. The number of input product matrices $m$ is set to 5. For the ATop and PTop extension methods, the top M is set to 9000 according to the percentage of anomalies in the FGAD task. The parameters of the convolutional layers in the encoder are as follows: 32 kernels of size 3 × 3 × 3, 64 kernels of size 3 × 3 × 32, 128 kernels of size 2 × 2 × 64, and 256 kernels of size 2 × 2 × 128, with strides of 1 × 1, 2 × 2, 2 × 2, and 2 × 2. The parameters of the deconvolution layers in the decoder are as follows: 128 kernels of size 2 × 2 × 256, 64 kernels of size 2 × 2 × 128, 32 kernels of size 3 × 3 × 64, and three kernels of size 3 × 3 × 64, with strides of 2 × 2, 2 × 2, 2 × 2, and 1 × 1. The parameter settings are listed in Table 4. All experiments are run on a server with an Intel i7-9700 CPU, an RTX 2080 SUPER GPU, and 32 GB RAM, using Python 3.7 and PyTorch 1.1.8. This paper uses precision (Prec), recall (Rec), F1 score (F1), Receiver Operating Characteristic (ROC), Area under Curve (AUC), HitRate@100 (HR@100), and HitRate@150 (HR@150) to evaluate the performance of the proposed method. The HitRate@100 and HitRate@150 metrics give the average fraction of overlap between the true anomalous indicators and the top 1.0x and 1.5x indicators on the anomalous timestamps. Precision, recall, and F1 score based on ATop or PTop reward identifying the anomalous indicators across multiple timestamps, while the hit rate metrics reward identifying the anomalous indicators at each timestamp. For the hit rate metrics, this paper considers two cases according to the availability of the ground truth of anomalous timestamps:
• Tp-det: The ground truth of anomalous timestamps is unknown.Only the detected anomalous timestamps are considered.Different methods detect different anomalous timestamps.
• Tp-all: The ground truth of anomalous timestamps is known, and all abnormal timestamps are considered.All detection methods have the same abnormal timestamps.
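The hit rate computation can be sketched as follows. The sketch assumes that, for a timestamp with n true anomalous indicators, the top ⌊factor × n⌋ scored indicators are compared with the ground truth (factor 1.0 for HR@100 and 1.5 for HR@150); the rounding rule, the function name, and the data layout are assumptions. Passing the detected timestamps gives the Tp-det case, and passing all truly anomalous timestamps gives the Tp-all case.

```python
import math


def hit_rate(scores, truth, timestamps, factor):
    """Average overlap between true and top-ranked indicators per timestamp.

    scores[t][i] is the anomaly score of indicator i at timestamp t and
    truth[t] is the set of truly anomalous indicators at t (hypothetical
    layout). Timestamps without true anomalies are skipped.
    """
    rates = []
    for t in timestamps:
        true_set = truth[t]
        if not true_set:
            continue
        top_n = math.floor(factor * len(true_set))
        ranked = sorted(range(len(scores[t])),
                        key=lambda i: scores[t][i], reverse=True)
        hits = len(true_set & set(ranked[:top_n]))
        rates.append(hits / len(true_set))
    return sum(rates) / len(rates) if rates else 0.0


scores = [
    [0.9, 0.1, 0.5, 0.4],  # timestamp 0
    [0.2, 0.8, 0.7, 0.1],  # timestamp 1
]
truth = [{0, 3}, {2}]
hr_100 = hit_rate(scores, truth, timestamps=[0, 1], factor=1.0)
hr_150 = hit_rate(scores, truth, timestamps=[0, 1], factor=1.5)
```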

Experimental Results
The anomaly detection results in terms of precision, recall, and F1 score on SWaT are shown in Table 5. All five methods perform better with PTop than with ATop. The reason is that PTop starts from the abnormal timestamps and then looks for the fine-grained anomalies of the abnormal indicators only within them. In this way, the scope of anomaly identification is reduced. In the following experiments, only the PTop extension is considered for comparison with MFGAD. The proposed algorithm MFGAD significantly outperforms all five methods because it considers the dynamic relationship between indicators. The ROC and AUC of the six methods are shown in Fig. 6. MFGAD achieves the highest AUC (0.92), benefiting from its two independent sub-models for timestamp and indicator detection. Table 6 shows the hit rate results for the six methods. Four methods, all except OmniAnomaly and MFGAD, have lower hit rate metrics at the detected anomalous timestamps (Tp-det) than at all anomalous timestamps (Tp-all). This means that an algorithm that is good at ranking the indicator-wise scores correctly at all anomalous timestamps may not be good at ranking them correctly at its detected timestamps. However, the detected anomalous timestamps also depend on aggregating the indicator-wise scores. A good indicator-wise score ranking may not lead to a good aggregated score ranking because of the different aims of TSAD and FGAD. MFGAD can detect more anomalous indicators at both the detected and all anomalous timestamps. The anomaly detection results are shown in Fig. 8. There is no reconstruction model in PTop-GDN, and its performance remains stable. The window size of the reconstruction model is found to have a significant impact on MFGAD. Specifically, the larger the window size, the more difficult it is for the reconstruction model to detect abnormal indicators with a short duration, which reduces the performance of MFGAD. When the window size is 5, the F1 score of MFGAD is optimal. The proposed MFGAD is thus significantly better than the latest extended methods.

dimension, then combines them to identify fine-grained anomalies. Simulation experiments on a real-world dataset show that the proposed MFGAD achieves a better F1 score and hit rate than the latest extended methods. However, the two sub-models of MFGAD lead to a high computational cost. In future work, the computational cost of the prediction model and the reconstruction model in the framework can be further reduced.

Figure 1: Anomalies in multivariate time series data. The image depicts a five-indicator multivariate time series with two abnormal timestamps highlighted in red. In each abnormal timestamp, the ORP meter is the abnormal indicator

Figure 2: The types of anomaly detection tasks on multivariate time series data

A-ConvLSTM-based reconstruction model: To capture the dynamic relationship between indicators, this component extracts the relationship between indicators (via CNN) and the temporal feature of the relationship (via LSTM), which is referred to as the A-ConvLSTM model. The $m$ product matrices from $A_{h-m+1}$ to $A_h$, forming a tensor $G_h \in \mathbb{R}^{m \times K \times K}$, are processed by the CNN and sent to the A-ConvLSTM model for reconstructing the product matrix $\hat{A}_h$. The reconstruction error matrix is the difference between the product matrix $A_h$ and the reconstructed product matrix $\hat{A}_h$. The abnormal indicator can be identified according to this difference.

Figure 4: The framework of MFGAD. MFGAD comprises four parts: data preprocessing, TSAD sub-model, IAD sub-model, and anomaly diagnosis. The TSAD sub-model is implemented by the GAT-based prediction model. The IAD sub-model is carried out by the A-ConvLSTM-based reconstruction model

Anomaly diagnosis: This component identifies the fine-grained anomaly by combining the anomaly timestamp and indicator.

Figure 5: The framework of the reconstruction model. It consists of three parts: convolutional encoder, A-ConvLSTM network, and convolutional decoder

Figure 6: The ROC and AUC of different methods

Figure 8: The experimental results with different window sizes for the A-ConvLSTM-based reconstruction model

Table 1: The performance of the TSAD task and the FGAD task

Table 2: Notations
$\hat{y}^i_t$: The predicted label of the i-th indicator and t-th timestamp

Table 3: Description of the SWaT dataset

convolutional network AE (TCN AE):
It exploits the TCN model

Table 4: Parameters setting

Table 5: Performance of different methods on SWaT

Table 6: Hit rate on SWaT