An IoT Time Series Data Security Model for Adversarial Attack Based on Thermometer Encoding

School of Information Science and Technology, North China University of Technology, Beijing, China
Department of Computer Science, Faculty of Science and Arts at Belgarn, University of Bisha, Sabt Al-Alaya 61985, Saudi Arabia
College of Computing and Information Technology, Faculty of Computing and Information Technology, University of Bisha, Bisha 61922, Saudi Arabia
Department of Computer Science & Technology, China University of Petroleum, Beijing 102249, China
Beijing Key Laboratory of Petroleum Data Mining, China University of Petroleum-Beijing, Beijing 102249, China


Introduction
IoT amalgamates well-known products with state-of-the-art infrastructures, including distributed data storage, big data solutions, artificial intelligence (AI) utilities, and the cloud [1]. The Internet of Things (IoT) envisions connected, pervasive, and smart nodes that link independently while providing all kinds of services. IoT data are collected at scale to aid decision-making. An IoT consumer product is no longer just the product; it is the data, the product, the infrastructure, and the algorithms. These IoT products have switched from analog to connected technologies, thereby introducing novel risks for consumers regarding potential safety, privacy, and security issues for discriminatory data [2][3][4].
Moreover, Papernot et al. [5] have found that the adversarial samples are more transferable amongst various machine learning approaches, i.e., support vector machine, logistic regression, decision tree, and deep neural networks.
There are many application scenarios for IoT, as shown in Figure 1, including medical health, electricity, and intelligent devices. There are also areas that are very sensitive to attacks, such as industrial control decision support systems.
In other fields, such as State Grid and industrial control, the deep learning models built for them are prone to decision errors caused by data noise and by deliberate attacks that modify data. For example, smart grid time series data are analyzed for electricity fraud detection; in such use cases, perturbed data can help thieves avoid detection.
As illustrated in Figure 1, time-series data classification models are widely applied in some sensitive and crucial systems. Thus, security, and specifically the ability to detect compromised nodes and to collect and preserve evidence of malicious activities or attacks, emerges as a priority for the successful deployment of IoT networks. Among these potential risks, the security of AI algorithms has rarely attracted research interest in the IoT field, although it is a very hot topic in the domain of AI.
Modern approaches to time-series data classification are based on the deep learning paradigm [6] and are therefore exposed to adversarial examples, which can cause large recognition errors by adding a small perturbation to the original time series. The reason lies in the high-dimensional linear design of deep learning models. To better combat adversarial attacks, we applied an encode-decode model to reconstruct time series examples from the thermometer encoding of the original time series. Although the classification models are not trained on the reconstruction examples, their training and validation accuracy is the same as that of the model trained on the original examples. Moreover, we found that the new model is robust to the fast gradient sign method (FGSM) attack to some extent. To summarize, the contributions of this article are threefold:
(1) Summarize some potential risks in the IoT time series classification (TSC) model
(2) Analyze the class activation map and the attack area in time series
(3) Propose a robust model based on encode-decode and thermometer encoding
The remainder of the paper is structured as follows. In Section two, we review TSC works based on deep learning as well as attack and defense methods in the field of computer vision. Section three shows some potential risks in IoT TSC from different views. Section four presents a detailed description of our method and some basic theory. In the experiment section, we introduce the datasets, the classification model architecture, and the attack and defense results. Finally, we analyze the defense effectiveness and give our future research directions.

Related Works
TSC problems are encountered in numerous real-life data mining tasks, including power consumption monitoring [4], food safety [7], and health care [2,8,9].
Deep learning has solved problems such as pattern recognition in temporal and spatial data with an accuracy that was thought to be impossible a few years ago. Fortunately, TSC tasks can be efficiently framed as deep learning problems; therefore, many researchers have recently begun to adopt deep learning models for TSC tasks [6]. The classification of time series IoT data is a key problem in various application domains. Following the development of deep learning, investigators have started to study the vulnerability of deep neural networks to adversarial attacks [10]. In the field of image processing, an adversarial attack alters original images in such a way that the modifications are nearly imperceptible to a human. The altered image is termed an adversarial image; it confuses the neural network and is misclassified, while the original image is correctly classified. A well-known real-world attack modifies a traffic sign image so that it is misread by an autonomous vehicle [11]. Altering illegal content to make it undetectable by automatic moderation algorithms is another example. The most notable attack is the gradient-based attack, where the attacker alters the image in the direction of the gradient of the loss function with respect to the input image and therefore raises the rate of misclassification [12,13]. Deep learning models applied to IoT data in a real environment are fragile and vulnerable to adversarial attack, and this has become a common problem of deep learning in other areas as well. At the same time, there are many defense works for image processing, such as defensive distillation [14], data compression [15], depth compression network [16], data randomization [17], and gradient regularization [18]. There are hardly any comprehensive studies on defense against attacks on temporal data. Fawaz et al. [8] discussed some serious problems in the classification of time-series data using a deep learning model. Unlike images, IoT time-series data have their own special characteristics, such as dynamic change and different sampling scales. Based on the characteristics of IoT data, this paper uses a deep neural network based on an encode-decode model.
The encode-decode stage relies on a thermometer coding method. The reason for using thermometer coding is to bring a strong nonlinear transformation into the model. This is inspired by Goodfellow et al. [12], who attributed adversarial vulnerability to the linear behavior of well-structured deep learning models in high-dimensional spaces. The work of Buckman et al. [19] confirmed that input discretization can resist adversarial attacks. Inspired by these ideas, and different from the aforementioned works, we construct a single whole network. In this network, the original curve is converted to its thermometer encoding, the encode-decode model learns to recover the original curve from that encoding, and a ResNet predicts the class. We show the details of the proposed network and its effectiveness in the following parts.

Adversarial Attack in IoT Time Series Data
In this paper, we use the Coffee dataset [20] as typical time series data to illustrate the adversarial attack phenomena in IoT fields and ResNet [21] as the representative neural network architecture.

Fast Gradient Sign Method and TSC Adversarial Attacks.
Some adversarial examples and the definition of the TSC problem were introduced by Fawaz et al. [8]. According to them, a time series can be mathematically represented as

$$X = [x_1, x_2, \ldots, x_T],$$

where each $x_t$ is a real number and $T$ is the length of $X$. Further, there is a well-trained deep learning model

$$f : \mathbb{R}^T \rightarrow Y.$$

Here, $Y$ is the label space of the time series, and $\mathbb{R}$ is the real number space. An adversarial attack has to find another example $X'$, a perturbed copy of $X$, with the restriction that $\|X - X'\| \le \varepsilon$ and $f(X) \ne f(X')$. An illustration of these definitions is given in Figure 2.

The most classic adversarial method is the fast gradient sign method (FGSM). FGSM was first introduced by Goodfellow et al. [12] for generating adversarial images that trick the well-known GoogLeNet model. The attack performs a one-step gradient update in the direction of the gradient's sign at every single timestamp. The perturbation procedure shown in Figure 3 can be represented mathematically as follows:

$$\eta = \varepsilon \cdot \mathrm{sign}\left(\nabla_X J(X, y)\right),$$

where $\varepsilon$ symbolizes the magnitude of the perturbation, $J$ is the loss function of the model, and $y \in Y$ is the true label of $X$. The adversarial time series $X'$ can be computed using

$$X' = X + \eta.$$

The authors of the FGSM paper explained the underlying reason why FGSM succeeds against neural networks. Firstly, the influence of a disturbance snowballs through the network because of the linear design of the model. ReLU, a commonly used activation function at present, is piecewise linear, which makes the whole network tend to be linear. Furthermore, the larger the input dimension, the more vulnerable the model is to adversarial attack.
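The one-step FGSM update described above can be sketched in a few lines. This is only an illustration under stated assumptions: the classifier is a toy logistic model (not the ResNet used in this paper), its weights `w` and bias `b` are random placeholders, and the function names are hypothetical.

```python
import numpy as np

def loss_and_grad(x, y, w, b):
    """Binary cross-entropy of a toy logistic model and its gradient w.r.t. the input x.

    Assumption: this linear model stands in for the deep network f; it is not the paper's ResNet.
    """
    z = np.dot(w, x) + b
    loss = np.logaddexp(0.0, z) - y * z      # numerically stable form of the cross-entropy
    p = 1.0 / (1.0 + np.exp(-z))             # predicted probability of class 1
    grad_x = (p - y) * w                     # dJ/dx for the logistic model
    return loss, grad_x

def fgsm(x, y, w, b, eps):
    """One-step FGSM: X' = X + eps * sign(grad_X J(X, y))."""
    _, grad_x = loss_and_grad(x, y, w, b)
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
T = 50                                       # length of the toy time series
x = rng.normal(size=T)                       # original series X
w = 0.1 * rng.normal(size=T)                 # placeholder model weights
b, y = 0.0, 1.0                              # placeholder bias and true label
x_adv = fgsm(x, y, w, b, eps=0.1)            # adversarial series X'
```

Every timestamp moves by exactly ε in the direction that increases the loss, which is why the perturbation stays small per point while its effect accumulates over the whole input dimension.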

The Distribution of Data in Adversarial Attack.
Multidimensional scaling (MDS) [22] offers insight into the spatial distribution of the input time series. MDS projects the N-dimensional space into a two-dimensional space while preserving the relative distance between any two time series. Because the nearest neighbor classifier achieves low accuracy on the raw time series, Euclidean distance (ED) cannot be used directly on the raw data.
However, the high-level features learned by the network can be used as a good representation of the raw time series. Commonly, the fully connected layers among the last several layers of the neural network are used as the latent space, where the class-specific regions differ for different classes.
We apply this method on ResNet, which achieves the best accuracy on most of the TSC problems [6]. In the ResNet architecture, there is a global average pooling (GAP) layer preceding the classifier layer.
The output of the GAP layer is a good learned representation of the raw time series, and it is used to compute the ED. Once we have the distance for each pair of time series, metric MDS minimizes a cost function called stress, which can be obtained as follows:

$$\mathrm{stress}(X_1, \ldots, X_N) = \sqrt{\frac{\sum_{i,j} \left(d_{ij} - \|x_i - x_j\|\right)^2}{\sum_{i,j} d_{ij}^2}},$$

where $d_{ij}$ is the ED between the GAP vectors of time series $X_i$ and $X_j$. In this way, the original raw time series space is reduced to a two-dimensional space. Each time series $X_i$ is represented as a single data point $x_i$. The visualization of MDS shows the distribution of the data in the raw data space to some extent. Here, we use the same technique to show how the adversarial attack works from the data distribution angle. The Coffee dataset is used as an example, and ResNet is applied as the base neural network. Details are shown in Figure 4.
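To make the stress cost concrete, the sketch below evaluates it for a given two-dimensional embedding. It only computes the cost; the minimization performed by metric MDS is not shown. The array names `gap_vectors` and `points_2d` are illustrative assumptions.

```python
import numpy as np

def pairwise_dist(V):
    """Euclidean distance between every pair of rows of V; returns an (N, N) matrix."""
    diff = V[:, None, :] - V[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(gap_vectors, points_2d):
    """Normalized stress between latent-space distances d_ij and 2D distances ||x_i - x_j||."""
    d = pairwise_dist(gap_vectors)       # d_ij from the GAP vectors
    d_low = pairwise_dist(points_2d)     # distances between the projected points
    return np.sqrt(((d - d_low) ** 2).sum() / (d ** 2).sum())

rng = np.random.default_rng(0)
gap_vectors = rng.normal(size=(6, 2))    # toy "GAP vectors" that are already two-dimensional
perfect = stress(gap_vectors, gap_vectors)              # embedding that preserves all distances
collapsed = stress(gap_vectors, np.zeros((6, 2)))       # embedding that maps everything to one point
```

A stress of zero means the two-dimensional points preserve all pairwise distances, while collapsing all points to one location gives a stress of one.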
As shown in Figure 4(a), one can easily separate the sets of time series belonging to the two classes by applying MDS to the latent representation learned by the network. Yet, in Figures 4(b)-4(d), as the perturbation magnitude ε (eps) becomes larger, it becomes harder to separate the two classes with a linear classifier in the two-dimensional space. With the help of MDS, we can observe that the adversarial attack changes the distribution of the data.

Transferability of Adversarial
Figure 2: Adversarial examples taken from [8] (panel label: Class-2 with 99% confidence).
Security and Communication Networks
networks trained by diverse datasets [23]. Moreover, adversarial attacks crafted for a specific architecture can trick other classifiers trained by different machine learning algorithms or even other neural networks with dissimilar architectures [5].
Recently, Tramèr et al. [24] found that, on average, the distance to the model's decision boundary is larger than the distance between two models' boundaries in the same direction, which confirms the existence of transferability of adversarial examples to some extent. They also prove, by presenting a counter-example, that transferability is not an intrinsic characteristic of deep neural networks.
Typically, a machine learning model is trained by the process shown in Figure 5: the trained model is deployed to the industrial environment after being evaluated on the prepared test dataset. This is extremely dangerous in the IoT environment because the model controls devices.
We trained a ResNet model and used it to classify some randomly generated noise data along with the time series data. One would expect random time series data to be rejected by the classifier with low confidence. However, the random noise data were classified as class two with high confidence, which proves that there is a potential risk in the model. The predicted labels of some noise time series examples are visualized in Figure 6.
As illustrated by Figure 6, even zero values or random noise can lead to high-confidence output. As a result, the model cannot be used directly for intelligent devices.

Class Activation Map and Adversarial Examples.
The class activation map (CAM), proposed by Zhou et al. [26], was designed to find the discriminative and susceptible regions of an image. Later, Wang et al. [9] proposed a one-dimensional CAM for TSC. Here, we use the CAM method to highlight the susceptible region of a time series. We find that the susceptible regions of the time series are contiguous, which suggests that some preprocessing methods could improve the robustness of the model.
This method explains the classification of a given deep learning model by highlighting the subsequences that contribute the most to a specific classification. Note that CAM is only feasible for models with a GAP layer before the softmax classifier. For that reason, in this section we only consider the ResNet model, which achieves the highest accuracy on the majority of the datasets. ResNet benefits from the CAM approach because its global average pooling (GAP) layer helps identify the regions of an input time series that contribute to a certain classification.
Let $A(t)$ be the result of the last convolutional layer, which is a multivariate time series (MTS) with $M$ variables. $A_m(t)$ is the univariate time series for the variable $m \in [1, M]$, i.e., the result of applying the $m$th filter. Let $w_m^c$ be the weight between the output neuron of class $c$ and the $m$th filter. As a GAP layer is utilized, the input to the neuron of class $c$, i.e., $Z_c$, can be computed using the following equation:

$$Z_c = \sum_{m} w_m^c \sum_{t} A_m(t).$$

The second summation averages the time series over the whole time dimension; for simplicity, the denominator is omitted here. The input $Z_c$ can also be written as follows:

$$Z_c = \sum_{t} \sum_{m} w_m^c A_m(t).$$

Lastly, $\mathrm{CAM}_c$, the class activation map that explains the classification as label $c$, is given by the following equation:

$$\mathrm{CAM}_c(t) = \sum_{m} w_m^c A_m(t).$$

Here, CAM is a univariate time series in which each element at a timestamp $t \in [1, T]$ is equal to the weighted sum of the $M$ data points at time $t$, with the weights learned by the neural network. Figure 7 shows the result of applying CAM on the Coffee dataset.
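As a minimal sketch of the CAM computation, assume the output of the last convolutional layer is available as an array `A` of shape (T, M) and the weights between the GAP layer and the output neurons as an array `W` of shape (M, C). Both names and the random values are assumptions for illustration, not taken from the trained ResNet.

```python
import numpy as np

def class_activation_map(A, W, c):
    """CAM_c(t) = sum_m w_m^c * A_m(t); returns a univariate series of length T."""
    return A @ W[:, c]

def class_input(A, W, c):
    """Z_c = sum_m w_m^c * sum_t A_m(t) (the 1/T factor of the GAP layer is omitted)."""
    return W[:, c] @ A.sum(axis=0)

rng = np.random.default_rng(0)
T, M, C = 20, 4, 2                 # toy sizes: timestamps, filters, classes
A = rng.normal(size=(T, M))        # placeholder for the last convolutional output
W = rng.normal(size=(M, C))        # placeholder for the GAP-to-output weights
cam = class_activation_map(A, W, c=1)
z = class_input(A, W, c=1)
```

Summing the CAM over time recovers the class input, which mirrors the swap of the two summations in the equations above.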

From Figure 7, we find that the key class activation regions lie at the same points where the visual difference exists. However, the adversarial examples do not try to modify these places to deceive the classifier.
In Figure 8, we find that the adversarial example differs only slightly from the original time series and that the changed region is not in the key area learned by the neural network.

Proposed Method
The analysis in Section 3 provides evidence that potential risks exist in deep learning models, even though they achieve the best performance in time series classification. Furthermore, in the IoT field, it is extremely dangerous if these algorithms are deployed in devices. We designed a new training strategy based on the encode-decode model to increase the robustness of the model.
Our method consists of two main parts: an encode-decode model and a traditional deep neural network model for classification. In the encode-decode model, the input is a nonlinear transformation of the original time series; here, we apply thermometer encoding as the nonlinear transformation. The decoded output is the original time series, recovered from its thermometer-encoded form. The reason for using the encode-decode model is to exploit its nonlinear transformation to remove noise and adversarial perturbations that are based on linear gradient signs. The schema of our proposed method is shown in Figure 9.
Our method introduces a nonlinear transformation through the encode-decode model to defend against traditional adversarial attacks. The network consists of two main parts; the first is the encode-decode part. In this part, the network learns a noise-tolerant nonlinear function that minimizes the loss between the original example and the example reconstructed from its thermometer-discretized form. The encoder maps the input to a fixed-length vector (which needs to contain all the input information), and the decoder then outputs the reconstruction. In the model, the encoder learns a coding sequence representing the semantic information of the time series, and the decoder maps the sequence back to the original time series.
First, the time series is discretized into ten evenly spaced levels. Then, the thermometer encoding method is applied to the discretized curves. Based on the thermometer-encoded time series, the encode-decode model is trained to reconstruct the time series; therefore, the loss function we use here is the mean-square error (MSE).
While training the encode-decode model, we add some random noise to the original input time series, which increases the reconstruction ability and the robustness of the encode-decode model. Figure 10 shows that the encoder learns a robust representation of the input time series, and the decoder maps this representation back to the original time series.
In order to discretize the input time series $x$ without losing the relative distance information, Buckman et al. [19] proposed thermometer encodings. For an index $j \in \{1, \ldots, k\}$, let $\tau(j) \in \mathbb{R}^k$ be the thermometer vector defined as follows:

$$\tau(j)_l = \begin{cases} 1, & \text{if } l \ge j, \\ 0, & \text{otherwise}. \end{cases}$$

Then, the discretization function $f$ is defined for a time index point $i \in \{1, \ldots, n\}$ as follows:

$$f(x_i) = \tau(b(x_i)) = \mathrm{Sum}\left(f_{\mathrm{onehot}}(x_i)\right),$$

where $b(x_i) \in \{1, \ldots, k\}$ is the index of the level that $x_i$ falls into, $\mathrm{Sum}$ is the cumulative sum function, and $f_{\mathrm{onehot}}(x_i)$ is the one-hot coding method. This characteristic of thermometer encoding is very important for time series because it preserves the order and shape information of the original time series. Figure 11 shows the discretization process of a time series. Table 2 shows the thermometer encoding result of a continuous value. The time series can be discretized by the average bin method and transformed into another code. The coding method is highly nonlinear, which can defend against gradient-based attack methods. A curve with a certain amount of noise can be restored normally after discretization and the encode-decode model. The well-trained encode-decode model can recover the original time series from its thermometer encoding. We use the Coffee dataset as an example to illustrate its effectiveness in Figure 12.
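The discretization and thermometer encoding steps can be sketched as follows. The sketch assumes the series has already been scaled to the range [0, 1] and uses ten evenly spaced levels; the function names and the range parameters `lo` and `hi` are illustrative assumptions.

```python
import numpy as np

def quantize(x, k=10, lo=0.0, hi=1.0):
    """Map each value to the 0-based index of one of k evenly spaced levels."""
    x = np.asarray(x, dtype=float)
    idx = np.floor((x - lo) / (hi - lo) * k).astype(int)
    return np.clip(idx, 0, k - 1)            # values at or beyond the range ends go to the edge levels

def one_hot(idx, k=10):
    """One-hot code of each level index; output shape is (len(idx), k)."""
    return np.eye(k, dtype=int)[idx]

def thermometer(x, k=10, lo=0.0, hi=1.0):
    """Thermometer code = cumulative sum of the one-hot code along the level axis."""
    return np.cumsum(one_hot(quantize(x, k, lo, hi), k), axis=-1)

series = np.array([0.13, 0.66, 0.92])        # toy series, assumed already scaled to [0, 1]
codes = thermometer(series)                  # one row of k binary values per timestamp
```

With this convention, 0.13 falls into the second level, so its code is a single 0 followed by nine 1s. Unlike a plain one-hot code, neighboring levels differ in only one position, so the ordering of the original values is kept.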
As illustrated in Figure 12, the reconstructed time series contains all the information of the original time series. The difference lies in the high-frequency part of the time series, which looks like random noise. We show that the deep learning model trained on these reconstructed examples achieves high accuracy and is able to defend against adversarial attacks.

Experiment and Evaluation
In this section, we present the FGSM attack method and the ResNet [21] architecture. We then use FGSM to generate adversarial time series examples for the ResNet model.

Data Sets and Comparison Method.
85 datasets of the UCR archive are used in the experiments [27]. These datasets cover diverse time series data from fields such as the electricity industry, food security, images, and sensors.
One of the datasets comes from electronic devices known as smart meters, which record detailed electricity consumption data. A previous study [28] showed that these electricity data could be used to analyze the type of electric device. The aim of collecting and analyzing the electricity consumption data is to monitor the devices used in citizens' homes and, in the future, to reduce the carbon footprint. The dataset contains 375 univariate time series of length 720, and the classes are Microwave, Toaster, and Kettle. The dataset is a classic illustration of an IoT time series attack scenario, and classifying it is a vital task for intelligent devices.
A ResNet architecture identical to that of [8] is employed for the comparison. Details about the architecture and its parameters are shown in Table 3, and the ResNet block is illustrated in Figure 13. In ResNet, the time series acts as the input and the K possible classes serve as the output. The convolution kernel sizes are 8, 5, and 3 in every block of the ResNet, which means that features are extracted from neighborhoods of 8, 5, and 3 time steps.
The ResNet we employed comprises three blocks with 64, 128, and 128 filters, respectively.

Result and Analysis of Attack and Defense.
The experiments are conducted on Keras 2.1 and TensorFlow 1.8. The number of samples in the training and testing phases is determined by the original publicly available dataset (UCR). We trained the encode-decode ResNet network and extracted the ResNet part as the attack target. The input of the attack model is the original time series; the gradient of this model is computed in the same way as in the FGSM attack described earlier. Although our method uses the thermometer encoding as the input to train the encode-decode model, the comparison model is the same ResNet as in [8]. The experiments in this manuscript are carried out to show the efficiency of the encode-decode model as a defense. The results of the defense are shown in Table 4. During the attack and defense stage, the perturbation ratio ε is set to 0.1.
In Table 4, we can see that the accuracy on most of the datasets is largely improved with encode-decode training. The result shown in Table 4 reveals that our method can resist the FGSM attack in the TSC problem to some extent.
To further analyze how the encode-decode model defends against the FGSM attack, we measure the accuracy on a typical sensor dataset, i.e., the Coffee dataset, under different ε. The Coffee dataset [20] is a two-class problem that discriminates between Arabica and Robusta coffee beans. The encode-decode ResNet shows a good defense result for this dataset, and the accuracy curve is shown in Figure 14. In this figure, we find that the accuracy on these two datasets decreases slowly as the perturbation magnitude increases. This means that the FGSM attack still works here, but its effectiveness is largely reduced. The reason lies in the thermometer encoding, because it is a highly nonlinear transformation. The thermometer encoding discretizes the time series and retains the order information of the original curve.
Figure 11: Examples of mapping continuous-valued inputs to quantized inputs and thermometer codes with ten evenly spaced levels (panels: Original, Discretize).

Preprocessing Method for Defense Adversarial Attack.
In an industrial environment, we could apply some practical preprocessing methods, such as time series smoothing, to weaken the influence of adversarial examples. Here, we show two methods, smoothing and encode-decode, to resist the attack.
In the experiment, we first applied the thermometer encoding method to transform the adversarial examples; then, the encode-decode model was used to map the thermometer encoding back into the original time series. Of course, the reconstructed time series is different from the original time series. The recognition accuracy is shown in Figure 15.
As illustrated in Figure 15, the yellow line is below the red line, which means the encode-decode model improves the accuracy under FGSM attacks. This result suggests that the encode-decode model could be used as a data preprocessing step before the data are fed into the classification model.
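For the smoothing preprocessing mentioned at the start of this subsection, one simple option is a centered moving average. The paper does not specify which smoothing method was used, so the window size, the edge padding, and the function name below are all assumptions made for illustration.

```python
import numpy as np

def moving_average(x, window=5):
    """Centered moving average with edge padding; the output has the same length as x.

    Assumption: an odd window size, so that the window is symmetric around each timestamp.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    x = np.asarray(x, dtype=float)
    padded = np.pad(x, window // 2, mode="edge")     # repeat the end values at both borders
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

T, eps = 20, 0.1
clean = np.zeros(T)                                  # toy clean series
perturbation = eps * (-1.0) ** np.arange(T)          # sign-like perturbation of magnitude eps
smoothed = moving_average(clean + perturbation)      # smoothing applied before classification
```

Because an FGSM perturbation has the same magnitude ε at every timestamp but varying signs, averaging over neighbors tends to cancel part of it. Smoothing also removes genuine high-frequency content, so the window size is a trade-off.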

Conclusions
The method proposed in this paper uses an encode-decode joint training strategy to strengthen the robustness of the deep learning model. The experiments reveal that our model can resist FGSM attacks to some extent. Moreover, the encode-decode model can be used as a preprocessing step to weaken FGSM attacks. However, it is not easy to eliminate the white-box attack launched by FGSM: our method improves the robustness of the trained model but fails to resist the attack completely. To check the effectiveness of our method, more experiments on other datasets are also required.
Moreover, we found that different trained models have different resistance to the same attack, and it is hard to evaluate the goodness of a model. Fundamentally, there are no theoretical studies on how to quantify the goodness or robustness of a trained model. Therefore, given the popularity of applying deep learning to IoT data analysis, more research is needed on the interpretability of deep learning models. Our future research directions include how to evaluate the defensive capability against adversarial examples in the area of IoT data.

Data Availability
The data used to support the findings of this study are included in the manuscript.

Conflicts of Interest
The authors declare that they have no conflicts of interest.