Function-Aware Anomaly Detection Based on Wavelet Neural Network for Industrial Control Communication



Introduction
Nowadays, almost all cyber-physical systems (CPSs) in critical infrastructures that bear on the national economy and people's livelihood (such as electrical, petrochemical, sewage, and transportation systems) rely on industrial control systems to automate industrial processes [1,2]. In particular, with the rise of Industry 4.0 and the Internet of Things [3,4], flexible manufacturing and convenient interoperability have been put on the agenda by both academia and industry. Smart CPSs can be a strong driving force for the integration of industrialization and informatization, and information and communication technologies (ICTs) offer a practical way to strengthen traditional industrial control systems [5]. However, the application of ICTs is gradually breaking the original "information island" status of industrial control systems, which exposes them to new cybersecurity risks. In consequence, many experienced engineers have shifted their focus from process safety to information security [6]. Over the past several years, CPSs have come under cyberattack from all sides. According to ICS-CERT (Industrial Control Systems Cyber Emergency Response Team) statistics [7], the ICS-CERT incident response team summarized and analyzed 290 industrial security incidents in 2016, and adversaries are developing increasingly sophisticated attacks against industrial control systems.
Three causes of this situation can be recognized: (1) multifarious vulnerabilities of industrial control systems have been gradually exposed in recent years, for example, system architecture vulnerabilities [8,9], embedded control device vulnerabilities [9-11], and industrial communication vulnerabilities [11,12]; (2) cyberattacks have become distinctive and diversified, and targeted attacks and APTs (Advanced Persistent Threats) have become a reality [13]; (3) industry-oriented defense technologies are still at an exploratory stage, and regular Internet security methods cannot satisfy the special requirements of industrial control [14].
The major contributions of this paper involve three aspects. Firstly, we propose a novel time-related feature calculation and construction algorithm that describes the function control characteristics and extracts function control behaviors from industrial control communication activities. Secondly, based on the time-related function control behaviors, we introduce an optimized wavelet neural network (WNN) to realize function-aware anomaly detection. Finally, a real-world control system is simulated to evaluate our approach, and the experimental results show that our approach is practicable and effective. The main difference between our approach and existing ones lies in the first aspect: adequately modeling function control behaviors is a necessary prerequisite for real-time anomaly detection, and we design an original function control feature calculation and construction algorithm to meet this prerequisite.

Related Work
According to the detection technique used, anomaly detection approaches in CPSs fall into three major categories: rule matching, statistical analysis, and computational intelligence [18]. Rule matching approaches require prior knowledge to learn general intrusion detection rules, and rule matching is then executed to detect many kinds of attacks. Typically, Almalawi et al. [20] automatically extract proximity detection rules from the consistent and inconsistent states of SCADA data to identify integrity attacks on SCADA systems. Genge et al. [21] propose a systematic and auto-configured anomaly detection approach, which includes modeling ICS networks and generating anomaly detection rules, to identify attacks that violate ICS connection patterns. Thanks to the predefined rules, these approaches can improve classification accuracy and offer practical detection efficiency. However, a huge rule database is hard to build and update, because the extracted rules must cover all known attack instances. Besides, these approaches cannot detect the unknown attacks that frequently occur in today's CPSs. Statistical analysis approaches learn an underlying distribution (such as the network traffic profile) to detect anomalies, and they tolerate incomplete and imprecise training data better than rule matching approaches. For instance, Do [22] and Gawand et al. [23] introduce the CUSUM mechanism to detect change points in industrial communication traffic. Unlike rule matching approaches, these approaches can attempt to detect unknown attacks, but this ability is very limited because sophisticated and targeted attacks can easily avoid causing distribution changes. Moreover, high false positive and false negative rates are another drawback, because the traffic profile is difficult to determine.
Computational intelligence-based approaches are strongly related to data mining. The normal models or profiles are built from multivariate training data, and anomaly detection is realized by means of classification or optimization. These techniques have attracted great interest from both industry and academia, and many of them have been studied, mainly including SVM (Support Vector Machine) [15,24,25], neural networks [26], decision trees [27], genetic algorithms [27,28], and clustering techniques [29]. Although computational intelligence-based techniques have a relatively high computational overhead, they can achieve better performance in detection, tolerance, and generality [14]. Additionally, these approaches can not only detect known attacks with high detection efficiency, but also identify new intrusion modes more effectively [15]. Our approach belongs to the computational intelligence category. Unlike the existing work, we propose a new feature calculation and construction algorithm for industrial control communication, which not only extracts the function control behavior from industrial communication characteristics, but also moderately reduces the computational complexity.

Function Control Feature Calculation and Construction
In industrial control communication, function codes, which represent control signals sent from the operator or engineer workstations, are distributed to the executive devices in order to control the industrial automation process. Therefore, the feature calculation and construction algorithm analyzes the time-related function codes to model the function control behavior. We cannot simply gather the function codes at regular time intervals and use them directly as function control samples to train the behavior model, for two reasons: (1) the number of function codes differs from one time interval to the next, whereas the behavior model based on WNN requires all input samples to have the same dimension; (2) the number of function codes in a time interval may be very large, which may waste computational resources and reduce detection efficiency. Figure 1 depicts the detailed feature calculation and construction process, and each step is outlined below.
Step 1 (function code sequence preprocessing). As in our prior work [15], in order to associate time characteristics with function control activities, we first parse the captured function control packets in depth and obtain the function code sequence $S_i = s_1 s_2 s_3 \cdots s_{n_i}$ in every interval $t_i$ (here, the subscript is the serial number of each function code in $S_i$, and $n_i$ is the length of $S_i$). The function code sequence set $S = \{S_1, S_2, \cdots, S_m\}$ in the interval $T$ ($T = \sum_{i=1}^{m} t_i$) consists of all function code sequences $S_i$ ($i = 1, \cdots, m$), and the sequence dimensions in $S$ may differ from each other because $n_i$ differs from sequence to sequence. After that, we recombine all $S_i$ ($i = 1, \cdots, m$) into one long sequence $S_L$ according to the time order.
Step 2 (feature factor selection). Because the dimensions of the obtained function control samples $X_i$ ($i = 1, \cdots, m$) must be consistent, we first need to construct the feature base vector according to the selected feature factors. In particular, the selected feature factors consist of two main components: single function codes and short sequence patterns. More specifically, all single function codes $c_1, c_2, \cdots, c_V$ are searched in sequential order from the long function code sequence $S_L$, and each single function code is different from the others. For each single function code $c_j$ ($j \in [1, V]$), we design the short sequence patterns $p_{jk}$ ($k \in [1, V]$). Furthermore, $p_{jk}$ consists of $c_j$ and $c_k$, and we can get $V$ short sequence patterns $p_{j1}, p_{j2}, \cdots, p_{jV}$ for each single function code $c_j$. Taken together, we acquire the feature base vector, which collects the $V$ single function codes and the $V^2$ short sequence patterns: $B = (c_1, \cdots, c_V, p_{11}, \cdots, p_{1V}, \cdots, p_{V1}, \cdots, p_{VV})$. The intrinsic reasons for such feature factor selection include the following: (1) the single function code can represent its own role in each function code sequence; (2) the short sequence pattern can establish the relationship between two function codes and indirectly reflect the continuous control operations in the industrial automation process.
Step 3 (function control sample calculation). According to the feature base vector, we further calculate the corresponding feature variable for each feature factor in the function code sequence $S_i$. For the single function code $c_j$, we regard its frequency in $S_i$ as the corresponding feature variable, and the calculation formula is $f(c_j) = N(c_j)/n_i$; here $N(c_j)$ represents the number of occurrences of $c_j$ in $S_i$. For the short sequence pattern $p_{jk}$, we calculate its frequency in $S_i$ as the corresponding feature variable by the formula $f(p_{jk}) = N(p_{jk})/(n_i - 1)$. By calculating all feature variables in $S_i$, we can complete the construction of the sample $X_i$. Additionally, there is a one-to-one correspondence between function code sequences and function control samples, and each function control sample contains $V(1 + V)$ feature variables. To sum up, all calculated function control samples form the function control sample set $X = \{X_1, X_2, \cdots, X_m\}$.
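The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the experiments: the function names are hypothetical, and the sketch assumes that a short sequence pattern is counted as an ordered pair of adjacent function codes, which is consistent with the normalization by the sequence length minus one.

```python
from collections import Counter
from itertools import product


def build_feature_base(sequences):
    """Return the distinct function codes and all ordered two-code patterns."""
    merged = [c for seq in sequences for c in seq]   # one long sequence in time order
    codes = list(dict.fromkeys(merged))              # distinct codes, first-seen order
    patterns = list(product(codes, repeat=2))        # V * V short sequence patterns
    return codes, patterns


def build_sample(seq, codes, patterns):
    """Map one function code sequence to V * (1 + V) frequency features."""
    n = len(seq)
    if n == 0:
        raise ValueError("function code sequence must not be empty")
    code_counts = Counter(seq)
    pair_counts = Counter(zip(seq, seq[1:]))         # the n - 1 adjacent pairs (assumption)
    singles = [code_counts[c] / n for c in codes]
    pairs = [pair_counts[p] / (n - 1) if n > 1 else 0.0 for p in patterns]
    return singles + pairs


# Two toy sequences of different lengths yield samples of equal dimension.
sequences = [[3, 3, 1, 5], [3, 6, 3, 1, 1, 5]]
codes, patterns = build_feature_base(sequences)
samples = [build_sample(s, codes, patterns) for s in sequences]
```

With four distinct function codes, every sample has 4 + 16 = 20 entries regardless of how many function codes the interval contained, which is the property the WNN input layer requires.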

Function-Aware Anomaly Detection Based on Wavelet Neural Network
After the feature construction, we can train a wavelet neural network (WNN) to discover any functional change in industrial control communication. Specifically, we use the WNN's prediction capability to realize function-aware anomaly detection, and both the optimized WNN and the associated detection threshold are obtained by loop-based iterative training. Figure 2 shows the overall architectural design of function-aware anomaly detection based on WNN. As this figure shows, the detection approach is made up of two phases: model training and real-time detection. Model training is a prerequisite for good detection accuracy. In this phase, by using the training function control samples extracted from the normal industrial communication data, an optimized WNN-based behavior model is built, and an accompanying detection threshold (including an upper limit and a lower limit) is measured and recorded. In the real-time detection phase, industrial communication data are captured and parsed in depth to form the test function control samples by means of the feature calculation and construction algorithm described in Section 3, and the optimized wavelet neural network analyzes these input test samples to calculate the predicted results, which are then compared with the detection threshold. When the predicted results fall outside the detection threshold, an alarm is generated in real time.

Wavelet Neural Network and Optimization.
WNN has already been successfully applied to many practical areas [30], and in our approach it is introduced as the critical behavior model to identify function control misbehaviors. In practice, the topological structure of WNN evolves from BP neural network, and it regards the wavelet basis function as the activation function of hidden layer wavelons, which are referred to as the hidden units. In the hidden layer, the input variables are inserted and transformed to wavelets, and all wavelons are combined to estimate the approximation of the target values [31].
In the WNN's structure depicted in Figure 2, $x_1, x_2, \cdots, x_n$ are the input variables in the input layer, and $y_1, y_2, \cdots, y_q$ are the predicted results in the output layer. Additionally, $\omega_{ij}$ and $\omega_{jk}$ stand for the network weights. If the input variables are $x_i$ ($i = 1, 2, \cdots, n$), the corresponding outputs of the hidden layer can be given by the expression
$$h(j) = h_j\left(\frac{\sum_{i=1}^{n} \omega_{ij} x_i - b_j}{a_j}\right), \quad j = 1, 2, \cdots, l. \tag{1}$$
Here, $h(j)$ is the output of the $j$th hidden unit in the hidden layer; $\omega_{ij}$ is the connection weight between the input layer and the hidden layer; $b_j$ represents the translation parameter of the wavelet basis function $h_j$; $a_j$ represents the dilation parameter of the wavelet basis function $h_j$.
In our wavelet neural network, the Morlet wavelet is selected as the wavelet basis function, given by
$$h(x) = \cos(1.75x)\, e^{-x^2/2}. \tag{2}$$
After the calculation of the hidden layer, we can further obtain the predicted results by the following expression:
$$y(k) = \sum_{j=1}^{l} \omega_{jk}\, h(j), \quad k = 1, 2, \cdots, q. \tag{3}$$
Here, $\omega_{jk}$ is the connection weight between the hidden layer and the output layer; $h(j)$ is the output of the $j$th hidden unit in the hidden layer; $l$ is the number of the hidden units; $q$ is the number of the output units.
It is worth mentioning that we use the loop-based iteration to train the optimized WNN, and its main purpose is to improve the network parameters, including the connection weights $\omega_{ij}$ and $\omega_{jk}$, the translation parameter $b_j$, and the dilation parameter $a_j$. Furthermore, the predicted error is introduced to shorten the distance between the predicted results and the expected outputs, and the predicted error can be computed by
$$e = \sum_{k=1}^{q} \big(y^{*}(k) - y(k)\big). \tag{4}$$
Here, $y^{*}(k)$ is the expected output, and $y(k)$ is the predicted result. Algorithm 1 shows the pseudocode of WNN's optimization process. In this process, the parameter increments are introduced to update all network parameters, and the specific process can refer to the WNN's training steps in Section 4.3. In practice, two different terminations of iteration for this process can be selected: one is the maximum number of iterations, and the other is the preconfigured error threshold which indicates the iteration is completed if the distance between the predicted results and the expected outputs is small enough. In our approach, we select the first one as the terminal condition.
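The forward computation described above (hidden layer, Morlet activation, output layer, and predicted error) can be illustrated with a short pure-Python sketch. The function and argument names are hypothetical, the Morlet wavelet is assumed to take the common form cos(1.75 z) exp(-z^2 / 2), and the predicted error is assumed to be the sum of differences between expected and predicted outputs.

```python
import math


def morlet(z):
    """Morlet wavelet basis function, assumed form cos(1.75 z) * exp(-z^2 / 2)."""
    return math.cos(1.75 * z) * math.exp(-z * z / 2.0)


def wnn_forward(x, w_in, a, b, w_out):
    """Compute hidden outputs and predictions for one input vector x.

    w_in[j][i]  : weight from input i to hidden unit j
    a[j], b[j]  : dilation and translation parameters of hidden unit j
    w_out[k][j] : weight from hidden unit j to output k
    """
    hidden = []
    for j in range(len(a)):
        s = sum(w * xi for w, xi in zip(w_in[j], x))
        hidden.append(morlet((s - b[j]) / a[j]))     # translate, dilate, then apply wavelet
    outputs = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return hidden, outputs


def predicted_error(expected, predicted):
    """Sum of differences between expected outputs and predicted results (assumed form)."""
    return sum(d - y for d, y in zip(expected, predicted))


# Toy network: 2 inputs, 2 hidden units, 1 output.
hidden, outputs = wnn_forward(
    x=[0.5, 0.25],
    w_in=[[1.0, 0.0], [0.0, 0.0]],
    a=[1.0, 1.0],
    b=[0.5, 0.0],
    w_out=[[2.0, 3.0]],
)
error = predicted_error([6.0], outputs)
```

In this toy configuration both hidden units receive a wavelet argument of zero, so each outputs 1 and the single prediction is simply the sum of the output weights.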

Training and Detection.
As mentioned in Section 4.1, the main steps of model training are outlined below.
Step 1 (network parameter initialization). We first initialize the primary parameters of the wavelet neural network, including the dilation parameter $a_j$, the translation parameter $b_j$, and the connection weights $\omega_{ij}$ and $\omega_{jk}$. Additionally, we set the learning rate, which controls how strongly these parameters are updated in each iteration.
Step 2 (predicted error calculation). According to the training function control samples, we calculate the predicted error by (4).
Step 3 (parameter modification). On the basis of the predicted error, we update the network parameters to shorten the distance between the predicted results and the expected outputs.
Step 4 (detection threshold measurement). Finally, we repeat Steps 2 and 3 until the maximum number of iterations is reached, and we record the optimized detection threshold.
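Steps 1 to 3 can be sketched as a plain gradient-descent loop. The sketch below is an illustration under stated assumptions, not a reproduction of Algorithm 1: it uses a single output unit, minimizes the squared error 0.5 (d - y)^2, assumes the Morlet form cos(1.75 z) exp(-z^2 / 2), and stops after a fixed number of iterations. All names, initial values, and the target value d are hypothetical.

```python
import math


def morlet(z):
    return math.cos(1.75 * z) * math.exp(-z * z / 2.0)


def morlet_prime(z):
    # Derivative of cos(1.75 z) * exp(-z^2 / 2) with respect to z.
    g = math.exp(-z * z / 2.0)
    return -1.75 * math.sin(1.75 * z) * g - z * math.cos(1.75 * z) * g


def forward(p, x):
    """Return the wavelet arguments z_j and the single prediction y."""
    z = [(sum(w * xi for w, xi in zip(p["w_in"][j], x)) - p["b"][j]) / p["a"][j]
         for j in range(len(p["a"]))]
    y = sum(w * morlet(zj) for w, zj in zip(p["w_out"], z))
    return z, y


def gradients(p, x, d):
    """Gradients of 0.5 * (d - y)^2 with respect to every network parameter."""
    z, y = forward(p, x)
    delta = y - d
    g = {"w_in": [], "a": [], "b": [], "w_out": []}
    for j, zj in enumerate(z):
        dz = delta * p["w_out"][j] * morlet_prime(zj)    # d loss / d z_j
        g["w_out"].append(delta * morlet(zj))
        g["w_in"].append([dz * xi / p["a"][j] for xi in x])
        g["b"].append(-dz / p["a"][j])
        g["a"].append(-dz * zj / p["a"][j])
    return g


def train(p, samples, targets, lr=0.01, iterations=200):
    """Update the parameters in place for a fixed number of iterations."""
    for _ in range(iterations):
        for x, d in zip(samples, targets):
            g = gradients(p, x, d)
            for j in range(len(p["a"])):
                p["w_out"][j] -= lr * g["w_out"][j]
                p["a"][j] -= lr * g["a"][j]
                p["b"][j] -= lr * g["b"][j]
                p["w_in"][j] = [w - lr * gw for w, gw in zip(p["w_in"][j], g["w_in"][j])]
    return p


# Step 1: initialize parameters (toy values), then run Steps 2 and 3 repeatedly.
params = {"w_in": [[0.3, -0.2], [0.1, 0.4]], "a": [1.0, 1.5],
          "b": [0.1, -0.2], "w_out": [0.5, -0.7]}
train(params, samples=[[0.5, 0.25], [0.4, 0.3]], targets=[1.0, 1.0])
```

Using the iteration count as the only stopping rule mirrors the terminal condition chosen in Section 4.2; an error-threshold stop would replace the outer `for` loop with a check on the current error.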
After the model training, we can perform the real-time detection to identify function control misbehaviors. The basic prerequisite is that we extract the real-time function control samples from the observed industrial communication data by using our feature calculation and construction algorithm. As the input variables, these samples are analyzed by the optimized wavelet neural network to estimate the predicted results, which are then compared with the detection threshold. The judgment criterion is as follows: if the predicted results fall within the range from the lower limit to the upper limit, we consider the function control activities normal; if the predicted results fall outside this range, we consider the corresponding function control activities abnormal and generate an alarm.
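The judgment criterion can be illustrated as follows. The text does not state how the lower and upper limits are measured, so this sketch assumes, purely for illustration, that they are the minimum and maximum predicted results on the normal training samples, optionally widened by a margin. The function names and the numeric values are hypothetical.

```python
def measure_threshold(predictions, margin=0.0):
    """Return (lower, upper) limits from predictions on normal training samples.

    Taking min/max plus a margin is an assumption made for this sketch.
    """
    return min(predictions) - margin, max(predictions) + margin


def is_anomalous(prediction, threshold):
    """Raise an alarm when the prediction lies outside [lower, upper]."""
    lower, upper = threshold
    return not (lower <= prediction <= upper)


# Predicted results for normal training samples (toy values).
normal_predictions = [0.97, 1.02, 0.99, 1.01]
threshold = measure_threshold(normal_predictions, margin=0.01)

# Predicted results for three test samples: inside, above, and below the range.
alarms = [is_anomalous(y, threshold) for y in [1.00, 1.20, 0.80]]
```

Because detection reduces to one forward pass and two comparisons per sample, the per-sample cost is small, which is in line with the short detection times reported in the evaluation.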

Experimental Modbus/TCP Control System.
In order to evaluate the detection performance, we use the simulation control system built in our earlier work [15] to supply the analyzed function control data. The industrial control communication of this system is based on Modbus/TCP, in which various function codes are used to carry out different control operations. Figure 3 shows the basic network architecture of this control system. The chief purpose of this system is to accomplish material production by monitoring and controlling the valves and the liquid levels, and the detailed technological process has been presented in [15]. In particular, the whole technological process is repeated every 1 minute. Besides, in this control system we carry out attack experiments that forge and replay malicious Modbus control commands, and our ultimate goal is to evaluate the detection accuracy and real-time capability by using these malicious function control data.
The normal communication packets are captured from the industrial switch to train the optimized wavelet neural network, and the capture lasts 1h15m02s. According to preliminary statistics, these packets contain 11693 Modbus/TCP function codes. Additionally, we use Matlab to analyze these packets in depth, and the hardware configurations are the same as the ones in [15]. For every minute, we compute the number of occurrences of each function code, and all statistical results are shown in Figure 4. As these results show, the simulation control system uses four categories of function codes to complete the whole technological process, namely function codes 1, 3, 5, and 6. Besides, all five curves in this figure flatten out, and the number of every function code fluctuates only slightly. In brief, these results demonstrate that the simulation control system has relatively steady communication patterns under normal circumstances, and its function control status stays within a relatively limited range.

Detection Performance Evaluation.
Without loss of generality, we choose the detection accuracy and real-time capability as the main performance indicators to evaluate our approach. Before training the optimal behavior model, we first preprocess the captured Modbus/TCP packets. More specifically, we extract the function codes per 1 minute to form the function code sequences, and by using the feature calculation and construction algorithm we obtain a total of 75 function control samples. Because 4 function codes exist in the simulated technological process, each function control sample contains 20 feature variables. According to these normal function control samples, we further train and optimize the wavelet neural network. We set the number of iterations to 200 in order to reduce the predicted error, and Figure 5 plots the predicted error against the number of iterations. From this figure we can see that, as the number of iterations increases, the predicted error first drops rapidly and then levels off. In particular, the best detection accuracy in the 200th iteration reaches 98.67%; that is, the optimized WNN correctly classifies all but one of the 75 normal function control samples, and only one normal function control sample is mistakenly regarded as an outlier.
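The figures quoted above follow directly from quantities already given in the text, as the short check below shows. It uses only the number of function codes, the capture duration, and the single misclassified sample; the symbol $V$ denotes the number of distinct function codes as in Section 3.

```latex
\begin{align*}
\text{samples} &: \ \text{1h15m02s at one sequence per minute} \;\Rightarrow\; 75 \text{ function control samples},\\
\text{features per sample} &: \ V(1+V) = 4 \times (1+4) = 20,\\
\text{training accuracy} &: \ \frac{75-1}{75} = \frac{74}{75} \approx 98.67\%.
\end{align*}
```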
By using the optimal behavior model, we further evaluate its detection performance, including detection accuracy and time consumption. In each experiment, we forge and replay malicious Modbus/TCP packets to attack and disrupt the normal technological process. Moreover, we suppose that these malicious Modbus/TCP packets contain no function codes other than the four categories of function codes used in the simulation control system, so that these packets only change the function control process. The major reason for this assumption is that malicious packets containing other function codes can be easily filtered by the applied industrial firewall [11,32]. Besides, we generate 60 malicious function code sequences in each experiment. More specifically, the percentage of malicious function codes in each function code sequence is about 1/10, and the locations of the malicious function codes in each function code sequence can be considered random. Similarly, we obtain 60 malicious function control samples in each experiment after the feature calculation and construction. By calculating the predicted result for each malicious function control sample, we compare it with the detection threshold to identify the corresponding abnormal function control behavior. Table 1 shows the detailed detection performance in 10 different experiments. In this table, the average detection accuracy is 91.17%, and the average time consumption is 0.0104s. In the worst case, the lowest detection accuracy is 88.33% in the 3rd, 7th, and 10th experiments, and the largest time consumption is only 0.0281s to detect 60 function control samples in the 7th experiment. In short, these results demonstrate that the function-aware anomaly detection approach has good detection accuracy and adequate real-time capability; namely, it is able to differentiate abnormal function control activities.
The adversary can change the attack intensity by adjusting the attack frequency; for example, the adversary can increase the sending rate of malicious Modbus/TCP packets to launch an attack with a higher probability of success. Therefore, the percentage of malicious function codes in each function code sequence may also change accordingly, and different percentages may affect the detection accuracy. Figure 6 plots the detection accuracy under different percentages of malicious function codes in each function code sequence. In this figure, $P_1, P_2, \cdots, P_6$ represent the percentages whose values are 1/5, 1/10, 1/15, 1/20, 1/25, and 1/30, respectively, and the minimum, average, and maximum detection accuracies are plotted over every 5 experiments. Overall, the experimental results show that the detection accuracy decreases as the percentage decreases; that is, our approach is more effective in detecting the function control misbehavior caused by a larger percentage of malicious function codes. However, our approach still maintains a reasonable detection accuracy at low percentages; for instance, when the percentage is 1/30, the average detection accuracy reaches 76.67%. Additionally, Figure 7 shows the average time consumption under different percentages of malicious function codes. From this figure, we can see that the time consumption fluctuates only within a narrow range. In other words, the different percentages have almost no influence on the time consumption.
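The construction of a malicious function code sequence can be sketched as below. This is an illustrative assumption about the procedure, not the attack tool used in the experiments: forged function codes are drawn only from the four codes used by the system and inserted at random positions, so that they make up roughly one tenth of the resulting sequence. The function name and the toy normal sequence are hypothetical.

```python
import random


def inject_malicious(seq, allowed_codes, fraction=0.1, rng=None):
    """Insert forged codes at random positions so they form about `fraction` of the result."""
    rng = rng or random.Random()
    # Solve k / (len(seq) + k) = fraction for the number of forged codes k.
    k = round(fraction * len(seq) / (1.0 - fraction))
    attacked = list(seq)
    for _ in range(k):
        pos = rng.randint(0, len(attacked))          # any position, both ends included
        attacked.insert(pos, rng.choice(allowed_codes))
    return attacked


# 18 normal function codes plus 2 forged ones gives a malicious share of 1/10.
normal = [3, 3, 1, 5, 6, 3, 1, 3, 5] * 2
attacked = inject_malicious(normal, [1, 3, 5, 6], fraction=0.1, rng=random.Random(0))
```

Since the forged codes belong to the allowed set, a filter that only checks function code values cannot reject them; only the changed frequencies of single codes and two-code patterns distinguish the attacked sequence from a normal one.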

Comparative Analysis.
The innovations of our approach mainly include two aspects: (1) we propose a new feature calculation and construction algorithm to extract function control characteristics in industrial control communication; (2) according to the extracted function control samples, we introduce the optimal function-aware WNN model to differentiate the aberrant industrial control communication activities. Therefore, we provide a comparative analysis to explain the advantages of our approach from these two aspects.
First, compared with the work in [15,25], the feature calculation and construction algorithm in this paper can learn more information about function control characteristics from industrial communication packets. On the one hand, this algorithm selects the single function code as an independent feature factor to emphasize its own role in each function code sequence. On the other hand, the short sequence patterns include all adjacent cases of two function codes, and they not only indirectly reflect the continuous control operations in the normal technological process, but also consider the impact of two nonadjacent control operations in the actual technological process. Therefore, more information about function control characteristics can be utilized to improve the detection efficiency. Second, based on the same samples extracted by the proposed feature calculation and construction algorithm, we compare our approach with a BP neural network to evaluate the detection accuracy and to show that the proposed approach is more suitable for detecting function control misbehaviors. We again perform 10 experiments, and the function control samples in each experiment are the same as the ones whose percentage of malicious function codes in each function code sequence is about 1/10. Figure 8 plots the detection accuracy of our approach and of the BP neural network in the 10 experiments, and Table 2 shows the corresponding average detection accuracies of the two approaches. From these results we can see that the detection accuracy of the BP neural network fluctuates relatively widely, and its average detection accuracy is only 83.50%, which is lower than that of our approach. Therefore, our approach provides better detection accuracy.

Conclusion
Aiming at differentiating the aberrant industrial control communication activities, this paper proposes a function-aware anomaly detection approach based on WNN. Firstly, we design the feature calculation and construction algorithm to learn the function control characteristics and extract the time-related features. Secondly, a behavior model based on WNN is established and optimized to detect function control misbehaviors in industrial control communication. Finally, in order to evaluate our approach, we simulate a real-world control system based on Modbus/TCP and perform a series of experiments. The experimental results and the comparative analysis show that our approach has good detection accuracy and adequate real-time capability.

Data Availability
In this manuscript, the analyzed function code data are captured from our simulation control system, which is built to accomplish material production according to one real-world control system. We have sketched the basic technological process in this manuscript, but some contents and specific parameters of this process cannot be made fully public because of commercial confidentiality. Therefore, the analyzed function code data used to support the findings of this study are currently under embargo. Researchers who want to verify the results, replicate the analysis, or conduct secondary analyses should contact the corresponding author or the first author. Requests for the data will be considered by them after a confidentiality agreement is signed.