State-Based Control Feature Extraction for Effective Anomaly Detection in Process Industries

Abstract: In process industries, industrial activities are characterized by the integrality and continuity of the production process, which can help to uncover appropriate features for industrial anomaly detection. From this perspective, this paper proposes a novel state-based control feature extraction approach, which regards the finite control operations as different states. The procedure of state transition can adequately express the change of successive control operations, and the statistical information between different states can be used to calculate the feature values. Additionally, OCSVM (One Class Support Vector Machine) and BPNN (BP Neural Network), which are optimized by PSO (Particle Swarm Optimization) and GA (Genetic Algorithm) respectively, are introduced as alternative detection engines to match our feature extraction approach. The experimental results show that the proposed feature extraction approach can effectively coordinate with the optimized classification algorithms, and a comparison of average detection accuracies suggests the optimized GA-BPNN classifier as a more applicable detection engine than the PSO-OCSVM classifier.


Introduction
Unlike discrete manufacturing, process industries focus on the essential continuity of the production process, whose main purpose is to produce products through a series of successive chemical reactions or physical changes [Muller and Oehm (2019)]. The production facilities are organized in order according to the technological process, and the processing sequence is fixed. Process industries underpin many critical industrial infrastructures, such as petrochemicals, machinery, electric power and water conservancy. In process industries, various control systems and computer systems have been applied to monitor and control real-time statuses and technological parameters [Ge, Song, Ding et al. (2017)], and the demand for advanced automation and networking is constantly growing. As emerging application modes, Industrial Internet and Industry 4.0 have been widely recognized by both academia and industry, and they emphasize the deep integration of automatic control technologies and information communication technologies [Li, Yu, Deng et al. (2017); Kourtis, Kavakli and Sakellariou (2019)]. Therefore, they can provide powerful support for the digital and intelligent development of process industries. In essence, the core infrastructure of process industries remains the intelligent manufacturing-oriented control system, whose vulnerabilities have been increasingly exposed because its originally closed, self-contained environment no longer exists [Galloway and Hancke (2013)]. Current industrial control systems are confronted with increasingly serious security challenges from various outsider and insider attacks [Baybutt (2017); You, Lee, Oh et al. (2018); Xu, Tao, Yang et al. (2019)].
From Stuxnet in 2010 [Nourian and Madnick (2018)] to Triton in 2019 [Martynova and Zhang (2019)], industrial security threats have become increasingly organized, covert and persistent, which conforms to the model of APTs (Advanced Persistent Threats) [Al-Rabiaah (2018)]. In other words, APTs have become the most prevalent and damaging attack patterns in industrial control systems. In particular, Industrial Internet promotes the interconnection and interoperability of all things based on the physical network, and this innovation may actually encourage APTs and amplify their impacts. The main causes can be summarized as follows: for one thing, the interconnection and interoperability may expose more attack entrances and paths; for another, some emerging technologies may bring new security problems, for example, virtualization vulnerabilities may become a stumbling block to the application of industrial cloud computing [Xu, Lee, Kim et al. (2018)]. In order to counter industrial cyber threats, researchers have started to develop industrial-oriented security solutions by combining industrial control characteristics and regular IT defense technologies. Based on the finite behaviors and stable patterns in industrial control communications, industrial anomaly detection has been regarded as a feasible way to identify misbehaviors without compromising usability [Goldenberg and Wool (2013); Wan, Yao, Jing et al. (2018)]. In practice, one line of research can be summarized as follows: by using artificial intelligence algorithms, industrial anomaly detection can not only learn industrial communication regularities and behavior characteristics to extract applicable features, but also design optimized detection engines to achieve intrusion recognition with high accuracy.
Feature extraction is an important target in industrial anomaly detection, because appropriate features can not only help to correctly describe the characteristics of various industrial activities, but also enhance the accuracy and efficiency of detection engines [Zhao and Dong (2018)]. In process industries, the characteristics of industrial activities focus on the integrality and continuity of the production process. The integrality demands that all industrial elements are organized in order to execute the whole production process periodically, and the continuity means that all stages of the production process work smoothly without interruption. From the viewpoint of these characteristics, we propose a novel state-based control feature extraction approach, which selects significant features from the successive control operations in one production process. In particular, this approach maps the finite control operations to different states, and the procedure of state transition can adequately express the change of successive control operations. In this paper, we also introduce two different classification algorithms as detection engines to indirectly evaluate the proposed feature extraction approach. More specifically, these two classification algorithms are OCSVM (One Class Support Vector Machine) and BPNN (BP Neural Network) [Wu, Shi, Wang et al. (2019)], and PSO (Particle Swarm Optimization) and GA (Genetic Algorithm) [Pham, Malinowski and Bartczak (2011)] are chosen to optimize the key parameters of these detection engines, respectively.
According to the experimental results in a Modbus/TCP control system which simulates the material synthesis process, we can draw the following conclusions: (1) the state-based control feature extraction approach can effectively coordinate with the optimized classification algorithms; (2) without considering the training process, the optimized GA-BPNN classifier is suggested as a serviceable detection engine due to its higher detection accuracy. The main contributions of this paper are summarized as follows: firstly, based on FSM (Finite State Machine), we propose a novel state-based control model to analyze and characterize the integrality and continuity of control operations, and calculate the feature value by integrating the motivation coefficient with the statistical information of state transition; secondly, in order to cooperate with the proposed feature extraction approach, we select OCSVM and BPNN classifiers as two representative detection engines, which are optimized to enhance their detection capabilities; thirdly, we define three practical attack types against the normal production process, and design different attack powers to compare the detection accuracies of the two classifiers. In particular, our central concern is that a good feature extraction approach is not only a significant precondition for anomaly detection, but also guarantees and improves the detection quality. In our approach, we focus on the detailed design of state-based control feature selection and calculation to address this challenge.
State-based control feature extraction
As stated previously, one notable advantage of process industries is that the process controls in different production stages are executed in a single uninterrupted sequence. In other words, the whole production process in process industries always completes some periodic control operations under the condition of finite states, and the change of successive control operations can reflect the corresponding production process to some extent. On this basis, we propose a novel state-based control feature extraction approach, which regards the finite control operations as different states. Furthermore, the change of successive control operations can be represented by the procedure of state transition, and the statistical information between different states can be used to calculate the feature values. The specific steps are listed as follows:
1) Initialization
In process industries, each production process involves a series of successive control operations, which can achieve a combination of different functions. When one master operation station wants to control one slave PLC (Programmable Logic Controller), the corresponding types and roles of all control operations have been defined in the function fields of industrial communication protocols. Namely, if we capture and parse industrial communication packets in chronological order, the obtained control sequence in the interval $\tau$ can represent the change of successive control operations in one or several production stages. As a result, we can obtain the control sequence set $C$, whose control sequences $C_i$ are different from each other. Additionally, all control sequences involve $l$ different functions $f_l$, here $l \le k$. By selecting the appropriate interval $\tau$, each control sequence $C_i$ can consist of $l$ different functions, which are arranged according to one specific production process.
2) State-based control model building
Based on the control sequence set $C$, we further build the state-based control model by using FSM [Soewito, Vespa, Mahajan et al. (2009)]. In this model, each function $f_l$ can be considered as a state $S_l$, and all states form a finite set $S$. Therefore, the change of two successive control operations can be expressed as the transition from one state to another state, and any control sequence can be described by the state transitions of multiple states. Additionally, different state transitions need to be triggered by different input signals in FSM. Similarly, we select the control operation immediately before two successive control operations as the trigger signal, which is referred to as the motivation factor in the state-based control model. For example, $c_i^{j-1}$ can be regarded as the motivation factor of the two successive control operations $c_i^{j} c_i^{j+1}$ in the short control sequence $c_i^{j-1} c_i^{j} c_i^{j+1}$.
3) Feature factor selection and feature value calculation
According to the state transition paths, we select $M_{uv} S_u S_v$ as the feasible feature factor. More specifically, each feature factor consists of three successive control operations, in which the first control operation is viewed as the motivation factor $M_{uv}$ and the latter two control operations represent the state transition $S_u \to S_v$ caused by the motivation factor $M_{uv}$. Under these definitions, both the motivation factor and the states are actually control operations. If all control sequences involve $l$ different functions, the maximum number of feature factors may be $l^3$, that is, the dimension of the feature sample may reach $l^3$. In practice, a production process will probably not cover all feature factors, and the corresponding dimension of the feature sample will be less than $l^3$.
In the state-based control model, a simple identification method for feature factors is designed as follows: firstly, we rearrange all control sequences in the set $C$ in chronological order; secondly, by recursively traversing the rearranged control operations, we find each distinct short control sequence, which is identified as a selected feature factor; thirdly, we obtain the number $d$ of feature factors in this production process by counting all distinct short control sequences.
For each control sequence, we can calculate the feature value of each feature factor by integrating the motivation coefficient $Eco$, which is generated by the motivation factor, with the statistical information of the corresponding state transition. According to the definition of the Pearson correlation coefficient, we further calculate the motivation coefficient $Eco$. Above all, each feature factor yields one feature value, and each control sequence yields one feature sample.
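The initialization and feature factor identification steps can be illustrated with a minimal Python sketch. It assumes that each captured control operation is available as a (timestamp in seconds, function code) pair, and it treats every window of three successive operations inside one control sequence as a feature factor. The function names and the sample capture are hypothetical, and the raw occurrence counts produced here are only the statistical ingredient of the feature values: the motivation coefficient is not reproduced.

```python
from collections import Counter, defaultdict


def split_into_sequences(operations, tau):
    """Group (timestamp, function_code) pairs into one control sequence per interval tau."""
    buckets = defaultdict(list)
    for timestamp, code in sorted(operations):
        buckets[int(timestamp // tau)].append(code)
    return [buckets[index] for index in sorted(buckets)]


def feature_factors(sequence):
    """Return every (motivation factor, state u, state v) triple of a sequence, in order."""
    return list(zip(sequence, sequence[1:], sequence[2:]))


def identify_feature_factors(sequences):
    """Collect the distinct triples over all sequences, in first-seen order."""
    seen = {}
    for sequence in sequences:
        for triple in feature_factors(sequence):
            seen.setdefault(triple, len(seen))
    return list(seen)


def count_vector(sequence, factors):
    """Count how often each identified feature factor occurs in one sequence."""
    counts = Counter(feature_factors(sequence))
    return [counts[factor] for factor in factors]


# Hypothetical capture: (seconds since start, Modbus function code).
capture = [(0, 1), (5, 3), (10, 5), (20, 1), (30, 3), (40, 5), (50, 16),
           (60, 1), (65, 3), (70, 5), (80, 15), (90, 1), (100, 3)]

sequences = split_into_sequences(capture, tau=60)
factors = identify_feature_factors(sequences)
d = len(factors)  # number of feature factors actually observed
vectors = [count_vector(sequence, factors) for sequence in sequences]
```

With five function codes the theoretical upper bound is 125 triples, but as in the text, only the triples that actually occur contribute a dimension.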

Detection engines and optimization
For the same feature samples, different detection engines may produce distinct results due to their own detection characteristics. In this paper, we introduce two different classification algorithms as alternative detection engines to match our feature extraction approach. The first classification algorithm is OCSVM, which can be trained by using only normal feature samples, and the second classification algorithm is BPNN, which is trained with both normal and malicious feature samples. In order to improve classification performance, we choose PSO and GA to optimize the key parameters of these two classification algorithms, respectively.

OCSVM classifier optimized by PSO
Unlike the traditional SVM, OCSVM can be trained on a single class of feature samples, which are extracted from normal system or network data. By judging whether observed data belong to this class, the OCSVM classifier can mark suspicious data as abnormal. The general mechanism of OCSVM is described below: the original feature samples $\{x_i, i = 1, 2, \ldots, n\}$ are mapped into the high-dimensional feature space $\Phi(\cdot)$ by using the kernel function $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$, and an optimal hyperplane in this feature space is found to maximize the separation between the observed feature samples and the origin, which is treated as the only abnormal feature sample. As shown in Eq. (4), by solving the quadratic programming problem, OCSVM can calculate the normal vector $\omega$ and compensation factor $\rho$ to generate the final decision function.
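For reference, the standard OCSVM optimization problem and decision function can be written as follows. This is the textbook formulation, given here as a sketch: the slack variables $\xi_i$ are introduced as an assumption, and the notation of the paper's own Eq. (4) may differ in detail.

```latex
\min_{\omega,\,\xi,\,\rho}\ \frac{1}{2}\|\omega\|^{2} + \frac{1}{vn}\sum_{i=1}^{n}\xi_i - \rho
\quad \text{s.t.} \quad \langle \omega, \Phi(x_i) \rangle \ge \rho - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, 2, \ldots, n

f(x) = \operatorname{sgn}\Big( \sum_{i=1}^{n} \alpha_i K(x_i, x) - \rho \Big)
```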
Here, $v$ is the tradeoff parameter that affects the number of support vectors, and $\alpha_i$ is the Lagrange multiplier in the Lagrange function. In our OCSVM classifier, we introduce the Gaussian kernel function to realize the nonlinear mapping of the feature space, and its kernel parameter $g$ plays an important role in constructing a good hyperplane [Xiao, Wang and Xu (2015)]. To sum up, we employ PSO to optimize the tradeoff parameter $v$ and kernel parameter $g$, and the detailed optimization process is depicted in Fig. 2(a).
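The parameter search can be illustrated with a minimal global-best PSO in plain Python. This is a sketch under stated assumptions, not the configuration used in the experiments: the inertia weight, acceleration constants, swarm size and search bounds are hypothetical, and `toy_fitness` is a stand-in with a known peak. In the setting described here, the fitness would instead be the cross-validated accuracy of an OCSVM trained with the candidate tradeoff and kernel parameters.

```python
import random


def pso(fitness, bounds, particles=20, iterations=50,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximise fitness(position) inside box bounds with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_fit = [fitness(p) for p in pos]      # personal best fitness values
    leader = max(range(particles), key=best_fit.__getitem__)
    gbest_pos, gbest_fit = best_pos[leader][:], best_fit[leader]

    for _ in range(iterations):
        for i in range(particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (best_pos[i][k] - pos[i][k])
                             + c2 * r2 * (gbest_pos[k] - pos[i][k]))
                lo, hi = bounds[k]
                # Clamp the new position to the search box.
                pos[i][k] = min(hi, max(lo, pos[i][k] + vel[i][k]))
            fit = fitness(pos[i])
            if fit > best_fit[i]:
                best_pos[i], best_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest_pos, gbest_fit = pos[i][:], fit
    return gbest_pos, gbest_fit


def toy_fitness(params):
    """Stand-in objective (assumption): peak value 1.0 at v = 0.05, g = 10."""
    v, g = params
    return 1.0 - (v - 0.05) ** 2 - ((g - 10.0) / 100.0) ** 2


bounds = [(0.001, 1.0), (0.01, 100.0)]  # hypothetical ranges for v and g
best_params, best_value = pso(toy_fitness, bounds)
```

Because the global best is only replaced by a strictly better position, the best fitness never decreases from one iteration to the next.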

BPNN classifier optimized by GA
BPNN is a multilayer feedforward neural network, whose significant characteristics are forward signal propagation and backward error propagation. When BPNN serves as an anomaly classifier, it requires different types of training feature samples, which can improve its capability of association and prediction. Furthermore, BPNN consists of three layers: the input layer, the hidden layer and the output layer, and the neurons between every two layers possess the connection weights $\omega_{ij}$ and $\omega_{jk}$. By setting the hidden and output thresholds $Th$ and $To$, the outputs in the hidden layer and output layer can be calculated by Eq. (5).
Here, $n$, $l$ and $m$ are the unit numbers in these three layers, and $f(\cdot)$ is the activation function of the hidden layer.
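The forward computation of such a three-layer network can be sketched in Python as follows. The sigmoid activation, the linear output layer and the convention of subtracting the thresholds are assumptions chosen for illustration, and all names and sample values are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic activation, assumed here for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_ih, th, w_ho, to):
    """One forward pass: n inputs -> l hidden units -> m outputs.

    w_ih[i][j] connects input i to hidden unit j; w_ho[j][k] connects
    hidden unit j to output k; th and to are the hidden and output thresholds.
    """
    hidden = [sigmoid(sum(x[i] * w_ih[i][j] for i in range(len(x))) - th[j])
              for j in range(len(th))]
    output = [sum(hidden[j] * w_ho[j][k] for j in range(len(hidden))) - to[k]
              for k in range(len(to))]
    return hidden, output


# Tiny hypothetical network with n = 2, l = 2, m = 1.
x = [1.0, 2.0]
w_ih = [[0.0, 0.0], [0.0, 0.0]]   # zero weights, so each hidden unit sees 0
th = [0.0, 0.0]
w_ho = [[1.0], [1.0]]
to = [0.5]
hidden, output = forward(x, w_ih, th, w_ho, to)
```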
In our BPNN classifier, we define the misclassification rate as BPNN's prediction error, and this error can be further used to update the parameters $\omega_{ij}$, $\omega_{jk}$, $Th$ and $To$, which are decisive for constructing a good network. Therefore, we employ GA to optimize these parameters, and the detailed optimization process is depicted in Fig. 2(b).
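A minimal real-coded GA illustrates how a parameter vector can be evolved against an error measure. This is a sketch under stated assumptions: tournament selection, one-point crossover, Gaussian mutation, elitism and all rates are hypothetical choices, and `toy_error` stands in for the misclassification rate of a network whose weights and thresholds are decoded from the chromosome.

```python
import random


def genetic_search(error, length, population=30, generations=40,
                   crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Minimise error(chromosome) over real-valued chromosomes in [-1, 1]."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(length)]
           for _ in range(population)]
    best = min(pop, key=error)
    history = [error(best)]  # best error after each generation

    def tournament():
        # Binary tournament: the better of two random individuals survives.
        a, b = rng.sample(pop, 2)
        return a if error(a) <= error(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best individual over unchanged
        while len(children) < population:
            p1, p2 = tournament(), tournament()
            child = p1[:]
            if length > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, length)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            # Gaussian mutation, clamped to the allowed range.
            child = [min(1.0, max(-1.0, gene + rng.gauss(0.0, 0.1)))
                     if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        pop = children
        best = min(pop, key=error)
        history.append(error(best))
    return best, history


TARGET = [0.2, -0.4, 0.6, 0.0]  # hypothetical optimum of the stand-in error


def toy_error(chromosome):
    """Stand-in for the misclassification rate (assumption)."""
    return sum((gene - goal) ** 2 for gene, goal in zip(chromosome, TARGET))


best_chromosome, error_history = genetic_search(toy_error, length=len(TARGET))
```

Because the best individual is always copied into the next generation, the recorded best error cannot increase.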

Experimental testing and result comparison
By combining the proposed feature extraction approach with the two detection engines, we can use the detection accuracy as a practicable evaluation indicator. On the one hand, this indicator can help to identify the detection engine that is more applicable to the proposed feature extraction approach; on the other hand, it can indirectly reflect the effectiveness of feature extraction, that is, how well the features describe the characteristics of the production process in process industries. In order to achieve this goal, we build a Modbus/TCP control system to simulate the material synthesis process. As shown in Fig. 3, the production process is summarized as follows: firstly, PLC 1 opens the valves of two funnels to drop materials 1 and 2, and closes these two valves when the quantities of materials 1 and 2 reach their set values; secondly, PLC 2 switches on the conveyor belt, and materials 1 and 2 are carried into the reaction furnace; thirdly, after the material synthesis reaction, PLC 3 opens valve 3 to discharge the synthetic material 3. By means of different Modbus/TCP packets, the three PLCs are managed and controlled by one operator station, and the complete cycle of the production process is 30 seconds.
Figure 3: Modbus/TCP control system to implement the material synthesis process

Experimental data acquisition and analysis
By running this system, we capture a large number of normal Modbus/TCP packets, which are divided into two parts: the first part serves as normal training data, which contains 65486 control operations during a running time of 280 minutes; the second part is regarded as normal test data, which contains 33340 control operations during a running time of 143 minutes. Fig. 4 shows the distribution characteristics of control operations in the normal training data. The whole production process involves 5 different functions: 01, 03, 05, 15 and 16, which represent "Read coils", "Read multiple registers", "Write single coil", "Write multiple coils" and "Write multiple registers" in the Modbus/TCP protocol specification, respectively. From Figs. 4(a) and 4(b) we can see that the total number of control operations per 60 s fluctuates within a narrow range, and the accumulated number of each control operation grows smoothly. In short, these observations provide indirect evidence of the stability and periodicity of the production process under the finite states. Similarly, Figs. 4(c) and 4(d) show the average numbers and variances of different control operations per 60 s, and the maximum variance, for the control operation 01, is only 1.5, that is, all control operations in every production process deviate only slightly from their average numbers.
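To make the parsing step concrete, the following Python sketch extracts the function code from a raw Modbus/TCP frame. It relies on the standard frame layout (a 7-byte MBAP header followed by the PDU, whose first byte is the function code); the function name and the hand-built sample frame are hypothetical, and the dictionary reuses the function names given above.

```python
import struct

FUNCTION_NAMES = {
    1: "Read coils",
    3: "Read multiple registers",
    5: "Write single coil",
    15: "Write multiple coils",
    16: "Write multiple registers",
}


def function_code(adu: bytes) -> int:
    """Return the function code of a Modbus/TCP frame (MBAP header + PDU)."""
    if len(adu) < 8:
        raise ValueError("frame too short for MBAP header and function code")
    # MBAP header: transaction id, protocol id, length (2 bytes each), unit id (1 byte).
    transaction_id, protocol_id, length, unit_id = struct.unpack(">HHHB", adu[:7])
    if protocol_id != 0:
        raise ValueError("protocol identifier must be 0 for Modbus")
    return adu[7]


# Hand-built request: transaction 1, protocol 0, length 6, unit 1,
# function 05 writing ON (0xFF00) to coil address 0x0000.
frame = bytes.fromhex("000100000006" "01" "05" "0000" "FF00")
code = function_code(frame)
name = FUNCTION_NAMES[code]
```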

Different attack assumptions
In order to evaluate the detection accuracy for malicious attacks, we assume three different attack types against this system. The main purpose of these attacks is to destroy the normal production process by launching imitative control operations. For example, if the next control operation is changed from the normal control operation 01 to 05 in one production stage, an industrial accident may be caused because this imitative control operation has broken the continuity of the production process. Another reasonable hypothesis is that the imitative control operations only involve the above 5 different functions, because incompatible control operations can be easily filtered by current industrial firewalls [Wan, Shang, Kong et al. (2017); Cheminod, Durante, Seno et al. (2018)]. Based on the network structure of the simulated control system, the malicious attacker is assumed to be directly connected to the industrial switch and to have obtained control over it. As shown in Fig. 5, the first two attack types belong to the category of MITM (Man in The Middle) attacks, and the third attack type is a third-party injection attack. More specifically, the three attack types are defined as follows:

Definition 1. Continuous MITM attack
The malicious attacker can hijack the normal control operations from the industrial switch, and continuously modifies some normal control operations into a chain of imitative control operations. In other words, this attack type can cause a chain of irregular control operations to appear in the normal production process.

Definition 2. Random MITM attack
The malicious attacker can hijack the normal control operations from the industrial switch, and randomly modifies several normal control operations into imitative control operations. In other words, this attack type can cause the imitative control operations to spread randomly over the normal production process.

Definition 3. Continuous injection attack
As a hidden third-party adversary, the malicious attacker can launch a chain of imitative control operations, and continuously inject them into the normal control operations. In other words, this attack type can add some additional and irregular control operations into the normal production process.
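The three definitions can be mimicked on a list of function codes with a short Python sketch. It is an illustration under assumptions: the helper names are hypothetical, a modified operation is always replaced by a different function code, and the attack position is drawn uniformly at random.

```python
import random

FUNCTIONS = [1, 3, 5, 15, 16]  # the five function codes used by the system


def imitative(rng, original=None):
    """Pick a function code, different from `original` when one is given."""
    return rng.choice([code for code in FUNCTIONS if code != original])


def continuous_mitm(sequence, count, rng):
    """Replace `count` consecutive operations with imitative ones."""
    start = rng.randrange(0, len(sequence) - count + 1)
    attacked = list(sequence)
    for i in range(start, start + count):
        attacked[i] = imitative(rng, sequence[i])
    return attacked


def random_mitm(sequence, count, rng):
    """Replace `count` operations at random positions with imitative ones."""
    attacked = list(sequence)
    for i in rng.sample(range(len(sequence)), count):
        attacked[i] = imitative(rng, sequence[i])
    return attacked


def continuous_injection(sequence, count, rng):
    """Insert a chain of `count` imitative operations at one position."""
    start = rng.randrange(0, len(sequence) + 1)
    chain = [imitative(rng) for _ in range(count)]
    return list(sequence[:start]) + chain + list(sequence[start:])


rng = random.Random(1)
normal = [1, 3, 5, 1, 3, 16] * 5          # hypothetical normal control sequence
attacked_a = continuous_mitm(normal, 4, rng)
attacked_b = random_mitm(normal, 4, rng)
attacked_c = continuous_injection(normal, 4, rng)
```

The two MITM variants keep the sequence length unchanged, whereas the injection attack lengthens the sequence by the number of injected operations.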

Detection evaluation for PSO-OCSVM classifier
According to the proposed feature extraction approach, we acquire 280 normal training feature samples from the normal training data. The number of feature factors is only 51, which is much less than the theoretical maximum value $l^3 = 5^3 = 125$. By using the normal training feature samples, we obtain an optimized PSO-OCSVM classifier, and the optimal tradeoff parameter and kernel parameter are $v = 0.0114$ and $g = 12.9090$. Fig. 6 depicts the changes of two fitness curves over 200 iterations, and all fitness values are computed by using 3-fold cross validation. From this figure we can see that the best value in each iteration grows fast and converges monotonically to its optimum, which reaches 99.64%. Additionally, Fig. 7 shows the classification results for the 280 normal training feature samples, and the corresponding classification accuracy is about 97.86%. In this figure, "1" represents the normal category, and "-1" represents the abnormal category. According to the classification results, only 6 normal feature samples are misidentified as abnormal ones, and we can conclude that this classifier learns and generalizes well. For the 143 normal test feature samples extracted from the normal test data, we further evaluate the false classification rate of the PSO-OCSVM classifier. Fig. 8 plots the classification results of the 143 normal test feature samples, and the corresponding classification accuracy reaches 96.50%. Namely, only 5 normal feature samples are incorrectly classified as abnormal ones, which indicates that this classifier can ensure a low rate of false classification. In order to evaluate the detection accuracies for the three different attack types, we simulate each attack type to destroy the normal production process.
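The reported accuracies follow directly from the sample and misclassification counts, as the short Python check below shows. The helper name is hypothetical.

```python
def accuracy(total, misclassified):
    """Classification accuracy in percent."""
    return 100.0 * (total - misclassified) / total


train_accuracy = accuracy(280, 6)   # 274 of 280 training samples correct
test_accuracy = accuracy(143, 5)    # 138 of 143 test samples correct
max_feature_factors = 5 ** 3        # upper bound on triples for 5 functions
```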
For each attack type, we generate 280 malicious control sequences in one experiment, and the number of imitative control operations in each malicious control sequence is set according to the assumed attack power. For example, when a malicious attacker carries out the continuous injection attack, the attacker can continuously launch 15 imitative control operations as one attack power, and the corresponding percentage in each control sequence is about 6.03%. In practice, if the malicious attacker wants to achieve a higher success probability, an efficient way is to raise the attack power by increasing the number of imitative control operations. However, different attack powers may also have significant impacts on the detection accuracy of the PSO-OCSVM classifier. As a result, we compare the detection accuracies under different numbers of imitative control operations for each attack type. Tab. 2 shows the experimental results for the three attack types, and each average detection accuracy in this table is calculated over 6 different experiments. It is worth noting that the numbers of imitative control operations for the random MITM attack are chosen differently from those for the other two attack types, for two reasons: on the one hand, the proposed feature extraction approach is very sensitive to the random distribution of imitative control operations, that is, a small number of imitative control operations can have a significant impact on the feature value calculation; on the other hand, we focus on the trend of the average detection accuracy under an increasing number of imitative control operations, and using the same numbers for the random MITM attack would obscure this trend.
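One reading that reproduces the quoted percentage is sketched below in Python. It assumes that a control sequence has the average length of the normal training sequences (65486 operations over 280 sequences) and that the 15 injected operations are added on top of it; this interpretation is an assumption, not a statement of how the figure was derived.

```python
training_operations = 65486
training_sequences = 280
injected = 15

average_length = training_operations / training_sequences    # about 233.9 operations
share = 100.0 * injected / (average_length + injected)       # percentage after injection
```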
From this table we can see that the optimized PSO-OCSVM classifier detects the given attack types well, and we can also draw the following conclusions: (1) for all attack types, as the number of imitative control operations increases, the average detection accuracy of the PSO-OCSVM classifier grows significantly; (2) although the selected numbers of imitative control operations are not the same for all attack types, the random MITM attack still yields the highest detection accuracy, because if we set the number of imitative control operations to 12, its average detection accuracy reaches 99.88% over 6 additional experiments; (3) based on the proposed feature extraction approach, the random distribution of imitative control operations causes more significant changes of feature values, which contributes to the detection accuracy of the PSO-OCSVM classifier.

Detection evaluation for GA-BPNN classifier
By contrast, BPNN requires both normal training feature samples and malicious training feature samples, which can improve its classification capability. Based on the attack assumptions, we generate 50 malicious control sequences to extract malicious training feature samples for each attack type. It is worth mentioning that the number of malicious training feature samples is far smaller than that of normal training feature samples, because malicious attacks occur infrequently in real-world process industries. By using the above-mentioned training feature samples, we obtain an optimized GA-BPNN classifier, and Fig. 9 [...] (2) the optimized GA-BPNN classifier is most effective at detecting the random MITM attack: when the number of imitative control operations is set to 12, its average detection accuracy grows to 98.75%; (3) the proposed feature extraction approach is more sensitive to the random MITM attack due to the random distribution of imitative control operations. Moreover, by comparing the experimental results in Tabs. 2 and 3, we find that the two classifiers have their own advantages and disadvantages: firstly, the optimized PSO-OCSVM classifier obtains the highest detection accuracy, 98.81%, for the random MITM attack, but its average detection accuracy under 5 imitative control operations is well below that of the GA-BPNN classifier; secondly, the optimized GA-BPNN classifier exhibits better detection stability, because its average detection accuracies follow a relatively smooth curve; thirdly, the optimized GA-BPNN classifier has a distinct advantage in detecting the continuous MITM attack and the continuous injection attack, even though its average detection accuracy for the continuous MITM attack under 17 imitative control operations is slightly lower than that of the PSO-OCSVM classifier.
In summary, the above experimental comparisons and analysis illustrate the following two points: for one thing, the state-based control feature extraction approach can not only correctly describe the characteristics of control operations in process industries, but also effectively coordinate with the optimized classification algorithms, because both optimized classifiers have a desirable detection capability; for another, if malicious training feature samples are sufficient and diversified, we suggest the optimized GA-BPNN classifier as a serviceable detection engine to cooperate with our feature extraction approach.

Conclusion
Based on the integrality and continuity of the production process in process industries, this paper proposes a novel state-based control feature extraction approach, which treats the finite control operations as different states to construct the feature factors. The change of successive control operations can be represented by the procedure of state transition, and the statistical information between different states can be used to calculate the feature values. Additionally, this paper introduces two different classification algorithms as detection engines to indirectly evaluate the proposed feature extraction approach, and these classification algorithms are optimized into the PSO-OCSVM and GA-BPNN classifiers by using the training feature samples. By assuming three applicable attack types, we further compare the detection accuracies of these two classifiers. The experimental results show that both classifiers have a desirable detection ability, and the average detection accuracy of the GA-BPNN classifier is generally higher than that of the PSO-OCSVM classifier. In other words, the proposed feature extraction approach can effectively coordinate with the optimized classification algorithms.