Fault Diagnosis Using Kernel Principal Component Analysis for Hot Strip Mill

Fei Zhang1,2, Shengyue Zong1, Zhi Ling1
1 Institute of Engineering Technology, University of Science and Technology Beijing, Beijing 100083, People's Republic of China
2 School of Electrical, Computer & Telecommunications Engineering, University of Wollongong, Wollongong, NSW 2522, Australia
E-mail: zhfeicn@gmail.com

Published in The Journal of Engineering; Received on 16th May 2017; Accepted on 11th August 2017

Publication Details: F. Zhang, S. Zong & Z. Ling, "Fault Diagnosis Using Kernel Principal Component Analysis for Hot Strip Mill," Journal of Engineering, vol. 2017, (9) pp. 527-535, 2017. This journal article is available at Research Online: http://ro.uow.edu.au/eispapers1/1498

Abstract: In the field of hot rolling process monitoring, the activation of non-linear dynamic behaviour may render the procedure of fault diagnosis more difficult. Principal component analysis (PCA) is a popular method for diagnosis but, as it is basically a linear method, it may pass over some useful non-linear features of the system behaviour. One possible extension of PCA is kernel PCA (KPCA), which uses non-linear kernel functions to introduce non-linear dependences between variables. The objective of this study is to address the problem of fault diagnosis (in terms of non-linear activation) in a hot rolling automation system using a KPCA-based method. Detection is achieved by comparing the subspaces of the reference state and the current state of the system through the concept of subspace angle. It is shown in this work that exploiting the measurements in the form of KPCA can effectively improve the detection results.


Introduction
A significant process in the areas of manufacturing and processing of metals is the tandem rolling of hot metal strip. In the case of steel, almost one-half of the finished product made in the world is in the form of sheet and strip that originally is produced in a hot strip rolling process. While this unquestionably reflects a fundamental drive of the industry for higher efficiency production, more specific factors include operation rationalisation moves such as the synchronisation with continuous casting plant, and a series of equipment refurbishments aimed at higher product quality [1].
Hot strip rolling is a high-production, high-efficiency industrial process. Its purpose is to process cast steel slabs into steel strip with a wide range of thicknesses. Because of its huge size and the large investment involved, a hot strip mill needs to have a lifetime of several decades. The mill must be capable of meeting market demands for a wide range of steel grades, in particular high strength and advanced high strength steels with good cold formability and superior strip properties.
The primary function of the hot strip mill (HSM) is to reheat semi-finished steel slabs nearly to their melting point, roll them thinner and longer through successive rolling mill stands driven by motors, and finally coil up the lengthened steel sheet for transport to the next process. The slabs, of up to 35 t weight, are typically 250 mm thick and 10 m long, and the rolled strips are typically 2 mm thick and 1250 m long. The reheat furnace ensures that the slab is at a suitable temperature to start hot rolling, which is ∼1200°C. The next unit, the roughing mill (RM), is responsible for major reductions in slab thickness (e.g. a reduction from 200 to 30 mm). The transfer table carries the slab, now called a transfer bar, from the RM to the finishing mill (FM). The transfer bar is typically 40-90 m long and 0.5-1.5 m wide. A relatively recent development in hot strip mill design, the coilbox, is sometimes used to wrap the transfer bar into a coil in order to obtain a more uniform temperature profile along the strip. The transfer bar is then peeled and fed into the FM, which progressively squeezes the steel to make it thinner. As the steel becomes thinner, it also becomes longer and starts moving faster. Because a single piece of steel has a range of different thicknesses along its length as each section passes through a different stand, different parts of the same piece are travelling at different speeds. The dimensions of hot, thin strip, especially the width, are sensitive to tension variations. Tension variations are inevitable in tandem mills because the roll rotation speeds cannot be regulated with sufficient precision. Loopers are used to add a degree of freedom that prevents abrupt tension changes and to serve as a sensitive indicator of tension variations. A looper is a metal cylinder supported by an arm that is free to move around a pivot. By keeping the strip above the pass line, it stores the extra amount of strip that is necessary for strip tension regulation. The strip emerging from the FM is typically much longer than the runout table, so coiling starts before the tail end leaves the FM. This requires very close control of the speed at which each individual stand rolls, and the entire process is controlled by computer. By the time it reaches the end of the mill, the steel is travelling at about 40 miles per hour. Afterwards it is cooled intensively by water sprays from coolant headers along the runout table and then wound up in a coiler, which is essentially a rotating mandrel. During this part of the process, piece temperatures are important and should typically be 870°C after the last rolling stand and 600°C at the coiler [2]. At the same time, a series of quality control measures, such as automatic gauge (thickness) control (AGC), automatic width control, automatic shape control and automatic temperature control, are implemented to guarantee the superior quality of the products.
The outline of a typical 1700 mm HSM is illustrated in Fig. 1, where R1 is the No. 1 stand of the RM; HSB and FSB are the high-pressure descaling box and finishing descaling box, respectively; E1 and FE1 are the roughing edger mill and finishing edger mill, respectively; CB and CS are the coil box and crop shear, respectively; RET and RDT represent the RM entry and delivery temperatures, respectively; FET and FDT represent the FM entry and delivery temperatures, respectively; and CT represents the coiling temperature.
Due to the demanding dimensional quality requirements and the multitude of operating objectives, the FM has over time become a complex unit equipped with a high level of automation [3]. The surface quality, internal defects, shape, thickness, width and microstructure of the hot-rolled strip directly affect the quality of downstream processing products. Throughout the HSM automation system, there are 55 process variables affecting the surface quality, 40 process variables affecting the internal defects, 36 process variables affecting the three coupled variables of strip shape, plate thickness and plate width, and up to 108 process variables affecting the microstructure and properties of the material [4]. These process variables include temperature, roll force, displacement, thickness, width, velocity, pressure, tension, torque, current and so on. The process variables are coupled to each other. For example, a change of roll gap affects the strip exit thickness, and the change in exit thickness in turn affects the forward slip; at the same time, the forward slip is also affected by the reduction, the friction coefficient between the roll and the rolled piece, and other factors. Besides this mutual coupling, the rolling process variables are also characterised by non-linearity, uncertainty, large scale and multiple scales. The rolling mill is therefore a typical non-linear system, with strongly non-linear relationships between the process variables.
The complexity of rolling technology and the high temperature, high pressure and high speed of the hot rolling process result in a relatively high failure rate of the automation system, and even a tiny fault in the production process may cause significant economic losses. In order to guarantee the quality of the products and keep the rolling process in continuous and reliable operation, engineers and operators need real-time monitoring of the running conditions of the whole rolling process, especially of abnormal conditions that may affect the quality of products and the safety of personnel and equipment.
The increasing pressure of competition and the tighter market are forcing plant users to seek optimum utilisation of their facilities, improved product quality and an extended product mix. All these presuppose a high availability of the plants and the avoidance of unforeseen failures. Moreover, maintenance costs have to be reduced to a minimum in order to remain competitive. For a long time, because the occurrence of faults could not be predicted, operators had to rely on two measures. The first is corrective maintenance after the equipment has failed, which leads to large economic losses, high maintenance costs and even casualties. The second is regular scheduled maintenance, which is planned and preventive but largely blind to the actual condition of the equipment, and may easily lead to 'over repair' or 'under repair'. Fault diagnosis technology has developed alongside the evolution of equipment technology and the scientific development of maintenance activities, and has brought equipment maintenance into the stage of condition-based maintenance.
The prompt detection and precise diagnosis of faults have become a main requirement for the safe, optimal and profitable operation of any enterprise. For that reason, the problem of fault detection and diagnosis for industrial processes has received considerable attention during the last two decades. Despite the non-linear characteristics of most industrial processes, the majority of fault diagnosis methods are linear in nature, e.g. principal component analysis (PCA). PCA is the most widely used data-driven technique for process monitoring since it can effectively deal with high-dimensional, noisy and highly correlated data by projecting the data onto a lower-dimensional subspace which contains most of the variance of the original data [5]. However, PCA sometimes shows quite degraded performance when the process exhibits strong non-linear correlations between its variables [6], and most of the existing non-linear PCA approaches are based on neural networks, which have to solve a non-linear optimisation problem [7]. Among the non-linear PCA techniques, KPCA, developed by Schölkopf, has attracted attention because it does not involve non-linear optimisation [8], it is as simple as linear PCA, and, unlike other non-linear methods, it does not require the number of principal components (PCs) to be specified prior to modelling [9,10]. The core idea of kernel PCA (KPCA) is to first map the data space into a feature space using a non-linear mapping and then compute the PCs in the feature space. It should also be noted that KPCA only requires the solution of an eigenvalue problem and, since it can incorporate different kernel functions, it can handle a wide range of non-linearities. The main advantage of the KPCA method over the non-linear PCA approaches referred to above is that no non-linear optimisation is involved [7]. Lee et al. [11] used this method for non-linear process monitoring. Hiden et al. [12] suggested non-linear PCA using genetic programming. Shao et al. [13] proposed a non-linear PCA based upon an input-training neural network. Cheng et al. [14] used adaptive KPCA to monitor small disturbances of non-linear processes. Zhang [15] integrated KPCA and kernel independent component analysis for monitoring the Gaussian part and the non-Gaussian part of a process, respectively, and further used a support vector machine to classify the fault types. Zhang et al. [16] used kernel partial least squares for non-linear process monitoring.
Although many approaches have been used for fault diagnosis of the hot rolling process [17-23], no application of KPCA has been reported. This paper focuses on the development and application of KPCA techniques to address these concerns specifically within the steel industry. The motivation for applying empirically based methodologies lies in the fact that monitoring and fault diagnosis can be carried out through process representations (models) which do not require the expert development of phenomenological models. KPCA is capable of efficiently modelling the non-linear relationships that exist between sensor measurements and quality variables. In this research, we use the KPCA approach for hot rolling fault diagnosis, and the analysis of the test results indicates that the KPCA method is more effective than the PCA method.
This paper is organised as follows. Section 2 explains KPCA and its properties. In Section 3, the KPCA-based fault diagnosis to a HSM is presented and discussed. Finally, Section 4 provides concluding remarks.

Algorithm of KPCA
As a simple linear transformation technique, PCA compresses high-dimensional data into a low-dimensional representation with minimum loss of information. When the algorithm is carried out in the feature space, KPCA is obtained. KPCA is a type of kernel-based learning machine. Its basic idea, which is both intuitive and generic, is to first map the input data x into a feature space F via a non-linear mapping Φ, and then perform a linear PCA in F. However, it is difficult to do so directly because the dimension h of the feature space F can be arbitrarily large or even infinite. In implementation, the implicit feature vector in F does not need to be computed explicitly; only the inner product of two vectors in F is computed, by means of a kernel function [24].
Given an initial data matrix, X, representing n observations of m variables as

$$X = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^{n \times m}$$

where $x_i$ ($i = 1, 2, \ldots, n$) are the n observations (column vectors) normalised from the n training samples $x_i^*$ ($i = 1, 2, \ldots, n$) of the input space. By the non-linear mapping Φ, the measured inputs are extended into the hyper-dimensional feature space as follows:

$$\Phi: \mathbb{R}^m \to F, \qquad x_i \mapsto \Phi(x_i)$$

The mapping of $x_i$ is simply noted as $\Phi(x_i) = \Phi_i$. Assuming for now that the mapped data are centred, the sample covariance in the feature space can be constructed by

$$C = \frac{1}{n} \sum_{i=1}^{n} \Phi_i \Phi_i^T$$

where the non-zero eigenvalues of the covariance matrix C are positive. A PC v is then computed by solving the eigenvalue problem

$$\lambda v = C v \tag{5}$$

where λ denotes an eigenvalue and v an eigenvector of the covariance matrix. Here Cv can be represented as

$$C v = \frac{1}{n} \sum_{i=1}^{n} \langle \Phi_i, v \rangle \Phi_i \tag{6}$$

where $\langle x, y \rangle$ denotes the dot product between x and y. This implies that all solutions v with λ ≠ 0 must lie in the span of $\Phi_1, \Phi_2, \ldots, \Phi_n$. Hence (5) is equivalent to

$$\lambda \langle \Phi_k, v \rangle = \langle \Phi_k, C v \rangle, \qquad k = 1, 2, \ldots, n \tag{7}$$

For any λ ≠ 0, there exist coefficients $\alpha_i$ ($i = 1, 2, \ldots, n$) such that

$$v = \sum_{i=1}^{n} \alpha_i \Phi_i \tag{8}$$

Then (9) can be deduced from (7) and (8):

$$\lambda \sum_{i=1}^{n} \alpha_i \langle \Phi_k, \Phi_i \rangle = \frac{1}{n} \sum_{i=1}^{n} \alpha_i \left\langle \Phi_k, \sum_{j=1}^{n} \Phi_j \langle \Phi_j, \Phi_i \rangle \right\rangle \tag{9}$$

To obtain the coefficients $\alpha_i$ ($i = 1, 2, \ldots, n$), a kernel matrix K of dimension n × n is defined, and its elements are determined by virtue of the kernel trick

$$K_{ij} = \langle \Phi_i, \Phi_j \rangle = k(x_i, x_j) \tag{10}$$

where $k(x_i, x_j)$ is the inner product of two vectors in F, calculated with a kernel function. A number of kernel functions exist. According to Mercer's theorem of functional analysis, there exists a mapping into a space where a kernel function acts as a dot product if the kernel function is a continuous kernel of a positive integral operator. The kernel function is therefore required to satisfy Mercer's theorem [25]. The representative kernel functions are as follows:

Polynomial kernel

$$k(x, y) = \langle x, y \rangle^d \tag{11}$$

Sigmoid kernel

$$k(x, y) = \tanh(\beta_0 \langle x, y \rangle + \beta_1) \tag{12}$$

Radial basis kernel

$$k(x, y) = \exp\left(-\frac{\|x - y\|^2}{c}\right) \tag{13}$$

where d, $\beta_0$, $\beta_1$ and c are specified a priori by the user. The polynomial kernel and the radial basis kernel always satisfy Mercer's theorem, while the sigmoid kernel satisfies it only for some values of $\beta_0$ and $\beta_1$ [26].

The specific choice of a kernel function implicitly determines the mapping Φ and the feature space F. If prior knowledge about the non-linearity of the process is available, it can be used to select the kernel function. Before applying KPCA, mean centring and variance scaling in the high-dimensional space should be performed. Then (9) can be simplified to

$$n \lambda \alpha = K \alpha \tag{14}$$

where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_n]^T$ identifies the eigenvector v after normalisation. Notice that the mean-centring procedure has to be performed before applying KPCA, since the Gram matrix used for the above eigenvalue problem is not mean-centred. The centred Gram matrix $\bar{K}$ can be easily obtained by

$$\bar{K} = K - 1_n K - K 1_n + 1_n K 1_n \tag{15}$$

with

$$1_n = \frac{1}{n} \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{bmatrix} \in \mathbb{R}^{n \times n} \tag{16}$$

From (14)-(16), the final eigenvalue problem in the KPCA approach is to solve

$$n \lambda \alpha = \bar{K} \alpha \tag{17}$$

The coefficient vector α should be normalised to satisfy $\|\alpha\|^2 = 1/(n\lambda)$, which corresponds to the normality constraint $\|v\|^2 = 1$ of the eigenvector.
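The steps above (kernel matrix, centring, eigenvalue problem and normalisation) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the study: the radial basis kernel form exp(-||x - y||^2 / c), the width c = 4.0, the tolerance `tol`, the random stand-in data and all function names are assumptions made for the example.

```python
import numpy as np

def rbf_kernel(A, B, c):
    """Radial basis kernel exp(-||x - y||^2 / c) for every pair of rows (assumed form)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / c)

def kpca_fit(X, c, tol=1e-8):
    """Solve the centred kernel eigenproblem for normalised data X (n x m)."""
    n = X.shape[0]
    K = rbf_kernel(X, X, c)
    one_n = np.full((n, n), 1.0 / n)                 # n x n matrix with entries 1/n
    K_bar = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    mu, A = np.linalg.eigh(K_bar)                    # eigenvalues of K_bar, ascending
    mu, A = mu[::-1], A[:, ::-1]                     # reorder to descending
    keep = mu > tol * mu[0]                          # drop (near-)zero eigenvalues
    mu, A = mu[keep], A[:, keep]
    lam = mu / n                                     # eigenvalues of the covariance C
    alpha = A / np.sqrt(mu)                          # so that ||alpha||^2 = 1/(n*lambda)
    return K_bar, lam, alpha

# Hypothetical stand-in data: 50 observations of 4 variables, normalised per column.
rng = np.random.default_rng(0)
X_raw = rng.standard_normal((50, 4))
X = (X_raw - X_raw.mean(0)) / X_raw.std(0)

K_bar, lam, alpha = kpca_fit(X, c=4.0)
# alpha^T K_bar alpha gives the inner products <v_j, v_k>; it should be the identity.
gram_v = alpha[:, :5].T @ K_bar @ alpha[:, :5]
print(np.round(np.diag(gram_v), 6))
```

Dividing each unit-norm eigenvector of the centred kernel matrix by the square root of its eigenvalue is what enforces the unit-norm constraint on v without ever forming v explicitly.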
Cumulative percent variance (CPV) is utilised to determine the number d of PCs, i.e.

$$\mathrm{CPV}(d) = \frac{\sum_{i=1}^{d} \lambda_i}{\sum_{i=1}^{n} \lambda_i} \times 100\% \geq \mathrm{CL}$$

where CL is the control limit. The dimension reduction can then be achieved by retaining the first d eigenvectors. After constructing the PCs in the feature space F, the kth score of a sample x is obtained by projecting its centred image $\bar{\Phi}(x)$ onto the eigenvector $v_k$ in F, where k = 1, 2, . . . , d, such that

$$t_k = \langle v_k, \bar{\Phi}(x) \rangle = \sum_{i=1}^{n} \alpha_i^k \langle \bar{\Phi}(x_i), \bar{\Phi}(x) \rangle$$

where the mapping of x is simply noted as Φ(x) = Φ and $\alpha_i^k$ is the ith coefficient of the kth eigenvector. Evaluating this projection for k = 1, 2, . . . , d, we finally obtain a score vector $t = [t_1, t_2, \ldots, t_d]^T$ for x.
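The selection of d by CPV and the projection of a sample onto the retained PCs can be illustrated as follows. This is a hedged sketch: the radial basis kernel, its width, the 85% control limit, the random data and the helper names `fit` and `scores` are all assumptions for the example, and the centring of the new kernel vector follows the same pattern as the centring of the training Gram matrix.

```python
import numpy as np

def rbf(A, B, c):
    """Radial basis kernel exp(-||x - y||^2 / c) for every pair of rows (assumed form)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / c)

def fit(X, c, cl=0.85, tol=1e-8):
    """Fit KPCA and keep the smallest d whose cumulative variance fraction reaches cl."""
    n = X.shape[0]
    K = rbf(X, X, c)
    one_n = np.full((n, n), 1.0 / n)
    K_bar = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    mu, A = np.linalg.eigh(K_bar)
    mu, A = mu[::-1], A[:, ::-1]                     # descending eigenvalues
    keep = mu > tol * mu[0]
    mu, A = mu[keep], A[:, keep]
    cpv = np.cumsum(mu) / mu.sum()                   # same ratio as for lambda = mu / n
    d = int(np.searchsorted(cpv, cl) + 1)            # first index with cpv >= cl, plus one
    alpha = A[:, :d] / np.sqrt(mu[:d])
    return {"X": X, "K": K, "c": c, "alpha": alpha, "lam": mu[:d] / n, "d": d, "cpv": cpv}

def scores(model, X_new):
    """Project samples (rows of X_new) onto the d retained PCs in feature space."""
    X, K = model["X"], model["K"]
    n = X.shape[0]
    k_new = rbf(X_new, X, model["c"])                # kernel values against training data
    one_n = np.full((n, n), 1.0 / n)
    one_t = np.full((X_new.shape[0], n), 1.0 / n)
    k_bar = k_new - one_t @ K - k_new @ one_n + one_t @ K @ one_n
    return k_bar @ model["alpha"]                    # one score vector t per row

# Hypothetical stand-in data.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 4))
X = (X - X.mean(0)) / X.std(0)

model = fit(X, c=4.0, cl=0.85)
T_train = scores(model, X)
print(model["d"], T_train.shape)
```

On the training data, each score has zero mean and a mean square equal to the corresponding eigenvalue, which is a convenient sanity check for the centring and normalisation.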
For the special case in which Φ(x) = x, KPCA is equivalent to linear PCA. From this viewpoint, KPCA can be regarded as a generalised version of linear PCA.

Online monitoring procedure
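Generally, fault diagnosis based on PCA uses two statistics, Hotelling's $T^2$ and the squared prediction error (SPE). Hotelling's $T^2$ represents the Mahalanobis distance of an observation within the PCA model subspace, whereas SPE represents the Euclidean distance from the model subspace. Such statistics in the feature space can be obtained using energy decomposition [27,28]. The observations obtained from a significantly non-linear process are highly non-Gaussian due to the non-linearity. Hence, the non-linear mapping to the higher-dimensional feature space is formulated such that the training samples conform to a Gaussian distribution after the non-linear mapping. The distribution of the mapped training data can then be estimated by a normal probability density in F. The corresponding energy, represented by the negative logarithm of the probability, is given (up to scaling and an additive constant) by

$$E(\Phi) = \bar{\Phi}^T \hat{C}^{-1} \bar{\Phi} \tag{20}$$

where the mapping of x is simply noted as Φ(x) = Φ, $\bar{\Phi}$ is its centred value and $\hat{C}$ denotes the regularised covariance matrix, which is calculated as follows:

$$\hat{C} = V \Lambda V^T + \lambda_\perp (I - V V^T)$$

where $V = [v_1, v_2, \ldots, v_d]$ collects the retained eigenvectors, Λ is a diagonal matrix of the eigenvalues associated with the retained PCs, $\lambda_\perp$ is a constant value which replaces all zero or near-zero eigenvalues in C for regularisation, and I is an identity matrix. Substituting the regularised covariance matrix into (20), we can decompose the energy into two parts, one a Mahalanobis distance in the KPCA space and the other a Euclidean distance from the model subspace, as follows:

$$E(\Phi) = \bar{\Phi}^T V \Lambda^{-1} V^T \bar{\Phi} + \frac{1}{\lambda_\perp} \left( \bar{\Phi}^T \bar{\Phi} - \bar{\Phi}^T V V^T \bar{\Phi} \right)$$

Therefore, with $t_{new} = V^T \bar{\Phi}(x_{new})$, the two monitoring statistics for a new observation $x_{new}$, $T^2_{new}$ and $\mathrm{SPE}_{new}$, can be written as follows:

$$T^2_{new} = t_{new}^T \Lambda^{-1} t_{new}$$

$$\mathrm{SPE}_{new} = \bar{\Phi}(x_{new})^T \bar{\Phi}(x_{new}) - t_{new}^T t_{new}$$

The confidence limit for $T^2$ can be determined by the F-distribution

$$T^2_{lim} = \frac{d(n-1)}{n-d} F_{d,\,n-d,\,\alpha}$$

where $F_{d,\,n-d,\,\alpha}$ denotes the critical value of an F-distribution with d and n−d degrees of freedom at the 100(1−α)% confidence level. For SPE, we used the same rule as in linear PCA to obtain the 100(1−α)% control limit as follows:

$$\mathrm{SPE}_{lim} = \theta_1 \left[ \frac{c_\alpha \sqrt{2 \theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} \right]^{1/h_0}, \qquad \theta_i = \sum_{j=d+1}^{D} \lambda_j^i \; (i = 1, 2, 3), \qquad h_0 = 1 - \frac{2 \theta_1 \theta_3}{3 \theta_2^2}$$

where $c_\alpha$ is the standard normal deviate corresponding to the upper (1−α) percentile and D denotes the effective dimension of the feature space discussed below.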
Because the dimension of feature space is arbitrarily high, the control limit of SPE may be unrealistically large. Hence the effective dimension of feature space D is empirically determined as the smallest number of the ordered eigenvalues whose cumulative sum is above 99% of the sum of all eigenvalues.
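A minimal sketch of the monitoring statistics follows. It assumes that $T^2$ is the sum of squared scores scaled by their eigenvalues, that SPE is the squared norm of the centred feature vector minus the squared norm of its retained scores, and that the linear-PCA rule for the SPE limit is the Jackson-Mudholkar approximation. The kernel form and width, the choice d = 3, the random data and all function names are further assumptions for the illustration.

```python
import numpy as np
from statistics import NormalDist

def rbf(A, B, c):
    """Radial basis kernel exp(-||x - y||^2 / c) for every pair of rows (assumed form)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / c)

def fit(X, c, d):
    n = X.shape[0]
    K = rbf(X, X, c)
    one_n = np.full((n, n), 1.0 / n)
    K_bar = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    mu, A = np.linalg.eigh(K_bar)
    mu, A = np.clip(mu[::-1], 0.0, None), A[:, ::-1]  # descending, tiny negatives zeroed
    lam = mu / n
    # Effective dimension D: smallest count of eigenvalues reaching 99% of their sum.
    D = int(np.searchsorted(np.cumsum(lam) / lam.sum(), 0.99) + 1)
    alpha = A[:, :d] / np.sqrt(mu[:d])
    return {"X": X, "K": K, "c": c, "alpha": alpha, "lam": lam, "d": d, "D": D}

def monitor(model, X_new):
    """Return (T2, SPE) for each row of X_new."""
    X, K, d = model["X"], model["K"], model["d"]
    n = X.shape[0]
    k_new = rbf(X_new, X, model["c"])
    one_n = np.full((n, n), 1.0 / n)
    one_t = np.full((X_new.shape[0], n), 1.0 / n)
    T = (k_new - one_t @ K - k_new @ one_n + one_t @ K @ one_n) @ model["alpha"]
    t2 = (T ** 2 / model["lam"][:d]).sum(1)
    # Squared norm of the centred feature vector; k(x, x) = 1 for the RBF kernel.
    k_self = 1.0 - 2.0 * k_new.mean(1) + K.mean()
    return t2, k_self - (T ** 2).sum(1)

def spe_limit(lam, d, D, conf=0.99):
    """Jackson-Mudholkar limit from eigenvalues d+1..D (assumed to be the intended rule)."""
    if D <= d:
        raise ValueError("effective dimension D must exceed d")
    res = lam[d:D]
    th1, th2, th3 = (float(np.sum(res ** i)) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * th1 * th3 / (3.0 * th2 ** 2)
    z = NormalDist().inv_cdf(conf)
    base = z * np.sqrt(2.0 * th2 * h0 ** 2) / th1 + 1.0 + th2 * h0 * (h0 - 1.0) / th1 ** 2
    return th1 * base ** (1.0 / h0)

# Hypothetical stand-in data.
rng = np.random.default_rng(2)
X = rng.standard_normal((60, 4))
X = (X - X.mean(0)) / X.std(0)

model = fit(X, c=4.0, d=3)
t2_train, spe_train = monitor(model, X)
limit = spe_limit(model["lam"], model["d"], model["D"])
print(model["D"], float(limit))
```

The $T^2$ limit additionally needs an F-distribution quantile, which is not in the Python standard library; a statistics package such as SciPy (`scipy.stats.f.ppf`) could supply it.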

KPCA-based fault diagnosis to a HSM
If the fault diagnosis of HSM is to be carried out, the data acquisition system of the monitoring signal should be set up at first, and different acquisition techniques and methods should be adopted according to different signal types.

Signal selection
There are hundreds of control and monitoring signals for a single stand of the HSM, so it is impractical to use all of them. AGC is the most important mechanism for dynamic thickness control in conventional rolling mills. Since the AGC system is responsible for maintaining the dynamic performance of the predicted thickness quality, it is designed to suppress disturbances during the rolling process, such as hardness and temperature fluctuations of the strip. Extremely sophisticated algorithms have been developed to fulfil the task of dynamic thickness control; however, the philosophy and principles for tuning the AGC control gains become very complicated and difficult for operators when the quality of a specific product and process is to be improved [29]. The rolling stands of the HSM are shown in Fig. 2, from which it can be seen that the hydraulic actuators, known as roll force cylinders, are installed above the top backup roll chocks. The hydraulic actuator is the main equipment used to implement AGC.
Considering the main factors affecting the gauge, we select process variables including roll gap, roll force, roll bending force and delivery thickness. Here in order to verify the effectiveness of the algorithm, we only select the 19 AGC-related signals or measured variables which are illustrated in Table 1. As shown in Fig. 3, all signals connected to the control cabinets are received by programmable automation controllers.

Data acquisition
Measurements on all process variables, including those up to the FM, were recorded for each coil rolled over a one week period of production. For this study, the manufacture of a single grade of steel coil on a six-stand FM was considered.
The first stage was to carry out data pre-screening to eliminate or correct data anomalies such as missing data and outliers. The data were then divided into two sub-sets, $X_M$ and $X_V$. $X_M$ comprised data on the next-of-batch coils, while $X_V$ included the first-of-batch coils. Previous statistical studies have shown that the variation in the quality parameters was greater for the first-of-batch coils than for later coils in the batch. In terms of out-of-specification product, the variation is small; however, in economic terms, the benefits of tighter control limits can yield significant savings. Since data set $X_M$ consisted of 612 coils rolled with the correct roll gap settings, corresponding to products that met customers' specifications, it was used to generate a nominal process representation. Data set $X_V$, with 224 first-of-batch coils, formed the test data on which the proposed techniques were evaluated.
Measured data is collected by ibaPDA (Process Data Acquisition) system from programmable automation controllers. The ibaPDA system is a PC-based acquisition and analysis system for measured values. It is made up of distributed hardware components for signal acquisition, connections via optical fibres and other media, such as PROFIBUS-DP, boards for standard PCs or notebooks, as well as online recording software and offline analysis software. Besides recording, the online software also offers a user-friendly visualisation function for an unlimited number of channels with ongoing line diagram presentation, similar to a recorder.
The ibaPDA system features a modular design with heterogeneously structured signal acquisition lines. ibaPDA is capable of processing a very large number of channels in a uniform and synchronous manner. This makes the ibaPDA system particularly suitable for distributed and multiple systems. The measured data gathered is saved in files on the hard disk of the online PC or on special file servers. Fig. 4 shows the system topology with one server and multiple clients. The server is a basic process in ibaPDA which handles the data acquisition and storage, and it can run independently from and without a client.

Fault diagnosis
The substantial growth in the use of automated in-process sensing technologies creates great opportunities for manufacturers to detect abnormal manufacturing processes and identify the root causes quickly. It is critical to locate and distinguish two types of faults: process faults and sensor faults. A sensor is a window onto the process state, and its effectiveness is the prerequisite and basis for the implementation of process control, process optimisation and fault diagnosis. Dependable sensor data are vital in complex systems, which rely on a suite of sensors for control as well as condition monitoring. With any unanticipated deviation in sensor values, the challenge is to determine whether the anomaly is the result of one or more flawed sensors or whether it is indicative of a potentially more serious system-level fault. This paper mainly studies sensor faults.
Sensor fault diagnosis is also known as instrument fault diagnosis, instrument validation, sensor calibration and so on. A sensor (instrument) usually includes a sensing device, a transducer, a signal processing unit and a communication interface. Any of these parts may be faulty, in which case the deviation between the sensor output signal and the actual value of the variable (nominal value) exceeds the allowable range. Sensor faults are classified into four types: bias, drift, precision degradation and complete failure [30].
Signal curves of the HSM in normal production are illustrated in Fig. 5, where the curves of the roll gap, roll force and roll bending force refer to the six FM stands, and the delivery thickness refers only to the FM delivery thickness. These signals play important roles in the hot rolling process. Sensor faults of the roll gap, roll force, roll bending force and delivery thickness will seriously deteriorate the final quality of the hot rolled products if not detected. Various sensor faults may occur in the rolling process, and different types of sensor faults were identified through analysis of the historical data. In order to demonstrate the fault recognition effects of KPCA, four different faults corresponding to the above mentioned four types, respectively, are selected for validation. As shown in Fig. 6, the faults during the rolling process include: (1) a step bias of roll force sensor of F2, where Fi represents the ith stand of FM. All faults are actual faults, taken from the data collected by the ibaPDA system, which had a serious impact on the quality of the products during production. These faults have been confirmed by the field engineer.
In all cases, the sampling interval is 0.01 s, and the rolling processes are displayed for 16 s, with the faults starting at the 8th second. Then 1600 data points are collected to use for KPCA modelling in the 16 s production period.
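To make the four sensor fault types and the timing concrete, the sketch below injects each type into a synthetic signal sampled at 0.01 s for 16 s, with the fault starting at the 8th second. It is purely illustrative: the faults in the study are real recorded faults, whereas the signal, the fault magnitudes, the choice of a zero reading for complete failure and the function name `inject_sensor_fault` are hypothetical.

```python
import numpy as np

def inject_sensor_fault(signal, kind, start, rng, size=1.0):
    """Return a copy of `signal` with a synthetic sensor fault from index `start` on."""
    y = signal.copy()
    n_fault = len(y) - start
    if kind == "bias":                       # constant offset added to the reading
        y[start:] += size
    elif kind == "drift":                    # offset growing linearly with time
        y[start:] += size * np.arange(n_fault) / n_fault
    elif kind == "precision":                # extra zero-mean measurement noise
        y[start:] += rng.normal(0.0, size, n_fault)
    elif kind == "failure":                  # sensor output stuck at a fixed value
        y[start:] = 0.0
    else:
        raise ValueError(f"unknown fault kind: {kind}")
    return y

dt = 0.01                                    # sampling interval in seconds
n_samples = int(round(16.0 / dt))            # samples in the 16 s window
fault_start = int(round(8.0 / dt))           # index of the 8th second

rng = np.random.default_rng(3)
t = np.arange(n_samples) * dt
clean = 10.0 + 0.1 * np.sin(2.0 * np.pi * 0.5 * t)   # hypothetical sensor signal

faulty = {kind: inject_sensor_fault(clean, kind, fault_start, rng, size=0.5)
          for kind in ("bias", "drift", "precision", "failure")}
print(n_samples, fault_start)
```

Signals built this way are only useful for exercising a monitoring pipeline; they do not replace validation on recorded plant faults.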
The SPE and $T^2$ charts for KPCA and PCA monitoring of the rolling process are shown in Figs. 7a-d. It is evident from these charts that KPCA detects the faults more accurately than PCA, for both the $T^2$ and the SPE statistics. The above study demonstrates that, compared to PCA, KPCA can effectively capture the non-linear relationships among process variables, and its application to hot rolling process monitoring shows better performance than PCA. These fault identification results support the validity of the KPCA approach. However, this method should not be overestimated, because sometimes fault cannot be

Conclusions
In this paper, a fault diagnosis method based on KPCA was formulated for supervising hot rolling processes. Thanks to its capability to process data in a non-linear way, KPCA presents some advantages with respect to other methods such as PCA, which are basically linear procedures. The sensitivity of KPCA to the onset of non-linear dynamic behaviour has been illustrated using measured data from an HSM with non-linear characteristics. The method provides a powerful tool for ensuring consistent high-quality products while helping operators in their decision making. The importance of manufacturing zero-defect coils in the face of increasing global competition in the steel processing industries should not be underestimated.
The main drawback of KPCA (with respect to PCA) is the computation time. Since the number of measured signals is often much smaller than the number of time samples, PCA may be performed economically using singular value decomposition of the observation matrix, and the size of the PC matrix does not exceed the number of measurements. In KPCA, the data are first mapped to a high-dimensional feature space, and the resulting eigenproblem has a dimension equal to the number of time samples. For this reason, the eigenproblem to be solved in the feature space may be costly if this dimension is too high, and the length of the signals has to be limited to avoid overly demanding computation.
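The difference in problem size can be made concrete with a small sketch. It uses the dimensions mentioned in this paper (1600 samples, 19 signals) on random stand-in data, and it assumes a linear kernel, the special case in which KPCA coincides with linear PCA, so that the two routes can be checked against each other. The data and variable names are hypothetical.

```python
import numpy as np

n, m = 1600, 19                              # time samples, measured signals
rng = np.random.default_rng(4)
X = rng.standard_normal((n, m))              # hypothetical stand-in for measured data
Xc = X - X.mean(0)

# Linear PCA: SVD of the n x m observation matrix yields at most m eigenvalues.
s = np.linalg.svd(Xc, compute_uv=False)
pca_eigs = s ** 2 / n

# Kernel route with a linear kernel: an n x n Gram matrix has to be built and
# decomposed, even though only m of its eigenvalues are non-zero here.
K = X @ X.T
K_bar = K - K.mean(0)[None, :] - K.mean(1)[:, None] + K.mean()   # centred Gram matrix
kpca_eigs = np.linalg.eigvalsh(K_bar)[::-1][:m] / n

print(Xc.shape, K_bar.shape, K_bar.size)
```

The kernel route stores n² entries and solves an n × n symmetric eigenproblem, whose cost grows roughly with n³, while the SVD route works on the n × m matrix directly; this is why the signal length has to be limited for KPCA.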

Acknowledgments
This work was partially supported by Innovation Method Fund of China (no. 2016IM010300) and National Natural Science Funds of China (no. 51404021).