Studies on fault diagnosis of dissolved oxygen sensor based on GA-SVM

The present research analyzed dissolved oxygen faults in a water quality monitoring system using a genetic algorithm-support vector machine (GA-SVM). The real-time data collected by the dissolved oxygen sensor were classified by fault type, the fault types being complete failure fault, impact fault and constant output fault. Based on this fault classification of the dissolved oxygen parameter, SVM fault diagnosis experiments were conducted. The experimental results show that the diagnostic accuracy for dissolved oxygen was 98.53%. Compared with the back propagation (BP) neural network, SVM gave better diagnosis results for the dissolved oxygen parameter. The genetic algorithm (GA) was used to optimize the SVM parameters C and g. C is the penalty coefficient, which balances the two optimization objectives (margin width and training error) and thus sets the tolerance for errors; g is a parameter of the kernel function that implicitly determines the distribution of the data after mapping to the new feature space. After iteration, the optimized values of C and g were found to be 2.1649 and 5.3312, respectively. The experimental results showed that the method achieved good accuracy.


Introduction
Dissolved oxygen is necessary for the survival of aquatic organisms, and its concentration is an important indicator for evaluating the quality of the water environment [1]. With the continuous development of aquaculture, the number of sensors measuring various parameters has increased, and sensor failures during use have become frequent, including seal failure, material cracking, performance degradation and data monitoring failures. These problems are collectively called sensor failures [2]. In an aquaculture water quality monitoring system, a sensor failure can cause misdiagnosis and false alarms across the entire system, and a long-term failure can cause the whole monitoring system to fail, affecting economic benefits. The proper functioning of the sensors therefore has a decisive effect on the whole water quality monitoring system, which makes fault diagnosis of the sensors themselves particularly important [3].
Sensors have become widely used tools in agricultural monitoring and electronic industrial applications, including water industry production, ocean exploration, environmental protection, resource investigation, medical diagnosis and biological engineering. Fault diagnosis methods [4] can be grouped into three categories: signal processing-based fault detection and diagnosis [5], model-based fault detection and diagnosis, and knowledge-based fault detection and diagnosis [6]. Model-based methods can be further divided into the parameter estimation method [7], the state estimation method [8] and the equivalent (parity) space method [9]. Signal processing-based methods apply signal processing techniques to the measured signals of equipment or measurement systems to extract fault characteristics and thereby detect faults [10]. Knowledge-based methods require a large amount of historical measurement data and include programs that reason with expert knowledge. Whereas traditional knowledge-based diagnosis relies on experience, artificial intelligence and related techniques can analyze historical data to diagnose faults in a system or piece of equipment. An important feature of this approach is that it extracts characteristic information by training on historical data and derives the diagnosis from the output.
In recent years, there have been many research achievements in sensor fault diagnosis for water quality monitoring. The work in [11] was the first to study fault diagnosis technology and developed a fault diagnosis expert system, starting a new wave of fault diagnosis research. Subsequently, Wang et al. [12] developed a microcomputer-based condition monitoring and fault diagnosis device for rotating machinery and verified its feasibility. Zhou et al. [13] took a different approach, analyzing the signal characteristics in the presence of a fault to develop a signal processing-based fault diagnosis system with good results. For water quality monitoring systems, Wang et al. [14] used deep learning to diagnose and analyze faults of the polarographic dissolved oxygen sensor and showed through comparison that it performed well. In 2014, Wang et al. [15] proposed a seawater quality detection technique based on ultraviolet-visible spectroscopy, using partial least squares regression on the spectra for modeling, and realized automatic compensation and predictive analysis during sensor use.

Experimental materials
Dissolved oxygen (DO) is the amount of oxygen dissolved in water, expressed in milligrams of oxygen per liter of water. In aquaculture, the DO content is an important water quality parameter and an important indicator of the degree of water pollution. The polarographic sensor is an electrochemical sensor: it determines the DO concentration through a redox reaction. Platinum is used as the cathode, silver/silver chloride is generally used as the anode, and potassium chloride solution serves as the electrolyte. The electrodes are separated from the water by a selectively permeable membrane through which dissolved oxygen passes; the membrane is generally made of materials such as polyvinyl chloride. A voltage of 0.5 to 1.5 V is applied between the two electrodes to drive a current, so the DO concentration in the water can be obtained by measuring this current [16,17].
The sensor used in this study is a membrane-covered polarographic DO sensor: a chamber filled with electrolyte and wrapped with a selectively permeable membrane, as shown in Figure 1. Inside the sensor are a cathode and an anode; the cathode is platinum, the anode is silver, and the electrolyte contains potassium chloride and other salts. When an external voltage of 0.8 V is applied, oxygen that diffuses through the membrane is reduced at the cathode, producing a current, so the probe current determines the DO content of the water. The accuracy of this sensor is ±0.4% and its measurement range is 0 to 20 mg/L. The DO data were collected with this polarographic sensor through a water quality monitoring system consisting mainly of the sensors, a GPRS wireless transmission module, a ZigBee wireless transmission network and a data analysis module. ZigBee was used for wireless transmission because its low power consumption suits short-range transmission of parameter data within the monitoring system. General packet radio service (GPRS) ensures reliable remote transmission of the monitoring data: it can quickly establish connections with other networks, allows users to stay online 24 hours a day, and is billed by data traffic, so its operating cost is relatively low. The water quality monitoring system is shown in Figure 2, and the parameters and types of the sensors in the system are listed in Table 1.
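The statement that the DO concentration is obtained from the probe current can be illustrated with a two-point linear calibration. This is a minimal sketch, assuming that the probe current varies linearly with DO at constant temperature; the function name, the calibration currents and the reference DO value are hypothetical, and the temperature and salinity compensation a real instrument would apply is omitted.

```python
# Hypothetical two-point calibration for a polarographic DO probe.
# Assumes probe current is linear in DO at constant temperature;
# temperature and salinity compensation are deliberately omitted.

RANGE_MIN_MG_L = 0.0   # lower limit of the stated 0-20 mg/L range
RANGE_MAX_MG_L = 20.0  # upper limit of the stated 0-20 mg/L range


def make_do_converter(i_zero, i_span, do_span):
    """Return a function mapping probe current to DO in mg/L.

    i_zero  -- current measured in oxygen-free water (hypothetical calibration point)
    i_span  -- current measured in a reference sample of known DO
    do_span -- DO of that reference sample, in mg/L
    """
    if i_span == i_zero:
        raise ValueError("calibration currents must differ")
    slope = do_span / (i_span - i_zero)  # mg/L per unit of current

    def to_mg_per_l(current):
        do = slope * (current - i_zero)
        # Clamp to the sensor's stated measurement range.
        return min(max(do, RANGE_MIN_MG_L), RANGE_MAX_MG_L)

    return to_mg_per_l


convert = make_do_converter(i_zero=1.0, i_span=5.0, do_span=8.0)
print(convert(3.0))
```

Clamping to the stated range keeps out-of-range currents from producing impossible readings, but it can also hide a drifting probe, which is one reason fault diagnosis is done on the converted series.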

Obtaining experimental data
In May 2019, dissolved oxygen data were collected from a crab aquaculture pond in Gaocheng Town, Yixing City, Jiangsu Province, once every 10 minutes.
Taking one of the dissolved oxygen sensors as an example, Figure 3 shows how the dissolved oxygen concentration changed over time. The abscissa is time and the ordinate is the dissolved oxygen concentration. Because the sensor sampled once every 10 minutes, 144 samples were collected every 24 hours; Figure 3 covers three consecutive days. As Figure 3 shows, the dissolved oxygen concentration under normal conditions was 4-12 mg/L and varied periodically with time, where time 1 corresponds to the reading taken at 0:00. Within each 24-hour period, the concentration was lowest at around 6:00 at dawn. After that, sunlight gradually strengthened, increasing the photosynthesis of phytoplankton in the water and thus the dissolved oxygen content. The concentration peaked at around 4:00 pm. Afterwards, the light intensity gradually weakened, photosynthesis weakened accordingly, and dissolved oxygen was generated more slowly. The dissolved oxygen content thus changed continuously but followed a roughly daily cycle.
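Because each reading is identified only by its sample index, a small helper that converts an index into a day and clock time makes the time axes easier to interpret. This sketch assumes, as for Figure 3, that index 1 is the 0:00 reading of the first day and that readings are exactly 10 minutes apart; the function name is hypothetical.

```python
SAMPLE_INTERVAL_MIN = 10                          # one reading every 10 minutes
SAMPLES_PER_DAY = 24 * 60 // SAMPLE_INTERVAL_MIN  # 144 readings per day


def index_to_time(index):
    """Map a 1-based sample index (index 1 = 0:00 on day 1) to (day, 'HH:MM')."""
    if index < 1:
        raise ValueError("sample indices start at 1")
    minutes = (index - 1) * SAMPLE_INTERVAL_MIN
    day = minutes // (24 * 60) + 1
    hh, mm = divmod(minutes % (24 * 60), 60)
    return day, f"{hh:02d}:{mm:02d}"


# Dawn minimum and afternoon peak on the first day, as described for Figure 3.
for idx in (37, 97):
    print(idx, index_to_time(idx))
```

Three consecutive days therefore span indices 1 to 432.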

Classification of fault types
By comparing the monitored dissolved oxygen data with the trend under normal conditions, the faults in the dissolved oxygen parameter were summarized into three types: complete failure, impact failure and constant output failure. By degree, these were divided into complete failure and incomplete failure, and by cause, incomplete failure was further divided into impact failure and constant output failure. The loss of a specified function of a product is called a failure; this includes both the complete loss and the degradation of the specified function. For repairable products, a failure is also called a fault. By degree, failures can be divided into complete failures and partial failures, where a complete failure is the complete loss of the specified function. Analysis of the monitoring data from the polarographic dissolved oxygen sensors showed that some sensors produced stable, normal output in the early stage. To quantify how the DO content changed over time during data analysis, a complete failure, which yields no reading, was recorded as 0. Complete failure is a rare fault type for dissolved oxygen sensors. It can be caused by lightning strikes during thunderstorms, which produce excessive current and damage the sensor (as shown in Figure 4); the sensor then cannot be repaired and fails completely. Depletion of the electrolyte can also leave the polarographic dissolved oxygen sensor unable to measure, likewise resulting in complete failure.
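The labeling rule for complete failure (a missing reading is recorded as 0) can be sketched as below. The helper names and the one-hour minimum run length are hypothetical choices, not values specified in the study. Because the normal DO range here is 4-12 mg/L, a sustained run of zeros at the end of a record is treated as a complete failure.

```python
def encode_missing_as_zero(readings):
    """Replace missing readings (None) with 0.0, the value used for complete failure."""
    return [0.0 if r is None else float(r) for r in readings]


def complete_failure_onset(readings, min_run=6):
    """Return the 0-based index where a trailing run of zeros starts, or None.

    min_run is a hypothetical threshold: 6 samples = 1 hour at 10-minute spacing.
    """
    values = encode_missing_as_zero(readings)
    start = len(values)
    # Walk backwards over the trailing zeros.
    while start > 0 and values[start - 1] == 0.0:
        start -= 1
    run = len(values) - start
    return start if run >= min_run else None


print(complete_failure_onset([5.1, 5.3, 5.2] + [None] * 6))
```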

Impact failure
An impact failure means that the mean of the sensor's measured values does not change, but the measured maximum or minimum value does. Compared with the continuous variation of the surrounding data, the impact data show an abrupt change. As shown in Figure 5, at time 31 and time 151 the dissolved oxygen concentration reached 12 mg/L and 8 mg/L, respectively, changing abruptly relative to the neighboring data; this is an impact failure.
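An impact failure appears as an isolated point that departs sharply from its neighbors while the overall level is unchanged. One simple way to flag such points, assumed here rather than taken from the study, is to compare each reading with the median of the readings around it; the function name, the window size and the 3 mg/L threshold are hypothetical.

```python
from statistics import median


def impact_points(values, window=3, threshold=3.0):
    """Return indices whose value jumps away from the median of its neighbors.

    values    -- list of DO readings in mg/L (a plain list, so slices concatenate)
    window    -- number of neighbors taken on each side (hypothetical)
    threshold -- deviation in mg/L treated as an impact (hypothetical)
    """
    flagged = []
    for i, v in enumerate(values):
        lo, hi = max(0, i - window), min(len(values), i + window + 1)
        neighbors = values[lo:i] + values[i + 1:hi]  # exclude the point itself
        if neighbors and abs(v - median(neighbors)) > threshold:
            flagged.append(i)
    return flagged


print(impact_points([6.0, 6.1, 6.2, 12.0, 6.3, 6.4, 6.5]))
```

Using the median rather than the mean keeps a single spike from pulling the reference level toward itself.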
In the investigation, it was found that impact failures were easily caused when high-power machinery near the sensor was working. When this machinery stopped, most sensors returned to their normal working state.

Constant output failure
A constant output failure means that the sensor still produces measurements, but a parameter that should vary over time stays at the same value. As with a complete failure, whose reading changes to 0 and stays there, constant output data are held at a fixed value from a certain moment; however, the constant output does not change step by step but remains at the last output value. Taking the data of one dissolved oxygen sensor as an example, as shown in Figure 6, after time 271 the dissolved oxygen concentration remained at about 5 mg/L, which is a constant output failure of dissolved oxygen.
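A constant output failure can be flagged by looking for a run of identical non-zero readings at the end of a record, which separates it from the zero-valued complete failure. The function name, the run length of 12 samples (2 hours) and the tolerance are hypothetical; in practice they would need to exceed the runs of repeated values that the sensor's resolution can produce under normal conditions.

```python
def constant_output_onset(values, min_run=12, tol=1e-6):
    """Return the 0-based start of a trailing run of (near-)identical non-zero values.

    min_run -- hypothetical minimum run length (12 samples = 2 hours)
    tol     -- tolerance for treating two readings as the same number
    Zero runs are skipped because they are labeled complete failures instead.
    """
    if not values:
        return None
    last = values[-1]
    if abs(last) <= tol:
        return None
    start = len(values) - 1
    # Extend the run backwards while readings match the last output value.
    while start > 0 and abs(values[start - 1] - last) <= tol:
        start -= 1
    return start if len(values) - start >= min_run else None


print(constant_output_onset([6.2, 6.8, 7.1] + [5.0] * 12))
```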

Principle of SVM
Support vector machine (SVM) is an algorithm for classification and regression problems. As it has been applied in different fields, SVM has shown distinctive advantages for small-sample and nonlinear recognition problems. Its theory and methods also apply to high-dimensional pattern recognition and extend to other machine learning problems such as function fitting [18,19]. SVM offered a new approach to classification in machine learning and to data mining more broadly: it handles both classification and regression effectively and extends to tasks such as parameter prediction and evaluation, so it is widely used in engineering, science, management and many other disciplines. In its basic form, SVM is a generalized linear classifier that can be viewed as a special case of Tikhonov regularization (TR), and it is also known as a maximum-margin classifier. When SVM is used for classification or regression, its core idea is to map the input feature vectors into a high-dimensional feature space through a nonlinear mapping, in which the classes can be separated by a hyperplane. The mapping is chosen in advance according to the requirements of the problem. Separating the data with a hyperplane in the feature space avoids computing a nonlinear separating surface in the original space [20].
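The role of the nonlinear mapping can be made concrete with a kernel function, which computes inner products in the high-dimensional feature space without constructing the mapping explicitly. The sketch below assumes the Gaussian radial basis function (RBF) kernel, the default kernel in LIBSVM, where the parameter g is usually called gamma; the study does not state which kernel it used, and the function names are hypothetical.

```python
import math


def rbf_kernel(x, y, g):
    """Gaussian (RBF) kernel K(x, y) = exp(-g * ||x - y||^2).

    A larger g makes the kernel narrower, so each training sample
    influences a smaller region of the input space.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * sq_dist)


def gram_matrix(samples, g):
    """Kernel (Gram) matrix: pairwise inner products in the implicit feature space."""
    return [[rbf_kernel(a, b, g) for b in samples] for a in samples]


for row in gram_matrix([[0.0], [1.0], [3.0]], g=0.5):
    print([round(k, 4) for k in row])
```

An SVM solver only ever needs this matrix, which is why the separating hyperplane in feature space never has to be written down in the original coordinates.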

BP neural network
The back propagation (BP) neural network has been developed for more than 30 years. Its topology generally comprises three layers: an input layer, a middle (hidden) layer and an output layer, stacked and interconnected. The BP neural network imitates the response mechanism of the brain: a biological neuron receives external signals through the ends of its many dendrites, processes and integrates them, and passes the resulting nerve impulse along its axon to other neurons or effectors. In the network topology, every neuron is connected to all neurons in the adjacent layers, while there are no connections within a layer and no feedback connections between layers.
BP neural networks are widely applied, for example to classification, clustering and prediction. A neural network needs sufficient training data to learn the implicit knowledge they contain. For a specific problem, its typical characteristics and the available measurement data are analyzed, and these data are used to train the network so that it produces reliable results. Although BP networks have found applications in classification and prediction, they still have shortcomings. First, the learning rate of a standard BP network is fixed, which makes training slow and convergence sluggish. Second, training drives the weights toward a particular set of values, but there is no guarantee that this point is the global minimum of the error surface, because gradient descent can become trapped in a local minimum. Finally, there is no principled way to choose the number of hidden layers; researchers usually set it empirically during experiments. This makes the network structure uncertain and can leave considerable redundancy, which increases the burden of online learning.
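The local-minimum problem can be seen even in one dimension. The sketch below runs fixed-learning-rate gradient descent on an illustrative function (not a neural network loss) that has a global minimum near x = -1.30 and a shallower local minimum near x = 1.13; depending only on the starting point, the same procedure settles in different minima.

```python
def gradient_descent(grad, x0, learning_rate=0.01, steps=2000):
    """Plain gradient descent with a fixed learning rate, as in standard BP training."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


def f(x):
    """Illustrative non-convex 'error surface' with two minima."""
    return x**4 - 3 * x**2 + x


def df(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


x_left = gradient_descent(df, x0=-2.0)   # ends in the global minimum
x_right = gradient_descent(df, x0=2.0)   # stuck in the local minimum
print(round(x_left, 2), round(f(x_left), 2))
print(round(x_right, 2), round(f(x_right), 2))
```

Both runs reach a point where the gradient vanishes, so the algorithm itself cannot tell that the second point is worse; this is the motivation for global search methods such as GA.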

Genetic algorithm optimization
The genetic algorithm (GA) has been developed for more than 50 years. As an optimization algorithm based on the mechanism of natural selection, it has been widely used in fields such as combinatorial optimization, image processing and machine learning, and its applications continue to broaden. GA is a global optimization algorithm: drawing on Darwin's theory of evolution by natural selection, it simulates the natural evolution process mathematically on a computer in order to search for the optimal solution. GA works in the coding space rather than directly in the parameter space of the problem. It starts from a population that represents a set of potential solutions, evaluates the fitness of each individual according to the principle of survival of the fittest, and repeatedly applies the selection, crossover and mutation operators to the population so that it evolves and gradually approaches the optimal solution [21,22].
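The specific process is as follows. The fitness value F of an individual is computed from the error between the expected and predicted outputs:

$$F = \sum_{i=1}^{m} \left| d_i - p_i \right|$$

where m represents the total number of output nodes of the neural network, d_i represents the expected output of the i-th node, and p_i represents the predicted output of the i-th node.
Individuals are then chosen by roulette-wheel selection, in which an individual with a smaller error F has a larger selection probability:

$$q_i = \frac{1/F_i}{\sum_{j=1}^{N} 1/F_j}$$

where F_i is the fitness value of individual i and N is the total number of individuals in the population.
Finally, a gene P of an individual is mutated as

$$P' = \begin{cases} P + (P_{\max} - P)\, r \left(1 - \dfrac{v}{V_{\max}}\right)^{2}, & \text{with probability } 0.5 \\[4pt] P + (P_{\min} - P)\, r \left(1 - \dfrac{v}{V_{\max}}\right)^{2}, & \text{with probability } 0.5 \end{cases}$$

where P_max represents the upper bound of the individual, P_min represents the lower bound of the individual, v represents the current iteration number, V_max represents the maximum number of generations, and r is a random number in (0,1).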
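The selection and mutation operators can be sketched as follows. These are common textbook forms, assumed rather than taken from the authors' implementation: selection probability is proportional to the reciprocal of an individual's error, and the mutation step shrinks to zero as the iteration number v approaches its maximum. The function names are hypothetical.

```python
import random


def roulette_select(population, errors, rng=random):
    """Roulette-wheel selection with probability proportional to 1/error."""
    weights = [1.0 / max(e, 1e-12) for e in errors]  # guard against division by zero
    return rng.choices(population, weights=weights, k=len(population))


def nonuniform_mutate(gene, p_min, p_max, v, v_max, rng=random):
    """Non-uniform mutation: the step shrinks to zero as iteration v approaches v_max."""
    step = rng.random() * (1.0 - v / v_max) ** 2
    if rng.random() < 0.5:
        return gene + (p_max - gene) * step  # move toward the upper bound
    return gene + (p_min - gene) * step      # move toward the lower bound


rng = random.Random(0)
print(roulette_select(["a", "b", "c"], [0.1, 1.0, 10.0], rng))
print(nonuniform_mutate(4.0, 0.0, 10.0, v=5, v_max=50, rng=rng))
```

Because the step is a fraction of the distance to a bound, a mutated gene never leaves [p_min, p_max], and late generations make only fine adjustments.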

SVM parameter optimization by genetic algorithm
To optimize the SVM parameters for dissolved oxygen fault diagnosis, the GA was employed in the present study to find the optimal parameters. The parameter C is the penalty factor, i.e., the tolerance for errors: the larger C is, the more likely overfitting becomes, and the smaller C is, the more likely underfitting becomes, so choosing a suitable penalty factor improves classification. The parameter g is the kernel parameter, which sets the radius of influence of a single training sample (a larger g gives a smaller radius). The larger g is, the fewer the support vectors, and the smaller g is, the more the support vectors; the number of support vectors affects the speed of training and prediction. The optimization results are shown in Figure 7. As Figure 7 shows, after the GA was applied to the dissolved oxygen data, the optimal C was 2.1649 and the optimal g was 5.3312, with 50 termination iterations and a population size of 20.
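The GA search over C and g can be sketched as follows, using the population size (20) and number of generations (50) reported above as defaults. The log2 encoding, the search ranges, the crossover and mutation probabilities, and the elitism step are assumptions, not settings reported in the study; `fitness` is any function to be maximized, and the function name is hypothetical.

```python
import math
import random


def ga_search(fitness, bounds, pop_size=20, generations=50,
              p_cross=0.8, p_mut=0.1, seed=0):
    """Maximize fitness(C, g) with a small real-coded GA on a log2 scale.

    bounds: ((C_min, C_max), (g_min, g_max)), hypothetical search ranges.
    pop_size must be even. Returns ((C, g), best_fitness).
    """
    rng = random.Random(seed)
    lows = [math.log2(lo) for lo, _ in bounds]
    highs = [math.log2(hi) for _, hi in bounds]

    def decode(ind):
        return 2.0 ** ind[0], 2.0 ** ind[1]

    pop = [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
           for _ in range(pop_size)]
    best, best_fit = None, -math.inf
    for v in range(generations):
        fits = [fitness(*decode(ind)) for ind in pop]
        for ind, fit in zip(pop, fits):
            if fit > best_fit:
                best, best_fit = list(ind), fit
        # Roulette-wheel selection on fitness shifted to be positive.
        base = min(fits)
        parents = rng.choices(pop, weights=[fit - base + 1e-9 for fit in fits],
                              k=pop_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < p_cross:  # arithmetic crossover
                w = rng.random()
                a, b = ([w * x + (1 - w) * y for x, y in zip(a, b)],
                        [(1 - w) * x + w * y for x, y in zip(a, b)])
            children.extend([list(a), list(b)])
        for child in children:
            for j in range(len(child)):
                if rng.random() < p_mut:  # non-uniform mutation toward a bound
                    step = rng.random() * (1 - v / generations) ** 2
                    target = highs[j] if rng.random() < 0.5 else lows[j]
                    child[j] += (target - child[j]) * step
        children[0] = list(best)  # elitism: carry the best individual forward
        pop = children
    return decode(best), best_fit
```

In practice the fitness would be a cross-validated accuracy, for example `cross_val_score(SVC(kernel="rbf", C=C, gamma=g), X, y, cv=5).mean()` with scikit-learn, where `X` and `y` are the preprocessed training features and labels. Searching log2(C) and log2(g) rather than the raw values spreads the population evenly over several orders of magnitude.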

SVM fault diagnosis results
Faults of the dissolved oxygen sensor were diagnosed with the SVM diagnosis model. After the data sample set was extracted, all data were divided into a training set and a test set at a ratio of 3:1. The original sample data were preprocessed before training, the classifier was trained on the training samples, and the trained classifier was then used to classify the test samples, giving the test accuracy of the model; this accuracy depends on the model parameters. The fault types of the dissolved oxygen parameter are impact failure, constant output failure and complete failure. Normal data, impact failure, constant output failure and complete failure were labeled 1, 2, 3 and 4, respectively. The 272 sets of collected sensor data were used for training and testing, of which 204 sets were used for training and 68 sets for testing. The diagnosis result is shown in Figure 8.
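The data preparation steps can be sketched as below: a shuffled 3:1 split (272 samples give 204 for training and 68 for testing) and feature scaling whose statistics come from the training set only. The study does not name its preprocessing, so min-max scaling to [0, 1] is an assumption, as are the function names and the label names in `LABELS`.

```python
import random

# Label codes used in the study; the dictionary keys are hypothetical names.
LABELS = {"normal": 1, "impact": 2, "constant_output": 3, "complete": 4}


def split_3_to_1(samples, seed=0):
    """Shuffle a copy of the samples and split it into training and test sets at 3:1."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * 3 // 4
    return shuffled[:cut], shuffled[cut:]


def fit_minmax(rows):
    """Per-feature minimum and maximum, computed on the training rows only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_minmax(rows, mins, maxs):
    """Scale features with training statistics; test rows may fall slightly outside [0, 1]."""
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, mins, maxs)]
            for row in rows]


train, test = split_3_to_1(list(range(272)))
print(len(train), len(test))
```

Fitting the scaling on the training set alone keeps information about the test samples out of the trained classifier.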
As Figure 8 shows, almost all of the 68 test samples across the four classes were correctly classified; only one type 1 sample was misclassified as type 2, giving a detection accuracy of 98.53%.
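The reported accuracy follows directly from one error in 68 test samples (67/68). The sketch below reproduces that arithmetic on an illustrative label vector; the equal split of 17 samples per class is hypothetical, chosen only to give 68 labels.

```python
from collections import Counter


def confusion(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of samples whose predicted label equals the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Illustrative test set: 68 labels, one type 1 sample predicted as type 2.
actual = [1] * 17 + [2] * 17 + [3] * 17 + [4] * 17
predicted = actual[:]
predicted[0] = 2
print(f"accuracy = {accuracy(actual, predicted):.2%}")
print("type 1 -> type 2 errors:", confusion(actual, predicted)[(1, 2)])
```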

Comparison with the BP neural network
To further analyze the diagnosis results, a BP neural network was used for comparison and was trained and tested on the same data. The parameters of the BP neural network were: learning rate 0.05; hidden layer size 7; number of iterations 1000; target error 0.00001. The actual values are compared with the BP predictions in Figure 9. As Figure 9 shows, in the four-class classification, 5 samples of type 1 (normal dissolved oxygen data) were misclassified as type 3 or type 4 abnormal data, giving a detection accuracy of 92.68%.
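A three-layer BP network of the kind described earlier can be sketched in NumPy with the reported settings as defaults (learning rate 0.05, 1000 iterations, target error 0.00001). Treating the hidden layer size of 7 as seven hidden neurons is an assumption, as are the sigmoid activations, the weight initialization, the one-hot encoding of the four labels and the function names.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_bp(X, Y, hidden=7, lr=0.05, epochs=1000, goal=1e-5, seed=0):
    """Train an input-hidden-output BP network with batch gradient descent.

    X: (n_samples, n_features); Y: one-hot targets (n_samples, n_classes).
    Returns the weights and the mean squared error of the last epoch.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    mse = None
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)   # hidden-layer outputs
        O = sigmoid(H @ W2 + b2)   # output-layer outputs
        err = O - Y
        mse = float(np.mean(err ** 2))
        if mse < goal:             # stop once the target error is reached
            break
        # Backpropagate through the sigmoid derivatives with a fixed learning rate.
        dO = err * O * (1.0 - O)
        dH = (dO @ W2.T) * H * (1.0 - H)
        W2 -= lr * H.T @ dO / len(X)
        b2 -= lr * dO.mean(axis=0)
        W1 -= lr * X.T @ dH / len(X)
        b1 -= lr * dH.mean(axis=0)
    return (W1, b1, W2, b2), mse


def predict(params, X):
    """Return class labels 1..n_classes (the largest output unit wins)."""
    W1, b1, W2, b2 = params
    O = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return np.argmax(O, axis=1) + 1
```

The fixed learning rate and plain gradient descent here are exactly the properties criticized in the BP section: training is slow and can stop in a local minimum.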

Conclusions
This study took dissolved oxygen faults in a water quality monitoring system as the research object and carried out fault diagnosis of dissolved oxygen based on the SVM algorithm. By distinguishing the different fault types of dissolved oxygen, efficient diagnosis of sensor faults was achieved.
The dissolved oxygen faults were classified by type. The main fault types of the polarographic dissolved oxygen sensor are complete failure, impact failure and constant output failure.
Based on this fault classification, SVM fault diagnosis experiments were carried out. The SVM achieved a diagnostic accuracy of 98.53%, higher than the 92.68% of the BP neural network. The GA was used to optimize the parameters, and after iteration the optimal values of C and g were 2.1649 and 5.3312, respectively.
In the present study, the genetic algorithm was employed to optimize the support vector machine classifier, and the classification accuracy after optimization was greatly improved. This indicates that the GA-optimized SVM classification method is well suited to classifying dissolved oxygen faults.