AI-HydSu: An advanced hybrid approach using support vector regression and particle swarm optimization for dissolved oxygen forecasting

Since variations in the dissolved oxygen concentration are affected by many factors, the corresponding uncertainty is nonlinear and fuzzy, and the accurate prediction of dissolved oxygen concentrations has long been a difficult problem in the fishing industry. To address this problem, a hybrid dissolved oxygen concentration prediction model (AI-HydSu) is proposed in this paper. First, to ensure the accuracy of the experimental results, the data are preprocessed by wavelet threshold denoising, and the solution-space search capability of the particle swarm optimization (PSO) algorithm is exploited to select the best parameters for the support vector regression (SVR) model. Second, the prediction model replaces the constant learning factors of the standard PSO algorithm with nonlinear adaptive learning factors, effectively preventing the algorithm from falling into local optimal solutions and accelerating its search. Third, the velocities and positions of the particles are updated under the continuously updated learning factors to obtain the optimal combination of SVR parameters. The algorithm not only searches for the penalty factor, kernel function parameter, and error parameter of SVR but also balances its global and local search abilities. A dissolved oxygen concentration prediction experiment demonstrates that the proposed model achieves high accuracy and a fast convergence rate.


Introduction
Dissolved oxygen (DO) is a factor of great importance to aquaculture. When the DO concentration is slightly below the critical level, the cultured aquatic animals will start to show a decrease in feeding, slow growth, an increase in the feed coefficient, and a decrease in shrimp shelling frequency. If the water is continuously low in oxygen for a long time, it reduces the resistance of the animals to environmental stressors and diseases, which can easily cause production losses. Therefore, the aquaculture industry needs to master the changing pattern in dissolved oxygen to reduce risks and increase the success rate of aquaculture.
In recent years, domestic and foreign scholars have made much progress in the accurate prediction of water quality parameters. The commonly used methods for water quality dissolved oxygen prediction mainly include linear regression [1,2], time series prediction methods [3,4], wavelet analysis prediction methods [5,6], artificial neural network prediction methods [7,8], etc. Olyaie et al. [9] used three different methods for DO prediction. The comparison of the estimation accuracy of various intelligent models showed that a support vector machine (SVM) was the most accurate DO estimation model compared to other models. Kisi et al. [10] proposed a new integration method, Bayes model averaging (BMA), for estimating hourly DO concentration. Raheli et al. [11] proposed a hybrid prediction model called an MLP-FFA model; this model is based on the firefly algorithm (FFA), which acts as a metaheuristic optimizer and integrates with a multilayer perceptron (MLP) with the ability to predict monthly water quality in the Langat River Basin. Masrur Ahmed [12] developed a feedforward neural network (FFNN) model and radial basis function (RBF) neural network (RBFNN) model to predict biochemical oxygen demand (BOD) and chemical oxygen demand (COD) in the Surma River. Keshtegar et al. [13] developed and compared two nonlinear mathematical modeling methods, the modified response surface methodology (MRSM) and MLP neural network (MLPNN), to simulate daily DO concentrations. Csábrági et al. [14] used four algorithmic models for the prediction of dissolved oxygen concentration. Experimental results showed that the nonlinear model had better prediction results than the linear model. Heddam et al. [15] used different extreme learning machine (ELM) models to predict DO concentrations. The experimental results showed that the ELM was more effective than an MLPNN and multiple linear regression (MLR) in simulating DO concentrations in riverine ecosystems.
The above water quality analysis and prediction models are mainly focused on shallow learning models based on artificial neural networks. The neural network model has a high self-learning and generalization ability compared with the traditional prediction model. It can not only solve complex nonlinear approximation problems but also has a good simulation and prediction effect on the development trend in water environments. Even so, there are still some defects in the experimental process. For example, the learning speed of the algorithm is too slow to make the training of the model efficient, and it easily falls into local minima during the training process, which makes the results inaccurate.
A support vector machine (SVM) [16] is a class of generalized linear classifiers that performs binary classification of data in a supervised manner; its decision boundary is the maximum-margin hyperplane solved for the learned samples. By seeking structural risk minimization, it bounds the actual risk and can solve both classification and regression problems. Extending SVMs from classification problems to regression problems yields support vector regression (SVR); in this context, the standard SVM algorithm is also known as support vector classification (SVC). SVR builds its regression model from the hyperplane decision boundary used in SVC, and this paper uses the SVR method to predict the DO concentration.
Luo et al. [17] proposed a hybrid prediction method combining the discrete Fourier transform (DFT) with SVR. Ahmad et al. [18] proposed a novel SVR model for predicting the splice strength of unconstrained beam samples. Zhang et al. [19] proposed a wind power prediction model based on a combination of particle swarm optimization (PSO)-SVR and gray theory. Dodangeh et al. [20] proposed the group method of data handling (GMDH), which is based on SVR for meta-optimization and data processing. Xiang et al. [21] proposed a combinatorial model for extracting information based on ensemble empirical mode decomposition (EEMD), which employs various supervised learning methods for different components of the input data. Panahi et al. [22] proposed the modified simulated annealing (MSA) algorithm to optimize the SVR prediction model with an improved annealing schedule and perturbation range. All of the abovementioned papers have optimized SVR to some extent, but there are limitations in data application, and some of the models are computationally complex.
The penalty coefficient C and the kernel function parameter σ in the SVR model have a significant impact on the model. The penalty coefficient C reflects the degree to which the algorithm penalizes sample data that exceed the accuracy ε, and its value affects the complexity and stability of the model. When C is too small, the penalty for the sample data exceeding the accuracy ε is small, and the training error becomes larger. When C is too large, the learning accuracy increases accordingly, but the generalizability of the model becomes poor. When σ is too large, the model is easily underfit, and the prediction accuracy subsequently decreases. When σ is too small, the model is easily overfit and the training time increases, while the demand for the number of samples increases. Therefore, choosing the appropriate C, σ, and ε greatly improves the prediction accuracy of the model. The most commonly used parameter selection method is grid search, where the best combination of parameters is obtained through continuous trial and error. However, this method also has disadvantages. It can find the global optimal solution when the set interval is large enough, and the step size is small enough. However, this inevitably generates a large number of unnecessary and invalid computations, which in turn leads to an exponential increase in the computation time.
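The cost of grid search described above can be made concrete with a small back-of-the-envelope sketch. The grids below are illustrative only, not the paper's actual search ranges:

```python
def grid_size(c_grid, sigma_grid, eps_grid):
    """Number of (C, sigma, epsilon) combinations an exhaustive grid search
    must train and evaluate an SVR model for."""
    return len(c_grid) * len(sigma_grid) * len(eps_grid)

# Coarse grid: 11 x 11 x 10 candidate values (illustrative ranges).
coarse = grid_size([2 ** k for k in range(-5, 6)],
                   [2 ** k for k in range(-5, 6)],
                   [0.01 * k for k in range(1, 11)])

# Halving the step in every dimension roughly doubles each grid, so the
# number of model fits grows by about 2**3 = 8x per refinement.
fine = grid_size([2 ** (k / 2) for k in range(-10, 12)],
                 [2 ** (k / 2) for k in range(-10, 12)],
                 [0.005 * k for k in range(1, 21)])

print(coarse, fine)  # 1210 9680
```

Each combination requires a full SVR training run, which is why refining the grid quickly becomes computationally prohibitive.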
Combining the SVR model with the PSO algorithm therefore exploits PSO's ability to search the solution space while remaining computationally simple. This paper proposes and constructs a DO prediction model (AI-HydSu) based on the PSO algorithm with a nonlinear adaptive learning factor fused with an SVM. The experimental results showed that 1) the proposed AI-HydSu model performs well on a small nonlinear sample of DO data and has a faster convergence speed and higher accuracy than similar algorithms; 2) the AI-HydSu method effectively avoids the long running time incurred when SVR selects its structure parameters by grid search; 3) compared with similar algorithms, the proposed model generalizes better and yields better predictions of water quality parameters in different waters.
The rest of this paper is organized as follows: Section 2 introduces the related methods; Section 3 introduces the algorithm and constructs the model in detail; Section 4 presents the experimental results of the proposed algorithm and a comparative analysis; Section 5 summarizes the main ideas of this paper and outlines future work.

Support vector regression
SVR uses the decision boundary of the optimal hyperplane in SVC to build a regression model. Suppose the set of sample data is $(x_i, y_i)$, where $x_i$ is the sample input, $y_i$ is the sample output, and $\varphi(x_i)$ denotes the feature vector of $x_i$ after mapping to the high-dimensional feature space. The corresponding optimal hyperplane equation is

$$f(x) = \omega^{T}\varphi(x) + b,$$

where $\omega$ is the normal vector and $b$ is the displacement term.
For the sample (x i , y i ), traditional regression models usually calculate the loss directly based on the difference between the model output f (x i ) and the true output y i ; the loss is 0 if and only if f (x i ) is exactly the same as y i . In contrast, SVR assumes that we can tolerate an error of at most ε between f (x i ) and y i . The loss is calculated only when the absolute value of the difference between f (x i ) and y i is greater than ε.
The essence of the SVR model training process is to find the optimal $\omega$ and $b$ such that $f(x_i)$ approximates $y_i$, resulting in the convex optimization problem

$$\min_{\omega, b, \xi_i, \xi_i^*} \ \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*),$$

where $C$ denotes the penalty coefficient, which balances the structural risk $\frac{1}{2}\|\omega\|^2$ against the empirical risk $\sum_{i=1}^{n} (\xi_i + \xi_i^*)$. The constraints corresponding to this convex optimization problem are

$$f(x_i) - y_i \le \varepsilon + \xi_i, \quad y_i - f(x_i) \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0,$$

where the relaxation factors $\xi_i$ and $\xi_i^*$ absorb deviations of $f(x_i)$ from $y_i$ beyond the accuracy $\varepsilon$. Introducing the Lagrange multipliers $a_i$ and $a_i^*$ and solving this constrained convex optimization problem, $\omega$ and the SVR function $f(x)$ are obtained as

$$f(x) = \sum_{i=1}^{n} (a_i^* - a_i) K(x_i, x) + b,$$

where $\sigma$ represents the kernel parameter of the kernel function $K(x_i, x)$. The kernel function $K(x_i, x)$ improves the model's ability to deal with nonlinear regression problems. The RBF kernel

$$K(x_i, x) = \exp\!\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right)$$

can effectively improve the fitting and prediction performance of the model, so it is often used as the kernel function of the SVR model.
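The pieces above can be sketched directly in Python. This is a minimal illustration (not the paper's implementation) of the RBF kernel, the ε-insensitive loss, and the dual-form SVR prediction, where `dual_coeffs[i]` stands for $(a_i^* - a_i)$:

```python
import math

def rbf_kernel(x, z, sigma):
    """RBF kernel K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero loss inside the epsilon tube, linear loss outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

def svr_predict(x, support_vectors, dual_coeffs, b, sigma):
    """Dual-form prediction f(x) = sum_i (a_i* - a_i) K(x_i, x) + b."""
    return sum(c * rbf_kernel(sv, x, sigma)
               for sv, c in zip(support_vectors, dual_coeffs)) + b

# A point predicted inside the epsilon tube contributes no loss.
y_hat = svr_predict([0.0], [[0.0]], [0.5], 0.1, sigma=1.0)
print(eps_insensitive_loss(0.58, y_hat, eps=0.05))  # prints 0.0
```

In practice the dual coefficients and bias come from a quadratic-programming solver; the sketch only shows how a trained model evaluates new inputs.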

Particle swarm optimization
Eberhart and Kennedy first proposed PSO in 1995 [23]. The algorithm modified Heppner's model of a simulated bird flock (fish school) so that particles fly through the solution space and land on the best solution, yielding the PSO algorithm. The PSO algorithm has many advantages: it is simple and easy to implement, requires no gradient information, and has few parameters; in particular, its natural real-number encoding makes it especially suitable for real-valued optimization problems, and it has a solid background in swarm intelligence. Therefore, PSO is suitable for both scientific research and engineering applications.
If a particle is used to simulate the abovementioned bird individuals, then each particle can be regarded as a searching individual in the N-dimensional search space. The current position of the particle is a candidate solution of the corresponding optimization problem, and the flight process of the particle is the search process of the individual [24]. The flight speed of the particle can be dynamically adjusted according to the particle's historical optimal position and the population's historical optimal position. The particle has only two properties: velocity and position. The velocity represents the speed of movement, and the position represents the direction of movement [25]. The optimal solution searched by each individual particle is called the individual extremum, and the optimal individual extremum of the particle swarm is used as the current global optimal solution. By continuously iterating, the velocity and position are updated, and finally, the optimal solution satisfying the termination condition is obtained.
The algorithm flow is as follows.

Initialization
First, this study sets the maximum number of iterations, the number of independent variables of the objective function, the maximum velocity of the particles, and the position information for the whole search space. In this paper, the velocity and position are randomly initialized on the velocity interval and the search space, and the particle swarm size is set to M. Each particle is randomly initialized with a flying velocity.

Individual extremes and global optimal solutions
The fitness function is defined, and the individual extremes are the optimal solutions found for each particle. Then, a global value is found from these optimal solutions, called the current global optimum, which is updated by comparing it with the historical global optimum.
Updating the velocity and position
The velocity and position of each particle are updated as

$$V_{id} = \omega V_{id} + C_1 \cdot \mathrm{random}(0,1) \cdot (P_{id} - X_{id}) + C_2 \cdot \mathrm{random}(0,1) \cdot (P_{gd} - X_{id}),$$
$$X_{id} = X_{id} + r \cdot V_{id},$$

where $\omega$ is the inertia factor with a nonnegative value. When $\omega$ is larger, the global optimization ability is stronger and the local optimization ability is weaker; when $\omega$ is smaller, the local optimization ability is stronger and the global optimization ability is weaker. $C_1$ and $C_2$ are called the acceleration factors and are generally constants. The former is the particle's individual learning factor, the weight coefficient with which the particle tracks its own historical optimum; the latter is the particle's social learning factor, the weight coefficient with which the particle tracks the group's optimum. $\mathrm{random}(0, 1)$ represents a random number in the interval (0, 1), $P_{id}$ represents the dth dimension of the individual extremum of the ith particle, $P_{gd}$ represents the dth dimension of the global optimal solution, $X_{id}$ represents the dth dimension of the position of the ith particle, and $V_{id}$ represents the dth dimension of the velocity of the ith particle. $r$ is a coefficient applied to the velocity when updating the position; it is called the constraint factor and is usually set to 1. Because each particle simultaneously tracks its own historical optimum and the global (group) optimum when changing its position and velocity, this method is also called the global version of the standard PSO algorithm [26].

Termination Conditions
(1) The set number of iterations is reached; (2) the fitness value reaches a preset target.
The acceleration factors C 1 and C 2 represent the weights of the stochastic acceleration terms that pull each particle toward the individual optimal solution and the global optimal solution, respectively. Lower values let the particles hover outside the target region before being pulled back, while higher values make the particles rush toward, or overshoot, the target region. When C 2 = 0, the method is called the self-aware PSO algorithm (i.e., "only self, no society"); there is no social sharing of information at all, resulting in slow convergence. When C 1 = 0, it is called the selfless PSO algorithm (i.e., "only society, no self"); the swarm quickly loses diversity and easily falls into a local optimal solution from which it cannot escape. When neither C 1 nor C 2 is 0, it is called the complete PSO algorithm; in this case, it is easier to maintain the balance between convergence speed and search quality, which is the best choice.
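The update equations above can be sketched as a single PSO iteration. This is a minimal illustration of the standard global-version update, assuming velocity clamping to a maximum speed V_max and a constraint factor r:

```python
import random

def pso_step(positions, velocities, pbest, gbest,
             w=0.73, c1=2.0, c2=2.0, v_max=1.0, r=1.0):
    """One global-version PSO update.

    positions, velocities, pbest: lists of D-dimensional lists (one per
    particle); gbest: the D-dimensional global best position.
    """
    for i in range(len(positions)):
        for d in range(len(positions[i])):
            v = (w * velocities[i][d]
                 + c1 * random.random() * (pbest[i][d] - positions[i][d])
                 + c2 * random.random() * (gbest[d] - positions[i][d]))
            # Clamp the velocity to the maximum allowed speed.
            velocities[i][d] = max(-v_max, min(v_max, v))
            # The constraint factor r (usually 1) scales the position update.
            positions[i][d] += r * velocities[i][d]
    return positions, velocities

pos = [[0.0, 0.0], [1.0, 1.0]]
vel = [[0.0, 0.0], [0.0, 0.0]]
pos, vel = pso_step(pos, vel, pbest=[[0.5, 0.5], [1.0, 1.0]], gbest=[0.5, 0.5])
```

Fitness evaluation and the pbest/gbest bookkeeping happen between calls to this step; the sketch isolates only the velocity and position equations.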

Wavelet threshold denoising
A prerequisite for accurate water quality prediction is the quality and accuracy of the DO dataset. During data acquisition, noise may be generated if the sensor equipment has a low accuracy and performance degradation. If the unprocessed raw data are directly used for DO prediction, the data prediction accuracy will be affected. Therefore, noise reduction is needed during raw data acquisition to ensure the DO prediction accuracy.
Depending on the noise energy, conventional denoising methods generally focus on high frequencies. Based on the signal spectrum distributed in a finite interval, the Fourier transform can be used to transform the noisy signal to the frequency domain, and then a low-pass filter can be used for filtering. However, the Fourier transform-based denoising method cannot effectively distinguish between the high-frequency part of the useful signal and the high-frequency interference caused by noise, so there is a tradeoff between protecting the localization of the signal and noise suppression. The wavelet transform can preserve the spikes and local prominence of the signal very well. Given the advantageous nature of the wavelet thresholding method, the widely used wavelet thresholding denoising [27] is used here for analysis.
The basic idea of wavelet threshold denoising is to select among the wavelet coefficients generated by transforming the signal with the Mallat algorithm [28]. After wavelet decomposition, the wavelet coefficients of the signal are large, while those of the noise are small. By choosing a suitable threshold, wavelet coefficients greater than the threshold are treated as signal and retained, while those below the threshold are treated as noise and set to zero, achieving denoising.
The basic steps of the wavelet threshold contraction method are as follows: (1) Wavelet basis function selection: Generally, the wavelet basis function is selected based on the support length, vanishing moment, symmetry, regularity, similarity, etc., for comprehensive consideration. Given that wavelet basis functions have their own characteristics in the processed signals, no single wavelet basis function can achieve optimal denoising results for all types of signals.
In general, the Daubechies wavelet and symlet families [29] are often used in speech denoising, and a wavelet with N layers is selected to perform wavelet decomposition of the signal [30].
(2) Selecting the number of decomposition layers: On the one hand, the larger the number of decomposition layers is, the more obvious the differences in the noise and signal performance characteristics, and the more beneficial in the separation of the two. On the other hand, the larger the number of decomposition layers is, the larger the distortion of the reconstructed signal, which will affect the final denoising effect to a certain extent. Therefore, extra attention is needed to address the contradiction between the two and choose an appropriate decomposition scale.
(3) Threshold selection: In the wavelet domain, the effective signal corresponds to large coefficients, while the noise corresponds to small coefficients; the coefficients corresponding to the noise still follow the Gaussian white noise distribution. The threshold selection rule is based on the model y = f(t) + e, where e is Gaussian white noise N(0, 1). Therefore, a threshold that eliminates noise in the wavelet domain can be estimated from the wavelet coefficients or the original signal. Currently, the most common threshold selection methods are fixed threshold estimation, with $\lambda = \sigma\sqrt{2 \ln N}$ (N is the signal length), extreme value threshold estimation, unbiased likelihood estimation, and heuristic estimation.
(4) Threshold function selection: After determining the threshold of Gaussian white noise in the wavelet coefficients (domain), a threshold function is needed to filter these wavelet coefficients containing noise coefficients and remove the Gaussian noise coefficients. Among them, the commonly used thresholding functions are soft thresholding and hard thresholding methods.
(5) Reconstruction: The signal is reconstructed with the processed coefficients.
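Steps (3) and (4) above can be illustrated with a minimal sketch of the fixed (universal) threshold and the hard and soft thresholding functions. A full pipeline would also use a wavelet library (e.g., PyWavelets) for the Mallat decomposition and reconstruction, which is not shown here:

```python
import math

def universal_threshold(n, sigma=1.0):
    """Fixed ('universal') threshold lambda = sigma * sqrt(2 ln N) for a
    length-N signal with noise standard deviation sigma."""
    return sigma * math.sqrt(2 * math.log(n))

def hard_threshold(coeffs, lam):
    """Hard thresholding: keep coefficients with |w| > lambda, zero the rest."""
    return [w if abs(w) > lam else 0.0 for w in coeffs]

def soft_threshold(coeffs, lam):
    """Soft thresholding: zero small coefficients and shrink the survivors
    toward zero by lambda, avoiding the discontinuity of hard thresholding."""
    return [math.copysign(abs(w) - lam, w) if abs(w) > lam else 0.0
            for w in coeffs]

detail = [0.1, 2.0, -0.3, -1.5]    # toy detail-band wavelet coefficients
lam = 1.0                          # threshold chosen for illustration
print(hard_threshold(detail, lam))  # [0.0, 2.0, 0.0, -1.5]
print(soft_threshold(detail, lam))  # [0.0, 1.0, 0.0, -0.5]
```

Soft thresholding produces smoother reconstructions at the cost of a small bias in the retained coefficients, which is why both variants remain in common use.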

AI-HydSu model
In the early stage of the algorithm search, the searchability of the particle itself should be emphasized, while the later stage should focus on the globally optimal particle. Therefore, in this paper, a nonlinear cosine function is used for the acceleration factor C 1 and a nonlinear sine function for the acceleration factor C 2 , so that the acceleration factors are updated continuously as the number of iterations increases to match the particle velocity updates. In the late stage of the search, this yields a larger C 2 and a smaller C 1 , allowing the algorithm to jump out of local extremes.
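One plausible form of such a schedule is sketched below. The paper specifies a nonlinear cosine schedule for C 1 and a sine schedule for C 2 but the exact formulas are not reproduced in this text, so the constants c_min and c_max and the specific shape here are assumptions for illustration only:

```python
import math

# Assumed constants: the text does not give the schedule's range, so these
# are illustrative values, not the authors' settings.
C_MIN, C_MAX = 0.5, 2.5

def adaptive_factors(t, t_max, c_min=C_MIN, c_max=C_MAX):
    """Hypothetical nonlinear schedule: C1 decays along a cosine from c_max
    to c_min while C2 grows along a sine from c_min to c_max, so the swarm
    emphasizes self-search early and the global best late."""
    phase = math.pi / 2 * t / t_max
    c1 = c_min + (c_max - c_min) * math.cos(phase)
    c2 = c_min + (c_max - c_min) * math.sin(phase)
    return c1, c2
```

At t = 0 this gives (C1, C2) = (c_max, c_min), and at t = t_max it gives (c_min, c_max), matching the qualitative behavior described above.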
The proposed AI-HydSu algorithm uses the optimization capability of the PSO algorithm with nonlinear adaptive learning factors to continuously train the SVR model by searching for the parameters C, σ, and ε. The PSO algorithm based on the nonlinear adaptive learning factor is better at optimizing high-dimensional functions, so it is better at optimizing the parameters of the high-dimensional kernel function in SVR to achieve accurate prediction of the DO concentration. Applying this optimization of the SVR parameters to DO prediction not only improves the parameter search capability but also better balances the global and local search abilities of the algorithm and reduces the prediction time.
The AI-HydSu DO concentration prediction model is shown in Figure 2. First, the noise in the original data is reduced using the wavelet threshold denoising algorithm; the initial values of the parameters of the PSO algorithm (i.e., the initial velocities and initial positions of the particles) are determined; the parameters to be optimized in the SVR model (i.e., the penalty parameter, kernel coefficient, and algorithm accuracy) are identified; and the search interval for each parameter is determined. Second, the fitness function of the SVR model is set; the dissolved oxygen prediction model proposed in this paper uses the mean squared error (MSE) as the fitness function. The model then exploits the PSO algorithm's search of the solution space, updating the velocities and positions of the particles through the nonlinear learning factors to find the individual optimal solutions and the global optimal solution. Finally, the optimal combination of SVR parameters is obtained and used to predict the DO concentration. The specific process is shown in Algorithm 1.
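The overall loop just described can be sketched as follows. This is not the paper's implementation: the fitness function below is a smooth stand-in with a known minimum, whereas AI-HydSu would instead train an SVR with each candidate (C, σ, ε) and return the validation MSE; the cosine/sine learning-factor schedule and the parameter bounds are likewise assumed forms for illustration:

```python
import math
import random

# Illustrative search intervals for the three SVR parameters.
BOUNDS = {"C": (0.1, 100.0), "sigma": (0.01, 10.0), "eps": (0.001, 0.1)}

def fitness(params):
    """Stand-in fitness with a known minimum at C=10, sigma=1, eps=0.01.
    In AI-HydSu this would be the validation MSE of an SVR trained with
    the candidate (C, sigma, eps)."""
    c, s, e = params
    return ((math.log10(c) - 1) ** 2 + math.log10(s) ** 2
            + (math.log10(e) + 2) ** 2)

def ai_hydsu_search(n_particles=10, t_max=20, w=0.73, c_min=0.5, c_max=2.5):
    lo = [b[0] for b in BOUNDS.values()]
    hi = [b[1] for b in BOUNDS.values()]
    pos = [[random.uniform(l, h) for l, h in zip(lo, hi)]
           for _ in range(n_particles)]
    vel = [[0.0] * 3 for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for t in range(t_max):
        # Nonlinear adaptive learning factors (assumed cosine/sine form).
        c1 = c_min + (c_max - c_min) * math.cos(math.pi / 2 * t / t_max)
        c2 = c_min + (c_max - c_min) * math.sin(math.pi / 2 * t / t_max)
        for i in range(n_particles):
            for d in range(3):
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
                # Keep each parameter inside its search interval.
                pos[i][d] = min(hi[d], max(lo[d], pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f
```

Swapping the stand-in fitness for an SVR training-plus-validation routine turns this sketch into the full parameter-selection loop.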

Data source and preprocessing
In this paper, the DO concentration in a shrimp farming base in a marine ranch in Yantai, in China's Blue Economic Zone, was taken as the research object. The sampling period ran from August 1, 2016, to June 30, 2020, with a 10-minute sampling interval. The DO data for 54 consecutive days were used as the data sample set for this experiment, giving a total of 7776 samples.
The experimental data were obtained from various aquaculture farms in the Blue Economic Zone of China. Figure 3 shows some of the raw DO signal sequences. The changes in the DO peaks and troughs in the farmed water are relatively substantial, showing strong nonlinear, nonsmooth, and periodic characteristics, and the raw DO signal contains noise. If used directly for model training, the noise will increase the training time and lead to slow convergence or even a failure to converge. To reduce the interference of noise and recover the true DO signal, it is necessary to reduce the noise in the raw signal of the monitored DO concentration. In this study, the noise in the raw signal of the monitored DO was reduced by wavelet threshold denoising. The noise-containing signal was decomposed by orthogonal wavelets at each scale, as shown in Figure 4, which presents the wavelet transform of the original data using the db8 wavelet with 5 decomposition layers.
In this study, the decomposition values at large scales (low resolution) are retained, and for the decomposition values at small scales, a default threshold is used. Wavelet coefficients with amplitudes below this threshold are set to 0, and those above this threshold are retained. Finally, in this study, the processed wavelet coefficients are reconstructed using a wavelet transform to recover the useful signal. The noise reduction process is carried out on the raw signal of the monitored DO, and its noise is shown in Figure 5, while the smoothed data after removing the noise are shown in Figure 6. The noise reduction model based on wavelet analysis can effectively reduce the interference of noise in the farm water quality parameters of the DO monitoring data, which can not only retain the useful components of the original DO signal but also reflect the change trend in DO concentration.

Algorithm implementation and testing
In this study, the AI-HydSu-based DO prediction simulation for shrimp aquaculture was written in Python 3.7. The dataset was divided into training and testing sets for DO concentration prediction; the experiment used 95% of the dataset as the training set and 5% as the test set. The DO data of the first half-hour were used as the visible-layer input to predict the change in DO at the next moment. To verify the reliability of the algorithms, the dataset was shuffled randomly. The maximum population size for the PSO algorithm and the AI-HydSu algorithm was set to N = 10; the maximum number of iterations was set to Tmax = 20; the initial values of the acceleration factors were set to C 1 = 4 and C 2 = 4; and the initial weight was set to w = 0.73.
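The half-hour-input setup can be sketched as a sliding-window dataset: with 10-minute sampling, three consecutive readings form the input and the next reading is the target. The window length and the 95/5 split follow the text; the helper names are ours:

```python
def make_windows(series, window=3):
    """Build (input, target) pairs: `window` consecutive readings predict the
    next reading. With 10-minute sampling, window=3 spans the first half-hour."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return X, y

def split_95_5(X, y):
    """95% training / 5% testing split, as used in the experiments."""
    cut = int(len(X) * 0.95)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])

do_series = [6.1, 6.3, 6.2, 6.5, 6.4, 6.6, 6.7, 6.5, 6.8, 6.9]  # toy DO readings
X, y = make_windows(do_series)
(train_X, train_y), (test_X, test_y) = split_95_5(X, y)
print(len(X), len(train_X), len(test_X))  # 7 6 1
```

On the real 7776-sample series the same construction yields 7773 input/target pairs before the split.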
The fitness curves of the two algorithms are shown in Figures 7 and 8. It can be seen that the PSO algorithm [31] without the nonlinear learning factor falls into a local optimum at evolutionary generation 3 and keeps falling into the local optimum afterwards, and finally, the MSE function value stabilizes at 0.001065 at 14 iterations. In contrast, the AI-HydSu algorithm has a large slope at the beginning and stabilizes at 4 iterations, with the MSE function value stabilizing at 0.00095. The main reason why PSO easily falls into a local optimum is that the learning factor is a fixed value, which leads to the algorithm not focusing on the optimal global particles in the later stage and thus very easily falling into local optimum solutions. The AI-HydSu algorithm focuses on the searchability of the particles themselves at the early stage of the algorithm search and focuses on the optimal global particles at the later stage; thus, it avoids falling into the local optimum. Through the experimental analysis, it can be seen that the AI-HydSu algorithm is faster and less likely to fall into the local optimum than the PSO algorithm, and its overall performance is better than that of the PSO algorithm.  Figure 9 shows the relative error between the true value and the predicted value. From Figure 9, it can be seen that the maximum value of the AI-HydSu algorithm prediction relative error curve does not exceed 2%, and the prediction results are relatively stable, which further indicates that the AI-HydSu model achieves a good prediction accuracy.  
To verify the prediction performance of the proposed AI-HydSu-based DO prediction model, the backpropagation neural network (BPNN) [32,33], PSO-SVR [34,35], autoregressive integrated moving average (ARIMA), and long short-term memory (LSTM) algorithms were chosen to predict the DO time series of shrimp aquaculture waters (Figure 10), and the prediction results were evaluated by the mean absolute error (MAE) [36], root mean square error (RMSE) [37], and coefficient of determination R² [38]. R² is an important statistic reflecting the goodness of fit of the model and is the ratio of the regression sum of squares to the total sum of squares. R² takes values between 0 and 1 and is unitless, and its magnitude reflects the relative contribution of the regression, i.e., the percentage of the total variation in the dependent variable Y that is explained by the regression relationship. R² is the most commonly used index for evaluating regression models; the larger R² is (i.e., the closer it is to 1), the better the fitted regression equation. In Table 1, the R² of the AI-HydSu model proposed in this paper is closest to 1. The experiment shows that the AI-HydSu model has high accuracy in the prediction of DO concentration. Figure 10 shows the comparison between the predicted and true value curves of the proposed AI-HydSu prediction model and other similar prediction models under the same conditions [39,40]; the proposed model fits the true value curve more closely and shows no large deviation points or abrupt change points. The AI-HydSu model is applicable not only to DO data but also to other water quality parameters.
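The three evaluation metrics can be computed with a few lines of Python (standard definitions, not tied to the paper's code):

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2
                         for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

MAE and RMSE share the units of the DO measurements, so lower is better, while R² is unitless and higher (closer to 1) is better.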
Figure 11 shows the temperature prediction of the AI-HydSu model applied to the water quality parameters, and it can be seen that the predicted values do not deviate much from the true values, and some of the curves almost overlap [41,42], reflecting the superiority of the AI-HydSu model.
To verify the accuracy, stability, and generalizability of the AI-HydSu model, experiments were conducted on nine different datasets [43], with experimental data from nine ranches in China's Blue Economic Zone. Since the amount of data varies from ranch to ranch, the experiments all use 95% of the dataset as the training set and 5% as the test set and compare the real values with the predicted values. The experimental results are shown in Figure 12. From this figure, it can be seen that the AI-HydSu model achieves a better prediction effect for different training samples.
The AI-HydSu model was further validated against the RBFNN, LSTM, BPNN, and PSO-SVR models on nine different datasets, and the experimental results are shown in Figure 13. The stability and accuracy of the AI-HydSu model are experimentally shown to be better than those of other similar algorithms, and its mean absolute error is the smallest on each of the datasets.

Conclusions
In this paper, a DO prediction model based on a PSO algorithm with nonlinear adaptive learning factors fused with an SVM is proposed for the accurate prediction of DO concentration. The results show that the AI-HydSu model achieves better prediction accuracy on small samples and effectively addresses the traditional PSO-SVR model's tendency to fall into local optima and converge slowly. In this study, the DO concentration of each marine ranch in the Blue Economic Zone of China was studied, and the results of the AI-HydSu model were experimentally compared with those of the PSO-SVR model and other similar prediction methods. The experimental results show that the AI-HydSu prediction method fits the training and testing data well, with smaller error fluctuations and greater stability; the RMSE and MAE of the AI-HydSu model are smaller than those of the other algorithms. Overall, the proposed AI-HydSu model achieves good prediction performance, has strong generalizability, and can provide accurate aquaculture information services for intensive fishery production.
There are many factors affecting DO, such as the water temperature, salinity, and pH. Future research will explore the relationship between DO and other water quality parameters in depth to build a more accurate DO prediction model.