Comparing deep learning with several typical methods in prediction of assessing chlorophyll-a by remote sensing: a case study in Taihu Lake, China

Chlorophyll-a (Chl-a) is an important index in water quality assessment by remote sensing. Many classical methods have been used to estimate Chl-a in rivers and lakes, such as curve fitting, the back propagation (BP) neural network and the radial basis function (RBF) neural network, and each has found applications. With the growth of computing power and the rise of deep learning (DL), this study analyzed the estimation of water quality and Chl-a with DL and compared it with several classical methods, in order to explore and develop better methods. Taking Taihu Lake, China, as the case, this study used Chl-a measurements from Taihu Lake in 2017 and Landsat8 data acquired at the corresponding times. The four methods were used to retrieve the distribution of Chl-a in Taihu Lake. Among the curve fitting models, the power model (∑Residual² of fitting = 90.469) and the inverse model (∑Residual² = 602156.608) gave better results than the other curve fitting models, but the segmentation result images show that they were less accurate than the machine learning methods. The testing mean squared errors of the three machine learning methods (BP, RBF, DL) were 1.436, 4.479 and 4.356, respectively. Thus, the BP and DL methods gave the better results in this study, and the machine learning methods had less error than the curve fitting methods.


INTRODUCTION
With the development of spectroscopy technology, there have been many studies on water quality monitoring in Taihu Lake (China) using satellite remote sensing data. Among the available satellites, the Landsat series is the most popular data source (Wu & Yang ). In Taihu Lake, typical empirical methods include linear regression, the single-band method, the band-combination method, and curve fitting or non-linear functions (Gitelson ; Gurlin et al. ). These empirical methods have been widely used for estimating chlorophyll-a concentration because they are simple and achieve high accuracy over a small area (Gitelson et al. 1992; Simis et al. ; Matthews ; Odermatt et al. ; Wu & Yang ). Based on statistical analysis, researchers select the optimal band or band combination from the remote sensing data, derive an inversion algorithm relating the laboratory-measured water quality parameters to the band (or band-combination) data, and then analyze the accuracy and applicability of the algorithm. Taihu Lake is typical case-II water, in which the water quality constituents interact, so the relationship between spectral characteristics and chlorophyll-a is complex. Accuracy was further improved when exponential curve fitting was used instead of linear regression. Studies on chlorophyll-a concentration inversion have shown that curve fitting is a potentially useful approach (Hansen & Schjoerring ; Liu et al. ; Wang et al. a).
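The band-selection step described above can be illustrated with a minimal Python sketch. It assumes that the "optimal" band or band combination is the candidate (a single band or a two-band ratio) with the largest absolute Pearson correlation with the measured Chl-a; the function names, the restriction to ratios, and the sample values are hypothetical and are not taken from the cited studies.

```python
import math
from itertools import permutations

def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def best_predictor(bands, chla):
    """Return (name, r) of the single band or two-band ratio with the largest |r|.

    bands maps a band name to its values at the sampling points; chla holds
    the measured Chl-a at the same points."""
    candidates = dict(bands)
    for num, den in permutations(bands, 2):
        if all(v != 0 for v in bands[den]):  # skip ratios with a zero denominator
            candidates[f"{num}/{den}"] = [a / b for a, b in zip(bands[num], bands[den])]
    scores = {name: pearson_r(vals, chla) for name, vals in candidates.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores[best]

# Hypothetical reflectances at five sampling points (not the study's data)
bands = {
    "B2": [0.060, 0.055, 0.070, 0.065, 0.058],
    "B3": [0.10, 0.20, 0.15, 0.25, 0.12],
    "B4": [0.05, 0.30, 0.20, 0.10, 0.18],
}
chla = [10.0 * r / g for r, g in zip(bands["B4"], bands["B3"])]  # synthetic Chl-a
name, r = best_predictor(bands, chla)
print(name, round(r, 3))
```

Because the synthetic Chl-a here is built from B4/B3, that ratio should come out on top; with real measurements the ranking is an empirical result, and combinations other than ratios (differences, normalized indices) could be added to the candidate set.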
In recent years, many scholars have used artificial neural network (ANN) methods to retrieve water quality parameters, because these methods can describe non-linear and complex systems. An ANN can 'learn' from observed data because it is based on non-linear mapping structures modeled on the human brain; it is a universal and highly flexible function approximator (Haykin ). Deep learning has been widely used in many fields. Some studies in the Taihu Lake area have also applied neural network algorithms to monitor water quality, with some success. However, these studies usually employ only one or two algorithms; simulations using more than two methods are rare, so precision comparisons among multiple methods are lacking.
In this research, four models were used to calculate the relationship between algal chlorophyll-a and Landsat 8 spectral data: the curve fitting model, the BP neural network model, the RBF neural network model and the deep learning model. In addition, the study used Landsat 8 data, which can be downloaded from http://www.gscloud.cn/, acquired in synchronization with the measured data at 32 sampling points in 2017.

METHODOLOGY
Curve fitting methods
Curve fitting approximates a set of discrete points on a plane with a continuous curve. Commonly used methods include linear, polynomial, exponential and Gaussian fitting. Generally, the curve type can be chosen from domain knowledge of the problem; otherwise, a scatter diagram can be drawn and the appropriate curve type selected from the distribution of the points (Liu et al. ).
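As a minimal sketch of comparing curve types, the Python code below fits several two-parameter models by least squares on linearized variables (for example, ln y against ln x for the power model) and ranks them by the sum of squared residuals. The model names follow those used in this paper's results; fitting on linearized variables and measuring residuals on the original y scale are assumptions, and the software used in the study may compute them differently.

```python
import math

def linear_lsq(us, vs):
    """Ordinary least squares for v = a + b * u; returns (a, b).

    Assumes the u values are not all identical."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    sxx = sum((u - mu) ** 2 for u in us)
    sxy = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    b = sxy / sxx
    return mv - b * mu, b

# name: (transform of x, transform of y, prediction of y from a, b and x)
# Log and reciprocal transforms require x > 0 and, where y is logged, y > 0.
MODELS = {
    "linear": (lambda x: x, lambda y: y, lambda a, b, x: a + b * x),
    "logarithmic": (math.log, lambda y: y, lambda a, b, x: a + b * math.log(x)),
    "inverse": (lambda x: 1.0 / x, lambda y: y, lambda a, b, x: a + b / x),
    "power": (math.log, math.log, lambda a, b, x: math.exp(a) * x ** b),
    "S": (lambda x: 1.0 / x, math.log, lambda a, b, x: math.exp(a + b / x)),
    "exponential": (lambda x: x, math.log, lambda a, b, x: math.exp(a + b * x)),
}

def fit_all(xs, ys):
    """Fit every model; return {name: (a, b, sum of squared residuals on the y scale)}."""
    results = {}
    for name, (fx, fy, predict) in MODELS.items():
        a, b = linear_lsq([fx(x) for x in xs], [fy(y) for y in ys])
        ssr = sum((y - predict(a, b, x)) ** 2 for x, y in zip(xs, ys))
        results[name] = (a, b, ssr)
    return results

# Synthetic example: data generated from y = 2 * x ** 1.5 (not the study's data)
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = [2.0 * x ** 1.5 for x in xs]
results = fit_all(xs, ys)
best = min(results, key=lambda name: results[name][2])
print(best, results[best])
```

Linearization keeps every fit a closed-form straight-line regression; a direct non-linear least-squares fit on the original scale would generally give somewhat different coefficients for the power, S and exponential models.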

BP neural network
The BP neural network is a multi-layer feedforward neural network trained by error back-propagation. Its neurons use the S-shaped (sigmoid) transfer function, so each neuron's output is a continuous value between 0 and 1, and the network can realize an arbitrary nonlinear mapping from input to output.
The BP algorithm is a generalization of the delta (δ) learning rule and is a supervised learning algorithm. The main idea of BP is to input the learning samples $X_1, X_2, \ldots, X_n$. It is known that the
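The delta-rule training of a BP network can be sketched for one sigmoid hidden layer and one sigmoid output neuron. This is a hypothetical illustration, not the study's configuration: the layer sizes, uniform weight initialization, learning rate, the synthetic data and the `min_max_scale` helper (which maps targets into [0.1, 0.9] so that a sigmoid output can reach them) are all assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class BPNetwork:
    """Input-hidden-output network trained by error back-propagation (hypothetical sketch)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        """One delta-rule update that minimises 0.5 * (y - target) ** 2."""
        h, y = self.forward(x)
        # Output-layer delta: error times the sigmoid derivative y * (1 - y)
        delta_out = (y - target) * y * (1.0 - y)
        # Hidden-layer deltas, back-propagated through the (not yet updated) w2
        delta_h = [delta_out * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
        for j, hj in enumerate(h):
            self.w2[j] -= lr * delta_out * hj
        self.b2 -= lr * delta_out
        for j, row in enumerate(self.w1):
            for i, xi in enumerate(x):
                row[i] -= lr * delta_h[j] * xi
            self.b1[j] -= lr * delta_h[j]

    def mse(self, xs, ys):
        return sum((self.predict(x) - t) ** 2 for x, t in zip(xs, ys)) / len(xs)

def min_max_scale(values, low=0.1, high=0.9):
    """Linearly map values into [low, high] (assumes max(values) > min(values))."""
    lo, hi = min(values), max(values)
    return [low + (v - lo) / (hi - lo) * (high - low) for v in values]

# Synthetic two-input example (not the study's data)
xs = [[i / 10.0, (10 - i) / 10.0] for i in range(11)]
raw = [5.0 + 20.0 * x[0] ** 2 for x in xs]  # stand-in for Chl-a values
ys = min_max_scale(raw)
net = BPNetwork(n_in=2, n_hidden=5, seed=1)
print("training MSE before:", net.mse(xs, ys))
for epoch in range(500):
    for x, t in zip(xs, ys):
        net.train_step(x, t, lr=0.5)
print("training MSE after:", net.mse(xs, ys))
```

The loop uses online (per-sample) updates; averaging the deltas over all samples before each update (batch training) is an equally common choice. Predictions must be mapped back from [0.1, 0.9] to Chl-a units by inverting the scaling.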

RESULTS AND DISCUSSION
Curve fitting models
Many curve fitting models have been used to predict Chl-a concentration from remote sensing data (Liu & Woods ). In addition, the NDV value (B4/B3) has been used as the key index in Chl-a inversion (Gu & Pei ). Therefore, this study used the NDV values of the Landsat8 data as the X-axis and the measured Chl-a values as the Y-axis for curve fitting. As shown in Figure 3 and Table 1, in terms of accuracy, R² and ∑Residual², the Power model and the S model gave better results than the other curve fitting models, with ∑Residual² = 90.469 and ∑Residual² = 90.466, respectively. Therefore, the study used these models to map the Chl-a level from the RS data, based on $Y = 14.85988184091882\,x^{-4.582236735711132}$ and $Y = e^{-1.33226861857868 + 4.060797975393096/x}$. In addition, judged by the R² index (R² = 0.001), the Logarithmic model and the Inverse model also gave good results, which indicated that the volatility of the curve fitting was low; the results can be seen in Figure 4.
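To illustrate how the two fitted equations turn a Landsat8 scene into a Chl-a map, the sketch below computes the B4/B3 ratio (the NDV value) pixel by pixel and applies either model. It assumes the bands are given as nested lists of reflectance values and leaves pixels with B3 = 0 empty; the helper names and sample pixels are hypothetical, and the study's maps were produced with the ENVI software rather than with code like this.

```python
import math

def band_ratio(b4, b3):
    """Per-pixel B4/B3 ratio (the NDV value); None where B3 is zero."""
    return [[r / g if g != 0 else None for r, g in zip(row4, row3)]
            for row4, row3 in zip(b4, b3)]

def chla_power(x):
    # Power model: Y = 14.85988184091882 * x^(-4.582236735711132)
    return 14.85988184091882 * x ** -4.582236735711132

def chla_s(x):
    # S model: Y = exp(-1.33226861857868 + 4.060797975393096 / x)
    return math.exp(-1.33226861857868 + 4.060797975393096 / x)

def map_chla(ratio, model):
    """Apply a fitted model to every valid pixel of a ratio image."""
    return [[model(x) if x is not None else None for x in row] for row in ratio]

# Hypothetical 2 x 2 scene (red band B4 and green band B3 reflectances)
b4 = [[0.08, 0.10], [0.12, 0.09]]
b3 = [[0.08, 0.09], [0.10, 0.0]]
ratio = band_ratio(b4, b3)
power_map = map_chla(ratio, chla_power)
s_map = map_chla(ratio, chla_s)
print(power_map)
print(s_map)
```

Because the power model's exponent is large and negative, its predictions grow very quickly as the ratio falls below 1, so predictions for pixels outside the range of ratios seen during fitting should be treated with caution.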

The following figures show the results of these models, computed from the Landsat8 satellite data using the ENVI software. The best two curve models (Inverse model and Power

BP neural network
The study used 77.2% of the data for training and 22.8% for testing, with the BP neural network structure shown in Figure 5. The B1-B11 band data and Chl-a values were used as the input from Figure. As can be seen from Figure 6, the Chl-a values predicted by the BP method were smooth overall. In addition, the south-western area had higher Chl-a values than other areas.
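The 77.2%/22.8% division and the testing mean squared error used to compare the methods can be sketched as follows. How the study assigned samples to the two subsets is not stated, so the random shuffle with a fixed seed below is an assumption, as are the function names.

```python
import random

def split_train_test(samples, train_fraction=0.772, seed=0):
    """Shuffle a copy of the samples and split it into training and testing subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

samples = list(range(100))  # stand-ins for 100 hypothetical sample records
train, test = split_train_test(samples)
print(len(train), len(test))
print(mean_squared_error([2.0, 3.5], [2.5, 3.0]))  # 0.25
```

Only the testing subset should be used to compute the error that is compared across methods, so that each model is judged on samples it did not see during training.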

RBF neural network
The study used 77.2% of the data for training and 22.8% for testing, with the structure of the RBF neural network shown as  As can be seen from Figure 8, the Chl-a values predicted by the RBF method were smooth overall.
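Because the section gives no details of the RBF network, the following is a hypothetical sketch of one common formulation: Gaussian basis functions at fixed centres, a bias term, and output weights solved by linear least squares (with a tiny ridge term for numerical stability). The choice of centres, the width `sigma`, the ridge value and the synthetic data are illustrative assumptions, not the study's settings.

```python
import math

def gaussian(r, sigma):
    return math.exp(-(r * r) / (2.0 * sigma * sigma))

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

class RBFNetwork:
    """Gaussian RBF hidden layer plus a linear output neuron (hypothetical sketch)."""

    def __init__(self, centers, sigma, ridge=1e-8):
        self.centers = centers
        self.sigma = sigma
        self.ridge = ridge
        self.weights = None

    def _features(self, x):
        # One Gaussian response per centre, plus a constant bias feature
        return [gaussian(math.dist(x, c), self.sigma) for c in self.centers] + [1.0]

    def fit(self, xs, ys):
        phi = [self._features(x) for x in xs]
        k = len(phi[0])
        # Normal equations: (Phi^T Phi + ridge * I) w = Phi^T y
        ata = [[sum(p[i] * p[j] for p in phi) + (self.ridge if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        aty = [sum(p[i] * y for p, y in zip(phi, ys)) for i in range(k)]
        self.weights = solve(ata, aty)
        return self

    def predict(self, x):
        return sum(w * f for w, f in zip(self.weights, self._features(x)))

# Synthetic one-input example (not the study's data)
xs = [[i / 10.0] for i in range(11)]
ys = [math.sin(3.0 * x[0]) for x in xs]
net = RBFNetwork(centers=xs[::2], sigma=0.2).fit(xs, ys)
train_mse = sum((net.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(train_mse)
```

Unlike the BP network, only the output weights are learned here, and they come from a single linear solve; in practice the centres are often chosen by clustering the training inputs rather than taken directly from them.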

Deep learning method
After trying different models, the segmentation results of the Landsat8 data for assessing the Chl-a level were obtained, as shown in Figure 11. Chl-a was conspicuous in some parts of the study area, and Figure 11 also shows visually how each model represents the Chl-a level. Figure 11 shows that the curve fitting models were clearly less accurate than the machine learning models. Although the power model performed best among the curve fitting models, with ∑Residual² of fitting = 90.469, it was still worse than the machine learning models. Among the three machine learning models, in terms of the testing mean squared error in Table 4, BP performed best, followed by DNN and then RBF.
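The architecture of the deep network is not described in this section. As a hypothetical sketch only, a deep feedforward network can be written as a stack of sigmoid hidden layers trained by back-propagation, ending in the same sigmoid output mapping used by a BP network. The layer sizes, the uniform initialization and the 11-input example (one input per band B1-B11) are assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class DeepMLP:
    """Several sigmoid hidden layers and one sigmoid output neuron (hypothetical sketch).

    sizes = [n_inputs, hidden_1, ..., hidden_k, 1]"""

    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        # weights[layer][k][j]: from neuron j of layer `layer` to neuron k of layer + 1
        self.weights = [[[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
                        for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        self.biases = [[0.0] * n_out for n_out in sizes[1:]]

    def forward(self, x):
        acts = [list(x)]
        for w, b in zip(self.weights, self.biases):
            prev = acts[-1]
            acts.append([sigmoid(sum(wj * a for wj, a in zip(row, prev)) + bk)
                         for row, bk in zip(w, b)])
        return acts

    def predict(self, x):
        return self.forward(x)[-1][0]

    def train_step(self, x, target, lr):
        """One back-propagation update that minimises 0.5 * (y - target) ** 2."""
        acts = self.forward(x)
        y = acts[-1][0]
        # Output delta: the same form as in a single-hidden-layer BP network
        deltas = [[(y - target) * y * (1.0 - y)]]
        # Propagate deltas backwards through every hidden layer
        for layer in range(len(self.weights) - 1, 0, -1):
            w, a, d_next = self.weights[layer], acts[layer], deltas[0]
            deltas.insert(0, [sum(w[k][j] * d_next[k] for k in range(len(d_next)))
                              * a[j] * (1.0 - a[j]) for j in range(len(a))])
        # Gradient-descent updates, using deltas computed before any weight changed
        for layer, (w, b) in enumerate(zip(self.weights, self.biases)):
            for k, row in enumerate(w):
                for j, a_j in enumerate(acts[layer]):
                    row[j] -= lr * deltas[layer][k] * a_j
                b[k] -= lr * deltas[layer][k]

# Illustrative sizes only: 11 band inputs, two hidden layers, one output
net = DeepMLP([11, 16, 8, 1], seed=0)
pixel = [0.1] * 11  # hypothetical scaled reflectances for one pixel
print(net.predict(pixel))
```

With sigmoid units throughout, the gradients reaching the early layers shrink as layers are added, which is one reason a deeper network of this kind does not automatically outperform a shallow BP network on a modest data set.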

CONCLUSIONS
In this study, the Chl-a values in Taihu Lake were retrieved by various methods based on Landsat8 data from 2017. Among the curve fitting methods, the power model and the inverse model had a better ∑Residual² of fitting than the other curve fitting models. However, the machine learning methods had less error than the curve fitting methods. Of course, the curve fitting methods are simpler than the machine learning methods and place fewer demands on computing power and machine configuration.
Among the three machine learning methods, BP gave the best results, and DNN also achieved a good mean squared error.
Therefore, the best method in this study was the BP model.
Although deep learning is a very popular method now, its result was slightly inferior to that of the BP method in this study.
The reason may be that the function between the last hidden layer and the output layer of the deep learning method adopted in this study was the same as that of the BP method.
The data consisted of measurements at multiple points in multiple months throughout 2017. If a study dealt with a small amount of data, deep learning might perform better than the BP method. In any case, the advantages of the deep network were not fully demonstrated here, and the other methods were more effective in this study. In the future, data volume and equipment conditions should be considered when choosing the optimal method.

ACKNOWLEDGEMENTS
Chlorophyll-a concentration data were provided by CERN (Taihu Laboratory for Lake Ecosystem Research, Nanjing). Zhao wrote the paper.

DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.