Quantitative Representation of Water Quality Biotoxicity by Algal Photosynthetic Inhibition

The method based on the photosynthetic inhibition effect of algae offers the advantages of swift response and straightforward measurement. Nonetheless, this effect is influenced by both the environment and the state of the algae themselves. Additionally, a single parameter is vulnerable to uncertainties, rendering the measurement accuracy and stability inadequate. This paper employed currently utilized photosynthetic fluorescence parameters, including Fv/Fm(maximum photochemical quantum yield), Performance Indicator (PIabs), Comprehensive Parameter Index (CPI) and Performance Index of Comprehensive Toxicity Effect (PIcte), as quantitative toxicity characteristic parameters. The paper compared the univariate curve fitting results with the multivariate data-driven model results and investigated the effectiveness of Back Propagation(BP) Neural Network and Support Vector Machine for Regression (SVR) models to enhance the accuracy and stability of toxicity detection. Using Dichlorophenyl Dimethylurea (DCMU) samples as an example, the mean Relative Root Mean Square Error (RRMSE) corresponding to the optimal parameter PIcte for the dose-effect curve fitting was 1.246 in the concentration range of 1.25–200 µg/L. On the other hand, the mean RRMSEs corresponding to the results of the BP neural network and SVR models were 0.506 and 0.474, respectively. Notably, BP neural network exhibited excellent prediction accuracy in the medium-high concentration range of 7.5–200 µg/L, with a mean RRSME of only 0.056. Regarding the stability of the results, the mean Relative Standard Deviation (RSD) of the univariate dose-effect curve results was 15.1% within the concentration range of 50–200 µg/L. In contrast, the mean RSDs for both BP neural network and SVR results were less than 5%. In the concentration range of 1.25–200 µg/L, the mean RSDs were 6.1% and 16.5%, with the BP neural network performing well. The experimental results of Atrazine were analyzed to further validate the effectiveness of the BP neural network in improving the accuracy and stability of results. These findings provided valuable insights for the development of biotoxicity detection by using the algae photosynthetic inhibition method.


Introduction
As unicellular organisms that play a significant role in ecosystems, the algae serve as excellent test organisms for accessing biological toxicity due to their sensitivity to toxins, short generation periods, ease of culture, and direct observation of toxicity symptoms at the cellular level [1]. The Organization for Economic Cooperation and Development (OECD) has recognized the algal growth inhibition test as a standard method for assessing the toxicity of chemicals [2]. Different from the growth inhibition method, which relies on cell density, cell yield, and growth rate as reaction endpoints, the algal photosynthetic inhibition method assesses the toxic effects of pollutants in water by measuring changes in photosynthetic fluorescence parameters based on rapid chlorophyll fluorescence induction kinetics. This method offers the advantages of rapid and convenient measurement and enables online detection in real time. In recent years, experiments on the photosynthesis inhibition of algae by pollutant toxicity have been carried out extensively [3]. Strasser [4] derived a series of fluorescence parameters based on the theoretical models of biofilm energy flow. He used dose-effect (Log Logistic) curves to non-linear fit the inhibition rates of fluorescence parameters and pollutant concentrations and obtained dose-effect relationships between them. However, the quantitative analysis remained at the univariate curve fitting calibration stage. As research progressed, it was found that photosynthetic inhibition in algae was strongly influenced by environmental factors and the state of the algae themselves; relying on a single parameter resulted in uncertain measurements with lower accuracy and stability [5]. As a result, more parameters were needed to represent the photosynthesis inhibition effects, and quantitative analysis methods with multivariate inputs needed to be developed to reduce the interference of process factors by complementing information among variables. These improvements would lead to improved accuracy and stability of the algal photosynthesis inhibition method for biotoxicity detection.

Algorithm
Data-driven algorithms are well-suited for analyzing the complex relationship between dose and effect in algal photosynthesis inhibition detection. Such algorithms can effectively adjust parameters and extract the most relevant information about dependent variables, particularly when the relationship between multiple variables is unknown and difficult to discern. The commonly used data-driven models for this purpose are the Back Propagation(BP) neural network and Support Vector Regression(SVR).

BP Neural Network
The BP neural network is a multi-layer forward neural network based on the error backpropagation algorithm and has strong non-linear fitting and generalization capabilities. The network's input forward propagation equation [6] is C NNR is the predicted output of the output layer, the pollutant concentration predicted by the neural network. I i represents the inhibition rate of photosynthetic fluorescence parameters after normalization, y h is the output of the hidden layer, f 1 is the S-shaped transfer function between the input layer and the hidden layer, f 2 is the linear transfer function between the hidden layer and the output layer, W ih is the connection weights between the input layer and the hidden layer, W hj is the connection weight between the hidden layer and the output layer. If the obtained output does not match the target output, the difference between the two is backward propagated, and the connection weights between layers are adjusted. The formulas for updating weights are w hj (n + 1) and w ih (n + 1) denote the (n + 1) th weights, w hj (n) and w ih (n) denote the n th weights; T j is the j th node target value; η is the learning rate and takes the value in the range of 0 to 1. In this paper, we utilized Python 3.9 as the programming language and PyTorch 1.13.1 as the framework to train the BP neural network.

Support Vector Machine for Regression
SVR is an excellent algorithm for solving regression problems [7]. Based on statistical learning theory, this algorithm can find the best compromise between the complexity and the ability of the model on limited sample information; it can achieve the optimal learning effect while guaranteeing the generalization ability. The algorithm is especially suitable for handling small sample sizes and non-linear quantitative analysis [8] and can also handle highly correlated input variables. The formula of SVR is [9] In the formula, C SVR is the pollutant concentration obtained from SVR, SV is the set of support vectors, a i and l i are the Lagrange Multiplier and support vectors, respectively. I is the input vector, b is a constant, k rb f is a radial basis function, the formulas are [9]: The gamma is an important parameter in the kernel function. To improve the precision of predictions for the quantitative analysis of biotoxicity, the penalty factor C and gamma parameters need to be adjusted, and this paper employs the grid search method for parameter optimization. We used Python 3.9 as the programming language and SciKit-Learn 1.2.1 as the algorithm package in this paper, similar to the previous Neural Network section.
Multiple fluorescence parameters with high usage rates are extracted to compare the effectiveness of the input dose-effect curve fitting method with univariate input, the BP neural network, and SVR with multivariate input. This paper is going to investigate the effectiveness of the data-driven method with univariate input in improving the accuracy and stability of the algal photosynthetic inhibition method for biotoxicity detection.

Experimental Measurement
The experiments in this paper used Chlorella Pyrenoidosa, a common algae species in freshwater, as the test organism and the pesticide Dichlorophenyl Dimethylurea (DCMU) and Atrazine as the stressors. The multi-phase chlorophyll fluorescence kinetic curves were measured by a variable light pulse-induced chlorophyll fluorescence analyzer (AGHJ-TPLIF-I, developed by Anhui Institute of Optics and Mechanics, Chinese Academy of Sciences) [10], as depicted in Figure 1. The selection and culturing methods were the same as in reference [11]. The initial chlorophyll concentration of the experimental algae was maintained at 100-200 µg/L, with 50 mL of algae liquid and 1 mL of stressor liquid (excluding the interference caused by Dimethyl Sulfoxide and dilution of the algae liquid), and the concentration and inhibition time of the stressors were set as indicated in Table 1. The measured fluorescence kinetic curves under DCMU and Atrazine stress were displayed in Figure 2, and the photosynthetic fluorescence parameters used in this paper were based on the inversion of these fluorescence kinetic curves.

Parameters Selection
The most used toxicity response parameters are Fv/Fm (maximum quantum yield of photosystem PSII) and the Performance Indicator for energy conservation from photons absorbed by PSII antenna to the reduction of QB (PIabs) [12][13][14][15]. Wang et al. [15] examined the stress response of Microcystic aerugionosa and Pseudokirchneriella subcapitata to Tetracycline and concluded that the maximum quantum yield of photosystem PSII (Fv/Fm) is a suitable indicator for Tetracycline toxicity detection. Li et al. [14] presented that Fv/Fm can be used to assess the toxicity of Cu 2+ . Strasser [4] constructed PIabs based on a theoretical model of biofilm energy flow, considering the energy flow during photosynthesis. Sun et al. [16] demonstrated that PIabs are the most convincing parameters among all fluorescence parameters by comparing the effects of Atrazine on different parameters of Chlorella. Based on the fact that the toxic stress caused the distortion of the algal fluorescence kinetic curve (OJIP), and the degree of change was proportional to the intensity of toxicity (as shown in Figure 2), some researchers extracted information on curve variability and obtained comprehensive fluorescence parameters to characterize the biological toxicity of water bodies, such as Comprehensive Parameter Index (CPI) and Performance Index of Comprehensive Toxicity Effect (PIcte). Moreover, they proved that the comprehensive parameters were su-

Parameters Selection
The most used toxicity response parameters are Fv/Fm (maximum quantum yield of photosystem PSII) and the Performance Indicator for energy conservation from photons absorbed by PSII antenna to the reduction of QB (PIabs) [12][13][14][15]. Wang et al. [15] examined the stress response of Microcystic aerugionosa and Pseudokirchneriella subcapitata to Tetracycline and concluded that the maximum quantum yield of photosystem PSII (Fv/Fm) is a suitable indicator for Tetracycline toxicity detection. Li et al. [14] presented that Fv/Fm can be used to assess the toxicity of Cu 2+ . Strasser [4] constructed PIabs based on a theoretical model of biofilm energy flow, considering the energy flow during photosynthesis. Sun et al. [16] demonstrated that PIabs are the most convincing parameters among all fluorescence parameters by comparing the effects of Atrazine on different parameters of Chlorella. Based on the fact that the toxic stress caused the distortion of the algal fluorescence kinetic curve (OJIP), and the degree of change was proportional to the intensity of toxicity (as shown in Figure 2), some researchers extracted information on curve variability and obtained comprehensive fluorescence parameters to characterize the biological toxicity of water bodies, such as Comprehensive Parameter Index (CPI) and Performance Index of Comprehensive Toxicity Effect (PIcte). Moreover, they proved that the comprehensive parameters were su-

Parameters Selection
The most used toxicity response parameters are Fv/Fm (maximum quantum yield of photosystem PSII) and the Performance Indicator for energy conservation from photons absorbed by PSII antenna to the reduction of QB (PI abs ) [12][13][14][15]. Wang et al. [15] examined the stress response of Microcystic aerugionosa and Pseudokirchneriella subcapitata to Tetracycline and concluded that the maximum quantum yield of photosystem PSII (Fv/Fm) is a suitable indicator for Tetracycline toxicity detection. Li et al. [14] presented that Fv/Fm can be used to assess the toxicity of Cu 2+ . Strasser [4] constructed PI abs based on a theoretical model of biofilm energy flow, considering the energy flow during photosynthesis. Sun et al. [16] demonstrated that PI abs are the most convincing parameters among all fluorescence parameters by comparing the effects of Atrazine on different parameters of Chlorella. Based on the fact that the toxic stress caused the distortion of the algal fluorescence kinetic curve (OJIP), and the degree of change was proportional to the intensity of toxicity (as shown in Figure 2), some researchers extracted information on curve variability and obtained comprehensive fluorescence parameters to characterize the biological toxicity of water bodies, such as Comprehensive Parameter Index (CPI) and Performance Index of Comprehensive Toxicity Effect (PI cte ). Moreover, they proved that the comprehensive parameters were superior in terms of toxicity response, minimum detection limit, maximum response concentration, stability, and reproducibility under stress with DCMU, Dibromothymoquinone (DBMIB), Methyl Viologen (MV), Malathion and Carbofuran [17,18].
In this paper, based on the above research about toxicity parameters, the photosynthetic fluorescence parameters Fv/Fm and performance indicators PI abs , comprehensive parameters CPI and PI cte , would be extracted and used as input parameters for univariate dose-effect curve fitting and multivariate BP neural network and SVR.

Method and Results
As explained in Section 3.2, we conducted dose-effect curve, BP neural network, and SVR analyses using normalized Fv/Fm, PI abs , CPI and PI cte as independent variables, while the sample concentration served as the dependent variable. We established n experimental concentrations and used n-1 of them as the training set and 1 as the test set to obtain predicted results for all samples. We evaluated the accuracy of predictions using the Relative Root Mean Square Error (RRMSE) and the stability using the Relative Standard Deviation (RSD) [19].
In the formula, x i is the prediction corresponding to a sample, x t is the true concentration, n is the times of repeat experiments.
In the formula, x i is the prediction corresponding to a sample, x is the mean concentration, n is the times of repeat experiments.

Analysis of Single Parameter for Dose-Effect Curve Fitting
Initially, the inhibition rates of Fv/Fm, PI abs , CPI and PI cte variables were set in a dose-effect curve fitting independently to predict concentration. The results presented that RRMSEs of samples in concentrations of 0.625 µg/L and 0.75 µg/L were greater than 10 and thus not significantly reliable. Only results in the effective concentrations range of 1.25-200 µg/L were included below. Comparing the accuracy of the prediction results for different parameters, RRMSEs corresponding to Fv/Fm, PI abs , CPI and PI cte were 2.25, 1.31, 1.54 and 1.25, respectively. The three comprehensive parameters predicted higher accuracy than Fv/Fm alone, with PI cte having the lowest RRMSE and a 44.9% decrease compared to Fv/Fm. This suggested that the comprehensive parameters reflect the effects of multiple nodes, rather than one node of the photosynthesis process, and provide more valid information and greater accuracy in predicting samples of unknown concentrations. Further analysis of the trend of the RRMSE for each parameter with increasing concentration showed that the RRMSE of the three comprehensive parameters rapidly decreased with increasing concentration and stabilized in the middle and high range. However, the accuracy and the stability of predictions were not high enough, as shown in Figure 3a.

Analysis of Multi-Parameters for Data-Driven Model
The normalized inhibition rates of Fv/Fm, PIabs, CPI and PIcte were used as the independent variables for regression analysis in the data-driven model, and the concentration of samples was used as the dependent variable. Univariate dose-effect curves for PIcte were compared to multivariate input predictions using BP neural network and SVR, as indicated in Table 2.  Regarding stability, multiple parallel samples in the low to medium concertation region (0.625-32 µg/L) had a consistent RSD of zero, and this was not in line with reality. The sensitivity to changes in independent variables decreased sharply for the larger values of the power p of the Logistic function (y = A 2 + (A 1 − A 2 )/(1 + (x/x 0 ) p ) corresponding to the dose-effect curve, which seriously affected the sensitivity of the detection in the low-concentration region. Therefore, only results in the concentration range of 50-200 µg/L were considered. The RSDs corresponding to Fv/Fm, PI abs , CPI and PI cte were 35.4%, 19.9%, 17.2% and 15.1%, respectively, while the PI cte parameter had the smallest RSD, which was reduced by 57.3% relative to Fv/Fm, as shown in Figure 3b.

Analysis of Multi-Parameters for Data-Driven Model
The normalized inhibition rates of Fv/Fm, PI abs , CPI and PI cte were used as the independent variables for regression analysis in the data-driven model, and the concentration of samples was used as the dependent variable. Univariate dose-effect curves for PI cte were compared to multivariate input predictions using BP neural network and SVR, as indicated in Table 2. The mean RRMSE values for the effective concentration range (1.25-200 µg/L) were 1.246, 0.506, and 0.474 for the dose-effect curve, BP neural network, and SVR, respectively. The BP neural network and SVR models clearly outperformed the conventional dose-effect curve, with a reduction in RRMSE values of 59.3% and 61.9%, respectively. An analysis of the RRMSE trends with concentration indicated that the data-driven model predictions decreased rapidly with increasing concentration and were stable in medium-to highconcentration regions, as depicted in Figure 4a. The mean RRMSE of the BP neural network in the medium to high concentration range (7.5-200 µg/L) was only 0.056, demonstrating high accuracy, attributed to its strong non-linear fitting capability in approximating any non-linear continuous function with arbitrary precision.
Comparative analysis of the stability of the prediction results revealed that the mean RSD of the univariate dose-effect curve fitting for PI cte was 15.1% in the effective concentration interval (50-200 µg/L), while both data-driven models showed mean RSDs of less than 5% in this concentration region. For the effective concentration interval (1.25-200 µg/L), the RSDs of the multivariate BP neural network and SVR predictions decreased with increasing concentration, with mean values of 6.1% and 16.5%, respectively, and the BP neural network predictions were more stable. In summary, for DCMU samples, BP neural network with multivariate inputs exhibited superior predictive accuracy and stability relative to dose-effect curve fitting and SVR, owing to their strong non-linear fitting capability.
Comparative analysis of the stability of the prediction results revealed that the mean RSD of the univariate dose-effect curve fitting for PIcte was 15.1% in the effective concentration interval (50-200 µg/L), while both data-driven models showed mean RSDs of less than 5% in this concentration region. For the effective concentration interval (1.25-200 µg/L), the RSDs of the multivariate BP neural network and SVR predictions decreased with increasing concentration, with mean values of 6.1% and 16.5%, respectively, and the BP neural network predictions were more stable.
In summary, for DCMU samples, BP neural network with multivariate inputs exhibited superior predictive accuracy and stability relative to dose-effect curve fitting and SVR, owing to their strong non-linear fitting capability.

Experimental Verification of the Other Substance
The methodology was applied to Atrazine samples to assess the efficacy of multivariate BP neural network in improving prediction accuracy. Initially, dose-effect curve fitting was conducted with Fv/Fm, PIabs, CPI and PIcte inhibition rate used as independent variables, and with the concentration of the samples used as the dependent variable. The mean RRMSEs in the effective concentration range (1.25-200 µg/L) were obtained for each parameter and were 2.20, 1.59, 1.40 and 1.40, respectively. The predictions of the three comprehensive parameters were found to be more accurate than the common parameter Fv/Fm, and there were the most accurate results for CPI and PIcte. The mean RSDs corresponding to the parameters Fv/Fm, PIabs, CPI and PIcte in the effective concentration interval (50-200 µg/L) were 16.4%, 16.7%, 19.7% and 17.0%, respectively. There was no significant difference in the RSDs of the predicted results for the four parameters using a one-ANOVA analysis of variance. Figure 5 displayed a further comparison of the multivariate BP neural network and SVR predictions. There was a higher accuracy for BP neural network and SVR predictions, with mean RRMSE values of 0.532 and 1.379, respectively. The accuracy of BP neural network prediction was the highest, and the RRMSEs were decreased by 62.0% relative to the results of the univariate PIcte for dose-effect fitting. In the whole concentration region, the

Experimental Verification of the Other Substance
The methodology was applied to Atrazine samples to assess the efficacy of multivariate BP neural network in improving prediction accuracy. Initially, dose-effect curve fitting was conducted with Fv/Fm, PI abs , CPI and PI cte inhibition rate used as independent variables, and with the concentration of the samples used as the dependent variable. The mean RRMSEs in the effective concentration range (1.25-200 µg/L) were obtained for each parameter and were 2.20, 1.59, 1.40 and 1.40, respectively. The predictions of the three comprehensive parameters were found to be more accurate than the common parameter Fv/Fm, and there were the most accurate results for CPI and PI cte . The mean RSDs corresponding to the parameters Fv/Fm, PI abs , CPI and PI cte in the effective concentration interval (50-200 µg/L) were 16.4%, 16.7%, 19.7% and 17.0%, respectively. There was no significant difference in the RSDs of the predicted results for the four parameters using a one-ANOVA analysis of variance. Figure 5 displayed a further comparison of the multivariate BP neural network and SVR predictions. There was a higher accuracy for BP neural network and SVR predictions, with mean RRMSE values of 0.532 and 1.379, respectively. The accuracy of BP neural network prediction was the highest, and the RRMSEs were decreased by 62.0% relative to the results of the univariate PI cte for dose-effect fitting. In the whole concentration region, the RRMSEs of the BP neural network predictions remained basically unchanged with concentration, demonstrating good accuracy. Additionally, in the effective concentration region (50-200 µg/L), the mean RSD of the BP neural network predictions was 9.8%, lower than the dose-effect curve fitting results. These results were consistent with those obtained from DCMU samples in Section 4.2, which further illustrated the ability of BP neural networks to significantly improve the accuracy and stability of toxicity detection of photosynthetic inhibition effects in algae.
centration, demonstrating good accuracy. Additionally, in the effective concentration region (50-200 µg/L), the mean RSD of the BP neural network predictions was 9.8%, lower than the dose-effect curve fitting results. These results were consistent with those obtained from DCMU samples in Section 4.2, which further illustrated the ability of BP neural networks to significantly improve the accuracy and stability of toxicity detection of photosynthetic inhibition effects in algae.

Conclusions
In this paper, we examined the effectiveness of a multivariate BP neural network in improving the accuracy of predictions using four commonly used photosynthetic fluorescence parameters, namely Fv/Fm, Plabs, CPI and PIcte. We compared and analyzed the accuracy and stability of the inversion results of the univariate dose-effect curve and multivariate BP neural network and SVR. Firstly, we analyzed the results of the four parameters entered individually into the dose-effect curve for DCMU samples. The mean RRMSE and RSD values corresponding to the comprehensive parameter PIcte were 1.26 (1.25-200 µg/L) and 15.1% (50-200 µg/L), respectively, which were better than those of the other parameters but not sufficiently high. We then compared the results of the BP neural network and SVR with multivariate input. The corresponding mean RRMSEs were 0.506 and 0.474, respectively, with a mean RRMSE of only 0.056 in the medium to high concentration range (7.5-200 µg/L) for the BP neural network. The BP neural network also exhibited good stability, with an average RSD of only 6.1% (1.25-200 µg/L). Finally, the experimental finding of Atrazine samples further verified that the BP neural network could effectively improve the accuracy and stability of algal photosynthesis inhibition methods. In the next step, the BP neural network quantitative analysis will be optimized for the practical situation of the algal photosynthetic inhibition method on biotoxicity detection.

Conclusions
In this paper, we examined the effectiveness of a multivariate BP neural network in improving the accuracy of predictions using four commonly used photosynthetic fluorescence parameters, namely Fv/Fm, Pl abs , CPI and PI cte . We compared and analyzed the accuracy and stability of the inversion results of the univariate dose-effect curve and multivariate BP neural network and SVR. Firstly, we analyzed the results of the four parameters entered individually into the dose-effect curve for DCMU samples. The mean RRMSE and RSD values corresponding to the comprehensive parameter PI cte were 1.26 (1.25-200 µg/L) and 15.1% (50-200 µg/L), respectively, which were better than those of the other parameters but not sufficiently high. We then compared the results of the BP neural network and SVR with multivariate input. The corresponding mean RRMSEs were 0.506 and 0.474, respectively, with a mean RRMSE of only 0.056 in the medium to high concentration range (7.5-200 µg/L) for the BP neural network. The BP neural network also exhibited good stability, with an average RSD of only 6.1% (1.25-200 µg/L). Finally, the experimental finding of Atrazine samples further verified that the BP neural network could effectively improve the accuracy and stability of algal photosynthesis inhibition methods. In the next step, the BP neural network quantitative analysis will be optimized for the practical situation of the algal photosynthetic inhibition method on biotoxicity detection.  Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: For personal reasons, the data will not be disclosed for the time being.

Conflicts of Interest:
The authors declare no conflict of interest.