Modelling Freshwater Eutrophication with Limited Limnological Data Using Artificial Neural Networks

Artificial Neural Networks (ANNs) have wide applications in aquatic ecology, specifically in modelling water quality and biotic responses to environmental predictors. However, data scarcity is a common problem that raises the need to optimize modelling approaches to overcome data limitations. In this paper, we investigate the optimal value of k for k-fold cross validation when building an ANN from a small water-quality data set. The ANN was created to model the chlorophyll-a levels of a shallow eutrophic lake (Mikri Prespa) located in N. Greece. The typical water quality parameters serving as the ANN’s inputs are pH, dissolved oxygen, water temperature, phosphorus, nitrogen, electric conductivity, and Secchi disk depth. The available data set was small, containing only 89 data samples. For that reason, k-fold cross validation was used for training the ANN. To find the optimal k value, several values of k were tested (ranging from 3 to 30). Additionally, leave-one-out (LOO) cross validation, which is an extreme case of k-fold cross validation, was also applied. The ANN’s performance indices improved clearly as k increased, and the best results were obtained with LOO cross validation, as expected. Computational times were recorded for each k value; even for LOO cross validation, the most expensive case, the computational time was relatively low, and therefore LOO is recommended. Finally, a sensitivity analysis was performed with the ANN to investigate the interactions of the input parameters with chlorophyll-a, and hence to examine the potential use of the ANN as a water management tool for nutrient control.


Introduction
Data-driven models are systematically used in the field of water resources management because they are less demanding in terms of data acquisition and quantity than the empirical and the physical-based hydrological models [1]. Artificial Neural Networks (ANNs) are such data-driven models with a wide application in water resources and aquatic sciences [2]. In particular, ANNs are used for modelling several domains of aquatic ecology such as benthic macroinvertebrates, planktonic communities, fish assemblages, and biomanipulation assessment [3]. During the last two decades, several ANN water quality modelling studies were performed with very good modelling results [4][5][6]. Their main advantages include their ability to model complex and nonlinear processes and the fact that they do not require assumptions about the distribution of the data or the relationships between input and output variables [2]. In addition, they offer the user the flexibility to successfully model environmental relationships with limited knowledge of the problem [7]. Thus, ANNs are considered ideal for modelling aquatic ecosystems, which are characterized by complex dynamics and nonlinear analytics [3].
Water quality monitoring programmes in particular produce a large amount of data with complex structures [8]. Unfortunately, there are also many cases where insufficient data quantity and quality can be a problem in ecological quality assessment and management [9]. Several factors such as bad weather conditions during monitoring, lack of funding, and problematic sensors are responsible for the lack of data. Data scarcity of water resources is an issue highlighted in the study of Cigizoglu and Kisi [10]; the authors address this issue successfully by applying k-fold cross validation into the ANN's training data set. Generally, the phenomenon of data scarcity/small data sets when modelling with ANNs is an issue found in many scientific fields and the method of k-fold cross validation is widely used [11]. The problem of having a small data set when applying k-fold cross validation is discussed in a water quality modelling study of Goethals et al. [5], where it is stated that lower k-values may produce more robust ANN models, but with lower performance, and therefore a high k-value is recommended.
In this article, we address the issue of training an ANN with a small dataset of water quality parameters collected from a eutrophic lake in Greece. Eutrophication is one of the most significant problems today that is responsible for the degradation of water quality in many freshwater, coastal, and marine ecosystems worldwide [12,13]. Because freshwater lakes are major providers of ecosystem services (e.g., water supply for potable use and irrigation), eutrophication has severe socioeconomic implications that threaten human well-being, particularly in areas of the world where water is scarce. Therefore, scientists are developing new tools and methods for efficient and improved monitoring of water quality. In addition, they employ advanced modelling techniques and algorithms for creating novel forecasting schemes of eutrophication, since a lake's ecosystem is very complex and sensitive and even a mild external pressure (such as tourist activity) with small inflows of biogenic elements can promote eutrophication [14]. Not surprisingly, ANNs have been used before in modelling eutrophication processes in lakes [15]. Linear regression methods and decision tree methods are some alternative methodologies used for eutrophication modelling. These methodologies have the merits of needing few parameters to adjust and being simple; however, they might not perform well when the data sample is small or does not follow certain assumptions about linear or Gaussian distributions [16]. In contrast, ANN models are not affected by the non-linearity effect, which is often observed between the environmental parameters associated with eutrophication [17].
The purpose of this modelling study is two-fold. First, we trained an ANN using k-fold cross validation and compared the modelling results obtained from models built with different k values. The required computational time was also considered as a criterion for choosing the optimal model, alongside the ANN's performance. Second, we examined the explanatory power of the optimal model to identify whether it can act as a water management tool. We show that the ANN managed to predict the chlorophyll-a levels of Lake Mikri Prespa with high accuracy. Based on this model, the contribution of each environmental parameter was evaluated with the use of a sensitivity analysis algorithm. The results of the sensitivity analysis produced useful conclusions about the role of each parameter, with emphasis on the effect of changes in nutrient levels on the trophic status of the lake.

Study Area and Data Collection
The dataset used contains environmental variables obtained from Lake Mikri Prespa, a shallow eutrophic lake located in northwestern Greece. The lake is an ecosystem of great ecological importance as it belongs to the Prespa National Park, is a Ramsar wetland site, an important bird area, and a Natura 2000 site [18]. The lake lies at an elevation of 853 m above sea level [19] and is located in the wider transboundary Prespa area shared by Greece, Albania, and North Macedonia (Figure 1). The climate of the area is characterized as sub-Mediterranean with continental influences, with frequent snowfall in the winter and summer rain drops [20]. The lake is characterized as a shallow lake, with approximately 48 km² of surface area, a maximum water depth of 8 m [21] and mean depth of 4.1 m [19]. The trophic status of the lake is characterized as eutrophic. As a result, prolonged cyanobacterial blooms have been recorded, which may start in spring and persist until December [22]. Chlorophyll-a concentration and environmental data were collected from fifteen sampling sites on a seasonal basis from 2006 to 2008 (see Hadjisolomou et al. [19]). We used the following environmental parameters as predictors of chlorophyll-a (Chl-a): pH, surface dissolved oxygen concentration (DO), electrical conductivity (EC), Secchi disk depth (SD), water depth (WD), surface water temperature (WT), total phosphorus concentration (TP) and dissolved inorganic nitrogen concentration (DIN).
The basic statistical properties of the environmental parameters are shown with the use of box plots in Figure 2.

Preliminaries on ANN and Model Construction
ANNs mimic the way biological neurons learn and make logical conclusions. A very popular ANN is the multi-layer feedforward (MLF) network trained by the backpropagation algorithm [23]. Feedforward networks are considered suitable for function approximation problems. The MLF networks are divided into at least three layers and each layer consists of neurons. The first layer is the input layer, followed by at least one intermediate hidden layer and the output layer that produces the final output. Each neuron of a layer is connected to all the neurons of the next layer through synaptic weights. Every neuron aggregates its weighted inputs and yields an output through an activation function. The most commonly used activation functions are the linear, the logistic and the hyperbolic tangent activation function [24]. The output value of the j-th neuron ($o_j$) is given by the following equations, as described by Dedecker et al. [25]:
$$a_j = \sum_{i} w_{ij} x_i + z_j \tag{1}$$
$$o_j = f(a_j) \tag{2}$$
where $a_j$ is the aggregated weighted input of the j-th neuron, $f$ is the activation function, $x_i$ is the input from the i-th neuron belonging to the immediately preceding layer, $w_{ij}$ is the synaptic weight that connects $x_i$ with the j-th neuron, and $z_j$ is a bias term. The output of each neuron is computed and propagated through the next layer until the last layer, producing a network output that is compared with the given output [26]. The learning procedure is repeated several times with the use of a training algorithm, and each time the synaptic weights are adjusted until they minimize an error function, usually taken as the mean square difference between the predicted and the given output [27]. The Levenberg-Marquardt (LM) algorithm has the fastest convergence among the existing variations of backpropagation algorithms when it comes to ANNs with up to a few hundred parameters [23]. The topology of an ANN, that is, the number of hidden layers and the number of neurons in each layer, can be determined by a trial-and-error procedure [28,29].
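The neuron computation described above can be illustrated with a minimal Python sketch. The study itself was carried out in Matlab, so this is not the authors' code. The function names (`neuron_output`, `layer_output`, `forward`) are hypothetical, and the choice of a hyperbolic tangent hidden layer with a linear output neuron is an assumption made only for illustration.

```python
import math


def linear(a):
    """Linear (identity) activation."""
    return a


def logistic(a):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-a))


def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Aggregate the weighted inputs, add the bias, apply the activation."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(net)


def layer_output(inputs, weight_rows, biases, activation=math.tanh):
    """Outputs of all neurons in one layer; one weight row per neuron."""
    return [neuron_output(inputs, row, b, activation)
            for row, b in zip(weight_rows, biases)]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Forward pass of a 3-layer network (assumed: tanh hidden, linear output)."""
    hidden = layer_output(inputs, hidden_w, hidden_b, math.tanh)
    return layer_output(hidden, out_w, out_b, linear)
```

Training (for example with the Levenberg-Marquardt algorithm) would adjust the weights and biases passed to `forward`; that step is outside the scope of this sketch.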
The topology of a 3-layer ANN can be presented as L1-H1-L2, where L1 is the number of neurons in the input layer, H1 the number of neurons in the hidden layer and L2 the number of neurons in the output layer. In order to avoid overfitting, the maximum number of hidden layer neurons can be computed according to the following rule of thumb proposed by Maier et al. [30]:
$$N_H \le 2 N_I + 1 \tag{3}$$
$$N_H \le \frac{N_{TR}}{N_I + 1} \tag{4}$$
where $N_H$ is the number of hidden layer neurons, $N_I$ the number of inputs, and $N_{TR}$ the number of training samples. The maximum $N_H$ must be the smaller of the two numbers given by these rules. Data normalization is a common procedure because it improves the ANN's performance [25,31]; after network training, the data are transformed back to their original scale. Dimensionality reduction of the measured variables is also recommended in order to reduce the size of the original set of variables [32]. Reducing the number of variables/parameters not only lowers the model's computational complexity, but also eliminates the possibility of model misconvergence and poor accuracy [4]. ANNs are evaluated based on several performance indices for their test set [33]. The Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) are some commonly used performance indices and have the following mathematical formulas, respectively [34,35]:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (o_i - s_i)^2} \tag{5}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| o_i - s_i \right| \tag{6}$$
where $o_i$ is the observed value, $s_i$ the simulated/predicted value, and $n$ the number of observations. The sensitivity analysis is performed based on the 'Perturb' method, a sensitivity analysis methodology that computes the effect of perturbing the input variables on the output variable. The 'Perturb' method examines the effect that a small change of an input variable has on the ANN's output; therefore, the input variables can be ranked in order of importance [36].
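As a hedged illustration of the quantities in this paragraph, the sketch below computes an upper bound for the hidden layer size together with the RMSE and MAE. The bound assumes the two commonly cited upper limits, N_H ≤ 2·N_I + 1 and N_H ≤ N_TR / (N_I + 1); this form of the rule of thumb is an assumption on our part, and the function names are hypothetical.

```python
import math


def max_hidden_neurons(n_inputs, n_train):
    """Smaller of two assumed upper bounds on the hidden layer size."""
    bound_from_inputs = 2 * n_inputs + 1
    # Floor division: the neuron count must be an integer not above the ratio.
    bound_from_samples = n_train // (n_inputs + 1)
    return min(bound_from_inputs, bound_from_samples)


def rmse(observed, simulated):
    """Root Mean Square Error between observed and simulated values."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def mae(observed, simulated):
    """Mean Absolute Error between observed and simulated values."""
    n = len(observed)
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / n
```

With seven inputs and 88 training samples (one sample held out of 89), `max_hidden_neurons(7, 88)` evaluates `min(15, 11)`. When a test set holds a single observation, `rmse` and `mae` both reduce to the absolute error of that observation, so the two indices coincide.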
The mathematical formula that describes the 'Perturb' method of sensitivity analysis is given by Lee et al. [37]; in that formula, n represents the number of observations.
The k-fold cross validation method is used in order to avoid overfitting: the data set is divided into k equally sized folds/subsamples [38]. For each fold, an ANN model is trained on the other k − 1 folds of the data set and validated on the held-out fold. The cross-validation procedure is repeated k times (hence, each fold is used exactly once as the validation data set) and finally the average of the k calculated validation performance indices is computed [39,40]. According to Goethals et al. [5], the optimal k-value is found based on an evaluation procedure, where the robustness and reliability of the developed ANN models are assessed. A high value of k is recommended when the dataset is small and has very few observations. An extreme case of the k-fold cross validation method is leave-one-out (LOO) cross validation, in which k equals the number of measured samples (n) in the dataset.
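The k-fold procedure described above, with LOO as the special case k = n, can be sketched as follows. This is an illustrative outline only: `fit`, `predict` and `score` are hypothetical callbacks standing in for ANN training, ANN prediction and a performance index, and the mean predictor at the end is a toy stand-in rather than an ANN.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must lie between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, k, fit, predict, score, seed=0):
    """Train on k - 1 folds, validate on the held-out fold, average the k scores."""
    scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        predictions = [predict(model, xs[i]) for i in fold]
        scores.append(score([ys[i] for i in fold], predictions))
    return sum(scores) / len(scores)


def mean_absolute_error(observed, simulated):
    """Performance index used for the toy example."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)


def fit_mean(train_x, train_y):
    """Toy 'model': remembers the mean of the training targets."""
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    """Toy prediction: always returns the stored mean."""
    return model
```

Calling `cross_validate(xs, ys, k=len(xs), fit=fit_mean, predict=predict_mean, score=mean_absolute_error)` gives the LOO case, in which every fold holds exactly one sample.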

Results
The monitored variables (pH, DO, EC, SD, depth, WT, TP, DIN, and Chl-a) were analyzed for collinearity by calculating the Pearson correlation coefficient (r). As stated by Gebler et al. [41], the deletion of the collinear parameters helps the ANN to avoid unnecessary/superfluous information and simplifies the model's structure. The results of the correlation analysis (Table 1) revealed a strong correlation between the parameters SD depth and water depth (r = 0.722). Therefore, the parameter water depth is eliminated from the ANN's inputs. The Matlab (R2018b) software was used for data analysis and ANNs' development for the needs of this study. Several topologies were examined using a typical trial-and-error methodology, as recommended by Ozesmi et al. [29] and Palani et al. [42]; while a more analytical procedure for finding the ANN's optimal topology is described in the study of Tuhtan et al. [43]. The 7-8-1 topology was found to be the optimal based on the calculated performance indices, which in our case are the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE). The k-fold cross validation method is applied in this modeling study, since the data sample for the ANN learning and training is small (n = 89, corresponding to sample points without any missing values). The optimal k-value was calculated among a candidate set for k values ranging between [3,30] and the LOO cross validation case (where k = 89). For each k-value, k ANNs (with the same topology 7-8-1) were created and their performance indices for the test sets were averaged (see Figures 3 and 4). Based on RMSE and MAE results for each k-value, it was calculated that the optimal k-value was k = 89 since it produced the lowest RMSE and MAE. The calculated RMSE and MAE values for k = 89 are always equal because there is only one observation (n = 1) per test data set (recall Equations (5) and (6)). 
It must also be noted that k = 89 exhibits the maximum number of outliers when calculating the RMSE and MAE performance indices. However, the ratio of outlier number per k-fold number (when k = 89) is lower than the calculated ratio for the majority of k = 3:30.
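The collinearity screening reported at the start of this section can be sketched in a few lines of Python. The function names are hypothetical, and the threshold of 0.7 is an assumption chosen for illustration; the study reports only that r = 0.722 was considered a strong correlation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def collinear_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = list(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson_r(columns[first], columns[second])
            if abs(r) >= threshold:
                pairs.append((first, second, r))
    return pairs
```

For each flagged pair, one of the two variables would then be removed from the candidate inputs, as was done here for water depth.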
The computational time needed to train and test each ANN model for the different given k values was recorded (Figure 5). Comparing Figures 3-5 allows the computational time to be weighed against the ANN's performance for each k value. The LOO case had the highest computational time (t = 314 s), as expected. Even so, this time is relatively low because the data set is small and consists of only 89 samples. The RMSE and MAE values follow a clear pattern: they decrease as k increases. For the k = 89 case, the RMSE and MAE are equal to 0.36; however, for k = 30 the RMSE = 2.33 and MAE = 1.42. Therefore, k = 89 offers a significant improvement in the ANN's modelling performance compared with k = 30, with low computational overhead. So, by considering both the computational times and the performance indices, it was decided that k = 89 (LOO) is the best value for the modelling needs of this study. The associated results for predicting Lake Mikri Prespa's Chl-a production are based on the averaged values for the LOO case. The ANN model's sensitivity analysis is also expressed based on these averaged values.
The created ANN model managed to predict the Chl-a levels with good accuracy, as demonstrated by the RMSE and MAE indices. The measured values of the Chl-a parameter are plotted against those predicted by the ANN (Figure 6) for both the test dataset and the averaged predicted values of the entire dataset. It can be observed that these three graphical plots are almost identical. Therefore, the ANN model can be characterized as a reliable predictor for the Chl-a parameter. The biggest differences between the real and the averaged predicted data are observed for some elevated values of the Chl-a parameter. However, the ANN typically produces output results that match/describe the lake's tendency towards a eutrophic or hypereutrophic status for these elevated values of Chl-a input data. Additionally, the test data points also match the real data points closely, with the exception of a few instances. The ANN's sensitivity analysis algorithm allowed the calculation of each input environmental parameter's impact on the Chl-a parameter. The input parameters were perturbed by −10%, −5%, +5%, and +10% for each of the 89 inputs and the average results are presented in Figure 7. These input fluctuations correspond to different modelling scenarios that examine the changes in Chl-a levels in response to the perturbation of the ANN's inputs. The pH parameter perturbations between −10% and +5% produced no clear results about the interactions between pH and Chl-a. However, for increased pH values (+10% change) a significant change of Chl-a levels is observed (+99.08%).
The DO parameter perturbations seem to follow a pattern, where the Chl-a levels are increased when the DO parameter is increased and vice versa. EC had the greatest contribution to the Chl-a production (600.9% when the EC is increased by 10% and 198.2% when decreased by 10%). Interestingly, smaller increases or decreases of EC lead to smaller increases to Chl-a.
Regarding SD parameter's negative or positive fluctuations, the ANN also did not reveal a clear pattern. Although algal production is not the only mechanism affecting the SD, lower water transparency (−10%) was related with an increase of Chl-a by approximately 44%.
The WT fluctuations were associated with increase in Chl-a levels. However, in the cases of WT increases, the Chl-a increased drastically, showing the strong impact of WT on Chl-a. Both TP and DIN are related to the Chl-a parameter. It is clearly shown that Chl-a changes follow the changes in TP and DIN.
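The perturbation scenarios reported in this section can be illustrated with a minimal sketch. It assumes that the effect of a perturbation is expressed as the mean percentage change of the model output relative to the unperturbed output; this is an assumption for illustration and not necessarily the exact formula of Lee et al. [37]. The function names are hypothetical and `model` is any callable that maps one input row to a non-zero prediction.

```python
def perturb_sensitivity(model, samples, feature, change):
    """Mean percentage change of the output when one input is scaled by (1 + change)."""
    effects = []
    for row in samples:
        baseline = model(row)  # assumed non-zero, since it is used as the denominator
        perturbed = list(row)
        perturbed[feature] = row[feature] * (1.0 + change)
        effects.append(100.0 * (model(perturbed) - baseline) / baseline)
    return sum(effects) / len(effects)


def sensitivity_scenarios(model, samples, feature, changes=(-0.10, -0.05, 0.05, 0.10)):
    """Evaluate the four perturbation levels used in the text for one input."""
    return {change: perturb_sensitivity(model, samples, feature, change)
            for change in changes}
```

In practice `model` would wrap the trained ANN and `samples` would hold the 89 observations; ranking the inputs by the magnitude of the resulting effects gives an order of importance.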

Discussion
Data scarcity is a computational issue which lake modelers must address very often. As highlighted by Jeong et al. [45], the unavailability of suitable or sufficient freshwater quality data is a problem encountered when applying machine learning techniques. Generally, ANNs must be trained with a sufficiently large number of data samples in order to produce good results. When more data are available during the training process, the ANNs give more accurate results [46]. As mentioned by Kavzoglu [47], a small sample size might result in poor performance of the ANN. Under these conditions, k-fold cross validation is considered a good option [48]. 10-fold cross validation is widely used in ANN water quality modelling studies (e.g., [39,49]), while 5-fold cross validation (e.g., [50]) is also used. In some cases, researchers use less common k values. For example, in a water quality modelling study by Chang et al. [51], the created ANN was verified with the use of 14-fold cross validation.
In our case, since the available data set was very small (n = 89), the k-fold cross validation method was applied and investigated. The value of k ranged over [3,30], and LOO cross validation was also investigated. It was clearly observed that larger values of k produced better outputs. In the case of LOO cross validation, the predicted outputs were highly reliable, since they were evaluated based on a very low RMSE = 0.3605. Additionally, the graphical plots of the real data and the predicted data demonstrate an almost perfect match, except for a few instances related with very high values of Chl-a concentrations (eutrophic/hypertrophic values). However, even in that case, the ANN managed to capture Lake Mikri Prespa's tendency for eutrophic conditions and the predicted outputs were characterized as eutrophic instances/values. LOO cross validation was chosen for this ANN modelling study not only because it produced the best outputs; the computational time was also taken into consideration. The computational time for k = 89 (LOO) was higher than the time needed for smaller k values. However, the computational time needed for k = 89 is still relatively low (t = 314 s) and is considered practical. Hence, choosing LOO cross validation for the ANN modelling is not a limitation in terms of computational time. Therefore, even though the k = 3:30 values allow faster computations, the good performance of the ANN obtained with LOO cross validation prevailed.
Besides the development of a reliable model based on ANN techniques adjusted for the needs of Lake Mikri Prespa, another objective of this study was the extraction of information related with the role/impact of the measured water quality parameters to algal production. The sensitivity analysis had a crucial role for investigating a parameter's contribution. The negative or positive fluctuations of the input parameters, and specifically nutrient perturbations, allowed us to create various modelling scenarios. Based on the application of sensitivity analysis, useful results about Mikri Prespa Lake algal production can be extracted and the constructed ANN can serve as a lake restoration management tool [52]. The impact of the environmental parameters on the Chl-a can be directly measured/quantified based on sensitivity analysis algorithm. ANN-based modelling studies (e.g., [17,53]) dealing with eutrophication control are of great environmental significance. ANNs are ideal for ecological modeling, since they can model phenomena with non-linear and complex data [52]. Furthermore, ANNs require no a priori assumptions about the model or the data distribution [54] and are considered advanced modelling techniques, because they can be used on a heterogeneous data set. ANNs are successfully addressing these modelling issues, which other modelling methods might fail to overcome. Therefore, the development of ANNs for limnological studies is considered a good modelling practice. Modelling scenarios based on ANNs can aid management authorities with the implementation of measures for lake water quality improvement and ecosystem restoration. For example, quantifying the ecosystem response to a simulated decrease of one or both nutrients (phosphorus and nitrogen) can provide useful insight whether managers should target only phosphorus or both phosphorus and nitrogen. 
With this study we showed a strong connection between Chl-a and nutrients, which corroborates many limnological studies [55][56][57].
Concerning the other parameters (i.e., EC, SD and WT), the ANN did not produce a clear pattern. However, as stated by Hadjisolomou et al. [19], the modelling of limnological parameters is a very case-sensitive task; the underlying mechanisms controlling limnological parameters are complex, and their interactions are usually not easily correlated/examined. ANNs are data-driven models and the relationships/interactions among the associated parameters are not always easy to understand, since ANNs require no a priori assumptions about the model or the data distribution [54]. Additionally, even if the created ANN is a good predictor for the Chl-a parameter, there are many other factors that might play an important role in explaining the observed patterns of Chl-a. As stated by Napiórkowska-Krzebietke et al. [58], the dynamics and the accumulation of cyanobacterial blooms in lakes are controlled by many factors, such as wind strength, while wind-driven sediment resuspension is a common feature in shallow lakes. Additionally, other special characteristics of the lake (e.g., location) must be taken into consideration when examining algal production. For example, the extended duration of cyanobacterial blooms in Lake Mikri Prespa is favored by the warm Mediterranean climate throughout the year [22]. Regarding the WT, the results of the sensitivity analysis are in agreement with a relevant modeling study of Lake Mikri Prespa [19], where the interactions among the environmental variables were examined with the help of an unsupervised ANN. It was concluded that the data from Lake Mikri Prespa are primarily associated with the WT. Temperature has a crucial role in the functioning of Lake Mikri Prespa, and because it is a shallow lake, it is affected by seasonality. The results of our modeling study agree with these findings, since in our case the WT is calculated to be the second most influential parameter.
Another interesting finding derived from the sensitivity analysis results is the relatively small increase of Chl-a observed for negative changes of the WT parameter, which correspond to the temperatures of cooler months. This increase might be attributed to other meteorological conditions/factors that exist during spring and autumn. For example, wind mixing is more intense during these seasons and can lead to the release of phosphorus and nitrogen from the sediments, a process which favors eutrophication [59]. In addition, based on the sensitivity analysis results, it is concluded that the role of the EC parameter is not easy to understand, since it is associated with complex processes such as wind mixing and inorganic dissolved matter inflow from the lake's catchment, e.g., surface runoff and river inflow [60]. Nevertheless, the ANN managed to associate the increased levels of EC with elevated algal production, which possibly shows the effect of increased nutrients on the algal productivity.
According to the ANN's results, the reduction of TP is associated with a reduction of Chl-a levels and vice versa. This modelling scenario, which is related to TP perturbations, is supported by the fact that strong relationships between increased phosphorus loadings and eutrophication have been shown in freshwater ecosystems [61]. The second scenario, which is related with DIN perturbations, also showed a reduction to Chl-a levels when the DIN parameter decreased. The DIN parameter has similar behavior with the TP parameter; therefore, any increase of nitrogen levels is associated with increased algal productivity [62]. The linkage between nutrient loading and eutrophication is often non-linear, because of the complex mechanisms by which hydrological and meteorological conditions interfere with the nutrients [63]. Based on this statement, the non-proportional increase/decrease for Chl-a levels in regard to the associated nutrient perturbations that were observed during the sensitivity analysis can be justified.
Based on the ANN sensitivity analysis results, the TP parameter has greater impact on algal production than the DIN parameter. The stronger impact of TP compared to the DIN parameter is more noticeable when TP concentration increases. Nevertheless, DIN's role should not be underestimated in eutrophication management/control, since it is observed that high DIN levels are related with continued serious eutrophication problems caused by cyanobacterial blooms [64]. Even though the role of nitrogen as a limiting factor is debated, it has an important role in shallow polymictic eutrophic lakes [65]. The ANN simulation scenarios regarding the DIN parameter clearly showed that DIN additions into the lake are promoting algal production, while DIN level decrease is related with Chl-a level decrease. Therefore, it is recommended that the lake's nutrient management should not only be focused on phosphorus, but on nitrogen as well. Additionally, it is documented that the simultaneous decrease of both nutrients has a bigger reduction of Chl-a levels and is related with the synergistic effect of DIN and TP parameters [17]. In the case of synergistic/combined perturbations of nutrients, a similar behavior with the case of only one nutrient fluctuation/perturbation is observed. Yet, the combined reduction of DIN and TP leads to even lower levels of Chl-a than the single reduction of DIN or TP concentration.
According to several studies, eutrophication levels in lakes are linked with both nutrients (nitrogen and phosphorus), and high levels of nutrients result in high levels of Chl-a [13].
Even a small reduction of phosphorus and nitrogen concentration in Lake Mikri Prespa has a beneficial effect on lake trophic status, while adding nutrients to the lake promotes eutrophication. A recent study by Verstijnen et al. [66] stated that Lake Mikri Prespa is very sensitive to nutrient increase, and that even small additions of nutrients derived from waterbirds are associated with cyanobacterial blooms. The ANN clearly captured this relationship for both nutrients (DIN and TP) and how prone Lake Mikri Prespa is to eutrophication. In conclusion, the created ANN is a reliable predictor for Chl-a levels and can successfully investigate different management scenarios related with nutrient control. However, since the data set is small, the generalization ability of the ANN might not be sufficient for new data related with abnormal/unusual conditions (e.g., huge increase of a nutrient). Because of this limitation, the created management scenarios were restricted to parameter fluctuations of up to ±10%. Therefore, the re-calibration of the model when more data is available is recommended in order to extend its capabilities as a management tool.
Generally, the LOO cross validation method is considered to be the best option for modelling small datasets with the use of ANNs. However, some concerns related with the effect of overtraining might exist. The LOO cross validation provides the benefit of having more data available for training, but at the same time the data set that is used for validation becomes smaller and the evaluation becomes less reliable and robust. This issue is compensated by the fact that more training and validations are applied. Of course, the LOO cross validation should not be limited to ANN-based applications, but it can be used for modelling with other machine learning methods such as linear regression and random forests.

Conclusions
Data scarcity is a very common issue observed when modelling limnological data sets with the use of ANNs. In the case of Lake Mikri Prespa, a relatively small number of observations (n = 89) was used to develop an ANN and model the trophic status of the lake with the use of k-fold cross validation. For that purpose, several k values were examined. LOO cross validation produced the best outputs, while the computational time that was needed was relatively low. Therefore, LOO cross validation is recommended for the needs of this eutrophication-related modelling study. Additionally, the created ANN was a good predictor of the Chl-a parameter and can serve as a water management tool. Based on sensitivity analysis, the ANN examined scenarios in which the nutrient (phosphorus and nitrogen) levels in the lake increased or decreased. In the case of nutrient increase, the model clearly showed that Lake Mikri Prespa's water quality declines further. On the other hand, when the nutrient levels are decreased, the ANN model clearly showed that algal production is reduced.

Data Availability Statement:
Currently data is not publicly available.

Conflicts of Interest:
The authors declare no conflict of interest.