A New Index Contributing to an Early Warning System for Cyanobacterial Bloom Occurrence in Atlantic Canada Lakes

Cyanobacterial harmful algal blooms (cyanoHABs) have become more frequent and prominent in Atlantic Canada freshwater bodies over the last several years, especially in Nova Scotia (NS). Inspired by the trophic index of Vollenweider, we developed a new index, modified and adapted for freshwater systems. Our model, TRINDEX, is effective in estimating the variation of cyanobacterial dominance in phytoplankton communities and can assist in determining the threshold for cyanobacterial bloom onset. Combinations of nutrients and pigments within TRINDEX were tested with a binary discrimination test to find the optimal threshold range for cyanoHAB formation in freshwater lakes.
Nature Environment and Pollution Technology: An International Quarterly Scientific Journal, 2020, Vol. 19, No. 5 (Suppl), pp. 1887-1897. p-ISSN: 0972-6268 (Print copies up to 2016); e-ISSN: 2395-3454. Original Research Paper, Open Access Journal. Nat. Env. & Poll. Tech. Website: www.neptjournal.com. Received: 16-03-2020; Revised: 21-04-2020; Accepted: 15-06-2020.


INTRODUCTION
Many drinking water and recreational reservoirs around the globe suffer from cyanobacterial HABs (cyanoHABs), and the resulting use restrictions have had negative social and economic impacts on local communities and businesses. The desire to protect valuable freshwater and marine resources has motivated extensive research on methods for predicting and mitigating algal blooms. Mitigation strategies for HABs have been divided into two categories, namely precautionary prevention (or early warning protocols) and bloom control (Kim 2006). Precautionary prevention refers to monitoring and predicting events, while bloom control involves both direct controls applied after an HAB has begun and indirect controls, i.e. strategies such as the management of land-derived nutrient inputs (Kim 2006).
The desire to predict algal bloom occurrence and proliferation under complex environmental conditions has led to the development of many indices for estimating eutrophication. There is a real need to improve knowledge of the eco-physiological mechanisms leading to cyanoHABs, but this cannot be achieved by relying only on bulk indices (chlorophyll-a (chl-a), temperature, nutrients).
Most research on eutrophication is based on chemical components, i.e. the nutrient characteristics of water bodies, such as total phosphorus (TP) and its mineral part (PO4-P, the dissolved, bioavailable form of phosphate readily taken up by algae) and dissolved inorganic nitrogen (DIN), which contribute significantly to algal growth. All single indices or combined parameters (Novotny & Olem 1994, Bartram et al. 1999, Brient et al. 2008, Brylinsky 2009, Chorus 2012, Ndong et al. 2014, Ahn et al. 2017), despite their usefulness, can show only real-time conditions and do not adequately predict cyanobacterial biomass growth, bloom occurrence or the thresholds of bloom onset; they were developed primarily to determine trophic status. A few studies have focused on bloom forecasting using simulations (Anderson et al. 2016). However, almost no work has been done to determine thresholds that can reliably predict HAB occurrence in freshwater ecosystems, except for the model of Downing et al. (2001), which provided a statistical analysis for predicting the risk of cyanobacterial dominance.
In our view, a warning system is an essential tool: it should be able to adequately foresee irregular patterns such as massive blooms and support water management and decision making. Some modern approaches use sophisticated tools and devices, including remote sensing and image processing, to observe and predict blooms. However, two main issues persist: (1) these approaches are costly (especially in computational cost) and are used primarily for long-term forecasting (Anderson et al. 2016); (2) the biophysical coupling effects involved in bloom occurrence and the distinction between cyanoHABs and other algal blooms have not been satisfactorily considered (Kudela et al. 2015, Anderson et al. 2016).
In this paper, a new index is developed to forecast potential bloom occurrence in freshwater bodies and to assess freshwater quality with respect to cyanoHAB presence. Our goal is to determine the bloom threshold from nutrient levels combined with algal pigments, and then to estimate the appearance of bloom patterns. Specifically, three objectives were addressed: (1) to develop a new index, the Threshold Index (colloquially TRINDEX), for predicting cyanoHAB onset; (2) to validate TRINDEX using field data from two Nova Scotian lakes, Mattatall (Colchester and Cumberland counties) and Torment (Kings County), with the TRINDEX threshold for bloom prediction determined by a binary discrimination test (Receiver Operating Characteristic (ROC) analysis); (3) to suggest a practical scheme for bloom prediction based on TRINDEX, with the expectation that our approach can be applied at a larger scale to waterbodies of different trophic status where blooms could occur.

Study Sites
Two sites in the province of Nova Scotia (NS), Canada, were studied. Mattatall Lake (ML), between Colchester and Cumberland counties, served as the main site, and Lake Torment (LT, Kings County) was used for independent verification. The datasets from the two lakes are independent, as LT is located over 200 km from ML in a different geographic area. Both locations are shown in Fig. 1 and described in Table 1. ML is mainly spring-fed, with some brooks. In terms of human activities, there are blueberry fields and forestry on the west side of the lake, and approximately 60 residences (both seasonal and year-round) with varying lot sizes and ages. Based on three years of data (2015-2017), ML was moderately eutrophic according to chlorophyll-a and TP measurements and contained potentially toxic cyanobacterial species. A bloom of green algae (Mougeotia sp.) occurred in midsummer (2015, 2016), followed by a cyanobacterial bloom of Dolichospermum planctonicum in late summer-autumn.
LT is in East Dalhousie, Kings County, and is used for residential and recreational purposes. It covers 261 hectares, with 250 cottages and homes around the lake, and is surrounded by forest with no significant agricultural activity in the watershed. The lake is dystrophic, with brown water (colour 70-145 mg.L⁻¹ Pt), low pH (5-6) and high organic content (DOC 6.5-8.5 mg.L⁻¹) (Marty & Reardon 2016). The frequency of HABs has increased every year since summer 2016. The cyanobacterial blooms were dominated by other cyano- …

Field Sampling Process and Lab Analysis
Samples were taken at the surface and bottom, biweekly or monthly depending on weather conditions, from May through November. Sampling locations are shown in Fig. 1. Dissolved oxygen (DO) was measured with a YSI probe (Professional Plus, Hoskin Scientific LTD, USA). Phosphate (PO4), nitrate (NO3), chlorophyll-a (chl-a) and phycocyanin (PC) were analysed in our laboratory.
To determine pigment concentrations, water samples were filtered through Whatman GF/A filters. The filters were then extracted in 90% acetone for chl-a or in phosphate-buffered saline for PC, sonicated (50% amplitude for 30 seconds) and centrifuged twice (first at room temperature, 3500 g for 10 min; second at 4 ºC, 13000 g for 1.5 hours). Pigment concentrations (chl-a and PC) were measured in µg.L⁻¹ with a Turner 10AU Fluorometer (Turner Designs, USA) calibrated against standard curves for both pigments. The dissolved fractions of phosphate and nitrate were measured after filtration through GF/A filters, using a photometer with a tablet reagent system (YSI 2010).

Mathematical Formulations
We believe the TRIX concept proposed by Vollenweider et al. (1998) for coastal marine zones can reasonably be applied to freshwater resources, as it combines key biological and hydrochemical parameters in a logarithmic relationship without relying on characteristics specific to the marine environment. However, this concept is not well known in the freshwater literature. Because most environmental data are not normally distributed, a logarithmic transformation is an appropriate way to bring them closer to a normal distribution. Inspired by the logarithmic transformation of Vollenweider et al. (1998), we propose our Threshold Index (hereafter TRINDEX) as follows:
$$\mathrm{TRINDEX} = \frac{k}{n}\sum_{i=1}^{n}\frac{\log M_i - \log L_i}{\log U_i - \log L_i} \qquad (1)$$
where:
TRINDEX: Threshold Index under consideration
$M_i$: measured value of parameter i
$L_i$: lower limit (concentration) of parameter i
$U_i$: upper limit (concentration) of parameter i
k: factor equal to the maximum of the index range (0, 10), so k = 10 by default
n: total number of parameters $M_i$ considered
In our view, chl-a is not an ideal parameter for detecting cyanobacterial blooms, because it is produced by all algal species, not only by cyanobacteria. We therefore introduce the pigment PC into the index as an alternative parameter reflecting the presence of cyanobacteria within the phytoplankton community. Two TRINDEX scenarios are thus considered in this study: TRINDEX1, with PC as the only pigment, and TRINDEX2, which combines PC and chl-a. The absolute deviation of oxygen saturation from 100% (D%O) reflects the main processes of phytoplankton growth and can be used to detect bloom onset; nitrogen and phosphorus were included as nitrate (NO3) and phosphate (PO4), the main nutrient sources for cyanobacterial growth. These components can easily be measured daily. In eutrophic waters, DO can fluctuate strongly over the course of the day. Our observations on various Nova Scotian mesotrophic lakes showed that the period from 8 AM to 2 PM was optimal for the development and accumulation of phytoplankton, so this period is suggested for monitoring.
The quantity $(\log U_i - \log L_i)$ is the difference between the log-transformed upper and lower limits. Once these limits are set, all values outside this range should be excluded. To obtain a range wide enough to cover different trophic conditions, we used the limit of detection (LOD) as the lower limit and the maximum measured value of each variable as the upper limit.
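A minimal Python sketch of the index follows, assuming the Vollenweider-style form implied by the definitions above: each parameter i is log-scaled between its lower limit L_i and upper limit U_i, the n scaled values are averaged, and the mean is multiplied by k = 10. The function name, dictionary keys and limit values are hypothetical placeholders (not the values of Table 3); the authors' own calculations were done in R, Excel and MedCalc.

```python
import math


def trindex(measured, lower, upper, k=10.0):
    """Vollenweider-style threshold index (illustrative sketch).

    measured, lower, upper: dicts keyed by parameter name (hypothetical keys)
    giving M_i, L_i and U_i. Each parameter is log-scaled onto [0, 1] between
    its limits, the n scaled values are averaged, and the mean is multiplied
    by k so that the index runs from 0 to k.
    """
    if not measured:
        raise ValueError("at least one parameter is required")
    total = 0.0
    for name, m in measured.items():
        lo, hi = lower[name], upper[name]
        if not 0 < lo < hi:
            raise ValueError(f"{name}: need 0 < lower < upper for the log scale")
        if not lo <= m <= hi:
            # The text excludes values outside [L_i, U_i]; this sketch refuses them.
            raise ValueError(f"{name}={m} outside [{lo}, {hi}]")
        # Ratio of logarithms, so the log base does not matter.
        total += (math.log10(m) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))
    return k * total / len(measured)


# Hypothetical limits: L_i = 1 and U_i = 100 for every parameter.
lower = {"PC": 1.0, "D%O": 1.0, "NO3": 1.0, "PO4": 1.0}
upper = {"PC": 100.0, "D%O": 100.0, "NO3": 100.0, "PO4": 100.0}
print(trindex({"PC": 10.0, "D%O": 10.0, "NO3": 10.0, "PO4": 10.0}, lower, upper))  # 5.0
```

Each value here sits at the geometric midpoint of its limits, so it contributes 0.5 before scaling and the index is 5.0; a parameter at its upper limit contributes 1 and one at its lower limit contributes 0.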

Definition of onset of blooms:
The onset of a bloom can be defined as the beginning of any visible sign of a bloom, i.e. the first visible appearance of surface scum on a waterbody. However, our definition of bloom onset is associated not only with visible signs of algae but also with situations where no algae are visible yet a certain amount of phycocyanin is present. We therefore consider a PC concentration above 0.03 mg.L⁻¹ ± 0.002 as bloom onset (PC criterion based on Brient et al. 2008), equivalent to a cyanobacterial cell count of 20,000 cells per mL. At onset, a bloom or scum may be visible but unstable: the surface bloom can appear and disappear within a short period (critical phase), whereas in the supercritical phase blooms or scums remain visible and stable for long periods (many hours or days). The onset status can lead to 'stable blooms' if ambient conditions allow them to develop, or can vanish completely, also depending on ambient conditions.
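The PC criterion can be applied to raw fluorometer readings as sketched below. This is an illustration only, assuming readings are in µg.L⁻¹ as reported in the laboratory methods, so they must be converted to mg.L⁻¹ before comparison; the function name and sample readings are hypothetical.

```python
# Bloom-onset criterion quoted in the text (PC threshold after Brient et al. 2008).
PC_ONSET_MG_PER_L = 0.03


def is_bloom_onset(pc_ug_per_l):
    """Return True if a phycocyanin reading (in ug/L) exceeds the onset criterion.

    Pigments are reported in ug/L while the criterion is in mg/L, so the
    reading is divided by 1000 first. The +/-0.002 mg/L tolerance mentioned
    in the text is not modelled in this sketch.
    """
    return pc_ug_per_l / 1000.0 > PC_ONSET_MG_PER_L


readings_ug_per_l = [12.0, 28.5, 45.0]  # hypothetical PC readings
print([is_bloom_onset(r) for r in readings_ug_per_l])  # [False, False, True]
```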

Discrimination Test for Threshold: Receiver Operating Characteristic (ROC) Curve
As in clinical practice (Carter et al. 2016), where a 'yes or no' decision is required between 'diseased' and 'non-diseased', two states are defined here for the bloom: 'yes' (bloom occurrence) and 'no' (no bloom). The bloom threshold T applies to the variable TRINDEX and drives the outcome of the decision, positive (yes, bloom) or negative (no, no bloom), as follows:

… (2) at threshold T2, more of a balance is struck, as some positive and some negative events are both missed; and finally, (3) at T3, most negative events are correctly identified, but a large proportion of the positive patterns are incorrectly deemed negative.
Four outcomes are possible for each trial: correct positive (CP), correct negative (CN), incorrect positive (IP) and incorrect negative (IN). The area under the ROC curve is then introduced as the measure of discrimination, i.e. the ability of the TRINDEX test to correctly classify cases with or without the 'disease' as a binary variable, here equivalent to bloom occurrence (yes) or no bloom (no).
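The four outcomes can be tallied for a candidate threshold as in the hypothetical Python sketch below, which assumes that a TRINDEX value equal to or greater than the threshold is read as a predicted bloom (the convention used later for the 5.0 cut-off). The scores and observations are invented for illustration.

```python
def confusion_counts(scores, bloom, threshold):
    """Count the four outcomes when TRINDEX >= threshold is called a bloom.

    scores: TRINDEX values; bloom: observed outcome per sample (True = bloom).
    Returns (CP, CN, IP, IN): correct positive, correct negative,
    incorrect positive (false alarm) and incorrect negative (missed bloom).
    """
    cp = cn = ip = in_ = 0
    for s, b in zip(scores, bloom):
        predicted = s >= threshold
        if predicted and b:
            cp += 1
        elif not predicted and not b:
            cn += 1
        elif predicted and not b:
            ip += 1
        else:
            in_ += 1
    return cp, cn, ip, in_


scores = [3.8, 4.6, 5.0, 5.4, 6.2]        # hypothetical TRINDEX values
bloom = [False, True, False, True, True]  # hypothetical observations
print(confusion_counts(scores, bloom, threshold=5.0))  # (2, 1, 1, 1)
```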
ROC analysis is a binary discrimination test that assesses the predictive power of a binary classification system in a decision-making process and helps to identify the threshold T. It is recognized as a useful tool for interpreting medical test results and, in many other fields, as a method for evaluating the accuracy of analyses (Lerman et al. 2010). For more details on ROC curves and related metrics, see Brown & Davis (2006).
A curve illustrating model performance is obtained by plotting the CPF (correct positive fraction, or sensitivity) on the vertical axis against 1 - CNF (where CNF is the correct negative fraction, or specificity) on the horizontal axis (Fig. 2a). Sensitivity is the probability that a bloom case is correctly classified as above the threshold, while specificity is the probability that a no-bloom case is correctly classified as below the threshold.
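Each candidate threshold gives one point of the ROC curve. The sketch below, with hypothetical names and data, sweeps every distinct TRINDEX value as a threshold (again reading TRINDEX ≥ T as a predicted bloom) and returns (1 - CNF, CPF) pairs, i.e. the horizontal and vertical coordinates described above. It is a plain-Python illustration, not the MedCalc procedure used by the authors.

```python
def roc_points(scores, bloom):
    """Return (1 - specificity, sensitivity) pairs for every candidate threshold.

    Each distinct TRINDEX value is tried as the threshold T, with
    'TRINDEX >= T' read as a predicted bloom. Sensitivity (CPF) is the share of
    bloom cases at or above T; 1 - specificity (1 - CNF) is the share of
    no-bloom cases at or above T.
    """
    n_pos = sum(1 for b in bloom if b)
    n_neg = len(bloom) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both bloom and no-bloom cases")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, b in zip(scores, bloom) if b and s >= t)
        fp = sum(1 for s, b in zip(scores, bloom) if not b and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


scores = [3.8, 4.6, 5.0, 5.4, 6.2]        # hypothetical TRINDEX values
bloom = [False, True, False, True, True]  # hypothetical observations
for x, y in roc_points(scores, bloom):
    print(f"1 - CNF = {x:.2f}, CPF = {y:.2f}")
```

The curve always starts at (0, 0), where no case is called a bloom, and ends at (1, 1), where every case is.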
A perfect model (Fig. 2b) corresponds to the top left-hand corner of the diagram (CPF = CNF = 1), while the top right (CPF = 1, CNF = 0) and bottom left (CPF = 0, CNF = 1) corners correspond to the extremes of the decision process, where every trial is deemed positive or negative, respectively. A random predictor (CP = IP and CN = IN) gives the straight line CPF = 1 - CNF (X = Y, the line of equality or random chance). This can be explained … by a value of the index at which equal numbers of true and false positives occur. This value is considered the critical value, or threshold.
[Figure: … binary discrimination skill assessment curves on the right (adapted from Stow et al. 2009).]
The area under the ROC curve (AUC) was introduced as a criterion to evaluate the overall performance of the discrimination test. It equals the proportion of randomly drawn (bloom, no-bloom) pairs in which the bloom case has the higher TRINDEX value. AUC may take values from 0.5 (no discrimination) to 1 (perfect discrimination). A rough practical guide for evaluating the accuracy of a discrimination test by its AUC is given in Table 2 (Carter et al. 2016).
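The AUC can be computed without drawing the curve, as the share of (bloom, no-bloom) pairs in which the bloom case has the higher TRINDEX, counting ties as one half; this pairwise (Mann-Whitney) form equals the area under the empirical ROC curve. A minimal sketch with hypothetical data:

```python
def auc_pairwise(scores, bloom):
    """AUC as the share of (bloom, no-bloom) pairs ranked correctly.

    For every pair made of one bloom case and one no-bloom case, count 1 if
    the bloom case has the higher TRINDEX and 0.5 for a tie, then divide by
    the number of pairs.
    """
    pos = [s for s, b in zip(scores, bloom) if b]
    neg = [s for s, b in zip(scores, bloom) if not b]
    if not pos or not neg:
        raise ValueError("need both bloom and no-bloom cases")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


scores = [3.8, 4.6, 5.0, 5.4, 6.2]        # hypothetical TRINDEX values
bloom = [False, True, False, True, True]  # hypothetical observations
print(round(auc_pairwise(scores, bloom), 3))  # 0.833
```

Five of the six pairs are ordered correctly here, giving AUC = 5/6. Confidence intervals such as those reported in the results need a separate standard-error estimate, which this sketch does not attempt.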
Another measure of the effectiveness of our test is the Youden index J (Youden 1950), defined as:
$$J = \max_{c}\left\{\text{sensitivity}(c) + \text{specificity}(c) - 1\right\}$$
where c ranges over all possible criterion values.
The Youden index J, ranging between 0 and 1, is commonly used to measure overall diagnostic effectiveness (Schisterman et al. 2005). Values close to 1 indicate relatively good effectiveness, while values close to 0 indicate limited effectiveness.
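The maximisation over criterion values can be done by an exhaustive search over the observed TRINDEX values. The following is a hypothetical sketch (invented names and data), again reading TRINDEX ≥ threshold as a predicted bloom.

```python
def youden_best_threshold(scores, bloom):
    """Return (J, threshold) maximising sensitivity + specificity - 1.

    Candidate thresholds are the distinct TRINDEX values, with
    'TRINDEX >= threshold' read as a predicted bloom. Ties keep the lowest threshold.
    """
    n_pos = sum(1 for b in bloom if b)
    n_neg = len(bloom) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both bloom and no-bloom cases")
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, b in zip(scores, bloom) if b and s >= t)
        tn = sum(1 for s, b in zip(scores, bloom) if not b and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_j, best_t = j, t
    return best_j, best_t


scores = [3.8, 4.6, 5.0, 5.4, 6.2]        # hypothetical TRINDEX values
bloom = [False, True, False, True, True]  # hypothetical observations
j, t = youden_best_threshold(scores, bloom)
print(f"J = {j:.2f} at threshold {t}")  # J = 0.67 at threshold 5.4
```

As an arithmetic cross-check on the reported values, the ML operating point for TRINDEX1 (sensitivity 81%, false positive rate 12%, hence specificity 88%) gives J = 0.81 + 0.88 - 1 = 0.69, consistent with Table 4c.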
We used the ML dataset (2015-2017) to develop TRINDEX and the LT dataset (2015-2018) to validate the approach. In the following calculations, sensitivity (correct positive fraction) and specificity (correct negative fraction) are expressed as percentages (%) rather than fractions (see Fig. 2b).
The sample sizes of the two lakes differ (170 samples for LT and 266 for ML), but both exceed the required minimum of 132 samples, so both datasets are large enough for the analysis.
Our HAB-related experimental data for both lakes (Mattatall and Torment) are not normally distributed. The log transformation described above is used to bring them closer to a normal distribution so that TRINDEX can then be computed. All steps were carried out with the statistical software R, together with Excel and MedCalc.

TRINDEX Calculations and the ROC Curve for Performance of Bloom Prediction
Data used to determine the lower and upper limits are given in Table 3. Based on Table 3, formulas (2) and (3) for TRINDEX become: … (7)
Data were divided into two groups: (i) bloom occurrence and (ii) no bloom. The distributions of TRINDEX1 and TRINDEX2 under bloom and no-bloom conditions, generated with rnorm in R, are shown in Fig. 3.
A cut-off point of 5.0 was estimated from Fig. 3. However, this cut-off should be validated against field observations through ROC curve analysis to determine the threshold for cyanobacterial blooms precisely. This discrimination test was carried out with field observation data (Fig. 4 and Table 5).
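One way to read a cut-off from simulated bloom and no-bloom distributions such as those in Fig. 3, assuming each group is summarised by a normal distribution (as the rnorm simulation suggests), is to take the point between the two means where the two densities are equal. The sketch below finds it by bisection. This is our assumption about how such a figure can be read, not the authors' documented procedure, and the means and standard deviations are hypothetical, not the fitted values behind Fig. 3.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def density_crossing(mu_a, sd_a, mu_b, sd_b, tol=1e-9):
    """Bisection search, between the two means, for the x where both densities are equal."""
    lo, hi = sorted((mu_a, mu_b))

    def diff(x):
        return normal_pdf(x, mu_a, sd_a) - normal_pdf(x, mu_b, sd_b)

    if diff(lo) * diff(hi) > 0:
        raise ValueError("the densities do not cross between the means")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diff(lo) * diff(mid) <= 0:
            hi = mid  # sign change lies in [lo, mid]
        else:
            lo = mid  # sign change lies in [mid, hi]
    return 0.5 * (lo + hi)


# Hypothetical groups: no bloom ~ N(4.0, 0.5^2), bloom ~ N(6.0, 0.5^2).
cut = density_crossing(4.0, 0.5, 6.0, 0.5)
print(round(cut, 3))  # 5.0
```

With equal spreads the crossing is simply the midpoint of the means; with unequal spreads it shifts towards the narrower group, which is why a cut-off read this way still needs the ROC validation described in the text.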
Sensitivity (true positive cases) was calculated by assuming that every TRINDEX value could lead to a bloom; conversely, the false positive rate (100 - specificity) was calculated by assuming that every TRINDEX value could not lead to a bloom. All TRINDEX values were rounded to 0.2 units. The formulas for the true positive and false positive rates are as follows:

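$$\text{True positive} = \text{Sensitivity} = \frac{\sum \text{number of TRINDEX values with bloom}}{\text{total number of bloom cases}} \times 100 \qquad (8a)$$
$$\text{False positive} = 100 - \text{Specificity} = \frac{\sum \text{number of TRINDEX values without bloom}}{\text{total number of no-bloom cases}} \times 100 \qquad (8b)$$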
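Formulas (8a) and (8b) can be evaluated from a table of binned counts such as Table 5. The sketch below assumes that each bin value T is used in turn as the threshold and that the sums in (8a) and (8b) run over all bins with TRINDEX ≥ T; it also shows rounding to the 0.2 grid. The helper names are our own and the counts are hypothetical, not those of Table 5.

```python
def round_to_grid(x, step=0.2):
    """Round x to the nearest multiple of step (0.2 in the text).

    The outer round(..., 1) strips float noise such as 6.2000000000000002;
    it assumes step has a single decimal place.
    """
    return round(round(x / step) * step, 1)


def rates_from_binned_counts(table):
    """Cumulative sensitivity and false positive rate (%) per candidate threshold.

    table maps a binned TRINDEX value to (bloom_count, no_bloom_count).
    For each bin T: sensitivity = % of all bloom cases with TRINDEX >= T (8a);
    false positive = 100 - specificity = % of all no-bloom cases with TRINDEX >= T (8b).
    """
    total_bloom = sum(b for b, _ in table.values())
    total_no = sum(n for _, n in table.values())
    rows, cum_bloom, cum_no = [], 0, 0
    for t in sorted(table, reverse=True):  # accumulate from the highest bin down
        b, n = table[t]
        cum_bloom += b
        cum_no += n
        rows.append((t, 100.0 * cum_bloom / total_bloom, 100.0 * cum_no / total_no))
    return sorted(rows)


print(round_to_grid(5.07), round_to_grid(6.13))  # 5.0 6.2

# Hypothetical binned counts: TRINDEX bin -> (bloom cases, no-bloom cases).
counts = {4.0: (1, 10), 5.0: (2, 3), 6.2: (3, 0)}
for t, sens, fpr in rates_from_binned_counts(counts):
    print(f"T = {t:.1f}: sensitivity {sens:.1f}%, false positive {fpr:.1f}%")
```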
The dataset of 266 TRINDEX1 values for ML (2015-2017) comprised 74 bloom cases and 192 no-bloom cases. Isolated bloom cases were first detected at TRINDEX1 = 4.0 (Table 5). Above TRINDEX1 = 5.0, bloom cases were recorded more frequently than no-bloom cases, and the maximum number of bloom cases (14) occurred at TRINDEX1 = 6.2. The TRINDEX1 range from 4.0 to 5.0 can therefore be regarded as a marginal situation: there is likely to be no visible sign of a bloom, but small disturbances in environmental conditions (leading to a higher TRINDEX1) could trigger a cyanobacterial bloom.
There were 249 calculated TRINDEX2 values (Table 5), with 75 bloom cases and 174 no-bloom cases. The lowest TRINDEX2 with a bloom was 4.4, and at TRINDEX2 = 5.2 bloom cases became more prevalent than no-bloom cases. The maximum number of no-bloom cases occurred at TRINDEX2 = 4.0 and the maximum number of bloom cases at TRINDEX2 = 6.2. From these analyses, we propose a transition range for TRINDEX2 of 4.4 to 5.2 and a threshold for bloom occurrence of 5.2.
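The rule used above to bracket the transition range (the lowest bin containing a bloom, and the first bin in which bloom cases outnumber no-bloom cases) can be written as a small function. This is one hypothetical reading of that rule, with invented counts rather than those of Table 5.

```python
def transition_range(table):
    """Return (lower, upper) bounds of the transition range from binned counts.

    table maps a binned TRINDEX value to (bloom_count, no_bloom_count).
    lower: lowest bin with at least one bloom case.
    upper: lowest bin in which bloom cases outnumber no-bloom cases.
    """
    bins = sorted(table)
    bloom_bins = [t for t in bins if table[t][0] > 0]
    majority_bins = [t for t in bins if table[t][0] > table[t][1]]
    if not bloom_bins or not majority_bins:
        raise ValueError("no bloom cases, or blooms never outnumber no-bloom cases")
    return bloom_bins[0], majority_bins[0]


# Hypothetical counts: TRINDEX bin -> (bloom cases, no-bloom cases).
counts = {4.0: (0, 20), 4.4: (1, 15), 4.8: (2, 6), 5.2: (5, 3), 6.2: (14, 0)}
print(transition_range(counts))  # (4.4, 5.2)
```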
The appropriate threshold for bloom onset can be chosen from the ROC curves (Fig. 4): it should combine maximum sensitivity with a minimum of false positive cases. The two axes of our ROC curve are the sensitivity (%), i.e. the probability of a true positive result, and 100% - specificity, i.e. the probability of a false positive result. False positive cases are those where TRINDEX is high but no bloom occurs, while false negative cases show the opposite: TRINDEX is low but a bloom is observed.
The ROC curve for TRINDEX1 (Fig. 4, left) shows the best combination of high sensitivity (81%) and low false positive rate (12%) at the value 5.0 in Table 5, so TRINDEX1 values equal to or greater than 5.0 are classified as indicating cyanobacterial blooms. TRINDEX2 (Fig. 4, right) shows the best combination of high sensitivity (83%) and low false positive rate (6%) at the value 5.2 in Table 5. The significance level (p-value) is the probability of observing the sample AUC (area under the curve) when the true (population) AUC is 0.5; when p is small (p < 0.05), the AUC can be concluded to differ significantly from 0.5. Carter et al. (2016) noted that a ROC curve test has at least some discriminatory power if the 95% confidence interval of the AUC does not include 0.50. For ML, the AUC is 0.926 (95% CI: 0.887 to 0.954; p < 0.0001) for TRINDEX1 and 0.961 (95% CI: 0.929 to 0.981; p < 0.0001) for TRINDEX2. With AUC = 0.926 and 0.961, the discrimination test is excellent (Table 2), which supports the thresholds chosen for ML.
An AUC above 0.9 (0.926 and 0.961 for TRINDEX1 and TRINDEX2, respectively) implies that, in a hypothetical experiment in which pairs of one positive (bloom) and one negative (no bloom) case are drawn at random, the positive case has the higher TRINDEX in more than 90% of pairs, with a false negative result weighted equally with a false positive result. Given the environmental factors that can affect a lake system, random perturbations can shift its stability around an equilibrium point; beyond this equilibrium point blooms occur, i.e. instability leads to the HAB.
The TRINDEX1 range from 4.0 to 5.0 can be classified as the transition phase, i.e. a potential for bloom occurrence in the near future. Considering TRINDEX as a 'predictor' of blooms, the Youden index J is high in our tests: 0.69 for TRINDEX1 and 0.78 for TRINDEX2 (Table 4c). Hence, for ML, the following two cut-off points are taken as thresholds, with a good fit of the discrimination test: 5.0 for TRINDEX1 and 5.2 for TRINDEX2.

Independent Verification by Lake Torment Data
The same procedure was applied to the data from Lake Torment (LT), and the cut-off point was found to be approximately 4.6 for both TRINDEX1 and TRINDEX2 (Fig. 5). Fig. 6 shows the ROC curve analyses for LT.
The ROC curve for TRINDEX1 (Fig. 6, left) had the best combination of high sensitivity (79%) and low false positive rate (6%) at the value 4.8, while TRINDEX2 (Fig. 6, right) had the best combination of high sensitivity (79%) and low false positive rate (2%) at the value 5.2.
The transition phase of TRINDEX1 for the LT data was 3.4-4.8, with a cut-off point of 4.8. The AUC of 0.887 (95% CI: 0.830 to 0.930; p < 0.0001) confirmed that the discrimination test was excellent. For TRINDEX2, the transition range was 4.0-5.2 with a cut-off point of 5.2, and the AUC of 0.956 (95% CI: 0.914 to 0.982; p < 0.0001) also shows an excellent discrimination test. The Youden index J is also high: 0.73 for TRINDEX1 and 0.78 for TRINDEX2.
Tables 4a, 4b and 4c compare the LT and ML data in terms of thresholds and ROC curve analyses.
Because TRINDEX2 combines the pigments PC and chl-a, it appears less accurate for predicting cyanobacterial bloom thresholds: chl-a also increases with the growth of phytoplankton other than cyanobacteria, pushing TRINDEX2 above the real threshold. TRINDEX1, based only on PC, therefore appears to be a better indicator than TRINDEX2 for estimating the threshold for cyanobacterial blooms. The lowest TRINDEX1 value at which blooms appeared in the two lakes was taken as the start of the transition phase, and the cut-off point as the threshold for bloom onset. The TRINDEX1 thresholds for predicting cyanobacterial blooms can be defined as follows:
• TRINDEX1 < 3.4: no bloom occurs, as the system is stable.
• TRINDEX1 between 3.4 and 4.8: there is a high risk of cyanobacterial bloom development; this range is called the 'transition phase'. Other environmental factors, such as weather conditions, may act as triggers and should be considered when predicting bloom development. In this transition phase the system tends towards bloom onset: blooms may occur but can be unstable (appearing and then disappearing within a short period) or become stable, depending on ambient conditions.
• TRINDEX1 > 4.8: cyanobacterial blooms can occur and remain stable for a certain period (hours or even days).
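The three TRINDEX1 zones can be encoded as a simple lookup. The text leaves the status of the boundary values 3.4 and 4.8 open, so this sketch assumes both belong to the transition phase; the function name and labels are hypothetical.

```python
def trindex1_zone(value, lower=3.4, upper=4.8):
    """Map a TRINDEX1 value to the three zones proposed in the text.

    Assumption: the boundary values 3.4 and 4.8 are treated as part of the
    transition phase, since the text gives '< 3.4', 'between 3.4 and 4.8'
    and '> 4.8'.
    """
    if value < lower:
        return "no bloom (stable)"
    if value <= upper:
        return "transition phase (high risk)"
    return "bloom likely (possibly stable)"


for v in (3.0, 4.2, 4.8, 5.6):  # hypothetical TRINDEX1 values
    print(v, trindex1_zone(v))
```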
The performance of our model was evaluated with accuracy, precision, recall and F1 score. Among the 40 LT observations used for this evaluation, there were 4 false positive (FP) cases (10%), 22 true positive (TP) cases, 1 false negative (FN) case (5%) and 13 true negative (TN) cases. Note that a large number of true negative cases can inflate the accuracy of the predictions. In summary, the TRINDEX1 model applied to the observation data from LT over three summers is 87.5% accurate, with a recall of 0.95 (excellent, being far above 0.5), a precision of 84.6% and an F1 score near 0.9 (also very good, as F1 ranges from 0 (poor test) to 1 (excellent test)).
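The summary metrics follow directly from the four counts. A short Python check using the standard definitions of accuracy, precision, recall and F1 (the function name is our own):

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # share of predicted blooms that were real
    recall = tp / (tp + fn)     # share of real blooms that were predicted
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts reported for the 40 LT observations.
m = classification_metrics(tp=22, fp=4, fn=1, tn=13)
print({name: round(value, 3) for name, value in m.items()})
# {'accuracy': 0.875, 'precision': 0.846, 'recall': 0.957, 'f1': 0.898}
```

These values are in line with the summary figures given above.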

Practical Scheme for Application
As indicated, TRINDEX1 is the more appropriate indicator for predicting cyanobacterial blooms. The transition phase signals the need for a more frequent monitoring program for the waterbody. Regarding the possibility of bloom occurrence, the closer TRINDEX1 is to the threshold values, the higher the probability of cyanobacterial growth. The scheme in Fig. 7 is suggested as a practical tool for bloom onset prediction and management based on TRINDEX1.
From this scheme, three risk scenarios can guide management decisions for a waterbody facing algal bloom issues:
(1) When no bloom is observed and TRINDEX1 < 3.4, monitoring of the waterbody should follow its established routine.
(2) When TRINDEX1 of the lake is between 3.4 and 4.8, the risk of cyanoHAB growth increases. In this case there may be no visible sign of a bloom, but a more frequent sampling plan covering all nutrient parameters, plus taxonomy and toxin analyses, should be initiated. Early warning signs could also be placed at all lake accesses to inform residents about algal growth concerns.
(3) If TRINDEX1 is greater than 4.8, the risk of bloom issues is high, and stable blooms may be observed on the surface or not at all (when they dissipate in the water column). In this last scenario, activities of people and pets must be restricted in the …
TRINDEX1 is our recommended bloom prediction indicator for monitoring purposes.

Advantages and Limitations of TRINDEX
A "one-size-fits-all" approach (a term used by Anderson et al. 2015) for HAB modelling is not practical and even utopic. Whether forecasting the potentially harmful bloom occurrence or tracking its path, models should always be linked to the local chemistry, physics, and biology of the waterbody and based on the in-situ data. Alert systems and mitigation strategies will be dictated by the history of human resource use in the region and will hinge on local to federal government mandates for protecting those resources (Anderson et al. 2015). From this perspective, we would underline here the advantages and limitations that could lead to conceive TRINDEX for monitoring purposes in each waterbody.
(1) TRINDEX, especially TRINDEX1, is suggested as an indicator for the cyanobacterial bloom onset. This can tell about cyanobacteria presence and bloom occurrence in the waterbody.
(2) The range called 'transition phase' should be understood as a potential risk, or 'a situation involving exposure to bloom' possibility, i.e. that can lead to (i) stable bloom situation (supercritical); or (ii) unstable blooming immediately (critical); or (iii) nothing happening (subcritical), depending on many other factors (such as light, wind, temperature etc.). The transition phase must be considered as an important step of the bloom onset and need to be carefully studied due to its high sensitivity for both false positive and false negative cases.
(3) Temperature was not yet considered in our TRINDEX model herein, because different potentially toxic cyanobacterial species grow with various temperature ranges. We suggest our TRINDEX1 could be used when the temperature is greater than 15 o C (which is the lowest temperature range observed in Atlantic Canada for cyanobacterial blooms development).
(4) The inaccuracy of the model may be caused by community metabolism, which was not considered yet. For the potential users for other lakes (oligotrophic to medium eutrophic), our TRINDEX thresholds suggested herein can be appropriately applicable; while for high eutrophic ones, we should recommend that the users would define their own upper and lower limits and would go through all necessary steps to adjust correctly their own lake thresholds.
(5) Also, the dominant species generating blooms can be a significant factor that could intervene in the accuracy of TRINDEX. Both lakes in our consideration (Mattatall and Torment) have the same genus Dolichospermum.
Further investigation needs to be undertaken with other waterbodies containing different species generating blooms and community metabolism. From this scheme, three scenarios of risk could lead to the management decision for waterbody dealing with algal bloom issue: (1) When no bloom is observed and TRINDEX1<3.4, the monitoring plan for waterbody should follow its established routine; (2) When TRINDEX1 of the lake goes between 3.4-4.8, the risk for a cyanoHAB growth increases. In this case, there might not have any visible sign of bloom in a waterbody, but a more frequent sampling plan with all nutrient parameters, plus taxonomy and toxin analyses should be initiated. Also, the early warning signs could be placed in all accesses to the lake to inform residents about the algal growth concerns.
(3) If TRINDEX1 is calculated greater than 4.8, the risk of blooming issues is high and stable blooms could be either observed (on the surface) or not (blooms dissipate in the water column). In this last scenario, any activities of people and pets must be restricted in the concerned waterbody. The monitoring plan should be more intensive during the bloom episodes.
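The three risk scenarios of the Fig. 7 scheme, together with the 15 °C applicability limit given in point (3) of the limitations, amount to a simple decision rule. The Python sketch below is a hypothetical encoding of that rule: the names are ours, assigning the boundary values 3.4 and 4.8 to the transition phase is an assumption, and the "no bloom observed" condition of scenario (1) is left to the field observer rather than modelled.

```python
from enum import Enum
from typing import Optional

# Thresholds from the practical scheme and the 15 °C applicability limit.
LOWER_THRESHOLD = 3.4
UPPER_THRESHOLD = 4.8
MIN_TEMPERATURE_C = 15.0


class Risk(Enum):
    ROUTINE = "follow the established monitoring routine"
    TRANSITION = "more frequent sampling, taxonomy and toxin analyses, warning signs"
    HIGH = "restrict activities of people and pets, intensive monitoring"


def assess_risk(trindex1: float, temperature_c: float) -> Optional[Risk]:
    """Map a TRINDEX1 value to one of the three risk scenarios.

    Returns None when the temperature is not above 15 °C, where TRINDEX1
    is not recommended. Assumption: 3.4 and 4.8 both count as transition.
    """
    if temperature_c <= MIN_TEMPERATURE_C:
        return None
    if trindex1 < LOWER_THRESHOLD:
        return Risk.ROUTINE
    if trindex1 <= UPPER_THRESHOLD:
        return Risk.TRANSITION
    return Risk.HIGH
```

For example, `assess_risk(4.1, 20.0)` falls in the transition phase, while any value computed at 15 °C or below returns `None`. For highly eutrophic lakes, the two threshold constants would be replaced by lake-specific limits, as recommended in point (4).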

CONCLUSIONS
Predicting cyanobacterial bloom occurrence has been a challenging subject in both marine and freshwater environments for many decades, and comparatively little emphasis has been placed on determining thresholds for bloom onset, especially in freshwater ecosystems. An ideal alert system should quantitatively predict cyanoHAB likelihood, intensity, and blooming potential. The number of approaches for monitoring, detecting, predicting, and forecasting the onset, fate, and demise of algal blooms is arguably comparable to the diversity of species being studied.
Here we focused on predicting cyanoHABs using an index capable of identifying the thresholds that delimit the transition phase toward blooming, above which cyanobacterial blooms could occur in freshwater bodies. Our approach relies on a close link between observations and a simple model, with the aim of developing a forecasting capability. TRINDEX could be further developed and applied in smart water-management systems.