Relationship between water transparency and physical-chemical variables in lakes of the western Amazon, Brazil

Water transparency is one of the main indicators of seasonal changes in water level in lacustrine systems, but other spatially and temporally varying environmental variables can act jointly on it. Interactions between physical-chemical variables and water transparency can be diagnosed by fitting statistical models based on multiple regression analysis. The objective of this study is to identify the set of limnologic variables that best predicts the seasonal variation in water transparency in lakes of the western Amazon. Data were collected in both the drought and flood seasons in 78 lakes of the Mamirauá Reserve for Sustainable Development (RDSM). The analysis comprised the selection of variables, the verification of the assumptions of the linear regression model and the validation of the model. The variables that best fitted the model were water level, bottom temperature, conductivity and pH for both seasons analyzed ($R^2 = 0.64$). The resulting model suggests that the physical-chemical variables, together with the seasonal variation of the water level, influence the re-suspension and sedimentation of particulate material in the lake system of RDSM.


Introduction
The Amazon floodplain is one of the most extensive wetlands on Earth. Its area has an extension of 300,000 km², considering only the Central Amazon section (IRION et al., 1997). It is a flat region, with a height gradient of around 120 m along its entire extension. Due to this low gradient, water level variations in the main channel between 7 and 13 m correspond to an inundation area of 20 to 100 km perpendicular to the river's margins (COSTA, 2005).
The rivers and floodplains of the Amazon basin constitute a complex of channels, rivers, islands and depressions that are permanently modified by sedimentation and transport of suspended material through whitewater rivers (MERTES et al., 1996). Most of the lakes in these systems are formed by the lateral displacement of the main channel of meandering rivers (SIOLI, 1984) and their chemical composition depends on the entry of water masses originated from the associated river, from the tributaries of terra firma and from rainfall (FORSBERG et al., 1988; JUNK, 1997).
The physical, chemical and biological processes occurring in the lakes vary according to season (flood and drought). During the flood season, surface waters from the rivers enter the lakes through channels remaining in the floodplains, and during the drought this stored water is released naturally, influencing the geochemistry of adjacent rivers (ALCÂNTARA et al., 2008). Generally, during the flood period the suspended material is deposited, increasing water transparency, and during the dry season the vertical mixing caused by winds re-suspends the sediment, which makes the water more turbid (SCHMIDT, 1973).
According to Wetzel (2001), turbidity (the opposite of water transparency) is a visual property of water, constituting a reduction or absence of light in the water column due to suspended particles, usually inorganic, originating from soil erosion in the river basin and from re-suspension of bottom sediment. In Amazon lakes, turbidity usually varies with the flooding pulse (ALCÂNTARA et al., 2008) and, for the same time of year, the depth of water transparency can vary significantly, ranging from 0.25 m to 1.3 m in the drought season and from 0.75 m to 2.2 m in the flood season (QUEIROZ, 2007). This behavior suggests the existence of other environmental variables that, in conjunction with the flooding pulse, act on the change in water transparency in Amazon lakes.
The study of the dynamics of water transparency in natural lakes is important because of its consequences for the aquatic biota (RUSSEL et al., 2001). A high concentration of suspended sediment stimulates the occurrence of floating and emergent plants, which reduces the occurrence of submersed plants (CARVALHO et al., 2005). An increase in this variable could also cause changes in the bio-dynamic cycles, reducing the phytoplankton community as it interferes with the speed and intensity of photosynthetic activity, and negatively influencing the predation capacity of the ichthyofauna by reducing visibility (BUKATA et al., 1995; RADKE; GAUPISCH, 2005).
To build a model expressing the relationship between water transparency in lakes of the Mamirauá Reserve for Sustainable Development, Amazonas State, and the limnologic variables that reflect the dynamics of lacustrine systems, a set of statistical methods was applied to determine, by multiple regression analysis, the significance of the parameters corresponding to the selected variables.

Study area
The Mamirauá Reserve for Sustainable Development (RDSM, Figure 1) is located between the geographical coordinates S 2°48' to 2°54' and W 64°53' to 65°03', at the confluence of the Solimões and Japurá rivers, and presents the characteristic várzea mosaic of water bodies and forest types (QUEIROZ, 2007). With a total area of 1,124,000 ha, it is the largest exclusive protection reserve of the Amazon floodplain. Two types of floodplain are found at the RDSM: (1) areas located between the Auati-Paraná and Paraná-Aranapu rivers (Subsidiary Area), which make up 85% of the entire reserve, constituted mainly by terrains of Pleistocene origin, and (2) the floodplains between Paraná do Aranapu and the confluence of the Solimões and Japurá rivers (Focal Area), of Holocene origin (AYRES, 1995). Within the RDSM Focal Area there are around 830 lakes, distributed among nine lake systems: Mamirauá, Jarauá, Ingá, Liberdade, Tijuaca, Horizonte, Boa União, Aranapu and Barroso.
Preliminary analysis indicated that the Mamirauá and Jarauá lake systems are those with the most limnologic information available (QUEIROZ, 2005, 2007). For that reason, both systems were selected for our study.
The Mamirauá system, located at the entry of the Focal Area, is connected during the drought only with the Japurá river, whereas the Jarauá system, located at the center of the Focal Area, is connected with the Solimões and Japurá rivers during the entire hydrologic year. According to Queiroz (1999), the Jarauá system includes approximately 60% of the lakes of the entire RDSM Focal Area.

Physical and chemical parameters
Data on the precise location of the lakes at RDSM and the limnologic parameters were acquired from the database of the Mamirauá Institute for Sustainable Development (IDSM), collected during studies of fish communities at RDSM (QUEIROZ, 2007) in 78 lakes from lake systems Mamirauá and Jarauá (Table 1). The location of the lakes was determined using a GPS instrument, and the water level (NL) was recorded daily at a river gauge located inside the reserve (November 1992 to March 2001). Water transparency data, measured by the visual disappearance depth of the Secchi disk (SEC), were collected during the dry (NL = 5 m) and flood (NL = 15 m) periods in 78 lakes (n-sampling) of the Mamirauá and Jarauá systems (SIST) within the RDSM. Aside from these parameters, in this study the following data were used: water electric conductivity (CD), surface temperature (TS), bottom temperature (TF), hydrogen-ion potential (pH), dissolved oxygen concentration (OD) and water current flow (FCA).

Model construction
Of the total n-sampling, 80% (training data: 63 samples) were randomly selected and used for the model construction step. The other 20% (15 samples) were used in the model validation analysis.
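The random partition described above can be sketched as follows. This is only an illustration: the lake identifiers (1 to 78), the fixed seed and the function name `split_train_validation` are assumptions, since the text does not state how the random selection was implemented.

```python
import math
import random


def split_train_validation(sample_ids, train_fraction=0.8, seed=0):
    """Randomly split sample identifiers into training and validation subsets."""
    ids = list(sample_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    rng.shuffle(ids)
    # Round up so that 80% of 78 samples gives the 63 training samples of the text.
    n_train = math.ceil(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers for the 78 sampled lakes.
train, valid = split_train_validation(range(1, 79))
print(len(train), len(valid))
```

Rounding up is what reproduces the 63/15 partition reported in the text; rounding to the nearest integer would give 62/16.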
The assumptions of the regression model, namely independence and normality of the dependent variable Y and constant variance of the residuals, were investigated by the Kolmogorov-Smirnov (SIEGEL, 1975) and Levene (KUTNER et al., 2004) statistical tests, respectively. The Pearson correlation coefficient (KUTNER et al., 2004) was used to measure the correlation between every pair of explanatory variables and between each of them and the dependent variable.
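As a minimal sketch of the pairwise correlation screening mentioned above, the Pearson coefficient can be computed directly from its definition. The numeric values and the 0.8 flagging threshold below are hypothetical and are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.8):
    """Return the variable pairs whose |r| reaches the (assumed) threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, round(r, 2)))
    return flagged


# Hypothetical values for illustration only (not data from the study).
data = {
    "TF": [26.1, 27.3, 28.0, 29.2, 30.1],
    "OD": [1.2, 1.9, 2.6, 3.1, 4.0],
    "pH": [6.9, 6.4, 7.1, 6.6, 6.8],
}
flagged = correlated_pairs(data)
print(flagged)
```

Pairs flagged in this way are candidates for exclusion from joint use in a model, because they carry largely redundant information.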
The Best Subset and the Forward Stepwise techniques (KUTNER et al., 2004) were used for the selection of variables based on a linear regression model with a single dependent variable. Based on the adjustment criteria $R_p^2$, $R_a^2$ and $C_p$ (Best Subset) and FSt (Forward Stepwise), these techniques can define the subset of explanatory variables that best predicts the values of the dependent variable for new observations.
The $R_p^2$ criterion measures the proportion of the variation in Y explained by a given subset of X variables and depends on the number of variables included in the regression model; it never decreases when a variable is added, hence the maximum value of $R_p^2$ occurs when all variables are in the model. Because of this, the analysis using the $R_p^2$ criterion should not be limited to the best value, but should rather look for the point beyond which the addition of variables no longer causes a relevant increase in $R_p^2$. The $R_a^2$ criterion is analyzed in the same way. The $C_p$ criterion is based on the mean square error and the total variance. The best fit of variables occurs when the value of $C_p$ is low and near the number of parameters in the model under consideration. Based on F statistics, the Forward Stepwise procedure analyzes the entry and removal of variables and combinations of these until the set of variables in the model no longer changes. A problem linked to this procedure is the possible inclusion of highly correlated variables in the model.
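The three Best Subset criteria can be written compactly in their usual textbook form, where p counts the parameters of the candidate model including the intercept. The sketch below is illustrative only: the sums of squares and the size of the full model (nine parameters) are hypothetical and are not results from the study.

```python
def r2_p(sse_p, ssto):
    """Coefficient of determination of a candidate model with error sum sse_p."""
    return 1.0 - sse_p / ssto


def r2_adj(sse_p, ssto, n, p):
    """Adjusted R2: penalizes the number of parameters p (intercept included)."""
    return 1.0 - (n - 1) / (n - p) * sse_p / ssto


def mallows_cp(sse_p, mse_full, n, p):
    """Mallows' Cp, using the mean square error of the full model."""
    return sse_p / mse_full - (n - 2 * p)


# Hypothetical numbers for illustration only.
n = 63                      # training samples, as in the text
ssto = 10.0                 # total sum of squares (assumed)
full_p = 9                  # assumed full model: eight predictors plus intercept
sse_full = 3.3              # error sum of squares of the full model (assumed)
mse_full = sse_full / (n - full_p)
candidates = {3: 4.6, 4: 3.9, 5: 3.6, 6: 3.5, full_p: sse_full}  # p -> SSE_p

for p, sse in candidates.items():
    print(p, round(r2_p(sse, ssto), 3), round(r2_adj(sse, ssto, n, p), 3),
          round(mallows_cp(sse, mse_full, n, p), 2))
```

By construction $C_p$ equals the number of parameters for the full model, which is why candidate subsets are judged by how small $C_p$ is and how close it lies to their own p.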
The contribution of each variable to the models was evaluated by multiple regression analysis, observing the p-value of each variable and the coefficient of determination ($R^2$) of the estimated models.
Despite being part of the same complex, the Mamirauá and Jarauá systems differ in connectivity during the drought season. That encouraged us to test for differences between the systems, treating the system as a binary qualitative variable. A complete model including the system variable and a reduced model without the influence of the system were therefore fitted. The significance of the coefficients of the fitted regressions was verified by a general linear F test for both the complete and the reduced model.
The validation of the chosen model was done using two procedures: (i) comparison of the intercept ($\beta_0$, the point of intersection of the line with the ordinate), the slope coefficients of the regression ($\beta_1, \beta_2, \beta_3, \ldots, \beta_x$, the change in the dependent variable associated with a one-unit change in the corresponding explanatory variable) and the respective standard deviations (s{β}) obtained from the training and validation data; (ii) hypothesis tests on the β coefficients of the linear regression between the observed validation values (Y) and the predicted values (Ŷ), calculated from the β parameters of the chosen model for the validation data. All analyses were done using the software STATISTICA 6.0.
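The comparison between a complete and a reduced model can be sketched with the general linear F statistic in its standard textbook form. The sums of squares and degrees of freedom below are hypothetical placeholders; in particular, the number of extra parameters in the complete model is an assumption, because the text does not list the terms of the two models.

```python
def general_linear_f(sse_reduced, df_reduced, sse_full, df_full):
    """F* = [(SSE_R - SSE_F) / (df_R - df_F)] / (SSE_F / df_F)."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


def keep_reduced_model(f_star, f_critical):
    """The reduced model is retained when F* does not exceed the critical value."""
    return f_star <= f_critical


# Hypothetical values for illustration only (not results from the study).
f_star = general_linear_f(sse_reduced=12.0, df_reduced=58,
                          sse_full=10.0, df_full=53)
# The critical value must come from an F table for the chosen alpha and
# degrees of freedom; 2.4 here is a placeholder, not a tabulated value.
print(round(f_star, 3), keep_reduced_model(f_star, f_critical=2.4))
```

When F* stays below the critical value, the extra terms of the complete model do not reduce the error sum of squares enough to justify keeping them.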

Results and discussion
As required when using linear regression analysis, the normality of the dependent variable was analyzed. Water transparency (Y) presented a normal distribution only when log-transformed (Kolmogorov-Smirnov, D calculated = 0.110 < D critical = 0.153). This transformation also increased the correlation coefficient between Y and the explanatory variables (Table 2). The independent variables best related with Y also presented a high correlation among themselves, meaning that they are not suitable for being jointly used in the model (Table 2). The relationships between the explanatory variables and variable Y agree with literature findings (ALMEIDA; MELO, 2009; CRISTOFOR et al., 1994; FREIRE et al., 2009; MELO; SOUZA, 2009), except for the flow. The relationship between flow and water transparency should be inverse, due to the increase of suspended sediments in the water column caused by turbulence (ALMEIDA; MELO, 2009; CRISTOFOR et al., 1994; MELO; SOUZA, 2009). However, the increased turbidity in Amazon lakes occurs during the low water period, when the process of sedimentation is interrupted by wind-induced resuspension (ALCÂNTARA et al., 2010; BOURGOIN et al., 2007). In this study, 60% of the data were collected in the flood season (Table 1), which could explain the positive relationship between flow and water transparency.
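The effect of the log transformation on the Kolmogorov-Smirnov statistic can be illustrated with a minimal sketch. Everything numeric here is an assumption: the Secchi depths are synthetic right-skewed values, the base-10 logarithm is assumed (the text does not state the base), and the critical value uses the common large-sample approximation 1.36/√n for α = 0.05, which for n = 78 gives about 0.154, close to the 0.153 quoted above.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(values):
    """KS distance between the empirical CDF and a normal CDF fitted to the sample."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # Largest gap above and below each step of the empirical CDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Synthetic, right-skewed "Secchi depths" in metres (hypothetical data).
n = 78
z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
secchi = [math.exp(0.5 * v) for v in z]

d_raw = ks_statistic_normal(secchi)
d_log = ks_statistic_normal([math.log10(v) for v in secchi])
d_crit = 1.36 / math.sqrt(n)  # large-sample approximation, alpha = 0.05
print(round(d_raw, 3), round(d_log, 3), round(d_crit, 3))
```

For skewed data of this kind the statistic drops after the transformation, which is the behaviour reported for water transparency.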
The selection criteria analyzed produced three different sets of variables for fitting the model, with criteria $R_p^2$ and $R_a^2$ yielding the same selection of variables. The best models generated for criteria $R_p^2$, $R_a^2$, $C_p$ and FSt are presented in Table 3, and the acceptance and/or rejection of the selected variables of these models, determined by the p-value, are listed in Table 4. Model 1 reached the best value of the coefficient of determination ($R^2$), that is, the greatest measure of adequacy of fit of the regression, with five variables included in the model:

$$Y = 3.188 + 0.030\,\mathrm{NL} - 0.056\,\mathrm{TF} + 0.002\,\mathrm{CD} - 0.312\,\mathrm{pH} - 0.05\,\mathrm{OD} \qquad (1)$$

Nevertheless, only the variable pH was significant ($p_{\mathrm{pH}} \leq 0.05$). This demonstrates that the high value of $R^2$ obtained in Model 1 represents the cumulative effect of a certain number of highly correlated variables (Tables 1 and 4).
The smaller number of predictor variables explaining the total variation in the dependent variable Y in Model 3 would be advantageous were it not for the fact that the FSt statistic may fit models that include highly correlated variables. In this case, the only two variables included in Model 3 (TF and OD) have the highest correlation coefficient (Pearson = 0.85), which determined the exclusion of this model.
The high correlation between the TF and OD variables made it inadvisable to include both in the same model because of the redundancy of information, which probably reduced the individual significance of the variables selected in Model 1. Therefore, the adjustment criterion $C_p$ used along with the Best Subset technique generated the most appropriate model (Model 2), with a good coefficient of determination and a significant inclusion of all selected variables (Table 4).
Model 2 predicts that 64% of the variation in water transparency in RDSM lakes is explained by a set of four variables: positively by water level and conductivity (the higher NL and CD, the greater the water transparency) and negatively by bottom temperature and pH (the lower TF and the more acidic the lake water, the greater the visual disappearance depth of the Secchi disk). This does not mean that the relationship found between the independent variables is standard for other lake systems. In RDSM lakes, some studies showed an inverse relationship between these variables. Chaves et al. (2005) found that during the flood the waters of the floodplain lakes of RDSM are cooler and more alkaline than in the dry season. According to Henderson et al. (1999), during the flood, when the water level reaches its maximum, the conductivity in the lakes of the Jarauá and Mamirauá systems is lower (μ = 88) than during the dry season (μ = 94).
The complete and reduced models, fitted with and without the binary qualitative variable for the Mamirauá and Jarauá systems, were generated from Model 2. According to the F test used to determine the existence of variation between the Mamirauá and Jarauá systems, there is no difference between the complete and reduced models at the level of significance α = 0.05 (F* = 0.579 ≤ F critical = 2.389). Therefore, the validation process was applied to the reduced model, which corresponds to Model 2.
All analyses used in the validation step demonstrated the good fit of the selected variables of Model 2. The residuals of the multiple linear regression between the dependent variable LogSEC and the explanatory variables NL, CD, TF and pH, constructed with the validation sample subset (n = 15), showed constancy of variance according to the Modified Levene test conducted with α = 0.05 (t* = -0.045 < t critical = 2.298). The normality of the residuals was confirmed by a joint examination of the normal probability plot and the frequency distribution histogram of the residual classes compared to the predicted normal curve (Figure 2).
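A minimal sketch of the Modified Levene statistic in its usual two-group form is given below: residuals are divided into two groups, absolute deviations from each group median are computed, and their means are compared with a pooled two-sample t statistic. How the groups were formed in the study is not stated, so splitting the residuals into a "low" and a "high" group (for example by fitted value) is an assumption, and the residuals below are hypothetical.

```python
import math
from statistics import mean, median


def modified_levene_t(residuals_low, residuals_high):
    """Two-sample t statistic on absolute deviations from each group median."""
    med1, med2 = median(residuals_low), median(residuals_high)
    d1 = [abs(e - med1) for e in residuals_low]
    d2 = [abs(e - med2) for e in residuals_high]
    n1, n2 = len(d1), len(d2)
    m1, m2 = mean(d1), mean(d2)
    # Pooled variance of the absolute deviations, with n1 + n2 - 2 degrees of freedom.
    pooled = (sum((d - m1) ** 2 for d in d1)
              + sum((d - m2) ** 2 for d in d2)) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))


# Hypothetical residuals for illustration only (not data from the study).
low = [-0.10, 0.00, 0.10, 0.05, -0.05]
high = [-1.00, 0.00, 1.00, 0.50, -0.50]
t_star = modified_levene_t(low, high)
print(round(t_star, 3))
```

Constant variance is accepted when |t*| is below the critical t value for the chosen α; a large |t*|, as in this deliberately unequal example, would indicate that the spread of the residuals differs between the groups.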
Since the variables were collected in a temporal sequence corresponding to the seasonal continuity of the flood and dry periods, it was necessary to analyze the correlation among neighboring errors (residuals). The graphic analysis of the total errors of the regression model, arranged sequentially in time, demonstrated that there was no temporal correlation (Figure 3). The presence of values that diverge greatly from the average (outliers) was identified by the Standardized Residual and Leverage tests. This analysis revealed three outliers in Y and three in X, respectively (Figure 4). Using the DFFITS test ($|\mathrm{DFFITS}_i| < 1$ or $2\sqrt{p/n} = 0.563$) and the Cook's Distance statistic ($D_i$ < 10% of the value from F(5; 58) = 1.696), we verified that the outliers identified did not significantly influence the estimated regression line.
During the model validation, the parameters $\beta_0$ and $\beta_{1,2,3,4}$ and their respective standard deviations s{β} from the regression models fitted to the training and validation data were compared. In this analysis, the validity of the model generated is a result of the similarity between the parameters. According to the limits reached by $\beta_0$ and β for each variable in the training and validation data, there was little difference in $\beta_{NL}$ and $\beta_0$; $\beta_{CD}$ changed sign but was very similar in magnitude, and only $\beta_{TF}$ and $\beta_{pH}$ showed large variations (Table 5). However, the large overlap between the coefficients $\beta_0$ and $\beta_{1,2,3,4}$ of the training and validation models, given their standard deviations, validates the model for this analysis. As a complement to this validation process, a linear regression analysis was done between Y observed (Y) and Y predicted (Ŷ), aiming to examine whether the regression line passes through the origin and whether the correspondence between Y and Ŷ is equivalent to one unit (100% agreement between training and validation data). The t hypothesis test demonstrated that the regression passes through the origin (t* = 1.676 < t critical = 2.160), but that the slope differed from one, that is, there was no fitting between the models (t* = 6.758 > t critical = 2.160, Figure 5).
The non-fitting of the chosen model with the validation data can be related to sampling bias due to the small number of validation samples, which may not fully represent the environmental phenomenon analyzed. Nevertheless, the coefficient of determination ($R^2$ = 0.77) and the p-value (p < 0.001) of the regression analysis, together with the similarity of the β coefficients, demonstrate a significant fit of the chosen model to the data analyzed. The selection of the model and of the appropriate variables, as well as how the variables enter the model, are complex tasks that must be performed taking care to maximize the inter-relationships between variables (RAFTERY; DEAN, 2006). In this study, the results show the potential of the procedures used in developing the model for calculating water transparency in terms of the physical and chemical variables selected for the lake ecosystems analyzed.
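The second validation procedure (testing whether the regression of observed on predicted values has intercept 0 and slope 1) can be sketched with the standard simple-regression formulas. The data below are synthetic and serve only to show the mechanics; the critical value 2.160 is the one quoted in the text for the 15 validation samples.

```python
import math


def validation_t_tests(y_pred, y_obs):
    """Regress observed on predicted values; return b0, b1 and the t statistics
    for H0: intercept = 0 and H0: slope = 1."""
    n = len(y_obs)
    x_bar = sum(y_pred) / n
    y_bar = sum(y_obs) / n
    sxx = sum((x - x_bar) ** 2 for x in y_pred)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(y_pred, y_obs))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(y_pred, y_obs))
    mse = sse / (n - 2)
    s_b1 = math.sqrt(mse / sxx)
    s_b0 = math.sqrt(mse * (1.0 / n + x_bar ** 2 / sxx))
    return b0, b1, b0 / s_b0, (b1 - 1.0) / s_b1


T_CRITICAL = 2.160  # value quoted in the text (n = 15, alpha = 0.05)

# Synthetic predicted values and small alternating noise (hypothetical data).
y_pred = [0.1 * i for i in range(1, 16)]
noise = [0.01 if i % 2 else -0.01 for i in range(1, 16)]
y_obs = [x + e for x, e in zip(y_pred, noise)]

b0, b1, t_intercept, t_slope = validation_t_tests(y_pred, y_obs)
print(abs(t_intercept) < T_CRITICAL, abs(t_slope) < T_CRITICAL)
```

A |t| below the critical value for the intercept supports a line through the origin, while a |t| above it for the slope, as reported in the study, indicates that observed and predicted values do not agree one to one.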
We suggest considering the temporal continuity during the collection of sampling data, including the transition periods between flood and drought characteristic of the Amazon hydrologic system.We also recommend the inclusion of other variables for the analysis of water transparency variation, searching for a relationship among parameters of the analyzed model with climatic, physiographic and topographic characteristics of the drainage basins.

Conclusion
The set of variables that best explained the variation in water transparency in the lakes of the Mamirauá and Jarauá systems at the RDSM comprised, with a positive effect, water level and conductivity (higher values of these variables increased the visual disappearance depth of the Secchi disk) and, with a negative effect, bottom temperature and pH (the higher the bottom temperature and pH, the lower the water transparency). The chosen model showed a good fit with the selected variables.

Figure 1. Location of the Mamirauá Reserve for Sustainable Development, Amazonas State.

Figure 4. Relation between errors and predicted Y values. Outliers in Y (dashed circles) and in X (solid circles).

Figure 5. Regression between observed Y and predicted Y, calculated from the parameters of the model considered.

Table 1. Mean values of physical-chemical variables in 78 lakes of RDSM used in this study.

Table 2. Pearson's correlation coefficients between the dependent variable (Y and LogY) and the explanatory variables.

Table 3. Significant values for the selection of variables (p < 0.05).

Table 4. Multiple regression parameters (p-value and $R^2$) between water transparency and the selected variables in the models.

Table 5. Parameters of the regression model for the training and validation data.