Hybrid Machine Learning techniques in the management of harmful algal blooms impact

Harmful algal blooms (HABs) are episodes of high concentrations of algae that are potentially toxic for human consumption. Mollusc farming can be affected by HABs because, as filter feeders, they can accumulate high concentrations of marine biotoxins in their tissues. To avoid the risk to human consumption, harvesting is prohibited when toxicity is detected. At present, the closure of production areas is based on expert knowledge and the existence of a predictive model would help when conditions are complex and sampling is not possible. Although the concentration of toxin in meat is the method most commonly used by experts in the control of shellfish production areas, it is rarely used as a target by automatic prediction models. This is largely due to the irregularity of the data due to the established sampling programs. As an alternative, the activity status of production areas has been proposed as a target variable based on whether mollusc meat has a toxicity level below or above the legal limit. This new option is the most similar to the actual functioning of the control of shellfish production areas. For this purpose, we have made a comparison between hybrid machine learning models like Neural-Network-Adding Bootstrap (BAGNET) and Discriminative Nearest Neighbor Classification (SVM-KNN) when estimating the state of production areas. The study has been carried out in several estuaries with different levels of complexity in the episodes of algal blooms to demonstrate the generalization capacity of the models in bloom detection. As a result, we could observe that, with an average recall value of 93.41% and without dropping below 90% in any of the estuaries, BAGNET outperforms the other models both in terms of results and robustness.


Introduction
Bivalve molluscs find their own food source in the microalgae (phytoplankton) present in the aquatic environment.Some of these microalgae belong to species which can highly increase their numbers in a given location.These phenomena are called "algal blooms".They are known as "harmful algal blooms" (HABs) when caused by microalgal species with harmful effects on human health, the environment, tourism and aquaculture ( [1]).These microalgae produce toxins which can be classified into three types depending on the type of poisoning they produce: Paralytic Shellfish Poisoning (PSP), Amestic Shellfish Poisoning (ASP) or Diarrhoeic Shellfish Poisoning (DSP).The latter being the most common in galician coast (northwest Spain)( [2]).This region is of great importance due to its high production of molluscs (Galicia generates around 40% of the European mussel production ( [3])).Although it is possible to detect them practically all year round, their abundance varies seasonally, as well as depending on certain external factors ( [2]).Although these events correspond to natural phenomena known for centuries ( [4]), in the last decades these events seem to have increased in frequency, intensity and geographical distribution ( [4]).There is currently an important scientific interest in understanding the causes and effects of the spatial and temporal distribution of algal species that make up HABs, as their potential effects include ecosystem alterations, public health problems, reduced tourism and social problems, among others, which imply important economic losses ( [5,6]).For this reason, constant monitoring of these phenomena is necessary to take preventive action when they appear.HABs are a natural phenomenon, and their occurrence cannot be deliberately prevented.Therefore, an active surveillance plan must be maintained to monitor their occurrence and to determine the geographic location of the upwelling.
Within the European Union, the management of openings and closures of production areas is carried out by analysing the presence of toxicity in mollusc meat ( [7]), ( [8]) and ( [9]).Where such analyses are not possible, the legislation allows the authorities responsible for such monitoring to make decisions based on endogenous and exogenous factors that favour the proliferation of toxic phytoplankton.At present, the closing of the production areas is based on expert knowledge and lacks predictive models to support it.The occurrence of these events represents serious disruptions and sometimes economic losses for the industry.This drawbacks could be reduced with the existence of predictive models that allow for the creation of contingency plans ( [10]).The use of machine learning techniques is the option that offers the best results ( [11]).There are two main approaches to the creation of these models.This approach depends on the prediction goal of the ML models.On the one hand, we have models that focus on predicting the concentration of very specific cells such as the Dinophysis acuminata ( [12,13]), or species of the genus Pseudo-nitzschia ( [14]) or Karlodinium ([15]).This approach is more specific but maybe insufficient when predicting HABs composed simultaneously of several species.On the other hand, we have the more generalist approach, which seeks to predict biomarkers that are highly related to HABs, such as chlorophyll-a concentration ( [16]) or the toxin concentration itself ( [17,18]).Prediction based on the presence of toxin is the least common method, although this information is used by experts in the control of shellfish production areas.This is largely due to the irregularity of the data due to the established sampling programs.As an alternative, the activity status of the production areas has been proposed in this study as an objective variable, depending on whether the meat of cultured molluscs has a toxicity level below or above the legal limit.
From a machine learning perspective, there is a trend towards increasing model complexity.Algorithms such as Support Vector Machine (SVM) have offered good results thanks to their ability to work with a small amount of data and their good generalization capacity ( [19,20]).However, it has a high computational cost and tends to overfit when applied to high-dimensional multivariate time series.The most recent approaches are based on Random Forest (RF) and Artificial Neural Network (ANN), both for predicting HABs and toxins in molluscs meat.On one hand, RFs provide explicit rules that can be easily interpreted by humans, but present difficulties when dealing with high-dimensional multivariate data ( [21,22]).On the other hand, ANNs are better suited to this type of data although they are difficult to interpret ( [23,24]).In general, other studies work with data sampled at regular time intervals, which allows the creation of complete time series data sets.This is a clear advantage when training ML models that enables the use of methods such as Autoregressive Integrated Moving Average (ARIMA) ( [25]) or Convolution Neural Network (CNN) ( [26]).In cases such as the one we are studying, where the sampling intervals are irregular and the data has large imbalances, the results are worse.The unbalance between positive and negative cases is the reason for this behavior.In addition, HAB episodes correspond to the minority class, making their detection a great challenge.Therefore, the aim of this work is to alleviate these problems and improve the results in the prediction of HABs.According to the current state of the art concerning this type of problem, the best results obtained are based on ensemble techniques such as XGBoost ( [27]).Due to the recent boom of hybrid techniques within the field of artificial intelligence (AI) and its high adaptability capacity ( [28]), it has been decided to study the feasibility of applying this type of techniques.
In order to support the HABs related opening and closing of production areas, we propose the cre-ation of a predictive model based on hybrid AI techniques.The techniques implemented in this study are: Neural-Network-Adding Bootstrap (BAGNET) ( [29]) and Discriminative Nearest Neighbor Classification (SVM-KNN) ( [30]).These two methods have not been previously tested in the prediction of HABs.To test the performance of these techniques we have used as a benchmark a set of techniques already applied in other studies related to HAB prediction.In this control group we have used: Random Forest, Artificial Neural Networks (ANN), k-Nearest Neighbour (kNN), Support Vector Machines (SVMs), XGBoost, and Naïve Bayes ( [18]).These models were tested in the literature to support very localized mussel production areas.These regions do not cover all possible situations present in the production areas.This resulted in the creation of local models with low generalization capacity.In contrast, in this work we aim to create models capable of good generalization.For this purpose, the models are trained with data collected from several production areas located in different estuaries.This allows us to test the performance of the models over regions with heterogeneous HABs behavior.
The structure of this article is defined as follows: It starts with the definition of the HAB problem and how it affects the seafood industry, as well as the proposed solution.Section 2 contains a brief explanation of the hybrid techniques used as well as the collection and processing of the dataset is given.The results obtained after the application of these techniques are presented in section 3 and evaluated in section 4 by comparing them with the existing literature.Finally, in section 5, we present the conclusions drawn from this work and the possible lines of future work that the advances made leave open.

Dataset and its construction
Galician coast was chosen for this study because it is one of the main mollusc producing regions in Spain and has a base study to compare the results obtained ( [14,13,17]).For the creation of the dataset, we used a series of metrics related to HABs and their proliferation collected between the years 2004 and 2019.These data were obtained from oceanographic sampling carried out by the Instituto Tecnolóxico para o Control do Medio Mariño de Galicia (INTECMAR) ( [31]) in the 42 oceanographic stations distributed along the 5 Galician estuaries ( [32]), deciding to eliminate stations recently installed that offer less historical records.
Additional data was also collected from Instituto Español de Oceanografía (IEO) marnaraia project ( [33]).The distribution of these 42 oceanographic stations can be seen in table 1.In addition, the production areas where the toxicity is analysed are shown for each estuary, being this information the one used to label the samples.The location of these production areas can be seen in Figure 1.Working with several estuaries separately allows us to create study sets with different levels of complexity.The Ares-Betanzos estuary is the simplest with only 4 stations and 2 production zones, while Arousa is the most challenging one with a total of 10 and 22, respectively.(the sampling program stops on weekends).
It was decided to create independent predictive models for each estuary in order to test the robustness of the models when applied to heterogeneous environments.Taking into account that each Galician estuary has its own oceanographic stations, the number of input features will depend on the estuary where the model will be applied.Each of the 42 oceanographic stations can record the following variables: chlorophyll "a", "b" and "c", Dinophysis acuminata, Dinophysis acuta, Dinophysis caudata, Dinophysis spp., Alexandrium spp., Gymnodinium catenatum, Pseudonitzschia spp., nutrients (phosphate, nitrate, nitrite and ammonium), temperature, salinity and oxygen.Other variables such as the upwelling index and the opening and closing of the production areas were used.These features were processed and adapted to create the datasets in the following way: • The maximum values of chlorophyll concentration "a", "b" and "c" have been used.
• The count of the different phytoplankton cells that produce DSP toxin (Dinophysis acuminata, Dinophysis acuta, Dinophysis caudata and Dinophysis spp.) was used.
• The concentration value of dissolved nutrients (phosphate, nitrate, nitrite and ammonium) was used.
• The average values of temperature, salinity and oxygen have been used.In addition, with the temperature and salinity values, the absolute difference between the mean of the first 6 meters and the next 6 meters was calculated to detect the presence of thermocline and halocline stratification.
• We have used the weekly mean value of the upwelling index.
• Production areas are classified as open or closed depending on whether they have a toxicity level below or above the legal limit, respectively.This classification was used both for the creation of the output parameters and one of the input parameters.The output parameter is composed of the Monday value of the week following the week studied, while the input parameter is composed of the Friday value as this is the day with information sampling closest to the day of the prediction.
• The production area to which each sample belongs was coded by one hot encoding ( [34]).
• The sampling date was transformed only into the number of the week of the year.
After data processing, 5 datasets were obtained (one for each estuary) consisting of the variables reflected in the table 2. This table is made up of different statistical values like average, maximum or minimum value.In the table we can see how the maximum concentration values of D. acuminata vary greatly depending on the estuary.Since this variable is one of the most decisive when estimating closures caused by the presence of DSP toxin, we can foresee that the estuaries with the greatest variability between maximum and minimum values (Arousa and Pontevedra) will be the hardest to make predictions for.These datasets had a large number of inconsistencies in the feature values.This may be due to technical failures, the impossibility to sample or the late creation of certain stations.Since the models only admit that samples with the same dimension, it was necessary to eliminate those that had null values in any of their features.After this filtering, the distribution of openings and closings of the production zones present in each estuary can be seen in the table 1.

Machine Learning Models
Within the field of machine learning, one of the active areas of study has been the development of hybrid methods that improve classification/prediction performance over approaches based on single learning methods.
In general, such methods focus on combining two different machine learning techniques.Based on this idea and previous literature, 2 hybrid machine learning methods such as BAGNET and KNN-SVM have been selected.We have chosen BAGNET inspired by the good performance of bootstrapping based ensemble techniques and the power of Artificial Neural Networks (ANNs).In the case of SVM-KNN, its choice is based on the assumption that the datasets are made up of local data regions with their own distributions.If this were the case, and under the assumption that these local regions were linearly separable, this method could fit the problem very well.These two methods are briefly defined below.

Neural-Network-Adding Bootstrap
The Bagging aggregation method seeks to combine multiple predictors using bootstrap replicates of the training data ( [35]).When this technique is combined with Artificial Neural Networks ( [36]), the model ).Since the dataset is sampled with replacement, the probability that a given instance is not chosen after n samples is (1 − 1/n) n ≈ 0.368 as n goes to infinity.
On the other hand, the probability of being chosen is calculated as 1 − (1 − 1/n) n ≈ e −1 ≈ 0.632.This implies that approximately 0.632 * n unique samples are selected as bootstrap training sets and we would reserve 0.368 * n out-of-bag samples for testing at each iteration.Therefore, this weighting is calculated by the Eq. 1.Where ACC train is the accuracy computed on the whole training set, and ACC h,i is the accuracy on the out-of-bag sample.
Figure 2: A bootstrap aggregated neural network.This method is based on resampling with replacement of the available dataset and training an individual network on each resampled instance of the original dataset.

Discriminative Nearest Neighbor Classification
This system, called SVM-KNN, is based on the hypothesis that a complex separation region can be decomposed into different separated regions that behave linearly locally.To verify or reject this hypothesis, we propose the development of a hybrid model that combines the techniques kNN ( [38]) and SVM ( [39]).Training an SVM on the entire dataset is slow.However, in the neighborhood of a small number of examples and a small number of classes, SVMs usually perform better than other classification methods ( [30]).The philosophy of this method is similar to that of Bottou and Vapnik's "Local Learning" ( [40]), in which they pursued the same general idea using kNN followed by a linear classifier with ridge regularizer.However, by using only an L2 distance, their work was not driven by the constraint of fitting a complex distance function.A diagram of a SVM-KNN is shown in figure 3, The model would be as follows: to classify an instance, the kNN algorithm is applied to select the k instances from the training set of closest resemblance.Once these are available, a linear SVM is trained with them, and the instance to be classified is applied to this SVM.Once this is done, the SVM is discarded, since it is of local use and is only valid for that instance ( [30]).
Figure 3: A discriminative nearest nighbor classification.To classify an instance, the kNN algorithm is applied to select the k instances from the training set of closest resemblance.Once these are available, a linear SVM is trained with them, and the instance to be classified is applied to this SVM.Once this is done, the SVM is discarded, since it is of local use and is only valid for that instance.

Performance measures
When analysing the trained models and for subsequent comparison, five statistics were taken into account that were considered relevant for evaluating the results: accuracy, recall, F1-score and kappa coefficient.In the confusion matrix used to calculate the statistics, the closures of the production zones were defined as positive and the openings of the production zones as negative.Thus, True Positives (T P ) correspond to closures correctly classified as closures, True Negatives (T N ) identify openings classified as openings, False Positives (F P ) represent openings misclassified as closures and, finally, False Negatives (F N ) are closures that have been classified as openings.
Calculated according to Eq. 2 the accuracy estimates the correctness of a binary classification test that identifies or excludes a condition.

Accuracy = T P + T N T P +
Due to the risk to the health of the population of introducing molluscs with toxins into the market, we have decided to prioritise a more conservative model in terms of opening production areas.This is reflected in the especially recall will be taken into account due to the importance of avoiding false negatives.This is because classifying a potential closure as an opening of the production area could pose a risk to public health.

Experimentation setup
To ensure the robustness of the models, we have applied the K-fold cross-validation strategy ( [42]).In this particular study, the K-fold cross-validation strategy selected will be the 10-fold strategy, where k takes value 10.Taking into account the imbalance present in the data, we have used a stratified k-fold.This method ensures that each of the folds will have a balanced amount of data.
The following models have been used as benchmarks for comparison: Random Forest, Artificial Neural Networks (ANNs), k-Nearest Neighbour (kNN), Support Vector Machines (SVMs), XGBoost, and Naïve Bayes.Their hyperparameters have been obtained by replicating the experimentation process established in previous studies ( [18]) to the new datasets.
In the case of the models proposed in this paper, the attributes of the models used were adjusted by a grid search.Grid search is a tuning technique that attempts to compute the optimum values of hyperparameters.It is an exhaustive search that tests all combinations within the values given to the hyperparameters of a model.As described above, BAGNET is an ensemble model composed of a set of networks of neurons.In our particular case, we have worked with an ensemble composed of 50 networks.
When testing the operation of BAGNET we have tried different configurations of the networks that compose it, both in the number of layers and in the number of neurons that compose them.The number of neurons used is based on the use of multiples of 2 and combinations of them.The configuration of the individual networks that compose the ensemble is based on the ANNs used in previous studies ( [18]).On these values we have followed an empirical experimentation in order to determine the values used in the grid search.The specific configuration of this model is shown in table 4.

Neural-Network-Adding Bootstrap
Number

Results
Using the k-fold cross-validation strategy with k = 10 yields 10 values for each statistic.To avoid choosing models with good means, but with high variability and low robustness, it was decided to add the standard deviation of each statistic studied as a parameter to be taken into account.The implemented models obtained results that will be shown in the table 6, consisting of: the estuary where the models have been applied, the trained algorithm, the value of the adjustment parameters (values of k and C for SVM-KNN and the number of neurons per layer for BAGNET), the mean and standard deviation of the metrics studied.As one of the main objectives of this study is to study the generalization capacity of the implemented algorithms, table 7 shows a summary of the average behavior of the models in the 5 different estuaries.In turn, the metrics associated with each estuary are determined by the averaged value resulting from applying 10-fold cross-validation.The metrics shown for the SVM-KNN and BAGNET models are those obtained with the hyperparameter configuration that provided the best F1-score values.The best configuration for SVM-KNN was k = 5 and C = 1 and for BAGNET was two layers with 192 and 128 neurons respectively (these specific configurations will be noted as SVM-KNN* and BAGNET* respectively).In the table 7 we can see how the accuracy values hardly vary depending on the model.If we look at the recall values we can see that BAGNET offers the best results, followed by the models based on ANN, Random Forest and SVM-KNN.If we consider the F1-score values, the best models are those based on Random forest, SVM-KNN and BAGNET, the first one standing out for its low standard deviation.With the exception of the Naïve Bayes-based model, the rest of the models have obtained Kappa values above 0.80.The study of the models applied to each estuary independently allows us to understand how the models behave according to its complexity.These performance allows us to understand why the models have such a high standard deviation, reaching values of more than ±15% in the recall of the SVM, XGBoost and Naïve Bayes models.This is because the behaviour of the models varies a lot between folds depending on the estuary where they are applied, but vary little between folds in the same estuary.So it can be seen that the models give robust results.

Discussion
As we can see in the table 7, very good results of around 92% of accuracy are obtained.These values are very similar regardless of the model applied.This is due to the fact that the datasets present a large imbalance between positive and negative cases (table 1).Because of this, our comparison focuses on the recall value obtained.Taking this metric into account, BAGNET is positioned in first place, reaching recall values of 93.41%, followed by ANN, SVM-KNN and Random Forest, all with a recall rate of around 89% and a higher standard deviation in their results.These models, with the exception of ANN, reached a Kappa index value above 0.8 when compared to the real values.This is an "almost perfect" agreement according to the scale of values proposed by Landis and Koch ([41]).It should be noted that although BAGNET is the model that offers the best recall values, it is a model that offers somewhat lower F1-score values than its alternatives.This is because, like the ANN model, they are configured to weight the loss function (during training only).
This can be useful to tell the model to "pay more attention" to samples from an under-represented class.
This results in an increase in false positives in exchange for being the models with the lowest number of false negatives, the latter being of great importance as explained above.
It should be noted that there is no reference dataset on which to test the effectiveness of the models.This is due to the fact that the data used for their construction is provided ad-hoc for the respective studies.Furthermore, it is noteworthy that there are very few studies that aim to predict the presence of toxin in mussel meat (biomarker used by the EU for the management of shellfish production areas).As mentioned earlier in the introduction to this study, most studies determine bloom/no boom based on the concentration of certain phytoplankton species present in the environment.This situation increases the complexity when making comparisons within this field of study.In order to make as fair a comparison as possible between our models and the most common state-of-the-art models, they have all been applied to the same dataset defined in this study.(table7).
Among the studies that do work with the presence of toxin in mussels, those applied to very localized areas stand out ( [18]).In this situation, the kNN technique was the most effective, with recall values of 97%.But when we applied this technique to our data, which were collected over larger regions and contained more variability, we found that the effectiveness of this model decreased in favor of ensemble-based alternatives.kNN offers recall values of 86% compared to 93% for the BAGNET model.The proposed BAGNET and SVM-KNN models are a better options for larger environments.
Other works, such as Harley et al. (2020) ( [22]), also work on the prediction of toxins present in mussels.Although in their case the type of toxin studied is PSP, it has been considered relevant to highlight this due to the small number of papers presenting this approach.They have achieved recall values of 81% using Random Forest for PSP prediction.As can be seen in the table 7, applying Random Forest in the prediction of DSP blooms improves the results compared to the other type of toxin, with a recall of 88%.Although these results are still lower than those obtained with BAGNET and SVM-KNN.
One of our goals is to obtain a model that is extrapolable to other regions with the same problems and that can also be adjusted and provide good results.For this reason, we have applied the predictive models to several estuaries composed of multiple production areas.In doing so, we have found that the models do not behave the same regardless of the region in which they are applied (figure 4).If we analyse the reason why the models offer worse results in estuaries such as Arousa, this is mainly due to two factors.
We have to take into account that this is the estuary studied with the greatest imbalance in the labels of its dataset (16% positives -84% negatives), which makes it a challenge for the models to be able to generalise with a smaller number of cases.Moreover, this estuary is the largest we have studied.Composed of a total of 24 mussel production areas in floating hatcheries located along the entire estuary, it represents a greater variability of events that the models have to deal with.Despite these setbacks, we can see that the BAGNET model still offers recall values of around 90.46% in the Ría de Arousa.The other model that obtains recall values higher than 80% in this estuary is ANN with 84.31%, but this model achieves 10% less F1-score and 15% less Kappa index than BAGNET.

Conclusions
Hybrid techniques of machine learning algorithms obtain better results than the simple techniques studied in the literature.We also verified how the BAGNET method, with an average recall value of 93.41% and without falling below 90% in any of the estuaries, exceeds the results obtained by other hybrid methods proposed in the literature, such as XGBoost or Random Forest.This positions BAGNET as the best algorithm for DSP control of HABs when sampling conditions are unfavorable.
To make this comparison, we have had to implement the models proposed in other studies.This is due to the fact that, to date, there is no reference data set in the prediction of HAB that allows an objective comparison of the models studied in other works.This is because the data sets used are provided ad-hoc for each study.For this reason, the creation of a complete and public data set that can be used for future comparisons has been considered of great importance.
In order to create more complex test environments and increase the generalization capacity of the system, we have created data sets grouping all the production areas of each estuary.Although this process carries the risk of increasing the complexity of the system, the use of ensemble-based techniques and bootstrapping, such as BAGNET, has allowed us to address this problem with very good results.
In the course of this study, one of the main problems we faced was the large number of gaps in the time series data.This is an important limitation when working with the data and interpreting it.Being able to create a data set consisting of a complete time series would allow new approaches to HAB prediction.
The importance of having a system capable of monitoring and predicting the appearance of HAB phenomena is well known.Poor planning before the formation of these natural phenomena can cause significant economic losses in the fishing and shellfish businesses.In addition, and more importantly, there is the risk of introducing toxin-containing shellfish onto the market due to failure to close the production areas on time due to poor planning.This risk poses a significant danger to the health of the population.For all these reasons, we believe that models such as the one proposed can be of great help in addressing this problem.In particular, it is worth highlighting factors such as the creation of models focused on the control of shellfish production areas and the creation of models with greater generalization capacity, capable of being applied in multiple regions.These last factors are where our work stands out, improving the results obtained in previous studies.

Figure 1 :
Figure 1: Map of Galicia and the location of the production areas studied marked in yellow.

Figure 4 :
Figure 4: Comparison of accuracy, recall, f1-score and kappa between benchmark models and the best results of SVM-KNN and BAGNET for the prediction of HAB episodes by DSP.

Table 2 :
[37]t values of computational models for DSP toxicity event prediction the accuracy and robustness of the model.The overall output of a BAGNET is a weighted combination of the individual neural network outputs.Proper determination of the aggregation weights is essential for good modelling performance as is the variant 0.632 bootstrap ([37] [29]n as Neural-Network-Adding Bootstrap (BAGNET) emerges.This method is based on resampling with replacement of the available dataset and training an individual network on each resampled instance of the original dataset ([29]).The figure2shows a diagram of a BAGNET, in which several neural network models developed to model the same relationship between inputs and outputs are combined together.Instead of selecting a single neural network model, a BAGNET model combines several neural network models to improve

Table 4 :
[30]ET model configuration parametersIn the case of the SVM-KNN model, we have tested various values of the parameter k, in order to check whether the problem can be effectively decomposed into linearly separable regions.For the choice of the parameter k we have applied the greed search method on the values proposed in the work of Hao Zhang et al.([30]).We have limited the maximum value of k to 45 as we did not obtain any improvement by increasing the complexity of the subspaces.In addition to what was proposed in said work, we have decided to test the behaviour of the model by varying the values of C. The parameters used in the SVM-KNN model can be found in table5.

Table 6 :
Table with metrics for each model itemized by estuaryIn figure4we can see how the models perform depending on the estuary where they are applied.When separating the metrics obtained according to the production area where they are applied, we can see how the models have less difficulty in making predictions in estuaries such as Ares-Betanzos, Muros-Noia and Vigo.The worst results are obtained in the Pontevedra estuary and the Arousa estuary.