Advances in Streamflow Prediction: A Multimodel Statistical Approach for Application on Water Resources Management

The growing demand for urban water users, industrial, environmental and agricultural, makes the task of managing water resources is becoming more complex. Obtaining accurate simulations and forecasting of water availability is a key step in efficient planning, operation and management of these resources. This is the reason because the development of reliable surface water flow forecasting methods for real-time operational water resources management becomes increasingly important. Because of this, in recent years, interest in predictability of river discharge variability has increased markedly in most of the world regions. The hydrological system acts as spatial and temporal integrator of precipitation (rain and snow), temperature, and related evapotranspiration over a specific region. Therefore, seasonal to decadal streamflow variability in many large river-basins can be controlled by corresponding changes in large scale atmospheric circulation patterns. The relationship between these climatic regimes and its hydrological response, through its streamflow, presents different grades of complexity according to the physical characteristics of the basin. Nevertheless, streamflow can be better related with important patterns of climate teleconnections than precipitation or temperature fields, since variations in precipitation are amplified in streamflow, and in general, it is easier to detect a change in discharge than directly in the basic climatic variables (Dettinger & Diaz, 2000; Trigo et al., 2004). On seasonal timescales, anomalous atmospheric conditions are often linked with seasonal variations in the rivers streamflow and reservoir storages, via variations in precipitation and temperature (Dettinger & Diaz, 2000; Cullen et al., 2002; Trigo et al., 2004). The interannual climate and hydrologic fluctuations are modulated by, or superimposed upon, lower frequency variations with decadal and longer time scales. The sources of these lower frequency climate variations are uncertain but may have roots in the tropics (Trenberth & Hoar, 1996), in the extratropical oceans (Pacific: Latif & Barnett, 1994; Atlantic: Deser & Blackmon, 1993; Tanimoto et al., 1993; Houghton, 1996), or in some interplay of the two (e.g., Graham et al., 1994; Jacobs et al., 1994). Usually, the skill of the long-range forecasts is associated with the introduction of predictors that represent the slow varying components of the climate system such as sea surface temperatures (SST) (Koster et al., 2010). Consequently, variability in SST can help provide predictive information about the


Introduction
The growing demand for urban water users, industrial, environmental and agricultural, makes the task of managing water resources is becoming more complex.Obtaining accurate simulations and forecasting of water availability is a key step in efficient planning, operation and management of these resources.This is the reason because the development of reliable surface water flow forecasting methods for real-time operational water resources management becomes increasingly important.Because of this, in recent years, interest in predictability of river discharge variability has increased markedly in most of the world regions.The hydrological system acts as spatial and temporal integrator of precipitation (rain and snow), temperature, and related evapotranspiration over a specific region.Therefore, seasonal to decadal streamflow variability in many large river-basins can be controlled by corresponding changes in large scale atmospheric circulation patterns.The relationship between these climatic regimes and its hydrological response, through its streamflow, presents different grades of complexity according to the physical characteristics of the basin.Nevertheless, streamflow can be better related with important patterns of climate teleconnections than precipitation or temperature fields, since variations in precipitation are amplified in streamflow, and in general, it is easier to detect a change in discharge than directly in the basic climatic variables (Dettinger & Diaz, 2000;Trigo et al., 2004).On seasonal timescales, anomalous atmospheric conditions are often linked with seasonal variations in the rivers streamflow and reservoir storages, via variations in precipitation and temperature (Dettinger & Diaz, 2000;Cullen et al., 2002;Trigo et al., 2004).The interannual climate and hydrologic fluctuations are modulated by, or superimposed upon, lower frequency variations with decadal and longer time scales.The sources of these lower frequency climate variations are uncertain but may have roots in the tropics (Trenberth & Hoar, 1996), in the extratropical oceans (Pacific: Latif & Barnett, 1994;Atlantic: Deser & Blackmon, 1993;Tanimoto et al., 1993;Houghton, 1996), or in some interplay of the two (e.g., Graham et al., 1994;Jacobs et al., 1994).Usually, the skill of the long-range forecasts is associated with the introduction of predictors that represent the slow varying components of the climate system such as sea surface temperatures (SST) (Koster et al., 2010).Consequently, variability in SST can help provide predictive information about the Monthly data flows from five different basins throughout the Iberian Peninsula have been analyzed: the rivers Ebro, Miño, Douro, Tejo and Guadiana.Figure 1 shows the location of the selected river basins in the Iberian Peninsula.The Ebro River (Figure 1) presents the largest catchment in Spain, with an area of 85,530 km 2 in the northeast of Spain.Ebro basin is a complex region from both the topographical and meteorological perspectives, with great physical heterogeneity, which is a deciding factor in the annual seasonality of flow.This region forms a broadly triangular morphological unite, which is a depression surrounded by high mountain ranges.The heterogeneous topography contrasts the influences of the Atlantic and Mediterranean as well as of different large-scale atmospheric patterns (Vicente- Serrano & López-Moreno, 2006).The Miño River basin is located in the northwestern of the Iberian Peninsula.It covers a limited territory, bordered by the Atlantic Ocean and the Cantabric Sea.And finally, the entire central Iberian plateau is dominated by the Douro, Tejo and Guadiana River basins covering more than half of the whole Peninsula and which are roughly oriented in a NE-SW direction, running from the mountains in Spain down to the Atlantic Ocean across Portugal.
Monthly streamflow data of the rivers have been kindly provided by different Organisms from both Spain and Portugal such as the River Hydrographic Administrations of Douro, Miño and Ebro, the National Water Research Centre (CEDEX) and the Portuguese National Electrical Supply Company (REN).These agencies control the monthly flows of rivers in different areas by using direct gauging stations.Streamflow series distributed in the basins have been intensively checked for inconsistencies, and only series showing less than 10% of missing values during the period 1950-2006 have been considered in this study.Data gaps were filled using data from neighbouring stations with a Pearson correlation coefficient of at least 0.8.For the Ebro River, a total of 83 streamflow series, with values during the period 1950-2006, distributed  Hydrological series commonly do not follow a normal distribution, being highly biased, often requiring some preliminary transformation in order to adjust the records to an appropriate distribution.Also, the magnitude of the monthly streamflow values varies greatly among the stations, due to both the different climatic regions in Ebro basin and the different drainage basin characteristics associated with each streamflow station.Some studies have shown a good adjustment of the discharge series to the log-normal distribution (Zaidman et al., 2001;Kalayci & Kahya, 2006), while Vicente- Serrano (2006) and López-Moreno et al. (2009) found that the Pearson III distribution is more appropriate over some parts of the Iberian Peninsula.Using these considerations three different theoretical distributions for modelling the monthly stream river flow were tested: the Pearson III, the log-normal and the normal distribution.Furthermore, the goodness of fit was evaluated using three different tests, the Kolmogorov-Smirnov, the Anderson-Darling and the Chisquared tests.We find that for all the cases the log-normal theoretical distributions present good results.For this reason, and in order to facilitate the intercomparison between regions, the streamflow data were first subjected to logarithmic transformation to reduce the disparity in the magnitudes, from which monthly standardized streamflow anomalies were computed.The monthly standardized streamflow anomalies were constructed by subtracting the mean and dividing by the standard deviation for each month separately.The standardized streamflow for each station closely follows a normal distribution.
The global SST taken from the HadISSTv1.1 data set (Rayner et al., 2003) derived from the Hadley Centre for Climate Prediction and Research (UK Meteorological Office) has been used as predictor.The winter, spring, summer and autumn SST seasonal fields were generated by averaging the monthly SST anomalies (using the mean and standard deviation for the period  for DJF, MAM, JJA and SON, respectively. Aditionally, some teleconnection indices such as the North Atlantic Oscillation (NAO), the East Atlantic (EA), the East Atlantic/Western Russia (EATL/WRUS), the Scandinavia (SCAND) and the Pacific Decadal Oscillation (PDO), have been obtained from NOAA, Climate Prediction Center (http://www.cpc.noaa.gov/data/teledoc/telecontents.shtml), and from the Joint Institute for the Study of the Atmosphere and Ocean (http://jisao.washington.edu/pdo/PDO.latest).

Methodology
This Section describes the complete modelling scheme that can be used for operational prediction of streamflow anomalies, based on the combination of different statistical techniques.As discussed in the Results Section, depending on the features of the basin and the availability of data in it, the modelling scheme can be applied in full or in part.
Firstly, for those basins that present a homogeneous spatial distribution of the gauging stations (as the case of the Ebro and Miño rivers), regionalization of streamflow series in the basins has been performed using rotated Principal Component Analysis (PCA).PCA has been applied in analyzing the spatial variability of physical fields.In climatology and hydrology, the PCA is a tool to explain the fundamental nature of streamflow, reducing a

65
large number of interrelated variables to a few independent Principal Components (PCs) that capture much of the variance of the original dataset (Wilks, 1995).It produces a few major spatial-variability patterns (or Empirical Orthogonal Functions, EOFs), and the corresponding time series represent the time evolution of the spatial-variation patterns.Rotation (varimax) of PCA is then applied to the first few major patterns to capture better the physically meaningful and simplify spatial patterns (Barnston & Livezey, 1987).This procedure allows extracting representative stations for each of the areas obtained in the regionalization process.PCA has been used by other authors in similar contexts of the present study (Widman & Schär, 1997;Maurer et al., 2004;Kalayci & Kahya, 2006).
Second step is the study of temporal variability of the streamflow time series representative of the regions previously identify by PCA.To this end the Singular Spectral Analysis (SSA) is applied to each streamflow time series.SSA is a powerful form of the standard Principal Component Analysis based on the extensive use of the lag correlation structure of a time series (Vautard et al., 1992), which is particularly successful in isolating multiple period components with fluctuating amplitudes and trends in short and noisy series.SSA has been successfully applied to many geophysical and climatological time series to study and predict periodic activities (Ghil & Mo, 1991;Ghil & Vautard, 1991;Plaut & Vautard, 1994;Gámiz-Fortis et al., 2002, 2008a, 2010a, 2011a, 2011b;Paluš & Novotná, 2006).A comprehensive review, explaining in detail the mathematical foundations of SSA, can be found in Vautard et al. (1992) and Plaut et al. (1995).
SSA is based on the diagonalization of the lagged-autocovariance matrix of a time series.As in the case of PCA, the eigenvectors or Empirical Ortogonal Functions (T-EOFs) represent patterns of temporal behaviour, and the Principal Component series (T-PCs) are characteristic time series containing a very limited number of harmonic components.The detailed reconstruction of a set of significant components, called SSA-filtered components or reconstructed components (RCs) of the time series, is carried out by an optimal linear square fitting between the corresponding PCs and the original data.Each RC represents the contribution of its associated EOF to the variance of the time series; additionally, the RCs are additive and their sum provides the original time series.When two eigenvalues of the lagged-covariance matrix are nearly equal and their corresponding eigenvectors are orthogonal, they represent an oscillation.Therefore, we can synthesise that SSA is an eigenvalue technique particularly efficient to extract and reconstruct periodic components from noisy time series.Determining the corresponding frequencies requires, however, estimations of power spectra.The Maximum Entropy Method (MEM) is used to evaluate the spectral contents of the PC time series corresponding to the EOFs, and the Monte Carlo (MC) technique is used for the significance study (see Gámiz-Fortis et al., 2002, for further details).The analysis of the temporal structure of the series allows us to model separately their variations at different time scales: seasonal, interannual and decadal.
Next step involves identifying sectors of oceanic SST anomalies that can be used as predictors for river flow.Simple methods of statistical analysis can describe the potential relationship between the streamflow series and the SST, both contemporary and of the preceding seasons, in order to evaluate predictive ability.To do that the point linear correlation between the monthly or seasonal streamflow anomalies and the SST anomalies from previous seasons must be evaluated.Regions showing significant correlations are identified as potential predictors.A very important issue is to identify, among these regions, those that can be classified as stable predictors.This is achieved through the analysis of the variability of the correlation between flow anomalies and SST anomalies from potential predictor regions using different moving windows with lengths between 10 and 20 years.
The correlation is considered to be stable for those regions where streamflow and SST anomalies are significantly correlated at 90% level for more than 80% of the 15-year windows covering the total period under study and, furthermore, where the sign of the correlation does not change with time.Regions verifying this criterion are considered as robust predictors and are used in a multiple linear regression model (hereinafter SST_model) to simulate the river flow anomalies.The method of ordinary least squares is used to estimate the regression coefficients (Draper & Smith, 1998) by minimizing the sum of squared errors (SSE), i.e. the unexplained variance part.The coefficient of multiple determination R 2 , which measures the fraction of variance in the response variable that can be explained by variations in the explanatory factors also is computed.Note that a high value of R 2 does not imply that a particular model is appropriate.In fitting the model, the assumption is made that the unknown random effects are represented by "e", which is a vector of independent, normally distributed noise.The validity of this assumption should be checked.
Additionally, in order to detect some other kind of influence, the residual time series obtained by subtracting the SST_model from the raw streamflow series is studied.This residual component is firstly modelled by the significant quasi-oscillatory modes obtained from the SSA (residual_SSA_filter hereinafter) and autoregressive-moving-average (ARMA) models are fitted to the residual_SSA_filter of streamflow.The SSA filtering prior to obtaining the ARMA models considerably improves the forecasting skill of these ARMA models (Gámiz-Fortis et al., 2002, 2008a, 2010a).ARMA models can be regarded as a special case of general linear stochastic processes and provide a linear representative structure of the temporal evolution of the data.The order of the model is selected, in a preliminary approach, studying the autocorrelation function (ACF) and partial autocorrelation function (PACF).In physical terms, the best model has as few parameters as possible.The Akaike information criterion (AIC) (Akaike, 1974) has been used to select the final model among all the candidates.The AIC is based on information theory and represents a compromise between the goodness of the fit and the number of parameters of the model.A comprehensive review, explaining in detail how to fit ARMA models to datasets following the identification, estimation, and diagnostic check stages, can be found in Brockwell & Davis (1996) and Hipel & Mcleod (1994).
Finally, the combination of both SST_models and ARMA_models is evaluated in a forecasting experiment.For reliable skill assessment, a fundamental aspect of this evaluation is the separation of calibration and validation periods (Wilks, 1995).We employ data until 1989 to calibrate the different modelling, while data from 1990 onwards are used for validation purposes only.For model evaluation appropriate skill measures commonly used are employed, such as the mean absolute error (MAE) and the mean square error (MSE) as accuracy measures; the Pearson correlation coefficient, which is used to measure the relationship between the modeled/forecasted series and the original/expected series; the percentage of phase agreement, that is, the percentage of cases in which the modeled/forecasted values has the same sign as the original values; and the coefficient of multiple determination R 2. .

Results
This section shows the main results obtained from the application of the previously described methodology to the five rivers studied in the Iberian Peninsula.Depending on the availability of data in the selected basins not always is possible to apply the full modelling scheme.For example, for Ebro and Miño rivers, that account with a relative high amount of gauging stations distributed around the basins, the regionalization process using PCA can be carried out.For the Ebro River three significant spatial EOFs, accounting for 55.4% of the total variance, were found; while for the Miño River just one significant spatial EOF with an associated variance of 80% was found.Figure 1 also shows the location of the selected streamflow time series representative of the regions identify by PCA for the Ebro and Miño basins.For the Douro, Tejo and Guadiana rivers authors do not dispose of a database homogeneously distributed over the basins, so regionalization can not be carried out.However, studies of other authors like Leite & Peixoto (1995) that use temperature, precipitation and insolation fields in the global Douro Basin reveals the existence of different climatic regions, with a western region dominated by oceanic air masses with heavy precipitation, on contrast with a central and eastern region drier and warmer.It is therefore expectable that the influence of anomalous SST on these river streamflow to be more important over the western sector of these basins.Additionally, these rivers are subjected to large regulation in the Spanish part of the basins, which can interfere the SST/river discharges relationship.Note that the expected effect of large dams is an increased time between precipitation episodes and the arrival of the corresponding flow to lower sections of the river basin.Based on these considerations it is understandable that the contribution from the SST to these river streamflows to be more important for the western part of the basins.For this reason the monthly time series of these three rivers discharge were recorded at stations situated in the Portuguese side of the border (see Figure 1), in the lower part of the catchment areas.As expected for these regions, different months account for the majority of runoff, depending on the climatic characteristic of these regions.It is expected that the potential relationships between the flow and the SST field will be intensified for those months showing maximum streamflow values.Taking into account the maximum streamflow values observed for the different flows (see Table 1, last column), the study was limited to those months in which discharge time series are highest.For the case of Miño, Douro, Tejo and Guadiana, where months of January, February and March appear with maximum streamflow values, the winter streamflow was computed as the average of these three monthly normalized series.For the EZ station of the Ebro River (Figure 1), January river flow accounts for the majority of runoff, being followed by a relatively long and dry s u m m e r p e r i o d f r o m J u l y t o S e p t e m b e r .T h e E G s t a t i o n i s a s s o c i a t e d t o t h e r e g i o n represented by a nivo-pluvial flow type, located in the Pyreness Mountains, where snowmelt influence is very important.This area is characterized by a snowy winter, very rainy spring and rainy late summer and autumn (Bejarano et al., 2010).Finally, the EN station, located in the southeast of the Ebro basin is characterised by fairly low total annual precipitation and prolonged dry season.

Temporal variations
In this section, SSA is applied to each of the streamflow series selected in Table 1 to further identify and extract the oscillation components.Different window lengths between M = 15 and M = 32 years and the Vautard and Ghil (1989) algorithm were used in SSA.The oscillation pairs identified by SSA on streamflow series for the selected stations are summarized in Table 2 for the Ebro River and Table 3 for the reminder rivers.Because SSA cannot resolve periods longer than the window length, we identified the zero frequency as the non-linear trend, but it is important to note that this non-linear trend could be composed of quasi-oscillatory modes with periods longer than window length years.For the remainder rivers, no oscillatory mode with period longer than 9 years is found (see Table 3), which means that for these Atlantic rivers the interannual variability is dominating.Based on this feature the next step for the statistical modelling was reversed for the Atlantic rivers and the Ebro River.That is, for the Miño, Douro, Tejo and Guadiana rivers, interannual variability is firstly modelled by the use of ARMA models, while for the Ebro river, modelling of the decadal variability using the SST is carried out firstly.In summary, depending on which component is dominating, interannual or decadal, one or another modelling (ARMA_modelling or SST_modelling) has been firstly applied.

Ebro river modelling
This section shows the modelling of the Ebro River streamflow.The relationship between the climatic regimes and its hydrological response, through the streamflow, presents different grades of complexity according to the physical characteristics of the basin so it has been studied in detail.

Modelling on decadal time scales
Standardized streamflow anomalies were smoothed with a 7-year running mean filter to obtain the decadal component of the series (decadal_component hereinafter).This filter is designed to fall both near the low-frequency end of the spectrum associated with El Niño events and near the high-frequency end of the decadal climate variations of the Atlantic and North Pacific climate systems (Dettinger et al., 2001), and has been used following other authors (Ionita et al., 2011) in order to exclude any cross-spectrum between Ebro River streamflow and El Niño.Spatial correlation maps between the decadal_component of January EZ_FLOW, June EG_FLOW and May EN_FLOW and the preceding seasons for global SST data have been computed and are shown in Figure 2.For meaningful evaluation of the relationship between the SST and streamflow fields on decadal time scales, the linear trend components of all the data sets need to be removed.The fact of eliminating the trend when working with variables of different physical natures is necessary because otherwise trends could lead spurious correlations.
For January EZ_FLOW, maximum statistically significant correlations were found with the previous spring SST in the Atlantic region (see Figure 2a).Particularly, we can identify positive correlations with the SST anomalies in the tropical North Atlantic and southern Greenland, with the tropical part of the SST pattern (approximately 0-20ºN) playing the most important role.This pattern represents what is commonly called North Atlantic horseshoe pattern, described e.g. in Czaja and Frankignoul (2002), and it is represented by the third spatial EOF when a PCA is applied to the global spring SST field (Figure 3).This spring SST pattern is shown to be related with the subsequent winter NAO (Rodwell and Folland, 2002).An additional region in the south Pacific also shows significant positive correlation during previous summer (see Figure 2b).Based on this correlation maps we define three SST indices by averaging the normalized SST anomalies over the regions showing maximum correlation values: SST1 EZ (54ºW-49ºW; 9ºN-12ºN) and SST2 EZ (44ºW-39ºW; 54ºN-56ºN) in previous spring, and SST3 EZ (99ºW-96ºW; 28ºS-22ºS) in previous summer.
SST indices were smoothed with a 7-year running mean filter to obtain the decadal component of the series associated to these SST regions, and correlations between the January streamflow anomalies and the SST1 EZ , SST2 EZ and SST3 EZ indices using a moving window of 15 years were computed.Additionally, the PC3 time series obtained from the PCA of spring SST, which represents the complete horseshoe pattern, was also considered in this analysis.Stable positive correlations (not shown) during the period 1916-2003 are found only for the SST1 EZ index, while the remainder indices fail the stability test and dismissed from the rest of the analysis.There are several problems when using standard climate indices for hydroclimatic prediction.This is the case of PC3 of spring SST, where the hydroclimate of the river basin is more strongly correlated with an oceanic region´s SSTs, which is different from the predetermined regions that are used to calculate the standard index (Tootle and Piechota, 2006).The second problem is that the PCA, used to obtaining this index, might not preserve enough information from the original dataset (Switanek et al., 2009).
The spatial correlation maps between the decadal_component of June EG_FLOW and the preceding seasons for SST data (Figure 2c and 2d) show statistically significant correlations with global SST being maxima during the spring of the previous year and in some regions of the Pacific for the previous summer.In this case, stable correlations (not shown) during the period 1953-2003 are only found for the following defined indices: SST1 EG = (30ºE-40ºE; 41ºN-46ºN), corresponding to the Black Sea during the spring of the previous year, and SST2 EG = (88ºW-81ºW; 25ºS-20ºS), associated to the south-eastern Pacific Ocean during the previous summer.
For the decadal_component of May EN_FLOW, the spatial correlation maps with the preceding seasons for SST (Figure 2e and 2f) present maximum statistically significant correlations in the Pacific region.Particularly, we can identify correlations that resemble the negative pattern of the Pacific Decadal Oscillation during the spring of the previous year.For this case, stable correlations during the period 1953-2003 are found for the following indices: SST1 MN (158ºW-154ºW; 50ºN-55ºN), SST2 MN (76ºW-72ºW; 30ºS-21ºS) and SST3 MN (168ºE-172ºE; 21ºS-18ºS).Additionally, in this case, the stability of the correlation between the May streamflow and the spring PDO index of the previous year was also studied, finding negative stable correlation along the period.
Using the significant and stable indices as predictor for the streamflow time series at decadal time scales we developed a model based on linear regression, using as calibration period 1916-1989for EZ_FLOW, and 1953-1989 for EG_FLOW and EN_FLOW.The optimal models for explaining the decadal_component of these flows can be written as: These SST_models are able to explain around 56%, 71% and 81% of the variance of the streamflow decadal_component during the calibration period for the EZ_FLOW, EG_FLOW and EN_FLOW, respectively.

Modelling on interannual time scales
Regarding the modelling of the interannual variability (time scales less than 7 years) an additional analysis has been carried out using the quasi-oscillatory modes with periods lower than 7 years obtained from the previous SSA.Firstly, a SSA-filtered series, the interannual_SSA_filter, has been computed like the sum of the reconstructed components of these oscillatory modes, and then an ARMA_model is fitted to this SSA-filtered series.Using the sample Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) (not shown) we find that while the interannual_component behaves as a white noise process, the interannual_SSA_filter series shows a strong autocorrelation pattern.This feature implies a higher predictability of the interannual_SSA_filter when compared to the unfiltered time series.Based on these analyses, we used the Akaike Information Criterion (AIC) to select the ARMA models for the interannual_SSA_filters, containing the following parameters for each Ebro gauging station: ARMA (7,6) Significance of the parameters was computed using approximate t-values, derived from the parameter standard errors.Parameters highlighted with "*" are statistically significant at 5% significance level.The selected order models indicate a certain degree of persistence.
These ARMA_models are able to explain around 67%, 76% and 64% of the variance of the streamflow interannual_components during the calibration period, for EZ_FLOW, EG_FLOW and EN_FLOW, respectively.

Combination of interannual and decadal modelling
This section shows the results obtained from the combination of SST_model (for decadal time scales) and ARMA_model (for interannual time scales).The observed and modelled January EZ_FLOW, June EG_FLOW and May EN_FLOW series are shown in Figure 4, for calibration and validation periods, and the statistical results are shown in Tables 4, 5 and 6 4. Statistical results for the January EZ_FLOW streamflow forecasting experiment using both the ARMA and the regression model, which includes de SST information.Results are displayed both for the calibration and validation period.For the sake of comparison, the results of the ARMA-alone forecast are also shown.Correlation coefficients with "*" are statistically significant at the 95% confidence level.
Results obtained reveal a considerable skill achieved by the combined [ARMA_model + SST_model] models (see Tables 4, 5 and 6), with good correlation coefficients between the raw series and the models for the validation period (r = 0.79, 0.90 and 0.87, respectively) and coefficients of multiple determination R 2 (62%, 81% and 76%, respectively The combined modelling capacity can be appreciated in Figure 4 with a clear improvement in relation to the ARMA model, particularly evident during the period 1960-1993for EZ_FLOW, 1987-1992 for EG_FLOW and during the peaks of streamflow for EN_FLOW.

Modelling on interannual time scales
For these Atlantic rivers, interannual variability, which is dominating, is firstly modelled by the use of ARMA models.For the SSA_filters obtained for these rivers from the previous SSA, we have selected the following ARMA_models: ARMA (6,4) [1990][1991][1992][1993][1994][1995][1996][1997][1998][1999][2000][2001][2002][2003].c) As in a) but for the June EG_FLOW anomalies.Calibration period in this case is 1953-1989. d) As in c) but during the validation period.e) As in c) but for the May EN_FLOW anomalies.f) As in e) but during the validation period.These ARMA_models are able to explain around the 62%, 36%, 69% and 23% of the variance of the streamflow during the calibration periods (see first column of Tables 7, 8, 9 and 10), for the Miño, Douro, Tejo and Guadiana rivers, respectively.

Modelling on seasonal time scales
In this section, the effort is concentrated on potential added value of the Ocean SST on the seasonal predictability of these streamflow series.The analysis is based on the results obtained with the interannual predictability experiment carried out in the previous section.Here, the main aim is to evaluate any increment in winter streamflow forecasting skill attributable to the SST.To this end, the residual time series resulting from the interannual forecasting experiment have been analyzed.This methodology allows, firstly, comparing the relative importance of the seasonal against interannual predictability, and secondly, to construct a statistical forecasting model which includes both seasonal and interannual sources of predictability.
Four new time series have been obtained by subtracting the ARMA forecasts computed in the previous section from the raw streamflow series.Note that these time series provide some kind of "residual" time series which contains the "information" that the interannual ARMA model was not able to capture.We will call hereinafter these new time series: residual_Miño, residual_Douro, residual_Tejo and residual_Guadiana.
In order to achieve this goal, the linear correlation coefficients between the previous seasonal SST and the following winter residual streamflow series for the four rivers have been computed.Additionally, time series representative of the five significant summer and autumn SST spatial modes of variability, obtained by applying a PCA to the Atlantic SST data, have been used in this analysis.Correlation results obtained for different calibration periods: 1962-1989 for the Miño River, 1930-1989 for the Douro andTejo, and1953-1989 for the Guadiana, show that the fourth autumn mode of Atlantic SST presents a statistically significant correlation with the Miño residual time series, meanwhile only the second autumn mode of Atlantic SST presents a statistically significant correlation with the other three residual time series.Additionally, a stability analysis shows that the correlations were stable throughout the analyzed periods.Again, the second autumn mode of Atlantic SST corresponds to the tripole pattern in the North Atlantic section described previously in Figure 3, while the fourth autumn mode of Atlantic SST (not shown) presents positive loading factors to the western North Africa and Europe and negative loading factors in the centre of the Atlantic Ocean.
Based on the previous results, we have developed linear regression models in order to study the seasonal predictability of each streamflow.The models were fitted to the residual time series: residual_Miño, residual_Douro, residual_Tejo and residual_Guadiana.To develop the models, the period 1962-1989 was used for calibration for the Miño River, 1930-1989 for the Douro and Tejo, and the shorter period 1953-1989 for the Guadiana.Again, in all the four cases, the final period 1990-2004 (or 1990-2005 for the Miño River) was used for validation purposes only.These periods correspond to the ARMA calibration and validation periods used in previous section, and were selected in order to combine both the ARMA and SST forecast (following section).The regression models fitted for the four rivers residual time-series are as follows: residual_Miño = 0.11 + 0.27 x (PC4 autumn Atlantic SST) residual_Douro = 0.10 + 0.64 x (PC2 autumn Atlantic SST) residual_Tejo = 0.020 + 0.39 x (PC2 autumn Atlantic SST) residual_Guadiana = -0.06+ 0.61 x (PC2 autumn Atlantic SST)

Combination of interannual and seasonal modelling
This section shows the results obtained from the combination of ARMA_models (for interannual time scales) and SST_models (for seasonal time scales).The observed and modelled Miño, Douro, Tejo and Guadiana streamflow series are shown in Figure 5 and Figure 6, for calibration and validation periods.Results obtained reveal a considerable skill achieved by the combined [ARMA_model + SST_model] models (see Tables 7, 8, 9 and 10).
As can be appreciated in Table 7, the Miño River shows a moderately improvement in the forecasting skill by using the SST information.For the validation period, the correlation coefficient is 0.81 (0.77 using only the ARMA model), which means that the combined model can explain only 65% of the variability for this river.

Miño
Calibration period 1962-1989Validation period 1990-2005 ARMA (6,4) ARMA (6,4) + SST_model ARMA (6,4) ARMA (6,4 7. Statistical results for the Miño streamflow forecasting experiment using both the ARMA and the regression model, which includes de SST information.Results are displayed both for the calibration and validation period.For the sake of comparison, the results of the ARMA-alone forecast are also shown.Correlation coefficients with "*" are statistically significant at the 95% confidence level. Figure 5 shows the results of the forecasting experiment for the Miño streamflow using the ARMA-alone and the ARMA+SST model for the calibration  and for the validation (1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005) periods.Only a very moderately improvement can be observed for individual years such as 1966, 1967, 1969, 1976, 1978 or 1981.The Douro River shows a considerable improvement in the forecasting skill by using the SST information (Table 8).Particularly, the correlation coefficient is 0.93 (0.73 using only the ARMA model), which means that the combined model can explain 86% of the variability.Finally, the phase agreement is 95% (90% using the ARMA model).Skill values are very similar during the calibration period.

Douro
Calibration period 1930-1989Validation period 1990-2004 ARMA (7,3) ARMA(7,3) + SST_model ARMA (7,3) ARMA (7, The Tejo River shows lower improvement in the forecasting skill by using the SST information.As shown in Figure 6d, the improvement has to do mainly with the better capacity to forecast extreme positive values.In these cases (e.g. 1996 and 2001), the SST information provides valuable additional information in order to improve the estimates from the ARMA-alone model.This is also true for the calibration period.On the other hand, for the extreme negative streamflow values, the use of the SST information does not seem to improve model skill.Overall, the phase accordance of the model with the observation is around 90% during the validation period and the correlation coefficient is 0.89, showing that the model is able to explain around 79% of the winter Tejo streamflow variability along the period 1990-2004 (Table 9).Results for the Guadiana River (Table 10) are similar to those obtained for the Douro River.By using the SST information the forecast skill measures improve considerably.In particular, during the validation period, the correlation coefficient improves considerably, from 0.47 using the ARMA-alone model to 0.90 using the combined model, explaining around 80% of the streamflow variability.Again, the main reason of the improvement in the forecasting skills of the complete model is that the SSTs provide better estimates for both the extreme positive (e.g. 1996, 2001) and negative (e.g. 1989, 1992, 2000) streamflow values during the validation period, or the 1975-85 during the calibration period (Figures 6e and 6f).

Conclusion
The streamflow variability and predictability of five different rivers in the Iberian Peninsula trough of a statistical multimodel approach that uses as predictor the sea surface temperatures from the preceding seasons have been investigated.Monthly data flows from the Ebro, Miño, Douro, Tejo and Guadiana Rivers have been analyzed to characterize geographic differences in terms of the streamflow seasonality and some aspects of interannual and decadal variability.The results of these analyses demonstrated that while for the Miño River, the influence of the SST on the streamflow is very moderate, for the remainder rivers, strong influences of ocean conditions are found.Particularly, for the gauging stations associated to the Basque-Cantabrian region of the Ebro basin and for the Douro, Tejo and Guadiana rivers, statistically significant linear influence was found from a tripolar spatial pattern of the anomalies in the North Atlantic SST, which is commonly called the North Atlantic horseshoe pattern (Czaja and Frankignoul, 2002).For the Basque-Cantabrian region of the Ebro, positive significant correlations between the streamflow anomalies in January and this SST pattern during the previous spring are found.This association is maximum and stable for the tropical part of the pattern (approximately 0-20ºN) in previous spring, decreases in summer and autumn, and comes back strongly in December.This result is in agreement with other studies that suggest a link between spring conditions in the North Atlantic Ocean and atmospheric conditions over the same region the subsequent winter (Czaja andFrankignoul, 1999, 2002;Rodwell and Folland, 2002;Iwi et al., 2006).Rodwell and Folland (2002) showed that the North Atlantic SST in May could be used as a predictor of the subsequent winter North Atlantic Oscillation (NAO).One hypothesis is that conditions present in the North Atlantic Ocean in spring persist through the summer to influence the atmosphere in the following autumn and winter.During summer the ocean anomalies may be capped by the shallow mixed layer, subsequently to emerge as the mixed layer deepens in early winter.This mechanism is known as re-emergence (Deser et al., 2003;Cassou et al., 2004).Sutton et al. (2001) showed that a tripole pattern of SST anomalies could induce a weak NAO-like atmospheric response, and further experiments indicated the important role played by the tropical part of the pattern (Cassou et al., 2004;Peng et al., 2005).
For the Douro, Tejo and Guadiana rivers, the Atlantic horseshoe pattern in autumn has a significant influence on the variability of the following winter streamflow values.Czaja and Frankignoul (2002) also found that the negative NAO winter pattern is preceded during summer and autumn by an Atlantic SST anomaly pattern similar to that shown in Figure 3. Therefore, the positive correlation between the second autumn PC and the following winter streamflow series can be related to this phase of the NAO, which leads to positive precipitation (and streamflow) anomalies in Iberia.The opposite SST anomalies leads to the positive phase of the NAO and, then, to negative precipitation and streamflow anomalies.
The inclusion of this SST information considerable improves the skills of the forecasts compared to the ARMA-alone model forecasts.These improvements are mostly related to the ability of the SST information to provide better estimates of the extreme positive streamflow values.
These results provide information concerning the underlying physics of the teleconnection found for the streamflow and Atlantic SST, suggesting an important influence of the SLP as a main component in the SST/streamflow relationship; however, physical explanations for the connection between the SST and streamflow are not yet clear.
For the gauging stations associated to the Pyrenees region of the Ebro basin, the correlations between the June streamflow and global SST, at decadal time scales, are weaker than for those of the Basque-Cantabrian region, and only stable connections are found in the Black Sea SST during the spring of the previous year and in the south-eastern of the Pacific Ocean during the previous summer.How this connection between the SST and the flow is produced is an issue that needs further research.
For the gauging stations associated to the southeast of basin higher than average values of the Ebro river flow in May tend to be associated with cooler than average SST anomalies along the western coast of North America, and warmer than average SSTs in the central North Pacific during spring of the previous year.This Pacific SST pattern associated with the Ebro flow decadal variability bears some resemblance to the SST pattern characterizing the negative phase of the Pacific Decadal Oscillation (PDO) (Mantua et al., 1997).Currently, there are several hypotheses concerning the causes of decadal variations in different geographical regions and the influence of PDO is one of these hypotheses (Baik and Paek, 1998;Tomita et al., 2001).Other authors have also found teleconnections between the PDO and other rivers in Europe (Dettinger and Diaz, 2000;Rimbu et al., 2002;Labat, 2010).
Positive phase of the PDO corresponds to the intensification of the NAO, which is accompanied by the positive anomalies of sea level pressure in the tropical and subtropical Atlantic, over central and southern Europe, and over the Mediterranean Sea.Opposite conditions correspond to the negative phase of the PDO.The basic physical proposed scheme is that the climatic anomalies of the atmosphere-ocean interaction propagate from the Pacific Ocean toward the neighbouring continents and other oceans by means of stationary Rossby waves and synoptic atmospheric formations (Ambrizzi and Hoskins, 1997).
Overall, the analysis reveals the existence of a valuable predictability of the streamflows at seasonal, interannual and decadal time scales, a result that may be useful to water resources management.The combination of different modelling gives as result a methodology that has the capacity to provide basin-specific hydroclimatic predictions for the Iberian Peninsula river streamflows, which may improve the management of the increasing limited water resources in this region.Although the research is site-specific, its conceptual basis and lessons learned should be transferable to other areas of the world facing similar problems.
From this work several issues that require further analysis can be raised.First, our results are encouraging since that only information contained in the SST has been used to explain the streamflows variability.Statistical forecasting models that use multiple variables such as land surface temperature, precipitation or SLP as potential predictor variables can improve forecast skill by increasing the robustness of the methodology.Second, at interannual time scales the direct influence of SST variability on streamflows has not been considered.However, the periodicities associated to El Niño, which oscillate between 2 and 7 years, could be related in some way with the interannual quasi-oscillatory modes found in the streamflows.Finally, further research should be conducted in order to clarify physical mechanisms that produce the connections between the SST and the streamflow in each specific basin.

Fig. 1 .
Fig. 1.Location of the five analyzed river basins in the Iberian Peninsula.Red dots show the location of the selected gauging stations used in this study along with their code names.
in the basin, were considered.For the Miño River, the data base of streamflows comprises monthly data from 18 stations, covering the period from 1956 to the remainder rivers, just three stations, spanning between 1923-2004 for theDouro and Tejo, and 1947-2004  for the Guadiana, within the Portuguese section of the rivers, are considered.

Fig. 2 .
Fig. 2. Correlation maps between the decadal_component of streamflow time series of a) January EZ_FLOW and previous spring SST anomalies, b) January EZ_FLOW and previous summer SST anomalies, c) June EG_FLOW and spring SST anomalies of the previous year, d) June EG_FLOW and previous summer SST anomalies, e) May EN_FLOW and previous spring SST anomalies, and f) May EN_FLOW and previous summer SST anomalies.Only significant values at the 95% confidence level are shown.The rectangle corresponds to the stable SST regions finally considered for the decadal_component modelling.

Fig. 3 .
Fig. 3. Loading factors (by ten) for the third spatial mode resulting from the PCA of the global spring (March-April-May) sea surface temperature.The period of analysis is 1871-2007.

Fig. 5
Fig. 5. a) Results of the forecasting experiment for the Miño streamflow using the ARMAalone and the ARMA+SST model for the calibration period 1962-1989.b) As in a) but for the validation period 1990-2005.

Fig. 6 .
Fig. 6. a) Results of the forecasting experiment for the Douro streamflow using the ARMAalone and the ARMA+SST model for the calibration period 1930-1989.b) As in a) but for the validation period 1990-2004.c) As in a) but for the Tejo.d) As in b) but for the Tejo.e) As in a) but for the Guadiana and for the calibration period 1953-1989.f) As in b) but for the Guadiana.

Table 1 .
List of individual gauging stations selected in the river basins.
Table 1 presents a description of all the selected gauging stations used in this study.River Station name Code Data length Months with maximum streamflow values

Table 2 .
Significant oscillatory modes (peak period in years) identified by SSA on streamflow gauging stations of the Ebro Basin, along with the fraction of total variance (%) explained by each mode.Last row shows the total variance explained by the SSA_filter.Note from Table2that for the Ebro River, along with the non-linear trends, oscillatory modes with periods longer than 9 years are explaining a big part of the variance of the series.This means that for this Mediterranean river the variance of the raw series associated to the decadal or multi-decadal variability (periods longer than 9 years) is relatively high, especially for the Ebro_EZ and Ebro_MN time series that reach 31.3% and 35.2% of the total variance, respectively.

Table 3 .
Significant oscillatory modes (peak period in years) identified by SSA on streamflow gauging stations of the Miño, Douro, Tejo and Guadiana rivers, along with the fraction of total variance (%) explained by each mode.Last row shows the total variance explained by the SSA_filter. .

Table 6 .
As Table 4 but for May EN_FLOW.

Table 8 .
As Table7but for the Douro River.

Table 9 .
As Table7but for the Tejo River.

Table 10 .
As Table7but for the Guadiana River.