Seasonal sub-basin-scale runoff predictions: A regional hydrometeorological Ensemble Kalman Filter framework using global datasets

Study region: The São Francisco River Basin (SFRB) in Brazil.
Study focus: In semi-arid regions, interannual variability of seasonal rainfall and climate change are expected to stress water availability and increase the recurrence and intensity of extreme events such as droughts or floods. Local decision makers therefore need reliable long-term hydrometeorological forecasts to support the seasonal management of water resources, reservoir operations and agriculture. In this context, an Ensemble Kalman Filter (EnKF) framework is applied to predict sub-basin-scale runoff employing global, freely available datasets of reanalysis precipitation (ERA5-Land) as well as bias-corrected and spatially disaggregated seasonal forecasts (SEAS5-BCSD). Runoff is estimated using least-squares predictions, exploiting the covariance structures between runoff and precipitation. The performance of the assimilation framework was assessed using different ensemble skill scores.
New hydrological insights for the region: Our results show that the quality of the runoff predictions is closely linked to the performance of the seasonal rainfall predictions and allows skillful predictions up to two months ahead in most sub-basins. Anthropogenic conditions, such as those in the Western Bahia state, must however be taken into consideration, since non-stationary runoff time series are predicted with poorer skill: such unnatural variations cannot be captured by long-term covariances. In sub-basins with little anthropogenic influence, the presented framework provides a promising and easily transferable approach for skillful operational seasonal runoff predictions at the sub-basin scale.



Introduction
The effect of global warming on climatic extreme events such as droughts, combined with the growing water demand for cities and irrigation, can result in intense water scarcity in certain regions of the world (Schewe et al., 2014; Stocker et al., 2014; Du et al., 2018). Semi-arid regions are the most affected, owing to limited water availability, the accumulation and intensification of drought periods and the increasing sedimentation of water reservoirs. Especially in those regions, observational data are sparse and the number of reliable and long-term operating measurement stations is decreasing (Lorenz and Kunstmann, 2012). Decision makers therefore cannot rely on local statistics to anticipate these hard-to-predict extreme events, and thus to plan a proactive water allocation during droughts or to manage reservoir levels during floods. Climatic information on seasonal scales is therefore particularly important and, fortunately, semi-arid climates near or in the tropics can offer reasonable predictability on seasonal timescales. In fact, the climate variability in these regions is mainly affected by large-scale forcing from thermal and dynamical atmosphere-ocean interactions on intraseasonal to multidecadal timescales (e.g., Madden-Julian Oscillation, El Niño-Southern Oscillation, Pacific Decadal Oscillation, Atlantic Multi-Decadal Oscillation, Quasi-Biennial Oscillation) (Kayano and Andreoli, 2004; De Souza and Ambrizzi, 2006; Taschetto and Ambrizzi, 2012; Taschetto and Wainer, 2008; He et al., 2017; Li et al., 2020), or by long-term interactions such as inertial memory, soil moisture and sea surface temperatures (Koster et al., 2000), which are well represented in state-of-the-art seasonal prediction systems (Johnson et al., 2019). Hence, policies need to be adjusted towards sustainable water management, and tools that make global information usable for individual regions need to be developed.
In this context, we propose to estimate seasonal sub-basin-scale runoff with a hydrometeorological Ensemble Kalman Filter (EnKF) framework, using the statistics of global, freely available datasets and global model predictions. This approach is derived from a previous study (Lorenz et al., 2015), in which EnKF-based basin-scale runoff estimates provided promising results on monthly time scales. More specifically, we focus on the application of publicly available global hydrometeorological datasets and exploit the joint temporal and regional covariance structures to generate least-squares predictions. The motivation is to develop a similar framework for seasonal runoff predictions, using seasonal precipitation predictions from SEAS5, the latest seasonal forecasting system of the European Centre for Medium-Range Weather Forecasts (ECMWF), as well as reanalysis-based reference precipitation from ERA5-Land, an offline re-run of the land-surface component of ECMWF's latest atmospheric reanalysis ERA5. We further use runoff observation data from the Brazilian National Water Agency (ANA). Different studies have shown the potential of seasonal forecasts to predict extreme events like droughts and to support water management in semi-arid regions (Changnon and Vonnhame, 1986; Garbrecht et al., 2006; Portele et al., 2021). In this regard, various statistical rainfall-runoff models (Misumi et al., 2001; Garbrecht et al., 2006; Ajami et al., 2016; De Paiva et al., 2020) and process-based hydrological models (Yuan et al., 2016; Crochemore et al., 2016; Meißner et al., 2017; Foster et al., 2018; Sehgal and Sridhar, 2019; Wanders et al., 2019), which convert climate or seasonal precipitation forecasts into streamflow adapted for real-time basin monitoring, already exist. In comparison, our proposed framework uses high-resolution precipitation forecasts within a statistical rainfall-runoff EnKF framework to predict seasonal runoff at sub-catchment level in a computationally efficient way.
In this study, the São Francisco River Basin (SFRB) in Brazil is investigated. Since the SFRB is likely to face increasing water-related problems in the future (Cunha et al., 2018), it requires sustainable mitigation strategies and therefore provides a suitable testbed for the EnKF-based seasonal hydrological forecasting system.

Study area
With a total surface area of 636,851 km², the SFRB (Fig. 1) is a highly diverse region in terms of its climatic and physical characteristics as well as its environment. The river is 2860 km long; it originates in the Canastra mountains in the central part of the Southeast region and flows into the Atlantic Ocean in the northeastern part of the catchment. The main course of the river is fed by the tributaries of several sub-basins, with 75 % of the water flow generated in the upper reach in the state of Minas Gerais (Maneta et al., 2009; Traini et al., 2012; de Jong et al., 2018). The SFRB spans two different climate regimes between the upper and lower basin. In the southern part of the basin (regions 3-11 in Fig. 1), the rainy season occurs during November-December-January (NDJ) and its variability is strongly influenced by the South Atlantic Convergence Zone (SACZ) (Kodama, 1992; Kodama, 1993; Carvalho et al., 2002; Carvalho et al., 2004). The rainy season over the northern part (regions 0-2 in Fig. 1) is observed during January-April (JFMA) and is mainly linked to the high activity of the Intertropical Convergence Zone (ITCZ) and the emergence of easterly waves and squall lines driven by the sea breeze (Paredes-Trejo et al., 2016; Sun et al., 2016). With a population of approximately 17 million people, the region hosts important economic activities, which increase the stress on the environment and the water resources (Ioris, 2001; Maneta et al., 2009). Apart from drinking water, freshwater resources are mainly used for electricity generation, urban and industrial services, intensive irrigation and mining activities. This involves a high concentration of dams and water abstractions all along the São Francisco River (de Jong et al., 2018; do Vasco et al., 2019; dos Santos et al., 2020). The Western Bahia state (regions 5, 6 and 7 in Fig. 1) in particular is a very active agricultural area that has been subject to a significant reduction in river discharge, rainfall and groundwater level since the 1980s (Pousa et al., 2019; Santos et al., 2020). In addition to the significant environmental impact of these activities on biodiversity, soil erosion and land desertification (Cunha et al., 2015; Tomasella et al., 2018), the northeast of the region was recently hit by an intense multi-year drought that caused a significant lack of water and emptied most reservoirs (Marengo et al., 2018; Martins et al., 2018).

Data
The study is based on 11 sub-basins of the SFRB (Fig. 1) and monthly time series over a total period of 30 years between 1981 and 2011. The 11 sub-basins were delineated and associated with runoff gauges according to the spatial distribution of the gauges and the spatial resolution of the precipitation products. Information about the different datasets used for this study is summarized in Table 1. More detailed information about the runoff gauges and missing values can be found in Table 2.

Runoff
The gauges located at the sub-basin outlets were used to quantify the runoff of each sub-basin (red triangles in Fig. 1). The daily discharge data of the 11 selected gauges were obtained from the Brazilian National Water Agency (ANA) database, which contains all information collected by the National Hydrometeorological Network (RHN) in Brazil. The data are averaged monthly and the few missing values (see Table 2) are filled using the climatology. Because of the high dam concentration in region 0 (Fig. 1), which prevents the natural relationship between precipitation and runoff from being described through simple covariances or correlations, only sub-basins No. 1-11 are considered in this study.

Precipitation
Reference precipitation is taken from the ERA5-Land climate reanalysis dataset (Muñoz-Sabater et al., 2021) of the European Centre for Medium-Range Weather Forecasts (ECMWF) Integrated Forecasting System (IFS). This dataset is derived from a re-run of the land component of the ECMWF ERA5 climate reanalysis (Hersbach, 2016) and contains gridded daily global land surface precipitation at 9 km resolution from 1950 onwards. ERA5-Land has the particular advantage of including a representation of the hydrological cycle, which improves the agreement between river discharge estimates and available observations. The hydrometeorological consistency and high spatial resolution of ERA5-Land make it a well-suited dataset for hydrological research purposes. The gridded data were averaged monthly and spatially over the different sub-basins to obtain basin-scale rainfall estimates.
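The aggregation step described above can be sketched as follows. This is a minimal illustration, not the processing chain used in the study: the function name, the boolean basin mask and the integer month labels are assumptions, and the spatial mean is unweighted (grid-cell area differences are ignored).

```python
import numpy as np

def basin_monthly_mean(precip, basin_mask, month_index):
    """Average gridded daily precipitation over one sub-basin and per month.

    precip      : (n_days, n_lat, n_lon) daily precipitation
    basin_mask  : (n_lat, n_lon) boolean mask, True inside the sub-basin
    month_index : (n_days,) integer label of the month each day belongs to
    """
    precip = np.asarray(precip, dtype=float)
    month_index = np.asarray(month_index)
    # Spatial mean over the grid cells inside the basin: one value per day
    daily_basin = precip[:, basin_mask].mean(axis=1)
    # Temporal mean over the days of each month, in increasing label order
    months = np.unique(month_index)
    return np.array([daily_basin[month_index == m].mean() for m in months])
```

Applied once per sub-basin mask, this yields one monthly basin-scale series per region.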

Bias-corrected and spatially disaggregated seasonal forecasts (SEAS5-BCSD)
Monthly seasonal forecasts for precipitation are obtained from ECMWF's latest seasonal forecasting system SEAS5, bias-corrected and spatially disaggregated towards ERA5-Land (Lorenz et al., 2021). The forecasts cover the whole (re-)forecast period from 1981 to 2019 and include bias-corrected and spatially disaggregated daily and monthly ensemble forecasts for precipitation, average, minimum and maximum temperature as well as shortwave radiation from the issue date to the next 215 days, at a spatial resolution of 0.1°. They consist of an ensemble of 25 (until 2016) and 51 (from 2017) members, which are corrected towards ERA5-Land using a quantile-mapping approach to estimate a temporally consistent set of land-surface variables. The bias correction of precipitation forecasts further includes a correction of precipitation intermittency to ensure the agreement of the wet- and dry-day frequencies between ERA5-Land and SEAS5-BCSD. Compared to the raw SEAS5 forecasts, SEAS5-BCSD has a higher spatial resolution, lower biases with respect to ERA5-Land and substantially reduced model drifts with forecast lead time, and is hence more suitable for regional applications. Because this study focuses on a time period prior to 2016, the SEAS5-BCSD ensemble has only 25 members.
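To illustrate the general idea of quantile mapping mentioned above, a minimal empirical sketch is given below. It is not the BCSD procedure of Lorenz et al. (2021), which additionally handles spatial disaggregation, lead-time dependence and precipitation intermittency; the function name and the way the climatologies are passed in are assumptions.

```python
import numpy as np

def quantile_map(forecast, model_climatology, reference_climatology):
    """Map forecast values onto the reference distribution (empirical sketch).

    Each forecast value is assigned its non-exceedance probability in the
    model climatology and replaced by the reference value that has the
    same probability.
    """
    model_sorted = np.sort(np.asarray(model_climatology, dtype=float))
    # Empirical CDF of the model climatology evaluated at the forecast values
    probs = np.searchsorted(model_sorted, forecast, side="right") / model_sorted.size
    # Inverse empirical CDF of the reference climatology
    return np.quantile(reference_climatology, np.clip(probs, 0.0, 1.0))
```

Values outside the range of the model climatology are mapped to the minimum or maximum of the reference climatology, which is one of several possible choices for handling the tails.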

Prediction model
For this EnKF framework, the state of the system is represented as a multi-dimensional vector X of size 2 × N_reg, with N_reg being the number of sub-basins, containing the runoff R and precipitation P state vectors to be estimated at each time step t.
The prediction equation has been derived under the assumption that the time evolution of the precipitation and runoff anomalies can be written as a first-order stochastic process, with the monthly residual r_t at time step t taken relative to the long-term mean annual cycle of the state vector X at month m, computed over a total period of τ months. The prediction is characterized by the matrix of the process dynamics A, which is supposed to model the dynamics and statistical relationship of the temporal evolution of the two water cycle variables. The zero-mean normally distributed white prediction noise ϵ ∼ N(0, Q_P) represents errors coming from the model data, parameters and physics, with Q_P being the prediction error covariance matrix. Because of the complexity of the process model and the lack of physical knowledge about its dynamics, A is considered unknown.
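For reference, the model described in this section can be written out as follows. This is a sketch reconstructed from the definitions in the text and the formalism of Lorenz et al. (2015); the stacking order of R and P, the symbol N_m (number of occurrences of calendar month m within the τ months) and the absence of equation numbers are assumptions.

```latex
\begin{align*}
X_t &= \begin{bmatrix} R_t \\ P_t \end{bmatrix}, \\
r_t &= X_t - \bar{X}_m, \qquad
\bar{X}_m = \frac{1}{N_m} \sum_{t' \,:\, \mathrm{month}(t') = m} X_{t'}, \\
r_t &= A\, r_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, Q_P).
\end{align*}
```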
Nevertheless, an estimate Â can be obtained with least-squares prediction methods that make use of the stochastic information of the process. For a complete and detailed derivation of the methodology, see Moritz (1980), Kurtenbach et al. (2012), Tourian (2013) and Lorenz et al. (2015). We are looking for an approximation Â as a linear estimator of r_t given r_{t−1}, where r̂_t is the predicted state. The estimated prediction matrix Â should minimize the prediction error e = r_t − r̂_t, i.e., minimize the trace of the error covariance matrix Q_P. The total error can be expanded in terms of Â and minimized with respect to it. Inserting the minimizer (7) into the error expression (6) finally yields the error covariance of the prediction matrix Â, where Σ = E{r_t r_t^T} and Σ_Δ = E{r_t r_{t−1}^T} are the spatio-temporal auto- and cross-covariance matrices, respectively. These covariance matrices are approximated with the empirical sample covariance matrices Σ̂ and Σ̂_Δ between anomalies of precipitation and runoff over the total time period of τ months.
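Under the definitions above, and assuming stationary anomalies so that E{r_{t−1} r_{t−1}^T} = Σ, the standard least-squares solution is Â = Σ_Δ Σ^{-1} with prediction error covariance Q_P = Σ − Σ_Δ Σ^{-1} Σ_Δ^T. The following NumPy sketch estimates both from an anomaly time series; the function name, the array layout and the use of a pseudo-inverse are illustrative choices, not taken from the study.

```python
import numpy as np

def estimate_prediction_matrix(anomalies):
    """Least-squares estimate of the prediction matrix and its error covariance.

    anomalies : (tau, n_state) array of residuals r_t, rows ordered in time,
                assumed to have zero mean (mean annual cycle already removed).
    """
    anomalies = np.asarray(anomalies, dtype=float)
    r_now = anomalies[1:]    # r_t
    r_prev = anomalies[:-1]  # r_{t-1}
    # Empirical auto-covariance Sigma = E{r_t r_t^T}
    sigma = anomalies.T @ anomalies / anomalies.shape[0]
    # Empirical lag-1 cross-covariance Sigma_Delta = E{r_t r_{t-1}^T}
    sigma_delta = r_now.T @ r_prev / r_now.shape[0]
    # A_hat = Sigma_Delta Sigma^{-1}; the pseudo-inverse guards against a
    # poorly conditioned Sigma (an implementation choice of this sketch)
    a_hat = sigma_delta @ np.linalg.pinv(sigma)
    # Q_P = Sigma - Sigma_Delta Sigma^{-1} Sigma_Delta^T
    q_p = sigma - a_hat @ sigma_delta.T
    return a_hat, q_p
```

With 2 × N_reg = 22 state variables and roughly 360 monthly samples, the sample covariances are only moderately well determined, which is one reason a regularized inverse may be preferable in practice.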

The Ensemble Kalman Filter framework
Based on the least-squares prediction estimate derived above, the prediction equation can be developed by substituting (3) in (2), where B and U represent the control input terms. As a result, the predicted state vector X^f_t at time t can be estimated using the analysis state X^a_{t−1} at time t−1, the prediction matrix Â, which describes the spatio-temporal relationships of the variables, and the control input U, which represents the mean annual cycle. I stands for the identity matrix (I = ΣΣ^{-1}) and the index d refers to the ensemble member from an ensemble of size N_ens. The prediction and observation covariance matrices are estimated by the sample covariance matrix from the ensemble prediction of size N_ens, with X̄^f_t the ensemble mean. These covariance matrices describe the uncertainty in the predicted and corrected states. In the innovation step, the analysis state X^a_{t,d} and its associated covariance matrix Σ^a_{X,t} are obtained by updating the predicted state X^f_{t,d} with the gain-weighted difference between the observed state Y_t and the predicted state. The correction step is formulated with the observation relation matrix H_t, which maps the predicted state to the observed space, and zero-mean normally distributed observation noise. The Kalman gain matrix K_t represents the optimal weighting between the error covariance of the predicted state Σ^−_{X,t} and the errors of the observed state Q_{t,obs}. The model is then integrated recursively forward in time from the innovation until the next update cycle.
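One forecast and correction cycle of such a filter can be sketched as below. The sketch assumes the perturbed-observation (stochastic) EnKF variant, a control term of the form BU = X̄_m − Â X̄_{m−1} (i.e. B = [I, −Â] and U stacking the mean annual cycle at months m and m−1) and the standard gain K_t = Σ^f H^T (H Σ^f H^T + Q_obs)^{-1}. These forms follow from the definitions in the text, but the function names and argument layout are hypothetical.

```python
import numpy as np

def enkf_forecast(x_analysis, a_hat, mean_now, mean_prev, q_p, rng):
    """Propagate the analysis ensemble (n_state, n_ens) from t-1 to t."""
    n_state, n_ens = x_analysis.shape
    # Control input B U: restores the mean annual cycle around the anomalies
    control = mean_now - a_hat @ mean_prev
    # One realization of the prediction noise per ensemble member d
    noise = rng.multivariate_normal(np.zeros(n_state), q_p, size=n_ens).T
    return a_hat @ x_analysis + control[:, None] + noise

def enkf_update(x_forecast, y_obs, h, q_obs, rng):
    """Correct the forecast ensemble with observations y_obs (n_obs,)."""
    n_ens = x_forecast.shape[1]
    # Sample covariance of the forecast ensemble around its mean
    deviations = x_forecast - x_forecast.mean(axis=1, keepdims=True)
    sigma_f = deviations @ deviations.T / (n_ens - 1)
    # Kalman gain: weighting between forecast and observation errors
    gain = sigma_f @ h.T @ np.linalg.pinv(h @ sigma_f @ h.T + q_obs)
    # Perturb the observations with noise drawn from N(0, Q_obs)
    perturbation = rng.multivariate_normal(np.zeros(y_obs.size), q_obs, size=n_ens).T
    innovation = y_obs[:, None] + perturbation - h @ x_forecast
    return x_forecast + gain @ innovation
```

With Q_obs set to zero, as assumed for the initialization, the observed components of the analysis ensemble collapse onto the observations while the unobserved components are adjusted through the ensemble covariances.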

Observation update
Generally, the observation vector Y_t and the associated observation error covariance matrix Q_{t,obs} are built from P_{t,obs} and R_{t,obs}, the observation vectors of precipitation and runoff at time t, with Σ_{P,t} and Σ_{R,t} their corresponding error covariance matrices. For the forecast initialization, the observation vector Y_t of size 2 × N_reg includes the historical precipitation (ERA5-Land) and runoff (ANA) vectors at time t. Since we assume perfect observations during the initialization, the corresponding error covariance is set to zero. During the forecast, the predicted state is updated at each time step t using the SEAS5-BCSD precipitation forecasts at the corresponding forecast lead time. Since runoff observations are not assimilated, the observation vector reduces to Y_t = P_{t,obs}, where P_{t,obs} represents the precipitation observation vector at time t including all SEAS5-BCSD ensemble members. This gives Y_t a total size of N_reg × N_BCSD, with N_BCSD the SEAS5-BCSD ensemble size. The whole ensemble is assimilated in order to benefit from all the information contained in the forecasts. Each element of the observation vector has an associated observation error covariance contained in Q_{t,obs} = Σ_{P,t}, which is estimated from the squared ensemble spread of the SEAS5-BCSD precipitation ensemble forecasts for the given region and forecast horizon.
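The construction of the forecast-phase observation terms can be sketched as follows. The member-major stacking, the state ordering X = [R; P], the diagonal form of Q_{t,obs} and the use of the sample variance as "squared ensemble spread" are assumptions made for illustration; the function name is hypothetical.

```python
import numpy as np

def build_forecast_observations(precip_members):
    """Stack all forecast members into one observation vector.

    precip_members : (n_members, n_reg) basin-averaged forecast precipitation
                     for one target month and lead time.
    Returns y (n_members * n_reg,), h (n_members * n_reg, 2 * n_reg) and
    q_obs (n_members * n_reg, n_members * n_reg).
    """
    precip_members = np.asarray(precip_members, dtype=float)
    n_members, n_reg = precip_members.shape
    # Observation vector: members stacked one after another
    y = precip_members.reshape(-1)
    # Selector picking the precipitation half of the state X = [R; P]
    selector = np.hstack([np.zeros((n_reg, n_reg)), np.eye(n_reg)])
    # Every member observes the same precipitation state
    h = np.tile(selector, (n_members, 1))
    # Squared ensemble spread per region, repeated for every member
    spread_var = precip_members.var(axis=0, ddof=1)
    q_obs = np.diag(np.tile(spread_var, n_members))
    return y, h, q_obs
```

For 11 sub-basins and 25 members this gives an observation vector of length 275 and an observation matrix of shape 275 × 22.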

Further assumptions
No spin-up is applied to the model states as this reduces the prediction performance. Possible reasons could be that the observation covariance matrices generated by a probabilistic climatology are not well conditioned or that the convergence to the optimal error level is too slow and would require more data.
Various ensemble sensitivity tests, not shown here, suggest that an ensemble size of N_ens = 200 members is a robust choice for the EnKF runs.

Evaluation framework
Four different skill scores were computed, namely the Inter-Quantile Range Skill Score (IQRSS) between the 95th and 5th percentiles, the Mean Absolute Error Skill Score (MAESS), the Continuous Ranked Probability Skill Score (CRPSS) and the Brier Skill Score (BSS) for above and below normal conditions. A detailed description of the score metrics can be found in Appendix A.
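As an illustration of how such a skill score is formed, the sketch below computes the CRPS of a single ensemble forecast with the common estimator CRPS = E|X − y| − ½ E|X − X'| and converts a score into a skill score via 1 − S_forecast / S_reference. The exact estimators used in the study are defined in its Appendix A and may differ; the function names are hypothetical.

```python
import numpy as np

def crps_ensemble(members, observation):
    """CRPS of one ensemble forecast against one scalar observation."""
    members = np.asarray(members, dtype=float)
    # Mean absolute error of the members with respect to the observation
    term_obs = np.abs(members - observation).mean()
    # Mean absolute difference between all pairs of members
    term_spread = np.abs(members[:, None] - members[None, :]).mean()
    return term_obs - 0.5 * term_spread

def skill_score(score_forecast, score_reference):
    """Skill relative to a reference: 1 is perfect, 0 equals the reference."""
    return 1.0 - score_forecast / score_reference
```

In practice the CRPS would first be averaged over all forecasts of a given region, month and lead time, for both the forecast and the climatological reference, before the skill score is taken.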
The reference ensemble forecasts for runoff and precipitation are leave-one-year-out climatology-based ensemble forecasts generated from the ANA runoff and the ERA5-Land precipitation data, respectively. For a given year, this established method combines the observations of all other years to create an ensemble forecast, so that the validation is done with independent data.
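The leave-one-year-out construction can be sketched in a few lines. The function name and the flat (value, year, month) layout of the monthly series are assumptions made for illustration.

```python
import numpy as np

def loo_climatology_ensemble(values, years, months, target_year, target_month):
    """Reference ensemble for one target month from all other years.

    values, years, months : 1-D arrays of equal length describing a monthly
                            time series (one entry per month and year).
    """
    values = np.asarray(values, dtype=float)
    years = np.asarray(years)
    months = np.asarray(months)
    # Same calendar month, every year except the one being verified
    keep = (months == target_month) & (years != target_year)
    return values[keep]
```

With 30 years of data, each reference ensemble therefore contains 29 members that are independent of the verifying observation.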
The performance scores are then derived for both runoff and SEAS5-BCSD precipitation predictions and are computed for each region, time step and lead time over the evaluation period. In this study, we focus only on the first four lead times, with lead 0 referring to the current month's forecast and lead 3 to the fourth month's forecast.
An overview of the size of the different parameters used in the EnKF framework can be found in Table 3.

Annual cycle
Fig. 2 shows the rainy season of the whole SFRB and its 12 sub-basins. The two maxima visible in December (regions 3-11) and March (regions 0-2) show the entanglement of the two different climate regimes between the upper and lower basin. The main rainy season in the lower basin is observed during January-April (JFMA), and in the upper portion of the basin during November-January (NDJ). In this study, we consider the period from November to March as a good compromise between the two climate zones to describe the wet season of the entire SFRB. Folland et al. (2001), Misra (2006) and Kulikova et al. (2014) presented evidence of a relationship between Pacific and tropical Atlantic SST anomaly patterns and seasonal precipitation in northeast Brazil, contributing to the long-term predictability over the SFRB. This is consistent with the good level of agreement between the first lead times of the SEAS5-BCSD precipitation forecasts and ERA5-Land precipitation in Fig. 2. Similarities in the spatio-temporal relationship between runoff and rainfall can also be observed. However, regions 5, 6 and 7 do not seem to follow the typical rainy season cycle; predicting runoff in those regions is therefore expected to be more difficult.

Prediction performance
The runoff of the 11 SFRB sub-basins is reforecasted on a monthly basis over the evaluation period 1981-2011. Fig. 3 shows the reforecasts at lead time 0 (blue) and the corresponding runoff gauge observations (red). In most cases, the forecasts reproduce the observed streamflow. In particular, the upper basin (regions 8-11) and lower basin (regions 1-4) show the best agreement, with a good concordance between the predicted and observed runoff peaks. However, predictions for the Western Bahia state (regions 5-7) agree less well with the observations, with a noisy behavior and a general underestimation of the baseflow during the first 15 years. Correspondingly, the reduction of the river streamflow described in Section 2.1 is visible in these regions over the same period. Since the control input term U (Eq. 12) carries information on the mean annual cycle, an abnormal structure of the annual cycle in those regions can explain the observed underestimation of the runoff predictions. While most events in the catchment are well predicted, a few seem to be more difficult to predict (e.g., region 11 in 1991 or region 8 in 1982 in Fig. 3). It is therefore important to determine what drives the performance: the algorithm, the quality of the driving data or the quality of the SEAS5-BCSD rainfall predictions. To this end, a deeper analysis was carried out in which the different performance metrics described in the methods section were calculated.
Figs. 4 and 5 show the monthly and seasonally averaged skill scores, respectively, of the EnKF runoff reforecasts and the corresponding SEAS5-BCSD precipitation predictions during the main rainy season (November-March) for the first four lead months in all regions. The skill of the predictions is shown in colors from blue (better skill than the reference forecast) to orange (worse skill than the reference forecast). In both figures, an overall good agreement of the three skill scores for the runoff predictions and the SEAS5-BCSD precipitation predictions can be seen, with similar patterns over months, lead times and regions, and a general trend of decreasing skill with lead time. Additionally, for both the SEAS5-BCSD precipitation forecasts and the runoff forecasts, skill is higher during the first months of the rainy season (November, December, January) than during the last months (February, March). This monthly trend is more pronounced for the runoff forecasts, especially at higher lead times (lead 2 and 3). The forecast sharpness, accuracy and overall performance of both the runoff and SEAS5-BCSD forecasts decrease similarly for the first two lead times. The good IQRSS, MAESS and CRPSS values for these lead times imply that the predictions are accurate, deviate only slightly from the reference and have a more representative distribution than the climatology.
However, for higher lead times, the performance of the SEAS5-BCSD predictions remains more uniform than that of the runoff predictions, which decreases more linearly. In Fig. 4, where all three skill scores of the runoff forecasts are positive for most of the upper and lower basin, regions 5, 6 and 7 show poorer performance across all lead times and months. This can also be seen in Fig. 5, where the MAESS and IQRSS of the runoff forecasts over the rainy season are negative for all lead times in region 5 and negative for regions 6-7 from lead time 2 on. This trend, which is not visible in the skill scores of the SEAS5-BCSD precipitation predictions, is consistent with the previously mentioned erratic behavior of the runoff time series in those regions (Fig. 3). Interestingly, the lead 2 forecasts in February have a lower skill than the lead 3 forecasts. This is also associated with a lower skill in the SEAS5-BCSD precipitation forecasts. This finding is counter-intuitive, since a monotonically decreasing performance with lead time was expected. However, several studies have shown that central eastern Brazil can exhibit low predictability on a seasonal scale during the austral summer, as rainfall variability is related to complex land-atmosphere interactions that are not well captured by models (Marengo et al., 2003; Grimm et al., 2007).
To assess the ability of the EnKF framework to forecast events above or below the seasonal average, the BSS was calculated. To investigate which part of the skill stems directly from the SEAS5-BCSD precipitation forecast and which part is inherent to the EnKF framework, the BSS was also computed for the SEAS5-BCSD precipitation forecast against the ERA5-Land product. Fig. 6 shows the BSS for both cases at lead 0 over the different regions and months of the rainy season. For both "Above Normal" and "Below Normal" conditions, the BSS is positive and above 0.3 for most months and regions, even for the regions of the Western Bahia state. In addition, Fig. 7 shows the BSS for each month averaged over the whole rainy season and for the first four lead times. In both cases, the upper and lower terciles show better skill for lead 0 than for higher lead times, for which the scores are close to zero. Furthermore, lead 0 has higher skill scores in the lower basin than in the upper basin. These features show that the skill for above and below normal precipitation events in the SEAS5-BCSD forecast matches that of the predicted runoff, with higher scores at lead 0 than at higher leads.
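The tercile-based BSS discussed above can be illustrated with the following sketch for the "Below Normal" event. It assumes that the forecast probability is the fraction of ensemble members below the lower-tercile threshold and that the reference is a constant climatological probability of 1/3; the study itself uses a leave-one-year-out climatological ensemble as reference (see its Appendix A), so this is a simplification, and the function name is hypothetical.

```python
import numpy as np

def brier_skill_score(ensembles, observations, threshold, climatological_prob=1.0 / 3.0):
    """BSS for the event 'value below threshold' (e.g. the lower tercile).

    ensembles    : (n_forecasts, n_members) ensemble forecasts
    observations : (n_forecasts,) verifying observations
    """
    ensembles = np.asarray(ensembles, dtype=float)
    observations = np.asarray(observations, dtype=float)
    # Forecast probability: fraction of members predicting the event
    forecast_prob = (ensembles < threshold).mean(axis=1)
    # Binary outcome: 1 if the event was observed, else 0
    occurred = (observations < threshold).astype(float)
    brier = np.mean((forecast_prob - occurred) ** 2)
    brier_ref = np.mean((climatological_prob - occurred) ** 2)
    return 1.0 - brier / brier_ref
```

The "Above Normal" case follows by reversing the inequalities and using the upper-tercile threshold.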
The skill score analysis illustrates how this framework is able to exploit the simple rainfall-runoff relationship, which, in combination with the quality of the SEAS5-BCSD forecasts, provides skillful runoff predictions up to two months ahead. In most cases, the SEAS5-BCSD precipitation forecasts outperform or are close to the runoff forecasts, but in some cases the runoff forecasts also outperform the SEAS5-BCSD precipitation forecasts. Greuell et al. (2019), Crochemore et al. (2016) and Yossef et al. (2013) have shown that in European catchments, most frameworks using process-based hydrological models and post-processed seasonal forecasts (ECMWF seasonal forecast Systems 3 (SEAS3) and 4 (SEAS4)) as input hardly show skill beyond one lead month. Yossef et al. (2013) showed that in the tropical catchments of South America, the accuracy of runoff forecasts from the FEWS-World global seasonal runoff forecasting system, driven by SEAS3 precipitation forecasts, is limited to 1-2 months lead time during most of the year. The presented EnKF framework shows similar performance, with skillful runoff predictions 1-2 months ahead during the rainy season in most of the SFRB sub-basins. Consequently, the severe recurrent droughts in northeastern Brazil in 2011-2016 (Cunha et al., 2019; Marengo et al., 2017; Erfanian et al., 2017) could have been anticipated 1-2 months in advance in sub-basins 1, 2 and 3. However, a direct comparison is inherently delicate, as the forecast performance can depend on many factors, e.g., the hydrological model employed, the seasonal forecast product, the post-processing techniques applied, the different geographical locations or the different regional and seasonal weather regimes affecting the study area (ENSO, MJO, North Atlantic Oscillation (NAO)).
Finally, this framework does not require prior knowledge of the local hydrological initial conditions generally necessary to run a model (soil moisture, snowpack, groundwater storage, regulation of lakes and dams), which makes this approach very suitable for regions where hydrological information is lacking. Furthermore, it is a transferable approach that can complement other types of statistical or more physically based models to easily monitor a single catchment at sub-basin level. The set of input variables can also be expanded, for example with evapotranspiration or temperature, allowing for more robust correlations between all variables and thus potentially improving prediction quality. As shown in Lorenz et al. (2015), this approach for global sub-basin forecasts can be extended to other regions, as SEAS5-BCSD forecasts for other regions already exist (Lorenz et al., 2021).

Summary and conclusion
In this study, a hydrometeorological Ensemble Kalman Filter (EnKF) based data-assimilation framework is developed, with which sub-basin-scale runoff time series are predicted. A least-squares prediction method has been applied for the prediction scheme in order to exploit the spatio-temporal relationship between precipitation and runoff. Besides reference runoff (ANA) and precipitation (ERA5-Land) [...]
This framework, however, also suffers from a number of limitations: (1) The skill of the SEAS5-BCSD predictions is the primary constraining factor for the performance of the runoff forecasts, as these are used as input to the model. (2) The simplicity of the least-squares prediction method is limited by the insufficient knowledge of the model physics, the catchment characteristics and the local hydroclimatic conditions. For example, our current model cannot describe regions where runoff is disturbed by anthropogenic activities, such as the western state of Bahia (regions 5, 6 and 7). On the other hand, the framework is very suitable for regions where the key parameters for hydrological modeling as well as the initial and boundary conditions are missing. (3) The set of input variables, which can be extended to further variables in the SEAS5-BCSD dataset such as evapotranspiration or temperature, could provide additional correlation statistics that have the potential to further improve the skill of the runoff predictions. Future work could include comparisons with process-based hydrological models and applications of this methodology with more input variables and over a wider range of regions.
Finally, this approach, which relies on global, freely available information for regional runoff forecasting, has great potential for regional decision support as well as seasonal water resource management and can complement other existing operational hydrological models. This includes the development of institutional systems and national strategies for drought preparedness plans, long-term flood awareness and sustainable reservoir management with a computationally cost-effective and easily transferable approach.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 6. Brier Skill Score (BSS) for runoff reforecasts against runoff observations (Run.) and SEAS5-BCSD precipitation predictions against ERA5-Land (Prec.). The BSS is computed for lead 0 over the different regions and months of the rainy season for Below Normal and Above Normal conditions. Since scores are not linear for negative values, all scores below the threshold of −0.1 are not shown.

Fig. 1. The São Francisco River Basin (SFRB) in Brazil. The blue solid lines indicate the tributaries and the black solid lines show the delineation of the sub-basins. The markers represent the gauge locations, i.e., the selected gauges (red triangles) and all gauges (orange circles). The numbers (0-11) are used to identify the different sub-catchments. Sub-basin No. 0 is not analysed in the EnKF framework.

Fig. 3. Runoff reforecasts for lead 0 (blue scatter) and corresponding observations (solid red line) over the evaluation period 1981-2010 for all regions.

Fig. 4. Overview of the three skill scores (IQRSS, MAESS, CRPSS) of the runoff reforecasts (squares) and SEAS5-BCSD precipitation predictions (circles) for each region, month and lead time during the main rainy season (November-March). Blue and red colors describe negative and positive skill scores, respectively. Since scores are not linear for negative values, all scores below the threshold of −0.1 are not shown.

Fig. 5. Skill scores of runoff reforecasts (Run.) and SEAS5-BCSD precipitation predictions (Prec.) averaged during the main rainy season for the first four lead times over the 11 SFRB sub-catchments. Since scores are not linear for negative values, all scores below the threshold of −0.1 are not shown.

Table 1
Summary of the datasets used in this study.

Table 2
Description of the gauging stations used for each sub-basin with information on mean annual discharge, runoff and percentage of missing values in the daily time series.

Table 3
Description and value of the different EnKF parameters.