Forecasting annual maximum water level for the Negro River at Manaus using dynamical seasonal predictions

Early and skilful prediction of the Negro River maximum water levels at Manaus is critical for effective mitigation measures to safeguard lives and livelihoods. Using dynamical seasonal prediction hindcasts from six prediction centres, we investigate extending the lead time of previously developed statistical models, which issue forecasts in March for Manaus. The original statistical forecast models used observed rainfall as the major predictor. We advance the capability to issue skilful forecasts earlier, in February. We develop ensemble forecasts by combining predictor data from observations and seasonal hindcasts. We compare those forecasts against the original statistical forecast models and against forecasts using the observed climatology or persistence of predictors. The ensemble-mean forecasts issued in February, using European Centre for Medium-Range Weather Forecasts (ECMWF) hindcast input, perform similarly to the original forecasts issued in March and gain one month of lead time. The ECMWF-based ensemble forecasts skilfully predict the likelihood of water levels exceeding the severe flood level of 29 m. Forecast performance reduces and ensemble spread increases with increasing lead time from February to January. We conclude that forecasts for Manaus maximum water levels can be produced using combined input from observations and real-time ECMWF forecasts.


Practical Implications
Most cities, rural settlements and indigenous villages in the Central Amazon region are established along the main river and its tributaries. The river valleys have been settled and used for centuries by indigenous and traditional populations performing activities for subsistence and commerce, such as agriculture, livestock production, fishery and forestry, which are intrinsically related to the annual hydrological cycle. Due to the amount and seasonality of rainfall in the Amazon basin, the large rivers have regularly occurring high-water (around June) and low-water (October/November) seasons, with annual water level amplitudes of about 10 m in the Central Amazon region. During the last three decades scientists have observed increasing magnitude and frequency of severe flood events, which endanger thousands of people and result in severe health problems and losses to infrastructure, properties and other socioeconomic sectors (Marengo et al., 2012; Gloor et al., 2013; Barichivich et al., 2018). During the last ten years the Central Amazon region has been affected by six severe flood events (2012, 2013, 2014, 2015, 2017, 2019) that attained the critical threshold for declaring a state of emergency. To prevent and mitigate severe impacts on the urban and rural populations and on socioeconomic sectors, seasonal forecasts of severe flood events with a long lead time are required, to provide a reliable decision-making tool for public policymakers.
Based on previous studies, during the IV CSSP (Climate Science for Service Partnership) Brazil Annual Scientific Workshop 2019 a consortium of scientists from the UK and Brazil was formed to develop a skilful forecasting system with sufficiently long lead time to forecast annual maximum water level for the Negro River at Manaus, Brazil, with the potential to expand it in future to other strategic locations in the Amazon basin. This project, named PEACFLOW (Predicting the Evolution of the Amazon Catchment to Forecast the Level Of Water), is designed to support the official forecast of flood events performed by the Brazilian Geological Survey (CPRM) in Manaus at the end of March, providing essential and additional information for effective implementation of disaster risk management actions. The new method included the use of various predictors from preceding months, such as rainfall, river water level, and Pacific and Atlantic Ocean conditions. The regularity of the delay between the catchment rainfall and peak water level at Manaus allowed for the development of skilful statistical forecast models that issue reliable forecasts with the same skill as existing operational models at longer lead times.
In this study, we gained an additional month of lead time when we replaced the observed input data with the ECMWF (European Centre for Medium-Range Weather Forecasts) dynamical seasonal ensemble forecast. We developed two operational models using this data, which provide probabilistic forecasts at the beginning of January and February. The probabilistic forecasts, using ECMWF input, show good skill for the likelihood of river stage exceeding the 29 m emergency flood stage threshold. The methods developed in this project can also be used to develop forecast models for water levels over other Amazon basin regions. The fully automated PEACFLOW project models are provided in an open access GitHub repository (https://github.com/achevuturi/PEAC-FLOW_Manaus-flood-forecasting). PEACFLOW models can be used to provide operational forecasts. Using the PEACFLOW models we retrospectively forecasted the annual maximum water levels at Manaus for 2020, and we actively forecasted for 2021 in real time. Our forecast for the year 2021 shows the maximum water levels exceeding 29 m, which correctly indicated emergency conditions due to floods for Manaus.

Introduction
The Amazon is the largest river basin in the world (Callède et al., 2010) and one of the few remaining networks of free-flowing large rivers (Grill et al., 2019). The Amazon river system provides water for domestic use, irrigation, livestock, hydro-power generation, river-based transportation, and essential ecosystem services (Junk et al., 2014). Variations in river water levels (floods and droughts) in the Amazon basin can cause considerable regional environmental and socioeconomic impacts (Marengo and Espinoza, 2016). Many parts of the Amazon Basin lack integrated flood and drought disaster management plans (Dolman et al., 2018). More frequent and intense flood hazards in the last two decades (e.g., Gloor et al., 2013; Barichivich et al., 2018) make it essential to have early and skilful forecasting systems for annual maximum water levels of Amazonian rivers, to better prepare for extreme floods.
The Negro River, like most of the Amazon network rivers, has seasonal flood levels during May-July and low levels during September-November (Schöngart and Junk, 2007; Chevuturi et al., 2021). The Amazonian rivers present a regular hydrological annual cycle with a single annual flood event that lasts for weeks to months. This regular monomodal flood-pulse offers an opportunity to predict, months in advance, the maximum water level of the free-flowing river system. The hydrograph for the Negro River at Manaus, showing the evolution of water level throughout the year, is shown in Fig. 1. Currently, the Brazilian Geological Survey (CPRM) uses simple linear regression models to issue forecasts at the end of March, April and May for maximum water level at Manaus, using current water levels of the respective issuance month as a predictor (Maciel et al., 2020; Maciel et al., 2022). Another seasonal forecasting model for Manaus was developed collaboratively by Instituto Nacional de Pesquisas da Amazonia (INPA) and the Max-Planck Society, using multiple linear regressions of maximum water level against prior Manaus water levels and indices representing large-scale modes of climate variability connected to basin rainfall (Schöngart and Junk, 2007). The INPA model issues forecasts by the first week of March for annual maximum water levels for Manaus, using Niño3.4 sea surface temperature (SST) anomalies for December-February, the Southern Oscillation Index for November-January, the Pacific Decadal Oscillation for February, the previous year's minimum water level and the 7th March water level at Manaus (Schöngart and Junk, 2020).
Complex dynamical coupled hydrological models can issue daily river streamflow ensemble forecasts for South America on a grid-point level (Siqueira et al., 2020), and perform satisfactorily for water level forecasts when compared against in situ station data and remote sensing estimates (Siqueira et al., 2018). Catchment-based hydrologic-hydrodynamic models use discretization of the river networks to successfully simulate water levels in the Amazon basin (Fan et al., 2021), even with limited data for the river features and geometry (Paiva et al., 2011). The water level forecasts can have meaningful skill for Amazon river flow, even at seasonal timescales, by assimilating in situ and radar altimetry data; they can also be used to map flood hazards, which compare well against CPRM estimates for different flood return periods. However, the hydrological models used over the Amazon basin have errors due to the precipitation forcing and the river-floodplain parameters used (de Paiva et al., 2013); the uncertainty from the model initial conditions degrades river flow forecast skill at seasonal timescales (Paiva et al., 2012). Further, hydrological models usually perform better at the subseasonal and regional scales (Towner et al., 2019). Thus, along with large-scale flood forecasts from hydrological models, station-level statistical forecasts for maximum water level are operationally useful for the Amazon region (Maciel et al., 2020).
The large drainage basins of the Amazonian rivers integrate small-scale precipitation variability; the river valley topography and wetlands attenuate and delay the impact of rainfall on river water levels (Junk et al., 2011). This allows statistical seasonal forecast models to predict reliably the magnitude of hydrological peak water levels (Schöngart and Junk, 2007) from prior catchment rainfall observations (Richey et al., 1989). As Amazonian rainfall and subsequently river water level are also influenced by large-scale coupled ocean-atmosphere modes of variability (Towner et al., 2020), these modes of variability can also be potential predictors for water level forecast models (Schöngart and Junk, 2020).
Using this premise, Chevuturi et al. (2021) developed three statistical forecast models for annual maximum water level for the Negro River at Manaus that use observed antecedent rainfall, antecedent water level, large-scale modes of variability represented by climate indices and the linear trend of historical water levels as predictors. The models developed by Chevuturi et al. (2021) can issue forecasts at three lead times: March (the current earliest operational lead time), February and January; they were compared against the models from INPA (Schöngart and Junk, 2007; Schöngart and Junk, 2020) and CPRM (Maciel et al., 2020; Maciel et al., 2022). The results show that these forecasts gain one month of lead time against the CPRM models with similar forecast performance.
Seasonal forecasts show medium-to-high performance for rainfall anomalies over Amazonia during austral summer, as shown by statistically significant anomaly correlation coefficients (0.6-0.8; Nobre et al., 2006). Seasonal forecast performance for rainfall over South America depends on an accurate representation of the SST variability of the surrounding oceans (e.g., Montecinos et al., 2000). Prediction systems combining SST-based empirical models and European multi-model seasonal forecasts show good performance for austral summer rainfall over South America (Coelho et al., 2006). Dynamical seasonal forecasts for rainfall over South America from the European Centre for Medium-Range Weather Forecasts (ECMWF) outperform forecasts from empirical prediction systems using Niño3.4 SST as predictor (Gubler et al., 2020). Current forecast models of maximum water level rely on the linear relationship between modes of variability and water levels connected through rainfall, but the findings of Gubler et al. (2020) and Siqueira et al. (2020) suggest that dynamical seasonal rainfall predictions could be used to skilfully forecast maximum water levels.
We investigate extending the lead time of the Chevuturi et al. (2021) maximum water level forecast models, by replacing the observed rainfall and SST inputs to the statistical models with forecast values from dynamical seasonal forecast models. Longer lead times allow for earlier warnings of, and additional preparation time for, high-impact flood events. Using ensemble forecast data as input to the statistical models, we generate ensemble forecasts of the maximum water level. Ensemble forecasts provide a range of possible outcomes and can quantify forecast uncertainty. We evaluate the deterministic and probabilistic skill of these longer lead time ensemble forecasts and compare that skill to that of the original observation-based models and to benchmark models where the dynamical forecast rainfall and SST indices are replaced with climatology or persistence.
The paper is organized as follows: Section 2 introduces the data (Section 2.1) and describes the methods used (Section 2.2); seasonal hindcast performance is discussed in Section 3; verification of ensemble mean deterministic forecasts (Section 4.1), probabilistic forecasts (Section 4.2) and real-time forecasts (Section 4.3) for annual maximum water level at Manaus is discussed in Section 4; Section 5 concludes this study.
Chevuturi et al. (2021) developed three statistical models for maximum water levels at Manaus, using multiple linear regression, which can issue forecasts by the middle of March, February and January. These statistical models use combinations of predictors, including antecedent rainfall, antecedent Atlantic Multi-decadal Oscillation index (AMO), the previous year's minimum water level (Lmin) and the year of the forecast (Year), which represents the linear trend. For forecasts issued in March, November-February (NDJF) mean rainfall, NDJF mean AMO, Lmin and Year are used as input. For February-issued forecasts, November-January (NDJ) mean rainfall and NDJ mean AMO along with Year are used as input variables. For January-issued forecasts, November-December (ND) mean rainfall, Year and Lmin are used as predictors. These models are further discussed in Section 2.2.
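The structure of the March-issued model described above can be illustrated with a minimal sketch of a multiple linear regression on four predictors (NDJF rainfall, NDJF AMO, Lmin and Year). The function names `fit_mlr` and `predict_mlr`, the synthetic data and the coefficients used to generate it are all hypothetical; they are not the published model or its parameters.

```python
import numpy as np


def fit_mlr(predictors, target):
    """Ordinary least squares fit; returns [intercept, coef_1, ..., coef_k]."""
    design = np.column_stack([np.ones(len(target)), predictors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict_mlr(coef, predictors):
    """Apply fitted coefficients to a (n_years, k) predictor matrix."""
    design = np.column_stack([np.ones(len(predictors)), predictors])
    return design @ coef


# Synthetic stand-in data (assumption: NOT the GPCC/CHIRPS or Manaus records).
rng = np.random.default_rng(0)
n_years = 60
year = np.arange(1945, 1945 + n_years).astype(float)
rain_ndjf = rng.normal(250.0, 30.0, n_years)   # masked area-mean rainfall
amo_ndjf = rng.normal(0.0, 0.2, n_years)       # NDJF-mean AMO index
lmin = rng.normal(18.0, 1.5, n_years)          # previous year's minimum level (m)
max_level = (20.0 + 0.02 * rain_ndjf - 1.0 * amo_ndjf + 0.1 * lmin
             + 0.01 * (year - 1945) + rng.normal(0.0, 0.05, n_years))

predictors = np.column_stack([rain_ndjf, amo_ndjf, lmin, year])
coef = fit_mlr(predictors, max_level)
fitted = predict_mlr(coef, predictors)
```

The February and January models would follow the same pattern with their own predictor subsets (NDJ rainfall, NDJ AMO and Year; ND rainfall, Year and Lmin).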

Data
To develop the statistical models, 1903-2004 was selected as the training period, using GPCC (Global Precipitation Climatology Centre) rainfall (Schneider et al., 2017); models were validated over 2005-2019 using CHIRPS (Climate Hazards Group InfraRed Precipitation with Station) rainfall, due to the lack of GPCC real-time data. CHIRPS version 2.0 (Funk et al., 2015), at 0.05° resolution, is available from 1981-present.
We use daily Negro River water level (stage) at the Manaus harbour station (ID: 14990000), measured by Capitania dos Portos (Port Authority) since September 1902 (Maciel et al., 2020), to calculate annual maximum and minimum water levels for Manaus. The location of Manaus (3.14°S, 60.03°W) and the catchments of the Negro, Solimões and Madeira Rivers are shown in Fig. 2. The observed antecedent rainfall in the catchment upstream of Manaus is an important predictability source for water levels. Chevuturi et al. (2021) used area-mean rainfall over the masked region (Fig. 2) within the three basins as a predictor for the original forecast models. The mask for each antecedent month is built from gridpoints with statistically significant correlation values at the 5% level between annual maximum water level and CHIRPS rainfall (grey shaded areas in Fig. 2). All three basins are considered as the upstream catchment for the Negro River, as its water levels are influenced by water levels in the Solimões and Madeira Rivers due to backwater effects (Meade et al., 1991; Schöngart and Junk, 2020).
To extend the lead time for the original forecast model of annual maximum water level, we combine observed and seasonal hindcast rainfall and AMO as input to the forecast model. To evaluate this approach we use seasonal hindcasts from six prediction centres: Euro-Mediterranean Center on Climate Change (CMCC), Deutscher Wetterdienst (DWD), European Centre for Medium-Range Weather Forecasts (ECMWF), Météo-France (METFR), National Centers for Environmental Prediction (NCEP), and UK Met Office (UKMO). For details of the hindcast sets please see Table 1. We use bias corrected (see Section 2.2) monthly total precipitation and sea surface temperature (SST) variables for 1st of January (Jan_Start) and 1st of February (Feb_Start) initialisations (i.e., the dates when the hindcasts were "produced"; Table 1) for each hindcast for the common period of 1994-2016, available at 1.0° spatial resolution. We also test a longer time period from ECMWF hindcasts, available from 1981-2016.
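The mask construction described above can be sketched as follows. This is an illustration only: the significance test here uses the Fisher z-transform normal approximation rather than whatever test the original study applied, the cosine-latitude weighting of the area mean is an assumption, and the function names and synthetic rainfall field are hypothetical.

```python
import math
import numpy as np


def correlation_map(rain, level):
    """Pearson r between level (n_years,) and rain (n_years, ny, nx) per grid point."""
    rain_anom = rain - rain.mean(axis=0)
    level_anom = level - level.mean()
    cov = np.tensordot(level_anom, rain_anom, axes=(0, 0))
    denom = np.sqrt((level_anom ** 2).sum() * (rain_anom ** 2).sum(axis=0))
    return cov / denom


def significance_mask(r, n_years, alpha=0.05):
    """True where r differs from zero at level alpha (two-sided, Fisher z approximation)."""
    z = np.arctanh(np.clip(r, -0.999999, 0.999999)) * math.sqrt(n_years - 3)
    p_value = np.vectorize(math.erfc)(np.abs(z) / math.sqrt(2.0))
    return p_value < alpha


def masked_area_mean(rain, mask, lat):
    """Area mean over masked grid points, weighted by cos(latitude) (assumed weighting)."""
    weights = np.cos(np.deg2rad(lat))[:, None] * mask
    return (rain * weights).sum(axis=(1, 2)) / weights.sum()


# Synthetic example: a small block of grid points drives the water level.
rng = np.random.default_rng(1)
n_years, ny, nx = 40, 6, 8
lat = np.linspace(-10.0, 2.0, ny)
rain = rng.normal(200.0, 40.0, (n_years, ny, nx))
level = 27.0 + 0.01 * rain[:, 2:4, 3:5].mean(axis=(1, 2)) + rng.normal(0.0, 0.05, n_years)

r = correlation_map(rain, level)
mask = significance_mask(r, n_years)
rain_index = masked_area_mean(rain, mask, lat)
```

With many grid points, a 5% test will also flag some points by chance, so in practice the resulting mask is worth inspecting on a map before using the area mean as a predictor.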

Method
We use CHIRPS observed rainfall and other observed variables (AMO, Lmin and Year) and calculate annual maximum water level forecasts for 1994-2019, using the original three models from Chevuturi et al. (2021). We refer to the original models' forecasts as CHIRPS-ORI-Mar, CHIRPS-ORI-Feb and CHIRPS-ORI-Jan, based on their respective month of issue (Table 2). A full comparison between the operational models from CPRM (Maciel et al., 2020) and the original models using CHIRPS is provided in Chevuturi et al. (2021). The results showed that the CHIRPS-based original models are moderately better than the CPRM forecasts for the same issuance month, and show statistically similar performance to the CPRM forecasts one month ahead. Thus, we use the CHIRPS-based original models as the benchmark in this work, rather than the CPRM models.
We extend the lead time of the CHIRPS-ORI-Mar model, which uses NDJF-mean rainfall, NDJF-mean AMO, Year and Lmin as predictors. By replacing either January and February or only February observed rainfall and SST input with seasonal hindcast input, we produce maximum water level forecasts two or one months earlier, respectively. To calculate the forecasts we use the same model equation and parameters as CHIRPS-ORI-Mar. For rainfall input, we use the same mask used for CHIRPS rainfall (Fig. 2), regridded to the seasonal hindcast common grid of 1.0°. The seasonal hindcast rainfall input is the monthly mean hindcast rainfall averaged over the masked regions in Fig. 2. The seasonal hindcast AMO input is SST anomalies averaged over the Atlantic Ocean (0°-70°N and 75°W-5°E).
The hindcasts have errors in the magnitude and variability of rainfall (Figs. 3 and 4) and SST (not shown), as expected. Thus, we bias correct the rainfall and AMO hindcast input, using the standardized-reconstruction technique (Pan and Dool, 1998). This technique forces the hindcast to have the same mean and standard deviation as the observations. We first standardize the hindcast by subtracting the hindcast mean and dividing by the hindcast standard deviation. The corrected hindcast is then reconstructed by multiplying the standardized hindcast anomalies by the observed standard deviation and adding the observed mean.
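The two steps of this bias correction can be written compactly. The sketch below assumes that the hindcast mean and standard deviation are pooled over all years and ensemble members, which is one possible choice and is not stated in the text; the function name and the synthetic numbers are hypothetical.

```python
import numpy as np


def standardized_reconstruction(hindcast, observed):
    """Rescale hindcast (n_years, n_members) to the observed mean and standard deviation.

    Assumption: hindcast statistics are pooled over all years and members.
    """
    standardized = (hindcast - hindcast.mean()) / hindcast.std()
    return standardized * observed.std() + observed.mean()


# Synthetic example: a hindcast that is too wet and too variable.
rng = np.random.default_rng(2)
observed = rng.normal(250.0, 30.0, 23)        # e.g. February masked-mean rainfall
hindcast = rng.normal(320.0, 55.0, (23, 25))  # 23 years, 25 members
corrected = standardized_reconstruction(hindcast, observed)
```

Because the transformation is linear, it changes the mean and spread of the hindcast but leaves the ranking of years and members, and hence correlations with observations, unchanged.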
We combine the bias corrected seasonal hindcast February rainfall and February AMO index, from the 1st of February (Feb_Start) initialisations, with the observed NDJ rainfall and AMO index. By combining observed and hindcast rainfall and AMO index, along with observed Lmin and Year, as input in the CHIRPS-ORI-Mar model, we can issue forecasts by early February. Further, by combining bias corrected seasonal hindcast JF rainfall and AMO index, from the 1st of January (Jan_Start) initialisations, and observed ND rainfall and AMO index, along with observed Lmin and Year, we can issue forecasts by early January. Thus, for each seasonal hindcast set, we produce two forecasts that can be issued in February and January for each year (Table 2). We refer to the new forecasts using their hindcast name and issuance time (CMCC-Feb, DWD-Feb, ECMWF-Feb, METFR-Feb, NCEP-Feb, UKMO-Feb, CMCC-Jan, DWD-Jan, ECMWF-Jan, METFR-Jan, NCEP-Jan, UKMO-Jan). We produce a water level forecast from each ensemble member of each seasonal hindcast, to obtain an ensemble of water level forecasts for each year of 1994-2016. For ECMWF-Feb and ECMWF-Jan, we also produce forecasts over a longer 1981-2016 period (see Section 4.3).
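The combination of observed and hindcast months for a single forecast year can be sketched as below. The sketch assumes an unweighted mean of monthly values for the NDJF predictor and a regression of the form intercept plus one coefficient per predictor; the function names and any coefficient values passed in are hypothetical and are not the CHIRPS-ORI-Mar parameters.

```python
import numpy as np


def combined_seasonal_mean(obs_months, hindcast_months):
    """Mean over observed months (n_obs,) and hindcast months (n_lead, n_members).

    Returns one seasonal-mean value per ensemble member.
    """
    n_months = obs_months.size + hindcast_months.shape[0]
    return (obs_months.sum() + hindcast_months.sum(axis=0)) / n_months


def ensemble_water_level(coef, rain_ndjf, amo_ndjf, lmin, year):
    """Apply regression coefficients (hypothetical values) to each ensemble member."""
    intercept, b_rain, b_amo, b_lmin, b_year = coef
    return intercept + b_rain * rain_ndjf + b_amo * amo_ndjf + b_lmin * lmin + b_year * year


# February issue: observed N, D, J plus hindcast F (lead month 0), two members.
feb_issue = combined_seasonal_mean(np.array([1.0, 2.0, 3.0]), np.array([[2.0, 6.0]]))
# January issue: observed N, D plus hindcast J and F (lead months 0 and 1).
jan_issue = combined_seasonal_mean(np.array([1.0, 3.0]),
                                   np.array([[2.0, 4.0], [2.0, 0.0]]))
```

Rainfall and AMO are handled the same way, and the resulting per-member predictors are passed with the observed Lmin and Year to `ensemble_water_level` to give one water level forecast per ensemble member.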
By combining all the ensemble forecasts for maximum water level issued in February (CMCC-Feb, DWD-Feb, ECMWF-Feb, METFR-Feb, NCEP-Feb, UKMO-Feb), we generate a multi-model super-ensemble forecast (MM-Feb) that can be issued in February. This super-ensemble is a collection of all ensemble members from all six forecasts. Similarly, we combine all ensemble forecasts issued in January (CMCC-Jan, DWD-Jan, ECMWF-Jan, METFR-Jan, NCEP-Jan, UKMO-Jan), to generate a multi-model super-ensemble forecast (MM-Jan) that can be issued in January.
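Pooling the members, as described above, amounts to concatenating the individual ensembles along the member axis. The sketch below uses hypothetical ensemble sizes and random placeholder values; the actual sizes are those listed in Table 1.

```python
import numpy as np


def pool_super_ensemble(forecasts):
    """Concatenate forecasts {name: (n_years, n_members_i)} along the member axis."""
    return np.concatenate([forecasts[name] for name in sorted(forecasts)], axis=1)


# Placeholder forecasts with hypothetical ensemble sizes.
rng = np.random.default_rng(3)
n_years = 23
sizes = {"CMCC-Feb": 40, "DWD-Feb": 30, "ECMWF-Feb": 25}
forecasts = {name: rng.normal(28.0, 1.0, (n_years, size)) for name, size in sizes.items()}
mm_feb = pool_super_ensemble(forecasts)
```

One consequence of pooling members directly is that a system with more members contributes more to the super-ensemble mean and to the derived probabilities than a system with fewer members.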
Fig. 2. Masked regions (Chevuturi et al., 2021) identified using statistically significant correlation values at the 5% level between CHIRPS grid point monthly-mean rainfall and observed annual maximum water level for the Negro River at Manaus (red circle; 3.14°S, 60.03°W), along with river basin catchments (regions surrounded by black lines). The Negro River basin is the northernmost basin; the Solimões River basin is the central basin; and the Madeira River basin is the southernmost basin.
We compare the performance of the 14 new maximum water-level forecasts, produced using combined observations and seasonal hindcasts, against the original three statistical forecasts, which only use observations. Further, we also analyse whether the new forecasts outperform baseline forecasts produced with the same statistical models with climatological and persistence forecasts of rainfall and AMO as input. To do this, we replace the hindcast data in the models above with: (i) 1994-2010 climatological CHIRPS rainfall and AMO index of February and January (CHIRPS-CLIM-Feb, CHIRPS-CLIM-Jan); (ii) seasonal persistence of CHIRPS rainfall and AMO index of February and January (CHIRPS-SEAS-Feb, CHIRPS-SEAS-Jan); and (iii) monthly persistence of CHIRPS rainfall and AMO index of February and January (CHIRPS-MON-Feb, CHIRPS-MON-Jan). For seasonal persistence, we replace February rainfall and AMO observations with the preceding NDJ-mean, to issue forecasts by February; and we replace JF rainfall and AMO with the preceding ND-mean, to issue forecasts by January. For monthly persistence, we replace February rainfall and AMO observations with the preceding January, to issue forecasts by February; and we replace January rainfall and AMO with the preceding December, to issue forecasts by January. Please see Table 2 for the full list of forecasts compared.
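The three February-issued baselines can be sketched as substitutions for the unknown February value before forming the NDJF mean. The array layout (columns ordered November, December, January, February), the unweighted monthly mean and the function name are assumptions made for this illustration; the January-issued baselines would substitute both January and February in the same manner.

```python
import numpy as np


def february_baselines(obs_ndjf, clim_rows):
    """NDJF-mean predictor with February replaced by three baseline substitutes.

    obs_ndjf: (n_years, 4) observed monthly values, columns N, D, J, F (assumed layout).
    clim_rows: index or slice selecting the years used for the February climatology.
    """
    ndj = obs_ndjf[:, :3]
    n_years = obs_ndjf.shape[0]
    substitutes = {
        "CLIM": np.full(n_years, obs_ndjf[clim_rows, 3].mean()),  # climatological February
        "SEAS": ndj.mean(axis=1),                                 # preceding NDJ mean
        "MON": obs_ndjf[:, 2],                                    # preceding January
    }
    return {name: (ndj.sum(axis=1) + feb) / 4.0 for name, feb in substitutes.items()}


obs = np.array([[1.0, 2.0, 3.0, 10.0],
                [3.0, 3.0, 3.0, 20.0]])
baselines = february_baselines(obs, slice(None))
```

The same substitution is applied to both rainfall and the AMO index, and the resulting predictors are fed to the unchanged statistical model.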
We validate ensemble mean forecasts using deterministic performance metrics (correlation coefficient, CC; root mean square error, RMSE) over the common period 1994-2016 (Table 2). To compare the models' performance fairly, we calculate distributions of model performance metrics (CC and RMSE), using a bootstrapping approach, by randomly generating 10000 samples from the validation years, with replacement, and show the 5th-95th percentiles of the distributions generated. To assess the significance of improvement in model performance against the baseline climatological forecast, we calculate the percentage of a model's distribution that falls outside the 95th percentile of the CC distribution and the 5th percentile of the RMSE distribution of the forecast generated using climatological input of the respective months (CC Score and RMSE Score). Thus, we compare all the models issuing forecasts in February against CHIRPS-CLIM-Feb and all the models issuing forecasts in January against CHIRPS-CLIM-Jan. For the CHIRPS-ORI-Mar model we also use CHIRPS-CLIM-Feb as a baseline.
We also validate the ensemble water-level forecasts for the probability that the river stage will exceed the reference level of 29 m, at which the Brazilian government declares an emergency, during the forthcoming annual flood season. To evaluate the categorical skill of the probabilistic forecast we use the following skill metrics: Accuracy or Hit rate (ACC; Wilks, 2011), Heidke skill score (HSS; Heidke, 1926), discrete Brier Skill Score (BSS; Weigel et al., 2007), discrete Ranked Probability Skill Score (RPSS; Weigel et al., 2007) and Area under the relative operating characteristic curve (ROC; Mason, 1982). ACC and HSS are calculated for each ensemble member and then averaged over the ensemble.
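A minimal sketch of the bootstrap comparison follows. It reads "outside" as CC above the baseline's 95th percentile and RMSE below the baseline's 5th percentile, and it reuses the same random seed so that all forecasts are resampled with the same years; both are assumptions of this sketch. The function names and synthetic forecasts are hypothetical.

```python
import numpy as np


def bootstrap_metrics(forecast, observed, n_samples=10000, seed=0):
    """Bootstrap distributions of CC and RMSE by resampling years with replacement."""
    rng = np.random.default_rng(seed)
    n_years = len(observed)
    idx = rng.integers(0, n_years, size=(n_samples, n_years))
    f, o = forecast[idx], observed[idx]
    f_anom = f - f.mean(axis=1, keepdims=True)
    o_anom = o - o.mean(axis=1, keepdims=True)
    cc = (f_anom * o_anom).sum(axis=1) / np.sqrt(
        (f_anom ** 2).sum(axis=1) * (o_anom ** 2).sum(axis=1))
    rmse = np.sqrt(((f - o) ** 2).mean(axis=1))
    return cc, rmse


def scores_vs_baseline(cc, rmse, cc_base, rmse_base):
    """Percentage of samples beating the baseline's 95th (CC) and 5th (RMSE) percentile."""
    cc_score = 100.0 * np.mean(cc > np.percentile(cc_base, 95))
    rmse_score = 100.0 * np.mean(rmse < np.percentile(rmse_base, 5))
    return cc_score, rmse_score


# Synthetic example: a skilful forecast against a noisy baseline, 23 validation years.
rng = np.random.default_rng(42)
observed = rng.normal(28.0, 1.0, 23)
good = observed + rng.normal(0.0, 0.1, 23)
baseline = observed + rng.normal(0.0, 1.0, 23)

cc_good, rmse_good = bootstrap_metrics(good, observed)
cc_base, rmse_base = bootstrap_metrics(baseline, observed)
cc_score, rmse_score = scores_vs_baseline(cc_good, rmse_good, cc_base, rmse_base)
```

A forecast indistinguishable from the baseline would score about 5% on each measure by construction, so scores well above 5% indicate an improvement over the climatological input.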
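Some of these categorical measures can be sketched for the two-category case. Note the simplifications: the Brier skill score below is the standard form against sample climatology, not the discrete (ensemble-size adjusted) version of Weigel et al. (2007), and RPSS and ROC are omitted. The function names and the four-year toy data are hypothetical.

```python
import numpy as np

THRESHOLD_M = 29.0  # emergency flood level at Manaus


def exceedance_probability(ensemble, threshold=THRESHOLD_M):
    """Fraction of members above the threshold; ensemble is (n_years, n_members)."""
    return (ensemble > threshold).mean(axis=1)


def accuracy_and_hss(ensemble, observed, threshold=THRESHOLD_M):
    """ACC and HSS computed per member from the 2x2 table, then averaged over members."""
    obs_event = observed > threshold
    acc, hss = [], []
    for member in ensemble.T:
        fc_event = member > threshold
        hits = np.sum(fc_event & obs_event)
        false_alarms = np.sum(fc_event & ~obs_event)
        misses = np.sum(~fc_event & obs_event)
        correct_neg = np.sum(~fc_event & ~obs_event)
        n = hits + false_alarms + misses + correct_neg
        correct = (hits + correct_neg) / n
        # Proportion correct expected by chance from the marginal totals.
        expected = ((hits + false_alarms) * (hits + misses)
                    + (misses + correct_neg) * (false_alarms + correct_neg)) / n ** 2
        acc.append(correct)
        hss.append((correct - expected) / (1.0 - expected) if expected < 1.0 else 0.0)
    return float(np.mean(acc)), float(np.mean(hss))


def brier_skill_score(prob, observed, threshold=THRESHOLD_M):
    """Standard BSS relative to the sample climatological frequency."""
    obs_event = (observed > threshold).astype(float)
    brier = np.mean((prob - obs_event) ** 2)
    brier_ref = np.mean((obs_event.mean() - obs_event) ** 2)
    return 1.0 - brier / brier_ref


ensemble = np.array([[30.0, 30.0], [28.0, 28.0], [30.0, 28.0], [28.0, 28.0]])
observed = np.array([29.5, 28.0, 30.0, 27.0])
prob = exceedance_probability(ensemble)
acc, hss = accuracy_and_hss(ensemble, observed)
bss = brier_skill_score(prob, observed)
```

In this toy case the first member is correct in all four years and the second misses one flood year, giving a mean ACC of 0.875, a mean HSS of 0.75 and a BSS of 0.75.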

Precipitation and AMO hindcasts performance
We use the 1st of February (Feb_Start) seasonal hindcasts from six prediction centres (CMCC, DWD, ECMWF, METFR, NCEP, UKMO), for February rainfall and SST (lead month 0) and 1st of January hindcasts (Jan_Start) for January (lead month 0) and February (lead month 1) rainfall and SST. Before we discuss the verification of the annual maximum water level forecasts, we discuss the performance of seasonal rainfall and SST hindcasts for 1994-2016.
Rainfall performance for lead month 0 for both initialisations, measured by grid-point wise CC between CHIRPS and hindcast monthly-mean rainfall, is statistically similar among most investigated models over the northern parts of the catchment area (Fig. 3). The correlations are highest and cover most of the catchment for ECMWF, and weakest and cover the least of the catchment for METFR. As expected, the hindcasts show reduced performance at longer lead times, i.e. for lead month 1, February from Jan_Start (Fig. 3g-l), when compared to shorter lead times, i.e. for lead month 0, January from Jan_Start (Fig. 3a-f). As correlations between grid-point hindcast rainfall and observed maximum water levels show weaker relationships than correlations of observed rainfall with maximum water levels (not shown), we do not use the rainfall masks from the seasonal hindcasts. Instead we use the regridded CHIRPS rainfall mask to calculate the input hindcast rainfall index. This method allows us to represent the real-world relationship between rainfall and water level, and not the (likely) biased relationship in the models.
Hindcast rainfall biases are well established by lead month 0 over most of South America (Fig. 4a-f, 4m-r). Within the catchment region, there are strong positive rainfall biases over the Andes, in all six hindcasts, which are most probably associated with errors in simulating orographic precipitation. CMCC, NCEP and UKMO show a dry bias over the catchment area, whereas ECMWF shows a wet bias and DWD and METFR have mixed wet and dry biases over different parts of the catchment area. These biases in the hindcasts grow with lead time (Fig. 4g-l).
Using combinations of SST observations and hindcasts, we calculate the AMO index for all ensemble members for January and February lead times (Fig. 5). For forecasts issued in February, we combine the observed NDJ AMO index with the Feb_Start hindcast data for the February (lead month 0) AMO index. For forecasts issued in January, we combine the observed ND AMO index with the Jan_Start hindcast data for the JF (lead months 0 and 1) AMO index. These AMO indices, from all six hindcast sets, for both lead times, show similar interannual variability to that observed (Fig. 5). The hindcast AMO index initialised on the 1st of January shows a larger ensemble spread than the index for hindcasts initialised on the 1st of February, which is expected due to the increase in spread with lead time. The bias in the forecasted AMO index also increases with lead time.

Maximum water level forecast verification
We validate the seven forecasts (one for each of the six models, plus the multi-model ensemble of these six models) for annual maximum water level for the Negro River at Manaus calculated using hindcast input for two start dates: February and January. Thus, we have 14 forecasts using hindcast input: CMCC-Feb, DWD-Feb, ECMWF-Feb, METFR-Feb, NCEP-Feb, UKMO-Feb, MM-Feb, CMCC-Jan, DWD-Jan, ECMWF-Jan, METFR-Jan, NCEP-Jan, UKMO-Jan, MM-Jan (Fig. 7). These 14 forecasts have an ensemble computed from the hindcast ensemble members. We first compare the performance of the ensemble mean forecasts (Section 4.1) against the three original statistical forecasts developed by Chevuturi et al. (2021), that can issue forecasts in January (CHIRPS-ORI-Jan), February (CHIRPS-ORI-Feb) and March (CHIRPS-ORI-Mar), and forecasts developed using observed climatology (CHIRPS-CLIM-Feb, CHIRPS-CLIM-Jan), seasonal persistence (CHIRPS-SEAS-Feb, CHIRPS-SEAS-Jan) and monthly persistence (CHIRPS-MON-Feb, CHIRPS-MON-Jan) as input for rainfall and AMO (Fig. 6). Next we verify the performance of probabilistic forecasts given by each of the 14 ensemble-based forecasts against the observed annual maximum water level (Section 4.2). Please see Section 2.2 for more details about the forecasts discussed here.

Deterministic forecasting
Variability of the annual maximum water level at Manaus, over 1994-2015, is generally well represented by the original statistical models (Fig. 6a) and even by the forecasts that use observed persistence and climatology as input for rainfall and AMO (Fig. 6b-d). The ensemble mean forecasts, using seasonal hindcast input, also adequately represent the variability of the annual maximum water level (Fig. 7). For the years 2000, 2002, 2010 and 2013 we note similarly strong negative biases in all forecasts. As in Chevuturi et al. (2021), the forecasts show reduced performance for CC and RMSE with increased lead time (March-to-February-to-January; Table 2). We further look at distributions of CC and RMSE for a fair comparison of all forecasts (Fig. 8a-b).
The baseline forecasts with seasonal and monthly persistence and climatology issued in February have similar performance (CC; Fig. 8a) to CHIRPS-ORI-Feb but show larger errors (RMSE; Fig. 8b), and have reduced performance compared to CHIRPS-ORI-Mar (Table 2), as expected. We see similarly reduced performance in baseline forecasts issued in January, except for CHIRPS-MON-Jan, which shows similar performance (high confidence) to CHIRPS-ORI-Feb. When comparing forecasts, we assign high confidence when at least 95% of a forecast's distribution falls within the 5th-95th percentile ranges of another forecast's CC and RMSE distributions. The forecasts issued in February using seasonal hindcasts have similar performance to CHIRPS-ORI-Feb and are not significantly better than the forecasts issued using persistence and climatology, except for ECMWF-Feb and MM-Feb (Fig. 8). ECMWF-Feb outperforms CHIRPS-ORI-Feb, and shows similar performance to CHIRPS-ORI-Mar (high confidence). This suggests that we can gain one month of lead time by combining observations and ECMWF seasonal forecasts, for the same performance (Table 2; Fig. 8).
Table 2. Description of input predictors for statistical forecast models for annual maximum water level for Manaus. The statistical forecasts are issued using original Chevuturi et al. (2021) models (ORI) in March (CHIRPS-ORI-Mar), February (CHIRPS-ORI-Feb) and January (CHIRPS-ORI-Jan). Forecasts from this study are calculated using observed seasonal persistence (SEAS), monthly persistence (MON) and climatology (CLIM), and seasonal hindcasts (CMCC, DWD, ECMWF, METFR, NCEP, UKMO and MM) for both February and January. Columns CC and RMSE show metrics for the deterministic and ensemble mean forecasts. Columns CC Score and RMSE Score show the percentage of bootstrap samples for each metric that are outside the 5th-95th percentile of the distributions of CHIRPS-CLIM-Feb (for March and February models) and of CHIRPS-CLIM-Jan (for January models).
MM-Jan (CMCC-Jan, DWD-Jan, ECMWF-Jan, METFR-Jan, NCEP-Jan, UKMO-Jan): CC = 0.83, RMSE = 0.60, CC Score = 11.56, RMSE Score = 13.75.
The forecasts issued in January, using seasonal hindcasts, also have similar performance to CHIRPS-ORI-Jan, except for ECMWF-Jan and MM-Jan. For January, the ECMWF-Jan forecasts outperform CHIRPS-ORI-Jan, but perform similarly to CHIRPS-MON-Jan (high confidence). However, the ECMWF-Jan forecasts are similar to CHIRPS-ORI-Feb (moderate confidence). We assign moderate confidence to a comparison of forecast performance when only 75% of the distributions for CC and RMSE overlap between the forecasts.
A. Chevuturi et al., Climate Services 30 (2023)
For ECMWF-Feb we find that 50% of the CC distribution and 20% of the RMSE distribution are outside the 5th-95th percentile range of CHIRPS-CLIM-Feb, as with CHIRPS-ORI-Mar (Table 2). For the ECMWF-Jan forecasts, these percentages fall to 20% of the CC and RMSE distributions being outside the 5th-95th percentile range of CHIRPS-CLIM-Jan. CHIRPS-MON-Jan performs slightly better than ECMWF-Jan. However, the ECMWF-based maximum water level forecasts are significantly better than the water level forecasts using climatology as input. All other hindcast-based forecasts perform similarly to, or worse than, the climatology-based forecasts. Further, multi-model mean forecasts (MM-Feb and MM-Jan) show no improvement over their respective ECMWF counterparts (ECMWF-Feb and ECMWF-Jan) and offer little value over climatology or persistence for forecasting water levels.

Probabilistic forecasting
To verify the 14 probabilistic forecasts of maximum water level at Manaus issued in January and February, we compare probabilities derived from the forecast ensemble against the observed annual maximum water level for 1994-2016. Ensemble forecasts issued in January have a much larger ensemble spread than the forecasts issued in February (Fig. 7), as expected. For the January forecasts, we include lead months 0 and 1, whereas for February forecasts, we only include lead month 0. We see a similar increase in ensemble spread with lead time for AMO forecasts (Fig. 5).
The ensemble forecasts of all models include the observed maximum water level values for most years, with few exceptions (e.g. 1998, 2001, 2002, 2010; Fig. 7). We objectively measure the probabilistic skill for a two-category forecast (above and below the 29 m water level threshold) using ACC, HSS, RPSS and ROC (Section 2.2). As for ensemble mean forecasts, the ECMWF-based probabilistic forecasts (ECMWF-Feb and ECMWF-Jan) are also best at both lead times. Further, the multimodel super-ensemble forecasts (MM-Feb and MM-Jan) are not significantly better than the ECMWF-based forecasts.
As ECMWF-based water level forecasts (ECMWF-Feb and ECMWF-Jan) clearly outperform the other hindcast-based forecasts, we evaluate the ECMWF forecasts further for the full period available. Over 1981-2016 the performance metrics for the ensemble mean ECMWF-Feb forecasts (CC = 0.91, RMSE = 0.45 m) and ECMWF-Jan forecasts (CC = 0.83, RMSE = 0.59 m) remain similar to those for the 1993-2016 period (Table 2).
We also evaluate the probabilistic skill of the ECMWF forecasts over 1981-2016 (Fig. 8c). The ECMWF-Feb ensemble forecast has 90% accuracy (ACC ≈ 0.90) in forecasting the category (above or below the 29 m water level), with 65% accuracy after correcting for those forecasts which would be correct due to chance (HSS ≈ 0.65). ECMWF-Feb also shows better probabilistic forecast performance relative to the reference climatological forecast for the two categories (RPSS > 0); the forecast can clearly discriminate between the two alternative outcomes (ROC > 0.9). This suggests that ECMWF-Feb probabilistic forecasts have clear potential for operational use. ECMWF-Jan probabilistic forecasts have lower skill than ECMWF-Feb probabilistic forecasts, as expected, associated with loss of sharpness in the ECMWF-Jan ensemble forecasts relative to ECMWF-Feb (Fig. 7c). However, the probabilistic skill for January is still moderate and may have some use for earlier warnings of flood events.
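As an illustration of how ACC and HSS relate for a two-category forecast, the sketch below computes both from a 2x2 contingency table using the standard textbook definitions. The function name two_category_scores and the input data are hypothetical; how the ensemble is converted into a categorical forecast is not described in this section, so the inputs here are simply yes/no flags per year.

```python
def two_category_scores(forecast_event, observed_event):
    """Return (ACC, HSS) for paired boolean forecasts and observations.

    forecast_event[i] is True if the forecast for year i is "above threshold";
    observed_event[i] is True if the observed maximum exceeded the threshold.
    """
    pairs = list(zip(forecast_event, observed_event))
    n = len(pairs)
    hits = sum(1 for f, o in pairs if f and o)
    false_alarms = sum(1 for f, o in pairs if f and not o)
    misses = sum(1 for f, o in pairs if not f and o)
    correct_negatives = sum(1 for f, o in pairs if not f and not o)

    # ACC: fraction of years in which the forecast category was correct.
    acc = (hits + correct_negatives) / n

    # Number of correct forecasts expected by chance, from the marginal totals.
    expected_correct = (
        (hits + false_alarms) * (hits + misses)
        + (misses + correct_negatives) * (false_alarms + correct_negatives)
    ) / n

    # HSS: accuracy after removing the forecasts correct due to chance.
    hss = (hits + correct_negatives - expected_correct) / (n - expected_correct)
    return acc, hss


# Hypothetical four-year example (not data from this study).
acc, hss = two_category_scores([True, True, False, False],
                               [True, False, False, False])
print(acc, hss)  # 0.75 0.5
```

The gap between the two values shows why HSS is reported alongside ACC: when one category dominates the record, many forecasts are correct by chance alone.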
For forecasts of terciles of maximum water level, the ECMWF forecasts have higher skill in the lower and the upper tercile than the middle tercile (not shown), which suggests that the forecasts predict extreme conditions much better than near-normal conditions. This improved skill may stem from higher forecast skill for upper-tercile and lower-tercile rainfall, at subseasonal timescales over the Amazon basin. ECMWF ensemble forecasts over South America not only produce skilful forecasts of water level using statistical models, as in the current study, but have also been used to provide skilful streamflow forecasts using hydrologic-hydrodynamic models (Siqueira et al., 2020).

Real-time forecasting
To further test our method for real-time operational forecasting application, we use the operational seasonal forecasts provided by the ECMWF available for 2017-2021. The only difference between the seasonal forecasts (2017-2021) and the hindcasts (1981-2016) is the ensemble size: 51 vs 25 members respectively. We cannot test this for all models as some prediction centres do not make forecasts available for the full 2017-2021 period. We retrospectively issue forecasts of the annual maximum water level at Manaus in January and February of 2017-2021, using combined data from observations and ECMWF forecasts. We compare these retrospective forecasts with observations and CHIRPS-ORI-Mar forecasts (Fig. 8d). The ECMWF-Jan and ECMWF-Feb forecast ensemble spread includes the observations for most years, with only the 2017 and 2021 values outside the ECMWF-Feb distribution. For 2017, both ECMWF-Feb and CHIRPS-ORI-Mar overestimate the maximum water level; the ECMWF-Jan ensemble mean is much closer to the observations. The January-issued forecasts have a much larger spread and thus are more likely to include the observations, but also have higher forecast uncertainty. For 2021, all of the forecasts underestimate the extreme flood (30.02 m), with only some extreme ensemble members for ECMWF-Jan coming close to the observations. However, we note that ECMWF-Feb and ECMWF-Jan predict extreme flood levels (greater than 29 m) for 2021 with 90% and 73% probability respectively. For all the other years, ECMWF-Feb outperforms ECMWF-Jan, with a much lower spread and thus lower forecast uncertainty. Our analysis of the ECMWF-based forecasts indicates that ECMWF-Feb and ECMWF-Jan ensemble forecasts can provide useful and skilful real-time operational probabilistic forecasts.
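A probability of exceeding the 29 m severe flood level, such as those quoted for 2021, can be read off an ensemble as the fraction of members above the threshold. The sketch below assumes equally weighted members; the function name exceedance_probability and the member values are hypothetical and are not the forecasts from this study.

```python
def exceedance_probability(members, threshold=29.0):
    """Fraction of ensemble members strictly above the threshold (in metres)."""
    return sum(1 for level in members if level > threshold) / len(members)


# Hypothetical four-member ensemble of forecast maximum water levels (m).
members = [28.5, 29.1, 29.4, 30.0]
ensemble_mean = sum(members) / len(members)
print(exceedance_probability(members))  # 0.75
```

With this reading, an ensemble mean can fall short of an observed extreme while most members still exceed the threshold, which is how a forecast may underestimate the peak yet assign a high probability to severe flooding.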

Summary and Conclusions
Flooding in the rivers of the Amazon basin, like the Negro River that flows through Manaus, can be devastating to the surrounding areas (Marengo et al., 2013; Marengo and Espinoza, 2016). It is therefore critical to advance the prediction of high water levels of Amazonian rivers, to provide more effective and earlier warnings of impending disasters, for more effective action to safeguard lives and livelihoods (Schöngart and Junk, 2007; Maciel et al., 2020). Operational forecasts of maximum water level for Manaus are provided by Brazilian institutes CPRM (Maciel et al., 2020; Maciel et al., 2022) and INPA (Schöngart and Junk, 2007; Schöngart and Junk, 2020). Chevuturi et al. (2021) developed statistical forecast models for the annual maximum water level of the Negro River at Manaus, based on catchment rainfall (the predominant predictor), large-scale teleconnection indices, the long-term linear trend of water level and antecedent water levels. These forecasts, which only use observations as input, showed operationally viable performance, at the current earliest operational lead time of March, by advancing the skill of the operational models.
In this study, we investigated extending the lead time of the original statistical model, issued in March, by incorporating dynamical seasonal forecast information, to develop operationally useful and skilful forecasts for annual maximum water level at Manaus. We generated forecasts that combine observations and seasonal hindcasts from six prediction centres as input to the original model. The new forecasts can be issued operationally at two lead times: February and January. We compared the new forecasts against the original statistical models and forecasts generated using persistence and climatology of rainfall and AMO observations over the period 1994-2016. Our results are summarized below:
• Of all hindcast-based forecasts issued in February, only ECMWF-based forecasts perform better than the original statistical February forecast (75% confidence). The improved performance of ECMWF