GRACE Satellites Enable Long-Lead Forecasts of Mountain Contributions to Streamflow in the Low-Flow Season

Terrestrial water storage (TWS) in high mountain areas contributes large runoff volumes to nearby lowlands during the low-flow season when streamflow is critical to downstream water supplies. The potential for TWS from GRACE (Gravity Recovery and Climate Experiment) satellites to provide long-lead streamflow forecasting in adjacent lowlands during the low-flow season was assessed using the upper Yellow River as a case study. Two linear models were trained for forecasting monthly streamflow with and without TWS anomaly (TWSA) from 2002 to 2016. Results show that the model based on streamflow and TWSA is superior to the model based on streamflow alone at up to a five-month lead-time. The inclusion of TWSA reduced errors in streamflow forecasts by 25% to 50%, with 3–5-month lead-times, which represents the role of terrestrial hydrologic memory in streamflow changes during the low-flow season. This study underscores the high potential of streamflow forecasting using GRACE data with long lead-times that should improve water management in mountainous water towers and downstream areas.


Introduction
Long-term streamflow forecasts often depend heavily on terrestrial hydrologic memory [1,2]. As sparse in situ observations and large uncertainties in climate forecasts impede long-lead streamflow forecasting in mountainous regions [3], long-term memory of soil moisture and groundwater can significantly contribute to forecast skill [1,4]. However, the use of terrestrial hydrologic memory for streamflow forecasts has been restricted due to limited observations [5,6]. In recent decades, remotely sensed data, such as snow and soil moisture, have been increasingly used for streamflow forecasting, and may play a more important role in hydrology in the future [7], despite considerable uncertainty [8]. Among others, the original Gravity Recovery and Climate Experiment (GRACE), and the GRACE follow-on satellite missions, provide unique observations of Terrestrial Total Water Storage (TWS), including snow, canopy, surface water, soil moisture, and groundwater [9].
TWS anomalies (TWSAs) inferred from GRACE satellites have global coverage and are provided with about 1-month latency [10], and have been widely used for detecting changes in TWS and groundwater in many regions [11,12]. GRACE TWSA has also shown great potential for streamflow prediction at short term or seasonal timescales [13,14]. GRACE TWSA allows for the early warning of flood events, with more than 5-month lead times in the U.S. Missouri River Basin [15]. In the Ganges-Brahmaputra-Meghna Rivers, fairly reliable streamflow forecasts with 2-3-month lead times were achieved with GRACE TWSA data alone [16]. The important role of GRACE TWSA was further recognized when used in conjunction with meteorological data in seasonal streamflow forecasts in Central Asia [17]. Previous studies indicate that streamflow forecasting during the flood season generally benefited from month-to-month variations in the terrestrial hydrological memory represented by GRACE TWSA. The strength of the terrestrial hydrological memory may vary seasonally and over different hydrological regimes [6,18,19]. Thus, the contribution of GRACE TWSA to forecast skill would differ across seasons with different hydrological regimes.
Snow and soil moisture conditions, which represent the initial conditions in hydrological models, have the strongest influence on streamflow forecasts, especially for flood prediction during the wet season [20][21][22]. However, TWS components are not well represented yet in current hydrological models, which makes it a challenge to model the variation in streamflow during low-flow seasons [23]. Previous studies mainly focused on applying GRACE TWSA in forecasting wet season streamflow; however, it is unclear how knowledge of water stored during the wet and warm season would aid in streamflow forecasts in the subsequent dry season.
Predicting low flow in a changing climate is critically important to society. Variations in streamflow during low-flow periods destabilize water supplies to humans and the environment downstream to different degrees. River low-flows, which are essential for municipal water supply, irrigation, industry, hydropower, river navigation, recreation, and wildlife conservation, face unprecedented threats from climate extremes and change [24][25][26][27]. Observed evidence of drier dry seasons, predominantly in extratropical latitudes, suggests that reduced dry season water availability can be attributed to human-induced climate change as a consequence of increasing evapotranspiration under global warming [28][29][30]. Thus, an improved understanding of variations in low-flow and the extension of long-lead forecasts are essential for enhancing water resource management downstream.
The objective of this study was to test the potential of the GRACE TWSA signal, obtained in the wet and warm season, for improving long-lead streamflow forecasts during the low-flow cold season. The Upper Yellow River (UYR) in China is an ideal system to test the approach because of the large variability in streamflow and the importance of low flows to downstream users. The UYR originates in the Tibetan Plateau, which is known as the water tower of the Yellow River because of its significant contribution (~34%) to the total runoff of the entire Yellow River basin [31]. In the UYR, precipitation mostly occurs during the warm season (April to October), and low streamflow in the following cold dry season (November to March) is mostly associated with TWS from the preceding season's precipitation [32,33]. Low river flows from the UYR profoundly influence ecosystems and economies dependent on the water source [25,34,35]. Novel aspects of this study are the use of the GRACE TWSA signal obtained in the wet season for low-flow prediction and extending the lead times up to 5 months at a mountainous river basin, with limited ground observations. This study should have important implications for streamflow management and improvement of hydrological modeling in the Yellow River, with potential applications to similar river basins globally that are fed by mountainous watersheds.

Precipitation, Streamflow and TWSA Data
Monthly streamflow data from 2002 through 2016 were collected for the control station in the UYR of China, i.e., the Tangnaihai station (altitude 2.7 km; Figure 1), from annual hydrological yearbooks [36]. Monthly areal precipitation during 2002 to 2016 was calculated as the average precipitation of all grid cells within the UYR from the China meteorological forcing dataset [37]. The catchment area of the UYR (122,000 km 2 ) is at the lower limit of GRACE resolution (~100,000 km 2 ). Monthly TWSA in the UYR from April 2002 through December 2016 was derived from the GRACE Release 06 (RL06) mascon solutions provided by NASA Jet Propulsion Laboratory (JPL) and the University of Texas Center for Space Research (CSR). The JPL GRACE RL06 mascon solution (version 2.0), with a spatial resolution of 0.5 • , includes the coastline resolution improvement [38,39]. The native resolution of CSR GRACE RL06 mascon solution (version 1) is 1 • and was downscaled to 0.25 • [40]. TWSAs were calculated from the two GRACE solutions by removing the mean for the 2002-2016 period. In this study, monthly TWSA data were spatially aggregated over the GRACE grids weighted by the fraction covered by the UYR basin. A total of 19 months of data are missing from the GRACE products from April 2002 to December 2016 due to satellite operational issues. The data in October are missing in 2012, 2015, and 2016. Linear interpolation was used to fill the missing months using the nearest two neighbors. The length of individual data gaps is less than 2 months. Thus, the data filled by linear interpolation should have minimal impact on the results [41,42]. The average TWSA of the JPL and CSR solutions was used in the main analysis, and the results of each solution are provided in the Supplementary Materials. According to the streamflow climatology in the UYR (Figure 2), the low-flow season is defined as the period from November to March, and the TWSA data from October were used as predictors of streamflow during the low-flow season.
Remote Sens. 2021, 13, x FOR PEER REVIEW 3 of 13 meteorological forcing dataset [37]. The catchment area of the UYR (122,000 km 2 ) is at the lower limit of GRACE resolution (~100,000 km 2 ). Monthly TWSA in the UYR from April 2002 through December 2016 was derived from the GRACE Release 06 (RL06) mascon solutions provided by NASA Jet Propulsion Laboratory (JPL) and the University of Texas Center for Space Research (CSR). The JPL GRACE RL06 mascon solution (version 2.0), with a spatial resolution of 0.5°, includes the coastline resolution improvement [38,39]. The native resolution of CSR GRACE RL06 mascon solution (version 1) is 1° and was downscaled to 0.25° [40]. TWSAs were calculated from the two GRACE solutions by removing the mean for the 2002-2016 period. In this study, monthly TWSA data were spatially aggregated over the GRACE grids weighted by the fraction covered by the UYR basin. A total of 19 months of data are missing from the GRACE products from April 2002 to December 2016 due to satellite operational issues. The data in October are missing in 2012, 2015, and 2016. Linear interpolation was used to fill the missing months using the nearest two neighbors. The length of individual data gaps is less than 2 months. Thus, the data filled by linear interpolation should have minimal impact on the results [41,42]. The average TWSA of the JPL and CSR solutions was used in the main analysis, and the results of each solution are provided in the Supplementary Materials. According to the streamflow climatology in the UYR (Figure 2), the low-flow season is defined as the period from November to March, and the TWSA data from October were used as predictors of streamflow during the low-flow season.

Linear Models
Two linear models (LM1 and LM2), following Reager et al. [15], were used to develop monthly streamflow forecasts. LM1 included one predictor which can be monthly streamflow, TWSA, or 5-month cumulative precipitation. Taking streamflow (Q) for an example: whereQ t is the predicted monthly streamflow at t month, Q t−τ is the observed streamflow at t − τ month (τ months before time t), τ is the lead time (months), a 1 is the regression coefficient, and c 1 is a constant. LM2 used a combination of two to three of the predictors (e.g., streamflow, TWSA, or 5-month cumulative precipitation), and taking streamflow and TWSA for an example:Q

Linear Models
Two linear models (LM1 and LM2), following Reager et al. [15], were used to develo monthly streamflow forecasts. LM1 included one predictor which can be monthly stream flow, TWSA, or 5-month cumulative precipitation. Taking streamflow (Q) for an examp = × ( where is the predicted monthly streamflow at t month, Qt−τ is the observed streamflo at t − τ month (τ months before time t), τ is the lead time (months), a1 is the regressio coefficient, and c1 is a constant. LM2 used a combination of two to three of the predicto (e.g., streamflow, TWSA, or 5-month cumulative precipitation), and taking streamflo and TWSA for an example: where a2 and b2 are regression coefficients, c2 is the intercept. The coefficients for the line models were determined using the least-squares method implemented by the pytho package scikit-learn (https://scikit-learn.org/, accessed on 19-05-2021). In this stud streamflow and TWSA from October were used as predictors of streamflow from Novem ber to March. Streamflow and TWSA time series were divided into a 13-(or 12)-year training p riod and a two-year forecast period. More specifically, the length of the training datas was 13 years for the target months of November and December, and 12 years for Janua to March. A test suggested that longer forecast periods (shorter training periods) wou result in slightly poorer linear models due to limited records (Figure S1 online). The boo strap method was used to resample the training and forecast datasets 100 times to estima Streamflow and TWSA time series were divided into a 13-(or 12)-year training period and a two-year forecast period. More specifically, the length of the training dataset was 13 years for the target months of November and December, and 12 years for January to March. A test suggested that longer forecast periods (shorter training periods) would result in slightly poorer linear models due to limited records (Figure S1 online). The bootstrap method was used to resample the training and forecast datasets 100 times to estimate the uncertainty in the linear models caused by the selection of training data. Accordingly, an ensemble (100 time series) of streamflow forecasts was generated. We used the ensemble medians of streamflow simulations for the main analysis, with uncertainty estimated from the 25th and 75th percentiles of the ensemble.
The contribution of GRACE TWSA data to the forecasts was evaluated by comparing monthly streamflow from the models and observations during the low-flow season. Relative error (RE) and mean relative absolute error (MRAE) were used to evaluate the performance of the linear models, which were calculated as where Q t,f and Q t,o are the forecasted and observed monthly streamflow, respectively, at month t, N is the number of data points (the number of years in this study), and i indicates the ith year. The RE and MRAE were calculated for all data, i.e., including the training and forecast data. The adjusted R 2 (R 2 adj ) was also used to examine the goodness of fit: where k is the number of independent predictors, Q t,o and Q t, f are averages of observed and forecasted monthly streamflow, respectively, σ Q t,o and σ Q t, f are the standard deviation of the observed and forecasted monthly streamflow, respectively.

Results
TWS changes more slowly than streamflow during the shift from high-flow to low-flow seasons ( Figure 2). Mean monthly streamflow at the Tangnaihai station was~5.4 mm (converted to runoff depth) and mean monthly precipitation was~4.3 mm over the UYR during the low-flow season for 2002-2016. Streamflow from November to February exceeded precipitation input, implying that water stored from the previous season's precipitation contributed a considerable volume of discharge during the low-flow season. Unlike streamflow, which is lowest in January and February, TWSA often reaches the lowest values in April, during the transition from low-flow to high-flow seasons. The potential contribution of TWSA to discharge can be inferred from the lag-correlations between TWSA in October and streamflow in the months from November to March (see the inner plot in Figure 2). The lag-correlations are statistically significant (at the 95% level) and are particularly high (R ≥ 0.8) for January and February. The correlation coefficients between streamflow in October and streamflow are larger in November and December than those for TWSA and streamflow in November and December; thereafter, autocorrelations in streamflow decrease for the following months and are smaller than those between TWSA in October and streamflow in January, February, and March (see the inner plot in Figure 2).
Using LM1 and LM2, we predicted monthly streamflow in the low-flow season (November through March) at the end of the preceding high-flow season (October). The linear models were fitted with the least-squares method, and the Akaike information criterion (AIC) values were estimated ( Table 1). The results based on the average TWSA of JPL and CSR solutions show that AIC values of LM1 are generally smaller than those of LM2 for the predictions of November and December, while the opposite is found for the predictions of January to March.
The performances of the linear models based on the average TWSA of the JPL and CSR solutions are shown in Figure 3 (see also Figures S2 and S3 online). As indicated by the mean relative absolute errors (MRAEs) (Figure 3A), the performance of LM1 with predictor streamflow (LM1_Q) is better for target months November and December, while LM1 with predictor TWSA (LM1_S) is better for target months January to March. The LM1 with predictor cumulative precipitation (LM1_P) is the worst. The MRAEs of the LM2 with predictors streamflow and TWSA (LM2_QS) are significantly lower than those of LM1_Q for all target months (Table S1 online), while the MRAEs of the LM2 with predictors cumulative precipitation and TWSA (LM2_PS) and the LM2 with predictors TWSA are significantly higher than those of LM1_Q for target months November and December, but are significantly lower (at the 95% confidence level) for target months from January to March. Among others, LM2_QS shows the best performance for all target months. The median MRAEs of LM2_QS in these target months are 7.9% (January), 12.8% (February), 10.7% (March), and 9.3% (JFM), which are at least 25% less than those of LM1_Q. Overall, the inclusion of GRACE TWSA reduced the error in streamflow forecasts with a lead time of 3-5 months by 25% to 50%.  Improved performance of the linear models due to the inclusion of TWSA is also indicated by the adjusted R 2 values ( Figure 3B). The LM1_Q, LM2_QS, and LM2_QP have similar adjusted R 2 values for target months November (>~0.7) and December (~0.5). LM1_P has the lowest adjusted R 2 for all target months, while the adjusted R 2 of LM1_S is much higher than that of LM1_Q for January, February, March, and JFM. LM2_QS shows Improved performance of the linear models due to the inclusion of TWSA is also indicated by the adjusted R 2 values ( Figure 3B). The LM1_Q, LM2_QS, and LM2_QP have similar adjusted R 2 values for target months November (>~0.7) and December (~0.5). LM1_P has the lowest adjusted R 2 for all target months, while the adjusted R 2 of LM1_S is much higher than that of LM1_Q for January, February, March, and JFM. LM2_QS shows the best performance in January, with a median adjusted R 2 ≥ 0.8, and~0.7 for February and JFM. The median adjusted R 2 values of LM1_Q are~0.3-0.4 for January and February, and~0.1 for March. Differences in median adjusted R 2 values between LM1_Q and LM2_QS are~0.3-0.4 for January to March, suggesting that the inclusion of GRACE TWSA in LM2 can explain~30-40% more variance in streamflow forecasts. LM2_QS represented the improvement of streamflow forecast well relative to LM1_Q, due to the inclusion of TWSA. Therefore, for simplicity, the results of these two models are presented in the following analysis.
The REs in the simulated streamflow by LM1_Q are larger than those for LM2_QS ( Table 2) for most years. Specifically, about 60% to 70% of REs of LM2_QS predictions are less than those of LM1_Q. Only four REs are out of the ±20% range for LM2_QS, which is much less compared to the 21 REs for LM1_Q. For all target months, annual fluctuation in REs of simulated streamflow from LM1_Q is larger than that from LM2_QS. The difference between the maximum and minimum ensemble median REs during the 15 or 14 years ranges from 51% to 78% for LM1_Q, relative to 32% to 52% for LM2_QS for all target months. This indicates that more robust forecasts can be achieved if the LM2_QS approach is adopted, i.e., including GRACE TWSA signal. Note: Bold numbers mean that the magnitude of REs from LM2_QS are smaller than those from LM1_Q.
The MRAEs were also calculated for streamflow forecasts in the three driest and three wettest years (Table 3). For the target months, except for December, MRAEs from both LM1_Q and LM2_QS are larger in the three driest years than in the three wettest years. MRAEs from LM2_QS, particularly in the dry years (MRAE is~10%), are less than those from LM1_Q for all target months. In the driest (wettest) years, the MRAEs of streamflow forecasts from LM1_Q for January, February, March are at least 13 (5) larger than those of LM2_QS. In other words, streamflow forecasts by LM2_QS reduced MRAEs by 5% to 70% in the driest years, and by 3% to 30% in the wettest years compared to LM1_Q. The range of MRAEs across the driest and wettest years from LM2_QS is 7% to 16%, which is lower than that of LM1_Q (12% to 27%). This further indicates better performance by the LM2_QS approach for streamflow forecasting during the low-flow season, especially in dry years. Table 3. Mean relative absolute errors (MRAEs, %) in modeled streamflow by LM1_Q and LM2_QS for each target month during the three driest years (labeled Dry) and three wettest years (labeled Wet). The differences (Diff) in MRAEs between LM1_Q and LM2_QS are shown as percentages. The REs of each year for the training period and forecast period show that LM2_QS generally performs better overall than LM1_Q during both periods (Figure 4 and Figure S4 online). Specifically, LM2_QS agrees better with observed Q than LM1_Q in most years, particularly in years when LM1_Q has large REs (e.g., 2003, 2004, and 2013). The REs for the training years are often larger than those of the forecast years for both LM1_Q and LM2_QS. This is also indicated by the ensemble mean of absolute REs for each year for training (pink and light blue lines) and forecast (red and blue lines) periods, respectively. For both training and forecasting periods, the ensemble mean REs for LM1_Q are significantly different from those for LM2_QS at the 95% significance level in each year for all target months. The REs of each year for the training period and forecast period show that LM2_QS generally performs better overall than LM1_Q during both periods (Figure 4 and Figure  S4 online). Specifically, LM2_QS agrees better with observed Q than LM1_Q in most years, particularly in years when LM1_Q has large REs (e.g., 2003, 2004, and 2013). The REs for the training years are often larger than those of the forecast years for both LM1_Q and LM2_QS. This is also indicated by the ensemble mean of absolute REs for each year for training (pink and light blue lines) and forecast (red and blue lines) periods, respectively. For both training and forecasting periods, the ensemble mean REs for LM1_Q are significantly different from those for LM2_QS at the 95% significance level in each year for all target months.

Discussion
In this study, we examined the possibility of improving streamflow forecasts in the low-flow season by including GRACE TWSA information from the previous high-flow

Discussion
In this study, we examined the possibility of improving streamflow forecasts in the lowflow season by including GRACE TWSA information from the previous high-flow season with linear models. The results suggest that the inclusion of GRACE TWSA information would significantly improve streamflow prediction during the dry season. Particularly, the linear models including GRACE TWSA would perform better in years that the model with streamflow alone performs poorly. For short term (e.g., 1 to 2 months ahead) streamflow forecasts, the added information from TWSA plays only a minor role. That is, streamflow forecasts for November and December by the linear models are mainly dominated by the preceding streamflow. The limited role of TWSA in streamflow forecasts for November and December implies that streamflow in the two months may be impacted by other factors not represented by TWSA in the UYR, e.g., freeze-up, which often starts in mid-November. Improvement in streamflow forecasts is greatest for longer lead times (3-5 months). For example, for the target months of January, February, and March, the linear model with streamflow and TWSA from the previous October reduced errors by 25% to 50%, relative to the model with streamflow alone for 3-5-month lead time forecasts. The inclusion of GRACE TWSA improves streamflow forecasts in dry years more than in wet years, which suggests a more important role of GRACE TWSA in dry years. Improved performance of streamflow forecasts underscores the strong connections between TWSA during the wet season and streamflow in the following dry season, which is largely associated with long-term terrestrial hydrologic memory in the headwater region [13,43]. During the wet season, both streamflow and water stored in the watershed, including surface water and groundwater, are significantly affected by precipitation. The TWS would contribute to streamflow in the subsequent dry season. TWSA, which decreases during the dry season, primarily results from changes in surface water, soil moisture, and groundwater rather than snowpack in the UYR because snow accumulates while TWS continues to decrease during the cold winter ( Figure 2).
Previous studies demonstrate the potential of GRACE data for streamflow forecasts during the flooding season, following Reager et al. (2014). However, this study provides new insights into the topic by focusing on cold-and low-flow season prediction using GRACE data from the preceding wet season. Streamflow in the wet season is linked to the decrease in upstream TWS related to snow melting and groundwater storage change. GRACE detects the TWS change from month to month, serving as a good predictor of storage change. During the cold and low-flow season, however, streamflow mainly drains from water stored in the wet season under gravity. The GRACE TWSA at the end of the wet season (not TWS change from month to month) can be used to predict streamflow in the following low-flow months, which relies on a mechanism which is different from that demonstrated in previous studies. Alternatively, the results show that, even with groundbased streamflow observations, GRACE TWSA can greatly improve long-lead streamflow forecasts. GRACE TWSA may be a better indicator of water stored in the basin at the end of the wet season compared with observed streamflow. The mechanism presented here is not basin-or region-specific and should be applicable to other mountainous rivers with sufficient contributing basin areas, where low flows are mainly derived from water stored during the wet season and drain under gravity during the following dry season.

Conclusions
The case study in the UYR shows that inclusion of GRACE TWSA, alone or in combination with other hydrometeorological observations, in the linear models significantly reduced prediction errors and provided more robust long-lead streamflow forecasts in the low-flow season. Therefore, this study underscores the importance of water stored during the high-flow seasons in modulating streamflow in the low-flow seasons and the great potential of GRACE-based TWSA information in improving streamflow forecasts in mountainous rivers. The GRACE follow-on mission will further enhance the skill of streamflow forecasts and, to some extent, make operational forecasts possible [17]. Understanding the processes linking streamflow, soil moisture, and groundwater would be very valuable in enhancing streamflow forecasting using hydrological models. The UYR basin is a typical mountainous headwater area which serves as a water tower supplying a substantial volume of water to meet local and downstream water demands [44]. Streamflow from headwaters to downstream reaches in the low-flow season commonly depend on contributions from water (in the forms of surface water, soil moisture, and groundwater) stored in the preceding high-flow season. The UYR example suggests that GRACE observations should be a useful long lead-time predictor of streamflow in the low-flow season in valleys adjacent to mountainous headwater areas. The improvement provided by GRACE TWSA in streamflow forecasting with a long lead-time in low-flow seasons would favor water resources management and planning in the world's water towers and areas downstream of mountainous rivers.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/rs13101993/s1, Figure S1: Performance of the linear models with different forecast periods, Figure S2: Performance of the linear models with the JPL TWSA at the Tangnaihai stations, Figure S3: Performance of the linear models with the CSR TWSA at the Tangnaihai stations, Figure S4: Observed monthly streamflow and modeled results with resampled data for each target month in each year. Table S1: Mean relative absolute error (MRAE) for LM1_Q (LM1) and LM2_QS (LM2), and the significance (P) of the difference between LM1 and LM2 for each target month and each year.

Conflicts of Interest:
The authors declare no conflict of interest.