Discerning Watershed Response to Hydroclimatic Extremes with a Deep Convolutional Residual Regressive Neural Network

: The impact of climate change continues to manifest itself daily in the form of extreme events and conditions such as droughts, ﬂoods, heatwaves, and storms. Better forecasting tools are mandatory to calibrate our response to these hazards and help adapt to the planet’s dynamic environment. Here, we present a deep convolutional residual regressive neural network (dcrrnn) platform called Flux to Flow (F2F) for discerning the response of watersheds to water-cycle ﬂuxes and their extremes. We examine four United States drainage basins of varying acreage from smaller to very large (Bear, Colorado, Connecticut, and Mississippi). F2F combines model and ground observations of water-cycle ﬂuxes in the form of surface runoff, subsurface baseﬂow, and gauged streamﬂow. We use these time series datasets to simulate, visualize, and analyze the watershed basin response to the varying climates and magnitudes of hydroclimatic ﬂuxes in each river basin. Experiments modulating the time lag between remotely sensed and ground-truth measurements are performed to assess the metrological limits of forecasting with this platform. The resultant mean Nash–Sutcliffe and Kling–Gupta efﬁciency values are both greater than 90%. Our results show that a hydrological machine learning platform such as F2F can become a powerful resource to simulate and forecast hydroclimatic extremes and the resulting watershed responses and natural hazards in a changing global climate.


Introduction
Water is connected to and connects all living things on Earth. It is wielded to power electronic devices, enables plants, food, and animals to grow, serves as the living and recreational space for all creatures, and is nourishment to the human body. It has been both the subject of, platform for, and weapon of choice in numerous conflicts. Global hydraulic infrastructure is highly variable. Dirty water can be a source of disease and death. It is branded, modified, and sold at differing levels of purity and concentration. The cost of equipment to control the flow of water is high, maintenance is frequent, and changes in demand and supply for water as a resource are constant sources of concern.
Human activities have changed and continue to change Earth's environment. The changes are visible in both short-(meteorological) and long-(climatological) time scale observations [1]. As the temperature of our home planet increases, the amount of snow and sea ice loses volume over time [2,3], sea levels rise and swallow up once inhabited land [4,5], storms intensify [6], droughts last longer [7], floods become more severe [8,9], animal populations go extinct [10], and the availability of freshwater becomes more unreliable [11]. Concurrently, manmade Earth observation and control systems continue to improve [12,13].

Materials and Methods
The F2F code base provides an in-depth view of the materials and methods applied within. As such, it is a key of part of the work and has been made openly accessible at https://github.com/albertlarson/f2f (accessed on 30 March 2023). The scripts follow the logical order of this paper. Specifically, these notebooks serve as the vessel to start with time-series preprocessing and neural network building and analysis. Furthermore, we have selected a cross-section of the pertinent information below.

Study Areas
Four United States drainage basins with areas of greater than one million acres each were selected as study areas and are shown in Figure 1. The Bear River and Connecticut River watersheds are significantly smaller than either the Mississippi River or the Colorado River basins. The satellite imagery used observes approximately 100 square kilometers of area (on the order of 25,000 acres) in each pixel.

Satellite-Derived Observations
For each basin there are two input images extracted from raw data obtained through the NASA Goddard Earth Sciences Data and Information Services Center. The raw data are National Land Data Assimilation System (NLDAS) model output. The NLDAS is a project run by several United-States-based institutions and universities. The NLDAS takes continental-scale meteorological data parameters (e.g., air temperature, wind speed, surface pressure, precipitation, incoming radiation, specific humidity) as input and deterministically creates water-and energy-flux layers as outputs. The NLDAS project in its second phase applies several different water-and energy-balance algorithms to create flux outputs from one common set of meteorological inputs. Here, the Noah water-and energy-budget algorithm is used. The channels of interest are components of the water flux, specifically the surface and subsurface runoff, as they collectively represent the lateral movement of liquid water along and under the surface towards the terminal drainage point at a given point in time [22,23].

Ground-Truth Measurements
Concurrent with the two NLDAS channels is a single gauged-in-the-river streamflow measurement. Daily streamflow measurements from sites near the terminus of each basin are obtained from the United States Geological Survey's National Water Information System. The sites were selected based on the availability, proximity to the terminal point of the basin, and relative continuity of the data. Gaps in the data collection are solved with linear interpolation.
Ground-truth streamflow data are critical for hydrologic modeling, as they provide a means of validating and calibrating model results. The USGS National Water Information System (NWIS) is a primary source of ground-truth streamflow data in the United States [24]. The NWIS is a network of over 1.5 million sites that collect measurements of water, some of which are then used to calculate streamflow [25]. These gauges are operated by the USGS in collaboration with other federal, state, and local agencies, as well as private organizations. The data collected by the NWIS are used for a wide range of applications, including flood forecasting, water management, and environmental assessments [26].
The NWIS stream gauges provide a valuable resource for monitoring and managing the nation's water resources. The network covers a broad range of water bodies, including rivers, streams, lakes, and reservoirs, and the data collected help to support a variety of water-related activities. For example, the streamflow data collected by the NWIS are used to support flood-forecasting efforts, which are essential for public safety and property protection. In addition, the data are used to assess water availability for agriculture, industry, and domestic use and to monitor the health of aquatic ecosystems.
The USGS has a long history of collecting and analyzing streamflow data, dating back to the late 19th century when the agency was established [27]. Since then, the network of stream gauges has expanded and become more sophisticated, incorporating new technologies such as acoustic Doppler current profilers and advanced telemetry systems [25]. The USGS has also played a key role in developing standardized methods for collecting, processing, and analyzing streamflow data, which have been adopted by other countries around the world [28]. Today, the USGS continues to operate and maintain the largest network of stream gauges in the United States, providing a valuable resource for hydrologic research and water resource management [26]. With the growing importance of water resource management and the increasing threat of climate change, the role of the NWIS in monitoring and managing the nation's water resources is more critical than ever.

Data Collection and Preprocessing
For this study, we looked at the time range starting on 1 January 2015, until the most recent output available, 1 March 2022. The NLDAS model output is available as a monthly and hourly product. We combine the hourly data available for surface and subsurface streamflow into a daily product. The raw hourly NLDAS product with all variables is a directory sized 351 gigabytes composed of 62,805 hourly files. The summing and extraction of lateral flows shrunk the total file size by a factor of more than 150. Each raw data file consumes 5.8 megabytes of disk space, while each daily surface and subsurface flow extraction was 822.7 kilobytes. The filtered data consume only 2.1 gigabytes and can easily be held on a graphical processing unit when trained with the neural network. The file size decreases further when clipped to a particular basin. The images are z-scored relative to themselves, while the gauged streamflow data are z-scored relative to the entire time series of seven years. Whitening has been shown to improve the performance of training a neural network [29,30].

Treatment
For this experiment, we constructed a deep convolutional residual regressive neural network. Our network selection was inspired by the strong performance of a similar structure in the task of image classification [31] and the hypothesis that this work might be transferred to a new domain [32]. The guiding research question behind the use of this structure was "How does the use of a state-of-the-art stochastic model fare in the job of streamflow prediction typically reserved for deterministic methods such as the Saint-Venant equations [22] or the Muskingum model [33]?" The images of Earth's surface and subsurface water flow we obtain from NLDAS are passed through this network. Eventually, the transformed images reach a destination where the image shapes have been constrained in size to match that of the target of the input pair; here, the target is one pixel as the daily value for gauged streamflow is a single physical measurement. Shape transformation is a common occurrence in the application of neural networks, where one constrains an input shape to a target output shape. This is performed with different "layers", where some linear computations are performed based on the user-specified shape of the neural network. There are helpful resources for these fundamentals [34]. The problem is one of regression because the prediction of streamflow is continuous and can theoretically be any value greater than zero. We use convolutional neural networks because our input to the network is a sequence of two channel images [35]. We also use residual learning, which allows us to make the network very deep but control the opacity of the initial structure of the image. This makes training faster [31]. Rectified linear unit activation functions are applied in all but the last layer of nodes, and batch normalization is used in the residual layers [36,37]. Batch normalization is like the z-score treatment in our preprocessing step. Finally, we selected a variant of stochastic gradient descent for optimization of the neural network nodes [38,39].
The gauged streamflow measurements of the four target rivers are significantly different in magnitude from one another; therefore, we process each with a z-score treatment to center their mean values around the number zero and standardize each increasing and decreasing integer around intervals of standard deviation. Figure 2 shows plots of the gauged streamflow measurements of each basin oriented in two ways. The four individual strip charts in the top right show the change in streamflow over time. The more prominent normal-distribution-shaped plots show how often actual measurements in the respective basin occur relative to the average discharge of each basin. This is a single-dimensional z-scoring system. We also perform a two-dimensional treatment on each of the input channels, surface, and subsurface streamflow. Whereas the 1D treatment uses the entire time series of gauged streamflow measurements for computation, the 2D z-scores are computed based on a single image at a time. The modifiable hyperparameter controls of the network are the basin under observation, the lag in time between the input and output datasets, the number of training epochs (forward and backward passes of the neural network), and the ratio of training data to testing data. There is also an override for stopping the model training early when the training data have a Nash-Sutcliffe efficiency (NSE) value of a variable efficiency percentage.  Figure 3) or the total number of epochs of back and forward propagation of the entire basin dataset reaches 100. The computations are constrained to a single node with two central processing units, a single NVIDIA GeForce RTX 2080 Ti graphical processing unit, and no more than 130 gigabytes of random access memory. Our platform is written in the python programming language and managed with the miniconda package manager. The total run time to compute the experiments was 83.0 h.

Results
The hourly NLDAS model outputs of surface and subsurface flow are summed to daily accumulations over the time span of 1 January 2015 to 1 March 2022. This time series is 2617 days long and composed of two channel images. The channels are surface and subsurface flow measured in units of kilograms per square meter. These units are analogous to the weight of water in a location. The sample observation output from each basin capturing the flow behavior on 6 June 2021 is displayed in Figure 4. The effects of spatial resolution are apparent, as the Bear River and Connecticut River basins have pronounced rectangular edges due to their relatively small size. This pixelation effect is not present in the Mississippi River and Colorado River observations of lateral flow from the basin view at this constrained figure size.  Figure 3 shows a sample output from one configuration of the neural network. The topmost graph illustrates the time series of discharge measurements in cubic feet per second of the Bear River. This graph is rotated ninety degrees relative to its sibling hydrograph in Figure 2. There is a notable seasonality to this streamflow measurement of the Bear. Spring brings melting snowpack in the nearby mountainous terrain and subsequent increases in the neighboring river flows. The spring melting snow in 2021 appears more subdued than all other years observed. The Bear River drainage basin is located in between the Great Salt Lake and Yellowstone National Park in the Rocky Mountain region of the United States. The eponymously named river flows in a counterclockwise loop.
The second row plots each modeled observation in the time series against its respective actual measurement. On the left is a study of the model output ordered on the x-axis from low to high flows with the corresponding actual measurement on the y-axis. The right plot retains the same axis labels, but instead observes the spatial proximity of values. Darker points are more-commonly occurring ranges of flow. The left plot also contains two lines of best fit: the ideal or desired line found from the data and the actual line of fit as exists between the actual gauged streamflow and the neural network model output of streamflow from surface and subsurface flow.
The third and final row shows the epochal values during the neural network training process. On the left, the average error between the actual measurements and network output declines as the model goes through its iterations of propagate and backpropagate. Concomitant with the error vs. epoch is efficiency vs. epoch. As the error declines towards zero, the NSE measurement increases towards 100%. Here, the neural network was set to stop at an NSE value of 95%, which occurs in the sixth epoch.

Discussion
The results presented indicate a relatively favorable performance of the neural network architecture when transforming the surface and subsurface flow into a prediction of basin gauged streamflow; the kernel density estimates (KDE) in Figures 5 and 6 illustrate this point. We executed a total of more than 2200 experiments using the common architecture. We used two hydrological metrics: Kling-Gupta (KGE) and Nash-Sutcliffe (NSE) [40][41][42][43]. For each of these metrics, the peak resultant merit value of the 2268 experiments is greater than ninety percent with a standard deviation of less than 0.06. The results are tolerant to lagging the data beyond the residence time of water in the atmosphere [44,45]. This study presents a new simple framework that performs significantly better than the routing algorithm by the author of the input NLDAS input datasets. See [22] for a thorough explanation. Their streamflow modeling results (with respect to NSE) are admitted to be less than desirable, and they encourage improvement in the NSE scores. Based on the performance here, we see the F2F methodology as a valid suitor for a follow-on dataset such as NLDAS-3.  Others have observed the changing water quantity of the Mississippi. One study using NLDAS data focuses on a subsection of the Mississippi with a higher quantity of streamflow target sites [46]. Another group considers a different data system altogether for watershed modeling on the upper Mississippi basin [47]. Some groups suggest that NLDAS is too simplistic and decided to create their own blend. They take a much broader approach than the scope of the problem observed here [48]. The same is true for another study, which considers several different models and about 1000 small river basins [49]. Some use meteorological data as a predictor for electric outages, as seen in a study looking at Connecticut. They, too, use the Nash-Sutcliffe efficiency as a figure of merit [50] but approach the problem with a different lens. Their target is a smaller population and the risk of being without electric power.
This process can be expanded in different ways. Our study relies on the internal programming of NLDAS to compute the surface and subsurface flow. There is much uncertainty in these observations based on the natural heterogeneity of the land surface.
We do not look at the independent influence of any single forcing variable. Take snow, for example. In large mountain proximal basins such as those near the Rocky Mountains or Himalayan ranges, an accumulation of subzero-degree Celsius water in solid form provides a continuous upland buffer tank for the communities the river downland serves. As the relative presence of carbon dioxide increases and the land temperature responds in agreement, the duration and scale of snow melt and sea ice responds. It is challenging to equate with exact certainty how much solid water exists. To a degree, interpolating satellite data with gauged data is sufficient, but these apparatuses are challenging to maintain in cold temperatures or in places of very high altitude. One could elect to observe more individual locations as targets, therefore making the relationship no longer image-to-singlevalue at a given time, but instead image-to-image. There are studies that consider the impact of slow-moving oceanic and atmospheric abnormalities upon the hydrology of the land. Independent variables include the Madden-Julian oscillation [51], the El Niño-Southern Oscillation [52], and the Atlantic meridional overturning circulation [53].
While the NLDAS product used here is of a particular spatial fidelity, the Global Land Data Assimilation System is coarser in its resolution. It is beneficial to the scientific community to have a clearer picture of the meteorological forcing and environmental responses in the ocean, land, air, and mixed interfaces. One could use this framework to fuse the high-resolution NLDAS product with the global GLDAS product and evaluate the result according to one common set of metrics. The software could be packaged and ported to use with an already-existing embedded in situ mesh system to provide forecasting information. Instead, one might consider looking at a different time signature, such as seasonally decomposed but over several years. Instead, one might introduce higher-resolution localized water-quality data into the model. By tracking environmental statistical anomalies relative to other points in time and relative to the global community, municipal decision makers can clue into the trajectory of their land, their structures, and their constituents within. The choice to retreat is not to be approached lightly, but in some instances is becoming the necessary measure [54,55]. This intelligence can also be placed in the hands of consulting engineers to distribute in new and existing infrastructure. Logic is necessary to manage the assets of complex hydraulic systems (pumps, motors, chemical feed, aeration, dewatering, gates, valves), and digital twin systems are becoming fashionable.
In 2022, Pakistan and the United States were hit by massive floods that caused widespread devastation. In Pakistan, heavy monsoon rains led to flooding across the country, affecting millions of people and causing significant damage to infrastructure and property [56]. Similarly, in the United States, the Mississippi River experienced severe flooding due to heavy rainfall, causing extensive damage to homes, businesses, and farmland [57]. These events demonstrate the devastating impact that extreme water events can have on communities and the urgent need for improved water-management strategies.
Strong rotational winds cause hurricanes and cyclones, which carry bulk quantities of water. These catastrophes are notable for their brute strength and historically have caused the displacement of people, loss of lives, damage to infrastructure, and disruption of social and economic systems. One of the most notable wind-driven water-based disasters in recent years was Hurricane Harvey in 2017, which caused catastrophic flooding in Texas and Louisiana [58]. The storm resulted in over eighty fatalities and more than 125 billion in damages, making it one of the costliest natural disasters in US history. The intensity and frequency of hurricanes are expected to increase due to climate change, resulting in an increased risk of devastating floods and damage to coastal infrastructure [59]. Another significant event was Cyclone Idai, which hit Mozambique in 2019, causing widespread damage and loss of life. The storm resulted in over 1000 fatalities and an estimated economic loss of over USD2 billion Cyclone Idai was one of the worst weather-related disasters to hit the southern hemisphere, highlighting the increasing vulnerability of developing countries to extreme weather events [60].
On the other end of the water-quantity-disaster spectrum, the 2017 Cape Town water crisis brought attention to the challenges of managing a sustained lack of renewable water resources over a prolonged period of time. The city of Cape Town, South Africa faced an unprecedented drought that lasted for several years, leading to a severe water shortage. The crisis prompted the implementation of strict water-rationing measures and increased investment in water-conservation and -management strategies. This event highlighted the importance of proactive and adaptive water-management strategies in the face of changing environmental conditions [61].
The impacts of water-based disasters are not limited to the immediate physical damage they cause. These disasters can have long-lasting effects on the environment through vectors of water pollution and ecosystem degradation. For example, the 2011 Fukushima nuclear disaster in Japan led to the release of radioactive materials into the ocean, resulting in significant environmental damage. The incident, of course, had an impact on the surrounding marine ecosystem, with some species of fish still showing elevated levels of radiation years after the disaster [62,63].
Improving water-management strategies requires a multifaceted approach, including better monitoring systems, enhanced cooperation with the environment, and increased public awareness and participation. Effective water-management strategies should aim to balance the competing demands of human society and the natural environment while promoting the sustainable and equitable use of water resources. The opportunities to improve our monitoring systems are many; however, more people are needed in the conversation to consider how we might better cooperate with the environment.
Effective management and mitigation of water-based disasters require coordinated efforts from multiple stakeholders, including governments, non-governmental organizations, and the private sector. Such efforts include improving early warning systems, developing more resilient infrastructure, and promoting sustainable water-management practices. Early warning systems play a crucial role in preparing for and responding to water-based disasters. These systems can provide timely and accurate information to people in affected areas, enabling them to take necessary precautions and evacuate if necessary [64]. The development of more resilient infrastructure is also essential in mitigating the impact of water-based disasters [65]. For example, the use of green infrastructure, such as rain gardens and permeable pavement, can help to reduce the impact of flooding by slowing down the rate at which water enters the drainage system [66]. Additionally, the use of naturebased solutions, such as wetland restoration, can help to improve the overall resilience of ecosystems to climate change and extreme weather events [67].
Finally, it is crucial to recognize that the impacts of water-based disasters are not distributed equally. Vulnerable populations, such as those living in poverty or in marginalized communities, are often disproportionately affected by these events [68]. In addition, climate change is exacerbating the frequency and severity of water-based disasters, particularly in regions with already-limited resources and infrastructure [69]. Therefore, addressing the root causes of vulnerability and promoting equity in disaster management and response must be an integral part of efforts to mitigate the impacts of water-based disasters.
We find that the major limitation of Flux to Flow in this set of experiments is the small quantity of target outputs for a given basin. The Mississippi, for example, is an extremely large basin, and to boil the entirety of its existence down to a single gauged point is certainly an oversimplification of the basin's complexity. Future studies with Flux to Flow should improve the granularity of basin comprehension. This improved basin comprehension should both be literal in the words used to describe the basins, as well as more streamflow gauges. Furthermore, when we computed the efficiency values after each of the 2268 models was trained, we computed these values not just based on the test data but also including the training data. Certainly, this might allow for a positive biased result value because a large amount of training data has been factored into the performance of the system. Further studies should be harsher on the performance of Flux to Flow. More simultaneous target measurements are required, and only test data should be factored into the computation of network performance. We also looked at the derived fields of surface and subsurface flow that are computed via a host of meteorological forcings as well as thoughtfully applied deterministic methods [22,70]. Because of this abstraction, it is possible that our work is carrying through biases created from oversimplification of the system that has been used to make our inputs.
It would be valuable to consider using more pure inputs and see if this provides a more robust performance. The factors that we see as potential next raw inputs are the consideration of terrain, land cover, and shifts in the land due to geophysical phenomena. The changing elevation of water is a crucial facet in the power of water. As such, there is a wide breadth of elevation data available and scientists focused on this aspect. Both elevation and land cover datasets are available with much finer resolution than the spatial scales considered here [71][72][73].
We constrained our research to a single computing node at a time. We are aware that this handicaps the scope of what can be computed in the experiments. However, we crafted the software so that it might be scaled to these larger and finer-resolution datasets. Furthermore, we only consider one neural network structure. It is possible that our system is more powerful than is necessary since our results seem to be mostly intolerant to lag or splitting of the data. Because of this, one way to distinguish F2F would be to consider its performance against other networks. There are no shortages of potential structures, but we think a logical next step would be to consider simpler structures such as the building block components of our dcrrnn architecture. Another avenue is to apply some of the newer, more complex structures. Some authors have already approached this topic in the Colorado basin [74]. Transformers are gaining much notoriety with the growth of large language models. It is less known that these same mechanisms can be applied to regressive tasks such as the one approached here and could prove to be a workhorse for climate modeling [75].

Conclusions
In this study, we introduce a fresh perspective to studying and understanding the water cycle with a learned representation using modern techniques and data systems. Our results show that a deep convolutional residual regressive neural network (dcrrnn) combined with water flux and gauged streamflow data can exhibit strong forecasting performance according to standard hydrological statistical figures of merit. We used the dcrrnn concept to develop a platform called Flux to Flow (F2F) and examined four major river basins across the United States. F2F outputs strong forecasting performances (Nash-Sutcliffe and Kling-Gupta efficiency above 90%) in most cases and at various time lags. Through the careful use of visuals and data management, this approach can provide a satisfactory performance for various locations, degrees of fidelity, time scales, and parameters of interest for the water resource and climate science community. Furthermore, we render the code in a form that is meant to be accompanied with the white paper. We believe that the hybridization of the literature with clearly written open-access software is the future of watershed modeling research (and research in general). We think this approach opens the door for new students who are interested in the water cycle, but who might be limited by 1. financial resources associated with black-box high-performing solutions or 2. their prior experience with the programming environment. We do not sacrifice modeling quality for simplicity of understanding and believe this is a unique facet of this research. We hope this work will spur others on in the quest for better prediction of the water cycle and conservation of the global ecosystem.

Acknowledgments:
The authors thank to the University of Rhode Island Research Computing Services and the Massachusetts Green High Performing Computing Center for use and support of the UNITY cluster.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: