Spatio-temporal analysis of flood data from South Carolina

To investigate the relationship between flood gage height and precipitation in South Carolina from 2012 to 2016, we built a conditional autoregressive (CAR) model within a Bayesian hierarchical framework. This approach allows us to model the main spatio-temporal properties of water height dynamics over multiple locations, accounting for the effects of the river network, geomorphology, and forcing rainfall. A proximity matrix based on watershed information was used to capture the spatial structure of gage height measurements in and around South Carolina, while the temporal structure was handled by a first-order autoregressive term in the model. Several covariates, including site elevation and seasonality effects, were examined along with daily rainfall amount. A non-normal error structure was used to account for the heavy-tailed distribution of maximum gage heights. The proposed model captured key features of the flood process, such as seasonality and a stronger association between precipitation and flooding during the summer season. The model is able to forecast short-term flood gage height, which is crucial for informed emergency decisions. As a byproduct, we also developed a Python library to retrieve and handle environmental data provided by several major agencies in the United States. This library should be generally useful for studies requiring rainfall, flow, and geomorphological information over specific areas of the conterminous US.


Introduction
During October 2-5, 2015, an extraordinary rainfall event took place in the Carolinas, many parts of which observed 500-year-event levels of precipitation. Rainfall accumulation reached 24.23 inches (61.57 centimeters) near Boone Hall (Mount Pleasant, Charleston County) by 11:00 a.m. Eastern Time on October 4, 2015. The rainfall peaked on October 4 with a 24-hour total of 16.69 inches (42.39 centimeters) of precipitation, and the total 48-hour precipitation during October 3-4 was more than 20 inches (51 centimeters). The likelihood of the rainfall amounts ranged anywhere from a 1-in-250-year event to a 1-in-1000-year event in the study region, with some places such as Columbia and Lexington, SC receiving more than 17 inches (43 centimeters) of rain over a four-day period (Phillips et al. 2018). Columbia, the capital of South Carolina, broke its all-time wettest 1-day, 2-day, and 3-day periods on record (e.g., Bonnin et al. 2006).

Data sources
In this section, we discuss our data sources and the necessary data munging steps we used in our study. We compiled a dataset for 94 unique locations in South Carolina with precipitation, elevation, gage height, along with basin information, over a span of five years. We primarily cover the collection of variables such as daily rainfall and gage height, since we are interested in exploring the dynamics between them. We mention the watershed information briefly since it is used in defining the proximity matrix. A detailed discussion of this can be found in "Adjacency matrix and watershed" section.

Precipitation
The National Weather Service (NWS) collects precipitation data at 12 Contiguous United States (CONUS) River Forecast Centers (RFCs). The precipitation is recorded using a multisensor approach. Hourly estimates from weather radars are compared to ground gage reports, and a correction factor is calculated and applied to the radar field (Daly et al. 2000). For areas where radar coverage is not accessible, satellite precipitation estimates can be used to construct the multisensor field (Daly et al. 1994). Note that this method has been applied to South Carolina and most other eastern states, whereas a different method is used to process precipitation data in mountainous areas west of the Continental Divide.
The precipitation data are then mosaicked into a gridded field with a spatial resolution of four by four kilometers. Each record is an accumulation over a 24-hour period, with 1200 GMT used as the ending time for a 24-hour total. Spatially, the original dataset extends well beyond the U.S. border, most notably north of Washington and Idaho and west of Texas, in order to model rivers that flow into the United States. However, only the observations within South Carolina and nearby states are retained in our study, since rainfall far outside the state is unlikely to have a major effect on flood gage heights in the short term. Available data date back to 2004 and are still actively updated by the NWS. Rainfall values from 2012 to 2016 (inclusive) were retrieved for our study.
The raw data are archived at https://water.weather.gov/precip/archive/. The major challenges in handling this dataset are parsing the raw data (in NetCDF format) and filtering out values from irrelevant regions and dates. The "Miscellaneous code" section briefly introduces the Python library we developed to streamline these data preprocessing steps.

Gage height
Gage height (also known as stage) is the height of the water in the stream above a reference point. Gage height refers to the elevation of the water surface in the specific pool at the streamgaging station, not along the entire stream (United States Geological Survey (USGS) 2011). Gage height is also not exactly the same as the depth of the stream. Since the stage baselines are set in a case-by-case manner across locations, we subtract the station-wise historical median (the median gage height for each location over a 10-year period) from each gage height measurement to make the measurements comparable (see "Model description" section for details). This is done as a preliminary centering step before we fit the model. The U.S. Geological Survey (USGS) provides an archive of approximately 1.9 million observation sites of all kinds in all 50 states, the District of Columbia, Puerto Rico, the Virgin Islands, Guam, and American Samoa under the Water Data for the Nation portal on its website. More than 1000 such sites can be found within the borders of South Carolina. However, the site count is drastically reduced when we focus on locations measuring surface water and exclude those that have ceased functioning. Eventually, we have approximately 150 to 200 locations (depending on the timeframe) within South Carolina that give a valid reading of the gage level on a daily basis. One can either use the interface provided by USGS or the data.download_flood() function from our Python library ("Miscellaneous code" section) to download the data. The former comes with a graphical user interface but may be harder to maneuver when multiple sites are needed; the latter allows user customization to a greater degree. Notably, the precipitation and the gage height are measured in different locations, since the former is measured on a gridded field while the latter is recorded at major rivers and dams.
We implemented a "blurry lookup" approach to combine the two pieces of information. For readers familiar with SQL, the algorithm is similar to a left join, where all rows in the left table (gage height) are retained and only records with matching keys are kept from the right table (precipitation). It differs from a typical left join in that, although a latitude and longitude pair serves as the key, exact matching is not feasible due to the location mismatch. Hence, the merging is done by finding the nearest neighbor: for each row (location i) in the gage height table, we find the location j in the precipitation table that is closest to it and add the rainfall information at location j to location i. Admittedly, this is not ideal since the precipitation and gage height are not from the exact same location, but the high resolution of the precipitation data (4×4 km) makes this issue less critical.
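A minimal sketch of this nearest-neighbor merge in plain NumPy (the coordinates and rainfall values below are toy data, and Euclidean distance on latitude/longitude pairs is a rough approximation over a small region; the paper's own library may implement this differently):

```python
import numpy as np

def blurry_lookup(gage_coords, precip_coords, precip_values):
    """For each gage location, attach the rainfall value from the
    nearest precipitation grid cell (brute-force nearest neighbor).

    gage_coords, precip_coords: sequences of (lat, lon) pairs.
    precip_values: rainfall at each precipitation grid cell.
    """
    gage = np.asarray(gage_coords, dtype=float)
    grid = np.asarray(precip_coords, dtype=float)
    # Pairwise squared distances, shape (n_gage, n_grid)
    d2 = ((gage[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # index j of the nearest grid cell per gage
    return np.asarray(precip_values)[nearest]

# Two gages, three grid cells: each gage picks up the closest cell's rain.
gages = [(34.00, -81.03), (32.78, -79.93)]        # near Columbia, Charleston
grid = [(34.02, -81.00), (33.50, -80.50), (32.80, -79.95)]
rain = [1.2, 0.0, 3.4]
print(blurry_lookup(gages, grid, rain))  # → [1.2 3.4]
```

For larger tables, a spatial index (e.g., a k-d tree) would replace the brute-force distance matrix, but the logic of the join is the same.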
Additionally, since a fair number of records are missing in the dataset, we first calculate the missing data ratio, i.e., the percentage of days with missing records over the total number of days during the aforementioned time span (2012-2016). We discard a location if its missing data ratio is beyond a certain threshold. We strike a balance between a larger sample size and better data completeness with the help of Fig. 1, which shows how many locations are retained for different time spans and thresholds. Note that the x-axis is the number of years counting backwards from 2016. For instance, there are 120 locations retained in the dataset for 2016 with a 95% complete-data threshold. Based on Fig. 1, we pick 90% (94 unique stations) as the complete-data threshold for a time span of five years, since further increasing the threshold leads to a significant decrease in the number of available gaging stations.
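The missing-data-ratio screen can be sketched in a few lines of pandas; the table below is synthetic (three hypothetical stations, one with 20% of days masked), not the actual gage data:

```python
import numpy as np
import pandas as pd

# Hypothetical daily gage-height table: rows are dates, columns are stations.
dates = pd.date_range("2012-01-01", "2016-12-31", freq="D")
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(len(dates), 3)),
                    index=dates, columns=["A", "B", "C"])
# Mask 20% of station B's days to simulate missing records.
data.loc[data.sample(frac=0.2, random_state=1).index, "B"] = np.nan

missing_ratio = data.isna().mean()              # fraction of missing days per station
kept = missing_ratio[missing_ratio <= 0.10].index.tolist()  # 90% completeness threshold
print(kept)  # → ['A', 'C']
```

Station B exceeds the 10% missing-data threshold and is dropped, mirroring the screening used to arrive at the 94 retained stations.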
Imputation for the remaining locations is based on temporal adjacency. In other words, to fill a missing gage height value on a certain date, the weighted average of values from neighboring dates is used, that is,

Ŷ_t = (w_{t−2} Y_{t−2} + w_{t−1} Y_{t−1} + w_{t+1} Y_{t+1} + w_{t+2} Y_{t+2}) / w_*,

where Y_t is the missing value at time t and w_* = w_{t−2} + w_{t−1} + w_{t+1} + w_{t+2}. We set w_{t−1} = w_{t+1} = 2 and w_{t−2} = w_{t+2} = 1 since observations closer in time to the missing value should be more informative. Alternatively, one could fill the missing values based on spatial closeness, but gage height measurements at the same location tend to change steadily and continuously over time, whereas filling missing data spatially would pool together different observation locations, which are associated with varying gage baseline levels and landscapes.
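A minimal sketch of this weighted-average imputation in plain NumPy, assuming the neighboring observations exist (boundary cases simply renormalize over the available neighbors):

```python
import numpy as np

def impute_temporal(y, weights=(1, 2, 2, 1)):
    """Fill missing values with a weighted average of the two preceding
    and two following observations (weights for t-2, t-1, t+1, t+2).
    Neighbors that are out of range or themselves missing are skipped,
    and the weights are renormalized accordingly."""
    y = np.array(y, dtype=float)
    w2m, w1m, w1p, w2p = weights
    for t in np.flatnonzero(np.isnan(y)):
        neighbors = [(t - 2, w2m), (t - 1, w1m), (t + 1, w1p), (t + 2, w2p)]
        num = den = 0.0
        for idx, w in neighbors:
            if 0 <= idx < len(y) and not np.isnan(y[idx]):
                num += w * y[idx]
                den += w
        if den > 0:
            y[t] = num / den
    return y

series = [1.0, 2.0, np.nan, 4.0, 5.0]
# Missing value → (1*1 + 2*2 + 2*4 + 1*5) / 6 = 3.0
print(impute_temporal(series))  # → [1. 2. 3. 4. 5.]
```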

Elevation
The elevation information is based on the Shuttle Radar Topography Mission (SRTM), an international research effort that obtained digital elevation models on a near-global scale from 56°S to 60°N (Farr et al. 2007). The 30-meter topographic data products are publicly distributed by the USGS along with the 90-meter data. These data are made available via the EarthExplorer portal on the USGS website in .tiff format. We retrieve the elevation information from the 90-meter data for the aforementioned gage locations by matching the latitude and longitude; the elevation of the nearest neighbor is used if an exact match cannot be found.

Other covariates
Besides the precipitation and elevation, we also include three dummy variables to account for the seasonality in the data. The three dummy variables respectively take values 1 if the data record is from the spring (March through May), summer (June through August), or fall (September through November), and 0 otherwise. More importantly, interaction terms of the season indicator and precipitation are included, so that we can explore whether a difference in rainfall effect on flood levels exists across seasons. Specifically, if the interaction variable between spring and precipitation manifests itself as positive and significant, one can conclude that during March through May, rainfall increases are likely to lead to an even greater average rise in gage heights than in the baseline season (winter).
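Constructing the season dummies and their precipitation interactions is straightforward in pandas; the sketch below assumes a table with hypothetical `date` and `precip` columns (winter is the implicit baseline, as in the model):

```python
import pandas as pd

def add_season_covariates(df):
    """Add spring/summer/fall dummies (winter is the baseline) and their
    interactions with precipitation. Assumes 'date' (datetime) and
    'precip' columns; column names here are illustrative."""
    df = df.copy()
    month = df["date"].dt.month
    df["spring"] = month.isin([3, 4, 5]).astype(int)
    df["summer"] = month.isin([6, 7, 8]).astype(int)
    df["fall"] = month.isin([9, 10, 11]).astype(int)
    for season in ("spring", "summer", "fall"):
        df[f"{season}_x_precip"] = df[season] * df["precip"]
    return df

demo = pd.DataFrame({"date": pd.to_datetime(["2015-07-15", "2015-12-15"]),
                     "precip": [2.0, 2.0]})
out = add_season_covariates(demo)
# July row gets summer=1 and a nonzero interaction; December (winter) gets zeros.
print(out[["summer", "summer_x_precip"]].values.tolist())  # → [[1.0, 2.0], [0.0, 0.0]]
```

A positive coefficient on `summer_x_precip` in the fitted model is exactly the "greater rainfall effect in summer" effect discussed above.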

Basins and watersheds
The watershed information is pivotal to our model in a way that differs from elevation or precipitation. Rather than entering the model as a covariate, the watershed membership is used for the adjacency matrix W, whose definition can be found in "Adjacency matrix and watershed" section, along with a more detailed account of the watershed system. In this section, we focus on preprocessing this information into a well-structured format. USGS hosts the watershed information by state on Amazon Web Services (AWS), publicly available as a repository of contour files of varying sizes. A 4-digit hydrologic unit code (HUC) is less localized and covers a larger area than a 6-digit HUC (HUC-6), for example. We use the contour information to define the watershed membership: practically, a categorical variable with the watershed name is added for each available location. We decided to use the 6-digit hydrologic unit to categorize all available locations into four regions. We discuss this further in "Adjacency matrix and watershed" section.
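Because HUC codes are hierarchical, the HUC-6 membership of a station can be read directly off a longer code by truncation. The station IDs below appear in this study, but the HUC-8 codes attached to them are purely illustrative:

```python
# HUC codes nest: the first six digits of an eight-digit code identify
# the containing basin. The HUC-8 values here are toy examples only.
huc8_by_station = {
    "02160991": "03050106",
    "02169500": "03050110",
    "02172002": "03050201",
}
huc6_by_station = {sid: code[:6] for sid, code in huc8_by_station.items()}
print(huc6_by_station)
# Stations sharing a HUC-6 prefix fall in the same basin category:
print(huc6_by_station["02160991"] == huc6_by_station["02169500"])  # → True
```

The resulting basin label is the categorical variable used downstream to build the adjacency matrix.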

Miscellaneous code
A Python library, climate_data_toolkit, was developed in parallel with our study and accomplishes two goals. First, it streamlines the process of downloading and preprocessing raw data from different sources: rather than using varying user interfaces for different databases, one can achieve the same result nearly instantly with function calls like get_flood(). Second, it packages the models used in "Model description" section with a user-friendly interface. Hence, a compilation of a few Python modules, that is, a library, is a natural choice for this purpose. In addition, the library includes a plotting system, a handy tool for visualizing spatial data, since it can display spatial elements such as markers and contours on top of an OpenStreetMap base map in a manner reminiscent of the R package ggplot2 (Wickham 2016). The Python library is hosted on GitHub, and users can find the source code and help documents at https://github.com/HaigangLiu/spatial-temporal-py. Alternatively, the package can be installed with pip, Python's standard command-line package manager.

Adjacency matrix and watershed
The concept of the adjacency/proximity matrix W, introduced in areal data analysis (Cressie 1993), is pivotal for reflecting the dependence among nearby locations. The entries w_ij in the adjacency matrix describe the connection between locations i and j in some fashion. Typically, one builds the adjacency matrix based on either a distance or a binary status. For instance, one can define w_ij = 1 if i and j share a common boundary and 0 otherwise. Alternatively, w_ij could reflect the "distance" between units. Further modifications can be made as well: we could set w_ij = 1 for all i and j within a specified distance, or, for a given i, define w_ij = 1 if j is one of the K nearest (in distance) neighbors of i. In the context of our study, we define the adjacency matrix based on the watershed information since it serves as an indicator of flood activity and its domain. Specifically, w_ij = 1 if locations i and j are within the same watershed, and w_ij = 0 otherwise. Note that this choice reflects the river network connections. A watershed is an area of land where rainfall accumulates and drains off into a river, bay, or other body of water (Betson 1964). Other terms used interchangeably with watershed are drainage area, catchment basin, and water basin. Watersheds exist at different scales, and the hierarchy is reflected by the HUC system: an area indexed by a two-digit code is composed of several smaller four-digit basins, for instance. There are six levels in the hierarchy, represented by hydrologic unit codes from two to twelve digits long, called regions, subregions, basins, subbasins, watersheds, and subwatersheds (Seaber et al. 1987). Figure 2 illustrates all six-digit and eight-digit hydrologic units that are located fully or partially in South Carolina.
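The shared-watershed definition of W can be sketched in a few lines of NumPy; the basin labels below are placeholders for the HUC-6 membership variable described above:

```python
import numpy as np

def watershed_adjacency(labels):
    """Binary proximity matrix: w_ij = 1 if stations i and j share a
    watershed label (and i != j), 0 otherwise."""
    labels = np.asarray(labels)
    W = (labels[:, None] == labels[None, :]).astype(int)
    np.fill_diagonal(W, 0)  # a site is not its own neighbor (w_ii = 0)
    return W

# Four stations; the first two share a basin, the others are alone.
labels = ["030501", "030501", "030502", "030601"]
print(watershed_adjacency(labels))
```

By construction W is symmetric with a zero diagonal, matching the requirements on C = αW noted in "Model description".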
Notably, basins (areas indexed by a HUC-6 code) appear to be an appropriate granularity for investigating the watersheds in South Carolina, since these hydrologic areas are neither too dense nor too sparse in terms of data points. Alternatively, one can define the proximity matrix using the river basin system (Fig. 3), a product of the state of South Carolina's first watershed planning activities in the 1970s. The river basin system defines eight mutually exclusive areas: Broad River, Savannah River, Pee Dee River, Santee River, Catawba River, Saluda River, Edisto River, and Salkehatchie River. However, we prefer the watershed system since it is not constrained by state borders. Furthermore, under the river basin segmentation, some river basins, e.g., the Salkehatchie, contain as few as two unique observing stations; such sparsity might lead to less stable parameter estimates.

Model description
A neighborhood structure reflecting spatial dependence is pivotal in many spatial and spatio-temporal analyses. Often, one can define the neighborhood structure based on distance from certain centroids or on the similarity of an auxiliary variable (Cressie 1993). In our study, watershed information is used to construct the neighborhood structure since it outlines the domain of hydrological water movement, and we define gaging stations within the same basin as neighbors.
The observed variables in our study include Y_i, the gage height, and p explanatory variables, x_i = (x_i1, ..., x_ip)'. To capture the shifting patterns in flood records, we incorporated exogenous covariates such as precipitation, dummy variables for spring, summer and fall, and the interactions between the seasonality dummy variables and precipitation into the conditional autoregressive (CAR) model. In addition, the elevation of each location was considered as a covariate but not included in the final model, as noted in "Result" section. The CAR model for the responses, Y = (Y_1, ..., Y_n)', is formulated by specifying the set of full conditional distributions, each satisfying a form of autoregression:

Y_i | Y_(i) ~ N( x_i'β + Σ_{j≠i} c_ij (Y_j − x_j'β), σ_i^2 ),  i = 1, ..., n,

where Y_(i) = {Y_j : j ≠ i} and β = (β_1, ..., β_p)' are unknown regression parameters. Also, σ_i^2 > 0 and the c_ij are covariance parameters with c_ii = 0 for all i. It should be noted that the values of the CAR parameter estimates should reflect reasonable physical mechanisms to guarantee that the patterns observed in the period of record are not just effects of fluctuations of runoff processes whose dynamics evolve over longer time scales (e.g., Koutsoyiannis 2011; Koutsoyiannis and Montanari 2014). Banerjee et al. (2014) demonstrate that one can derive the joint distribution of Y from the full conditional distributions via Brook's Lemma. The joint distribution is given by Y ~ N_n(Xβ, (I_n − C)^{-1} M), where M = diag(σ_1^2, ..., σ_n^2) and C = (c_ij). Note that Brook's Lemma requires M^{-1}(I_n − C) to be positive definite and M^{-1}C symmetric, which means c_ij σ_j^2 = c_ji σ_i^2 for i, j = 1, ..., n. We further simplify this model by assuming M = σ^2 I_n, with σ^2 > 0 unknown, and C = αW.
The parameter α can be interpreted as the unknown "spatial parameter" and W = (w_ij) is a known "weight" matrix, with w_ij = 1 if and only if sites i and j are neighbors. De Oliveira (2012) establishes that setting up the model with these two assumptions automatically satisfies the two aforementioned requirements (symmetry and positive definiteness). Hence, the joint distribution of Y reduces to Y ~ N_n(Xβ, σ^2 (I_n − αW)^{-1}). Note that taking advantage of the fact that I_n − αW is a sparse matrix can further accelerate the MCMC sampling; see the Appendix for more on this.
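As a numerical sketch of the constraint this joint form imposes: the spatial parameter in C = αW must keep I − αW positive definite, and for a symmetric binary W a sufficient condition is |α| < 1/|λ|max(W). The three-station weight matrix below is hypothetical:

```python
import numpy as np

# Toy symmetric neighbor matrix for three stations in a chain.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
sigma2, alpha = 1.0, 0.5

# Validity check: |alpha| below the reciprocal of W's spectral radius.
eigmax = np.abs(np.linalg.eigvalsh(W)).max()
assert abs(alpha) < 1 / eigmax

# Joint covariance of the simplified CAR model: sigma^2 (I - alpha W)^{-1}.
Sigma = sigma2 * np.linalg.inv(np.eye(3) - alpha * W)
print(np.all(np.linalg.eigvalsh(Sigma) > 0))  # → True (valid covariance)
```

In practice one never forms this dense inverse during MCMC; the sparse structure of I − αW is exploited instead, as discussed in the Appendix.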
Note that this presentation of the model assumes the random error (or systematic fluctuations in model dynamics; see Clark et al. (2015)) follows a normal distribution, an assumption that should be checked when analyzing a real data set; if the normality assumption is violated, a different error distribution can be specified (e.g., Samadi et al. 2018). In our analysis in "Model fitting" and "Model diagnosis" sections, the residuals indicated a heavy-tailed pattern; we considered a Laplace error specification but ultimately used a t distribution for the errors, which proved well suited to South Carolina's rainfall-runoff processes (see Samadi et al. (2018)). Otherwise, the CAR model outlined in this section was used unchanged in our analysis.

Scaling
Scaling is applied to the gage level measurements because baseline levels, which are set in a fairly arbitrary manner, vary drastically across locations. For instance, Station 02160991, located in the Broad River near Jenkinsville in South Carolina, has an average gage height of more than 200 feet (61 meters), while the Waccamaw River, for example, has a much lower average gage height.
To account for the disparity, we use y_ij − ỹ_i· as the response variable, where y_ij is the original gage height for location i on the jth day, and ỹ_i· is the median of location i over 10 years. Figure 4 is a time series plot of the gage heights of five randomly selected locations after scaling.
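This station-wise median centering is a one-line groupby in pandas; the two stations and their gage heights below are toy values chosen to mimic the baseline disparity:

```python
import pandas as pd

# Center each station's gage heights at its own long-run median, so that
# sites with arbitrary stage baselines become comparable.
df = pd.DataFrame({
    "station": ["A", "A", "A", "B", "B", "B"],
    "gage_height": [200.0, 201.0, 205.0, 3.0, 4.0, 8.0],
})
df["scaled"] = (df["gage_height"]
                - df.groupby("station")["gage_height"].transform("median"))
print(df["scaled"].tolist())  # → [-1.0, 0.0, 4.0, -1.0, 0.0, 4.0]
```

After centering, both stations fluctuate around zero despite baselines that differ by two orders of magnitude.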

Autoregressive terms
Autoregressive terms were considered for inclusion in the model along with covariates such as precipitation, since it might conceivably take days for precipitation to cause a significant rise in the gage level. The optimal number of terms was determined by inspecting the ACF and PACF of the residuals, along with a comparison of mean squared errors against models with more or fewer autoregressive terms. A first-order autoregressive term was used in the finalized model; a detailed discussion can be found in "Model diagnosis" section.

Result
We now present the model fitting results using the CAR model described in "Model description" section; recall that we specify a t error distribution to account for the heavy-tailed behavior of the random errors, as explained further in "Model diagnosis" section. We sample four chains from the posterior distribution of β, τ and ρ, and 95% credible intervals are reported as follows. For each chain we ran 50,000 iterations, and the burn-in period was set to 5,000. We experimented with both the Metropolis sampler and the No-U-Turn sampler and found that both yield similar credible intervals. Winter is used as the baseline season, and a positive estimate for summer, for example, indicates a rise in the predicted gage height compared to winter. Additionally, we found that elevation was not significantly related to the gage level and thus is not included in the final model. As seen from Table 2, precipitation has a significant effect on the flood level: a rise of one inch (2.54 centimeters) in precipitation leads to a 0.25 inch (0.64 centimeter) increase in the gage measurements on average during the winter season. Among all the seasons, only summer stands out with a statistically significant effect on the gage height. A positive estimate indicates a 0.041 inch (0.10 centimeter) higher predicted gage level for a summer day than a winter day, assuming the days had no precipitation. More importantly, the interaction between the summer season and precipitation has a positive estimate, which indicates that the effect of rainfall on gage levels differs across seasons. Specifically, during summer, rainfall contributes to a larger rise (0.03 inch, or 0.08 centimeter, more) in the predicted gage level. In other words, a stronger association between precipitation and flooding can be observed during summer compared to other times of the year.
Lastly, a positive estimate of α suggests that the locations within the same watershed are positively associated, while a positive estimate of ρ indicates that an autoregressive effect is present between different days: for example, a large gage height at a particular location is very likely to be followed by a large gage height measurement the next day at that location. This implies that the relationships between model parameters and covariates can reflect physical mechanisms of runoff generation at a watershed scale. When the model parameters and the covariates have a stochastic pattern/behavior in time, the model structure reflects more complex nonlinear temporal patterns and relationships between a response variable and the covariates. In this context, spatio-temporal variability of the interface needs to be deduced from metadata, such as the effects of water abstraction scheduling, dam construction and operation, etc., as recently concluded by Serinaldi and Kilsby (2015) and Samadi and Meadows (2017).

Model diagnosis
In this section, we examine the goodness of fit of the CAR model from several perspectives. The autocorrelation function (ACF) and partial autocorrelation function (PACF) are employed to examine the residuals from a temporal point of view. Spatially, we display the residuals on the map and check for signs of systematic misprediction in any particular areas.
To examine the residuals from a temporal perspective, the residuals are grouped based on their watershed membership and averaged over all locations within each watershed. The CAR model was initially fitted without autoregressive terms, and the ACF and PACF of the residuals are given in Fig. 5. The slow decay in the ACF plot and the cut-off pattern in the PACF plot suggest adding an AR(1) term to the CAR model. To further evaluate the effectiveness of the autoregressive model, we show a time series plot (Fig. 6, left panel) after averaging the residuals spatially. We also calculate the 2.5% and 97.5% percentiles, so the shaded area indicates the range covering 95% of all residuals. No apparent autocorrelation pattern is detected, although the last few observations indicate increased volatility in gage height. The absence of an autocorrelation pattern is attributed to the autoregressive term, since a CAR model without the AR(1) term yields a residual time series plot with a more obvious autocorrelation pattern and more variability (Fig. 6, right panel).
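The "slow ACF decay" diagnostic can be reproduced on simulated data; the sketch below computes a sample ACF by hand on a synthetic AR(1) series (statsmodels' plot_acf/plot_pacf provide the same diagnostics graphically), standing in for residuals of a model with no autoregressive term:

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation function up to max_lag (minimal sketch)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = (x ** 2).sum()
    return np.array([1.0] + [(x[:-k] * x[k:]).sum() / denom
                             for k in range(1, max_lag + 1)])

# Synthetic AR(1)-like series: its ACF decays slowly, the signature that
# motivates adding an AR(1) term to the CAR model.
rng = np.random.default_rng(42)
e = rng.normal(size=500)
y = np.empty(500)
y[0] = e[0]
for t in range(1, 500):
    y[t] = 0.8 * y[t - 1] + e[t]
print(np.round(acf(y, 3), 2))  # lag-0 is 1.0; later lags decay geometrically
```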
In addition, we examine several time series plots at multiple locations in Fig. 7. We picked four stations of varying degrees of volatility for demonstration purposes, each from a different HUC-6 area. Station 02172002, for example, rarely saw drastic changes in gage height over the five years between 2012 and 2016, the only exception being the 2015 flood event. In contrast, Station 02197500 demonstrates a different pattern, which is more variable and not dominated by the 2015 event. Overall, the autoregressive model consistently captures the trend across different HUC-6 area locations with no systematic overestimation or underestimation. The calculation of mean squared error per site also validates this conclusion.
From a spatial perspective, we examine the distribution of residuals by visualizing them on a map with different colors representing negative and positive errors (Fig. 8). The radius of the circle is proportional to the residual. This is a daily snapshot on October 3, 2015, from which one can conclude that the residuals are fairly evenly distributed. Several randomly selected snapshots have been examined during the five-year span and no significant sign of overestimation or underestimation is observed. Additionally, one can aggregate the residuals over a time period, for instance, a year, and make a yearly residual map for inspection. Such visualization presents a similar picture as Fig. 8 and is thus omitted for the sake of space.
Recall that our fitted model presented in "Model fitting" section used a non-normal error structure; we now explain that choice. If we fit a model with a normal error distribution and examine the qq (quantile-quantile) plot (Fig. 9b), we observe a pattern that suggests heavy-tailed errors and thus a violation of normality. This is potentially due to the heavy-tailed distribution of maximum gage heights (Fig. 9a). Note that the observations are plotted after the aforementioned scaling operation. The data are also slightly skewed to the right, possibly due to occasional thunderstorms, which cause short-term, sharp, and severe rises in the gage heights. Flow dynamics in the eastern US are mainly governed by the landfall of tropical cyclones and extratropical systems, and tail asymmetry can be partly related to the mixed dynamics forcing floods in different seasons (Smith et al. 2011). Instead of normal errors, an error structure that follows either a t or a Laplace distribution handles extreme rainfall values better. Specifically, we pick a t distribution with 3 degrees of freedom since ν = 3 defines a distribution with reasonably heavy tails while guaranteeing that both the expectation and variance exist. Alternatively, one can treat ν as a hyper-parameter to be sampled from the posterior distribution. Another possible error distribution is a skew-t distribution, although for this data set the symmetric heavy-tailed error distributions provide a good fit. The t distribution with ν = 3 (right panel, Fig. 10) is slightly better in terms of its qq plot than the Laplace (left panel, Fig. 10). Hence, the estimation reported in "Result" section was based on the model assuming that the error term follows a t distribution with 3 degrees of freedom. Note that the parameter estimates are similar for the two models assuming either of the heavy-tailed distributions (t or Laplace).
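A quick simulation illustrates why the t(3) tails matter; the sketch below compares upper quantiles of simulated normal and t(3) samples, mirroring the divergence one sees in the qq plots (synthetic data, not the gage residuals):

```python
import numpy as np

# Compare tail weight of normal vs t(3) samples: the t's extreme
# quantiles sit far beyond the normal's, which is the qq-plot pattern
# that motivated the heavy-tailed error specification.
rng = np.random.default_rng(7)
normal = rng.normal(size=100_000)
t3 = rng.standard_t(df=3, size=100_000)
for q in (0.99, 0.999):
    print(q, round(np.quantile(normal, q), 2), round(np.quantile(t3, q), 2))
# The 99.9th percentile of t(3) (roughly 10) dwarfs the normal's
# (roughly 3.1), so t errors absorb extreme gage rises far better.
```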

Model comparison
It is of interest to evaluate the forecasting capability of the aforementioned CAR model since the gage observations are, in and of themselves, time series data, and forecasting real-time and future flood events might be helpful for early warning systems and emergency management. We compare out-of-sample predictions for the first week of 2017 with the ground truth and calculate the mean squared error and the mean absolute error as metrics, since such calculations can be applied to any type of model as long as the response variable is continuous. Note that the rainfall and gage height data are measured simultaneously, so the rainfall at a particular time is used to predict or explain the flooding at the same time; we have found that the time lag between the rainfall cause and the flooding consequence is minimal. However, note that in practice the use of this model for real-time flood forecasting would require some projection of rainfall into the future; using the model for retrospective explanatory purposes after the flooding was observed would, of course, not require such projections. In addition, we consider several other models to compare with the CAR model: specifically, a standard linear regression model and two members of a popular family of classification and regression methods, random forest and boosting trees. Comparing the linear model with CAR highlights the necessity of including a proximity matrix, since a customized covariance structure is the major difference between the two. Random forest and boosting trees, two popular machine learning algorithms, predict patterns in data by combining the outputs of individual trees and give decent benchmarks of model performance. Both are tree-based algorithms that use entropy as the loss function, but the random forest builds its trees in parallel while adaptive boosting works sequentially.
Specifically, a random forest obtains results by averaging the predictions of its individual decision trees, while adaptive boosting builds decision trees iteratively, adjusting the weight of each observation until convergence.
The random forest method (Breiman 2001) and boosting trees (Friedman 2001) are chosen as benchmark models since they are routinely used in the water sciences. For example, random forest regression has been demonstrated to be effective for unsaturated hydraulic conductivity and superior to the M5 tree model (Sihag et al. 2019). A random forest approach has also been applied to forecasting the spatial distribution of sediment pollution and detecting its predictors (Walsh et al. 2017). Tyralis et al. (2019) provided a thorough overview of recent papers involving hydrology and random forest applications, whose themes include rainfall-runoff modeling, streamflow forecasting, and drought prediction.
For a fair comparison, all three benchmark models include the same seven covariates as the CAR model: precipitation, the seasonality variables, and all related interaction terms. It is worth noting that the spatial information is handled differently in the CAR model than in the benchmark models: rather than defining a covariance matrix based on the basin information, the benchmarks include the water basin indicator as a categorical variable. The mean squared error (MSE) and the mean absolute error (MAE) are reported in Table 3. As seen in Table 3, the CAR model outperforms the benchmark models by a considerable margin. In general, the CAR model is better at exploiting the spatial dependency while the random forest algorithm makes better use of covariates; the large difference between the two models in MAE and MSE can therefore be explained by the small number of covariates. This notion is further substantiated by examining the feature importance in the random forest and boosting trees (feature importance is measured by the amount of entropy reduced after a variable is added to the full model), since these two models assign almost negligible importance to the watershed variables (Table 4). On the other hand, consistent with the CAR model, the benchmark models such as the random forest recognize precipitation as a major contributor to the flood height (with a feature importance value of 0.6720 based on the random forest and 0.4686 based on boosting trees).
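The two comparison metrics are simple to compute for any continuous-response model; the sketch below defines them in NumPy and applies them to a toy one-week holdout (the observed and predicted values are invented for illustration, not the study's forecasts):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred)))

# Toy one-week holdout: observed scaled gage heights vs two models.
y = [0.2, 0.5, 1.1, 0.9, 0.4, 0.3, 0.2]
pred_a = [0.3, 0.4, 1.0, 1.0, 0.5, 0.2, 0.3]   # close fit
pred_b = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 0.0]   # systematic underprediction
print(round(mse(y, pred_a), 3), round(mae(y, pred_a), 3))
print(round(mse(y, pred_b), 3), round(mae(y, pred_b), 3))
```

MSE penalizes the underpredicting model's larger errors quadratically, while MAE weights all errors linearly, which is why both are reported in Table 3.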
To validate the conclusion, we also compute the MSE based on leave-one-out (LOO) cross-validation over locations. We use an approximate LOO rather than traditional K-fold cross-validation to avoid the computational burden of partitioning the data (locations) repeatedly and refitting the model on every partition. Vehtari et al. (2017) introduced an efficient computation of LOO from MCMC samples, which uses Pareto-smoothed importance sampling (PSIS) to provide an estimate of point-wise out-of-sample prediction accuracy. In this experiment, the CAR model gives an MSE of 0.32, outperforming the linear model (1.12), random forest (1.32), and boosting trees (1.66), consistent with the outcome of the one-week forecast task.
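The location-wise LOO scheme can be mimicked with a plain leave-one-group-out loop. The sketch below uses synthetic data and ordinary least squares in place of the actual models (the paper instead approximates LOO from MCMC samples via PSIS, avoiding any refitting); it refits once per held-out location and averages the held-out MSE:

```python
import numpy as np

rng = np.random.default_rng(1)
n_loc, n_obs = 5, 40                          # hypothetical gage locations
loc = np.repeat(np.arange(n_loc), n_obs)
precip = rng.gamma(2.0, 1.0, n_loc * n_obs)
y = 0.5 * precip + 0.1 * loc + rng.normal(0, 0.3, n_loc * n_obs)

mse_per_loc = []
for held_out in range(n_loc):                 # leave one location out at a time
    train, test = loc != held_out, loc == held_out
    X_tr = np.column_stack([np.ones(train.sum()), precip[train]])
    beta, *_ = np.linalg.lstsq(X_tr, y[train], rcond=None)
    X_te = np.column_stack([np.ones(test.sum()), precip[test]])
    mse_per_loc.append(float(np.mean((y[test] - X_te @ beta) ** 2)))

loo_mse = float(np.mean(mse_per_loc))
print("location-wise LOO MSE:", round(loo_mse, 3))
```

Holding out an entire location, rather than random rows, is what makes the score an estimate of predictive accuracy at an unseen gage site.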

Discussion
We have presented a spatiotemporal model for gage height in South Carolina from 2011 to 2015, a period including one of the most destructive storms in state history. Our model accounts for the heavy-tailed pattern of the response variable and allows us to identify several covariates that affect the gage height and to interpret their effects. In particular, due to the effect of interactions, a stronger association between precipitation and flooding can be observed during summer compared to other times of the year. If reliable precipitation forecasts are available, our model could be used for forecasting real-time and future flood events, potentially aiding early warning systems and emergency management.
In addition, we developed a Python library to streamline the data preprocessing steps. Data scraping, cleaning, aggregating, and transforming can each be done with simple function calls. We demonstrate several reusable modules by providing basic examples in the GitHub repository of our package. Our hope is that such tools will enable straightforward application of similar spatio-temporal models to flood data in the future.
Sparse matrix representation
The proximity matrix used in our model consists mostly of zeroes, so considerable computation and memory would be wasted on the zeroes if we employed a standard dense-matrix algorithm. Specifically, a dense matrix is typically stored as a two-dimensional array in which each entry represents an element a_ij of the matrix; any element can be accessed by specifying the row index i and the column index j. In contrast, a typical sparse-matrix representation stores only the nonzero entries, so memory use can be reduced substantially. As a tradeoff, retrieving individual elements becomes more complex in a sparse matrix.
In practice, there are several representations of a sparse matrix. Some formats, such as DOK (Dictionary of Keys) and COO (Coordinate List), stand out for efficient modification; others, e.g., Compressed Sparse Row (CSR), support fast matrix operations. CSR suits our needs better, since evaluating a multivariate normal density involves matrix multiplication, and it is therefore the format implemented in our model.
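To make the tradeoff concrete, a DOK representation in plain Python is simply a dictionary keyed by (row, column): insertion and modification are cheap dictionary operations, but there is no row-wise layout to support fast products (an illustrative sketch, not our library's implementation):

```python
# DOK (Dictionary of Keys): convenient for incremental construction
dok = {}
dok[(1, 0)] = 5              # set the entry at row 1, column 0
dok[(1, 1)] = 8
dok[(1, 1)] = 9              # modification is a plain dict update
value = dok.get((0, 3), 0)   # absent keys are implicit zeroes
print(sorted(dok.items()), value)
```

A typical workflow builds the matrix in a modification-friendly format like DOK or COO and then converts it once to CSR for the repeated multiplications.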
The compressed sparse row (CSR) format represents a matrix by three one-dimensional arrays: the nonzero values A, the row pointers IA, and the column indices JA. Note that the row information is not stored in a straightforward manner. Consider, for example, the 4x4 matrix

  0 0 0 0
  5 8 0 0
  0 0 3 0
  0 6 0 0

whose CSR representation is A = [5, 8, 3, 6], IA = [0, 0, 2, 3, 4], JA = [0, 1, 2, 1]. The array A holds the nonzero values, and their column indices are stored in JA. For instance, the value 3 lies in the third column, so the third element of JA is 2, which stands for the third column under zero-based indexing. The array IA contains the row information and is defined recursively: IA[0] = 0 and IA[i] = IA[i-1] + k, where k is the number of nonzero elements in the i-th row of the original matrix. By this definition, the length of IA is m + 1 when the matrix has m rows, and the last element of IA is always the total number of nonzero values.
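This construction can be written in a few lines of plain Python. The helper below (an illustrative sketch, not our library's API) converts a dense matrix into the three CSR arrays, with the example matrix chosen to have [5, 8, 0, 0] as its second row:

```python
def dense_to_csr(M):
    """Convert a dense matrix (list of rows) into CSR arrays A, IA, JA."""
    A, IA, JA = [], [0], []
    for row in M:
        for j, v in enumerate(row):
            if v != 0:
                A.append(v)      # nonzero value
                JA.append(j)     # its zero-based column index
        IA.append(len(A))        # IA[i] = IA[i-1] + nonzeros in row i
    return A, IA, JA

M = [[0, 0, 0, 0],
     [5, 8, 0, 0],
     [0, 0, 3, 0],
     [0, 6, 0, 0]]
print(dense_to_csr(M))  # ([5, 8, 3, 6], [0, 0, 2, 3, 4], [0, 1, 2, 1])
```

Appending `len(A)` after each row is exactly the recursive definition IA[i] = IA[i-1] + k, since `len(A)` grows by the number of nonzeros just emitted.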
A sparse matrix stored in CSR format is efficient for matrix-vector multiplication due to the structure of IA and JA. For instance, multiplying the row [5, 8, 0, 0] by a vector, say [1, 0, 9, 9], requires retrieving only the nonzero values at locations 0 and 1 of A, which belong to the second row. Their column positions are conveniently stored in JA, and the number of nonzeros in the row can be read off from consecutive entries of IA. Since we only need the dot product of [5, 8] with [1, 0], the computation is reduced by half. In practice, we observed a six- to ten-fold speedup of the sampler after switching from a dense-matrix implementation, since the adjacency matrix is highly sparse.
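A minimal CSR matrix-vector product, touching only the nonzero entries, might look like the following sketch; the arrays encode a hypothetical 4x4 matrix whose second row is [5, 8, 0, 0] and whose only other nonzeros are a 3 in row 3 and a 6 in row 4:

```python
def csr_matvec(A, IA, JA, x):
    """Compute y = M @ x from CSR arrays, skipping all zero entries."""
    y = [0] * (len(IA) - 1)
    for i in range(len(IA) - 1):
        for k in range(IA[i], IA[i + 1]):   # nonzeros of row i
            y[i] += A[k] * x[JA[k]]
    return y

A, IA, JA = [5, 8, 3, 6], [0, 0, 2, 3, 4], [0, 1, 2, 1]
print(csr_matvec(A, IA, JA, [1, 0, 9, 9]))  # [0, 5, 27, 0]
```

The inner loop runs once per nonzero overall, so the cost is proportional to the number of nonzeros rather than to the full matrix size, which is the source of the speedup reported above.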