On the Evaluation of Both Spatial and Temporal Performance of Distributed Hydrological Models Using Remote Sensing Products

: Evaluating the spatial and temporal model performance of distributed hydrological models is necessary to ensure that the simulated spatial and temporal patterns are meaningful. In recent years, spatial and temporal remote sensing data have been increasingly used for model performance evaluation. Previous studies, however, have focused on either the temporal or spatial model performance evaluation. In addition, temporal (or spatial) model performance evaluation is often conducted in a spatially (or temporally) lumped approach. Here, we evaluated (1) the temporal model performance evaluation in a spatially distributed approach (spatiotemporal) and (2) the spatial model performance in a temporally distributed approach (temporospatial). We further demonstrated that both spatiotemporal and temporospatial model performance evaluations are necessary since they provide different aspects of the model performance. For this, a case study was developed using the Soil and Water Assessment Tool (SWAT) for the Upper Baitarani catchment in India, and the spatiotemporal and temporospatial model performance was evaluated against three different remotely based actual evapotranspiration (ETa) products (MOD16 A2, SSEBop, and TerraClimate). The results showed that an increase in the spatiotemporal model performance would not necessarily lead to an increase in the temporospatial model performance and vice versa, depending on the evaluation statistics. Overall, this study has highlighted the necessity of a joint spatiotemporal and temporospatial model performance evaluation to understand/improve spatial and temporal model be-havior/performance.


Introduction
Spatial heterogeneity in catchment characteristics (land use, soil type, and slope) and spatial and temporal variations in hydrological forcing (rainfall, wind speed, and solar radiation) are often observed in nature [1]. These factors cause spatial and temporal variations of hydrological processes/responses, e.g., soil moisture [2,3], evapotranspiration [4], and groundwater recharge [5]. Spatially and temporally distributed hydrological models (herein referred to as distributed models) have been proven as effective tools for modeling spatial and temporal hydrological processes, e.g., the Soil and Water Assessment Tool (SWAT) [6,7], the TOPography-based hydrological MODEL (TOPMODEL) [8], and the Variable Infiltration Capacity (VIC) model [9], and the mesoscale Hydrologic Model (mHM) [10,11].
In order to ensure that the distributed models provide meaningful spatial and temporal patterns of the interested processes, the simulated spatial and temporal patterns should be evaluated against the reference spatiotemporal data. In recent years, remotesensing data of high spatial and temporal resolution have been increasingly used for model evaluation, especially in data-scarce regions [12][13][14][15][16][17][18][19][20][21][22][23]. However, evaluating both the spatial and temporal performance of distributed models using remote sensing data has not been well addressed in the literature. The preceding studies have focused on evaluating either the temporal or spatial model performance. The majority of the conducted studies have evaluated the temporal model performance while the spatial evaluation is often performed using a spatially lumped approach, e.g., at the basin or subbasin level [20,21,24]. Similarly, the spatial model evaluation is often conducted using a temporally lumped approach, e.g., monthly, yearly, or the entire simulation time (e.g., Koch et al. [17]). Up to now, no study has evaluated both (1) the temporal model performance in a spatially distributed approach (hereinafter referred to as the spatiotemporal model performance evaluation) and (2) spatial model performance in a temporally distributed approach (hereinafter referred to as the temporospatial model performance evaluation).
Evaluating the spatiotemporal and temporospatial model performance are the two different aspects of the model evaluation. In this study, we focus on the spatiotemporal model performance evaluation approach which quantifies the temporal pattern matching between the simulated and reference time-series data at each spatial unit (e.g., pixel) of the model domain. This can be undertaken by using different temporal performance indices, e.g., the Nash-Sutcliffe efficiency (NSE) [25], the Kling-Gupta efficiency (KGE) [26], percentage bias (PBIAS), the root mean square error (RMSE), and the ratio of the RMSE to the standard deviation of measured data (RSR). Spatiotemporal model performance evaluation, therefore, provides information about the locations where the temporal model performance is not good (spatial model issue).
In contrast, this study also assessed the temporospatial model performance evaluation that quantifies the matching between the simulated and reference spatial patterns at each model time step. It can be undertaken using different spatial performance indices. Some of the temporal performance metrics could be used (e.g., RSR, BIAS). Some performance metrics were specially developed for evaluating the spatial pattern matching, e.g., the SPAtial Efficiency metric (SPAEF; Koch et al. [17]), fractions skill score [27], and others [28]. Among these spatial performance matrices, the SPAEF metric, which consists of three equally weighted components (correlation, coefficient of variation, and histogram overlap) was demonstrated as a robust statistical index for evaluating spatial pattern matching [17]. Temporospatial model performance evaluation provides information about the timing when the spatial model performance is not good (temporal model issue). This topic has been less explored in depth so far.
From the above discussion, it can be said that spatiotemporal and temporospatial model evaluation (1) should be used together to detect both spatial and temporal model issues, and (2) are the two different aspects of model evaluation. However, this statement has not been validated by the research conducted in the past. Therefore, the main objective of this study is to validate the two aforementioned statements. For these objectives, the Soil and Water Assessment Tool (SWAT) [6] was set up for the Upper Baitarani catchment located in a data-scare region in India. Satellite-derived actual evapotranspiration (ETa) data from multiple sources [29][30][31] were used as reference spatiotemporal data for model evaluation due to its global coverage and availability. Results from this study are also expected to promote the use of remotely sensed data for the spatial and temporal evaluation of distributed hydrological models.

Study Area and Data
The Upper Baitarani catchment is in India with an area of about 1711 km 2 ( Figure 1). The digital elevation model (DEM) from the Shuttle Radar Topography Mission shows that the elevation of the catchment varies in a wide range, from 330 to 1120 m above mean sea level (a.m.s.l). Despite the study area covering a relatively large area with high variation in topography, there is only one weather station located in the area. Observed rainfall shows an average annual rainfall of 1165 mm with high temporal variation and out of which more than 80% of the rainfall occurs between June and October. The average monthly minimum and maximum temperature in the area are 11 °C and 34 °C, respectively. The dominant land uses/land covers in the area are agricultural, forest, and range grass, accounting for about 36%, 32%, and 25% of the study area, respectively. The dominant soil type in the area is sandy clay loam with low water holding capacity [32]. The observed weather data were procured from the India Meteorological Centre, Bhubaneswar, India. Furthermore, sunshine hours were used for calculating the solar radiation data for the considered catchment.

Spatiotemporal and Temporospatial Model Performance Evaluation
Evaluation of the spatial and temporal performance is the assessment of the simulated spatiotemporal data against the reference (or observed) spatiotemporal data. Here, the model output data that need to be evaluated are time-series data at every pixel (with an equal area) of the modeling domain ( Figure 2). In the spatiotemporal approach (Figure 2), the simulated time-series data at every pixel (spatial index S) are first evaluated against the reference time-series data at the corresponding pixel using a temporal performance index T (e.g., NSE, KGE, PBIAS, RMSE). The spatial variation of the temporal index T provides information on the spatiotemporal model performance. In the temporospatial approach (Figure 2), the simulated spatial patterns are evaluated against the reference spatial patterns at every time step using the spatial index S (e.g., SPAEF, RMSE). Information about the temporospatial model performance could be obtained from the temporal variation of the spatial index S.

Reference Spatiotemporal ETa
In this study, global satellite-based ETa products from the Moderate Resolution Imaging Spectroradiometer (MOD16 A2) [29], the operational Simplified Surface Energy Balance model (SSEBop) [30], and the TerraClimate [31] were used. ETa from these products was estimated using different techniques and is available at different spatial and temporal resolutions. For example, MOD16 A2 ETa was derived using the Penman-Monteith approach with daily meteorological reanalysis data (air pressure, temperature, humidity, solar radiation) and 8-day remote-sensing vegetation indices (leaf area index, albedo, and fraction of photosynthetically active radiation). MOD16 A2 ETa is available at 0.5-km spatial resolution and 8-day interval. SSEBop ETa was calculated with the simplified energy balance model [30]. This model combines the estimated ET fraction based on MODIS thermal imagery and grass-reference potential ET. SSEBop ETa is available at 1-km resolution and monthly timestep. TerraClimate ETa was estimated based on the one-dimensional modified Thornthwaite-Mather climatic water-balance model with (1) monthly climate data from the WorldClim [33,34], the Climate Research Unit [35], and the Japanese 55-year Reanalysis [36], and (2) water storage capacity from the root zone storage capacity [37]. Global TerraClimate ETa is available at 4-km spatial resolution and monthly time step.
The differences in the absolute ETa between the three products are expected due to the different techniques and input data used to estimate ETa. The annual average ETa during the period 2003-2010 in the study area is 606 mm (MOD16 A2), 970 mm (SSEBop), and 845 mm (TerraClimate). The spatial and monthly ETa patterns also show their dissimilarities ( Figure 3). However, there are some common spatial and temporal ETa patterns among these products, e.g., low ETa near the catchment outlet, high ETa in the western part of the study area, and high seasonal variation. Evaluating the accuracy of these products is not within the scope of this study. Here, we instead provide the novel techniques for spatiotemporal and temporospatial model evaluation. Future studies can evaluate the validity of different remote-sensing products before applying our approach for spatiotemporal and temporospatial model evaluations. In this study, the considered ETa products were used as reference data for model evaluation to (1) explore the SWAT model performance compared with different ETa products and (2) account for the uncertainty in the estimated ETa from different ETa products.

The SWAT Model
SWAT is a distributed hydrological used for evaluating the impacts of land use management practices on water, sediment, and nutrient yields [6]. SWAT has been widely used and tested in various catchments worldwide [38]. In SWAT, a catchment is divided into sub-catchments which are further sub-divided into hydrologic response units (HRUs). An HRU is a fraction of land with a unique combination of land use, soil type, and slope within a sub-catchment. SWAT simulates two phases of the hydrological cycle, the land phase and the routing phase. The land phase includes HRU-related processes (e.g., evapotranspiration, surface runoff, infiltration, lateral flow, groundwater recharge, and based flow). The routing phase simulates flow in water bodies (e.g., river, reservoir, and pond). SWAT provided outputs at different spatial levels, e.g., HRU, sub-catchment, or catchment. Outputs at the HRU level could be mapped to the HRU raster file, which contains the spatial information of HRUs, created during the model setup with ArcSWAT (e.g., Kim et al. [39]; Nguyen & Dietrich [40]).

Model Setup
Based on the land use, soil type, and slope, the study area is divided into 45 catchments, which are further sub-divided into 875 hydrologic response units (HRUs). The model was set up to simulate for ten years at the daily time step with two years of warm-up (2001)(2002) and eight years of model evaluation (2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010). In SWAT, ETa is calculated after potential evapotranspiration (ETp). There are several methods available for calculating ETp [6,41], for example, the Penman-Monteith [42][43][44], Priestly-Taylor [45], and Hargreaves [46]. Here, we selected the Penman-Monteith method as it was one of the widely used methods for evapotranspiration quantification [47]. Daily ETp is calculated as follows [6,[42][43][44]: where Δ is the slope of the saturated vapor pressure curve (kPa °C −1 ), is the net radiation (MJ m −2 , d −1 ), G is the heat flux to the soil (MJ m −2 d −1 ), is the air density (kg m −3 ), is the specific heat at constant pressure (MJ kg −1 °C −1 ), and are the saturated and the actual vapor pressure at height z (kPa), respectively, is the bulk surface aerodynamic resistance for water vapor (s m −1 ), is the latent heat of vaporization (MJ kg −1 ), is the psychrometric constant (kPa °C −1 ), and is the canopy surface resistance (s m −1 ). Daily simulated ETa from SWAT was aggregated to an 8-day interval for comparison with MOD16 A2 ETa and to a monthly interval for comparison with SSEBop and Terra-Climate ETa. The aggregated ETa was then mapped to the HRU raster map of 90 m resolution that was created during the model setup using ArcSWAT (e.g., Kim et al. [39]; Nguyen & Dietrich [40]). ETa products from MOD16 A2, SSEBop and TerraClimate were spatially disaggregated to the same spatial resolution of SWAT ETa (90 m resolution) for comparison.

Parameter Variation and Model Performance Evaluation Scenarios
The objectives of parameter variation are to (1) search for a global optimal solution, and (2) find a relationship between spatiotemporal and temporospatial model performance in different model simulations. The model parameters (Table 1) were selected based on the literature review of the most common parameters for ETa calibration [19,20,23,48,49]. These parameters affect evaporation/evapotranspiration in different model compartments. For example, the soil evaporation compensation factor (ESCO) controls the maximum allowable soil evaporation with depth. Higher plant uptake compensation factors (EPCO) mean that plants can take water from deeper soil layers, resulting higher evapotranspiration. Higher groundwater "revap" coefficients mean that more water from the shallow aquifer could be moved up to the root zone for evaporation. The maximum canopy storage (CANMX) affects the amount of precipitation that can be held by plant canopy; therefore, it affects evapotranspiration. The effect of other parameters (Table 1) on evapotranspiration is not straightforward to interpret. In this study, some of the parameters were changed in a spatial approach according to land use and soil type ( Table 1). Latin Hypercube Sampling (LHS) was used to generate 2000 parameter sets in a search for a global optimum solution. LHS has been demonstrated as an efficient sampling approach for searching for a global optimal solution for high-dimensional problems [23]. The prefixes "v" and "r" indicate "replace" and "relative" change compared to the original values.
In this study, the optimal solution achieved from 2000 model runs was used as an example for demonstrating (1) how spatiotemporal and temporospatial model performance evaluation could be conducted, and (2) the benefits of conducting both spatiotemporal and temporospatial model evaluation (Section 3.1). The following multi-objective function (OF) was used for selecting the optimal solution: where TS and ST are the spatiotemporal and temporospatial model performance indices, respectively, and are the statistical indices for temporal and spatial model performance, respectively, for actual evapotranspiration (ETa) at the pixel j and time step k, npixels is the number of pixels, and k is the number of evaluation time steps. For this analysis, we selected a MOD16 A2 ETa product as reference data with is SPAEF (Equation (6)) and is NSE (Equation (7)) as an example in this case (Section 3.1). To evaluate the relation between spatiotemporal and temporospatial model performance, we used different spatiotemporal (ST) and temporospatial (TS) performance indices (Table 2). For this evaluation, different ETa products (MOD16 A2, SEEBop, and Terra-Climate) were used to draw reliable conclusions. The relation between spatiotemporal and temporospatial model performance was analyzed based on 2000 model runs (Section 3.2). Table 2. List of spatial and temporal model performance statistics used in this study.

Notation
Range and Ideal Value Notation Range and Ideal Value Numbers in bold indicate the ideal value.
The spatiotemporal (ST) and temporospatial (TS) performance statistics (Table 2) are calculated using Equations (3) and (4), respectively, while the temporal T (e.g., SPAEF, NSE, RMSE, RSE, absolute bias aBIAS) and spatial S (e.g., KGE, NSE, RMSE, RSR, aBIAS) performance are calculated as follows: with: where is the linear correlation between observed and simulated values (which in this study are the simulated and satellite-derived ETa), is the variability measure, is the bias term, is the histogram intersection, and are the simulated and observed values, ̅ is the mean of observed , and L are the histograms of observed and simulated variables, respectively, m is the number of bins, and | | is the absolute sign. Figure 4 shows the best spatiotemporal and temporospatial model performance for ETa (with MOD16 A2 ETa as the reference dataset). It is seen that the spatial distribution of the NSE is far from homogeneous (Figure 4a), which could not be seen in a spatially lumped approach. From the spatial distribution of NSE, the areas with a relatively poor temporal model performance compared to others could be easily identified. This information could be used to (1) further improve the model performance and (2) detect the reason for poor temporal model performance in these areas. The histogram and cumulative plots provide useful statistical information about the temporal model performance (Figure 4b,c). The histogram (or cumulative) plot shows the fraction of the study area that has the NSE within a specific range (or below a specific value). The histogram plot is seen to be highly skewed toward the optimal NSE value (optimal NSE = 1). This is the desired model performance, in other words, a higher skewness of the NSE distribution toward the optimal point indicates a better model performance.

Spatiotemporal and Temporospatial Model Performance
Despite the NSE being a temporal performance index, it does not provide information about the time when the model poorly performs. However, this can only be seen in the temporal variation of the SPAEF index (Figure 3d). For example, during April 2004 and April-May 2007, the simulated patterns are poorly matched with the reference data while in other periods the match is far better. The distribution of SPAEF indices is also highly skewed toward the optimal value as about 40% of the time step of the SPAEF indices are less than −1 (Figure 4e,f).  Figure 5 shows that the relation between spatiotemporal (ST) and temporospatial (TS) model performance varies depending on the reference ETa data and the statistical indices used for evaluation. For example, when KGE and SPAEF were used to derive the spatiotemporal (S-KGE) and temporospatial (T-SPAEF) mode of performance statistics, S-KGE and T-SPAEF could be highly positively correlated (Figure 5a), highly negatively correlated ( Figure  5b), or uncorrelated (Figure 5c). A high positive correlation between S-KGE and T-SPAEF indicates that an increase in the spatiotemporal model performance will likely lead to an increase in temporospatial model performance. However, a high negative correlation between S-KGE and T-SPAEF means that an increase in spatiotemporal model performance will likely result in a decrease in temporospatial model performance and vice versa. An uncorrelated S-KGE and T-SPAEF indicates that the spatiotemporal model performance cannot be inferred from the temporospatial model performance and vice versa. The results show that even if the same statistical index was used for spatiotemporal and temporospatial model evaluation (e.g., S-NSE and T-NSE, Figure 5a) it does not always guarantee that an increase in the spatiotemporal model performance will lead to an increase in the temporospatial model performance (Figure 5a-c). Overall, the results show that both spatiotemporal and temporospatial model performance evaluation is needed.

Conclusions
Distributed hydrological models have long been used for understanding the spatial and temporal hydrological responses of a catchment. To ensure that the model can provide meaningful spatial and temporal information, the simulated spatiotemporal data from these models should be evaluated against the reference data. However, studies have either focused on temporal or spatial model evaluation. In addition, temporal (or spatial) model evaluation is often conducted in a spatially (or temporally) lumped approach. The terms "spatiotemporal" and "temporospatial" model performance evaluations introduced in this study refer to the evaluation of (1) the temporal model performance in a spatially distributed approach and (2) the spatial model performance in a temporally distributed approach, respectively. Here, we demonstrated that both spatiotemporal and temporospatial model performance evaluations are needed. They are indeed two different aspects of model evaluation. The results showed that spatiotemporal (or temporospatial) model performance evaluation could help in detecting the locations (or the time) where (or when) the temporal (or spatial) patterns are poorly represented by the model. Additionally, an increase in the spatiotemporal model performance will not necessarily lead to an increase in the temporospatial model performance and vice versa. This further suggests that both spatiotemporal and temporospatial model performance evaluations are needed to achieve a reliable model performance. Overall, the proposed approach provides graphical and statistical indicators for a better understanding of spatiotemporal and temporospatial model behavior which will further help in improving the model performance.  Data Availability Statement: The MOD16 A2, SSEbop, TerraClimate data used for model evaluation are freely available at https://modis.gsfc.nasa.gov/data/dataprod/mod16.php (accessed 15 January 2022), and https://earlywarning.usgs.gov/fews/product/460 (accessed 15 January 2022), and http://www.climatologylab.org/terraclimate.html (accessed 15 January 2022), respectively. The R code and SWAT model used in this study are available upon request via contacting the corresponding author.