Calibration and evaluation of a semi-distributed watershed model of Sub-Saharan Africa using GRACE data

Irrigation development is rapidly expanding in mostly rainfed Sub-Saharan Africa. This expansion underscores the need for a more comprehensive understanding of water resources beyond surface water. Gravity Recovery and Climate Experiment (GRACE) satellites provide valuable information on spatio-temporal variability in water storage. The objective of this study was to calibrate and evaluate a semi-distributed regional-scale hydrologic model based on the Soil and Water Assessment Tool (SWAT) code for basins in Sub-Saharan Africa using seven-year (July 2002–April 2009) 10-day GRACE data and multi-site river discharge data. The analysis was conducted in a multi-criteria framework. In spite of the uncertainty arising from the tradeoff in optimising model parameters with respect to two non-commensurable criteria defined for two fluxes, SWAT was found to perform well in simulating total water storage variability in most areas of Sub-Saharan Africa, which have semi-arid and sub-humid climates, and that among various water storages represented in SWAT, water storage variations in soil, vadose zone and groundwater are dominant. The study also showed that the simulated total water storage variations tend to have less agreement with GRACE data in arid and equatorial humid regions, and model-based partitioning of total water storage variations into different water storage compartments may be highly uncertain. Thus, future work will be needed for model enhancement in these areas with inferior model fit and for uncertainty reduction in component-wise estimation of water storage variations.


Introduction
Sub-Saharan Africa (SSA) is used as a collective term that refers to African nations which lie (or partially lie) south of the Sahara.The region makes up about 80 % of the African and 10 % of the global population.Agriculture forms the backbone of the SSA economy; however, SSA countries largely missed the green revolution.The agricultural productivity in SSA countries remains low relative to other parts of the world and the region is still beset with food insecurity.The number of estimated undernourished people in SSA in 2010 reached 239 million (FAO, 2010).SSA is also the only region where childhood malnutrition is projected to increase as a result of rapid population growth, climate change and continued low productivity in agriculture (Rosegrant et al., 2009).Annual population growth in SSA is 2.2 %, much higher than global average of 1.1 % (World Bank, 2009).In addition, SSA is regarded as the region with a particularly low capacity to adapt to climate change (IPCC, 2007).
Sustainable intensification of agriculture, with a focus on irrigation development, is considered a key pillar for increasing agricultural productivity in SSA (Rosegrant et al., 2002;Molden, 2007;Rockström et al., 2007).SSA straddles the Equator and is dominated by tropical and sub-tropical climate.Rainfall in SSA is highly variable both spatially and temporally and constitutes a more critical factor than temperature for agriculture.Limited water availability, particularly during droughts, is a key reason for crop failure, especially considering the fact that SSA agriculture is predominantly rainfed with only 3 % of the cultivated area irrigated (Siebert, 2010;FAO, 2011).Both international development banks and African governments have pledged to significantly increase irrigation development to address low agricultural productivity, rural poverty and food security challenges in the region.
Significant expansion of irrigated agriculture in SSA, however, will require a more comprehensive understanding of water resources in the region.Mathematical models are important tools for scientific investigation and to support policy decisions.They provide a feasible and economical way to explore key hydrologic processes and to evaluate alternative management options where direct observation and experimentation are not possible, are costly, or both.However, hydrologic modelling is challenging, particularly for regions with limited data.Models are only a rough representation of reality.Model calibration and evaluation using historical monitoring data is critical before the model is used to provide reliable results.
In this study, we present the calibration and evaluation of a regional-scale semi-distributed watershed model using GRACE data.The model was developed to simulate hydrology in Sub-Saharan African countries, and the model development is based on the Soil and Water Assessment Tool (SWAT) code (Arnold et al., 1998).SWAT is a physicallybased comprehensive river basin model with a proven track record of successful application globally, including in agricultural water management (e.g., Kim et al., 2008;Xie et al., 2008Xie et al., , 2011;;Dhar and Mazumdar, 2009;Oeurng et al., 2011).The size of the study river basins in reported SWAT applications typically range from a few square kilometres to tens of thousands of square kilometres.However, the model also shows potential for watershed studies at very large scales.Related to Africa, SWAT was applied to West Africa and to the entire continent to estimate blue and green water resources by Schuol et al. (2008a, b).
Conventional processes for calibrating and validating hydrologic models generally use stream discharge data.In previous SWAT applications in Africa, the model was calibrated and validated using river discharge time series data on monthly basis (Schuol et al., 2008a, b).In the model calibration and evaluation study reported in this paper, satellitebased observations of total water storage (TWS) variations derived from the Gravity Recovery And Climate Experiment (GRACE) were used to complement discharge data.GRACE is a joint mission launched in 2002 by NASA and the German Space Agency (DLR) to accurately map the Earth's gravity field (Tapley et al., 2004).After corrections for tidal and atmospheric mass variations, the hydrologic cycle is the primary source of variations in the Earth's gravity field on the continents (Schmidt et al., 2008).Variations in TWS (i.e., water storages variations integrated vertically over all water storage layers) can be inferred from the GRACE gravity signal.
Including additional state observations other than river discharge expands the data base for model evaluation and may help generate additional insights into model perfor-mance (Parajka et al., 2006;Fenicia et al., 2008;Konz and Seibert, 2010).In this study, the merits of incorporating GRACE-based hydrologic observations into the calibration and evaluation of the SWAT model of Sub-Saharan Africa (SWAT-SSA) are two-fold.Firstly, the river systems in SSA are poorly monitored and many river basins are ungauged.GRACE-based TWS variations have a global coverage and, thus, offer the opportunity to calibrate and evaluate the model for those areas where river discharge data are not available or sparse.Secondly, river discharge is part of "blue" water, which is the traditional focus of water resources planning and management, but only accounts for a small portion of total water resources.Over the past decade, the definition of agricultural water management has widened to include the entire hydrologic cycle (e.g., Falkenmark and Rockström, 2006).GRACE data provide direct estimates of TWS to help verify the capacity of the SWAT model to simulate spatio-temporal variability across all water balance components.
GRACE has been widely used to monitor changes in water mass redistribution for various basins globally.For example, GRACE has been used to quantify changes in water storage in response to droughts with a specific focus on groundwater systems (Leblanc et al., 2009;Chen et al., 2010).Water storage in East African Great Lakes was estimated using GRACE data as well (Becker et al., 2010).Many studies evaluated groundwater depletion related to irrigation (NW India: Rodell et al., 2009;California, US: Famiglietti et al., 2011) with some studies emphasising ground referencing using well data (Longuevergne et al., 2010;Scanlon et al., 2012).Good correspondence was found between GRACE-based storage estimates and well data within uncertainty envelopes of GRACE-based estimates.In addition to basin scale and global studies of changes in water storage, GRACE is also widely used in modelling studies to condition land surface models (Guntner, 2008) and for data assimilation (Zaitchik et al., 2008).To date most studies that use GRACE data for model conditioning are limited to model validation without significant calibration or model parameter tuning to GRACE data (Niu and Yang, 2006;Ngo-Duc et al., 2007;Syed et al., 2008;Yirdaw et al., 2009;Alkama et al., 2010;Tang et al., 2010;Grippa et al., 2011;Yang et al., 2011).Werth et al. (2009Werth et al. ( , 2010) ) may be the first to present calibration analyses for water storage variability in global hydrologic modelling using GRACE data.In their studies, GRACE-based water storage variations were used to calibrate and validate the WaterGAP Global Hydrology Model (WGHM) for 28 major river basins globally.More recently, Milzow et al. (2011) combined GRACE data with altimetry and Synthetic Aperture Radar (SAR) surface soil moisture data to calibrate and validate the SWAT model for the Okavango catchment in Southern Africa.The study presented in this paper is focused on the calibration and evaluation of the SWAT model at a regional scale.finer temporal scale (10-day) relative to the typical monthly interval for most GRACE products.The rest of the paper is organised as follows: the setup of the SWAT model is described in Sect.2, and the key datasets and steps for calibrating and evaluating the SWAT-SSA models are described in Sect. 3 through 5. Section 6 presents the results of the model calibration and validation.A summary of the major findings from this study and their implications are provided in Sect.7.

SWAT model setup
The model area in this study is ∼ 21 million km 2 (Fig. 1).The major datasets used for the setup or initial parameterisation of the SWAT-SSA model are listed in Table 1.The data acquisition and processing strategy in our study are similar to those described in Schuol et al. (2008a, b), but updated data or alternative options were selected in most cases.
The drainage topology of the study region is represented in SWAT by partitioning the river basins into subbasins and defining the corresponding drainage network of the river system with one river channel segment in each subbasin.Elevation data used in this step of watershed delineation were clipped from the HydroSHEDS database (Lehner et al., 2008).HydroSHEDS is a derivative mapping product Based on topographic analysis of HydroSHEDS elevation data, SSA was divided into 1488 subbasins (Fig. 1).Furthermore, SWAT is a semi-distributed watershed model with potentially different parameter values for different subbasins/reaches.To reflect the spatial variability of SWAT parameters in calibration and considering computational efficiency, the SSA was divided into 10 sub-continental regions; one SWAT model was set up and calibrated separately for each sub-continental region (Fig. 1; Table 2).Within a subbasin, SWAT allows multiple hydrologic response units (HRUs) to be defined that reflect spatial variability in soil and land cover distributions.However, due to computational limitations, only one HRU with the dominant land cover and soil was created for each subbasin (Winchell et al., 2007).The soil data were obtained from the Harmonized World Soil Database (HWSD, v. 1.1, FAO/IIASA/ISRIC/ISSCAS/JRC, 2009) and the land cover data were obtained from the Global Land Cover 2000 database (European Commission, Joint Research Centre, 2003, http://bioval.jrc.ec.europa.eu/products/glc2000/glc2000.php).The HWSD contains updated soil data for eastern, central and southern African countries relative to the FAO/UNESCO Soil Map of the World.The soil attribute data in HWSD meet most requirements for SWAT model parameterisation; however, two important parameters that describe the hydrologic properties of soils  (available water capacity and saturated hydraulic conductivity) are missing and were estimated using pedotransfer functions (Saxton et al., 1986;Schaap et al., 2001).
Climate forcing data for SWAT include 1 degree daily (1DD) precipitation, temperature, solar radiation and relative humidity data and were obtained from the NASA Langley Research Center POWER Project.A GIS layer of polygon-based SWAT subbasin boundaries was overlaid with a GIS layer of climate gridded data to calculate fractions of area covered by different climate data grid cells for each subbasin and to compute area-weighted values of climatic variables as basin-wide estimates of these variables.The original source of the precipitation data is the Global Precipitation Climate Project (GPCP, http://precip.gsfc.nasa.gov).The 1-DD GPCP dataset combined observations from multiple sensors (Huffman et al., 2001).The data for other climate variables were compiled from various NASA's projects (Table 1) by the POWER project and were primarily used for estimation of potential evapotranspiration (PET).SWAT includes three different methods for estimating PET (Neitsch et al., 2005) with varying data requirements and the Priestley-Taylor method (Priestley and Taylor, 1972) was selected because it is considered more accurate than the Hargreaves method (Hargreaves and Samani, 1985), which is temperature-based, and reliable estimates of wind speed required for the Penman-Monteith method (Monteith, 1965) were not available at the time of this study.
SWAT also provides two options for simulating flow routing in river channels.The variable storage method was used to route water in river channels because pilot simulations suggested that this is more robust than the Muskingum option in this study.Anthropogenic impacts on water resources were considered to be negligible in SSA.Agriculture is the dominant water use sector.However, current agriculture in SSA is mainly rainfed; the area of SSA equipped for irrigation only accounts for 3 % of the total cultivated area (Siebert, 2010;FAO, 2011).Therefore, existing irrigation was not simulated in this study.
SSA has a number of large fresh water bodies, such as Lake Victoria, the world's second largest fresh water lake in terms of surface area (239 000 km 2 ), and Lake Volta, the world's largest reservoir in terms of surface area (8502 km 2 ).The major lakes and reservoirs in SSA were defined in our SWAT-SSA model (Fig. 2).Locations and storage capacities of these water impoundments were obtained from the Global Lakes and Wetlands Database (GLWD) (Lehner and Döll, 2004).Due to signal leakage, mass variations in these lakes and reservoirs may have a significant contribution to GRACE TWS observations (e.g., Becker et al., 2010), even if their size is much less than the GRACE footprint (∼ 450 km, i.e., 200 000 km 2 in area).We compared the simulated water level change data and water level change data obtained with satellite altimetry (Crétaux et al., 2011, see http://www.legos.obs-mip.fr/en/soa/hydrologie/hydroweb/)and found that it is difficult to adequately simulate water storage variations in these lakes and reservoirs because of lack of detailed bathymetry and reservoir operation data.In this study, an alternative modelling strategy was taken, i.e., lake and reservoir mass correction was applied to GRACE TWS data according to height and volume variations of the 22 largest lakes and reservoirs in SSA (Table 3) from satellite altimetry data analysis for a fair comparison between GRACE and hydrologic model.Accordingly, simulated water mass variations in lakes and reservoirs were excluded from the modelbased TWS variation calculation.Bettadpur, 2007) was also used to estimate GRACE errors at a 10-day timescale.CSR data were destriped according to Swenson and Wahr (2006).For error calculation, both GRGS and CSR GRACE products were truncated at degree 30 and smoothed using a 300-km Gaussian smoother to evaluate large-scale errors.Error is computed at a monthly time step as the difference between CSR and GRGS data and resampled as 10-day errors.

Hydrol
In the mass correction, the impact of 22 lakes and reservoirs were first forward modelled at GRACE GRGS resolution, prior to subtraction from GRACE.Lake volume variations were distributed on a grid and projected on spherical harmonics.They were then recombined up to degree 50 on a 0.5 degree grid.

Total water storage variation calculation in SWAT
SWAT was developed to provide continuous simulations of the basin hydrology at a daily timescale.For each day of the simulation, the model first computes the water yields on land and then routes the water through the defined river channel network.In the land phase simulation, SWAT uses the Soil Conservation Service (SCS) curve number method (SCS 1972) to estimate the volume of overland flow and storage routing techniques to simulate percolation and lateral movement of water through the soil profile.The water leaving the base of the soil profile does not enter aquifers immediately, but is time lagged based on transport through the vadose zone.The vadose zone is defined in SWAT as the unsaturated zone beneath the base of the soil profile and above the groundwater table.An exponential decay weighting function, proposed by Venetis (1969)  predict effective recharge into shallow aquifers (Sangrey et al., 1984).Variation in groundwater flow to rivers is linearly related to changes in water-table elevation.SWAT simulates several water storage components that make up total water storage to compare with GRACE TWS.These storages include: 1. Overland water storage (V 1 ), including river channels, bank storage and canopy storage.Due to the mass correction in GRACE data processing, the water storage variations in lakes/reservoirs were not taken into account.
2. Storages (V 2 + V 3 ) for lagged surface runoff and lateral flow.The two storages are defined in SWAT for estimating the amount of overland and lateral flow reaching river channels on a daily time step.SWAT allows for delayed release of overland flow and lateral flow yielded in river basins with time of concentration greater than one day.

Vadose zone (V 5
). Water storage in the vadose zone is typically not considered as a storage in SWAT water balance analysis because the Venetis' exponential decay weighting function (1969) does not alter the quantity of water from soil into aquifers.However, the time delay for water to move through the vadose zone results in variations in water storage and needs to be addressed in TWS variation calculations (Milzow et al., 2011).

Groundwater (V 6
). SWAT simulates an unconfined shallow aquifer and a confined deep aquifer in each subbasin.Water storage in shallow aquifers may contribute to flow in the main river channels or be re-evaporated into the soil.By contrast, there is no simulated outlet for water in deep aquifers except pumpage."Water that enters the deep aquifer is assumed to contribute to streamflow somewhere outside of the watershed" (Neitsch et al., 2005).While this assumption may hold in studies for small river basins, it is no longer valid at a continental scale.Due to the accumulation of water percolated from shallow aquifer, an upward trend in water storage in deep aquifers would be found, which is unrealistic.To circumvent this problem, the deep aquifer was excluded from the simulations and the calculation of TWS by setting the percolation rate to the deep aquifer to zero.
For each subbasin, the model-based TWS for each 10-day period was calculated as: where t is the index for the 10-day period.The series of SWAT subbasin-wide TWS anomalies (TWSV t ) was computed by differencing the TWS for each 10-day period TWS t and the mean of the TWS over the entire GRACE data period: TWSV t = TWS t − TWS where TWS is the mean of the TWS t over the GRACE data period, or was calculated by taking the average of 10d TWS t 's during July 2002-April 2009.

Calibration approach
Calibration and evaluation of the SWAT-SSA model in this study was carried out using a multi-criteria framework, similar to the studies by Werth and Güntner (2010).The multicriteria approach extends the traditional calibration approach by casting the calibration into a multi-objective optimisation problem, and for independent data, allows evaluation of model performance against more than one objective to improve model robustness and predictability (Gupta et al., 1998).The solution to the multi-criteria optimisation programme consists of the non-dominated calibration parameter sets, which are optimal in a Pareto efficiency sense.The trade-off between model fits evaluated by different criteria  reflects the minimum parameter uncertainty (Vrugt et al., 2003) caused by errors in the input and measured data as well as by the model structure.
For the calibration of the SWAT-SSA model, two objective functions were defined.Their definitions and calculations are explained in detail below.The multi-objective optimisation problems defined in the multi-criteria calibration of the SWAT-SSA models were solved using the Non-dominated Sorting Genetic Algorithm II (NSGA-II, Deb et al., 2002), a population-based heuristic evolutionary optimisation technique with a proven track record of success in solving many large-scale optimisation problems.The population sizes chosen in the optimisations varied from 150 to 300, and the optimisations lasted for 50 ∼ 100 generations until no significant improvements in the solution were observed.

Comparison of model-based and GRACE-derived TWS variations
As GRACE provides a filtered image of reality, the modelled storage variations from SWAT were first converted to GRACE resolution to provide storage values at the same   spatial scales for comparison.This mathematical process involves projecting SWAT modelled spatial fields to Spherical Harmonics (SH) up to degree 50 (in this study, SH transformation was conducted using SHTOOLS, http://www.ipgp.fr/∼ wieczor/SHTOOLS/SHTOOLS.html) in which the SWAT-based basin-wide TWS variations for each 10-day period were first disaggregated into a 0.5 by 0.5 degree grid prior to the transformation.In order to allow for a comparison between GRACE-and SWAT-based TWS variations for sub-continental regions, simulated variations in TWS by the Noah land surface model (Ek et al., 2003) in NASA's Global Land Data Assimilation System (GLDAS) (Rodell et al., 2004) were used as a priori information to fill areas outside of the SSA sub-region of interest in the SH transformation.Agreement between GRACE-derived and model-based TWS variations was evaluated using a weighted total square error (WTSE) function: where TWSV i,j,t,SWAT and TWSV i,j,t,GRACE are SWAT-and GRACE-based TWS variations for 10-day period t and grid cell (i, j ), respectively.I s is an indicator function.I s = 1 if the grid cell is located within the study region; otherwise, I s = 0, w i,j,t is the weight, an inverse of standard error of GRACE-based TWS variations TWSV i,j,t,GRACE .Finally, following the convention in hydrologic model calibration, available GRACE data were divided into two groups: the first 112 10-day periods (29 July 2002-December 2005) were used for calibration and the data for remaining 120 10-day periods were reserved for validation.GRACE data (2002-2009), and GPCP 1-DD precipitation data (1997-2009) pose difficulty for model calibration.In this study, we focused on evaluating the performance of SWAT for modelling TWS variability: SWAT was run for 2002-2009 (with five additional years 1997-2001 as the spin-up period) and, following the approach by Werth and Güntner (2010), simulated and observed monthly river discharge rates in two time frames were compared on a multi-year average basis.The fit of the SWAT model at each GRDC station was measured using the Nash-Sutcliffe Efficiency (NSE) coefficient (Nash and Sutcliffe, 1970), which is defined as where Q t,obs and Q t,sim are the multi-year averaged monthly discharges calculated using the simulated and available observed discharges (m 3 s −1 ), respectively.Qt,obs is the mean of Q t,obs (m 3 s −1 ).The NSE coefficient can range from −∞ to 1, where 1 indicates a perfect model fit.
The GRDC station network is relatively dense in West Africa, but limited in other regions (Fig. 2).This highlights the benefit of applying GRACE data to support hydrologic simulations in SSA.The NSE values for all GRDC stations in a sub-regional model were weighted by length of observed monthly river discharge series: where NSE i is NSE coefficient at GRDC station i, and w i is the weighting factor proportional to the length of the monthly river discharge data time series at that station ( The weighted NSE (WNSE) serves as the criterion for evaluating performance of SWAT in simulating river discharge.

Calibration parameters
The hydrologic processes and watershed properties in SWAT are characterised by many parameters.A list of SWAT parameters selected for calibration, together with their lower and upper bounds of adjustable ranges, are shown in Table 4.This list was determined from literature review, numerical sensitivity analysis (Morris, 1991), and according to results from several test runs of the calibration programmes.In these SWAT calibration parameters, SCS curve number is a key parameter for surface runoff estimation.It is defined to characterise the potential maximum soil moisture retention capacity.A low value indicates low runoff, but high infiltration potential.Surface runoff lag coefficient (SURLAG) determines how much total available runoff enters into a river reach on a given day and is a sensitivity parameter for simulating river discharge hydrographs.Soil evaporation compensation factor (ESCO) is defined to specify the depth distribution used to meet soil evaporative demand.As the value of ESCO decreases, more water can be evaporated from deeper soil layers.SOL AWC, SOL K, and  the water storage variation in the vadose zone.The remaining four parameters in the table, GW REVAP (groundwater revap coefficient), ALPHA BF (baseflow alpha factor), RE-VAPMN (threshold depth of water in the shallow aquifer for "revap" to occur) and GWQMN (threshold depth of water in the shallow aquifer required for groundwater flow to occur) control the behaviour of shallow aquifers.
The ten SWAT sub-regional models were calibrated with the parameters shown in Table 4.The model for each subregion was calibrated independently, but within a region, the same percentage changes were made for those parameters which have spatially varying values (or parameters other than SURLAG and ESCO), except for the SCS curve number (CN2), based on spatial fields of their initial estimates.The CN2 values are correlated with land cover and soil permeability.In this study, within a sub-region the CN2 parameters were grouped by land cover, and the CN2 values for major land covers/uses were considered as independent calibration parameters.

Results
The Pareto fronts in the two-dimensional objective space found via multi-criteria calibration for all ten SSA subregions are shown in Fig. 3. Paired values of weighted root-mean-square-error (RMSE) and weighted NSE coefficients are plotted on horizontal and vertical axes, respectively (weighted RMSE is a monotonic function of weighted TSE).A model with a perfect fit to GRACE data and river discharge data would have a weighted RMSE of 0 and a weighted NSE coefficient of 1.Thus, the Pareto front curves are convex towards the point (0, 1), reflecting tradeoffs between the ability to fully describe discharge or TWS variations.
With regard to performance of calibrated models in river discharge simulation, the highest values of weighted NSE coefficients obtained vary from −2.55 to 0.66 and are negative for five out of the ten sub-region models (West Africa, Nile, Congo, Zambezi and Madagascar).This measure of goodness of fit statistic is also sensitive to different solutions of parameter sets in Pareto fronts.The deterioration of its value is greater than two in models for all sub-regions other than West Africa, Nile and Zambezi when the parameter set in the Pareto frontier that most closely matches the simulation of GRACE TWS variations was used.The NSE model coefficients for each individual GRDC gauging station are shown in Fig. 4. When the "best-fit" solutions for river discharge simulation were taken, 20 % of the stations have NSEs ≥ 0.7, 43 % ≥ 0.4 and the NSEs for 64 % of the GRDC stations are positive.These percentages decrease from 20 to 6 %, 43 to 17 % and 64 to 30 % if the models are run with the "least-fit" solutions for river discharge simulation.
More satisfactory model fits were achieved in simulation of TWS variations after the calibration of the SWAT model.The ensembles of time series of zonally averaged simulated TWS variations over the 10 sub-regions and associated with Pareto optimal solutions found in multi-criteria calibration are plotted in Fig. 5, together with the time series of zonally averaged GRACE-based mean TWS variations and the related one-sigma (68.7 %) confidence interval (CI).The NSE coefficients for the time series of model-based TWS variations with respect to the GRACE-based mean TWS variations were also calculated and summarised, for calibration and validation periods, respectively (Table 5).Overall, simulated and GRACE-based zonally averaged time series are in good agreement in sub-regions of West Africa, Volta, Chad, Nile, Eastern African, Zambezi and Madagascar during both calibration and validation periods.The means of the NSE coefficients for these sub-regional models range from 0.66 to 0.91.Larger discrepancies were found in the Congo and Horn of Africa.In the simulation of temporal variations of TWS for these two sub-regions, the model still captures the general trends/phase changes of TWS variations, but the mismatch in amplitude is greater.For the Southern African model, the model fit was poorer during the calibration period; however, the model performs much better during the validation period.
The NSE coefficients calculated on a gridded basis are shown in Fig. 6a and b.The "best-fit" and "least-fit" solutions were determined according to model fits with respect to GRACE TWS variations.Figure 6c and d show the agroclimatic zonation (derived from FAO Agroecological Zones crafted by HarvestChoice, Z. Guo, personal communication, 2011), and the density of cropland (fraction of cropland area in 5 min × 5 min grid, Ramankutty et al., 2008) in SSA.Generally, the model performs well in simulating TWS variations in semiarid and sub-humid areas, which encompass most cropland in SSA.The largest discrepancies in simulated TWS variations (NSE coefficients ≤ 0) occurred in arid areas, where water storage amplitude is lower or equivalent to GRACE error (the Sahara, Somalia, western Ethiopia, northwest Kenya, south Namibia and most of Southern Africa) and the equatorial humid area (notably in central Democratic Republic of the Congo).
GRACE TWS data integrate water mass variations from all storage components.Sometimes, interest is focused on estimating water mass variations in certain storage components (e.g., groundwater; Rodell et al., 2009;Tiwari et al., 2009).Temporal variability of zonally aggregated water mass in six water storages parameterised in the SWAT model are characterised by calculating ratios between variances of these storage variables σ 2 ) and variance of model-based total water storage variation σ 2 V Total (in unfiltered space), or σ 2 V i σ 2 V Total .Means and ranges of calculated normalised variances for each storage variable and each sub-region are listed in  7. Systematic phase differences exist among the time series for the three storage variables: in each annual cycle when the rainy season begins the soil moisture is first replenished and peaks, followed by vadose zone water and then groundwater.
The statistics in Table 6 and graphs in Fig. 7 also indicate that there could be even larger uncertainties in estimation of component-wise water storages than what is seen in TWS estimation.For example, the model gives divergent estimates for water storage variations in vadose zone and groundwater in Eastern Africa when the model was run with parameter sets across the Pareto frontier.The estimated time series for water storage variations in the vadose zone and groundwater fall into two groups: one group has large variations in the vadose zone water storage, but relatively smaller variations in groundwater storage; in another group, vadose zone water storage variations are almost zero and variations in groundwater storage are much larger.Figure 8 shows the Pareto fronts in parameter space with normalised parameter values in [0, 1] intervals (zero values represent the lower bounds of the adjustable ranges of the parameters and one corresponds to the upper bounds).The disparate estimates for vadose zone and groundwater storage variations can be explained by the dichotomy in the estimated values of GW DELAY.As the value of GW DELAY approaches zero, water exiting the bottom of the soil profile can enter aquifers immediately causing no variation in vadose water storage and larger variations in groundwater storage, with the opposite for large values for GW DELAY.Similar divergent estimates in vadose zone water and groundwater storage variations caused by different GW DELAY estimates were also found in the Congo model.For Chad, Nile and Southern Africa, there are large uncertainties in estimating soil water storages, which is most likely related to divergent estimates of SOL AWC X.

Discussion and conclusions
The study presented in this paper concerns calibration/evaluation of a semi-distributed model based on SWAT code (SWAT-SSA) for regional-scale hydrologic simulation in Sub-Saharan African countries.The SWAT-SSA models were calibrated and evaluated in a multi-criteria framework to both river discharge and GRACE TWS data, but with more focus on assessing the model's capacity for simulating TWS variability using GRACE data.In spite of uncertainty arising from the tradeoff in optimising model parameters with respect to two model fitting criteria and in estimation of storage variations contributed by different storage components, the study showed that the calibrated SWAT-SSA model performs well in simulating TWS variations in semi-arid and semi-humid areas, where agriculture in SSA is concentrated and, therefore, is capable of acting as an effective modelling tool for agricultural water management in SSA.
Any model calibration and validation exercise is subject to certain limitations.A major limitation in this study originated from use of multi-year average monthly river discharge data for a time frame  different from that in which the models were actually run (2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009) as a result of limited availability and accessibility to recent stream flow data in Sub-Saharan countries.Therefore, it is difficult to evaluate the model's adequacy in simulating the surface water system.Climate and land use change may alter the flow regimes of rivers (e.g., Amogu et al., 2010) and potentially bias the calibration of river discharge and estimation of the contribution of water mass variations in river systems to TWS variations.Furthermore, an interesting question often raised in model calibration/evaluation is what value is brought to model calibration and evaluation by use of additional dataset(s).The limitation with river discharge data makes it difficult to answer this question.However, the outcome of the SWAT model calibration and validation in this study tends to suggest that use of GRACE data may only provide limited additional constraints to reduce parametric uncertainties of the model because the NSEs shown in Fig. 3 and Table 5 indicate that there might be stronger equifinality (Beven and Binley, 1992) in the TWS simulation than in discharge simulation.On the other hand, it is apparent that GRACE data are of great value in that they provide valuable information and unprecedented opportunity to validate and evaluate the model's capacity to simulate spatio-temporal variability in TWS.
Finally, there is less agreement between model-and GRACE-based TWS variations in arid and equatorial humid areas.A few possible reasons for the larger discrepancies include: firstly, uncertainties associated with GRACE TWS variations themselves.Noticeable disagreement between model-and GRACE-based TWS variations in arid regions were also reported by Ngo-Duc et al. (2007) in a global-scale model validation study, who attributed the disagreements in arid areas to weak GRACE signals and resulting larger errors in GRACE TWS variations.Secondly, errors in climatic forcing data, especially in precipitation data may be important.Due to lack of ground-based observations in precipitation, regional or global scale hydrologic simulations typically rely on use of precipitation estimates from different satellite-based or meteorological reanalysis data products.Uncertainty associated with these precipitation datasets is often a principle source of uncertainty for hydrologic simulation (e.g., Fiedler and Döll, 2007).Thirdly, inadequacy of SWAT parameterisation or algorithms in simulating hydrology in arid and humid areas may also help explain the discrepancy.Future work will be required to identify physical reasons for model misfits and for model enhancement.

Fig. 1 Fig. 1 .
Fig. 1 Study area boundary, sub-region division, and watershed delineation in SWAT-SSA model setup Fig. 1.Study area boundary, sub-region division and watershed delineation in SWAT-SSA model setup.

Figure 2 Fig. 2 .
Figure 2 Global Runoff Data Centre (GRDC) stations and reservoirs/lakes included in the817

Figure 5 Fig
Figure 5 Observed and zonally averaged simulated TWS variations for ten sub-continental 845

Figure 6 .
Figure 6.Spatial patterns of SWAT model fits in simulations of total water storage variations in 852

Fig. 6 .
Fig. 6.Spatial patterns of SWAT model fits in simulations of total water storage variations in Sub-Saharan Africa (country boundaries are shown).

Figure 8 Fig. 8 .
Figure 8 Estimates of SWAT calibration parameters obtained from mutli-criteria calibration 875

Table 1 .
The datasets for SWAT model setup.

Table 2 .
Sub-regions in SWAT-SSA model setup and calibration.

Table 3 .
Lakes/reservoirs to which mass corrections in GRACE data processing were applied.
Wahr et al. (1998) and)ined for model calibration purposes only.Actual values of these parameters used in simulation are equal to their default values multiplied by the calibration factors.3GRACEdataGRACEdataused in this study were obtained from CNES-GRGS (Centre National d'Etudes Spatiales-Groupe de Recherches de Géodésie Spatiale), RL2 product(Bruinsma et al., 2010).The data are provided as spherical harmonics as 10 day means.The Stokes coefficients are truncated at degree 50 to remove high frequency noise.No further filtering is required for these solutions.Stokes coefficients were recombined followingWahr et al. (1998) and

Table 5 .
Nash-Sutcliffe efficiency coefficients for calibrated SWAT models for simulation of variations in TWS.

Table 6 (
note that the water mass variations in six stor-

www.hydrol-earth-syst-sci.net/16/3083/2012/ Hydrol. Earth Syst. Sci., 16, 3083-3099, 2012 that
have largest temporal variability, thus, contributing most to TWS, are soil, vadose zone and groundwater storage.By contrast, contributions from overland flow, surface runoff and lateral flow lags are trivial.Zonally aggregated time series of soil water, vadose zone water and groundwater storage variations obtained from the calibrated SWAT models are shown in Fig.