Unpacking dasymetric modelling to correct spatial bias in environmental model outputs

Complex environmental model outputs used to inform decisions often have systematic errors and are of inappropriate resolution, requiring downscaling and bias correction for local applications. Here we provide a new interpretation of dasymetric modelling (DM) as a spatial bias correction framework useful in environmental modelling. DM is based on areal interpolation where estimates of some variable at target zones are obtained from overlapping source zones using ancillary information. We explore DM by downscaling runoff output from a distributed hydrological model using two meta-models and describe the properties of the methodology in detail. Consistent with properties of linear scaling bias correction, results show that the methodology 1) reduces errors compared to the source data and meta-models, 2) improve the spatial structure of the estimates, and 3) improve the performance of the downscaled estimates, particularly where meta-models perform poorly. The framework is simple and useful in ensuring spatial coherence of downscaled products.


Introduction
Modern environmental decision making depends on the results of various outputs from environmental modelling (Jakeman et al., 2008;Schmolke et al., 2010).The need for impact modelling to support decision making is particularly important in the face of unprecedented changes the world is currently experiencing in climate, land use and other environmental conditions (IPCC, 2021).The application of complex environmental models is, however, not trivial, and the easily available outputs may be biased or of inappropriate scale or configuration, leading to a need for further processing particularly for local applications.
Mismatch between scales of analysis and the source data has been commonly addressed using downscaling, upscaling, or both.Downscaling methods attempt to estimate the state of a variable at a finer resolution by relating it to the state of the variable at a coarser resolution (von Storch and Zorita, 2019).They are useful when we wish to estimate the internal variability within some temporal or spatial unit.In dynamic downscaling, a fine-resolution model is applied within areas defined by product under downscaling, and where the coarse data provides boundary conditions for the fine-scale model.This is common when, e.g., global climate models are downscaled using regional climate models.In empirical downscaling, the link between the finer and coarser scales is determined by statistical methods.There is a large body of literature on different methods of empirical downscaling, for instance, using interpolation methods (e.g.Mennis, 2009;Kallio et al., 2019;Wang et al., 2015;Lima et al., 2021) or different machine learning methods (e.g.Latombe et al., 2018;Stevens et al., 2015;Bardossy et al., 2005;Ahmed et al., 2013).The outputs from the downscaling methods may exhibit systematic errors (i.e. they may be biased; Tabari et al., 2021) when compared to observations.These systematic errors may arise from imperfect model conceptualization and data aggregation (Teutschbein and Seibert, 2012) or from errors in input data (e.g.Sperna Weiland et al., 2015), and often need to be corrected (Ahmed et al., 2013;J. C. Chen et al., 2021;Ibarra et al., 2021;Teutschbein and Seibert, 2010).A large number of different approaches to bias correction are available (see e.g.Teutschbein and Seibert, 2012) and are of particular importance for climate change impact modelling (e.g.Hempel et al., 2013;Lange, 2019;Warszawski et al., 2014) which requires unbiased meteorological forcing data (Tramberend et al., 2021).These approaches commonly deal with the temporal dimension with less attention given to correction of spatially autocorrelated errors in distributed environmental models (Nahar et al., 2018).
Nonetheless, some methods to correct spatial bias exist.Hnilica et al. (2017) corrected precipitation with a combination of principal component analysis and quantile mapping.Nahar et al. (2018) used a similar methodology, but apply independent component analysis instead of principal component analysis.Kim et al. (2021) developed a Bayesian Kriging-Based Spatial Disaggregation-Quantile Delta Mapping method, where precipitation distribution parameters are interpolated on to a fine grid with Kriging and subsequently used to bias correct downscaled timeseries.Lange (2019) conducted multivariate bias correction using modified version of Cannon (2018) which ensures that downscaled estimates are consistent with the original climate simulations.Furthermore, data assimilation methods are used to optimally combine observations with model outputs in order to reduce their systematic errors (Reichle, 2008).
This study presents a novel interpretation of an advanced areal interpolation method Dasymetric Modelling (DM) as a method for spatial bias correction of downscaled data.Areal interpolation consists of all those methods which use values from a set source zones to estimate the same variables at a set of intersecting target zones (Comber and Zeng, 2019;Goodchild and Lam, 1980).The target zones are commonly of a higher resolution than source zones.In areal interpolation, the value of the variable is distributed among target zones within a source zone, thus preserving the aggregate value in the area represented in the source zone.DM is an extension to areal interpolation where the distribution of the source zone value is guided by some model.DM and bias correction share a similar objective of adjusting values to satisfy some constrainte.g., the distributed values at target zones must match that of their source zone, or scaling a daily timeseries to match a monthly reference value.
We further note that when applying DM to downscale available coarse-resolution model outputs, the application can be described as meta-modelling.Meta-modelling is the process of building a model to emulate another model (also known as surrogate modelling; e.g.Razavi et al., 2012).In meta-modelling, the output of an original model is emulated by some statistical or process-based model from a set of input data with the common aim to reduce computational costs.Downscaling can be considered meta-modelling when the relationship between explanatory variables, for instance relating to topography, and e.g., a climate model output is established at a coarser scale and applied at a finer granularity.
We conduct an experiment in the Upper Bhima Basin (UBB), central India, downscaling local runoff generation (i.e., without accumulation to streamflow).Several other methods have been developed to handle bias in runoff and streamflow.Skøien et al. (2006) extended kriging interpolation to account for topological relationships within a stream network necessary for the interpolation of streamflow records, and further developed a spatio-temporal topological kriging method in Skøien and Blöschl (2007).Paiva et al. (2015) provided another spatio-temporal kriging method to estimate river discharges along a river network.Loonat et al. (2020) used data assimilation of streamflow records to a distributed rainfall-runoff model by kriging and global bias correction.Naz et al. (2019) downscaled runoff data using a data assimilation scheme using soil moisture as an ancillary variable.Bennett et al. (2021) developed a mechanism for streamflow which specifically bias correct runoff and apply river routing to make it possible to use streamflow as the reference to correct against.
In our case study, we examine the bias correcting property of dasymetric modelling in detail.We train two meta-models to emulate the local runoff output of the Community Water Model (CWatM; Burek et al., 2020) -a distributed physically based hydrological modelbased on a global model run at 0.5 • (approx.50 × 50 km) resolution.The meta-models are then applied to a 5 arc-minute (approx.10 × 10 km) grid within the same area.The 0.5 • grid serves as the source zones giving a spatial reference of the magnitude of the values within the basin, and the 5 arc-minute grid serves as the target zones.The meta-model outputs at the target zones are then bias corrected using the source zone values as a reference.The comparison to CwatM at higher resolution is appropriate because the meta-models are trained on CwatM at lower resolution, emulating the behavior of the model.
The experiment we conduct here extends previous work in Kallio et al. (2021Kallio et al. ( , 2019)), where global runoff products were downscaled with Dasymetric Mapping, i.e.DM with a single ancillary variable instead of a model.In the earlier studies, the bias correcting property is confounded due to A) use of irregular polygons which may be influenced by several intersecting source zones, B) the uncertainties related to river routing (modelling of accumulation of runoff in the stream network), C) the difficulty of assigning a low resolution streamflow value to the more complex, higher-resolution river network (see e.g.Bennett et al., 2021, for a discussion of this problem), and D) the use of a monthly timestep, which is large considering the size of the basins.The experiment here focuses on the bias correcting property by using a nested grid (controls the confounding by areal interpolation), comparing local runoff production (controlling for uncertainties in routing models and aggregation effects), and is applied at a daily rather than a monthly timestep.In this way, we can isolate the influence of bias correction from the other factors influencing the performance.
The remainder of the paper is structured as follows.Section 2 provides the background describing similarities and equivalence of DM to meta-modelling and bias correction, respectively.Section 3 describes our case study giving information on CwatM, the Upper Bhima Basin, meta-models and how we assess performance.Results and detailed

Table 1
Global performance metrics computed from the entire dataset covering all source or target zones, and all timesteps.α is the variability error term, β is the bias term, and r is dynamics term in the non-parametric KGE np .KGE ss is the skill score wrt. the benchmark AW ŷt .NRMSE refers to the Normalized Root Mean Square of Error where the standardization is given by the mean of the timeseries.discussion of the case study outputs are given in Section 4. The bias correction framework and the case study are discussed in Section 5. Finally, we conclude the paper in Section 6.

Dasymetric modelling, meta-modelling, and bias correction
This section first introduces DM in Section 2.1.and proceeds to describe the similarities of the common DM workflow and metamodelling in Section 2.2.In the next Section 2.3 we establish the equivalence of DM and linear scaling bias correction.In Section 2.4 we bring them all together in a single framework.

Dasymetric modelling
Areal interpolationa form of spatial interpolation dealing with interpolating values from a source zone to another set of overlapping arbitrary target zonesis a standard practice in many fields (Comber and Zeng, 2019).In its simplest form, Area Weighted Interpolation (AW), a value from a source zone is assigned to target zones based on their overlapping areas as shown in Eq. 1 where ŷt is the estimated value at a target zone t, y s denotes the value associated with a source zone s, and A is the area of the zone.The target zone t is therefore assigned values from all source zones it intersects (denoted as the set S) according to the proportion of the area of the target zone (within each source zone, A t∩s ) to the area of each overlapping source zone.The formulation allows estimation of values in arbitrary target zones which do not neatly conform to the (also arbitrary) shape of the source zones.Areal interpolation has a pycnophylactic mass preservingproperty because it is, in essence, distributing the source zone values onto the overlapping target zones.This procedure leads to the preservation of the source zone values in the target zones (Mennis, 2009;Tobler, 1979).Dasymetric mapping is extension of AW where the interpolation is refined by the use of ancillary information (Comber and Zeng, 2019;Mennis, 2009).DM has originally been developed for improved cartographic presentation of population density (Mennis, 2009;Wright, 1936).A large body of literature has been written on different applications of DM and its extensions, particularly in the context of state-of-the-art population mapping (e.g.Stevens et al., 2015;Tatem, 2017), demographic attributes (e.g.Tatem et al., 2014), and to lesser extent in other fields such as environmental modelling (Kallio et al., 2019(Kallio et al., , 2021;;Chen et al., 2019).In dasymetric mapping, the task of the ancillary data is to provide information on the distribution of the interpolated value within a source zone.With the ancillary information x t , Eq. (1) becomes which uses information about target zones t belonging to a set T, consisting of all target zones intersecting with source zone s, to scale the ancillary data.In Dasymetric Modelling (we will use DM to refer to Dasymetric Modelling for the remainder of the paper), the ancillary information x t is given by some model describing the target zone level spatial distribution of the variable given in source zones (rather than a single variable as in dasymetric mapping; Nagle et al., 2014).For a review of areal interpolation methods we refer to the review by Comber and Zeng (2019).

Dasymetric modelling and meta-modelling
In statistical downscaling the relationship between ancillary variable (s) and the downscaled quantity is commonly regressed at the source zone level (von Storch and Zorita, 2019).This is a common workflow in DM as well (Mennis, 2009;Nagle et al., 2014).The model trained at source zones is subsequently applied at the target zone level to obtain the downscaled estimates.In DM, the estimates are used as the ancillary information x t .The outputs of the trained model are used to derive weights according to which the source zone value is distributed to contributing target zones.If the values associated with the source zones are an output of a model (e.g., a climate-, an impact-or some empirical statistical model), the workflow can be described as a meta-modelling approach.A meta-model attempts to emulate the output of the original model with the aid of explanatory variablesin this case the meta-model attempts to estimate what the full model would have output at the target zone level if it was run at that resolution.The motivation of using different types of meta-models is often to reduce the complexity of a full model, to reduce the computational cost, and sometimes to reduce input data requirements (Asher et al., 2015;Razavi et al., 2012).The common workflow in DM can therefore be seen as meta-model assisted distribution of the model outputs evaluated at source zones on to the overlapping target zones.Areal interpolation (DM) allows this to be done with the objective of upscaling, downscaling, or representing the values in similar resolution, but different zonal arrangement (Kallio et al., 2021).

Dasymetric modelling and bias correction
In a special case of DM where all target zones are located within a single source zone (e.g. a nested grid, a common setup in downscaling earth science data) and each target zone is of the same area (e.g. a raster grid with an equal area projection), the area terms in Eq. ( 2) can be eliminated resulting in Eq. (3a).
This equation can be read as solution where the value associated with a source zone, y s , is distributed to the target zones according to the proportion of some meta-model output x t from the sum of meta-model results within the source zone.If we swap the position of y s and x t , the resulting Eq. ( 3b) is equivalent, but with the change in grouping of terms, we can now read the equation as the scaling of the meta-model outputs If the denominator is divided by the number n of target zones within a source zone, we obtain the mean output μ xt of the meta-model within the source zone.Using the source zone as our reference, the equivalent mean across n target zones within the source zone is μ yt = 1 n y s , and thus we can simplify Eq. (3b) to Eq. ( 5) with the intermediate step given in Eq. ( 4), where y s is the value associated with the source zone in which the target zone is located, and μ xt is the mean value of all meta-model outputs within the source zone.This formulation is identical to the linear scaling bias correction method shown in Eq. ( 6) (Lenderink et al., 2007;Teutschbein and Seibert, 2012) M. Kallio et al.where y * model is the bias corrected model output.Although bias correction is commonly applied to timeseries data, it may also be applied to a spatial domain where target zone values are corrected against the source zone data.Thus, when stripped of the area terms under the conditions described above, DM is mathematically equivalent to the common linear scaling bias correction method.
An example of linear scaling bias correction with one dimensional synthetic data is provided in Fig. 1, showing a cosine function and a hypothetical meta-model output x t , and its bias corrected counterpart ŷt , along with goodness-of-fit statistics.For illustration purposes, the hypothetical meta-model output for each point t was obtained here by adding random variation into the cosine series, dividing the values into groups, and subtracting a random amount from each group.Linear scaling is applied group-wise for each source s using Equation ( 5) rather than to the dataset as a whole, akin to the DM approach having multiple source zones for which to perform bias correction separately.From this example we note a few important properties of linear scaling: 1) by design, bias is eliminated group-wise and subsequently for the entire data series.2) Pearson correlation coefficient is preserved within group, but changes for the entire series.3) the magnitude of standard deviation is changed.
The reason for preservation of Pearson correlation coefficient ρ (Eq.( 7)) within a group is because ρ depends on how the two data series vary around their respective means.
Since each data point in a group is subjected to the same linear transformation, the relationship between the standard deviation σ and the mean μ remains unchanged.In other words, coefficient of variation CV (σ/μ) remains unaltered, even when the absolute magnitude of the standard deviation is changed.As a consequence, bias correction preserves the dynamics provided by the model within each group, but because each group is subjected to different scaling operation, the overall dynamics (correlation) is altered; in this synthetic example Pearson correlation coefficient improves from 0.74 to 0.96.Fig. 1B shows autocorrelation functions (correlation of a variable with itself) for the true original, the meta-model output x t , and bias corrected metamodel output ŷt .Autocorrelation is significantly improved with the method.Given the equivalence between linear scaling and dasymetric mapping on nested grids, a similar behavior is expected in a spatial bias correction context.We note that for the linear scaling bias correction to work as described here, all meta-model output values x t and the reference values associated with the source zones y s should be strictly finite-positive values to prevent erratic behavior.

Dasymetric modelling as a combined downscaling-bias correction framework
The previous sections describe how meta-modelling and bias correction relate to DM. Fig. 2 shows workflows for each of these three methods separately and combined together into a single framework.Each of the three methods have a specific purpose in the workflow.The meta-model outputs x t evaluated at the target zones is responsible for representing the spatial dynamics of how the value is distributed within a source zone, i.e., the meta-model output should have a high correlation with the ground truth values within the source zones.Bias correction ensures that the spatial information associated with source zones are preserved in the target zones, leading to spatially debiased estimates.DM provides flexibility in the spatial configuration of source-and target zones.It describes how non-conforming source and target zones are handledhow should the meta-model output in each target zone be divided during the intersection with source zones, and how to combine the values back to the original target zones after bias correction.This operation may differ for instance for intensive (standardized variables, e. g., population density, or water use per capita) and extensive variables (counts, e.g., the amount of population, or volume of water).
The workflow is agnostic as to how x t is estimatedwhether given by another similar model, a meta-model, or measured data.Similarly, the method is agnostic as to where the source zone reference values y s are obtained.In principle, with the redistribution of conventional DM interpreted here as bias correction, linear scaling may be replaced by more advanced bias correction methods.This leads to the framework being modular, consisting of three distinct parts.

Case study
We explore the dasymetric modelling framework as interpreted in Fig. 2, and its properties, by a case study where we downscale a coarse resolution (0.5 • , source zones) CWatM simulation with a global extent onto a higher resolution, 5 arc-minute grid (target zones).We make use of meta-models trained at the source zone level and applied at the target zones.The results are compared to a run of CWatM at the target zones in order to evaluate the usefulness of the methodology and the ability to emulate a more spatially detailed model output.The experiment is setup so that the source and target zones are nested, which reflects the simplification described in Section 2.3 -this allows us to examine the efficiency of the spatial bias correction without the added complication of non-conforming source and target zones.
In Section 3.1 we give a brief introduction to CWatM, and Section 3.2 contains a description of the study area, the Upper Bhima Basin.Next, Section 3.3 introduces the two meta-models used, and Section 3.4 describes how we assess the downscaling performance.

Community Water Model
The Community Water Model is an open-source integrated hydrological and channel model that calculates water availability (surface and groundwater), environmental flow requirements, and socio-economic water demands and impacts from water infrastructures such as reservoirs, groundwater pumping, and irrigation (Burek et al., 2020).The global applications of the model are based on a 0.5 • or 5 arc-minute resolution grid and are run at a daily temporal resolution.The model can be used for regional applications using a daily temporal and 30 arc-sec spatial resolution.For topography and land cover, it uses a sub-grid approach and for soil and routing processes, it uses sub-daily timesteps.
The conceptual framework and structure of CWatM are similar to that of other hydrological models such as H08 (Hanasaki et al., 2006(Hanasaki et al., , 2008(Hanasaki et al., , 2010)), WaterGAP (Alcamo et al., 2003;Flörke et al., 2013), PCR-GLOBWB (van Beek et al., 2011;Wada et al., 2014) and others.The model accounts for future water demands based on socio-economic change and the impacts of water availability in response to climate change.Irrigation water demand is estimated for paddy and non-paddy crops separately by dynamically linking irrigation water demand with the surface and soil water balance.For the Upper Bhima basin CWatM is forced by the meteorological variables including temperature, wind speed, relative humidity, incoming longwave and shortwave radiation, and surface air pressure from W5E5 v2.0 (Cucchi et al., 2020).For precipitation, we use data from Pai et al. (2014).Please see Burek et al. (2020) for a more detailed description of CWatM hydrological processes.

Upper Bhima Basin
The Bhima Basin in India, shown in Fig. 3, is a highly managed and densely populated watershed with important rain-fed and irrigated crop production.Rainfall occurs towards the west and concentrates through 5 months of monsoon, with a total annual rainfall of around 35 km 3 (approx.760 mm per year).
The Bhima basin is an upstream basin of the Krishna Basin, originating in the mountains of the western Ghats, with a river length of ~325 km.The Bhima basin supports a population of 18.7 million (2015), distributed over 45 800 km 2 of largely agricultural land, flowing mostly within the state of Maharashtra.The largest urban agglomeration within the Bhima Basin is Pune with nearly 7 million residents, followed where x ts is the output of meta-model at timestep ts, DSP is the number of Days Since previous Precipitation event, and DUNE is a topographic index (Loritz et al., 2019) computed from the HydroSHEDS 15 arc second DEM (Lehner et al., 2008).DUNE describes runoff production through a combination of gravitational potential energy of water (height above nearest drainage), and dissipation of that potential energy as water moves along the landscape (distance to nearest drainage).Loritz et al. (2019) find that it can distinguish between regions with different runoff producing regimes, and Kallio et al. (2019) found it useful in a downscaling context with variable terrains.Both elevation and DUNE were aggregated from the 15 arc-second elevation model to 0.5 • degree and 5 arc-minute resolution by taking their mean value.RF was applied using the ranger package for R (Wright and Ziegler, 2017) with 500 trees, unlimited tree depth, and variance-based split rule.We chose the LM and RF meta-models to have a comparison of a simple linear meta-model, and a more complex machine learning method which can deal with non-linear relationships between the predictand and predictors (Cutler et al., 2007), and which has been successfully used in downscaling studies (e.g. C. Chen et al., 2021;Hutengs and Vohland, 2016), including DM (Stevens et al., 2015).Both meta-models were trained with the output of CWatM global simulation run at 0.5 • resolution using those computational cells which intersect the study area of Upper Bhima Basin (Fig. 3).The source zone dataset was divided so that a random 75% of data points (all 30 source zones and all 3652 daily timesteps over 2001-2010 together yields 109 054 data points) were used for training, and the remaining 25% datapoints were used for testing.
When applying bias correction, we first shift the meta-model outputs so that the minimum value within each source zone is > 0 (Reibel and  Agrawal, 2007), if there are any negative values.Shifting the values is done here because 1) for the linear scaling to work as described, all values should be positive, and 2) negative runoff values make no physical sense.While there may be better alternatives to removing negative values from meta-model outputs, we shift because shifting preserves the variability provided by the model (i.e., the standard deviation is preserved), which is task given to the meta-models in Section 2.4.We note that as a result, CV of any shifted timeseries is not preserved, because the mean changes, but standard deviation does not.Bias correction is applied to each daily timestep separately.

Assessment
We assess the performance of meta-models and bias corrected metamodels in downscaling coarse resolution CWatM output (at the source zone level) against a high resolution model run of CWatM (at the target zone level) using Normalized Root Mean Square of Error (NRMSE, normalized using the mean) and a form of the Kling-Gupta Efficiency (KGE; Gupta et al., 2009).KGE is a multi-objective function consisting of measures of bias, variability, and dynamics shown in Eq. 9 where r is the Pearson correlation coefficient between simulations sim and observations obs, α is a measure of variability error σ sim/ σ obs , and β is the bias μ sim/ μ obs .σ stands for the standard deviation, and μ for the mean.
We use a non-parametric variant of KGE where the variability α is described using the average error in normalized Flow Duration Curves (FDC) and Pearson correlation coefficient is replaced with Spearman rank correlation (Pool et al., 2018).We use the non-parametric KGE because standard deviation is sensitive to the mean of the timeseries, and thus the term is not independent of the bias, whereas the normalized FDC is independent of a possible bias in the timeseries.Further, Spearman rank correlation is desired here because it is less sensitive to extreme values (Pool et al., 2018).
In order to assess an improvement in the overall skill of the downscaling, we also compute a KGE skill score KGE ss using Eq. ( 10) (Knoben et al., 2019), where KGE model refers to the meta-model, and KGE benchmark refers to the performance of source zone output of CWatM as an estimate of the target zone output of CWatM.KGE ss is interpreted so that any positive value means improvement over the benchmark, KGE ss = 0 signifies equal

Results and interpretation
We discuss three different aspects of the performance of DM with linear scaling bias correction.First overall performance of the metamodels, benchmark, and the results of the bias correction (Section 4.1), second, how DM is affecting the spatial correlation (4.2), and third, we consider temporal efficiency (4.3).

Improved global and local performance after bias correction
Both meta-models trained at the source zone level show good overall performance and a comparison between a training and testing datasets show no sign of overfitting (Table 1).Of the two meta-models, RF shows, however, considerably better performance (KGE np 0.96) than LM (KGE np 0.71) in the testing dataset.When the meta-models trained at the source zones are applied to the target zones, their performance considerably deteriorates, as expected for out of sample prediction.Error increases and the performance in all components of KGE np decrease.A global comparison of the meta-models benchmarked against use of the source zone values for the target zones (AW ŷt ) shows considerably worse performance for both meta-models.The deteriorating performance is likely due to 1) the simplicity of the meta-models; they do not consider land use or soil properties (which increase in importance in higher resolutions) and thus are not able to represent the heterogenous conditions existing within the basin, and 2) the meta-model training using a lower resolution (spatially averaged) model run does not represent all the conditions found when modelling is performed in higher resolution.Despite the meta-models showing inferior performance in the KGE np components than AW ŷt , overall errors are smaller, as shown in Table 1.
The global goodness-of-fit statistics reveal that after applying the DM spatial bias correction, performance is similar to the benchmark with further reduced errors.We note that because the source model runs have 8% higher runoff than target zone model runs (reflected in the bias component of AW ŷt in Table 1, and shown in Fig. 4 as comparison between source and target zone runoff), the maximum attainable KGE, even when the dynamic and variability terms were perfectly matched, is 0.92.That is, the assumption that source zone information needs to be preserved means that we have potentially accepted a discrepancy compared to the target zone model runs.This level of KGE shows a very high match between two timeseries.In DM, bias is corrected for each source zone separately and consequently for the entire area, however, the spatial differences in the bias lead to different maximum attainable KGE in each area.We note here that the two CWatM model runs use different climate forcing data, which is a major source of errors in hydrological model runs (Sperna Weiland et al., 2015), and that it is known that model runs at different levels of spatial aggregation often result in differences in the simulated water balance (Wen et al., 2021).Due to this model-to-model comparison we do not assess which one more correctly represents the true runoff production within UBB.The merits of the presented downscaling method (without meta-models) against observed streamflow records are assessed in Kallio et al. (2019Kallio et al. ( , 2021)).
Using a global evaluation, however, only provides a partial understanding of the model performance.Fig. 5 shows how local performance is distributed across the basin in small neighborhoods of 3 × 3 target zones.The maps reveal that the meta-models perform more poorly than the benchmark in the central and eastern part of the basin (Fig. 5C, G), areas characterized by low rainfall resulting in low runoff.However, both meta-models improve on the performance in the western basin where runoff production has a high spatial gradient such that downscaling is more important.Bias correcting the meta-model output with DM eliminates nearly all differences between LM ŷt and AW ŷt in the area where the meta-model LM x t was found to be the poorest, while still maintaining the improvement of the meta-model within the highgradient zone.The poor performance artefact found in AW ŷt along longitude 74 • E becomes almost eliminated in LM ŷt .The artefact is entirely eliminated in RF ŷt , but has much higher variability in performance elsewhere compared to AW ŷt than LM ŷt .While the RF metamodel shows extremely high performance at the source zone level, this finding may indicate that while it performs well on source zone testing data, it has still overfitted on the training data and is not as well transferable to the higher resolution target zones as the simpler LM meta-model.We note here, however, that the LM meta-model predicts nearly half of data points as negative runoff values, and the good performance can only be achieved after restricting the outputs to strictly positive values (see Section 2.3).

Improved spatial correlation
One of the main benefits of spatial bias correction is that the spatial structure of outputs can substantially improve, similarly to the autocorrelation function shown in Fig. 1B.Fig. 6 shows the median and the 90% distribution of spatial correlation of the benchmark, meta-models, bias corrected meta-models, and the reference model run of CWatM 5min , computed between all pairs of target zones.The source zone values at target zones AW ŷt show a reasonably good agreement with the reference model run.Both runs are made with the same model (albeit with different climate forcing) and thus it should be expected that when modelling the same output variable, their large-scale spatial correlation would be similar.However, the effect of the coarse correlation can really be seen at the short distances (the first two bins in Fig. 6A), since the resolution of the 0.5 • source zones is approximately 50 km at UBB, a major proportion of all correlation pairs show a perfect match (correlation = 1).This is not the case for the meta-models in Fig. 6B and C. The figures also show that the spatial correlation structure of the metamodels in the study area substantially improves in both cases, though for RF the improvement is more modest than using LM due to the better structure in RF.Apart from small differences, spatial correlation in AW ŷt , LM ŷt , and RF ŷt are all highly similar.

Assessment of timeseries
Evaluations based on global and local performance and spatial correlations suggest that using solely source zones seems to be a viable option.However, it does not accomplish the goal of downscaling as achieved using DM.This is highlighted in Fig. 7, which shows examples of the distribution of runoff values against time within three different source zones.The first row in the figure visualizes the fact that AW ŷt does not show any distribution of values within the source zone, while LM (row two) RF (row three) show a large distribution of values.The timeseries also illustrates many of the properties of linear scaling bias correction: 1) Panels A1 to A3 show that when the meta-model estimates are reasonably good, and with their means close to the reference value of the source zone (AW ŷt ), there are only minor adjustments to the distribution.2) In panels B1 to B3 in mid-June and mid-July 2004, bias correcting meta-model outputs improves the distribution estimate.However, around July 1st, source zone value AW ŷt is outside the envelope of the CWatM reference model run, and thus the distribution of bias corrected meta-model outputs have also moved outside the envelope.3) Panels C1 to C3 show multiple years of runoff production in the area where the performance of all ŷt estimates are poor.The LM metamodel produces a significant number of negative runoff values and a too wide a range of values.Bias correcting brings a major improvement in the values although the performance is still poor overall as evidenced by the maps in Fig. 5. Similar improvement is seen also for RF, however without the need to shift values.4) The absolute magnitude of the distribution changes according to the direction of the adjustment; if the meta-model mean is higher than the reference value, the distribution shrinks because the adjustment μ obs/ μ model < 1 (see Eqs. ( 5) and ( 6)).The effect is reversed if the mean of meta-model outputs is smaller than the reference.
As the examples in Fig. 7 portray, the final performance of the timeseries depends on the accuracy of the source zone reference values for providing an unbiased estimate, and on the ability of the meta-model to accurately portray the dynamics within the source zone.

Differences to alternative methods and high-resolution modelling
Our study describes spatial bias correction with a method that complements methods found in the literature, mainly based on pointobservations, in that it specifically addresses downscaling of data with (arbitrary) areal spatial support.Closest areal interpolation alternative, area-to-area kriging, results in areal output, but require areal features to be first cast to point representation for estimation of the semi-variogram (Hu and Huang, 2020).This intermediate step is unambiguous.Additionally, kriging is unbiased at the observed point locations, but the consequence of using point representation for areal features is that estimates may no longer be unbiased across areal features, which is a feature of DM.Kriging-based runoff interpolation furthermore require observations with a high spatial density (Parajka et al., 2015), not generally available in data-scarce contexts.On the other hand, data assimilation methods estimate an optimal combination of model simulations and reference data, but do not result in entirely unbiased estimates.They also require rigorous a priori uncertainty estimation for both model outputs and observations.The standard method of DM is mathematically and conceptually simple and relatively easy to implement: statistical (meta-)modelling of arbitrary complexity, linear scaling bias correction based on means, and spatial intersections between source and target zones.
Furthermore, the DM methodology presented herein provides a number of advantages over a full CWatM simulation at the higher resolution: 1) substantial computational improvement compared to running the full model.In our case study consisting of 30 source zones, 567 target zones and 3652 timesteps, the unoptimized DM implementation in R language (R Core Team, 2020) runs within a few minutes (including training of the meta-models), compared to approximately an hour (Kallio, 2020) for a full CWatM run at 5 arc-minute resolution in the area (without calibration procedure), 2) statistical models such as the LM and RF models used here can use a wide range of explanatory variables while hydrological models have strict requirements for inputs, 3) faster calibration to the study area compared to calibrating a distributed hydrological model against observed data or regionalizing parameters from another area, and 4) a wider pool of experts is familiar with statistical modelling than with physically based, distributed hydrological modelling.Each of the advantages listed here stack up when uncertainty management requires the use of multiple estimates, as is often the case (Saxe et al., 2021).

Sensitivities and limitations
In light of the presented results, source zone values AW ŷt , despite not being downscaled, seem to reflect the runoff outputs of the target zone reasonably well.The question therefore arises whether downscaling with the methodology of meta-models and bias correction bring any added value?Our analysis clearly suggests that there is little benefit in downscaling in general when the area within a source zone is ho-mogenous with no large differences in the output.In these regions, using the coarse resolution data may be justified against the added complexity of training, applying and bias correcting a meta-model.Similar concepts have been long explored in hydrological literature; e.g. the concept of Hydrological Response Units in which areas with similar hydrological functioning are lumped together, or model applications with variable modelling resolution (Gharari et al., 2020).There is, however, strong improvement in the performance in areas with a high gradient in the modelled outputs, as seen in the western part of the UBB (Fig. 5).
Furthermore, the standard DM methodology used here assumes that the reference values provided by source zones are unbiased and accurate for the areas they describe.There are very few variables which can be measured to certainty over large spatial domains or variable spatial units (The Modifiable Area Unit Problem, MAUP; Manley, 2014), and this uncertainty can affect the regressed model parameters (Fotheringham and Wong, 1991).Our case study provides a simplified (but useful) use case with well-defined source and target zones as nested grids defined by a CWatM simulations, and where the configuration or sizes of target zones play no role.This allowed us to have a detailed look at the properties of DM as a downscaling and spatial bias correction method without an added complexity of non-conforming target and source zones.As an areal interpolation method, DM can, however, deal with arbitrarily shaped source and target zones, and therefore can be used to address MAUP (Dark and Bram, 2007;Goodchild and Lam, 1980;Kallio et al., 2021).Performance of the downscaling to non-conforming target zones in hydrology has been assessed in Kallio et al. (2019Kallio et al. ( , 2021) ) and the uncertainty in different realizations of target zones in Virkki (2019).We note that modified DM methodologies able to deal with uncertain source and ancillary data are also available (Leyk et al., 2013;Nagle et al., 2014).
However, while DM methodology can handle arbitrary overlapping configurations of source and target zones, it does not handle spatial dependencies across source zones.In environmental modelling, this particularly means processes such as streamflow accumulation, or air pollution: the entire upstream catchment influences streamflow at any particular location, and atmospheric processes can transport pollutants from far-away regions.The meta-model used may include these spatial dependencies, but independent bias correction at each source zone means that discontinuities and mass-balance violations may be introduced at source zone boundaries.Methods for bias correction of streamflow have been developed which handle the tree-like structure of stream networks (see e.g.Bennett et al., 2021;Gottschalk, 1993;Paiva et al., 2015;Skøien et al., 2006).The standard DM methodology is suitable for any process which may be considered local, such as runoff generation (without accumulation), soil moisture, land use and land cover, or soil properties, or when the source data covers the entire area of generating process (see e.g. the application of the framework to migration data in Niva et al., 2022).

Potential advancements for DM as a bias correction framework
In our case study, bias correction is carried out individually for each timestep, and therefore the distributions of LM ŷt and RF ŷt closely follow the source zone values AW ŷt .However, extension to spatiotemporal variant of DM (Mennis, 2016) means that the reference source zone values could be expressed in a spatio-temporal domain, e.g.providing a monthly reference value for the source zone.In such a case, the daily bias corrected meta-model outputs would better retain temporal dynamics provided by the meta-modelan aspect which is lost with daily adjustments.
While the linear scaling method used in this study does not always result in the correct distribution of runoff values within source zones, other more advanced bias correction methods can be implemented.Knowledge about the distribution within the source zone, say the standard deviation of values found within, variance scaling (Chen et al., 2011) could be used to correct for both mean, and standard deviation.If the actual distribution is known or can be estimated a priori for each source zone, more advanced methods like quantile-quantile mapping (Teutschbein and Seibert, 2012) can be used instead (and is recommended, if possible; Teutschbein and Seibert, 2013).We can expect these methods to improve the spatial distribution of runoff values, and thus further improve downscaling performance.In principle, the used bias correction method is limited by what information is available at the source zone level.We note here, however, that any method which modifies the distribution of the meta-model outputs, also modifies their spatial correlation.In DM, the spatial dynamics (correlation) of the meta-model is preserved within source zones, and only modified across source zones.This behavior is different from methods which apply bias corrections continuously.
We close the discussion of the presented methodology with a remark that while areal interpolation, and as an extension, DM, are based on area standardization, this is not a strict requirement.The methodology can extend to other type of variables as well, such as water use per capita.The modification from standard areal interpolation that needs to be done to support other standardization schemes is to determine how the values are divided when intersecting source and target zones, and when combining the target zone intersections back to original (i.e., the areal interpolation component of DM workflow in Fig. 2D).This is straightforward when standardization is by area but can be more complicated for alternative standardizations.

Conclusions
In this paper we presented an interpretation of dasymetric modelling as a flexible spatial bias correction framework for downscaling environmental model outputs.We showed that the common workflow in dasymetric modelling can be described using concepts of metamodelling and that the distribution of source zone values in dasymetric modelling is in fact equivalent to the common linear scaling bias correction method.In a case study we then downscale coarse resolution runoff estimates from CWatM (source zones) and compare the outputs against a higher-resolution model run (target zones).
By exploring the properties of linear scaling in a spatial bias correction setting within the DM framework, we find that the DM methodology works well in areas where runoff production exhibits a strong gradient, but that there is limited benefit in more homogenous areas.The meta-models based on precipitation, temperature and topographic data have poor performance in most of the study area.Correcting for the spatially autocorrelated errors (spatial bias), however, corrects most of the problems exhibited by the meta-model outputs and significantly improve spatial correlation structure.The benefit is strongly dependent on unbiased reference values associated with the source zones, and correctly represented dynamics by the meta-models.
Dasymetric modelling is a mature technique developed in population studies with multiple variants in the literature.It offers great potential for environmental impact studies where large amount of data reflects the future uncertainty space and downscaling to local scales is imperative for policy making.As the presented downscaling and spatial bias correction method, it offers a flexible, simple, and effective method which can accommodate arbitrarily shaped areal units for both source and target zones.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Zenodo repository.The link is provided in the data and software availability statement within the manuscript file.

Fig. 1 .
Fig. 1.Panel A) shows a 1-dimensional schematic of linear scaling bias correction and demonstrates some properties of the method.Statistics r, sd, and pbias are the components Kling-Gupta Efficiency (Pearson correlation coefficient, ratio of standard deviations, and percent bias, respectively) and are evaluated against the true function shown in dashed red line.Panel B) shows an autocorrelation function computed for each of the three data series.

Fig. 2 .
Fig. 2. The workflows in A) meta-modelling, B) bias correction, C) dasymetric modelling, and D) the spatial bias correction framework consisting of A-C.
3.3.Meta-modelsTwo meta-models were developed to estimate CWatM modelled runoff output over Upper Bhima Basin during 2001-2010.The first meta-model is a standard Ordinary Least Squares Regression, referred to as the linear model, LM.The second meta-model is based on Random Forest Regression (RF;Breiman, 2001).Both models were trained at the source zone level (0.5 • resolution) with the same model equation based on derived variables from a Digital Elevation Model (DEM), daily precipitation (P) and daily temperature (T) measurements x ts = P ts + P ts−1 + P ts−2 + sum ( + T ts + T ts−1 + T ts−2 + mean ( T ts−1,…,−7 )

Fig. 3 .
Fig. 3.The Upper Bhima Basin on the western peninsular of India.The map shows the mean annual precipitation over 2001-2010 at the target zone level, major hydrological features of the basin as well as the source and target zones used in the case study.

Fig. 5 .
Fig. 5. Local KGE np performance computed from a 3 × 3 moving window and all timesteps for A) source zone values AW ŷt , B) LM meta-model output x t , C) KGE ss between LM x t and AW ŷt D) bias corrected LM meta-model output ŷt ,and E) KGE ss between LM ŷt and AW ŷt .The lower row shows F) RF meta-model x t , G) KGE ss between RF x t and AW ŷt , H) bias corrected RF ŷt , and I) KGE ss between RF ŷt and AW ŷt .The mean flow benchmark performance with KGE = −0.41 is obtained by using temporally averaged mean runoff for the entire timeseries as a prediction(Knoben et al., 2019).

Fig. 6 .
Fig. 6.Spatial correlation computed in 20 km lags, showing the 90% distribution and the median of the reference model run (CWatM at 5 arc minute target zones) and the distributions and the median of the meta-model outputs x t and bias corrected outputs ŷt .Panel A) shows AW, a benchmark given by the source zone values, B) shows the distributions for LM meta-model, and C) for RF meta-model.

Fig. 7 .
Fig. 7. Selected time periods from three source zones showing the 90% distribution of target zone values within the source zones for AW, LM and RF and their bias corrected counterparts.Panel A and B are located where meta-model performance is relatively high (see Fig. 5), and panel C where meta-models perform poorly.The timeseries distribution shown here are not influenced by the neighboring source zones.