Joint inverse estimation of fossil fuel and biogenic CO2 fluxes in an urban environment: An observing system simulation experiment to assess the impact of multiple uncertainties



RESEARCH ARTICLE

Kai Wu*, Thomas Lauvaux*, Kenneth J. Davis*, Aijun Deng*, Israel Lopez Coto†, Kevin R. Gurney‡ and Risa Patarasuk‡

The Indianapolis Flux Experiment aims to utilize a variety of atmospheric measurements and a high-resolution inversion system to estimate the temporal and spatial variation of anthropogenic greenhouse gas emissions from an urban environment. We present a Bayesian inversion system solving for fossil fuel and biogenic CO2 fluxes over the city of Indianapolis, IN. Both components were described at 1 km resolution to represent point sources and fine-scale structures such as highways in the a priori fluxes. With a series of Observing System Simulation Experiments, we evaluate the sensitivity of inverse flux estimates to various measurement deployment strategies and errors. We also test the impacts of flux error structures, biogenic CO2 fluxes and atmospheric transport errors on estimating fossil fuel CO2 emissions and their uncertainties.
The results indicate that high-accuracy and high-precision measurements produce significant improvement in fossil fuel CO2 flux estimates. Systematic measurement errors of 1 ppm produce significantly biased inverse solutions, degrading the accuracy of retrieved emissions by about 1 µmol m⁻² s⁻¹ compared to the spatially averaged anthropogenic CO2 emissions of 5 µmol m⁻² s⁻¹. The presence of biogenic CO2 fluxes (similar magnitude to the anthropogenic fluxes) limits our ability to correct for random and systematic emission errors. However, assimilating continuous fossil fuel CO2 measurements with 1 ppm random error in addition to total CO2 measurements can partially compensate for the interference from biogenic CO2 fluxes. Moreover, systematic and random flux errors can be further reduced by reducing model-data mismatch errors caused by atmospheric transport uncertainty. Finally, the precision of the inverse flux estimate is highly sensitive to the correlation length scale in the prior emission errors. This work suggests that improved fossil fuel CO2 measurement technology and a better understanding of both prior flux and atmospheric transport errors are essential to improve the accuracy and precision of high-resolution urban CO2 flux estimates.

Keywords: urban CO2 emissions; biogenic CO2 fluxes; atmospheric inversion; OSSE; observation strategy; atmospheric transport error; flux error structure

Introduction
Climate change has intensified over recent decades owing to the impact of greenhouse gas (GHG) emissions on the Earth's radiative budget. The carbon dioxide produced from fossil fuel combustion (CO2ff) is the most important cause of the increase in atmospheric CO2 concentration. Atmospheric CO2 concentrations have risen by 40% since pre-industrial times and are now at their highest level in at least the past 800,000 years (Lüthi et al., 2008). During 2002–2011, global carbon emissions from fossil fuel combustion and cement production averaged 8.3 ± 0.7 GtC yr⁻¹ (1 GtC = 1 Gigatonne of carbon = 10¹⁵ grams of carbon) (Boden et al., 2016), with over 70% of CO2ff emissions attributable to urban areas (EIA, 2013). Quantitative estimation of anthropogenic CO2 emissions from urban areas is a high research priority for the formulation and implementation of policies to mitigate climate change and ensure urban sustainability (Hutyra et al., 2014).
Estimation of anthropogenic carbon emissions to the atmosphere has generally been performed via two complementary approaches: "bottom-up" and "top-down" methods. Bottom-up methods aggregate source-specific CO2ff flux estimates to form a total emission inventory based on activity data (such as energy consumption, population density, traffic data and local air pollution reporting) and emission models (e.g. a building energy consumption model) (Gurney et al., 2012). Inventories can be highly resolved in both space and time (Gurney et al., 2009), but they are prone to systematic errors and their uncertainties are not well known (Andres et al., 2014). Top-down methods infer quantitative information on surface CO2 fluxes from variations in atmospheric CO2 concentrations through inverse modeling with atmospheric tracer transport models (Ciais et al., 2011), and may include isotope composition measurements to identify fossil fuel sources (Levin et al., 2003; Miller et al., 2012; Turnbull et al., 2015; Basu et al., 2016). Uncertainties in atmospheric transport models (Peylin et al., 2002; Lauvaux et al., 2009; Peylin et al., 2011; Isaac et al., 2014), limited density of atmospheric measurements (Gerbig et al., 2009; Lauvaux et al., 2012; Turner et al., 2016) and uncertainties in prior fluxes (Peylin et al., 2005; Carouge et al., 2010; Lauvaux et al., 2016) all constitute sources of error in this method (Engelen et al., 2002).
With the increasing interest in monitoring and verifying surface CO2 exchange, several studies have been conducted to invert for biogenic (Peters et al., 2007; Schuh et al., 2010; Lauvaux et al., 2012; Ogle et al., 2015) and anthropogenic CO2 fluxes (Bréon et al., 2015; Staufer et al., 2016; Lauvaux et al., 2016; Verhulst et al., 2017). The Indianapolis Flux Experiment (INFLUX, http://sites.psu.edu/influx/) was proposed to develop, test and improve methods to estimate anthropogenic GHG emissions from cities, using Indianapolis as a test bed. This project uses aircraft (Cambaliza et al., 2014; Heimburger et al., 2017) and a high-density surface tower network (Richardson et al., 2017) combined with high-resolution atmospheric modeling (Sarmiento et al., 2017) to infer CO2ff emissions at 1 km spatial resolution (Lauvaux et al., 2016). Figure 1 shows the distribution of instrumented towers and daytime average surface CO2 fluxes during the first 10 days of September 2013. The availability of a high-resolution emission inventory (Gurney et al., 2012) and a high-precision atmospheric transport model (Lauvaux et al., 2016) enables us to test the possible improvements and limitations of an urban atmospheric CO2 inversion system.

Figure 1: Map of ground-based tower locations (numbers and black stars mark different towers) in Indianapolis (39.363°N-40.137°N, 86.667°W-85.654°W) and daytime (13-19 local standard time) average surface CO2 fluxes (shading) during the first 10 days of September 2013. All towers have CO2 mole fraction measurements, and towers 1, 2, 3, 5 and 9 have 14C content and CO mole fraction measurements. Surface CO2 fluxes consist of fossil fuel CO2 emissions (Hestia inventory data) and biogenic CO2 fluxes simulated from the Vegetation Photosynthesis and Respiration Model (VPRM). Since some strong point sources (such as the Harding Street Power Plant and some industrial emission points) have CO2ff emissions per unit area that are more than 10 times larger than most of the city area but cover only 3 percent of all the grid points, we limited the maximum range of emissions to 15 µmol m⁻² s⁻¹ to show the spatial distribution of daytime average CO2 fluxes. DOI: https://doi.org/10.1525/elementa.138.f1

Quantification of urban CO2 fluxes is limited by several challenges. Of particular importance is the separation of CO2ff emissions and biogenic CO2 (CO2bio) exchange (Pataki et al., 2003, 2007; Briber et al., 2013; Hardiman et al., 2017). Measurement of the radioactive carbon isotope (14C) is a highly effective approach to isolate the 14C-free CO2ff emissions, given the depletion of radiocarbon in extremely old fossil fuels (Turnbull et al., 2009). In addition, carbon monoxide (CO) can be used as a tracer for CO2ff, relying upon an empirical emission ratio of CO to CO2ff from incomplete combustion of hydrocarbons (Silva et al., 2013). CO measurements are more readily available, while flask measurements of 14C are expensive and discontinuous, although CO is a less accurate tracer of CO2ff than 14C (Levin and Karstens, 2007).
Another challenge is minimization of uncertainty in high-resolution atmospheric transport models used to simulate trace gas transport in an urban setting, where complex boundary layer structures may form due to land-use/land-cover change and intensive human activities (Wang et al., 2011; Sarmiento et al., 2017; Gaudet et al., 2017). In addition, the inverse estimation of CO2ff emissions is constrained by atmospheric CO2 measurements, and the trade-off between measurement density and quality is an important emerging debate for urban GHG monitoring (Wu et al., 2016; Shusterman et al., 2016; Turner et al., 2016; Martin et al., 2017). Lastly, uncertain spatial structures in prior flux errors influence the precision of inverse flux estimates (Saide et al., 2011; Lauvaux et al., 2016). Therefore, studying the impacts of CO2bio fluxes, atmospheric transport errors, observation deployment strategies and prior flux error structures on CO2ff flux estimates is important for advancing our understanding of uncertainties in the estimation of anthropogenic carbon emissions in an urban environment.
An Observing System Simulation Experiment (OSSE) (Figure 2), designed to examine the ability of synthetically generated measurements (pseudo-data) to retrieve the assumed "true" fluxes within a Bayesian synthesis inversion framework, is a useful approach for quantifying the impacts of different inversion system configurations and error characteristics on flux estimates and their uncertainties (Law et al., 2002; Carouge et al., 2010; Gourdji et al., 2010; Chatterjee et al., 2012). Using an OSSE to evaluate uncertainties in the urban CO2 inversion system has three advantages. First, the presupposed true fluxes make it possible to evaluate the impact of different inversion scenarios on the ability to infer CO2ff emissions. Second, since the synthetic CO2 measurements are generated from surface CO2 fluxes within the domain of interest, there is no need to consider inflow at the boundary (i.e. CO2 from outside of the study area). This avoids possible biases from incorrect estimation of boundary conditions for limited-domain inversions, although boundary inflow is an important source of error in real inversions (Schuh et al., 2013; Lauvaux et al., 2012, 2016). Third, the atmospheric transport can be known perfectly (i.e. no bias) because the same transport matrix is used to create the synthetic measurements and to estimate fluxes in the inversion system.
In this study, we conduct a series of OSSEs to evaluate the sensitivity of urban-scale flux estimates to various observational and inversion system configurations over the city of Indianapolis. The primary objectives of this study are threefold. First, we test the impact of prior flux errors on the inferred CO2ff flux uncertainties. Second, we demonstrate a method to estimate the impacts of different observational configurations and CO2bio fluxes on the ability to infer CO2ff emissions. Third, we investigate the impacts of atmospheric transport errors and synthetic CO2ff measurements on the accuracy and precision of inferred CO2ff emissions.

Inverse theory
Atmospheric inverse modeling of CO2 sources and sinks is a process to infer a set of statistically optimal fluxes (posterior fluxes) that assimilates all available information sources (measurements and prior fluxes) within their respective uncertainties. Solving this inverse problem requires (1) a set of atmospheric CO2 mole fraction measurements, (2) an a priori estimate of CO2 fluxes, and (3) a linear operator representing the atmospheric transport linking prior CO2 fluxes to simulated CO2 mole fractions at the location of observations. Knowledge of these three elements, together with their associated uncertainties, allows one to reduce the errors in prior CO2 fluxes and improve the estimation of CO2 sources and sinks (Ciais et al., 2011). A Bayesian synthesis inversion (Enting, 2002; Tarantola, 2004) is an algorithm used to maximize posterior conditional probability or minimize posterior variance by minimizing a cost function (F; Equation 1). In this cost function, x is an m × 1 vector of the discretized unknown surface CO2 fluxes, x_0 is the prior state vector of surface CO2 fluxes with m × 1 elements, and y is an n × 1 vector of atmospheric CO2 mole fraction measurements. H is a known n × m matrix describing the sensitivity of CO2 mole fractions to surface CO2 fluxes. B (m × m) is the flux error covariance matrix that represents the uncertainties in the prior state, and R (n × n) is the observation error covariance matrix describing the magnitude of discrepancies between observed (y) and modeled (Hx) CO2 mole fractions caused by measurement and atmospheric transport errors. The inverse (or posterior) fluxes (x_a) and their uncertainties (A) are derived by minimizing the cost function (F) with respect to x.
Gain (G) and error reduction (ER) are two metrics used to quantitatively evaluate the inverse flux estimates (mean) and their uncertainties (standard deviation) (Lauvaux and Davis, 2014). In these metrics, x_ai, x_ti and x_0i are the posterior flux, the true flux and the prior flux at grid cell i, respectively. σ_ai and σ_bi are the standard deviations (corresponding to the variances on the diagonals of the A and B matrices) of the posterior state and the prior state at grid cell i. The gain metric represents the improvement in flux magnitude after inversion, and the error reduction metric represents the increase in confidence from the prior state to the posterior state. These two metrics complement each other to comprehensively assess the inversion performance.
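For orientation, the quantities named in this section can be written in the standard Gaussian form of a Bayesian synthesis inversion, using the symbols defined above. This is a sketch under stated assumptions rather than a transcription of the original equations: the factor 1/2 in F is a common convention that does not change the minimizer, the two expressions for A are algebraically equivalent, and the forms of G and ER follow common usage that is consistent with the verbal definitions given here (a negative G means the posterior is further from the truth than the prior).

```latex
\begin{align}
% Cost function (assumed normalisation factor 1/2)
F(x) &= \tfrac{1}{2}\left[(y - Hx)^{T} R^{-1} (y - Hx)
        + (x - x_0)^{T} B^{-1} (x - x_0)\right] \\
% Posterior fluxes and posterior error covariance
x_a &= x_0 + B H^{T}\left(H B H^{T} + R\right)^{-1}\left(y - H x_0\right) \\
A   &= B - B H^{T}\left(H B H^{T} + R\right)^{-1} H B
     = \left(B^{-1} + H^{T} R^{-1} H\right)^{-1} \\
% Gain and error reduction at grid cell i (assumed forms)
G_i  &= 1 - \frac{\left|x_{ai} - x_{ti}\right|}{\left|x_{0i} - x_{ti}\right|} \\
ER_i &= 1 - \frac{\sigma_{ai}}{\sigma_{bi}}
\end{align}
```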

Urban fossil fuel CO2 emissions
Indianapolis was the 14th largest city in the U.S. in 2013, with a population of ~835,000 and an area of ~963.5 km². The city is surrounded by agricultural areas (primarily cropland) and is located far from other metropolitan areas, so changes in GHG concentrations from the city can be isolated with relative ease. In addition, the flat terrain makes the meteorological conditions relatively simple to simulate. The Hestia Project is the first effort to use bottom-up methods to quantify hourly CO2ff emissions for an entire urban landscape down to the scale of individual buildings, road segments, and industrial/electricity production facilities at ~200 m resolution (Gurney et al., 2012). Hestia shows that traffic, utility and industry are the main sectors contributing to anthropogenic CO2 emissions in Indianapolis. Figure 3A shows the spatial distribution of daytime CO2ff emissions averaged from 13 to 19 local standard time (LST) during the first 10 days of September 2013.

Vegetation CO2 fluxes
The CO2bio fluxes over the city of Indianapolis were simulated hourly at 1 km resolution using the Vegetation Photosynthesis and Respiration Model (VPRM) coupled to the Weather Research and Forecasting (WRF) model. In the WRF-VPRM system, VPRM uses meteorological fields from WRF and high-resolution satellite indices to simulate the spatiotemporal patterns of the CO2bio fluxes (Ahmadov et al., 2007). Specifically, VPRM simulates gross ecosystem exchange (GEE) for different vegetation categories using (1) shortwave radiative flux (SWDOWN) and temperature at 2 meters (T2) provided by the WRF simulation; (2) the enhanced vegetation index (EVI), which represents the fraction of shortwave radiation absorbed by leaves; and (3) the land surface water index (LSWI), which reflects changes in both leaf water content and soil moisture (Xiao et al., 2004). Respiration fluxes are estimated as a linear function of T2. To account for the abundant soybean and corn fields surrounding Indianapolis and the different photosynthesis and respiration of these two crops (Lokupitiya et al., 2009), we added an extra vegetation category into the WRF-VPRM implementation from the United States Department of Agriculture National Agricultural Statistics Service Cropland Data Layer (USDA-NASS-CDL) to distinguish corn fields, and the remaining croplands were treated as soybean fields. The net ecosystem exchange (NEE) measured by two eddy covariance flux towers from the AmeriFlux network was used to optimize four user-estimated parameters in VPRM (Schmid et al., 2000; Mahadevan et al., 2008). Morgan Monroe State Forest (US-MMS: 39.32°N, 86.41°W) and Fermi National Accelerator Laboratory - Batavia (US-IB1: 41.86°N, 88.22°W) are the two closest stations to the study area with available data for the ecosystems of interest (Ehman et al., 2002; Matamala, 2016). US-MMS flux measurements from 2013 were used to represent broadleaf forest. US-IB1 flux measurements from 2008 and 2009 were used to represent corn and soybean, respectively, based on the crops grown at the site during those years. We therefore used these flux data to optimize parameters for three vegetation categories (deciduous broadleaf forest, corn and soybean), which together account for more than 95% of the total area in the simulated domain. We optimized these parameters simultaneously using an unconstrained nonlinear optimization method (Nelder and Mead, 1965). Figure 3B shows the daytime (13 to 19 LST) average CO2bio fluxes during the first 10 days of September 2013.
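The light-use-efficiency structure described above can be sketched in a few lines. This is a minimal illustration following the published VPRM formulation (Mahadevan et al., 2008), not the WRF-VPRM code used in this study. The function name, the parameter names (`lam`, `rad0`, `alpha`, `beta`), the temperature limits and every numeric value are assumptions made for illustration, and shortwave radiation is used directly as the light input.

```python
def vprm_nee(swdown, t2, evi, lswi, lswi_max, lam, rad0, alpha, beta,
             t_min=0.0, t_opt=20.0, t_max=40.0, full_canopy=True):
    """Return (nee, gee, resp) for one vegetation class and one time step.

    Hypothetical sketch of the VPRM equations; all default values are
    illustrative assumptions, not calibrated parameters from the study.
    """
    # Temperature scaling: equals 1 at t_opt and 0 at or beyond t_min / t_max.
    if t2 <= t_min or t2 >= t_max:
        t_scale = 0.0
    else:
        num = (t2 - t_min) * (t2 - t_max)
        t_scale = num / (num - (t2 - t_opt) ** 2)
    # Water-stress scaling from the land surface water index.
    w_scale = (1.0 + lswi) / (1.0 + lswi_max)
    # Phenology scaling: 1 for a full canopy, reduced during leaf expansion.
    p_scale = 1.0 if full_canopy else (1.0 + lswi) / 2.0
    # Gross ecosystem exchange (uptake counted positive) with light saturation.
    gee = lam * t_scale * p_scale * w_scale * evi * swdown / (1.0 + swdown / rad0)
    # Respiration as a linear function of 2 m temperature.
    resp = alpha * t2 + beta
    # Net ecosystem exchange: negative values mean net uptake by vegetation.
    return resp - gee, gee, resp


# Example call with made-up inputs for a sunny hour at the optimum temperature.
nee_day, gee_day, resp_day = vprm_nee(
    swdown=600.0, t2=20.0, evi=0.5, lswi=0.1, lswi_max=0.3,
    lam=0.1, rad0=500.0, alpha=0.2, beta=1.0)
```

In a calibration such as the one described above, the four parameters would be adjusted per vegetation class until the modeled NEE matches the tower NEE; the optimizer itself is omitted here.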

Atmospheric transport model
This study used the WRF model with a slightly modified chemistry module (WRF-Chem) and the Lagrangian Particle Dispersion Model (LPDM) (Uliasz, 1994) to simulate CO2 footprints (i.e. influence functions, H matrix in Equation 1) (Lauvaux et al., 2016). The simulation domain is centered on Indianapolis and covers an area of 87 km × 87 km at 1 km spatial resolution and hourly temporal resolution in the LPDM. The National Centers for Environmental Prediction North American Regional Reanalysis (NCEP-NARR) gridded meteorological data were used as the initial conditions to drive the WRF-Chem modeling system (Mesinger et al., 2006), which continuously assimilated meteorological observations using a Four-Dimensional Data Assimilation (FDDA) system to produce more accurate meteorological conditions (Deng et al., 2009), similar to the WRF-WMO-FDDA case described in Deng et al. (2017). The wind field, potential temperature, and turbulent kinetic energy from the WRF-Chem simulations were used as input variables to drive the particle backward motions from the tower locations (Figure 1) in the LPDM. At each tower location, 6300 particles were released every hour for 12-hour back-trajectories. Since the simulation of atmospheric transport during nighttime may have large errors due to difficulty in simulating the stable boundary layer, this study utilizes CO2 footprints during 7 daytime hours (13-19 LST) in the first 10 days of September 2013 to conduct pseudo-data inversion experiments. Although we do not use synthetic nocturnal observations, the influence functions used to interpret daytime observations do extend into the nighttime (12 hours before the synthetic daytime observations), and hence the current system has some sensitivity to nocturnal emissions.
Quantitative estimation of uncertainties in atmospheric transport is a critical element in urban inversions. Limited model resolution, imperfect atmospheric initial conditions, and imprecise model physical parameterizations can all lead to significant errors in the simulated CO2 mole fractions. These uncertainties are difficult to quantify. The urban environment is challenging since the underlying surface is heterogeneous, potentially leading to complex sub-grid scale flows. Additionally, high-resolution atmospheric simulation tends to introduce highly spatiotemporally correlated errors within the urban domain, which are complicated to characterize and could influence the inverse flux estimates and their uncertainties (Lauvaux et al., 2009). Our objective is to focus primarily on the interaction of CO2ff and CO2bio fluxes. We make the simplifying assumption that transport errors are uncorrelated, which means that the R matrix is diagonal. We do vary the assumed magnitude of uncertainty in atmospheric transport (i.e. random error) to evaluate the impact of improvements to the atmospheric transport model.

Observing system simulation experiment
We set up a series of observing system simulation experiments by assuming that the daytime average CO2ff emissions (x_f) from the Hestia Project and CO2bio fluxes (x_b) from the WRF-VPRM system are the true fluxes (X_t). After combining the true fluxes with the linear transport matrix (h), the synthetic "perfect" CO2 mole fraction measurements (Y_p) at each site were produced. We use two different inversion schemes to simulate atmospheric CO2 measurements. One inverse system (scheme 1) utilizes only total CO2 mole fraction measurements (CO2tt, y_t). The other inverse system (scheme 2) utilizes both CO2tt and CO2ff (y_f) mole fraction measurements. These two schemes are achieved by reconstructing the transport matrix (H).
To illustrate the impacts of biogenic CO2 fluxes and the observational network on anthropogenic CO2 flux estimates, our experiments are based on three different scenarios (Figure 4). Scenario 1 (S1), a reference case, includes only CO2ff emissions and synthetic CO2tt mole fraction measurements. Since there are no CO2bio fluxes, S1 conceptually corresponds to the winter, when the CO2bio exchange between land and atmosphere is assumed to be negligible compared to CO2ff emissions. Scenario 2 (S2) includes CO2ff emissions, CO2bio fluxes and only CO2tt mole fraction measurements (scheme 1). This scenario conceptually represents summer conditions, but with a more limited atmospheric observing system. Scenario 3 (S3) has both CO2ff and CO2bio fluxes (like S2), but includes both CO2tt and CO2ff mole fraction measurements (scheme 2). The comparison of S1 and S2 illustrates the impact of CO2bio fluxes on the inverse estimate of CO2ff emissions. The impact of adding CO2ff measurements on the inversion performance is evaluated by comparing S2 and S3.
We also vary the assumed uncertainties in the prior fluxes, atmospheric transport, and atmospheric observations to test the sensitivity of inverse CO2ff flux estimates to these characteristics of the system. Evaluating our ability to reduce prior flux errors is the primary objective of this study. Among prior flux errors, the most important challenge is to remove biases. Thus we add mean biases of 3 µmol m⁻² s⁻¹ and -2 µmol m⁻² s⁻¹ to the true CO2ff and CO2bio fluxes, respectively, to form the prior fluxes; these biases are about 60% of the average flux signal of each component. They represent systematic errors in the prior CO2 fluxes (Figure 3C and 3D). All of the following experiments include these prior flux biases. Random errors in the prior fluxes, atmospheric transport, and atmospheric measurements also confound our ability to retrieve the true CO2 fluxes. The magnitudes of these errors vary according to the quality of our instrumentation, atmospheric transport and prior flux models. Therefore, we impose a range of assumed random errors, which are combined with different scenarios, to provide a comprehensive evaluation of the inversion system.
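The experiment design above can be illustrated with a toy OSSE. The sketch below is a minimal example under stated assumptions, not the inversion system used in this study. The two-cell domain, the footprint values in `h`, the flux values and all function names are hypothetical. The prior error covariance is taken as diagonal, the transport matrix is assumed to be H = [h h] for scheme 1 with extra rows [h 0] for the CO2ff observations in scheme 2, and observations are assimilated one at a time, which matches the batch Bayesian solution only because R is assumed diagonal.

```python
import math


def assimilate(x0, B, H, y, r_var):
    """Sequential Bayesian update of prior state x0 (covariance B).

    Each observation (row of H, value in y, error variance in r_var) is
    assimilated in turn; valid as a batch equivalent only for diagonal R.
    """
    m = len(x0)
    x = list(x0)
    A = [row[:] for row in B]
    for h_row, obs, r in zip(H, y, r_var):
        Ah = [sum(A[i][j] * h_row[j] for j in range(m)) for i in range(m)]  # A h^T
        s = sum(h_row[i] * Ah[i] for i in range(m)) + r                     # h A h^T + r
        innov = obs - sum(h_row[j] * x[j] for j in range(m))                # y - h x
        x = [x[i] + Ah[i] * innov / s for i in range(m)]
        A = [[A[i][j] - Ah[i] * Ah[j] / s for j in range(m)] for i in range(m)]
    return x, A


def gain(x_post, x_true, x_prior):
    """Fractional reduction of the flux error (negative if the error grows)."""
    return [1.0 - abs(a - t) / abs(p - t) for a, t, p in zip(x_post, x_true, x_prior)]


def error_reduction(A, B):
    """Fractional reduction of the flux standard deviation per state element."""
    return [1.0 - math.sqrt(A[i][i] / B[i][i]) for i in range(len(B))]


def run_osse(scheme):
    """Toy OSSE; state order is [ff cell 1, ff cell 2, bio cell 1, bio cell 2]."""
    h = [[0.4, 0.1], [0.1, 0.4], [0.25, 0.25]]   # hypothetical footprints (ppm per flux unit)
    x_true = [6.0, 4.0, -3.0, -4.0]              # hypothetical "true" fluxes
    # Prior = truth plus the biases quoted in the text (+3 for ff, -2 for bio).
    x_prior = [x_true[0] + 3.0, x_true[1] + 3.0, x_true[2] - 2.0, x_true[3] - 2.0]
    B = [[4.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # (2 flux units)^2
    H = [row + row for row in h]                 # total CO2 sees ff and bio fluxes
    r_var = [1.0] * len(h)                       # (1 ppm)^2
    if scheme == 2:
        H += [row + [0.0, 0.0] for row in h]     # CO2ff observations see ff fluxes only
        r_var += [4.0] * len(h)                  # (2 ppm)^2 for CO2ff observations
    y = [sum(hj * xj for hj, xj in zip(row, x_true)) for row in H]  # noise-free pseudo-data
    x_post, A = assimilate(x_prior, B, H, y, r_var)
    return gain(x_post, x_true, x_prior), error_reduction(A, B)


gain_s1, er_s1 = run_osse(1)   # total CO2 observations only
gain_s2, er_s2 = run_osse(2)   # total CO2 plus CO2ff observations
```

In this toy setting the fossil fuel gain is larger in scheme 2 than in scheme 1, because total CO2 observations alone cannot tell whether a mismatch comes from the fossil fuel or the biogenic component.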
Figure 4: Schematic diagram for three inversion scenarios. CO2tt mole fraction measurements are used to invert for CO2ff fluxes in scenario 1. CO2tt mole fraction measurements are used to invert for CO2ff and CO2bio fluxes in scenario 2. CO2tt and CO2ff mole fraction measurements are used to invert for CO2ff and CO2bio fluxes in scenario 3.

Our first cases explore random errors in the prior flux estimates. The random error magnitude, or Root Mean Square Error (RMSE), represents the magnitude of flux error at each grid point, corresponding to the diagonal elements of the B matrix in Equation 1. The spatial coherence in the flux error is approximated with an exponentially decaying function of the distance between two grid points. The Spatial Correlation Length (SCL), defined as the distance at which the correlation between two grid points falls below 0.5, characterizes the spatial correlation in the prior flux error structures, corresponding to the off-diagonal elements of the B matrix (Houweling et al., 2004; Peters et al., 2005; Saide et al., 2011; Wu et al., 2011). Neither the random error magnitude nor the spatially correlated error structures are well known. We use S1 with 2 µmol m⁻² s⁻¹ RMSE (~40% of the average CO2ff fluxes) and 5 km SCL as the default case (Figure 3C). The RMSE is varied to be 1 µmol m⁻² s⁻¹ or 4 µmol m⁻² s⁻¹ (i.e. half of or double the default case) to test the sensitivity of the flux error reduction to the prior flux error magnitude, and the SCL is varied to be 2 km or 8 km to explore the influence of different prior flux error structures on the posterior flux uncertainties (Table 1). Both the Degree of Freedom in the Signal (DFS) and the averaging kernel sensitivity are tested to evaluate the impact of the correlation structures on the solutions (Rodgers, 2000; Bocquet, 2009).
In addition, we use S1, S2 and S3 to investigate the impacts of CO2bio fluxes, different observational configurations (i.e. density, accuracy and precision of observations) and the use of CO2ff measurements on posterior CO2ff flux estimates and their uncertainties. This study generated synthetic CO2 mole fraction measurements for 7 daytime hours (13-19 LST) during the first 10 days of September 2013 at each tower location, and varied the magnitude of observation error to represent different accuracy and precision of atmospheric measurements. For example, 1 ppm observation error means that we draw hourly random noise with a standard deviation of 1 ppm for the entire observation period (10 days with 7 hours per day), and then add it to the model-data mismatch at each site. We first estimate flux error reduction under S1 and S2 for four different observation cases (Table 2): (1) 5 sites (towers 1, 2, 3, 5 and 9) with 1 ppm observation error; (2) 12 sites with 1 ppm observation error; (3) 12 sites with 3 ppm observation error; (4) 5 sites (towers 1, 2, 3, 5 and 9) with 1 ppm observation error and the other 7 sites with 3 ppm observation error. The comparison of case 1 and case 2 indicates the effect of increasing the number of observation sites, and the impact of different observation precision is evaluated by comparing cases 2, 3 and 4. To explore the impact of observation biases on inversion performance, we set another case (case 5) as 12 sites with 1 ppm bias and 1 ppm RMSE (Table 2). These random errors and biases could be caused by either imperfect atmospheric CO2 measurements or by atmospheric transport errors. The use of CO2ff measurements is tested in S3 for 5 sites and 12 sites, respectively. Since using 14C to infer CO2ff mole fractions introduces additional measurement errors (Turnbull et al., 2015), the CO2ff observation errors are increased by 1 ppm compared to the CO2tt mole fraction measurements (Table 2).
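The two error covariance structures described above can be sketched as follows. This is a minimal illustration rather than the study's implementation: the function names and the three-point grid are hypothetical, the correlation is assumed to decay as exp(-d/L), and how the decay length L relates to the SCL quoted in the text is itself an assumption. R is represented only by its diagonal, with one entry per site and hour.

```python
import math


def prior_error_covariance(coords_km, rmse, decay_km):
    """Build B with variance rmse**2 on the diagonal and exponentially
    decaying correlations off the diagonal (assumed form exp(-d / decay_km))."""
    n = len(coords_km)
    B = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords_km):
        for j, (xj, yj) in enumerate(coords_km):
            d = math.hypot(xi - xj, yi - yj)          # distance between grid points (km)
            B[i][j] = rmse * rmse * math.exp(-d / decay_km)
    return B


def observation_error_variances(site_sd_ppm, n_hours):
    """Diagonal of R: one variance (ppm^2) per site and hourly observation."""
    return [sd ** 2 for sd in site_sd_ppm for _ in range(n_hours)]


# Default prior case from the text: RMSE of 2 flux units and a 5 km length scale,
# evaluated on a hypothetical three-point grid.
grid = [(0.0, 0.0), (5.0, 0.0), (0.0, 10.0)]
B = prior_error_covariance(grid, rmse=2.0, decay_km=5.0)

# Observation case 4: 5 sites at 1 ppm and 7 sites at 3 ppm, 7 hours x 10 days.
r_diag = observation_error_variances([1.0] * 5 + [3.0] * 7, n_hours=7 * 10)
```

Doubling `decay_km` raises every off-diagonal entry of B, which is the mechanism by which a longer correlation length lets the same towers constrain a larger area.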
Finally, the effect of improved atmospheric transport modeling is explored by decreasing the magnitude of random error in the observation error covariance matrix (R matrix in Equation 1). Richardson et al. (2017) demonstrated that the instrument error from continuous measurements of CO2tt mole fractions using wavelength-scanned cavity ring-down spectroscopy (WS-CRDS, Picarro Inc.) is approximately 0.1 ppm. Atmospheric transport error is not as well defined, but has been estimated to be much larger (approximately 2 to 5 ppm in the U.S. Great Plains) depending on the atmospheric conditions and the scale of interest (Lauvaux et al., 2012). Due to the combination of a high-resolution transport model and a meteorological data assimilation system, this study approximates the current random atmospheric transport error to be 1 ppm (~30% of the daytime average urban CO2 enhancement). With unbiased synthetic measurements at 12 instrumented towers, the impact of different atmospheric transport models is explored by setting random errors in simulated CO2tt mole fractions to be 1.0, 0.5 and 0.1 ppm, corresponding to cases that consider reducing and essentially eliminating atmospheric transport errors (Table 3).

Results
We first present the impact of prior flux errors on the precision of posterior flux estimates. Figure 5 shows spatial distributions of error reduction for five cases in Table 1, using 12 towers with 1 ppm observation error at each site. There is little difference in the spatial structure of error reduction corresponding to the change of prior flux error (RMSE-B) (Figure 5R, 5A and 5B). However, the change of SCL causes an obvious difference in the estimation of flux error reduction, which is consistent with a previous study (Saide et al., 2011). In the case with 2 km SCL (Figure 5C), prior flux errors are reduced by less than 20%, and only close to the tower locations. About 50% of prior flux errors are removed in the vicinity of the towers in the case with 8 km SCL (Figure 5D), and the error reduction area expands relative to the 2 km case. Since a larger SCL means that uncertainties in the prior fluxes are correlated over a larger spatial area, more flux errors can be removed using the same number of observation sites. We find that DFS and averaging kernel sensitivity, additional measures sometimes used to evaluate the spatial structure of inverse flux estimates, provide little information for the range of SCLs we have studied. The DFS is nearly constant across the range of SCLs that we examine, and only decreases as the SCL approaches and exceeds the spacing between our towers (Figure S1). The DFS decreases for very small SCL values (less than 2 km) (Figure S1), which is related to a singularity of the Continuum Limit (Bocquet, 2005). Similarly, maps of the averaging kernel sensitivity show very small changes across the range of SCLs we have examined (Figure S2). The metric of error reduction yields more information about the change in sensitivity of the solution to the assumed SCL. We next explore the impact of different observational networks (i.e. number of towers and quality of measurements) on correcting flux errors.
Figure 6 shows error reduction for different observational configurations in S1 (Table 2). With the increase of observations from 5 sites to 12 sites (Figure 6.1 and 6.2), the area of error reduction expands and the magnitude of error reduction in the center of the city increases from ~20% to ~40%, which indicates that it is beneficial to increase the density of observations in a high-resolution urban CO2 inversion system. In addition, the increase of observation error from 1 ppm to 3 ppm significantly increases uncertainties in the posterior flux estimates (Figure 6.3). Since the daytime average urban CO2 enhancement in Indianapolis ranges from 0.3 ppm to 2.9 ppm, high-precision measurements are important to remove prior flux errors. For the mixed configuration (Figure 6.4, case 4), flux error reductions in the vicinity of the towers increase to ~30% from ~10% in case 3 (Figure 6.3), but the error reduction is still not comparable to case 2 (Figure 6.2). With the existence of CO2bio fluxes (S2), uncertainties in the posterior CO2ff flux estimates are clearly increased, as demonstrated by the reduced flux error correction and the shrinkage of the error reduction area (Figure 7). Even for the case with the highest observational density and the most precise measurements (12 sites with 1 ppm observation error), the error reduction in S2 decreases to less than 20% (Figure 7.2) from ~40% in S1 (Figure 6.2). The presence of CO2bio fluxes significantly weakens our ability to reduce CO2ff flux errors by limiting our ability to distinguish fossil fuel emissions from biogenic fluxes.
To further test the use of biased sensors and CO2ff measurements to infer CO2ff flux estimates, we compared the gain, error reduction and flux bias averaged across the urban domain for different observational configurations (Table 2). The gain is negative when using 12 biased sensors (case 5) in S1 (S1_c5 in Figure 8), meaning that the posterior fluxes have a higher bias than the prior state (S1_c5 in Figure 9). This indicates that high-accuracy measurements are necessary to remove systematic errors in the prior CO2ff flux estimates. As expected, both gain and error reduction are small (less than 0.2 and 8%, respectively) for S2 compared to S1 for all cases with unbiased observations (case 1 to case 4 in Figure 8). The comparison of S1 and S2 indicates that the presence of CO2bio fluxes decreases the gain and increases random and systematic errors in the estimation of CO2ff emissions. Including 12 CO2ff measurement sites (S3_c2 in Figure 8) increases the spatially averaged gain to 0.40 from 0.19 in the scenario without CO2ff measurements (S2_c2 in Figure 8), corresponding to the obvious correction of flux bias in the posterior state (S3_c2 in Figure 9). This implies that high-density CO2ff mole fraction measurements can partially compensate for the interference from CO2bio fluxes.

Figure 6 (partial caption): The cases in Table 2 are: (1) 5 sites (towers 1, 2, 3, 5 and 9) with 1 ppm observation error, (2) 12 sites with 1 ppm observation error, (3) 12 sites with 3 ppm observation error, and (4) 5 sites (towers 1, 2, 3, 5 and 9) with 1 ppm observation error and the other 7 sites with 3 ppm observation error. The same prior flux error (RMSE-B is 2 µmol m⁻² s⁻¹ and SCL is 5 km) is used for all these observational configurations. The domain and coordinates in each panel are the same as in Figure 1. DOI: https://doi.org/10.1525/elementa.138.f6

Figure 10 shows the absolute difference between posterior CO2ff flux estimates and true CO2ff fluxes corresponding to different observation errors (Table 3). The range of observation errors represents different atmospheric transport model errors, assuming that high-precision instruments are used. The prior flux errors (default error setting for CO2ff and CO2bio flux components) and the number of measurement towers (12 sites) are constant for all cases. The difference between the posterior fluxes and the true fluxes decreases continuously as the transport error decreases. The existence of CO2bio fluxes (S2) causes larger flux differences around the urban boundary (middle column in Figure 10). Using CO2ff mole fraction measurements (S3) yields smaller flux differences than S2, and the spatial patterns (right column in Figure 10) are more similar to S1. As expected, the most significant error reduction occurs in the scenario without CO2bio fluxes and with the smallest observation error (S1 with 0.1 ppm RMSE-R in Figure 11). The worst case is the one with CO2bio fluxes and the largest observation error but no CO2ff measurements (S2 with 1.0 ppm RMSE-R in Figure 11), in which the error reduction is about 10% and limited to the area immediately around the tower locations. The use of CO2ff measurements expands the area of significant error reduction (right column in Figure 11). In addition, a more precise atmospheric transport model (i.e. smaller random error) significantly enhances the magnitude of error reduction around tower locations from ~40% (1 ppm error) to ~80% (0.1 ppm error), and expands the error reduction area for the three scenarios.

Figure 8 (partial caption): A series of observing strategies are designed by varying the density, accuracy and precision of observations under different scenarios. The S1_c1 notation corresponds to case 1 in scenario 1. All scenario notations are described in Table 2. The specific cases are: (1) 5 sites (towers 1, 2, 3, 5 and 9) with 1 ppm observation error, (2) 12 sites with 1 ppm observation error, (3) 12 sites with 3 ppm observation error, (4) 5 sites (towers 1, 2, 3, 5 and 9) with 1 ppm observation error and the other 7 sites with 3 ppm observation error, (5) 12 sites with 1 ppm observation error and 1 ppm bias (Table 2). DOI: https://doi.org/10.1525/elementa.138.f8

Figure 12 shows the spatially averaged gain and error reduction for CO2ff and CO2bio flux components corresponding to different atmospheric transport errors (Table 3). Reducing atmospheric transport errors (i.e. decreasing RMSE-R from 1 ppm to 0.1 ppm) enhances gain and error reduction. Comparing S2 and S3 with the same observation error criterion shows that gain and error reduction are improved by including CO2ff measurements. For the CO2ff flux component, it is interesting to note that S1 with 0.5 ppm error (S1-cII-ff in Figure 12) is equivalent to S3 with 0.1 ppm error (S3-cI-ff in Figure 12), which implies that having precise CO2ff mole fraction measurements and small atmospheric transport error can partially compensate for the interference caused by the CO2bio fluxes.

Spatially averaged flux bias is shown in Figure 13. Without CO2bio fluxes (S1), the atmospheric inversion can remove about 70% of the prior flux bias, reducing the bias from 3 µmol m⁻² s⁻¹ in the prior state to less than 1 µmol m⁻² s⁻¹ in the posterior state (S1-cIII-ff in Figure 13). Large posterior CO2ff flux biases remain in S2 (with CO2bio fluxes present but no CO2ff measurements), especially in cases where the observation errors are 0.5 ppm and 1 ppm (S2-cII-ff and S2-cIII-ff in Figure 13). The use of CO2ff measurements (S3) also improves the correction of systematic errors compared to S2. However, the S3 case with the smallest observation error (S3-cI-ff in Figure 13) is only equivalent to the S1 case with the largest observation error (S1-cIII-ff in Figure 13), which indicates that the influence of CO2bio fluxes is significant for the correction of biases in whole-city CO2ff emissions estimates.

Conclusions and Discussion
Based on a series of observing system simulation experiments, we demonstrated that high-accuracy and high-precision measurements are necessary to achieve high levels of accuracy and precision in urban CO2 flux estimates. Within the bounds of the Indianapolis environment and our assumed prior error structures, random observation errors of 1 ppm or less can reduce systematic flux errors to less than 1 µmol m⁻² s⁻¹ and remove more than 30% of prior random flux errors in the center of the city. A systematic observation error of 1 ppm increased the posterior flux bias over the prior state. In addition, the presence of uncertain biogenic CO2 fluxes significantly weakens our ability to invert for anthropogenic CO2 emissions, but assimilating continuous high-precision (less than 1 ppm hourly random measurement errors) fossil fuel CO2 measurements partially compensates for the degraded performance caused by biogenic CO2 fluxes. Moreover, increasing the number of measurement sites from 5 towers to 12 towers enhances the magnitude of error reduction from ~20% to ~40% in the center of the city, and expands the error reduction area. Systematic and random flux errors can be further reduced by reducing model-data mismatch errors caused by atmospheric transport uncertainty. Finally, the precision of the inverse flux estimate is highly sensitive to the correlation length scale in the prior emission errors.

Figure 12: Spatially averaged gain and error reduction for CO2ff (blue) and CO2bio (green) flux components in three scenarios with different observation errors. The S1-cI-ff symbol means the CO2ff flux component corresponding to case I in scenario 1. All scenario notations are described in Table 3. DOI: https://doi.org/10.1525/elementa.138.f12
It is important to note that real data inversions are subject to more complexity than synthetic data experiments (Gourdji et al., 2010), but pseudo-data experiments provide a baseline to compare the constraint on fluxes achieved by various inversion setup choices and to illuminate the best achievable performance of real-data inversions. That is, if an approach for quantifying urban emissions fails in a synthetic data experiment, it is unlikely to succeed given the added complication of a real measurement deployment. In order to relate the results of this synthetic data study to a real urban measurement network and inversion system, we discuss three important issues: the atmospheric measurement network, prior flux error structures and atmospheric transport errors.
This study shows that sensor quality (i.e. accuracy and precision) must be relatively high to ensure accurate and precise urban inverse flux estimates. Some recent studies found that sensors with lower measurement quality, given sufficient numbers, serve as potentially useful tools for the detection of urban CO2 emissions (Wu et al., 2016; Turner et al., 2016; Shusterman et al., 2016; Martin et al., 2017). These studies, however, only considered random error in the sensors, not sensor bias. We demonstrate that even moderately biased sensors (e.g. 1 ppm) introduce systematic errors in the posterior flux estimates, which can degrade the posterior fluxes to a point that is worse than the prior flux estimates. Even with state-of-the-science instruments, minimization of sensor bias requires extensive inter-calibration, and calibration efforts require significant resources, which can counter the apparent benefit of sensors that might have a lower initial capital cost. Lower-cost sensors (Stephens et al., 2011) were considered for Indianapolis and ruled out because their greater calibration requirements were estimated to make them cost more to deploy and operate over time than more expensive but more stable instruments. We also note that Indianapolis is a medium-sized city where the daytime average, city-center CO2 enhancement is about 3 ppm. It is likely that the threshold for sensor quality is related to the magnitude of the urban CO2 enhancement.
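How a sensor bias can push the posterior further from the truth than the prior can be seen in a scalar sketch. It assumes that the gain is defined as one minus the ratio of the posterior to the prior absolute flux error, and it uses a single flux value with a standard Kalman update. All numbers (true flux, prior bias, error standard deviations and the sensitivity `h`) are hypothetical illustration values, not those of the Indianapolis system.

```python
def posterior_gain(obs_bias_ppm, prior_bias=3.0, sigma_b=2.0,
                   sigma_o=1.0, h=0.2):
    """Gain = 1 - |posterior error| / |prior error| for a scalar flux.

    Hypothetical setup: `h` converts flux (umol m-2 s-1) to mole
    fraction (ppm); the observation is noise-free apart from the bias.
    """
    truth = 10.0                          # hypothetical true flux
    prior = truth + prior_bias            # biased prior flux
    obs = h * truth + obs_bias_ppm        # biased observation (ppm)
    # Scalar Kalman gain K = B h / (h^2 B + R).
    k = sigma_b ** 2 * h / (h ** 2 * sigma_b ** 2 + sigma_o ** 2)
    posterior = prior + k * (obs - h * prior)
    return 1.0 - abs(posterior - truth) / abs(prior - truth)

for bias in (0.0, 0.5, 1.0):
    print(f"sensor bias {bias:.1f} ppm -> gain {posterior_gain(bias):+.3f}")
```

With these values the unbiased sensor gives a positive gain, while a 1 ppm bias of the same sign as the prior bias gives a negative gain. The sign and size of the effect depend on the assumed sensitivity, which is why the threshold for acceptable bias is expected to scale with the urban enhancement.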
Sensor development and new analytic methods, particularly for CO2ff measurements, are needed to capitalize on the inversion methods outlined in this study. Our study shows that additional CO2ff measurements can partially compensate for the interference from CO2bio fluxes and improve the inversion performance for CO2ff emissions. We assumed, however, that continuous CO2ff measurements with 1 ppm precision were available. This measurement capability has not yet been demonstrated, but continuous lower-precision measurements are available via a combination of periodic ¹⁴C and continuous CO measurements (Levin and Karstens, 2007). During the INFLUX experiment, ¹⁴C measurements directly related to CO2ff are collected weekly at 5 towers using a flask sampling system (Turnbull et al., 2012). Continuous measurements of CO could be expanded to all 12 towers. The accuracy and precision of continuous CO2ff inferred from this potential observing system (integrating ¹⁴C and CO measurements at 12 towers) have not yet been quantified, and are complicated by CO/CO2ff ratios that vary as a function of emission source, and by photochemical CO production. Additional research is needed in pursuit of continuous, accurate and precise CO2ff measurements.
Uncertainty in the spatial structure of prior emission errors greatly limits our ability to map CO2 emissions at high resolution with confidence. Prior flux errors are likely to be correlated as a function of emission sectors (e.g. traffic, utility, industry). For example, errors in fuel efficiency estimates are probably correlated along highways. A few studies have addressed this problem using hyper-parameter optimization (Desroziers et al., 2005), which provides a direct constraint on the prior emission error structures. For example, Wu et al. (2013) optimized the length scale of Gaussian error structures in a mesoscale inversion system. Similar techniques could be implemented to constrain the spatial structures of emission errors at the urban scale. Direct assessment could also be conducted via the input data and equations used to construct the prior flux estimates (Ogle et al., 2010).
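The role of the correlation length in spreading an observational constraint away from a tower can be sketched for a single observation. The example assumes an exponential error correlation exp(-d / SCL) between pixels a distance d apart, a tower that samples only its own pixel, and error reduction defined as one minus the ratio of posterior to prior error standard deviation. The correlation form, the parameter values and the function name are assumptions made for illustration and may differ from the error structures used in this study.

```python
import math

def error_reduction(distance_km, scl_km, sigma_b=2.0, sigma_o=1.0, h=0.5):
    """Fractional reduction of the flux error std. dev. at a pixel
    `distance_km` away from a single tower.

    Assumptions (all hypothetical, not from the paper):
      - prior errors have std. dev. sigma_b and an exponential
        correlation exp(-d / scl_km) between pixels d km apart;
      - the tower observes only its own pixel with sensitivity h
        (ppm per flux unit) and random error sigma_o (ppm).
    """
    rho = math.exp(-distance_km / scl_km)
    # Scalar form of A = B - B H^T (H B H^T + R)^-1 H B, one observation.
    signal = (h * sigma_b) ** 2
    posterior_var = sigma_b ** 2 * (
        1.0 - rho ** 2 * signal / (signal + sigma_o ** 2)
    )
    return 1.0 - math.sqrt(posterior_var / sigma_b ** 2)

for scl in (2.0, 5.0, 8.0):
    row = [round(error_reduction(d, scl), 3) for d in (0.0, 2.0, 4.0, 8.0)]
    print(f"SCL = {scl} km -> error reduction at 0, 2, 4, 8 km: {row}")
```

This single-pixel sketch reproduces only the widening of the error reduction area with larger SCL. It does not reproduce the larger magnitude near the towers, which requires tower footprints that cover several correlated pixels.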
Finally, this study demonstrates that reducing the random errors introduced by uncertainties in atmospheric transport is an effective approach to improving inversion performance. Multiple elements in the atmospheric transport model (e.g. parameterization schemes, boundary and initial conditions, and spatial resolution) complicate the assessment of transport errors (Isaac et al., 2014). Evaluation and minimization of transport errors can be achieved by improving model parameterizations (Sarmiento et al., 2017) and by assimilating site-specific meteorological observations. We note also that our study makes the simplifying assumption of uncorrelated transport errors, whereas, in reality, transport errors are likely to be correlated, especially at the spatiotemporal scales characteristic of an urban study. This assumption yields the maximum error reduction for a given observational network and assumed error structures. Thus, the current study represents a best-case scenario for the level of error reduction that could be achieved by improving atmospheric transport. The effect of correlated transport errors on inversion performance for urban CO2 emissions is an important topic for future studies.
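One simple way to see why uncorrelated transport errors are a best case is to look at the error of an average of observations. The sketch below assumes that all observation errors share the same standard deviation and a single uniform, non-negative pairwise correlation `rho`. This equicorrelated model and the function names are illustrative assumptions and are not the error model of this study.

```python
import math

def mean_error_std(n_obs, sigma_ppm, rho):
    """Std. dev. of the average of n_obs errors that share a std. dev.
    sigma_ppm and a uniform pairwise correlation rho (0 <= rho <= 1)."""
    variance = sigma_ppm ** 2 * (1.0 + (n_obs - 1) * rho) / n_obs
    return math.sqrt(variance)

def effective_obs(n_obs, rho):
    """Number of independent observations carrying the same information
    about the mean as n_obs equicorrelated ones."""
    return n_obs / (1.0 + (n_obs - 1) * rho)

for rho in (0.0, 0.2, 0.5):
    print(f"rho = {rho}: error of 12-tower mean = "
          f"{mean_error_std(12, 1.0, rho):.2f} ppm, "
          f"effective observations = {effective_obs(12, rho):.1f}")
```

Under this model, any positive correlation lowers the effective number of observations below the number of towers, so the constraint on a spatially averaged flux weakens relative to the uncorrelated case.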

Data Accessibility Statement
The Hestia inventory is available on the website (http://hestia.project.asu.edu/), and other data from this study can be made available upon request.

Supplemental files
The supplemental files for this article can be found as follows:
• Text S1. In this section, we present the impact of varying both the Root Mean Square Error (RMSE) and the Spatial Correlation Length (SCL) in the prior flux error structures on the Degree of Freedom in the Signal (DFS). We express the DFS = Trace(KH) with K the Kalman Gain and H the influence function (Rodgers, 2000). Figure S1 illustrates the relationship between the DFS and the SCL from 1 km to 100 km. Three modes are observed depending on the values of SCL. For low values of SCL (less than 2 km), the DFS becomes very small instead of converging to the maximum DFS (assuming no spatial coherence in the inverse emissions). This problem has been illustrated by Bocquet (2005) as a singularity of the continuum limit. The Gram matrix G = HBHᵀ (i.e. the Hessian of the dual problem) is not well defined and requires regularization at high resolutions. This regularization is performed here by the introduction of correlations in the B matrix. For intermediate values (2 km < SCL < 8 km), the tower footprints overlap over the city, which limits the impact of the SCL as the state space is already coherent, at least around the tower locations. The values of the DFS are nearly constant over the 2-8 km range of SCL values. For the last segment of the plot, when SCL is larger than 10 km, the DFS decreases steadily as the impact of the prior error correlation artificially extends the optimization to the entire state space, beyond the city limit. DOI: https://doi.org/10.1525/elementa.138.s1
• Text S2. We present the averaging kernel sensitivity for four cases in scenario 1 corresponding to Figure 5 in the main text (Figure S2). Compared to the error reduction maps, the averaging kernel sensitivity maps show marginal differences across the different cases for low error variance (Figure S2A) and intermediate error variance while the SCL is varying from 2 to 8 km (Figure S2C and S2D). Despite the changes in SCL values, the DFS remains similar. Considering higher values of error variance, DFS shows more sensitivity when the RMSE is 4 µmol m⁻² s⁻¹ (Figure S2B), but no dependence for smaller values (i.e. 1 or 2 µmol m⁻² s⁻¹). The overall spatial extent of the different maps remains similar, with small variations compared to the error reduction maps presented in Figure 5. We conclude here that DFS values, as a product of prior error assumptions and observational constraints, are inadequate to estimate the impact of prior emissions errors in our inversion system. DOI: https://doi.org/10.1525/elementa.138.s1
• Figure S1. The change of Degree of Freedom in the Signal (DFS) corresponding to different Spatial Correlation Lengths (SCLs) in prior flux error structures. DOI: https://doi.org/10.1525/elementa.138.s1
• Figure S2. The diagonal elements of the averaging kernel matrix (i.e. S = KH) for four cases in scenario 1 corresponding to Figure 5 in the main text. DOI: https://doi.org/10.1525/elementa.138.s1
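For reference, the quantities named in Text S1 and Figure S2 can be collected in one place. The definitions DFS = Trace(KH), S = KH and G = HBHᵀ are those given above. The explicit expression for K is the standard Kalman gain, and the symbol R for the observation error covariance matrix is an assumption here, since neither is written out in the supplemental text.

```latex
\begin{align}
  \mathbf{G} &= \mathbf{H}\mathbf{B}\mathbf{H}^{T} \\
  \mathbf{K} &= \mathbf{B}\mathbf{H}^{T}\left(\mathbf{G} + \mathbf{R}\right)^{-1} \\
  \mathbf{S} &= \mathbf{K}\mathbf{H} \\
  \mathrm{DFS} &= \mathrm{Trace}(\mathbf{S}) = \sum_{i} S_{ii}
\end{align}
```

Written this way, the maps in Figure S2 show the individual diagonal terms S_ii, and the DFS in Figure S1 is their sum over the state space, which cannot exceed the number of observations.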