Bayesian inverse estimation of urban CO2 emissions: Results from a synthetic data simulation over Salt Lake City, UT

Top-down, data-driven models possess ample power to improve the accuracy of bottom-up carbon dioxide (CO2) emission inventories, and more work is needed to explore the merger of top-down and bottom-up estimates to better inform the metrics used to monitor global CO2 fluxes. Here we present a Bayesian inverse modeling framework over Salt Lake City, Utah, which utilizes available CO2 emission inventories to establish a synthetic data simulation aimed at exploring model uncertainties. Prescribing a high-resolution, urban-scale data product (Hestia) as the “true” emissions in the model, we combine prior emissions with an atmospheric transport model to derive modeled afternoon CO2 enhancements at six monitoring sites within the Salt Lake Valley during the month of September 2015. A global high-resolution gridded emissions data product (ODIAC) is used as the prior, and objective uncertainty structures are defined for both the a priori estimates and the transport model-data relationship which consider non-negligible spatial and temporal covariances. Optimized (posterior) emissions over the Salt Lake Valley agree closely with the assumed “true” emissions during afternoon times, while results including unconstrained times (e.g. night-time) lack such agreement. Both spatial and temporal correlations of prior errors were found to be necessary for obtaining a robust posterior estimate. Model sensitivity analyses are performed, which examine correlation length and time scales, model-data mismatch error, and measurement site network variability. Through these analyses, one measurement site is identified as being particularly prone to introducing bias into posterior emissions due to influences from a nearby point source. Increasing model-data mismatch error at this site is shown to reduce bias in the posterior without significantly compromising agreement with monthly averaged true emissions.

This is especially significant in the wake of the U.S. decision to withdraw from the Paris Climate Accord, in which U.S. sub-national entities such as cities and states are bearing extra responsibility in upholding emission reduction strategies (Tabuchi and Fountain, 2017;US Conf. Mayors, 2017).
Improved understanding of the urban carbon cycle is needed to independently evaluate emissions over time and to provide detailed guidance on potential urban emission mitigation opportunities. Mitigation and management techniques that focus on specific sectors or subcity domains can help uncover opportunities for energy efficiency and emission reduction actions (Salon et al., 2010;Dhakal and Shrestha, 2010). "Bottom-up" emission estimates seek to achieve these goals via a collection of approaches such as direct flux monitoring, energy consumption statistics and activity modeling; however, only a limited number of data products exist for urban domains which comprehensively distinguish activity sectors at very fine resolutions Newman et al., 2016;Patarasuk et al., 2016;Gately and Hutyra, 2017;Bun et al., 2018). When bottom-up inventories are not available, researchers often need to rely on data products that downscale global/national estimates using spatial proxies (e.g. CDIAC, EDGAR, ODIAC). These global products are widely available, however, oftentimes the products are less resolved representations of local emissions behavior, which can misallocate emissions in time and space due to the spatial proxies used Gately and Hutyra, 2017).
Uncertainties unique to both bottom-up and global downscaling methods can be addressed and constrained with "top-down" (atmospheric) estimates, which infer emission patterns utilizing real-time measurements of concentrations and atmospheric transport models. Yet, top-down methods are prone to uncertainties that exist within atmospheric transport models (Peylin et al., 2011;Díaz Isaac et al., 2014), along with measurement representation errors and inconsistent data availability (Gerbig et al., 2009;Lauvaux et al., 2012;Turner et al., 2016). To merge bottom-up and top-down methods, Bayesian inverse frameworks can provide a quantitative and rigorous approach for flux estimation by pairing atmospheric observational constraints with spatial and temporal structures describing prior estimates and uncertainties in both fluxes and model-data relationships (Enting et al., 2002).
Flux inversion methodologies have long been used by the atmospheric science community (Tarantola, 1987;Enting andMansbridge, 1989, 1991), aiding in flux estimation for trace gases using a combination of data and transport models. Recent studies have applied these frameworks to examine GHG behavior over cities, providing top-down optimized emissions estimates of CO 2 and methane (Sargent et al., 2018;Miles et al., 2017;McKain et al., 2012McKain et al., , 2015. In slight contrast to these frameworks, Bayesian-style approaches incorporate uncertainties of observations and fluxes as constraining inputs, and have been successfully applied over urban ground-based measurement networks to optimize existing bottom-up estimates of CO 2 fluxes over Cape Town (Nickless et al., 2018), Indianapolis (Lauvaux et al., 2016;Oda et al., 2017;Gurney et al., 2017;Turnbull et al., 2019), Paris (Bréon et al., 2015), and Davos (Lauvaux et al., 2013). Studies have also used aircraft campaign data as the atmospheric measurements for urban inverse analyses over Los Angeles (Gourdji et al., 2018;Cui et al., 2015;Brioude et al., 2013) and Houston (Brioude et al., 2011(Brioude et al., , 2012, while satellite and ground-based data were used to analyze carbon fluxes over California (Fisher et al., 2017, Hedelius et al., 2018. Given that the accuracy of inversion estimates is often not adequate to verify reported GHG emissions (National Research Council (NRC), 2010), recent studies have emphasized the exploration of underlying uncertainty covariance structures, which drive Bayesian inverse estimates. Within these studies, specific attention has been paid to the correlation of prior emissions uncertainties in either the spatial (Wu et al., 2018;Lauvaux et al., 2016;Oda et al., 2017) or temporal dimension (Nickless et al., 2018;Breon et al., 2015). However, the combination of space and time has yet to be formally examined within an urban anthropogenic CO 2 inversion context. Similarly, correlation of errors within the observation-model transport relationship (i.e. "model-data mismatch") has seen limited experimentation using non-negligible off-diagonal correlation terms (Lauvaux et al., 2016;Oda et al., 2017). In this study, we aim to further explore these structures by incorporating variations in both prior error and modeldata mismatch covariance matrices within an atmospheric inversion over Salt Lake City, Utah.
Salt Lake City lies at the northern end of the Salt Lake Valley, geographically positioned between the Wasatch and Oquirrh mountain ranges in northern Utah. Encompassing Salt Lake City and its surrounding metropolitan area, the Salt Lake Valley (SLV) region is a rapidly developing urban area with a population of over 1 million people. Within the SLV, a number of stationary atmospheric monitoring sites and mobile platforms constitute an extensive, long-standing network of CO 2 measurements Mitchell et al., 2018aMitchell et al., , 2018b. To date, the sole inversion study using this network was by McKain et al. (2012), which matched modeled anthropogenic CO 2 enhancements with observed signals at 5 sites around the valley. However, no follow-up studies have taken advantage of the SLV CO 2 network for inverse analysis of carbon emissions. Because the Salt Lake Valley is one of a handful of urban regions (e.g. Indianapolis and Los Angeles) where extensive measurement networks have aligned with research interests to create a high-resolution emissions data product Gurney et al., 2012Gurney et al., , 2017Newman et al., 2016), a SLV Bayesian inversion analysis can contribute towards a growing pool of urban studies that seek to better understand the urban carbon cycle.
As a preliminary assessment of the SLV inversion framework, we implement a "synthetic data" experiment in which known urban emissions are used to generate synthetic CO 2 observations, which are then used in an atmospheric inversion to solve for emissions. Because the "true" emissions are known, synthetic data experiments allow researchers to evaluate the impact of different data/flux parameters on possible deviations from the known answer. These studies, often referred to as observing-system simulation experiments (OSSEs), have been used by numerous researchers to explore uncertainties in various inversion frameworks (Wu et al., 2018;Turner et al., 2016;Kort et al., 2013;Wu et al., 2011;Gerbig et al., 2006). Within the OSSE framework, no boundary in-flow or biogenic enhancements are considered, and transport is consistent within the model and synthetic data; thus, biases from these model components can be ignored, allowing us to better examine the underlying uncertainty structures in the inversions and their effects on model performance. In this OSSE study, we define a baseline configuration for model parameters and incorporate spatially-and temporally-resolved prior uncertainty, along with spatial and temporal correlations of prior errors and component-level derivation of model-data mismatch error structures. Thereafter, we perform sensitivity tests to assess this baseline setup and examine covariance within prior uncertainty and model data mismatch error, as well as the relative influence of network measurement sites used in our study.

Methods
Our domain of interest follows the spatial bounds of Salt Lake County, which occupies the Salt Lake Valley and parts of surrounding mountains to the East and West (Figure 1). Salt Lake County encompasses six measurement sites within the SLV CO 2 observational network which are used in this study. Sites are shown in the domain map in Figure 1 and are labelled with their respective site codes -UoU (University of Utah), DBK (Daybreak), RPK (Rose Park), SUG (Sugarhouse), MUR (Murray), and SUN (Suncrest) Mitchell et al., 2018a). As the focus of our inverse method, we solve for CO 2 emissions at a fine-scale spatial resolution of 0.01° × 0.01°, resulting in a total of 2386 grid cells within our Salt Lake County domain. We adopt a temporal resolution of 6-hourly time-steps corresponding to 0-6; 6-12; 12-18; and 18-24 UTC (local time is Mountain Daylight Time (MDT), equivalent to UTC-6 hours). Emissions are optimized starting from September 1, 18:00 UTC through September 30, 6:00 UTC, with a total of 114 time-steps which span over a roughly 4-week window.

Emission Inventories
Two independent CO 2 emission inventories are used as state vectors in the inversion (Section 2.3), one acting as our prior estimate (primarily a global downscaling method), and the other as the set of "true" emissions (a bottom-up approach) which we aim to recover through optimization of the prior.
We use the 2016 version of the Open-source Data Inventory for Anthropogenic CO 2 (ODIAC2016, Oda and Maksyutov, 2015) as the a priori input estimate for our inversion. ODIAC is a global high-resolution (1 km) monthly fossil fuel CO 2 emission data product which is designed and developed primarily for global and regional tracer transport and inversion applications, but also has been used for several urban studies (e.g. Lauvaux et al., 2016;Oda et al., 2017). ODIAC is based on downscaling of bottom-up CO 2 emissions estimates using power plant emissions and geolocation information taken from the Carbon Monitoring and Action (CARMA) database (www. carma.org), as well as satellite-observed nightlight data (Oda and Maksyutov, 2011;Oda et al., 2018). The native temporal and spatial resolutions of ODIAC are monthly and Arc 30 second respectively. Here, we use a temporaldownscaling and re-gridding method described by Nassar et al. (2013) to scale the emissions to a 6-hourly temporal resolution and 0.01° grid resolution. CARMA point source geolocation used in ODIAC places various point sources at locations which are misaligned with their counterparts in Hestia after re-gridding ODIAC to 0.01°. These differences are described in further detail in Gurney et al., Salt Lake County represented by the unshaded region on the right. Yellow circles represent CO 2 monitoring locations across Salt Lake County used in this study. DOI: https://doi.org/10.1525/elementa.375.f1 (2018). Because inversion frameworks generally cannot correct these gridding discrepancies on their own, misaligned point sources are manually re-located within our prior state vector to match locations in Hestia, and the relocated grid cells are replaced with a mean inventory value solely for the purpose of this study. A map of afternoonaveraged (18-24 UTC) ODIAC emissions with re-aligned power plants is shown in Figure 2a for the month of September 2015. The Hestia-SLC v2018.01.24 fossil fuel CO 2 emissions data product was defined as the "true" emissions within our synthetic data framework, due to its street-level granularity. The Hestia project consists of fine-scale mappings of CO 2 emissions estimates using a bottomup approach for select cities across the U.S. (including Salt Lake City). Hestia-SLC is comprised of eight different source sectors in Salt Lake County -Commercial (buildings and point), Industrial, On-road, Non-road, Railroad, Airport, Residential, and Electricity-production Gurney et al., 2012). Native grid spacing in Hestia is comprised of point, line and polygon sources, and are gridded for atmospheric modeling to 0.002° (Figure 2b). For the purposes of this study, emissions were re-gridded to 0.01° resolution, and the native hourly temporal resolution is used. Afternoon averaged Hestia emissions can be seen in Figure 2b over the same time frame as Figure 2a.

Model Transport
This study uses the Weather Research and Forecasting (WRF) model coupled with the Stochastic Time-Inverted Linear Transport (STILT) model to simulate atmospheric transport for this inversion (Lin et al., 2003;Nehrkorn et al., 2010). The domain setup and parameterization configuration used here is similar to that of Mallia et al., (2015), with the exception of the simulation time, which is set for September of 2015. WRF winds are used to drive an ensemble of 1000 STILT backward trajectories, which are released and traced backwards in time for 24 hours from each of the six site locations for all hours between September 3 rd and 30 th for a total of 4032 simulations. STILT has been widely used to interpret CO 2 and other trace gases, including studies focusing on inverse modeling methods (Lin et al., 2003;Lin and Gerbig, 2005;Mallia et al., 2015;Lin et al., 2017;Mallia et al., 2018;Fasoli et al., 2018). Turbulent dispersion within STILT is parameterized as stochastic motions within the backward trajectories. These trajectories are then used to map the surface influence or "footprint" for each measurement time and location, quantifying the sensitivity of the observation to upwind source regions using units of concentration per unit flux (Lin et al., 2003).
Model transport acts as the linkage between gridded emissions and individual observations; therefore, footprint receptors are selected to match afternoon hourly-averaged observations from 12-17 local time (18-23 UTC), when the planetary boundary layer (PBL) is well-mixed within the valley at each of the six measurement sites used. Within "real-data" contexts, analysis is generally limited by large uncertainties in model transport outside of the afternoon, particularly from erroneous PBL height estimation. Here we adopt "real-data" conventions by excluding non-afternoon times from our observations and footprints. To encompass the first 24-hour period of emissions in the state vector, receptor times begin 24 hours after the beginning of the emissions vector. A total of 1008 footprints/observations are initially composed of 6 sites each, with 6 afternoon hours over 28 days (September 2, 18:00 UTC through September 30, 23:00 UTC). Data and footprints are further aggregated to represent daily afternoon averages in order to minimize the influence of model biases in PBL height on the variability of data between hours, following "real-data" conventions used in other urban studies (e.g. McKain et al., 2015;Miles et al., 2017). This method condenses data down to a total of n = 168 observations, using footprints averaged over all 6 hours. Figure 3 shows monthly average footprint mapping for each of the measurement sites over the month of September 2015.

Bayesian Inverse Methods
Posterior emission estimates are optimized through the minimization of the cost function described by (Tarantola, 1987) and (Enting, 2002): Where z (n × 1) is a vector of observed enhancements, H (n × m) is a Jacobian matrix of footprint values which relate the measurements to the state vector of unknowns (s (m × 1)), R (n × n) is a square and symmetric matrix describing the covariance of model-data mismatch errors (also referred to as observational errors), s p is the state vector of prior emissions, and Q (m × m) is a square and symmetric matrix describing the covariance of deviations between the true field s and prior field s p . In the context of matrix dimensions, n is equal to the number of constraining observations, and m is equal to the number of The posterior best estimate of emissions, ŝ, is given as the solution to the above cost-minimization function: In addition, we construct the posterior uncertainty covariance matrix V ŝ , given as: The solution to the cost function and resulting posterior uncertainty given above are solved following the computational methods described by Yadav & Michalak (2013), which express Q as a Kronecker product (explained in section 2.5) to solve for posterior fluxes at a significant cost reduction when the size of the state vectors (e.g. ŝ, s p , s truth ) become large. A minor variation of these methods is introduced when treating prior uncertainty as a spatiotemporally heterogeneous vector, for which we utilize methods described by NOAA's CarbonTracker-Lagrange software (www.esrl.noaa.gov/gmd/ccgg/carbontrackerlagrange/). Further details on these computational methods are described in Text S-1 of the Supplemental Material.

Synthetic data framework
A key feature of this study is the use of synthetic data to drive corrections to the a priori estimate. The goal of this framework is to recover true emissions (Hestia) starting from the prior inventory (ODIAC). In order to achieve this, synthetic data are created which align with enhancements from the set of "true" emissions at all receptor locations and times, using footprints from the transport model to convert emissions to observed signals. Daily afternoon synthetic observations from 12-17 MDT are expressed as a single afternoon-aggregated observation for each site, where modeled signals from s truth are obtained via convolution with daily afternoon-averaged footprints (described in the H matrix). Random noise is then added to the synthetic data, resembling possible "true" perturbations equal to the observational error described in the modeldata mismatch (R) matrix in section 2.5. The equation for synthetic data generation is given as: where errors (ε) follow a Gaussian distribution with a mean of 0 and standard deviation equal to the standard error within the diagonal R matrix. Because synthetic data are generated with an additional random component (ε), posterior estimates of emissions can vary due to random perturbations. To analyze inversion results, we obtain expected posterior values by running a Monte-Carlo-style ensemble of 10,000 inversion iterations and taking the mean posterior best estimate. Details of this averaging method are given in Text S-2 of the Supplemental Material.

Error Covariance Parameters
As mentioned earlier, the prior error covariance (Q) matrix describes both the variance in prior emissions uncertainty and the spatial and temporal correlation of these uncertainties. Here we derive Q from three distinct components: spatial covariance (E), temporal covariance (D), and prior error variance (σ 2 ), where the spatial and temporal error covariance matrices are combined via a Kronecker product, described as in Carbon-Tracker-Lagrange documentation (www.esrl.noaa.gov/ gmd/ccgg/carbontracker-lagrange/): Under this method, I σ is a diagonal matrix whose elements describe the uncertainty of prior emissions, defined here as the magnitude of difference between the prior and true emission inventories, with a minimum value of 1 μmol m -2 s -1 assigned within the vector. The square of this uncertainty term (I σ ) in equation 5 comprises the prior error variance (σ 2 ) component of Q. By this definition of Q, distinct uncertainty values are given to individual grid-cells for each time step in order to describe the heterogeneous nature of deviations between the prior and true emission inventories. Accurate representation of grid-level uncertainties is especially important in the realdata context, as these parameters are largely responsible for the spatially-explicit nature of flux inversion corrections. However, despite the challenging nature of assigning grid-level uncertainties in both global downscaling and bottom-up methods Hogue et al., 2016;Hutchins et al., 2017;Gurney et al., 2018), within the context of our synthetic-data analysis we can assume Hestia gridded values to be "true" and therefore free from errors. Using this approach, we assume a prior uncertainty state vector for which each grid cell describes the deviation between its prior and "true" values. Thus, uncertainty is here equivalent to the 1σ range, reflecting the absolute value of gridded differences between ODIAC and Hestia. For the remaining components of Q, spatial and temporal covariance matrices are defined using exponential decay equations as shown below: These matrices are computed using separation distances (X s ) and lag-times (X t ) between cells and timesteps respectively, divided by their corresponding correlation range parameters l s and l t . Within the temporal correlation matrix, temporal correlation is assumed in fluxes across days only for time periods at equivalent hours of the day (non-matching times of day are considered uncorrelated and given a value of zero on the off-diagonal across the flux time domain). The correlation range parameters defined here describe the distance and time at which errors in the prior emissions are considered uncorrelated. In order to determine the extent of spatial and temporal variability within prior errors, we employ the use of two objective methods for defining these range parameters. To find the spatial length-scale l s , we fit a variogram using a mapping of the average afternoon difference in the ODIAC and Hestia emission inventories over Salt Lake County, determining the spatial length-scale to be approximately 6 km. To find the temporal length-scale l t , we analyzed the autocorrelation function of daily afternoon-averaged inventory differences over the month of September 2015, finding the temporal length-scale to be equal to ~2 days.
These covariance parameters, while subtle, control the extent to which corrections to prior emissions propagate or spread to neighboring grid cells and time steps. Despite particular attention given to these terms, known uncertainties exist due to downscaling methods used in the prior inventory (e.g., mis-specified point source geolocation in ODIAC). Because of this, we assess the validity of each correlation parameter value by exploring the effects of length scale variability on posterior emission results in section 3.2. Figures and further explanation of length scale determination can be found in Text S-3 of the Supplemental Material.
The model-data mismatch matrix (R) describes errors relating the transport model to the observations, and can be expressed as the sum of uncertainties given as: Individual components of the model-data errors are summarized in Table 1, and are described as follows: R part is the error which stems from the release of a finite number of particles within trajectory ensembles. This value was determined following methods used in Mallia et al., (2015) but applied to the given emissions domain and found to be small (<0.1 ppm) given that a sufficiently large number of particles (1000) are released within STILT trajectory ensembles. R aggr is the error introduced in the model from aggregating spatially and temporally heterogeneous fluxes into single homogeneous cells and timesteps. Here we define this as the root mean square error (RMSE) of the difference between our chosen resolution (0.01 deg/6-hourly) and available finer resolutions (0.002 deg/1-hourly).
Transport model error in WRF-STILT is broken down into components describing horizontal wind error (R transWIND ), vertical mixing-layer height error (R transPBL ), and unresolved eddy turbulence error (R eddy ). R transWIND is estimated by comparing sets of WRF-STILT runs that included wind error, which are assumed to follow a Gaussian distribution and are thus unbiased. These wind errors are then incorporated as additional stochastic motions (Lin and Gerbig, 2005). The difference in variance between WRF-STILT simulated CO 2 with and without wind errors are then used to estimate the impacts of transport errors.
R transPBL is ideally constrained by comparing radiosonde measurements to WRF-modeled PBL heights; however, due to the lack of available radiosonde measurements in the afternoon over Salt Lake City, an approximated error of 7% mean enhancement was adopted from Gerbig et al., (2008) to represent errors using a high-resolution model for afternoon-only time periods. Finally, R eddy is accounted for within the above transport and aggregation errors and is neglected here.
Remaining errors in the equation (R instr , R bkgd , and R bio ) do not pertain to our synthetic data approach, as physical instruments, boundary in-flow, and biogenic fluxes can be disregarded here, and are thus neglected in the baseline case.
Model-data mismatch errors in inversion contexts are often considered statistically independent, which is problematic in that these errors are often temporally correlated with other errors within a time window. Here, we attempt to characterize these correlations by assigning error correlations to the off-diagonal elements in R aggr , R transPBL , and R transWIND for hourly observations within each afternoon (but not across days or sites). R aggr and R transPBL are considered to be fully correlated within each afternoon (as these error components are not expected to be random within a given afternoon), while R transWIND is given decaying correlation with increasing time between observations, assuming a correlation timescale of 2.8 hours as in Mallia et al. (2017). In this study we assume error to be constant across sites, and do not account for spatial correlation of measurements between sites; however, it should be noted that in a real-data application, errors are likely to be correlated between towers, especially if separation distances between sites are small (as they are in this study area). Model-data mismatch errors are also likely to vary between sites based on a variety of factors such as instrument reliability, inlet height, local sources, and surrounding topography. After hourly error correlations are applied, modeldata mismatch errors are aggregated to the daily scale for each measurement site. Details covering this aggregation method are described further in Text S-3 of the Supplemental Material. The final standard error used here for observations across sites is 1.48 ppm, expressed on the diagonal of the R matrix as a variance of (1.48 ppm) 2 .

Sensitivity Analyses and Method Validation
In addition to our baseline case, we compare the results to a variety of sensitivity tests which evaluate parameters' effects on model performance. In this analysis, we test the influence of spatial and temporal correlation range parameters (l s and l t ), model-data mismatch variance, and the site array. Values and configurations for each model parameter are varied to encompass a wide range of model scenarios, and a summary of the tested parameters is given in Table 2. Results for these sensitivity analyses are discussed in section 3.
In order to evaluate the efficacy of the inversion model under certain parameters, we assess multiple measures of performance. Many possible measures exist which serve to validate atmospheric inverse models (see Michalak et al., 2017), and within this study we focus on a number of statistical validation methods that seek to verify the consistency of our assumptions within estimates of prior and posterior emissions and their respective uncertainties.
Our primary validation method compares posterior emission estimates with the "true" emissions on an aggregated space-and time-domain scale, averaged over a sufficiently large number of synthetic data iterations. To obtain these values, grid-scale emissions are first averaged in time, and then weighted by grid cell area to average in space. Similar methods are used to determine error reduction by calculating prior and posterior uncertainty aggregated in space and time, with prior and posterior covariances included within this aggregation technique. Through this, we obtain a measure of the amount of uncertainty that is reduced as a result of our constraints on emissions. Further description of these aggregation methods can be found in Text S-3 the Supplemental Material.
Modeled posterior observations, which are computed by multiplying footprints with posterior emissions, can be compared to synthetic data to evaluate model efficacy.
Here we compute standard error (RMSE) and coefficient of determination (r 2 ) between posterior and observed enhancements at each measurement site.
To assess the validity of posterior emissions given our prescribed error terms, we verify our results by calculating the reduced chi-squared value from our model residuals. Squared data and emissions residuals from our inversion are normalized by their respective variances in R and Q, and are expected to follow a chi-squared distribution with n degrees of freedom (among n + m residuals from data and emissions, respectively). A single reduced chi-squared value can be obtained by following the equation described by (Tarantola, 1987   where ν is the number of degrees of freedom, in this case equal to the number of observations in the inverse problem. A value of reduced chi-squared = 1 indicates alignment of residuals with their prescribed errors. Given that residuals are based on randomly-generated synthetic data, an ensemble of inversions must be generated in order to obtain an expected chi-squared value. Details of this calculation and computational limitations are discussed in Text S-1 of the Supplemental Material.

Results
We will first assess the agreement of posterior emissions, obtained using our baseline configuration, with Hestia, the reference set of true emissions. Further exploration of inverse model performance under varying conditions of spatial and temporal uncertainty, model-data mismatch, and observational network configuration are described in sections 3.2-3.4. Figure 2c shows the posterior emissions (afternoonaveraged) over the Salt Lake Valley. Spatial patterns of optimized emissions still largely resemble those of the prior, with some additional point sources and onroad activity recovered from the true emissions grid (Figure 2b). Posterior corrections (posterior minus prior emissions, Figure 2d) are mostly positive in the northern downtown region of the domain, and largely control corrections to the prior at the domain-averaged scale. Differences between posterior and true emission maps can be seen in Figure S4 in the Supplemental Material. Figure 4 shows the time series for temporally resolved emissions averaged over the total SLV spatial domain, with only afternoon emissions shown (corresponding to times of day which are constrained with synthetic data). Posterior emissions are generally closer to truth than prior; however, emissions are overestimated from September 8-16 and even correct in the wrong direction, away from truth, from September 19-22 and 27-29.

Model performance under baseline configuration
Due to inherent loss of information from atmospheric mixing, along with properties of both the prior baseline emissions values and uncertainty structures prescribed within the Q and R matrices, we do not expect the resulting corrections to exactly reproduce the truth at precise time and space resolutions. We therefore analyze the optimized results at the domain-aggregated monthly average scale, inferring a single value of net emissions over the Salt Lake County domain for September 2015. Using Monte Carlo methods described in section 2.4 and text S-2 of the Supplemental Material, the expected value of posterior emissions aggregated over the afternoon domain is 4.63 ± 0.03 μmol m -2 s -1 , which compares favorably to the true emissions (4.63 μmol m -2 s -1 ) relative to the prior emissions (3.97 μmol m -2 s -1 ). The standard error of the average estimate, 0.03 μmol m -2 s -1 , is less than 1 percent of true emissions. The corresponding afternoon domainaveraged posterior uncertainty reduction for the baseline case is 39.32%. Maps of prior uncertainty and uncertainty reduction (both in afternoon) are shown in Figure 5. It should be noted that within this synthetic data context, true emissions are known; thus, uncertainty reduction should be used solely as a comparison metric between scenarios of this inversion and is a statistical measure of the information content gained from the constraining inputs and uncertainties. Despite strong agreement in the afternoon at the spatially and temporally averaged scale, constraining influence from afternoon footprints diminishes when moving backwards in time throughout the day (Figure 3b). Consequently, morning hours which are intentionally unconstrained (i.e. 6-12 local time) retain some residual influence from afternoon footprints and are corrected to some degree. However, these corrections are based solely on the prior-true difference in the afternoon, giving overestimated emissions for the time period (ŝ = 3.63, s p = 3.54, and s truth = 3.52 μmol m -2 s -1 ). Further back from the afternoon (i.e. 0-6 and 18-24 local time bins), no more than 10% of corrective influence remains within the domain, and posterior emissions revert to the prior. As a result, emissions averaged over all hours of day (ŝ = 3.64, s p = 3.45 and s truth = 3.39 μmol m -2 s -1 ) also reflect differences in prior and true emissions from the constrained afternoon timeframe and are not as accurate a depiction of inversion performance as afternoon emissions. Prior, observed and posterior enhancements at each site are shown in Figure 6. Similar to afternoon emissions, the prior enhancements (blue) often underestimate the observed "true" signal (black). Posterior corrections to the prior enhancements (red) align closely with the observed enhancements in general, and misalignments between observed and posterior signals are generally limited to instances at the RPK site when prior signals switch from underestimation to overestimation, and vice versa (i.e. when blue and black lines cross). This is likely due to the lag effect of the emissions corrections due to temporal correlation of posterior corrections.
The expected value of reduced chi-squared value of fit for the baseline case is equal to 0.92. While this is slightly lower than the ideal value of 1.0, this represents relatively good fit of residuals to error matrices. As noted by Nickless et al. (2018), a reduced chi-squared value < 1 indicates that prior flux covariance and model-data mismatch errors may be overestimated. This could potentially be explained by the assignment of small modifications to the prior uncertainty structure (e.g. assigning a floor value of 1 μmol m -2 s -1 ).

Effects of spatial and temporal correlation on additive corrections
Results of sensitivity analyses on temporal and spatial correlation parameters are shown in Figures 7 and 8. Increasing correlation length, l s , as shown in Figure 7, results in additional spread in the spatial extent of uncertainty reduction. With small length-scales (e.g. l s = 0 km), isolated areas of correction (and therefore high uncertainty reduction) are present immediately surrounding the observational sites, despite lower domain-scale uncertainty reduction at these length-scales (Figure 8a). In contrast, larger length-scales show widespread reduction without such concentrated spikes around observational sites (Figure 7). Overall reduction in uncertainty increases when applying larger length scales, reflecting increased spread of corrective power to distant neighboring grid cells. The same effect is seen in the correlation time scale, where increasing time scale values show temporal spread-ing of corrections to neighboring days, strengthening the corrective trend (Figure 8b). As can be inferred here, misspecification of spatial and temporal correlation scales may result in biased posterior emissions, where optimized emissions are overly influenced by strong trends in the difference between observations and prior-modeled enhancements. An additional inversion run was generated using a setup that neglects both spatial and temporal correlation  in prior errors (l s = 0 km, l t = 0 days). This configuration yields afternoon-averaged posterior emissions equal to 4.16 μmol m -2 s -1 , which underestimates the true afternoon grid-averaged emissions by nearly 0.5 μmol m -2 s -1 . It is thus apparent that given the prior (ODIAC) and true (Hestia) emissions used here, neglecting either spatial or temporal correlations in prior errors results in significant underestimation of optimized emissions.

Model-data mismatch error influence on inversion performance
Mis-specified model-data mismatch errors also have potential to introduce bias in posterior emissions estimates. Figure 9 shows comparisons of posterior emissions and uncertainty reduction to model-data mismatch error on the diagonal of the R matrix. Results show the expected behavior that with increasing uncertainty in the model-data relationship, correction to the prior diminishes. While this analysis does not consider any sort of systematic bias in the observations themselves, it asserts that underestimations of the model-data mismatch error result in misleadingly high reduction in uncertainty and overly-confident corrections to the prior (which are not necessarily correct) given observed enhancements with normally-distributed errors. Following this, overestimation of model-data mismatch error results in a posterior estimate that remains closer to the prior, limiting the model's capability to provide full correction to match the true emissions. It should be noted that within this sensitivity analysis, model-data mismatch errors drive the magnitude of random perturbations (which follow a normal distribution) in the "true" signal, and thus analysis of these observational errors reflect increasingly varying synthetic observations. The calculated χ r 2 value of the baseline case (0.92) is less than 1, suggesting that prescribed uncertainties in R and Q may be slightly overestimated. However, as noted by Michalak et al. (2005), a χ r 2 = 1 metric is not in itself a comprehensive indicator of properly estimated covariance parameters. Additionally, "true" anthropogenic enhancements are no longer known in a real-data case (due to uncertainty in boundary conditions), leaving it up to the modeler to estimate any errors or systematic biases intrinsic to the model-data relationship. While not performed in this study, a potential additional method to quantify error covariance parameters in R, as well as Q, is via Restricted Maximum Likelihood estimation (Michalak et al., 2005). This method uses a top-down approach to optimize parameters in R and Q given a degree of prior knowledge about their underlying structures. Because the defining parameters and underlying structures of these covariance matrices are generally unknown and can have significant uncertainties, this method may act as a strong verification tool in future work.

Relative leverage of specific sites on emissions estimates
Within our synthetic data framework, we are able to assume 100% data availability at each observational site within our network. However, in the real world, routine maintenance of supporting instruments and hardware often prevents this from being the case. Thus, the final portion of this analysis focuses on the leverage that individual monitoring sites within our observational network have on the performance of the inversion. Using these results as guidance, future inversion studies over the SLV network may be better equipped to interpret the behavior of different sites' data, and to identify time periods with maximally-informative data availability. Table 3 describes results of various iterations of the inversion performed with specific sites removed from the observational network. Sites whose removal results in a large change in domain-averaged emissions (e.g., UoU, SUG) contribute significantly to the corrective magnitude of the inversion, whereas sites whose removal results in significantly less uncertainty reduction (e.g. DBK) have more independent and unique upwind influence regions. It should be noted that if each site contributed equally and independently to the total uncertainty reduction (~39%), each site's uncertainty reduction would be ~6 .5%. This is, however, not the case, indicating that a degree of overlap exists among sites' footprints. DBK comes closest at 6.17% and thus provides the largest degree of independent information of all sites, due to its isolated location and unique footprint over the western domain (Figure 3). Small changes to overall uncertainty reduction (<1%) indicate that a given site's contribution to the information content of the inversion is less unique relative to other sites' contributions. The UoU and SUG sites are examples of sites that exhibit low changes in uncertainty reduction; however, their combined removal results in a much larger loss of information to the inversion because their combined footprint covers a unique region within the geographical domain. Thus, neither site is individually imperative for the information network of the study, but their combined information content is needed for maximum performance of the inversion. While it is clear that all six measurement sites are needed to maximize the information content from an inversion over Salt Lake City, the relative contributions found here can serve as a guide to the utility of each site in a real data inversion context where data availability is less consistent.
3.4.1. Bias introduced from proximity to point sources A useful outcome of the network analysis is the ability to monitor unexpected behaviors in the optimized posterior with changing site network configurations. Here we examine more closely the time periods of September 19-22 and 27-29, when valley-averaged emissions appear to overshoot and are corrected downwards despite true emissions being higher than the prior (see Figure 4). In contrast to these periods, we compare with September 8-16, where the posterior corrections notably overestimate the truth on the valley-average. To investigate this anomalous behavior specifically from the underestimation period, we note that exclusion of the RPK site in the network sensitivity analysis is the only configuration that does not see this period's underestimation biases. RPK site is unique among the six sites in being located within <5 km of four high-emitting point sources (according to Hestia), suggesting that a significant degree of hyper-local behavior may influence this site. As shown in Figure 6, large positive differences in observed vs. prior enhancements at RPK align with systematic afternoon over-correction in emissions from September 8-16. While the true signal is generally larger than prior at other northern sites, it is smaller than prior at RPK between September 19-22 and 27-29, aligning with underestimation of true emissions over the valley average (boxed portions of Figures 4 and 6).
To explore the linkage between these phenomena and uncover their driving behaviors, we examine the afternoon mean CO 2 contribution maps for these two time periods at the RPK site (Figure 10), detailing the gridded contributions to true and prior CO 2 enhancements (footprints multiplied by emissions) in upwind source regions. During September 8-16, the RPK site's highest contributing grid cell from the true emissions is a point source located just south of the site. This same cell, however, exhibits almost no contributions from the true emissions during the September 19-22 and 27-29 periods. Prior and true enhancements are driven by the same transport, which infers that temporal differences in the proportional contributions must therefore be driven by temporal variability in footprints or the emissions themselves. However, because footprints averaged over each period do not display large differences in upwind source regions (see Figure S5 in the Supplemental Material), variability in emissions are the most likely cause. Indeed, the lower panel of Figure 10 shows that this particular source, identified as the Gadsby power plant, experiences large variations within the month. These variations align with the temporal patterns in RPK's prior and true enhancements, as well as over-and under-estimations of posterior emissions from truth on the valley-average. The coincidence of these enhancements and domain-average biases enforces the idea that point sources can be a driving cause of biased behavior in the posterior emissions.
This naturally leads to the question: what can be done to mitigate the model bias at smaller temporal sub-scales introduced from a site like RPK? The answer hinges on whether the cause is due to misspecification of prior error covariance (Q), or of the model-data mismatch error (R). To explore this question, we examine results of additional model configurations to test if these biases can be addressed while preserving the baseline's agreement with the truth. Two model configurations are run in addition to baseline and RPK-excluded setups: one with prior uncertainty of four nearby point sources reduced to 10% of original in the Q matrix (variation A), and one with 4 ppm of additional error introduced to RPK observations in the R matrix (variation B). Time series of spatiallyaveraged emissions from these variations are compared in Figure 11, showing that during the September 19-22 and 27-29 periods, the underestimations are actually amplified from baseline in the reduced sigma case. In contrast, these underestimations are minimized with additional error applied to RPK observations; afternoon domain-averaged emissions from variation B are equal to 4.64 μmol m -2 s -1 , maintaining close alignment with the true afternoon average (4.63 μmol m -2 s -1 ). The reduced bias from increasing RPK observational error supports the notion that the biases introduced by the RPK site originate in the large enhancements from nearby point sources, rather than these sources' prescribed uncertainty. Thus, despite true emissions being consistently higher than prior, the posterior is lower than prior during September 19-22 and 27-29 because of the corrections driven by these periods' low signals from truth relative to prior at the RPK site.
Interestingly, RMSE between posterior and "true" enhancements is lowest at RPK of all sites (Table S-1 in the Supplemental Material), showing that this is not an indicator of bias introduced to the flux space. In this case, adjustment of the site's error to "de-weight" observations proportional to other sites (basically attempting to increase RMSE between data and posterior signals) is one way to address concerns of hyperlocal enhancements disproportionally correcting emissions across the domain. The quantity of or extent to which error should be applied to this site (or similar sites within other domains) depends on the degree to which the bias introduced skews posterior emissions over the space and time domain in question. Thus, we are limited here by our chosen domain, as a single month is likely too limited a time scale to make accurate assertions about both the level of bias present in domain-averaged emissions and the appropriate error to assign to observations at this site.
Following the conclusions of Turner et al. (2016), a denser array of observational sites is likely to simultaneously reduce the bias in corrections introduced from the RPK site as well as minimize the information content lost from de-weighting these observations. This suggests that the SLV observational network could significantly benefit from additional receptors such as mobile measurements described in Mitchell et al. (2018b). Given the current functional network in this study with only six sites, we still seek to take optimal advantage of all available data, including those from the RPK site which ultimately contribute valuable information content to the inversion. Seeing as the RPK-excluded set-up is essentially a scenario which assigns infinite error to these observations, the appropriate additional error to minimize bias depends on the nature and magnitude of influence from surrounding point sources, as well as the density of remaining measurement sites.

Limitations
While the synthetic data inversion setup is useful in that it allows us to examine more closely the errors within our model, some limitations exist in the extent to which these findings can be fully applied to a real-data context. As mentioned earlier, the measures which are used here to justify the accuracy of optimized posterior emissions  Figure 4, comparing baseline posterior (red) to additional scenarios. Scenario A is the same as baseline, but with uncertainty at four point sources surrounding RPK reduced to 10%. Scenarios B and C are the same as baseline, but with 2 ppm and 4 ppm RMSE, respectively, added to RPK observations on the R diagonal. The RPK-excluded scenario is same as baseline but with RPK removed from observations (only 5 sites used). RPK-excluded emissions do not exhibit negative corrective bias within the grey shaded columns; Scenarios B and C minimize this bias from baseline, and Scenario A shows exaggerated bias. DOI: https://doi.org/10.1525/elementa.375.f11 are limited to some degree by our selection of temporal domain. The time frame used here represents a monthlong snapshot of emissions at 6-hourly resolution, over which variations in emissions could differ from other time periods at different scales. This month-long "snapshot" aspect of the chosen time domain limits our ability to quantify the magnitude of bias introduced by the interaction between the chosen site network and the true emissions field. Ideally, the observational network of an urban inverse study should be dense enough to capture all hyperlocal emissions activity that occurs within the domain; however, emission optimization using this Bayesian inverse framework is only as accurate as site network density and model transport allow. While baseline results presented here exactly match "true" emissions during the constrained period, this value is sensitive to the flux timespan and specific site configuration and could likely change if viewing over a different month or site network. Another limitation of this study is the simplification of synthetic data, which are generated using model transport that is consistent with that used in the inversion. Realistically, data are prone to potentially large additional errors in biogenic CO 2 signals (from flux estimates and transport) and other boundary conditions. These errors (along with horizontal and vertical transport errors) may be prone to systematic biases that are unknown and likely difficult to quantify. Correlation of observational errors both in time and space (i.e. between measurement sites) may be unknown or ambiguous as well, and within the fluxes themselves, a level of uncertainty exists in the methods used to determine spatial and temporal length scales. For these reasons, applying this framework to real-data contexts (especially over other urban domains) would necessitate additional considerations regarding unresolved uncertainties.

Conclusion and Discussion
In this study, we describe methods used to produce optimized emissions within a synthetic data framework, with emphasis on quantifying prior error covariance length scales and observational uncertainties. Expected afternoon averages from domain-averaged output closely match those of prescribed "true" emissions, with a standard error of about 0.03 μmol m -2 s -1 (less than 1% of the aggregated emissions). Domain-averaged uncertainties of afternoon emissions were reduced by around 39%. Emissions were constrained only during the afternoon (18-23:00 UTC), excluding other times due to concerns of large observational/transport uncertainties outside of these times. Our ability to correct for non-afternoon emissions given these constraints is limited based on these errors and results in a loss of around 50% information content when not isolating the constrained afternoon times from results. As previous synthesis reports and studies (e.g. NRC, 2010;McKain et al., 2012) have suggested, column-based constraints on emissions may be one way to strengthen the information content of the inversion, due to the decreased sensitivity of column-based signals to errors in the PBL height, which remains a significant modeling challenge outside the afternoon.
Proper estimation of correlation range parameters is proven to be important in this study in order to produce results in agreement with true emissions. We show here that neglecting spatial and/or temporal correlation, in the context of the given domain and prior/true emissions vectors, greatly limits the corrective power of the inversion model and results in poor agreement with true emissions. Spatial and temporal correlation of prior errors result in a degree of corrective spreading from areas with large influence/uncertainty into neighboring cells; however, when estimates of correlation range parameters are set too high, overcorrection can take place. Thus, objective determination of these parameters is highly recommended for optimizing model efficacy. One approach mentioned but not implemented in this study is Restricted Maximum Likelihood estimation, which could be a valuable tool to aid in determining covariance parameters for future studies.
Varying levels of model-data mismatch error are found to be a factor in introducing biases into the optimized emissions based on this synthetic data approach with normally-distributed random errors in observed CO 2 enhancements. This study provides a comprehensive breakdown of total observational uncertainty which thoroughly assesses most components of error; deviating from this estimate of error was shown to compromise agreement with true emissions given the normally-distributed random perturbation method used in this study.
For future inverse analyses involving the Salt Lake Valley measurement network used here, we have shown that the DBK site contributes the most unique and independent information for constraining emissions. We have also noted the overlapping influence of the UoU and SUG sites. Overall, our network of sites displays sufficient spread in influence from upwind areas for the purpose of this analysis, but further work is needed to quantify gaps in information content resulting from unused locations within the urban domain where potential future sites could contribute to more effective inversion analyses.
Within this site network analysis, point sources are shown to have powerful influence on nearby sites, as is exemplified at the RPK site where large differences in prior and true signals propagated strong biased corrections across the emissions domain at certain times of the month. Increasing model-data mismatch error for RPKspecific observations is shown to reduce this bias without significantly compromising agreement with "true" emissions. While the spatial recoverability of missing point sources is shown to be adequate based on the setup used here, the location and magnitude of missing point source emissions are rarely known a priori in the real world. Thus, while recoverability is inherently limited by the resolution of emissions and transport models, as well as prior estimates of gridded uncertainty, further exploration of site network density and error covariance structures (e.g. using Maximum Likelihood Estimation or non-exponential spatial correlation to estimate Q) is needed to identify best practices for recovering missing point sources. Additionally, a multi-pronged approach to reducing bias from nearby point sources would also include the improvement of point source definitions within urban emission bottom-up data products. While global emission inventories do not specifically cater to high-resolution urban inversion analyses, adoption of more comprehensive point source allocation (which are sometimes available for individual cities or nations) in global inventories such as ODIAC will likely better suit them for future urban inversion applications.

Data Accessibility Statement
Source code in R, sample input files, and select sample footprints are uploaded as online supporting information at https://github.com/lkunik/bayesian-osse-R-sample/. Updates and modifications to the code are ongoing, and contributions to the code framework are welcome and can be initiated by pull request. The version of software referenced for this study at the time of publication can be found here: https://doi.org/10.5281/zenodo.2655990. Questions about the code and files can be directed to the corresponding author.