Fossil fuel CO2 emissions over metropolitan areas from space: A multi-model analysis of OCO-2 data over Lahore, Pakistan

Urban areas, where more than 55% of the global population gathers, contribute more than 70% of anthropogenic fossil fuel carbon dioxide (CO2ff) emissions. Accurate quantification of CO2ff emissions from urban areas is of great importance for formulating global warming mitigation policies to achieve carbon neutrality by 2050. Satellite-based inversion techniques are unique among “top-down” approaches, potentially allowing us to track CO2ff emission changes over cities globally. However, their accuracy is still limited by incomplete background information, cloud blockages, aerosol contamination, and uncertainties in models and priori emission inventories. To evaluate the current potential of space-based quantification techniques, we present the first attempt to monitor long-term changes in CO2ff emissions based on the OCO-2 satellite measurements of column-averaged dry-air mole fractions of CO2 (XCO2) over a fast-growing Asian metropolitan area: Lahore, Pakistan. We first examined the OCO-2 data availability at global scale. About 17% of OCO-2 soundings over the global 70 most populated cities from 2014 to 2019 are marked as high-quality. Cloud blockage and aerosol contamination are the two main causes of data loss. As an attempt to recover additional soundings, we evaluated the effectiveness of OCO-2 quality flags at the city level by comparing three flux quantification methods (WRF-Chem, X-STILT, and the flux cross-sectional integration method). The satellite/bottom-up emissions (OCO-2/ODIAC) ratios of the high-quality tracks with reduced uncertainties in emissions are better agreed across the three methods compared to the all-data tracks. This demonstrates that OCO-2 quality flags are useful filters of low-quality OCO-2 retrievals at local scales. All three methods consistently suggested that the ratio medians are greater than 1, implying that the ODIAC slightly underestimated CO2ff emissions over Lahore. Additionally, our estimation of the a posteriori CO2ff emission trend was about 734 kt C/year (i.e., an annual 6.7% increase). 10,000 Monte Carlo simulations of the Mann-Kendall upward trend test showed that less than 10% prior uncertainty for 8 tracks (or less than 20% prior uncertainty for 25 tracks) is required to achieve a greater-than-50% trend significant possibility at a 95% confidence level. It implies that the trend is driven by the prior and not due to the assimilation of OCO-2 retrievals. The key to improving the role of satellite data in CO2 emission trend detection lies in collecting more frequent high-quality tracks near metropolitan areas to achieve significant constraints from XCO2 retrievals.


Introduction
Carbon dioxide (CO 2 ) alone has contributed to more than 60% of the global direct radiative forcing from Greenhouse Gases (GHGs) that has increased by 45% from 1990 to 2019 according to the Annual Greenhouse Gas Index (AGGI) (NOAA, n.d.. Global fossil fuel CO 2 (CO 2ff ) emissions exceeded 38 Gt in 2020 (Crippa et al., 2020) accounting for more than 77% of fossil fuel greenhouse gas emissions (Crippa et al., 2019). Out of the global CO 2ff emissions, more than 70% originate from cities alone (Birol, 2008;Mitchell et al., 2018), where more than 55% of the global population resides (World Bank, 2014). Urbanization is continuously increasing, with no projected decline (World Bank, 2014). This makes accurately quantifying CO 2ff emissions from urban areas of great importance to formulating the global warming mitigation policies necessary to achieve carbon neutrality by 2050 (UNFCCC, 2015).
Two distinct approaches are commonly used to estimate CO 2ff emissions, i.e., 'bottom-up' and 'top-down'. 'Bottom-up' approaches estimate CO 2ff emissions based on standardized protocols, combining activity data such as fuel production and consumption as well as traffic monitoring data with pre-calculated emission factors for specific sources across different activity sectors (UNFCCC, 2015). Bottom-up approaches remain the most common and standardized solutions to quantifying the CO 2ff emissions used to design climate policies. Recent bottom-up approaches downscale CO 2ff emissions to finer resolutions by using either spatial proxies like population density (e.g. MIX, Li et al., 2017) or nighttime lights (e.g. ODIAC, Oda et al., 2018), or combinations of point sources like power plants (e.g. PKU-Fuel, Wang et al., 2013) and line sources like on-road emissions (e.g. HESTIA, Gurney et al., 2012) to construct high-resolution maps of CO 2ff emissions. Uncertainties in bottom-up approaches are caused by data gaps, a lack of information on energy and fuel use statistics, and outdated or inaccurate emission factors (Andres et al., 1996;Liu et al., 2015;Macknick, 2009) ranging from 5% in Organization for Economic Co-operation and Development (OECD) countries (Marland, 2008), to 15-20% for China (Gregg et al., 2008), to 50% or more for emerging economies (Andres et al., 2014).
As the atmosphere integrates CO 2 signals from various sources and sinks, the first and most challenging step in an inverse analysis is differentiating the influence of cities from non-CO 2ff (e.g., vegetation, water bodies) and distant (so-called background) CO 2 sources and sinks. Unlike in-situ measurements of isotopes, such as Δ 14 CO 2 quantification of CO 2ff (Graven et al., 2018;Turnbull et al., 2019), satellite-retrieved CO 2 relies on biogenic models to estimate natural CO 2 fluxes (Hu et al., 2020;Ye et al., 2020). Various approaches have been developed to determine the background for ground-based in-situ measurements. The most common involve calculating the CO 2 gradients between upwind and downwind sites (e.g. Lauvaux et al., 2016) or by solving fluxes over a larger domain boundary (Nickless et al., 2019). With satellite data, studies have used a constant background column-averaged dry-air mole fractions of CO 2 (X CO2 ) over a period, including using the median X CO2 over a latitudinal band (Hakkarainen et al., 2018), the average X CO2 in the surrounding area relatively unaffected by urban emissions (Kort et al., 2012), or a linearly-interpolated background derived from a twostep linear regression (Ye et al., 2020). The choice of background determination method depends on the specificities of each studied city, as all are impacted by simplifications that will penalize the estimated city emissions (Schuh et al., 2021).
The X CO2 enhancement caused by urban areas positively relates to the size and emissions of the city (Labzovskii et al., 2019). To quantify CO 2ff emissions from space, recent studies have either 1) assimilated observed X CO2 enhancement in a transport model and used the distribution and magnitude of emission sources from bottom-up inventories as a priori information (e.g., Ye et al., 2020) or 2) directly computed CO 2ff emissions from the combination of enhancements in X CO2 observations and local wind information (e.g., Nassar et al., 2017;Varon et al., 2018). In the first approach, observed column-averaged dry-air mole fractions of CO 2ff (X CO2ff ) are assimilated into a Bayesian inverse system to estimate CO 2ff emissions (posteriors) (e.g., Pillai et al., 2016). Eulerian models (e.g., WRF-Chem, Ye et al., 2020) adopt fixed grid cells and simulate X CO2 within the model grid cells by transporting the emitted CO 2 molecules forward-in-time. In Lagrangian modeling (e.g., X- STILT Wu et al., 2018 andWRF-STILT, Nehrkorn et al., 2010), the model directly calculates the satellite footprint at the surface and efficiently defines the area of influence, and thus possible upwind sources, by transporting air parcels backward-in-time from the measurement locations. The second approach relies on detecting an enhancement in the observations near the intersection of the satellite track and the supposed city plume location. The emissions are quantified by computing the flux through the plume transect as the product of the wind and the integral of the CO 2 mass in the detected plume, using methods such as flux cross-sectional integration (FCSI) and Gaussian plume inversion. The second approach requires less computational time and is therefore favourable for analyzing a large number of cities (e.g., Zheng et al., 2020) or power plants (e.g., Nassar et al., 2017). While the first method can be strongly impacted by model transport errors at the local scale, the second method is highly sensitive to assumptions about how the effective wind applies to the plume at the satellite transect location. Both are impacted by uncertainties in the wind field driving the plume from the cities.
While satellite missions offer unprecedented spatial coverage of CO 2 over the globe, sampling from space is limited by the cloud coverage, low solar radiances at high latitudes, the interference caused by aerosols, and the difficulty of characterizing surface properties through retrieval algorithms (O'Dell et al., 2018). Given the pollution around urban areas, aerosol-related biases are far from negligible over large metropolitan areas (Pillai et al., 2016). The narrow swath of current satellite tracks (e.g., OCO-2 retrieves 8 soundings over a 10.3-km acrosstrack) often fails to capture the local CO 2ff enhancement caused by a city if the track is not located downwind of the city or is too far away to detect the enhancement. Finally, because of the long revisit time (e.g., OCO-2 every 16 days, GOSAT every 3 days), a limited number of the satellite overpasses can be collected within a year for a given city, which reduces researchers' ability to quantify the large day-to-day variability in CO 2ff emissions (Zhang et al., 2016). However, this limitation can be overcome by next generation satellite missions with increased imaging capabilities (e.g., OCO-3, Eldering et al., 2019) that provide more details of the CO 2 spatial distribution (Kiel et al. 2021) and more frequent observations (e.g. CO2M) (Kuhlmann et al., 2019).
Despite the pros and cons of current CO 2 satellite missions, this study focuses on the detection capability of long-term trends in urban CO 2ff emissions. To evaluate the current potential of space-based quantification techniques, we present the first attempt to monitor long-term changes in CO 2ff emissions based on OCO-2 satellite measurements over the fast-growing Asian metropolitan area of Lahore, Pakistan. After examining the OCO-2 data availability at global scale, we evaluated the data quality and its potential use for CO 2ff emission estimations over the largest cities. We then locally evaluated the OCO-2 data quality flags designed for global studies within a multi-model comparison of three top-down methods (two based on atmospheric numerical models, WRF-Chem and X-STILT, and the FCSI method) for the 25 available tracks near Lahore. Finally, we evaluated the capability of OCO-2 to track the 5-year trend of city-scale CO 2ff emissions over Lahore, Pakistan, considering both the uncertainties and significance of the trend calculation.

OCO-2 data and quality flags
The OCO-2 satellite was launched on July 2, 2014 and started collecting data on September 4, 2014. It operates in a sun-synchronous polar orbit at an altitude of about 705 km, nominally crossing the equator at 13:36 local time. OCO-2 collects eight adjacent spatiallyresolved soundings every 0.333 s (24 samples per second) along a narrow (0.8 • ) swath, with a cross-track resolution of 0.1-1.3 km (Crisp et al., 2010). This results in about 400 soundings per degree of latitude per orbit, or ~ 1 million soundings, collected each day over the sunlit hemisphere. However, a large fraction of the X CO2 retrievals is inadequate for inverting CO 2ff emissions, due to the data loss associated with cloud blockages, aerosol contaminations, or invalid geolocations. To investigate the number of usable tracks and determine the causes of data loss, we analyzed the available soundings and the quality flags representing sounding quality from 2014 to 2019. We focused on the OCO-2 nadir and glint modes, excluding the target mode from this study due to its limited sampling area.
The OCO-2 X CO2 soundings are subject to a two-level quality inspection (L1 and L2). Only soundings that meet all selection criteria in the inspection are marked as high-quality soundings in the L2 OCO-2 quality flags (QF = 0). Soundings failing to meet one or more criteria are marked as low-quality soundings (QF = 1). A detailed introduction of the two-level quality inspection can be found in Section Text S2.

OCO-2 background X CO2 determination
To constrain CO 2ff emissions from cites, we needed to remove the large-scale influence (so-called "background") from the observations and only keep the local X CO2 signatures. We used the "two-step linear regression" background removal method as described in (Ye et al., 2020). First, we applied noise canceling by averaging X CO2 soundings over one second (covering ~10.32 km × 6.75 km) to reduce the instrumental noise in the data. We then assumed that the 1-s averaged X CO2 data (X CO2_1s ) consisted of two parts: "background samples" and "local samples". Because the "background samples" have less spatial variability compared to the "local samples" affected by local urban CO 2ff emissions, we selected "background samples" from X CO2_1s using the criteria: X CO2_detrend < 0.5⋅σ detrend , where X CO2_detrend is the detrended X CO2_1s and σ detrend is the standard deviation of X CO2_detrend . X CO2_detrend is derived by a linear function: X CO2_1s = a⋅x + b, where x is latitude, and a and b are the slope and intercept derived by linear regression. We then recalculated the linear regression line, or the "background line," based on the "background samples" representing the background spatial trend. Physically CO 2ff should all be greater than zero, so some 1-s averaged soundings were removed if they were below zero after background removal.

Case study: Lahore in Pakistan
Limited by computational resources, we could only select one city to serve as a case study to test if OCO-2 can capture the trend of CO 2ff emissions over urban areas. The idealized sample city should have sufficient OCO-2 tracks and a clear growing trend of CO 2ff emissions. Lahore is the second-largest city in Pakistan in a flat and mostly agricultural region of Asia (Punjab region) (World Bank, 2014). The city's gross domestic product (GDP) by purchasing power parity (PPP) was estimated at $40 billion as of 2008 (PricewaterhouseCoopers, 2009) and $65.14 billion as of 2017 (InpaperMagazine, 2018), with a projected average growth rate of 5.6%. The population exceeded 11 million in 2017 with an annual growth rate of 4.07% since 1998 (Pakistan Bureau of Statistics, 2017a, 2017b). The ODIAC inventory version 2019 inventory suggested that the Lahore whole-city CO 2ff emissions increased by about 646 kt C/year during October 2014-May 2019, translating into a total change of 27% over 2015-2019 (i.e., a mean annual 5.9% increase), which is consistent with Pakistan's national emission estimates of 5.05% during 2001-2018 (Lei et al., 2020). The fast-growing economy and population over the recent two decades make it ideal for testing the potential of detecting trends based on OCO-2 data.
We selected 25 tracks of OCO-2 over Lahore from 2014 to 2019 that meet the following criteria: 1) must have a sufficient number of retrievals, i.e., more than 150 retrievals in a 50 × 50 km box around Lahore evenly distributed along the track; 2) located downwind of Lahore; 3) variations in the 10-m wind field are less than 45 • to minimize transport model errors; 4) a local enhancement in 1-s average retrievals larger than the background variations.
The 25 tracks, referred to as all-data tracks, contain all high-and lowquality soundings (i.e., quality flags are not considered yet). Only 8 out of 25 tracks fit the above criteria, referred to as high-quality tracks, if just high-quality soundings are considered. Detailed information about the selected tracks is shown in Fig. S1. Due to the lack of ground-based XCO 2 data, we ensured the robustness of the results by comparing the OCO-2 data with outputs of three flux quantification methods, WRF-Chem, X-STILT, and the FCSI method.
The outputs of WRF and X-STILT are XCO 2 based on ODIAC, while the FCSI method output is CO 2ff emissions. We compared the ratios of XCO 2 from OCO-2 to XCO 2 from WRF and X-STILT and the ratios of CO 2ff emissions from the FCSI method to CO 2ff emissions from the ODIAC product in Section 3.5 to evaluate the effectiveness of OCO-2 quality flags at city scale. The ratios are mathematically equivalent in Bayesian inversion, so we call them ratios of satellite/bottom-up-emissions (OCO-2/ODIAC) for simplicity.

WRF-Chem model setup
We simulated the X CO2ff mole fractions using WRF-Chem V3.6.1 (Grell et al., 2005;Skamarock et al., 2008) coupled with CO 2 emissions and biospheric fluxes in passive tracer mode (Lauvaux et al., 2012), following the WRF-Chem setup (domain size, resolution, physics schemes) and emission estimation method for "plume city" located in relatively flat terrain as described in Ye et al. (2020).

Atmospheric transport model setup
This study uses one-way nested domains with resolutions of 9 km and 3 km centered on Lahore. The outer domain is set to 59.04-89.64 • E, 15.43-39.97 • N, ~2700 km × 2700 km, which covers part of the Pamir Mountains, the Tibetan Plateau, the Indian subcontinent, the Arabian Sea, and the Bay of Bengal. It includes the complex meteorology introduced by high mountains and vast water bodies. The inner 3 km domain is set to 73.55-75.04 • E, 30.88-32.15 • N, a ~ 50 km × 50 km domain centered on Lahore covering the large urban area and surrounding rural region.
The hourly ERA5 data (Hersbach et al., 2020) on 0.25 • × 0.25 • grids are used as the initial and boundary conditions of the meteorological and land surface fields. The model levels represent the atmospheric column from the surface to 50 hPa using 51 vertical levels. The initial and boundary conditions for the CO 2 mole fractions are set constant, assuming that the long-range transport of CO 2 generates only large-scale spatial gradients and preserves the local enhancements. The simulations run for 4 days starting at 00:00 UTC 3 days before the OCO-2 overpassing day, including a spin-up time of 12 h, producing hourly model outputs for each available OCO-2 track. The simulations were nudged to ERA5 to minimize the X CO2 errors caused by model transport errors. A comparison of X CO2ff distribution between nudged and non-nudged simulations is shown in Fig. S7.

Fossil fuel CO 2 emissions
Fossil fuel CO 2 emissions are based on the 1 km × 1 km gridded global and monthly product of ODIAC Version 2019 Oda and Maksyutov, 2011). It disaggregates country-level CO 2ff emission estimates based on satellite-observed nightlight data and power plant profiles. Here we sum and compare the WRF-Chem and ODIAC total emissions over each domain to conserve the total mass of emitted CO 2 after re-projection. We then downscaled the monthly ODIAC product to hourly resolution, using the Temporal Improvements for Modeling Emissions by Scaling (TIMES) model (Nassar et al., 2013) to consider diurnal and weekly cycles (scaling factors).

Biogenic CO 2 fluxes
Biogenic CO 2 fluxes are based on a 3-hourly net ecosystem exchange (NEE) suite provided by the 15 different global Terrestrial Biogeochemical Models (TBMs; 0.5 • × 0.5 • ) in the Multi-scale Synthesis and Terrestrial Model Intercomparison Project (MsTMIP) (Huntzinger et al., 2013). The 3-hourly fluxes were linearly interpolated to hourly resolution for the simulations. Spatial downscaling followed Ye et al. (2020), in which the Moderate Resolution Imaging Spectroradiometer (MODIS) climatological Green Vegetation Fraction (GVF) is used to downscale the native biogenic fluxes to model grids, under the assumption that vegetation productivity and respiration scale linearly with canopy coverage in each grid cell.

X-STILT model setup
Built upon the Stochastic Time-Inverted Lagrangian Transport (STILT) model Lin et al., 2003), the column version, X-STILT (Wu et al., 2018), tracks the movement of air parcels backward in time from the same atmospheric column as OCO-2 soundings ("column receptors"). It incorporates averaging kernel and pressure weighting functions from OCO-2 to generate source-receptor matrices or "column footprints" (ppm / (μmol m-2 s-1), which describe the sensitivity of potential upwind emission sources on downwind satellite soundings. Specifically, 100 air parcels are released from each of the vertical levels, ranging from the surface to 6 km with a vertical spacing of 100 m below 3 km and 500 m above. For every track, we sampled approximately equally spaced 40 soundings within a 1 • latitude range of Lahore and 20 soundings per degree-latitude outside Lahore. The meteorology from Global Data Assimilation System at 0.5 • grid spacing (GDAS0p5, Rolph et al., 2017) is used to drive X-STILT. The sum of the convolution of gridded footprints and ODIAC emissions represent the CO 2 enhancement from fossil fuel emissions, as sampled by air parcels arriving at the OCO-2 sounding locations. To have the same representation of the WRF-Chem model setup, only contributions from the nearfield land (30.88-32.155 • N, 73.55-75.04 • E) were included in the X CO2ff calculation.

Flux cross-sectional integration (FCSI) method setup
Our flux integration technique follows Varon et al. (2018) and we applied it to each OCO-2 track city plume transect. To establish the link between the flux computed across the plume transect and the city emissions, our method assumes steady-state conditions for both emissions and wind in the hours preceding the satellite overpass and a relatively homogeneous wind field over the area. The method follows five successive steps: (i) estimation of the effective wind driving the plume from the city, (ii) calculation of the background concentrations, (iii) detection of the limits of the plume section, (iv) integration of the flux directly derived from the plume enhancements, and (v) determination of the surface footprint of the selected plume.
The effective wind corresponds to the mean wind seen by the targeted CO 2 signal, i.e. the mean over the plume of the wind weighted by local CO 2 mass. Because the CO 2 vertical distribution is unknown, the effective wind cannot be directly estimated. We used the mean wind in the planetary boundary layer (PBL), weighted by the dry air mass instead of the unknown CO 2 mass. This estimator for the effective wind is similar to Kuhlmann et al. (2020) and Zheng et al. (2020), who used a specific altitude (500 m above ground level) as the vertical limit. Here, we assumed that the observed CO 2ff is well-mixed within the PBL because OCO-2 transects are located sufficiently far from the sources. The effective wind vector is averaged horizontally over the whole WRF-Chem 3-km domain to smooth the integrations along the OCO-2 tracks. For wind information, we used the high-resolution operational forecasts (HRES, available at 0.1 • /1-h resolution) of the Integrated Forecasting System of the European Centre for Medium-Range Weather Forecasts (ECMWF) (Owens and Hewson, 2018). The estimate of the effective wind is a major source of uncertainty in the method (Section Text S1.5). Both errors on its speed and its direction generate errors in the emission estimates (see Eq. 2). Information on the actual effective wind direction could be derived from the direction from Lahore to the detected plume. However, as discussed in section 3.4, the inconsistencies between this direction and that of the estimate of the effective wind could be linked to a misattribution of the detected enhancement. Furthermore, using a combination between such a characterization of the effective wind direction and the estimate of the effective wind speed from the meteorological analysis may not make more sense than using the full and consistent estimate of the effective wind speed and direction from the meteorological analysis. The level of coherence between the estimate of the effective wind direction and the plume direction is thus taken as a qualitative indication on the accuracy of the results but is not exploited to adjust the estimate of the effective wind.
The determination of the background X CO2 over Lahore, the X CO2 due to upwind sources and sinks other than the emissions from Lahore, aims at isolating the X CO2 enhancement from sources within the urban area of Lahore. We used two different estimators for the background, detailed in Section Text S1.1. X CO2 enhancements (peaks) can appear along the OCO-2 tracks due to nearby sources, fine-scale variations in the background caused by complex transport patterns, structured errors in the OCO-2 retrievals, and/or potential errors in the background determination method. Therefore, the next step consists of identifying the enhancement originating from Lahore. Spatial smoothing of the retrievals is applied to the background-removed retrievals to filter the noise in the data, separate the different enhancements, and identify the most probable plume. Multiple smoothing methods are applied (Section Text S1.2), along with two peak identification methods (Section Text S1.2). The combination of processing steps and two background calculation methods (Section Text S1.1) give us a set of emissions estimates for each track. The sets are filtered to ensure the coherence between the wind direction from the effective wind computation and the direction from Lahore to the plume section, as well as verify that the plume section is wide enough to catch a representative segment of Lahore emissions (Section Text S1.4).
The enhancement identified as being the plume from Lahore (denoted ΔX CO2Lahore ) is then integrated to estimate the flux of CO 2 mass coming from Lahore through the transect. The enhancement, in ppm, is converted in kg/m 2 using the following equation: where M CO2 and M dryair are the CO 2 and dry air molar masses, P dryair(surf) is the dry air surface pressure, and g is the gravitational acceleration. The flux is then calculated following: where n Track ̅̅ ̅→ is the vector normal to the track and x the along track distance and U eff ̅ ̅→ is the effective wind vector. A key parameter for OCO-2 track selection is the angle between the OCO-2 track and the effective wind vector, which are ideally normal to each other. Indeed, the impact of wind angle errors increases non-linearly with the angle between the normal to the track and the wind (Section Text S1.5).
As the detected plume corresponds to a fraction of the city administrative boundaries or area with significant emissions in and around the city, we define the emission zone as the surface area upwind of the selected plume (Section Text S1.3). The flux estimate will then be compared to ODIAC emissions within this specific emission zone.

CO 2ff emission trend detection 2.7.1. Bayesian inversion framework
The posterior emission (E m ) and its error variance (σ e 2 ) are obtained as follows: where λ is the scaling factor, E i is the emission from the priori emission inventory and σ 2 is the error variance of the scaling factor. Note that λ is different from the ratios of satellite/bottom-up-emissions. The Bayesian inversion calculated λ by optimizing the satellite/reference ratios with uncertainties (Eq. 6). Following Ye et al. (2020), the scaling factor (λ) was calculated using a Bayesian inversion method. We estimated the error variance in the observations (σ 2 o ) by adding the two terms: where σ 2 satellite is the satellite measurement error variance and σ 2 model is the forward model error variance. The estimations of these two terms are detailed in Section 2.7.2.
For each track, scaling factor (λ) and the posterior error variance (σ 2 ) can be written as: where y o and y m represent the integrated X CO2ff along a latitudinal range from the satellite and model, respectively and σ a is the uncertainty of the prior estimate, set as a fraction of the emissions. The prior estimate (λ a ) is set to unity (1).
In this study, we estimated the posterior emissions based on outputs of WRF-Chem only, allowing us to include error estimations for both the biosphere and the transport model error in the final estimation. The inputs include CO 2ff prior emission (E i ), prior uncertainty (σ a ), integrated X CO2ff along a latitudinal range from the satellite (y o ) and model (y m ), and satellite and model errors (σ satellite and σ model ). Specially, E i is obtained from ODIAC and y o is derived based on OCO-2 X CO2 after removing the background and the Biogenic X CO2 (See Section 2.2 and 2.4.3), with error determination described in Section 2.7.2.

Error determination
2.7.2.1. Prior uncertainty and measurement error. We assumed 3 different (10%, 20%, and 40%) prior uncertainty levels to test their effect on CO 2ff emission inversion. The satellite measurement error (σ satellite in eq. (5)) of each X CO2 sounding consists of three parts: a random error related to noise, a systematic error, and the uncertainty from biogenic flux. Similar to a previous study (Ye et al., 2020), we conservatively considered a random error for each X CO2 sounding with a standard deviation of 1 ppm, leading to a 0.20-0.45 ppm standard deviation for the 1-s averaged data calculated with 5-24 soundings (a fraction of soundings filtered by the quality flags). Systematic errors are ignored as we used bias-corrected data, which is presumably free from potential biases. The uncertainty from biogenic fluxes cannot be derived from the satellite data. Instead, we used the standard deviations of Bio-X CO2 driven with the 15 different global TBMs (also see Section 2.4.3) to represent the biogenic flux uncertainty in observed X CO2 .

Model error caused by transport error.
Following the method described in Ye et al. (2020), we determined transport model errors using the rotation and stretch of modeled wind fields. We rotated the simulated plume c(x, y, t) at a given time (t) by an angle θ about the emission center (x 0 , y 0 ) to get c r (x r , y r , t). The rotated plume was then transformed to incorporate random wind speed error (ε) as: where We assumed that errors in wind speed and direction follow the normal distributions of N(0, σ ws ) (unit: m/s) and N(0, σ wd ) (unit: • ), respectively, where σ ws and σ wd are standard deviations of wind speed and direction, respectively. The evaluations of σ ws and σ wd are shown in Section Text S4. The X CO2 model errors are represented by the standard deviations of 10,000 Monte Carlo simulations of transformed plumes.

Global view of high-quality OCO-2 tracks
To understand the potential of OCO-2 satellite data on detecting an urban CO 2ff emission trend, the global data availability must be quantified. Thus, we scanned all the OCO-2 soundings from 2014 to 2019 and calculated how many are high-quality soundings, as marked by quality flags on 1 • x 1 • grids (Fig. 1). Note that the OCO-2 soundings are subject to a two-level quality inspection (L1 and L2). The ratios of high-quality soundings for each level are shown in Fig. S3. The ratios used in Fig. 1 were obtained by multiplying the ratios in L1 and L2. Globally, about 10.82% of soundings are marked as high-quality by OCO-2 quality flags, while the high-quality sounding ratio in the low-and mid-latitudes (< 60 • ) is 14.61%. The global mean ratio over land (11.47%, excluding Antarctica) is similar to the global ocean (11.85%) mean ratio. In the low-and mid-latitudes, the mean ratio over land is 14.36% and 14.71% over the oceans. The ratios over Asia, Latin America, North America, Africa, Europe, and Oceania are 9.82%, 6.99%, 7.39%, 18.57%, 6.10%, and 38.07%, respectively. Some areas in the mid-latitude, such as the western United States, southern Africa, Australia, North Africa, and the Middle East, have ratios higher than 50% while some areas in China, Southeast Asia, part of the Sahara Desert, Central Africa, and the Amazon forests have ratios lower than 10%.

OCO-2 high-quality soundings over the 70 most populated cities
Since this study focuses on the CO 2ff emissions from city areas, we scanned all OCO-2 tracks over the 70 most populated cities globally from 2014 to 2019 to examine how many tracks are valid for CO 2ff emission inversion calculations. The names of the 70 cities are listed in Fig. S4. Considering that cities vary significantly by size, we calculated the ratios of OCO-2 high-quality soundings from all soundings within boxes of Fig. 1. Global 1 • x 1 • OCO-2 high-quality sounding ratios past two-level quality inspection. different sizes (from 25-to 200-km border-to-center distance) to account for suburban areas and nearby towns (Fig. 2). Like the 1 • x 1 • global view, the ratios of high-quality soundings over the 70 cities are shown more specifically at the L1 and L2 levels in Fig. S5. Overall, the ratio remains at about 17%, independent of the box sizes. Further investigation shows that the data loss in OCO-2 measurements is mainly due to cloud and aerosol followed by topography (see Section Text S3). We only considered the target and glint modes in our study. The numbers of soundings in the same-size boxes over different cities are almost equal because the soundings are evenly distributed along the tracks (for both nadir and glint modes). Assuming that the ratio of useable tracks is similar to the fraction of high-quality soundings, on average 3-4 tracks per city per year are valid for CO 2ff emission inversion, given the 16-day OCO-2 revisit period. The ratios of high-quality soundings near cities in North America and Africa are higher than for any other continent. Over North America, Europe, and Asia the ratios of high-quality soundings gradually decrease as the box size increases. There are more high-quality soundings within a 25-km box than other sized boxes in Latin America, which might relate to the proximity of the coastline (lower retrieval density over water). The ratio of good soundings increases significantly as the box size increases in Oceania because Sydney, the only selected city in Oceania, is on the coast and far from other large cities. The ratio of good soundings also increases with box size in Africa from 25 to 100 km.

Comparison of modeled and observed local X CO2ff enhancement over Lahore
The X CO2ff enhancement is defined as the enhancement in X CO2 due to local fossil fuel emissions. Fig. 3 shows two sample comparisons between modeled and observed X CO2ff enhancements. On Apr 2, 2017, the OCO-2 1-s averaged data showed a 0.68 ppm X CO2ff enhancement peak at 31.57 • E latitude. WRF-Chem captured a similar 0.58 ppm peak, but the position shifted slightly south. X-STILT showed a 0.29 ppm peak, much lower than OCO-2 and WRF-Chem, located between the OCO-2 and WRF-Chem peaks. On Jan 15, 2018, OCO-2 1-s averaged data showed a 0.22 ppm peak at 31.43 • E latitude. WRF-Chem also shows a peak at that position, but the enhancement is about 1.77 times higher than the OCO-2 peak at 0.39 ppm. X-STILT showed a similar 0.38 ppm peak as WRF-Chem, but located at 31.22 • E latitude. It is notable that the modeled peak enhancement and position are sensitive to the modeled wind field. For example, the WRF-Chem and X-STILT peaks on Jan 15, 2018, not overlapping but show a similar enhancement (Fig. 3d), implying that the wind directions slightly differ in the two models.
The comparison of 25 all-data tracks is shown in Fig. S9 and the comparison of the 8 high-quality tracks is shown in Fig. S10. Overall, the OCO-2 data in the all-data tracks showed greater variations of X CO2ff and greater discrepancies with the two models than the high-quality tracks. The X CO2 enhancement peaks of the two models are more consistent with each other than with those in the OCO-2 data, in terms of both position and magnitude. Fig. 4 shows two representative examples of plume transects by OCO-2 and illustrates the application of the different steps of the FCSI method. The first example (Fig. 4ab) corresponds to favourable conditions on Jan. 15, 2018 with a near-perfect match between the maximum enhancement and the projected wind vector, while the second example illustrates the more challenging conditions on May 15, 2015. Note that favourable condition is a stricter condition than high-quality tracks. Tracks with favourable conditions are much fewer than high-quality tracks, not enough to evaluate the performance of the FSCI method. In the second example, the mismatch between the location of the detected plume section and the estimate of the effective wind direction can be explained by either (i) uncertainties in the wind field from the meteorological re-analysis product used to estimate the effective wind, (ii) uncertainties in the derivation of the effective wind due to a lack of knowledge on the vertical distribution of CO 2 (especially when CO 2 accumulates near the surface), or (iii) the fact that the detected plume section could actually correspond to a signal from CO 2 sources other than the city of Lahore. A lack of coherence between the detected plume direction characterized by the location of the detected plume section and the effective wind direction estimate highlights a major error in the estimate of the effective wind that applies to both direction and speed or misattribution of the enhancement. To determine the corresponding surface area influencing the OCO-2 retrievals, a mask is applied to ODIAC based on the effective wind direction of the plume (computed

Evaluation of OCO-2 quality flags at the city-level
The OCO-2 quality flags were originally designed for global-scale studies. Although there are some TCCON sites located in urban areas (http://www.tccon.caltech.edu/site-locations/index.html), its effectiveness at city scale has not been fully evaluated. Because the soundings filtered out by the quality flags in the L1 product have no corresponding X CO2 values, we could only examine the X CO2 soundings in the L2 data. Due to the lack of ground-based X CO2 data, we evaluated the relevance of quality flags in L2 products based on the three methods. This multimodel evaluation is independent of model-specific assumptions, especially the meteorological conditions: WRF driven by ERA-5, X-STILT driven by GDAS, and FCSI with ECMWF HRES. Fig. 5 shows the satellite/bottom-up-emissions (OCO-2/ODIAC) ratios for the 25 all-data tracks (in red) and 8 high-quality tracks (in blue). We compared the ratios rather than the differences because the CO 2ff emission scaling factors are based on the satellite/bottom-up-emissions ratios. For simplicity, Fig. 5 is a boxplot without outliers (see Fig. S11 for outliers). The ratios of the high-quality group converge closer to 1 compared to the all-data group. The high-quality tracks produce relatively few outliers while the 25-track ratios show large outliers (>10; Fig. S11). For high-quality tracks, WRF-Chem and X-STILT have similar reported error bars but the error bar of the FCSI method is greater. All 3 models consistently suggest that the median ratios are greater than 1, which implies that the ODIAC slightly underestimates the CO 2ff emissions over Lahore. The possible reasons for underestimation could be unreported sources or inaccurate emission factors over fast-growing areas like Lahore. For the FCSI method, the mean of high-quality cases is not similar to the other methods, but the spread is much smaller than the all-data estimate. Based on the reduced spreads of high-quality tracks, we conclude that the OCO-2 quality flags are useful filters to identify low-quality OCO-2 retrievals producing unlikely emission values over Lahore. Hence, quality flags should be considered in cityscale studies. The larger uncertainties of the FCSI method might be caused by the variability in the surface footprint, which is difficult to accurately determine due to uncertainties in the identification of the city plume and vertical distribution of CO 2 (with well-mixed conditions assumed in the wind calculation). The high-quality track ratios from X-STILT seem to converge more (smaller range) than WRF-Chem. Given that only 8 high-quality tracks are used for comparison, more tracks are needed to confirm which model is optimal for CO 2ff inversion. The large spread in low-quality OCO-2 tracks across the three methods confirms the relevance of the quality flags in urban emission quantification.

Emission trend detection over Lahore
We calculated the CO 2ff emissions using our Bayesian inversion system applied to the 8 high-quality tracks simulated by WRF-Chem (Fig. 6). We estimated the posterior (optimal) emissions only using WRF-Chem, as it is the only model for which both biosphere fluxes and transport errors have been quantified here. To better understand the impact of prior emissions and observations (defined as σ a and σ o in eqs. (5), (6), and (7)), we ran our Bayesian inversion system with various uncertainty levels. In a previous study (Ye et al., 2020), prior uncertainties over cities other than Los Angeles were estimated as 40% of Fig. 3. Comparison between modeled and observed X CO2ff enhancements of two sample high-quality tracks over Lahore. Panels (a) and (c) show the simulated X CO2ff from WRF-Chem and the observed X CO2ff obtained from the OCO-2 data (background and biosphere X CO2 have been subtracted). The vectors represent 10-m wind from WRF-Chem with the reference vector standing for a wind speed of 5 m/s. Panels (b) and (d) show the OCO-2 X CO2ff (grey diamond marks represent high-quality (QF = 0) soundings, with background and biosphere X CO2 subtracted), 1-s averaged OCO-2 X CO2ff (Orange dotted line), simulated X CO2ff by WRF-Chem (red dotted line), and X-STILT (black dot line). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) the net emissions at the daily time scale. We also defined our initial prior emission uncertainty as 40% of city emissions. However, the observation uncertainty varies by track; see Section 2.7.2 and Table S2 for details of observation uncertainty estimation. The baseline (red line in Fig. 6a) uses a prior uncertainty of 40% of emissions and observation uncertainty as indicated in Table S2. We tested the impact of prior uncertainties by assigning different values, from 10% to 20% and 40% of emissions (Fig. 6a), as well as the impact of observation uncertainties by decreasing the variances down to 10%, 20%, and 40% of the original estimation (Fig. 6b). The inversion inputs are listed in Table S2. Note that the 8 high-quality tracks are sampled at different times of the year. We adjusted the individual posterior emissions by their prescribed monthly and weekday scaling factors to remove the effects of sub-annual variability. Monthly scaling factors are the ratios of monthly emissions / monthly averaged emissions in corresponding years from ODIAC. Weekday scaling factors were obtained from the Temporal Improvements for Modeling Emissions by Scaling (TIMES) model (Nassar et al., 2013). In future studies, these adjustments should be based on economic activity data (e.g., energy production/consumption, traffic, industrial activities) instead of a prescribed climatology of fossil fuel emission.
The trend in the posterior emission baseline case is about 734 kt Carbon per year (or 6.7%) independent of prior uncertainty. However, the emission trend decreases as the level of confidence in the OCO-2 data (and the WRF-Chem model) increases (i.e., observation uncertainty decreases) (cf. Table 1). Based on the Mann-Kendall test (Lei et al., 2018(Lei et al., , 2019, we evaluated the trend significant probability at various prior uncertainty levels. We calculated the possibility of p-values less than 0.05 from 10,000 Monte Carlo simulations of the Mann-Kendall upward trend test. The possibility significantly increases as prior uncertainty decreases, while significantly decreasing as observation uncertainty decreases. We conclude here that for the 8 high-quality tracks, the trend is mostly driven by prior emissions while the track-to-track variability is too large to identify the positive trend based solely on OCO-2 data. To understand the effect of the number of valid satellite tracks, we also ran idealized simulations of p-values for 25 high-quality tracks (with the same dates as the 25 all-data tracks) assuming that posterior emissions and uncertainties are equal to the prior. The probability of detecting the positive trend with 25 tracks increases to 98.71%, 60.20%, and 25.25% when prior uncertainties are set at 10%, 20%, and 40%, respectively. The possibility is significantly higher than using 8 high-quality tracks. To achieve a greater-than-50% significance possibility at the 95% confidence level, less than 10% prior uncertainty for 8 tracks (or less than 20% prior uncertainty for 25 tracks) is required. This implies that the trend is driven by prior and not optimized emissions, even with 25 valid  (c) show the OCO-2 X CO2 sounding over the ODIAC inventory (in the WRF inner domain), with the calculated emission zone (white shading) and estimated effective wind direction (red arrow). Panels (b) and (d) show the along-track background-removed raw OCO-2 X CO2 data (grey dots) with the smoothed data inside (blue line) and outside (orange line) the detected plume. The blue area highlights the integrated area used for emission calculation. The variations of the X CO2 signal outside the detected limits of the plume originate from other sources excluded by our emission mask and are thus not used for the emission calculation. In all panels, black dots represent the plume detected limits and the red dots are the wind axis-track axis intersection. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) tracks over 5 years.

Discussion
We calculated the number of usable tracks based on the ratios of OCO-2 high-quality soundings compared to all available soundings. We also found that the number of usable tracks varies significantly among cities when searching for tracks with over 150 high-quality soundings. For example, 7-9 high-quality tracks per year are found over Los Angeles and 14-16 high-quality tracks per year are found over Mexico City, but we found less than 1 high-quality track per year over Beijing and Washington DC. On the other hand, the number of high-quality tracks is limited by factors such as downwind location of the track, wind homogeneity, sounding distribution, magnitude of X CO2 enhancement, city size, and aerosol contamination.
We selected Lahore in Pakistan as a case study. In theory, its size and fast-growing economy should make it easier to capture the annual CO 2ff emission trend from satellite data. For cities with slowed growth or reduced fossil fuel consumption under climate policies, a longer time window is needed until the total changes in emissions that can be statistically confirmed are greater than sub-annual variations. Currently, the number of high-quality satellite tracks is not sufficient to capture sub-annual variations. Forthcoming satellite missions will provide additional observations potentially able to capture urban emission trends (e.g. GeoCarb, CO2M). Another solution is to refine the sounding selection criteria for cities. We examined the thresholds of selection criteria that trigger the quality flags and found the default thresholds in the OCO-2 L2 product are slightly over-cautious for Lahore (Section Text S5). Adjusting the thresholds for cities may bring more high-quality tracks from current satellites (OCO-2/3). Note that the impact of local topography on dispersion varies between plume city and base city (Ye et al., 2020), which might lead to large differences in sounding selection criteria due to aerosol accumulation.
The signal-to-noise ratio is also a notable problem. The X CO2ff enhancements derived from 1-s averaged data ( Fig. 3 and Fig. S10) are usually less than 1 ppm, but the variation in non-average data can be as high as 1.5 to 2.0 ppm in the 1-s interval. Thus, satellite data with higher precision would be beneficial to better constrain CO 2ff emissions.
Observing System Simulation Experiments (OSSEs) (Y. Wang et al., 2020a;Ye et al., 2020) could constrain the monthly emission uncertainties with multiple valid tracks at higher temporal frequency, thanks to the absence of any significant bias in model errors. If multiple tracks were made available each month, the CO 2ff emission trend could be captured by satellite observations independently of priori emission trends.
The cost-efficiency ratios of the models must be considered for future expansions of this study to more cities. The FCSI method is the least CPU demanding but its satellite/bottom-up-emissions ratios of suitable tracks seem to converge less than for WRF and X-STILT. However, more valid tracks would be needed to confirm this result. X-STILT requires midlevel computational resources among three approaches and its derived ratios of high-quality tracks converged more than the other two methods. Given that only 8 high-quality tracks were examined over Lahore in this study, no definitive conclusion can be drawn whether X-STILT or WRF-Chem is an optimal solution with the highest costeffectiveness. Additional comparisons between these approaches using more tracks and over different cities are required in the future.
The next generation Orbiting Carbon Observatory satellite, OCO-3, was launched in May 2019, providing a novel viewing mode dedicated to denser data collection over cities. The Snapshot Area Map (SAM) mode of OCO-3 yields a raster of data by scanning wider areas over cities compared to OCO-2, reducing the possibility that the tracks miss the plumes from cities. The Bayesian inversion framework should be adjusted to fit the wider spatial coverage. OCO-2 continues to collect observations even after the launch of OCO-3, allowing us to study the difference between the two satellites and whether the SAM mode improves CO 2ff emission inversion. In addition, more CO 2 monitoring satellites (e.g. GeoCarb and CO2M) will be launched in the coming years. Reconciling various satellite products will remain challenging with noticeable differences between X CO2 retrievals from the different satellites, but this constellation also represents an opportunity to leverage a large volume of satellite-derived X CO2 data to better understand urban carbon emissions. The ratios of the FCSI method correspond to the ratios of CO 2ff emissions over the areas identified as plumes. For each box, the central line indicates the median while the bottom and top edges of the box indicate the 25th and 75th percentiles (q1 and q3), respectively. Outliers greater than q3 + 2 × (q3 − q1) or less than q1-2 × (q3 − q1) are omitted for simplicity and can be found in Fig. S11. The whiskers extend to the most extreme value that is not an outlier. Fig. 6. Posterior CO 2ff emissions over Lahore at various uncertainty levels: a) fix obeservation uncertainty (σ o ) as 100% of derived from OCO-2 and WRF-Chem as well as set prior uncertainty (σ a ) as 10%, 20%, and 40%; b) fix prior uncertainty (σ a ) as 40% and set obeservation uncertainty (σ o ) as 10%, 20%, and 40% of derived from OCO-2 and WRF-Chem. Note: To avoid overlapping error bars, we offset 20% and 40% error bars by 1.5 and 3 months, respectively. The dash lines represent the least-squares linear regression of the medians.

Conclusions
In this study, we examined the number of valid OCO-2 tracks over the 70 most populated cities globally by examining the ratios of highquality soundings. About 17% of OCO-2 soundings are marked as high-quality by the current flagging procedure. Given the 16-day OCO-2 revisit period, this produces 3-4 valid tracks per year per city on average. The occurrence of sounding selection criteria shows that cloud blockage and aerosol contamination are the two main causes of data loss.
The OCO-2 quality flags were originally designed for global-scale studies. We for the first time evaluated their effectiveness at the city level. We selected Lahore in Pakistan as a case study and compared satellite/bottom-up-emissions ratios from WRF-Chem, X-STILT, and the FCSI method using 25 all-data tracks and 8 high-quality tracks. The ratios of the high-quality group with significantly lower uncertainties converge closer to 1 compared to the all-data group. Thus, OCO-2 quality flags defined at the global level are useful filters for removing low-quality OCO-2 retrievals in city-scale studies. All three models consistently suggest that the ratio medians are greater than 1, implying that the ODIAC product may be slightly underestimating CO 2ff emissions over Lahore.
We estimated the posterior (optimal) CO 2ff emissions over Lahore using a Bayesian inversion system at various prior and observation uncertainty levels and adjusted posterior emissions by monthly and weekday scaling factors. The prior CO 2ff emissions from ODIAC suggest that the Lahore 2014-2019 trend was about 646 kt C/year (i.e., an annual 5.9% increase), while the baseline of posterior CO 2ff emission trend was about 734 kt C/year (i.e., an annual 6.7% increase). The trend significant probability of posterior emissions was independent of prior uncertainty but decreases as the observation uncertainty decreases. This suggests that the emission trend is mostly driven by ODIAC emissions. To understand the effect of the number of valid satellite tracks, we also ran idealized simulations of p-values for 25 tracks assuming that posterior emissions and uncertainties equal to the prior values. The 10,000 Monte Carlo simulations of the Mann-Kendall upward trend test showed that less than 10% prior uncertainty for 8 tracks (or less than 20% prior uncertainty for 25 tracks) is required to achieve a greater-than-50% trend significant possibility at a 95% confidence level. It implies that the trend is driven primarily by the prior emissions and not the OCO-2 observations, with either 8 or 25 valid tracks over 5 years. The key to improving the role of satellites and models in emission trend detection is to obtain more valid tracks. This requires additional satellite measurements with an increased sampling frequency, wider swaths of satellite tracks, a better understanding of X CO2 retrievals over cities, and a parallel improvement of CO 2ff emission inversion methods.

Data availability
The following data in this study is available from public sources: NCAR Upper Air Database: https://rda.ucar.edu/datasets/ds370. GDAS0p5: ftp://arlftp.arlhq.noaa.gov/archives/gdas0p5/ All analysis code will be made available on request.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Table 1
Least-squares linear regression equations of posterior CO 2ff emissions over Lahore and trend significant probabilities at various uncertainty levels. Note: the trend significant probabilities are defined as the possibilities that the p-values of the Mann-Kendall upward trend test less are than 0.05 derived by 10,000 Monte Carlo simulation. R. Lei et al.