Assessing the optimized precision of the aircraft mass balance method for measurement of urban greenhouse gas emission rates through averaging

To effectively address climate change, aggressive mitigation policies need to be implemented to reduce greenhouse gas emissions. Anthropogenic carbon emissions are mostly generated from urban environments, where human activities are spatially concentrated. Improvements in uncertainty determinations and precision of measurement techniques are critical to permit accurate and precise tracking of emissions changes relative to the reduction targets. As part of the INFLUX project, we quantified carbon dioxide (CO2), carbon monoxide (CO) and methane (CH4) emission rates for the city of Indianapolis by averaging results from nine aircraft-based mass balance experiments performed in November-December 2014. Our goal was to assess the achievable precision of the aircraft-based mass balance method through averaging, assuming constant CO2, CH4 and CO emissions during a three-week field campaign in late fall. The averaging method leads to an emission rate of 14,600 mol/s for CO2, assumed to be largely fossil-derived for this period of the year, and 108 mol/s for CO. The relative standard error of the mean is 17% and 16% for CO2 and CO, respectively, at the 95% confidence level (CL), i.e. a more than 2-fold improvement from the previous estimate of ~40% for single-flight measurements for Indianapolis. For CH4, the averaged emission rate is 67 mol/s, while the standard error of the mean at 95% CL is large, i.e. ±60%. Given the results for CO2 and CO for the same flight data, we conclude that this much larger scatter in the observed CH4 emission rate is most likely due to variability of CH4 emissions, suggesting that the assumption of constant daily emissions is not correct for CH4 sources. This work shows that repeated measurements using aircraft-based mass balance methods can yield sufficient precision of the mean to inform emissions reduction efforts by detecting changes over time in urban emissions.

Urban environments also emit large amounts of methane (CH4) (Wunch et al., 2009; Wennberg et al., 2012; McKain et al., 2015), a potent greenhouse gas (GHG) with a global warming potential 28-34 times greater than that of CO2 over a 100-year time period (Myhre et al., 2013). Despite ocean and land sinks, ~55% of CO2 emissions accumulate in the atmosphere and can thus impact the global climate (Kirschke et al., 2013; Le Quéré et al., 2014). Quantification of both the magnitude and uncertainty of GHG emissions in urban environments is therefore critical for implementing coherent and effective policies to mitigate such emissions, and to reduce their effects on climate change (IEA, 2008; Hutyra et al., 2014).
Since the Copenhagen Accord in 2009, several countries have confirmed their commitment to reduce their GHG emissions (UNFCCC, 2010; President's Climate Action Plan, 2013, 2014a, 2014b; European Commission, 2014). During the recent United Nations Framework Convention on Climate Change negotiating session held in Paris in 2015 (COP21/CMP11), the United States announced a goal of 26-28% reduction in emissions by 2025 compared to 2005 levels, while China committed to a carbon dioxide emissions reduction of 60-65% per unit of gross domestic product by 2030 (China's INDC, June 2015). Europe adopted a reduction target of at least 40% below 1990 levels by 2030 (30-40% below 2005 depending on activity sectors; http://ec.europa.eu/clima/policies/strategies/2030/index_en.htm, accessed on 12/07/2015). Such goals are achievable, but reduction efforts need to be "measurable", "reportable", and "verifiable" (Vine and Sathaye, 1999; Schakenbach et al., 2006; NRC, 2010; UNFCCC, 2015). Measurements of GHG emissions often have significant uncertainties, ranging from less than 10% to 100% (NRC, 2010), depending on the methods used (e.g., bottom-up inventories, which assess and integrate emissions from specific sources and activities; top-down techniques, which rely on greenhouse gas concentration measurements conducted within the atmosphere), the type of sources and target gases, and the spatial scale considered (national, regional, and local) (Marland, 2008, 2012; NRC, 2010; Peylin et al., 2011; Kirschke et al., 2013; Cambaliza et al., 2014; Bergamaschi et al., 2015).
Uncertainties for GHG emissions at sub-national scales (e.g., city/county/state/province) are usually significantly greater (~50% to >100%; Gurney et al., 2009; Mays et al., 2009; NRC, 2010; Cambaliza et al., 2014, 2015) than those at national or continental scales (<25% at national levels; Marland, 2008, 2012; Gurney et al., 2009; NRC, 2010), and are in many cases larger than the emission reduction targets themselves, making it difficult to assess the efficacy of local-scale efforts. These large uncertainties can be partially explained by the dearth of local-scale measurements and method development, coupled with the general lack of local-scale mitigation efforts (Rosenzweig et al., 2010; Hutyra et al., 2014). However, local policy plans initiated by proactive cities and networks of local political decision-makers (e.g., World Mayors Council on Climate Change, Local Governments for Sustainability) have recently emerged and highlighted the urgent need for new and timely information on urban GHG emissions from the scientific community (Rosenzweig et al., 2010; Hutyra et al., 2014).
Urban emissions of CO2, CH4, and CO (a proxy for combustion) are usually derived from several natural and anthropogenic sources that are difficult to separate (Hutyra et al., 2014). Both fossil fuel- and biogenic-related CO2 concentrations in urban environments are affected by many sources and sinks. Fossil fuel-related CO2 is largely emitted by electricity generation, mobile source combustion, and point sources, such as large industrial facilities. Biogenic CO2 sources and sinks include biosphere respiration, biofuel use, and biomass burning. CO has been used as a tracer for anthropogenic sources (Parrish et al., 1993; Turnbull et al., 2011), and is emitted during incomplete combustion of fossil fuels and biofuels from vehicles, agricultural waste (not at Indianapolis), and industrial processes. CO is also produced by oxidation of biogenic volatile carbon compounds, especially during the summer, and is consumed in the atmosphere by reaction with the OH radical (Parrish et al., 1993). CH4 can be emitted from natural wetlands, or from anthropogenic sources (Bousquet et al., 2006; Kirschke et al., 2013), such as rice production (not at Indianapolis), ruminant animals (not at Indianapolis), natural gas infrastructure, biomass burning, landfills, and wastewater treatment plants. CH4 emission estimates are usually less certain than CO2 estimates because many CH4 sources are more diffuse, as a result of unintended release or more complex and temporally and spatially variable biological decay processes (Forster et al., 2007; Kirschke et al., 2013; President's Climate Action Plan, 2014b).
Improvement in the quality of GHG measurements represents an urgent challenge. The development of low-cost, precise emissions measurement techniques that can be applied to a range of urban environments is needed to provide scientifically sound information for future GHG mitigation strategies, whether local, regional or national. The multi-institution collaborative Indianapolis Flux Experiment project (INFLUX, http://sites.psu.edu/influx/) was designed to evaluate and minimize GHG emissions uncertainties at the city scale, by developing, assessing, combining, and improving top-down and bottom-up approaches to quantify urban GHG emissions. Indianapolis is an advantageous test area, due to its physical separation from adjacent cities by surrounding agricultural lands, effectively isolating the city from other major sources of anthropogenic pollution (Cambaliza et al., 2014, 2015). The INFLUX project involves various top-down approaches to estimate CO2, CH4, and CO emissions from the city using continuous measurements and flask sampling from 12 towers situated within the city and in the surrounding suburbs (Miles et al., 2015; Turnbull et al., 2012, 2015), periodic aircraft measurements (Mays et al., 2009; Cambaliza et al., 2014, 2015), inverse modeling, and a high-resolution model-data fusion product, Hestia, providing a bottom-up estimate of fossil fuel-related CO2 emissions for Indianapolis (Gurney et al., 2012). Integration of multiple top-down and bottom-up approaches is needed to converge on the most accurate, policy-relevant, and mechanistically representative emission estimate (Nisbet and Weiss, 2010).
As part of INFLUX, we have been quantifying the emission rates for CO2 and CH4 for the city of Indianapolis since 2008, and for CO since 2014, using a top-down aircraft-based mass balance approach. Top-down aircraft-based approaches are particularly effective at quantifying urban emissions because of their capability to sample the entire plume of a large emission source, with a short deployment time. Limitations of this method of measurement include initial costs, sporadic frequency of measurements, reliance on weather conditions (e.g. constant wind direction), and challenges in the definition of background conditions (e.g. White et al., 1976; Trainer et al., 1995; Lind and Kok, 1999; Kalthoff et al., 2002; Carras et al., 2002; Mays et al., 2009; Turnbull et al., 2011; O'Shea et al., 2014a, 2014b; Gioli et al., 2014; Cambaliza et al., 2014, 2015; Karion et al., 2015). However, aircraft-based measurement approaches have been successfully compared to bottom-up inventories (Cambaliza et al., 2014; Karion et al., 2015; Lamb et al., 2016). While accurate average results are achievable, individual flight realizations of emission rates have an uncertainty of ~±40% (Cambaliza et al., 2014).
In this paper, we describe our efforts to evaluate the potential of the mass balance experiment (MBE) approach to achieve useful precision through averaging of assumed random error. Karion et al. (2015) showed that, by averaging eight flight experiments, the precision of the mean for the MBE approach can be significantly improved when data are collected in a relatively short period of time, enabling the assumption of constant GHG emissions during the sampling period. They demonstrated that the averaged CH4 emission rate in the Barnett Shale region, which includes eight different counties with dense natural gas production and urban areas, can be known with a precision of ~28% using the 1-sigma standard deviation of the mean, and of 17% at the 95% confidence level (CL) using a statistical bootstrapping method, while the relative uncertainties they found for a single-flight MBE can be as high as ~40%. In this paper, we investigate the efficacy of this method at the city scale through averaging of nine MBEs conducted from November 13th to December 3rd 2014 (3 weeks). We present and discuss the results for CO2, CO, and CH4 emission rates for the nine MBEs, and their average and standard error of the mean at 95% CL. Finally, we discuss the variability of the emission rates, and areas for improvement of this method.

Study area
Indianapolis, Indiana (39.79°N, 86.15°W, ~240 m above sea level) is the 12th largest city in the United States with a population of 848,800 (2014 United States Census Bureau) (Figure 1). The metropolitan area, combining Indianapolis, Carmel and Anderson, had a population of 1,971,000 during the study period. Indianapolis is located on a flat plain, at least 120 km away from other metropolitan areas, and is surrounded in all directions by rural land use, primarily cropland. This isolated urban geography, often referred to as an "island city", is also accompanied by a relatively uniform inflow of boundary layer GHG concentrations, since emissions from upwind anthropogenic sources are relatively well-mixed when they reach Indianapolis (Cambaliza et al., 2014). These features make atmospheric measurements easier to interpret and simulate.

Aircraft instrumentation
CH4, CO, and CO2 emissions were quantified using an aircraft-based platform, combined with a mass balance approach. Flight experiments were performed using Purdue University's Airborne Laboratory for Atmospheric Research (ALAR, http://science.purdue.edu/shepson/research/bai/alar.html), a light twin-engine Beechcraft Duchess aircraft with an instrumentation compartment space of ~1 m³. This aircraft is equipped with i) a global positioning and inertial navigation system (GPS/INS), ii) a Best Air Turbulence (BAT) probe for wind measurements (Crawford and Dobosy, 1992; Garman et al., 2006, 2008), iii) a Picarro cavity ring-down spectrometer (CRDS, model G2401-m) for in-situ, real-time CO2, CO, CH4, and H2O measurements (Crosson, 2008; see Karion et al. (2013a) for more details on the instrument performance), iv) an in-flight CO2/CH4 calibration system, and v) a programmable flask package (PFP) system for discrete ambient air sampling (Karion et al., 2013a; Sweeney et al., 2015).
Wind speeds and wind directions were obtained at 50 Hz using data from the nine-port differential pressure BAT probe that extends from the nose of the aircraft. Atmospheric temperature was measured using a microbead thermistor located at the center of the probe. A fast-response thermocouple was also attached to the probe for comparison with the microbead observations. The measured pressure variations across the hemisphere of the probe were combined with 50 Hz spatial data from the GPS/INS and with data from the temperature sensors to obtain the three-dimensional wind vectors (Garman et al., 2006).
The CRDS measures gas concentrations at 0.5 Hz. Ambient air is pulled from the nose of the aircraft through a 5 cm diameter PFA Teflon tube at a flow rate of 1840 L/min (residence time: ~0.1 s) using a high-capacity blower located at the rear of the aircraft. The spectrometer is connected to the Teflon tubing using a tee and a 0.64 cm diameter Teflon inlet line, allowing ambient air to be continuously pumped through the analyzer at a flow rate of ~300 sccm (residence time: ~10 s). Just before the beginning and at the end of each MBE, both CO2 and CH4 were calibrated in-flight using three NOAA/ESRL reference cylinders. The certified mole fractions in each tank were: i) low concentrations: 368.02 ppm and 1781.05 ppb for CO2 and CH4, respectively, ii) medium concentrations: 410.73 ppm and 2222.42 ppb, iii) high concentrations: 447.11 ppm and 3261.49 ppb. The NOAA reproducibility (1σ) of the cylinder measurements is 0.03 ppm for CO2 and 0.35 ppb for CH4. The CRDS exhibits consistent reproducibility and linearity over time (Supplementary information: Figure S1). CO was calibrated at NOAA/ESRL before and after the field campaign, i.e. in Sept. 2014 and July 2015. The two calibrations exhibited the same calibration coefficients to within the uncertainty of the measurement, which was determined using in-flight calibrations, as described in Section S1.
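A three-point calibration of this kind can be illustrated with a least-squares line that maps analyzer readings onto the certified tank values. The sketch below is only an illustration: the certified CO2 mole fractions are those quoted above, whereas the analyzer readings, the function names, and the choice of a simple linear fit are assumptions, not the procedure documented for ALAR.

```python
# Certified CO2 mole fractions (ppm) of the three reference cylinders (from the text).
CERTIFIED_CO2_PPM = [368.02, 410.73, 447.11]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def calibrate(raw_ppm, slope, intercept):
    """Convert a raw analyzer reading to the calibration scale."""
    return slope * raw_ppm + intercept


# Hypothetical analyzer readings for the low, medium and high tanks.
measured = [367.90, 410.55, 446.90]

slope, intercept = fit_line(measured, CERTIFIED_CO2_PPM)
corrected = calibrate(400.00, slope, intercept)  # a raw in-flight reading
print(f"slope={slope:.5f}, intercept={intercept:.3f} ppm, corrected={corrected:.2f} ppm")
```

Comparing the coefficients obtained before and after a flight gives a simple check of analyzer drift during the experiment.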

Flight design and boundary layer height determination
Nine MBEs were performed during the late fall, on weekdays, from Nov. 13th to Dec. 3rd 2014 (Table S1). They were conducted when the convective boundary layer (CBL) was most likely fully developed and relatively constant in height throughout the duration of the experiment (about 4 hours on average), i.e. between 12:00 and 16:00 local time (Cambaliza et al., 2014). Prior to each flight, we ensured that morning wind conditions were sufficiently strong (wind speed at the surface and aloft) and consistent (wind direction) to avoid significant GHG accumulation in the boundary layer prior to our measurements. Wind conditions were monitored via forecasts (http://www.wunderground.com/, Terminal Aerodrome Forecast/TAF, Model Output Statistics/MOS) and real-time weather at Indianapolis (Automatic Weather Observation System/ASOS, METeorological Airport Report/METAR). For a typical MBE, the aircraft ground track was oriented perpendicular to the wind direction to intercept the polluted plume from the city (e.g. Carras et al., 2002). Three to five horizontal transects were performed at one downwind distance from the city and at different constant altitudes up to close to the top of the CBL (z_i) (Figure 1, Figure S2). Downwind transects were flown approximately 30 km from the city center, a distance far enough from the city that the plume was well-mixed but close enough to measure the urban plume above the background and to minimize mixing to and contact with the top of the boundary layer. The downwind transects allow for the construction of a two-dimensional plane onto which measurements are projected (e.g. Carras et al., 2002; Kalthoff et al., 2002; Cambaliza et al., 2014). Indianapolis is approximately 70 km in width and therefore transect lengths are extended to 80-100 km to allow the capture of downwind air from beyond the city limits for regional background concentration determination (Cambaliza et al., 2014).
One upwind horizontal transect (30 km upwind of the city center) was flown at a constant altitude (approximately 350 m above ground, Figure S2) prior to the downwind transects to identify possible significant CO, CO2, and CH4 sources upwind of Indianapolis, which would contribute to the observed city plume. When such a source was identified, and since only one upwind transect is performed per MBE, the emission rate of the upwind point source was calculated by a single-transect approach (Turnbull et al., 2011; Karion et al., 2013b). This emission rate is then subtracted from the final downwind emission rate exiting the city. A significant upwind source of CO2 from the Eagle Valley power plant was transported to the city on Nov. 19th, the only day on which upwind CO2 emissions were observed (Figure S2). No identifiable upwind CO emissions were observed during the field campaign. Depending on the wind direction, CH4 emissions from two different landfills located outside the city are evident in the sampled air and thus mixed into the city plume (TwinBridges if westerly winds, Caldwell if easterly winds, see Figures 1 and S2). We subtracted the averaged emission rate of the respective landfill reported in Cambaliza et al. (2017), when necessary. Specifically, we subtracted an emission rate of 12.2 mol/s estimated for the TwinBridges landfill (Cambaliza et al., 2017) from the city-wide CH4 emission rate obtained on Nov. 13th, 14th, 17th, 19th, 20th, 25th and Dec. 3rd (westerly winds, Figure S2), and 8.5 mol/s estimated for the Caldwell landfill (Cambaliza et al., 2015) on Nov. 21st (easterly winds, Figure S2). Emissions from the two landfills were not advected into the city plume on Dec. 1st (southerly winds). The CBL depth was determined using measured vertical profiles (VPs) of H2O, CO2 and CH4 concentrations and potential temperature (θ) (Figure S3).
When conditions allowed, VPs were performed from as close to the ground as is safe (usually 150 m above ground level (agl)), to above the top of the CBL and into the free troposphere (~1500 m agl). Typically, two vertical profiles are conducted for each experiment (e.g., before starting the upwind transect and after finishing the last downwind transect) to assess the change in CBL height over the course of the experiment (Figures 1 and S2, Table S1). The average of the upwind and downwind z_i from the two VPs is used as the upper bound of the vertical integration for the emission rate calculation (Eq. 2, see section 2.4). For days of thick cloud cover, only one VP (Nov. 17th and 19th) or no VP (Nov. 13th and 25th) was achieved (Table S1). To determine the full range of z_i for these particular days, we used observations from a Doppler LIDAR (Light Detection and Ranging), Halo, located on the roof of Ivy Tech Community College northeast of Indianapolis (GPS coordinates: 39.8615°N, 86.0038°W, http://www.esrl.noaa.gov/csd/groups/csd3/measurements/influx/) (Figures 1 and S4). The LIDAR began measuring boundary layer parameters in April 2013. A fixed scan pattern, repeated every 20-30 minutes, provides vertical profiles of horizontal wind speed and direction, vertical velocity variance and aerosol backscatter intensity. z_i values from the LIDAR were determined using the vertical velocity variance data available on the NOAA/ESRL website (see link above), after we ensured that both the vertical velocity variance and the aerosol backscatter signal strength provided similar results. From August 2013 to November 2014, nine VPs were flown above the LIDAR, allowing the comparison of z_i from both aircraft and LIDAR observations (Figure 2).
The linear fit (Figure 2), forced through zero, exhibits a slope equal to 1.00 ± 0.02 (1σ) with an R² = 0.90 (the Pearson coefficient is equal to 0.950, while the critical value is 0.798 at 99% CL, n-2 = 7), highlighting a good correlation between the two methods, with no significant bias. We thus considered that the full range of z_i for flights when only one or no VP was performed can be determined using the LIDAR observations. We averaged the z_i observations from the LIDAR corresponding to the duration of a single MBE, and used this average in Eq. 2 for emission rate integration.
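The comparison described above can be sketched as a least-squares fit with the intercept fixed at zero, together with a Pearson correlation checked against the critical value quoted in the text. The nine paired z_i values below are hypothetical placeholders, not the observations shown in Figure 2, and the function names are illustrative.

```python
from math import sqrt


def slope_through_origin(x, y):
    """Least-squares slope of y = slope * x (intercept forced to zero)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical boundary layer heights (m agl) from nine paired profiles.
z_aircraft = [650, 800, 950, 1100, 1200, 1350, 1500, 900, 1050]
z_lidar = [700, 760, 1000, 1050, 1250, 1300, 1480, 950, 1000]

slope = slope_through_origin(z_aircraft, z_lidar)
r = pearson_r(z_aircraft, z_lidar)

R_CRITICAL_99 = 0.798  # critical value quoted in the text for n - 2 = 7
print(f"slope={slope:.3f}, r={r:.3f}, significant={r > R_CRITICAL_99}")
```

A slope close to 1 indicates no systematic bias between the two z_i determinations, while the correlation test indicates whether the agreement could plausibly arise by chance.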

Aircraft emission rate calculation and background selection
CH4, CO2, and CO concentrations (Figure S5), temperature, pressure and perpendicular wind speed recorded on the downwind transects were used to calculate the emission rates (in mol/s) from the city. First, fluxes (F_ijk, in mol/m²/s) were calculated at each data point (Figure S6) using the following formula:

$$F_{ijk} = \left(C_{M,ijk} - C_{bg,ijk}\right) \times U_{ijk} \qquad \text{(Eq. 1)}$$

where C_M,ijk is the concentration measured for a specific latitude (i), longitude (j) and height (k) and converted to mol/m³ using the ideal gas law (pressure and temperature are used to calculate molar density), C_bg,ijk represents the background concentration (mol/m³, see below for more details), and U_ijk is the 10 s averaged perpendicular wind speed (m/s, Cambaliza et al., 2014). The term (C_M,ijk - C_bg,ijk), also referred to as E_ijk in the supplementary information, represents the enhancement from the city. All the single-point horizontal fluxes are then projected onto a 2-D plane, and interpolated from the ground to the top of the boundary layer (z_i, in m), and to the edges of the downwind distance flown by the airplane, using a multi-transect kriging approach at 10 m (z-axis) × 100 m (x-axis) resolution. The emission rate is then obtained by integrating the interpolated fluxes over this plane:

$$ER = \int_{0}^{z_i} \int_{-x}^{+x} F_{ijk} \, dx \, dz \qquad \text{(Eq. 2)}$$

Here, ER (mol/s) is the integrated emission rate for one MBE, -x and +x are the effective horizontal boundaries of the city, and dx and dz are the discrete horizontal and vertical distances (in m), respectively, over which the emission rate values are calculated. Whereas in Eq. 1 we calculate the flux through a vertical plane downwind of the city, in Eq. 2 we sum the flux in each interpolated pixel of that plane to calculate the total integrated emission rate from the city for an MBE. This produces a result which is consistent with what we observe, which is the total signal from all emission sources in the city. This distinguishes it from a surface flux, which can be calculated as a spatial average, e.g. as done by Mays et al. (2009).
However, the surface fluxes are spatially highly heterogeneous (the power plant is a point source that represents roughly a third of the total, and mobile source emissions follow the major roads), and so calculated fluxes are not readily comparable between observations, while total emissions are much more so. In addition, the area used by Mays et al. (2009) represents an arbitrary choice of the definition of city boundaries. Thus we report only the total urban emissions for Indianapolis, in mol/s. While entrainment/detrainment is possible, this has not been observed to date in our downwind vertical profiles. Typically, the boundary layer height increases by no more than 15%, and often less, during the course of the experiment. Because of this and the relatively short distance of the downwind transects from city sources (~30 km), chosen to minimize time for full mixing to the top of the boundary layer, our method assumes that entrainment/detrainment effects are minimal. We should also note that since we designed these flights for the fully developed boundary layer afternoon period, our measurements reflect emission rates for that time of day, as well as the fall season. We also calculated enhancements, and then fluxes, after interpolation of the data (kriging), i.e. at each gridded cell using interpolated GHG concentrations, pressure, temperature and perpendicular wind speed (Mays et al., 2009; Cambaliza et al., 2014, 2015), to assess any changes in emission rates due to the choice of the kriging approach. Results from both methods are presented in the "Results and Discussion". Concentrations from both edges of the downwind transects are considered to be equivalent to the inflow of background air on the upwind side of the city, as previously defined by Mays et al. (2009) and Cambaliza et al. (2014, 2015) for aircraft measurements at Indianapolis. This approach was also used by Karion et al. (2015) for the Barnett Shale region.
The location and distance of the transect edges were determined using a technique similar to that proposed by Cambaliza et al. (2014), i.e. by i) projecting the city boundaries onto the downwind transects using the observed mean wind direction, ii) observing the Gaussian shape of the CO2, CH4 and CO plumes recorded on these transects, and iii) defining the "edge" as the area of the transect where mixing ratios decrease to a constant concentration, and are likely not influenced by urban emissions. For all the flights, plumes from the city are well defined and concentrations decrease before reaching a constant concentration on both edges of the downwind transects. These edge concentrations are used for background determination (Figures 3, S5).
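The flux and emission rate calculation (Eq. 1 and Eq. 2) can be sketched as follows, assuming that the point flux is the enhancement in mol/m³ multiplied by the perpendicular wind speed, and that the integral is approximated by a sum over 10 m × 100 m cells. The kriging step is omitted, and the uniform plume, the meteorological values and the function names are hypothetical.

```python
R_GAS = 8.314462618  # molar gas constant, J/(mol K)


def molar_concentration(mole_fraction_ppm, pressure_pa, temperature_k):
    """Convert a mole fraction (ppm) to mol/m^3 using the ideal gas law."""
    air_molar_density = pressure_pa / (R_GAS * temperature_k)  # mol air per m^3
    return mole_fraction_ppm * 1e-6 * air_molar_density


def point_flux(c_meas_ppm, c_bg_ppm, pressure_pa, temperature_k, u_perp_ms):
    """Eq. 1: enhancement over background times perpendicular wind speed."""
    c_m = molar_concentration(c_meas_ppm, pressure_pa, temperature_k)
    c_bg = molar_concentration(c_bg_ppm, pressure_pa, temperature_k)
    return (c_m - c_bg) * u_perp_ms  # mol/m^2/s


def emission_rate(flux_grid, dx_m=100.0, dz_m=10.0):
    """Eq. 2 as a discrete sum of flux * cell area over the gridded plane."""
    return sum(f * dx_m * dz_m for row in flux_grid for f in row)


# Hypothetical uniform plume: 2 ppm CO2 enhancement, 40 km wide, 1000 m deep,
# at 950 hPa and 280 K, with a 5 m/s perpendicular wind.
flux = point_flux(402.0, 400.0, 95000.0, 280.0, 5.0)
flux_grid = [[flux] * 400 for _ in range(100)]  # 100 levels x 400 columns
er = emission_rate(flux_grid)
print(f"flux={flux:.3e} mol/m2/s, ER={er:.0f} mol/s")  # roughly 1.6e4 mol/s
```

In the procedure described above, the grid of fluxes comes from kriging the irregularly spaced point fluxes rather than from a uniform fill, so this sketch only shows the unit conversion and the integration.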
In the interest of consistency, we defined backgrounds in two ways for all the flights. First, background concentrations were defined as a single horizontally constant value, but varying with altitude (Cambaliza et al., 2014). This choice was motivated by the fact that significant background vertical gradients are observed for several flights, as demonstrated on Dec. 3rd for CO2, CH4 and CO (Figure S5, see also vertical profiles on Figure S3), suggesting that altitude-dependent background concentrations, which might also vary with time and the growth of the CBL, should be determined for each individual downwind transect. Additionally, for some downwind transects, we observed that background concentrations on one edge can be significantly different from background concentrations observed on the opposite edge, suggesting a horizontal gradient in background concentration along the transect (see Nov. 25th, Figures 3 and S5). Therefore, backgrounds were also defined using a linear function passing through median concentrations of the two sets of edge concentrations for each downwind transect (Figures 3, S5). The slope and the y-intercept of the linear function are used to calculate background concentrations at each data point where a measurement was recorded (C_bg,ijk in Eq. 1). Consequences of background determination on emission rate results are discussed in the "Results and Discussion".
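The linear background can be sketched for a single transect as a line through the medians of the two edge segments. The text specifies the median concentrations; placing each median at the median along-transect distance of its edge segment is an assumption made here for illustration, and the edge data and function names are hypothetical.

```python
from statistics import median


def linear_background(left_edge, right_edge):
    """Slope and intercept of the line through the medians of both edges.

    Each edge is a list of (distance_km, concentration) pairs.
    """
    x1 = median([d for d, _ in left_edge])
    c1 = median([c for _, c in left_edge])
    x2 = median([d for d, _ in right_edge])
    c2 = median([c for _, c in right_edge])
    slope = (c2 - c1) / (x2 - x1)
    intercept = c1 - slope * x1
    return slope, intercept


def background_at(distance_km, slope, intercept):
    """Background concentration at a given position along the transect."""
    return slope * distance_km + intercept


# Hypothetical CO2 edge data (distance from plume center in km, ppm).
left = [(-45.0, 400.1), (-40.0, 400.0), (-35.0, 399.9)]
right = [(35.0, 401.9), (40.0, 402.0), (45.0, 402.1)]

slope, intercept = linear_background(left, right)
enhancement = 405.0 - background_at(0.0, slope, intercept)
print(f"slope={slope:.4f} ppm/km, intercept={intercept:.2f} ppm, "
      f"enhancement at center={enhancement:.2f} ppm")
```

With a single constant background taken from one edge only, the same measurement would yield an enhancement of 5 ppm or 3 ppm depending on the edge chosen, which illustrates why a horizontal gradient matters.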
For the Nov.-Dec. 2014 field campaign, we assumed constant emissions from CO2, CH4 and CO sources at Indianapolis such that each of the nine MBEs can be considered as a statistically independent repeated sampling of carbon emissions from the city. This is a reasonable assumption for CO and CO2 at a given climate condition for weekdays, and for CH4, given the diversity of sources in the city, either biogenic, or from the natural gas distribution system (Lamb et al., 2016). Emission rates from the nine MBEs were then averaged and the standard error of the mean at 95% CL (referred to as SEM95 hereafter) was calculated as SEM95 = t × s / √n, where t = 2.306 is the Student's t value (two-sided 95% CL, n - 1 = 8 degrees of freedom), s is the sample standard deviation from the nine MBEs, and n is the number of experiments.
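The SEM95 calculation can be sketched as follows, using the Student's t value of 2.306 quoted above for nine experiments. The emission rates in the example are hypothetical placeholders, not the measured values reported in Table 2, and the function name is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

T_95_DF8 = 2.306  # Student's t value quoted in the text for n = 9


def sem95(values, t_value=T_95_DF8):
    """Standard error of the mean at 95% CL: t * s / sqrt(n)."""
    return t_value * stdev(values) / sqrt(len(values))


# Hypothetical emission rates (mol/s) for nine experiments.
rates = [10500, 12000, 13000, 14000, 14500, 15500, 16000, 17500, 19500]

average = mean(rates)
half_width = sem95(rates)
print(f"mean={average:.0f} mol/s, SEM95={half_width:.0f} mol/s "
      f"({100 * half_width / average:.0f}%)")
```

The default t value applies only to nine experiments; a different number of flights requires the t value for the corresponding degrees of freedom.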

Method comparison
To optimize our analysis method, data were processed using two different kriging approaches and background determinations. The background was defined i) as a single horizontal constant background concentration varying with altitude (BG_avg) (Cambaliza et al., 2014) and ii) as a linear function to account for any horizontal gradient in background concentrations along the length of the downwind transects (BG_linReg). We also interpolated i) concentrations, temperature, pressure and winds before calculating fluxes at each gridded cell (kriging_CTPU) (Mays et al., 2009; Cambaliza et al., 2014, 2015) or ii) individual point determinations of the calculated fluxes (kriging_F_ijk). Calculated average emission rates for CO2, CO, and CH4 are presented in Table 1, as a function of background selection and interpolation approach. When averaged, CO2, CO, and CH4 emission rates are statistically indistinguishable using either BG_avg or BG_linReg and kriging_F_ijk or kriging_CTPU, as shown in Table 1. When BG_avg is used with the kriging_F_ijk and kriging_CTPU approaches, the CO2 SEM95 is equal to 45% and 39%, respectively, i.e. poorer than the 17% SEM95 (kriging_F_ijk) and 25% SEM95 (kriging_CTPU) found when BG_linReg is applied. This result suggests that linear functions used to determine background concentrations greatly reduce variability of CO2 emission rates calculated from replicate measurements. However, the SEM95 is similar for CO and CH4 regardless of the method of background determination. The use of kriging_CTPU or kriging_F_ijk does not significantly change the SEM95 of the method for CO2 and CO. For CH4, the SEM95 is improved by ~20% using kriging_CTPU instead of kriging_F_ijk.
However, the CH4 SEM95 remains large (41-60%, Table 1) compared to those for CO2 and CO for the same set of flights, suggesting that the CH4 emission rate variability is driven by factors other than the data analysis approach, and that for CH4, the assumption of constant emission rate is not robust. In the following, we use the kriging_F_ijk approach with BG_linReg, since these two choices appear to yield the lowest SEM95 for the averaged CO2 and CO emission rates. Cambaliza et al. (2014) demonstrated that the choice in background concentration determination is one of the most sensitive parameters impacting uncertainties and variability in the emission rate results. Of course, uncertainties in the calculated emission rates due to background depend significantly on the magnitude of the enhancement, i.e. the uncertainties are larger for smaller enhancements. For the Nov.-Dec. 2014 flights, we observed that a change in background determination can lead to a significant change in the emission rate of a single MBE. When the kriging_F_ijk method is applied, the average difference between emission rates of an MBE calculated using BG_linReg and using BG_avg is 50% for CO2 (median = 9%), 32% for CH4 (median = 18%) and 13% for CO (median = 9%), and ranges from 1% to 198%, 7% to 150% and 1% to 35%, respectively (Figure 4, Table S2). The most significant differences are observed on Nov. 17th and 19th for CO2, and Dec. 3rd for CH4 (Figure 4, Table S2). However, given the presence of spatial gradients in background, we regard it best to use the linear regression approach.

[Figure 3 caption, partially recovered: ... Table S1). Dashed lines represent the background (as a linear regression) for each downwind transect. Higher background concentrations on the southern part of transects (positive distances) might be attributed to the Eagle Valley power plant plume (see Figure 1). Vertical black lines represent the limits between background (BG, edges) and the city plume. DOI: https://doi.org/10.1525/elementa.134.f3]

These results do indicate that the CH 4 emission rate determination is not more sensitive to background than is CO 2 , supporting our conclusions that the actual CH 4 emission rates are variable day-to-day.
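As a minimal sketch of the two background definitions compared above, the following Python fragment fits either a constant (BG avg ) or a least-squares line (BG linReg ) to the transect-edge points and subtracts it from the measured concentrations. The function names, the edge/plume split and all numbers are hypothetical and chosen purely for illustration; they are not INFLUX data, and the actual processing also varies the background with altitude.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def enhancements(dist_km, conc, is_edge, method="linreg"):
    """Concentration minus background at every point of one transect.

    is_edge flags the transect-edge points treated as background.
    method="avg" uses their mean; method="linreg" fits a line through them.
    """
    edge_x = [d for d, e in zip(dist_km, is_edge) if e]
    edge_c = [c for c, e in zip(conc, is_edge) if e]
    if method == "avg":
        bg = [mean(edge_c)] * len(conc)
    else:
        slope, intercept = fit_line(edge_x, edge_c)
        bg = [intercept + slope * d for d in dist_km]
    return [c - b for c, b in zip(conc, bg)]

# Hypothetical transect (assumption): background rising 0.1 ppm per km
# toward positive distances, plus a uniform 5 ppm city plume in the middle.
dist = [-20, -15, -10, -5, 0, 5, 10, 15, 20]
edge = [True, True, True, False, False, False, False, True, True]
conc = [400 + 0.1 * d + (0 if e else 5) for d, e in zip(dist, edge)]

enh_lin = enhancements(dist, conc, edge, "linreg")
enh_avg = enhancements(dist, conc, edge, "avg")
```

In this constructed case the linear background recovers the imposed 5 ppm enhancement, whereas the constant background biases the in-plume enhancement because it ignores the gradient across the transect.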
Figure caption (partially recovered): [...] and CO emission rates from the nine mass balance experiments (MBEs) i) when the emission rates were calculated by kriging the point fluxes (kriging_F ijk ) and ii) when they were calculated and integrated after interpolation of concentrations, pressure, temperature and perpendicular wind speed (kriging_CTPU). For both interpolation methods, backgrounds were defined iii) as linear regression (BG linReg ) and iv) as averages (BG avg ).

We also observed differences in calculated emission rates for a single MBE depending on the choice of the kriging method (for the same background determination) (Figure 4). For example, when BG linReg is used, the averaged relative difference between the two kriging methods for a single flight is equal to 25% (median = 14%, range = 3%-62%) for CO 2 , 25% (median = 21%, range = 6%-58%) for CH 4 and 24% (median = 17%, range = 3%-66%) for CO (Table S2). However, we believe it is most logical to interpolate the individual fluxes, since this interpolation is conducted only for one data set (i.e. calculated point fluxes).
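The per-flight comparison statistics quoted above (mean, median and range of the relative difference between two processing choices) can be sketched as follows. The text does not state which estimate serves as the denominator, so normalizing by the second method is an assumption here, and the three emission rates are hypothetical placeholders rather than Table S2 values.

```python
from statistics import mean, median

def relative_differences(rates_a, rates_b):
    """Per-flight absolute difference, in percent of the second estimate (assumed convention)."""
    return [abs(a - b) / b * 100.0 for a, b in zip(rates_a, rates_b)]

# Hypothetical emission rates (mol/s) for the same three flights from two methods
er_fijk = [12000.0, 15000.0, 18000.0]
er_ctpu = [10000.0, 15000.0, 20000.0]

diffs = relative_differences(er_fijk, er_ctpu)
summary = {
    "mean": mean(diffs),
    "median": median(diffs),
    "range": (min(diffs), max(diffs)),
}
```

Reporting the median alongside the mean is useful because one or two flights with a very large difference can dominate the mean, as the CO 2 background comparison (mean 50%, median 9%) illustrates.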
3.2 Improvement of CO 2 fossil fuel and CO emission rate precision through averaging
Table 2 summarizes emission rate results of the nine MBEs obtained from Eq. 1 and 2 (for kriging_F ijk and BG linReg ). Individual CO 2 , CO, and CH 4 emission rates vary from 10,200 to 20,200 mol/s, 62 to 139 mol/s, and 16 to 189 mol/s, respectively (Figure 4). CO 2 and CH 4 emission rates are in the range of emission rates previously reported in Mays et al. (2009) and Cambaliza et al. (2014, 2015) for Indianapolis (from 2,500 to 49,000 mol/s for CO 2 and 12 to 230 mol/s for CH 4 ). When averaged, our measured emission rates are equal to 14,600 mol/s for CO 2 , 108 mol/s for CO and 67 mol/s for CH 4 . The average CO 2 emission rate for eight non-growing-season measurements in Mays et al. (2009) [...] use aircraft-based measurements and the mass balance approach to determine the CO 2 , CH 4 and CO emission rates from London, UK. They measured emission rates of 36,000 ± 3300 mol/s, 240 ± 16 mol/s and 220 ± 8 mol/s for CO 2 , CH 4 and CO, respectively. These emission rates are considerably greater than those measured for Indianapolis, which is likely due, in part, to the larger population of London (approximately 8.6 million) and its older infrastructure. The uncertainty for a single MBE (∆ER, see Text S1 in supplementary information, Table S3) was estimated by propagating i) the measurement uncertainties of each term involved in Eq. 1 (i.e. uncertainties in pressure and temperature, which were used to convert into units of mol/m 3 , uncertainties of the calibration and of the linear fit for background determination, and uncertainties in the wind speed) and ii) the uncertainty of the CBL height due to growth during the experiment.
Our uncertainty estimate represents a lower limit of the uncertainty associated with a single MBE, since some factors known to influence emission rate results, such as entrainment from the free troposphere and interpolation of data points using the kriging approach (Cambaliza et al., 2014), were not accounted for. Relative uncertainties (RSD% = ∆ER/ER) calculated from the uncertainty estimate of single MBEs vary between 23% and 91% (average = 44%) for CO 2 , 24% and 81% (average = 42%) for CO, and 25% and 153% (average = 67%) for CH 4 (Table 1). When emission rates are averaged over the nine MBEs, the SEM95 (representing the expected precision for replication of a nine-point [...]) is equal to 17% for CO 2 , 16% for CO, and 60% for CH 4 . Although the CH 4 variability is large, the precision of the mean for CO 2 and CO, when calculated from several MBEs performed in a short period of time (so that the emissions can be assumed to be constant), represents a significant improvement compared to the measurement uncertainty estimated for a single MBE measurement (shown by Cambaliza et al. (2014) to be 40-50%). This averaging approach represents a considerable improvement compared to the commonly reported uncertainties of GHG emission measurements from urban environments (~50% up to 100%; Trainer et al., 1995; Gurney et al., 2009; Mays et al., 2009; NRC, 2010; Turnbull et al., 2011; Cambaliza et al., 2014, 2015). Since the results for CH 4 are very different, for the same set of flight data, it seems clear that the emission rate variability is substantially different for CH 4 . The proposed averaging method has several potential limitations. All of the MBEs in this study were performed during the late fall when the total CO 2 enhancement from Indianapolis can be attributed to urban fossil fuel-related sources and biogenic CO 2 can be considered negligible (Turnbull et al., 2015).
Also during the late fall, the mobile sector is the dominant source of CO emissions (Turnbull et al., 2015). The interpretation of the SEM95 is also based on the assumption that fossil fuel-related CO 2 , CO, and CH 4 emissions from Indianapolis are constant during the three-week sampling period. In reality, some daily variability in urban emissions occurs (e.g. due to variability in ambient temperature and subsequent heating and electric power requirements), which may contribute to a larger (poorer) apparent estimated precision. Thus the SEM95 results obtained here are upper limits to the method precision for the period of year considered here, and it is likely that the true method precision is better than reflected in the SEM95 for CO 2 and CO.
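To make the averaging statistic concrete, the sketch below computes a mean and its SEM95 from replicate single-flight emission rates. It assumes that SEM95 is the half-width of the two-sided 95% Student-t confidence interval of the mean, t × s / √n, which this section does not spell out. The nine emission rates are hypothetical placeholders, not the Table 2 values, and the function name `sem95` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1)
T_975 = {4: 2.776, 8: 2.306, 15: 2.131}

def sem95(rates):
    """Return (mean, absolute SEM95, relative SEM95 in percent) for replicate rates.

    Assumption: SEM95 = t(0.975, n-1) * sample standard deviation / sqrt(n).
    """
    n = len(rates)
    m = mean(rates)
    half_width = T_975[n - 1] * stdev(rates) / sqrt(n)
    return m, half_width, 100.0 * half_width / m

# Nine hypothetical single-flight emission rates (arbitrary units)
flights = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0]
avg, abs_sem95, rel_sem95 = sem95(flights)
```

Because the sample standard deviation includes any real day-to-day change in emissions as well as measurement scatter, a statistic computed this way cannot separate the two, which is why the SEM95 is treated in the text as an upper limit to the method precision.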

Variability of CH 4 emissions
As discussed above, CH 4 emissions for Indianapolis are considerably more variable day-to-day than are those of CO 2 and CO. The averaged CH 4 emission rate (67 ± 40 mol/s) found for late fall 2014 is close to the emission rate calculated from a bottom-up inventory (57 mol/s, Lamb et al., 2016) built with measurements of selected sources in the city, including natural gas distribution facilities, landfills and waste water treatment facilities. The range of the CH 4 emission rate for the late fall overlaps with but is significantly lower than the range reported for spring-summer 2011 at Indianapolis (135 ± 58 mol/s, averaged from five airborne MBEs; Cambaliza et al., 2015), implying considerable variability for methane.
As shown by Cambaliza et al. (2015), the large enhancement of CH 4 signal from the south side of Indianapolis is attributed to the Southside landfill (SSLF, Figure 1), which represents a significant portion of the city-wide CH 4 emissions. We used wind direction data recorded by the BAT probe and the Hybrid Single Particle Lagrangian Integrated Trajectory Model (HYSPLIT, Draxler and Rolph, 2012) to confirm that the large CH 4 enhancements observed in fall 2014 (Figure S5b) can also be attributed to the SSLF emissions. Cambaliza et al. (2015) determined that the SSLF represented, on average, 33% of the total CH 4 emissions from Indianapolis. Lamb et al. (2016) reported an average CH 4 emission rate of 31 mol/s for the SSLF, using ground-based mobile sampling for inverse plume modeling, and 30 mol/s is obtained from the EPA's GHG Reporting Program. If we assume that the CH 4 average we obtained (67 mol/s) for the city-wide total is representative, the Lamb et al. (2016) value for the SSLF represents 46% of the city-wide total. Variability of CH 4 emissions from landfills is partially dependent on the local climate, which can influence the seasonal oxidation of landfill covers (Xu et al., 2014; Spokas et al., 2015). To better understand the variability of the SSLF emissions, we considered several factors known to influence landfill emissions, such as change in barometric pressure over time, air temperature and precipitation (Xu et al., 2014; Spokas et al., 2015). However, no particular correlations were found between aircraft-determined CH 4 emission rates (the part attributed to the landfill) and average atmospheric temperature recorded at Indianapolis International Airport during the experiments (data downloaded from http://www.ncdc.noaa.gov/qclcd/QCLCD, accessed on 10/05/2015), or with relative humidity and barometric pressure. Lamb et al. (2016) report that the uncertainty in their determination of the SSLF emission rate is ±30%, or 9 mol/s.
In contrast, the SEM95 for the city-wide total we determined is 40 mol/s (60%), i.e. much larger than the SSLF uncertainty, and larger than the average SSLF emission rate reported by Lamb et al. (2016). The total CH 4 variability must, then, derive from CH 4 sources other than the landfill and must be large enough to explain the high variability of the city-wide CH 4 emission rate (SEM95 of 60%). Using the ratio of propane-to-methane ((C 3 H 8 )/CH 4 ) concentrations from aircraft flask samples and the slope of the C 3 H 8 vs. CH 4 regression, Cambaliza et al. (2015) demonstrated that the non-landfill-related CH 4 emissions from Indianapolis represent ~67% of the total CH 4 emissions and could be attributed to the natural gas distribution system. Lamb et al. (2016) estimated a contribution of natural gas sources of 43% from ethane/methane observations coupled with inverse modeling. The authors also found that emissions from natural gas systems can vary by three orders of magnitude, depending on the type of structures in the natural gas system (i.e., pipeline, transmission and storage station, etc.). Emissions from the natural gas system at Indianapolis are potentially due to random small sources (Lamb et al., 2016) and can be highly spatially and temporally variable, which likely explains the variability of CH 4 emission rates we observed in Nov.-Dec. 2014. However, to explain our observed variability, we would need to turn to large GHG contributors, among which are the natural gas sources.
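The arithmetic behind the landfill comparison can be checked with a few lines of Python. All four input values are the ones quoted in the text; the variable names are illustrative.

```python
# Values quoted in the text (mol/s unless noted)
city_total = 67.0      # averaged city-wide CH4 emission rate, this study
city_sem95 = 40.0      # SEM95 of the city-wide average
sslf_rate = 31.0       # SSLF emission rate, Lamb et al. (2016)
sslf_rel_unc = 0.30    # SSLF relative uncertainty, Lamb et al. (2016)

sslf_share = sslf_rate / city_total        # landfill fraction of the city-wide total
sslf_abs_unc = sslf_rel_unc * sslf_rate    # landfill uncertainty in mol/s
city_rel_sem95 = city_sem95 / city_total   # relative SEM95 of the city-wide average

print(f"SSLF share of city total: {sslf_share:.0%}")
print(f"SSLF uncertainty: {sslf_abs_unc:.1f} mol/s")
print(f"City-wide relative SEM95: {city_rel_sem95:.0%}")
```

The landfill share comes out at 46%, its uncertainty at roughly 9 mol/s, and the city-wide relative SEM95 at 60%, so the city-wide scatter (40 mol/s) exceeds both the landfill uncertainty and the landfill emission rate itself.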

Conclusions
From nine mass balance experiments performed in Indianapolis in Nov.-Dec. 2014, we quantified averaged CO 2 , CO, and CH 4 emission rates (14,600, 108 and 67 mol/s, respectively). The SEM95 results were equal to 17% for CO 2 and 16% for CO (at the 95% CL), i.e. much lower than for the single-flight precision (~40%) found for Indianapolis (Cambaliza et al., 2014), and for the estimated single-flight uncertainties for the results reported here. We applied averaging to improve estimation of CO 2 and CO urban emission rates during the late fall, when CO 2 and CO emissions are primarily of anthropogenic origin. During the summer, poorer apparent precision would be expected due to the greater CO 2 and CO emissions variability from biogenic emission contributions. A field campaign similar to the one presented in this study (multiple MBEs flown in a short period of time, to allow for an assumption of constant emission rates over the time period) should be performed during the spring and summer seasons to evaluate the averaging method when biogenic CO 2 and CO emissions are not negligible. While the precision obtained in this mass balance experiment is approaching GHG reduction target values, we expect averaged emission rate precisions to be further improved with increasing numbers of airborne MBEs performed over a short period of time. Precisions might also be further improved by either the use of multiple aircraft flying simultaneously, or one aircraft flying at the top of the boundary layer and employing a downward-looking Differential Absorption LIDAR (Dobler et al., 2013;Abshire et al., 2014). Thus the target precision is technically attainable, the main limitation being the cost of the experiment.
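As a rough illustration of how the precision of the mean might improve with more flights, the sketch below rescales the 17% CO 2 SEM95 obtained from nine MBEs to larger campaigns. It rests on several assumptions not established in the text: that SEM95 is the Student-t half-width t × s / √n, that the relative flight-to-flight scatter stays the same, and that flights are independent with constant emissions. The function name `projected_sem95` is hypothetical.

```python
from math import sqrt

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1)
T_975 = {8: 2.306, 15: 2.131, 24: 2.064}

def projected_sem95(rel_sem95_now, n_now, n_new):
    """Rescale a relative SEM95 (%) from n_now to n_new flights at constant relative scatter."""
    # Back out the implied relative standard deviation of single flights
    rel_sd = rel_sem95_now * sqrt(n_now) / T_975[n_now - 1]
    return T_975[n_new - 1] * rel_sd / sqrt(n_new)

for n in (9, 16, 25):
    print(n, round(projected_sem95(17.0, 9, n), 1))
```

Under these assumptions the projected SEM95 falls from 17% at nine flights to about 12% at 16 flights and about 9% at 25 flights, so the gain per additional flight diminishes as the campaign grows.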
Previous aircraft and surface-based measurements have suggested that most of the remaining CH 4 (~2/3 of the city-wide emissions) likely derives from the city's natural gas distribution system (Cambaliza et al., 2015; Lamb et al., 2016). Since the Southside Landfill is a minor component of the total, and the remainder is believed to be derived from the natural gas network, it appears that the variability in our observations of the city-wide total may be attributed to variability in the nature and magnitude of individual leaks in that system. Surface-based measurements of methane, ethane, and δ 13 C-CH 4 may allow us to differentiate between CH 4 sources, complementing the aircraft-based total emission rate measurement. For this city, evaluation of emission mitigation progress for methane may be difficult because of the large day-to-day variability in the source strength.

Data Accessibility Statement
Data presented in this paper are available on the INFLUX webpage (http://sites.psu.edu/influx/) and upon request from the corresponding author.