Comparisons of the Orbiting Carbon Observatory-2 (OCO-2) X_(CO_2) measurements with TCCON

NASA's Orbiting Carbon Observatory-2 (OCO-2) has been measuring carbon dioxide column-averaged dry-air mole fraction, X_(CO_2), in the Earth's atmosphere for over 2 years. In this paper, we describe the comparisons between the first major release of the OCO-2 retrieval algorithm (B7r) and X_(CO_2) from OCO-2's primary ground-based validation network: the Total Carbon Column Observing Network (TCCON). The OCO-2 X_(CO_2) retrievals, after filtering and bias correction, agree well when aggregated around and coincident with TCCON data in nadir, glint, and target observation modes, with absolute median differences less than 0.4 ppm and RMS differences less than 1.5 ppm. After bias correction, residual biases remain. These biases appear to depend on latitude, surface properties, and scattering by aerosols. It is thus crucial to continue measurement comparisons with TCCON to monitor and evaluate the OCO-2 X_(CO_2) data quality throughout its mission.


Introduction
The Orbiting Carbon Observatory-2 (OCO-2) is NASA's first Earth-orbiting satellite dedicated to observing atmospheric carbon dioxide (CO_2) to better understand the carbon cycle. The mission's main goal is to measure carbon dioxide with enough precision and accuracy to characterize its sources and sinks on regional scales and to quantify its seasonal and interannual variability (Boland et al., 2009; Crisp, 2015). OCO-2 was successfully launched on 2 July 2014 into low-Earth orbit, and its grating spectrometers measure near-infrared spectra of sunlight reflected off the Earth's surface in three spectral regions (centered at 0.765, 1.61, and 2.06 µm). Carbon dioxide and oxygen (O_2) in the Earth's atmosphere absorb sunlight at well-known wavelengths in the three spectral regions. By fitting those absorption features using an optimal estimation retrieval algorithm described in detail by O'Dell et al. (2012) and Connor et al. (2008), atmospheric abundances of carbon dioxide and surface pressure are retrieved along with other atmospheric and surface properties (e.g., cloud and aerosol optical depth and distribution, water vapor, temperature, and surface reflectance).
The main product from the retrieved abundances of carbon dioxide and surface pressure is the column-averaged dry-air mole fraction of CO_2, called X_(CO_2), which is the ratio of CO_2 to the dry surface pressure. The X_(CO_2) quantity is useful for carbon cycle science, as it is used to directly infer surface fluxes of CO_2, and is relatively insensitive to vertical mixing (Yang et al., 2007; Keppel-Aleks et al., 2011). In the remainder of this paper, a "measurement" refers to the entire process of producing the atmospheric abundances of X_(CO_2).
OCO-2 measures X_(CO_2) with high precision from space but possesses biases that the OCO-2 team have attempted to characterize and remove (Mandrake et al., 2015). To validate the OCO-2 measurements, we use the Total Carbon Column Observing Network (TCCON; Wunch et al., 2011a), a comprehensive ground-based validation network that also measures X_(CO_2). The TCCON instruments are solar-viewing Fourier transform spectrometers, and they measure the same atmospheric quantity as OCO-2, but their measurements are unaffected by surface properties and minimally affected by aerosols. TCCON instruments cannot measure through optically thick clouds.
The OCO-2 satellite has three viewing modes: nadir mode, in which the instrument points straight down at the surface of the Earth; glint mode, in which the instrument points just off the glint spot on the surface; and target mode, in which the observatory is commanded to scan about a particular point on the ground as it passes overhead. The three modes serve different purposes: the nadir and glint-mode measurements are normally used for scientific analyses, and the target mode is used primarily as part of the OCO-2 bias correction procedure. All three modes must be independently verified using comparisons with the TCCON data. This paper will describe the OCO-2 observation modes in Sect. 2, how the OCO-2 version 7 algorithm target-mode retrievals compare with the TCCON data in Sect. 3, and how the glint and nadir mode measurements compare with TCCON data in Sect. 4.

OCO-2 observation modes
OCO-2's nadir and glint observation modes are considered the nominal "science modes" of the OCO-2 measurement scheme. The nadir observations produce useful measurements only over land and near the sub-solar point over tropical oceans. The glint data are often separated into glint over land ("land glint") and glint over water ("ocean glint"), as the two modes use different surface reflectance models: Lambertian over land (matching the surface model of the nadir observations) and Cox-Munk with a Lambertian component over water. Retrievals are performed over a limited latitude range in glint due to concerns about biases introduced by aerosol scattering over the largest optical path lengths; see Fig. 1. The nadir mode data can provide more reliable X_(CO_2) measurements over higher latitudes over land, which is particularly important in the Northern Hemisphere, where the boreal forest, a driver of the CO_2 seasonal cycle, extends north of 70° N. Measurements over inland lakes can be successful in ocean glint mode.
Figure 1. OCO-2 nadir, glint, and target-mode measurement density in 5° bins as a function of latitude from the beginning of the mission through 31 December 2016. These are from the "lite" files applying "warn level" 11 filters and requiring that the "xco2_quality_flag" is zero.
OCO-2 has a geographical "near-repeat" after 16 days. During each 16-day period, the satellite orbits the Earth 233 times, with each orbit along a distinct "orbital path". The OCO-2 orbit is sun-synchronous, with an equator crossing time near local noon (13:36 LT; Crisp, 2015). The original measurement scheme alternated between glint and nadir observations on alternate 16-day ground track repeat cycles. Due to the loss of ocean measurements during nadir mode, and the loss of high-latitude measurements during glint mode, key components of the carbon cycle (e.g., the springtime drawdown of CO_2 due to the onset of the Northern Hemisphere growing season) were poorly sampled. Thus, on 2 July 2015, the observing strategy was changed to alternate between glint and nadir modes on each subsequent orbit, improving the coverage of the oceans and high-latitude land masses. The OCO-2 observation scheme was further optimized on 12 November 2015 to assign orbits that are almost entirely over ocean to always measure in glint mode. This change occurred on 72 out of the 233 orbital paths: 15 over the Atlantic and 57 over the Pacific, resulting in higher data throughput due to the reduction in nadir soundings over ocean. Crisp et al. (2017) discuss the measurement strategy in detail.
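As a quick check on the orbit geometry quoted above, the following sketch derives the mean orbital period and the fraction of orbital paths reassigned to glint mode. The variable names are illustrative, and the derived period is simple arithmetic from the 16-day, 233-orbit repeat cycle rather than a value stated in the text.

```python
# Repeat-cycle figures quoted in the text.
REPEAT_CYCLE_DAYS = 16
ORBITS_PER_CYCLE = 233
MINUTES_PER_DAY = 24 * 60

# Mean orbital period implied by the repeat cycle (derived, not quoted).
orbital_period_min = REPEAT_CYCLE_DAYS * MINUTES_PER_DAY / ORBITS_PER_CYCLE
orbits_per_day = ORBITS_PER_CYCLE / REPEAT_CYCLE_DAYS

# Orbital paths switched to permanent glint mode on 12 November 2015.
always_glint_paths = 15 + 57  # Atlantic + Pacific
always_glint_fraction = always_glint_paths / ORBITS_PER_CYCLE

print(f"orbital period: {orbital_period_min:.1f} min")
print(f"orbits per day: {orbits_per_day:.2f}")
print(f"always-glint paths: {always_glint_paths} ({always_glint_fraction:.0%})")
```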
Target mode is designed to evaluate biases in the OCO-2 X_(CO_2) product. The target locations are mostly selected to be coincident with ground validation stations, typically at TCCON sites. During a target-mode maneuver, the OCO-2 satellite rotates from its nominal science mode to point at a selected ground location. This transition takes approximately 5 min and rotates the spacecraft's solar panels away from the Sun. The spacecraft then scans across the site or "nods" as it passes overhead to sweep across the ground several times (see Fig. 2) over a period of about 4.5 min: these dithered measurements comprise the "target-mode data". The spacecraft then transitions out of target mode and back into its nominal science mode over the next 5 min. In total, the maneuver takes about 14.5 min and, during this time, the spacecraft, traveling at 7.5 km s^(−1), has traveled over 6500 km.
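The maneuver budget above can be verified with a few lines of arithmetic. This is only a sketch using the approximate durations and speed quoted in the text; the variable names are illustrative.

```python
# Approximate phase durations of a target-mode maneuver, in minutes (from the text).
transition_in_min = 5.0
scan_min = 4.5
transition_out_min = 5.0

# Spacecraft speed quoted in the text.
speed_km_per_s = 7.5

total_min = transition_in_min + scan_min + transition_out_min
distance_km = total_min * 60.0 * speed_km_per_s

print(f"maneuver duration: {total_min} min")
print(f"distance traveled: {distance_km:.0f} km")
```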
Figure 2. The zenith angles viewed during an OCO-2 target-mode maneuver over Lamont on 5 March 2015. The spacecraft "nods" across the ground target as it rotates overhead. The colors and decreasing size of the points indicate the time of the measurement. The top inset shows the locations of the measurements in latitude and longitude. The eight footprints are apparent in the roughly N-S stripes. There are 3473 soundings with a retrieval zenith angle of less than 40° in this target-mode maneuver, most of which are obscured in the inset by the later, nearly spatially coincident soundings.
The strength of target-mode measurements is that thousands of spectra are obtained in a short period of time over a small region of the world (about 0.2° longitude × 0.2° latitude for the densest measurements). For example, in Fig. 2, there are 3473 soundings in the region around the Lamont TCCON station. As long as the target location is far from large emissions sources, X_(CO_2) can be assumed constant spatially and temporally within a target region, because atmospheric X_(CO_2) is unlikely to change significantly over small geographic regions within 4.5 min. However, during the maneuver, many other parameters can change significantly, such as the atmospheric path, the path length of the measurement (referred to as the "airmass", where one airmass corresponds to the optical path length of one vertical column through the atmosphere), surface reflectivity (albedo), and topography. Any variability in the retrieved X_(CO_2) in the target-mode data is considered to be an artifact and can provide insight into biases caused by the algorithm's treatment of the parameters. With this in mind, the target locations were carefully chosen to span a wide range of latitudes, longitudes, and surface types to challenge the OCO-2 retrieval algorithm (B7r) and reveal any biases it causes.

Target locations and selection
Table 1. Available targets. Note that the target location (listed in degrees latitude, degrees longitude, and altitude above sea level in km) may not be exactly centered on a TCCON site location. Targets without a corresponding TCCON station are marked with a star (*) and are not discussed in this paper.
There are a limited number of ground locations that can be targeted because the locations must be preprogrammed into the spacecraft software. For the 1st year after launch, there were 19 possible target locations. In July 2015, 8 additional target slots became available, allowing for 27 target locations. At several times, target locations have been changed or replaced. A list of the ground target locations and dates is provided in Table 1, and a map of their locations is in Fig. 3. Individual locations can be targeted by OCO-2 only on specific OCO-2 orbit paths. Only one target location can be assigned to a given orbit path, and only if the OCO-2 ground track for that path is sufficiently close to the ground target location. Thus, for each day, there are between one and seven ground target locations to choose from. The spacecraft power systems can handle up to three target-mode maneuvers per day due to the power constraints imposed by rotating the spacecraft solar panels away from the Sun. We typically select only one target per day.
There are several TCCON stations that are located in regions with significant spatial variability in topography or ground cover. For example, the Lauder TCCON station is in the midst of rolling hills, the Wollongong TCCON station is between the ocean and a sharp escarpment, and the Edwards TCCON station is adjacent to a very bright playa, a land surface property previously identified from the Greenhouse Gases Observing Satellite (GOSAT; Kuze et al., 2009, 2016) results as challenging for space-borne X_(CO_2) retrievals (Wunch et al., 2011b). With target-mode measurements, the impact that local surface variability has on the X_(CO_2) retrievals becomes apparent.
Other TCCON stations (e.g., Park Falls, Lamont) have relatively uniform surface properties and are reasonably far from anthropogenic CO_2 sources, but the ground cover can vary from season to season. The Sodankylä and Eureka sites, located at high northern latitudes, challenge the OCO-2 algorithm at very high solar zenith angles and airmasses and with snowy scenes. Izaña, Réunion, and Ascension, all lower-latitude sites, are located on small islands remote from large land masses but with significant topography. The Izaña TCCON station (28.3° N) is at 2.37 km altitude, whereas the Réunion (20.9° S, 0.087 km) and Ascension Island (7.9° S, 0.032 km) stations are closer to sea level.
There are several target locations that are not TCCON stations (Fig. 3, orange stars), and, although data from those targets will not be analyzed in this paper, the data will help assess the radiometric calibration of the instrument, its ability to measure large urban sources of CO_2, validate its solar-induced fluorescence observations, and assess its ability to measure vertically resolved information about CO_2. Railroad Valley is a heavily instrumented radiometric calibration site (Kuze et al., 2011), and Libya has surface properties that are valuable for radiometric calibration. Shanghai, São Paulo, and Mexico City are geographically well-constrained urban regions with significant CO_2 emissions. Rosemount and Litchfield have instrumentation that will help verify the OCO-2 solar-induced fluorescence observations. Boulder has frequent AirCore CO_2 profile measurements (Karion et al., 2010). Fairbanks is the location of a future TCCON station.
The OCO-2 spacecraft must be manually commanded to perform a target maneuver. The target locations are selected a day or two in advance, based on the weather forecast, the operational status of the TCCON station (if the target is a TCCON station), the importance of the projected data loss in nadir or glint mode from performing the target-mode operation, and the historical statistics of successful target-mode measurements over that site. The projected data loss depends primarily on whether the nominal mode for that orbit was nadir over land, nadir over ocean, glint over land, or glint over ocean. If the nominal mode is nadir over ocean, little useful data loss occurs, as nadir measurements over ocean are usually too dark in the near-infrared for successful retrievals: in this case, the target is almost always selected given a reasonable weather forecast. This has mostly been the case for Réunion Island, which has been targeted regularly from OCO-2 nadir orbits. For the other three cases, there will be some loss of regular science data to accommodate a target-mode operation. In these cases, the historical statistics of acquiring good target-mode data and weather forecasts are weighted more heavily before enabling the target. Often, if the weather forecast is not ideal, no target-mode measurements will be selected.
As of 31 December 2016, 264 targets have been observed, with 230 of them over TCCON stations. The TCCON data have been analyzed for 90 % of those targets. Of the remaining 208 targets, about 59 % (123) were clear enough to obtain sufficient high-quality OCO-2 data to compare with TCCON data.

Target mode and the OCO-2 bias correction
All current space-based X_(CO_2) measurements have systematic biases. These biases can be caused by uncertainties in the spectroscopy, by limitations in the information content of the measurements (i.e., the spectra do not contain enough information to resolve multiple independent vertical pieces of information), by uncertainties or oversimplifications in the optical properties of the atmosphere and surface (particularly from low-lying cloud, haze, and aerosols), and by uncertainties in the instrument characterization and calibration (e.g., Crisp et al., 2017; Wunch et al., 2011a; Guerlet et al., 2013; Schneising et al., 2012). Considerable effort is dedicated to creating robust "bias correction" procedures, and these are detailed in regularly updated documentation available online through the Goddard Data Center (GES-DISC, 2016) and the CO_2 portal (JPL-Caltech, 2016). The bias correction procedure for the current B7r dataset is described in Mandrake et al. (2015).
There are three key types of biases addressed by the OCO-2 bias correction procedure: footprint-dependent biases; spurious correlations of the retrieved X_(CO_2) with other retrieval parameters (a "parameter-dependent" bias); and a multiplicative factor to scale to the World Meteorological Organization (WMO) trace-gas standard scale (Zhao and Tans, 2006), which we will refer to as a "scaling" bias. The parameter-dependent bias can depend on retrieval parameters such as the surface pressure retrieval error, signal level, airmass, surface albedo, or spurious variability in the retrieved CO_2 profile.
Each OCO-2 spectral channel records eight spectra simultaneously, each with a slightly different atmospheric path, and hence measures sunlight that has reflected off a different surface location or "footprint". The spectrally dependent radiometric response of each footprint is different and is calibrated independently. Small (< 0.1 %) uncertainties in the calibration introduce persistent footprint-dependent biases in the retrieved X_(CO_2) that must be removed as part of the bias correction process. Footprint-dependent biases are corrected using a subset of OCO-2 data collected over small areas around the world, in which there were at least 100 soundings with low variability, and where all eight footprint measurements resulted in a successful retrieval (Mandrake et al., 2015). Note that there are two footprint-dependent corrections applied to the B7r OCO-2 data: one that is applied as part of the standard bias correction algorithm and one that was discovered after the generation of the bias correction. This second "residual footprint bias" correction must be applied manually by the data user (Mandrake et al., 2015). In all subsequent analyses in this paper, both footprint-dependent biases are removed from the data, unless otherwise specified. In future versions of the OCO-2 algorithm, there will be no residual footprint bias correction required.
Figure 4 (caption fragment). ... is after bias correction but before the scaling is applied. Plot (c) shows the relationship when the scaling correction is applied and the recommended residual footprint correction described in Mandrake et al. (2015). Note that the best fit line in plot (c) is much more consistent with the one-to-one line than in plot (b). The slope and scatter in plot (c) is unaffected by the residual footprint correction. The one-to-one line is indicated by the dashed line, and the best fit is marked in the solid line. The error bars represent the standard deviation about the median.
The parameter-dependent bias correction uses a genetic algorithm to determine which retrieval parameters account for the largest fraction of the spurious variability found in the estimated X_(CO_2) on large spatial scales (Mandrake et al., 2013, 2015). The algorithm uses two subsets of the OCO-2 data for this task: a "Southern Hemisphere approximation" which exploits the low spatial and temporal variability of X_(CO_2) in the Southern Hemisphere south of 25° S (e.g., Wunch et al., 2011b) and a "small area analysis" which exploits the low spatial variability of X_(CO_2) within small regions (0.89° latitude on a single orbit track) and can be applied at all latitudes (Mandrake et al., 2015). A multivariate regression is performed between spurious X_(CO_2) variability and the parameters. The resulting slopes of the regressions allow us to then subtract the predicted bias from the X_(CO_2) values. In the results that follow, the footprint and parameter-dependent biases in the OCO-2 target-mode data have been removed following Mandrake et al. (2015), allowing us to determine the scaling factor that ties the OCO-2 X_(CO_2) scale to the TCCON scale. Data near coastlines are used to link the scaling factors between measurement modes. The parameter-dependent corrections can affect the scaling bias; therefore, they must be removed before the scaling bias can be computed.
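The regression step described above can be illustrated with a minimal sketch. It assumes that the true X_(CO_2) is constant within the sample (as in the small area analysis), fits slopes against two hypothetical retrieval parameters using ordinary least squares, and subtracts the predicted bias. The parameter choices, the synthetic values, and the centering on the sample mean are assumptions made for illustration; the genetic-algorithm parameter selection and the operational procedure of Mandrake et al. (2015) are not reproduced here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_parameter_bias(params, xco2):
    """Least-squares slopes of XCO2 variability against each parameter."""
    n_par = len(params[0])
    means = [sum(row[j] for row in params) / len(params) for j in range(n_par)]
    y_mean = sum(xco2) / len(xco2)
    # Center both sides so only variability (not the mean level) is regressed.
    xc = [[row[j] - means[j] for j in range(n_par)] for row in params]
    yc = [y - y_mean for y in xco2]
    # Normal equations: (X^T X) slopes = X^T y
    xtx = [[sum(r[i] * r[j] for r in xc) for j in range(n_par)] for i in range(n_par)]
    xty = [sum(r[i] * y for r, y in zip(xc, yc)) for i in range(n_par)]
    return solve_linear_system(xtx, xty), means


def remove_parameter_bias(xco2, params, slopes, means):
    """Subtract the bias predicted by the fitted slopes from each sounding."""
    return [
        x - sum(s * (p - m) for s, p, m in zip(slopes, row, means))
        for x, row in zip(xco2, params)
    ]


# Synthetic soundings: (surface pressure error in hPa, albedo); hypothetical values.
params = [(-1.0, 0.10), (0.5, 0.25), (2.0, 0.20), (1.0, 0.40), (-0.5, 0.30)]
# Constant "true" 400 ppm plus a spurious dependence on both parameters.
xco2 = [400.0 + 0.3 * dp - 2.0 * alb for dp, alb in params]

slopes, means = fit_parameter_bias(params, xco2)
corrected = remove_parameter_bias(xco2, params, slopes, means)
print(slopes)
print(max(corrected) - min(corrected))
```

In this synthetic case all of the variability is spurious by construction, so the corrected values collapse to a single constant; with real soundings a residual scatter would remain.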
Placing the OCO-2 data on the World Meteorological Organization's trace-gas standard scale is crucial for obtaining accurate flux estimations that are consistent with the inversions that assimilate the surface in situ CO_2 measurements that are carefully calibrated to the WMO scale (Zhao and Tans, 2006). The TCCON data are tied to the WMO scale and serve as the link between the calibrated surface in situ measurements and the OCO-2 measurements.
To tie the TCCON measurements to the WMO scale, over 30 profiles of in situ CO_2 have been measured directly overhead of 15 TCCON stations with aircraft carrying carefully calibrated instrumentation (Wofsy, 2011; Pan et al., 2010; Singh et al., 2006) or AirCore (Karion et al., 2010). These profiles, the first of which were collected in 2004, vary in altitude range, depending on the vehicle, and thus must be combined with estimates of the CO_2 in the highest altitudes of the atmosphere to generate a full vertical profile. These high-altitude CO_2 profile estimates are provided by the TCCON a priori profiles, which are based on in situ measurements of the atmosphere from aircraft and high-altitude balloon platforms. The full vertical profiles are then integrated, smoothing with the TCCON averaging kernel and a priori profile to compute the best estimate of the "true" X_(CO_2) value. Integrated profiles are compared with the retrieved X_(CO_2) from the TCCON spectra and result in a highly linear relationship which defines a multiplicative bias between the TCCON X_(CO_2) and the best estimate of the "truth". Removing this bias from the TCCON X_(CO_2) ties it to the WMO scale. The details of this method of tying the TCCON X_(CO_2) to the WMO scale are described in Washenfelder et al. (2006), Messerschmidt et al. (2011), and Wunch et al. (2015).
We consider TCCON data to be coincident with the OCO-2 target-mode measurements when they have been recorded within ±30 min of the time at which the spacecraft is closest to nadir during the maneuver. If there are fewer than five TCCON data points recorded within that time, the window is extended to ±120 min, but this is required in only 10 % of cases. We use the full OCO-2 version B7 retrospective data (i.e., B7r), available from GES-DISC (2016, http://disc.sci.gsfc.nasa.gov/OCO-2), and manually apply the filters listed in Table 2.
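The temporal coincidence rule above can be sketched as a small selection function. The function name, the representation of times as minutes relative to an arbitrary reference, and the example values are assumptions for illustration only.

```python
def select_coincident(tccon_times_min, closest_nadir_min, min_points=5):
    """Return the TCCON times inside the coincidence window and the half-width used.

    Times are in minutes. The window is +/-30 min around the time at which the
    spacecraft is closest to nadir, widened to +/-120 min when fewer than
    `min_points` TCCON measurements fall inside the narrow window.
    """
    def within(half_window_min):
        return [t for t in tccon_times_min
                if abs(t - closest_nadir_min) <= half_window_min]

    selected = within(30.0)
    if len(selected) >= min_points:
        return selected, 30.0
    return within(120.0), 120.0


# Hypothetical TCCON measurement times relative to closest approach (minutes).
dense = [-25, -10, -5, 0, 5, 20, 90]
sparse = [-100, -50, -20, -10, 0, 15, 40, 110]

dense_selected, dense_window = select_coincident(dense, 0.0)
sparse_selected, sparse_window = select_coincident(sparse, 0.0)
print(len(dense_selected), dense_window)
print(len(sparse_selected), sparse_window)
```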
The analyses of the target-mode data to develop the scaling bias are completed prior to the generation of "warn levels" and the official filtering schemes, and this scaling bias is applied as part of the bias correction procedure required to generate the "lite" files used commonly by the scientific community. Warn levels determine sets of OCO-2 data with consistent quality (as defined by the RMS scatter) within an observation mode (Mandrake et al., 2013, 2015). A significant volume of data is required to generate warn levels, which is difficult to achieve with the relatively sparse target-mode data. Furthermore, individual warn levels in one measurement mode are not necessarily equal in quality to those in another mode. The target-mode filters are consistent with the "warn level 15" scheme described by Mandrake et al. (2015), except that the filter on the surface pressure difference from the prior in the A-band preprocessor is loosened, and we have added an additional outlier filter.
Table 2. Filters applied to the target-mode OCO-2 data from the standard OCO-2 files (i.e., not the "lite" files). The parameter names listed below are written as they are in the standard L2 files. Parameters for which there is only one limit are marked with a "-". The units are listed where applicable. The parameter "blended_albedo" is defined as 2.4 × albedo_o2_fph − 1.13 × albedo_strong_co2_fph. The tag "fph" denotes parameters from the full physics algorithm; "abp" denotes parameters from the A-band preprocessor algorithm designed for quick cloud screening; "idp" denotes the IMAP-DOAS preprocessor.
Figure 6. The site-to-site differences between the OCO-2 data and the coincident TCCON data, separated by observation mode. This is a "box plot": the bottom and top edges of the box indicate the 25 and 75 percentile limits; whiskers represent the full range of the data, excluding the outliers (McGill et al., 1978). The outliers and sites for which only one coincident set of measurements is available are represented by plus ("+") symbols. The grey shaded area indicates the ±0.4 ppm uncertainty in the TCCON values: deviations beyond the shading are more likely attributable to uncertainties in the OCO-2 data. Filled boxes indicate sites for which more than 10 coincident measurements were made. Open boxes have at least three coincident measurements.
Figure 4 shows the OCO-2 X_(CO_2) target-mode median data comparisons with coincident TCCON data. The best fit lines were computed using a method that accounts for uncertainties in the dependent and independent variables (York et al., 2004). Panel (a) shows the results prior to applying the parameter-dependent bias correction and has a correlation coefficient of R^2 = 0.78. Panel (b) shows the relationship after the correction has been applied and an improved correlation coefficient (R^2 = 0.86). This increase in R^2 is significant at the 90 % level (but not the 95 % level; p = 0.055) using a standard Fisher's z-transformation test. The improvement indicates that the parameter-dependent bias correction is effective at removing spurious variability in the OCO-2 data with respect to TCCON. The slope in panel (b), which has a y intercept that is forced through 0, is used to derive the scaling factor between TCCON and OCO-2 target observations (m = 0.9977 ± 0.04, which represents ∼ 1 ppm) for the time period spanning 8 September 2014 through 31 December 2016. The y intercept is forced through 0 because it is assumed that in the absence of atmospheric CO_2, both OCO-2 and TCCON will measure 0 ppm. The scaling factor derived as part of the Mandrake et al. (2015) bias correction procedure was produced using the data available at the time, which spanned November 2014 through May 2015, and resulted in a similar, but not identical, slope of 0.99694 ± 0.00102. This scaling bias difference results in a 0.3 ppm offset between OCO-2 and TCCON X_(CO_2) at 400 ppm; the standard bias-corrected OCO-2 measurements appear to be 0.3 ppm too high. Panel (c) of Fig. 4 shows the relationship between the OCO-2 X_(CO_2) after applying the bias correction, scaling, and the residual footprint correction (m = 1.0007 ± 0.04, R^2 = 0.86). The residual footprint correction does not impact the slope or R^2 value of the relationship. Zhang et al. (2017) have shown that the uncertainties computed on this slope are likely to be significantly overestimated.
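The relationship between the two scaling factors and the quoted 0.3 ppm offset can be checked with the short sketch below. The zero-intercept slope function uses ordinary least squares as a simplified stand-in; it is not the York et al. (2004) method used for the fits in Fig. 4, which also accounts for uncertainties in both variables. The synthetic data points are hypothetical.

```python
def slope_through_origin(x, y):
    """Ordinary least-squares slope of y = m * x with the intercept fixed at 0."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Synthetic example: OCO-2 medians that are exactly 0.9977 times TCCON.
tccon_ppm = [395.0, 400.0, 405.0]
oco2_ppm = [0.9977 * v for v in tccon_ppm]
m_fit = slope_through_origin(tccon_ppm, oco2_ppm)

# Scaling factors quoted in the text.
m_this_work = 0.9977   # derived from 8 September 2014 through 31 December 2016
m_applied = 0.99694    # applied in the standard B7r bias correction

# Slope remaining after the applied scaling, and the resulting offset at 400 ppm.
residual_slope = m_this_work / m_applied
offset_ppm = (residual_slope - 1.0) * 400.0

print(f"fitted slope: {m_fit:.4f}")
print(f"residual slope: {residual_slope:.4f}")
print(f"offset at 400 ppm: {offset_ppm:.2f} ppm")
```

The residual slope from this arithmetic is consistent with the slope reported for panel (c), and the positive offset corresponds to the standard bias-corrected OCO-2 values being slightly high relative to TCCON.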
The long-term time dependence of the difference between the OCO-2 target-mode data and the coincident TCCON data (ΔX_(CO_2)), after the scaling bias is removed, is plotted in Fig. 5. The algorithm, calibration, and instrument cause no apparent time-dependent drift in ΔX_(CO_2) or their errors. Thus, the bias correction is successful at reducing both the parameter-dependent and scaling biases with respect to TCCON and our other bias correction datasets described earlier in this section.
However, the target-mode measurements are sensitive enough to point to some residual biases (i.e., those not corrected by the Mandrake et al., 2015, bias correction process) that are currently under investigation by the OCO-2 algorithm, calibration, and validation teams. These residual biases are more geographically localized in nature and appear to be related to surface properties or instrument pointing errors and as such might not be expected to be captured by the standard bias correction, which is designed to minimize biases that dominate on a more global scale.

OCO-2 biases related to surface properties
Site-dependent differences from the one-to-one plot in Fig. 4b are shown in Fig. 6 and reveal significant location-dependent biases. Any differences with magnitudes less than 0.4 ppm could be attributable to TCCON station site-to-site biases, so we focus on the biases that are significantly larger and thus most likely attributable to the OCO-2 data. Two clear examples of site-dependent biases are at Edwards, with a median low bias of ∼ 1 ppm, and Wollongong, with a median high bias of ∼ 0.8 ppm. The spatial dependence of the target-mode measurements reveals that small-scale variability in surface properties (e.g., albedo, altitude, surface roughness) can cause significant and spurious variability in the OCO-2 X_(CO_2).
The Edwards TCCON station is situated in the bright California high desert on the edge of a very bright playa with near-infrared albedos reaching 0.6 and little topographic change (Fig. 7). There have been 12 target observations of Edwards, 10 of which had clear skies during the OCO-2 maneuver. On all but one of the clear-sky target maneuvers over Edwards, the OCO-2 X_(CO_2) appears to include a spurious dependence on surface brightness, with higher X_(CO_2) retrieved over brighter surfaces. However, the magnitude of the sensitivity differs from target to target: the RMS of the target-mode measurements ranges from 0.9 to 1.7 ppm, and the relationships between surface albedo and X_(CO_2) have different slopes (ranging from −2.8 to 10.5 ppm per unit albedo with a mean of 4.5 ppm per unit albedo). The underlying physical reason is currently unknown. All mean target-mode OCO-2 X_(CO_2) values at Edwards are biased lower than the coincident TCCON X_(CO_2).
Figure 10 (caption fragment). A spatial bias is clearly present, related to the surface elevation, but the sign of the bias changes between targets prior to 20 November 2014 and after.
Conversely, the Wollongong station, which is situated near the east coast of Australia, is a dark surface (with near-infrared albedos over land of 0.3) and lies between the Tasman Sea to the east and the Illawarra escarpment to the west (Fig. 8). The OCO-2 retrievals of X_(CO_2) in target mode are systematically higher than those from the TCCON, and are particularly high (up to 5 ppm higher than TCCON) in July and August (Fig. 9), due to the problem discussed below in Sect. 4. OCO-2 data over Białystok, located in a dark, forested region, also has a persistent high bias (on the order of 1 ppm) compared with TCCON.
Even for sites at which OCO-2 X_(CO_2) does not appear to have a significant bias with respect to TCCON, the retrievals can show spurious spatially correlated errors. The Lauder TCCON station is situated in a valley between rolling hills (Fig. 10). The surface altitude is spatially correlated with changes in X_(CO_2) during each target-mode maneuver. The pattern is apparent in all but one clear-sky target-mode measurement over Lauder. The biases with respect to TCCON switch sign after 20 November 2014, when the pointing offsets used by the spacecraft were updated (Fig. 10b and c). The average RMS of the differences in X_(CO_2) before and after 20 November 2014 are 1.2 and 1.1 ppm, respectively. The near-nadir OCO-2 measurements during the target-mode maneuver (defined by restricting retrieval zenith angles to ≤ 20°) show RMS variabilities of 0.9 ppm after 20 November 2014 and 0.8 ppm prior to 20 November 2014.

Nadir and glint-mode comparisons to TCCON
In this section, we evaluate the bias-corrected OCO-2 glint and nadir modes against ground-based TCCON data to reveal other biases that were not eliminated using the standard version 7 bias correction. We use the version B7 retrospective OCO-2 "lite" files here, which have had the footprint-dependent, parameter-dependent, and scaling biases (described in Sect. 3.2) removed. The residual footprint correction was applied manually to the data. The "lite" files are available from the CO_2 Virtual Science Data Environment (JPL-Caltech, 2016, http://co2.jpl.nasa.gov) and from GES-DISC (2016).
Figure 11. The dependence of the difference between OCO-2 X_(CO_2) coincident with TCCON X_(CO_2) (ΔX_(CO_2)) on the season and OCO-2 observing mode. This is a box plot akin to ...
We limit ourselves to data for which the warn level is less than or equal to 11, as recommended by Mandrake et al. (2015), and for which the "xco2_quality_flag" is zero. Mandrake et al. (2015) caution against using warn levels above 12 for nadir and glint modes, because those data can contain errors significantly in excess of the stated a posteriori uncertainties on the X_(CO_2) values. For these comparisons, we choose the following coincidence criteria: a box centered around the TCCON station that spans 5° in latitude and 10° in longitude on the same day as a TCCON measurement, with the exceptions mentioned below. In the Southern Hemisphere south of 25° S, we use a larger box spanning 20° in latitude and 120° in longitude because the geographical variance in X_(CO_2) in the Southern Hemisphere is low (e.g., Wunch et al., 2011b). The Edwards and Pasadena boxes are constructed differently because they are geographically very close to each other, but the Pasadena site is within the polluted, mountain-contained South Coast Air Basin, and Edwards is in the clean desert north of the mountains. Thus, we limit the Edwards latitudes to north of Edwards but allow the longitudes to span 5° further west over the Pacific Ocean. The Pasadena coincidence box is constrained to the South Coast Air Basin, which significantly limits the number of coincident points (see Appendix Fig. A1a-t).
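The default spatial coincidence criteria above can be sketched as follows. This is an illustrative implementation only: the function names are hypothetical, the site coordinates in the example are approximate, the special cases for Edwards and Pasadena are not handled, and the treatment of longitude wrap-around is an assumption rather than something specified in the text.

```python
def box_spans(site_lat):
    """Return (latitude span, longitude span) in degrees for a TCCON site."""
    if site_lat < -25.0:
        return 20.0, 120.0  # larger box in the Southern Hemisphere south of 25 S
    return 5.0, 10.0


def in_coincidence_box(lat, lon, site_lat, site_lon):
    """True if a sounding lies inside the box centered on the site."""
    lat_span, lon_span = box_spans(site_lat)
    # Longitude difference wrapped into [-180, 180) so boxes can cross the date line.
    dlon = (lon - site_lon + 180.0) % 360.0 - 180.0
    return abs(lat - site_lat) <= lat_span / 2.0 and abs(dlon) <= lon_span / 2.0


# Approximate site coordinates, for illustration only.
lamont = (36.6, -97.5)
lauder = (-45.0, 169.7)

print(in_coincidence_box(38.0, -93.0, *lamont))    # inside the 5 x 10 degree box
print(in_coincidence_box(40.0, -97.5, *lamont))    # too far north
print(in_coincidence_box(-40.0, -150.0, *lauder))  # inside the 20 x 120 degree box
```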
The median OCO-2 X CO 2 within the coincidence box recorded on a single day is compared with the TCCON daily median for that day. We choose to compare OCO-2 nadir and glint-mode X CO 2 with the TCCON daily median values because the median reduces the random component of the TCCON error budget, is less sensitive to outlier measurements, and weights the results toward local noon, where solar zenith angle changes are slowest and the timing is better matched with the overpass time of OCO-2's orbit. The more complicated dynamical coincidence criteria used to increase the number of coincident measurements between TCCON and GOSAT in Wunch et al. (2011b) and Nguyen et al. (2014) are not required for OCO-2, due to OCO-2's much higher data density.

Figure 11 shows the differences between coincident OCO-2 X CO 2 and that from TCCON, separated by viewing mode and season. The bottom panel collects the viewing modes together, still separating by season. The OCO-2 X CO 2 appears to have a bias with respect to TCCON that increases with increasing latitude in the land glint and nadir data north of 45° N (Park Falls). This latitude-dependent bias is consistent with the target-mode results (Fig. 6).

Figure 12. Land glint OCO-2 one-to-one plot against TCCON. The slope of the relationship is represented by "m" in the figure, and the coefficient of determination is represented by "R 2 ". The number of points on the graph is indicated by "N" and the root-mean-square value (RMS) of the differences between OCO-2 and TCCON X CO 2 is also shown. Each point represents a daily median of coincident OCO-2 and TCCON measurements. Many points are overlaid in this graph, obscuring the density of points along the best fit line.
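The daily-median pairing described above can be illustrated as follows. This is a sketch under stated assumptions: the input is taken to be simple (date string, X CO 2 in ppm) pairs that have already been filtered and matched to a coincidence box, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def daily_medians(records):
    """Map each date to the median X CO2 of that day's values.

    `records` is an iterable of (date_string, xco2_ppm) pairs.
    """
    by_day = defaultdict(list)
    for day, xco2 in records:
        by_day[day].append(xco2)
    return {day: median(values) for day, values in by_day.items()}


def paired_differences(oco2_records, tccon_records):
    """Return {date: OCO-2 daily median minus TCCON daily median}.

    Only dates present in both data sets yield a coincidence.
    """
    oco2 = daily_medians(oco2_records)
    tccon = daily_medians(tccon_records)
    shared_days = sorted(oco2.keys() & tccon.keys())
    return {day: oco2[day] - tccon[day] for day in shared_days}
```

Each entry of the returned dictionary corresponds to one daily median comparison point, so its length is the number of coincidences N for that station.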
A seasonal bias is not apparent at latitudes for which all four seasons have sufficient coincident measurements (Lamont, Edwards, Ascension, Réunion, Wollongong), indicating that the latitudinal bias is not likely caused by an airmass-dependent bias (in either OCO-2 or TCCON). In general, however, the number of coincident measurements is low (Table 3), especially in the Northern Hemisphere north of 45° N.
In the Southern Hemisphere winter, there is a significant high bias in the retrieved X CO 2 from the OCO-2 ocean glint data. The top panel of Fig. 11 clearly illustrates this problem by showing the divergence of the OCO-2 X CO 2 measurements in ocean glint mode over Wollongong and Lauder from the TCCON X CO 2 values during June, July, and August. There were also three target-mode measurements recorded in the Southern Hemisphere during that time: two points over Wollongong and a third point over Réunion recorded during late July and early August 2015 that hint at this residual bias (Fig. 5). Appendix Fig. A1r and s also show this problem as a function of time. The bias is also seen in preliminary comparisons to models (not shown), which also indicate a low bias of OCO-2 ocean glint X CO 2 in the tropical oceans. However, this latter bias has not been clearly detected in comparisons with TCCON data (e.g., Fig. A1p and r). The Southern Hemisphere ocean glint bias does not impact the overall scaling bias between OCO-2 and TCCON X CO 2 within the uncertainty but does impact the latitudinal gradients (and hence fluxes) inferred by the OCO-2 data. While the cause of the bias in the southern winter is currently unclear, there is a promising hypothesis related to the OCO-2 B7r algorithm's misrepresentation of stratospheric aerosols, exacerbated by the eruption of Mount Calbuco in Chile on 22 April 2015 (Romero et al., 2016).

Figure 14. Nadir OCO-2 one-to-one plot against TCCON. The annotations follow those in Fig. 12.

Table 3. Glint and nadir statistics for data filtered using warn levels ≤ 11 and the xco2_quality_flag = 0. The median bias (OCO-2 − TCCON) and its RMS, R 2 and number of daily median comparison points, or "coincidences" (N), are listed below for each TCCON station. If the number of coincidences is larger than 10, the results are marked in bold font. The "Total" row is calculated by considering all the coincidences in the table independently.

Table 4. Glint and nadir statistics for data filtered using warn levels ≤ 15 and the xco2_quality_flag = 0. The median bias (OCO-2 − TCCON) and its RMS, R 2 and number of daily median comparison points, or "coincidences" (N), are listed below for each TCCON station. If the number of coincidences is larger than 10, the results are marked in bold font. The "Total" row is calculated by considering all the coincidences in the table independently. In general, the RMS values are equal to or larger than those in Table 3.

Table 5. Bias-corrected glint, nadir, and target relationships with TCCON. The slope and its uncertainty, R 2 , and number of daily median comparison points (N) are listed below for each OCO-2 viewing mode. The uncertainties on the slopes are the standard deviation of the slopes computed through bootstrapping. The values for ocean glint data with and without the Southern Hemisphere winter data are included on separate rows. Note that the slopes are computed after the global bias has been removed from the data and the residual footprint corrections have been applied. The glint and nadir data are filtered with warn level ≤ 11 and xco2_quality_flag = 0; the target-mode data are filtered using the filters described in Table 2.

The overall comparisons between the OCO-2 data and TCCON data are reported in Tables 3 and 5 and shown in Figs. 12-14 for data from land glint mode, ocean glint mode, and nadir mode. The differences between aggregated, bias-corrected OCO-2 X CO 2 data coincident with all available TCCON daily median measurements are −0.3, 0.2, and 0.2 ppm for land glint, ocean glint, and nadir, respectively. The RMS values of these differences are 1.3, 1.4, and 1.3 ppm, respectively.
The differences between the bias-corrected OCO-2 values and the TCCON medians vary from site to site; sites with more than 10 coincident measurements have differences in land glint mode ranging from −0.7 ppm (Wollongong) to 0.9 ppm (Karlsruhe), in ocean glint mode ranging from −1.1 ppm (Saga) to 0.4 ppm (Park Falls), and in nadir mode ranging from −0.1 ppm (Wollongong) to 1.6 ppm (Garmisch). Table 4 contains the overall nadir and glint statistics when using warn levels ≤ 15 instead of the recommended warn level filter (≤ 11).
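The per-station and overall statistics quoted above can be sketched as follows. This is an illustrative reading of the table captions, not the code used for the paper: the RMS is assumed to be the square root of the mean squared (OCO-2 − TCCON) daily median difference, the "Total" row is assumed to pool every coincidence with equal weight, and all names are hypothetical.

```python
import math
from statistics import median


def rms(differences):
    """Root-mean-square of the differences (assumed definition)."""
    return math.sqrt(sum(d * d for d in differences) / len(differences))


def summarize(differences):
    """Median bias, RMS, and number of coincidences for one set of differences."""
    n = len(differences)
    return {
        "median_bias": median(differences),
        "rms": rms(differences),
        "n": n,
        "more_than_10": n > 10,  # such rows are set in bold in the tables
    }


def site_statistics(diffs_by_site):
    """Per-station statistics plus a pooled "Total" entry.

    `diffs_by_site` maps a station name to its list of daily median
    differences (OCO-2 minus TCCON, in ppm).
    """
    stats = {site: summarize(d) for site, d in diffs_by_site.items()}
    pooled = [d for diffs in diffs_by_site.values() for d in diffs]
    stats["Total"] = summarize(pooled)
    return stats
```

Because the pooled entry treats each coincidence independently, stations with many coincidences dominate the "Total" statistics under this assumption.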
The nadir mode data show the best correlation of the three science modes (R 2 = 0.81), followed closely by land glint (R 2 = 0.79) and finally ocean glint (R 2 = 0.63). The low correlation coefficient in the ocean glint data is partially driven by the high anomalies in the Southern Hemisphere winter, most obviously in the data over Wollongong (Fig. 11). If the Southern Hemisphere winter data (June-September) are excluded from the ocean glint correlations, the R 2 improves to 0.75. The slopes of all three regressions are within uncertainty of 1.0. The agreement between the science-mode OCO-2 data and TCCON is poorer than that for the target-mode measurements. Halving the spatial coincidence criteria over each site does not significantly improve the correlation coefficients. This suggests that it is not solely our definition of the coincidence criteria that causes the low correlation coefficients and that perhaps the surface properties within the coincidence boxes contain sufficient variability to degrade the comparisons. This highlights the importance of the target-mode data for assessing local, site-to-site, and overall bias.
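The slope, R 2 , and bootstrapped slope uncertainty discussed above can be illustrated with a short sketch. The fitting method is not specified in this section, so ordinary least squares with an intercept is assumed here purely for illustration; the number of resamples, the fixed seed, and all function names are likewise assumptions.

```python
import random
from statistics import mean, stdev


def ols_slope(x, y):
    """Ordinary least-squares slope of y against x (assumed fitting method)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def r_squared(x, y):
    """Coefficient of determination of a straight-line fit with intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)


def bootstrap_slope_std(x, y, n_resamples=1000, seed=0):
    """Standard deviation of slopes fitted to resampled (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample pairs with replacement
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if max(xs) == min(xs):
            continue  # slope is undefined when all resampled x values coincide
        slopes.append(ols_slope(xs, ys))
    return stdev(slopes)
```

Here `x` would hold the TCCON daily medians and `y` the coincident OCO-2 daily medians. A slope is then "within uncertainty of 1.0" if it differs from 1.0 by no more than the bootstrapped standard deviation, or a chosen multiple of it.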

Conclusions
Aggregated OCO-2 X CO 2 estimates filtered with warn level ≤ 11 and xco2_quality_flag = 0 generally compare well with coincident TCCON data at global scales, with absolute median biases less than 0.4 ppm and RMS differences less than 1.5 ppm. While the bias correction clearly improves the relationship between TCCON and OCO-2 globally, some biases remain. Spurious local X CO 2 variability that is correlated with topography and surface brightness is apparent in the target-mode measurements, particularly over Edwards, Wollongong, and Lauder. Ocean glint measurements from OCO-2 at southern high latitudes during the Southern Hemisphere winter are biased high, possibly due to stratospheric aerosol interference. In all observation modes, there is an apparent latitude-dependent bias, which is largest north of 45° N. Remedying these residual biases is the current focus of the OCO-2 algorithm development and validation teams, and we anticipate that the next version of the OCO-2 data will represent a significant improvement. It is imperative to continue measurement comparisons with TCCON in all modes (target, glint, and nadir) to monitor and evaluate the OCO-2 data quality throughout its entire mission.
Data availability. Unfiltered, uncorrected OCO-2 data are available from the Goddard Data Center (GES-DISC, 2016). The filtered and bias-corrected data are contained in "lite" files, which are available both from JPL's CO 2 portal (https://co2.jpl.nasa.gov/, JPL-Caltech, 2016) and the Goddard Data Center. TCCON data are available from the TCCON data archive, hosted by CDIAC: http://tccon.ornl.gov. Each TCCON dataset used in this paper is cited independently in Table 1.

The ocean glint, land glint, and nadir mode plots for each TCCON station are shown in Fig. A1. In each plot, there are four panels. The top left panel shows the time series of the TCCON daily median data (black circles) and the OCO-2 data (triangles colored differently for each mode). The bottom left panel shows the difference between OCO-2 and TCCON measurements (OCO-2 − TCCON). The top right panel shows the correlations between the TCCON data and the OCO-2 data. The bottom right panel shows the coincidence area for the OCO-2 measurements. Note that the gap in the OCO-2 data over Lauder in winter is caused by near-direct sun glint, during which time the spacecraft is not permitted to measure (i.e., no data were recorded at that latitude during that time).

Figure A1. The top left panel of each plot (a-t) shows the time series of the TCCON daily medians (black circles) and the daily medians of the OCO-2 glint mode (gold triangles), split into land glint (blue triangles) and ocean glint (red triangles), and OCO-2 nadir mode (purple triangles). The bottom left panel shows the difference between the OCO-2 data and TCCON data as a function of time. The top right panel shows the one-to-one correspondence between the OCO-2 X CO 2 values and the TCCON values, and the best fit lines in the colors corresponding to the symbols. The one-to-one line is marked in black. The lower right panel shows the location of the TCCON station (black circle) and the locations of the OCO-2 data, showing glint-mode data in gold and nadir-mode data in purple. The lower right panel is intended to give a sense of the spatial coincidence criteria applied to the OCO-2 data for each TCCON station.