Scientific assessment of background ozone over the U.S.: Implications for air quality management.

Ozone (O3) is a key air pollutant that is produced from precursor emissions and has adverse impacts on human health and ecosystems. In the U.S., the Clean Air Act (CAA) regulates O3 levels to protect public health and welfare, but unraveling the origins of surface O3 is complicated by contributions from multiple sources, including background sources such as stratospheric transport, wildfires, biogenic precursors, and international anthropogenic pollution, in addition to U.S. anthropogenic sources. In this report, we consider more than 100 published studies and assess current knowledge on the spatial and temporal distribution, trends, and sources of background O3 over the continental U.S., and evaluate how it influences attainment of the air quality standards. We conclude that spring and summer seasonal mean U.S. background O3 (USB O3), or O3 formed from natural sources plus anthropogenic sources in countries outside the U.S., is greatest at high elevation locations in the western U.S., with monthly mean maximum daily 8-hour average (MDA8) mole fractions approaching 50 parts per billion (ppb) and annual 4th highest MDA8s exceeding 60 ppb at some locations. At lower elevation sites, e.g., along the West and East Coasts, seasonal mean MDA8 USB O3 is in the range of 20-40 ppb, with generally smaller contributions on the highest O3 days. The uncertainty in U.S. background O3 is around ±10 ppb for seasonal mean values and higher for individual days. Noncontrollable O3 sources, such as stratospheric intrusions or precursors from wildfires, can make significant contributions to O3 on some days, but it is challenging to quantify these contributions accurately. We recommend enhanced routine observations, focused field studies, process-oriented modeling studies, and greater emphasis on the complex photochemistry in smoke plumes as key steps to reduce the uncertainty associated with background O3 in the U.S.

(VOCs). These O 3 precursors are emitted by fossil fuel combustion, agriculture, biomass burning, oil and gas production, and a variety of other industrial processes. Anthropogenic emissions of NO x and some VOCs have decreased in the U.S. over the past several decades, and peak O 3 levels have declined in most areas of the U.S. as a result (Simon et al., 2015; Strode et al., 2015). At the same time, new evidence has demonstrated adverse health effects at lower O 3 levels (US EPA, 2013) and the EPA recently strengthened both the primary and secondary NAAQS (US EPA, 2015). A monitor meets the standard if the 3-year average of the annual 4th highest maximum daily 8-hour average O 3 mole fraction (MDA8), called the "ozone design value (ODV)", is less than or equal to 70 parts per billion (ppb). An additional metric, the "W126 exposure index", can be used to assess the cumulative seasonal exposure of vegetation to O 3.
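The design value arithmetic described above can be sketched in a few lines. This is a minimal illustration, not the regulatory data-handling procedure: the function names are hypothetical, and the completeness tests, unit truncation, and treatment of 8-hour windows that cross midnight used in official calculations are deliberately omitted.

```python
from statistics import mean

NAAQS_PPB = 70  # level of the O3 standard cited in the text


def mda8(hourly_ppb):
    """Maximum daily 8-hour average (ppb) from 24 hourly values.

    Simplification (assumption): only 8-hour windows that start and end
    within the same day are considered.
    """
    windows = [mean(hourly_ppb[i:i + 8]) for i in range(len(hourly_ppb) - 7)]
    return max(windows)


def fourth_highest(daily_mda8):
    """Annual 4th highest MDA8 from a list of daily MDA8 values."""
    return sorted(daily_mda8, reverse=True)[3]


def design_value(annual_fourth_highs):
    """Average of three consecutive annual 4th highest MDA8 values."""
    if len(annual_fourth_highs) != 3:
        raise ValueError("expected exactly three annual values")
    return sum(annual_fourth_highs) / 3


def meets_standard(odv_ppb, level_ppb=NAAQS_PPB):
    """A monitor meets the standard if its ODV is at or below the level."""
    return odv_ppb <= level_ppb
```

Because the ODV depends only on the fourth-ranked day in each year, a handful of high days per year can determine whether a monitor meets the standard.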
Regulation of locally formed O 3 is complicated by the fact that O 3 also has significant background levels in the troposphere. Observations from remote sites along the west coast of North America show that seasonal mean O 3 ranges from 30 to 50 ppb, thus the "background" air that enters the U.S. with the prevailing westerly winds already contains a substantial fraction of the 70 ppb standard. Observations and/or modeling show that, on some days, O 3 at a site may be enhanced by noncontrollable O 3 sources (NCOS), such as recent stratosphere-to-troposphere transport (STT), long-range transport from non-domestic sources, lightning, or photochemical production from natural NO x and VOC precursor emissions including wildfires initiated by natural or human causes (Jaffe et al., 2005; Parrish et al., 2010; Ambrose et al., 2011; Wigder et al., 2013a; Langford et al., 2009, 2012). While foreign sources of pollution are theoretically controllable, these are beyond the control of any local jurisdiction, so for this discussion we include these in the NCOS category. In addition, foreign pollution is often mixed in with other types of NCOS (e.g., Cooper et al., 2004b; Ambrose et al., 2011), making it difficult to quantify these sources. The CAA provides several mechanisms, including Section 319b (Exceptional Events Rule (US EPA, 2016a, b)) and Section 179B (international transport), that offer policy solutions to account for high O 3 due to these noncontrollable sources (US EPA, 2013). We note that the EPA uses the term "exceptional events (EEs)" to consider days when surface O 3 is elevated above the NAAQS by episodic natural sources such as stratospheric intrusions or wildfires that cannot be "reasonably controlled" (EEs can also include episodic emissions of anthropogenic precursors if these were not reasonably controllable and are unlikely to recur at a specific location).
EE-influenced data can be excluded from the design value calculation if they are identified by the state agency and supported by evidence, which is then evaluated and approved by the EPA. Thus, excluding high O 3 caused by exceptional events may allow an area to be designated in attainment of the NAAQS. For areas that would otherwise violate the NAAQS because of international transport, Section 179B provides relief from penalties for failing to attain the NAAQS, but days affected by international transport are included in the calculation of the design value. In this review we focus on NCOS, rather than EEs, to consider more broadly the contributions of both international transport and EEs. Individual NCOS events can increase local surface O 3 levels on timescales ranging from hours to days before dissipating to become part of the tropospheric background. They are potentially important throughout the U.S., but the impact appears to be greatest in the western states where wildfires tend to be larger, deep stratospheric intrusions are more frequent (Skerlak et al., 2014), and transport from Asia is more important (Verstraeten et al., 2015).
The frequency of NCOS events, and thus higher background O 3 , in the western U.S. makes it essential that we understand the sources of that O 3 , and this requires careful analysis using both observations and models. In this review, we use the term "U.S. background O 3 (USB O 3 )" as O 3 formed from NCOS plus anthropogenic sources in countries outside the U.S. (Dolwick et al., 2015). While USB O 3 incorporates the influence from NCOS, in our discussion, we focus on NCOS that elevate O 3 on a short-term basis (e.g., daily), to values above the seasonal mean USB O 3 . Although the global CH 4 burden reflects both domestic and international emissions, we include its contributions in USB O 3 , similar to previous work (e.g., Fiore et al., 2014a). Essentially, USB O 3 encompasses the contributions from natural and foreign sources of O 3 that cannot be controlled by precursor emissions reductions solely within the U.S. Since USB O 3 varies daily and is a function of season, meteorology, and elevation, quantification of USB O 3 on days that exceed the NAAQS is more relevant to air quality management than seasonal mean estimates. We note that some studies use the term "North American background (NAB) O 3 ", which is similar to USB O 3 , but is defined as O 3 formed from natural sources plus anthropogenic sources in countries outside the U.S., Canada, and Mexico.
A quantitative understanding of USB O 3 is essential for air quality management in general, and for state and local efforts to meet the NAAQS in particular. This is especially true given the recent lowering of the NAAQS O 3 levels and the associated increasing relative importance of USB O 3 as domestic precursor emissions decrease. Primary tools used by states and the EPA to manage air quality are the State Implementation Plans (SIPs; US EPA, 2015) or Federal Implementation Plans (FIPs). These documents are federally-enforceable plans developed by and/or for states that identify how the state will attain and/or maintain the air quality standards. A key component of each SIP is the maintenance of a network of regulatory O 3 monitors that use standardized sampling methodologies, quality assurance, and siting requirements established by the EPA, along with other federal, tribal, state and local agencies. Knowledge of the sources contributing to the ambient levels on the highest O 3 days is important because controlling the domestic contribution to O 3 production affects the estimates of both the health benefits and the economic costs and benefits associated with achieving the NAAQS (US EPA, 2014c). This knowledge is also important for SIP development because it helps states identify the most effective emission control strategies.
Quantification of USB O 3 requires a chemical transport model (CTM) since it cannot be measured directly (e.g., Fiore et al., 2002, 2003; Zhang et al., 2009), but these models must be informed and evaluated using observations. In addition to USB O 3, an alternative useful metric for evaluating modeled mole fractions is "baseline" O 3, which is the distribution of O 3 observations at a rural or remote site that has not been influenced by recent, local emissions (HTAP, 2010). We note that this definition differs from the one adopted by a National Research Council (NRC) report (NRC, 2010), which defined baseline as "the statistically defined lowest abundances of O 3 in the air flowing into a country." We find the HTAP (2010) definition to be a more useful metric, since the lowest mole fractions may be associated with a particular season or transport pathway and therefore not representative of all conditions. Measurements of baseline O 3 are expected to be greater than model-estimated USB O 3 since the former includes some O 3 produced many days earlier by U.S. emissions that have been recirculated regionally or globally. In the following discussion, it is important to keep in mind that baseline O 3 is not the same as USB (or NAB) O 3, but both can be characterized by a seasonal mean, MDA8, 3-year ODV, and other statistical metrics. Because states develop their SIPs by evaluating O 3 response to emissions controls on the highest modeled O 3 days, an especially useful metric is the estimate of USB and NCOS O 3 on those days. Natural, international, and domestic sources all contribute to observed surface O 3. Figure 1 demonstrates how these sources contribute to O 3 mole fractions that are used in air quality management decisions. Depending on the magnitude of the sources, such as stratospheric intrusions or wildfires, these sources could be identified as EEs.
However, the magnitude of the events and the ability of current data and tools to characterize it will impact whether specific episodes qualify as EEs. Which NCOS can be removed from the analysis may impact air quality management including SIPs.
In this review, we focus mainly on work completed since 2011 and build on earlier studies (NRC, 2010; McDonald-Buller, 2011). We address a number of scientific questions:

The positive vertical gradient and local orographic flows also cause the observations at the Mt. Bachelor Observatory (MBO) to show lower O 3 in the daytime, when air from the surrounding valley is lifted to the summit, and higher O 3 at night, when the site is exposed to the free troposphere (Weiss-Penzias et al., 2006). Altitude also influences the ODV metrics, as can be seen by comparing nearby rural sites at different elevations. Table 1 shows ODVs for pairs of rural monitoring sites in Oregon, Wyoming, and New Hampshire. In each case the higher elevation site (>1000 meters elevation difference) shows an ODV that is enhanced by at least 10 ppb compared to the lower elevation site. This reflects both the higher seasonal median O 3 and larger contributions from NCOS. For the Mt. Washington, New Hampshire site, and to a lesser extent the Centennial, Wyoming site, this could also reflect greater transport of domestic O 3, given that these sites are downwind of major U.S. source regions (e.g., Huang et al., 2013a). This is not the case for Mt. Bachelor, however, which receives minimal influence from U.S. anthropogenic sources (Ambrose et al., 2011; Wigder et al., 2013a). Furthermore, the O 3 lifetime is longer in the lower free troposphere than in near-surface air where it undergoes depositional loss to the surface and where chemical reaction rates may be enhanced in warmer, more humid air masses. The O 3 levels measured at mountain sites and nearby populated areas may be similar, however, if the boundary layers are sufficiently deep and well-mixed, as is often the case in the Intermountain West (Langford et al., 2017).

Approaches used to quantify USB and NAB O 3
Most estimates of background O 3 have been made using regional CTMs such as the CMAQ (Community Multiscale Air Quality Modeling System) (Byun and Schere, 2006) and CAMx (Comprehensive Air Quality Model with Extensions) (Ramboll Environ, 2014) models that are initialized using lateral boundary conditions (BCs) derived from global models. In this section, we summarize the model approaches used to estimate USB O 3 and examine their different merits, limitations, and best uses. We note that different methods of employing CTMs may be best suited (scientifically or computationally) to a specific policy or research question. Biases owing to misspecification of emissions, errors in physical processes, choices regarding chemical mechanisms (Knote et al., 2015), model resolution (Lin et al., 2010), and plume dispersion (Rastigejev et al., 2010; Eastham and Jacob, 2017) may propagate into biases in the source attribution. In some cases, as described in more detail later, ad hoc methods for bias-correcting model estimated source attribution have been applied (e.g., Lin et al., 2012a, b; Lapina et al., 2014).
The most common modeling approach for quantifying USB O 3 is the "zero-out" method, whereby domestic anthropogenic emissions are set to zero (e.g., Fiore et al., 2014a) to provide a direct estimate of the O 3 levels that would exist without domestic emissions. Nuances arise when applying the zero-out method to regional models wherein USB O 3 is transported into (and potentially out of) the regional modeling domain. For example, the regional boundary conditions used for defining USB O 3 may come from a global model run with U.S. anthropogenic emissions set to zero (e.g., Emery et al., 2012), or may be drawn from global model runs without any emissions perturbations (e.g., Lefohn et al., 2014). Huang et al. (2017) found that surface O 3 responses in a regional model over North America to changes in USB O 3 contribution from East Asia were smaller than those in the global models used to generate the boundary conditions. Zero-out scenarios also change O 3 production efficiency within the model domain, causing the contributions from different sectors and regions to be non-linearly related. This is particularly obvious in the case of NO x titration, which is removed when local emissions are zeroed, causing O 3 increases. This non-linearity can prevent the source contributions from adding up to 100% of the total modeled O 3 levels, which could be a concern when multiple model zero-out simulations from different source regions are combined.
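The non-additivity of zero-out contributions can be illustrated with a toy calculation. The response function below is purely hypothetical (a concave curve chosen for illustration, not a chemical mechanism or any CTM), and the emission values are arbitrary units; the sketch only shows why contributions from separate zero-out runs need not sum to the total anthropogenic enhancement.

```python
import math


def toy_o3(e_us, e_foreign):
    """Hypothetical O3 response (ppb) to two emission sources.

    Assumption: a 20 ppb natural floor plus a concave (square-root)
    response to total emissions, standing in for nonlinear chemistry.
    """
    return 20.0 + 40.0 * math.sqrt(e_us + e_foreign)


# Base case and three zero-out runs (arbitrary emission units)
base = toy_o3(0.5, 0.5)        # all sources on
usb = toy_o3(0.0, 0.5)         # U.S. anthropogenic emissions zeroed
no_foreign = toy_o3(0.5, 0.0)  # foreign anthropogenic emissions zeroed
natural = toy_o3(0.0, 0.0)     # both zeroed

us_contribution = base - usb
foreign_contribution = base - no_foreign
anthropogenic_enhancement = base - natural

print(f"USB O3 (U.S. zero-out): {usb:.1f} ppb")
print(f"Sum of zero-out contributions: "
      f"{us_contribution + foreign_contribution:.1f} ppb")
print(f"Total anthropogenic enhancement: {anthropogenic_enhancement:.1f} ppb")
```

In this toy case the two zero-out contributions sum to less than the total enhancement. A response function with the opposite curvature would make them sum to more, so the direction of the mismatch depends on the chemical regime.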
Sensitivity methods can also be used to estimate USB O 3 and contributions by source. The most basic implementation of sensitivity modeling is direct perturbation modeling, where emissions from each source or region of interest (or contributions from the stratosphere) are reduced or increased by small amounts (e.g., ±20%; Wu et al., 2009; Galmarini et al., 2017) such that nonlinear O 3 responses are not typically triggered in polluted conditions (Cohan et al., 2005). At the extreme limit of perturbation methods (i.e., infinitesimally small perturbations), techniques such as adjoint modeling (Sandu et al., 2005; Zhang et al., 2009) and decoupled direct methods (DDM; Dunker et al., 1981; Hakami et al., 2004) efficiently calculate the local linear sensitivity of USB O 3 to numerous source contributions. These methods provide results suited for projecting changes in O 3 owing to small emissions perturbations (e.g., <20-50%; Reidmiller et al., 2009; Huang et al., 2017). Second-order correction terms can be applied to sensitivity approaches to estimate O 3 contributions caused by larger perturbations (Wild et al., 2012), or nonlinear changes can be evaluated using path-integral methods (Dunker et al., 2017). While these techniques can track sensitivities within a given model, they depend strongly on the emission inventories applied in that model. It is thus critical to evaluate uncertainties in historic and future source estimates, and how these uncertainties propagate into projections of specific O 3 metrics. Tagging techniques track source contributions in models without perturbing emissions (Cohan and Napelenok, 2011; Grewe et al., 2010). Tagging relies on a set of rules for assigning each molecule of O 3 to a particular source. These sources may be defined as specific tropospheric production regions (e.g., Wang et al., 1998; Fiore et al., 2002) or the stratosphere (e.g., Lin et al., 2012a; Zhang et al., 2014).
Other tagging approaches use chemical indicators of the factors limiting O 3 production (e.g., the ratio of hydrogen peroxide to nitric acid production, or the maximum incremental reactivity of VOC families) to assign O 3 to either NO x or VOC sources, such as the CAMx OSAT (Ozone Source Apportionment Technology) and CMAQ ISAM (Integrated Source Apportionment Method) source tagging schemes (Ramboll Environ, 2014; Kwok et al., 2015). Tagging may also be defined through the addition of tracers to track the origin of precursor molecules such as NO x (e.g., Emmons et al., 2012; Pfister et al., 2013) or VOCs (Butler et al., 2011). Other tagging rules include assignment preferentially to anthropogenic precursors (Ramboll Environ, 2014), or tagging of all O 3 precursors (NO x, CO, and VOCs) such as in Grewe et al. (2010, 2017) and Guo et al. (2017), which leads to larger estimates of USB O 3 than sensitivity studies or tagging only one type of precursor. Ying and Krishnan (2010) developed a scheme that includes tracers for O 3 produced from individual species; the treatment of VOC impacts on radical species in this approach may underestimate contributions from reactive VOCs and overestimate those from less reactive VOCs (Kwok et al., 2015). Lefohn et al. (2014) define an Emissions-Influenced Background (EIB) that accounts for the decrease in the lifetime of USB O 3 caused by anthropogenic emissions. This diversity of tagging approaches can make direct comparisons across such studies challenging, and the differences in source attribution estimates as well as the computational cost of these methods make them less well suited than zero-out simulations for estimating USB O 3.
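The direct perturbation approach and its second-order correction, described earlier in this section, amount to a finite-difference Taylor expansion around the base case. The sketch below uses a hypothetical one-source response function (not a CTM) and illustrative names; it estimates first- and second-order sensitivities from ±20% runs and uses them to project a larger 50% reduction.

```python
import math


def toy_o3(emission_scale):
    """Hypothetical O3 response (ppb) to a scaling of one emission source.

    Assumption: concave square-root response; emission_scale = 1 is the
    base case.
    """
    return 20.0 + 40.0 * math.sqrt(emission_scale)


def sensitivities(model, delta=0.2):
    """Central finite differences from runs at (1 - delta) and (1 + delta).

    Returns first- and second-order sensitivities with respect to the
    fractional emission change.
    """
    up, base, down = model(1.0 + delta), model(1.0), model(1.0 - delta)
    s1 = (up - down) / (2.0 * delta)
    s2 = (up - 2.0 * base + down) / delta ** 2
    return s1, s2


def projected_o3(model, fractional_change, order=2):
    """Taylor-series projection of O3 for a given fractional change."""
    s1, s2 = sensitivities(model)
    estimate = model(1.0) + s1 * fractional_change
    if order >= 2:
        estimate += 0.5 * s2 * fractional_change ** 2
    return estimate


truth = toy_o3(0.5)  # explicit run with emissions halved
first = projected_o3(toy_o3, -0.5, order=1)
second = projected_o3(toy_o3, -0.5, order=2)
print(f"explicit run: {truth:.2f} ppb")
print(f"first-order:  {first:.2f} ppb")
print(f"second-order: {second:.2f} ppb")
```

For this toy function the second-order projection lies closer to the explicit run than the first-order one, which is the motivation for applying correction terms when perturbations are large.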
Several studies have compared USB O 3 estimates calculated using different methods. In one study, a tagging source apportionment method using CAMx was compared to a zero-out method using CMAQ. The two approaches were found to provide similar estimates of April-October mean NAB O 3 in rural areas, but in urban areas CAMx APCA (Anthropogenic Precursor Culpability Assessment) provided lower estimates of background O 3 compared to CMAQ zero-out (Dolwick et al., 2015). Other comparisons note that tagging is more appropriate for source attribution than for estimating responses to emissions changes (e.g., Collet et al., 2014). In cases strongly affected by nonlinearities of O 3 formation, the choice of source estimation method can lead to considerable differences (Grewe et al., 2010; Stock et al., 2013; Lapina et al., 2014; Emmons et al., 2012). Parrish et al. (2017a) noted that the running average ODVs for sites in Southern California over the past 4 decades can be fit to a simple exponential decay function. They postulated that the asymptotic value of this fit is the same as USB O 3. However, it is difficult to compare this approach with modeling studies that use a more rigorous definition for USB O 3. To derive USB O 3 from the Parrish et al. (2017a) method, it is necessary to assume that U.S. emissions are asymptotically approaching zero, that emissions and ODVs are directly related, and that USB O 3 on ODV days is constant over the analysis time period. Because of these limitations, the "background ODVs" calculated in this manner are probably more representative of current baseline O 3, plus some unquantified contribution from U.S. anthropogenic emissions.
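The exponential decay fit can be sketched as follows, assuming the functional form ODV(t) = y0 + A exp(-(t - t0)/tau), where y0 is the asymptote. The fitting procedure here (a grid search over tau with a closed-form linear least-squares solution for y0 and A) is a generic choice made for this illustration, not necessarily the procedure used by Parrish et al. (2017a), and the data are synthetic values invented for the example rather than observed design values.

```python
import math


def fit_exponential_decay(years, odvs, tau_grid):
    """Fit odv = y0 + A * exp(-(t - t0) / tau) with t0 = years[0].

    For each candidate tau the model is linear in (y0, A), so those two
    parameters are solved by ordinary least squares; the tau with the
    smallest sum of squared errors is kept.
    """
    t0 = years[0]
    n = len(years)
    best = None
    for tau in tau_grid:
        x = [math.exp(-(t - t0) / tau) for t in years]
        sx, sy = sum(x), sum(odvs)
        sxx = sum(v * v for v in x)
        sxy = sum(v * y for v, y in zip(x, odvs))
        denom = n * sxx - sx * sx
        if abs(denom) < 1e-12:
            continue  # degenerate regression for this tau
        a = (n * sxy - sx * sy) / denom
        y0 = (sy - a * sx) / n
        sse = sum((y0 + a * xi - yi) ** 2 for xi, yi in zip(x, odvs))
        if best is None or sse < best[0]:
            best = (sse, y0, a, tau)
    _, y0, a, tau = best
    return {"y0": y0, "A": a, "tau": tau}


# Synthetic, noise-free example (hypothetical parameter values)
years = list(range(1980, 2020))
synthetic_odvs = [60.0 + 150.0 * math.exp(-(t - 1980) / 20.0) for t in years]
tau_grid = [5.0 + 0.5 * i for i in range(111)]  # 5 to 60 years

fit = fit_exponential_decay(years, synthetic_odvs, tau_grid)
print(f"asymptote y0 = {fit['y0']:.1f} ppb, tau = {fit['tau']:.1f} yr")
```

The asymptote y0 is the quantity that would be interpreted as a "background ODV", subject to the assumptions listed above.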

Spatial and temporal distributions of USB O 3
Here we review published work on spatial and temporal distributions of USB O 3 from CTMs and summarize consistent and robust patterns. We also identify discrepancies between estimates of USB O 3 and, if possible, the causes for these discrepancies. While a clear, quantitative synthesis across the published literature (Tables S1 and S2) is confounded by inconsistencies in the metrics reported and the time periods and regions considered, some robust patterns are evident and several CTMs have been able to capture the major features in the daily and seasonal surface O 3 patterns (Reidmiller et al., 2009; Schnell et al., 2015).
The McDonald-Buller et al. (2011) review relied heavily on background O 3 estimates from the global GEOS-Chem (GC) model available at that time. Major methodological advances since McDonald-Buller et al. (2011) include seasonal mean USB and NAB O 3 estimates from additional global and regional models (Table S1) and studies quantifying the influence of NCOS on surface O 3 distributions (Table S2). A broad set of modeling studies robustly shows that seasonal mean USB and NAB O 3 are usually largest at western U.S. high-altitude sites (Table S1), as expected from the general increase in O 3 with altitude in the troposphere (e.g., Newchurch et al., 2003; Logan et al., 1999). This spatial pattern was emphasized in the earlier McDonald-Buller et al. (2011) review paper and was based on observations of baseline O 3 and published USB and NAB O 3 estimates from the GC model.
Individual studies report different O 3 metrics and vary in their definitions of peak O 3 season, ranging from two to seven months, mostly in spring and summer. Synthesizing across these studies, we find a range of 15-65 ppb (Table S1) for seasonal mean USB O 3 (MDA8) over the U.S. The higher end of this range occurs over high-altitude western U.S. sites in spring when Asian pollution and transport from the stratosphere make their largest contributions (20-35 ppb; Table S2) and when the O 3 lifetime is longer than in summer (see Table S1). In the eastern U.S. and along the California coast, seasonal mean NAB O 3 from the GC model is in the range of 20-40 ppb (Fiore et al., 2014a) and USB O 3 is similar for the California coast from CMAQ (Dolwick et al., 2015). Other O 3 metrics, such as those relevant for vegetation exposure, like W126, a 3-month integral that heavily weights high O 3, differ in their sensitivity to USB O 3 (e.g., Lapina et al., 2014, 2016; Huang et al., 2013b). A 3-model average NAB O 3 contributed 64-78% of the May-July daytime O 3 over the Intermountain West during 2010, but only 9-27% of the W126, which more strongly weights the highest O 3 levels (Lapina et al., 2014).
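The strong weighting of high O 3 in W126 explains why background O 3 can dominate mean daytime O 3 yet contribute far less to W126. The sketch below assumes the commonly used sigmoidal weighting, w = 1 / (1 + 4403 exp(-126 C)) with C in ppm, applied to hourly values between 08:00 and 19:59 local time and accumulated as the largest consecutive 3-month sum. The function names are illustrative, and data completeness adjustments used in formal calculations are omitted.

```python
import math


def w126_weight(c_ppm):
    """Sigmoidal W126 weight for an hourly O3 value in ppm."""
    return 1.0 / (1.0 + 4403.0 * math.exp(-126.0 * c_ppm))


def daily_w126(hourly_ppb):
    """Daily index (ppm-hours) from 24 hourly values in ppb.

    Only hours 8 through 19 (08:00-19:59 local time) are included.
    """
    total = 0.0
    for hour in range(8, 20):
        c_ppm = hourly_ppb[hour] / 1000.0
        total += c_ppm * w126_weight(c_ppm)
    return total


def seasonal_w126(monthly_sums):
    """Largest sum over any three consecutive monthly W126 totals."""
    return max(sum(monthly_sums[i:i + 3])
               for i in range(len(monthly_sums) - 2))


# A 40 ppb hour receives a weight of a few percent; a 100 ppb hour
# receives a weight close to one.
for ppb in (40, 60, 80, 100):
    print(ppb, round(w126_weight(ppb / 1000.0), 3))
```

Because hours near typical seasonal mean background levels receive small weights, most of the accumulated index comes from the highest hourly values.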
NCOS (and USB O 3) also show significant interannual variability, complicating direct comparisons across studies from different years. The studies in Table S2 summarize individual seasonal mean NCOS estimates, which include up to 25 ppb transported from the stratosphere, up to 10 ppb produced from lightning NO x, and up to a few ppb from wildfires. Estimates for seasonal mean Asian influence are generally below 5 ppb (Table S2). Anthropogenic CH 4 is included in the USB O 3 estimates in Table S1, and has been estimated to contribute ~5 ppb to U.S. surface O 3 (Fiore et al., 2008). Near the U.S. borders with Canada and Mexico, international pollution transport enhances USB O 3 relative to NAB O 3 (Wang et al., 2009; Guo et al., 2018). In the southwestern U.S., seasonal mean USB O 3 is higher than in other regions during both spring and summer, and NCOS play a more important role on high O 3 days (Fiore et al., 2014a; Langford et al., 2017), although stratospheric intrusions occasionally decrease surface O 3 in the heavily polluted Los Angeles Basin (Langford et al., 2012).
At some locations, the influence from individual NCOS (Figure 1) leads to day-to-day variability in observed O 3 and modeled USB O 3. For example, at high-altitude western U.S. sites, USB O 3 correlates with simulated total ground-level MDA8 O 3, implying that USB O 3 drives day-to-day variations in observed O 3 (Fiore et al., 2014a; see their Figure 8). Other models consistently find that western USB O 3 increases with observed (total) O 3 (Lefohn et al., 2014; Huang et al., 2015), although Dolwick et al. (2015) note that the fractional USB O 3 contribution is typically less for the highest modeled values. Numerous studies have shown that NCOS can contribute up to 30 ppb to the observed MDA8 at regulatory monitors due to deep stratospheric intrusions, especially at high-altitude sites (e.g., Langford et al., 2009, 2015a; Lin et al., 2012a, 2015a; Knowland et al., 2017) or from wildfires (Singh et al., 2012; Dreessen et al., 2016; Gong et al., 2017). Cross-border transport from Mexico or Canada can also contribute to significant variations in daily MDA8 values (Wang et al., 2009). Modeled USB O 3 also shows these daily variations due to NCOS, with modeled USB MDA8 O 3 sometimes exceeding 70 ppb (Lin et al., 2012a, b; Zhang et al., 2014). Models will not necessarily capture the O 3 maximum on the highest observed days, implying uncertainty in the simulated partitioning of total O 3 into USB O 3 and other sources (Fiore et al., 2014a). Furthermore, even if a model captures the observations perfectly, it does not necessarily follow that the simulated source attribution is correct. Figure 3 illustrates that the 4th highest NAB MDA8 value at rural locations in the NOAA GFDL AM3 model is much lower than the observed 4th highest MDA8 over most densely populated U.S. regions, but that NAB O 3 contributes to some of the highest observed days in the Intermountain West, Pacific Northwest, and along the U.S.-Canada border.
At some high elevation sites, the annual 4th highest NAB MDA8 from AM3, averaged over 2010-2014, exceeds 60 ppb, although we note that AM3 simulations may be biased high by too much transport from the stratosphere (Lin et al., 2012b; Fiore et al., 2014a). Over the eastern U.S., where Figure 3 shows 4th highest NAB MDA8 values below 60 ppb, both AM3 and GEOS-Chem indicate that the highest O 3 events are typically fueled by U.S. anthropogenic emissions with little correlation between USB O 3 and total simulated O 3 (with the possible exception of some sites along the Gulf Coast; Figure 8 of Fiore et al., 2014a).
A few of the studies in Table S1 compared seasonal mean and daily NAB O 3 estimates across 2-4 models and found discrepancies in the magnitude and variability, both spatial and temporal, of NAB O 3 estimates for the MDA8 (Fiore et al., 2014a), daytime mole fractions, and the W126 (Lapina et al., 2014) O 3 metrics. The AM3 model generally simulates significantly higher seasonal mean values in both spring and summer (up to 20 ppb higher), compared to other models. Fiore et al. (2014a) concluded that differences in model estimates of NAB O 3 resulted primarily from different model representations of stratosphere-troposphere exchange, wildfire, and lightning sources (and their subsequent chemistry) as well as isoprene oxidation chemistry in the models. HTAP (2010) and Huang et al. (2017) show that Asian and other intercontinental O 3 sources also vary by model. Orbe et al. (2017) show how different convection schemes can have large influences on transport, even when using the same meteorological fields. Dolwick et al. (2015) applied two regional models to compare the zero-out and source apportionment approaches and found similar seasonal mean MDA8 USB O 3 estimates (after correcting for biases as large as ±10 ppb relative to observations in each of the regional models). Discrepancies between these USB O 3 estimates occurred most strongly in urban areas where anthropogenic emissions can lower background O 3 levels due to NO x titration (Dolwick et al., 2015). Consideration of odd oxygen in the tracers used for source apportionment would minimize such discrepancies. Odd oxygen here would be defined as including O 3 + NO x to account for conversion of O 3 to NO 2 (by NO titration).
Uncertainty in estimates of USB O 3 can be difficult to consolidate across studies into an overall uncertainty estimate owing to differences in region, season, source apportionment method, and O 3 metrics considered in different works. Nevertheless, insight into the range of uncertainties can be gained from several studies that have considered multiple models or approaches in an internally self-consistent manner. While model diversity does not strictly represent the total model uncertainty (which must also consider bias against observations), it is still a useful measure of confidence in USB O 3 estimates. For example, the daytime NAB O 3 in Lapina et al. (2014) from three different global models showed modest differences over most regions of the U.S., but much more significant differences in NAB O 3 for the W126 vegetation index. In this case, the contribution from NAB O 3 to W126 can differ by a factor of 2 using different models. In Dolwick et al. (2015), two different regional models and source apportionment methods were used to estimate seasonal MDA8 USB O 3 . They found that at over 75% of the locations, the differences were less than 2.5 ppb after the base models were bias corrected although we note that the same global model boundary conditions were used in each regional model. In Fiore et al. (2014a), estimates of MDA8 NAB from two global models differed by 1-10 ppb, depending upon region, season, and altitude. Hogrefe et al. (2018) evaluated surface O 3 simulations in a regional model using four sets of boundary conditions from different global models (AM3, MOZART, Hemispheric CMAQ, and GEOS-Chem). The largest differences exceed 10 ppb for seasonal mean O 3 observed at U.S. sites and reached 15 ppb on individual days. For two sets of boundary conditions, observation-model differences were much smaller (typically ±4 ppb). 
Qualitative synthesis by the authors of all these estimates of model differences and estimates of model biases suggests uncertainties in seasonal mean USB O 3 of about ±10 ppb.
Comparisons to observations are essential for assessing the fidelity of models used to quantify USB O 3 and NCOS and their spatial and temporal variability and lending confidence to their estimates. In some cases, different models bracket observed O 3 abundances (e.g., Fiore et al., 2014a), but in others, such as for ground-level O 3 over the southeastern U.S. in summer, systematic model biases exist (e.g., Travis et al., 2016). Travis et al. (2017) found that this pervasive positive summertime bias over the southeast U.S. is restricted to the surface and may reflect shortcomings in model resolution of asymmetric top-down and bottom-up vertical mixing. Systematic biases may also reflect missing (or poorly represented) loss processes (e.g., halogen chemistry (Sherwen et al., 2017) or dry deposition (e.g., Val Martin et al., 2014)). Some of the studies in Table S1 have attempted to bias-correct USB or NAB O 3 estimates by simply assuming the bias is entirely due to USB O 3 (Lin et al., 2012b) or by assuming that the relative model contributions from individual sources are accurate such that USB O 3 is adjusted proportionally to its contribution to total simulated O 3 (Dolwick et al., 2015). The former approach assumes a single process causes the error whereas the latter assumes the model is missing a sink that acts on all O 3 regardless of the source (or overestimates O 3 from all sources equally). Models assimilating tropospheric satellite-based O 3 columns or aircraft-based profiles show improved model representation of western U.S. ozonesonde profiles (e.g., Huang et al., 2015) but would require assumptions to partition the adjustment into USB O 3 versus O 3 produced from U.S. anthropogenic emissions. 
While models adjusting emissions of O 3 precursors based on satellite data assimilation (e.g., Huang et al., 2015) could lead to improved estimates of USB O 3, this approach is still subject to errors in model transport and cannot differentiate between natural and anthropogenic sources occurring in the same model grid cell. Although a single model may best represent a particular site or day of interest, a multi-model approach may best provide a general characterization of spatial, seasonal, and daily variability in USB O 3 until the root sources of individual model biases are clear. Future efforts would benefit from moving beyond abundance-based evaluations and towards process-based evaluation to demonstrate whether models capture the variability in observations attributable to USB O 3 and specific NCOS. This type of evaluation will require intensive field campaigns and long-term observations that measure not only O 3 but also related meteorological and chemical variables. Locations and times with inter-model differences with major implications for air quality management could guide targeted observations for evaluating process-level representation in the models. Efforts to coordinate multi-model approaches, as has been done for quantifying the influence of foreign anthropogenic emissions on surface O 3 under the Task Force on Hemispheric Transport of Air Pollution (HTAP, 2010; Galmarini et al., 2017), would facilitate a more systematic and rigorous assessment of our quantitative understanding of USB O 3 as represented across a suite of modeling systems.
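The two bias-correction assumptions discussed above (assigning the entire model bias to USB O 3, or scaling USB O 3 in proportion to its share of total simulated O 3) can be contrasted with a minimal sketch. The function names and the numerical values are hypothetical and chosen only for illustration; they are not taken from Lin et al. (2012b) or Dolwick et al. (2015).

```python
def bias_correct_additive(usb, total_model, total_obs):
    """Assume the whole model bias (model minus observed) belongs to USB O3."""
    bias = total_model - total_obs
    return usb - bias


def bias_correct_proportional(usb, total_model, total_obs):
    """Assume modeled source fractions are right; scale USB O3 by obs/model."""
    return usb * (total_obs / total_model)


# Hypothetical seasonal mean MDA8 values in ppb (illustration only).
model_total = 60.0   # total simulated O3
obs_total = 54.0     # observed O3
model_usb = 40.0     # simulated USB O3

additive = bias_correct_additive(model_usb, model_total, obs_total)
proportional = bias_correct_proportional(model_usb, model_total, obs_total)
print(f"additive: {additive:.1f} ppb, proportional: {proportional:.1f} ppb")
```

With a 6 ppb positive model bias, the additive assumption removes all 6 ppb from USB O 3, whereas the proportional assumption removes only the share that USB O 3 represents of the simulated total, so the two corrected estimates differ even though both reproduce the observed total.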
Satellite observations enable new global model analyses (via data assimilation) and have made significant contributions to EE analyses (e.g., Fiore et al., 2014b). However, satellite data have not yet been able to retrieve O 3 mole fractions in the boundary layer and at the surface. Some satellite analyses have quantified tropospheric column O 3, either directly (e.g., Liu et al., 2010) or by difference (Ziemke et al., 2011). However, this situation is likely to change dramatically as several geostationary satellite instruments will be deployed in the next 5 years. This includes the U.S. Tropospheric Emissions: Monitoring Pollution instrument (TEMPO), the Korean Geostationary Environment Monitoring Spectrometer (Bak et al., 2013), and the European Sentinel-4 satellite (Zoogman et al., 2017). By measuring backscattered solar radiation in both the visible and near ultraviolet (290-740 nm) from a geostationary orbit, TEMPO should be able to distinguish boundary layer O 3 from that in the free troposphere and stratosphere, and provide hourly data for the continental U.S. on key O 3 precursors, such as nitrogen dioxide (NO 2) and formaldehyde (HCHO). Specifications for TEMPO call for a precision of 10 ppb for the 0-2 km and free tropospheric O 3 measurements. Thus, TEMPO should provide key constraints on modeled O 3 that can improve source and EE attribution (Zoogman et al., 2014, 2017). The satellite community has been engaged with regional air quality efforts via programs such as the NASA Air Quality Applied Sciences Team, and this has led to important partnerships between the scientific and regulatory communities (e.g., Fiore et al., 2014b; Witman et al., 2014).

Interannual variability and trends in baseline and USB O 3
Generalization of individual measurement and model results is complicated by the fact that background O 3 exhibits both long-term trends and substantial year-to-year variability. Observed year-to-year variations of surface O 3 show large-scale similarity across sites over the Intermountain West (Jaffe, 2011; Lin et al., 2017), indicating that the controlling processes operate across large scales. Both mean O 3 and the frequency of high O 3 events (>65 ppb) measured at western U.S. rural sites increased in the springs following the strong La Niña winters that occurred in 1998-1999-2011 (Xu et al., 2017). Anomalously frequent high-O 3 events were also observed at Mt. Bachelor and urban sites downwind in April-May 2012. The enhanced O 3 in spring 2012 resulted in 3-6 days with an MDA8 greater than 70 ppb at several rural locations including Great Basin National Park and Lassen Volcanic National Park. Using the AM3 model, Lin et al. (2015b) were able to capture the significant interannual variability and identify the cause. The highest MDA8 values at western U.S. rural sites occurred in the springs of 1999, 2011, and 2012, following La Niña patterns. The increased frequency of deep tropopause folds, linked to a cyclical amplification of the polar jet stream, is the key driver of year-to-year variability of springtime high USB O 3 events over the western U.S. (Lin et al., 2015b).
Large-scale variations in temperature, pressure, and airflow can also lead to substantial year-to-year variations in O 3 production, air mass stagnation, snowpack accumulation, and wildfire severity (Fiore et al., 2015; Mote et al., 2016; Gong et al., 2017; Jaffe and Zhang, 2017; Lin et al., 2017; Shen and Mickley, 2017). Interannual variability of surface O 3 in the Intermountain West during summer is found to correlate with wildfire severity (Jaffe, 2011; Jaffe et al., 2008). This correlation may also reflect common underlying correlations with temperature rather than a causal relationship between fire and O 3, as supported by a model with constant fire emissions, which captures the observed O 3 interannual variability. While wildfire emissions can enhance summertime monthly mean O 3 at individual sites by 2-8 ppb, high temperatures and the associated buildup of O 3 produced from regional anthropogenic emissions are also important to elevating observed summertime O 3 in the western U.S. and throughout the rest of the country.
Information on long-term baseline O 3 trends requires rural monitoring sites combined with methods that can select the data that are representative of air masses originating beyond the nation's borders. While boundary layer O 3 observations show more influence from local, continental, or marine sources, observations at high elevation sites (1.5-3.0 km asl) show greater influence from large-scale downward mixing of free tropospheric air, although they can also be influenced by transport of photochemically aged plumes from nearby urban areas or wildfires during summer (e.g., Ambrose et al., 2011). Studies of baseline O 3 trends have mainly focused on the limited number of well-positioned monitoring sites along the U.S. borders (Parrish et al., 2017b; Gratz et al., 2015; Zhang and Jaffe, 2017) and across the Intermountain West during spring due to the great interest in the potential impact of rising Asian emissions on U.S. surface O 3 (Jacob et al., 1999).
Cooper et al. (2012) found a tendency towards increasing O 3 at high elevation rural sites across the western U.S. in spring and no clear trend in summer over the period 1990-2010, despite stringent precursor emission controls in the U.S. that have decreased O 3 in urban areas (e.g., Russell et al., 2012). Extending the analysis to 1988-2014, Lin et al. (2017) found 0.2-0.5 ppb yr -1 increases in median springtime MDA8 O 3 measured at 50% of 16 western U.S. high elevation sites, with 25% of the sites showing increases across the entire O 3 mole fraction distribution. There is also evidence that O 3 increased in the mid-troposphere (500 hPa or ~5.7 km asl) above western North America during April-May at the rate of ~0.3 ppb yr -1 from 1995 to 2014 (Lin et al., 2015b).
Baseline O 3 trends on the West Coast of the U.S. have been determined at several of the surface and mountain sites described above, although the data records are relatively short. From 2004 to 2015, mean O 3 at Mt. Bachelor (2.8 km asl) increased significantly: 0.62 ± 0.25 ppb yr -1 in spring, 0.66 ± 0.27 ppb yr -1 in summer, and 0.79 ± 0.34 ppb yr -1 in fall. In the most recent analyses, marine boundary layer O 3 has remained unchanged at Cheeka Peak, Washington, and decreased at Trinidad Head in northern California (Parrish et al., 2017b). Figure 4 shows these trends. The decrease of O 3 at Trinidad Head may be associated with a shift in transport pattern (as indicated by rapidly warming temperatures), while the spring increase at Mt. Bachelor has been attributed to changes in Asian emissions over the past decade and the summer increase attributed to regional wildfires. The differences at these two sites, separated by a horizontal distance of 850 km, likely reflect the different influences of local processes, interannual meteorological variability, and changing USB O 3.
Attribution of baseline O 3 trends requires consideration of changes in global emissions, as well as regional climate variability, particularly in short data records. It is well established that O 3 formation depends on both temperature (e.g., Weaver et al., 2009) and humidity, and changes in these climate variables must be considered when evaluating trends. For example, Bloomer et al. (2010) Zhang et al., 2008), (2) free-running chemistry-climate models (CCMs) that generate their own weather, but are driven with historical emissions (Cooper et al., 2014; Lamarque et al., 2010; Parrish et al., 2014), and (3) multi-decadal hindcast simulations driven with observed meteorology and historical emissions (Brown-Steiner et al., 2015; Koumoutsaris and Bey, 2012; Lin et al., 2015b; Lin et al., 2014; Lin et al., 2017; Strode et al., 2015; Xing et al., 2015). The O 3 trends derived from observations are higher than those from CTMs with constant meteorology, and from free-running CCMs by a factor of two at some sites (e.g., Parrish et al., 2014). These discrepancies may partly reflect the influence of internal climate variability on observed O 3 (although we note that the reduced variability in CCMs may also reflect errors in their representation of chemistry and dispersion and from numerical diffusion, similar to CTMs whose meteorology is forced to match observed large-scale weather patterns). As the free-running CCM cannot reproduce the exact meteorological fields for the specific observational period, the model cannot be expected to capture the observed trend exactly (e.g., Lin et al., 2014, 2015a; Barnes et al., 2016). For example, Deser et al. (2012) have shown that summertime surface temperature projections for mid-century in some U.S. regions can vary from <1 up to 5°C for the exact same climate forcing scenario solely because of slight variations in the initial atmospheric state.
As trends in O 3 are tied to meteorology, and it is unlikely if not impossible that a single climate model simulation would represent the internal variability exactly as manifest in the real atmosphere, CCMs cannot be evaluated in the same manner as CTMs driven by the observed meteorology. Furthermore, meteorologically-driven O 3 variability is large over western North America, leading to significant variations in O 3 trends between sites (Lin et al., 2015b). One recent study using hindcast simulations forced with observed meteorology was able to match measured O 3 trends at rural western U.S. sites by narrowing the analysis to days when the airflow is predominantly from the North Pacific Ocean in the model. This study suggests that the common model-observation disagreement in baseline O 3 trends at western U.S. sites reflects an excessive offset from regional pollution decreases in the global models owing to their coarse resolution, which cannot fully resolve the observed baseline conditions. This shortcoming can be corrected by filtering model O 3 for baseline conditions using regionally emitted tracers in the model, such as CO.
A synthesis of available observations from the mid-1990s to the 2000s indicates increases in surface and free tropospheric O 3 across East Asia (see Supplementary Note 1 in the SI). Quantifying the effects of increasing Asian precursor emissions on O 3 in the U.S., relative to the effects of regional emission controls, has been an active research area in the last decade. Reidmiller et al. (2009) and Wild et al. (2012) used the HTAP simulations to show that regional emission controls over North America are 2-10 times as effective at reducing U.S. surface O 3 as the equivalent controls in Asia and Europe. Even so, Lin et al. (2017) demonstrated that the tripling of Asian NO x emissions from 1990 to 2014 contributed 65% of modeled springtime background O 3 increases (0.3-0.5 ppb yr -1 ) over the western U.S., outpacing O 3 decreases (<0.1 ppb yr -1 ) attained via a 50% reduction of U.S. NO x emissions. Increases in global methane contributed about 15% to the trend.
Detailed analyses of baseline O 3 trends along the U.S. southern and northern borders are limited in the peer-reviewed literature. Recent analysis by the Tropospheric Ozone Assessment Report (Schultz et al., 2017) of all available rural O 3 monitoring sites in the U.S. and Canada has provided some insight. While some O 3 data are available for urban sites in Mexico, there are no rural monitoring sites, greatly limiting our ability to understand Mexico's impact on U.S. baseline O 3. However, roughly three dozen rural sites are located across southern Canada with trends that are similar to those observed on the U.S. side of the border, based on the annual 4 th highest MDA8 O 3 value. In general, there appears to be little change in O 3 across southern Canada in spring but there is an indication of decreasing O 3 in summer, presumably associated with Canadian NO x emission decreases of 34% from 2000 to 2014 (Hoesly et al., 2018). The trend in O 3 transported from Mexico to the southern U.S. is not known from observations, but Mexican NO x emissions have gone down by only 3% for 2000-2014 (Hoesly et al., 2018). Further details regarding observed O 3 trends across North America are provided in the SI (see Supplementary Note 2).
A number of studies have demonstrated that U.S. emissions and mole fractions of NO x have declined substantially (Lamsal et al., 2015; Krotkov et al., 2016), but at the same time, there can still be substantial uncertainty in the absolute amounts (Hassler et al., 2016). One analysis suggests that the EPA National Emission Inventory (NEI) significantly over-estimates NO x emissions from mobile and/or industrial sources. The most recent inventory shows that U.S. anthropogenic NO x emissions decreased by 49% from 2000 to 2014 (Hoesly et al., 2018). It should be noted that fertilized agricultural and soil emissions of NO x may be substantial, and may become more important as industrial emissions decline (Jaeglé et al., 2005; Almaraz et al., 2018). These emissions have higher uncertainties than the industrial emissions.
Peak O 3 levels and ODVs have decreased at most monitoring sites in the U.S., with the largest decreases in the eastern U.S. and in California (e.g., Simon et al., 2015). Figure S1 shows trends of the annual 4 th highest MDA8 O 3 values (based on April-September observations) at all available rural O 3 monitoring sites in the U.S. and Canada, for the period 2000-2014. The great majority of sites show decreasing O 3 with p-values <0.10. Figure 5 shows O 3 trends at high elevation (>1 km altitude) rural sites over the period 2000-2016. The analysis is applied to the 5 th, 50 th, and 95 th percentiles of midday observations (1100-1600 local time) for spring (April-May) and summer (June-July-August) with the goal of assessing O 3 trends within air masses that are as regionally representative as possible. During spring only one site shows increasing O 3, Mt. Bachelor for the 50 th and 95 th percentiles (both trends in the range of 0.5-0.6 ppb yr -1). In the case of Mt. Bachelor, only nighttime data are used here to focus on free tropospheric/baseline conditions, and the analysis at this particular site is limited to 2004-2016. Of the remaining western sites, most show no significant springtime trend while any significant trends are negative. In summer, Mt. Bachelor is again the only site with a statistically significant O 3 increase at the 50 th and 95 th percentiles (0.5 and 0.8 ppb yr -1, respectively), likely due to recent increases in regional wildfire influence. Otherwise, sites in the west and east show a clear tendency towards decreasing summertime O 3, especially in the upper tail of observations (95 th percentile), presumably due to regional emissions controls. These results, limited to observations since 2000, differ from the conclusions of prior studies spanning the much longer periods of 1990-2010 (Cooper et al., 2012) and 1988-2014 (Lin et al., 2017), which showed a general increase of O 3 in spring and no consistent trend in summer. While most U.S. rural sites do not show significant springtime O 3 decreases since 2000, it appears that regional emission controls have led to widespread decreases in summertime O 3 at these sites, especially in the upper tail of observations.
Models may fail to simulate accurately the responses of O 3 to changes in U.S. emissions due to shortcomings in the underlying emission inventories. In several retrospective dynamic model evaluation studies, CMAQ tended to underestimate observed decreases in U.S. O 3 over the past decades (Foley et al., 2015; Xing et al., 2015; Zhou et al., 2013). Karamchandani et al. (2017) found that models more accurately simulate trends in observed O 3 in southern California when basin-wide VOC emissions were doubled. In contrast, for the eastern U.S., Travis et al. (2016) found that reducing industrial NO x emissions, compared to the NEI, gave results that were more consistent with observations. Thus, emission inventory accuracy is key to model performance and inventories may have biases that vary by region. Inaccuracies in the magnitude of NO x and VOC emissions introduce errors in the modeled sensitivity of O 3 to changes in precursor emissions. Wherever possible, O 3 sensitivities to precursor emissions should be evaluated directly as other sources of errors (e.g., inaccurate representation of changes in chemical or depositional loss rates) may also contribute to discrepancies between modeled and observed responses. To the extent that models misrepresent the contribution to O 3 from domestic sources, they will incorrectly estimate the relative fractions of controllable and background O 3.
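The percentile-based trend analysis described for Figure 5 can be illustrated with a minimal sketch: a chosen percentile is computed for each year and a straight line is fitted through the annual values. The ordinary least-squares fit, the interpolation rule for the percentile, and the synthetic data are assumptions made for illustration; the regression method and significance test actually used for Figure 5 are not specified here.

```python
def percentile(values, q):
    """Percentile (q in 0..100) with linear interpolation between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def ols_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def percentile_trend(obs_by_year, q):
    """Trend (ppb per year) of the q-th percentile of each year's observations."""
    years = sorted(obs_by_year)
    annual = [percentile(obs_by_year[y], q) for y in years]
    return ols_slope(years, annual)


# Synthetic midday O3 values (ppb); each year is shifted upward by 0.5 ppb.
base = [30.0, 40.0, 45.0, 50.0, 60.0]
obs_by_year = {2000 + i: [v + 0.5 * i for v in base] for i in range(10)}

median_trend = percentile_trend(obs_by_year, 50)
upper_trend = percentile_trend(obs_by_year, 95)
print(f"50th: {median_trend:.2f} ppb/yr, 95th: {upper_trend:.2f} ppb/yr")
```

Fitting the 5th, 50th, and 95th percentiles separately is what allows the lower tail, the median, and the upper tail of the distribution to show different trends at the same site.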
We examined the change in the annual 4 th highest MDA8 for 2000-2017 for 9 urban locations in the U.S. (San Bernardino, Chicago, Atlanta, Boston, Albuquerque, Sacramento, Salt Lake City, Denver, and Reno). In each location, we chose a single monitoring site with one of the highest ODVs in that urban area ( Figure S2). From this we find that San Bernardino, Atlanta, Boston, Albuquerque, and Sacramento all show statistically significant downward trends in the 4 th highest MDA8, whereas Chicago, Salt Lake City, Denver, and Reno show no significant trend  (Table S3). Overall, the significant reductions in the urban areas are generally consistent with the rural O 3 trends shown in Figure S1. The negative trends in 4 th highest MDA8 O 3 are linked to significant reductions in emissions of O 3 precursors, while at the same time there can be important regional differences in emission trends (e.g., emissions related to oil and gas extraction in some parts of the western U.S.) that can help explain some of the weaker trends. We note that three of the four locations with no significant trend are high elevation sites (Salt Lake City, Denver, and Reno). Trends in O 3 at these western sites might also be influenced by increasing wildfire activity. Exclusion of wildfire EEs would impact the trend in ODVs at these sites, if relevant states have submitted the EE documentation and EPA approves. Although we have examined only a single monitor in each urban area, this demonstrates the importance of accurate assessment of the USB O 3 contribution for these locations and regional modeling to quantify the controllable sources, as described in Section 6, below.
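As a reminder of how the metric used in this comparison is built, the sketch below computes a daily MDA8 from hourly values and then selects the annual 4 th highest value. It is a simplified illustration: the regulatory data handling rules (data completeness, rounding or truncation, and 8-hour windows that extend into the following day) are not reproduced, and all values are hypothetical.

```python
def mda8(hourly):
    """Maximum 8-hour running mean from one day's hourly values (simplified)."""
    windows = [sum(hourly[i:i + 8]) / 8.0 for i in range(len(hourly) - 7)]
    return max(windows)


def fourth_highest(daily_mda8):
    """Fourth highest daily MDA8 in a year (needs at least four values)."""
    return sorted(daily_mda8, reverse=True)[3]


# Hypothetical day: 20 ppb overnight with an 8-hour afternoon plateau of 60 ppb.
hourly_example = [20.0] * 10 + [60.0] * 8 + [20.0] * 6
day_value = mda8(hourly_example)

# Hypothetical set of daily MDA8 values (ppb) standing in for one O3 season.
season = [70.0, 65.0, 80.0, 62.0, 75.0, 68.0]
annual_4th = fourth_highest(season)
print(day_value, annual_4th)
```

Because the statistic depends on only the few highest days in a year, a small number of days influenced by wildfires or stratospheric intrusions can change it, which is why the treatment of such days matters for the trends discussed above.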

USB O 3 influence on regional air quality modeling: A western case study
Regulatory applications (e.g., SIPs) require models to represent accurately O 3 sources so that they can be used to examine emission scenarios and demonstrate future attainment of the NAAQS. This section shows one case study to highlight results as used in regulatory model applications. The regulatory treatment includes exclusion of identified exceptional days and focuses on the top 10 observed days. While this case study compares only two models, it provides insights into the relationships between regional model estimates of USB O 3 and observations. In particular, this analysis compares how simulated USB O 3 and other sources correlate and the implications for model performance as used in regulatory modeling. The EPA Transport Assessment (US EPA, 2016c) and the Western Air Quality Study (WAQS, 2017) both independently simulated USB O 3 at 12-km resolution in Colorado for 2011. This is an ideal case study for USB O 3 relevant to state planning because the western states typically have high USB O 3 contributions, and because the Northern Colorado Front Range often experiences high O 3 levels that exceed the NAAQS. Both modeling systems use global simulations to provide time-varying boundary conditions (EPA: GEOS-Chem; WAQS: MOZARTv4) and quantified USB O 3 contribution as the sum of tagged boundary and natural sources of O 3 from May 1 to Sept. 29. Further details on both modeling systems are provided in the SI (see Supplementary Note 3). We compare simulations and contributions for two illustrative monitors: Chatfield (AQS 08-035-0004, hereafter CHAT), a regulatory relevant suburban monitor southwest of Denver, and Rocky Mountain National Park (AQS 08-069-0007, hereafter RMNP), a relatively rural high elevation monitor to the northwest. Figure 6 shows the observed and modeled MDA8 (EPA model only) and the USBO contribution (from both models) at CHAT. Figure S3 shows a similar comparison for RMNP. 
Monthly averaged biases at the CHAT monitor were marginally negative in the EPA simulations (-2.5 ± 0.4 ppb) and marginally positive in the WAQS simulations (4.0 ± 2.8 ppb), and both are consistent with literature synthesis of model performance (Simon et al., 2012). Figure 6 suggests four distinct segments of performance and simulated contributions at CHAT that are related to NCOS contribution. The simulations start in a USB O 3 dominated regime (May 1 to June 7), go through a transition period (June 8 to July 15), and then end with two periods dominated by local contributions (July 16 to Aug 22 and Aug 23 to Sept. 29). During the USB O 3 dominated period, the EPA model had stronger correlation (r = 0.74) than the WAQS (r = 0.33), and WAQS had several days where USBO was greater than total observed O 3. During the transition period, both simulations performed poorly (r = 0.23). During the locally dominated periods, both simulations performed well. Table S4 shows additional correlations for individual model components. In general, there is a negative correlation between USBO and local contributions. Similar results were found at RMNP (see Figure S3), where the correlation was typically not as good as at CHAT. Based on this comparison, we find that periods associated with higher background contribution were associated with worse model performance. Thus, the simulations performed better during periods of sustained contribution (USB O 3 or local), simulations performed even better when USB O 3 and local contribution were not anti-correlated, and simulations performed best when local contributions were dominant.
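The segment-by-segment evaluation described above amounts to computing a correlation coefficient between observed and modeled MDA8 within each date window. The following minimal sketch shows one way to do this; the helper names, the use of day-of-year indices, and the synthetic values are assumptions for illustration and do not reproduce the EPA or WAQS results.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_by_segment(days, obs, mod, segments):
    """Correlation of obs and mod within each (first_day, last_day) window."""
    result = {}
    for name, (start, end) in segments.items():
        pairs = [(o, m) for d, o, m in zip(days, obs, mod) if start <= d <= end]
        result[name] = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
    return result


# Synthetic MDA8 values (ppb) for eight days split into two hypothetical periods.
days = list(range(1, 9))
obs = [50.0, 55.0, 60.0, 65.0, 50.0, 55.0, 60.0, 65.0]
mod = [48.0, 53.0, 58.0, 63.0, 60.0, 58.0, 52.0, 50.0]
segments = {"period_a": (1, 4), "period_b": (5, 8)}

r = r_by_segment(days, obs, mod, segments)
print({name: round(value, 2) for name, value in r.items()})
```

In this synthetic case the model tracks the observations in the first window and moves opposite to them in the second, so a single correlation over all days would hide the contrast that the per-period values reveal.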
Regulatory applications focus on high concentration days, so Figure 7 examines the two models' performance on only the top 10 MDA8 O 3 days. The top 10 days were defined by the observed mole fractions. For this analysis, we excluded two days from the observations with suspected significant stratospheric influence (June 7 th and 24 th), consistent with guidance for regulatory modeling (US EPA, 2014a, and see further discussion in Supplementary Note 4 in SI). Both simulations have a negative mean bias (EPA: -5 ppb; WAQS: -4 ppb). The significance of the bias was evaluated using a t-test. The null hypothesis is that the predicted and observed means are equal; put another way, that the predictions are on average unbiased. Despite large individual day biases on the top 10 days (range of +11 to -22 ppb), neither model bias was significant (p > 0.05).
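The significance test described above can be sketched as follows. The text does not state whether a paired or a two-sample t-test was applied, so the paired form (a one-sample test on the daily model-minus-observation differences) is an assumption here, and the ten pairs of values are synthetic numbers chosen only to have magnitudes similar to those quoted; they are not the EPA or WAQS results.

```python
from math import sqrt

# Two-sided 5% critical value of Student's t with 9 degrees of freedom (n = 10).
T_CRIT_DF9 = 2.262


def paired_bias_t(model, obs):
    """Return (mean bias, t statistic) for paired model and observed values."""
    diffs = [m - o for m, o in zip(model, obs)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean, mean / sqrt(var / n)


# Synthetic top-10 observed MDA8 values (ppb) and synthetic daily biases (ppb).
obs = [80.0, 78.0, 77.0, 76.0, 75.0, 75.0, 74.0, 73.0, 72.0, 71.0]
bias = [11.0, -22.0, -5.0, 3.0, -12.0, -8.0, 2.0, -10.0, -4.0, -5.0]
model = [o + b for o, b in zip(obs, bias)]

mean_bias, t_stat = paired_bias_t(model, obs)
significant = abs(t_stat) > T_CRIT_DF9
print(f"mean bias {mean_bias:.1f} ppb, t = {t_stat:.2f}, significant: {significant}")
```

With only ten days and day-to-day biases that scatter widely, the standard error of the mean bias is large, so a mean bias of several ppb can fail to reach significance at the 5% level.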
We further compare USB O 3 between EPA and WAQS on the "observed top 10 days" to test if the choice of the modeling system produced significantly different contributions from NCOS and U.S. sources. Despite daily differences of up to 14 ppb, the average difference (5 ± 5 ppb) was not significant (p > 0.05). The USB O 3 differences were comparable in magnitude to differences in local contribution (-4 ± 8 ppb) that were also not significant. Our review of EPA and WAQS 2011 modeling for Chatfield highlights similarities between different models, but also confirms the need to improve modeling of background O 3. Correlations between observations and contributions at CHAT over the whole period are generally consistent with previous studies (US EPA, 2013; Zhang et al., 2011; Emery et al., 2012) showing that: (1) USBO is a significant fraction of total O 3 at the CHAT and RMNP sites; (2) the observed and predicted O 3 are most strongly correlated with the local contribution; and (3) boundary conditions are anti-correlated with the local contribution (see Table S4).
Both models perform well for average biases, but model correlation with observations is better when local contributions are dominant and when anti-correlation between local and USB O 3 contributions is weak. The boundary conditions derived from global models are dominated by USB O 3 in both models, which suggests a need for more research coupling global and regional models. The top 10 observed days are generally when the models perform best, and both models predict total O 3 that is consistent with the observations and each other. The finding that the models perform worst when USB O 3 and local contribution anti-correlation is strongest, or during transitions from USB O 3 to local contribution dominance, highlights the need for more research on USB O 3 and provides specific conditions for future studies.

Evidence for NCOS from observations and models
Individual NCOS events have long been associated with episodic increases in surface O 3, and much of our knowledge about their impacts in the U.S. and Canada has been inferred from routine ground-based measurements coupled with meteorological analyses (Ambrose et al., 2011; Fine et al., 2015; Jaffe and Zhang, 2017; Lefohn et al., 2012; Stauffer et al., 2017; Teakles et al., 2017; Wigder et al., 2013a, b) or with models and satellite retrievals (He et al., 2011; Lin et al., 2012a, b). These studies have been hampered by the sparsity of surface O 3 monitors in the western states where the impacts tend to be greatest, and by limited free tropospheric measurements by aircraft (Yates et al., 2013), ozonesondes (He et al., 2011), or lidars (Kuang et al., 2012; Langford et al., 2018).
The episodic nature of some NCOS makes it difficult to target these sources with dedicated field studies, but opportunistic measurements have been made during field campaigns with other objectives (Langford et al., 2012; Ott et al., 2016; Sullivan et al., 2015). Long-range transport of O 3 and its photochemical precursors from Asia to the western U.S. was a focus of several recent campaigns including the California Research at the Nexus of Air Quality and Climate Change (CalNex) (Neuman et al., 2012; Ryerson et al., 2013) and Arctic Research of the Composition of the Troposphere from Aircraft and Satellites (ARCTAS) (Huang et al., 2010; Jacob et al., 2010) missions. The impact of U.S. wildfires on O 3 in the West was also investigated during ARCTAS (Singh et al., 2012) and other studies (Dreessen et al., 2016), and the influence of wildfires, long-range transport, and stratosphere-to-troposphere transport (STT) were foci of the Las Vegas Ozone Study (LVOS) (Langford et al., 2015a).
Most STT in the U.S. occurs through tropopause folds, tongues of upper troposphere/lower stratosphere (UT/LS) air extruded beneath the jet stream circulating around mid-latitude cyclones. These occur most frequently in winter when Rossby wave activity is at a maximum in the Northern Hemisphere, but the potential impact on surface O 3 is greater in late spring through early summer, when there is more O 3 in the lower stratosphere and deeper mixed layers can more easily entrain O 3 that reaches the lower troposphere (Langford et al., 2017). Descending stratospheric intrusions can also merge with biomass burning plumes (Brioude et al., 2007) or transported pollution (Cooper et al., 2004a, b; Lin et al., 2012b) and carry additional O 3 from these sources downward to the surface. Most tropopause folds are dissipated in the free troposphere and the transported O 3 becomes part of the free tropospheric background. Deep tropopause folds sometimes create localized spikes in surface O 3 (Langford et al., 2009), but they more frequently lead to smaller increases (<20 ppb) that can affect larger areas over several days (Lin et al., 2012a). They can also indirectly increase surface O 3 by fomenting the spread of wildfires due to their low humidity. Several studies (e.g., Skerlak et al., 2014) have shown that the west coast of North America is one of the preferred regions for deep tropopause folds and there is growing evidence that the integrated contributions of frequent intrusions and co-mingled Asian pollution contribute to the springtime maximum in background O 3 in the southwestern U.S. and Intermountain West. STT events have also been implicated in exceedances of the O 3 NAAQS in the western U.S. (Langford et al., 2009; Langford et al., 2015a).
The contributions of STT to surface O 3 are not easily simulated using regional CTMs, which have traditionally included only the troposphere with no internal stratospheric processes. Regional simulations that use a global model to provide the lateral boundary conditions have shown qualitative success at simulating STT timing and location, but typically with significant under- (Emery et al., 2012;Zhang et al., 2014) or over-estimations (He et al., 2011). Under-estimations have often been attributed to poor horizontal resolution. Emery et al. (2012) showed several case studies where 12-km horizontal resolution was capable of reproducing transport to the surface. Inadequate vertical resolution and mixing is also a problem; for example, He et al. (2011)  in the upper troposphere that improved springtime performance by the WRF-CMAQ model, but degraded it in fall. One outstanding challenge for model assessments of STT is how to treat O 3 that was originally produced in the troposphere, transported to the stratosphere, and then transported back to the troposphere, as part of a stratospheric intrusion. Zhang et al. (2014) show that different definitions for stratospheric O 3 can lead to a factor of 2 difference in the amount of O 3 identified as "stratospheric". While this does not change the total modeled O 3 , it could lead to significant discrepancies in source contributions identified by different models.
Stratospheric intrusions can be identified in high-resolution reanalysis data (Knowland et al., 2017), and some global models have been successful in reproducing the surface contributions of STT. Simulations by GFDL-AM3 (Lin et al., 2012a, b), RAQMS (Pierce et al., 2003), and FLEXPART (Brioude et al., 2007) agreed well with lidar and in situ measurements made during LVOS (Langford et al., 2017). He et al. (2011) also found good agreement between FLEXPART and surface and ozonesonde measurements made during several STT events in the BAQS-Met campaign. GFDL-AM3 estimated that deep STT can episodically increase surface O 3 by 20-40 ppb on days when observed MDA8 O 3 exceeds 70 ppb at western U.S. high elevation sites. GEOS-Chem can identify STT influence at the surface at high elevation sites but typically underestimates the contribution.
Biomass burning can produce significant amounts of O3, and wildfires are a growing concern (US GCRP, 2016). In the western U.S., forest management and climatic factors (e.g., drought and pine bark beetle infestations) have resulted in extensive tree mortality (Raffa et al., 2008), a significant increase in wildfire activity (Dennison et al., 2014), and deteriorating air quality in some areas (McClure and Jaffe, 2018). Agricultural burning is commonplace in the central and eastern U.S. (McCarty et al., 2007; Liu et al., 2016), but these fires are, in principle, controllable and so are not considered NCOS. The chemistry in fire plumes is complex and highly variable, and does not always generate O3. In a review of more than 100 studies on wildfire smoke, Jaffe and Wigder (2012) found that O3 production generally increases for up to 5 days downwind, but with a very wide range in reported ΔO3/ΔCO enhancement ratios. While the majority of smoke plumes show some degree of O3 enhancement, many studies have found no O3 production or even O3 loss. This reflects the large variability in NOx and VOC emissions, plume heights, and downwind meteorology (Briggs et al., 2016; Baylon et al., 2015). Because wildfire emissions have high VOC/NOx ratios (Akagi et al., 2011), O3 production can increase when plumes pass over NOx-rich urban areas (Singh et al., 2012; Gong et al., 2017).
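The enhancement ratio referred to above is conventionally computed as the excess O3 in a plume divided by the excess CO, each taken relative to background air outside the plume. The following is a minimal sketch of that arithmetic; the function name and all mixing ratios are hypothetical and are not taken from any of the cited studies.

```python
def enhancement_ratio(o3_plume_ppb, o3_background_ppb,
                      co_plume_ppb, co_background_ppb):
    """Return the O3/CO enhancement ratio (excess O3 per unit excess CO).

    All inputs are mixing ratios in ppb. The ratio is only meaningful when
    CO is enhanced above background, so other cases raise an error.
    """
    delta_o3 = o3_plume_ppb - o3_background_ppb
    delta_co = co_plume_ppb - co_background_ppb
    if delta_co <= 0:
        raise ValueError("CO must be enhanced above background")
    return delta_o3 / delta_co


# Hypothetical plume with net O3 production: 20 ppb excess O3 per 200 ppb excess CO.
producing = enhancement_ratio(65.0, 45.0, 300.0, 100.0)

# Hypothetical plume with net O3 loss: O3 inside the plume is below background.
losing = enhancement_ratio(40.0, 45.0, 300.0, 100.0)

print(producing, losing)
```

A negative ratio corresponds to the O3 loss cases noted in the text, and a ratio near zero to plumes with no net production.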
Modeling O3 production in wildfire plumes with Eulerian models is complicated by variable emissions, sub-grid processes, complex chemistry, uncertainties in emission magnitudes and injection heights, and the poorly characterized radiation fields in and around smoke plumes. Chemical transport models often over-predict the amount of O3 produced near the fire (Zhang et al., 2014; Lu et al., 2016), although the simulated bias is strongly case dependent. For example, Baker et al. (2016) used CMAQ to model the O3 produced from two wildfires that burned in 2011 and found frequent overpredictions of up to 60 ppb in hourly mole fractions. This may be mainly due to the presence of oxygenated VOCs in fire emissions, especially acetaldehyde (Akagi et al., 2011), which result in rapid sequestration of NOx into PAN (Briggs et al., 2016; Müller et al., 2016). Herron-Thorpe et al. (2014) evaluated MDA8 O3 at numerous sites in the Pacific Northwest for the summers of 2007 and 2008 and found that the AIRPACT-3 modeling system had a slight negative bias of 4.6 ppb with a mean error of 8.9 ppb over the two summers with significant fire emissions, but the authors also identified some large over-predictions for individual events. In summary, estimating wildfire O3 production from Eulerian models is challenging, due to numerous factors, and these models need careful evaluation with observations.
Alvarado et al. (2015) developed a Lagrangian plume model to examine both O3 and secondary aerosol formation from one prescribed fire in California. These results supported a critical role for rapid in-plume chemistry and NOx sequestration (as PAN) to explain O3 formation rates. A similar box model approach was successfully used by Müller et al. (2016). Both the Lagrangian and box model approaches avoid the problems of grid resolution, which is a major challenge for modeling fire plumes with 3D Eulerian models.
Using a statistical model, combined with surface particulate matter (PM) and satellite data from the NOAA Hazard Mapping System, Gong et al. (2017) showed that wildfire impacts on MDA8 O3 at 7 urban sites in the western U.S. range from negative values up to 33 ppb, including on days that had MDA8 values over 70 ppb. Plume models and statistical methods may provide useful estimates of O3 production in fire plumes, but these approaches need further evaluation.

Methods to quantify the impact of NCOS on regulatory monitors as relevant to policy
The CAA recognizes that states and tribes should not be held responsible for sources of air pollution over which they have no control and provides several relief mechanisms to address NCOS. These include the Exceptional Events (EE) Rule (US EPA, 2016b) and CAA 179B provisions related to international transport (US EPA, 2016a). The effective implementation of these mechanisms depends on the ability to quantify the amount of O3 from NCOS. Here we review several methods and assess the strengths and weaknesses of each approach.
The EPA has not yet published guidance on EE STT demonstrations; however, the EPA has approved EE demonstrations submitted by the state of Wyoming (WYDEQ, 2012; US EPA, 2014b) and other states (https://www.epa.gov/air-quality-analysis/treatment-air-quality-data-influenced-exceptional-events). These demonstrations can include measurements and model simulations showing layers of stratospheric air (characterized by elevated O3 together with very low humidity and CO), increased potential vorticity (Xing et al., 2016), and transport into the boundary layer. These analyses provided qualitative demonstrations of substantial contribution from a stratospheric intrusion event but do not provide quantitative estimates of the contribution to O3. While model simulations can provide quantitative estimates of stratospheric contributions, models sometimes fail to simulate accurately the observed surface O3 during intrusion events and thus do not provide reliable quantitative estimates. Langford et al. (2015a, 2017) have shown that O3 lidar measurements can be useful for directly observing layers of stratospheric air that descend deep into the troposphere and reach the surface boundary layer. Quantitative attribution of the stratospheric contribution can be improved if these observations are supplemented by surface measurements of O3, CO, and PM2.5 to help determine if the descending UT/LS air has mixed with international transport or wildfire plumes.
The EPA has published guidance on EE for wildfires (US EPA, 2016b) that describes three levels (or tiers) of technical analyses required to support an EE demonstration for a high O3 day. All tiers include a narrative that demonstrates a clear causal relationship between the wildfire and an O3 exceedance. When a fire is close to a site where monitored O3 is typically low, Tier 1 uses trajectory analyses (e.g., HYSPLIT) and satellite imagery to show that the fire plume impacted the monitor.
For Tier 2, fire emissions divided by distance from the monitor (Q/D) must be greater than 100 tons/day/km. Tier 2 additionally requires evidence that smoke from the fire impacted the monitor, such as monitoring data, satellite imagery, or photographs. For all other cases, a Tier 3 demonstration requires further evidence that supports the clear causal relationship between the wildfire and the O3 exceedance. Typically, this includes an estimate of the wildfire contribution using matching day analyses, statistical regression models, or photochemical models, as described in more detail in US EPA (2016b). We note that the Q/D method, described in the EPA guidance, is based on previous methods for primary pollutants, and at present, there has been very little evaluation of the Q/D method with respect to O3 produced from wildfires. A number of states have successfully demonstrated EEs for O3 due to wildfire emissions, as described on the EPA website (https://www.epa.gov/air-quality-analysis/treatment-air-quality-data-influenced-exceptional-events).
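The Q/D screen described above reduces to a single division and a threshold comparison. The sketch below illustrates only that arithmetic; the function names and the example emissions and distances are hypothetical, and the details of how Q is compiled for a real demonstration are set out in the EPA guidance rather than here.

```python
# Threshold quoted in the text for the Tier 2 screen, in tons/day/km.
Q_OVER_D_THRESHOLD = 100.0


def q_over_d(emissions_tons_per_day, distance_km):
    """Fire emissions (tons/day) divided by fire-to-monitor distance (km)."""
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    return emissions_tons_per_day / distance_km


def passes_qd_screen(emissions_tons_per_day, distance_km):
    """True when Q/D is greater than the threshold quoted in the text."""
    return q_over_d(emissions_tons_per_day, distance_km) > Q_OVER_D_THRESHOLD


# Hypothetical fire emitting 5000 tons/day, seen from two monitors.
near_monitor = q_over_d(5000.0, 40.0)   # 125 tons/day/km
far_monitor = q_over_d(5000.0, 80.0)    # 62.5 tons/day/km

print(near_monitor, passes_qd_screen(5000.0, 40.0))
print(far_monitor, passes_qd_screen(5000.0, 80.0))
```

In this illustration the same fire passes the screen at the nearer monitor and fails it at the farther one, which shows how strongly the metric depends on distance alone. Passing the screen is not sufficient by itself, since Tier 2 also requires evidence that smoke reached the monitor.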
Because of the difficulty of using Eulerian models to estimate wildfire O3, EPA guidance also recommends use of a statistical approach. Statistical relationships have been developed to estimate O3 as a function of a variety of meteorological indicators (e.g., Camalier et al., 2007). Depending on the location and meteorological data available, this method typically explains between 50 and 80% of the observed daily variability. Several studies have applied this method to estimate the O3 contribution due to wildfires (CARB, 2011; Jaffe et al., 2013; Gong et al., 2017). In this approach, the statistical model is used to estimate the usual O3 mole fraction for the observed meteorological conditions, and the difference between the observed and predicted values, called the residual, is considered the additional O3 due to some unusual source. While this approach cannot identify the cause for the additional O3, it can give an indication of the magnitude of unusual contributions, if the residual is sufficiently large. Both the EPA guidance and Gong et al. (2017) discuss this method in more detail.
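The residual approach described above can be sketched in a few lines. This example substitutes ordinary least squares for the more flexible statistical models used in the cited studies, and every number in it is synthetic: the predictors, coefficients, noise level, and the 15 ppb added on "smoke" days are assumptions chosen purely for illustration.

```python
import numpy as np


def fit_linear_model(predictors, target):
    """Least-squares fit of target on predictors, with an intercept term."""
    design = np.column_stack([np.ones(len(predictors)), predictors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict(coef, predictors):
    """Apply fitted coefficients (intercept first) to a predictor matrix."""
    design = np.column_stack([np.ones(len(predictors)), predictors])
    return design @ coef


# Synthetic daily data (hypothetical): temperature, wind speed, and MDA8 O3.
rng = np.random.default_rng(0)
n_days = 200
temperature_c = rng.uniform(15.0, 38.0, n_days)
wind_m_s = rng.uniform(0.5, 8.0, n_days)
met = np.column_stack([temperature_c, wind_m_s])
mda8_ppb = 20.0 + 1.2 * temperature_c - 1.5 * wind_m_s + rng.normal(0.0, 4.0, n_days)

# Mark ten days as smoke influenced and add a synthetic 15 ppb contribution.
smoke_day = np.zeros(n_days, dtype=bool)
smoke_day[:10] = True
mda8_ppb[smoke_day] += 15.0

# Fit on days without smoke, then predict the "usual" O3 for every day.
coef = fit_linear_model(met[~smoke_day], mda8_ppb[~smoke_day])
predicted = predict(coef, met)
residual = mda8_ppb - predicted

# Fraction of variance explained on the training (non-smoke) days.
train_obs = mda8_ppb[~smoke_day]
ss_res = np.sum((train_obs - predicted[~smoke_day]) ** 2)
ss_tot = np.sum((train_obs - train_obs.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print("R^2 on non-smoke days:", round(float(r2), 2))
print("mean residual on smoke days (ppb):", round(float(residual[smoke_day].mean()), 1))
```

Training only on non-smoke days is one possible design choice; it keeps the unusual source out of the fitted "usual" relationship, so the residual on smoke days reflects the added O3 plus the model's ordinary scatter.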

Conclusions and recommendations
The O3 NAAQS has been strengthened several times since 1979 and was most recently set at 70 ppb in 2015. With each downward step, the relative importance of background O3 increases, as does the role of USB O3 in air quality policy. Contributors to USB O3, also called noncontrollable O3 sources (NCOS), include natural precursor emissions (e.g., wildfires), long-range transport (e.g., from Asia, Canada, Mexico, or other countries), and stratospheric intrusions. When the standard is strengthened, daily variations in NCOS become more important and contribute to an increased frequency of MDA8 levels above the O3 NAAQS. Model-calculated USB O3 is greatest in March through June, with monthly mean MDA8 mole fractions at higher elevations in the west of up to 50 ppb and annual 4th highest MDA8 values exceeding 60 ppb at some locations. Lower elevation cities nationwide have monthly mean USB O3 of 20-40 ppb during the O3 season. Daily variations, particularly in spring and early summer, can be due to stratospheric intrusions mixed with Asian pollution, which can contribute to observed MDA8 values over 70 ppb. Elevated levels of O3 or its precursors are also found in fire plumes, in some cases contributing to observed MDA8 O3 values in excess of 70 ppb, particularly if fire plumes interact with NOx-rich urban emissions.
While USB O3 cannot be measured directly, baseline O3 can, but suitably positioned observational stations are limited in number. Along the West Coast, baseline O3 has increased since 2004 at the Mt. Bachelor Observatory in Oregon (2800 m asl), while surface/marine boundary layer O3 at Trinidad Head in northern California has decreased and O3 at Cheeka Peak, Washington (500 m asl), is largely unchanged. However, we note that the marine boundary layer sites are less relevant to air quality beyond their immediate coastal surroundings. In contrast, the Mt. Bachelor site is more representative of the free tropospheric inflow to western North America, but the data record is relatively short. So, while there is a significant positive O3 trend at this site, both meteorological variability and changes in USB O3 are likely involved. In comparison, most rural and urban sites in the U.S. show a consistent downward trend in the annual 4th highest MDA8 values since 2000, indicating the importance of regional emission reductions. The exceptions to this pattern are Chicago, Salt Lake City, Denver, and Reno, where the annual 4th highest MDA8 at the most polluted monitors has not changed significantly since the year 2000.
Multiple methods have been used to estimate USB O3, and, at times, significant differences can arise. These estimates of USB O3 rarely include uncertainty. The lack of consistent reporting of model performance metrics hinders a quantitative uncertainty estimate. Uncertainty in USB O3 is estimated from many factors including differences between model results, model biases against observations, and interannual variations and trends. Baseline O3 can vary significantly between years. At Trinidad Head in the marine boundary layer, spring (April-May) observed mean O3 ranges from 32 to 48 ppb, based on data from 2004-2016. At Mt. Bachelor (2.8 km asl), the range in spring mean O3 is 45-59 ppb over the same time period. For summer (June-August), the ranges are 18-29 ppb at the surface and 42-55 ppb at 2.8 km asl. Thus, model simulations of USB O3 must demonstrate the ability to capture these significant interannual variations with no significant bias. If systematic model biases are present, these must be explored so as to understand the underlying cause.
Given these limitations, our best estimate of the current uncertainty in the seasonal mean USB O3 for typical years is ±10 ppb, which arises from model uncertainty, as discussed in Section 4. However, in some years, seasonal mean baseline O3 is more than 5 ppb higher or lower than average (Figure 4) as a result of climate variability (e.g., El Niño), wildfire extent, and possibly other factors. Thus, for any given year, our predictive capability of USB O3 could have an uncertainty greater than ±10 ppb, which arises from the modeling uncertainty compounded by the additional interannual variability. Uncertainty for shorter time periods can be higher (e.g., Figure 6), and accurate estimates of USB O3 are especially important for MDA8 O3 on days that exceed the NAAQS. In the case of potential EE determinations (e.g., due to fires or stratospheric intrusions), this level of uncertainty can have policy implications. In the case of SIP or NAAQS analyses, enhanced NCOS contributions that remain in the ODV (i.e., not excluded through the process defined in the Exceptional Events Rule) can directly impact the level of estimated controls required (US EPA, 2013). We note that some level of NCOS is always present as part of the mean USB O3. Methods used to estimate USB O3 and NCOS include both CTMs and empirical approaches, and the difference between these methods is not well characterized. This is particularly true for wildfires that can occur at spatial scales smaller than those typically resolved by CTMs. In such cases, Lagrangian and statistical models can be used, but their application in such situations is still in its infancy.
The effort to quantify USB O3 to date has lacked coordination and dedicated resources, as was noted in previous reports (NRC, 2010; McDonald-Buller et al., 2011; Cooper et al., 2015). With a lower O3 NAAQS, local, state, and regional air quality planning organizations will increasingly need improved methods to quantify USB O3 and NCOS with smaller uncertainties. To reduce these uncertainties, we have identified a series of research needs (in approximate order of importance):

An improved observation network is needed to better understand baseline O3, USB O3, and NCOS.
While the U.S. has an extensive network of regulatory surface O3 monitors, co-located measurements of key species (e.g., CO, NOx, VOCs, PM2.5, and speciated PM) that could be used to identify influences from stratospheric, foreign, natural, and/or biomass burning sources are made at only a few locations. In addition, most of the existing O3 monitors are located near population centers because of regulatory requirements and limited funding, leaving much of the interior western U.S. under-sampled. A new generation of low-cost sensors could facilitate routine observations of O3 and other key tracers at more surface monitoring sites (with careful validation), and an augmented baseline network with remote or high mountain locations and frequent vertical profiling (e.g., ozonesondes, lidar) (Langford et al., 2018) would improve identification of stratospheric, foreign, natural, and/or biomass burning sources. Key locations for enhanced observations are elevated locations and/or vertical profiles along the West Coast, in the Intermountain West, and along the U.S.-Mexico border.

Improved quantification of USB O3 and the key processes controlling its distribution could be accelerated by one or more large-scale field experiments.
Ideally, an experiment of this type would be conducted shortly after the TEMPO satellite instrument (Zoogman et al., 2014) becomes operational to provide large-scale, spatially and temporally continuous measurements across North America that can be directly linked to USB O3 estimates. The experiments should also include a suite of baseline sites (expanded from the targeted network above), near-continuous vertical profiles of O3 and precursor species, high mountain measurements, aircraft measurements, and multiple models operating over different seasons, including when USB O3 is expected to be highest and during O3 exceedances. Consideration should also be given to examining USB O3 over multiple years to account for interannual variations. Past experience has shown that the success of large-scale field experiments requires a community-wide effort with observational and modeling assets drawn from multiple federal, state, and university institutions (e.g., CalNex and INTEX-B).

The ability of CTMs to quantify USB O3 accurately and consistently across different temporal and spatial scales should continue to be improved to more effectively support policy and scientific applications.
In general, CTMs have greatly improved our understanding of the sources of USB O3. Continued progress will require process-oriented evaluations that include other key tracers wherever possible, with more attention paid to uncertainty and sensitivity analysis. Future modeling studies should report a consistent set of metrics including, at minimum, seasonal mean USB O3, the USB O3 on the observed annual 4th highest day and top 10 days (at the same time as the O3 maximum), and distributions of USB O3 binned by observed O3 (e.g., at least for the ranges below 60 ppb, 60-70 ppb, and above 70 ppb), as well as standard model performance metrics identified in recent reviews (Simon et al., 2012). Model studies should also report evaluation metrics specific to the intended use (e.g., fire or stratospheric intrusion evaluations, if those results are reported). At their core, models rely on emission inventories. Particularly as larger industrial emissions are reduced, smaller source categories become more important. The role of deposition and chemical sinks in shaping O3 distributions, including the USB O3 component, has received far less attention than the role of sources. We recommend that coordinated modeling efforts include diagnostics to allow exploration of inter-model differences in sinks as well as sources of USB O3. In urban areas, USB O3 estimates from models of different spatial resolutions may differ strongly due to NOx titration. Additional work is needed to test whether consideration of odd oxygen (defined as O3 + NO2) reconciles such discrepancies. For hemispheric or global models that provide boundary conditions, it is necessary to archive four-dimensional fields of all key tracers at 3-hour resolution at the regional model boundaries. Further, tracers or diagnostics are required that can distinguish between different types of NCOS at the boundaries.
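The reporting metrics recommended above could be computed along the following lines. This is a minimal sketch under stated assumptions: the function name, the dictionary keys, and the treatment of the bin edges (60 and 70 ppb both fall in the middle bin) are illustrative choices, and the input arrays are synthetic paired daily values rather than model output.

```python
import numpy as np


def usb_summary(observed_mda8, usb_mda8):
    """Summarize modeled USB O3 against paired observed MDA8 O3 (both in ppb).

    Each array holds one value per day for the same site and season.
    At least 10 days are needed for the top-10 metric.
    """
    obs = np.asarray(observed_mda8, dtype=float)
    usb = np.asarray(usb_mda8, dtype=float)
    if obs.shape != usb.shape or obs.size < 10:
        raise ValueError("need paired daily arrays with at least 10 days")

    # Day indices ordered from highest to lowest observed MDA8.
    order = np.argsort(obs)[::-1]

    # Bins by observed O3; edge handling here is an assumption.
    bins = {
        "below_60": obs < 60.0,
        "60_to_70": (obs >= 60.0) & (obs <= 70.0),
        "above_70": obs > 70.0,
    }
    return {
        "seasonal_mean_usb": float(usb.mean()),
        "usb_on_4th_highest_obs_day": float(usb[order[3]]),
        "mean_usb_top10_obs_days": float(usb[order[:10]].mean()),
        "mean_usb_by_obs_bin": {
            name: float(usb[mask].mean()) if mask.any() else float("nan")
            for name, mask in bins.items()
        },
    }


# Synthetic example: 30 days of observed MDA8 from 50 to 79 ppb,
# with USB O3 set to half of the observed value on each day.
observed = np.arange(50.0, 80.0)
background = 0.5 * observed
summary = usb_summary(observed, background)
print(summary)
```

Reporting the binned means alongside the seasonal mean makes it visible whether the modeled background share rises or falls on the highest observed O3 days, which is the comparison the recommended metrics are meant to support.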
For detailed model inter-comparisons, full four-dimensional fields of O3, VOC, CO, and NOx, and key reaction products such as nitric acid, organic nitrates, total oxidized nitrogen species, and peroxides should be archived across the model domain. Comparison between models should focus on process-level analyses and model sensitivities, considering not only O3, but also related species. Intercomparisons of model source apportionment estimates can be difficult to interpret because of differences in the approaches used to implement source attribution techniques. Instead, process-level intercomparisons should include sensitivity experiments, such as simulations with zero anthropogenic emissions, to assess differences in model estimates of natural and background O3. Simulations with zero anthropogenic emissions will also provide improved estimates of background O3 in urban areas where local NOx emissions can titrate O3. A better understanding of model uncertainty will require comparisons with baseline observations, targeted intensive campaigns, and coordinated model inter-comparisons.

Better methods for quantifying the impact of wildfires on O3 (and PM) should be developed, tested, and compared.
Wildfires can drive exceedances of both O3 and PM, but the formation and dispersion of these pollutants in fire plumes are poorly understood. Future progress will require more detailed observations such as those currently planned for several large-scale process-oriented studies (e.g., FIREX [https://esrl.noaa.gov/csd/projects/firex/whitepaper.pdf], FIRECHEM [https://espo.nasa.gov/FIREChem_White_Paper], and WECAN [https://www.eol.ucar.edu/field_projects/we-can]). The field experiments will require measurements upwind and downwind of wildfires to develop a detailed understanding of chemical processing, establish plume-to-plume variability, and improve smoke plume simulations by air quality models. Wildfire chemical processes simulated by Eulerian and Lagrangian models should be compared to statistical models to evaluate the efficacy of the three approaches.
Over the past decade, much progress has been made in our efforts to understand aspects of the USB O3 problem (e.g., episodic stratospheric sources, interannual variability, wildfire contributions), but these efforts have lacked coordination. While our understanding of USB O3 and the available tools have advanced, the uncertainties remain large and many of the conclusions and recommendations made here are similar to those made in the McDonald-Buller et al. review (2011). For a topic of such importance to air quality management and regional stakeholders, a more focused approach is needed. The strengthening of the O3 standard and the increased importance of EE demonstrations heighten the need for the scientific, regulatory, and stakeholder communities to make substantial progress in improving the observations and tools to understand USB O3.

Supplemental files
The supplemental files for this article can be found as follows: •