Cleaner air has contributed one-fifth of US maize and soybean yield gains since 1999

Crop productivity is potentially affected by several air pollutants, although these are usually studied in isolation. A significant challenge to understanding the effects of multiple pollutants in many regions is the dearth of air quality data near agricultural fields. Here we empirically estimate the effect of four key pollutants (ozone (O3), particulate matter (PM), sulfur dioxide (SO2), and nitrogen dioxide (NO2)) on maize and soybean yields in the United States using a combination of administrative data and satellite-derived yield estimates. We identify clear negative effects of exposure to O3, PM, and SO2 in both crops, using yields measured in the vicinity of monitoring stations. We also show that while stations measuring NO2 are too sparse to reliably estimate a yield effect, the strong gradient of NO2 concentrations near power plants allows us to more precisely estimate NO2 effects using satellite measured yield gradients. The presence of some powerplants that turn on and others that shut down during the study period are particularly useful for attributing yield gradients to pollution. We estimate that total yield losses from these pollutants averaged roughly 5% for both maize and soybean over the past two decades. While all four pollutants have statistically significant effects, PM and NO2 appear more damaging to crops at current levels than O3 and SO2. Finally, we find that the significant improvement in air quality since 1999 has halved the impact of poor air quality on major crops and contributed to yield increases that represent roughly 20% of overall yield gains over that period.


Introduction
A growing literature has demonstrated negative effects of poor air quality on agricultural outcomes, using a variety of experimental [1,2], observational [3,4], and modeling approaches [5,6]. These effects are deemed largest in countries such as India and China that have persistent poor air quality [4,7,8], although yield losses have also been reported in other regions including Europe and the United States [3,9].
Significant uncertainties remain regarding both the effect of individual pollutants as well as the overall effect of pollution on agriculture. These uncertainties limit our ability to identify the potential benefits for agriculture from air quality improvements associated with technology or policy changes [10], as well as which pollutants deserve the most scrutiny. Numerous studies have focused on effects of O 3 , reflecting the well-documented effects in greenhouse experiments as well as the availability of data on O 3 levels from air monitoring stations. While other pollutants such as NO 2 and SO 2 are also known to be phytotoxic [11,12], they have been subject to less empirical study [13], presumably because of a relative paucity of data. Study of these two compounds is also complicated by the fact that they react chemically in the atmosphere and contribute to formation of both O 3 and aerosol particulate matter (PM). PM, which is often monitored because of wellknown human health effects, may also impact agricultural productivity through physical and socioeconomic pathways, including dimming and scattering of solar radiation [7] and reduction in farm worker productivity [14].
Some studies have attempted to estimate the net impact of multiple pollutants, either by comparing yields directly to emission sources [8,15] or by combining estimates from different studies [16]. These analyses indicate a significant role for multiple pollutants, for instance with Fischer [16] estimating 21% yield loss from O 3 and 9% from dimming related to PM in India wheat.
Here we consider effects of multiple pollutants on maize and soybean in the central United States, a region that is a major global producer of these crops and where air pollution has declined in recent decades subsequent to the Clean Air Amendments in 1990 (figure S1 (available online at stacks.iop.org/ERL/16/ 074049/mmedia)). The main objectives of the current study are to empirically estimate the sensitivity of maize and soybean yields to four key pollutants (O 3 , PM, SO 2 , and NO 2 ), and to assess the role that pollution reductions have played in yield progress over the past two decades. To do this, we exploit a few useful features of the study region. First, comprehensive public datasets exist for air quality monitors, power plant emissions, and county-level crop yields (figure 1). Second, despite overlap in emissions sources (e.g. vehicles, power plants) we observe relatively low correlation between pollutants at specific locations (table S1), which allows us to estimate the effects of each. Third, recent progress in satellite analysis has resulted in fine-resolution (30 m) yield maps for both maize and soybean for a nine-state region comprising the largest rainfed maize and soybean producing states since 1999 [17,18] (figure 1). These data enable us to tailor our data sample to cropped areas near monitors (versus, for example, interpolating pollution data) and enable detailed inspection of yields across pollution gradients. In particular, we find that yield gradients near power plantsstationary sources where emissions of SO 2 and NO 2 are continuously monitored and vary dramatically by feedstock type and unit technology-provide a very useful constraint on the overall sensitivity of yields to pollution.

Study region and period
The United States is a major producer of both maize and soybean, constituting roughly one-third of global production for each crop [19]. In the current study, we focus on a nine-state region (Illinois, Indiana, Iowa, Michigan, Minnesota, Missouri, Ohio, South Dakota and Wisconsin) that produces roughly twothirds of national output and for which detailed satellite-based yield estimates exist (see section 2.5). We consider the 1999-2019 period, again based on the availability of satellite-based yield records for both crops.
For O 3 , we computed the accumulated ozone exposure over a threshold (AOT) during the main growing season (June-August for both maize and soybean) for different thresholds (thresh) from 40 to 80 ppb in 10 ppb increments: where O h is hourly ozone values, h goes from 0 to H (the total number of hours with measurements in June-August) in increments of one hour. Although a threshold of 40 ppb is commonly used in the literature (i.e. AOT 40 ) [5,6,20], some authors have argued for a higher threshold [21], and therefore we examined a range of possible values. Most stations have measurements for all hours, although exceedances generally occur during daytime. A small number of stationyears (∼2%) had more than 20% of days missing and were excluded from analysis. The remaining site-years were missing an average of less than one day of measurement during the 92 d June-August period. For all pollutants other than O 3 (PM 10 , PM 2.5 , NO 2 , and SO 2 ) we calculated the average of daily maximum observations for June-August as reported in the daily EPA files. PM 2.5 refers to all PM below 2.5 mm in size, whereas PM 10 includes particulates below 10 mm. Slight changes in the growing season definition, for instance adding April-May, did not qualitatively affect results.

Power plant data
We used emissions data from US electric power generation facilities provided in the EPA's Air Markets Program Database (https://ampd.epa.gov/). The AMPD includes most generation facilities in the country, and provides facility location, technology, operation, and emissions information. We aggregated emissions and fuel information to the facility level (one facility might contain many generation units), and coded facilities according to their primary fuel type with 'coal' including any coal or coke use, 'natural gas' including all variants of natural gas, and 'other' encompassing most residual oil, biomass, and diesel fuel facilities.

Satellite-based pollution data
For evaluating NO 2 gradients near power plants and identifying facilities with high gradients, we utilized tropospheric column NO 2 data from the Sentinel 5 TROPOMI sensor. TROPOMI NO 2 estimates are available at 0.01 • × 0.01 • (∼1 km) resolution starting in late June 2018. Here we use the July-August 2018 average of daily tropospheric column vertical density from the L3 OFFL product, obtained through Google Earth Engine [22].

Crop yield data
Annual maize and soybean yields at the county level were obtained from the USDA National Agricultural Statistics Service (https://quickstats.nass.usda.gov/). NASS yield estimates are based on phone surveys of farmers throughout the region, and generally cover all counties producing maize and soybean. The number of counties with data varies slightly by year because of varying survey response rates, with a notable decline in recent years [23].
Yields at 30 m spatial resolution were obtained for each year in the study period from the scalable crop yield mapper (SCYM) [17] applied to Landsat satellite imagery. Briefly, SCYM uses crop simulations to develop regressions relating final maize (or soybean) yield to two measures of canopy greenness (greenness at the peak of season and 30 d after the peak) and four measures of weather relating to temperature, rainfall, and radiation. Multiple Landsat observations are then used to estimate the greenness measures for each pixel classified as maize or soybean in the USDA's Cropland Data Layer [24]. SCYM estimates have been validated for both crops against county and field level data in the region [17,18]. These studies document strong agreement between SCYM and ground-measured yields, as well as the ability of SCYM to accurately estimate the yield response to management or soil factors at the field level [25]. For the current study, SCYM yields were averaged for a 20 km radius around each EPA station in each year. In addition, a sample of pixel-level estimates were taken for 20 000 random pixels in each state to examine yield variation with distance from power plant locations.

Weather data
Daily values of minimum temperature, maximum temperature, and precipitation were obtained from the daily PRISM 2.5 arc minutes (∼5 km) resolution weather dataset [26] for each location with yield estimates. Growing season weather conditions were then summarized by calculating four common measures known to explain variation in maize and soybean yields [27]: growing degree days between 8 • C and 30 • C during the growing season (GDD), extreme degree days above 30 • C (EDD), total precipitation between April 1 and August 31 (Pr), and the square of precipitation (Prsq).
In addition, for testing whether EPA monitoring stations (and, by inference, agricultural fields) upwind and downwind of power plants experienced significantly different pollution levels, we obtained daily wind speeds for each power plant facility from the University of Idaho Gridded Surface Meteorological Dataset [28]. Average wind direction over the June-August period was calculated for each facility in each year, and the angle between wind direction and EPA monitor direction was calculated for every monitor that was closest to that facility. Monitors were considered downwind if the cosine of the angle was greater than zero, and upwind if the cosine was less than zero, although results were not sensitive to using stricter definitions of downwind (e.g. cosine greater than 0.5).

Yield regressions
To relate yields to pollution, we estimated a panel model with site and year fixed effects: where yield i,t is yield in site i in year t, W is a vector of controls for weather variation, P is the measure of pollution exposure, γ i is a site intercept or fixed effect, and γ t is a year intercept. We alternatively define yield as the average of all SCYM estimates within a 20 km radius around the EPA station location, or the reported NASS yield for the county in which the station is located. When using NASS yields, stations located in the same county were assigned identical yields. Weather controls were the aforementioned four weather variables (GDD, EDD, Pr, and Prsq) averaged over the same 20 km radius. Equation (2) was estimated using a fixed effect ordinary least squares regression in the fixest R package [29], with standard errors clustered at the site level. The regression analysis was conducted separately for maize and soybean. We did not attempt to estimate interaction terms between weather and pollution given the limited number of monitoring sites (figure 1); hence the effects of pollutants were assumed to be constant across different weather conditions.
The fixed effects for site and year indicate that each location and year is given its own unique intercept, which is a common practice to control for unobserved factors (e.g. fertilizer use, soil quality) that might both affect the outcome of interest (i.e. yield) and be correlated with spatial or temporal variations in the causal factor of interest (i.e. pollution), leading to bias in the estimated coefficients. By using fixed effects, identification of τ W and τ P comes only from deviations from the expected value for a given site and year, reducing the chance for bias from omitted variables. In addition, the use of fixed effects helps reduce the influence of collinearity between pollutants, since pollutants are less correlated after accounting for fixed effects (table S1). Figure S3 presents a simple simulation to illustrate the value of fixed effects when two variables are highly correlated across locations. Similarly, removing fixed effects helps to reduce the collinearity between pollutants and weather (table S1).

Yield gradients near power plants
In addition to the panel analysis that uses yields in the vicinity of monitoring stations, we also examine yield gradients near power plants. As described in the Results, these yield gradients are particularly useful for constraining the effect of NO 2 , given the relatively weak constraints provided by the panel analysis (owing to a low number of NO 2 monitors), and the strong gradients of NO 2 near power plants.
We define P i (d) as the expected value of a pollutant i at distance d from a power plant, estimated from a LOWESS (locally weighted scatterplot smoothing) regression of pollutant values vs distance ( figure 3). Similarly, we define Y(d) as the expected value of the yield residual at distance d, where the residual is from a regression of yields against weather and soil. For the yield regression we use the same weather variables as in equation (2), while for soil we use the National Commodity Crop Productivity Index (NCCPI) produced by the USDA [30].
For a given distance, we then define Which we refer to as the gradients in pollutant and yield levels, respectively, compared to values at 1 km distance from the power plants. We do not divide these differences by the distance, so the units of pollution gradients are in the original units (e.g. ppb) and the units of yield gradients are t ha −1 .
We then utilize the estimated coefficients from the panel analysis (equation (2)) to estimate the expected net yield impact of gradients in O 3 , PM, and SO 2 near the power plants: The effect of NO 2 on yields is then estimated as: Thus, the amount of yield gradient not explained by the other three pollutants, or by local weather and soil gradients, is attributed to the gradient in NO 2 . To assess the uncertainty of τ NO2 estimated in equation (6), we re-estimate the equation 100 times after resampling the τ s for O 3 , PM, and SO 2 based on standard errors from equation (2), and resampling estimates of P i (d) for each pollutant based on resampling of the sites used in the LOWESS fit (figure 3). We calculate equation (6) for distances of 10 km-50 km in 10 km increments, finding that estimates are most stable for 30 km-50 km (see section 3). We use d = 30 km for estimates reported in the main text, as the pollution gradients are less reliable beyond 30 km because of sparse station coverage at these distances (figure S2).

Estimation of total pollution effects and contribution to yield trends
For each year in the dataset, we multiply the inferred yield sensitivity to pollutants from equations (2) and (6) by the estimated average level of each pollutant at crop locations. The latter is estimated as: where P i,c,t is the average level of pollutant i over croplands at time t, P i,s,t is the average level at EPA station locations in year t, and S i is a scaling factor that relates the average value at croplands to the average value at stations. We utilize scaling factors, rather than the simple average of observed concentrations at monitoring stations, since monitor locations are not representative of agricultural regions ( figure S2). These scaling factors were derived in two ways. The first was to calculate a weighted average of the curve describing the dependence of P i on distance from the nearest power plants (e.g. figure 3(b)) using weights that correspond to the distribution of crop distances to power plants ( figure S2(a)). These weighted averages were calculated for each year and then divided by the simple average of station values for that year. Table S3 (in SI appendix) reports the average value of this ratio for each pollutant. In a second approach, we averaged satellite-based measures of pollutants over crop and EPA station locations, and calculate the ratio of these two averages. These satellite-based scaling factors were only possible for pollutants with accurate satellite measures of near surface concentrations, namely PM and NO 2 , and served mainly to corroborate the estimates derived from the station data (table S3). Main results reported in the paper correspond to the station-based scaling factors.
The total effect of all pollutants in each year was calculated by summing across the individual estimates for the four pollutants. This assumes that the effects of each are additive, and for instance the effect of PM does not depend on the level of O 3 . Other studies have made similar assumptions [12,16,31]. It is possible that effects could be multiplicative, for instance with damage from NO 2 exacerbated at high levels of SO 2 [32], in which case our total estimates of damage would be conservative.

Crop yields are significantly inhibited by all four pollutants
We first statistically relate crop yields measured near air quality monitors to the levels of pollutants measured at those monitors during the growing season using panel regression models. These regressions provide average estimates, across the study region and period, of the associations between variations in pollutant concentrations and crop yields while controlling for weather, common time variation (through inclusion of linear time trends or more flexible year fixed effects), and time-invariant site-specific drivers of yields such as soil differences (through inclusion of location fixed-effects; see Methods). Because separate monitoring networks exist for each pollutant, their locations do not typically overlap (table S2). For example, less than one-fifth of the O 3 observations and less than one-quarter of PM 10 observations over the study period were coincident with SO 2 observations. We therefore conduct regressions separately by pollutant, with sensitivity tests used to gauge the potential effect of omitted pollution measures. We estimate these relationships using both the high-resolution satellite-based yields (SCYM) and coarser county-level crop yield data from the USDA (NASS) [33].
We find broadly negative effects of O 3 , PM, and SO 2 on both maize and soybean yields (figure 2). For maize, effects are significantly negative (p < 0.05) whether using satellite-based estimates of yield averaged within 20 km of the EPA station or using NASS reported yields for the entire county containing the station. For soybean, effects are significantly negative except for the estimate for PM 10 when using NASS yields, and the estimate for SO 2 using satellite yields.
To assess whether collinearity with other pollutants would significantly alter results, we repeat the regressions above for the subset of site-years with multiple pollution measures ( figure S4). As expected, the much lower sample sizes lead to substantially larger error bars. However, we find that inclusion of other pollutants has relatively small effects on the estimated coefficients. This is consistent with the low correlations after removing site and year fixed effects (table S1), which indicates that while levels of pollutants may be correlated when examined in crosssection (e.g. cities versus rural areas), the fluctuations of individual pollutants within a given location are largely uncorrelated. Similarly, a recent study of O 3 effects in China found that O 3 coefficients were only slightly altered when accounting for PM [34].
The magnitudes of uncertainties in figure 2 reflect differences in the density of monitoring networks for each pollutant. Ozone has been monitored at 387 unique stations in the region, with a median coverage of 15 growing seasons of data for 1999-2018, resulting in over 4000 site-year observations. Roughly half this number was available for PM and SO 2 , resulting in slightly broader confidence intervals for the estimates. In contrast, fewer than 600 site-year observations were available for NO 2 , and as a result the panel estimates for effects of this pollutant were much less precise, and not significantly different than zero.
To further constrain the yield effects of NO 2 , we turn to investigation of yields in the vicinity of power plant facilities. We observe a clear increase in average satellite-based yields as distance increases from the nearest power plant facility (figure 3). The yield increases are most rapid within the first 30 km but continue to exhibit gradual increases for both maize and soybean up to at least 50 km. Ideally, one would be able to compare yield and pollution gradients for both down-and up-wind locations, as wind is a common source of exogenous variation to aid identification [35]. However, given relatively low wind speeds and variable wind directions in the region, we found very little dependence of pollution or yields on direction (figure S5). Nonetheless, other lines of evidence suggest that these yield gradients near power plants arise primarily from air pollution, rather than other variables correlated with power plant locations such as soil quality or farmer behavior. First and foremost, we observe that the gradients increase significantly after power plants turn on, and decrease significantly after they shut down. That is, using the subset of facilities that either turn on (n = 111) or shut down (n = 44) during the study period, we perform a regression to estimate the effect of a facility turning on or off on the local yield gradient (the average yield at 29 km-31 km distance minus yield at 0 km-2 km distance), using fixed effects for year and location to isolate the effect of whether the facility is active (table 1). We find a significant increase in the yield gradient after facilities turn on, as well as a significant decline in yield gradients after facilities turn off. Second, we observe only small gradients in observed soil or weather variables near powerplants, with yields predicted based on observed weather and soil (described in section 2.8) exhibiting gradients of less than 0.1 t ha −1 . While unmeasured aspects of soil quality could be important, we note that the NCCPI variable we use is a common summary of soil quality that captures many physical and chemical aspects of soil believed to influence yield.
Moreover, several lines of evidence suggest that among the pollutants leading to these yield gradients, NO 2 is likely the most important factor: (a) yield gradients are significantly larger near facilities where gradients in NO 2 , as measured in recent years by the Sentinel 5 TROPOMI sensor, are larger (figure S6); (b) yield gradients are also higher near facilities with higher overall NO x emissions (figure S6); (c) most importantly, while all pollutants generally decline with distance from power plant, NO 2 exhibits a stronger gradient than the others, likely related to its short atmospheric lifetime (figures 3(b) and (c)). The lack of a rapid decline in O 3 reflects the complex chemistry of O 3 formation and destruction, and is consistent with studies documenting that peaks in O 3 often occur far from the primary source of NO 2 [13,36,37]. Similarly, the gradual PM decline reflects the variety of PM sources and their broad-scale transport outward from facilities. On average, SO 2 initially exhibits an increase with distance from power plants ( figure 3(b)), a pattern that is driven entirely by facilities using coal as feedstock (as coal contains much more sulfur and creates much larger quantities of SO 2 during combustion than natural gas).
By combining our estimates of O 3 , PM, and SO 2 effects from the panel estimates (figure 2) with observed relationships between their concentrations and distance to power plant (figure 3(c)), we estimate the expected yield change associated with these pollutants as one moves away from power plants ( figure 3(d)). We then estimate the effect of NO 2 based on the residual yield gradient that is not explained by the other pollutants, finding that these gradients provide a much better constraint on NO 2 effects than the panel approach (figure S7). While we use the average gradient for all facilities to estimate NO 2 effects, we also find notable differences when limiting the analysis to only facilities with (or without) active coal units (figures 3(c)-(d)). On average, larger yield gradients are observed near noncoal facilities. This finding is consistent with the larger declines in PM and SO 2 as one moves away from these as compared to facilities with coal units. In both . Error bars show 5%-95% confidence interval based on uncertainty in both pollution gradients and yield sensitivities. Yellow bar indicates the residual yield gradient not explained by the three pollutants, which is then used to estimate the NO2 effect by dividing the residual yield gradient by the NO2 gradient. Table 1. Coefficients for regression models that estimate the effect of power plant facilities (1) starting or (2) stopping operation during the study period. The dependent variable in both cases is the yield gradient near the facility, defined as the difference in average yield 29 km-31 km away from the facility and 0 km-2 km away. Values in parentheses show the standard error of the estimate, clustered by year. Asterisks indicate that coefficients are significant at p < 0.05.
(1) Start-up model (2)  cases, the amount of yield gradient attributed to NO 2 (i.e. not explained by the combination of ozone, PM, and SO 2 ) is similar, as is the average NO 2 gradient ( figure 3(d)).
As an additional check on our primary NO 2 estimate we separately perform a cross-sectional regression of yields on satellite-derived NO 2 levels in 2018. These regressions, which control for county fixed-effects as well as weather and soil covariates, also find significant negative associations between yields and NO 2 (figure S8).

Total pollution impacts on yields are significant but declining
Using the inferred sensitivity of yield to each pollutant, we find the total yield burden in the region for each year by multiplying the derived coefficients by the average exposure across the cropped region. For our baseline specification, we estimate that the total yield losses from pollutants averaged 5.8% for maize and 3.8% for soybean for the 1999-2019 period in the nine-state region ( figure 4). Across all specifications, the median estimate was of slightly smaller magnitude than the baseline for maize (5.0%) and larger magnitude for soybean (6.6%). These losses were considerably larger at the start of the period, and gradually declined as air quality improved. On average, we estimate that the reduction in pollutants caused an increase of 4.0% for maize and 3.0% for soybean over the 1999-2019 period for the baseline specification, with a median of 5.0% increase for maize and 6.2% increase for soybean across all specifications. As a percentage of the overall yield gains for each crop over the time period, these values represent 19% for maize and 23% for soybean. Thus, we estimate that the benefits of cleaner air comprised roughly one-fifth of yield gains since 1999 for both crops.

Results are robust across many plausible model specifications
The results in figure 4 correspond to our baseline specification which uses high resolution satellite-based crop yield estimates, characterizes O 3 exposure as cumulative hours spent above 70 ppb and PM exposure as average daily maximum value of PM 10 , and uses year fixed effects that gives each year its own intercept. These baseline measures for O 3 and PM were based on analysis of out-of-sample prediction error across the various measures ( figure S9). Nonetheless, we find that while each of these assumptions has some effect on results, the overall impact of pollutants is fairly robust to each of these (figures S10-S13).
For most specifications, we find that O 3 and SO 2 have smaller impacts in a typical year than PM and NO 2 , with the relative importance of the latter two varying across models. In particular, effects of PM are sensitive to whether effects are estimated using PM 10 or PM 2.5 , and whether using averaging satellite-based yields in a 20 km radius around the monitoring site or using county level yields from NASS. For maize, PM effects are smaller in magnitude, and sometimes even positive in sign, when using county level data. Results for soybean are more consistent across specifications than for maize, with PM again being the most variable.
The consistency in estimates of overall impacts of pollution despite the uncertainty in PM effects results from the use of yield gradients in the vicinity of power plants to constrain estimates of NO 2 effects. Since all pollutant levels are lower at 30 km than 1 km distance from power plants ( figure 3), a larger estimated effect for O 3 , PM, or SO 2 implies a smaller effect for NO 2 . Specifically, in cases where the PM effects from the panel regression were strongly negative, a large fraction of the yield gradient near power plants can be explained by PM, and a smaller effect is thus estimated for NO 2 . In contrast, specifications that result in small effects of PM result in a much smaller fraction of the yield gradient explained by pollutants other than NO 2 . Thus, while PM uncertainty imparts significant uncertainty in NO 2 estimates, their combined effect is constrained by the observed yield gradient across known gradients of pollution.
A similar story emerges for the trend in pollution impacts over time. Total benefits of pollution reductions are fairly similar across all specifications, with the baseline specification ranking near the middle of the various plausible alternatives. In some specifications, including the baseline one, the biggest benefits are estimated to have accrued from reductions in NO 2 , whereas in others the biggest benefits are from reductions in PM or O 3 . O 3 reductions are particularly important in specifications that use higher thresholds for O 3 exposure (i.e. >60 ppb), because these high exposures have been declining faster than exposure to more moderate levels of O 3 (i.e. 40 ppb) or levels of other pollutants such as NO 2 (figure 4(a)).

Discussion and conclusions
Given the emphasis of much of the literature on O 3 impacts [3,5,6,34], it is somewhat surprising that both NO 2 and PM emerge as more important than O 3 in most of our specifications. We view three factors as potentially important in explaining this apparent discrepancy. First is the relatively low levels of high-O 3 exposure for the study period and region compared to much of the empirical literature, which includes Asia [7,16] as well as earlier decades and more eastern counties in the United States [3]. Second, much of the empirical literature does not adequately control for the effects of weather when estimating O 3 sensitivity. Among the four major pollutants studied here, O 3 levels are most strongly correlated with the occurrence of extreme heat (table S1), which is known to cause significant yield loss in both maize and soybean. In fact, if we omit extreme heat exposure (days over 30 • C, or EDD) as a control variable, our estimate of O 3 sensitivity increases by 56%-157%, depending on the crop and source of yield data. Similar increases were observed if we replace EDD with a cruder measure of average daytime maximum temperature, as done in [3]. Thus, while [3] estimate ∼10% and ∼5% yield losses in the United States circa 2000 for maize and soybean, respectively, whereas we estimate a roughly 3% loss in 2000 (with a range of roughly 0%-6% loss for both crops across different specifications), some of this difference likely results from their incomplete accounting for extreme heat. Although one could argue that some of the effects attributed to extreme heat are in fact from O 3 , several lines of evidence indicate a strong impact of extreme heat even in the absence of O 3 : extreme heat has similar effects in recent decades with much lower O 3 than prior decades [38]; crop model simulations can reproduce the empirical sensitivity to extreme heat even though they do not include O 3 effects [39]; and the model's explanatory power is substantially improved by including extreme heat compared to including only O 3 .
Third, some empirical studies rely on O 3 exposure estimates based on values interpolated between stations. In a sensitivity test, we find that if these interpolations are overly smooth, such as by using inverse distance weighting of all stations [21], estimates of yield sensitivity to O 3 are inflated by as much as a factor of 10 (table S4). This suggests that empirical strategies that use high resolution outcome data and samples near monitors, or that leverage new sources of ground station-based monitoring, may be preferable to interpolation. These findings also suggest the need for research to untangle interpolation and aggregation effects in spatial analyses.
We are not aware of other empirical studies that have estimated effects of NO 2 levels on crop yields, although our results are consistent with many fumigation or observational studies that document phytotoxic effects at levels found in our study region. For example, a transect study for a wild plant species found significant reductions in plant dry weight above 18 ppb NO 2 [11], a level that was frequently exceeded in our region (figure S1). These prior studies also help to elucidate mechanisms, including reduction of photosynthetic rates and increased dark respiration [40,41], in a way that our empirical approach cannot. Similarly, our study is able to assess impacts in commercial farmers' fields in a way that experimental approaches cannot. Altogether we find consistent evidence that ambient NO 2 causes yield loss for both maize and soybean, although the national aggregate impact is currently modest (∼1%), perhaps not surprising given that NO 2 is short-lived and declines rapidly as one moves away from major sources.
We conclude that air pollution has had a significant negative impact on both maize and soybean yields in the US Corn Belt. In contrast to prior work, we estimate the effects of multiple pollutants, and in doing so identify a relatively large role for PM and NO 2 . We also find that, because of considerable progress in air quality, the yield impact of air pollution is only half of what it was two decades ago. The estimated yield gains associated with the reduction in air pollution has been roughly 20% of total observed gains, indicating that yield growth would have been 20% slower without air quality improvements. Since US maize and soy production is currently valued at ∼$100B/year, in very coarse terms (i.e. not accounting for other factors that might have changed in response to reduced US yields in a counterfactual scenario without reduced pollution) these air quality-driven yield gains correspond to ∼$5B/year.
Finally, our results also highlight the considerable power of using satellite-based yield estimates to understand pollution impacts, as the fine spatial resolution of satellite estimates enables us to exploit pollution gradients near power plants in a manner not possible with county-level administrative yield data. These gradients provide a useful constraint on pollution effects, particularly for NO 2 given the sparse coverage of ground monitoring stations. In addition, the ability to restrict regressions using satellite-based yields to areas near monitoring stations provides an important validation for county-level results. The value of satellite estimates could be even higher in other countries, which typically have both fewer pollution monitors and less administrative yield data than the United States.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.