Radiative forcing in the ACCMIP historical and future climate simulations

The Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP) examined the short-lived drivers of climate change in current climate models. Here we evaluate the 10 ACCMIP models that included aerosols, 8 of which also participated in the Coupled Model Intercomparison Project phase 5 (CMIP5). The models reproduce present-day total aerosol optical depth (AOD) relatively well, though many are biased low. Contributions from individual aerosol components are quite different, however, and most models underestimate east Asian AOD. The models capture most 1980–2000 AOD trends well, but underpredict increases over the Yellow/Eastern Sea. They strongly underestimate absorbing AOD in many regions. We examine both the direct radiative forcing (RF) and the forcing including rapid adjustments (effective radiative forcing; ERF, including direct and indirect effects). The models’ all-sky 1850 to 2000 global mean annual average total aerosol RF is (mean; range) −0.26 W m−2; −0.06 to −0.49 W m−2. Screening based on model skill in capturing observed AOD yields a best estimate of −0.42 W m−2; −0.33 to −0.50 W m−2, including adjustment for missing aerosol components in some models. Many ACCMIP and CMIP5 models appear to produce substantially smaller aerosol RF than this best estimate. Climate feedbacks contribute substantially (35 to −58 %) to modeled historical aerosol RF. The 1850 to 2000 aerosol ERF is −1.17 W m−2; −0.71 to −1.44 W m−2. Thus adjustments, including clouds, typically cause greater forcing than direct RF. Despite this, the multi-model spread relative to the mean is typically the same for ERF as it is for RF, or even smaller, over areas with substantial forcing.
The largest 1850 to 2000 negative aerosol RF and ERF values are over and near Europe, south and east Asia and North America. ERF, however, is positive over the Sahara, the Karakoram, high Southern latitudes and especially the Arctic. Global aerosol RF peaks in most models around 1980, declining thereafter with only weak sensitivity to the Representative Concentration Pathway (RCP). One model, however, projects approximately stable RF levels, while two show increasingly negative RF due to nitrate (not included in most models). Aerosol ERF, in contrast, becomes more negative during 1980 to 2000. During this period, increased Asian emissions appear to have a larger impact on aerosol ERF than European and North American decreases due to their being upwind of the large, relatively pristine Pacific Ocean. There is no clear relationship between historical aerosol ERF and climate sensitivity in the CMIP5 subset of ACCMIP models. In the ACCMIP/CMIP5 models, historical aerosol ERF of about −0.8 to −1.5 W m−2 is most consistent with observed historical warming. Aerosol ERF masks a large portion of greenhouse forcing during the late 20th and early 21st century at the global scale. Regionally, aerosol ERF is so large that net forcing is negative over most industrialized and biomass burning regions through 1980, but remains strongly negative only over east and southeast Asia by 2000. Net forcing is strongly positive by 1980 over most deserts, the Arctic, Australia, and most tropical oceans. Both the magnitude of and area covered by positive forcing expand steadily thereafter.

D. T. Shindell et al. Published by Copernicus Publications on behalf of the European Geosciences Union.


Introduction
While well-mixed greenhouse gases (WMGHGs) are the largest single driver of climate change since the preindustrial, aerosols and ozone are also important contributors. Best estimates in the IPCC AR4 of 1750 to 2005 radiative forcing were 2.63 W m−2 from changes in WMGHGs, 0.35 W m−2 for tropospheric ozone, −0.50 W m−2 for aerosol direct effects and −0.70 W m−2 for aerosol-cloud albedo effects (Forster et al., 2007). Aerosols and ozone are also distributed unevenly, and their distinct radiative forcing patterns contribute to the regional pattern of climate change.
Despite their importance, climate model intercomparisons have traditionally neglected to document these agents. For example, simulations performed for the Coupled Model Intercomparison Project (CMIP) phase 3 in support of the IPCC AR4 provided valuable insight into climate sensitivity, historical climate and climate projections, but did not examine aerosol or ozone forcing. This is important since the radiative forcings imposed in the simulations differed from model to model due to varying assumptions about emissions, differences in the behavior of physical processes affecting short-lived species, and differences in which processes and constituents were included at all (e.g. only 8 of 23 CMIP3 models included black carbon (BC) while fewer than half included future tropospheric ozone changes). Hence it is not straightforward to understand the relative importance of variations in climate sensitivity versus differences in the forcings.
The CMIP5 project compares historical and future climate simulations from coupled ocean-atmosphere models, but similarly provides little information on aerosol or ozone forcing. Hence there is a need for characterization of the forcings imposed in the CMIP5 historical and future simulations, for diagnostics to allow us to understand the causes of the differences in forcings from model to model, and for evaluation of the underlying simulated aerosols and ozone. ACCMIP attempts to meet these needs through a set of coordinated simulations primarily using the same composition models used by the CMIP5 groups. Here we describe the ACCMIP models' aerosol simulations, evaluate the simulated optical properties against observations, and present the resulting forcing. We examine both the conventional direct radiative forcing at the tropopause (RF) and the forcing including rapid adjustments (effective radiative forcing; ERF, including direct and indirect effects). We then combine the aerosol forcings with ACCMIP analyses of ozone and WMGHG forcing to estimate the total anthropogenic forcing through time.

ACCMIP model descriptions and experimental design
While intended primarily to examine the anthropogenic drivers of climate change in CMIP5, ACCMIP was open to the wider modeling community, and several groups participated that were not in CMIP5. We include both types of models, providing analyses of the CMIP5 subset of ACCMIP models when appropriate. As this study focuses on aerosols, we do not include models that provided only gas-phase diagnostics. Key information about the aerosols included in the models and the experiments performed is presented in Table 1. We note that the GISS-E2-R-TOMAS and NCAR-CAM5.1 models include representations of aerosol sizes, while all other models use a bulk approach in which size distributions are prescribed and only aerosol masses are computed. More detailed information on each model can be found in the ACCMIP overview paper.
[Table 1 (residue): reference Ghan et al. (2013). Notes: (1) HadGEM2 simulations for ACCMIP (and CMIP5) did not include dust or nitrate forcing, but nitrate was calculated in Bellouin et al. (2011), and those results have been included here when available (AOD and forcing). (2) MIROC-CHEM nitrate and SOA were calculated, but not used for their CMIP5 simulations. AOD diagnostics for these two components were not available, but forcings were.]
All models used time-varying anthropogenic and biomass burning emissions of aerosol and tropospheric ozone precursors from Lamarque et al. (2010) for the historical period. For 2005 to 2100, anthropogenic and biomass burning emissions were created by four separate integrated assessment modeling (IAM) groups. The scenarios are called "representative concentration pathways" (RCPs) and are named by their nominal 2100 forcing relative to 1750: RCP2.6, 4.5, 6.0 and 8.5 (van Vuuren et al., 2011). In all the RCPs, aerosol-related anthropogenic and biomass burning emissions generally fall markedly in the distant future, with global mean 2100 emissions roughly 80 % lower than present-day for SO2, 50 % for BC, and from 10-40 % for organic aerosol (OA). Ammonia emissions, however, increase by 10-80 % at 2100. Emissions were modified in two models, with GISS-E2-R scaling the biomass burning emissions of BC and OA by 1.4, and CSIRO-Mk3.6 scaling all BC by 1.25 and all OA by 1.5. Natural emissions varied across the models, and in many models also varied as climate changed. Further information on emissions in each model can be found in Lamarque et al. (2013). Concentrations of well-mixed greenhouse gases (CO2, CH4, N2O and halocarbons) were prescribed according to the RCP projections from the reduced-complexity coupled carbon cycle climate model MAGICC6.3, which estimated mixing ratios based on the IAM RCP emissions.
ACCMIP simulations were typically performed as timeslices for several historical and future times (Table 1). These used emissions from the given year, prescribed "climate" (sea surface temperatures (SSTs), sea-ice and WMGHGs) for that same time from companion CMIP5 simulations, and free-running atmospheric models. For the GISS-E2-R model, however, the CMIP5 runs included interactive chemistry and aerosols and hence the ACCMIP diagnostics were saved directly from the CMIP5 transient simulations. Similarly, the LMDzORINCA model archived many diagnostics from transient simulations. Values are averaged over the available years of the timeslice (generally 5-10 yr) or the nearest 11 yr for the transients, and are area-weighted for global or regional means.
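The area weighting used for global and regional means can be illustrated with a minimal Python sketch. It assumes a regular latitude-longitude grid and cosine-of-latitude weights; the function name, the nested-list layout and the sample values are hypothetical and are not taken from the ACCMIP processing.

```python
import math

def area_weighted_mean(field, lats_deg):
    """Mean of a regular lat-lon field, weighting each row by cos(latitude).

    field: list of rows (one per latitude), each a list of values per longitude.
    lats_deg: latitude of each row's grid-cell centre, in degrees.
    Missing values (None) are skipped together with their weight.
    """
    total = 0.0
    weight_sum = 0.0
    for row, lat in zip(field, lats_deg):
        weight = math.cos(math.radians(lat))  # proportional to cell area
        for value in row:
            if value is None:
                continue
            total += weight * value
            weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("no valid grid cells")
    return total / weight_sum

# Hypothetical field: higher AOD at the equator than at 60 degrees N/S.
aod = [[0.05, 0.05], [0.20, 0.20], [0.05, 0.05]]
lats = [-60.0, 0.0, 60.0]
global_mean = area_weighted_mean(aod, lats)
```

With these sample values the weighted mean is larger than the simple arithmetic mean of 0.10, because the equatorial row represents more area than the high-latitude rows.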
All models included changes in the ocean, and hence meteorology/climate, except NCAR-CAM5.1, CICERO-OsloCTM2 and CSIRO-Mk3.6. These simulations include not only the impacts of climate change on aerosols via processes such as altered wet or dry removal or oxidation, but also changes in emissions of dust and sea-salt aerosols in many models (Table 1). To separate these, additional simulations with fixed climate but altered emissions, or vice-versa, were performed. Climate was maintained at 1850 while later anthropogenic and biomass burning emissions were used, for example. We include climate change-induced aerosol forcing in our estimates when it was included in the groups' CMIP5 simulations.

Evaluation of present-day aerosols
Recent data is the most comprehensive and highest quality, so we first evaluate the present-day aerosol climatology using the ACCMIP 2000 timeslice. RF is the end result of a path from emissions to concentrations to aerosol optical properties to forcing. As RF is not directly observed, we examine the earlier stages, primarily focusing on aerosol optical depth (AOD) as the nearest observed quantity to RF. Analysis of ACCMIP BC surface concentrations is presented elsewhere.
[Table notes: The suffix cs indicates clear-sky. Means use only the clear-sky versions of model diagnostics when available (and exclude all-sky from those models). SurfObs are the AeroNet surface observations. *GISS-E2-R-TOMAS results were virtually identical for all-sky and clear-sky. Biases are the normalized mean bias (in percent), and absolute biases are the average absolute value of the area-weighted biases.]

Satellite AOD
We first compare the spatial distribution of annual mean 550 nm AOD in the models with observations from the MODIS (Remer et al., 2008) and MISR satellite instruments averaged over 2004-2006. We use the standard MODIS AOD product over most of the globe, but obtain greater coverage by infilling missing data with values produced by the Deep Blue algorithm that allows retrievals over bright surfaces such as deserts (Hsu et al., 2004). Differences between 2000 and 2004-2006 are expected to be generally small. Though not requested, separate all-sky and clear-sky AOD diagnostics were available from several models, and hence both were analyzed. We concentrate on clear-sky output when both are available since that is more comparable to observations. The 10-model mean clearly captures many features seen in the satellite measurements (Fig. 1). High aerosol loadings over desert regions associated with mineral dust stand out in both models and observations, as does the band of locally enhanced AOD over the Southern Ocean associated with sea-salt. Areas with large anthropogenic aerosol emissions also have relatively high AOD, especially East Asia and the Indo-Gangetic plain. The multi-model mean underestimates both the magnitude and extent of AOD in these areas, however. Similarly, AOD in tropical South America, Africa and Indonesia, where biomass burning emissions are large, is underpredicted.
The multi-model mean captures the broadly lower AOD levels in Europe and North America relative to developing Asia. At finer scales, however, local features are sometimes poorly represented, for example over the Po Valley or the Mojave Desert. Over Australia, the multi-model mean is similar to MISR, and does not show the extreme lows or highs seen by MODIS. The models underpredict AOD over Arctic land areas and the Southern Ocean.
We next quantify correlations and biases. All calculations use regridded 1° × 1° monthly mean model fields sampled only when and where the satellite instruments report observations. Correlations between the models and the two satellite datasets are generally similar, but slightly higher for MISR (Table 2). Differences between all-sky and clear-sky are very small for the GISS-E2-R-TOMAS model, modest for CSIRO-Mk3.6, and quite large for GISS-E2-R (see Appendix A). We therefore use clear-sky data for GISS-E2-R and CSIRO-Mk3.6 hereafter. The models show a substantial diversity in their ability to capture the observed distribution and magnitude of AOD. Biases range from −30 % to +20 %, with most models being too low. Evaluating the mean AOD of ACCMIP models shows the same biases as the average of the individual models, but substantially higher correlations (0.69 versus MODIS, 0.75 versus MISR) than either the average or any individual model. An enhanced performance of the multi-model mean relative to any single model has also been seen in analyses of climate parameters (e.g. Reichler and Kim, 2008).
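The sampling and the two statistics used here can be sketched in a few lines of Python. This is an illustration only: it assumes the common definition of normalized mean bias, 100 × Σ(model − obs) / Σ(obs), omits the area weighting, and uses hypothetical function names and values in which None marks a cell or month with no satellite retrieval.

```python
import math

def sample_where_observed(model, obs):
    """Pair model and observed values, dropping entries with no observation."""
    return [(m, o) for m, o in zip(model, obs) if o is not None]

def normalized_mean_bias(model, obs):
    """NMB in percent: 100 * sum(model - obs) / sum(obs) over sampled entries."""
    pairs = sample_where_observed(model, obs)
    return 100.0 * sum(m - o for m, o in pairs) / sum(o for _, o in pairs)

def pearson_r(model, obs):
    """Pearson correlation coefficient over the sampled entries."""
    pairs = sample_where_observed(model, obs)
    n = len(pairs)
    mean_m = sum(m for m, _ in pairs) / n
    mean_o = sum(o for _, o in pairs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in pairs)
    var_m = sum((m - mean_m) ** 2 for m, _ in pairs)
    var_o = sum((o - mean_o) ** 2 for _, o in pairs)
    return cov / math.sqrt(var_m * var_o)

# Hypothetical monthly-mean AOD values; the second entry has no retrieval.
model_aod = [0.10, 0.20, 0.30, 0.50]
satellite_aod = [0.12, None, 0.36, 0.48]
bias_percent = normalized_mean_bias(model_aod, satellite_aod)
correlation = pearson_r(model_aod, satellite_aod)
```

Masking before computing either statistic means the model value of 0.20 never enters the comparison, which mirrors sampling the model only when and where the instrument reports data.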
A portion of the negative bias in many models is due to missing nitrate and SOA. Adding the mean AODs from the other models to adjust for these missing components improves model agreement with observed global mean total AOD in most cases, though not always (Fig. 2).
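One way to picture this adjustment is the following Python sketch, which fills each missing component with the mean AOD of that component from the models that do simulate it. The model names, component values and function name are hypothetical, and the actual ACCMIP adjustment may differ in detail.

```python
def adjusted_total_aod(component_aod):
    """Total AOD per model, filling missing components with the donor mean.

    component_aod: {model: {component: AOD, or None if not simulated}}.
    """
    components = sorted({c for comps in component_aod.values() for c in comps})
    donor_mean = {}
    for c in components:
        values = [comps[c] for comps in component_aod.values()
                  if comps.get(c) is not None]
        donor_mean[c] = sum(values) / len(values) if values else 0.0
    totals = {}
    for model, comps in component_aod.items():
        total = 0.0
        for c in components:
            value = comps.get(c)
            # Use the mean of the other models when this model lacks the component.
            total += donor_mean[c] if value is None else value
        totals[model] = total
    return totals

# Hypothetical component AODs; model_b lacks nitrate and model_c lacks SOA.
example = {
    "model_a": {"sulfate": 0.04, "nitrate": 0.010, "soa": 0.006},
    "model_b": {"sulfate": 0.05, "nitrate": None, "soa": 0.004},
    "model_c": {"sulfate": 0.03, "nitrate": 0.020, "soa": None},
}
totals = adjusted_total_aod(example)
```

A model that already includes every component is left unchanged, so the adjustment can only raise the total AOD of models with missing components.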
Though global mean total AOD biases tend to be ∼ 0.03 or less (within 20 % of observations), models find a realistic total AOD with a very diverse partitioning among components (Fig. 3). Both the absolute AOD and the fraction of the total contributed by primarily natural dust and sea-salt vary by more than a factor of 2 across models. Primarily anthropogenic aerosols show similarly large variations, with individual component AODs varying by as much as a factor of 4 across models despite nearly identical sources.
Satellite observations cannot readily distinguish between different aerosol types. Instead, we compare monthly MODIS fine-mode fraction data (Remer et al., 2005) with the modeled sum of all aerosols except sea-salt and dust, as these are predominantly large particles. Though some sea-salt and dust particles are small and are included in the MODIS fine mode product, there is considerable uncertainty associated with separating out such small particles (Bellouin et al., 2008; Yu et al., 2009). Models should therefore be biased low. Over the oceans, this is the case for all but three models, though the magnitude varies greatly (Table 3). Spatial and temporal correlations are fairly low, however. For the entire globe, half the models show positive biases, and correlations are again relatively poor. However, we place greater value on the comparison over ocean regions, where MODIS data are more reliable.
[Caption fragment: Clear-sky AOD is used for the GISS models.]
To further evaluate individual aerosol components, we compare observed and modeled annual AOD in locations within the top decile (10 %) of component mass density in each model (Lee and Adams, 2010) (Table 4). Note that locations vary from model to model. Where sulfate mass density maximizes, AOD is positively biased in CICERO-OsloCTM2 and GFDL-AM3, and negatively biased in MIROC-CHEM. Some models have comparable magnitude  In OA-rich locations, several models appear to have too little AOD while GFDL-AM3 appears to have too much. These biases have no clear relationship to global mean OA AOD values other than GFDL-AM3 having the largest OA AOD. Dust and sea-salt are primarily naturally occurring aerosols, and thus their absolute amount tells us little about their contribution to forcing. In areas with largest dust loading, the GISS-E2-R-TOMAS model overestimates AOD (and has the highest global mean dust AOD), while MIROC-CHEM underestimates in comparison with MISR. The latter may be part of the reason that model has the lowest global mean AOD of any of the models in ACCMIP (though it is only marginally lower than some others). Note that differences between the satellite datasets are particularly large for dusty areas. The AOD in sea-salt rich regions tends to be fairly well simulated, though there are moderate biases in some models. Sea-salt biases do correspond fairly well with global mean sea-salt AOD values, with models that have a global mean sea-salt AOD in the 0.043 to 0.056 range agreeing best with measurements.
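The top-decile screening can be sketched as follows in Python. The sketch assumes flattened grid-cell lists, a simple rank-based selection of the 10 % of cells with the highest component mass density, and a normalized mean bias as the comparison statistic; all names and values are hypothetical.

```python
def top_decile_cells(mass_density):
    """Indices of the grid cells in the top 10 % of component mass density."""
    n = len(mass_density)
    k = max(1, n // 10)  # at least one cell, even for very small grids
    ranked = sorted(range(n), key=lambda i: mass_density[i], reverse=True)
    return sorted(ranked[:k])

def bias_in_cells(model_aod, obs_aod, cells):
    """Normalized mean bias (percent) of total AOD over the selected cells."""
    pairs = [(model_aod[i], obs_aod[i]) for i in cells if obs_aod[i] is not None]
    return 100.0 * sum(m - o for m, o in pairs) / sum(o for _, o in pairs)

# Hypothetical 20-cell grid: sulfate mass density increases with cell index.
sulfate_mass = [float(i) for i in range(20)]
model_total_aod = [0.1] * 18 + [0.5, 0.6]
observed_aod = [0.1] * 18 + [0.4, 0.6]

cells = top_decile_cells(sulfate_mass)
sulfate_region_bias = bias_in_cells(model_total_aod, observed_aod, cells)
```

Because the cells are chosen from each model's own mass density field, the selected locations differ from model to model, as noted in the text.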

AeroNet AOD
The most quantitatively reliable large-scale AOD observations come from the AeroNet network of sun photometers. We compare the 2000 simulations with an AeroNet climatology spanning 2000-2009 based on measurements from 388 stations located below 1000 m in altitude. Coverage is largely limited to continental areas (other than a few island stations) and is quite sparse in some regions. Comparisons are again made only in months with data available (roughly 10 months per year on average).
Correlations reflect the models' ability to capture both the spatial pattern of AOD and its seasonal cycle. Values range from 0.44 to 0.69 for the ACCMIP models (Table 2). For comparison, correlations in the AeroCom phase I models ranged from 0.29 to 0.77, with 6 of the 7 models having values between 0.52 and 0.77. The AeroCom phase II models report correlations from 0.26 to 0.78, with 8 of the 10 models between 0.60 and 0.78 (M. Schulz, personal communication, 2012). In the ACCMIP set, 7 of the 9 models have correlations between 0.54 and 0.69. The slightly lower correlations in the ACCMIP models could reflect the use of meteorological reanalyses in AeroCom versus free-running models in ACCMIP, the use of less sophisticated aerosol physics in some ACCMIP/CMIP5 models, the use of daily data in AeroCom as opposed to monthly mean in ACCMIP, or a combination of these factors. Biases in the models tend to be smaller with respect to AeroNet than in comparison with satellite datasets, perhaps indicating that primarily anthropogenic aerosols over land are better simulated than aerosols over remote oceans. Overall, evaluation against ground-based and satellite datasets is fairly consistent, however.
Separating the analysis into temporal and spatial components, we find that the models all capture the monthly variations of AOD in every continental-sized region fairly well, with nearly all correlations between 0.6 and 0.8. There is no relationship between the quality of the seasonality in the models and the quality of the spatial structure. Hence we focus the remainder of the analysis on the spatial pattern, which differs more strongly across models, comparing the annual mean values in the model against observations.
Looking regionally, we find that the spatial correlations in the models are highest for North Africa (0.46-0.80), with North America second (0.48-0.67 in all but one model). Spatial correlations vary widely across models for East Asia (0.09-0.57), but are consistently low over Europe (0.31-0.45 in all but one model). Biases in the models tend to be comparatively small over North Africa, Europe and North America (Fig. 4). Nearly all models show large negative biases over East Asia, however. The two models that do not show a large negative bias over East Asia show the largest positive biases over both Europe and North America, indicating they are systematically higher than the other models rather than matching East Asia observations better due to regional differences. Root-mean-square (RMS) differences relative to AeroNet are also typically fairly small for Europe and North America, and substantially larger for East Asia. This suggests that the models have the greatest difficulty in capturing AOD over East Asia (of the regions analyzed), though the spatial pattern over Europe is also problematic (although aerosol loading is lower, so biases and RMS errors remain relatively small). For areas with more limited AeroNet coverage, biases over remote island stations in the central Pacific and over Southern Hemisphere mid-latitude stations are nearly always positive, while biases over the Russian and especially the Western Hemisphere Arctic are typically negative (see Appendix B).
As with the screening versus satellite AOD based on the dominant mass component, we can use the spatial comparison against AeroNet to constrain the dominant aerosol component in some locations. Examining the zonal mean AOD by component (Fig. 5), we see that the largest contributor to AOD in much of the NH mid-latitudes is sulfate in all models except NCAR-CAM5.1.
Several of the models overpredict the total AOD from about 40-60° N, with a strong suggestion that this is due to too much sulfate AOD. This is especially the case for the CICERO-OsloCTM2 and GFDL-AM3 models, consistent with the analysis against satellite AOD in sulfate-rich areas. Further towards the equator, dust becomes a large contributor to AOD with a peak around 20° N in most models. Dust is by far the largest component of AOD in several models, but not in others, and there is no clear relationship between the models' skill in capturing observed total AOD and the magnitude of the dust AOD. The GISS-E2-R-TOMAS model's dust AOD is larger than the total observed AOD around 15° N, indicating that there is too much dust in that model and accounting for the positive bias in North African total AOD (Fig. 4; and consistent with the screening by dust mass against satellite data in Table 4, though that model actually has the highest spatial correlation with AeroNet in that region), but in other cases the zonal mean AOD does not constrain the dust loading.
At low Southern latitudes there is a peak in AOD from OA in the two GISS models, LMDzORINCA, and CICERO-OsloCTM2, which show low OA AOD elsewhere, while the values are very low at all latitudes in HadGEM2, NCAR-CAM3.5 and MIROC-CHEM. In contrast, organic AOD is high in NCAR-CAM5.1 across a broad area of the tropics and even into the extratropics in both hemispheres, and is also broadly distributed in GFDL-AM3 though with a smaller magnitude than in NCAR-CAM5.1. Hence as in the case of dust (and partly due to the overlap in dust and OA at low latitudes), the total AOD does not provide a tight constraint on the contribution from OA. We note, however, that OA is the only component that typically exhibits a local peak AOD at 20° S, where the observations also show a local maximum, suggesting that the strong influence of OA on AOD at this latitude in some models is likely realistic (sulfate and dust, in contrast, tend to have a local minimum at this latitude). The sea-salt AOD peak seen in most models from 50-60° S appears to be typically too large, so that nearly all models overpredict total AOD at the edge of the AeroNet record. Data availability becomes very limited at these latitudes, however, so this result should be treated with caution.

Satellite and AeroNet AAOD
Though both contribute to AOD, scattering aerosols exert a negative RF, while absorbing aerosols can cause positive RF. We therefore also analyzed the absorbing aerosol optical depth (AAOD). Note that there appeared to be a problem with the HadGEM2 AAOD diagnostic, so it was excluded from this analysis.
We first compared the models' 2000 timeslice with satellite observations from the Ozone Monitoring Instrument (OMI) (Torres et al., 2007) averaged over 2005-2007. The multi-model mean shows maximum values over Africa, Arabia, South and East Asia, and tropical South America. OMI measurements show a similar pattern, but values are generally much larger than in the models in areas with substantial AAOD. Model values are especially low over tropical South America, the Persian Gulf, and much of South and Southeast Asia. The observations also suggest substantial transport over the ocean from Southern Hemisphere continents, which is not captured in the models. This might indicate that absorbing aerosol lifetimes are too short in the models, though distributions seem reasonable in the Northern Hemisphere. Comparison with remote in situ measurements indicates that most models overestimate BC (Schwarz et al., 2010), suggesting that the oceanic AAOD underestimates may be due to an underestimate of the Southern Hemisphere sources or lifetime of dust. Quantitatively, the models have fairly poor spatial correlations with OMI, and underestimate AAOD by roughly a factor of two (Table 5). The multi-model mean shows negative biases in AAOD in every region in comparison with OMI. Underestimates are particularly large in South America, South and Southeast Asia, East Asia and Southern Hemisphere Africa, where every model shows markedly less AAOD than the satellite measurements. OMI data indicates large Arctic AAOD values, but satellite AAOD retrievals are especially challenging over bright surfaces, so these observations may not be reliable. Correlations with OMI are substantially better excluding the Arctic (Table 5). It is difficult to measure AAOD accurately from space, and the OMI measurements may have substantial biases. In particular, the reported AAOD exceeds the total AOD in many locations.
We therefore also compare with AeroNet observations (see Appendix C), which likewise show that the models severely underestimate AAOD over South and Southeast Asia, South America, Southern Hemisphere Africa and East Asia. Since dust dominates the total BC+dust AOD over South Asia and parts of East Asia, part of the low bias there could be due to underestimates of dust loading as well as BC. The low biases in modeled AAOD over South America and Southeast Asia occur in areas where BC's contribution to total AAOD is quite large, however. Though AAOD retrievals from ground-based sun photometers may themselves have biases (e.g. due to required removal of NO2 absorption), it seems likely that at least a portion of the modeled AAOD bias versus the two datasets analyzed is real. The model biases could stem from underestimates in the emissions inventory, the absorption efficiency of BC, or both factors.

AOD trends
Long-term data on aerosol AOD is fairly limited. The most complete long-term satellite record is from the Advanced Very High Resolution Radiometer (AVHRR) and extends back to 1981. We compare the difference between the ACCMIP 1980 and 2000 timeslices with differences between AVHRR over 1981-1987 (excluding March 1982-December 1984 to avoid the effects of the El Chichon volcanic eruption) and 1997-2003. We primarily analyze average monthly mean AVHRR data produced by the Global Aerosol Climatology Project (GACP) (Geogdzhayev et al., 2005) and by NOAA (Zhao et al., 2008) over oceans and lakes, where the retrievals are most reliable for trend analysis. Both AVHRR datasets show substantial decreases over most of the globe (Fig. 7). Decreasing trends in AOD are most pronounced around Europe, especially in the NOAA product, and off eastern North America and over the Great Lakes (though GACP shows large decreases over most of the Northern Hemisphere). AOD also decreases off West Africa, though as interannual variability in dust emissions is large, our relatively short averaging periods may not be representative for long-term trends there. The models show decreases in AOD over continental Europe and eastern North America that extend out over the nearby oceans. Quantitative comparisons were made by sampling the models where satellite data were reported for at least eight months during at least two years of both the 1981-1987 and 1997-2003 periods. Tests show that values in most locations are extremely similar using thresholds of 9 or 10 months. The multi-model mean decreases near Europe and eastern North America are in good agreement with NOAA AVHRR, though North American trends are too small in comparison with GACP AVHRR (Table 6). A very recent reprocessing of the AVHRR dataset by the same NOAA group has produced the NCDC AVHRR-CDR dataset (Chan et al., 2013). This dataset has less coverage (i.e. no Great Lakes data), but at least nominally improves over the prior processing. This version shows more negative trends everywhere relative to the earlier NOAA analysis, and the models underpredict AOD decreases near both Europe and North America with respect to this analysis (Table 6).
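The data-availability screen used for the trend comparison (at least eight months of data in at least two years of each period) can be written as a short Python sketch. It assumes that months falling in the excluded March 1982-December 1984 window have already been removed from the counts; the dictionary layout and function names are hypothetical.

```python
EARLY_YEARS = range(1981, 1988)  # 1981-1987
LATE_YEARS = range(1997, 2004)   # 1997-2003

def period_has_coverage(months_with_data, years, min_months=8, min_years=2):
    """True if at least min_years years in the period have min_months of data.

    months_with_data: {year: number of months with a valid retrieval}.
    """
    good_years = sum(1 for y in years if months_with_data.get(y, 0) >= min_months)
    return good_years >= min_years

def cell_is_usable(months_with_data, min_months=8):
    """A grid cell is used only if both averaging periods pass the screen."""
    return (period_has_coverage(months_with_data, EARLY_YEARS, min_months)
            and period_has_coverage(months_with_data, LATE_YEARS, min_months))

# Hypothetical counts of valid months per year for two grid cells.
cell_a = {1981: 9, 1985: 10, 1986: 4, 1998: 12, 2001: 8}
cell_b = {1981: 12, 1999: 12, 2000: 12}
```

Here cell_a passes with the eight-month threshold but fails with nine, while cell_b fails because only one early-period year has enough data. Rerunning the screen with min_months set to 9 or 10 is one way to repeat the sensitivity test described above.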
The models show large increases in AOD in east Asia, south Asia and Indonesia between 1980 and 2000 (Fig. 7). Observed trends in nearby oceans show strong increases in the NOAA product, but only weak increases in the AVHRR-CDR product and decreases in GACP. It seems counterintuitive that AOD would decrease near rapidly developing Asian countries during this period, and thus we favor the NOAA analyses. The multi-model mean increases are similar to the older NOAA values averaged over a broad area of near-Asian oceans, but are substantially smaller than those seen in the Yellow/Eastern Sea region where observed trends maximize (Table 6). The modeled increases, however, extend over a broader area of the Indian and western Pacific Oceans. Modeled trends are much too large near Asia relative to the new AVHRR-CDR product. It is difficult to reconcile modeled increases being far too large with the consistent underprediction of 2000 total AOD in south and east Asia (Fig. 1). Hence the older NOAA AVHRR trends may be the most plausible in this region. There are clearly substantial limitations to AVHRR data, however, as highlighted by the large differences between the three datasets, with many assumptions about clouds and aerosols required to translate observed radiances into AOD (Li et al., 2009). The trend comparison suggests that in regions with relatively robust observed trends the modeled AOD trends are fairly reasonable, while near Asia trends are more difficult to constrain from observations but the model results seem plausible.
[Figure caption (AAOD comparison): The multi-model mean is from 9 models, with HadGEM2 excluded. OMI retrieval is based on OMAERUVd.003 daily products from 2005-2007 that were obtained through and averaged using GIOVANNI (Acker and Leptoukh, 2007). White indicates no data in the satellite record. White borders in the lower panel show areas included in regional AAOD analyses.]
Near Europe and eastern North America, GISS-E2-R simulations show particularly small AOD trends. The difference relative to other models stems from large increases in nitrate aerosol AOD in that model in these areas. This suggests that nitrate increases in GISS-E2-R (and the resulting forcing) are likely too strong. Near Asia, the GFDL-AM3 and CICERO-OsloCTM2 models best match the AOD increases in the NOAA AVHRR analysis over the Yellow/Eastern Sea, but greatly overestimate trends over the broader South and East Asian coastal region where most other models perform well (Table 6), making it difficult to evaluate overall model skill.
[Figure caption fragment (AOD trends): ... 1997-2003 and 1981-1987 (excluding March 1982-December 1984) averages based on the NOAA product (center) and the GACP product (bottom).]

Historical and future aerosol forcing
Radiative forcing is a useful metric for evaluating the contribution of a given factor to climate change over a particular time and for comparing the influence of multiple factors. RF is not a perfect indicator of the eventual global mean temperature response to a sustained forcing, but it is generally reasonably close. Notable exceptions occur with aerosols, however, especially for aerosol-cloud interactions and BC albedo forcing (Forster et al., 2007; Koch et al., 2011; Flanner et al., 2007).
RF is calculated from the difference in flux at the tropopause between a pair of radiative transfer calculations with reference (usually zero) aerosols in the first and actual aerosols in the second. Changes in this flux difference over time are the RF. This diagnoses the so-called "direct" aerosol RF, but does not capture the various effects of aerosols on clouds. To diagnose those, we use additional simulations that isolated flux changes due to all aerosol effects (see Sect. 5.2). BC surface albedo forcings were calculated using a combination of the NCAR Community Land Model 4 (Lawrence et al., 2011) and models of snow and sea-ice interactions with aerosols (Flanner et al., 2007; Holland et al., 2012) that determined the forcing due to black carbon and dust deposition as reported by each ACCMIP model (see Appendix D).
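The double-call differencing described above can be sketched in a few lines. This is a minimal illustration rather than any model's radiation code: the function names, the sign convention (net downward flux at the tropopause) and all flux values are assumptions made for the example.

```python
def aerosol_flux_effect(net_flux_actual, net_flux_reference):
    """Net tropopause flux with actual aerosols minus the reference-aerosol call (W m-2)."""
    return net_flux_actual - net_flux_reference


def direct_rf(effect_later, effect_earlier):
    """RF between two times is the change in the aerosol flux effect (W m-2)."""
    return effect_later - effect_earlier


# Hypothetical double-call results (net downward flux, W m-2); not ACCMIP output.
effect_1850 = aerosol_flux_effect(239.70, 240.00)
effect_2000 = aerosol_flux_effect(239.44, 240.00)
rf_1850_to_2000 = direct_rf(effect_2000, effect_1850)
print(round(rf_1850_to_2000, 2))
```

Because only the difference between the two calls enters, any bias common to both radiative transfer calculations cancels in this construction.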

Global mean preindustrial to present-day RF
We first examine total and component global mean annual average aerosol RF between 1850 and 2000 (Fig. 8; Table 7). As the sample size is sometimes small, we show individual model results rather than a "best estimate" and range. The mean and standard deviation of sulfate RF is −0.40 ± 0.13 W m−2, and this range encompasses 6 of the 9 models. The only models exhibiting sulfate RF larger than this range are CICERO-OsloCTM2 and CSIRO-Mk3.6. The CICERO model showed the largest positive AOD biases in sulfate-rich regions in comparison with observations (Table 4; CSIRO did not provide speciated AOD data). The only model with sulfate RF smaller than this range is NCAR-CAM5.1. That model's AOD was biased low in sulfate regions, but did not stand out from the other models. We thus estimate a most probable range for sulfate RF of −0.18 to −0.44 W m−2 (i.e. the range of all models except CICERO and CSIRO). As biases in present-day AOD do not necessarily correlate with biases in forcing, we cannot rule out larger negative forcings, however.
Carbonaceous aerosol forcing was diagnosed according to emission sources: BC from fossil and biofuel (ff+bf) sources, OA ff+bf, and biomass burning (BB) total carbonaceous forcing. This source apportionment was used since OA is always co-emitted with BC, but the BC/OA ratio is typically much lower for biomass burning. Evaluation by emission source is consistent with AeroCom and AR4 (Forster et al., 2007), although it leaves us with a mixture of pollutant-based and sector-based analyses.
For BCff+bf, mean and standard deviation of RF is 0.24 ± 0.09 W m−2, with a full range from 0.14 to 0.38 W m−2. As shown in Sect. 3, the models underestimate AAOD in comparison with OMI by 52 %, while biases with respect to AeroNet are even larger in most parts of the world, although we acknowledge that there are substantial uncertainties regarding the AAOD measurements themselves. This suggests that the BCff+bf RF could similarly be greatly underestimated, though the magnitude of the RF bias is unclear as some of the AAOD bias may come from biomass burning BC or dust, and biases in BC's present-day climatology do not translate directly into biases in time-dependent forcing. Evaluation against BC deposition recorded in ice cores also reveals substantial biases in these models, though again these cannot be clearly related to RF.
Figure caption (fragment): Individual points are from different models. Aer are totals. Aer+ are totals including adjustment by adding in forcing due to missing nitrate and SOA (see text). Aer(CMIP5) is the subset of ACCMIP models that also participated in CMIP5. The number of models for each component is: sulfate 9, BCff+bf 5, OAff+bf 4, BB 4, nitrate 5, SOA 4, Aer 10, Aer+ 10, and Aer(CMIP5) 8. Note that HadGEM2 nitrate and NCAR-CAM3.5 nitrate are included in those models' Aer+ but not the Aer or Aer(CMIP5) values. MIROC-CHEM nitrate and SOA are included in their Aer and Aer+ values, but not in their Aer(CMIP5) values.
Few model results are available for the remaining components. The mean RF from ff+bf OA is −0.04 W m−2, with a range from −0.01 to −0.08 W m−2. The models with the weakest OA forcing substantially underestimate AOD in OA-rich regions (Tables 4 and 7). This suggests that the OA RFs from −0.04 to −0.08 W m−2 reported by the models with smaller OA-rich region biases might be more realistic (though there are uncertainties going from AOD to RF due to incomplete knowledge of OA optical properties and temporal evolution).
The mean RF for biomass burning BC+OA is 0.00 W m−2, with a range from −0.02 to 0.02 W m−2. This RF is thus quite small, and the limited number of models show fairly similar results. Three additional models reported total carbonaceous aerosol RF (Table 7). The MIROC-CHEM model has the greatest bias with respect to OMI AAOD (Table 5), so its carbonaceous RF is likely substantially too low. The other models show similar totals to the sector-specific sums. Thus we are fairly confident that the reported ranges for the carbonaceous aerosol forcings are robust across these models.
Results for the remaining components, SOA and nitrate, show substantial spread. For SOA, the substantial positive forcing in MIROC-CHEM is quite different from the other models, which show small negative forcings. Emissions of biogenic SOA precursors are coupled to land-use changes in MIROC-CHEM, while other models use fixed present-day vegetation distributions. Thus only MIROC-CHEM incorporates decreases in forest area leading to reduced emissions of SOA precursors, and hence a positive SOA RF. The "outlier" may therefore in fact be the most realistic model. This highlights the role of "structural" uncertainties concerning which physical processes are represented in models, in addition to "scientific uncertainties" represented by the range of results across models incorporating similar processes.
Total aerosol RF was reported in 10 models, and the mean, standard deviation, and range are −0.26, 0.14 and −0.06 to −0.49 W m−2, respectively. We also examine the total aerosol RF accounting for missing components (nitrate and SOA) in some models, which we call Aer+. Mean values for missing components are taken from the other ACCMIP models. We exclude the MIROC-CHEM SOA, however, as this includes land-use changes that have thus far been assessed in only a single model and so we do not know how representative that result is. We also weight GISS-E2-R nitrate by 0.5 to account for its biases against AVHRR trends in nitrate-rich areas (Sect. 4). Multi-model means are −0.05 W m−2 for SOA and −0.16 W m−2 for nitrate. The mean, standard deviation, and range for Aer+ are −0.39, 0.14 and −0.12 to −0.62 W m−2, respectively. Since accounting for missing components improves the agreement between models and satellite AOD in nearly all cases (Sect. 3), we consider the missing component-adjusted values to be more realistic.
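The Aer+ adjustment can be illustrated with a short sketch. The multi-model means for nitrate and SOA are the values quoted above; the function name, the component labels and the example model total are hypothetical, and the weighting and exclusions used to derive the means themselves are not reproduced here.

```python
# Multi-model means quoted in the text (W m-2).
MEAN_MISSING_RF = {"nitrate": -0.16, "SOA": -0.05}


def aer_plus(total_rf, components_included):
    """Add the multi-model mean RF for each component the model lacks."""
    missing = [c for c in MEAN_MISSING_RF if c not in components_included]
    return total_rf + sum(MEAN_MISSING_RF[c] for c in missing)


# Hypothetical models; totals and component lists are illustrative only.
no_nitrate_or_soa = aer_plus(-0.30, {"sulfate", "BC", "OA"})
has_both = aer_plus(-0.30, {"sulfate", "BC", "OA", "nitrate", "SOA"})
print(round(no_nitrate_or_soa, 2), round(has_both, 2))
```

A model lacking both components shifts by the full −0.21 W m−2, while a model that already includes them is unchanged, which is why the multi-model mean moves by less than that (from −0.26 to −0.39 W m−2).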
We also test if there is a relationship between model skill and aerosol RF. The models with the highest correlation (over 0.60) against the satellite datasets or AeroNet do have a narrower RF range, −0.16 to −0.49 W m−2, than the full set of models (Fig. 9). The models with smallest biases (< 15 %) have the same forcing range. Screening the models by correlation or high bias with respect to MODIS fine-mode fraction over the oceans gives identical results. The range for this subset of quality-screened models is −0.33 to −0.62 W m−2 accounting for missing components (Aer+). Screening based on NOAA AVHRR AOD trends (near-Europe and near-South and East Asia where the multi-model mean agrees with observations), leaving out models more than one standard deviation from the multi-model mean, gives a range of −0.16 to −0.40 W m−2, or −0.33 to −0.50 W m−2 for Aer+. Note, however, that several models are absent from the AVHRR analysis, which may contribute to the reduced range. Hence the full range based on all screening is encompassed by −0.42 ± 0.09 W m−2 (Aer+). This RF is almost identical to the multi-model mean Aer+, but has less than half the range of the full set of models. Thus although there are large differences in RF per unit AOD in aerosol models, and hence screening by AOD would not obviously lead to a reduced RF range, this is the case in these models. Note that the screening does not take into account uncertainties in emissions, which could alter the relative agreement with observations of the various models.
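As a rough illustration of the screening logic, the sketch below filters a set of models by a skill criterion and reports the RF range of the survivors. The thresholds (correlation over 0.60, bias under 15 %) come from the text; the model names, skill metrics and RF values are placeholders invented for the example, not ACCMIP results.

```python
# Hypothetical skill metrics and Aer+ RF values (W m-2); not ACCMIP results.
MODELS = [
    {"name": "model_A", "corr": 0.72, "bias_pct": -8.0, "aer_plus_rf": -0.35},
    {"name": "model_B", "corr": 0.65, "bias_pct": -22.0, "aer_plus_rf": -0.48},
    {"name": "model_C", "corr": 0.41, "bias_pct": -35.0, "aer_plus_rf": -0.12},
    {"name": "model_D", "corr": 0.68, "bias_pct": 5.0, "aer_plus_rf": -0.58},
]


def rf_range(models, keep):
    """Return (least negative, most negative) RF among models passing `keep`."""
    values = [m["aer_plus_rf"] for m in models if keep(m)]
    return max(values), min(values)


all_models = rf_range(MODELS, lambda m: True)
by_correlation = rf_range(MODELS, lambda m: m["corr"] > 0.60)
by_bias = rf_range(MODELS, lambda m: abs(m["bias_pct"]) < 15.0)
print(all_models, by_correlation, by_bias)
```

In this toy set, both criteria remove the weakest-forcing model and so narrow the range from the same end, mirroring the behavior reported above.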
The evaluation against observations thus indicates that −0.42 ± 0.09 W m−2 is the best estimate of the total 1850 to 2000 aerosol RF. This is similar to the IPCC AR4 estimate of −0.50 ± 0.40 W m−2, but has a much smaller uncertainty range. Our range does not account for uncertainty in the underlying emissions, however. Additionally, positive forcing from fossil+biofuel BC is likely underestimated (Sect. 3.3).
We performed a similar analysis of fossil+biofuel BC forcing in comparison with model skill in representing AAOD. Unfortunately, the requisite data was available from only 5 models, and the analysis did not show robust relationships between skill and RF at either the global scale or for regions with greatest BC/dust AAOD ratios. Hence a model's ability to reproduce present-day climatological AAOD provides a poor test of its long-term BC RF. Without better understanding of the causes of the AAOD underestimate, it is not yet clear how best to adjust forcing from BC or co-emitted species to correct for model biases.
The CMIP5 subset of eight ACCMIP models has an aerosol RF mean and standard deviation of −0.28 ± 0.13 W m−2 (Fig. 8). Thus the aerosol RF actually driving those CMIP5 climate simulations tends to be underestimated in comparison with our best estimate (Aer+), primarily owing to the lack of nitrate and/or SOA in several models.
As mentioned previously, several models included changes in dust and sea-salt aerosols, and nearly all models included the effect of climate change on aerosols via the imposed SST and sea-ice trends (except NCAR-CAM5.1, CICERO-OsloCTM2 and CSIRO-Mk3.6). The impact of climate change can be isolated in four models by comparing the full 1850 to 2000 changes against the influence of emissions alone (the simulation with 1850 climate and 2000 emissions differenced with the simulation with all conditions at 1850). The effect on aerosol RF is −0.02 W m−2 in HadGEM2, −0.07 W m−2 in GFDL-AM3, −0.17 W m−2 in GISS-E2-R and 0.07 W m−2 in MIROC-CHEM. The range is clearly quite large, at least in part because models include different processes influencing aerosols. In GISS-E2-R, for example, sulfate and nitrate aerosols can form coatings on dust and sea-salt particles, changing their lifetimes. Similarly, HadGEM2 alone included sea-salt but not dust changes in its forcing calculation. From 35 to −58 % of the 1850 to 2000 total aerosol RF is attributable to the influence of climate feedbacks on aerosols rather than aerosol direct or precursor emissions.

Temporal evolution of global mean RF
In the ACCMIP timeslices, total aerosol RF becomes increasingly negative in all models from 1850 to 1930, and again from 1930 to 1980 (Fig. 10). From 1980 to 2000, however, the total aerosol negative RF becomes weaker in six of the nine models. This is due to pollution controls that limited emissions, especially of sulfur dioxide. Sulfate RF weakens or stays approximately constant from 1980 to 2000 in all models for which data is available. In contrast, fossil+biofuel BC RF grows more positive throughout the 20th century in all models, contributing to the weakening total aerosol negative RF between 1980 and 2000. Unlike fossil+biofuel BC RF, BC albedo forcing peaks in 1980. This is due to regional shifts in the location of BC emissions from higher latitudes, where they can more easily reach Arctic snow and ice covered areas, to lower latitude developing nations (further details are presented in Lee et al., 2013, and Appendix D). Total aerosol RF from 1980 to 2000 becomes more negative in the GFDL-AM3 and GISS-E2-R models, by −0.05 and −0.10 W m−2, respectively. GFDL-AM3 did not diagnose RF by aerosol component. In GISS-E2-R, sulfate contributes 0.01 W m−2, carbonaceous aerosols 0.05 W m−2, nitrate −0.13 W m−2 and SOA −0.01 W m−2. Hence RF becoming more negative is primarily attributable to nitrate in that model. Note that this cannot be the case for GFDL, which did not include nitrate.
In the RCP2.6 and RCP8.5 emission scenarios, which span the range of RCP projections in terms of total forcing/climate impacts, all aerosol RF declines greatly in most models. There is little difference between the scenarios in MIROC-CHEM. CICERO-OsloCTM2 and LMDzORINCA show substantially greater declines in negative aerosol RF at 2030 under RCP2.6, but the differences narrow at 2100. In contrast, differences between the scenarios increase throughout the 21st century in GFDL-AM3. In that model, total aerosol RF stays approximately constant under RCP8.5. Total aerosol RF becomes increasingly negative in the future in GISS-E2-R. This results from increased nitrate negative RF and reduced BCff+bf positive RF, which together outweigh the reduced negative RF from other scattering aerosols. Most of the increase is due to nitrate, with 2100 versus 2000 RF of −0.36 W m−2 under RCP2.6 and −0.49 W m−2 under RCP8.5. Bellouin et al. (2011) report HadGEM2 2100 versus 2000 nitrate forcings of −0.4 W m−2 and −0.5 W m−2 for RCPs 2.6 and 8.5, respectively, including direct and cloud albedo effects. Their ratio of historical nitrate direct forcing to nitrate direct plus cloud albedo forcing is 0.71, suggesting that the direct forcing is ∼ 30 % lower than these RCP RF values (those estimates are shown in Fig. 10). Hence the HadGEM2 results seem fairly consistent with the GISS-E2-R nitrate projections, though slightly smaller. Future nitrate forcing was not available from other models. Nitrate aerosols become increasingly important for two reasons: sulfur dioxide emissions are greatly reduced, and sulfate and nitrate precursors compete for a limited supply of ammonium; and the RCPs assume that pollution controls are effective for industry, vehicles and power generation, while ammonia emissions from agriculture increase during the 21st century (van Vuuren et al., 2011). Note that in some models, dust and sea-salt were included in future RF (Table 1). While diagnostics were not available for these natural aerosols in most models, in GISS-E2-R they were remarkably stable, contributing less than 0.03 W m−2 to 2100 versus 2000 RF.
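The HadGEM2 scaling described above is simple arithmetic, sketched here using only numbers quoted in the text. Applying the historical direct fraction (0.71) to the future forcings is the assumption the comparison rests on; the variable names are illustrative.

```python
DIRECT_FRACTION = 0.71  # historical direct / (direct + cloud albedo), from the text

# HadGEM2 2100 versus 2000 nitrate forcing including cloud albedo effects (W m-2).
hadgem2_total = {"RCP2.6": -0.4, "RCP8.5": -0.5}
# GISS-E2-R 2100 versus 2000 nitrate RF (W m-2).
giss_direct = {"RCP2.6": -0.36, "RCP8.5": -0.49}

# Estimated HadGEM2 direct-only forcing, assuming the historical ratio still holds.
hadgem2_direct = {rcp: DIRECT_FRACTION * f for rcp, f in hadgem2_total.items()}
for rcp in hadgem2_total:
    print(rcp, round(hadgem2_direct[rcp], 2), giss_direct[rcp])
```

The scaled HadGEM2 values come out near −0.28 and −0.36 W m−2, somewhat weaker than the GISS-E2-R values for the same scenarios.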
Analysis of 2000 to 2010 shows small aerosol RF. Across all four RCPs, results from five models (though not all models ran all scenarios) show aerosol RF from −0.013 to 0.033 W m−2, with mean and median values of 0.007 W m−2 and −0.001 W m−2, respectively. For RCPs 4.5 and 6.0, with at least three model results available (Table 1), values are not consistent in sign among the models.

Geographic pattern of RF
The distribution of aerosol RF is highly inhomogeneous. This has important consequences for both global and especially regional climate change, as feedbacks are non-uniform and aerosol impacts tend to maximize in areas with greatest forcing (e.g. Rotstayn and Lohmann, 2002; Ming and Ramaswamy, 2009; Boer and Yu, 2003; Shindell et al., 2010). We therefore next analyze the spatial pattern of aerosol RF.
The 1850 to 2000 sulfate forcing is greatest over the most industrialized and heavily populated areas, especially east and south Asia, Europe and eastern North America where negative forcing exceeds −1 W m−2 (Fig. 11). Forcing extends past these regions, where the emissions of sulfur dioxide are largest, out over the nearby oceans and over the Middle East, due to atmospheric transport. The variation across models is greatest in these same regions, and is fairly uniform across regions.
Fossil+biofuel BC RF is, like sulfate, very large over east and south Asia where it is more than 1 W m−2, but is comparatively small over Europe and North America. The spatial distribution of fossil+biofuel OA RF is generally very similar to fossil+biofuel BC RF, unsurprisingly, but has the opposite sign (except over eastern North America) and a much smaller magnitude.
Unlike other components, biomass burning aerosol and SOA RF show regions of both substantial positive and negative forcing. For biomass burning, this stems from varying regional fire frequency trends, with decreases in the southeastern United States and increases in Indonesia. For SOA, the large positive forcings come from the MIROC-CHEM model's incorporation of changing land-use. As the other models do not include this factor, there is an extremely large standard deviation in this RF. While global means always mask regional patterns, the existence of both positive and negative forcing means that the global mean can be particularly misleading for these two components.
Nitrate aerosol RF shows local maxima over East Asia, Europe and eastern North America, and to a lesser extent over south Asia. This distribution is similar to that of sulfate. There is a broader distribution of small Southern Hemisphere forcing values, however. This stems from both MIROC-CHEM and especially from GISS-E2-R, which efficiently lofts ammonia in convective plumes from tropical sources to the upper troposphere, where it then spreads to both hemispheres. This leads to unexpectedly large nitrate aerosol abundances in the upper troposphere over much of the world, though given the paucity of measurements it is difficult to evaluate this forcing. As noted previously, GISS-E2-R appears to underestimate AOD trends near North America and Europe owing to overly large increases in nitrate. Hence the multi-model mean nitrate forcing over Southern Hemisphere mid-latitudes, coming in large part from that model, may also be too large. There is great divergence between models in this region. In comparison, standard deviations are of similar magnitude over industrialized areas, but forcing is much larger. Models differ in industrialized regions as well though, and it is difficult to determine which is more realistic overall. For example, over China, a model that captures observed nitrate well found annual mean nitrate RF of −0.95 W m−2 (relative to zero nitrate) (Zhang et al., 2012). For ACCMIP models, GISS-E2-R finds a very similar value (−1.08 W m−2), while CICERO-OsloCTM2 has about half as much year 2000 nitrate and MIROC-CHEM roughly double the GISS-E2-R amount. Given the small number of models having reported nitrate RF, it is clear that uncertainties are especially large for this aerosol component.
The total aerosol RF is strongly negative over most Northern Hemisphere land areas below 60° N. Over the Sahara, the Tibetan plateau, and the Arctic, however, very high surface albedo reduces the effect of scattering aerosols while increasing the effect of absorbing aerosols, leading to net positive forcing. There is a local maximum in negative forcing over the Amazon that is largely attributable to SOA, and large values over western Central Africa driven by SOA, biomass burning and nitrate. This suggests that forcing in these regions may be underestimated (i.e. not negative enough), as many models do not include all these aerosols. Large model-to-model variations in all aerosol RF over west Africa and Indonesia stem from SOA and biomass burning aerosols, while large diversity over Arabia is due to forcing there arising primarily from long-range transport, which can vary substantially between models, with contributions from dust changes included in some models as well. Substantial forcing also extends over many oceanic regions.
BC albedo forcing, which is not included in the all aerosol RF, is largest in western Russia, the Karakoram and Manchuria. Arctic forcing is also substantial, though not as large as at lower latitudes where sunlight is more plentiful. We also examine the distribution of forcing through time, focusing on the total and carbonaceous aerosol (carbonaceous aerosol RF is simpler to display than the three separate components, and the OA ff+bf and BB forcings are comparatively small). RF in 1930 was primarily concentrated over Europe and eastern North America for both cases (Fig. 12). Magnitudes increased substantially in those areas from 1930 to 1980, while large forcings also appeared over east Asia, and for the total, over parts of Africa, Latin America and southeast Asia. From 1980 to 2000, negative all aerosol RF increased substantially over south and southeast Asia, where increased sulfate outweighed increased BC (Fig. 13). These more nearly offset one another over East Asia, leading to small trends in all aerosol RF there. Over Europe and North America, however, forcing is positive due to declining sulfate, which over Europe outweighs a large reduction in fossil+biofuel BC RF during this period. Tropical trends are attributable to changes in biomass burning (they are not from fossil+biofuel BC RF; Fig. 13), which increased in Indonesia and decreased in western Africa.
From 1930 to 1980 BC albedo forcing declines over North America while increasing markedly over Eurasia (Fig. 12). Trends are especially large over western Russia, an area downwind of European BC sources and one with substantial biomass burning. During 1980 to 2000, BC albedo forcing decreases in most of Russia, while increasing greatly over the Karakoram and Manchuria.
The all aerosol 2000 to 2030 RF is similar in many respects to the 1980 to 2000 RF (Fig. 14). Under RCP2.6 or RCP8.5 negative all aerosol RF over Europe and North America continues to decline, though more so under RCP2.6. Likewise negative all aerosol RF continues to increase over South Asia, especially under RCP8.5. As with the recent past, trends in scattering and absorbing aerosols are more nearly balanced over east Asia, leading to modest all aerosol RF there. By 2100, aerosol forcing declines sharply in magnitude virtually everywhere (regions with negative 1850 to 2000 forcing show positive RF, and vice-versa). Carbonaceous aerosol RF similarly declines in the RCPs. As with the global mean, differences between scenarios are larger at 2030 than 2100.

Aerosol effective radiative forcing
In addition to direct aerosol RF, we calculate the "effective radiative forcing" (ERF), defined here as the top-of-the-atmosphere (TOA) net energy flux change with ocean conditions held fixed but all other processes allowed to respond to the aerosol changes. Along with the direct RF, ERF thus includes aerosol indirect effects on clouds via microphysics (affecting cloud albedo and cloud lifetime) as well as responses of water vapor, lapse rate and clouds to aerosol thermodynamic impacts (including so-called "semi-direct" effects). In half the models, BC-induced albedo changes are also included (GISS-E2-R, GISS-E2-R-TOMAS, MIROC-CHEM and NCAR-CAM5.1). ERF is in general a better indicator of the eventual climate response than RF (Hansen et al., 2005; Lohmann et al., 2010). Note that since land temperatures are allowed to adjust, the ERF includes a small portion of response which lowers its magnitude by ∼ 10 % or less (Hansen et al., 2005; Andrews et al., 2012). It is relatively straightforward to account for this bias at the global scale, but it is not clear how to do so at the regional scale and hence we do not remove this bias.
ERF values come from comparing the 1850 simulations against additional simulations that (1) used a different year's short-lived species emissions, (2) maintained 1850 climate and WMGHG concentrations, and (3) included interactions with radiation and clouds for aerosols but not for ozone. Thus aerosols were the only changes influencing radiation. CMIP5 included a very similar pair of simulations with fixed ocean boundary conditions and 1850 to 2000 aerosol concentration changes (although using aerosols from historical simulations includes the influence of climate change on aerosols). We evaluate aerosol ERF from the CMIP5 simulations for three models: LMD and HadGEM2 (ACCMIP results not available), and CSIRO (BC albedo did not affect radiation in ACCMIP runs but did in CMIP5 runs). In the NCAR-CAM5.1 simulations, background climate and WMGHG conditions were fixed at 2000 rather than 1850, and ozone precursor emissions did not change.
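The ERF diagnosis from a pair of fixed-ocean simulations can be sketched as an area-weighted difference of TOA net fluxes. This is a simplified illustration on zonal bands: the cos(latitude) weighting, the function names and all flux values are assumptions for the example, and the models' actual grids and averaging procedures are not specified in the text.

```python
import math


def area_weighted_mean(values, lats_deg):
    """Global mean of zonal-band values, weighted by cos(latitude)."""
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def global_mean_erf(toa_net_perturbed, toa_net_control, lats_deg):
    """ERF as the TOA net flux change between two fixed-ocean runs (W m-2)."""
    diff = [p - c for p, c in zip(toa_net_perturbed, toa_net_control)]
    return area_weighted_mean(diff, lats_deg)


# Hypothetical zonal-mean TOA net downward fluxes (W m-2) on five latitude bands.
lats = [-60.0, -30.0, 0.0, 30.0, 60.0]
control_1850 = [-40.0, 20.0, 60.0, 20.0, -40.0]
perturbed_2000 = [-40.2, 19.2, 59.0, 18.2, -39.8]
erf = global_mean_erf(perturbed_2000, control_1850, lats)
print(round(erf, 2))
```

Because everything except the ocean is free to respond in both runs, the difference includes rapid adjustments such as cloud changes, not only the direct aerosol effect.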
The ACCMIP models' mean and approximate 5-95 % confidence interval (1.65σ) for ERF is −1.2 ± 0.5 W m−2. The spatial pattern of 1850 to 2000 aerosol ERF is broadly similar to the aerosol RF pattern (Fig. 15). ERF is relatively stronger over outflow regions, however. This is likely because anthropogenic aerosols have an enhanced effect on clouds in remote areas where there are few natural cloud condensation nuclei and high humidity. The ERF is positive in several regions, including the Sahara, parts of the Himalayas/Karakoram, and over both polar regions. Over much of the Arctic Ocean, values are more than 0.5 W m−2. Arctic ERF is especially large in boreal spring, with values exceeding 0.75 W m−2 over large areas of the Arctic Ocean and Greenland (Fig. 16). This is attributable to both the more positive RF over highly reflective surfaces discussed previously and to the greater influence of clouds on longwave versus shortwave radiation at high latitudes. These results suggest that aerosols may have played a greater role in rapid Arctic climate change than generally appreciated.
Variability across models is large in many locations for ERF. The global average of the standard deviation at all points is 1.27 W m−2, far larger than the standard deviation of any individual aerosol component's RF. The standard deviation across each model's global mean ERF is only 0.29 W m−2, however, indicating that models produce fairly similar total aerosol ERFs but with forcing locations shifted between models. The global mean ERF standard deviation is 25 % of the multi-model mean, less than the comparable ratio for all aerosol RF (50 %) or RF by component (35-40 % for sulfate and BCff+bf, 80 % or more for others). The noisy ERF structure indicates that calculating local ERF values may be difficult for small forcings, however, whereas these can be easily isolated in the RF methodology that is not influenced by meteorological variability.
Over regions with substantial ERF, the relative standard deviation of the ERF is no larger than that of RF (Fig. 17). Over some areas, such as parts of east Asia, it is actually smaller.
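The spread statistics quoted for ERF can be checked with a few lines, and the distinction between the two kinds of standard deviation is easy to see in a toy case. The constants are values quoted in this paper (a standard deviation of 0.29 W m−2 and a multi-model mean of −1.17 W m−2); the two-model, two-cell example is entirely hypothetical and assumes equal-area cells.

```python
from statistics import mean, stdev

# Values quoted in the text (W m-2).
STD_OF_GLOBAL_MEANS = 0.29
MULTI_MODEL_MEAN_ERF = -1.17

ci_half_width = 1.65 * STD_OF_GLOBAL_MEANS  # approximate 5-95 % half-width
relative_spread = STD_OF_GLOBAL_MEANS / abs(MULTI_MODEL_MEAN_ERF)

# Toy example (hypothetical): two models, two equal-area grid cells, with the
# same forcing magnitude placed in different cells.
toy = {"model_A": [-2.0, 0.0], "model_B": [0.0, -2.0]}
global_means = [mean(cells) for cells in toy.values()]
pointwise_std = [stdev(cell) for cell in zip(*toy.values())]

print(round(ci_half_width, 2), round(relative_spread, 2))
print(stdev(global_means), round(mean(pointwise_std), 2))
```

In the toy case the global means agree exactly while the pointwise standard deviation is large, which is the situation described above: similar total forcing with its location shifted between models.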

Effective atmospheric forcing by aerosols (defined as TOA minus surface ERF) shows strong absorption of energy where BCff+bf and biomass burning RF are large. There are also indications of dynamical changes, including shifts over North America and Australia. The 2000 versus 1850 reduction in surface shortwave flux due to aerosols (including their effects on clouds, water vapor, etc.) is 2.50 ± 0.81 W m−2, in excellent agreement with the most recent IPCC assessment of […]. Global mean ERF minus RF is −0.90 W m−2, providing a rough estimate of aerosol indirect effects on clouds (though it includes additional responses such as water vapor and lapse rate adjustments). A large number of model studies constrained by satellite data summarized in Lohmann et al. (2010) found ERF ranging from approximately −0.6 to −1.6 W m−2, with roughly 1/3 direct and 2/3 indirect. These values are in good agreement with our results. Most inverse methods based on observed temperature changes, ocean heat uptake, and calculated non-aerosol forcings produce ERF best estimates of −0.8 to −1.6 W m−2 (Murphy et al., 2009; Shindell and Faluvegi, 2009; Church et al., 2011; Hansen et al., 2011), though some find smaller values (e.g. Libardoni and Forest, 2011). Hence the ACCMIP models' ERF is consistent with most prior studies, though weaker values have also been reported.
As most ERF results were available for 2000, we evaluate global mean ERF at other times based on fractional differences relative to 2000 in models that diagnosed ERF at both times. The temporal evolution of aerosol ERF does not closely follow the temporal evolution of all aerosol RF (Fig. 18). Through 1980, ERF follows scattering aerosol RF, which increases by 377 % from 1930 to 1980 while ERF increases by ∼ 375 % (all aerosol RF increases by more than 500 %). ERF continues to increase from 1980 to 2000, however, while both all aerosol RF and scattering aerosol RF tend to decrease. Aerosol ERF became more negative in all three models that calculated aerosol ERF differences between 1980 and 2000, and did not closely track RF. This suggests that, unsurprisingly, ERF may be quite sensitive to background aerosol loading, the geographic location of the aerosols, and the mixture of aerosol types. In particular, increases in negative ERF from 1980 to 2000 are large over East Asia and the Pacific outflow regions, with large values extending all the way to the eastern North Pacific (Fig. 18), while increases in negative all aerosol RF are fairly small and localized there during this time (Fig. 13). Thus the global mean RF trends are dominated by decreasing aerosol over Europe and North America, but the ERF is dominated by increasing negative forcing over and downwind of Asia. This suggests that recent aerosol ERF has been more strongly influenced by recent increases in Asian emissions than by coincident European or North American decreases, because the former are upwind of the large, comparatively pristine Pacific Ocean. Though results are only available from three models, this is consistent with the relatively stronger ERF vs. RF response in outflow regions seen in the analysis of all models.
Analysis of ocean heat uptake and the planetary energy budget suggests that aerosol ERF indeed became more negative in the late 1990s and early 2000s relative to the 1980s and early 1990s (Church et al., 2011).
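The scaling of ERF to years other than 2000 by fractional differences can be sketched as follows. The text does not state how the fractions from individual models were combined, so averaging the per-model ratios is an assumption here, as are the function name and the per-model values; only the 2000 multi-model mean of −1.17 W m−2 is taken from this paper.

```python
def scaled_erf(erf_2000_all_models, paired):
    """Scale the 2000 multi-model ERF by the mean ratio ERF(t) / ERF(2000).

    `paired` holds (erf_t, erf_2000) for models that diagnosed both times.
    """
    ratios = [erf_t / erf_2000 for erf_t, erf_2000 in paired]
    return erf_2000_all_models * sum(ratios) / len(ratios)


# Hypothetical 1980 and 2000 ERF pairs (W m-2) for three models.
pairs_1980 = [(-0.90, -1.00), (-1.20, -1.50), (-0.63, -0.70)]
erf_1980 = scaled_erf(-1.17, pairs_1980)
print(round(erf_1980, 2))
```

Using ratios rather than absolute differences lets models with different overall ERF magnitudes contribute on an equal footing to the inferred time evolution.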
Diagnoses of future ERF are only possible for RCP8.5 at 2030 and 2100. We analyze flux differences between the full future simulations and runs with 2000 emissions and future climate (i.e. ocean conditions are at future values in both cases). When ozone changed in the RCP8.5 simulations, we subtract ozone forcing to get aerosol ERF. ERF is positive for 2030 relative to 2000 over Europe and the US and negative over South Asia and the Himalayas (Fig. 18). This pattern is similar to the 2000 versus 1980 ERF, except that ERF ceases becoming more negative over and downwind of east Asia. The global mean ERF trend therefore changes direction during this time. By 2100, the ERF has become positive nearly everywhere relative to 2000. The spatial pattern closely resembles the inverse of the 1980 or 2000 ERF (relative to 1850), indicating that most of the historical aerosol forcing has been removed. The primary exceptions are the negative tropical African and South American ERFs related to biomass burning.
Global mean ERF becomes less negative from 2000 to 2030 under RCP8.5, and by 2100 nearly recovers to its 1850 value. Unlike RF, future ERF becomes less negative in all the models, including the GFDL-AM3 and GISS-E2-R models that showed steady or increasingly negative future aerosol RF. We expect similar 2100 ERF under the other RCPs since they all remove most anthropogenic aerosol and aerosol precursor emissions, except for ammonia. There are differences in the timing of reductions, however, making it difficult to infer ERF at earlier times for the other RCPs.
Understanding the contribution of specific aerosol types to ERF is important for attribution of historical changes to particular emissions and especially for assessing the impact of potential future emissions pathways or emissions mitigation policies. Little information on the ERF attributable to specific aerosol types is available, however. Chuang et al. (2002) reported that cloud forcing due to sulfate and carbonaceous aerosols followed the burden fairly closely (within 15 %), which would imply a slightly larger historical indirect effect from sulfate. Jacobson (2002) suggested that the indirect effect of sulfate was roughly double that of carbonaceous aerosols, however. Among the ACCMIP models, MIROC-CHEM and GISS-E2-R performed simulations to isolate the contribution of individual aerosol components to the 2000 versus 1850 ERF (see Appendix E). In the MIROC-CHEM model, nearly all the indirect forcing is attributable to sulfate (93 %), with only small negative forcings (∼ 10 % W m−2) from BC and OC (Takemura, 2012). In the GISS-E2-R model, sulfate contributes a large negative cloud forcing (−0.45 ± 0.03 W m−2) while OC causes a much smaller cloud forcing of −0.17 ± 0.03 W m−2 and BC causes a positive cloud forcing (0.23 ± 0.03 W m−2). The total cloud forcing is −1.13 W m−2, suggesting that additional components (nitrate and SOA) are important in that model or the sum is highly non-linear. Hence there seems to be consistency in attributing the largest share of aerosol indirect forcing to sulfate, presumably owing to its greater solubility, but the relative values for individual components span a wide range in extant studies, and the sign of the BC-induced indirect effect is not clear (see Appendix E).
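The GISS-E2-R component budget quoted above can be tallied directly. The sketch uses only the central values given in the text (uncertainties are ignored) and variable names chosen for illustration.

```python
# GISS-E2-R cloud forcing by component, 2000 versus 1850 (W m-2), from the text.
component_cloud_forcing = {"sulfate": -0.45, "OC": -0.17, "BC": 0.23}
TOTAL_CLOUD_FORCING = -1.13

component_sum = sum(component_cloud_forcing.values())
residual = TOTAL_CLOUD_FORCING - component_sum
print(round(component_sum, 2), round(residual, 2))
```

The three diagnosed components sum to about −0.39 W m−2, leaving roughly −0.74 W m−2 of the total unexplained by them, which is the gap attributed above to additional components or non-linearity.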

Comparison of forcing with AOD and estimated CMIP5 model forcing
As calculation of RF adds some computational expense, and calculation of ERF adds a great deal, forcing is often not diagnosed. We therefore test how well forcing can be estimated based on more readily available AOD changes. AOD changes are highly correlated (r² = 0.95) with RF calculated in the ACCMIP models (Fig. 19). RF per unit AOD change, which we call Normalized Radiative Forcing (NRF), is −7.6 W m−2 (−8.9 to −6.6 at the 95 % confidence interval (CI)). The correlation is higher (r² = 0.88) for the larger 1850 to 2000 RF than for the 1980 to 2000 RF (r² = 0.72). Hence AOD may provide a reasonable indicator of RF for large AOD changes, but does not appear to be as reliable for smaller changes.
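The NRF defined above is a slope of RF against AOD change. The text does not state how the fit was performed, so the following is only a minimal sketch, assuming a least-squares fit forced through the origin. The function names and the per-model values are hypothetical and are not ACCMIP output.

```python
import numpy as np

def normalized_rf(d_aod, rf):
    """Least-squares slope of RF against AOD change, forced through the origin."""
    d_aod = np.asarray(d_aod, dtype=float)
    rf = np.asarray(rf, dtype=float)
    return float(np.sum(d_aod * rf) / np.sum(d_aod ** 2))

def r_squared(x, y):
    """Squared Pearson correlation coefficient between x and y."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)

# Hypothetical per-model AOD changes and RF values (illustrative only).
d_aod = np.array([0.010, 0.020, 0.030, 0.040, 0.050])
rf = -7.6 * d_aod + np.array([0.010, -0.020, 0.015, -0.010, 0.005])

nrf = normalized_rf(d_aod, rf)
r2 = r_squared(d_aod, rf)
print(f"NRF = {nrf:.2f} W m-2 per unit AOD change, r2 = {r2:.2f}")
```

Forcing the fit through the origin encodes the assumption that zero AOD change implies zero RF. An ordinary regression with an intercept would be an equally reasonable choice and could give a somewhat different slope.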
AOD changes are a fairly poor indicator of ERF, however, with an r² of 0.48 in the ACCMIP models. The ERF/dAOD ratio is −30 W m−2 with a very large 95 % CI spanning −660 to −15. Note that both the RF and ERF analyses include cases in which near-zero changes in global mean AOD produce small but non-negligible forcings. Aerosol burden changes are generally not well correlated with RF, but analysis of burdens provides insight into the relative importance of emissions and lifetime changes in driving aerosol forcing (see Appendix F). If we use the RF/dAOD ratio calculated here to adjust model forcings for the bias relative to satellite observations, assuming all the bias can be attributed to trend underestimates, the multi-model mean would change by −0.11 W m−2. While this would produce an RF value (−0.37 W m−2) in fairly good agreement with our best estimate, some of the bias relative to observations may be systematic over time.
AOD trends are available from a large number of CMIP5 models. Using the NRF of −7.6 W m−2 derived from ACCMIP, we estimate the direct aerosol RF based on CMIP5 AODs (Fig. 19; Appendix G). The estimated aerosol RF in the subset of ACCMIP models with CMIP5 AOD data is −0.31 W m−2, larger in magnitude than the forcing diagnosed directly in those models (−0.24 W m−2). This suggests that, if anything, the AOD-based RF estimates may be biased high. The estimated aerosol RFs in the non-ACCMIP CMIP5 models have a mean of −0.21 W m−2, and most of the CMIP5 models not in ACCMIP have estimated aerosol RF weaker than the range encompassed by our best estimate (−0.42 ± 0.09 W m−2). Across both projects, only a few models have aerosol RF within the range of our best estimate, with all the others too small. Note that different model versions from the same institution often, but not always, have similar aerosols.
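The AOD-based estimate described above amounts to multiplying an AOD change by the NRF. A minimal sketch follows, using the NRF and its 95 % CI quoted in the text. The function names and the example AOD change are hypothetical.

```python
NRF = -7.6             # W m-2 per unit AOD change (central value from the text)
NRF_CI = (-8.9, -6.6)  # 95 % confidence interval from the text

def rf_from_aod_change(d_aod, nrf=NRF):
    """Estimate direct aerosol RF (W m-2) from a global mean AOD change."""
    return nrf * d_aod

def rf_range_from_aod_change(d_aod, ci=NRF_CI):
    """RF range (most negative, least negative) implied by the NRF confidence interval."""
    low, high = sorted(bound * d_aod for bound in ci)
    return low, high

def implied_aod_change(rf, nrf=NRF):
    """Invert the relation: AOD change implied by a given RF estimate."""
    return rf / nrf

d_aod_example = 0.03  # hypothetical 1850 to 2000 AOD change
print(rf_from_aod_change(d_aod_example))
print(rf_range_from_aod_change(d_aod_example))
# AOD change implied by the mean estimated RF of the non-ACCMIP CMIP5 models (-0.21 W m-2).
print(implied_aod_change(-0.21))
```

The range returned here reflects only the uncertainty in the NRF itself. It does not capture model-to-model differences in aerosol optical properties, which the text identifies as a limitation for small AOD changes.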
The present-day modeled AOD shows an enormous range for CMIP5 models, and the bias relative to AeroNet is strongly correlated with the AOD change (r² = 0.84). This suggests that our screening by skill in capturing present-day AOD provided a useful constraint on RF because present-day AOD is highly correlated with long-term aerosol changes. The CMIP5 multi-model mean AOD, excluding the two GISS models with much larger all-sky than clear-sky AODs, is biased 13 % low versus AeroNet. As the CMIP5 multi-model mean estimated forcing is roughly half the best estimate presented here, the AOD changes may be too small by an even larger factor.
As described previously, some models performed simulations under CMIP5 that allow aerosol ERF to be diagnosed for 2000 relative to 1850. Analysis of ERF was performed for six additional models (Appendix G). AOD changes from 1850 to 2000 were available for three of those models (MRI-CGCM3, MIROC5, and NorESM1-M), allowing those models to be added to the AOD change versus ERF comparison (Fig. 19). Consistent with the ACCMIP models, there is no clear relationship, and the correlation across both sets of models decreases to r² = 0.34.

Total anthropogenic composition forcing
ACCMIP characterized radiative forcing from ozone as well as from aerosols. Ozone RF was calculated offline using the NCAR Community Climate System Model 4 radiative transfer model (RTM) and allowing stratospheric temperatures to adjust. We compute net longwave and shortwave all-sky flux at the tropopause (based on a climatology of tropopause pressure from the NCAR/NCEP reanalyses) varying only the ozone distribution. Results presented here are for ozone changes throughout the atmosphere. Detailed analyses of ACCMIP tropospheric ozone RF in Stevenson et al. (2013) show that use of a different RTM yields values 10 % higher, providing a rough estimate of uncertainty associated with the RTM. Young et al. (2013) show that the ACCMIP models generally capture observed 1980 to 2000 total ozone column trends relatively well. Here we analyze the ozone RF from most of the same models providing ACCMIP aerosol simulations (Table 7), including all but two of the CMIP5 subset of ACCMIP models (Table 1). The 1850 to 2000 ozone forcing is 0.33 ± 0.10 W m−2. Comparison of the radiative impact of present-day tropospheric ozone in these six models versus that of ozone observations from the Tropospheric Emission Spectrometer shows that these models have global mean biases of 0.035 ± 0.044 W m−2 (instantaneous longwave forcing), much smaller than the industrial-era RF (Bowman et al., 2012). Stevenson et al. (2013) also show that their ozone forcing results are only very weakly sensitive to changes in the models included (analyzing sets ranging from 4 to 17 models). While present-day biases and very recent trends do not necessarily constrain the long-term behavior of ozone, to the best of our ability to evaluate them these models appear to produce realistic ozone and ozone trends.
Atmos. Chem. Phys., 13, 2939-2974, 2013 www.atmos-chem-phys.net/13/2939/2013/
As variation between models in the RF due to WMGHGs is small, we examined this in only two models: NCAR-CAM3.5 and GISS-E2-R for 2000 relative to 1850. The spatial patterns of RF are indeed quite similar, and hence we take the mean as representative of the geographic distribution of RF from WMGHGs for all times, and scale the values uniformly to match the global mean RF due to WMGHGs prescribed in the historical period and under the RCPs. Uncertainty is estimated to be 10 % of the RF (Forster et al., 2007).
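The uniform scaling described above can be sketched as follows. This is a minimal illustration, assuming a regular latitude-longitude grid with cos(latitude) area weights. The grid, the two input patterns and the target global mean are hypothetical placeholders, not the NCAR-CAM3.5 or GISS-E2-R fields.

```python
import numpy as np

def area_weighted_mean(field, lat_deg):
    """Global mean of a (lat, lon) field on a regular grid, weighted by cos(latitude)."""
    weights = np.cos(np.deg2rad(lat_deg))
    zonal_mean = field.mean(axis=1)
    return float(np.sum(weights * zonal_mean) / np.sum(weights))

def scale_to_global_mean(pattern, lat_deg, target_mean):
    """Uniformly scale a pattern so its area-weighted global mean equals target_mean."""
    return pattern * (target_mean / area_weighted_mean(pattern, lat_deg))

# Hypothetical grid and two hypothetical model patterns (W m-2).
lat = np.linspace(-88.0, 88.0, 45)
lon = np.linspace(0.0, 357.5, 144)
shape_factor = np.cos(np.deg2rad(lat))[:, None] * np.ones((1, lon.size))
pattern_a = 2.0 + 0.4 * shape_factor
pattern_b = 2.2 + 0.3 * shape_factor

# Mean of the two patterns, then scaled to a prescribed global mean RF.
mean_pattern = 0.5 * (pattern_a + pattern_b)
scaled = scale_to_global_mean(mean_pattern, lat, target_mean=3.0)
print(area_weighted_mean(scaled, lat))
```

Because the scaling is a single multiplicative factor, the relative geographic distribution is unchanged and only the amplitude follows the prescribed global mean.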
WMGHG forcing is relatively homogeneous, with slightly greater values at subtropical latitudes where clouds are less prevalent (Fig. 20). WMGHG forcing increases continually with time in the past and in the future under RCP8.5 (in RCP2.6 WMGHG forcing decreases after ∼ 2050). Ozone forcing is positive between ∼ 45° S and 90° N, but is negative over and near Antarctica for the 1980 to 2030 timeslices due to the Antarctic ozone hole, which was not yet present in 1930 and has recovered by 2100. Positive ozone forcing maximizes in the subtropics similarly to the WMGHGs, and increases from 1850 through 2000. Ozone forcing continues to rise under RCP8.5 while it decreases to nearly its 1930 value by 2100 under RCP2.6. Relative uncertainties for ozone are substantially larger than for WMGHGs, but are smaller than those for aerosols (Fig. 11).
We then create composite fields of WMGHG, ozone and aerosol forcing, using ERF for aerosols and RF for WMGHGs and ozone. Though clearly it would be preferable to use the same metric for all agents, studies to date suggest that for both WMGHGs and ozone, ERF and RF values are probably within 5 % of each other (Hansen et al., 2005; Andrews and Forster, 2008; Lohmann et al., 2010). Though aerosol ERF was only calculated for RCP8.5, as discussed previously we can regard the 2100 RCP8.5 aerosol ERF as a reasonable approximation to 2100 aerosol ERF under the other scenarios as well.
Total anthropogenic composition forcing relative to 1850 shows positive global mean values throughout the historical period, but distinct regional differences (Fig. 21). The net forcing is strongly negative over many industrialized areas in 1980 as negative aerosol ERF outweighs positive GHG (WMGHG+ozone) forcing. WMGHG forcing rises sharply in the late 20th century while negative aerosol forcing over Europe and North America declines, so that by 2000 the net forcing is near zero over Europe and positive over North America, while remaining strongly negative only over east and southeast Asia. The negative aerosol forcing over central Africa and northwestern South America visible in 1980 is still present in 2000, but the increased WMGHG forcing balances it by that time. Net anthropogenic composition forcing remains negative over a shrinking portion of southeast Asia in 2030 under RCP8.5, and remains small over parts of Africa and South America with substantial biomass burning, and over Antarctica. Nearly everywhere else forcing exceeds 2 W m−2, and in many areas is greater than 4 W m−2. By 2100 under RCP8.5, WMGHG forcing is so large and aerosol forcing so small that there is relatively little spatial variation in the net forcing, which is mostly between 6 and 10 W m−2. Forcing at 2100 is much lower under the RCP2.6 scenario, but is again dominated by WMGHGs and so is relatively uniform.

Table 8 (rows recovered here only; forcing in W m−2).

Time, scenario   WMGHG RF      Ozone RF      Aerosol ERF     Net
…                2.83 ± 0.28   0.14 ± 0.12   −0.12 ± 0.10*   2.86 ± 0.32
2100 RCP4.5      4.33 ± 0.43   0.23 ± 0.15   −0.12 ± 0.10*   4.44 ± 0.46
2100 RCP6.0      5.60 ± 0.56   0.25 ± 0.09   −0.12 ± 0.10*   5.74 ± 0.58
2100 RCP8.5      8.27 ± 0.83   0.55 ± 0.30   −0.12 ± 0.05    8.71 ± 0.88

All values are relative to 1850. Uncertainties are 5–95 % confidence intervals, assigned as 10 % for WMGHG RF (Forster et al., 2007), as 1.65 times the standard deviation across models for ozone RF, and as 40 % for aerosol ERF, which is 1.65 times the year 2000 standard deviation. * For 2100 ERF, calculated ERF under RCP8.5 was used for all scenarios, with uncertainty doubled for other scenarios to account for potential differences relative to RCP8.5.
Early in the 20th century, forcing from aerosols and ozone largely offset one another, so that the net anthropogenic composition forcing follows the WMGHG forcing (Fig. 21). Aerosol ERF grows increasingly negative through 2000, masking a considerable portion of the GHG forcing. In the 21st century, aerosol masking is reduced and the net forcing again approaches the WMGHG forcing. Uncertainties are heavily influenced by aerosols, so that relative uncertainties maximize in 1980, and are far larger in 1980, 2000 and 2030 than in 2100 (Table 8). As WMGHGs dominate 2100 forcing, the net value is close to the RCP targets for all RCPs (Table 8).
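The 2100 entries of Table 8 can be checked with a short calculation. The sketch below assumes that the four printed values per scenario are, in order, WMGHG RF, ozone RF, aerosol ERF and net forcing, and that component uncertainties combine in quadrature. Neither assumption is stated explicitly in the text, although the printed totals are numerically consistent with both. The function names are hypothetical.

```python
import math

# 2100 values as printed in Table 8 (W m-2): (value, uncertainty) pairs.
# Assumed component order: WMGHG RF, ozone RF, aerosol ERF.
TABLE8_2100 = {
    "RCP4.5": {"components": [(4.33, 0.43), (0.23, 0.15), (-0.12, 0.10)], "net": (4.44, 0.46)},
    "RCP6.0": {"components": [(5.60, 0.56), (0.25, 0.09), (-0.12, 0.10)], "net": (5.74, 0.58)},
    "RCP8.5": {"components": [(8.27, 0.83), (0.55, 0.30), (-0.12, 0.05)], "net": (8.71, 0.88)},
}

def combine_in_quadrature(components):
    """Sum central values and combine independent uncertainties in quadrature."""
    total = sum(value for value, _ in components)
    uncertainty = math.sqrt(sum(unc ** 2 for _, unc in components))
    return total, uncertainty

def ci_5_95_from_std(std):
    """Half-width of a 5-95 % interval as 1.65 times the standard deviation (as in the table notes)."""
    return 1.65 * std

for scenario, row in TABLE8_2100.items():
    total, unc = combine_in_quadrature(row["components"])
    print(f"{scenario}: computed {total:.2f} +/- {unc:.2f}, "
          f"tabulated {row['net'][0]:.2f} +/- {row['net'][1]:.2f}")
```

The computed totals agree with the tabulated ones to within rounding of the individual components, which illustrates why the 2100 uncertainty is dominated by the 10 % WMGHG term rather than by aerosols.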
In an earlier generation of climate models (those used in the IPCC Third Assessment Report), there was a distinct correlation between the magnitude of negative aerosol forcing and climate sensitivity (Kiehl, 2007). A similar analysis for the CMIP5 subset of ACCMIP models shows that in this generation of models there is now an anti-correlation between historical aerosol RF and equilibrium climate sensitivity (ECS; taken from Andrews et al., 2012; Bitz et al., 2012; Gettelman et al., 2012; and GISS-E2-R simulations) (Fig. 22). However, ECS is not particularly correlated with aerosol ERF, which is the more relevant quantity, nor with total forcing. Similar conclusions come from comparing forcing with transient climate response. We examine modeled change in comparison with both the Climate Research Unit (CRU) and GISS historical temperature datasets (Brohan et al., 2006; Hansen et al., 2006). As the latter begins only in 1880, we use GISS 1880s to 1996-2005 plus CRU 1850s to 1880s in that case. Many models have difficulty in capturing the observed global mean historical warming (Fig. 22). We hypothesize that representations of aerosol-cloud interactions have become so complex that the emergent aerosol ERF cannot be readily predicted or adjusted, and hence models must accept limitations in their historical simulations to maintain their most realistic representation of aerosol-cloud physical processes.
There is a clear relationship between the historical response and the imposed forcing across the ACCMIP and CMIP5 models (Fig. 22). The regression through the mean of all models is 0.57 °C response per W m−2 forcing (0.53 °C per W m−2 without the three models that do not include aerosol indirect effects: top three sets of points in the lower panel), though there is substantial variation from this fit due to variations in modeled climate sensitivity and ocean heat uptake. Although uncertainties in both those terms are substantial, the results suggest that given the temperature response per unit forcing in the current generation of ACCMIP/CMIP5 models, aerosol ERF from about −0.8 to −1.5 W m−2 (along with ozone RF of ∼ 0.3 W m−2) is consistent with the observed historical warming. Accounting for the land temperature adjustment incorporated into these ERF estimates implies a bias-corrected ERF of about −0.9 to −1.6 W m−2. An ERF in this range is in good agreement with most of the inverse calculations and satellite-constrained model studies discussed previously (Sect. 5.2).

Conclusions
We have evaluated the ACCMIP aerosol models against observations of the AOD climatology and trends. The models generally capture the observed magnitude of present-day AOD within 30 %, though with a tendency to be biased low, and represent much of the spatial and seasonal structure (correlations typically 0.5 to 0.7). Fine-mode AOD and AAOD show less agreement with observations, with large underestimates for AAOD. Analysis of AOD trends suggests that most models realistically reproduce changes over North America and Europe during the last few decades, while results for Asia are more ambiguous as the observations are highly sensitive to the satellite analysis methodology. Many of the ACCMIP models do not include nitrate and SOA, which is a primary cause of the general low bias in aerosols, though there are also clear underestimates of AOD in biomass burning regions. CMIP5 models also appear to have an overall low bias in AOD, which again is partially attributable to the lack of nitrate and SOA in many models, and most ACCMIP and CMIP5 models likely underestimate aerosol RF. Furthermore, there is evidence from one model that forcing by SOA induced by land-use changes may be large, and from two models that nitrate may become the largest aerosol RF component in the latter part of the 21st century, highlighting the need for more models to represent these processes.
We have used the ERF metric to characterize aerosol forcing, and shown that the relative variation in ERF across models is approximately equal to, or even less than, that in aerosol RF at both global and regional scales. As the majority of aerosol forcing is indirect, RF alone provides a very incomplete portrayal of aerosol impacts. A disadvantage of ERF is that it cannot be diagnosed in transient simulations with multiple forcings the way RF can. Trends in these metrics are in opposite directions for some periods. In particular, during 1980 to 2000, aerosol RF becomes less negative while aerosol ERF continues its historical trend of becoming more negative. The ERF trends appear to be largely driven by increases in Asian emissions of both sulfate precursors and BC that largely offset one another's RF while sulfate appears to have a stronger impact on clouds than BC. This is consistent with the large ERF seen downwind of east Asia and with sulfate being a more abundant and soluble aerosol than BC, and with model results showing that the ERF/RF ratio is greater for sulfate than for BC (or OC). However, much more work is needed to characterize ERF by emitted species and region. Given current uncertainties, ERF cannot be adjusted to account for missing components, further emphasizing the need for complete representations of aerosol types.
Our results suggest that while pollution controls in North America and Europe have reduced aerosol forcing, increases in Asian emissions have more than compensated, so that globally there has not yet been an unmasking of WMGHG forcing via aerosol reductions. Instead, the continued increase in negative aerosol ERF may have contributed to relatively slow rates of global warming during recent years.
The models show a large reduction in negative aerosol ERF at 2030 and 2100. With the pollution controls envisioned under the RCPs, the combined aerosol ERF and ozone RF becomes 5 % or less of the total anthropogenic composition forcing in the latter part of the century. This would lead to an almost total unmasking of WMGHG forcing. Under the RCP2.6 scenario, in fact, the projected change in aerosol ERF from 2000 to 2100 is larger than the increased WMGHG forcing. Such a complete reduction in pollutant emissions may be overly optimistic, however, as current legislation certainly does not set the world on such a track (Pozzer et al., 2012).

Appendix A Clear-sky versus all-sky AOD
All-sky conditions determine forcing but observations are usually restricted to clear-sky conditions. Differences between all-sky and clear-sky AOD were only reported for a few models, however. These differences are especially large for the GISS-E2-R model, while they are fairly minor for the other two models reporting both values (CSIRO-Mk3.6 and GISS-E2-R-TOMAS). For the GISS-E2-R model, large positive biases versus observations were found when using all-sky AODs, which are replaced by small negative biases using clear-sky values. The incorporation of large aerosol water uptake in cloudy regions, which causes strong non-linearities in optical properties at very high relative humidity (RH) values, seems to have a large influence on the all-sky values in this model. In support of this hypothesis, we note that GISS-E2-R all-sky values are much larger than clear-sky values for sulfate, moderately larger for sea-salt and nitrate, and quite similar for other components, and hence follow the relative solubility of the different species. GISS-E2-R calculates clear-sky AOD by including only AOD values calculated in model locations where clouds are not present, rather than performing a global calculation with clouds removed from the model. This technique is more comparable to the sampling of the satellites, but leads to spatial and temporal differences in sampling compared with all-sky calculations in addition to the role of the clouds themselves. It appears that in GISS-E2-R, there is in general substantially more AOD when and where clouds are present. In contrast, for CSIRO-Mk3.6, AOD is less where clouds are present.
Several competing factors are at work, including increased wet removal rates and cloud scavenging in cloudy areas leading to lower AOD but increased in-cloud oxidation rates and RH in cloudy areas leading to higher AOD, so it is perhaps not surprising that clear-sky versus all-sky differences can be of either sign.

Fig. B1. Bias (%) in models for present-day annual average AOD relative to AeroNet for locations with AeroNet observations. Clear-sky AOD is used for the GISS-E2-R model.
For other models, HadGEM2 reports that its AOD is clear-sky, while MIROC-CHEM and CICERO-OsloCTM2 report all-sky but note clear-sky is very similar in their models. In addition to the issue of sampling that could bias RH if all-sky rather than only clear-sky areas are sampled, the RH used in calculation of water uptake is also important and diverges between models. For example, LMDzORINCA uses the clear-sky humidity to compute aerosol growth, while NCAR-CAM5.1 and MIROC-CHEM report that all-sky grid cell mean RH is used. Concerted efforts to diagnose model behavior for the relevant parameters, including how aerosol mass, cloud scavenging and RH are apportioned between clear and cloudy portions of model grid boxes, will be required to better understand the divergence across models in the all-sky/clear-sky AOD ratios.

Appendix B Regional comparison of modeled AOD with AeroNet
As discussed in the main text, the models show regional biases that are consistent across models in some areas. In particular, Fig. 4 shows that virtually all models have large negative biases over east Asia. The only two models that do not greatly underestimate AOD over east Asia have large positive biases over Europe and North America, where the other models match AeroNet relatively well. Hence the biases appear systematic across models. Additional spatial information can be seen in Fig. B1, which shows that, as noted in the main text, biases over remote island stations in the central Pacific and Indian Oceans, over the southern tip of South America, and over Australia and New Zealand are nearly always positive, while biases over the Russian and especially the Western Hemisphere Arctic are typically negative.

Appendix C Regional comparison of modeled AAOD with observations
We quantitatively evaluate AAOD on a regional basis against both OMI and AeroNet observations. We calculate the ratio of regionally averaged modeled to observed AAOD. The ratio values using OMI data are calculated including only land-area locations to be more comparable with AeroNet. For comparison, we show similar ratios calculated from an earlier AeroCom model intercomparison (Koch et al., 2009) (though that study did not include Northern Hemisphere Africa or South and Southeast Asia). Models underestimate AAOD in general compared with AeroNet (Fig. C1). This is especially the case for south and southeast Asia, South America, and Southern Hemisphere Africa, where the model average AAOD is less than half the observed, but also for east Asia. Note, however, that these four areas have very limited coverage of AeroNet sites. Model performance is generally better for North America, Europe and Northern Hemisphere Africa. In comparison with OMI land-area AAOD, the models show similar biases to AeroNet except over North America, where underestimations relative to OMI are much larger. In comparison with either set of observations, the GFDL-AM3 model shows the smallest overall underestimate of present-day AAOD, with the largest underestimates in GISS-E2-R-TOMAS and MIROC-CHEM. Biases in different regions tend to be systematic for a given model.
For the four regions with the most pronounced low biases (south and southeast Asia, South America, Southern Hemisphere Africa and east Asia), every model examined is biased low in comparison with either AeroNet or OMI. High biases are seen most commonly for Northern Hemisphere Africa, where 2 of 9 models are high relative to OMI and 3 are high relative to AeroNet. The AAOD in this region, which includes the Sahara and Arabian deserts, is dominated by mineral dust aerosols rather than BC (which contributes less than 5 % of total BC+dust AAOD), suggesting that dust loading or absorption may be too high in these models (though biases are comparatively small). We note that calculations of the regionally averaged ratios of modeled to observed AAOD at individual locations always show larger values. This indicates that the model biases are most pronounced at locations with large AAOD.

Fig. C1. Ratio of regional average model to retrieved AeroNet (top) and OMI land-area (bottom) clear-sky AAOD at 550 nm for the ACCMIP models. Number of measurement sites is given for AeroNet for each region. The AeroNet data are for 1996-2006, v2 level 2, with annual averages for each year used if more than eight months were present, and monthly averages required more than 10 days of measurements. The values at 550 nm were determined using the 0.44 and 0.87 µm Angstrom parameters. Regions are shown in Fig. 6 and are defined as North America (130 to
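The distinction between the ratio of regional averages and the regional average of site-by-site ratios can be illustrated with a short sketch. The site values below are hypothetical and were chosen so that the model underestimate grows with observed AAOD, which is the situation the text describes.

```python
import numpy as np

def ratio_of_means(model, obs):
    """Ratio of regionally averaged modeled AAOD to regionally averaged observed AAOD."""
    return float(np.mean(model) / np.mean(obs))

def mean_of_ratios(model, obs):
    """Regional average of the model/observation ratio computed at each site."""
    return float(np.mean(np.asarray(model, dtype=float) / np.asarray(obs, dtype=float)))

# Hypothetical site-level AAOD values: the low bias is largest where observed AAOD is largest.
obs_aaod = np.array([0.005, 0.010, 0.020, 0.060])
model_aaod = np.array([0.005, 0.008, 0.012, 0.020])

print(f"ratio of regional means:    {ratio_of_means(model_aaod, obs_aaod):.2f}")
print(f"mean of site-by-site ratios: {mean_of_ratios(model_aaod, obs_aaod):.2f}")
```

The ratio of means weights each site by its observed AAOD, so high-AAOD sites dominate it. When the site-by-site average comes out larger, as in this example, the largest underestimates must sit at the high-AAOD sites.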
Biases in the ACCMIP models are often fairly similar to those in the 2009 AeroCom analysis, though typically somewhat larger in comparison with AeroNet. Comparisons with OMI for Europe are very different, however, with the very large overestimate in the AeroCom models replaced by a smaller but still large underestimate in the ACCMIP models. Comparison of the models against OMI over the full land and ocean areas reduces the model biases in all regions. In particular, multi-model mean ratios increase by 0.06 for east Asia, 0.07 for North America, and 0.08 for south and southeast Asia. This accounts for a good portion of the difference for North America relative to the 2009 AeroCom analysis, which did not remove oceanic areas, though it accounts for only a small portion of the east Asia difference.

Appendix D BC albedo forcing methodology and comparison with prior studies
BC albedo forcing was calculated in offline simulations conducted using prescribed meteorology from 1994-2000, with spinup from 1994-1995 and analysis (averaging) over 1996-2000. Black carbon and dust deposition fields from each ACCMIP model were prescribed with monthly resolution (annually-repeating), and linearly interpolated to the model timestep. The land simulations applied the NCAR Community Land Model 4 (Lawrence et al., 2011), using bias-corrected atmospheric forcing data from Qian et al. (2006), and run at 1.9 × 2.5 degree resolution. A sensitivity test run at 0.9 × 1.25 degree resolution showed global mean values within 1 % of those obtained at the coarser resolution. The sea-ice temperature, wind, specific humidity, and surface pressure forcing data come from NCEP, radiation data are from GISS, and precipitation data from the GCGCS blended product. The land snow treatments of aerosol processes and radiative transfer are described by Flanner et al. (2007) and Lawrence et al. (2011), and the new sea-ice aerosol and radiation treatments are described by Holland et al. (2012). The snow and sea-ice fields generated with these offline configurations agree better with observed conditions during this time period than those simulated with coupled land-ocean-atmosphere simulations, but the precipitation and aerosol deposition fluxes are less compatible with each other than in coupled aerosol-climate simulations. The influence of this incompatibility on simulated surface snow BC concentrations and radiative forcing is somewhat mitigated by the use of temporally-smoothed monthly aerosol deposition fields. Additional analyses of the BC deposition fields, including extensive comparisons with both recent snowpack measurements and historical trends from ice-cores, as well as further discussion of the BC albedo forcing can be found in .
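The interpolation of annually-repeating monthly deposition fields to the model timestep can be sketched as below. This is an illustration only and not the Community Land Model implementation: it assumes that each monthly value is assigned to the middle of its month, that the year has 365 days, and that a single grid cell is treated. The function name and the deposition values are hypothetical.

```python
import numpy as np

DAYS_PER_YEAR = 365.0
MONTH_LENGTHS = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31], dtype=float)
# Day of year at the middle of each month (assumed anchor point for each monthly mean).
MID_MONTH_DAY = np.cumsum(MONTH_LENGTHS) - 0.5 * MONTH_LENGTHS

def interp_monthly_to_time(day_of_year, monthly_values):
    """Linearly interpolate 12 annually-repeating monthly values to an arbitrary day of year."""
    return np.interp(day_of_year, MID_MONTH_DAY, monthly_values, period=DAYS_PER_YEAR)

# Hypothetical monthly BC deposition flux for one grid cell (arbitrary units).
monthly_deposition = np.array([1.0, 1.2, 1.5, 2.0, 2.4, 2.0, 1.6, 1.4, 1.3, 1.2, 1.1, 1.0])

# Half-hourly timesteps over the first two days of the year.
timestep_days = np.arange(0.0, 2.0, 1.0 / 48.0)
deposition_at_timesteps = interp_monthly_to_time(timestep_days, monthly_deposition)
print(deposition_at_timesteps[:4])
```

The `period` argument of `numpy.interp` makes the December to January transition wrap around smoothly, which matches the annually-repeating character of the prescribed fields.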
The BC albedo forcing estimates reported here are smaller than those reported in previous studies (e.g. Flanner et al., 2007, 2009) because of the offline configuration that was applied, which produces less snow cover (and hence less area over which the forcing can operate) in the Tibetan Plateau and other parts of Asia. Because the forcing was quantified using snow and ice states which are representative of 1996-2000, and likely diminished relative to previous periods, actual BC snow forcings in 1850 may have been slightly greater (Lawrence et al., 2012). Additionally, BC albedo forcing is sensitive to the methodology of the calculation, with values reported from calculations internal to three of the ACCMIP models showing substantial variations from these offline results in their magnitude, though their time-dependence is similar.

Appendix E Calculation of indirect forcing attributable to specific aerosol components
The indirect aerosol forcing attributable to specific aerosol types was analyzed in only two ACCMIP models as these experiments are very computationally expensive. In the GISS-E2-R model, with all other conditions set at year 2000 values, sulfate, BC and OA were individually removed and their direct RF and ERF were diagnosed in 50-yr atmosphere simulations with fixed ocean conditions. At the same time, a cloud forcing diagnostic was saved that calculates the flux perturbation due to the model's clouds relative to zero clouds ev- The results of these calculations are presented in Table D1. The GISS simulations reveal that for OA and BC, cloud forcing is equivalent in magnitude to RF, and ERF is statistically equivalent to a linear sum of the direct RF and the cloud forcing (though the mean ERF estimated for OA is ∼ 20 % less than the direct plus cloud RF). In contrast, for sulfate the cloud forcing is larger than the direct RF, and the ERF is clearly less (∼ 25-30 %) than the direct plus cloud forcings. This suggests that the greater solubility of sulfate causes it to have an enhanced cloud forcing relative to OA by more efficiently serving as cloud condensation nuclei, and that the overall effect of BC on clouds is dominated by thermodynamic effects of local heating rather than microphysical effects in this model. In the all aerosol case, cloud forcing is much greater than direct RF, which is likely due to the logarithmic dependence of cloud droplet number concentration on nucleation sites (Gultepe and Isaac, 1999) leading to a greater response at the low aerosol numbers reached when all aerosols are removed simultaneously. The all aerosol ERF is, like sulfate (and perhaps OA), approximately 25-30 % less than the sum of the direct and cloud forcings. This suggests that other rapid responses, such as adjustment of the temperature lapse rate and water vapor concentration, compensate for some of the cloud response.
In contrast, in the MIROC-CHEM simulations, BC causes a negative indirect forcing, but forcing by sulfate strongly dominates the total aerosol indirect forcing. The sum of all aerosol forcings is again weaker than the sum of individual components, consistent with the non-linear response at low aerosol number densities discussed above. The sign of the indirect impact of BC differs in these two models. There is a large range seen in the literature, which shows values ranging from about −0.35 W m−2 to +0.3 W m−2 just for the effect of BC on mixed-phase or ice clouds (Liu et al., 2009). In addition, while many climate models find a substantial negative BC indirect forcing, observationally-constrained estimates often indicate this forcing is positive (Ruckstuhl et al., 2010; Kaufman and Koren, 2006). This has led recent assessments to conclude that the most likely range for BC's indirect forcing is −0.4 to +0.4 W m−2 (UNEP WMO, 2011; Shindell et al., 2012). Hence reliable quantification of aerosol ERF due to individual components in general, and due to BC in particular, remains a substantial challenge for the community.

Appendix F Analysis of sulfate and BC burden and lifetime changes
Examination of changes in sulfate removal and BC emissions shows large variations across models for sulfate removal (Table F1), implying substantial differences in sulfur emissions (as these are balanced) probably due to varying dimethyl sulfide emission responses to climate change. In contrast, BC emissions or removal changes are quite consistent across models. Both sulfate and BC burden changes show comparable spread, however, as do changes in lifetimes. Sulfate lifetimes typically decrease while BC lifetimes typically increase. The decrease in sulfate lifetime is consistent with increases in precipitation seen in most models. Changes in lifetime reflect both the influence of climate change and the shifting spatial distribution of emissions (as lifetimes vary regionally). One reason that AOD changes may not provide a better indicator of forcing, especially when AOD changes are small, is that AOD changes include both cooling and warming agents. To see if these might be separated, we also examined correlations between the 2000 versus 1850 RF from sulfate and the change in sulfate burden and between BC ff+bf forcing and BC burden changes. Correlations between sulfate burden changes and sulfate RF are fairly weak at r² = 0.44, while correlations between BC burden changes and BC forcing are near zero. This suggests that variations in the optical properties of the aerosols are too large to allow burden changes alone to provide a good indicator of RF, as in prior studies.

Table notes: The Normalized RF used here is −7.6 W m−2 per unit AOD change. * We calculate mean bias excluding the GISS models as those models have much higher all-sky than clear-sky AOD (see Sect. 3.1).
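The link between burden, removal and lifetime discussed above can be made concrete with a short sketch. It assumes the standard steady-state definition of lifetime as burden divided by removal rate, which the text does not spell out. The function names and the burden and removal values are hypothetical and are not taken from Table F1.

```python
def lifetime_days(burden_tg, removal_tg_per_yr):
    """Steady-state lifetime (days) as burden divided by annual removal rate."""
    return 365.0 * burden_tg / removal_tg_per_yr

def ratio(new, old):
    """Ratio of a quantity between two time slices."""
    return new / old

# Hypothetical sulfate values for 1850 and 2000 (burden in Tg S, removal in Tg S per year).
burden_1850, removal_1850 = 0.30, 40.0
burden_2000, removal_2000 = 0.70, 100.0

tau_1850 = lifetime_days(burden_1850, removal_1850)
tau_2000 = lifetime_days(burden_2000, removal_2000)

# The burden ratio factorizes exactly into a removal (emission) ratio and a lifetime ratio.
burden_ratio = ratio(burden_2000, burden_1850)
removal_ratio = ratio(removal_2000, removal_1850)
lifetime_ratio = ratio(tau_2000, tau_1850)

print(f"lifetime 1850: {tau_1850:.2f} d, lifetime 2000: {tau_2000:.2f} d")
print(f"burden x{burden_ratio:.2f} = removal x{removal_ratio:.2f} * lifetime x{lifetime_ratio:.2f}")
```

Splitting the burden change into these two factors is one simple way to judge whether emissions or lifetime changes dominate for a given model. In this made-up example the burden rises despite a shorter lifetime because removal, and hence emission, increases more.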

Appendix G AOD in the CMIP5 models
Time varying AOD is available from many CMIP5 models. We compare the present-day AOD with AeroNet following the methods described in Sect. 3.2. We also evaluate the decadal mean AOD change between 1850 and 2000, and from that derive an estimated aerosol RF using the normalized RF (NRF) calculated from the ACCMIP models (see Sect. 5.3). As discussed in the main text, this estimate produces a larger mean forcing than that diagnosed directly in the ACCMIP models. This may stem from different experimental setups, such as the coupled ocean, and imperfections in the NRF-based estimates. Results are presented in Table G1.
Analysis of ERF from six additional CMIP5 models was possible based on the simulations and methodology described in Sect. 5.2. Results are presented in Table G2. Note that FGOALS-s2 and bcc-csm1-1 do not include aerosol indirect effects.