Intercomparison and evaluation of global aerosol microphysical properties among AeroCom models of a range of complexity

Many of the next generation of global climate models will include aerosol schemes which explicitly simulate the microphysical processes that determine the particle size distribution. These models enable aerosol optical properties and cloud condensation nuclei (CCN) concentrations to be determined by fundamental aerosol processes, which should lead to a more physically based simulation of aerosol direct and indirect radiative forcings. This study examines the global variation in particle size distribution simulated by 12 global aerosol microphysics models to quantify model diversity and to identify any common biases against observations. Evaluation against size distribution measurements from a new European network of aerosol supersites shows that the mean model agrees quite well with the observations at many sites on the annual mean, but there are some seasonal biases common to many sites. In particular, at many of these European sites, the accumulation mode number concentration is biased low during winter and Aitken mode concentrations tend to be overestimated in winter and underestimated in summer. At high northern latitudes, the models strongly underpredict Aitken and accumulation particle concentrations compared to the measurements, consistent with previous studies that have highlighted the poor performance of global aerosol models in the Arctic. In the marine boundary layer, the models capture the observed meridional


Introduction
Atmospheric aerosol exerts a substantial influence on the earth's climate both directly by scattering and absorbing solar and terrestrial radiation (e.g. Haywood and Boucher, 2000) and indirectly by affecting the evolution and optical properties of clouds (e.g. Lohmann and Feichter, 2005). There are also many other ways in which the atmospheric aerosol interacts with the earth's climate system (e.g. Heintzenberg et al., 2012). Surface cooling induced by increases in aerosol abundance since the pre-industrial period may have partially offset the warming from increased greenhouse gases, but there is large uncertainty in the magnitude of aerosol radiative forcings, particularly in the indirect effects associated with changes in cloud properties (Forster et al., 2007). There is also a range of Earth System feedbacks associated with climate-change-induced changes in natural aerosol and precursor emissions (Carslaw et al., 2010) and these are expected to exert a strong influence on regional climate (Paasonen et al., 2013). There is a need for models to better quantify global aerosol properties and trends in order to reduce uncertainties in model projections of future changes in climate (Andreae et al., 2005) and over recent decades (Booth et al., 2012). To address uncertainties in indirect forcings, it is particularly important to improve model representation of aerosol microphysical properties, such as particle number concentrations and size distributions.
Atmospheric aerosol particles have traditionally been separated into coarse and fine particles (diameters larger and smaller than about 2 µm respectively, e.g. Whitby, 1978), which broadly maps onto whether they were mechanically generated or formed following growth from nanometre-sized nuclei. Aerosol particles are also classified as either primary (i.e. directly emitted), or secondary particles (formed in the atmosphere from gas-to-particle nucleation). Fine particles are much more numerous than coarse particles (e.g. Raes et al., 2000) and consist of small primary particles (e.g. sub-micron sea-spray/dust and carbonaceous combustion aerosol) and also secondary particles, which initially form at nanometre sizes, but can grow by coagulation and condensation to large enough sizes to scatter visible radiation and activate into cloud droplets. Fine particles are further separated into Aitken and accumulation modes, based on observed number size distributions in a range of environments showing two distinct peaks, generally found in the 10 to 100 nm and 100 to 1000 nm dry diameter range (Raes et al., 2000). The larger peak occurs at particle sizes where both dry deposition and sedimentation are relatively inefficient, causing size distributions to evolve into a distinct "accumulation" mode. In remote marine regions, the two separate modes are caused by cloud processing, where the larger sub-set of fine particles activate to cloud droplets where they can grow larger following aqueous chemical reactions in non-precipitating clouds (Lelieveld and Heintzenberg, 1992; Hoppel et al., 1994). Although combustion sources generate particles as small as 10 nm dry diameter, these particles rapidly evolve to larger sizes due to coagulation (e.g. Jacobson and Seinfeld, 2004) and global models directly emit the particles in the mid-Aitken size range (e.g. Dentener et al., 2006). The Aitken size range can also contain secondary particles which have grown from an initial nucleation mode at around 1 to 3 nm (e.g. Kulmala et al., 2004).
Modelling the evolution of the particle size distribution is therefore rather complex, and requires an aerosol dynamics scheme whereby two or more moments (e.g. number and mass) are prognosed in several size classes. Models following this approach are called aerosol microphysics models, and can be broadly classified into two different types. Sectional schemes (Gelbard et al., 1980) discretise the particle size spectrum into multiple size bins whereas modal schemes (Whitby and McMurry, 1997) parametrise the variation of the size distribution within the nucleation, Aitken, accumulation and coarse ranges, with each mode usually approximated via a log-normal function in particle dry diameter. In the 1990s, sectional aerosol microphysics schemes were incorporated into several regional air quality models (e.g. Jacobson, 1997a, b; Lurmann et al., 1997) and in the 2000s became established in several global models (Jacobson, 2001; Adams and Seinfeld, 2002; Spracklen et al., 2005a, 2011; Yu and Luo, 2009; Lee and Adams, 2010; Bergman et al., 2012). Two-moment modal aerosol microphysics schemes were similarly initially implemented into regional models (e.g. Binkowski and Shankar, 1995) and subsequently within several global models (Ghan et al., 2001a, b; Wilson et al., 2001; Stier et al., 2005; Liu et al., 2005, 2012; Bauer et al., 2008; Mann et al., 2010; Aan de Brugh et al., 2011; Zhang et al., 2012; Bellouin et al., 2013).
The international AeroCom initiative seeks to improve our understanding of global aerosol and associated radiative forcings and has provided a mechanism for coordinating efforts to evaluate and intercompare global aerosol models. The stated overall goals of AeroCom are to identify weaknesses in particular models and modelling aspects, and to assess uncertainties in simulated aerosol properties and radiative forcings (Kinne et al., 2006). The first phase of AeroCom aligned with the lead-up to the Intergovernmental Panel on Climate Change (IPCC) fourth climate assessment report (AR4), and resulted in several multi-model intercomparison papers documenting simulated aerosol optical properties (Kinne et al., 2006), aerosol lifecycles (Textor et al., 2006, 2007) and radiative forcings (Schulz et al., 2006; Penner et al., 2006). New observational constraints on simulated aerosol optical properties from satellite measurements and retrievals from the AERONET global network of sun photometers led to a reduced uncertainty range for aerosol direct forcings in AR4, which also caused a narrower uncertainty range in total anthropogenic radiative forcing (Haywood and Schulz, 2007).
In recent years, many more modelling centres have incorporated aerosol modules with size-resolved aerosol microphysics into climate models. This represents a major shift in model sophistication (Ghan and Schwarz, 2007), improving upon previous "first generation" aerosol schemes in which aerosol optical properties and cloud droplet concentrations tended to be based on the simulated mass of several externally mixed aerosol types, each assigned a prescribed size distribution. The microphysical aerosol schemes calculate and transport the number concentration and component mass in several size classes of particles and can also represent both external and internal mixtures. Separate transport of size-resolved number and mass allows growth processes such as condensation and aqueous sulfate production to realistically conserve particle number while adding mass, and enables new particle formation and coagulation to provide explicit sources and sinks for particle number, which has been shown to be important in capturing changes in aerosol in response to changing emissions (Bellouin et al., 2013). The microphysics models explicitly simulate the evolution of the particle size distribution, and use this to determine aerosol optical properties and cloud condensation nuclei concentrations. In so doing, they represent aerosol interactions with clouds and radiation consistently with the underlying physics of the fundamental aerosol processes. We note however that climate model representations of cloud processes tend to be highly parametrised, and characterising aerosol-cloud interactions in these models continues to be a major challenge.
In the second phase of AeroCom (AeroCom-2), working groups have been established to examine different aspects of the global aerosol, with a new set of experiments defined (Schulz et al., 2009). Analyses of the AeroCom-2 experiments, and of the original set of experiments, have led to recent publications with multi-model comparisons of simulated direct forcings (Myhre et al., 2013), indirect effects (Quaas et al., 2009), black carbon (Koch et al., 2009; Schwarz et al., 2010; Samset et al., 2013), dust (Huneeus et al., 2011), vertical profiles (Koffi et al., 2012), radiative transfer (Stier et al., 2013; Randles et al., 2013) and organics (Tsigaridis et al., 2014). This paper reports initial findings from a working group to intercompare and evaluate 12 global aerosol microphysics models which participated in AeroCom-2. This initial study focuses on the particle size distribution, whose evolution is explicitly simulated by these models, and which has so far not been specifically considered in AeroCom publications. Note that we also plan a follow-up study to intercompare simulated CCN concentrations, and will use the globally varying size distribution fields derived here for offline calculations of cloud droplet number concentrations and first indirect radiative effects predicted by the global aerosol microphysics models.
The present paper has three key objectives. First, we aim to document the diversity of simulated particle number concentrations in several size ranges among the new generation of global aerosol microphysics models. Secondly, we derive data sets of multi-model mean particle concentrations that can be used as a reference for future development and improvement of these models. Thirdly, we evaluate the multi-model mean (with associated diversity) against several benchmark observational data sets from ground station networks and compilations over multiple field campaigns. The chosen benchmark observational data sets have been selected to provide a climatological overview of the skill of the models covering both marine and a range of different continental environments, both at the surface and in the vertical profile. In carrying out these objectives, we aim to determine how well the models simulate aerosol microphysical properties and identify any generic weaknesses or gaps in scientific understanding.

Particle size distribution metrics considered
Aerosol indirect radiative effects are driven by the sub-set of particles large enough to be activated to cloud droplets (so-called cloud condensation nuclei, CCN). Although the minimum size for activation can be just a few tens of nm for supersaturations of around 1.0 %, concurrent size distribution and CCN measurements for more moderate supersaturations of 0.2 to 0.5 % suggest that 50 to 100 nm is a reasonable value for the threshold CCN diameter (Kerminen et al., 2012). Aerosol microphysical processes such as nucleation, coagulation, condensation and cloud processing exert a strong control on the evolution of nucleation, Aitken and accumulation mode particle concentrations and are therefore very important in determining CCN concentrations.
In comparing and evaluating size distributions simulated by global aerosol microphysics models, we will often consider integral size-resolved particle concentrations, which help summarise the comparisons and evaluation considering different sub-sets of particles. The number concentrations N3, N10 and N14 are integral concentrations of particles with dry diameters larger than 3, 10 and 14 nm, and are often referred to as condensation nuclei (CN). The sizes refer to the typical thresholds of condensation particle counter (CPC) instruments, which we use to evaluate the total number of particles simulated by the models across the full measurable particle size range. Not all of these particles are directly relevant to CCN, but they provide information about how well the models capture concentrations of secondary particles, which contribute a large fraction of CCN in many regions (e.g. Merikanto et al., 2009; Kerminen et al., 2012).
We also consider concentrations of particles larger than 30, 50 and 100 nm dry diameter (N30, N50 and N100). The N50 concentration counts accumulation and coarse sized particles, and also part of the Aitken size range, with 50 nm representing the minimum size at which ammonium sulfate particles would activate at a supersaturation of 0.42 % (a value typical for marine stratocumulus). The 30 nm dry diameter (N30) represents a typical lower size limit for activation (0.9 % supersaturation) and 100 nm (N100) represents an upper limit (0.14 % supersaturation). Aerosol optical properties are mainly controlled by particles larger than 100 nm, since they account for most of the light scattering at visible and longer wavelengths. None of these metrics is uniquely relevant to the aerosol effect on clouds and climate because the actual activation size depends on the particle chemical composition, cloud updraught velocity and the details of the full size distribution (e.g. Abdul-Razzak and Ghan, 2000; Nenes and Seinfeld, 2003). However, studies suggest (e.g. Dusek et al., 2006) that the particle number size distribution is the most important quantity in determining atmospheric CCN concentrations (Kerminen et al., 2012). The metrics therefore represent typical aerosol microphysical properties of relevance to climate and can easily and consistently be compared among models and with observations.
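The correspondence between threshold dry diameter and supersaturation quoted above can be illustrated with a minimal sketch. It assumes the single-parameter kappa-Koehler approximation, which is not described in this paper, together with an assumed hygroscopicity of 0.61 for ammonium sulfate, a temperature of 298.15 K and textbook values for the properties of water. The function name and all constants are illustrative choices, not values taken from the models.

```python
import math

# Assumed physical constants (illustrative, not taken from the paper)
SIGMA_W = 0.072    # surface tension of water, J m^-2
M_W = 0.018015     # molar mass of water, kg mol^-1
RHO_W = 997.0      # density of water, kg m^-3
R_GAS = 8.314      # universal gas constant, J mol^-1 K^-1


def critical_supersaturation(d_dry_nm, kappa=0.61, temp_k=298.15):
    """Approximate critical supersaturation (as a fraction) of a dry particle.

    Uses the kappa-Koehler approximation, valid for fairly hygroscopic
    particles; kappa=0.61 is an assumed value for ammonium sulfate.
    """
    d_dry = d_dry_nm * 1e-9                                   # nm -> m
    a = 4.0 * SIGMA_W * M_W / (R_GAS * temp_k * RHO_W)        # Kelvin term, m
    return math.exp(math.sqrt(4.0 * a**3 / (27.0 * kappa * d_dry**3))) - 1.0


if __name__ == "__main__":
    for d in (30, 50, 100):
        print(f"D = {d} nm: S_crit = {100 * critical_supersaturation(d):.2f} %")
```

Under these assumptions the sketch gives critical supersaturations close to the values quoted in the text for 30, 50 and 100 nm, and it shows the inverse dependence on dry diameter that motivates using N30 and N100 as lower and upper bounds.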

Description of model experiments
For the second phase of AeroCom coordinated experiments (Schulz et al., 2009), a new control present-day emissions simulation was defined (A2-CTRL-2006). A matching preindustrial emissions double-call nudged run (A2-PRE-2006) was also requested for intercomparison of simulated direct aerosol forcings (see Myhre et al., 2013). To reduce inter-model differences, general circulation models (GCMs) were advised to use nudging techniques (e.g. Jeuken et al., 1996; Telford et al., 2008) to follow meteorological re-analysis fields for the year 2006. Also, GCMs were asked to use a double-call configuration (see e.g. Bellouin et al., 2013) whereby the main "advancing call" to the model radiation scheme has zero aerosol and only a second "diagnostic-call" includes the simulated aerosol properties. This approach allows aerosol forcings to be diagnosed without the aerosol feeding back on the model dynamics, so that control and perturbed experiments have equivalent meteorology. Modellers were also requested to submit 3-D monthly-mean data sets of all transported aerosol types (known as aerosol tracers) to allow flexible intercomparison of simulated particle size distributions between models of different complexity. Having the full tracer distribution available also allowed the models to be compared with a wide range of in situ measurements across different particle size ranges.
Table 1. List of participating global aerosol microphysics models. Two-moment schemes (2 m) carry number and mass in each size class whereas single-moment (1 m) schemes carry only mass. Most models are modal or sectional but CanAM4-PAM uses the piecewise log-normal approach (pcwise-lgnrml). The "Multi-dist" column indicates whether the scheme includes multiple distributions, i.e. whether it is possible to have two particles of the same size but different composition. The "Tracers" column indicates the total number of transported aerosol tracers for each scheme (the sum of the number concentrations and component masses over all size classes). Schemes running in free-running (free) General Circulation Models (GCMs) submitted multi-annual monthly means from 5 yr simulations whereas nudged (nudg) GCMs and CTMs submitted monthly-mean results driven by 2006 meteorological re-analyses.
a Although treatment of SOA in ECHAM5-HAM2 involves 20 SOA species, only four additional advected aerosol tracers are required in addition to the 25 for ECHAM5-HAM. Another four species are required for the condensable organic gases.
b Note that the GISS-MATRIX scheme follows the quadrature method of moments.
Twelve global aerosol microphysics models submitted 3-D all-aerosol-tracer data sets for the A2-CTRL-2006 experiment, with a range of sophistication in their aerosol size representation (Table 1). The number of transported aerosol tracers over these global models ranges from 15 to 160, with between 3 and 100 size classes to describe the size distribution. Several models are flexible in the selection of resolution, the number of layers and their vertical extent, and some apply the aerosol schemes in the stratosphere as well as the troposphere. Furthermore, some models include thermodynamics schemes to represent the gas-particle partitioning of semi-volatile components (e.g. Metzger et al., 2002) whereas others parametrise this process or neglect compounds such as nitrate. The model spatial resolution also varied widely, with the highest longitude by latitude resolution at 1.875° by 1.25° and the lowest at 4.0° by 5.0°. Six of the eight GCMs were nudged to meteorological re-analyses from the year 2006, with the chemical transport models (CTMs) prescribing winds and temperatures from meteorological re-analyses also from that year. Where modelling centres did not have the capability to nudge their GCM to meteorological re-analysis fields, results were submitted from means over 5 yr of free-running simulations.
Seven of the models use modal aerosol schemes (GLOMAP-mode, ECHAM5-HAM2, EMAC, TM5, CAM5-MAM3, GISS-MATRIX and HadGEM-UKCA), three use sectional schemes (GISS-TOMAS, GLOMAP-bin and ECHAM5-SALSA), whilst GEOS-Chem-APM uses a modal approach for black carbon (BC) and primary organic particles, with a sectional approach for other particle types. CanAM4-PAM uses the piecewise log-normal approach, which applies sectional and modal methods for different parts of the particle size spectrum (see von Salzen, 2006).
Eleven of the 12 models use two-moment approaches whereby both the number and mass concentration in each size class are transported, allowing each size class to have a representative size which varies in time and space. The GEOS-Chem-APM model uses a single-moment approach, but has a large number of size classes to allow the size distribution to freely evolve in response to the processes.
Table 2 summarises the primary and secondary aerosol sources used in each model. Although the intention was for the models to use the same anthropogenic emissions from Diehl et al. (2012) for the year 2006, this was not achieved, with some submissions using the IPCC year 2000 emissions (Lamarque et al., 2010), and others using the AeroCom first-phase emissions (Dentener et al., 2006). In addition to these differences in emissions inventories, the models also used their own choice for the size and injection heights applied to primary emissions sources. Although recommendations for these emission size assumptions were made by Dentener et al. (2006), a range of values was used by the models. The assumed size has been shown to have a strong influence on simulated particle concentrations (Spracklen et al., 2010) and size distribution (Reddington et al., 2011), so we list these here for each model. Many of the models used prescribed oxidant fields in determining aerosol precursor oxidation, although five did have tropospheric chemistry schemes determining oxidant concentrations online in the simulation. A diversity of nucleation parametrisations was apparent across the models, with most including only binary homogeneous nucleation which produces particles only in the free troposphere. Only one of the models used an empirical boundary layer nucleation mechanism (e.g. Sihto et al., 2006) for their AeroCom simulations, although some models simulate ternary or ion-induced/mediated nucleation which can generate particles efficiently in the boundary layer. The simulated burdens and surface size-resolved number concentrations from each model are also shown in Table 2 for reference.
Table 2. [...] (Dentener et al., 2006), HCA-06 (Diehl et al., 2012), IPCC-00 (Lamarque et al., 2010), IPCC-06 (RCP4.5 for 2006, Thomson et al., 2011). The "Primary size" column refers to the geometric mean diameter values (nm) assumed for primary carbonaceous emissions, which most (but not all) models treat as a source of particles consisting of an internal mixture of BC and OC. The comma-separated values shown are for fossil fuel and biofuel sources respectively, with geometric standard deviation also shown in parentheses. Nucleation parametrisations are abbreviated as BHN (binary homogeneous nucleation), BLN (activation boundary layer nucleation), THN (ternary homogeneous nucleation), IIN (ion-induced nucleation) and IMN (ion-mediated nucleation). References for nucleation parametrisations are V02 (Vehkamaki et al., 2002), S06 (Sihto et al., 2006), M07 (Merikanto et al., 2007), K98 (Kulmala et al., 1998), K10 (Kazil et al., 2010), N02 (Napari et al., 2002) and Y10 (Yu, 2010). Also shown are each model's global column burdens of sulfate (Tg of sulfur) and BC (Tg of carbon), and global mean surface number concentrations (cm−3) of particles with dry diameter larger than 30 nm (N30) and 100 nm (N100).
Comparisons of aerosol properties simulated by the same aerosol microphysics scheme implemented within different modelling frameworks have been carried out for both sectional (Trivitayanurak et al., 2008) and modal (Zhang et al., 2010) modules, and have shown that predictions are sensitive to host model differences. We have therefore chosen not to try to discriminate the extent to which sectional schemes may outperform modal aerosol microphysics schemes, as we believe this would not be possible given the variety of host model frameworks used for the benchmark simulations.

Deriving comparable model size distributions
To compare particle size distributions between models of different complexity, the 3-D-varying number and size for each size class are required. The CanAM4-PAM and GEOS-Chem-APM models submitted data sets which had mapped their size classes onto a fixed size bin grid. Since all other models followed either two-moment modal or two-moment sectional size distribution approaches, a common methodology could be applied. First, the mean dry volume V_dry,i was calculated for each size class i, summing over all present internally mixed aerosol components j (sulfate, sea salt, BC, organic matter, dust, nitrate or ammonium):
$$V_{\mathrm{dry},i} = \sum_j \frac{m_{ij} M_j}{N_a \rho_j}, \quad (1)$$
where m_ij is the number of molecules per particle of component j in mode i, ρ_j and M_j are the density and molar mass of component j and N_a is Avogadro's constant. The m_ij values were derived from each model's submitted number concentrations (n_i) and mass mixing ratios (q_ij) as
$$m_{ij} = \frac{M_{da}}{M_j} \frac{q_{ij}}{n_i} \frac{p}{k_B T}, \quad (2)$$
where M_da is the molar mass of dry air, k_B is Boltzmann's constant and p and T are the ambient pressure and temperature. Once the mean dry volume for each size class was derived, the geometric (number) mean dry diameter D_i was then calculated as
$$D_i = \left( \frac{6 V_{\mathrm{dry},i}}{\pi \exp\left(4.5 \ln^2 \sigma_{g,i}\right)} \right)^{1/3}, \quad (3)$$
where σ_g,i is set to unity for sectional schemes and to their assumed constant values for the log-normal modes used by the modal schemes. Each modelling group provided a document explaining the mapping from tracer index to size class and aerosol component, together with their scheme's values for σ_g,i, ρ_j and M_j.
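The conversion from submitted tracers to a representative diameter can be sketched as follows. This is an illustration under stated assumptions rather than the processing code used in the study: it assumes number concentrations in particles per cubic metre of air, mass mixing ratios in kg per kg of dry air, the ideal gas law for the air density, and the standard log-normal relation between mean volume and geometric mean diameter. The function names and the molar mass of dry air are illustrative.

```python
import math

N_A = 6.02214076e23   # Avogadro's constant, mol^-1
K_B = 1.380649e-23    # Boltzmann's constant, J K^-1
M_DA = 0.029          # molar mass of dry air, kg mol^-1 (assumed value)


def molecules_per_particle(q_ij, n_i, molar_mass_j, p, temp):
    """m_ij from mass mixing ratio q_ij (kg/kg) and number conc. n_i (m^-3).

    Assumes ideal-gas air at pressure p (Pa) and temperature temp (K).
    """
    return (M_DA / molar_mass_j) * (q_ij / n_i) * p / (K_B * temp)


def mean_dry_volume(m_i, molar_mass, density):
    """V_dry,i (m^3): sum over components j of m_ij * M_j / (N_a * rho_j).

    m_i, molar_mass and density are dicts keyed by component name.
    """
    return sum(m_i[j] * molar_mass[j] / (N_A * density[j]) for j in m_i)


def geometric_mean_diameter(v_dry, sigma_g=1.0):
    """Geometric (number) mean dry diameter (m) for a log-normal size class.

    sigma_g = 1 recovers the volume-equivalent diameter used for a bin.
    """
    return (6.0 * v_dry / (math.pi * math.exp(4.5 * math.log(sigma_g) ** 2))) ** (1.0 / 3.0)
```

With sigma_g set to 1 the last function returns the diameter of a sphere with the mean particle volume, which is the sectional case; for a mode with sigma_g greater than 1 the geometric mean diameter is smaller than that value.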
The monthly-mean number concentration N_i and size D_i were then calculated for each size class on the 3-D grid. The vertical coordinate grid for each model was also constructed from the information provided.
Size-resolved number concentrations were then derived for particles larger than 3, 10, 14, 30, 50 and 100 nm by integrating the size distribution based on n_i, D_i and σ_g,i in each size class. These threshold dry diameters (D_thresh) were chosen to facilitate comparison with the measurements described in Sect. 3.2. For modal schemes, partial integrals over each log-normal size class were computed using the error function. For sectional schemes, the calculation involved summing the number concentration in all size classes larger than the threshold size, including a fractional contribution from bins with interface dry diameters that span D_thresh.
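The two integration rules described above can be sketched in a few lines. The function names are illustrative, and the fractional contribution from a bin spanning the threshold is assumed here to be uniform in the logarithm of diameter; the text does not state how that fraction was computed in the study.

```python
import math


def modal_number_above(n_i, d_i, sigma_g, d_thresh):
    """Number in one log-normal mode with dry diameter larger than d_thresh.

    n_i: mode number concentration; d_i: geometric mean diameter;
    sigma_g: geometric standard deviation (must be > 1).
    """
    x = math.log(d_thresh / d_i) / (math.sqrt(2.0) * math.log(sigma_g))
    return 0.5 * n_i * (1.0 - math.erf(x))


def sectional_number_above(n_bins, edges, d_thresh):
    """Number in bins larger than d_thresh, given bin interface diameters.

    edges has len(n_bins) + 1 entries. A bin spanning d_thresh contributes
    a fraction assumed uniform in log-diameter (an illustrative choice).
    """
    total = 0.0
    for n, lo, hi in zip(n_bins, edges[:-1], edges[1:]):
        if d_thresh <= lo:
            total += n                                        # whole bin counts
        elif d_thresh < hi:
            total += n * math.log(hi / d_thresh) / math.log(hi / lo)
    return total
```

For a modal scheme the total above a threshold is the sum of `modal_number_above` over all modes; a threshold equal to the geometric mean diameter of a mode returns exactly half of that mode's number.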
To enable size distributions to be assembled into a multi-model mean, each model's size distribution was calculated on a common size grid. For sectional models, the number size distribution was calculated as
$$\frac{dN}{d\log_{10} D}\bigg|_{D_i} = \log_e(10)\, D_i\, \frac{dN}{dD}\bigg|_{D_i}, \quad (4)$$
where D_i is from the parent model bin dry diameter grid. These parent dry diameter grid size distributions were then interpolated onto a common 50-bin grid D_k between 1 nm and 10 µm. For modal schemes, dN/dlog10 D was calculated by evaluating the log-normal distribution on the common 50-bin grid:
$$\frac{dN}{d\log_{10} D}\bigg|_{D_k} = \log_e(10) \sum_i \frac{n_i}{\sqrt{2\pi}\,\ln \sigma_{g,i}} \exp\left(-\frac{\left(\ln D_k - \ln D_i\right)^2}{2 \ln^2 \sigma_{g,i}}\right). \quad (5)$$
Although calculating size-resolved number concentrations and size distributions from monthly-mean aerosol tracers does not account for higher temporal variations in mass to number ratios, the approach allows us to intercompare the full set of global aerosol microphysics models with a consistent methodology. To assemble the multi-model mean and diversity, each model quantity at the surface (BC, sulfate, N30, N100) was interpolated onto a 1° by 1° grid and zonal means against latitude and height were interpolated onto a 1° by 100 m grid.
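Evaluating a modal size distribution on a common grid can be sketched as below. The 50 bins between 1 nm and 10 µm follow the text; taking the grid points at logarithmic bin centres is an assumption made for this illustration, as are the function names.

```python
import math


def common_grid(n_bins=50, d_min=1e-9, d_max=1e-5):
    """Diameters (m) at the log-spaced centres of n_bins bins from d_min to d_max.

    Using bin centres is an illustrative assumption.
    """
    step = math.log10(d_max / d_min) / n_bins
    return [d_min * 10.0 ** ((k + 0.5) * step) for k in range(n_bins)]


def modal_dn_dlog10d(d, modes):
    """dN/dlog10(D) at diameter d for a sum of log-normal modes.

    modes is a list of (n_i, d_i, sigma_g_i) tuples with sigma_g_i > 1.
    """
    total = 0.0
    for n_i, d_i, sigma_g in modes:
        ln_s = math.log(sigma_g)
        z = math.log(d / d_i) / ln_s
        total += math.log(10.0) * n_i / (math.sqrt(2.0 * math.pi) * ln_s) * math.exp(-0.5 * z * z)
    return total


if __name__ == "__main__":
    modes = [(1000.0, 60e-9, 1.59), (200.0, 150e-9, 1.4)]    # hypothetical modes
    for d in common_grid()[::10]:
        print(f"{d:.3e} m  {modal_dn_dlog10d(d, modes):.3e}")
```

A useful consistency check is that the grid values multiplied by the bin width in log10(D) sum to approximately the total number concentration of the modes, provided the modes lie well inside the grid range.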

Definition of multi-model mean and diversity
In Sect. 3.1, we examine spatial distributions of multi-model mean and diversity over a "central" sub-set of the models, omitting models with aerosol properties outside a chosen range. Such central-model-mean fields provide a "best estimate" of the global distribution of aerosol properties and may also become useful as reference data sets against which to assess evolving model development. We follow the approach of Kinne et al. (2006) in using the central two-thirds (here eight models) as the basis for the central model mean and diversity. When calculating the central-8 mean we take the geometric mean over the values for each model. Note that the assessment of which models are "central" is done locally, so the central mean will be over different models in different regions. As in Kinne et al. (2006), the diversity is presented as the ratio of the maximum and minimum values over those central two-thirds of models. This approach is useful as it immediately gives the factor over which those central models range. It is important to note that we always refer to model diversity as the ratio of the central two-thirds maximum and minimum (rather than as an absolute quantity) to enable the diversity to be compared between clean and polluted regions. Finally, we note that multi-model diversity is not the same as the true model uncertainty. For example, the diversity may be low close to emissions sources if models use similar emissions inventories. Additional uncertainty will be caused by uncertainties in emissions (L. A. Lee et al., 2013), which have not been accounted for here.
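The central mean and diversity at a single grid point can be sketched as follows. The sketch assumes that the "central" models are found by ranking the local values and discarding equal numbers from the top and bottom (two each for 12 models); the text does not spell out the selection rule, so this symmetric trimming and the function name are illustrative.

```python
import math


def central_mean_and_diversity(values, n_central=8):
    """Geometric mean and max/min ratio over the central models at one point.

    values: one positive value per model at a single grid point.
    Symmetric trimming of the ranked values is an assumption.
    """
    ranked = sorted(values)
    n_low = (len(ranked) - n_central) // 2        # models dropped at the low end
    central = ranked[n_low:n_low + n_central]
    geo_mean = math.exp(sum(math.log(v) for v in central) / len(central))
    diversity = central[-1] / central[0]
    return geo_mean, diversity


if __name__ == "__main__":
    n30 = [150, 210, 260, 300, 320, 350, 400, 430, 480, 520, 700, 1500]  # hypothetical values
    mean, div = central_mean_and_diversity(n30)
    print(f"central-8 geometric mean = {mean:.0f}, diversity = {div:.2f}")
```

Because the ranking is repeated at every grid point, the set of models contributing to the central mean changes from place to place, as noted above.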

Multi-model mean and diversity of aerosol properties
As a reference to help understand the mean and diversity of size-resolved number concentrations, we first examine simulated mass concentrations of sulfate and BC. We do not intercompare simulated particulate organic matter (POM) among the models as this is the subject of another AeroCom intercomparison paper (Tsigaridis et al., 2014). We also do not analyse simulated mass concentrations of dust and sea salt as they are mainly from super-µm particles, whereas our focus is on sub-µm particles. Note however, that the size-resolved POM, dust and sea salt masses in the models are included in the construction of the model size distributions, and hence their influence on size-resolved number concentration is accounted for.
Diversity here is the ratio of the maximum and minimum values over the central 8 of the 12 models (defined locally, as described in Sect. 2.4). Note that the geometric mean is used when averaging over the central-8 models.

Surface sulfate and black carbon
Sulfate is mostly a secondary aerosol species formed by oxidation of sulfur dioxide (SO2). In marine regions SO2 derives mainly from the oxidation of dimethyl sulfide (DMS), produced by phytoplankton, although SO2 from continuously erupting volcanoes also has an important influence on aerosol properties (Andres and Kasgnoc, 1998; Schmidt et al., 2012).
In the present-day atmosphere, the dominant global source of sulfate is derived from anthropogenic SO2, which greatly exceeds marine and volcanic SO2 sources (e.g. Dentener et al., 2006). Figure 1a illustrates this strong anthropogenic influence, with the multi-model mean sulfate mass concentration highest over the main industrialised regions, with maximum surface annual means of 2 to 5 µg m−3 of sulfur over eastern China. BC mainly determines the aerosol absorption and is a primary aerosol mass species, being directly emitted from wildfires and anthropogenic fossil fuel and biofuel combustion sources. The global BC distribution in Fig. 1b reflects these source regions, and since the vast majority of BC is emitted from continental sources, marine concentrations are typically at least a factor of 10 lower than over the continents.
The central diversities of surface sulfate and BC mass (Fig. 1c and d) are generally lower in continental regions than in marine regions. For BC, which is almost entirely emitted in continental regions, this land-sea contrast in diversity is much greater. Since BC is a primary emitted species, the main cause of the diversity near to the sources is likely to be differences in emissions between the models, although boundary layer mixing and dry deposition may also play a role. BC emissions are treated in all models based on prescribed emissions inventories, and Fig. 1d shows that the diversity in simulated BC concentrations is less than a factor of 2 in the main polluted regions.
In general, the diversity in surface BC (Fig. 1d) increases substantially with distance away from source, from a factor of about 3 in the main source regions to a factor of 4 to 6 in more remote marine regions, and to around a factor of 10 or more at high latitudes. These large diversities are consistent with the findings from Koch et al. (2009), who found the largest model BC diversity occurred in northern Eurasia and the remote Arctic, and Schwarz et al. (2010), who showed that, over the remote Pacific, the ratio of the 75th to 25th percentiles was around a factor of 10 at the surface between 60° N and 60° S and a factor of 30 to 100 at higher latitudes. In these previous studies, the differences were attributed to both emissions and removal processes. The mapping of the diversity here suggests that differences in removal processes are the dominant source of model BC diversity in remote regions (possibly in combination with approaches to ageing), because diversity is much lower in the main emission regions. This finding agrees with recent studies (Vignati et al., 2010; Kipling et al., 2013) which have also found a strong influence of model treatment of scavenging on simulated BC in [...] 2013) investigated the diversity in simulated BC from seven models participating in the Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP) and also found increasing diversity with increasing distance from source, with the standard deviation among simulated Arctic BC columns greater than their mean. In that study, only one of the chemistry-climate models was nudged to meteorological reanalysis data, while all models used the same emissions inventory, and the large diversity in simulated BC (a factor of 3 for global column burdens) was found to be caused by differences in removal and transport.
The diversity in surface sulfate mass has regional variations that are not evident in BC. For example, there is much more diversity over the high-sulfate region in Europe than over the eastern United States (US). By contrast, the two regions have similar BC diversity at the surface, although the western US is more diverse in simulated BC, where wildfire emissions dominate. Figure 1c also shows that model diversity in simulated sulfate is much higher in northern Europe than in southern Europe. An important sulfate production mechanism is from aqueous oxidation of dissolved sulfur dioxide in cloud droplets (e.g. Barrie et al., 2001) via aqueous chemical reactions with dissolved hydrogen peroxide and ozone. In northern Europe, concentrations of hydrogen peroxide and ozone are much lower than in southern Europe (e.g. Berglen et al., 2004) and different treatments of chemistry, including some models' prescription of oxidant fields (see Table 2), could explain the higher sulfate diversity in northern Europe. The higher sulfate diversity in northern Europe could also be explained by the expected increase with distance away from the source region, due to differences in the representation of removal processes. However, the BC diversity map does not show this maximum in northern Europe, so the model treatment of sulfate production is the more likely cause. In their comprehensive analysis of aerosol microphysical uncertainties, L. A. Lee et al. (2013) also found that aqueous sulfate production was a major cause of uncertainty in simulated CCN at high northern latitudes.

Surface size-resolved particle concentrations
Figure 2 shows global maps of particle number concentrations with dry diameter larger than 30 nm (N 30, Fig. 2a) and 100 nm (N 100, Fig. 2b). In each grid box, the central two-thirds of the model annual means was calculated, and the map shows the geometric mean over those eight values. Surface N 30 concentrations are highest in the main industrialised regions, due mainly to anthropogenic primary emissions. In eastern China, annual mean N 30 reaches 10 000 cm−3, and in India, central Europe and eastern USA there are large regions with annual-mean N 30 above 2000 cm−3. Regions with strong biomass burning emissions also have high annual mean N 30, with central Africa and South America in excess of 1000 cm−3. In marine regions, N 30 is much higher in the Northern Hemisphere than the Southern Hemisphere, exceeding 200 cm−3 everywhere between 30 and 60° N in the North Atlantic and North Pacific. By contrast, N 30 is less than 200 cm−3 throughout the Southern Hemisphere marine boundary layer, falling below 100 cm−3 poleward of 60° S. It is interesting that, even in the Antarctic, annual mean N 30 never falls below 50 cm−3, whereas the annual means of N 100 and the mass concentrations of sulfate and BC have steep meridional gradients towards the remote polar regions. This constant background N 30 is likely due to a steady source of particles from nucleation in the free troposphere (e.g. Raes, 1995; Merikanto et al., 2009). The presence of this constant background source of potential CCN could be important for determining the baseline pre-industrial cloud droplet concentrations, which have a strong influence on indirect forcing over the industrial period (e.g. Carslaw et al., 2013; Schmidt et al., 2012).
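The central two-thirds statistics used for the maps can be illustrated with a minimal sketch. It assumes that "central" means discarding the two lowest and two highest of the 12 model values in each grid box, and that diversity is the ratio of the largest to the smallest remaining value; the function name and the input values are hypothetical and are not taken from any of the participating models.

```python
import numpy as np

def central_two_thirds(values, n_drop=2):
    """Geometric mean and max/min diversity over the central models.

    values: array of shape (n_models, ...) holding positive annual means.
    n_drop: number of lowest and highest models discarded in each grid box
            (assumed to be 2 at each end, leaving the central 8 of 12).
    """
    ordered = np.sort(np.asarray(values, dtype=float), axis=0)
    central = ordered[n_drop:ordered.shape[0] - n_drop]
    geo_mean = np.exp(np.mean(np.log(central), axis=0))
    diversity = central.max(axis=0) / central.min(axis=0)
    return geo_mean, diversity

# Hypothetical N30 values (cm^-3) from 12 models in two grid boxes.
n30 = np.array([
    [50, 100, 200, 200, 400, 400, 800, 800, 1600, 1600, 3200, 9000],
    [90, 95, 100, 100, 100, 100, 100, 100, 100, 100, 105, 110],
]).T  # shape (12 models, 2 grid boxes)

mean, div = central_two_thirds(n30)
```

Because the sorting is done independently in each grid box, the set of models that counts as "central" can differ from one location to the next, which is what "defined locally" implies.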
Surface N 100 concentrations show a similar spatial distribution to N 30 in continental regions, but with lower concentrations. However, in the outflow regions off the coast of East Asia and eastern USA, N 100 decreases more rapidly away from the source than N 30, which may reflect a lower proportion of particles in marine N 100 than N 30. Another factor is that larger particles tend to be shorter-lived because they are more efficiently removed by nucleation scavenging. Only a weak local maximum in N 100 is seen in the high sea-spray belt in the Southern Ocean between 40 and 55° S with N 100 above 50 cm−3, and N 100 only falls below 10 cm−3 over continental Antarctica.
The diversity in the main anthropogenic emissions regions (Fig. 2c) is high for N 30 (factor 2 to 5), whereas N 100 is substantially lower (within a factor of 2, Fig. 2d) and follows a continental diversity pattern similar to BC (Fig. 1d). The high continental N 30 diversity is partly due to differences in assumed size distribution for primary emissions sources in the different models (see Table 2). A smaller assumed size results in higher primary particle number emissions (for a given particle emission mass flux), and also affects simulated size-dependent processes such as gas to particle transfer and particle growth by coagulation and condensation. Different assumptions for the size distribution of primary emitted particles have been shown to strongly influence simulated particle number concentrations (Pierce and Adams, 2009; Spracklen et al., 2010). Reddington et al. (2011) examined the effect on model size distributions, finding a stronger influence on simulated N 30 than N 100 in Europe where carbonaceous emissions are mostly from fossil fuel combustion sources. The size at which these primary particles are emitted also strongly affects how efficiently they are removed and also their cloud nucleating and optical properties. As seen in Table 2, although all the models represent new particle formation, most only include a binary nucleation mechanism such as Kulmala et al. (1998) or Vehkamaki et al. (2002). These parametrisations do not generate a significant number of new particles in the continental boundary layer (e.g. Spracklen et al., 2006; Merikanto et al., 2009; Yu et al., 2010), so the main particle number source in continental regions (near the surface) will tend to be from direct emission of primary particles (e.g. carbonaceous or sub-grid "primary sulfate" particles).
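The dependence of primary number emissions on the assumed emission size can be made concrete with a short sketch. It assumes the emitted particles form a single lognormal mode of spherical particles, for which the mean particle volume is (π/6) D_g³ exp(4.5 ln² σ_g); the function name and all numerical values are illustrative and are not taken from Table 2.

```python
import math

def number_flux(mass_flux, d_g, sigma_g, density):
    """Particle number flux carried by a given mass flux in one lognormal mode.

    mass_flux: emitted mass flux (kg m^-2 s^-1)
    d_g:       number median dry diameter (m)
    sigma_g:   geometric standard deviation of the mode
    density:   particle density (kg m^-3)
    """
    # Mean volume per particle of a lognormal number distribution.
    mean_volume = (math.pi / 6.0) * d_g**3 * math.exp(4.5 * math.log(sigma_g) ** 2)
    return mass_flux / (density * mean_volume)

# Illustrative values only: same mass flux, width and density, two emission sizes.
flux_30nm = number_flux(1e-12, 30e-9, 1.8, 1500.0)
flux_60nm = number_flux(1e-12, 60e-9, 1.8, 1500.0)

# Halving the assumed diameter multiplies the number flux by 2**3 = 8.
ratio = flux_30nm / flux_60nm
```

The cubic dependence on diameter is why modest differences in the assumed emission size between models can translate into large differences in simulated N 30 close to sources.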
In remote marine regions, N 30 has a relatively low diversity (a factor of 2), with higher values (factor 3 to 6) seen in regions where primary aerosol dominates the particle source, such as the sea-spray belt (40 to 55° S), and in biomass burning outflow regions (Merikanto et al., 2009). Whereas N 30 has much higher diversity in continental than marine regions, the reverse is true for N 100 (Fig. 2d), which has a diversity generally within a factor of 2 in the anthropogenic source regions, although biomass burning regions are more diverse. Marine N 100 is diverse among the central two-thirds, typically by around a factor 3 to 5, with even higher diversity near the equator.
The patterns of diversity in N 30 and N 100 can be explained by differences in the sources of the two size classes of particles. N 30 in marine regions tends to be dominated by secondary particles which were nucleated in the free troposphere and subsequently entrained into the marine boundary layer (e.g. Raes, 1995; Clarke and Kapustin, 2002; Merikanto et al., 2009). Marine CCN concentrations have been shown (Spracklen et al., 2005b; L. A. Lee et al., 2013) to be relatively insensitive to a factor of 10 change in the free tropospheric nucleation rate, due mainly to the negative feedback effect from coagulation being more effective at higher particle concentrations. In the main sea-spray region (40-50° S), the N 30 diversity is much higher than in other marine regions, likely indicating differences in the way the models treat ultrafine sea-spray, which is more diverse among the models than concentrations of entrained particles from the free troposphere. Observations from field campaigns (e.g. O'Dowd and Smith, 1993) and laboratory measurements (e.g. Martensson et al., 2003) have shown that sea-spray efficiently produces particles down to sub-100 nm dry diameters, and global model studies have shown that these ultrafine sea-spray particles contribute directly to CCN (Pierce and Adams, 2006) and also indirectly through their influence on the size distribution of marine sulfate aerosols (Gong and Barrie, 2003). The higher diversity in marine N 100 (than N 30) may also be indicative of those particles being long-range transported or cloud-processed particles that have been shaped by several processes with a higher combined diversity.

Meridional and vertical distributions
In this section, we examine the modelled vertical and meridional distributions, considering zonal-means in each model as a function of latitude and altitude. Figure 3 shows the zonal mean vertical and latitudinal profile of sulfate and BC mass concentrations and Fig. 4 shows N 30 and N 100.
The zonal and annual-mean BC concentrations (Fig. 3b) are highest for latitudes 30 to 40° N at about 0.2 µg m−3 of carbon, with a second, slightly weaker, local maximum at 0-10° N. These two maxima correspond to the major source regions in the mid-latitude Northern Hemisphere (mostly anthropogenic) and tropical regions (mostly biomass burning). It is noticeable that the vertical concentration gradient is steeper for the Northern Hemisphere mid-latitude BC maximum than it is in the Tropics. The explanation is likely to be stronger convection in the Tropics and the fact that wildfire sources can inject aerosol to higher altitudes (e.g. Dentener et al., 2006) whereas anthropogenic BC is mostly emitted near the surface. Since BC is emitted almost entirely in continental regions, its concentration is very low in the mid- and high-latitude Southern Hemisphere.
Figure caption (fragment): Diversity here is the ratio of the maximum and minimum values over the central 8 of the 12 models (defined locally, as described in Sect. 2.4). All concentrations are with respect to local temperatures and pressures in the models. Note that the geometric mean is used when averaging over the central-8 models.
The vertical profile of BC diversity (Fig. 3d) shows the expected distribution, with the least diversity near source in the lowest few km (50° S-50° N). Model diversity is higher in the mid- and upper troposphere and in remote regions because differences in removal and processing add to the initial emissions-induced diversity near sources. Sulfate has a more complex structure of meridional and vertical diversity distribution compared to BC. The lowest diversity occurs between about 3 and 4 km, with slightly higher model diversity at the surface and a factor of 2 to 3 between 1 and 2 km, possibly due to large differences in model treatments of in-cloud sulfate production. There is a local maximum in model diversity for BC between 8 and 11 km in the latitude range 15° S to 15° N that is not present for sulfate. This is likely due to the strong sensitivity of BC to different model treatments of convective scavenging (e.g. Kipling et al., 2013).
The different vertical and meridional pattern of sulfate and BC diversity reflects the fact that sulfate is a secondary aerosol species formed via oxidation in the atmosphere some time after emission of the precursor gases (DMS and SO 2). Thus sulfate has a less steep vertical gradient than BC above the northern mid-latitude anthropogenic source regions. The meridional gradient in sulfate is also weaker than for BC since there is a substantial marine source of sulfate originating from DMS (mainly during summer).
The meridional and vertical distribution of N 30 and N 100 is shown in Fig. 4. The zonal-mean N 100 distribution (Fig. 4b) is qualitatively similar to the BC distribution (Fig. 3b), but has a much slower decrease with increasing altitude, suggesting that N 100 is influenced by secondary particle sources in the free and upper troposphere. N 30 has an even weaker vertical gradient, particularly in the Southern Hemisphere, consistent with N 30 being more strongly influenced by secondary particles formed in the free troposphere than N 100.
The model diversity in N 30 (Fig. 4c) is quite high at the surface due to differences in the size distribution of primary emissions. Above the boundary layer the N 30 diversity is much lower as there is a mixture of nucleated and primary particles. It is interesting that for both N 30 and N 100 there is a maximum in model diversity at about 5 to 7 km in the Tropics which could reflect differences in vertical transport and scavenging between the models.

Comparison with observations
Previous evaluation of multiple global aerosol models against observations (e.g. Kinne et al., 2006) has tended to focus on data sets with a wide spatial and temporal coverage, such as the AERONET sun photometer network (Holben et al., 1998) or satellite data (e.g. Tanre et al., 1997; Torres et al., 2002; Kahn et al., 1998). Although these data sets have given useful information on the global distribution of column aerosol optical properties, they provide only limited information on the particle size distribution. In situ measurements of the particle size distribution have been made in numerous field campaigns and at monitoring sites over several decades, and several data compilations have been created that are useful for model evaluation.
Here, we evaluate the 12 global aerosol microphysics models against several such data compilations from airborne, ship-borne and land-based in situ measurements. Global aerosol microphysics models are considerably more complex than mass-based aerosol schemes with prescribed size distributions (see Sect. 2.2). As a consequence, intercomparing the size distributions simulated by different aerosol microphysics schemes is a technically challenging exercise. Rather than providing a comprehensive evaluation of each model, the idea here is to assess the skill of the multi-model mean and isolate cases where the central models cannot account for the observations. The data sets used are listed in Table 3 and are briefly described below. Their locations are shown on a global map in Fig. 5.

-Global Atmosphere Watch (GAW) sites
The [...] these sites have several decades of data available which can be used to establish trends in aerosol concentration (e.g. Asmi et al., 2013). In this study, we compare to multi-annual means and standard deviations over the monthly-mean data over the number of years listed in Table 3. When comparing to the CPC measurements, we derive from the models particle concentrations larger than 3, 10 and 14 nm. These size thresholds correspond to the cut-off diameters for the different types of particle counter used in the measurements at each site. The total number of years of data used, and the size thresholds for the CPC at each site are shown in Table 3.
Figure 5 (caption fragment): [...] Clarke and Kapustin (2002) were made. The yellow boxed regions show the locations of the cruise campaign measurements compiled by Heintzenberg et al. (2000). When comparing to the measurements, each of the models was sampled based on a mask or interpolation to these locations.

-The Lindenberg Aerosol Characterization Experiment 1998 (LACE 98)
The LACE 98 campaign (Petzold et al., 2002) took place over eastern Germany during summer 1998 with a range of airborne aerosol measurements made to characterise aerosol properties over central Europe. The aircraft instrumentation deployed in LACE 98 included three CPCs measuring total integral particle concentrations (with different lower size limits) and Passive Cavity Aerosol Spectrometer Probe (PCASP) measurements of the particle size distribution between 0.1 and 3 µm dry diameter. Further work to analyse and process these measurements led to median and 25th/75th percentile profiles of N 5, N 15 and N 120 on a 1 km vertical grid (see Lauer et al., 2005) that have been used to evaluate size-resolved particle concentrations in the boundary layer and free troposphere, as simulated by global aerosol microphysics models. Note that when comparing to this data set, each model's number concentrations are at ambient temperature and pressure to be consistent with the observed profiles.
-30 yr of ship-borne aerosol measurements
Marine boundary layer particle concentrations and number size distribution measurements have been compiled into a global climatological data set (Heintzenberg et al., 2000). The data set brings together measurements from several field campaigns in many regions including the Arctic (Heintzenberg and Leck, 1994; Covert et al., 1996), the central Pacific (Quinn et al., 1990, 1993, 1995, 1996), the North Atlantic (Van Dingenen et al., 1995; Leaitch et al., 1996; Raes et al., 1997) and the Southern Ocean and Antarctic (Jaenicke et al., 1992; Davison et al., 1996; Bates et al., 1998). The climatology has been used as an observational constraint for global model simulated Aitken and accumulation mode number, size and widths (e.g. Easter et al., 2004; Pierce and Adams, 2006; Spracklen et al., 2007; Trivitayanurak et al., 2008; Zhang et al., 2010; Mann et al., 2012). It would be highly desirable to repeat the valuable efforts of Heintzenberg et al. (2000), and produce a similar, updated marine climatology incorporating the wide range of aerosol microphysics measurement data sets made on cruises since 2000.
-10 yr of aircraft measurements over the Pacific and Southern Oceans
Data from numerous field campaigns have been compiled by Clarke and Kapustin (2002) to produce climatological profiles of ultrafine particle concentrations within latitude ranges 70 to 20° S, 20° S to 20° N and 20 to 70° N. The aircraft measurements very clearly show a distinct maximum in particle concentrations in the free and upper troposphere, which has been shown to provide an important source of CCN in marine regions (Merikanto et al., 2009). Note that when comparing to this data set, each model's number concentrations are converted to standard temperature and pressure to be consistent with the observed profiles.
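The distinction between ambient concentrations (used for LACE 98) and concentrations at standard temperature and pressure (used for the aircraft climatology) follows from the ideal gas law: the number of particles per unit volume scales with air density. The sketch below assumes reference conditions of 273.15 K and 1013.25 hPa, which may differ from the convention used for the observed profiles; the function name and input values are hypothetical.

```python
def to_stp(n_ambient, temperature_k, pressure_hpa, t0=273.15, p0=1013.25):
    """Convert an ambient number concentration to reference conditions.

    n_ambient:     concentration at ambient temperature and pressure (cm^-3)
    temperature_k: ambient temperature (K)
    pressure_hpa:  ambient pressure (hPa)
    t0, p0:        assumed reference temperature (K) and pressure (hPa)
    """
    # Air density is proportional to p / T, so the same parcel of air
    # compressed to reference conditions holds more particles per cm^3.
    return n_ambient * (p0 / pressure_hpa) * (temperature_k / t0)

# Illustrative upper-tropospheric value: 500 cm^-3 at 250 K and 400 hPa.
n_stp = to_stp(500.0, 250.0, 400.0)
```

With these illustrative inputs the converted value is more than twice the ambient one, so mixing the two conventions would distort the apparent vertical gradient when comparing profiles.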

Total particle number concentrations at GAW sites
Figure 6 shows a scatter plot of modelled annual mean particle number concentrations against the multi-year annual mean from the observations at each site. The model values are simulated concentrations of particles larger than the cut-off diameter used by the CPC at each measurement site (3, 10 or 14 nm, see Sect. 2.3 and Table 3). The vertical whiskers indicate the range over the central 8 models, whereas the horizontal whisker shows the standard deviation over the annual means over the several years of measurements (see Table 3). The central-model mean represents the spatial variation of the annual mean particle concentrations well with a Pearson correlation coefficient (R) of 0.96 and normalised mean bias (b) of −0.21, and is within a factor 2 of the observations at all 13 sites. However, as seen in Sect. 3.1, particle concentrations are rather diverse among the different models. For example, at Pallas and Mace Head, the central model diversity is about a factor of 5. The three free troposphere (FT) sites (Jungfraujoch, Mauna Loa and South Pole) have lower diversity but still it is around a factor 2 to 4. This large model diversity indicates that many of the models have considerable biases against the observations. However, at only 2 of the 13 sites (Southern Great Plains and Neumayer) does the central two-thirds range not span the multi-annual mean of the measurements. It is interesting that the central models have opposite bias at the two Antarctic sites, tending to be slightly biased high at the South Pole site, but biased low at the coastal Neumayer site. Boundary layer nucleation events have been observed in a recent field campaign at Neumayer (R. Weller, personal communication, 2013) and have also been measured at the Finnish coastal Antarctic site Aboa (Asmi et al., 2010). The coastal N 14 low bias could therefore be due to most models' nucleation parametrisations not forming new particles efficiently in the boundary layer. The other site with a low bias is Southern Great Plains in rural continental USA. As shown in Table 2, most of the model nucleation parametrisations do not generate particles efficiently in the boundary layer, and such boundary layer nucleation mechanisms have been shown to represent a substantial source of small particles in rural continental environments (e.g. Kulmala et al., 2004; Spracklen et al., 2006, 2008).
Figure 6 (caption fragment): [...] against CPC observations at all 13 GAW sites. Different size thresholds are used for each site corresponding to the cut-off diameters for the CPC used (3 nm at Cape Grim, Hohenpeissenberg; 10 nm at Jungfraujoch, Mauna Loa, Mace Head, Southern Great Plains, Pallas; and 14 nm at South Pole, Neumayer, Barrow, Samoa, Trinidad Head, Bondville). The model values are geometric means over the central 8 models with the vertical whisker indicating their range. For the observations, the multi-annual mean is shown with the horizontal whisker showing plus and minus the standard deviation over the several years of data shown in Table 3.
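The model values compared with the CPC data are concentrations of particles larger than each instrument's cut-off diameter. As a minimal sketch, assuming a model that represents the size distribution with lognormal modes (sectional schemes would instead sum over size bins, with partial bins interpolated), the concentration above a cut-off follows from the cumulative lognormal distribution. The function name and the mode parameters are hypothetical.

```python
import math

def number_above(d_cut, modes):
    """Number concentration of particles with dry diameter above d_cut.

    d_cut: cut-off diameter (same unit as the mode diameters)
    modes: iterable of (number, median_diameter, geometric_std_dev) tuples,
           one per lognormal mode
    """
    total = 0.0
    for number, d_g, sigma_g in modes:
        z = math.log(d_cut / d_g) / (math.sqrt(2.0) * math.log(sigma_g))
        # Fraction of the mode lying above d_cut is 0.5 * erfc(z).
        total += 0.5 * number * math.erfc(z)
    return total

# Illustrative three-mode distribution: (cm^-3, nm, dimensionless).
modes = [(800.0, 8.0, 1.6), (600.0, 50.0, 1.6), (150.0, 180.0, 1.4)]

n3 = number_above(3.0, modes)    # threshold quoted for Cape Grim
n10 = number_above(10.0, modes)  # threshold quoted for e.g. Mace Head
n14 = number_above(14.0, modes)  # threshold quoted for e.g. South Pole
```

For a distribution with a populated nucleation mode, as in this illustration, the three thresholds give noticeably different totals, which is why each site is compared using its own cut-off.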
Annual cycles of total particle number at the GAW sites are shown in Figs. 7-9. Considering the free troposphere sites, Mauna Loa in Hawaii (19° N) has no significant seasonal variation (Fig. 7b), whereas Jungfraujoch (Fig. 7a) and South Pole (Fig. 7c) have clear seasonal cycles with summer total particle concentrations higher than in winter by factors of about 2 and 10 respectively. At South Pole, this seasonal cycle in N 14 is likely driven by the strong seasonal variations in DMS seawater concentration and photochemistry although seasonal transport effects are also a likely contributor (Bodhaine et al., 1986). The central model mean captures the South Pole seasonal cycle in N 14 very well (R = 0.95) albeit with a slight high bias (b = 0.39, as seen in Fig. 6), which worsens during winter. At Jungfraujoch, the seasonal cycle likely reflects stronger photochemistry during the summer, leading to higher gas phase H 2 SO 4 concentrations or organic vapours which will tend to give higher nucleation rates at the site (Boulon et al., 2010). Increased pollution and transport from lower altitudes during the summer will also be an important influence. The models also show elevated N 10 during summer at Jungfraujoch, although the central-8 mean model shows a moderate low bias (b = −0.13) over the full year.
For the marine boundary layer GAW sites, the strong seasonal cycle at the Antarctic coastal site Neumayer (Fig. 8b) is well captured by the multi-model mean (R = 0.92), with a low bias (as seen in Fig. 6) apparent throughout the year (b = −0.51). However, at the Alaskan site Barrow, although the central-mean model compares fairly well with observations on the annual mean, the seasonal cycle is not well captured (R = 0.22), with the models highest in May when the observations show a local minimum (Fig. 8c). Simulating Arctic aerosol is challenging because of the complex factors that lead to the formation of the Arctic haze observed in late winter and early spring (e.g. Quinn et al., 2002). The poor model performance is consistent with the findings of previous studies, which have highlighted the importance of seasonal variations in scavenging processes and local nucleation (Browse et al., 2012, 2013; Bougeois and Bey, 2011; Garrett et al., 2010; Huang et al., 2010; Korhonen et al., 2008a; Liu et al., 2011). At Mace Head (Fig. 8a), simulated particle concentrations are biased low (b = −0.48) as seen on the annual mean in Fig. 6, and the models also do not capture the observed concentration peaks in May and September (R = 0.22). At Cape Grim (Fig. 8f), the N 3 seasonal cycle (over all air masses) is fairly flat despite there being an established strong influence of DMS on the N 3 and CCN seasonal cycle from the marine air mass sector (e.g. Ayers and Gras, 1991; Korhonen et al., 2008b). At the other two sites, Samoa in the Pacific (Fig. 8d) and Trinidad Head on the US California coast (Fig. 8e), the observations show no clear seasonal cycle, but the models have highest concentrations in late summer at Trinidad Head, which is not seen in the observations.
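The skill scores quoted for the annual means and seasonal cycles can be sketched as follows. The normalised mean bias is assumed here to be the sum of model-minus-observation differences divided by the sum of the observations, which is a common definition but may not match the one adopted in this study; the function names and the monthly values are hypothetical.

```python
import numpy as np

def pearson_r(model, obs):
    """Pearson correlation coefficient between model and observed values."""
    return float(np.corrcoef(model, obs)[0, 1])

def normalised_mean_bias(model, obs):
    """Assumed definition: sum(model - obs) / sum(obs)."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sum(model - obs) / np.sum(obs))

# Hypothetical monthly means (cm^-3): the model has the right seasonal
# shape but simulates only half of the observed concentration.
obs = np.array([150, 120, 100, 90, 80, 70, 70, 90, 150, 250, 300, 250], dtype=float)
model = 0.5 * obs

r = pearson_r(model, obs)
b = normalised_mean_bias(model, obs)
```

This illustrates why the two scores are reported together: a model can reproduce the shape of the seasonal cycle (R close to 1) while being biased low throughout the year (b = −0.5 in this illustration), as described for Neumayer.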
At the continental boundary layer sites (except for Southern Great Plains), the central-8 model mean agrees well with the observations on the annual mean (Fig. 6). The weak seasonal N 14 variation at Bondville (Fig. 9b) and Hohenpeissenberg (Fig. 9d) is also well captured by the central-8 model mean, although the models predict a peak at Hohenpeissenberg during March that is outside the observed multi-year mean plus or minus standard deviation (1995 to 2005). At Pallas (Fig. 9c), the observations show a strong seasonal variation, with monthly mean N 10 concentrations around a factor of 3 higher in spring and summer than in winter. The central-8 mean model particle concentration peaks in spring rather than summer, and the variation is weaker than in the observations (by about a factor 2). Spracklen et al. (2010) found that including a boundary layer nucleation mechanism improves the seasonal variation in particle concentrations at continental sites, particularly at Pallas, although simulated concentrations tend to peak in spring whereas the observations show a peak in summer. Secondary organic aerosol has been shown to strongly influence new particle formation rates (e.g. Metzger et al., 2010) and Scott et al. (2014) examined the seasonal cycle in N 80 at Hyytiala and Pallas, showing that the observed summertime peak in particle concentrations could be much better reproduced in their model when an organic-mediated nucleation parametrisation was used.
www.atmos-chem-phys.net/14/4679/2014/ Atmos. Chem. Phys., 14, 4679-4713, 2014
Figure 7. Simulated annual cycle in surface N (D p > 10 nm/14 nm) against CPC observations at free troposphere GAW sites Jungfraujoch (10 nm), Mauna Loa (10 nm) and South Pole (14 nm). The solid line is the geometric mean over the central two-thirds of models in each month, with the dashed lines the minimum and maximum over those central-8. The dotted line shows the minimum and maximum over all 12 models. The error bars on the observations indicate the standard deviation over the several years of data shown in Table 3.
Figure 8. Simulated annual cycle in surface N (D p > 10 nm/14 nm/3 nm) against CPC observations at marine boundary layer GAW sites Mace Head (10 nm), Neumayer (14 nm), Barrow (14 nm), Samoa (14 nm), Trinidad Head (14 nm) and Cape Grim (3 nm). The solid line is the geometric mean over the central two-thirds of models in each month, with the dashed lines the minimum and maximum over those central-8. The dotted line shows the minimum and maximum over all 12 models. The error bars on the observations indicate the standard deviation over the several years of data shown in Table 3.

Size-resolved number concentrations at EUSAAR/GUAN sites
Figure 10 compares the mean of the central two-thirds models with observed size-resolved particle concentrations at 17 low-altitude sites in the EUSAAR/GUAN network (Asmi et al., 2011). The seven sites above 900 m altitude were omitted as these tend to be affected by local factors, for example daily variations from polluted air masses from lower altitudes (Asmi et al., 2011), which are unlikely to be captured at the coarse resolution used in the global models. Asmi et al. (2011) analysed the EUSAAR/GUAN observations, presenting percentiles of the size distributions and size-resolved number concentrations from the hourly measurements. However, since the model results are monthly means, i.e. an arithmetic mean over values at all time steps, we compare here against an arithmetic mean over the hourly observations (A. Asmi, personal communication, 2012). In the full size distribution comparisons, the median observed values are also shown for reference (from Asmi et al., 2011). At most sites, the median and mean observed values are similar at sizes larger than 100 nm, but at Aitken mode sizes (10 to 100 nm), the median is much lower than the mean, suggesting that the Aitken mode is temporally the more variable of the two modes.
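The link between temporal variability and the gap between mean and median can be illustrated with synthetic data. The sketch assumes, purely for illustration, that hourly concentrations are lognormally distributed in time, in which case the ratio of arithmetic mean to median is exp(σ²/2), where σ is the standard deviation of the logarithm of the concentration. The distribution parameters below are not fitted to any site.

```python
import numpy as np

def lognormal_mean_to_median(sigma_ln):
    """Ratio of arithmetic mean to median for a lognormal variable,
    where sigma_ln is the standard deviation of ln(x)."""
    return float(np.exp(0.5 * sigma_ln**2))

# Synthetic "hourly" concentrations with the same median (500 cm^-3)
# but different temporal variability.
rng = np.random.default_rng(0)
steady = rng.lognormal(mean=np.log(500.0), sigma=0.3, size=200_000)
bursty = rng.lognormal(mean=np.log(500.0), sigma=1.0, size=200_000)

ratio_steady = steady.mean() / np.median(steady)  # close to exp(0.045)
ratio_bursty = bursty.mean() / np.median(bursty)  # close to exp(0.5)
```

Under this assumption, a series with occasional large excursions has a mean well above its median, so comparing monthly-mean model output against observed medians would make the models appear biased high at the more variable sizes.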
Simulated N 30 (Fig. 10a) is very diverse among the central 8 models at most of these European sites, more so than for N 100 (Fig. 10c), more than 50 % of their mean at many sites. However, at the Arctic site Zeppelin N 30 diversity is lower than for N 100, consistent with the spatial distribution in diversity seen in Fig. 2. Despite this large model diversity however, as seen for the comparisons to the CPC measurements (Fig. 6), the central two-thirds model mean generally compares quite well with the observations on the annual mean, with R = 0.80, 0.80, 0.78 and b = −0.19, −0.23, −0.36 for N 30, N 50, N 100 respectively over the full set of sites. At all sites, except Ispra (which is strongly influenced by local pollution sources) and the Arctic site Zeppelin, the central mean is within a factor of 2 of the observations for all three size ranges on the annual mean. Aside from Zeppelin, the N 100 particle concentrations have lower diversity and also generally compare better with the measurements than N 30 and N 50. This suggests that CCN concentrations (which can be approximated by N 50) are more diverse among the models than are aerosol optical properties (which are mainly influenced by particles larger than 100 nm). It is noticeable however that Ispra and Preila have a stronger low bias at N 100 than N 30.
Simulated size-resolved number concentrations across the full annual cycle are compared to the EUSAAR/GUAN observations in Fig. 11 (Nordic and Baltic sites), Fig. 12 (western European, Mediterranean and Arctic sites) and Fig. 13 (central European sites). Figure 14 summarises these seasonal cycle comparisons in terms of the winter and summer bias (model divided by observed) for each site.
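A seasonal bias of the "model divided by observed" kind can be sketched as below. The sketch assumes that winter and summer mean December to February and June to August, and that the ratio is taken between seasonal means rather than averaged month by month; both choices, the function name and the monthly values are assumptions made for illustration.

```python
import numpy as np

# Month indices (0 = January); DJF and JJA are assumed season definitions.
WINTER = [11, 0, 1]
SUMMER = [5, 6, 7]

def seasonal_bias(model_monthly, obs_monthly, months):
    """Ratio of modelled to observed seasonal-mean concentration."""
    model = np.asarray(model_monthly, dtype=float)[months]
    obs = np.asarray(obs_monthly, dtype=float)[months]
    return float(model.mean() / obs.mean())

# Hypothetical monthly-mean N30 (cm^-3): a flat model cycle against
# observations that peak in summer.
model_n30 = np.full(12, 1000.0)
obs_n30 = np.array([500, 500, 800, 1200, 1600, 2000,
                    2000, 2000, 1600, 1200, 800, 500], dtype=float)

winter_bias = seasonal_bias(model_n30, obs_n30, WINTER)  # above 1: biased high
summer_bias = seasonal_bias(model_n30, obs_n30, SUMMER)  # below 1: biased low
```

In this illustration the annual means of the model and the observations are of similar size, yet the winter and summer ratios differ by a factor of 4, which is the kind of compensating seasonal error that an annual-mean comparison hides.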
From Fig. 10 we have seen that, on the annual mean, at the Nordic and Baltic sites, the central two-thirds mean is in good agreement with the observations for N 30, N 50 and especially N 100. However, the seasonal cycle is less well captured at these sites (Fig. 11). In particular, for several of the sites, the central-8 mean model N 30 is mostly biased high during the winter (see also Fig. 14) and biased low during the summer. This discrepancy is similar to the total particle concentration comparison at Pallas (Fig. 9c), with the multi-model value having a fairly flat seasonal variation whereas the observations show concentrations at least a factor of 2 higher in summer than winter. By contrast, the central-8 model mean captures the seasonal variation in N 100 much better. For many of the models (see Table 2), binary homogeneous nucleation is the only new particle formation mechanism, and this may explain the poor seasonal variation of N 30 in the models. As already noted, Spracklen et al. (2010) found that, in model simulations with only binary nucleation, although adjustments to the assumed size distribution for primary emissions could reproduce observed annual mean concentrations of the finest particles at Pallas, better agreement with the observed seasonal cycle could be achieved by also including a nucleation mechanism effective in the boundary layer.
At the Arctic EUSAAR site (Fig. 12f), the observations show that there is a substantial shift in the particle size distribution in the winter and early spring compared to the rest of the year. High accumulation mode concentrations (N 100) are observed between January and April (the Arctic haze season) whereas Aitken mode particles (N 30) are highest during summer. In contrast, for the central-8 mean model, N 30 and N 100 have very similar seasonal cycles. Possible reasons for this model-observation discrepancy are that the models do not represent seasonal changes in long-range transport and have only a limited representation of scavenging by drizzle, which has also been shown (Browse et al., 2012) to be an important control for simulated Arctic aerosol during summer. Local particle sources (missing in most models) have also been shown to exert important controls on Arctic aerosol properties, for example marine primary organic aerosol (e.g. Leck and Bigg, 2005) or boundary layer new particle formation (Browse et al., 2013).
At Harwell (Fig. 12b), the central-8 mean model N 30 and N 100 agree quite well with the observations (R = 0.42, 0.11 and b = 0.36, −0.01). At Cabauw (Fig. 12a) the central-8 mean agrees quite well with N 100 (R = 0.44, b = −0.28), whereas at Mace Head (Fig. 12c) the models strongly underpredict N 100 (b = −0.48), with observed peaks in December, February, May and September not captured by any of the central models (R = 0.27). As seen for most of the Nordic and Baltic sites, at both Cabauw and Mace Head, the central-8 mean model underestimates N 30 during summer. Mace Head has been shown to be influenced by coastal new particle formation events (e.g. O'Dowd et al., 1998) which will not be well represented in the global models, and this could explain some of the strong underprediction of particle concentrations during the summer. By contrast, new particle formation episodes are much less frequent at Harwell, occurring on only around 5 % of observation days (Charron et al., 2007).
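The R and b values quoted for each site can be illustrated with a short sketch. It assumes that R is the Pearson correlation coefficient and b the normalised mean bias over the twelve monthly values; those definitions are not restated in this section, so treat them as an assumption. The function names and the monthly numbers are hypothetical.

```python
import numpy as np

def pearson_r(model, obs):
    """Pearson correlation coefficient between two monthly series."""
    return float(np.corrcoef(model, obs)[0, 1])

def normalised_mean_bias(model, obs):
    """Sum of (model - obs) divided by the sum of obs (assumed definition of b)."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sum(model - obs) / np.sum(obs))

# Made-up monthly N100 values (cm^-3), for illustration only.
obs_n100 = np.array([900, 850, 800, 700, 650, 600,
                     600, 620, 700, 780, 850, 900], dtype=float)
model_n100 = 0.5 * obs_n100  # a model that is exactly a factor of 2 too low

r = pearson_r(model_n100, obs_n100)
b = normalised_mean_bias(model_n100, obs_n100)
```

In this constructed case the seasonal cycle is reproduced perfectly while the magnitude is halved, which is why the two statistics are reported together.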
We saw in Fig. 10 that the models underpredict particle concentrations at Ispra for all three size ranges. In Fig. 12e, it is clear that the low bias at this site is apparent throughout the year, with the accumulation mode (represented by N 100) particularly strongly underestimated (b = −0.74), and even the highest of the central models being too low. Very high N 100 is observed during winter, likely reflecting local boundary layer trapping of nearby pollution sources adjacent to steep orography, which will tend to be poorly represented at the coarse resolution of the global models. Another source of error in N 100 could be that most of the models do not represent nitrate aerosol, which efficiently partitions into the particle phase during the colder winter months (e.g. Adams et al., 1999), although this alone is unlikely to explain such a large N 100 discrepancy.
At the five central European sites (Fig. 13), the central-8 model mean N 30 compares quite well to the observations over the annual cycle.However, at several of these central European sites (Bosel, Kosetice, Melpitz, Waldhof), the observed N 30 shows a local maximum in April or May that is not seen in the models.For N 100 there is quite good agreement at the five sites during summer, with a weak low bias, but there is a much larger low bias during winter at many of the sites, as was also seen at Ispra.
An overview of the summer and winter N 30, N 50 and N 100 biases against the measurements is shown in Fig. 14. As seen for the annual mean comparisons, aside from Ispra (JRC) and Zeppelin (ZEP), modelled N 100 is generally in good agreement with the observations during summer. During winter however, modelled N 100 is biased low at many sites, which could indicate missing number sources at those sizes or insufficient growth from smaller sizes. Aquila et al. (2011) evaluated a global aerosol microphysics model against a different set of European size distribution measurements (Van Dingenen et al., 2004) and also found that, in the accumulation mode, number concentrations had a strong low bias during winter but were in much better agreement during summer. Nitric acid partitions into the particle phase during winter forming an important component of the sub-µm particle mass (e.g. Adams et al., 1999), and this may account for some of the missing mass. Tsigaridis et al. (2014) find a general underprediction of wintertime organics which will also contribute to this model accumulation mode low bias.
For N 30 the agreement is also reasonable; however, the median model often has a high bias during winter and a low bias during summer. This was also seen for the total particle concentrations comparison for Pallas (see Fig. 9c) with a flat seasonal cycle in the models whereas the observations showed greatly enhanced concentrations during the summer. A factor that could explain some of this bias is that many of the models may have used too small a particle size when characterising primary emissions, leading to a high bias in particle number emissions derived from the emitted mass flux. This would lead to too many particles in the Aitken sizes and too few in the accumulation mode, which would also be consistent with the N 30 and N 100 biases seen at many of these continental sites. There is a clear need for improved understanding of primary and secondary particle sources, and better constraints for model assumptions for the size of primary emitted particles. Future studies are needed to carry out more detailed comparisons of the model size distributions to the new measurements from the EUSAAR/GUAN supersites. For example, these could examine probability density functions over high temporal resolution model and observed data sets and apply cluster analysis techniques (e.g. Beddows et al., 2009), such as have already been applied to the EUSAAR/GUAN sites (Beddows et al., 2014).

Sub-µm size distributions at European surface sites
Figures 15-17 compare simulated particle size distributions against the SMPS/DMPS measurements at the EUSAAR and GUAN sites.The upper panels (a-f) are for summer with the lower panels (g-l) showing winter.Model size distributions are derived from the different complexity models following the methodology described in Sect.2.3.When comparing the multi-model size distribution to the measurements, one should compare the red solid line (central model geometric mean) to the black solid line, which shows the arithmetic mean over the hourly observations for that season.The observed median (dot-dashed black) and 5th to 95th percentile ranges (grey shading) as published by Asmi et al. (2011) are also shown for reference.
Where there is a large difference between the observed median and mean size distributions, it is indicative that the site experiences large temporal variability in particle number concentrations.Many of the sites show such large variability in the Aitken size range, and at some sites (e.g.Hyytiala, SMR) this may indicate that nucleation events (e.g.Kulmala et al., 2004) frequently affect that part of the size range.Such variability can also exist when a site experiences diverse air mass types.For example, at Mace Head (MHT) there is large variation across the Aitken and accumulation size range, which is likely due to the site experiencing episodes of polluted air from mainland Europe as well as the more frequent clean air from the North Atlantic.
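The link between a mean-median gap and temporal variability can be seen in a toy example. The hourly values below are made up and are not taken from any of the sites; they simply mimic a steady background with occasional burst-like episodes.

```python
import numpy as np

# Made-up hourly Aitken-size concentrations (cm^-3): 90 background hours
# and 10 hours of strongly enhanced concentrations.
hourly = np.array([500.0] * 90 + [20000.0] * 10)

mean_n = hourly.mean()        # arithmetic mean, pulled up by the episodes
median_n = np.median(hourly)  # median, stays at the background level
ratio = mean_n / median_n     # a large ratio signals episodic variability
```

With these numbers the mean is several times the median even though 90 % of the hours sit at the background value, which is the signature described above.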
At Nordic and Baltic EUSAAR sites, in summer the multimodel geometric-mean size distribution (red line) compares well to the observations (solid black line) in the accumulation mode (except for Preila) but tends to be biased low in the Aitken size range (Fig. 15).At most of these sites, the maximum over the central-8 models (dashed line) compares better to the observed size distribution below 200 nm dry diameter.This indicates that some models are better able to capture the size distribution at these sites and sizes.In winter however, the multi-model mean overestimates the concentration of Aitken particles and the central-8 model maximum is biased very high (by up to a factor 10).By contrast, the models' wintertime accumulation mode has a strong low bias, which can be interpreted either as a substantial underprediction of particle growth or as an underprediction of particle sources at these sizes.To grow these particles sufficiently to match the observations however, would require about a factor 2 increase in diameter, equivalent to a factor 8 increase in mode mass, suggesting that missing number is an important component.
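The factor of 8 quoted above follows from mass scaling with the cube of diameter. As a brief worked step, assuming spherical particles of fixed density ρ and a conserved number concentration N during growth (symbols introduced here for illustration, with M the mode mass and D the diameter):

```latex
\frac{M_2}{M_1}
  = \frac{N \rho \,\tfrac{\pi}{6}\, D_2^{3}}{N \rho \,\tfrac{\pi}{6}\, D_1^{3}}
  = \left(\frac{D_2}{D_1}\right)^{3}
  = 2^{3} = 8
```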
At the central European sites (Fig. 16a-f), there is good agreement between the modelled and observed accumulation mode in summer.The summertime Aitken mode low bias seen at Nordic and Baltic sites is much less in central Europe, although the multi-model mean is still slightly low.In wintertime (Fig. 16g-l), the Aitken mode compares quite well with, if anything, a slight high bias at some sites.However, the wintertime accumulation mode low bias seen in the Nordic and Baltic sites is very evident here.
At Harwell during summer (Fig. 17b) the multi-model mean compares very well with the observations across the entire size range, but in winter (Fig. 17h) there is much too little number (and mass) in the accumulation mode and too much number below 200 nm dry diameter. At Mace Head and Cabauw during summer (Fig. 17a and c), agreement is good above 200 nm dry diameter, but number is strongly underestimated in the Aitken mode size range (10 to 100 nm) at both sites, although the size of the Aitken mode peak is well represented. The summertime Aitken low bias, and the high variability in the Aitken size range (difference between the solid and dot-dashed black lines in Fig. 17a), would be consistent with biogenic nucleation events occurring during summer as observed frequently at the coastal Mace Head site (e.g. O'Dowd et al., 2007). At Cabauw however, the median and mean size distribution are similar across the size range, suggesting that a more uniform particle source is missing or underestimated in the models. Also, considering Fig. 17g and i, whereas Mace Head compares better in the Aitken mode during winter, the Cabauw Aitken mode low bias is present in both seasons, suggesting that the cause of the model-observation discrepancy may be different between the two sites. As noted in the discussion around Fig. 12b, new particle formation events are rather infrequent at Harwell (Charron et al., 2007), and the better agreement there is consistent with such secondary particle production not being well captured by the models.
At the Arctic site Zeppelin, during summer (Fig. 17f), the multi-model mean has a low bias across the size range, although the models do capture the observed shape of the size distribution with the Aitken mode peak being around a factor 2 higher than the accumulation mode peak. During winter however (Fig. 17l), the observations suggest that the Aitken peak is a factor 10 higher than the accumulation mode peak, whereas the multi-model mean predicts a ratio of less than 2. The observed 5th to 95th percentile range suggests that very high particle concentrations are sporadically observed at around 20 nm dry diameter, which indicates a strong local nucleation or ultrafine particle source, which none of the central models capture.
At the Mediterranean site Finokalia, the multi-model mean compares well with the observations in both summer and winter (Fig. 17d and j).The good agreement in the accumulation mode at this site is consistent with the model wintertime accumulation mode low bias seen at other sites being caused by semi-volatile organics or nitrate since the warmer conditions at Finokalia will mean these species will tend not to partition into the particle phase there.At Ispra (Fig. 17e  and k), the previously identified very strong wintertime accumulation mode low bias is clearly evident, likely due to boundary layer trapping of local pollution sources.During the summer there is a more moderate low bias across both Aitken and accumulation size particles.

Vertical profile of size distribution over Europe
Figure 18 compares the models against a compilation of aircraft measurements of size-resolved particle concentrations from the LACE 98 field campaign (Petzold et al., 2002). The measurements comprise vertical profiles of N 5 and N 15 from two CPCs, and N 120 from integrating the size distribution measured by the PCASP instrument (as presented by Lauer et al., 2005). For this comparison, the model data for August were interpolated to 14.0° E, 52.1° N, the mid-point of the relatively small region of the flights (13.5-14.5° E, 51.5-52.7° N, Lauer et al., 2005). The model vertical profiles were then interpolated onto a common pressure grid between 950 and 220 hPa.
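The vertical regridding step might look like the following minimal sketch. The text does not say which interpolation was used, so linear interpolation in log-pressure is an assumption here, as are the function name, the 20-level spacing of the common grid and the profile values.

```python
import numpy as np

def to_common_pressure_grid(p_model_hpa, n_model, p_target_hpa):
    """Interpolate one model profile onto target pressure levels (hPa)."""
    p_model_hpa = np.asarray(p_model_hpa, dtype=float)
    n_model = np.asarray(n_model, dtype=float)
    # np.interp requires increasing x, so sort the levels by pressure first.
    order = np.argsort(p_model_hpa)
    x = np.log(p_model_hpa[order])
    y = n_model[order]
    # Targets outside the model's pressure range become NaN (no extrapolation).
    return np.interp(np.log(p_target_hpa), x, y, left=np.nan, right=np.nan)

# Common grid between 950 and 220 hPa; the number of levels is an assumption.
p_common = np.linspace(950.0, 220.0, 20)

# Hypothetical model profile on its own levels, surface first (cm^-3, made up).
p_levels = np.array([1000.0, 900.0, 700.0, 500.0, 300.0, 200.0])
n120 = np.array([400.0, 350.0, 150.0, 60.0, 30.0, 20.0])

n120_common = to_common_pressure_grid(p_levels, n120, p_common)
```

Putting every model on the same levels is what allows a multi-model mean and range to be formed level by level.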
The modelled accumulation mode particle concentrations (represented here by N 120) capture the vertical profile well (Fig. 18c), although throughout the lowest few kilometres most of the models have a considerable low bias (b = −0.48 for the central two-thirds model mean). For particle concentrations at the smallest sizes (N 5 and N 15), the central two-thirds model mean is also biased low in the boundary layer, but is biased high (around a factor of 5) in the free and upper troposphere. Within the boundary layer the observations show a sharp increase in N 5 and N 15 towards the surface that is not captured by the central models, likely due to nucleation being underestimated. The observations also suggest only a weak peak in N 5 in the upper troposphere, with maximum concentrations of about 800 cm−3, whereas the models predict a strong peak, with a central-8 mean of about 2300 cm−3 and a range of 900-9000 cm−3.

Marine boundary layer size distributions
Marine boundary layer (MBL) particle size distribution measurements from Heintzenberg et al. (2000), based on 30 yr of field campaigns, are shown for the Southern Hemisphere (Fig. 19) and Northern Hemisphere (Fig. 20). To summarise these comparisons, in Fig. 21 we compare the models' simulated number and size in the Aitken and accumulation modes to observed values shown in Heintzenberg et al. (2000), which were derived via log-normal fits to the size distribution measurements. The data compilation is based on 15° longitude by 15° latitude averages of ship-borne measurements using Differential Mobility or Aerodynamic Particle Sizers (DMPS/APS) over several field campaigns (see Fig. 5). To derive equivalent size distributions from the models, the number concentration and representative dry diameters for each model's size class were averaged over marine grid boxes in each of the 15° by 15° regions.
The observations show that accumulation mode number concentrations are approximately symmetric across the equator, while Aitken mode particle concentrations are around a factor of 2 higher in the Southern Hemisphere than in the Northern Hemisphere. The measurements also show that typical sizes of both Aitken and accumulation modes are around 25 % larger in the Northern Hemisphere, implying a factor 2 higher particle volume concentration, approximately matching observations of sulfate mass.
In the Southern Hemisphere (Fig. 19), the central models capture the general variation of the boundary layer size distribution, with the observed minimum between the Aitken and accumulation modes (e.g. Hoppel et al., 1994) at around the right size, although peak concentrations in both modes are biased low by about a factor 2 south of 30° S. The shift in the Aitken-accumulation mode dN / d log10 r ratio is also well captured, with the Aitken mode peak stronger than the accumulation mode south of 30° S, whereas these two size distribution peaks are of similar magnitude between 30° S and the equator.
In the Northern Hemisphere (Fig. 20), the multi-model mean size distribution is rather flat, which likely indicates that the models do not agree on the position of the Hoppel gap rather than the models predicting a flat size distribution across the Aitken and accumulation size ranges. At latitudes > 30° N, the central model range of MBL number concentrations in these two modes agrees quite well with the observations, but is biased low between 0 and 30° N. The general shift in the Aitken-accumulation dN / d log10 r ratio is again well captured, with the two peaks approximately equal at low latitudes and the Aitken mode peak much stronger at high latitudes.
Figure 21 compares the meridional variation of N 10 (Fig. 21a) and particle concentrations in the Aitken (Fig. 21b) and accumulation (Fig. 21c) size ranges.The comparisons show that although the general variation of the size distributions is well captured, the models predict higher MBL particle concentrations in the Northern Hemisphere than in the Southern Hemisphere whereas the observations show the reverse.
A general finding across all the models is that Aitken mode particle concentrations are underpredicted in Southern Hemisphere mid-latitudes and overpredicted in Northern Hemisphere mid-latitudes.The Southern Hemisphere low bias in Aitken mode particle concentrations has also been found in multi-model comparisons of sectional (Trivitayanurak et al., 2008) and modal schemes (Zhang et al., 2010).Pierce and Adams (2006) found the bias was much reduced by using sea-spray source functions which capture the observed  efficient emission at ultrafine particle sizes (e.g.Martensson et al., 2003;Clarke et al., 2006).The meridional variation of accumulation mode concentrations is better captured with good agreement in the Northern Hemisphere, but a low bias in the Southern Hemisphere mid and high latitudes.As noted by Spracklen et al. (2007), it is also important to realise that most of the Southern Hemisphere cruise measurements in the Heintzenberg et al. (2000) observation climatology were taken during the summer.So some of the apparent low bias in Aitken and accumulation mode concentrations there may just be reflecting a sampling bias with higher concentrations tending to be observed and modelled (not shown) during the summer.

Vertical profile of particle concentrations in marine regions
Figure 22 compares vertical profiles of total particle concentrations (N 3) over the Pacific and Southern Oceans against profiles compiled from aircraft measurements (Clarke and Kapustin, 2002). These measurements were produced from ultrafine condensation particle counter (u-CPC) measurements over several field campaigns (GLOBE-2: May 1990, ACE-1: November 1995, PEM-Tropics A: September 1996 and PEM-Tropics-B: March 1999), and compiled as three separate climatological profiles for the Southern Hemisphere (70-20° S), tropical regions (20° S-20° N) and the Northern Hemisphere (20-70° N).
In the free and upper troposphere, over all three marine regions, the central models capture the vertical N 3 profile very well, with relatively small inter-model diversity. This agreement is in contrast to Europe, where the models overestimate particle concentrations (Fig. 18). The observed maximum in particle concentrations (which reflects the balance between particle production via nucleation and loss via coagulation) is captured very well by the central-8 model mean in the Northern and Southern Hemisphere regions, although it is biased slightly low in the Tropics. The central-8 model mean captures boundary layer N 3 concentrations well in the Tropics and particularly the Northern Hemisphere, although there is a slight low bias compared to the aircraft measurements in the Southern Hemisphere. Considering the full model range, one model is showing a factor 20-50 too high particle concentrations, which could indicate too high sulfuric acid vapour concentrations or that the nucleation parametrisation is producing particles much too efficiently. The lowest model has N 3 a factor 10 too low throughout the free troposphere. Since N 3 is dominated by secondary particles from new particle formation, the low bias could be due to an aerosol surface area high bias in the free troposphere, which would give too low simulated sulfuric acid concentrations and nucleation rates. Lee et al. (2011) considered the effect on simulated CCN concentrations of co-varying eight parameters in a global aerosol microphysics model, showing that in the European free troposphere, simulated CCN concentrations are highly sensitive to parameters associated with the treatment of nucleation scavenging.

Figure 21. Meridional variation of central-model simulated N 10, Aitken mode and accumulation mode particle concentrations in the marine boundary layer, compared with a compilation of observations from cruise measurements (Heintzenberg et al., 2000). The observed values were derived from fitting modes to the full size distributions, whereas the model Aitken and accumulation mode concentrations are here calculated as mean N 10 − N 100 and N 100 respectively, averaging over all marine grid boxes in each latitude band. The solid line shows the geometric mean of the central-8 models, dashed lines indicate the maximum and minimum of the central-8, while dotted lines indicate the maximum and minimum over all 12 models.

Conclusions
We have carried out the largest ever intercomparison of model simulated size distributions among the new generation of global aerosol microphysics models.Twelve global microphysics models have participated in the coordinated experiments within the AeroCom multi-model intercomparison initiative.We have derived benchmark multi-model data sets based around the mean of the central two-thirds of these models which provides a best estimate of global variation of the sub-µm particle size distribution, critical for understanding aerosol-climate interactions.These multi-model data sets will also serve as a useful reference to assist in model development.
An assessment of the diversity of the central two-thirds of models has identified regions where the models agree and disagree in terms of their predictions of size-resolved particle concentrations and mass concentrations of BC and sulfate.The different patterns of diversity can be explained by dominating aerosol processes and their associated uncertainty.In regions of strong anthropogenic emissions, the diversity of simulated number concentrations of particles larger than 30 nm dry diameter (N 30 ) is very high (factor 2 to 6), while the diversities of N 100 (factor 1.5 to 2) and of sulfate and BC mass concentrations (factor 1.2 to 3) are lower.The high N 30 diversity in emissions regions is most likely due to intermodel differences in the size distribution assumed for primary emitted particles, which is a key parameter in need of better observational constraint.In remote marine regions, the pattern of size-resolved diversity is opposite to polluted regions, with N 30 diversity (factor 1.5 to 2) much lower than for N 100 (factor 2 to 5), sulfate (factor 2 to 4) and BC (factor 5 to 15).The relatively low N 30 diversity in remote environments suggests that current global aerosol microphysics models are fairly consistent in their simulation of "natural" background concentrations of particles in the 30 to 100 nm dry diameter range.Model diversity is highest in polar regions, where N 30 diversity reaches a factor 2 to 7 and N 100 diversity a factor 6 to 20.
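The central two-thirds mean and the diversity measure can be sketched as below. The figure captions state that the central 8 of the 12 models are chosen locally, that the geometric mean is used and that diversity is the max/min ratio over those 8. Selecting the central 8 by discarding the two highest and two lowest values at each grid point is this sketch's reading of Sect. 2.4 and should be treated as an assumption, as should the function name and the data.

```python
import numpy as np

def central_mean_and_diversity(values, n_drop=2):
    """values: array of shape (n_models, ...) holding positive concentrations."""
    # Rank the models independently at every grid point (axis 0 = models).
    ranked = np.sort(np.asarray(values, dtype=float), axis=0)
    # Keep the central models by dropping n_drop from each end (assumption).
    central = ranked[n_drop:ranked.shape[0] - n_drop]
    geo_mean = np.exp(np.mean(np.log(central), axis=0))
    diversity = central.max(axis=0) / central.min(axis=0)
    return geo_mean, diversity

# Twelve made-up model values at two grid points (one column per point).
vals = np.column_stack([
    [100, 3, 6, 1, 3, 6, 50, 3, 6, 2, 3, 6],
    [5] * 12,
]).astype(float)

geo_mean, diversity = central_mean_and_diversity(vals)
```

Because the ranking is done per grid point, a different set of eight models can make up the central group in different regions, and outliers such as the 50 and 100 in the first column do not affect either statistic.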
Although there is large model diversity, the central models in general capture well the global variation of the size distribution.For example, the mean of the central two-thirds models agrees very well with observed total particle concentrations at Global Atmosphere Watch sites on the annual mean.Exceptions are poor agreement at the Arctic site Barrow, moderate high biases at South Pole and moderate low biases at Samoa, Mace Head, Neumayer and Southern Great Plains.For this central two-thirds mean, agreement is reasonable against particle size distributions over Europe, aside from the Arctic site Zeppelin, and Ispra, which is strongly affected by nearby pollution sources and steep orography, features not expected to be well captured by the global models.However, there are some important biases common among the models at many of the EUSAAR/GUAN sites.For example there is a strong underprediction of accumulation mode particle concentrations during winter, which is likely due to inadequately constrained particle number sources (both primary and secondary) or underprediction of growth due to a general underprediction of wintertime sources of mass (for example from secondary organic aerosol), or both.The results also show that model Aitken mode concentrations are too high during winter and too low during summer, which may reflect an underprediction of particle growth (to larger sizes) in winter and an underprediction of nucleation events in the summer.
The central models capture well the general meridional variation of size distribution in marine regions, with number concentrations at high latitudes mainly in the Aitken mode, whereas the Aitken and accumulation modes have similar number concentrations in the Tropics and mid-latitudes. However, for total particle concentrations (larger than 10 nm) there is a general overestimation in the Northern Hemisphere mid-latitudes and a low bias in the Southern Hemisphere mid-latitudes. The Southern Ocean low bias in total and Aitken particle number concentrations may be due to the models not adequately capturing the observed emission of sea-spray at sub-100 nm sizes (e.g. O'Dowd and Smith, 1993; Clarke et al., 2006; Pierce and Adams, 2006).
The global aerosol microphysics models capture very well the observed peak in ultra-fine condensation nuclei concentrations in the upper troposphere, which is caused by efficient new particle formation in that region.In continental regions there is a tendency to overpredict particle concentrations which could indicate a deficiency in nucleation parametrisations or in the simulated condensation sink.
Overall, the multi-model-mean data set constructed in this study has been shown to have reasonable skill in simulating global particle size distributions, albeit with some important biases in some locations and seasons. The incorporation of aerosol microphysics schemes into climate models has the potential to represent a significant step forward in the fidelity of simulated aerosol radiative forcings. The findings here indicate that most of these global aerosol microphysics models are performing quite well in terms of global variation of the size distribution. Further work to compare the models against size distribution observations at higher temporal resolution is required to better characterise primary and secondary particle sources. Greater understanding of the role of secondary organic aerosol and other components (e.g. nitrate) in affecting nucleation and particle growth in the boundary layer is also required.

Edited by: Y. Balkanski

dN / d log10(D) was first constructed on the parent size grid: [dN / d log10(D)]_i

Figure 1. Global maps of central-8 model mean (panels a and b) and diversity (panels c and d) for simulated annual mean surface mass concentrations of sulfate (a, c) and black carbon (b, d). Diversity here is the ratio of the maximum and minimum values over the central 8 of the 12 models (defined locally, as described in Sect. 2.4). Note that the geometric mean is used when averaging over the central-8 models.

Figure 2. Global maps of central-8 model mean (panels a and b) and diversity (panels c and d) for simulated annual mean surface size-resolved number concentrations for N(D p > 30 nm) (a, c) and N(D p > 100 nm) (b, d). Diversity here is the ratio of the maximum and minimum values over the central 8 of the 12 models (defined locally, as described in Sect. 2.4). Note that the geometric mean is used when averaging over the central-8 models.

Figure 3. Zonal-mean vs. latitude and altitude plots of central-8 model mean and diversity for simulated annual mean mass concentrations of sulfate (a, c) and black carbon (b, d).Diversity here is the ratio of the maximum and minimum values over the central 8 of the 12 models (defined locally, as described in Sect.2.4).All concentrations are with respect to local temperatures and pressures in the models.Note that the geometric mean is used when averaging over the central-8 models.

Figure 4. Zonal-mean vs. latitude and altitude plots of central-8 model mean and diversity for simulated annual mean size-resolved number concentrations for N(D p > 30 nm) (a, c) and N (D p > 100 nm) (b, d).Diversity here is the ratio of the maximum and minimum values over the central 8 of the 12 models (defined locally, as described in Sect.2.4).All concentrations are with respect to local temperatures and pressures in the models.Note that the geometric mean is used when averaging over the central-8 models.

Figure 6. Simulated annual mean surface N(D p > 3/10/14 nm) against CPC observations at all 13 GAW sites. Different size thresholds are used for each site corresponding to the cut-off diameters for the CPC used (3 nm at Cape Grim, Hohenpeissenberg, 10 nm at Jungfraujoch, Mauna Loa, Mace Head, Southern Great Plains, Pallas and 14 nm at South Pole, Neumayer, Barrow, Samoa, Trinidad Head, Bondville). The model values are geometric means over the central 8 models with the vertical whisker indicating their range. For the observations, the multi-annual mean is shown with the horizontal whisker showing plus and minus the standard deviation over the several years of data shown in Table 3.

Figure 9. Simulated annual cycle in surface N(D p > 10 nm/14 nm/3 nm) against CPC observations at continental boundary layer GAW sites Southern Great Plains (10 nm), Bondville (14 nm), Pallas (10 nm) and Hohenpeissenberg (3 nm). The solid line is the geometric mean over the central two-thirds models in each month, with the dashed lines the minimum and maximum over those central-8. The dotted line shows the minimum and maximum over all 12 models. The error bars on the observations indicate the standard deviation over the several years of data shown in Table 3.

Figure 10. Simulated annual mean surface N(D p > 30 nm) (a), N(D p > 50 nm) (b) and N(D p > 100 nm) (c), against those measured by SMPS/DMPS instruments at 17 of the EUSAAR/GUAN sites (excludes those at high altitude, taken as above 900 m altitude). Model values are the geometric mean of the central two-thirds model annual-means, with the vertical whiskers indicating the minimum and maximum values over those central 8. Observed values are arithmetic means over the hourly measurement data (A. Asmi, personal communication, 2012).

Figure 14. Box plots indicating the median, 25th and 75th percentiles of model to observation ratio for (a) N 30, (b) N 50 and (c) N 100 at the 17 low-altitude EUSAAR/GUAN sites. Winter and summer values are shown in blue and red respectively. The plots show the base-10 logarithm of the ratio, so a value of 1.0 means a factor of 10 high bias and a value of −1.0 means a factor of 10 low bias. The dashed lines indicate where the model is within a factor of 2 of the observations.
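The quantity plotted in Fig. 14 can be reproduced in a few lines. This is a minimal sketch with made-up site values and a hypothetical function name; only the use of log10(model/observation), the box-plot percentiles and the factor-of-2 reference lines come from the caption.

```python
import numpy as np

def log10_bias(model, obs):
    """Base-10 logarithm of the model-to-observation ratio."""
    return np.log10(np.asarray(model, dtype=float) / np.asarray(obs, dtype=float))

# A factor-of-2 agreement band corresponds to +/- log10(2), roughly 0.301.
factor2 = np.log10(2.0)

# Made-up seasonal-mean concentrations (cm^-3) at three hypothetical sites.
model = np.array([1000.0, 150.0, 800.0])
obs = np.array([100.0, 1500.0, 500.0])

bias = log10_bias(model, obs)
within_factor2 = np.abs(bias) <= factor2
q25, median, q75 = np.percentile(bias, [25, 50, 75])
```

Working in log space makes a factor-of-10 overestimate and a factor-of-10 underestimate appear symmetric about zero, which a plain ratio would not.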

Figure 17. Summer (a-f) and winter (g-l) multi-model simulated size distributions against DMPS/SMPS measurements at the six EUSAAR sites classified as western Europe: Cabauw (CBW), Harwell (HWL), Mace Head (MHT); Mediterranean: Ispra (JRC), Finokalia (FKL); or Arctic: Zeppelin (ZEP). Shown are the central-8 model geometric means (red solid), central-8 model maximum/minimum (red dashed) and all-12 model minimum/maximum (red dotted) of the June-July-August (arithmetic) mean size distributions at each site. Observed values (black solid line) are arithmetic means over the hourly measurement data (A. Asmi, personal communication, 2012). The published (Asmi et al., 2011) median (black dot-dashed) and 5th to 95th percentile range (grey shading) over the hourly measurement data are also shown for reference.

Figure 18. Summertime central-model simulated profiles of N(D p > 5 nm) (a), N(D p > 15 nm) (b), and N(D p > 120 nm) (c), over Germany against those derived from aircraft-borne CPC and PCASP measurements (asterisks) during the Lindenberg Aerosol Characterisation Experiment (Petzold et al., 2002), as presented by Lauer et al. (2005). The solid line shows the geometric mean of the central-8 models, dashed lines indicate the maximum and minimum of the central-8, while dotted lines indicate the maximum and minimum over all 12 models.

Figure 19. Southern Hemisphere annual-mean central-model simulated size distributions in the marine boundary layer averaged into 15° latitude ranges to compare against the compilation of 30 yr of cruise DMPS/APS measurements from Heintzenberg et al. (2000). The solid line shows the geometric mean of the central-8 models, dashed lines indicate the maximum and minimum of the central-8, while dotted lines indicate the maximum and minimum over all 12 models.

Figure 20. Northern Hemisphere annual-mean central-model simulated size distributions in the marine boundary layer averaged into 15° latitude ranges to compare against the compilation of 30 yr of cruise DMPS/APS measurements from Heintzenberg et al. (2000). The solid line shows the geometric mean of the central-8 models, dashed lines indicate the maximum and minimum of the central-8, while dotted lines indicate the maximum and minimum over all 12 models.

Figure 22. Simulated marine vertical profiles of N3 over the Pacific and Southern Oceans compared to the aircraft-borne u-CPC measurements compiled in Clarke and Kapustin (2002). Model values are averages over grid boxes in the latitude ranges (a) 70 to 20° S, (b) 20° S to 20° N and (c) 20 to 70° N. Longitude ranges used to sample the models were (a) 185 to 90° W, (b) 160 to 120° W and (c) 135 to 180° E, respectively. These averaged profiles for each model were interpolated onto a 1 km vertical grid. Again, since the measurements are taken over many different seasons, annual mean values were used when constructing the multi-model quantities. The solid line shows the geometric mean of the central-8 models, dashed lines indicate the maximum and minimum of the central-8, while dotted lines indicate the maximum and minimum over all 12 models. Note that model particle concentrations have been converted to values at standard temperature (300 K) and pressure (1000 hPa) to be consistent with these u-CPC measurements. In all other figures, measured and model values are at ambient conditions.
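The conversion from ambient to standard conditions mentioned in this caption can be sketched as follows, assuming ideal-gas scaling of the air volume so that a number concentration scales as p/T. The function name `to_standard_conditions` is hypothetical and the example inputs are made up; only the reference values of 300 K and 1000 hPa are taken from the caption, and the exact conversion applied to the model output is not specified there.

```python
T_STD_K = 300.0      # reference temperature quoted in the caption
P_STD_HPA = 1000.0   # reference pressure quoted in the caption


def to_standard_conditions(n_ambient, temperature_k, pressure_hpa):
    """Convert an ambient number concentration (cm^-3) to standard conditions.

    Assumes ideal-gas behaviour: a fixed air parcel occupies a volume
    proportional to T/p, so concentration scales as p/T.
    """
    if temperature_k <= 0.0 or pressure_hpa <= 0.0:
        raise ValueError("temperature and pressure must be positive")
    return n_ambient * (P_STD_HPA / pressure_hpa) * (temperature_k / T_STD_K)


# Illustrative upper-tropospheric grid box: 100 cm^-3 at 240 K and 500 hPa.
n_std = to_standard_conditions(100.0, 240.0, 500.0)
```

Under these assumptions a concentration measured in thin, cold air aloft becomes larger once expressed at standard conditions, which is why ambient and standardised profiles should not be compared directly.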

the European Research Council (ERC) under FP7 grant agreement FP7-280025. K. J. Pringle, H. Tost and J. Lelieveld received funding from the ERC (grant agreement 226144). K. Zhang was supported by funding from the Max Planck Society. Simulations with ECHAM5-HAM2 were performed at the German Climate Computing Center (Deutsches Klimarechenzentrum GmbH, DKRZ). K. von Salzen was supported by the Canadian Foundation for Climate and Atmospheric Sciences (CFCAS) and Environment Canada. A. Strunk acknowledges financial support from the Flemish agency for Innovation by Science and Technology (IWT) through the Climate and Air Quality Modelling for Policy Support (CLIMAQS) project. F. Yu and G. Luo were supported by NASA under grant NNX11AQ72G and NSF under grant 0942106. The EUSAAR network of aerosol supersites was established with EU funding from the Research Infrastructure Action under the FP6 Structuring the European Research Area Programme, Contract RII3-CT-2006-026140. Data for the Aspvreten and Zeppelin sites were provided by the Atmospheric Science Unit, Department of Applied Env. Sci., Stockholm University with financial support from the Swedish Environmental Protection Agency. The Harwell station is operated with financial support from the UK Department for Environment, Food and Rural Affairs. The Mace Head station received support from several Irish Government Agencies including the EPA, Met Eireann, and Department of the Environment. S. G. Jennings and C. D. O'Dowd would like to acknowledge the support of the European Union, through various projects within the 5th, 6th and 7th Framework programmes. J. Heintzenberg gratefully acknowledges financial support from the German Ministry of Education and Science (AFO 2000 programme) and from the European Commission's DGXII Environment RTD 4th, 5th, 6th and 7th framework programmes. Airborne data provided by A. Clarke, University of Hawaii, represent about 2 decades of approximately equal support from the NSF-Atmospheric Chemistry Program and the NASA-Earth Science Division. We are also grateful to the two anonymous reviewers whose comments have improved the paper considerably.

Table 2. Treatment of emissions, oxidants and nucleation in each model. Abbreviations for emissions are AERO-00

Table 3. Observational data sets on size-resolved number concentrations used in the evaluation of the global aerosol microphysics models.
Global map indicating the locations of the measurement data sets shown in Table 3. Coloured circles show GAW-WDCA stations (blue), EUSAAR/GUAN supersites (aqua) and the location of the LACE 98 field campaign (red). The aqua boxed regions indicate where the aircraft field campaign measurements compiled in