Constraints on aerosol processes in climate models from vertically-resolved aircraft observations of black carbon

Z. Kipling, P. Stier, J. P. Schwarz, A. E. Perring, J. R. Spackman, G. W. Mann, C. E. Johnson, and P. J. Telford

Department of Physics, University of Oxford, UK
Co-operative Institute for Research in Environmental Sciences, University of Colorado, Boulder, Colorado, USA
Chemical Sciences Division, Earth System Research Laboratory, National Oceanic and Atmospheric Administration, Boulder, Colorado, USA
National Centre for Atmospheric Science, University of Leeds, UK
School of Earth and Environment, University of Leeds, UK
Met Office Hadley Centre, Exeter, UK
National Centre for Atmospheric Science, University of Cambridge, UK
Department of Chemistry, University of Cambridge, UK


Introduction
Aerosol particles in the atmosphere play an important role in the climate system on both global and regional scales, through several mechanisms: direct modification of the short-wave radiation budget by scattering and absorption (Ångström, 1962; Schulz et al., 2006; Myhre et al., 2013); effects on clouds and the hydrological cycle, indirectly modifying the radiation budget (Twomey, 1977; Albrecht, 1989; Lohmann and Feichter, 2005); and "semi-directly" by altering the temperature profile of the atmosphere, and evaporating or suppressing cloud, through absorption of radiation (Hansen, 1997; Koch and Del Genio, 2010). Consequent changes to circulation patterns may lead to additional effects (e.g. Roeckner et al., 2006). The magnitudes of all these effects are subject to considerable uncertainty.
Black carbon (BC) aerosol can contribute to all of these classes of effect, although its absorption of short-wave radiation makes it of particular interest in the context of the direct and semi-direct effects (Stier et al., 2007; Ramanathan and Carmichael, 2008). The relative magnitudes of these effects, and thus the sign of the net (semi-)direct forcing due to BC, are thought to depend heavily on the vertical distribution of BC, and in particular its altitude relative to cloud layers (Johnson et al., 2004; Zarzycki and Bond, 2010). In addition, "aged" BC particles with a soluble coating can act as cloud condensation nuclei (Penner et al., 1996; Lohmann et al., 2000), and thus contribute to indirect effects; ageing may also reduce the lifetime of black carbon (by increasing susceptibility to wet deposition) and enhance its absorption of radiation (Ackerman and Toon, 1981; Stier et al., 2006; Schwarz et al., 2008).
Some progress has been made in analysing the relative positions of BC and cloud layers, and the resulting radiative effects, from satellite observations (Peters et al., 2011; Wilcox, 2012). However, neither passive satellite remote sensing nor ground-based observations can provide well-resolved vertical profiles of BC (or aerosol in general), and thus we turn to in-situ aircraft observations. Although such observations are limited in spatial and temporal coverage, they can provide data with much better vertical resolution than can be obtained from other sources, as well as more direct measurements of the quantities (e.g. concentrations, mixing ratios, composition and particle size distributions) represented in aerosol models.
Previous studies using aircraft observations to evaluate aerosol models on a global scale have generally compared monthly-mean model profiles with campaign-mean profiles from a collection of separate campaigns (which may differ in their methodology), each over a limited geographical area (e.g. Koch et al., 2009). Other studies have focused on more detailed evaluation on a regional scale using individual flight campaigns, e.g. Reddington et al. (2013), which also highlights the importance of uncertainties in the size distribution of BC as well as its total mass.
The large-scale flight campaign conducted by the High-performance Instrumented Airborne Platform for Environmental Research (HIAPER) Pole-to-Pole Observations (HIPPO) of Carbon Cycle and Greenhouse Gases Study (Wofsy et al., 2011) provides the opportunity to evaluate against consistently-collected data from a single campaign over a large area of the Pacific region. The data are described in more detail in Sect. 2. The BC data from the first phase of the HIPPO campaign are analysed in Schwarz et al. (2010), where the observed vertical profiles are used to evaluate the simulated BC profiles from the Aerosol Comparisons between Observations and Models (AEROCOM; http://dataipsl.ipsl.jussieu.fr/AEROCOM) Phase I (Textor et al., 2006) models, comparing climatological monthly-mean model profiles against regional-mean profiles from HIPPO. The model diversity is large (one to two orders of magnitude over a wide altitude range, both in the Pacific regions studied in Schwarz et al. (2010) and the continental regions in Koch et al. (2009)), but both the mean and median of the model ensemble systematically overestimate the BC mass mixing ratio (MMR) compared to the observations.
In this study, we carry out a more detailed evaluation of the vertical distribution of BC in two particular models, HadGEM3-UKCA and ECHAM5-HAM2, against BC mass mixing ratio data derived from the first three phases of the HIPPO campaign. These models and their configurations are described in Sect. 3. Rather than averaging the instantaneous observations on a regional basis and comparing to model climatology, we use nudging and interpolation techniques to sample the models in time and space along the track of the flight campaign, as described in Sect. 4.
We apply this approach to investigate and constrain the effects of convective scavenging (which has an important role in controlling vertical transport) and biomass-burning emissions (which are the most temporally and spatially variable source of BC) on the vertical profile of BC in the models. To this end, we conduct a series of sensitivity tests, as described in Sect. 5, to assess how the agreement with the observations is affected by the choice of convective scavenging scheme and emissions inventory.

[...] with some profiles extending to ~14 km. In total, we identify 184 separate vertical profiles suitable for our analysis (the criteria used are discussed in Sect. 4.1). A wide range of instruments were carried on these flights, but for our purposes the most relevant data come from a Single Particle Soot Photometer (SP2; Schwarz et al., 2006), which measures the mass of BC in individual aerosol particles. Particles were detected within a range of ~0.8 to 175 fg BC (~75 to 540 nm volume-equivalent diameter, assuming a void-free density of 1.8 × 10³ kg m⁻³).
Following Schwarz et al. (2010), we calculate the MMR of BC in the atmosphere by aggregating the observed particles over 1-minute intervals:

$$\mathrm{MMR}_{\mathrm{BC}} = 1.1 \times \frac{\sum_{i=1}^{N} M_i}{F \, \rho_{\mathrm{air}} \, \Delta t}$$

where $(M_1, \ldots, M_N)$ are the masses of BC in each individual particle observed by the SP2 instrument during the aggregation interval $\Delta t$ (1 min), $F$ is the volumetric flow rate at which the air is sampled (4 cm³ s⁻¹, constant) and $\rho_{\mathrm{air}}$ is the density of the sampled air, derived from contemporaneous measurements of ambient pressure from the HIPPO flight data, and a fixed temperature of 290 K representing the cabin air temperature of the aircraft. (These are an approximation of the actual sampling conditions, but the resulting error is small compared to that from other sources.) The factor of 1.1 inflates the mass by 10 % to account for the portion of the aerosol size spectrum which the instrument does not detect, as per Schwarz et al. (2010). We then produce a "curtain" plot of BC MMR against latitude and altitude to show the distribution of BC over a vertical slice through the atmosphere (top row of Fig. 2). We attach an uncertainty of ±30 % to these MMR values (the ±40 % quoted in Schwarz et al. is now considered overly cautious). In the cleanest regions (BC MMR less than ~0.5 ng kg⁻¹), where only a small number of particles were detected per minute, the sampling uncertainty of the observations is likely to contribute significantly to the scatter in the results; however, this is not considered further in the present study.
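As a rough illustration of the one-minute aggregation described above, the following Python sketch computes a BC MMR from a list of single-particle masses. The ideal-gas form of the air density, the value of the dry-air gas constant, and all function and variable names are assumptions made for this example; they are not taken from the HIPPO data processing.

```python
# Sketch: BC mass mixing ratio from SP2 single-particle masses in one interval.
R_DRY_AIR = 287.05         # J kg^-1 K^-1, specific gas constant of dry air (assumed)
CABIN_TEMPERATURE = 290.0  # K, fixed sampling temperature quoted in the text
FLOW_RATE = 4.0e-6         # m^3 s^-1, i.e. 4 cm^3 s^-1
UNDETECTED_FACTOR = 1.1    # inflates the mass by 10 % for undetected sizes


def air_density(pressure_pa, temperature_k=CABIN_TEMPERATURE):
    """Ideal-gas density of the sampled air (assumed form), in kg m^-3."""
    return pressure_pa / (R_DRY_AIR * temperature_k)


def bc_mmr(particle_masses_kg, pressure_pa, interval_s=60.0):
    """BC mass per unit mass of sampled air (kg kg^-1) over one interval."""
    sampled_air_mass = FLOW_RATE * interval_s * air_density(pressure_pa)
    return UNDETECTED_FACTOR * sum(particle_masses_kg) / sampled_air_mass


# Hypothetical minute of data: 50 particles of 5 fg each, sampled at 500 hPa.
example_masses = [5.0e-18] * 50
print(bc_mmr(example_masses, 50000.0))
```

Because the sampled air mass scales with pressure, the same particle count yields a larger mixing ratio at altitude than near the surface.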
With the exception of HIPPO-2, which makes a detour to Australia at about 30° S, there is generally more BC seen in the Northern Hemisphere than the Southern Hemisphere at all levels (which is consistent with the greater anthropogenic emissions in the north); this contrast is particularly stark for HIPPO-3, which spent very little time near land. While some BC is seen in the lower and mid tropical troposphere (~10⁻¹¹ kg kg⁻¹ in places), very little is seen at higher levels in the tropics in any of the phases (typically less than 10⁻¹² kg kg⁻¹ above about 6 km); at higher latitudes, however, significant BC mass mixing ratios (above 10⁻¹¹ kg kg⁻¹) frequently extend into the upper troposphere.

Models
Two aerosol-climate models are considered here: HadGEM3-UKCA and ECHAM5-HAM2. These are described in the following sections, and the major differences relevant to black carbon aerosol are summarised in Table 1.

HadGEM3-UKCA
HadGEM3 (Hewitt et al., 2011) is the latest version of the Hadley Centre Global Environmental Model developed at the UK Met Office. Although the full model contains many components (atmosphere, land surface, ocean, sea ice etc.), this study is concerned only with the uncoupled atmosphere component, using prescribed sea-surface temperature (SST) and sea ice fields. The dynamical core (Davies, 2005) is non-hydrostatic and fully compressible, with semi-Lagrangian advection and a hybrid sigma/height vertical coordinate. Large-scale cloud uses the bulk prognostic scheme of Wilson et al. (2008), with precipitation microphysics based on Wilson and Ballard (1999); sub-grid-scale convection is based on the mass-flux scheme of Gregory and Rowntree (1990) with subsequent modifications.

Fig. 1. [...] respectively). The circles show the BC burden (in kg m⁻²) estimated from the HIPPO SP2 observations over each vertical profile, while the background shading shows the monthly-mean BC burden from the HadGEM3-UKCA (BASE and CVSCAV+G3M) and ECHAM5-HAM2 (G3M) simulations. The bottom row shows the burdens from the AEROCOM Phase I (Textor et al., 2006) median model (constructed from the ARQM, GISS, GOCART, GRANTOUR, KYU, LOA, MATCH, MPI HAM, MOZGN, PNNL, UIO CTM, UIO GCM, ULAQ and UMI models). The side plots show the observed burdens (red bars, representing the range due to uncertainty in extrapolation of profiles to the surface and a 15 km lid, plus the 30 % uncertainty in the mixing ratios used), the along-track model burden (blue line, two-valued due to the southbound and northbound legs) and the zonal range of the model burden between the map edges (shading).

The standard tropospheric chemistry scheme in UKCA (O'Connor et al., 2013) is used. This includes oxidants (Oₓ, HOₓ and NOₓ) and hydrocarbons (CO, ethane and propane) with eight emitted species, 102 gas-phase reactions, 27 photolytic reactions and interactive wet and dry deposition. An additional aerosol-precursor chemistry scheme treats the oxidation of sulphur compounds (SO₂ and dimethyl sulphide) and monoterpene to form the sulphuric acid and organic compounds which may condense to form secondary aerosol material. The aerosol scheme in UKCA (Mann et al., 2013) is the two-moment modal version of the Global Model of Aerosol Processes (GLOMAP-mode; Mann et al., 2010), which follows the M7 framework (Vignati, 2004) in transporting five components (sulphate, sea salt, black carbon, particulate organic matter and mineral dust) in seven internally-mixed log-normal modes (four soluble and three insoluble; not all components are found in all modes). Because mineral dust is transported by a separate scheme (Woodward, 2001) in HadGEM3, only four components and five modes are enabled in the UKCA configuration of GLOMAP-mode used here (omitting the two larger insoluble modes which contain only mineral dust). The representation of aerosol microphysical processes is based on the sectional GLOMAP-bin scheme (Spracklen et al., 2005), with each process acting sequentially in an operator-split manner (except nucleation, coagulation and condensation, which are solved iteratively).

Primary BC emissions use the AEROCOM recommended size distributions (Dentener et al., 2006), as modified by Stier et al. (2005), but with biofuel emissions using the same distribution as fossil fuel rather than biomass burning. Fossil-fuel and biofuel emissions are added to the lowest model level with a geometric mean diameter of 60 nm, while biomass-burning emissions have a geometric mean diameter of 150 nm and are distributed uniformly in height over levels 2 to 12 (~50 m to 3 km, compressed over orography). This differs from the TOMCAT-based version of GLOMAP-mode documented in Mann et al. (2010), which uses the biome-dependent vertical profiles recommended in Dentener et al. (2006). For all sources, the geometric standard deviation of the particle diameter is 1.59.
BC aerosol is initially insoluble, but can be "aged" into the soluble Aitken mode following uptake of sulphuric acid and secondary organic material via condensation and coagulation. This ageing proceeds at a rate consistent with a 10-monolayer coating being required to make a particle soluble.
All sizes of soluble and insoluble aerosol particles may be removed by dry deposition and below-cloud impaction scavenging; soluble accumulation- and coarse-mode particles may also be removed by in-cloud nucleation scavenging. Dry deposition and gravitational sedimentation are calculated following Slinn (1982) and Zhang et al. (2001). Below-cloud scavenging follows Slinn (1984), using Beard and Grover (1974) scavenging coefficients and terminal velocities from Easter and Hales (1983), assuming a modified Marshall-Palmer raindrop size distribution (Sekhon and Srivastava, 1971). In-cloud scavenging by large-scale precipitation assumes that 100 % of the aerosol in the soluble accumulation and coarse modes is taken up by cloud water in the cloudy fraction of each 3-D grid box, and is then removed at the same rate at which the large-scale cloud water is converted to rain. (Nucleation, Aitken and insoluble modes are not subject to in-cloud scavenging.) Aerosol is removed immediately, and is not returned to the atmosphere when rain evaporates. Convective rainfall is treated similarly, but assumes a cloud fraction of 30 % and a conversion rate of 99 % over 6 h in all grid boxes where convective rain is produced. (This is different to the TOMCAT-based version of GLOMAP-mode, in which convective scavenging is dependent on the rain rate while large-scale scavenging uses a fixed removal timescale.) The scavenged aerosol is removed from the grid-box mean tracers after the convection scheme has run, i.e. from the post-convection environmental air at the level where the precipitation formed, rather than the convective updraught itself. This allows a greater separation of the convection and aerosol schemes, but may limit the ability of convective scavenging to control vertical transport (as we show in Sect. 6.1).
The model configuration used here is based on a development version of HadGEM3 (atmosphere-only, climatological SST, Met Office Unified Model version 7.3) at N96L38 resolution (1.25° latitude × 1.875° longitude × 38 vertical levels up to ~40 km) with UKCA in a standard tropospheric chemistry and aerosol configuration as described above, with aerosol feedbacks disabled. In order to capture the meteorological conditions at the time of the flight campaign, we use the technique of nudging (Jeuken et al., 1996). In the HadGEM implementation (Telford et al., 2008, 2013), potential temperature and horizontal wind are relaxed towards fields from the ERA-Interim reanalysis (Dee et al., 2011). The relaxation time constant is the "natural" one of 6 h (the time spacing of the reanalysis data); this choice is validated in Telford et al. (2008). The nudging is applied between levels 14 (~4 km) and 32 (~21 km) inclusive; levels 13 and 33 are nudged at half strength (i.e. with a 12 h time constant), and no nudging is performed on levels outside this range. Free-running simulations (without nudging) were also run for comparison.
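The level-dependent relaxation described above can be sketched in Python as follows. This is only an illustration: the explicit forward-Euler update and the function names are assumptions of this example and are not taken from the HadGEM3 implementation; only the level ranges and time constants come from the text.

```python
# Sketch: Newtonian relaxation (nudging) with a level-dependent time constant.

def nudging_time_constant_h(level):
    """Relaxation time constant in hours, or None where no nudging is applied."""
    if 14 <= level <= 32:
        return 6.0    # full-strength nudging
    if level in (13, 33):
        return 12.0   # half-strength nudging on the bounding levels
    return None       # levels outside the range are left free-running


def nudge(value, reference, level, timestep_s):
    """Relax a model value towards the reanalysis value over one time step.

    Uses a simple explicit step (assumed here): dX/dt = (X_ref - X) / tau.
    """
    tau_h = nudging_time_constant_h(level)
    if tau_h is None:
        return value
    return value + timestep_s / (tau_h * 3600.0) * (reference - value)


# Hypothetical example: potential temperature 300 K, reanalysis 310 K, 20-min step.
for model_level in (5, 13, 20):
    print(model_level, nudge(300.0, 310.0, model_level, 1200.0))
```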
For the sensitivity tests, four different simulations were carried out covering the period of the first three phases of the HIPPO campaign, as shown in Table 2. All simulations were run from September 2008 through to the end of April 2010, allowing four months spin-up before the start of HIPPO-1. No re-tuning of the model was performed for either the nudged or the free-running simulations.
The BASE configuration is derived from the standard UKCA aerosol configuration, which takes its black carbon emissions from the AEROCOM hindcast inventory (Diehl et al., 2012), including emissions from fossil fuel, biofuel and biomass burning through to the end of 2006. Although the HIPPO campaign began after this, the fossil fuel and biofuel emissions have little interannual variability and so we simply repeat those for 2006. Biomass burning, however, has significant interannual variability; since the emissions inventory does not cover the required period, we used a monthly climatology derived from the "modern" portion of the AEROCOM hindcast inventory (1997 to 2006), which is based on monthly-mean emission fields of the Global Fire Emissions Database (GFED) version 2 (van der Werf et al., 2006). Other (non-BC) emissions are also taken from year 2006 of the AEROCOM hindcast inventory, or (for additional gas-phase emissions not included therein but required by the UKCA chemistry scheme) from year 2006 of Representative Concentration Pathway (RCP) 8.5 (Riahi et al., 2011).

ECHAM5-HAM2
ECHAM5 (Roeckner et al., 2003) is the fifth-generation climate model developed at the Max Planck Institute for Meteorology. It has a spectral dynamical core, solving prognostic equations for vorticity, divergence, surface pressure and temperature in spherical harmonics with a triangular truncation. A hybrid sigma/pressure vertical coordinate is used. Physical parameterisations are solved on a corresponding Gaussian grid. Tracer transport is semi-Lagrangian in grid-point space (Lin and Rood, 1996).

Table 2. Configurations and emissions used for model simulations of the HIPPO campaign. The inventory (GFED2 or GFED3.1) used for biomass-burning emissions is shown, along with the year for which these emissions are specified. Other emissions are taken from the AEROCOM hindcast inventory, or (for additional gas-phase emissions in UKCA) RCP 8.5.

HAM 2.0 (Stier et al., 2005; Zhang et al., 2012) is also a two-moment modal aerosol scheme based on the M7 framework (Vignati, 2004), transporting five components (sulphate, sea salt, black carbon, particulate organic matter and mineral dust) in seven internally-mixed log-normal modes (four soluble and three insoluble). Unlike in UKCA, mineral dust in ECHAM5-HAM2 is incorporated into the M7 framework.

Primary BC emissions use a modified version of the AEROCOM recommended size distributions, accounting for the width of the M7 modes. Fossil-fuel and biofuel emissions are added as a surface flux to the boundary-layer vertical diffusion equations, while biomass-burning emissions use a biome-dependent vertical profile, as specified for AEROCOM Phase I (Dentener et al., 2006). BC aerosol is initially insoluble, but can be "aged" by sulphate through condensation and coagulation to become soluble; in contrast to UKCA, only a single monolayer is required.
Dry deposition of soluble and insoluble particles follows Ganzeveld et al. (1998), modified to use the explicit size distribution from the model, and is applied as a surface flux to the boundary-layer vertical diffusion along with the emissions. Below-cloud scavenging is calculated according to the rain and snow fluxes, using size-dependent collection efficiencies from Seinfeld and Pandis (1998). In-cloud scavenging assumes that a prescribed fraction of the number and mass of aerosol in each mode from the cloudy part of each grid box is susceptible to removal, at the rate at which large-scale cloud water/ice is converted to rain/snow. Scavenging in convective clouds is coupled with the tracer transport in the mass-flux convection scheme, and proceeds similarly but removing aerosol from the convective tracer flux according to the rate at which water and ice are removed in convective precipitation. Where (a fraction of) the precipitation in a column evaporates before reaching the ground, the same fraction of the aerosol removed from the column is returned to the atmosphere.
The model configuration used here is based on ECHAM 5.5 (atmosphere-only, AMIP2 prescribed SST) at T63L31 resolution (~1.875° × 31 vertical levels up to ~10 hPa) with HAM 2.0. Once again, the large-scale dynamics are nudged towards ERA-Interim (Dee et al., 2011) reanalysis data, following Jeuken et al. (1996): temperature, vorticity, divergence and surface log-pressure are relaxed towards the reanalysis fields with time constants of 24 h, 6 h, 48 h and 24 h, respectively, on all model levels. The nudging is performed in spectral space, on all but the wavenumber-0 (global-mean) spectral component.
For the sensitivity tests, three different simulations were carried out for the period covering the first three phases of the HIPPO campaign, as shown in Table 2. As for HadGEM3-UKCA, all simulations were run from September 2008 through to the end of April 2010, allowing four months spin-up before the start of HIPPO-1, and no re-tuning was performed.
In the BASE configuration, emissions are taken from the AEROCOM hindcast inventory (Diehl et al., 2012) for 2006, with biomass-burning emissions using a 1997 to 2006 climatology as described for HadGEM3-UKCA.

Method
As mentioned in Sect. 1, to best compare the simulations with the aircraft measurements from the HIPPO campaign, we sample the output of the HadGEM3-UKCA and ECHAM5-HAM2 models along the flight track. However, to give a general indication of how the models' BC distributions compare to the observations, we first show regional maps of the simulated BC column burden with that derived from the SP2 measurements over-plotted along the flight track (Fig. 1).

Burdens
From a modelling perspective, column-integrated mass burdens are a useful metric by which to measure the distribution of aerosol. However, it is difficult to obtain direct measurements of aerosol burden on large scales, as satellite-based instruments can only measure integrated optical properties (with passive instruments) or vertically-resolved backscatter (with active lidar). Burdens cannot be inferred from such measurements without additional knowledge of the chemical and microphysical properties of the aerosol particles. Ground-based sun-photometers and lidar are similarly limited, while ground-based in-situ measurements are limited to particles near the surface. The geographical and vertical coverage of the HIPPO campaign, however, provides a basis on which to evaluate model burdens directly.
We estimate the local BC column burden in the vicinity of each HIPPO ascent or descent profile. Suitable profiles are identified as periods of near-continuous ascent or descent covering at least the 0.5 km to 7.5 km altitude range. From each profile, the mean BC concentration (mass of BC per unit volume) in each 0.5 km altitude interval from 0 to 15 km is calculated. These are then integrated vertically to give an estimate of the column burden (shown on the maps in Fig. 1 as coloured circles).
Because the HIPPO profiles do not extend all the way to the surface or our 15 km "lid", there is some uncertainty in how we extrapolate the profile when calculating the burden. We calculate a lower estimate by assuming the BC concentration is zero outside the altitude range of the observations; for an upper estimate, we assume that the BC concentrations observed at the bottom and top of the profile continue to the surface and 15 km, respectively. (This does not give a true upper bound on the burden, since the concentrations outside the observed altitude range may be higher than those within, but provides an estimate of the extrapolation uncertainty.) Because Schwarz et al. (2010) attribute the largest part of the ±30 % uncertainty in BC MMR to calibration (correlated) rather than random (uncorrelated) error, we assume that the full ±30 % may apply to the derived burden estimates. These ranges (including both the extrapolation and measurement uncertainty) are shown as the red bars on the side-plots in Fig. 1.
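The burden estimate and its range can be sketched in Python as below. This is a minimal illustration under stated assumptions: the input format (altitude in metres paired with BC concentration in kg m⁻³), the function names, and the treatment of any empty bins inside the observed range as zero are all choices made for this example rather than details from the study.

```python
# Sketch: column burden range from one profile, binned in 0.5 km layers to 15 km.
BIN_WIDTH_M = 500.0
N_BINS = 30  # 0 to 15 km


def bin_means(samples):
    """Mean concentration per 0.5 km bin; None where a bin has no samples."""
    sums = [0.0] * N_BINS
    counts = [0] * N_BINS
    for altitude_m, concentration in samples:
        index = int(altitude_m // BIN_WIDTH_M)
        if 0 <= index < N_BINS:
            sums[index] += concentration
            counts[index] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def burden_range(samples, relative_uncertainty=0.3):
    """Return (lower, upper) burden estimates in kg m^-2."""
    means = bin_means(samples)
    observed = [i for i, m in enumerate(means) if m is not None]
    if not observed:
        raise ValueError("profile contains no samples between 0 and 15 km")
    first, last = observed[0], observed[-1]
    # Lower estimate: zero concentration outside the observed altitude range.
    lower = sum(m for m in means if m is not None) * BIN_WIDTH_M
    # Upper estimate: bottom and top values continued to the surface and the lid.
    extrapolated = first * means[first] + (N_BINS - 1 - last) * means[last]
    upper = lower + extrapolated * BIN_WIDTH_M
    # Widen by the correlated measurement uncertainty on the mixing ratios.
    return lower * (1.0 - relative_uncertainty), upper * (1.0 + relative_uncertainty)


# Hypothetical two-sample profile (altitude in m, concentration in kg m^-3).
print(burden_range([(750.0, 2.0e-9), (1250.0, 1.0e-9)]))
```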

Point-by-point comparison
For a more detailed point-by-point comparison, we perform on-line interpolation of the instantaneous mass mixing ratio fields from each model to the points along the HIPPO flight track, following O'Connor et al. (2005) and Telford et al. (2013). The spatial interpolation is linear in log-pressure and both horizontal directions. Temporally, each observation is matched to the following model time-step. Coupled with nudging to reproduce the observed synoptic conditions (notwithstanding the uncertainty in reanalysis fields in remote regions with sparse observations), this allows us to sample the model output consistently with the observations rather than using a monthly mean or climatology. Although neither water vapour nor any cloud variables are nudged directly, Telford et al. (2008) show that large-scale cloud and precipitation patterns are reproduced well in a nudged model, while Russo et al. (2011) show that for convection a nudged model performs as well as an offline chemical transport model (CTM) driven directly by meteorological fields from a reanalysis.
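The vertical part of this sampling and the time matching can be sketched in Python as follows. The horizontal interpolation is omitted for brevity. The function names, the requirement that pressures are supplied in increasing order, and the choice to match an observation falling exactly on a time step to that step are assumptions of this example.

```python
# Sketch: sample a model column at an observation's pressure and time.
import bisect
import math


def interp_log_pressure(pressures_pa, values, target_pa):
    """Linear interpolation in log-pressure; pressures must be strictly increasing."""
    logs = [math.log(p) for p in pressures_pa]
    x = math.log(target_pa)
    if not logs[0] <= x <= logs[-1]:
        raise ValueError("target pressure lies outside the model column")
    upper = min(max(bisect.bisect_left(logs, x), 1), len(logs) - 1)
    lower = upper - 1
    weight = (x - logs[lower]) / (logs[upper] - logs[lower])
    return values[lower] + weight * (values[upper] - values[lower])


def following_time_step(step_times_s, observation_time_s):
    """Index of the first model time step at or after the observation time."""
    index = bisect.bisect_left(step_times_s, observation_time_s)
    if index == len(step_times_s):
        raise ValueError("observation is later than the last model time step")
    return index


# Hypothetical column with two levels at 100 hPa and 1000 hPa.
print(interp_log_pressure([10000.0, 100000.0], [1.0, 3.0], 30000.0))
print(following_time_step([0.0, 1200.0, 2400.0], 1300.0))
```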
Once this sampling has been done, we can evaluate the model output pointwise against the actual HIPPO observations both visually and quantitatively. For a visual comparison we simply plot the differences in mass mixing ratio at each point on the flight track (see Figs. 3 and 4, discussed in detail in Sects. 6.1 and 6.2). For a more quantitative analysis, we look at the mean difference (bias) and correlation coefficient between the logarithms of the real and simulated mass mixing ratios, over all the points along the flight track (see Fig. 5, discussed in detail in Sect. 6.3). Logarithms are taken as the distribution of observed mixing ratios appears to be approximately log-normal; this results in a distribution which is more symmetric and closer to a normal distribution, making standard statistical techniques more meaningful. Without logarithms, the correlation coefficient is distorted by differences in the long upper tail of the distribution.
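A minimal Python sketch of these two statistics is given below. The use of base-10 logarithms and the function name are assumptions of this example (the text does not specify the base), and the inputs must be strictly positive; how zero mixing ratios are handled in the study is not stated and is not addressed here.

```python
# Sketch: bias and Pearson correlation between log observed and log simulated MMR.
import math


def log_bias_and_correlation(observed, simulated):
    """Return (mean log10 difference, Pearson correlation of the log10 values)."""
    log_obs = [math.log10(v) for v in observed]
    log_sim = [math.log10(v) for v in simulated]
    n = len(log_obs)
    mean_obs = sum(log_obs) / n
    mean_sim = sum(log_sim) / n
    covariance = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(log_obs, log_sim))
    var_obs = sum((o - mean_obs) ** 2 for o in log_obs)
    var_sim = sum((s - mean_sim) ** 2 for s in log_sim)
    bias = mean_sim - mean_obs
    correlation = covariance / math.sqrt(var_obs * var_sim)
    return bias, correlation


# Hypothetical values: a model that is uniformly ten times too high.
obs = [1.0e-12, 1.0e-11, 1.0e-10]
sim = [10.0 * v for v in obs]
print(log_bias_and_correlation(obs, sim))
```

In this convention a bias of 1 corresponds to a model overestimate by a factor of ten, which is easier to read than a difference in kg kg⁻¹ when the mixing ratios span several orders of magnitude.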
Fig. 3. Difference of BC mass mixing ratio (kg kg⁻¹) simulated by HadGEM3-UKCA in each configuration (rows) from that observed during each phase of the HIPPO campaign (columns). The model is nudged and sampled along the HIPPO flight track at 1-min intervals; observed mixing ratio is calculated from HIPPO SP2 data aggregated over 1-min intervals.

To estimate the uncertainty in the quantitative analysis, we use bootstrapping to construct 95 % confidence intervals for the bias and correlation. Because both the observed and modelled data series show significant autocorrelation, we use a moving-block bootstrap (Kunsch, 1989) with block length 30 (i.e. resampling in approximately half-hour blocks). This provides an estimate of the uncertainty due to random sampling variability. To incorporate the uncertainty in the SP2-derived mixing ratios, we extend the error bars on the bias by ±30 % (to accommodate the worst-case effect on the bias, of a systematic calibration error). For the correlation, we apply random multiplicative Gaussian white noise with a standard deviation of 30 % to each bootstrap sample (to accommodate the worst-case effect on the correlation, of completely uncorrelated observation errors).
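The moving-block bootstrap described above might be sketched in Python as follows. This is one possible reading rather than the procedure actually used: the percentile method for the interval, the number of resamples, the application of noise as a factor 1 + ε on the observations, and all names are assumptions of this example. Note that such a factor can occasionally be non-positive, which a log-based statistic would need to handle.

```python
# Sketch: moving-block bootstrap confidence interval for a paired statistic.
import random


def moving_block_indices(n, block_length, rng):
    """Build n resampled indices from randomly chosen overlapping blocks."""
    n_starts = n - block_length + 1
    indices = []
    while len(indices) < n:
        start = rng.randrange(n_starts)
        indices.extend(range(start, start + block_length))
    return indices[:n]


def bootstrap_interval(observed, simulated, statistic, block_length=30,
                       n_resamples=1000, noise_sd=0.0, seed=0):
    """Percentile 95 % interval of statistic(observed, simulated)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_resamples):
        idx = moving_block_indices(len(observed), block_length, rng)
        # Optional multiplicative Gaussian noise on the observations (assumed form).
        obs = [observed[i] * (1.0 + rng.gauss(0.0, noise_sd)) for i in idx]
        sim = [simulated[i] for i in idx]
        results.append(statistic(obs, sim))
    results.sort()
    return results[int(0.025 * n_resamples)], results[int(0.975 * n_resamples) - 1]


def mean_difference(obs, sim):
    """Simple stand-in statistic: mean of simulated minus mean of observed."""
    return sum(sim) / len(sim) - sum(obs) / len(obs)


# Hypothetical series in which the model is offset by exactly 2 everywhere.
series_obs = [float(i) for i in range(120)]
series_sim = [v + 2.0 for v in series_obs]
print(bootstrap_interval(series_obs, series_sim, mean_difference))
```

Resampling whole blocks keeps neighbouring one-minute samples together, so the autocorrelation within each half-hour block is preserved in every bootstrap sample.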
There is some additional uncertainty in the comparison, due to the limited size range of the SP2 measurements; we adjust the measurements as described in Sect. 2 to account for this, but some uncertainty remains as in practice the fraction of BC which is within the detectable range will be variable. An alternative approach, of calculating the number of modelled particles which would contain a detectable amount of BC, is problematic because the models assume uniform composition, with BC mass spread over all particles in a given mode. This results in lower BC masses per particle, and many fewer detectable particles, than if the BC is confined to a subset of particles, which Reddington et al. (2013) show may indeed be the case, at least in the more polluted air over continental Europe.

Sensitivity tests

Biomass-burning emissions
One of the most variable and uncertain sources of BC is from biomass burning (responsible for approximately half the BC emissions by mass in the models, the remainder coming from fossil fuel and biofuel burning). The emissions used in the BASE configurations of both models are a monthly climatology derived from the AEROCOM hindcast inventory, itself based on the Global Fire Emissions Database (GFED), version 2 (van der Werf et al., 2006). However, GFED version 3.1 is now available (van der Werf et al., 2010). Amongst various improvements to the emission estimates, there is a substantial reduction in total carbon emissions from biomass burning, which is reflected in the BC emissions. In addition, GFED3.1 provides daily fractional emission fields (at the same 0.5° resolution as the monthly data, but not resolved by chemical species) which can be applied to each month's data to estimate emissions at daily time resolution, and a diurnal profile for each month (also at 0.5° resolution, in 3-h intervals), giving estimates at 3-hourly time resolution (Mu et al., 2011). The new dataset now covers the period to the end of 2010, sufficient for simulations during the first three phases of the HIPPO campaign and removing the need to extrapolate the emission dataset. Switching to GFED3.1 emissions for biomass burning gives the G3M (monthly emissions) and G3H (3-hourly emissions) configurations (the latter only implemented in ECHAM5-HAM2). In HadGEM3-UKCA, both BASE and G3M configurations distribute the biomass-burning emissions uniformly in height over levels 2 to 12 (~50 m to 3 km). In ECHAM5-HAM2, the BASE configuration uses a biome-dependent vertical profile for the emissions, as in AEROCOM Phase I (Dentener et al., 2006), while the G3M and G3H configurations divide the emissions equally between the model levels diagnosed to be within the boundary layer.
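The way daily fractions and a diurnal profile refine a monthly total can be sketched in Python for a single grid cell. This is an illustration only: it assumes that the daily fractions sum to one over the month and that the eight 3-h diurnal fractions sum to one over a day, and the function name and data layout are hypothetical rather than the GFED3.1 file format.

```python
# Sketch: disaggregate a monthly emission total to 3-hourly amounts for one cell.

def three_hourly_emissions(monthly_total, daily_fractions, diurnal_fractions):
    """Return emissions[day][interval], conserving the monthly total.

    daily_fractions: share of the month's emissions on each day (assumed to sum to 1).
    diurnal_fractions: share of a day's emissions in each 3-h interval
    (eight values, assumed to sum to 1; one profile is used for the whole month).
    """
    return [[monthly_total * day_share * interval_share
             for interval_share in diurnal_fractions]
            for day_share in daily_fractions]


# Hypothetical three-day "month" with a flat diurnal cycle.
emissions = three_hourly_emissions(100.0, [0.5, 0.25, 0.25], [0.125] * 8)
print(emissions[0][0], sum(sum(day) for day in emissions))
```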

Convective scavenging
Convection plays a dominant role in the upward transport of both gaseous and particulate matter in the atmosphere, with wide variation amongst models especially for short-lived species (Hoyle et al., 2011). However, convection (especially the vigorous, deep convection that can transport air parcels from the boundary layer to the upper troposphere) is also associated with intense precipitation, and thus a significant amount of material may be removed by wet scavenging before it is detrained from the convective updraught. Schwarz et al. (2010) identify the treatment of this process as likely to be a major factor in the diversity of the AEROCOM models and their high bias compared to the HIPPO-1 SP2 observations, particularly in the tropics.
The models used here take two different approaches to convective scavenging. In the operator-split approach, as used in the BASE configuration of HadGEM3-UKCA, convective scavenging removes aerosol from the grid-box mean field after the convection scheme (including convective tracer transport) has run. In the in-plume approach, as used in ECHAM5-HAM2, aerosol is removed directly from the tracer flux in the convective updraught, along with the removal of water by convective precipitation. Additional simulations with HadGEM3-UKCA have also been carried out using an in-plume scheme (CVSCAV and CVSCAV+G3M configurations). This assumes that 100 % of the soluble accumulation and coarse modes in the upward convective tracer flux is taken up by the cloud drops and, therefore, removed in proportion to the amount of cloud water which precipitates (as in the existing scheme for large-scale cloud); additionally, 50 % (by mass and number) of the soluble Aitken mode is taken up and thus susceptible to removal, as a crude representation of the fact that smaller particles can be activated in the faster updraughts found in convective cloud. (The figure of 50 % is somewhat arbitrary, and there is certainly scope for refinement: both the large-scale and convective schemes should ideally use an appropriate critical radius based on Köhler theory.) It should be noted, however, that this scheme does not yet include resuspension of aerosol when rain evaporates (nor does the existing operator-split scheme, or the large-scale scavenging scheme), unlike that in ECHAM5-HAM2.
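The in-plume removal described for the CVSCAV configurations can be sketched in Python for a single level of the updraught. Only the uptake fractions come from the text; the mode names, the dictionary layout and the function name are hypothetical, and the sketch is not the UKCA code.

```python
# Sketch: in-plume convective scavenging of the upward tracer flux at one level.
# Uptake fractions from the text; any mode not listed is not scavenged in-cloud.
UPTAKE_FRACTION = {
    "aitken_soluble": 0.5,
    "accumulation_soluble": 1.0,
    "coarse_soluble": 1.0,
}


def scavenge_updraught(tracer_flux, precipitating_fraction):
    """Split each mode's updraught flux into (remaining, removed) parts.

    precipitating_fraction is the fraction of updraught cloud water that
    precipitates at this level; aerosol taken up by cloud drops is removed
    in the same proportion. No resuspension on evaporation is represented.
    """
    remaining = {}
    removed = {}
    for mode, flux in tracer_flux.items():
        loss = UPTAKE_FRACTION.get(mode, 0.0) * precipitating_fraction * flux
        removed[mode] = loss
        remaining[mode] = flux - loss
    return remaining, removed


# Hypothetical fluxes (arbitrary units) with 40 % of cloud water precipitating.
flux_in = {"aitken_soluble": 10.0, "accumulation_soluble": 10.0, "aitken_insoluble": 10.0}
print(scavenge_updraught(flux_in, 0.4))
```

Because the loss is taken from the updraught flux itself, less aerosol is detrained aloft, which is the coupling between transport and wet removal that the operator-split approach lacks.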
Although in this study we focus on the impact of the coupling between convective transport and wet deposition, it is worth noting that the parameterisation of convective transport itself (in particular entrainment and detrainment) may have a significant impact on the vertical distribution of tracers, as demonstrated in Hoyle et al. (2011) and Croft et al. (2012).

HadGEM3-UKCA
The BC MMR from HadGEM3-UKCA (in its BASE configuration), sampled at 1-minute intervals along the flight track for the first three phases of the HIPPO campaign, is shown in the second row of Fig. 2. Although some features (e.g. the disparity between the hemispheres in the HIPPO-3 data) are well reproduced, the model does not appear to reproduce other large-scale features of the observations very well. Most noticeably, for all three phases, the model has a significant excess of BC in the upper troposphere, especially in the tropics.
Figure 3 shows the difference between the HadGEM3-UKCA simulations in each configuration and the actual observations from each phase of the HIPPO campaign.
It is clear from these difference plots that, at least for HIPPO-1 and HIPPO-2, the upper-tropospheric excess seen in the BASE configuration is largely removed when the in-plume convective scavenging scheme is switched on (i.e. in CVSCAV and CVSCAV+G3M), suggesting that the lack of realistic convective scavenging may have been responsible. This is supported by the fact that, without adjusting any other parameters in the model, the improvement is so strong, while introducing very few of the new visible errors that we might expect to see if the new scheme were compensating for errors in a different process (which would likely have a different structure). The third row of Fig. 2 shows the BC mixing ratio from the CVSCAV+G3M simulation, which is visibly more realistic with respect to the observations. For HIPPO-3, the improvement is largely confined to the Southern Hemisphere; in the Northern Hemisphere both simulations produce too little aerosol at lower levels and too much aloft. This is despite the fact that HIPPO-3 observed more BC at upper levels in the Northern Hemisphere than the earlier phases.
The change in switching to GFED3.1 biomass-burning emissions (i.e. BASE to G3M) is less dramatic. While, for HIPPO-1 and HIPPO-2, the difference plot for the G3M simulation (second row of Fig. 3) indicates less of a positive bias than for BASE, the upper-tropospheric excess remains clear. Applying the emissions change on top of the in-plume convective scavenging (i.e. going from CVSCAV to CVSCAV+G3M) removes what little excess remains in the middle and upper troposphere, but appears to leave an overall negative bias compared to the observations. For HIPPO-3, the differences from the choice of emissions are even less clear. Unlike for convective scavenging, we cannot be confident that the small improvements seen here are genuinely attributable to better emissions, rather than compensating for biases elsewhere in the model.
It thus appears that for HIPPO-1 and HIPPO-2 globally, and for HIPPO-3 in the Southern Hemisphere, the disagreement between the BASE model and observations is dominated by the lack of realistic convective scavenging, and is much improved when an in-plume approach is introduced. For HIPPO-3 in the Northern Hemisphere, however, it appears that the disagreement is dominated by other effects which have not yet been identified. This is consistent with HIPPO-3 occurring in Northern Hemisphere spring, when convective precipitation in the northern mid-latitude Pacific is relatively weak.
The differences can also be seen in the burdens (top two rows of Fig. 1). The BASE simulation over-predicts the BC burden at most of the profile locations, in many cases by an order of magnitude. CVSCAV+G3M performs much better, with the model burden frequently close to the range estimated from the HIPPO observations. For brevity, the separate plots for CVSCAV and G3M are omitted; however, as before, most of the improvement is seen in the former and is particularly pronounced over the tropical warm pool region where strong convective scavenging is expected. The high burdens observed in the Arctic in HIPPO-1 were attributed to a localised biomass-burning plume (Schwarz et al., 2010) as they were dominated by two particular profiles which were close together. In HIPPO-2 and HIPPO-3, however, the high Arctic burdens are a more systematic feature of the profiles in this region. This suggests that the model is underestimating the transport of BC to the Arctic, either due to errors in the transport itself, or because it is removed too rapidly (probably by large-scale wet scavenging, since this affects both BASE and CVSCAV simulations).

ECHAM5-HAM2
The BC MMR from ECHAM5-HAM2 (in its BASE configuration), sampled at 1-minute intervals along the flight track for the first three phases of the HIPPO campaign, is shown in the bottom row of Fig. 2. These do not exhibit the large upper-troposphere excesses seen in the HadGEM3-UKCA BASE simulation, but there are some unexpectedly large mixing ratios at even higher altitudes (including into the lower stratosphere).
Figure 4 shows the difference between the ECHAM5-HAM2 simulations in each configuration and the actual observations from each phase of the HIPPO campaign. The lower-stratosphere anomalies are clear in all simulations, and for HIPPO-1 and HIPPO-2 the BASE configuration shows patches of (mostly positive) bias throughout the troposphere that are not immediately obvious from Fig. 2. Some of the strongest biases are reduced in the G3M simulation: in particular, at lower levels around the equator (for all three phases) and also in the southern mid-latitudes (for HIPPO-1). This suggests that part of the tropospheric error in the BASE configuration may be attributable to the choice and implementation of biomass-burning emissions; however, as in HadGEM3-UKCA, the improvement is not decisive enough to exclude the possibility that we are compensating for other biases in the model. It is also possible that some of this difference is due to the different vertical profile of emissions between BASE and G3M. However, an additional simulation (not shown here) for HIPPO-1 with the same GFED2 emissions as BASE, but the boundary-layer-following vertical profile of G3M, shows results very similar to BASE. This suggests that it is the updated inventory, rather than the change in vertical profile, which makes the difference. Similarly, using a boundary-layer-following emission profile in HadGEM3-UKCA (instead of the default fixed ~50 m to 3 km profile) makes little difference, indicating that the different emission profiles do not contribute significantly to the differences between the two models.
As in HadGEM3-UKCA (CVSCAV), HIPPO-3 looks rather different to the earlier phases.There appears to be little change between the ECHAM5-HAM2 BASE and G3M simulations, with the tropospheric error in both simulations dominated by negative anomalies throughout most of the Northern Hemisphere.
For all three phases, there is almost no visible difference between the G3M and G3H simulations. This indicates that, at least for simulations in remote regions such as those covered in the HIPPO campaign, monthly biomass-burning emissions are sufficient, as any high-frequency variability at the source is smoothed out during transport. Higher-time-resolution emissions could provide more benefits for simulations closer to source regions, however.
The ECHAM5-HAM2 simulated burdens (third row of Fig. 1) overestimate the observations in a pattern similar to that of the HadGEM3-UKCA BASE simulation, despite ECHAM5-HAM2 already having an in-plume convective scavenging scheme. We cannot determine from this analysis what process is responsible for the high burden in ECHAM5-HAM2; a more detailed study of the role of the different processes in this model would be required. It may still be some aspect of scavenging which is too weak, but equally the problem may be elsewhere. The AEROCOM Phase I (Textor et al., 2006) median model, shown in the bottom row, shows a similar positive bias.
The presence of high BC burdens in the remote Pacific in these models, in a way that does not correspond with observations, suggests that the model BC lifetime is too long.In HadGEM3-UKCA this appears to be a structural issue with convective scavenging; in ECHAM5-HAM2 and other models it may be due to different processes.

Quantitative evaluation
Figure 5 shows the bias and correlation coefficient of log(BC MMR) for each simulation against each phase of the HIPPO campaign, along with bootstrap uncertainty estimates as described in Sect. 4.
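As a rough illustration of how such statistics and their uncertainties could be computed, the following sketch evaluates the mean bias and Pearson correlation of log10(BC MMR) and resamples contiguous blocks of along-track points. It is not the procedure of Sect. 4: the definition of bias as the mean log difference, the block length, the number of resamples and all function names are assumptions made for the example, and the ±30 % measurement error is not included.

```python
import math
import random


def log_bias_and_correlation(model, obs):
    """Mean bias and Pearson correlation of log10(model) against log10(obs)."""
    lm = [math.log10(m) for m in model]
    lo = [math.log10(o) for o in obs]
    n = len(lm)
    mean_m, mean_o = sum(lm) / n, sum(lo) / n
    cov = sum((a - mean_m) * (b - mean_o) for a, b in zip(lm, lo))
    var_m = sum((a - mean_m) ** 2 for a in lm)
    var_o = sum((b - mean_o) ** 2 for b in lo)
    return mean_m - mean_o, cov / math.sqrt(var_m * var_o)


def moving_block_bootstrap(model, obs, block_len=20, n_boot=500, seed=0):
    """Return 95 % intervals ((bias_lo, bias_hi), (corr_lo, corr_hi)).

    Contiguous blocks are resampled with replacement so that the
    along-track autocorrelation within each block is preserved.
    """
    n = len(model)
    if n != len(obs) or not 1 <= block_len <= n:
        raise ValueError("series must have equal length >= block_len")
    rng = random.Random(seed)
    n_blocks = math.ceil(n / block_len)
    biases, corrs = [], []
    for _ in range(n_boot):
        idx = []
        for _block in range(n_blocks):
            start = rng.randint(0, n - block_len)  # both bounds inclusive
            idx.extend(range(start, start + block_len))
        idx = idx[:n]  # trim to the original series length
        bias, corr = log_bias_and_correlation(
            [model[i] for i in idx], [obs[i] for i in idx]
        )
        biases.append(bias)
        corrs.append(corr)

    def interval(values):
        s = sorted(values)
        return s[int(0.025 * (len(s) - 1))], s[int(0.975 * (len(s) - 1))]

    return interval(biases), interval(corrs)
```

Under this sketch, two simulations would be judged to differ significantly in correlation when their `(corr_lo, corr_hi)` intervals do not overlap.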
The improvement in both bias and correlation when switching to in-plume convective scavenging in HadGEM3-UKCA can clearly be seen when going from BASE to CVSCAV, or from G3M to CVSCAV+G3M (with the exception of the negative bias against HIPPO-2 and HIPPO-3 in the CVSCAV+G3M simulation). The improvement in correlation should perhaps be regarded as more relevant, as the bias is likely to be more susceptible to model tuning/calibration. For all three phases, this increase in correlation (0.22 → 0.41, 0.27 → 0.42, 0.51 → 0.65 between BASE and CVSCAV for HIPPO-1, -2, -3, respectively) is statistically significant in the sense that the error bars of the nudged BASE and CVSCAV (or G3M and CVSCAV+G3M) simulations do not overlap. Carrying out the analysis separately for the points in the two hemispheres (not shown) indicates that the increase in correlation comes largely from the Northern Hemisphere.
For both models, a small improvement in correlation (although not in bias for HadGEM3-UKCA CVSCAV) is seen when going from BASE to G3M (or CVSCAV to CVSCAV+G3M), although the overlapping error bars indicate that this is not statistically significant.As with the visual analysis, there is almost no difference between the ECHAM5-HAM2 G3M and G3H simulations.It is probably the case that evaluation closer to source regions would be more powerful in distinguishing between emissions inventories and their time resolution.
The correlation and, in most cases, also the bias are much improved in the nudged HadGEM3-UKCA simulations (solid symbols) compared to their free-running counterparts (hollow symbols). The correlation increases for the BASE configuration are 0.14 → 0.22, 0.08 → 0.27, 0.44 → 0.51 between free-running and nudged simulations for HIPPO-1, -2, -3, respectively. In addition, the improvement in bias and correlation from changes to the model configuration is enhanced in the nudged simulations. This is particularly significant for HIPPO-1, where nudging eliminates the overlapping error bars on the correlation axis between BASE and CVSCAV. This allows us to conclude that the improvement in CVSCAV is statistically significant, which may not have been clear from the free-running simulations alone. (It should be noted in this context that the error bars on the free-running models are an underestimate; ensemble simulations would be needed to quantify the additional uncertainty from the simulated meteorology.) Thus, not only does nudging help to produce realistic simulations of aerosol during a given flight campaign, but it also makes it easier to evaluate the effect of changes to the aerosol scheme by damping errors due to differences in large-scale dynamics.

Comparison with profile curves
To compare the point-by-point analysis presented here with a more traditional approach, we have constructed profile curves from the HIPPO-1 observations for four latitude bands using the (geometric) mean and standard deviation over all the profiles identified for the burden analysis (as described in Sect. 4.1) in each latitude band. We have also constructed corresponding curves from the January 2009 monthly-mean output from the HadGEM3-UKCA simulations, by horizontally interpolating to the location of each profile identified in the observations. The results are shown in Fig. 6. Although the construction is similar to that in Schwarz et al. (2010), the curves are not identical due to the different profile-detection algorithm used. While some improvement from both CVSCAV and G3M can be seen in these curves, this is rather less clear than in the point-by-point analysis (Fig. 3). In many cases, the differences are overshadowed either by the variability between profiles within a region (e.g. in the 60° N–80° N band, where all four model curves lie almost entirely within the spread of the observed profiles) or by overall regional biases (e.g. in the 20° S–20° N band, where all four model curves lie almost entirely outside the spread of the observed profiles). It is therefore difficult to ascribe statistical significance to the model improvements from this analysis, as we did in Sect. 6.3 from Fig. 5.
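For reference, the geometric mean and geometric standard deviation used for such profile curves can be sketched as follows for the mixing ratios falling in a single altitude bin of one latitude band. The function names are hypothetical, and the use of the sample (n - 1) variance of the logarithms and of a multiplicative one-standard-deviation band are assumptions; the binning and profile detection of Sect. 4.1 are not reproduced.

```python
import math


def geometric_mean_and_std(values):
    """Geometric mean and geometric standard deviation of positive values."""
    if len(values) < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two strictly positive values")
    logs = [math.log(v) for v in values]
    n = len(logs)
    mean_log = sum(logs) / n
    # Sample variance of the logarithms (n - 1 in the denominator).
    var_log = sum((x - mean_log) ** 2 for x in logs) / (n - 1)
    return math.exp(mean_log), math.exp(math.sqrt(var_log))


def spread_band(values):
    """Lower and upper edges of a one-geometric-standard-deviation band."""
    gm, gsd = geometric_mean_and_std(values)
    # The geometric standard deviation is a multiplicative factor.
    return gm / gsd, gm * gsd
```

Because the band is multiplicative, it appears symmetric about the mean curve when the mixing ratio is plotted on a logarithmic axis.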
This demonstrates the usefulness of the point-by-point analysis presented here in allowing us to evaluate process-level changes to the model with rather more confidence than can be obtained from a more traditional approach.

Conclusions
In this study, we develop methods for evaluating aerosol-climate models against large-scale aircraft campaigns, and apply these to investigate the impact of convective scavenging and biomass-burning emissions on the vertical profile of black carbon.
By running two aerosol-climate models in nudged configurations and interpolating their output onto the track of a flight campaign, we make a detailed pointwise comparison between model output and in-situ aircraft observations. Using data from a campaign such as HIPPO, which has good vertical resolution over an extended geographical area, this gives a powerful tool for evaluating the vertical distribution of aerosol in the models. We also show how these measurements can be used to evaluate column-integrated burdens in the models, which are a more direct product of most models than the optical/radiative properties (e.g. aerosol optical depth) which can be evaluated via remote sensing.
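To make the notion of a column-integrated burden concrete, the sketch below integrates a mass mixing ratio profile over pressure, using the hydrostatic relation burden = (1/g) ∫ q dp. This is an illustrative assumption rather than the method of Sect. 4.1: the function name is hypothetical, a trapezoidal rule is used, and the extrapolation of observed profiles to the surface and to a 15 km lid is not included.

```python
G = 9.80665  # standard gravitational acceleration (m s^-2)


def column_burden(pressure_pa, mmr):
    """Column burden (kg m^-2) from a profile of mass mixing ratio (kg kg^-1).

    pressure_pa -- pressure at each profile level (Pa), in any order
    mmr         -- mass mixing ratio at the same levels
    """
    if len(pressure_pa) != len(mmr) or len(mmr) < 2:
        raise ValueError("need two or more matching pressure and MMR levels")
    # Sort levels by pressure so that each layer thickness is positive.
    levels = sorted(zip(pressure_pa, mmr))
    total = 0.0
    for (p0, q0), (p1, q1) in zip(levels, levels[1:]):
        total += 0.5 * (q0 + q1) * (p1 - p0)  # trapezoidal rule per layer
    return total / G
```

For example, a uniform mixing ratio of 1e-11 kg kg^-1 between 1000 hPa and 200 hPa corresponds to a burden of roughly 8e-8 kg m^-2.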
We apply this approach to black carbon aerosol in the HadGEM3-UKCA and ECHAM5-HAM2 models, and show how each has different areas of disagreement with the HIPPO SP2 observations. Both models significantly overpredict BC burden, especially in the more remote regions, suggesting that the BC lifetime is too long. In the case of HadGEM3-UKCA, the largest discrepancy (an excess of aerosol in the tropical upper troposphere) can be eliminated by switching from the default operator-split convective scavenging scheme to one which scavenges directly from the convective plume. This change improves both the vertical distribution of BC and the simulated burdens against the HIPPO observations, yielding a statistically significant increase in the pointwise correlation coefficient for all three phases of the HIPPO campaign (0.22 → 0.41, 0.27 → 0.42, 0.51 → 0.65 for HIPPO-1, -2, -3, respectively).
In both models, a somewhat smaller and not statistically significant improvement can be seen when switching from GFED2-based biomass-burning emissions to GFED3.1; however, there is virtually no change in this remote region when the time resolution of these emissions is increased from monthly to 3-hourly.It seems likely that a similar analysis with a wider range of flight campaigns, including the major biomass-burning regions, might better constrain the choice of such emissions.
We show for HadGEM3-UKCA that both the correlation between the BASE configuration and the observations, and the increase in correlation due to the new convective scavenging scheme, are enhanced when the simulations are nudged as opposed to free-running.In this way, nudging can enable statistically significant improvements in the model to be detected where they might not be in a free-running simulation; e.g. the above increase in the correlation of the nudged model against HIPPO-1 (from 0.22 to 0.41) is statistically significant, while the corresponding increase for the free-running model (from 0.14 to 0.27) is not.
In an analysis of this kind, there is always the potential for compensating errors in different parts of the model to obscure which processes are poorly represented. This is particularly true in this case where we make a change in the biomass-burning emissions and compare to observations in remote regions; an analysis nearer the source regions might better distinguish emissions effects from other sources of bias. For convective scavenging in HadGEM3-UKCA, however, we see a very clear improvement from a more physically realistic implementation without adjusting any other model parameters, with very little new error introduced; thus we conclude that we are not simply compensating for other errors, but that it is the convective scavenging itself which must be accurately represented in the model to obtain a realistic vertical profile.
It is clear that vertically-resolved in-situ measurements of aerosol have an important role to play in evaluating the aerosol distributions simulated by aerosol-climate models, in conjunction with satellite remote sensing and ground-based observations, and that they can provide particular insight into the processes governing the vertical transport of aerosol in the atmosphere, as we have seen with convective scavenging.

Fig. 1. Flight tracks for the first three phases of the HIPPO campaign (January 2009, October/November 2009 and March/April 2010, respectively). The circles show the BC burden (in kg m⁻²) estimated from the HIPPO SP2 observations over each vertical profile, while the background shading shows the monthly-mean BC burden from the HadGEM3-UKCA (BASE and CVSCAV+G3M) and ECHAM5-HAM2 (G3M) simulations. The bottom row shows the burdens from the AEROCOM Phase I (Textor et al., 2006) median model (constructed from the ARQM, GISS, GOCART, GRANTOUR, KYU, LOA, MATCH, MPI HAM, MOZGN, PNNL, UIO CTM, UIO GCM, ULAQ and UMI models). The side plots show the observed burdens (red bars, representing the range due to uncertainty in extrapolation of profiles to the surface and a 15 km lid, plus the 30 % uncertainty in the mixing ratios used), the along-track model burden (blue line, two-valued due to the southbound and northbound legs) and the zonal range of the model burden between the map edges (shading).

Fig. 2. Mass mixing ratio (kg kg⁻¹) of BC in the atmosphere, from each phase of the HIPPO campaign, calculated by aggregating SP2 data over 1-minute intervals, and from nudged HadGEM3-UKCA (BASE and CVSCAV) and ECHAM5-HAM2 (BASE) simulations, sampled along the HIPPO flight track (also at 1-min intervals).

Fig. 4. Difference of BC mass mixing ratio (kg kg⁻¹) simulated by ECHAM5-HAM2 in each configuration (rows) from that observed during each phase of the HIPPO campaign (columns). The model is nudged and sampled along the HIPPO flight track at 1-min intervals; observed mixing ratio is calculated from HIPPO SP2 data aggregated over 1-min intervals.

Fig. 5. Bias-correlation plots of log(BC mass mixing ratio) between the HadGEM3-UKCA (top row) and ECHAM5-HAM2 (bottom row) simulations and each phase of the HIPPO campaign (columns). The error bars represent a 95 % confidence interval based on a moving-block bootstrap and the ±30 % error in the SP2-derived mixing ratios from HIPPO-1. The solid symbols represent nudged simulations, while the hollow symbols (for HadGEM3-UKCA) represent free-running simulations. The "obs." point on the right-hand side indicates where a model which reproduces the observations perfectly would be located.

Fig. 6. Vertical profile curves of BC mass mixing ratio for HIPPO-1 and horizontally-matched locations in the January 2009 monthly-mean output from the HadGEM3-UKCA simulations. The shaded region shows the (geometric) standard deviation of the observed MMR values over the profiles in each latitude band, plus the ±30 % measurement error.

Table 1. Differences relevant to black carbon between the aerosol schemes in HadGEM3-UKCA and ECHAM5-HAM2, in their BASE configurations.