THE GLOBAL AEROSOL SYNTHESIS AND SCIENCE PROJECT (GASSP): Measurements and Modeling to Reduce Uncertainty

Novel methodologies for quantifying model uncertainty are combined with an extensive new database of in situ aerosol microphysical and chemical measurements to reduce uncertainty in aerosol effects on climate.

Earth's planetary radiative balance is strongly affected by atmospheric aerosol particles, which reflect and absorb solar radiation and influence the albedo and other properties of clouds. Air pollution has altered the properties of aerosols and caused a change in radiative balance, or radiative forcing, over the industrial period of between near 0 and −2 W m−2 (Boucher et al. 2013). This uncertainty has persisted through all Intergovernmental Panel on Climate Change (IPCC) assessment reports since 1996 and significantly limits our understanding of historical climate change and our confidence in climate change projections (Andreae et al. 2005; Seinfeld et al. 2016).
Changes in aerosols also have important effects on regional climate, atmospheric circulation, clouds, and precipitation (Shindell and Faluvegi 2009; Booth et al. 2012; Philipona et al. 2009; Wang et al. 2014; Kaufman et al. 2005; Stevens and Feingold 2009; Rosenfeld et al. 2008; Smith et al. 2016; Bollasina et al. 2011). Our ability to reliably quantify the long-term effects of aerosols on these climate processes ultimately rests on being able to reliably simulate aerosol properties and radiative effects on regional and global scales.
The uncertainty in aerosol radiative forcing has not fallen over the last 20 years despite substantial developments in model complexity, numerous model intercomparison projects, and enormous investments in global observing systems. While our knowledge about aerosols has improved, this has not been translated into more robust global models, that is, models that can make reliable predictions in spite of uncertainties in the underlying processes (see sidebar "Model uncertainty, constraint, and robustness" for definitions of terms related to model uncertainty). A recent assessment shows that global aerosol models have a very large spread of a factor of 2-30 (depending on region) in their simulations of climate-relevant aerosol properties (Mann et al. 2014).
The main challenge we face in reducing uncertainty is that models are now very complex, which can reduce their robustness (Knutti and Sedláček 2012). Early models simulated just the mass of various aerosol chemical components, but many now simulate the full aerosol size distribution and all the associated microphysical processes in order to more realistically simulate how aerosols interact with solar radiation and clouds (Ghan and Schwartz 2007). This added complexity affects how we tackle the uncertainty. First, simulated aerosol properties are controlled by dozens of poorly quantified processes (Textor et al. 2006; Lee et al. 2013; Kipling et al. 2016; Kinne et al. 2006), which are difficult to observe in isolation. Second, a wider range of measurements is needed to evaluate the models. Third, the necessary aerosol microphysical and chemical properties cannot be measured using satellite remote sensing instruments (Stier 2016), so model evaluation must rely on sparse measurements from aircraft, ships, and ground stations.
This article describes the Global Aerosol Synthesis and Science Project (GASSP; http://gassp.org.uk/), which has four main objectives:
• understand and reduce the persistent uncertainty in aerosol models and the associated aerosol radiative forcing by constraining the spread of model simulations using a synthesis of aerosol measurements;
• attribute the reduction in uncertainty to particular measurements so that we can understand the "value of measurements" and identify where new measurements should be prioritized;
• understand model robustness by exploring whether models constrained by measurements remain reliable when used to predict under new conditions, which is an essential requirement for calculating radiative forcing; and
• exploit our enormous investments in aerosol measurements of microphysical and chemical properties to reduce model uncertainty as much as possible.
Figure 1 shows the four main activities in GASSP.
Objective 1 collects and harmonizes in situ aerosol measurements from aircraft, ships, and ground sites to produce an easily accessed dataset suitable for statistical constraint of global aerosol models. The first phase of GASSP has focused on constraining aerosol microphysical and chemical properties that are not well constrained by Earth-observation datasets (Stier 2015). Future extension to cloud microphysical properties would be highly valuable for evaluating modeled aerosol-cloud interaction and associated radiative forcing.
Objective 2 focuses on developing new methodologies for comparing models against sparse in situ measurements. Our research has shown that there are inherent uncertainties associated with using sparse measurement data, caused, for example, by spatial inhomogeneities that are not represented in the models (see section "Representativeness of sparse measurements and the importance for model constraint").
Objective 3 is about developing methodologies to understand and quantify model uncertainty. GASSP has focused on the spread of model predictions caused by uncertainties in the processes and inputs (like emissions) in a model, known as parametric uncertainty, because there are established statistical methodologies for quantifying and reducing this source of uncertainty (see sidebar "Model uncertainty, constraint, and robustness"). The spread of such simulations can be similar to the spread of several structurally different models (Lee et al. 2013; Kipling et al. 2016), which is the measure of uncertainty used in many assessments (Kinne et al. 2006; Lamarque et al. 2013; Eyring et al. 2016).
Objective 4 brings everything together to assess how the observational constraint of aerosol microphysical properties affects the range of aerosol radiative forcing simulated by the model.

THE GASSP AEROSOL MEASUREMENT DATABASE.
A synthesis of multiple measurements is a vital component of GASSP. As we describe in the "Model uncertainty and observational constraint" section, a diversity of measurements is a critical factor in constraining a complex model with many compensating sources of uncertainty. The community's considerable investment in measurements of aerosol microphysical and chemical properties is significantly underexploited to evaluate and constrain climate models, mainly because many datasets are not available in a harmonized form (i.e., standardized format, time units, variable names, etc.) in common repositories. A key objective of GASSP was therefore to harmonize a large fraction of the world's in situ aerosol measurements and make them "user ready." The number and sophistication of aerosol measurements have increased dramatically over the last 20 years, providing essential data to evaluate the latest model simulations of aerosol microphysical properties (see sidebar "Developments in aerosol microphysical measurements"). Long-term measurement programs have been established, such as the Global Atmosphere Watch (GAW) program; the Aerosols, Clouds, and Trace Gases Research Infrastructure (ACTRIS); and the Interagency Monitoring of Protected Visual Environments (IMPROVE) network. These data are generally readily available (see Table 1) and have been used to evaluate multiple models (Mann et al. 2014). However, many measurements are made in short campaigns, typically about a month in duration.
Table 1. List of ground-based monitoring station networks with aerosol measurements: particle number concentration N, particle number size distribution (NSD), cloud condensation nucleus concentration (CCN), speciated mass concentration/composition (Comp), black carbon mass concentration (BC), and particle mass concentration less than 2.5 µm in diameter (PM2.5). Short-term field campaign measurement data made at some of these stations are included in Table ES1.
While some campaign data have been synthesized (Heintzenberg et al. 2000; Clarke and Kapustin 2002, 2010; Asmi et al. 2013) and used to evaluate multiple models (Mann et al. 2014), most data have been used to evaluate specific aspects of particular models under specific conditions, often called "golden days." We collected measurements related to six broad aerosol properties important to the effects of aerosols on climate: particle number concentration, cloud condensation nucleus concentration, particle number size distribution, particle mass concentration less than 2.5 µm in diameter (PM2.5), chemical composition, and black carbon (BC) mass concentration (Table 2). The measurements were made with a range of instrument types (Table 3), and within the six aerosol property classes there are about 70 different measured variables related to different particle size thresholds, cloud condensation nucleus supersaturations, etc. It is this diversity of measurements, compared to gas-phase chemistry (Sofen et al. 2016), that makes it so challenging to use a wide array of aerosol data in model evaluation.
The GASSP database currently contains measurements from 86 field campaigns (of a total of 119 collected) and long-term measurements from over 350 ground-based monitoring stations spanning 1990-2015 (see Table ES1 in the online supplement). It includes 20 ship campaigns, 16 ground station campaigns, 29 aircraft campaigns, and 21 campaigns involving multiple measurement platforms. We obtained data from 15 repositories (Table ES2) and, for 42 campaigns, through direct contact with investigators. The measurements include over 9,500 flight hours (over 13,000 instrument hours) and 22,000 ship hours (over 33,000 instrument hours). We estimate the total research investment in the aircraft measurements to be about $182 million (based on an average of $18,000 per flight hour plus $375,000 per detachment), not including personnel hours. The cost for ship measurements is about $57 million (based on $60,000 per day and $130,000 per cruise).
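As a rough check on these investment figures, the arithmetic can be reproduced in a few lines. This is only a sketch under stated assumptions: the text does not give the number of aircraft detachments or ship cruises, so they are assumed here to equal the number of aircraft campaigns (29) and ship campaigns (20), ship days are taken as ship hours divided by 24, and campaigns on multiple platforms are ignored.

```python
# Quantities quoted in the text (costs in USD)
FLIGHT_HOURS = 9_500
COST_PER_FLIGHT_HOUR = 18_000
COST_PER_DETACHMENT = 375_000
SHIP_HOURS = 22_000
COST_PER_SHIP_DAY = 60_000
COST_PER_CRUISE = 130_000

# Assumptions (not stated in the text): one detachment per aircraft
# campaign and one cruise per ship campaign.
N_DETACHMENTS = 29
N_CRUISES = 20


def aircraft_cost():
    """Flight-hour cost plus a fixed cost per detachment."""
    return FLIGHT_HOURS * COST_PER_FLIGHT_HOUR + N_DETACHMENTS * COST_PER_DETACHMENT


def ship_cost():
    """Daily ship cost plus a fixed cost per cruise."""
    ship_days = SHIP_HOURS / 24
    return ship_days * COST_PER_SHIP_DAY + N_CRUISES * COST_PER_CRUISE


print(f"aircraft: ${aircraft_cost() / 1e6:.1f} million")
print(f"ship:     ${ship_cost() / 1e6:.1f} million")
```

Under these assumptions the totals come to about $181.9 million for aircraft and $57.6 million for ships, close to the rounded figures quoted above.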
It took about 1.5 person years to collect and harmonize 52,000 data files from about 20 original file types and formats ranging from simple text files to standard formats like National Aeronautics and Space Administration (NASA) Ames (Stephens 2008) or International Consortium for Atmospheric Research on Transport and Transformation (ICARTT; Aknan et al. 2013). Considerable effort was put into merging files and correcting or standardizing units, variable names, error flags, missing value indicators, and time dimensions. We also collated metadata (as well as missing metadata from published papers) such as the location of the measurement, the sampling conditions (e.g., standard or ambient pressure), and information on the instrument type, detection limits, particle size definition, occurrence of clouds, and whether the particles were dried or not.
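To give a flavor of what standardizing units and missing value indicators involves, the following minimal sketch converts raw values to a common unit and maps missing-value sentinels to NaN. It is an illustration only, not the GASSP processing code: the sentinel values, the unit table, and the function name `harmonize` are all hypothetical.

```python
import math

# Hypothetical sentinel values used by different source files to flag missing data
MISSING_SENTINELS = {-9999.0, -999.0, 9999.0}

# Hypothetical conversion factors to a common mass concentration unit (ug m-3)
TO_UG_M3 = {"ug m-3": 1.0, "ng m-3": 1e-3, "mg m-3": 1e3}


def harmonize(values, unit):
    """Convert raw values to ug m-3 and replace missing-value sentinels with NaN."""
    if unit not in TO_UG_M3:
        raise ValueError(f"unknown unit: {unit!r}")
    factor = TO_UG_M3[unit]
    return [math.nan if v in MISSING_SENTINELS else v * factor for v in values]


# Example: BC mass concentration reported in ng m-3 with one missing sample
raw = [120.0, -9999.0, 85.0]
clean = harmonize(raw, "ng m-3")
```

Raising an error on an unrecognized unit, rather than passing values through unchanged, is a deliberate choice in this sketch: silent unit mismatches are hard to detect once files have been merged.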
The data were converted to a standardized Network Common Data Form (netCDF) format (Pringle 2017). We included a large number of data fields or "tags" within the global attributes of the file, with many adopted from the National Center for Atmospheric Research (NCAR) Research Aviation Facility (RAF) file format (NCAR 2009), such as the bounding box and time coverage of the measurements. To make model evaluation more tractable, the information necessary to construct observed variables from model output variables is stored as GASSP file attributes. GASSP uses data "levels" based on terminology widely used in remote sensing. See Table 3 for a summary of instruments and abbreviations.
Gridded "level 3" data formats (commonly provided with remote sensing datasets) and aerosol climatologies (Heintzenberg et al. 2000) allow for easy data handling. However, because requirements differ for specific applications, GASSP has instead supported the development of the Community Intercomparison Suite (CIS; www.cistools.net; Watson-Parris et al. 2016), an open-source command-line tool and Python library designed to read, aggregate, collocate, analyze, and visualize a wide range of ungridded and gridded datasets. CIS has a plug-in for NCAR-RAF and GASSP format data so the full GASSP level 2 database is accessible via CIS, allowing the in situ measurements to be collocated with model output and many satellite products.
Figures 2 and 3 show the spatial and temporal distributions of the main measurement types. To give an impression of the density of measurements (albeit over several years), Fig. 4 shows particle concentrations from aircraft campaigns. North America is by far the most sampled region in terms of the number of airborne measurements, with more than 50% of the measurements conducted there. Most of the other airborne measurements have been made over Europe, the Pacific, and the Arctic (but less so the high Arctic), with less than 5% over the Southern Ocean, Africa, and most of Asia. Shipborne measurements of total particle concentrations are widespread, but measurements of cloud condensation nuclei (CCN), particle composition, and BC (see sidebar "Developments in aerosol microphysical measurements") are concentrated over the Atlantic and Arctic. The availability of measurements from aircraft and ships varies substantially with the time of year, with only 8% of global measurements in winter (December-February).
Pristine (unpolluted) regions are considerably undersampled, especially on the continents. Such regions are informative about aerosol conditions in the preindustrial era (Hamilton et al. 2014), which are an important component of the aerosol-cloud radiative forcing uncertainty because cloud properties are very sensitive to aerosol uncertainties in clean conditions (Carslaw et al. 2013a).
Until 2004 almost all measurements were of particle concentration or size distribution, but since then many measurements of speciated particle composition have been made by the Aerodyne aerosol mass spectrometer (AMS) and BC concentrations from the single particle soot photometer (SP2) instrument.

Fig. 3. Number of data hours by year and month for the main aerosol instrument types: particle number concentration using CPC, CCN concentrations using CCN counters (CCNC), speciated mass concentrations using AMS or ACSM instruments, BC mass concentration using the SP2 instrument, and particle number size distribution (NSD) using SMPS, DMPS, UHSAS, PCASP, DMA, and APS. Instrument types are defined in Table 3.

MODEL UNCERTAINTY, CONSTRAINT, AND ROBUSTNESS
Most simply, uncertainty can be defined as the spread of outputs in model simulations.
Structural uncertainty arises because there are different ways of representing the physical processes in a model because of insufficient knowledge, simplifications, different algorithms, or missing processes. Structural differences between models explain a large part of the range of predictions between different models, often evaluated through a multimodel ensemble (MME). The spread of outputs from a small number of models is often called the model diversity, and it is this diversity that has persisted through all IPCC assessments.
Process or parametric uncertainty is caused by uncertain values of model input parameters, like chemical-rate coefficients or emissions (Lee et al. 2013). Parametric uncertainty can be quantified using a set of simulations in which parameter values are systematically perturbed (a perturbed-parameter ensemble), an approach widely used in climate science (Allen et al. 2000; Pan et al. 1998; Lohmann and Ferrachat 2010; Haerter et al. 2009; Sexton et al. 2011; Shiogama et al. 2012; Yang et al. 2013). Such ensembles are often combined with statistical emulation (see sidebar "Model emulation to enable dense sampling of uncertainty"), which enables model outputs to be generated for all parts of parameter space.
Constraint of a model is the process of reducing the spread of model simulations by comparing the simulations against measurements. There are various ways of defining plausible or acceptable model simulations, taking account of the measurement uncertainty and the model-measurement sampling uncertainty (see "Representativeness of sparse measurements and the importance for model constraint" section; Lee et al. 2016).
Model-measurement sampling uncertainty is the uncertainty associated with comparing a model with a measurement. Measurements sample only a small part of the atmosphere, so uncertainty arises because the real world has greater spatial heterogeneity and temporal variability than is represented in a model.
A robust model is one that is reliable and can make useful predictions under different sets of conditions in spite of uncertainties in the model and its inputs (Knutti and Sedláček 2012). The robustness of a model can only be assessed if the uncertainties have been systematically explored.

DEVELOPMENTS IN AEROSOL MICROPHYSICAL MEASUREMENTS
As the complexity of models increases, so does the demand for more advanced measurements. In the last 20 years many measurement techniques have been developed that are capable of measuring aerosol particle number, size, composition, and other properties with high time resolution and on a variety of platforms. A full list of instruments in GASSP is given in Table 3.
Mobility particle size spectrometers to measure size-resolved particle number concentrations have existed for a long time. International standards have been developed so that instruments can be operated consistently and those data used with confidence (Wiedensohler et al. 2012). The GASSP database includes measurements from scanning and differential mobility particle sizers (SMPS/DMPS), aerodynamic particle sizers (APS), differential mobility analyzers (DMA), ultra-high-sensitivity aerosol spectrometers (UHSAS), passive cavity aerosol spectrometer probes (PCASP), and optical particle counters (OPC).
Size distributions can be compared directly with models (Mann et al. 2014), but are often summarized in terms of the particle number concentration above a particular size (e.g., N50 for particles larger than 50-nm diameter). Variables like N50 and N100 are often used by modelers as representative of CCN concentrations (Mann et al. 2014).
Particle composition can now be measured in real time with AMS [and the related aerosol chemistry speciation monitor (ACSM)] (Canagaratna et al. 2007; Ng et al. 2011). The instruments provide quantitative data on the nonrefractory particle components, that is, organic matter, ammonium, sulfate, nitrate, and chloride.
Refractory BC mass can be measured in real time by SP2, which uses laser-induced incandescence (Schwarz et al. 2006). High sensitivity allows airborne measurements even in pristine air. Previously, measurements had been made by combustion analysis or using optical absorption as a proxy, which involves serious artifact issues (Andreae and Gelencsér 2006).
CCN measurements were previously made using parallel plate systems (Snider et al. 2006, 2010). Now the continuous flow method (Roberts and Nenes 2005) commercialized by Droplet Measurement Technologies allows for faster measurements at supersaturations more representative of atmospheric conditions.
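The summary variables N50 and N100 described in the sidebar "Developments in aerosol microphysical measurements" can be derived from a binned size distribution by summing the number concentration above the size threshold. The sketch below assumes a hypothetical bin layout of (lower diameter, upper diameter, number concentration) and, for a bin that straddles the threshold, assumes particles are spread uniformly in log-diameter within the bin. Neither choice is taken from the GASSP processing.

```python
import math


def number_above(bins, d_min_nm):
    """Number concentration (cm-3) of particles with diameter above d_min_nm.

    bins: list of (d_lower_nm, d_upper_nm, n_cm3) tuples.
    A bin straddling the threshold contributes in proportion to the part of
    its log-diameter width that lies above the threshold (an assumption).
    """
    total = 0.0
    for d_lo, d_hi, n in bins:
        if d_lo >= d_min_nm:
            total += n
        elif d_hi > d_min_nm:
            frac = math.log(d_hi / d_min_nm) / math.log(d_hi / d_lo)
            total += n * frac
    return total


# Hypothetical three-bin size distribution
bins = [(10, 25, 800.0), (25, 100, 1200.0), (100, 500, 300.0)]
n50 = number_above(bins, 50)
n100 = number_above(bins, 100)
```

Because different instruments use different bin edges and size definitions, the treatment of the straddling bin is one place where a model-measurement comparison can pick up avoidable differences.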

REPRESENTATIVENESS OF SPARSE MEASUREMENTS AND THE IMPORTANCE FOR MODEL CONSTRAINT.
Even the largest consistent collection of in situ aerosol measurements remains spatially and temporally sparse, raising important questions about how representative the measurements are of what a low-resolution model simulates. Measurement sparseness can introduce model-measurement sampling uncertainties (see sidebar "Model uncertainty, constraint, and robustness").
In situ measurements at ground sites are particularly important data in GASSP. To quantify the potential spatial sampling errors in a typical continental environment, we used a high-resolution model to simulate "measurements" and compared them with spatially averaged fields representing a 100-km-scale global model grid cell (Schutgens et al. 2016a). Figure 5 shows that sampling errors can be as large as 80% when using monthly mean model output, but the errors are typically less than 30%. The error depends on the spatial heterogeneity of the aerosols, so particle number concentrations and BC mass concentrations have much larger sampling errors than PM2.5 due to a greater heterogeneity in sources.
Our analysis suggests that spatial sampling errors significantly exceed measurement errors: measurement errors for PM2.5 and BC mass concentrations are typically 15%, while typical instantaneous sampling errors (for data sampled every hour) are about 50% (Schutgens et al. 2016a). These errors can be reduced by about a factor of 3 by averaging over a month. However, there is a catch: long-term averaging increases the likelihood that model-measurement agreement is a result of compensating errors, such as underpredicting aerosol concentrations in polluted conditions but overpredicting them in clean conditions.
Another problem is that measurements are often discontinuous. Monthly mean sampling errors can be as high as 37% when measurements are discontinuous, with sometimes significant regionwide biases (Schutgens et al. 2016b). These increased errors can be largely mitigated by sampling the model to the measurement times before monthly averaging, for example, by using the CIS tools (Watson-Parris et al. 2016).
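The effect of sampling the model at the measurement times before averaging can be illustrated with a toy example. The sketch below does not use CIS. It builds a hypothetical hourly model series with a diurnal cycle and a hypothetical measurement record that exists only during the morning, then compares the monthly mean over all model hours with the mean over the measured hours only.

```python
import math

HOURS = 24 * 30  # one 30-day month of hourly values

# Toy "model" series with a diurnal cycle (arbitrary concentration units)
model = [100.0 + 50.0 * math.sin(2 * math.pi * (h % 24) / 24) for h in range(HOURS)]

# Hypothetical discontinuous record: measurements exist only from 06:00 to 11:00
obs_hours = [h for h in range(HOURS) if 6 <= h % 24 < 12]


def mean(values):
    return sum(values) / len(values)


naive_mean = mean(model)                               # all model hours
collocated_mean = mean([model[h] for h in obs_hours])  # measured hours only

print(f"naive: {naive_mean:.1f}  collocated: {collocated_mean:.1f}")
```

In this toy case the two model means differ by roughly a third purely because of when the samples were taken, so comparing the all-hours mean with a morning-only measurement record would suggest a bias that the model does not have.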
Flight campaign data present a particular challenge. In Fig. 6, we compare the spatial sampling error distribution for instantaneous measurements from a flight campaign and a point measurement in a biomass-burning environment. While model sampling errors are smaller for flight measurements than for single point measurements (because multiple point measurements are averaged across a model grid box), the errors can still exceed 30%-50%. Campaigns in which aerosol plumes are deliberately followed will of course be particularly prone to model spatial sampling errors and biases, so they are not ideal for the statistical evaluation of global models.

Fig. 6. Spatial sampling errors (1-h mean) for BC mass concentrations for typical flight campaigns over the Congo during the biomass-burning season. The gray shades indicate sampling errors for north-south tracks through global model grid boxes, with the shade indicating different quantile ranges (light: 2%-98%; medium: 9%-91%). The red dashed line shows the 9%-91% quantile range for east-west tracks, while the black dashed line represents point measurements.

Our overall conclusion is that the model-measurement sampling errors are likely to be nonnegligible compared to the model uncertainty that we are trying to constrain. Figure 7 shows the estimated standard deviation of monthly mean particle concentrations in the Global Model of Aerosol Processes (GLOMAP; Carslaw et al. 2013b; see "Model uncertainty and observational constraint" section). Over heterogeneous continental regions the standard deviation is about 50%-100%, so the spatial sampling errors for in situ measurements described above will limit the extent to which these model uncertainties can be reduced. Satellite measurements (e.g., of aerosol optical depth) may help alleviate the spatial sampling problem associated with in situ measurements, although they are of course very limited in the aerosol microphysical properties they can constrain. A combination of in situ and satellite data will likely be most effective.

Fig. 9. The estimated reduction in uncertainty in modeled concentrations of particles larger than 50-nm diameter (N50) when an N50 measurement in central Europe is used to constrain the model. The reduction in uncertainty broadly mirrors the uncertainty cluster where anthropogenic aerosol parameters are the main source of uncertainty.

MODEL EMULATION TO ENABLE DENSE SAMPLING OF UNCERTAINTY
A perturbed-parameter ensemble (see sidebar "Model uncertainty, constraint, and robustness") of several hundred simulations of a climate model can only sample the multidimensional parameter uncertainty space extremely sparsely. Even with just two parameter settings for each dimension (a typical high-low perturbation approach), around 1 billion simulations are needed to cover 30 dimensions with all possible parameter combinations. In GASSP we therefore used emulators to enable a very dense sampling of the parameter space of the global model.
The purpose of emulation is to use the outputs from a perturbed parameter ensemble to generate continuous functions (multidimensional response surfaces) that describe how the model outputs vary across all the uncertain model parameters sampled by the simulations (the training data for the emulators). The emulators are validated by testing them against an additional set of model simulations.
Emulators can be built to describe the behavior of a model output at a specific location, the mean behavior over a region, or globally (Lee et al. 2012, 2013; Carslaw et al. 2013a; Regayre et al. 2014, 2015). An emulator is not a simplified model. Rather, it is a way to generate model output at untried parts of a particular model's parameter space. In GASSP we used emulators to generate millions of "model variants" by sampling across the parameter space very densely in a Monte Carlo way. These model variants can be used to i) generate probability density functions to characterize uncertainty in a model output, ii) perform a sensitivity analysis to determine which processes affect the uncertainty, iii) compare against measurements so that plausible parts of parameter space can be found, and iv) test model robustness by using plausible parts of parameter space to quantify uncertainty in radiative forcing.
We have found that about 6-10 training simulations per parameter are needed to explore 30 dimensions of uncertainty in an aerosol-climate model, thus requiring about 180-300 simulations. When the meteorology is well defined (such as in a nudged climate model), then 1-yr simulations are sufficient (Lohmann and Ferrachat 2010).
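The workflow described in the sidebar "Model emulation to enable dense sampling of uncertainty" (training runs, a response surface, validation against extra runs, then dense Monte Carlo sampling) can be sketched on a toy problem. Everything here is illustrative. `toy_model` is a hypothetical two-parameter function standing in for an expensive simulation, and simple inverse-distance weighting is swapped in for the statistical emulators used in the cited studies purely to keep the example short.

```python
import random


def toy_model(params):
    """Hypothetical stand-in for one expensive simulation (2 parameters in [0, 1])."""
    a, b = params
    return 100.0 * (0.5 + a) * (1.5 - b)


def emulate(x, train_x, train_y, power=2.0):
    """Inverse-distance-weighted prediction at point x from the training runs."""
    num = den = 0.0
    for xi, yi in zip(train_x, train_y):
        d2 = sum((p - q) ** 2 for p, q in zip(x, xi))
        if d2 == 0.0:
            return yi  # exact at a training point
        w = d2 ** (-power / 2.0)
        num += w * yi
        den += w
    return num / den


rng = random.Random(0)
n_params = 2

# Training ensemble: 10 runs per parameter, randomly placed in parameter space
train_x = [[rng.random() for _ in range(n_params)] for _ in range(10 * n_params)]
train_y = [toy_model(x) for x in train_x]

# Validation against additional simulations not used for training
valid_x = [[rng.random() for _ in range(n_params)] for _ in range(10)]
valid_err = [abs(emulate(x, train_x, train_y) - toy_model(x)) for x in valid_x]

# Dense Monte Carlo sampling of the emulator instead of the model
variants = sorted(
    emulate([rng.random() for _ in range(n_params)], train_x, train_y)
    for _ in range(20_000)
)
p05 = variants[len(variants) // 20]
p95 = variants[-len(variants) // 20]

# Ensemble sizes quoted in the sidebar
print(2 ** 30)          # 1073741824: two settings in each of 30 dimensions
print(6 * 30, 10 * 30)  # 180 300: 6-10 training runs per parameter
```

The 20,000 emulated variants cost almost nothing to generate, whereas each `toy_model` call stands for a full simulation. That asymmetry is what makes it practical to build probability density functions and search for observationally plausible regions of parameter space.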

MODEL UNCERTAINTY AND OBSERVATIONAL CONSTRAINT.
The objective of this aspect of GASSP was to quantify global model uncertainty caused by uncertain processes and emissions and then to explore how in situ aerosol measurements can help to reduce this uncertainty and determine what effect this has on aerosol radiative forcing.

Model simulations to quantify sensitivity and uncertainty. We used perturbed parameter ensembles of the aerosol model GLOMAP implemented in the chemical transport model Tropospheric Off-Line Model of Chemistry and Transport (TOMCAT) and the general circulation model Hadley Centre Global Environment Model (HadGEM) to quantify uncertainty. The GLOMAP-TOMCAT ensemble samples 28 uncertain aerosol parameters based on 168 simulations and has been used to explore uncertainty in present-day cloud condensation nuclei concentrations (Lee et al. 2011, 2012, 2013), total particle concentrations (Fig. 7; Carslaw et al. 2013b), and aerosol-cloud radiative forcing since the preindustrial period (Carslaw et al. 2013a). An improved perturbed parameter ensemble explored radiative forcing regionally and over recent decades (Regayre et al. 2014, 2015). Two other ensembles of the climate model used to diagnose aerosol effective radiative forcing (Boucher et al. 2013) will be described elsewhere.
Even a few hundred simulations in a perturbed parameter ensemble are too few to adequately represent the model uncertainty because the simulations represent extremely sparse points in multidimensional parameter space. We therefore used the ensemble of simulations to build model emulators, which enable us to use Monte Carlo sampling to perform a full statistical analysis of the model (Lee et al. 2013). The sidebar "Model emulation to enable dense sampling of uncertainty" and Fig. 1 provide more details.

Fig. 10. Constraining the GLOMAP perturbed-parameter ensemble using multiple aerosol measurements. We used a selection of measurements of the total particle concentration Ntot, particle concentrations larger than 50-nm diameter N50, BC concentrations, and particle mass concentration less than 2.5 µm in diameter (PM2.5) in the boundary layer.
Relating aerosol measurements to causes of model uncertainty. One objective of GASSP is to understand which sources of uncertainty in a model can be constrained by particular measurements. As an example, Fig. 8 shows a map of the model parameters that control uncertainty in the number of particles larger than 50-nm dry diameter N50, a variable that is commonly used to approximately represent cloud condensation nuclei concentrations in models (see sidebar "Developments in aerosol microphysical measurements"). The causes of uncertainty can be clustered into "uncertainty environments" within which the causes of uncertainty are similar (Lee et al. 2016). In Northern Hemisphere polluted regions (pink), measurements of N50 would help to constrain a range of aerosol microphysical parameters throughout the year, while in the boreal forest regions (pale yellow), N50 measurements would help to constrain aerosol processes in winter and fire emissions in summer.
The fact that model uncertainties can be clustered regionally suggests that large gaps in available measurements may not necessarily affect how well we can constrain modeled aerosols as long as we have measurements that are representative or characteristic of aerosols throughout these regions. To illustrate the constraint provided by even isolated measurements, Fig. 9 shows how much the N50 uncertainty could be reduced globally by constraining the model spread to match a measurement that is representative of aerosols over central Europe (Lee et al. 2016). Although this is only a single (monthly mean) measurement, Fig. 9 shows that it can have a tremendous effect on model uncertainty provided its representativeness of the region can be defined. Such model sensitivity data could in the future guide us to where new measurements should be made, and what effect the measurement will have on model uncertainty.
Observational constraint and model robustness. We have tried two approaches to constrain the spread of simulated aerosols. The first approach was to simply select a small number of simulations from the GLOMAP ensemble that best match the GASSP measurements (Fig. 10). A subset of the observationally most plausible simulations can be selected (those with a small average error compared to multiple measurements), which leads to a much narrower uncertainty range compared to the full ensemble (Fig. 10b).
Selecting the best runs from an ensemble produces a model that is close to aerosol measurements, but it is not sufficient to understand the reduction in aerosol radiative forcing uncertainty. This is because the best simulations lie very sparsely in multidimensional parameter space (Fig. 10c) so they are not statistically representative. Nevertheless, it is worth bearing in mind that in multimodel ensembles most modeling centers typically choose one model variant from the many that might be plausible (Kinne et al. 2006; Lamarque et al. 2013; Mann et al. 2014; Eyring et al. 2016).
Our second approach was to use emulators (see sidebar "Model emulation to enable dense sampling of uncertainty") to generate several million model variants that densely sample the 28-dimensional parameter space (Lee et al. 2016). The advantage of this approach is that we can find the observationally plausible regions of the model parameter space, rather than relying on just a few sparse simulations. Figure 11 shows the result of selecting model variants from a set of 3,000,000 that match hypothetical measurements of N50 within a typical 30% sampling uncertainty over the North Atlantic (Lee et al. 2016). This relatively tight constraint of N50 concentrations has very little effect on the spread of the calculated cloud albedo radiative forcing, even though we have eliminated over 95% of the initial parameter space and the aerosol model is assumed to be the only source of uncertainty in the calculated forcing. Further work is now underway to relate all the measurement types in GASSP to their effectiveness at constraining the modeled radiative forcing. This is an important discovery of the GASSP project: even a tightly constrained global aerosol model can still generate a wide range of aerosol radiative forcings, even though the change in aerosols directly causes the forcing.
The explanation is that the observationally plausible model variants lie in widely distributed parts of parameter space (just as in Fig. 10c). We describe these as "equifinal models" (Beven 2006). Only by sampling across the whole parameter space of the model has it been possible to demonstrate the existence of these equifinal models.
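The constraint procedure and the resulting equifinality can be illustrated with a minimal sketch. It is not the GLOMAP emulator: the two abstract parameters `a` and `b`, the functions `toy_n50` and `toy_forcing`, and all numerical values are hypothetical, chosen only so that the two parameters compensate in N50 while both affect the forcing.

```python
import random


def toy_n50(a, b):
    """Hypothetical stand-in for an emulator of N50; a and b compensate."""
    return 200.0 * (0.5 + a) / (0.5 + b)


def toy_forcing(a, b):
    """Hypothetical stand-in for an emulator of radiative forcing."""
    return -0.5 - 0.75 * (a + b)


def sample_variants(n, seed=1):
    """Densely sample the (a, b) unit square with uniform random draws."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def constrain(variants, n50_obs, rel_tol=0.30):
    """Keep variants whose toy N50 is within rel_tol of the observation."""
    return [
        (a, b)
        for a, b in variants
        if abs(toy_n50(a, b) - n50_obs) <= rel_tol * n50_obs
    ]


def forcing_range(variants):
    """Width (max minus min) of the toy forcing across a set of variants."""
    forcings = [toy_forcing(a, b) for a, b in variants]
    return max(forcings) - min(forcings)


variants = sample_variants(20000)
plausible = constrain(variants, n50_obs=200.0)  # hypothetical measurement
kept_fraction = len(plausible) / len(variants)
range_ratio = forcing_range(plausible) / forcing_range(variants)
print(f"retained fraction: {kept_fraction:.2f}")
print(f"forcing range after/before constraint: {range_ratio:.2f}")
```

In this toy setup the retained variants form a band across the whole parameter square, so the forcing range is barely narrowed even though a large part of the parameter space is ruled out. The fraction retained here is a property of the toy functions and should not be compared with the value reported for the real model.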
Identifying model structural errors. The comparison of a perturbed parameter ensemble with measurements provides a way to identify potential model structural errors. For example, Fig. 12 shows that the full GLOMAP ensemble, sampling 28 dimensions of parameter uncertainty, fails to reproduce the seasonal cycle of PM2.5 at one site despite an order of magnitude model spread. This bias, which is also apparent in other aerosol properties, can be attributed to poor representation of regional aerosol emissions. The extensive GASSP aerosol database combined with the perturbed parameter ensembles provides an effective way to detect such structural errors.
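One simple way to express this kind of check is to flag the times at which an observation lies outside the full range of the ensemble. The sketch below assumes this envelope test as the criterion, which is a simplification and not necessarily the procedure used in GASSP; the monthly values are invented for illustration and are not the data in Fig. 12.

```python
def envelope(ensemble):
    """Return (min, max) across ensemble members for each time step."""
    return [(min(column), max(column)) for column in zip(*ensemble)]


def outside_envelope(ensemble, observed):
    """Indices of time steps where the observation is outside the envelope."""
    flagged = []
    for i, ((low, high), obs) in enumerate(zip(envelope(ensemble), observed)):
        if not low <= obs <= high:
            flagged.append(i)
    return flagged


# Hypothetical monthly PM2.5 values (January to December), arbitrary units.
base_cycle = [2, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 2]
# Three hypothetical members spanning an order of magnitude, all sharing
# the same (summer-peaking) seasonal shape.
ensemble = [[scale * value for value in base_cycle] for scale in (1.0, 3.0, 10.0)]
# Hypothetical observations with a winter peak that no member reproduces.
observed = [35, 30, 18, 10, 8, 7, 7, 8, 10, 16, 28, 34]

flagged_months = outside_envelope(ensemble, observed)
print("months outside ensemble range:", flagged_months)
```

Months that remain outside the envelope despite a wide parameter spread cannot be fixed by retuning parameters alone, which is why they point to a possible structural deficiency.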

SUMMARY AND FUTURE PROSPECTS.
GASSP set out to understand and reduce uncertainty in model simulations of aerosol radiative forcing caused by the uncertainties in the aerosol component of the model.
What have we learned about the nature and availability of aerosol in situ measurements in the context of model constraint? GASSP has created a harmonized dataset of nearly a quarter of a century of aerosol in situ measurements comprising over 46,000 measurement hours from aircraft and ships and from over 300 surface sites (Fig. 2). We intend to make these data readily accessible for wider use with an appropriate data protocol and with the permission of the data providers and other data centers. The database can, and should, be added to in the future.
Aerosol measurements are highly diverse and difficult to harmonize (the GASSP database includes 70 measured aerosol variables). However, this diversity is a strength because the aerosol system is complex with many compensating uncertainties (Lee et al. 2016), so optimum model constraint is achieved with the greatest diversity of measurements. While seeking the most valuable measurements, we should also embrace measurement diversity and ensure that we have procedures that enable the measurement data to be ingested into models as easily as possible, as GASSP has helped to facilitate.
The distribution of global aerosol in situ measurements is not optimal for constraining models. The overwhelming focus of aerosol measurement campaigns has been on "process-based studies" rather than on obtaining a representative sample from different aerosol environments. Model uncertainty reduction would be accelerated if we put more effort into understanding the representativeness of measurements, how they characterize aerosol properties in different environments, and how they help to reduce the spread of plausible model simulations.
Pristine (unpolluted) environments are considerably undersampled. Analysis in GASSP shows that much of the uncertainty in radiative forcing stems from the properties of pristine aerosol environments (Carslaw et al. 2013a). We have some idea where near-pristine, preindustrial-like environments exist (Hamilton et al. 2014), such as the summertime Arctic, some remote boreal regions, and the western Pacific, which should provide some guide to where future measurements could be prioritized.
The use of sparse point measurements imposes limits on how much model uncertainty can be reduced because the real atmosphere has greater spatial and temporal variability than a low-resolution global model (Schutgens et al. 2016a,b). Typical spatial sampling errors of 6) are smaller than the current parametric uncertainty in single models (Fig. 7; Lee et al. 2013; Carslaw et al. 2013b) and multiple models (Mann et al. 2014), but may still be large enough to limit the reduction in radiative forcing uncertainty (Lee et al. 2016).
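The origin of a spatial sampling error can be sketched by comparing a single point value with the mean over a model grid box. The example below is illustrative only: the sub-grid field is a synthetic lognormal sample, and the function name `point_sampling_errors`, the parameter `sigma`, and the sample sizes are assumptions that do not come from the studies cited above.

```python
import random
import statistics


def point_sampling_errors(sigma, n_cells=400, n_trials=200, seed=0):
    """Relative error of one random point value versus the grid-box mean.

    Each trial draws a synthetic sub-grid field of n_cells lognormal
    values; sigma sets the variability inside the grid box.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_trials):
        field = [rng.lognormvariate(0.0, sigma) for _ in range(n_cells)]
        box_mean = statistics.fmean(field)
        point_value = rng.choice(field)
        errors.append(abs(point_value - box_mean) / box_mean)
    return errors


# Median relative error for a weakly and a strongly variable synthetic field.
smooth_error = statistics.median(point_sampling_errors(sigma=0.1))
patchy_error = statistics.median(point_sampling_errors(sigma=0.5))
print(f"median relative error, smooth field: {smooth_error:.2f}")
print(f"median relative error, patchy field: {patchy_error:.2f}")
```

The more variable the field inside the grid box, the less a single point represents the box mean, which places a floor on how tightly a point measurement can constrain a low-resolution model.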
What have we learned about our ability to constrain radiative forcing uncertainty caused by uncertainty in aerosol models? Aerosol-climate models are close to becoming an underdetermined system, with many interacting sources of uncertainty but a limited range of observations to constrain them (Haerter et al. 2009; Lohmann and Ferrachat 2010; Lee et al. 2016). Although the spread of aerosol model simulations can be constrained by measurements, there are many "model variants" that can achieve equally plausible model-measurement agreement, and these variants may simulate a wide range of aerosol forcings (Lee et al. 2016). The implication is that agreement of a single tuned model with measurements does not imply a robust model; that is, there are likely to be other plausible model variants that will simulate different aerosol radiative forcings. Such variants need to be identified if we are to quantify model uncertainty.
Given the complexity of the aerosol-cloud-climate system, there is unlikely to be a shortcut to reducing uncertainty. Rather, reduction in uncertainty will be achieved by simultaneously applying extensive and diverse observational constraints on the whole system. This means constraining aerosol, cloud, and radiation state variables as well as the relationships between them, which often relate to specific processes and the behavior of the system (Quaas et al. 2009; Feingold et al. 2016; Ghan et al. 2016).
What are the priorities for future research on aerosol measurements and model uncertainty? While there will always be a need to improve specific processes in models in order to eliminate structural errors, we argue that model development needs to be pursued in conjunction with intensified efforts to quantify and constrain model uncertainty. There is comparatively little effort devoted to understanding the radiative forcing problem from a system uncertainty point of view.
Further effort is needed to enable future aerosol measurements to be harmonized so that modelers and experimentalists can collaborate more easily to reduce model uncertainty. Network infrastructures like ACTRIS make a substantial contribution in this direction, but extensive campaign-type measurements should also be harmonized. Measurement standards and quality control are vital (Wiedensohler et al. 2012), but equally important is data harmonization to reduce the very large number of data formats that have proliferated (see "The GASSP aerosol measurement database" section).
We should prioritize new measurements in a targeted way with the specific objective of reducing model uncertainty. GASSP has shown that we can define representative "uncertainty environments" (Fig. 8), which help to establish the effect that specific measurements will have on model uncertainty (Fig. 9). Such information from models could help guide measurement strategies and priorities.
A greater number of longer-term measurements with greater global coverage, particularly in undersampled environments, would help to constrain global aerosol models. Development and deployment of low-cost sensors might be one way to achieve the required coverage.
An effort similar to GASSP, dedicated to collecting and harmonizing cloud microphysical properties such as droplet and ice crystal number concentrations, would be very valuable. While cloud droplet effective radii can be retrieved from satellite measurements, in situ measurements at cloud base, ideally along with updraft speed, would provide a strong constraint on modeled aerosol-cloud interactions.
A logical next step is to extend the statistical approach used in GASSP to multiple models and thereby merge efforts on parametric and structural uncertainty (Shiogama et al. 2014). Such an approach would be particularly useful for identifying the causes of model-observation bias because we would learn from the similarities and differences in the structural deficiencies across models.
Finally, greater progress will be made through closer cooperation of modeling and observational scientists, as increasingly achieved in projects like AeroCom and ACTRIS. Such interaction is particularly necessary in aerosol science because of the huge diversity and complexity of aerosol measurements. Greater interaction will enable a larger fraction of aerosol measurements to be used routinely in model evaluation and constraint, leading to a long-sought reduction in aerosol model uncertainty and enhancing our ability to simulate historical and future climates.