Chemical and physical influences on aerosol activation in liquid clouds: a study based on observations from the Jungfraujoch, Switzerland

Abstract. A simple statistical model to predict the number of aerosols which activate to form cloud droplets in warm clouds has been established, based on regression analysis of data from four summertime Cloud and Aerosol Characterisation Experiments (CLACE) at the high-altitude site Jungfraujoch (JFJ). It is shown that 79 % of the observed variance in droplet numbers can be represented by a model accounting only for the number of potential cloud condensation nuclei (defined as the number of particles larger than 80 nm in diameter), while the mean errors in the model representation may be reduced by the addition of further explanatory variables, such as the mixing ratios of O 3, CO, and the height of the measurements above cloud base. The statistical model has a similar ability to represent the observed droplet numbers in each of the individual years, as well as for the two predominant local wind directions at the JFJ (northwest and southeast). Given the central European location of the JFJ, with air masses in summer being representative of the free troposphere with regular boundary layer in-mixing via convection, we expect that this statistical model is generally applicable to warm clouds under conditions where droplet formation is aerosol limited (i.e. at relatively high updraught velocities and/or relatively low aerosol number concentrations). A comparison between the statistical model and an established microphysical parametrization shows good agreement between the two and supports the conclusion that cloud [...]
Published by Copernicus Publications on behalf of the European Geosciences Union.


Introduction
Aerosols have a well-documented and pronounced influence on the microphysical and therefore radiative properties of clouds (e.g. Twomey, 1974, 1977; Albrecht, 1989; Hu and Stamnes, 1993). The properties of atmospheric aerosol particles thus have a strong potential to affect local and regional climates. However, the influence of aerosols on clouds remains the single largest uncertainty hampering the calculation of future climate scenarios (Boucher et al., 2013). To reduce this uncertainty, an improved understanding of the aerosol properties and environmental conditions that allow parts of the aerosol population to act as cloud condensation nuclei (CCN) and form cloud droplets is required.
Previous ground-based studies have investigated statistical relationships between cloud droplet or CCN number concentration, aerosol properties, and environmental variables (e.g. Henning et al., 2002; Dusek et al., 2006; Verheggen et al., 2007; Jurányi et al., 2010, 2011; Anttila et al., 2012). Based on around 22 days of data from the Taunus Observatory in central Germany, Dusek et al. (2006) determined that the concentration of CCN (as measured at different supersaturations in a CCN counter) is largely dependent on the measured particle size distribution, with the CCN concentration increasing with increasing particle diameter and chemical composition of the aerosol playing a secondary role.
Various studies have investigated the mechanisms through which the chemical composition of aerosol influences its water uptake and activation and how this can be accounted for (e.g. Köhler, 1936; McFiggans et al., 2006; Petters and Kreidenweis, 2007). In addition, surface active compounds may influence surface tension and thus the activation of aerosol particles to form cloud droplets (Shulman et al., 1996; Shilling et al., 2007; King et al., 2009). Recently, it has been suggested that this may lead to a temperature influence on aerosol activation (Nenes et al., 2002; Christensen and Petters, 2012). Nevertheless, the works of, for example, Dusek et al. (2006) and Jurányi et al. (2010, 2011) suggest that the relatively small variations in chemical composition of aerosol in areas away from sources may play a smaller role in determining CCN activity of the aerosol than variations in the size distribution. Examining 1 month of data from a remote site in northern Finland, Anttila et al. (2012) determined that the highest correlations with activated aerosol number occur with the number of available CCN, which was defined as the total number of particles greater than 100 nm in diameter, and that the number of droplets formed did not strongly depend on updraught velocity. A set of regimes where the number of cloud droplets formed depends on updraught velocities (at low ratios of updraught to aerosol number), and where the number of cloud droplets depends more on the number of aerosol (at high ratios of updraught to aerosol number), were described by Reutter et al. (2009), based on cloud parcel model studies. At the Jungfraujoch site, Henning et al. (2002) determined that aerosol particles larger than 100 nm in diameter were typically activated to form cloud droplets in clouds with liquid water content (LWC) above 0.15 g m −3 . Verheggen et al. (2007) investigated relationships between environmental variables and activated fraction, defined as the fraction of total particles, larger than 100 nm in diameter, that have been activated to form cloud droplets. The latter study based its analysis on one summer and two winter campaigns, and found that the activated fraction increased with increasing LWC and decreased with decreasing temperature below 0 °C, as clouds began to glaciate. Also using data from the Jungfraujoch site, Jurányi et al. (2010, 2011) found that with knowledge of the average chemical composition of aerosol, a very high degree of correlation could be found between the number of activated aerosol predicted by the κ-Köhler approach (Petters and Kreidenweis, 2007) and the observed number of activated particles measured at different supersaturations in a CCN counter.
Although both Dusek et al. (2006) and Jurányi et al. (2010, 2011) found that, with a known aerosol size distribution, one can obtain good correlations between the predicted and observed number of droplets at a particular supersaturation in a CCN counter, the peak supersaturation reached in an air parcel is not generally a known quantity. It is also not possible to say how well the number of droplets predicted in this way corresponds with the number of droplets in a cloud which has formed some time ago. Although several studies exist in which a good degree of closure was achieved between predicted and observed cloud droplet numbers (of the order of 20 % difference between calculated and observed droplet numbers; e.g. Fountoukis et al., 2007; Meskhidze et al., 2005; Conant et al., 2004), a simple method of predicting cloud droplet numbers based on easily quantifiable parameters would be useful.
It has long been recognized that the number and the size of aerosol particles strongly influence the number of CCN and that, at higher aerosol number concentrations, clouds will be composed of a greater number of droplets (Köhler, 1936; Fitzgerald and Spyers-Duran, 1973; Twomey, 1974, 1977). Several simple parametrizations of the number of cloud droplets as a function of the aerosol diameter and total aerosol number have been suggested for both continental and maritime locations (Köhler, 1936; Raga and Jonas, 1993; Jones, 1994; Martin et al., 1994), mainly for stratus and stratocumulus clouds.
Subsequently, more advanced parametrizations were developed, allowing for the influence of the aerosol size distribution, updraught velocity, and the chemical composition and mixing state of the aerosol to be accounted for when calculating aerosol water uptake and activation to form cloud droplets (Barahona and Nenes, 2007; Fountoukis and Nenes, 2005; Kumar et al., 2009; Nenes and Seinfeld, 2003). A parametrization was also developed by Kivekäs et al. (2008) which predicts the number of cloud droplets using four parameters: the total submicron aerosol volume concentration, the number-to-volume aerosol concentration ratio, the soluble fraction of the particle volume, and the air updraught velocity. Good agreement was found between the number of droplets predicted by this parametrization and observed droplet numbers in northern Finland.
In this study, data from four summer measurement campaigns carried out at the Jungfraujoch between 2002 and 2011 are used to develop simple statistical models of the relationship between the number of observed cloud droplets and various environmental factors, as well as the aerosol number size distribution, in liquid clouds. Using such an extensive data set collected over a period of nearly 10 years allows the construction of relationships which are applicable to a wide range of conditions, although the statistical model developed here is only valid for liquid clouds. The results from the statistical models are compared to simulations using an established cloud droplet formation parametrization for use in climate model simulations of the aerosol indirect effect.

Measurement site
The Jungfraujoch (JFJ) high-alpine measurement site is located at 3580 m a.s.l., atop an exposed crest in the Bernese Alps, Switzerland, and is accessible by train throughout the year. The site is engulfed in cloud approximately 40 % of the time (Nyeki et al., 1998) and local emissions are minimal with the exception of occasional construction activities. Aerosol measurements have been carried out at the JFJ since the early 1970s (Bukowiecki et al., 2016), with continuous measurements since 1986 (Baltensperger et al., 1991, 1997), and the site has been part of the Global Atmosphere Watch (GAW) programme since 1995. A review of the aerosol observations at the JFJ is provided by Bukowiecki et al. (2016). The location of the station makes it suitable for continuous monitoring of the remote continental troposphere. The topography around the measurement site defines two predominant local wind directions, southeast or northwest. To the southeast, the Aletsch Glacier gradually slopes away from the JFJ at an approximate angle of 15°. In contrast, the northwestern side drops steeply at an average slope of approximately 46°. This difference in topography causes updraught velocities to be higher in air masses approaching the station from the northwest than from the southeast, with median peak supersaturations of around 0.41 % (representative of cumulus or orographic clouds) and 0.22 % (representative of shallow layer or stratiform clouds) being reached for the respective wind directions (Hammer et al., 2014; Lugauer et al., 1998). Therefore, depending on conditions and wind direction, data gathered at the JFJ can be representative of convective or of stratiform-type clouds.
The unique topography surrounding the JFJ site and the long-term measurements performed there provide substantial opportunity for investigating not only how relationships between environmental variables change between years but also what effect the differing topography to the north and south has, through its influence on the vertical wind velocity. Furthermore, the composition of aerosols in air coming from the south is influenced by different source regions than air coming from the north. Peak supersaturation values, updraught velocity, aerosol hygroscopicity, and cloud droplet number concentration were studied by Hammer et al. (2014), who found that all these quantities showed statistically significant differences between the two wind sectors. This work was extended by Hammer et al. (2015), who quantified the influence of updraught velocity and particle composition and concentration on peak supersaturation.
While measurements made at the JFJ often sample the free troposphere, in summer the air masses are mostly influenced by injections of boundary layer air due to convective events (Lugauer et al., 1998; Nyeki et al., 1998) and frontal systems (Zellweger et al., 2003). On average during summer, a boundary layer influence is detected at the JFJ around 80 % of the time, dropping to around 60 % in spring or autumn or lower than 40 % in January. The latter study also showed that the large degree of boundary layer influence is partly due to the effect of the alpine topography on air flow.
The JFJ observatory is also one of 16 stations of the Swiss National Air Pollution Monitoring Network. As part of this operation, continuous in situ observations of about 70 different trace gases are performed by Empa, the Swiss Federal Laboratories for Materials Science and Technology.

Data collection
Data used in this study were collected as part of the Cloud and Aerosol Characterisation Experiments (CLACE). The CLACE measurements have been conducted at the JFJ since 2000. They are a series of intensive winter and summer campaigns designed to investigate the chemical, physical, and optical properties of aerosols as well as their interaction with clouds (Henning et al., 2002; Verheggen et al., 2007; Sjogren et al., 2008; Kammermann et al., 2010; Jurányi et al., 2010, 2011; Hammer et al., 2014). The present study utilizes data collected during four summer campaigns, in 2002, 2004, 2010, and 2011 (Table 1).
The following description refers to the basic experimental set-up during all CLACE campaigns. The particles and hydrometeors were sampled via a total and an interstitial inlet which were installed through the roof of the laboratory (Hammer et al., 2014). The total inlet sampled all the particles that had a diameter of less than 40 µm, including the hydrometeors, at wind speeds up to 20 m s −1 (Weingartner et al., 1999). The condensed water of the particles and hydrometeors was evaporated by heating up the top part of the inlet to approximately 25 °C so that all particles were dried (and therefore residual aerosol particles contained in cloud droplets were set free) before reaching the instruments in the laboratory. The interstitial inlet only sampled particles smaller than 1 or 2 µm in diameter, using a PM 1 (during CLACE2002) or PM 2 (during CLACE2004, CLACE2010, and CLACE2011) size discriminator respectively. Thus, only non-activated particles (i.e. particles that did not act as CCN and were thus not contained in cloud droplets) passed this inlet. The transition to laboratory temperatures (typically 20 to 30 °C) resulted in the drying of the particles at a relative humidity less than 10 %. The difference between the number of aerosol sampled through the total inlet and the number sampled through the interstitial inlet gives the number of aerosol which were activated to form cloud droplets, n act . It has been shown by Henning et al. (2002), in a comparison with forward scattering spectrometer probe (FSSP) droplet measurements, that this value can be used as a proxy for the number of cloud droplets. Therefore this is the approach that we adopt in the present study. Downstream of the inlets, a scanning mobility particle sizer (SMPS) was used to measure the total and interstitial aerosol size distribution respectively. The SMPS measured particles in the size range of 16 to 600 nm. One scan required 6 min.
During CLACE2002 and CLACE2004, the SMPS was installed behind a pinch valve to switch between the two inlets after each scan (i.e. 6 min). The data in 2002 and 2004 are therefore at 12 min resolution. For CLACE2010 and CLACE2011, two SMPSs measured simultaneously, one behind each inlet, so that a higher time resolution (approximately 6 min) could be achieved. Each SMPS consisted of a differential mobility analyser (DMA), a bipolar charger to obtain charge equilibrium (krypton source, 85 Kr), and a condensation particle counter (CPC) (Wiedensohler et al., 2012). During cloud-free periods, the interstitial and the total SMPS should measure the same aerosol number size distribution. For the campaigns where two SMPSs measured simultaneously, the out-of-cloud particle size distribution showed differences of up to 10 % for particles with diameters between 20 and 600 nm (Hammer et al., 2014). This is within the typical uncertainty for this type of measurement (Wiedensohler et al., 2012). To account for these differences between the two units, the interstitial number size distributions (for each campaign-specific instrument) were corrected towards the total aerosol size distribution. A size- and time-dependent correction factor was determined by comparing the total and interstitial number size distributions during all cloud-free periods.
To monitor the cloud presence, the LWC was measured using a particle volume monitor (PVM-100; Gerber, 1991), which measures the LWC by forward light scattering.
A measurement of the horizontal wind speed and direction was provided by the Rosemount Pitot tube anemometer, which is mounted on a 10 m mast as part of the SwissMetNet network of MeteoSwiss. Likewise, temperature measured at the site as part of the SwissMetNet network was used.
In recent years, outdoor tourism activities around the JFJ have increased, resulting in more frequent local pollution events. Data that are likely affected by construction activities, snow groomer operation, and other local anthropogenic influences (mainly cigarette smoke; Fröhlich et al., 2015) have been removed from the data sets. As the JFJ is characterized as a background site, sudden, short-lived fluctuations in the aerosol size distribution can be interpreted as local pollution. Therefore the affected data were identified by visual inspection of the aerosol size distribution spectra.
In situ trace gas measurements of O 3 and CO were conducted as part of the Swiss National Air Pollution Monitoring Network (NABEL). Measurements were recorded at 10 min intervals throughout all study periods, using a UV absorption technique for O 3 (Thermo Environmental Instrument, TEI49C) and non-dispersive IR absorption photometry (NDIR) for CO (Horiba APMA360, APMA370) (Gilge et al., 2010;Zellweger et al., 2009).

Data processing
For years where two SMPSs were operating simultaneously (CLACE2010, CLACE2011), n act , as a function of dry particle diameter, could be calculated directly from the difference between the total and the interstitial particle number size distributions. For the remaining 2 years (CLACE2002 and CLACE2004), the SMPS was switched between the total and the interstitial inlet. For these 2 years, the total measurement was taken to be the first measurement, with the interstitial measurement immediately following it used to calculate n act . The two scans inside this 12 min period were assumed to represent the same atmospheric conditions.
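The pairing of alternating scans described above can be illustrated with a minimal Python sketch. The function names, the tuple layout of a scan, and the example concentrations are hypothetical and chosen only for illustration; they do not reproduce the authors' processing code.

```python
def pair_alternating_scans(scans):
    """Pair each total-inlet scan with the interstitial scan that follows it.

    `scans` is a time-ordered list of (inlet_label, per_bin_counts) tuples,
    a hypothetical layout for data from a single SMPS behind a pinch valve.
    """
    pairs = []
    for current, following in zip(scans, scans[1:]):
        if current[0] == "total" and following[0] == "interstitial":
            pairs.append((current[1], following[1]))
    return pairs


def activated_per_bin(total, interstitial):
    """Per-bin activated number: total minus interstitial concentration."""
    return [t - i for t, i in zip(total, interstitial)]


# Hypothetical number concentrations in two diameter bins, four 6 min scans.
scans = [
    ("total", [100.0, 50.0]),
    ("interstitial", [90.0, 5.0]),
    ("total", [110.0, 60.0]),
    ("interstitial", [95.0, 10.0]),
]

pairs = pair_alternating_scans(scans)
n_act_series = [sum(activated_per_bin(t, i)) for t, i in pairs]
print(n_act_series)
```

Each pair spans 12 min, which is why the single-SMPS campaigns yield half the time resolution of the campaigns with two instruments.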
In order to exclude cloud periods that were influenced by the entrainment of dry air, as well as to exclude mixed-phase clouds, the fraction of activated particles was analysed as a function of particle diameter. Without entrainment, in theory all particles above a particular size will be activated during cloud formation if the aerosol is internally mixed (as is generally the case at remote sites such as the JFJ). This size is known as the activation diameter and depends on the peak supersaturation reached within the air parcel. The activation diameter of the aerosol was calculated for each measurement time, following Hammer et al. (2014). In atmospheric measurements, the fraction of activated particles increases between approximately 0 and 1 over a small range of diameters, rather than making a sharp transition at a particular diameter. Therefore the activation diameter is defined as that at which half the particles are activated and half are unactivated.
As described below, for the aged aerosol found at the JFJ, the critical diameter lies around 80-100 nm. Entrainment and mixing of air into the cloud will lead to non-activated particles larger than the activation diameter co-existing with activated particles and therefore the maximum activated fraction above the activation diameter will be less than 1. Similarly, the lower water vapour pressure over ice particles in mixed-phase clouds will lead to evaporation of droplets and deactivation of aerosol, reducing the activated fraction above the activation diameter. A threshold of 0.9 was defined, and all measurements with maximum activated fractions of less than this threshold were assumed to be influenced by entrainment or partial glaciation of the cloud and thus excluded from the analysis.
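As a rough illustration of the activation diameter and the 0.9 filter, the following Python sketch computes the activated fraction per size bin, locates the diameter at which it crosses 0.5, and applies the threshold. The linear interpolation between bins is an assumption made here for simplicity and is not necessarily the procedure of Hammer et al. (2014); all names and numbers are hypothetical.

```python
def activated_fraction(total, interstitial):
    """Fraction of particles activated in each size bin."""
    return [(t - i) / t if t > 0 else 0.0 for t, i in zip(total, interstitial)]


def activation_diameter(diameters, fraction):
    """Diameter at which the activated fraction first reaches 0.5.

    Uses linear interpolation between neighbouring bins (an assumption).
    Returns None if the fraction never crosses 0.5.
    """
    for k in range(1, len(diameters)):
        f0, f1 = fraction[k - 1], fraction[k]
        if f0 < 0.5 <= f1:
            d0, d1 = diameters[k - 1], diameters[k]
            return d0 + (0.5 - f0) * (d1 - d0) / (f1 - f0)
    return None


def passes_entrainment_filter(fraction, threshold=0.9):
    """Keep a scan only if its maximum activated fraction reaches the threshold."""
    return max(fraction) >= threshold


# Hypothetical bin diameters (nm) and concentrations.
diameters = [50.0, 70.0, 90.0, 110.0, 150.0]
total = [100.0, 100.0, 100.0, 100.0, 100.0]
clean_cloud = [98.0, 80.0, 40.0, 8.0, 4.0]      # interstitial, little entrainment
entrained = [98.0, 80.0, 50.0, 30.0, 25.0]      # interstitial, strong entrainment

frac_clean = activated_fraction(total, clean_cloud)
frac_entrained = activated_fraction(total, entrained)
d50 = activation_diameter(diameters, frac_clean)
print(d50, passes_entrainment_filter(frac_clean), passes_entrainment_filter(frac_entrained))
```

In the second case the activated fraction plateaus at 0.75, so the scan would be discarded as influenced by entrainment or partial glaciation.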
The data were also filtered to remove any data points that were measured outside of clouds, in patchy cloud, or on the edges of clouds. This was achieved based on the measured LWC. For the campaigns that had two SMPS scanners operating simultaneously (CLACE2010 and CLACE2011), the criterion follows Hammer et al. (2014), where cloud was defined to be present when the 30th percentile of the 10 s LWC values' distribution during one 6 min scan period was higher than 5 mg m −3 . For the other campaigns which had only one SMPS system operating (CLACE2002 and CLACE2004), creating a 12 min resolution data set, the criterion used was that of Henning et al. (2002) and Cozic et al. (2008), which defined cloudy conditions if the LWC was higher than 20 mg m −3 for more than 85 % of an hourly period. This more stringent criterion was used to avoid the inclusion of cloud-free periods in the longer (12 min) SMPS scanning time. In contrast, using the criterion of Hammer et al. (2014), which was found to be adequate for excluding cloud-free periods during the 6 min scan time, allowed the inclusion of more data from the 2010 and 2011 campaigns.
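The two cloud-presence criteria can be sketched in Python as follows. The percentile is computed by linear interpolation between order statistics, which is an assumption (the source does not state the percentile definition), and the function names and LWC values are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0 to 100) by linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values supplied")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def in_cloud_scan(lwc_10s, threshold=5.0):
    """6 min criterion: 30th percentile of 10 s LWC (mg m^-3) above 5 mg m^-3."""
    return percentile(lwc_10s, 30) > threshold


def in_cloud_hourly(lwc_values, threshold=20.0, min_fraction=0.85):
    """Hourly criterion: LWC above 20 mg m^-3 for more than 85 % of the period."""
    above = sum(1 for v in lwc_values if v > threshold)
    return above / len(lwc_values) > min_fraction


# Hypothetical 6 min scans, each with 36 values at 10 s resolution.
solid_cloud = [0.0] * 6 + [50.0] * 30
patchy_cloud = [0.0] * 18 + [50.0] * 18
print(in_cloud_scan(solid_cloud), in_cloud_scan(patchy_cloud))

# Hypothetical hourly record with 360 values at 10 s resolution.
print(in_cloud_hourly([30.0] * 320 + [0.0] * 40))
```

The percentile-based test tolerates short gaps within a scan, whereas the hourly test is deliberately stricter, in line with the reasoning given above for the 12 min data.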
Total water content (TWC) was calculated by adding measured LWC to calculated gas-phase water (GPW), except during CLACE2010 where it could be determined directly from a dew point measurement in air sampled through the total inlet. In campaigns other than CLACE2010, such dew point measurements were not available and the GPW was calculated, using the ambient temperature, under the assumption that the in-cloud relative humidity was 100 %.
Data were classified according to wind direction (north and south), in order to determine whether different factors influence the CCN quality depending on the origin of the aerosol particles.
For the purposes of this study, an estimate of the updraught velocity ($w_{\mathrm{act}}$) at cloud base was calculated, similarly to Hammer et al. (2014), from the local topography and the horizontal wind speed measured at the JFJ ($v_{h}^{\mathrm{JFJ}}$) using

$$w_{\mathrm{act}} = \tan(\alpha)\, v_{h}^{\mathrm{JFJ}}, \tag{1}$$

where α is the inclination angle of the flow lines at the cloud base. These values were α = 46° for the northern terrain and α = 15° for the southern terrain (for further details see Hammer et al., 2014). This equation is based on the assumptions that the flow lines of the updraught strictly follow the terrain on either side of the JFJ research station and that there is neither sideways convergence nor divergence of the flow lines between the cloud base and the JFJ.
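A minimal Python sketch of this estimate is given below. It assumes the terrain-following relation between vertical and horizontal wind components, i.e. the updraught equals the horizontal wind speed multiplied by the tangent of the inclination angle, which follows from the stated assumption that the flow lines strictly follow the terrain. The dictionary keys and function name are hypothetical.

```python
import math

# Inclination angles of the flow lines (degrees) for the two terrain sectors.
INCLINATION_DEG = {"north": 46.0, "south": 15.0}


def updraught_velocity(v_h, sector):
    """Estimate the cloud-base updraught (m s^-1) from horizontal wind speed.

    Assumes terrain-following flow: w = tan(alpha) * v_h.
    """
    alpha = math.radians(INCLINATION_DEG[sector])
    return math.tan(alpha) * v_h


# The same 10 m s^-1 horizontal wind gives very different updraughts.
w_north = updraught_velocity(10.0, "north")
w_south = updraught_velocity(10.0, "south")
print(round(w_north, 2), round(w_south, 2))
```

For equal horizontal wind speeds the northern sector yields an updraught almost four times larger than the southern one, consistent with the higher peak supersaturations reported for northwesterly flow.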

Selection of predictor variables
Six different predictor variables either measured at the JFJ or calculated for the cloud base were included in the statistical analysis. These were the height of the JFJ above cloud base, updraught velocity, number of available potential CCN particles (hereafter referred to as n CCN , see definition below), air temperature at the cloud base, CO, and O 3 . The height of the JFJ above the cloud base was calculated by using the TWC and temperature measured at the JFJ, assuming a moist adiabatic temperature lapse rate (6 K km −1 ) and thus calculating the temperature (and therefore the distance below the JFJ) at which the partial pressure of water in the air mass decreased below the saturation vapour pressure. This approach is described in detail in Hammer et al. (2014) and implicitly assumes that a minimal amount of water is lost from the air mass via precipitation between the cloud base and the JFJ. The height of the JFJ above the cloud base was included as a predictor variable as it determines the amount of condensed water at the altitude of the measurements, and it is also related to the age of the cloud, during which scavenging or coagulation processes may occur.
The updraught velocity, estimated as described in Sect. 4.1, was chosen as it is known to influence the peak supersaturation achieved during cloud formation and, therefore, the activation diameter of the aerosol and the activated fraction of a particular aerosol size distribution.
The n CCN is estimated from the measured aerosol size distributions. As described in Sect. 1, the aerosol number size distribution is known to play an important role in defining the number of cloud droplets formed, with larger particles more likely to be activated, and the smallest particles rarely playing a role in cloud formation. Therefore, it is necessary to choose a minimum diameter, above which a particle can be considered a potential CCN (here, a potential CCN is considered to be an aerosol particle that may act as a CCN when subjected to supersaturation with respect to liquid water). As described above, at aerosol number concentrations larger than approximately 100 cm −3 , Henning et al. (2002) found that the activation diameter at the JFJ is around 100 nm. Further, Hammer et al. (2014) reported that there is a systematic difference in the activation diameter for aerosol in air masses approaching the JFJ from the north (87 nm) and from the south (106 nm).
Here we have chosen a diameter of 80 nm as the lower size bound defining potential CCN. The relatively low value was chosen so as not to exclude potentially important sizes of aerosols.
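The definition of potential CCN amounts to summing the size distribution above the chosen lower bound. The Python sketch below assumes per-bin number concentrations identified by their bin diameter, and counts a bin if its diameter is at least the cut-off; the treatment of bins straddling the cut-off is an assumption, and the names and values are hypothetical.

```python
def potential_ccn(diameters_nm, bin_counts, d_min_nm=80.0):
    """Number of potential CCN: particles in bins at or above d_min_nm."""
    return sum(n for d, n in zip(diameters_nm, bin_counts) if d >= d_min_nm)


# Hypothetical size distribution: bin diameters (nm) and concentrations.
diameters_nm = [30.0, 60.0, 80.0, 120.0, 300.0]
bin_counts = [200.0, 150.0, 80.0, 40.0, 10.0]

n_ccn_80 = potential_ccn(diameters_nm, bin_counts)           # cut-off used here
n_ccn_100 = potential_ccn(diameters_nm, bin_counts, 100.0)   # cut-off of earlier studies
print(n_ccn_80, n_ccn_100)
```

Comparing the two cut-offs shows how many particles a 100 nm bound would exclude, which is the motivation for choosing the lower value.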
The air temperature at cloud base (calculated from the temperature at the JFJ) was chosen to account for any temperature-dependent effects on water uptake to the aerosols which may influence activation. However, the cloud base temperature was found not to contribute significantly in the linear regression models for the years 2010 and 2011 (i.e. the years with most observational data). It was thus excluded by backward elimination of explanatory variables for final model selection. Likewise, no significant relationship between air pressure and n act was found.
Finally, the two chemical tracers CO and O 3 were included in the analysis to account for the history of the air parcels. While CO is a primary pollutant and O 3 is produced photochemically as a secondary pollutant from precursors such as volatile organic compounds and nitrogen oxides, both of these can act as tracers of anthropogenic emissions or of biomass burning events (e.g. Staudt et al., 2001; Liang et al., 2004; Yashiro et al., 2009; Zhang et al., 2006, 2009; Gilge et al., 2010), and therefore in this study they are used as indicators of the degree of influence of polluted air masses, in an attempt to determine whether this has an important effect on particle activation at the JFJ. Ozone at the JFJ may be influenced by stratospheric intrusions, but a modelling study (Cui et al., 2009) has suggested that this is the case for less than 20 % of the year, making such events relatively rare.

Statistical analysis
In order to determine if and how environmental and chemical factors can be related to the number of cloud droplets (i.e. the number of activated aerosol, n act ), we chose a simple multiple linear regression model for the analysis. Multiple linear regression is a commonly used statistical method for explanatory and theory-testing purposes, and thus it is appropriate to use in assessing how the environmental and chemical variables contribute to the prediction of n act (Johnson et al., 2004; Tonidandel and LeBreton, 2011). It is likely that several of the predictor variables selected for this analysis will be cross-correlated; thus traditional regression indices (p value, regression coefficients) will fail to appropriately partition the overall R 2 of the model into the respective contributions of the predictor variables (Tonidandel and LeBreton, 2011). Nevertheless, active research in the statistical sciences has led to a set of tools for the assessment of the relative importance of individual covariates in linear regression models in the presence of correlated explanatory variables. A widely used approach, first proposed by Lindeman et al. (1980), henceforth referred to as LMG, but better known in the sequential additive version proposed by Kruskal (1987), allows assigning shares of "relative importance" to a set of regressors in a linear model (Grömping, 2007). Here we use the LMG method, in its implementation in the "relaimpo" package, developed by Grömping (2006) and available for the scientific computing language R (R Core Team, 2014), to assess the relative importance of individual explanatory variables in a simple linear regression model for the cloud droplet numbers in warm tropospheric clouds.
Below we detail the LMG method and its application to our statistical model following Grömping (2006). Once the set of explanatory variables/regressors ($x_{i1}, \ldots, x_{ip}$) is defined, as in our analysis in Eq. (5), the multiple linear regression model is fitted and the regression coefficients for each explanatory variable ($\beta_k$, $k = 0, \ldots, p$) included in the model are estimated by minimising the sum of squared unexplained parts. The coefficient of determination ($R^2$) can then be expressed using the fitted response values ($\hat{y}_i$) and estimated coefficients ($\hat{\beta}_k$) as the ratio between the model and total sum of squares (MSS and TSS respectively), i.e.

$$R^2 = \frac{\mathrm{MSS}}{\mathrm{TSS}} = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2},$$

where $\bar{y}$ is the mean of the $N$ observed responses $y_i$.

The LMG method decomposes the coefficient of determination into non-negative contributions that sum to the total $R^2$. First, sequential sums of squares (SSS; i.e. regressors are used in listed order, e.g. as given in our model in Eq. 5) are derived via analysis of variance (ANOVA). These sequential sums of squares, for each regressor, sum to the MSS, i.e. the model part of the TSS. Next, sequential $R^2$ contributions are derived by dividing SSS by TSS. These sequential $R^2$ contributions are then utilized in the LMG method. As the order of the explanatory variables in any regression model is a permutation of the available regressors $x_1, \ldots, x_p$, it can be denoted by the tuple of indices $r = (r_1, \ldots, r_p)$. The set of regressors entered in the model before regressor $x_k$ in the order of $r$ can then be denoted as $S_k(r)$. Thus the portion of $R^2$ allocated to explanatory variable $x_k$ in the order $r$ can be written as

$$\mathrm{seq}R^2(\{x_k\} \mid S_k(r)) = R^2(\{x_k\} \cup S_k(r)) - R^2(S_k(r)). \tag{2}$$

Using Eq. (2) the metric LMG can be written as

$$\mathrm{LMG}(x_k) = \frac{1}{p!} \sum_{r\ \mathrm{permutation}} \mathrm{seq}R^2(\{x_k\} \mid S_k(r)), \tag{3}$$

which can be further simplified to

$$\mathrm{LMG}(x_k) = \frac{1}{p!} \sum_{S \subseteq \{x_1, \ldots, x_p\} \setminus \{x_k\}} n(S)!\,(p - n(S) - 1)!\; \mathrm{seq}R^2(\{x_k\} \mid S), \tag{4}$$

where $n(S)$ is the number of regressors in the set $S$, as orders with the same $S_k(r)$ can be summarized into one summand (Grömping, 2006).

In the following we propose simple linear regression models developed based on 4 years of observations from the JFJ, Switzerland. Additionally, the best performing regression model was run for subsets of the data corresponding to the different years, and wind directions, to identify any features in the data which were particular to these subsets. The aim of this analysis was to determine whether a single statistical model can be constructed which will be generally applicable for the prediction of the number of cloud droplets for all years and wind directions.
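The LMG decomposition described in this section can be illustrated with a small, dependency-free Python sketch. It is not the "relaimpo" implementation: it averages the sequential R 2 increments over all orderings of the regressors by brute force, which is only practical for a handful of predictors. The helper names (`solve`, `r_squared`, `lmg`) and the synthetic data are hypothetical.

```python
from itertools import permutations


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an ordinary least squares fit of y on the columns plus an intercept."""
    if not columns:
        return 0.0
    y_mean = sum(y) / len(y)
    tss = sum((v - y_mean) ** 2 for v in y)
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * y[i] for i, row in enumerate(design)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - rss / tss


def lmg(predictors, y):
    """Average sequential R^2 increment of each regressor over all orderings."""
    names = list(predictors)
    shares = dict.fromkeys(names, 0.0)
    orders = list(permutations(names))
    for order in orders:
        entered, r2_before = [], 0.0
        for name in order:
            entered.append(name)
            r2_after = r_squared([predictors[m] for m in entered], y)
            shares[name] += r2_after - r2_before
            r2_before = r2_after
    return {name: total / len(orders) for name, total in shares.items()}


# Synthetic, correlated predictors (illustrative values only).
predictors = {
    "x1": [0.5, 1.2, 2.0, 3.1, 4.0, 5.2],
    "x2": [0.9, 1.1, 1.0, 1.3, 1.2, 1.5],
}
y = [0.7, 1.1, 1.5, 2.3, 2.7, 3.5]

shares = lmg(predictors, y)
full_r2 = r_squared(list(predictors.values()), y)
print(shares, full_r2)
```

Because the increments along any single ordering telescope to the full-model R 2, the averaged shares also sum to it, which is the defining property of the decomposition.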

Results
In total, 2399 data points were included in the analysis, with the majority being from 2010 (1087) and 2011 (896). Data were limited in 2002 (206 points) and 2004 (210 points) compared to those in 2010 and 2011, since there were more episodes of entrainment or partially glaciated clouds where data were excluded from this analysis. The 2002 campaign was relatively short and the time resolution of the measurement data set was lower in 2004 and 2002 than in later years, as described above, yielding fewer data points. In Figs. 1 to 4, time series of the predictor variables are shown for each campaign. In these plots, it can be seen that the data sets include a wide range of conditions with respect to meteorology and air parcel composition. In the upper panels of the plots, n CCN is plotted together with n act . In 2011 and 2010 (Figs. 1 and 2) there are episodes of relatively high n CCN , during which not all particles larger than 80 nm are activated, as shown by the lower n act numbers. Additionally, the fraction of particles that are activated appears to be lower when the wind is from the southeast (red symbols in the bottom panel of the plots). In 2004 (Fig. 3), however, n CCN is generally fairly low, with, in a few cases, larger n act than n CCN , indicating that particles below the chosen cut-off diameter for potential CCN are also being activated. In 2002 (Fig. 4), there is a broad range of n CCN values, and activation appears to be high in almost all cases, regardless of wind direction or updraught velocity. In all years, the mixing ratios of CO and O 3 (second panel) appear to be fairly well correlated with each other, except around day 12 of the 2002 campaign (overall R = 0.65). There does not appear to be an appreciable link between wind direction and CO or O 3 mixing ratio. The temperature range is similar for all the data sets, with temperatures generally between 270 and 280 K.
An episode of warmer temperatures in the first half of the 2010 campaign corresponds with relatively high CO and O 3 values, as well as higher aerosol number concentrations. The cooling after day 20 is accompanied by a marked reduction in n CCN , as well as an increase in the fraction of aerosol which are activated to form cloud droplets. As can be seen in the bottom panel of each plot, the updraught velocities are generally lower when the wind is from the southeast than when it is from the northwest, consistent with the findings of Hammer et al. (2014).

Statistical relationships for combined data
The modelled number of cloud droplets is plotted against the observed number (n act), for a variety of statistical model formulations, in Fig. 5. In panel a, only n CCN is used to predict the number of cloud droplets. Already here a good relationship is found, with a correlation (R) of 0.89; however, the intercept in the model leads to an unphysical cut-off at low modelled numbers. Including the updraught velocity improves the model slightly: while the R value remains the same, the root mean squared error (RMSE) decreases from 59.7 to 58.1. [...] updraught velocity rather than the updraught velocity itself. The latter statistical model was found to provide the best representation of the observed number of droplets, with an R value of 0.91, RMSE of 54.2, and a mean error (ME) of 38.1. The statistical model presented in panel d of Fig. 5 provides a simple and reasonably accurate way of predicting the number of cloud droplets formed based on only a few explanatory variables. The number of activated aerosol (considered equivalent to the number of droplets) predicted by this model is given by Eq. (5), where ω is the estimated updraught velocity at cloud base in m s −1, CO and O 3 are mixing ratios in ppb, and H is the height of the JFJ above the cloud base in metres (H must be greater than 0). The model considering only the number of CCN, as shown in panel a of Fig. 5, is

$$n_{\mathrm{act}} = 0.57\,n_{\mathrm{CCN}} + 43.27. \qquad (6)$$
The same analysis was performed with changes in the minimum size of aerosol considered to be CCN to 70, 90, and 100 nm (Fig. 6), but this did not improve the model skill in relation to the results obtained when counting only aerosol larger than 80 nm to determine n CCN . In fact, there was little variation in the model skill when these different size criteria were used in the definition of potential CCN.
It should be noted that at very low n CCN , the statistical model may return negative values for the number of droplets, which is obviously unphysical. However, this only applies to a very small number of points (16 of the 2399 points presented here) and thus does not compromise the general applicability of the proposed model.
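As an illustration of the quantities quoted in this section, the following sketch evaluates the single-predictor model (n act = 0.57 n CCN + 43.27) and computes R, RMSE, and ME against a set of observations. The observations here are synthetic stand-ins, the function names are hypothetical, and ME is assumed to denote the mean absolute error, which is not confirmed by the text.

```python
import numpy as np


def skill(modelled, observed):
    """Return (R, RMSE, ME) for modelled versus observed droplet numbers."""
    modelled = np.asarray(modelled, dtype=float)
    observed = np.asarray(observed, dtype=float)
    r = float(np.corrcoef(modelled, observed)[0, 1])
    rmse = float(np.sqrt(np.mean((modelled - observed) ** 2)))
    # Assumption: ME is taken as the mean absolute error.
    me = float(np.mean(np.abs(modelled - observed)))
    return r, rmse, me


def n_act_ccn_only(n_ccn):
    """Single-predictor model quoted in the text: n_act = 0.57 n_CCN + 43.27."""
    return 0.57 * np.asarray(n_ccn, dtype=float) + 43.27


# Synthetic stand-in for the observations (illustrative only, in cm^-3).
rng = np.random.default_rng(1)
n_ccn = rng.uniform(20.0, 1000.0, size=2000)
n_obs = 0.57 * n_ccn + 43.27 + rng.normal(scale=55.0, size=n_ccn.size)

# Refit slope and intercept by ordinary least squares.
slope, intercept = np.polyfit(n_ccn, n_obs, deg=1)
r, rmse, me = skill(n_act_ccn_only(n_ccn), n_obs)
print(f"slope={slope:.3f} intercept={intercept:.2f} R={r:.2f} RMSE={rmse:.1f} ME={me:.1f}")
```

Because the intercept of this model is positive, it cannot return negative droplet numbers; the negative values mentioned above arise only for the multi-variable model.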

Comparison with physically based parametrization
To put the results presented in Figs. 5 and 6 into the context of previous work, a state-of-the-art cloud droplet formation parametrization was used to calculate the cloud droplet number for the same data points. Here we apply the sectional form of the cloud droplet formation model of Nenes and Seinfeld (2003) and Fountoukis and Nenes (2005), with the giant CCN correction as described by Barahona et al. (2010). This parametrization requires input data describing the chemical composition, aerosol size distribution, updraught velocity, pressure, and temperature. For the aerosol, the size distributions obtained by the SMPS are used (in original bin form), while an average aerosol hygroscopicity of 0.25 (corresponding to an aerosol mixture of roughly 42 % ammonium sulfate and 58 % insoluble aerosol) is assumed, which is similar to the hygroscopicity value found from 17 months of measurements at the JFJ by Jurányi et al. (2011), for particles with a critical dry diameter of around 80-100 nm. The parametrization was also run for the overall median hygroscopicity value given by Jurányi et al. (2011) of 0.2, as well as a value of 0.3, to test the sensitivity of the results to small changes in assumed hygroscopicity within the bounds of that which has been measured at the JFJ. Vertical velocity for the parametrization input was calculated using the method of Hammer et al. (2014), multiplied by an estimated correction factor of 0.25, following the suggestions of Hammer et al. (2015). Pressure and temperature at cloud base are also used, calculated in the same way as for the statistical model. A comparison of the predicted number of cloud droplets and the number of observed cloud residuals is shown in Fig. 7. The agreement between the modelled and observed data is good, with an R value of 0.86, RMSE of 67.2, and an ME of 42.8. The errors for Eq. (5), in panel d of Fig. 5, are only slightly lower than this.
The R and error values for the microphysical parametrization run for the three different hygroscopicity parameters are shown in Table 2. There it can be seen that within the range of likely hygroscopicity values for the JFJ there is little variation in the R values or errors from the model calculations. A slight decrease in the RMSE and ME is found when the hygroscopicity value is increased from 0.2 to 0.25 and 0.3.

Difference between wind directions
It was observed by Hammer et al. (2014) that the number and properties of aerosol in air parcels approaching the JFJ from the southeast were different from those in air approaching from the northwest. Further, they found that the activation diameter of particles differed considerably between the two wind directions. Therefore the total data set used here was divided according to wind direction, and the statistical model given by Eq. (5) was applied to see whether its ability to reproduce the observed number of droplets differed between the two wind directions. This comparison is shown in Fig. 8. The R values for the northwestern wind direction and the southeastern wind direction are the same (0.9), but the RMSE and ME are both substantially lower for the northwestern wind direction (RMSE of 49.3 vs. 67.4 and ME of 34.5 vs. 49.4). In the northwesterly case it can be seen that the model shifts from a slight overestimation of the observed number of cloud droplets to a slight underestimation, with the crossover occurring at about 150 drops cm −3. The data in the southeastern case appear to closely follow the 1 : 1 line. Therefore there appears to be no systematic bias introduced by considering both wind directions in the model together.
The results of the microphysical parametrization simulations, separated by wind direction, are shown in Fig. 9. Here it is seen that the microphysical parametrization is better able to represent the number of droplets in the northwestern wind case (R of 0.91), while in the southeastern case the RMSE increases to 107, and the model underestimates the number of cloud droplets, particularly for numbers of residuals above about 300 cm −3 . This may be due to differences in turbulence and vertical wind velocity between the northwestern and southeastern wind cases, which are not resolved by our vertical wind velocity estimation.

Difference between years
To determine how representative the model in Eq. (5) is for data from different years, the results were broken up into data for each year, shown in Fig. 10. For 2002, 2010, and 2011, the modelled data are well correlated with the observed number of droplets (R of between 0.89 and 0.95), but the slope varies between different years. While the data from 2011 lie along the 1 : 1 line, the 2010 data seem to be composed of two different groups of points with different slopes, below and above approximately 300 drops cm −3 . It is not surprising that the R and error values are better for 2010 and 2011, as these years provide by far the most data points to which the model was fitted. The R for 2002 (0.95) was the highest of all years, but many of the data points are below the 1 : 1 line and the RMSE was higher than for the other years (82.7). The data collected during 2004 are less well fit by the model (R of 0.76, RMSE of 47.2). However, as there were so few data points in 2004, and these were mostly at low droplet numbers, it is difficult to say whether this is due to the data sampled or the conditions being fundamentally different during 2004.
Again, the results of the microphysical parametrization are shown, this time separated by year, in Fig. 11. The RMSE for the 2002 data is higher than for the statistical model (124 vs. 82.7), and the microphysical parametrization was found to generally underestimate the number of cloud droplets in cases where there were more than approximately 200 residuals cm −3 . It is interesting to note that the statistical model also generally underestimates the observed values for 2002. For 2004, the microphysical parametrization represents the observational data better than the statistical model, with an R value of 0.82 compared with the 0.76 of the statistical model and an RMSE of 42.9 compared to 47.2 for the statistical model. For 2010 and 2011, both the statistical model and the microphysical parametrization represent the observed data well.
The differences between the years were also investigated by re-fitting the statistical model to each individual year of data (Fig. 12). Naturally, this results in higher values of R and smaller errors. For example, in 2002 a good correlation is seen, with R of 0.96 and an RMSE of only 53.5. In 2002, it can also be seen that the model underestimation of points above 500 drops cm −3 seen in previous plots is not due to a saturation effect, as the observed droplet number can be predicted over the whole range of n CCN with one set of parameters. The model representation of 2004 is improved when the model is fitted to only 2004 data, but the R value is still only 0.83, lower than for the other years. This appears to be related to the overall low range of n CCN observed in 2004. Both 2010 and 2011 are well represented by models fitted specifically to these data.

Figure 11. The number of cloud droplets calculated by the microphysical parametrization, separated by year, compared to the number of observed cloud droplet residuals.
As a further way to assess the general applicability of the proposed linear model, we sampled 100 data points at random from each year of data (without replacement, i.e. each data point can be drawn only once, to avoid a sampling bias; see e.g. Friedman, 2015), and the R and error values were calculated with (i) the general model and (ii) models fitted separately to each sampled set of 400 observations (i.e. 100 observations from each year). To ensure statistically robust results, this analysis was performed for a set of 1000 random samples, and the results are summarized in Fig. 13. Due to the small number of data points in 2004 (210) and 2002 (206), the samples for these years did not differ greatly. In Fig. 13, it is apparent that the individually fitted models for the 1000 subsets perform slightly better than the simultaneously applied general model (as expected); however, given the small differences in both R and error values between the individual and general models, illustrated by the overlap of the interquartile ranges in both R and error values, the general model can be considered to be robust for the data set and applicable over a wide range of observed conditions.
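The resampling test described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the analysis code of this study: the predictors and droplet numbers are synthetic, only the RMSE is tracked, and the function names (`fit_linear`, `resample_test`) are hypothetical. The per-year sample sizes match those given in the text.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least-squares coefficients (intercept first)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def resample_test(X, y, year, n_per_year=100, n_samples=1000, seed=0):
    """RMSE of the general model (column 0) and of a refitted model (column 1) per sample."""
    rng = np.random.default_rng(seed)
    general = fit_linear(X, y)  # fitted once to the full data set
    years = np.unique(year)
    out = np.empty((n_samples, 2))
    for i in range(n_samples):
        # Draw n_per_year points from every year, without replacement.
        idx = np.concatenate([
            rng.choice(np.flatnonzero(year == yr), size=n_per_year, replace=False)
            for yr in years
        ])
        Xs, ys = X[idx], y[idx]
        out[i, 0] = rmse(predict(general, Xs), ys)
        out[i, 1] = rmse(predict(fit_linear(Xs, ys), Xs), ys)
    return out


# Synthetic stand-in data with the per-year counts given in the text.
rng = np.random.default_rng(42)
counts = {2002: 206, 2004: 210, 2010: 1087, 2011: 896}
year = np.concatenate([np.full(n, yr) for yr, n in counts.items()])
X = rng.normal(size=(year.size, 2))
y = 100.0 + 50.0 * X[:, 0] + 10.0 * X[:, 1] + rng.normal(scale=20.0, size=year.size)

errors = resample_test(X, y, year, n_samples=200)
print("median RMSE, general vs refitted:", np.round(np.median(errors, axis=0), 2))
```

A model refitted to a sample can never have a larger in-sample RMSE than the general model evaluated on the same sample, so the informative quantity is how small the gap between the two distributions is.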

Discussion
The analysis above shows that the number of cloud droplets can be reasonably well predicted by a single statistical model, containing the n CCN, the log of the updraught velocity, the height above cloud base, and the mixing ratios of CO and O 3. The contribution of each variable to the variance explained by Eq. (5) is shown in Fig. 14, along with error bars, denoting the range of the contributions of each variable in the random sampling analysis described in the previous section. The range of the parameters included in Fig. 14 is relatively small, indicating that the contribution to the explained variance is similar regardless of the sample taken from the data set. By far the greatest contribution to the explained variance is from n CCN, but including additional explanatory variables does improve the model with respect to absolute biases. The O 3 and CO mixing ratios contributed around 10 and 4 % respectively of predictive ability to the model, suggesting that for sites such as the JFJ, which are located relatively far from direct emissions sources, the chemical history or source region of the air mass is not greatly relevant in predicting the activation of aerosol to cloud droplets. Previously, Jurányi et al. (2011) and Hammer et al. (2014) found that the hygroscopicity parameter of aerosols observed at the JFJ is not highly variable. The results presented here also indicate that changes in aerosol properties, which would generally be correlated with CO or O 3 concentrations, are not large enough to substantially influence aerosol activation. The height above the cloud base, H, contributed a small amount (around 7 %) to the explained variance. This is likely due to the height above cloud base being a measure of the total amount of condensible water in the cloud, with greater condensible water generally leading to more droplets.
The cloud base temperature was not found to be significantly correlated with the cloud droplet number over the combined data set; therefore we find no evidence that temperature-dependent influences of surface active compounds play a significant role in cloud droplet activation. A previous study carried out at the JFJ, by Henning et al. (2002), found that when the number of potential CCN with diameter greater than 100 nm reduced below 100 cm −3 , the activation diameter shifted to smaller sizes, so that significant numbers of aerosol smaller than 100 nm began to activate. However, the ability of Eq. (5) to predict n act does not deteriorate at low particle numbers, possibly because in this work particles larger than 80 nm are considered potential CCN.
A linear dependence of the number of cloud droplets on n CCN implies that there is not a strong competition for water vapour during most of the activation phase of cloud droplet formation. Whether or not this occurs depends on the CCN number, the slope of the CCN spectrum, vertical velocity, the degree of external mixing, the presence of giant CCN (sea salt, dust), and temperature (e.g. Rissman et al., 2004; Reutter et al., 2009; Ghan et al., 1997; Morales Betancourt and Nenes, 2014). A good indicator of linearity is the partial sensitivity of the droplet number to the number of aerosol, ∂N d /∂N a (also known as the aerosol-cloud index, ACI), for a given set of aerosol and cloud formation conditions. The closer the ACI is to unity, the weaker the competition effects and the better linearity holds, and vice versa. The ACI can be calculated either numerically with a parcel model (Reutter et al., 2009) or with a parametrization adjoint (Moore et al., 2013; Morales Betancourt and Nenes, 2014). The latter is used here to establish the degree to which linearity holds for the conditions at the JFJ. The results of this calculation are shown in Fig. 15. In panel a, it can be seen that the ACI increases from near zero at low updraught velocities to around 0.4 at updraught velocities of approximately 1 m s −1 and higher (note that the updraught velocities shown in Fig. 15 have been corrected by a factor of 0.25, as described in Sect. 5.2). This suggests that the form of the relationship between the number of droplets and n CCN does not change at updraught velocities higher than approximately 1 m s −1. Therefore while the updraught has only a small influence on the number of cloud droplets under these conditions, it does slightly influence the relationship between the number concentration of aerosol and the number of droplets. Panel b of Fig. 15 shows the sensitivity of the droplet number to n CCN as a function of n CCN. 
Here it can be seen that the sensitivity does not display any obvious trend with increasing n CCN , supporting our choice of a linear relationship between the number of droplets and n CCN .
These results correspond with previous studies. For example, Reutter et al. (2009) found the number of cloud droplets to be directly proportional to the particle number concentration when the ratio of updraught velocity to particle number concentration was high, but they found that, under low ratios, the number of cloud droplets formed was only dependent on the updraught velocity. In that study, the lower limit of the regime where the number of cloud droplets depends on the number of particles was found to be an updraught to particle number concentration ratio of 10 −3 m s −1 cm 3, which, for a CCN concentration of 800 cm −3, requires a vertical wind speed of only 0.8 m s −1. Examining Figs. 1 to 4, it can be seen that almost all of the northwestern wind cases, and most of the southeastern wind cases, have vertical wind speeds higher than 1 m s −1 (if the wind speeds in Figs. 1 to 4 were corrected by a factor of 0.25, as was done for the microphysical modelling, 67 % would still be above 1 m s −1). Therefore, based on the study of Reutter et al. (2009), a direct dependence of the number of droplets on the number of potential CCN would be expected. The study of Partridge et al. (2012) showed that under relatively clean conditions, the details of the aerosol number size distribution determined the number of cloud droplets; however, when the accumulation mode particle concentrations were above approximately 1000 cm −3, the chemical composition of the particles played the major role in determining the number of cloud droplets. Partridge et al. (2012) also found that the importance of the particle chemistry increases relative to that of the particle sizes at lower updraught velocities.
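The regime threshold quoted above from Reutter et al. (2009) amounts to a one-line calculation, sketched here for clarity. The function names are hypothetical, and the threshold is treated as a sharp limit purely for illustration.

```python
# Lower limit of the aerosol-limited regime as quoted in the text:
# updraught velocity / particle number concentration = 1e-3 (m s^-1) / (cm^-3).
RATIO_LIMIT = 1.0e-3


def min_updraught(n_ccn_per_cm3, ratio_limit=RATIO_LIMIT):
    """Smallest updraught velocity (m s^-1) for which the ratio reaches the limit."""
    return ratio_limit * n_ccn_per_cm3


def is_aerosol_limited(w_m_per_s, n_ccn_per_cm3, ratio_limit=RATIO_LIMIT):
    """True if the updraught-to-particle-number ratio is at or above the limit."""
    return w_m_per_s / n_ccn_per_cm3 >= ratio_limit


w_min = min_updraught(800.0)  # 0.8 m s^-1 for 800 cm^-3, as stated in the text
print(f"minimum updraught for 800 cm^-3: {w_min:.1f} m s^-1")
```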
Under conditions where the aerosol population is externally mixed, the number of cloud droplets formed may also not be directly dependent on the number of CCN, as changes in the relative abundance of particles with differing hygroscopicities will influence the formation of cloud droplets. Nevertheless, Dusek et al. (2006) found that there was little change in the activation diameter of particles (less than 20 nm) when comparing polluted and background air masses at a non-urban site. These studies support the idea that for cloud formation at remote sites such as the JFJ, with updraught velocities above approximately 1.0 m s −1 and relatively low aerosol number concentrations, the number of cloud droplets formed should be dependent on the number and size of the aerosol present.
Finally, the statistical models and the microphysical parametrization presented in this study are compared with two existing parametrizations, those of Jones (1994) and Martin et al. (1994), both of which used n CCN to predict the number of cloud droplets which would be formed. The Martin et al. (1994) parametrization is given by

$$N_{\mathrm{droplets}} = -2.10 \times 10^{-4} A^2 + 0.568A - 27.9,$$

where A is the number of aerosol in the size range 100 nm-3.0 µm in diameter. We use the version suggested for use in maritime air masses (their Eq. 12), as the version for continental air masses (their Eq. 13) produces a very poor representation of the number of observed droplets at the JFJ (not shown). This is possibly because the maritime parametrization is more representative for air masses with relatively low aerosol number concentrations, as encountered at the JFJ. The maritime parametrization is described as being valid over the range of aerosol number concentrations of 36 to 280 cm −3. The Jones (1994) parametrization is derived from a combination of the continental and maritime parametrizations of Martin et al. (1994) and should therefore be valid over the range of aerosol number concentrations of 36 to 1500 cm −3. It is given by

$$N_{\mathrm{droplets}} = 375 \left(1 - \exp\left(-2.5 \times 10^{-3} A\right)\right).$$

Figure 16. A comparison of the statistical models developed in this study, and the microphysical parametrization, with the performance of two existing models by Martin et al. (1994) and Jones (1994), which are based only on n CCN.
The modelled cloud droplet number concentration is plotted against the measured values for Eqs. (5) and (6) as well as against the models of Martin et al. (1994) and Jones (1994) and the microphysical parametrization, in Fig. 16. Comparison of Eqs. (5) and (6) with the other models considered shows that, although all five models provide a similar degree of explained variance (between 74 and 83 %), error values are higher for the Jones (1994) and Martin et al. (1994) models. The microphysical parametrization has a slightly lower R value than the other models but has better error values than the Jones (1994) and Martin et al. (1994) models. While all five models show a good correlation between modelled and measured cloud droplet numbers, the model of Martin et al. (1994) has too shallow a slope, resulting in a general underestimation of the observed values. Both the Jones (1994) and Martin et al. (1994) models include a saturation effect at higher n CCN which limits the number of cloud droplets formed, similar to the effect described by Reutter et al. (2009). No such saturation effect is observed at the JFJ, but it cannot be ruled out that such an effect may occur at higher aerosol number concentrations than those presented here.
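The three relationships based only on particle number that are compared above can be written out as a short sketch, using the coefficients exactly as quoted in this text. The function names are hypothetical, and feeding the same number concentration to all three is a simplification: Eq. (6) counts particles larger than 80 nm, whereas A in the other two refers to the 100 nm-3.0 µm size range.

```python
import numpy as np


def n_drop_eq6(n_ccn):
    """Linear single-predictor model of this study (Eq. 6)."""
    return 0.57 * np.asarray(n_ccn, dtype=float) + 43.27


def n_drop_martin(a):
    """Quadratic form of Martin et al. (1994) with the coefficients quoted in the text."""
    a = np.asarray(a, dtype=float)
    return -2.10e-4 * a**2 + 0.568 * a - 27.9


def n_drop_jones(a):
    """Saturating exponential form of Jones (1994) as quoted in the text."""
    a = np.asarray(a, dtype=float)
    return 375.0 * (1.0 - np.exp(-2.5e-3 * a))


# Illustrative particle number concentrations in cm^-3.
a = np.array([100.0, 300.0, 600.0, 1000.0])
for name, f in [("Eq. (6)", n_drop_eq6), ("Martin", n_drop_martin), ("Jones", n_drop_jones)]:
    print(f"{name:8s}", np.round(f(a), 1))
```

The exponential form flattens towards 375 cm −3 at high particle numbers, while Eq. (6) keeps increasing with a constant slope, which is the difference in saturation behaviour discussed above.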

Conclusions
Using data from four summertime CLACE campaigns performed at the high-altitude research station at the Jungfraujoch, we have shown that the number of cloud droplets formed in warm clouds can be rather accurately represented by a simple statistical model (Eq. 5), producing a similar degree of accuracy to that achieved with a microphysical parametrization. The majority of the variance in the observed droplet numbers is explained by the number of potential CCN, which is defined in this study as the total number of particles with a dry diameter greater than 80 nm. Using the number of potential CCN alone, 79 % of the observed variance is explained (Eq. 6). With the addition of further explanatory variables, such as CO and O 3 mixing ratios, and the height above cloud base, the RMSE and ME errors can be slightly reduced.
Although tuning the statistical model to each year of data separately produces slightly improved results, Eq. (5) represents the observed droplet numbers from the individual years quite adequately. Likewise, the model is applicable to data from both of the predominant wind directions at the JFJ, and although there is more variability in the model's ability to predict the number of droplets formed during southeasterly wind conditions, there appears to be no substantial bias.
In contrast to previous studies in which such models were constructed (e.g. Martin et al., 1994;Jones, 1994), no evidence for a saturation effect of high CCN numbers was observed; instead, the number of droplets formed increased continually with n CCN . Such a saturation effect is expected to occur at higher aerosol number concentrations, for example closer to aerosol sources or in more polluted environments.
It should be noted that the statistical model is based only on data collected during summer campaigns and that periods with partially or fully glaciated clouds have been excluded from the data set (as described in Sect. 4.1). During such periods the number of activated aerosol is also influenced by water uptake by ice particles, changing the relationship between the number of CCN and the number of cloud droplets. The statistical model is thus considered valid only for liquid clouds.
Due to the location of the JFJ station on the alpine divide, with air masses approaching from both the north and the south, we expect Eqs. (5) and (6) to be broadly applicable to the remote European continental troposphere with a boundary layer influence. Indeed, these equations should be generally applicable to conditions where droplet activation occurs in the aerosol limited regime. While such empirically derived relationships have their limitations, and may not remain valid under substantially perturbed atmospheric conditions, they provide a simple and computationally efficient way to calculate the number of cloud droplets in warm clouds, when appropriately applied.