ACTRIS non-methane hydrocarbon intercomparison experiment in Europe to support WMO GAW and EMEP observation networks

The performance of 18 European institutions involved in long-term non-methane hydrocarbon (NMHC) measurements in ambient air within the framework of the Global Atmosphere Watch (GAW) and the European Monitoring and Evaluation Programme (EMEP) was assessed with respect to data quality objectives (DQOs) of ACTRIS (Aerosols, Clouds, and Trace gases Research InfraStructure Network) and GAW. Compared to previous intercomparison studies, the DQOs define a novel approach to assess and ensure a high quality of the measurements. Having already been adopted by GAW, the ACTRIS DQOs are demanding, with deviations to a reference value of less than 5 % and a repeatability of better than 2 %. The participants of the intercomparison analysed two dry gas mixtures in pressurised cylinders, a 30-component NMHC mixture in nitrogen (NMHC_N 2 ) at approximately 1 nmol mol −1 and a whole air sample (NMHC_air), following a standardised operation procedure including zero- and calibration gas measurements. Furthermore, participants had to report details on their instruments and assess their measurement uncertainties. The NMHCs were analysed either by gas chromatography–flame ionisation detection (GC-FID) or by gas chromatography–mass spectrometry (GC-MS). For the NMHC_N 2 measurements, 62 % of the reported values were within the 5 % deviation class corresponding to the ACTRIS DQOs. For NMHC_air, generally more frequent and larger deviations to the assigned values were observed, with 50 % of the reported values within the 5 % deviation class. Important contributors to the poorer performance in NMHC_air compared to NMHC_N 2 were a more complex matrix and a larger span of NMHC mole fractions (0.03–2.5 nmol mol −1 ). The performance of the participating laboratories was affected by the different measurement procedures such as the usage of a two-step vs. a one-step calibration, breakthroughs of C 2 –C 3 hydrocarbons in the focussing trap, blank values in zero-gas measurements (especially for those systems using a Nafion ® Dryer), adsorptive losses of aromatic compounds, and insufficient chromatographic separation.


Introduction
Volatile organic compounds (VOCs) are important atmospheric trace gases with anthropogenic and biogenic emissions (e.g. Koppmann, 2007;Warneck, 1988, and references therein). VOCs include a large variety of non-methane hydrocarbons (NMHCs, mostly from C 2 -C 16 ) such as alkanes, alkenes, alkynes, aromatic compounds, and terpenoids as well as oxygenated VOCs (OVOCs) such as alcohols, aldehydes, and ketones (Andreae and Merlet, 2001;Monks et al., 2009;Placet et al., 2000;Plass-Duelmer et al., 1993;Sawyer et al., 2000). The mole fractions of these compounds vary from below 1 pmol mol −1 to tens of nmol mol −1 in background and urban air, respectively (e.g. Gros et al., 2007;Parrish and Fehsenfeld, 2000). Atmospheric VOCs have an impact on the oxidising capacity of the atmosphere through their role in the generation of photo-oxidants (e.g. ozone and organic radicals) and are precursors of secondary organic aerosols. For these reasons, reliable measurements of VOCs are essential, and they are consequently included in the long-term monitoring programmes of the Global Atmosphere Watch (GAW) of the World Meteorological Organization (WMO, 2007a), regional programmes such as the European Monitoring and Evaluation Programme (EMEP), and national air pollution monitoring networks.
In Europe the measurement capacity for VOCs in the atmosphere is diverse. On the one hand, several laboratories maintain long-term and high-quality measurements based on sophisticated quality assurance/quality control (QA/QC) systems, high-quality standard gases, previous intercomparison activities, and audits by the World Calibration Centre for VOCs. On the other hand, the performance of other laboratories is limited by a lack of such quality measures and by the fact that there are no commonly agreed-upon guidelines concerning standards, homogenised quality assurance procedures and measurement methods. In Table 1 of the WMO GAW Report No. 171 (WMO, 2007b) 17 priority VOCs (NMHCs and oxygenated VOCs) were identified and general quality assurance recommendations were defined (Table 2). The European infrastructure network project ACTRIS (Aerosols, Clouds, and Trace gases Research InfraStructure) has expanded the priority substances to further NMHCs described in Table 1. Furthermore, measurement guidelines and a quality management system were developed under ACTRIS to harmonise trace gas measurements of NMHCs in Europe (http://www.actris.net/Project/WorkPackages/WP4/tabid/4428/Default.aspx). One objective of ACTRIS was to assess the current NMHC measurement capacity in Europe and to investigate the analytical performance of laboratories in terms of data quality objectives (DQOs) for repeatability and uncertainty. Strict DQOs were defined in ACTRIS (Table 2) and were adopted by the GAW Scientific Advisory Group for Reactive Gases during their meeting in October 2014. Whilst in the WMO GAW Report No. 171 DQOs are defined for accuracy and precision, these have been replaced in ACTRIS with uncertainty (in the sense of expanded combined uncertainty with coverage factor k = 2; JCGM, 2008) and repeatability (which characterises the short-term standard variation in multiple measurements).
VOC species are normally measured with gas chromatography coupled to either a flame ionisation detector (GC-FID) or a mass spectrometer (GC-MS). Furthermore, proton transfer reaction mass spectrometry (PTR-MS) is also used for the measurement of oxygenated VOCs, terpenoids, dialkenes, and aromatics. While PTR-MS analyses VOCs from air samples directly, GC-based techniques need a preconcentration step. Here VOCs are either analysed immediately after sampling onto suitable adsorbents (online) or they are collected in specially treated steel or glass cylinders or on cartridges filled with adsorbents and analysed later in the laboratory (offline). Problems which can occur are chemical reactions in the samples (e.g. with ozone), adsorptive losses, memory effects or leaks, losses during the preconcentration and the desorption steps, chemical reactions during thermal desorption, insufficient separation on the chromatographic column and misidentification, peak overlap, and inaccurate quantification (Helmig, 1997, 1999; Helmig and Vierling, 1995; Koppmann et al., 1995; Parrish and Fehsenfeld, 2000; Plass-Duelmer et al., 2002; Rudolph, 1999; Westberg and Zimmerman, 1993). Several NMHC intercomparisons have been carried out in the past on European and global scales. Generally, these aimed at an evaluation of the quality of VOC measurements without data quality objectives defining a threshold to differentiate between higher- and lower-quality data (e.g. NOMHICE: Apel et al., 1994, 1999, 2003; AMOHA: Plass-Duelmer et al., 2006; GAW: Rappenglueck et al., 2006; Bernardo-Bricker et al., 1995; De Saeger and Tsani-Bazaca, 1993; Hahn, 1994; Pérez Ballesta et al., 2001; Romero, 1995; Volz-Thomas et al., 2002).
Table 2. ACTRIS and former GAW data quality objectives (DQOs). Numbers express the expanded uncertainty (coverage factor k = 2) and the repeatability (standard deviation).
NOMHICE (Nonmethane Hydrocarbon Intercomparison Experiment) and AMOHA (Accurate Measurements of Hydrocarbons in the Atmosphere) were two systematic multistage intercomparisons for NMHCs performed in North America and Europe, where the complexity of the NMHC measurements (numbers of compounds and sample gas mixtures) increased during the experiments. While in earlier intercomparisons the use of certified NMHC calibration standards was not common (Apel et al., 1994, 1999, 2003; De Saeger and Tsani-Bazaca, 1993; Hahn, 1994; Pérez Ballesta et al., 2001; Romero, 1995), multicomponent standards with certified NMHC mole fractions were circulated for analysis among the participating laboratories in more recent intercomparisons (Plass-Duelmer et al., 2006; Rappenglueck et al., 2006). Within these studies the calibration with multicomponent NMHC calibration standards was superior to calibration with a single hydrocarbon species. Therefore, in the ACTRIS intercomparison experiments all participating laboratories were likewise asked to use certified multicomponent NMHC calibration standards, traceable to the GAW scale, for calibrating their instruments and for performing quality checks. Eighteen stations or laboratories from nine European countries took part in this ACTRIS intercomparison exercise for the analysis of NMHCs. OVOCs were excluded due to their instability in pressurised cylinders at ambient mole fractions. Pressurised cylinders filled with NMHCs in nitrogen (in the following called NMHC_N 2 ) and NMHCs in whole air (in the following called NMHC_air) were analysed by the different laboratories using their own certified multicomponent NMHC calibration standard. The participants performed their measurements with GC-FID, GC-MS, or PTR-MS instrumentation. The performance of the different laboratories was examined with respect to compliance with the DQOs of ACTRIS and GAW (Table 2).
Feedback was provided to the participants during a workshop, via analysis of technical details of each instrument, and the provision of recommendations for further characterisations and improvements.
This paper presents the findings of the intercomparison, with a focus on alkanes, alkenes, alkynes, and aromatic compounds. Results are used to discuss the status of current NMHC measurement capabilities in Europe 10-20 years after AMOHA, GAW, and NOMHICE intercomparisons and to discuss and evaluate issues with different instrumental setups used in the field.

Intercomparison approach
Eighteen European laboratories with 23 different GC instruments participated in this NMHC intercomparison exercise in 2012 ( Fig. 1 with laboratory abbreviations, Tables S1-S2 in the Supplement). Additionally, two PTR-MS instruments analysed the NMHC mixtures (Table S2; results are shown only in the Supplement). It should be pointed out that the "PerkinElmer Online Ozone Precursor Analyzer" is the only commercially available all-in-one instrument tested in this study. All other instruments use combinations of commercially available parts and custom-built units.
The intercomparison exercise was performed in two loops (with nine participants each) in order to keep the total time for the exercise within a few months. All participants received two cylinders, one filled with NMHC_N 2 and one with NMHC_air.

Preparation of NMHC mixtures
The two NMHC mixtures, NMHC_N 2 and NMHC_air, were prepared in 10 L "Quantum" passivated aluminium cylinders (Air Products, purchased from National Physical Laboratory (NPL)). NMHC_N 2 was diluted with nitrogen (quality 5.0 from Linde AG, Germany) from a ∼ 100 nmol mol −1 uncertified mixture of 30 NMHCs (and several monoterpenes) in nitrogen (prepared by NPL for HPB on demand) into two cylinders by HPB. The resulting mole fractions in NMHC_N 2 were ∼ 1 nmol mol −1 (Table 1). The final pressure in the cylinders was ∼ 120 bar. NMHC_air was filled with ambient air from Dübendorf (a suburban area of Zurich, Switzerland) in two 10 L cylinders, using a modified oil-free diving compressor (Model SA-6; RIX Industries, USA) on 31 October 2011. Due to the pressurisation, the water vapour condensed and the final humidity in the cylinders was very low (dew point < −30 °C, relative humidity ∼ 1 %). The mole fractions of C 2 -C 8 NMHCs in NMHC_air ranged from 0.03 to 2.5 nmol mol −1 (Table 1). The final pressure in the cylinders was ∼ 80 bar. Mole fractions in NMHC_air were in the upper range of rural stations in Europe and higher than remote conditions (Helmig, 1997; Helmig et al., 2008; Read et al., 2009).
Three laboratories (WCC-VOC, HPB, and Empa) assigned NMHC mole fractions to the different cylinders before and after the intercomparison. Additionally, these two time-separated measurements were used to assess the stability of the NMHC mixtures. All three laboratories used certified NMHC calibration standards from the GAW Central Calibration Laboratory for NMHCs (NPL), which defines the scale for NMHC measurements in WMO GAW. The analytical systems of the three reference laboratories can be considered as sufficiently independent as different pre-concentration systems and chromatographic columns are used (see Table S2a-b).
Since HPB and Empa acted as reference laboratories, their data measured in the middle of the intercomparison were not used for the reference value determination. However, for completeness, their values are displayed in Figs. 2-4 and S1-S4 and in Tables S3-S6 together with those of the other participants, and are marked accordingly as not "independent" results.
The NMHC mole fractions were usually assigned as error-weighted means (Barlow, 1989; Bronštejn, 2007) and are displayed with their corresponding expanded uncertainty (coverage factor k = 2, corresponding to 2σ or roughly a 95 % confidence interval) in Table 1. A more detailed description is given in the Supplement.
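As an illustration of an error-weighted mean, the following minimal Python sketch uses the textbook inverse-variance form. It assumes independent measurements whose expanded uncertainties (k = 2) are given in absolute units; the function name and the example numbers are hypothetical, and the procedure actually applied to the reference laboratories' data is the one described in the Supplement, which may differ in detail.

```python
import math


def error_weighted_mean(values, expanded_uncertainties, k=2.0):
    """Inverse-variance weighted mean with its expanded uncertainty.

    values: measured mole fractions (e.g. nmol/mol) from several laboratories.
    expanded_uncertainties: absolute expanded uncertainties (coverage factor k).
    Assumes the individual measurements are independent.
    """
    if not values or len(values) != len(expanded_uncertainties):
        raise ValueError("need equally long, non-empty inputs")
    # Convert expanded uncertainties to standard uncertainties (1 sigma).
    sigmas = [u / k for u in expanded_uncertainties]
    weights = [1.0 / (s * s) for s in sigmas]
    weight_sum = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / weight_sum
    sigma_mean = math.sqrt(1.0 / weight_sum)
    # Report the result again as an expanded uncertainty.
    return mean, k * sigma_mean


# Hypothetical example: two laboratories, the second one less certain.
mean, expanded = error_weighted_mean([1.0, 2.0], [0.2, 0.4])
```

In this form the more precise measurement dominates the assigned value, and the combined uncertainty is always smaller than the smallest individual uncertainty.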

Measurement approach
A detailed measurement guideline was provided to the participants to ensure consistent and comparable measurements of the NMHC mixtures. All participants used the same provided pressure regulators (model 206A from Scott Specialty Gases, USA) and transfer lines (Silcosteel ® , 1 / 16 in., ∼ 2.5 m). The pressure regulator was mounted at least 24 h before the measurement onto the gas cylinder and connected to the transfer line. Afterwards, the regulator and the transfer line were flushed three times and an initial leak test was performed (observation of pressure drop during 10 min). The pressure regulator and the transfer line were kept pressurised for at least 24 h (with closed cylinder valve) for equilibration of surfaces. Additionally, this setup served as a static leak test.
All participants were asked to quantify the NMHC mole fractions using their own calibration standard (Table S2b) and to report their expanded measurement uncertainty (see Supplement). Within GAW and ACTRIS, the scale for NMHC measurements is defined by the Central Calibration Laboratory (CCL) NPL, which continuously compares their NMHC scale and the associated expanded combined uncertainty of typically 2 % with other NMI (National Metrology Institutes) (Grenfell et al., 2010). Though different scales might cause a bias of results by participants that are related to a non-NPL laboratory standard, this ACTRIS comparison study addresses the inter-laboratory compatibility related to the standardised GAW NMHC scale provided by NPL. The certified expanded uncertainties of the NPL of 2 % are generally much lower than deviations discussed in this paper, e.g. beyond the DQO of 5 %.
The composition and the mole fractions in the cylinders were unknown to all participants, except for the reference laboratories HPB and Empa (see above). The measurement procedure was the following: at least three calibration standard measurements, five measurements of NMHC_N 2 , five measurements of NMHC_air, at least three calibration standard measurements, and a zero-gas measurement before and after the NMHC mixture measurements. Fourteen analyses were by GC-FIDs and nine by GC-MSs (Table S2). In this paper, results for 27 and 35 NMHCs are shown for NMHC_N 2 and NMHC_air, respectively. The three trimethylbenzenes and the monoterpenes present in NMHC_N 2 were not investigated in this intercomparison paper due to the lack of available data. The assigned NMHC mole fractions (with expanded uncertainties) are given in Table 1.

Data quality objectives (DQOs) for NMHC measurements
In the WMO GAW Report No. 171 (WMO, 2007b) general DQOs for different priority VOCs were defined (Table 2). Within the framework of ACTRIS, the list of priority compounds (Table 1) was expanded, and more challenging DQOs (ACTRIS DQOs) were defined (Table 2). Overall, ACTRIS DQOs are about a factor of 2 stricter than those in the GAW Report 171. The reason for the introduction of the ACTRIS DQOs was to detect trends of NMHCs more accurately, which currently decline by 1-8 % per year in Europe (Solberg, 2012, 2013, and references therein). These ACTRIS DQOs were also adopted by the GAW VOC Expert Group and the GAW Scientific Advisory Group for Reactive Gases during their meetings in Daejong (South Korea, October 2014). For the uncertainty, which describes the deviation from a reference value, the goals are set to 5 % for alkanes, alkenes (including isoprene), alkynes, and aromatics (and to 10 % for monoterpenes). Values express the expanded uncertainty with a coverage factor of k = 2. The goals in repeatability, defined as the standard deviation of the NMHC measurements, are 2 % for alkanes, alkenes (including isoprene), alkynes, and aromatics, and 5 % for monoterpenes. For mole fractions below 0.1 nmol mol −1 an absolute value of 0.005 nmol mol −1 is accepted as uncertainty, and 0.01 nmol mol −1 for monoterpenes.
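The uncertainty goals stated above can be written as a small lookup, sketched here in Python. The function name and the boolean flag are hypothetical; the numbers are the ones quoted in this section (5 % relative, or 0.005 nmol mol −1 absolute below 0.1 nmol mol −1; 10 % and 0.01 nmol mol −1 for monoterpenes).

```python
def actris_uncertainty_target(mole_fraction, monoterpene=False):
    """Return the ACTRIS uncertainty goal in nmol/mol for one compound.

    mole_fraction: mole fraction of the compound in nmol/mol.
    monoterpene: True for monoterpenes, False for alkanes, alkenes
                 (including isoprene), alkynes, and aromatics.
    """
    relative_goal, absolute_goal = (0.10, 0.01) if monoterpene else (0.05, 0.005)
    # Below 0.1 nmol/mol a fixed absolute value is accepted instead of
    # the relative goal.
    if mole_fraction < 0.1:
        return absolute_goal
    return relative_goal * mole_fraction


# Example: a 0.03 nmol/mol alkene is allowed 0.005 nmol/mol uncertainty.
low_level_goal = actris_uncertainty_target(0.03)
```

For the non-monoterpene compounds the two branches meet at 0.1 nmol mol −1, since 5 % of 0.1 nmol mol −1 equals 0.005 nmol mol −1.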
In the results section the measurement performance is compared against these DQOs defined by ACTRIS and adopted by GAW (Table 2). Here the uncertainties $u_{\mathrm{ref}}$ of the assigned reference values (Table 1) need to be taken into account. Thus, a result fulfils the ACTRIS DQO if the deviation from the reference is less than the 5 % deviation class, defined as

$$5\,\%\ \mathrm{class} = \sqrt{\mathrm{DQO}_{\mathrm{ACTRIS}}^{2} + u_{\mathrm{ref}}^{2}}. \qquad (1)$$

For the 10 % deviation class (10 % class), the respective former GAW DQO (Table 2) is applied.
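A short Python sketch of this deviation-class test follows. It assumes that the DQO and the reference uncertainty are combined in quadrature, both expressed in percent of the assigned value; the function names and the example numbers are hypothetical. For the 10 % class the corresponding former GAW DQO would be passed as `dqo_percent` instead of the default.

```python
import math


def deviation_class_limit(dqo_percent, u_ref_percent):
    """Class limit in percent: DQO and reference uncertainty in quadrature."""
    return math.hypot(dqo_percent, u_ref_percent)


def within_class(reported, assigned, u_ref_percent, dqo_percent=5.0):
    """True if the relative deviation from the assigned value is below the limit.

    reported, assigned: mole fractions in the same unit (e.g. nmol/mol).
    u_ref_percent: expanded uncertainty of the assigned value in percent.
    """
    deviation_percent = abs(reported - assigned) / assigned * 100.0
    return deviation_percent < deviation_class_limit(dqo_percent, u_ref_percent)


# Hypothetical example: assigned 1.00 nmol/mol with 2 % reference uncertainty.
ok = within_class(1.04, 1.00, u_ref_percent=2.0)         # 4 % deviation
too_far = within_class(1.06, 1.00, u_ref_percent=2.0)    # 6 % deviation
```

With a 2 % reference uncertainty the 5 % class limit widens to about 5.4 %, so a 4 % deviation passes and a 6 % deviation does not.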

C response for GC-FID systems
A GC-FID system can be characterised for losses or artefacts by making use of the known carbon response, the so-called C response (Plass-Duelmer et al., 2002). When the C responses for the various NMHC compounds are calculated, they should agree within a few percent, except for ethyne (Burns et al., 1983; Dietz, 1967; Faiola et al., 2012; Gong and Demerjian, 1995; Scanlon and Willis, 1985; Sternberg et al., 1962). The C response $R_i$ for each compound $i$ was calculated as follows:

$$R_i = \frac{A_i^{\mathrm{std}} - A_i^{\mathrm{b}}}{m_i^{\mathrm{std}} \, N_i \, V_{\mathrm{std}}}, \qquad (2)$$

where $A_i^{\mathrm{std}}$ and $A_i^{\mathrm{b}}$ are the peak areas of compound $i$ in the calibration standard (std) and the blank (b), respectively; $m_i^{\mathrm{std}}$ denotes the certified mole fraction of the calibration standard; $N_i$ the number of C atoms in compound $i$; and $V_{\mathrm{std}}$ the sampled volume of the calibration standard.
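The C response can be computed with a few lines of Python. This sketch assumes the form implied by the symbol definitions above, i.e. the blank-corrected peak area divided by the product of certified mole fraction, carbon number, and sampled volume; the function name, argument names, and example numbers are hypothetical, and the units of the result depend on the units of the inputs.

```python
def c_response(area_std, area_blank, mole_fraction_std, n_carbon, volume_std):
    """Carbon response of one compound from a calibration standard run.

    area_std: peak area of the compound in the calibration standard.
    area_blank: peak area of the compound in the blank (zero-gas) run.
    mole_fraction_std: certified mole fraction in the standard.
    n_carbon: number of carbon atoms in the compound.
    volume_std: sampled volume of the calibration standard.
    """
    if mole_fraction_std <= 0 or n_carbon <= 0 or volume_std <= 0:
        raise ValueError("mole fraction, carbon number and volume must be positive")
    return (area_std - area_blank) / (mole_fraction_std * n_carbon * volume_std)


# Hypothetical example: n-pentane (5 C atoms), 2 nmol/mol, 0.5 L sampled.
r_pentane = c_response(area_std=1050.0, area_blank=50.0,
                       mole_fraction_std=2.0, n_carbon=5, volume_std=0.5)
```

Because the result is a response per carbon atom, values for different hydrocarbons measured on the same FID can be compared directly.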
When comparing the C response values in the calibration standard and in NMHC_N 2 , the C responses should ideally be identical. Deviation points towards either artefacts in the analytical system (e.g. breakthrough during trapping, adsorptive losses, peak overlap, changes on active sites) or in the FID due to sample matrix effects influencing the flame. For easier comparison, the C responses were normalised by the average C response of the available C 4 -C 6 alkanes (highlighted in yellow in Fig. 4). As some stations did not report C 2 -C 3 alkanes (e.g. HPB_B, FZJ_A) and additionally breakthrough in C 2 compounds could have occurred, only C 4 -C 6 alkanes were taken into account. For two-column systems, the average C response of the second column was determined using C 7 -C 8 alkanes, benzene, and toluene (highlighted in green in Fig. 4). Any individual C response deviating more than 10 % from the average C response was not considered in the normalisation process.
[...] NMHC mixtures. The majority of the participants submitted a relative repeatability in NMHC_N 2 within the former GAW DQO (± 5 % for alkanes and alkynes, ± 10 % for aromatics and ± 15 % for alkenes including isoprene), 70 % even within ± 2 % (ACTRIS DQO), independent of the detector type. Poor repeatability was mostly linked with poor chromatographic resolution (see Tables S5 and S6). In the following, reasons for deviations larger than the stated quality objectives will be discussed.
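The normalisation of C responses by the average of the C 4 -C 6 alkanes, with exclusion of values deviating by more than 10 %, can be sketched in Python as follows. The single-pass exclusion (compute the average, drop outliers, average again) is an assumption, as the text does not state whether the exclusion was iterated; the function name and the example values are hypothetical.

```python
def normalise_c_responses(responses, reference_compounds, tolerance=0.10):
    """Normalise C responses by the mean response of reference compounds.

    responses: dict mapping compound name to its C response.
    reference_compounds: names used for normalisation (e.g. C4-C6 alkanes).
    tolerance: reference values deviating by more than this fraction from
               the first-pass mean are left out of the final mean.
    """
    reference = [responses[c] for c in reference_compounds if c in responses]
    if not reference:
        raise ValueError("no reference compounds available")
    first_mean = sum(reference) / len(reference)
    # Single-pass outlier exclusion (an assumption of this sketch).
    kept = [r for r in reference if abs(r - first_mean) <= tolerance * first_mean]
    if not kept:
        raise ValueError("all reference compounds were excluded")
    norm = sum(kept) / len(kept)
    return {compound: r / norm for compound, r in responses.items()}


# Hypothetical example: one alkane shows a loss, and so does benzene.
raw = {"n-butane": 100.0, "2-methylpropane": 100.0, "n-pentane": 100.0,
       "2-methylbutane": 100.0, "n-hexane": 80.0, "benzene": 90.0}
alkanes = ["n-butane", "2-methylpropane", "n-pentane", "2-methylbutane", "n-hexane"]
normalised = normalise_c_responses(raw, alkanes)
```

In this example n-hexane deviates by more than 10 % from the first-pass mean of 96 and is excluded, so the remaining alkanes normalise to 1 and the losses of n-hexane and benzene show up as values of 0.8 and 0.9.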

Uncertainty estimations of the NMHC measurements
Performing a complete uncertainty estimation is critical to the quality of the data. Nevertheless, only the participants DOU, KOS (both systems), RIG, HPB (both systems), JFJ, MHD, NILU and ZSF provided a thorough analysis (see "Determination of assigned values (error-weighted means) for NMHC mixtures" in the Supplement) of their expanded uncertainties (error bars in Fig. 2). All other participants calculated their measurement uncertainties only partially (e.g. only reporting repeatability). Generally, for many results the uncertainties were underestimated and, even combined with the uncertainties of the reference values, do not comprise the deviation from the assigned values. Thirty-six percent of results in NMHC_N 2 were out of the stated uncertainty ranges, and 35 % in NMHC_air. As the expanded uncertainty corresponds to the 95 % confidence interval, it would be expected that not more than 5 % of the results deviate by more than the uncertainty from the assigned values. This needs to be improved in programmes like GAW and EMEP, as realistic uncertainty estimation is essential for the user, e.g. in model validation.
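The consistency check described above, i.e. counting how many results deviate from the assigned value by more than the combined uncertainty, can be sketched in Python. Combining the participant's and the reference uncertainty in quadrature is an assumption of this sketch, and the function name and example data are hypothetical.

```python
import math


def fraction_outside_uncertainty(results):
    """Fraction of results not covered by the combined expanded uncertainty.

    results: iterable of tuples
        (reported, u_reported, assigned, u_assigned)
    with mole fractions and absolute expanded uncertainties (k = 2)
    in the same unit.
    """
    total = 0
    outside = 0
    for reported, u_reported, assigned, u_assigned in results:
        total += 1
        combined = math.hypot(u_reported, u_assigned)
        if abs(reported - assigned) > combined:
            outside += 1
    if total == 0:
        raise ValueError("no results given")
    return outside / total


# Hypothetical example with four results; for realistic k = 2 uncertainties
# roughly 5 % of results would be expected to fall outside.
example = [
    (1.00, 0.05, 1.02, 0.02),  # covered
    (1.10, 0.03, 1.00, 0.02),  # not covered
    (0.95, 0.02, 1.00, 0.02),  # not covered
    (1.01, 0.05, 1.00, 0.02),  # covered
]
share = fraction_outside_uncertainty(example)
```

A share well above 0.05, such as the 36 % and 35 % reported here, indicates that the stated uncertainties are too small.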
Critical in this evaluation are the assigned values; if these are biased relative to the "true" values, deviations may occur. However, in NMHC_N 2 there was a check by a common dilution factor relative to a NPL-certified standard of identical relative composition, which strongly supports the determined mole fractions within better than 2 % and does not indicate any bias. For NMHC_air, we rely on the uncertainty evaluation of the reference values by the reference laboratories, which is considered a realistic estimate. Though the assigned values are generally higher than the majority of the participants' results ( Fig. 3), they are typically between the median and the 75-percentile or 90-percentile values with partially contradicting deviations for the various techniques; e.g. alkanes derived from MS are high, whereas those from FID are low compared to the reference (Fig. 3c). Furthermore, deviations in participants' results are similar for NMHC_N 2 and NMHC_air (e.g. Fig. 3a, various results in Fig. 2), supporting the assigned values in NMHC_air based on reliable NMHC_N 2 determination (see below).

Calibration procedure
One essential step on the way to high-quality NMHC data is the use of an adequate calibration procedure. The participants calibrated their NMHC measurements either directly against certified multicomponent standards (one-step calibration) or against whole air working standards, which in turn are related to a certified multicomponent standard (two-step calibration done by CMN and Medusa systems). The systems using a NPL (the GAW Central Calibration Laboratory for NMHCs) standard for direct, one-step calibrations (Table S2b) generally exhibited a good performance. Since the NMHC_N 2 mixture and the NPL calibration standard virtually comprise the same matrix, complexity, and manufacturer, observed deviations for sites referring to the NPL scale should be within the repeatability of the instruments. This is not the case for some participants and compounds, and it points to unidentified sample transfer issues. The mole fraction range of the used NPL standards (e.g. 2, 4, or 10 nmol mol −1 ) and date of production apparently did not affect the quality of the results (Fig. 2, Table S2b).
The systems FZJ_B, FZJ_A, MHD, and PUY used different certified NMHC calibration standards (Table S2b). If a systematic offset between different scales exists, it should result in systematic deviations from the assigned values. FZJ_B, FZJ_A, and MHD all used calibration standards from Apel-Riemer, but the observed deviations from the assigned values are random (e.g. deviations for alkanes are of different extent and sign; Fig. 2m, p, and v). Evidently, other instrumental issues (e.g. chromatographic resolution, non-linearity of MS detector) affected these results, and therefore systematic differences between the different calibration scales cannot be assessed.
The Medusa instruments (JFJ, MHD, and NILU) generally overestimated the NMHC mole fractions (Figs. 2u-w and 3b). However, the excellent repeatability suggests that the systems run much better than the deviations indicate. Thus, a significant issue might arise from the fact that Medusa instruments and CMN are calibrated with whole air working standards using a two-step calibration. Direct calibration by certified NMHC standards appears to be superior to whole air working standards for NMHCs.

GC-FID systems
In order to analyse the performance of the GC-FID systems, the normalised C response factors for the calibration standards and NMHC_N 2 were compared (Fig. 4). Though identical C responses are expected, several GC-FID systems tend to slightly underestimate NMHCs in NMHC_N 2 compared to the calibration standard (Figs. 4 and 2). Even more surprising was the fact that in some of the systems which have two separation columns, a lower normalised C response for NMHC_N 2 compared to the calibration standard was observed in only one column, e.g. AUC (in the PLOT column)
[...] carbon number is between 2 and 2.6, indicating a higher uncertainty of the C response for this compound. Thus, in the normalised C response figures ethyne is expected to be 1 or higher. This was actually observed for DOU, YRK, and RIG. Deviations between the laboratory standard and the ACTRIS NMHC_N 2 were observed at ZSF, DOU, HPB_A, and FZJ_B. Since at ZSF and FZJ_B observed deviations were not particular to ethyne but a general phenomenon for many compounds, both stations are not further considered in this specific discussion. The normalised C response of ethyne in the calibration standard of IPR was substantially lower than that of other stations (Fig. 4f).
Together with ethene, ethyne is the most difficult compound to retain in air-toxics/air-monitoring traps (Badol et al., 2004). As AUC, HAR, PAL, SMK, ZSF, IPR, and KOS_A employ this type of trap, a breakthrough might be possible. However, as already discussed, no conclusive behaviour, e.g. higher losses for higher sample volume and higher trapping temperature, was observed.
The instruments at DOU and HPB_A had in common that both employ an Al 2 O 3 /KCl PLOT column. However, other stations using the same type of column (YRK, RIG) did not show this feature. We are currently speculating about slightly different matrices between the calibration standard and NMHC_N 2 causing different interactions with active sites of the specific PLOT column, resulting in more or fewer losses.
Despite these losses observed in the C response factors, the differences from the assigned mole fractions were minor for six systems and moderate to substantial for 7 of 14 systems (larger than 10 % in either or both of the two NMHC mixtures) (Fig. 2), with often substantially different deviations for NMHC_N 2 and NMHC_air, indicating matrix effects. This shows that it is essential to have ethyne in the calibration standard for direct calibration and that there is a need for thorough testing of matrix effects; e.g. real ambient air samples with higher humidity might result in higher breakthrough.

Alkene artefacts
Alkenes in NMHC_air exhibited the largest differences to the assigned values (Fig. 2), especially pronounced for all systems which used Nafion ® Dryer water traps, including PerkinElmer systems (Fig. 3d).
When using a Nafion ® Dryer to remove humidity from the sample, potential artefacts in C 2 -C 4 alkenes may occur depending on the status of the Nafion ® Dryer (Gong and Demerjian, 1995; Plass-Duelmer et al., 2002, and references therein). Butene peaks (for 1-butene, trans-2-butene, and cis-2-butene) are frequently observed in zero-gas measurements due to Nafion ® Dryer artefacts, and these blank values have to be subtracted in calibration or ambient air measurements. Instruments using a Nafion ® Dryer reported blank values up to 0.35 nmol mol −1 for C 2 -C 3 alkenes and up to 0.1 nmol mol −1 for C 4 alkenes. Combined with the fact that the mole fractions of C 4 -C 5 alkenes were in the range of 0.02-0.12 nmol mol −1 , it is expected that substantial differences to the assigned values occur due to blank issues. For ethene and propene, however, such effects were comparably small due to much larger mole fractions up to 2.5 nmol mol −1 and blank values up to 0.25 nmol mol −1 . It should be noted that the samples measured here were not humid and thus the effects of water removal from the sample and the Nafion ® Dryer behaviour cannot fully be assessed. Most participants were aware of the effects of a Nafion ® Dryer and reported larger uncertainties of their values (Fig. 2).

Losses of aromatic compounds and C 6 -C 8 alkanes
The C responses for the C 7 -C 8 alkanes and for the aromatics were lower than 1 (Fig. 4), indicating losses in the analytical system. Lower C responses were observed either in both calibration standard and NMHC_N 2 (Fig. 4; AUC, PAL, SMK, IPR, YRK (except benzene), RIG, FZJ_B, and less evident in HPB_A) or only in NMHC_N 2 (Fig. 4; HAR, DOU, HPB_B, and FID). This effect was apparent in both intercomparison loops. This does not seem to be a general C response issue for aromatics, because in many systems not all aromatics showed a reduced C response (Fig. 4; KOS (both systems); for benzene: AUC, HAR, HPB (both systems), RIG, YRK) and several other systems showed only a reduced C response for NMHC_N 2 (Fig. 4; HAR, DOU, and HPB_B, FID). For these systems, systematic problems like insufficient desorption from the trap or adsorptive losses in the GC system can thus be excluded. However, adsorptive losses only in NMHC_N 2 might have occurred due to insufficient equilibration time and the flushing procedure of the respective pressure regulator and transfer lines. RIG reported lower C responses compared to the calibration standard for C 6 -C 8 alkanes and aromatics (Fig. 4k). This was related to insufficient desorption temperature due to ice on the outer side of the Peltier-cooled trap which had built up during trapping. In general, too-low desorption temperature from the trap can be excluded for the glass bead traps (70-130 °C, Table S2). For the air-toxics traps no losses of aromatics were observed for HAR (trap at 320 °C) (Fig. 4a). By contrast, losses prevailed at up to 380 °C (IPR), which were consequently not due to too-low desorption temperature (Fig. 4f). YRK results indicated losses which were not due to desorption temperature (Carbopack B and Carboxen 1000 at 350 °C) but were ascribed to adsorption on newly installed stainless-steel transfer lines. In the slightly more humid NMHC_air, YRK achieved relatively higher aromatic mole fractions compared to the assigned values (Figs. 2-3), indicating humidity passivation of active surface sites. Thus, losses were only apparent in their dry calibration standards (Fig. 4g). Compatible with this observation is the fact that the box plots (Fig. 3b and e) show a systematic underestimation of aromatics only for NMHC_N 2 , while for NMHC_air the results are more equally distributed.
Different hypotheses to explain losses of aromatics and C 6 -C 8 alkanes did not result in simple and conclusive explanations. Losses were observed in individual systems when desorption was not sufficient, when adsorptive losses on inappropriate surfaces like newly installed stainless-steel lines (heated or not) occurred, or when dry sample gases were analysed. As long as a decrease in the C response is evident in both the calibration and NMHC_N 2 , the submitted mole fractions did not differ much from the assigned values (e.g. YRK and AUC) (Figs. 4a, g, 2a, and i).
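Why a reduced C response in both the calibration standard and the sample leaves the reported mole fraction nearly unaffected can be shown with a simple one-step calibration. The sketch below is illustrative only: the function names, the detector sensitivity, the 10 % loss, and the mole fractions are hypothetical values, and a strictly linear detector is assumed.

```python
def simulated_area(x_true, sensitivity, recovery):
    """Peak area for a true mole fraction, given the detector sensitivity
    (area per nmol/mol) and the fraction of analyte reaching the detector."""
    return x_true * sensitivity * recovery


def one_step_mole_fraction(area_sample, area_cal, x_cal):
    """One-step calibration: scale the sample area by the calibration area."""
    return area_sample / area_cal * x_cal


SENSITIVITY = 100.0   # hypothetical area units per nmol/mol
X_CAL = 4.0           # hypothetical calibration standard, nmol/mol
X_SAMPLE = 1.0        # hypothetical true sample mole fraction, nmol/mol
LOSS = 0.9            # hypothetical 10 % adsorptive loss

# Case 1: the same loss affects calibration standard and sample.
both = one_step_mole_fraction(
    simulated_area(X_SAMPLE, SENSITIVITY, LOSS),
    simulated_area(X_CAL, SENSITIVITY, LOSS),
    X_CAL,
)

# Case 2: the loss affects only the sample.
sample_only = one_step_mole_fraction(
    simulated_area(X_SAMPLE, SENSITIVITY, LOSS),
    simulated_area(X_CAL, SENSITIVITY, 1.0),
    X_CAL,
)

print(f"loss in both: {both:.3f} nmol/mol, loss in sample only: {sample_only:.3f} nmol/mol")
```

In the first case the loss factor cancels in the area ratio; in the second it propagates directly into the result, which corresponds to the situation where a reduced C response is seen only in NMHC_N2.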

Chromatographic resolution
Poor peak separation or peak shape (tailing) influences the peak integration and the results. Both effects can mask other problems if the sample matrix is rather complex, such as in NMHC_air, where peak overlap is likely to occur in FID systems. Due to substantially different mole fractions in ambient air compared to NMHC_N2, the chromatographic resolution, e.g. peak overlap, for NMHC_air differed considerably from the characteristics seen in NMHC_N2. Insufficient C4–C6 peak separation often resulted in mole fractions outside the 10 % class in NMHC_air, especially for 2- and 3-methylpentane; 2,2- and 2,3-dimethylbutane; and 2-methyl-2-butene (Figs. 2 and 3b). Similar results were already reported in the AMOHA intercomparison, where some participants had problems in separating 1-butene from 1,3-butadiene, cis-2-butene from 2-methylbutane, and isoprene from the methylpentanes (Plass-Duelmer et al., 2006). The reasons for the insufficient chromatographic separation include column degradation (AUC, FZJ_B), inadequate oven temperature programme (KOS), or non-baseline separation (HPB_A for C5–C6 alkanes) (for chromatograms see Supplement).

MS systems
Compared to FID systems, MS systems allow a better compound identification and peak separation at the cost of detector stability. With few exceptions, HPB_B (MS) reported the NMHCs within the 5 % class (Fig. 2q). It should be kept in mind that for HPB this was not a blind intercomparison. However, the ACTRIS mixtures were treated like unknown samples. Further, HPB_B was not used for the determination of the assigned values. The instrument is operated with a FID running in parallel to the MS detector. While the FID revealed stable behaviour of the instrument, in the MS signal drifts were observed by HPB. Thus, in routine measurements the MS is tuned weekly and every air sample is accompanied by a calibration measurement. In fact the HPB_B (MS) system was the best-performing MS system in this intercomparison, indicating that NMHC measurements within the 5 % class (ACTRIS DQOs) are achievable by MS systems.
The relatively large deviations from the assigned reference values in NMHC_N2 and NMHC_air observed for CMN and the Medusa systems (Fig. 3) were mainly due to calibration issues (two-step calibration; see Sect. 3.3). Nevertheless, the very good repeatability of the Medusa systems indicates the potential to perform high-quality NMHC measurements within the 5 % class (Fig. 2u-w).
FZJ_A was optimised to perform fast chromatography as the instrument is employed in aircraft measurements. The sample volume is kept small in order to reduce the sampling time. With a chromatography time of 3 min, the peak resolution can hardly be compared to the other GC systems. Nevertheless, FZJ_A performed fairly well for normal alkanes and aromatics, whereas branched alkanes and alkenes showed larger deviations from the assigned values (Fig. 2p). Whether this was due to the rather complex 74-component calibration standard in the 0.1 to 10 nmol mol⁻¹ range (Apel-Riemer Environmental Inc.) cannot be judged from the available data. Furthermore, breakthrough of C4 compounds was reported by FZJ_A. In general, the blank chromatogram revealed many peaks (chromatogram not shown), which possibly affected the results, especially in NMHC_air.
For NMHC_N2 the MS systems of PUY and SIR reported most values with a deviation of less than 10 %, whereas for NMHC_air more of the reported values were outside the 10 % class (Fig. 2r and s). For PUY this was probably due to drifting calibration standard measurements (up to 20 %) and poor repeatability; for SIR it was probably connected to blank values that were high relative to the assigned values (Table S7) and to poor stability of the calibration measurements.
The MS at SMR clearly underestimated the NMHC mole fractions in NMHC_N2 (Fig. 2t), except for isoprene. In contrast, SMR reported all values within the 10 % class (Table S3) for NMHC_air. SMR reported a non-linear calibration curve and low reproducibility of the submitted calibration measurements, whereas the two NMHC mixtures were reproducibly measured.
In summary, calibration, drift, and non-linearity are important issues for MS systems, which have to be handled with great care when using a GC-MS system for the measurement of NMHCs.
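The effect of detector drift between calibrations can be made concrete with a small example. The sketch below assumes a linear drift of the detector sensitivity between two bracketing calibrations and corrects for it by linear interpolation; this is a common generic approach and is not stated in the text to be the procedure of any participating laboratory. All names and numbers are hypothetical.

```python
def interpolated_sensitivity(t, t0, s0, t1, s1):
    """Linearly interpolate the sensitivity (area per nmol/mol) at time t
    between calibrations at t0 (sensitivity s0) and t1 (sensitivity s1)."""
    if t1 == t0:
        raise ValueError("bracketing calibrations need different times")
    weight = (t - t0) / (t1 - t0)
    return s0 + weight * (s1 - s0)


def mole_fraction(area, sensitivity):
    """Convert a peak area to a mole fraction for a linear detector."""
    return area / sensitivity


# Hypothetical run: sensitivity drifts by 20 % within two hours.
T0, S0 = 0.0, 100.0   # first calibration (hours, area per nmol/mol)
T1, S1 = 2.0, 80.0    # second calibration
T_SAMPLE = 1.0        # sample measured halfway between the calibrations
X_TRUE = 1.0          # hypothetical true mole fraction, nmol/mol

s_sample = interpolated_sensitivity(T_SAMPLE, T0, S0, T1, S1)
area = X_TRUE * s_sample  # simulated sample peak area under linear drift

uncorrected = mole_fraction(area, S0)       # only the first calibration used
corrected = mole_fraction(area, s_sample)   # bracketing calibrations used

print(f"uncorrected: {uncorrected:.2f} nmol/mol, corrected: {corrected:.2f} nmol/mol")
```

Under these assumptions, relying on the first calibration alone biases the result by 10 %, which already exceeds the 5 % deviation class; frequent calibration keeps the interpolation interval, and hence the residual error from non-linear drift, small.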

Other issues
During the ACTRIS intercomparison only very dry NMHC mixtures were analysed, and therefore a full performance assessment of water management systems (Nafion® Dryers, cold traps, or hydrophobic adsorbents at room temperature) cannot be made. Nevertheless, some basic conclusions can be drawn. The cold-trap systems used by YRK and HPB_A (Table S2a) exhibited no artefacts. Such systems sometimes have a large internal volume for water removal, and, whilst very suitable for online measurements, they are not so well suited for conditions where limited flushing volume is an issue, e.g. when analysing limited sample volumes. In this intercomparison, where dry samples were analysed, this method was superior to Nafion® Dryers, for which alkene artefacts were observed (see Sect. 3.4.3). The use of hydrophobic adsorbents at room temperature indicated no problems for HPB_B. However, the weak adsorbents used in HPB_B are not appropriate for the adsorption of low-boiling NMHCs (C2–C3).
Ozone management was not in the scope of this ACTRIS intercomparison study, and, furthermore, ozone is rapidly destroyed on metal surfaces; thus no ozone was present in the cylinders.
One specific issue was associated with the ZSF system, which had been brought to 2650 m a.s.l. shortly before this intercomparison. The reduced atmospheric pressure might have caused changes in the chromatographic conditions which had not been adjusted at the time of the measurements.

PTR-MS results
The two NMHC mixtures were analysed with the PTR-MS systems of SMR II and WCC-VOC. Isoprene in NMHC_N2 fitted well inside the 5 % class, whereas isoprene in NMHC_air, toluene, and benzene in both NMHC mixtures were reported outside the 10 % class. Detailed results and some explanations are given in the Supplement.

Comparison with previous intercomparisons
During AMOHA phase 4 (Plass-Duelmer et al., 2006) and NOMHICE phase 4 (Apel et al., 2003) measurements of whole air and synthetic test samples were compared. As outlined in the Introduction, conditions were different and, accordingly, these studies cannot be compared with the ACTRIS intercomparison in the strictest sense. However, the whole air test samples supplied by canisters (NOMHICE and AMOHA phase 4 part 1) or sampled into individual canisters by participants (AMOHA phase 4 part 2) had a similar complexity to the whole air sample used in the actual intercomparison (e.g. 20–50 % of NMHCs < 0.1 nmol mol⁻¹). A ranking procedure defining a score for the quality and quantity of the results provided by each lab, originally introduced by Apel et al. (2003) and modified by Plass-Duelmer et al. (2006), was applied:

$$\mathrm{Rank} = \frac{n(<\pm 10\,\%) + 0.75\,n(\pm 10\,\%\ \text{to}\ \pm 25\,\%) + 0.5\,n(\pm 25\,\%\ \text{to}\ \pm 50\,\%)}{N} + \dots$$

where n is the number of reported values falling into the given reference intervals, N is the total number of compounds reported, X is the total number of compared compounds, and k = …. This "Rank" score can reach a maximum of 1.3 (all compounds measured and correct within 10 %) down to negative numbers for substantial deviations from the reference (large k). Minimum, median, and maximum ranks, respectively, for NOMHICE part 4 are 0.23, 0.81, and 1.16 (37 compounds); for AMOHA 4 phase 1 are 0.82, 1.02, and 1.14 and for phase 2 −0.31, 1.0, and 1.12; and in this study for NMHC_air are 0.49, 1.03, and 1.19 (the latter excludes results by the reference laboratories). The best-performing laboratories were similar in all studies at 1.14–1.19, the mid-quality increased from NOMHICE to AMOHA and this study, and the lowest-performing labs were best in AMOHA 4 phase 1 and ACTRIS. If we interpret the results as development over time, there is a tendency of improvement of the lower-performing labs, whereas the performance of the medium to best laboratories has remained essentially unchanged over the last 15 years.
However, AMOHA was a "learning" intercomparison with phases of increasing complexity and feedback to the participants in between, which in the end yielded the best performance for AMOHA 4 phase 1. Compared to this, ACTRIS may be seen as a snapshot with reasonable performance, which also highlights the need for more regular feedback to the stations.
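The weighted-count part of the Rank score described above can be sketched in a few lines. This is an illustration under stated assumptions, not the scoring code of the cited studies: it covers only the term in which values within 10 % count fully, values between 10 and 25 % count 0.75, and values between 25 and 50 % count 0.5; it assumes that this sum is divided by the number of reported compounds N; it treats the class limits as strict upper bounds; and it omits the further terms involving X and k. The function name and the example deviations are hypothetical.

```python
def weighted_deviation_score(deviations_percent):
    """Weighted share of reported values per deviation class.

    deviations_percent: relative deviations from the assigned values in
    percent, one entry per reported compound (N = number of entries).
    """
    n_reported = len(deviations_percent)
    if n_reported == 0:
        raise ValueError("at least one reported value is required")
    score = 0.0
    for deviation in deviations_percent:
        magnitude = abs(deviation)
        if magnitude < 10:
            score += 1.0
        elif magnitude < 25:
            score += 0.75
        elif magnitude < 50:
            score += 0.5
        # values deviating by 50 % or more contribute nothing to this term
    return score / n_reported


# Hypothetical laboratory reporting five compounds.
example = [3.0, -8.0, 12.0, -30.0, 70.0]
print(weighted_deviation_score(example))
```

A laboratory with all values inside the 10 % class reaches 1.0 for this term; how the remaining terms raise the maximum to 1.3 or penalise large deviations is not reproduced here.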

Conclusions
In the NMHC intercomparison exercise performed in the European infrastructure project ACTRIS, a significant number of instruments were capable of measuring NMHC in nitrogen (NMHC_N2) fairly accurately: 88 % of the submitted NMHC values were within 10 %, and 58 % even within 5 %, of the reference values, which are the DQOs of ACTRIS with respect to the deviation to assigned values. It should be noted that NMHC_N2 was almost identical to the NPL calibration standards used at the stations and a substantial number of deviations was not expected. Participants generally achieved very good repeatability in their measurements in line with the objectives of 2 %.
In compressed whole air (NMHC_air) generally more frequent and larger deviations to the assigned values compared to NMHC_N2 were observed (77 % of the reported values were within 10 %, but only 48 % were within 5 %). It should be noted that this comparison uses test gases which only partly reflect the complexity of ambient air, e.g. no ozone and low water content. On the one hand, an important contributor to insufficient results in NMHC_air was blank issues observed in zero-gas measurements in some of the systems, especially those using a Nafion® Dryer. On the other hand, systems with cold traps exhibited smaller blank issues. The study highlights the importance of good zero-gas measurements to determine realistic blank values to be subtracted from measurement results.
Another factor contributing to the poorer NMHC_air results is the reduced chromatographic resolution, particularly in the range of C4–C6 compounds. Generally, those systems using direct calibrations in the nanomole-per-mole range achieved better results than those using whole air calibration standards. This confirms and emphasises the results found in the AMOHA and GAW intercomparisons (Plass-Duelmer et al., 2006; Rappenglueck et al., 2006), as the two-step calibration and more complex matrix in whole air calibration standards introduce additional potential errors. For ethyne, losses may occur due to breakthrough in the adsorption trap, and a yet unexplained reduced C response was observed in several systems. This intercomparison supports previous studies, finding that it is essential to calibrate ethyne directly and carefully characterise the response of the system in dry calibration standard and humid ambient air sample matrices. The use of FID C responses proved to be a powerful tool because it helped to identify problems in a number of analytical systems. However, as long as a system behaves similarly in different sample gas matrices, deviations in the C response may cancel, resulting in correct mole fractions. But this requires thorough testing of the respective GC systems. Breakthrough is generally an issue for C2–C3 hydrocarbons in adsorptive traps. Deviations from the expected C responses for low-boiling hydrocarbons were mainly observed in systems using the PerkinElmer Thermodesorber with air-toxics/air-monitoring traps. Whether these deviations were due to breakthrough or split injection issues could not be resolved. Almost all of the participating instruments indicated losses of C7–C8 aromatic compounds, most probably due to adsorptive losses. Despite such losses, many participants achieved good results for aromatics, but overall deviations were slightly larger than for other compound groups.
On average, FID systems achieved better results, but good measurements were also obtained with GC-MS systems; however, since the MS is less stable than FID, more frequent calibrations are required.
Another important result of this intercomparison is that in more than 25 % of the reported results uncertainties were substantially underestimated and major uncertainty contributions were not correctly assessed. Last but not least, erroneous results were also caused by occasionally inattentive data submission, with mistakes and incomplete information. While these problems were detected and resolved in the relatively small data set of this intercomparison, they become a concern when insufficiently controlled data sets are submitted to public data centres and end-users.
The PerkinElmer Online Ozone Precursor Analyzer, used by five participants, was the only commercially available instrument in this intercomparison. Although these systems were not among the best performing in this study, reasonable results can be achieved with them. We demonstrated that the ACTRIS DQOs, albeit demanding, can be achieved with state-of-the-art measurement systems. However, equally important for achieving high-quality results are experienced operators, comprehensive quality assurance and quality control, well-characterised systems, and sufficient manpower to operate the systems and evaluate the data.
The Supplement related to this article is available online at doi:10.5194/amt-8-2715-2015-supplement.