Observations of molecular hydrogen mixing ratio and stable isotopic composition at the Cabauw tall tower in the Netherlands

(cid:1) Multi-year time series of H 2 and d D(H 2 ) were measured at a regional tall tower site. (cid:1) The dataset contains the ﬁ rst measured d D height pro ﬁ les in the boundary layer. (cid:1) The features of the time series are consistent with large anthropogenic in ﬂ uences. (cid:1) The apparent d D source signature is much lower than fossil fuel combus- tion estimates. (cid:1) Both source signature and pro ﬁ les suggest microbial H 2 production around the tower. Measurements of the stable isotopic composition ( d D(H 2 ) or d D) of atmospheric molecular hydrogen (H 2 ) are a useful addition to mixing ratio ( c (H 2 )) measurements for understanding the atmospheric H 2 cycle. d D datasets published so far consist mostly of observations at background locations. We complement these with observations from the Cabauw tall tower at the CESAR site, situated in a densely populated region of the Netherlands. Our measurements show a large anthropogenic and 200 m). There was a signi ﬁ cant shift to lower median d D values at the lower levels. This con ﬁ rms the limited role of soil uptake around Cabauw, and again points to microbial H 2 production during an extended growing season, as well as to possible differences in average fossil fuel combustion source signature between the different footprint areas of the sampling levels. So, although knowledge of the background cycle of H 2 has improved over the last decade, surprising features come to light when a non- background location is studied, revealing remaining gaps in our understanding.

dD datasets published so far consist mostly of observations at background locations. We complement these with observations from the Cabauw tall tower at the CESAR site, situated in a densely populated region of the Netherlands. Our measurements show a large anthropogenic influence on the local H 2 cycle, with frequently occurring pollution events that are characterized by c(H 2 ) values that reach up to z1 ppm and low dD values. An isotopic source signature analysis yields an apparent source signature below À400‰, which is much more D-depleted than the fossil fuel combustion source signature commonly used in H 2 budget studies. Two diurnal cycles that were sampled at a suburban site near London also show a more D-depleted source signature (zÀ340‰), though not as extremely depleted as at Cabauw. The source signature of the Northwest European vehicle fleet may have shifted to somewhat lower values due to changes in vehicle technology and driving conditions. Even so, the surprisingly depleted apparent source signature at Cabauw requires additional explanation; microbial H 2 production seems the most likely cause. The Cabauw tower site also allowed us to sample vertical profiles. We found no decrease in c(H 2 ) at lower sampling levels (20 and 60 m) with respect to higher sampling levels (120

Introduction
With typical background mixing ratios above 500 ppb (nmole/ mole), atmospheric molecular hydrogen (H 2 ) is the second most abundant reduced trace gas in the atmosphere after methane (CH 4 ). It is formed by the oxidation of hydrocarbons, by combustion processes and by a wide range of microbial processes in soils and in the ocean, including fermentation and nitrogen fixation. Globally, uptake by soils removes roughly three quarters of this H 2 , leaving a quarter to be oxidized by hydroxyl radicals (OH). Despite increasing methane levels and increased combustion of fossil fuels, there is no evidence for a long-term trend in H 2 mixing ratios (c(H 2 )) (Ehhalt and Rohrer, 2009;Grant et al., 2010b). This may change when H 2 comes into wider use as an energy carrier and leaks into the atmosphere at an increased rate during transport and storage (Bond et al., 2011). The resulting rise in H 2 levels may have consequences for stratospheric ozone chemistry and an indirect climate effect through a decrease in the oxidative capacity of the atmosphere (Schultz et al., 2003;Warwick et al., 2004;Tromp et al., 2003;Feck et al., 2008). Because of this impending disturbance of the H 2 cycle, the global H 2 budget has received growing attention in recent years, with several new budget estimates and simulations of future scenarios published recently by Bousquet et al. (2011); Yashiro et al. (2011);Yver et al. (2011a); Pieterse et al. (2011Pieterse et al. ( , 2013; Wang et al. (2013) and Popa et al. (2015), but considerable uncertainties remain.
Investigation of the stable isotopic composition of H 2 can provide additional information about the sources and sinks of H 2 (Gerst and Quay, 2001;Rahn et al., 2002b) and put extra constraints on models that describe the H 2 cycle (Price et al., 2007;Pieterse et al., 2009Pieterse et al., , 2011Pieterse et al., , 2013. The stable isotopic composition of H 2 is generally expressed in isotopic "d" notation: where R Sample is the atomic ratio of deuterium (D) to "light" hydrogen atoms (H) in the H 2 of the sample, and R VSMOW is the atomic ratio in the international VSMOW (Vienna Standard Mean Ocean Water) standard. The different production and destruction processes of H 2 have very different effects on the dD value of the H 2 reservoir. Photochemical oxidation of methane and other hydrocarbons yields H 2 that has a positive dD value, i.e., is enriched in D with respect to VSMOW (Gerst and Quay, 2001;Rahn et al., 2003;Rhee et al., 2006aRhee et al., , 2008Feilberg et al., 2007;R€ ockmann et al., 2003Pieterse et al., 2009). This dD value, with estimates ranging from þ116‰  to þ190 ± 50‰ (CH 4 only (Rhee et al., 2006a)), is relatively similar to the dD value of the ambient H 2 reservoir, with average dD values of þ117 to þ150‰, depending on latitude . For this reason, photochemical H 2 production has generally little influence on ambient dD variability, despite being the largest global source of H 2 .
H 2 produced in combustion processes has a negative dD value, i.e., it's depleted in D with respect to VSMOW. Estimates of the source signature (dD) of biomass burning range between À290‰ (Gerst and Quay, 2001) and À90‰ (Rhee et al., 2006b); more recent studies indicate that the more negative of these estimates is more likely realistic (R€ ockmann et al., 2010a;Haumann et al., 2013). Emissions from vehicles constitute z60% of the total fossil fuel source, which in turn is z14% of the sum of all H 2 sources (Ehhalt and Rohrer, 2009). Gerst and Quay (2001) reported a source signature of À196‰ for fossil fuel combustion based on samples collected in environments with different degrees of pollution, the most polluted being a parking garage. Rahn et al. (2002b) reported a source signature of À270‰ based on samples from different locations within the Los Angeles Basin and found good agreement with direct samples of car exhaust. More recent test bench work by Vollmer et al. (2010) shows that the dD in engine exhaust depends on engine operating conditions and decreases for engines equipped with modern catalytic converters, with a lowest found dD of À370‰. Preliminary results from a highway tunnel campaign in Switzerland, where the traffic was mostly fluent, seem to indicate a source signature close to the one found by Rahn et al. (2002b) The contribution of microbial production to the global H 2 production is small (Conrad and Seiler, 1980), but it may have a strong isotopic effect on local scales due to its extremely depleted source signature. Walter et al. (2012) found a source signature of (À741 ± 20)‰ for H 2 produced by different fermentation processes, which seems in agreement with earlier fermentation results by Rahn et al. (2002b). However, a first field study of H 2 emitted from soils, conducted at the same site as discussed here (Cabauw), yielded a less depleted estimate of (À530 ± 40)‰ (Chen et al., 2015). A possible cause for this difference might be that the H 2 produced in the clover-rich grassland around Cabauw is a byproduct of N 2 fixation rather than fermentation.
Both destruction processes of H 2 , uptake by soil and photochemical oxidation by the hydroxyl (OH) radical, are associated with isotopic fractionation, as the "light" form of H 2 (HH) is destroyed faster than the deuterated form (HD). The strength of the fractionation is often expressed as fractionation factor a, which is the ratio of the removal rate of HD to the removal rate of HH. The H 2 isotope fractionation is much stronger for the photochemical removal (a ¼ 0.57 at 298.15 K (Talukdar et al., 1996)) than for the uptake by soils (a z 0.943 (Gerst and Quay, 2001;Rahn et al., 2002a;Rice et al., 2011;Chen et al., 2015)).
The first environmental observations of dD in the troposphere were already made in the 1950s (see (Kaye, 1987) for an overview), but difficult and time-consuming measurements limited the application until new techniques became available at the start of this century (Rahn et al., 2002b;Rhee et al., 2004). Especially in the last decade, many more tropospheric observations were published (Rhee et al., 2006b;Rice et al., 2011;Batenburg et al., 2011Batenburg et al., , 2012Walter et al., 2013Walter et al., , 2016. These recent observations were mostly done at "background" locations, either from ships, from aircraft in the upper troposphere or at predominantly remote surface stations. This paper presents dD results from the Cabauw tall tower station in the Netherlands. This tower is located in an agricultural area in the centre of the Netherlands, but within 20e50 km from the four largest Dutch cities. One of the reasons to do these measurements was to complement the existing dataset of mostly background observations with data from a more anthropogenically polluted site. To assess the representativeness of our results, we also measured two diurnal cycles (or "diets") of dD and c(H 2 ) that were sampled at a suburban site near London (Royal Holloway).
Very few dD data from polluted regions exist, but an increasing number of observations of c(H 2 ) were made at such places in recent years. Semicontinuous measurements were performed at suburban sites near Zürich (Dübendorf: Steinbacher et al., 2007), Paris (Gifsur-Yvette: Yver et al., 2009), Heidelberg (Hammer et al., 2009), andLondon (Royal Holloway: Fowler et al., 2011), as well as at an urban site in Bristol (Grant et al., 2010a), and also at the Cabauw tall tower . A common feature in the published time series from these stations is that c(H 2 ) is very variable and shows large excursions to high values superposed on a seasonally changing baseline. During these H 2 peaks, c(H 2 ) can increase to two or, in the case of Dübendorf and Bristol, even three times its background value.
An additional advantage of sampling from the Cabauw tower is that air can be sampled from different sampling heights. Measurements of c(H 2 ) have been performed at different heights at the Cabauw tower  and at the Trainou television tower near Orl eans (Yver et al., 2011b). The measurement of vertical gradients is useful in the study of H 2 uptake by soil. Also, by measuring at different heights, it is possible to probe different "footprint" areas, i.e. differently weighted combinations of local and more remote sources and sinks . To our knowledge, this paper presents the first vertical profiles of dD(H 2 ) in the boundary layer.

Tower location and continuous on-site measurements
The Cabauw tall tower is a 213 m steel construction dedicated to atmospheric research Vermeulen et al., 2011). It is located at the CESAR site (51 58 0 N, 4 55 0 E, 2 m a.s.l., GAW ID: CES, also often referred to as CBW), which is in the so-called "Green Heart" of the Netherlands, a relatively rural region surrounded by the "Randstad" conurbation. The area directly surrounding the tower is relatively sparsely populated and mainly used for agriculture (mostly grassland), but population and road density increase steeply further away from the tower. The distance from the tower to the city of Utrecht is z20 km, to Rotterdam z30 km, to the Hague z40 km and to Amsterdam z45 km; an estimated seven million people inhabit the Randstad conurbation that consists of these four big cities and their many neighbouring settlements.
The "footprint" or "influence region" of a measurement site can be described as the area from which the trace gas fluxes have a detectable influence on the measured concentrations. For Cabauw, this has previously been evaluated for CO 2 by calculating backward trajectories with the COMET transport model while assuming a constant CO 2 emission per unit area. A footprint area of roughly 500 Â 700 km was then found to cause about 50% of the concentration signal at the top of the Cabauw tower. This footprint is quite large compared to other sites, due to the often high wind speeds in the region and the large variability in wind direction. The footprint of the lower sampling levels is much smaller (Henne et al., 2010;Vermeulen et al., 2011).
The soil in the surrounding area is a combination of peat and clay layers with a high water table. These soil characteristics may be one of the reasons that Popa et al. (2011) found low H 2 deposition velocities at this location. The tower is equipped with a tubing system to sample air at 20, 60, 120, and 200 m for greenhouse gas analysis . During the time that the samples described here were collected, air was continuously drawn from the different heights and lead through Decabon ® tubing to the laboratory in the base of the tower. There, 400 ml min À1 air streams were separated from the bulk air flow (which was flushed outside), dried in cryogenic water traps to a dew point of zÀ50 C, divided and analyzed on either a CO 2 analyser (z150 ml min À1 ) or on a GC system for other GHGs (z80 ml min À1 ). A reduction gas analyser (RGA-3) for the measurement of H 2 and CO mixing ratios was added in series after the GC system and measured a vertical profile every half hour . The c(H 2 ) measurements of the RGA-3 were calibrated to the recently defined WMO scale (Jordan and Steinberg, 2011).
These continuous c(H 2 ) measurements were used to assess the quality of the flask sample data presented in this work until they were stopped at the end of 2011.

Collection of Cabauw flask samples
We used Normag borosilicate 3.3 glass flasks with Kel-F (PCTFE) O-ring sealed stopcocks for the sampling at Cabauw. Rothe et al. (2004) found that this Normag flask type provides good stability for a set of six trace gases, which included H 2 . Only a few outliers occurred in their H 2 tests [A. Jordan, personal communication]. Most samples were collected in 1 L flasks; a minority was collected in 2 L flasks of the same type. All flasks were covered with a black shrink hose and stored in closed boxes to avoid photochemical alteration. Flask samples were stored between 1 and 20 months before analysis because of delays in the automation of the isotope measurement system.
The air for the flask samples was drawn in through the same tubing system as was used for the greenhouse gas and semicontinuous H 2 and CO measurements described in Subsection 2.1. Every half hour, the in-situ measurement equipment switches to measuring the working or standard tank gases that are used to calibrate and check the (semi-)continuous measurements as described by Vermeulen et al. (2011) and Popa et al. (2011). During this time, the air that is drawn through the sample lines is vented. For the flask sampling, this venting flow was led through the sampling flasks, and the filling of the flasks was always done during the venting period. The flasks were filled up to z1.5 bar of absolute pressure.
During the sampling period, from July 2008 to July 2012, 119 individual samples were taken from the 200 m level around midday, resulting in more than two samples per month that are representative of the daytime conditions in Cabauw. In the same period, 29 complete vertical profiles were sampled additionally, as well as 8 incomplete profiles with at least two heights sampled. 99 individual 200 m samples, 16 complete and 16 incomplete profiles remained after quality control of the data (see Section 2.6, several complete profiles became incomplete as flasks were rejected by the procedure). The vertical profiles were mostly sampled under (very) stable atmospheric conditions, when the real-time raw data from the RGA showed a positive or negative vertical gradient in c(H 2 ).
The vertical profiles are therefore biased towards stable conditions, and not fully representative of the H 2 climatology at Cabauw.

Collection of Royal Holloway flask samples
The Royal Holloway diurnal cycle samples were collected on the Royal Holloway campus of the University of London in Egham, located z30 km WSW of central London (Lowry et al., 2001;Hern andez-Paniagua et al., 2015). One set of eight samples was collected in July 2008, another set of 12 was collected in January 2009. Both were collected under conditions where the Royal Holloway site received urban air masses from the Greater London area and an overnight c(H 2 ) variation of more than 100 ppb was observed in the continuous measurements at the site.
The sample air was dried, collected in 3 L stainless steel tanks and subsequently analyzed on a Peak Performer 1 (Peak Laboratories) for c(H 2 ) and c(CO). The tanks were stored for 1À2 months before isotopic analysis. Production of H 2 can occur in some steel vessels (Gerst and Quay, 2000); therefore, we compared the GC-IRMS c(H 2 ) results to the continuous measurements and the results of the Peak Performer 1. We concluded that the isotopic analysis of two of the 20 tanks yielded unreliable results. These were left out of all further analysis.

c(H 2 ) and dD(H 2 ) analysis of the flask samples
The analysis of the flasks was performed with an online Gas Chromatography -Isotope Ratio Mass Spectrometry (GC-IRMS) setup as developed by Rhee et al. (2004), with adaptations described by R€ ockmann et al. (2010b). This setup isolates the H 2 from the air matrix and analyzes it for c(H 2 ) and dD in four steps: 1 Cryogenic removal of the main components from an z0.5 L aliquot of air by condensation onto a cold head kept at z40 K.
Only the most volatile components (H 2 , He, Ne and traces of N 2 ) are retained in the gas phase. 2 Pre-concentration of the H 2 in a stainless steel trap filled with 5 Å molecular sieve cooled to z63 K (the triple point of N 2 ). 3 Cryo-focussing of the H 2 on a steel-jacketed 5 Å molecular sieve gas chromatography column submerged in liquid nitrogen, followed by gas chromatographic purification on a 5 Å molsieve gas chromatography column kept at 323 K. 4 Injection of the purified H 2 into the IRMS (ThermoFinnigan Delta plus XL) through an open split interface for analysis. During acquisition of the final chromatogram, an "internal" IRMS H 2 working gas standard is injected four to seven times before the injection of the sample H 2 , and one or two times after.

Data processing, calibration and drift correction
At the start of this research project, the GC-IRMS system was semi-automated to enable automated measurement of one aliquot of air. In the summer of 2011, the system was automated further to enable the fully automated measurement of up to 25 aliquots in one measurement sequence. Part of the increased measurement capacity after the automation was used to improve data quality by including more laboratory reference air measurements into the daily measurement routine. Thus, the sample calibration procedures differ somewhat for the two measurement periods.
Before the full automation, one blank measurement and two measurements of air from laboratory reference air cylinders were typically performed every measurement day. A cylinder containing compressed whole air with c(H 2 ) ¼ (545.0 ± 0.5) ppb (measured by MPI-BGC Jena) and dD ¼ (þ73.0 ± 1.8)‰ was used until the spring of 2010, at which time it was almost empty. It was replaced with two cylinders made of pickled and passivated stainless steel (Graeven Metall-Technik GmbH). The two new cylinders contained mixtures of pure H 2 with synthetic air, one with c(H 2 ) ¼ (580.78 ± 0.03) ppb and dD ¼ (þ207.0 ± 0.3), and one with c(H 2 ) ¼ (244.3 ± 0.8) ppb and dD ¼ (þ198.2 ± 0.5)‰ (error bars indicate standard errors, c(H 2 ) measured by MPI-BGC). We used 5day moving average values obtained from these laboratory reference cylinders to calculate the c(H 2 ) for each flask sample and to correct the measured dD values for instrument drift. We subtracted a 9.5‰ offset from the dD values for samples measured between June 2010 and the full automation in the summer of 2011, to correct for an empirically observed bias. This procedure is described in more detail in (Batenburg et al., , 2012. After the full automation, two new laboratory reference air cylinders were added to the set of reference cylinders and were measured regularly. One cylinder was of the same type as the two stainless steel cylinders already in use, and contained a similar mixture of hydrogen and synthetic air with c(H 2 ) (858.9 ± 1.8) ppb and dD (À206.3 ± 0.6)‰, as was determined by using the original two reference air cylinders. The other cylinder was made of aluminium, had a larger volume, and contained whole ambient air. Because of its larger volume, it was measured more often during the measurement days, with the goal of using these frequent measurements for regular data evaluation. After measuring the new cylinders for some time, it became apparent that the three similar stainless steel cylinders with synthetic mixtures were stable with respect to each other. However, the large aluminium cylinder drifted with respect to the three stainless steel cylinders, at a rate of about þ4.8 ppb and À3.5‰ per month from the initial values of z770 ppb and zÀ110‰, which suggested significant H 2 production in the cylinder.
In order to use this slowly drifting cylinder for sample correction, we had to allow for the assigned values for c(H 2 ) and dD in the cylinder to vary in time. We used the measurements of the mixtures in the three stainless steel cylinders to calculate 5-day moving average values of the three parameters that are needed to correct a sample measurement for instrument drift and to calculate c(H 2 ) 3 .
With these averaged parameters, we calculated an instrumentdrift-corrected c(H 2 ) and dD for each measurement of air from the large, aluminium cylinder. Then, we used 29-day moving average values of these as assigned c(H 2 ) and dD values to calculate the c(H 2 ) and instrument-drift-correct the dD of the flask samples with the large cylinder values, following the standard procedure with 5-day moving averages. Manual adjustments to the averaging intervals were sometimes necessary where sudden visible shifts in values occurred, e.g. due to changes in the mass spectrometer. Finally, the previously mentioned empirical correction of À9.5‰ was also applied to these dD results. The estimated standard deviation for the dD measurements was 4.5‰ (absolute) before the automation of the GC-IRMS, but it increased to 6.9‰ (absolute) after system automation, based on the measurements of the additional stainless steel reference air cylinder. The estimated standard deviation in c(H 2 ) did not change from the value of 2.5% (relative) mentioned in (Batenburg et al., 2012). Error bars displayed in this article are estimates of measurement uncertainty alone.

Quality control of the Cabauw data by comparison to continuous measurements
Where available, we used the continuous measurements of c(H 2 ) at the Cabauw tower and continued, un-published measurements) for quality assessment of the flask sample measurements, as follows. We calculated 6-h averages and standard deviations of the continuous data around each flask sampling time, both for the data collected at all heights and for the data collected at the actual flask sampling height. If the c(H 2 ) measured by GC-IRMS in the flask was more than four standard deviations different from both the all-height and the same-height average, we labeled the flask measurement as "bad". If it was more than four standard deviations removed from only one of the averages, we labeled it as "suspect". We labeled flasks that could not be assessed because of a lack of continuous data (less than 4 datapoints) as "no control available", with the exception of one sample collected on 11 May 2011 that had an extremely large c(H 2 ) and a large effect on the outcome of our fit routines; we labeled this "suspect". We labeled all others "good". Of all 259 flask measurements, we labeled 165 as "good" after comparison, 28 as "suspect", 20 as "bad", and 46 as "no control available". The number of "no control available" flasks was unfortunately quite large, as there were large gaps in the continuous data and no quality controlled continuous data were available after 2011. The "no control available" data are treated as "good" data in the rest of this paper.

Results and discussion
3.1. Time series Fig. 1 shows all the c(H 2 ) (a) and dD (b) flask data for the Cabauw tower as a function of sampling time. Flask data that were collected from the EUROHYDROS station Mace Head in a similar fashion are shown for comparison, as well as an extrapolated harmonic function that was fit to the Mace Head data . Mace Head is located on the west coast of Ireland (53 20 0 N, 9 54 0 W) and is considered a clean background station. Because of its location and the dominant westerly winds in this latitude band, the measurements performed there can be used as a background for continental Europe (Grant et al., 2010b).
It can be seen in Fig. 1(a) that the Mace Head c(H 2 ) data form the lower bound for the Cabauw c(H 2 ) data. In Cabauw, very large excursions occur above this baseline, especially during winter.
During these excursions, c(H 2 ) is often highest at the lower sampling heights. It seems that due to these excursions, the seasonal found a pronounced c(H 2 ) variability, with particularly large peaks in winter, and as a result, a seasonal cycle with a maximum in winter, as for our data. This maximum was slightly earlier and higher for the lower sampling heights. The occurrence of these large winter excursions was attributed to the decrease in atmospheric mixing in winter, which leads to large accumulations of pollution in a more shallow and stable boundary layer. Similar features were also seen in the results of Steinbacher et al. (2007) from the Dübendorf site.
The dD time series (Fig. 1(b)) looks like a vertically flipped image of the c(H 2 ) time series. Here, the Mace Head data form the higher bound of the Cabauw data. It can be seen that the c(H 2 ) peaks are accompanied by dD dips, and that during these dips, the lower sampling heights often have lower dD values than the higher sampling heights. The dD cycle also seems shifted with respect to Mace Head; the April/May Mace Head minimum is shifted to the earlier winter in 2009, 2010, and 2011 at Cabauw. From the opposite behavior of the Cabauw c(H 2 ) and dD time series and from the D-depletion in Cabauw with respect to Mace Head, it can already be concluded that the H 2 that makes up the large, mostly wintertime, peaks originates from sources with a D-depleted source signature. Because of the location and frequent pollution events observed at the Cabauw site, H 2 from fossil fuel combustion is a very likely explanation for these excursions. In the warmer seasons, microbial H 2 production can also contribute. In the next section, the isotopic composition of the source mix that affects Cabauw is investigated in detail.

Source signature studies
d values are linear with regard to mixing of different reservoirs, and therefore the effect of mixing a background reservoir of a gas with mixing ratio c bg and isotopic composition d bg with a "polluted" airmass with mixing ratio c s and d s is described by a simple mass balance equation: where the "obs" superscript indicates observed values. This can be rewritten to   Fig. 2(a) shows a Keeling plot of all Cabauw H 2 data. The linear fits to this Keeling plot were made with the bivariate, error-weighted Williamson-York fit algorithm described by Cantrell (2008). The fit to all the data that were labeled "good" in the data quality assessment, indicated with a solid line, yielded a source signature estimate of (À515 ± 26)‰ (error indicates one standard error, 1 SEM). Another fit, indicated by a dotted line, was made to all the data, including the "suspect" and "bad" ones. This fit yielded a source signature that was much less depleted, namely (À323 ± 12). Clearly, we affected the outcome of this fit considerably by selecting and removing suspicious data from our dataset.
An implicit assumption of this Keeling plot method is that the background mixing ratio and d value are constant. Fig. 1 shows that this is not the case for H 2 at Cabauw. For such cases, Miller and Tans (2003) proposed an alternative approach of evaluating the mass balance equation. The mass balance can be written as: This equation shows that the slope of a plot of d obs c obs À d bg c bg against c obs À c bg can also be interpreted as the source signature.
Unlike the Keeling plot method, this method requires that mixing ratio and isotopic composition of the background reservoir are known. A plot following the Miller-Tans method for Cabauw is shown in Fig. 2(c). The harmonic functions that were fit to the seasonal cycles of c(H 2 ) and dD at Mace Head (MHD) by Batenburg et al. (2011) were used to calculate background values. Only Cabauw samples with a c(H 2 ) above the calculated Mace Head value were included. A linear Williamson-York fit to all "good" data (solid line) yields an isotopic source signature estimate of (À625 ± 38)‰. Again, a fit that includes the data that did not pass the quality control procedure yields a much less depleted source signature (dotted line).
In our data quality assessment, we chose a time interval length for the averaging and a number of standard deviations as cutoff. These choices are, to some degree, arbitrary. Assessing the effect of our data selection was the motivation to make the fits to the complete dataset, including the "suspect" and "bad" data, that are shown as dotted lines in Fig. 2. The difference with the fit to the "good" data is large, which indicates that the "strictness" of our data selection algorithm may have an effect on the fit outcomes. To estimate the uncertainty as a result of this, we applied a bootstrapping routine where the fits were applied to 10.000 random samples of the "good" data. The distribution of the resulting Keeling intercepts are shown in Fig. 2(b), and the distribution of the resulting Miller-Tans slopes is shown in Fig. 2(d). The Keeling fit intercepts are distributed around a mean of À522‰ with a standard deviation of 50‰, and the Miller-Tans fit slopes are distributed around a mean of À629‰ with a standard deviation of 63‰. The large standard deviations suggest that the selection of samples introduces a larger uncertainty into the source signature estimate than the uncertainty estimates in Fig. 2(a) and (c). It seems that in this case, the error estimate obtained from our bivariate fit routine is unrealistically small. Miller and Tans (2003) also noted a tendency to underestimate the error of fit parameters in situations where only measurement uncertainty is considered as a source of scatter while clearly natural variability also contributes. We applied the same bootstrapping procedure to the whole dataset, including the "suspect" and "bad" data (not shown) and obtained a distribution with multiple peaks, which indicates that at least some of the "suspect" and "bad" datapoints are justly rejected by our data quality assessment.
Both the Keeling and the Miller-Tans method yield very Ddepleted values (< À400) for the source signature estimate. They are certainly more depleted than previous estimates of the total isotopic source signature of fossil fuel combustion of (À196 ± 10)‰ by Gerst and Quay (2001) and À 270‰ by Rahn et al. (2002b). Vollmer et al. (2010) found dD values between À270‰ and À370‰ for H 2 in the exhaust of a modern engine setup with a three-way catalytic converter (TWC). The TWC and fuel-rich driving conditions (with a low air-to-fuel ratio in the engine) were observed to lower the dD values. The Northwest European car fleet consists for a large part of modern vehicles with catalytic converters (UNECE, 2012), and traffic conditions in the Netherlands are often congested, leading to fuel-rich driving conditions. For these reasons, the isotopic signatures of Northwest European traffic emissions may well be more depleted than previously reported signatures. However, that they would be below À400‰ seems unlikely, so this does not provide a full explanation for our extremely depleted source signature estimates. H 2 emitted by N 2 -fixing microbes in soil (in the root nodules of legumes) likely contributes to the low apparent total source signature. The isotopic signature of microbially produced H 2 can be depleted down to as much as (À741 ± 20)‰ (Walter et al., 2012), so only a small quantity from this source can affect the local isotope budget considerably and make the apparent isotopic signature of the source mix more depleted. Some types of legumes, especially types of clover, are common around the Cabauw tower and some microbial production of H 2 occurs there (Chen et al., 2015). This production was, however, until recently expected to be limited to the growing season. Conrad and Seiler (1980) reported that the production peaked from April to June, and that very little production occurred in the rest of the year. Flasks sampled in April, May or June constitute less than a quarter of the data in Fig. 2. Omitting these data from the set of "good" data and applying the bootstrapping routine leads to source signature estimates that are only slightly less D-depleted (more D-enriched) than for the whole "good" dataset ( Fig. S1 in Supplement). Omitting the data from an even larger growing period (MarcheOctober) leads to multiple peaks in the distributions of the bootstrapping results (not shown), making it hard to draw conclusions from them. Based on these results, it seems that either microbial H 2 production is not the main cause of the low apparent source signatures in our Cabauw observations or microbial production of H 2 around Cabauw takes place during a much longer part of the year than previously thought. Chen et al. (2015) found production of H 2 in July and August, which supports the latter explanation.
It is interesting to note that when the Keeling and Miller-Tans fits are applied separately to the samples taken in different seasons, much more depleted apparent source signatures are found in MarcheAugust than in SeptembereFebruary (see Fig. S2 in Supplement). This would support the suspicion that microbial H 2 production takes place during a large part of the year, causing a very low source signature. Only in autumn and winter, when microbial production decreases and H 2 from fossil fuels can accumulate in inversion situations, less depleted source signatures occur. The less depleted source signature that is found in these seasons, however, hinges on only a few quite polluted samples that have high "leverage" in the linear fits. Also, the error estimates of the fit parameters may be too low, as the results of the bootstrap algorithm discussed above suggest. More sophisticated methods will be needed to disentange the role of different sources in the complex situation around the Cabauw tower.

Diurnal cycles at the Royal Holloway site
For comparison to the Cabauw data, two diurnal series of observations at Royal Holloway are shown in Fig. 3(a) and (b). The figure shows an anticorrelation between c(H 2 ) and dD. The afternoon rush hour peak, with elevated c(H 2 ) values and a simultaneous decrease in dD values, can be distinguished clearly around 7 p.m.
A Keeling plot of these data is shown in Fig. 3(c). Linear Williamson-York fits were made to the separate diurnals and to the combined set of data. All three fits indicate a source signature lower than À260. This is on the lower side of, or lower than, what is generally assumed for the fossil fuel combustion source of H 2 , and this may indicate that these generally assumed values are too high for modern vehicle fleets. However, it is not quite as extremely Ddepleted as the Cabauw signature. A difference in fossil fuel combustion source signature between the countries may exist, but it's also very likely that microbial H 2 production in leguminous plants plays a larger role in Cabauw. The Cabauw site is immediately surrounded by clover-rich fields, whereas the Royal Holloway site is not.

Height profiles
In Fig. 1, inspection by eye suggests that pollution signals are often stronger at lower sampling heights than at the 200 m level. This is, for example, the case for the profiles sampled on 8 January , 5 March 2009, 20 November 2010and 29 December 2010. To investigate such differences between the sampling heights, the data are plotted as a boxplot that indicates the median, quartile and 95th percentile values per height (Fig. 4). From this plot, it seems that the median c(H 2 ) value is slightly higher and the median dD value somewhat lower at the lower sampling heights than at the higher sampling heights. As noted, the sampling of the vertical profiles is biased towards (very) stable, and therefore polluted conditions. Hence, it is not representative for the full year-round variability in c(H 2 ) and dD. However, one conclusion about the role of soil uptake around Cabauw can already be drawn. At locations where uptake by soils dominates the H 2 cycle, the expected vertical profile has lower c(H 2 ) values and higher dD values at the bottom than at the top, since the uptake of H 2 by soils has a Denriching effect on the remaining H 2 reservoir (Gerst and Quay, 2001;Rahn et al., 2002a;Rice et al., 2011). This is clearly not the case for Cabauw; there is no c(H 2 ) depletion observed at the bottom, and dD is lower there, not higher. This indicates that the soil uptake does not play a large role in driving the H 2 variability at Cabauw. This is in agreement with the low uptake speeds found by Popa et al. (2011).
To determine if the differences between sampling levels are significant, pairwise Kruskal-Wallis H-tests were performed on the data from the different heights. The resulting p-values are listed in Table 1. A p-value below 0.05 indicates that the medians of the two datasets are different with more than 95% confidence. For c(H 2 ), a significant difference was found only between the 20 m and 200 m level. For dD, significant differences were found for three of the pairs, each combining one of the lower sampling altitudes (20 or 60 m) with one of the higher sampling levels (120 or 200 m). So we can conclude that there is a shift in dD between the 60 m and 120 m level, with lower dD values below, and higher dD values above. Vermeulen et al. (2011) reported that the footprint area for the 200 m sampling level (covering the Benelux, the north of France, the north and middle of Germany and the southeast of England) is considerably larger than that of the 20 m level (covering the Benelux and bits of northern France and western Germany). In addition, our height profiles were sampled under very stable conditions, where the lower air masses can become completely decoupled  from the higher ones. The temporary footprint of the lower levels can then be as low as 50e100 km, while the higher levels receive signals from hundreds of km away. Apparently, these nonoverlapping footprint areas cause the 20 and 60 m sampling levels to experience a more D-depleted H 2 source mix than the 120 and 200 m levels. There is a number of possible explanations for why the airmasses that reach the lower sampling levels have, on average, lower dD values than the airmasses that reach the higher levels under these conditions.
The first is that pollution from further away has travelled longer before reaching the tower. In an aging polluted airmass, oxidation processes can take place that both produce H 2 (from hydrocarbons) and remove it. Both the production of H 2 from the oxidation of hydrocarbons and the removal of H 2 by reaction with OH radicals have a D-enriching effect on the H 2 reservoir. Therefore, it is possible that these processes cause dD in polluted airmasses to increase with average age, which then causes the higher median dD values at the 120 and 200 m sampling levels. But considering that these processes are relatively slow compared to the transport times, it does not seem likely that this is the main contributing process to the dD difference between the heights. H 2 , for example, has a lifetime of z8 years with respect to oxidation by OH. Only a few VOC species undergo oxidation reactions on shorter timescales. The Supplement contains a rough calculation to assess the potential size of the effect, which shows that it is probably negligible.
Another explanation is the local production of extremely Ddepleted H 2 by N 2 -fixing microbes. As noted before, some microbial H 2 -production occurs in the vicinity of the tower (Chen et al., 2015), and it is possible that the clover-rich grasslands of the Green Heart are more prone to produce microbial H 2 than farther-away areas. Because of the different footprints, the 20 and 60 m levels are more sensitive to local processes than the higher levels, explaining their lower median dD values. This effect should be strongest in the AprileJune growing season, but only 6 of the 32 profiles were sampled in this period. In these profiles, the dD difference between the sampling levels is less significant than during the rest of the year (although that may be partly due to the smaller sample size). Thus, microbial production does not seem a satisfactory explanation to account for the overall difference between the median dD values for the different sampling levels, unless this production takes place during a larger part of the year. The results of (Chen et al., 2015) and the source signature fits for the different seasons ( Fig. S2 in Supplement) indicate that microbial production likely does take place outside the AprileJune time interval.
A third and also likely explanation is that there is a difference in the fossil fuel combustion source of H 2 between the different footprint areas. While we expect the whole Northwest European vehicle fleet to be very modern, subtle differences may exist between the fleets of the Netherlands, Belgium, France, Germany and the United Kingdom. For example, the market share of diesel cars differs per country and in time (UNECE, 2012). The driving conditions may differ as well, as the Dutch road system is notorious for its many congestions. The road networks of the Netherlands and the UK have the largest percentage of main network links that exhibit daily congestions in Northwestern Europe (Bovy and Salomon, 1999). These congestions might lead to more fuel-rich driving conditions in the Netherlands than in most surrounding countries, and according to the results of Vollmer et al. (2010), this may lead to lower dD values. Under the stable sampling conditions, the lower sampling levels are mostly sensitive to emissions from the Randstad, while the higher sampling levels may receive pollution from over the country's borders. Thus, a relatively low source signature of Dutch vehicles may affect the lower levels more than the higher levels.

Conclusions and outlook
The set of c(H 2 ) and dD data presented here shows that the H 2 cycle at Cabauw is under heavy anthropogenic influence. The time series show that the excursions to high c(H 2 ) values already observed by Popa et al. (2011) are accompanied by low dD values, implying a large role for depleted H 2 sources such as fossil fuel combustion. Further analysis shows that the apparent dD source signature is much lower than the signature generally assumed for fossil fuel combustion. We attribute this to production of extremely depleted H 2 by microbial sources in the soil during a large part of the year. The modernization of the Northwest European vehicle fleet and the often congested driving conditions in the Netherlands might also contribute to the low apparent source signature. The results of the Royal Holloway observations so far indicate that the isotopic signature of the source mix there may be lower than the previously assumed signature for fossil fuel combustion too, but not as very low as in Cabauw, possibly because microbial H 2 production does not occur in the immediate vicinity of the Royal Holloway sampling site. More observations would be needed to obtain a yearround picture in Royal Holloway.
In the vertical profiles measured at Cabauw under stable conditions, there was a significant shift to lower median dD values at the lower sampling levels (20 and 60 m) with respect to the higher levels (120 and 200 m). We hypothesize that the difference between the sometimes completely decoupled airmasses received at the lower and higher levels is caused by local or regional microbial H 2 production during an extended growing season that lasts until August at least, possibly combined with differences in car fleets and in driving conditions between the varying footprint areas of the different sampling heights. Differences in the average age of the pollution that reaches the different levels of the tower is not expected to contribute much to the effect. These results show that the magnitude of the microbial H 2 production term and the regional and seasonal variations therein need further study. Conrad and Seiler (1980) estimated the global magnitude of the microbial H 2 production term from a set of observations in Germany, but since then very little work has been done to study possible regional differences and refine the estimate. Microbial production may only constitute a small part of the global H 2 budget, but due to its extreme D-depletion (Rahn et al., 2002b;Walter et al., 2012;Chen et al., 2015) its effect on the isotope budget may be considerable, particularly on local scales. Also, the assumption in global models of a uniform dD source signature for H 2 emissions from vehicles might be an oversimplification. It may be more appropriate to allow for differences between regional Table 1 Resulting p-values from pairwise comparisons of the datasets from different sampling heights with the Kruskal-Wallis H-test, which tests the null hypothesis that the medians of the datasets are equal. It is assumed that the datasets are independent, and that the Kruskal-Wallis H-statistic has a c 2 distribution. A p value below 0.05 (bold font) indicates that the medians are different with 95% confidence. The used data are the same as shown in Fig. 4, i.e. all data from profiles with "good" data from at least two sampling heights. vehicle fleets, with more D-depleted source signatures for the more modern fleets that experience more congested situations. Our observations presented here complement previous observations made at background locations or in the free troposphere, and show that understanding the H 2 cycle at this site is considerably more complex than in those relatively undisturbed places. Ultimately, more information on the H 2 cycle in densely populated areas such as the Netherlands can help in assessing the climate and air quality impacts of future H 2 emissions in such regions with regional models. Measurements at different sampling heights can provide additional help in distinguishing local from long-range influences.