A dual center study to compare breath volatile organic compounds from smokers and non-smokers with and without COPD

There is increasing evidence that breath volatile organic compounds (VOC) have the potential to support the diagnosis and management of inflammatory diseases such as COPD. In this study we used a novel breath sampling device to search for COPD related VOCs. We included a large number of healthy controls and patients with mild to moderate COPD, recruited subjects at two different sites and carefully controlled for smoking. 222 subjects were recruited in Hannover and Marburg, and inhaled cleaned room air before exhaling into a stainless steel reservoir under exhalation flow control. Breath samples (2.5 l) were continuously drawn onto two Tenax® TA adsorption tubes and analyzed in Hannover using thermal desorption-gas chromatography-mass spectrometry (TD-GC-MS). Data of 134 identified VOCs from 190 subjects (52 healthy non-smokers, 52 COPD ex-smokers, 49 healthy smokers, 37 smokers with COPD) were included into the analysis. Active smokers could be clearly discriminated by higher values for combustion products and smoking related VOCs correlated with exhaled carbon monoxide (CO), indicating the validity of our data. Subjects from the study sites could be discriminated even after exclusion of cleaning related VOCs. Linear discriminant analysis correctly classified 89.4% of COPD patients in the non/ex-smoking group (cross validation (CV): 85.6%), and 82.6% of COPD patients in the actively smoking group (CV: 77.9%). We extensively characterized 134 breath VOCs and provide evidence for 14 COPD related VOCs of which 10 have not been reported before. Our results show that, for the utilization of breath VOCs for diagnosis and disease management of COPD, not only the known effects of smoking but also site specific differences need to be considered. We detected novel COPD related breath VOCs that now need to be tested in longitudinal studies for reproducibility, response to treatment and changes in disease severity.

and more specific diagnosis and treatment of infections [5][6][7]. To assess lung and systemic inflammatory processes in chronic obstructive pulmonary disease (COPD), different devices, methodologies and study designs have been used in the past [2][7][8][9][10][11][12][13][14][15][16][17][18][19][20]. In these studies it was possible to discriminate the breath of COPD patients from that of healthy subjects, patients with asthma or tumor patients. However, despite these intensive efforts, breath research in COPD is still in the exploratory phase of gathering experience with different sampling and analysis strategies to obtain data on VOCs and VOC patterns with potential clinical value. An ERS task force and one by the International Association of Breath Research (IABR) [21] were set up in 2013 to work on standardization.
For one electronic nose (eNose) device there is data available for comparison between sites [22] and in one eNose study independent patient groups from a different site were used for cross validation (CV) [23]. We are aware of only two studies that used independent validation cohorts for samples that were analyzed by GC-MS [24,25].
In this study we collected breath samples of COPD patients and controls in two German centers for lung research (Biomedical Research in Endstage and Obstructive Lung Disease Hannover (BREATH) and Universities Giessen and Marburg Lung Centre (UGMLC)). This way we were able to perform one of the largest exhaled breath COPD studies so far, including 190 patients and controls. It also allowed us to test specifically the influence of study site. As active smoking is known to have a profound effect on the exhaled breath VOC composition [26], we carefully controlled for the smoking bias by including smokers with COPD, ex-smokers with COPD, healthy smokers and healthy non-smokers in the study. Breath samples were loaded directly onto Tenax® TA tubes during tidal breathing under flow control, without intermediate storage in bags. All samples were centrally analyzed by TD-GC-MS at the analytical laboratory of the Hannover study site.

Subjects
At the Hannover site 31 smokers with moderate COPD, 30 ex-smokers with COPD, 29 healthy smokers and 29 non-smokers were enrolled in this study. At the Marburg site the respective numbers of subjects were 13, 26, 33, and 31. For technical reasons (e.g. insufficient loading of Tenax® tubes, undetected leakage in the TD-GC transfer line system, or other instrument failures during the analysis), a total of 12 (Hannover) and 13 (Marburg) breath samples could not be evaluated. Patients were recruited in random order and not by group to avoid a potential analysis bias. Table 1 lists the demographics of all subjects that were included in the final analysis. Subjects had to be free of exacerbations or acute infections within four weeks prior to the study day. The study was conducted in accordance with Good Clinical Practice and the Declaration of Helsinki. Subjects gave their written informed consent. The study was approved by the Ethical Committees of Hannover Medical School (Nr. 6490) and Marburg University (AZ 179/12).

Study design
The study was conducted at two sites in Germany from December 2013 to August 2015. Subjects visited the site on a single day and were asked to abstain from eating, drinking (except water) and smoking for at least 2 h prior to the visit. After providing informed consent, subjects underwent a thorough physical examination. A lung function test was performed and a blood sample was taken to provide a serum sample, which was stored at −80 °C. The COPD Assessment Test (CAT) score was obtained with the standard questionnaire to assess quality of life (QoL). Next we measured exhaled carbon monoxide (Smokelyzer, Bedfont, Kent, UK) to verify the smoking status, and exhaled nitric oxide (NO, NIOX Mino, Aerocrine, Bad Homburg, Germany). The analysis of breath VOCs was performed as described in detail below. In Hannover the collection of breath VOCs was performed in a temperature and humidity controlled room. In addition to sampling breath VOCs onto Tenax® TA tubes, we used an electronic nose (Cyranose 320, Sensigent, Baldwin Park, USA) and an IMS detector (BioScout, B&S Analytik, Dortmund, Germany) in a subset of patients in Marburg, and a closed gas loop high resolution GC-IMS device [27][28][29] as well as an ultra-high sensitive TD-GC-APCI-MS [30] in a subset of subjects in Hannover. The data of these subgroup analyses and their comparison with the GC-MS data will be reported separately [31].

Collection of breath samples for GC-MS analysis
The volunteer subjects inhaled through an A2 carbon filter (Dräger, Lübeck, Germany; adapted from the standard operating procedure of P Brinkman, AMC University of Amsterdam [9]) and exhaled into a stainless steel tube sampling reservoir (50 cm length, 4 cm inner diameter) with a flow restrictor at the end of the tube. In Hannover an inhalation/exhalation valve from Hans Rudolph (Shawnee, USA) was used, in Marburg one from Intersurgical (St. Augustin, Germany). The first 3 min of each measurement were used to remove acutely accumulated environmental VOCs from the lung and to condition the stainless steel reservoir. Following this, the subjects continued exhaling into the reservoir for a five minute breath collection. Subjects exhaled against a small resistor and were asked to maintain an exhalation pressure between 3 and 6 mbar by visual control of a manometer.
Loading of two Tenax® TA tubes (Perkin Elmer, Waltham, USA) from the reservoir was achieved by using an in-house vacuum line (Hannover) or vacuum pump (Marburg) with separate pre-calibrated restrictors for each tube that allowed 500 ml min⁻¹ (variation <2% between tubes) to pass through the tube. An additional tube was used to collect 2.5 l of room air close to the study subject using identical sampling conditions. Due to a temporary lack of thermal desorption tubes (in a period of TD-GC-MS instrument maintenance and repair) the room air control could not be sampled in some subjects at both sites.
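The sampled volume follows directly from the restrictor flow and the collection time stated above. The short sketch below only reproduces this arithmetic; the function and variable names are illustrative and not part of the study protocol.

```python
# Illustrative arithmetic only; names are hypothetical, values are from the text.
FLOW_ML_PER_MIN = 500.0   # flow through each adsorption tube
COLLECTION_MIN = 5.0      # duration of the breath collection


def sampled_volume_l(flow_ml_per_min, minutes):
    """Return the volume in litres drawn through one tube at constant flow."""
    return flow_ml_per_min * minutes / 1000.0


volume_l = sampled_volume_l(FLOW_ML_PER_MIN, COLLECTION_MIN)
print(f"{volume_l:.1f} l per tube")
```

With the stated flow and collection time this corresponds to the 2.5 l sample volume per tube.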

VOC analysis
Tenax® TA tubes were conditioned in Hannover (250 °C, 5 min) and either stored or shipped to Marburg for sampling. Tubes were tightly closed with Swagelok caps containing PTFE fittings (polytetrafluoroethylene, Swagelok®, Solon, USA) for storage and shipment. The time between sampling and analysis was generally less than 1 week for Marburg samples and less than 3 d for Hannover samples.
Breath VOCs were analyzed by TD-GC-MS and identified in Hannover (for technical specifications see online supplement 1 (stacks.iop.org/JBR/10/026006/mmedia)) using Enhanced Chemstation Software Rev. E.02.02 (Agilent Technologies, Santa Clara, USA). Compound identification was performed by retention time, reference compounds and by library search using the National Institute of Standards and Technology (NIST) Mass Spectral Search Program Version 2.0 (NIST, Boulder, USA). The detailed parameters for the analysis are listed in table S1 (online supplement), and example chromatograms from the different subject groups as well as for room air are also provided in the online supplement (figures S1 and S2). 134 VOCs were quantified using the MS response of specific target ions, which most commonly matched the m/z signals of highest intensity in the respective mass spectrum. To avoid faulty discrimination between signals of the same m/z ratio in the chromatogram (e.g. m/z 43 in acetic acid and ethyl acetate), other more specific target ions were selected for quantification. A subset of 30 compounds was chosen for external standard validation and calibration (table 2). The reference compounds chosen for external standard measurements included linear alkanes, alcohols, aldehydes, aromatic compounds and terpenes, all of which were detected in more than 90 percent of the subjects' samples. Pure compounds were dissolved in methanol, yielding concentrations between 1 and 100 ng μl⁻¹, and manually adsorbed onto Tenax® TA tubes. For this purpose, 1 μl of external standard solution was injected onto the top of the Tenax® TA surface using micro syringes for GC injection. To avoid loss of insufficiently retained VOCs prior to analysis during TD dry purge and flushing procedures, the tubes were immediately brought into a stream of dry nitrogen carrier gas (200 ml min⁻¹, 2 min) to completely adsorb the VOC molecules to the Tenax® TA.
Samples were injected using the same TD-GC-MS parameters as breath samples. Retention time and mass spectrum were used to validate NIST library search suggestions. Retention times matched and the NIST identification was confirmed in all 30 tested cases except for glutaraldehyde (No. 12; second best NIST match vinyl acetate in table 2). Furthermore, reference compounds were used to assess concentration ranges of breath compounds and their reproducibility with respect to quantification (standard deviation/mean for fivefold experiment <5%). Single ion counts between 0.1 and 10 ng represented typical response values for compounds with normal abundance in 2.5 l of breath.
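The reproducibility criterion above (standard deviation/mean <5% for a fivefold experiment) is a coefficient of variation. A minimal sketch of such a check is given below; the replicate values are invented for illustration, and the use of the sample standard deviation (n − 1) is an assumption, as the text does not specify which estimator was applied.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a fraction."""
    return stdev(values) / mean(values)


# Hypothetical single ion responses from five replicate standard injections.
replicates = [1020.0, 995.0, 1008.0, 1012.0, 985.0]

cv = coefficient_of_variation(replicates)
print(f"CV = {cv:.1%}")
print("within 5% criterion:", cv < 0.05)
```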

Data processing
The raw data of two Tenax® tubes were taken for the analysis without normalization or correction with room air sample data. We found that the sum of total ion current (TIC) was not suited for normalization as the chromatograms included drastically varying peaks e.g. related to cleaning of valves or to Tenax® degradation. The raw data of two simultaneously loaded Tenax® tubes were first compared to identify outliers. The mean value of the single ion response of the two tubes was calculated and log transformed as these data were not normally distributed. Subjects showing hardly detectable VOC peaks due to improper sampling, the typical Tenax® matrix decomposition VOC pattern, and most known indoor contaminations (e.g. cleaning reagents) were at this point excluded. Thus 101 VOCs of 190 subjects remained in the database for the final statistical data analysis.
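The duplicate-tube step described above can be sketched as follows. This is a simplified illustration and not the processing script used in the study: the base-10 logarithm and the relative-difference threshold used to flag disagreeing tubes are assumptions, and the function name is hypothetical.

```python
import math


def combine_duplicates(tube_a, tube_b, max_rel_diff=0.5):
    """Average two tube responses, log-transform the mean, flag disagreement.

    max_rel_diff is a hypothetical threshold on |a - b| / mean; the text
    does not state how outliers between the two tubes were defined.
    """
    avg = (tube_a + tube_b) / 2.0
    if avg <= 0:
        # No detectable signal: nothing to transform, flag for inspection.
        return None, True
    rel_diff = abs(tube_a - tube_b) / avg
    return math.log10(avg), rel_diff > max_rel_diff


print(combine_duplicates(950.0, 1050.0))   # tubes agree, not flagged
print(combine_duplicates(10.0, 1000.0))    # tubes disagree, flagged
```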

Statistics
Log transformed data of breath VOCs are displayed as mean and standard error (SE) of the mean in the respective figures. Univariate comparison was performed by t-test. Demographic data were analyzed by ANOVA and post-hoc analysis was performed by the Newman-Keuls test. For correlations the Pearson correlation coefficient was given. Linear discriminant analysis (LDA) was used to determine the major VOCs that discriminate between patient groups (SPSS 20, IBM, Armonk, USA). The models were computed using stepwise inclusion. Cross-validation was performed by the leave-one-out method.
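To illustrate the principle of LDA with leave-one-out cross-validation, the sketch below fits a two-class discriminant on a single log-transformed VOC. It is a deliberately reduced example: the study used stepwise multivariate LDA in SPSS, whereas this code uses one feature, plain Python, and invented data; all names are hypothetical.

```python
import math


def fit_lda_1d(xs, ys):
    """Estimate class means, priors and the pooled within-class variance."""
    classes = sorted(set(ys))
    stats, ss_within = {}, 0.0
    for c in classes:
        vals = [x for x, y in zip(xs, ys) if y == c]
        mu = sum(vals) / len(vals)
        ss_within += sum((v - mu) ** 2 for v in vals)
        stats[c] = (mu, len(vals) / len(xs))
    return stats, ss_within / (len(xs) - len(classes))


def predict_lda_1d(model, x):
    """Assign x to the class with the largest linear discriminant score."""
    stats, var = model

    def score(c):
        mu, prior = stats[c]
        return x * mu / var - mu * mu / (2.0 * var) + math.log(prior)

    return max(stats, key=score)


def leave_one_out_accuracy(xs, ys):
    """Refit the model n times, each time predicting the held-out subject."""
    hits = 0
    for i in range(len(xs)):
        model = fit_lda_1d(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        hits += predict_lda_1d(model, xs[i]) == ys[i]
    return hits / len(xs)


# Invented log-transformed responses of one VOC in two groups.
log_voc = [1.0, 1.2, 0.9, 1.1, 1.3, 2.0, 2.2, 1.9, 2.1, 1.8]
group = ["healthy"] * 5 + ["COPD"] * 5
print(leave_one_out_accuracy(log_voc, group))
```

Because each subject is classified by a model that never saw that subject, the cross-validated rate is typically lower than the rate obtained on the training data, as in the classification results reported in this study.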

Demographics, comparison between sites
The overall COPD severity as indicated by the CAT score was milder in Hannover, but there were more GOLD 3 patients among the ex-smokers. Among the actively smoking COPD patients the highest mean CAT score and the highest CO levels were found in the COPD patients recruited in Marburg. There were also more GOLD 3 and 4 patients in this group (table 1). COPD patients were significantly older than healthy control subjects and had a larger BMI. There were more women among the COPD patients recruited in Hannover. Active smoking in COPD patients and healthy smokers resulted in significantly higher values of exhaled CO and carboxyhemoglobin (COHb) as well as lower NO values compared to non- and ex-smokers.

Reproducibility of measurements, transport BTX levels
We compared the VOC levels from two independently loaded Tenax® tubes to search for and correct outliers. VOC levels were generally highly reproducible (figure 2), but fluctuations in retention time sometimes led to insufficient peak integration that had to be corrected manually. Heptane partly overlapped with acetic acid in the chromatogram, and m/z 43 proved to be the main fragmentation ion for both compounds. However, targeting alternative quantifier ions did not sufficiently improve the quality of the automatically generated results, and heptane was therefore excluded from the analysis. It is important to note that the level of benzene, toluene, xylene (BTX) aromatic hydrocarbons was not significantly elevated in samples of healthy non-smokers from Marburg compared to non-smokers from Hannover. Analyzing all non/ex-smokers we even found slightly higher values in the Hannover samples; for toluene this difference was significant (p < 0.01). In room air samples we also observed slightly higher levels of BTX compounds in Hannover; for benzene this difference was statistically significant (p < 0.05). On the other hand, the effect of active smoking on the level of BTX was clearly detectable and these VOCs were related to the level of CO in the breath of smokers (table 2, figures 3 and 4).

VOCs in exhaled breath
134 VOCs basically representing the major peaks observed in the chromatograms of the different subject groups or of room air samples were randomly selected for the analysis (table 2). Depending on the subject group the selection corresponds to approximately 80-90% of the peaks that would have been included in the automated analysis of the Chemstation software. Table 2 summarizes the available information concerning these VOCs, including retention time and percentage of positive findings in the different subject groups and in room air samples. Eight of the VOCs in the list were Tenax® related (e.g. acetophenone, benzoic acid, benzaldehyde) and therefore excluded from further analysis. Phenol, which can also be related to Tenax® matrix decomposition, was increased in exhaled breath samples compared to room air samples and therefore not excluded. Fourteen VOCs were considered low abundant, because they were detected in fewer than 25 subjects per site, and were excluded from the analysis except for 2, which were considered smoking related. Thirty-two VOCs were also found in significant amounts in room air samples at both sites (median ratio breath/room air <1.1). Of these we excluded 18 from the analysis, as they were observed at higher concentrations in exhaled breath than in room air in no or only a few samples. Typical chromatograms of the different subject groups of each study site as well as room samples are provided in the online supplement.
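The room air criterion (median ratio breath/room air <1.1) can be sketched as a simple filter. The example assumes that each breath sample is paired with the room air sample collected next to the same subject and that the ratio is formed on untransformed responses; both are assumptions, and the data and function names are hypothetical.

```python
from statistics import median


def median_breath_to_room_ratio(breath, room):
    """Median of the paired breath/room-air ratios for one VOC."""
    ratios = [b / r for b, r in zip(breath, room) if r > 0]
    return median(ratios)


def is_room_air_dominated(breath, room, threshold=1.1):
    """True if the VOC is not clearly higher in breath than in room air."""
    return median_breath_to_room_ratio(breath, room) < threshold


# Invented responses for two VOCs in three paired samples.
print(is_room_air_dominated([1.0, 1.05, 0.9], [1.0, 1.0, 1.0]))  # True
print(is_room_air_dominated([3.0, 4.0, 5.0], [1.0, 1.0, 1.0]))   # False
```

A VOC flagged in this way is a candidate for exclusion only; as described above, flagged VOCs were kept when some breath samples clearly exceeded room air or when there was evidence for a relationship to smoking.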

Smoking related VOCs
Eight VOCs were only detected in active smokers, and an additional 46 (including phenol) were considered smoking related, because higher levels were found in actively smoking subjects. Active smokers could be clearly discriminated by higher values for combustion products like furans as well as BTX aromatic hydrocarbons. Table 2 provides a list of those VOCs that were significantly elevated in active smokers and also shows that most of these correlate with the CO or COHb values that were obtained at the time of VOC breath collection. The relationship between these independent measurements indicates a valid breath sampling methodology and VOC analysis. As higher numbers of pack-years (PY) are related to the amount of active smoking at the time of breath sampling, there were also weak correlations between the smoking related VOCs and PY, with correlation coefficients <0.5.
To test our LDA approach we assessed whether 'smoking status' would be correctly classified (table 3). For this analysis we included the 101 VOCs that are listed in table 2. Of the 10 VOCs included in the model, most were considered smoking related VOCs. Interestingly, 7 of the 8 misclassified smokers had CO levels below 6 ppm, and overall the group of the 8 misclassified smokers had a significantly (p < 0.003) lower level of exhaled CO compared to the 78 correctly classified smokers.
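The correlation between a smoking related VOC and exhaled CO is a Pearson coefficient, which can be computed as sketched below. The values are invented for illustration and the function name is hypothetical; the sketch does not reproduce any result of table 2.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Invented example: log-transformed VOC response versus exhaled CO (ppm).
log_voc = [2.1, 2.4, 2.9, 3.3, 3.8]
co_ppm = [4.0, 9.0, 14.0, 22.0, 30.0]
print(f"r = {pearson_r(log_voc, co_ppm):.2f}")
```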

Comparison between sites
Within the list of 134 VOCs used for exhaled breath analysis we found significant differences (p < 0.001) for room air between sites for e.g. ethanol, isoprene, acetone, room cleaning related compounds or menthol (table 2). For hexanoic acid (increased in Marburg) and for 2-methyl benzofurane (increased in Hannover) these differences were also observed in breath of healthy non-smoking controls, though limited to a subset of subjects. Some lower abundant VOCs that were not included into the overall analysis were also site specific (table 2). Table 2 shows the 5 breath VOCs that were found to be significantly higher in all groups between sites (p < 0.001) and those additional 12 VOCs that had more than twice the number of positive cases at a site compared to the other.
Using LDA and including the 101 selected VOCs (table 2) in the analysis resulted in an almost perfect separation of the sites (table 4). 19 VOCs were included in this model; for 3 of them we found evidence for a room air relationship. Excluding these from the analysis did not change the result. Both the univariate and the multivariate data indicate that differences in breath VOCs exist independently of environmental factors.

COPD related breath VOC
For the search of COPD related VOCs we used a multivariate LDA as well as the univariate comparison between healthy subjects and patients with COPD. Due to the known profound effect of smoking, both approaches were performed separately for non/ex-smokers and for active smokers.
For the LDA analysis we included all 190 subjects and used the 101 VOCs indicated (index 25) in table 2, which were also used for the discrimination of smokers and non-smokers and for site described above. The result of this analysis is displayed in tables 5 and 6.
Eight VOCs were included in the model to separate the non/ex-smokers, and four different VOCs were included to separate the actively smoking groups. However, 6 of the 8 VOCs included in the model to discriminate COPD patients from controls in non/ex-smokers and 3 of the 4 VOCs included in the model for active smokers showed similar differences in the respective other group (table 2). This could be interpreted as additional evidence for a relationship of these VOCs to COPD. We also performed the analysis using each site in turn as the training set and the respective other site as an independent validation set. The training datasets of the separate sites led to comparable levels of discrimination between healthy subjects and COPD patients, with a lower rate of correct detection in Hannover compared to Marburg. The respective validation of the different models derived from the site specific analysis, however, yielded unacceptably high error rates. Limiting the analysis to those VOCs considered relevant from the whole group analysis did not improve the error rates substantially.
Therefore we also performed the univariate analysis between healthy subjects and COPD patients separately for active smokers and non/ex-smokers. The bold values in table 2 indicate the VOCs with significant (p < 0.005) differences between groups, separately for the whole group and for the two sites. If such a clear difference was found in the whole group or at one of the sites, the table also indicates where a comparable difference with p < 0.02 was observed at the respective other site or within the whole group. The table shows that in Marburg we found a much larger number of VOCs with differences between healthy subjects and COPD patients and that these were predominantly found in smoking subjects. In Hannover only a few VOCs with significant differences were observed, and these were mainly in the non/ex-smokers. Figure 4 shows the 7 VOCs that were significantly (p < 0.05) different in the whole group and showed the same kind of difference between healthy subjects and COPD patients both in non/ex-smokers and smokers. Interestingly, only vinyl acetate and 1-ethyl-3-methyl benzene are among the VOCs that were included in the multivariate models.
Overall we found evidence for a relationship to COPD for 14 VOCs. None of the 14 VOCs showed a correlation with age, neither when including all subjects, nor within the 4 groups of subjects separately. We also found no relationship with the FEV/FVC ratio.

Discussion
In this study we used continuous breath sampling from an inert reservoir and analyzed selected target ion identified VOCs for their relationship to stable COPD. Based on (a) the good correlation between the data of two simultaneously collected adsorption tubes, (b) the correlation between smoking related VOCs and the independently assessed levels of exhaled CO, and (c) the ability to discriminate between smokers and non-smokers with a low error rate, we assume a sufficient validity of our data. From the multi- and the univariate analysis we consider 14 of the 134 selected VOCs to be COPD related. The evidence is either derived from the inclusion in the respective LDA models or from the fact that a comparable difference between controls and COPD patients was not only observed in the analysis of all subjects, but also separately for the subjects of each site and within the groups of smokers and non/ex-smokers. In addition, site specific differences were detected for a number of VOCs. These extend beyond environmental and cleaning related compounds; we therefore consider this to be an important result that needs to be taken into account in future breath research trials. In line with others we found that smoking has a profound effect on the composition of breath VOCs [26,33], emphasizing the need to adequately control for smoking status, especially in studies investigating smoking related diseases such as COPD [12].

Smoking
The effect of smoking, on the other hand, can be used to validate breath VOC data and the respective statistical analysis procedures. As expected, we were able to discriminate between non/ex-smokers and smokers in the multivariate analysis. Interestingly, the misclassifications were almost all smokers that were considered non-smokers. They might have had longer periods of non-smoking than the recommended 2 h or smoked fewer cigarettes prior to their visit, as they had significantly lower CO levels compared to the rest of the smokers. The independently assessed breath CO levels also support the validity of our GC/MS data, as they showed a good correlation with most smoking related VOCs. For some of these, like BTX hydrocarbons and 2,5-dimethyl-furane, comparable effects of active smoking have also been reported by Alonso and coworkers [33].

Site
Both for room air VOCs and for breath VOCs we found differences between the two sites Marburg and Hannover. The room air differences could be in part attributed to differences in overall cleaning procedures. Some of the site specific VOCs would also suggest differences in the number of staff in the vicinity of breath sampling or in room air exchange rates between the sites. With respect to the differences in BTX aromatic hydrocarbons both in room air and in breath we would assume that the size and location of the sites played a role. Marburg is a small city with just 80 000 inhabitants in a rural area 20 km away from a major highway, while Hannover has about 500 000 inhabitants and is linked to several major highways. Our results did not provide evidence for a contamination of Marburg samples during road shipment by courier. Differences in VOC pattern between two sampling sites in China and Latvia have also been reported before [34]. Besides the differences in room air we also found some very pronounced differences in breath VOCs, which could be attributed to differences in breathing valves and/or their respective cleaning procedures. Examples for these differences are provided in the online supplement (figures S1-S3). These extreme site specific VOCs were included neither in the analysis nor among the 134 VOCs listed in table 2. Excluding all room air or suspected cleaning related VOCs from the analysis still resulted in almost complete discrimination between subjects of both sites, indicating that additional site specific differences in exhaled breath exist. These could be environmental or lifestyle-related and have to be taken into account in multicenter studies.

COPD
To identify COPD related VOCs we compared the VOC patterns between healthy subjects and COPD patients separately in the group of non/ex-smokers and in smokers. Merged evidence from the univariate and multivariate analysis resulted in a list of 14 VOCs with a potential relationship to COPD. We considered it as evidence if a VOC was included in the multivariate model of a group and at the same time showed a comparable difference between controls and COPD in the respective other group. If the VOC also showed the same difference between controls and COPD patients separately at each site, the evidence was considered strong (figure 4). Analysis of the data from the separate sites as training and test datasets did not lead to models with sufficient discriminative power. Even focusing the analysis on the 14 potentially COPD related VOCs led to unacceptably high error rates. This could be due to absolute differences in the levels of these VOCs between sites or due to the variability of the data (see individual data for selected VOCs in figure 4). There were also some differences between the patient groups recruited at the study sites with respect to gender distribution and lung function, but currently their effect on the results is difficult to interpret.
The results of the multivariate analysis are affected by the number and kind of variables included. Adding a VOC or leaving one out can have an effect on the VOCs included in the models to discriminate COPD patients from controls. Therefore, our pre-selection of 101 VOCs for the analysis is likely to affect our results, but more in the sense that we might have missed discriminative VOCs. However, with the currently existing differences in sampling procedures, adsorption material, analysis technology, and statistical evaluation methods between study groups, a complete set of COPD related VOCs cannot be expected from a single study. But every study can contribute to the list of potentially COPD related VOCs that can then be tested in larger population- and multi-center studies for their clinical relevance. This includes testing how stable these VOC markers are over time and how they respond to changes in e.g. airway inflammation, treatment, exacerbation rate and potentially other disease related outcomes.

Comparison with other COPD studies
There are a number of studies reporting differences between patients with COPD and healthy control subjects. The results of studies using sensor arrays or eNose technology [9,14,23] cannot be compared with our data. This is also true for studies using very sensitive ion mobility spectrometry (IMS), as these generally do not identify the discriminating peaks. One example is the recently published study by Besa and colleagues reporting differences in breath VOC patterns for 45 COPD patients, 23 healthy smokers and 28 healthy non-smokers [8]. Interestingly, however, there appeared to be a smaller percentage of smoking related VOCs among the discriminating peaks as compared to our study. Using a real time MS, Sinues and coworkers found a similar level of discrimination between controls and COPD patients [17]. One of their major discriminating VOCs was most likely indole, which also showed lower levels in the COPD patients in our study.
In 2012, Phillips and coworkers performed a large COPD study including 119 stable COPD patients (34% active smokers) and 63 healthy controls (10% active smokers) [12]. The major difference compared to our study was that just 130 ml samples were collected and loaded onto carbon based adsorption tube material. Similar to the study by Besa et al, approximately half the patients were GOLD 3 and 4 and therefore more severely ill as compared to the patients included into our study. Despite the differences in sampling and analysis, Phillips et al also found evidence for BTX aromatic hydrocarbons, acetic acid, and phenol to be COPD related.
Basanta and coworkers compared 39 COPD patients (31% active smokers) and 32 healthy subjects (31% active smokers) [11] using a special sampling device that enabled direct loading of late expiratory breath onto Tenax® TA/Carbotrap adsorption tubes. Comparable to our study, 3 l of breath were analyzed, but with the use of GC-time of flight (ToF)-MS 487 VOCs were identified. The COPD subjects, who were predominantly GOLD II and III, could also be discriminated from the healthy controls; however, there is no overlap between the VOCs used for model classification by Basanta et al and those used in our study. Hexanal, which was excluded from our analysis due to high levels in room air, was the only substance that overlaps with the markers listed by Phillips et al [12]. Nonanal and decanal, which were also considered discriminative between healthy subjects and COPD by Basanta, were excluded from our analysis, as both are considered Tenax® TA artifacts [32]. For these VOCs we observed a clear correlation between breath samples and room air samples, potentially indicating varying levels of Tenax® TA trap decomposition.
Van Berkel and coworkers recruited 50 COPD patients (76% active smokers) and 29 controls (31% active smokers) into their study. In addition they included 16 COPD patients (12% active smokers) and 16 controls (19% active smokers) from a different hospital for validation. Breath sampling was performed by filling Tedlar bags with 5 l of breath, which were transferred to adsorption tubes comparable to those used by Phillips et al as described above. Analysis was performed by TD-GC-TOF-MS. Again, hexanal was among the potentially discriminating VOCs, all of which, in contrast to our results and e.g. the data by Basanta et al, showed lower levels and lower frequencies in the COPD patients. Interestingly, the training model by van Berkel et al with 6 VOCs (not including hexanal) showed a nearly optimal performance in the validation dataset from a different hospital, indicating that their 6 VOC classifiers are independent of site and smoking.

Selection and analysis of VOCs
We selected 101 VOCs for the final analysis. The main reasons for excluding a VOC from this list were that the VOC (1) was suspected to be a Tenax® TA decomposition product, (2) was of low abundance, or (3) was room air related, unless we found evidence for a relationship to smoking. As exceptions to these rules, VOCs 14 and 116 (table 2) remained in the list because we found evidence for a relationship to COPD in the univariate analysis, and VOCs 56 and 125 remained because a correlation between breath and room air was found at only one site. The exclusion of VOCs from the analysis bears the risk of losing potential information and missing discriminating VOCs. On the other hand, we feel that available information about a VOC, e.g. its relationship to room air or to cleaning procedures, should be taken into account if possible. In addition, limiting the number of variables in studies with a limited number of patients provides a more robust basis for the statistical analysis and reduces the risk of overfitting.

Limitations of our study
Although we performed one of the largest exhaled breath studies to find COPD related VOCs by including 190 patients and controls, studies with much larger patient numbers are required to provide data for robust statistical analysis. However, these require a multicenter approach with standardized sampling and analysis procedures that enables building a large disease related database. Efforts for a standardization of procedures are under way and a recently introduced novel sampling device (Owlstone, UK [35]) could potentially support this.
Our study provided cross-sectional data for COPD related VOCs, which cannot be used to predict any COPD outcomes or help to monitor the disease. The clinical value of the potentially COPD related VOCs will need to be evaluated in longitudinal trials. As COPD is a multifactorial, very complex disease, the markers suggested by our and other comparable studies are also unlikely to improve the diagnosis of COPD at the current stage.
We did not include external standards in our analysis. Therefore we have to deal with a certain variability of our data due to changes in the loading capacity of the Tenax® tubes or due to instrument drift. As our patients were recruited in random order, we consider it unlikely that we introduced a major bias into our data due to the lack of such a standard. However, it might have affected the analysis by increasing the overall variability of the data.
Using Tenax® TA as adsorption material has advantages, e.g. it is suitable for samples with high moisture content such as breath and has a low desorption temperature with less risk of producing unwanted artifacts, but it also has some drawbacks. There are a number of known decomposition products, some of which, like benzaldehyde, have been suggested by others for data normalization [12,26]. In addition, nonanal and decanal, which have been suggested as potentially COPD related, are difficult to evaluate correctly when Tenax® TA is used as adsorption material.
In this study we did not use all the available information provided by the chromatograms to discriminate between groups. Instead we identified 134 VOCs by target ions and by the NIST database. This approach is likely to miss information, especially from VOCs of lower abundance or lower levels. On the other hand, the identification by target ions allows the separation of VOCs with overlapping retention times, which is not possible when using the total ion current (TIC) as outcome parameter. Using a growing database with retention times and target ions is also an economical way to deal with the large amount of data, avoiding pre-processing like alignment of chromatograms or baseline subtraction, which could have unexpected effects on the data if performed unsupervised.
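The difference between quantification by target ion and by TIC can be illustrated with a toy example. This is a minimal sketch, assuming scans are stored as pairs of retention time and an m/z to intensity mapping; the data layout, function names, m/z values and intensities are invented for illustration and do not reflect the software used in this study.

```python
def ion_signal(scans, target_mz, rt_start, rt_end):
    """Sum the intensity of a single target ion within a retention time window."""
    return sum(
        spectrum.get(target_mz, 0.0)
        for rt, spectrum in scans
        if rt_start <= rt <= rt_end
    )


def tic_signal(scans, rt_start, rt_end):
    """Sum the intensity of all ions (total ion current) within the same window."""
    return sum(
        sum(spectrum.values())
        for rt, spectrum in scans
        if rt_start <= rt <= rt_end
    )


# Two hypothetical co-eluting compounds: one with target ion m/z 57,
# the other with target ion m/z 91 (invented numbers).
example_scans = [
    (10.00, {57: 100.0, 91: 0.0}),
    (10.02, {57: 300.0, 91: 200.0}),
    (10.04, {57: 100.0, 91: 400.0}),
    (10.06, {57: 0.0, 91: 150.0}),
]

signal_57 = ion_signal(example_scans, 57, 9.9, 10.1)   # first compound only
signal_91 = ion_signal(example_scans, 91, 9.9, 10.1)   # second compound only
signal_tic = tic_signal(example_scans, 9.9, 10.1)      # both compounds merged
```

In this toy case the two target ion signals remain separable although the peaks overlap in time, whereas the TIC only gives their combined signal.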

Conclusion
In summary, we identified 10 novel breath VOCs that appear to be related to COPD, using a breath sampling approach that differs from those of others and a limited number of VOCs identified by target ions. We confirm the profound effect of smoking in our study and show that so far unexplained site specific differences exist that need to be taken into account when interpreting results from multicenter studies.