Volatile organic compounds in a headspace sampling system and asthmatics sputum samples

The headspace of a biological sample contains exogenous volatile organic compounds (VOCs) present within the sampling environment, which represent the background signal. This study aimed to characterise the background signal generated from a headspace sampling system in a clinical site, to evaluate intra- and inter-day variation of background VOCs, and to understand the impact of the sample itself upon commonly reported background VOCs using sputum headspace samples from severe asthmatics. The headspace, in the absence of a biological sample, was collected hourly from 11am to 3pm within a day (the time of clinical sample acquisition), and from Monday to Friday in a week, and analysed by thermal desorption-gas chromatography-mass spectrometry (TD-GC-MS). Chemometric analysis identified 1120 features, 37 of which were present in at least 80% of all the samples. The analyses of intra- and inter-day background variations were performed on 13 of the most abundant features, ubiquitously present in headspace samples. The concentration ratios relative to background were reported for the selected abundant VOCs in 36 sputum samples, acquired from 36 stable severe asthma patients recruited at Glenfield Hospital, Leicester, UK. The results identified no significant intra- or inter-day variations in compound levels and no systematic bias of z-scores, with the exception of benzothiazole, whose abundance increased linearly between 11am and 3pm with a maximal intra-day fold change of 2.13. Many of the identified background features are reported in the literature as components of the headspace of biological samples and are considered potential biomarkers for several diseases. The selected background features were identified in the headspace of all severe asthma sputum samples, albeit with varying levels of enrichment relative to background. 
Our observations support the need to consider the background signal derived from the headspace sampling system when developing and validating headspace biomarker signatures using clinical samples.


Introduction
The extraction and the analysis of the headspace of a biological sample is an approach mostly used to detect possible biomarkers of physiological and pathological conditions affecting the sample [1][2][3]. The headspace of a sample consists of the vapour phase within a sealed container in which the sample is located. The vapour is generally concentrated above the sample material at the top of the container, hence the word 'head-space' [4]. The headspace of a biological sample contains both volatile and semi-volatile organic compounds (VOCs/SVOCs) which represent the vapour fraction of the sample, mirroring its composition and biophysical properties [5]; therefore physiological or pathological alterations of a biological sample may be reflected in qualitative and/or quantitative VOC variations in its headspace.
Headspace analysis of cell cultures (such as inflammatory cells [6] and small airway epithelial cells [7]), of sputum samples (as a source of microbes) [8] and of microbial cultures [9] has been used, together with the analysis of volatile metabolites in exhaled breath (breathomics) [10,11], as a non-invasive methodology to identify potential airborne biomarkers linked to different airway pathological conditions and to stratify patient phenotypes.
VOCs present in the headspace of biological samples and in breath might originate from biological metabolic processes of endogenous constituents. For example, acetone results from cellular ketogenesis [12] and is considered a systemic volatile component [12,13] which is detected in breath and urine headspace [14]. The headspace of biological samples can also contain VOCs originating from the catabolism of external components to which individuals are exposed [15]. Exogenous VOCs can be inhaled, ingested or parenterally absorbed, eventually metabolised and released in biological samples such as serum, plasma, urine, sputum and breath, which are routinely used in clinical metabolomics profiling studies. For instance, hydroxylated derivatives of polycyclic aromatic hydrocarbons such as naphthalene, phenanthrene and pyrene were detected in the headspace of urine samples from workers who were exposed to hot asphalt [16]. The headspace of a biological sample can also contain exogenous VOCs present in the room air or VOCs that have originated from the headspace collection system. These exogenous VOCs represent part of the background signal affecting the headspace composition. The background signal can vary due to several factors, such as the ambient temperature and humidity of the room in which the headspace is collected, meteorological conditions [17], the materials involved in the sampling process and the timing of headspace collection. Boshier et al demonstrated the effect of timing: in the ambient air at three different clinical sites, ethanol levels were higher in the morning than in the evening, and ammonia levels were higher at the end of the day than in the morning [18]. Some of the VOCs reported in the literature as exogenous VOCs present within the clinical environment are also reported as potential biomarkers for several disease conditions. 
Toluene and isopropanol, for example, are reported as common volatile components of the clinical environment [19] but are also found among the predominant VOCs in the breath of asthmatic patients [20] and in the headspace of faeces samples from patients with non-alcoholic fatty liver disease [21]. These observations suggest that background VOCs and their variations could affect, for instance, the signal of a targeted VOC in the headspace of a biological sample, and support the need for a better understanding of the background signal in the headspace of biological samples during the development and validation of a panel of volatile biomarkers.
With this study we aimed (a) to identify background VOCs generated within a headspace sampling system developed for collecting headspace from sputum samples, (b) to evaluate the intra-day and inter-days background variation of common and abundantly identified features and (c) to estimate the contribution of the volatile background signal generated from the sampling system into headspace of sputum samples acquired from a cohort of stable severe asthma patients.

Study design
The study was performed in the month of October 2019 at the Glenfield Hospital, Leicester (UK) in a clinical research facility testing room where lighting, temperature and humidity levels were constantly monitored and calibrated.

Background VOC variation study
The headspace collection in the absence of a biological sample (blank sample), which contained just the background VOCs generated from the headspace sampling system, was performed using a headspace sampling system consisting of a small vessel, an empty Petri dish and a supply of pure air. The headspace samples collected in the absence of a biological specimen, and the VOCs found in them, are referred to in this study as background headspace and background VOCs, respectively.
For the intra-day variation study of the headspace background signal, the headspace from a blank sample was collected hourly from 11am to 3pm, using the method described below. The time interval was chosen based on the period within which clinical research samples (e.g. sputum, urine and blood) are usually collected and processed. The experiments were repeated in quadruplicate (per time point), generating 20 headspace samples from which to estimate the intra-day background signal variations. For the inter-day variation study of the background headspace signal, the headspace from a blank sample was collected every day for a week, from Monday to Friday, between 11am and 3pm. The experiment was repeated in triplicate, generating 15 headspace samples. The same analytical process, consisting of TD-GC-MS analysis, was applied to the headspace samples collected for both the intra- and inter-day background variation studies. The TD-GC-MS data of both studies were integrated, deconvolved and aligned (as described below) to generate a matrix of 1120 features. A feature represents a peak in the chromatogram that satisfies the adopted integration parameters (signal-to-noise ratio, peak area, peak height, etc.).
The 1120 identified features were used (a) to characterise the background volatile signature generated from the headspace sampling system, represented by the features present in at least 80% of all the headspace samples (from both the intra- and inter-day variation studies); and (b) to identify features ubiquitously present across all 35 samples, which were used for the statistical analysis of intra-day and inter-day background signal variation in headspace samples.
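The sampling design described above can be enumerated as a simple sample sheet. The following is a minimal sketch; the time-point and weekday labels and the function name `build_schedule` are illustrative assumptions, not part of the original protocol.

```python
from itertools import product

# Intra-day study: hourly collection from 11am to 3pm, four replicates each.
# The labels below are assumed for illustration.
INTRA_DAY_TIMES = ["11am", "12pm", "1pm", "2pm", "3pm"]
INTRA_DAY_REPLICATES = 4

# Inter-day study: Monday to Friday, three replicates per day.
INTER_DAY_DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
INTER_DAY_REPLICATES = 3


def build_schedule(study, levels, n_replicates):
    """Return one (study, level, replicate) tuple per blank headspace sample."""
    return [
        (study, level, replicate)
        for level, replicate in product(levels, range(1, n_replicates + 1))
    ]


intra_day = build_schedule("intra-day", INTRA_DAY_TIMES, INTRA_DAY_REPLICATES)
inter_day = build_schedule("inter-day", INTER_DAY_DAYS, INTER_DAY_REPLICATES)
all_samples = intra_day + inter_day

print(len(intra_day), len(inter_day), len(all_samples))
```

The counts reproduce the 20 intra-day and 15 inter-day blanks, 35 background headspace samples in total.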

Sputum headspace asthma study
A separate and subsequent study was conducted to evaluate the occurrence of the most abundant and ubiquitously present background features in the headspace of sputum samples from severe asthmatic patients. The cohort of severe asthmatics consisted of an adult population with an age (median (Q1:Q3)) of 62 (57:70.25) years; 14 females and 22 males, with a post-bronchodilator FEV1 (forced expiratory volume in the first second of the forced breath) % predicted of 69.75 (52.50:87.60) and a post-bronchodilator FEV1/FVC (forced vital capacity) % predicted of 87.78 (74.4:97.7). According to the Global Initiative for Asthma (GINA) [22] treatment strategy for adults, 20 patients were classified as GINA 5 and 16 as GINA 4; 14 of the GINA 5 patients were also receiving anti-IL5 therapy (mepolizumab).
The sputum samples were spontaneously produced and acquired at least 6 weeks post exacerbation in the asthma cohort. The sputa were collected in the clinical research facility testing room where the background experiments were conducted and where the headspace was subsequently sampled. Headspace from a blank sample was collected as a control for each sputum headspace. Based on the results of the background headspace variation study (within a day and between different days), the sampling time for sputum samples and the respective controls was between 11am and 3pm (no specific time point selection was needed). The 36 sputum samples were collected from Monday to Friday over a period of 12 months; the headspace from sputum samples and their controls was collected immediately after sputum production and then analysed by TD-GC-MS.
The most abundant background features identified in the previous study were also identified in the control headspace samples and compared with those found in the headspace of sputum samples.

Method of headspace collection
The sampling system and the method used to collect headspace in presence and absence of the sputum sample were the same and were based on the generation of a dynamic headspace within the sampling system in presence and absence of a biological sample, and on its collection in sorbent tubes to be subsequently analysed by TD-GC/MS.
A small empty Petri dish (Merck Chemicals Ltd, Nottingham, UK) (where the sputum was placed when headspace was collected from sputum samples) was located, uncovered, in a custom vessel (University of Leicester, UK) made of polyether ether ketone (PEEK), a semi-crystalline thermoplastic material with excellent mechanical and chemical resistance properties. The vessel was sealed with a screw cap fitted with two ports: an inlet port, joined to a pure air cylinder, and an outlet port which, during the headspace sampling, was connected to the sorbent tube as shown in figure 1. In order to generate the dynamic headspace, the empty Petri dish contained in the vessel was flushed with a flow of air (BTCA grade, BOC, UK) at a flow rate of 200 ml min⁻¹ for 5 min, sampling 1 l of air onto the sorbent tube. The airflow displaced the headspace above the Petri dish and concentrated VOCs from the headspace onto the sorbent tube (Carbograph 1TD/Tenax TA 60/40, Markes International, Llantrisant). The air composition was 20.04% oxygen with nitrogen balance (total hydrocarbons ≤0.1 ppm, carbon monoxide ≤1 ppm, carbon dioxide ≤300 ppm, total NOx ≤0.1 ppm).

Sorbent tubes preparation
The sorbent tubes were weighed and conditioned before use; the uncapped tubes were weighed before each conditioning procedure to check for possible sorbent mass degradation and were conditioned at 330 °C for 150 min using a nitrogen flow at a pressure of 1.5 bar. Before the TD-GC-MS analysis, an internal standard solution was loaded into the sorbent tube to ensure that inter-sample variability was maintained at acceptable levels and that analytical system artefacts were minimised. This was performed by attaching the sorbent tube to an injector and introducing 0.6 µl of a 20 µg ml⁻¹ internal standard solution in a purified nitrogen flow of 100 ml min⁻¹ for 120 s to ensure an efficient transfer of the standard onto the adsorbent packing. The internal standard solution was a mixture of deuterated toluene (D8), octane (D18) and phenanthrene (D10).
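The loading and sampling quantities quoted in the method can be checked with simple unit arithmetic. The sketch below assumes that the stated 20 µg ml⁻¹ applies to each deuterated component individually (the text does not say whether it is per component or total); the function names are illustrative.

```python
def mass_loaded_ng(volume_ul, concentration_ug_per_ml):
    """Mass delivered by an injection, in ng.

    1 ug/ml is the same as 1 ng/ul, so ng = ul * (ug/ml).
    """
    return volume_ul * concentration_ug_per_ml


def gas_volume_ml(flow_ml_per_min, duration_s):
    """Gas volume passed through the tube, in ml."""
    return flow_ml_per_min * duration_s / 60.0


# Internal standard spike: 0.6 ul of a 20 ug/ml solution.
internal_standard_ng = mass_loaded_ng(0.6, 20.0)

# Nitrogen used to transfer the standard: 100 ml/min for 120 s.
transfer_gas_ml = gas_volume_ml(100.0, 120.0)

# Headspace sampling: 200 ml/min for 5 min.
sampled_air_ml = gas_volume_ml(200.0, 5 * 60)

print(internal_standard_ng, transfer_gas_ml, sampled_air_ml)
```

Under that assumption the spike corresponds to about 12 ng per component, carried onto the sorbent by 200 ml of nitrogen, and the sampling step passes 1000 ml (1 l) of air through the tube, matching the stated sampled volume.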

TD-GC/MS headspace analysis
The background headspace samples were analysed by TD-GC-MS using a Unity-2 thermal desorption unit (Markes International, Cardiff, UK) linked to a gas chromatograph (Agilent, 7820A) coupled to a single quadrupole mass spectrometer (Agilent, 5977B). The sorbent tubes were placed inside the thermal desorption unit and desorbed at 300 °C for 5 min with a flow rate of 45 ml min⁻¹, splitless. The volatiles were drawn along the heated line onto a 'hydrophobic, general' trap (matching the sorbent in the sample tubes), which was held at −10 °C. The trap was heated at the maximum heating rate to 300 °C for 5 min with a desorption flow rate of 2 ml min⁻¹, splitless, and the compounds were introduced to the GC column (Rxi-5ms, 60 m × 0.25 mm i.d. × 0.25 µm). The volatiles were separated on the chromatographic column in a helium flow of 2 ml min⁻¹ with an initial oven temperature of 40 °C and a final oven temperature of 300 °C held for 5 min (oven temperature ramp of 5 °C min⁻¹). The mass spectrometer was fitted with an electron ionization source and a single quadrupole as
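The oven programme implies a chromatographic run time that can be estimated from the stated temperatures and ramp. This is a rough sketch that assumes no initial hold at 40 °C, since none is stated; the function name and the `initial_hold_min` parameter are illustrative.

```python
def oven_run_time_min(start_c, end_c, ramp_c_per_min,
                      final_hold_min, initial_hold_min=0.0):
    """Duration of a single-ramp GC oven programme, in minutes."""
    ramp_time = (end_c - start_c) / ramp_c_per_min
    return initial_hold_min + ramp_time + final_hold_min


# 40 C to 300 C at 5 C/min, then a 5 min hold at 300 C.
run_time = oven_run_time_min(40.0, 300.0, 5.0, final_hold_min=5.0)
print(run_time)
```

With these values the ramp takes 52 min and the whole programme 57 min per sample, excluding tube and trap desorption and oven cool-down.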

TD-GC-MS data analysis
The TD-GC-MS data analysis was performed in AnalyzerPro (SpectralWorks, Runcorn, UK; version 5.7) for integration, deconvolution and alignment of chromatographic peaks across the samples. The software annotated each feature with its Kovats retention index (RI), the first and second mass spectrum quantifier ions (m/z) and its retention time, which together constituted a unique compound label that was used to align the identified compounds across both the headspace samples collected at different time points in a day and those sampled on different days in a week. The chromatographic features were assigned a Level 2 annotation in accordance with the Metabolomics Standards Initiative [23], based on mass spectral similarity with the NIST 2011 mass spectral library, using a 50% confidence threshold, and on matching the RI with data reported using the same chromatographic column stationary phase. Where chemical standards (Sigma-Aldrich) were available, they were run on the TD-GC-MS and the chromatographic feature identities were compared with the resulting data, so that a Level 1 identification could be assigned through comparison of RI and mass spectra, aiming to achieve a reliable identification of the compounds of interest. Analytical standards were run for 23 of the 37 features identified in at least 80% of all the samples (as reported in table 1, third column). For the remaining features it was not possible to run analytical standards, either because they were not available or because some features, in the absence of a NIST match, were identified only by their RI and the base and second most intense ions in their mass spectrum.
For the study of intra- and inter-day variations of background volatiles in the headspace of blank samples, the software generated a data matrix containing the chromatographic features (rows) identified across all the headspace samples (columns) of both studies and their respective chromatographic peak area values, representing their abundance.
The data analysis of the sputum headspace samples and their respective headspace controls generated a matrix of chromatographic features, which was searched for the most abundant background volatiles selected in the previous study.
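Feature labels of the form reported in table 1 (for example `1059.72_118_117`) appear to concatenate the RI with the first and second quantifier ions. The sketch below builds and parses labels of that shape; the format is inferred from the labels shown in this paper, and the exact rules AnalyzerPro applies are not described here, so the helper names and the two-decimal RI formatting are assumptions.

```python
from typing import NamedTuple


class FeatureLabel(NamedTuple):
    retention_index: float   # Kovats RI
    quantifier_ion_1: int    # first quantifier ion (m/z)
    quantifier_ion_2: int    # second quantifier ion (m/z)


def make_label(retention_index, ion_1, ion_2):
    """Format a label as RI_ion1_ion2 with the RI to two decimals."""
    return f"{retention_index:.2f}_{ion_1}_{ion_2}"


def parse_label(label):
    """Split an RI_ion1_ion2 label back into its three components."""
    ri_text, ion_1_text, ion_2_text = label.split("_")
    return FeatureLabel(float(ri_text), int(ion_1_text), int(ion_2_text))


feature = parse_label("1059.72_118_117")
print(feature)
print(make_label(feature.retention_index,
                 feature.quantifier_ion_1,
                 feature.quantifier_ion_2))
```

Keeping the three components as separate fields makes it straightforward to compare a feature with library or standard data by RI and by ions independently.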

Chemometric analysis
The data matrices generated by AnalyzerPro from the GC-MS data analysis of both the intra- and inter-day background headspace variation studies, and of the sputum headspace study, were edited to remove VOCs deriving from the internal standard solution spiked into the sorbent tubes prior to GC-MS analysis. The abundance value of each VOC was normalised to the internal standard toluene-D8, the most representative internal standard across all the samples compared with the others spiked into the sorbent tubes before thermal desorption. For the intra- and inter-day background headspace variation studies, the data matrix contained 1120 chromatographic features across all 35 headspace samples: 20 samples collected in quadruplicate at five different time points in a day, and 15 collected in triplicate on different days in a week.
A list of features characterising the background signal generated from the headspace sampling system was obtained by analysing the original data matrix and identifying those compounds present in at least 80% of all the headspace samples. From this list of 37 features, 13 compounds were selected for the statistical analyses of both the intra-day and inter-day background variation studies. The 13 selected features met two selection criteria: they were the most abundant volatiles, with an abundance above or equal to the average value of all the peak areas of the VOCs found in at least 80% of all the headspace samples, and they were ubiquitously present in all the samples collected at each time point in a day, on each day in a week and in all the replicates.
Table 1 (fragment). Feature labels 1059.72_118_117 and 1080.33_68_67; reported associations: faeces, Crohn's syndrome [24]; saliva, celiac disease [26]; ulcerative colitis [27].
Eleven of the thirteen selected background features were identified in sputum headspace samples and their controls (headspace in the absence of a biological sample) and were used to evaluate the background contribution to the sputum headspace signal of severe asthmatic patients.
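The internal standard normalisation can be illustrated as a per-sample division of each peak area by the toluene-D8 peak area, after which the internal standard itself is dropped from the matrix. This is a minimal sketch with invented peak areas; the dictionary layout and the label `toluene_d8` are assumptions for illustration.

```python
def normalise_to_internal_standard(peak_areas, standard_label):
    """Divide every peak area in a sample by that sample's internal standard.

    peak_areas maps sample name -> {feature label: raw peak area}.
    The internal standard is removed from the returned matrix.
    """
    normalised = {}
    for sample, areas in peak_areas.items():
        standard_area = areas.get(standard_label)
        if not standard_area:
            raise ValueError(f"no internal standard signal in {sample}")
        normalised[sample] = {
            feature: area / standard_area
            for feature, area in areas.items()
            if feature != standard_label
        }
    return normalised


# Hypothetical raw peak areas for one blank sample.
raw = {"blank_1": {"toluene_d8": 200.0, "acetone": 50.0, "styrene": 10.0}}
normalised = normalise_to_internal_standard(raw, "toluene_d8")
print(normalised)
```

Raising an error when the standard is missing avoids silently carrying un-normalised values into the comparison between samples.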
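The two-stage feature selection can be sketched as a prevalence filter followed by an abundance and ubiquity filter. The text does not state how a feature's abundance was summarised before comparison with the threshold; the sketch assumes the feature's mean peak area across samples, treats a zero area as "not detected", and uses invented toy data.

```python
from statistics import mean


def detection_fraction(areas):
    """Fraction of samples in which the feature was detected (area > 0)."""
    return sum(1 for area in areas if area > 0) / len(areas)


def frequent_features(matrix, min_fraction=0.8):
    """Keep features detected in at least min_fraction of the samples."""
    return {
        label: areas
        for label, areas in matrix.items()
        if detection_fraction(areas) >= min_fraction
    }


def abundant_ubiquitous(frequent):
    """Keep frequent features found in every sample whose mean area is at
    least the mean of all detected peak areas among the frequent features."""
    detected = [a for areas in frequent.values() for a in areas if a > 0]
    threshold = mean(detected)
    return [
        label
        for label, areas in frequent.items()
        if all(a > 0 for a in areas) and mean(areas) >= threshold
    ]


# Toy matrix: feature label -> normalised peak area in five samples.
toy = {
    "A": [10, 12, 11, 9, 10],  # ubiquitous and abundant
    "B": [1, 0, 1, 1, 1],      # present in 80% of samples, low abundance
    "C": [0, 0, 5, 0, 0],      # sporadic
    "D": [2, 2, 2, 2, 2],      # ubiquitous but below the abundance threshold
}
frequent = frequent_features(toy)
selected = abundant_ubiquitous(frequent)
print(sorted(frequent), selected)
```

In the study the first stage corresponds to the 37 features retained from 1120, and the second stage to the 13 features carried into the statistical analysis.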

Statistical analysis
To analyse the intra- and inter-day abundance variation of the background headspace, the normalised abundance values, summarised as median and interquartile range, were compared for each selected feature by one-way analysis of variance on ranks (Kruskal-Wallis test), and the inter-group comparisons (between time points or between days) were performed by Dunn's multiple comparisons test. A p value < 0.05 was considered statistically significant.
Z-scores were calculated to standardise the compound abundance values (for the selected group of 13 features) and to monitor their variations from the mean. A linear regression was fitted to the feature z-scores to evaluate the functional relation between the abundance of background compounds in headspace and the sampling times. The standard statistical comparator tests and graphical plots, with the exception of principal component analysis (PCA), were performed using GraphPad Prism 8 (GraphPad Software Inc., La Jolla, CA, USA).
PCA was conducted on the data matrix of normalised abundance values of the 37 features across all 35 headspace samples to generate a 3D scatter plot to visualise similarities between samples; the first three principal components were plotted on the Cartesian axes (x, y, z). PCA identified the major abundance variations of the selected chromatographic features in background headspace in the main principal components. The PCA was performed using IBM SPSS (Version 26.0. Armonk, NY: IBM Corp.).
The PCA on the abundance values of the 1120 features, instead, was automatically generated by AnalyzerPro for both studies.
In the study to evaluate the background contribution to sputum headspace samples, the normalised peak area values of 11 of the 13 selected volatiles (two of the 13 selected features were absent from sputum headspace samples), identified in both the headspace of control (blank) samples and the headspace of sputum samples, were summarised as mean and standard deviation (SD) and analysed by multiple t tests (two-tailed, considered significant at p < 0.05) using GraphPad Prism 8 to compare the abundance of each background feature between control and sputum headspace. GraphPad Prism 8 was also used to plot, for each of the selected background features, the ratio of the average normalised chromatographic peak area in sputum headspace to that in control headspace samples.
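For readers unfamiliar with the Kruskal-Wallis test, the H statistic can be computed from pooled ranks as sketched below. This is a standard-library illustration with the usual tie correction, not a reproduction of the GraphPad Prism implementation; the example groups are invented.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)

    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * weighted_rank_sums - 3 * (n + 1)

    tie_counts = {}
    for value in pooled:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    tie_term = sum(count ** 3 - count for count in tie_counts.values())
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction if correction > 0 else float("nan")


# Three invented, fully separated groups of three observations each.
h_statistic = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h_statistic)
```

The p value is then obtained by referring H to a chi-square distribution with k − 1 degrees of freedom (k being the number of groups), or to exact tables when the groups are small.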
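The z-score standardisation and the linear fit against sampling time can be sketched as follows. The sketch assumes the sample standard deviation (n − 1 denominator) and codes the time points as hours of the day; both are assumptions, and the abundance values are invented.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise values to zero mean and unit (sample) standard deviation."""
    centre, spread = mean(values), stdev(values)
    return [(value - centre) / spread for value in values]


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented normalised abundances of one feature at 11am .. 3pm.
hours = [11, 12, 13, 14, 15]
abundance = [0.8, 1.0, 1.3, 1.5, 1.7]

scores = z_scores(abundance)
slope, intercept = linear_fit(hours, scores)
outside_limits = [s for s in scores if abs(s) > 1.96]
print(scores, slope, outside_limits)
```

A positive slope with all z-scores inside ±1.96 corresponds to the pattern reported for benzothiazole: a linear trend across the day that stays within the expected range of variation.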
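The enrichment of a background feature in sputum headspace relative to its blank control reduces to a ratio of mean normalised peak areas. The sketch below uses invented values and an illustrative function name; a ratio above 1 indicates enrichment in sputum headspace and a ratio below 1 indicates depletion.

```python
from statistics import mean


def enrichment_ratio(sputum_areas, control_areas):
    """Ratio of mean normalised peak area in sputum vs control headspace."""
    control_mean = mean(control_areas)
    if control_mean == 0:
        raise ValueError("feature not detected in control headspace")
    return mean(sputum_areas) / control_mean


# Invented normalised peak areas for one feature.
sputum = [0.9, 1.2, 0.9]
control = [0.3, 0.4, 0.3]
ratio = enrichment_ratio(sputum, control)
print(round(ratio, 2))
```

Because the ratio is built from means, it is sensitive to single high-abundance samples, which is one reason it is reported alongside the t test rather than in place of it.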

Results
For the intra- and inter-day background headspace variation studies, the GC-MS data analysis of all the headspace background samples generated a matrix containing the peak area values of 1120 chromatographic features across the headspace samples collected at each time point (11am to 3pm) in a day, on each day (Monday-Friday) in a week and in all the replicates, resulting in a matrix of 1120 feature rows × 35 sample columns. As shown in figure 2(a), 37 features were present in at least 80% of the 35 background headspace samples and were considered in this study as the most representative constituents of the background signal generated from the headspace sampling system. The 80% cut-off was arbitrarily chosen to select those features stably recurrent in background headspace samples.
The remaining features were present in less than 80% of the background headspace samples; their sporadic appearance in the headspace of blank samples suggested that, although representing part of the background signal, they probably originated from sources external to the headspace sampling system. In order to estimate the variability of the background signal generated from the sampling system, we referred to the most consistently identified features.
The PCA results for the data matrix of 37 features in all the background headspace samples are summarised in the 3D scatter plot in figure 2(b). The total proportion of variance explained by the three principal components was 93.9%. PCA identified that features from the intra-day study clustered around components 1 and 2, whereas, features from the inter-days study clustered around components 2 and 3.
In table 1 the identities of the selected 37 chromatographic features are reported both as automatically generated AnalyzerPro labels, consisting of the compound's Kovats RI and first and second quantifier ions (m/z), and as NIST 2011 library matches; the features whose identification was confirmed by TD-GC-MS analysis of standards are marked in the third column. Consultation of the Human Metabolome Database (HMDB) identified references in the literature that reported the listed background chromatographic features as potential disease biomarkers; many of them were identified in breath and in saliva, faeces and urine headspace as potential biomarkers for asthma, several chronic gastrointestinal diseases and cancer [22][23][24][25][26][27][28][29].
To study the intra-day and inter-day variations of the background signal, 13 chromatographic features were selected from this table because they were ubiquitously present in all the headspace samples collected at each time point in a day, on each day in a week and in all the replicates, with a peak area value above or equal to the average peak area of the features found in at least 80% of the headspace samples.
For the study of the intra-day variations of the background signal, the PCA in figure 3(a), which was performed on the abundance values of the 1120 background features, showed samples grouping according to their collection time in a day; the score plot showed that the samples collected at 11am and 3pm were outliers, whereas the other samples were not all loaded on PC1 primarily.
The results of the statistical analysis of intra-day background signal variations at different time points are shown in the bar plot in figure 3(b); the normalised peak area values of the 13 selected background features were plotted at five different time points in a day. The analysis of variance showed a significant difference in intra-day abundance for benzothiazole only, between 11am and 3pm (11am-3pm; α = 0.0034). However, the fold change in benzothiazole abundance between 11am and 3pm was small (2.13). A graphical visualisation of the abundance variation of the selected components across the different time points in a day is shown in figure 3(c); the peak area values, reported as z-scores, showed that the compound abundance variations were enclosed within ±1.96 SD.
For the background VOC inter-day variation study, the PCA in figure 4(a), based on the abundance values of all 1120 features, showed the 15 background headspace samples grouped according to the day of the week on which they were collected. The results of the statistical analysis of the inter-day variation of background headspace features are reported in figure 4(b); no statistically significant difference in the abundance values of the 13 selected chromatographic features was found between different weekdays. The inter-day variations of the abundance values, reported as z-scores, were enclosed within the ±1.96 range, as shown in figure 4(c). To test whether the abundance of the selected background chromatographic features changed in relation to the sampling times (the time points in a day and the days in a week), a linear regression was fitted to the z-score plots of both studies; each feature was plotted against each time point in a day and each day in a week; the only significant linear relation was observed for benzothiazole between 11am and 3pm.
The analysis of the contribution of 11 of the 13 selected background features to the headspace of 36 severe asthma sputum samples and 36 blank controls is shown in figure 5. The ratio of normalised peak area values between sputum headspace and control headspace was calculated for each background feature, showing that the levels of background volatiles increased in sputum headspace compared to control, with the exception of benzenemethanol, α,α-dimethyl- and pentadecane, whose levels decreased, and undecane and naphthalene, which were not detected in sputum headspace. A statistically significant increase in acetone abundance was found in the headspace of biological samples compared to control headspace samples (p = 0.001958). A threefold increase in styrene abundance was also found in the headspace of sputum samples, followed by a 1.5-fold increase in benzaldehyde levels; p-xylene, butanoic acid methyl ester, acetophenone and benzothiazole showed an increase in abundance of <1.5-fold.

Discussion
The volatile signal in the headspace of a biological sample contains exogenous volatiles that represent the background signal. This study aimed to characterise the background signal generated within a headspace sampling system developed for collecting headspace from sputum samples, to study its intra- and inter-day variations, and to evaluate the background signal contribution to the headspace of sputum samples collected from severe asthmatic patients. The sampling system consisted of a small PEEK vessel, an empty Petri dish and a supply of pure air. To characterise the background signal and study its intra- and inter-day abundance variations, background headspace samples were collected in quadruplicate at five time points in a day and in triplicate on five weekdays.
We have made three key observations in this study. Firstly, the most abundant features found in the background of the headspace capture system did not (with the exception of a single feature) show significant variations within and between days, suggesting that routine acquisition of samples in clinical studies may not necessarily need to be standardised to a particular time or day (for the time windows evaluated in this study). This assertion is only likely to be valid for features that are abundant in the majority of samples, as reported here, because clustering according to time of day and day of the week was observed when PCA was applied to all 1120 quantified features.
Secondly, we identified that many of the most abundant features in the background system are widely reported as disease biomarkers in inflammatory disorders such as asthma and inflammatory bowel disease. Although we noted that the features found in the headspace of blank samples originate from exogenous sources, this does not exclude them as putative biomarkers in breath or in the headspace of biological samples. An exogenous compound present in the inspired air can also be produced endogenously and released in alveolar breath; this may complicate the unpicking of endogenously derived components as biomarkers. For example, although in this study acetone represents one of the most abundant background features in the headspace of blank samples, it is also commonly produced endogenously in the human body and released in breath and in sputum headspace. Acetone abundance variations in breath can be associated with pathological processes such as diabetic ketoacidosis [35] and have also been described as disease biomarkers in asthma [20]. The acetone detected in this study showed a higher concentration in the headspace of sputum samples than in the headspace of blank samples, as shown in figure 5.
There are also exogenous compounds in the inspired air which are degraded and/or excreted by the body and which may therefore be present at low concentration in alveolar breath unless a pathological condition affecting their metabolism occurs. For this reason some exogenous compounds, normal constituents of the background signal in headspace and breath samples resulting from dietary or environmental exposure [36], may represent potential disease biomarkers. For example, limonene is a compound present in at least 80% of the headspace samples collected from the sampling system in our study; although it is an exogenous compound, it is absorbed by the human body and metabolised in the liver, and in the case of liver dysfunction it is released in breath. Ferrandino et al reported limonene as a putative volatile biomarker of cirrhosis and hepatocellular carcinoma (HCC), showing a correlation between limonene levels in breath and cirrhosis-induced HCC compared to healthy controls [37].
Finally, we demonstrated that several background VOCs previously described as asthma biomarkers are present and enriched in the headspace of severe asthmatic sputum samples (e.g. acetone and acetophenone), whereas other putative asthma biomarkers were not enriched in sputum headspace compared to background samples, e.g. naphthalene (which in this case was undetected in sputum headspace).
The majority of the 1120 identified background features were found in a small percentage of the total number of background headspace samples (figure 2(a)); therefore a group of 37 features, which were the most frequently identified among all the headspace samples, was selected as representative of the background signal generated from the headspace sampling system and in absence of a biological sample.
The HMDB consultation for the 37 background features in table 1 returned many studies in which some of these background volatiles, together with other compounds, were identified in the headspace of biological samples and in breath as part of disease biomarker panels. Background features identified in this study, such as butanoic acid methyl ester, p-xylene, styrene and octanal, were detected in the headspace of faeces samples as potential biomarkers of ulcerative colitis [27]; phenol, octanal, nonanal, pentadecane and benzene were identified in saliva headspace samples as potential biomarkers of celiac disease [26], while tetradecane, decane and nonanal were reported in a panel of potential breath biomarkers of allergic asthma [31]. The background features identified in the headspace of blank samples may originate selectively from the different components of the headspace sampling system.
The vessel used for the headspace collection is made of PEEK, a high-performance semi-crystalline engineering thermoplastic. PEEK's aromatic backbone, which comprises the bulk of its monomer unit and gives PEEK its excellent thermal properties, could be, for instance, the source of aromatic components; moreover, PEEK can release propene (not observed in this study) and p-xylene [38], which was widely detected in the headspace of blank samples.
The flexible tube that links the vessel to the gas cylinder and the sorbent tube holders are made of polytetrafluoroethylene, the most common man-made chemical belonging to the Teflon® group, which could be responsible for the presence in the headspace samples of fluorinated components such as 1,1-difluoroethane, but also of alcohols such as 2-ethyl-1-hexanol [38].
However, both PEEK and Teflon®, commonly used to make flexible tubing as part of the headspace collection system, are reported in literature as the most stable materials when comparing their volatilome to the one produced by other substances such as the polyvinyl chloride, nylon, silicone and polyethylene [39].
The polystyrene Petri dishes used in this study can release styrene, while the active carbon filter contained in sorbent tubes can be the source of hydrocarbons [40], which is the reason why the sorbent tubes are conditioned in nitrogen previously their use; besides, the oxidation of the TENAX TA sorbent can release benzaldehyde [41].
All these observations stress the importance of a good knowledge of the background signal generated from the collecting system in headspace samples and highlight the potential impact of the background signal and its abundance variations on hypothetical panels of potential disease biomarkers.
This study demonstrated that, although the 13 selected background features showed wide intra-day and inter-day variation in abundance, these changes were not statistically significant. It therefore seems unlikely that the selected background features would have a relevant impact on the composition of a potential panel of biomarkers developed from headspace samples. Only benzothiazole showed a significant linear intra-day increase in abundance between 11am and 3pm. However, the fold change in benzothiazole normalised peak area between 11am and 3pm was 2.13, indicating only a small numerical increase in abundance across the day, and the variations remained within ±1.96 SD of the mean. These results suggest that the variation in benzothiazole levels is unlikely to have a relevant impact on the development of a potential biomarker signature in the headspace of biological samples. In addition, it is unusual, in the process of biomarker signature development and validation, to rely on a single biomarker for discrimination [42].
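The two quantities used in this argument, the fold change between the first and last time points and the position of each measurement relative to ±1.96 SD of the mean, can be sketched as follows. This is an illustrative calculation only: the peak areas are hypothetical values chosen so that the fold change matches the reported 2.13, and pooling all blanks to obtain the mean and sample SD is an assumption rather than the study's documented procedure.

```python
from statistics import mean, stdev

# Hypothetical normalised peak areas for one feature, two blanks per hour
hourly = {
    "11:00": [0.95, 1.05],
    "12:00": [1.20, 1.30],
    "13:00": [1.55, 1.65],
    "14:00": [1.85, 1.95],
    "15:00": [2.10, 2.16],
}

# Fold change: mean abundance at 3pm divided by mean abundance at 11am
fold_change = mean(hourly["15:00"]) / mean(hourly["11:00"])

# z-scores relative to the mean and sample SD of all pooled measurements
pooled = [value for values in hourly.values() for value in values]
centre, spread = mean(pooled), stdev(pooled)
z_scores = [(value - centre) / spread for value in pooled]

# True when every measurement lies inside the +/-1.96 SD band
within_limits = all(abs(z) <= 1.96 for z in z_scores)
```

With these illustrative numbers the abundance roughly doubles across the day, yet no single measurement leaves the ±1.96 SD band, which is the pattern described for benzothiazole.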
The inter-day abundance variations of the identified features in background headspace samples did not show any relevant differences, and the level of benzothiazole in this experiment did not differ significantly between weekdays (figure 4(b)). Moreover, there was no linear relationship between the z-scores of the abundance values of the selected chromatographic features and the day of the week on which the background headspace samples were collected.
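A check for a linear relationship between z-scores and sampling day can be sketched with an ordinary least-squares fit. The function below is a hypothetical helper, the weekday coding (Monday = 1 to Friday = 5) and the z-scores are illustrative assumptions, and no significance test is included because the statistical test applied in the study is not restated in this section.

```python
from statistics import mean


def linear_fit(x, y):
    """Return least-squares slope, intercept and Pearson r for paired data."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


days = [1, 2, 3, 4, 5]                   # Monday to Friday (assumed coding)
z_by_day = [0.4, -0.6, 0.1, 0.7, -0.6]   # hypothetical z-scores of one feature

slope, intercept, r = linear_fit(days, z_by_day)
```

A slope close to zero together with a weak correlation coefficient, as in this toy series, would be consistent with the absence of a systematic drift across the week.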
The study has a number of limitations. Firstly, the intra-day study only considered a narrow series of time intervals, and it is known that VOCs demonstrate diurnal variation [43]. However, the time intervals and weekdays sampled were chosen because they coincide with the times at which routine clinical samples for headspace analysis would normally be collected. Secondly, we did not localise the background signals to individual components of the sampling system. Future studies will need to evaluate individual system components in order to further minimise background VOCs.
The subsequent study showed that 11 features reported in the background, originating from the headspace sampling system, were present and enriched in the headspace of sputum samples acquired from patients with severe asthma compared with the headspace of control samples (headspace collected in the absence of biological samples) (figure 5). The increased levels of acetone in the headspace of sputum samples were expected, considering that acetone is also an endogenous compound and one of the most abundant volatiles identified in breath samples of both asthmatics and non-asthmatics [25].
Styrene was the second background feature, showing a 3-fold increase in the headspace of sputum samples, although the difference in abundance between sputum and control headspace samples was not statistically significant. Styrene is a volatile monomer used in the production of polymers, copolymers and reinforced plastics, and it has been reported as a cause of occupational asthma; workers exposed to a polyester resin containing styrene showed a worsening of upper airway and asthma symptoms [44]. These results suggest that styrene is not only released from the headspace sampling system but is probably also present in ambient air, taken up by the human body and released in biological samples. The study also revealed a decrease in the abundance of benzenemethanol, α,α-dimethyl-, and of pentadecane in sputum samples compared to controls, but the difference between the two groups was not statistically significant. Undecane and naphthalene were absent from the headspace of sputum samples, even though serum levels of naphthalene have been reported to be significantly higher among asthmatics compared to non-asthmatics [45].
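The notion of enrichment relative to background used here can be illustrated with a simple ratio of mean abundances. This is a minimal sketch under stated assumptions: the use of the mean (rather than, for example, the median), the function name and all abundance values are hypothetical, with the first feature constructed to reproduce a 3-fold increase.

```python
from statistics import mean


def enrichment_ratio(sputum, background):
    """Ratio of mean abundance in sputum headspace to mean background abundance."""
    return mean(sputum) / mean(background)


# Hypothetical normalised peak areas: (sputum headspace, blank headspace)
abundances = {
    "styrene": ([2.8, 3.2, 3.0], [0.9, 1.1, 1.0]),
    "pentadecane": ([0.7, 0.9, 0.8], [1.0, 1.1, 0.9]),
}

ratios = {
    name: enrichment_ratio(sputum, background)
    for name, (sputum, background) in abundances.items()
}

# Ratios above 1 indicate enrichment over background, below 1 a decrease
enriched = [name for name, ratio in ratios.items() if ratio > 1]
```

A ratio alone does not establish significance; as noted above for styrene, a 3-fold increase can still fail to reach statistical significance when the within-group spread is large.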
This research focused exclusively on evaluating the contribution of the background signal to the headspace of sputum samples, using a specific sampling system and a single disease cohort (severe asthmatic patients). In future it would be interesting to evaluate the background signal of other types of headspace collection system, and its contribution to the headspace of sputum samples from other disease cohorts and healthy volunteers, to better understand the specificity of putative biomarkers.
We acknowledge the potential for selection bias given the high proportion of males relative to females in the asthma cohort. However, the acquisition of spontaneous sputum (without hypertonic saline induction) is conditional on patients generating sputum plugs when they are stable (i.e. not exacerbating). This would not be expected to follow population-level asthma gender proportions, as it depends on patients having a bronchitic phenotype of asthma. In addition, a significant proportion of our patients were on anti-IL5 biologics (14/30), and a male predominance has previously been reported among patients with late-onset eosinophilic asthma [46], who usually have corticosteroid-refractory disease requiring biologic therapies. Finally, gender does not appear to affect the overall profile of exhaled VOCs [47].
Moreover, all of our observations are likely to be valid only for features that are abundant in the majority of samples. Further research is therefore needed to study the role and influence of background VOCs that are highly abundant but less frequent (not present in all the headspace samples).

Conclusions
We report the volatilome of a headspace sampling system for in vitro studies and its intra- and inter-day variation. We identified limited variation of abundant chromatographic features and clustering of less abundant features according to sampling time and day. We also demonstrated that many of the volatiles identified in the background of the system have been reported as disease biomarkers. The background signal generated by the headspace sampling system was identified in the headspace of sputum samples from severe asthmatic patients, and its contribution to the biological signal has been reported. Further studies are required to understand how our observations can be used to study the volatile headspace of in vitro samples acquired in clinical biomarker profiling studies.