Proteomic characterization of human exhaled breath condensate

To improve biomedical knowledge and to support biomarker discovery studies, it is essential to establish comprehensive proteome maps for human tissues and biofluids, and to make them publicly accessible. In this study, we performed an in-depth proteomics characterization of exhaled breath condensate (EBC), a sample obtained non-invasively by condensation of exhaled air that contains submicron droplets of airway lining fluid. Two pooled samples of EBC, each obtained from 10 healthy donors, were processed using a straightforward protocol based on sample lyophilization, in-gel digestion and liquid chromatography tandem-mass spectrometry analysis. Two ‘technical’ control samples were processed in parallel to the pooled samples to correct for exogenous protein contamination. A total of 229 unique proteins were identified in EBC among which 153 proteins were detected in both EBC pooled samples. A detailed bioinformatics analysis of these 153 proteins showed that most of the proteins identified corresponded to proteins secreted in the respiratory tract (lung, bronchi). Eight proteins were salivary proteins. Our dataset is described and has been made accessible through the ProteomeXchange database (dataset identifier: PXD007591) and is expected to be useful for future MS-based biomarker studies using EBC as the diagnostic specimen.


Introduction
Exhaled breath condensate (EBC) is a biological sample collected by condensing droplets of airway lining fluid present in the exhaled air. It is a highly diluted matrix containing diverse components including salts, phospholipids, metabolites, proteins and inhaled particles such as carbonaceous and metal nanoparticles [1]. EBC collection and physicochemical characterization are the focus of growing research efforts to explore pathophysiological changes and identify new biomarkers for toxic exposure [2], respiratory diseases [3,4] and systemic diseases [5]. In this context, the Task Force of the European Respiratory Society has recently published guidelines and recommendations to standardize sample collection and to evaluate technical approaches targeting various analytes in exhaled breath [6]. In the field of proteomics, a few investigations using surface-enhanced laser desorption/ionization (SELDI) mass spectrometry profiling, two-dimensional electrophoresis and/or liquid chromatography tandem-mass spectrometry analysis (LC-MS/MS) have attempted to characterize the protein content of EBC and its modifications in specific situations [7][8][9]. Some of these studies revealed potential protein biomarkers for asthma [10], chronic obstructive pulmonary disease (COPD) [11,12] and lung cancer [13,14]. However, improvements in sampling and analytical procedures are still required to achieve sensitive and comprehensive proteomics characterization of EBC [6,15].
Although easily collected by a non-invasive technique, EBC is difficult to handle for proteomics analysis as it is extremely diluted (protein concentration <1 μg ml⁻¹) and because it contains surfactant phospholipids. All previous studies used methods for protein concentration and phospholipid removal, considering them essential for in-depth characterization of the EBC proteome. Moreover, most experiments were performed using pooled EBC samples to improve the detection of low abundance proteins (near the detection limit in individual samples) and enhance the depth of proteome coverage. In 2012, Bredberg et al [7] reported the identification of 32 and 116 proteins in exhaled air endogenous particles (PEx) using LC-MS/MS analysis of pooled samples from six and 10 healthy donors, respectively. Exhaled endogenous particles were collected on a specific device and were concentrated using silicon plates before trypsin digestion and LC-MS/MS analysis. Notably, the authors included a negative control to correct for non-specific protein identification. In 2015, Mucilli et al [9] identified 167 proteins in EBC based on LC-MS/MS analysis of a single lyophilized EBC pool collected from nine healthy donors; nine out of the 10 most abundant proteins identified were cytokeratins [9]. More recently, in the context of a lung cancer biomarker discovery study, Lopez-Sanchez et al [14] collected 49 EBC samples from healthy donors and identified a total of 123 proteins in these EBC specimens based on sample lyophilization, in-solution digestion and LC-MS/MS analysis.
In line with international initiatives that streamline and coordinate efforts in the field of exhaled biomarkers [6], we undertook this study to extend the knowledge of EBC proteome composition and to assess the risk of contamination associated with EBC sample collection and processing. To do this, we performed an in-depth nanoLC-MS/MS analysis of two pooled EBC samples, each of which corresponded to exhalate from 10 healthy donors. Pooled EBC samples were collected using the RTube commercial device, lyophilized, digested in-gel with trypsin and finally submitted to nanoLC-MS/MS analysis. Based on a rigorous procedure to exclude technical contaminants, 153 unique proteins were reliably identified in both EBC pools.

Materials and methods
EBC collection and preparation
EBC was collected from 20 healthy non-smoking volunteers (seven men and 13 women, mean age: 36±10 years) with no known significant health problems (systemic or respiratory disease) and no symptoms of respiratory tract infection. The RTube© collection device (Respiratory Research Inc., USA) was used to collect EBC samples as previously described [2]. Volunteers breathed normally into the pre-cooled (−20°C) device for 15 min, using a nose clip to prevent nasal inhalation and exhalation. For each volunteer, the collected sample corresponded to 120 l of exhaled breath condensed in a final volume of 1.5-2 ml [16]. Samples were immediately frozen, dried by lyophilization (−47°C, 9 kPa, 12 h) and stored at −80°C. During the EBC sampling procedure, gloves and gowns were used to minimize keratin contamination.

SDS-PAGE and in-gel trypsin digestion
Individual samples were combined to constitute two pools of 10 EBC samples each (characteristics of the subjects included in each pool are presented in supplemental table 1, available online at stacks.iop.org/JBR/12/021001/mmedia). To produce each pool, 25 μl of Laemmli buffer (glycerol, β-mercaptoethanol, SDS, bromophenol blue (1%), Tris-Cl pH 6.8) was added to the first dried sample (sample 1), which was then centrifuged at 800 g, 4°C, for 1 min. Sample 1 was then pipetted and added to dried sample 2. These steps were repeated until 10 samples had been combined. Proteins from pooled samples were stacked on the top of a precast polyacrylamide gel (NuPAGE™ 4%-12% bis-Tris protein gel, Invitrogen) and revealed by Coomassie blue staining. Gel pieces containing EBC proteins were manually excised and proteins were digested in-gel with trypsin as previously described [17]. Two control samples (distilled water) were included and processed in parallel with the EBC pools as blanks to allow monitoring for protein contamination occurring during the pre-analytical procedure. Peptide digests were resolubilized in 25 μl of 2% acetonitrile, 0.1% formic acid, and 10 μl was injected into the LC-system.

Mass spectrometry-based proteomic analyses
Peptides resulting from trypsin digestion were analyzed by nanoliquid chromatography combined with tandem-mass spectrometry (Ultimate 3000 coupled to LTQ-Orbitrap Velos Pro, Thermo Scientific) using a 120 min gradient, as previously described [18]. RAW files were processed using MaxQuant [19] version 1.5.3.30. Spectra were searched against the SwissProt database (Homo sapiens taxonomy, December 2015 version) and the pig trypsin sequence. Trypsin was chosen as the enzyme and two missed cleavages were allowed. Precursor mass error tolerances were set at 20 and 4.5 ppm for first and main searches, respectively. Fragment mass error tolerance was set at 0.5 Da. Peptide modifications allowed during the search were: carbamidomethylation (C, fixed), acetyl (Protein N-term, variable) and oxidation (M, variable). Minimum peptide length was set to seven amino acids. Minimum number of peptides, razor+unique peptides and unique peptides were all set to 1. Maximum false discovery rates (FDR), calculated by employing a reverse database strategy, were set to 0.01 at peptide and protein levels. Intensity-based absolute quantification (iBAQ) [20] values were calculated from MS intensities of unique+razor peptides. Proteins identified in the reverse database and trypsin were discarded from the list of proteins identified. LC-MS/MS data (original raw files) have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier: PXD007591 [21].
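To illustrate what the precursor tolerances quoted above mean in absolute terms, the following minimal sketch converts a ppm tolerance into an m/z window. The function name and the example m/z value are assumptions chosen for illustration; this is not part of the MaxQuant processing used in the study.

```python
def ppm_window(mz: float, tol_ppm: float) -> tuple[float, float]:
    """Return the (low, high) m/z bounds for a tolerance expressed in ppm."""
    delta = mz * tol_ppm / 1e6  # absolute tolerance in m/z units
    return mz - delta, mz + delta


# Hypothetical precursor at m/z 800.0 (value chosen for illustration only)
first_search = ppm_window(800.0, 20.0)  # 20 ppm, as in the first search
main_search = ppm_window(800.0, 4.5)    # 4.5 ppm, as in the main search
print(first_search, main_search)
```

Because the tolerance is relative, the absolute window widens in proportion to the precursor m/z.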
Data filtering and mining
Protein contamination is a crucial issue when analyzing EBC. Two types of contamination were considered: (i) technical contamination during sample preparation and (ii) biological contamination by saliva during sample collection. To correct for protein contamination during sample processing, 'technical' control samples (distilled water) were processed and analyzed alongside the two pooled EBC samples. For each protein identified, a minimum of 100-fold enrichment between the pooled EBC sample and its corresponding 'technical' control sample was required for inclusion in the final EBC protein list. Proteins with an enrichment ratio below 100 were considered as technical contaminants. To evaluate contamination of EBC with salivary proteins, the expression pattern for each protein identified was examined using the Human Protein Atlas database (http://proteinatlas.org/). Functional analysis of the EBC proteome was performed using Gene Ontology (GO) (http://geneontology.org) enrichment using the ClusterProfiler R package [22]. P-value threshold for enrichment significance was set to 0.05. The lung proteome was considered as background dataset (5469 genes) and was extracted from the Human Protein Atlas according to the following criteria: tissue = 'lung', level (of expression) = 'Medium' or 'High', and Reliability = 'Approved' or 'Supported'.
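The 100-fold enrichment rule can be sketched as follows. This is a minimal illustration, assuming that enrichment is the ratio of a protein's iBAQ value in the EBC pool to its iBAQ value in the matching control; the function name, the handling of proteins absent from the control, and all numerical values are hypothetical.

```python
def passes_enrichment(ibaq_ebc: float, ibaq_control: float,
                      min_fold: float = 100.0) -> bool:
    """True if the protein is at least `min_fold` enriched in EBC vs. its control."""
    if ibaq_ebc <= 0:
        return False  # not detected in the EBC pool
    if ibaq_control <= 0:
        return True   # assumption: absent from the control counts as enriched
    return ibaq_ebc / ibaq_control >= min_fold


# Hypothetical iBAQ values: (EBC pool, matching 'technical' control)
example = {
    "protein_A": (5.0e6, 0.0),     # not detected in the control
    "protein_B": (5.0e6, 1.0e4),   # 500-fold enrichment
    "protein_C": (5.0e6, 2.5e6),   # 2-fold: treated as a technical contaminant
}
kept = sorted(name for name, (ebc, ctrl) in example.items()
              if passes_enrichment(ebc, ctrl))
print(kept)
```

Treating a protein that is undetected in the control as enriched is a design choice of this sketch; the text does not state how such cases were handled.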

Results and discussion
EBC proteome characterization
Two pools of 10 individual EBC samples and two 'technical' control samples were constituted to allow in-depth and reliable characterization of the EBC proteome. Samples were processed as follows: lyophilization, protein concentration using a stacking gel, in-gel digestion with trypsin and analysis of peptide digests by single-shot nanoLC-MS/MS (figure 1). Data processing using 1 significant peptide per protein and an FDR below 1% at the peptide and protein levels led to the identification of 430 proteins in the four samples (supplemental table 2). To extract the 'core' EBC proteome, data were further filtered using more stringent criteria: (i) identification with a minimum of two significant peptides per protein, (ii) minimal iBAQ enrichment of 100-fold between each pooled EBC sample and its corresponding 'technical' control sample. Based on these criteria, we identified a total of 229 unique proteins in the two pooled EBC samples. More precisely, 175 proteins were present in the first pooled EBC sample, 207 in the second sample, and 153 proteins were common to both pools (table 1). The final list of 153 unique proteins identified in the 2 pooled samples was considered as the 'core' proteome of EBC (tables 2 and 3).
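The counts reported above are mutually consistent: 175 + 207 − 153 = 229. The following minimal sketch reproduces this bookkeeping with Python sets. The identifiers are placeholders generated only to match the reported counts, not the actual protein accessions, and the function name is hypothetical.

```python
def summarize_pools(pool1: set[str], pool2: set[str]) -> dict[str, int]:
    """Count proteins per pool, in the intersection ('core') and in the union."""
    return {
        "pool1": len(pool1),
        "pool2": len(pool2),
        "core": len(pool1 & pool2),   # proteins found in both pools
        "total": len(pool1 | pool2),  # proteins found in at least one pool
    }


# Placeholder identifiers sized to match the reported counts
shared = {f"shared_{i}" for i in range(153)}
pool1 = shared | {f"pool1_only_{i}" for i in range(175 - 153)}
pool2 = shared | {f"pool2_only_{i}" for i in range(207 - 153)}

summary = summarize_pools(pool1, pool2)
print(summary)
```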
Importantly, several previous investigations of EBC protein content reported cytokeratins as major constituents of the EBC proteome [9,23]. However, this group of proteins can also be present due to technical contamination during sample processing. In this study, following filtering, 10 cytokeratins were reliably identified as true components of the EBC proteome. A group of 10 other proteins, however, were identified in both 'technical' control samples with an enrichment in EBC samples below the fixed threshold. These proteins were thus considered to be technical contaminants (table 4). Their specific or highly predominant expression in the skin was confirmed using the Human Protein Atlas database [24].
As EBC samples are obtained from air exhaled through the oral cavity, and even though the RTube collection device contained a saliva trap to separate saliva from the exhaled breath, contamination with salivary proteins had to be assessed. Several studies quantified α-amylase activity levels as a means to assess salivary contamination. Alternatively, the EBC proteome can be compared to the salivary proteome, as characterized by Sivadasan et al [25]. However, the origin of proteins identified in both samples is difficult to determine; does it correspond to true overlap or cross-contamination? In this study, we decided to check the expression pattern for each protein of the 'core' EBC proteome using the Human Protein Atlas, which was originally developed as an expression dictionary for all protein-coding genes in human tissues and organs [24], the NextProt database [26] and bibliographic information. We sorted the proteins identified into four different groups: (i) proteins specifically expressed in the salivary glands (n=8), (ii) proteins expressed both in the salivary gland and in other tissues from the respiratory tract (lung, bronchi and nasopharynx) (n=94), (iii) proteins not expressed in the salivary glands and expressed in the respiratory tract (n=49) and (iv) proteins expressed in the tongue, esophagus and skin (n=2) (tables 2 and 3). Interestingly, among the 49 proteins expressed in the respiratory tract only, some are mainly expressed in the upper respiratory parts such as serpin B3 (bronchi, nasopharynx); others are more abundant in the deep lung such as fatty acid-binding protein 5, which is strongly expressed in lung macrophages. Finally, some proteins are expressed all along the respiratory tract, such as cystatin-A.
While the precise contribution of each respiratory compartment to the EBC content is still under discussion [13], our results provide additional evidence that EBC may be representative of all levels of the respiratory tract, including the deep lung, which is a critical target for different toxicants such as nanoparticles.

Functional annotation of the EBC proteome
The list of 145 proteins identified in the two pooled EBC samples (excluding the eight salivary proteins) was submitted to GO-term enrichment analysis [22] to determine functions that were significantly enriched in our EBC proteomic dataset compared to the lung proteome (corresponding to 5469 genes extracted from the Human Protein Atlas). According to this analysis, the main biological processes that were found over-represented in EBC compared to lung were immune system processes, exocytosis and NAD/NADH metabolism (figure 2(A)). Hence, the EBC proteome was found to contain several proteins of the airway mucus including mucin 5B, DMBT1 (deleted in malignant brain tumors 1) protein and alpha-1-antitrypsin [27]. Mucosal secretion prevents adherence of pathogens to the airway epithelial cells and ensures their clearance by the mucociliary escalator, together with inhaled particles. Lysozyme and lactoferrin, which are the two most abundant antibacterial proteins secreted into the respiratory tract, were also identified in our dataset as well as a myriad of proteins secreted by immune system cells [28]. In general, these results demonstrate that EBC constitutes a relevant matrix to study major physiological functions of the respiratory tract, especially mucosal layer secretion, innate and adaptive antimicrobial defense mechanisms and clearance of inhaled particles [28,29].
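Over-representation of a GO term in a protein list relative to a background is commonly assessed with a one-sided hypergeometric test. The sketch below illustrates that calculation in plain Python, assuming this kind of test; it is not the ClusterProfiler code used in the study, and the per-term counts (k and K) are hypothetical. Only the list size (145) and the background size (5469) come from the text.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated, n drawn)."""
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability mass of observing k, k+1, ..., upper annotated proteins
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


N = 5469  # background size (lung proteome, from the text)
n = 145   # EBC protein list size (from the text)
K = 200   # hypothetical: background genes annotated with a given GO term
k = 20    # hypothetical: EBC proteins annotated with that GO term

p_value = hypergeom_enrichment_p(k, n, K, N)
print(p_value < 0.05)  # compared against the 0.05 threshold used in the study
```

This sketch applies no multiple-testing correction; whether and how p-values were adjusted is not stated in the text.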
Comparison with previous studies
Our experimental design and the dataset produced (i.e. the list of 153 proteins identified in both pooled EBC samples including the 8 salivary proteins) were compared to the two most extensive EBC proteome maps previously described for healthy subjects [7,9]. In 2015, Mucilli et al [9] collected EBC from nine nonsmoking volunteer donors using a Turbo DECCS device (Medivac, Italy). Samples were pooled to create a single EBC sample with a final volume of 65 ml (equivalent to 1800 l of exhaled breath). After lyophilization, in-gel digestion and LC-MS/MS analysis, these authors identified 167 proteins (two significant peptides per protein, FDR 1%), 77 of which were also included in our protein list (figure 2(B), supplemental table 2). Unlike our procedure, Mucilli et al [9] omitted a control to assess contamination during sample processing, and the eight most abundant proteins in their dataset were cytokeratins, representing 48% of the total emPAI (exponentially modified protein abundance index) [30].
Another proteomics study was performed in 2012 by Bredberg et al [7] to characterize the protein composition of endogenous particles in exhaled air (PEx). These authors used a specific sampling procedure involving silicon plates. Two pooled samples (obtained from six and 10 subjects with forced exhalation) and a negative control (sampling device exposed to ambient air and processed in parallel with the two pooled samples) were analyzed by LC-MS/MS after in-gel digestion. This analysis identified 124 proteins from the two pooled samples, but only 24 proteins were shared by both pools, as a result of the high variability of PEx sample collection. Among the 124 proteins identified in at least one pooled sample, 36 were also identified in our dataset (figure 2(B), supplemental table 2). As already discussed by Mucilli et al [9], these data demonstrate that the sampling method can influence the protein composition of the collected samples. For instance, in 2012, a PEx sampling technique described by Larsson et al [31] was shown to be more efficient in collecting albumin and surfactant protein A than classical EBC collection. Accordingly, no surfactant protein was identified by Mucilli et al [9] and we could detect surfactant protein A in the second EBC pool only (supplemental table 2).
Importantly, 59 proteins from our dataset were identified in neither of these previous studies. A complementary analysis using GO-term annotation [32] showed that these 59 proteins have the same functional distribution between the different proteomic datasets (figure 2(C)). Altogether, these data demonstrate that our analytical procedure did not enrich a specific subproteome but merely extended the coverage of the EBC proteome. The use of a 2 h LC gradient most likely improved peptide distribution throughout MS/MS analysis and enabled the identification of these novel EBC proteins.
Biomedical potential of EBC proteome
As a non-invasive specimen, EBC could be used for biomarker discovery and analysis. In line with these potential applications, comparative proteomics studies identified biomarker candidates for a variety of pulmonary diseases, including COPD [11,12], asthma [10], pulmonary emphysema with α-1-antitrypsin deficiency [8] and lung cancer [13,14]. In agreement with these studies, some of these biomarker candidates (such as α-1-antitrypsin, hornerin, cytokeratins 6A and 6B) were identified in our EBC proteomics dataset. However, our study also identified 10 proteins with high abundance in the two 'technical' control samples, including dermcidin, which was recently selected as a potential biomarker for lung cancer in EBC [14]. The expression pattern for dermcidin may have been modified by tumorigenesis processes (in healthy individuals, dermcidin is not expressed in the respiratory tract), but its presence might also be a technical artefact. This result emphasizes the importance of reliable reference proteome datasets to support clinical biomarker studies [10,15] and occupational health monitoring of workers exposed to engineered nanoparticles [33].
Most published investigations of the EBC proteome were performed using pooled and lyophilized samples to counteract dilution and favor the detection of low-abundance proteins. However, pooling of EBC samples precludes the evaluation of biological variability, which is known to be influenced by age, gender, height and other factors [6,34]. In our study, we optimized a straightforward analytical procedure based on sample lyophilization, in-gel digestion and nanoLC-MS/MS analysis to characterize EBC specimens. Interestingly, only 40% of each of the peptide digests obtained from 10 healthy subjects was required for injection into the liquid chromatography system before MS/MS analysis. This opens the possibility of working with larger sample cohorts at the individual scale, using shotgun LC-MS/MS or, better still, targeted proteomics approaches.
Recently, shotgun nanoLC-MS/MS experiments were performed at individual scale using EBC samples from 49 healthy volunteers [14]. However, after sample concentration and digestion, very few proteins were identified (an average of 13 proteins per EBC sample), illustrating the difficulty of processing submicrogram protein amounts and achieving in-depth proteome characterization. In this context, targeted proteomics methods such as selected reaction monitoring (SRM) [35] appear extremely promising. SRM, also referred to as multiple reaction monitoring, is a highly selective MS-based technique that overcomes some limitations of untargeted LC-MS/MS methods. SRM analyses offer the unique possibility to specifically and simultaneously monitor the signatures (so-called SRM transitions) of hundreds of preselected peptides generated by protein digestion. Due to its high selectivity, SRM methodology is inherently more sensitive than MS/MS and is especially adapted to the detection of low-abundance proteins in biological matrices. In addition, when combined with isotope-dilution quantification standards, SRM experiments can provide quantitative data for each protein targeted. Proteins identified from untargeted LC-MS/MS analyses of EBC pools are therefore likely to be detectable and quantifiable at individual scale using SRM approaches.

Conclusion
Over the last decade, significant advances in MS-based proteomics instrumentation and methodologies have supported the establishment of comprehensive proteomics maps for human tissues and biofluids. These characterization efforts were sustained by several international research initiatives, such as the Human Proteome Project (HPP) [36][37][38]. Reliable proteomics surveys, most of which were acquired by LC-MS/MS, are now available for human tissues and biofluids in public repositories. Simultaneously, the European Respiratory Society and the American Thoracic Society have provided recommendations and guidelines to increase the reliability and comparability of exhaled biomarker studies [6]. As a contribution to this field, we performed an in-depth and reliable characterization of the EBC proteome for healthy subjects, taking into account potential exogenous (technical) and endogenous (salivary) sources of protein contaminants. We expect this dataset to support future clinical studies dedicated to the discovery of novel protein biomarkers for pulmonary diseases and toxic exposure.