Bronchoalveolar Lavage Fluid Protein Expression in Acute Respiratory Distress Syndrome Provides Insights into Pathways Activated in Subjects with Different Outcomes

Acute respiratory distress syndrome (ARDS) is associated with high mortality. We sought to identify biological pathways in ARDS that differentiate survivors from non-survivors. We studied bronchoalveolar lavage fluid (BALF) from 36 patients with ARDS (20 survivors, 16 non-survivors). Each sample, obtained within seven days of ARDS onset, was depleted of high abundance proteins and labeled for iTRAQ LC-MS/MS separately. Protein identification and relative quantification was performed employing a target-decoy strategy. A variance weighted t-test was used to identify differential expression. Ingenuity Pathway Analysis was used to determine the canonical pathways that differentiated survivors from non-survivors. We identified 1115 high confidence proteins in the BALF out of which 142 were differentially expressed between survivors and non-survivors. These proteins mapped to multiple pathways distinguishing survivors from non-survivors, including several implicated in lung injury and repair such as coagulation/thrombosis, acute phase response signaling and complement activation. We also identified proteins assigned to fibrosis and ones involved in detoxification of lipid peroxide-mediated oxidative stress to be different in survivors and non-survivors. These results support our previous findings demonstrating early differences in the BALF protein expression in ARDS survivors vs. non-survivors, including proteins that counter oxidative stress and canonical pathways associated with fibrosis.

disease biology that accounts for the variability in ARDS manifestations is critical to improve outcomes for ARDS patients 22 . Sophisticated modeling using latent class analysis of the ARDS Network cohort has identified subphenotypes in ARDS 23 . The subphenotypes with higher plasma concentrations of inflammatory markers are associated with an increased prevalence of sepsis requiring vasopressors and an inadequate response to therapeutic interventions such as fluid restrictive therapy 24 . Understanding differences in the underlying biologic processes that influence survival of ARDS patients early in the course of the illness could promote precision-medicine and exploration of new or existing therapeutic agents to improve survival 22 .
We have previously reported early differences in bronchoalveolar lavage fluid (BALF) between ARDS survivors and non-survivors 25 , although the study was conducted on pooled BALF and with only a single mass-spectrometry (MS) experiment. Here, we sought to identify biological processes and canonical pathways that differ in ARDS survivors and non-survivors by characterizing individual BALF samples using state-of-the-art label-based semi-quantitative protein expression profiling using iTRAQ (isobaric tagging for relative and absolute quantification) labeling in combination with fixed-gene set enrichment analysis. This study represents a critical first step in our effort to identify proteomic molecular endotypes in ARDS and identify potential signaling pathway targets to reduce ARDS-related mortality.

Results
Characteristics of Study Subjects. In this study, we characterized ARDS BALF samples from 20 subjects who survived and 16 subjects that died during their hospital stay. Five patients (3 survivors and 2 non-survivors) had indirect lung injury from either sepsis (n = 4) or pancreatitis (n = 1), whereas 31 cases had direct lung injury such as pneumonia or aspiration pneumonia (17 survivors and 14 non-survivors). Table 1 outlines the risk factors for ARDS and previous pulmonary history for the study subjects. The median time to death after bronchoscopy in the non-survivors was seven days (IQR = 16.5 days). The median time from ARDS onset (defined as the day of intubation) to bronchoscopy was 2 days. The length of intensive care unit (ICU) stay was not significantly different between the two groups; although the length of hospital stay was lower in non-survivors, this did not reach statistical significance (Table 2).
Survivors were younger than non-survivors (mean age 42 vs. 59 years, p = 0.01), but there were no statistically significant differences between the two groups in the time from ARDS onset to BALF collection (ARDS day) or in BALF leukocyte, neutrophil or lymphocyte counts ( Table 2). The global internal standard consisted of BALF from 27 subjects who had respiratory failure from various etiologies, including pleural disease, mediastinal mass, myopathy/polyneuropathy, interstitial lung disease without exacerbation, radiation pneumonitis, infectious pneumonia, congestive heart failure and ARDS (one patient).

ARDS Risk Factor
Direct lung injury  Table S1). The database search using the combined RAW files from all six LC-MS/MS experiments resulted in the identification of 1189 proteins at a local FDR ≤ 5% (Supplemental Table S2, master protein list tab). After manually removing misidentified proteins, reverse matches, contaminants, and proteins that were not completely removed by the high abundance protein depletion column, we retained 1115 unique proteins for further analysis. The fold change for these proteins can be found in Supplemental Table S2.
Among the proteins that were more abundant in survivors were different isoforms of aldehyde dehydrogenase, alcohol dehydrogenase, and aldo-keto reductases (Table 4). Specifically, we observed significantly higher levels of isoform 2 of alcohol dehydrogenase 2, aldo-keto reductase family 1 C1, glutathione peroxidase 3, aldehyde dehydrogenase mitochondrial. There also was a statistical trend towards a higher abundance of glutathione-s-transferase A2 (q-value = 0.08).  Biological Relevance of the Differentially Expressed Proteins. We performed IPA core analysis to identify the canonical pathways which map to the differentially expressed proteins that met a threshold q-value of ≤0.05. The pathways that met the statistical threshold described in the methods (−log [B-H p-value] ≥ 1.3) and the proteins assigned to each canonical pathway are listed in Table 5. Figure 1 shows the degree of enrichment and the number of proteins with increased or decreased abundance for each pathway in survivors compared with non-survivors. The z-score was available for Complement System (1.667), Acute Phase Response Signaling (2.238), LXR/RXR Activation (3.162), Intrinsic Prothrombin Activation Pathway (1.00) and Coagulation System (0.447). Of the pathways listed in Table 5, only pathways that participate in fibrosis (Hepatic Fibrosis/Hepatic Stellate Cell Activation) were represented in an independent core analysis done on proteins that were more abundant in non-survivors than in survivors. Certain pathways such as Complement Activation, Fatty acid alpha oxidation, Histamine Degradation, Oxidative Ethanol degradation were mapped by separate IPA core analysis by the proteins that were high in survivors and also by some proteins that were high in non-survivors (although with limited coverage of the pathway). The remaining canonical pathways were mapped by proteins more abundant in survivors.

ARDS Survivors ARDS Non-survivors p-value A
We also identified proteins with differential expression that are assigned to pathways that participate in fibrosis (Table 4) such as collagen alpha-1(V), insulin-like growth factor binding protein 4, vascular cell adhesion protein 1, fibronectin, and collagen alpha-1(XVIII). Additionally, there were several other proteins that participate in the pathophysiology of fibrosis, although they were not differentially expressed in survivors versus non-survivors. These included monocyte differentiation antigen CD14, angiotensinogen, insulin-like growth factor II, insulin-like growth factor binding protein 3, metalloproteinase inhibitor 1 and 2, 72 kDa type IV collagenase, matrix metalloproteinase-9, and collagens alpha-1(I), alpha-2(I), alpha-1(III), alpha-1(V), and alpha-3 (VI).

Proximity Extension Assay for Proteins Participating in Inflammation.
As a validation of our iTRAQ findings, we selected five proteins identified in the iTRAQ experiments that were also present in the Olink PEA panel. These proteins were growth-regulated alpha protein (P09341-CXCLI), macrophage colony-stimulating factor 1 (P09603), monocyte chemotactic protein 1 (P13500), Protein S100-A12 (P80511), and Interleukin-18 (Q14116). The Olink results were consistent with those of the MS experiments in finding no difference in the protein level between survivors and non-survivors for growth-regulated alpha protein or S100 A-12. Additionally, the Olink analysis found no statistically significant difference in macrophage colony-stimulating factor 1 or monocyte chemotactic protein 1 by study group. This comparison was not possible with MS because high-quality quantitative information was not available; monocyte chemotactic protein 1 was only present in one of the six iTRAQ experiments, and error factor was not available for macrophage colony-stimulating factor 1 in two of the three iTRAQ MS/MS experiments. By MS, Interleukin-18 was identified in the BALF of only 12 ARDS subjects (6 survivors and 6 non-survivors), and although the levels were higher in non-survivors, the difference   did not achieve statistical significance (q-value = 0.22). However, with Olink, Interleukin-18 was identified in all ARDS cases studied except one non-survivor and higher levels were seen in non-survivor compared to survivors (5.8 ± 0.88 vs. 4.9 ± 1.1, p = 0.04).

Discussion
In this study, we successfully applied label based high-resolution protein expression profiling tools to comprehensively characterize BALF from individual cases of ARDS and identified differences between survivors and non-survivors. We also confirmed the finding of a coordinated response in survivors and a fibrotic signature in non-survivors. PEA results provide analyte level validation for a subset of inflammatory proteins that are present in ARDS BALF. Oxidative stress occurs in ARDS due to the underlying pathophysiology, but is exacerbated by mechanical ventilation given the high oxygen fractions in inspired gas that is required to maintain adequate oxygenation. In many critically ill patients, limiting oxidative stress by either restricting oxygen exposure or treating with supplemental antioxidants is beneficial and can improve mortality 26,27 . However, in ARDS the use of antioxidants has produced mixed results. Some studies show improved gas exchange 28 , shorter duration of ventilation 28,29 , and improved APACHE II score 30 while others show no benefit 31,32 . In our prior work, Gene Ontology (GO) enrichment analysis, a form of over-representation analysis of fixed-gene sets, identified proteins more abundant in survivors that mapped to cellular cation homeostasis and iron ion homeostasis 25 and some of these proteins are antioxidant and cytoprotective proteins regulated by nuclear factor, erythroid 2 like (NRF2)-antioxidant response elements (ARE) 33 . In the current study, the IPA core analysis revealed differential expression of several metabolic canonical pathways, such as fatty acid alpha-oxidation and ethanol degradation, represented by many of the same proteins (Table 4), including several that are thought to be enzymes involved in the detoxification of lipid peroxidation derived aldehydes 34 and are regulated by NRF-ARE. These proteins include aldehyde dehydrogenase, alcohol dehydrogenase, aldo-keto reductases, and glutathione-S-transferase, which were found to be more abundant in survivors. Lipid aldehydes such as Malondialdehyde (MDA), hydroxynonenal (HNE), and acrolein are formed in conditions of oxidative stress 35 by oxidation of polyunsaturated fatty acids. These byproducts are highly reactive, forming adducts with DNA or proteins [35][36][37] to impact cellular homoeostasis by inactivating critical enzymes such as Na,K-ATPase 38-41 . The damage due to protein modification can be mitigated by mechanisms that convert the lipid aldehydes into less reactive alcohols by aldo-keto reductases or alcohol dehydrogenases, or to acids by aldehyde dehydrogenase. These reactions can occur spontaneously (phase 1 metabolism) or can be catalyzed by glutathione-S-transferase (phase 2 metabolism). In this study, enzymes that participate in detoxification of the lipid peroxides are higher in survivors suggesting a higher capacity to counteract lipid aldehydes. The fact that proteins that participate in phase 1 and 2 metabolisms of toxic byproducts of peroxidation are more abundant in survivors suggests that the host response to mitigate oxidative stress governs outcomes in ARDS. The precise lipid aldehyde and the specific mechanism will need further characterization for developing specific treatments in the ARDS cases lacking mechanisms to limit oxidant stress.
Similar to our previous findings in which differentially expressed proteins mapped to the GO term 'collagen metabolic processes' 25 , several of the differentially expressed proteins identified in the current analysis mapped to pathways that participate in fibrosis. Although these proteins map to 'hepatic fibrosis' , this is likely due to a better annotation of these proteins in liver fibrosis, and the fibrotic signature in our study represents a fibrotic response in the lung. In a recent autopsy study, pulmonary fibrosis was present in 51% of ARDS cases of pulmonary origin but only in 20% of cases of extrapulmonary origin 17 ; collagen content in the lung is higher in pulmonary than in extrapulmonary ARDS 17,42 . A majority of the cases in our study had direct pulmonary injury as a trigger which could explain why many of our subjects demonstrated a difference in matrix proteins. In conditions associated with pulmonary fibrosis, the extracellular matrix (ECM) and alveolar microenvironment regulate profibrotic genes [43][44][45][46] . Our previous study observed several ECM proteins, including collagen type 1, III and V, mucin 5a, and matrix metalloproteinase 9, to have a higher abundance in non-survivors 25 . In the current study, we identified proteins that participate in the pathophysiology of fibrosis, including ECM proteins such as collagens alpha-1 (XVIII), alpha-1 (I), alpha-2 (I), alpha-1 (III), alpha-1 (V), alpha-3 (VI), and fibronectin 1. Several of these proteins are involved in transforming growth factor beta and insulin-like growth factor-mediated pulmonary fibrosis. Of these, angiotensinogen, collagen alpha-1 (XVIII), collagen alpha-1 (V), fibronectin 1, insulin-like growth factor binding protein 4, and vascular cell adhesion protein 1 had differential expression between survivors and non-survivors. Though speculative, our findings regarding differential protein expression suggest a difference in host response mediated by transforming growth factor beta or insulin-like growth factor leading to early collagen deposition in non-survivors. The exact significance of this finding will need further evaluation. It is thus possible that mediators of fibrosis that increase after disease onset 47 could be detected much earlier in BALF than with histology on lung tissue.
Congruent with the findings from our previous study, the variance weighted fold change showed higher levels of ceruloplasmin, plasminogen, antithrombin III and coagulation factor XII in survivors, whereas the level of club cell secretory protein (uteroglobin) was higher in non-survivors (Table 4). In contrast to our prior study with pooled BALF, we did not find higher moesin levels in non-survivors in this study but did observe that ezrin (a member of the ezrin, radixin and moesin family of proteins) was high in non-survivors. We acknowledge that the levels of some proteins that met the statistical threshold in our prior study -such as thioredoxin and S100-A9 -did not achieve statistical significance in this study; in fact, the direction of change was different between the two studies. It is possible that differences in the respective study populations or the inherent variability in protein levels over the course of illness could account for these differences. Utilizing a uniform time for sample collection from prospectively enrolled subjects would be beneficial in future studies. Despite the observed differences from our previous study mentioned above, the pathways and biologic processes represented by differentially expressed proteins between ARDS survivors and non-survivors found in the current study are consistent with those seen in the previous study with pooled patients 25 . The results of this study should be viewed in light of several limitations. First, the outcome of interest is binary (survival versus non-survival) and can be influenced by some factors not controlled for in our study, such as comorbid conditions. In addition, although there was sufficient BALF to perform MS on individual samples, the overall sample size was relatively limited. Further, the MS method used data-dependent acquisition (DDA) which is biased against low-abundance protein quantification. Future work would benefit from utilizing data-independent acquisition (DIA) such as Sequential Windowed Acquisition of all Theoretical Fragment Ion Mass Spectrometry (SWATH-MS). Also, given the relative heterogeneity between survivors and non-survivors, there may be potential confounders which could have influenced the observed results. Despite these limitations, the current study provides new hypotheses for testing in a larger study cohort. A larger sample size would also permit the use of bootstrapping and other cross-validation tools for modelling phenotypic differences in heterogeneous and complex diseases such as ARDS.
Results from the current study validate several proteins previously reported to be upregulated in survivors in pooled BALF studies 25 . These proteins include plasminogen, antithrombin III, coagulation factor XII, and ceruloplasmin. Although all demonstrated an increase in expression, not all were statistically significant in the current analysis, likely due to the limited sample size in this study. One protein, club-cell secretory protein, was significantly higher among non-survivors, as was observed in our previous study.
This study furthers our knowledge about the differences in the biological processes activated in ARDS survivors and non-survivors. It also identifies potential future research areas, including determining the role of lipid peroxides and lung fibrosis. Given that a variety of conditions can lead to ARDS, optimal treatment of patients may differ by the mechanism active in individual subjects 48 , thus supporting phenotyping of ARDS for personalizing care 23,49 .

Study Population. Eligible subjects consisted of individuals with ARDS as defined by American-European
Consensus Conference (AECC) criteria 50 treated at the University of Minnesota Medical Center between May 2009 and September 2012. This study included 36 subjects who underwent a clinically indicated bronchoscopy within seven days of their ARDS diagnosis and for whom excess BALF was available. The study was designed before the publication of the criteria established by the ARDS Definition Task Force (Berlin definition) 11 . Pooled BALF from 27 subjects who participated in our previous study 51 was used as the control to determine the relative protein abundance and as global internal standards for comparison across the different iTRAQ LC-MS/MS experiments. A standard protocol for bronchoscopy was used for BALF collection 25,51 .
ARDS subjects were divided into two groups: those who survived until hospital discharge (survivors) and those who died prior to discharge (non-survivors). Subjects with a history of HIV or viral hepatitis were excluded from the study.
Sample processing for protein profiling. After collection, all BALF samples were immediately placed on ice and centrifuged at 500 g at 4 °C for 10 minutes within 60 minutes of collection. Cell and debris free supernatant were stored at −80 °C and did not undergo any thaw-freeze cycles until sample processing.
We employed label-based semi-quantitative proteomics using eightplex iTRAQ reagent 52 for our study. iTRAQ multiplexed sets of reagents for quantitative protein analysis place isobaric mass labels at the N-termini and lysine side chains of peptides. All resulting peptides are isobaric and chromatographically indistinguishable, but yield signature of reporter ions following collision induced dissociation that can be used to identify and quantify peptides in a digest mixture.
To compare protein abundance across different LC-MS/MS experiments and as a reference for determining relative protein abundance, we used the global internal standard. In each LC-MS/MS experiment, two iTRAQ reporter ion channels contained the global internal standard. The remaining six channels contained individual study samples (i.e., we performed 6 separate iTRAQ LC-MS/MS experiments to characterize the 36 ARDS cases in the study). The labeling strategy for the 36 BALF samples studied is outlined in Supplemental Table S4. To prevent reporter ion signal (channel) bias, survivor and non-survivor samples were randomly placed in different iTRAQ reporter ion channels in each experiment.
BALF containing at least 8 mg of protein (Bradford reagent, Bio-Rad cat#500-0006) was processed separately employing a protocol previously published with minor modifications 25,51 . BALF was concentrated and desalted using Amicon 3-MWCO filters (Millipore Ireland Ltd, Cork, Ireland), depleted of high abundance proteins (Seppro IgY 14 spin column, Sigma-Aldrich, cat # SEP010) with appropriate buffer exchanges for trypsin digestion and labeling with iTRAQ followed by two-dimensional (2D) LC-MS/MS. The adequacy of the trypsin digestion was confirmed by analysis of 3 µg of the digested peptides with linear trap quadrupole MS (LTQ-MS). Equal amounts of the remaining peptide mixtures within each experiment were labeled with eightplex iTRAQ reagent per the manufacturer's (AB Sciex, Framingham, MA) instructions 51 . 2D LC-MS/MS of the iTRAQ-labeled peptides was conducted as previously described 51 . Data-dependent MS acquisition was performed on a Thermo Scientific LTQ Orbitrap Velos system with higher energy collision induced dissociation (HCD) activation for peptide tandem MS. LC and MS experimental details were the same as previously reported 51 . Database search for protein identification and quantification. RAW files generated directly from the mass spectrometer were imported into Galaxy-P platform 53 for protein identification and quantification 25,51 . Galaxy-P has also been used for proteogenomics analysis [54][55][56] and metaproteomics studies 57,58 . The MGF files were searched against the target-decoy version of Human UniProt database along with the contaminant sequences from the common Repository of Adventitious Proteins (cRAP, http://www.thegpm.org/crap/) (88,304 sequences in total; Date Aug 1, 2014) using ProteinPilot version 4.5. PSPEP-FDR reports and protein and peptide-level summaries were generated within Galaxy-P as previously described [54][55][56][57][58] . The MS proteomics data have been deposited to the ProteomeXchange Consortium 59 via the PRIDE partner repository with the dataset identifier PXD002672.
In each eightplex iTRAQ LC-MS/MS experiment, we used the global internal standard as a reference to determine the relative abundance and labeled two iTRAQ reporter ion channels in order to provide guidance regarding the FDR of quantitative differences (within each experiment). Specifically, the relative abundance of all proteins identified in each experiment when compared using the two reporter ion channels labeled with the global internal standard should be equal to 1; proteins that show a statistically significant difference thus indicate false positives. In our six iTRAQ LC-MS/MS experiments, the number of proteins that had a statistically significant difference and therefore false positive were 8, 15, 53, 11, 25 and 8, respectively, suggesting an adequate quantitative assessment.
The results of multiple iTRAQ LC-MS/MS experiments were aligned to compare protein-levels using Protein Alignment Template vs. 2.00p (AB Sciex) 51,60 . For this alignment, we created a 'reference master list' by performing a database search using RAW files from all six iTRAQ LC-MS/MS experiments. To ensure that the proteins in this list were of high ID quality, a local FDR ≤5% was used as a threshold for protein identification in the reference master list as per the recommendation of the Protein Alignment Template. For the creation of feature tables with quantitative values, the threshold of ≤5% global FDR was used for individual sets for the six-iTRAQ LC-MS/MS experiments. Protein Alignment Template resulted in aligning the ratios (fold change), p-values and error factors for the proteins across replicate experiments by using accession numbers of isoforms within the protein summary and UniProt database. However, these included misidentified proteins (e.g., bovine albumin, pig trypsin, bovine casein), matches to reverse (decoy) protein sequences 61 , the contaminant protein sequences from the common Repository of Adventitious Proteins, and the proteins that were not completely removed by the IgY 14 depletion column or their protein fractions such as immunoglobulin chains. These proteins were manually removed from the final protein list (Supplemental Table S2, removed proteins tab).
To further investigate the difference in the BALF inflammatory proteins in ARDS survivors, we performed Olink proseek proximity extension assay (PEA) 62 . The inflammatory panel of the Olink PEA was performed on 29 un-depleted BALF samples (non-survivors = 14, survivors = 15). BALF was concentrated and desalted using a 3KD MW Amicon centrifugal filter (Millipore, Cork, IRL). Bradford assay (Bio-Rad, USA) determined the protein concentration of samples diluted with Dulbecco's PBS to 0.5ug/uL and an equal amount (0.5 ug) of proteins per the manufacturer's instructions.

Statistics. Identification of differentially expressed proteins between ARDS survivors and non-survivors
involved several steps that were similar to our prior studies with minor modifications 51 . We controlled for multiple comparisons by FDR corrected q-value ≤ 0.05. Log-transformed fold changes for all proteins that were identified in at least two-iTRAQ LC-MS/MS experiments for which error factors were available were compared between ARDS survivors and non-survivors using weighted two-sample t-tests with weights being the inverse variance of the log-transformed fold changes. The Storey method 63 was used to control the FDR. This analysis was carried out using the lower reporter ion channel (113-117) and the higher reporter ion channel (117-121) in Statistical Analysis Software (version 9.3, SAS Institute Inc., Cary, NC).
To gain insight into the biological significance of differentially expressed proteins, we performed functional analysis using Ingenuity Pathway Analysis (IPA ® QIAGEN, Redwood City www.quiagen.com/ingenuty Build 321510 M, Version 21249400). This analysis was performed on proteins with a q-value of ≤ 0.05 as the cutoff for differential expression. IPA core analysis was performed using the difference of the weighted log fold change between survivors and non-survivors. Thus, the proteins that were higher in survivors had a positive value and the proteins that were lower had a negative value. We focused on canonical pathways that met a Benjamini and Hochberg (B-H)-corrected p-value (obtained using the right-tailed Fisher exact test) of ≤0.05 (equivalent to −log [B-H p-value] ≥ 1.3). Additionally, IPA uses the z-score algorithm that informs the activation states of canonical pathways with ability to predict activation, inhibition, no change or inability to predict the activation state 64 . In our study, a positive score predicts activation in ARDS survivors. Several of the pathways in our dataset did not have a z-score assigned to them to predict their activation state.
Study Approval. The University of Minnesota Institutional Review Board (IRB) Human Subjects Committee approved this study (IRB # 0812M5623) and all methods were performed according to the relevant guidelines. Because we utilized excess BALF from clinically indicated bronchoscopies in this study, and as there was minimal risk to study participants, the IRB approved a waiver of consent for this study.