Metagenomic Characterization of The Tracheobronchial Microbiome in Lung Cancer

Background The tracheobronchial and oral microbiome may be associated with lung cancer, potentially acting as predictive biomarkers. Therefore, we studied the lung and oral bacteriome and virome in non-small cell lung cancer (NSCLC) patients compared to melanoma controls to discover distinguishable features. In this pilot case-control study, we recruited ten patients with resectable NSCLC (cases) and ten age-matched melanoma patients (controls), all of whom underwent tumor resection. Preoperative oral gargles were collected from both groups, who then underwent transbronchoscopic tracheal lavage after intubation. Lung tumor and adjacent non-neoplastic lung were sterilely collected after resection. Microbial DNA from all specimens underwent 16S rRNA gene sequencing. Lavage and gargle specimens also underwent whole-genome shotgun sequencing. Microbiome metrics were calculated to compare both cohorts. T-tests and Wilcoxon rank sum tests were used to test for significant differences in alpha diversity between cohorts. PERMANOVA was used to compare beta diversity. No clear differences were found in the microbial community structure of case and control gargles, but beta diversity of case and control lavages significantly differed. Two species, Granulicatella adiacens and Neisseria subflava, appeared in higher abundance in case versus control lavages. Case lavages also maintained higher relative abundances of oral commensals compared to controls.


Background
Lung cancer is the most frequent cancer worldwide and the most common cause of cancer deaths, with 1.8 million deaths in 2020. (1) In the U.S., lung cancer has the second highest incidence rate among both males and females, but it is the most common cause of cancer death among both sexes.(2) An estimated 135,720 Americans died from lung cancer in 2020, exceeding the number of deaths expected from colon, breast, and prostate cancers combined.(3) Despite enormous research and treatment efforts, the high fatality rate of this malignancy (82%) has changed little over the last few decades.(4) Furthermore, delayed diagnosis of lung cancer continues, with 85% of cases not being recognized until later stages, contributing to the high mortality rate associated with this disease.(5) Screening chest computed tomography offers the opportunity to discover earlier stage disease in high-risk individuals but, despite its ready availability, is underutilized, with only 3.9% of eligible people obtaining a scan.(6) Therefore, exploration of potential biomarkers of this disease is warranted.
As the affordability of next generation sequencing techniques improves, the microbiome, or the collective genomic material of all microorganisms found within and on the body, is increasingly being investigated for associations with disease and potential therapeutic value. Most research has focused on the gut microbiome, the largest and most diverse microbiome in the human body, with relatively little investigation of microbiota of other anatomic sites. Until recently, the lungs were considered sterile, but evidence indicates this organ is indeed colonized by commensal microbes, including Acinetobacter, Pseudomonas and Ralstonia. (7) Furthermore, composition and function of the microbiota in lung tissue are distinct from other anatomic sites, including the oral cavity. (7) Recent research has further shown associations between the local lung microbiome and various lung pathologies, such as asthma, cystic fibrosis, and chronic obstructive pulmonary disease (COPD).(8, 9) Additionally, hypotheses regarding an association between the lung microbiome and lung cancer, potentially mediated by inflammation, have been suggested. (9) Nevertheless, relatively little research on the lung microbiome in the context of lung cancer has been conducted.
Given the potential for the lung microbiome to be associated with lung cancer and to be utilized as a biomarker, this study aimed to characterize the lung and oral bacteriomes and viromes in early-stage non-small cell lung cancer (NSCLC) patients compared to melanoma controls to discover potentially distinguishable features in the compositions of the oral, tracheal and tumor microbiomes of NSCLC patients. Those results may allow assessment of the potential for minimally invasive samples to act as proxies for the tumor microbiome. The most immediate benefit of finding a microbial "signature" for lung cancer is the possibility of developing a reliable screening technique for detection of high-risk individuals with early cancers.

Patients
This prospective, exploratory case-control study recruited ten early-stage NSCLC patients and ten control melanoma patients undergoing surgical resection of their tumor under general anesthesia at Moffitt Cancer Center between July 2015 and May 2016. Melanoma patients were chosen as controls because this cancer type did not show clear evidence of microbial etiology and patients were already undergoing anesthesia with intubation for major resection of extremity melanomas. Lung cancer cases and melanoma controls were matched by age (±10 years) and smoking status (current/former versus never smokers). Eligible participants were at least 21 years of age, mentally competent, not pregnant, and had received no chemotherapy within 1 year of surgery. Furthermore, participants could not have postobstructive pneumonitis, current pneumonitis, purulent bronchitis, other acute respiratory infections, cystic fibrosis, clinically significant bronchiectasis, other inflammatory or fibrotic lung diseases, chronic or current corticosteroid use, antimicrobial therapy within one month, or prebiotics/probiotics within 3 months of surgery. This study was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments and was approved by the Liberty Institutional Review Board, Protocol 14.12.0036 (MCC 17976). Informed consent was obtained from all participants.
[...] Inc., Dublin, OH), transported on ice to the laboratory, and processed by centrifugation at 3,000 x g for 15 minutes at 4°C to separate supernatant and cell pellet. 3.2mL of supernatant was divided between two cryovials. Cell pellets were resuspended in 1.2mL of sterile PBS and aliquoted evenly between two cryovials, which were snap-frozen in liquid nitrogen (LN) and stored at -80°C.

Oral Gargle Samples
We also collected oral gargles from cases and controls in the preoperative area. Participants vigorously swished and gargled 15mL of disinfectant-free mouthwash for 15 seconds and then expectorated it into a sterile 50mL conical tube. Specimens were centrifuged according to the same parameters as lavages. We divided 3.2mL of supernatant between two cryovials. The cell pellet was re-suspended in 20mL of PBS and centrifuged again at the same speed, duration, and temperature. The final cell pellet was re-suspended in 1.2mL of PBS and divided into two 0.6mL aliquots that were stored at -80°C.

Tissue Samples
Only lung cancer patients provided tumor and adjacent non-neoplastic lung tissue specimens. Immediately after resection, LAR, while wearing a mask, removed 1 cm³ from the tumor using sterile instruments in a sterile field in the frozen section room. A similar-sized, non-neoplastic lung specimen was also harvested in the same manner at a distance from the tumor. Tissue specimens were transported to the lab and snap frozen in LN before undergoing macrodissection and long-term storage at -80°C.

DNA extraction
Microbial DNA was extracted from all sample types. The MoBio® PowerSoil DNA isolation kit (Qiagen, Germantown, MD) was utilized, in a modified protocol, to extract bacterial DNA from 0.6mL cell pellets from lavages and gargles.
Briefly, cell pellets were vortexed and spun down until the sample collected at the bottom of the tube. The sample was then added to a bead beating tube with buffer and processed in the MP-Bio Fastprep™ 5G (MP Biomedicals, Irvine, CA) for 2 cycles of 30 seconds each at 6 m/s. Samples were centrifuged at 10,000 x g for 30 seconds at room temperature and the resulting supernatant was collected. The supernatant was processed to remove PCR inhibitors and eluted with 100µL of buffer. DNA was quantitated using Qubit and quality checked using Nanodrop.
We used the Qiagen® DNeasy Blood and Tissue kit (Qiagen, Germantown, MD) to isolate DNA from tissue samples according to the manufacturer's protocol. Approximately 25mg or about half of the total tissue volume was utilized.
Briefly, the tissue was added to a bead beating tube containing 360 µL of ATL buffer and 40 µL of proteinase K before being vortexed and incubated in the lytic step. Samples were bead-beat according to the same steps outlined above. Samples were then centrifuged at 20,000 x g for 3 minutes, and the resulting supernatant was further processed and eluted in buffer AE.

16S rRNA Gene Sequencing
All samples underwent 16S rRNA gene sequencing with appropriate controls. Libraries were prepared using standard operating procedures (SOPs) from the Weinstock Lab at the Jackson Laboratory (The Jackson Laboratory, Farmington, CT). Briefly, high-performance liquid chromatography-purified primers and 4ng of DNA template were used to amplify the V1-V3 regions of the 16S rRNA gene. Libraries were screened for size and quantity as described in the SOP, and after pooling, they were quantified by qPCR using the Kapa Library Quantification Kit. The final libraries were sequenced with a 50% PhiX spike-in on an Illumina MiSeq v3 2x300 sequencing run.

Metagenomic Whole Genome Shotgun Sequencing (WGSS)
On DNA isolated from all oral gargles and lung lavages, whole genome shotgun DNA libraries were prepared from 100ng of DNA using the Illumina TruSeq Nano DNA kit following the manufacturer's protocol (Illumina, Inc., San Diego, CA), and sequenced on Illumina NextSeq High Output Kits v2 2x150 to about 80 to 260 million paired-end reads, depending on the percent alignment to microbial species. This method was utilized to resolve bacterial signatures to species level and to identify viral signatures.

Bioinformatics and Statistical Analyses

16S rRNA Sequencing Data Analysis
Paired-end sequencing reads were cleaned using Trimmomatic v. 0.39 (11) with the parameters LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 to remove adaptors and low-quality reads. Treatment samples with a minimum of 2,000 reads were kept for further downstream analysis. Chimeric reads were searched against the 16S rRNA Gold database with the default UCHIME (4.2) parameters. (12) Next, the cleaned reads were merged with PEAR (0.9.10)(13) and operational taxonomic units (OTUs) were generated by open-reference OTU picking in the QIIME 1.9.1 pipeline. (14) Only OTUs with a minimum total observation count of 100 were retained. The database used for taxonomic assignment was Silva 128 97_otus_16S.fasta.(15) Alpha and beta diversity were analyzed using QIIME 1.9.1. The taxonomy plots were based on the 25 most prevalent OTUs. PERMANOVA was used to compare beta diversity estimates.
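The two count thresholds described above (samples with at least 2,000 reads, OTUs with a total observation count of at least 100) can be illustrated with a minimal sketch. This is not the QIIME implementation used in the study. The table layout, the function name `filter_otu_table`, and the order in which the two filters are applied are assumptions made for illustration only.

```python
# Hypothetical OTU table layout: {otu_id: {sample_id: read_count}}.
# The filter order (samples first, then OTUs) is an assumption; the
# text does not state which threshold was applied first.

def filter_otu_table(table, min_sample_reads=2000, min_otu_total=100):
    """Drop low-depth samples, then drop OTUs with a low total count."""
    # Sequencing depth per sample, summed across all OTUs.
    depth = {}
    for counts in table.values():
        for sample, n in counts.items():
            depth[sample] = depth.get(sample, 0) + n
    kept_samples = {s for s, d in depth.items() if d >= min_sample_reads}

    filtered = {}
    for otu, counts in table.items():
        kept = {s: n for s, n in counts.items() if s in kept_samples}
        # Retain the OTU only if its total across kept samples is high enough.
        if sum(kept.values()) >= min_otu_total:
            filtered[otu] = kept
    return filtered


toy_table = {
    "otu_a": {"s1": 1900, "s2": 50},
    "otu_b": {"s1": 150, "s2": 30},
    "otu_c": {"s1": 40, "s2": 10},
}
result = filter_otu_table(toy_table)
```

In this toy table, sample `s2` falls below the depth threshold and `otu_c` falls below the OTU threshold, so only `otu_a` and `otu_b` remain, each with counts for `s1` alone.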
Fold differences in the top 25 most abundant microbes, with relative abundances of at least 1% in one comparison group, were calculated by dividing the relative abundance of the microbe in one comparison group by its relative abundance in the other. Similarly, fold differences in the top 25 most prevalent microbes, with prevalence of at least 10% in either comparison group, were calculated and organized into Venn diagrams. Student's two-sample t-test and the Wilcoxon rank sum test were used for differential abundance analysis between cases and controls and between sample types. Two-sided P values <0.05 were considered statistically significant. Statistical analysis was completed using the phyloseq package in R software (v3.1.1 and v4.1.0, The R Foundation, Vienna, Austria).
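The fold-difference calculation can be sketched as follows. This is a simplified illustration, not the phyloseq code used in the study. It assumes that each group's relative abundance is the mean of the per-sample relative abundances, which the text does not specify, and all function names are hypothetical.

```python
# Each sample is a hypothetical dict of {taxon: read_count}.

def relative_abundance(counts):
    """Convert raw counts in one sample to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}


def mean_relative_abundance(samples, taxon):
    """Average a taxon's relative abundance over a group of samples."""
    if not samples:
        return 0.0
    return sum(relative_abundance(s).get(taxon, 0.0) for s in samples) / len(samples)


def fold_difference(group_a, group_b, taxon):
    """Ratio of group-level relative abundance, A over B.

    Returns None when the taxon is absent from group B, since the
    ratio is undefined in that case.
    """
    a = mean_relative_abundance(group_a, taxon)
    b = mean_relative_abundance(group_b, taxon)
    return a / b if b > 0 else None


cases = [{"X": 30, "Y": 70}, {"X": 50, "Y": 50}]
controls = [{"X": 10, "Y": 90}, {"X": 10, "Y": 90}]
fold_x = fold_difference(cases, controls, "X")
```

Here taxon `X` averages 40% of reads in the toy cases and 10% in the toy controls, giving a 4-fold difference.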

WGSS Data Processing Analysis for Taxonomic Classification Methods
The CosmosID platform was used to process WGSS data and perform strain-level taxonomic classification. Briefly, their algorithm disambiguated short sequence reads into discrete genomes. The pipeline combined a pre-computation phase [using the CosmosID taxonomic reference databases containing bacteria, viruses, phages, fungi, virulence markers, and antimicrobial resistance markers curated by CosmosID (CosmosID, Inc., Germantown, MD(16))] with a per-sample computation phase (which searches short sequence reads or contigs from draft de novo assemblies against fingerprint sets) to detect and classify microbial sequencing reads. To exclude false positives, the platform filtered reads using a filtering threshold derived from internal scores determined by analyzing a large number of diverse metagenomes.

Patient Characteristics
All 20 participants (ten NSCLC cases and ten melanoma controls) were Caucasian (Table 1). Cases had a higher percentage of females compared to controls (40% vs. 20%). Most lung cancer patients had stage I disease (80%), while most melanoma controls were advanced stage (50% had stage III disease). The majority of cases (90%) and controls (80%) had not received antibiotics within 2 months prior to their surgery. No significant differences were observed in any of the characteristics measured between cases and controls.
Lung cancer lavages tended to be slightly more diverse compared to controls by Chao1 index, though not significantly (Table 3 and Figure 2C).(18, 19) Beta diversity measured by Bray-Curtis dissimilarity was significantly different, though clear separation between cases versus controls was not observed (Table 3 and Figure 2D).
Whole genome shotgun sequencing: Bacterial species appeared similarly prevalent in cases and controls (Figure S1A). The bacterial species Granulicatella adiacens and Neisseria subflava were more abundant in cases compared to controls by 6.18- and 15.93-fold, respectively (Table 2 and Figure S1B). Several species of Prevotella were more abundant in controls compared to cases. Alpha diversity estimates revealed no consistent pattern and, along with beta diversity, were not significantly different between cases and controls (Table 3 and Figure S1C and S1D).
The virome of the tracheal lavages was also assessed through WGSS, which identified largely similar prevalence and relative abundance between cases and controls (Table 2 and Figure S2A and S2B). Human betaherpesvirus 7 was more abundant in case versus control lavages but was rare. Though not statistically significant, case lavages consistently showed higher viral alpha diversity compared to control lavages (Table 3 and Figure S2C). Beta diversity was not significantly different.

Cases vs. Controls: Oral gargles
16S rRNA gene sequencing: Oral gargles from lung cancer and melanoma patients showed very little difference in terms of prevalence (Figures 1B and 3A). The genus Prevotella was more prevalent in controls (90%) compared to cases (50%), while Granulicatella was identified in all oral gargles from all patients (Figure 3A). In terms of relative abundance, Streptococcus and Prevotella (17) were the most abundant genera in oral gargles from both cases and controls (Figure 3B). Neisseria was much more abundant in controls compared to cases, while the opposite trend was observed for Fusobacterium (almost 2.5-fold higher in cases versus controls). Alpha diversity and beta diversity across all indices showed no significant differences between gargles (Table 3).
Whole genome shotgun sequencing: Streptococcal species, such as S. infantis and S. pseudopneumoniae, appeared heavily prevalent in all oral gargles (Figure S3A). While Neisseria subflava was 2 times more abundant in gargles from lung cancer cases compared to controls and Rothia dentocariosa was more abundant in controls versus cases (Table 2 and Figure S3B), overall, bacterial abundance appeared similar among oral gargle samples. Differences in diversity within samples were not significant. Beta diversity was not different by case or control status (Table 3 and Figure S3D). Bacteriophages appeared more abundant in cases, with Haemophilus phages HP1 and HP2 being identified in over 14-fold and 7-fold higher abundance compared to controls (Table 2 and Figure S4B). Human-tropic viruses were similarly prevalent between cases and controls. Neither alpha nor beta diversity indices demonstrated any significant differences between cases and controls (Table 3 and Figure S4C and S4D).

Cases: Tumor versus Normal (Non-neoplastic) Lung Tissue
16S rRNA gene sequencing: In terms of prevalence, Propionibacterium, Atopobium, and Granulicatella were identified in at least one tumor specimen but not in normal tissues (Figure 1C and Figure S5A). Conversely, Actinomyces was identified in 20% of normal tissue samples but not in tumor tissues. Not considering unclassified taxa, the most abundant genus in both the tumor and normal tissue, albeit rare, was Burkholderia, which was slightly more abundant in tumors (Table 2 and Figure S5B). Alpha and beta diversity were not significantly different between tissue types (Table 3 and Figure S5C and S5D), though normal tissue generally had lower alpha diversity compared to tumor tissue. (20)

Cases: Lavage versus Gargle
16S rRNA gene sequencing: Several genera appeared more prevalent in lavages compared to gargles (Figure 1D and S6A), including Leptotrichia, while genera like Capnocytophaga were more prevalent in gargles. Despite an overall similarity in relative abundances, the genera Streptococcus, Prevotella, and Rothia were more abundant in gargles compared to lavages, whereas Leptotrichia (>5-fold higher abundance) showed the opposite trend. By the Chao1 index, lavages maintained higher diversity compared to gargles (Table 3 and Figure S6C). Bacterial community structures were significantly different between gargles and lavages by both Bray-Curtis dissimilarity and unweighted UniFrac distance (p=0.001).
Whole genome shotgun sequencing: Rothia dentocariosa was 2.7 times more abundant in lavages versus gargles (Table 2 and Figure S7B). Veillonella dispar was much more abundant in gargles, though all 10 gargles had G. adiacens. Interestingly, E. coli was not identified in the top 25 most prevalent species of lavages but was observed in all gargle samples. The relative abundances of bacterial species maintained some notable differences, despite R. mucilaginosa and N. subflava comprising the two most abundant species in both gargles and lavages. [...] Figure S7).
In terms of viral signatures, prevalence appeared different between these two sample types (Figures S8 and S12C). For example, Human parainfluenza virus 3 and respiratory syncytial virus were identified in more lavages (10 and 6, respectively) compared to gargles (6 and 3, respectively). Human gammaherpesvirus 4 and betaherpesvirus 7 were identified in 5 and 6 gargle samples, respectively, but in only 1 lavage specimen. Several non-human, plant and bacterial pathogens were identified in these samples as well. There also appeared to be a much higher proportion of unclassified viral taxa in lung cancer lavages (92.6%) versus gargles (74.6%). Gargles maintained higher relative abundances of Haemophilus phages HP1 and HP2 as well as human betaherpesvirus 7 compared to lavages, though Human parainfluenza virus 3 was more abundant in the latter (Table 2). Alpha diversity by Chao1 was higher in gargles versus lavages (Table 3). Beta diversity differed significantly across Bray-Curtis dissimilarity (p=0.001), weighted UniFrac distance (p=0.004), and unweighted UniFrac distance.

Cases: Lavage versus Tumor
16S rRNA gene sequencing: The genus Burkholderia was more prevalent in tumor tissue compared to lavages (90% versus 40%, respectively), while Granulicatella, Prevotella, Atopobium, and Rothia were just a few of the more prevalent genera found in lavages (Figure S9A and S11A). The tumor tissue mostly contained unclassified organisms but did maintain a higher relative abundance of Burkholderia than the lavage samples (1.4% versus 0.1%, respectively). However, genera like Streptococcus, Fusobacterium, Veillonella, Granulicatella, Neisseria, Leptotrichia, Prevotella, and Rothia, amongst many others, were more abundant in lavages compared to tumor tissues. LEfSe showed that Burkholderia and its associated family were significantly differentially abundant in tumors as compared to lavages (data not shown). A large number of bacterial taxa significantly differentiated lavages from tumors, including Granulicatella, Leptotrichia, Neisseria, Prevotella, and Rothia. Alpha diversity was significantly different between lavages and tumor tissue by all three indices, Shannon (p=0.0005), Simpson (p=0.0246) and Chao1 (p=0.0003), whereby intra-sample diversity was consistently higher in lavages versus tumor tissue (Table 3 and Figure S9C). Similarly, all three beta diversity measures showed that bacterial community structure was significantly different between the sample types (p=0.001 across all three indices) (Figure S9D).
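For readers less familiar with the three alpha diversity indices reported here, the following minimal sketch computes them for a single non-empty sample. It is not the QIIME code used in the study. It assumes the natural logarithm for Shannon (tools differ on the base), the 1 − Σp² form of Simpson, and the bias-corrected form of Chao1. The function names are illustrative.

```python
import math

# Each sample is a hypothetical list of per-taxon read counts.

def shannon(counts, base=math.e):
    """Shannon index H = -sum(p * log(p)); the log base is an assumption."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in props)


def simpson(counts):
    """Simpson index in the 1 - sum(p^2) form (higher means more diverse)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1)),
    where F1 and F2 are the numbers of singleton and doubleton taxa."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


even_sample = [10, 10, 10, 10]   # four equally abundant taxa
sparse_sample = [5, 1, 1, 2]     # two singletons and one doubleton
```

Shannon and Simpson respond to both richness and evenness, whereas Chao1 estimates richness alone by extrapolating from rare taxa, which is one reason the three indices can disagree for the same pair of sample types.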

Cases: Gargle versus Tumor
16S rRNA gene sequencing: Most genera were more abundant in gargles versus tumor tissue (Figure S10 and S11B). A substantially higher number of bacterial taxa were significantly discriminatory between gargles versus tumors, including Granulicatella, Leptotrichia, Neisseria, Prevotella, and Rothia. Alpha diversity was significantly different between these two sample types for both Simpson (p=0.007) and Chao1 (p=0.043), indicating that oral gargles were more diverse in bacterial species richness and evenness compared to tumor tissue (Table 3). Beta diversity was significantly different across all three metrics, showing that these sample types maintain largely disparate community structures.

Although in preclinical models the bacterial composition of the gut microbiome appears to determine whether there is a response to immune checkpoint inhibitors (ICI), numerous human studies have failed to identify specific species or phyla that are clearly associated with immunotherapy efficacy in any cancer. (24) Clearly, other factors, such as bacteria-dependent gut metabolite production that modifies blood metabolites and immune competence, may hold the key to ICI efficacy. (25) Although the taxonomy of gut microbiota has been under intense investigation, only a few studies have focused on the respiratory microbiome and its relationship to lung cancer.

Microbial Biomarkers
The primary focus of this study was to evaluate the oral and lower airway microbiome compositions of lung cancer cases compared to melanoma controls to reveal differences with potential applications towards biomarker studies.
Therefore, we compared tracheobronchial lavages and oral gargles that were collected from both lung cancer cases and controls. The results demonstrate that there are few significant differences in overall microbial composition using prevalence, abundance and diversity measures in the oral gargles between lung cancer and melanoma patients, indicating the readily available, noninvasively sampled oral gargle microbiome would not likely serve as a lung cancer biomarker.
Beta diversity refers to the variation between the samples of one community (group) compared to another community, such that a higher beta diversity between two groups indicates a greater difference in their microbiome compositions. By 16S rRNA gene sequencing data, beta diversity measured by Bray-Curtis dissimilarity demonstrated significant differences (p=0.022) between case and control lavages, indicating that the bacterial communities of lung cancer versus melanoma lavages were distinct, although no such trend was observed for gargles. While the lung cancer and control tracheobronchial lavages were significantly different by 16S rRNA-derived beta diversity, it is difficult to say that the lavages will be able to distinguish lung cancer from non-lung cancer patients since these results were not replicated by WGSS.
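As an illustration of the Bray-Curtis dissimilarity named above, the following sketch computes it for one pair of samples. It is a minimal example rather than the QIIME implementation used in the study, and the function name and the dictionary representation of a sample are assumptions. PERMANOVA then tests whether such pairwise distances are larger between groups than within them, which this sketch does not attempt.

```python
# Each sample is a hypothetical dict of {taxon: abundance}.

def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i).

    Ranges from 0 (identical composition) to 1 (no shared taxa).
    """
    taxa = set(sample_a) | set(sample_b)
    numerator = sum(abs(sample_a.get(t, 0) - sample_b.get(t, 0)) for t in taxa)
    denominator = sum(sample_a.get(t, 0) + sample_b.get(t, 0) for t in taxa)
    # Two empty samples are treated as identical by convention here.
    return numerator / denominator if denominator else 0.0


lavage_like = {"x": 6, "y": 4}
gargle_like = {"x": 2, "y": 8}
distance = bray_curtis(lavage_like, gargle_like)
```

For the two toy samples the absolute differences sum to 8 and the abundances sum to 20, giving a dissimilarity of 0.4.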
Abundance denotes the percentage that a specific bacterium contributes to a sample's overall composition, whereas prevalence refers to the number (percentage) of cases in a specific group in which a bacterium is detected. Interesting trends were observed in abundance and prevalence of the lavages. Most noteworthy, Granulicatella adiacens was more prevalent and abundant in cases. This is a well-recognized oral commensal bacterium that has been etiologically linked to endocarditis.(26) We found this bacterium among the top 25 most abundant genera in lung cancer lavages, and it was far more prevalent, appearing in all lung cancer tracheal lavages (100%) versus only some control lavages (30%), despite being similarly abundant in gargle specimens of both groups. Granulicatella adiacens is the same organism that Cameron and associates found in the sputum of lung cancer patients but not controls in a recent pilot study, suggesting this as a potential novel biomarker of lung cancer. (27) Replication of this finding in our study suggests that this microbe may actually be important to further investigate as a potential diagnostic biomarker, (27) and possibly even a predisposing factor to the development of lung cancer.
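The distinction between prevalence and abundance drawn above can be made concrete with a short sketch. The data below are invented to mirror the 100% versus 30% pattern described for Granulicatella and are not the study's measurements. The function name and the detection threshold parameter are assumptions.

```python
# Each sample is a hypothetical dict of {taxon: read_count}.

def prevalence(samples, taxon, min_count=1):
    """Fraction of samples in a group in which the taxon is detected.

    `min_count` is a hypothetical detection threshold (reads required
    to call the taxon present).
    """
    if not samples:
        return 0.0
    detected = sum(1 for s in samples if s.get(taxon, 0) >= min_count)
    return detected / len(samples)


# Toy groups: taxon present in 10 of 10 cases and 3 of 10 controls.
toy_cases = [{"Granulicatella": 5, "other": 95} for _ in range(10)]
toy_controls = (
    [{"Granulicatella": 5, "other": 95} for _ in range(3)]
    + [{"other": 100} for _ in range(7)]
)
case_prev = prevalence(toy_cases, "Granulicatella")
control_prev = prevalence(toy_controls, "Granulicatella")
```

In this toy data the taxon makes up the same share of reads wherever it is detected, so the groups differ in prevalence (1.0 versus 0.3) without differing in abundance among positive samples.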
Additionally, lavage from the lower airways of our lung cancer cases harbored numerous supraglottic bacteria, namely Neisseria (oral commensal), Capnocytophaga (oral commensal), Leptotrichia (oral commensal) and Moryella, with twice the prevalence compared to control lavages. Neisseria subflava, which commonly colonizes the dorsum of the tongue, was also found in high abundance in lung cancer lavages.
LEfSe (linear discriminant analysis effect size) analysis is used to identify candidate biomarkers by detailing features (bacterial taxa in lavages in this case) that distinguish two groups from one another based on relative abundances. In our study, the LEfSe analysis did show several bacterial taxa, including Fusobacteria and Neisseria (especially the oral commensal N. subflava), to be significantly differentially abundant (8-fold) in the tracheobronchial lavages of lung cancer versus melanoma patients. These intriguing results strongly support continued research into the tracheal microbiota as potential biomarkers of lung cancer, especially the highly prevalent and abundant Granulicatella adiacens and Neisseria subflava.
Our study also investigated the potential utility of the oral gargle or tracheobronchial lavage microbiomes as proxies for the tumor microbiome in lung cancer. If the lavage and oral microbiomes were similar to the tumor microbiome, these less invasive sample types could be utilized to study the tumor microbiome more easily. Initially, lavages and gargles were compared to see if the gargle could potentially mimic the lavage microbiota. However, significant differences were found between both bacterial and viral community structures (i.e., beta diversity) and alpha diversity in lavages and gargles. That is, the gargle microbiota were dissimilar from the lavages and cannot be used as a representation of the lavage microbiota.
Alpha diversity refers to the variation (how diverse it is) of bacteria within a single sample, with a higher alpha diversity usually associated with a more diverse, healthier microbiome. In our study, the alpha diversity of lavages versus gargles was likewise different, with gargles consistently maintaining higher bacterial and viral diversity by WGSS. LEfSe, performed on both 16S rRNA gene sequencing and WGSS data, also showed many differentially abundant bacterial taxa and some viral taxa between lavages and gargles. Unfortunately, these differences prevent oral gargles from acting as clinical proxies for tracheobronchial lavages. Further differences were identified between the tumor, gargles and lavages that preclude using these sample types as proxies of one another. This was not surprising, however, considering previous literature that has identified significant differences between lung tissue and oral microbiomes. (7) Despite these results, two recent studies have revealed the prognostic biomarker potential of the lung microbiome: one identified associations of the bronchoalveolar lavage microbiome with recurrence,(28) and another identified Enterobacter in this same sample type associated with worse survival, (19) emphasizing the importance of continued investigation of the lung microbiome in lung cancer. It has already been hypothesized that Enterobacteriaceae, a bacterial family that expresses the common antigen lipopolysaccharide and was identified in our study to be significantly more abundant in lavages and tumor tissue versus oral gargles in lung cancer cases, may induce inflammation in lung cancer that could be associated with poor prognosis. (19) Other studies have suggested that some microbiota may opportunistically invade lung epithelium damaged by smoking and drive tumorigenesis through production of free radicals like ROS/RNS that can damage the TP53 gene. (20)
Mouse models further suggest that lung microbiota may contribute to the activation of γδ T cells, which go on to release the cytokines IL-17A and IL-22.(29) These cytokines appeared to co-occur with tumor progression in the mice. (29) Additional studies are needed to provide substantiated evidence of the mechanistic relationships between the microbiome, the immune system, and lung cancer.

Microbiome of Tumor and Non-Neoplastic Lung
Differences in the composition of the tumor and normal non-neoplastic tissue microbiomes of lung cancer patients were examined to highlight differences that might suggest a microbial contribution to lung carcinogenesis. If the microbial signatures of tumor and normal tissue differed slightly but remained somewhat similar, it may indicate that certain microbes from the normal lung environment could have contributed to tumorigenesis, or at least were opportunistic inhabitants of the tumor microenvironment. Indeed, sequencing revealed no significant differences in bacterial relative abundance, alpha or beta diversity between tumor and normal tissue samples. Interestingly, normal tissue had lower alpha diversity compared to tumor tissue, contrary to that observed between tumor and healthy tissue controls previously. (30) Finally, slight variations in bacterial prevalence were identified: higher prevalence of the genera Granulicatella and Burkholderia in tumors was observed, as well as higher prevalence of Neisseria and Fusobacterium in normal tissues.
The genus Granulicatella in particular has been found in a previous study to inhabit the tumor microenvironment, and as that environment becomes increasingly anaerobic, metabolites useful to this genus are produced. (9) In the current study, Granulicatella was identified in higher prevalence not only in tumor and normal tissue but also in tracheobronchial lavages of lung cancer patients versus melanoma controls. Hosgood and associates also found a strong correlation between lung cancer and enrichment of Granulicatella in oral and sputum samples compared to controls.(31) This provides some intriguing preliminary data suggesting a possible carcinogenic role for some bacteria, or at least a role as opportunistic inhabitants of the tumor microenvironment, but testing in larger cohort studies is needed.
Overall, tracheal lavages and gargles do not appear to provide a consistent microbial signature for the tumor microbiome either. In fact, significant differences were observed between the lavage and tumor microbiomes. By all three alpha diversity indices, lavages maintained higher bacterial diversity than tumor tissue, and by all three beta diversity measures, bacterial communities differed between lavages and tumor tissue. LEfSe revealed a large number of bacterial genera that were more abundant in lavages, such as Granulicatella and Neisseria, but only one genus that was more abundant in tumor tissue: Burkholderia, a Gram-negative genus that includes important pathogens in lung infections of cystic fibrosis patients (32) and the causative agent of the life-threatening respiratory illness melioidosis. (33) Similarly, beta diversity indicated significantly different bacterial community structures between oral gargles and tumor tissue. This was not surprising, given that previous literature has identified significant differences between lung tissue and oral microbiomes. (7) Indices of alpha diversity also showed gargles to be significantly more diverse than tumor tissue, and LEfSe once again revealed the genus Burkholderia to be more abundant in tumor tissue than in oral gargles.
Overall, lavages and gargles cannot accurately stand in as proxies of the tumor microbiome given the substantial differences between them. However, the genus Burkholderia, in particular, appeared more abundant and prevalent in tumor tissue versus both lavages and gargles, suggesting a potential role in tumorigenesis or at least opportunistic inhabitance of the tumor microenvironment.
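As an illustration of how the three alpha diversity indices used in these comparisons (Shannon, Simpson, and Chao1) behave, the following minimal Python sketch computes them from per-taxon read counts. The counts are hypothetical and are not study data, and the formula variants (natural-log Shannon, Simpson as 1 minus the sum of squared proportions, bias-corrected Chao1) are assumptions, since the exact implementations used in the analysis pipeline are not stated here.

```python
import math

def shannon(counts):
    """Shannon index: -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson(counts):
    """Simpson diversity: 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1)),
    where F1 and F2 are the numbers of singleton and doubleton taxa."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

# Hypothetical read counts per genus (illustrative only, not study data):
# an even, taxon-rich lavage sample and a tumor sample dominated by one genus.
lavage_counts = [40, 30, 20, 5, 2, 2, 1]
tumor_counts = [90, 8, 1, 1]

for name, counts in [("lavage", lavage_counts), ("tumor", tumor_counts)]:
    print(name, round(shannon(counts), 3), round(simpson(counts), 3), chao1(counts))
```

In this toy example the more even, taxon-rich sample scores higher on all three indices, which is the pattern described above for lavages relative to tumor tissue.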

Microaspiration
Microaspiration is a common event, occurring in as many as 50% of healthy people,(34) although it is unknown how many have persistent colonization of the tracheobronchial tree with oral commensals. Previous studies by Segal and associates(35) demonstrated that enrichment of oral commensals in the lower airways of normal individuals is associated with increased host inflammatory tone and an increase in checkpoint inhibitor markers. This lower airway dysbiotic signature was found by Tsay and colleagues to distinguish between patients with lung cancer and those with benign lung nodules.(36) Particularly notable differences in our study are the marked 16-, 6-, and 6-fold higher abundances of the oral commensals Neisseria subflava, Granulicatella adiacens, and Leptotrichia in the lung cancer lavages versus controls. In addition, the dysbiotic tracheal microbiome showed an extensive 2- to 3-fold enrichment of oral microbiota (Granulicatella, Capnocytophaga, Leptotrichia, and Neisseria) in lung cancer patients compared to controls (Table 2), perhaps contributing to an inflammatory environment. The control lavages had a markedly reduced abundance of oral taxa, suggesting that microaspiration and inflammation occur to a larger extent in lung cancer patients than in controls. Indeed, beta diversity analyses revealed significant differences (p=0.022) in bacterial community structure between the lung cancer and control melanoma lavages.
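To make the beta diversity comparison concrete, the sketch below shows how a PERMANOVA p-value of this kind can be obtained: compute pairwise Bray-Curtis dissimilarities, form a pseudo-F statistic from between- and within-group sums of squared distances, and compare it against label permutations. This is a minimal pure-Python illustration under stated assumptions, not the pipeline used in this study; the abundance profiles, group sizes, function names, and permutation count are all hypothetical.

```python
import random
from itertools import combinations

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / (sum(x) + sum(y))

def distance_matrix(samples):
    """Full pairwise Bray-Curtis matrix for a list of abundance profiles."""
    n = len(samples)
    return [[bray_curtis(samples[i], samples[j]) for j in range(n)] for i in range(n)]

def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F from a distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: the same quantity inside each group.
    ss_within = 0.0
    for g in groups:
        members = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))

def permanova(dist, labels, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign group labels at random
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical relative abundances (three taxa per sample), not study data.
cases = [[60, 30, 10], [55, 35, 10], [65, 25, 10], [58, 32, 10]]
controls = [[10, 30, 60], [12, 28, 60], [8, 35, 57], [15, 25, 60]]
labels = ["case"] * len(cases) + ["control"] * len(controls)

dist = distance_matrix(cases + controls)
f_obs, p_value = permanova(dist, labels)
print(f"pseudo-F = {f_obs:.2f}, p = {p_value:.3f}")
```

With only four samples per group there are few distinct label assignments, so the smallest attainable p-value is limited; the same constraint applies to any small pilot cohort and is one reason larger cohorts are needed.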
Patnaik and associates also found oral aspiration to be the source of lower airway microbiota in lung cancer, with the microbial community in bronchial lavage correlating with recurrence of lung cancer after resection. Only a relatively small number of viruses were identified in our study, likely because of their low abundance and the inability to classify common viral organisms.
The virome of the tracheal lavages in our study was assessed through WGSS, which identified largely similar viral prevalence and relative abundance between cases and controls in both lavages and gargles. Human respiratory syncytial virus was more abundant in melanoma than in lung cancer patient lavages; conversely, human betaherpesvirus 7 was more abundant in lung cancer lavages than in controls. Oddly, many of the more prevalent viruses we identified in both cases and controls are plant pathogens (e.g., yellow vein viruses and tomato yellow leaf curl viruses). Yellow leaf curl virus is, however, known to infect tobacco plants and could possibly enter the respiratory tree through cigarette smoking.
Viral signatures in the oral gargles showed that bacteriophages targeting Haemophilus bacteria were more prevalent in melanoma controls than in lung cancer cases. Human-tropic viruses, such as endogenous retrovirus K and betaherpesvirus 7, were similarly prevalent in cases and controls. Although LEfSe identified several unclassified viral signatures as significantly different between cases and controls, neither alpha nor beta diversity indices showed any significant differences between the groups. Notably, unclassified viral signatures were the most abundant in both cases and controls.
Comparison of viral signatures between the oral gargles and lavages of the cases showed that prevalence differed between these two sample types. Human parainfluenza virus 3 and human respiratory syncytial virus were the most prevalent viral signatures in lavages, whereas human gammaherpesvirus 4 was identified more commonly in gargles. LEfSe revealed several viral taxa that were significantly differentially abundant between gargles and lavages. Alpha diversity of viral signatures was higher in gargles than in lavages across all three indices. Finally, beta diversity revealed significant differences in viral community structure between gargles and lavages.
Unfortunately, our WGSS and bioinformatics approaches left the vast majority of the viral taxa unclassified in lung cancer lavages (92.6%) and in gargles (74.6%), thus hampering meaningful evaluation of the virome.

LIMITATIONS AND STRENGTHS
The small sample size of this study is a significant limitation that may have obscured statistically significant differences in microbiome composition between lung cancer cases and melanoma controls. In addition, the small subsample sizes prevented an appropriate sub-analysis of smokers versus non-smokers. Future research will require larger cohorts with sufficient power to detect clinically meaningful differences that could hold biomarker potential. Additionally, a more thorough evaluation of contamination should be implemented in future studies, although our case and control samples were processed in the same way, so contamination should theoretically not produce substantial differences in microbiome signatures between our comparison groups.
Due to the study's case-control design, the effect of changes in the microbiome over time could not be established to identify when microbial alterations may have occurred in lung cancer patients as compared to the controls. Therefore, further research into microbial dysbiosis in lung cancer will ideally require collecting samples at various time points using prospective cohort designs. This will enable a better understanding of when microbial dysbiosis occurs and how it is associated with clinically important events, such as disease initiation, progression, or treatment response.
Despite these limitations, this study has several major strengths, including the direct comparison of the oral, tracheal, lung tumor, and non-neoplastic lung microbiomes of lung cancer patients with the oral and tracheal microbiomes of control patients without lung cancer. Also important is the use of WGSS in addition to 16S rRNA gene sequencing of the specimens. WGSS enabled greater taxonomic resolution, specifically to the species level, than 16S rRNA gene sequencing alone would have allowed.
WGSS additionally enabled elucidation of viral, not merely bacterial, signatures to generate a more holistic view of the microbial environments among the different sample types. However, since the vast majority (93%) of viral signatures were unclassified, we are conducting additional studies using more specific PCR approaches that target viral taxa suspected to be associated with lung cancer, including human retroviruses, human papillomavirus,(38) and hepatitis B virus,(39) as demonstrated in our prior pan-microbial array study of biobanked frozen lung cancers. (40)
Ultimately, the question arises as to why a dysbiotic, inflammatory tracheobronchial microbiome appears to be uniformly associated with lung cancer. Perhaps the answer lies in the multifactorial nature of carcinogenesis, as suggested by the relationship between human papillomavirus (HPV) and cervical cancer. HPV has been convincingly proven to cause 99.7% of cervical cancer.(41) A woman found to have high-risk HPV on her pelvic exam is at elevated risk for the malignancy, yet at most only 8% of high-risk HPV-positive women ever develop either pre-cancerous cervical changes or frank cancer. (42) Recent studies suggest that the primary factor determining the ability of HPV to transform cervical cells is the vaginal microbiota, such that a dysbiotic, inflammatory microbiome is needed. A eubiotic, low-diversity, low-pH vaginal microbiome, particularly one dominated by Lactobacillus species, likely helps clear HPV infections; these species are also cytotoxic, secreting bacteriocins that modulate the immune system to inhibit viral activity. However, the dysbiotic, proinflammatory microbiome induces oxidative DNA damage and promotes viral transformation of the cervix by the resident HPV. (43) Therefore, we might postulate a similar scenario for the consistent finding of a dysbiotic tracheobronchial microbiome in lung cancer patients.
If some or all lung cancer is "caused" by one or more oncogenic viruses such as HPV, (44) bovine leukemia virus, (40) and HTLV-1 (40,45), then the development of a dysbiotic, inflammatory tracheobronchial microbiome, such as that found in the current study and others, may be the promoting factor that allows existing, colonizing oncogenic viruses to cause malignant transformation in the lung. However, this attractive hypothesis will require a number of future studies to substantiate.

Conclusions
The primary focus of this study was to evaluate the oral and lower airway microbiome compositions of lung cancer cases compared to melanoma controls to reveal any differences that may have potential applications in biomarker studies. Indeed, in this case-control study, we found that the bacterial communities of lung cancer and melanoma lavages were significantly different, although no such trend was observed for gargles. Several bacterial taxa, including the oral commensals Neisseria subflava and Granulicatella adiacens, were significantly (8-fold) differentially abundant in the tracheobronchial lavages of lung cancer versus melanoma patients, suggesting these organisms may warrant future study as potential biomarkers of lung cancer.
Like other published studies, we found a dysbiotic tracheal microbiome with an extensive 2- to 3-fold enrichment of oral microbiota (higher abundances of the oral commensals Granulicatella, Capnocytophaga, Leptotrichia, and Neisseria) in lung cancer patients compared to controls. The control lavages had a markedly reduced abundance of oral taxa, perhaps suggesting that far more microaspiration and inflammation occur in lung cancer patients. The tumor microbiome differed substantially from those of the lavages and gargles, such that neither can accurately stand in as a proxy for the tumor. However, the genus Burkholderia in particular appeared more abundant and prevalent in tumor tissue than in both lavages and gargles, suggesting a potential role of this organism in tumorigenesis, or at least as an opportunistic inhabitant of the tumor microenvironment. Finally, our WGSS and bioinformatics approaches left the vast majority of the viral taxa unclassified in lung cancer and control lavages (92.6%) and in gargles (74.6%), thus hampering meaningful evaluation of the microvirome. Overall, this study generated encouraging preliminary results that confirm some of the findings in the published literature and can be used to generate hypotheses for future studies directed at identifying potential microbial biomarkers of lung cancer.
COMPETING INTERESTS: The authors declare that they have no competing or potential conflicts of interest.
DATA AVAILABILITY: The data generated or analysed during the current study are available from the corresponding author on reasonable request.
CODE AVAILABILITY: Not applicable.
AUTHORS' CONTRIBUTIONS: SH was involved in analyzing and interpreting the data as well as being a major contributor in writing the manuscript. CP was involved in creating and carrying out the protocol, analyzing and interpreting the data and was a major contributor in writing the manuscript. SP, RT and YK performed the data analysis, bioinformatics and biostatistical evaluation and creation of the tables and figures. LR created the study protocol, performed all the tissue collections and bronchoscopy, was involved in data analysis and interpretation and was a major contributor in writing the manuscript. All authors read and approved the final manuscript.
ETHICS APPROVAL AND CONSENT TO PARTICIPATE: All patients provided informed consent to participate in this study and only de-identified patient data is included in this manuscript. This study was approved by the Moffitt Scientific

Figures
Figure 1. Venn diagram comparison of bacterial genera prevalence between cases and controls and between different sample types among controls, as determined by the top 25 most prevalent bacterial genera identified in 16S rRNA sequencing analyses. Bacterial genera in the center of the Venn diagram are found in both groups. A. Comparison of bacterial prevalence between lung cancer and melanoma tracheobronchial lavages (TBL). B. Comparison of bacterial prevalence between lung cancer and melanoma oral gargles. C. Comparison of bacterial prevalence between tumor and normal tissue from lung cancer cases. D. Comparison of bacterial prevalence between tracheobronchial lavages and oral gargles from lung cancer cases.
Figure 2. Comparing the bacteriomes, assessed by 16S rRNA gene sequencing, of tracheobronchial lavages (TBLs) of lung cancer cases and melanoma controls. A. Comparison of the prevalence of bacterial genera in lung cancer versus melanoma control TBLs. B. Comparison of the relative abundance of bacterial genera in lung cancer versus melanoma control TBLs. C. Comparison of the alpha diversity, as measured by Shannon, Simpson, and Chao1 indices, between lung cancer case and melanoma control TBLs. D. Comparison of the beta diversity, measured by Bray Curtis, Weighted and Unweighted UniFrac distance measures, between lung cancer case and melanoma control TBLs.
Figure 3. Comparing the bacteriomes, assessed by 16S rRNA gene sequencing, of oral gargles of lung cancer cases and melanoma controls. A. Comparison of the prevalence of bacterial genera in lung cancer versus melanoma control oral gargles. B. Comparison of the relative abundance of bacterial genera in lung cancer versus melanoma control oral gargles. C. Comparison of the alpha diversity, as measured by Shannon, Simpson, and Chao1 indices, between lung cancer case and melanoma control oral gargles. D. Comparison of the beta diversity, measured by Bray Curtis, Weighted and Unweighted UniFrac distance measures, between lung cancer case and melanoma control oral gargles.
