Screening of potential microbial markers for lung cancer using metagenomic sequencing

Abstract Introduction Lung cancer is the most prevalent cancer in China and carries high mortality, and it is associated with dysbiosis of the lung microbiome. This study aimed to screen for specific microorganisms as potential biomarkers for distinguishing benign lung disease from lung cancer. Methods Bronchoalveolar lavage fluid (BALF) was sampled instead of saliva to avoid contamination with oral microorganisms. Microbial taxonomic and functional differences between BALF samples from patients with lung cancer and those from patients with benign lung diseases were analyzed, for the first time, by metagenomic next‐generation sequencing, so that microorganisms other than bacteria could be included. Results The intrasample diversity of malignant samples differed from that of benign samples, and the microbial differences among malignant samples were smaller, with lower microbial diversity and significantly changed microbial abundance and metabolic functions. Metabolic function analysis revealed that amino acid‐related metabolism was more prevalent in benign samples, whereas carbohydrate‐related metabolism was more prevalent in malignant samples. Using LEfSe, Metastat, and random forest analysis, we identified a series of important differential microorganisms. Importantly, the model combining five key genera plus one tumor marker (neuron‐specific enolase) as indicators presented the optimal disease typing performance. Conclusion These results suggest that the differential microorganisms enriched in tumors are valuable for mechanism research and may be potential new targets for lung cancer therapy. More importantly, the biomarkers identified in this study may help improve the clinical diagnosis of lung cancer and have good application prospects.


| INTRODUCTION
Lung cancer is the most frequent malignant tumor across all cancer types in China, with an estimated 816,563 new cases and 714,699 deaths in 2020 according to global cancer statistics. 1 Although intensive research has advanced the diagnosis and treatment of lung cancer, prognosis remains unsatisfactory and mortality high, owing to frequent metastasis and delayed diagnosis, especially in patients with advanced disease. 2,3 Therefore, it is clinically imperative to find novel and effective tools for the early diagnosis of lung cancer.
Substantial evidence has demonstrated that microorganisms are implicated in human health, affecting multiple physiological processes including nutrient metabolism, immune defense, brain activity, and disease regulation. 4 The associations of microorganisms with metabolic diseases, autoimmune diseases, neurological diseases, and even cancers have drawn considerable attention. 4,5 Emerging studies have indicated that microorganisms play a vital role in the occurrence, development, and treatment of cancers by regulating the tumor microenvironment, predicting cancer risk, triggering inflammation to stimulate immune responses, and combining with anticancer drugs to enhance cancer treatment. 6 Approximately 13% of malignant tumors in 2018 were attributable to microorganisms, including Helicobacter pylori, hepatitis B virus (HBV), Epstein-Barr virus (EBV), and human papillomavirus (HPV). 7 In particular, the lung, one of the mucosal organs with the largest surface area in the body, has been shown to possess a unique microbial community in both healthy and pathological conditions. 8 An altered lung microbiome may contribute to tumorigenesis by changing the stability of the host genome via secretion of bacterial toxins, disrupting local immune barriers, and releasing oncogenic microbial metabolites. 9 Given the close relationship between lung cancer and microbes, some researchers have sought to identify microbes associated with lung cancer as diagnostic biomarkers by leveraging advances in culture-independent sequencing technology. 10 For example, Yu et al. reported higher levels of Thermus and lower levels of Ralstonia in tumor tissue from patients with advanced lung cancer compared with non-malignant lung tissue, suggesting an important role for these bacteria in lung cancer progression. 11 Yan et al. showed that the relative abundances of Veillonella and Capnocytophaga are markedly elevated in saliva samples from lung cancer patients. 12 In addition, Lee et al. detected an increased abundance of Veillonella and Megasphaera in bronchoalveolar lavage fluid (BALF) specimens from lung cancer patients. 13 Moreover, Cameron et al. demonstrated that Granulicatella adiacens and several other opportunistic pathogens are more frequently found in spontaneous sputum samples from lung cancer patients. 14 Notably, previous studies screening microbial markers for lung cancer largely relied on 16S rRNA sequencing, while studies based on metagenomic sequencing are limited. Therefore, a more comprehensive understanding of the diagnostic value of specific microorganisms for lung cancer and of their role in tumor progression, based on metagenomic sequencing, is still urgently needed.
Generally, the microbiota in patients with lung cancer may vary depending on the sample type, sampling method, and patient cohort. Saliva and sputum specimens are vulnerable to oral microbial interference, and lung tissue samples are usually difficult to acquire from patients with advanced lung cancer. 15 Therefore, in this study, microbial taxonomic and functional differences between BALF specimens from lung cancer patients and from patients with benign lung disease were compared by metagenomic sequencing to screen potential microbial markers for lung cancer. Furthermore, because tumor markers such as carcinoembryonic antigen (CEA), neuron-specific enolase (NSE), and cytokeratin (CYF21-1) are widely used to evaluate the diagnosis, progression, and prognosis of lung cancer patients, 16 these three tumor markers were combined with specific microorganisms to establish clinical prediction models for lung cancer, whose performance was then evaluated with the aim of improving clinical diagnosis.

| Patient and sample collection
Between February and September 2021, patients who presented with suspicious nodules on thoracic computed tomography (CT) and underwent clinical bronchoscopy at Tianjin Chest Hospital were initially recruited. The inclusion criteria were as follows: nodule size of 0.8-3.0 cm on chest CT, single nodule, no calcified foci, no obvious satellite foci, and the presence of lymphadenopathy in the mediastinal window. If there was lymphadenopathy of ≤1.5 cm in diameter, there should be no obvious damage to the ribs or vertebrae on CT. In summary, we selected lesions that could not be classified as benign or malignant on imaging and were not associated with infection. Patients were excluded if they had acute lung infections, second primary tumors, or other pulmonary comorbidities including chronic obstructive pulmonary disease, pulmonary fibrosis, and bronchiectasis, or if they had received antibiotic therapy within 1 month.
On this basis, the patients were divided into a malignant group (lung cancer) and a benign group (benign lung diseases) according to the pathological diagnosis. We used the pathological diagnosis of specimens obtained by bronchoscopy as the gold standard and included patients with non-neoplastic, noninfectious disease in the benign group. Sarcoidosis was the most common diagnosis in the benign group; it is characterized by a non-caseating necrotic granuloma that is indistinguishable from malignant neoplasm on imaging.
Because age, gender, BMI, and smoking can influence an individual's microbiota, we ensured that these factors did not differ significantly between the two groups, minimizing their interference with the sequencing results. Ultimately, 60 patients were enrolled, including 29 with lung cancer and 31 with benign lung diseases.
Bronchoalveolar lavage was performed in the lung on the side of the suspected nodule according to a standardized protocol developed to reduce oral contamination, 17 and 2 ml of BALF was collected from each patient and stored at −80°C within 30 min. In addition, the bronchoscope was washed with 10-20 ml of sterile 0.9% saline before bronchoscopy, and the washing fluid was stored in sterile centrifuge tubes as a negative control (NC). In total, 90 samples were obtained, including 30 NC samples, 31 benign samples, and 29 malignant samples.
Demographic and clinical characteristics were documented for all participants, including age, gender, BMI, smoking history, smoking index, tumor markers (CEA, NSE, and CYF21-1), pathology type, and tumor stage. This study was approved by the Medical Ethics Committee of Tianjin Chest Hospital (ethical number: 2021YS-024-01), and written informed consent was obtained from all subjects.

| Extraction of genomic DNA and metagenomic sequencing
Genomic DNA was extracted from BALF samples using the QIAamp DNA Micro Kit (QIAGEN). Next, the QIAseq™ Ultralow Input Library Kit for Illumina (QIAGEN) was used to construct the DNA libraries. Library quality was assessed using a Qubit fluorometer (Thermo Fisher, MA, USA) and an Agilent 2100 Bioanalyzer (Agilent Technologies). Finally, qualified DNA libraries were amplified and sequenced on a NovaSeq 6000 platform (Illumina) in PE150 mode.

| Data processing and analysis
Raw data quality was assessed using FastQC. Low-quality reads (Q < 30), short fragments (<35 bp), and adapter contamination were removed from the raw data with Trimmomatic, and human host reads were then removed by mapping the data to a human reference database using Bowtie2. The remaining reads were used for microbial taxonomic classification with Kraken2 and relative abundance estimation with Bracken. Finally, contaminant taxa, defined as those with reads per million (RPM) > 50 in the blank control and RPM ≥ one-third of the total RPM of the NC at the species or genus level, were eliminated.
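The contaminant rule stated above can be read in more than one way. The following Python sketch illustrates one possible reading, in which a taxon is flagged when its RPM in the negative control exceeds 50 and also accounts for at least one-third of the total RPM of that control. The function names, the dictionary layout, and this interpretation are assumptions made for illustration and are not taken from the study's pipeline.

```python
def rpm(read_count, total_reads):
    """Reads per million: taxon reads scaled to one million total reads."""
    return read_count * 1_000_000 / total_reads


def flag_contaminants(nc_rpm, rpm_cutoff=50.0, fraction_cutoff=1 / 3):
    """Flag taxa in a negative control (NC) under the assumed reading.

    nc_rpm maps taxon name -> RPM in the NC sample. A taxon is flagged when
    its NC RPM exceeds rpm_cutoff AND is at least fraction_cutoff of the
    total NC RPM. Both conditions are an interpretation, not the paper's code.
    """
    total = sum(nc_rpm.values())
    flagged = set()
    for taxon, value in nc_rpm.items():
        if total > 0 and value > rpm_cutoff and value >= fraction_cutoff * total:
            flagged.add(taxon)
    return flagged


def remove_contaminants(sample_rpm, flagged):
    """Drop flagged taxa from a sample's RPM table."""
    return {t: v for t, v in sample_rpm.items() if t not in flagged}


# Hypothetical values, for illustration only.
nc = {"taxon_A": 600.0, "taxon_B": 40.0, "taxon_C": 100.0}
sample = {"taxon_A": 900.0, "taxon_B": 300.0, "taxon_D": 120.0}
cleaned = remove_contaminants(sample, flag_contaminants(nc))
```

Under this reading only taxa that dominate the negative control are removed, so a low-level background taxon such as the hypothetical `taxon_C` is retained.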

| Statistical analysis
All statistical analyses were performed using R (version 3.6.1). The top 20 genera and species by relative abundance in all BALF samples were screened for the following analyses. The between-sample correlation of the abundance of these genera and species was assessed using Spearman's correlation analysis. Alpha diversity, which reflects the microbial diversity within a single sample, was described using the Chao1, Shannon, and Gini-Simpson indices, and these indices were compared between the benign and malignant groups. Beta diversity, which describes dissimilarity in overall microbial community composition between the two groups, was evaluated using principal component analysis (PCA) and non-metric multidimensional scaling (NMDS), and the multi-response permutation procedure (MRPP) was then used to test whether the inter-group difference in microbial community composition was significant. LEfSe and Metastat analyses were conducted to identify microorganisms that differed significantly between the two groups.
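For readers unfamiliar with the three alpha-diversity indices named above, the following Python sketch computes them from a vector of per-taxon read counts for a single sample. It is a minimal illustration, not the R code used in the study; in particular, the bias-corrected form of Chao1 is an assumption, since the text does not state which variant was used.

```python
import math


def chao1(counts):
    """Bias-corrected Chao1 richness estimate (assumed variant).

    S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2 are the numbers
    of taxa observed exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical read counts for one sample.
example_counts = [5, 1, 1, 2, 0]
richness = chao1(example_counts)
```

Chao1 responds to rare taxa (singletons and doubletons), whereas Shannon and Gini-Simpson weight taxa by their proportions, which is one reason the indices can disagree on whether two groups differ.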
Furthermore, metagenome-based metabolic pathway analysis was performed using HUMAnN3. 18 Quality-filtered and host-filtered reads were matched to the UniRef90 gene family database and then regrouped into MetaCyc pathways and KO categories for pathway annotation. MetaCyc abundance and KO abundance were normalized using CPM, and differences in metabolic pathways between the benign and malignant groups were compared using the Wilcoxon rank sum test, with p < 0.05 considered significant. Afterwards, GSVA, a non-parametric, unsupervised method, 19 was used to perform KEGG pathway enrichment analysis based on the normalized KO abundance (CPM), and differences between the two groups were analyzed using the limma package in R, with thresholds of adjusted p < 0.05 (Bonferroni-Holm method) and |log2FC| > 0.2.
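Two of the numerical steps described above, per-million normalization and Bonferroni-Holm adjustment, can be sketched in a few lines of Python. This is an illustrative sketch only: the study used HUMAnN3 and the limma package in R, and the function names and the per-sample dictionary layout here are assumptions.

```python
def cpm(abundances):
    """Scale one sample's abundances so that they sum to one million."""
    total = sum(abundances.values())
    return {name: value * 1_000_000 / total for name, value in abundances.items()}


def holm_adjust(p_values):
    """Bonferroni-Holm step-down adjustment of a list of p-values.

    The i-th smallest p-value (0-based rank i) is multiplied by (m - i),
    capped at 1, and forced to be non-decreasing along the sorted order.
    Results are returned in the original input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def passes_thresholds(adjusted_p, log2_fc, p_cutoff=0.05, fc_cutoff=0.2):
    """Apply the two thresholds stated in the text to one pathway."""
    return adjusted_p < p_cutoff and abs(log2_fc) > fc_cutoff


# Hypothetical p-values, for illustration only.
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

Holm's procedure controls the family-wise error rate like the plain Bonferroni correction but is uniformly less conservative, because only the smallest p-value is multiplied by the full number of tests.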
Lastly, based on 40 genera with significant differences in relative abundance together with three tumor markers (CEA, NSE, and CYF21-1), random forest classification analysis was used to construct a genus-based classifier able to discriminate malignant from benign lung diseases. Leave-one-out cross-validation was used because of the small sample size, 20 and the optimal classifier was evaluated using the R package caret. ROC curve analysis was then conducted to assess the predictive performance of the classifier.
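The leave-one-out scheme mentioned above holds out each sample once and trains on all the others. The Python sketch below shows the resampling loop. To stay self-contained it substitutes a simple nearest-centroid classifier for the random forest, so it illustrates the validation procedure only, not the study's model; all function names and the toy data are hypothetical.

```python
def leave_one_out_splits(n):
    """Yield (train_indices, held_out_index) for each of n samples."""
    for held_out in range(n):
        yield [i for i in range(n) if i != held_out], held_out


def nearest_centroid_fit(rows, labels):
    """Stand-in classifier (NOT a random forest): per-class feature means."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def nearest_centroid_predict(centroids, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def loocv_accuracy(rows, labels):
    """Fraction of held-out samples predicted correctly across all folds."""
    correct = 0
    for train, held_out in leave_one_out_splits(len(rows)):
        model = nearest_centroid_fit([rows[i] for i in train],
                                     [labels[i] for i in train])
        if nearest_centroid_predict(model, rows[held_out]) == labels[held_out]:
            correct += 1
    return correct / len(rows)


# Toy one-feature data, for illustration only.
X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = ["benign"] * 3 + ["malignant"] * 3
accuracy = loocv_accuracy(X, y)
```

Because every sample serves as the test case exactly once, the procedure uses small cohorts efficiently, although it does not replace evaluation on an independent validation cohort.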

| Baseline demographic and clinical characteristics of participants
Demographic and clinical data of the 60 participants, including 29 patients with lung cancer and 31 patients with benign lung diseases, are summarized in Table S1. No significant differences were found in age, gender, body mass index (BMI), smoking status, or smoking index between the benign and malignant groups. The levels of the three tumor markers CEA, NSE, and CYF21-1 differed significantly between the two groups. In addition, the 29 patients with lung cancer comprised 12 with lung adenocarcinoma, 11 with lung squamous cell carcinoma, and six with small cell lung cancer, at various tumor stages.

| Microbial composition and diversity of BALF samples in patients with lung cancers and benign lung diseases
The top 20 genera and species with the highest relative abundance in all BALF samples from patients with lung cancers and benign lung diseases were screened and then combined, yielding a pool of 159 genera and 288 species for subsequent analysis. The correlations of these genera and species between samples are presented as a heatmap (Figure S1). Multi-response permutation procedure (MRPP) analysis, shown in Tables S2 and S3, suggested that at the genus level, the intergroup differences between benign disease and malignant tumors were significantly greater than the intragroup differences. In addition, the Chao1, Shannon, and Gini-Simpson indices were used to measure the alpha diversity of the microbial community. These indices were generally lower in the malignant group than in the benign group at both the genus and species levels, but the differences were not significant except for the Chao1 index at the genus level (Figure 1). Furthermore, both principal component analysis (PCA) and non-metric multidimensional scaling (NMDS) showed no clear separation in microbial community between the malignant and benign groups at either the genus or species level. In the PCA results, malignant samples were more tightly clustered than benign samples, whereas the NMDS results showed the opposite.

| Metabolic pathways of BALF microbiota in patients with lung cancers and benign lung diseases
The metabolic function of the BALF microbiota was further explored using HUMAnN3. MetaCyc pathway differential analysis showed that, compared with the malignant group, the BALF microbiota in the benign group were predominantly associated with amino acid-related metabolism, such as ketogenesis, L-cysteine biosynthesis VI (from L-methionine), L-arginine biosynthesis IV (archaebacteria), L-arginine biosynthesis III (via N-acetyl-L-citrulline), L-ornithine biosynthesis I, L-arginine biosynthesis I, and L-arginine biosynthesis I (via L-ornithine) (Figure 2A). Meanwhile, KEGG pathway differential analysis based on Gene Set Variation Analysis (GSVA) revealed that glutathione metabolism, RNA degradation, and protein export were significantly enriched in the benign group; in contrast, carbohydrate-related pathways, such as starch and sucrose metabolism, pentose and glucuronate interconversions, and galactose metabolism, were more prevalent in the malignant group (Figure 2B).

| Species composition characteristics in different types of cancer tissues
The top 10 genera and species by abundance in benign disease samples and in malignant tumor samples were taken and combined to draw stacked plots (Figure S2). The results showed that the common microbial genera in the benign and malignant groups included Veillonella, Toxoplasma

| Retrieval of significantly different species in the benign and malignant groups
Two methods, namely linear discriminant analysis (LDA) effect size (LEfSe) and Metastat, were used to search for differentially enriched microorganisms in BALF samples between the malignant and benign groups. LEfSe analysis was performed based on the Wilcoxon or Kruskal-Wallis rank sum test and LDA, and the genera and species with statistically significant differences in relative abundance between the two groups (LDA score > 2) are shown in Figure 3. These included Prevotella, Klebsiella, Mycobacterium, Gordonia, and Sphingobium at the genus level, as well as Prevotella jejuni, Sphingobium sp. YG1, Klebsiella pneumoniae, Gordonia polyisoprenivorans, and Acidovorax sp. KKS102 at the species level. Metastat analysis, unlike LEfSe analysis, automatically adjusts the statistical method according to the sample. Metastat results revealed that the top 10 genera with the most significant differences in relative abundance in BALF samples between the two groups were Achromobacter, Chryseobacterium, Herbaspirillum, Pedobacter, Thermomonas, Undibacterium, Caulobacter, Novosphingobium, Prevotella, and Dechloromonas, and the top 10 species included Sphingobium sp. YG1,

FIGURE 2 Metagenome-based metabolic pathway analysis of bronchoalveolar lavage fluid (BALF) in patients with lung cancers and benign lung diseases. (A) Box plot of differential metabolic pathways identified by MetaCyc pathway differential analysis based on the HMP Unified Metabolic Analysis Network. Red and green indicate log2(CPM + 1) values for benign disease and malignant tumor samples, respectively. (B) KEGG differential pathways with adjusted p < 0.05 and |log2FC| > 0.2 based on Gene Set Variation Analysis. The rows are pathways, and the columns are samples. Columns are grouped by sample type and rows are grouped by the log2FC score of the pathway, where log2FC < −0.2 indicates a pathway with a high score in the benign group and log2FC > 0.2 indicates a pathway with a high score in the malignant group.

| Screening and optimization of potential biomarkers
FIGURE 3 Differentially enriched microorganisms in bronchoalveolar lavage fluid (BALF) of patients with lung cancers and benign lung diseases retrieved by linear discriminant analysis (LDA) effect size (LEfSe). (A) Differentially enriched microorganisms at the genus level with LDA score greater than 2, displayed by the bar chart on the left. The larger the LDA value, the greater the impact on the corresponding group and the higher the abundance. Differentially enriched microorganisms at the genus level are also displayed by the clade diagram on the right, with circles radiating from the inside out representing the taxonomic levels from kingdom to species. Each small circle at a given taxonomic level represents a taxon at that level, and the diameter of the small circle is proportional to the relative abundance. Microorganisms with no significant differences are uniformly colored yellow, and genera with significant differences are marked green or red. (B) Differentially enriched microorganisms at the species level displayed by the bar chart (left) and the clade diagram (right).

A random forest classifier was constructed to rank the importance of the differential genera. Since the microorganisms with significant differences between the two groups determined by the LEfSe and Metastat methods largely overlapped, all 40 genera with significant differences (p < 0.05, LDA > 3) based on the LEfSe results were selected for the random forest classification analysis. Leave-one-out cross-validation (LOOCV) was used, and the training and testing sets were set to 3:1, with the testing results shown in Figure S3. According to the mean decrease accuracy (MDA) and mean decrease Gini (MDG) indices, with thresholds of MDA > 5 or MDG > 2, five genera were considered potential markers for the identification of lung cancer: Klebsiella, Mycobacterium, Pedobacter, Prevotella, and Xanthomonas.
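The selection rule described above keeps a genus when either importance measure exceeds its threshold. A minimal Python sketch of that filter follows; the genus names and importance values are hypothetical placeholders rather than results from this study, and the function name is an assumption.

```python
def select_markers(importance, mda_cutoff=5.0, mdg_cutoff=2.0):
    """Keep features whose MDA exceeds mda_cutoff OR whose MDG exceeds mdg_cutoff.

    importance maps feature name -> (mean decrease accuracy, mean decrease Gini).
    Returns the selected names in alphabetical order.
    """
    return sorted(
        name
        for name, (mda, mdg) in importance.items()
        if mda > mda_cutoff or mdg > mdg_cutoff
    )


# Hypothetical importance scores, for illustration only.
scores = {
    "genus_A": (6.1, 1.0),  # passes on MDA alone
    "genus_B": (2.0, 2.5),  # passes on MDG alone
    "genus_C": (4.9, 1.9),  # passes neither threshold
}
selected = select_markers(scores)
```

Using "or" rather than "and" retains a feature that is informative under only one of the two importance measures, which yields a more inclusive candidate list.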
We further evaluated the typing ability of these five genera (Figure 4), and the results showed that using only these five genera as biomarkers performed better than using all 40 genera as markers.
Considering that the tumor markers CYF21-1, CEA, and NSE can also play important roles in tissue typing, we explored which combination of microbial biomarkers and tumor markers achieves the highest efficiency (Figure S4).
We further evaluated the diagnostic performance of the classifiers based on the five genera combined with the three tumor markers (CEA, NSE, and CYF21-1) in lung cancer using ROC curves (Figure 5; the diagnostic performance of the three tumor markers alone is shown in Figure S5).
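As background to the ROC analysis, the area under the curve equals the probability that a randomly chosen malignant sample receives a higher classifier score than a randomly chosen benign sample, with ties counted as one half. The Python sketch below computes the AUC in this rank-based way and reports sensitivity and specificity at a chosen threshold. The scores, labels, and function names are hypothetical, and this is not the R code used in the study.

```python
def roc_auc(scores, labels):
    """Rank-based AUC: P(score_pos > score_neg) + 0.5 * P(tie).

    labels use 1 for malignant (positive) and 0 for benign (negative).
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(scores, labels, threshold):
    """Call a sample malignant when its score is above the threshold."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s > threshold)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s <= threshold)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s <= threshold)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical classifier scores and true labels, for illustration only.
example_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
example_labels = [1, 1, 0, 1, 0]
auc = roc_auc(example_scores, example_labels)
sens, spec = sensitivity_specificity(example_scores, example_labels, 0.35)
```

The AUC summarizes performance over all thresholds, whereas sensitivity and specificity describe a single operating point such as the one marked on each ROC curve.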

| DISCUSSION
Herein, we explored the composition of the microbiome in BALF samples from patients with lung cancer and benign lung diseases using metagenomic sequencing and identified specific microorganisms that could serve as promising biomarkers for the early diagnosis of lung cancer. Although mNGS technology has been widely used in clinical examination, 21 to the best of our knowledge, this is the first time it has been applied to the screening of biomarkers for lung disease typing. A growing number of clinical and animal studies have suggested a link between a dysregulated lung microbiome and lung cancer. 9 It has been demonstrated that in some cancers, such as cervical cancer 22 and resected pancreatic cancer, 23 an increase in alpha diversity is often linked to improved survival and treatment response, possibly by affecting the host immune response. As previously reported by Lee et al., the alpha diversity of microbial communities varies significantly between BALF samples from patients with lung cancer and those with benign lung diseases. 13 Similarly, our study found that the alpha diversity of malignant tumors differed from that of benign diseases at both the genus and species levels. In addition, beta-diversity results showed that although there was no significant separation in microbial community between the malignant and benign groups at either the genus or species level, the malignant group was more tightly clustered according to PCA, representing less variation among the malignant BALF samples than among benign samples (NMDS results showed the opposite).
Furthermore, the MRPP results showed that the overall structure of the microbial communities between the two groups demonstrated a differentiation trend, which is consistent with the findings of Tsay et al. 24 and Liu et al. 25 that also revealed significantly different lung microbiome compositions between lung cancer and non-malignant diseases.
The above results collectively imply that the microbiome composition in the two groups of tissues is inherently different, which is a potential prerequisite for screening microbial markers and supports the rationale for using microorganisms as tumor markers. Although few microorganisms other than certain bacteria and viruses are themselves carcinogenic, recent studies have shown that changes in the lung tissue microenvironment are related to microbial colonization, 26 and lower respiratory tract bacteria may participate in the occurrence and development of lung cancer through changes in the lung tissue environment. The main mechanisms are summarized as follows: (1) The activation of inflammatory pathways mediated by the inflammatory microenvironment promotes the occurrence and development of lung cancer. 27 (2) Immune activation mediated by the immune microenvironment promotes the occurrence and development of lung cancer within the host immune-microbial relationship. The pulmonary microbiota induces immune tolerance through recruitment of dendritic cells, γδ T cells, and regulatory T cells. 28 (3) Metabolic regulation mediated by the metabolic microenvironment promotes the occurrence and development of lung cancer. In recent years, studies have found that bacteria participate in the regulation of host metabolism and utilize the metabolites of host cells, such as amino acids, nucleotides, polysaccharides, lipids, and vitamins. Meanwhile, the metabolites produced by bacteria also affect the occurrence and development of tumors, influence tumor growth and spread, inhibit cell apoptosis, and enhance tumor angiogenesis. 29 At the same time, lung cancer is often associated with lower respiratory tract infection, which affects the composition of the airway flora, the therapeutic effect of lung cancer treatment, and the overall prognosis of the patient. 30

FIGURE 5 ROC curves of the classifiers based on the five genera combined with three tumor markers (CEA, NSE, and CYF21-1) for diagnosing lung cancer. A, All-genus classifier + CEA + NSE + CYF21-1. B-F, Five-genus classifier combined with different tumor markers. The red dot represents the diagnostic threshold; values above the threshold are classified as malignant and values below it as benign. The data at the red dot indicate the threshold (sensitivity and specificity). The shaded areas are the 95% confidence intervals of the AUC.
Additionally, this study identified specific microorganisms related to lung cancer through LEfSe and Metastat analysis. The results showed a significantly higher relative abundance of Achromobacter in the BALF samples of lung cancer patients, whereas previous studies have suggested that Achromobacter infection manifests as lung nodules mimicking carcinoma. 13 Combining the top 10 genera and species by abundance in benign and malignant samples showed that the common microbial genera included Veillonella, which has previously been verified as a potential biomarker for lung cancer by quantitative PCR. 12 These differentially enriched microorganisms may play specific roles in the tumor microenvironment by regulating specific metabolic pathways. 8 Previous studies have also shown that a reduction in the abundance of specific microorganisms may increase the risk of cancer. 14 Thus, we further conducted metagenome-based metabolic pathway differential analysis in patients with lung cancers and benign lung diseases. The results showed that amino acid-related metabolism was more prevalent in benign samples, whereas carbohydrate-related metabolism was more prevalent in malignant samples. It has been documented that both amino acid-related metabolism 31 and carbohydrate-related metabolism 32 are involved in the occurrence and development of various cancers. These data indicate that the inhibition of amino acid metabolism and the activation of carbohydrate metabolism may be associated with the progression of lung cancer.
To further evaluate the diagnostic performance of specific microorganisms in lung cancer, random forest classifiers were established based on these specific microorganisms combined with three tumor markers (CEA, NSE, and CYF21-1). According to feature importance on the MDA and MDG indicators, five genera were screened as potential biomarkers: Klebsiella, Mycobacterium, Pedobacter, Prevotella, and Xanthomonas. Klebsiella is often associated with lung infections. 33 However, in this study, its relative abundance was significantly increased in the benign group. A possible explanation is that Klebsiella is a very common pathogenic microorganism in lung infections, and its high virulence can cause symptoms similar to those of malignant tumors without leading to an increase in the incidence of malignant tumors. 34 Mycobacterium is the definite pathogenic agent of tuberculosis. It has been clinically proven that a history of pulmonary tuberculosis is closely associated with an increased risk of lung cancer, 35 and blocking the PD-1/PD-L1 signaling pathway may benefit patients with Mycobacterium tuberculosis or other chronic infections, or even prevent their cancer development. 36 However, related studies have also pointed out that this carcinogenic effect is a long-term effect. 37 In this study, the higher load of Mycobacterium in benign samples seems to imply that the presence of Mycobacterium predominantly marks the onset of pathogenic infection rather than lung cancer. Pedobacter is considered a background microorganism in the environment. 38 The load of this genus in malignant samples was significantly reduced in this study; however, the relationship between Pedobacter and lung cancer needs to be explored further. Prevotella was significantly more abundant in lung cancer patients than in patients with benign lung disease, which is consistent with previous findings. 24 This suggests that the genus has considerable clinical significance in lung cancer progression and is worthy of further exploration. A previous study has shown that in vitro exposure of airway epithelial cells to Wolbachia, Prevotella, and Streptococcus contributes to the upregulation of the ERK and PI3K signaling pathways in lung cancer. 24 In addition, Xanthomonas was significantly more abundant in benign samples, and consistently, microbes of this genus have been reported to play an anti-tumor role. 39 Afterwards, the diagnostic performance of the random forest classifiers was evaluated using ROC curves, and we found that the five key genera + NSE presented the optimal diagnostic performance (AUC = 0.959) for lung cancer. Jin et al. established a diagnostic model based on age, smoking years, and 11 types of bacteria to predict lung cancer with an AUC of 0.882. 40 Cheng et al. also constructed a random forest model similar to ours, based on the lung microbiome and tumor markers, to diagnose lung cancer with an AUC of 0.845. 41 Compared with the above results, the model of five genera + NSE in this study displays obvious diagnostic advantages for lung cancer.
However, our study has some limitations. First, the number of patients enrolled was not large, and a larger sample size in follow-up studies would clearly help in drawing more evidence-based conclusions. In addition, lung cancer patients were not classified by histological subtype or stage, which may cause heterogeneity in the results. Second, the model for distinguishing lung cancer from benign lung disease was not evaluated in a validation cohort, which can lead to false-positive results and unreliability. As this was a single-center study, a multi-center design should be considered to validate its findings before any subsequent industrialization attempts. Third, this is a cross-sectional study that illustrates the phenomenon only from a microbiological perspective. Although the possible metabolic pathways involved in lung cancer were preliminarily predicted based on the microbiome results, the interaction mechanism between them was not further explored. Fourth, BALF is not as readily available as saliva, which may limit clinical application.

| CONCLUSION
In this study, we characterized the microbiome of BALF samples from patients with lung cancer and benign lung disease based on metagenomic sequencing and predicted the metabolic pathways related to lung cancer. The model of five key differential genera + NSE established in this research exhibited remarkable performance in distinguishing lung cancer from benign lung disease. These results highlight these specific microorganisms as potential new targets for lung cancer diagnosis and treatment, offer value for mechanism research, and provide directions for further in-depth research.