Genetic and tumor microenvironment differences between cell cycle progression pathway altered/non-altered patients with lung adenocarcinoma


 Background

Lung cancer is the leading cause of cancer-related death worldwide, among which lung adenocarcinoma (LUAD) is the most common type. Identified as a hallmark of cancer, the dysregulated cell cycle progression plays an important role in the promotion and progression of LUAD. This article aims to elucidate the heterogeneity between CDKN2A-CDK/cyclin-RB1 cell cycle progression pathway altered /non-altered patients with LUAD, thus helping us have a better understanding of the effect of the aberrant cell cycle.
Material and Methods

The data of this study were downloaded from The Cancer Genome Atlas (TCGA) data portal (https://portal.gdc.cancer.gov/) and UCSC Xena Browser (http://xena.ucsc.edu/), including simple nucleotide variation data, RNA-seq gene expression data, survival data, clinical data, and miRNA expression data. After matching the RNA-seq gene expression data, simple nucleotide variation data, miRNA expression data, and survival data with clinical data, 510 gene and long non-coding RNA expression data, 506 simple nucleotide variation data, 440 microRNA expression data, and 497 survival data were included in this study for further analysis. R software (version 4.0.3) was used for analysis.
Results

After dividing the patients into mutation (n = 57) and wild (n = 453) groups according to the cell cycle progression pathway status, we found no significant difference in survivorship between them. The mutation group had a higher mutational load and mutational rates of various genes such as tumor protein P53 (TP53) compared to the wild group. Subsequently, we analyzed the differentially expressed genes (DEGs) between the two groups. Among the 58387 genes analyzed, 302 were upregulated, and 354 were downregulated in the mutation group. Enrichment analysis indicated that these DEGs were closely related to metabolism items and cell cycle-related events. After performing immune cell infiltration analysis, we found the two groups have different patterns of immune cell profiling. Albeit the immune and stromal scores were higher in the wild group, we failed to find any significant difference between the two groups. Finally, we build a computational model to predict the cell cycle progression pathway-related gene mutation by LASSO-binary logistic regression analysis, the predictive accuracy of which is 0.88.
Conclusion

In summary, our study compared the genetic and microenvironment differences between cell cycle progression pathway altered /non-altered patients with LUAD by analyzing the data from TCGA datasets. We hope our findings could improve our understanding of the heterogeneity between the two kinds of patients, thus providing new insight into LUAD patients' treatments.


Introduction
Lung cancer is the leading cause of cancer-related death and the most common type of cancer in men globally 1 . The two major types of lung cancer are non-small-cell lung cancer (NSCLC) and small-cell lung cancer (SCLC). The former type accounts for 85% of lung cancer cases 2 , among which lung adenocarcinoma (LUAD) is the most common subtype with more than 500,000 deaths each year worldwide 3 . According to the report published by the American Cancer Society and the National Cancer Institute, the 5-year relative survival of NSCLC for all stages is only 23% 4 . In the past decades, the utilization of targeted therapies and immunotherapies alone or in combination with conventional therapies such as chemotherapies has greatly improved the prognosis of LUAD patients 5 , and they begin to play an increasingly important role in the treatment of LUAD. These remarkable achievements are attributed to the deep understanding of the genetic and molecular traits of LUAD. In this article, we will talk about the cell cycle progression pathway, which is important in the promotion and progression of LUAD.
The cell cycle, a highly organized process regulated by cyclin and cyclin-dependent kinase (CDK) complexes and other regulators, consists of four sequential phases: G1, S, G2, and M phases 6 . In the G1 phase, cells are busy with biosynthesis prepared for the following steps. S phase is characterized by DNA replication and the synthesis of related proteins like histones. During the G2 phase, cells enhance the lipid synthesis required for membrane construction and guarantee that everything is ready to initiate the mitosis 6 . In the M phase, the chromosomes will be evenly separated into two daughter cells, and then the mitosis comes to an end. Compared to normal tissues that could carefully control the production and release of growth-promoting signals during the cell cycle, cancer cells could maintain sustainable proliferative signals and avoid programs that may arrest cell cycle progression 7 . We found the ve genes, including cyclin-dependent kinase inhibitor 2A (CDKN2A), cyclinD1 (CCND1), cyclin-dependent kinase 4 (CDK4), cyclinE1 (CCNE1), and retinoblastoma 1 (RB1), are frequently aberrant in LUAD patients, and they orchestrate a pathway together which could modulate cell cycle progression. CDKN2A, which encodes the CDK inhibitor p16 INK4a and p14 ARF protein, is located upstream of the cell cycle progression pathway and could be found loss-of-function in multiple human tumors, including NSCLC 8 . As a tumor suppressor gene CDKN2A functions as an inhibitor of CDK4, which is a driver for mitosis. CDK4-CCND1 complexes, along with CDK2-CCNE1 complexes, could phosphorylate and inactivate the RB1 to release the E2F transcription factor, thus inducing the cell to complete G1-to-S transition [9][10][11][12] . Among all the mitotic steps mentioned above, the transition from G1 to S is the most crucial for cell cycle progression. Once the cell has entered the S phase, it is bound to get through S, G2, and M phases and generate two daughter cells 13 . As we can see, the cell cycle progression pathway is well organized by various regulators, errors at any step may disrupt the homeostasis of cell proliferation, apoptosis, and death, thus contributing to tumor promotion and progression.
In this article, we systematically analyzed the differences in somatic mutations, gene expression, immune cell in ltration, microRNAs (miRNAs), and long non-coding RNAs (lncRNAs) between the cell cycle progression pathway altered and non-altered patients with LUAD. This study aims to enhance our understanding of the function of the cell cycle progression, which could shed new lights on the development of new drugs targeting this pathway.

TCGA data collection
The TCGA data of this study were downloaded from The Cancer Genome Atlas (TCGA) data portal (https://portal.gdc.cancer.gov/) and UCSC Xena Browser (http://xena.ucsc.edu/), including simple nucleotide variation data, RNA-seq gene expression data, survival data, clinical data, and miRNA expression data. After matching the RNA-seq gene expression data, simple nucleotide variation data, survival data, miRNA data with clinical data, 510 gene and lncRNA expression data, 440 miRNA expression data, 506 simple nucleotide variation data, and 497 survival data were included in this study for further analysis in R software (version 4.0.3).

Clinical data of the patients and survival analysis
Patients who harbored any mutant cell cycle progression pathway-related genes (CDKN2A, CCND1, CDK4, CCNE1, and RB1) were de ned as the mutation group, and the others were de ned as the wild group. Clinical data including age, sex, race, anatomic location of the lesion, smoking history, and TNM stage were collected for analysis by t-test. Next, Kaplan-Meier survival analysis was performed between the two groups.

Somatic mutations and mutational load
The simple nucleotide variation data were divided into two groups according to the CDKN2A-CDK/cyclin-RB1 pathway status, and then they were performed by the R maftools package separately. P-value <0.05 was considered as signi cantly different.

Differentially expressed genes (DEGs) and enrichment analysis
Based on the RNA-seq gene expression data (count format), DEGs were analyzed by the R edgeR package. The absolute value of Log 2 FoldChange (|Log FC |) >1 and p-value <0.05 were considered as signi cantly different. Subsequently, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis were performed using R org.Hs.eg.db and Clusterpro ler package. Gene set enrichment analysis (GSEA) and Gene set variation analysis (GSVA) were also performed for further analysis.

Characteristics of immune cell in ltration
The pro ling of 22 immune cell types was analyzed using gene expression data (FPKM format) and the leucocyte signature matrix (LM22) in CIBERSORT (https://cibersort.stanford.edu/) [permutations set =1000]. LM22, composed of 547 genes, could distinguish 22 human hematopoietic cells, including various subtypes of T cells, B cells, plasma cells, NK cells, and myeloid cells 14 . Besides, we calculated the immune and stromal scores by the R estimate package. P-value<0.05 was considered as signi cantly different.

Construction of predictive model by LASSO -binary logistic regression analysis
Based on the 656 DEGs, we used LASSO (Least absolute shrinkage and selection operator) and binary logistic regression analysis to build a computational model to predict if a LUAD patient harbored any cell cycle progression pathway-related gene mutation. The LASSO regression analysis is a penalized method to select data with high dimensions and reduce the impact of over tting 15,16 . Ten-fold cross-validation was adopted using the R glmnet package to determine the optimal parameter λ and corresponding genes.
Subsequently, we conducted binary logistic regression using the candidate genes and build the predictive model. Based on the gene expression data, each sample obtained a predictive value according to this model. Then we exploited the receiver operating characteristic curve (ROC) to determine the cutoff value for the established model and calculate the predictive accuracy.

Clinical data of the patients and survival analysis
The mutational rates of RB1, CDKN2A, CCNE1, CDK4 and CCND1 in TCGA samples are 7.6%, 4.9%, 1.2%, 0.6% and 0.4%. After dividing the patients into mutation (n=57) and wild (n=453) groups, we found that more than half of the patients in the mutation group harbored RB1 mutation (68.4%, the common variants in lung cancer were C706F, G748K, and R661W), followed by CDKN2A (43.9%, the common variants in LUAD were R80Q and H83Y), CCNE1 (10.5%), CDK4 (5.3%, the common variants in LUAD were R24L and R24C ) and CCND1 (3.5%) Figure 1A . Among the 57 patients in the mutation group, 8 of them had more than one mutant gene in the pathway. The clinical data analysis showed that the differences between the two groups in age, sex, anatomical location of the lesion, smoking history, and TNM stage of the disease were not statistically signi cant Table 1 . Then we performed the Kaplan-Meier survival analysis between the two groups, which indicated that the mutation group had a higher survival rate most of the time Figure 1B . But as the number of patients at risk in this group was quite few after 42 months, the curve of the mutation group suffered a sharp decline after that, resulting in a worse prognosis than the wild group. Thus, we think this survival result needs to be further veri ed in larger groups.

Differences in somatic mutations and mutational load
Different mutational patterns were detected between the mutation and wild groups Figure 2A&B, Supplementary Figure 1 . Among the top 20 mutant genes, most of them, including tumor protein 53 (TP53, 74% VS 44%), Xin actin binding repeat containing 2 (XIRP2, 37% VS 22%), zinc nger protein 536 (ZNF536, 33% VS 18%), piccolo presynaptic cytomatrix protein (PCLO, 32% VS 16%), and protocadherin related 15 (PCDH15, 33% VS 17%) were signi cantly higher in the mutation group. Only 2 [Kirsten Rat Sarcoma Viral Proto-Oncogene (KRAS) and Ankyrin 2 (ANK2)] had higher mutational rates in the wild group. After calculating the mutational load between the two groups, we found the mutation group had a higher mutational load (P<0.01) Figure 3A , which indicated that the mutation group is more likely to bene t from immunotherapies. Then we analyzed the expression patterns of 15 immune checkpoints and 20 co-stimulators 17 between the two groups Figure 3B&C . Among co-stimulators, TNF superfamily member 13 (TNFSF13) showed a signi cant difference and was higher in the wild groups. As for immune checkpoints, lymphocyte activating gene 3 (LAG3) was signi cantly higher in the mutation group.
To explore the biological functions affected by DEGs, we performed the GO and KEGG analysis. In GO analysis, the top ve signi cantly different pathways were corni cation, antimicrobial humoral response, neurotransmitter transport, negative regulation of peptidase activity, and negative regulation of endopeptidase activity Figure 4B-D . KEGG analysis showed the DEGs were related to fat digestion and absorption, drug metabolism − cytochrome P450, and retinol metabolism Figure 4E . So we can see that these DEGs may make a difference in the metabolism items in our body.
Apart from GO and KEGG analysis, we further investigated the in uence of cell cycle progression pathway status using GSEA and GSVA analysis. Processes like cell cycle, DNA replication, and homologous recombination were signi cantly different between the two groups by GSEA analysis Figure 5A-C . GSVA analysis showed biological processes such as DNA replication, E2F targets, mitotic spindle, and G2M checkpoints were signi cantly different between the two groups Figure 5D-I , which was consistent with the GSEA results, suggesting that the different gene expression patterns have a signi cant impact on the cell cycle-related events.
Both miRNAs and lncRNAs could act as important modulators of the expression and function of genes. Here, we analyzed the differentially expressed miRNAs and lncRNAs between the two groups Supplementary Figure 2A&B . The results showed that 31 miRNAs were upregulated in the mutation group, the most signi cantly different ones of which were miR−3689e (logFC=2. 30 After matching DEGs and differentially expressed lncRNAs with differentially expressed miRNAs, we built a ceRNA network of DEGs-differentially expressed miRNAs-differentially expressed lncRNAs (Here, only genes negatively regulated by miRNAs were included in this network) Figure 6 . For example, miR-509-3p could regulate ZNF713 and the other genes. We found that miR-3131 and miR-185-3p could regulate more than 40 genes, which indicated that the two miRNAs and the genes they regulate might play a signi cant role in the differences between the mutation and wild group. As for lncRNAs, RP11-230G5.2 handled the most miRNAs. macrophages (P<0.01) were higher in the mutation group, while CD4 + memory resting T cells (P<0.01), monocytes (P<0.001), activated mast cells (P<0.05), resting (P<0.05) and activated (P<0.01) dendritic cells are higher in the wild group, indicating that the mutation group may have a cold tumor microenvironment. Then we further explore the tumor purity between the mutation and wild groups. The results suggested that the immune and stromal scores were higher in the wild group, but we failed to nd any signi cant difference between the two groups Figure 7B&C .

Construction of predictive model by LASSO -binary logistic regression analysis
In LASSO analysis, the optimal parameter λ was set as lambda.1se criteria, which means one standard deviation from the minimum binomial deviance Figure 8A&B . Seven genes including natriuretic peptide A (NPPA), free fatty acid receptor 2 (FFAR2), mucolipin TRP cation channel 3 (MCOLN3), zinc nger protein 713 (ZNF713), F-box protein 43 (FBXO43), TATA-box binding protein associated factor 7 like (TAF7L), and synaptic vesicle glycoprotein 2B (SV2B) were employed to build the predictive model. Then we conducted the binary logistic regression analysis to get the coe cients of every factor, and the predictive model is as follows: score= -0.8434×NPPA + 0.4617×FFAR2 + 0.6178×MCOLN3 + 0.9924×ZNF713 + 0.8930×FBXO43 + 0.5772×TAF7L + 1.3533×SV2B Figure 8C . ROC demonstrated the cutoff value of this model was 3.008, and AUC (area under the curve) is 0.777 Figure 8D . When the score is equal to or higher than 3.008, the test sample is more likely to harbor cell cycle progression pathway-related gene mutation. Otherwise, it is more likely to carry no cell cycle progression pathway-related gene mutation. After testing this model in TCGA data, we found the predictive accuracy is 0.880.

Discussion
Lung cancer is the leading cause of cancer-related death worldwide, among which LUAD has been the most prevalent histopathological type since 1985 19 . In our study, we explored the differences between the CDKN2A-CDK/cyclin-RB1 cell cycle progression pathway altered/non-altered patients with LUAD in somatic mutations, gene expression, immune cell in ltration, and various biological processes. Compared to the wild group, the mutation group had a higher mutational load and mutational rates of multiple genes such as TP53. By analyzing the RNA-seq gene expression data, we found that 302 protein-coding genes were upregulated in the mutation group, such as VGLL2, GCG, and cysteine-rich secretory protein 3 (CRISP3), and 354 were downregulated like DEFA5. Through enrichment analysis, the DEGs were closely related to metabolism items and cell cycle-related events. The same as DEGs, miRNAs, and lncRNAs, which play an important role in epigenetic modi cation, differed between the two groups. Subsequently, we built a ceRNA network of DEGs-differentially expressed miRNAs-differentially expressed lncRNAs to demonstrate their relationships. After performing immune in ltration analysis, we found that 7 out of 22 cell types were signi cantly different between the two groups. The proportion of CD4 + memory resting T cells (P<0.01), monocytes (P<0.001), activated mast cells (P<0.05), resting (P<0.05), and activated (P<0.01) dendritic cells were higher in the wild group, indicating that the mutation group may have a cold tumor microenvironment.
Most RB1-mutant tumors also harbor TP53 co-alterations 20 . In our study, the mutational rate of TP53 between mutation and the wild group is 74% VS 44%. As a well-known tumor suppressor gene, TP53 could monitor the abnormalities within the cell. Once the cell suffers excessive DNA damage or the status is not optimal for mitosis, TP53 will suspense the progression of the cell cycle in time until the conditions have been normalized. Besides, when a cell encounters irreversible impairment, TP53 could trigger apoptosis to maintain homeostasis 7 . Mutant TP53 not only lost the abilities of surveillance but also create a circumstance that favors immune evasion and tumor progression by disrupting the innate immune signaling 21 . Given the important roles RB1 and TP53 play in the cell cycle progression and the high mutational rates in LUAD patients, both of them may be potential targets for drug development.
The mutation group had a higher mutational load (P<0.01), indicating that these patients are likely to produce more neo-antigens that could be recognized by T cells 22 . Therefore, the mutation group may acquire a better clinical outcome to immunotherapies compared to the wild group. Aside from mutational load, LAG3, an immune checkpoint molecule, is also higher in the mutation group. Immune checkpoints are crucial for maintaining autoimmune tolerance but could be harnessed by cancer cells to evade immune surveillance. The advent of anti-programmed death 1 (PD-1) and programmed death-ligand 1 (PD-L1) drugs have shown great e cacy in lung cancer patients. However, less than 25 percent of patients are expected to bene t from them 23 . LAG3 could be detected on tumor-in ltrating lymphocytes (TILs), B cells, natural killer cells, and Tregs 24 . With nearly 20% sequence identical to CD4, it could bind with MHC-, thus negatively regulating the function of NK cells, CD4 + and CD8 + T cells, and Tregs 25 . In NSCLC patients, LAG3 is usually associated with PD-1 expression and poor prognosis 26 . It is reported that LAG3 may have a synergistic effect with PD-1/PD-L1. Therefore, LAG3 may be a new biomarker for NSCLC patients and provide new strategies to improve the effects of anti-PD-1 and PD-L1 therapies. Patients with altered cell cycle progression pathways are likely to bene t from it.
As our results show, the mutation group had a higher expression level of CRISP3. CRISP3 comprises two domains: a CAP domain common to the entire superfamily and a clade-speci c CRISP domain 27 . The CRISP domain is shown to regulate certain types of ion channels. Accumulating evidence suggests that CRISP3 may be involved in the oncogenesis of oral squamous cell carcinoma 28 and progression of prostate cancer 27 . Highly enriched in male reproductive tract, CRISP3 protein is found to play an important role in the dissemination of cancer cells in mouse models of prostate cancer, and it is considered a potential target for treatment 27 . In breast cancer patients, CRISP3 is negatively related to prolonged overall survival (OS) and disease-free survival (DFS) 29 . Analogous to prostate and breast cancer, CRISP3 may also play a pro-tumor role in LUAD, but its exact effect is still unknown.
Compared to the wild group, DEFA5 is signi cantly downregulated in the mutation group. Defensins are small cysteine-rich cationic polypeptides that are secreted by speci c leukocytes and epithelial cells 30 .
Recognized as antimicrobial agents, defensins also play an important anti-tumor role 31 . Via binding with B cell-speci c Moloney murine leukemia virus integration site 1 (BMI1), DEFA5 could exert anti-tumor effects by inhibiting the cell mitosis. BMI1 serves as a transcription inhibitor of CDKN2A 32 , which is a tumor suppressor gene and could arrest the cell cycle progression. Therefore, by antagonizing BMI1 expression, DEFA5 could indirectly call a halt to the aberrant cell division. In an experiment of gastric cancer cells, DEFA5 overexpression dramatically increased the number of G1-phase cells but signi cantly decreased G2/M-phase cells 30 , indicating that DEFA5 overexpression could result in cell cycle arrest at the G1 phase. Given the role of DEFA5 in cell cycle arrest, it may be a novel target for developing new drugs, but further exploration is needed to clarify its effect in LUAD. miR-3131, which was upregulated in the mutation group, regulated more than 40 genes in the ceRNA network. Important as it is, its function is barely reported. Wang CQ et al. founded that the overexpression of miR-3131 is positively correlated with cell viability of hepatocellular cancer, and miR-3131 could inhibit apoptosis by upregulating BCL-2 and downregulating caspase-3 33 . So miRNA-3131 may also exert protumor effects in LUAD, but its role still needs further veri cation. MiR-185-3p, also upregulated in the mutation group, is proven a tumor suppressor in colorectal cancer (CRC) 34 , the overexpression of which could enhance the chemosensitivity of CRC patients. Therefore, it is considered a potential target for the treatment of 5-FU-insensitive CRC 35 .

The tumor microenvironment, which is composed of tumor cells, immune cells, and stromal cells, is
closely correlated with the response to immunotherapies and drug resistance 36 . We found that the wild group had a higher proportion of monocytes and dendritic cells through immune cell in ltration analysis. Monocytes are highly heterogenous in the complex tumor microenvironment. Based on the expression level of Ly6C, monocytes can be divided into Ly6C hi classical monocytes, and Ly6C lo non-classical monocytes, both of which could play pivotal roles in pro-and anti-tumor effects in different conditions 37 .
For example, non-classical monocytes could release immunosuppressive cytokines, such as IL-10, which suppresses the function of cytotoxic T lymphocytes in tumors 38 . They are also capable of recruiting NK cells to the pre-metastatic niche 39 , thus killing the cancer cells. This NK cell-recruiting function has been con rmed in patients with early-stage lung cancer 40 . Since monocytes function like a double-edged sword, further research may look at in which condition it can exert anti-tumor effects in LUAD patients.
As the most important antigen-presenting cells (APCs) in our body, dendritic cells (DCs) play a signi cant role in both innate and adaptive immunity. By presenting the antigen-MHC complex to the T cells, DCs could activate the T cell-mediated immune response to kill cancer cells 41 . What's more, by secreting high levels of type-interferons and chemokines such as CXCL9 and CXCL10, DCs could facilitate the recruitment of effector T cells and NK cells into tumors and maintain the cytotoxic functions of effector cells 42,43 . Sipuleucel-T, a DC-based vaccine, has shown clinical bene ts and has been approved by the US Food and Drug Administration (FDA) for treating prostate cancer 44 . As the stunning anti-tumor effects DCs could elicit, it can be employed for killing cancer cells and may greatly improve the prognosis of LUAD patients.
Since sustaining proliferative signals is a hallmark of cancer 7 , inhibition of the dysregulated cell division is a promising strategy for cancer therapies. The administration of CDK4/6 inhibitors could prevent the phosphorylation of RB1 by CDK-cyclin complexes, calling a halt to the cell cycle progression. Up to date, three generations of CDK inhibitors have been developed for cancer treatment. As the third-generation selective CDK inhibitors, Palbociclib, ribociclib, and abemaciclib have been approved by the US FDA to treat breast cancer 45 . Despite the clinical bene t in speci c breast cancer patients, the results of clinical trials accessing the effect of single-agent CDK inhibitors in NSCLC are frustrating 46 . To fully evaluate the application of CDK inhibitors in NSCLC, further investigation in this eld may target combination with other anti-cancer therapies and the discovery of predictive biomarkers.
There are also some limitations of our study. Besides the status of the cell cycle progression pathway, variants such as comorbidities could also contribute to differences between the two groups, but these factors were neglected in this article. What's more, the number of patients in the mutation group is smaller than the wild group, and we cannot verify our results in Gene Expression Omnibus (GEO) datasets due to the lack of gene mutation information, so further investigation is needed in larger groups.

Conclusion
In conclusion, we analyzed the differences in somatic mutations, gene expression, various biological pathways, and immune cell in ltration between the CDKN2A-CDK/cyclin-RB1 cell cycle progression pathway altered/ non-altered patients with LUAD. Patients in the two groups have the different somatic mutation and gene expression patterns, pro ling of immune cell in ltration, and biological pathways that may play an important role in oncogenesis and tumor metastasis. We hope our study could improve our understanding of the function of the cell cycle progression pathway, thus contributing to the development of new therapies and precision medicine in the future.

Declarations
Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable.
Competing interests: The authors declare no competing interests in this work.