Impact of mini-driver genes in the prognosis and tumor features of colorectal cancer samples: a novel perspective to support current biomarkers

Background Colorectal cancer (CRC) is the second leading cause of cancer-related deaths, and its development is associated with the gains and/or losses of genetic material, which leads to the emergence of main driver genes with higher mutational frequency. In addition, there are other genes with mutations that have weak tumor-promoting effects, known as mini-drivers, which could aggravate the development of oncogenesis when they occur together. The aim of our work was to use computer analysis to explore the survival impact, frequency, and incidence of mutations of possible mini-driver genes to be used for the prognosis of CRC. Methods We retrieved data from three sources of CRC samples using the cBioPortal platform and analyzed the mutational frequency to exclude genes with driver features and those mutated in less than 5% of the original cohort. We also observed that the mutational profile of these mini-driver candidates is associated with variations in the expression levels. The candidate genes obtained were subjected to Kaplan–Meier curve analysis, making a comparison between mutated and wild-type samples for each gene using a p-value threshold of 0.01. Results After gene filtering by mutational frequency, we obtained 159 genes of which 60 were associated with a high accumulation of total somatic mutations with Log2 (fold change) > 2 and p values < 10−5. In addition, these genes were enriched to oncogenic pathways such as epithelium-mesenchymal transition, hsa-miR-218-5p downregulation, and extracellular matrix organization. Our analysis identified five genes with possible implications as mini-drivers: DOCK3, FN1, PAPPA2, DNAH11, and FBN2. Furthermore, we evaluated a combined classification where CRC patients with at least one mutation in any of these genes were separated from the main cohort obtaining a p-value < 0.001 in the evaluation of CRC prognosis. Conclusion Our study suggests that the identification and incorporation of mini-driver genes in addition to known driver genes could enhance the accuracy of prognostic biomarkers for CRC.


INTRODUCTION
Colorectal cancer (CRC) is a significant global health problem, ranking as the third most diagnosed neoplasia and the second leading cause of cancer-related death worldwide (Bray et al., 2018;Sung et al., 2021). In 2020, there were approximately 1.93 million new cases of CRC, representing 10% of all cancer cases (Sung et al., 2021). Despite advances in detection and treatment, CRC incidence and mortality rates have increased by 7.2% from 2018 to 2020. Therefore, it is essential to continue research into the underlying mechanisms driving CRC progression (Bray et al., 2018;Sung et al., 2021).
KRAS, NRAS, BRAF, TP53, and APC are considered driver genes for CRC since a few pathogenic mutations in any of these genes are sufficient to develop a tumor (Thierry et al., 2014). However, defining a single group of genes as the drivers of all tumors is challenging and only explains a small number of cancer cases (Thierry et al., 2014). This challenge arises because the description of any genetic variation as a pathogenic mutation depends on factors such as its impact on the translated protein (missense, nonsense, etc.) or the number of nucleotides affected (single-nucleotide, insertion, deletion, etc.).
To differentiate deleterious variations from polymorphisms, current genetic definitions consider mutations to be any genetic variations with a reduced frequency (<1%) in a healthy population (Al-Koofee & Mubarak, 2020). However, these definitions do not account for the possibility of new transcriptional switches that can support novel consequences of previously known polymorphisms, as recently described in CRC (Abdi, Latifi-Navid & Latifi-Navid, 2022).
The identification of coding genes, lncRNA, circRNA, and miRNA through transcriptomic studies has broadened the concept of mini-driver genes to genes or genomic regions that may collectively be associated with poor prognosis in cancer (Yang et al., 2018;Wu et al., 2020). Nevertheless, considering the large group of genes associated with cancer features and prognosis whose function is not elucidated, it is essential to propose a strategy that includes all types of genetic variations as mutations for analyzing candidate genes capable of supporting current driver genes. To this end, we adopt the concept of mini-driver genes, which refers to low-frequency genetic alterations with a relatively weak tumor-promoting effect (Castro-Giner, Ratcliffe & Tomlinson, 2015).
There are established criteria for identifying mutated genes that may be considered mini-driver genes. Firstly, individual mutations in a mini-driver gene should provide a growth advantage for cancer cells compared to normal cells, although this is not necessarily critical for tumor development. Secondly, mini-driver genes must be present in a small proportion of tumors. Thirdly, they may be present in subclones because they have a relatively weak selective advantage, resulting in less probability of selective sweeping. Fourthly, mini-driver genes should show parallel or convergent evolution between cancer subclones and between cancers of the same type. Finally, they should be involved in processes such as gene expression regulation, mRNA stability, transcriptional changes, DNA methylation, and other non-coding genomic features (Castro-Giner, Ratcliffe & Tomlinson, 2015;van Ginkel, Tomlinson & Soriano, 2023).
Mini-driver genes play a significant role in tumor diversification, but the mechanisms underlying their effects are not well understood. One possible mechanism is that mini-drivers may help to maintain tumor homeostasis by counteracting the deleterious effects of some passenger mutations (Li & Thirumalai, 2016;Cuykendall, Rubin & Khurana, 2017). In some cases, non-coding mutations could be called "mini-drivers" because they alter transcriptional regulation, mRNA translation and stability, splicing control, and chromatin structure, leading to altered gene expression that favors tumor progression (Elliott & Larsson, 2021). Another possible mechanism by which mini-driver genes contribute to tumor development is through working together with driver genes/ mutations. Large-scale genome-wide association studies (GWAS) have shown that even the most significant loci explain only a fraction of the predicted genetic variation for typical traits (Boyle, Li & Pritchard, 2017). Therefore, mini-driver genes may explain how polygenic effects provide a means by which heterogeneous mutation patterns can generate distinctive changes consistent with the phenotype observed in tumors (Bennett et al., 2018).
Clinical studies have identified mini-driver genes as prognostic biomarkers in cancer (Bennett et al., 2018) and our group evaluated mini-driver features in selected genes (Campos Segura, 2022) to focus on their contribution to CRC progression. In this study, we propose a strategy to identify potential mini-driver genes in CRC using Next Generation Sequencing (NGS) data and determine whether they could serve as prognostic markers, providing insights into their role in CRC progression.

Database filtering and mutational frequency analysis
We retrieved data from Next Generation Sequencing (NGS) experiments and clinical information of colorectal cancer (CRC) patients using the cBioPortal platform (https:// www.cbioportal.org/) ("cBioPortal for Cancer Genomics"; Cerami et al., 2012). Genomic data was only utilized to provide mutational information, while transcriptomic data, when available, was used to obtain mutational status, prediction of copy number alterations, and gene expression levels. Three cohorts of colorectal adenocarcinoma were selected for analysis, including the Dana-Farber Cancer Institute (DFCI) cohort (n = 619) (Giannakis et al., 2016), the Pan-Cancer Atlas from The Cancer Genome Atlas (TCGA) cohort (n = 594) , and the Memorial Sloan Kettering-Cancer Center (MSKCC) cohort (n = 138) (Brannon et al., 2014) datasets. All data sets were generated using Illumina HiSeq sequencers, including information for 18,215 genes. Clinical data are summarized in Table 1.

Characterization of somatic mutational profile in CRC
The cBioPortal platform was used to generate a report that displays the number of genetic variants detected in each gene, the number of patients with at least one variant per gene, and the total number of patients with available mutational information per gene. For this study, we included all somatic genetic variations, encompassing both SNPs (≥1% frequency in populations) and pathological mutations (<1% frequency in populations), as mutations. We then calculated the mean number of mutations per gene and patient (Table S1).
Using these values, we established four groups of genes. For that, we consider the power of the sample (number of patients), the percentage of participating genes, and the ranking of common driver genes in CRC. As result, we classified the genes as rarely mutated (<7%), low-mutated levels (7-10%), moderately mutated (11-50%), and highly mutated (>50%) genes. To visualize these results, we designed a scatterplot using the R software v.4.2.0 (R Core Team, 2022) with the ggplot2 package (Wickham, 2011).

Association between mutational status and tumor mutational burden
After filtering the prior data, we selected a putative group carrying mini-driver genes, consisting of low-mutated genes in patients (7% and 10%). Based on these genes, we compared the total number of variations (tumor mutational burden, TMB) per patient according to their mutational status per gene. In our comparison, we considered any patient with at least one somatic variation in a specific gene as "mutated". To compare TMB, we utilized the Mann-Whitney test and calculated the fold change between mutated and wild-type groups per gene. We then adjusted the p-values using the Benjamini-Hochberg (BH) method (Benjamini & Hochberg, 1995). Finally, we established a mean-value threshold (10 −5 ) for selecting the top genes whose mutational status was associated with changes in TMB. To visualize our findings, we plotted the results using the Enhanced Volcano package in the R software ("EnhancedVolcano"; Blighe, Rana & Lewis 2018).

Association between mutational status and gene expression/copy number variations
We investigated the expression levels of genes with mutational status and copy number variations. To accomplish this, we used the TCGAretriever package ("TCGAretriever: Retrieve Genomic and Clinical Data from TCGA"; Fantini, 2019) to verify the normalized expression levels of genes and putative copy number alterations predicted by Gistic (Mermel et al., 2011) in colon and rectal adenocarcinoma samples. We applied the same criteria as in the previous step to dichotomize the patient population for each selected gene (mutated or wild-type) and compared the expression levels of genes or copy number alterations. We then used the Mann-Whitney test to compare these values, taking into account the distribution of expression levels. Finally, we represented all samples in boxplots for each pair of genes. The Y-axis displays expression levels or the number of altered copies of the first gene, while the X-axis shows patient categories based on the mutational status of the second gene.

Gene enrichment
Gene lists were obtained after applying the Enrichr tool (Xie et al., 2021). We used all available databases from Transcription (17 databases), Ontology (25 databases), and Pathway (eight pathways) modules. After loading gene lists, we only considered relevant pathways where more than three genes were associated with p < 0.001.

Survival analysis
To investigate the prognostic value of the mini-drivers, we employed Kaplan-Meier curves and Cox proportional hazard regression analysis (Borgan Rnulf, 2001). Then, we identified the mini-drivers that potentially play a role in disease progression, including DOCK3, FN1, PAPPA2, DNAH11, and FBN2. Then, we established a gene panel based on these genes. Cox regression analysis was performed, including age (as continuous), sex, and tumor stages (AJCC, American Joint Committee on Cancer) as potential confounding variables.
To perform these analyses, we utilized the survival and survminer packages of R software version 4.2.0 (R Core Team, 2022).

RESULTS
Identifying mini-drivers in CRC based on the mutational frequency As summarized in Fig. 1, to identify potential mini-drivers in CRC, we analyzed the gene mutation frequency, and the statistical power. Then, we classified the genes into four groups (Table S1, Fig. 2): rarely mutated (≤7%, 17,993 genes), lowly mutated (7-10%, 159 genes), moderately mutated (11-50%, 62 genes), and highly mutated (>50%, two genes). The rarely mutated group includes genes with less than one variation in at least 59 patients, which represents less than 5% of the patients with available information for mutational profiling (980 patients from three cohorts with available information). To avoid non-representative results in further analyses, we excluded this group of genes. The moderately and highly mutated groups include genes considered as drivers, such as APC with 66%, TP53 with 56%, KRAS with 35%, BRAF with 16%, and ARID1A with 11%, among others. We focused on exploring the genes in the lowly mutated group (159 genes) as potential mini-drivers. We analyzed these genes with the Enrichr database and found that they are mainly associated with hsa-miR-218-5p regulation (miRTarBase 2017 database, adjusted p-value = 1.9 × 10 −4 ), extracellular matrix organization (Reactome 2022 database, adjusted p-value = 9.1 × 10 −11 ), and epithelial-mesenchymal transition (EMT, MsigDB Hallmark 2020 database, adjusted p-value = 2.5 × 10 −8 ).
Putative mini-driver genes are associated with high mutation rates In order to assess the impact of mini-drivers on tumoral progression, we analyzed the polygenic effect of 159 genes selected from the previous step. Our goal was to identify which of these genes were associated with high mutation rates. As shown in Fig. 3, we found that 60 genes had a significantly higher TMB when mutated compared to their respective wild-type group, with an increase ranging from 5.4-8.7-fold (p-value < 10 −5 ). Among the most statistically significant genes associated with high TMB were MUC5B, DNAH7, DOCK3, and BMPR2. Meanwhile, we identified SYNE2, COL7A1, NOTCH3, and SPEG as the genes with the highest mutation counts. Mini-drivers are associated with specific gene signatures in CRC After identifying the genes associated with high TMB (Fig. 3), we compared the expression levels and copy number alterations with the mutational status of these genes. We discovered 46 genes whose mutational status was linked to changes in the expression levels of other genes (p < 0.05, Table S2). Remarkably, the mutated group of the BMPR2 gene alters the expression of nine genes (ACVR2A, DOCK3, MUC5B, FLNA, DNAH8, SYNE2, FBN2, ITPR3, and DLC1), whereas mutated groups of FLNB, ACVR2A, SIPA1L3, and CELSR1 genes altered the levels of other six genes. Table 2 provides a summary of the 43 relevant gene pairs (log 10 (FC) > 0.4, p < 0.01), while Fig. 4 displays a selection of these findings. Among them, we found that the subexpression of MYH11 was related to its mutational profile (p = 0.0043), UBR5 levels were decreased in patients with mutations in FLNB (p = 0.0012) or COL7A1 (p = 0.0075), whereas FBN3 mutated was associated with low expression of FBN2 (p = 0.0079). Additionally, we observed reduced expression levels of ZNF536 and CUX1 genes in samples mutated for FLNB and TMEM132D genes, respectively (p < 0.008). Furthermore, we compared the mutational status of these 60 genes with copy number variations (CNVs). Specifically, we compared the number of altered copies (gains or losses) between mutated and WT groups for these genes. We identified 12 genes whose mutational status was linked to copy number alterations in other genes (p < 0.05, Table S3). Interestingly, the mutated SLITRK5 gene cluster was related to copy number alterations in five genes (NOTCH3, FBN3, SIPA1L3, PTPRS, and CUX1), particularly in NOTCH3 and FBN3, where the SLITRK5 mutated group showed large ratios (greater than 4) compared to WT patients (Fig. 5).
Overall, our results demonstrate that MUC5B, TMEM132D, SYNE2, POLE, CUX1, and NOTCH3 play critical roles as effectors with altered expression levels (Table S2) or copy number variations (Table S3) in the context of mini-driver genes.

Mini-driver genes contribute to CRC prognosis
We used the 60 genes obtained to analyze their performance as biomarkers for survival in CRC (Fig. 3). We applied this analysis to 201 CRC patients that had complete information on overall survival rates and mutational profiles of the 60 genes. The mutational status of five genes, DOCK3, FN1, PAPPA2, DNAH11, and FBN2, were associated (p < 0.05) with poor survival in CRC patients (Fig. 6A). We observed a survival rate of~25% or less for the mutated group, while the wild-type (WT) group retained more than 75% of survival after 5 years of follow-up. Then, when we constructed a panel with these genes, we observed an improvement in the survival rates differences between both groups (median OS mutated = 38.5 months vs. median OS WT = Not reached; Log-rank, p < 0.0001; Fig. 6B). Interestingly, we were able to identify a gene expression signature constituted of 16 genes that characterize the mutated group (p < 0.05 and absolute FC > 1.5). Between these genes, DOCK3, FN1, ADAMTS2, AHNAK, AHNAK2, DNAH7, NBEA, SACS, SMAD4, and VWF, were upregulated in the mutated group; whereas AMER1, DIDO1, LRP1, LRP1B, RNF43, and TG, were downregulated (Fig. S1). Finally, we tested the ability to predict the outcome of our mini-driver gene panel, in the presence of possible confounding variables (Fig. 6C). Our multivariate Cox regression demonstrated that our gene panel maintains its usefulness in predicting prognosis independent of age, sex, and AJCC tumoral staging (HR = 2.92, p = 0.002). Finally, Fig. 7 presents a visual abstract to summarize all our findings in this study and how our strategy could propose mini-driver genes as an additional set of markers to support current driver genes in CRC.

DISCUSSION
In this study, we utilized three reliable databases of colorectal cancer patients to propose novel perspectives on the analysis of putative mini-driver genes. These studies provide crucial and representative information on colorectal cancer. For instance, DFCI's study focused on molecular characterization utilizing whole exome sequencing (WES) to gather tumor genomic data alongside detailed pathological and clinical information (Giannakis et al., 2016). TCGA's comprehensive analysis using around 10,000 samples representing 33 cancer types was employed for our interest in gene expression analysis corresponding to CRC samples . Finally, MSKCC's study analyzed inter-and intratumoral heterogeneity as evidence in the development of CRC (Brannon et al., 2014).
As mini-driver genes have a low mutational frequency, their impact alone is insufficient to generate a significant advantage for tumor cells. Badr et al. (2022) state that multiple accumulated weak mutations can combine to form a polygenic conductor (main driver) with enough impact to modify cellular function and patient prognosis (Badr et al., 2022), as depicted in Fig. 6. Our findings indicate that patients with mutations in DOCK3, FN1, PAPPA2, DNAH11, and FBN2 genes had a shorter survival rate compared to patients without mutations. These results suggest that these alterations lead to cell dysregulation, as seen in other cancer types (Irmak-Yazicioglu, 2016;Wilk & Braun, 2018;Furuya et al., 2021).
The DOCK3 gene has been reported to participate in various processes related to invasion, migration, and metastasis in cancer cells (Hofer et al., 2017;Kotelevets & Chastre, 2020;Lu et al., 2020). However, there is limited research on the involvement of the MUC5B gene in CRC, with recent evidence showing high expression of this gene in elderly CRC patients, particularly in poorly differentiated tumors (Iranmanesh et al., 2021). Extracellular vesicles with high levels of phosphorylated and expressed FN1 have been identified as potential prognostic factors and therapeutic targets in CRC (Qi et al., 2020;Zheng et al., 2020). It has been postulated that FN1 could function as a promoter gene in non-canonical pathways of mini-driver genes and their mutations, with upregulation of FN1 by the HMGA2 gene contributing to a metastatic profile in CRC cells . Mutations in the PAPPA2 have been associated with tumor progression and treatment of digestive tumors, although its role in CRC is not yet understood (Miao et al., 2022). Similarly, the rs2285947 polymorphism in the DNAH11 gene has been linked to an increased risk of several cancer types, suggesting its potential contribution as a mini-driver gene in carcinogenesis (Wang et al., 2015).
The FBN2 gene is hypermethylated in CRC tissue and serum samples from patients with CRC and liver metastases, and its expression is directly correlated with shorter survival rates in colon cancer patients, suggesting a possible role as a tumor suppressor gene (Leygo et al., 2017;Wang et al., 2022). Although methylation data for all genes in a representative number of CRC samples were not available for this study, we anticipate that upcoming omics data will provide more information on methylation levels in CRC patients, allowing Figure 7 A visual abstract of the study. Since mutations in driver genes use canonical criteria to be characterized and could be not sufficient to explain all cancer cases, we evaluated a strategy for proposing additional genes using the mini-driver hypothesis.
Full-size Other genes, such as NOTCH3 and SLITRK5, have clear contributions to tumor development. NOTCH3 promotes tumor cell survival and proliferation, induces EMT and cancer stem cell (CSC) properties, and has been linked to various clinical and pathological features, including larger tumor size, advanced TNM stage, higher pathological grade, and tumor metastasis (Pastò et al., 2014;Aburjania et al., 2018;Xiu et al., 2021). Frequent genetic, epigenetic, and transcriptional changes have been observed in the SLITRK5 gene in colorectal neoplasias (Hesson et al., 2016). Nevertheless, genes such as CUX1 have controversial reports. It has been recently discovered that CUX1 is a tumor suppressor paradoxically overexpressed in tumor cells (Cancer Genome Atlas Network, 2012;Jo et al., 2017;Liu et al., 2020). Overall, a mini-driver gene approach could be a useful tool to support further analysis (sense and antisense strands) of these controversial regions to understand their involvement in tumor growth.
According to Dressler et al. (2022), when a driver gene mutates, it significantly impacts cancer cell growth, providing them with a fundamental advantage in their development (Dressler et al., 2022). However, our classification of patients as mutated, which includes at least one mutation in one of five genes, has resulted in decreased survival time. The universe of mutations present in these genes may have different pathways to favor tumor proliferation. Nonetheless, the classification of somatic mutations is affected by the initial analysis. Typically, researchers detect all somatic mutations but exclude those highly present in large populations (Timmermann et al., 2010;Ma et al., 2020). This can limit the analysis to a reduced number of targets (Leedham & Tomlinson, 2012;Lee-Six et al., 2019) based on the pathological effect related to variations rarely present in healthy individuals. However, in cancer, aberrant expression levels and unexpected pathways (Hanahan, 2022) may support new functions for these polymorphisms that are usually discounted when found as somatic variations.
Therefore, our study suggests analyzing all somatic mutations to assess cancer prognosis, combining traditionally evaluated driver genes and mutations with additional tumor-promoting regions (mini-drivers). We believe that even silent mutations may have additional functions related to the expression of non-coding RNA expressed from the same genomic locations. Li & Thirumalai (2016) argue that when the main drivers lose their mutagenic capacity, mini-drivers help normalize the physical condition advantage of the drivers. Additionally, mini-drivers could confer a physical fitness advantage on cancer cells, especially when they accumulate during tumor progression due to stochastic genetic mutations (Li & Thirumalai, 2016). To increase the number of therapy proposals and biomarkers discovered for colorectal cancer (CRC), we suggest exploring candidate mini-driver regions in a gene-panel strategy.
We used NGS-based information to analyze somatic mutational profiles from a broader perspective to evaluate a new strategy for predicting biomarkers for CRC. However, our study has limitations. We have no control over the data collection, information, and permissions declared by the patients. Nonetheless, the databases used are supported and validated by accredited public and private health institutions. Furthermore, many of the proposed biomarkers for CRC have limited specificity and require further experimental research. Nevertheless, our methodology uses critical and rigorous settings to determine putative mini-driver genes, providing deeper insights into colorectal carcinogenesis to mitigate the risk of presenting biased information. Therefore, we expect that our findings will support future research to find prognostic markers for CRC using solid and liquid biopsies by testing panels of driver and mini-driver genes.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
This work was supported by the Universidad Nacional Federico Villarreal (Lima, Peru) through Resolution No. 9343-2021-UNFV conceded to Anthony Vladimir Campos Segura and Prof. Ana Isabel Flor Gutiérrez Román as part of the incentives program for undergraduate thesis. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.