Construction of miRNA-mRNA-TF Regulatory Network for Diagnosis of Gastric Cancer

Gastric cancer (GC), as an epidemic cancer worldwide, has more than 1 million new cases and an estimated 769,000 deaths worldwide in 2020, ranking fifth and fourth in global morbidity and mortality. In mammals, both miRNAs and transcription factors (TFs) play a partial role in gene expression regulation. The mRNA expression profile and miRNA expression profile of GEO database were screened by GEO2R for differentially expressed genes (DEGs) and differentially expressed miRNAs (DEMs). Then, DAVID annotated the functions of DEGs to understand the functions played in biological processes. The prediction of potential target genes of miRNA and key TFs of mRNA was performed by mipathDB V2.0 and CHEA3, respectively, and the gene list comparison was performed to look for overlapping genes coregulated by key TFs and DEMs. Finally, the obtained miRNAs, TF, and overlapping genes were used to construct the miRNA-mRNA-TF regulatory network, which was verified by RT-qPCR. 76 upregulated DEGs, 199 downregulated DEGs, and 3 upregulated miRNAs (miR-199a-3p/miR-199b-3p, miR-125b-5p, and miR-199a-5p) were identified from the expression profiles of mRNA (GSE26899, GSE29998, GSE51575, and GSE13911) and miRNA (GSE93415), respectively. Through database prediction and gene list comparison, it was found that among the 199 downregulated DEGs, 61, 71, and 69 genes were the potential targets of miR-199a-3p/miR-199b-3p, miR-125b-5p, and miR-199a-5p, respectively. 199 downregulated DEGs were used as the gene list for the prediction of key TFs, and the results showed that RFX6 ranked the highest. The potential target overlap genes of miR-199a-3p/miR-199b-3p, miR-125b-5p, and miR-199a-5p were 4 genes (SH3GL2, ATP4B, CTSE, and SORBS2), 7 genes (SLC7A8, RNASE4, ESRRG, PGC, MUC6, Fam3B, and FMO5), and 6 genes (CHGA, PDK4, TMPRSS2, CLIC6, GPX3, and PSCA), respectively. Finally, we constructed a miRNA-mRNA-TF regulatory network based on the above 17 mRNAs, 3 miRNAs, and 1 TF and verified by RT-qPCR and western blot results that the expression of RFX6 was downregulated in GC tissues. These identified miRNAs, mRNAs, and TF have a certain reference value for further exploration of the regulatory mechanism of GC.


Introduction
Gastric cancer (GC), as an epidemic cancer worldwide, has more than 1 million new cases and an estimated 769,000 deaths worldwide in 2020, ranking fifth and fourth in global morbidity and mortality [1]. Despite great advances in our understanding of cancer, the causes of its occurrence and progression are not fully understood. In general, the related factors involved in the occurrence and development of GC can be divided into genetic and epigenetic changes, as well as environmental and lifestyle factors [2,3]. The molecular pathogenesis of GC has always been the focus of scientists. In recent years, miRNA and transcription factors (TFs) have become an important breakthrough direction in the diagnosis, prognosis, and targeted therapy of GC.
TFs are proteins that bind to chromatin to activate or inhibit transcription by helical binding of DNA to specific regulatory sequences in the form of trans-activated or trans-repressive domains. They are expressed in tissues spatially, temporally, and sequentially during development, cell renewal, or differentiation, and any change in their expression may lead to uncontrolled cell integrity or dynamic homeostasis, leading to pathological changes [21], such as diabetes [22] and cancers [23]. TFs are classified into different families, reflecting the homology of their DNA binding domains, thus, reflecting the binding sequence of DNA [24,25]. However, in-depth studies have found that TFs can play a double-edged sword role, so they may play both anticancer and oncogenic roles, even though TFs of the same family may have two-way effects on the same cancer [26,27]. Taking the role of SOX family TFs in the pathological process of GC as an example, since the first discovery of SRY protein as a TF involved in mammalian male determination, up to 20 members of SOX family have been identified in mammals [28,29]. However, although they share a DNA binding domain consisting of 79 amino acids, their effects on GC cells are quite different [30]. For example, SOX1 and SOX6 have anticancer effects; SOX3, SOX4, and SOX5 have procancer effects; and SOX2, SOX7, and SOX10 have both anticancer and procancer effects [31]. Therefore, to accurately understand the role of a TF in a certain cancer is the focus of cancer mechanism research and treatment in recent years.
The aim of this study is to further understand the direct regulatory relationship between mRNA, miRNA, and TFs. Through the analysis of existing mRNA and miRNA expression profile data in the GEO database, the functional annotation, and TF prediction of DEGs, target gene prediction of DEMs was performed with the help of databases such as DAVID, mipathDB V2.0, and CHEA3, respectively, and the regulation network of miRNA-mRNAs and TF was finally constructed based on the results above.

Microarray Data.
To identify DEGs and DEMs in the pathogenesis of GC and build a regulatory network of TF-miRNA-mRNA, we searched the Gene Expression Omnibus (GEO) database with keywords such as "gastric cancer," "mRNA," and "miRNA." Only the data sets containing both GC tissue samples and control group information and without any drug intervention before sequencing could meet the criteria. Finally, 4 mRNA data sets (GSE26899, GSE29998, GSE51575, and GSE13911) and 1 miRNA data set (GSE93415) were selected for the follow-up analysis of this work.
2.2. Identification of DEGs and DEMs. GEO2R (https://www .ncbi.nlm.nih.gov/geo/geo2r/) is a software based on GEO database to perform differential analysis on expression profile chips. After the processing of 5 data sets using GEO2R, DEGs and DEMs were screened out using adjust p value < 0.05, |logFC | >1 and adjust p value < 0.01, |logFC | >1:9 as the criteria, respectively. The p value adjusted by Benjamini Hochberg false discovery rate (FDR) method can significantly reduce the false positive rate [32].
2.3. Biological Significance Analysis of DEGs. In order to further understand the biological function of DEGs, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed via a database for annotation, visualization, and integrated discovery (DAVID, https://david.ncifcrf.gov/home .jsp). The cut-off criterion, p value < 0.05, was set to selected GO terms and KEGG pathway enrichment analysis, and then the analysis results were visualized through OriginPro (2021b_Beta7) software and bioinformatics, an online data visualization tool (http://www.bioinformatics.com.cn).

Prediction of miRNA Target
Genes. miRpathDB v2.0 (https://mpd.bioinf.uni-sb.de/overview.html) is a multifunctional online miRNA research site with a large database of miRNA target genes. In order to comprehensively and accurately excavate the regulatory relationship between miRNA and mRNA in GC tissues, the potential targets of candidate miRNA were comprehensively predicted by mipathDB v2.0, and the possible miRNA-mRNA regulation was found by comparing potential targets with DEGs.

Prediction of mRNA Transcription Factors.
In the process of gene transcription and expression, miRNA and TF, respectively, play a partial regulatory role. Therefore, further digging out key TFs after clarifying the regulatory relationship between miRNA and mRNA is more conducive to indepth understanding of the pathological mechanism of GC. TF prediction was performed via ChIP-X Enrichment Analysis 3 (ChEA3). ChEA3, whose database contains a collection of gene set libraries generated from multiple sources including TF-gene coexpression from RNAseq studies, TF-target associations from ChIP-seq experiments, and TF-gene cooccurrence computed from crowd-submitted gene lists, is a TFs enrichment analysis tool that ranks TFs associated with user-submitted gene sets.

Patients and Samples
To further verify the reliability of the above bioinformatics analysis, 4 pairs of clinical tissue samples from GC patients were obtained from the First Affiliated Hospital of Zhejiang Chinese Medical University. After surgical removal from the patient, all tissue samples were stored in a -80°C cryogenic refrigerator and kept cryogenic during the samples transfer and nonexperimental disposal period. All biometric tests of tissue samples were performed with the informed consent of the patients. The Ethics Committee of the First Affiliated Hospital of Zhejiang Chinese Medical University approved the research (approval no.: 2017-K-002-01).
3.1. RNA Isolation and Quantitative Reverse Transcription-PCR (RT-qPCR). Total RNA was extracted from tissue samples by Trizol method. First of all, 0.05 g tissue samples were weighed and put into 1.0 ml Trizol (Carlsbad, California, USA) solution, fully ground, and then stood for 10 min. Then, 200 μl trichloromethane (Xilong Chemical Co., Ltd., Guangdong Province, China) was added, shaken, and stood for 2 min, and the supernatant was extracted by centrifugation at 4°C 12000 rpm for 15 min. Next, add the same amount of isopropyl alcohol (Huadong Medicine Co., Ltd., Zhejiang Province, China) and rest overnight in the -80°C cryogenic refrigerator. The next day, the samples were taken out, and the supernatant was centrifuged at 4°C 12000 rpm for 10 min and washed twice with an appropriate amount of 75% ethanol (Hangzhou Longshan Fine Chemical Co., Ltd., Zhejiang Province, China). Finally, 20-30 μl nucleasefree water (Promega Corporation, Wisconsin, USA) was added for mixing.
Then, Revert Aid First Strand cDNA Synthesis Kit (Waltham, Massachusetts, USA) was used to obtain cDNA by the instructions, and the iTaqTM Universal SYBR@ Green Master Mix kit was used to detect the expression level of mRNA. All primers (RFX6 forward: CATGGCAAGCC GAGGAAGTGTC; RFX6 reverse: GGTATGTGGAGCAG TGTGATGGAG) were synthesized by Sangon Bioengineering (Shanghai) Co., Ltd. GAPDH (GAPDH forward: CGAGCC ACATCGCTCAGACA, GAPDH; GAPDH reverse: CTGG TGAAGACGCCAGTGGA) was used as endogenous control, and all results were calculated and expressed as 2 -ΔΔCt .

Western Blot.
Clinical tissue samples were lysed in RIPA lysis buffer, and protein concentrations were determined using a BCA Protein Assay Kit (Thermo Scientific, MA, USA). Tissue lysates were separated with SDS-PAGE gels and transferred to a polyvinylidene difluoride (PVDF) membrane (Millipore, Eschborn, Germany). The blots were blocked in 5% milk for 1 h at room temperature. PVDF membranes were incubated with the primary antibodies overnight in a cold room at 4°C. Subsequently, bound primary antibodies were reacted with corresponding secondary antibodies for 1 h at room temperature and evaluated with by chemiluminescence.

Statistical Analysis.
Statistical analysis was performed through GraphPad Prism (version 6, San Diego, CA) software. Student's t-tests were utilized for the comparison of two sample groups. Differences were considered as statistically significant when p < 0:05.

Validation of Significant TF in GC Tissues.
In order to further verify the accuracy of the above bioinformatics analysis results, the mRNA and protein levels of the significant regulatory factor RFX6 were verified in 4 pairs of GC tissues and normal tissues by RT-qPCR and western blot. RFX6 activates transcription by forming a heterodimer with RFX3 and binding to the X-box in the target gene promoter, so the expression of RFX6 is positively correlated with downstream genes. The validation results showed that the expression of RFX6 mRNA and protein was significantly downregulated, which suggested that the expression of RFX6 was positively correlated with the expression of downstream genes (Figure 4(b)). Therefore, the above bioinformatics

Discussion
According to the data released by the Global Cancer Observatory (GCO), in 2020, there were 1,9292,789 new cancer patients in the world, among which 1,089,103 were GC patients, accounting for 5.6%, ranking the fifth. The research on the mechanism of GC and the development of new drugs have always been the shining research direction of medical and scientific researchers.
In this study, DEGs and DEMs were screened out from the existing mRNA and miRNA expression profiles, and the potential targets of DEMs and TF of DEGs were predicted through the database, so as to construct miRNA-mRNA-TF regulatory network and further understand the pathogenesis of GC. First of all, in the 4 mRNA expression profiles containing 210 GC tissues and 118 normal tissues, 275 DEGs were screened using adjust p value < 0.05, |logFC | >1 as the criterion, among which 76 genes were significantly upregulated in GC tissues compared with normal tissues, while the remaining 199 genes were significantly downregulated in GC tissues compared with normal tissues. Second, to further understand the mechanism of DEGs in biological processes, GO terms and KEGG pathway enrichment analysis were conducted for the upregulated genes and downregulated genes, respectively, and the results showed that the downregulated genes with a higher proportion of DEGs had more obvious enrichment. Then, using adjust p value < 0.01, |logFC | >1:9 as the criterion, three miRNAs, including miR-199a-3p/miR-199b-3p (adjust p value = 3.02E-04, log FC = 2:15675), miR-125b-5p (adjust p value = 1.58E-04, log FC = 1:98925), and miR-199a-5p (adjust p value = 9.38E-06, log FC = 1:9549), were screened out from a miRNA expression profile containing 20 GC tissues and 20 normal tissues, which were overexpressed in GC tissues relative to normal tissues. After screening the DEMs, to explore the regulatory relationship between miRNA and mRNA and construct miRNA-mRNA regula-tory network, the potential targets of miRNAs were predicted by the mipathDB V2.0 data platform, and some of the 199 low-expressed genes that might be regulated by miR-199a-3p/miR-199b-3p, miR-125b-5p, and miR-199a-5p were obtained by using the Venn diagram. Finally, the prediction of TFs, 199 significantly low-expressed genes were used to predict key TFs that played an important role in the progression of GC cases. The prediction results showed that RFX6 ranked the highest in the average of multiple databases, suggesting that RFX6 may play a key regulatory role in the pathogenesis of GC. RT-qPCR results confirmed that the expression of RFX6 was significantly changed in clinical GC tissues. miRNAs affect gene expression by regulating the translation and degradation of mRNA, and the promotion or inhibition effect of this mechanism on cancer cells has been reported in recent years [33]. It was found by retrieval that the significantly upregulated expression of miR-199a-3p, miR-125-5p, and miR-199a-5p in GC cells and tissues analyzed in this study has been reported many times. In 2003, a computational prediction of miR-199a's identity was made based on the conservatism of miR-199a among humans, mouse, and puffer fish [34]. The expression of miRNA in zebrafish was verified, and the ends of the miRNA were localized by cloning. The two miRNA sequences were named miR-199a and miR-199a * (from the 3 'arm), respectively. It has been reported that both of these two miRNA expression forms are expressed in humans and are named as miR-199a-5p and miR-199a-3p, respectively [35]. Functional studies have shown that miR-199a-3p can significantly promote the proliferation and inhibit apoptosis of GC cells in vivo and in vitro. It has been found in clinical studies that the overexpression of miR-199a-3p is associated with tumor size, Lauren stage, depth of invasion, lymph node metastasis, distant metastasis, TNM stage, and prognosis [36]. Moreover, in stage I, II, and III tumors, the high expression of miR-199a-3p is associated with a significantly reduced 5-year survival rate. In addition, luciferase report assay demonstrated that miR-199a-3p directly targeted the expression level of ZHX1 regulatory protein. Therefore, researchers hypothesize that the oncogenic activity of miR-199a-3p may be related to the direct targeting and inhibition of ZHX1 [37]. Similar to miR-199a-3p, miR-199a-5p can promote the migration and invasion of GC cells, and its expression level is related to tumor diameter, lymph node metastasis, and TNM stage. However, the difference was that the expression level of miR-199a-5p was not associated with Lauren staging and distant metastasis [38]. Different from miR-199a-3p and miR-199a-5p, the role of miR-125b-5p in the pathogenesis of GC is not fully understood, but the identification results at the cell level and tissue level indicate that miR-125b-5p is significantly upregulated during the development of GC [39]. In addition, in a study on drug resistance, it was found that STAT3 may be a potential target of miR-125b-5p, and the signaling cascades involved in the regulation of chemotherapy drug resistance in GC [40].
In addition to these identified miRNAs, other mRNAs in the miRNA-mRNA-TF regulatory network have also been individually reported, in addition to CTSE, SORBS2,  [41]. Moreover, the β subunit of the gastric H + , K + -ATPase encoded by ATPase H + /K + Transporting Beta Subunit (ATP4B) are decreased in GC cells and tissues due to the interaction between DNA methylation and histone acetylation [42,43]. Previous studies have found that if ATP4B expression level is restored, it can inhibit the proliferation, activity, migration, invasion, tumorigenicity, and induction of apoptosis of GC cells, and its tumorsuppressive effect may be played through the regulation of mitochondrial metabolism and apoptosis pathway [44].
Besides, pepsinogen C (PGC) belongs to the aspartic protease family, which is expressed in situ in gastric mucosa, expressed in serum, and expressed ectopic. It is secreted by the gastric host cells and has an activation effect on pepsin C and can digest polypeptides and amino acids [45]. In the process from superficial gastritis to atrophic gastritis and finally to the formation of GC, the positive level of PGC continues to decline, which indicates that the in situ expression of PGC may have a negative correlation with the formation of GC [46]. And the results showed that the expression of MUC6 was decreased in GC tissues, but surprisingly, there was no significant difference between the expression of MUC6 in GC tissues and the gender, tumor site, lymphatic infiltration, clinical stage, metastasis, Lauren intestinal type, diffuse type, and mixed type GC tissues [47].
As for the only TF, RFX6, a member of the RFX (regulatory factor X-box binding) family of winged-helix transcription factors [48], which is downstream of Neurog3 and upstream of many other islet transcription factors in the islet development factor hierarchy, is required for differentiation of four of the five islet cell types [49]. In prostate cancer, on the one hand, clinical data showed that the upregulated expression of RFX6 was associated with the risk of tumor progression, metastasis, and biochemical recurrence [50]. In addition, RFX6 has also been reported to be differentially expressed in melanoma, liver cancer, GC, and other cancer tissues, but the further molecular mechanism remains to be explored.

Data Availability
The datasets of GSE26899, GSE29998, GSE51575, GSE13911, and GSE93415 were downloaded from the GEO database.

Conflicts of Interest
The authors have declared that no competing interest exists.  Figure 4: Construction of miRNA-mRNA-TF regulatory network and validation of RFX6 expression level. (a) Three miRNAs (miR-199a-3p/ miR-199b-3p, miR-125b-5p, and miR-199a-5p) and transcription factor (TF) RFX6 coregulate gene expression in the miRNA-mRNA-TF regulatory network, and miRNAs negatively regulate their expression levels by binding to the mRNA 3′-UTR region of 17 genes. RFX6 activates transcription by forming a heterodimer with RFX3 and binding to the X-box in the target gene promoter. (b) and (c) Compared with normal gastric tissue, RFX6 was significantly downregulated in GC tissue.