Effects of Helicobacter pylori Infection on the Development of Chronic Gastritis

Background/Aims: Based on the gene expression profiles of gastric epithelial tissue at different stages of Helicobacter pylori-infected gastritis, key long noncoding RNAs and genes in the development of Helicobacter pylori infection-induced gastritis were screened to provide a basis for early diagnosis and treatment. Materials and Methods: We downloaded 2 sets of sample data from the database, including gastric epithelial tissue samples from gastritis patients from Bhutan and the Dominican Republic, and screened the mRNAs among the RNAs differentially expressed in the 2 regions. The Mfuzz clustering algorithm was used to screen RNAs related to the 3 stages of chronic gastritis. A competing endogenous RNA (ceRNA) regulatory network was constructed, and the selected key RNAs were verified. Samples from Bhutan and the Dominican Republic were subdivided into chronic gastritis/normal comparison groups, and the differentially expressed RNAs were screened to obtain 1067 overlapping RNAs, comprising 21 long noncoding RNAs and 1046 mRNAs. Results: Thirty-eight significant gene ontology functional nodes and 6 expression pattern clusters were obtained. Two ceRNA regulatory networks were constructed, and 4 shared miRNAs (hsa-miR-320b, hsa-miR-320c, hsa-miR-320d, and hsa-miR-155-5p) were obtained. Eleven important RNAs regulated by these 4 miRNAs were obtained: 4 long noncoding RNAs (AFAP1-AS1, MIR155HG, LINC00472, and FAM201A) and 7 mRNAs (CASP10, SLC26A2, TRIB1, BMP2K, SCAMP1, TNKS1BP1, and MBOAT2). These results suggest that Helicobacter pylori infection influences the development of gastritis. Conclusion: The 11 key RNAs can provide targets for the early diagnosis and treatment of chronic gastritis following Helicobacter pylori infection.


INTRODUCTION
Chronic gastritis (CG) is a common disease of the digestive system. It is a disease of the intrinsic glands of the gastric mucosa caused by repeated mucosal damage. Chronic gastritis is closely associated with the occurrence of gastric cancer (GC), 1 which is one of the most common malignant tumors. Its mortality rate ranks third, after lung cancer and liver cancer, and the 5-year survival rate is only approximately 20%, making it a serious threat to human health. 2,3 Helicobacter pylori (Hp) is a unipolar, multiflagellate, blunt-ended, spiral-curved, microaerobic, Gram-negative bacterium with a length of 2.5-4.0 μm and a width of 0.5-1.0 μm. 4 Helicobacter pylori is highly motile, can penetrate the mucus layer to damage gastric mucosal epithelial cells, and is a major pathogen that can survive in the human stomach for a long time; it is related to the development of CG and peptic ulcer. 5 Helicobacter pylori infection proceeds mainly through 3 stages: (1) stable colonization of gastric mucosal epithelial cells; (2) evasion of host immune system attack; and (3) release of toxins that damage the gastric mucosa. [6][7][8] Helicobacter pylori infection can trigger innate and adaptive immune responses of the host, resulting in the infiltration of numerous neutrophils, monocytes, and macrophages and leading to acute gastritis and CG. Helicobacter pylori induces the host to produce various cytokines that alter the physiological environment in the stomach. [9][10][11]
Recently, studies have found that the abnormal expression of long noncoding RNAs (lncRNAs) at the cellular level is closely associated with the occurrence and development of cancers. Long noncoding RNAs participate in various biological processes, such as cell differentiation and ontogeny, at multiple levels. However, information about the function of most ncRNAs is limited, so they have broad research prospects. 12,13 Long noncoding RNAs are a class of ncRNAs that are longer than 200 nt and lack the ability to encode proteins. Compared with protein-coding RNAs, lncRNAs are shorter, have fewer exons, have less coding potential, and show tissue or cell specificity. 14 Existing studies have confirmed that the occurrence of diseases is often associated with abnormal transcription. This abnormality is not limited to the levels of protein-coding RNAs but also includes abnormalities in the function of ncRNAs in the genome, including lncRNAs. Long noncoding RNAs play an important regulatory role in the development of some diseases and may serve as biomarkers for disease progression or prognosis. Researchers have discovered that many functional lncRNAs can directly or indirectly regulate the expression of known oncogenes or tumor-suppressor genes, indicating that lncRNAs have broad prospects for research and development as tools for tumor diagnosis and treatment. 15
18 so that the gene expression data were converted from a skewed distribution to an approximately normal distribution, and then the data were normalized using the median normalization method.
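The transformation and normalization described above can be illustrated with a minimal sketch. It assumes a log2 transform with a pseudocount of 1 and a median normalization that shifts every sample to a common median; the text does not specify either detail, and the function name and toy values are hypothetical.

```python
import math
from statistics import median


def log2_median_normalize(samples, pseudocount=1.0):
    """Log2-transform raw intensities, then align all sample medians.

    samples: dict mapping sample name -> list of raw intensities
             (same probe order in every sample).
    pseudocount: assumed offset to avoid log2(0); not stated in the paper.
    """
    # Log transform pulls a right-skewed distribution toward normality.
    logged = {
        name: [math.log2(v + pseudocount) for v in values]
        for name, values in samples.items()
    }
    # Median normalization (assumed variant): shift each sample so that
    # its median equals the median of all sample medians.
    medians = {name: median(values) for name, values in logged.items()}
    target = median(medians.values())
    return {
        name: [v - medians[name] + target for v in values]
        for name, values in logged.items()
    }


toy = {"sample_1": [1, 3, 7], "sample_2": [3, 7, 15]}
normalized = log2_median_normalize(toy)
```

After this step the two toy samples share the same median, so remaining differences reflect relative rather than global intensity shifts.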
Significant Differential Expression of RNA
First, the annotation platform of GSE60427 was downloaded, and the probe sequences provided by the platform were aligned using Clustal 2 (http://www.clustal.org/clustal2/) 19 against the human whole-genome sequence (version: GRCh38) to identify lncRNAs and mRNAs and their corresponding expression information. 20,21 Since the samples were from 2 regions, to eliminate regional differences, we first classified the Bhutan and Dominican Republic samples separately according to the presence or absence of gastritis. Then, the Limma package version 3.34.0 in R version 3.4.1 18 was used to calculate the false discovery rate (FDR) and fold change (FC) of differential RNA expression between the groups. An FDR of less than 0.05 and |log2FC| of more than 0.5 were used as the thresholds for screening the differentially expressed RNAs (DERs). Based on the expression levels of the screened RNAs, the expression values were subjected to bidirectional hierarchical clustering based on the Euclidean distance 22,23 using the Pheatmap package version 1.0.8 (https://cran.r-project.org/package=pheatmap) 24 in R version 3.4.1 and displayed as a heat map.
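The screening thresholds can be sketched in a few lines. This is only an illustration, not the Limma workflow itself: it assumes that per-RNA P values and log2FC values are already available and that the FDR is obtained with the Benjamini-Hochberg procedure, which the text does not state explicitly. The record names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_ders(records, fdr_cutoff=0.05, lfc_cutoff=0.5):
    """Keep RNAs with FDR < fdr_cutoff and |log2FC| > lfc_cutoff.

    records: list of (rna_id, log2fc, p_value) tuples.
    Returns a dict rna_id -> "up" or "down".
    """
    fdr = benjamini_hochberg([p for _, _, p in records])
    return {
        rna: ("up" if lfc > 0 else "down")
        for (rna, lfc, _), q in zip(records, fdr)
        if q < fdr_cutoff and abs(lfc) > lfc_cutoff
    }


# Hypothetical toy input: (RNA, log2FC, raw P value)
toy = [("rna_a", 1.2, 0.001), ("rna_b", -0.8, 0.01),
       ("rna_c", 0.3, 0.02), ("rna_d", 2.0, 0.5)]
ders = select_ders(toy)
```

In the toy input, rna_c passes the FDR cutoff but fails the fold-change cutoff, and rna_d fails the FDR cutoff, so only rna_a and rna_b are retained.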
The DERs obtained in the Bhutan and Dominican Republic samples were compared, and their intersection was taken as the set of DERs remaining after geographical differences were eliminated. Gene ontology (GO) functional node and KEGG (Kyoto Encyclopedia of Genes and Genomes) signaling pathway enrichment annotation analyses based on DAVID 6.8 (https://david.ncifcrf.gov/) 25,26 were performed on the mRNAs in the DERs, with a P value of less than .05 as the threshold for significant enrichment.
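Taking the intersection of the two regional DER sets is a simple set operation, sketched below. The option to require the same direction of change in both regions is an assumption added for illustration; the text only states that the intersection was taken. All identifiers are hypothetical.

```python
def overlapping_ders(region_a, region_b, require_same_direction=True):
    """Intersect two DER sets.

    region_a, region_b: dict rna_id -> "up" or "down".
    require_same_direction: assumed extra filter, not stated in the paper.
    """
    shared = region_a.keys() & region_b.keys()
    return {
        rna: region_a[rna]
        for rna in sorted(shared)
        if not require_same_direction or region_a[rna] == region_b[rna]
    }


# Hypothetical toy DER sets for the two regions
bhutan = {"rna_x": "up", "rna_y": "down", "rna_z": "up"}
dominican_republic = {"rna_x": "up", "rna_y": "up", "rna_w": "down"}
overlap = overlapping_ders(bhutan, dominican_republic)
```

Here rna_y is differentially expressed in both regions but in opposite directions, so it is dropped when the direction filter is on.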

Mfuzz Clustering Algorithm for Screening RNAs Related to Different Stages of Chronic Gastritis
In the analysis, the CG samples contained disease samples at different stages of development (mild, severe, and IM stages), which are the process of deteriorating  27 to perform time-series trend analysis on DERs according to the CG level and obtain the expression trend module gene clustering. The expression trends of differential genes during CG deterioration were observed, and then the significant module gene sets were subjected to DAVID 6.8-based 25,26 GO biological process and KEGG signaling pathway enrichment annotation.
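Mfuzz is built on fuzzy c-means clustering, in which each RNA receives a membership value for every cluster rather than a single hard assignment. The sketch below shows only the membership calculation for fixed cluster centres, not the full iterative algorithm; the fuzzifier m = 2 and the toy profiles are assumptions, and profiles are assumed to be non-constant so that standardization is defined.

```python
import math
from statistics import mean, stdev


def standardise(profile):
    """Scale an expression profile to mean 0 and standard deviation 1."""
    mu, sd = mean(profile), stdev(profile)
    return [(x - mu) / sd for x in profile]


def memberships(profile, centres, m=2.0):
    """Fuzzy c-means membership of one profile in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the
    Euclidean distance from the profile to centre i.
    """
    dists = [math.dist(profile, c) for c in centres]
    # A profile that coincides with a centre belongs fully to it.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


# Hypothetical stage profile (normal, mild, severe, IM) and two trend centres
rising = standardise([1.0, 2.0, 3.0, 4.0])
falling = standardise([4.0, 3.0, 2.0, 1.0])
u = memberships(standardise([1.0, 2.5, 3.0, 4.5]), [rising, falling])
```

The memberships of a profile always sum to 1, so a value close to 1 for one cluster indicates a profile that follows that trend closely.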

Construction of ceRNA Regulatory Networks
According to the membership values of the RNAs in the trend modules obtained by Mfuzz clustering in the previous step, key RNAs were identified in each module, and then the ceRNA (competing endogenous RNA) network was constructed as follows:
A. Prediction of lncRNA-miRNA linkage relationships: For the lncRNAs, the binding relationships between the target lncRNAs and miRNAs were searched using the DIANA-LncBasev2 database (http://carolina.imis.athena-innovation.gr/diana_tools/web/index.php?r=lncbasev2%2Findex-experimental), 28 and only linkages with a miRNA target gene (miTG) score higher than 0.8 were retained. The miTG score is defined as the sum of the scores of all identifiable microRNA response elements at the 3ʹ-UTR. The higher the value, the greater the probability of targeting.
B. Prediction of miRNA-mRNA linkage relationships: For the miRNAs obtained in A, the target genes regulated by them were searched using the starBase version 2.0 database. 29 The starBase database provides comprehensive target gene prediction information from TargetScan, PicTar, RNA22, PITA, and miRanda. Regulatory relationships supported by at least one of these databases were taken as miRNA-target gene pairs; these were then matched to the mRNAs in the key RNA set, and the miRNA-regulated key target mRNA pairs were retained.
C. Construction of the ceRNA network: The linkages from A and B were combined to construct a lncRNA-miRNA-mRNA regulatory network, and DAVID-based GO biological process and KEGG signaling pathway enrichment analyses were performed on the mRNAs regulated in each ceRNA network.
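Steps A to C amount to filtering two edge lists and joining them on the shared miRNA. The following sketch assumes the database query results have already been exported as plain tuples; it does not call DIANA-LncBase or starBase, and all RNA names are placeholders rather than relationships reported in this study.

```python
from collections import defaultdict


def build_cerna_network(lnc_mirna, mirna_mrna, key_mrnas, mitg_cutoff=0.8):
    """Join lncRNA-miRNA and miRNA-mRNA pairs into ceRNA triples.

    lnc_mirna: list of (lncRNA, miRNA, miTG score) tuples (step A input).
    mirna_mrna: list of (miRNA, mRNA) tuples (step B input).
    key_mrnas: set of mRNAs in the key RNA set.
    """
    # Step B: keep only targets that belong to the key RNA set.
    targets = defaultdict(set)
    for mirna, mrna in mirna_mrna:
        if mrna in key_mrnas:
            targets[mirna].add(mrna)

    # Steps A and C: apply the miTG cutoff, then join on the miRNA.
    triples = []
    for lnc, mirna, score in lnc_mirna:
        if score > mitg_cutoff:
            for mrna in sorted(targets.get(mirna, ())):
                triples.append((lnc, mirna, mrna))
    return triples


# Placeholder data, not taken from the study
lnc_pairs = [("lnc_a", "mir_x", 0.95), ("lnc_a", "mir_y", 0.50),
             ("lnc_b", "mir_z", 0.90)]
target_pairs = [("mir_x", "gene_1"), ("mir_x", "gene_2"),
                ("mir_y", "gene_1"), ("mir_z", "gene_3")]
network = build_cerna_network(lnc_pairs, target_pairs, {"gene_1", "gene_3"})
```

Each returned triple is one lncRNA-miRNA-mRNA path; the lnc_a/mir_y pair is dropped by the miTG cutoff, and gene_2 is dropped because it is not in the key RNA set.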

Verification of the Expression Levels of Key RNAs
The various ceRNA regulatory networks constructed in the fourth step were compared, and different ceRNA networks regulated by the same miRNAs were selected to integrate the key RNAs in different modules. For the selected key RNAs, we first showed the expression level of CG/normal in the original analysis dataset GSE60427 and then showed the curve of the expression level changes in different developmental stages of CG. Additionally, key RNAs were validated in healthy controls and CG samples in GSE111762, and the expression levels were further validated at different stages of disease progression.
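Selecting the miRNAs shared by different ceRNA networks, and the RNAs connected to them, can be sketched as follows. The representation of each network as a list of (lncRNA, miRNA, mRNA) triples is an assumption made for illustration, and the names are placeholders.

```python
def shared_mirna_rnas(network_1, network_2):
    """Find miRNAs present in both networks and the RNAs linked to them.

    Each network is a list of (lncRNA, miRNA, mRNA) triples.
    Returns (sorted shared miRNAs, sorted key RNAs).
    """
    mirnas_1 = {mirna for _, mirna, _ in network_1}
    mirnas_2 = {mirna for _, mirna, _ in network_2}
    shared = mirnas_1 & mirnas_2

    # Collect every lncRNA and mRNA regulated by a shared miRNA.
    key_rnas = set()
    for lnc, mirna, mrna in list(network_1) + list(network_2):
        if mirna in shared:
            key_rnas.update((lnc, mrna))
    return sorted(shared), sorted(key_rnas)


# Placeholder networks, not taken from the study
net_1 = [("lnc_a", "mir_x", "gene_1"), ("lnc_a", "mir_y", "gene_2")]
net_2 = [("lnc_b", "mir_x", "gene_3"), ("lnc_c", "mir_z", "gene_4")]
shared, key_rnas = shared_mirna_rnas(net_1, net_2)
```

In this toy case only mir_x occurs in both networks, so the key RNAs are the two lncRNAs and two mRNAs attached to it.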

Data Preprocessing and Significant Differential Expression Screening
First, the downloaded expression profile dataset was standardized. Box plots before and after standardization are shown in Figure 1.
The samples were then divided into Bhutan and Dominican  Figure 2A and 2B. The bidirectional hierarchical clustering heat maps based on the expression levels of the 2 sets of screened DERs are shown in Figure 2A and 2B. The figure shows that the expression values of the screened RNAs separate the different sample types well, with clearly distinct color patterns, indicating that the RNAs screened between the CG and control groups in both regional groups are characteristic of the samples.

Comparative Analysis of Differentially Expressed RNAs Collected from Different Regions
The DERs screened in the Bhutan and Dominican Republic samples were then compared, and the results are shown in Figure 3A. Overall, 1067 overlapping RNAs were obtained, of which 200 were downregulated and 867 were upregulated, including 21 lncRNAs and 1046 mRNAs. The distribution ratio is shown in Figure 3A. Then, we analyzed the correlation between the expression levels of the overlapping RNAs in disease samples from the 2 regions, and the results are shown in Figure 3B. Although the CG samples came from different regions, the expression levels of the overlapping RNAs showed a high positive correlation. Additionally, bidirectional hierarchical clustering analysis was performed on the expression levels of the overlapping RNAs, and the results are shown in Figure 3C. The overlapping RNAs differed markedly between the CG and control groups in both regions, and the direction of the difference was the same. In summary, taking the intersection of the DERs obtained from the 2 regions eliminated the differences between samples from different regions, which made it possible to combine samples from different regions when analyzing different stages of disease development.
Then, the mRNAs contained in the overlapping RNAs were subjected to DAVID-based GO functional node and KEGG signaling pathway enrichment annotation analysis. The results are shown in Table 1 and Figure 4.
Figure 4. (A) The larger the point, the greater the number of genes; the color of the dots represents significance, and the closer the color is to red, the higher the significance. (B) KEGG signaling pathway pie chart with significant enrichment of overlapping genes. Each component represents a different KEGG pathway, the number represents the number of genes involved in the pathway, the color represents significance, and the closer to red, the higher the significance.

Zhou et al. Effects of Hp Infection on CG Turk J Gastroenterol 2023; 34(7): 700-713
Thirty-eight significantly related GO functional nodes were obtained, including 14 biological processes, 13 cellular components, and 11 molecular functions. Overlapping genes were significantly involved in biological processes, such as immune responses, and in 11 KEGG signaling pathways, including cytokine-cytokine receptor interactions.

Mfuzz Clustering Algorithm to Screen RNAs Related to Different Stages of Chronic Gastritis
In the sample analysis, the CG samples contained disease tissues at different stages of development (mild, severe, and IM stages), which together make up the process of deterioration from mild to severe in the development of CG. Screening the overlapping DERs for those whose expression changes regularly across stages can help identify RNAs closely related to disease progression. We used the Mfuzz package to analyze the time-series trend of the overlapping DERs according to the CG level. The results are shown in Figure 5; 6 expression pattern clusters were obtained. Among them, the genes contained in clusters 1 and 2 changed in a single direction across the normal-mild-severe-IM stages of deterioration, with expression continuing either to rise or to fall. Therefore, we speculated that the continuous changes in gene expression in these 2 modules are closely related to the progression of CG. Enrichment annotation analysis of GO biological processes and KEGG signaling pathways was performed on the mRNAs contained in clusters 1 and 2, and the results are shown in Tables 2 and 3 and Figure 6.

Construction of the ceRNA Regulatory Network
According to the membership values of the RNAs in the trend modules obtained by Mfuzz clustering in the previous step, the RNAs in clusters 1 and 2 were ranked by membership value. RNAs with a membership value higher than 0.6 were selected as key genes in clusters 1 and 2, comprising 30 RNAs (2 lncRNAs and 28 mRNAs) and 190 RNAs (8 lncRNAs and 182 mRNAs), respectively. Then, the following analysis was performed:
A. lncRNA-miRNA connection relationship prediction: For the 2 and 8 lncRNAs contained in clusters 1 and 2, respectively, the binding relationships between the target lncRNAs and miRNAs were searched using the DIANA-LncBasev2 database. Only connections with miTG scores above 0.8 were retained; 16 and 27 pairs of lncRNA-miRNA connections were screened in clusters 1 and 2, respectively.
B. miRNA-mRNA connection relationship prediction: For the miRNAs obtained in A, the target genes regulated by them were searched using the starBase version 2.0 database. Then, the target genes were mapped to the mRNAs contained in the 2 clusters; 33 and 56 pairs of miRNA-mRNA regulatory connections were obtained for the miRNAs linked to the cluster 1 and cluster 2 lncRNAs in step A, respectively.
C. Construction of a ceRNA regulatory network: Combining the regulatory relationships in A and B, a ceRNA regulatory network composed of the RNAs in clusters 1 and 2 was constructed (Figure 7). GO biological process and KEGG signaling pathway enrichment analyses were then performed on the mRNAs in the 2 ceRNA regulatory networks (Tables 4 and 5). Screening yielded 6 and 6 significantly enriched GO biological processes and 1 and 0 KEGG signaling pathways for the cluster 1 and cluster 2 networks, respectively.
Figure 6. The mRNAs contained in clusters 1 (A) and 2 (B) are significantly correlated with the gene ontology biological process and KEGG signaling pathway. The horizontal axis indicates the gene number, the vertical axis indicates the item name, the color of the column represents significance, and the closer the color is to orange, the higher the significance.
For the 11 key RNAs selected, we first showed the CG/normal expression levels in the original analysis dataset GSE60427 (Figure 9A). Additionally, the expression levels of the 11 key RNAs were verified in 3 healthy control samples and 6 CG samples in GSE111762 (Figure 9B). The expression patterns of the 11 RNAs were consistent between the 2 datasets; SCAMP1 and BMP2K differed significantly between healthy control and CG samples in both datasets.

DISCUSSION
Global cancer statistics show that GC ranks second in cancer incidence and third in cancer mortality in developing countries. Approximately 70% of new cases and deaths each year occur in developing countries. 2,3 Since Hp, which can cause gastric and duodenal ulcers, was discovered in 1984, it has been studied intensively as the initiator of gastritis. In 1991, Knight et al [30][31][32][33] found that Hp infection can increase the risk of stomach cancer, and the risk is almost 3 times higher than that of people without Hp infection.
Several studies have reported that Hp infection of host cells causes instability of the genome inside the cell. 34,35 In numerous malignant tumors, treatment is more     42 found that the expression of the lncRNA THAP9-AS1 was upregulated after infection of GC cells with Hp and was higher in GC tissues than in gastritis tissues. Colony formation, CCK8, and transwell assays were performed to show that THAP9-AS1 can promote GC cell proliferation and migration in vitro. The results of this study demonstrate that the lncRNA THAP9-AS1 is induced by Hp to promote GC cell growth and migration. These studies suggest that bacterial infection caused by Hp can affect genes related to antibacterial mechanisms and induce inflammation that may manifest as symptoms of CG.
Recently, statistical analysis of gene sequences combined with experimental verification has improved our understanding of the relationship between certain genes and biological processes. Some well-known RNAs have been widely used in diagnosing clinical tumors or as molecular targets and have become reliable predictors of human health. 43,44 HOTAIR is a known long noncoding RNA that has recently been associated with the progression of some cancer types. The overexpression of HOTAIR promoted the proliferation and migration of GC cells, and the knockdown of HOTAIR expression significantly inhibited tumor proliferation. 45 The expression level of CCAT1 in GC tissues is significantly higher than that in healthy tissues, and its overexpression can promote the proliferation and invasion of tumor cells. 46 GAS5 is an lncRNA with tumor-suppressing function, and its expression level is significantly downregulated in various tumors, such as breast and pancreatic cancers. Low GAS5 expression was significantly associated with a low survival rate in GC and a late tumor-node-metastasis (TNM) stage. The overexpression of GAS5 can inhibit GC cell proliferation and induce apoptosis. GAS5 interacts with the transcriptional activator YBX1 to affect the protein expression level of YBX1, thereby affecting the expression of p21 to induce cell cycle arrest. 47 An increasing number of studies have found that RNAs are abnormally expressed in GC and contribute to suppressing or promoting cancer. In this study, we screened 11 important lncRNAs (AFAP1-AS1,