Investigation of hub genes involved in Turner syndrome using biological informatics methods

Abstract Background: This study aimed to explore candidate genes and their potential interaction mechanism critical to the pathophysiology of Turner syndrome by using the Gene Expression Omnibus database. Methods: GSE58435 data set was obtained by querying the Gene Expression Omnibus database. Differentially expressed genes (DEGs) were screened using R and subsequently annotated by Gene Ontology. Functional enrichment analysis was performed based on the Kyoto Encyclopedia of Genes and Genomes database for annotation, visualization, and integrated discovery. A protein-protein interaction network of different genes was constructed based on the STRING database, in which hub genes were explored through Cytoscape software. The expression of the hub genes was verified by analyzing the gene expression in the GSE46687 data set. Results: A total of 733 differential genes were identified. These differentially expressed genes were significantly enriched in nucleoplasm and nucleus. Their molecular function was concentrated on DNA binding and transcription, coronary artery, and adipose tissue development. According to the annotation of Kyoto Encyclopedia of Genes and Genomes, the identified DEGs were mainly enriched in inflammatory mediator regulation of TRP channels, osteoclast differentiation. A total of 10 hub genes (HIST1H2BA, TRIM71, HIST1H2BB, HIST1H4D, TNF, TP53BP1, CDCA8, EGF, HMG20B, and BCL9) were identified from the constructed protein-protein interaction network. These genes were discovered to be highly expressed in osteoclasts, ovaries, digestive tract, blood, and lymphatic tissues through the online application of human protein atlas. Conclusion: In this study, 733 DEGs and 10 hub genes were identified. They would be new candidate targets in Turner syndrome.


Introduction
Turner syndrome (TS) is a chromosomal disorder in women characterized by complete of partial loss of 1 of the 2 Xchromosomes.Its incidence is 3 to 5 in 10,000 new-born girls.It is characterized by small stature, absent menarche, and absent growth spurt.Other clinical features of TS may include typical dysmorphic stigmata, renal, cardiac, skeletal, and metabolic abnormalities. [1]It also involves gastrointestinal symptoms like, for example, celiac disease (CD). [2]Half of the children with TS have congenital heart abnormality, like mitral stenosis and aortic stenosis. [3,4]Patients also develop liver disease at adult age. [5]icroarray technology and bioinformatics analysis allow for the screening of genetic changes at the genome level.In this study, statistical analysis and data mining techniques were employed to identify novel candidate targets with high specificity and sensitivity by detecting genes with significant differences in amniotic fluid expression between TS and normal samples.Our findings contribute to the understanding of the genetic etiology of TS and provide a new perspective for clinical diagnosis and treatment.
We present the following article in accordance with the MDAR reporting checklist.

Microarray data and preprocessing
The microarray dataset GSE58435 deposited by Massingham needs to be downloaded from Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/).This data set is based on the GPL570 human genome U133 Plus 2.0 array platform.There are 10 samples included in this experiment: 5 TS samples and 5 normal samples.These samples are all obtained from amniotic fluid.The gene expression data in this chip were extracted for further analysis.Additionally, the annotation file of GPL570 is also downloaded from GEO.The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

Data quality evaluation and data processing
R affyPLM library (version 4.0.4) was used to perform regression calculations on the data set.Besides, the affy libraries are adopted to obtain RNA degradation data, and the data quality is further evaluated.The corresponding graphics are drawn to make the data quality more intuitive.
The affy package is employed to correct and standardize the background of the TS group and normal group and thus eliminate the errors caused by non-experimental factors.Then, the corrected data of the 2 groups are combined and summarized to obtain a corrected complete data matrix.The limma libraries were utilized to screen the differential expression genes with the detect conditions of difference multiple >2 times and P < .05(logfoldchange > 1 or logfc < [À1] and adjustment P < .05).The qualified differentially expressed genes (DEGs) in TS and normal chips were obtained.Specifically, logFC < 0 indicates that the gene expression is down-regulated: logFC > 0 suggests that gene expression is up-regulated.According to the detected differential expression genes, the volcano map and heat map are drawn using R-related visualization functions.

Functional enrichment analysis of DEGs
Annotation, visualization, and integrated discovery database (DAVID) are commonly used tools in bioinformatics research.In this study, the DAVID 6.8 was employed to annotate Gene Ontology (GO) function and analyze Kyoto Encyclopedia of Genes and Genomes (KEGG) function enrichment of different genes.GO analysis, involving biological process (BP), cellular component (CC), and molecular function (MF), was conducted to predict protein function.Moreover, the ggplot2 package of R was used to visualize the screened data and make a bar plot and circle chart.
Additionally, the differential genes detected above were imported into DAVID 6.8 online analysis website for KEGG signaling pathway.KEGG pathway analysis was performed to assign a series of DEGs to specific pathways, so as to build a network of interactions, reactions, and relationships among molecules.The ggplot2 package of R was taken to make bubble charts and visualize each pathway.

Construction of protein-protein interaction between DEGs
STRING (https://string-db.org/) is an online database of known and predicted protein-protein interactions (PPI).These interactions include physical and functional associations.The data were primarily obtained from computational prediction, highthroughput experiments, automatic text mining, and co-expression networks.The data screened by high-throughput experiments were imported into the online application of STRING, and >0.7 was set as the threshold.If the comprehensive score is >0.7, these DEGs are considered essential protein pairs.The protein pairs were further sorted and analyzed using the cytoscapehubba of Cytoscape v3.9.0.The top 10 hub genes were detected.Then, the online application of the human protein atlas was performed to analyze the tissue-specific expression of these 10 genes.

Verification of hub genes
The microarray expression profiling data set GSE46687 was downloaded from gene expression omnibus.There were 36 samples in this experiment: 26 TS patients and 10 normal children.The samples were obtained from children's blood.Through the GEO2R analysis of online database, the differential genes were screened out, and the screening conditions were set as difference multiple >2 times and P < .05(logfoldchange > 1 or logfc < [À1] and adjp < 0.05).The hub genes screened in the GSE58435 database were searched from the screened differential genes.

Quality evaluation of TS and normal data sets and the DEG
The quality of the data set was visualized with R. A quality evaluation map is illustrated in Fig. 1. Figure 1A presents the relative logarithmic expression.There are 10 samples in a horizontal baseline, suggesting that the data set is of good quality.Figure 1B is a graph of RNA degradation.The fluorescence intensity at the 5 0 end of the chip is much lower than that at the 3 0 end.Thus, the abscissa runs from left to right from 5 0 end to 3 0 end.The slope in the graph also demonstrates the high quality of the data.
The limma package and impute package in R were used to detect differential expression genes, with the detecting conditions of logfoldchange > 1 or logfoldchange < (À1), and adjp < 0.05.Besides, 733 significantly differentially expressed genes were obtained.Compared with normal samples, the number of genes up-regulated and down-regulated in TS was 387 and 346, respectively.The volcano and heat map of the detected differential genes is exhibited in Fig. 2.

Enrichment analysis and signal pathway analysis of DEGs
The GO analysis of DEGs was conducted with DAVID 6.8 online tool, and 733 differential genes were further studied.The GO analysis involves BP, CC, and MF.Accordingly, the detected differential expression genes are imported into DAVID.Then, the detected results are taken as P < .05,and 27 terms are obtained.The barplot is obtained by the R ggplot2 library (Fig. 3A), and the  first 10 terms are screened out.Moreover, a circle diagram is made through the GOplot package (Fig. 3B).The results revealed that the most abundant terms include GO:0006355, GO:0006351, GO:0005634, GO:0046872, GO:0003677, GO:0003700, and GO:0003676.Hence, the genes in CC mainly involve the nucleus, nucleoplasm.MF indicates that the main functions of these genes are to regulate the DNA binding, transcription factor activity, nucleic acid binding, arylesterase activity.BP analysis demonstrated that the functions of these genes are mainly engaged in the following biological processes: the regulation of transcription, DNA-templated transcription, positive regulation of tyrosine phosphorylation of STAT3 protein, coronary vasculature development, adipose tissue development, transcription from RNA polymerase II promoter, and negative regulation of intracellular estrogen receptor signaling pathway (Table 1).Additionally, the data are visualized through the GOplot package and ggplot2 package of R software.In this study, a chart of the terms obtained from the analysis of all the screened DEGs is acquired while the up-regulated and downregulated genes are analyzed.Besides, P < .05 is taken to visualize the results (Fig. 3C and D).In Fig. 3C and D, the qualified GO enrichment analysis results of up-regulated genes and downregulated genes contain 20 items and 17 items, respectively.It was revealed by analyzing the data in detail that the obtained data is not the sum of simple up-regulation genes and down-regulation genes, though most enrichment results are the same in the process of GO analysis.For example, in addition to the above biological processes, BP also includes protein glycosylation.The enrichment analysis of down-regulated genes also yielded similar results.
The KEGG pathway analysis was performed to verify DEG function and signal pathway enrichment.A total of 11 essential signal paths have been identified.Similarly, the results of KEGG path analysis are visualized through the application package of R software (Fig. 4). Figure 4A indicates that the pathways with the most gene enrichment include inflammatory mediator regulation of TRP channels; others involve osteoclast differentiation, TNF signaling pathway, toxoplasmosis, mucin-type O-glycan biosynthesis, beta-Alanine metabolism, and inflammatory bowel disease (IBD) (Table 2).The separate analysis of the up-regulated and down-regulated genes demonstrated the epithelial cell signaling in Helicobacter pylori infection in the KEGG pathway analysis of up-regulated genes.Moreover, down-regulated genes may be engaged in the formation of hepatitis B and C.

PPI network analysis of DEGs
The PPI network is constructed using the online analysis website STRING, and the data are further analyzed for protein interaction.The differential genes are imported into the online analysis website, and the visualization graph is composed of 388 edges and 608 nodes.
In this paper, cytohubba in Cytoscape is employed to calculate the hub genes in 12 ways.Then, the central elements in complex networks by intersection are discovered, such as HIST1H2BA, TRIM71, HIST1H2BB, HIST1H4D, TNF, TP53BP1, CDCA8, EGF, HMG20B, and BCL9.According to the results of functional enrichment analysis, these 10 genes are mainly involved in DNA binding, DNA transcription, nucleus, coronary vessels, and neural tube development (Table 3).
Table 1 Differential genes between Turner group and normal group were screened by using the limma package in R language.

Turner vs normal Genesymbol
Human protein atlas (https://www.proteinatlas.org/)online is also used to analyze the tissue-specific expression of the selected genes.The results revealed that HIST1H2BB, HIST1H2BA, HIST1H4D, TNF, CDCA8, and HMG20B were mainly engaged in the expression of osteoclasts.Besides, BCL9 was highly expressed in the ovary; TRIM71 was highly expressed in lung and brain; TP53BP1 was mainly expressed in thyroid tissue; EGF was expressed in kidney, pancreas, and muscle; TP53BP1, CDCA8, and HMG20B were highly expressed in the digestive tract; TNF, CDCA8, and HMG20B were highly expressed in blood; HIST1H4D was also expressed in lymphoid tissue.

Verification analysis
The network online database GEO2R was utilized to analyze the GSE46687 data set.The differential genes were selected and compared with the key genes selected from the GSE58435 database.The expression of HIST1H4D, TNF, TP53BP1, CDCA8, EGF, HMG20B, and BCL9 in the GSE46687 data set was significant and consistent (Table 4).

Discussion
TS is a sex chromosome genetic syndrome with a variety of clinical symptoms.There is little research on the molecular level  of TS.The research sample is amniotic fluid, and only GSE58435 is retrieved from the GEO database.In this study, 733 differential expression genes were detected by analyzing the gene expression microarray of 5 TS samples and 5 normal samples from GSE58435.In this study, several overlapping hub genes were discovered through the differential gene analysis of the GSE46687 data set to better verify the stability of our analysis results.
The GO functional annotation analysis revealed that the genes with significant differences in expression were mainly involved in the processes of DNA binding and transcription, regulation of protein tyrosine, production of cytokines, response to hormones, development of coronary artery, and adipose tissue.GO analysis showed that the genes were involved in DNA binding and transcription.DNA methylation changes were widespread on all autosomal chromosomes in 45,X and in 47,XXY individuals, with Turner individuals presenting 5 times more affected loci.Differentially methylated CpGs, in most cases, have intermediate methylation levels and tend to occur outside CpG islands, especially in individuals with Turner syndrome. [6]In a large number of individuals, a study verified several loci by pyrosequencing and observed only weak inter-loci correlations between the verified regions.This suggests a certain stochastic/ random contribution to the methylation changes at each locus. [6]he research analyzed DNA methylation changes during reprogramming of male and female somatic cells and in resulting induced pluripotent stem cells (iPSCs).At an intermediate reprogramming stage, somatic and pluripotency enhancers are targeted for partial methylation and demethylation.Demethylation within pluripotency enhancers often occurs at ESC binding sites of pluripotency transcription factors.Late in reprogramming, global hypomethylation is induced in a female-specific manner.Genome-wide hypomethylation in female cells affects many genomic landmarks, including enhancers and imprint control regions, and accompanies the reactivation of the inactive X chromosome.The loss of 1 of the 2 X chromosomes in propagating female iPSCs is associated with genome-wide methylation gain. [7]EGF plays a protective role in oxidative stress of the coronary artery.RNA molecular analysis was conducted on peripheral blood cells from patients with stable coronary artery disease; the mRNA levels of inflammation and oxidative stress markers, such as RORgt (T helper cell 17 cell marker), FoxP3 (regulatory T cell marker), NLRP3, ICAM1, SIRT1, Notch ligands JAG1 and DLL4, and HES1(Notch target gene), were determined; the changes in SIRT1 and HES1 mRNA were related to serum epidermal growth factor. [8]The EGF level was down-regulated in the key differential genes selected by us.The GO enrichment analysis also suggested that EGF was involved in the development of coronary vessels in TS patients.Magnetic resonance imaging 4-D flow-based examination of TS patients demonstrated that the diameter of the aorta was significantly larger than that of normal people, and the eddy current of ascending aorta increased, leading to the occurrence and development of aortic dissection. [9]This aortic abnormality mainly involves the left filling artery, and the most crucial abnormality is the absence of the left aorta. [10]Furthermore, the abnormality of the aorta is a risk factor of hypertension in normal people and TS patients. [11]TRIM71 is also an important DEGs, and GO enrichment analysis shows that it is involved in the development of neural tube.Molecular studies on congenital hydrocephalus show that, exome sequencing of 125 CH trios and 52 additional probands identified 3 genes with significant burden of rare damaging de novo or transmitted mutations, in addition to TRIM71, there are SMARCC1 and PTCH1.Strikingly, all 3 genes are required for neural tube development and regulate ventricular zone neural stem cell fate. [12]EGG enrichment analysis suggested that many of these genes were associated with osteoclast differentiation, alanine metabolism, IBD, H pylori infection, hepatitis, and other signal pathways.Cui et al [13] induced and cultured pluripotent stem cells (iPSC) of TS patients, discovering that osteoclasts were highly expressed in the TS population.In KEGG analysis, we revealed that TNF may be involved in the osteoclast differentiation pathway.The specific mechanism of action would be that the level of circRNA-circHmbox1 can be significantly reduced in TNF-a-induced osteoclast formation in vivo and in vitro.CircHmbox1 could inhibit RANKL-induced osteoclasts differ-  entiation primarily by binding to microRNA-1247-5p.TNF-a decreased osteoblasts differentiation by exosomal with low circHmbox1 expression from osteoclasts.Mechanistic studies presented that microRNA-1247-5p regulated osteoclasts differentiation and osteoblasts differentiation by targeting Bcl6.These results confirmed that circHmbox1-targeting miR-1247-5p was engaged in the regulation of bone metabolisms by TNF-a in PMOP. [14]Since TNF is a down-regulated gene in our differential gene screening, osteoclasts are active in differentiation, consistent with our research results.However, the specific expression of key genes in tissues reflected that TNF is highly expressed in osteoclasts.This lays a molecular foundation and provides a target site for further experimental research.The incidence of metabolic syndrome is higher in TS patients.KEGG pathway suggested that genes were involved in the alanine metabolic pathway including DPYS.Dihydropyrimidinase gene (DPYS) was analyzed.Dihydropyrimidinase (DHP) was the second enzyme in the pyrimidine degradation pathway, which catalyzed the ring-opening of 5,6-dihydrouracil and 5,6dihydrothymine to N-carbamoyl-b-alanine and N-carbamoylb-aminobutyric acid, respectively 2 patients with complete DHP deficiency were reported, both of whom were heterozygotes with missense mutation 1078T>C(W360R) in exon 6 and new missense mutation 1235G>T(R412 M) in exon 7. [15] Our DEG analysis revealed that DPYS is down-regulated in TS patients, resulting in lower alanine levels than normal people.The previous 2 latest research results also support our conclusion. [16,17]The risk of autoimmune diseases in TS patients is around twice that of the general female population, and the spectrum covers IBD and celiac disease. [18]This autoimmune disease may be correlated with an MHC locus on the long arm of the X chromosome, and the deletion of this region may cause insufficient immune regulation. [19]he KEGG analysis of up-regulated genes demonstrated that these genes also participated in the process of H pylori infection.Previous reports have suggested a correlation between H pylori infection and idiopathic short stature.The positive rate of H pylori antibody in the idiopathic short group is significantly higher than that in the control group. [20]Our study revealed the H pylori infection pathway in the selected key genes.This reflected that it might be related to TS's short stature to some extent.The mechanism may be induced by the molecular similarity between H pylori and a peptide substance, which leads these antibodies to recognize H pylori and open the trigger mechanism of microbial antigen in the gastrointestinal tract, contributing to influencing the regulation of peptides produced in the digestive tract, adipose tissue, and brain on growth hormone secretion and food intake. [21]At present, there is no molecular research on H pylori and TS.In our research results, 4 genes enriched in the KEGG pathway (ATP6V1A, MAPK9, ATP6AP1, CHUK) may be the new direction of related research.
There are few reports on hepatitis in the TS population.Calanchini et al [22] evaluated the liver function of 125 TS patients.The results suggested that g-glutamyltransferase (GGT) accounted for a relatively high proportion (about 88.7%), and the increase of LFTs (about 49.6%) was more essential because it might develop into severe liver disease, such as viral hepatitis, liver fibrosis, and liver cirrhosis. [5]Additionally, a 5-year followup study reported that pathological anhydrase elevation is very common in TS women, and about 36% of patients in the followup population exhibited anhydrase elevation. [23]However, this study demonstrated that the increase in such anhydrase was not related to viral hepatitis.Meanwhile, some reports suggested that diseases with abnormal neurodevelopment may lead to hepatitis. [24]We discovered crucial genes involved in hepatitis formation in KEGG analysis of down-regulated genes.Since the relationship between TS patients and hepatitis is not clear, other prospective studies may be required for further verification.
Besides, further network protein analysis was performed on the differential genes to understand the interaction between proteins.The essential protein pairs were screened to select 10 hub genes.Among them, HMG20B, as a DNA binding gene, has been verified to play the role of erythroid differentiation inhibitor by downregulating differentiation-related genes (such as Hrasls3) and activating differentiation inhibitors (such as Gfi1b) in mice. [25]Therefore, HMG20B may be engaged in the clinical manifestation of anemia in TS patients.Our analysis of the specific expression of key genes in tissues implied that HMG20B is highly expressed in osteoclasts.
Although the expression of hub genes was discovered, this study still has limitations.First, the database GSE58435 contains only 5 TS patients, and the sample size is small.Thus, the screened differential genes cannot be completely representative.Additionally, there are few studies on TS.When searching in the GEO database, only this database is an amniotic fluid sample.Hence, the stability of the results cannot be guaranteed by combining them with other data.Due to the lack of animal and human studies on TS at present, the screened differential genes cannot be verified correspondingly.As a result, some of our conclusions are not supported by evidence.The advantages of this study are described as follows.First, we found the possible influence of pathways such as H pylori that have not been proposed in previous studies on the height of Turner patients, and we included GSE46687 data set to verify hub genes.This can partly explain the stability of the current results.Considering that different sample sources or different included populations may lead to different results, the up-regulated genes of HIST1H2BA, TRIM71, and HIST1H2BB screened from the GSE58435 data set are down-regulated genes in the GSE46687 data set.
In conclusion, 733 DEGs and 10 hub genes were identified.They are new candidate targets in TS to understand the pathogenesis and progression mechanism, and may help to recognize the syndrome and open ways to new forms of treatment, so as to obtain a better prognosis.Although differential genes all play corresponding roles in the pathogenesis of Turner syndrome, the related hub genes of the common phenotype were detected in our analysis.This provides an imperative research basis for related clinical research and a new direction for further treatment more accurately.However, research at the molecule level of TS is up to now very limited, specifically more clinical research on the molecule level is wanted to be able to eventually relieve symptoms in TS.

Figure 1 .
Figure 1.Data set quality evaluation results.(A) Relative logarithmic expression (RLE), which is the logarithm of the expression value of a probe group in a certain sample divided by the median value expressed by the probe group in all samples, reflecting the consistency of parallel experiments.(B) RNA degradation diagram, with different colors representing 10 samples.

Figure 2 .
Figure 2. Processing result of data set.(A) The heat map shows 733 genes with the most significant differences.Red represents a high expression signal, and blue represents a low expression signal.(B) Volcano map, showing the DEGs in the chip compared with normal amniotic fluid; red dots represent Turner highly expressed genes, and blue dots represent Turner low expressed genes.DEGs = differentially expressed genes.

Figure 3 .
Figure 3. GO enrichment analysis results.(A) From the enrichment analysis results of all DEGs, terms were obtained by taking P < .05. (B) For GO enrichment analysis of all differential genes, the first 10 GO data items are selected as visual circles.Red dots indicate up-regulated genes, and blue dots indicate downregulated genes.The closer the inner circle is to the outer circle, the more genes there are in this item.The darker the inner circle, the more up-regulated genes account for this item.(C) The result of up-regulating DEGs.(D) The result of down-regulating DEGs.DEGs = differentially expressed genes, GO = Gene Ontology.

Figure 4 .
Figure 4. KEGG visualization results.The abscissa represents the P-value, the ordinate represents different paths.The larger the dots in the figure, the more genes contained in this pathway; the redder the dot color, the higher the probability of genes rich in this pathway.(A) Results of KEGG enrichment analysis of all DEGs.(B) Enrichment analysis results of up-regulated DEG.(C) Enrichment analysis results of down-regulated DEGs.DEGs = differentially expressed genes, KEGG = Kyoto Encyclopedia of Genes and Genomes.

Table 2
GO enrichment analysis of DEGs.

Table 3
KEGG pathway enrichment analysis of DEGs.DEGs = differentially expressed genes, KEGG = Kyoto Encyclopedia of Genes and Genomes

Table 4
Genes of interest.