Integrated bioinformatics analysis of potential pathway biomarkers using abnormal proteins in clubfoot

Background As one of the most common major congenital distal skeletal abnormalities, congenital talipes equinovarus (clubfoot) affects approximately one in one thousandth newborns. Although several etiologies of clubfoot have been proposed and several genes have been identified as susceptible genes, previous studies did not further explore signaling pathways and potential upstream and downstream regulatory networks. Therefore, the aim of the present investigation is to explore abnormal pathways and their interactions in clubfoot using integrated bioinformatics analyses. Methods KEGG, gene ontology (GO), Reactome (REAC), WikiPathways (WP) or human phenotype ontology (HP) enrichment analysis were performed using WebGestalt, g:Profiler and NetworkAnalyst. Results A large number of signaling pathways were enriched e.g. signal transduction, disease, metabolism, gene expression (transcription), immune system, developmental biology, cell cycle, and ECM. Protein-protein interactions (PPIs) and gene regulatory networks (GRNs) analysis results indicated that extensive and complex interactions occur in these proteins, enrichment pathways, and TF-miRNA coregulatory networks. Transcription factors such as SOX9, CTNNB1, GLI3, FHL2, TGFBI and HOXD13, regulated these candidate proteins. Conclusion The results of the present study supported previously proposed hypotheses, such as ECM, genetic, muscle, neurological, skeletal, and vascular abnormalities. More importantly, the enrichment results also indicated cellular or immune responses to external stimuli, and abnormal molecular transport or metabolism may be new potential etiological mechanisms of clubfoot.


INTRODUCTION
As a common developmental malformation of newborns, clubfoot affects approximately 2% of newborns (Wang et al., 2019). If not actively and timely treated, this deformity will accompany the child for a lifetime, which will not only affect the appearance of the child, including walking difficulties or even a disability, but also cause serious adverse effects on the mental health of the child. Therefore, timely and active treatments are urgently needed. The treatment of clubfoot includes surgical and nonsurgical treatments. Surgical releases, such as muscle balance and anterior tibial tendon transposition, may cause many problems, such as large trauma complications, which seriously affect the healthy development of children (Smith et al., 2014). The number of extensive surgical cases of clubfoot has declined by 60% from 1996 to 2006 in the United States (Zionts et al., 2010). In contrast, the Ponseti method has received increasing attention in the treatment of clubfoot and has become the most commonly used method. Although the success rate of Ponseti's first correction is satisfactory after a long period of development, there are still many problems and challenges, such as having a recurrence of deformity, having a poor compliance, and determining the best time to stop wearing braces (Miller et al., 2016;Thacker et al., 2005).
Clubfoot is associated with neuromuscular lesions, heredity abnormalities, skeletal dysplasia, soft tissue contracture, vascular abnormalities, ECM abnormalities, and intrauterine growth retardation (Chesney, Barker & Maffulli, 2007;Eckhardt et al., 2019;Miedzybrodzka, 2003;Ošt'ádal et al., 2015;Poon, Li & Alman, 2009;Sodre et al., 1990;Wang et al., 2013a;Wang et al., 2013b). In addition, smoking and viral infections in pregnant women are also closely related to congenital clubfoot (Palma et al., 2013;Robertson Jr & Corbett, 1997). Although the direct cause of congenital clubfoot remains to be unified, these findings provide an important opportunity and basis for the treatments of clubfoot. Several studies have revealed that botulinum toxin injection can relieve muscle or soft tissue contracture and may replace percutaneous tendoachilles tenotomy in the treatment of clubfoot (Alvarez et al., 2009;Howren, Jamieson & Alvarez, 2015). Based on the progress of pathogenesis, appropriate drug treatment may improve patient compliance and ultimately improve the efficacy of the Ponseti method. In addition, familial occurrence and interand intraphenotypic variability of clubfoot is well documented (Basit & Khoshhal, 2018). Several genes were identified as susceptible genes in clubfoot, such as the HOX family, CASP family, PITX-TXB4 pathway, troponin (TN) family, GLI3, T-box and MTHFR genes (Hecht et al., 2007;Shrimpton et al., 2004;Shyy et al., 2009;Weymouth et al., 2016;Zhang et al., 2016). However, these studies do not further explore abnormally active signaling pathways and potential upstream and downstream regulatory networks. Therefore, the aim of the present investigation is to explore the abnormal pathways and their interactions using integrated bioinformatics analysis in clubfoot. We expect the results of this study will provide an update on the etiopathogenetic mechanism of idiopathic clubfoot.

Inclusion of abnormal proteins
Two widely used databases, PubMed and Science Direct, were used for literature retrieval. Keywords were ''(clubfoot or clubfeet or congenital talipes equinovarus) and (etiology or embryology or etiopathogenesis or genomics or genetics or pathology or pathophysiology)''. The search date was up to May 7, 2019. A total of 1,057 articles were found in PubMed, and 657 articles were found in Science Direct. After eliminating duplicate articles, a total of 1,093 studies were retrieved. By reading the title and abstract, 1,015 articles were eliminated following the inclusion and exclusion criteria as described below. Inclusion criteria included clinical or preclinical studies written in English that were focused on the etiology of clubfeet. Investigations with a focus on secondary/syndromic clubfoot, such as distal arthrogryposis, myelomeningocele, and moebius syndrome, were excluded. We performed a full text assessment of the remaining 78 articles. A total of 47 articles were excluded because the study topic was not clubfoot, genetic information was not mentioned or full text of the study was not available. Finally, 8 articles were included in the present study. A total of 30 proteins that are shown in Table 1 were used in this investigation after assessing the protein. The process of the inclusion of abnormal proteins was shown in Fig. 1.

WebGestalt analysis
WebGestalt analysis was performed as described previously (Wang et al., 2017). The parameters for the enrichment analysis were as follows. A specific organism was chosen H. sapiens (human). Enrichment categories were used KEGG, Panther and Reactome pathways. A reference list was used for all mapped entrezgene IDs from the selected platform genome. The reference list was mapped to 61,506 entrezgene IDs, and 2266 IDs were annotated to the selected functional categories and used as the reference for the enrichment analysis. The minimum number of IDs in each category was 5, and the maximum number of IDs was 2000. Among 30 unique entrezgene IDs, 19 IDs were annotated to the selected functional categories and used for the enrichment analysis. The Gene Ontology (GO) Slim summary was based upon 30 unique entrezgene IDs. Fisher's exact test-based overpresentation enrichment analysis (ORA) method was used for enrichment analysis. FDR was used for the Benjamani-Hochberg (BH) method.

g:Profiler analysis
The version of g:Profiler was e94_eg41_p11_9f195a1 (database updated on 01/24/2019). The parameters for the enrichment analysis were as follows. A specific organism was chosen H. sapiens (human). GO analyses (GO molecular function (GO: MF), GO cellular component (GO: CC), and GO biological process (GO: BP)) were carried out sequentially. The biological pathways used were the KEGG, Reactome, and WikiPathways databases. The protein databases used were the Human Protein Atlas and CORUM databases. The statistical domain scope was used only for annotated genes. The significance threshold was the g:SCS threshold. The user threshold was 0.05.

NetworkAnalyst analysis
The significantly changed genes from the previous analyses were mapped to the corresponding molecular interaction databases. The procedure typically produces one large subnetwork with several smaller ones. The website was upgraded and maintained until May 8, 2019 by the Xia Lab (https://www.networkanalyst.ca/NetworkAnalyst/faces/ home.xhtml). The parameters for the enrichment analysis were as follows. A specific organism was chosen H. sapiens (human). The ID type was chosen Uniprot Protein ID. PPI analysis was performed using the MEx Interactome database. The parameters were referred  'ádal et al. (2015) to the literature-curated comprehensive data from the Innate DB (Breuer et al., 2013). Gene-miRNA interactome analysis was carried out with comprehensive experimentally validated miRNA-gene interaction data collected from TarBase and miRTarBase (Chou et al., 2018;Vlachos et al., 2015). TF-gene interaction analysis was performed using the ENCODE database. Transcription factor and gene target data derived from the ENCODE ChIP-seq data. The peak intensity signal <500 and the predicted regulatory potential score <1 used the BETA Minus algorithm (Wang et al., 2013a;Wang et al., 2013b). TF-miRNA coregulatory network analysis was performed using curated regulatory interaction information collected from the RegNetwork repository (Liu et al., 2015).

REAC enrichment analysis
The enrichment analysis was performed against Reactome version 66 on 04/05/2019 using UniProt identifiers for the mapping. The web link is as follows: https://reactome.org/ PathwayBrowser. Information in the REAC database is authored by expert biologists and entered and maintained by Reactome's team of curators and editorial staff. Reactome content frequently cross-references other resources, e.g., NCBI, Ensembl, UniProt, KEGG (Gene and Compound), ChEBI, PubMed and GO. REAC analysis was performed as described previously (Fabregat et al., 2016). The parameters for the enrichment analysis were as follows. A specific organism was chosen H. sapiens (human). IntAct interactors were included to increase the analysis background. An overrepresentation analysis method was used for enrichment analysis. This test produces a probability score, which was corrected for the false discovery rate using the BH method. Twenty-seven out of 30 identifiers in the sample were found in Reactome, and 430 pathways were found by at least one of the identifiers.

Statistics
The enrichment analysis method in the WebGestalt analysis was used for ORA method. The significance threshold in the g:Profiler analysis was the g:SCS threshold (g:Profiler analysis soft version: e94_eg41_p11_9f195a1 (database updated on 01/24/2019)). The adjusted p value method used was the BH method and the adjusted p value was transformed to negative log10 (-log10(AdjP)). All significantly changed pathways and interactions had a p value <0.05.

Results of the enrichment analysis by g:Profiler and Reactome
To visually observe the enrichment information of these candidate proteins, g: Profiler and Reactome were performed for the bioinformatics analysis. A large number of terms were enriched by g:Profiler in the GO, KEGG, REAC, WP and HP databases ( Fig. 2A). A large number of signaling pathways were enriched by Reactome (Fig. 2B). Additionally, there were extensive interactions among these signaling pathways (Fig. 2B).
To clarify the categories of these pathways, we summarized these pathways enriched by REAC. Signal transduction, disease, metabolism, gene expression (transcription), and immune system were the top 5 pathways and their percentages were as high as 65% (Fig. 3A). Among the 30 proteins, the top 10 proteins with the widest participation, such as P35222, P01137, P02751 and P08123, were mainly ECM proteins or proteins that interact with the ECM (Figs. 3B and 3C). We continued to summarize the top 10 pathways and found that these pathways were mainly focused on the ECM, metabolism and cell communication (Fig. 3D).

Results of the signaling pathway enrichment analysis
Through the Reactome enrichment analysis, a large number of pathways were enriched (Figs. 2 and 3). To further analyze these potential pathways, signaling pathway enrichment analysis was carried out using g:Profiler in the REAC, WP, KEGG and HP. A total of 32, 11, . Classification analysis revealed that these pathways were mainly concentrated in ECM, disease and metabolic pathways (Fig. 4E). The proportion of the top three was as high as 78%. In addition, human phenotype ontology enrichment analysis results revealed that these abnormal proteins were mainly expressed in the lower appendages, such as muscle, ankle, foot, joint, skin and connective tissue, and their proportion was more than 50% (Fig. 4F). Similar, similar results were found in KEGG and REAC enrichment analyses using NetworkAnalyst and WebGestalt (Fig. S1).

Results of the GO enrichment analysis
GO enrichment analysis was further performed to explore the BP, MF and CC induced by these proteins. A total of 69, 9 and 24 GO terms were enriched by g:Profiler in BP, MF and CC, respectively (Figs. 5A-5C). Classification analysis revealed that GO: BP was mainly concentrated in embryo or organ development, and its combined percentage was over 80% (Fig. 5E). Among these pathways, skeletal development was dominant. ECM function, growth factor binding, cell adhesion, heparin binding and protein binding were the main pathways in MF, and their percentages were as high as 85% (Fig. 5F). Additionally, ECM and membrane were the main CC, and their percentages were almost 60% (Fig. 5G). In addition, GO enrichment analysis results analyzed by NetworkAnalyst were consistent with those from g:Profiler (Fig. S2).

Results of PPIs and gene regulatory networks analysis
The NetworkAnalyst analysis was performed to explore protein-protein interactions and gene regulatory networks for genes that code these candidate proteins. These genes interact extensively with predictive genes (Fig. 6A). A total of 2,630 PPIs were enriched by NetworkAnalyst. There are 1,263 interactions that were related to the top 10 genes and these interactions account for 48% of the total. The top genes were FN1, CTNNB1, FHL2, TGFB1 and COL1A2 (Fig. 6B). Among these genes, FN1 and CTNNB1 were dominant, and the percentage of PPIs induced by these two genes was almost 39%. TF-gene interactions were also enriched by NetworkAnalyst. A total of 692 TF-gene interactions were enriched (Fig. 6C). The top genes were TGFB1, FN1, SOX9, AOC3 and HOXD13 (Fig. 6D). Additionally, the interactions among these 10 genes were as high as 43% of the total.

DISCUSSION
In the present investigation, a large number of signaling pathways were enriched in the GO, KEGG, REAC, WP and HP databases using g:Profiler, NetworkAnalyst and WebGestalt analyses of clubfoot. Among them, pathways in embryo or organ development, ECM, metabolism, immune system, cell cycle, cell responses to external stimuli, and apoptosis or programmed cell death were the top pathways. A wide range of interactions existed among these enriched signaling pathways. In addition, there were also extensive regulations between the upstream and downstream of genes encoding these proteins.
There were 452 enriched pathways identified by REAC enrichment analysis. Among them, signal transduction, disease, metabolism, gene expression (transcription), immune system, developmental biology, cell cycle, ECM and hemostasis were advantageous pathways. Additionally, signaling pathways in DNA repair, PCD, cell response to external stimuli, cell-cell communication, molecular transport and chromatin organization were enriched. In the process of cell biological activities, numerous changes occurred in cells, such as signal transduction, gene expression, cell cycle, DNA repair, molecular transport and metabolism. Abnormal changes in these biological processes may alter cell fate, cell-cell communication, and cell response to external stimuli or even cause immune system changes.
Select soft tissues in clubfoot are contracted, resulting in stiffness. Extracellular matrix proteins, such as asporin, collagen types III, V, and VI, versican, tenascin-C, and TGF-beta induced protein, are highly expressed in contracted tissues clubfoot (Eckhardt et al., 2019;Ošt'ádal et al., 2015). Additionally, the expression levels of growth factors TGF-beta and platelet-derived growth factor are high and a blockade of growth factors led to decreased collagen expression, proliferation, and chemotaxis (Li et al., 2001). These proteins seem to be promising targets for future investigations and treatments of this disease. Indeed, the muscle contraction strategy, such as botulinum toxin injection, can relieve muscle or soft tissue contracture and thus improve and alleviate clubfoot symptoms (Howren, Jamieson & Alvarez, 2015;Shrimpton et al., 2004). ECM-associated pathways were enriched and included collagen chain trimerization, collagen formation and degradation, and ECM proteoglycans. TGF-beta receptor signaling was also enriched. These findings support the idea that these ECM proteins are promising targets for the treatment of clubfoot.
Abnormal biological development, such as muscle, neurological, skeletal and vascular abnormalities, has been previously identified and confirmed (Basit & Khoshhal, 2018;Eckhardt et al., 2019;Herceg et al., 2006;Hester et al., 2009;Lovell & Morcuende, 2007;Ošt'ádal et al., 2015). The HOX and TBX families governed limb identity, and fibroblast growth factor participated in the formation of limb muscles (Ohuchi & Noji, 1999;Wang et al., 2008). Additionally, gene-gene interactions between CASP SNPS and variants in HOXA, HOXD, and insulin-like growth factor binding protein affect muscle and limb development (Ester et al., 2010). Apoptosis and programmed cell death associated pathways and activation of HOX gene pathways, such as activation of HOX genes during differentiation, and activation of anterior HOX genes in hindbrain development during early embryogenesis, were enriched in the REAC database. A myogenesis pathway was also found.
ROBO family genes regulate axonal guidance and cell migration. ROBO1 and ROBO2 receptors regulate the proliferation and transition of primary to intermediate neuronal progenitors (Borrell et al., 2012), while the interaction of ROBO4 with SLIT3 is involved in the proliferation, motility and chemotaxis of endothelial cells, and accelerates the formation of blood vessels (Zhang et al., 2009). We found that the regulation of commissural axon pathfinding by SLIT, ROBO, netrin-1 signaling and pathways with signaling by ROBO receptors, semaphorin interactions, neurotransmitter release cycle were enriched. These results support that activation and inactivation of developmental signaling pathways initiate embryonic and organ development. Abnormalities in any link may cause deformities.
Epidemiological data confirmed that smoking by any parent or the presence of any household smoking increased the risk of clubfoot in Peru (Palma et al., 2013). Maternal diabetes also showed a significant association with clubfoot (Parker et al., 2009). In mice, maternal smoking or diabetes increased mitochondrial damage and oxidative stresses (Stangenberg et al., 2015). In addition, maternal diabetes also increased hypoxia-inducible factor 1 α expression in utero (Moazzen et al., 2015). Robertson Jr & Corbett (1997) confirmed that clubfoot was caused by an intrauterine enterovirus through allowing anterior horn cell lesions. The adverse in utero environment caused by the external or internal environment may increase oxidative stress and thus induce cell damage in utero. We found that the signaling pathways involved in the regulation of gene expression by hypoxia-inducible factor, reactive oxygen species detoxification, cell response to hypoxia, cell response to heat stress, cell response to stress, oxidative stress-induced senescence, cellular senescence, and cellular responses to external stimuli were enriched in the REAC database. In addition, these external stimuli activate the immune system. We also found that interleukin family signaling, MAP kinase activation, toll-like receptor family cascade, adaptive immune system, and innate immune system were enriched. These data indicate that abnormal changes in the immune system and cell response to external stimuli may play key roles in clubfoot.
In the enrichment analysis, we also found that several molecular transport pathways were enriched by enrichment analysis, and induced plasma lipoprotein assembly, remodeling, and clearance, iron uptake and transport, small molecules transport, binding and uptake of ligands by scavenger receptors, lysosome vesicle biogenesis, vesicle-mediated transport, trans-Golgi network vesicle budding, and membrane trafficking. Molecular transport is essential for material transport, signal transduction and neurotransmitter transporters and modulates the cell response to an external force (Hu & Papoian, 2013). Disorders induced by abnormal molecular transport are serious and even fatal for cells. Based on these results, we proposed that molecular transport abnormalities may play a large role in clubfoot.
Metabolism of proteins, RNAs or other molecules plays a key role in the basic functions of cells (Esteban-Martínez, Sierra-Filardi & Boya, 2017). Pathways in these metabolic processes were found in the enrichment analyses, and induced regulation of IGF transport and uptake by IGFBPs, activation of genes by ATF4 in response to endoplasmic reticulum stress, synthesis, secretion, and inactivation of GLP-1, metabolism of nitric oxide, activation and regulation of eNOS, metabolism of carbohydrates, and regulation of lipid metabolism by PPAR α. A previous study indicated that lipid droplets and glycogen increased and the number of mitochondria decreased in chondrocytes from the biopsied iliac crest cartilage of joint contracture patients (Nogami et al., 1983). These data suggested that glucose metabolism, lipid metabolism and mitochondrial oxidative stress were associated with multiple joint contractures. In the present work, glucose metabolism regulated by IGF or GLP, lipid metabolism regulated by PPAR, and oxidative stress regulated by NOS were found in the REAC database. These data suggest that metabolic abnormalities may play a significant role in clubfoot.
All candidate papers were obtained from PubMed and Science Direct. Almost all of these differentially expressed proteins were carefully and strictly selected from clinical trials. Clubfoot is abnormal in both bone and muscle. Most of the samples are from bone, and few are from muscle, therefore selection bias cannot be avoided. Through strict screening criteria, candidate proteins from different tissues were adopted for subsequent bioinformatics analysis, and the selection bias was possibly minimized. Although the adjusted p value method was performed, there may be some false positives in using these candidate proteins for bioinformatics analysis because these proteins were not collected from the same investigation. Although we proposed that cell or immune responses to external stimuli, abnormal molecular transport and metabolism are new potential etiopathogenetic mechanisms of clubfoot, direct experimental evidence is needed. We are carrying out preclinical and clinical studies to confirm these enrichment pathways and proposed hypotheses.

CONCLUSIONS
In summary, a large number of signaling pathways were enriched using REAC, KEGG and WP enrichment analyses by g:Profiler, NetworkAnalyst and WebGestalt. Among them, signal transduction, disease, metabolism, gene expression (transcription), immune system, developmental biology, cell cycle, and ECM were the top functions. GO enrichment analysis also revealed pathways in embryo or organ development, and ECM proteins were dominant. PPIs and GRNs analysis results indicated that extensive and complex interactions occur in these proteins, enrichment pathways, and TF-miRNA coregulatory networks. Transcription factors such as SOX9, CTNNB1, GLI3, FHL2, TGFBI and HOXD13, cooperated with hsa-miR-29a, hsa-miR-101, hsa-miR-520d-5p, hsa-miR-29b and hsa-miR-568 and regulated these candidate proteins. In addition to supporting the proposed hypotheses, such as ECM abnormalities, fetal movement reduction, genetic abnormalities, muscle abnormalities, neurological abnormalities, skeletal abnormalities, uterine compression and vascular abnormalities, we propose that cellular or immune responses to external stimuli, and abnormal molecular transport or metabolism are new potential etiological mechanisms of clubfoot.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
The authors received no funding for this work.