Identifying significant genes and functionally enriched pathways in familial hypercholesterolemia using integrated gene co-expression network analysis

Familial hypercholesterolemia (FH) is a monogenic lipid disorder which promotes atherosclerosis and cardiovascular diseases. Owing to the lack of sufficient published information, this study aims to identify the potential genetic biomarkers for FH by studying the global gene expression profile of blood cells. The microarray expression data of FH patients and controls was analyzed by different computational biology methods like differential expression analysis, protein network mapping, hub gene identification, functional enrichment of biological pathways, and immune cell restriction analysis. Our results showed the dysregulated expression of 115 genes connected to lipid homeostasis, immune responses, cell adhesion molecules, canonical Wnt signaling, mucin type O-glycan biosynthesis pathways in FH patients. The findings from expanded protein interaction network construction with known FH genes and subsequent Gene Ontology (GO) annotations have also supported the above findings, in addition to identifying the involvement of dysregulated thyroid hormone and ErbB signaling pathways in FH patients. The genes like CSNK1A1, JAK3, PLCG2, RALA, and ZEB2 were found to be enriched under all GO annotation categories. The subsequent phenotype ontology results have revealed JAK3I, PLCG2, and ZEB2 as key hub genes contributing to the inflammation underlying cardiovascular and immune response related phenotypes. Immune cell restriction findings show that above three genes are highly expressed by T-follicular helper CD4+ T cells, naïve B cells, and monocytes, respectively. These findings not only provide a theoretical basis to understand the role of immune dysregulations underlying the atherosclerosis among FH patients but may also pave the way to develop genomic medicine for cardiovascular diseases.


Introduction
FH is a metabolic condition where defective uptake of circulating LDL particles by the liver leads to the elevation of serum cholesterol levels . This chronic condition causes a high cholesterol deposition in the inner walls of arteries, hastening atherosclerosis and increase the susceptibility to develop premature cardiovascular diseases (CVD) (Alhabib et al., 2021;Fantus et al., 2013). The estimated risk of premature CVDs is 20-fold higher for FH patients, compared to the general population (Villa et al., 2017). FH displays genetic heterogeneity. The disease can be either monogenic or polygenic, with a variety of molecular etiologies. Up to 80% of the FH patients have heterozygous loss-offunction (LoF) mutations in the LDLR (Berberich and Hegele, 2019); while a minority have LoF mutations in the receptor-binding functional segments of APOB (Andersen et al., 2016); or gain-offunction (GoF) mutations in PCSK9 (Abifadel et al., 2003). Biallelic LDLRAP1 gene mutations also exist, but to a much lesser extent. The monogenic forms of FH can be classified as either homozygous FH (HoFH) or heterozygous FH (HeFH) based on the allelic status of LDLR (Warden et al., (MA)2000.). In the majority of the investigated populations, HoFH is seen one in one hundred sixty thousand to three hundred thousand persons, while HeFH is seen one in two hundred fifty to three hundred persons (Berberich and Hegele, 2019). In comparison to HeFH, HoFH results in a much more severe disease manifestation. The latter also increases the rate of surgical intervention and death in the mid-twenties (Raal and Santos, 2012). Approximately, 30-70% of the clinically diagnosed FH patients, do not carry pathogenic PCKS9, LDLR, or APOB variants. However, some of them may carry rare frequency LoF variants in other FH genes like LIPA, ABCG8, APOE or ABCG5. Additionally, several high frequency allelic variants, which together act to influence serum LDL-C concentrations, are also reported (Paththinige et al., 2017).
FH is traditionally studied as a classical Mendelian disease, where causative variants perturb LDL binding, internalization and transportation mechanisms, resulting in the elevation of cholesterol laden LDL levels in the serum (Braenne et al., 2014;Wu et al., 2014;Marks et al., 2003). However, the effects of individual variants and the functional interactions between different variants in the protein is likely to be influenced by the expression status of that protein (Li et al., 2019). So, there is a greater need to study the gene expression changes in FH patients, which potentially contributes to atherosclerosis (Soutar and Naoumova, 2007). In this context, studying high-throughput gene expression profiling technologies like cDNA microarrays can provide a rapid and quantifiable assessment of thousands of genes spanning the whole genome (Qin et al., 2021). Moreover, unprecedented developments taking place in statistical modelling and bioinformatics approaches, over the past few decades, have given us a greater advantage in investigating the impact of gene expression changes on protein networks and pathways (Sabir et al., 2019;Sabir et al., 2020;Banaganapalli et al., 2021;Banaganapalli et al., 2020). Additionally, they have also assisted in the identification of potential druggable biomarkers (Huang, 1999).
The new bioinformatic methods like CIBERSORT, TIMER and EPIC can effectively characterize immune cell composition of different diseases using large-scale gene expression data (Craven et al., 2021;Xie et al., 2021). Therefore, owing to the lack of sufficient information, this study aims to uncover potential genetic biomarkers for FH. A broad range of advanced computational methods, including differential gene expression analysis, protein network mapping, hub gene identification, functional enrichment of biological pathways, and immune cell enrichment analysis were used to get a deeper understanding about the molecular abnormalities, which contributes to the development of FH and the associated health complications.

FH transcriptome datasets
Two FH transcriptomic datasets were taken from Array Express (Athar et al., 2019). The first dataset (E-GEOD-13985) consists of the white blood cell expression profiles of five FH patient samples and five controls (sex, age, body mass index and life style habits matched participants), generated on the Affymetrix GeneChip U133 + 2 (Human Genome) (Režen et al., 2019). The second expression dataset (E-GEOD-6088) included the white blood cell expression profiles of 10 FH patients and 13 controls (sex, age, body mass index and life style habits matched participants), generated via the HG-U133 + 2 chip (Mosig et al., 2008). The complete details of the study layout, samples used, RNA extraction, and array hybridization protocols are discussed in the original publication (Mosig et al., 2008).

Raw expression signals, pre-processing, and detection of DEGs
The processing of raw expression signals was performed with the Bioconductor package in R software program (Irizarry et al., 2003;Gautier et al., 2004). The noise reduction and standardization of the sample data were achieved by uploading .CEL files into the Bioconductor tool affy program. The unprocessed expression signals were standardized to median values by applying the Robust Multiarray Average (RMA) method (Ritchie et al., 2007). The genes showing 1.5-fold change (FC) expression difference and which cleared Benjamini and Hochberg's false discovery rate (FDR) with p-value 0.05 were flagged as DEGs (Gentleman et al., 2004). The volcano and mean plots were generated using R Program limma package. All the expression probes were referenced against Entrez gene IDs, and duplicate transcripts were removed. The DEGs common to both the datasets were identified with Venny 2.1 webtool (www.bioinfogp.cnb.csic.es/tools/venny).

Network construction and GO-Annotations
The STRING webserver helps in studying the direct or indirect interactions between expressed proteins . In this context, the DEGs obtained from the above steps were used to construct the protein-protein interaction (PPI) net- Table 1 The top ten DEGs (FC > 1.5) of two FH datasets (GSE6088 and GSE13985). work with help of STRING database (Szklarczyk et al., 2017). The maximum enrichment p-value and the minimal average local clustering coefficient values of the PPI networks were < 1.0x10 -16 and > 0.4, respectively. The confidence score of > 0.4 and the maximum additional interactions of 10 nodes were used to build the network. Then GO annotation analysis was performed to find the most significant terms using the Enrichment Analysis Visualization Appyter . This Appyter allows the generation of scatterplots, bar charts, hexagonal canvas, Manhattan plots, and volcano plots (https://appyters.maayanlab.cloud/#/Enrichment_ Analysis_Visualizer). All the protein-protein network was illustrated using Cytoscape3.8.2 (Shannon et al., 2003).

Identification of network clusters
A Cytoscape plugin called as Molecular Complex Detection (MCODE), was utilized to look for clusters within the PPI networks using the default settings like, k-core is equal to 2, node score of 0.2, degree cutoff of 2, and network depth of 100 (Bader and Hogue, 2003). Then, hub genes were identified based on the high MCODE scores (>5) using the cytoHubba plug-in of Cytoscape. Furthermore, subnetworks were created using known primary and secondary FH candidate genes (LDLR, LDLRAP1, PCSK9, APOB, APOA2, GHR, ABCA1, SMARCA4, and EPHX2) and hub genes.

Enrichr-GO annotations of cluster networks
The PPI subnetworks created with hub genes and FH candidate genes were provided as an input to the Enrichr webserver (https://maayanlab.cloud/Enrichr) . The GO analysis option available in Enrichr provides biological information regarding the involvement of PPI subnetworks in Biological-Processes (BPs), Cellular-Components (CC), Molecular-Functions (MFs) and pathway database (Kyoto-Encyclopedia of Genes and Genomes Pathways (KEGG)). The Benjamini-Hochberg (BH) stepdown method was used for statistical assessment of GO terms at a p-value of < 0.05 and a combined enrichment score of > 20. The REVIGO tool was then used to summarize GO terms and to remove redundant terms (Supek et al., 2011).

Open target phenotype identification
The hub genes selected from GO enrichment network findings were further explored in the Open Targets Platform. (http://platform.opentargets.org). This webserver provides unified access to diverse computational tools to query the causality relationship, including physical binary interactions, enzymatic reactions, or functional relationships between therapeutic targets and disease phenotypes. The query gene list was provided as an input option in the search box and the output was in the form of an evidence score for a given target-disease pair. The open target platform association score at a < 0.5 cutoff value was considered significant to detect the expression status of druggable molecular targets.

Mapping immune cell specific transcriptional signatures
The DICE database was used to understand how key genes enriched in all GO annotation terms show immune cell specific variations (https://dice-database.org/genes/). This database provides comprehensive information on immune cell expression. Upon providing the query gene list, this tool displays the expression level of genes in transcripts per million (TPM) on the x-axis, and cell types are sorted based on the y-axis of box plot graphs. The DEGs across all the cell types were identified by the DESeq package. Here, cell sorting is performed in reference to the gene expression values, arranged from the highest one to the lowest one. The x-and y-axes show pair-wise comparisons of two cell types. Statistical significance from log2 fold changes to P values is demonstrated in an interactive manner.

Gene-Drug interactions
The druggability of the query genes was determined using the Drug-Gene Interaction database (DGIdb) (Cotto et al., 2018). DGIdb is a centralized repository for data on drug-gene interactions and the druggability of each query gene is collected from multiple sources. The default filters like approved drugs, cytotoxic agents, and immunotherapeutic drug interactions from nine diseaseagnostic source databases were used during the searching step. The drug-gene interaction output data of queried genes was further filtered based on interaction types (activator or inhibitor) and molecular category of gene (transporter, domain, surface protein, G-protein receptor etc.), at a cut-off interaction score of > 0.03.

The analysis of DEGs
The gene expression profile of E-GEOD-13985 dataset revealed a total of 1363 DEGs (650 upregulated and 713 downregulated). In E-GEOD-6088 dataset, there were 1266 DEGs (589 upregulated and 677 down-regulated). The comparison of DEGs across the datasets using VENN plot, has detected 115 shared genes (47 upregulated and 68 down regulated) ( Fig. 1A-E). These shared genes were selected for further functional analysis with an underlying assumption that they have a critical role in hypercholesterolemia (Table 1).

Analysis of STRING PPI network and Appyters-GO annotations
PPI networks demonstrate the physical connectivity among different protein partners, and their disturbance could negatively impact broad range of molecular mechanisms essential for the cellular function. The PPI networks of 115 shared DEGs were expanded by 10 additional interactor proteins with a confidence value of > 0.4. The protein-protein interaction network generated was characterized with 151 nodesconnected to 323 edges with 4.28 Å as the average node degree and avg, local clusteringcoefficient value of 0.473 ( Fig. 2A).
GO annotations provide biological knowledge about genes and their protein products. GO enrichment findings of 115 shared DEGs with Appyter tool, has shown their annotations in 153 BP, 22 CC, 29 MF, and 10 KEGG pathways. The top key enrichment terms were, the controlling the Wnt signaling pathway (GO:00060828) under BP category with an p-value of < 0.0001, coated vesicle

Open target phenotype identification
The open target platform makes use of the systematic ''experimental factor ontology" (EPO) to identify both direct and indirect correlations between the target gene and disease phenotype. In Fig. 4. A. Bar graph represents the significant pathways enriched by DEGs at a p-value of < 0.05. B. The thyroid hormone signaling pathway with red color boxes highlighting the hub genes. this context, we explored the association of the above mentioned CSNK1A1, JAK3, PLCG2, RALA, and ZEB2 genes sourced in cardiac and immune disorder-related diseases in the Open Target Platform. Of those five genes, three genes (JAK3, PLCG2, and ZEB2) have shown a significant overall association score of 0.37 in both cardiac and immune disorder-related diseases. ZEB2 and PLCG2 genes have shown stronger associations with coronary artery disease, with an association value>0.3. JAK3 deficiency is significantly correlated with a multitude of immune-associated disorders, includes tuberculosis, severe combination immunodeficiency, and rheumatoid arthritis, with a score>0.5 (Fig. 5, Table 4).

Immune cells expression analysis
The transcript expression analysis of JAK3, PLCG2, and ZEB2 genes has been performed to explore their immune cell type restriction. The JAK3 is expressed by T-follicular helper CD4 + T cells Of the all-cell types, T-follicular helper CD4 + T cells has shown highest correlation with naïve CD8 + T cell (log2 fold change or lfc is 1.33), NK cell CD56 dimCD16 + (lfc is 1.49), B cell naïve (lfc is 1.58), Monocyte non-classical (lfc is 1.95), Monocyte classical (lfc is 2.56) (p=<0.001) (Fig. 6A-C).

Drug-gene interaction analysis
Druggability analysis on the three filtered DEGs, i.e., JAK3, PLCG2, and ZEB2 was done utilizing the DGidb webserver. Our findings suggest that JAK3 and PLCG2 genes have the potential to act as therapeutic targets owing to their drug interaction score of > 0.05. Both of them can be effectively inhibited by Ruxolitinib and Ibrutinib molecules (Table 5). Whereas, no drug targeting ZEB2 gene was found in our analysis.

Discussion
Over the recent decades, systems biology approaches have been widely used to study the gene expression datasets available in open resource databases like GEO and they have identified numerous disease genes and molecules (Sahly et al., 2021;Mujalli et al., 2020). An integrated bioinformatics method is used in this context Table 3 Top 4 GO terms after cluster enrichment with the 9 FH genes obtained by Enrichr-GO annotations for biological processes (BP), cellular components (CC), molecular functions (MF), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways.  (Sheng et al., 2014). Moreover, dysregulation of the mucin type O-glycan pathway may negatively impact the O-glycosylation of class A repeats in LDLR, whose stable expression is essential for LDL absorption into cells (Pedersen et al., 2014).
The continuous deposition of cholesterol and other lipids in the blood vessel wall promotes local hyperplasia, cytokine secre- tions, macrophage invasions and macrophage foam cell formation, which are the eventual underlying cause of early onset of atherosclerosis and cardiovascular diseases (Falk, 2006;Vallejo-Vaz et al., 2016;Heusch et al., 2014). However, rather than single gene actions, the chronic disease like atherosclerosis involves the action of perturbed protein networks, Therefore, based on the DEGs identified in FH patients, we initially constructed proteinprotein networks and further decomposed it into a functional cluster. In this context, we extended the protein network cluster with 9 FH candidate genes (APOA2, APOB, GHR, LDLR, LDLRAP1, ABCA1, PCSK9, SMARCA4, and EPHX2) and performed their GO annotations.
Network clusters are characterized by extensive connectivity between a set of genes, and GO annotations provide their biological interpretation (Zhong and Xie, 2007). The top significant GO terms in the biological processes category were the regulation of intracellular signal transduction and positive regulation of transcription, DNA-templated. The dysfunction of lipid homeostasis, PI3K/AKT signaling transduction pathways, monocyte chemotaxis, macrophage migration, and neovascularization are some of the atherosclerotic plaque formation characteristics (Li et al., 2018;Zhao et al., 2021). The accumulation of LDL particles leads to the formation and deposition of fibrous plaques in the subendothelial space, the eventual narrowing of arterial diameter leads to heart ischemia and myocardial infarction (35). In KEGG pathways category, thyroid hormone signaling pathway was the key enrichment term followed by ErbB signaling pathway. Since thyroid hormones are known to influence lipid homeostasis in the liver, hypothyroidism can cause hypercholesterolemia, which is commonly observed in patients with hypothyroidism (Delitala et al., 2017). Hypercholesterolemia in hypothyroidism leads to simultaneous diminishing control by triiodothyronine (T3) of SREBP-2 protein, which regulate the synthesis of cholesterol by modifying the HMG-CoA enzyme activity (Duntas and Brenta, 2018). Cholesterol levels are known to influence EGFR signaling processes by negatively affecting the receptor function and trafficking (Pike and Casey, 2002). Moreover, EGFR inhibition is shown to be useful for the treatment of hypercholesterolemia in high-fat-diet-fed Mitogen-inducible gene 6 (Mig-6) Mig-6 d/d mice . GO results have shown that 5 (CSNK1A1, JAK3, PLCG2, RALA, ZEB2) network cluster genes were commonly enriched against all annotation terms. Then open target platform-based correlations between the target gene and disease phenotypes revealed that ZEB2, JAK3 and PLCG2 genes are highly associated with cardiovascular and immunology related phenotypes.
The JAK3 is a tyrosine kinase protein, whose critical role in the cytokine receptor signal transduction pathway is very important for T-lymphocyte differentiation and function (Murray, 2007). A defective JAK3 signalling pathway is important for a multitude of immune-associated disorders, includes atherosclerotic process. Our immune cell restriction analysis findings show that JAK3 expression is highly correlated with T-follicular helper CD4 + T cells, which stimulate atherosclerosis and additional CVDs (Methe et al., 2005). JAK3 has been an ideal molecular target for several immunomodulators . Ruxolitinib is a dual JAK1 and JAK2 inhibitor with biological IC50s in the single digit nanomolar range for both kinases. A sixfold selectivity over Tyk2 and approximately 130-fold selectivity over JAK3 is represented within the JAK family members (Wu et al., 2019). Ibrutinib (PCI-32765) is a strong Brutons tyrosine kinase (Btk) inhibitor with an IC50 of 0.5 nM in cell-free tests. It is shown to act as a potent inhibitor to Bmx, CSK, FGR, BRK, and HCK, but less potent to EGFR, Yes, ErbB2, JAK3, and other kinases (van Vollenhoven et al., 2015).
The second gene PLCG2, drives the hydrolysis of membrane phospholipids to secondary messengers IP3 and diacylglycerol using calcium as a cofactor. PLCG2 variants are reported to contribute to autoinflammation and immune dysregulation (Sims et al., 2017). Our immune restriction analysis findings have confirmed the elevated expression of PLCG2 in naïve B cells, NK cells, and monocytes. The third gene ZEB2 belongs to the Zfh1 family of zinc finger/homeodomain proteins, which act as DNA binding transcriptional inhibitors (Bar Yaacov et al., 2018). The activity of ZEB2 is strongly related to dyslipidemia, metabolic changes, and CD8 + T cell alterations seen in atherosclerotic plaques (Fernandez, et al., 2020). Our immune cell restriction analysis results show that monocytes show higher expression of ZEB2 than native B cells or TH1 CD4 + T cell. In the hematopoietic system, Zeb2 together with Tbx21 (Tbet) promotes natural killer (NK) cell maturation and CD8 + T cells, and inactivation of ZEB2 dysregulates hematopoiesis with neutrophilia and monocyte loss, which validates our findings (Li et al., 2017). Table 4 Top 5 Open Target associated disease phenotypes for the 3 genes that were found common in all Enrichr-GO annotation terms (JAK3, ZEB2, and PLCG2). Even though the current study used a thorough bioinformatic analysis, some technical limitations or weaknesses could not be ruled out. Owing to the small amount of data, our results could not be generalized to a larger number of FH patients. The accuracy of the analytical findings can be improved by enlarging the sam-ples. Furthermore, while it can be described to certain extent that hub genes are strongly involved in FH pathogenesis and may potentially serve as key biomarkers for therapeutic molecules, future research work on animal models or cell line models is required to validate our findings.

Conclusion
This study has detected the expression differences of 115 genes in the FH samples compared to the controls. Functional ontology findings have linked these genes to intracellular lipoprotein metabolism, immune responses, cell adhesion molecules, canonical Wnt signaling, mucin type O-glycan biosynthesis pathways in FH patients. The expanded protein interaction network construction,GO annotations, and open target analysis has revealed ZEB2, JAK3 and PLCG2 genes as the key hub genes, which connects cardiovascular and immune response related phenotypes. These findings not only offer a hypothetical basis for better understanding immune dysregulation mechanisms underlying the atherosclerosis among FH patients, but also open the path for the development of therapeutic targets to reduce the cardiovascular disease burden in FH patients.

Ethical approval
This study does not require any ethical approval as the publicly available gene expression datasets have been used.

Data availability statement
The article includes all datasets studied for this investigation.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.