Identifying critical genes associated with aneurysmal subarachnoid hemorrhage by weighted gene co-expression network analysis

Aneurysmal subarachnoid hemorrhage (aSAH) is a life-threatening medical condition with a high mortality and disability rate. aSAH has an unclear pathogenesis, and limited treatment options are available. Here, we aimed to identify critical genes involved in aSAH pathogenesis using peripheral blood gene expression data of 43 patients with aSAH due to ruptured intracranial aneurysms and 18 controls with headache, downloaded from Gene Expression Omnibus. These data were used to construct a co-expression network using weighted gene co-expression network analysis (WGCNA). The biological functions of the hub genes were explored, and critical genes were selected by combining with differentially expressed genes analysis. Fourteen modules were identified by WGCNA. Among those modules, red, blue, brown and cyan modules were closely associated with aSAH. Moreover, 364 hub genes in the significant modules were found to play important roles in aSAH. Biological function analysis suggested that protein biosynthesis-related processes and inflammatory responses-related processes were involved in the pathology of aSAH pathology. Combined with differentially expressed genes analysis and validation in 35 clinical samples, seven gene (CD27, ANXA3, ACSL1, PGLYRP1, ALPL, ARG1, and TPST1) were identified as potential biomarkers for aSAH, and three genes (ANXA3, ALPL, and ARG1) were changed with disease development, that may provide new insights into potential molecular mechanisms for aSAH.


INTRODUCTION
Aneurysmal subarachnoid hemorrhage (aSAH) is a lifethreatening medical event caused by the rupture of an intracranial aneurysm, resulting in blood leakage into the subarachnoid space [1]. According to the relevant literature, aSAH accounts for 75%-80% of nontraumatic SAH, with an annual incidence of approximately 6-16 cases per 100,000 individuals worldwide [2]. The mortality rate of SAH is estimated to be approximately 40-50%, with a 36% mortality rate within 30 days of the development of symptoms [3][4][5].
In order to identify aSAH samples at an early stage, we hypothesized that potential biomarkers may exit in the peripheral blood, for prediction of aSAH. Accordingly, in this study, we analysed the peripheral blood samples of 43 patients with aSAH due to ruptured intracranial aneurysms and those of 18 individuals with headaches using the weighted gene co-expression analysis (WGCNA) to select hub genes associated with aSAH. Subsequently, critical genes were identified by combining these data with differentially expressed genes (DEG) analysis and validation in 35 clinical samples. We identified three genes, ALPL, ANAXA3 and ARG1, that may be associated with aSAH disease progression. The workflow of the current analysis is shown in Figure 1.

Data processing
For the data matrix, 12,095 unique genes were annotated with GPL10558 platform annotation file, and 7,987 of these genes were found to be protein-coding genes after referring to the human genome assembly GRCh38. The dataset included 43 aSAH samples and 18 control samples; no outlying samples were identified with the criterion Z.ku lower than the -5 base on Euclidean distance based sample network analysis. Thus, all samples expression data were applied to construct the co-expression network.

Weighted gene co-expression network construction
In WGCNA, we calculated the soft thresholding power base on the scale-free topology criterion using the pickSoft Threshold function. A beta value of 3 (R 2 > 0.9) was chosen to construct the gene network by applying the default WGCNA approach (Figure 2A). In this analysis, 21 modules were identified with a minimum module size of 30, the medium sensitivity of 2 to branch splitting. We merged the modules with their pairwise correlation is larger than 0.8 so that to avoid modules eigengenes are highly correlated. Finally, 14 modules were picked out and they were displayed in Figure 2B.

Identification of significant modules
To select modules that were significantly associated with aSAH, the association between module eigengenes and clinical characteristics was evaluated with Pearson correlation analysis. Figure 2C shows the correlations between module eigengenes and aSAH. Among the modules, red (P = 0.004), blue (P = 1e-05), brown (P = 5e-04) and cyan (P = 7e-06) were closely associated with aSAH (P < 0.01). Genes with a high significance for aSAH and with high module membership in the selected four modules were identified depending on the gene significance (GS) and module membership (MM) measures. GS and MM were highly correlated in red (correlation coefficient = 0.77, P = 1e-09), blue (correlation coefficient = 0.94, P = 1e-200), brown (correlation coefficient = -0.85, P = 1e-200), and cyan (correlation coefficient = -0.91, P = 1e-200) modules, indicating that the red and blue modules contained genes with high positive correlations with aSAH, whereas the brown and cyan modules contained genes with negative correlations with aSAH ( Figure 3A). The gene expression of genes in the aSAH group in the red (P < 0.01) and blue (P < 0.0001) modules was significantly higher than that of the genes in the control group, whereas the opposite results was observed for the brown (P < 0.001) and cyan (P < 0.0001) modules ( Figure 3B).

Identification of hub genes and functional enrichment analysis
In each significant module, hub genes were identified based on the following criteria: absolute value of the correlation between the gene and aSAH higher than 0.2 and the absolute value of the correlation of the module higher than 0.8. Based on these thresholds, 364 hub genes were identified (Table 1). Then, the hub genes list was uploaded to Metascape (http://metascape.org/) to explore the biological functions of the hub genes. The top 10 biological processes (BPs) and Kyoto Encyclopedia of Genes and Genomes (KEGG) terms annotated with hub genes are displayed in Figure 4. The BP annotations showed that the hub genes were significantly enriched in the  AGING ribonucleoprotein complex biogenesis, ribosome biogenesis, translational initiation, nuclear-transcribed mRNA catabolic process, nonsense-mediated decay, viral gene expression, SRP-dependent cotranslational protein targeting to membrane, cotranslational protein targeting to the membrane, noncoding RNA processing, rRNA processing, and peptide biosynthetic process. The KEGG pathway enrichment analysis showed that the hub genes mainly participated in ribosome, hematopoietic cell lineage, tuberculosis, human T-cell leukemia virus 1 infection, ribosome biogenesis in eukaryotes, spliceosome, Th17 cell differentiation, RNA transport, toxoplasmosis, and human T-lymphotropic virus-I infection.

Identification and validation of critical genes
In order to select critical genes from the hub genes, we analyzed the DEGs between aSAH and control individuals using the limma package, according to the cut-off criteria of |log2 fold change (FC)| greater than or equal to 1 and adjusted P value less than 0.05. Among all genes, 13 genes (CD27, IL2RB, FCER1A, ANXA3, ACSL1, HP, PGLYRP1, ALPL, ARG1, TPST1, SLPI, ECHDC3, and ORM1) were screened as DEGs ( Table 2). The expression profile heatmap of the DEGs is shown in Figure 5A. Among the genes, 7 overlapped genes (CD27, ANXA3, ACSL1, PGLYRP1, ALPL, ARG1, and TPST1) were identified between DEGs and hub genes, including in one downregulated gene (CD27) and six up-regulated genes (ANXA3, ACSL1, PGLYRP1, ALPL, ARG1, and TPST1) in aSAH ( Figure 5B, 5C). These genes were defined as critical genes with playing a key role in the aSAH development. Receiver operating characteristic (ROC) curve was plotted and the area under the curve (AUC) was calculated to distinguish individuals with aSAH from controls. The AUCs of almost all critical gene was higher than 0.8 in the datasets, indicated they may be act as potential biomarker in diagnosing aSAH ( Figure 6).

DISCUSSION
aSAH accounts for 75%-80% of all cases of SAH and is associated with a high mortality rate of approximately 40-50%, with a 30-day fatality rate of 36% [2,4,5]. The molecular mechanisms involved in the pathophysiology of aSAH remain unclear. Therefore, exploring susceptibility modules and genes for aSAH may contribute to the early diagnosis and treatment of SAH, thereby reducing mortality and serious adverse reactions.
In this study, four modules were found to be highly associated with aSAH, based on WGCNA.
Additionally, 364 hub genes were identified. BP and KEGG enrichment analyses suggested that protein biosynthesis-related processes and inflammatory response-related processes were significantly involved in the pathology of aSAH. Among the hub genes, seven were found to be differentially expressed between aSAH and control groups and were identified as critical genes involved in the development of aSAH, with potential applications in the early prediction of aSAH. Finally, we validated our results using clinical data obtained from quantitative reverse transcription polymerase chain reaction (qRT-PCR).
Among the critical genes identified in this study, CD27, ANXA3, PGLYRP1, and ARG1 are closely associated with the immune system. CD27, a transmembrane glycoprotein, plays an important role in immune response and is expressed on most B lymphocytes cells, T lymphocytes cells, and natural killer (NK) cells [14,15]. The expression of CD70, which is primarily AGING controlled by antigen receptor and Toll-like receptor stimulation on T cells, B cells and dendritic cells is a key factor in determining the contribution of CD27 to the immune response [16]. CD27 binds to the receptor CD70, and plays important roles in regulation of the activation of T lymphocytes cell and the synthesis of immunoglobulin. In this study, CD27 was obviously down regulated in the aSAH group with that in the control group, suggesting that decreased CD27 transcription was related to B lymphocytes, T lymphocytes, and NK cells in patients with aSAH. This finding was consistent with the result from Joanna et al., who revealed that decreased CD27 mRNA expression was related to T lymphocytes in aSAH [17]. ANXA3, PGLYRP1, and ARG1 were all participated in the recognition of bacteria by neutrophils. ANXA3 is particularly abundant in neutrophils, accounting for approximately 1% of all cytosolic proteins [18] and contributing to neutrophil antimicrobial activity by promoting phagolysosome fusion [19]. PGLYRP1, which belongs to a family of PGN-binding proteins (PGRPs), highly conserved among insects and mammals, is an antibacterial protein found in neutrophil tertiary granules. PGRP1 plays critical roles in neutrophil production of reactive oxygen species and modulation of immune response [20,21]. ARG1 is stored in granules of neutrophils. Once released and activated, ARG1 can degrade extra cellular arginine, resulting in inhibition of the activation and proliferation of T lymphocytes cells [22]. We found that ARG1 was significantly upregulated in the SAH group, suggesting that high expression of ARG1 may be related to decreased T lymphocytes activation and proliferation in patients with aSAH. Therefore, these results suggested that there was an abundance of transcripts related to monocytes and neutrophils with a simultaneous decrease in transcripts related to T lymphocytes in patients with aSAH.
ACSL1 is an enzyme that converts free long-chain fatty acids into fatty acyl-CoA esters, and thereby plays critical roles in both in lipid biosynthesis and fatty acid degradation [23]. Thus, disordered lipid metabolism may be involved in the development of aSAH. A previous study also demonstrated that related membrane lipid metabolism is altered in spastic basilar arteries after SAH [24]. Statins have been used and show significant benefits in models of traumatic brain injury and the related disease processes, including cerebral ischemia, intracerebral haemorrhage, and SAH [25].

AGING
The most compelling preclinical data has been obtained in experimental SAH, where statins have been shown to reduce vasospasm and improve outcomes after SAH in the animal experiments [26][27][28]. Similarly, statin treatment has been shown to improve outcomes in murine models of intracranial hemorrhage [29,30] and acute ischemic stroke [31][32][33]. ALPL encodes tissuenonspecific alkaline phosphatase (ALP), which has key roles in skeletal mineralization via the regulation of diphosphate levels. ALP can also promote vascular calcification by catalyzing the hydrolysis of organic pyrophosphate, an inhibitor of vascular calcification [34]. A number of studies have reported that a close relationship between serum ALP and increased morbidity and mortality in patients with cardiovascular diseases [35,36]. Moreover, elevated serum ALP levels have been shown to be associated with increased mortality rates, poor functional outcomes, and disease recurrence in patients with stroke [37,38]. Zhu et al. evaluated the association between the outcomes and serum ALPL level in 196 patients with aSAH and found that higher serum ALP levels are associated with an increased risk of vasospasm, delayed cerebral ischemiainduced clinical deterioration, and functional outcomes after aSAH [39]. Thus, ALPL may be a predictive biomarker for patients with aSAH.
TPST1, a type of homologous tyrosyl protein sulfotransferase (TPST) enzymes, plays a critical role in protein tyrosine sulfation for transfer of sulfate from the cofactor PAPS (3'-phosphoadenosine 5'phosphosulfate) to a context-dependent tyrosine in a protein substrate [40]. To date, the functional importance of protein tyrosine sulfation is still unclear; however, this process has been shown to play a role in altering biological activities of proteins, modulating the proteolytic processing of bioactive peptides [41], influencing the half-life of proteins in circulation [42], and regulating extracellular proteinprotein interactions, as observed for inflammatory leukocyte adhesion [43,44]. The recent discovery of tyrosine sulfation of chemokine receptors suggests an even broader role in inflammatory responses [45,46]. The role of TPST1 in aSAH is unknown, and further studies are needed to explore the potential mechanisms.
With the development of sequencing technology, genomics is playing an important role in disease diagnosis, mechanism research, drug development and treatment, especially in tumor diagnosis and treatment. Nowadays, sequencing technologies are increasingly used in clinical settings, and key genes may play an important role in the occurrence and development of a certain disease. Therefore, the expression of key genes can be used to determine the diagnosis of aneurysm, and the intervention of key genes can be used to treat aneurysm, thus preventing the occurrence of serious complications.
In conclusion, in this study, we found that protein biosynthesis-related processes and inflammatory responses related processes were involved in the pathology of aSAH. Additionally, we found that CD27, ANXA3, ACSL1, PGLYRP1, ALPL, ARG1, and TPST1 were significant potential biomarkers to guide the identification and treatment of aSAH. According to our PCR data, the levels of ALPL, ANAXA3, and ARG1 were reduced over time in patients with aSAH. However, further studies are needed to determine the relationships of these changes with the disease status. Moreover, our study lacked extensive clinical experimental verification of the identified genes. Thus, in future analyses, it will be necessary to verify our findings in clinical studies.

Microarray data processing
The GSE36791 gene expression matrix was retrieved and obtained from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/) by using the GEO query package in the R environment (Version 4.2.0) [47]. Gene expression data from peripheral blood samples were obtained from 43 patients with SAH caused by a ruptured intracranial aneurysm and 18 patients with headaches symptoms as the control group, the detail characteristics of all samples were displayed in Table 1 of the paper-Gene expression profiling of blood in ruptured intracranial aneurysms: in search of biomarkers [17]. The corresponding annotation file-GPL10558 matrix which includes more than 47,000 probes and targets to more than 31,000 annotated genes, was downloaded and applied to convert the probe into the target gene. If the target gene was annotated with two or more probes, the mean value was calculated for subsequent analysis. Among the targeted genes, the protein-coding genes were obtained by referring to the human genome assembly GRCh38. The matrix was normalized without transformation by using the Bead Array package [48]. In this analysis, the data were log2 transformed. The outlying microarray samples were identified with Euclidean distance-based sample network methods and a Z.ku cut-off of -5 was calculated as ku-mean(k) / sqrt(var[k]) [49].

Weighted gene co-expression network construction
WGCNA was performed to identify clusters that were highly correlated with all three phenotypes using the AGING WGCNA package [49]. First, the soft threshold beta was chosen via scale free topology with the R function pickSoft Threshold. Second, we applied a power adjacent function to select adjacencies between all protein-coding genes and to transform data into a topological overlap matrix (TOM), and the corresponding dissimilarity (1-TOM) was evaluated. Third, the parameters of cutree Dynamic function were set as a minimum gene size of 30 and a medium sensitivity of 2 for branch splitting to calculate the average linkage hierarchical clustering tree. Finally, highly correlated modules were merged with a pairwise correlation coefficient higher than 0.8 for the identification of modules with very similar expression profiles depending on the clustering methods.

Identification of significant modules
To select modules that were significantly associated with aSAH, the associations between module eigengenes and aSAH were evaluated via Pearson correlation analysis. The modules with the P-value < 0.01 were considered to be significantly associated with aSAH. Since the module eigengene is an optimal summary of the gene expression profiles of a given module, it is natural to correlate eigengenes with these characteristics and to find the most important associations. To quantify the similarity of all genes on the array to the identified module. We quantify associations of individual genes with aSAH by defining GS as the absolute value of the correlation between the gene and the specific trait and by defining the quantitative measure of MM as the correlation of the module eigengene and the gene expression profile.

Identification of hub genes and functional enrichment analysis
In each significant module, hub genes were screened according to the following criterion, including the absolute value of the correlation between the aSAH and gene higher than 0.2 and the absolute value of the correlation of the module higher than 0.8 [50].
To explore the biological function of the hub genes, we performed Gene Ontology (GO) and KEGG enrichment analysis using Metascape (http://metascape.org/) [51]. The top 10 GO terms and KEGG terms were visualized with the ggplot2 package in the R programming language [52].

Identification of critical genes
In this analysis, the critical genes were identified based on two traits: significant differential expression between aSAH and control samples and high interconnections with genes in the module. Briefly, critical genes were defined as differentially expressed hub genes. The limma package [53] was applied to identify differential expressed genes (DEGs) between two groups in the expression data with the cut-off criteria |log2 fold change (FC)| ≥ 1 and adjust P value < 0.05 [54]. Then, the critical genes were screened and visualized with the Venn diagrams package [55]. The expression of critical genes was displayed and they were verified in another dataset. Additionally, ROC curves were plotted with the pROC package to verify the diagnostic performance of critical genes.

Validation of critical genes using qRT-PCR
Finally, we validated the obtained results from microarray data of peripheral blood samples by using qRT-PCR on samples from 25 patients with aSAH and 10 healthy controls recruited from the Department of Neurosurgery, Jinling Hospital, the First School of Clinical Medicine, Southern Medical University, China. Blood was collected from patients with aSAH at three time points: before therapy, 3 days after aSAH, and 7 days after aSAH, and that from the control samples was collected at the physical examination center. All samples were obtained in the fasting condition. The characteristics of the recruited patients are shown in Table 3. Whole blood samples were homogenized in TRIzol reagent (Servicebio, Wuhan, China). Total cellular RNA was then extracted and transcribed into cDNA using a Servicebio RT First-strand cDNA Synthesis Kit (Servicebio, Wuhan, China). qPCR was subsequently performed by using 2×SYBR Green qPCR Master Mix (Low ROX; Servicebio, Wuhan, China) with the CFX Real-time PCR system (Bio-Rad Laboratories, MN, USA). Table  4 lists all primer oligos, which were synthesized by Servicebio Biotechnology (Wuhan, China). The mRNA levels of glyceraldeyhyde 3-phosphate dehydrogenase (GAPDH) were used for normalization of mRNA expression (The average qRT-PCR values are shown in Supplementary Table 1). Subsequently, relative quantification was performed based on the comparative threshold cycle (2 -ΔΔCT ) method. The qPCR experiment of each clinical sample was repeated for 3 times, and the mean value were calculated for differential comparation. The differential gene expression between the two groups was analyzed using a non-parametric test, and P values less than 0.05 were considered statistically significant.

Data availability
The data used to support the findings of this study are from previously reported studies and datasets, which have been cited.