Novel Signature Genes for Human Left Ventricle Cardiomyopathies Identied by Weighted Co-expression Network Analysis (WGCNA)

suggests that the disorder expression pattern of three subtypes cardiomyopathies could be more complex. The signature genes (TFAM, RHEB) were shared by Viral Cardiomyopathy and Post. Partum / Familiar /Idiopathic Cardiomyopathy groups, and only RHEB was signicantly down-regulated in Idiopathic cardiomyopathy group. It suggests that signature genes (TFAM, RHEB) play less contribution in pathological progress of viral cardiomyopathy and Post. Partum / Familiar /Idiopathic Cardiomyopathy.


Introduction
Cardiomyopathy is featured with incapacity of heart to pump and/or to ll with blood as body requires, and the worst state lapses into heart failure, a complex pathophysiological condition with left ventricle myocytes dysfunction. Globally, at least 26 million people were suffered from heart failure and consumed over $30 billion health expenditure in the world 1 . The mortality of heart failure is as high as ~ 50% in a ve-year period 2 . Although cardiomyopathy is a primary condition for heart failure, pathogenic damages (abnormal physical structure and dysfunction) have often happed in the tissue of patient's cardiomyopathy. Identi cation of biomarkers can be very useful for early diagnosis of cardiomyopathy, interruption of the disease procession to heart failure, and decrement of the mortality.
Etiologies for cardiomyopathy are multiple factors, which make cardiomyopathy a heterogeneous complex cardiovascular disease. The major stimuli include changed physiological conditions (such as pregnancy and delivery), and stressful pathological conditions (for example, ischemia, hypertension, diabetes, viral infection, and so on). According to the distinct etiologies, cardiomyopathy is divided into several subgroups that usually own speci c morphological features 3 . The most prevalent subtypes are hypertrophic cardiomyopathy (HCM), dilated cardiomyopathy (DCM), viral cardiomyopathy (VCM), familial cardiomyopathy (FCM), post-partum cardiomyopathy (PCM) and ischemic cardiomyopathy (ISCM) 3 . Early clinical investigations recognized cardiomyopathic cases caused by abnormal gene expression, without somatic genetic alteration, suggesting the impact of epigenetics and transcriptional changes on cardiomyopathy 4 . However, complex etiologies may lead to a variety of abnormal expression genes, making it di cult to identify the common cardiomyopathy biomarkers with limited number cases.
Computerization methodologies have been applied into the discovery of signature genes as potential biomarkers of diseases. Among many computerization methodologies, the Weighted Gene Co-expression Network Analysis (WGCNA) is a useful approach that analyzes gene expression pro ling to nd out coexpression network based on their functional features, and to discover the genetic disorders in diseases 5 .
WGCNA has been widely applied into screen novel biomarkers of cancers, and is demonstrated to be a reliable and powerful tool [6][7][8] . In this study, WGCNA was employed to analyze gene expression pro les of several cardiomyopathies, containing 90 human biopsies from GEO databases in NCBI. The genes were discovered that signi cantly associated with cardiomyopathies. The signature genes and key biological pathways identi ed by WGCNA were further validated through bioportal database, an independent annotation database of cardiovascular diseases.

Screening Gene Expression Datasets of cardiomyopathy from GEO Database
The gene expression dataset of cardiomyopathy used for data analysis were screened from the Gene Expression Omnibus (GEO) database (NCBI). Numbers of patients (more than 20 cases), variety of subgroups of cardiomyopathy, and clearly annotations with clinical trials were the major metrics for screening. A dataset, with a GEO tracking number GSE1145 and a platform entry number GPL570 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE114 5), was screened out for further analysis.
In this datasets, cardiac transcription pro les were established from cardiomyopathy patients of who were undergoing heart failure and planed for a cardiac transplantation. Human left ventricle samples were collected from biopsies of cardiomyopathy patients, or from "normal" organ donors whose hearts cannot be used for transplants. The heart failure of these cardiomyopathy patients arises from different etiologies. The transcriptional pro les were measured by Affymetrix Human Genome U133 Plus 2.0 Array. Changes in transcriptional pro les were correlated with the physiologic pro le of failure hearts acquired at the time of transplantation. Sample collection and microarray dataset were performed by the cardiogenomics lab, Department of Bauer Center for Genomic Research, Harvard University.
Construction of Weighted Gene Co-expression Network WGCNA package of R (version 1.63) was downloaded (http://www.Rproject.org) and setup by following the protocol previously 9 . Each gene expression value from the downloaded dataset (GSE1145) was normalized by compared with inter reference genes, and performed a log2 transformation. Microarray quality was tested by sample clustering according to the distance between different samples in Pearson's correlation matrices. Outliers were identi ed with a height cut of 170000. Outliers and samples with excessive missing values were excluded from next analysis. Data quality was checked by the principal component analysis (PCA). The co-expression network was constructed, setting the soft threshold β = 7, which means that R 2 is equal to 0.98 and indicates the constructed network is close to a scale-free network. Once β value was determined, the Topological Overlap Matrix (TOM) and dissTOM = 1 − TOM were obtained automatically. Modules were identi ed based on TOM and dissTOM. The hierarchical clustering analysis was used to identify gene modules and color to indicate modules, which is a cluster of densely interconnected genes in terms of co-expression. Genes that were not co-expressed were assigned into a grey module. The signi cant p-value of candidate genes was calculated via T-test. The association between the modules and diseases were evaluated by Gene signi cance (GS) and Module signi cance (MS); GS was de ned as mediated p-value of each gene (GS = lgP); MS was de ned as the average GS of all the genes involved in the same module. The cut-off signi cant standard was setup as p-value lower than 0.05. Importance of a gene within a module was measured by the module membership (MM). MM was calculated as MM(i) = cor (xi, ME), where i represents gene contained in module; ME (module eigengene) is de ned as the rst principal component of the module and represents the overall expression level of the module 10 . Modules that signi cantly associated with the traits of different etiologies were identi ed by calculation the correlation of MEs with clinical pathological features.
Function & Pathway enrichment analysis for gene signi cance in module Signi cant genes that were related to different pathological phenotype were blast through the web-based GenCLiP 2.0. Correlation analysis that biological functions and molecular networks involved with the genes were performed 11 . The connection strength of a gene to other genes in a global functional pathway and network was measured by gene connectivity. Genes associated with cardiomyopathy were ltered from the identi ed signi cant genes by blast in the Cardiovascular Disease Portal, which contains annotations of 854 genes with veri ed functions related to human cardiomyopathies 12,13 . Genes that have been reported to link with pathological features were labelled with purple border.

Identi cation of hub genes
Hub gene in a module, a signi cant gene that widely connects with other signi cant genes, is the key interconnected nodes within a functionally network and plays the most important roles in the biological processes 14 . Hub genes were identi ed by two methods: co-expression network and PPI network analysis.
Potential hub genes among each signi cant module were selected by co-expression network, using GS (> 0.2), MM (> 0.8, with a threshold of p-value < 0.05), and correlated to certain clinical traits as the screening criteria. In parallel, protein-protein interaction (PPI) network of the module genes were built in a selected signi cant module through the STRING database. Interaction between genes was de ned as positive, as cutoff > 0.4; potential hub gene was ltered in PPI network analysis, if its connectivity degree of ≥ 8 through STRING database. Overlapped potential hub genes in both co-expression network and PPI network analysis were the "real" hub genes.

Validations of Expression of Hub genes and Signature Genes
Signi cant genes were validated by comparison of their expression levels among the subgroups of cardiomyopathy and health group that was used as the benchmark. Individual gene expression in each group was presented as means ± standard error of the mean (SEM). Signi cance of differences was determined by Student t-test (Prism, GraphPad, San Diego, CA). Differences were signi cant if p < 0.05 (*). When p < 0.05, The standard of signi cance was setup as up-expression (Fold change > 1.0, p < 0.05) or down-expression (Fold change < 1.0, p < 0.05).  Enriched Genes Signi cance related to different cardiomyopathies.

Clinical information of dataset
Compared the module memberships (MM) correlation and Genes Signi cance (GS) among the all signi cant modules, the module with most signi cant value was de ned as the best candidate for pathological traits correlation analysis (Table 1, Table.1). In the familial cardiomyopathy and post. partum cardiomyopathy groups, modules of brown, magenta and purple were tightly clustered with familiar cardiomyopathy, while the magenta module was the most signi cant associated with disease status ( Figure.S4D-4E). In the hypertrophic cardiomyopathy group, no module was associated with its pathological feature. In the ischemic cardiomyopathy group, although module of black and midnightblue were clustered in closer branch, the yellow module was allocated in the adjacent branch ( Figure.  The Biological Processes were mainly concentrated in subgroups of cellular macromolecular metabolic process, protein metabolic process, organic substance metabolic process and macromolecule modi cation. The genes signi cant enriched in molecular functions were summarized and listed (Table.S4). The molecular functions were linked to endoplasmic reticulum functions, cell functions (migration, death, growth, division), DNA binding, protein-protein interaction, kinases activity and signal transduction, iron binding and nucleotide binding, etc. The cellular components were mainly enriched in membrane-bounded organelle, intracellular organelle part, etc. The pathways concentrated on mitogenactivated protein kinase (MAPK) signaling pathway, protein processing in endoplasmic reticulum, regulation of actin cytoskeleton, etc. These results suggested that dysregulation of cardiac functions would be associated with metabolism abnormal and accelerated progress of cardiomyopathies. Moreover, the network and connectivity of signi cance genes were identi ed as node-connection map, which was correlated to different traits (Ischemic, Figure. (Table.3), and the numbers of real hub genes were listed ( Figure. 4A-F). The Idiopathic Dilated group was dismissed for further analysis as no identi ed real hub gene. Through Venn diagrams analysis, three common axes of hub genes were discovered among these cardiomyopathic groups (Figure.4G).
The rst axis was PICALM, which shared by Ischemic Cardiomyopathy, Idiopathic Cardiomyopathy and Post. Partum Cardiomyopathy groups ( Figure.4G), and signi cantly up-regulated in Idiopathic Cardiomyopathy and Ischemic Cardiomyopathy groups (Table.3). PICALM is key regulator in iron homeostasis, clathrin-mediated endocytosis 15,16 . Overexpression of PICALM impaired endocytosis of Transferrin (Tf) Receptor (TfR) and Epidermal Growth Factor Receptor (EGFR) and disturbed the iron homeostasis 16,17 . Up to now, it is still illusive that the exactly role and deregulatory mechanism of PICALM in cardiomyopathies. It is strongly suggesting that PICALM work as potential novel biomarker and therapy target for these subcases of cardiomyopathies.
The secondary axis, contained genes of PRKACB, MOB1A, CDC40, were shared in Post. Partum Cardiomyopathy and Idiopathic Cardiomyopathy groups. In addition, these genes (PRKACB, MOB1A, CDC40) were signi cantly overexpressed in Idiopathic cardiomyopathy group, and MOB1A was upregulated in Post. Partum cardiomyopathy group (Table.3).These genes were linked to the cAMP (cyclic AMP)-dependent protein kinase A (PKA) mediated the exciting-contraction coupling in cardiomyocytes 18 , and regulated microtubule stability, cell cycle and cell proliferation & migration, and restrained cardiomyocyte proliferation and size via Hippo pathway 19,20 . PRKACB (protein kinase cAMP-activated catalytic subunit β gene) was linked to congenital heart defect with abnormal over-expression 21 . MOB1A (MOB kinase activator 1A) was required for cytokinesis through regulating microtubule stability. It worked as binding partners as well as co-activators of Ndr family protein kinases and mediated phosphorrecognition in core Hippo pathway that restrains cardiomyocyte proliferation during development to control cardiomyocyte size 19,20 . Overexpression of MOB1A induces centrosomes fail to split and cell size (cAMP-responsive element-binding protein) had been identi ed as the transcription factor and mediated cAMP stimulation by multiple extracellular signals, such as growth factors and hormones. The CREB1 was the key regulator in heart and linked with heart disease via cAMP-PKA pathway dysregulation 25,26 . The DBT (dihydrolipoamide branched chain transacylase E2) is an inner-mitochondrial enzyme complex regulated to degrade the branched-chain amino acids isoleucine, leucine, and valine 27 . The DBT was reported as clinical diagnostics biomarker for patients with dilated cardiomyopathy via caused mitochondria dysfunction 28 . NCOA2 (nuclear receptor coactivator 2) is a transcriptional coactivator that functional aid for nuclear hormone receptors, including steroid, thyroid, retinoid, and vitamin D receptors. NCOA2 promotes muscle cells maintenance and growth, eventually regulates in cardiac cTnT levels 29,30 .
Overexpression of NCOA2 regulated cell proliferation in cardiomyopathy 30,31 . NUDT21 (nudix hydrolase 21) is a novel of cell fate regulator by alternative polyadenylation chromatin signaling, and suppression of NUDT21 will enhance the cell pluripotent, facilitated trans-differentiation into stem cell 32 . NUDT21 regulates cell proliferation through ERK pathway 33 . Up to now, little knows about the function of NUDT21 in cardiomyocytes. PIK3C2A (phosphatidylinositol-4-phosphate 3-kinase catalytic subunit type 2 alpha) is an enzyme belong to phosphorylate the 3'-OH of inositol ring of phosphatidylinositol (PI) superfamily and regulates multiple signaling pathways. PIK3C2A is mainly expressed in endothelial cells, vascular endothelium, and smooth muscle 34 . Lower expression of PIK3C2A in peripheral blood was used as signi cant biomarker for acute myocardial infarction patients 35 . More interesting, these hub genes indicated different expression pattern. The expression level of DBT, NCOA2, NUDT21 and PIK3C2A were signi cantly upregulated in Idiopathic cardiomyopathy group, and PIK3C2A was up-regulated in Familiar cardiomyopathy group (Table.3). It hints that these hub genes play different regulatory pattern in the progress of these subtype's cardiomyopathies.

The fourth axis of hub genes (HNRNPC, UEVLD) were shared by Familiar Cardiomyopathy and Idiopathic
Cardiomyopathy groups, and signi cantly overexpressed in Idiopathic cardiomyopathy group (Table.3). HNRNPC (heterogeneous nuclear ribonucleoprotein C) is RNA binding protein that belong to ubiquitously expressed heterogeneous nuclear ribonucleoproteins subfamily, and mediates pre-mRNAs transport and metabolism between cytoplasm and nucleus 36,37 and overexpression caused cells multi-nucleation 38 . UEVLD (EV and lactate/malate dehydrogenase domain-containing protein) involves the protein degradation and dysregulated linked with metabolic disease 39 . In this study, the expression level of HNRNPC and UEVLD were signi cantly up-regulated in Idiopathic cardiomyopathy group (Table.3).
Furthermore, through different signi cant expression analysis, the signi cant changed hub genes were summarized (Table.3, p < 0.05). Combined these results together, it hints that these signi cantly expressed Hub genes play dominant role and work as common key regulatory nodes in progress of cardiomyopathies.

Disease Signature Genes Identi cation and expression analysis
The ltered disease signature genes were summarized with the functional annotation of genetic dysregulation correlated to heart diseases phenotypes, including ten signature genes in the ischemic group and viral cardiomyopathy group, forty signature genes among the group of familiar cardiomyopathy, Post-partum cardiomyopathy and Idiopathic cardiomyopathy, and 69 signature genes in the ischemic cardiomyopathy group (Table.4). Through Venn Diagram analysis, the common signature genes were determined among different groups ( Figure. Furthermore, through different signi cant expression analysis, the signi cantly changed signature genes were summarized (Fig. 5D-F, Table.5). In ischemic group, MDM4 gene was signi cantly upregulated (FC = 1.0495, p = 0.0037) (Table 5, Figure.5B), which genetic deletion associated with cardiomyopathy 43 . In viral cardiomyopathy group, COA5 was overexpressed (FC = 1.087485, p < 0.0001) (Table 5, Figure. Table.5). These results suggest that these common disease signature genes work as novel biomarker and be potential key regulators of the cardiomyopathy progress.

Discussion
In this study, to discover novel signature genes or biomarkers to accelerate the precise clinical diagnostics and interference for different subtype of cardiomyopathies, the WGCNA pipeline was applied to analyze the gene expression pro ling of 86 clinical left ventricle biopsy samples, which represents 8 subtype's cardiomyopathies. The WGCNA pipeline was widely used for performing various functions in weighted correlation network analysis, including constructing network, detecting module, calculating topological properties, simulating data, visualization, and interfacing with external software 9 . The whole transcriptome pro le contained 20,283 target genes for promise diagnostic assessment and mainly covered variously biological and cellular processes. It is representative of real pathological satiation and valuable to discover the signature gene of cardiomyopathies. First of all, it was reasonable to build the coexpression networks with different clinical cardiomyopathies traits using the Pearson correlation ( Figure.2A). To discover the related modules to cardiomyopathies phenotype, the genes signi cance of the modules was calculated by the linear mixed effects model for testing the association of node to the pathological phenotypes. It was identi ed that the association signi cance between individual modules of gene expression pro le and different cardiomyopathies feature ( Figure.2B). Through the Eigengene dendrogram analysis, the most signi cantly module was pick out for next analysis (Figure.3B, Figure.S4A-G). For the next step, the real hub genes among each signi cant module were screened by module membership (MM) -Gene Signi cance and Protein-Protein Interaction Network analysis, and comprised as key interconnected nodes within a functionally network and played important roles in biological functions 14 . Without any real hub genes identi ed in the Idiopathic Dilated group, it was discarded for next analysis. In addition, the Idiopathic Dilated case was treated as unique physiological state without impaired the normal cardiac function and ignored for analysis. Brie y, the next analysis was mainly concentrated on these ve subtype's groups, including idiopathic cardiomyopathy (IdCM), familial cardiomyopathy (FCM), post-partum cardiomyopathy (PCM), Ischemic cardiomyopathy (IsCM) and viral cardiomyopathy (VCM). There were four axes of hub genes shared among these cardiomyopathic groups. It was suggesting that these Hub genes work as common key regulator. It was exception that viral cardiomyopathy group did not share hub gene with the others groups. It was possible unique that the dysregulation expression pattern of viral cardiomyopathy. Furthermore, to deeply dig the correlation of signi cance genes and different cardiomyopathies, the signi cance genes were blast through GenCliP2 to mine gene networks and functions connection, biological process, molecular functions, the cellular components and functional pathways. Although the enriched pathways were varied from different subtype's cardiomyopathies, the key pattern was similar ( Figure.S5A-F). The biological processes were signi cantly concentrated on cellular metabolic, protein metabolic, organic substance metabolic and macromolecule modi cation. It suggests that the metabolic process disorder be associated with progress of cardiomyopathies. The molecular functions were mainly involving in protein binding, heterocyclic compound binding, purine ribonucleotide binding, iron binding and nucleotide binding, etc. The cellular components were including membrane-bounded organelle, intracellular organelle part, etc. The pathways were concentrated on MAPK signaling pathway, protein kinase C protein processing in endoplasmic reticulum, regulation of actin cytoskeleton, etc ( Figure.S6A-F). These related genes were summarized with functions and pathways (Table. S4A-F). Furthermore, most of signi cance genes were labelled as geneterm association not reported. It represents that the regulatory mechanism of these genes is illusive in progress of cardiomyopathies. Through Literature Gene Networks Mining, the genes were identi ed in the regulatory network with less published literatures, which were associated with different cardiomyopathies except Post-Partum cardiomyopathy ( Figure.S5D-B). It is possible that less researches and few reports concentrated on Post-Partum cardiomyopathy. For subtype of idiopathic cardiomyopathy (IdCM), familial cardiomyopathy (FCM), and Ischemic cardiomyopathy (IsCM), the highlighted genes were mainly concentrated in MAPK signaling pathway, including MAPK1, MAPK14, CREB1, RAC1. The genes YY1, RAPGEF1, SMAD2, JUND, ATF1, and SRA1 linked with viral cardiomyopathy ( Figure.S5F-B), which are assemble in SMAD signaling pathway 48 and YY1 was overexpressed in heart failure 49 . These results partially matched the key axes of hub genes linked the functions and pathways. These new discovered genes linked to viral cardiomyopathy may give a new view for clinical diagnostics and treatment. It suggests that the dysfunctions of these signi cance genes would be associated with metabolism disorder and progress of cardiomyopathies.
Limited by hardly accessible to clinical samples, blast through the cardiovascular disease BioPortal database was employed as validation strategy to explore the disease signature genes that associated with different subtype cardiomyopathies. The Cardiovascular Disease Portal provides easy access to multiple genetic data associated with speci c cardiomyopathy types and. The Cardiovascular Disease Portal integrates data for genes, QTLs and strains associated with the disease(s) highlighted and translational research annotation and associated disease information. The ltered genes will be de ned as disease signature genes for cardiomyopathies 12, 13 . The varied number of disease signature genes were ltered for different groups (Table.4). There were three axes of common signature genes identi ed among ve subtype's cardiomyopathies groups (Fig. 5A). The rst axis contained 4 disease signature genes (MDM4, CFLAR, RPS6KB1, PKD1L2) shared by ischemic and ischemic cardiomyopathy group. Compared with health control, only MDM4 was signi cantly overexpressed (FC = 1.0495, p = 0.0037) in ischemic group, while the four signature genes did not signi cantly change in ischemic cardiomyopathy group. It matches the previously reports that the upregulated MDM4 plays cardio-protective effect in ischemia-refuse injury 50 . Overexpression of MDM4 re ects the self-correction of physiological system in abnormal physiological condition of ischemic. It will generate new strategy to interfere the ischemic lapse into cardiomyopathies by arti cial upregulated MDM4 expression. The secondary axis, consisted of eight signature genes (MAPK1, MAPK11, MAPK14, LMNA, RAC1, PECAM1, XIAP, CREB1), was shared by Ischemic Cardiomyopathy with Post. Partum/Familiar/Idiopathic Cardiomyopathy groups. These signature genes (MAPK1, MAPK11 and LMNA down-regulated; RAC1 up-regulated) were signi cantly changed in ischemic cardiomyopathy group, which played dominant role in progress of cardiomyopathy, and only LMNA was signi cantly down-regulated in Idiopathic Cardiomyopathy group (Table.5, Fig. 5G). It suggests that the disorder expression pattern of three subtypes cardiomyopathies could be more complex. The signature genes (TFAM, RHEB) were shared by Viral Cardiomyopathy and Post. Partum / Familiar /Idiopathic Cardiomyopathy groups, and only RHEB was signi cantly down-regulated in Idiopathic cardiomyopathy group. It suggests that signature genes (TFAM, RHEB) play less contribution in pathological progress of viral cardiomyopathy and Post. Partum / Familiar /Idiopathic Cardiomyopathy. Combined these results together, it strongly suggests that these genes could be used as promising biomarkers or therapy targets for cardiomyopathies. This progress will be helpful to integrate precise clinical application for different subtype cardiomyopathies.
There are some limitations in this study. Firstly, lacking of larger number of clinical samples and more detail of disease state information, it was hardly to track the patient's cases with expression pro les and verify these potential biomarkers with original patient's pathological feature. Secondly, due to the nature of bioinformatics analysis, the discovered speci c GO pathways and biomarkers do need further investigation. Although these genes were validated signi cantly associated with cardiomyopathies feature through cardiovascular disease BioPortal database, and it is necessary to verify these potential biomarkers with more clinic patient's biopsies by immunohistochemistry (IHC) or other genetic detection method, like qPCR or sequencing in coming research work. It is mandatory and rational to investigate the contribution and mechanism of these potential disease signature genes to the progress of cardiomyopathies by animal model and validate with clinical data in the future research. On the other side, this study has several novelties. Firstly, it has applied reverse strategy by using WGCNA approach to discover the genes signi cantly associated with cardiomyopathies pathological feature in clinical samples. In parallels, compared these signature genes expression level among health control and disease groups, the results indicated that signature genes have signi cant expression change in disease groups (Table.5). Secondly, the key functional & GO pathways that genes signi cance in module involving in the progress of cardiomyopathies were inquired by Genclip enrichment analysis. The results will be a clause for the next step research. Thirdly, exploring through bioportal database, some novel potential signature genes were identi ed as potential biomarkers of cardiomyopathies in previously independent researches.
These results are partially as evidences to support our results and research strategy.
In summary, this study provides new insight to identify the potential novel key regulatory biomarkers or therapy targets for varied cardiomyopathies induced by different etiologies. The disease signature genes associated with cardiomyopathies were identi ed and listed as the potential therapy targets for clinical application. In the future research, the detail of regulatory mechanism of these disease signature genes will be deeply investigated and develop novel therapy strategy for cardiomyopathies.