Integrated Analysis of Summary Statistics to Identify Pleiotropic Genes and Pathways for the Comorbidity of Schizophrenia and Cardiometabolic Disease

Genome-wide association studies (GWAS) have identified abundant risk loci associated with schizophrenia (SCZ), cardiometabolic disease (CMD) including body mass index, coronary artery diseases, type 2 diabetes, low- and high-density lipoprotein, total cholesterol, and triglycerides. Although recent studies have suggested that genetic risk shared between these disorders, the pleiotropic genes and biological pathways shared between them are still vague. Here we integrated comprehensive multi-dimensional data from GWAS, expression quantitative trait loci (eQTL), and gene set database to systematically identify potential pleiotropic genes and biological pathways shared between SCZ and CMD. By integrating the results from different approaches including FUMA, Sherlock, SMR, UTMOST, FOCUS, and DEPICT, we revealed 21 pleiotropic genes that are likely to be shared between SCZ and CMD. These genes include VRK2, SLC39A8, NT5C2, AMBRA1, ARL6IP4, OGFOD2, PITPNM2, CDK2AP1, C12orf65, ABCB9, SETD8, MPHOSPH9, FES, FURIN, INO80E, YPEL3, MAPK3, SREBF1, TOM1L2, GATAD2A, and TM6SF2. In addition, we also performed the gene-set enrichment analysis using the software of GSA-SNP2 and MAGMA with GWAS summary statistics and identified three biological pathways (MAPK-TRK signaling, growth hormone signaling, and regulation of insulin secretion signaling) shared between them. Our study provides insights into the pleiotropic genes and biological pathways underlying mechanisms for the comorbidity of SCZ and CMD. However, further genetic and functional studies are required to validate the role of these potential pleiotropic genes and pathways in the etiology of the comorbidity of SCZ and CMD, which should provide potential targets for future diagnostics and therapeutics.

INTRODUCTION Schizophrenia (SCZ) is a serious mental illness, with approximately 10-20 years life expectancy reduced compared with the general population (1). The most common cause of premature death in people with SCZ is cardiovascular disease (CVD), which results in three-fold higher mortality and 10 years shorter life expectancy for patients with SCZ than the general population (2). While the increased risk of CVD morbidity and mortality in SCZ can be explained by several factors (e.g., smoking, poor diet, and sedentary behavior) (3), it is now established that cardiometabolic disease (CMD) including body mass index (BMI), coronary artery diseases (CAD), type 2 diabetes (T2D), low-density lipoproteins (LDL), high-density lipoproteins (HDL), total cholesterol (TC), and triglycerides (TG), accounts for the majority of the incidence of CVDrelated death in schizophrenic patients (4).
Historically, the high prevalence of CMD among schizophrenic patients has been primarily attributed to unhealthy lifestyle factors and the side effects of antipsychotic medications (5). However, recent evidences have suggested that genetic basis and common biological pathways shared between SCZ and CMD (6)(7)(8). For example, Andreassen et al. using genetic-pleiotropy-informed methods detected 10 loci associated with both SCZ and CVD risk factors, which include waist-to-hip ratio, systolic blood pressure, BMI, LDL, HDL, and TG (6). So et al. performed polygenic risk scores, linkage disequilibrium score regression, and Mendelian randomization analysis and showed that genetic basis shared between SCZ and BMI, the causal relationship between SCZ and TG, and common biological pathways (e.g., aldosterone synthesis and secretion, neuronal system, and insulin secretion) shared between SCZ and CMD (8). These evidences provide the foundation for the genetic factors contribute to the comorbidity of SCZ and CMD.
Despite the fact that abundant genetic variants have been reported to be associated with the comorbidity of SCZ and CMD, understanding the functional consequences of genetic variation and identifying the pleiotropic genes and pathways are challenging in human genetics. First, the linkage disequilibrium (LD), a correlation structure exists across genetic variation of different loci (9). The top associated variant at a locus is often not the causal variant (10). Second, the complexity of gene regulatory. As genetic variants can affect phenotype through distal regulation of gene expression, the nearest gene to the genome-wide association studies (GWAS) top signal is often not the causal gene (10). Additionally, genetic variation affects gene expression in a tissue-specific manner (11). The complexity of LD and gene regulatory hinder the identification of pleiotropic genes and pathways for the comorbidity of SCZ and CMD.
In this study, we utilized different approaches and strategies to translate the genetic risk loci into potential candidate genes and pathways for SCZ and CMD, respectively, and then investigated the pleiotropic genes and pathways underlying the comorbidity between them ( Figure 1). Firstly, we used positional mapping to functionally annotate of traits-associated genetic variants from GWAS summary statistics of SCZ and CMD.
We then integrated the GWAS summary statistics of SCZ and CMD, and tissue-specific expression quantitative trait loci (eQTL) data to predicate the causal genes for SCZ and CMD. Finally, we performed gene-set enrichment analysis with GWAS summary statistics to identify the potential biological pathways for SCZ and CMD. This landscape of potential pleiotropic genes and biological pathways will help us to understand clearly the increased risk of CVD morbidity and mortality in SCZ.

GWAS Summary Datasets
The GWAS summary statistics analysis included in this study were obtained from the following publicly available databases: i. The GWAS data of SCZ was obtained from Psychiatric Genomics Consortium (PGC) (12), which systematically meta-analyzed of the genome-wide genotypes from 49 independent samples (46 of European and 3 of Asian ancestry, including 35,476 SCZ cases and 46,839 controls). Genotype data was processed by the PGC using unified quality control procedures followed by imputation of SNPs and insertion-deletions using the 1000 Genomes Project reference panel. Around 9.5 million variants after quality control were included in the dataset and used in this study. ii. The largest-scale GWAS meta-analysis summary data of BMI was performed by Genetic Investigation of ANthropometric Traits (GIANT) (13), which conducted with a total sample of 322,154 individuals of European descent. This GWAS examined the phenotype of BMI as determined from measured or self-reported weight and height, and identified 77 loci reaching genome-wide significance (P < 5 × 10 −8 ). iii. The summary data of genetic variants associated with CAD was performed by the Coronary ARtery DIsease Genome wide Replication and Meta-analysis plus The Coronary Artery Disease Genetics (CARDIoGRAMplusC4D) Consortium (14), which assembled 60,801 cases and 123,504 controls from 48 studies. The majority (77%) of the participants were of European ancestry; 13% and 6% were of South Asian and East Asian ancestry, respectively, with smaller samples of Hispanic and African Americans. The results of association analysis from an additive model and a recessive model were used in this study. iv. Genetic variations associated with T2D were obtained from DIAbetes Genetics Replication And Meta analysis (DIAGRAM) Consortium (15). This study is a meta-analysis from 32 GWAS, including 898,130 individuals (74,124 cases and 824,006 controls) of European ancestry. More than 200 loci reaching genome-wide significance (P < 5 × 10 −8 ) in the BMIunadjusted analysis and 152 loci BMI-adjusted analysis. The GWAS summary statistics with BMI adjustment was considered for our analyses. v. The GWAS summary statistics of dyslipidemia were accessed from the Global Lipids Genetics Consortium (GLGC) (16), which provides the meta-analysis results on four phenotypes: HDL, LDL, TC, and TG. These results are based on GWAS results from 46 cohorts comprising of more than 100,000 individuals of European ancestry (N HDL = 99,900, N LDL = 95,454, N TC = 100,184, and N TG = 96,598).
All GWAS summary statistics used in this study are based on the hg19 human assembly and rsIDs were mapped to dbSNP build 151 using MySQL local database if necessary. We excluded the genetic variants in extended major histocompatibility complex (MHC) region (chr6:25-35MB), due to the complexity of haplotype and LD structure. The major samples in the GWAS summary datasets came from populations of European ancestry. The GWAS summary datasets used in this study were downloaded from publicly available resources listed in Supplementary Table S1. More detailed information about sample recruitment and diagnosis, genotyping, quality control, and statistical analysis can be found in the original paper (12)(13)(14)(15)(16).  (20). Qi et al. meta-analysed cis-eQTL between brain and blood to identify putative functional genes for brain-related complex traits and diseases (21). Battle et al. sequenced RNA from whole blood of 922 genotyped the European ancestry individuals from the Depression Genes and Networks cohort for understanding the consequences of regulatory variation in the human genome (22). The GTEx V7 was established to characterize human transcriptomes and has created a reference resource of gene expression levels from nondiseased tissues, including genotype, gene expression, and histological data for 449 human donors across 44 tissues (23). More details about sample description, genotyping, expression quantification, and statistical analyses can be found in the corresponding original paper (17)(18)(19)(20)(21)(22)(23).
Considering that genetic variants may affect gene expression in a tissue-specific manner, we used brain and whole blood eQTL for SCZ; subcutaneous adipose, visceral omentum adipose and whole blood eQTL for BMI; left ventricle, atrial appendage, aorta artery, coronary artery, tibial artery, subcutaneous adipose, visceral omentum adipose, liver, and whole blood eQTL for FIGURE 1 | The overall analysis conducted in this study. First, we obtained the summary-level GWAS datasets of SCZ and CMD from public GWAS databases. Then different approaches including FUMA, Sherlock, SMR, UTMOST, FOCUS, and DEPICT were conducted to predicate the candidate genes for them and identified the pleiotropic genes shared between them. Finally, we performed the gene-set enrichment analysis with GWAS summary datasets by the software of GSA-SNP2 and MAGMA to explore the biological pathways shared between them.

Identifying Causal Genes Using Positional Mapping (FUMA)
Functional annotation of genetic variants from GWAS summary statistics was performed using FUMA (24), which incorporates 18 biological data repositories to process GWAS summary statistics. In particular, positional mapping in FUMA was performed by the ANNOVAR annotations of specifying the maximum distance between SNPs and genes, and using Combined Annotation Dependent Depletion (CADD) scores (25) to predict the functional consequences of SNPs on genes. The CADD scores predict how deleterious the effect of an SNP is likely to be for a protein structure or function, with higher scores referring to higher deleteriousness. In this method, we chose the default distance 10kb as the maximum distance, and performed SNPs filtering based on CADD score. The threshold for significance is CADD scores ≥ 12.37 with SNP P-value ≤ 5×10 −8 .

Integration of GWAS and eQTL Datasets (Sherlock)
Considering that genetic variants may affect the disease through regulation of gene expression, we applied the method named Sherlock (26) to integrate GWAS and eQTL data with the aim to identify causal genes for diseases. Its underlying principle is that any genetic variants perturbs expression levels of risk genes is also likely to influence the risk of disease. Sherlock uses a Bayesian model and the information of SNPs in GWAS and eQTL data to calculate the SNP-level Bayes factor for estimating the association of the SNPs with the expression of gene and the disease, respectively. For the SNPs overlap between eQTL for a gene and the significant SNPs loci associated with the disease, it is straightforward to combine the SNP-level Bayes factor to obtain the Bayes factor for the gene and yielding a single pergene score to test whether the expression change of this gene has any impact on the risk of disease. Statistical significance was determined using a Bonferroni corrected with P-value < 0.05/the total number of genes.

Integration of GWAS and eQTL Datasets (SMR)
Summary databased Mendelian randomization (SMR) (27) was used to predict causal genes by integrating the summary-level data from GWAS and data from eQTL studies. The principle of SMR analysis is to use a genetic variant as an instrumental variable to test for the causative effect of the gene expression (the exposure) on the phenotype of interest. The method including two tests: SMR test and heterogeneity in dependent instruments (HEIDI) test. SMR uses the simulation analysis to evaluate the effect of genetic variant on gene expression, genetic variant on phenotype, and gene expression on phenotype, respectively (SMR test). To test whether gene expression and phenotype are affected by the same causative variant, it uses multiple SNPs in a cis-eQTL region to distinguish pleiotropy from linkage (HEIDI test). The gene is considered to be plausible causal gene if pass the SMR (Bonferroni corrected P-value < 0.05) and HEIDI tests (P-value ≥ 0.05).

Predicting Causal Genes Using Tissue Expression Data (UTMOST)
In contrast to Sherlock and SMR, unified test for molecular signatures (UTMOST) (28) is a powerful approach to studying the genetic architecture of complex traits by using multi-task learning method to jointly impute gene expression in tissues. Briefly, the UTMOST framework includes three main steps. First, it trains a cross-tissue expression imputation model by using the genotype information and matched expression data. Next, it tests the association between the trait of interest and imputed gene expression in each tissue. Finally, a cross-tissue test is performed for each gene to summarize single-tissue association statistics into a powerful metric that quantifies the overall gene-trait association. Statistical significance was determined using a Bonferroni corrected with P-value < 0.05/ the total number of genes.

Predicting Causal Genes Using Expression Weights Data (FOCUS)
To identify potential causal genes involved in complex traits and diseases, we apply the approach of fine-mapping of causal gene sets (FOCUS) (29). This approach is a probabilistic framework that models correlation among transcriptome-wide association study signals to assign a probability for every gene in the risk region to explain the effect of SNPs on a trait. By integrating the GWAS summary data, expression prediction weights (as estimated from eQTL reference panels), and LD among all SNPs in the risk region, it identifies causal gene to be included in a 90%-credible set and give a posterior probability (PIP) to for estimating the causality in relevant tissue types. In this work, we used the recommend eQTL reference panel weight database, which combines GTEx weights with the Metabolic Syndrome in Men study (adipose, n = 563) (30,31), the Netherlands Twins Registry (NTR; blood, n = 1,247) (32), the Young Finns Study (YFS; blood, n = 1,264) (33,34), and the CommonMind Consortium (dorsolateral prefrontal cortex, n = 452) (35) weights into a single usable database for FOCUS. The setting of significance threshold is the genes in a 90%-credible set with PIP ≥ 0.5.

Identifying Causal Genes Using Gene Prioritization Analysis (DEPICT)
We used Data-driven Expression Prioritized Integration for Complex Traits (DEPICT) (36) to prioritize genes at associated loci based on predicted gene functions. Briefly, DEPICT prioritizes genes based on the assumption that truly associated genes should share functional annotations. By using coregulation data from 77,840 microarrays and publicly available datasets, DEPICT accurately predicts gene function and generated 14,461 "reconstituted" gene sets. Integrating these precomputed gene functions and the GWAS summary data, DEPICT prioritizes genes that share predicted functions with genes from the other associated loci more often than expected by chance. As it has been widely used in previous studies for gene prioritization of SCZ (37,38) and CMD (39)(40)(41), we included their results into our study. We only use this method to prioritizes genes for TC. We chose P-value < 1.0 × 10 −5 as the GWAS significance threshold, which was recommended by the developers of the DEPICT software. The Benjamini-Hochberg procedure with a threshold of a false discovery rate (FDR) < 0.05 was regarded with statistical significance.

Pathway Enrichment Analyses
KEGG pathway enrichment analyses were carried out using clusterProfiler package (42) as implemented in R. The significance P-values of the KEGG pathways were corrected by the Benjamini-Hochberg procedure with FDR < 0.05.

Gene-Set Enrichment Analysis
In order to explore the biological pathways shared between SCZ and CMD, enriched pathways were identified for each trait using GSA-SNP2 software (43), which only requires the P-values of the SNPs in GWAS data and retains SNPs with 20 kb upstream or downstream of a gene. In this study, we used gene set databases of canonical pathways, KEGG, BioCarta and Reactome, which were downloaded from the Molecular Signatures Database (MSigDB) in GSEA (http://software.broadinstitute.org/gsea/ msigdb/index.jsp). The Benjamini-Hochberg method is used for the multiple testing correction. The minimum P-values of the pathway was chosen, and the P-values of pathways with FDR < 0.05 were regarded as statistical significance.
To validate the significant finding, the common pathways identified by GSA-SNP2 were investigated by MAGMA (44) with the same parameter settings (retaining SNPs with 20 kb upstream or downstream of a gene). Unlike GSA-SNP2, we used a nominal P-value threshold of P < 0.05.

RESULTS
To reveal the potential candidate gene for SCZ and CMD, we utilized six different approaches including FUMA, Sherlock, SMR, UTMOST, FOCUS, and DEPICT. To estimate the LD structure, we used the reference data from the European population of 1000 Genomes Project phase 3 (45). All analyses were carried out using the default parameters recommended by the developers if not mentioned in the methods section. A gene may represent a potential candidate gene if it is predicted by two or more than two approaches.

Candidate Genes Identified for SCZ
Functional annotation of the GWAS summary statistics of SCZ was performed using positional mapping in FUMA, and 110 causal genes associated with SCZ was identified (Supplementary Table S2). Through integrating the genetic variations associated with SCZ and tissue-specific eQTL data from brain and whole blood, Sherlock, SMR, UTMOST, and FOCUS identified 198, 50, 236, and 109 causal genes associated with SCZ, respectively (Supplementary Table S2). In addition, causal genes prioritized by DEPICT were obtained from Pers et al. (37) and Li et al. (38), which predicted 106 causal genes for SCZ (Supplementary Table S2). In total, we identified 553 causal genes whose expression level change may contribute to SCZ risk. KEGG pathway enrichment analysis showed that these causal genes were enriched in dopaminergic synapse (corrected P = 3.3×10 −2 ) and adrenergic signaling in cardiomyocytes (corrected P = 4.3×10 −2 ) (Supplementary Table S10).
Through integrating these causal genes predicted by the six different approaches, we identified 150 potential candidate genes associated with SCZ (Supplementary Table S2). Among the 150 genes, only ABCB9 was predicted by all approaches, which has been reported to be correlated with the risk of SCZ (46). There are seven genes (ARL6IP4, C2orf47, GATAD2A, GNL3, NT5C2, PCCB, and SNX19) predicted by five approaches, two genes (C2orf47 and PCCB) of which have not been reported in the literature yet.  Table S3). KEGG pathway enrichment analysis showed that these causal genes were also enriched in dopaminergic synapse (corrected P = 6.8×10 −4 ) and adrenergic signaling in cardiomyocytes (corrected P = 2.5×10 −4 ) (Supplementary Table S10), which were identified to be associated with SCZ.

Candidate Genes Identified for BMI
Among the 654 causal genes, most of them are only predicted by DEPICT, and 79 potential candidate genes were identified (Supplementary Table S3). Ten genes (ZNF668, NEGR1, KAT8, SH2B1, BCKDK, POC5, MAP2K5, C18orf8, NPC1, and C1QTNF4) were at least predicted by four different approaches, thus represent the most promising candidate genes for BMI. The ten promising candidate genes include six genes (NEGR1, KAT8, POC5, MAP2K5, C18orf8, and C1QTNF4) which were previously reported as causal genes in the original study (13), two genes (SH2B1 and NPC1) have been reported to be associated with BMI by other studies (47,48), and two genes (ZNF668 and BCKDK) were novel.

Candidate Genes Identified for T2D
Using different approaches (including FUMA, Sherlock, SMR, UTMOST, and FOCUS) to prioritize the causal genes for T2D, we identified 190, 188, 63, 327, and 178 causal genes associated with T2D, respectively (Supplementary Table S5). Causal genes predicated by DEPICT for T2D obtained from Scott et al. (40), which predicted 29 causal genes for T2D. Through integrating the results from these approaches, 171 potential candidate genes were identified to be associated with T2D (Supplementary Table  S5). Among the 171 potential candidate genes, there are eight genes (ABCB9, PABPC4, ITFG3, ANK1, CALR, CEP68, GCDH, and ZZEF1) predicted by five approaches, five genes (PABPC4, ITFG3, CEP68, GCDH, and ZZEF1) of which have not been reported in the literature yet.  Tables S6-9). Among these potential candidate genes, there are six genes (CETP, APOB, TMEM258, FADS2, FADS1, and PVRL2) associated with all phenotypes of dyslipidemia. Similar to our results, all of the six genes were previously reported to be associated with at least one phenotype of dyslipidemia. Overall, 245 potential candidate genes were identified to be associated with dyslipidemia.

Candidate Genes Identified for Dyslipidemia
Similar to our results, genetic variants in NT5C2 were previously reported to be associated with the comorbidity of SCZ and BMI (6), and genetic variants in or around MPHOSPH9 were reported to be increased risk of T2D in SCZ (49). The remaining genes are considered as novel genes associated with the comorbidity of SCZ and CMD.
Pleiotropic Genes Identified for the Comorbidity of SCZ and CMD Through integrating the potential candidate genes identified for SCZ and CMD, we identified 21 potential pleiotropic genes FIGURE 2 | This chord (50) diagram depicts the potential pleiotropic genes shared between SCZ and CMD. Connections indicate that the pleiotropic genes for SCZ and specific phenotype of CMD. Red indicates the gene is associated with SCZ and multiple phenotypes of CMD. BMI, body mass index; CAD, coronary artery diseases; T2D, type 2 diabetes; HDL, high-density lipoproteins; TC, total cholesterol; TG, triglycerides.

Common Pathway Identified for the Comorbidity of SCZ and CMD
We conducted KEGG pathway enrichment analysis for the causal genes of SCZ and CMD, however, there is no significant pathway enrichment shared between SCZ and CMD. To further explore whether there are some pathogenic pathways shared between SCZ and CMD, we performed gene set enrichment analysis using GSA-SNP2 with GWAS summary datasets. The results show that there are nine pathways shared between SCZ and CMD ( Table 2, Supplementary Table S11). Among these significant pathways, there are two pathways including CXCR4   pathway and growth hormone pathway shared between SCZ and three phenotypes of CMD.
To further validate the significant finding, the common pathways identified by GSA-SNP2 were investigated by MAGMA. However, only one pathway (MAPK-TRK pathway) shared between SCZ and BMI, and two pathways including growth hormone signaling and regulation of insulin secretion signaling shared between SCZ and T2D were confirmed by MAGMA ( Table 2, Supplementary Table S12). Among the three pathways, regulation of insulin secretion signaling was previously reported to be associated with the comorbidity of SCZ and CMD (8). The other pathways can be considered as novel.

DISCUSSION
As the key risk factor for CVD, CMD was getting more prevalent among patients with SCZ which is chiefly responsible for increasing risk of CVD morbidity and mortality in SCZ. Although unhealthy lifestyle factors and the side effects of antipsychotic medications have been primarily attributed to the high prevalence of CMD in SCZ, shared genetics between SCZ and CMD might also be of importance. Previous studies have shown that both SCZ and CMD are high heritable and polygenic (51,52). Recent studies have identified numbers of risk loci that are associated with the comorbidity of SCZ and CMD (6). These evidences provide the foundation for the genetic factors contribute to the comorbidity of SCZ and CMD. In this study, by utilizing several well-characterized methods of integrating the GWAS summary statistics of SCZ and CMD, and tissue-specific eQTL data and gene set database to translate the genetic risk loci into potential causal genes and pathways for them, we systematically predicted the candidate genes and biological pathways for SCZ and CMD. Through integrating the results from different approaches, we first revealed 21 potential pleiotropic genes and three biological pathways that are likely to be shared between SCZ and CMD.
Among the 21 potential pleiotropic genes, there are five genes associated with the comorbidity of SCZ and multiple phenotypes of CMD. NT5C2 encodes a cytosolic purine 5′-nucleotidase (cytosolic 5′-nucleotidase II) involved in cellular purine metabolism (53), which is associated with SCZ, BMI, and CAD. Previous studies have shown that loss of function of NT5C2 gene reduced body weight gain, improved glucose tolerance, reduced plasma insulin and triglyceride in high-fat diet mice (54). However, a recent study showed that knockdown of neuronal CG32549 in D. melanogaster, which is similar to NT5C2 protein in human, is associated with impaired motility behavior (55). These results strongly suggest that NT5C2 may play a key role in SCZ and CMD. C12orf65 is associated with SCZ, HDL, and T2D. The C12orf65 gene encodes a mitochondrial matrix protein participating in the process of mitochondrial translation (56). Although multiple phenotypes have been shown to be associated with mutations in C12orf65 gene, including early-onset optic atrophy, encephalomyopathy, peripheral neuropathy, intellectual disability, and spastic paraparesis (56)(57)(58), the function of C12orf65 gene remains largely unknown. SETD8 (SET8/Pr-SET7/KMT5A) is the causal gene for SCZ, HDL, and T2D. As a member of methyltransferase family specifically targeting Histone H4 Lys20 for methylation, SETD8 plays an important role in cellular senescence, proliferation and apoptosis (59)(60)(61). Consistent with our results, multiple experiments indicated that changed expression level of SETD8 may affect insulinoma cell proliferation (62), hyperglycemic memory (63), and lipid metabolism (64). One study has shown that reducing the expression levels of SETD8 may contribute to altered hippocampal cellular composition, impaired neurodevelopment, and subsequent neurocognitive impairment (65), which are associated with the phenotype of SCZ. GATAD2A is a subunit of the nucleosome remodeling and histone deacetylase (NuRD) complex, which is generally a s s o c i a t e d w i t h e m b r y o n i c d e v e l o p m e n t , c e l l u l a r differentiation, and the repression of transcription (66). Although our results showed that GATAD2A is associated with SCZ, T2D, TC, and TG, the pathogenesis is still unclear. Recent studies showed that NuRD complex is involved in neuronal development (67), cardiac and skeletal muscle structural and metabolic (68). These results indicated that GATAD2A may contribute to the comorbidity of SCZ and CMD. TM6SF2 is associated with SCZ, TC, and TG. The TM6SF2 gene encodes a multi-pass membrane protein localized in the endoplasmic reticulum and the ER-Golgi intermediate compartment (69). Several studies in vivo and in vitro have proved that TM6SF2 is closely related to abnormal metabolism of blood lipids, especially plasma TC and TG (70,71). However, the detailed mechanisms contributing to SCZ are still poorly understood.
Beyond these genes associated with the comorbidity of SCZ and multiple phenotypes of CMD, there are 16 potential pleiotropic genes associated with SCZ and one phenotype of CMD. In detail, VRK2, INO80E, YPEL3, and MAPK3 are the common candidate genes for SCZ and BMI. FES and FURIN are the candidate genes for both SCZ and CAD. ARL6IP4, OGFOD2, PITPNM2, CDK2AP1, ABCB9, MPHOSPH9, SREBF1, and TOM1L2 are associated with the comorbidity of SCZ and T2D. SLC39A8 and AMBRA1 are associated with the comorbidity of SCZ and HDL. Except that two genes (FURIN and SREBF1) have strongly suggested association with the phenotype of SCZ and CMD, the role of other genes in the pathogenesis of both SCZ and corresponding phenotype of CMD remains unknown. FURIN encodes a protein of the proprotein convertases family, which processes proproteins through limited proteolysis and convert them into bioactive proteins and peptides (72). Accumulating evidence suggests that FURIN plays a critical role in atherosclerosis through regulation of lipid metabolism and vascular inflammation (73). Recent studies also showed that overexpression of FURIN in monocyte/macrophage cell promoted migration, increased proliferation, and reduced apoptosis (74), which might contribute to atherogenesis. Intriguingly, studies of the function of FURIN in brain have shown that knockdown of FURIN decrease head size, and inhibit human neural progenitor cells migrate (35). Overexpression of FURIN enhances long-term potentiation and spatial learning and memory performance (75). Further study is needed to state the role of FURIN in SCZ and CAD. SREBF1 is a transcription factor participates in lipogenesis (76), insulin resistance (77), and inflammatory response (78), which may contribute to the development of T2D. Intriguingly, a recent study showed that SREBF1 is associated with multiple subphenotypes of SCZ, such as hyperlocomotor activity in dark, depression-like and aggressive behaviors, and social deficits (79). These evidences suggest that SREBF1 is likely associated with the comorbidity of SCZ and CMD. Although most of these genes have been confirmed to be related to the occurrence and development of SCZ and CMD, there is no experiment to confirm whether they are related to the comorbidity of SCZ and CMD.
To identify the potential biological pathways for SCZ and CMD, we conducted KEGG pathway enrichment analysis for all significant causal genes of SCZ and CMD. The results show that dopaminergic synapse and adrenergic signaling in cardiomyocytes are likely shared between SCZ and BMI. However, when we performed KEGG pathway enrichment analysis for the potential candidate genes of SCZ and CMD, there is no significant pathway enrichment. This may be caused by the function of some candidate genes are still unclear. To further explore the potential biological pathways shared between SCZ and CMD, we performed gene set enrichment using GSA-SNP2 and MAGMA. Our results showed that MAPK-TRK pathway shared between SCZ and BMI, growth hormone signaling and regulation of insulin secretion signaling shared between SCZ and T2D. The MAPK-TRK pathway is mainly regulated by neurotrophic factor ligands (e.g. brain-derived neurotropic factor, nerve growth factor) binding to tropomyosin-related kinase (Trk) receptor, which was associated with neuronal survival and morphogenesis, hippocampal long-term potentiation, and synaptic plasticity (80,81). Loss of Trk signaling also has been linked with food intake regulation and body weight (82,83). These evidences strongly suggest that the MAPK-TRK pathway may be related to the comorbidity of SCZ and BMI. The insulin and growth hormone signaling are closely related to the occurrence and development of T2D, and the biological effects of insulin and growth hormone are involved in lipid metabolism, carbohydrate metabolism, and glucose metabolism (84)(85)(86), which are potential therapeutic effectiveness for T2D. Intriguingly, a recent clinical research showed that insulin and growth hormone signaling were associated with the development of SCZ (87). Further research is worthwhile to explore the insulin and growth hormone signaling in the comorbidity of SCZ and CMD.
There were some limitations of the current study. First, the major samples in the GWAS summary datasets came from populations of European ancestry, and it is worthwhile to validate in other ethnic groups. Second, to generate highly credible candidate genes for the comorbidity of SCZ and CMD, the causal gene for a disease was chosen if it is predicated by two or more than two approaches. Although these genes are promising candidate genes for SCZ and CMD, genes supported by individual prediction approach may also have a role in disease. Third, the eQTL datasets used in this study mostly came from normal human tissues, which may miss the candidate gene for diseases. Lastly, though this study identified potential candidate genes shared between SCZ and CMD, further biological experiments are needed to demonstrate the role of these genes in the comorbidity of SCZ and CMD.
In summary, we first characterized the landscape of potential pleiotropic genes and biological pathways that are likely to be shared between SCZ and CMD. Through integrating the GWAS summary statistics, tissue-specific eQTL data and gene set database, we identified some potential candidate genes and biological pathways for SCZ and CMD (including BMI, CAD, T2D, HDL, LDL, TC, and TG), respectively. In total, we revealed 21 potential pleiotropic genes and three biological pathways shared between SCZ and CMD, which will enable us to better understand the etiology for the comorbidity of SCZ and CVD.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. The links to download GWAS datasets can be found in Supplementary  Table S1. Other data including eQTL datasets and the code of software for data analysis could be obtained from the resource described in Materials and Methods.

AUTHOR CONTRIBUTIONS
HL contributed to data analysis and wrote the manuscript. YS and XZ were responsible for technical support and revised the manuscript. SL and DH contributed to data and method preparation. LX and YC contributed to plot pictures and tables. LH and DW were responsible for the study design and supervised the whole study. All authors read and approved the final manuscript.