Transcriptome association studies of neuropsychiatric traits in African Americans implicate PRMT7 in schizophrenia

In the past 15 years, genome-wide association studies (GWAS) have provided novel insight into the genetic architecture of various complex traits; however, this insight has been primarily focused on populations of European descent. This emphasis on European populations has led to individuals of recent African descent being grossly underrepresented in the study of genetics. With African Americans making up less than 2% of participants in neuropsychiatric GWAS, this discrepancy is magnified in diseases such as schizophrenia and bipolar disorder. In this study, we performed GWAS and the gene-based association method PrediXcan for schizophrenia (n = 2,256) and bipolar disorder (n = 1,019) in African American cohorts. In our PrediXcan analyses, we identified PRMT7 (P = 5.5 × 10−6, local false sign rate = 0.12) as significantly associated with schizophrenia following an adaptive shrinkage multiple testing adjustment. This association with schizophrenia was confirmed in the much larger, predominantly European, Psychiatric Genomics Consortium. In addition to the PRMT7 association with schizophrenia, we identified rs10168049 (P = 1.0 × 10−6) as a potential candidate locus for bipolar disorder with highly divergent allele frequencies across populations, highlighting the need for diversity in genetic studies.


Individuals of recent African ancestry have been grossly underrepresented in genomic studies. African American participants make up about 2.0% of all GWAS subjects (Sirugo et al., 2019). Specifically, individuals of African ancestry make up only 1.2% of all neuropsychiatric GWAS participants (… Gregor, 2018). With the advent of polygenic risk scores, accuracy in disease prediction is critical to the development of precision medicine (Khera et al., 2018); however, the lack of representative diversity in genomics research has limited the accuracy of genetic risk prediction across diverse populations. Despite similar incidences of schizophrenia across European and African ancestry populations (De Candia et al., 2013; Whiteford et al., 2013), Africans have been predicted to have significantly less disease risk than their European counterparts using current GWAS summary statistics (Martin et al., 2017). Inaccuracy in predicting disease risk across populations can lead to further disparities in the health and treatment of underrepresented populations. To prevent misclassification of genetic risk, further work on the genetics underlying complex traits in African Americans is needed (Manrai et al., 2016). To address this discrepancy in genetic risk prediction, we performed a series of genetic association tests for schizophrenia and bipolar disorder in two cohorts of African American individuals (Manolio et al., 2007; Suarez et al., 2006; Smith et al., 2009).

Schizophrenia and bipolar disorder are two heritable neuropsychiatric disorders whose genetic components have been attributed to the cumulative effect of thousands of loci across the genome (Ripke et al., 2014; Zhiqiang et al., 2017; Ikeda et al., 2017b). Past work shows that the genetic architectures of these two disorders significantly overlap (Bhalala et al., 2018; Allardyce et al., 2018; Consortium, 2009; Stahl et al., 2019).
Up to this point, the largest GWAS of schizophrenia and bipolar disorder comprise hundreds … Manual of Mental Disorders) as described previously (Smith et al., 2009 … (Suarez et al., 2006; Manolio et al., 2007); throughout the rest of the paper, we refer to the combined cohort as GAIN.

In each cohort, we removed SNPs with genotyping call rates below 99% and those that deviated significantly from Hardy-Weinberg equilibrium (P < 1 × 10−6). We then removed individuals with excess heterozygosity: individuals whose heterozygosity was more than three standard deviations from the mean were removed from the study. We used EIGENSOFT smartpca (Patterson et al., 2006) to generate the first ten principal components, which were used to confirm self-identified ancestry (Figs. S1 and S2 … MAFs identical to those imputed from CAAPA at our chosen imputation r² and MAF thresholds (Fig. S3) (Mathias et al., 2016).
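The three quality-control filters above (call rate, Hardy-Weinberg equilibrium, heterozygosity) can be sketched in NumPy. This is a minimal illustration, not the pipeline used in this study: it assumes genotypes are coded as 0/1/2 alternate-allele counts with `np.nan` for missing calls, uses a 1-df chi-square approximation for the Hardy-Weinberg test (tools such as PLINK typically use an exact test), and treats the three-standard-deviation heterozygosity rule as two-sided. The function names `hwe_chisq_p` and `genotype_qc` are hypothetical.

```python
import math
import numpy as np


def hwe_chisq_p(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Chi-square (1 df) approximation to the Hardy-Weinberg equilibrium test."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 1.0
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference-allele frequency
    q = 1.0 - p
    expected = np.array([n * p * p, 2 * n * p * q, n * q * q])
    if np.any(expected == 0):
        return 1.0  # monomorphic SNP: nothing to test
    observed = np.array([n_hom_ref, n_het, n_hom_alt])
    chi2 = float(np.sum((observed - expected) ** 2 / expected))
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def genotype_qc(geno, min_call_rate=0.99, hwe_p_min=1e-6, max_het_sd=3.0):
    """geno: individuals x SNPs array of 0/1/2 with np.nan for missing calls.

    Returns boolean masks (snps_kept, individuals_kept).
    """
    called = ~np.isnan(geno)
    # 1. Drop SNPs with a call rate below 99%.
    snps_kept = called.mean(axis=0) >= min_call_rate
    # 2. Drop SNPs deviating from Hardy-Weinberg equilibrium (P < 1e-6).
    for j in np.flatnonzero(snps_kept):
        g = geno[called[:, j], j]
        counts = [int(np.sum(g == k)) for k in (0, 1, 2)]
        if hwe_chisq_p(*counts) < hwe_p_min:
            snps_kept[j] = False
    # 3. Drop individuals whose heterozygosity rate on the retained SNPs
    #    lies more than three standard deviations from the cohort mean.
    sub = geno[:, snps_kept]
    het_rate = np.sum(sub == 1, axis=1) / np.sum(~np.isnan(sub), axis=1)
    sd = het_rate.std()
    if sd == 0:
        return snps_kept, np.ones(geno.shape[0], dtype=bool)
    z = (het_rate - het_rate.mean()) / sd
    return snps_kept, np.abs(z) <= max_het_sd


# Synthetic demo: 200 individuals x 50 SNPs simulated under HWE (alt freq 0.3).
rng = np.random.default_rng(0)
geno = rng.binomial(2, 0.3, size=(200, 50)).astype(float)
geno[:10, 0] = np.nan  # SNP 0: 95% call rate
geno[:, 1] = 1.0       # SNP 1: everyone heterozygous, violates HWE
geno[199, :] = 1.0     # individual 199: heterozygous at every SNP
snps_kept, individuals_kept = genotype_qc(geno)
```

The filters are applied in the order described in the text, so the heterozygosity rates are computed only on SNPs that already passed the call-rate and Hardy-Weinberg filters.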

Using PLINK, we performed a logistic regression of the phenotype using the first ten genotypic principal components as covariates to account for population structure. We used a significance threshold of P < 5 × 10−8 to identify significantly associated SNPs. Plots were generated from PLINK results using the …

We performed the gene-based association test PrediXcan on both phenotypes in this study, schizophrenia and bipolar disorder. PrediXcan predicts an individual's genetically regulated gene expression levels using tissue-specific prediction models trained on reference transcriptome data (Gamazon et al., 2015). …

… approach implemented in the R package ashr (Stephens, 2017). Using this package, we calculated the local false sign rate (lfsr) for each test; the lfsr is similar to traditional false discovery rate approaches, but takes into account both the effect size and the standard error of each gene-tissue pair (n = 248,605).

In addition, this empirical Bayes approach assumes that the distribution of true effects is unimodal with its mode at zero. We set our significance threshold for gene-tissue pairs at lfsr < 0.2.
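To make the lfsr concrete, the sketch below computes it under a deliberately simplified prior: a single normal distribution centred at zero, whose variance τ² is fitted by maximum marginal likelihood over a grid. ashr instead fits a flexible mixture of unimodal components, so this only illustrates the idea and will not reproduce the values reported here; the function names `fit_prior_variance` and `lfsr_normal_prior` are hypothetical. For each test, the lfsr is the smaller of the posterior probabilities that the true effect is ≥ 0 or ≤ 0.

```python
import math
import numpy as np


def fit_prior_variance(betahat, se, grid=np.linspace(0.0, 1.0, 1001)):
    """Pick tau^2 maximizing the marginal likelihood betahat_i ~ N(0, tau^2 + se_i^2)."""
    best_tau2, best_ll = 0.0, -np.inf
    for tau2 in grid:
        v = tau2 + se ** 2
        ll = -0.5 * np.sum(np.log(2 * np.pi * v) + betahat ** 2 / v)
        if ll > best_ll:
            best_tau2, best_ll = float(tau2), ll
    return best_tau2


def lfsr_normal_prior(betahat, se, tau2):
    """Local false sign rate under the simplified prior beta ~ N(0, tau2)."""
    betahat = np.asarray(betahat, dtype=float)
    se = np.asarray(se, dtype=float)
    if tau2 == 0:
        # Posterior is a point mass at zero: the sign is never determined.
        return np.ones_like(betahat)
    shrink = tau2 / (tau2 + se ** 2)   # posterior mean = shrink * betahat
    post_mean = shrink * betahat
    post_sd = np.sqrt(shrink) * se     # posterior var = tau2*se^2/(tau2+se^2)
    # min(P(beta >= 0), P(beta <= 0)) = Phi(-|mean|/sd) = erfc(|mean|/(sd*sqrt(2)))/2
    ratio = np.abs(post_mean) / post_sd
    return np.array([0.5 * math.erfc(r / math.sqrt(2.0)) for r in ratio])


# Two tests with the same z-score (4.0) but different standard errors.
lfsr_demo = lfsr_normal_prior([0.4, 1.0], [0.1, 0.25], tau2=0.04)

# Fitting tau^2 on simulated effects drawn from N(0, 0.04) with se = 0.1.
rng = np.random.default_rng(1)
true_beta = rng.normal(0.0, 0.2, size=5000)
se_sim = np.full(5000, 0.1)
betahat_sim = true_beta + rng.normal(0.0, se_sim)
tau2_hat = fit_prior_variance(betahat_sim, se_sim)
```

In the demo, both tests have identical p-values, yet the one with the larger standard error is shrunk more toward zero and receives the larger lfsr.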

Due to the dearth of African American neuropsychiatric cohorts, replication could not be completed … (Table S1). We found no significant gene-tissue associations using the MESA or DLPFC models. While PRMT7 in atrial appendage had the lowest lfsr across all models, RP11-646C24.5 had a lower p-value (Fig. 1) but a high lfsr in both pancreas (lfsr = 0.860) and sigmoid colon (lfsr = 0.851).

Notably, the standard error in both of these tissues was more than twice that of PRMT7. Unlike more traditional false discovery rate approaches such as Benjamini-Hochberg, the empirical Bayesian framework uses both the effect size and its standard error to calculate the lfsr, so the gene with the lowest p-value may not be the gene with the lowest lfsr (Stephens, 2017). We found no significant associations with the cross-tissue MultiXcan model.
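Under a simplifying assumption of a single normal prior, β ~ N(0, τ²), rather than ashr's mixture of unimodal components, the role of the standard error s can be written out explicitly. The posterior for the true effect β given an estimate β̂ is normal, and its lfsr depends on both the z-score β̂/s and on s itself:

```latex
\begin{aligned}
\hat{\beta} \mid \beta &\sim \mathcal{N}(\beta,\, s^2), \qquad \beta \sim \mathcal{N}(0,\, \tau^2) \\
\beta \mid \hat{\beta} &\sim \mathcal{N}\!\left( \frac{\tau^2}{\tau^2 + s^2}\,\hat{\beta},\; \frac{\tau^2 s^2}{\tau^2 + s^2} \right) \\
\mathrm{lfsr} &= \min\{\Pr(\beta \ge 0 \mid \hat{\beta}),\, \Pr(\beta \le 0 \mid \hat{\beta})\}
  = \Phi\!\left( -\frac{|\hat{\beta}|}{s} \cdot \frac{\tau}{\sqrt{\tau^2 + s^2}} \right)
\end{aligned}
```

The second factor inside Φ shrinks toward zero as s grows, so a test with a smaller p-value (larger |β̂|/s) can still have a larger lfsr when its standard error is large relative to τ. This is consistent with RP11-646C24.5 having lower p-values than PRMT7 but standard errors more than twice as large.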

To better understand the genetic mechanisms governing bipolar disorder in African Americans, we performed PrediXcan in a cohort of 1,019 individuals (671 controls and 348 cases).

As in our gene-based association study of schizophrenia, we performed our tests across the same 55 gene expression prediction models in our bipolar disorder study.
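Each prediction model is applied in the same way: an individual's predicted (genetically regulated) expression for a gene in a tissue is a weighted sum of that individual's SNP dosages, using the model's SNP weights, and the predicted values are then tested for association with case status. A minimal sketch, assuming dosages are already aligned to each model's effect allele and that model SNPs absent from the genotype data are simply skipped; the function name `predict_expression`, the weight dictionary format, and the SNP labels and weights are hypothetical placeholders.

```python
import numpy as np


def predict_expression(dosages, snp_ids, weights):
    """Weighted sum of dosages for one gene-tissue model.

    dosages: individuals x SNPs array of effect-allele dosages (0-2)
    snp_ids: column labels for `dosages`
    weights: {snp_id: weight} for the model (hypothetical format)
    Returns (predicted expression per individual, number of model SNPs used).
    """
    column = {snp: i for i, snp in enumerate(snp_ids)}
    shared = [snp for snp in weights if snp in column]
    if not shared:
        return np.zeros(dosages.shape[0]), 0
    idx = [column[snp] for snp in shared]
    w = np.array([weights[snp] for snp in shared])
    return dosages[:, idx] @ w, len(shared)


# Toy example: three individuals, three genotyped SNPs, and a model that also
# includes one SNP ("snpZ") missing from the genotype data.
dosages = np.array([[0.0, 1.0, 2.0],
                    [1.0, 1.0, 0.0],
                    [2.0, 0.0, 1.0]])
pred, n_used = predict_expression(dosages, ["snpA", "snpB", "snpC"],
                                  {"snpA": 0.5, "snpC": -0.2, "snpZ": 1.0})
# pred approx [-0.4, 0.5, 0.8], n_used == 2
```

Reporting how many model SNPs were actually used is a useful check, since a model whose SNPs are mostly missing in a cohort will predict expression poorly.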

In the GAIN cohort of 1,019 African American individuals, no genes were significantly associated with bipolar disorder. Increased predicted expression of GREM2 in testis was the most associated gene-tissue pair (P = 2.20 × 10−5) with bipolar disorder (Fig. 4). KCNMB3 had the lowest lfsr, at 0.919. We also found no significant associations with the cross-tissue MultiXcan model.

Figure 1. PrediXcan association results for schizophrenia in GAIN African Americans. Each point on the Manhattan (A) and quantile-quantile (B) plots represents one gene-tissue test for association with schizophrenia using GTEx version 7 gene expression prediction models. PRMT7 expression in the atrial appendage of the heart is labeled in both plots since it had the lowest lfsr (local false sign rate) of all tissues (lfsr = 0.119). The two points with lower p-values than PRMT7 are predicted RP11-646C24.5 expression in pancreas and sigmoid colon, but the lfsr was greater than 0.8 for each of these associations. Unlike more traditional false discovery rate approaches such as Benjamini-Hochberg, the gene with the lowest p-value may not be the gene with the lowest lfsr, especially if the standard error of the effect size estimate is high (Stephens, 2017).

Schizophrenia SNP-level Association Test

We performed a GWAS across more than 12 million SNPs following imputation to help elucidate the role specific SNPs play in the genetics of schizophrenia in African Americans. We used the first ten principal components as covariates for our logistic regression in order to adjust for population stratification in the cohort. In our SNP-level GWAS, we found no significantly associated SNPs; however, one of the most associated SNPs, rs8063446 (P = 2.66 × 10−6), is located at the PRMT7 locus (Fig. 5). While not genome-wide significant, the most associated SNP in our study was rs112845369 (P = 1.094 × 10−6) on chromosome 15.
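The SNP-level test described above, a logistic regression of case status on each SNP's dosage with ten principal components as covariates, can be sketched as a Newton-Raphson fit followed by a Wald test on the SNP coefficient. This is a minimal illustration on simulated data, not PLINK's implementation; the function name `logistic_wald`, the simulated effect sizes, and the random stand-ins for the principal components are all hypothetical.

```python
import math
import numpy as np


def logistic_wald(y, x, covariates, n_iter=25, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x + covariates @ g by Newton-Raphson.

    Returns (b1, standard error of b1, two-sided Wald p-value).
    """
    X = np.column_stack([np.ones(len(y)), x, covariates])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))     # fitted case probabilities
        grad = X.T @ (y - mu)                       # score vector
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Covariance of the estimates: inverse information at the final fit.
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    cov = np.linalg.inv(X.T @ (X * (mu * (1.0 - mu))[:, None]))
    b1, se1 = beta[1], math.sqrt(cov[1, 1])
    # Two-sided p-value for z = b1 / se1: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(b1 / se1) / math.sqrt(2.0))
    return b1, se1, p


# Simulated cohort: one causal SNP plus ten covariates standing in for PCs.
rng = np.random.default_rng(2)
n = 5000
pcs = rng.normal(size=(n, 10))
dosage = rng.binomial(2, 0.3, size=n).astype(float)
logit = -0.5 + 0.4 * dosage + 0.3 * pcs[:, 0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
b1, se1, p = logistic_wald(y, dosage, pcs)
```

In a real GWAS this fit is repeated once per SNP with the same covariates, and each p-value is compared against the 5 × 10−8 genome-wide threshold.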

We also performed a logistic GWAS of more than 12 million SNPs to understand the role specific SNPs play in the genetics of bipolar disorder. We similarly used the first ten principal components to adjust for population stratification in the bipolar disorder cohort. Similar to our findings in our … with increased expression of PRMT7 in 32 of 33 tissues in which it was predicted in the GAIN cohort.
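Replication in the PGC relied on S-PrediXcan, which estimates a gene-level association from GWAS summary statistics instead of individual genotypes. Its z-score for gene g combines the model weights w_l, the SNP z-scores β̂_l/se(β̂_l), and SNP dosage variances and covariances taken from a reference panel: Z_g ≈ Σ_l w_l (σ̂_l / σ̂_g) (β̂_l / se(β̂_l)), where σ̂_l² is the dosage variance of SNP l and σ̂_g² = wᵀΓw is the variance of predicted expression given the SNP covariance matrix Γ. The sketch below implements this approximation; it assumes the three inputs are already restricted to the same SNPs and aligned to the same effect allele, and the function name `spredixcan_z` is hypothetical.

```python
import numpy as np


def spredixcan_z(weights, snp_z, snp_cov):
    """Approximate gene-level z-score from GWAS summary statistics.

    weights: model weights w_l for one gene-tissue pair (length m)
    snp_z:   GWAS z-scores beta_l / se_l for the same m SNPs
    snp_cov: m x m covariance of SNP dosages from a reference panel
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(snp_z, dtype=float)
    cov = np.atleast_2d(np.asarray(snp_cov, dtype=float))
    sigma_snp = np.sqrt(np.diag(cov))   # sd of each SNP's dosage
    sigma_gene = np.sqrt(w @ cov @ w)   # sd of predicted expression
    return float(np.sum(w * sigma_snp / sigma_gene * z))


# A one-SNP model reproduces the SNP's own z-score (sign set by the weight).
z_single = spredixcan_z([2.0], [3.0], [[0.5]])
z_flipped = spredixcan_z([-1.0], [3.0], [[0.5]])
# Two independent, equally weighted SNPs with z = 2 each.
z_pair = spredixcan_z([1.0, 1.0], [2.0, 2.0], np.eye(2))
```

The last case gives 2√2 ≈ 2.83, showing how concordant signals at several model SNPs can yield a stronger gene-level z-score than any single SNP.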

When S-PrediXcan was applied to the PGC summary statistics, increased expression of PRMT7 was associated with schizophrenia in all 42 tissues in which expression was predicted (Fig. 3). PRMT7 made up five of the eight most associated gene-tissue pairs (Table 2). PRMT7 has previously been associated with …

While not found to be significantly associated with schizophrenia in brain tissues, the association is found … in SLC7A6OS and 514 bp upstream of PRMT7. In our PrediXcan analyses, we found that increased predicted expression of PRMT7 is associated with schizophrenia. When plotted using the 1000G AFR LD population, rs8063446 falls in a linkage disequilibrium (LD) block with other SNPs associated with schizophrenia.

In our GWAS of bipolar disorder, rs10168049 was the most significantly associated SNP. This SNP has not been implicated in previous studies, and its low minor allele frequency in European populations suggests that it could have a larger functional impact in African populations (Fig. 6). In addition to not … ancestry, but at the loss of nearly half of the sample size of many GTEx tissues. To predict expression optimally in African American cohorts, prediction models built in more tissues from African ancestry reference transcriptomes are needed. Moreover, future ancestry-specific models will not only increase the accuracy of expression prediction but also create opportunities for other methods, such as local ancestry mapping, to be applied to expression prediction by accounting for recent admixture within African American cohorts (Zhong et al., 2019).

The size and diversity of our prediction models further hindered our ability to identify novel genes associated with these disorders. The MESA models, the most diverse of our predictors, were still limited