Psychosis Endophenotypes: A Gene-Set-Specific Polygenic Risk Score Analysis

Abstract Background and Hypothesis Endophenotypes can help to bridge the gap between psychosis and its genetic predispositions, but their underlying mechanisms remain largely unknown. This study aims to identify biological mechanisms that are relevant to the endophenotypes for psychosis, by partitioning polygenic risk scores into specific gene sets and testing their associations with endophenotypes. Study Design We computed polygenic risk scores for schizophrenia and bipolar disorder restricted to brain-related gene sets retrieved from public databases and previous publications. Three hundred and seventy-eight gene-set-specific polygenic risk scores were generated for 4506 participants. Seven endophenotypes were also measured in the sample. Linear mixed-effects models were fitted to test associations between each endophenotype and each gene-set-specific polygenic risk score. Study Results After correction for multiple testing, we found that a reduced P300 amplitude was associated with a higher schizophrenia polygenic risk score of the forebrain regionalization gene set (mean difference per SD increase in the polygenic risk score: −1.15 µV; 95% CI: −1.70 to −0.59 µV; P = 6 × 10−5). The schizophrenia polygenic risk score of forebrain regionalization also explained more variance of the P300 amplitude (R2 = 0.032) than other polygenic risk scores, including the genome-wide polygenic risk scores. Conclusions Our finding on reduced P300 amplitudes suggests that certain genetic variants alter early brain development thereby increasing schizophrenia risk years later. Gene-set-specific polygenic risk scores are a useful tool to elucidate biological mechanisms of psychosis and endophenotypes, offering leads for experimental validation in cellular and animal models.

Background and Hypothesis: Endophenotypes can help to bridge the gap between psychosis and its genetic predispositions, but their underlying mechanisms remain largely unknown.This study aims to identify biological mechanisms that are relevant to the endophenotypes for psychosis, by partitioning polygenic risk scores into specific gene sets and testing their associations with endophenotypes.Study Design: We computed polygenic risk scores for schizophrenia and bipolar disorder restricted to brain-related gene sets retrieved from public databases and previous publications.Three hundred and seventy-eight gene-set-specific polygenic risk scores were generated for 4506 participants.Seven endophenotypes were also measured in the sample.Linear mixed-effects models were fitted to test associations between each endophenotype and each gene-set-specific polygenic risk score.Study Results: After correction for multiple testing, we found that a reduced P300 amplitude was associated with a higher schizophrenia polygenic risk score of the forebrain regionalization gene set (mean difference per SD increase in the polygenic risk score: −1.15 µV; 95% CI: −1.70 to −0.59 µV; P = 6 × 10 −5 ).The schizophrenia polygenic risk score of forebrain regionalization also explained more variance of the P300 amplitude (R 2 = 0.032)

Introduction
Psychotic disorders are highly heritable, with a heritability estimate of approximately 80% for schizophrenia and bipolar disorder. 1,2Breakthroughs have been made by genome-wide association studies (GWAS) in understanding the genetic basis of psychosis, with 270 loci associated with schizophrenia and 64 loci associated with bipolar disorder identified so far. 3,4Although these findings are promising, the functional effects of these variants in the pathophysiology of psychosis are still in the process of being understood.
Partitioning the effects of risk loci into distinct brain functional domains can provide important biological insights into the mechanisms of psychosis.One such approach uses endophenotypes, ie, heritable phenotypes associated with a, putatively more complex, illness. 5As such, a biomarker is considered an endophenotype if it is heritable and consistently shown to be altered in both patients and their unaffected relatives. 6Previous studies have established several endophenotypes for psychosis, such as verbal memory, 7,8 executive functions, 9 P300 amplitudes/latencies, [10][11][12][13] and lateral ventricular volumes. 14olygenic risk scores, the sum of the number of risk alleles weighted by their effect sizes, provide a method to test the genetic overlap between psychosis and its endophenotypes.6][17][18][19][20][21] This could be because genomewide polygenic risk scores combine many risk alleles across the genome, but only a subset of them are associated with an endophenotype related to a specific biological process. 21ene-set-specific polygenic risk scores can be a useful tool to address the issue.They are the effect size-weighted sum of risk alleles restricted to genes within a particular gene set (often associated with a biological process), thus only containing a subset of risk alleles that might be relevant to a specific endophenotype.For instance, in a sample of 333 participants, Rampino et al found that both attentional performance and prefrontal cortex activity during an attention control task were associated with the schizophrenia polygenic risk score of glutamate signaling. 22Merikanto et al calculated a schizophrenia polygenic risk score for the CACNA1l region and found that it was significantly associated with sleep spindle amplitude, duration, and intensity in a sample of 157 adolescents. 23By contrast, 2 studies with 167 to 2725 participants did not find an association between geneset-specific schizophrenia polygenic risk scores related to neurotransmission/neurodevelopment and brain volumes measured by magnetic resonance imaging (MRI). 24,25n summary, the utility of gene-set-specific polygenic risk scores needs further testing in a broader range of psychosis endophenotypes.More gene sets should be studied, as previous studies only focused on a small number of hypothesis-driven gene sets.Therefore, by testing the association between 7 known psychosis endophenotypes and gene-set-specific polygenic risk scores for schizophrenia and bipolar disorder, the current study aims to identify the biological processes underlying the genetic risk for psychosis.

Participants and Clinical Assessments
Overall, 6935 participants were recruited by the Psychosis Endophenotypes International Consortium (PEIC) at 8 research centers in Australia, Germany, the Netherlands (as part of the Genetic Risk and Outcome of Psychosis [GROUP] Study), Spain, and the United Kingdom.The study was approved by the local ethics committee at each research center.All participants provided written informed consent before assessments.There were 3 clinical groups recruited in the sample: Patients with psychosis, their unaffected first-degree relatives, and controls.

Gene-Set Polygenic Risk Scores for Psychosis
8][29][30][31][32] Details of diagnostic measures and inclusion and exclusion criteria can be found in supplementary materials.

Cognitive Measures
Participants were assessed by the block design and digit span tasks in the Wechsler Adult Intelligence Scale, revised version (WAIS-R) 33 or third edition (WAIS-III). 34he block design task measured participants' visuospatial ability and the digit span task measured participants' short-term and working memory.As different research centers adopted slightly different versions of the block design and digit span tasks, we used percentage (raw score/max score) to represent participants' performance in the 2 tasks.Participants were also assessed by the Rey Auditory Verbal Learning Test, 35,36 which included the immediate and delayed recall tests (measuring short-term and long-term verbal memory).

EEG and MRI Data Collection and Processing
8][39][40] EEG data were collected with vertical electrooculography (EOG) from 17 to 20 scalp sites based on the International 10/20 system, 41 referenced to mastoids or earlobes.EEG was corrected for eye blink artifacts using regression-based weighting coefficients, 42 as well as additional visual inspection.The P300 amplitude and latency were measured at the peak between 250 and 600 ms following the target tones at the Pz electrode.[45][46][47][48][49][50][51][52][53][54][55][56][57][58] Genotyping, Quality Control, and Imputation Blood DNA samples of 6935 participants were collected at all research centers and sent to the Wellcome Trust Sanger Institute (Cambridge, UK) for initial processing and quality control.Subsequently, samples were sent to Affymetrix Services Laboratory (www.affymetrix.com) for genotyping.Genotypes were called using the CHIAMO algorithm modified for use with the Affymetrix 6.0 genotyping array. 59,60They underwent standard quality control at UCL using software including PEDSTATS, 61 Evoker, 62 LDAK, 63 and PLINK. 64uality-controlled genotypes were uploaded to the Sanger Imputation Server (https://imputation.sanger.ac.uk) for imputation. 65Pre-phasing and imputation were conducted according to the EAGLE2/PWBT pipeline based on the Haplotype Reference Consortium panel (r1.1). 66,67The imputed genotypes were converted to bestguess format using a hard-call threshold of 0.8 and SNPs with an INFO score <0.8 were excluded.A total of 6 215 801 SNPs and 4835 participants remained after quality control.[70]

Relationship Inference and Principal Component Analysis
To account for familial relatedness and population structure in the sample, we used the GENESIS R/ Bioconductor package to generate a kinship matrix and conduct principal component (PC) analysis. 71,72Based on the genotyped data that passed quality control, an unadjusted kinship matrix was first generated using KINGrobust 2.2.5. 73The genotyped data were further pruned using the SNPRelate package in R 4.0.2 74and analyzed with the unadjusted kinship matrix by the PC-AiR function to estimate the ancestrally representative PCs. 71We then estimated a new kinship matrix adjusted for the PCs by the PC-Relate function, which allows for more accurate estimation of familial relatedness independent of ancestral background. 75Details of relationship inference and PC analysis can be found in supplementary materials.

Selection of Gene Sets
We retrieved a group of gene sets related to the central nervous system from previous publications, [76][77][78] most of which were derived from the Mouse Genome Informatics Mammalian Phenotype database. 79We downloaded other lists of curated gene sets from the following public access databases: Reactome, 80 Kyoto Encyclopedia of Genes and Genomes, 81 Pathway Commons, 82 and Panther. 83Gene sets from the "Cellular Component" and "Biological Process" categories were downloaded from Gene Ontology. 84To reduce the burden of multiple testing correction, for gene sets downloaded from public databases we retained only those with at least one of the following key terms: Brain, cerebral, nerve, nervous, neuron, neuronal, neural, glia, microglia, astrocyte, oligodendrocyte, axon, axonal, dendrite, dendritic, synapse, synaptic, neurotransmitter, or neurotransmission.Gene sets with terms indicating the direction of regulation (ie, positive or negative) were removed, as gene sets were only used to subset SNPs and the direction of regulation of the gene sets would not be relevant to polygenic risk scores.Based on these criteria, we included a total of 378 gene sets in our final analysis.

Polygenic Risk Scoring
We used PRSice v2.3.3 85,86 to calculate the genome-wide polygenic risk scores for schizophrenia and bipolar disorder for each individual in the PEIC sample.GWAS summary statistics for schizophrenia and bipolar disorder were downloaded from the Psychiatric Genomics Consortium (PGC3). 3,4As the PEIC sample only included participants of European ancestry and was part of the PGC3 sample, the GWAS summary statistics we used were generated based on the European participants of the PGC3 that excluded the PEIC sample.We excluded SNPs with an INFO score <0.8 or a minor allele frequency <0.01 (in cases or controls) in the GWAS summary statistics, and performed clumping with an r 2 threshold = 0.1 in a 500 kilobase window.We applied a P-value threshold of 1 to include all SNPs that passed the quality control to calculate the genome-wide schizophrenia and bipolar disorder polygenic risk scores.We also applied a P-value threshold of .05 for the genomewide schizophrenia polygenic risk score and 0.1 for the genome-wide bipolar disorder polygenic risk score, as those P-value thresholds generated the polygenic risk scores that explained the most variance in disease risk in the previous publications by the PGC3. 3,4e then used the PRSet function in PRSice v2.3.3 to calculate the gene-set-specific polygenic risk scores. 85,87ompared to other methods, 88,89 PRSet is computationally efficient and performs clumping for each gene set to keep all independent signals. 87We calculated the scores of each gene set selected above for schizophrenia and bipolar disorder separately.The method used here was similar to that for the genome-wide polygenic risk scores, but restricted to SNPs that fall within a 10-kilobase window around each gene included in a gene set.SNPs were clumped independently for each gene set using an r 2 threshold = 0.1 in a 2-megabase window.We applied a P-value threshold of 1 for all gene-set-specific polygenic risk scores without excluding any SNPs after clumping, to maximize the number of SNPs included in each gene set.
In total, we generated 380 (378 gene-set specific, 2 genome-wide) polygenic risk scores for schizophrenia and 378 (376 gene-set specific, 2 genome-wide) polygenic risk scores for bipolar disorder.Two gene sets were excluded from the bipolar disorder polygenic risk scores as no SNPs in the gene sets were found in the GWAS summary statistics and the PEIC sample.

Statistical Analysis
Our primary analysis tested associations between the 7 endophenotypes and the polygenic risk scores.We standardized the polygenic risk scores based on the means and SDs of the control group.For each endophenotype, we fitted a linear mixed-effects regression model with each polygenic risk score as a fixed effect.For covariates, we included age, sex, clinical group, research center, and the first 4 ancestry PCs as fixed effects, and the kinship matrix as a random effect.For significant associations, we also checked if the associations were consistent across 3 clinical groups and if they were driven by specific genes in the gene set.
In our secondary analysis, we tested associations between the polygenic risk scores and participants' casecontrol status, including only patients and controls.We fitted a fixed-effect logistic regression model with case-control status as a binary outcome and each of the gene-set-specific polygenic risk scores as a fixed effect.We included age, sex, research center, and the first 4 ancestry PCs in the model as covariates.The kinship matrix was not included as participants in the patient and control groups were generally unrelated.Participants recruited in Munich or Pamplona were excluded from the analysis as the 2 centers recruited only patients or only controls.
We accounted for multiple testing using Bonferroni correction, generating a new significance threshold based on the number of polygenic risk scores tested for each endophenotype (0.05/(380 + 378) = 7 × 10 −5 ), and additionally applied a more stringent threshold accounting for the number of endophenotypes (0.05/(380 + 378)/7 = 9 × 10 −6 ).We used Nakagawa's R 2 to indicate the variance of each endophenotype explained by each polygenic risk score, 90 and Nagelkerke's pseudo R 2 for case-control status to indicate the improvement of the model by adding the polygenic risk score compared to the null model without it. 91We initially included an interaction term between polygenic risk score and clinical group in the model, but eventually dropped it as no significant interactions were detected after correction for multiple testing.
For all analyses mentioned above, we excluded participants who did not pass genetic quality control or with missing data on any of the covariates included in the model.As different research centers collected different endophenotypes, the total number of participants analyzed in the models also varied across endophenotypes.All statistical analyses were conducted using R 4.0.2. 72

Gene-Set Polygenic Risk Scores for Psychosis
detailed information on sample characteristics by clinical group.
The summary statistics of the 7 endophenotype measures by clinical group are shown in table 2, and the sample sizes vary across different endophenotypes (n = 510 to 3088).In general, patients and relatives showed deficits in all endophenotypes compared to controls, which has been reported in our previous publications using the same sample. 17,68,69

Associations Between Endophenotypes and Polygenic Risk Scores
Based on the significance threshold of 7 × 10 −5 after multiple testing corrections, we found a significant negative association between the P300 amplitude and the schizophrenia polygenic risk score of forebrain regionalization in a sample of 510 participants (211 patients, 160 relatives, and 139 controls; mean difference per SD increase in the polygenic risk score: −1.15 µV; 95% CI: −1.70 to −0.59 µV; P = 6 × 10 −5 ; figure 1A).The schizophrenia polygenic risk score of forebrain regionalization also explained more variance of the P300 amplitude (R 2 = 0.032) than any other schizophrenia polygenic risk scores, including the genome-wide schizophrenia polygenic risk scores with a P-value threshold of 0.05 (R 2 = 0.015) and 1 (R 2 = 0.019) (figure 1B).
As validation, we also checked if the association between the P300 amplitude and the schizophrenia polygenic risk score of forebrain regionalization was consistent across 3 clinical groups.The direction of the association was consistent in all groups, which also reached the nominal significance level (P < .05) in both patients and controls (supplementary figure S5).Notably, EMX1, one of the genes within the forebrain regionalization gene set, contained a locus that reached genome-wide significance in the latest GWAS on schizophrenia. 4 Indeed, an additional analysis showed that higher partitioned schizophrenia polygenic risk scores restricted to the EMX1 region were associated with reduced P300 amplitudes at the nominal significance level (mean difference per SD increase in polygenic risk score: −0.66 µV, 95% CI: −1.27 to −0.05, P = .033)(supplementary materials).No significant associations were found between other endophenotypes and schizophrenia or bipolar disorder polygenic risk scores after correction for multiple testing (supplementary figure S1 to S4).The −log10(P-value) for those associations was not or very weakly correlated with the number of SNPs included in the polygenic risk scores, indicating that our results were not confounded by the number of SNPs in each score (supplementary materials).No associations passed the more stringent significance threshold of 9 × 10 −6 .Fig. 1.Associations between P300 amplitude and schizophrenia polygenic risk scores (A) and variance of P300 amplitude explained by schizophrenia polygenic risk scores (B).Gene-set-specific polygenic risk scores are grouped by the search terms they contain.7][78] On the x-axis, gene sets from the same source were arranged in descending order of the number of SNPs included in each polygenic risk score.CNS, central nervous system; PRS, polygenic risk score; Pt, P-value threshold.

Associations Between Case-Control Status and Polygenic Risk Scores
For associations with case-control status in a sample of 1138 cases and 1508 controls, 55 gene-set specific polygenic risk scores for schizophrenia and 18 gene-set specific polygenic risk scores for bipolar disorder passed the 7 × 10 −5 threshold after multiple testing corrections.However, the genome-wide polygenic risk scores were generally more significantly associated than the gene-setspecific polygenic risk scores (figure 2A and figure 2B).The genome-wide polygenic risk scores also had a much bigger pseudo R 2 than any of the gene-set specific polygenic risk scores, as shown in figure 2C and figure 2D.In general, stronger associations with case-control status were found for polygenic risk scores that included more SNPs (supplementary materials).

Discussion
The current study used gene-set-specific polygenic risk scores as a tool to investigate the biological mechanisms underlying endophenotypes that convey psychosis risk.A significant association was found between the P300 amplitude and the schizophrenia gene-set-specific polygenic risk score of forebrain regionalization.The reduction in P300 amplitudes is a well-established endophenotype for psychosis, [10][11][12][13] and may predict transition to psychosis in individuals at ultra-high risk. 92,93However, no compelling theories have been developed to explain the underlying neurobiology of P300 deficits in schizophrenia, and our study indicates that they may be related to alterations in early brain development.
Forebrain regionalization is a critical stage in early brain development, during which highly regionalized gene expression modulates the patterning of discrete regions. 94his involves several processes such as cell migration and neuronal differentiation, facilitating the separation of the forebrain into the telencephalon (cerebrum) and the diencephalon (thalamus, hypothalamus, epithalamus, and subthalamus). 95In line with the finding on the P300 amplitude, a recent transcriptome-wide association study by our group suggests that early neurodevelopment may also influence mismatch negativity, another EEG measure associated with auditory change detection. 70Moreover, the role of forebrain development in schizophrenia is supported by a study using human induced pluripotent stem cells (hiPSCs). 96In this study, the authors found that genes differentially expressed in neural progenitor cells and neurons between patients with schizophrenia and controls were enriched in the forebrain development pathway. 97nterestingly, they found that hiPSC-derived neurons from patients exhibited altered electrophysiological measures related to Na + channel function. 97It is plausible that such changes at the neuronal level may also influence higherlevel neurophysiological measures such as the P300, although more research is needed to draw this link.Fig. 2. Associations between case-control status and schizophrenia (A) or bipolar disorder (B) polygenic risk scores.Pseudo R 2 of case-control status explained by schizophrenia (C) or bipolar disorder (D) polygenic risk scores.Gene-set-specific polygenic risk scores are grouped by the search terms they contain.7][78] On the x-axis, gene sets from the same source were arranged in descending order of the number of SNPs included in each polygenic risk score.CNS, central nervous system; PRS, polygenic risk score; Pt, P-value threshold.
Our additional analysis revealed that the partitioned schizophrenia polygenic risk score restricted to EMX1 was negatively associated with the P300 amplitude at the nominal P-value threshold.This gene contains a genome-wide significant locus identified by the latest schizophrenia GWAS 4 and is involved in several critical biological processes during early brain development, such as neuron differentiation and neural stem cell proliferation. 98,99Thus, given the strong evidence for the involvement of the EMX1 gene in schizophrenia and in P300 amplitude deficits, further research should seek to characterize its functions using cellular and animal models as well as other endophenotypes in humans.
We found no significant associations for other endophenotypes measured in the current study.This could be explained by the relatively high heritability of the P300 amplitude (69%) 37 compared to other endophenotypes, such as specific cognitive abilities (average heritability estimates of 56%). 100 Moreover, the lack of significant associations with bipolar disorder polygenic risk scores might reflect the small number of patients with bipolar disorder in our sample, which limited the statistical power.Finally, it is worth noting that our significant finding did not survive the additional more stringent correction.Therefore, caution needs to be taken when interpreting our results, and future replication studies are needed.
As expected, our secondary analysis revealed that compared to gene-set specific polygenic risk scores, genomewide polygenic risk scores were more strongly associated with and explained more variance of case-control status.Nevertheless, investigating the associations between gene-set-specific polygenic risk scores and case-control status may still help to pinpoint the core gene sets that are most relevant to disease mechanisms.Although this is beyond the scope of the current study, a previous study found that the schizophrenia polygenic risk scores generated based on predefined core gene sets outperformed polygenic risk scores of randomly generated gene sets of similar sizes. 101he present study has its limitations.Although the PEIC has a relatively large sample size, our study might still be underpowered to detect certain associations.More associations between endophenotypes and gene sets may arise in future studies with increased power through meta-or mega-analyses of multiple samples.Moreover, while data from multiple research centers increased the overall sample size, this might have also increased heterogeneity.Nevertheless, we have controlled for potential confounders by including multiple covariates in the regression models, and a strength of this study is that all blood samples underwent the same genotyping and quality control process.Finally, it is worth noting that other factors, such as gene-gene/gene-environment interactions and rare variants associated with psychosis may also influence endophenotypes.Although those were not tested in the current study, our previous study using the same dataset found that schizophrenia-related rare copy number variants were associated with verbal memory deficits. 69Certain environmental exposures, such as medication, could also affect endophenotype performance. 102lthough medication use was not recorded in the PEIC, we believe our finding on the P300 is still valid, as the association was consistent in unaffected relatives and controls who were medication-free (supplementary materials).
To conclude, the current study offered evidence for the utility of endophenotypes and gene-set-specific polygenic risk scores to illuminate the biological mechanisms underlying psychosis.We found that a reduced P300 amplitude was associated with a higher schizophrenia polygenic risk score of forebrain regionalization, supporting the neurodevelopmental hypothesis of schizophrenia. 103,104Future studies with larger samples and more gene sets will advance our understanding of biological processes underlying endophenotypes for psychosis.We also need more mechanistic studies, such as those using animal models and human-induced pluripotent stem cells from patients with psychosis, to further illuminate how neurodevelopmental impairments affect endophenotypes and increase psychosis risk.

Table 1 .
Sample Characteristics by Clinical GroupNote.The Netherlands included 4 study sites (Amsterdam, Groningen, Maastricht, and Utrecht) in the GROUP Study, which employed similar recruitment and assessment procedures.

Table 2 .
Summary Statistics of Endophenotype Measures by Clinical Group Note.Participants' performance in the block design and digit span tasks was measured by percentage (raw score/max score).RAVLT, Rey Auditory Verbal Learning Test.