Association of schizophrenia polygenic risk score with data-driven cognitive subtypes: A six-year longitudinal study in patients, siblings and controls

Cross-sectional studies have shown that the polygenic risk score for schizophrenia (PRSSCZ) may influence heterogeneity in cognitive performance although evidence from family-based longitudinal study is limited. This study aimed to identify trajectories of cognitive function and assess whether the PRSSCZ is associated with baseline cognitive performance and predicted six-year trajectories. We included 1119 patients with a schizophrenia spectrum disorder, and 1059 unaffected siblings and 586 unrelated controls who are eligible at baseline. Genotype data were collected at baseline, whereas clinical and sociodemographic data were collected at baseline, three and six years. Group-based trajectory modeling was applied on a weighted standardized composite score of general cognition to unravel cognitive subtypes and explore trajectories over time. We followed a standard procedure to calculate the polygenic risk score. A random-effects ordinal regression model was used to investigate the association between PRSSCZ and cognitive subtypes. Five cognitive subtypes with variable trajectories were found in patients, four in siblings and controls, and six in all combined samples. PRSSCZ significantly predicted poor cognitive trajectories in patients, siblings and all samples. After Bonferroni correction and adjustment for non-genetic factors, only the results in all combined sample remained significant. Cognitive impairment in schizophrenia is heterogeneous and may be linked with high PRSSCZ. Our finding confirmed at least in all combined samples the presence of genetic overlap between schizophrenia and cognitive function and can give insight into the mechanisms of cognitive deficits.


Introduction
Cognitive impairment is one of the core features of schizophrenia and can contribute to poor daily functioning and quality of life in patients (Savilla et al., 2008). This impairment is highly heterogeneous and variable over time in a domain-selective as well as a general manner and affects the majority of patients and their siblings (Shmukler et al., 2015;Petrova and Dorofeikova, 2017). General cognition is a highly heritable phenotype with estimated single-nucleotide polymorphisms (SNPs)-based heritability of up to 30% (Trampush et al., 2017;Davies et al., 2015). To date, N100 loci have been associated with general cognitive function, in which most of them are located in the intronic or intergenic regions (Ohi et al., 2018). Similarly, family-based heritability of up to 85% is observed in specific cognitive domains, such as verbal fluency, early visual perception, spatial working memory and IQ (Aukes et al., 2008). Findings from the Bipolar-Schizophrenia Network on Intermediate Phenotypes (B-SNIP) Study showed that cognitive impairment clusters frequently within families, whereby the cognitive performance of first-degree relatives of patients with schizophrenia is lower than performance in the general population, but less severe compared to their affected proband (Hill et al., 2013). This evidence is further supported by the significant familial transmission of attention, memory, and verbal learning impairment in schizophrenia (Bigdeli et al., 2019).
Given the highly polygenic nature of cognition and schizophrenia, and as the cognitive impairment is defined as one of the features of schizophrenia, the endophenotype hypothesis states that alleles associated with the low cognitive ability also increase the risk for schizophrenia. Within the framework of the Consortium on the Genetics of Schizophrenia (COGS) study, genome-wide association analysis of 12 quantitative neurophysiological and neurocognitive endophenotypes for schizophrenia found more than seven genome-wide significant regions containing several genes of interest for seven endophenotypes related to executive function, inhibition, episodic memory, attention, learning, vigilance, working memory and social cognition domains (Greenwood et al., 2013;Greenwood et al., 2016;Greenwood et al., 2019). Other reports from the B-SNIP Consortium and Cognitive Genomics consorTium (COGENT) showed a bidirectional genetic association between cognition and schizophrenia in both non-clinical and clinical samples. Additionally, the COGENT reported SNPs that have previously been strongly associated with schizophrenia also had nominally significant associations with cognition in healthy individuals (Lencz et al., 2014). A meta-analysis of genome-wide association studies of COGENT, Psychiatric Genomics Consortium, Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium and UK Biobank cohort also revealed that 21 loci associated to both schizophrenia and verbal-numerical reasoning, reaction time and general cognitive ability (Smeland et al., 2017). These evidence further consolidated by polygenic risk score (PRS)-based studies that found polygenic risk scores for schizophrenia (PRS SCZ ) was associated with low general cognition and polygenic risk score for low cognition was associated with high risk for schizophrenia (Trampush et al., 2017;Shafee et al., 2018;Lencz et al., 2014).
Cognitive function is a suitable objectively measurable (endo)phenotype (Gur et al., 2006;Luperdi et al., 2019). However, results suggest that endophenotypes are equally complex as the disease, which makes it difficult to conclude about the extent of heterogeneity, variability over time and their predictors. To overcome the clinical heterogeneity, molecular complexity and facilitate gene discovery of schizophrenia, the use of endophenotypes and subtyping approaches were advocated as a promising alternative strategy in psychiatric research (Aukes et al., 2008;Aukes et al., 2009). To explore the heterogeneity of cognitive function in a Dutch population, we first cross-sectionally examined cognitive heterogeneity in non-affected siblings of patients (Quee et al., 2014) and subsequently explored cognitive trajectories over six years in patients and their siblings (Islam et al., 2018). To date, up to five clinically and statistically meaningful homogenous cognitive subtypes have been identified (Habtewold et al., 2019). The course of cognitive functioning over time is characterized by stability, progressive degeneration, relapsing and progressive improvement trajectory (Habtewold et al., 2019;Kochunov and Hong, 2014). Our study and others report showed that trajectory of cognitive impairment can be associated with exposure to multifactorial clinical and sociodemographic factors, including aging, male gender, low educational status, low IQ, high dose secondgeneration antipsychotics, late age of illness onset, long duration of illness, severe positive and/or negative symptoms, and poor functioning and quality of life (Habtewold et al., 2019). Nevertheless, none of the previous studies determined the effect of underlying genetic susceptibility either of a single or an aggregated effect of schizophreniaassociated SNPs on cognitive trajectories. Identification of the sum effects of SNPs on homogeneous trajectories may help to accurately predict the clinical course of schizophrenia.
Here, we extended previous knowledge and investigated the trajectories of general cognitive function among patients with schizophrenia spectrum disorders, their unaffected siblings and healthy controls using longitudinal family-based data. We further examined the predictive effect of PRS SCZ on cognitive performance at baseline and over the six years.

Study procedure and population
Data were extracted from the Genetic Risk and Outcome of Psychosis (GROUP) cohort (data release 7.00), a six-year longitudinal multicenter national study in the Netherlands (Korver et al., 2012). In total, 2764 individuals were eligible at baseline: 1119 patients with a schizophrenia spectrum disorder, 1059 unaffected first-degree relatives and 586 unrelated healthy controls. Outpatients or inpatients presenting at the mental health service centers were consecutively recruited. Patients with the age range of 16 to 50 years (both extremes included), a diagnosis of schizophrenia spectrum disorder based on the 4th edition Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) criteria, good command of the Dutch language, and willing and capable of giving written informed consent were included. Similar criteria, excluding a diagnosis of psychosis, were applied to the siblings. Patients and siblings are clustered within families (i.e. two to six individuals per family). Controls were included if they had no known lifetime psychotic disorder and no first-degree family member with a lifetime psychotic disorder. The main reason to use the age range of 16 to 50 years was that psychosis is presumed to be prevalent to occur in this age range. This age limit was also used to minimize the effect of age-related cognitive decline. Genotype data were collected at baseline, whereas clinical and sociodemographic data were collected at baseline, third year and sixth year using various standard tools. This study was conducted in line with the published study protocol (Korver et al., 2012).

Sociodemographic and clinical variables
Age, gender, ethnicity, years of education and marital status were included sociodemographic variables. Premorbid IQ, premorbid adjustment, age of illness onset, duration of illness, psychotic episode, use of antipsychotics, psychotic symptoms, schizotypy, psychotic-like experience, social and occupational functioning, and health-related quality of life were included clinical variables. Schizophrenia spectrum disorders were ascertained using Schedules for Clinical Assessment in Neuropsychiatry (SCAN) or Comprehensive Assessment of Symptoms and History (CASH) (Andreasen et al., 1992) instrument (Aboraya et al., 1998;Rijnders et al., 2000). The Premorbid Adjustment Scale (PAS) was used to assess premorbid functioning (Cannon-Spoor et al., 1982). Severity of positive and negative symptoms in the patients was assessed by the positive and negative syndrome scale (PANSS), which is a 30item clinician-rated questionnaire and currently recognized as the most widely used scale (Kay et al., 1987). Positive and negative schizotypy in siblings and healthy controls were assessed using a reliable scale known as the Structured Interview for Schizotypy-revised (SIS-R) (Vollema and Ormel, 2000;Vollema et al., 2002;Kendler et al., 1989). Community Assessment of Psychic Experiences (CAPE), a 42item reliable and valid self-report questionnaire, was used to assess the psychotic-like experiences (Brenner et al., 2007). The World Health Organization Quality of Life-BREF (WHOQOL-BREF) questionnaire, which has high construct validity and reliability, was used to assess health-related quality of life (Trompenaars et al., 2005), whereas Social Functioning Scale (SFS) was used to measure functioning (Birchwood et al., 1990).

Cognitive function assessment
Cognitive function was assessed using Word Learning Task (WLT) (immediate recall, delayed recall and retention rate), Continuous Performance Test-HQ (CPT-HQ) (CPT performance index and CPT variability), Wechsler Adult Intelligence Scale (WAIS-III) (digit symbol substitution, information, calculation, block design), Response Setshifting Task (RST), Benton Facial Recognition Test (BFT), Degraded Affect Recognition Task (DFAR) and Hinting Task (HT) Nuechterlein et al., 2008;Nuechterlein et al., 2004). HT, RST, and BFT were excluded due to the absence of data at the third and/or sixth year measurement. DFAR was excluded because it measures social cognition. The retention rate component of WLT was excluded due to high correlation with immediate and delayed recall. Finally, immediate recall, delayed recall, CPT performance index and variability, digit symbol substitution, information, calculation and block design tests were retained to use for the principal component analysis (PCA). Details on the scoring system were published in our previous article (Islam et al., 2018).

Principal component analysis (PCA)
Unlike our previous studies (Quee et al., 2014;Islam et al., 2018), we applied PCA to obtain a weighted standardized composite score from the eight cognitive tests. PCA was done on the correlation matrix using varimax with the Kaiser Normalization rotation method separately in patients, siblings, control and all combined samples. We chose the correlation matrix to standardize cognitive tests given that each cognitive function was measured on a different scale. Principal components were extracted using a two-step approach combining the Kaiser criterion (i.e., Eigenvalue ≥1) with the visual inspection of scree plots and their component scores were saved for further analysis. The composite score for general neurocognition was obtained by summing up component scores and used in the trajectory modeling. PCA has been previously applied in several studies that examined the heterogeneity of schizophrenia symptoms and cognitive deficits (Strauss et al., 2013;Wang et al., 2012;Barrantes-Vidal et al., 2010;Geisler et al., 2015;Chang et al., 2015;Bell et al., 2013;Nuechterlein et al., 2004).

Trajectory modeling
A trajectory represents people subgroups with a homogenous symptom profile within a group, but have a heterogeneous profile between groups over time (Abdin et al., 2017). Distinct clinically and statistically meaningful trajectories in patients, siblings, controls and all samples that have unique cognitive profiles over time were identified using censored normal group-based trajectory modeling (GBTM) using PROC TRAJ in SAS version 9.4 for Windows (Jones et al., 2001). The drop-out model, which includes a logistic model of drop-out probability per wave of measurement, was used to investigate whether the attrition rates significantly biased group membership probabilities of trajectories (Haviland et al., 2011). For each group, we assumed that the drop-out probability depends on the two previous responses (i.e. baseline and three-year observation). A complete theoretical explanation of concepts, functions, and applications of GBTM is published elsewhere (Nagin, 1999;Nagin and Odgers, 2010;Nagin, 2010). We started with one group and repeatedly increased the number of groups with a quadratic polynomial order until a model with optimum number of trajectory groups were identified. The optimal number of trajectories was selected based on sample size adjusted Bayesian information criterion (BICn), logged Bayes factor (i.e. 2*ΔBICn) and authors' clinical judgment. ΔBICn calculated by subtracting the absolute value of BICn of less complex model from the absolute value of BICn of the complex model. The model with the lowest BICn and logged Bayes factor N 10 was selected as the best-fitting model. Then, the polynomial order was adjusted until it became significant at the confidence level alpha (α) = 0.05 for all trajectories. To evaluate the model classification accuracy, we calculated the average group posterior probability (AvePP). An AvePP N0.7 indicates that the model has good accuracy in the classification of individuals (Niyonkuru et al., 2013).

Genotyping and quality control (QC)
Genotype data of 2812 individuals were generated on a customized Illumina, IPMCN array with 570 k SNPs comprising of~250 k common tagging SNPs, 250 k exome SNPs (rare, nonsynonymous with minor allele frequency (MAF) b 1%), and~50 k psychiatric-related SNPs. QC procedures were performed using PLINK v1.9 (Purcell et al., 2007). SNPs and samples with call rates below 95% and 98% were removed, respectively. A strict SNP QC involved a MAF threshold N10% and Hardy-Weinberg equilibrium (HWE) p-value N1e-05 that followed by linkage disequilibrium (LD)-based SNP pruning (R 2 b 0.2). This resulted iñ 58 k SNPs to assess sex mismatch, heterozygosity (at F-value b3 standard deviation (SD)), homozygosity (F-value N3SD) and relatedness by pairwise identity by descent (IBD) values. Duplicate samples (at pihat N0.8) were removed and remaining pairs were manually checked since this dataset contains family members. After removing failing samples, a regular SNP QC was performed considering SNP call rate of N98%, HWE p N 1e-06 and MAF N 1%. Multidimensional scaling clustering with Hapmap Phase 3 individuals was done to check ancestry and individuals deviated more than 3SD from the mean of our sample were removed (n = 91). Besides, the first 20 genetic principal components of qualified samples were determined by EIGENSTRAT software using genetic information from the strict SNP QC list (Price et al., 2006). Next, strand ambiguous SNPs and duplicate SNPs were removed. Mendelian errors were set to missing followed by another missingness check at a 2% threshold for samples (n = 8 excluded) and SNPs, and SNPs with differential missingness between cases and controls were removed. In total, 2505 individuals and 275 k SNPs passed QC steps. SNPs were imputed on the Michigan server using the Haplotype Reference Consortium

Polygenic risk score calculation
We followed a standard approach to calculate the polygenic risk score for schizophrenia using PLINK v1.9 (Purcell et al., 2007). Overlapping SNPs between the Psychiatric Genomics Consortium (PGC) genome-wide association study (training dataset), 1000 reference Genome (reference dataset) and the present sample (target dataset) were selected. Then, insertion/deletion or ambiguous SNPs, SNPs with MAF b 0.01 and imputation quality (R 2 ) b 0.8 both in the training and target dataset, and SNPs located in complex-LD regions were excluded (Price et al., 2008), which leaving 2950 k SNPs. Next, these SNPs were clumped in two rounds: in the first round with the default parameters (physical distance threshold 250 kb and LD threshold (R 2 ) b 0.5 and with a physical distance threshold of 5000 kb and LD threshold (R 2 ) b 0.2 in the second round, which resulted in 194 k independent SNPs for PRS calculation. Odds ratios for autosomal SNPs were computed in PGC after excluding the present Dutch samples (Ripke et al., 2014) and log-converted to beta values. Finally, four PRS SCZ were constructed using the PLINK score function separately for 2505 individuals at four p-value thresholds (P T ) of 5e-08 (81 SNPs included), 0.05 (35 k SNPs), 0.1 (54 k SNPs) and 0.5 (147 k SNPs) while adjusting for the 20 genetic principal components to control for any undetected population stratification. To simplify the interpretation, we standardized all the PRSs to a standard normal distribution with a mean of 0 and a standard deviation of 1 (Lewis and Vassos, 2017). Each threshold resulted in a different score for every subject. Our main reason to use the four thresholds was to compare the predictive value of different PRSs that were constructed based on the aggregate effects of SNPs selected at different p-value thresholds. In general, as the number of associative p-value threshold becomes less stringent, the number of informative/predictive SNPs included in the risk estimation becomes larger, and hence the predictive value of PRS of the desired outcome improves as well. A p-value threshold (PT) of ≤0.05 was considered the most predictive threshold for schizophrenia (Ripke et al., 2014).

Statistical analysis
A linear mixed-effects model using PROC MIXED in SAS version 9.4 for Windows was used to explore the differences in PRSs and all continuous environmental factors between patients, siblings and controls. Since individuals in the present sample were clustered into families, it is very likely that observations from the same family are statistically correlated and dependent. Therefore, familial clustering was used as a random effect. Using family as a random effect sets up a common correlation among all individuals within the same family. We ignored family as a random effect if the G-matrix was not positive definite. The Maximum Likelihood (ML) estimation method was used to estimate the model parameters and Type-III (overall) tests of difference in fixed-effects (i.e. sample baseline sociodemographic and clinical characteristics) were interpreted. Pair-wise comparison was done if the overall difference was significant. For categorical variables, chi-square tests were used to examine the difference between patients, siblings, and controls. Pearson's correlation test was applied to assess the bivariate association between baseline cognitive function and PRS SCZ across all sample groups as well as all combined samples. A random-effects ordinal regression analysis using PROC NLMIXED in SAS version 9.4 for Windows (Hedeker, 2005;Hedeker, 2008;Hedeker, 2015) was done to determine whether PRS SCZ (each PRS separately modeled) predicted cognitive trajectories by setting 'family' as a random-effect and adjusting covariables under the proportional and non-proportional odds assumption. The covariables adjusted were age, sex, years of education, ethnicity, premorbid IQ, premorbid adjustment, positive and negative symptoms/schizotypy, and psychotic-like experiences that reported in our previous study (Islam et al., 2018). Random-effects ordinal regression analysis was used because cognitive trajectories had more than two ordered categories and study participants were clustered within families. We reported results based on the proportional and non-proportional odds assumption because the proportional odds assumption was not consistently fulfilled for all four PRSs across samples. During modeling, data missingness was handled automatically by using full information maximum likelihood that use data from partial observations and provides parameter estimates by maximizing the likelihood function of the incomplete data. It is believed the ML estimation method produces unbiased results compared with other methods (Dong and Peng, 2013). Moreover, all tests were two-tailed and Bonferroni correction (i.e., correction for multiple comparisons) was applied to the non-proportional odds regression model. To obtain the Bonferroni corrected/adjusted p-value, the original p-value was divided by the number of comparisons we made on the outcome.

Baseline background characteristics of participants
During the six years follow-up period, including individuals converted to psychosis, data from 1136 eligible patients with a schizophrenia spectrum disorder, 1045 unaffected siblings and 583 healthy controls were used. As presented in Table 1, patients were more likely to be male, young and unmarried, and had poor premorbid IQ, premorbid adjustment, functioning and quality of life compared to siblings and controls. Siblings sociodemographic characteristics, functioning and quality of life laid between patients and controls. In general, N80% of participants were Caucasians.

Principal components of cognition
Based on the Kaiser criterion and visual inspection of scree plots (Fig. S1), the PCA provided three components for each group of samples as well as all combined samples. For nearly all cognitive tests, as shown in Table S1, the component loading was N0.71 indicating that the correlation between the observed cognitive tests and the extracted components was excellent. In addition, all three principal components accounted for 72.7%, 70%, 68.1% and 72.4% of the cognitive variance in patients, unaffected siblings, controls and all combined samples, respectively.

Cognitive subtypes and trajectories
As illustrated in Fig. 1, we found five subgroups of patients, labeled as severely impaired (2.7%), moderately impaired (15.8%), mildly impaired (35.7%), normal (33.6%), and high performing (12.3%) group. Except the moderate group, which had a flat/stable trajectory, all groups had a quadratic trajectory (Table S3). In healthy siblings and controls, we identified four trajectory subgroups denoted as moderately impaired (2.5% and 5.7%), mildly impaired (24.5% and 25.8%), normal (42.0% and 47.3%) and high performing (31.1% and 21.3%) group. Moderately impaired sibling group had flat/stable trajectory, mild and high groups had linear trajectory and the normal group had a quadratic trajectory (Table S4). In controls, mild and moderate groups had flat/stable trajectory, whereas normal and high groups had linear trajectory (Table S5). In all combined samples, we identified six trajectory subgroups (ranging from very severely impaired to high performance) of individuals with a variable trajectory of general cognitive function over the six years. Severe and very severe groups (7%) had flat/stable trajectory, mild group had linear trajectory and, moderate, normal and high groups had a quadratic trajectory (Table S6). The average group posterior probability (AvePP) was 0.73 to 0.96 in all trajectory modeling analyses, indicating that our model had good accuracy of classification of individuals into the respective trajectory groups. A substantial difference in the variability of cognitive function was observed across patient, sibling and control samples. In general, the variability of siblings' cognitive function over time laid between patients and controls. Details on the model fit indices and parameter estimates are presented in supplementary Tables S2-S6. Of interest, most individuals belong to similar trajectory groups when analyzed within their corresponding samples (patient, siblings and unrelated controls) and when all samples are pooled together (Table S7).

Comparisons of participants PRSs
The mean PRS SCZ of patients was significantly higher than the mean PRS SCZ of siblings and controls (p b 0.001 at P T = 0.05). Similarly, the mean PRS SCZ of siblings was significantly higher than the mean PRS SCZ of controls (p b 0.001 at P T = 0.05) (Fig. 2, Table S8).

Association between PRS SCZ and six years cognitive trajectories
In all four PRSs, as presented in Fig. 3, individuals with poor cognitive trajectories had high mean PRS SCZ that support the results from the bivariate correlation analyses at baseline.
In the univariable proportional and non-proportional odds regression model, PRS SCZ significantly predicted the trajectory of general cognition in patients (at P T = 0.05, 0.1 and 0.5) and all combined samples (at all p-value thresholds). For a 1SD increase in PRS SCZ at P T = 0.05, based on the proportional odds model, the odds of patients being in the impaired group (i.e. severely and moderately impaired) was 1.33 (95%CI = 1.08-1.65, p = 0.007) times the odds of patients in the high performing group (i.e. mildly impaired, normal and high). In the combined analysis of all samples of nonproportional odds model, for a 1SD increase in PRS SCZ at P T = 0.05, the odds of individuals being in the impaired group (i.e. severely and very severely impaired group) was 1.72 (OR = 1.72, 95%CI = 1.42-2.08, p b 0.0001) times the odds of individuals in the high performing group (i.e. moderately impaired to high performers). Details on comparisons have been presented in Table 2.
In the multivariable regression model, the significant association between PRS SCZ and cognitive subtypes was persisted only in the combined analyses of all samples. While adjusting for non-genetic covariables in the nonproportional odds model, the odds of individuals  being in the impaired group (i.e. severely and very severely impaired group) was 1.72 (OR = 1.72, 95%CI = 1.18-2.50, p = 0.004) times the odds of individuals in the high performing group (i.e. moderately impaired to high performers) for a 1SD increase in PRS SCZ at P T = 0.1 (Table 3).

Discussion
We found five general cognitive subtypes with variable trajectories in patients (ranging from severely impaired to high functioning), whereas four subtypes were identified in siblings and healthy controls (ranging from moderately impaired to high functioning). Combining all sample groups, we found six cognitive subtypes ranging from very severely impaired to high functioning. Interestingly, all identified cognitive trajectories showed nominal stability, there were small learning effects over time and cognitive degeneration was not observed in the patient sample. PRS SCZ significantly associated with low cognitive function at baseline in all samples. Of note, PRS SCZ is also predicted the sixyear cognitive trajectories in patients and all combined samples. After Bonferroni correction and adjustment for non-genetic factors, only the results in all combined samples remained significant.
In patients, we found five general cognitive subtypes with substantial variability of trajectories during the six years. A cross-sectional study (Dawes et al., 2011) identified five clusters of cognitive functioning in patients with schizophrenia, but they did not evaluate the trajectory over time. In agreement with previous cross-sectional studies (Reser et al., 2015;Rangel et al., 2015;Lewandowski et al., 2018;Lewandowski et al., 2014;Geisler et al., 2015), we found four cognitive trajectory groups in unaffected siblings. Needless to mention, the present finding in patients and siblings was similar to our previous study (Islam et al., 2018) even though we followed a different method to generate the composite score of cognition and modeling trajectories. In the current study, PCA was used to construct composite scores as a measure of general cognition and, siblings and controls converted to psychosis during the six years period were included in the trajectory analysis (i.e. participants status determined at the sixth year). We further identified four cognitive trajectory groups in healthy controls. Moreover, we found six cognitive subtypes for the first time in all combined samples irrespective of their diagnostic status. To date, only one crosssectional study (Thompson et al., 2013) identified three subtypes by combining controls and patients. Overall, we distinguished more subgroups with homogenous cognitive profiles, as compared to most of the previous studies, which may be because we included large samples with a longer follow-up period and used advanced data-driven statistical methods.
We observed both normally functioning (31.1% resp. 21.3%) and severely impaired (5.7% resp. 2.7%) groups in patients as well as in healthy individuals. This supports the previous finding that patients with schizophrenia may sometimes exhibit even better cognitive performance than their matched healthy controls (Bechi et al., 2018). In addition, the presence of severe cognitive impairment in healthy controls shows that there is great variability in cognitive functioning. It also suggests that cognitive heterogeneity may not be fully explained by disease-related factors (as seen in patients) or shared genes and environmental exposures (as seen in siblings) because non-related healthy subjects are independent of disease-related and shared environmental factors. In patients, controls and all combined samples, 15.8%, 31.5% and 7% had stable cognitive trajectory during the six years. Subjects (independent from their phenotypes) stay at the same state of cognitive performance over six years. When they start in low levels of cognition, they stay low, and the same is true for other subtypes as well. Taking this into account, we have shown in the past and in this study that our primary hypothesis is confirmed and people can be categorized to certain subgroups with homogenous stable cognitive functioning. This supports the notion that cognitive function can be a suitable endophenotype in schizophrenia (Gur et al., 2006;Gur et al., 2007). It is conceivable that each cognitive subgroup may have a partly different risk-profile for schizophrenia, predict the incidence of schizophrenia at a different level and associated at a different level with disease genetic and none-genetic risk factors.
In our study, high PRS SCZ was significantly correlated with low baseline IQ (in patients, siblings and controls), verbal learning and memory (in siblings and controls), attention (only in patients) and general cognitive function (only in patients). Similarly, the domainspecific as well as general cognitive deficits in combined analyses of all samples cross-sectionally and longitudinally (also after adjustment for covariables) associated with high PRS SCZ at least for one p-value threshold. A cross-sectional study by the B-SNIP Consortium found that high PRS SCZ was significantly associated with lower general cognitive performance among healthy individuals, but not in people with psychosis (Shafee et al., 2018). Likewise, a meta-analysis found a negative significant correlation between PRS SCZ and global cognition in the general healthy population, while no significant correlation was seen in patients with schizophrenia (Mallet et al., 2020). Reports from a number of independent consortia, such as the B-SNIP Consortium (Bishop et al., 2017), COGS (Greenwood et al., 2013;Greenwood et al., 2016;Greenwood et al., 2019), COGENT (Lencz et al., 2014) and PGC, CHARGE and UK Biobank cohort (Smeland et al., 2017) revealed multiple common genetic variants between schizophrenia and various endophenotypes of cognition, such as IQ, executive function, inhibition, episodic memory, attention, learning, vigilance, working memory, verbal-numerical reasoning, reaction time and general cognitive ability in both non-clinical and clinical samples. Even though the level of evidence varies, altogether, accumulating evidence persists for an existing common genetic susceptibility between schizophrenia and cognitive impairment, yielding the higher genetic susceptibility to schizophrenia, the worse cognitive impairment is observed.
Inconsistencies between studies have been observed even though most studies showed common genetic susceptibility for schizophrenia and cognitive deficits. This difference might be due to a difference in the severity of schizophrenia across cohorts, cognitive assessment tools and their mode of administration, selection of samples (e.g. age difference) and statistical modeling (e.g. adjustment for confounding factors). We refer to one important aspect that, unlike other studies, our study tackled the confounding effect of heterogeneity by focusing on longitudinal trajectories consisting of people with homogeneous cognitive profiles that identified using data-driven methods. This approach empowers the likelihood of finding genetic associations in our study. In contrast, previous studies lumped samples from various cohorts together, which increasing sample diversity, heterogeneity of severity of illness and use of different cognitive battery tests (Mallet et al., 2020). These factors may cause sample dilution and nonsignificant or weak genetic associations, as a result. We also combined association analysis with a data-driven subtyping approach. The previous studies investigated the link between PRS SCZ and neurocognition using a traditional method (i.e. correlation or regression analysis) and interpreted for all study participants. Thus, significant associations may come from a specific subgroup of individuals who are the carrier of high-risk SNPs. Additionally, we applied a principal component analysis to obtain a composite score as a measure of general cognition based on the individualized weights of each cognitive test. Constructing a composite score reduces type 1 error by reducing the number of measures to a more manageable level. Furthermore, using composite score improves signal detection by being more sensitive to disease state and treatment effects, more highly correlated with putative biomarkers such as polygenic risk score, and being better at    predicting disease progression (Riordan, 2017). Taken together, our study identified a larger number of stable cognitive subtypes that strongly predicted by PRS SCZ at least in all combined samples. In this study, we further have seen a clear pattern of results across samples and cognitive measures that suggests specific cognitive assessments used in the present study may be differentially informative and have different liability for genetic risk of schizophrenia (Gur et al., 2006). For example, all IQ measures were significantly associated with PRS SCZ in patients and siblings. In addition, CPT-variability (a measure of attention domain) associated with PRS SCZ in patients while immediate recall (a measure of verbal learning and memory domain), and calculation and digit symbol substitution (a measure of IQ domain) associated with PRS SCZ in controls. These results showed that cognitive alteration in schizophrenia is selective as well as general and siblings' degree of proneness to cognitive impairment closer to patients than healthy controls (Shmukler et al., 2015;Petrova and Dorofeikova, 2017). Additionally, cognitive variation in patients with schizophrenia and healthy individuals arises at least in part due to shared genetic factors (Avramopoulos, 2018;Richards et al., 2019;Xavier et al., 2018). In general, we confirmed using bivariate correlation analyses that cognitive deficits are partly genetically determined in patients as well as in healthy subjects.
Our study may provide a more accurate estimate of trajectories and their genetic susceptibility through minimizing methodological biases in cross-sectional studies. The trajectory modeling approach aims to capture inherent patterns on the course of illness longitudinally by characterizing the subgroups of the population with a similar course of symptoms and examining illness characteristics and their predictors (Abdin et al., 2017). Such characterization is sensitive to between-and within-patient heterogeneity in symptoms (Thompson et al., 2013). The application of data-driven methods on endophenotypes can provide an opportunity for discovering new specific treatment strategies to subgroup of population to optimize clinical recovery, evaluating the effects pharmacological treatment and discovering gene if subtypes with different trajectory are accurately identified (Abdin et al., 2017;Austin et al., 2015;Reilly and Sweeney, 2014). Determining the existence of cognitive heterogeneity in genetically non-related healthy samples could be clinically useful and may help to identify individuals with poor cognitive function due to exposure to certain environmental factors and who need early intervention. Our study may also give insight into the molecular basis of the genetic overlap between schizophrenia and cognitive ability to understand the pathogenesis of cognitive impairment in schizophrenia. Another relevant implication of this study is that the results can be baseline evidence for imaging, molecular, and biological subtyping of cognitive impairment in schizophrenia. In addition, the results of this study could be helpful to optimize precision medicine, and thereby, to perform personalizing assessment, diagnosing a high-risk group of population and promptly treating subgroups of patients with poor outcome.
Besides the longitudinal design and application of advanced methodological and statistical tools, our sample included matched healthy controls for whom similar cognitive and genetic (PRS) data have been available, and therefore, we can directly evaluate to what extent PRS SCZ contributes to cognitive differences between those with and without the illness. The use of healthy controls sample can also ensure the validity of identified cognitive subtypes in our study. To date, only a couple of studies (Lysaker et al., 2009;Thompson et al., 2013) have used data from healthy controls or general population, where none of these studies explored the longitudinal cognitive trajectory. Apart from this, we assessed a broad range of cognitive domains. This study has also some limitations. The sample size is small in comparison to the recommended sample size of 2000 for the application of PRS-based analysis (Wray et al., 2014). Nevertheless, we directly compared results between patients, siblings, and controls, and we confirmed our results using data from a longitudinally assessed cognitive measurement. Besides, the summary statistics used to construct PRS obtained from the latest genome-wide association study of PGC. Our study has also enough power to examine the association in the current sample (Korver et al., 2012). General limitations of the PRS analysis also applies to our study, which PRS cannot provide specific associations to a single SNP, elucidate functional mechanisms, or capture the effect of common variants in sex chromosomes, and rare and structural variants. Moreover, the results in the combined sample analyses may be confounded by the case-controls status.

Conclusions
We found five cognitive subtypes in patients, four in siblings and controls, and six by combining all samples. Despite the huge variability of trajectories observed across sample groups, stability has been commonly seen in most subtypes. After adjustment for environmental factors, PRS SCZ significantly predicted poor cognitive types in the all combined samples. Our finding demonestrated the presence of genetic overlap between schizophrenia and cognitive function, and can give insight about the mechanisms of cognitive deficits. These findings can be helpful to tackle heterogeneity and initiate person-based treatment instead of symptombased treatment. Identifying subgroups with high risk polygenic load can be beneficial to perform gene fine-mapping and enrichment analyses (Lin et al., 2009). Further deep endophenotyping by performing trajectory analysis on specific cognitive domain and investigating the effect of genetic risk factors is required as we observed in general cognition. Moreover, characterizing cognitive subtypes based on polygenic risk score of cognition, educational attainment, attention deficit hyperactivity disorder (ADHD) and major depressive disorder (MDD) is needed in the present sample and can shed more light on the genetic architecture of subtypes.

Contributors
TD, BZA and RB designed the study. TD managed the literature searches and analyses. TD, BZA, SMCZ and MAI undertook the statistical analysis, and author TD wrote the first draft of the manuscript. EJL, BZA, RB and HMB revised the manuscript and provided intellectual comments. All authors contributed to and have approved the final manuscript.

Funding source
Tesfa Dejenie is supported by the scholarship for a PhD from the University of Groningen, Groningen, the Netherlands. This research was supported by Geestkracht programme of the Dutch Health Research Council (Zon-Mw) (10-000-1001), and matching funds from participating pharmaceutical companies (Lundbeck; AstraZeneca; Eli Lilly and Janssen Cilag) and, universities and mental health care organizations (Amsterdam: Academic Psychiatric Centre of the Academic Medical Center and the mental health institutions: GGZ Ingeest; Arkin, Dijk en Duin; GGZ Rivierduinen; Erasmus Medical Centre and GGZ Noord Holland Noord. Groningen: University Medical Center Groningen and the mental health institutions: Lentis, GGZ Friesland; GGZ Drenthe; Dimence; Mediant; GGNet Warnsveld; Yulius Dordrecht and Parnassia psycho-medical center The Hague. Maastricht: Maastricht University Medical Centre and the mental health institutions: GGZ Eindhoven en De Kempen; GGZ Breburg; GGZ Oost-Brabant; Vincent van Gogh voor Geestelijke Gezondheid; Mondriaan; Virenze riagg; Zuyderland GGZ; MET ggz; Universitair Centrum Sint-Jozef Kortenberg; CAPRI University of Antwerp; PC Ziekeren Sint-Truiden; PZ Sancta Maria Sint-Truiden; GGZ Overpelt and OPZ Rekem. Utrecht: University Medical Center Utrecht and the mental health institutions: Altrecht; GGZ Centraal and Delta). The sponsors have no role in designing of the study, the collection, analysis, and interpretation of data, the writing of the report and the decision to submit the paper for publication.

Declaration of competing interest
None.