The Contribution of Autism-Related Common Variants to the Familial Aggregation of Quantitative Autistic Traits

Background: Autism spectrum disorder (ASD) polygenic risk scores (PRS), derived from genome-wide association studies (GWAS), have been employed to predict ASD diagnosis in the general population (Nagelkerke’s R 2 =2.45%). Critically, large genetic epidemiological studies have demonstrated that family members of affected individuals exhibit subclinical quantitative autistic traits (QAT), suggesting that diagnosed ASD is the pathological tail of a continuous distribution of QAT. Recent molecular genetic studies have reinforced these ndings by uncovering a genetic correlation between ASD and QAT in the general population, emphasizing the importance of employing QAT in GWAS. Yet no study to date has examined the extent to which aggregated effects of common variants contributes to variance in QAT in a familial multiplex sample—one enriched for inherited ASD risk. Methods: Given the elevation of QAT among unaffected members in multiplex families, we examined the contribution of ASD-PRS to QAT, as measured by the Social Responsiveness Scale (SRS), in 1491 subjects from multiplex families in the Autism Genetic Resource Exchange (AGRE)—with (N=538) and without (N=921) an ASD diagnosis. Using the iPSYCH-ASD-GWAS as our discovery dataset, comprised of cases (N=8,605) and controls (N=19,526), we estimated how much variance in QAT that the ASD-PRS explained in our target dataset, AGRE. We also examined if there were interaction effects of diagnosis and sex. Results: The ASD-PRS explained 0.34% of the variance in QAT (p=0.017). There was no signicant interaction effect of disorder status (p=0.511), but there was a signicant interaction of sex (p=0.027) and a signicant interaction effect of ASD and sex (p=0.031). Limitations: The primary limitation of this study is the small size of the discovery GWAS (iPSYCH). Conclusions: Our results reveal that autism-related polygenic load, as measured via the ASD-PRS, signicantly predicts QAT—critically, in both affected and unaffected family members. Further, the association of ASD-PRS with QAT in females was modied by diagnosis, indicating that this relationship was the strongest among affected females.The unique nature of this study’s multiplex familial sample enables a novel demonstration that common polygenic variation in ASD contributes to familial QAT, particularly among affected females. of PRS scores and modeled ASD diagnosis in a logistic regression model. We estimated odds ratios (OR) for ASD in Quartiles 2 through 4, conducting three separate tests of each quartile, as compared to the rst (lowest) quartile (<25% PRS). Using generalized estimating equations (GEE) (as implemented in SAS Proc Genmod), the model was adjusted for PC1 – PC3, sex, and family variance. We also computed a Cochran-Armitage trend test, a chi-squared test for a linear trend across the four quartiles to test for dose response.


Introduction
The aggregation of autism spectrum disorder (ASD) in families is well established, [1] but ASD's genetic architecture remains poorly understood. Three classes of genetic risk have been identi ed as contributing to ASD: de novo mutations, rare inherited variants, and common polygenic variation. Several models, none of which are mutually exclusive, have been proposed to explain this aggregation. A major gene model hypothesizes that either one highly penetrant rare mutation or a restricted number of moderately to highly penetrant mutations contribute to the familial nature of ASD. [2][3][4] In contrast, polygenic models assume that many common inherited variants (minor allele frequency > 0.05), each of very small effect, collectively account for a substantial portion of the variance of total ASD risk and contribute to the disorder's familial aggregation. [5][6][7][8][9] Further, single-nucleotide polymorphism (SNP) data heritability estimates suggest that common variants explain a large proportion of the population-wide variance of ASD [5,7,10] as in many other common, complex, psychiatric disorders. [8, [11][12][13]. In fact, the latest genome-wide association study (GWAS) of autism has revealed ve statistically signi cant loci and a SNP-heritability of ~12%. [7] Some evidence suggests that even highly penetrant de novo mutations are operating on a background of high polygenic risk burden for ASD. [14] Yet characterizing this polygenic risk has been more challenging than the eld initially hoped. Given the underwhelming amount of variance explained by GWAS and relatively few genome-wide signi cant loci uncovered, the eld of quantitative genetics developed polygenic risk scores (PRS), which capture the aggregate effect of genetic variants (polygenicity), even those that may not have reached a stringent genome-wide signi cance threshold. A cumulative index of measured genetic liability to a disorder, PRS quanti es within an individual the aggregate effect of common variants for a given trait, typically calculated as the sum of trait-associated alleles across the genome, weighted by effect size. [15,16] Both rare and common genetic variation act additively to contribute to ASD risk in simplex (one clinicallyaffected individual) and multiplex (more than one clinically-affected individual) families, though evidence also indicates that there exist at least partially disparate genetic pathways in simplex versus multiplex families. A growing body of literature suggests that in simplex families, relatively rare, de novo, more highly penetrant and deleterious genetic mutations are present, whereas in multiplex families, evidence suggests a more highly polygenic, common variant burden. [10,17,18] Supporting a polygenic (additive genetic) model in multiplex families is the fact that relatives of children with ASD demonstrate elevated quantitative autistic traits. [19][20][21] In fact, rst-degree relatives without an ASD diagnosis in multiplex families exhibit elevated quantitative autistic traits (QAT; subthreshold autistic-like de cits in social interaction and communication, behavioral rigidity, restricted interests and repetitive behaviors not severe enough to meet a relatively arbitrary cut-off for clinical diagnosis) [19,20], yet this pathological shift has not been found in rst-degree relatives of probands in simplex families. [19] Family and twin literature supports the theory that clinical diagnosis of ASD is often the pathological tail of a continuous distribution of heritable quantitative autistic traits. [22][23][24] Given the clinical and epidemiological support for an additive, polygenic model of ASD, molecular genetic studies have begun to interrogate the relationship between diagnosed ASD and QAT to determine their genetic overlap. Employing a general population sample, Robinson and colleagues [25] found a genetic correlation (SNP-rg) of ~.3 between ASD and a quantitative measure of autistic traits, the Social Communication Disorders Checklist (SCDC). However, the SCDC captures only de cits in social communication and interaction and does not evaluate the other established domain of ASD symptomology: restricted interests and repetitive behaviors. Given this genetic overlap, St. Pourcain and colleagues found in a UK population-based study that ASD-PRS are higher in ASD cases than in pseudocontrols and were associated with variation in SCDC scores, but only at 8 years (PGC-ASD: adjusted R 2 max = 0.13%, P min = 0.0042).
[26] Bralten and colleagues employed a novel questionnaire of autistic traits in a Dutch general population sample to evaluate genetic overlap of ASD with each of the individual traits assessed by the questionnaire as well as employ an ASD PRS to predict phenotypic variance in autistic traits. They found small, yet statistically signi cant overlap with and prediction of some of the autistic traits (R 2 values ranging from .17-.54% for some subscales). [27] Yet to our knowledge, no study to date has employed ASD-PRS in a multiplex familial sample to predict variation in a psychometricallyestablished valid and reliable phenotype that captures the continuous variation in trait distribution encompassing social communication and interaction and restricted interests and repetitive behaviors. A multiplex familial sample is an ideal sample in which to interrogate this question, as these individuals are more likely to carry risk alleles. To that end, we employed the Social Responsiveness Scale (SRS) [28], a comprehensive measure of QAT, in a multiplex family sample.
We hypothesized that common variants of small individual effect sizes contribute to the familial aggregation of QAT. To examine this question, we constructed a polygenic risk score (PRS) from the iPSYCH ASD GWAS, comprised of cases (N=8,605) and controls (19,526). [29] We then investigated the contribution of common polygenic variation to QAT in our multiplex family cohort. Our primary question was to determine how much variance in QAT phenotype an ASD-PRS derived from a case-control GWAS could explain in a multiplex familial sample. Secondary analyses were conducted to determine if there were interaction effects of diagnosis and sex in our sample. [30,31]

Discovery GWAS
This study's discovery dataset is a previously-published GWAS of ASD (iPSYCH) [29] based on a Danish nationwide population-based cohort [30] including individuals born in Denmark between 1981 and 2005 and diagnosed with ASD prior to 2014. Cases comprised subjects with ASD as the only ascertained diagnosis (classi ed in accordance with the International Statistical Classi cation of Diseases and Related Health Problems, 10 th revision (ICD-10)) (N=8,605 of 12,371 possible ASD cases; ICD codes F84.0, F84.1, F84.5, F84.8 and/or F84.9), and controls (N=19,526) were a randomly-selected subset of the cohort without diagnosis (ICD F00-F99). iPSYCH was chosen as the source GWAS over the combined iPSYCH/Psychiatric Genomics Consortium (PGC) meta-analysis GWAS to avoid overlap with the AGRE target data which is included in the PGC dataset.

AGRE Cohort
This study's target dataset was drawn from Autism Speaks' Autism Genetic Resource Exchange (AGRE), the largest private repository of genetic and phenotype data of families with ASD. [31] The sampling bias of the collection is inclusion of multiplex families. The original AGRE genomic dataset comprised 2303 individuals with both GWAS and SRS data. As outlined below, we excluded any non-European subjects to ensure genetic homogeneity and reduce risk for population strati cation. Individuals with known chromosomal or neurogenetic abnormalities (e.g., DiGeorge, Fragile X), as well as non-verbal subjects were also excluded. This nal AGRE dataset comprised 1491 individuals (206 of whom were parents) in 663 families (ages ranged from 1.2 to 79 years); 1459 subjects contributed to the analysis of ASD diagnosis (we did not have data regarding diagnosis for 32 subjects).

Social Responsiveness Scale (SRS)
The quantitative trait measure utilized is the Social Responsiveness Scale (SRS) [28]. The SRS is a 65item measure of reciprocal social behavior, de cits in which are characterized as quantitative autistic traits (QAT). The SRS capitalizes on observations of individuals in naturalistic social contexts, by parent and/or teacher report. Its internal consistency is very high (alpha = ~.95), and it distinguishes ASDaffected individuals from controls with a Cohen's d effect size of ~2.7 and from individuals with other psychiatric conditions with an effect size of d= ~1. [28,32] It characterizes variation in the two Diagnostic and Statistical Manual of Mental disorders, 5 th edition (DSM-5) core domains of social communication and interaction and restrictive interests and repetitive behaviors. Reciprocal social behavior, as measured by the SRS, is continuously distributed in the general population and highly heritable throughout the range observed from unaffected to sub-clinically affected to fully ASD-affected individuals Individuals' scores on the SRS were obtained from the Internet System for Assessing Autistic Children (ISAAC) database (https://www.autismtools.org/). When longitudinal scores were available, the earliest assessment was employed. Total raw teacher report scores were prioritized, with parental report scores used when teacher scores were unavailable, as has been done previously. [21] Typical norming and adjustment by sex was not done in order to stratify by sex and test interactions with sex in downstream correlation analysis and regression models.

ASD Diagnosis
Diagnoses of subjects with ASD were established with both the Autism Diagnostic Interview-Revised (ADI-R) [33]and Autism Diagnostic Observation Schedule (ADOS) [34], which are the current gold standard instruments for diagnosing individuals with ASD. Parents not in the AGRE pedigree dataset were counted as unaffected.
Genotyping, Quality Control, and Imputation Genotyping, quality control, and imputation for the original AGRE dataset have been described elsewhere. [21] This AGRE dataset consisted of 2303 individuals for whom imputed genotypes comprising 5.8M autosomal variants were available. [35] Genetic Ancestry Principal Components Analysis In order to create an ancestrally homogenous sample and reduce false positives generated by population strati cation, we performed an ancestral principal components analysis (PCA) using Eigensoft software.
[36] We employed 1000 Genomes Phase 3 (1KG) reference panels (European (EUR), ad-mixed American (AMR), and African (AFR)) and projected the resulting eigenvectors onto AGRE subjects with both SRS and GWAS data (N=2303), resulting in factor scores for the rst three principal components (PC1 -PC3 ). Using standard Eigensoft protocol, this analysis yielded a sample of 1491 subjects of European-descent ancestry. PC1 -PC3 were included in all regression models to control for bias due to subpopulations within EUR ancestry.

Calculation of PRS
To calculate the ASD-PRS, we used the SNP effect sizes estimated for common variants from the previously-published iPSYCH GWAS of ASD, [30] to ensure the validity of the PRS by estimating effect sizes in an independent cohort. To reduce the multiple testing burden associated with PRS at multiple thresholds, we used the p-value of 0.1 as our a priori threshold as it was the most predictive in our independent source dataset, as well as ltering by imputation information < 0.70, resulting in 745,190 SNPs. We then took the intersection of variants from the iPSYCH ASD source GWAS that overlapped with the imputed variants from our AGRE target dataset (465177 SNPs). Next, we reduced the list of intersecting variants to an independent set by performing LD-clumping (r 2 < 0. 15  in the AGRE dataset weighted by the betas from the discovery iPSYCH GWAS at the same SNP. The PRS was standardized to mean=0, standard deviation (SD)=1 in order to interpret the regression coe cient as the change in SRS score for one SD increase in PRS.

Computation of Odds Ratios
We grouped subjects into quartiles of PRS scores and modeled ASD diagnosis in a logistic regression model. We estimated odds ratios (OR) for ASD in Quartiles 2 through 4, conducting three separate tests of each quartile, as compared to the rst (lowest) quartile (<25% PRS). Using generalized estimating equations (GEE) (as implemented in SAS Proc Genmod), the model was adjusted for PC1 -PC3, sex, and family variance. We also computed a Cochran-Armitage trend test, a chi-squared test for a linear trend across the four quartiles to test for dose response.

Correlation Analyses
As a preliminary step and for purposes of data visualization, unadjusted Pearson correlations between SRS and PRS were computed in the total sample and strati ed by sex and affected status. Graphs were produced with R version 3.4.1 Copyright (C) 2017. [41] Linear Mixed Model Regression We employed a multiple linear mixed model (LMM) regression analysis because our sample comprised multi-level data in which subjects existed in groups (i.e., families), where the SRS phenotype is expected to correlate within families as well as between families. First, to account for relatedness in the AGRE family sample, we calculated a genetic relatedness matrix (GRM) using Plink 1.90 [37]. The GRM was then entered into all models as a random effect, implemented in SAS Proc Mixed [42], to adjust the residual variance for relatedness. Age was not included in our modeling as a covariate because SRS scores are known to be invariant to age effects [40]. For our primary model estimating the variance of QAT explained by PRS (Model 1), the xed effects comprised PC1 -PC3, sex, and PRS. We computed three other models in which we tested the interactions of PRS*sex (Model 2), PRS*ASD (Model 3), and PRS*sex*ASD (Model 4). To properly control for confounders in models including 2-way interactions, we also included covariate x environment and covariate x gene interaction terms into the models [43] (see Table 4 for a breakdown of all interactions included in the modeling).
To estimate the variance explained, we t a series of full and reduced models, as outlined in Table 4, and from these multi-level models, we then calculated a conditional pseudo-R 2 using SAS mixed t, as traditional R 2 statistics are not available for LMM due to the random effect for GRM . [44] We reduced the full model by eliminating the xed effect or interaction term of interest and measured the difference in pseudo-R 2 between the full model and this reduced model. This change in R 2 (ΔR 2 ) serves as our estimate of the variance explained by the xed effect (PRS) or interaction term.
Datasets iPSYCH discovery GWAS dataset: https://github.com/mgandal/Shared-molecular-neuropathologyacross-major-psychiatric-disorders-parallels-polygenic-overlap/tree/master/raw_data/GWAS AGRE target dataset: obtained from author JL The sample's descriptive statistics are outlined in Table 1. The distributions of the PRS and SRS observed in the sample strati ed by sex are shown in Figures 1 and 2, respectively; and the distributions of PRS and SRS strati ed by affected status are depicted in Figures 3 and 4, respectively. The unadjusted mean PRS scores of the total sample, strati ed by sex and diagnostic status are reported in Table 2. n=1459 subjects contributed to strata with ASD. Mean PRS scores, stratified by sex and ASD; *N=1491 subjects contributed to sex strata; n=1459 subjects contributed to strata with ASD.

ASD as Outcome
Prior to QAT analyses, we conducted a preliminary set of analyses with ASD diagnosis as our outcome measure to verify foundational hypothesized effects. To determine if odds ratios (OR) for ASD diagnosis increased with greater polygenic load, we separated individuals from the AGRE cohort into quartiles of PRS risk. Using a logistic regression model (adjusted for sex, PC1 -PC3, and family variance) to test for association between the PRS quartiles and diagnostic status, we found that OR in each PRS quartile increased with greater polygenic risk. As outlined in Table 3

Association between PRS and SRS
As a preliminary step in our analyses, we calculated descriptive statistics to visualize the data. The

Variance in QAT Explained by Common Variation in ASD Risk
In our primary model, Model 1, to estimate the variance explained by PRS, we compared the pseudo-R 2 of a full model (PC1 -PC3, sex, and PRS) to the reduced model (eliminating PRS) (  Figure 7, this 3-way interaction demonstrates that the association between ASD-PRS and QAT was most pronounced in affected females. All modeling estimates are outlined in detail in Table 4.

Discussion
Our results reveal that autism-related polygenic load, as measured via the ASD-PRS, not only signi cantly predicts ASD but is also related to QAT, as assessed using the SRS-critically, in both affected and unaffected family members. The uniqueness of this study is the application of ASD polygenic risk predictors to a multiplex family sample, enriched for inherited liability to ASD, incorporating both affected and unaffected family members who were uniformly phenotyped using a standardized, comprehensive measure of QAT. These ndings corroborate a large body of genetic epidemiology research which has established a biological correspondence between clinically diagnosed ASD and subclinical QAT in family members of an affected individual. Although the variance explained was modest, its magnitude was in keeping with other analyses of social behavioral variation in the general population samples. [25][26][27] Critically, there was no statistical difference in variance predicted in unaffected vs. affected subjects, suggesting that both re ect the genotype-phenotype association indexed by the iPYSCH PRS. It is likely that the study design resulted in individuals without a diagnosis being enriched for PRS load relative to population controls. Despite the lack of moderation of ASD-PRS by diagnosis, the association of ASD-PRS with QAT was the strongest in affected females, indicating that the links between ASD genetic susceptibility and QAT may be most pronounced in females with an ASD diagnosis.
Our results emphasize the importance of considering and incorporating trait variation in (a) the general population and (b) unaffected members of ASD-affected families to improve understanding of the genetic architecture of ASD and risk prediction. Given the mounting evidence that clinically diagnosed ASD is the extreme expression of one or more heritable quantitative traits (at least as it pertains to common variant genetic risk), a failure to capture subclinical variation in QAT and related phenotypes in individuals classi ed as controls may confound efforts to identify genetic variants associated with ASD. Moreover, greater speci cation of polygenic risk among individuals with rare genetic disorders may improve efforts to predict variation in expression of deleterious variants and thereby disease severity in affected individuals. [45] In light of these complexities, moving away from categorical classi cation of disease and employing quantitative approaches to understanding the origins of ASD may reveal a more complete picture of the mechanisms of inheritance in ASD. Uncovering the speci c polygenic contributors to ASD is already beginning to reveal a molecular basis for cross-disorder aggregation in families. Tracing the effects of polygenic risk to speci c early neurodevelopmental precursors of ASD may identify pro les of allelic variation that map to disparate developmental liabilities, for which speci c combinations or permutations give rise to ASD [46], and according to which an individual might be typed for personalized approaches to prevention or therapy. Furthermore, the inclusion of SNPs which are primarily associated with other disorders and traits can actually improve the predictive ability of the ASD-PRS (e.g., SNPs implicated in schizophrenia, depression, IQ). [7] Therefore, characterizing patients and controls with respect to inherited traits that index genetic liability to ASD-whether or not those traits are speci c to ASD-is expected to encompass more of the variance of the condition than can be identi ed by simply contrasting the genetic pro les of individuals with and without the diagnosis of ASD. Finally, quantitative approaches may also provide a means by which to classify more homogeneous subgroups of patients re ecting underlying biology, thereby allowing the pursuit of underlying neural mechanisms and pathways to be conducted with higher precision.
An important future direction suggested by these results is to determine whether the predictive power of polygenic risk signals for autism might be enhanced by including cohorts of unaffected individuals quantitatively phenotyped for variation in QAT in GWAS discovery sets. A recently-identi ed nuance of quantitative phenotyping in autism is that the measurements may be signi cantly less heritable as the threshold for clinical-level affectation is approached: quantitative trait scores in the general population exhibit heritability estimates on the order of 0.85 compared to 0.25 for quantitative measurements of symptom severity (across the wide range of affectation) among clinically-diagnosed individuals. [47] A possible implication, then, would be a paradoxical improvement in statistical power for genotypephenotype association in quantitative trait analyses when sampling from populations unaffected by clinical-level aggregation for the trait of interest.

Limitations
As with all complex diseases and traits, a limitation of this study is the proportion of variance explained by the discovery GWAS, likely due at least in part to the small sample size. The largest ASD GWAS (iPSYCH-PGC) is still relatively small (N = 18,381 individuals with ASD and 27,969 control) in comparison to GWAS of other traits, diseases, and psychiatric disorders (e.g., schizophrenia with N = 36,989 cases and 113,075 controls); we intentionally limited the total available discovery set by exclusion of the PGC sample due to its inclusion of some AGRE subjects and the consequent inability to specify with certainty a PGC subset devoid of overlap with the families in our AGRE replication cohort. As ASD GWAS sample sizes increase, the predictive power of PRSs is expected to improve, and the estimation of common variant effect sizes to become more precise. Future studies using a PRS based on weights from a larger GWAS sample are likely to capture even more of the variance in QAT and related, overlapping traits.

Conclusions
In conclusion, we demonstrated for the rst time that a PRS for clinical ASD is associated with variation in QAT above and below the threshold for diagnosis, using a quantitative phenotype that is known to speci cally index familial liability for autism. This was observed in a sample enriched for the inheritance of clinical autistic syndromes. Our study provides a critical replication of an observed polygenic risk signal for ASD in an independent sample, and supports the contribution of common polygenic risk to the aggregation of clinical and sub-clinical autistic traits in the population. These ndings underscore the importance of employing quantitative approaches in future genetic studies of ASD and other neuropsychiatric disorders, speci cally incorporating the measurement of quantitative variation in behavioral traits that exhibit genetic overlap with autism, not only among probands, but among unaffected relatives and controls. Given the modest amount of variance explained in QAT by ASD-PRS, it is possible that future GWAS of QAT will reveal novel signals which will enhance our understanding of the genetic architecture and underlying biology of ASD and related phenotypes.

Consent for publication
Not applicable.

Availability of data and materials
The data that support the ndings of this study are available from AGRE, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of AGRE.
Competing Interests JNC receives royalties from Western Psychological Services for the commercial distribution of the Social Responsiveness Scale; however, no royalties were generated from research implementation of the SRS in the AGRE registry. All other authors report no financial relationships with commercial interests.

Funding
Supported by ACE Network grants R01 MH081754 and R01 MH100027 and ACE Center grants P50 HD055784 and R01 MH071425 to Dr. Geschwind; by grant R01 HD042541 and IDDRC Center grant P30 HD062171 to Dr. Constantino. The Autism Genetic Resource Exchange is a program of Autism Speaks and was supported by NIMH grant U24 MH081810.
Authors' Contributions JNC conceptualized the study and contributed to sample accrual. JKL and DHG participated in sample assembly, advised analyses, and secured grant funding. JNC and REW determined statistical design and analyses, and AA and ECJ contributed substantial statistical and procedural advice. WH conducted the data processing and statistical analyses. JNC, REW, AA, and ECJ analyzed and interpreted the data. REW drafted the manuscript, AA, JKL, ECJ, and JNC critically revised the manuscript, and REW nalized the manuscript. All authors read and approved the nal manuscript. Figure 1 Distribution of relative frequency of PRS scores, strati ed by sex Distribution of relative frequency of SRS scores, strati ed by sex  Distribution of relative frequency of SRS scores, strati ed by ASD status  Correlation of PRS with SRS scores, strati ed by ASD status Correlation of PRS with SRS scores, strati ed by sex and ASD status