Edinburgh Research Explorer Genome-wide Association for Major Depression Through Age at Onset Stratification

BACKGROUND: Major depressive disorder (MDD) is a disabling mood disorder, and despite a known heritable component, a large meta-analysis of genome-wide association studies revealed no replicable genetic risk variants. Given prior evidence of heterogeneity by age at onset in MDD, we tested whether genome-wide signi ﬁ cant risk variants for MDD could be identi ﬁ ed in cases subdivided by age at onset. METHODS: Discovery case-control genome-wide association studies were performed where cases were strati ﬁ ed using increasing/decreasing age-at-onset cutoffs; signi ﬁ cant single nucleotide polymorphisms were tested in nine independent replication samples, giving a total sample of 22,158 cases and 133,749 control subjects for subsetting. Polygenic score analysis was used to examine whether differences in shared genetic risk exists between earlier and adult-onset MDD with commonly comorbid disorders of schizophrenia, bipolar disorder, Alzheimer ’ s disease, and coronary artery disease. RESULTS: We identi ﬁ ed one replicated genome-wide signi ﬁ cant locus associated with adult-onset ( . 27 years) MDD (rs7647854, odds ratio: 1.16, 95% con ﬁ dence interval: 1.11 – 1.21, p 5 5.2 3 10 -11 ). Using polygenic score analyses, we show that earlier-onset MDD is genetically more similar to schizophrenia and bipolar disorder than adult-onset MDD. CONCLUSIONS: We demonstrate that using additional phenotype data previously collected by genetic studies to tackle phenotypic heterogeneity in MDD can successfully lead to the discovery of genetic risk factor despite reduced sample size. Furthermore, our results suggest that the genetic susceptibility to MDD differs between adult- and earlier-onset MDD, with earlier-onset cases having a greater genetic overlap with schizophrenia and bipolar disorder. Lundbeck and Eli Lilly. All other authors report no biomedical ﬁ nancial interests or potential con ﬂ icts of interest.

Major depressive disorder (MDD) is a highly prevalent and heterogeneous disorder (1). With most individuals experiencing recurrent episodes throughout life (2), MDD is now the second leading cause of disability worldwide (3). MDD is defined by low mood and energy, inability to experience enjoyment, changes to eating and sleep patterns, feelings of guilt or worthlessness, and suicidal thoughts (4). Along with excess mortality and increased risk of suicide (5), MDD is associated with worse clinical outcomes when comorbid with health problems such as cardiovascular disease and cancer (6,7). Although the heritability is estimated at 31% to 42% (8), the causal variants remain elusive: a recent large mega-analysis with over 9000 MDD cases failed to identify any replicable associations (9), despite successes in similarly sized studies of schizophrenia (SCZ) and bipolar disorder (BPD) (10,11). This lack of biological markers may be among the causes for the well-established underfunding of research into MDD relative to its economic and health burden (12) and the reported stigmatization of sufferers (13).
However, several differences exist between MDD and other psychiatric disorders in which replicable genetic associations have been identified, including higher prevalence, greater diagnostic uncertainty, lower heritability, and, crucially, increased heterogeneity. One known source of heterogeneity that may contribute substantially is age at onset (AAO). Onset can occur at any stage of life, yet many factors associated with MDD are either age specific or age restricted. These include biological events such as puberty, menopause, and dementia, and environmental risk factors including childhood maltreatment, childbirth, and divorce. Earlier onset is associated with increased risk in first-degree relatives and with higher heritability (14)(15)(16)(17)(18). Considerable differences in the transmission of earlyversus late-onset MDD have also been reported (19,20), with some studies suggesting the effects of novel genetic risk factors for MDD appearing later in life (21,22).
Here we build on the previous mega-analysis of the Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium (PGC-MDD), using AAO to stratify cases within a sample of 8920 cases and 9521 control subjects, with the goal of reducing heterogeneity. For each of the nine PGC-MDD samples (23)(24)(25)(26)(27)(28)(29)(30), cases were ordered by AAO within study and divided into eight groups (octiles). Genome-wide association analysis of cases in these octiles was performed systematically against control subjects for 1,235,109 autosomal single nucleotide polymorphisms (SNPs). We examined three analytic strategies: 1) genetic variants specific to early-onset MDD; 2) risk variants specific to lateonset MDD; and 3) restricting to the intermediate four octiles excluding the 25% of cases at either extreme of AAO to test for potential heterogeneity introduced from very early or very late onset. Significantly associated SNPs were taken forward for replication in nine studies comprising 13,238 cases and 124,230 control subjects. We also examined the differences between early-and late-onset MDD in their shared heritability with commonly comorbid disorders of schizophrenia, bipolar disorder, Alzheimer's disease, and coronary artery disease to identify differences in genetic etiologies across onset groups.

Description of Samples
Full details of the studies that form the PGC-MDD are given in the supplementary materials of the original data analysis (9). Briefly, these nine MDD studies (23)(24)(25)(26)(27)(28)(29)(30) conducted genomewide genotyping on individual subjects of European ancestry. Subjects were required to have diagnoses of DSM-IV lifetime MDD established using structured diagnostic instruments from direct interviews by trained interviewers or clinicianadministered DSM-IV checklists. Two studies required recurrent MDD and one study required recurrent, early-onset MDD. Studies ascertained cases mostly from clinical sources, and control subjects were largely randomly selected from the population and screened for lifetime history of MDD. This led to a total of 9238 cases and 9521 control subjects with genotype information.

AAO Phenotype
AAO was defined as the age at which individuals first had symptoms that met the criteria of MDD and was self-reported in all studies. Of the original 9238 cases included in the sample, 8920 (95.6%) had a reported AAO. Cases reporting an AAO older than the recorded age at interview were removed from the analysis (n = 17). Within each study, cases were ordered by AAO and then divided into octiles, giving approximately 1000 cases per octile. Octiles were defined within each study to account for differences in case ascertainment. We noted a wide range in AAO between studies ascertaining recurrent depression using the same instrument, indicating that the precise setting (study, clinic, country) was important; we therefore chose to order cases by AAO within each study, rather than across studies, or by absolute AAO cutoffs. This strategy will identify genetic variants that were specific to early or late onset, relative to the mean AAO of the recruited cases. For secondary analysis of sex-specific effects, octiles were additionally defined within only male and female subjects for analysis of sex-specific effects, and for recurrent depression that has a higher heritability. These octiles will be referred to as O1 to O8, with O1 representing the earliest onset octile and O8 representing the latest onset octile. The GenRED (Genetics of Recurrent Early-Onset Depression) study only recruited MDD cases with an AAO below 31 years (25). We compared the distribution of AAO in GenRED with other similar studies (STAR*D [Sequenced Treatment Alternatives to Relieve Depression], RADIANT-UK) and estimated that GenRED recruited the youngest cases (62% of all possible MDD cases), with older cases absent from the study (Supplemental Figure S1). GenRED cases were ordered by AAO and assigned to the appropriate octiles O1 to O5, with no cases present in O6 to O8.

Quality Control
Genotyping was described in the supplementary materials in the original analysis (11). All samples were genotyped with SNP arrays of at least 200,000 SNPs. SNPs were removed for missingness .0.02, case-control difference in SNP missingness .0.02, SNP frequency difference from HapMap3 (phase three of the International HapMap Project) .0.15, or deviation Genomic Analysis of Age at Onset in Depression from Hardy-Weinberg equilibrium in control subjects (exact p , 1.0 3 10 -6 ). Subjects were removed for excessive missingness (.0.02), for being identical or closely related to any subject in any sample (p̂. .2 based on common autosomal SNPs), or if there was evidence for diverging ancestry. Ancestry was estimated using multidimensional scaling applied to 8549 SNPs directly genotyped in all samples and in approximate linkage equilibrium. Imputation was performed using Beagle 3.0.440 (31) with the CEU HapMap3 data (32) to impute 1,235,109 autosomal SNP allele dosages. The first 20 ancestry-informative principal components were included as covariates, along with an indicator for each study.

Genome-wide Association Analysis
Genome-wide association analysis was performed in PLINK using logistic regression to test the association between casecontrol phenotype and imputed SNP dosages under an additive model (33). Genotyping coordinates are given in NCBI Build 36/UCSC hg18 (National Center for Biotechnology Information, Bethesda, MD). Quality control was conducted separately for each sample. To test for SNPs associated with distinct aspects of MDD based on AAO, we performed three hypothesis-driven analyses (Supplemental Figure S2). The first analysis targeted those SNPs associated with early-onset MDD in a series of genome-wide association analyses, initially looking at the earliest onset cases (O1) against all control subjects, then the combined O1 and O2 cases against control subjects, then O1 to O3, etc., until all cases were included. This approach was based on the sequential additions method (34), which was developed to account for a quantitative trait that is measured in cases but not control subjects and provide an estimate of the best phenotype definition for future studies. The second analysis was similarly performed to examine those SNPs associated with later-onset MDD, but reversing the procedure (i.e., first looking at the latest-onset cases [O8] against all control subjects, then O7-O8 against control subjects, then O6-O8, etc., until all cases were included). The third analysis tested whether the extremes of AAO, both early and late, were introducing heterogeneity to the cases excluding O1 to O2 and O7 to O8, leaving those cases with onset within the interquartile range of AAO for each study (O3-O6). We then tested O3 to O6 cases against all control subjects. Each analysis was performed using all cases, male cases only, female cases only, and recurrent cases only. Analyses of all, or almost all, cases (O1-O8, O1-O7, or O2-O8) were used to identify SNPs that reached greatest significance without an AAO-specific effect and so were omitted from further analysis as would have been captured in the primary analysis of this data. In total, 52 genome-wide analyses were performed [5 (6 1 6 1 1) 3 4], making the standard multiple testing threshold of p , 5.0 3 10 -8 for genome-wide significance anticonservative. We applied a Bonferroni correction for 52 analyses to genome-wide significance, which is highly conservative because many analyses were highly correlated. For replication, we selected SNPs with p , 9.5 3 10 -10 in the discovery sample in more than one analysis (either by sex or recurrence or inclusion of octiles); the combination of cases that yielded the greatest significance was chosen as the basis for replication. This analysis strategy uses AAO as a stratifying variable to construct subsets of cases that may be more homogeneous and identify SNPs that are associated with susceptibility to MDD with a restricted AAO; it does not identify SNPs that control MDD AAO, which would require a case-only analysis of AAO as a quantitative trait.

Replication Analysis
Five replication samples used in the primary analysis of this dataset had AAO information available [TwinGene (35), GenREDII/ DepGenesNetwork (25), deCODE (9), PsychCoLaus (36), SHIP-LEGEND (Study of Health in Pomerania-Life-Events and Gene-Environment Interaction in Depression) (37)]. The GenPod/NEW-MEDS and Harvard i2b2 studies, which appeared in the replication of the primary analysis of this dataset, did not have AAO data available and were not included. Four new replication studies were available: 1) a collection of samples available through the University of Munster (38,39); 2) a combination of RADIANT cases from Denmark (40), the Danish DEMO and PRISME studies of MDD (41,42), and a set of Danish population control subjects; 3) the CONVERGE (China Oxford and VCU Experimental Research on Genetic Epidemiology) study of MDD cases and control subjects recruited in China (18,43,44); and 4) the Generation Scotland study, which included measures on MDD (45). These are outlined, alongside the definitions of AAO, in Supplemental Methods. Due to the early median AAO in the GenREDII/ DepGenesNetwork, an artificial "median" was introduced at age 27 years, based on the median for the discovery samples. Those SNPs that passed our threshold of p , 9.5 3 10 -10 in the discovery sample were tested for association within these nine replication studies. Due to differences in availability of genomewide genotype data, each study was genotyped and/or imputed separately. A fixed-effect inverse variance based meta-analysis of the replication studies was performed using METAL (46).

Polygenic Analysis
We also examined the association between early-and late-onset MDD and polygenic risk scores for other psychiatric disorders, as this might reflect either shared genetic etiology or phenotypic contamination (47). Polygenic risk scores for schizophrenia (9379 cases and 7736 control subjects) and bipolar disorder (6990 cases and 4820 control subjects) were created by the PGC, using imputed data and removing overlapping control subjects, ensuring that the datasets were completely independent (10,11). Alzheimer's disease polygenic risk scores were obtained from the GERAD (Genetic and Environmental Risk for Alzheimer's disease) Consortium (48) and the coronary artery disease scores from the CARDIoGRAM Consortium (49). The imputed GERAD sample comprised 3177 cases and 7277 control subjects, and the CARDIoGRAM consortium consisted of 22,233 cases and 64,762 control subjects. All four disorders were chosen for their previous genetic and epidemiological evidence of overlap with MDD (50)(51)(52).
Polygenic risk scores were calculated for MDD cases and control subjects, summing the number of risk alleles carried, weighted by the natural log of their odds ratio in the original genome-wide association study (GWAS). Score SNPs with low minor allele frequency (,0.02) or in the major histocompatibility complex region were removed, and score datasets were

Summary of AAO
After quality control, the sample consisted of 8920 MDD cases with AAO information and 9519 control subjects with 1,235,109 SNPs. The median AAO across all cases excluding GenRED was 27 years old (interquartile range: 18, 38) and the mean 28.9 613.64, reflecting the long tail of older onset cases ( Figure 1 and Supplemental Figure S3). Median AAO by study ranged from 20 to 37 years (16 for GenRED, which recruited only recurrent cases with onset no greater than 30), with the German samples having a slightly older onset. Mean AAO was lower for recurrent than nonrecurrent cases, 27.1 to 33.0 (p , .001, correcting for study and excluding GenRED). AAO was also older for male subjects with a mean of 29.0 compared with 26.7 in female subjects (p , .001).

Genome-wide Association Analysis
Our analysis of both early-and late-onset octiles and the intermediate median of AAO excluding extreme-AAO cases led to four tests passing our significance cutoff for replication (p , 9.5 3 10 -10 ), all for associations with SNP rs7647854 on chromosome 3. The only genome-wide significant association for this SNP was in the 50% oldest onset cases against all control subjects (O5-O8: p 5 3.4 3 10 -11 ) (Supplemental Figures  S4 and S5). As a secondary analysis of this SNP, we split cases within our discovery sample into nonoverlapping quartiles and analyzed them against control subjects. This showed a strong association in the oldest quartile (O7-O8: . Furthermore, when the inclusion threshold for AAO was increased 1 year at a time within each study (until only 100 cases remained), we observed a gradual increase in effect size for rs7647854 in all studies as the analysis was restricted to progressively later onset cases (Supplemental Figure S6). Analysis by sex or recurrence revealed no additional findings.

Replication Analysis
Given the results from our discovery dataset, rs7647854 was taken forward for replication in nine independent studies ( Table 1). The SNP was tested for association with oldest half of MDD cases (O5-O8), because this analysis attained the smallest p value in the discovery studies. The SNP was either directly genotyped or imputed at high confidence across all studies (Supplemental Table S1). The SNP was significantly associated with MDD in a meta-analysis of these studies, with a p value of 7.5 3 10 -4 and an OR of 1.10 (total number of MDD cases 5 6107 and total number of control subjects 124,230) ( Table 2; Supplemental Figure S5). Meta-analysis of the combined discovery sample with individual replication studies gave a p value of 5.2 3 10 -11 and an OR of 1.16 (95% CI: 1.11-1.21), surpassing genome-wide significance.

DISCUSSION
Our analysis of AAO and the genetic architecture of MDD suggested that AAO-specific genetic risk factors exist. Unexpectedly, the strongest associations we observed were with the oldest half of MDD cases, where previous analyses have focused on early-onset MDD, supported by studies of the genetic epidemiology of MDD. We emphasize that here "late"onset MDD was at still a relatively young age (median onset of MDD was at age 27 years with 98% of our sample having onset before 60 years) rather than onset in old age. It is clearer to conceptualize later-onset cases here as "adult-onset" cases, and we recommend that this subgroup should be included in recruitment of future genetic studies, especially as no difference in the heritability captured by SNPs was observed for this subtype compared with early-onset cases (see Supplemental Table S3). However, substantial differences in AAO across studies make it difficult to put a precise age cutoff for this recommended adult-onset MDD. We show significant association with rs7647854 on chromosome 3, which was associated with the 50% latest-onset cases (OR: 1.16, p 5 5.2 x 10 -11 ). This SNP was identified as the second strongest association in the primary mega-analysis of the PGC-MDD dataset (9), though at a much lower significance, and did not replicate (discovery p 5 6.5 3 10 -7 , replication p 5 .67). Stratifying by AAO leads to much stronger associations in both the discovery and replication samples, despite the reduced sample size. Although analyzing across multiple overlapping AAO subtypes runs the risk of overfitting to maximize significance, we observed highly significant associations for this SNP in both discovery and replication samples. Furthermore, we also observed a gradual increase in effect size as both raw AAO within studies and AAO percentile across studies increased (Figure 2, Supplemental Figure S4). That this was seen both for raw AAO within studies and as a percentile, and in both European and Chinese ancestry studies across a wide variety of AAO measurement tools, suggests that whereas our cutoff of the 50% latest onset cases is arbitrary, the effect exists regardless of exact cutoff used. However, this does not rule out the possibility that AAO may be a proxy for a more homogenous subgroup of MDD based on another factor (e.g., an age-specific environmental trigger or distinct pattern of symptoms with onset later in life). That median AAO and effect size across studies was not significantly correlated (r 5 .011, p 5.68) despite greater effect sizes by AAO percentile within studies suggests that differences in measurement of AAO across studies might obscure effects. rs7647854 is intergenic, with flanking genes including C3orf70, VPS8, EHHADH, and MAP3K13. C3orf70, VPS8, and MAP3K13 all show evidence of expression in the brain in several areas of potential interest for MDD including the hypothalamus, frontal cortex, pituitary, and thyroid (GTEx; http://www.gtexportal.org/) (53). VPS8 and MAP3K13 also show a slight increase in RNA expression within various brain regions during neonatal development (Human Brain Transcriptome; http://hbatlas.org/) (54) with maintained expression into adulthood.
This genetic association arises with the supposedly less heritable form of adult-onset MDD, although such summary measures give no information on the effect sizes of individual SNPs. One potential explanation is greater contamination of early-onset MDD cases by individuals misclassified with MDD or having comorbid disorders. Longitudinal studies show that early-onset depressive symptoms predict not only adult depression but also psychosis [e.g., (55)], and there is significant genetic overlap between MDD and other psychiatric disorders (50). Inclusion of early-onset cases with individuals who will later develop SCZ or BPD would reduce the power of GWASs for MDD, though secondary analysis of age at interview did not support this (Supplemental Figure S7). The polygenic risk score results show that early-and adult-onset MDD cases differ in their genetic susceptibility to BPD and SCZ and suggest that some of the heterogeneity in MDD results from the inclusion of early-onset cases with a greater genetic overlap or misclassification with these disorders. We did not observe a similar pattern of association for two disorders that are often comorbid with late-onset MDD, Alzheimer's disease, and coronary artery disease (51,52). We did, however, show for the first time a genetic overlap between MDD and coronary artery disease irrespective of AAO but no overlap between MDD and Alzheimer's disease. We also found that the heritability explained by SNPs across the genome for early-and late-onset MDD did not significantly differ, suggesting, for common variants at least, that recruiting only early-onset cases would not increase power (Supplemental Table S3).
The limitations of our approach are the reduced sample size and multiple testing from stratifying cases into subtypes, and requiring potentially less reliable secondary phenotypes to be widely collected. Measures such as AAO rely on self-report and are often assessed differently across studies, which can be problematic for comparison. The analyses presented here addressed this by analyzing AAO relative to the median of a study, though this assumed that each study recruited cases from the same distribution of onset with observed differences due to how AAO was defined. Furthermore, the effect of our genome-wide associated SNP were consistent in the replication studies, ethnicities, and countries, suggesting that differences in measurement may not be as much of a limitation as expected, at least in the case of AAO in MDD. The other disadvantage of looking at more homogenous groups is the reduction in sample size. However, it has previously been shown that only modest increases in effect size may be required to offset the reduction in power from analyzing fewer cases, implying that analyses of more homogenous subgroups have the potential to identify novel associations (56).
Our study illustrates the value of using additional phenotypic information on cases in GWASs. We show here that including information on AAO increases the power to detect All results reported are for the oldest half of MDD cases (O5-8), which had the strongest evidence for association in the discovery study. CI, confidence interval; MDD, major depressive disorder; OR, odds ratio.    (57), our stratification of cases on AAO is agnostic to the direction of the phenotypic effect. Our approach uses the additional phenotype data previously collected by genetic studies and complements an alternative emphasis on collecting large sample sizes through minimal phenotyping. Both strategies will undoubtedly be necessary to identify and characterize different components of the genetic architecture of psychiatric disorders. Furthermore, our analysis shows that tackling phenotypic heterogeneity in MDD can successfully lead to the discovery of a genetic risk factor despite reduced sample size. The identification here of a novel genetic risk variant for MDD is of great importance due to both the scarcity of evidence for the underlying biology and its pressing economic health burden.
Mental Health Centre, Copenhagen, Denmark; and The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Denmark. Funding from the BBSRC and MRC is gratefully acknowledged. Data used in the preparation of this article were obtained from the Genetic and Environmental Risk for Alzheimer's disease (GERAD1) Consortium. As such, the investigators within the GERAD1 consortia contributed to the design and implementation of GERAD1 and/or provided data but did not participate in analysis or writing of this report.
SS, HS, KS, and TET are employees of deCODE Genetics/Amgen. VA received funds from the German Federal Ministry of Education and Research, from the European Union (FP 7), and from the Interdisciplinary Center for Clinical Research Münster, and he has served on the advisory boards of, or has given presentations on behalf of the following companies: Astra-Zeneca, Janssen-Organon, Lilly, Lundbeck, Servier, Pfizer, Otsuka, and Trommsdorff. BTB has received funding from the National Health and All other authors report no biomedical financial interests or potential conflicts of interest.