Genome-wide association meta-analysis for early age-related macular degeneration highlights novel loci and insights for advanced disease

Advanced age-related macular degeneration (AMD) is a leading cause of blindness. While around half of the genetic contribution to advanced AMD has been uncovered, little is known about the genetic architecture of early AMD. To identify genetic factors for early AMD, we conducted a genome-wide association study (GWAS) meta-analysis (14,034 cases, 91,214 controls, 11 sources of data including the International AMD Genomics Consortium, IAMDGC, and UK Biobank, UKBB). We ascertained early AMD via color fundus photographs by manual grading for 10 sources and via an automated machine learning approach for > 170,000 photographs from UKBB. We searched for early AMD loci via GWAS and via a candidate approach based on 14 previously suggested early AMD variants. Altogether, we identified 10 independent loci with statistical significance for early AMD: (i) 8 from our GWAS with genome-wide significance (P < 5 × 10− 8), (ii) one previously suggested locus with experiment-wise significance (P < 0.05/14) in our non-overlapping data and with genome-wide significance when combining the reported and our non-overlapping data (together 17,539 cases, 105,395 controls), and (iii) one further previously suggested locus with experiment-wise significance in our non-overlapping data. Of these 10 identified loci, 8 were novel and 2 known for early AMD. Most of the 10 loci overlapped with known advanced AMD loci (near ARMS2/HTRA1, CFH, C2, C3, CETP, TNFRSF10A, VEGFA, APOE), except two that have not yet been identified with statistical significance for any AMD. Among the 17 genes within these two loci, in-silico functional annotation suggested CD46 and TYR as the most likely responsible genes. Presence or absence of an early AMD effect distinguished the known pathways of advanced AMD genetics (complement/lipid pathways versus extracellular matrix metabolism). Our GWAS on early AMD identified novel loci, highlighted shared and distinct genetics between early and advanced AMD and provides insights into AMD etiology. Our data provide a resource comparable in size to the existing IAMDGC data on advanced AMD genetics enabling a joint view. The biological relevance of this joint view is underscored by the ability of early AMD effects to differentiate the major pathways for advanced AMD.


Background
Age-related macular degeneration (AMD) is the leading cause of irreversible central vision impairment in industrialized countries. Advanced AMD presents as geographic atrophy (GA) and/or neovascular (NV) complications [1]. Typically, advanced AMD is preceded by clinically asymptomatic and thus often unrecognized early disease stages. Early AMD is characterised by differently sized yellowish accumulations of extracellular material between Bruch's membrane and retinal pigment epithelium (RPE) or between RPE and the photoreceptors (drusen or subretinal drusenoid deposits, respectively). Other features of early AMD are RPE abnormalities, including depigmentation or increased amount of pigment [1].
Early and advanced AMD can be documented by color fundus imaging of the central retina and/or other multimodal imaging approaches including optical coherence tomography (OCT) [1][2][3]. While the definition of advanced AMD is reasonably homogeneous across clinical and epidemiological studies, the classification of early AMD is more variable and different studies traditionally apply differing classification systems [4,5].
Epidemiological studies show that high age is the strongest risk factor for early and advanced AMD onset as well as progression [1,[6][7][8]. A robust genetic influence was shown for advanced AMD [1,[9][10][11] with 34 distinct loci at genome-wide significance in a large genome-wide association study (GWAS) for advanced AMD [9]. The genes underneath these advanced AMD loci were found to be enriched for genes in the alternative complement pathway, HDL transport, and extracellular matrix organization and assembly [9].
Exploring the genetics of early AMD offers the potential to understand the mechanisms of early disease processes, but also for the development to advanced AMD when comparing genetic effect sizes for early and advanced stages. Yet there have been few published GWAS searches for early AMD. One meta-analysis on 4089 early AMD patients and 20,453 control persons reported two loci with genomewide significance, both being well known from advanced AMD, the CFH and the ARMS2/HTRA1 locus [12].
We have thus set out to gather GWAS data for early AMD from 11 sources including own study data, data from the International AMD Genomics Consortium (IAMDGC), dbGaP and UK Biobank to conduct the largest GWAS meta-analysis on early AMD to date.

GWAS data from 11 sources
We included 11 sources of data with GWAS data and color fundus photography for early AMD phenotyping (Table S1). Our studies were primarily population-based cohort studies, where the baseline survey data were used for this analysis from studies of the authors (GHS, LIFE, NICOLA, KORA, AugUR) as well as for publicly available studies from dbGaP (ARIC, CHS, WHI; accession numbers: phs000090.v5.p1, phs000287.v6.p1, phs000746.v2.p3). We also included data from UK Biobank for participants from baseline and additional participants from the follow-up survey, since the color fundus photography program had started only after the main study onset (application number #33999). The studies captured an age range from 25 to 100 years of age (mean age from 47.5 years to 77.2 years across the 10 population-based studies, AugUR with the very old individuals range from 70.3 years to 95 years). About 50% of the study participants in each study were male (except for the Women's Health Initiative, WHI), and all demonstrated European ancestry. All studies (except GHS-1, GHS-2 and Life-Adult) excluded any person with at least 2nd degree relationship (Table S1). For each of these crosssectional data sets, participants with at least one eye gradable for AMD (see below) and with existing GWAS data were eligible for our analysis. We excluded participants with advanced AMD. We used participants with ascertained early AMD as cases and participants being ascertained for not having any signs of AMD as controls (n = 7363 cases, 73,358 controls across these population-based studies). Case-control data were also included from IAMD GC (http://amdgenetics.org/). The early AMD GWAS from IAMDGC is based on 24,527 individual participant data from 26 sources [9]. This data includes 17,856 participants with no AMD and 6671 participants with early AMD (excluding the 16,144 participants with advanced AMD). The cases and controls from IAMDGC were 16 to 102 years of age (mean age = 71.7 years). For all of these participants, DNA samples had been gathered and genotyped centrally (see below) [9].

Genotyping and imputation
All population-based studies were genotyped, quality controlled and imputed using similar chip platforms and imputation approaches (Table S2). As the imputation backbone, the 1000 Genomes Phase 1 or Phase 3 reference panel was applied [13], except GHS was imputed based on the Haplotype Reference Consortium (HRC) [14] and UK Biobank was imputed based on HRC and the UK10K haplotype resource [15]. Details on the UK Biobank genotypic resource are described elsewhere [16]. For the IAMDGC case-control data, DNA samples had been gathered across all participants and genotyped on an Illumina HumanCoreExome array and quality controlled centrally. Genotype quality control and imputation to the 1000 Genomes phase 1 version 3 reference panel (> 12 million variants) were conducted centrally. Details on the IAMDGC data were described in detail by Fritsche et al. [9].

Phenotyping
Across all studies included into this analysis, early AMD and the unaffected status was ascertained by color fundus photography. For participants from AugUR and LIFE, "early AMD" was classified according to the Three Continent Consortium (3CC) Severity Scale [4], which separates "mild early" from "moderate" and "severe early" AMD stages depending on drusen size, drusen area, or the presence of pigmentary abnormalities [4]. For the analysis, we collapsed any of these "early" AMD stages into the definition of "early AMD". However, the 3CC Severity Scale was not available for the other studies. In these, similar early AMD classifications, considering drusen size or area and presence of pigmentary abnormalities, were used (Table S1): For participants from GHS, the Rotterdam Eye Study classification was applied [17]. For participants from NICOLA, the Beckman Clinical Classification was utilized [18]. Participants from the KORA study were classified as "early AMD" based on the AREDS-9 step classification scheme and we defined "early AMD" for this analysis by AREDS-9 steps 2-8 [19]. The ascertainment of IAMDGC study participants is described in detail elsewhere and covers various classification systems [9]. Of note, LIFE and NICOLA phenotyping incorporated OCT information additional to the information from color fundus imaging (Table S1). For UK Biobank participants, color fundus images were received (application number 33999); there was no existing AMD classification available in UK Biobank (see below). The AMD status of a person was derived based on the AMD status of the eye with the more severe AMD stage ("worse eye") when both eyes were gradable, and as the grade of the one available eye otherwise. Eyes were regarded as gradable, if at least one image of the eye fulfilled defined quality criteria allowing for the assessment of AMD (bright image, good color contrast, full macular region captured on images). Images were excluded from AMD grading if they revealed obscuring lesions (e.g. cataract) or lesions considered to be the result of a competing retinal disease (such as advanced diabetic retinopathy, high myopia, trauma, congenital diseases, or photocoagulation unrelated to choroidal neovascularization). Details for IAMDGC are described previously [9]. Persons with gradable images for at least one eye were included in this analysis. Persons with advanced AMD defined as presence of neovascularization or geographic atrophy in at least one eye were excluded for the main GWAS on early AMD.

Automated classification of early AMD in UK biobank
To obtain early AMD phenotype data for UK Biobank participants, we used a pre-trained algorithm for automated AMD classification based on an ensemble of convolutional neural networks [20]. In the UKBB baseline data, fundus images were available for 135,500 eyes of 68,400 individuals with at least one image. Among the additional 38,712 images of 19,501 individuals in the follow-up, there were 17,198 individuals without any image from baseline. For each image (eye) at baseline and follow-up, we predicted the AMD stage on the AREDS-9 step severity scale using the automated AMD classification. We defined a person-specific AMD stage at baseline and follow-up based on the worse eye. Eyes that were classified as ungradable were treated as missing data and, if diagnosis was available for only one eye, the person-specific AMD stage was based on the classification of the single eye. If we obtained an automated disease classification to an AMD stage (i.e. not "ungradable" for both eyes) at baseline and follow-up, we used the follow-up disease stage (and follow-up age) in the association analysis. By this, we obtained an automated AMD classification for 70,349 individuals (2161 advanced AMD, 3835 early AMD, 64,353 unaffected). Individuals with advanced AMD were excluded from this analysis. Finally, we yielded 57,802 unrelated individuals of European ancestry with valid GWAS data that had either early AMD or were free of any AMD (3105 cases, 54,697 controls). We evaluated the performance of the automated disease classification by selecting 2013 individuals (4026 fundus images) for manual classification based on the 3CC Severity Scale. Details are described elsewhere [21]. We found reasonable agreement between the automated and the manual classification for the four categories of "no AMD", "early AMD", "advanced AMD" and "ungradable" (concordance = 79.5%, Cohen's kappa κ = 0.61, kappa with list-wise exclusion of ungradable individuals κ = 0.47 [22]). We found 305 of the 2013 individuals to be ungradable by the automated approach (i.e. 15.2%), including 257 individuals that have also been ungradable manually (i.e. truly bad image quality for both eyes) and 48 individuals that had been manually gradable, but not with the automated approach (i.e. 2.4%). Further details and further comparisons are provided in the Supplementary Note and Tables S3-S4.

Study-specific association analyses
Study-specific logistic regression analyses (early AMD cases versus controls, excluding advanced AMD cases) were applied by study partners (in Regensburg, Leipzig, Mainz, Belfast) using an additive genotype model and according to a pre-defined analysis plan. All publicly available data from dbGAP (studies ARIC, CHS and WHI) and UK Biobank as well as IAMDGC data was analyzed in Regensburg. All studies inferred the association of each genetic variant with early AMD using a Wald test statistic as implemented in RVTESTS [23]. Age and two principal components (to adjust for population stratification) were included as covariates in the regression models. We conducted sensitivity analyses to evaluate the impact of additionally including sex and 10 principal components in the regression models on the example of the two largest data sets, the IAMDGC and the UKBB data. The IAMDGC analyses were further adjusted for DNA source as done previously [9]. For the IAMDGC data that stemmed from 26 sources, we conducted a sensitivity analysis additionally adjusting for source membership according to previous work highlighting slight differences in effect estimates [24]; we found the same results.
Quality control of study-specific aggregated data GWAS summary statistics for all data sources were processed through a standardized quality-control (QC) pipeline [25]. This involved QC checks on file completeness, range of test statistics, allele frequencies, population stratification as well as filtering on low quality data. We excluded variants with low minor allele count (MAC < 10, calculated as MAC = 2*N eff *MAF, with N eff being the effective sample size, N eff = 4N Cases *N Controls /(N Cases + N Controls ) and MAF being the minor allele frequency), low imputation quality (rsq < 0.4) or large standard error of the estimated genetic effect (SE > 10). Genomic control (GC) correction was applied to each GWAS result to correct for population stratification within each study [26]. The estimation of the GC inflation factor was based on variants outside of the 34 known advanced AMD regions (excluding all variants within < 5 Mb base positions to any of the 34 known advanced AMD lead variants). The GC factors ranged from 1.00 to 1.04 (Table S2). We transferred all variant identifiers to unique variant names consisting of chromosomal, base position (hg19) and allele codes in (e.g. "3:12345:A_C", allele codes in ASCII ascending order).
GWAS meta-analysis across the 11 sources of data for early AMD genetics For signal detection and effect quantification, studyspecific genetic effects were combined using an inversevariance weighted fixed effect meta-analysis method as implemented in METAL [27]. We performed additional QC on meta-analysis results: We only included variants for identification that were available (i) in at least two of the 11 data sources with a total effective sample size of more than 5000 individuals (N eff > 5000) and (ii) for chromosome and position annotation in dbSNP (hg19). A conservative second GC correction (again focusing on variants outside the known advanced AMD regions) was applied to the meta-analysis result, in order to correct for potential population stratification across studies [26]. The GC lambda factor of the meta-analysis was 1.01.

Genome-wide search for early AMD variants, variant selection, and locus definition
In our first approach, we conducted a genome-wide search for variants associated with early AMD and judged at a genome-wide significance level (P < 5.0 × 10 − 8 ). Variants identified by this GWAS approach were deemed as established with genome-wide significance in our meta-analysis (tier 1). To evaluate the robustness of any novel genome-wide significant AMD locus, we performed leave-one-out (LOO) meta-analyses. We also evaluated heterogeneity between study-specific genetic effect estimates for selected variants using the I 2 measures [28][29][30].
We combined genome-wide significant variants (P < 5.0 × 10 − 8 ) into independent loci by using a locus definition similar to what was done previously [9]: the most significant variant was selected genome-wide, all variants were extracted that were correlated with this lead variant (r 2 > 0.5, using IAMDGC controls as reference) and a further 500 kb were added to both sides. All variants overlapping the sodefined locus were assigned to the respective locus. We repeated the procedure until no further genome-wide significant variants were detected. Genes overlapping the sodefined loci were used for biological follow-up analyses (gene region defined from start to end). To identify independent secondary signals at any novel AMD locus, approximate conditional analyses were conducted based on meta-analysis summary statistics using GCTA [31].

Candidate approach
Additionally to the genome-wide search in our metaanalysis of 14,034 cases and 91,214 controls, we adopted a candidate approach based on the 14 reported suggestive variants by Holliday et al. (P-values from 8.9 × 10 − 6 to 1.1 × 10 − 6 in their meta-analysis, 4089 cases and 20, 453 controls) [12]. For this, we analyzed our data without the studies that overlapped with the previously reported data (i.e. removing ARIC, CHS; yielding 13,450 cases and 84,942 controls). We also combined the data from Holliday et al. with this non-overlapping part of our data where possible (i.e. the 14 reported variants) by an inverse-variance weighted meta-analysis (altogether 17,539 cases, 105,395 controls). We judged these 14 variants' association at experiment-wise significance (P < 0.05/14) in our non-overlapping data and at genomewide significance (P Combined < 5.0 × 10 − 8 ) in the combined analysis. We considered a candidate-based selected variant as established with both experiment-wise significance (P < 0.05/14 in our non-overlapping data) and with genome-wide significance (P Combined < 5.0 × 10 − 8 in the combined data) (tier 2), or as established with experiment-wise significance (P < 0.05/14 in our non-overlapping data), but without established genomewide significance (P Combined ≥ 5.0 × 10 − 8 ) (tier 3).

Gene prioritization at newly identified AMD loci
To prioritize genes and variants at the newly identified AMD loci, we conducted a range of statistical and functional follow-up analyses. The following criteria were used: (1) Statistical evidence; we computed the posterior probability of each variant using Z-scores and derived 95% credible intervals for each locus [32]. The method assumes a single causal signal per locus. (2) Variant effect predictor (VEP) to explore whether any of the credible variants was located in a relevant regulatory gene region [33]. (3) eQTL analysis: We downloaded expression summary statistics for the candidate genes in retina from the EyeGEx database [34] and for 44 other tissues from the GTEx database [35] (both available at www.gtexportal.org/home/ datasets) and evaluated whether any of the credible variants showed significant effects on expression levels in the aggregated data. For each significant eQTL in EyeGEx, we conducted colocalization analyses using eCAVIAR [36] to evaluate whether the observed early AMD association signal colocalized with the variants' association with gene expression. (4) Retinal expression: We queried the EyeIntegration database to evaluate genes in the relevant loci for expression in fetal or adult retina or RPE cells [37].

Phenome-wide association study for newly identified AMD loci
We used 82 other traits and queried reported genomewide significant (P < 5.0 × 10 − 8 ) lead variants and proxies (r 2 > 0.5) for any of these traits for overlap with genes underneath our novel loci as done previously [39]. For this, we used GWAS summary results that were previously aggregated from GWAS catalogue [40], GWAS central [41] and literature search.
For the novel early AMD lead variants, we further evaluated their association with 118 non-binary and 660 binary traits from the UK Biobank [42]. The Phenome-wide association study (PheWAS) web browser "GeneATLAS" (www. geneatlas.roslin.ed.ac.uk) was used for the UK Biobank lookup. For each variant, association P values were corrected for the testing of multiple traits by the Benjamini-Hochberg false-discovery-rate (FDR) method [43].

Interaction analyses
For the novel early AMD effects and for the 34 known advanced AMD lead variants [9], we investigated whether age modulated early AMD effects by analyzing variant x age interaction in seven data sources for which we had individual participant data available in Regensburg (ARIC, CHS, WHI, IAMDGC, UKBB, AugUR and KORA). For each source, we applied logistic regression and included a variant x AGE interaction term in the model (in addition to the covariates used in the main analysis). We conducted meta-analysis across the seven sources to obtain pooled variant x age interaction effects and applied a Wald test to test for significant interaction (at a Bonferroni-corrected alpha-level). For the novel early AMD effects, we further investigated whether age modulated advanced AMD effects by evaluating publically available data from IAMDGC [44]. Finally, we investigated whether a novel early AMD lead variant modulated any of the effects of the 34 known AMD variants on advanced AMD [9]. We used the IAMDGC data and applied one logistic regression model for each pair of known advanced AMD variants and novel early AMD variants including the two respective variants and their interaction (and the same other covariates as before).

Comparison of genetic effects on early and advanced stage AMD
We estimated the genetic correlation between early and advanced AMD by utilizing the LDSC tool [45] with the GWAS summary statistics for early and advanced AMD (from the current meta-analysis and the IAMDGC [9], respectively). We used pre-calculated LD scores for European ancestry (https://data.broadinstitute.org/alkesgroup/ LDSCORE/eur_w_ld_chr.tar.bz2). We further compared genetic effect sizes between early and advanced AMD for the novel early AMD lead variants and for the 34 known advanced AMD lead variants [9]. For this, we queried the novel early AMD lead variants in the IAMDGC GWAS for advanced AMD [9] and (vice-versa) queried the 34 known advanced AMD lead variants [9] in the early AMD metaanalysis results. We compared effect sizes in a scatter plot and clustered the lead variants by their nominal significant association on advanced and/or early AMD. We classify different types of loci in a similar fashion as done previously for adiposity trait genetics [46]: (1) "advanced-and-early" AMD loci (P early < 0.05, P adv < 0.05), (2) "advanced-only" AMD loci (P early ≥ 0.05, P adv < 0.05), (3) "early-only" AMD loci (P early < 0.05, P adv ≥ 0.05).

Pathway analysis
To evaluate whether "advanced-and-early" AMD loci versus "advanced-only" AMD loci distinguished the major known pathways for advanced AMD, we performed pathway enrichment analysis separately for these two classes. We used the genes in the gene prioritization for all advanced AMD loci as previously described [9], derived the gene prioritization score and selected the best scored gene in each locus (two genes in the case of ties). We then separated the gene list according to the class of the respective locus, and performed pathway enrichment analysis via Enrichr [47] with default settings searching Reactome's cell signaling pathway database 2016 (n = 1530 pathways). Pvalues were corrected for multiple testing with Benjamini-Hochberg procedure [43].

Results
Eight genome-wide significant loci from a GWAS on early AMD We conducted a meta-analysis of genotyped and imputed data from 11 sources (14,034 early AMD cases, 91,214 controls; 11,702,853 variants; for study-specific genotyping, analysis and QC, see Tables S1-S2). For all participants, early AMD or control status (i.e. no early nor late AMD) was ascertained via color fundus photographs (Table S1). This included automated machinelearning based AMD classification of UK Biobank fundus images (application number 33999; 56,699 individuals from baseline, 13,650 additional individuals from follow-up) [16,20]. Based on logistic regression association analysis in each of the 11 data sets meta-analyzed via fixed effect model, we identified eight distinct loci with genome-wide significance (tier1; P = 1.3 × 10 − 116 to 4.7 × 10 − 8 , Fig. 1, Table 1; "locus" defined by the lead variant and proxies, r 2 ≥ 0.5, +/− 500 kb). Six of these loci were novel for early AMD; two loci had been identified for early AMD previously [12].
Taken together, we identified eight loci for early AMD, two known for early AMD and six novel, including one novel locus (near CD46) for any AMD with genome-wide significance.
Two further significant loci from a candidate-based approach of 14 variants Subsequently, we applied a candidate-based approach by investigating the 14 variants reported as suggestive by the previous GWAS for early AMD (4089 early AMD cases, 20,453 controls; reported P between 1.1 × 10 − 6 and 8.9 × 10 − 6 ) [12]. For this, we re-analyzed our data excluding the overlap with the previous GWAS (i.e. excluding ARIC, CHS study, yielding 13,450 early AMD cases and 84,942 controls) and also combined the reported and our non-overlapping data in an inversevariance weighted meta-analysis where possible (i.e. for the 14 reported variants; altogether 17,539 cases, 105, 395 controls). Among the 14 variants, we found four significant variants for early AMD (Table 2): (i) two "tier 2" variants (near PVRL2 and CD46) with experiment-wise significance (P < 0.05/14) in our non-overlapping data and with genome-wide significance (P Combined < 5.0 × 10 − 8 ) in the combined analysis, and (ii) two additional "tier 3" variants (near APOE/TOMM40 and TYR) with experiment-wise significance (P < 0.05/14) in our nonoverlapping data.
We compared the evidence from the Bonferronicorrected analysis judged at experiment-wise significance and the combined analysis judged at genome-wide significance. Both "tier 3" variants showed combined Pvalues close to the genome-wide significant threshold (P Combined = 9.0 × 10 − 8 and 1.7 × 10 − 7 , for the APOE/ TOMM40 and the TYR variant, respectively), while the other 10 of the 14 variants showed combined P-values far away from significance (P combined from 4.8 × 10 − 3 to 0.74). Therefore, the Bonferroni-corrected analysis and the combined analysis yielded similar evidence and separated the 14 suggested variants into four variants with a positive finding (near CD46, PVRL2, APOE/TOMM40 and TYR) from the 10 variants with no finding (P > 0.05 in our non-overlapping data, Table 2).
Two of the four identified variants were correlated: the tier3-identified variant rs2075650 near APOE/TOMM40 was located in the same locus as the tier2-identified variant rs6857 near PVRL2 (r 2 = 0.75 to rs2075650) and thus counted as one locus (near APOE/TOMM40/PVRL2). The tier2-identified variant rs1967689 near CD46 was located in one of the eight loci identified by our GWAS (r 2 = 0.77 to our GWAS lead variant rs4844620). The two other loci (near APOE/TOMM40/PVRL2 and TYR) were identified in addition to our GWAS. These two loci were identified here for the first time with statistical significance as loci for early AMD: one locus known for advanced AMD (near APOE) and one identified here for   sample size up to 55,750). The ARIC and CHS studies were excluded from our meta-analysis data to avoid overlap with the data by Holliday et al. [12]. The last column indicates whether the locus was identified by Fritsche et al. for advanced AMD [9] the first time for any AMD with statistical significance (rs621313, near TYR, P = 6.8 × 10 − 4 , Figure S2). Together with our GWAS approach, we identified 10 independent loci with statistical significance for early AMD: (i) eight from our GWAS (P < 5.0 × 10 − 8 in our meta-analysis, tier 1), (ii) one additional from the candidate approach with experiment-wise and with genomewide significance (tier 2), and (iii) one additional from the candidate approach with experiment-wise significance (tier 3). Among the 10 identified loci (any tier), two were reported previously for early AMD (near CFH, ARMS2/ HTRA1) and eight were identified here for the first time for early AMD with statistical significance (near CD46, C2, C3, CETP, TNFRSF10A, VEGFA, APOE/TOMM40/ PVRL2 and TYR). The eight loci included two that have not previously been identified with statistical significance for any AMD (near CD46 and TYR). These two loci showed no second signals (GCTA [31], P Cond > 5.0 × 10 − 8 for CD46 and P Cond > 0.05/14 for TYR, Figure S1-2).

Sensitivity analysis on the 10 identified early AMD loci
We conducted various sensitivity analyses to evaluate the robustness of the associations for the 10 identified lead variants: (a) leaving out one of the 11 data sets at a time showed similar effect estimates in most cases, except for the exclusion of the two largest contributing sources, IAMDGC and UKBB, which had the strongest impact on effect sizes due to their relatively large sample size ( Figure  S3). Exclusion of IAMDGC slightly increased the CD46 effect size (due to a slightly smaller effect in IAMDGC, Figure  S3A), had no impact on the TYR effect size ( Figure S3B), and decreased effect sizes for C2, CETP, C3, APOE, CFH and ARMS2/HTRA1 variants (due to a relatively large association in IAMDGC). Exclusion of UKBB had no impact on the CD46 variant effect ( Figure S3A), slightly diminished the TYR effect (due to a relatively strong effect in UKBB, Figure S3B) and increased effect sizes near C2, C3 or CFH.
(b) Between-study heterogeneity in our meta-analysis was similar to the heterogeneity in any of the leave-one-out meta-analyses, which indicated that the observed heterogeneity was not driven by one single data set (Table S5). We observed low to moderate between-study heterogeneity [28] for the two novel any AMD loci (I 2 between 22 to 48.3% and 45.4 to 63.8% for the TYR and CD46 lead variant, respectively). (c) Effect sizes were robust to additional adjustment for sex or inclusion of additional genetic principal components in the regression analyses ( Figure S4). In the following, we were particularly interested in finemapping the two loci that have not been identified before for any AMD: the loci near CD46 and TYR.

Gene prioritization at the two novel loci
To prioritize variants and genes at the CD46 and TYR locus, we conducted in silico follow-up analyses for all variants and overlapping genes for each of these two loci (4451 or 5729 variants, 10 or 7 genes, respectively). We found several interesting aspects (Table 3): (1) When prioritizing variants according to their statistical evidence for being the driver variant by computing 95% credible sets of variants [32], we found 23 and 294 credible set variants for the CD46 and TYR locus, respectively (Table S6). (2) Using the Variant Effect Predictor [33], we assessed overlap of credible set variants with functional regulatory regions and found variants influencing the transcript and/or the protein for four genes (Table S7): variants causing an alternative splice form for CD46, a nonsense-mediated mRNA decay (NMD) for CR1L, a missense variant for TYR (rs1042602, r 2 = 0.56 to the lead variant rs621313), and NMD variants for NOX4. (3) We investigated credible set variants for being an expression quantitative trait locus (eQTL) for any of the 17 genes in retina (Eye Genotype Expression database, EyeGEx [34]) or in 44 other tissues (Genotype-Tissue Expression database, GTEx [35]). For the CD46 locus, we observed significant association of the lead variant and additional 16 credible set variants on CD46 expression in retina (FDR < 5%, Table S8); the early AMD risk increasing alleles of all 17 variants were associated with elevated CD46 expression. Importantly, we observed the expression signal to colocalize with the early AMD association signal using eCAVIAR [36] (3 variants with colocalization posterior probability CLPP> 0.01, Table S9, Figure S5-S6). We also found credible variants to be associated with CD46 expression in 15 other tissues from GTEx, including four brain tissues (FDR < 0.05, Table S10). Among the credible set variants in the two loci, we found no further eQTL for any of the other genes. When extending beyond the credible set, we found one further CD46 locus variant as eQTL for CD55, but without colocalization (Table S9, Figure S5-S6). These findings support the idea that the credible set captures the essential signal. (4) We queried the 17 genes overlapping the two loci for expression in eye tissue and cells in EyeIntegration summary data [37]. We found five and three genes, respectively, expressed in adult retina and adult RPE cells (CD46, PLXNA2, CR1, CD34, CD55; TYR, GRM5, NOX4; Figure S7-S8). (5) When querying the 17 genes in the Mouse Genome Informatics, MGI [38] or Online Mendelian Inheritance in Man, OMIM®, database, for eye phenotypes in mice or humans, we identified relevant eye phenotypes for five genes in mice (CD46, CR1, CR1L, PLXNA2; TYR; Table  S11) and for one gene in human (TYR ; Table S12).
While it is debatable how to prioritize evidence for a gene's probability to be causal, one approach is to count any of the following characteristics for each of the 17 genes (Gene Prioritization Score, GPS Table 3): any credible set variant is (i) protein-coding, (ii) involved in NMD, (iii) affecting splice function, (iv) an eQTL for this gene in retina (EyeGEx) or in any other tissue (GTEx), or/and the gene (v) is expressed in retina or RPE, (vi) linked to eye phenotype in mouse or (vii) human. This approach offered CD46 and TYR as the highest scored gene in the respective locus (GPS = 4 for each; Table 3).

Phenome-wide association search for the two novel loci
Co-association of variants in the two novel loci for early AMD with other traits and diseases may provide insights into shared disease mechanisms. We queried different data sets on numerous phenotypes by a gene-based and by a locus-based view.
For the gene-based view, we focused on 82 traits and evaluated reported genome-wide significant (P < 5.0 × 10 − 8 ) lead variants (and proxies, r 2 > 0.5) for overlap with any of the 17 gene regions (Table S13). For the CD46 locus, we found significant association corrected for multiple testing (false-discovery rate, FDR < 5%) for schizophrenia (in CD46 and CR1L) and for Alzheimer's disease (in CR1, Table S14). For the TYR locus, we found significant associations for eye color, skin pigmentation and skin cancer (in GRM5 and TYR, Tables S14).
For the locus-based view, we conducted a phenomewide association study (PheWAS): we evaluated whether the two lead variants were associated with any of the 778 traits in UK Biobank using GeneAtlas (n = 452,264, age-adjusted estimates; Table S15) [42]. For the CD46 lead variant, we identified 27 significant trait associations (FDR < 5%), including four with particularly strong evidence (P < 5.0 × 10 − 8 ; white blood cell, neutrophil, monocyte count and plateletcrit); the early AMD risk increasing allele (G, frequency = 79%) was associated consistently with increased blood cell counts. We did not find a significant association of the CD46 lead variant with schizophrenia in UK Biobank (FDR > 5%; Alzheimer's disease not available). For the TYR lead variant (rs621313, G allele associated with increased early AMD risk, frequency = 48%), we identified 20 significant trait associations including Melanoma (FDR < 5%, G allele associated with increased Melanoma risk) and two with particularly strong evidence for skin color and ease of   Tables S6-S12. For the GPS, the sum of cell entries for "annotation" and "biology" was computed per row skin tanning (P < 5.0 × 10 − 8 , G allele associated with brighter skin color and increased ease of skin tanning).

Advanced AMD association and interaction analyses for the two novel loci
Next, we investigated whether the early AMD loci CD46 or TYR were associated with advanced AMD. We thus queried the two lead variants for early AMD (rs4844620 and rs621313, respectively) for their advanced AMD association in the IAMDGC data (Table S16). We observed nominally significant directionally consistent effects for advanced AMD (OR adv = 1.05, 95% confidence interval, CI = [1.01,1.09] and 1.03 [1.00,1.07], P adv = 0.02 and 0.05, respectively) that were slightly smaller compared to early AMD effects (OR early = 1.10 [1.06,1.14] and 1.05 [1.02, 1.08], P early = 4.7 × 10 − 8 and 6.8 × 10 − 4 ). When exploring variant x age interaction for early AMD (in a subset of our meta-analysis of 10,890 early AMD cases and 54,697 controls) or for advanced AMD (IAMDGC data [44]) for the two novel locus lead variants, we found no statistically significant interaction at a Bonferroni-corrected level for early or advanced AMD (P GxAGE > 0.05/2 = 0.025, Table S17-S18).
We were interested in whether one of the two novel lead variants showed interaction with any of the 34 known advanced AMD variants for association with advanced AMD (IAMDGC data). We found no significant interaction (P GxG > 0.05/34/2, Table S19), which suggests that the known advanced AMD effects are not modulated by the two novel early AMD variants.
Dissecting advanced AMD genetics into shared and distinct genetics for early AMD We were interested in whether we could learn about advanced AMD genetics from a joint view of advanced and early AMD genetic effects. First, when computing genetic correlation of advanced AMD genetics with early AMD genetics, we found a substantial correlation of 78.8% (based on LD-score regression). Second, we contrasted advanced AMD effect sizes (IAMDGC data [9]) with early AMD effect sizes (our meta-analysis,) for the 34 known advanced AMD lead variants (Fig. 2, Table S16). We found two classes of variants: (1) 25 variants showed nominally significant effects on early AMD (P < 0.05; "advanced-and-early-AMD loci"), all directionally consistent and all smaller for early vs. advanced AMD (OR early = 1.04-1.47; OR adv = 1.10-2.81); (2) nine variants had no nominally significant effect on early AMD (P ≥ 0.05; "advanced-only AMD loci"). We did not find any variant with early AMD effects into the opposite direction as the advanced AMD effects. Also, we did not find any variant-age interaction on early AMD (Table S18). Fig. 2 Advanced vs early AMD effect sizes. Shown are advanced AMD effect sizes contrasted to early AMD effect sizes (effect sizes as log odds ratios) for the 34 known advanced AMD variants [9] (blue or green for P early < 0.05 or P early ≥ 0.05, respectively) and for the two novel (early) AMD variants (red, near CD46, TYR). Detailed results are shown in Table S16 We observed that complement genes CFH, CFI, C3, C9, and C2 were all included in the 25 advanced-andearly-AMD loci. We were thus interested in whether advanced-and-early-AMD loci suggested different pathways compared to advanced-only-AMD loci. For this, we utilized the GPS from our previous work on advanced AMD [9] to select the best-supported genes in each of these loci (Table S20). We applied Reactome pathway analyses via Enrichr [47] twice: (i) for the 35 genes in the 25 advanced-and-early-AMD loci and (ii) for the nine genes in eight advanced-only-AMD loci (no gene in the "narrow" locus definition of the RORB locus). This revealed significant enrichment (corrected P < 0.05) for genes from "complement system" and "lipoprotein metabolism" in the 25 advanced-and-early-AMD loci and enrichment for genes in the pathways "extracellular matrix organization" and "assembly of collagen fibrils" in the 8 advanced-only-AMD loci (Table 4). This suggested that the early AMD effect of advanced AMD variants distinguished the major known pathways for advanced AMD.

Discussion
Based on the largest genome-wide meta-analysis for early AMD to date encompassing~14,000 cases and9 1,000 controls, all color fundus photography confirmed and a candidate approach based on 14 suggestive variants from Holliday et al. [12], we identified 10 loci for early AMD including eight novel and two previously identified for early AMD [12]. Eight of the 10 identified loci overlapped with known loci for advanced AMD [9] and two had not been detected by GWAS for early or advanced AMD so far. Our post-GWAS approach highlighted CD46 and TYR as compelling candidate genes in the two loci. Our joint view on early and advanced AMD genetics allowed us to differentiate between shared and distinct genetics for these two disease stages, which the pathway analyses suggested to be biologically relevant.
We defined three tiers of identified variants that reflected the different levels of data and evidence: tier 1 variants were identified with genome-wide significance in our GWAS meta-analysis data (identifying 8 loci including CD46), tier 2 variants were among the 14 candidate-based variants judged at experiment-wise significance in our non-overlapping data (P < 0.05/14 = 0.0036) as well as at genome-wide significance in the combined analysis (i.e. previously published summary statistics and ours; identifying one additional locus near APOE), and tier 3 variants were among the 14 variants judged at experiment-wise significance, but no genomewide significance in the combined data (identifying one additional locus near TYR). While the establishing of genome-wide significance for a locus is ideal, testing at a statistical significance level controlling the experimentwise type 1 error with Bonferroni-correction is also an established approach to provide statistical evidence in independent data [48].
Particularly interesting were the two loci near CD46 and TYR that were identified here for early AMD with statistical significance and have not been identified previously, not even for advanced AMD. The locus around CD46 had been reported as suggestive for early AMD by the previous largest GWAS for early AMD [12] (4089 early AMD cases, 20,453 controls), but not with statistical significance, and had not been identified with statistical significance by the previous largest GWAS for advanced AMD [9] (16,144 advanced AMD cases, 17,832 with no effect on early AMD (9 genes). Pathways with significant corrected P-value (P corr < 0.05) for each gene group from EnrichR querying human Reactome database 2016 are shown controls). Our meta-analysis was more than three times larger than the previous early AMD GWAS (effective sample size 48,651 compared to 13,631 [12]) and had a larger power to detect an "any AMD" effect with genome-wide significance than the previous advanced AMD GWAS (e.g. for OR = 1.10, allele frequency 30%: power = 92% compared to 61%, respectively). The TYR locus had not been identified with statistical significance in any previous GWAS on early or advanced AMD; it was stated as a locus with suggestive evidence from the previous GWAS on early AMD (P = 3.5 × 10 − 6 ) [12] and was identified here with statistical significance at experiment-wise error control. The combined analysis of the Holliday et al. and our non-overlapping data clearly separated the 14 Holliday variants into four with genome-wide or close to genome-wide significance and 10 that were far away from statistical significance. Prioritization of genes underneath association signals is a known challenge, but highly relevant for selecting promising candidates for functional follow-up. Our systematic approach, scrutinizing all genes underneath our two newly identified loci, highlighted CD46 and TYR as the most supported genes. CD46 is an immediate compelling candidate as a part of the complement system [49]. Complement activation in retina is thought to have a causal role for AMD [50,51]. Importantly, we found our CD46 GWAS signal to colocalize with CD46 expression with the early AMD risk increasing allele (rs4844620 G) increasing CD46 expression in retinal cells. On the one hand, this contrasts the presumption that a higher CD46 expression in eye tissue should protect from AMD, based on previous CD46 expression data [52] and a documented AMD risk increasing effect for increased complement inhibition [53]. On the other hand, CD46 had also been found to have pathogenic receptor properties for human viral and bacterial pathogens (e.g. measles virus) [54] and is known to downmodulate adaptive T helper type 1 cells [55]. Furthermore, a GWAS on neutralizing antibody response to measles vaccine had identified two intronic CD46 variants (rs2724384, rs2724374) [56]. In our data, these two variants were in the 95% credible set for the CD46 locus, highly correlated with our lead variant rs4844620 (r 2 > = 0.95), and the major alleles (rs2724374 T, rs2724384 A) increased early AMD risk. Interestingly, the rs2724374 G was shown for CD46 exon skipping resulting in a shorter CD46 isoform with a potential role in pathogen binding [56]. Based on this, one may hypothesize that the observed CD46 signal in early AMD is related to pathogenic receptor properties rather than complement inactivation.
At the second locus, TYR appears as the best supported gene by our systematic scoring. This locus and gene was already discussed by Holliday and colleagues [12]. Briefly, TYR is important for melanin production and TYR variants in human were associated with skin, eye and hair color [57][58][59]. While we did not identify any cis effect between the credible set variants and TYR expression, one of our credible set variants in TYR (rs1042602) is a missense variant. Interestingly, this variant was a GWAS lead variant not only for skin color [58], but also for macular thickness in UK Biobank [60]; the allele associated with thicker retina showed increased early AMD risk in our data. Since thicker RPE/Bruch's membrane complex was associated with increased early AMD risk in the AugUR study [61], this would be in line with our early AMD risk increasing allele being linked to a process of increased accumulation of drusenoid debris in the RPE/Bruch's membrane complex. Although CD46 and TYR were the most supported genes in the two loci, we could not rule-out the relevance of other genes in the loci.
It is a strength of the current study that early AMD and control status was ascertained by color fundus photography, not relying on health record data. However, the early AMD classification in our GWAS was heterogeneous across the 11 data sets: one study incorporated information from optical coherence tomography (NIC-OLA), the UK Biobank classification was derived by a machine-learning algorithm [20], and the IAMDGC data was multi-site with different classification approaches [9]. The uncertainty in early AMD classification and the substantial effort required for any manual AMD classification are likely reasons for the sparsity of early AMD GWAS so far. Our sensitivity analysis with the leaveone-out meta-analyses and corresponding heterogeneity estimates showed that effect estimates did not depend on one or the other data source or classification approach.
Our data on early AMD genetics is comparable in size to the existing data on advanced AMD genetics from IAMDGC (summary statistics at http://amdgenetics.org/ ) and thus provides an important resource (summary statistics at http://genepi-regensburg.de) to enable a joint view. By this joint view, we were able to differentiate the 34 loci known for advanced AMD into 25 "advancedand-early-AMD loci" and nine "advanced-AMD-only loci". Pathway enrichment analyses conducted separately for these two groups effectively discriminated the major known pathways for advanced AMD genetics [9]: complement complex and lipid metabolism for "advancedand-early-AMD" loci; extracellular matrix metabolism for "advanced-AMD-only" loci. The two novel loci around CD46 and TYR fit to the definition of "advanced-and-early-AMD" loci and the CD46 being part of the complement system supports the above stated pathway pattern. The larger effect size for early compared to advanced AMD for the two novel loci mayin partbe winner's curse.
How do our observations relate to potential etiological models? (1) For a genetic variant capturing an underlying mechanism that triggers both early and advanced AMD, we would expect the variant to show association with early and advanced AMD (compared to "healthy") with directionally consistent effects (Fig. 3, Model 1). This would be in line with our observed associations for the 27 "advanced-and-early-AMD" loci (25 known advanced AMD loci, 2 novel loci). This would also suggest that mechanisms of complement system or lipid metabolism trigger both early and advanced disease. (2) For a mechanism that triggers advanced AMD no matter whether the person is "healthy" or has early AMD, we would anticipate a variant effect for advanced AMD, but not for early AMD (Fig. 3, Model 2). This would be in line with our observed associations for the nine "advanced-AMD-only" loci. This would also suggest that mechanisms of extracellular matrix metabolism trigger advanced AMD rather than early AMD. Of note, these include the MMP9 locus, which is thought to trigger vascularization and wet AMD [9]. (3) Another mechanism is conferred by variants that are purely responsible for progression from early to advanced AMD, but do not increase advanced AMD risk for "healthy" individuals. In such a scenario, the advanced AMD risk increasing allele would be under-represented among persons with early AMD (Fig. 3, Model 3), particularly at older age, and it would be associated with decreased risk of early AMD (compared to "healthy"). None of the identified variants showed this pattern overall or for older age in the variant x age interaction analyses. (4) For a mechanism that triggers early AMD, but has no impact on the progression from early to advanced AMD, we would have an effect on early AMD, but no effect on advanced AMD (Fig. 3, Model 4). We did not find such a variant.
Our data and joint view on effects for both disease stages support two of the four etiological models. One may hypothesize that the unsupported models are nonexisting or unlikely. There are limitations to consider: (1) To reduce complexity, we adopted an isolated view per variant with some accounting for interaction, but ignoring more complex networks. (2) Early AMD effects were estimated predominantly in population-based studies, while advanced AMD effects were from a casecontrol design. (3) The cut-off of "nominal significance" for separating variants into "advanced-and-early" or "advanced-only" loci is arbitrary and larger data might give rise to re-classification. Still, the power to detect effects for early AMD in our meta-analysis was similar to the power in the advanced AMD data from IAMDGC (for OR = 1.05, allele frequency 30%, nominal significance: power = 94% or 83%, respectively). (4) An improved disentangling of genetic effects for the two chronologically linked disease stages will be an important subject of further research, requiring large-scale population-based studies with long-term follow-up and the estimation of transition probabilities.

Conclusions
In summary, our large GWAS on early AMD identified novel loci, highlighted shared and distinct genetics between early and advanced AMD and provides insights into AMD etiology. The ability of early AMD effects to differentiate the major pathways for advanced AMD underscores the biological relevance of a joint view on early and advanced AMD genetics.