Chromosome 9p21 in sporadic amyotrophic lateral sclerosis in the UK and seven other countries: a genome-wide association study

Summary

Background
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease of motor neurons that results in progressive weakness and death from respiratory failure, commonly within about 3 years. Previous studies have shown association of a locus on chromosome 9p with ALS and linkage with ALS–frontotemporal dementia. We aimed to test whether this genomic region is also associated with ALS in an independent set of UK samples, and to identify risk factors associated with ALS in a further genome-wide association study that combined data from the independent analysis with those from other countries.

Methods
We collected samples from patients with sporadic ALS from 20 UK hospitals and obtained UK control samples from the control groups of the Depression Case Control study, the Bipolar Affective Case Control Study, and the British 1958 birth cohort DNA collection. Genotyping of DNA in this independent analysis was done with Illumina HumanHap550 BeadChips. We then undertook a joint genome-wide analysis that combined data from the independent set with published data from the UK, USA, Netherlands, Ireland, Italy, France, Sweden, and Belgium. The threshold for significance was p=0·05 in the independent analysis, because we were interested in replicating a small number of previously reported associations, whereas the Bonferroni-corrected threshold for significance in the joint analysis was p=2·20×10⁻⁷.

Findings
After quality control, samples were available from 599 patients and 4144 control individuals in the independent set. In this analysis, two single nucleotide polymorphisms in a locus on chromosome 9p21.2 were associated with ALS: rs3849942 (p=2·22×10⁻⁶; odds ratio [OR] 1·39, 95% CI 1·21–1·59) and rs2814707 (p=3·32×10⁻⁶; 1·38, 1·20–1·58). In the joint analysis, which included samples from 4312 patients with ALS and 8425 control individuals, rs3849942 (p=4·64×10⁻¹⁰; OR 1·22, 95% CI 1·15–1·30) and rs2814707 (p=4·72×10⁻¹⁰; 1·22, 1·15–1·30) were associated with ALS.

Interpretation
We have found strong evidence of a genetic association of two single nucleotide polymorphisms on chromosome 9 with sporadic ALS, in line with findings from previous independent GWAS of ALS and linkage studies of ALS–frontotemporal dementia. Our findings together with these earlier findings suggest that genetic variation at this locus on chromosome 9 causes sporadic ALS and familial ALS–frontotemporal dementia. Resequencing studies and then functional analysis should be done to identify the defective gene.

Funding
ALS Therapy Alliance, the Angel Fund, the Medical Research Council, the Motor Neurone Disease Association of Great Britain and Northern Ireland, the Wellcome Trust, and the National Institute for Health Research Dementias and Neurodegenerative Diseases Research Network (DeNDRoN).


Exclusion of SNPs with low minor allele frequency, low call rates, or deviation from Hardy-Weinberg equilibrium in controls
Thresholds for exclusion: minor allele frequency < 0·001, call rate < 95%, Hardy-Weinberg test P < 1 × 10⁻⁶.
0 markers removed for low frequency
12 markers removed for low call rate
179 markers removed for failing Hardy-Weinberg test in controls
233031 markers remaining
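The three SNP-level filters above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the 0/1/2 genotype coding with `None` for a missing call, and the use of a 1-degree-of-freedom chi-square test for Hardy-Weinberg proportions (rather than an exact test) are all assumptions made for illustration. Only the three thresholds are taken from the text.

```python
import math

# Thresholds quoted in the text; everything else here is illustrative.
MAF_MIN = 0.001
CALL_RATE_MIN = 0.95
HWE_P_MIN = 1e-6


def call_rate(genotypes):
    """Fraction of non-missing genotype calls (None marks a missing call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Minor allele frequency from genotypes coded as 0, 1 or 2 allele copies."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def hwe_p_value(genotypes):
    """Chi-square (1 df) test of Hardy-Weinberg proportions.

    Assumption: a simple approximation, not an exact test.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    observed = [called.count(k) for k in (0, 1, 2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)
    q = 1 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    if min(expected) == 0:
        return 1.0  # monomorphic marker: nothing to test
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def snp_passes_qc(all_genotypes, control_genotypes):
    """Apply call-rate and MAF filters to all samples, HWE filter to controls."""
    return (call_rate(all_genotypes) >= CALL_RATE_MIN
            and minor_allele_frequency(all_genotypes) >= MAF_MIN
            and hwe_p_value(control_genotypes) >= HWE_P_MIN)
```

The call-rate check runs first so that the later functions are never given a marker with no called genotypes among all samples.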

Further general quality control measures

Exclusion of individuals with low genotyping rates
Threshold for exclusion: genotyping rate < 95%.
13 individuals removed
13831 individuals remaining (4847 cases, 8984 controls; 7783 males, 6048 females)
Total genotyping rate in remaining individuals: 0.998633

Assessment of SNPs with outlier P-values
Threshold for assessment: P << 1 × 10⁻¹⁰⁰.

markers removed
233016 SNPs remaining

Exclusion of chromosome X markers
5541 markers removed
Final count: 227475 markers remaining

5. Tests using a reduced set of SNPs to identify relatedness
These tests use a set of SNPs pruned to be in approximate linkage equilibrium (in our set, 74011 markers).

Exclusion of individuals with low heterozygosity
If the observed heterozygosity is lower than expected, this suggests inbreeding or poor genotyping quality. Threshold for exclusion: P < 0·05.
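One common way to quantify a deficit of heterozygosity is the inbreeding coefficient F, which compares the observed number of homozygous genotypes in an individual with the number expected from allele frequencies. The sketch below is an illustration only: the text does not say which statistic produced the P-value, so the function name, the input format and the choice of F itself are assumptions.

```python
def inbreeding_coefficient(genotypes, allele_freqs):
    """Estimate F = (O_hom - E_hom) / (N - E_hom) for one individual.

    genotypes:    per-SNP allele counts (0, 1 or 2), None for a missing call
    allele_freqs: per-SNP frequency of the counted allele in the sample
    Assumes at least one polymorphic SNP, otherwise the denominator is zero.
    """
    observed_hom = 0
    expected_hom = 0.0
    n_called = 0
    for g, p in zip(genotypes, allele_freqs):
        if g is None:
            continue  # skip missing calls
        n_called += 1
        observed_hom += g in (0, 2)
        # Expected homozygosity under Hardy-Weinberg proportions.
        expected_hom += 1 - 2 * p * (1 - p)
    return (observed_hom - expected_hom) / (n_called - expected_hom)
```

A positive F indicates fewer heterozygous genotypes than expected, which is the pattern the exclusion criterion targets; a value near zero indicates no deficit.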

Exclusion of outliers for ancestry
A multidimensional scaling plot was derived from the reduced set of SNPs. The first two axes were used to plot individuals based on ancestry. Those outside the main cluster were excluded.

Exclusion of cryptically related individuals
Relatedness was assessed using pi-hat, the statistical genetics measure of the proportion of genetic variants identical by descent between two individuals. Thus, identical twins or duplicate samples show pi-hat = 1, siblings show pi-hat = 0·5, and so on. Threshold for exclusion: pi-hat > 0·05. One individual of each pair was removed at random.
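As a sketch of how this step might look in code, pi-hat can be computed from the estimated probabilities of sharing 0, 1 or 2 alleles identical by descent, as P(IBD=2) + 0·5 × P(IBD=1), and one member of each related pair can then be dropped at random. The function names and the input format are hypothetical; only the 0·05 threshold and the random removal come from the text.

```python
import random

PI_HAT_MAX = 0.05  # exclusion threshold quoted in the text


def pi_hat(z0, z1, z2):
    """Proportion IBD from P(IBD=0), P(IBD=1), P(IBD=2).

    z0 does not enter the formula; it is kept so that the three
    sharing probabilities are passed together.
    """
    return z2 + 0.5 * z1


def individuals_to_remove(pairs, rng=None):
    """Pick one individual at random from each pair above the threshold.

    pairs: iterable of (id_a, id_b, pi_hat_value) tuples (hypothetical format).
    A pair is skipped if one of its members has already been removed.
    """
    rng = rng or random.Random()
    removed = set()
    for id_a, id_b, value in pairs:
        if value > PI_HAT_MAX and id_a not in removed and id_b not in removed:
            removed.add(rng.choice([id_a, id_b]))
    return removed
```

With these definitions, duplicates or identical twins (sharing probabilities 0, 0, 1) give pi-hat = 1 and full siblings (0·25, 0·5, 0·25) give pi-hat = 0·5, matching the values stated above.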

individuals removed.
Final counts: 12263 individuals remaining (4133 cases, 8130 controls; 6873 males, 5390 females)

*SNP rs12608932 was missing from the DeCC and BACC control set (listed as UK_MDEPR in Tables 2 and S1), but passed all other quality control measures. It was therefore retained to allow study of the UNC13A association. The loss of data for 1505 controls reduced the power to detect an association at this SNP; the values equivalent to those given in the Results are 0·73 power in the independent data and 0·77 power in the joint analysis.

Controlling for population substructure
The Eigenstrat package was used to analyse population substructure, using genotypes at 74011 SNPs as covariates to generate principal components axes (PCAs) for all 12791 individuals. The Twstats package, which uses the Tracy-Widom distribution, was used to estimate the optimal number of PCAs to include. Correction for substructure was not necessary in the independent analysis (λGC = 1·01). In the joint analysis, correction was essential because of the multiple populations studied. After correction, λGC = 1·04, which suggests no inflation of the test statistic. The QQ plots provide additional evidence for this (Figures S1 and S2).

Table S2. SNP associations at P < 1 × 10⁻⁴ in the independent study based on an additive model in logistic regression. Five of the top ten SNPs are in the associated chromosome 9 region.
Supplementary Figure S1. Q-Q plot for the independent study. The grey shaded area shows the 95% confidence limits for the expectation under the null hypothesis, shown in red. The test statistic is not inflated: λGC = 1·01.
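The genomic control (GC) inflation factor reported for the independent and joint analyses is conventionally defined as the median of the observed 1-degree-of-freedom chi-square association statistics divided by the median of that distribution under the null hypothesis (about 0·455). The sketch below shows that calculation in Python under this standard definition. The function names are hypothetical, and it is an assumption that the study's values were derived in exactly this way.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Inflation factor from per-SNP 1-df chi-square statistics."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_inflation_from_p(p_values):
    """Inflation factor from two-sided P-values (each must be > 0 and <= 1).

    Each P-value is converted back to a chi-square statistic by squaring
    the corresponding standard normal quantile.
    """
    normal = statistics.NormalDist()
    chi2_stats = [normal.inv_cdf(p / 2) ** 2 for p in p_values]
    return genomic_inflation(chi2_stats)
```

A value close to 1 indicates that the bulk of the test statistics follow the null distribution, which is what the Q-Q plot shows graphically.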