Recent Mitochondrial DNA Mutations Increase the Risk of Developing Common Late-Onset Human Diseases

Mitochondrial DNA (mtDNA) is highly polymorphic at the population level, and specific mtDNA variants affect mitochondrial function. With emerging evidence that mitochondrial mechanisms are central to common human diseases, it is plausible that mtDNA variants contribute to the “missing heritability” of several complex traits. Given the central role of mtDNA genes in oxidative phosphorylation, the same genetic variants would be expected to alter the risk of developing several different disorders, but this has not been shown to date. Here we studied 38,638 individuals with 11 major diseases, and 17,483 healthy controls. Imputing missing variants from 7,729 complete mitochondrial genomes, we captured 40.41% of European mtDNA variation. We show that mtDNA variants modifying the risk of developing one disease also modify the risk of developing other diseases, thus providing independent replication of a disease association in different case and control cohorts. High-risk alleles were more common than protective alleles, indicating that mtDNA is not at equilibrium in the human population, and that recent mutations interact with nuclear loci to modify the risk of developing multiple common diseases.


Introduction
Mitochondria are the principal source of cellular adenosine triphosphate (ATP) generated through oxidative phosphorylation (OXPHOS), which is linked to the respiratory chain. In humans, thirteen OXPHOS proteins are synthesised from the 16.5 Kb mitochondrial genome (mtDNA). MtDNA has accumulated genetic variants over time, and being strictly maternally inherited, undergoes negligible intermolecular recombination. As a consequence, ancient variants extant in the human population define haplogroups that have remained geographically or ethnically restricted [1]. Work on European haplogroups has shown that some polymorphic mtDNA variants affect mitochondrial function [2,3].
Given emerging evidence that mitochondria play a key role in several common diseases, it is likely that variation of mtDNA could alter the risk of developing different human disorders. Early mtDNA genetic association studies were under-powered, and the vast majority have not been replicated [4]. However, some recent large studies have found replicable associations with specific human diseases [5][6][7][8][9][10][11], most notably in sporadic Parkinson's disease [12][13][14]. These observations implicate mtDNA as part of the “missing heritability” of complex human disease traits.

Common mtDNA variants are associated with common disease
After applying stringent quality control measures (Supplementary Materials, Table S1 & S2), we initially compared the two healthy control groups using PLINK v2.050 [15] (Supplementary Materials, Figure S1), and found no significant difference in allele frequencies. We therefore merged control groups genotyped on the same platform for all subsequent analyses as follows: WTCCC-Control-1, WTCCC-Control-2 and WTCCC-Control-3 (Supplementary Materials, Table S2).
Cluster plots produced by principal component analysis (PCA) revealed no significant population stratification when comparing either datasets from the same array or array-specific control datasets (Supplementary Materials, Figure S4).

Phylogenetically-related mtDNA variants are associated with common disease
Next we performed lexical tree building to identify new associations with phylogenetically related variants, but without basing our analysis on any prior assumptions related to the published mtDNA haplogroup structure [18,19]. This method uses fewer SNPs because individuals with missing SNP data cannot be used, but has greater power, and provides graphical summaries of the combinations of SNPs that are associated with increased or decreased risk of disease (Supplementary Materials, Table S4). Lexical tree analysis identified significant relationships between the mtDNA tree structure and schizophrenia, primary biliary cirrhosis, multiple sclerosis (each at p < 10⁻⁶), ulcerative colitis (p < 10⁻⁴), and Parkinson's disease (p = 0.004) (Table 1 and Supplementary Materials, Figure S3), independently confirming previous haplogroup-based associations [5,12,16,17], and revealing new mtDNA clades associated with several different diseases. The other case-control trees, and comparisons between the different control populations, were not significant at the 1% level.

Imputed mtDNA variants are associated with several different common diseases
To determine the functional basis of the associations we imputed missing genotypes across the whole mitochondrial genome using 7,729 complete mtDNA sequences. Subsequent analyses were performed on 35,901 European cases and 15,302 European controls, and captured 40.41% of European mtDNA population genetic variation (Supplementary Materials, Figure S2).
In keeping with our original hypothesis, specific variants with predicted functional consequences conferred either an increased risk (Table 2a) or decreased risk (Table 2b) across several different diseases. In addition, we identified the same allele-specific associations for different diseases compared to different platform-specific control groups, reinforcing these findings. Functional variants associated with an increased risk in two or more diseases were limited to two structural genes: MTCYB (m.14793, m.15218) and MTCO3 (m.9477, m.9667). The only non-synonymous protein-encoding variant consistently associated with a reduced risk of disease was in MTND3 (m.10398).
We also found evidence of associations across multiple diseases within the non-coding region (d-loop) of mtDNA, and 16S ribosomal RNA subunit genes (Figure 2, Table 2 and Supplementary Materials, Table S3). Intriguingly, the same alleles were not associated with all of the diseases we studied, and for two variants (m.11299, m.16294), the same allele had opposite effects for two different diseases (Table 2c).

Discussion
Following stringent quality control, our initial analysis confirmed previous associations between mtDNA haplogroups and common disease in a much larger data set. These findings were independently supported by lexical tree-based analysis at higher levels of statistical significance. Subsequent imputation of missing genotypes captured >40% of European mtDNA population genetic variation in 35,901 European cases and 15,302 European controls. By simultaneously analysing eleven, ostensibly unrelated, diseases we identified several imputed mtDNA variants that were associated with more than one disease. The same associations were seen in different disease groups compared to different control groups. This provided confirmatory independent replication of a disease association, and supports our original hypothesis that the same genetic variants of mtDNA contribute to the risk of developing several common complex diseases.
Variants increasing the risk of two or more diseases were limited to MTCYB (m.14793, m.15218) and MTCO3 (m.9477, m.9667), encoding variants in cytochrome b (H16R, T158A) and subunit 3 of cytochrome c oxidase (complex IV, V91L, N154S). Functional variants of MTCYB have previously been associated with several human phenotypes [20][21][22], but the most compelling evidence of a prior disease association is the increased risk of developing blindness in subjects harboring the mtDNA mutations in MTND genes known to cause Leber hereditary optic neuropathy (LHON), where they synergistically interact with a primary LHON mutation to cause a defect of OXPHOS complex I activity [23]. On the other hand, the only non-synonymous protein-encoding variant associated with a reduced risk of several diseases was m.10398 in MTND3 (complex I, T114A). m.10398 occurs twice on the human mtDNA phylogeny (homoplastic on haplogroups J and K), and has previously been associated with a reduced risk of Parkinson's disease [14,24]. This variant has been shown to reduce complex I activity, cytosolic calcium levels, and the mitochondrial membrane potential [3,25,26] and thus may reduce the level of reactive oxygen species, contributing to the underlying disease mechanism of several disorders.

Author Summary
There is a growing body of evidence indicating that mitochondrial dysfunction, a result of genetic variation in the mitochondrial genome, is a critical component in the aetiology of a number of complex traits. Here, we take advantage of recent technical and methodological advances to examine the role of common mitochondrial DNA variants in several complex diseases. By examining over 50,000 individuals, with 11 different diseases, we show that mitochondrial DNA variants can either increase or decrease an individual's risk of disease, replicating and expanding upon several previously reported studies. Moreover, by analysing several large disease groups in tandem, we are able to show a commonality of association, with the same mitochondrial DNA variants associated with several distinct disease phenotypes. These shared genetic associations implicate a shared underlying functional effect, likely changing cellular energy, which manifests as distinct phenotypes. Our study confirms the important role that mitochondrial DNA variation plays in complex traits and additionally supports the utility of a GWAS-based approach for analysing mitochondrial genetics.

Variants in MTCO3 are typically associated with primary mitochondrial disorders [27,28], but have also been identified as risk factors in Alzheimer's disease [29,30], migrainous stroke [31] and sporadic optic neuropathy [32]. m.9477 and m.9667 are non-synonymous protein-encoding variants that are cladally related, being present on haplogroup U sub-branches (U5 and U5a1b, respectively). Cybrid studies of haplogroup U show a reduction in mtDNA copy number, resulting in a reduction in mitochondrial protein synthesis and complex IV activity [3,25], impairing energy production and likely contributing to disease.
We also noted disease associations with substitutions in the non-coding region and ribosomal genes (Table 2 and Supplementary Materials, Table S3). Although highly polymorphic at the population level (Figure 2), there is emerging evidence that both regions can have functional effects through an effect on mtDNA replication, transcription or translation [33,34], as proposed in Alzheimer's disease [34].
It is intriguing that there were more functional variants associated with an increased risk than with a decreased risk of disease (Table 2 and Supplementary Materials, Table S3). This suggests that deleterious, novel sub-haplogroup variants have not yet been removed from the population through natural selection, possibly including the younger d-loop variants. This has been observed in the nuclear genome in the rapidly expanding human population [35,36], implying that the modern human population is far from equilibrium. An alternative explanation is that mtDNA alleles may escape purifying selection because the
For two variants (m.11299, m.16294), the same allele was associated with an increased risk of developing one disease, and a reduced risk of developing another (Table 2). Although differences in the sample size post-QC provide one explanation, these findings raise the possibility that different mtDNA-mediated mechanisms are involved in different contexts, perhaps because some variants have a greater impact on bioenergetics, and others on the generation of reactive oxygen species. Alternatively, it is conceivable that the relevance of specific alleles may be context-specific, only exerting a functional effect on a particular haplogroup background [37]. Substantially larger whole mtDNA genome studies will be required to detect clade-specific epistatic interactions if they exist.
In some instances we observed multiple associations with different variants found within the same phylogenetic cluster. For example, m.499 (K1a), m.11485 (K1a4) and m.11840 (K1a4a1) are known to reside within subdivisions of the major haplogroup K, and all are associated with a decreased risk of MS and IS. Conversely, m.310 (U4a2) and m.3197 (U5) lie in distinct subclades of haplogroup U and are associated with an increased risk of PS, MS, IS, PD, AS and UC. Although reassuring from a technical perspective, this illustrates the challenge of mtDNA association studies, where variants with a close ancestral relationship inevitably co-segregate, making it difficult to determine which alleles are responsible for the disease risk.
Finally, analysis of imputed data also revealed several different mtDNA alleles associated with different diseases, often reaching high levels of statistical significance (P < 10⁻¹⁰, Supplementary Materials, Table S3). However, these findings should only be considered preliminary and require independent replication in other populations (where specific European haplogroup distributions can vary) and thus do not form the major focus of this report.
In conclusion, these findings underscore the role of mitochondrial mechanisms in the pathogenesis of common diseases, and emphasise the importance of incorporating the mitochondrial genome in comprehensive genetic association studies. Although the strict phylogenetic structure of maternally inherited mtDNA makes it difficult to identify the precise variants responsible, higher resolution genotyping at the whole mtDNA genome level will cast further light on the genetic mechanisms, particularly if recurrent homoplasies independently associate with phenotypes across several clades.
To ensure valid comparisons, each disease sample set was only compared to its corresponding control array counterpart (i.e. SNP6.0 cases were compared to SNP6.0 controls).
Power calculations were carried out using Genetic Power Calculator [58].

Population stratification
Only ancestral Europeans, determined by mitochondrial DNA genotype, were included in this study [1,60,61]. Additionally, population structure in each cohort (post-QC) and combined by array type was assessed by principal component analysis (PCA) of mitochondrial DNA variants [62]. Plots were made of the first two components for each array dataset (Illumina = AS Figure S3). At this resolution, individual PCA cluster analysis showed no significant stratification differences. All principal component scores were calculated in R using the 'princomp' function and plotted in R using ggplot (R Core Team 2013) [63].
The reference panel was merged with each QC'd case-control cohort in PLINK (v2.050) [15], invoking '--flip-scan' to detect and correct any stranding issues. Imputation association testing was carried out using '--proxy-assoc' and, in order to assess the imputation performance, '--proxy-drop' [15]. Significant SNP associations with >99% of samples imputed, number of proxy SNPs >3, a MAF >0.01 and a content metric >0.8 were retained [15]. Given a population size of 7,729 and total genotypic information of 2,873 as 100%, imputation of alleles with MAF >0.0 captures 40% of total mtDNA genetic variability (Figure S2).
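The retention criteria above can be illustrated with a short Python sketch. It is only a minimal example under stated assumptions: the record fields (frac_imputed, n_proxies, maf, info) and the toy records are hypothetical names chosen for illustration and do not correspond to PLINK output columns; only the four thresholds are taken from the text.

```python
# Hypothetical records: field names and values are illustrative, not PLINK output.
rows = [
    {"snp": "snp_a", "frac_imputed": 0.995, "n_proxies": 5, "maf": 0.04, "info": 0.91},
    {"snp": "snp_b", "frac_imputed": 0.995, "n_proxies": 3, "maf": 0.04, "info": 0.91},
    {"snp": "snp_c", "frac_imputed": 0.970, "n_proxies": 6, "maf": 0.10, "info": 0.95},
    {"snp": "snp_d", "frac_imputed": 0.999, "n_proxies": 4, "maf": 0.005, "info": 0.85},
]


def passes_imputation_qc(row, min_imputed=0.99, min_proxies=3,
                         min_maf=0.01, min_info=0.8):
    """Apply the four strict (greater-than) thresholds stated in the text."""
    return (
        row["frac_imputed"] > min_imputed   # >99% of samples imputed
        and row["n_proxies"] > min_proxies  # more than 3 proxy SNPs
        and row["maf"] > min_maf            # minor allele frequency > 0.01
        and row["info"] > min_info          # content metric > 0.8
    )


kept = [row["snp"] for row in rows if passes_imputation_qc(row)]
```

All four comparisons are strict, so a variant with exactly three proxy SNPs (snp_b in the toy data) is excluded.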

Lexical tree analysis
Lexical tree analysis was performed in R (R Core Team 2013) [63] using a custom library (snptree, publicly available from http://www.staff.ncl.ac.uk/i.j.wilson/). This analysis was performed on the Illumina 610K quad array, the Affymetrix SNP6.0 and the Metabochip datasets independently. An independent stringent QC was performed, removing in order: the SNPs with a call rate of below 95% or a MAF of below 0.5%, the 2% of individuals with the most missing sites, the bottom 50% of SNPs with the most missing samples at that site, and those individuals with any missing data from the remaining SNPs. Finally, those individuals with haplotypes (defined by all the remaining SNPs) that were not present in controls or had a frequency of less than 5 were removed. This left 27,054 individuals on 24 SNPs for the Illumina 610K quad array (Table S4). A tree structure was constructed for haplotypes made from the retained SNPs by initially grouping all individuals at the root of a tree, and then successively considering all retained SNPs in decreasing order of their minor allele frequency (Supplementary Materials, Figure S3). At each stage, the haplotypes at each leaf node are split, with those carrying the wild-type allele being put on the left branch and those with the mutant allele on the right. This creates a tree with all leaves representing complete haplotypes and internal nodes representing partial haplotypes. Test statistics were then calculated for each node on the tree. An overall test statistic for the tree was calculated as the sum of the five largest node values that were not ancestors or descendants of each other. The test statistic was tested for significance by 1,000,000 random permutations of the case/control labels.
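The tree construction and permutation test described above can be sketched as follows. This is not the snptree implementation: it is a minimal Python illustration under stated assumptions. The per-node statistic is assumed here to be a 2x2 chi-square (inside versus outside the node, case versus control), since the text does not specify it; the five unrelated nodes are chosen greedily, which is one possible reading of the description; and the toy haplotypes and function names are hypothetical.

```python
import random
from collections import defaultdict


def order_snps_by_maf(haplotypes):
    """Column indices sorted by decreasing minor allele frequency."""
    n = len(haplotypes)

    def maf(j):
        freq = sum(h[j] for h in haplotypes) / n
        return min(freq, 1 - freq)

    return sorted(range(len(haplotypes[0])), key=maf, reverse=True)


def node_counts(haplotypes, labels, order):
    """Map each node (a prefix of alleles along the tree) to [controls, cases]."""
    counts = defaultdict(lambda: [0, 0])
    for hap, is_case in zip(haplotypes, labels):
        path = tuple(hap[j] for j in order)
        for depth in range(1, len(path) + 1):  # the root is skipped
            counts[path[:depth]][is_case] += 1
    return counts


def chi_square(node_ctrl, node_case, tot_ctrl, tot_case):
    """Assumed node statistic: 2x2 chi-square without continuity correction."""
    a, b = node_case, node_ctrl
    c, d = tot_case - a, tot_ctrl - b
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else (a + b + c + d) * (a * d - b * c) ** 2 / denom


def related(p, q):
    """True if one node is an ancestor of (or identical to) the other."""
    k = min(len(p), len(q))
    return p[:k] == q[:k]


def tree_statistic(haplotypes, labels, order, top=5):
    """Greedy sum of the largest node values with no ancestor/descendant pairs."""
    tot_case = sum(labels)
    tot_ctrl = len(labels) - tot_case
    counts = node_counts(haplotypes, labels, order)
    scored = sorted(
        ((chi_square(c[0], c[1], tot_ctrl, tot_case), node)
         for node, c in counts.items()),
        reverse=True,
    )
    chosen, total = [], 0.0
    for value, node in scored:
        if all(not related(node, other) for other in chosen):
            chosen.append(node)
            total += value
            if len(chosen) == top:
                break
    return total


def permutation_p(haplotypes, labels, n_perm=200, seed=0):
    """Permute case/control labels and compare against the observed statistic."""
    rng = random.Random(seed)
    order = order_snps_by_maf(haplotypes)
    observed = tree_statistic(haplotypes, labels, order)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if tree_statistic(haplotypes, shuffled, order) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): 0 = wild type, 1 = mutant; labels 1 = case, 0 = control.
haps = [(1, 0)] * 10 + [(0, 0)] * 10 + [(0, 1)] * 4
labs = [1] * 10 + [0] * 10 + [1, 0, 1, 0]
observed, p_value = permutation_p(haps, labs)
```

The sketch uses 200 permutations only to keep the example fast; the analysis in the text used 1,000,000. Adding one to both numerator and denominator keeps the estimated p-value above zero.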

Table S3
Association between imputed mitochondrial DNA variants and eight complex diseases, showing the corresponding control cohort, array SNP ID, variant position in the mitochondrial genome (rCRS, NC_012920), minor allele frequency in cases and controls (A1-cases and A1-Cont. respectively), case-control comparison (chi-square test P, na = not available in primary analysis), imputed significance (P) and odds ratio (OR). Hap = corresponding major and sub mitochondrial haplogroup. (DOCX)