Distribution of mitochondrial variants in the sequenced mtDNAs
The bioinformatics pipeline was confirmed to be consistent with the variant-calling pipeline for the mtDNA standard sequence [26]. According to NIST [38], the percentage of heteroplasmy for m.1393G>A was 17.4 ± 1.7% and that for m.7861T>C was 74.6 ± 14.5%, while the values in our study were 17.4 ± 0.6% and 89.5 ± 0.1%, respectively. These results confirmed the accuracy of the bioinformatics pipeline and experimental procedure for mtDNA sequencing and analysis. The average mtDNA coverage for the examined samples was 2,248X ± 599X.
We identified a total of 681 variants called from 118 individuals (Figure 1). Most of the variable positions were transitions (86.5% overall), while the overall percentage of transversions was about 3%. Our study showed that transitions were dominant, as reported previously for human mitochondrial genomes [39]. Indel polymorphism was detected at a rate of 8.5% among all individuals. The median of the mtDNA variants was 38 (range: 26–55). No variants were found in 9/22 tRNA-coding genes. MT-TRNP and the displacement loop (D-loop) had the most polymorphisms (113 and 74 variants, respectively) (Supplementary Table 1). No disease mutations with ‘confirmed’ status in the current MITOMAP database [39] were found in any individuals.
The lowest heteroplasmy level of mtDNA variants was 10% for several non-coding variants. The levels of heteroplasmy of some mutations (m.3206C>T, m.4824A>G, m.8473T>C, m.16179CA>C, and m.16183A>C) varied from approximately 10–100% (homoplasmy) and their allele frequency in the studied population was 11%. The m.16189T>C polymorphism causing a lower mtDNA copy number was present in 29/118 (24.58%) samples as homoplasmy, and in one healthy individual at a heteroplasmy level of 54.6% (Supplementary Table 2).
Classification of mitochondrial haplogroups in the Korean population
We examined all haplogroups previously reported in the Korean population in Korea1K (n = 1,094) [40], and listed in the Korean National Standard Reference Variome Database of whole genomes (KoVariome) [41]. Eleven variants (m.73A>G, m.263A>G, m.750A>G, m.1438A>G, m.2706A>G, m.4769A>G, m.7028C>T, m.8860A>G, m.11719G>A, m.14766C>T, and m.15326A>G) with an overall frequency ≥ 50% that are widespread across all lineages (L, M, and N) [42] were predominant in our study. Among these, five mtDNA variants present at ≥ 80% in lineages L, M, or N were found in all individuals. The rare mutation m.73A>C was observed in one individual in our population. The most dominant haplogroup for the Korean population was found to be D (34.74%), followed by M (16.1%). These haplogroups are prevalent in Asian populations [43] and were diverse in our study subgroups (Figure 2A, Table 1). The D2 sub-haplogroup, defined by mutation m.16271T>C [44] and possibly equating to D4e1b [45], was observed in one subject. Consequently, 87.8% of haplogroup D belonged to the D4 subgroup, which had the highest frequency in our Korean population; the D5 subgroup was observed in all remaining subjects. These results were consistent with the data of Korean1K, which includes sequence data from 1,094 whole genomes.
Table 1
The sub-haplogroups observed in our study (n=118) by using Haplogrep analysis
Haplogroup | Sub-haplogroup | Number of subjects (n=118) |
A | A+152+16362, A1, A11, A5a, A5a1a, A5b1a, A5b1b, A6, | 12 (10.17%) |
B | B4a1b1a, B4a1c1a1, B4a4, B4b1a2, B4b1a2a, B4c1a1a, B4c1a1b, B4d3a1, B4f1, B5a2a1+16129, B5b, B5b1, B5b2a2, | 14 (11.86%) |
C | C4a1a, C7a1c, | 2 (1.69%) |
D | D2, D4, D4+195, D4a, D4a1, D4a1c, D4a1h, D4a3b, D4a3b1, D4b2, D4b2a2a1, D4b2b, D4b2b1, D4c1a, D4c2c, D4e1a, D4e2, D4f1, D4g1, D4g1a, D4g1b, D4g1c, D4h1a, D4h1c1, D4i, D4n, D5a, D5a2a1+16172, D5b, D5b1, D5b1b, | 41 (34.75%) |
F | F1a1, F2, F2f, F2i, | 4 (3.39%) |
G | G1a1, G1a1a, G1a1a1, G2a+152, G2a1, G2a1+16189+16194, G2a1b, G2a1e, G2a5, G2b2, | 14 (11.86%) |
M | M10, M10a1a1b, M10a1b, M11b1a, M7a1a, M7a1a1, M7a1a5, M7b1a1a1, M7c1a2a1, M7c1a3, M7c1a5, M7c1b, M8a3a, M8a3a1, M9a1a1, M9a1a1a, | 19 (16.1%) |
N | N9a2, N9a2a, N9a2c, N9a2d, | 5 (4.24%) |
R | R+16189, R11a, | 2 (1.69%) |
Y | Y1, | 4 (3.39%) |
Z | Z4a, | 1 (0.85%) |
The distributions of the haplogroups in the Korean population were obtained from Korea1K [40], KoVariome [41] and five super-populations in the 1000 Genomes Project (1KG) [31] and then compared to our data. Our study showed the concordance of the haplogroups of the Korean and East Asian populations (1KG-EAS) (Figure 2B). Other super-populations had distinct mtDNA haplogroups, as described elsewhere [46]. Haplogroups M, A, H and L were most prevalent in South Asian (1KG-SAS), American (1KG-AMR), European (1KG-EUR) and African (1KG-AFR) populations, respectively.
Correlations between annotated mtDNA variants and drug-induced toxicity
The D-loop is 1,122 bp in length, and consists of two hypervariable regions (HVI at nucleotides 16024–16383 and HVII at nucleotides 57–372) and a tandem repeat of poly(C) at nucleotides 303–315 [47]. In all subjects, the mtDNA D-loop region had 74 polymorphisms. All 118 samples had the polymorphism m.263A>G in HVII and one subject carried m.73A>C, which was reported to be rare in Asian populations (0.01338509% in healthy individuals and 0.08976661% in patients). In 30/118 (25.42%) of the samples in the present study, a thymine to cytosine (T>C) transition was found at position 16189, producing an uninterrupted homopolymeric tract at mtDNA segment 16180–16195 in HVI. In total, 3 of the 118 samples (2.5%) had 12 uninterrupted cytosine residues, with the heteroplasmy level of m.16189T>C varying from 54.6–99.9%. Meanwhile, 2 of the 118 (1.7%) samples had length heteroplasmy in the mitochondrial HVII segment at positions 303–315 due to the insertion of cytosine at position 309, and one subject showed replication slippage in the case of a T>C transition at position 310.
We observed a total of 40 mitochondrial variants in mitochondrially encoded 12S rRNA (MT-RNR1) and mitochondrially encoded 16S rRNA (MT-RNR2) genes. Several mtDNA mutations associated with antibiotic-induced toxicity were detected in these two genes (Table 2). The m.1555A>G and m.1494C>T mutations are well known to be associated with ototoxicity in patients treated with aminoglycoside antibiotics, but were not observed in our population. We may not have been able to detect these rare variants, reported to be found at a rate of 2 per 2,922 individuals (0.07%) in the Korean population [48], due to the small sample size. Other aminoglycoside-induced hearing loss variant, m.1189T>C, was observed in 1.69% of all samples; the rate was previously reported as 0.34% for the Korean population [49, 50]. Variant at position 961 was identified at a frequency of 5.93%; these are more likely to be found in Korean than Asian populations (1–3% in the 1KG). The polymorphism of m.2706A>G transition, a variation previously suggested to lead to a predisposition to linezolid-associated lactic acidosis [51, 52], was dominant in all subjects. The m.3010G>A allele, which is associated with linezolid-induced mitochondrial toxicity, was more common in the Korean population (30.51%), consistent with previous studies [44,]. Lastly, we observed a high frequency of m.10398A>G variant that was reported to influence metabolic ART effects [53] at homoplasmic level in our study.
Table 2
Frequencies of mitochondrial variants in association with drug-induced toxicity in this study (n=118)
Gene | Variant | rsID | Drug | Clinically relevant | Frequency in our study (%) |
MT-RNR1 | m.663A>G | rs56489998 | Aminoglycoside | Ototoxicity | 10.17% |
m.961T>C | rs3888511 | Aminoglycoside | Ototoxicity | 5.93% |
m.961T>C | rs3888511 | Linezolid | Mitochondrial toxicity | 5.93% |
m.1095T>C | rs267606618 | Aminoglycoside | Ototoxicity | 0.85% |
m.1189T>C | rs28358571 | Aminoglycoside | Ototoxicity | 1.69% |
MT-RNR2 | m.2706A>G | rs2854128 | Linezolid | Lactic acidosis, Mitochondrial toxicity | 99.15% |
m.3010G>A | rs3928306 | Linezolid | Mitochondrial toxicity | 30.51% |
MT-ND3 | m.10398A>G | NA | ART | Metabolic/ cardiovascular complications | 70.34% |
* MT-RNR1: Mitochondrially encoded 12S ribosomal RNA; MT-RNR2: Mitochondrially encoded 16S ribosomal RNA; MT-ND3: Mitochondrially encoded NADH: Ubiquinone Oxidoreductase Core Subunit 3. |