Single-mitochondrion sequencing uncovers distinct mutational patterns and heteroplasmy landscape in mouse astrocytes and neurons

Background Mitochondrial (mt) heteroplasmy can cause adverse biological consequences when deleterious mtDNA mutations accumulate disrupting “normal” mt-driven processes and cellular functions. To investigate the heteroplasmy of such mtDNA changes, we developed a moderate throughput mt isolation procedure to quantify the mt single-nucleotide variant (SNV) landscape in individual mouse neurons and astrocytes. In this study, we amplified mt-genomes from 1645 single mitochondria isolated from mouse single astrocytes and neurons to (1) determine the distribution and proportion of mt-SNVs as well as mutation pattern in specific target regions across the mt-genome, (2) assess differences in mtDNA SNVs between neurons and astrocytes, and (3) study co-segregation of variants in the mouse mtDNA. Results (1) The data show that specific sites of the mt-genome are permissive to SNV presentation while others appear to be under stringent purifying selection. Nested hierarchical analysis at the levels of mitochondrion, cell, and mouse reveals distinct patterns of inter- and intra-cellular variation for mt-SNVs at different sites. (2) Further, differences in the SNV incidence were observed between mouse neurons and astrocytes for two mt-SNV 9027:G > A and 9419:C > T showing variation in the mutational propensity between these cell types. Purifying selection was observed in neurons as shown by the Ka/Ks statistic, suggesting that neurons are under stronger evolutionary constraint as compared to astrocytes. (3) Intriguingly, these data show strong linkage between the SNV sites at nucleotide positions 9027 and 9461. Conclusions This study suggests that segregation as well as clonal expansion of mt-SNVs is specific to individual genomic loci, which is important foundational data in understanding of heteroplasmy and disease thresholds for mutation of pathogenic variants. Supplementary Information The online version contains supplementary material available at 10.1186/s12915-024-01953-7.


Supplemental Methods.
Morphology criteria for single cell pickup: Since both neurons and astrocytes can show variable morphologies, we used the following criterion for each cell type.
Neuron -Pyramidal neurons (A), bipolar (B) or unipolar (C), were selected (Figure S14 A-C).A cell was classified as a neuron if it had a small soma (approximately <10 um) with elongated 1-3 processes.
Astrocytes -Cells displaying 'classic-star shaped' morphology with multiple processes were chosen as astrocytes.A cell with a relatively larger (>10 um) and flatter cell body, accompanied by more than 4 processes of similar lengths were designated as astrocytes (Figure S14 D-F).
Additionally, we have incorporated references demonstrating that the neurons and astrocytes chosen for analysis exhibited morphological characteristics consistent with those of mouse cortical neurons and astrocytes identified by immunofluorescent markers.
Please refer to the following references showing neuronal and astrocyte morphology that resembles the cells we have selected as mouse cortical neurons and astrocytes

Mitochondrial (mt) barcode demultiplexing.
To determine the optimal threshold of error tolerance in mt barcode demultiplexing, the Levenshtein distance was calculated between any two barcodes (Table S1B) and a minimal distance 3 was obtained.Therefore, at most two errors in searching each of the 11bp barcodes were permitted using the command: cutadapt -g XCGATTATCACG -e 0.2 -O 11 (CGATTATCACG being M1 as an example, Table S1A).Please refer Figure S17-18 for sample overview and PCR read depth for each mt barcode.
Out-of-range SNV filtering.The mt barcode and the PCR primers' efficiency by the read depth was examined at each base position.We noticed the trailing coverage at the 3' end of some PCR target regions (e.g., Region3, 6 and 7, Figure S18).For the sake of rigor, the bases out of the sequencing read length were designated as "out-of-range", SNVs (if any) inside these regions were discarded.In addition, SNVs (if any) inside the PCR forward primers were removed because the genetic variants could be masked by the identically synthesized primer sequences, and hence any SNVs inside the primers could be artifactual.
Base Phred score filtering.To prevent false positives caused by sequencing errors, polymorphic bases with Phred scores below 30 were discarded.This threshold was derived from the Phred calibration analysis, where all non-reference bases were collected and their theoretical (x-axis) error rate was plotted from Phred calls against the observed error rate (y-axis), which was composed of true SNVs and technical errors.As expected, two well separated clusters were observed: the top-right quadrant (VAF >= 5%, Phred >= 30) and the bottom-left quadrant (VAF < 5%, Phred <30) (Figure S19).The threshold was maintained at 30 for the sake of simplicity (not lowered to 28.99 -the red dashed line).Only < 10% read depth was lost when imposing the more stringent cutoff, and most of the loss occurred to the 3' end.
Variant allele frequency threshold.Assuming that the RCA reaction went through  !cycles at a per-base error rate  !, and that the PCR reaction went through  " cycles at a per-base error rate  " .For a particular site, let  !be the total number of mutations resulting from RCA, and that  !~binom( !,  ! ), hence after RCA  !mutant copies and  !−  !intact copies will be obtained.Assuming no backward mutation, all mutated molecules would be amplified by PCR, and the final number of mutant copies would be  !× 2 # ! .For each of the RCA intact copies, assuming it induces  " $ errors during PCR cycle , where there are 2 $%! reactions, hence  " $ ~ binom(2 $%! ,  " ), and these errors would propagate and eventually result in 2 # !%$ ×  " $ mutant copies.Therefore, the total number of mutant copies exclusive to PCR would be . After RCA and PCR, the total number of amplicons would be  =  !× 2 # ! and among them, the total number of mutants . Assuming the amplicons to the depth of  were sequenced and  variant bases were observed, then ~binom 8, ' ( 9, hence at the VAF threshold , the probability of making a false positive call is Pr( > ).
Control experiment assessment.The raw read depth in the single mtDNA samples versus different types of controls was compared: pooled mtDNA(+), pooled mtDNA(+) primer(-), pooled mtDNA(-) primer(+), pooled mtDNA(-)primer(-) in 12 PCR target regions (Figure S21).As expected, in most of the amplified SNV regions, the positive controls (i.e., pooled, double positive) gave rise to the highest read depth, and the single-mitochondrion, double positive samples often showed the second highest depth.Compared with the positive controls, the negative controls generally showed two orders of magnitude lower depth.It is noteworthy that the read depth is not an accurate estimator for sensitivity for it is heavily influenced by PCR efficiency, starting material and sequencing depth, hence single-mitochondria samples and pooled controls might not be well comparable.The main purpose of this analysis was to highlight the global difference between the positive and negative pooled controls, and the results did match our expectation.
Correction for the soft-clipped 9027.During read alignment, we noticed that STAR could soft-clip the mismatching base(s) at the 3' end of reads and hence could cause an underestimated VAF should the variant base be at the 3' end.A systematic diagnostic analysis was performed to evaluate the impact of this issue.We found that 9027 was the only position affected by this problem-of 1489 mitochondria that have base coverage at 9027 from raw reads, there were in total 107,259,677 As, 13,953,942 Gs, 330,682 Cs, and 294,080 Ts; STAR discarded the non-reference As.Therefore the 9027:G>A read frequency for each sample was manually compensated.Figure S22 represents the change in 9027:G>A VAF before and after the correction.
Mouse strains coordinate standardization.During the phylogenetic analysis, a subtle discrepancy was seen in the UCSC GRCm38 reference genome (C57BL/6J) which is used in this study and the C57BL/6NJ genome (CM004277.1)from the Mouse Genomes Project: the former has 16299bp while the latter has 16300bp (Figure S24).Hence a distinction was made between C57BL/6J and C57BL/6NJ: UCSC C57BL/6J was added to the 16 strains from the Mouse Genomes Project and multiple sequence alignment was performed to 17 strains in total.The mitochondrial coordinate system was standardized based on UCSC GRCm38 (C57BL/6J).
Alignment quality assessment for linkage analysis.During analyzing SNV linkages, some significantly linked SNVs were observed in a small fraction of amplicons and showed suboptimal read quality.For example, 6 variants around 13775 were located on reads starting ~80bp downstream the 5' end of the PCR forward primer (Figure S25) and appeared as linked SNVs.To avoid suboptimal alignment that could be a major confounder for this analysis, a stricter set of filtering criteria was used: (1) reads must be on the positive strand only, (2) reads must start off-5'end only by +/-1 bp, (3) read length must be >= 135bp.Using this stricter processing, 776 total SNV sites and 209 SNV sites shared by >=3 mitochondria were observed.Comparing with the permissive processing (Figure S26), 92.6% (776) SNV sites survived the stricter filtering (Figure 26A).The stricter filtering also led to 22 new SNV sites shared by >= 3 mitochondria because of an increase in the VAF after the stricter filtering (Figure S26B).

NUMTs Computational Analysis on SMITO data.
To determine the number of false positives due to the potential NUMTs (12-16) contamination, we first assembled a list of mouse NUMTs regions.Briefly, we synthesized 50 bp reads at each mitochondrial position and mapped them to the nuclear genome using BLAST with highly sensitive parameters: -num_alignments 1000 -word_size 6 -perc_identity 80 -gapopen 3 -gapextend 1 -evalue 1; the results were merged as disjoint regions and referred to as NUMTs.We synthesized 50 bp reads from NUMTs and mapped them back to the mt-genome using bwa mem.To make the mapping quality comparable to STAR, we filtered for alignments with MAPQ >= 60.Finally, we called 381 NUMTs original, false mt-SNVs (Supplementary Fig S23A, Supplementary Dataset S10) using the aforementioned procedure in Methods section.
Please note that, for this analysis we only required VAF > 0 and did not filter out low-depth or low-VAF SNVs, hence our false positives here are overestimated.We compared our SNVs against this list; only 27 overlapped suggesting that, by computational analysis at most, 27 out of 838 (3.2%) SNVs could potentially be artifacts originating from NUMTs.
Furthermore, mapping reads that did not align to the mouse mt-genome to the mouse nuclear genome yielded a minimal number of unique alignments (average 0.33%, median 0.08%).Given the same amplification, library prep and sequencing conditions, nucleus borne reads should have mismatch rates comparable to mt reads.These nucleus exclusive alignments showed significantly (Wilcoxon rank-sum test p-value <2.2e-16) higher mismatch rate (mean 5.19%, median 5.27%) than mt exclusive alignments (mean 1.41%, median 1.17%).

9 Figure S1 .
Figure S1.An overview of the distribution of the total SNVs.(A) The distribution of

Figure S2 .
Figure S2.Distribution of the SNV counts in the SMITO dataset.Histograms showing

Figure S4 . 13 Figure S5 .
Figure S4.Comparison of Cells and Mitochondria Sharing Deleterious and

Figure S7 .
Figure S7.Comparison between the inherited and the somatic SNVs in mutational propensity.(A) Inherited SNVs' per-base variant rate on the y-axis in each target gene

Figure S10 .
Figure S10.Comparing inherited SNVs' AF variation across the mouse, cell, and mitochondrion level.(A) The observed (the colored line) and the null (gray lines) distribution of the AF variation proportion at the mouse (left), cell (middle) and

Figure S11 .
Figure S11.RNAfold predicted tRNA secondary structure changes caused by

Figure S12 .
Figure S12.Ka/Ks statistics in total SNVs.The background distribution was derived from simulated random mutations (3000 times permutation) conditioned on the overall mutational spectrum.The red dashed-line here represents the observed Ka/Ks from total

Figure S14 .Figure S15 .
Figure S14.Representative images of primary mouse neurons and astrocytes.A representative image of a pyramidal (A), bipolar (B) and unipolar (C) neuron, shown by

Figure S16 .
Figure S16.RCA products from single mitochondrion samples.Shown here is a

Figure S17 .
Figure S17.Sample size and quality assessment of positive and negative controls.The heatmap showing the number of bases with sufficient depth (≥ 50) in each mt (row) for each PCR target region (depicted as columns).

Figure S18 .
Figure S18.Per-base read depth for each mt-barcode and each PCR target region.Each subfigure depicts one of the 120 combinations of mt-barcode and PCR target region, and an individual curve (and color) within a subfigure represents a single mitochondrion.

Figure S19 .
Figure S19.Phred score threshold choice and its impact on read coverage.(A) Scatter plot showing the theoretical error rate (Phred score, x-axis) and the observed

Figure S21 .
Figure S21.Boxplots of the read depth in each of the 12 PCR target regions.The

Figure S22 .
Figure S22.Correction for Soft-clipping by STAR on 9027:G>A.The VAF difference between before and after patching the non-reference base As that were soft-clipped by

Figure S23 .
Figure S23.Mouse NUMTs Analysis on the SMITO dataset.(A) Homologous mtDNA shown as links; colors denote individual chromosomes.(B) Representative example of

FigureS25.
Figure S24.Sequence comparison between UCSC GRCm38 (C57BL/6J) and C57BL/6NJ from the Mouse Genomes Project in the region highlighting two

Figure S26 .
Figure S26.Comparison between the permissive and the stricter filtering.(A) Venn diagram showing the overlap of all SNV sites between the permissive and the stricter

Table S1B .
Edit distances between any pair of the mitochondrial-barcodes.

Table S2 .
List of the sequences of the forward/reverse PCR primers for selected specific target regions.

The following supplementary datasets are provided as individual excel files. Legends for Datasets: Dataset S1 (separate file).
The metadata for each mitochondrion or control sample.The impact annotation for all 1032 SNVs with SIFT scores.Major alleles and top minor alleles of SMITO and the 17 mouse strains data.