TruePrime is a novel method for whole-genome amplification from single cells based on TthPrimPol

Sequencing of a single-cell genome requires DNA amplification, a process prone to introducing bias and errors into the amplified genome. Here we introduce a novel multiple displacement amplification (MDA) method based on the unique DNA primase features of Thermus thermophilus (Tth) PrimPol. TthPrimPol displays a potent primase activity that, unlike conventional primases, prefers dNTPs as substrates. Combining TthPrimPol's unique ability to synthesize DNA primers with the highly processive Phi29 DNA polymerase (Φ29DNApol) enables near-complete whole-genome amplification from single cells. This novel method demonstrates superior breadth and evenness of genome coverage, high reproducibility, excellent single-nucleotide variant (SNV) detection rates with low allelic dropout (ADO), and low chimera formation, as exemplified by sequencing HEK293 cells. Moreover, copy number variant (CNV) calling yields superior results compared with random primer-based MDA methods. The advantages of this method, which we named TruePrime, promise to facilitate and improve single-cell genomic analysis.

alignment tool provided at the BLAST server, using the following alignment, CDD and query clustering parameters: gap penalties (-11, -1); end-gap penalties (-5, -1); BLAST E-value (0.003); word size (4); max. cluster distance (0.8). The multiple alignment confirmed the presence of extensive similarities in addition to the three most conserved AEP motifs (A, B and C), which contain the three invariant catalytic carboxylates (indicated with a red dot in motifs A and C) and the invariant histidine (indicated with a blue dot in motif B), and the C-terminal PriCT-1 domain 2. (c) A phylogenetic tree was generated at the BLAST web server from the 90 selected bacterial sequences, using the Fast Minimum Evolution method with the following settings: maximal sequence difference: 0.85; distance: Grishin (protein). The tree was rendered with the NCBI Tree Viewer application in a Circular Tree layout, emphasizing the evolutionary distance (indicated with bold/colored lines). The different groups of bacteria in which a potential close orthologue of TthPrimPol was detected are indicated in different colors.

Supplementary Figure 2
TruePrime-amplified DNA is target derived even at 1 fg input. Diagrams showing the percentage of reads that can be assigned unambiguously to the human genome. Input amounts were 1 ng, 1 pg, and 1 fg of purified genomic DNA (Promega). Amplification was performed for 6 h. Sequences were obtained using the Ion Torrent platform. Even at 1 fg input, more than 95% of the obtained reads are target derived.

... implying that the main driver behind this GC dependence in relative chromosomal coverage is Φ29DNApol, not the priming mechanism. (d) High correlation of the chromosomal coverage patterns between the two amplification methods (R²=0.92).
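A correlation such as the R² reported for panel (d) can be illustrated with a minimal sketch. It assumes that R² is the squared Pearson correlation between the relative per-chromosome coverage values of the two methods; the source does not state how the value was computed. The function name `r_squared` and the input values are hypothetical and are not data from the study.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equally long value lists.

    Assumption: x and y hold relative coverage per chromosome for two
    amplification methods, in the same chromosome order.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of squares and cross-products around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


# Made-up relative coverage values for four chromosomes (illustration only)
method_a = [1.0, 2.0, 3.0, 4.0]
method_b = [2.0, 4.0, 6.0, 8.0]
perfect = r_squared(method_a, method_b)    # perfectly linear relationship
inverse = r_squared(method_a, [4.0, 3.0, 2.0, 1.0])
```

Because the value is squared, a perfectly inverse relationship scores as high as a perfectly direct one, so the sign of the relationship has to be checked separately.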

Supplementary Figure 3
Supplementary Figure 6
CNV calling using ControlFreeC. From outward to inward: non-amplified (NA), TruePrime, commercial RP-MDA kit, and MALBAC. The binned read depth (bin size = 50 kb) is shown as black dots, whereas the calculated copy number is shown as a colored line (blue = one copy (haploid), green = two copies (diploid), and red = more than two copies (polyploid)). The top Circos plot shows all chromosomes, the bottom Circos plot a close-up of chromosome 6. The NA sample shows very little dispersion of binned read depths. TruePrime displays greater dispersion but allows a largely identical calculation of ploidy states. The commercial RP-MDA kit shows strong fluctuation in binned read depth and often a very high calculated copy number. MALBAC behaves similarly, with better prediction of ploidy states than the commercial RP-MDA sample.
CNV calling using Ginkgo. Shown are the read numbers per bin (dots) and the deduced ploidy level of each chromosomal segment (lines). TruePrime shows the lowest dispersion of bins and is closest to the non-amplified profile. Variable bin sizes of about 500 kb were used.
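The binned read depth plotted in the CNV figures above, and the dispersion the captions compare between methods, can be sketched as follows. This is a simplified illustration and not the procedure of either CNV caller: it assumes fixed-width 50 kb bins, 0-based read start positions, and the coefficient of variation as the dispersion measure. All function names and input values are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def binned_read_depth(read_starts, chrom_length, bin_size=50_000):
    """Count reads per fixed-width bin from 0-based read start positions."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(
        pos // bin_size for pos in read_starts if 0 <= pos < chrom_length
    )
    return [counts.get(i, 0) for i in range(n_bins)]


def dispersion(depths):
    """Coefficient of variation of bin counts (an assumed dispersion measure)."""
    mu = mean(depths)
    return pstdev(depths) / mu if mu else float("nan")


# Made-up read start positions on a 150 kb toy chromosome
starts = [10, 49_999, 50_000, 120_000, 125_000, 149_999]
depths = binned_read_depth(starts, chrom_length=150_000)
print(depths)  # [2, 1, 3]
cv = dispersion(depths)
```

A lower coefficient of variation corresponds to the tighter cloud of dots described for the non-amplified and TruePrime samples. Real callers additionally normalize for GC content and mappability, which this sketch omits.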

Supplementary Figure 8
Close-up views of chromosomal coverage reproducibility. Sliding-window coverage comparison of chromosome 3 between the non-amplified (NA) sample (grey) and four HEK293 cells amplified with TruePrime (blue) (input: exactly 5 million randomly selected read pairs). The coverage pattern of the four replicates is highly similar to that of the NA sample. Even a small region in the NA sample (around position 57,500,000) with higher coverage than its surroundings is reproduced by each TruePrime replicate, down to a zoomed-in window of only 600 kb.

Shown are the number of SNVs detected by samtools/bcftools, varscan2, ISAAC, and the low-frequency caller from CLC BIO; the number of recovered SNVs as a percentage of the SNVs detected in the non-amplified (NA) sample; the novelty rate of the SNVs (not contained in dbSNP137); the ratio of heterozygous to homozygous SNVs; the transition/transversion ratio (Ts/Tv) of detected SNVs; the recall rate (position based); the precision (position based); the conversion rate of heterozygous to homozygous SNVs from the non-amplified sample to the amplified sample; the estimated ADO rate; the false positive rate (FPR); and the conversion rate of homozygous to heterozygous SNVs, for the complete genome and for the haploid chromosome 18 only.
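Several of the metrics listed above can be illustrated with a minimal sketch. It assumes that position-based recall and precision compare the set of SNV positions called in the amplified sample against the set called in the NA sample, and that the heterozygous-to-homozygous conversion rate is computed over NA heterozygous sites that are also called in the amplified sample. The source does not spell out these definitions, and the estimated ADO rate is not reproduced here because its derivation is not given. All names and values are hypothetical.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def recall_precision(reference_positions, sample_positions):
    """Position-based recall and precision against a reference call set."""
    ref, smp = set(reference_positions), set(sample_positions)
    shared = ref & smp
    recall = len(shared) / len(ref) if ref else float("nan")
    precision = len(shared) / len(smp) if smp else float("nan")
    return recall, precision


def ts_tv_ratio(snvs):
    """Transition/transversion ratio from (ref_base, alt_base) pairs."""
    pairs = list(snvs)
    ts = sum(1 for pair in pairs if pair in TRANSITIONS)
    tv = len(pairs) - ts
    return ts / tv if tv else float("inf")


def het_to_hom_rate(reference_genotypes, sample_genotypes):
    """Fraction of reference 'het' sites called 'hom' in the sample.

    Both arguments map (chrom, pos) to 'het' or 'hom'; only sites
    present in both call sets are considered (an assumption).
    """
    het_sites = [
        site for site, gt in reference_genotypes.items()
        if gt == "het" and site in sample_genotypes
    ]
    if not het_sites:
        return float("nan")
    converted = sum(1 for s in het_sites if sample_genotypes[s] == "hom")
    return converted / len(het_sites)


# Made-up genotype calls (illustration only)
na = {("chr1", 100): "het", ("chr1", 200): "hom",
      ("chr2", 50): "het", ("chr3", 10): "het"}
amp = {("chr1", 100): "het", ("chr1", 200): "hom",
       ("chr2", 50): "hom", ("chr4", 5): "het"}

recall, precision = recall_precision(na, amp)
conversion = het_to_hom_rate(na, amp)
tstv = ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")])
```

In this toy data, three of the four NA positions are recovered and one amplified call is absent from the NA set, so recall and precision are both 0.75; one of the two shared heterozygous sites appears homozygous after amplification, giving a conversion rate of 0.5.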