Deep Impact of Random Amplification and Library Construction Methods on Viral Metagenomics Results

Clinical metagenomics is a broad-range agnostic detection method of pathogens, including novel microorganisms. A major limit is the low pathogen load compared to the high background of host nucleic acids. To overcome this issue, several solutions exist, such as applying a very high depth of sequencing, or performing a relative enrichment of viral genomes associated with capsids. At the end, the quantity of total nucleic acids is often below the concentrations recommended by the manufacturers of library kits, which necessitates to random amplify nucleic acids. Using a pool of 26 viruses representative of viral diversity, we observed a deep impact of the nature of sample (total nucleic acids versus RNA only), the reverse transcription, the random amplification and library construction method on virus recovery. We further optimized the two most promising methods and assessed their performance with fully characterized reference virus stocks. Good genome coverage and limit of detection lower than 100 or 1000 genome copies per mL of plasma, depending on the genome viral type, were obtained from a three million reads dataset. Our study reveals that optimized random amplification is a technique of choice when insufficient amounts of nucleic acid are available for direct libraries constructions.

: DNA yields after pre-amplification of the fractions NA and RNA and total reads after quality preprocessing for VRMP diluted in plasma (1:10). Table S6: Genome consensus length of the detected viruses in VRMP diluted in plasma (1:10) for both fractions. Table S7: Targeted and agnostic analyses of WRVS. Figure S1: Comparison of the distribution of WNCS (log10) detected in the VMRP panel for both NA and RNA fractions for seven methods. Boxes are color-coded according to the methods for both fractions. Figure S2. Size distribution of contigs in nucleotides generated from de novo assembly for the seven methods. Figure S3. Comparison of cumulative percentage of genome fractions of viruses detected in VMRP with seven methods. (A) DNA viruses in RNA and NA fractions. (B) RNA viruses in RNA and NA fractions. Figure S4. Comparison of whole genome coverage profiles of Human parainfluenza virus 1/respirovirus 1 detected in VMRP with six methods in NA and RNA fractions. Figure S5. Mapping of reads R1 on reference sequence Human parainfluenza virus 1 NC_003461 from Smarter V1 (A) and NoAmp (B) of the NA fractions. Figure S6. Comparison of genome fraction of VMRP spiked in plasma (ratio 1:10) detected in NA and in RNA fractions, by 100% cumulative bar chart. Figure S7. Comparison of whole genome coverage profiles of five viruses WRVS spiked in plasma sample at 10 4 genome-copies per mL in NA fraction. For each virus, top profile: MALBAC-V2, bottom profile: NoAmp. Human gammaherpesvirus 4 (HHV-4), porcine circovirus type 1 (PCV1), human orthoreovirus type 1 (REO1), human respiratory syncytial virus strain A2 (HRSV), feline leukemia virus (FeLV), and Squirrel monkey retrovirus (SMRV).

Reverse-transcription
The reverse transcription (RT) using SuperScript® IV First-Strand cDNA Synthesis (Invitrogen) consisted in a mix with 11 µL of RNA (or total nucleic acids), 1 µL of random hexamers (50 µM), and 1 µL of dNTP (10 mM), incubated at 65°C for 5 min and cooled on ice for 2 min, according to the manufacturer's instructions. Seven µL of the RT reaction mix (1X SSIV Buffer, 5mM DTT, 2.0 U Ribonuclease Inhibitor, 10 U SuperScript™ IV Reverse Transcriptase) were added to the 13 µL of annealed RNA and incubated at 23°C for 10 min, at 50°C for 10 min, then inactivated at 80°C for 10 min.
To remove RNA, 1 µL E. coli RNase H was added, and incubated at 37°C for 20 minutes. Single-stranded cDNA obtained with this protocol was the starting material for NoAmp, WTA, MALBAC, DOP, and Accel protocols for the RNA fraction and the total nucleic acids were also subjected to this protocol for the NA fraction.
For the experiments with WRVS, the denaturation temperature was increased to 95°C for 3 min before cooling on ice for 2 min and used with NoAmp, MALBAC and MALBAC-V2 protocols.

NoAmp
Following the reverse transcription step described in section 1, the second cDNA strand was synthetized using a mix containing 1× NEBuffer™ 2, 10 U DNA polymerase I Klenow fragment, 0.33 mM dNTPs , 20 µL of the denatured ss cDNA at 94°C for 2 min, and completed at 30 µL with DNase/RNase-free sterile water. Incubation was done at 37 °C for 1 h. Double-stranded cDNAs were purified using Agencourt AMPure beads (Beckman Coulter).
3. WTA (QuantiTect whole transcriptome kit (Qiagen)) Briefly, the protocol of a QuantiTect whole transcriptome kit (Qiagen) was followed, except that the cDNA synthesis step was performed with random hexamer primers and SuperScript® IV reverse transcriptase (Invitrogen) as described in section 1. The ligation step was performed on 10 µL of the RT reaction followed by the amplification step (6h) using an amplification mix (REPLI-g Midi DNA Polymerase and REPLI-g Midi Reaction Buffer).The resulting dsDNA was purified with the Agencourt AMPure beads (Beckman-Coulter), eluted in a final volume of 50 µL, and quantified with the Qubit™ DNA HS Assay (Life Technologies, Thermo Fisher Scientific Inc.).

4.
DOPlify™ WGA (Reproductive Health Science, Thebarton, Australia) DOPlify™ WGA method used DOP-PCR designed for amplifying total DNA from single cells to in a two-step protocol of three hours. The kit is designed to amplify picogram quantities of DNA. In our experiments, the template for the DOPlify kit was the cDNA synthesis product (1 st strand) performed as described in section 1 in a 4 µL volume as input. For the initial amplification at a low annealing temperature, 8 cycles were running. During the second stage of PCR amplification at a higher annealing temperature, 21 cycles were applied. For the viruses spiked into plasma pool, we doubled the volume input and the reaction volumes and ran two assays, either 8/21 cycles or 12/25 cycles for the successive amplifications. The resulting dsDNA was purified with the Agencourt AMPure beads (Beckman-Coulter), eluted and quantified with the Qubit™ DNA HS Assay (Life Technologies, Thermo Fisher Scientific Inc.).

5.
Accel-NGS® 1S Plus DNA Library Kit (Swift Biosciences) DNA fragmentation is the first step to get a peak length of 300 bp by Covaris M220 Focusedultrasonicator using microTUBE-15 (Peak Incident power (W)=18, Duty Factor=20%, Cycles per Burst =50, Treatment time (sec)=60). Then the Adaptase step was performed on 15 µL of fragmented DNA that simultaneously performs end repair, tailing of 3' ends, and ligation to the first truncated adapter to 3' ends in a proprietary reaction. An Extension step is used to facilitate ligation of the second truncated adapter. The synthesized strand does not get sequenced. A double bead-based SPRI clean-up is performed on the extension reaction. The Ligation reaction adds the second truncated adapter to the 5' ends and the product is cleaned-up. Dual index adapters are added and pre-assembled to the libraries for an amplification with 17 cycles of PCR. The PCR amplified libraries were cleaned-up using 0.9X AmpureXP Beads and concentrated.

6.
The stranded SMARTer technology (Takara Bio/Clontech) The stranded SMARTer technology is based on tagged random hexamer primers and a SMARTScribe Reverse Transcriptase (RT) with terminal transferase activity. When it finishes the first strand cDNA synthesis, the RT adds a few non-templated nucleotides to the 3′ end of the cDNA (GGG). The SMART adapter is complementary to these nucleotides and adds the 5′ tag. The first strand cDNA is used as a matrix to perform the PCR amplification using indexed primers. A depletion step targeting mammalian rRNA (16S and 28S) and mitochondrial rRNA (m12S and m16S) cut the library fragments using the Zapr enzyme and the remaining library is amplified. The recommanded quantity of RNA is 0.25-10 ng in a 8 µL volume as input for library preparation using a SMARTer® Stranded Total RNA-Seq Kit v2 -Pico Input Mammalian (Takara Bio/Clontech). Height µL were used as a template with a fragmentation step and 16 cycles or 14 cycles of final RNAseq library amplification respectively for non-spiked and spiked viruses. Advantages of the version 2 of the SMARTer Stranded Total RNA-Seq Kit version 2 over version 1 announced by the manufacturer is the more efficient removal of rRNA sequences, a higher cDNA yield, improved sequencing performance and Read 2 corresponds to the sense strand.

7.
The MATQ-seq: multiple annealing and dC-tailing-based quantitative single-cell RNA-seq The full protocol is available in (Sheng and Zong, 2019). The MATQ-seq (multiple annealing and dCtailing-based quantitative single-cell RNA-seq ) assay utilizes both oligo dT but also MALBAC random primers to perform first-strand synthesis. For our comparison of methods, based on reverse transcription using random primers, we missed oligo dT. Superscript IV replaced Superscript III reverse transcriptase. To melt secondary structure, samples are incubated at 72°C for 3 min, then 10 thermal cycles of 8°C for 12 s, 15°C for 45 s, 20°C for 45 s, 30°C for 30 s, 42°C for 2 min, 50°C for 3 min, followed by 15 min at 50°C are performed to allow primers to anneal to transcripts. Primers are digested after first-strand synthesis and cDNA is released as single-stranded DNA after a RNA digestion step. The single-stranded cDNA are the polyC tailed on their 3 'end using terminal transferase. MALBAC 6N3G primers bind to the PolyC tail of the single-stranded DNA during the second-strand synthesis using the Deep Vent exo-polymerase (NEB). DNA is then amplified by a 24 cycles of PCR with the GAT27PCR primer. The amplified product was then fragmented to get a peak length of 300 bp by Covaris M220 Focused-ultrasonicator using microTUBE-15 (Peak Incident power (W)=18, Duty Factor=20%, Cycles per Burst =50, Treatment time (sec)=60), then subjected to library preparation using the NEBNext Ultra II DNA Library Prep kit (NEB). The PCR amplified libraries (3 cycles) were cleanedup using 0.9X AmpureXP Beads.

MALBAC Single Cell WGA kit (Yikon Genomics)
MALBAC is based on multiple annealing and looping-based amplification cycles of gDNA and cDNA. The MALBAC Single Cell WGA kit (Yikon Genomics) can be used from 0.5pg of gDNA. It utilizes primers containing a 27 nucleotide common sequence and an height nucleotide variable sequence to produce fragments of amplified DNA (amplicons) during a quasi-linear pre-amplification step followed by a regular PCR step which loop back on themselves to prevent additional copying and crosshybridization. The common nucleotide sequence is GTG AGT GAT GGT TGA GGT AGT GTG GAG. The template was the cDNA synthesis product (1 st strand) performed as described in section 1 in a 5 µL volume as input. In absence of plasma matrix, the quasi-linear pre-amplification step was set on 12 cycles and the regular PCR on 21 cycles and in presence of plasma matrix, we reduced the number of cycles respectively to 8 and 17 cycles.

MALBAC-V2
The reverse transcription consisted in a mix with two MALBAC primers (GAT27 5N3G and GAT27 5N3T ) at 5µM each, 0.5nM dNTP, 0.8U RNase inhibitor, 2 mM DTT and 5 µL RNA template. Samples are incubated at 95°C for 3 min and cooled on ice for 2 min. The RT reaction mix (1X SSIV Buffer, 5mM DTT, 1.2 U RNase inhibitor, 10 U SuperScript™ IV Reverse Transcriptase, Invitrogen) as added to the 7 µL of annealed RNA for a total volume of 10 µL. Then 10 thermal cycles of 8°C for 12 s, 15°C for 45 s, 20°C for 45 s, 30°C for 30 s, 42°C for 2 min, 50°C for 3 min, followed by 15 min at 50°C were performed, as in MATQ-seq protocol.
The MALBAC Single Cell WGA kit (Yikon Genomics) is used for the quasi-linear pre-amplification step and the regular amplification step. Freshly-prepared Pre-Amplification Reaction Mix (30µL) is added to 5µL of the reverse transcription product and denatured à 94°c for 3 min.

Accuracy of the amplification methods
The assessment of the accuracy of the methods was not the objective of our study, as we were focused on the limit of detection of viruses and did not get a high depth coverage for this purpose. Nevertheless, we indirectly estimated the accuracy of the methods using the general alignment error rate on the human genome from the plasma matrix, computed as a ratio of total collected edit distance to the number of mapped bases with Qualimap 2.2.1 on the BAM alignment (BWA MEM version 0.7.4) against the human reference genome assembly hg38 using VMRP spiked in plasma matrix. In both fractions, the range of this general error rate was between 0.36% and 0.95% for NoAmp, MALBAC, DOP and WTA, with the best result for WTA and reached 1.44%-1.94% for SMARTerV2 and 3.39%-2.08% for Accel. However this result has not been confirmed in the raw VMRP, as the nucleotide sequence alignment (MAFFTv7) of the Phosphoprotein gene of the parainfluenzavirus 1/respirovirus covered by six methods (WTA not used) (594 bp) showed identical variants compared to the reference sequence NC_003461 in both fractions for all methods, except DOP that shared an additional variant in both fractions and another in the RNA fraction (data not shown).         Figure S2. Size distribution of contigs in nucleotides generated from de novo assembly for the seven methods. Boxes represent the interquartile range, bar the median value, and dots the outliers.   Figure S4. Comparison of whole genome coverage profiles of Human parainfluenza virus 1/respirovirus 1 detected in VMRP with six methods in NA and RNA fractions. Figure S5. Mapping of reads R1 on reference sequence Human parainfluenza virus 1 NC_003461 from Smarter V1 (A) and NoAmp (B) of the NA fractions.