Transposase assisted tagmentation of RNA/DNA hybrid duplexes

Tn5-mediated transposition of double-stranded DNA has been widely used in high-throughput sequencing applications. Here, we report that the Tn5 transposase is also capable of directly tagmenting RNA/DNA hybrids in vitro. As a proof-of-concept application, we used this activity to replace the traditional library construction procedure for RNA sequencing, which involves many laborious and time-consuming steps. Libraries generated by transposase-assisted RNA/DNA hybrids co-tagmentation (termed “ATRAC-seq”) are comparable to those from traditional RNA-seq methods in terms of the number of detected genes, gene body coverage and gene expression analysis; meanwhile, ATRAC-seq enables a one-tube library construction protocol and is therefore faster (within 8 h) and more convenient. We expect this tagmentation activity on RNA/DNA hybrids to have broad potential in RNA biology and chromatin research.


Transposases exist in both prokaryotes and eukaryotes and catalyze the movement of defined DNA elements (transposons) to another part of the genome by a "cut and paste" mechanism (1-3). Taking advantage of this catalytic activity, transposases are widely used in biomedical applications: for instance, an engineered, hyperactive Tn5 transposase from E. coli has been used in an in vitro double-stranded DNA (dsDNA) tagmentation reaction to achieve rapid, low-input library construction for next-generation sequencing (4-9). In addition, Tn5 has been used for in vivo transposition of native chromatin to profile open chromatin, DNA-binding proteins and nucleosome positions ("ATAC-seq") (10). While Tn5 has been broadly adopted in high-throughput sequencing, bioinformatic analysis and structural studies reveal that it belongs to the retroviral integrase superfamily, whose members act not only on dsDNA but also on RNA/DNA hybrids (for instance, RNase H). Despite their distinct substrates, these proteins all share a conserved catalytic RNase H-like domain (see Figure 1a) (11-14). Given this structural and mechanistic similarity, we asked whether Tn5 can catalyze tagmentation of RNA/DNA hybrids (see Figure 1b) in addition to its canonical function of dsDNA transposition. In this study, we tested this hypothesis and found that Tn5 indeed possesses in vitro tagmentation activity towards both strands of RNA/DNA hybrids. As a proof of concept, we applied this activity, termed transposase-assisted RNA/DNA hybrids co-tagmentation (ATRAC-seq), to achieve rapid and low-cost RNA sequencing starting from total RNA extracted from 10,000 to 100 cells. We

To test whether Tn5 transposase has tagmentation activity on RNA/DNA hybrids, we prepared RNA/DNA duplexes by reverse transcription of mRNA. We first validated the efficiency of reverse transcription and the presence of RNA/DNA duplexes using a model mRNA sequence (~1,000 nt) as template (see Figure S1a). We then incubated RNA/DNA hybrids prepared from 293T mRNA with Tn5 transposome, heat-inactivated Tn5 transposome or a blank control (without Tn5) (see Methods). The hybrids were then recovered and their length distribution was analyzed on a Fragment Analyzer (see Figure 1c). Compared with the heat-inactivated Tn5 sample and the blank control sample, the Tn5 transposome sample exhibited a modest but clear smear corresponding to small fragments of ~30-650 base pairs (bp) (the blue patches in Figure 1c). Consistent with fragmentation, we also observed a downward shift of large fragments of ~700-4,000 bp (the orange patches in Figure 1c). In addition, fragmentation efficiency increased with transposome dose, suggesting that fragmentation of RNA/DNA hybrids depends on Tn5 (see Figure S1b).
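The fragmentation readout above compares how much of each sample's signal falls in the small-fragment window (~30-650 bp). A minimal sketch of that comparison is shown below, assuming the electropherogram can be exported as (size in bp, fluorescence in RFU) pairs; the export format and the names `window_fraction` and `small_fragment_gain` are hypothetical and not part of the study's described analysis.

```python
# Sketch: fraction of Fragment Analyzer signal that falls in a size window.
# ASSUMPTION: the trace is a list of (size_bp, rfu) pairs; this format and
# the function names are hypothetical illustrations.

def window_fraction(trace, lo_bp, hi_bp):
    """Fraction of total fluorescence (RFU) from fragments sized lo_bp..hi_bp."""
    total = sum(rfu for _, rfu in trace)
    if total <= 0:
        raise ValueError("trace has no signal")
    in_window = sum(rfu for size, rfu in trace if lo_bp <= size <= hi_bp)
    return in_window / total


def small_fragment_gain(treated, control, lo_bp=30, hi_bp=650):
    """Small-fragment signal fraction of a Tn5-treated sample minus that of a
    control (heat-inactivated Tn5 or no-Tn5 blank)."""
    return window_fraction(treated, lo_bp, hi_bp) - window_fraction(control, lo_bp, hi_bp)


if __name__ == "__main__":
    # Toy traces for illustration only; these are not measured data.
    control = [(100, 5.0), (400, 5.0), (1500, 90.0)]
    treated = [(100, 15.0), (400, 20.0), (1500, 65.0)]
    print(f"small-fragment gain: {small_fragment_gain(treated, control):.2f}")
```

Applying the same function to a dilution series of transposome would give one number per dose, which is a simple way to check the dose dependence described above.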

We next asked whether RNA/DNA hybrids are tagged by Tn5. For canonical dsDNA substrates, staggered tagmentation by Tn5 leaves a 9 bp gap between the non-transferred strand and the target DNA (see Figure 1d). We anticipated that a similar in vitro tagmentation reaction on RNA/DNA hybrids would generate a structure with adaptors ligated to the 5' ends of both the RNA and DNA strands and gaps at the 3' ends (see Figure 1e). If such a structure is present, it could be converted into an amplifiable DNA molecule by reverse transcription from the target DNA into the gap, followed by extension through the attached adaptor sequence by strand displacement (see Figure 1e). For this purpose, we chose Bst 3.0 DNA polymerase, which shows strong 5'→3' DNA polymerase activity on either DNA or RNA templates. We then performed quantitative polymerase chain reaction (qPCR) on the three samples. The cycle threshold (Ct) value of the Tn5 transposome sample was about 8 cycles lower than those of the heat-inactivated Tn5 sample and the control sample, indicating approximately 256-fold (2^8) more amplifiable product (see Figure 1f). We also tested different buffer conditions and found that the performance of Tn5 remained similar, indicating that the Tn5 tagmentation activity is robust (see Figure S1c). Using Sanger sequencing, we validated that the adaptor sequences are indeed ligated to the insert sequences (see Figure S1d). Therefore, Tn5  second-strand synthesis, end-repair and adaptor ligation, we attempted to replace the process using the tagmentation activity towards RNA/DNA duplexes. With ATRAC-seq, these steps are replaced with a "one-tube" protocol (see Figure 2a), which uses total RNA as input material and involves just three seamless steps (reverse transcription, tagmentation and strand extension), without the need for a second-strand synthesis step.
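The conversion from an ~8-cycle Ct shift to ~256-fold more amplifiable product in the qPCR experiment above rests on the product doubling every cycle. The sketch below makes that assumption explicit through an `efficiency` parameter (1.0 = perfect doubling); the function names are illustrative and the study does not state which efficiency, if any, was assumed.

```python
from statistics import mean

# Sketch: convert a qPCR Ct difference into a fold difference in starting template.
# ASSUMPTION: amplification efficiency is the same for all samples;
# efficiency=1.0 means the product exactly doubles every cycle.

def delta_ct(control_cts, sample_cts):
    """Mean Ct of the control minus mean Ct of the sample (positive = sample amplifies earlier)."""
    return mean(control_cts) - mean(sample_cts)


def fold_difference(dct, efficiency=1.0):
    """Relative amount of amplifiable template implied by a Ct difference."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** dct


if __name__ == "__main__":
    print(fold_difference(8))       # 256.0 under ideal doubling
    print(fold_difference(8, 0.9))  # same Ct shift at 90% efficiency
```

Under ideal doubling an 8-cycle shift gives 2^8 = 256-fold; if the efficiency were 90%, the same shift would correspond to roughly 170-fold, so the 256-fold figure is best read as an approximate upper estimate.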
We first conducted ATRAC-seq with 200 ng of total RNA as input and observed very high correlation of gene expression levels among three replicates, indicating that ATRAC-seq is highly reproducible (see Figure 2b). To test the robustness of ATRAC-seq, we also performed the experiments with 20 ng and 2 ng of total RNA.

ATRAC-seq results were again highly reproducible among replicates (see Figures S2a and S2b). More importantly, gene expression levels measured using different amounts of starting material remained consistent with each other (see Figure 2c).
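The replicate and cross-input comparisons above amount to correlating per-gene expression between two libraries. A minimal sketch follows, assuming per-gene read counts are available; the text does not state the expression unit or correlation coefficient used, so log2(CPM + 1) and Pearson's r are assumptions here, and the function names are hypothetical.

```python
import math

# Sketch: agreement of gene expression between two libraries (replicates, or
# libraries built from 200 ng, 20 ng and 2 ng of total RNA).
# ASSUMPTIONS: inputs are {gene: read count}; expression is log2(CPM + 1);
# agreement is summarized by Pearson's r. None of these choices is taken from the study.

def log_cpm(counts, pseudocount=1.0):
    """log2(counts-per-million + pseudocount) for a {gene: count} mapping."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no counts")
    return {g: math.log2(c * 1e6 / total + pseudocount) for g, c in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("constant vector; correlation undefined")
    return sxy / math.sqrt(sxx * syy)


def expression_correlation(lib_a, lib_b):
    """Pearson r of log2(CPM + 1) over genes present in both libraries."""
    a, b = log_cpm(lib_a), log_cpm(lib_b)
    shared = sorted(a.keys() & b.keys())
    return pearson([a[g] for g in shared], [b[g] for g in shared])
```

Normalizing to counts per million is what allows libraries of different sequencing depth, or from different RNA inputs, to be placed on the same scale before correlating; a rank-based coefficient such as Spearman's would be a reasonable alternative if outlier genes dominate.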

We then compared library quality between ATRAC-seq and the NEBNext Ultra II RNA library prep kit, a commonly used kit for RNA-seq library construction. ATRAC-seq libraries exhibited percentages of reads mapped to annotated transcripts, rRNA contamination levels and numbers of detected genes similar to those of the NEBNext data (see Table S1), even though ATRAC-seq uses total RNA directly as input material. Most of the genes detected by ATRAC-seq overlapped with those detected by NEBNext, with slightly more genes detected by ATRAC-seq (see Figure 2d). In addition, ATRAC-seq performed comparably to NEBNext in gene expression measurement (see Figure 2e).
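The gene-overlap comparison between ATRAC-seq and NEBNext can be expressed as simple set operations on the genes each method detects. The sketch assumes a gene is "detected" when it has at least `min_count` reads; the threshold actually used in the study is not given in this section, and the function names are hypothetical.

```python
# Sketch: overlap of genes detected by two library preparation methods.
# ASSUMPTION: a gene is "detected" if it has at least min_count reads;
# the study's actual detection threshold is not stated here.

def detected_genes(counts, min_count=1):
    """Set of genes whose read count reaches min_count."""
    return {gene for gene, c in counts.items() if c >= min_count}


def detection_overlap(counts_a, counts_b, min_count=1):
    """Shared and method-specific gene counts plus the Jaccard index."""
    a = detected_genes(counts_a, min_count)
    b = detected_genes(counts_b, min_count)
    union = a | b
    return {
        "shared": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "jaccard": len(a & b) / len(union) if union else 0.0,
    }
```

Because the number of "detected" genes depends directly on `min_count`, any comparison of gene numbers between methods is only meaningful when both libraries use the same threshold and comparable sequencing depth.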

Compared with NEBNext, the insert size of the ATRAC-seq library was considerably shorter (see Figure S2c); nevertheless, we observed a similar coverage distribution over the gene body. ATRAC-seq also showed a slight bias towards the 3' end of the gene body (see Figure 2f). This 3' bias in gene coverage decreased as the amount of starting material was reduced; hence it is likely due to incomplete

Previous studies also found that Tn5 exhibits a slight insertion bias on dsDNA substrates (18-20). We therefore characterized the sites of Tn5-catalyzed adaptor insertion by calculating the nucleotide composition of the first and last 10 bases of each sequencing read after adaptor trimming. As with dsDNA substrates, we observed an apparent insertion signature on RNA/DNA hybrids (see Figure S2f). Nevertheless, the per-position information content was extremely low, suggesting that this insertion bias is unlikely to affect the uniformity of gene body coverage (see Figure S2g). Overall, when used as a library preparation method, ATRAC-seq performs comparably to a traditional RNA library preparation method but outperforms it in speed, convenience and cost.

Despite its unique advantages, there is room to further improve ATRAC-seq.
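The insertion-site analysis described earlier in this section (base composition of the first and last 10 bases of each trimmed read, summarized as per-position information content) can be sketched as follows. The section does not give the exact formula, so this uses the common Shannon-based definition for DNA (2 bits minus the entropy at that position) without small-sample correction; the function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

# Sketch: base composition at read ends and per-position information content.
# ASSUMPTIONS: reads are adaptor-trimmed strings; non-ACGT calls (e.g. N) are
# ignored; information content = 2 - Shannon entropy in bits, uncorrected.

def position_frequencies(reads, k=10, from_end=False):
    """Base frequencies at each of the first (or last) k positions of the reads."""
    counts = [Counter() for _ in range(k)]
    for read in reads:
        if len(read) < k:
            continue  # too short to contribute a full k-base window
        window = read[-k:] if from_end else read[:k]
        for i, base in enumerate(window.upper()):
            if base in BASES:
                counts[i][base] += 1
    freqs = []
    for c in counts:
        n = sum(c.values())
        if n == 0:
            raise ValueError("no usable bases at some position")
        freqs.append({b: c[b] / n for b in BASES})
    return freqs


def information_content(freq):
    """Bits of information at one position: 2 - Shannon entropy (log base 2)."""
    entropy = -sum(p * math.log2(p) for p in freq.values() if p > 0)
    return 2.0 - entropy
```

A position with all four bases equally frequent carries 0 bits and a position fixed to a single base carries 2 bits, so values close to 0 across all positions correspond to the weak insertion signature described above. With few reads, uncorrected estimates are biased upward, which is why sequence-logo tools often apply a small-sample correction.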

For instance, ATRAC-seq exhibits a sequence signature at sites of adaptor insertion as well as a slight GC bias in the insert sequences (see Figures S2e and S2f). Although we did not find a predominant motif, and this signature therefore does not appear to affect the uniformity of coverage (see Figure S2g), it remains to be seen whether future engineered Tn5 mutants can bypass this bias. Indeed, a Tn5 mutant with reduced GC insertion bias on dsDNA has been reported previously (21). In addition, the in vitro tagmentation efficiency of Tn5 on RNA/DNA hybrids is low compared with that on its native substrate, dsDNA. As wild-type Tn5 transposase has been engineered into hyperactive forms (4, 22-24), it is also tempting to speculate that hyperactive mutants towards RNA/DNA hybrids could also be

For ATRAC-seq library preparation, all reactions were performed in one tube.