The Transposon-Encoded Protein TnpB Processes Its Own mRNA into ωRNA for Guided Nuclease Activity

TnpB is a member of the Obligate Mobile Element Guided Activity (OMEGA) RNA-guided nuclease family, is harbored in transposons, and likely functions to maintain the transposon in genomes. Previously, it was shown that TnpB cleaves double- and single-stranded DNA substrates in an RNA-guided manner, but the biogenesis of the TnpB ribonucleoprotein (RNP) complex is unknown. Using in vitro purified apo TnpB, we demonstrate the ability of TnpB to generate guide omegaRNA (ωRNA) from its own mRNA through 5′ processing. We also uncover a potential cis-regulatory mechanism whereby a region of the TnpB mRNA inhibits DNA cleavage by the RNP complex. We further expand the characterization of TnpB by examining ωRNA processing and RNA-guided nuclease activity in 59 orthologs spanning the natural diversity of the TnpB family. This work reveals a new functionality, ωRNA biogenesis, of TnpB, and characterizes additional members of this biotechnologically useful family of programmable enzymes.


Introduction
Proteins associated with the IS200/605 family of transposable elements have been found to be RNA-guided DNA-targeting enzymes. 1,2 These proteins, namely TnpB, IscB, and IsrB, associate with a noncoding RNA, termed omegaRNA or ωRNA. The ωRNA is often encoded in the same locus as the protein and enables TnpB to perform guided cleavage of DNA substrates. These IS200/605 transposon loci are flanked by transposon ends. 3 The 3′ transposon end (right end or RE) demarcates the 3′ end of the ωRNA scaffold, which is followed by a guide sequence that is encoded outside of the transposon. 1,2 IS200/605 family transposons are most often mobilized by TnpA, a transposase with a single catalytic tyrosine (Y1) that preferentially excises from and inserts into single-stranded DNA (ssDNA). 4 TnpB is not known to be a transposase, but rather is thought to function akin to a homing endonuclease 5 that prevents loss of the transposon upon excision. 1,2,6 TnpB comprises a diverse family of proteins which have become associated with CRISPR arrays multiple times throughout evolution to give rise to various CRISPR-Cas12 subtypes. 7-10 Biochemical experiments have shown that TnpB cleavage of double-stranded DNA (dsDNA) is dependent on both the complementarity of the guide region of the ωRNA to the target substrate and the presence of a target-adjacent motif (TAM) 5′ to the target sequence. 1,2 This preference for a 5′ TAM is analogous to the 5′ protospacer-adjacent motif (PAM) preference of Cas12. 11 In addition to cleaving dsDNA, TnpB exhibits target-specific TAM-independent cleavage of ssDNA, as well as collateral (off-target) cleavage of ssDNA substrates in the presence of target-containing dsDNA or ssDNA. 1,2
TnpB is a promising candidate for development as a human genome editing tool due to its relatively compact size (350-550 amino acids) compared to most Cas nucleases (e.g., SpCas9 1368 amino acids (aa), 12 SaCas9 1053 aa, 13 AsCas12a 1307 aa). 11 The compact domain structure of TnpB is similar to Cas12f, consisting of N-terminal recognition and wedge (WED) domains connected to the RuvC catalytic domain by a linker. 14-16 This compact size enables packaging into adeno-associated viruses (AAVs), a clinically validated delivery modality. 17 Although AAVs have a number of advantages for gene therapy, they have a limited packaging capacity of 4.7 kb, which is too small for most Cas enzymes together with the necessary regulatory elements and guide RNA. However, small Cas enzymes such as UnCas12f1 (529 aa) 18,19 and AsCas12f1 (422 aa) 20 can be engineered as genome or base editors and packaged into a single AAV, highlighting the promise of compact programmable nucleases for systemically administered gene editing therapies.
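The size argument above can be made concrete with a short calculation. The sketch below converts the protein lengths quoted in the text into coding-sequence lengths and subtracts them from the 4.7 kb capacity. It assumes one stop codon per open reading frame and ignores inverted terminal repeats, promoters, and tags, so the remaining capacity is an upper bound for illustration only; the function names are hypothetical.

```python
# Rough coding-sequence footprint of each nuclease versus AAV capacity.
AAV_CAPACITY_NT = 4700  # packaging limit quoted in the text (4.7 kb)

# Protein lengths (aa) as quoted in the text.
PROTEIN_LENGTHS_AA = {
    "SpCas9": 1368,
    "SaCas9": 1053,
    "AsCas12a": 1307,
    "UnCas12f1": 529,
    "AsCas12f1": 422,
    "TnpB (smallest quoted)": 350,
    "TnpB (largest quoted)": 550,
}


def coding_length_nt(n_aa: int) -> int:
    """Nucleotides needed to encode n_aa residues plus one stop codon."""
    return 3 * (n_aa + 1)


def remaining_capacity_nt(n_aa: int, capacity_nt: int = AAV_CAPACITY_NT) -> int:
    """Capacity left for everything other than the nuclease ORF (assumption:
    ITRs, promoter, and guide cassette are not yet accounted for)."""
    return capacity_nt - coding_length_nt(n_aa)


for name, n_aa in PROTEIN_LENGTHS_AA.items():
    print(f"{name}: {coding_length_nt(n_aa)} nt ORF, "
          f"{remaining_capacity_nt(n_aa)} nt remaining")
```

Under these assumptions the SpCas9 open reading frame alone consumes most of the capacity, whereas a TnpB-sized protein leaves roughly 3 kb for regulatory elements and the guide cassette.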
To explore the biotechnological potential of TnpB, we sought to further characterize the biochemical activity of diverse TnpB enzymes, focusing on the formation and identity of the ωRNA. Given the broad diversity of the TnpB family, we cataloged and sampled this diversity to explore the conserved features of TnpB function. Collectively, these results provide insight into the mechanism of TnpB and serve as a starting point for future endeavors to optimize TnpB for genome engineering.

Protein purification
Wild-type AmaTnpB (Addgene no. 176587) and RuvC mutant E271A AmaTnpB were purified as described previously. 1 TnpB with an N-terminal His14-MBP-TEV protease cleavage site was expressed in a pET45b(+) plasmid backbone in Rosetta 2(DE3) cells (Novagen). Cells were grown at 37°C in terrific broth (TB) medium supplemented with 100 µg/mL ampicillin and 30 µg/mL chloramphenicol overnight. One liter of TB with 100 µg/mL ampicillin was inoculated with a 3 mL overnight culture, grown to an OD600 of 0.6-0.8, and subsequently induced with 0.2 mM IPTG and grown at 18°C for 24 h. Cells were harvested by centrifugation and resuspended in Buffer A (50 mM Tris pH 8, 1 M NaCl, 5% glycerol, 40 mM imidazole, and 5 mM β-mercaptoethanol) supplemented with benzonase (Sigma) and protease inhibitors (phenylmethylsulfonyl fluoride and Roche cOmplete, ethylenediaminetetraacetic acid-free) and then lysed by two passes with a high-pressure homogenizer (LM20 Microfluidizer, Microfluidics).
In vitro RNA and DNA cleavage reactions
RNA substrates were prepared from PCR-generated DNA templates through in vitro transcription reactions with the NEB T7 HiScribe Kit and purified with the Zymo Clean & Concentrator-25 Kit. The 1221-nt target DNA substrate was generated by PCR and contains the cognate target sequence with a 5′ TAM of TCAC, which when cleaved generates ~531-nt and 690-nt fragments. Sequences for RNA and DNA substrates are provided in Supplementary File S1. In vitro reactions were prepared with final concentrations of 1 µM TnpB, 1 µM RNA substrate(s), 10 nM DNA substrate, and 1 U/µL murine RNase inhibitor (NEB) in 20 mM HEPES and 5 mM MgCl2. After incubation at 55°C for 30 min, the reactions were either (1) treated with RNase A (Qiagen) and Proteinase K (NEB) and purified with PCR purification columns (Qiagen) or (2) treated with DNase I (NEB) and Proteinase K (NEB). RNase-treated samples were visualized on 2% E-Gel EX agarose gels (Invitrogen), and DNase-treated samples were denatured and visualized on 6% TBE-Urea polyacrylamide gels (Thermo Fisher Scientific). The DNase-treated samples were sequenced using the NEBNext Multiplex Small RNA Library Prep Set for Illumina (NEB).

Ortholog curation and selection
TnpBs for experimental characterization (Supplementary File S2) were sampled from diverse TnpBs by selecting different leaves from the phylogenetic tree described in a comprehensive survey of TnpB/Cas12 diversity. 8 Selected sequences were prioritized on the basis of their available contig lengths to ensure that short-range (≤3 kb) genomic associations of the transposon locus were captured appropriately. Whenever possible, sequences from complete genomes in the NCBI collection were prioritized because their assembled contigs are generally more accurate than those from Joint Genome Institute and Whole Genome Shotgun assembled metagenomes. Locus RE boundaries (i.e., the boundary between the ωRNA scaffold and guide sequence) were determined by aligning the 3′ end of the locus with related locus sequences with MAFFT. 21 From the alignments, the guide-scaffold boundary was identified as the downstream-most position at which a sharp drop in sequence conservation occurred.
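The boundary criterion described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the procedure used in the study: the conservation score (fraction of sequences sharing the most common residue in a column), the window size, and the `min_drop` threshold are all assumptions, and the input is assumed to be a list of equal-length aligned strings such as those produced by MAFFT.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common non-gap residue, per column."""
    n_seqs = len(alignment)
    scores = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / n_seqs)
    return scores


def guide_scaffold_boundary(alignment, window=5, min_drop=0.4):
    """Return the downstream-most column index at which mean conservation
    falls sharply (a local maximum of upstream-minus-downstream conservation
    that is at least min_drop), or None if no such drop exists.

    The returned index is the first column after the drop, i.e. the first
    putative guide position. window and min_drop are hypothetical defaults.
    """
    scores = column_conservation(alignment)
    drops = {}
    for i in range(window, len(scores) - window + 1):
        upstream = sum(scores[i - window:i]) / window
        downstream = sum(scores[i:i + window]) / window
        drops[i] = upstream - downstream
    sharp = [
        i for i, d in drops.items()
        if d >= min_drop
        and d >= drops.get(i - 1, float("-inf"))
        and d >= drops.get(i + 1, float("-inf"))
    ]
    return max(sharp) if sharp else None


# Toy alignment: a shared 10-nt scaffold end followed by unrelated 6-nt guides.
toy = [
    "ACGTACGTAC" + "ACGTAC",
    "ACGTACGTAC" + "CGTACG",
    "ACGTACGTAC" + "GTACGT",
    "ACGTACGTAC" + "TACGTA",
]
print(guide_scaffold_boundary(toy))
```

In practice the call would be inspected against the alignment by eye, since short alignments and closely related loci can blur the transition.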
For the phylogenetic analysis presented in Figure 3, representative TnpBs were selected from the comprehensive TnpB/Cas12 study 8 that cover the main clades (Typical TnpBs, Derived TnpBs, and clades containing catalytic rearrangements of the RuvC-II (RII-r3 and 5) or RuvC-III (RIII-r4) domain), major branches of TnpB, and some selected Cas12s (Supplementary File S3). These selected TnpB/Cas12 sequences, along with the experimentally studied TnpB sequences, were aligned using MAFFT-einsi, 21 then trimmed using TrimAl 22 with a gap threshold of 0.5. The LG+G4 substitution model for phylogenetic inference was selected using ModelFinder by optimizing the corrected Akaike Information Score. 23 A phylogenetic tree was then inferred using IQTree2 24 with the following parameters: -nstop 500 -bnni -ninit 5000 -ntop 100 -nbest 20, and 2000 ultrafast bootstraps, 25 and finally visualized with iTOL. 26 Following this, all genes in each locus were searched using HMMER3 27 against hidden Markov model profiles for Cas1, Cas2, and Cas4, 1 as well as Y1 TnpA and IS607-like Serine Recombinases using PF01797 and PF00239, respectively, from PFAM. 28 All CRISPRs were predicted using CRT. 29 Association with CRISPR arrays, Y1 TnpA, and IS607-like Serine Recombinases was determined by the presence of the respective feature (as identified above) within 1 kb of the protein coding sequence in the genomic locus. The multiple sequence alignment was used to determine, for each sequence, whether each of the three residues in the RuvC catalytic triad was intact by checking whether the residue aligned at the catalytic triad position matched the expected residue (D for RuvC-I, E for RuvC-II, and D for RuvC-III). Noncanonical RuvC residues are indicated by the abbreviations RII-r (RuvC-II rearrangement) and RIII-r (RuvC-III rearrangement).
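The two rule-based annotations described above, the catalytic triad check and the 1 kb association criterion, can be sketched as follows. The alignment column indices and the coordinate convention are hypothetical; in a real analysis they would come from the trimmed alignment and the locus annotation.

```python
# Expected catalytic residue at each RuvC subdomain, as stated in the text.
EXPECTED_TRIAD = {"RuvC-I": "D", "RuvC-II": "E", "RuvC-III": "D"}


def triad_status(aligned_seq, triad_columns):
    """Return {site: (residue, intact)} for one aligned sequence.

    triad_columns maps each site to its 0-based alignment column
    (hypothetical values; a gap at the column counts as not intact).
    """
    status = {}
    for site, expected in EXPECTED_TRIAD.items():
        residue = aligned_seq[triad_columns[site]].upper()
        status[site] = (residue, residue == expected)
    return status


def is_associated(feature_start, feature_end, cds_start, cds_end, max_dist=1000):
    """True if a feature overlaps the CDS or lies within max_dist bp of it.

    Coordinates are assumed to be on the same contig with start <= end.
    """
    gap = max(feature_start - cds_end, cds_start - feature_end, 0)
    return gap <= max_dist


# Toy example with made-up alignment columns 2, 6, and 9.
columns = {"RuvC-I": 2, "RuvC-II": 6, "RuvC-III": 9}
typical = "MKD-LAEGFDK"
rearranged = "MKD-LANGFDK"  # RuvC-II position carries N instead of E
print(triad_status(typical, columns))
print(triad_status(rearranged, columns))
print(is_associated(2500, 3000, 1000, 2200))
```

A sequence flagged as not intact at one position is a candidate for the RII-r or RIII-r labels, but assigning it to a specific rearranged clade requires the phylogenetic context described above.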
IVTT 5′ RACE
DNA templates were synthesized by Twist Biosciences or amplified by PCR from bacterial genomic DNA. All templates included a 5′ UTR containing a T7 promoter and ribosomal binding site for expression in the PURExpress in vitro transcription and translation (IVTT) kit (NEB). Reactions were prepared with 110 ng of DNA template and 1 U/µL murine RNase inhibitor (NEB) and incubated at 37°C for 2 h. RNA was extracted with TRIzol reagent (Thermo Fisher Scientific) and purified with Zymo Clean & Concentrator-96. 5′ rapid amplification of cDNA ends (RACE) was carried out by annealing 1 µM of a primer complementary to the 3′ end of the ωRNA (Supplementary File S2) to 1000 ng of purified RNA at 65°C for 15 min and 25°C for 15 min. Next, 375 nM 5′ SR adaptor (NEB) was denatured and ligated to the RNA using T4 RNA Ligase 1 (NEB) for 1 h at 25°C. Reverse transcription was carried out using SuperScript IV RT (Invitrogen). cDNA was extracted by PCR purification (Qiagen) and amplified with 12 cycles of PCR using NEBNext High Fidelity 2X PCR Master Mix (NEB) with one primer annealing to the 5′ adaptor and one primer annealing to the 3′ end of the ωRNA, followed by a second round of PCR with 18 cycles with primers adding i7 and i5 Illumina adaptors and barcodes. Amplified libraries were gel extracted, quantified using Qubit Fluorometric Quantification (Thermo Fisher Scientific), and subjected to single-end sequencing on an Illumina MiSeq with the following parameters: read 1, 300 cycles; index 1, 8 cycles; index 2, 8 cycles. Reads were trimmed of adaptors and aligned to template sequences using Geneious Prime.
ωRNA scaffolds were annotated by selecting sequences with a clear 5′ start site (i.e., where a substantial portion of reads corresponded to a single start site). These annotated scaffolds were used for secondary structure prediction and the TAM screen. Raw RNA-seq sequencing data can be accessed at the National Center for Biotechnology Information Sequence Read Archive (BioProject PRJNA954882). RNA secondary structure was visualized using Vienna RNAfold. 30,31 RNA sequences were aligned using mafft-xinsi. 21 RNAalifold 32 and R2R 33 were used to generate covariance models. For orthologs where no clear start site could be ascertained, the longest RNA species observed was used for the TAM screen.
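The start-site criterion above is qualitative, so the following sketch shows one way it could be formalized. It assumes a list of 0-based template coordinates for the 5′ end of each aligned read; the `min_fraction` and `min_reads` thresholds are illustrative assumptions and are not values reported in this study.

```python
from collections import Counter


def call_start_site(read_starts, min_fraction=0.5, min_reads=20):
    """Return (position, fraction) for a dominant 5' end, or None.

    A site is called only if enough reads are available and at least
    min_fraction of them begin at the same template position.
    Both thresholds are hypothetical.
    """
    if len(read_starts) < min_reads:
        return None
    position, count = Counter(read_starts).most_common(1)[0]
    fraction = count / len(read_starts)
    return (position, fraction) if fraction >= min_fraction else None


def scaffold_length(start, scaffold_end):
    """Scaffold length for a 0-based start and an exclusive scaffold end."""
    return scaffold_end - start


# Toy data: 80% of reads share one 5' end; a second library has no clear site.
clear = [1100] * 80 + [1090] * 10 + [1120] * 10
diffuse = list(range(1000, 1100))
print(call_start_site(clear))
print(call_start_site(diffuse))
print(scaffold_length(1100, 1206))
```

Orthologs for which the function returns `None` would fall into the category handled by the fallback in the text, where the longest observed RNA species is used instead.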
IVTT TAM screen
DNA templates encoding T7 promoter-driven TnpB proteins were generated by PCR from custom synthesis products or bacterial genomic DNA. DNA templates encoding T7 promoter-driven ωRNA scaffolds with a 20-nt guide sequence were also generated by PCR and used to prepare RNA with the T7 HiScribe Kit. IVTT reactions were prepared with 150 ng protein template, 5000 ng RNA, and 1 U/µL murine RNase inhibitor in the PURExpress kit (NEB). After 4 h at 37°C, 50 ng of an 8N TAM library plasmid was added to each reaction, and the reaction proceeded for an additional 20 min at 37°C. Reactions were each treated with 10 µg RNase A (Qiagen) and 8 U Proteinase K (NEB), followed by a 5 min incubation at 37°C. DNA was extracted by PCR purification, and adaptors were ligated using the NEBNext Ultra II DNA Library Prep Kit for Illumina (NEB) with the NEBNext Adaptor for Illumina (NEB) as per the manufacturer's protocol. Following adaptor ligation, cleaved products were amplified using one primer specific to the TAM library backbone and one primer specific to the NEBNext adaptor with 12 cycles of PCR. After a second round of 18-cycle PCR with primers adding the i7 and i5 Illumina adaptors and barcodes, amplified libraries were gel-extracted, quantified using Qubit Fluorometric Quantification, and subjected to single-end sequencing on an Illumina MiSeq with the following parameters: read 1, 80 cycles; index 1, 8 cycles; index 2, 8 cycles.
TAM enrichment was analyzed and visualized using a custom Python script. 1,34 Raw TAM screen sequencing data can be accessed at the National Center for Biotechnology Information Sequence Read Archive (BioProject PRJNA954882).
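For readers unfamiliar with this type of analysis, a generic enrichment calculation is sketched below. It is not the custom script cited above. It assumes that the 8-nt TAM has already been extracted from each read, that a reference count of the uncleaved library is available for normalization, and that a pseudocount of 1 is acceptable; all function names are hypothetical.

```python
import math
from collections import Counter


def tam_enrichment(cleaved_tams, library_tams, pseudocount=1.0):
    """log2 ratio of each TAM's frequency in cleaved products vs. the library."""
    cleaved = Counter(cleaved_tams)
    library = Counter(library_tams)
    all_tams = set(cleaved) | set(library)
    k = len(all_tams)
    n_cleaved = sum(cleaved.values()) + pseudocount * k
    n_library = sum(library.values()) + pseudocount * k
    scores = {}
    for tam in all_tams:
        f_cleaved = (cleaved[tam] + pseudocount) / n_cleaved
        f_library = (library[tam] + pseudocount) / n_library
        scores[tam] = math.log2(f_cleaved / f_library)
    return scores


def position_frequencies(tams):
    """Per-position base frequencies for equal-length TAMs (input to a logo)."""
    freqs = []
    for i in range(len(tams[0])):
        counts = Counter(t[i] for t in tams)
        total = sum(counts.values())
        freqs.append({base: counts.get(base, 0) / total for base in "ACGT"})
    return freqs


# Toy data: one TAM ending in TCAC dominates the cleaved pool.
library = ["AAAATCAC"] * 10 + ["GGGGGGGG"] * 10 + ["CCCCCCCC"] * 10
cleaved = ["AAAATCAC"] * 28 + ["GGGGGGGG"] + ["CCCCCCCC"]
scores = tam_enrichment(cleaved, library)
enriched = [tam for tam, s in scores.items() if s > 1.0]
print(enriched)
print(position_frequencies(["TCAC", "TCAC", "TCAT", "TTAC"]))
```

The per-position frequencies of the enriched sequences are what a sequence logo or TAM wheel would summarize; the enrichment cutoff of 1.0 used here is arbitrary.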

Results
TnpB processes its own mRNA into ωRNA
Given the evolutionary relationship between TnpB and Cas12s, we hypothesized that the ability of Cas12s to process pre-crRNA into crRNA 35-40 may have originated from analogous functions in TnpB. We therefore investigated whether an exemplar TnpB ortholog from Alicyclobacillus macrosporangiidus (AmaTnpB) possesses RNA processing activity to generate ωRNA. To directly interrogate whether TnpB can process RNA (Fig. 1A), we incubated purified apo-form wild-type AmaTnpB or a catalytically dead RuvC-II point mutant (ΔRuvC) with 1:1 molar ratios of four different in vitro transcribed RNA substrates, as well as a target DNA substrate (Fig. 1B). These RNA substrates include a random negative control sequence (substrate 1); the hypothesized ωRNA with a 20-nt guide sequence, which was previously determined based on RNP pulldown of a closely related ortholog 1 (substrate 2); the full TnpB ORF extended to include the nonoverlapping ωRNA and guide sequence (substrate 3); and the hypothesized ωRNA with an additional 59 nt of 3′ padding sequence (substrate 4). After incubation, a fraction of the sample was treated with DNase to visualize the RNA species on a denaturing gel, and the remaining sample was treated with RNase to visualize cleavage of the DNA substrate.
On the denaturing gel, no processed RNA substrates were visualized from incubation of TnpB with substrate 1 (Fig. 1C). Upon TnpB incubation with RNA substrate 2, the hypothesized ωRNA sequence was processed to a 126-nt sequence, which we confirmed by RNA sequencing and hereafter refer to as the processed ωRNA (Fig. 1D). Substrate 3, which resembles the native mRNA sequence, was processed to this same 126-nt species (Fig. 1C). Substrate 4 was processed to a 185-nt species. The difference in length between the processed species resulting from substrates 2/3 and 4 suggests that the 59-nt 3′ flanking sequence is not processed by TnpB. The ΔRuvC mutant did not exhibit RNase activity on any substrate, suggesting that TnpB, and not an RNase contaminant from purification, is responsible for RNA processing and further that the RuvC domain of TnpB is responsible for ωRNA biogenesis.
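The length argument in this paragraph can be checked with simple arithmetic. The sketch below uses only the lengths reported in the text and asks which processing model predicts the observed substrate 4 product. It assumes that the same 5′ processing site is used on all substrates and that the 20-nt guide is retained intact in the 126-nt species; the function names are illustrative.

```python
# Lengths reported in the text (nt).
GUIDE_NT = 20                    # guide appended to the hypothesized ωRNA
PROCESSED_SUBSTRATE_2_NT = 126   # processed ωRNA from substrates 2 and 3
PADDING_3P_NT = 59               # extra 3' sequence present in substrate 4
OBSERVED_SUBSTRATE_4_NT = 185    # processed species from substrate 4


def expected_product_nt(processed_nt, padding_3p_nt, trims_3p):
    """Expected product length if the same 5' site is used on substrate 4."""
    return processed_nt if trims_3p else processed_nt + padding_3p_nt


def consistent_model(observed_nt):
    """Name the processing model whose predicted length matches the observation."""
    models = ((False, "5' processing only"), (True, "5' and 3' processing"))
    for trims_3p, label in models:
        predicted = expected_product_nt(
            PROCESSED_SUBSTRATE_2_NT, PADDING_3P_NT, trims_3p
        )
        if predicted == observed_nt:
            return label
    return "neither"


print(consistent_model(OBSERVED_SUBSTRATE_4_NT))
print(PROCESSED_SUBSTRATE_2_NT - GUIDE_NT)  # implied scaffold length
```

The observed 185-nt product equals 126 + 59, which matches the model in which only the 5′ end is trimmed.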
We next investigated whether ωRNAs processed by TnpB support target cleavage by examining the RNase-treated fractions of the same reactions above, which include a 1221-nt dsDNA substrate containing the cognate target sequence and TAM for AmaTnpB (5′ TCAC). 1 Wild-type TnpB utilized the processed ωRNA from substrates 2 and 3 to cleave the DNA target, and the inclusion of 3′ padding sequence in substrate 4 did not hinder substrate cleavage (Fig. 1C). These results imply that although TnpB is equipped to process 5′, but not 3′, sequence, the enzyme can perform cleavage of its DNA substrate regardless of extra sequence padding the ωRNA. This finding is consistent with the fact that TnpB from Deinococcus radiodurans (Dra2TnpB) utilizes only the proximal 12 nt of the guide sequence for DNA targeting, regardless of the length of guide sequence provided, due to the lack of interaction between the protein and the distal end of the ωRNA:target heteroduplex. 15,16
Cis-regulation of DNA cleavage by AmaTnpB mRNA
When assaying for activity, we noticed that DNA cleavage in the presence of substrate 3 is weak compared to substrates 2 and 4 (Fig. 1C). Therefore, we explored the hypothesis that the TnpB mRNA (i.e., the extra 5′ sequence in substrate 3 compared to substrate 2) exerts an inhibitory effect on its own DNA cleavage activity.
We prepared reactions with active TnpB protein, the 126-nt processed ωRNA, different 3′ truncations of the mRNA, and DNA substrate (Fig. 2A). Compared to a reaction with a scrambled 1190-nt RNA (Fig. 2A lane 12) or without an additional RNA species (lane 1), the 5′ 825 nt of the mRNA does not interfere with efficient DNA cleavage. However, a sequence or structural feature present in the mRNA between 825 and 875 nt (Fig. 2B) results in a substantial reduction in DNA cleavage activity (Fig. 2A lanes 5-7). We confirmed that a 125-nt RNA fragment encompassing this region substantially reduces DNA cleavage (Supplementary Fig. S1). We hypothesize that ωRNA secondary structure (Fig. 2C) may be disrupted by its complementarity to this region of the mRNA (Fig. 2D), pointing to a potential mechanism for the inhibitory effect.
Extensive ωRNA diversity in survey of TnpB orthologs
We then turned our attention to the natural diversity of TnpB systems to assess the prevalence of ωRNA biogenesis and to further understand ωRNA structural and sequence diversity. TnpB is highly abundant in bacteria and archaea, and there is substantial diversity found within this family of proteins. 1,41 To begin experimentally studying this diversity, we subsampled the full set of TnpB systems. 8 We focused on members of the IS200/IS605/IS607 transposon superfamily, that is, those lacking association with CRISPR arrays. We further excluded nonmobile orthologs and TnpBs associated with transposases other than the canonical Y1 or serine recombinases, as those TnpBs may perform alternate functions. We also imposed a requirement that the 3′ end of the ωRNA be well-conserved within the clade, to facilitate accurate prediction of the guide sequence. We focused on TnpBs from the five major clades, which are defined by particular configurations of the RuvC catalytic aa motif: Typical TnpBs (RuvC-III DRDXN), Derived TnpBs (RuvC-III NADXN), and clades containing catalytic rearrangements of the RuvC-II (RII-r3 and 5) or RuvC-III (RIII-r4) domain (Fig. 3). 8 We note that there is no a priori expectation that TnpB proteins from these latter three clades will be catalytically inactivated; although the predicted RuvC motif is atypical, compensatory mutations may permit catalytic function, akin to the natural variation in RNase H-like domain catalytic motifs. 42
Based on this information, we selected 59 TnpB orthologs that span the phylogenetic diversity within the constraints outlined above. These orthologs range in length from 353 to 550 aa, with some proteins having as little as 7% aa sequence identity to each other (Supplementary File S2).
To investigate these 59 TnpB orthologs at higher throughput, we expressed the TnpB-ωRNA-encoding loci from a single DNA template in IVTT reactions. As TnpB processes only the 5′ end, we utilized 5′ RACE to determine the 5′ processing site of each ωRNA. By priming cDNA synthesis from the 3′ end of the ωRNA scaffold, this technique captures the ωRNA scaffold only, excluding the guide region. We note that not all orthologs may exhibit proper expression and folding under IVTT conditions and that the absence of processing in our assay does not necessarily rule out the possibility that the ortholog has activity under different conditions. The ωRNA sequences generated by the TnpB proteins we tested generally fall into one of two categories: those with at least one clear 5′ processing site (30/59 orthologs) and those without a clear site based on the coverage plots of the RNA reads (Supplementary Fig. S2). The IVTT 5′ RACE assay recapitulated the AmaTnpB in vitro processing experiments, generating a 106-nt scaffold (ortholog 6 in Supplementary Fig. S2). Some orthologs, such as Dra2TnpB (ortholog 22), showed multiple apparent processing sites, consistent with previous observations suggesting either incomplete or promiscuous RNase activity. 2,15 The 30 orthologs with evidence of processing ability are found throughout the tree and are not confined to specific clades (Fig. 3).
Figure 3. Phylogenetic tree of TnpB protein sequences from publicly available genomes and metagenomes, illustrating representative sequences from the five major clades of TnpB and CRISPR-associated TnpBs (i.e., Cas12s). A schematic of the RuvC subdomains (I, II, III, ZF) for each clade illustrates the predicted catalytic residues (pink) at each site. Fifty-nine orthologs were sampled from the tree (dots), labeled with numbers. Labels with an asterisk indicate orthologs demonstrating active ωRNA processing in IVTT reactions and whose predicted structures are illustrated in Supplementary Figure S3. Colored dots indicate 5′ TAM sequence identified from the IVTT TAM screen and are shown in more detail in Supplementary Figure S5. IVTT, in vitro transcription and translation; TAM, target-adjacent motif; ZF, zinc finger.
Processed ωRNA species ranged from 79 to 466 nt and have predicted structures rich in hairpins, of which the number and relative orientation vary widely. The RE of IS200/605 transposons is known to contain a subterminal hairpin, which we find in almost all orthologs within 10 nt of the 3′ end of the ωRNA scaffold (Fig. 4, Supplementary Fig. S3). This appears to be the only consistently conserved feature among these divergent TnpB ωRNA scaffolds. The transposase TnpA is known to interact with subterminal hairpins at the 5′ and 3′ ends of the transposon, 43 but the 3′ hairpin may be functionally important for TnpB as well. Recently solved cryo-EM structures of Dra2TnpB illustrate how a linker in the WED domain interacts with the stem of this 3′ hairpin. 15,16 Furthermore, this interaction is present in experimentally solved structures of all Cas12 subtypes to date, pointing to an evolutionarily conserved nuclease-guide RNA interaction (Supplementary Fig. S4). 14,15,37,39,44-48
Limited diversity of TAM sequences
We utilized the newly determined ωRNA sequences to conduct a screen for DNA nuclease activity of the 59 TnpB orthologs in IVTT reactions to ascertain 5′ TAM preferences. For orthologs without a clear 5′ start site of the ωRNA, we utilized the longest RNA species observed in the sequencing. Using this screen, we recovered the known TAM sequences of TnpB orthologs characterized previously, including AmaTnpB (TCAC) 1 and Dra2TnpB (TTGAT). 2 In total, 27 out of 59 orthologs demonstrated TAM activity in the screen; these active orthologs are broadly distributed among the different TnpB clades and branches (Fig. 3). One ortholog in the RII-r5 clade is active, confirming that catalytic rearrangements of the RuvC catalytic site can still support guided dsDNA cleavage.
We noted several orthologs that process ωRNA but do not show activity in the IVTT TAM screen and speculate that these TnpBs may target alternate substrates besides dsDNA, consistent with the diversity of substrates targeted by Cas12s. 49,50 TAMs were found to be a maximum of 5 nt long, with relatively little degeneracy in the positions (Supplementary Fig. S5). Nearly all characterized TAMs are AT-rich, although a small subset is G-rich. This relative lack of diversity in the TAM sequence is striking compared to, for example, Cas9, which is a less abundant and less diverse protein family than TnpB 1,8 but still exhibits a wide variation in PAMs. 51 The limited TAM diversity of TnpBs may be explained by the co-evolution of TnpA and TnpB, as TnpA also utilizes the TAM to recognize the transposon ssDNA for excision and insertion. 1

Discussion
The biochemical activity of OMEGA nucleases was previously found to include RNA-guided ssDNA and dsDNA cleavage, but no RNase activity was evident. 1 Although heterologous expression of TnpB loci in Escherichia coli generates processed ωRNA species that physically associate with TnpB protein, it was unclear whether these RNA species were the result of TnpB RNA processing activity or endogenous host RNases. 1,2,15 In this study, we show that TnpB can process its own mRNA into ωRNA to generate active RNP complexes. In the cell, it is possible that endogenous host RNases assist in processing, especially at the 3′ end of the guide. We observed that an intact RuvC catalytic domain is essential for TnpB ωRNA processing activity. We therefore infer that the TnpB RuvC domain performs the nucleophilic attack on the RNA, which is consistent with the activities of Cas12c2 37 and Cas12j 40 but in contrast to Cas12a 35,36 and Cas12i, 38,39 which instead use the WED domain for crRNA processing. These findings attribute the additional function of RNA processing to this class of RNA-guided nucleases.
The observation that part of the AmaTnpB mRNA exerts an inhibitory effect on cleavage of DNA raises the question of whether this phenomenon is conserved among different TnpB orthologs. IS200/605 transposons are known to contain a variety of post-transcriptional cis-regulatory mechanisms to regulate TnpA activity, including mRNA secondary structure and an antisense small RNA, 53 suggesting that TnpB activity may also be regulated by similar mechanisms. For TnpB, the inhibitory region of the mRNA may base pair with the ωRNA, thereby disrupting its structure and ability to bind to TnpB. The inhibitory effect may also be acting on the RNP complex through a different mechanism. Whether this inhibitory effect can be recapitulated in cells, and what its functional consequences are, remain to be explored.
According to recent evidence, TnpB improves retention of its transposon: an ωRNA expressed from one transposon locus targets transposon-lacking versions of that locus in the same cell, that is, loci where the transposon has not yet inserted or those which have undergone excision. 6 Therefore, it appears that TnpB DNA cleavage serves to fix the transposon-containing locus in the population by eliminating loci that have undergone excision 1 and/or homing to uninserted loci. 2,54 Under this hypothesis, the TnpB mRNA could serve as a temporally sensitive signal of active transcription of the transposon and its continued presence in the genome and, therefore, exert negative feedback on TnpB DNA cleavage to prevent unnecessary genome instability.
We also note that the negative feedback exerted by TnpB mRNA on DNA cleavage should be taken into account when utilizing TnpB for applications in heterologous systems. Codon optimization of the TnpB ORF sequence up until the 5′ end of the ωRNA will likely abrogate the inhibitory effect, ultimately maximizing enzymatic activity.
Our exploration of 59 different TnpB loci reconstituted in IVTT revealed a substantial amount of diversity in the ωRNAs of these orthologs. Among these orthologs, the ωRNAs contain a minimum of two predicted stem-loop structures but otherwise vary widely in length and overall topology. Furthermore, parts of the ωRNA with predicted secondary structure may in fact be disordered or flexible, as was found with Dra2TnpB, for which a fully functional ωRNA can be reconstituted by maintaining the structured domains of the ωRNA. 15
Overall, our demonstration of TnpB RNA processing and observation of a cis-regulatory mechanism complement studies of the biological function of OMEGA effectors. Furthermore, our survey of the natural diversity of TnpB-ωRNA complexes adds to the toolbox of this biotechnologically promising family of enzymes.