TwinBLAST: When Two Is Better than One

Analysis of sequence read pairs can be essential for characterizing structural variation, including junction-spanning pairs of reads (JSPRs) suggesting recent lateral/horizontal gene transfer. TwinBLAST can be used to facilitate this analysis of JSPRs by enabling the visualization and curation of two BLAST reports side by side in a single interface.

Lateral and horizontal gene transfer (LGT and HGT, respectively) are common in bacteria (1,2), which exchange DNA to increase variance in the absence of sexual recombination.
LGT can even occur between very diverse taxa, such as bacteria and animals. There is a plethora of recent LGTs from Wolbachia endosymbionts to their hosts (3-8). Recent LGTs can be identified by the presence of junction-spanning read pairs (JSPRs) between donor and recipient genomes, using tools such as LGTSeek (9) combined with manual examination of the BLAST search results for pairs of sequencing reads. To this end, we developed a flexible tool called TwinBLAST to enable visual inspection and curation of two BLAST (10) reports simultaneously.
TwinBLAST is available to users through either the source code (https://github.com/IGS/twinblast) or a preconfigured virtual machine (VM; https://sourceforge.net/projects/twinblast/files/) that has all the necessary dependencies installed, as well as example data. TwinBLAST is a Web-based utility with the interface implemented in Ext JS JavaScript and the server-side code implemented in Perl, making use of BioPerl (11) modules for BLAST file parsing/indexing, CGI for argument handling, and Bio::Graphics for rendering alignment visuals. A MySQL database is present in the backend to enable curation of the read pairs. The installation and usage of TwinBLAST are outlined in an online tutorial (https://docs.google.com/document/d/1YKzd8pH05Wd5dB5cNLmo_Q6AKyEQbGz4dmiIecjG6Ho/edit?usp=sharing) and YouTube video (https://www.youtube.com/watch?v=FUqoxEIGML0&list=PLT3OVYkIByoHAOIu1ZxV-undAxsUf3cbg&index=3).
The TwinBLAST interface (e.g., http://lgt.igs.umaryland.edu/twinblast/) has four panels (Fig. 1). The two largest panels each contain an independently scrollable hyperlinked BLAST report, one for each read in a read pair. Along the entire top is the configuration panel (Fig. 1), which is used for loading the data and is hidden by default. On the right side of the configuration panel are places to specify BLAST output files for both reads and the identification (ID) suffix used to distinguish the two reads. The ID prefix free-form text box allows an ID to be specified, such that the BLAST reports for the ID will be displayed in the corresponding boxes on the left and right sides of the display. When setting up a private TwinBLAST interface, radio buttons can optionally be enabled for curation. Lastly, a query list can be provided on the left side of the configuration panel that populates the navigation and curation panel.
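The way an ID prefix and a pair of ID suffixes tie two BLAST reports together can be illustrated with a minimal Python sketch. It is not TwinBLAST's own code (which is written in Perl); the function name `pair_queries` and the default suffixes `/1` and `/2` are assumptions chosen for illustration, and real data sets may use other suffix conventions.

```python
from collections import defaultdict


def pair_queries(query_ids, suffixes=("/1", "/2")):
    """Group query IDs by shared prefix and return complete read pairs.

    `suffixes` holds the two non-empty ID suffixes that distinguish the
    reads of a pair (hypothetical defaults; adjust to the data at hand).
    """
    mates_by_prefix = defaultdict(dict)
    for qid in query_ids:
        for suffix in suffixes:
            if qid.endswith(suffix):
                prefix = qid[: -len(suffix)]
                mates_by_prefix[prefix][suffix] = qid
                break  # a query ID carries at most one of the suffixes

    # Keep only prefixes for which both reads of the pair were seen.
    first, second = suffixes
    return {
        prefix: (mates[first], mates[second])
        for prefix, mates in mates_by_prefix.items()
        if first in mates and second in mates
    }


if __name__ == "__main__":
    ids = ["read7/1", "read7/2", "read9/1"]
    for prefix, (left, right) in pair_queries(ids).items():
        print(prefix, left, right)
```

In this sketch, a prefix such as `read7` plays the role of the ID entered in the ID prefix box, and the two returned query IDs correspond to the reports that would appear in the left and right panels. Reads whose mate is absent are dropped rather than shown alone.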
TwinBLAST has greatly increased our ability to rapidly validate and curate putative JSPRs, aiding in the identification of putative LGTs. For example, we identified putative JSPRs in public data from Drosophila mauritiana mau12w (SRA number SRA050824) (12), where one read in a pair was initially identified as matching a Wolbachia reference genome from Wolbachia pipientis strain wMel (13) and Wolbachia sp. strains wRi (14) and wPip (15) (GenBank accession numbers NC_002978, NC_010981, and NC_012416) using BWA ALN version 0.5.9-r16 (16) with default parameters, while the other read in the pair did not map. These read pairs will include (i) putative JSPRs that could indicate Wolbachia-host LGT and/or (ii) junctions between a conserved region and a unique region in the query Wolbachia genome. These read pairs were therefore subsequently searched against the NCBI NT database using BLASTN, and the results were visualized and curated in TwinBLAST based on both the taxonomy of the BLAST matches for each read in the pair and the complexity of the sequences. A manually curated subset is provided (http://lgt.igs.umaryland.edu/twinblast/) that includes only read pairs where one read matches a Wolbachia sp. and the other read in the pair matches the insect. This curation suggests that further experiments aimed at examining LGT from a Wolbachia endosymbiont to this line of D. mauritiana are warranted.
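The initial selection of pairs in which one read maps to a Wolbachia reference and its mate does not can be sketched as follows. This is an illustrative Python sketch, not the pipeline used here: it assumes the mapping results are available as SAM-format text, and the function name `half_mapped_read_names` is hypothetical. The flag bits (0x1 paired, 0x4 read unmapped, 0x8 mate unmapped) are those defined by the SAM format.

```python
PAIRED = 0x1         # read is part of a pair
UNMAPPED = 0x4       # this read did not map
MATE_UNMAPPED = 0x8  # the other read in the pair did not map


def half_mapped_read_names(sam_lines):
    """Return names of read pairs where exactly one read mapped.

    `sam_lines` is any iterable of SAM text lines (assumed input format).
    """
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, flag = fields[0], int(fields[1])
        if not flag & PAIRED:
            continue  # single-end reads cannot span a junction as a pair
        read_unmapped = bool(flag & UNMAPPED)
        mate_unmapped = bool(flag & MATE_UNMAPPED)
        # Exactly one of the two reads is unmapped.
        if read_unmapped != mate_unmapped:
            names.add(name)
    return names


if __name__ == "__main__":
    import sys

    for read_name in sorted(half_mapped_read_names(sys.stdin)):
        print(read_name)
```

The names returned by such a filter would identify the read pairs to extract and search with BLASTN before loading both reports into TwinBLAST for curation.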

ACKNOWLEDGMENTS
This research was funded by an NIH Director's New Innovator Award program (1-DP2-OD007372), an NIH Director's Transformative Research Award (R01CA206188), and a US National Science Foundation grant (ABI1457957).