The genome of the Hi5 germ cell line from Trichoplusia ni, an agricultural pest and novel model for small RNA biology

We report a draft assembly of the genome of Hi5 cells from the lepidopteran insect pest, Trichoplusia ni, assigning 90.6% of bases to one of 28 chromosomes and predicting 14,037 protein-coding genes. Chemoreception and detoxification gene families reveal T. ni-specific gene expansions that may explain its widespread distribution and rapid adaptation to insecticides. Transcriptome and small RNA data from thorax, ovary, testis, and the germline-derived Hi5 cell line show distinct expression profiles for 295 microRNA- and >393 piRNA-producing loci, as well as 39 genes encoding small RNA pathway proteins. Nearly all of the W chromosome is devoted to piRNA production, and T. ni siRNAs are not 2´-O-methylated. To enable use of Hi5 cells as a model system, we have established genome editing and single-cell cloning protocols. The T. ni genome provides insights into pest control and allows Hi5 cells to become a new tool for studying small RNAs ex vivo.


Introduction
To determine which system T. ni uses and to identify which contigs belong to the sex chromosomes, we sequenced genomic DNA from male and female pupae and calculated the male:female coverage ratio for each contig. We found that 175 presumably Z-linked contigs (20.0 Mb) had approximately twice the coverage in male compared to female DNA (median male:female ratio = 1.92; Figure [...]
In both the germline and the soma, T. ni piRNAs originate from discrete genomic loci. To define these piRNA source loci, we employed an expectation-maximization algorithm that resolves piRNAs mapping to multiple genomic locations. Applying this method to multiple small RNA-seq datasets, we defined piRNA-producing loci comprising 10. [...]
[...] as it is difficult to resolve reads mapping to its flanking regions: 70.8% of bases in the flanking regions do not permit piRNAs to map uniquely to the genome. In fact, 85.1% of the sequences between clusters on the W chromosome are not uniquely mappable.
These gaps appear to reflect low mappability and not boundaries between discrete clusters. We propose that the W chromosome itself is a giant piRNA cluster.
To further test this idea, we identified piRNA reads that uniquely map to one location among all contigs and measured their abundance per kilobase of the genome. W-linked contigs had a median piRNA abundance of 14.4 RPKM in ovaries, 379-fold higher than the median of all autosomal and Z-linked contigs, consistent with the view that almost the entire W chromosome produces piRNAs. In B. mori females, a plurality of piRNAs come from the W chromosome: ovary-enriched piRNAs often map to W-linked sequences, but not autosomes (Kawaoka, S et al., 2011). Similarly, for T. ni, 27.2% of uniquely mapping ovary piRNAs derive from W-linked sequences, even though these contigs compose only 2.8% of the genome (Figure 5C). The W chromosome may produce more piRNAs than our estimate, as the unassembled repetitive portions of the W chromosome likely also produce piRNAs. Thus, the entire W chromosome is a major source of piRNAs in T. ni ovaries (Figure 5B). To our knowledge, the T. ni W chromosome is the first example of an entire chromosome devoted to piRNA production.
To determine if there are W-linked regions devoid of piRNAs, we mapped all piRNAs to the W-linked contigs and found that 11.0% of the W-linked bases were not covered by any piRNAs, indicating that at least part of the W chromosome does not produce any piRNAs. Next, we manually inspected 74 putative W-linked protein-coding genes and nine putative W-linked miRNAs. All nine W-linked miRNAs (Figure 5B, Supplementary file 1J) are T. ni-specific, and small RNAs mapping to these predicted miRNA loci showed a significant ping-pong signature (Z-score = 14.2, p = 1.81 × 10⁻⁴⁵), suggesting that these are likely piRNAs, not authentic miRNAs.
We categorized the putative protein-coding genes into orphan genes (no homologs found), transposons (good homology to transposons), uncharacterized/hypothetical proteins, and potential protein-coding genes with homology to the NCBI non-redundant protein sequences. We then asked whether piRNAs were produced from these genes (Figure 5-figure supplement 2C). Among W-linked genes, those with transposon homology on average produced the most piRNAs (44.9 median ppm), whereas those with homology to annotated genes produced the fewest (9.81 median ppm). Some putative genes (such as TNI001015 and TNI005339) produced no piRNAs at all. We conclude that although some W-linked loci do not produce piRNAs, nearly the entire W chromosome produces piRNAs.
In contrast to the W chromosome, T. ni autosomes and the Z chromosome produce piRNAs from discrete loci: 63 autosomal and 11 Z-linked contigs had piRNA levels >10 RPKM. Few piRNAs are produced outside of these loci: for example, the median piRNA level across all autosomal and Z-linked contigs was ~0 in ovaries (Figure 5-figure supplement 2B).
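The per-contig comparison of uniquely mapping piRNA abundance described above can be illustrated with a minimal Python sketch. It assumes the conventional RPKM definition (reads per kilobase of contig per million mapped reads). The contig names, lengths, read counts, and the `linkage` labels are hypothetical toy values, not data from this study, and the function names are illustrative only.

```python
from statistics import median


def rpkm(read_count, contig_length_bp, total_mapped_reads):
    """Reads per kilobase of contig per million mapped reads."""
    return read_count * 1e9 / (contig_length_bp * total_mapped_reads)


def median_rpkm_by_group(contigs, total_mapped_reads):
    """Group contigs by their linkage label and return the median RPKM per group."""
    groups = {}
    for contig in contigs:
        value = rpkm(contig["unique_pirna_reads"], contig["length_bp"],
                     total_mapped_reads)
        groups.setdefault(contig["linkage"], []).append(value)
    return {label: median(values) for label, values in groups.items()}


# Hypothetical toy contigs (assumed values, for illustration only).
toy_contigs = [
    {"name": "ctgA", "linkage": "W", "length_bp": 2000, "unique_pirna_reads": 40},
    {"name": "ctgB", "linkage": "W", "length_bp": 1000, "unique_pirna_reads": 30},
    {"name": "ctgC", "linkage": "autosome_or_Z", "length_bp": 4000, "unique_pirna_reads": 2},
    {"name": "ctgD", "linkage": "autosome_or_Z", "length_bp": 1000, "unique_pirna_reads": 1},
]
TOTAL_MAPPED_READS = 1_000_000  # assumed library size

medians = median_rpkm_by_group(toy_contigs, TOTAL_MAPPED_READS)

# A fold difference is only defined when the reference median is non-zero.
reference = medians["autosome_or_Z"]
fold = medians["W"] / reference if reference > 0 else float("inf")
print(medians, fold)
```

Because the median piRNA level of autosomal and Z-linked contigs can be close to zero, the sketch guards the division explicitly. How such cases were handled in the actual analysis is not stated in the text.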

Expression of piRNA clusters
In the T. ni germline, piRNA production from individual clusters varies widely, but the same five piRNA clusters produce the most piRNAs in ovary (34.9% of piRNAs), testis (49.3%), and Hi5 cells (44.0%), suggesting that they serve as master loci for germline transposon silencing. Other piRNA clusters show tissue-specific expression, with the W chromosome producing more piRNAs in ovary than in Hi5 cells, and three Z-linked clusters producing many more piRNAs in testis than in ovary (15.0-24.7 times more), even after accounting for the absence of dosage compensation in germline tissues [...]
[...] has no rhino ortholog, its piRNA precursor RNAs are rarely spliced as observed for [...]
[...] that map across exon-exon junctions and a minimum splicing entropy of 2 to exclude PCR duplicates (Graveley, BR et al., 2011). This approach detected just 27 splice sites among all piRNA precursor transcripts from ovary, testis, thorax, and Hi5 piRNA clusters (Figure 6C). Of these 27 splice sites, 19 fall in uni-strand piRNA clusters. We conclude that, as in flies, transcripts from T. ni dual-strand piRNA clusters are rarely if ever spliced. Unlike in flies (Goriaux, C et al., 2014), RNA from T. ni uni-strand piRNA clusters also undergoes splicing infrequently.
The absence of piRNA precursor splicing in dual-strand piRNA clusters could reflect an active suppression of the splicing machinery or a lack of splice sites. To distinguish between these two mechanisms, we predicted gene models for piRNA-producing loci, employing the same parameters used for protein-coding genes. For piRNA clusters, this approach generated 1,332 gene models encoding polypeptides >200 amino acids. These models comprise 2,544 introns with consensus splicing signals (Figure 6-figure supplement 1C).
Notably, ~90% of these predicted gene models had high sequence similarity to transposon consensus sequences (BLAST e-value < 10⁻¹⁰), indicating that many transposons in piRNA clusters have intact splice sites. We conclude that piRNA precursors contain splice sites, but their use is actively suppressed.
To measure splicing efficiency, we calculated the ratio of spliced to unspliced reads for each predicted splice site in the piRNA clusters. High-confidence splice sites in protein-coding genes outside piRNA clusters served as a control. Compared to the control set of genes, splicing efficiency in piRNA loci was 9.67-fold lower in ovary, 2.41-fold lower in testis, 3.23-fold lower in thorax, and 17.0-fold lower in Hi5 cells (Figure 6D), showing that T. ni piRNA precursor transcripts are rarely and inefficiently spliced. To test whether uni- and dual-strand piRNA cluster transcripts are differentially spliced in T. ni, we evaluated the experimentally supported splice sites from Hi5, ovary, testis, and thorax collectively. Dual-strand cluster transcripts had 1.71-fold lower splicing efficiency compared to uni-strand clusters (Figure 6D). Thus, T. ni suppresses splicing of dual- and uni-strand piRNA cluster transcripts by a mechanism distinct from the Rhino-dependent pathway in D. melanogaster. That this novel splicing suppression pathway is active in Hi5 cells should facilitate its molecular dissection.
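The splicing-efficiency comparison above (the ratio of spliced to unspliced reads per splice site, compared between piRNA clusters and control genes) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the text does not say how per-site ratios were summarized, so the median is an assumption, as is the pseudocount used to avoid division by zero. All read counts are hypothetical toy values.

```python
from statistics import median


def splicing_efficiency(spliced_reads, unspliced_reads, pseudocount=1.0):
    """Ratio of spliced to unspliced reads at one splice site.

    The pseudocount is an assumption of this sketch; it keeps the ratio
    defined when a site has no unspliced reads.
    """
    return (spliced_reads + pseudocount) / (unspliced_reads + pseudocount)


def fold_lower(cluster_sites, control_sites):
    """How many fold lower the median efficiency is in clusters than in controls.

    Each argument is a list of (spliced_reads, unspliced_reads) tuples.
    """
    cluster = median(splicing_efficiency(s, u) for s, u in cluster_sites)
    control = median(splicing_efficiency(s, u) for s, u in control_sites)
    return control / cluster


# Hypothetical toy read counts (spliced, unspliced), for illustration only.
toy_cluster_sites = [(1, 19), (0, 9), (3, 39)]
toy_control_sites = [(9, 4), (19, 9), (29, 14)]

toy_fold = fold_lower(toy_cluster_sites, toy_control_sites)
print(toy_fold)
```

Applying the same function separately to each tissue, or to uni- versus dual-strand clusters, would give per-tissue and per-cluster-type comparisons of the kind reported in the text.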

Genome-editing and single-cell cloning of Hi5 cells
The study of arthropod piRNAs has been limited both by a lack of suitable cultured cell [...]
[...] additions, at the deletion junction is consistent with a Cas9-mediated dsDNA break having been repaired by NHEJ (Figure 7A). We note that these cells still contain at least one wild-type copy of TnPiwi. We have not yet obtained cells in which all four copies of TnPiwi are disrupted, perhaps because in the absence of Piwi, Hi5 cells are inviable.
To test whether an exogenous donor DNA could facilitate the site-specific incorporation of protein tag sequences into the Hi5 genome, we designed two sgRNAs with target sites ~90 bp apart, flanking the vasa start codon (Figure 7C). As a donor, we [...]
[...] perinuclear structure, consistent with Vasa localizing to nuage in Hi5 cells (Figure 8C).

Discussion
Using Hi5 cells, we have sequenced and assembled the genome of the cabbage looper, T. ni, a common and destructive agricultural pest that feeds on many plants of economic importance. Examination of the T. ni genome and transcriptome reveals the expansion of detoxification-related gene families (Table 1 and [...]
[...] Notably, the W chromosome not only is a major piRNA source, but also produces piRNAs from almost its entirety. Future studies are needed to determine whether this is a common feature of W chromosomes in Lepidoptera and other insects.