Endogenous viral element-derived piRNAs are not required for production of ping-pong-dependent piRNAs from Diaphorina citri densovirus

Partial integrations of DNA and non-retroviral RNA virus genomes, termed endogenous viral elements (EVEs), are abundant in arthropod genomes and often produce PIWI-interacting RNAs (piRNAs) speculated to target cognate viruses through the ping-pong cycle, a post-transcriptional RNA silencing mechanism. Here we describe a Diaphorina citri densovirus (DcDV)-derived EVE in the genome of Diaphorina citri. We found that this EVE gives rise to DcDV-specific primary piRNAs and is unevenly distributed among D. citri populations. Unexpectedly, we found that DcDV is targeted by ping-pong-dependent viral piRNAs (vpiRNAs) in D. citri lacking the DcDV-derived EVE, while four naturally infecting RNA viruses of D. citri are not targeted by vpiRNAs. Furthermore, a recombinant Cricket paralysis virus containing a portion of the DcDV genome corresponding to the DcDV-derived EVE was not targeted by vpiRNAs during infection in D. citri harboring the EVE. These results represent the first report of ping-pong-dependent vpiRNAs outside of mosquitoes.


Introduction
The Piwi-interacting RNA (piRNA) pathway is a small RNA (sRNA) guided gene silencing mechanism responsible for repressing transposable elements (TEs) in animals, and emerging evidence supports a role for this pathway in antiviral responses in some mosquito species and mosquito-derived cell lines (1-5). In Drosophila melanogaster, biogenesis of primary piRNAs begins with transcription of single-stranded piRNA precursor transcripts from discrete genomic regions called piRNA clusters (6). TE sequences integrated into piRNA clusters can become transcribed as part of a piRNA precursor transcript, and these precursor transcripts are processed into primary piRNAs that direct transcriptional and post-transcriptional silencing of TEs by association with the Piwi-family Argonaute proteins Piwi and Aubergine, respectively (1). During the ping-pong cycle, cleavage of sense TE RNA directed by Aubergine-bound antisense primary piRNAs triggers production of sense secondary piRNAs from cleaved TE RNA and, in turn, sense secondary piRNAs direct cleavage of antisense piRNA precursor transcripts via association with another Piwi-family Argonaute protein, Argonaute 3 (Ago3), to specifically amplify the response against active TEs (6). piRNAs are distinguished from other sRNAs by their size (24-32 nt), nucleotide biases (uridine as the first nucleotide for primary piRNAs and adenine as the tenth nucleotide for secondary piRNAs, known as the 1U and 10A bias), and association with a Piwi-family Argonaute protein (1). Additionally, complementary piRNAs produced by the ping-pong cycle have 5' ends separated by exactly 10 nt, known as the ping-pong signature (1).
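The 10 nt ping-pong signature described above can be illustrated with a minimal Python sketch. It is not the analysis pipeline used in this study. The function names, the (start, length) read representation, and the Z-score definition (the 10 nt overlap count compared with the mean and population standard deviation of the other overlaps from 1 to 20 nt) are assumptions chosen for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def overlap_counts(sense, antisense, max_overlap=20):
    """Count opposite-strand read pairs by the length of their 5' overlap.

    Both inputs are lists of (start, length) in plus-strand coordinates.
    A sense read has its 5' end at `start`; an antisense read has its
    5' end at `start + length - 1`. Reads are assumed to be longer than
    max_overlap, which holds for piRNA-sized reads (24-32 nt).
    """
    sense_5p = Counter(start for start, _ in sense)
    counts = Counter()
    for start, length in antisense:
        five_prime = start + length - 1
        for overlap in range(1, max_overlap + 1):
            # A sense read starting here shares `overlap` nt with this read.
            counts[overlap] += sense_5p[five_prime - overlap + 1]
    return counts


def ping_pong_z(counts, signature=10, max_overlap=20):
    """Z-score of the signature overlap against all other overlaps (assumed definition)."""
    background = [counts[k] for k in range(1, max_overlap + 1) if k != signature]
    spread = pstdev(background)
    if spread == 0:
        return 0.0  # no variation in the background, so no meaningful score
    return (counts[signature] - mean(background)) / spread


# Toy data (hypothetical): two pairs overlapping by 10 nt, one by 5 nt.
sense_reads = [(100, 28), (200, 27), (300, 28)]
antisense_reads = [(82, 28), (183, 27), (277, 28)]
toy_counts = overlap_counts(sense_reads, antisense_reads)
toy_z = ping_pong_z(toy_counts)
```

A strongly positive score for the 10 nt overlap, relative to the other overlap lengths, is what is read as evidence of ping-pong amplification.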

The presence of ping-pong-dependent virus-derived piRNAs (vpiRNAs) during infection with several RNA viruses in Aedes and Culex mosquitoes and cell lines has been reported (2-5,7-12) and reduced expression of piRNA pathway components leads to increased replication of Semliki

Here we further characterized the DcDV-derived EVE present within the D. citri genome. We found that piRNAs are produced from this EVE in multiple tissue types and that this EVE is conserved among some geographically distinct populations of D. citri, but is absent from other populations. By analyzing sRNA profiles in D. citri insects infected with DcDV, we found that this virus is targeted by ping-pong-dependent vpiRNAs independent of DcDV-derived EVEs and endogenous DcDV-specific piRNAs. Additionally, analysis of sRNA profiles during infection with other viruses suggests that RNA viruses are not a target of vpiRNAs in D. citri.

Results
A DcDV-derived EVE with high nucleotide identity to DcDV is conserved in some populations of D. citri, but absent in others
We previously identified a 621 bp DcDV-derived EVE located within a piRNA cluster on D. citri genomic scaffold 2850 by comparing deduced virus protein sequences to deduced D. citri genome-encoded protein sequences using BLASTx (15). This EVE was 85.5% identical to the corresponding region of the DcDV genome at the deduced amino acid level and resided just downstream from another 624 bp EVE derived from a different portion of the DcDV genome. To characterize these EVEs at the nucleotide level, we aligned the nucleotide sequence of the DcDV genome to the region of the D. citri genome harboring these DcDV-derived EVEs. We found that this region of the D. citri genome contains sequences corresponding to the DcDV inverted terminal repeats (ITRs) and to two regions of the non-structural protein (NS) gene cassette together spanning 1,396 bp. For simplicity we refer to these regions as endogenous ITR (EITR) and endogenous NS (ENS) (Fig. 1a). Notably, the portion of ENS spanning nucleotides 7,523 to 8,175 within genomic scaffold 2850 shares 86% nucleotide identity with the corresponding region of the DcDV genome.

Because our EVE prediction shown in Fig. 1a was based only on BLAST results and a previous D. citri genome assembly, we wanted to confirm the presence, sequence, and organization of the DcDV-derived EVEs. To confirm the presence of these EVEs, we designed PCR primers to amplify this region from D. citri genomic DNA (Fig. 1a). We obtained PCR products of unexpected size (based on the available sequence of scaffold 2850 and the organization depicted in Fig. 1a) for both primer sets (Fig. 1b).
Sequencing of these PCR products revealed that the portion of ENS spanning nucleotides 5,248 to 5,871 within genomic scaffold 2850 was not present and the distance between the ITR-like fragments was 3,885 bp rather than 11,038 bp.

Southern Asia has resulted in segregation of the two lineages, such that only lineage B is found in North America, while lineage A predominates in Southeast Asia, Africa, and South America (30). Besides CRF-CA, three other D. citri colonies are maintained at the CRF and these were started using D. citri insects collected in Taiwan, Uruguay, and the US state of Hawaii (designated CRF-TW, CRF-Uru, and CRF-HI, respectively). To determine whether ENS is conserved among geographically distinct D. citri populations, we performed PCR using primers flanking ENS and DNA extracted from insects collected from CRF-CA, CRF-TW, CRF-Uru, and CRF-HI. We also included DNA extracted from field-collected D. citri insects from Pakistan, Brazil, and the US states of Arizona and Florida. We obtained nearly identical PCR products from insects from CRF-CA, CRF-HI, Pakistan, Arizona, and Florida, but no PCR products from insects from CRF-TW, CRF-Uru, or Brazil (Fig. 2a & S1). Based on the known distribution of the two D. citri lineages, these results suggest that ENS is absent in D. citri from lineage A.

Sequences derived from some insect-infecting viruses are maintained as circular episomal molecules that produce virus-specific sRNAs (12,33). Given the heterogeneous distribution of ENS among D. citri populations, we examined whether ENS is maintained episomally rather than representing a true EVE. When we performed a Southern blot with DNA extracted from CRF-CA D.
citri using an RNA probe based on the sequence of ENS, we observed a single DNA species at high molecular weight (>12,000 bp) from undigested DNA and two species of lower molecular weight in DNA digested with PstI and HindIII, indicating that ENS resides in high molecular weight genomic DNA rather than low molecular weight episomal DNA (Fig. 2b). The observation of two DNA species in the digested sample does not represent a second DcDV-like EVE because ENS contains a PstI cleavage site. To confirm that ENS is a true EVE, we treated D. citri genomic DNA with an exonuclease to remove linear genomic DNA, but not circular episomal DNA, and performed PCR with ENS-specific primers using exonuclease-treated DNA as template. We obtained a PCR product from non-digested DNA, but not from digested DNA (Fig. 2c). Together, these results indicate that ENS is indeed integrated into the D. citri genome. Finally, strand-specific RT-PCR indicates that ENS is bidirectionally transcribed, although the majority of transcripts are antisense to the corresponding DcDV transcript (Fig. 2d).

ENS and EITR give rise to DcDV-specific primary piRNAs
To evaluate whether ENS and EITR give rise to virus-specific primary piRNAs as described for other EVEs, we mapped sRNAs from CRF-CA D. citri not infected with DcDV to the DcDV genome. We found that sRNAs purified from CRF-CA D. citri mapped to EITR and to the negative strand within the portion of the DcDV genome corresponding to ENS (i.e. antisense to DcDV transcripts), but not to other regions (Fig. 3a). The size of these sRNAs was characteristic of piRNAs and they possessed a strong 1U bias (Fig. 3b & c). Similar results were obtained for other D. citri populations that are not infected with DcDV and for which sRNA datasets are available (Fig. S2a & b). As expected based on the lack of ENS in D.
citri insects collected in Brazil or from CRF-Uru, DcDV-specific piRNAs were not observed in these insects (Fig. S2c & d). We note that the DcDV ITRs are composed of a 210 nt hairpin present on both ends of the DcDV genome. Thus, it is not possible to determine whether the sRNAs mapping to EITR are specific to the positive or the negative strand. For this reason, because of the relatively small size of EITR compared to ENS, and because EITR does not correspond to a transcribed region of the DcDV genome (34), we chose to focus our analysis on piRNAs derived from ENS.
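The two read-level features used in this section to recognize primary piRNAs, namely read length and a uridine at position 1, can be summarized with a short Python sketch. It is not the pipeline used in this study. The function names are hypothetical, and the 24-32 nt default window is an assumption taken from the piRNA size range given in the Introduction.

```python
from collections import Counter


def length_distribution(reads):
    """Number of reads observed at each read length."""
    return Counter(len(read) for read in reads)


def position_frequency(reads, position, min_len=24, max_len=32):
    """Fraction of each nucleotide at a 1-based position.

    Only reads within [min_len, max_len] are considered, and DNA-style
    reads are reported as RNA (T is shown as U).
    """
    selected = [
        read.upper().replace("T", "U")
        for read in reads
        if min_len <= len(read) <= max_len
    ]
    if not selected:
        return {}
    counts = Counter(read[position - 1] for read in selected)
    return {nt: n / len(selected) for nt, n in counts.items()}


# Toy reads (hypothetical): two piRNA-sized reads and one 21 nt read.
toy_reads = ["T" + "A" * 27, "T" + "C" * 26, "G" * 21]
toy_lengths = length_distribution(toy_reads)
toy_first_nt = position_frequency(toy_reads, position=1)
toy_tenth_nt = position_frequency(toy_reads, position=10)
```

Calling `position_frequency` with `position=1` on antisense reads and `position=10` on sense reads gives the quantities usually reported as the 1U and 10A biases.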

The expression of piRNAs is known to display tissue specificity and piRNA expression patterns could have important consequences for the ability of EVE-derived piRNAs to target cognate viruses. To determine the tissue specificity of piRNAs produced from ENS and to determine the tissues in which the ping-pong cycle is active in D. citri, we sequenced sRNAs purified from dissected D. citri guts, heads, ovaries, testes, and hemolymph using insects collected from CRF-CA. We found that while DcDV-specific piRNAs derived from ENS were present in all tissues analyzed, their expression was significantly higher in D. citri guts than in any other tissue (Fig. 3d). The ping-pong cycle is restricted to germline tissues in D. melanogaster (1); however, a comprehensive analysis of somatic sRNAs mapping to TEs genome-wide in 20 arthropod species in combination with ancestral state reconstruction indicates that somatic ping-pong amplification is widespread throughout arthropods despite having been independently lost in some species, including D. melanogaster (35). To determine the tissues in which the ping-pong cycle is active in D. citri, we mapped sRNAs from CRF-CA D. citri guts, heads, hemolymph, ovaries, and testes to all TEs identified within the D. citri genome and analyzed the mapped 27-32 nt sRNAs for the presence of ping-pong signatures. We found evidence for ping-pong amplification of TE-derived piRNAs in all tissues examined (Fig. S3).

DcDV is targeted by ping-pong-dependent piRNAs in the absence of a DcDV-derived EVE
We previously found that D. citri from CRF-CA are resistant to infection with DcDV (34). In contrast, the virus is maintained as a persistent, maternally transmitted infection in D. citri from CRF-TW (34). Thus, to understand the sRNA-based response to DcDV infection in D. citri, we sequenced sRNAs from DcDV-infected D. citri from CRF-TW. These results revealed a major population of 21 nt sRNAs, indicative of an siRNA-based response (Fig. 4a). Unexpectedly, we also observed a smaller peak within the piRNA size range centered at 29 nt and we obtained 99.5% coverage of transcribed regions of the DcDV genome by mapping only 27-32 nt sRNAs (Fig. 4a & b). We found that complementary 27-32 nt sRNAs mapping to opposite strands throughout the DcDV genome possessed 5' ends separated by 10 nt more often than expected by chance, an indication of ping-pong amplification (Z-score = 4.05±0.08) (Fig. 4c). Moreover, we detected the 1U and 10A biases typical of ping-pong amplification in 27-32 nt sRNAs mapping antisense and sense to the canonical DcDV transcripts, respectively (Fig. 4d & e).

We found that ENS is not present in the genome of CRF-TW D. citri (Fig. 2a), thus our observation of ping-pong-dependent DcDV-derived piRNAs in these insects suggests that DcDV is targeted by piRNAs independent of ENS. However, because this result was obtained by performing PCR using primers flanking ENS, it is possible that ENS is present in CRF-TW D. citri, but resides in a different genomic context than seen in other D. citri populations. ENS is not identical to the corresponding region of the DcDV genome at the nucleotide level (Fig. 1c). Thus, some piRNAs derived from ENS do not perfectly map to DcDV, providing a means with which to distinguish some ENS-derived piRNAs from DcDV-derived piRNAs. To rule out the
To rule out the 253 possibility that ENS is present in the genome of CRF-TW D. citri, we mapped 27-32 nt sRNAs 254 from CRF-TW D. citri to the DcDV genome without allowing any mismatches. When the 255 unmapped reads from this analysis were mapped to ENS without allowing any mismatches, no 256 reads mapped. In contrast, we obtained an average of 55.8% coverage of the ENS sequence when 257 the same analysis was performed using sRNAs from CRF-CA D. citri (data not shown). This 258 result indicates that ENS is indeed not present in the genome of CRF-TW D. citri and that the 259 targeting of DcDV by ping-pong-dependent vpiRNAs in these insects is independent of ENS-260 derived piRNAs. 261

We cannot exclude the possibility that the genome of CRF-TW D. citri harbors a different piRNA-producing DcDV-derived EVE. Because DcDV is maternally transmitted to 100% of the progeny of CRF-TW females (34), it is not possible to determine the repertoire of endogenous piRNAs in these insects in the absence of DcDV infection. Thus, we analyzed the sRNAs present in D. citri from CRF-Uru, as these insects lack ENS and are not infected with DcDV (Fig. 2a). We found that no DcDV-specific piRNAs are produced in uninfected CRF-Uru D. citri, indicating that these insects do not harbor a piRNA-producing DcDV-derived EVE (Fig. S2d). The absence of DcDV-specific piRNAs was not due to a lack of piRNAs in general, as we detected abundant ping-pong-dependent TE-derived piRNAs in these insects (

Diaphorina citri-associated C virus for which no such sRNA library exists) (28,36). As expected, we detected a prominent peak at 21 nt for each virus, indicating an siRNA-based response (Fig. S7). While virus-derived sRNAs within the piRNA size range were present during infection with all four viruses, there were no peaks above background levels within the piRNA size range and 27-32 nt reads lacked signatures typical of primary or ping-pong-dependent piRNAs (Fig. S7). These results indicate that viruses in general are not targeted by piRNAs in D. citri and suggest that the targeting of DcDV by the piRNA pathway in D. citri may be due to differences in the infection cycles between RNA and DNA viruses.

A recombinant CrPV-based reporter virus carrying DcDV sequence is not targeted by piRNAs in D. citri that produce primary piRNAs from a DcDV-derived EVE
As has been observed for RNA viruses in mosquitoes, our results suggest that DcDV is targeted by ping-pong-dependent vpiRNAs due to the de novo production of vpiRNAs from exogenous DcDV RNA.
If EVE-derived piRNAs were to prime the ping-pong cycle in this context, it would be difficult to distinguish priming driven by EVE-derived piRNAs from priming driven by virus-derived vpiRNAs. Because our results suggest that RNA viruses are not targeted by vpiRNAs in D. citri, we sought to construct a recombinant RNA virus harboring DcDV-derived EVE sequence in order to study potential priming of the ping-pong cycle by EVE-derived piRNAs without the background of vpiRNAs produced directly from viral RNA. For this purpose, we used Cricket paralysis virus (CrPV), a dicistrovirus with a +ssRNA genome that was originally isolated from field crickets (39). Due to the broad experimental host range of CrPV, including D.

Identification of TEs
We identified TEs present within the D. citri genome using RepeatMasker version 4.0.6 (70) with the Metazoa library. In addition, to identify TEs lacking homology to previously annotated TEs, we used RepeatModeler version 1.0.8 (71) to produce a de novo hidden Markov model for TEs within the D. citri genome, which was subsequently used as input for a second analysis using RepeatMasker. All TE identification was performed using TEAnnotator.py as previously described to produce a single strand-specific .fasta file containing the sequences of all TEs >100 nt identified in the D. citri genome (35).

DcDV. Infection was initiated in CRF-CA D. citri by oral acquisition. Insects were allowed to feed for 96 hours on an artificial diet solution containing 10^9 TCID50 units/mL of wild-type CrPV or CrPV-DcDV. Following the feeding period insects were moved to C. macrophylla