Characterization of the first complete genome sequence of an Impatiens necrotic spot orthotospovirus isolate from the United States and worldwide phylogenetic analyses of INSV isolates

Impatiens necrotic spot orthotospovirus (INSV) can impact economically important ornamental plants and vegetables worldwide. Characterization studies on INSV are limited, and for most INSV isolates no complete genome sequence is available. This lack of genomic information hampers the understanding of INSV genetic diversity and evolution. Here we report the first complete nucleotide sequence of a US INSV isolate. INSV-UP01 was isolated from an impatiens in Pennsylvania, US. RT-PCR was used to clone its full-length genome, and Vector NTI was used to assemble the overlapping sequences. Phylogenetic trees were constructed with MEGA7 software to show the phylogenetic relationships with other available INSV sequences worldwide. This US isolate has genome and biological features typical of the INSV species and clusters in the Western Hemisphere clade, but its origin appears to be recent. Furthermore, INSV-UP01 might have been involved in a recombination event with an Italian isolate belonging to the Asian clade. Our analyses support that INSV isolates infect a broad plant-host range, group by geographic origin and not by host, and are subject to frequent recombination events. These results justify the need to generate and analyze complete genome sequences of orthotospoviruses in general and INSV in particular.

Orthotospoviruses are classified based on nucleocapsid (N) amino-acid (aa) sequence identity and serological cross reactivity, plant host range and thrips transmission specificity [22], and are considered as distinct species when their nucleocapsid aa identity is less than 90% [23]. INSV was first designated as TSWV-I strain [24,25,26,27]. INSV glycoproteins are serologically related to TSWV, while the N proteins are serologically unrelated [24,28].
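The 90% nucleocapsid identity criterion can be illustrated with a minimal sketch. It assumes the two N protein sequences have already been aligned to the same length, and it computes identity over all columns that are not gaps in both sequences; this denominator is an assumption, and tools such as MatGAT may define identity differently. The function names and the toy sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where both sequences carry a gap ('-') are skipped
    (assumed convention; other tools may count differently).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def is_distinct_species(n_a: str, n_b: str, threshold: float = 90.0) -> bool:
    """Apply the N-protein criterion: distinct if aa identity is below the threshold."""
    return percent_identity(n_a, n_b) < threshold


# Toy aligned fragments (hypothetical, not real INSV or TSWV sequences)
print(is_distinct_species("MSKVKLTKEN", "MSKVKLTKEA"))  # one mismatch in ten columns
print(is_distinct_species("MSKVKLTKEN", "MSRVKLTKEA"))  # two mismatches in ten columns
```

In practice the identity threshold is only one of the criteria listed above; serology, host range and vector specificity are not captured by this calculation.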
INSV's host range includes about 300 plant species [22]. Even though INSV can considerably affect vegetables, its economic importance for vegetables has been lower than for ornamental plants [29,30]; in the last few years, however, the importance of INSV has been increasing in vegetables in Europe and North America [21,31].
Characterization studies on INSV are quite limited. Until now, only four complete INSV genomes have been sequenced. The type isolate (NL-07) was reported by van Poelwijk et al. in 1997 and consists of L and S segments from the Netherlands [32,33] and an M segment from the US [34] (M74904.1; NC_003625.1, NC_003616.1, NC_003624.1). The M segment from the US was included in the type isolate because it differed by only 4 nucleotides from the M segment of NL-07, of which only 700 (out of 5000) nucleotides had been sequenced at that time, even though the similarity between the remaining 4300 nucleotides was unknown. Among the three remaining INSV full genome sequences, one is from Italy (DQ425094.1, DQ425095.1, DQ425096.1) and two are from China (GQ336989.1, GQ336990.1, GQ336991.1; GU112503.1, GU112504.1, GU112505.1). Isolate GU112505.1 from China contains a non-functional RdRp due to mutation and is missing a portion of the S segment, lowering the number of complete INSV sequences de facto to two. The availability of genomic sequences from different geographic origins is pivotal to understanding INSV genetic diversity and evolution, especially considering that orthotospoviruses have a tripartite genome and can reassort. Furthermore, while for other orthotospoviruses such as TSWV the aa sequence of N is sufficiently diverse to be phylogenetically informative, the INSV N protein is highly conserved and is not phylogenetically informative [35,36,37]. Genetic analysis can be used to characterize the structure of a virus population in relation to a location or host, and to probe the origin of a population and its gene flow across time and space. Thus, we suggest that it is important to fully sequence a larger number of INSV genomes; the information gained by doing so will improve understanding of the etiology of the disease and aid its management.

Thrips transmission assays ([21], with modifications) were conducted with symptomatic leaves from infected E. sonchifolia as virus source.
First-instar larvae (12 h old) of WFT were given a 24-h acquisition access period and then reared on virus-free green bean pods until adulthood. These adult thrips were given a 48-h inoculation access period on 2-week-old E. sonchifolia seedlings (20 thrips per plant). This experiment was repeated twice. Inoculated plants were maintained in a growth chamber (25 °C, 16 h photoperiod) for symptom development and were then tested by ELISA.

INSV
Transient agroinfiltration was used to test the functionality of the INSV NSs protein as a silencing suppressor according to previous protocols [12,38]. Briefly, full-length UP01 NSs was cloned into the pBin61 vector and transiently expressed through agroinfiltration together with pBin-GFP in 16C N. benthamiana. Vector only (pBin61) and pBin61-p19, both together with pBin-GFP, were used as negative and positive controls, respectively. GFP expression in agroinfiltrated plants was checked under UV light 3 days post-agroinfiltration.
Total RNA was extracted from systemically infected N. benthamiana leaves using the Spectrum™ Plant Total RNA Kit (Sigma-Aldrich, St. Louis, MO, USA), following the manufacturer's directions. Reverse transcription was performed using Superscript IV reverse transcriptase (Invitrogen, Grand Island, NY, USA), random primers and 500-1000 ng of RNA as template. Overlapping amplicons were obtained by PCR with the Q5 High Fidelity PCR Kit (NEB, Ipswich, MA, USA) and gene-specific primers designed on conserved regions of available INSV isolates (Additional file 1), followed by a 5-min adenylation step at 72 °C using GoTaq DNA Polymerase (Promega, Madison, WI, USA). PCR products were cloned into the pGEM-T Easy vector (Promega, Madison, WI, USA) and sequenced by Sanger sequencing at the PSU Genomic Core Facility. Overlapping sequences were assembled using Vector NTI software (Invitrogen, Grand Island, NY, USA).
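The idea behind assembling overlapping amplicons into one segment sequence can be sketched as follows. This is a simplified illustration only, not the algorithm used by Vector NTI: it assumes the amplicons are given in 5' to 3' order, share the same strand, and overlap with exact matches (no sequencing errors). The function names and the toy reads are hypothetical.

```python
from typing import List


def merge_overlap(left: str, right: str, min_overlap: int = 20) -> str:
    """Join two sequences on the longest exact suffix/prefix overlap."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    max_len = min(len(left), len(right))
    # Try the longest candidate overlap first, then shrink.
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    raise ValueError("no overlap of the required length found")


def assemble(amplicons: List[str], min_overlap: int = 20) -> str:
    """Merge ordered, overlapping amplicons into a single contig."""
    if not amplicons:
        raise ValueError("at least one amplicon is required")
    contig = amplicons[0]
    for amplicon in amplicons[1:]:
        contig = merge_overlap(contig, amplicon, min_overlap)
    return contig


# Toy example with short reads and a 3-nt minimum overlap
print(assemble(["AAACCC", "CCCGGG", "GGGTTT"], min_overlap=3))
```

Real assemblers tolerate mismatches, use base quality, and handle reverse-complemented reads; the exact-match rule here is kept only to make the overlap logic visible.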
Phylogenetic trees were constructed by the neighbor-joining method [39] using MEGA7 software [40], with 1000 bootstrap replicates. Percentages of pairwise identity among the aligned nucleotide and protein sequences were calculated using MatGAT v.2.03 [41]. Putative reassortment and recombination events were predicted by the Recombination Detection Program (RDP4 v.4.80) [42] using several algorithms on the MUSCLE alignment file of concatenated full-length genome sequences, created with MEGA7.

Since the NSs of TSWV has been demonstrated to function as a silencing suppressor [12,43,44,45], we performed in planta transient Agrobacterium tumefaciens silencing suppression assays [38] to test this activity for UP01 and demonstrated that UP01 NSs is a strong silencing suppressor (Additional file 5).
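The bootstrap replicates used to assess support for neighbor-joining trees rest on a simple resampling step, sketched here under stated assumptions. The code resamples alignment columns with replacement and computes an uncorrected p-distance between two sequences; it does not build trees and is not MEGA7's implementation. The function names, the toy alignment and the choice of p-distance are illustrative assumptions.

```python
import random
from typing import Dict


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return one pseudo-replicate by sampling alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be non-empty in number and equally long")
    n_cols = lengths.pop()
    # The same sampled column indices are applied to every sequence.
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


# Toy alignment (hypothetical sequences and isolate labels)
toy = {"iso1": "ACGTACGT", "iso2": "ACGTACGA", "iso3": "ACCTACGA"}
replicate = bootstrap_alignment(toy, random.Random(1))
print(p_distance(replicate["iso1"], replicate["iso2"]))
```

A tree would be rebuilt from the distance matrix of each replicate, and the bootstrap value of a clade is the percentage of replicate trees in which that clade appears.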

Conserved motifs
Several amino acid substitutions distributed along the whole RdRp protein sequence were observed between UP01 and other INSV isolates (Additional file 6).

A recombination event in the L segment is predicted among INSV isolates
Analysis of putative reassortment/recombination events using INSV concatenated full-length genome sequences predicted the occurrence of a recombination event involving isolates UP01, NL-07 and the Italian isolate (Additional file 8). The event involved the L segment and was predicted by different algorithms with significance level set at P ≤ 0.05.

Discussion
UP01 is consistently placed in the same Western Hemisphere clade as other US isolates and NL-07, and is more distantly related to isolates in the Asian clade, to which the Italian isolate also belongs (Figs. 1, 2, 3). In agreement with Elliott et al. [36] and Nekoduka et al. [37], our results confirm that INSV isolates do not group phylogenetically based on host species (Figs. 1, 2; Additional file 9). The UP01 RdRp ORF is overall more closely related to NL-07 than to other isolates (Additional file 8), but it shares different degrees of similarity with all isolates depending on the region of the RdRp examined, suggesting a possible recombination event for this segment involving the region 2850-8690. The resolution of the RdRp phylogeny is limited by the availability of only 5 sequences.
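Region-dependent similarity of this kind can be visualized with a sliding-window identity scan, sketched below. This is a generic illustration and not one of the RDP4 algorithms; it assumes two pre-aligned sequences of equal length and counts gap characters like any other symbol. The function name, window size and step are hypothetical choices.

```python
from typing import List, Tuple


def windowed_identity(query: str, reference: str,
                      window: int = 200, step: int = 50) -> List[Tuple[int, float]]:
    """Percent identity of query vs. reference in consecutive windows.

    Returns (window start, percent identity) pairs along the alignment.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    results = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        r = reference[start:start + window]
        matches = sum(1 for a, b in zip(q, r) if a == b)
        results.append((start, 100.0 * matches / window))
    return results


# Toy example: the second half of the query diverges from the reference
print(windowed_identity("A" * 10 + "C" * 10, "A" * 20, window=10, step=10))
```

Running the scan against several reference isolates and comparing the profiles shows where the closest relative changes along the segment, which is the pattern that recombination detection methods then test statistically.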
Phylogenetic analyses of the M segment and its two ORFs (Gn/Gc and NSm) (Fig. 1a-c) and IGR (Fig. 1d) show again that INSV isolates are divided into Western Hemisphere and Asian clades, with UP01 in the Western Hemisphere clade and the isolates from Italy and Asia in the Asian clade. Unexpectedly, one A. thaliana isolate (NSm JX138532.1, Gn/Gc JX138530.1) (Fig. 1a, b) and one lettuce isolate (KF745141.1) (Fig. 1b) from the US also group in the Asian clade, suggesting that these isolates might be of European/Asian origin and might have been introduced recently into the US. While this segment is better represented, too few sequences are available to resolve some of the phylogenetic relationships among members of the two clades.
Phylogenetic analyses using N protein nucleotide sequences (Fig. 2a) indicate that UP01 groups in the Western Hemisphere clade. This clade contains isolates from the US and the Netherlands, but also one isolate from Japan (AB894565.1), again indicating that INSV was probably introduced into different regions via import of infected plant material. The UP01 N protein shares very high aa identity with other INSV isolates (Additional file 10).
The division into Western Hemisphere and Asian clades is also supported by the phylogenetic analyses of the S segment (Fig. 2c), where UP01 belongs to the Western Hemisphere clade and is distantly related to the Chinese isolate (GU112504.1). However, while for the M segment UP01 is closely related to the reference sequence (M74904.1, NC_003616.1), with which it shares a more recent origin (bootstrap value > 90%), and is less related to the USA WA basil isolate (KX790322.1) (bootstrap value > 90%) (Fig. 1c), the phylogenetic study of the S segment (Fig. 2c) revealed that UP01 is more closely related to the USA WA basil isolate than to the reference sequence. For the first time, this observation calls into question the practice of combining, in a reference genome, sequences that superficially seem to belong to the same isolate but that could belong to distinct clades when analyzed with a larger number of sequences. An alternative explanation for our result is a reassortment event between isolates from different geographic regions that led to the emergence of the reference genome. Interestingly, the USA CA lettuce isolate SV-L1 (NSs KF745142.1 and N KF745140.1), which was isolated from an INSV outbreak in Coastal California, clustered with other US isolates when its NSs (Fig. 2b) and N ORFs (Fig. 2a) were analyzed, but its NSm sequence (KF745141.1) grouped with the Asian clade with high bootstrap support (Fig. 1b), indicating a possible reassortment or recombination event.
Unfortunately, the Gn/Gc sequences for these isolates are not available to support these hypotheses.
The phylogenetic analysis of the N protein (Fig. 2a) is the one for which the most sequences are available, and it highlights how a large number of sequences can better resolve the INSV phylogeny and be epidemiologically informative. In fact, in the case of the INSV sequences reported from a recent outbreak in lettuce in Coastal California [31], the phylogeny shows that all lettuce strains responsible for the outbreak were identical or highly related, but that they differed from isolates found in the surrounding weeds and crops.
The analysis of putative reassortment/recombination events suggests that a recombination event involving UP01 might have happened. As mentioned above, phylogenetic analysis also supports the predicted recombination event (Fig. 3) and further confirms its occurrence. Reassortment is also biologically important because it could result in new resistance-breaking strains [45,53,54] or in the emergence of new viruses [55].

Limitations
Additional complete genome sequences from the INSV outbreak in Coastal California would be needed to confirm reassortment and recombination events between INSV isolates.