High-Throughput Sequencing and the Viromic Study of Grapevine Leaves: From the Detection of Grapevine-Infecting Viruses to the Description of a New Environmental Tymovirales Member

In the past decade, high-throughput sequencing (HTS) has had a major impact on virus diversity studies as well as on diagnosis, providing an unbiased and more comprehensive view of the virome of a wide range of organisms. Unlike serological and molecular-based methods, with their more “reductionist” view focusing on one or a few known agents, HTS-based approaches are able to give a “holistic snapshot” of the complex phytobiome of a sample of interest. In grapevine for example, HTS is powerful enough to allow for the assembly of complete genomes of the various known or novel viral species or variants infecting a sample. In the present study, a total RNAseq-based approach was used to determine the full genome sequences of various grapevine fanleaf virus (GFLV) isolates and to analyze the possible presence of other viral agents. From four RNAseq datasets, a few complete grapevine-infecting virus and viroid genomes were de novo assembled: three GFLV genomes, 11 grapevine rupestris stem pitting-associated virus (GRSPaV) genomes and six viroids. In addition, a novel viral genome was detected in all four datasets, consisting of a single-stranded, positive-sense RNA molecule of 6033 nucleotides. This genome displays an organization similar to Tymoviridae family members in the Tymovirales order. Nonetheless, the new virus shows enough differences to be considered as a new species defining a new genus. Detection of this new agent in the original grapevines proved very erratic and was only consistent at the end of the growing season. This virus was never detected in the spring period, raising the possibility that it might not be a grapevine-infecting virus, but rather a virus infecting a grapevine-associated organism that may be transiently present on grapevine samples at some periods of the year. Indeed, the Tymoviridae family comprises isometric viruses infecting a wide range of hosts in different kingdoms (Plantae, Fungi, and Animalia).
The present work highlights the fact that even though HTS technologies produce invaluable data for the description of the sanitary status of a plant, in-depth biological studies are necessary before assigning a new virus to a particular host in such metagenomic approaches.

INTRODUCTION
Grapevine is one of the oldest domesticated crops and has been cultivated for more than seven millennia in a wide range of geographical areas (McGovern, 2003). To date, over 70 viruses and five viroids have been identified as infecting grapevine (Martelli, 2017), making it the crop affected by the largest number of viral agents so far. While most plant viruses have probably co-existed with their hosts since before domestication, others likely represent novel pathogen-host interactions. Many grapevine-infecting viruses or viroids have been detected in all grapevine-growing regions within the last decade (Al Rwahnih et al., 2009, 2012, 2016; Navarro et al., 2009; Coetzee et al., 2010; Zhang et al., 2011, 2014; Giampetruzzi et al., 2012; Poojari et al., 2013; Beuve et al., 2015; Jo et al., 2017a,b; Silva et al., 2017; Blouin et al., 2018a,b; Candresse et al., 2018; Diaz-Lara et al., 2018), which is probably due to a combination of factors such as: (i) vegetative multiplication and international trade, (ii) newer and wider areas of cultivation associated with additional and different viral reservoir pools leading to potential spill-over (Perry et al., 2016), (iii) climate change awakening latent viruses (Jones, 2015), (iv) the greater amount of research being conducted on such a high-value crop, and (v) the use of the newest deep-sequencing technology (HTS, high-throughput sequencing) serving as a very sensitive diagnostic tool (Adams et al., 2009; Candresse et al., 2014).
In the last decade with the advent of HTS technologies, many microorganisms and their complex interactions within an ecosystem have been minutely described (Poinar et al., 2006;Kristensen et al., 2010;Qin et al., 2010;Suen et al., 2010). These new insights on the complex connection between microbial communities and their hosts contributed to a better description of the tree of life (Hug et al., 2016), but also to the elaboration of new concepts (Roossinck, 2011;Vayssier-Taussat et al., 2014;Cadwell, 2015) and the remodeling of old theories into new ones. In the pathology field, the most prominent advance would likely be the adaptation of Koch's original postulates, morphing from the simplistic "1 pathogen = 1 disease" equation to considering microbial interactions and their adaptation dynamics in order for the host to develop a disease, sometimes referred to as the "pathobiome" concept (Stecher et al., 2012;Byrd and Segre, 2016). However, after this first descriptive step, more work is needed in the field of etiology and functional genomics, in order to better understand the interactions between the microbiota, the pathogens and the host that might trigger the expression of disease and to precisely understand to which agents the pathobiome concept is most relevant.
In plant virology, a wide range of HTS approaches have been developed, leading to the detection of well-known viruses but also to the discovery of a wide range of novel plant viruses, greatly enriching our vision of the "virome" or "epigenome," which refers to the exhaustive collection of nucleic acids that constitute the viral community sensu lato associated with a particular host or ecosystem. In order to take into account viral diversity, many extraction methods have been perfected, focusing on the viral genetic make-up at different stages of the viral cycle: (i) total RNA or total DNA, with or without specific enrichment steps (Dayaram et al., 2012; Beuve et al., 2018), (ii) double-stranded RNA, which targets viruses with dsRNA genomes as well as RNA viruses and viroids during their replication step (Coetzee et al., 2010; Blouin et al., 2016; Beuve et al., 2018), (iii) vsiRNA (viral small-interfering RNA) derived from the adaptive antiviral plant defense mechanism (Donaire et al., 2009; Kreuze et al., 2009), and (iv) encapsidated nucleic acid using the VANA (Virion-Associated Nucleic Acid) approach (Filloux et al., 2015). While each method may have some drawbacks (e.g., highly purified, DNAse-treated dsRNA is not efficient for the detection of DNA viruses, and non-encapsidated viruses or viruses with unstable particles are missed by VANA), the large panel of HTS approaches available provides the investigator or diagnostician with the opportunity to fine-tune technical options to meet specific objectives. HTS approaches have also been used at the ecosystem level in ecogenomic and metageogenomic studies (Roossinck et al., 2010; Bernardo et al., 2017). These large-scale studies assess the spatial and temporal distribution of plant virus populations within specific ecosystems, helping to decipher key components of viral evolution and disease emergence that shape wild and cultivated habitats in important agro-ecological interfaces (Alexander et al., 2014).
In addition, some of the HTS-based approaches allow for full genome assembly, facilitating genome-wide studies (De Souza et al., 2017; Hily et al., 2018a; Muller et al., 2018), which were rarely feasible during the Sanger-sequencing era. Finally, RNAseq-based techniques can provide comprehensive transcriptomic analyses enabling the evaluation of gene expression between different phenological stages or varieties (Zenoni et al., 2010; Massonnet et al., 2017), the monitoring of responses to environmental constraints (such as drought or temperature) (Haider et al., 2017; Londo et al., 2018) or the study of plant responses to specific infectious agents (Gambino et al., 2012; Blanco-Ulate et al., 2015). The converse is also possible, with the study of the impact of cultivating a particular genotype on its natural environment (e.g., genetically modified organism risk assessment) (Hily et al., 2018b).
The Tymovirales order was first established in 2004 and was confirmed in 2009 to group five families: the Alpha-, Beta-, Delta-, and Gammaflexiviridae, and the Tymoviridae (https://talk.ictvonline.org/taxonomy/p/taxonomy-history?taxnode_id=20172171, last visited 05/2018). Many viruses infecting grapevine have been either confirmed or identified via HTS techniques, at least five of which [grapevine asteroid mosaic-associated virus (GAMaV), grapevine rupestris vein feathering virus (GRVFV), grapevine Syrah virus-1 (GSyV-1), grapevine fleck virus (GFkV), and grapevine red-globe virus (GRGV)] are part of the Tymoviridae family. While Tymoviridae, similar to the Tymovirales order in general, is predominantly a plant-infecting virus family (Edwards et al., 1997; Martelli et al., 2002; Elbeaino et al., 2011; Agindotan et al., 2012; Dutta et al., 2014), several of its members come from a wider range of hosts, such as insects (Wang et al., 2012; de Miranda et al., 2015) and, more recently, the pathogenic fungus Fusarium graminearum (Li et al., 2016), the causal agent of Fusarium head blight of cereals. The Tymoviridae family consists of around 30 assigned virus species separated into three genera, Tymovirus, Marafivirus, and Maculavirus, while about 20 more virus species remain unassigned within the family/order (Table S1). Members of this family have many common characteristics (Martelli et al., 2002; King et al., 2012), such as: (i) non-enveloped isometric virions of about 30 nm (the only family within the Tymovirales with such spherical particles, while the rest of this viral order is composed of "flexiviruses"), (ii) a mono-partite, positive, single-stranded RNA genome (6.0-7.5 kb in length), (iii) a generally high cytosine content (32-50% range), and (iv) a genomic RNA with a 5′ cap and a 3′-end with either a tRNA-like structure (Tymoviruses) or a polyA stretch (Marafiviruses and Maculaviruses).
Other than members of the genus Maculavirus and the unassigned mycovirus Fusarium graminearum mycotymovirus 1 (FgMTV1) (Li et al., 2016), all other Tymoviridae present a highly conserved 16-nt sequence located at the end of the large coding sequence for the replication-associated polyprotein. Known as the "tymobox" or the "marafibox" (Ding et al., 1990;Izadpanah et al., 2002), it is believed to be an important element for the expression of a subgenomic messenger RNA for subsequent translation.
Here, using an RNAseq approach, our goal was to better characterize the sanitary status of Gewurztraminer scions grafted onto Kober 5BB rootstocks mono-infected with specific grapevine fanleaf virus (GFLV) isolates. Other than the near ubiquitous grapevine rupestris stem pitting-associated virus (GRSPaV) and viroids, we confirmed the presence of GFLV in the inoculated vines. Surprisingly, a thorough analysis of the RNAseq datasets revealed the presence of a new virus. Phylogenetic analyses showed this new virus to cluster within the Tymovirales order and to display some but not all of the hallmarks of the Tymoviridae family. The sequence of this virus, present in the grapevine environment, is divergent enough to consider it as a new species typifying a new genus. The new virus is tentatively named grapevine-associated tymo-like virus (GaTLV) and the new genus tentatively named Gratylivirus (GRApevine TYmo-LIke).

Plant Material and Conditions
Grapevine material used in this study came from a virus "core-collection" maintained in an open field by the Institut National de la Recherche Agronomique (INRA) in its Colmar research center (48.064457 lat., 7.334899 long.) (Table S2). Virus sources were isolated from GFLV-infected grapevine and biologically cloned via multiple passages on herbaceous hosts (Legin et al., 1993; Vigne et al., 2005, 2015; Hemmer et al., 2018; Hily et al., 2018a). Viral isolates were inoculated to Kober 5BB (Vitis berlandieri X V. riparia, clone 259) rootstocks using a heterologous grafting technique. Finally, certified virus-free Vitis vinifera cv. Gewurztraminer scions (clone 643 from the INRA Colmar collection) were grafted onto healthy and GFLV-infected rootstocks for further study (Vigne et al., 2015).
For the epidemiological study, all details about the samples used are presented in Table 1. Samples were flash frozen in liquid nitrogen and kept at −80 °C prior to analysis.

Fungi Isolation and Cultivation
Leaves from field-grown grapevine plants that tested positive for the new virus in September 2017 were dipped into Nanopure™ water (Thermo Fisher Scientific, Waltham, MA, USA) and slowly shaken for 15 min. A few microliters of the solution or of a 1/100th dilution were spread onto three different selective media, Peptone Yeast Extract Agar, Potato Dextrose Agar, and Dichloran Rose Bengal Agar (all from Sigma-Aldrich, St Louis, MO, USA) and the plates incubated at room temperature for a few days. Fungal mycelia were isolated and maintained for further studies such as testing for the presence of the new virus or fungal ITS barcoding (White et al., 1990; Gardes and Bruns, 1993). Plasmopara viticola (downy mildew), Erysiphe necator (powdery mildew), and Guignardia bidwellii (black rot) isolated from the same open-field trial and maintained in laboratory conditions were kindly provided by Sabine Wiedemann-Merdinoglu and were similarly tested. Briefly, mycelia were recovered and ground with mortar and pestle after addition of Fontainebleau sand (MERCK eurolab, Briare le canal, FR) and 400 µL of Nanopure™ water. The mixture was then placed at 95 °C for 15 min and then kept at −80 °C prior to RNA and DNA extraction (see below).

Mechanical Transmission Attempts
Attempts to propagate the new virus by mechanical inoculation were carried out using two grapevine infected leaf tissue samples (EVC53 and EVC60). Freshly collected leaves were ground at a 1:5 ratio [wt:vol] in 5 ml of a modified Sorensen's phosphate buffer (35 mM Na2HPO4, 15 mM KH2PO4, pH 7.2) without or

HTS Data Analyses
Analyses of the datasets were performed using the CLC Genomics Workbench 8.5.1 software (CLC bio Genomics, Aarhus, Denmark). After the trimming and quality check procedure, only reads longer than 70 nucleotides (nt) were kept (see Table S3).
The sanitary status of the grapevine samples was assessed by mapping reads onto a curated collection of grapevine virus reference sequences (Martelli, 2017) as previously described (Hily et al., 2018b). A relaxed mapping stringency (0.5 read length/0.7 similarity) was used in order to take into account genome diversity within each virus species. RPKM values, expressing the abundance of viral reads in each sample, were calculated taking into account the number of reads mapping to each reference virus, its length, and the total number of reads from the sample. In parallel, de novo assembly was performed after removal of reads that mapped onto the Vitis vinifera genome (http://www.plantgdb.org/XGDB/phplib/download.php?GDB=Vv, Genoscope 12x, last visited 05/2018). Contigs were then tested against GenBank reference sequences using BlastN/BlastX (http://blast.ncbi.nlm.nih.gov/Blast.cgi, last visited 05/2018).
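The RPKM calculation described above can be sketched as follows. This is a minimal illustration, assuming the conventional definition (reads per kilobase of reference per million reads) with the total number of reads from the sample as the denominator; the function name and the example numbers are hypothetical and are not taken from Table 2.

```python
def rpkm(mapped_reads: int, reference_length_nt: int, total_reads: int) -> float:
    """Reads per kilobase of reference per million reads in the sample.

    Assumption: the normalizing read count is the total number of
    cleaned reads from the sample, as stated in the text.
    """
    if reference_length_nt <= 0 or total_reads <= 0:
        raise ValueError("reference length and total read count must be positive")
    kilobases = reference_length_nt / 1_000
    millions_of_reads = total_reads / 1_000_000
    return mapped_reads / (kilobases * millions_of_reads)


# Hypothetical numbers, for illustration only.
example = rpkm(mapped_reads=1_000, reference_length_nt=6_033, total_reads=10_000_000)
print(f"RPKM = {example:.2f}")
```

Because the value is normalized by both reference length and library size, it allows read abundance to be compared between references of different lengths and between samples sequenced at different depths.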

In silico Sequence Analysis and Statistical Analyses
Nucleic acid and deduced amino acid products were analyzed using CLC Genomics Workbench 8.5.1. The GaTLV nucleotide and deduced protein sequences were compared with other viral sequences from GenBank and EMBL databases using the FASTA (Pearson and Lipman, 1988) and BLAST (Altschul et al., 1990) programs. Identity and similarity percentages were obtained using the mean length of sequences and the BLOSUM62 matrix with a gap cost of 10 and a gap extension cost of 0.5. Alignments and tentative maximum likelihood (ML)-based phylogenetic trees of amino acid sequences were generated using the MUSCLE (Edgar, 2004) and MEGA7 (Kumar et al., 2016) software. The best ML-fitted model for each sequence alignment was used and bootstrapping analyses of 100 replicates were performed. Trees were visualized using iTOL (Letunic and Bork, 2016).
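As an illustration of an identity percentage normalized by mean sequence length, the sketch below counts identical aligned positions in a pairwise alignment and divides by the mean ungapped length of the two sequences. This is only one possible reading of "using mean length of sequences"; the function name and the toy alignment are hypothetical, and the sketch does not reproduce the scoring performed by the software used in the study.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences, normalized by mean ungapped length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Count columns where both sequences carry the same (non-gap) residue.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    length_a = len(aligned_a.replace(gap, ""))
    length_b = len(aligned_b.replace(gap, ""))
    mean_length = (length_a + length_b) / 2
    if mean_length == 0:
        raise ValueError("sequences contain no residues")
    return 100.0 * matches / mean_length


# Toy alignment (hypothetical): 4 identical columns, ungapped lengths 5 and 6.
print(round(percent_identity("ACDE-F", "ACDQGF"), 1))
```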

Molecular Analyses
A 3′/5′ RACE kit (SMARTer RACE, Clontech Lab., CA, USA) was used for the amplification and confirmation of the viral genome termini, as per manufacturer's recommendations. The virus-specific primers, each including at its 5′ end a 15 bp overlap sequence (GATTACGCCAAGCTT) for cloning purposes, GSP fwd (5′-GATTACGCCAAGCTTGTCAACGGGTTATTTGATGGCGGAGGGTG) and GSP rev (5′-GATTACGCCAAGCTTCGCGGTACCAAACGTTCACGCTCACC), were used along with oligo UPM (Clontech Lab.) to amplify the genome 3′ and 5′ ends, respectively. Resulting PCR products were Sanger-sequenced, confirming already known sequences and allowing termini to be resolved.
Amplification of the putative CP coding sequence was performed using primers containing the attB1 and attB2 sequences upstream and downstream of the coding region, respectively (the start codon ATG is carried by the forward primer and the stop codon, as its reverse complement TCA, by the reverse primer): fwd 5′-GGGGACAAGTTTGTACAAAAAAGCAGGCTTCATGTCTGAGATTACACCCGTGC and rev 5′-GGGGACCACTTTGTACAAGAAAGCTGGGTCTCAAGCAAAAACAATATCGTAACCAT. PCR products were then cloned by Gateway® recombination, following the manufacturer's protocol (Invitrogen, Carlsbad, USA), successively into the pDONR/Zeo donor vector and the pEAQ-HT-Dest1 binary plasmid (Peyret and Lomonossoff, 2013). The resulting plasmid was then used for attempting VLP production (Belval et al., 2016). A plasmid containing GFP was used as positive control. The same primers, but without the attB sequences, were used for viral detection.
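The relationship between the cloning primers and the detection primers can be checked with a short script. The sketch below removes the attB1/attB2 adapters from the two primers given above and verifies that the remaining forward sequence begins with a start codon and that the reverse complement of the remaining reverse sequence ends with a stop codon. The adapter constants are the standard Gateway attB1/attB2 primer extensions; the spacer lengths (2 nt and 1 nt) and all function names are assumptions made for this illustration, since the original typographic marking of the adapters is not preserved here.

```python
ATTB1 = "GGGGACAAGTTTGTACAAAAAAGCAGGCT"
ATTB2 = "GGGGACCACTTTGTACAAGAAAGCTGGGT"

FWD = "GGGGACAAGTTTGTACAAAAAAGCAGGCTTCATGTCTGAGATTACACCCGTGC"
REV = "GGGGACCACTTTGTACAAGAAAGCTGGGTCTCAAGCAAAAACAATATCGTAACCAT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def strip_adapter(primer: str, adapter: str, spacer_nt: int) -> str:
    """Remove a 5' adapter plus a short spacer (assumed length) from a primer."""
    if not primer.startswith(adapter):
        raise ValueError("primer does not start with the expected adapter")
    return primer[len(adapter) + spacer_nt:]


# Assumed spacers: 'TC' after attB1 and 'C' after attB2.
fwd_specific = strip_adapter(FWD, ATTB1, spacer_nt=2)
rev_specific = strip_adapter(REV, ATTB2, spacer_nt=1)

print(fwd_specific)                      # virus-specific forward part
print(reverse_complement(rev_specific))  # sense-strand view of the reverse part
```

Viewing the reverse primer as its reverse complement makes the stop codon visible in the sense orientation, which is a convenient sanity check before ordering or reusing primers.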
For fungal identification, a barcoding analysis based on the internal transcribed spacer (ITS) region was performed, which consisted in amplifying a genomic region with primers ITS1F (CTTGGTCATTTAGAGGAAGTAA) and ITS4 (TCCTCCGCTTATTGATATGC) followed by Sanger sequencing.

Analysis of the Full Sanitary Status of GFLV-Infected Plants and Initial Characterization of a Novel Virus
RNAseq datasets obtained from a collection of Gewurztraminer grapevines singly infected by GFLV isolates were first analyzed by directly mapping total cleaned reads (Table S3) onto a curated collection of grapevine virus reference sequences. Of the four plants tested, the EVC53 grapevine was used as a negative control in our laboratory, as it was not inoculated with GFLV. Its sanitary status was confirmed by HTS (Table 2), since no viruses or viroids were detected, other than the three near ubiquitous agents of grapevine: grapevine rupestris stem pitting-associated virus (GRSPaV), hop stunt viroid (HSVd) and grapevine yellow speckle viroid 1 (GYSVd1). On the other hand, in addition to GRSPaV, HSVd and GYSVd1, all three other tested grapevine samples (EVC42, EVC56, and EVC60) displayed, as expected, reads mapping on GFLV RNA1 and RNA2 sequences. EVC56 was the only sample to exhibit reads corresponding to the satellite RNA3 sequence (Table 2). No other grapevine-infecting viruses/viroids were detected in any of the tested plants using this "direct-mapping" approach (data not shown).
To confirm this initial analysis, a de novo assembly was performed, allowing for the reconstruction of contigs and scaffolds. In this way, near-complete genomes were obtained for each of the viruses/viroids infecting the four samples tested, consisting of three new GFLV-RNA1 sequences, four GFLV-RNA2, one GFLV-RNA3, 11 GRSPaV as well as four HSVd and three GYSVd1 viroid sequences (Table 2, Table S2). For each sample, after removal of sequences corresponding to known viruses and viroids, the remaining contigs were annotated using BlastN/BlastX. Most contigs corresponded to the grapevine transcriptome and were set aside. A few remaining contigs from the different samples, displaying a high percentage of identity with each other, were of interest, the largest of them reaching 4,900 nt. Using BlastN, this contig did not reveal any significant homology with any other GenBank sequence. However, at the amino acid level and using BlastX, the predicted encoded protein revealed homologies with members of the Tymoviridae family, with the presence of conserved domains of a viral methyltransferase (MTR), a peptidase (PRO), a helicase (Hel), and a polymerase (RdRp) with very strong e-values (1e−8 or lower) but with only low amino acid sequence identity levels, close to 30%. The sequence was further manually extended by several rounds of read mapping, until no more reads mapped against it. This yielded a continuous contig of 6,030 nt (including a stretch of seven adenines ending the sequence at its 3′ end). To confirm terminal sequences, 3′- and 5′-RACE reactions were performed. The complete genome sequence of the new virus was thus established to be 6,033 nt (without the polyA tail), indicating that only nine bases at the 5′-end had been missing from the original completed assembly. For all tested samples, final read counts mapping onto this new complete virus genome are shown in Table 2.
The virus was detected in all tested samples, but with a very large variation in representation, with average sequencing depth varying between 14.6X (EVC53) and 291X (EVC60) (Table 2). No other viral contigs were identified in any of the tested samples.
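For readers unfamiliar with the notion, average sequencing depth is commonly estimated as the total number of mapped bases divided by the genome length. The sketch below illustrates this under the assumption of a uniform mean read length; the function name and the numbers are hypothetical and do not correspond to the values in Table 2.

```python
def mean_depth(mapped_reads: int, mean_read_length_nt: float, genome_length_nt: int) -> float:
    """Average depth of coverage: mapped bases divided by genome length."""
    if genome_length_nt <= 0:
        raise ValueError("genome length must be positive")
    return mapped_reads * mean_read_length_nt / genome_length_nt


# Hypothetical example: 1,000 reads of 100 nt mapped onto a 6,033 nt genome.
print(f"{mean_depth(1_000, 100, 6_033):.1f}X")
```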

Genome Organization and Phylogenetic Association of the New Virus With the Tymovirales Order
Analyses of the new virus genome organization revealed two open reading frames (ORFs) (Figure 1A). The first ORF encodes a large replication-associated polypeptide of 1,790 amino acids (p203), which accounts for the majority of the coding capacity of the genome. Functional domains were detected using BlastP and are conserved with viruses of the alpha-like superfamily of positive-strand RNA viruses (locations and e-values are shown in Figure 1A). For example, the Hel domain (aa 951-1177) contains a sequence (residues 953-960, GYPGCGKT) analogous to the Hel motif I [GxxGxGK(T/S)] of many viral NTP-binding proteins, well conserved within the Tymovirales order. The second ORF does not display any similarity with any GenBank sequence at either the nucleic acid or the amino acid level, and no putative conserved domains could be detected in ORF2. More elements on this second ORF are detailed in the next section.
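The helicase motif I consensus quoted above, GxxGxGK(T/S), translates directly into a regular expression. The sketch below is an illustration only: it searches a protein string for the motif and reports 1-based coordinates. The short test fragment embeds the reported GYPGCGKT sequence between hypothetical flanking residues and is not the actual p203 sequence.

```python
import re

# GxxGxGK(T/S): 'x' is any residue, the last position is T or S.
HEL_MOTIF_I = re.compile(r"G..G.GK[TS]")


def find_motif(protein: str, pattern: re.Pattern = HEL_MOTIF_I) -> list:
    """Return (start, end, match) tuples with 1-based inclusive coordinates.

    Note: re.finditer reports non-overlapping matches only.
    """
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(protein)]


# Hypothetical fragment: flanking residues are invented for illustration.
fragment = "MAGYPGCGKTLL"
print(find_motif(fragment))
```

Applied to a full-length polyprotein, the same function would return the residue range of the motif, which can then be compared with the domain boundaries reported by BlastP.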
Phylogenetic analyses were performed on a multiple alignment of the ORF1-encoded proteins of Tymovirales members (Figure 2A, Figure S1). While the new virus replication-associated protein (REP) clustered within the Tymovirales order, it could not be assigned to a particular family as it clustered away from members of the various families that constitute this order. Similar analyses performed using only the most conserved regions of the REP proteins (MTR, Hel, and RdRp; Figure 2B, Figures S2A,B, respectively) confirmed the affinities of the new virus with the Tymovirales, but again without providing a definitive indication as to its affinities within this order, although the closest associations tended to be with Tymoviridae members.
While the new viral sequence could not definitely be assigned to a particular family within the Tymovirales using the REP protein phylogenetic analyses, the virus shares many properties with the Tymoviridae family. (i) The genome organization and order of the conserved domains along the large ORF1 are most consistent with those of Tymoviridae members (Figure 1). (ii) As indicated above, closest affinities for all REP conserved motifs are with Tymoviridae family members (Figure 1A), with all genera (Tymovirus, Marafivirus, and Maculavirus) being identified and represented. For example, the RdRp motifs are well conserved with those of other members of the Tymoviridae, especially REP motif IV (residues 1517-1530, ANDYTSFDQSQTGE) and REP motif VI (residues 1605-1609, VSGDD). This was confirmed by looking at amino acid identity levels of all conserved domains, with the Tymoviridae percentages being generally higher than with any other Tymovirales (Table S4). (iii) The genome size is in the low range but typical of the family (6.0-7.5 kb) (Martelli et al., 2002). (iv) The absence of the AlkB conserved domain or of a Triple Gene Block (TGB) or 30K-like movement protein module is also typical of the Tymoviridae as opposed to the other families comprised in the Tymovirales.
However, some typical features of the Tymoviridae family are not observed in the new virus. For example, the "tymobox" or "marafibox," a very distinctive 16 nt region thought to control subgenomic RNA synthesis and present in most Tymoviridae (except for two members, GFkV and FgMTV1), was not identified in the genome of the new virus. In addition, unlike other Tymoviridae, for which an unusually high cytosine (C) content (32-50%) is observed, the new virus sequence exhibited a strongly unbalanced C content but in the opposite direction, with only 15.2% (adenine: 27.7%, thymine: 32.2%, and guanine: 24.9%).

Table 2. Number of reads (in bold) and RPKM (Reads Per Kilobase per Million reads mapped to the reference, in italic) for each grapevine virus and viroid found in the four samples analyzed. Genome: corresponds to the number of complete (to near complete) genomes assembled de novo. GFLV, grapevine fanleaf virus; GRSPaV, grapevine rupestris stem pitting-associated virus; GaTLV, grapevine-associated tymo-like virus; HSVd, hop stunt viroid; GYSVd1, grapevine yellow speckle viroid-1. Reads were mapped on a set of reference viruses previously described as infecting Vitis vinifera. For mapping, a low stringency was used with parameters set to 0.5 for read length and 0.7 for similarity. Size of the sequences tested is noted.
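The base composition comparison made in this section (15.2% cytosine for the new virus vs. 32-50% for typical Tymoviridae) relies on a simple per-base count. A minimal sketch is given below; the function name and the toy sequence are hypothetical, and uracil is counted as thymine on the assumption that the sequence may be given in either RNA or cDNA notation.

```python
from collections import Counter


def base_composition(sequence: str) -> dict:
    """Percentage of A, C, G and T in a nucleotide sequence (U counted as T)."""
    normalized = sequence.upper().replace("U", "T")
    counts = Counter(base for base in normalized if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T/U bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


# Toy sequence, for illustration only.
print(base_composition("AACGUU"))

# Consistency check on the percentages reported in the text.
reported = {"C": 15.2, "A": 27.7, "T": 32.2, "G": 24.9}
print(round(sum(reported.values()), 1))
```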

Characterization of the Putative Capsid Protein
While no sequence similarities were found using the BLASTN/X/P tools, the secondary structure as well as residue accessibility of the protein encoded by this second ORF were predicted. In silico modeling of this protein was performed via the I-TASSER suite website and models were then compared to the Protein Data Bank (PDB) library (data not shown). While partial sequence identities were fairly low (<18%), the best model hits corresponded to the capsid proteins of a few Tymoviruses [e.g., desmodium yellow mottle tymovirus (DYMV), turnip mosaic virus (TuMV), and physalis mottle virus (PhMV)] (Krishna et al., 1999; Larson et al., 2000, 2005) and a Sobemovirus (sesbania mosaic virus, SeMV) (Bhuvaneshwari et al., 1995). The TM-alignment program was then used to match proteins at the structure level in order to pinpoint and further identify a potential ORF2 protein function. All nine PDB hits corresponded to viral capsid proteins, with the three best hits corresponding to Tymoviridae CPs (TM-scores of 0.826, 0.821, and 0.817, for DYMV, TuMV, and PhMV capsid protein element structures, respectively, Figure S3) and the last one to the hepatitis E virus (HEV) (TM-score of 0.712). To try to validate ORF2 as the coding sequence for the capsid protein, the complete ORF2 sequence (from start to stop codon included) was cloned in a pEAQ-based expression vector in order to produce the protein in planta, in the hope of generating VLPs as previously described (Belval et al., 2016). Although the plasmid sequence was confirmed by Sanger sequencing and a GFP-positive plasmid control expressed the fluorescent protein, no VLPs could be observed (data not shown).

Epidemiology and Potential Host of the New Virus
To better decipher the origin and the biological significance of the new virus, mechanical inoculations of crude leaf extracts were undertaken. None of the 80 plants tested (consisting of 20 plants each of wild-type N. benthamiana, C. quinoa, N. benthamiana B2, and NbDCLx, the latter two accessions being highly susceptible to viruses and known to greatly promote virus multiplication) displayed any symptoms or tested positive for the virus by RT-PCR.
To further study the distribution, the prevalence and the diversity of the virus, more than 70 samples were screened for its presence via RT-PCR. All leaf samples used in this study are presented in Table 1. First, we tested 26 samples from the grapevine "core-collection" maintained in an experimental plot at INRA-Colmar. Those samples were chosen since they covered a wide range of grapevine varieties, various phenological growth stages and different sampling dates and years. Surprisingly, more than half of the samples tested positive for the presence of the virus (Table 1, "epidemiology linked to timing" section). Interestingly, all positive samples seemed to have been collected at the end of the growing season (September or later). RT-PCR-based amplicons were then Sanger sequenced. Over a span of 404 nt within ORF2, a maximum of only six single nucleotide polymorphisms (SNPs) was detected, showing a high level of sequence conservation (>98.5%) between isolates. Testing of further samples from different origins confirmed the seasonal detection of the new virus, which was detected as early as mid-summer in either abandoned vineyards (Turckheim, Alsace, Fr) or lightly treated plots (Wintzenheim, Alsace, Fr) but only from September on for the rest of the samples (Table 1, "timing confirmation" and "open field vs. greenhouse" sections). Remarkably, among samples collected in September, the detection rate was higher in samples collected from the open field (100%) than in samples maintained under greenhouse conditions (8%). All aforementioned results seemed to imply a relatively loose association of the virus with grapevine, possibly reflecting an "environmental" component to the presence of the virus in the tested grapevine samples. So far, the virus was found to be present in areas surrounding the INRA Colmar research station, and, more broadly, along the "Route des vins d'Alsace" (Table 1, "epidemiology" section).
However, the virus was not detected in a few samples collected from the Cognac region. In addition, we never found any reads corresponding to its sequence in any of our other grapevine HTS datasets, corresponding to about 120 samples that span areas such as the Alsace, Champagne and Chablis regions (data not shown). Nonetheless, it should be considered that most of these samples had been collected in the spring/early summer when the titer of grapevine fanleaf virus is at its peak. Interestingly, the new virus was, however, recently identified, in the frame of a study supported by the BIVB (Interprofessional Office of Burgundy Wines), in seven out of eight samples collected in early autumn in four plots of Pinot Noir and Chardonnay in Burgundy, France (Table 1, "epidemiology" section). Six complete (to near complete) genome sequences could be assembled from these samples and showed a maximum of 22 mutations (six of them coding) along the complete genome as compared to the Colmar isolate reference sequence (EVC60), corresponding to a minimum of 99.6% nucleotide identity. These results confirm both the temporal association of this new virus with grapevine and its low level of diversity.
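The two conservation figures quoted in this section follow from simple arithmetic on the number of differences and the length of the compared region. The short sketch below (the function name is hypothetical) reproduces them from the values given in the text: six SNPs over 404 nt, and 22 mutations over the 6,033 nt genome.

```python
def identity_from_differences(n_differences: int, length_nt: int) -> float:
    """Percent nucleotide identity given a count of differing positions."""
    if length_nt <= 0 or not 0 <= n_differences <= length_nt:
        raise ValueError("invalid difference count or length")
    return 100.0 * (1 - n_differences / length_nt)


orf2_amplicon = identity_from_differences(6, 404)     # RT-PCR amplicons within ORF2
full_genome = identity_from_differences(22, 6_033)    # Burgundy vs. Colmar (EVC60)

print(f"ORF2 amplicon: {orf2_amplicon:.2f}%")
print(f"Full genome:   {full_genome:.2f}%")
```

Both values are minima, since they are computed from the maximum number of differences observed between isolates.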
This loose association with grapevine was reinforced by experiments in which the surface of grapevine leaves was swept gently with a cotton swab from which total RNAs were then extracted and tested. While plant RNAs were not detected in this way (all samples tested negative in an RT-PCR assay for the grapevine glyceraldehyde 3-phosphate dehydrogenase gene), the virus was detected in two out of the three tested samples (Table 1, "superficial" section), suggesting that the virus might be a surface contaminant of grapevine leaves rather than a grapevine-infecting virus.
Since some Tymovirales, and more specifically Tymoviridae members, are known to infect fungi or insects, an attempt was made at correlating the titer of the new virus with the presence of contigs annotated as coming from insects or fungi in the original RNAseq datasets (Table 2). A correlation was found between the presence of the virus and insect-derived contigs, but it was not statistically supported (CC = 0.43, R² = 0.18, p-value = 0.571). Conversely, a statistically supported and very robust positive correlation was observed between the new viral sequence and the presence of contigs identified as originating from fungi (CC = 0.96, R² = 0.92, p-value = 0.039), strongly suggesting that this new virus could in fact be a novel mycovirus. Following this hypothesis, a total of 15 fungal species were isolated from grapevine leaves and berries at a time when the virus was detected (Table 1, "fungi" section). While all fungal isolates were readily identified using an ITS barcoding technique, none tested positive for the presence of the new virus by RT-PCR. During the same period, a few insect species were trapped and tested for the presence of the virus. As for fungi, our effort to attribute an insect host to this new virus was fruitless (data not shown).
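For readers wishing to reproduce this kind of correlation test, a minimal sketch is given below. It assumes that the correlation coefficient is a Pearson coefficient computed across the four RNAseq datasets (n = 4), which the text does not state explicitly. With n = 4 the test has two degrees of freedom, for which the Student t distribution has a closed form and the two-sided p-value reduces to 1 - |r|. The read counts are hypothetical placeholders, not the values of Table 2.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (at least 3 points)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def two_sided_p_four_points(r):
    """Two-sided p-value for H0: no correlation, valid ONLY for n = 4 (2 df).

    t = r * sqrt(n - 2) / sqrt(1 - r^2); for 2 df the t CDF is
    F(t) = 1/2 + t / (2 * sqrt(2 + t^2)), so p = 1 - |t| / sqrt(2 + t^2).
    """
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt(2.0) / math.sqrt(1.0 - r * r)
    return 1.0 - abs(t) / math.sqrt(2.0 + t * t)


# Coefficients reported in the text
print(f"fungi:   p = {two_sided_p_four_points(0.96):.3f}")  # 0.040
print(f"insects: p = {two_sided_p_four_points(0.43):.3f}")  # 0.570

# Hypothetical per-dataset counts (placeholders, NOT the study's data)
virus_reads = [120, 450, 900, 1500]
fungal_contigs = [15, 40, 95, 130]
r = pearson_r(virus_reads, fungal_contigs)
print(f"r = {r:.2f}, p = {two_sided_p_four_points(r):.3f}")
```

Under this assumption, the rounded coefficients give p-values of about 0.04 and 0.57, in line with the reported 0.039 and 0.571. It also shows that, with only four data points, a coefficient above 0.95 is needed to reach p < 0.05.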

DISCUSSION
With the dawn of HTS and the huge amounts of data produced from a single experiment, many fields of research have been significantly impacted. With the rise of HTS technology, new concepts have emerged and old theories and postulates have been modified and remodeled (e.g., Koch's original postulates) (Byrd and Segre, 2016). Among fields ranging from medicine, clinical diagnostics and microbiology to ecology, plant pathology (and more specifically plant virology) seems to have benefited the most from these new tools. Grapevine research also improved with HTS, with many new viruses being identified all over the world, belonging to different families such as Betaflexiviridae (Al Rwahnih et al., 2012; Giampetruzzi et al., 2012; Jo et al., 2017a,b; Blouin et al., 2018a,b; Candresse et al., 2018; Diaz-Lara et al., 2018), Caulimoviridae (Zhang et al., 2011), Luteoviridae (Silva et al., 2017), Secoviridae (Al Rwahnih et al., 2016), or Tymoviridae (Al Rwahnih et al., 2009; Beuve et al., 2015; Cretazzo and Velasco, 2017; Vargas-Asencio et al., 2017). HTS technologies have been confirmed to be a powerful diagnostic tool allowing for an exhaustive description of the viral species present in many grapevine samples (Coetzee et al., 2010; Al Rwahnih et al., 2011; Jo et al., 2015; Beuve et al., 2018). Depending on the chosen methodology, complete (to near complete) viral genomes can also be assembled, re-shaping the viral evolution field (Simmonds et al., 2017) with genome-wide studies made easily achievable (Hily et al., 2018a).
In the present study, our initial goal of better defining the sanitary status of grapevine plants, healthy or mono-infected with different GFLV isolates, was fulfilled since we obtained the complete genome sequences of the different GFLV isolates involved. All three RNA1, four RNA2, and one RNA3 complete sequences were submitted to GenBank (Table S2). This was performed using a dual strategy involving either direct mapping of reads against a collection of reference sequences of grapevine-infecting viruses and viroids or de novo assembly of reads followed by BlastN/BlastX annotation of contigs. As expected, all samples displayed reads corresponding to ubiquitous grapevine viral pathogens: grapevine rupestris stem pitting-associated virus (GRSPaV) as well as two viroids (HSVd and GYSVd1) (Table 2). From each sample, two to four genomes of GRSPaV were assembled, confirming that multiple GRSPaV variants can infect a single grapevine (Beuve et al., 2018). All 11 GRSPaV genomes thus obtained were part of a genome-wide diversity study of GRSPaV (Hily et al., 2018a). Altogether, this further supports the need to consider these viral or subviral agents as part of the grapevine "background" virome (Saldarelli et al., 2017). Surprisingly, the "healthy" grapevine (EVC53) displayed a few reads mapping onto GFLV references (Table 2). These reads were considered as a mild "intra-lane" contamination since all 72 reads together covered less than 30% of the complete RNA1 plus RNA2 GFLV genome (i.e., ≈11,000 nt). Such contamination is often encountered in HTS datasets and is discussed in a method article in this same issue (Vigne et al., 2004). No other grapevine-infecting viruses were detected using the "direct-mapping" method.
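The "less than 30%" criterion used above is a breadth-of-coverage measure: the fraction of reference positions covered by at least one mapped read. The sketch below shows one way to compute it from read alignment intervals by merging overlapping intervals. It is an illustration under stated assumptions and not the pipeline used in this study: the interval coordinates are hypothetical, they are taken as 0-based half-open positions on a concatenated RNA1 plus RNA2 reference, and the 11,000-nt length is the approximate value quoted in the text.

```python
def breadth_of_coverage(intervals, reference_length):
    """Fraction of reference positions covered by at least one interval.

    intervals: iterable of (start, end) pairs, 0-based and half-open,
    expressed on a single (here concatenated) reference sequence.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        # Clip alignments to the reference boundaries
        start, end = max(start, 0), min(end, reference_length)
        if end <= start:
            continue
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged block
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / reference_length


# Hypothetical read alignments (placeholders, not the EVC53 data)
reads = [(0, 150), (100, 250), (5000, 5150)]
breadth = breadth_of_coverage(reads, 11_000)
print(f"Breadth of coverage: {breadth:.1%}")
print("Likely contamination" if breadth < 0.30 else "Possible genuine infection")
```

The 30% cut-off in the last line simply mirrors the figure quoted in the text; it is not a validated threshold.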
Out of the thousands of contigs de novo assembled for each grapevine sample and after comparison against the NCBI database using BlastN/BlastX, some contigs showed a distant relationship (average aa identity close to 30%) to several viruses belonging to the Tymoviridae family and the Marafivirus genus in particular. Further steps ultimately yielded a genome of 6,033 nt (excluding the poly(A) tail). From this sequence, two ORFs were predicted. Phylogenetic analyses of the replicase-based polypeptide encoded by ORF1 placed this new virus within the Tymovirales order; however, no unambiguous assignment to a particular family could be attained, as the new virus REP clustered away from members of the five families currently defined in the Tymovirales. Although not biologically confirmed, ORF2 was computationally described as coding for the coat protein after modeling and comparison to the Protein Data Bank (PDB) library. The viral coat proteins to which distant homologies were identified in this way all form icosahedral particles, suggesting that the new virus could also have paraspherical particles, similar to members of the Tymoviridae family (Figure S3). This element could be added to the list of features shared with this family and outlined above. Yet, the very low C content (only 15.2%) and the REP phylogeny set the new virus apart from the Tymoviridae, while other features exclude it from the known genera within the family. For example, the presence of a 3′ poly(A) tail excludes it from the Tymovirus genus (Dreher, 2004), while the genome organization sets it apart from the Marafivirus and Maculavirus genera.
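The C content mentioned above is a simple base-composition statistic. As a minimal sketch, it can be computed as follows; the example sequence is a short hypothetical string and not the GaTLV genome, and ambiguous bases are ignored, which is an assumption made here for simplicity.

```python
from collections import Counter


def base_composition(sequence: str) -> dict:
    """Return the percentage of A, C, G and T in a nucleotide sequence.

    RNA input is accepted (U is treated as T); ambiguous bases such as N
    are ignored (an assumption made for this illustration).
    """
    seq = sequence.upper().replace("U", "T")
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous nucleotides")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


# Hypothetical 20-nt fragment (placeholder, not the real genome)
fragment = "AUGGAUUUAGCAUUGAAUCU"
composition = base_composition(fragment)
print(f"C content: {composition['C']:.1f}%")
```

Applied to the deposited genome sequence, the same calculation would be expected to return the C content reported in the text.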
Considering the originality of the features of this new agent, and the inability to unequivocally assign it at this time to any existing family in the Tymovirales order, the safest option seems (i) to tentatively name it 'grapevine-associated tymo-like virus' or GaTLV and (ii) to consider that it defines a new genus (provisionally named Gratylivirus) that will remain in the order but unassigned to a particular family for the time being. More viruses closely related to GaTLV need to be described before deciding whether the genus Gratylivirus should be included in the Tymoviridae or in a novel family within the Tymovirales. The sequence of GaTLV was deposited in GenBank under accession number MH383239.
Viruses belonging to the Tymovirales order are known to infect many different species covering different Kingdoms. They are mostly found in Plantae, but lately many have been described infecting Fungi as well as the class Insecta in the Animalia Kingdom (King et al., 2012; Li et al., 2016). To further characterize GaTLV and discover its putative host(s), we first tried to propagate the virus in herbaceous plants via mechanical inoculation, which was unfortunately not successful. Such failure to identify alternative herbaceous hosts does not rule out grapevine as the original host. Indeed, it has been previously reported that some Tymoviridae display a narrow host range (Dreher, 2004; Alabdullah et al., 2010), up to the extreme situation of having a single identified host [e.g., maize rayado fino virus (MRFV) restricted to corn (Nault et al., 1980) or GFkV to Vitis spp. (King et al., 2012)]. A study spanning a wide range of samples was performed by RT-PCR (Table 1). While half of the samples tested positive for the presence of GaTLV, the virus was detected only in samples collected at the end of the summer/early autumn season, which corresponds to a period when fungicide/pesticide treatments are generally discontinued. This connection with fungicide/pesticide applications is further emphasized by the comparison of the detection rates for samples collected in early autumn from open-field plots (all positive for the virus) and from greenhouses where fungicide treatments were still in use (less than 10% positive). In addition, a swab test indicated that the virus seems to be loosely present on the surface of grapevine leaves and/or berries. Taken together, all this evidence suggests that GaTLV is likely a surface contaminant on grapevine leaves and might therefore rather be a virus infecting insects or fungi.
The hypothesis that GaTLV is a mycovirus is reinforced by the strong, statistically supported positive correlation observed between the presence of GaTLV and that of fungi-derived contigs. Unfortunately, in an attempt to identify the potential host, none of the 15 isolated fungi tested positive for GaTLV, including some major grapevine pathogenic species such as Botrytis cinerea, Plasmopara viticola, Erysiphe necator, or Guignardia bidwellii.
From positive samples, a genetic diversity study was performed. When comparing all complete genomes from two different locations (Alsace and Burgundy), this virus displayed a very high identity percentage (>99.6%) along the genome. While the sample size might be too small to be certain, this lack of diversity could reflect either a slow-evolving virus, a virus that recently moved to a new host and has not had the time to accumulate substantial divergence, or a virus highly specialized to its host. Because none of the aforementioned experiments conclusively identified a host, only correlative results were accumulated, tentatively pinpointing GaTLV as a mycovirus belonging to the grapevine phytobiome. More experiments are needed in order to uncover the host of GaTLV, such as graft experiments that, if negative, would additionally support the notion that GaTLV is not a grapevine-infecting virus, or controlled fungicide treatments of grapevines that could lend support to the hypothesis that GaTLV may be a mycovirus.
This work highlights the fact that even though HTS technologies produce an invaluable sum of information describing the sanitary status of a plant, a careful etiological and epidemiological study is necessary before assigning a new virus to a host. Nonetheless, in this work, and as is often the case following HTS analysis, even after a careful scientific investigation it is still not possible to designate without any doubt the host of an infectious entity. Our study also confirms that the grapevine phytobiome is probably richer than anticipated, with the use of HTS allowing for the detection of not only grapevine pathogens but also the grapevine-associated ecosystem (Al Rwahnih et al., 2011; Espach et al., 2012).

DATA AVAILABILITY
All de novo assembled sequences have been submitted to GenBank (Table S2). The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.