Draft genome sequence for virulent and avirulent strains of Xanthomonas arboricola isolated from Prunus spp. in Spain

Xanthomonas arboricola is a species in the genus Xanthomonas, which is composed mainly of plant pathogens. Among the members of this taxon, X. arboricola pv. pruni, the causal agent of bacterial spot disease of stone fruits and almond, is distributed worldwide, although it is considered a quarantine pathogen in the European Union. Herein, we report the draft genome sequence, the classification, the annotation and the sequence analyses of a virulent strain, IVIA 2626.1, and an avirulent strain, CITA 44, of X. arboricola associated with Prunus spp. The draft genome sequence of IVIA 2626.1 consists of 5,027,671 bp, 4,720 protein coding genes and 50 RNA encoding genes. The draft genome sequence of strain CITA 44 consists of 4,760,482 bp, 4,250 protein coding genes and 56 RNA encoding genes. Initial comparative analyses reveal differences in the presence of structural and regulatory components of the type IV pilus, the type III secretion system and the type III effectors, as well as variations in the number of type IV secretion systems. The genome sequence data for these strains will facilitate the development of molecular diagnostic protocols that differentiate virulent and avirulent strains. In addition, comparative genome analysis will provide insights into the plant-pathogen interaction during the bacterial spot disease process.


Introduction
Xanthomonas arboricola [1] comprises plant-associated bacteria grouped into nine pathovars with a diverse range of biotic relationships [2,3]. Within this taxon, both plant pathogenic and non-pathogenic strains have been described. Bacterial spot of Prunus spp. (X. arboricola pv. pruni), bacterial blight of Juglans spp. (X. arboricola pv. juglandis) and Corylus spp. (X. arboricola pv. corylina) are among the most harmful diseases of these tree hosts. These bacterial diseases are distributed worldwide and the causal bacteria are regulated in several countries and in the European Union, where X. arboricola pv. pruni is a quarantine pathogen [4,5].
Within the pathovars, X. arboricola pv. pruni is a major threat to cultivated, exotic and ornamental Prunus species. This bacterium has been identified as a pathogen of P. armeniaca, P. avium, P. buergeriana, P. cerasus, P. crassipes, P. davidiana, P. domestica, P. donarium, P. dulcis, P. laurocerasus, P. mume, P. persica and P. salicina [6]. During the last decade, some local outbreaks of bacterial spot in Spain have been reported on almond, peach, nectarine and plum [7]. For initial characterization of the bacterial strains isolated from Spanish outbreaks of bacterial spot, we performed a polyphasic study based on multilocus sequence analysis (MLSA) and on some phenotypic characters [8]. Because this characterization showed the presence of different molecular and phenotypic variants, selected strains were analysed to assess the differences at the whole genome level.
Genome sequencing of X. arboricola has been completed for five strains isolated from walnut, three from peach, two from Musa sp., one from almond [9], one from barley [10] and one from Turkish hazel [11]. The sequence of the plasmid pXap41 [12], present in the X. arboricola pv. pruni strains, is also available. All these sequences have been deposited in the NCBI database.
Four genome sequences are available for pathogenic strains from Prunus, identified as X. arboricola pv. pruni. However, with the exception of the strain CITA 33 isolated from almond (P. amygdalus, syn. P. dulcis) in Spain [9], no detailed information about the features of those genomes has been published. Likewise, there are no sequenced strains isolated from Japanese plum (P. salicina) or cherry rootstock (P. mahaleb). In addition, no avirulent strain of X. arboricola from Prunus spp. has been analysed at the whole-genome level. The occurrence of avirulent strains is of particular importance for a quarantine pathogen like X. arboricola pv. pruni with respect to accurate diagnosis of virulent strains.
Herein we present draft genome sequences for two X. arboricola strains: an avirulent strain, CITA 44, isolated from P. mahaleb, and an X. arboricola pv. pruni strain, IVIA 2626.1, isolated from P. salicina cv. Fortune, which differs from other sequenced strains in phenotypic features and virulence on several hosts [9]. The genome analysis of these two strains, as well as comparison with other related strains, should provide insight into the genetics of the pathogenesis process in X. arboricola strains associated with the bacterial spot disease of stone fruits and almond.

Classification and features
Strain CITA 44 was isolated in 2009 from asymptomatic leaves of Santa Lucía SL-64 cherry rootstock (P. mahaleb) in a nursery located in the north-eastern Spanish region of Aragón. This strain showed flagella-associated swarming and swimming motility on 0.5 % agar PYM plates and 0.3 % agar MMA plates, respectively. Additionally, strain CITA 44 showed type IV pili-associated twitching motility in the interstitial surface between a 1 % agar PYM layer and the plastic plate surface. According to the atomized oil assay [13], this strain produced surfactant compounds on 1.5 % agar LB plates after 24 h at 27°C. In a detached leaf assay, conducted with a cotton swab dampened with a suspension of 1 × 10^8 CFU/ml, on almond cv. Ferraduel, apricot cv. Canino, peach cv. Calanda and European plum (P. domestica) cv. Golden Japan, X. arboricola strain CITA 44 did not cause bacterial spot symptoms at 28 days post inoculation (dpi). Despite this lack of symptoms, the bacterium could be re-isolated after that period.
X. arboricola pv. pruni strain IVIA 2626.1 was isolated from symptomatic leaves of Japanese plum (P. salicina cv. Fortune) in the southwestern Spanish region of Extremadura in 2002. This strain showed swarming, swimming and twitching motility, as well as production of surfactant compounds, under the same culture conditions described above for strain CITA 44. In addition, according to the detached leaf assay described previously, strain IVIA 2626.1 was able to produce bacterial spot symptoms on almond, peach and European plum, but not on apricot, at 28 dpi.
Classification of the strains was performed using an MLSA approach based on the partial sequences of the housekeeping genes atpD, dnaK, efP, fyuA, glnA, gyrB and rpoD of the strains CITA 44 and IVIA 2626.1 as well as related strains of X. arboricola [3]. Nucleotide sequences were aligned with Clustal W, and both ends of each alignment were trimmed (atpD 750 bp, dnaK 759 bp, efP 339 bp, fyuA 753 bp, glnA 675 bp, gyrB 735 bp and rpoD 756 bp) and concatenated to a total sequence length of 4,620 nucleotide positions. The phylogenetic tree was constructed using the maximum likelihood method implemented in MEGA 6.0 [14] with 1,000 bootstrap re-samplings. According to the phylogenetic analysis, strain CITA 44 belongs to the species X. arboricola; nevertheless, this strain could not be assigned to any of the pathovars of this species. The concatenated sequence similarity between this strain and the other X. arboricola strains analysed varied from 97.08 % to 98.79 %. In contrast, strain IVIA 2626.1 clustered in a group with the pathotype strain X. arboricola pv. pruni CFBP 2535, isolated from P. salicina in New Zealand, with a sequence similarity of 100 %.
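The concatenation and pairwise similarity steps of an MLSA workflow can be illustrated with a minimal Python sketch. It assumes that each locus has already been aligned and trimmed so that all strains share the same length per locus, and it computes a simple percent identity over ungapped columns. The toy sequences, the strain labels and the function names are hypothetical, and the identity measure is an assumption rather than the exact similarity calculation used in this study.

```python
from typing import Dict, List


def concatenate_loci(alignments: Dict[str, Dict[str, str]],
                     loci: List[str]) -> Dict[str, str]:
    """Join the aligned sequence of every locus, in a fixed order, per strain."""
    strains = list(alignments[loci[0]].keys())
    return {
        strain: "".join(alignments[locus][strain] for locus in loci)
        for strain in strains
    }


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions, skipping columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == "-" or base_b == "-":
            continue
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy, made-up alignments for two loci and two placeholder strains.
toy_alignments = {
    "atpD": {"strain_A": "ATGCATGC", "strain_B": "ATGCATGA"},
    "gyrB": {"strain_A": "GG-CTTAA", "strain_B": "GGACTTAA"},
}
concatenated = concatenate_loci(toy_alignments, ["atpD", "gyrB"])
identity = percent_identity(concatenated["strain_A"], concatenated["strain_B"])
print(f"{identity:.2f} %")
```

Keeping the locus order fixed for every strain matters, because the concatenated positions are only comparable when each strain contributes the same loci in the same order.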
The minimum information about the genome sequence (MIGS) [19] of X. arboricola strain CITA 44 and X. arboricola pv. pruni strain IVIA 2626.1, as well as their phylogenetic position, are provided in Table 1 and Fig. 2.

Genome sequencing information
Genome project history
X. arboricola strain CITA 44 and X. arboricola pv. pruni strain IVIA 2626.1 were selected for comparative whole-genome sequencing analysis as X. arboricola strains isolated from Prunus spp. that differ in several phenotypic characters, including virulence. Comparative genomics among the avirulent strain CITA 44 and the available Prunus-pathogenic strains, including IVIA 2626.1, should be useful for identifying the molecular determinants associated with pathogenesis as well as those associated with host resistance, and for diagnostic characterization of X. arboricola strains causing bacterial spot of Prunus spp. Whole Genome Shotgun Projects have been deposited at DDBJ/EMBL/GenBank under the accession numbers LJGM00000000 and LJGN00000000. The versions described in this paper are versions LJGM01000000 and LJGN01000000. Table 2 summarizes the project information and its association with MIGS.
Growth conditions and genomic DNA preparation
X. arboricola strain CITA 44 and X. arboricola pv. pruni strain IVIA 2626.1 are deposited and available at the bacterial collections of the Instituto Valenciano de Investigaciones Agrarias (IVIA, Valencia, Spain) and the Centro de Investigación y Tecnología Agroalimentaria de Aragón (CITA, Zaragoza, Spain). Both strains were streaked on 1.5 % agar LB plates and were grown for 48 h at 27°C. A single colony of each strain was inoculated separately in 30 ml of LB broth and grown on an orbital shaker for 24 h at 27°C. DNA from pure bacterial cultures was extracted using a QIAamp DNA miniKit (Qiagen, Barcelona, Spain) according to the manufacturer's instructions. DNA quality and quantity were determined by 1 % agarose gel electrophoresis, with the Qubit fluorometer (Invitrogen) according to the Quant-it dsDNA BR Assay Kit (Invitrogen) manufacturer's instructions, and by spectrophotometry (NanoDrop 2000 spectrophotometer, Thermo Scientific). A 2.0 μg/μl aliquot of 200 ng/μl sample was submitted for the sequencing.

Genome sequencing and assembly
The draft genome sequences for strains CITA 44 and IVIA 2626.1 were generated at the STAB VIDA Next Generation Sequencing Laboratory (Caparica, Portugal) using the Ion Torrent sequencing technology. Draft genome assembly of strain CITA 44 was based on 3,060,638 usable reads with a total base number of 948,933,067. The mean read length was 361.70 ± 93.50 bp and the mode read length was 385 bp. The draft genome assembly of IVIA 2626.1 was based on 2,317,319 reads, with a total base number of 461,361,072. The mean read length and the mode read length for this strain were 201.80 ± 85.30 bp and 241 bp, respectively. Genomic assemblies were constructed using MIRA 4.0 [20]. Of the contigs generated, only those with a size above 500 bp and an average coverage above 99 in the case of CITA 44, and above 40 in the case of IVIA 2626.1, were considered significant. Finally, 71 contigs (N50 = 120,981 bp; largest contig = 352,479 bp; average coverage = 198X) were generated for strain CITA 44 and 214 contigs (N50 = 47,650 bp; largest contig = 115,385 bp; average coverage = 92X) were generated for strain IVIA 2626.1.
Fig. 1 Images of X. arboricola CITA 44 (top) and X. arboricola pv. pruni IVIA 2626.1 (bottom) cells using phase-contrast microscopy (left) and the appearance of the colony morphology after 48 h growing on YPGA agar medium at 27°C (right). Flagella were stained (left) as described previously [63].
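The contig filtering and summary statistics reported for the assemblies can be sketched in a few lines of Python. The example below is illustrative only: the contig records are made up, the function names are hypothetical, and the depth estimate is a naive ratio of sequenced bases to assembly length rather than the coverage value computed by the assembler.

```python
from typing import List, NamedTuple


class Contig(NamedTuple):
    name: str
    length: int       # contig size in bp
    coverage: float   # average read coverage of the contig


def filter_contigs(contigs: List[Contig], min_length: int,
                   min_coverage: float) -> List[Contig]:
    """Keep contigs strictly above both the size and the coverage thresholds."""
    return [c for c in contigs
            if c.length > min_length and c.coverage > min_coverage]


def n50(lengths: List[int]) -> int:
    """Length of the contig at which the running sum of lengths,
    taken from longest to shortest, first reaches half of the total."""
    if not lengths:
        raise ValueError("no contigs")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for a non-empty list")


def naive_depth(total_bases: int, assembly_length: int) -> float:
    """Rough sequencing depth: sequenced bases divided by assembly length."""
    return total_bases / assembly_length


# Made-up contigs, filtered with the thresholds stated for strain IVIA 2626.1.
toy = [Contig("c1", 8000, 95.0), Contig("c2", 3000, 60.0),
       Contig("c3", 450, 120.0), Contig("c4", 2000, 12.0)]
kept = filter_contigs(toy, min_length=500, min_coverage=40)
print([c.name for c in kept], n50([c.length for c in kept]))

# Read and assembly totals taken from the text above.
print(round(naive_depth(461_361_072, 5_027_671)))   # IVIA 2626.1
print(round(naive_depth(948_933_067, 4_760_482)))   # CITA 44
```

With the totals given in the text, the naive ratio comes out at about 92 for IVIA 2626.1 and about 199 for CITA 44, close to the average coverage values reported by the assembly (92X and 198X).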

Genome annotation
The assembled draft genome for both strains was annotated using the RAST platform and the gene-caller GLIMMER 3.02 [21,22]. RNAmmer version 1.2 [23] and tRNAscan-SE version 1.21 [24] were used to predict rRNAs and tRNAs, respectively. Signal peptides and transmembrane domains were determined using the SignalP 4.1 server [25] and the TMHMM server version 2.0 [26], respectively. Assignment of genes to the COG database [27] and Pfam domains [28] was performed with the NCBI conserved domain database using an expected value threshold of 0.001 [29].
Table note: Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [62].
Major structural components associated with the flagellum [30,31], the type IV pilus [32], the type III secretion system (T3SS) [33,34] and the type III effectors (T3Es) [35,36], as well as the type IV secretion system (T4SS) and effectors (T4Es) [37][38][39], were identified in the draft genome sequence for each strain. Initially, the search for those genes was based on the coding sequence regions automatically annotated by RAST, and the results were confirmed using the BLASTn and BLASTx tools available at NCBI. Those components which were not automatically annotated were located in the genome sequence using the progressive Mauve alignment method [40]. Nucleotide sequences of the genes used for these alignments were obtained from other xanthomonads in the NCBI gene database. Finally, the nucleotide sequence of the aligned regions was analysed using the BLAST approaches mentioned above. Those sequences with query coverage and identity percentage higher than 90 % were annotated. Additionally, the core components of the T3SS and T4SS were searched using the T346Hunter application [41].
Fig. 2 Phylogenetic tree highlighting the position of two X. arboricola strains (shown in bold) relative to the pathotype strains (PT) of X. arboricola. X. citri subsp. citri str. 306 [64,65] was used as an outgroup. The tree was built based on the comparison of concatenated nucleotide sequences of seven housekeeping genes (atpD, dnaK, efP, fyuA, glnA, gyrB and rpoD) [3]. Sequences were first aligned and concatenated. The phylogenetic tree was constructed by using MEGA 6.0 software [13] with the Maximum Likelihood method based on the Tamura-Nei model. Bootstrap values (1,000 replicates) are shown at the branch points. The GenBank accession number of the X. citri subsp. citri str. 306 genome sequence is shown in parentheses; accession numbers associated with the housekeeping loci of the pathotype strains can be found in a previous study [3].
T3E and T4E genes were predicted using the Effective database [42], after selection of the "gram-" parameter as organism type and the "plant set" parameter as classification module, and the SecReT4 tool [43], respectively. All the predicted genes were corroborated and annotated according to the BLAST parameters mentioned above.
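The annotation criterion based on BLAST results (query coverage and identity both higher than 90 %) can be expressed as a small filter. The sketch below assumes a tab-separated table with four columns in the order query, subject, percent identity and query coverage, which is one possible custom tabular layout and not necessarily the output format used in this work. The example rows, contig names and function names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class BlastHit(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    query_coverage: float


def parse_hits(lines: Iterable[str]) -> List[BlastHit]:
    """Parse tab-separated rows: query, subject, % identity, % query coverage."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, pident, qcov = line.split("\t")[:4]
        hits.append(BlastHit(query, subject, float(pident), float(qcov)))
    return hits


def annotatable(hits: Iterable[BlastHit], min_identity: float = 90.0,
                min_coverage: float = 90.0) -> List[BlastHit]:
    """Keep hits whose identity and query coverage are both strictly above the cut-offs."""
    return [h for h in hits
            if h.percent_identity > min_identity
            and h.query_coverage > min_coverage]


# Made-up rows for illustration; they do not come from the study.
example_rows = [
    "hrcC\tcontig_12\t98.7\t100",
    "pilW\tcontig_03\t85.2\t97",
    "xopAQ\tcontig_40\t95.1\t60",
]
accepted = annotatable(parse_hits(example_rows))
print([hit.query for hit in accepted])
```

Both conditions are applied together, so a hit with high identity over a short fragment of the query, or full coverage at low identity, is not annotated.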

Genome properties
The draft genome sequence of X. arboricola strain CITA 44 was 4,760,482 bp in length with an average GC content of 65.8 %, which is similar to that for other genomes of this species (65.4 to 66.0 %) reported in the NCBI genome database. For this strain, 4,306 genes were predicted and 4,250 were determined as protein coding genes. Of these protein coding genes, 3,330 were assigned to a putative function and the remaining 920 were designated as hypothetical proteins. This strain presented 3 rRNA and 53 tRNA genes. In the case of the X. arboricola pv. pruni strain IVIA 2626.1, the draft genome sequence was 5,027,671 bp in length with an average GC content of 65.4 %, which is the same as for other strains of X. arboricola pv. pruni according to the NCBI database. A total of 4,770 genes were predicted and, among them, 4,720 were predicted as protein coding genes, with 69.17 % assigned to a function and 30.83 % designated as hypothetical proteins. Fifty RNA genes (3 rRNA and 47 tRNA genes) were predicted for this strain. The properties and characteristics associated with these genomes are presented in Table 3. The classification of the predicted protein coding genes into COG functional categories [44] is summarized in Fig. 3 and Table 4.
Circular genome map caption (partial): ... [45,46] as the reference. COG categories were assigned to genes by NCBI's conserved domain database [29]. The circular map was constructed using CGView [67]. From outside to center: genes on forward strand (colored by COG categories); genes on reverse strand (colored by COG categories); GC content; GC skew.
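The gene counts and percentages given for the two genomes can be cross-checked with simple arithmetic. The following Python sketch uses only the figures stated in the text; the helper names are hypothetical, and the absolute counts derived for strain IVIA 2626.1 are estimates obtained from the reported percentages, not values stated in the source.

```python
def total_genes(protein_coding: int, rrna: int, trna: int) -> int:
    """Total predicted genes as the sum of protein coding and RNA genes."""
    return protein_coding + rrna + trna


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Strain CITA 44: counts stated in the text.
cita44_total = total_genes(protein_coding=4250, rrna=3, trna=53)
cita44_function_pct = percent(3330, 4250)
cita44_hypothetical_pct = percent(920, 4250)

# Strain IVIA 2626.1: the text gives percentages, so counts are derived estimates.
ivia_total = total_genes(protein_coding=4720, rrna=3, trna=47)
ivia_function_count = round(4720 * 69.17 / 100)
ivia_hypothetical_count = 4720 - ivia_function_count

print(cita44_total, cita44_function_pct, cita44_hypothetical_pct)
print(ivia_total, ivia_function_count, ivia_hypothetical_count)
```

The totals agree with the reported 4,306 and 4,770 predicted genes, and the CITA 44 counts correspond to roughly 78.35 % of protein coding genes with a putative function and 21.65 % hypothetical proteins.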

Insights from the genome sequence
Based on the phenotypic differences between strains CITA 44 and IVIA 2626.1, selected genes associated with motility and pathogenicity were analysed (Table 5). No differences were observed for the structural components associated with bacterial flagella. A total of 30 out of the 31 components described for this organelle were identified [31], but neither of the two strains contained a homolog of the flhE gene. Regarding the 27 components associated with type IV pilus biogenesis and regulation in Xanthomonas [32,45,46], the fimX, pilD, pilE, pilL and pilW genes were absent in strain CITA 44, whereas the sequence of strain IVIA 2626.1 did not contain homologs for the fimX and pilL genes.
Table note: The total is based on the total number of protein coding genes in the annotated genome.
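The presence and absence pattern described for the type IV pilus genes lends itself to a set comparison, which is also a convenient way to shortlist loci that distinguish the two strains. The Python sketch below uses the absent genes listed in the text; the variable and function names are hypothetical, and treating a strain-specific absence as a candidate marker is an illustrative assumption rather than a validated diagnostic criterion.

```python
from typing import Dict, Set

TOTAL_T4P_COMPONENTS = 27  # components considered for type IV pilus biogenesis and regulation

# Genes reported as absent in each draft genome.
absent: Dict[str, Set[str]] = {
    "CITA 44": {"fimX", "pilD", "pilE", "pilL", "pilW"},
    "IVIA 2626.1": {"fimX", "pilL"},
}


def missing_only_in(strain: str, other: str,
                    table: Dict[str, Set[str]]) -> Set[str]:
    """Genes absent in `strain` but present in `other`."""
    return table[strain] - table[other]


def components_found(strain: str, table: Dict[str, Set[str]]) -> int:
    """Number of the considered components detected in the strain."""
    return TOTAL_T4P_COMPONENTS - len(table[strain])


only_cita44 = missing_only_in("CITA 44", "IVIA 2626.1", absent)
only_ivia = missing_only_in("IVIA 2626.1", "CITA 44", absent)
shared_absent = absent["CITA 44"] & absent["IVIA 2626.1"]

print(sorted(only_cita44))    # absent only in the avirulent strain
print(sorted(only_ivia))      # absent only in the virulent strain
print(sorted(shared_absent))  # absent in both strains
print(components_found("CITA 44", absent), components_found("IVIA 2626.1", absent))
```

Under these inputs, pilD, pilE and pilW are the type IV pilus genes that differ between the two strains, while fimX and pilL are missing from both.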
In the genus Xanthomonas, 24 structural and regulatory components of the T3SS have been determined. They are located in the hrp gene cluster, which is regulated by the master regulators HrpG and HrpX [47]. Strain CITA 44 did not contain any of the 24 components of this gene cluster; only two coding sequences, which correspond to hrpG and hrpX homologs, were found. The absence of the T3SS has also been reported for another X. arboricola strain isolated from barley as well as for X. cannabis [10,48]. The absence of the genes hrcC, hrcJ, hrcN, hrcR, hrcS, hrcT, hrcU, hrcV, hrpB1, hrpD5 and hrpF was corroborated by conventional PCR as previously described [36]. In the case of strain IVIA 2626.1, 22 out of the 24 components, as well as homologs for the two master regulators, were present, but no homologs for hpaF and hrpB5 were found. Homologs for these two genes were also absent in all the publicly available genome sequences of X. arboricola. Sixty T3Es described in the genus Xanthomonas were absent in strain CITA 44, and the absence of 21 of them, identified in X. arboricola pv. pruni, was corroborated by conventional PCR using specific primers [36]. On the other hand, strain IVIA 2626.1 contained 22 T3Es, 21 of which were described previously in other X. arboricola pv. pruni strains [36]. In addition to these effectors, a homolog of xopAQ was found. Both strains contained all 12 components associated with the Agrobacterium tumefaciens [46,49] VirB/VirD4 T4SS [36]. Additionally, strain IVIA 2626.1 harbored a gene cluster homologous to the type four conjugation cluster (tfc). This cluster is composed of 24 genes associated with the expression of a conjugative pilus which is involved in the propagation of genomic islands [50]. In strain IVIA 2626.1, 17 out of the 24 genes associated with this T4SS were found and, among them, tfc2, tfc4, tfc12, tfc14, tfc16, tfc22 and tfc23 were identified as the core components required for the functioning of this T4SS [50].
An additional feature of the X. arboricola pv. pruni sequence is the presence of the plasmid pXap41 (41,102 bp) [12]. This plasmid is found exclusively in X. arboricola pv. pruni strains and is associated with virulence because it contains some T3Es such as XopE3. Genome alignment of the plasmid pXap41 nucleotide sequence and the draft genome sequence for strain IVIA 2626.1 showed a region of 41.1 Kbp which was 99.90 % similar to the pXap41 plasmid of X. arboricola pv. pruni strain CFBP 5530. Conversely, no sequence region in the strain CITA 44 draft genome was similar to this plasmid. Negative results in the amplification of the genes repA1, repA2 and mobC associated with pXap41 [12] confirmed the absence of this plasmid in strain CITA 44.

Conclusions
Here we report and describe the draft genome sequences of two X. arboricola strains, CITA 44 and IVIA 2626.1, isolated from Prunus in Spain and associated with bacterial spot of stone fruits and almond by PCR protocols for identification of this pathovar [51,52]. The phenotype of these two strains varied for motility and virulence. Initial genomic analysis identified several differences associated with motility (type IV pilus) and virulence (T3SS, T3Es and T4SS), including the presence of the putative virulence plasmid pXap41 only in X. arboricola pv. pruni IVIA 2626.1 and the absence of the T3SS, T3Es and the plasmid pXap41 in the avirulent strain CITA 44. All these features make the avirulent strain a candidate for comparative studies to elucidate the molecular processes associated with the plant host interaction and virulence for strains of X. arboricola on Prunus species. Likewise, comparative genomic studies with related strains could provide target sequences for the design of molecular diagnostics for the different pathovars of X. arboricola, as well as for differentiating between virulent and avirulent strains. Further functional studies will also provide insights into the pathogenesis process for X. arboricola strains associated with bacterial spot of stone fruits and almond.