The complete genome sequence of Dickeya zeae EC1 reveals substantial divergence from other Dickeya strains and species

Dickeya zeae is a bacterial species that infects monocotyledons and dicotyledons. Two antibiotic-like phytotoxins named zeamine and zeamine II were reported to play an important role in rice seed germination, and two genes associated with zeamines production, i.e., zmsA and zmsK, have been thoroughly characterized. However, other virulence factors and its molecular mechanisms of host specificity and pathogenesis are hardly known. The complete genome of D. zeae strain EC1 isolated from diseased rice plants was sequenced, annotated, and compared with the genomes of other Dickeya spp.. The pathogen contains a chromosome of 4,532,364 bp with 4,154 predicted protein-coding genes. Comparative genomics analysis indicates that D. zeae EC1 is most co-linear with D. chrysanthemi Ech1591, most conserved with D. zeae Ech586 and least similar to D. paradisiaca Ech703. Substantial genomic rearrangement was revealed by comparing EC1 with Ech586 and Ech703. Most virulence genes were well-conserved in Dickeya strains except Ech703. Significantly, the zms gene cluster involved in biosynthesis of zeamines, which were shown previously as key virulence determinants, is present in D. zeae strains isolated from rice, and some D. solani strains, but absent in other Dickeya species and the D. zeae strains isolated from other plants or sources. In addition, a DNA fragment containing 9 genes associated with fatty acid biosynthesis was found inserted in the fli gene cluster encoding flagellar biosynthesis of strain EC1 and other two rice isolates but not in other strains. This gene cluster shares a high protein similarity to the fatty acid genes from Pantoea ananatis. Our findings delineate the genetic background of D. zeae EC1, which infects both dicotyledons and monocotyledons, and suggest that D. zeae strains isolated from rice could be grouped into a distinct pathovar, i.e., D. zeae subsp. oryzae. In addition, the results of this study also unveiled that the zms gene cluster presented in the genomes of D. zeae rice isolates and D. solani strains, and the fatty acid genes inserted in the fli gene cluster of strain EC1 were likely derived from horizontal gene transfer during later stage of bacterial evolution.


Background
Dickeya zeae is the major pathogen responsible for the maize stalk rot and rice foot rot diseases, and has the ability to infect both monocotyledons and dicotyledons. Other members of Dickeya are D. chrysanthemi, D. dianthicola, D. dieffenbachiae, D. paradisiaca, D. dadantii and D. solani [1,2]. A recent recA-based molecular evolution and genetic diversity study showed that D. zeae is least related to other Dickeya species and even within D. zeae, there are many sequence variants (sequevars) among the isolates from different host plants [3], suggesting a long history of D. zeae adaptation and evolution in the processes of pathogen-host interactions.
The symptoms caused by Dickeya infection include soft rot as well as wilts resulting from vascular invasions. Extensive studies on the Dickeya pathogens which infect dicotyledon crops and ornamental plants led to identify a range of virulence factors including extracellular enzymes, phospholipase, iron metabolism, siderophores, indigoidine pigment, and type III secretion system, collectively contributing to the bacterial infections [4]. In contrast, genetic analysis and biochemical characterization of the pathogenic mechanisms of D. zeae were initiated only in recent years [5][6][7][8], following the outbreak of the rice root rot disease in China [9]. These studies led to identification of a new family of phytotoxins, i.e., zeamine and zeamine II [5,6], which appear to act as the key virulence determinants of the pathogen as mutation of the zeamine synthase genes zmsA and zmsK resulted in partial or almost complete loss of virulence on rice seeds germination [6,7]. The regulatory mechanisms that govern virulence gene expression are largely unknown, except that an acyl-homoserine lactone (AHL) quorum sensing (QS) system was shown to be involved in the regulation of certain virulence traits in D. zeae, including bacterial motility and biofilm development [8].
In this study, a complete genome sequence of D. zeae strain EC1, isolated from rice plants, was obtained using the Illumina next-generation sequencing technology coupling with polymerase chain reaction (PCR) method for gap closure. The genome sequence was annotated and compared with the representative genome sequences of other Dickeya species with a special focus on the virulence determinants and potential regulatory mechanisms. Substantial genomic rearrangements were revealed by comparing the genome sequence of D. zeae EC1 with other Dickeya isolates. Furthermore, we found that the gene cluster encoding for phytotoxin zeamine biosynthesis is conserved only in the D. zeae strains isolated from rice plants and the D. solani strains isolated from potato.

Results and discussion
General genomic features of D. zeae EC1 Assembly of D. zeae EC1 genome sequencing data resulted in 3 scaffolds with 3430.072 kb, 1294.767 kb and 1.821 kb in length, respectively, with 148× coverage in average. For finishing, the gaps were closed by custom primer walks or PCR amplification, followed by sequencing. Hence, the EC1 genome consists of a circular chromosome with 4,532,364 bps in size with no apparent autonomous plasmids. The average GC content of the whole genome is 53.43%. Coding sequences account for about 85.77% of the genome with GC content of 54.57%. The genome contains 4,154 open reading frames (ORFs) predicted to encode polypeptides with 935 bps in average length. The GC content in the intergenic region is 46.55%. An origin of replication designated as dnaA boxes was identified between the deduced gene for the 50S ribosomal protein L34 and the gyrB locus encoding DnaA, DnaN and RecF. The first nucleotide of the dnaA start codon was assigned as the base pair 1 of the chromosome (Table 1, Figure 1).
Seven rRNA operons were found in the chromosome of EC1 with 3 on the positive strand and 4 on the negative strand. Both chains contain an unusual rRNA operon with a 16S-23S-23S-5S organization, and another unusual one with 16S-23S-5S-5S organization locates on the negative chain, differing from the common 16S-23S-5S organization pattern. Similar unusual rRNA operons were also found in some other bacterial species, e.g. a 16S-23S-5S-5S gene cluster was found in Erwinia pyrifoliae Ep1/96, E. tasmaniensis Et1/99 [19]. The occurrence of the unusual organization of rRNAs within the chromosome was interpreted as the result of a chromosomal recombination event [19]. Altogether, 88 tRNA genes which recognize 20 codons were found in the genome of D. zeae EC1.  Table 1). In addition, D. zeae EC1 contains substantially more tRNAs and transposases than any other bacterial species, suggesting a complex history of DNA transposition and recombination in the process of genome evolution [20]. Consistent with the above features, D. zeae EC1 has more ORFs encoding peptides with function unknown than any other Dickeya species with complete genome sequence (Table 1).
Phylogenetic relationships of D. zeae EC1 with other Dickeya species were assessed by performing Multilocus Sequence Analysis (MLSA) using four housekeeping genes (atpD, dnaX, gyrB and recA), and the concatenated data set of the four genes was constructed with Pectobacterium atrosepticum SCR1043 as an outgroup. The numbers of Dickeya species used in this phylogeny assessment were varied from 35 to 38 depending on the availability of the housekeeping gene sequences indicated above. The phylogenic analysis results showed that the three D. zeae strains isolated from diseased rice plants (EC1, ZJU1202 and DZ2Q) are located in the same branch, and share the same clade with the D. zeae isolates collected from other plants or sources (Additional file 1A, B, C, D). The same results also showed that among the four Dickeya strains with complete genome sequences released, EC1 was most homologous to D. zeae Ech586 and then D. chrysanthemi Ech1591 (Additional file 1A, B and C).
To further evaluate the evolutionary relationships among the five Dickeya species with available complete genome sequences, the sequence of D. zeae EC1 was aligned with those of other four species by using BLAST program, respectively. The results showed that D. zeae EC1 is best co-linearly matched with D. chrysanthemi Ech1591 with a few rearrangement events (reverse match; blue) ( Figure 2A), similarly, a few inversed fragments were observed between EC1 and D. dadantii 3937 ( Figure 2B), while the alignment between EC1 and D. zeae Ech586 showed a big inversion with the largest amounts of homologous DNA fragments ( Figure 2C). Intriguingly, the genes adjacent to the borders of this inversion are W909_RS06895 and W909_13075, encoding an integrase IntB (WP_016943617.1) and a transposase (AJC68444.1), respectively, and the sequence between them is highly similar to that in Ech586 (Dd586_2634~Dd586_1449) with reverse orientation, suggesting that the inversion might be due to homologous recombination between these two genes. The entire length of the D. paradisiaca Ech703 genome was least matched with D. zeae EC1 with numerous gene content dissimilarities ( Figure 2D). The distant relationship of D. zeae EC1 with other four complete Dickeya species indicated by the genomic collinearity analysis (Figure 2A-D) is supported by the estimated numbers of shared genes through OrthoMCL comparative analyses (only 1835 proteins in total, Figure 3A). The number of the shared genes between D. zeae EC1 and D. chrysanthemi Ech1591, D. zeae Ech586 and D. dadantii 3937 was 2213, 3208 and 2192, respectively ( Figure 3A). Agreeable with the genomic collinearity analysis shown in Figure 2, the most dissimilar species D. paradisiaca Ech703 shared only 961 genes with D. zeae EC1, and 935, 964 and 940 with D. chrysanthemi Ech1591, D. zeae Ech586, and D. dadantii 3937, respectively.
To identify the genes specific to D. zeae EC1, we compared its genome sequence to the released complete genome sequences of D. chrysanthemi Ech1591, D. zeae Ech586, D. paradisiaca Ech703 and D. dadantii 3937, which unveiled 826 unique proteins located only in the EC1 genome ( Figure 3A). For example, the chromosome of EC1 harbors several large specific regions, including the region from 1418747 nt to 1469974 nt, involved in the biosynthesis of zeamine phytotoxins, containing two characterized zeamine synthase genes zmsA and zmsK [6,7], the region from 1551137 nt to 1566214 nt, encoding different peptide products including three HTH-type transcriptional regulators (AJC65834.1, AJC65839.1 and AJC65846.1), three transporter proteins (AJC65835.1, AJC65838.1, and AJC65845.1), an acetyltransferase (AJC65836.1), a phosphoserine phosphatase (AJC65837.1), Taken together, the above analyses show that D. zeae EC1 is most co-linear with D. chrysanthemi Ech1591, most conserved with D. zeae Ech586 and least similar to D. paradisiaca Ech703. The findings highlight substantial divergence among Dickeya species. For paving ways to understand the genetic basis of pathogenicity and host specificity, we specifically compared the similarity and divergence of the genes encoding virulence determinants and regulatory mechanisms in the following sections.
In addition, we compared the genome sequences of three D. zeae strains isolated from rice plants. Synteny analysis showed that EC1 isolated from Guangdong Province of China was 99.962% identity to strain ZJU1202, which was also from Guangdong Province of China [15], and 95.863% identity to strain DZ2Q from Italy [16] ( Figure 2E, 2 F), and the number of the shared genes among these three rice pathogen strains is up to 3502, only 131 genes are unique in strain EC1 ( Figure 3B), among which, 95.4% (125 out of 131) encode hypothetical proteins, and the 6 genes with known functions share high similarities with other Dickeya strains (Additional file 2), 4 of which encode tranposases (457403~457603 nt, AJC65190.1, AJC65197.1, and AJC65071.1), 1 encode citrate lyase acyl carrier protein CitD (AJC65916.1), and the remaining encodes a PilT plasmid maintenance protein pirin (AJC64970.1), respectively. Given the high level of genome sequence similarity and the fact that the genomes of strains ZJU1202 and DZ2Q have only been partially sequenced with numerous gaps, which hinders the accurate comparison, we believe that the common genes shared by these three strains could be substantially higher than the currently obtained 3502.

Cell wall-degrading extracellular enzymes and proteases
Extracellular enzymes, including pectinases, polygalacturonases, cellulases and proteases, are one of the key virulence factors during bacterial pathogenesis when infecting host plants to cause soft rot symptoms. Among these enzymes, pectate lyases (Pel) and other pectinases such as pectin methylesterases (Pem) and pectin lyases (Pnl) of D. dadantii 3937 have been shown to play a major role in the virulence and tissue maceration [21]. The genome of EC1 contains a total of 14 genes encoding pectin degradation enzymes, including pnl, pelN, pelL, pelI, pelA, pelE, pelD, pelC, pelB, and pelZ encoding pectate lyases, pelW and pelX encoding pectate disaccharide-lyases, paeY encoding a pectin acetylesterase, and pemA encoding a pectin methylesterase ( Table 2). These pectin degradation genes are highly conserved in various Dickeya species, except that pnl is absent in D. zeae Ech586 and D. chrysanthemi Ech1591, and pelI and pelD are not included in the D. paradisiaca Ech703 genome ( Table 2). The pehK and pehX genes encoding polygalacturonases are also conserved in Dickeya spp. except that pehX has two copies in D. chrysanthemi Ech1591 (pehW and pehX), and three copies in D. dadantii 3937 (pehV, pehW and pehX) ( Table 2). Similarly, ten genes in EC1 are involved in cellulose degradation, including 2 endoglucanase encoding genes celY and celZ, 7 beta-glucosidase encoding genes bglA, bgxA, bglB, nagZ, bglC, bglD and celH, and an alpha-glucosidase encoding gene lfaA. These genes, associated with oligosaccharide degradation, are conserved in Dickeya stains, except that bglC and bglD were absent in D. zeae Ech586, and bglA and lfaA were not found in D. paradisiaca Ech703 ( Table 2).
The prtGDEFBCX cluster (W909_09760~09795) encoding four proteases and three protease secretion associated proteins is located on the negative strand of EC1 chromosome. Among them, the four protease encoding genes, i.e., prtG, B, C and X, encode serralysin homologs sharing about 59.8% similarity at amino acid level, and the prtD, E and F encode the type I secretion system (T1SS) components PrtDEF (AJC66336.1, AJC66336.1, and AJC66336.1). In addition, a proteinase inhibitor encoding gene inh is located between prtG and prtD. Alignment with other Dickeya species showed that prtG is absent in Ech1591, while the whole prt gene cluster is lost in the genome of D. paradisiaca Ech703 (Table 2).

Type II secretion system
The type II secretion system (T2SS) is used by diverse gram-negative bacteria to translocate extracellular proteins across the outer membrane. In D. dadantii 3937, the T2SS is encoded by the out genes and stt genes. The Out system allows the secretion of several proteins including most pectinases while the Stt system only presents to the outer part of the outer membrane, such as the pectin lyases encoded by the adjacent pnl gene [22,23]. D. zeae EC1 chromosome contains a highly conserved T2SS gene cluster (outSBCDEFGHIJKLMO; W909_138251 3895) (Additional file 3), covering 13.814 kb with 15 ORFs, but no stt genes. The gene cluster shares a high similarity with that of D. zeae Ech586 (coverage 100%, identity 94%), D. chrysanthemi Ech1591 (coverage 90%, identity 87%), D. dadantii 3937 (coverage 89%, identity 85%), and D. paradisiaca Ech703 (coverage 77%, identity 77%) at nucleic acid level. The out gene cluster is highly conserved among Dickeya spp. except that D. zeae EC1 harbors an extra predicted gene designated as W909_13885 encoding a hypothetical protein (AJC68452.1) at the upstream of outC, and that D. dadantii 3937 contains an additional outT gene (Additional file 3).

Type III secretion system encoded by the hrp gene cluster
The hypersensitive response and pathogenicity (Hrp) type III secretion systems (T3SS) were established as pathogenicity factors in many phytopathogenic bacteria [24,25]. Similarly, the hrp genes in Dickeya spp. have also been reported to play an important role in their pathogenesis and interaction with host plants [26][27][28]. In the genome of D. zeae EC1, a large hrp gene cluster spanning 24.8 kb was identified, which is composed of 27 genes and arranged in three transcriptional units. Alignment of this gene cluster with the counterparts in other Dickeya species showed that this hrp gene cluster is absent in D. paradisiaca Ech703, but present in other three species, i.e., D. chrysanthemi Ech1591, D. zeae Ech586, and D. dadantii 3937 with certain variations (Figure 4). Among the strains containing the hrp gene cluster, the only variation was found at the adjacent regions of the plcA gene, which encodes an extracellular phospholipase [29]. The plcA gene is located between hrpE and hrpF genes, in D. zeae EC1, there is no additional genes, while in Ech586, Dd586_1900 (encoding a hypothetical protein) and Dd586_1899 (encoding a lytic murein transglycosylase) are located at the upstream, and Dd586_1902 and Dd586_1903 (both encoding hypothetical proteins) are located at the downstream (Figure 4). Similarly, Dd1591_1903 and Dd1591_1902 (encoding a HrpE/YscL family type III secretion apparatus protein) are located at the upstream in D. chrysanthemi Ech1591, and in D. dadantii 3937, three ORFs encoding two hypothetical proteins and a membrane-bound lytic murein transglycosylase MltB, respectively are located at the upstream ( Figure 4, additional file 4). However, the biological significance of these extra genes surrounding plcA has not yet been investigated. Some of these T3SS genes of the Dickeya species have been characterized in details. Among them, hrpXY encode a two-component system (TCS), which is required for the activation of the hrp gene expression in D. dadantii 3937 [30]. After phosphorylation, HrpY binds to the hrpS promoter and activates the expression of hrpS, which encodes a sigma (54)-dependent enhancer-binding protein HrpS. In turn, HrpS binds to the sigma factor homolog hrpL, and consequently activates the genes encoding T3SS structural proteins such as HrcJ and HrpA and secreted effectors including hrpN [30]. In D. dadantii EC16, the Hrp system appears to contribute to an early stage of pathogenesis, whereas in D. dadantii 3937 mutation of hrpN, hrpG, and hrcC resulted in substantially reduced lesion formation and bacterial growth in African violet leaves [26,27]. In addition to playing a role in plant-microbe interactions, the D. dadantii T3SS is also required for the formation of bacterial aggregates at the air-liquid interface [30,31]. Given that most key hrp genes are highly conserved in D. zeae EC1, it is highly possible that the T3SS in D. zeae EC1 could also play certain roles in the bacterial virulence and physiology, which awaits further characterizations.

Type IV secretion system
Type IV secretion system (T4SS) is unique from the other secretion systems in its ability to translocate nucleic acids in addition to proteins from donor to recipient cells [32]. It functions in conjugation, pathogenicity and DNA release/uptake [32,33]. In the EC1 genome, T4SS gene cluster covers 9.742 kb, harboring 11 T4SS-core genes (W909_12990~13040) encoding VirB1~11 (AJC66938.1Ã JC66948.1) (Additional file 5). VirB1 forms bores hole in peptidoglycan layer allowing T4SS complex assembly to occur, while previous studies indicated that VirB2 and VirB5 proteins constitute an extracellular pilus of T4SS, virB4 and virB11 encode ATPases providing energy and motor force for macromolecular secretion, architecture assembly and substrate pumping, and VirB6, VirB8, and VirB10 form a membrane channel encompassing both membranes, and the periplasmic protein VirB9 in complex with the short lipoprotein VirB7 could be part of this structure [32,34,35]. This T4SS gene cluster is highly conserved in D. chrysanthemi Ech1591 (coverage 100%, identity 97%), D. zeae Ech586 (coverage 99%, identity 96%), and D. dadantii 3937 (coverage 100%, identity 96%), but not found in D. paradisiaca Ech703.

Type VI secretion system
The type VI secretion system (T6SS) was identified recently and initially thought to take part in bacterial pathogenicity and host colonization [36]. Subsequent studies showed that T6SS functions in various biological processes, such as mediating cooperative or competitive interactions between bacteria and eukaryotes, and bacterial biofilm formation [37][38][39][40][41]. T6SS is typically encoded by clusters of 12 to 20 genes, with a minimum of 13 genes for production of a functional apparatus [42]. In Dickeya genus, D. zeae Ech586 and D. chrysanthemi Ech1591 were shown to possess an identical T6SS loci consisting of 17 genes [43], but the biological function has not yet been determined. In this study, a gene cluster encoding T6SS was found in D.
zeae EC1, spanning 44.774 kb with 37 ORFs designated from W909_06460 to W909_06640. In this gene cluster, in addition to the 17 T6SS genes described previously [43], there are 20 additional ORFs inserted between W909_06465 (vgrG encoding Valine-glycine repeat G) and W909_06570 (impB) genes ( Figure 5). These inserted genes encode a PAAR repeat-containing protein phospholipase A1 PldA (AJC65737.1), two ankyrins (AJC65738.    1), respectively, which share 47-70% protein similarity to their homologs in Pantoea ananatis and seem unlikely associated with the flagellar biogenesis. This fragment was also found between the fragellar genes in the rice strains DZZ2Q and ZJU1202 in same gene organization, but absent in other sequenced D. zeae strains. The flagellar genes are not clustered in the genome of D. paradisiaca Ech703, indicating again its distant evolution relationship with the other four Dickeya strains. Unlike some other Enterobacteriaceae, which encode more than one type of flagella and flagellin [44], Dickeya spp. encodes only one type of flagella and one type of flagellin. Flagellar proteins are generally responsible for cell motility and intracellular trafficking, secretion and vesicular transport, while the chemotaxis proteins are responsible for cell motility and signal transduction. In D. dadantii 3937, mutation of fliA encoding a sigma factor abolished the bacterial motility, significantly reduced Pel production and the bacterial attachment to plant tissues, demonstrating that FliA is a positive regulator of many traits associated with virulence [44]. Mutation of chemotactic genes (cheW, cheB, cheY and cheZ) caused substantial reduction in swimming motility and decreased the bacterial virulence against Arabidopsis and potato [45]. The roles of flagellar and chemotaxis systems in D. zeae infection, especially in pathogen-monocotyledon host interactions, remain to be investigated.

Quorum-sensing systems
Many Gram-negative bacterial pathogens utilize the luxI/ luxR quorum sensing system to regulate the expression of virulence genes. Typically, luxI encodes a synthase for production of acylhomoserine lactone (AHL) family quorum sensing signals, and luxR encodes an AHL signal receptor. Upon interaction with AHL signal, LuxR becomes an active transcription factor and hence modulating the expression of virulence genes. Our previous study showed that D. zeae EC1 produces an AHL family quorum sensing signal, i.e., N-(3-oxo-hexanoyl)-homoserine lactone (OHHL), which is encoded by the luxI homolog expI [8]. Mutation of expI resulted in the increased bacterial cell motility and decreased biofilm formation [8]. A Blast search of the EC1 genome revealed only one copy of expI (W909_00485) and also one well-conserved luxR homolog (W909_00480) designated as expR. ExpR is highly conserved in D. zeae Ech586 (97%), 3937 (91%), and D. chrysanthemi Ech1591 (91%), and D. paradisiaca Ech703 (64%). Interestingly, ExpI is absent in D. paradisiaca Ech703 (Additional file 8).
AI-2 produced by the S-ribosylhomocysteine lyase LuxS represents another type of conversed QS system involved in bacterial interspecies communication [46]. In Dickeya spp., evidence suggests that AI-2 production was switched off by indole-3-acetic acid (IAA) pathway under tryptophan control [47], but the biological significance of tryptophan modulation in planta and the role of AI-2 in pathogenesis remains unclear. A luxS gene homolog (W909_04510) encoding a S-ribosylhomocysteine lyase was also identified in D. zeae EC1, sharing about 79% identity to the luxS gene of Serratia marcescens ATCC 274 [48], and about 79% -93% identity to its homologs in Ech586, Ech1591 and Ech703 at nucleic acid level, respectively.
Interestingly, in the upstream of the expI/expR genes (W909_00480~00485), the vfm gene cluster encoding the biosynthesis of a novel QS signal was found highly conserved in D. zeae EC1 and other four Dickeya species and strains used in this study (70% to 98% identity at nucleic acid level, additional file 9, Additional file 10). The gene cluster was originally identified in D. dadantii 3937, and is associated with the regulation of virulence factor production and pathogenesis [49]. The biological significance of various QS systems in D. zeae EC1 and other Dickaya species remains to be further investigated.
Zeamine synthesis gene cluster D. zeae EC1 produces at least two polyamino phytotoxins and antibiotics, i.e., zeamine and zeamine II, which were shown to be the major virulence determinants [6]. The genes zmsA and zmsK, which respectively encode a multidomain polyketide synthase and an non-ribosomal peptide synthase containing only a condensation domain, are involved in zeamines biosynthesis and play a key role in the bacterial virulence [6,7]. Interestingly, zeamines have also been identified from a bacterial isolate of Serratia plymuthica RVH1 and a gene cluster containing 23 ORFs was proposed as the zeamine biosynthetic gene (zmn) cluster [50]. As the genome and the zmn sequences of S. plymuthica RVH1 are not publically available, we used the same gene cluster from S. plymuthica AS12 for comparison analysis. We found that most of the zmn genes of S. plymuthica are conserved in the D. zeae EC1 genome (W909_06685~06770) except a few genes including zmn1-4 and zmn23 ( Figure 6, additional file 11). Moreover, as expected, the two sets of zeamine biosynthetic genes from D. zeae EC1 and S. plymuthica AS12 share a high degree of similarity with amino acids identity ranging from 50-92% (Additional file 11). S. plymuthica is an uncommon cause of human infection, but the role of zeamines in its pathogenesis has not yet been determined The zms gene cluster of D. zeae EC1 spans 51,169 bp, and includes 18 ORFs ranging from zmsO to zmsN ( Figure 6). Surprisingly, the zms gene cluster is not conserved in the four Dickeya strains used in this study. Alignments of the nucleic acid sequence of the zms gene cluster showed that only few sporadic fragments in the cluster are present in the four whole-genome sequenced Dickeya spp., but surprisingly, it shares 69.35% similarity (95% coverage, 73% identity) with that in D. solani strains, such as 3337, D s0432-1, MK10, MK16, M005 and GBCC. The zeamine biosynthetic genes from D. zeae EC1 and D. solani D s0432-1 share a high degree of similarity with amino acids identity ranging from 59-94% (Additional file 11). We have also searched the partial genome sequences of other nine D. zeae strains, and found surprisingly that this zms gene cluster is only fully or almost fully conserved in the rice pathogens ZJU1202 [15] and DZ2Q [16] with 100% and 99% identity at nucleotide level, but absent in other D. zeae strains isolated from river water, maize, potato and banana, respectively [17,18]. Noticeably, the GC content of this cluster from D. zeae EC1 is 48.77%, significantly lower than the genomic GC contents of strains EC1 (53.43%), Ech1591 (54.5%), Ech586 (53.6%), Ech703 (55%) and 3937 (56.3%), suggesting that this zeamine biosynthesis gene cluster was acquired through horizontal gene transfer from other organisms at a relatively later stage of bacterial evolution, and seemingly associated with host specificity.

Conclusion
In this study, we present a complete genome sequence of D. zeae strain EC1 isolated from rice plants, which together with the other four released complete genome sequences of Dickeya species, i.e., D. chrysanthemi Ech1591, D. zeae Ech586, D. dadantii 3937, and D. paradisiaca Ech703, allows detailed genomic comparison to study bacterial evolution and identify clues associated with pathogenesis and host specificity. Synteny analysis of D. zeae EC1 and the other four released Dickeya spp. showed that the genome sequences of D. zeae strain EC1 is most co-linear with D. chrysanthemi Ech1591, most conserved with D. zeae Ech586 and least similar to D. paradisiaca Ech703. This is also evident from the overall similarity of the virulence genes conserved in the Dickeya strains used in this study. Several groups of virulence genes, such as protease genes (Table 2), T1SS gene cluster prtDEF, T3SS gene cluster (Figure 4), T4SS gene cluster (Additional file 5) and T6SS gene cluster ( Figure 5) are highly or fully conserved in D. zeae EC1 and D. zeae Ech586 but absent in the genome of D. paradisiaca Ech703.
Except for D. paradisiaca Ech703, most virulence genes are well-conserved in other four Dickeya pathogens. However, we also noted some variations in certain virulence determinants, for example, the T3SS (hrp) and T6SS gene clusters. Given the important role of these two virulence determinants in host-pathogen interactions, it would be interesting to determine the impact of such variations on bacterial virulence and host-specificity. Importantly, a range of unique genes associated with virulence were identified in D. zeae strain EC1, such as zeamine synthesis gene cluster, 4 hypothetical proteins in the T6SS gene cluster (AJC65743.1, AJC68373.1, AJC68374.1 and AJC65745.1), and the 9 fatty acid biosynthesis proteins (AJC66806.1~AJC66814.1) in the flagella biosynthesis cluster (Additional file 7). Among them, the genes encoding zeamine biosynthesis have been shown critical for D. zeae EC1 to establish infections in rice [6,7]. Intriguingly, the zms gene cluster was found in genomes of the six sequenced D. solani strains and strains of S. plymuthica RVH1 and AS12, suggesting that this cluster is possibly derived from a same organism by horizontal gene transfer during bacterial evolution. Given that the high similarity of the proteins encoded by the fragment inserted in the flagellar cluster (W909_12275~12315)) to P. ananatis (Additional file 7), we were tempted to speculate that this gene fragment may be transferred from an unknown organism or similar organisms to D. zeae EC1, P. ananatis strains AJ13355 (isolated from soil in Japan) and LMG20103 (isolated from Eucalyptus blight and dieback in South Africa).
In the last few years, up to 8 D. zeae isolates have been partially sequenced. Most intriguingly, sequence alignment showed that the zms gene cluster was only found in the D. zeae strains EC1, ZJU1202 and DZ2Q infecting rice plants [8,15,16], but not in other D. zeae strains from environment or other hosts, such as CSL RW192 and MK19 isolated from river water, NCPPB 2538 isolated from maize [51], Ech586 isolated from Philodendron [10], NCPPB 3531 and NCPPB 3532 isolated from potato, and MS1 isolated from banana [17,18]. Phylogenetic analysis of D. zeae strains indicates that these D. zeae strains are closely related. In particular, three rice isolates were found in the same evolutionary clade (Additional file 1). Taken together, our data suggests that the remaining D. zeae strains can be at least divided into two pathovars with one infecting rice and the other infecting other crops. We proposed to reclassify these rice strains as D. zeae subsp. oryzae.
In summary, comparison of the genome information of D. zeae EC1 with the closely related Dickeya species provides new insights for the conservation and evolution processes of virulence determinants in these important bacterial pathogens. The overall similarity in extracellular enzymes and hrp and T6SS systems of EC1 and other Dickeya pathogens seems to explain well why EC1 could also infect dicotyledons, and the unique genes (W909_06510 and W909_06520) in T6SS gene clusters might hold the keys to decipher the mechanisms of EC1 in infecting monocotyledons. Experimental dissection of the roles of these strain-specific genes and variations identified in this study shall reveal novel insights into the molecular mechanisms that specify host ranges and pathogenesis.

Whole-genome sequencing
Genomic DNA was extracted from D. zeae EC1 grown at 28°C in YEB medium [8] using MasterPure DNA purification kit (EPICENTRE Biotechnologies). The complete genome sequence was determined by the Beijing Genome Institute (Shenjun, China) using Solexa technology. A shotgun paired-end library with a fragment size between 150 to 500 bp and a long jumping distance mate-pair library with an insert-size between 2 to 10 kb were constructed. The two libraries were sequenced, and reads were trimmed on quality and length. In total, 1,664,407 high-quality filtered sequence reads were generated, with an average coverage equivalent to 148×. Sequence assembly was carried out using the SOAP software (http://soap.genomics.org.cn) 7 with a read length of 0.5 and a similarity of 0.8. Seven contigs were generated and joined into 3 scaffolds using pair-end information, with 3430.072 kb, 1294.767 kb and 1.821 kb in length, respectively. The in silico finishing of some gaps was carried out by mapping. We used the borders of the gaps as anchor and retrieved the reads in both orientations to perform a new de novo assembly on the gaps. The mapped reads were collected and used for de novo local assembling (read length of 0.5 and similarity of 0.8). Other gaps were closed by PCR amplification followed by DNA sequencing. The genome sequence data has been deposited in NCBI database with accession number CP006929.1.

Gene prediction and annotation
Genes were predicted using the coding sequence (CDS) prediction program Glimmer 3.0 [52]. Amino acid genome comparison was performed by bi-directional BLASTp (2.2.21) sequence alignment of translated ORFs in the Nr, Nt, SwissProt, COG (release: 20090331), and KEGG databases with a 10 −5 e-value threshold. The best hit was filtered using a 50% identity cut-off value. Genes were considered as strain-specific if identity of the encoded protein was lower than 80% on the similar portion between them. In addition, ORFs with a size less than 110 bp were cut-off by Glimmer 3.0. Transposon sequences were searched using the RepeatMasker and RepeatProteinMasker softwares. Tandem repeat sequences were predicted by Tandem repeat finder (TRF) and aligned in the known transposon sequence database. rRNAs were found in rRNA database or predicted by rRNAmmer [53], tRNAscan [54] was used to predict the tRNA regions and analyze the tRNA secondary structures, and Rfam was used to analyze miRNA, sRNA and snRNA.

Phylogenetic analysis
Relationships of EC1 with other isolates of Dickeya were determined from multilocus sequence analysis (MLSA) on four housekeeping genes including atpD, dnaX, gyrB and recA, which were always used for the identification of Dickeya spp. [1,55]. AtpD, gyrB and recA gene sequences of 38 Dickeya strains, and dnaX gene sequences of 35 Dickeya strains were compared with sequences available in the public database (http://www.ncbi.nlm.nih.gov/). All sequences for the atpD (642 nucleotides), dnaX (536 nucleotides), gyrB (745 nucleotides) and recA (425 nucleotides) genes were edited and aligned using the ClustalW 1.6 software with parameters as: Gap Opening Penalty of 15, and Gap Extension Penalty of 6.66 for Pairwise and Multiple Alignment; Transition Weight of 0.5; and Delay Divergent Cut-off of 30%. Trees obtained with the concatenated data set of the four genes were constructed with Pectobacterium atrosepticum SCRI1043 as an outgroup. Method of Neighbor-joining was used for analysis, and 1,000 bootstrap replicates were included in a heuristic search, with a random tree and the tree bisection-reconnection branch-swapping algorithm. The percent variation was calculated by comparing all isolates to the nearest relative.

Genome comparisons
According to the phylogenetic analysis, we selected the four closely relative species with released whole genomes including D. chrysanthemi Ech1591 (formerly named D. zeae, GenBank accession number CP001655.1), D. zeae Ech586 (formerly named D. dadantii, CP001836, GCF_000025065), D. dadantii 3937 (CP002038) and D. paradisiaca Ech703 (formerly named D. dadantii, CP001654) [10], and another two D. zeae rice strains including ZJU1202 (GCF_000264075) [15] and DZ2Q (GCF_000404105) [16] for genome comparisons. The sequence of EC1 is ordered according to that of the reference bacterium based on Mummer 3.22 (http://mummer.sourceforge.net/). Then the upper and following axes of linear synteny graph are constructed after the same proportion of size reduction in length of both sequences (parameter: b, 200; c, 65; extend: l, 20). According to BLAST results, each pair nucleic acid sequence of two alignments is marked in the coordinate diagram according to its position information after the same proportion of size reduction.
To identify the set of common genes and the set of genes unique to strain EC1, OrthoMCL clustering analyses were performed with the following parameters: P-value Cut-off = 1 × 10 −5 ; Identity Cut-off = 90%; Percent Match Cut-off = 80.