Complete genome sequence of endophytic nitrogen-fixing Klebsiella variicola strain DX120E

Klebsiella variicola strain DX120E (=CGMCC 1.14935) is an endophytic nitrogen-fixing bacterium that was isolated from sugarcane grown in Guangxi, China, and promotes sugarcane growth. Here we summarize the features of strain DX120E and describe its complete genome sequence. The genome consists of one circular chromosome and two plasmids totaling 5,718,434 nucleotides with 57.1% GC content, and contains 5,172 protein-coding genes, 25 rRNA genes, 87 tRNA genes, 7 ncRNA genes, 25 pseudogenes, and 2 CRISPR repeats.


Introduction
The species Klebsiella variicola was classified in 2004 and consisted of clinical and plant-associated isolates [1]. The species K. singaporensis was classified in 2004 based on a single soil isolate [2] and was recently identified as a later heterotypic synonym of K. variicola [3]. K. variicola is able to fix N2 [1]. K. variicola strain At-22, one of the dominant bacteria in the fungus gardens of leaf-cutter ants, supplies the ant-fungus symbiotic system with nitrogen by N2 fixation [4] and with carbon by degrading leaf polymers [5]. The former K. pneumoniae strain 342 (Kp342), which is phylogenomically close to strain At-22 [6,7] and has been identified as a strain of K. variicola [3], is able to colonize plants and to provide small but critical amounts of fixed nitrogen to plant hosts [8].
K. variicola strain DX120E was isolated from roots of sugarcane grown in Guangxi, the major sugarcane production area in China [9]. It is able to colonize sugarcane roots and shoots, to fix N2 in association with sugarcane plants, and to promote sugarcane growth [10], and thus shows potential as a biofertilizer. Here we present a summary of the features of K. variicola strain DX120E (=CGMCC 1.14935) and its complete genome sequence, which provide a genetic background for understanding its endophytic lifestyle, its plant growth-promoting potential, and its similarities to and differences from other plant-associated and clinical K. variicola isolates.

Organism information
Classification and general features
K. variicola strain DX120E is a Gram-negative, non-spore-forming, non-motile rod (Figure 1). It grows aerobically but reduces N2 to NH3 at a low pO2. It is able to grow and fix N2 on media containing 10% (w/v) cane sugar or sucrose. It forms circular, convex, smooth colonies with entire margins on solid high-sugar media. It grows best at around 30°C and pH 7 (Table 1).
Phylogenetic analysis of the 16S rRNA gene sequences from strain DX120E and strain Kp342, the type strains of the species in the genera Klebsiella and Raoultella, and the type strain of the type species of the type genus of the family Enterobacteriaceae (Escherichia coli ATCC 11775T) showed that K. variicola strains (type strain F2R9, Kp342, DX120E and LX3) were most closely related and formed a monophyletic group with K. pneumoniae and K. quasipneumoniae (Figure 2).

Genome sequencing information
Genome project history
K. variicola DX120E was selected for sequencing because it is a plant growth-promoting endophyte [10]. Its 16S rRNA gene sequence is deposited in GenBank under the accession number HQ204296. Its genome sequences are deposited in GenBank under the accession numbers CP009274, CP009275, and CP009276. A summary of the genome sequencing project information and its association with MIGS version 2.0 [11] is shown in Table 2.
Growth conditions and DNA isolation
K. variicola DX120E was grown in liquid Luria-Bertani (LB) medium at 30°C to early stationary phase. The genomic DNA was extracted from the cells by using a TIANamp bacterial DNA kit (Tiangen Biotech, Beijing, China). DNA quality and quantity were determined with a Nanodrop spectrometer (Thermo Scientific, Wilmington, USA).

Genome sequencing and assembly
The genome DNA of K. variicola DX120E was constructed into a 4-10 kb insert library and sequenced by [...] The two draft genomes were aligned by Mauve [26]. The Illumina scaffold 1 bridged the PacBio contig 1 and contig 2; the Illumina scaffold 3 bridged the PacBio contig 1, contig 2, and contig 3; the Illumina scaffold 11 bridged the circular PacBio contig 4; the Illumina scaffold 16 bridged the circular PacBio contig 5. The genome sequencing was completed by PCR and Sanger sequencing to close the contig gaps of the PacBio-sequenced genome.

Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [21].

Genome annotation
Automated genome annotation was completed by the NCBI Prokaryotic Genome Annotation Pipeline. Product description annotations were obtained by searching against the KEGG, InterPro, and COG databases. Genes with signal peptides were predicted by SignalP [27]. Genes with transmembrane helices were predicted by TMHMM [28]. Genes for tRNA were found by tRNAscan-SE [29]. Ribosomal RNAs were found by BLASTN searches against ribosomal RNA databases; 5S rRNA hits were further refined by Cmsearch [30]. Thirteen disrupted genes were replaced by the complete gene sequences obtained from the Illumina HiSeq 2000 sequencing.

Genome properties
The genome of K. variicola DX120E contains one circular chromosome and two plasmids (pKV1 and pKV2) (Table 3, Figure 3). The chromosome contains 5,501,013 nucleotides with 57.3% G + C content. The plasmid pKV1 contains 162,706 nucleotides with 50.7% G + C content. The plasmid pKV2 contains 54,715 nucleotides with 53.1% G + C content. The genome contains 5,316 [...]

The total is based on the total number of protein coding genes in the genome.
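The replicon figures above can be cross-checked against the genome-wide totals given in the summary. The following Python sketch is only an illustration: the function and variable names are hypothetical, and the genome-wide G + C content is assumed to be the length-weighted mean of the per-replicon values. Under that assumption, the three replicons add up to the reported 5,718,434 nucleotides and about 57.1% G + C, and the gene categories listed in the summary add up to 5,316.

```python
# Replicon sizes (bp) and G+C percentages as reported in the text.
REPLICONS = {
    "chromosome": (5_501_013, 57.3),
    "pKV1": (162_706, 50.7),
    "pKV2": (54_715, 53.1),
}

# Gene counts by category as reported in the summary.
GENE_COUNTS = {
    "protein_coding": 5_172,
    "rRNA": 25,
    "tRNA": 87,
    "ncRNA": 7,
    "pseudo": 25,
}


def genome_size(replicons):
    """Total number of nucleotides over all replicons."""
    return sum(size for size, _gc in replicons.values())


def weighted_gc(replicons):
    """Length-weighted mean G+C percentage over all replicons (assumed definition)."""
    total = genome_size(replicons)
    return sum(size * gc for size, gc in replicons.values()) / total


if __name__ == "__main__":
    print("genome size (bp):", genome_size(REPLICONS))
    print("G+C content (%):", round(weighted_gc(REPLICONS), 1))
    print("total genes:", sum(GENE_COUNTS.values()))
```

Because the plasmids make up less than 4% of the genome, their lower G + C content shifts the genome-wide value only slightly below that of the chromosome.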
Insights from the genome sequence
The genome of K. variicola DX120E contains genes contributing to multiple plant-beneficial functions. In accordance with the previously detected N2 fixation, indole-3-acetic acid production, siderophore production, and phosphate solubilization [9], the genome contains the nif cluster, an indole-3-pyruvate decarboxylase gene, the siderophore enterobactin synthesis genes (entABCDEF) and enterobactin exporter gene (entS), and the pyrroloquinoline quinone synthesis genes (pqqBCDEF), which contribute to these functions. Moreover, the genome contains the budABC operon for the synthesis of acetoin and 2,3-butanediol [32], and DX120E thus may induce plant systemic resistance to pathogens [33]. DX120E contains plasmids similar to those in Klebsiella relatives. The plasmid pKV1 is most similar to the plasmid pKp5-1 of the K. pneumoniae strain 5-1 (Kp5-1) [34], with 97% identity over 56% coverage (Additional file 1: Figure S1); the similar regions mainly encode transposases/recombinases and proteins functioning in plasmid replication, partitioning, and conjugal transfer. The plasmid pKV2 is most similar to the plasmid pKOXM1C of the K. oxytoca strain M1, with 96% identity over 89% coverage (Additional file 2: Figure S2); the similar regions mainly encode proteins for plasmid partitioning and phage functions.
The genome of K. variicola DX120E has high average nucleotide identities (ANI) [35] of about 99% to the available genomes of K. variicola strains DSM 15968T, At-22, Bz19, and Kp342. Bz19 was isolated from faeces of a hospitalized patient [6]. The plant-beneficial strain Kp342 is able to infect mouse organs, although it is less virulent than typical clinical K. pneumoniae isolates [36]. Kp5-1, which carries the plasmid pKp5-1 that is close to pKV1, is a cotton pathogen causing boll-rot disease [34]. The genome of strain Kp5-1 has ANI values of about 99% to the genomes of the known K. variicola strains, and Kp5-1 thus belongs to K. variicola. These findings raise concerns about the potential pathogenicity of DX120E to animals and plants. Therefore, the pathogenic potential of DX120E to animals and plants should be determined before DX120E is used as a biofertilizer in the field.
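To make the ANI-based reasoning concrete, the sketch below shows one common way an ANI value is derived from pairwise fragment alignments and compared with a species cutoff. It is not the procedure of reference [35]: the function names are hypothetical, the filters (at least 30% identity over at least 70% alignable length) and the 95% species cutoff are widely quoted conventions assumed here, and the fragment hits are invented for illustration rather than taken from the DX120E genome.

```python
def average_nucleotide_identity(hits, min_identity=30.0, min_alignable=0.70):
    """Mean percent identity of fragment hits that pass both filters.

    hits: iterable of (percent_identity, alignable_fraction) pairs, one per
    query-genome fragment aligned against the reference genome.
    The default filter values are assumed conventions, not values from the text.
    """
    kept = [
        identity
        for identity, alignable in hits
        if identity >= min_identity and alignable >= min_alignable
    ]
    if not kept:
        raise ValueError("no fragment hit passed the filters")
    return sum(kept) / len(kept)


def same_species(ani, threshold=95.0):
    """Apply an assumed ANI species cutoff (95% is a commonly used value)."""
    return ani >= threshold


# Hypothetical fragment hits for illustration only (not data from the paper).
example_hits = [(99.2, 1.00), (98.8, 0.95), (99.5, 0.99), (85.0, 0.40)]

if __name__ == "__main__":
    ani = average_nucleotide_identity(example_hits)
    print(f"ANI = {ani:.2f}%, same species: {same_species(ani)}")
```

In this sketch the last hit is discarded because it aligns over only 40% of the fragment, so poorly conserved or partly aligned regions do not lower the average.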