The complete genome sequence and analysis of a plasmid-bearing myxobacterial strain Myxococcus fulvus 124B02 (M 206081)

Myxobacteria, phylogenetically located in the delta division of the Proteobacteria, are well known for their characteristic social behaviors and large genomes of more than 9 Mb. Myxococcus fulvus is a typical species of the genus Myxococcus in the family Myxococcaceae. M. fulvus 124B02, originally isolated from a soil sample collected in Northeast China, is the only presently known myxobacterial strain that harbors an endogenous, autonomously replicating plasmid, named pMF1. This plasmid is important for understanding the genome evolution of myxobacteria and for developing genetic engineering tools for them. Here we describe the complete genome sequence of this organism. The genome of M. fulvus 124B02 consists of a circular chromosome of 11,048,835 bp and a circular plasmid of 18,634 bp. Comparative genomic analyses suggest that pMF1 has long been maintained within myxobacteria and has probably contributed to the genome expansion of myxobacteria.

Electronic supplementary material: The online version of this article (doi:10.1186/s40793-015-0121-y) contains supplementary material, which is available to authorized users.


Introduction
The gliding, Gram-negative myxobacteria are characterized by complex social behaviors, e.g. cells moving in swarms on solid surfaces, preying on other microorganisms in a 'wolf-like' pattern, and, when nutrients are depleted, differentiating into myxospores within fruiting bodies [1,2]. In addition, myxobacteria produce various secondary metabolites and macromolecule-degrading enzymes, which not only have application potential but probably also act as ecological weapons against other microorganisms [3][4][5]. Myxobacteria possess large genomes. For instance, the genomes of Myxococcus xanthus DK1622 and the halotolerant M. fulvus HW-1 are 9.14 Mb [6] and 9.03 Mb [7] in size, respectively, while the genomes of Sorangium cellulosum reach 13.03 Mb in strain So ce56 [8] and 14.78 Mb in strain So0157-2 [9]. The So0157-2 genome remains the largest prokaryotic genome reported so far.
Extrachromosomal, autonomously replicating genetic elements are normally absent from myxobacterial cells. To date, pMF1, originally discovered in M. fulvus 124B02 [10], remains the only endogenous plasmid known to replicate autonomously in myxobacterial cells. Sequencing the genome of M. fulvus 124B02 is therefore valuable both for understanding the evolution of myxobacterial genomes and for providing clues to the presence of pMF1 in strain 124B02. Here we report the complete genome sequence and analyses of M. fulvus 124B02.

Classification and features
Strain 124B02 was isolated from a soil sample collected in Northeast China [11]. Vegetative cells of the strain are slender rods with tapering ends, 0.6-0.8 × 4-8 μm. The fruiting bodies are spherical or slightly pear-shaped, 50-250 μm in diameter, and yellow-red in color. On CYE solid plates the strain did not spread but grew into membranous clumps; in liquid CYE medium, the cells grew as spherical clumps. Figure 1 shows the morphological characteristics of M. fulvus 124B02. The optimal growth pH for strain 124B02 is 6.8-7.6, and the optimal growth temperature is 26-32°C. The predominant cellular fatty acids of M. fulvus 124B02 were iso-C15:0 (33.18 %), C16:1 ω5c (20.19 %), iso-C14:0 3-OH (6.27 %), C16:0 (5.79 %) and C14:0 (5.65 %). 2-Hydroxy and 3-hydroxy fatty acids are the major hydroxy fatty acid components of strain 124B02. Figure 2 is a phylogenetic tree based on 16S rRNA gene sequences, showing the position of M. fulvus 124B02 in the suborder Cystobacterineae of myxobacteria (the GenBank accession number of the 16S rRNA gene sequence of strain 124B02 is EU137665). All three 16S rRNA gene copies in the genome of strain 124B02 are identical, but they differ by two nucleotides from the previously published 16S rRNA sequence of M. fulvus 124B02 (EU137665). Based on its morphological and phylogenetic characteristics, M. fulvus 124B02 was identified as a typical strain of Myxococcus fulvus (Table 1 shows the classification and general features of the strain).
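Checking the three genomic 16S rRNA copies against the earlier EU137665 sequence amounts to confirming that the copies are identical and counting differing columns in an alignment. A minimal Python sketch, assuming the sequences have already been aligned to equal length (gap characters included); the function names and the short sequences are hypothetical placeholders, not the real 16S genes:

```python
def all_copies_identical(copies):
    """True if every aligned gene copy has exactly the same sequence."""
    return len({c.upper() for c in copies}) == 1


def count_differences(seq_a, seq_b):
    """Count alignment columns where two pre-aligned sequences differ.

    A base aligned against a gap ('-') counts as one difference;
    columns where both sequences have a gap do not.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


# Hypothetical toy sequences standing in for the real ~1.5 kb 16S rRNA genes.
genome_copies = ["ACGTTGCA", "ACGTTGCA", "acgttgca"]
published = "ACGTAGCT"

print(all_copies_identical(genome_copies))              # True
print(count_differences(genome_copies[0], published))   # 2
```

Comparison is case-insensitive because some tools write soft-masked (lowercase) bases.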

Genome project history
This organism was selected for sequencing because of its evolutionary significance as the only presently known myxobacterial strain bearing an endogenous plasmid. The genome project of M. fulvus 124B02 was deposited in the Genomes OnLine Database (GOLD), and the complete genome sequence of strain 124B02 was deposited in GenBank under accession number CP006003. A summary of the project information is shown in Table 2.
Fig. 2 […] [35,36] of the 16S rRNA gene sequence under the maximum likelihood criterion [37] and rooted with Nannocystis excedens. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are the supporting values from 1,000 bootstrap replicates. Lineages with type strain genome sequencing projects registered in GOLD [38] are shown in blue and published genomes in bold.

Growth conditions and genomic DNA preparation
M. fulvus 124B02 was cultivated in CTT growth medium containing 1 % casitone, 10 mM Tris-HCl, 1 mM KH2PO4-K2HPO4 and 8 mM MgSO4, pH 7.6. The cells were harvested by centrifugation after five days of incubation at 30°C. DNA was extracted from the cell mass using the method described previously [12] with slight modifications. Briefly, approximately 50-100 mg of cell pellet was suspended in 500 μl TE buffer containing 25 mM Tris-HCl (pH 8.0), 25 mM EDTA and 2 mg/ml lysozyme. The mixture was incubated at 37°C for 1 h with periodic gentle inversion to lyse the cells. Then, 2.5 μl proteinase K was added to a final concentration of 100 μg/ml, and the mixture was incubated at 37°C for an additional 1 h. Proteins were removed by extraction with Tris-saturated phenol-chloroform-isoamyl alcohol (25:24:1, pH 8.0). To precipitate the DNA, 0.1 volume of 3 M sodium acetate (pH 5.3) and an equal volume of isopropyl alcohol were added to the final supernatant. The DNA pellet was washed twice with 70 % ethanol, air-dried, and dissolved in 50 μl ddH2O.
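The reagent additions in this protocol follow the usual dilution relation C_stock × V_added = C_final × V_total. The proteinase K stock concentration is not stated; the Python sketch below back-calculates it under the assumption that the 2.5 μl was added to the 500 μl lysate (V_total = 502.5 μl), and also shows the final sodium acetate concentration after adding 0.1 volume of 3 M stock. The function names are illustrative only.

```python
def required_stock(final_conc, sample_vol, added_vol):
    """Stock concentration needed so that `added_vol` of stock, mixed into
    `sample_vol`, gives `final_conc` (C_stock * V_added = C_final * V_total)."""
    total_vol = sample_vol + added_vol
    return final_conc * total_vol / added_vol


def final_concentration(stock_conc, sample_vol, added_vol):
    """Concentration reached after adding `added_vol` of stock to `sample_vol`."""
    return stock_conc * added_vol / (sample_vol + added_vol)


# Proteinase K: 2.5 ul assumed to be added to the 500 ul lysate to reach 100 ug/ml.
pk_stock = required_stock(final_conc=100.0, sample_vol=500.0, added_vol=2.5)
print(f"proteinase K stock = {pk_stock / 1000:.1f} mg/ml")   # 20.1 mg/ml

# Sodium acetate: 0.1 volume of 3 M stock (volumes in relative units).
naoac = final_concentration(stock_conc=3.0, sample_vol=1.0, added_vol=0.1)
print(f"sodium acetate = {naoac:.2f} M")                      # 0.27 M
```

A back-calculated stock of about 20 mg/ml is a commonly used proteinase K working stock, which is consistent with the stated volumes.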

Genome sequencing and assembly
Genome sequencing and assembly were performed by Shanghai Majorbio Bio-Pharm Technology Co., Ltd. The genome was sequenced with a combination of the Roche 454 GS FLX and Illumina GAII sequencing platforms. The 454 pyrosequencing reads, comprising 285.8 Mb of draft data, were first assembled using the Newbler assembler v2.3, producing 51 contigs in 23 scaffolds. This initial assembly was converted into a phrap assembly by making fake reads from the consensus, in order to collect the read pairs in the 454 paired-end library. The clean data from Illumina GAII sequencing were assembled with the Velvet assembler, and the consensus sequences were shredded into 800-bp overlapping fake reads, which were assembled with the 454 draft data. In total, the combination of the Illumina and 454 sequencing platforms produced 112.5× coverage of the genome. The final assembly contained 738,315 pyrosequencing reads and 12,776,900 Illumina reads. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC), and the Phred/Phrap/Consed software package [13][14][15] was used for quality assessment. Possible misassemblies were corrected by sequencing cloned bridging PCR fragments. We designed primers to amplify 76 gap regions in order to close gaps and improve the quality of the finished genome. Gaps between contigs were closed by editing in Consed, PCR amplification and 3730 sequencing. After the genome was circularized, erroneous bases were corrected by comparison with the Illumina GAII data using BWA (0.7.3a) [16] and SAMtools (0.1.19) [17]. The error rate of the completed genome sequence is less than 1 bp in 100,000 bp.
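Two steps in this workflow are simple enough to sketch: shredding a consensus sequence into overlapping fixed-length pseudo-reads, and computing fold coverage as total sequenced bases divided by genome length. The Python sketch below is illustrative and is not the pipeline actually used; the 800-bp read length comes from the text, whereas the 400-bp step between read starts is an assumption, since the overlap is not given. The Phred conversion shows that an error rate of 1 in 100,000 corresponds to a consensus quality of about Q50.

```python
import math


def shred(consensus, read_len=800, step=400):
    """Cut a consensus sequence into overlapping pseudo-reads of `read_len` bp.

    `step` (distance between read starts) is an assumed value; the last read
    is anchored at the sequence end so every base is covered.
    """
    if read_len <= 0 or step <= 0:
        raise ValueError("read_len and step must be positive")
    if len(consensus) <= read_len:
        return [consensus]
    starts = list(range(0, len(consensus) - read_len + 1, step))
    if starts[-1] + read_len < len(consensus):
        starts.append(len(consensus) - read_len)
    return [consensus[s:s + read_len] for s in starts]


def fold_coverage(total_bases, genome_len):
    """Average sequencing depth: total sequenced bases / genome length."""
    return total_bases / genome_len


def phred_quality(error_rate):
    """Phred score Q = -10 * log10(P_error)."""
    return -10 * math.log10(error_rate)


genome_len = 11_048_835 + 18_634        # chromosome + pMF1
print(f"{fold_coverage(285.8e6, genome_len):.1f}x from the 454 reads alone")
print(f"Q{phred_quality(1e-5):.0f}")    # Q50
print(len(shred("A" * 2000)))          # 4 reads: starts 0, 400, 800, 1200
```

By the same relation, the reported 112.5× combined coverage of the roughly 11.07 Mb genome corresponds to about 1.25 Gb of sequence across both platforms.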

Genome annotation
The genome was annotated automatically by GenBank. In addition, we predicted clustered regularly interspaced short palindromic repeats (CRISPRs) with PILER-CR [18]. For functional annotation, we searched the predicted protein sequences against the National Center for Biotechnology Information (NCBI) non-redundant database and the Gene Ontology [19], KEGG [20] and COG [21] databases. The results were summarized with the InterProScan [22] software. For the COG annotation, hits with an E-value ≤ 1e-5 were first retained, and then only the best hit was kept for each protein. Signal peptides and transmembrane helices of all annotated proteins were predicted using the SignalP 4.1 Server [23] and the TMHMM Server v. 2.0, respectively.
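The two-step COG filter described above (keep hits with E-value ≤ 1e-5, then retain only the best hit per protein) can be sketched in Python on BLAST tabular output (`-outfmt 6`, whose 12 default columns end with evalue and bitscore). This is a sketch under assumptions: the text does not say how "best" was ranked, so here the highest bit score wins and ties go to the lower E-value; the protein and COG identifiers in the example are hypothetical.

```python
def best_hits(blast_lines, max_evalue=1e-5):
    """Return {query: (subject, evalue, bitscore)} keeping one best hit per query.

    Expects BLAST tabular (outfmt 6) rows: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:          # step 1: keep E-value <= 1e-5 only
            continue
        current = best.get(query)
        # step 2: keep the highest bit score; break ties by lower E-value
        if current is None or (bitscore, -evalue) > (current[2], -current[1]):
            best[query] = (subject, evalue, bitscore)
    return best


rows = [
    "prot_1\tCOG_A\t41.2\t210\t120\t3\t1\t210\t4\t213\t1e-20\t90.5",
    "prot_1\tCOG_B\t55.0\t230\t100\t2\t1\t230\t1\t230\t1e-30\t120.1",
    "prot_2\tCOG_C\t28.0\t90\t60\t5\t10\t99\t3\t92\t0.001\t35.0",
]
hits = best_hits(rows)
print(hits)  # {'prot_1': ('COG_B', 1e-30, 120.1)}
```

Proteins with no hit passing the threshold (prot_2 here) drop out of the result and would remain without a COG assignment.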

Genome properties
The genome statistics are provided in Table 3, Table 4 and […] 68.7 %, respectively. A total of 8,515 coding sequences (CDSs) were predicted in the genome, including 9 rRNAs and 80 tRNAs. The protein-coding sequences occupy 86.13 % of the whole genome sequence. The majority of the protein-coding genes (5,042, 58.24 % of the total) were assigned putative functions in clusters of orthologous groups (COG) categories, while the remaining ones were annotated as hypothetical proteins. The distribution of genes in COG functional categories is presented in Table 5.
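The text does not say how the 86.13 % coding fraction was computed. One common approach, sketched below in Python as an assumption, merges overlapping CDS intervals so that shared bases are counted once and divides by replicon length. Coordinates are 1-based and inclusive, and a CDS whose start exceeds its end is treated as spanning the origin of the circular replicon; the coordinates in the example are hypothetical.

```python
def coding_fraction(cds_coords, replicon_len):
    """Fraction of a circular replicon covered by at least one CDS.

    cds_coords: iterable of (start, end), 1-based inclusive. start > end means
    the CDS wraps around the origin and is split into two pieces.
    """
    intervals = []
    for start, end in cds_coords:
        if start <= end:
            intervals.append((start, end))
        else:  # CDS spans the origin of the circular replicon
            intervals.append((start, replicon_len))
            intervals.append((1, end))
    intervals.sort()

    covered = 0
    cur_start = cur_end = None
    for s, e in intervals:
        if cur_end is None or s > cur_end + 1:   # gap: close the current block
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = s, e
        else:                                    # overlapping or adjacent: extend
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / replicon_len


# Hypothetical 100-bp replicon: two overlapping CDSs and one wrapping the origin.
print(coding_fraction([(1, 10), (5, 20), (90, 5)], 100))  # 0.31
```

For a multi-replicon genome such as this one (chromosome plus pMF1), the covered bases would be summed per replicon before dividing by the total genome length.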

Insights from the genome sequence
Until now, 22 […] [25,26] and intra-chromosomal gene duplication (IGD) [27,28] are two major contributors to the expansion of most prokaryotic genomes. BLASTP searches against the other three sequenced Myxococcus genomes revealed 576 strain-specific duplications in the strain 124B02 genome (the core circle in Fig. 3), accounting for 6.7 % of the total CDSs. Exogenous genetic material may be introduced into bacterial genomes via plasmids, prophages, viruses, integrative conjugative elements, insertion sequence elements or other unclassified elements [29]. Of the 8,492 CDSs in the M. fulvus 124B02 genome, 3,926 (46.2 %) were probably derived from plasmids (circles 9 & 10 in Fig. 3), a proportion similar to that in other myxobacteria [6,9]. We conducted an all-against-all BLASTP analysis with an E-value cutoff of 1e-5, and the results were passed to the OrthoMCL package to extract paralogous and orthologous proteins. Interestingly, the phylogenomic analysis indicated that M. fulvus 124B02 is closer to M. stipitatus DSM 14675 than to M. xanthus DK1622 or M. fulvus HW-1 (Fig. 4a), which was also supported by the genome synteny analysis (Fig. 4b). The major differences between M. fulvus 124B02 and the other three Myxococcus strains lay in proteins involved in metabolism and environmental adaptation [Additional file 1: Table S1, Additional file 2: Table S2, Additional file 3: Table S3] and in the strain-specific genes. For example, according to the COG classification, the major differences between M. fulvus 124B02 and M. stipitatus DSM 14675 were in the categories of lipid transport and metabolism (p = 0.0076), transcription (p = 0.0097), secondary metabolites biosynthesis, transport and catabolism (p = 0.0015), and replication, recombination and repair (p = 0.0251) (all two-tailed Fisher's exact tests). M. fulvus 124B02 has strain-specific fragments totaling approximately 1,230 kb, scattered throughout the whole genome (circles 5 & 6 in Fig. 3). Additional file 4: Table S4 lists the strain-specific genes of the replication, recombination and repair category; M. fulvus 124B02 has fewer such genes than M. stipitatus DSM 14675.

pMF1 is a low-copy-number plasmid containing 23 predicted ORFs [10]. The plasmid carries no obvious genes that would benefit its persistence in the host, such as genes encoding antibiotic resistance, virulence or growth phenotypes. All predicted genes in pMF1 are of unknown function except the replication system (pMF1.[…]) and the partitioning system (pMF1.21-pMF1.23), both of which were identified by narrowing down sequence fragments [10,30,31]. While pMF1.19 and pMF1.20 were on the lagging strand, the other genes were located on the leading strand (Fig. 5). Interestingly, BLASTP searching […] (Fig. 6c). It is also noteworthy that, although pMF1 carries no gene coding for a mobility system [32] and we have not yet observed conjugative transfer of the plasmid between Myxococcus strains, pMF1.2 and its homologue MYSTI_04155 contain AAA_10 and TraC-F-type motifs, both of which have been reported to be related to conjugative transfer [33,34].