Non-contiguous finished genome sequence and description of Gorillibacterium massiliense gen. nov., sp. nov., a new member of the family Paenibacillaceae

Strain G5T is the type strain of Gorillibacterium massiliense gen. nov., sp. nov., a newly proposed genus within the family Paenibacillaceae. This strain, whose genome is described here, was isolated in France from a stool sample of a wild Gorilla gorilla subsp. gorilla from Cameroon. G. massiliense is a facultatively anaerobic, Gram-negative rod. Here we describe the features of this bacterium, together with its genome sequence and annotation. The 5,546,433-bp genome (one chromosome, no plasmid) contains 5,145 protein-coding and 76 RNA genes, including 69 tRNA genes.


Introduction
Strain G5T (= CSUR P290 = DSM 27179) is the type strain of Gorillibacterium massiliense gen. nov., sp. nov. This bacterium, which is proposed to belong to the family Paenibacillaceae, is a Gram-negative, flagellated, facultatively anaerobic, indole-negative bacillus that was isolated from a fecal sample of a wild western lowland gorilla from Cameroon during a culturomics study of the bacterial diversity of wild gorilla feces. This technique has been used successfully to explore the human gut microbiota, allowing the isolation of many new species and genera [1][2][3]. The recently proposed strategy of combining high-throughput genome sequencing and MALDI-TOF spectral analysis of cellular proteins with more traditional methods of phenotypic characterization has proven useful for the description of new bacterial taxa [4][5][6][7][8][9][10][11][12][13][14][15]. A principal advantage is that this approach circumvents the vagaries of methods that rely mainly on DNA-DNA hybridization to delineate species. Here, we applied this polyphasic approach to describe G. massiliense gen. nov., sp. nov. strain G5T.
Members of this family have been isolated mainly from soil, roots, blood, feces and other sources [16]. To the best of our knowledge, this is the first report of the isolation of a novel genus from the fecal flora of a gorilla. Here we present a summary classification and a set of features for G. massiliense gen. nov., sp. nov. strain G5T (= CSUR P290 = DSM 27179), together with the description of its genome sequence and annotation. These characteristics support the circumscription of a novel genus, Gorillibacterium gen. nov., within the family Paenibacillaceae, with Gorillibacterium massiliense gen. nov., sp. nov. as the type species.

Classification and features
In July 2011, a fecal sample was collected from a wild Gorilla gorilla subsp. gorilla near Minton, a village in the south-central part of the Dja Faunal Park (Cameroon). Collection of the stool sample was approved by the Ministry of Scientific Research and Innovation of Cameroon. No experiments were conducted on this gorilla. The fecal specimen was preserved at -80°C after collection and sent to Marseille. Strain G5T (Table 1) was isolated in August 2012 by aerobic cultivation at 37°C on a sterilized soil medium (12 g of soil collected at latitude N 43° 17' 20.151'', longitude E 5° 24' 15.3822'', with agar at 14 g/l). This strain exhibited 93.72% 16S rRNA gene sequence similarity with Paenibacillus turicensis, the phylogenetically closest validly published Paenibacillus species (Figure 1). This value is lower than the 95.0% 16S rRNA gene sequence similarity threshold recommended by Stackebrandt and Ebers for delineating a new genus without carrying out DNA-DNA hybridization [37].
Table 1 (partially recovered): Phylum Firmicutes (TAS [28][29][30]); Class Bacilli (TAS [31,32]); Order Bacillales (TAS [33,34]); Family Paenibacillaceae (TAS [16,32]); Genus …
Table 1 evidence codes: … (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [35]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
Figure 1: Sequences were aligned using CLUSTAL X (V2), and phylogenetic inferences were obtained using the maximum-likelihood method within the MEGA 5 software [36]. Numbers at the nodes are bootstrap percentages obtained by repeating the analysis 1,000 times to generate a majority consensus tree. Brevibacillus brevis was used as the outgroup. The scale bar represents 2% nucleotide sequence divergence.
Different growth temperatures (25, 30, 37 and 45°C) were tested. No growth occurred at 45°C; growth occurred between 25 and 37°C, with optimal growth at 37°C. Colonies were bright grey with a diameter of 1.0 mm on 5% blood-enriched Columbia agar. Growth of the strain was tested under anaerobic and microaerophilic conditions using the GENbag anaer and GENbag microaer systems, respectively (BioMérieux), and under aerobic conditions, with or without 5% CO2. Growth was observed under anaerobic and microaerophilic conditions, but optimal growth was obtained aerobically. Gram staining showed Gram-negative rods (Figure 2). A motility test was negative. Cells grown on agar did not sporulate; the rods exhibited peritrichous flagella and had a mean length of 1.75 µm and a mean diameter of 0.67 µm, as determined by negative-staining transmission electron microscopy (Figure 3).

Genome sequencing information
Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rRNA similarity to other members of the family Paenibacillaceae, and is part of a "culturomics" study of the gorilla flora aiming to isolate all bacterial species within gorilla feces. It was the 81st genome of the family Paenibacillaceae and the first genome of Gorillibacterium massiliense gen. nov., sp. nov. The GenBank accession number is CBQR000000000, and the sequence consists of 176 large contigs. A summary of the project information and its association with MIGS version 2.0 compliance [43] is shown in Table 3.

Growth conditions and DNA isolation
Gorillibacterium massiliense gen. nov., sp. nov. strain G5T (= CSUR P290 = DSM 27179) was grown aerobically on 5% sheep blood-enriched Columbia agar at 37°C. Bacteria from four Petri dishes were harvested, resuspended in 3 × 500 µl of TE buffer and stored at -80°C. Then, 500 µl of this suspension was thawed, centrifuged for 3 minutes at 10,000 rpm and resuspended in 3 × 100 µl of G2 buffer (EZ1 DNA Tissue kit, Qiagen). A first mechanical lysis was performed with glass powder on the FastPrep-24 device (Sample Preparation system, MP Biomedicals, USA) using two 20-second cycles. DNA was then treated with 2.5 µg/µl lysozyme (30 minutes at 37°C) and extracted using the BioRobot EZ1 Advanced XL (Qiagen). The DNA was then concentrated and purified using the QIAamp kit (Qiagen). The DNA concentration, measured with the Quant-iT PicoGreen kit (Invitrogen) on a Genios Tecan fluorometer, was 50 ng/µl.

Genome sequencing and assembly
The paired-end library was prepared with 5 µg of bacterial DNA, fragmented on a Covaris S-Series (S2) instrument (Woburn, Massachusetts, USA) with an enrichment size of 4.5 kb. DNA fragmentation was visualized with an Agilent 2100 BioAnalyzer on a DNA LabChip 7500. The library was constructed according to the 454 GS FLX Titanium paired-end protocol (Roche). Circularization and nebulization were performed and generated a fragment-size pattern with an optimum at 510 bp. After 17 cycles of PCR amplification followed by double size selection, the single-stranded paired-end library was quantified at 68 pg/µL using a BioAnalyzer 2100 on an RNA Pico 6000 LabChip. The library concentration equivalence was calculated as 2.45E+08 molecules/µL. The library was stored at -20°C until further use. The paired-end library was clonally amplified at 0.25 and 0.5 copies per bead (cpb) in two emPCR reactions with the GS Titanium SV emPCR Kit (Lib-L) v2 (Roche). The emPCR yields were 5% and 6%, respectively, within the 5 to 20% range recommended by the Roche procedure.
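The molecule-based library concentration can be derived from the mass concentration and the fragment length. The sketch below is a minimal illustration only, assuming an average mass of 330 g/mol per nucleotide of single-stranded DNA and taking the 510-bp size optimum as the mean fragment length; neither constant is stated in the protocol above, and the function name `molecules_per_ul` is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_per_ul(conc_pg_per_ul: float, mean_length_nt: float,
                     nt_mass_g_per_mol: float = 330.0) -> float:
    """Convert a single-stranded DNA mass concentration to molecules/uL.

    nt_mass_g_per_mol is an ASSUMED average mass of one ssDNA nucleotide.
    """
    grams_per_ul = conc_pg_per_ul * 1e-12                  # pg -> g
    grams_per_mole = mean_length_nt * nt_mass_g_per_mol    # molar mass of one fragment
    return grams_per_ul / grams_per_mole * AVOGADRO


if __name__ == "__main__":
    # Values from the text: 68 pg/uL library, size optimum at 510 bp
    print(f"{molecules_per_ul(68, 510):.3e} molecules/uL")
```

Under these assumptions the function gives about 2.43 × 10^8 molecules/µL, close to the reported 2.45E+08; the small gap depends on the per-nucleotide mass and fragment length assumed.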
Approximately 790,000 beads were loaded twice (i.e., two runs were performed using the same paired-end library) on a ¼ region of the GS Titanium PicoTiterPlate (PTP Kit 70×75) and sequenced with the GS FLX Titanium Sequencing Kit XLR70 (Roche). The two runs were performed overnight and then analyzed on the cluster with gsRunBrowser and the Newbler assembler (Roche). A total of 387,157 passed-filter wells were obtained, generating 142.7 Mb of sequence with an average read length of 369 bp. The passed-filter sequences were assembled with Newbler using a 90% identity threshold and a 40-bp minimum overlap. The final assembly comprised 12 scaffolds and 176 large contigs (>1.5 kb), for a genome size of 5.5 Mb and a genome coverage of 25.71×.
… [46] was used to find tRNA genes, whereas ribosomal RNAs were found using RNAmmer [47] and BLASTn against the GenBank database. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids; for alignment lengths smaller than 80 amino acids, an E-value threshold of 1e-05 was used.
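The genome coverage of 25.71× reported for the assembly follows from dividing the total sequence yield by the genome length. A minimal sketch of that arithmetic, using the 142.7 Mb yield and the 5,546,433-bp genome size given in this paper (the function name `mean_coverage` is hypothetical):

```python
def mean_coverage(total_bases: float, genome_length_bp: float) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome length."""
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    return total_bases / genome_length_bp


# Values reported in this paper
TOTAL_BASES = 142.7e6       # 142.7 Mb of passed-filter sequence
GENOME_LENGTH = 5_546_433   # assembled genome size in bp

if __name__ == "__main__":
    print(f"coverage ~ {mean_coverage(TOTAL_BASES, GENOME_LENGTH):.2f}x")
```

This gives about 25.7×, consistent with the reported value; the residual difference is within the rounding of the 142.7 Mb figure.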

Genome annotation
To estimate the mean level of nucleotide sequence similarity at the genome level between G. massiliense, two other members of the family Paenibacillaceae and Brevibacillus brevis, we used the Average Genomic Identity of Orthologous gene Sequences (AGIOS), a custom application we developed. Briefly, AGIOS combines the Proteinortho software [48], which detects orthologous proteins between pairs of genomes, then retrieves the corresponding genes and determines the mean percentage of nucleotide sequence identity among orthologous ORFs using the Needleman-Wunsch global alignment algorithm.
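The identity step described above can be illustrated with a minimal Needleman-Wunsch sketch. This is not the AGIOS code: the scoring scheme (match +1, mismatch -1, linear gap -1), the definition of identity (identical aligned positions divided by alignment length) and all function names are assumptions, and ortholog detection by Proteinortho is not reproduced; the input is taken to be already-paired orthologous nucleotide sequences.

```python
from statistics import mean

MATCH, MISMATCH, GAP = 1, -1, -1  # ASSUMED scoring scheme, not AGIOS's settings


def needleman_wunsch(a: str, b: str) -> tuple[str, str]:
    """Global alignment with a linear gap penalty; returns the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + GAP,
                              score[i][j - 1] + GAP)
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                MATCH if a[i - 1] == b[j - 1] else MISMATCH):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])   # gap in b
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")        # gap in a
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of alignment length."""
    al_a, al_b = needleman_wunsch(a, b)
    if not al_a:
        return 0.0
    same = sum(1 for x, y in zip(al_a, al_b) if x == y and x != "-")
    return 100.0 * same / len(al_a)


def mean_ortholog_identity(pairs: list[tuple[str, str]]) -> float:
    """Average percent identity over already-paired orthologous genes."""
    return mean(percent_identity(a, b) for a, b in pairs)
```

For example, `mean_ortholog_identity([("ACGT", "ACGT"), ("ACGT", "ACGA")])` averages 100% and 75% identity to 87.5. This quadratic-memory version is only practical for short sequences; whole-genome comparisons of several thousand ORFs would normally rely on an optimized aligner.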

Genome properties
The genome is 5,546,433 bp long with a 50.39% G+C content (Figure 6 and Table 4). It is composed of 189 contigs (176 large contigs, 12 scaffolds). Of the 5,221 predicted genes, 5,145 were protein-coding genes and 76 were RNA genes (one 16S rRNA, one 23S rRNA, five 5S rRNA and 69 tRNA genes). A total of 3,865 genes (75.12%) were assigned a putative function (by COGs or by BLAST against the NR database). In addition, 272 genes (5.29%) were identified as ORFans. The remaining genes were annotated as hypothetical proteins (680 genes, 13.22%). The distribution of genes into COG functional categories is presented in Table 5. The properties and statistics of the genome are summarized in Tables 4 and 5.
Table 4 (partially recovered): genes with transmembrane helices, 1,267 genes, 24.63% of the total (a).
(a) The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
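The gene percentages given in this paragraph are all relative to the 5,145 protein-coding genes, as the Table 4 footnote indicates. The following sketch, with a hypothetical helper name, reproduces them from the reported counts and checks that the gene totals are internally consistent.

```python
PROTEIN_CODING = 5145
RNA_GENES = {"16S rRNA": 1, "23S rRNA": 1, "5S rRNA": 5, "tRNA": 69}

# Gene counts reported in this section
CATEGORIES = {
    "putative function (COGs or NR BLAST)": 3865,
    "ORFans": 272,
    "hypothetical proteins": 680,
    "transmembrane helices": 1267,
}


def percent_of_protein_coding(count: int, total: int = PROTEIN_CODING) -> float:
    """Share of protein-coding genes, rounded to two decimals as in Table 4."""
    return round(100 * count / total, 2)


if __name__ == "__main__":
    total_rna = sum(RNA_GENES.values())
    assert total_rna == 76                      # RNA genes
    assert PROTEIN_CODING + total_rna == 5221   # total predicted genes
    for name, count in CATEGORIES.items():
        print(f"{name}: {count} ({percent_of_protein_coding(count)}%)")
```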

Genomic comparison of G. massiliense and other members of the family Paenibacillaceae
The genome of G. massiliense strain G5T was compared to those of P. elgii strain B69, P. alvei strain DSM 29 and B. brevis strain NBRC 100599 (Tables 6A and 6B). The draft genome of G. massiliense is smaller than those of P. elgii, P. alvei and B. brevis (5.54 vs 7.96, 6.83 and 6.3 Mb, respectively). G. massiliense has a lower G+C content than P. elgii (50.39% vs 52.6%) but a higher one than P. alvei and B. brevis (50.39% vs 45.9% and 47.3%, respectively). G. massiliense also encodes fewer predicted proteins than P. elgii, P. alvei and B. brevis (5,146 vs 7,597, 6,823 and 5,946, respectively) (Tables 6A and 6B). In addition, G. massiliense shares 2,122, 1,846 and 1,716 orthologous genes with P. elgii, P. alvei and B. brevis, respectively (Table 6). The nucleotide sequence identity of orthologous genes ranges from 66 to 67.6% among the previously published genomes, and from 65.3 to 68.7% between G. massiliense and the other studied genomes (Tables 6A and 6B). Table 6 summarizes the number of orthologous genes and the average percentage of nucleotide sequence identity between the different genomes studied.

Conclusion
On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Gorillibacterium massiliense gen. nov., sp. nov., which contains strain G5T. This bacterium was isolated from a stool sample of a wild gorilla collected in Cameroon.
Description of Gorillibacterium gen. nov.
G. massiliense is a Gram-negative rod. Facultatively anaerobic. Mesophilic. Optimal growth is achieved at 37°C. Non-sporulating and nonmotile. Colonies are bright gray and 0.5-1 mm in diameter on blood-enriched Columbia agar. Cells are rod-shaped, with a mean diameter of 0.67 µm and a mean length of 1.75 µm.