Hugonella massiliensis gen. nov., sp. nov., genome sequence, and description of a new strictly anaerobic bacterium isolated from the human gut

Abstract The human gut harbors a large diversity of microorganisms, many of which remain poorly described. Here, using culturomics, a new concept based on the variation of culture conditions and MALDI‐TOF MS identification, we explored the microbial diversity of the complex ecosystem of the human gut. Using this approach, we isolated strain AT8T (=CSUR P2118 = DSM 101782) from a stool specimen collected from a 51‐year‐old obese French woman. Strain AT8T is a strictly anaerobic, nonmotile, nonspore‐forming gram‐positive coccus that does not exhibit catalase or oxidase activity. 16S rDNA‐based identification of strain AT8T demonstrated 92% gene sequence similarity with Eggerthella lenta DSM 2243, the phylogenetically closest validly named type species. Here, we present a set of features for strain AT8T and the description of its complete genome sequence and annotation. The 2,091,845 bp long genome has a G+C content of 63.46% and encodes 1,849 predicted genes; 1,781 were protein‐coding genes, and 68 were RNAs. On the basis of the characteristics reported here, we propose the creation of a new bacterial genus Hugonella gen. nov., belonging to the Eggerthellaceae family and including Hugonella massiliensis gen. nov., sp. nov., strain AT8T as the type strain.


| INTRODUCTION
The human gut harbors a complex bacterial community known as the microbiota. However, this ecosystem remains incompletely characterized and its diversity poorly described (Eckburg et al., 2005; Lozupone, Stombaugh, Gordon, Jansson, & Knight, 2012). The culturomics concept was recently proposed as a new alternative to explore this ecosystem and enrich the human microbiota repertoire. This method is based on wide variation in culture conditions and the use of rapid bacterial identification methods such as matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) and 16S rRNA gene amplification and sequencing of the colonies (Lagier et al., 2015). Traditionally, several parameters have been used to identify and define a new bacterial species, including 16S rRNA gene sequencing and phylogeny, genomic G+C content diversity, DNA-DNA hybridization (DDH), and intensive phenotypic and chemotaxonomic characterization (Ramasamy, Mishra, Lagier, Padhmanabhan, & Rossi, 2014; Welker & Moore, 2011). Nevertheless, some limitations have appeared, notably because the cutoff values vary dramatically between species and genera (Rosselló-Móra, 2006). Therefore, in order to describe new bacterial species, we recently proposed a new method named taxonogenomics, which includes both genomic analysis and proteomic information obtained by MALDI-TOF analysis (Ramasamy et al., 2014).
Using culturomics techniques (Lagier et al., 2015), we isolated strain AT8T from a stool specimen of a 51-year-old obese French woman (BMI 44.38 kg/m2). Here, we present a classification and a set of characteristics of strain AT8T together with the description of its complete genome sequence and annotation, which allowed us to describe it as the first representative of a new bacterial genus classified in the Eggerthellaceae family within the phylum Actinobacteria. The Eggerthellaceae family contains nine genera: Adlercreutzia, Asaccharobacter, Cryptobacterium, Denitrobacterium, Eggerthella, Enterorhabdus, Gordonibacter, Paraeggerthella, and Slackia (Gupta et al. 2006). The species of this family are strictly anaerobic cocci and they do not form spores (Gupta et al. 2006).

| Ethics and sample collection
The stool sample was collected from a 51-year-old obese French woman (BMI 44.38 kg/m2; weight 108 kg, height 1.56 m) in January 2012. Written consent was obtained from the patient at the Nutrition, Metabolic Disease and Endocrinology service at La Timone Hospital (Marseille, France). The study and the consent procedures were approved by the local IFR 48 ethics committee, under consent number 09-022, 2010. The stool sample was stored at −80°C after collection.

| Isolation of the strain
Strain AT8T was isolated in June 2015 by anaerobic culture. Approximately 1 g of stool specimen was inoculated anaerobically in an anaerobic blood culture bottle supplemented with 5% (v/v) sheep blood and 5% (v/v) rumen fluid. The pH was adjusted to 7.5 using KOH solution (10%), and the blood culture bottle was incubated at 37°C for 3 days. After 3 days of incubation, subcultures were made on solid medium consisting of Columbia agar supplemented with 5% sheep blood and incubated anaerobically for 48 hr. All growing colonies were picked several times to obtain pure cultures.

| Strain identification by MALDI-TOF MS and 16S rRNA gene sequencing
For MALDI-TOF MS protein analysis, an isolated colony was picked and spotted as twelve distinct deposits on an MTP 96 MALDI-TOF target plate (Bruker Daltonics, Leipzig, Germany). Each spot was overlaid with 2 μl of matrix solution (saturated solution of α-cyano-4-hydroxycinnamic acid diluted in 50% acetonitrile and 2.5% trifluoroacetic acid). Measurements and proteomic analysis of the isolate were carried out with a Microflex spectrometer (Bruker) as previously described (Seng et al., 2009).
Protein spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the spectra of the Bruker database (constantly incremented with our new spectra). A score >1.9 enabled identification at the species level, and a score <1.7 did not enable any identification. When the bacterium is not referenced in the database, sequencing of the 16S rRNA gene is needed to achieve identification. The 16S rRNA gene amplification and sequencing were performed as previously described (Morel et al., 2015). Below similarity thresholds of 98.65% and 95%, a new species or a new genus was suggested, respectively, as proposed by Kim, Oh, Park, & Chun (2014).
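The decision rules described above can be summarized in a short sketch. This is only an illustration of the stated cutoffs, not the BioTyper software's own logic: the function names and the returned labels are hypothetical, and the handling of MALDI-TOF scores between 1.7 and 1.9 is an assumption, since the text does not say how that band is interpreted.

```python
def interpret_maldi_score(score: float) -> str:
    """Map a MALDI-TOF matching score to the outcome stated in the text."""
    if score > 1.9:
        return "species-level identification"
    if score < 1.7:
        return "no identification"
    # Assumption: the text does not describe scores from 1.7 to 1.9.
    return "inconclusive"


def interpret_16s_similarity(similarity_pct: float) -> str:
    """Apply the 95% (genus) and 98.65% (species) similarity thresholds."""
    if similarity_pct < 95.0:
        return "putative new genus"
    if similarity_pct < 98.65:
        return "putative new species"
    return "no novelty suggested"


# The 92% similarity reported for strain AT8T falls below the genus cutoff.
print(interpret_16s_similarity(92.0))  # putative new genus
```

Both checks are needed in practice because a failed MALDI-TOF match only indicates that the spectrum is absent from the database, whereas the 16S rRNA similarity is what places the isolate relative to named taxa.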

| Phylogenetic tree
A custom Python script was used to automatically retrieve all species from the same order as the new genus and download their 16S sequences from NCBI, by parsing NCBI eutils results and the NCBI taxonomy page. It only keeps sequences from type strains. In case of multiple sequences for one type strain, it selects the sequence with the best identity rate in the BLASTn alignment with our sequence. The script then separates the 16S sequences into two groups: one containing the sequences of strains from the same family (group a) and one containing the others (group b). Finally, it keeps only the 15 closest strains from group a and the closest one from group b. If it is impossible to get 15 sequences from group a, the script selects more sequences from group b to get at least nine strains from both groups.
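The selection step of that script might look roughly like the following sketch. The original script is not shown in the text, so everything here is hypothetical: the `Hit` record, the function name, and the parameter names are illustrative, the input is assumed to be already restricted to type strains, and "at least nine strains from both groups" is read as a combined minimum of nine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    strain: str      # type strain name (hypothetical field)
    family: str      # taxonomic family of the strain
    identity: float  # BLASTn percent identity against the query 16S sequence


def select_reference_strains(hits, query_family, max_same_family=15, min_total=9):
    """Pick reference strains for the tree from BLASTn hits of type strains."""
    # Keep only the best-matching sequence for each type strain.
    best = {}
    for hit in hits:
        current = best.get(hit.strain)
        if current is None or hit.identity > current.identity:
            best[hit.strain] = hit

    ranked = sorted(best.values(), key=lambda h: h.identity, reverse=True)
    group_a = [h for h in ranked if h.family == query_family]  # same family
    group_b = [h for h in ranked if h.family != query_family]  # other families

    selected_a = group_a[:max_same_family]
    # One strain from group b by default; more if group a is too small
    # to reach the assumed combined minimum.
    n_from_b = max(1, min_total - len(selected_a))
    return selected_a + group_b[:n_from_b]
```

With two same-family strains available, for example, the sketch returns those two plus the seven closest strains from other families, so the tree still has nine reference strains.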

| Growth conditions
Growth of strain AT8T was tested under anaerobic and microaerophilic conditions using GENbag anaer and GENbag microaer systems, respectively (bioMérieux, Marcy l'Etoile, France), and under aerobic conditions, with or without 5% CO2. Different temperatures (25, 30, 37, and 45°C) were tested to determine the optimal growth temperature of strain AT8T.
The optimal salt concentration for growth was determined by growing the strain at 0, 0.5, 1, and 1.5% NaCl. The optimal pH for growth was determined by testing pH values of 5, 6, 6.5, 7, 7.5, 8, and 8.5.

| Morphological, biochemical, and antibiotic susceptibility tests
The sporulation assay was performed by thermal shock at 60°C for 20 min followed by subculture on 5% sheep blood-enriched Columbia agar medium (bioMérieux). Motility of the strain from a fresh culture was observed using a DM1000 photonic microscope (Leica Microsystems, Nanterre, France) with a 100X objective lens. The colony surface was observed on 5% sheep blood agar culture medium after 24 hr of incubation at 37°C. To observe cell morphology, cells were fixed with 2.5% glutaraldehyde in 0.1 mol/L cacodylate buffer for at least 1 hr at 4°C. A drop of the cell suspension was deposited for approximately 5 min on glow-discharged formvar carbon film on 400 mesh nickel grids (FCF400-Ni, EMS).
The grids were dried on blotting paper and cells were negatively stained for 10 s with 1% ammonium molybdate solution in filtered water at room temperature. Electron micrographs were acquired with a Tecnai G20 Cryo (FEI) transmission electron microscope operated at 200 keV.
Gram staining was performed using the Color Gram 2 kit (bioMérieux) and observed using a DM1000 photonic microscope (Leica Microsystems).
For the biochemical characterization assays, API ZYM and API 50 CH strips (bioMérieux) were used according to the manufacturer's instructions. Cellular fatty acid methyl ester (FAME) analysis was performed by gas chromatography/mass spectrometry (GC/MS).
Two samples were prepared with approximately 20 mg of bacterial biomass per tube harvested from several culture plates. FAMEs were prepared as described by Sasser (2006). GC/MS analyses were carried out as described before (Dione et al., 2016). Briefly, FAMEs were separated using an Elite 5-MS column and monitored by mass spectrome-

| DNA extraction and genome sequencing and assembly
Strain AT8T was cultured on ten petri dishes with 5% sheep blood Columbia agar. Genomic DNA (gDNA) of strain AT8T was extracted in two steps. A mechanical treatment was first performed with acid-washed glass beads (G4649-500 g, Sigma) using a FastPrep BIO 101 instrument (Qbiogene, Strasbourg, France) at maximum speed (6.5) for 3 × 30 s. Then, after a 2 hr lysozyme incubation at 37°C, DNA was extracted on the EZ1 biorobot (Qiagen) with the EZ1 DNA tissue kit.
The elution volume was 50 μl. gDNA was quantified at 29.1 ng/μl by a Qubit assay with the high sensitivity kit (Life Technologies, Carlsbad, CA, USA). A total of 10 Gb of information was obtained from a cluster density of 690 K/mm2, with 94.5% of clusters passing quality control filters (16,542,000 passing-filter paired reads). Within this run, the index representation for strain AT8T was determined to be 7.36%.
The 1,218,050 paired reads were trimmed and then assembled into nine scaffolds.
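As a consistency check, the reported index representation can be recomputed from the two read counts given above. The sketch below assumes that the index representation is simply the strain's share of the passing-filter paired reads of the run; the function name is illustrative.

```python
def index_representation(sample_reads: int, run_reads: int) -> float:
    """Percentage of the run's passing-filter paired reads assigned to one sample."""
    return 100.0 * sample_reads / run_reads


share = index_representation(1_218_050, 16_542_000)
print(f"{share:.2f}%")  # 7.36%
```

Under that assumption the two read counts agree with the 7.36% stated in the text.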

| Genome annotation and comparison
The prediction of open reading frames (ORFs) was performed with Prodigal (http://prodigal.ornl.gov/) using default parameters, but predicted ORFs were excluded if they spanned a sequencing gap region (containing N). The predicted bacterial protein sequences were searched against the clusters of orthologous groups (COG) database (Galperin, Makarova, Wolf, & Koonin, 2015) using BLASTP (E-value 1e-03, coverage 0.7, and identity percent 30%). If no hit was found, the sequence was searched against the NR database using BLASTP with an E-value of 1e-03, coverage of 0.7, and identity percent of 30%; if the sequence length was smaller than 80 amino acids, we used an E-value of 1e-05. tRNA genes were found with the tRNAScan-SE tool, whereas ribosomal RNAs were found using RNAmmer (Lagesen et al., 2007; Lowe & Eddy, 1997). Lipoprotein signal peptides and the number of transmembrane helices were predicted using Phobius (Käll, Krogh, & Sonnhammer, 2004). ORFans were identified if none of the BLASTP searches gave positive results (E-value smaller than 1e-03 for ORFs with a sequence length larger than 80 aa, or E-value smaller than 1e-05 for ORFs with a sequence length smaller than 80 aa). The XEGEN software (Phylopattern) allowed us to automatically retrieve genomes from the 16S rRNA tree (Gouret, Thompson, & Pontarotti, 2009). For each selected species, the complete genome sequence, proteome sequence, and ORFeome sequence were retrieved from the NCBI FTP site. The proteomes were analyzed with proteinOrtho (Lechner et al., 2011). Then, for each pair of genomes, a similarity score (the mean nucleotide similarity between all pairs of orthologues shared by the two genomes) was computed with the AGIOS software (Average Genomic Identity Of gene Sequences) (Ramasamy et al., 2014). All proteomes were annotated to determine the distribution of predicted genes among functional classes according to the clusters of orthologous groups of proteins.
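The length-dependent E-value rule, the ORFan definition, and the AGIOS score described above can be sketched as follows. This is a simplified illustration and not the actual pipeline: all names are hypothetical, ORFs of exactly 80 aa are assumed to use the 1e-03 cutoff (the text leaves this case open), and the coverage and identity filters are assumed to apply to the ORFan test as well, although the text mentions only the E-value there.

```python
from statistics import mean

MIN_COVERAGE = 0.7       # alignment coverage threshold from the text
MIN_IDENTITY_PCT = 30.0  # percent identity threshold from the text
SHORT_ORF_AA = 80        # length below which the stricter E-value applies


def evalue_cutoff(length_aa: int) -> float:
    """Return the E-value cutoff for a protein of the given length."""
    # Assumption: a sequence of exactly 80 aa uses the 1e-03 cutoff.
    return 1e-5 if length_aa < SHORT_ORF_AA else 1e-3


def is_positive_hit(evalue, coverage, identity_pct, length_aa) -> bool:
    """True if a BLASTP hit passes all three thresholds."""
    return (evalue < evalue_cutoff(length_aa)
            and coverage >= MIN_COVERAGE
            and identity_pct >= MIN_IDENTITY_PCT)


def is_orfan(hits, length_aa) -> bool:
    """hits: (evalue, coverage, identity_pct) tuples from the COG and NR searches."""
    return not any(is_positive_hit(e, c, i, length_aa) for e, c, i in hits)


def agios(ortholog_identities) -> float:
    """Mean nucleotide identity (%) over all orthologous gene pairs of two genomes."""
    return mean(ortholog_identities)
```

For instance, `is_orfan([(0.5, 0.9, 45.0)], 200)` is true because the only hit has an E-value above the cutoff, while a 60 aa ORF with a hit at 1e-04 is also treated as an ORFan because the stricter 1e-05 cutoff applies to short sequences.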
The multi-agent software system DAGOBAH, which includes Figenix libraries that provide pipeline analysis and Phylopattern for tree manipulation, was used to perform the annotation and comparison processes (Gouret et al., 2011). Genome-to-Genome Distance Calculator (GGDC) analysis
TABLE 1 Classification and general features of Hugonella massiliensis strain AT8T according to the MIGS recommendations (Field et al., 2008)
Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Nontraceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from http://www.geneontology.org/GO.evidence.shtml of the Gene Ontology project (Ashburner et al., 2000). If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgments.

| Strain identification and phylogenetic analysis
The spectra of strain AT8T obtained by MALDI-TOF MS did not match those of any strain in our database (Bruker, continuously incremented with our data), suggesting that our isolate could belong to a previously undescribed species. We added the spectrum from strain AT8T to the URMITE database (Figure S3). The 16S rRNA gene sequence of strain AT8T showed 92% similarity with that of Eggerthella lenta DSM 2243, the phylogenetically closest validly named species (Figure 1). This value is below the 95% threshold that allows the delineation of a new genus, as established by Kim et al. (2014).
Consequently, strain AT8T is considered the first isolate of a new genus named Hugonella gen. nov., for which Hugonella massiliensis sp. nov. strain AT8T is the type strain. Finally, the gel view showed the spectral differences from other closely related genera of the Eggerthellaceae family (Figure 2).

| Genome properties
The genome of H. massiliensis strain AT8T (Accession number: FAUL00000000) contains 2,091,845 bp with a G+C content of 63.46% (Table 4, Figure 3) and is composed of nine scaffolds with 12 contigs. The draft genome was shown to encode 1,849 predicted genes, among which 1,781 were protein-coding genes and 68 were RNAs (six 5S rRNA genes, six 16S rRNA genes, six 23S rRNA genes, and 50 tRNA genes). A total of 1,438 genes (80.74%) were assigned a putative function (by COGs or by NR BLAST). Ninety genes (5.05%) were identified as ORFans. The remaining genes were annotated as hypothetical proteins (201 genes = 11.29%) (Table 4).
FIGURE 2 Gel view comparing Hugonella massiliensis strain AT8T to other closely related species. The gel view displays the raw spectra of the loaded spectrum files arranged in a pseudo-gel-like look. The x-axis records the m/z value. The left y-axis displays the running spectrum number originating from subsequent spectra loading. The peak intensity is expressed by a gray scale scheme code. The right y-axis indicates the relation between the color of a peak and its intensity, in arbitrary units. Displayed species are indicated on the left
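The gene counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch below assumes that the percentages are expressed relative to the number of protein-coding genes, which is what reproduces the published figures; the variable and function names are illustrative.

```python
protein_coding = 1_781
rna_genes = {"5S rRNA": 6, "16S rRNA": 6, "23S rRNA": 6, "tRNA": 50}

total_rna = sum(rna_genes.values())       # RNA genes, all classes combined
total_genes = protein_coding + total_rna  # all predicted genes


def pct_of_protein_coding(count: int) -> float:
    """Share of protein-coding genes, in percent, rounded to two decimals."""
    return round(100.0 * count / protein_coding, 2)


print(total_rna, total_genes)       # 68 1849
print(pct_of_protein_coding(1438))  # 80.74 (putative function)
print(pct_of_protein_coding(90))    # 5.05  (ORFans)
print(pct_of_protein_coding(201))   # 11.29 (hypothetical proteins)
```

The RNA classes sum to the 68 RNA genes stated in the text, and the three percentages match the reported values when 1,781 is used as the denominator.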

| Genome comparison
We compared the genome of strain AT8T with the closest annotated sequenced genomes currently available: Slackia piriformis (ADMD00000000).
The total is based on the total number of protein-coding genes in the annotated genome.

| DISCUSSION
Here, we used the culturomics approach to study the microbial

| CONCLUSION
Based on phenotypic, phylogenetic, MALDI-TOF, and genomic analyses, we formally propose the creation of the genus Hugonella gen. nov., including Hugonella massiliensis gen. nov., sp. nov., currently its only cultivated species. Indeed, H. massiliensis strain AT8T shares only 92% 16S rRNA gene sequence similarity with Eggerthella lenta DSM 2243, which supports its status as a new genus. The strain was isolated from a stool specimen of a morbidly obese French woman by anaerobic culture at 37°C, as part of a culturomics study. Several other bacterial species that remain undescribed were also isolated from different stool specimens using different culture conditions, suggesting that the human intestinal microbiota remains partially unknown and its diversity has yet to be fully explored.

Bold value: comparison between the strain and itself.
Strain AT8T was isolated from feces of a 51-year-old obese French woman (BMI 44.38 kg/m2).
Colonies are smooth, shiny and measure 2-5 mm. No catalase and no oxidase activities were observed.