Closed and High-Quality Bacterial Genome Sequences of the Oligo-Mouse-Microbiota Community

The Oligo-Mouse-Microbiota (OMM12) gnotobiotic murine model is an increasingly popular model in microbiota studies. However, following Illumina and PacBio sequencing, the genomes of the 12 strains could not be closed. Here, we used genomic chromosome conformation capture (Hi-C) data to reorganize, close, and improve the quality of these 12 genomes.

The Oligo-Mouse-Microbiota (OMM12) is a synthetic community of 12 murine bacterial strains introduced into axenic mice. The resulting gnotobiotic murine model has been increasingly used for diverse gut microbiota studies across several animal facilities (1–4). This bacterial consortium comprises members from the 5 major phyla naturally present in the mouse microbiota and reconstitutes its main functionalities, such as metabolism and colonization resistance against pathogens (5).
Despite two rounds of sequencing using PacBio and Illumina technologies (6, 7), one-half of the genomes currently accessible still consist of 2 to 20 contigs (Table 1). In particular, the genome of Bacteroides caecimuris, the most abundant bacterium of the OMM12 consortium, consists of 19 contigs (5). Improving scaffolding would facilitate a number of downstream analyses, such as prophage prediction.
We used genomic chromosome conformation capture (Hi-C), a technique that combines in vivo chemical cross-linking (with formaldehyde) of DNA sequences in spatial proximity, digestion with restriction enzymes, filling in of the DNA ends with biotinylated nucleotides, and, finally, proximity ligation, DNA extraction, and deep sequencing. The relative ligation frequencies between nonadjacent DNA segments reflect the average three-dimensional (3D) organization of the genome of interest (8) and also, because of the polymer nature of DNA, the relative distance separating them along the chromosome. The latter property has therefore been exploited to bridge scaffolding gaps resulting, for instance, from the presence of repeated elements (9). Hi-C scaffolding is now commonly used in sequencing projects of large eukaryotic genomes but applies to bacteria as well (10, 11).
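The relationship between contact frequency and distance along the chromosome can be illustrated with a minimal Python sketch. It is not part of the pipeline used here: the function name `mean_contacts_by_distance` and the toy contact map are hypothetical, the counts are invented, and the map is treated as linear (a circular bacterial chromosome would require wrapping distances around the origin).

```python
def mean_contacts_by_distance(matrix):
    """Average contact count for each bin separation d in a square contact map.

    The map is treated as linear (an assumption made for simplicity).
    """
    n = len(matrix)
    means = []
    for d in range(n):
        # Collect all entries on the d-th diagonal: pairs of bins d bins apart.
        values = [matrix[i][i + d] for i in range(n - d)]
        means.append(sum(values) / len(values))
    return means


# Toy symmetric 4-bin contact map (invented counts, not data from this study).
toy_map = [
    [50, 20, 8, 2],
    [20, 50, 20, 8],
    [8, 20, 50, 20],
    [2, 8, 20, 50],
]

decay = mean_contacts_by_distance(toy_map)
for distance, mean_count in enumerate(decay):
    print(distance, mean_count)
```

In this toy map the mean contact count falls as the separation between bins grows, which is the property that allows two sequences with frequent contacts to be placed close to each other in a scaffold.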
A Hi-C metagenomic protocol for mammalian gut samples developed in our laboratory (12) was applied to fecal pellets from the OMM12 mice bred at Institut Pasteur (protocol 20.173 approved by the veterinary staff and under authorization APAFIS 26874 by the national ethics committee). Libraries were prepared using streptavidin beads to capture biotinylated ligation junctions as described previously (13) and sequenced with an Illumina NextSeq 550 system to generate a total of 101,182,905 reads of 35 bp. The quality of the reads was first verified with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). Then, for each bacterium, the Python library hicstuff (https://github.com/koszullab/hicstuff) was used to generate Hi-C contact maps using the most recent OMM12 genomes (6). Contigs were reorganized (i.e., scaffolded) based on their relative contact frequencies (14). When necessary, preexisting contigs were split and rearranged to better fit the expected 3D structure, based on the contact data. To allow users to detect junctions between contigs, contigs were separated by 10 × N in the final fasta file. We successfully closed the genomes of the 12 strains except for Flavonifractor plautii, for which the position of a 70,760-bp region could not be assigned within the main scaffold. The genome of F. plautii therefore remains split into 2 scaffolds. An automatic annotation was then performed using RAST (15). Access to a single high-quality scaffold per genome will greatly improve the accuracy of genomic analyses performed by the community of users of the OMM12 gnotobiotic mice.
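As an illustration of how the 10 × N separators could be used, the following Python sketch locates runs of N in a scaffold sequence and recovers the contigs between them. The function names `find_junctions` and `split_contigs` are hypothetical, the toy sequence is invented, and treating any run of at least 10 N as a junction is an assumption (a contig that itself contains a long run of N would be split as well).

```python
import re


def find_junctions(sequence, gap_length=10):
    """Return 0-based, half-open (start, end) coordinates of N runs.

    Assumption: any run of at least `gap_length` N characters is a junction.
    """
    runs = re.finditer(r"N+", sequence, flags=re.IGNORECASE)
    return [
        (run.start(), run.end())
        for run in runs
        if run.end() - run.start() >= gap_length
    ]


def split_contigs(sequence, gap_length=10):
    """Return the contig sequences lying between the detected junctions."""
    contigs = []
    previous_end = 0
    for start, end in find_junctions(sequence, gap_length):
        contigs.append(sequence[previous_end:start])
        previous_end = end
    contigs.append(sequence[previous_end:])
    return contigs


# Toy scaffold made of three short contigs joined by 10 N (invented sequence).
toy_scaffold = "ACGTACGT" + "N" * 10 + "GGCC" + "N" * 10 + "TTAA"

junctions = find_junctions(toy_scaffold)
contigs = split_contigs(toy_scaffold)
print(junctions)
print(contigs)
```

The same functions can be applied to each sequence read from a fasta file with any fasta parser; shorter runs of N are ignored so that isolated ambiguous bases are not reported as junctions.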
Data availability. The 12 reassembled and closed genome sequences of the OMM12 bacteria, as well as the FastQ reads, have been deposited under BioProject number PRJNA680355; the individual accession numbers are provided in Table 1.

ACKNOWLEDGMENTS
We thank Cyril Matthey Doret and Agnès Thierry for assistance during computational and experimental work, respectively.
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. This research was supported by funding to L.D.