Complete Genome and Plasmid Sequences of 32 Salmonella enterica Strains from 30 Serovars

We report here 32 complete, closed genome sequences of strains representing 30 serotypes of Salmonella enterica. These genomes will serve as references for studying genetic variation within Salmonella enterica serotypes, particularly in comparative genomics studies, and will help improve the accuracy of in silico serotyping.

Salmonella is the leading cause of bacterial gastroenteritis in North America, with more than 1.7 million cases per annum (1). Public health laboratories are replacing traditional serotyping with whole-genome sequencing (WGS) for faster and more accurate surveillance and outbreak detection (2). Short-read sequencing has generated large amounts of genomic information, but the resulting assemblies are fragmented and do not represent the complete DNA sequence of an organism. High-quality closed genomes are valuable because comparative genomic analyses based on draft genomes cannot distinguish sequences that are truly missing from those that were simply not resolved during assembly. Much of the genomic information for Salmonella comes from highly prevalent serotypes, and the rarer serotypes are underrepresented. Tools for in silico serotype prediction, such as the Salmonella In Silico Typing Resource (SISTR) (3,4), will benefit from this collection of high-quality reference genomes for 30 serotypes for which no closed genomes were previously available.
As of 9 September 2018, there were 634 fully closed Salmonella enterica genomes in the NCBI Genome database. However, the large amount of raw data in the Sequence Read Archive (SRA) consists primarily of Illumina short reads, which are too short to span long repeats and therefore cannot, on their own, readily resolve the Salmonella genome into one contiguous, circular molecule. We sequenced diverse Salmonella serotypes using a combination of Illumina and Oxford Nanopore platforms to produce high-quality, closed de novo assemblies for public health and comparative genomics applications. This data set comprises 32 closed reference genomes representing 30 serotypes for which no closed genome was previously available (listed in Table 1).
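Whether an assembly is "closed" comes down to every replicon (chromosome and any plasmids) being a single circular contig. The sketch below shows one way such a check could be scripted; it assumes Unicycler-style FASTA headers carrying a `circular=true` tag (Unicycler is the assembler used for these genomes, but other assemblers label contigs differently), and the function names and the `min_chromosome_bp` threshold are hypothetical choices for illustration, not the authors' review procedure.

```python
"""Check whether a hybrid assembly is closed, i.e. every contig is circular.

Assumes Unicycler-style FASTA headers such as
">1 length=4857432 depth=1.00x circular=true"; other assemblers differ.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int      # counted from the sequence lines, not the header
    circular: bool   # True only if the header carries circular=true


def parse_fasta(fasta_text: str) -> list[Contig]:
    """Parse FASTA text into Contig records, reading key=value header tags."""
    contigs: list[Contig] = []
    name, tags, seq_len = None, {}, 0
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            if name is not None:
                contigs.append(Contig(name, seq_len, tags.get("circular") == "true"))
            fields = line[1:].split()
            name = fields[0] if fields else ""
            tags = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
            seq_len = 0
        elif line:
            seq_len += len(line)
    if name is not None:
        contigs.append(Contig(name, seq_len, tags.get("circular") == "true"))
    return contigs


def summarise_assembly(contigs: list[Contig], min_chromosome_bp: int = 4_000_000) -> dict:
    """Treat the longest contig as the chromosome and the rest as plasmids.

    min_chromosome_bp is a hypothetical sanity threshold, not a published cutoff.
    """
    if not contigs:
        raise ValueError("assembly contains no contigs")
    ordered = sorted(contigs, key=lambda c: c.length, reverse=True)
    chromosome, plasmids = ordered[0], ordered[1:]
    closed = all(c.circular for c in ordered) and chromosome.length >= min_chromosome_bp
    return {
        "chromosome_bp": chromosome.length,
        "plasmid_bp": [p.length for p in plasmids],
        "closed": closed,
    }


if __name__ == "__main__":
    # Toy two-contig assembly; the threshold is lowered so the tiny "chromosome" passes.
    toy = (">1 length=12 depth=1.00x circular=true\nACGTACGT\nACGT\n"
           ">2 length=4 depth=3.10x circular=true\nACGT\n")
    print(summarise_assembly(parse_fasta(toy), min_chromosome_bp=10))
    # {'chromosome_bp': 12, 'plasmid_bp': [4], 'closed': True}
```

Counting lengths from the sequence lines rather than trusting the `length=` tag keeps the check valid even if headers were edited after assembly; a linear (unclosed) contig anywhere in the file makes `closed` false.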
Samples were grown on LB plates at 37°C, and genomic DNA was isolated using the Qiagen EZ1 DNA tissue kit on the Qiagen Advanced XL automated instrument, per the manufacturer's protocol, using 190 µl of G2 buffer with 10 µl of proteinase K. Oxford Nanopore sequencing was performed at the National Microbiology Laboratory (NML) at Guelph (Ontario, Canada) on an Oxford Nanopore MinION sequencer following the manufacturer's default rapid barcoding protocol. Libraries were prepared using either the SQK-RBK001 or the SQK-RBK004 rapid barcoding kit and run on FLO-MIN106 R9.4 flow cells. Each multiplexed run produced between 4,719 and 111,488 reads per sample, with mean read lengths ranging from 3,485 to 11,880 bp. Albacore v2.1.3 (Oxford Nanopore) was used for demultiplexing, base calling, and quality filtering of the raw reads. Illumina sequencing was done at […]. Hybrid de novo assemblies were produced with the Unicycler pipeline v0.4.3 (5), without filtering the raw reads beforehand, and were manually reviewed to confirm that the chromosome and any plasmids present were complete. The serotype of each isolate was predicted with SISTR (3,4) to confirm that the in silico prediction matched the phenotypic serotype determined by the NML Reference Laboratory for Salmonellosis at Guelph. The high-quality closed reference genomes produced here will be useful for comparative genomics applications and for epidemiological studies on outbreak detection and surveillance of Salmonella.
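Per-sample figures like the read counts and mean read lengths quoted above can be recomputed from demultiplexed FASTQ output. The following is a minimal sketch, assuming one (optionally gzipped) FASTQ file per barcode in a single directory, with standard four-line records; the directory layout, file naming, and function names are hypothetical and do not describe Albacore's own reporting.

```python
"""Per-sample read count and mean read length from demultiplexed FASTQ files.

Assumes one (optionally gzipped) four-line-per-record FASTQ file per barcode
in a single directory; the layout and file naming are hypothetical.
"""
import gzip
from pathlib import Path
from statistics import mean


def read_lengths(lines):
    """Yield the sequence length of each record (line 2 of every 4)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield len(line.rstrip("\r\n"))


def summarise_reads(lengths):
    """Return the read count and mean read length (bp) for one sample."""
    lengths = list(lengths)
    return {"reads": len(lengths), "mean_length": mean(lengths) if lengths else 0.0}


def fastq_stats(path: Path):
    """Open a plain or gzipped FASTQ file as text and summarise its reads."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return summarise_reads(read_lengths(handle))


def stats_by_sample(directory):
    """Map each FASTQ file name (minus extensions) to its read statistics."""
    return {
        p.name.split(".")[0]: fastq_stats(p)
        for p in sorted(Path(directory).glob("*.fastq*"))
    }
```

Calling `stats_by_sample("reads/")` on such a directory would return one `{"reads": ..., "mean_length": ...}` entry per barcode. Streaming line by line keeps memory use flat even for samples with more than 100,000 reads; only the per-read lengths are held in memory.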
Data availability. The genome sequences of the 32 Salmonella isolates produced by the National Microbiology Laboratory Reference Laboratory for Salmonellosis at Guelph have been deposited in GenBank/DDBJ/ENA under BioProject numbers PRJNA354244, PRJNA177577, and PRJNA177212. All GenBank accession numbers are listed in Table 1. The Illumina and Oxford Nanopore raw sequence data, in FASTQ and FAST5 formats, are also available in the Sequence Read Archive (SRA).

ACKNOWLEDGMENTS
We sincerely thank the following for providing isolates and phenotypic serotyping: […]. This study was funded by the Public Health Agency of Canada.