Draft genome sequences data of rare Salmonella enterica subsp. enterica serovar Ceyco and serovar Hillegersberg isolated from diarrheal patients in India

We report here the draft genome sequences of two rare Salmonella serotypes isolated from human faecal samples in India. The isolates were identified as Salmonella enterica subsp. enterica serovar Ceyco and serovar Hillegersberg by whole genome sequencing (WGS) based serotype prediction. The genomic similarity of the study isolates was assessed by clustering them with the global collection of Salmonella sp. available in EnteroBase and SISTR based on their cgMLST profiles. Phylogenetic analysis showed that the study isolates were closely related to S. Detmold and other unknown serovars from serogroup D2. The information generated from genome sequencing of these two rare S. enterica serovars will improve the overall understanding of the epidemiology of this clinically relevant pathogen.



Value of the Data
• The availability of genome sequencing data of rare Salmonella sp. provides insight on genetic diversity of the species
• The data also helps to understand the genomic epidemiology of this clinical pathogen
• The data can be used to identify other untypable Salmonella serotypes based on the genomic similarity and antigenic formulae

Data Description
Salmonella enterica subsp. enterica is one of the major causes of bacterial diarrhea across the world. Based on antigenic variations (O, H1, H2 and Vi), Salmonella enterica is classified into more than 2,500 serotypes. Serovar determination by phenotypic characterization of the O and H antigens using the slide agglutination test often yields an untypable designation. Therefore, seven-gene multilocus sequence typing (MLST) has been commonly employed as a molecular subtyping method to infer Salmonella serovar designations accurately. Unfortunately, MLST does not differentiate all serotypes (e.g., polyphyletic serovars). Hence, whole genome sequencing (WGS) has recently been used to comprehensively identify untypable or rare serovars. Here we report two untypable rare serovars, Salmonella enterica subsp. enterica serovar Ceyco and serovar Hillegersberg, isolated from diarrheal patients in India.
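An antigenic formula is conventionally written as O:H1:H2, for example '9,46:k:z35', the formula reported for serovar Ceyco in this article. The following is a minimal Python sketch of how such a formula could be split into its components and compared at the O-antigen level. The names AntigenicFormula, parse_formula and same_o_antigens are hypothetical helpers introduced for illustration only, and the parser assumes a plain colon-separated formula without optional-factor brackets.

```python
from typing import NamedTuple, Tuple


class AntigenicFormula(NamedTuple):
    """Hypothetical container for an O:H1:H2 antigenic formula."""
    o_antigens: Tuple[str, ...]  # somatic (O) factors, e.g. ("9", "46")
    h1: str                      # phase 1 flagellar (H1) antigen
    h2: str                      # phase 2 flagellar (H2) antigen


def parse_formula(formula: str) -> AntigenicFormula:
    """Split a colon-separated formula such as '9,46:k:z35'."""
    parts = [p.strip() for p in formula.split(":")]
    if len(parts) != 3:
        raise ValueError(f"expected O:H1:H2, got {formula!r}")
    o, h1, h2 = parts
    # O factors are comma-separated; H antigens are kept as written.
    return AntigenicFormula(tuple(f.strip() for f in o.split(",")), h1, h2)


def same_o_antigens(a: AntigenicFormula, b: AntigenicFormula) -> bool:
    """True when both formulae list the same set of O factors."""
    return set(a.o_antigens) == set(b.o_antigens)


if __name__ == "__main__":
    ceyco = parse_formula("9,46:k:z35")
    hillegersberg = parse_formula("9,46:z35:1,5")
    print(ceyco, hillegersberg, same_o_antigens(ceyco, hillegersberg))
```

Both formulae used in the example are taken from this article; they share the O factors 9,46 and differ only in their flagellar antigens.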
S. Ceyco was first identified in 1966 from human samples in India and was reported to have reappeared in 1969 [1]. Similarly, S. Hillegersberg was first reported from a patient at the Municipal Health Laboratory, Rotterdam, Netherlands [2]. Both serovars are rarely isolated in most countries and have not been characterized from clinical samples since the preliminary identification reports. The first study isolate, S. Ceyco strain FC2085, was recovered from the stool sample of a 7-year-old patient with extramedullary leukemic relapse admitted to Christian Medical College, Vellore, India. The second isolate, S. Hillegersberg strain FC2223, was isolated from the stool sample of a 20-year-old man with anaplastic large cell lymphoma. The strains were isolated from stool samples using standard microbiology techniques and serogrouped as 9,46 (D2) with commercial typing antiserum based on the Kauffmann-White scheme [3]. Antimicrobial susceptibility testing was performed, and both isolates were susceptible to the tested antimicrobials except aminoglycosides. The breakpoints were interpreted according to Clinical and Laboratory Standards Institute guidelines [4].

Experimental Design, Materials and Methods
Genomic DNA was extracted from overnight cultures of the isolates using the Wizard DNA purification kit (Promega, Madison, WI). A sequencing-ready paired-end library was prepared using 100 ng of DNA with the Nextera DNA Flex library prep kit (Illumina, Inc., San Diego, USA). This was followed by sequencing on the Illumina iSeq 100 platform with a paired-end run of 2 × 150 bp. Trimmed reads were de novo assembled using SPAdes (v.3.15.3) with default settings (https://github.com/ablab/spades), which resulted in coverage of 71x and 23x for S. Ceyco and S. Hillegersberg, respectively. The draft genomes were annotated using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP v.4.1) and subsequently deposited in GenBank.
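For orientation, nominal sequencing depth is commonly approximated as the total number of sequenced bases divided by the genome size. The Python sketch below illustrates that arithmetic only; how the 71x and 23x values above were actually computed is not stated in this article, and the function names and the read count in the example are hypothetical.

```python
import math


def estimate_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Nominal depth = (number of reads x read length) / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target nominal depth."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_depth * genome_size / read_length)


if __name__ == "__main__":
    # Hypothetical example: 2,000,000 reads of 150 bp (1,000,000 pairs)
    # against the 4,691,294 bp assembly size reported for strain FC2085.
    depth = estimate_coverage(2_000_000, 150, 4_691_294)
    print(f"nominal depth: {depth:.1f}x")
```

Trimming, duplicate reads and uneven coverage all lower the effective depth, so a figure obtained this way should be read as an upper estimate.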
The raw sequencing reads of strains FC2085 and FC2223 were submitted to SeqSero (v.2.0) [5] to determine the antigenic formula and predict the serotype. Strain FC2085 was identified as Salmonella enterica serovar Ceyco with the antigenic formula '9,46:k:z35'. Notably, strain FC2223 was predicted to be '9,46:a:z35' (novel) by SeqSero2. However, the antigenic formula was later confirmed as '9,46:z35:1,5', which belongs to Salmonella enterica serovar Hillegersberg, upon analysis by the Centre for Reference and Research on Salmonella, Pasteur Institute, Paris, France. The assembled genome size of S. Ceyco strain FC2085 was 4,691,294 bp with a G+C content of 51.8% and an N50 value of 417,690 bp. S. Hillegersberg strain FC2223 had a genome size of 4,744,996 bp with a G+C content of 52% and an N50 value of 51,438 bp. Gene prediction and annotation showed a total of 4,408 and 4,586 coding sequences for S. Ceyco and S. Hillegersberg, respectively (Table 1).
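The assembly statistics quoted above follow standard definitions: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly, and G+C content is the percentage of G and C among the called bases. A minimal Python sketch of both calculations is given here; the function names are illustrative, and the toy inputs are not data from this study.

```python
from typing import Iterable, Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Length L such that contigs of length >= L cover >= 50% of the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs supplied")


def gc_percent(contigs: Iterable[str]) -> float:
    """Percentage of G and C among unambiguous (A, C, G, T) bases."""
    gc = 0
    acgt = 0
    for seq in contigs:
        seq = seq.upper()
        g_c = seq.count("G") + seq.count("C")
        gc += g_c
        acgt += g_c + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("no unambiguous bases supplied")
    return 100.0 * gc / acgt


if __name__ == "__main__":
    toy_contigs = ["GGCCGGCC", "AATTAATT", "ACGT"]  # toy data, not study data
    print(n50([len(c) for c in toy_contigs]), gc_percent(toy_contigs))
```

Ambiguous bases such as N are excluded from the G+C denominator in this sketch; other tools may count them differently.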
The MLST profiles derived from the genome assemblies revealed new sequence types (STs), which were subsequently assigned as ST8445 for S. Ceyco strain FC2085 and ST8446 for S. Hillegersberg strain FC2223 (http://enterobase.warwick.ac.uk/species/senterica). Clustered regularly interspaced short palindromic repeat (CRISPR) typing of the study isolates using CRISPRDetect (http://crispr.otago.ac.nz/CRISPRDetect/predict_crispr_array.html) showed two CRISPR loci in both isolates, with loci 1 and 2 of strain FC2085 carrying 23 and 16 spacers, respectively. Similarly, strain FC2223 carried 40 spacers in locus 1 and 10 in locus 2. Resistome analysis of the study isolates using ResFinder v.4.1 (https://cge.cbs.dtu.dk/services/ResFinder/) showed only the chromosomally encoded aac(6')-Iaa gene, which confers aminoglycoside resistance, and the parC T57S point mutation. No plasmids were detected in either isolate by PlasmidFinder (https://cge.cbs.dtu.dk/services/PlasmidFinder/). The study isolates were placed in the global phylogenomic framework based on core genome MLST (cgMLST) available in SISTR (https://lfz.corefacility.ca/sistr-app/?#) [6]. The resulting dendrogram displayed the phylogenetic position of the study isolates, and closely related isolates were selected for further analysis. Similarly, isolates that clustered with the study isolates in GrapeTree were identified from EnteroBase [7]. The representative genomes thus identified (n = 26) were used to generate a phylogenetic tree using Mashtree (https://github.com/lskatz/mashtree) [8]. The resulting phylogenetic tree was visualised and annotated using the Interactive Tree of Life software (iTOL v6) [9]. Our isolates were found to be phylogenetically closer to S. Detmold and other unknown serovars from serogroup D2 (Fig. 1). The information generated from genome sequencing of these two rare S. enterica serovars will improve the overall understanding of the epidemiology of this clinically relevant pathogen.
The raw reads and assembled genome sequences of S. Ceyco strain FC2085 and S. Hillegersberg strain FC2223 have been deposited in GenBank under the BioProject accession numbers PRJNA224116 and PRJNA767943.
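The Mashtree step described above builds its tree from pairwise Mash distances, which are estimated from MinHash sketches of k-mers. The Python sketch below illustrates the idea under simplifying assumptions: it hashes k-mers with BLAKE2b from the standard library rather than the hash function used by Mash, ignores reverse complements, and uses illustrative function names and parameters. It is not the Mash or Mashtree implementation.

```python
import hashlib
import math
from typing import List, Sequence


def sketch(seq: str, k: int = 21, s: int = 1000) -> List[int]:
    """Bottom-s MinHash sketch: the s smallest hashes of the distinct k-mers."""
    seq = seq.upper()
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    hashes = {
        int.from_bytes(hashlib.blake2b(km.encode(), digest_size=8).digest(), "big")
        for km in kmers
    }
    return sorted(hashes)[:s]


def jaccard_estimate(a: Sequence[int], b: Sequence[int], s: int) -> float:
    """Estimate Jaccard similarity from the s smallest hashes of the union."""
    set_a, set_b = set(a), set(b)
    merged = sorted(set_a | set_b)[:s]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)


def mash_distance(j: float, k: int) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); 1.0 when nothing is shared."""
    if j <= 0.0:
        return 1.0
    return max(0.0, -math.log(2.0 * j / (1.0 + j)) / k)


if __name__ == "__main__":
    # Toy sequences only; real input would be assembled genomes.
    a = "ACGTTGCAAGGCTTAACCGGTTAGCATCGATCGGATCCA"
    b = "ACGTTGCAAGGCTTAACCGGTTAGCATCGATCGGTTCCA"
    sa, sb = sketch(a, k=7, s=20), sketch(b, k=7, s=20)
    print(mash_distance(jaccard_estimate(sa, sb, 20), 7))
```

A distance matrix computed this way for every pair of genomes is the kind of input from which a distance-based tree can then be built.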

Ethics Statements
The isolates used in this study were collected in the Clinical Microbiology Laboratory of the Christian Medical College and Hospital, Vellore. No patients were recruited, and the data collected from patient samples were anonymized; hence, ethical approval and informed consent statements are not applicable. All prevailing local, national and international regulations and conventions and normal scientific ethical practices have been respected, and all ethical norms have been followed. Ethical requirements in accordance with the World Medical Association were strictly followed.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.