Whole genome sequence data of Stenotrophomonas maltophilia SCAID WND1-2022 (370)

The whole genome sequence of a hospital infection agent, Stenotrophomonas maltophilia SCAID WND1-2022 (370), is reported. Raw PacBio-generated reads and the genome sequence were deposited at NCBI under BioProject PRJNA754843. The genome comprises two replicons: a 4,880,425 bp chromosome carrying 4524 protein-coding and functional RNA genes, and a 38,606 bp plasmid containing 40 CDS. Both replicons were methylated at third cytosine residues of ACCTC motifs. The taxonomic provenance of SCAID WND1-2022 (370) was determined by calculating sequence similarity to the reference genomes at NCBI; the highest identity, 97.35%, was to S. maltophilia ISMMS4. Many antibiotic resistance and virulence genes were identified on the chromosome of S. maltophilia SCAID WND1-2022 (370), including multiple efflux pumps, beta-lactamases, and genes involved in biofilm formation. The plasmid sequence was dissimilar to any known plasmid and was seemingly acquired from a distant microorganism. Plasmid-borne genes possibly contributed to the virulence of the pathogen, but not to its drug resistance.


© 2022 The Author(s

Value of the Data
• These data are available for analysis by researchers to understand the molecular epidemiology of Stenotrophomonas maltophilia.
• These data will be used to improve surveillance and prediction of hospital outbreaks of Stenotrophomonas maltophilia.
• Whole genome sequencing data provide information about genomic determinants and antimicrobial resistance (AMR) genes of Stenotrophomonas maltophilia strain SCAID WND1-2022 (370).
• These data should be used by researchers and public health officers for surveillance and monitoring of Stenotrophomonas maltophilia to prevent the emergence of highly resistant strains.
• The data can be used by researchers for genomics, proteomics and other evolutionary studies.

Objective
Antimicrobial resistance (AMR) is one of the main threats to human health. Monitoring the spread of genetic determinants and evaluating the etiologic and taxonomic composition of nosocomial pathogens will allow the timely identification of threats to human health. The data were obtained during the implementation of grant ID #BR09458960 to create a collection of multidrug-resistant bacterial strains causing nosocomial infections. The isolated strain Stenotrophomonas maltophilia SCAID WND1-2022 (370) will be used as a model organism to examine the development of virulence and antibiotic resistance. The submitted data demonstrate the genetic and epigenetic properties of a multidrug-resistant clinical isolate belonging to the Gram-negative pathogens. These data can be helpful for the surveillance and prediction of outbreaks of Stenotrophomonas maltophilia-related hospital infections.

Data Description
Stenotrophomonas maltophilia is a major intra-hospital pathogen characterized by extensive multidrug resistance. Severe infections caused by S. maltophilia are associated with a high mortality rate, especially among people with weakened immunity [1]. Strategies are therefore necessary to improve the diagnosis of the infection and the treatment outcomes.
The PacBio Sequel-I (Pacific Biosciences) sequencing platform was used to generate 103,602 single-end reads with an average length of 8308 bp (N50: 10,885 bp) from template DNA extracted from the cultivated clinical isolate Stenotrophomonas maltophilia SCAID WND1-2022 (370). The sequencing statistics are summarized in Table 1. The DNA reads were quality filtered and trimmed prior to assembly, which was followed by genome annotation.
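The average read length and N50 quoted above are standard summary statistics. The following is a minimal sketch of how they can be computed from a list of read lengths; the function name `read_length_stats` and the toy lengths are illustrative assumptions, not the values or tooling used for Table 1.

```python
def read_length_stats(lengths):
    """Return (read count, mean length, N50) for read lengths given in bp.

    N50 is the length L such that reads of length >= L together contain
    at least half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("no reads supplied")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if running * 2 >= total:  # reached half of the total bases
            n50 = length
            break
    return len(ordered), total / len(ordered), n50


# Toy example (hypothetical read lengths, not the real data set).
toy_lengths = [2000, 4000, 6000, 8000, 10000]
count, mean_len, n50 = read_length_stats(toy_lengths)
print(count, mean_len, n50)  # 5 6000.0 8000
```

Because long reads carry most of the bases, N50 is typically larger than the mean read length, as is the case for the values reported here.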
De novo assembly of the quality-controlled reads produced 2 circular contigs corresponding to one bacterial chromosome and one conjugative plasmid (Table 2).
Both replicons were methylated at third cytosine residues of ACCTC motifs. This methylation is likely associated with a type III restriction-modification methyltransferase found on the chromosome. The taxonomic affiliation of the strain was predicted by whole-genome comparison using the genome-to-genome distance calculator (GGDC) and OrthoANI values. The most closely related microorganism found in the GenBank database was Stenotrophomonas maltophilia ISMMS4 (Table 3 and Fig. 1). Fig. 2 shows an atlas representation of the circular chromosome of S. maltophilia SCAID WND1-2022 (370). Seventeen loci on the chromosome were predicted as putative genomic islands (GIs). Locations of the genomic features were counted on the atlas clockwise starting from the replication origin (Ori) located ∼400 bp upstream of dnaA, which encodes the chromosomal replication initiator protein.
Clinical isolates of S. maltophilia are characterized by high-level intrinsic and acquired resistance to a wide range of antibiotics [2]. Drug resistance (DR) and virulence (Vir) genes of the strain S. maltophilia SCAID WND1-2022 (370) were predicted using curated public databases and gene annotation. The repertoire of DR genes includes 61 efflux pumps organized in 26 operons of the RND, EmbrAB-OMF, MdtABC-TolC and several other unrecognized types; 14 heavy metal efflux pumps; 14 beta-lactamases, including one metallo-beta-lactamase; 3 small multidrug resistance family (SMR) proteins; and one multidrug resistance protein NorM (Supplementary material). Vir proteins were identified by a search through the VFDM database. This search revealed 56 genes involved in motility and biofilm formation; 37 adhesion and pilus proteins; 21 immune modulators; 30 effector transporters; 32 siderophore and iron transport genes; 8 invasion proteins and toxins; and 5 stress response proteins (Supplementary material). Many DR and Vir genes were located in the core part of the chromosome. Several operons of heavy metal resistance genes, multidrug efflux pumps, and invasion and immune modulation proteins were found in GIs. These GIs are highlighted in red in Fig. 2.
The GC-content of the identified plasmid differs from that of the chromosome. A BLASTN search through the NCBI nr/nt database revealed that the most similar sequences were plasmid pMRAD02 (CP001003) from Methylobacterium radiotolerans JCM 2831 and several plasmids from phytopathogenic Xanthomonas citri. An alignment of the plasmid of S. maltophilia SCAID WND1-2022 (370) with pMRAD02 is shown in Fig. 3. An elaborate type IV secretion system (T4SSa) is shared by these two plasmids. No genes associated with antimicrobial resistance were detected on the plasmid.
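The GC-content comparison between replicons mentioned above comes down to a simple base count. The sketch below assumes the sequences are available as plain strings; the function name `gc_content` and the two short toy sequences are hypothetical and stand in for the deposited chromosome and plasmid sequences.

```python
def gc_content(seq):
    """Return the G+C fraction of a nucleotide sequence.

    Ambiguous symbols such as N are ignored, so the fraction is computed
    over unambiguous A, C, G and T bases only.
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(base in "GC" for base in counted) / len(counted)


# Toy sequences (hypothetical, for illustration only).
replicon_a = "GCGCGGCCATGC"
replicon_b = "ATATGCATTTAC"
print(f"replicon A GC: {gc_content(replicon_a):.1%}")
print(f"replicon B GC: {gc_content(replicon_b):.1%}")
```

A marked GC difference between a plasmid and its host chromosome is one of the compositional signals commonly taken as a hint of horizontal acquisition.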
The isolated strain S. maltophilia SCAID WND1-2022 (370) will be used as a model organism to examine the development of virulence and antibiotic resistance by this important pathogen causing acute hospital infections.

Sample Collection and Isolation of Stenotrophomonas maltophilia Strain SCAID WND1-2022 (370)
The isolate was obtained in 2022 from the intensive care unit at the Syzganov's National Surgery Center in Almaty, Kazakhstan. The protocol was approved by the Committee of Institutional Animal Care and Use in the Scientific Center for Anti-Infectious Drugs (SCAID), Almaty (ID: #2 from 09.16.2020). The sample was taken from a patient with bacterial septicemia: it was swabbed from a purulent wound during an autopsy following fatal sepsis.

DNA Isolation, Genome Sequencing, Assembly, and Annotation
For DNA extraction, the culture was grown on nutrient agar (Nutrient Agar, HiMedia) for 24 h at 37 °C. DNA was extracted using the PureLink Genomic DNA Mini Kit (Invitrogen, USA) following the manufacturer's recommendations. The quality and quantity of the resulting DNA samples were determined using the NanoDrop 2000c spectrophotometer (Thermo Scientific, USA) at the optical wavelengths of 260 and 280 nm.
The DNA library was sequenced using the PacBio Sequel-I (Pacific Biosciences) sequencing platform by Macrogen (Seoul, Korea) as described before [4]. Further processing of the DNA reads was performed using the software tools described below with default parameter settings unless indicated otherwise.
In total, 103,602 single-end reads with an average length of 8308 bp (N50: 10,885 bp) were obtained. The DNA reads were quality controlled and checked for remaining adapters using LongQC v1.2.0c [5]. Filtering and trimming returned 84,826 reads with a total length of 802,012,239 bp. Genome assembly was performed using Canu v2.0 [6]. The original DNA reads were mapped to the scaffolds using pbmm2 (SMRT Link v10.1.0.119588) for error correction. Consensus sequences were generated from the alignments using the gcpp arrow algorithm (SMRT Link v10.1.0.119588) and also using a pipeline of the samtools-1.10, bcftools-1.7 and vcfutils.pl utilities. The consensus sequences were annotated using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) [7].
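The read counts and base totals reported above allow a rough check of read retention and sequencing depth. The sketch below uses only numbers stated in this article; the helper name `sequencing_depth` is illustrative, and the depth estimate assumes that reads are spread evenly over both replicons, which need not hold if the plasmid is present in more than one copy per cell.

```python
# Values taken from the text of this article.
RAW_READS = 103_602
FILTERED_READS = 84_826
FILTERED_BASES = 802_012_239
CHROMOSOME_BP = 4_880_425
PLASMID_BP = 38_606


def sequencing_depth(total_bases, replicon_lengths):
    """Average depth, assuming reads are spread evenly over all replicons."""
    return total_bases / sum(replicon_lengths)


retained = FILTERED_READS / RAW_READS
mean_filtered_length = FILTERED_BASES / FILTERED_READS
depth = sequencing_depth(FILTERED_BASES, [CHROMOSOME_BP, PLASMID_BP])

print(f"reads retained after filtering: {retained:.1%}")
print(f"mean filtered read length: {mean_filtered_length:.0f} bp")
print(f"approximate average depth: {depth:.0f}x")
```

Under these assumptions about 82% of the reads pass filtering, and the filtered data correspond to roughly 163-fold average coverage of the 4,919,031 bp genome.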
The raw reads were deposited under the BioProject number PRJNA754843 with SRA accession SRR21079300. The complete chromosomal sequence was deposited under the accession number CP102942, and the plasmid sequence under the accession number CP102943.
Genome-scale sequences were aligned by the program MAUVE 20150226 [8] .

Taxonomic Identification
OrthoANI v.0.93.1 [9] was used to identify the strain by estimating genomic distance without alignment and computing mean nucleotide identity.

Genomic Island Identification and DNA Methylation Profiling
Mobile horizontally acquired genomic islands (GIs) were identified using SeqWord Genomic Island Sniffer [13] . Sequence similarity between GIs was estimated by comparison of frequencies of tetranucleotides in their composition [14] .
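Comparing sequences by tetranucleotide composition can be illustrated with a short sketch. It is not the SeqWord implementation: the function names are hypothetical, and the Pearson correlation between frequency vectors is an assumed stand-in for the similarity measure used in [14].

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Relative frequencies of overlapping 4-mers (ambiguous words skipped)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    valid = [counts.get(word, 0) for word in TETRAMERS]
    total = sum(valid)
    if total == 0:
        raise ValueError("sequence too short or fully ambiguous")
    return [count / total for count in valid]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Toy sequences (hypothetical); real genomic islands span kilobases.
island_a = "ACGTACGTTTGACCGTACGATCGATCGGCTA"
island_b = "GGGCCCGGGCGCGCCGGCGCGGGCCCGCGCG"
freq_a = tetranucleotide_frequencies(island_a)
freq_b = tetranucleotide_frequencies(island_b)
print(f"similarity of A to itself: {pearson(freq_a, freq_a):.3f}")
print(f"similarity of A to B: {pearson(freq_a, freq_b):.3f}")
```

Islands with similar oligonucleotide usage tend to score close to 1, which is the basis for grouping them by likely common origin.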
Identification of methylated nucleotides and motifs of DNA methylation was performed using programs ipdSummary and motifMaker of the package SMRT Link v10.1.0.119588 as described previously [15] .
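Once a methylation motif such as ACCTC has been reported, its occurrences can be located on both strands with a plain string search. The sketch below is independent of ipdSummary and motifMaker. The function names are hypothetical, the sequence is treated as linear (motifs spanning the junction of a circular replicon are missed), and which base within the motif is modified is passed as the `offset` parameter, whose default of 2 (the third base) is an assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_sites(seq, motif="ACCTC", offset=2):
    """Locate the base at `offset` (0-based, within the motif) for every
    motif occurrence on either strand.

    Returns sorted (position, strand) tuples; positions are 0-based
    coordinates on the forward strand.
    """
    seq = seq.upper()
    sites = []
    patterns = (("+", motif), ("-", reverse_complement(motif)))
    for strand, pattern in patterns:
        start = seq.find(pattern)
        while start != -1:
            if strand == "+":
                position = start + offset
            else:
                # On the reverse strand the motif reads right to left.
                position = start + len(motif) - 1 - offset
            sites.append((position, strand))
            start = seq.find(pattern, start + 1)
    return sorted(sites)


# Toy sequence (hypothetical) with one motif on each strand.
toy = "TTACCTCAAGAGGTAA"
print(find_motif_sites(toy))  # [(4, '+'), (11, '-')]
```

Comparing such predicted sites with the positions flagged by kinetic analysis gives the fraction of motif instances that are actually modified.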

Ethics Statements
The protocol was approved by the Committee of Institutional Animal Care and Use in the Scientific Center for Anti-Infectious Drugs (SCAID), Almaty (ID: #2 from 09.16.2020).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.