Genome data of Stenotrophomonas maltophilia DF07 collected from polluted river sediment reveals an opportunistic pathogen and a potential antibiotic reservoir

Stenotrophomonas maltophilia DF07 is a Gram-negative bacterium isolated from polluted San Jacinto River sediment near Moncrief Park in Channelview, Texas. The genome of strain DF07 (chromosome and plasmid) was assembled at the scaffold level and can be accessed through the National Center for Biotechnology Information database under accession NZ_NJGC00000000. The DF07 genome totals 4,801,842 bp and encodes approximately 4,351 functional proteins. Approximately 86 proteins are associated with broad-spectrum antibiotic resistance, 11 are associated with bacteriocin production, and 17 belong to an assortment of Mycobacterium-like virulence and invasion operons. S. maltophilia DF07 is genetically similar to the nosocomial S. maltophilia strain AU12-09, but also harbors an unusually large plasmid that encodes over 150 proteins of unknown function. Taken together, these features suggest that this strain is a potentially important antibiotic resistance reservoir, and its origin within a recreational park merits further study of the area.


Data
Sequence analysis identifies DF07 as a novel strain of Stenotrophomonas maltophilia, a member of the Xanthomonadaceae family within the class Gammaproteobacteria. This Gram-negative bacterium is ubiquitously distributed throughout both soil and aquatic environments, and is known to be an opportunistic human pathogen. According to Mash genome analysis, the closest relative of S. maltophilia DF07 is S. maltophilia AU12-09, a nosocomial isolate collected from a hospital intravascular catheter (Fig. 1) [1]. Genome annotation reveals that S. maltophilia DF07 possesses many of the antibiotic resistance determinants identified in strain AU12-09. These determinants include a complement of 12 β-lactamase enzymes and associated proteins, aminoglycoside inactivation enzymes, and fluoroquinolone resistance proteins, as well as 5 tripartite and 27 multidrug pump proteins related to antibiotic efflux (Fig. 2). Of particular note is the large plasmid found within the DF07 strain. This plasmid is 209,390 bp long and encodes approximately 179 genes, many of which have unknown functions. S. maltophilia DF07 also encodes several chromosomal and plasmid-based Mycobacterium-like virulence and invasion operons (Fig. 3).

Sample collection
Sediment was collected from the bottom of a 12-inch hole dug on the bank of the San Jacinto River alongside Moncrief Park in northern Channelview, Texas. Moncrief Park lies west of the now-enclosed San Jacinto River Waste Pits, a submerged Superfund site once used for dumping paper mill waste.

Value of the data
The genome data of S. maltophilia DF07 highlights the presence of an unusually large plasmid. S. maltophilia DF07 therefore provides insight into horizontal gene transfer and possibly the spread of antibiotic resistance and virulence determinants across the San Jacinto River in addition to other polluted waterways. This genomic data expands our understanding of the potential for opportunistic pathogenicity in S. maltophilia isolates and their capacity to act as an antibiotic resistance reservoir. The data presented in this brief can be used in antibiotic resistance comparisons between environmental and nosocomial (clinical) isolates of S. maltophilia.

Sample screening
Carbon-selective medium was prepared as previously described by Iyer et al., 2016 [2]. A total of 5 mL of carbon-selective medium was used for initial sample inoculation. Dibenzofuran, added at a final concentration of 100 mg/mL, was used as a screening agent and potential carbon source. Subcultures were performed over five weeks before plating onto minimal agar plates supplemented with dibenzofuran.

Genomic DNA preparation
Yellow colonies from the plates were revived in 5 mL of Luria-Bertani medium and grown overnight. Total cellular DNA was then extracted from the overnight culture using a Qiagen DNeasy Blood and Tissue kit.

Whole genome sequencing
Prepared sample DNA was shipped to Genewiz (South Plainfield, NJ), which performed Illumina MiSeq paired-end sequencing (Table 1).

Genome annotation
Raw sequence data were first quality-checked with FastQC [3], and poor-quality reads were filtered out using BBTools [4]. The remaining reads were then assembled with SPAdes 3.10 [5]. Annotation was performed through both the NCBI Prokaryotic Genomes Automatic Annotation Pipeline (http://www.ncbi.nlm.nih.gov/genomes/static/Pipeline.html) and the RAST server (see Table 1) [6,7].
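The quality check, read filtering, and assembly steps described above could be chained roughly as in the following sketch. It only builds and prints the command lines and does not run them. The exact tools and parameters used for DF07 are not reported here, so the choice of bbduk.sh as the BBTools filter, the trimming thresholds (qtrim=rl, trimq=10, minlen=50), the working directory, and all file names are illustrative assumptions rather than the settings used in this study.

```python
from pathlib import PurePosixPath
from typing import List


def build_pipeline(r1: str, r2: str, workdir: str = "df07_assembly") -> List[List[str]]:
    """Return the QC, filtering, and assembly commands for one paired-end library.

    All paths and thresholds are hypothetical placeholders.
    """
    out = PurePosixPath(workdir)
    clean1 = str(out / "clean_R1.fastq.gz")
    clean2 = str(out / "clean_R2.fastq.gz")
    return [
        # 1. Per-read quality report (the output directory must already exist).
        ["fastqc", r1, r2, "-o", str(out / "fastqc")],
        # 2. Quality trimming and length filtering with BBDuk from BBTools
        #    (assumed tool and thresholds).
        ["bbduk.sh", f"in1={r1}", f"in2={r2}", f"out1={clean1}", f"out2={clean2}",
         "qtrim=rl", "trimq=10", "minlen=50"],
        # 3. De novo assembly of the filtered read pairs with SPAdes.
        ["spades.py", "-1", clean1, "-2", clean2, "-o", str(out / "spades")],
    ]


if __name__ == "__main__":
    # Hypothetical input file names for the two MiSeq read files.
    for cmd in build_pipeline("DF07_R1.fastq.gz", "DF07_R2.fastq.gz"):
        print(" ".join(cmd))
```

Each inner list can be passed to subprocess.run(cmd, check=True) once the tools are installed; keeping the commands as data makes it easy to log the exact parameters alongside the assembly.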

Phylogeny analysis
The Mash program was used first for species identification, then to map the five closest bacterial hits based on their Mash distances to the sample strain, using the Mash sketch database for RefSeq release 70 (k-mer size = 21, sketch size = 1000) [8]. The resulting distance file was then imported into R, and the ggdendro package [9,10] was used to create a phylogenetic tree (see Fig. 1).
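To make the distance-based ranking concrete, the sketch below shows how a Mash distance can be derived from the number of hashes shared between two sketches, using the published Mash formula D = -(1/k) ln(2j / (1 + j)), where j is the Jaccard estimate, with the k-mer size of 21 and sketch size of 1000 stated above. The reference names and shared-hash counts are invented for illustration and are not results from this study.

```python
import math
from typing import Dict, List, Tuple


def mash_distance(shared: int, sketch_size: int = 1000, k: int = 21) -> float:
    """Mash distance from the number of hashes shared by two sketches."""
    j = shared / sketch_size  # Jaccard index estimated from the sketches
    if j == 0:
        return 1.0  # no shared hashes: maximal distance
    if j == 1:
        return 0.0  # identical sketches
    return -1.0 / k * math.log(2 * j / (1 + j))


def closest_hits(shared_counts: Dict[str, int], sketch_size: int = 1000,
                 k: int = 21, n: int = 5) -> List[Tuple[str, float]]:
    """Return the n references with the smallest Mash distance to the query."""
    scored = [(name, mash_distance(s, sketch_size, k))
              for name, s in shared_counts.items()]
    return sorted(scored, key=lambda item: item[1])[:n]


if __name__ == "__main__":
    # Hypothetical shared-hash counts (out of 1000) against six references.
    example = {"ref_A": 962, "ref_B": 840, "ref_C": 815,
               "ref_D": 790, "ref_E": 512, "ref_F": 120}
    for name, dist in closest_hits(example):
        print(f"{name}\t{dist:.5f}")
```

The pairwise distances among the selected hits form the matrix that a hierarchical clustering and dendrogram step, such as the one performed in R here, takes as input.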

Acknowledgments
Funding for whole genome sequencing of S. maltophilia DF07 was provided by the National Institute of Standards and Technology (NIST) (G110008 58,106).

Author contributions
RI was the principal investigator of this research and all research work was conducted in her laboratory space. RI also submitted compiled sequence data to NCBI and proofread the manuscript. AD carried out genome assembly and phylogenetic analysis. BI performed RAST analysis and helped with drafting the manuscript. All authors read and approved the final text.