Genome sequencing data of extended-spectrum beta-lactamase-producing Escherichia coli INF191/17/A isolated from a nosocomial infection

Infection with extended-spectrum beta-lactamase (ESBL)-producing Escherichia coli is associated with higher mortality, longer hospital stays, and increased costs compared with infection with antibiotic-susceptible E. coli. Here, the draft genome of an ESBL-producing E. coli strain circulating in a local hospital is reported. The strain was found to carry the antibiotic resistance genes TEM, CTX-M-1, and CTX-M-9. The 5,136,548-bp genome, with a GC content of 50.59%, comprises 4987 protein-coding genes, four ribosomal RNA genes, and 66 transfer RNA genes. ResFinder predicted fourteen antimicrobial resistance genes in the E. coli INF191/17/A genome. The sequence data have been deposited in GenBank under accession number JAIEXV000000000 (BioProject PRJNA752944). The raw reads, generated on an Illumina MiSeq, have been submitted to the NCBI SRA database (SRX11797310) and are publicly available.


Specifications Table
Subject: Health and medical sciences
Specific subject area: Microbiology and genomics. Genome sequencing of pathogenic bacteria using a next-generation sequencing approach.
Type of data:

Value of the Data
• The whole-genome sequencing data provide insight into the genomic determinants of the ESBL-producing E. coli strain INF191/17/A, including its antimicrobial resistance (AMR) genes.
• These data can be used by researchers and public health officers to support the surveillance and control of ESBL-producing Gram-negative organisms and to help prevent the emergence of highly resistant strains, a serious problem worldwide.
• The genome data of E. coli strain INF191/17/A can support research on pathogenic bacteria, including comparative genomics, pan-genome analysis, and studies of the evolution of non-ESBL and ESBL strains across different epidemiological settings.
• Furthermore, a comprehensive understanding of the whole genome of this pathogen is critically important prior to biomarker discovery and drug or vaccine development.

Data Description
The Escherichia coli strain INF191/17/A was identified as an extended-spectrum beta-lactamase (ESBL)-producing strain carrying the antibiotic resistance genes TEM, CTX-M-1, and CTX-M-9 by polymerase chain reaction (PCR) using ESBL-specific primers [1]. Paired-end raw reads (2 × 251 bp) of the E. coli INF191/17/A genome were obtained on the Illumina MiSeq system (Illumina, CA, USA) [2]. The raw reads were pre-processed before genome assembly and annotation, and antimicrobial resistance genes were predicted using a curated public database. Genomic DNA extracted from E. coli strain INF191/17/A was sequenced in a 500-cycle run, generating a total of 1,368,224 reads. The paired-end dataset (191-17-A_R1.fastq and 191-17-A_R2.fastq) contained 329,238,355 bases in total (Table 1). Pre-processing of the raw reads, which included trimming of adapter sequences and removal of low-quality and short reads, yielded 46.9% clean reads. De novo assembly of the clean reads generated 314 contigs with a total size of 5.12 Mbp. Scaffolding produced 74 scaffolds; the longest scaffold was 2,520,446 bp and the scaffold N50 was 1,733,129 bp (Table 2). The average coverage of the assembled sequence is 66x, and its G + C content is 50.59%. Using PGAP, a total of 4987 coding sequences (CDS), four ribosomal RNA (rRNA) genes, and 66 transfer RNA (tRNA) genes were predicted (Table 3). Furthermore, ResFinder predicted fourteen antimicrobial resistance genes in the genome (Table 4).
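The scaffold N50 and G + C content reported above follow standard definitions. The sketch below, assuming plain Python 3 and toy inputs rather than the INF191/17/A data, shows how these two values can be computed from a list of scaffold lengths and a nucleotide sequence. The function names are hypothetical and are not part of any tool used in this study; in this sketch, ambiguous bases such as N are excluded from the G + C denominator, which is an assumption rather than a universal convention.

```python
from __future__ import annotations

from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly size."""
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("no sequence lengths supplied")
    half_total = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half_total:
            return length
    # Unreachable for positive lengths; kept for completeness.
    return sorted_lengths[-1]


def gc_content(sequence: str) -> float:
    """Percentage of G and C among unambiguous A/C/G/T bases (hypothetical helper)."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# Toy usage with made-up scaffold lengths and sequence.
print(n50([100, 80, 50, 30, 20, 10]))
print(gc_content("ATGCGCAT"))
```

For the toy lengths (290 bp in total), the two longest scaffolds together reach 180 bp, which exceeds half of the total, so the N50 is 80 bp.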

Sample Collection and Isolation of ESBL E. coli Strain INF191/17/A
E. coli strain INF191/17/A was isolated from a 45-year-old male patient presenting with a high fever at a local hospital. In brief, the sample was cultured in the BACTEC 9240 blood culture system (Becton, Dickinson and Company, USA) before biochemical testing and Gram staining [3]. ESBL screening and disk confirmatory tests were performed according to Clinical and Laboratory Standards Institute (CLSI) guidelines [4]. The 16S rRNA sequence of this strain was verified using E. coli-specific primers [5]. PCR with ESBL-specific primers was then performed to confirm the ESBL type [1].
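The zone diameters from the disk confirmatory test are not reported here. The CLSI phenotypic confirmatory disk test for E. coli is commonly applied as follows: ESBL production is confirmed when the inhibition zone of cefotaxime or ceftazidime increases by at least 5 mm when the agent is tested in combination with clavulanic acid, compared with the agent alone. The sketch below encodes that rule, assuming this is the criterion that was applied; the class and function names and all example diameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ZoneDiameters:
    """Inhibition zone diameters (mm) for one isolate (hypothetical record type)."""
    cefotaxime: float
    cefotaxime_clavulanate: float
    ceftazidime: float
    ceftazidime_clavulanate: float


def esbl_confirmed(zones: ZoneDiameters, min_increase_mm: float = 5.0) -> bool:
    """Assumed CLSI-style rule: ESBL production is confirmed when adding
    clavulanate enlarges the zone by >= min_increase_mm for either agent."""
    cefotaxime_gain = zones.cefotaxime_clavulanate - zones.cefotaxime
    ceftazidime_gain = zones.ceftazidime_clavulanate - zones.ceftazidime
    return cefotaxime_gain >= min_increase_mm or ceftazidime_gain >= min_increase_mm


# Hypothetical isolate: cefotaxime zone grows from 14 mm to 22 mm with clavulanate.
print(esbl_confirmed(ZoneDiameters(14, 22, 16, 18)))
```

In this hypothetical example the cefotaxime pair alone (an 8 mm increase) is enough to meet the criterion, even though the ceftazidime pair increases by only 2 mm.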

DNA Isolation, Genome Sequencing, Assembly, and Annotation
Genomic DNA was isolated using the NucleoSpin Tissue DNA, RNA, and protein purification kit (Macherey-Nagel) according to the manufacturer's instructions. The purified DNA was processed using the Nextera XT DNA library preparation kit (Illumina, USA) following the manufacturer's instructions. Whole-genome sequencing was performed on the MiSeq platform (Illumina, USA) with 2 × 251 bp paired-end reads. Adapter trimming, quality trimming, contaminant filtering, and read-length filtering were performed using BBDuk (BBTools version 36) (http://jgi.doe.gov/data-and-tools/bbtools/). Low-quality bases (< Q30) were trimmed and short reads (< 50 bp) were removed to produce a high-quality clean read dataset. The clean reads were assembled de novo into contigs using SPAdes v3.9.0 [6]. The assembled contigs were then scaffolded against the closest reference genomes [3] with Medusa (Multi-Draft based Scaffolder) [7] to produce a draft genome. The genome was annotated using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) v4.10 [8].
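BBDuk's own quality-trimming algorithm is not reproduced here. As a simplified illustration of the thresholds stated above (Q30 and 50 bp), the sketch below clips bases below Q30 from both ends of a read and discards reads that end up shorter than 50 bp. It assumes Phred+33 quality encoding (as used by current Illumina instruments) and omits adapter trimming and contaminant filtering; the function names are hypothetical, and because this plain end-clipping differs from BBDuk's method, results on real data would not match exactly.

```python
from __future__ import annotations

from typing import Optional, Tuple


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def trim_read(seq: str, quality: str, min_q: int = 30,
              min_len: int = 50) -> Optional[Tuple[str, str]]:
    """Clip bases below min_q from both read ends; drop reads shorter than min_len.

    Returns (seq, quality) for a kept read, or None if the read is discarded.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(quality)
    start, end = 0, len(scores)
    # Advance past low-quality bases at the 5' end.
    while start < end and scores[start] < min_q:
        start += 1
    # Step back past low-quality bases at the 3' end.
    while end > start and scores[end - 1] < min_q:
        end -= 1
    if end - start < min_len:
        return None
    return seq[start:end], quality[start:end]


# Toy read: two Q2 bases ('#') on each side of 56 Q40 bases ('I').
kept = trim_read("A" * 60, "##" + "I" * 56 + "##")
print(None if kept is None else len(kept[0]))
```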

Antimicrobial Resistance Gene Analysis
ResFinder (v4.1) [9] was used to screen for antimicrobial resistance genes. The assembled genome was searched against the curated Escherichia coli database using the default parameters. A gene was considered present if the assembled sequence shared at least 95% nucleotide identity and 80% coverage with a candidate gene in the database.
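The acceptance rule above (at least 95% identity and 80% coverage) can be expressed as a simple filter over candidate matches. The sketch below assumes each candidate has already been parsed into a gene name, a percent identity, and a percent coverage of the database gene; the record type, field names, and gene labels are hypothetical and do not reflect ResFinder's actual output format.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass
class ResistanceHit:
    """One candidate gene match (hypothetical record, not ResFinder's format)."""
    gene: str
    identity: float  # percent nucleotide identity to the database gene
    coverage: float  # percent of the database gene covered by the alignment


def confirmed_genes(hits: Iterable[ResistanceHit], min_identity: float = 95.0,
                    min_coverage: float = 80.0) -> list[str]:
    """Keep hits meeting both thresholds; return unique gene names in input order."""
    kept: list[str] = []
    for hit in hits:
        if hit.identity >= min_identity and hit.coverage >= min_coverage:
            if hit.gene not in kept:
                kept.append(hit.gene)
    return kept


hits = [
    ResistanceHit("gene_A", 100.0, 100.0),  # passes both thresholds
    ResistanceHit("gene_B", 94.5, 100.0),   # fails identity
    ResistanceHit("gene_C", 99.0, 60.0),    # fails coverage
    ResistanceHit("gene_A", 98.0, 90.0),    # duplicate gene, reported once
    ResistanceHit("gene_D", 95.0, 80.0),    # exactly at both thresholds
]
print(confirmed_genes(hits))
```

Both comparisons use `>=`, so a match sitting exactly at 95% identity and 80% coverage is retained, matching the "at least" wording of the rule.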

Ethics Statement
The study protocol was approved by the ethics committee of the Universiti Sains Malaysia (USM/JEPeM/20030152).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.