Whole-Genome Sequence of Escherichia coli Serotype O157:H7 Strain EDL932 (ATCC 43894)

The genome sequence of Escherichia coli serotype O157:H7 EDL933, a ground beef isolate from a 1983 hemorrhagic colitis outbreak, is a standard reference for comparative genomic studies of Shiga toxin-producing E. coli strains. Here, we report the genome sequence of a patient stool isolate from that outbreak, strain EDL932.

Shiga toxin-producing Escherichia coli (STEC) strains cause intestinal disease characterized by hemorrhagic colitis, which may progress to severe sequelae, such as hemolytic-uremic syndrome. In the United States, O157:H7 is the most important STEC serotype, both in total numbers of individual cases and in large outbreaks.
Individual serotype O157:H7 strains show significant genotypic variability depending on the environment from which they were isolated. This is evident with regard to biofilm-forming properties and RpoS-dependent stress resistance (4,5). Here, we report the whole-genome sequencing (WGS) of strain EDL932 (ATCC 43894), a patient isolate from that same 1983 Michigan outbreak.
Large genomic DNA (gDNA) fragments for single-molecule real-time (SMRT) sequencing were extracted from a frozen cell pellet (5 ml of overnight culture) of EDL932, designated 43894OW, using the Qiagen Genomic-tip 100/G kit. SMRT sequencing was done at the University of Delaware Sequencing and Genotyping Center (Newark, DE) using the PacBio RSII SMRT DNA sequencing system. De novo assembly of the SMRT reads used the Hierarchical Genome Assembly Process (version 3). Total DNA was also sent to ProteinCT Biotechnologies, LLC (Madison, WI) for a complete WGS workflow. Libraries were prepared using the Nextera DNA library preparation kit (Illumina) and sequenced using Illumina MiSeq. Approximately 4.5 million 2 × 250-bp paired-end reads were generated. Raw data quality was evaluated using FastQC, and Trimmomatic (USADELLAB.org) was used to remove adapters and low-quality sequences (<Q20). Clean data were assembled with Velvet (https://www.ebi.ac.uk/~zerbino/velvet/ [6]). Several parameters were tested, and the assemblies were evaluated by comparison to the EDL933 NCBI genome reference sequence (accession no. NZ_CP008957.1). The final assembly used in this report has a hash size of 69. In addition to genome assembly, clean reads were remapped to the EDL933 NCBI genome using Burrows-Wheeler Aligner (BWA) (7). Duplicated reads were removed with Picard (http://broadinstitute.github.io/picard/); variants were called using SAMtools (8) and annotated with SnpEff (9).
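As a rough check on the Illumina sequencing depth, the raw read count can be converted to an expected mean coverage (total sequenced bases divided by genome length). The sketch below is illustrative only: it assumes the 5,547,323-bp EDL933 reference length cited in this announcement as a stand-in for the EDL932 genome size, ignores trimming and duplicate removal, and evaluates both readings of "4.5 million paired-end reads" (individual reads versus read pairs), since the text does not say which was meant. The function and variable names are hypothetical.

```python
def expected_coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Expected mean depth: total sequenced bases divided by genome length."""
    return n_reads * read_len / genome_len


GENOME_LEN = 5_547_323  # EDL933 reference length (bp), used here as an assumption
READ_LEN = 250          # nominal MiSeq read length (bp), before trimming
N_REPORTED = 4_500_000  # "approximately 4.5 million" from the text

# Reading A (assumption): 4.5 million individual reads.
cov_if_reads = expected_coverage(N_REPORTED, READ_LEN, GENOME_LEN)

# Reading B (assumption): 4.5 million read pairs, i.e. twice as many reads.
cov_if_pairs = expected_coverage(2 * N_REPORTED, READ_LEN, GENOME_LEN)

print(f"Raw coverage if 4.5M reads: {cov_if_reads:.0f}x")
print(f"Raw coverage if 4.5M pairs: {cov_if_pairs:.0f}x")
```

Under either reading, the raw depth is on the order of a few hundred fold, an upper bound on the depth remaining after quality trimming and duplicate removal.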
The unclosed 43894OW draft genome aligned with the 5,547,323-bp EDL933 reference genome, except for a nearly 13-kbp region with exceptionally high coverage depth, indicating gene duplication in strain EDL932 (Integrative Genomics Viewer [IGV]; Broad Institute, Cambridge, MA). Complete resolution of duplicated areas may expand the genome size. Variant detection identified small numbers of single-nucleotide polymorphisms (SNPs) and indels, many with low quality scores. Included was a T-to-G transversion at position 721 of the rpoS coding sequence (CDS) compared to the reference EDL933, which changes a stop codon to a glutamic acid residue and extends EDL932 RpoS to 330 residues. This transversion matched the consensus for other serotype O157 strains, including the strain EDL933 (ATCC 43895) rpoS gene sequenced in our earlier study (10), indicating either variability among different outbreak isolates or an error in the original EDL933 sequence.
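The rpoS change described above can be checked with simple codon arithmetic. The sketch below is an illustration under the standard genetic code, not the annotation pipeline used in this work: it maps CDS position 721 to its codon and asks which stop codons become a glutamic acid codon when the base at that codon position is replaced by G. The helper names are hypothetical, and the codon table contains only the entries needed here.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")

# Subset of the standard genetic code needed for this illustration.
CODON_TABLE = {
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop codons
    "GAA": "E", "GAG": "E",              # glutamic acid
    "GGA": "G",                          # glycine
}


def codon_coordinates(cds_pos: int) -> tuple:
    """Map a 1-based CDS position to (codon number, position in codon), both 1-based."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def substitute(codon: str, pos_in_codon: int, new_base: str) -> str:
    """Return the codon with the base at the 1-based position replaced."""
    i = pos_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


codon_no, pos_in_codon = codon_coordinates(721)

# Amino acid produced from each stop codon after the change to G at that position.
outcomes = {
    stop: CODON_TABLE[substitute(stop, pos_in_codon, "G")]
    for stop in STOP_CODONS
}

print(f"CDS position 721 is base {pos_in_codon} of codon {codon_no}")
for stop, aa in outcomes.items():
    print(f"{stop} -> {substitute(stop, pos_in_codon, 'G')} encodes {aa}")
```

Position 721 is the first base of codon 241, and a T-to-G change there converts TAA or TAG (but not TGA) into a glutamic acid codon, which is consistent with the stop-to-glutamic acid change reported above.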
Nucleotide sequence accession numbers. This whole-genome shotgun project has been deposited at DDBJ/ENA/GenBank under the accession no. LPWC00000000. The version described in this paper is version LPWC02000000.

ACKNOWLEDGMENT
The mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture.