Whole genome shotgun sequences of Streptococcus pyogenes causing acute pharyngitis from India

Streptococcus pyogenes, a group A streptococcus (GAS), is a predominant human pathogen that causes over 600 million infections annually. The lack of genomic data on GAS from India limits the understanding of its virulence and antimicrobial resistance determinants. The genomes of GAS isolates from clinical samples collected at Navi Mumbai, India were sequenced and annotated. Sequencing was performed on the Ion Torrent PGM platform. The size of the annotated S. pyogenes genomes ranged from ~1.69 to ~1.85 Mb, with coverage of 38× to 189×. Most of the isolates had msr(D) and mef(A), and four isolates had the erm(B) gene for macrolide resistance. All GAS genomes harboured multiple virulence factors, including exotoxins, in addition to phage elements. Four isolates belonged to sequence type ST28, seven were identified as ST36 and one as ST55.


Data
Streptococcus pyogenes, a group A streptococcus (GAS), is a predominant human pathogen that causes over 600 million infections annually. GAS throat infections are common in children between 4 and 7 years of age and pose several clinical and public health challenges [1]. The prevalence of pharyngitis caused by S. pyogenes is difficult to determine because the organism is also a throat colonizer, but some studies report it as 10-15% [2]. GAS pharyngitis usually goes undiagnosed because it is self-limiting and most pharyngitis cases are of viral etiology [3]. M proteins, pili, leukocidins, streptolysins (O, S), complement-inhibiting proteins, immunoglobulin-degrading enzymes, and superantigens are genome-encoded virulence factors that have been well characterized in S. pyogenes [4,5], with efflux pumps and leukocyte evasion strategies remaining integral factors. S. pyogenes shows high genomic plasticity due to prophage integration and horizontal gene transfer [6]. The post-streptococcal sequelae of GAS pharyngitis are the non-suppurative manifestations of rheumatic fever, followed by rheumatic heart disease (RHD). In India, the overall prevalence is estimated at 1.5-2/1000 across all age groups (total population about 1.3 billion), suggesting 2.0 to 2.5 million patients with RHD in the country [4]. Given the high burden of GAS infections in India, preventive strategies such as vaccination are the need of the hour.
Furthermore, the lack of genomic data on GAS from India limits the understanding of its virulence and antimicrobial resistance determinants. This study reports whole genome sequence data of S. pyogenes from India for the first time. The GAS genomic data, generated by whole genome shotgun sequencing, will serve as a base for further research focusing on the genomic attributes of virulence, antimicrobial resistance and clonal association.

Study isolates
From March to May 2017, children up to 18 years of age with acute pharyngitis were screened for GAS infections at Dr. Yewale Multispeciality Hospital for Children, Navi Mumbai, using a cutoff score of 3 on the Modified Centor criteria.

DNA extraction and genome sequencing
A total of 12 culture-confirmed S. pyogenes isolates were subjected to total DNA extraction using the QIAamp DNA Mini Kit (Qiagen, Germany). Whole genome shotgun sequencing was performed using the Ion Torrent PGM platform (Life Technologies) with 400 bp chemistry.

De novo assembly and annotation
Assembly of the raw reads was performed using AssemblerSPAdes v.5.0.0.0 embedded in Torrent Suite Server v.5.0.5. The genomes were annotated using the PATRIC database (the bacterial bioinformatics database and analysis resource) (http://www.patricbrc.org) [7] and the NCBI Prokaryotic Genome Automatic Annotation Pipeline (PGAAP) (http://www.ncbi.nlm.nih.gov/genomes/static/Pipeline.html). Further genome analysis was performed with the genomic tools available at the Center for Genomic Epidemiology (CGE) server (http://www.cbs.dtu.dk/services) and the PATRIC database. The size of the annotated S. pyogenes genomes ranged from ~1.69 to ~1.85 Mb, with coverage of 38× to 189× (Table 1). The number of coding DNA sequences (CDS) per genome ranged between 1725 and 2042. The draft genome sequences have been deposited in DDBJ/ENA/GenBank under the accession numbers provided in Table 1. The version described in this manuscript is version 1.
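As an illustration of how a coverage figure such as 38× relates to genome size, mean sequencing depth can be approximated as the total number of sequenced bases divided by the assembled genome length. The sketch below assumes this simple definition; the function name, read count, and read length are hypothetical and are not taken from the study's pipeline or data.

```python
def mean_coverage(read_lengths, genome_size_bp):
    """Approximate mean depth: total sequenced bases / assembled genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return sum(read_lengths) / genome_size_bp


# Hypothetical run (not study data): 200,000 reads of 320 bp each
# against an assembly of about 1.69 Mb.
reads = [320] * 200_000
depth = mean_coverage(reads, 1_690_000)
print(f"{depth:.1f}x")
```

Depth reported by an assembler may differ from this estimate, since it is usually computed from reads that actually map to the assembled contigs.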
Antimicrobial resistance (AMR) genes and plasmids were screened with the ResFinder 2.1 and PlasmidFinder 1.3 tools [8,9]. Most of the isolates had msr(D) and mef(A), and four isolates had the erm(B) gene for macrolide resistance. The (6)-Ia and tet(M) genes for aminoglycoside and tetracycline resistance, respectively, were also detected (Table 1). In addition, PATRIC analysis revealed genes for an ABC transporter membrane-spanning permease, the multidrug resistance efflux pump pmrA and a multi antimicrobial extrusion (MATE) family transporter, responsible for macrolide and multi-drug resistance, in all isolates. Multiple virulence determinants in the GAS genomes were identified using the annotated data from PATRIC (Table 2). All the genomes harboured streptolysins O and S, and streptococcal pyrogenic exotoxins C and G. Clusters of regularly interspaced short palindromic repeats (CRISPR) and spacer sequences in the genomes were identified using CRISPRFinder (http://crispr.u-psud.fr/Server/) [10]. All isolates carried 1,2,3,4,5d CRISPR type with varied repeat, spacer and array regions (Table 3).
Multi-locus sequence typing (MLST) of the GAS isolates was interpreted against the standard references available at the MLST 1.8 database (https://cge.cbs.dtu.dk//services/MLST/). Four isolates belonged to ST28, seven were identified as ST36 and one as ST55. M protein typing was done using the Blast 2.0 server provided by National Centers for Disease Control, Biotechnology Core Facility Computing Laboratory, and emm types were assigned. Isolates with ST28 correspond to emm1.0 (emm-cluster A-C3), ST36 to emm12.0 (emm-cluster A-C4) and ST55 to emm2.0 (emm-cluster E4) (Table 1).
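The correspondence between sequence types and emm types described above can be summarized as a small lookup, sketched here under stated assumptions. The mapping and the per-ST counts come from the text; the variable and function names are hypothetical, and the per-isolate list is reconstructed from the counts alone rather than from the actual isolate records in Table 1.

```python
from collections import Counter

# ST -> (emm type, emm-cluster), as reported in the text.
ST_TO_EMM = {
    "ST28": ("emm1.0", "A-C3"),
    "ST36": ("emm12.0", "A-C4"),
    "ST55": ("emm2.0", "E4"),
}

# Illustrative per-isolate ST calls: only the counts (4, 7, 1) come from the text.
st_calls = ["ST28"] * 4 + ["ST36"] * 7 + ["ST55"] * 1


def summarize(calls):
    """Return {ST: (isolate count, emm type, emm-cluster)} for the given ST calls."""
    counts = Counter(calls)
    return {st: (n, *ST_TO_EMM[st]) for st, n in counts.items()}


summary = summarize(st_calls)
for st, (n, emm, cluster) in sorted(summary.items()):
    print(st, n, emm, cluster)
```

A one-to-one lookup like this only holds for the isolates reported here; in general an ST can be associated with more than one emm type.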
The phages and phage associated elements in the genome of GAS were identified using PHAge Search Tool Enhanced Release (PHASTER) [11] (