Emergence of a novel lineage containing a prophage in emm/M3 group A Streptococcus associated with upsurge in invasive disease in the UK

A sudden increase in invasive Group A Streptococcus (iGAS) infections associated with emm/M3 isolates during the winter of 2008/09 prompted the initiation of enhanced surveillance in England. In order to characterise the population of emm/M3 GAS within the UK and determine bacterial factors that might be responsible for this upsurge, 442 emm/M3 isolates from cases of invasive and non-invasive infections during the period 2001–2013 were subjected to whole genome sequencing. MLST analysis differentiated emm/M3 isolates into three sequence types (STs): ST15, ST315 and ST406. Analysis of the whole genome SNP-based phylogeny showed that the majority of isolates from the 2008–2009 upsurge period belonged to a distinct lineage characterized by the presence of a prophage carrying the speC exotoxin and spd1 DNAase genes but loss of two other prophages considered typical of the emm/M3 lineage. This lineage was significantly associated with the upsurge in iGAS cases and we postulate that the upsurge could be attributed in part to expansion of this novel prophage-containing lineage within the population. The study underlines the importance of prompt genomic analysis of changes in the GAS population, providing an advanced public health warning system for newly emergent, pathogenic strains.


Introduction
Group A Streptococcus (GAS) has long been recognized as a human pathogen responsible for a diverse range of diseases. GAS infections cause significant morbidity and mortality globally, largely attributable to rheumatic heart disease and invasive infection. The minimum estimate, of over 500 000 deaths per year, places GAS among the major human pathogens (Carapetis et al., 2005). The organism itself possesses numerous surface-associated and secreted proteins that play a key role in host-bacteria interactions such as adherence and immune evasion (Bisno et al., 2003; Cunningham, 2000) and are therefore subject to strong selective pressure. M-protein is one such surface protein; it is encoded by the emm gene, acts as a major virulence factor and provides the basis for molecular typing.
An unusual increase in invasive GAS (iGAS) infections was first reported in the UK in November 2008 (Health Protection Report, 2009) (Fig. 1). Concerns over the increased incidence and increased case fatality ratio led to initiation of enhanced surveillance for iGAS infection. Assessment of more than 1200 sterile-site GAS isolates referred to the national Streptococcus and Diphtheria Reference Unit between January and July 2009 identified a significant increase in emm/M3 isolates, rising from 14 % in the previous year to 38 % in April 2009. Such type-specific dominance had never been described in the UK and generated considerable concern given the association between emm/M3 and severe disease presentation. There was no increase in any particular risk group, but the proportion of infections in children rose to 22 %, in comparison to 15 % in the 2003–2004 surveillance. Substantial increases in scarlet fever notifications were also documented during the upsurge period, some of which were also linked to emm/M3 isolates (Health Protection Report, 2008). The primary goal of this study was to investigate the observed changes in iGAS disease epidemiology in the UK in the upsurge period between November 2008 and April 2009 through bacterial whole genome sequencing of emm/M3 GAS isolates submitted to the reference laboratory before, during and after the upsurge.

Isolate collection
Microbiology laboratories in England are required to submit all sterile-site GAS isolates to the national reference laboratory for typing, and laboratories in other parts of the UK are able but not required to send isolates for typing. This UK-

Impact Statement
Invasive Group A Streptococcus (GAS) infections cause significant mortality worldwide each year. In 2009, an unusual upsurge of iGAS infections caused by the genotype/serotype emm/M3 was observed in the UK. We aimed to understand the reasons behind this upsurge through whole genome sequence analysis of emm/M3 strains isolated between 2001 and 2013. By examining the core and accessory genomes we identified a new lineage of emm/M3 associated with a prophage potentially responsible for the upsurge seen in 2009. Ongoing prophage surveillance can provide early warning of proliferation of lineages causing increased incidence of severe disease. Prompt identification of such emergent lineages may permit public health interventions to be developed at an early stage.

(Croucher et al., 2015) was used to avoid selecting possible recombination sites. Bespoke scripts written in the Python language were used to select candidate SNPs if DP (depth of coverage) was greater than 5, AD ratio (the ratio of the unfiltered count of all reads that carried that specific allele compared with other REF and ALT alleles at that site) was greater than 0.8, MQ (mapping quality) was greater than 30 and no more than 0.05 of reads mapping at the position possessed a mapping quality of 0 (MQ0). Heterozygous positions and SNP positions filtered out by the metrics listed were replaced with the character 'N'. For each isolate, output was directed to a serialized Python pickle file. Pickle files were then combined to generate a single multiple alignment concatenated FASTA file containing filtered SNPs, with the maximum proportion of Ns accepted in any column of the alignment set at 0.1. The script also excluded SNPs within prophage elements based on the MGAS315 genome prophage coordinates [taken from the MGAS315 GenBank file (http://www.ncbi.nlm.nih.gov/nuccore/NC_004070.1)]. Maximum-likelihood (ML) phylogenetic trees were then reconstructed using RAxML (Stamatakis et al., 2006).
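The SNP filtering criteria described above can be illustrated with a minimal Python sketch. The authors' bespoke scripts are not reproduced in the text, so the class `SiteCall`, the functions `filtered_base` and `drop_n_rich_columns`, and the in-memory data layout are hypothetical; only the thresholds (DP > 5, AD ratio > 0.8, MQ > 30, MQ0 fraction no more than 0.05, at most 0.1 Ns per alignment column) come from the description.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SiteCall:
    """Per-position summary for one isolate (hypothetical layout)."""
    base: str             # called base at this reference position
    dp: int               # DP: depth of coverage
    ad_ratio: float       # fraction of reads supporting the called allele
    mq: float             # MQ: mapping quality
    mq0_fraction: float   # fraction of reads with mapping quality 0
    heterozygous: bool = False


def filtered_base(call: SiteCall) -> str:
    """Return the called base if it passes every threshold, otherwise 'N'."""
    passes = (
        not call.heterozygous
        and call.dp > 5
        and call.ad_ratio > 0.8
        and call.mq > 30
        and call.mq0_fraction <= 0.05
    )
    return call.base if passes else "N"


def drop_n_rich_columns(alignment: Dict[str, str],
                        max_n_fraction: float = 0.1) -> Dict[str, str]:
    """Remove alignment columns whose proportion of 'N' exceeds max_n_fraction."""
    names = list(alignment)
    if not names:
        return {}
    length = len(alignment[names[0]])
    keep = []
    for i in range(length):
        n_count = sum(1 for name in names if alignment[name][i] == "N")
        if n_count / len(names) <= max_n_fraction:
            keep.append(i)
    return {name: "".join(alignment[name][i] for i in keep) for name in names}
```

In this sketch a failing position is masked rather than dropped, so every isolate keeps the same alignment length until the column filter is applied across isolates.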
[Initial phylogenetic trees were reconstructed using the MEGA phylogenetic tree analysis tool (Kumar et al., 2008).]
De novo assembly. Reads were assembled using VELVET (version 1.2.10) (Zerbino et al., 2008). The VELVET shuffleSequences_fastq.pl script was used to produce a shuffled FASTQ file as the input for VelvetOptimiser (version 2.1.9) (Gladman et al., 2012), which optimized the cumulative rank for N50 with minimum and maximum k-mer lengths of 55 and 75, respectively (-s 55 -e 75 -f '-shortPaired'). The resulting contigs were used to extract the MLST type of each isolate by comparing them with the MLST Streptococcus pyogenes database (http://spyogenes.mlst.net) using BLAST+ (Camacho et al., 2009). The MLST types were mapped onto the ML tree.
[Table 1, caption fragment: (from 1935, 1980–1981, 2001–2013 except November 2008 to April 2009). The percentage of isolates is given in parentheses. Isolates from Lineage C were significantly over-represented (P < 0.0001) in the upsurge period compared with any other lineage.]
Accessory genome investigation. To investigate the phage content of each isolate, reads were mapped in local alignment mode using Bowtie 2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) against a set of all the identified S. pyogenes prophages available in GenBank (total of 53 prophages, Table S2) to generate a sequence alignment/map (SAM) file. After converting the SAM file to BAM format, the BAM file was used to generate a variant-calling file (VCF) using the Samtools mpileup algorithm with default settings. Base polymorphisms were detected using an in-house Python script which parsed the VCF file line-by-line to determine the base-call at each nucleotide position. This list was filtered to retain SNPs with coverage of five or more reads, a polymorphic base frequency of at least 80 % and an overall variant call quality (i.e. base mapping) of at least 25 (phred score). The algorithm then generated an overall identity score for each prophage sequence. A prophage was considered present in an isolate when over 90 % of nucleotides were identified across 100 % of the length of the prophage sequence.
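The prophage presence call can be sketched as follows. This is an illustrative reading of the criterion rather than the in-house script: it assumes the identity score is the fraction of positions, taken over the full length of the prophage reference, at which a confident base call matches the reference, and it represents positions without a confident call as `None`. The function names `identity_score` and `prophages_present` are hypothetical.

```python
from typing import Dict, List, Optional


def identity_score(reference: str, calls: List[Optional[str]]) -> float:
    """Fraction of reference positions with a confident, matching base call.

    The denominator is the full reference length, so positions that are
    uncovered (None) count against the score.
    """
    if len(reference) != len(calls):
        raise ValueError("calls must cover every reference position")
    if not reference:
        return 0.0
    matches = sum(
        1
        for ref_base, call in zip(reference, calls)
        if call is not None and call.upper() == ref_base.upper()
    )
    return matches / len(reference)


def prophages_present(scores: Dict[str, float],
                      threshold: float = 0.9) -> List[str]:
    """Names of prophages whose identity score exceeds the threshold."""
    return sorted(name for name, score in scores.items() if score > threshold)
```

Dividing by the full reference length means that a prophage covered over only part of its length cannot pass on the strength of the covered region alone, which is one way to express the requirement that identity be assessed over 100 % of the sequence.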

Isolates and their sequence types
To determine if emm/M3 isolates from the iGAS upsurge period (November 2008 to April 2009) were distinct in any way, the genomes of 447 emm/M3 isolates, including 60 from the upsurge period, were sequenced. High-quality SNPs derived by mapping to reference sequence MGAS315 (emm/M3) were used to generate a ML phylogenetic tree (Fig. 2). A total of 3184 SNPs were found amongst all isolates. MLST data extracted in silico from the contigs derived by the de novo assemblies differentiated all isolates into one of three sequence types (STs): ST315 (211, 48 %), ST15 (181, 40 %) or ST406 (55, 12 %). The isolates of type ST315 and ST406 were confined to single clearly differentiated clonal lineages whereas ST15 was found in multiple ST15-specific lineages, including those designated Lineages A and C (Fig. 2). ST315 isolates (Lineage D) were predominant from 2001 to 2006 (ST315: 76 %, ST406: 14 %, ST15: 10 %). Isolates from the period of the upsurge were observed in all lineages, excluding the possibility that upsurge cases could be wholly attributed to a single lineage (Fig. 2).

Pan genome analysis: association of prophage FUK-M3.1-carrying lineage with upsurge
To determine whether changes in the pan genome occurred during the upsurge period, we compared the ratios of all genes within the isolates in the upsurge period with all other isolates in this study and identified those observed more often than expected during the upsurge period.
Lineage C comprised 78 isolates, 65 of which carried the complete FUK-M3.1 prophage, while the remainder either did not carry it or contained only an incomplete prophage (without the speC or the spd1 genes). Of these 65 isolates, 46 (70 %) were from invasive cases and the remainder from non-invasive cases. Although the isolates sequenced within Lineage C were derived from clinical isolates taken over a period of 10 years, one-third (22/65) of the FUK-M3.1-containing isolates were from the 6 month upsurge period. These FUK-M3.1-containing isolates accounted for 22 of the 60 upsurge cases included in the study. Although the numbers of emm/M3 isolates varied across the study period, it is noteworthy that isolates carrying prophage FUK-M3.1 were not observed until 2006 and diminished in subsequent years (Fig. 4).
Isolates from Lineage C were significantly over-represented [χ² (1 d.f.) = 17.77; P < 0.0001] in the upsurge period compared with any other lineage (Table 1). In contrast, isolates from other lineages, predominantly Lineage D, were significantly under-represented (P < 0.0026) in the upsurge period. No significant differences were identified between the other lineages.
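A test of this kind can be sketched as a Pearson chi-square test on a 2 × 2 table. The example below is an illustration only. It does not use the lineage counts of Table 1, which are not reproduced in the text, and so is not expected to return the reported value of 17.77. Instead it uses the prophage carriage counts given above (65 carriers, 22 of them among the 60 upsurge isolates) and assumes a total of 447 sequenced isolates and no continuity correction. The function name `chi_square_2x2` is hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square statistic (1 d.f., no continuity correction) and P value.

    Table layout:
                      upsurge   other periods
        group 1          a           b
        group 2          c           d
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Carriage counts derived from the text (assumption: 447 isolates in total):
# carriers: 22 upsurge, 65 - 22 = 43 other; non-carriers: 60 - 22 = 38 upsurge,
# 447 - 65 - 38 = 344 other.
chi2, p_value = chi_square_2x2(22, 43, 38, 344)
print(f"chi-square = {chi2:.2f}, P = {p_value:.2g}")
```

The closed-form expression for the P value holds only for one degree of freedom; a larger table would need a general chi-square distribution function.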

Non-UK and older isolates
Of the four non-UK isolates (two from Dublin isolated in 2006 and 2012 and two from Copenhagen isolated in 2007 and 2008) sequenced in this study, one strain from Dublin (accession no. ERS311234, isolated in 2006), which contained the FUK-M3.1 prophage, was associated with Lineage C while the other three were in the other lineages in the phylogeny (Lineages A, B and D in Fig. 2). Data from 86 emm/M3 GAS genomes collected from Ontario, Canada, between 2003 and 2009, taken from Shea et al. (2011), were analysed and incorporated into the phylogeny (data not shown). This revealed that six strains (accession nos. SRR125478, SRR125479, SRR125450, SRR125480, SRR125449, SRR125474) from the Ontario collection, isolated from 2002 to 2009, fell within Lineage C and also contained the FUK-M3.1 prophage, while others were distributed across the other lineages.
As part of this study, we also sequenced three isolates from the 1980s and one isolate from 1935 (NCTC 8191, ERS311351). The 1980s isolates did not contain the FUK-M3.1 prophage and belonged to a separate lineage. They did, however, contain the F315.1 and F315.2 prophages. The 1935 isolate contained the FUK-M3.1 prophage, whilst prophages F315.1 and F315.2 were missing. Interestingly, this isolate did not fall within Lineage C associated with the upsurge but instead belonged to another lineage derived from the common ancestor of ST15 strains.

Discussion
GAS emm/M3 strains are associated with severe infections and with a higher likelihood of streptococcal toxic shock syndrome, necrotizing fasciitis and death in some patients (Banks et al., 2002). The main objective of this study was to determine whether a pathogen-encoded factor or factors may have been responsible for the upsurge in emm/M3 isolates causing invasive GAS disease in England observed in late 2008/early 2009. Based on whole genome phylogeny we identified and characterized a new clonal lineage of emm/M3 GAS that was not present in detectable numbers in the collection examined before 2006. This lineage (Lineage C) was significantly associated with the upsurge period, and accounted for approximately one-third of cases within the upsurge. Furthermore, accessory genome analysis demonstrated that this lineage had gained a novel bacteriophage (FUK-M3.1) containing the genes speC and spd1 but lost two typical emm/M3 prophages: F315.1 and F315.2.
We considered the possibility that the absence of prophages F315.1 and F315.2, typically found among emm/M3 isolates, might be relevant to the transient success of Lineage C. Prophage F315.1 has a different insertion site from F315.2 and does not contain any known virulence factors; however, it is sited within the single CRISPR locus found within the emm/M3 genome. The absence of F315.1 among strains of Lineage C could potentially restore the CRISPR locus and may influence not only susceptibility to new DNA uptake, but also expression of virulence factors (Nozawa et al., 2011). Prophage F315.2 is a T12-like prophage that includes the superantigen ssa gene and is inserted at the predicted T12 att site. While the presence of phage-encoded superantigen genes is considered likely to confer advantage to GAS, it may be that acquisition of additional phages that encode alternative, possibly more potent, superantigens or other virulence factors compensates for this loss.
The proportion of emm types circulating in a population has been shown to vary often in a cyclical nature and periodic surges in specific emm types have previously been linked to the emergence of distinct clades. These clades have expanded and apparently replaced earlier lineages, for example the emergence of the modern emm/M1 (Nasser et al., 2014) and, more recently, acapsular emm/M89 through recombination-related remodelling of the genome (Turner et al., 2015). In this study we report an increase in the proportion of emm/M3 strains within the UK population and show that this was at least in part due to a lineage that was recently introduced into the UK. This lineage was distinguished by having a phage containing the speC and spd genes, a combination not seen in any of the phages that are commonly observed in emm/M3 strains. Unlike the rise of the acapsular emm/M89 lineage, which has been sustained, the 2009 rise in Lineage C was short-lived; a further rise in 2013 in emm/M3 was not associated with the same lineage or phage combinations. An apparent overall rise in iGAS in 2013 was accounted for by rises in several emm types including emm/M1. It would appear that S. pyogenes lineages can adopt a range of strategies to expand within a population, resulting in changes that are of varying durability.
Bacteriophages comprise 12 % of the published emm/M3 GAS genome MGAS315 and three of the four prophages found in MGAS315 are associated with at least one extracellular virulence factor including the superantigenic toxins ssa, speK and speA and the phospholipase slaA. Infection is likely to have accounted for the initial acquisition of FUK-M3.1 by the common ancestor of the 78 isolates within Lineage C. However, in 13 of the isolates, FUK-M3.1 was either absent or incomplete, perhaps through excision of the prophage. Both the speC and the spd1 genes can be associated with many different prophages that have been identified in the published emm/M1, emm/M2, emm/M4, emm/M5, emm/M6, emm/M12, emm/M18 and emm/M28 genomes. A BLAST comparison revealed that FUK-M3.1 is similar to the prophage identified in the published emm/M2, emm/M4 and emm/M12 GAS genomes, which suggested that if the prophage was acquired by a horizontal acquisition event the donor may have been one of these M types, although there is evidence that GAS share their phage pool with other species (Musser et al., 1991). The presence of a prophage containing the speC and spd1 genes in emm/M3 GAS has been detected, albeit rarely, in some countries outside the UK (Meisal et al., 1998; Musser et al., 1991; Sharkawy et al., 2002) and in our study in a single 1930s isolate, but is most commonly absent from emm/M3 isolates (Commons et al., 2008; Friães et al., 2003; Rivera et al., 2006). The FUK-M3.1 prophage was not detected in any contemporary emm/M3 isolates from our study prior to 2006 (Fig. 4). Therefore, we speculate that the lineage associated with the upsurge may have arisen by introduction of this strain into the UK from abroad, and resulted in a short-lived upsurge in severe disease phenotypes associated with GAS infection.
To support this hypothesis, isolates collected from the study in Ontario, Canada (Shea et al., 2011), between 2003 and 2009, two collected from Dublin (2006, 2012) and two collected from Copenhagen (2006, 2009) in this study were mapped onto the phylogeny. Six of the 86 isolates from Ontario were found within Lineage C. This suggested that Lineage C was not exclusive to the UK and was found in other populations. The prophage F315.1 and the ssa-associated prophage F315.2 were absent from the isolates in Lineage C. This was surprising as prophages F315.1 and F315.2 were present in all other emm/M3 isolates sequenced and many of those seen in other areas worldwide (Commons et al., 2008; Ferretti et al., 2001; Nozawa et al., 2011). We propose that the acquisition of FUK-M3.1 and loss of F315.1 and F315.2 occurred independently, rather than through replacement of one with the other, given that the positions of the integrated prophage hybrid sites attL and attR in the genomes are dissimilar, although we cannot exclude biological interference between the three prophages. The overarching question arising from such studies remains the reason for the association of the presence of FUK-M3.1 and other phages with the success of dominant lineages. Superantigens, such as speC, are hypothesized to undermine host immunity, potentially through T cell anergy, although direct evidence for this in the clinical setting is lacking (Llewelyn & Cohen, 2002). From an evolutionary standpoint, any advantage to the bacterium is likely to impact more on pharyngeal infection and transmission than invasiveness. Evidence from animal models supports a role for prophage-encoded superantigens in pharyngeal infection (Kasper et al., 2014; Virtaneva et al., 2005); however, whether T cell-related immunoparesis is important is unclear.

Conclusions
The upsurge in invasive emm/M3 GAS infections in England in 2008/2009 was associated with the emergence of a novel lineage of emm/M3 GAS isolates within the population. Decreased population immunity to this novel genetic variant, coupled with a biological advantage conferred by carriage of the speC/spd1-associated prophage FUK-M3.1, may have permitted expansion of this lineage throughout the UK, although we cannot exclude the role of other lineage-specific molecular changes. Acquisition of prophages may be a common feature of newly or rapidly emergent streptococcal lineages, but may only partly explain the success of such lineages. The expansion of emm/M3 Lineage C containing the FUK-M3.1 prophage does not appear to have been as enduring as the expansion observed for the modern emm/M1 and novel emm/M89 lineages in the UK, and we have not detected FUK-M3.1 in isolates from 2014–2015 (our unpublished data). Longitudinal molecular-epidemiological surveillance of prophage and toxin gene content within distinct GAS lineages could provide greater understanding of the contribution that such prophages make to periodic changes that occur in both upper respiratory tract and iGAS disease abundance. Furthermore, such surveillance, if applied to upper respiratory tract isolates, could provide early warning of lineages that may have a propensity for rapid expansion, thus facilitating potential public health interventions.