Genetic heterogeneity of the Spy1336/R28—Spy1337 virulence axis in Streptococcus pyogenes and effect on gene transcript levels and pathogenesis

Streptococcus pyogenes is a strict human pathogen responsible for more than 700 million infections annually worldwide. Strains of serotype M28 S. pyogenes are typically among the five more abundant types causing invasive infections and pharyngitis in adults and children. Type M28 strains also have an unusual propensity to cause puerperal sepsis and neonatal disease. We recently discovered that a one-nucleotide indel in an intergenic homopolymeric tract located between genes Spy1336/R28 and Spy1337 altered virulence in a mouse model of infection. In the present study, we analyzed size variation in this homopolymeric tract and determined the extent of heterogeneity in the number of tandemly-repeated 79-amino acid domains in the coding region of Spy1336/R28 in large samples of strains recovered from humans with invasive infections. Both repeat sequence elements are highly polymorphic in natural populations of M28 strains. Variation in the homopolymeric tract results in (i) changes in transcript levels of Spy1336/R28 and Spy1337 in vitro, (ii) differences in virulence in a mouse model of necrotizing myositis, and (iii) global transcriptome changes as shown by RNAseq analysis of isogenic mutant strains. Variation in the number of tandem repeats in the coding sequence of Spy1336/R28 is responsible for size variation of R28 protein in natural populations. Isogenic mutant strains in which genes encoding R28 or transcriptional regulator Spy1337 are inactivated are significantly less virulent in a nonhuman primate model of necrotizing myositis. Our findings provide impetus for additional studies addressing the role of R28 and Spy1337 variation in pathogen-host interactions.

Introduction Streptococcus pyogenes (group A streptococcus, GAS) is a strict human pathogen responsible for >700 million infections and~517,000 deaths annually worldwide [1]. Human infections range in severity from relatively mild conditions such as pharyngitis to life-threatening septicemia and necrotizing fasciitis/myositis [2]. GAS also causes skin infections such as impetigo and erysipelas [3] and post-infection sequelae, including rheumatic fever [4], rheumatic heart disease [5], and glomerulonephritis [6].
GAS strains are commonly classified based on serologic diversity in M protein, an antiphagocytic cell-surface virulence factor, or allelic variation in the 5'-end of the emm gene that encodes this protein [7,8]. More than 250 emm types have been identified, but the majority of infections in many countries are caused by a relatively small number of prevalent emm types: emm1, emm3, and emm12 [9][10][11][12]. Strains of emm28 (serotype M28) GAS are of special importance because they are among the top five emm types causing invasive infections in the USA [11,13] and several European countries [14][15][16]. They also have an unusual propensity to cause puerperal sepsis (childbed fever) and neonatal infections [17][18][19][20][21]. The molecular mechanisms contributing to the ability of emm28 strains to cause devastating peripartum infections are poorly understood.
The R28 protein is a surface-associated virulence factor made by emm28 strains and has been studied as a vaccine candidate [22][23][24]. R28 originally was described by Lancefield and colleagues based on serologic studies [25][26][27]. The gene (Spy1336/R28) encoding R protein in emm28 GAS strains is nearly identical to the alp3 gene encoding the Alp3 protein in group B streptococcus (GBS), a common cause of neonatal sepsis, pneumonia, and meningitis [24,28,29]. Genome sequencing of emm28 GAS strain MGAS6180 discovered that the gene encoding the R28 protein (Spy1336/R28) is located on an integrative-conjugative element (ICE)-like element originally designated as region of difference 2 (RD2) [30,31]. In the genome of reference strain MGAS6180 [31], RD2 is a 37.4 Kb segment of DNA that has 34 annotated genes. RD2 is present in a small number of other GAS emm types and is >99% identical to a region present in the chromosome of the majority of GBS strains [31]. These characteristics suggest that RD2 has been disseminated into different streptococcal strains and species by horizontal gene transfer and recombination [32,33]; published data support this idea [33]. The presence of RD2 in GBS and emm28 GAS strains causing infections associated with the female genital tract also suggests that genes located on this ICE element are causally involved in this clinical phenotype.
The Spy1337 gene is adjacent to, and divergently transcribed from, Spy1336/R28 in GAS. The Spy1337 protein is a member of the AraC family of prokaryotic transcriptional regulators. Members of this family of regulators commonly have two domains, a conserved C-terminal domain that defines the members of the family, and a variable N-terminal domain that differs among family members. The C-terminal domain comprises approximately 100 amino acids and has two helix-turn-helix (HTH) DNA-binding motifs. The N-terminal domain can be bifunctional, mediating effector-binding and multimerization of the regulator [34,35]. Genes encoding Spy1337/AraC-like proteins are frequently found adjacent to genes encoding surface antigens containing the YSIRK signal sequence motif {(YF)SIRKxxxGxxS}, which is responsible for localized secretion at the division septum [36,37]. This motif is present in the N-terminal domain of R28 at amino acid positions 19 through 30.
Recently, we proposed that the Spy1337 protein is a positive transcriptional regulator of both Spy1336/R28 and Spy1337, and that by regulating expression of Spy1336/R28 and other genes, Spy1337 is involved in emm28 GAS virulence [38]. To positively regulate virulence gene expression, AraC-like transcriptional regulators usually either bind to chemical effectors present at the site of infection to cause a conformational change favoring DNA-binding to their cognate gene targets [39,40], or they bind to AraC negative regulators (ANRs) that inhibits such interactions [41]. In addition, they can regulate their own expression [42][43][44][45][46].
The Spy1336/R28 gene has a centrally-located long TR motif (referred to herein as TR R28 ), characteristic of genes encoding Alp family proteins. Due to the homologous nature of repetitive DNA sequence, regions having TRs frequently vary in size as a consequence of mutational events involving either unequal crossover or intramolecular recombination [62][63][64][65]. The R28 protein made by GAS emm28 reference strain MGAS6180 [31] has 13 identical TR R28s of 79 amino acids (aa) each.
In a recent study of 492 M28 invasive isolates of GAS for which whole genome sequence and transcriptome data were available, we used machine learning to determine that a variablelength T-nucleotide homopolymeric tract (HT, referred to herein as HT Spy1336-7 ) in the intergenic region of Spy1336/R28 and Spy1337 was associated with differences in transcript levels of these two genes [38]. HTs are commonly characterized by rapid length variation [66]. When present in promoter regions, HTs can modify transcript levels by altering the distance between promoter elements [67] or changing the binding of transcription factors [68,69]. In~94% of the strains we studied, HT Spy1336-7 had either nine or ten T residues. Comparison of human infection isolates found that strains with 9Ts had significantly lower transcript levels of the Spy1336/R28 and Spy1337 genes compared to strains with 10Ts. Compared to a parental strain with 9Ts, an isogenic mutant strain with 10Ts in the HT Spy1336-7 had significantly increased transcript levels of Spy1336/R28 and Spy1337 and was significantly more virulent in a mouse necrotizing myositis infection model. In addition, compared to the isogenic strain with 9Ts, the strain with 10Ts was significantly more resistant to killing by human polymorphonuclear leukocytes ex vivo and produced more R28 protein. Thus, a one-nucleotide indel in the HT Spy1336-7 caused altered level of these two transcripts and changed the virulence phenotype [38]. Of note, there is an HT comprised of 8-15 T residues located in the regulatory region 99 nucleotides (nts) upstream of the bca gene encoding the GBS alpha C Alp protein [68].
In the present study, we determined the extent of heterogeneity in the number of TR R28s in Spy1336/R28 in a subset of 493 M28 invasive clinical isolates for which we had whole genome and transcriptome data, including the reference strain MGAS6180 [38]. In addition, we analyzed size variation of the HT Spy1336-7 region located upstream of Spy1336/R28 in >2,000 strains of emm28 GAS cultured from invasive human infections and compared transcriptome changes in isogenic strains containing variable lengths of HT Spy1336-7 . Finally, we used isogenic mutant strains to determine the contributions of Spy1336/R28 and Spy1337 to global gene expression and virulence.

Ethics statement
The clinical isolates strains used in this study were collected as part of comprehensive population-based public health surveillance studies of emm28 S. pyogenes infections conducted in 11 states in the United States, Canada (ON), the Faroe Islands in Denmark, Finland, Iceland, and Norway [38]. Consent for collection of these strains was waived and all data were fully anonymized.

Bacterial strains and growth conditions
GAS strains were grown at 37˚C in Todd-Hewitt broth (Bacto Todd-Hewitt broth; Becton Dickinson and Co.) supplemented with 0.2% yeast extract (THY medium). THY medium was supplemented with chloramphenicol (20 μg ml -1 ) as needed. Trypticase soy agar supplemented with 5% sheep blood (Becton Dickinson and Co.) was used as required. E. coli strains were grown in Luria-Bertani (LB) medium at 37˚C, unless indicated otherwise. LB medium was supplemented with chloramphenicol (Acros Organics; 20 μg ml -1 ) as needed.

DNA manipulation
Standard protocols or manufacturer's instructions were used to isolate plasmid DNA, and conduct restriction endonuclease, DNA ligase, PCR, and other enzymatic treatments of plasmids and DNA fragments. Enzymes were purchased from New England Biolabs, Inc (NEB). Q5 high-fidelity DNA polymerase (NEB) was used. Oligonucleotides were purchased from Sigma Aldrich.

Chromosomal DNA extraction and PCR amplification of the Spy1336/R28 repeat region
Chromosomal DNA extraction was performed as described [70], using Fast-Prep lysing Matrix B beads in 2-ml tubes (M P Biomedicals), or the DNeasy blood and tissue kit (Qiagen). The primer sequences used for PCR-based size determination of the Spy1336/R28 repeat region are shown in S5 Table. Three different primer sets were used. All primers were designed to bind to conserved regions located upstream, in the case of the forward (FWD) primers, or downstream, for the reverse (REV) primers, from the DNA sequence of the Spy1336/R28 gene encoding the repeat region. Primer set 1 comprised primers JE433 (FWD), binding 60-nt upstream of the repeat region, and JE431 (REV), binding 226-nt downstream of the repeat region. Primer set 2 included primers JE432 (FWD), binding 255-nt upstream of the repeat region, and JE431 (REV). Primer set 3 was comprised of primers JE410 (FWD), binding 2,264-nt upstream of the repeat region, and JE412 (REV), binding 1,098-nt downstream of the repeat region. The extension time used for the PCR reactions was adjusted to accommodate anticipated PCR fragment length. Typically primer set 1 was used first to obtain the desired PCR product and the other 2 primer sets were used if primer set 1 failed to yield an amplified product. Two different DNA ladders were used to determine the size of the PCR products, including the exACTGene 100 bp-10,000 bp DNA ladder (Fisher Scientific), and the 100-bp DNA step ladder (Promega). Both ladders were loaded at least 3 times in every agarose gel and used as reference to determine DNA fragment length (S1 Table).

Analysis of the number of T residues in the HT 1336-7
A blastable database of SPADES [71] assemblies was made for 2,095 emm28 genomes [38]. Using two adjacent DNA sequences of 20-nts that flank HT 1336-7 , the DNA region surrounding these two DNA sequences, including the two sequences, was extracted from each strain, and the number of T nucleotides counted. In addition, we analyzed the number of T nucleotides present in HT Spy1336-7 in the 2,095 emm28 genomes with the command-line program Jellyfish [72], which uses k-mers, and counted the occurrences of each HT Spy1336-7 variant. Using these combined approaches we identified the length of HT Spy1336-7 in 2,074 (~99%) of the original 2,095 strains, corresponding to 30 different alleles (S2 and S3 Tables). Of these alleles, six contained indels in HT Spy1336-7 exclusively, which resulted in changes in the number of consecutive T nucleotides, with no additional polymorphisms. These six alleles were present in 2,020 (~97%) strains.

Western immunoblot analysis
Bacteria grown in THY were collected at OD 600 =~0.6 (ME), centrifuged at 16,100g for 1 min, and pellets were resuspended in PBS. The Western immunoblot procedure used has been described [38], with the following modifications: (i) protein transfer to nitrocellulose membranes was done for 80 ( Fig 6) or 45 min ( Fig 2C) at 120 volts, and (ii) the anti-R28 antibody [38] was diluted to 1:1,250 in PBS-T with 5% nonfat dry milk, whereas the HRP-conjugated anti-rabbit secondary antibody was used at 1:13,500 dilution.

Construction of isogenic mutant strains
All isogenic mutant strains used in this study are listed in S6 Table. Isogenic mutant strain MGAS27961-11T containing an 11T HT Spy1336-7 was generated using allelic exchange as described previously [73]. Briefly, primers HPN-1 and HPN-2 [38] were used to amplify ã 2,690-bp fragment using genomic DNA of MGAS11108, an emm28 clinical isolate with the naturally occurring 11T nucleotide region. The amplicon encompasses HT Spy1336-7 . The resulting PCR product was cloned into suicide plasmid pBBL740 and transformed into parental strain MGAS27961-9T. The plasmid integrant was used for allelic exchange as described previously [73]. To identify strains putatively containing the allelic replacement region encompassing the expected polymorphism (11 T nucleotides), we sequenced the Spy1336/R28 upstream region using primer HPN-seq (S5 Table).
Isogenic mutant strain MGAS27961-10T-ΔSpy1336 was constructed using MGAS27961-10T genomic DNA as template for amplification. All primers are listed in S5 Table. Primer sets 1336-1 and -2 and 1336-3 and -4 were used to amplify two fragments upstream and downstream, respectively, of Spy1336/R28. The two PCR fragments were merged by combinatorial PCR and ligated into the BamHI site of suicide vector pBBL740. The recombinant plasmid containing a Spy1336/R28 deletion encompassing the entire gene was transformed into strain 27961-10T to replace the native Spy1336/R28 via allelic exchange. Isogenic mutant strains MGAS27961-10T-ΔSpy1337 and the MGAS27961-10T-ΔSpy1336/ΔSpy1337 double mutant were constructed with methods analogous to those described above. Primer sets 1337-1 and -2 and 1337-3 and -4 were used to generate MGAS27961-10T-ΔSpy1337, and 1336-1337-1 and -2 and 1336-1337-3 and -4 were used to generate the double mutant. Whole genome sequence analysis on the isogenic mutant strains confirmed the absence of spurious mutations.

RNAseq library preparation, sequencing, and analysis
Isogenic emm28 strains were grown in triplicate in THY and harvested at mid-exponential (ME; OD 600 = 0.46-0.52) and early-stationary (ES; OD 600 = 1.65-1.7) phases of growth. Bacteria (2 ml) from the ME phase and ES phase (1 ml) were added to 4 ml and 2 ml of RNAprotect Bacteria Reagent (Qiagen), respectively, incubated at room temperature for 20 min, and centrifuged at 4,000 rpm for 15 min. The supernatant was discarded, and the bacterial pellet was frozen in liquid nitrogen and stored at -80˚C. The RNeasy kit (Qiagen) was used for total RNA isolation, and the quality of the total RNA was evaluated with RNA Nano chips (Agilent Technologies) and an Agilent 2100 Bioanalyzer. RNA extraction for all emm28 isogenic strains was performed as described previously [38, 70,74]. The rRNA was depleted with the Ribo-Zero rRNA removal kit for Gram-positive bacteria (Illumina). The quality of the rRNA-depleted RNA was evaluated with RNA Pico chips (Agilent Technologies) and an Agilent 2100 Bioanalyzer. NEBNext Ultra II DNA library prep kit (NEB) was used to prepare the cDNA libraries, according to the manufacturer's instructions. The quality of the cDNA libraries was evaluated with High-Sensitivity DNA chips (Agilent Technologies) and an Agilent 2100 Bioanalyzer. The cDNA library concentration was measured fluorometrically with Qubit dsDNA BR and HS assay kits (Invitrogen). Analysis of the RNAseq data was performed as described previously [38].

Necrotizing myositis infection models
A mouse model of necrotizing myositis was used to compare virulence of the 9T, 10T and 11T isogenic strains as previously described [38]. Briefly, 120 CD1 mice from Envigo were kept in cases containing 5 animals each, provided with chow pellets and acidified water ad libitum, and corn cob bedding with nesting material. They were inoculated in the right hindlimb with 5x10 8 CFU (n = 40 mice per strain) and followed for 7 days. They were euthanized with an overdose of isoflurane (primary), followed by cervical dislocation (secondary).
A well-described NHP model of necrotizing myositis was used to compare the virulence of the wildtype and isogenic MGAS27961-10T-ΔSpy1336/R28 and MGAS27961-10T-ΔSpy1337 strains [70,75]. Three cynomolgus macaques (2-3 years, 2-4 kg) were used. Animals were randomly assigned to different strain treatment groups and inoculated with 5x10 9 CFU/kg of one strain in the right limb and a different strain in the left limb. Each strain was tested in triplicate. The animals were observed continuously and necropsied at 24 h post-inoculation. Lesions (necrotic tissue) were excised, measured in three dimensions, and volume was calculated using the formula for an ellipsoid. A full-thickness section of tissue taken from the inoculation site was fixed in 10% phosphate buffered formalin and embedded in paraffin using standard automated instruments. Histology of the three sections taken from each limb was scored by a pathologist blinded to the strain treatment groups [75,76]. To obtain the quantitative CFU data, diseased tissue recovered from the inoculation site was weighed, homogenized (Omni International) in 1 mL PBS, and CFUs were determined by plating serial dilutions of the homogenate. Statistical differences between strain groups were determined with the Mann-Whitney test. Animal studies were approved by the Institutional Animal Care and Use Committee at Houston Methodist Research Institute (protocol numbers AUP-1217-0058 and AUP-0318-0016).
The humane endpoints used to determine when animals should be euthanized were for animals to be sacrificed if they either become immobile, reach lame scale = 4, reach body condition <2, develop an injection site abscess > 1 cm diameter, the injection site abscess ruptures, a metastatic (site other than injection site) abscess forms, lose >10% weight, or demonstrate other features of severe distress. Since none of the above criteria applied, the duration of the experiment was 7 days for mice and 24h for NHPs. Animals were euthanized immediately after the 7 day or 24h deadlines. The numbers of animals used was 120 (mice) and 3 (NHPs). They were all euthanized, and none was found dead. Animal care and handling was provided by the Comparative medicine core. Animal health and behavior were monitored at least once daily. All animal welfare considerations taken, including efforts to minimize suffering and distress, use of analgesics or anaesthetics, or special housing conditions were in accordance with guidelines specified by the Institutional Animal Care and Use Committee at Houston Methodist Research Institute (protocol numbers AUP-1217-0058 and AUP-0318-0016).
Mice were housed in groups of 5 animals per individually ventilated techniplast cage. Cages are furnished with autoclaved quarter inch corncob bedding and a pulped virgin cotton fiber nestlet. Irradiated Teklad Global Diet 2920 pellets and acidified reverse osmosis water were provided ad libitum. Animals were examined at least once daily by the veterinary staff, attending veterinarian and investigator. Similarly, nonhuman primates were housed individually in squeeze back cages with a perch and provided chow, fresh fruit and vegetables, and water ad libitum. An enrichment program is maintained by the comparative medicine program.

Neutrophil bactericidal activity assays
Neutrophil bactericidal activity assays were performed in accordance with protocol 01-I-N055, approved by the Institutional Review Board for human subjects, National Institute of Allergy and Infectious Diseases. All volunteers gave written informed consent prior to participation in the study. Human neutrophils were isolated from the venous blood of healthy volunteers using a standard method [77]. Killing of S. pyogenes by human neutrophils was performed as described previously [38], except assay tubes were rotated for 3 h at 37˚C.

Statistical analysis
Unless otherwise stated, error bars represent standard deviation (SD), and P values were calculated using either Kruskal-Wallis, or log-rank tests. Differential expression analysis was performed using DESeq2 1.16.1. Genes were considered differentially expressed if the fold-change was greater than 1.5-fold and associated with adjusted P value (Bonferroni corrected) < 0.05. For mouse survival studies, results were graphed as Kaplan-Meier curves and data were analyzed using the log-rank test with P < 0.05 considered to be significant. For the NHP virulence studies, lesion volume and CFU data were graphed as mean +/-SEM and analyzed using the Kruskal-Wallis test with P < 0.05 considered to be significant.

Heterogeneity in the number of ALP-family long tandem repeats in the Spy1336/R28 gene and protein
The R28 protein has 3 domains: the amino-terminal and carboxy-terminal domains flank the size-variable central TR R28 domain. The amino-terminal domain is composed of 424 aa and it contains a secretion signal sequence with the YSIRK motif. The carboxy-terminal domain has 46 aa and it contains a cell-wall anchoring sequence with the LPXTG motif characteristic of surface-attached proteins (Fig 1A). Inasmuch as the number of TRs can vary by recombination or other mechanisms [62][63][64][65] we hypothesized that the large number of emm28 GAS strains we studied could have different size variants of the Spy1336/R28 gene as a consequence of having different numbers of TR R28s . Our previously reported Illumina paired-end 150nt read Regions encoding the amino-terminal, carboxy-terminal, and variable repeat regions are indicated. DNA binding domain refers to two predicted helixturn-helix DNA-binding motifs. Schematic is not drawn to scale. (B) Data shown correspond to 493 strains. ND, not determined. ( � ), two strains studied did not contain the RD2 mobile genetic element, as confirmed by inspection of bam files using TABLET [78]. (C) Western immunoblot of strains with R28 proteins containing different numbers of TR R28 , inferred based on gene sequence data. All strains analyzed had an HT Spy1336-7 with 10Ts because 9T strains do not produce detectable R28 protein. TR R28 , number of tandem repeats per strain. MW, inferred molecular weight of the R28 protein. https://doi.org/10.1371/journal.pone.0229064.g001

PLOS ONE
length whole genome sequence data [38] could not be used to accurately assemble and determine the number of repeats in the TR R28 region, as the sequence reads are of insufficient length to span the 237nt repeated motif. Thus, to determine the extent of size variation in TR R28s in the Spy1336/R28 gene, we used PCR analysis, as described in the Materials and Methods, and studied the 493 strains analyzed previously by RNAseq [38]. Overall, we identified TR R28s size variants ranging from 1 to 17 copies of the repeat (Fig 1B and S1 Table). The most common numbers of TR R28s identified was ten (n = 71, 14.4%) then nine (n = 69, 14.0%). The inferred molecular weight of several R28 variants was confirmed by Western immunoblot analysis (Fig 1C).

Heterogeneity in a homopolymeric tract in the intergenic region between Spy1336/R28 and Spy1337
We previously discovered that a single nucleotide indel located in an HT in the intergenic region between the divergently transcribed Spy1336/R28 and Spy1337 genes (Figs 1A and 2A) significantly altered the transcript levels of the two genes [38]. Specifically, strains with 9Ts in the HT produced little or no detectable transcript of these two genes, whereas organisms with 10Ts in this tract produced abundant and significantly increased levels of transcripts [38]. Moreover, increased transcript levels of Spy1336/R28 and Spy1337 resulted in increased production of the R28 virulence factor and increased virulence in a mouse necrotizing myositis infection model [38]. We reported that approximately two-thirds of 493 strains had the 10T variant of the HT Spy1336-7 region, whereas one-third of strains had a 9T variant [38].
Taken together, these observations provided the impetus to expand our study of heterogeneity in the HT Spy1336-7 region to include the entire previously described cohort of 2,095 emm28 clinical isolates recovered from invasive human infections collected in six countries over a 26-year period [38]. The HT Spy1336-7 region was analyzed in three ways, as described in detail in Materials and Methods. First, the contigs from genome assemblies generated with SPAdes [71] for all strains were searched with sequences of 20-nt flanking HT Spy1336-7 on each side using nucleotide Basic Local Alignment Search Tool (BLASTn). The identified HT Spy1336-7 target regions were retrieved, binned by alleles, and alleles were enumerated. Subsequently, the number of T nucleotides in the HT Spy1336-7 for each allele was counted. This method yielded results for the great majority of strains (~89%). As a second assembly method, we interrogated the Illumina sequencing reads for each strain using a set of eight probes of 31-nts in length corresponding to HT Spy1336-7 alleles with 6 to 13 Ts. In the aggregate, six alleles of HT Spy1336-7 were identified that differed from one another only by the number of T residues, varying in length from 8 to 13Ts (Fig 2B; S2 and S3 Tables). This analysis identified the same approximate frequency distribution for strains containing 9T or 10T nucleotides as described in our previous study [38], namely~1/3 (n = 650) and~2/3 (n = 1,226), respectively. We also discovered that~6% (n = 128) of the strains had 11Ts in the HT Spy1336-7 sequence (Fig 2C). A small number of strains had 8Ts, 12Ts, or 13Ts in the HT Spy1336-7 sequence (S2 and S3 Tables). For any strain with discrepant or no results, we visually inspected read alignment, i.e. Binary Alignment Map (BAM) files corresponding to the Spy1336/R28-Spy1337 intergenic region using TABLET [78].

Construction and growth characteristics of isogenic mutant strains
We previously compared the global transcriptomes of isogenic mutant strains (MGAS27961-9T and MGAS27961-10T) that differ only in the number of T residues in the HT Spy1336-7 region [38]. In view of our finding that 6% of strains have 11Ts in this region, we constructed isogenic mutant strain MGAS27961-11T. We also constructed isogenic mutant strains MGAS27961-ΔSpy1336/R28, MGAS27961-ΔSpy1337, and MGAS27961-ΔSpy1336/ΔSpy1337 in which the target genes were deleted in a parental strain with 10Ts in the HT Spy1336-7 region. The goal of generating these strains was to perform comparative transcriptome and virulence analyses. All strains had a very similar growth curve under the laboratory conditions tested (S1 Fig).

Transcriptome analysis of clinical strains
We examined the transcriptome data for 442 M28 clinical isolates [38] to determine if variation in the length of HT Spy1336-7 (Fig 2B) altered the level of transcripts for Spy1336/R28 and Spy1337. Among these 442 clinical isolates, 423 strains had alleles exclusively containing indels

PLOS ONE
in HT Spy1336-7. All HT variants were represented in these 423 strains. We restricted examination of the transcriptome data to strains with 8 to 11Ts in the HT Spy1336-7 region because the 12T and 13T variants were represented by only one strain each. Strains with 8 and 9Ts in the HT Spy1336-7 region had low transcript levels of Spy1336/R28 and Spy1337 genes (Fig 3). In contrast, strains with the 10T variant had significantly higher transcript levels of Spy1336/R28 and Spy1337 compared to strains with the 9T variants (P<0.0001). Strains with the 11T variant (n = 25) had transcript levels for Spy1336/R28 (P<0.0001) and Spy1337 (P = 0.0021) that are significantly higher than either 10T or 9T strains (Fig 3).

Transcriptome analysis of isogenic mutant strains
We next used RNAseq to test the hypothesis that, when compared to a parental strain with 9Ts (MGAS27961-9T) in the HT Spy1336-7 region, an isogenic mutant strain with 11Ts (MGAS27961-11T) had an altered transcriptome. RNAseq analysis confirmed that the gene expression profile of MGAS27961-11T is modestly altered compared to MGAS27961-9T. Principal component analysis supported these findings (Fig 4A and 4B). Moreover, compared to strain MGAS27961-9T, isogenic strain MGAS27961-11T had differential expression of 4.7% (3.2% upregulated and 1.5% downregulated at ME) and 6% (0.7% upregulated and 5.4% downregulated at ES) of the GAS genome (S2A Fig). We note that at both phases of growth, transcript levels of Spy1336/R28 in MGAS27961-11T were significantly higher compared to the MGAS27961-9T strain values (Fig 4C and 4D, and S2A Fig). As expected, no detectable transcript expression of Spy1336/R28 and Spy1337 was observed in the isogenic deletion mutant strains, MGAS27961-ΔSpy1337 and MGAS27961-ΔSpy1336/ ΔSpy1337 (Fig 4C-4F). Virulence genes significantly upregulated, using a 1.5-fold cut-off, included: nga, encoding an NADase cytotoxin; slo, encoding the cytolytic protein streptolysin O; the sag operon encoding streptolysin S; mga, a positive transcriptional regulator of multiple virulence genes; emm28 encoding M protein; and sof, encoding serum opacity factor (S2 Fig).

Heterogeneity in the HT Spy1336-7 region significantly affects virulence in a mouse model of necrotizing myositis
To test the hypothesis that the number of T nucleotides in HT Spy1336-7 region contributes to GAS virulence, we inoculated mice intramuscularly with either the parental strain with 9Ts in HT Spy1336-7 or an isogenic mutant strain with 10Ts or 11Ts. Compared to the parental 9T strain, the isogenic 10T and 11T strains each caused significantly greater mortality and larger lesions with more tissue destruction (Fig 5). Taken together, the data support the hypothesis that the number of T nucleotides in HT Spy1336-7 significantly affects virulence in this infection model, especially when comparing the 9T to either the 10T or 11T isogenic strains.

Analysis of the R28 protein made by the isogenic mutant strains
In most bacteria, gene transcript levels typically correlate with amounts of the encoded proteins made, but this is not always the case [79,80]. We reported previously that the R28 protein is produced and detected by Western immunoblot in whole cell extracts and supernatants derived from MGAS27961-10T, but not from the isogenic strain MGAS27961-9T. This finding is consistent with RNAseq data showing that the transcript level of Spy1336/R28 was higher in MGAS27961-10T compared to MGAS27961-9T [38].
Next, we analyzed R28 protein production by parental and isogenic mutant strains containing the three different length variants of HT Spy1336-7 (Fig 6B). Consistent with previous data, we did not detect production of R28 by parental strain MGAS27961-9T, whereas immunoreactive R28 was produced by isogenic mutant strain MGAS27961-10T [38]. Also consistent with our hypothesis and RNAseq data, isogenic mutant strain MGAS27961-11T produced greater amounts of immunoreactive R28 compared to strain MGAS27961-10T and parental strain MGAS27961-9T (Fig 6B).

Spy1336/R28 and Spy1337 contribute significantly to virulence in a nonhuman primate (NHP) model of necrotizing myositis
To test the hypothesis that Spy1336/R28 and Spy1337 contribute to GAS virulence we used NHPs, a well-studied animal model that closely resembles human hosts in terms of physiology and immune response [70,[81][82][83]. We inoculated NHPs with parental strain MGAS27961- 10T and isogenic mutant strains MGAS27961-10T-ΔSpy1336/R28 or MGAS27961-10T-ΔSpy1337. Compared to the parental strain, each of the two isogenic deletion-mutant strains caused significantly smaller lesions characterized by less tissue destruction (Fig 7A and 7C). In addition, compared to the parental strain, significantly fewer CFUs of each isogenic mutant strain were recovered from the inoculation site (Fig 7B). Taken together, the data support the hypothesis that Spy1336/R28 and Spy1337 contribute to virulence in this infection model.

Discussion
S. pyogenes, a leading cause of human morbidity, mortality, and healthcare costs globally, produces a large number of extracellular virulence factors. Strains of serotype M28 S. pyogenes are repeatedly associated with puerperal sepsis (childbed fever) and are a prominent cause of pharyngitis in many countries. The molecular mechanisms responsible for puerperal sepsis, other invasive infections, and pharyngitis caused by serotype M28 strains are poorly understood. Several lines of evidence indicate that the Spy1336/R28 gene encodes a cell surface-anchored virulence factor [24,84] that is involved in S. pyogenes pathogenesis. We analyzed the Spy1336/ R28 gene encoding the R28 protein in approximately 2,000 emm28 invasive strains. We found DNA sequences located in the upstream regulatory region and repeat sequences in its coding sequence that make Spy1336/R28 highly polymorphic when compared across our population sample; importantly, these polymorphisms affect virulence.
We previously reported a bimodal distribution in the transcripts made by Spy1336/R28 and Spy1337 in a study of 492 emm28 strains [38]. Namely, strains with 9Ts in HT Spy1336-7 located between Spy1336/R28 and Spy1337 produced low transcript levels (Figs 1A and 2A), whereas strains with 10Ts produced significantly greater transcript levels [38]. The current work confirmed and extended our observations and found that in the sample of~2,000 strains, and when considering alleles exclusively containing indels in the HT Spy1336-7 , (i) the number of T residues varied between 8 and 13, (ii) most strains (~90.4%) had 9T (32%) or 10T (61%) residues, (iii) an 11T variant was present in 6% of the strains (Figs 2B, 2C and 8), and (iv) non-isogenic clinical isolates and an isogenic strain with an 11T variant had significantly higher transcript levels of Spy1336/R28 and Spy1337, and produced more R28 protein than 9T and 10T variant isogenic strains (Figs 3 and 6, S2 and S3 Tables).
Size variants of the HT Spy1336-7 may arise because HTs, especially poly(T)-containing HTs such as HT Spy1336-7 , are able to form transient mispaired regions leading to slipped-strand mispairing during DNA replication, resulting in expansion or contraction of the HT [85][86][87][88][89][90]. HTs located at the 5' upstream untranslated regions of genes such as the HT Spy1336-7 can contribute to altered regulation of gene transcript expression [68,91]. In this regard, HTs may be involved in phase variation [92] and bacterial adaptation [65,93]. The finding that the vast majority of strains had a 9T or 10T genotype suggests that this system may influence GAS-human interaction in some settings. Under one scenario, lack of or very low transcript production (9T genotype) is advantageous to the organism in certain currently undefined physiologic conditions. Conversely, the significantly-higher transcript level conferred by the 10T genotype could be advantageous in other conditions, also currently not defined. Compared to the 9T strain, the isogenic mutant strain with the 11T genotype had a higher level of Spy1336/R28 transcript ( Fig  4C and 4D). The 11T strain also had a higher level of production of R28 protein (Fig 6B), compared to both the 9T and 10T strains. We hypothesize that the increased R28 protein made by the 11T strain confers only slightly enhanced fitness to the organism (relative to the 10T genotype), an idea that is consistent with the observation that relatively few (6%) natural clinical isolates have this genotype, and substantiated by the fact that the isogenic mutant strain had only modestly increased virulence in the mouse model of necrotizing myositis. Alternatively, too much R28 protein could be detrimental during a natural course of infection in the human host by promoting too tight adherence, leading to decreased dissemination, or even promoting an increased immune response. Consistent with these ideas, the 10T and 11T isogenic mutant strains did not differ substantially in the number of differentially-expressed genes (S2 Fig, panels A and D), or in their resistance to killing by human PMNs ex vivo (Fig 9), although compared to the 9T wild-type parental strain, these two mutant strains had significantly enhanced resistance to killing by human PMN, and more differentially expressed genes (S2 Fig, panels A,  B, and D). In terms of the small number of strains containing additional size variants of the HT region (8Ts, 12Ts, and 13Ts), strains with 8Ts produce very minimal levels of Spy1336/R28 transcripts (Fig 3A), and those with 12Ts and 13Ts might not further enhance pathogen fitness, or alternatively, global transcript changes occurring in strains with these genotypes in some way decrease fitness. Clearly, additional investigations are required to address these ideas.
Puopolo and Madoff reported that deletion or expansion of a 5-nt repeat (AGATT) adjacent to the poly(T) tract is associated with a null phenotype for expression of the bca gene encoding the alpha C protein in GBS [68]. This region is highly conserved in GAS and GBS. However, only a small number of strains were studied. Here, analysis of 2,074 clinical isolates recovered from human patients with invasive disease identified only four strains with variation in this pentanucleotide motif (TCTAA in S3 Table), whereas most of the variation was found in HT Spy1336-7 region. Thus, in the natural populations of GAS we studied, variation in HT Spy1336-7 is the key driver of transcript level changes of Spy1336/R28 and Spy1337 [38].
Neither Spy1336/R28 transcript nor R28 protein production were detectable in an isogenic strain in which the Spy1337 transcription factor gene was deleted. In addition, deletion of either Spy1336/R28 or Spy1337 resulted in a significant decrease in virulence in an NHP model of necrotizing myositis (Fig 7). Thus, when taken together, our present and previous [38] data are consistent with a model in which the regulation of expression of the gene encoding the R28 virulence factor is partly dependent on a process whereby indels occurring in the HT Spy1336-7 affect binding of the Spy1337 transcriptional regulator to this DNA region by altering its consensus binding site, and/or changing the spacing, and therefore spatial orientation between two adjacent binding sites (Fig 2A). In this regard, the DNA sequence ATTTT present twice in the Spy1336/R28-Spy1337 regulatory region resembles part of the consensus binding site for the AraC transcriptional regulator ToxT [94]. In our proposed model, Spy1337 positively regulates the expression of both Spy1336/R28 and Spy1337 (Fig 8).
The R28 virulence factor binds through its N-terminal domain to integrin receptors α 3 β 1 , α 6 β 1 , and α 6 β 4 [60], which in the human host bind to laminin, an ECM protein [61]. This constitutes an additional instance in which a pathogen binds to host integrins [95][96][97][98], a strategy described for other GAS proteins [99]. For example, GAS PrtF1 binds to α 5 β 1 integrins and could trigger integrin clustering and internalization of α 5 β 1 integrin, ultimately resulting in GAS uptake [100]. The second region in Spy1336/R28 containing repeat sequences is located in its coding sequence (Fig 1A). A 237-nt repetitive sequence, corresponding to one 79-aa TR R28 , is present in Spy1336/R28 from 1 to 17 tandem copies (Fig 1B and S1 Table). Ten is the most prevalent number of TR R28 copies present in R28, followed by 9 ( Fig 1B). Thus, emm28 GAS strains make different size variants of the R28 protein, likely arising through recombination [62][63][64][65]101]. The function of TRs in virulence factors is not well understood. One possible function would be to extend the reach of a surface-anchored protein and expose its Nterminal domain at the bacterial surface without adding new mechanistic functions [102]. Variation in the number of TRs may alternatively produce antigenic variation. In this regard, variation in the conserved region of GAS M protein generates antigenic diversity [102], and similarly, variation in the GBS alpha C protein affects antigenicity and protective efficacy [52,103]. TR number variation might decrease adhesion to and entry into host cells [104]. No correlation was found between the number of T nucleotides in HT Spy1336-7 and the number of TR R28 repeats (S3 Fig). Additional studies designed to address these ideas are warranted.
To summarize, the R28 virulence factor in GAS is highly polymorphic in natural populations, both in level of transcript production and protein expression, and size. Differences in transcription levels are caused by variation in the number of Ts in a homopolymeric tract upstream of the Spy1336/R28 gene, whereas variance in R28 protein size is caused by variation  Table. HT Spy1336-7 alleles found in 2,074 emm28 GAS invasive strains.  (4)