Elucidation of independently modulated genes in Streptococcus pyogenes reveals carbon sources that control its expression of hemolytic toxins

ABSTRACT Streptococcus pyogenes can cause a wide variety of acute infections throughout the body of its human host. An underlying transcriptional regulatory network (TRN) is responsible for altering the physiological state of the bacterium to adapt to each unique host environment. Consequently, an in-depth understanding of the comprehensive dynamics of the S. pyogenes TRN could inform new therapeutic strategies. Here, we compiled 116 existing high-quality RNA sequencing data sets of invasive S. pyogenes serotype M1 and estimated the TRN structure in a top-down fashion by performing independent component analysis (ICA). The algorithm computed 42 independently modulated sets of genes (iModulons). Four iModulons contained the nga-ifs-slo virulence-related operon, which allowed us to identify carbon sources that control its expression. In particular, dextrin utilization upregulated the nga-ifs-slo operon by activation of two-component regulatory system CovRS-related iModulons, altering bacterial hemolytic activity compared to glucose or maltose utilization. Finally, we show that the iModulon-based TRN structure can be used to simplify the interpretation of noisy bacterial transcriptome data at the infection site. IMPORTANCE S. pyogenes is a pre-eminent human bacterial pathogen that causes a wide variety of acute infections throughout the body of its host. Understanding the comprehensive dynamics of its TRN could inform new therapeutic strategies. Since at least 43 S. pyogenes transcriptional regulators are known, it is often difficult to interpret transcriptomic data from regulon annotations. This study shows the novel ICA-based framework to elucidate the underlying regulatory structure of S. pyogenes allows us to interpret the transcriptome profile using data-driven regulons (iModulons). Additionally, the observations of the iModulon architecture lead us to identify the multiple regulatory inputs governing the expression of a virulence-related operon. The iModulons identified in this study serve as a powerful guidepost to further our understanding of S. pyogenes TRN structure and dynamics.

We previously reported an independent component analysis (ICA)-based framework that decomposes a compendium of RNA-sequencing (RNA-seq) expression profiles to determine the underlying regulatory structure of a bacterial transcriptome (27)(28)(29). ICA computes independently modulated sets of genes (termed iModulons) and their corresponding activity levels under each experimental condition. The iModulons can be interpreted as data-driven regulons, since they rely on observed expression changes instead of predicted transcription factor (TF)-binding sites. Therefore, it is possible that some iModulons reveal sets of genes never known to move coordinately. Moreover, since the number of iModulons is substantially fewer than the number of genes, they provide a helpful framework to more easily interpret the bacterial transcriptome. Here, we apply ICA for the first time to the major human pathogen S. pyogenes to elucidate its iModulon architecture.
Among over 200 serotypes of S. pyogenes, serotype M1 is the most frequently identified from streptococcal pharyngitis (30) and invasive diseases worldwide (31). Todd-Hewitt medium supplemented with yeast extract (THY) is the most commonly used growth medium in which transcriptome profiling of S. pyogenes has been performed. To elucidate the TRN features of S. pyogenes serotype M1, we compiled 116 high-quality RNA-seq data sets of S. pyogenes serotype M1 cultured in THY and conducted ICA-based decomposition. ICA computed 42 iModulons that we characterized and analyzed their activities to formulate hypotheses. Users can search and browse all iModulons from this data set on iModulonDB.org and view or interrogate them using interactive dashboards (32).
Drilling down, we focused in this study on four iModulons that each includes the nga-ifs-slo operon encoding an NAD glycohydrolase (NADase), immunity factor, and the pore-forming cytolytic toxin streptolysin O that contribute to enhanced virulence of S. pyogenes (33,34). Comparing iModulon activities across all 116 samples allowed us to formulate hypotheses and identify carbon sources that control nga-ifs-slo expression. Although maltose and dextrin both consist solely of glucose molecules, dextrin utilization upregulated the nga-ifs-slo operon and changed bacterial hemolytic activity in contrast to glucose or maltose utilization. Furthermore, dextrin utilization induced the activation of iModulons related to the two-component regulatory system CovRS, a global transcriptional control system of S. pyogenes virulence phenotypes (2). In the concluding part of this study, we show that the computed TRN structure from iModulon analyses can simplify the interpretation of noisy bacterial transcriptomes at the infection site using matrix projection. Our composite results suggest that S. pyogenes, in the inflammatory environment of the NF, senses and responds to both carbohydrate depletion and stresses affecting the CovRS system.

Data acquisition and preprocessing for independent component analysis (ICA)
We downloaded RNA-seq data of S. pyogenes serotype M1 from National Center for Biotechnology Information Sequence Read Archive. To calculate transcripts per kilobase million (TPM) from RNA-seq data and conduct the subsequent quality control (QC)/quality assurance (QA), the previously reported pipline was used with minor modifications (35). Briefly, the sequences were aligned to the S. pyogenes strain 5448 genome (Serotype M1, accession no. CP008776), using Bowtie2 (36). The aligned sequences were assigned to open reading frames using feature Counts (37). To reduce the effect of noise, genes with average counts per sample <10 were removed. The final count matrix with 1,723 genes was used to calculate TPM. To generate high-quality RNA-seq expression profiles, we removed RNA-seq samples that show low quality in FastQC (38) and low correlation between biological replicates (Pearson's R <0.92). Finally, QC-passed 116 RNA-seq data were used for ICA (8,33,(39)(40)(41)(42) (Supplementary Data 1).

Independent component analysis
ICA decomposes a transcriptomic matrix (X) into independent components (M) and their condition-specific activities (A) (Supplementary Data 2). The procedure for computing robust components with ICA has been described in detail previously (27). Log2(TPM + 1) values were centered on strain-specific reference conditions and used as ICA decomposition. Next, Scikit-learn (v0.20.3) implementation of the FastICA algorithm was used to calculate independent components with 100 iterations, convergence tolerance of 10 −7 , log(cosh(x)) as contrast function and parallel search algorithm (27,43). To determine the ideal number of components, we used our previously developed OptICA method (44). The resulting M matrix containing source components from the 100 iterations was clustered with Scikit-learn implementation of the DBSCAN algorithm with ε of 0.1 and a minimum cluster seed size of 50 samples (50% of the number of random restarts). If necessary, the component in each cluster was inverted such that the gene with the maximum absolute weighting of the component was positive. Centroids for each cluster were used to define the final weightings for M and the corresponding A matrix. The whole process was repeated 100 times to ensure that the final calculated components were robust. Finally, components with activity levels that deviated more than five times between samples in the same conditions were also filtered out.

Determining independently modulated sets of genes (iModulons)
ICA enriches components that maximize the non-Gaussianity of the data distribution. While most of the genes have weightings near 0 and fall under Gaussian distribution in each component, there exists a set of genes whose weightings in that component deviate from this significantly. To enrich these genes, we used Scikit-learn's implementation of the D'Agostino K2 test, which measures the skew and kurtosis of the sample distribution (D'Agostino, 1990). We first sorted the genes by the absolute value of their weightings and performed the K2 test after removing the gene with the highest weighting. This was done iteratively, removing one gene at a time, until the K2 statistic falls below a cutoff. We calculated this cutoff based on sensitivity analysis on the agreement between enriched iModulon genes and known regulons. Regulons predicted by RegPrecise (45) (S. pyogenes M1 GAS), shown in previous reports (PMIDs are listed in Supplementary Data 3), and prophage genes searched by PHASTER (46) are used for the information of known regulons (Supplementary Data 3). For a range of cutoffs, we ran the iterative D'Agostino K2 test on all components and checked for statistically significant overlap of iModulons with the known regulons using Fisher's exact test (Supplementary Data 4). For iModulons with significant overlap, we also calculated precision and recall. The cutoff of 150, which led to the highest harmonic average between precision and recall (F1 score), was chosen as the final cutoff. We also manually identified iModulons that consisted of genes with shared functions (e.g., cytolysins, transporter) or those that corresponded to other genomic features (Supplementary Information 1, Supplementary Data 4).

Bacterial strains and culture conditions
S. pyogenes M1T1 strain 5448 (accession no. CP008776) was isolated from a patient with toxic shock syndrome and NF and considered to be a genetically representative globally disseminated clone associated with the invasive infections (47). S. pyogenes were grown at 37°C in a screw-cap centrifuge tube (BD Biosciences, San Jose, CA, USA) filled with Todd-Hewitt broth supplemented with 0.2% yeast extract (THY) (Hardy Diagnostics, Santa Maria, CA, USA) in an ambient atmosphere and standing cultures. To obtain cultures for experiments, overnight cultures of S. pyogenes were back diluted 1:50 into fresh THY broth and grown at 37°C, with growth monitored by measuring optical density at 600 nm (OD 600 = 0.35 for mid-exponential samples or OD 600 = 0.75 for early-stationary samples). Colony-forming units (CFUs) were determined by plating diluted samples on THY agar.
Escherichia coli MC1061 strain was used as a host for derivatives of plasmids pHY304 (48). E. coli strains were cultured in Luria-Bertani medium (Hardy Diagnostics) at 37°C with agitation. For the selection and maintenance of strains, antibiotics were added to the medium at the following concentrations: erythromycin, 500 µg/mL for E. coli and 2 µg/mL for S. pyogenes; chloramphenicol, 2 µg/mL for S. pyogenes.

Construction of malR2 mutant strain
An in-frame malR2 TF deletion mutant (ΔmalR2) with a background of strain 5448 (wild type [WT]) was constructed using the pHY304 temperature-sensitive shuttle vector, as previously reported (49). Briefly, a pHY304-malR2KO plasmid harboring the DNA fragment, in which upstream of the malR2 gene, a chloramphenicol acetyltransferase gene (cat), and downstream regions of the malR2 gene were linked by overlapping PCR, was electroporated into WT strain and grown in the presence of erythromycin. The plasmid was then integrated into the chromosome via the first allelic replacement at 37°C, after which it was cultured at 30°C without antibiotics to induce the second allelic replacement. The deletion of malR2 was confirmed by site-specific PCR using purified genomic DNA. Primers are listed in Table 1.

Red blood cell hemolysis assay
Human blood was collected by using BD Vacutainer EDTA (Cat. 366643m; Becton, Dickinson, Mountain View, CA, USA) tubes. Red blood cells (RBCs) were isolated by centrifuging and washing with phosphate-buffered saline (PBS). Bacterial cultures in THY broth at OD 600 = 0.3-0.4 were centrifuged by using Eppen tubes and resuspended in an equal amount of chemically defined medium (CDM) (50). CDM was supplemented with 4.5 g/L of D-glucose (Cat. G7021; Sigma-Aldrich, St. Louis, MO, USA), D-maltose (Cat. M5885; Sigma-Aldrich), or dextrin (Cat. 31410; Sigma-Aldrich). At 2-hour postincubation at 37°C, 150 µL of whole culture or supernatants were collected for serial dilutions in PBS. All wells were added 50 µL of 2% RBC in PBS and incubated for 4 hours. Supernatants were collected from assay wells after centrifugation at 3,000× g for 15 minutes, and hemolysis was determined by absorbance with a SpectraMax M3 plate reader at 541 nm using SoftMax Pro software. Each titer was recorded as the point that the hemolysis

RNA-seq and data analysis
Libraries were sequenced using Illumina NextSeq or NovaSeq systems, with 40 bp (sequenced at the University of California, Davis DNA sequencing facility) or 150 bp (sequenced at the University of California San Diego IGM Genomics Center) paired-end reads. Raw reads were trimmed using the fastp (52), mapped to the S. pyogenes strain 5448 genome, and used to calculate the TPM with the commercially available Geneious Prime 2019.2 software package (Biomatters, Auckland, New Zealand). Differential expression and global analyses of RNA-seq expression data were performed using iDEP (ge-lab.org). EdgeR log transformation was used for clustering and principal component analysis (iDEP). Hierarchical clustering was visualized using the average linkage method with correlation distance (iDEP). Functional annotations of S. pyogenes strain 5448 genome are obtained by using EggNOG (eggnog5.embl.de) or PATRIC (www.patricbrc.org).

biologically meaningful sets of genes are revealed from 116 high-quality gene expression profiles by ICA
To extract regulatory signals from transcriptomic data, we used our established protocol (53) and downloaded RNA-seq data from the public Sequence Read Archive (54), where 229 RNA-seq samples were available for the M1 serotype of S. pyogenes. After QC, which checks for read quality, alignment, and high replicate correlations, we selected They are also freely available to browse, search, or download on iModulonDB.org (32). Precision and recall between iModulons and regulons ( Fig. 1B) were used to classify iModulons into four groups, namely well-matched, regulon subset, regulon discovery, and incompletely matched groups (Fig. 1C). (i) The well-matched group (n = 4) has precision and recall greater than 0.65. (ii) The regulon subset (n = 5) exhibits high precision and low recall. They contain only part of their enriched regulon, perhaps because the regulon is very large and only the genes with the most pronounced transcriptional changes are captured. (iii) A third group, regulon discovery (n = 5), has low precision but high recall. These iModulons contain some genes that are known to be co-regulated, along with other genes that are co-stimulated by the conditions in the data set. (iv) The remaining enriched iModulons are termed incompletely matched (n = 8) because neither their precision nor their recall met the cutoff; however, this grouping still had statistically significant enrichment levels and appropriate activity profiles. The difference in gene membership between these iModulons and their regulons provides excellent targets for discovery.
iModulons identified with no enrichments suggest that their composite gene sets may be regulated by unexplored transcriptional mechanisms or that there remains some noise due to large variance within the data. Functional categorization of iModulons provides a systems-level perspective on the transcriptome (Fig. 1D). Metabolism-related iModulons account for over 40% of the iModulons, while comparatively fewer iModulons deal with toxin production, stress response, replication, translation, and mobile genetic elements like prophages.

iModulon activities of mutant strains demonstrate the reliability of the annotation of iModulons
All iModulons have a computed activity in every sample (Supplementary Data 2), allowing for easy comparisons of iModulon activities across samples ( Supplementary  Information 1). To benchmark the reliability of iModulon annotation, we show the FabT and CovRS-related iModulon activities (CovR-1 and CovR-2 iModulons) of 116 all samples (Fig. 1E).
The FabT iModulon overlaps nearly perfectly with the FabT regulon. This iModulon contains all genes known to be regulated by FabT, plus a hypothetical protein (XK27_06930) that may be co-regulated with the known regulon. Additionally, transcriptomes from fabT deletion mutant strains (Fig. 1E, Red) (39) show high FabT iModulon activities, consistent with the role of FabT as a repressor. Thus, this iModulon captures a known regulon and its expected behavior using transcriptome data alone. Several other iModulons are similarly easily explainable, particularly those in the well-matched group.

nga-ifs-slo operon is in SLO iModulon, MalR2 iModulon, CovR-1 iModulon, and CovR-2 iModulon
To decipher S. pyogenes pathogenicity, it is important to resolve the regulatory dynamics underlying two important virulence operons encoding potent cytolytic toxins: the abovementioned nga-ifs-slo, which encodes the gene for streptolysin O (SLO), and sagA-I, which encodes the biosynthetic machinery to produce streptolysin S (SLS), the toxin responsible for the hallmark β-hemolytic phenotype of the bacterium grown on blood agar. Both SLS and SLO are important for the establishment of S. pyogenes skin infection and the formation of necrotizing skin lesions in invasive disease (55)(56)(57).
Because iModulons are based on matrix decomposition, some genes may be present in multiple iModulons. The effects of all iModulons add together to reconstruct individual expression levels, which may be due to multiple TFs influencing the target genes. This is the case for the nga-ifs-slo operon, which is part of the designated SLO iModulon ( Fig. 2A), MalR2 (a member of the LacI/GalR family of repressors) iModulon (Fig. 2B), CovR-1 iModulon, and CovR-2 iModulon (Fig. 1F). Indeed, the nga-ifs-slo operon are the only genes overlapping among these four iModulons (Fig. 2C). These findings indicate there are the multiple regulatory inputs governing the expression of the nga-ifs-slo operon. In the subsequent sections, we speculate and validate environment cues under which the nga-ifs-slo operon is regulated based on iModulon information and corresponding activities.

SLS and SLO iModulons suggests the antagonistic regulation between sag A-I and nga-ifs-slo operons in the stationary phase
The SLS iModulon contains the sag A-I operon encoding SLS (Fig. 2D). Among 42 iModulons, the SLS iModulon is the one containing all the sagA-I genes. Interestingly, there is a transcriptional tradeoff between the two major toxin-encoding virulence systems: the SLS iModulon activities are negatively correlated with SLO iModulon activities (Fig. 2E). SLS and SLO iModulon activities tend to be antagonized in samples obtained from stationary phase growth and a CcpA deletion mutant at mid-exponential phase (Fig. 2E). S. pyogenes stationary phase cultures in THY show evidence of glucose depletion (58). Carbon catabolite repressor CcpA is the major transcriptional regulator in carbon catabolite repression (CCR) and senses carbohydrate availability (11,59,60). These observations lead us to hypothesize that the antagonistic regulation between sag A-I and nga-ifs-slo operons depends on a change in carbon source or TFs sensing the availability of carbon sources (Fig. S2). SLO iModulon activities are also strongly deactivated in CcpA deletion mutants (Fig. 2E), suggesting CcpA deletion downregulates the ngaifs-slo operon. However, other studies have indicated that the ccpA deletion in S. pyogenes serotype M1 induces the upregulation of the slo gene (6) or does not change the expression level of the slo gene (60). Therefore, the SLO iModulon alone is not enough to explain the regulation of the nga-ifs-slo operon.

MalR2 iModulon harboring nga-ifs-slo operon is activated in the stationary phase, contrary to SLO iModulon
The MalR2 iModulon contains not only the nga-ifs-slo operon but also an operon for maltose or dextrin utilization (malACDX, amyAB) (Fig. 2B). Maltose is a disaccharide of D-glucose, whereas dextrin is a polysaccharide of D-glucose. Since maltose and dextrin are the products of the action of salivary amylases on dietary starch, it is possible S. pyogenes utilizes these carbohydrates during throat colonization or the development of pharyngitis.

Research Article mSystems
Contrary to the SLO iModulon, MalR2 iModulon activities are positively correlated with SLS iModulon activities and are very high in samples from the stationary phase or CcpA deletion mutants at the mid-exponential phase (Fig. 2F). These facts suggest that the transcription of the nga-ifs-slo operon is both downregulated (suggested by the SLO iModulon) and upregulated (suggested by the MalR2 iModulon) in stationary phase or under glucose-depleted conditions. These effects may counteract with one another, resulting in the upregulation of the slo gene in the MalR2 iModulon-activated state even under conditions that deactivate the SLO iModulon (Fig. 2G).

MalR2 TF does not contribute to the regulation of the nga-ifs-slo operon
The MalR2 iModulon is named due to a statistically significant overlap with the MalR2 regulon predicted by RegPrecise (45) (Fig. 2H). However, the predicted MalR2 regulon in RegPrecise does not contain the nga-ifs-slo operon. In the genome of serotype M1 S. pyogenes, two genes predicted to encode LacI family TFs, malR and malR2, are situated near the malACDX and amyAB operons (61,62). The MalR regulon was experimentally defined using MalR-deletion mutant strains (5), but no such data have been generated for MalR2.
To investigate whether the nga-ifs-slo operon is directly regulated by MalR2 TF, we compared the transcriptomes of WT and malR2 deletion mutant (ΔmalR2) strains at both mid-exponential and stationary growth phases in THY broth. Deletion of malR2 induced the upregulation of only the malACD operon in the mid-exponential phase (Fig. S3A, Supplementary Data 5), but in the stationary phase, it upregulated both the malACDFGX and amyAB operons (Fig. S3B, Supplementary Data 5). The nga-ifs-slo operon was not affected, which indicates that its inclusion with the MalR2 iModulon was due to co-stimulation, but not direct regulation. Therefore, we decided to explore the carbon sources that may stimulate the operon.

Maltose or dextrin utilization changes bacterial hemolytic activity as compared to the glucose utilization
Despite the lack of direct MalR2 regulation, the nga-ifs-slo operon may still have been included in the MalR2 iModulon due to co-stimulation by inducers such as maltose or dextrin. To investigate whether maltose or dextrin utilization changes bacterial hemolytic activity, we used a red blood cell hemolysis assay (Fig. 3A). We adjusted the CDM for S. pyogenes by following a previous report (50). CDM without carbohydrate sources could not support the growth of serotype M1 S. pyogenes strain 5448, but its viability (bacterial CFUs) was maintained. Supplementation of the CDM with glucose, maltose, or dextrin allowed S. pyogenes 5448 strain growth (Fig. 3B). In a hemolysis assay using WT S. pyogenes supernatant, the glucose (+) condition produced a high hemolytic titer, while the maltose (+) or dextrin (+) conditions yielded a low titer (Fig. 3C). To clarify whether this hemolysis was caused by SLS or SLO, we used SLS deletion mutant (ΔsagA) and SLO deletion mutant (Δslo) strains (Fig. 3D). In the glucose-supplemented condition, SLO is completely responsible for the hemolytic activity. In contrast, in the maltose-or dextrinsupplemented conditions, SLS is completely responsible for the hemolytic activity. In a hemolysis assay by using a live culture of the WT S. pyogenes strain, the glucose and dextrin conditions produced high hemolytic titer, while the carbon (-) and maltose conditions resulted in low titer (Fig. 3E). Here again, we used ΔsagA and Δslo strains (Fig.  3F). In the glucose-supplemented condition, SLO is principally responsible for the hemolytic activity, while SLS contributed only weakly. In other conditions, SLS is mainly responsible for the hemolytic activity. However, SLO-dependent hemolytic activity was confirmed in a dextrin-supplemented condition, while it was not present in the carbon (-) and maltose conditions. These results show how supplementation of CDM with different carbon sources changes the hemolytic activity and virulence factor expression in S. pyogenes. Therefore, we reasoned that the bacterial transcriptome obtained from

Utilization of glucose, maltose, or dextrin significantly activates different iModulons, containing nga-ifs-slo operon
To examine the influence of different carbon sources on bacterial iModulon activities, we conducted RNA-seq analysis by using the RNA samples isolated from the S. pyogenes M1 strain after 2-hour incubation in CDM (detailed conditions and results are presented in Supplementary Data 5). Samples isolated from S. pyogenes cultured in CDM were evaluated together with those cultured in THY and shown earlier. In a hierarchical clustering (Fig. 4A), samples from CDM carbon (-) samples are located close to samples from stationary phase growth in THY. Therefore, the CDM carbon (-) condition seems to recapitulate the carbohydrate-depleted condition in THY. Although CDM glucose, maltose, and dextrin conditions supported S. pyogenes growth, samples from these conditions are located far from that mid-exponential phase in THY on the hierarchical clustering. Since THY broth is a nutrient-rich media containing many kinds of carbon sources, a CDM is better suited for assessing the influence of the supplementation of carbon sources.
In the CDM + maltose condition, a total of 648 S. pyogenes genes were altered in total compared to the CDM + glucose condition ( Fig. 4B and Supplementary Data 5). Rather than analyzing so many DEGs individually, we propose that iModulon activities can be used to facilitate the interpretation of RNA-seq data. To that end, we calculated iModulon activities of samples in CDM + maltose and CDM + glucose conditions, which were centered on samples in CDM carbon (-) condition. After that, a Differential iModulon Activity plot (DiMA plot) allowed us to identify the significantly altered iModulons. In the CDM + maltose condition, antagonistic activation of the SLS and SLO iModulons was observed (Fig. 4B). The iModulons associated with the detection of carbohydrate depletion, including iModulons mapped to TFs of CcpA, MalR2, M protein transacting positive regulator Mga (9), and phase-dependent regulators Rgg (10), were likewise significantly activated in the CDM + maltose condition as compared to the CDM + glucose condition.
S. pyogenes grown in CDM + dextrin showed alterations in a total of 809 genes compared to the CDM + glucose condition (Fig. 4C and Supplementary Data 5). DiMA indicated that the significantly altered iModulons in the CDM + dextrin condition compared to CDM + glucose were similar to those significantly altered by maltose in Fig.  4B. However, SLO iModulon activity was not significantly altered. Notably, CovRS-related iModulons (CovR-1 and CovR-2) were significantly activated in the CDM + dextrin condition compared to CDM + glucose. Although the utilization of glucose, maltose, or dextrin each resulted in the upregulation of the nga-ifs-slo operon, the activated iModulons that contributed to operon upregulation differed by each carbon source (Fig.  4D).

The environment S. pyogenes were exposed in necrotizing fasciitis (NF) seemed both the carbohydrate depletion and stresses affecting the CovRS regulator
The iModulons presented in this study help us to interpret RNA-seq data from other studies. When we perform an analysis in this way, we can make certain observations about how data behave with respect to our TRN even if the data are of low quality. Here, we assess S. pyogenes transcriptome changes in the inflammatory environment  (63). At first, we calculated iModulon activities of samples isolated from NF at 24, 48, and 96 hours, which were centered on samples isolated from THY culture at mid-exponential phase (THY-ME, control). After that, we identified the significantly altered iModulons at each time point. Then, a set of 18 were visualized (Fig. 5A). The iModulons associated with the detection of the carbohydrate depletion, such as CcpA-related, MalR, MalR2, Mga, and Rgg iModulons, were significantly activated in RNA samples isolated from NF in vivo compared with those obtained from THY-ME in vitro. In addition, CovRS-related Research Article mSystems iModulons (CovR-1 and CovR-2) are significantly activated in RNA samples isolated from NF compared to THY-ME.
To compare data from NF with new data obtained in this study, we also re-calculated iModulon activities of samples in CDM carbon (-), glucose, maltose, and dextrin conditions, which were centered on samples from THY-ME of the WT strain. Of interest, the activities of CovRS-related iModulons in RNA samples isolated from NF are located very close to that from CDM dextrin condition (Fig. 5B). In addition, the activities of CcpArelated iModulons in the CDM carbon (-) condition are located relatively close to that those in NF (Fig. 5C). These results suggest S. pyogenes in NF sensed both carbohydrate depletion and stresses affecting the CovRS regulator. Taken with our other results about the regulation of the virulence factors, a coherent picture of carbon source depletion, CovRS, SLS, and SLO is presented. The most pronounced changes in the mouse model expression are in the same iModulons that we have been using to understand the regulation of virulence operons that contribute to NF pathology, which suggests this approach can be fruitful for studying in vivo disease models or clinical data.
To investigate whether or not we should take into account the global gene expression changes of bacterial growth, we computed the correlation of growth rate with the activities of iModulons on which we focused (Fig. 5D) (detailed conditions and results are presented in Supplementary Data 5). When comparing the iModulon activities between mid-exponential and stationary phases, no significant changes were seen in the activities of MalR2, SLO, FabT, and CovRS-related iModulons, while the activities of MalR2 and CovRS-related iModulons are consistently activated in RNA samples isolated from NF compared with THY-ME. These suggest that the significantly altered iModulon activities shown in Fig. 5A could capture a combination of both growth rate-independent and growth rate-dependent factors.

DISCUSSION
In this study, 116 existing, high-quality RNA-seq data sets of S. pyogenes serotype M1 were decomposed using ICA. This decomposition identified 42 iModulons and their iModulon activities; 26 of the iModulons correspond to specific biological functions or transcriptional regulators. Based on the gathered information on iModulons and their activities across samples, we could formulate hypotheses and identify carbon sources that modulate hemolysis activity in S. pyogenes. Calculation of iModulon activities in new RNA-seq data sets also allowed us to identify that S. pyogenes activated CovRS-related iModulons in a CDM + dextrin condition (Fig. 6). Furthermore, by computing iModulon activities of published bacterial transcriptomes, we estimate the stress to which S. pyogenes is exposed at the infection site of NF.
Sastry et al. (64) previously indicated that iModulons of E. coli directly capture groups of genes that vary relatedly, and independently, from other groups of genes, regardless of the magnitude of their variance. Lamoureux et al. (65) also demonstrated the overall similarity of the iModulon structure even after adding the 1,600 additional public samples from all sorts of different projects/laboratories/growth conditions. Therefore, an ICA-based statistical approach for the identification of iModulons can capture the general characterization of the gene regulatory network, although differences in data sets reflect some differences in gene sets of iModulons (64,65). We cannot at present conduct the same analysis in S. pyogenes due to the small number of RNA-seq samples available in public databases. However, we could re-capture FabT and CovRS-related iModulons even if samples of fabT deletion mutants and CovRS mutants are erased from the RNA-seq data set (data not shown).
Some iModulons contained sets of genes that had never previously been reported as a regulon. This is likely because iModulons that are defined through an untargeted ICAbased statistical approach represent a set of genes that are co-expressed as a by-product of multiple regulatory influences. By focusing on nga-ifs-slo operon, we could identify here that dextrin utilization alters bacterial hemolytic activity compared to maltose or glucose utilization. The full data set is available for examination by researchers with Research Article mSystems interests in the biology and pathogenesis of S. pyogenes infections, allowing them to explore the contents of other iModulons in detail and the relations between them. At iModulonDB.org, users can search and browse all iModulons from this data set and view them with interactive dashboards (32). Previous reports have identified regulons using single-gene knockout mutants to conclude that the nga-ifs-slo operon is regulated by many TFs, including CovRS (6), Mga (9), Rgg (10), and TrxR (3). In our unbiased global study of available transcriptomes, only the MalR2 iModulon, CovRS-related iModulon, and SLO iModulon contained the nga-ifs-slo operon. This distinction suggests that nga-ifs-slo operon is under multiple regulations at the same time. By comprehensively comparing iModulon activities across all samples, we speculate three environmental cues increase the expression of ngaifs-slo operon: (i) glucose-rich conditions (through activation of the SLO iModulon); (ii) glucose-depleted conditions (through activation of the MalR2 iModulon); and (iii) environmental stress sensed by CovRS (through activation of CovRS-related iModulon). The existence of multiple mechanisms for the upregulation of nga-ifs-slo operon may In this study, we compiled existing high-quality 116 RNA-seq data sets of S. pyogenes serotype M1 and conducted ICA-based decomposition. This decomposition identified 42 iModulons and these activities, which allowed us to formulate hypotheses and identify carbon sources that change bacterial hemolysis activity. In particular, dextrin utilization changes bacterial hemolytic activity as compared to maltose or glucose utilization. Furthermore, visualizing the iModulon activities in the transcriptome help us to interpret RNA-seq data. In particular, we identified that S. pyogenes activated CovRS-related iModulons in the CDM dextrin (+) condition.
Research Article mSystems link this operon with bacterial pathogenicity in various experimental conditions (33,34,66,67). Despite the structural similarities of glucose, maltose, and dextrin, S. pyogenes changed its hemolytic activity and transcriptome according to the carbon source. S. pyogenes in CDM + maltose or dextrin conditions activated iModulons associated with CcpA, MalR, and MalR2 TFs compared with those in CDM + glucose conditions. These TFs are repressors of the LacI/GalR family associated with CCR (5,62). Glucose is the primary carbohydrate source of energy for many bacteria, including S. pyogenes, which generates ATP from its conversion to pyruvate through glycolysis. However, utilization of maltose or dextrin requires energy input to first convert them to glucose to be metabolized. Therefore, it was believed that S. pyogenes suppressed genes involved in alternative carbohydrate utilization when glucose was available, leading to CCR (61,68). In the physiological niche of S. pyogenes, the human body, carbon sources are available in different combinations and levels depending on the type of tissue and state of starvation and could be amenable to further studies with these tools.
We identified the MalR2 regulon, malACDFGX and amyAB operons. Gene mutations of the malR2 gene in S. pyogenes M1 strain significantly increase fitness in the non-human primate model of necrotizing myositis (50). In another point of note, the malR2, malACDX, and amyAB genes exist in the M1, M2, M4, and M28 serotypes but are absent from M3, M5, M6, M12, M18, and M49 serotypes (62). The MalR2 iModulon may underpin differences in metabolic and virulence properties among serotypes of S. pyogenes. Maltose and dextrin are the products of the action of salivary amylases on dietary starch. AmyA, α-amylase, increased murine mortality following mucosal challenge (62). It is possible that the products of salivary amylase may encourage upregulation of the nga-ifs-slo virulence-associated operon. It is presently unclear which particular carbon sources, concentrations, and enzymes are most utilized by the pathogen at infected sites.
CovRS-related iModulons were activated by dextrin utilization. Mutations of the CovRS virulence regulator derepress several different kinds of virulence genes, leading to a hypervirulent phenotype of S. pyogenes (4, 69). Ikebe et al. (70) reported that CovRS mutations in S. pyogenes were more common in clinical isolates from severe invasive infections compared with isolates from non-invasive infections. CovRS mediates a general stress response in S. pyogenes whereby specific environmental stimuli, such as increased temperature, acidic pH, and high salt concentrations (2). It had not been previously reported that dextrin utilization releases the expression of CovRS-repressed genes. However, further study is needed to clarify that dextrin utilization affects CovRS directly or indirectly.
Growth rate is one of the several basic characteristics of a bacterial culture that may introduce variance in gene expression. In fact, the activities of Rgg, SLS, and CcpA-related iModulons were quite variant with growth rate. Therefore, it may be necessary to control the growth rate difference if making comparisons between the activity of these iModulons across samples. In particular, this artifact should be taken into account when interpreting the noisy in vivo data such as the NF data from the murine model analyzed herein. This point represents a current limitation of interpretation in this study.
In this study, 116 all RNA-seq data sets for ICA come from S. pyogenes serotype M1 cultured in THY broth. Therefore, identified iModulons are specific for this most commonly studied media growth condition. Future studies can append other conditions to this data set and observe the changes to the set of iModulons that ensue. Although iModulon activities can often be explained by prior knowledge, they can also present surprising relationships that lead to the generation of hypotheses. We describe new information about iModulons and their activities enabled us to query metabolic and regulatory cross-talk, discover new potential relationships, find coordination between metabolism and virulence, and facilitate the interpretation of the S. pyogenes response in vivo. These versatilities may allow iModulons identified in this study or in near future to serve as a powerful guidepost to further our understanding of S. pyogenes TRN structure and dynamics. | Victor Nizet, Conceptualization, Funding acquisition, Project administration, Writingreview and editing

DATA AVAILABILITY STATEMENT
The data accession numbers can be found in Supplementary Data 1. The normalized log TPM (X) and the calculated M and A matrix of the model can be found in Supplementary Data 2. The code for QC/QA, ICA, and assessment of the regulator enrichment can be found on Github (https://github.com/avsastry/modulome-workflow). Python package for analyzing and visualizing iModulons (PyModulon) can be also found on Github (https:// github.com/SBRG/pymodulon). All new 16 RNA-seq data obtained in this study have been deposited into DDBJ sequence read archive (DRA) under the accession number DRA014564. Interactive online dashboards for all iModulons and all data are available at https://imodulondb.org under the data set name "S. pyogenes."