Genome-Scale Metabolic Modeling Combined with Transcriptome Profiling Provides Mechanistic Understanding of Streptococcus thermophilus CH8 Metabolism

ABSTRACT Streptococcus thermophilus is a lactic acid bacterium adapted toward growth in milk and is a vital component of starter cultures for milk fermentation. Here, we combine genome-scale metabolic modeling and transcriptome profiling to obtain novel metabolic insights into this bacterium. Notably, a refined genome-scale metabolic model (GEM) accurately representing S. thermophilus CH8 metabolism was developed. Modeling the utilization of casein as a nitrogen source revealed an imbalance in amino acid supply and demand, resulting in growth limitation due to the scarcity of specific amino acids, in particular sulfur amino acids. Growth experiments in milk corroborated this finding. A subtle interdependency of the redox balance and the secretion levels of the key metabolites lactate, formate, acetoin, and acetaldehyde was furthermore identified with the modeling approach, providing a mechanistic understanding of the factors governing the secretion product profile. As a potential effect of high expression of arginine biosynthesis genes, a moderate secretion of ornithine was observed experimentally, augmenting the proposed hypothesis of ornithine/putrescine exchange as part of the protocooperative interaction between S. thermophilus and Lactobacillus delbrueckii subsp. bulgaricus in yogurt. This study provides a foundation for future community modeling of food fermentations and rational development of starter strains with improved functionality. IMPORTANCE Streptococcus thermophilus is one the main organisms involved in the fermentation of milk and, increasingly, also in the fermentation of plant-based foods. The construction of a functional high-quality genome-scale metabolic model, in conjunction with in-depth transcriptome profiling with a focus on metabolism, provides a valuable resource for the improved understanding of S. thermophilus physiology. An example is the model-based prediction of the most significant route of synthesis for the characteristic yogurt flavor compound acetaldehyde and identification of metabolic principles governing the synthesis of other flavor compounds. Moreover, the systematic assessment of amino acid supply and demand during growth in milk provides insights into the key challenges related to nitrogen metabolism that is imposed on S. thermophilus and any other organism associated with the milk niche.

employed in milk fermentation for millennia, S. thermophilus was first isolated and described in 1919 (2). The organism is a facultatively anaerobic, Gram-positive, homofermentative lactic acid bacterium with a low G1C genomic content and a genome size around 1.8 to 2.0 Mb (3).
In evolutionary terms, S. thermophilus has in recent times (3,000 to 30,000 years) undergone adaptation to milk as a new environmental niche (3). As a homofermentative organism, S. thermophilus converts lactose into lactic acid, resulting in acidification, which is a hallmark of most milk fermentations. Other minor fermentation products include formate, acetaldehyde, pyruvate, acetoin, and ethanol (1,(4)(5)(6), which may function as flavor components. Apart from having an abundant carbon source, i.e., lactose, milk is also rich in nitrogen; however, it is mostly protein bound, requiring efficient proteolysis for feasible uptake and utilization. Some S. thermophilus strains harbor the gene encoding the extracellular serine endopeptidase PrtS, usually providing a growth advantage (7), and yet many strains are prtS negative (8). As a consequence, the main mechanism of amino acid acquisition in S. thermophilus is through peptide uptake and subsequent hydrolysis. Amino acid biosynthesis is also known to play an important role for growth of S. thermophilus in milk, especially the synthesis of sulfur and branched-chain amino acids (BCAAs) (9,10). A frequent concern during milk fermentation is the risk of phage infection, and several phage types to which S. thermophilus is susceptible have been identified (11)(12)(13).
A few global microarray-based transcriptome studies have been conducted in S. thermophilus, with a focus on growth in milk (9), interactions with Lactobacillus delbrueckii subsp. bulgaricus (14,15), bacteriophage infection (16), or heat shock (17), whereas transcriptome sequencing (RNA-Seq)-based studies have emerged recently focusing on exocellular polysaccharide (EPS) synthesis (18,19); pH-controlled fermentation (20); and the effect of pH, temperature, and peptide composition (21,22). Besides these systemwide studies, a strain-specific genome-scale metabolic model (GEM) has been developed for S. thermophilus LMG18311 (5), although it is accompanied only by a limited number of simulations and lacks a predicted nutrient uptake and product secretion profile. The availability of high-quality GEMs is known to enable a mechanistic system-level understanding of metabolism and offer a useful tool for guiding the development of strains and cultures with enhanced characteristics for food fermentations (23).
To qualify as yogurt, according to the FAO/WHO Codex Alimentarius, milk fermentation is required to be carried out by S. thermophilus in combination with Lactobacillus delbrueckii subsp. bulgaricus (24). In milk, the two species coexist presumably in a mutualistic relationship exchanging metabolites, e.g., formate, folate, ammonia, ornithine, putrescine, and pyruvate, that augment the growth of each other (14,15,25). Both organisms are classified as generally recognized as safe (GRAS), and S. thermophilus is devoid of the virulence factors present in other species within the Streptococcus genus. While the coculture of the two organisms is the typical mode of fermentation in an applied setting, studying S. thermophilus in isolation also has clear merit, as this organism by itself can impart most, if not all, the characteristics required for, e.g., the production of yogurt.
This study aims at achieving a mechanistic understanding of key aspects of S. thermophilus physiology and metabolism through the combination of in silico and experimental approaches. The generation of a refined GEM capable of accurately simulating growth and metabolite production of an industrial S. thermophilus strain provided detailed insights into the metabolic flux distribution under different conditions. Flux balance analysis (FBA) was used to explain the variation in the secretion product profile in response to changing the nitrogen source in the medium. In addition, model simulations were employed to guide the analysis and interpretation of global and differential gene expression profiles of S. thermophilus CH8 in milk and in a chemically defined medium in a metabolic network context. Overall, the genome-wide phenotypic characteristics of S. thermophilus during growth in milk could be identified based on the obtained results.

RESULTS
Global genome and transcriptome characteristics of S. thermophilus CH8. S. thermophilus CH8 is a component of a commercial yogurt starter culture and is capable of efficient milk acidification, despite lacking the cell-envelope protease PrtS, as well as high texture formation. The genome of CH8 was sequenced to completion via a hybrid assembly approach, utilizing both short-and long-read sequencing technologies. The strain has a genome size of 1.85 Mb and is phylogenetically closely related to the type strain ATCC 19258, while it is separate from branches that include previously characterized strains, such as S. thermophilus LMD-9, LMG18311, and CNRZ1066 (Fig. 1). Based on rooting with Streptococcus salivarius NCTC8618, both CH8 and ATCC 19258 are relatively closer to the S. thermophilus ancestor. Nonetheless, clustering based on ortholog presence or absence places CH8 within a branch containing, e.g., LMG18311 and CNRZ1066 (see Fig. S1 in the supplemental material). Consequently, the genome sequence indicates a distinct ancestral origin, while the gene content resembles that of certain other previously characterized dairy strains. Comparing the ortholog distribution between CH8 and 34 previously published strains, with a focus on genes encoding metabolic functions, revealed only minor differences (see Table S1 in the supplemental material). Apart from specific genes in the exopolysaccharide synthesis gene cluster, CH8 also encodes an amino acid transporter that is rare among the other strains, while conversely lacking carbonic anhydrase, lysine decarboxylase (lysine degradation), and urocanate hydratase (histidine degradation) which are all present in the majority of the 34 strains. Therefore, at the level of metabolic genes, strain CH8 should provide a good representation of most strains within the species.
To obtain an increased physiological knowledge on strain CH8 and S. thermophilus in general, the transcriptional response of the strain was assessed under different growth conditions using RNA-Seq. Two medium conditions were employed, namely, a chemically defined medium (CDM) ( Fig. 2A, left panel) and milk ( Fig. 2A, right panel). CDM is a nutrient-rich medium with defined components, including free amino acids. Two time points (1 and 2) were included in the analysis, corresponding to mid-log and late-log phases, with the second time point approaching the transition into the stationary phase of batch fermentation ( Fig. 2A).
Replicate samples displayed similar transcription profiles, as indicated by the principal-component analysis (Fig. 2B). Here, mid-log and late-log samples from cultures in milk form clearly diverging groups, while CDM samples display moderate separation based on growth phase. Accordingly, medium composition, but also growth phase especially in milk, appears to have a profound effect on gene transcription. This finding can also be observed in the differential gene expression (Fig. 2C). While the quantity of differentially expressed genes is larger for the growth phase comparison in milk, the magnitude of fold change is slightly higher for the medium comparison.
Expression of genes related to carbon metabolism. Concerning specific gene expression changes (Fig. 3), genes involved in lactose metabolism, glycolysis, and the Leloir pathway displayed only minor fold changes (statistically significant, q value of ,0.05, but below the log 2 fold cutoff of .j61j) between different conditions, except for gapN, encoding NADP-dependent glyceraldehyde-3-phosphate dehydrogenase. This gene displayed a 1.6 log 2 -fold higher expression in mid-log milk cultures than that in mid-log cultures in CDM and late-log cultures in milk, indicating a higher demand for NADPH, which is needed, for example, in amino acid and nucleotide biosynthesis. The ldh gene encoding lactate dehydrogenase follows the expression pattern of the glycolysis genes and also has a high expression level. Genes involved in the formation of other fermentation end products, such as formate, acetate, acetolactate, acetoin, acetaldehyde, and ethanol, displayed much lower expression levels; most are nonetheless above the median gene expression level. Genes involved in acetaldehyde synthesis (Fig. 3, insert) displayed disparate expression levels, from around 100 to 2,000 in milk mid-log phase, with acetolactate decarboxylase expression particularly low. The limited differential expression of pyruvate metabolism genes is observed, although pyruvate formate-lyase has more than double the expression in mid-log cultures in milk compared with mid-log cultures in CDM and late-log milk cultures; this result is likely a reflection of an increased nucleotide biosynthesis demand.
Genes related to the Leloir pathway generally displayed considerably lower expression levels compared with glycolytic genes (on average a log 2 fold change of 4.2), indicating a much lower activity of the Leloir pathway, as expected. The Leloir pathway provides precursors for the biosynthesis of EPS whose biosynthesis genes were roughly halved in expression in late-log cultures, compared with mid-log cultures in both milk and CDM. As Leloir pathway genes contrarily displayed upregulation (q value of ,0.05, but below the fold change cutoff), the divergence in the expression pattern of the Leloir pathway genes and EPS synthesis genes could indicate the involvement of different regulatory mechanisms in their expression.
Expression of genes related to nitrogen metabolism. Among genes related to nitrogen metabolism, more expression changes are observed than for those related to carbon metabolism. Numerous amino acid synthesis genes were upregulated in milk compared with those in CDM ( Fig. 3; see Table S3 in the supplemental material). Most amino acid biosynthesis pathways included genes with a statistically significant log 2 fold change above 1 in milk compared those in CDM, e.g., pathways for arginine, branched-chain amino acid (BCAA), cysteine, methionine, histidine, aspartate, glutamate, lysine, threonine, and aromatic amino acid biosynthesis. Glutamine and asparagine pathway genes were unchanged, whereas some genes involved in glycine and proline biosynthesis had slightly lower expression in milk (q value of ,0.05, but below the fold change cutoff). In late-log cultures in milk, most amino acid synthesis genes had a somewhat lower expression than mid-log cultures, while tryptophan synthesis genes displayed markedly increased expression (2 to 4 log 2 fold change). Regarding gene upregulation in milk compared with that in CDM, the fold changes of genes encoding enzymes in the pathway from glutamate to arginine are particularly striking with an average 5.5 log 2 -fold increase in milk (q value of ,0.05). In mid-log cultures in milk, these genes reached very high read count levels, even comparable to the count levels of many glycolysis genes. Likewise, genes encoding enzymes involved in cysteine and methionine synthesis were upregulated markedly during growth under milk conditions.
Gene expression patterns also reflected the requirement for nucleotide biosynthesis in milk, as genes encoding enzymes involved in the synthesis of nucleotide precursors generally showed higher expression in mid-log cultures in milk than those in CDM, with a median log 2 fold change of 2.3 (q value of ,0.05).
Contrary to most amino acid synthesis genes, upregulation or differential expression of peptidase genes were generally limited, regardless of the variation in growth phase or medium composition. Possibly, peptidase gene regulation is limited and S.       The darker brown color signifies higher log 2 fold expression changes between milk mid-log versus CDM mid-log conditions, while "nd" denotes reactions associated with genes displaying no statistically significant differential expression. Insert: transcription level of corresponding enzyme-encoding genes, as estimated from read counts. The darker color signifies higher read counts. For each gene, the highest gene count among conditions is selected.
thermophilus may have evolved to express peptidase genes constitutively. Genomewide differential expression and count values are provided in Table S3.
Generation and refinement of a strain-specific genome-scale metabolic model. The genome of CH8 provides a blueprint of its metabolic capabilities and forms the basis of the strain's genome-scale metabolic model (GEM). By applying a comparative genomics approach, an S. thermophilus CH8-specific GEM, iRZ476, was created from previously published GEMs of five other organisms, including S. thermophilus LMG18311 (5,(26)(27)(28)(29)(30). Based on this approach, a matching ortholog could not be found in CH8 for only a few of the proteins included among the gene-protein-reaction associations in the LMG18311 model, and consequently, only two out of all 522 reactions were not transferred to the CH8 model. However, the comparison to the other reference models resulted in the incorporation of 50 additional reactions and improved the gene-protein-reaction association for 11 reactions. Through further manual curation based on knowledge of S. thermophilus physiology, pseudogene identification, and published literature (1,14,15), additional reactions were included or discarded, and further constraints were imposed on several key reactions originating from the LMG18311 model. Notably, reactions of pyruvate dehydrogenase, pyruvate oxidase, fumarate reductase, and alanine dehydrogenase were removed, while reactions of acetoin dehydrogenase and glyceraldehyde-3-phosphate dehydrogenase (GAPN) were added. These changes have profound effects on model simulations and more accurately reflect the metabolism of S. thermophilus as a species.
Simulating growth in a chemically defined medium. The refinement process led to a functional S. thermophilus GEM capable of predicting a realistic metabolite uptake and secretion profile, in concordance with previous experimental findings, e.g., the production of lactate, formate, acetoin, and succinate (5,6,14,31). The CH8-specific GEM, therefore, has the potential to serve as an updated general S. thermophilus metabolic model and can be used to generate models specific to other members in the species with minimal effort. Including CH8-specific experimental data on substrate (lactose, galactose, and glucose) uptake and lactate secretion rates during growth in CDM as additional constraints generated a condition-specific model of high accuracy, as reflected in the predicted growth rate of 0.98 h 21 , which is identical to the measured growth rate of the strain. The secretion products predicted by the model under these conditions include lactate, CO 2 , formate, acetaldehyde, fumarate, and glycerol ( Table 1). The vast majority of CO 2 synthesized is an effect of acetolactate metabolism and fatty acid synthesis, each responsible for ca. 50%, whereas formate is synthesized by pyruvate formate-lyase in the generation of acetyl-coenzyme A (CoA). The inclusion of experimental constraints led to predicting the secretion of acetaldehyde, an important yogurt flavor compound, at the expense of acetoin secretion, which is otherwise predicted in the absence of these constraints. The predicted production of acetaldehyde occurs via acetoin dehydrogenase activity as a means of NAD 1 / NADH redox balancing. A minor redox imbalance otherwise exists, as a consequence of a small excess of NAD 1 regenerated through lactate dehydrogenase, in relation to the amount of NADH generated during glycolysis and the experimentally measured lactate secretion rate. Fumarate and glycerol secretion are also predicted, with or without experimental constraints. Fumarate is a by-product of nucleotide biosynthesis and subject to interconversion with succinate depending on redox balance, while glycerol is a by-product of cardiolipin biosynthesis. Despite being energetically favorable in theory, no acetate production is predicted when applying the described experimental constraints, which is also in agreement with acetate measurements (data not shown). Flux values for all reactions are provided in Table S4 in the supplemental material.
Factors affecting the profile of secreted products. During simulations, a general interdependency was observed for the secretion products lactate, acetoin, formate, acetaldehyde, and ethanol. Several factors were found to influence the secretion profile of these pyruvate-derived metabolites during modeling, such as growth rate, lactate/lactose exchange flux ratio, and biosynthetic load, which was represented as GAPN flux (Fig. 4).
When applying the experimentally measured lactate/lactose flux ratio as a constraint, acetoin and acetaldehyde exchange levels show an inverse relationship. For the growth rate effect, the predefined lactate/lactose flux ratio could differ at higher or lower growth rates; however, the ratio would be expected to be relatively constant within growth rates ranging between 0.8 and 1.0 h 21 .
Whenever NADH generating and consuming reactions, mainly lactate dehydrogenase (LDH) and glyceraldehyde-3-phosphate dehydrogenase (GAPD), are balanced, excess pyruvate is predicted to be secreted as acetoin. When these reactions are imbalanced, acetaldehyde synthesis can occur at the expense of acetoin. A comparable shift from acetoin to acetaldehyde secretion is observed when the lactate/lactose flux ratio or biosynthetic load increase; lower values result in acetoin secretion, while higher values result in acetaldehyde synthesis. The increased biosynthetic load affects GAPD flux and thereby NADH generation, causing the redox imbalance that triggers acetaldehyde synthesis. Also, formate and acetaldehyde levels display an inverse relationship, as the synthesis of both metabolites generate acetyl-CoA, of which only a finite amount is required. Overall, the effect of changing growth or model parameters on the secretion product profile will ultimately be a combined function of the three factors, and most likely other factors as well, including regulatory. It is important to note that optimal growth is an underlying assumption of the metabolic profiles predicted using parsimonious flux balance analysis (pFBA), while regulatory constraints, which are initially unaccounted for in a stoichiometric model, could lead to different profiles. Growth on milk peptides. Modeling growth in a rich undefined medium, such as milk, is more challenging than the modeling of growth in a defined medium due to uncertainties in medium composition. A key feature of growth in milk is the nature of the main nitrogen source, namely, casein and whey protein, requiring proteolysis for the uptake of derived peptide fragments that can be further cleaved intracellularly to free amino acids. Casein proteins constitute the main protein forms and are the preferred substrate for proteolysis by lactic acid bacteria (32), while cleavage of whey proteins could not be detected during yogurt fermentation (33). Consequently, casein was selected as the substrate during modeling. In the model, the uptake of free amino acids was therefore substituted with the uptake of a theoretical casein-derived peptide, having an amino acid ratio based on the average amino acid composition of casein (34). A measured growth rate of 0.64 h 21 for CH8 in milk was used for constraining the solution space, while the lactose/lactate, lactose/galactose, and lactose/glucose exchange flux ratios observed during growth in CDM were retained as constraints (Table 2). To further increase the accuracy of model predictions, experimental data on the concentrations of four fermentation products (Table 3) and most amino acids (see Table S5 in the supplemental material) at the end of CH8 milk fermentation were generated. The secretion of all four fermentation products could be observed at various levels. The experimentally derived fermentation product ratios as well as the ratios between amino acids displaying a net increase in concentration over the course of fermentation (BCAAs, proline, threonine, phenylalanine, and ornithine) were included as model constraints. As proteolysis is the most likely growth limiting factor during dairy fermentation (35), minimization of the casein peptide uptake rate was used as an objective function in model simulations rather than biomass production. Ultimately, the condition-specific model of growth on casein peptide is rather constrained concerning product secretion reactions. Among unconstrained reactions, the secretion of CO 2 , formate, and fumarate are again predicted, as when simulating growth in CDM. For amino acids, some are predicted to be secreted contrary to measurements (glycine, histidine, and leucine). While the observed increase in extracellular amino acids during fermentation could also be a direct effect of extracellular proteolysis, overall, the secretion of certain amino acids and the growth limitation observed in milk indicate that during growth on casein some amino acids are in excess while others could be limiting. Increasing the casein peptide uptake rate in the simulations, and thereby the growth rate, is associated with a corresponding increase in amino acid secretion. Strains displaying higher growth rates could, therefore, theoretically secrete even higher levels of amino acids.
Despite the overall inclusion of multiple secretion constraints, a wide flux range for secretion products can be observed through flux variability analysis (FVA) and also for metabolites that are not secreted in the pFBA solution. This finding might indicate a certain degree of flexibility in metabolism, although most of this flexibility is an effect of the wide feasible range of lactose uptake at this growth rate.
Amino acid supply and demand during growth in milk. An imbalance in the availability of individual amino acids from casein peptide, relative to the biosynthetic demand, can be observed when modeling the growth on casein peptide. A comparison of casein amino acid content (34) with the experimentally determined amino acid composition of S. thermophilus CNRZ1066 cells (5) gives a more direct insight (Fig. 5A). For most amino acids, a good correlation between availability in casein and biosynthetic demand exists; however, for some amino acids, an imbalance is evident. Amino acids present in excess in casein are located generally on the upper (casein) side of the diagonal (Fig. 5A). All these amino acids, except for glutamate, lysine, and serine, are  indeed predicted to be secreted by CH8. Regarding the exceptions, glutamate is a frequent substrate for amino acid interconversions through transamination, whereas lysine is further needed for peptidoglycan biosynthesis and a proportion of serine is consumed in the synthesis of tryptophan, cysteine, and glycine. Therefore, these amino acids are also not necessarily expected to be in excess.
The imbalance between casein-derived amino acid supply and S. thermophilus amino acid demand could result in a limitation of growth by specific amino acids. Indeed, at low casein peptide uptake rates, in silico growth is limited by sulfur amino acid (cysteine and methionine) availability since to our knowledge they constitute the only significant organic sulfur sources in milk (Fig. 5B). At higher casein peptide uptake rates, growth is instead predicted to be limited by alanine, aspartate, tryptophan, and eventually other amino acids (Fig. 5B). For experimentally validating the hypothesis of sulfur amino acid limitation, CH8 milk fermentations were carried out in the presence or absence of additional methionine. Methionine was chosen as it, unlike cysteine, does not have oxygen scavenging properties. Indeed, the addition of methionine resulted in a marked increase in acidification rate, a proxy for CH8 growth. Overall, the time to pH 4.5, which is an important parameter in dairy fermentations, was reduced from 16 h to 9 h (Fig. 5C).
Ornithine secretion occurs during milk fermentation. Transcriptome profiling revealed exceptionally high expression levels of arginine biosynthesis genes during milk fermentation, whereas only minimal arginine biosynthesis was predicted during in silico simulation of growth on casein-derived peptide. Besides, based on the amino acid content of both S. thermophilus biomass and casein, an arginine deficiency in milk is not apparent. Thus, the arginine biosynthesis pathway could have an additional function in the supply of other metabolites. Ornithine, for example, is generated as an intermediate in the biosynthesis of arginine. Measurement of extracellular ornithine concentrations during milk fermentation by CH8 revealed an initial decrease followed by a slight increase in the concentration of this amino acid (Table 4). Arginine can also function as a precursor for the synthesis of polyamines, a class of compounds that is ubiquitous among living organisms. One polyamine, putrescine, can be synthesized from arginine in only two reactions. Most S. thermophilus strains harbor a gene annotated as an agmatinase, which is capable of catalyzing the second reaction. To explore this hypothesis, the intracellular and extracellular concentrations of putrescine, together with the extracellular concentrations of ornithine and glutamate (Table 4), were measured for CH8 cultured in milk. None of the polyamines that were measured (putrescine and four other polyamines) could be detected either extracellularly or in crude cell extracts (limit of detection, 0.4 mg/L), rejecting the hypothesis that any excess arginine serves as a precursor for polyamine synthesis in CH8.

DISCUSSION
As S. thermophilus is an important workhorse in the dairy industry, and increasingly also in the fermentation of plant-based foods, knowledge about its physiology has both scientific and industrial relevance. Here, a system-wide characterization of S. thermophilus CH8 physiology uncovered novel aspects relating to both carbon and nitrogen metabolism, while examination of both the in vitro and in silico growth in milk and CDM demonstrated the physiological characteristics of CH8 during fermentation in either medium.
Curation of the previously published S. thermophilus LMG18311 GEM (5) yielded a refined and updated model that provides a foundation for the in-depth understanding of this organism's metabolism. Considering the limited variation in the distribution of metabolic genes among the studied strains, the reconstructed metabolic network of CH8 can be expected to be broadly representative of the species. A recurring observation in model simulations was the surplus of pyruvate, leading to the secretion of a variety of derived metabolites, including formate, acetaldehyde, acetoin, ethanol, and pyruvate itself, apart from the hallmark lactate secretion. The secretion of these metabolites in milk was either verified experimentally in this work or previously reported by others (4-6, 31, 36). Through modeling, a set of three factors putatively influencing the secretion product profile in S. thermophilus were identified, namely, growth rate, lactate/lactose exchange flux ratio, and biosynthetic load. The main reason underlying the influence of these factors on the fermentation product profile is their effect on NADH, acetyl-CoA, and CO 2 intracellular pools, as these metabolites take part in the reactions involved in the synthesis of the secretion products. As the three factors reflect general mechanistic constraints during optimal growth, their influence could also be valid easily both under other conditions and in other strains. As for the different distribution of secretion products experimentally observed in milk, it is likely a reflection of regulatory constraints that are not incorporated explicitly in stoichiometric models. Thus, the distribution could also differ in strains with alternative regulatory constraints. While the effects of the three identified factors can be overruled or masked by regulatory mechanisms, the underlying principles of these effects are still at play and should be relevant to consider when, for example, optimizing the synthesis of a certain fermentation product.
One of the secretion products of S. thermophilus, acetaldehyde, is an important flavor compound in yogurt, and yet the exact route of its biosynthesis in this organism has not been resolved definitively. While most organisms synthesize acetaldehyde from acetyl-CoA, through acetaldehyde dehydrogenase activity, this enzyme is absent in S. thermophilus. In this work, constraint-based modeling identified the acetolactate pathway (Fig. 3) as the main source of acetaldehyde formation rather than the threonine aldolase (TA) activity (converting threonine to glycine and acetaldehyde) of the serine hydroxymethyltransferase (SHMT) enzyme as was proposed previously (37). Although various experimental reports on the possible routes of acetaldehyde formation in S. thermophilus exist (4,37,38), a clear consensus is absent as it appears that the relative contribution of each pathway varies, possibly with a preference toward the acetolactate pathway.
Nitrogen availability is presumably a growth-limiting factor during growth in milk (35). Modeling amino acid metabolism during growth on the casein peptide provided insights into the growth-limiting effect of amino acids on S. thermophilus. The most pronounced growth limitation relates to the sulfur amino acids cysteine and methionine. The synthesis of both amino acids requires sulfur, which S. thermophilus cannot assimilate from inorganic sources, while the presence of other significant organic sulfur sources in milk is not reported. Thus, unlike other amino acids that can be synthesized de novo from glycolytic intermediates, the combined pool of sulfur amino acids largely determines their total availability for protein synthesis. Without exogenous cysteine and methionine, modeling identified that growth is limited by the casein peptide uptake rate, up to a certain level, after which other amino acids, e.g., alanine, aspartate, and tryptophan, become limiting. The predicted sulfur amino acid limitation during growth in milk is in line with the importance of sulfur amino acid synthesis reported previously in S. thermophilus (9). An increase in the expression of sulfur amino acid synthesis genes was also observed, which could indicate the presence of another organic sulfur source or merely reflect cysteine and methionine interconversion. The uptake rate of casein peptides is a function of the proteolytic capabilities of a strain. A key feature in proteolytic S. thermophilus strains is the serine protease PrtS (39), whose presence is subject to significant strain variability (1). This protease is absent in CH8, explaining its lower growth rate in milk than that in CDM. However, strains carrying this protein could display a casein peptide uptake rate that surpasses the growth limitation caused by sulfur amino acids, and all S. thermophilus strains are therefore not necessarily subject to this limitation. Overall, the current results indicate an imbalance between amino acid supply in the form of peptides and individual amino acid demand. A discrepancy is also observed between pFBA predictions and the expression of genes related to amino acid biosynthesis. Possibly, the average amino acid composition of casein is not an accurate representation of the intracellularly available amino acids. The processes of extracellular proteolysis and peptide transport might lead to a particular intracellular distribution of peptides, thereby also affecting the composition of available intracellular amino acids. Apart from PrtS, certain cellassociated extracellular peptidase activities have been identified (40,41). An altered amino acid composition could potentially explain contradicting observations obtained from modeling and transcriptome profiling, e.g., within arginine and BCAA biosynthesis pathways.
Alternatively, additional functions for these amino acids, or intermediates in their biosynthesis pathways, that are not described in the reconstructed metabolic network may exist. High arginine biosynthesis gene expression has also been detected previously during S. thermophilus growth in milk using microarray technology (14). Intriguingly, verification of this expression using reverse transcriptase quantitative PCR (RT-qPCR) on argH detected a 50-fold upregulation (14), which is similar to the results in our study. Meanwhile, gene expression levels similar to those in the present study have been observed in another RNA-Seq study (18), as well as in an internal study performed on one of our proprietary prtS1 strains (our unpublished data).
It has been speculated that the mutualistic relationship between S. thermophilus and L. delbrueckii subsp. bulgaricus during milk fermentation includes the exchange of ornithine produced by S. thermophilus and subsequent ornithine-derived putrescine produced by L. delbrueckii subsp. bulgaricus (42). The increase in extracellular ornithine concentration over time observed in this study indicates that the ornithine/putrescine exchange hypothesis could be correct, although the final ornithine concentration is low. Possibly, the ornithine production of S. thermophilus would be enhanced during coculture. Increased arginine biosynthesis gene expression was indeed observed during coculture with L. delbrueckii subsp. bulgaricus (15).
Modeling the mutualistic relationship between L. delbrueckii subsp. bulgaricus and S. thermophilus would be a natural continuation of this study and has relevance both from an industrial and an evolutionary perspective. Several interactions have already been described (14,15,25), of which some have been evidenced and some remain as hypotheses. Future coculture modeling can help substantiate these aspects, quantify the interactions, and perhaps identify novel relationships.
The growth of S. thermophilus in both CDM and milk was examined here through a combined systems biology approach. This examination led to a better understanding of the characteristics of growth in its main niche, milk, at the level of both carbon and nitrogen metabolism, where principles governing the secretion product profile but also nitrogen limitation were uncovered. The availability of a refined and updated GEM capable of accurately simulating growth and metabolite secretion by S. thermophilus enables the future system-level mechanistic understanding of S. thermophilus physiology. Considering the disparity in amino acid supply and demand during growth in milk, studies on the effect of proteolysis and peptide uptake could provide more detailed knowledge on the actual distribution of intracellularly available amino acids. The knowledge presented here on the proposed mechanisms governing the secretion product profile could as well be harnessed to fine-tune or engineer strains displaying a desirable product profile. Moreover, the new GEM could provide the foundation for the characterization of the mutualistic relationship between S. thermophilus and L. delbrueckii subsp. bulgaricus in greater detail.

MATERIALS AND METHODS
Strains, media, and culture conditions. Streptococcus thermophilus CH8 is a proprietary strain, which was obtained from the Chr. Hansen Culture Collection. Strains were cultured in media consisting of either a chemically defined medium (CDM) or milk. A CDM for S. thermophilus has been described previously (43) and was modified in this study. The medium contained (per liter of demineralized water) 20 g lactose; 1 g Na-acetate; 0.6 g NH 4 -citrate; 3 g KH 2 PO 4 ; 2.5 g K 2 HPO 4 ; 0.24 g urea; 0.42 g NaHCO 3 ; 0.2 g MgCl 2 Á6 H 2 O; 0.05 g CaCl 2 Á2 H 2 O; 0.028 g MnSO 4 ÁH 2 O; 0.005 g FeCl 2 Á4H 2 O; 1-mL trace element solution; 10-mL vitamin solution; 0.01 g of each of the four nucleobases adenine, guanine, uracil, and xanthine; 0.5 g cysteine; and 0.04 g of each of the L-amino acids alanine, arginine, asparagine, aspartic acid, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, and valine. The trace element solution consisted of (per liter of demineralized water) 77 mM HCl, O while the vitamin solution consisted of (per liter of demineralized water) 100 mg pyridoxine-HCl, 50 mg p-aminobenzoic acid, 50 mg nicotinic acid, 400 mg Ca-DL-panthothenate, 50 mg thiamine, 50 mg lipoic acid, 50 mg riboflavin, 20 mg biotin, 20 mg folic acid, and 1 mg vitamin B 12 . The medium was adjusted to pH 6.6 and distributed as 50-mL aliquots in 250-mL crimp-top serum bottles with subsequent flushing of the headspace with N 2 gas for 10 min. The medium was sterilized by autoclaving at 121°C for 20 min. Prior to inoculation, the medium was supplemented with sterile, anaerobic solutions of lactose, MgCl 2 Á6 H 2 O, CaCl 2 Á2 H 2 O, urea, trace elements, vitamins, amino acids, and nucleobases. As the final step, a cysteine-HCl solution was added to lower the redox potential. For growth in milk, skim milk powder (Arla Foods amba, Denmark), containing around 60% protein and 1% fat, was reconstituted at a dry matter level of 9.5% in distilled water and pasteurized at 99°C for 30 min, followed by cooling. Following the aseptic transfer of milk to a sterile crimp-top serum bottle, dithiothreitol was added as a reducing agent to a final concentration of 0.15 g/L, and the bottle headspace was flushed with N 2 for 20 min. All fermentations were conducted under anoxic conditions. For inoculum preparation, bacteria were streaked from glycerol stock onto M17 (44) agar plates and incubated anaerobically overnight at 37°C, where after a single colony was inoculated into the CDM and incubated anaerobically overnight at 37°C. From this preparation, an inoculum was generated by inoculating the CDM at the desired level to obtain exponentially growing cells following overnight incubation at 37°C. Growth was monitored by following the optical density at 600 nm (OD 600 ) of the cultures.
Genome sequencing and analysis. Genome sequencing was performed using both Illumina shortread and Oxford Nanopore Technologies (ONT;, Oxford, UK) long-read sequencing technologies. For shortread sequencing, genomic DNA was extracted using the DNeasy blood and tissue kit (Qiagen, Hilden, Germany). The DNA was then fragmented using a BioRuptor instrument (Diagenode Inc., Denville, NJ) aiming at an insert size of 800 bp. A sequencing library was prepared using the Kapa HTP kit (Roche, Basel, Switzerland), according to the manufacturer's protocol. Sequencing was done using MiSeq reagent kit v2 on a MiSeq instrument (Illumina, San Diego, CA), with a read length of 2 Â 250 bp. For long-read sequencing, genomic DNA was purified using phenol-chloroform extraction followed by ethanol precipitation (45). Prior to library preparation, DNA was subjected to an additional purification step with AMPure XP beads (Beckman Coulter, Brea, CA), following the manufacturer's protocol. The sequencing library was prepared using the 1D genomic DNA by ligation kit (SQK-LSK108; ONT), following the workflow suggested by the manufacturer. The prepared library was sequenced on a MinION instrument (ONT).
A hybrid genome assembly approach based on both Illumina and ONT reads was followed using SPAdes 3.8.0 (46), with a coverage of 45-fold for ONT reads and 115-fold for Illumina reads. Low coverage contigs (,1.5 coverage) were removed and a long-read-only assembly was created, using Canu v. 1.3 (47) to resolve one final gap.
Open reading frames (ORFs) in the assembled genome were identified and annotated functionally using the RAST pipeline (48). The phylogeny of S. thermophilus strains was reconstructed by first aligning 534 core genes, identified using Roary (49) with PRANK (50) alignment. Poorly aligned positions were then eliminated with Gblocks (51), and maximum-likelihood phylogeny was inferred using RAxML-NG, which is RaxML based (52), with the GTRGAMMA model, 50 random starting trees, and 100 bootstrap replicates. Phylogenetic trees were visualized using FigTree (53), and strain NCTC8618 of S. salivarius, a closely related species, was included as an outgroup. For a pangenome analysis, protein ortholog detection was performed using proteinortho (54) employing a sequence identity of 95%, coverage of 90%, and similarity cutoff of 1. To avoid annotation bias, the ortholog comparison was based on a PGAP (55) annotated version of the CH8 genome and the refseq version of the public genomes (refseq genomes are annotated with PGAP). Subsequently, proteins specific to CH8, or rare among other strains, and orthologs absent in CH8 while present in the majority (18) of strains were identified, and differences relating to metabolic genes were highlighted. Hierarchical clustering based on ortholog distribution was performed in R using the Jaccard distance and unweighted pair group method with arithmetic mean (UPGMA) linkage method.
Isolation and processing of RNA. For growth in both CDM and milk, a preculture grown anaerobically overnight at 37°C in CDM was used to inoculate the desired medium to a final OD 600 of 0.05, and the resulting cultures were incubated at 37°C. Cultivations were performed in triplicates in each medium, and cells were harvested at two time points, as follows: at an OD 600 of ca. 0.6 to 0.7 and 1.0 to 1.2 for the cultures in CDM and around pH 5.92 and 5.37 for the cultures in milk. A 1:2 volume of cell culture to RNAprotect bacterial reagent (Qiagen) was mixed rigorously at cell harvest for both CDM and milk samples. For CDM growth, cell pellets were harvested by centrifugation. To harvest cells grown in milk, samples were subjected to mild sonication for 30 s, and the procedure for cell-milk separation was adapted from previous reports (56,57). To a 12-mL milk:RNAprotect mix, 4 mL 1 M Na-citrate (pH 7.0) and a 1.56-mL saline solution (0.145 NaCl, 0.016 M Na-b-glycerophospate, and 0.1% Tween 80 [pH 7.0]) were added, mixed, incubated at room temperature (RT) for 5 min, and centrifuged at 10,000 Â g for 5 min at 4°C. The cell pellet was washed with 1 mL extraction buffer (5 mM NaPO 4 and 1 mM EDTA [pH 7]) and stored subsequently at 280°C. The cell pellet was dissolved in Tris-EDTA (TE) buffer containing lysozyme (15 mg/mL), proteinase K (20 mg/mL), and mutanolysin (250U/mL) and shaken at 37°C and 1,400 rpm for 10 min. The subsequent RNA extraction procedure was performed with an RNeasy minikit (Qiagen) as per the manufacturer's instructions, including the removal of DNA with DNase I. The quality of the total RNA was evaluated using a bioanalyzer (Agilent), before using the Ribo-Zero rRNA removal kit for bacteria (Illumina) for rRNA depletion, according to the manufacturer's protocol. Sequencing thereafter was performed at Center for Biosustainability (Technical University of Denmark, Denmark), where a TruSeq RNA library prep kit (2 Â 75 bp; Illumina) was used for library preparation and pairedend sequencing was performed on an Illumina NextSeq instrument.
RNA-Seq data analysis. Detailed sequencing statistics can be found in Table S2 in the supplemental material. The obtained raw reads were trimmed with Trimmomatic (58) using default parameters, with the resulting unpaired reads removed. All remaining reads were mapped to an initial draft version of the CH8 genome, obtained through a hybrid genome assembly approach. Read mapping was performed with CLC Genomics Workbench (Qiagen), with the default parameters, and unique gene counts were extracted. A differential gene expression analysis was performed using DESeq2 (59) within the SARTools framework (60) with default parameters. Values were mapped to genes annotated on the complete genome, by the principle of orthology. Transcriptome data were visualized on CH8 metabolic maps with Escher software (61).
Genome-scale metabolic modeling. The GEM of S. thermophilus CH8 iRZ476 was created based on a comparative genomics approach using the GEM reconstruction tool Metadraft v. 0.8.1 (62). Briefly, protein orthologs were identified for CH8 and five selected reference strains with publicly available GEMs. Reactions associated with orthologs in the reference strains were included in the CH8 model after being subjected to manual filtering, thus excluding reactions implausible in S. thermophilus. Reference strain species included S. thermophilus LMG18311, Lactobacillus plantarum, Lactococcus lactis, Bacillus subtilis, Staphylococcus aureus, and Escherichia coli (5,(26)(27)(28)(29)(30). Compared with the LMG18311 model, 22, 51, and 27 reactions were removed, added, or constrained, respectively. The GEM is provided in File S1 in the supplemental material. Medium formulations included a medium similar to the defined CDM, with either amino acids or casein-derived peptides as the nitrogen source. For growth on casein peptides, an average peptide, designated casein peptide, was formulated with an average amino acid composition based on previous determinations of casein composition (34). Optimal growth states were identified using pFBA and flux variability analysis (FVA) within the COBRApy framework (63). Uptake and secretion rates of lactose, glucose, galactose, and lactate were determined for growth in CDM by time course metabolite measurement and metabolite yield calculation through linear regression based on the equation q p = Y p/x Á m, where q p is the rate of product formation, Y p/x the specific product yield coefficient, and m the growth rate. Rates were calculated at intervals in which a linear relationship between growth rate and metabolite concentration was observed. For growth in CDM, growth rate was maximized during modeling, while for growth on the casein peptide, growth rate was constrained to 0.64 according to experimental data, and instead casein peptide uptake was minimized. Additionally, for modeling growth on casein peptide, experimentally observed ratios between secreted metabolites of acetaldehyde, acetoin, ethanol, and pyruvate, as well as experimentally observed ratios between accumulating amino acids, proline, ornithine, threonine, isoleucine, valine, and phenylalanine, were applied as constraints.
Fermentation kinetics in CDM and milk. For simulating growth and metabolite formation in CDM and milk, detailed fermentation kinetics were obtained by growing CH8 principally as described above. For CDM fermentation, anaerobic growth was performed as described, in duplicates, although the medium contained twice the amino acid concentrations, apart from cysteine that remained at 0.5 g/L. Samples for metabolite analysis were collected and mixed 5:1 with 2 M H 2 SO 4 . Milk fermentation was here performed in 200-mL screw-cap bottles fitted with pH sensors. Milk acidification was followed by monitoring the changes in pH for up to 30 h at 40°C, using an iCinac system (AMS Alliance, Frepillon, France). Growth was followed by monitoring the OD 600 of samples clarified with borate-EDTA buffer (pH 8.0). For OD 600 of ,0.2 measurements, 0.5 mL of a milk sample was treated with 2 M borate-200 mM EDTA (pH 8.0) as described by Christensen and Steele (64). From that point onward, 0.1 mL of the sample was mixed with 0.9 mL of 0.5 M borate-10 mM EDTA (pH 8.0), and the OD 600 of the mixture was determined after a 30-min incubation at room temperature. Absolute OD 600 values of the cultures were corrected by deducting the OD 600 of noninoculated B-milk clarified in the same way. For methionine supplementation experiments, fermentations were performed with or without methionine added at a concentration of 80 mg/L.
Metabolite analysis. Extracellular levels of acetaldehyde, acetoin, ethanol, and pyruvate in milk samples from iCinac fermentations were analyzed in-house by gas chromatography coupled to flame ionization detection as described previously (65), with modifications. For acetaldehyde, acetoin, and ethanol analysis, a 1-mL sample was added to a 20-mL headspace vial already containing 200 mL 2 M sulfuric acid in Milli-Q water and the vial capped immediately. For pyruvic acid analysis, a 600-mL sample was added to a 20-mL headspace vial already containing 400 mL 300 mg D-a-hydroxyisovaleric acid/L in saturated NaHSO 4 . Subsequently, 0.5 mL ice-cold methanol was added and the vial capped immediately. For quantification, a standard addition with sodium pyruvate was done to a pool of samples using D-a-hydroxyisovaleric acid as the internal standard. The gas chromatograph (Autosystem XL; Perkin-Elmer) was equipped with a flame ionization detector and a free fatty acid phase (FFAP) column (25 m Â 0.2 mm Â 0.3 mm; 19091F-102; Agilent Technologies). The gas chromatograph was connected to a headspace sampler (TurboMatrix 110 Trap; Perkin Elmer). The operating parameters of the gas chromatograph were as follows: 32 lb/in 2 of head pressure, 55 mL/min helium flow rate, 180°C injector temperature, and 200°C detector temperature. The oven temperature was held at 60°C for 2 min; the temperature increased by 45°C/min up to 230°C. The parameters of the headspace sampler were as follows (values in parentheses were used for samples prepared according to the procedure for pyruvic acid determination): 70°C incubation temperature for 36.5 min (77 min), 180°C needle temperature, 210°C transfer line temperature, 1 min of pressurization time, and 0.02 min of injection time with split ratio 4.5. Unless stated otherwise, chemicals used were of analytical grade.
To track arginine-related metabolites, free amino acid concentrations were measured for both CH8 cell extracts and supernatants that were obtained during milk fermentations. Precultures and cultures were prepared as described under "Isolation and processing of RNA." For the extracellular measurement of free amino acids, samples were collected and frozen until analyzed. For the preparation of cell extracts, 50 mL milk fermentate was subjected to cold washing for the separation of cells from the milk protein matrix., using previously described wash buffers (57), with modifications. Initially, samples were washed with 25 mL 1 M sodium citrate and 10-mL saline solution. Two subsequent washes were performed with a mixture of 25-mL extraction solution and 25-mL borate solution (0.5 M borate and 10 mM EDTA). The resulting cell pellet was resuspended in 1 mL 0.05 M Na-phosphate buffer (pH 7), and cells were lysed by sonication (Misonix 3000). Concentrations of free arginine, ornithine, and glutamate in extracellular samples and cell extract samples were analyzed by Ansynth Service B.V. (The Netherlands) using classical ion-exchange liquid chromatography with postcolumn ninhydrin derivatization and photometric detection. Polyamine and free amino acid levels were analyzed in-house by gas chromatography coupled to mass spectrometry with electronic ionization. Derivatization was performed through alkylation with methylchloroformate (66).
Statistics and reproducibility. Differential expression analysis was performed with the DESeq2 R package applying the default median normalization and the parametric fit type, and a Benjamini-Hochberg correction was performed. A principal-component analysis of samples was performed on transformed (variance stabilizing transformation) count values. Differential gene expression was considered significant if the q value was below 0.05 and log 2 fold change was above j61j. Unless stated otherwise, all examples of differential gene expression mentioned adhere to these criteria.
Data availability. The annotated genome sequence was deposited at NCBI GenBank (CP089060), while the RNA-Seq reads were submitted to the Sequence Read Archive (SRR15427001 to SRR15427012). in the manuscript, however, are solely based on scientific grounds and do not reflect the commercial interests of their employer.