Differential transcription of expanded gene families in central carbon metabolism of Streptomyces coelicolor A3(2)

Background: Streptomycete bacteria are prolific producers of specialized metabolites, many of which have clinically relevant bioactivity. A striking feature of their genomes is the expansion of gene families that encode the same enzymatic function. Genes that undergo expansion events, either by horizontal gene transfer or duplication, can have a range of fates: genes can be lost, or they can undergo neo-functionalization or sub-functionalization. To test whether expanded gene families in Streptomyces exhibit differential expression, an RNA-Seq approach was used to examine cultures of wild-type Streptomyces coelicolor grown with either glucose or tween as the sole carbon source.
Results: RNA-Seq analysis showed that two-thirds of genes within expanded gene families show transcriptional differences when strains were grown on tween compared to glucose. In addition, expression of specialized metabolite gene clusters (actinorhodin, isorenieratene, coelichelin and a cryptic NRPS) was also influenced by carbon source.
Conclusions: Genes encoding the same enzymatic function showed transcriptional differences when cultures were grown on different carbon sources. This transcriptional divergence enables partitioning of function under different physiological conditions. These approaches can inform metabolic engineering of industrial Streptomyces strains and may help develop cultivation conditions to activate the so-called silent biosynthetic gene clusters.


Introduction
Streptomycete bacteria are a major source of clinically useful bioactive natural products, including antibiotics, immunosuppressive and anti-cancer agents. A remarkable feature of their genomes is that there are often several genes that appear to encode the same biochemical function [1,2]. This is often referred to as 'genetic redundancy', where two or more genes perform the same biochemical function [3]. While genetic redundancy does occur in nature, many so-called 'redundant' genes have evolved divergent functions and provide the functional diversity and evolutionary robustness that can be observed in genomes from a wide range of organisms [4]. Two main mechanisms contribute to the expansion of gene families within genomes: the duplication of genes and horizontal gene-transfer events [4][5][6].
Such gene-family expansions are well known in streptomycete regulatory genes [7][8][9][10] and in specialized metabolite biosynthesis [11,12]. Yet, limited attention has been focussed on the genes of central metabolism, despite a number of gene-expansion events being identified from biochemical and phylogenomic studies. There is an increasing appreciation that gene-expansion events in central metabolism may facilitate the evolution of specialized metabolites [2,13-17]. Whilst expansion of gene families is widespread, expression differences, gene dosage, cofactor variation, allostery or substrate affinities are seldom taken into account during the construction of metabolic models for Streptomyces. As a result, homologous functions are often combined into a single flux pathway that may not reflect the physiological nature of each gene product. This is especially stark when attempting to understand the supply of precursor molecules from central metabolism to the production of specialized metabolites.
It was hypothesized that gene families that have undergone gene-expansion events would exhibit different expression profiles if they have diverged functionally. To understand the role of these gene expansions in Streptomyces and their impact on central metabolism and specialised metabolism, an RNA-Seq approach was taken using S. coelicolor grown either on glucose or tween as sole carbon source. Cultures were compared at a single point during growth (mid-log phase) to understand how transcription of central metabolic genes varies when the cultures are growing primarily via glycolysis (glucose as the sole carbon source) or gluconeogenically (with the mono-oleate tween). It was found that expanded gene families exhibit different responses to growth on different carbon sources. These data will enable prioritization of targets for metabolic engineering and will be informative for the construction of metabolic models.

Expanded gene families in carbon metabolism exhibit different transcriptional profiles
To investigate the differences between glycolytic and gluconeogenic growth conditions, S. coelicolor M145 [an A3(2) strain] was grown on glucose and Tween40, respectively, as sole carbon source. Growth of liquid cultures was monitored by cell dry weight and samples were removed for total RNA extraction at mid-exponential phase, which for glucose was 19 h and for tween was 36 h (Fig. S1, available in the online version of this article). Neither of the pigmented antibiotics (actinorhodin or undecylprodigiosin) could be detected at the time points at which RNA was harvested. The specific growth rate of S. coelicolor M145 was 0.18 h⁻¹ when grown on glucose and 0.1 h⁻¹ for growth on tween (see Fig. S1).
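For readers who prefer doubling times, the specific growth rates above can be converted with the standard relation t_d = ln(2)/µ. The following is a minimal sketch that assumes simple exponential growth at the sampling point; the function name is illustrative and doubling times are not reported in the study itself.

```python
import math


def doubling_time(mu_per_hour: float) -> float:
    """Return the doubling time (h) for a specific growth rate mu (h^-1).

    Assumes balanced exponential growth, so that t_d = ln(2) / mu.
    """
    if mu_per_hour <= 0:
        raise ValueError("specific growth rate must be positive")
    return math.log(2) / mu_per_hour


# Specific growth rates as reported in the text (h^-1).
rates = {"glucose": 0.18, "tween": 0.10}

for carbon_source, mu in rates.items():
    print(f"{carbon_source}: mu = {mu:.2f} h^-1, "
          f"doubling time = {doubling_time(mu):.2f} h")
```

Under this assumption the doubling times are roughly 3.9 h on glucose and 6.9 h on tween, which is consistent with the later sampling time used for the tween cultures.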
RNA-Seq analysis of RNA samples from these cultures showed global effects of carbon source on transcription. Growth on tween resulted in 644 genes being differentially expressed when compared to growth on glucose. Of the 644 differentially expressed genes, 37 % were predicted to encode hypothetical proteins, 9 % were regulatory genes, 12 % encoded central carbon metabolic enzymes, 6 % were transporters, 4 % were associated with specialized metabolism, 2 % were associated with stress metabolism and 1 % were associated with DNA replication, nitrogen metabolism and metal metabolism (see Tables S1-S6). Growth on Tween40 resulted in an upregulation of genes associated with gluconeogenesis and fatty acid degradation when compared to cultures grown on glucose (Fig. 1c, Tables S1-S6).
Gene-ontology (GO) enrichment analysis (using PANTHER, http://www.pantherdb.org; geneontology.org [18]) was used to investigate the enrichment of specific gene ontologies between the two growth conditions. Separate analyses were performed for genes upregulated on tween and for genes downregulated on tween. As would be expected, the glycolytic genes were downregulated when S. coelicolor was grown on tween, with GO enrichment analysis highlighting tpiA (SCO1945), pyk1 (SCO2014) and pyk2 (SCO5423), one of the three GAPDH genes (SCO7511) and a single copy of the pyruvate dehydrogenase E1 complex (SCO2183). It was noted that all genes belonged to the oxidoreductase gene-ontology grouping.
GO enrichment analysis of genes upregulated during growth on tween highlighted those associated with fatty acid catabolism, gluconeogenesis (pyc; SCO0546) and another of the three GAPDH homologues (SCO7040).
To examine transcription of expanded gene families, two categories of expression profile were considered to reflect changes across a gene family whose members are predicted to encode the same enzymatic function: type I, where all genes within an expanded gene family behaved similarly under the growth conditions, and type II, where members of an expanded gene family exhibited differential gene expression, i.e. expression of one or more members of the family increased, while expression of other members increased to a different degree, decreased or remained the same (Fig. 1a and Table S1). We identified 34 enzymatic reactions in central metabolism for which gene expression differed between cultures grown on tween and on glucose. Of these, 21 had more than one gene predicted to encode the same enzymatic function, i.e. were expanded gene families (Fig. 1, Table S1). When the expression profiles of these genes were examined further, 9/21 enzymatic reactions had families that showed type I expression profiles and 12/21 expanded gene families exhibited type II patterns of expression. This indicates that, despite their identical genome annotation, members of some expanded gene families are not redundant in function but have distinctive physiological roles [2,13,19]. Transcriptional data alone are unlikely to be sufficient to establish functional differences. However, the data enable prioritization of targets for metabolic engineering, such as modulating the expression of poorly expressed members of gene families.
The glycolysis pathway, unsurprisingly, showed reduced expression when the cultures were grown on Tween40 as sole carbon source. Only two glycolytic enzyme families exhibited type II expression. Expression of the minor glucose kinase (SCO0063) increased fivefold, whereas expression of the primary glk (SCO2126) remained unchanged under both conditions. GAPDH also showed type II expression profiles, with expression of SCO7040 increased on tween, whereas the other two genes encoding GAPDH (SCO1947 and SCO7511) had reduced expression. A previous proteomic-based study also identified an increase in abundance of the SCO7040 protein when grown on a non-glucose carbon source (fructose; [19]). The two pyruvate kinase (PK) genes, as we have previously demonstrated, exhibited type I expression [2], with both copies being downregulated when cultures were grown on tween.
Overall, genes involved in gluconeogenesis were upregulated when cultures were grown on tween compared to glucose, as would be expected. The only expanded gene family encoding a gluconeogenic function is that encoding the two pyruvate phosphate dikinases (PPDKs), which both displayed substantially increased expression (type I expression), with ppdk1 (SCO0208) upregulated 11-fold and ppdk2 (SCO2494) upregulated 30-fold. This suggests that, under these conditions, S. coelicolor may be using unconventional gluconeogenic routes for anaplerotic reactions rather than the glyoxylate shunt, as expression of isocitrate lyase (ICL) and both malate synthase (MS) enzymes remained unchanged during growth on either carbon source.
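The type I and type II categories can be illustrated with a short sketch that labels a family from the direction of change of each member. This is an illustration under a simplifying assumption, not the procedure used in the study: only the direction of change (up, down or unchanged on tween relative to glucose) is compared and differences in magnitude are ignored, which matches the treatment of the two PPDK genes as type I despite their different fold changes. The function name and data layout are hypothetical; the directions are those stated in the text.

```python
from typing import Dict

ALLOWED = {"up", "down", "unchanged"}


def classify_family(directions: Dict[str, str]) -> str:
    """Label an expanded gene family as 'type I' or 'type II'.

    directions maps each gene to its direction of change on tween
    relative to glucose. Simplifying assumption: a family is type I
    when every member changes in the same direction, otherwise type II.
    """
    if len(directions) < 2:
        raise ValueError("an expanded family needs at least two genes")
    unknown = set(directions.values()) - ALLOWED
    if unknown:
        raise ValueError(f"unknown direction(s): {sorted(unknown)}")
    return "type I" if len(set(directions.values())) == 1 else "type II"


# Directions of change as described in the text.
families = {
    "glucose kinase": {"SCO0063": "up", "SCO2126": "unchanged"},
    "GAPDH": {"SCO7040": "up", "SCO1947": "down", "SCO7511": "down"},
    "pyruvate kinase": {"SCO2014": "down", "SCO5423": "down"},
    "PPDK": {"SCO0208": "up", "SCO2494": "up"},
}

for name, members in families.items():
    print(f"{name}: {classify_family(members)}")
```

A stricter rule that also compares fold-change magnitudes would need a tolerance threshold, which the text does not specify.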
Genes of the pentose phosphate pathway showed little differential expression under the two conditions studied. Type II expression profiles were observed for zwf, with expression of SCO6661 reduced when cells were grown on tween, whereas expression of SCO1937 was unchanged between the two conditions. This was consistent with the work of Gubbens et al. [19], where SCO6661 was downregulated when cultures were grown on fructose rather than glucose. There was no evidence of changes in transcriptional activity for the putative Entner-Doudoroff (ED) pathway genes for KDPG aldolase (SCO2298, SCO3473 and SCO3495). However, reduced expression of two of the three phosphogluconate dehydratase homologues (SCO3877 and SCO6658) was observed when cultures were grown on tween (Fig. 1c). Whilst it was reported previously that there is no active ED pathway in S. coelicolor [20], these data suggest that putative phosphogluconate dehydratase homologues do respond to changes in carbon source.
In general, expression of genes encoding enzymes of the tricarboxylic acid (TCA) cycle remained stable under both growth conditions, as would be expected given that this core part of metabolism is required for biosynthesis under all physiological conditions. The exception was the expanded gene families, which exhibited differential expression (type II). Expression of one fumarase (SCO5042) was reduced twofold on tween, whilst SCO5044 expression remained unchanged. One copy of the malic enzyme (SCO2951) showed a twofold decrease in expression, whereas SCO5261, a second malic enzyme, remained unchanged under both conditions. The extensive expansion in S. coelicolor of genes encoding the pyruvate dehydrogenase complex (PDHC) E1 subunit exhibited a range of expression changes: expression profiles for SCO1269, SCO7124, SCO2371 and SCO3816/3817 were the same for both carbon sources, whereas SCO2183 had a 3.5-fold decrease and SCO1270 a sixfold increase on tween.
As expected, fatty acid utilization genes were more highly expressed when S. coelicolor was grown on tween, especially cholesterol esterase (SCO5420), which had an increase of almost 37-fold. The enzyme catalyses the hydrolysis of the head group of tween from the fatty acid palmitate, which is then utilized as carbon source [21][22][23]. The other genes from this pathway showed a 3- to 12-fold increase in expression (Table S2).
To verify the RNA-Seq data, we performed qPCR on the RNA samples used for the RNA-Seq experiment using primer pairs for five different genes: pyk1 (SCO2014) and pyk2 (SCO5423) from glycolysis, ppdk1 and ppdk2 from gluconeogenesis, and the primary sigma factor hrdB (SCO5820) as control. These two pairs of genes were chosen as representative of expanded families identified in a previous study [2]. RNA-Seq data showed that expression of these genes was strongly up- or downregulated under the chosen conditions, whilst hrdB (SCO5820), as expected, showed no difference under the two conditions tested. The fold-change difference in the qPCR data was similar to that observed in the RNA-Seq experiments, corroborating the wider results (Figs S2 and S3).

Carbon source influences specialized metabolite gene expression
Genes involved in isorenieratene biosynthesis (SCO0185-0191) showed an increase in transcription in cultures grown on tween, with a three- to eightfold increase in expression across the entire operon. Isorenieratene is associated with blue-light exposure, with the pathway present in green photosynthetic bacteria and a few actinobacteria [24]. Isorenieratene is a carotenoid with antioxidative properties; it is synthesized via the mevalonate-independent pathway (MEP/DOXP pathway) from the basic precursors GAP and pyruvate, derived from glycolysis, to form isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP). These metabolites then enter the carotenoid biosynthetic pathway via phytoene, lycopene and beta-carotene. The MEP/DOXP pathway also had increased expression of some genes (SCO6768, 2.2-fold; SCO5250, 4.4-fold) when grown on glucose (Table S2). Given that the only difference between the cultures was the carbon source, we hypothesize that cultures experienced oxidative stress. This is in part supported by the GO enrichment analysis, where 51 genes with GO terms matching 'oxidoreductases' exhibited increased expression on tween as the sole carbon source, with a sevenfold increase in expression of a superoxide dismutase (SCO0999) and an eightfold increase in expression of a putative bacterioferritin co-migratory protein (SCO7353; Tables S2 and S4). Caution should be exercised, however, as many genes in primary metabolism also share GOs with 'oxidoreductases'.
A non-ribosomal peptide synthetase (NRPS) pathway (SCO6429-6438), the siderophore coelichelin biosynthetic gene cluster (SCO0491-0498) and the actinorhodin cluster (SCO5071-5092) all had decreased expression when grown on tween compared to glucose by two to fivefold, two to threefold and three to 11-fold, respectively (Fig. 1, Table S2). This may reflect a tighter control of entry into specialized metabolism when cultures are grown on tween rather than glucose, although further experiments are needed to confirm this.
Studies such as this can inform metabolic modelling approaches for strain improvement, enabling expression data and metabolic flux analysis to be taken into account for individual isoenzymes within expanded gene families rather than combining the activities of all copies of a gene family into a single flux [15]. The identification of metabolic engineering targets is often complicated by the presence of multiple genes that putatively encode the same enzymatic function in streptomycetes. Studies such as this enable transcriptionally active genes to be identified under relevant conditions and prioritized for manipulation. This is exemplified by GAPDH in S. coelicolor, where three genes are annotated as having this function, with one (SCO7040) increasing in expression on tween while the other two copies show decreased expression under the same conditions. This knowledge will help inform choices and enable prioritization of metabolic engineering targets for industrial streptomycetes. The supporting dataset will also inform researchers on the role of hypothetical proteins and regulators during growth on two different carbon sources that drive metabolism along two different pathways.

Bacterial strains and growth conditions
S. coelicolor A3(2) M145 [25] was used throughout the study. Spores were germinated in 50 ml of 2× YT medium in flasks containing a metal spring and were incubated for up to 8 h at 30 °C and 250 r.p.m. until emerging germ tubes were visible under a microscope [25]. The cultures were harvested by centrifugation, washed twice with 0.25 M TES buffer (pH 7.2) and pellets were resuspended in media. Growth curves were performed at 400 ml scale with minimal medium [26] in 2 l flasks containing a metal spring at 30 °C, shaken at 250 r.p.m. The amount of carbon source added was adjusted to ensure that each culture contained the equivalent of 166.5 mM carbon.
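As a worked example of the carbon-equivalence adjustment, the sketch below converts the 166.5 mM carbon target into a glucose concentration. It assumes that the target refers to millimoles of carbon atoms per litre, and uses the standard formula of glucose (C6H12O6, molar mass about 180.16 g mol⁻¹), neither of which is stated in the text. The function names are illustrative, and no figure is given for Tween40 because its effective carbon content is not specified here.

```python
def substrate_mM(carbon_mM: float, carbons_per_molecule: int) -> float:
    """Substrate concentration (mM) that supplies carbon_mM of carbon atoms."""
    if carbons_per_molecule <= 0:
        raise ValueError("carbons_per_molecule must be positive")
    return carbon_mM / carbons_per_molecule


def grams_per_litre(concentration_mM: float, molar_mass_g_per_mol: float) -> float:
    """Convert a millimolar concentration to g per litre."""
    return concentration_mM / 1000.0 * molar_mass_g_per_mol


CARBON_TARGET_MM = 166.5      # from the text
GLUCOSE_CARBONS = 6           # C6H12O6 (assumption: anhydrous glucose)
GLUCOSE_MOLAR_MASS = 180.16   # g/mol (standard value, not from the text)

glucose_mM = substrate_mM(CARBON_TARGET_MM, GLUCOSE_CARBONS)
glucose_g_l = grams_per_litre(glucose_mM, GLUCOSE_MOLAR_MASS)
print(f"glucose: {glucose_mM:.2f} mM, about {glucose_g_l:.2f} g per litre")
```

Under these assumptions the target corresponds to 27.75 mM glucose, or roughly 5 g per litre.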

RNA extraction and sample preparation
S. coelicolor biomass (15 ml sample) from liquid cultures was harvested by centrifugation (5 min, 4 °C, 4000 g). Cell pellets were resuspended in an equal volume of RNAprotect (Qiagen) for 5 min at room temperature. Following centrifugation (5 min, 4 °C, 6000 g), biomass was resuspended in 1 ml 1× TE buffer containing 15 mg ml⁻¹ lysozyme. Tubes were vortexed for 10 s and incubated at room temperature for 60 min whilst shaking. Then, 1 ml RLT buffer (Qiagen RNA Isolation Kit) with 10 µl β-mercaptoethanol was added, the sample was vortexed, and a phenol-chloroform extraction followed by an ethanol precipitation was carried out. The sample was then purified using a commercial RNA isolation kit (Qiagen). The isolated RNA was treated with RNase-free DNase (Ambion, Life Technologies) as specified by the manufacturer.
Quantification of RNA was carried out using Qubit (Life Technologies). The quality and integrity of the RNA was assessed using a Bioanalyzer (Agilent). Furthermore, the RNA samples were also used as templates for a generic PCR in order to check for DNA contamination using primers for hrdB (SCO5820).
To enrich the samples for mRNA, rRNA depletion was performed (Ribo-Zero Magnetic Kit for Gram-positive bacteria; Epicentre [Illumina]) according to the manufacturer's instructions. The rRNA-depleted samples were then precipitated with ethanol and resuspended according to the manufacturer's instructions. The quality and integrity of the samples were then analysed on the Bioanalyzer (Agilent) and the concentration was determined using Qubit (Life Technologies).

Library preparation
cDNA synthesis and library preparation were carried out using the Ion Total RNA-Seq Kit v2 Revision E from Ion Torrent, Life Technologies. The manufacturer's protocol for less than 100 ng of rRNA-depleted sample was followed. The yield and size distribution of the amplified cDNA were assessed using the Bioanalyzer (Agilent). The three samples harvested from glucose cultures and three samples from tween-grown cultures were barcoded and pooled to a concentration of 20 pM as specified by the manufacturer. The Ion OneTouch 2 system with the Ion PGM Template OT200 Kit (Life Technologies) was used for template preparation, which includes the steps of emulsification, amplification and enrichment of the library. The libraries were checked using the quality control assay for the Qubit (Life Technologies).

RNA sequencing
Sequencing of the samples was carried out using an Ion Torrent Personal Genome Machine System (PGM; Life Technologies) on a 316v2 chip following the procedures in the manual, Ion PGM Sequencing 200 Kit v2 User Guide Revision 3.0. All sequence data were deposited in the Sequence Read Archive (SRA) under BioProject PRJNA566372 (https://www.ncbi.nlm.nih.gov/sra/PRJNA566372).

Data analysis
Sequencing data were downloaded from the Ion Torrent server (Torrent Suite version 4.0.2) in FASTQ format. Reads were trimmed and low-quality data removed. Reads were mapped to the reference genome of S. coelicolor [1] (GenBank: NC_003888.3). The data were analysed using CLC Genomics Workbench (version 7.5, Qiagen). The software showed on average 99.8 and 99.7 % alignment of the reads to the reference sequence. The CLC differential gene-expression tool was used to determine differential gene expression, with the cut-off set at a P-value of 0.05. Genewise dispersions were estimated by conditional maximum likelihood using the total count for the gene of interest, followed by empirical Bayes to obtain a consensus value [27,28]. Differential expression was then assessed using Fisher's exact test adjusted for over-dispersed data [29]. The raw data output from CLC Genomics Suite can be found in Table S8, and a list of all differentially expressed genes is shown in Table S5. For the heatmap representation in Fig. 1c, all differentially expressed genes were normalized to the maximum (+1) and minimum (−1) of all significantly differentially expressed genes; these values can be found in Table S6. The code to create Fig. 1c can be found in Code S1 (https://doi.org/10.6084/m9.figshare.10008914.v3). The colour code was expressed relative to the highest and lowest expression change in the data shown, ranging from green to red, respectively.
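One plausible reading of the heatmap normalization is sketched below: increases are divided by the largest increase and decreases by the magnitude of the largest decrease, so that the extremes map to +1 and −1 and an unchanged gene stays at 0. This is an assumption made for illustration and is not taken from Code S1, which should be consulted for the actual procedure. The function name is hypothetical, and the input values are a small subset of signed fold changes quoted in the text (SCO5420 is rounded from "almost 37-fold"), so the extremes of the full dataset will differ.

```python
from typing import Dict


def scale_to_unit_range(changes: Dict[str, float]) -> Dict[str, float]:
    """Scale signed expression changes to the interval [-1, +1].

    Positive values are divided by the largest increase and negative
    values by the magnitude of the largest decrease; zero stays zero.
    """
    max_up = max((v for v in changes.values() if v > 0), default=None)
    max_down = min((v for v in changes.values() if v < 0), default=None)
    scaled = {}
    for gene, value in changes.items():
        if value > 0:
            scaled[gene] = value / max_up
        elif value < 0:
            scaled[gene] = value / abs(max_down)
        else:
            scaled[gene] = 0.0
    return scaled


# Illustrative subset: signed fold changes on tween relative to glucose.
fold_changes = {
    "SCO5420": 37.0,   # cholesterol esterase, "almost 37-fold" increase
    "SCO2494": 30.0,   # ppdk2
    "SCO0208": 11.0,   # ppdk1
    "SCO1270": 6.0,    # PDHC E1 homologue
    "SCO5042": -2.0,   # fumarase
    "SCO2183": -3.5,   # PDHC E1 homologue
}

for gene, value in scale_to_unit_range(fold_changes).items():
    print(f"{gene}: {value:+.3f}")
```

Scaling the two directions separately keeps the zero point fixed, which a single linear min-max rescaling would not do when the largest increase and decrease differ in magnitude.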
Gene-ontology enrichment analysis was performed using PANTHER classification from Gene Ontology (https://www.ncbi.nlm.nih.gov/pubmed/23868073) and is summarized in Table S7.

qPCR
qPCR was performed on the same samples as the RNA-Seq in order to provide independent confirmation of the data obtained. Each primer pair was tested using genomic DNA as the template. cDNA was synthesized from the RNA samples using the qPCRBIO cDNA Synthesis Kit (PCR Biosystems) following the manufacturer's instructions. Quantification of the cDNA was carried out using the QuantiFluor ssDNA system (Promega) with the Qubit Fluorometer (Life Technologies).
All cDNA samples were diluted to a concentration of 10 ng µl⁻¹ and each reaction contained 10 ng of cDNA. Samples were mixed with qPCRBIO master mix and the respective primers (Table S9; 2× qPCRBIO SyGreen Mix Lo-ROX, PCR Biosystems). Using a Corbett Research 6000 (Qiagen) machine, PCR reactions were subjected to a three-stage thermocycling programme (one 3 min step at 95 °C, followed by 40 cycles of 5 s at 95 °C and one 25 s step at 60 °C). Each reaction was carried out in duplicate and a no-template control was included for each set of primers. To allow quantification, standard curves for each gene were prepared (in triplicate) using purified PCR product from a genomic DNA PCR. This template was diluted to create seven different standards ranging from 10¹ to 10⁷ molecules per reaction (Fig. S3) and was used to calculate the concentrations of the unknown samples used in the RNA sequencing.
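Absolute quantification against a standard curve of this kind is commonly done by fitting a straight line to quantification cycle (Cq) against log10 of template copies and inverting the fit for the unknowns. The sketch below illustrates that general approach only: the Cq values are synthetic, generated from a hypothetical line (slope −3.32, intercept 38.0), and the function names are illustrative rather than taken from the instrument software used in the study.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_cq(cq: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate template copies per reaction."""
    return 10 ** ((cq - intercept) / slope)


# Seven standards from 10^1 to 10^7 molecules per reaction, as in the text.
log10_copies = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
# Synthetic Cq values (hypothetical, for illustration only).
standard_cq = [38.0 - 3.32 * x for x in log10_copies]

slope, intercept = fit_line(log10_copies, standard_cq)
efficiency = 10 ** (-1.0 / slope) - 1.0   # amplification efficiency

unknown_cq = 24.72                        # hypothetical sample
estimate = copies_from_cq(unknown_cq, slope, intercept)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, "
      f"efficiency = {efficiency:.1%}")
print(f"Cq {unknown_cq} corresponds to about {estimate:.3g} copies per reaction")
```

Fold changes between conditions can then be taken as the ratio of estimated copy numbers for each gene, with hrdB serving as the unchanged control.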

Funding information
This work was funded through PhD studentships from the Scottish Universities Life Sciences Alliance (SULSA) to J.K.S. and R.R.