Evolutionary dynamics of rhomboid proteases in Streptomycetes

Proteolytic enzymes are ubiquitous and active in a myriad of biochemical pathways. One type, the rhomboids, are intramembrane serine proteases that release their products extracellularly. These proteases are present in all forms of life and their function is not fully understood, although some evidence suggests they participate in cell signaling. Streptomycetes are prolific soil bacteria with diverse physiological and metabolic properties that respond to signals from other cells and from the environment. In the present study, we investigate the evolutionary dynamics of rhomboids in Streptomycetes, as this can shed light on the possible involvement of rhomboids in the complex lifestyles of these bacteria. Analysis of Streptomyces genomes revealed that they harbor up to five divergent putative rhomboid genes (arbitrarily labeled families A–E), two of which are orthologous to rhomboids previously described in Mycobacteria. Characterization of these rhomboid families reveals that each group is distinctive and has its own evolutionary history. Two of the Streptomyces rhomboid families are highly conserved across all analyzed genomes, suggesting they are essential. At least one family has been horizontally transferred, while others have been lost in several genomes. Additionally, the transcription of the four rhomboid genes identified in Streptomyces coelicolor, the model organism of this genus, was verified by reverse transcription. Using phylogenetic and genomic analysis, this study demonstrates the existence of five distinct families of rhomboid genes in Streptomycetes. Families A and D are present in all nine species analyzed, indicating a potentially important role for these genes. The four rhomboids present in S. coelicolor are transcribed, suggesting they could participate in cellular metabolism. Future studies are needed to provide insight into the involvement of rhomboids in Streptomyces physiology.
We are currently constructing knockout (KO) mutants for each of the rhomboid genes from S. coelicolor and will compare the phenotypes of the KOs to the wild-type strain.


Background
Rhomboids are intramembrane serine proteases that are mechanistically distinctive because they release factors to the outside of the cell rather than into the cytosol. The first rhomboid protease was discovered in Drosophila [1], and further work revealed the existence of homologs in all branches of life [2]. The human and mouse genomes encode rhomboid genes, but the largest number of rhomboid genes is encoded by plants [2]. Despite their ubiquity, rhomboid sequence identity is only about 6% across all domains of life [2]. This extremely low conservation could be because the sequences are predominantly transmembrane and therefore experience different evolutionary pressures than other proteases [2]. Through multidisciplinary approaches, it has been demonstrated that rhomboids create a central hydrated microenvironment immersed below the membrane surface; this microenvironment supports the hydrolysis carried out by the serine protease-like catalytic apparatus [3]. Rhomboid proteases can have three different topologies. The simplest form has six transmembrane (TM) helices and is the most prevalent in prokaryotic rhomboids. The other two forms have an extra TM helix either at the C-terminus (6 + 1 TM) or at the N-terminus (1 + 6 TM), and are typical of eukaryotic rhomboids. The prokaryotic rhomboids AarA from the pathogenic Providencia stuartii, YqgP from Bacillus subtilis and Rv0110, one of the rhomboids from Mycobacteria, are exceptions, as they exhibit the 6 + 1 TM topology like eukaryotes [4].
In spite of their shared mechanism of action, rhomboids exhibit a wide diversity of biological roles. It has been demonstrated that these proteases control EGF receptor signaling in Drosophila and Caenorhabditis elegans [2], are involved in the cleavage of adhesins in apicomplexan parasites, and regulate aspects of mitochondrial morphology and function in yeast and multicellular eukaryotes [5]. Furthermore, rhomboids are involved in processes such as inflammation and cancer, suggesting they could have therapeutic potential [6]. Not much is known to date about the biological role that rhomboids play in prokaryotes. AarA from P. stuartii affects quorum sensing [7], and YqgP from B. subtilis plays a role in cell division and glucose uptake [8]. The crystal structure of GlpG from Escherichia coli has been solved, yet its biological function is still undetermined [9]. The rhomboid proteases Rv0110 and Rv1337 from Mycobacterium, a genus from the Actinobacteria phylum, have been described, yet their substrates have not been identified [10,11]. Little is also known about rhomboids from Streptomyces, another genus from the Actinobacteria [11]. Streptomycetes are "multicellular" prokaryotic organisms with a complex developmental cycle and secondary metabolite secretion, in which signals from other cells and from the environment are detected, integrated and responded to using multiple signal transduction systems.
Streptomycetes produce two thirds of all industrially manufactured antibiotics, and a large number of eukaryotic cell differentiation inducers [12], apoptosis inhibitors [13] and inducers [14][15][16], protein kinase C inhibitors [17], and compounds with antitumor activity [18]. The production of these secondary metabolites (physiological differentiation) is usually temporally and genetically coordinated with the developmental program (morphological differentiation) when cultured on agar and is likely to respond to environmental, physiological and stress signals [19][20][21][22]. These processes are controlled by different families of regulatory proteins, and are elicited by both extracellular and intracellular signaling molecules mediated by an array of signal transduction systems [23]. Given the involvement of rhomboids in cell signaling, we propose that they could participate in some of the signaling cascades existing in Streptomycetes.

Sequence analysis
Nucleotide and amino acid sequences from Streptomycetes and related species were collected in two ways to guarantee an exhaustive search. An initial collection was obtained from a BLASTp search using a previously identified rhomboid protein from P. stuartii, AarA (S70_10405), as the query. Secondly, a gene search using the Integrated Microbial Genomes Education Site (IMG/EDU) was conducted, retrieving all genes with the pfam01694 domain [33]. Using a combination of both NCBI and IMG, we determined which genomes have complete sequence data and which are in draft format. We limited our final analysis to nine completely sequenced genomes. All resulting sequences were aligned in BioEdit using ClustalW and sorted based upon sequence similarity [34]. An initial neighbor joining phylogenetic tree was constructed using the translated nucleotide sequences, which aided in the construction of our groups. Monophyletic families with high similarity and bootstrap support (>80%) were then arbitrarily named A–E.
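The two-source collection step described above can be illustrated with a minimal sketch. It assumes each search result has already been exported as a list of (genome, gene) pairs; the function name `merge_hits` and all identifiers are hypothetical placeholders, not records from NCBI or IMG.

```python
def merge_hits(blast_hits, pfam_hits, complete_genomes):
    """Union two hit lists and keep only genes from completely sequenced genomes.

    Each hit is a (genome, gene_id) tuple. Duplicates found by both
    searches are collapsed by the set union.
    """
    combined = set(blast_hits) | set(pfam_hits)
    return sorted(hit for hit in combined if hit[0] in complete_genomes)


# Hypothetical identifiers for illustration only.
blast_hits = [("genome_1", "gene_a"), ("genome_2", "gene_b")]
pfam_hits = [
    ("genome_1", "gene_a"),  # also found by BLASTp
    ("genome_1", "gene_c"),  # found only by the domain search
    ("genome_3", "gene_d"),  # draft genome, excluded below
]
complete_genomes = {"genome_1", "genome_2"}

library = merge_hits(blast_hits, pfam_hits, complete_genomes)
print(library)
```

Taking the union first and filtering afterwards keeps hits that only one of the two searches recovered, which is the point of searching in two ways.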
Active sites in the rhomboid domain were found using Pfam, and E-values were collected for support [35]. Any additional domains identified were recorded. Using TMHMM and Phobius, sequences were analyzed for the number of transmembrane domains, the number of transmembrane amino acids, the cellular locations of active sites and the distance between them [36,37]. A two-dimensional model of the transmembrane structure was constructed using TMRPres2D [38].
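As an illustration of the quantities listed above, the following sketch summarizes a predicted topology once helix boundaries and catalytic residue positions are known. It does not call TMHMM or Phobius; the helix coordinates and residue positions are hypothetical, and the separation is taken here as the simple difference between residue positions, which is an assumption about how the distance is counted.

```python
def helix_containing(position, tm_helices):
    """Return the 1-based index of the TM helix containing a residue, or None."""
    for index, (start, end) in enumerate(tm_helices, start=1):
        if start <= position <= end:
            return index
    return None


def describe_active_site(ser_pos, his_pos, tm_helices):
    """Summarize helix count, TM residue count and catalytic residue placement."""
    tm_residues = sum(end - start + 1 for start, end in tm_helices)
    return {
        "n_tm": len(tm_helices),
        "tm_residues": tm_residues,
        "ser_helix": helix_containing(ser_pos, tm_helices),
        "his_helix": helix_containing(his_pos, tm_helices),
        "separation": abs(his_pos - ser_pos),
    }


# Hypothetical helix boundaries (1-based, inclusive) for a six-helix protein.
tm_helices = [(10, 30), (50, 70), (80, 100), (110, 130), (140, 160), (170, 190)]
summary = describe_active_site(ser_pos=120, his_pos=184, tm_helices=tm_helices)
print(summary)
```

A residue that falls in a loop rather than a helix is reported as `None`, which makes misplaced active-site predictions easy to spot.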
In order to determine conserved sites within and across families, sequences were analyzed with WebLogo, which displays the residues most likely to be important for protein function as large letters [39]. Consensus sequences of each family were also constructed using BioEdit, and the pairwise divergence of the rhomboid domains within and between families was calculated to determine which families were most similar.
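The pairwise divergence calculation can be sketched as follows. The divergence measure used in the study is not stated, so this example assumes the uncorrected p-distance (the fraction of differing sites among aligned, ungapped columns); the function names and the toy sequences are illustrative only.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def mean_divergence(group_a, group_b=None):
    """Mean p-distance within one group, or between two groups if both are given."""
    if group_b is None:
        pairs = list(combinations(group_a, 2))
    else:
        pairs = [(a, b) for a in group_a for b in group_b]
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy aligned sequences, not real rhomboid domains.
family_x = ["AAAA", "AAAT", "AATT"]
family_y = ["TTTT"]
print(mean_divergence(family_x))            # within-family divergence
print(mean_divergence(family_x, family_y))  # between-family divergence
```

Comparing the within-family value with the between-family value is the same contrast used to judge which families are most similar to each other.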

Phylogeny reconstruction
One sequence from each family was used as a query for a BLASTp search of all sequenced genomes excluding the Actinomycetes [40]. Then, a phylogenetic tree containing sequences from our library and the non-Actinomycetes sequences was constructed to identify other species that harbor the same family in their genomes.
MEGA 5.0 was utilized for our phylogenetic reconstructions [41]. Previously aligned sequences were used to construct both neighbor joining trees and maximum likelihood trees for comparison. Bootstrap support was calculated from 1,000 replicates, and support values lower than 70% were removed.
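The bootstrap filtering step can be sketched in a few lines. This is not how MEGA is implemented; it assumes each tree has already been reduced to a set of clades (each clade a frozenset of taxon names), uses ten replicates instead of 1,000 for brevity, and all taxon labels are hypothetical.

```python
def bootstrap_support(reference_clades, replicate_trees):
    """Percentage of replicate trees that contain each reference clade."""
    n_replicates = len(replicate_trees)
    return {
        clade: 100.0 * sum(clade in tree for tree in replicate_trees) / n_replicates
        for clade in reference_clades
    }


def supported_clades(support, threshold=70.0):
    """Keep only clades whose support is at or above the threshold."""
    return {clade: value for clade, value in support.items() if value >= threshold}


# Hypothetical clades and replicate trees.
clade_ab = frozenset({"taxon_a", "taxon_b"})
clade_cd = frozenset({"taxon_c", "taxon_d"})
replicates = (
    [{clade_ab, clade_cd}] * 6  # both clades recovered
    + [{clade_ab}] * 3          # only the first clade recovered
    + [set()]                   # neither clade recovered
)

support = bootstrap_support([clade_ab, clade_cd], replicates)
kept = supported_clades(support, threshold=70.0)
print(support[clade_ab], support[clade_cd])
print(clade_ab in kept, clade_cd in kept)
```

In this toy case the first clade is recovered in 9 of 10 replicates and is retained, while the second is recovered in 6 of 10 and falls below the 70% cutoff.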

Gene neighborhoods
IMG-ACT was used to determine the genome location of each rhomboid gene. Using the five sequences from S. scabiei as a query, all orthologs were identified to determine if they were found in a similar region of the genome, in the same orientation, and had the same neighboring genes. The presence or absence of operons and the functions of neighboring genes were also determined [33].

Strains and cultures
S. coelicolor M145 was kindly provided by Dr. Mervyn Bibb. S. coelicolor was grown in Mannitol Soya flour and in R5 medium [42].

PCR conditions
Chromosomal DNA was extracted from S. coelicolor [42]. Internal primers for the four putative rhomboid genes of S. coelicolor were designed with Primer3 [43] (Table 1). Amplification reactions contained 0.3 μM each of the rhomboid-specific forward and reverse primers (Table 1), 0.02 U/μl KOD Hot Start Master Mix (Novagen), ~200 ng genomic DNA and nuclease-free water in a reaction volume of 20 μl. The reactions were performed in a C1000 Thermocycler (BioRad) using the following conditions: initial polymerase activation and denaturation at 95°C for 2 min, followed by 30 cycles of denaturation at 95°C for 20 s, annealing at 60°C for 10 s and extension at 70°C for 10 s, with a final extension at 70°C for 5 min. The correct internal fragment size was amplified for the four putative rhomboid genes.
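The reaction arithmetic above can be checked with a short calculation. The final primer concentration, reaction volume and cycling times are taken from the protocol; the 10 μM primer stock concentration is an assumption made only for this example, and the run time ignores temperature ramping.

```python
def stock_volume_ul(final_conc, stock_conc, reaction_volume_ul):
    """Volume of stock needed, from C1 * V1 = C2 * V2 (same concentration units)."""
    return final_conc * reaction_volume_ul / stock_conc


def program_seconds(initial_s, cycles, step_seconds, final_s):
    """Total hold time of a PCR program, excluding ramping between steps."""
    return initial_s + cycles * sum(step_seconds) + final_s


# 0.3 uM final primer in a 20 ul reaction; 10 uM stock is an assumed value.
primer_ul = stock_volume_ul(final_conc=0.3, stock_conc=10.0, reaction_volume_ul=20.0)

# 2 min activation, 30 cycles of 20 s + 10 s + 10 s, 5 min final extension.
total_s = program_seconds(
    initial_s=2 * 60, cycles=30, step_seconds=(20, 10, 10), final_s=5 * 60
)

print(round(primer_ul, 2), "ul of each primer stock")
print(total_s / 60, "min of programmed hold time")
```

Under these assumptions each primer contributes 0.6 μl to the 20 μl reaction, and the programmed hold time is 27 min.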

Transcription assays
50 ml of R5 [44] liquid medium in a 250 ml flask was inoculated with spores from a Soy Flour Mannitol [44] S. coelicolor plate and incubated in a 30°C shaker at 240 rpm for 2 days. The pellet was harvested and RNA was prepared using the EZRNA Total RNA Kit. Reverse transcription was done using M-MLV Reverse Transcriptase, oligo-random primers and nucleotide mixture from Promega, with ~60 ng/μl mRNA used as template. PCR was performed using the validated primers (Table 1) following the protocol described above. The expected fragment sizes were confirmed by gel electrophoresis, the purified PCR products were cloned into pUC19, and the DNA sequences were verified by sequencing at the DNA analysis facility (Yale University).

A. Phylogenetic studies and family relationships
The in silico analysis of rhomboid proteases in nine fully sequenced and assembled genomes of Streptomycetes revealed that their genomes have up to five diverse and distinct rhomboid genes (arbitrarily labeled Families A–E), as shown in Table 2. This is supported by high bootstrap values (Figure 1) and the high divergence across the five rhomboid families (Table 3). S. avermitilis, S. bingchenggensis, S. cattleya, S. coelicolor, S. griseus, S. hygroscopicus, S. pristinaespiralis, S. scabiei and S. sviceus have two to five rhomboid genes. Only the S. scabiei and S. sviceus genomes contained all five families (Table 2). Two of these families (A and D) are consistently present in the strains listed above, as well as in 30 additional Streptomyces genomes recently analyzed (data not shown). Streptomyces rhomboids differ in a few respects from the recently described rhomboids of Mycobacterium (a closely related Actinomycetes genus): Streptomycetes harbor up to five rhomboid genes whereas Mycobacteria have a maximum of two; Streptomyces rhomboids are not found in large operons as their Mycobacterium counterparts are [10]; and Streptomyces rhomboid genes also appear in different gene neighborhoods than Mycobacterium rhomboids (Figure 2). A summary of the phylogenetic analysis of the five rhomboid genes is presented in Figure 1.
The topological analysis of the Streptomyces rhomboid genes shows that families C, D and E have the structure most prevalent in prokaryotic rhomboids, 6 TM helices; interestingly, families A and B have an additional TM helix, similar to AarA from P. stuartii [7] and YqgP from B. subtilis [8] (Figure 3). It has been suggested that the bacterial rhomboids with 6 + 1 TM helices could either be a bacterial progenitor of the eukaryotic rhomboid proteases or represent an ancient family of rhomboid proteases present in the last universal common ancestor (LUCA) [45]. The fact that family A (6 + 1 TM topology) and family D (6 TM topology) are present in all genomes analyzed suggests that each family could have different and potentially critical biological roles in Streptomyces.
We have also found that families A and B have zinc fingers as extramembranous domains, but this motif is less conserved in families C, D and E. It is suggested that these soluble accessory domains may be involved in the oligomerization status of the rhomboid proteins [46]. These findings again support the idea that different families could have distinct functions in Streptomyces biology.

Family A
Rhomboids belonging to family A are orthologous to the Rv0110 rhomboid protease 1 in Mycobacteria [11]; they are of similar length and structure. Family A rhomboids have seven TM helices, with a long run of extracellular amino acids between helices 1 and 2. At the N-terminus, there is a long string of cytoplasmic amino acids that in many cases has a zinc-finger domain. Each protein contains the same catalytic residues, a serine and a histidine, 46 amino acids apart in TMs 4 and 6 (Figure 3). Phylogenetic analysis indicates vertical transfer of the rhomboid A gene (Figure 1) during the diversification of Actinomycetes (shown in blue), since these genes are found in the Frankia lineage, Kitasatospora setae, Acidothermus cellulolyticus, and in the nine Streptomycetes sequences analyzed. Rhomboid A genes are 27% divergent (Table 4), are found next to the gene encoding peptidyl-prolyl cis-trans isomerase, and the surrounding neighborhood is conserved across species (Figure 2). Usually they are located towards the center of the genome, but their orientation is not the same across all species (Figure 4). Genomic rearrangements could have contributed to the shuffling of their genomic location in some species.

Family B
Family B rhomboids are similar to family A in that they have 7 TMs (Figure 3), have approximately 46 amino acids separating the active sites, and are also 27% divergent (Table 4). B rhomboids are only found in Streptomyces and their sister species Kitasatospora setae and Streptosporangium roseum (Figure 1, shown in red). This is likely due to a gene duplication event before the diversification of the Streptomyces genus, but after the separation of other Actinomycetes, such as members of the genus Frankia and Mycobacteria. Although rhomboids A and B are paralogous, their 29% divergence (Table 3) is indicative of the antiquity of the duplication event, and may suggest that they have different roles. The location and orientation of rhomboid B (Figure 4) are highly conserved across Streptomycetes; it is located at the end of the genome and in close proximity to the rhomboid D gene. Rhomboid B genes have been lost in several species, including S. bingchenggensis, S. hygroscopicus and S. pristinaespiralis. The phylogenetic history among species of Streptomyces is not well understood, and thus it is unknown if the loss of this gene was due to a single or multiple events.

Figure 3. Transmembrane structure of rhomboids using TMRPres2D [45]. Rhomboids a, b, c and d are from S. coelicolor, and e is from S. sviceus; active sites are highlighted with a red circle.

Families C and D
We hypothesize that the C and D rhomboid families are paralogous xenologs; these families are most likely due to a horizontal transfer event to the ancestor of the Streptomyces, Kitasatospora, Frankia and Acidothermus genera (Figure 1, shown in green and purple). Rhomboids C and D are, in addition, phylogenetically discontinuous, as shown by their absence from the Mycobacterium lineage and other Actinobacteria. Since Streptomycetes are typically promiscuous, horizontal transfer of this lineage of rhomboids is possible, although it is also possible that they have been deleted from other Actinomycetes lineages several times. Families C and D are 80 and 77% similar within their families, respectively (Table 4). Further analysis of C and D rhomboids reveals a non-Actinomycetes ancestor (Figure 5), suggesting a different evolutionary history for these genes.
Rhomboids belonging to family C are found in six of the nine analyzed Streptomycetes (Table 2) and display a unique 6 TM motif with 66 amino acids separating the active sites (Figure 3). All C rhomboids are found towards the beginning of the genome when present, and in the same orientation (Figure 4). D rhomboids also have 6 TMs, but consistently have only 64 amino acids separating their active sites (Figure 3). D rhomboids are found in all of the species analyzed, and therefore are likely to be functionally important. They are found at the end of the genome, and in reverse orientation. S. cattleya appears to be unique, since its D rhomboid gene is found at the opposite end of the genome, in close proximity to the C gene, and in the forward orientation (Figure 4). This could be due to genomic rearrangement. As the rhomboid C and D genes are the result of a duplication event in the Streptomyces ancestor, it is expected that all genomes that contain rhomboid D would also possess rhomboid C. However, rhomboid C is not present in three of the nine species analyzed. As discussed for the rhomboid B family, it is unknown if the loss of this gene was due to a single or multiple events.

Family E
E rhomboids are orthologous to the Rv1337 rhomboid protease 2 in Mycobacteria. This family is found in only four of the nine genomes analyzed, with 27% divergence in their nucleotide sequence (Table 4). They are located in the middle of the linear genome, although the orientation is not conserved (Figure 4). They are also not found in the same operon or gene neighborhood as their Mycobacteria homologs (Figure 2). Phylogenetic analysis indicates vertical transfer (Figure 1) during the diversification of Actinomycetes, as rhomboid E genes (shown in yellow) are found in Actinomycetes. Like other rhomboid genes, E rhomboids have been lost in several species, including S. coelicolor (Figure 2 shows the location where rhomboid E should be in the S. coelicolor genome). Their sequence analysis shows a structure of 6 TM helices with catalytic residues in TMs 4 and 6 separated by about 52 amino acids (Table 2).

B. Transcriptional analysis of putative S. coelicolor rhomboid genes
The existence of multiple rhomboid genes in Streptomycetes raises the question of whether all, none, or some of them are transcribed. Phylogenetic analysis identified four candidate rhomboid genes (A, B, C and D) in S. coelicolor. Primers (Table 1) were designed to amplify internal fragments of these genes. Fragments of the predicted sizes were obtained using these primers and S. coelicolor genomic DNA as a template (Figure 6).
To determine if these genes are transcribed, we isolated mRNA from an S. coelicolor liquid culture and used reverse transcriptase (RT) to make cDNA. Detection of transcribed rhomboid genes was done by PCR using the primers described above (Table 1). The expected fragment sizes were obtained (Figure 7). The purified PCR products were cloned into pUC19, and the correct inserts were verified by sequencing. Therefore, reverse transcription analysis confirms that the four rhomboid genes (SCO3855, SCO2139, SCO6038, SCO2013) are transcribed in S. coelicolor.

Conclusions
In summary, our analysis demonstrates the existence of five distinct families of rhomboid genes in Streptomycetes. Families A and D are present in all Streptomyces genomes analyzed; family D displays the typical prokaryotic topology with 6 TMs, while family A has 6 + 1 TMs. These findings suggest that both families A and D are essential and may play different biological roles in Streptomyces. The evolutionary dynamics of the Streptomyces rhomboids are very complex, and we will expand our analysis to other Streptomyces strains and other Actinobacteria taxa to obtain a more comprehensive understanding of them. We also show that the four rhomboid genes present in S. coelicolor are transcribed. We are currently studying the function of these genes by constructing knockout (KO) mutants for each of the putative rhomboid genes from S. coelicolor and comparing the KOs to the wild-type strain. This will provide insight into the involvement of rhomboids in Streptomyces physiology.