Systematic study of G-protein-coupled receptor prototypes: did they really evolve from prokaryotic genes?

G-protein-coupled receptors (GPCRs) are among the most striking examples of signalling proteins and are observed only in eukaryotes. Based on various GPCR identification methods and classification systems, several evolutionary hypotheses for different GPCR families have been reported. However, the prototype of GPCRs remains unknown. By investigating GPCR structure and domain variance, the authors propose that GPCRs might have evolved from the prokaryotic world. The results indicate that the metabotropic glutamate receptor family could be the ancestor of GPCRs. Phylogenetic analysis suggests that one metabotropic glutamate receptor subfamily, the GABA receptors, possibly arose from an ancient combination of bacteriorhodopsin and periplasmic binding protein. The results also demonstrate that each type of GPCR has specific domains and characteristic structures, which provides unique opportunities for future strategies for predicting and classifying orphan GPCRs.


Introduction
G-protein-coupled receptors (GPCRs), also known as 7-transmembrane domain receptors, are one of the most striking protein families involved in cellular signalling. They form an essential gene family found only in the eukaryotic kingdom. GPCRs are highly diverse in sequence but share a basic structural framework. Their tertiary structure resembles a barrel inserted in the cell membrane and consists of three primary components: the extracellular N-terminus, the transmembrane region and the intracellular C-terminus. The transmembrane region contains seven transmembrane helices (TM1 ∼ TM7), three extracellular loops (EL1 ∼ EL3) and three intracellular loops (IL1 ∼ IL3).
GPCRs fall into three main families, as classified in the GPCR database (www.gpcr.org/7tm/). The rhodopsin-like receptor family (class A), consisting of more than 20 subclasses, is the largest GPCR group and is present in most vertebrate genomes [1]. The secretin receptor family (class B) is small, and its members mainly act as receptors for hormones or neuropeptides. The metabotropic glutamate receptor family (class C) performs a variety of functions in the nervous system and in behavioural and mood regulation. Receptors in this family have also been reported to have a specific structure showing some similarities to signal-related structures in prokaryotic genomes [2]. Besides the three main families, there are a few other families, such as the fungal mating pheromone receptors, the frizzled/smoothened receptors and orphan receptors. In this study we focus only on the three major families.
Previous work on GPCR evolution has focused on two perspectives: sequence comparisons between different species and between populations within one species [3]. Because these studies mainly addressed the evolution of individual families, they did not identify a systematic ancestor of GPCRs or the relationships among different families [4][5][6][7][8]. Since the metabotropic glutamate receptor family has been found in ancient slime molds and sponges [2], several clues suggest that the phylogenetically oldest GPCRs would be glutamate-receptor-like receptors [9][10][11]. It is therefore necessary to work out a possible evolutionary mechanism for the metabotropic glutamate receptor family. The increasing number of complete genome sequences gives us a great opportunity to ask an evolutionary question: did GPCRs evolve from the ancient prokaryotic world?
Here, we performed comparative analyses of the three primary GPCR families and identified the correlation between specific structures of the metabotropic glutamate receptor family and prokaryotic proteins. The characteristic features of the different GPCR families listed in this study should improve our understanding of orphan GPCR prediction and classification.

Data mining and classification
GPCR sequences were downloaded from GPCRDB (www.gpcr.org/7tm/) and clustered with CD-HIT at 90% identity to remove redundancy [12]. Twenty types of rhodopsin-like receptor subfamily, 18 types of secretin receptor subfamily and eight types of metabotropic glutamate receptor subfamily were used in our study (Table 1). Periplasmic binding protein (PBP) and bacteriorhodopsin sequences were taken from NCBI (ftp://ftp.ncbi.nlm.nih.gov/, in June 2012) and collected by BLAST searches against the NCBI bacterial database with an E-value cutoff of 10^−5 [13].
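As a minimal sketch of the E-value filtering step, the Python below assumes the BLAST searches wrote BLAST+ tabular output (`-outfmt 6`, default 12 columns with the E-value in column 11). The source does not state which output format was used, and the sequence IDs in the example are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text, max_evalue=1e-5):
    """Return the subject IDs with at least one hit at or below max_evalue."""
    kept = set()
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(OUTFMT6_FIELDS, row))
        if float(hit["evalue"]) <= max_evalue:
            kept.add(hit["sseqid"])
    return kept


# Hypothetical hits: one strong, one far above the 1e-5 cutoff.
example = (
    "bR_query\tsubject_1\t45.2\t230\t120\t3\t1\t230\t5\t234\t2e-40\t150\n"
    "bR_query\tsubject_2\t22.0\t80\t60\t2\t10\t90\t12\t92\t0.3\t25.1\n"
)
print(sorted(filter_hits(example)))  # ['subject_1']
```

Keeping the cutoff as a parameter makes it easy to rerun the same filter at the more permissive threshold used later for the N-terminus searches.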
The final PBP and bacteriorhodopsin sequences were clustered and filtered with CD-HIT at 85% identity.
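CD-HIT reduces redundancy by greedy incremental clustering: sequences are sorted by length, and each one either joins an existing cluster whose representative it matches above the identity threshold or founds a new cluster. The sketch below is a simplified stand-in for that idea, not CD-HIT itself. It assumes identity is the number of identical residues in a global alignment divided by the length of the shorter sequence (CD-HIT's default global identity definition) and omits CD-HIT's short-word filter and banded alignment. The sequence names and residues are hypothetical.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Identity from a simple Needleman-Wunsch alignment.

    Returns identical aligned residues divided by the length of the shorter
    sequence (the definition CD-HIT uses by default for global identity).
    """
    if not a or not b:
        return 0.0
    n, m = len(a), len(b)
    # score[i][j]: best score for a[:i] vs b[:j]; ident[i][j]: identities on that path.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    ident = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = a[i - 1] == b[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)
            if score[i][j] == diag:
                ident[i][j] = ident[i - 1][j - 1] + int(same)
            elif score[i][j] == up:
                ident[i][j] = ident[i - 1][j]
            else:
                ident[i][j] = ident[i][j - 1]
    return ident[n][m] / min(n, m)


def greedy_cluster(seqs, threshold=0.9):
    """CD-HIT-style greedy incremental clustering.

    Sequences are visited longest first; each joins the first existing
    representative it matches at >= threshold identity, otherwise it becomes
    the representative of a new cluster.
    """
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters = {}  # representative name -> member names
    for name in order:
        for rep in clusters:
            if global_identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


seqs = {  # hypothetical sequences; gpcr_b differs from gpcr_a at one position
    "gpcr_a": "MKTAYIAKQRQISFVKSHFSRQ",
    "gpcr_b": "MKTAYIAKQRQISFVKSHFSRA",
    "gpcr_c": "MSLLNNGTWEPPQRTCAHLVVE",
}
print(greedy_cluster(seqs, threshold=0.9))
```

Only the cluster representatives would be kept as the non-redundant set; lowering the threshold from 0.9 to 0.85 merges more sequences into fewer clusters.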

Phylogenetic analysis
ClustalW2 was used to align PBP sequences with GPCR N-terminal sequences [14]. Bacteriorhodopsin sequences with seven transmembrane helices were identified with TMHMM and then aligned with the GPCR 7-transmembrane regions. We used ProtTest 2.4 to determine the best-fit amino acid substitution model (JTT + F) and its parameter values for maximum likelihood analyses [15]. Phylogenetic trees were constructed and visualised in MEGA5 with 1000 bootstrap replicates [16].
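The bootstrap support values rest on resampling alignment columns. A minimal sketch of how such pseudo-replicates can be generated, assuming the standard nonparametric bootstrap over columns: MEGA5 performs this internally, each replicate would go through the same maximum likelihood tree search, and a branch's bootstrap value is the percentage of replicate trees containing that branch. Tree inference is not shown, and the alignment is hypothetical.

```python
import random


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Yield pseudo-alignments made by resampling columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("aligned sequences must all have the same length")
    rng = random.Random(seed)
    for _ in range(n_replicates):
        # Same number of columns as the original, drawn with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        yield {n: "".join(alignment[n][c] for c in cols) for n in names}


# Hypothetical three-sequence alignment with gaps.
aln = {"seqA": "MK-LV", "seqB": "MKALV", "seqC": "MRAL-"}
first = next(bootstrap_replicates(aln, n_replicates=1, seed=42))
print(first)
```

Whole columns are resampled so that the residues of different sequences at one site stay together, which is what preserves the homology information in each replicate.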

Conservation analysis in the eukaryotic kingdom
In our study, we focused on GPCR types in organisms that appeared before and after the emergence of metazoans about 550 million years ago. This period is widely considered the 'big bang' of the bilaterians, animals with a front and a back. Based on the Tree of Life phylogeny of known species (http://tolweb.org/tree/), 17 representative, completely sequenced and annotated eukaryotic genomes spanning a wide range of lineages from protists to mammals were selected from NCBI. We used BLAST and eukaryotic annotation information to examine GPCR distributions in each eukaryotic period. We classified fungi and protists as species that occurred before the explosion of metazoans; the remaining species are vertebrates. A list of the species used in this study can be found in supplementary Table 1.

Results and discussion

Overall structure and domain distributions
Sequence alignment shows that each part of GPCRs has significant sequence variance. However, the 7-transmembrane structure is considerably conserved, with a length ranging from 200 to 300 amino acids (Fig. 1). The length and specific domains of the N-terminus vary dramatically among different families (Table 1). Most domains detected on TM1 ∼ TM7 are 7tm_n and DUFx (n is 1 ∼ 7 and x is a four-digit number). DUF (domain of unknown function) entries form a large protein family whose functions are still unknown. Interestingly, EL2, IL3 and EL3 show some differences among the three families (supplementary file 1). It is possible that the ligand-binding function of EL2 and EL3, which receive extracellular signals, was reinforced to take over the role of the N-terminus during the evolution of rhodopsin-like receptors.
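The two domain-name patterns described above (7tm_n with n = 1 ∼ 7, and DUF followed by a four-digit number) can be tallied with regular expressions. A minimal sketch, assuming domain annotations are available as plain name strings per transmembrane region; the patterns follow the text's description rather than any database's official naming rules, and the hits are hypothetical.

```python
import re
from collections import Counter

# Name patterns as described in the text: "7tm_n" with n = 1..7,
# and "DUF" followed by a four-digit number.
SEVEN_TM = re.compile(r"7tm_[1-7]")
DUF = re.compile(r"DUF\d{4}")


def classify_domain(name):
    """Assign a domain name to '7tm', 'DUF' or 'other'."""
    if SEVEN_TM.fullmatch(name):
        return "7tm"
    if DUF.fullmatch(name):
        return "DUF"
    return "other"


def tally(domain_names):
    """Count domain names per category for one transmembrane region."""
    return Counter(classify_domain(d) for d in domain_names)


# Hypothetical domain hits for a single TM region.
hits = ["7tm_1", "7tm_3", "DUF1234", "ANF_receptor", "DUF0001", "7tm_9"]
print(tally(hits))
```

Using `fullmatch` rather than `search` keeps names such as `7tm_9` or `DUF123` out of the two categories, so they fall into "other".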
Many domains were detected on the N-terminus, such as the epidermal growth factor domain, the cadherin domain and the RGD motif. Some diversified domains are also found in the N-termini of different GPCRs. For example, the cysteine residue box might mediate cell-to-cell adhesion and migration [10][17][18][19][20]. Interestingly, the N-termini of many rhodopsin-like receptors are much shorter than those of the other GPCR families. A short N-terminus may allow the extracellular loops to act as ligand-binding sites that receive extracellular signals. We observed that N-termini from the metabotropic glutamate receptor family contain ANF receptor and NCD3G domains, whereas N-termini from the secretin receptor family share the HRM domain. Rhodopsin-like receptor subfamilies found only in metazoans have few domains on the N-terminus. By contrast, the hormone receptor subfamily of the rhodopsin-like receptor family is detected in both periods (before and after the appearance of metazoans), which implies that these receptors are probably the forefathers of the rhodopsin-like receptor family. The C-terminus varies considerably in length, and specific domains were hardly found on it. Its only notable features are serine or threonine residues, which increase the affinity of the intracellular surface for scaffolding proteins, especially when these residues are phosphorylated [20].
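The period-based reasoning above (a subfamily detected both before and after the appearance of metazoans is a candidate forefather) reduces to a set comparison. A minimal sketch, assuming per-species presence calls are already available from the BLAST and annotation step; every species and subfamily name is a hypothetical placeholder, not data from this study.

```python
def periods_detected(presence, pre_metazoan):
    """Map each subfamily to the periods ('before', 'after') in which it occurs.

    presence: subfamily -> set of species in which it was detected.
    pre_metazoan: species classed as occurring before metazoans (fungi, protists).
    """
    result = {}
    for subfamily, species in presence.items():
        periods = set()
        if species & pre_metazoan:
            periods.add("before")
        if species - pre_metazoan:
            periods.add("after")
        result[subfamily] = periods
    return result


def present_in_both(presence, pre_metazoan):
    """Subfamilies detected both before and after the appearance of metazoans."""
    detected = periods_detected(presence, pre_metazoan)
    return sorted(s for s, p in detected.items() if p == {"before", "after"})


# Hypothetical placeholder species and subfamilies.
pre = {"fungus_1", "protist_1"}
presence = {
    "subfamily_A": {"protist_1", "vertebrate_1"},
    "subfamily_B": {"vertebrate_1", "vertebrate_2"},
}
print(present_in_both(presence, pre))  # ['subfamily_A']
```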

The GPCR 7-transmembrane region might have evolved from bacteriorhodopsin
Bacteriorhodopsin is an ancient light-driven protein widely found in prokaryotes. It was reported in 1992 that bacteriorhodopsin might share a structure similar to the GPCR 7-transmembrane structure [21]. Electron-crystallographic refinement of bacteriorhodopsin revealed some analogy between GPCRs and bacteriorhodopsin [3]. Evidence shows that the 7-transmembrane region is conserved in the crystal structure of rhodopsin [22], and similar structures have also been identified in prokaryotic light-sensitive proteins such as proteo-, bacterio- and halorhodopsin [23,24].
Based on these previous findings, we searched the entire NCBI bacterial database, and most hits were bacteriorhodopsins. Of the 1096 retrieved bacteriorhodopsin sequences, 830 were 200 ∼ 300 amino acids long and contained a 7-transmembrane region. To further investigate their phylogeny, we constructed a phylogenetic tree from the sequences of all GPCR N-termini and bacteriorhodopsins by the maximum likelihood method with 1000 bootstrap replicates. Because GPCR N-termini vary greatly, the bootstrap values decrease rapidly in the N-terminus branch, so we focused only on the topology of this tree. The tree shows that the 7-transmembrane region of the metabotropic glutamate receptor family is much more closely related to bacteriorhodopsin than those of the other GPCR families (Fig. 2). The GABA, taste, metabotropic glutamate and pheromone subfamilies of the metabotropic glutamate receptor family would therefore be more ancient than secretin receptors and rhodopsin-like receptors.
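The selection of bacteriorhodopsin sequences that are 200 ∼ 300 amino acids long with a 7-transmembrane region can be expressed as a filter over TMHMM predictions. A minimal sketch, assuming TMHMM 2.0's short output format (one line per sequence: the ID followed by tab-separated `key=value` fields such as `len=` and `PredHel=`) and assuming "7-transmembrane" means exactly seven predicted helices; the IDs and values are hypothetical.

```python
def parse_tmhmm_short(text):
    """Parse TMHMM short-format lines into {sequence_id: {field: value}}."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        seq_id, *fields = line.split("\t")
        records[seq_id] = dict(f.split("=", 1) for f in fields if "=" in f)
    return records


def select_7tm(records, min_len=200, max_len=300, n_helices=7):
    """IDs with exactly n_helices predicted helices and min_len <= length <= max_len."""
    return sorted(
        seq_id
        for seq_id, f in records.items()
        if int(f["PredHel"]) == n_helices and min_len <= int(f["len"]) <= max_len
    )


# Hypothetical predictions: seq_1 passes, seq_2 is too long, seq_3 has 4 helices.
TOPO_7 = "o10-32i45-67o80-102i115-137o150-172i185-207o220-242i"
example = (
    f"seq_1\tlen=262\tExpAA=155.4\tFirst60=21.5\tPredHel=7\tTopology={TOPO_7}\n"
    f"seq_2\tlen=410\tExpAA=160.2\tFirst60=22.0\tPredHel=7\tTopology={TOPO_7}\n"
    "seq_3\tlen=250\tExpAA=90.1\tFirst60=20.3\tPredHel=4\t"
    "Topology=o10-32i45-67o80-102i115-137o\n"
)
print(select_7tm(parse_tmhmm_short(example)))  # ['seq_1']
```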
We also observe that the 7-transmembrane region contains positions of conserved polarity, which have also been reported in bacteriorhodopsin [25]. The metabotropic glutamate receptor family has few introns, whereas the rhodopsin-like receptor family contains 35.5% introns and the intron positions of the secretin receptor family are highly conserved [26]. The current consensus is that introns were acquired within the eukaryotic lineage as selfish elements, so a family with few introns is consistent with an early origin; this supports the metabotropic glutamate receptor family as a possible ancestor of GPCRs. One possible reason is that fewer introns would favour gene duplication under evolutionary pressure. Some other domains (e.g. Bac_rhodopsin, AtpR, Sugar_transport, T2SM) are also present on bacteriorhodopsin [27][28][29][30][31][32][33]. Because bacteriorhodopsin has more diversified domains, it may have other functions in prokaryotic life activities as well. Whether the sequence homology between bacteriorhodopsin and GPCRs arose by exon shuffling or by duplication is still controversial [34], but our results suggest that the 7-transmembrane region possibly originated from bacteriorhodopsin in the prokaryotic world.

PBP might be the prototype of the GPCR N-terminus
Previous studies have reported that the metabotropic glutamate receptor N-terminus and the ancient PBP share the same structure, the Venus flytrap module (VFTM). VFTM has undergone three rounds of duplication with positively selected functional divergence [17,35]. Conformational changes of VFTM induced by ligand binding might be related to the prototype of the N-terminus.
Studies on the metabotropic glutamate receptor N-terminus demonstrate that divergence plays a dominant role in shaping the functions of VFTM [35]. Metabotropic glutamate receptors have a longer N-terminus and are also found in the earliest eukaryotes, such as chromalveolates, unikonts and opisthokonts [10].
Sensitive sequence analysis techniques indicated that the extracellular region of the metabotropic glutamate receptor family is similar to PBP, but did not identify which subclass of metabotropic glutamate receptors is closest to PBP [36]. We extracted all GPCR N-termini and performed a comparative analysis between the N-termini and PBP. BLAST searches (E-value cutoff 0.01) showed that only N-termini from the metabotropic glutamate receptor family hit PBP sequences, whereas N-termini from the other GPCR families, the secretin receptors and rhodopsin-like receptors, produced no hits even at this permissive threshold. Because N-termini from the metabotropic glutamate receptor family might therefore derive from PBP, we performed a phylogenetic analysis (Fig. 3). The tree implies that the GABA receptors of the metabotropic glutamate receptor family are much closer to PBP than the other subfamilies are. This again indicates that the extracellular domain of the metabotropic glutamate receptor GABA_B might have evolved from PBP earlier than those of the other metabotropic receptors.
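The family-level comparison above (which GPCR families have any N-terminus hitting PBP at E-value 0.01) is a per-family count over BLAST hits. A minimal sketch, assuming BLAST+ tabular output (`-outfmt 6`, E-value in the 11th column) and a hand-made mapping from each N-terminal query ID to its family; all IDs and hit values are hypothetical.

```python
def families_with_hits(tabular_text, family_of, max_evalue=0.01):
    """Count, per GPCR family, the N-terminal queries with >= 1 hit at or below max_evalue.

    tabular_text: BLAST+ -outfmt 6 lines (E-value in the 11th column).
    family_of: mapping from query ID to family label.
    """
    hit_queries = set()
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if float(cols[10]) <= max_evalue:
            hit_queries.add(cols[0])
    counts = {family: 0 for family in family_of.values()}
    for query in hit_queries:
        if query in family_of:  # ignore queries without a family label
            counts[family_of[query]] += 1
    return counts


# Hypothetical query-to-family mapping and PBP hits.
family_of = {"nterm_C1": "class C", "nterm_C2": "class C",
             "nterm_B1": "class B", "nterm_A1": "class A"}
hits = (
    "nterm_C1\tpbp_1\t28.4\t310\t200\t9\t20\t330\t15\t320\t3e-12\t65.2\n"
    "nterm_C1\tpbp_2\t25.0\t290\t190\t8\t25\t315\t10\t300\t4e-05\t48.0\n"
    "nterm_B1\tpbp_1\t19.8\t95\t70\t4\t5\t100\t40\t135\t0.5\t24.3\n"
)
print(families_with_hits(hits, family_of))  # {'class C': 1, 'class B': 0, 'class A': 0}
```

Each query is counted once however many PBP subjects it hits, so the numbers reflect how many N-termini per family resemble PBP rather than the total number of alignments.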

Conclusions
GPCRs play a key role in cellular signalling and probably evolved from the prokaryotic world. Most rhodopsin-like receptors occur only after the appearance of metazoans, so they would be the most recent GPCRs, appearing after the secretin receptor family and the metabotropic glutamate receptor family. The highly conserved 7-transmembrane region shares significant similarity with bacteriorhodopsin in prokaryotes, and PBP would be the prototype of the GPCR N-terminus. The GABA receptors of the metabotropic glutamate receptor family might be the most ancient GPCRs, because both their N-terminus and their 7-transmembrane region are closer to the ancient PBP and bacteriorhodopsin. It is therefore possible that ancient PBP and bacteriorhodopsin combined via the GPCR cysteine residue box and then formed the prototype of the metabotropic glutamate receptor family.
We hypothesise that the metabotropic glutamate receptor family is the forefather of all GPCRs and that it probably evolved from a combination of PBP and bacteriorhodopsin (Fig. 4). Afterwards, with the enhancement of VFTM in the N-terminus, it evolved into the secretin receptor family and the other metabotropic glutamate receptor subfamilies, as shown by the black dashed lines. For rhodopsin-like receptors, there are two possible evolutionary mechanisms. In the first, they evolved directly from bacteriorhodopsin with reinforced ligand-binding sites that took over the reception of extracellular signals from the N-terminus (blue dashed line). In the second, an original metabotropic glutamate receptor lost the binding function of its N-terminus (orange dashed line) and therefore had to reinforce its extracellular loops to bind signals. With this reinforced binding function on the extracellular loops, the lineage expanded explosively into more diversified subfamilies after the appearance of vertebrates.