Evolution and functional diversification of the GLI family of transcription factors in vertebrates.

Background In vertebrates the “SONIC HEDGEHOG” signalling pathway has been implicated in cell-fate determination, proliferation and the patterning of many different cell types and organs. As the GLI family members (GLI1, GLI2 and GLI3) are key mediators of hedgehog morphogenetic signals, over the past couple of decades they have been extensively scrutinized by genetic, molecular and biochemical means. Thus, a great deal of information is currently available about the functional aspects of GLI proteins in various vertebrate species. To address the roles of GLI genes in diversifying the repertoire of Hh signalling and deploying it for vertebrate-specific traits, in this study we have examined the evolutionary patterns of vertebrate GLI sequences within and between species. Results Phylogenetic tree analysis suggests that the vertebrate GLI1, GLI2 and GLI3 genes diverged after the separation of urochordates from vertebrates and before the tetrapod-bony fish split. Lineage-specific duplication events were also detected. Estimation of the mode and strength of selection acting on GLI orthologs demonstrated that all members of the GLI gene family experienced more relaxed selection in teleost fish than in the mammalian lineage. Furthermore, the GLI1 gene appeared to have been exposed to different functional constraints in fish and tetrapod lineages, whilst a similar level of functional constraints on GLI2 and GLI3 was suggested by comparable average non-synonymous (Ka) substitutions across the lineages. A relative rate test suggested that the majority of the paralogous copies of the GLI family analyzed evolved at similar evolutionary rates, except GLI1, which evolved at a significantly faster rate than its paralogous counterparts in tetrapods.
Conclusions Our analysis shows that the sequence evolutionary patterns of GLI family members are largely correlated with the reported similarities and differences in the functionality of GLI proteins within and between the various vertebrate species. We propose that duplication and divergence of GLI genes has increased the complexity of the vertebrate body plan by recruiting hedgehog signalling for novel developmental tasks.


Introduction
The GLI regulatory proteins act downstream of the secreted hedgehog (Hh) signalling molecules and are known to play an important role in vertebrate embryonic patterning in regions such as the central nervous system, the anterior-posterior axis of the embryonic limb bud, craniofacial structures and the lungs. Whilst Drosophila possesses a single homologue of GLI (cubitus interruptus, Ci), vertebrates have three members, characterized by five tandem C2-H2 zinc fingers linked by a consensus histidine-cysteine linker sequence. 1 The birth of three GLI family members (GLI1, GLI2, and GLI3) from a single Ci-like ancestral gene has been attributed to small-scale gene duplication events that might have occurred within the time window between the vertebrate-urochordate and fish-tetrapod splits. 2,3 Evidence from Drosophila suggests that all Hh signalling is transduced via the Ci protein. 4 In the absence of Hh signalling the cytoplasmic Ci protein is cleaved to generate an N-terminal form with repressor activity. Hh signalling blocks this cleavage and increases the concentration of the full-length activator form of the Ci protein. Thus a single Drosophila Ci protein can work as either an activator or a repressor of target genes, depending on the differential regulation of Hh signalling. 5 As with Ci, Hh signalling-dependent cleavage plays an important role in the post-translational regulation of the vertebrate GLI proteins. However, the activator and repressor functions of the ancestral Ci protein are not distributed evenly among the three vertebrate GLI paralogs. Functionally, the Drosophila Ci is more closely related to vertebrate GLI2 and GLI3. 6,7 These two partially redundant genes 8,9 can activate transcription and undergo proteolysis to generate repressors of transcription. 10 In contrast, GLI1 cannot undergo post-translational modification and functions primarily as an activator of the Hh transcriptional response. 11
Genetic and biochemical studies in human, mice and frog suggest that during development the three GLI proteins act in a combinatorial manner that is context dependent and species specific. 12 For example, GLI1 and GLI2 induce motor neurons in the frog spinal cord, whereas GLI3 represses this function. By contrast, GLI1 induces floor plate differentiation in the same species, whereas both GLI2 and GLI3 repress this function. 13 In mice GLI1 is not required for development or tumorigenesis, 14,15 but it is essential for tumor formation in frog embryos and human cancers. 16,17 Genetic studies with frogs and mice further suggest divergent roles of GLI proteins in the patterning of the neural tube and CNS. For instance, during frog development each of the GLI proteins is critical in the induction of all primary neurons: motor, sensory and interneurons, 16 whereas loss of any single or all GLI proteins in mouse embryos does not abolish neural tube development. 18 Whilst there are divergent roles of GLI1 and GLI2 between mouse and zebrafish during development, the role of GLI3 appears to be conserved. 19,20 Although the general aspects of GLI functions are similar in different vertebrate species, there are some important differences at both the inter- and intra-specific levels. From an evolutionary perspective the duplication and divergence of GLI paralogs have increased the complexity of the response to Hh morphogenetic signals in vertebrates. This complexity might, in turn, have contributed towards the deployment of Hh signalling to those domains of developing embryos which are considered vertebrate synapomorphies, for instance appendicular (limb/fin) and craniofacial structures. To gain insight into the functional constraints operating on GLI family members (within and between the species) following the duplication events, we conducted a molecular evolutionary study comparing the tetrapod and teleost lineages.
We demonstrated that all members of the GLI gene family experienced more relaxed selection in teleost fish than in the mammalian lineage. We also found that GLI1 genes have been exposed to different functional constraints in fish and tetrapod lineages, whereas the GLI2 and GLI3 sequences were subjected to a similar level of functional constraints across the lineages. Additionally, we utilized a relative rate test to show that in the majority of the species analyzed the paralogous copies of the GLI family evolved at similar evolutionary rates, except in tetrapods, where GLI1 evolved at a significantly faster rate than GLI2 and GLI3. Together, these results demonstrate that the evolutionary patterns of GLI sequences are largely correlated with their interspecific and intraspecific functional similarities and differences, but also show that duplication and divergence of GLI genes has led to the recruitment of the Hh pathway for novel developmental processes in vertebrates.

Phylogenetic analysis
The phylogenetic history of vertebrate GLI genes was analyzed by including the sequences from representative members of teleost and tetrapod lineages (Fig. 1). The tree was rooted with orthologous genes from invertebrate species. A phylogenetic tree of multigene family members provides several types of useful information for studying the evolution and functional diversification of genes across various species. First, it can serve as a tool to provide support for or against direct orthologous relationships between genes from different species. Second, it can provide information on the likely status of members of a gene family in animals that are ancestral to groups of currently extant species. Finally, the phylogenetic tree can provide an estimate of the relative time elapsed since the divergence of any two gene sequences from their most recent common ancestor.
With these points in mind, the phylogenetic neighbor-joining (NJ) tree presented in Figure 1 reveals several interesting features of the vertebrate GLI gene family. The phylogeny shows a topology of the form (A) (BC) where vertebrate GLI2 and GLI3 genes cluster together with significant (99%) bootstrap support whereas GLI1 genes form an outgroup to them with bootstrap support of 100% (Fig. 1). The phylogeny suggests that, in the family of GLI genes, the ancestral chordate condition (as exemplified in Ciona/amphioxus) was likely a single, possibly Amphioxus-GLI-like, copy of the GLI gene. 21 Then, before the actinopterygii-sarcopterygii split, the Amphioxus-GLI-like ancestral gene underwent a duplication event and produced two gene copies; one of them (the joint ancestor of GLI2 and GLI3) duplicated again, while the other (GLI1) might not have (Fig. 1).
These three copies of an ancestral gene were then retained in both bony fishes and terrestrial vertebrates, because of their adaptive significance. The phylogeny further shows that the GLI2 gene underwent lineage-specific duplication events in zebrafish and Xenopus, producing two gene copies independently in these two species (shown as GLI2a and GLI2b) (Fig. 1). Note that the branches of the zebrafish GLI2a and GLI2b genes are long, suggesting that the duplication that gave rise to the extra copy of the GLI2 gene in zebrafish is probably ancient, whereas the branch lengths of Xenopus GLI2a and GLI2b suggest that these genes arose relatively recently in the evolutionary history of this lineage.

Estimation of sequence divergence among species
In order to determine the level of sequence divergence (influence of selection) at various phylogenetic separations, we sought to estimate the pattern of nucleotide substitutions at both silent (synonymous) and non-silent (non-synonymous) sites among GLI orthologs within and between the fish and tetrapod lineages. Selection was measured in terms of the difference between the rate of non-synonymous substitutions (Ka) and the rate of synonymous substitutions (Ks). If the Ka and Ks values are not significantly different from each other, this indicates that the genes are under few or no selective constraints and are thus evolving neutrally. The gene pair is said to be under negative selection if the Ka value is significantly lower than Ks (Ka < Ks), i.e. non-silent substitutions have been purged by natural selection. The smaller the value of Ka compared to Ks, the larger the number of eliminated substitutions. The converse scenario, where the Ka value is significantly greater than Ks (Ka > Ks), is indicative of positive selection, i.e. advantageous mutations have accumulated during the course of evolution.
Ka and Ks values have been estimated in pairwise comparisons of orthologs using the Li-Wu-Lu method. 22 Only those codons shared among all species have been considered for the analysis using the complete deletion option.
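The decision rule described above can be sketched in a few lines of Python. This is only an illustration and not the procedure used in this study: it assumes that Ka and Ks for a gene pair are already available together with their standard errors, and it assumes a two-sided Z-test as the criterion for "significantly different". The function name classify_selection and the parameters se_ka, se_ks and alpha are hypothetical.

```python
import math


def classify_selection(ka, ks, se_ka, se_ks, alpha=0.05):
    """Classify the mode of selection for one gene pair.

    Assumption (not from the paper): significance is judged with a
    two-sided Z-test on Ka - Ks, using the supplied standard errors.
    Returns (label, z, p), where label is 'neutral', 'negative' or 'positive'.
    """
    pooled = math.sqrt(se_ka ** 2 + se_ks ** 2)
    if pooled == 0:
        raise ValueError("standard errors must not both be zero")
    z = (ka - ks) / pooled
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    if p >= alpha:
        return "neutral", z, p      # Ka and Ks not significantly different
    if z < 0:
        return "negative", z, p     # Ka significantly lower than Ks
    return "positive", z, p         # Ka significantly greater than Ks
```

With this rule, a pair whose Ka is far below Ks relative to the standard errors is labelled as being under negative (purifying) selection, while a pair with overlapping estimates is labelled neutral.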

GLI3
Within the mammalian lineage the Ks values for the GLI3 gene (Table 1)   for zebrafish and tetraodon/Fugu comparisons. When using pair-wise comparisons between members of mammalian and fish lineages, both Ks and Ka values for GLI3 are in the range of 0.4-0.5.

GLI2
For the GLI2 gene (

Estimation of functional constraints
In order to estimate the selective forces operating on GLI gene family members following the duplication events, average Ka and Ks values have been estimated for GLI1, GLI2 and GLI3 genes, both within and between mammalian and fish lineages (Table 2). The t-value of the difference between average Ka and Ks for each gene has then been used to estimate the significance with which they differ within and between mammalian and fish lineages. Results shown in Table 2 suggest that, with the exception of the mammalian-fish GLI2 comparison, there was no significant difference between the average Ka and Ks within or between the two lineages. This indicates a strong trend towards neutrality (Ka/Ks ratio of 1) for substitution rates at synonymous and non-synonymous sites for the GLI gene family. Only the mammalian-fish comparison for  (Table 2) revealed three important aspects of GLI evolutionary patterns. Firstly, all three GLI gene family members showed a significantly higher rate of both silent and non-silent substitutions in fish when compared to mammals, suggesting relatively relaxed selection in the fish lineage. This pattern correlates with the observations made by Robinson-Rechavi and Laudet 23 who found that genes evolve faster in fish than in mammals. Secondly, between mammalian-fish lineages, the significantly higher average Ka and Ks values for GLI1 compared to GLI2 and GLI3 indicate relaxed selection and accelerated evolution in GLI1. This is perhaps reflected in the divergent GLI1 functions attained in teleosts and tetrapods 19 since they last shared a common ancestor 450 million years ago. Thirdly, between mammalian-fish GLI2 and GLI3 genes, not only the average Ka values (usually subject to selective pressure) but also the corresponding Ks values (assumed to be neutral) are significantly lower than the saturation level (Ks > 5) (Table 2). This indicates that strong purifying selection operates on both silent and non-silent sites.
The lower rate of substitutions at silent sites is suggestive of codon usage bias in these two genes. 24,25 Furthermore, average Ka values for GLI2 and GLI3 between mammalian-fish lineages are similar, perhaps due to equivalent functional constraints imposed on both genes.
Whilst GLI1 appears to have undergone rapid evolution since the divergence of tetrapods and teleosts, the GLI2 and GLI3 sequences appear to have evolved at a considerably slower rate. These data are consistent with the functional conservation of GLI3 in vertebrates, 20 but not with experimental data that indicate a functional divergence of GLI2 orthologs in mice and zebrafish. 19 This functional divergence of GLI2 can be explained by two scenarios: either subtle (non-silent) changes were accommodated within critical functional domains of the protein in each lineage, leading to functional divergence, or the gene expression pattern changed while the protein activity domains remained conserved throughout the course of evolution.
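The comparison of average Ka and Ks values by a t-value, as described in this section, can be illustrated with a short Python sketch. The text does not state which form of the t-test was applied, so Welch's unequal-variance form is assumed here purely for illustration; the function name welch_t and its arguments are hypothetical, and the inputs would be the lists of pairwise Ka and Ks estimates for one gene.

```python
import math
from statistics import mean, variance


def welch_t(ka_values, ks_values):
    """Return (t, df) for the difference between mean Ka and mean Ks.

    Assumption (not from the paper): Welch's unequal-variance t-test.
    ka_values and ks_values are lists of pairwise estimates for one gene.
    """
    n1, n2 = len(ka_values), len(ks_values)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two values")
    # Squared standard error of each sample mean.
    v1 = variance(ka_values) / n1
    v2 = variance(ks_values) / n2
    if v1 + v2 == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(ka_values) - mean(ks_values)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    denom = v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)
    df = (v1 + v2) ** 2 / denom
    return t, df
```

The returned t and df can then be compared against a t-distribution table (or a statistics library) to decide whether the averages differ significantly.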

Evolutionary distance between paralogs
To determine the evolutionary rates at which the duplicated genes evolved in each species tested (human, mouse, rat, frog, Fugu, tetraodon, zebrafish), the Tajima relative rate test 26 has been carried out (Table 3) on amino acid substitutions in pairs of GLI paralogs, using the orthologous sequence Ci from Drosophila as an outgroup. The Tajima relative rate test determines whether one duplicate has diverged to a greater extent than the other by comparing the sequences of each of the paralogs with that of the ortholog used as the outgroup. The results of this analysis (Table 3) indicate that in most cases (16/21 pairs) the GLI paralogs evolved at a similar rate in each animal analyzed. Our findings in the relative rate test are in agreement with Hughes and Hughes 27 and Kondrashov et al. 28 who suggested that paralogs typically evolve at similar rates, without significant asymmetry. The markedly increased evolutionary rate (p < 0.05) of GLI1 in human and mouse may reflect profound changes in the function of this gene compared to either of its paralogs in mammals. This notion is compatible with results from functional studies, where GLI2 and GLI3 are found to perform overlapping activities in mammalian cell culture and transgenic experiments, while GLI1 appears to play a notably different role. 10,11 The faster evolutionary rate also suggests that orthologous copies of the GLI1 gene in human and mice might have attained divergent roles during the course of evolution. This assumption is in harmony with the functional data which show that in mice GLI1 is not required for development or tumorigenesis, but it is essential for the proliferation of human tumor cells. 15,17
Asymmetric evolution of frog GLI paralogs probably suggests a trend in the tetrapod GLI1 gene to experience an increased evolutionary rate (under relaxed selection pressure), whereas rapid evolution of GLI2 (whose evolutionary rate is comparable to that of its GLI1 paralog, Table 3) might indicate the functional redundancy of GLI2 duplicates (GLI2a and GLI2b) in amphibians.
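The logic of the relative rate test used in this section can be sketched as follows. This is a minimal illustration of the chi-square form of Tajima's test for three aligned sequences, not the software actually used for Table 3; the function name tajima_relative_rate and the choice of gap symbols are assumptions. For each aligned site, the sketch counts the positions at which only the first paralog differs from the other two sequences (m1) and those at which only the second paralog differs (m2); under equal rates, m1 and m2 are expected to be equal.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup, gap_chars="-?X"):
    """Tajima's relative rate test (chi-square form, 1 degree of freedom).

    seq1 and seq2 are aligned paralog sequences; outgroup is the aligned
    outgroup sequence. Sites containing any symbol in gap_chars are skipped
    (the gap symbols are an assumption of this sketch).
    Returns (m1, m2, chi2, p).
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if a in gap_chars or b in gap_chars or o in gap_chars:
            continue
        if a != b:
            if b == o:
                m1 += 1   # change unique to seq1
            elif a == o:
                m2 += 1   # change unique to seq2
            # sites where all three differ are uninformative for the test
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of the chi-square distribution with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p
```

A p-value below 0.05 would indicate that one paralog has accumulated significantly more unique substitutions than the other since the duplication.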

Conclusions
The Hh signalling pathway was first elucidated in Drosophila, and subsequently the vertebrate homologs of Drosophila Hh pathway genes were described by genetic studies in mouse, frog and zebrafish. These studies demonstrated that Hh signalling in vertebrates shares many features with that in insects, although clear differences have emerged. For instance, many genes involved in this pathway expanded by gene duplication specifically in the vertebrate lineage. GLI proteins act at the last known step of the Hh signalling pathway and lead to the activation or repression of target genes in a cellular context-dependent manner. The fact that vertebrates possess more copies of GLI genes than did the common ancestor of chordates suggests that the duplication and divergence of GLI genes in vertebrates has diversified the mechanisms of receiving and interpreting Hh signalling. This increase in the genetic complexity of Hh pathway mediators in early vertebrate evolution could conceivably be one of the key factors underlying the evolution of vertebrate innovations, including the limbs, bone and craniofacial structures. In this study we have inspected the molecular evolution of GLI family members in vertebrates. All three GLI genes show a higher degree of divergence at both synonymous and non-synonymous sites in the teleost lineage when compared to mammals. This difference may indicate that GLI orthologs have achieved a greater level of functional diversification in the fish lineage. In mammalian-fish sequence comparisons it appeared that GLI1 has accumulated significantly more synonymous and non-synonymous changes than GLI2 and GLI3. This may reflect functional importance associated with evolutionary pressure to retain the sequence features of two copies of the GLI family across the vertebrate lineage, whereas the third copy was free from the constraining effects of natural selection and has attained unique features in each lineage.
The findings from a relative rate test involving GLI paralogs from each species examined suggest that the GLI1 protein may have undergone an accelerated evolutionary rate not only at the interspecific level but also at the intraspecific level. We propose that a transition from a single, Amphioxus-GLI-like, ancestral chordate gene to three distinct vertebrate GLIs and their subsequent interspecific and intraspecific diversifications were critical events in diversifying the repertoire of Hh signalling and deploying it for vertebrate-specific traits.

Materials and Methods
In order to analyze the evolutionary patterns/history of GLI sequences, the complete cDNAs and corresponding protein sequences for human GLI gene family members, i.e. GLI1, GLI2 and GLI3, and their orthologs in mouse, rat, frog, Fugu, tetraodon, zebrafish, and several invertebrate species (Table 4) were extracted from the ENSEMBL genome browser (http://www.ensembl.org) and the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov).
The phylogenetic tree for the GLI gene family was reconstructed by using the neighbor-joining method. 29,30 All positions containing gaps and missing data were eliminated from the dataset. Reliability of the resulting tree topology was tested by the bootstrap method (at 1000 pseudoreplicates), which generated the bootstrap probability for each interior branch in the tree. 31 The phylogenetic tree was rooted with orthologous genes from invertebrates. The numbers of synonymous nucleotide substitutions per synonymous site (Ks) and of non-synonymous nucleotide substitutions per non-synonymous site (Ka) were calculated by using the Li-Wu-Lu method 22 in pairwise comparisons.
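The two data-handling steps mentioned above, elimination of all positions containing gaps or missing data (complete deletion) and bootstrap resampling of alignment columns, can be sketched as follows. This is an illustrative outline only and does not reproduce the software used here; the function names complete_deletion and bootstrap_replicate, the dictionary-based alignment format and the gap symbols are assumptions. Tree reconstruction itself (neighbor-joining) is not shown.

```python
import random


def complete_deletion(alignment, gap_chars="-?"):
    """Remove every column in which any sequence has a gap or missing data.

    alignment maps sequence names to aligned strings of equal length
    (this format is an assumption of the sketch).
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    kept = [col for col in zip(*seqs)
            if not any(ch in gap_chars for ch in col)]
    return {name: "".join(col[i] for col in kept)
            for i, name in enumerate(names)}


def bootstrap_replicate(alignment, rng):
    """Build one pseudoreplicate by sampling columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in picks)
            for name in names}


# Hypothetical toy alignment, for illustration only.
toy = {"seqA": "AC-GT", "seqB": "ACTG?", "seqC": "ACTGT"}
clean = complete_deletion(toy)
replicates = [bootstrap_replicate(clean, random.Random(seed))
              for seed in range(1000)]
```

Each pseudoreplicate would then be used to rebuild a tree, and the fraction of replicate trees containing a given interior branch gives its bootstrap support.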
Evolutionary distance between all possible pairs of GLI paralogs within each lineage was estimated by Tajima's relative rate test. 26

Author Contributions
KHG and AAA conceived the project and designed the study. AAA performed the analysis. AAA, KHG, DKG and SA analyzed the data. AAA, KHG, DKG and SA wrote the paper.