Abstract

Pax genes are defined by the presence of a paired box that encodes a DNA-binding domain of 128 amino acids. They are involved in the development of the central nervous system, organogenesis, and oncogenesis. The known Pax genes are divided into five groups within two supergroups. By means of a novel combination of evolutionary analysis, in vitro binding assays and in vivo functional analyses, we have identified the key residues that determine the differing DNA-binding properties of the two supergroups and of the Pax-2, 5, 8 and Pax-6 subgroups within supergroup I. The differences in binding properties between the two supergroups are largely caused by amino acid changes at residues 20 and 121 of the paired domain. Although the paired domains of the Pax-2, 5, 8 and the Pax-6 group differ by >19 amino acids, their distinct DNA-binding properties are determined almost completely by a single amino acid change. Thus, a small number of amino acid changes can account in large part for the divergence in binding properties among the known paired domains. Our approach for selecting candidate sites responsible for the functional divergence between genes should also be useful for studying other gene families.

Introduction

The evolutionary diversification of living forms arises, in part, from changes in the regulatory networks governing development. The coordinated developmental and tissue-specific expression of genes requires that transcription factors accurately recognize target DNA regulatory elements and bind them with appropriate affinity. Thus, a fundamental question is how these transcription factors and their binding sites evolve. The availability of genetic, functional, and structural data makes the Pax gene family a model system of choice for developing new strategies to address this question.

Pax genes contain a paired box that encodes a well-conserved DNA-binding domain of 128 amino acids. In mammals, nine Pax genes (denoted Pax-1 to Pax-9) have been identified, all of which have essential roles in the embryonic development of tissues and organs. Pax genes have been implicated in human congenital abnormalities (Pax-2, 3, 6, 8, 9) and are also associated with oncogenesis (Pax-3, 5, 7) (reviewed in Engelkamp and van Heyningen 1996; Dahl, Koseki, and Balling 1997; Underhill 2000). Pax gene function is required for normal development of many organs and tissues, ranging from the central nervous system (Pax-2, 3, 5, 6, 7, 8) and eye (Pax-2, 6) to the pancreas (Pax-4, 6) and B-lymphocytes (Pax-5) (reviewed in Engelkamp and van Heyningen 1996; Stuart and Gruss 1996; Dahl, Koseki, and Balling 1997; Underhill 2000). Pax genes have also been isolated from Drosophila (Bopp et al. 1986, 1989; Baumgartner et al. 1987; Noll 1993; Quiring et al. 1994; Fu and Noll 1997; Czerny et al. 1999), cnidarians (Sun et al. 1997; Gröger et al. 2000; Miller et al. 2000), and other animals such as sea urchins and Caenorhabditis elegans (Chisholm and Horvitz 1995; Czerny et al. 1997). On the basis of a phylogenetic analysis of paired domain sequences, the known Pax genes were divided into five groups within two supergroups: Pax-2, Pax-5, Pax-8, Pax-B, poxn/Pax-A, and Pax-6/ey in supergroup I; Pax-1/Pax-9/poxm and Pax-3/Pax-7/gsb/gsbn in supergroup II (Sun et al. 1997). Pax genes in the same group often display similar expression patterns, suggesting similar or partly overlapping roles in development (Chalepakis et al. 1993). The Pax genes arose by gene duplication, and their functional diversification is caused in part by divergence in the DNA-binding specificity and affinity of their paired domains. During evolution, amino acid changes in the paired domain are required to convert an ancestral DNA-binding property into a novel one.
However, many neutral or near-neutral amino acid substitutions also accumulate, making it difficult to identify the critical residues. For example, only three of the 30 amino acid differences between the Pax-5 and Pax-6 paired domains were found to be important for the difference in DNA-binding specificity between the two domains (Czerny and Busslinger 1995). Here we present an efficient strategy to overcome this difficulty and use it to identify candidate amino acid changes responsible for the differences in binding properties between groups of paired domains. The candidate changes are then tested, individually or in combination, by in vitro binding and in vivo functional assays.

Materials and Methods

Inference of Ancestral Sequences

The method of Zhang and Nei (1997) was used to infer the ancestral sequence at each internal node of the phylogenetic tree of paired domains constructed by Sun et al. (1997) (fig. 1A). Specifically, we used the fast algorithm of Zhang and Nei's Bayesian method, which they termed the distance approach. This method also provides an estimate of the accuracy of the inferred amino acid at each site.
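The idea behind Bayesian ancestral inference, and the per-site accuracy it reports, can be illustrated with a deliberately simplified sketch. It is not the Zhang and Nei distance approach: it assumes a star tree in which the ancestral node is joined directly to observed descendant residues, a Poisson (equal-input) amino acid substitution model, and uniform prior residue frequencies. The function names and branch lengths are hypothetical. The posterior probability of the most probable residue plays the role of the per-site accuracy.

```python
import math

# Hypothetical illustration only; not the Zhang-Nei distance approach.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def poisson_transition(i: str, j: str, t: float) -> float:
    """P(i -> j) along a branch of length t (expected substitutions per site)
    under the Poisson model, where any residue is replaced by each of the
    other 19 with equal probability."""
    n = len(AMINO_ACIDS)
    decay = math.exp(-n * t / (n - 1))
    return (1 + (n - 1) * decay) / n if i == j else (1 - decay) / n


def ancestral_posterior(descendants: list[str], branch_lengths: list[float]) -> dict[str, float]:
    """Posterior distribution of the residue at an ancestral node whose
    descendants are observed directly (star tree, uniform prior)."""
    prior = 1.0 / len(AMINO_ACIDS)
    weights = {}
    for a in AMINO_ACIDS:
        likelihood = 1.0
        for residue, t in zip(descendants, branch_lengths):
            likelihood *= poisson_transition(a, residue, t)
        weights[a] = prior * likelihood
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical site: three descendants carry H, H, and N on short branches.
    post = ancestral_posterior(["H", "H", "N"], [0.1, 0.1, 0.1])
    best = max(post, key=post.get)
    print(best, round(post[best], 3))
```

On a real tree the same posterior is obtained by summing over the unobserved states of the other internal nodes; the star tree avoids that step so that the calculation stays visible.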

Calculation of Relative Rates

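We developed a method for computing the relative rates of amino acid substitution at the sites that differ between the two nodes under study. The method assumes that the rate of evolution (λ) varies among residue sites according to a gamma distribution, ϕ(λ), and that the number of changes (κ) at a site with rate λ follows a Poisson process, Pλ(κ). Taking the gamma distribution to have mean 1,

$$\phi(\lambda) = \frac{\alpha^{\alpha}}{\Gamma(\alpha)}\,\lambda^{\alpha-1}e^{-\alpha\lambda}, \qquad P_{\lambda}(\kappa) = \frac{(\lambda D)^{\kappa}e^{-\lambda D}}{\kappa!},$$

the conditional distribution of λ at a site with κ changes is given by

$$\phi(\lambda \mid \kappa) = \frac{P_{\lambda}(\kappa)\,\phi(\lambda)}{\int_{0}^{\infty} P_{\lambda}(\kappa)\,\phi(\lambda)\,d\lambda} = \frac{(\alpha+D)^{\alpha+\kappa}}{\Gamma(\alpha+\kappa)}\,\lambda^{\alpha+\kappa-1}e^{-(\alpha+D)\lambda}.$$

The conditional mean of λ given κ is

$$E(\lambda \mid \kappa) = \frac{\alpha+\kappa}{\alpha+D},$$

where α is the gamma shape parameter and D the total branch length of the tree over all residue sites. D, α, and κ (the last inferred as the expected number of changes in the tree, excluding the change under study) can be estimated by the method of Gu and Zhang (1997). The relative rates are then measured by the relative rate score (S), defined by

$$S = \frac{E(\lambda \mid \kappa)}{S_m},$$

which is normalized by Sm, the smallest E(λ|κ) value among the sites compared. That is, the smallest S value is taken to be 1.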
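A minimal sketch of the relative rate score, assuming a gamma distribution of rates with mean 1 and shape α, for which the posterior mean rate at a site with κ inferred changes is E(λ|κ) = (α + κ)/(α + D). In practice α, D, and κ would come from the method of Gu and Zhang (1997); the values in the example are hypothetical.

```python
def posterior_mean_rate(kappa: float, alpha: float, total_length: float) -> float:
    """E(lambda | kappa) for a mean-1 gamma(alpha) prior and Poisson changes
    accumulated over a tree of total branch length D (total_length)."""
    return (alpha + kappa) / (alpha + total_length)


def relative_rate_scores(kappas: dict[int, float], alpha: float, total_length: float) -> dict[int, float]:
    """Relative rate score S per site, normalized so the slowest site has S = 1."""
    means = {site: posterior_mean_rate(k, alpha, total_length) for site, k in kappas.items()}
    s_min = min(means.values())
    return {site: m / s_min for site, m in means.items()}


if __name__ == "__main__":
    # Hypothetical expected numbers of changes (excluding the change under study).
    kappas = {121: 0.0, 22: 0.5, 74: 2.0}
    print(relative_rate_scores(kappas, alpha=0.5, total_length=10.0))
```

Because D enters only through the common denominator, it cancels in S, so the ranking of sites depends on α and κ alone.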

cDNA Constructs and Site-Directed Mutagenesis

The cDNA sequences of the ancestral paired domain of supergroup I (ANI), the ancestral paired domain of supergroup II (ANII), and the Pax-6 ancestral paired domain (AN6) were constructed by site-directed mutagenesis. A mouse Pax-2 paired box cDNA (gift from Dr. P. Gruss, Max Planck Institute of Biophysical Chemistry, Göttingen, Germany) was used as the template for reconstructing the ANI and ANII sequences, and a human Pax-6 paired box cDNA (gift from Dr. G. Saunders, University of Texas M. D. Anderson Cancer Center, Houston, Tex.) was used as the template for reconstructing the AN6 sequence (see fig. 1B for the amino acid sequences). Mutations were introduced into the template by sequential PCR. Pairs of complementary oligonucleotides containing the desired point mutations were used together with primers flanking the paired box to amplify overlapping fragments covering the whole paired box; these fragments were then mixed in a single tube and amplified with the flanking primers. The oligonucleotide sequences are available upon request. The reconstructed paired boxes were cloned into the EcoRI-XhoI sites of the polylinker of the pCITE 4b (+) vector (Novagen, Madison, Wis.). Their sequences were verified, and the constructs were used as templates for in vitro transcription-translation in the TNT T7–coupled reticulocyte lysate system (Promega, Madison, Wis.) according to the manufacturer's instructions.
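The sequential (overlap-extension) PCR described above can be followed at the level of sequence strings. The sketch below is illustrative only: the template, the flank length on each side of the mutated codon, and the function names are hypothetical, and practical primer design (melting temperature, length, GC content) is not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def overlap_extension(template: str, aa_pos: int, new_codon: str, flank: int = 12):
    """Simulate two-step overlap-extension mutagenesis of codon `aa_pos`
    (1-based; reading frame starts at template[0]). Returns the mutagenic
    oligo pair and the fused full-length product."""
    start = (aa_pos - 1) * 3
    if len(new_codon) != 3 or start + 3 > len(template):
        raise ValueError("invalid codon position or replacement codon")
    mutant = template[:start] + new_codon + template[start + 3:]
    lo, hi = max(0, start - flank), min(len(mutant), start + 3 + flank)
    oligo_fwd = mutant[lo:hi]                  # sense mutagenic oligo
    oligo_rev = reverse_complement(oligo_fwd)  # its complementary partner

    # Step 1: flanking forward primer + oligo_rev gives the 5' fragment;
    # oligo_fwd + flanking reverse primer gives the 3' fragment.
    left, right = mutant[:hi], mutant[lo:]
    # Step 2: the fragments anneal over the oligo region and are extended and
    # amplified with the flanking primers into the full-length mutant.
    overlap = hi - lo
    if left[-overlap:] != right[:overlap]:
        raise RuntimeError("fragments do not overlap")
    return oligo_fwd, oligo_rev, left + right[overlap:]


if __name__ == "__main__":
    # Hypothetical 6-codon template; codon 2 CAT (His) -> AAT (Asn).
    print(overlap_extension("ATGCATGGCAAATTTGGG", 2, "AAT", flank=3))
```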

Variants of the full-length eyeless cDNA for testing in flies were generated by site-directed mutagenesis following the steps outlined above. The full-length cDNAs were then cloned into pUAST (Brand and Perrimon 1993) for introduction into flies. cDNA DP6M3 encodes the full-length Eyeless protein with a single substitution at position 47 of the paired domain, replacing the Pax-6–specific asparagine (N) with the histidine (H) found in Pax-2, 5, 8. cDNA M2 encodes the full-length Eyeless protein in which the complete Eyeless paired domain is replaced by the paired domain of Pax-2. M2M3 corresponds to M2 with a single substitution at position 47 of the paired domain, replacing the histidine (H) found in Pax-2, 5, 8 with the Pax-6–specific asparagine (N).

Electrophoretic Gel Mobility Shift Assay (EMSA)

The seven test binding sequences were selected from previously tested sequences: 5S2A (originally named 5sγ2A) and PRS5 by Czerny, Schaffner, and Busslinger (1993); H2A2.2, H2B2.1, and H2B2.2 by Barberis et al. (1989); and CD19-1 and CD19-2 by Kozmik et al. (1992). The complementary oligonucleotides containing a Pax-binding site were annealed in 2 × SSC-Tris solution. One hundred and eighty femtomoles of annealed probe were labeled with 32P using the Klenow fragment and eluted into 120 μl TE through a G-50 column (Pharmacia, Piscataway, NJ). Two microliters of the labeled probe were used in each binding reaction with paired domain peptides. The amount of peptide in 2 μl of translation mixture was measured with the S-tag rapid kit (Novagen, Madison, Wis.), so that approximately equal amounts of peptide (around 0.3–0.5 pmol) were used in each binding reaction. Peptide was incubated with probe in 6.5% glycerol, 0.5% NP40, 0.5 mg/ml BSA, 0.2 mM DTT, 0.7 mM EDTA, 90 mM KCl, 15 mM Tris-HCl (pH 7.5), and 1 μg poly[dI-dC] in a total volume of 20 μl at 25°C for 30 min. The DNA-protein complexes were then resolved on 8% native Long Ranger sequencing gels (FMC, Rockland, Me.) containing 0.25% TBE.
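The probe and protein amounts given above fix the molar ratios in each binding reaction. The short calculation below works them through, assuming complete recovery of the labeled probe from the G-50 column (an assumption; recovery is not reported). It reproduces the 3 fmol of labeled probe per reaction used in the competition experiment. The function name is hypothetical.

```python
def per_reaction_amounts(
    probe_labeled_fmol: float = 180.0,
    elution_volume_ul: float = 120.0,
    probe_volume_ul: float = 2.0,
    reaction_volume_ul: float = 20.0,
    peptide_pmol: float = 0.3,
):
    """Return probe amount (fmol), probe and peptide concentrations (nM), and
    the peptide:probe molar ratio in one binding reaction (assumes 100% probe
    recovery from the column)."""
    probe_fmol = probe_labeled_fmol / elution_volume_ul * probe_volume_ul
    # 1 fmol per microliter equals 1 nM (1e-15 mol / 1e-6 L = 1e-9 M).
    probe_nm = probe_fmol / reaction_volume_ul
    peptide_nm = peptide_pmol * 1000.0 / reaction_volume_ul
    ratio = peptide_pmol * 1000.0 / probe_fmol
    return probe_fmol, probe_nm, peptide_nm, ratio


if __name__ == "__main__":
    for pmol in (0.3, 0.5):
        print(per_reaction_amounts(peptide_pmol=pmol))
```

Under these assumptions each reaction holds about 0.15 nM probe and 15–25 nM peptide, a 100- to 170-fold molar excess of peptide.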

To measure the difference in DNA-binding affinity between the mutants ANI-NVS and ANII-DIN, EMSA was carried out with fixed amounts of protein and radiolabeled probe 5S2A and increasing amounts of unlabeled 5S2A (0×, 50×, 100×, 200×, 400×, 1000×, and 2000× the 3 fmol of labeled probe). The protein amounts were 0.25 pmol for ANI-NVS and 0.26 pmol for ANII-DIN, as measured with the S-tag rapid kit. Shifted and free probe were quantitated with a phosphorimager (Molecular Dynamics, Sunnyvale, Calif.). The amount of bound probe was calculated from the distribution of signal between shifted and free probe, and a Scatchard plot was drawn from the average of three repeat experiments. Plotting the ratio of bound to free probe against bound probe gave a linear function. The Y-intercept of the regression line is proportional to P0Kr, and its slope is −Kr, where P0 is the molar concentration of active protein and Kr, which is proportional to the affinity of the protein for the probe, equals the binding affinity preference divided by the amount of poly[dI-dC]. P0 can therefore be estimated from the X-intercept (Calzone et al. 1988).
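A minimal sketch of the Scatchard analysis described above, in plain Python: the bound amount at each competitor level is obtained from the fraction of signal in the shifted band, bound/free is regressed on bound, Kr is taken as minus the slope, and P0 as the X-intercept. The data in the example are synthetic, generated from chosen Kr and P0 values to show that the fit recovers them; they are not measurements from this study, and the function names are hypothetical.

```python
def bound_and_free(total_probe: float, shifted_signal: float, free_signal: float):
    """Split total (labeled + unlabeled) probe into bound and free amounts
    using the fraction of phosphorimager signal in the shifted band."""
    fraction_bound = shifted_signal / (shifted_signal + free_signal)
    bound = total_probe * fraction_bound
    return bound, total_probe - bound


def linear_fit(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x, with r^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def scatchard(bound, free):
    """Fit bound/free = Kr * (P0 - bound); return (Kr, P0, r^2)."""
    ratios = [b / f for b, f in zip(bound, free)]
    slope, intercept, r2 = linear_fit(bound, ratios)
    kr = -slope
    p0 = intercept / kr  # X-intercept of the regression line
    return kr, p0, r2


if __name__ == "__main__":
    # Synthetic data generated with Kr = 0.002 and P0 = 200 (units of bound).
    kr_true, p0_true = 0.002, 200.0
    bound = [20.0, 50.0, 100.0, 150.0]
    free = [b / (kr_true * (p0_true - b)) for b in bound]
    print(scatchard(bound, free))
```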

Drosophila Transgenics and Genetics

dppblink-GAL4 (Staehling-Hampton et al. 1994) was used as the driver line to induce ectopic eyes on antennae, legs, and wings. UE10 and UE11 are homozygous viable insertions of UAS-eyeless on chromosomes 3 and 2, respectively (Halder, Callaerts, and Gehring 1995). Drosophila transformants for UAS-DP6M3, UAS-M2, and UAS-M2M3 were generated by P-element–mediated transformation, essentially as described by Rubin and Spradling (1982). The recipient strain for all constructs was y ac w. We obtained several independent transformants for each construct, namely M2.3, M2.9, M2.26, M2M3.31, M2M3.33, DP6M3.6, DP6M3.15, and DP6M3.17. Different insertions of a given transgene can produce different protein levels when expressed under the control of a particular GAL4 driver line. We corrected for this possible variation in several ways. First, all UAS-transgenes were overexpressed using a single GAL4 driver line, dppblink-GAL4. Second, for the in vivo experiments, we used all available independent insertions of each transgene to control for possible inter-strain variability. Third, we evaluated the protein levels reached during overexpression of the different transgenic constructs with dppblink-GAL4 by Western blotting combined with densitometry (using NIH Image software). In brief, in four independent Western blots, transgenic protein levels were compared in samples containing protein extracts of wing and leg discs (which expressed the transgenes) from three larvae per sample. Although variation did exist between independent samples of the same transgene, and between samples of different transgenes, the average protein levels were not significantly different, supporting the view that using multiple independent transformant lines per transgene averages out insertion-specific variation. Lastly, we used large sample sizes to estimate ectopic eye sizes.
We therefore attribute the differences observed between transgenes to their molecular differences (i.e., the site-directed mutations) rather than to position effects.

Determination of Drosophila Red Eye Pigment Concentration

Determination was carried out as described by Evans and Howells (1978). In brief, thoraces from five flies with ectopic eyes on wings and legs were sonicated in 400 μl of a 1:1 mixture of 0.1% NH4OH and chloroform. The samples were then centrifuged for 5 min at 14,000 rpm to remove debris. Two hundred microliters of the supernatant of each sample were split between two wells of a 96-well plate, and absorbance was read at 485 nm. For each cross, multiple samples of five flies were taken to determine red eye pigment concentrations (see fig. 6C for the number of samples).
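The pigment readings can be summarized per cross as sketched below, assuming (our assumption; the text does not specify the reduction) that the two wells of each sample are averaged and that each cross is then summarized by the mean and standard error across samples. The readings in the example are invented, and the function name is hypothetical.

```python
import math
import statistics


def summarize_cross(samples: list[tuple[float, float]]):
    """samples: one (well_1, well_2) pair of A485 readings per five-fly sample.
    Returns (mean, standard error of the mean, number of samples)."""
    per_sample = [statistics.mean(wells) for wells in samples]
    mean = statistics.mean(per_sample)
    if len(per_sample) < 2:
        return mean, float("nan"), len(per_sample)
    sem = statistics.stdev(per_sample) / math.sqrt(len(per_sample))
    return mean, sem, len(per_sample)


if __name__ == "__main__":
    # Invented readings for three five-fly samples of one cross.
    print(summarize_cross([(0.10, 0.12), (0.20, 0.22), (0.30, 0.32)]))
```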

Results

Inference and Construction of Ancestral Paired Domains

We are especially interested in the differences between supergroups I and II because they would represent a very early functional divergence between known Pax genes (fig. 1A). We are also interested in how the major groups in supergroup I, in particular the Pax-6 and Pax-2, 5, 8 groups, diversified in function. To infer the amino acid changes responsible for the divergence in binding properties between Pax groups, we first used the method of Zhang and Nei (1997) to infer the ancestral paired domain sequence at each internal node of the phylogenetic tree (fig. 1A) constructed by Sun et al. (1997). In principle, we could compare the predicted ancestral domains of the Pax-6 and Pax-2, 5, 8 groups. However, the Pax-2, 5, 8 ancestral sequence is very similar to the ancestral sequence of supergroup I, so we restricted our analysis to the latter. Moreover, by comparing the binding properties of the ancestors of supergroup I and of the Pax-6 group, we can identify the critical changes that accompanied the shift from the binding properties of the supergroup I ancestor to the novel binding properties of the Pax-6 group. The Pax-4 paired domain is not included in the analysis because it is very divergent, although it clusters with the Pax-6 group (Balczarek, Lai, and Kumar 1997). Including or excluding Pax-4 changes neither the topology of the phylogenetic tree of the paired domains nor the inferred ancestral sequences (results not shown). The predicted ancestral paired domain sequences of supergroups I and II and of the Pax-6 group are shown in figure 1B as ANI, ANII, and AN6, respectively. The average accuracy of inference at each ancestral node is higher than 90%. We also examined whether errors in ancestral inference could mislead our conclusions.
For each amino acid residue that is important in this study (positions 20, 22, 44, 47, and 121), the accuracy of inference at each of nodes ANI, ANII, and AN6 is above 99%. Thus, it is unlikely that uncertainty in ancestral inference at these sites will mislead our interpretation.

Inference of Candidate Critical Amino Acids

Because there are seven amino acid differences between ANI and ANII (fig. 1B) and 19 between AN6 and ANI, testing every difference by mutagenesis and binding assays would have been laborious. The second step of our strategy was therefore to infer good candidate sites, as follows. We first computed the relative rates of amino acid substitution at the sites that differ between the two groups under study (table 1). A site with a high rate of evolution would not be a good candidate, because a high rate would imply either frequent changes in binding properties or little functional contribution to binding. We therefore selected as candidates only those sites with very low relative evolutionary rates. For the differences between ANI and ANII, we selected the three sites with the lowest rates: 121, 22, and 20 (table 1). Site 74 has the same rate as site 20, but it is not conserved within each of the two supergroups and was therefore not selected for testing. In the comparison between AN6 and ANI, sites 44, 47, and 66 have the lowest relative rates (table 1) and were selected as candidate sites for the differences in binding specificity between AN6 and ANI.
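The selection rule above (keep the lowest-rate sites, but skip sites that are not conserved within each group) can be written as a small filter. The scores in the example are hypothetical placeholders, not the values of table 1; only their ordering mimics the text. The function name is also hypothetical.

```python
def select_candidates(scores: dict[int, float], conserved_within_groups: set[int], k: int = 3) -> list[int]:
    """Return the k sites with the lowest relative rate score S among the
    sites whose residue is conserved within each of the two groups compared."""
    eligible = sorted(
        (score, site) for site, score in scores.items() if site in conserved_within_groups
    )
    return [site for _, site in eligible[:k]]


if __name__ == "__main__":
    # Hypothetical S values; site 74 ties with site 20 but is not conserved.
    scores = {121: 1.0, 22: 1.4, 20: 2.1, 74: 2.1}
    conserved = {121, 22, 20}
    print(select_candidates(scores, conserved))  # -> [121, 22, 20]
```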

DNA-Binding Properties of Ancestral Paired Domains

We selected seven previously identified Pax-5 binding sites to test the binding properties of the ancestral paired domains. These seven sequences were identified through their binding to the Pax-5 paired domain, and representatives of different Pax groups showed distinct binding properties with this panel of sequences (Czerny, Schaffner, and Busslinger 1993). As controls, and for comparison with ANI, ANII, and AN6, we included the paired domains of mouse Pax-1 (MPD1), Pax-2 (MPD2), and Pax-3 (MPD3), human Pax-6 (HPD6), and sea nettle Pax-A (SNPDA). These five paired domains indeed showed essentially the binding properties with the seven test sequences expected from the study of Czerny and Busslinger (1995) (fig. 2). The only minor difference we observed was that the Pax-3 paired domain bound only two instead of three sequences. This is probably because we used only the paired domain instead of the whole Pax protein, and a much smaller amount of protein.

ANI bound strongly to all test sequences except H2A2.2, which was bound less strongly. Pax-2 and Pax-A (MPD2 and SNPDA) showed the same broad binding specificity with this set of test sequences, although overall they bound less strongly than ANI. In contrast, AN6 showed the same narrower binding specificity and roughly similar binding strength as the human Pax-6 paired domain (HPD6), except that human Pax-6 also bound weakly to H2B2.1. ANII showed almost no binding to the test sequences. The mouse Pax-1 paired domain (MPD1) showed modest binding to all test sequences except H2A2.2, whereas the mouse Pax-3 paired domain (MPD3) showed little if any binding to all sequences except CD19.2 and 5S2A, to which it bound modestly. Therefore, within the two supergroups, functional diversification during evolution appears to have involved changes from broad to narrow binding specificity within the test sequences (ANI to AN6 and human Pax-6), and changes in binding affinity either across multiple binding sequences (e.g., ANI to mouse Pax-2 and sea nettle Pax-A, and ANII to mouse Pax-1) or across a few sequences (ANII to mouse Pax-3). Further, changes in binding specificity can occur independently of changes in binding affinity.

Two Amino Acid Changes are Largely Responsible for the DNA-Binding Property Differences Between ANI and ANII

Figure 3A shows the effects of single mutations at sites 20, 22, or 121 on the binding patterns of ANI and ANII. No single mutation at these three sites changed the binding properties of ANI significantly, as shown by the binding patterns of ANI-NVN (D20N), ANI-DVS (N121S), and ANI-DIN (V22I). However, the changes N20D (ANII-DIS) and S121N (ANII-NIN) in ANII each greatly increased the binding strength of ANII to the test sequences, so that the binding properties of these ANII variants resembled those of ANI. In contrast, I22V in ANII (ANII-NVS) failed to significantly increase the binding of ANII to any of the test sequences.

We then tested the binding properties of ANI and ANII with combined mutations at positions 20, 22, and 121. Figure 3B shows that ANI-NVS (D20N and N121S) bound the test sequences substantially more weakly than ANI, whereas ANII-DIN (N20D and S121N), unlike ANII, bound efficiently to several sequences. When the binding of ANI-NVS and ANII-DIN to test sequence 5S2A was compared using serially diluted concentrations of 5S2A, the binding of ANI-NVS was visibly weaker than that of ANII-DIN at every concentration tested (fig. 4A). The ratio of the intensity of the shifted band to that of the free band of 5S2A was 2- to 11-fold higher for ANII-DIN than for ANI-NVS (five independent replicate assays). This range is larger than expected from experimental error alone, suggesting that different preparations of the proteins differ in their specific binding activity. Nevertheless, ANII-DIN clearly had a higher affinity for the test sequences than ANI-NVS. To minimize the effect of such variation, we repeated the EMSAs with three key pairs of ANI and ANII mutant paired domains: ANI and ANII, ANI-NVS and ANII-DIN, and ANI-NIS and ANII-DVN. ANII, ANI-NVS, and ANI-NIS all had weaker binding affinities for all test sequences than their counterparts ANI, ANII-DIN, and ANII-DVN, respectively. Thus, by mutating both sites 20 and 121, the binding properties of ANI and ANII with the test sequences could be swapped almost completely, and sites 20 and 121 appear to be the major sites responsible for the differences in DNA-binding properties between ANI and ANII. Combined mutations at sites 20 and 22 (ANI-NIN and ANII-DVS) or at sites 22 and 121 (ANI-DIS and ANII-NVN) had much weaker effects on binding than those at sites 20 and 121. When all three sites were mutated, ANI-NIS showed weak binding to the sequences, similar to ANI-NVS.
However, the binding differences between ANI and ANII lay mainly in binding affinity, and none of these mutations could convert ANI completely into ANII.
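The variant names used above encode the residues at the tested sites: ANI-NVS, for example, lists the residues at positions 20, 22, and 121 after applying D20N and N121S to ANI. The sketch below derives such labels from the ancestral residues implied by the names in the text (ANI: D20, V22, N121; ANII: N20, I22, S121); the same scheme applies to sites 44, 47, and 66 in the ANI/AN6 comparison (ANI: R44, H47, G66; AN6: Q44, N47, R66). The dictionary and function names are hypothetical.

```python
import re

# Residues at the tested sites, as implied by the variant names in the text.
ANCESTRAL = {
    "ANI": {20: "D", 22: "V", 44: "R", 47: "H", 66: "G", 121: "N"},
    "ANII": {20: "N", 22: "I", 121: "S"},
    "AN6": {44: "Q", 47: "N", 66: "R"},
}


def parse_substitution(code: str) -> tuple[str, int, str]:
    """Split a substitution such as 'D20N' into ('D', 20, 'N')."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a substitution: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def variant_label(ancestor: str, substitutions: list[str], sites=(20, 22, 121)) -> str:
    """Apply substitutions to an ancestral paired domain and name the variant
    by its residues at `sites`."""
    residues = dict(ANCESTRAL[ancestor])
    for code in substitutions:
        old, pos, new = parse_substitution(code)
        if residues.get(pos) != old:
            raise ValueError(f"{ancestor} has {residues.get(pos)} at {pos}, not {old}")
        residues[pos] = new
    return ancestor + "-" + "".join(residues[s] for s in sites)


if __name__ == "__main__":
    print(variant_label("ANI", ["D20N", "N121S"]))             # ANI-NVS
    print(variant_label("ANII", ["N20D", "S121N"]))            # ANII-DIN
    print(variant_label("AN6", ["N47H"], sites=(44, 47, 66)))  # AN6-QHR
```

Checking the original residue before each substitution catches mislabeled mutations, such as applying N20D to ANI.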

For a more quantitative comparison of DNA binding by ANI-NVS and ANII-DIN, competitive binding assays were performed using the 5S2A probe, which showed the strongest binding with ANI-NVS (fig. 4B). Seven concentrations of unlabeled competitor probe were used (from 0 fmol to 6 pmol). For ANI-NVS, the points at the two highest competitor concentrations (3 and 6 pmol) deviated strongly from the linear regression (r² = 0.58). Because the radioactive shifted bands at these two concentrations were very weak, large measurement errors may have occurred, accounting for the poor fit. These two points were therefore removed from the regression, which gave reasonably good linear fits with r² = 0.9 for ANI-NVS and r² = 0.92 for ANII-DIN (fig. 4C and D). The Y-intercepts were 0.21 for ANI-NVS and 0.55 for ANII-DIN, and the slopes (= −Kr) were −1.11 × 10⁻³ for ANI-NVS and −2.28 × 10⁻³ for ANII-DIN. Comparison of the slopes shows that ANII-DIN had approximately twofold higher affinity for 5S2A than ANI-NVS.
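The affinity comparison follows directly from the reported regression parameters. In the sketch below, Kr is minus the slope and P0 is the X-intercept (Y-intercept divided by Kr); P0 is expressed in the units of the bound-probe axis of fig. 4C and D, which are not restated here. The function name is hypothetical.

```python
# Regression parameters reported for the Scatchard plots of the 5S2A probe.
FITS = {
    "ANI-NVS": {"y_intercept": 0.21, "slope": -1.11e-3},
    "ANII-DIN": {"y_intercept": 0.55, "slope": -2.28e-3},
}


def kr_and_p0(y_intercept: float, slope: float) -> tuple[float, float]:
    """Kr = -slope; P0 = X-intercept = y_intercept / Kr."""
    kr = -slope
    return kr, y_intercept / kr


if __name__ == "__main__":
    params = {name: kr_and_p0(**fit) for name, fit in FITS.items()}
    for name, (kr, p0) in params.items():
        print(f"{name}: Kr = {kr:.2e}, P0 = {p0:.0f}")
    ratio = params["ANII-DIN"][0] / params["ANI-NVS"][0]
    print(f"Kr ratio (ANII-DIN / ANI-NVS) = {ratio:.2f}")
```

The Kr ratio is about 2.05, consistent with the roughly twofold difference in affinity stated above.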

In Vitro and In Vivo Analysis of the Evolutionary Divergence Within Supergroup I: A Single Critical Amino Acid Change Between ANI and AN6

Figure 5 shows the effects of the substitutions R44Q, H47N, and G66R in ANI (individually or in combination) and of the reciprocal substitutions Q44R, N47H, and R66G in AN6. The binding pattern of ANI-RNG was very similar to that of AN6, except that it also showed weak, HPD6-like binding to H2B2.1, whereas the broad binding specificity of AN6-QHR was very similar to that of ANI. This finding suggests that an amino acid change at site 47 (H47N in ANI or N47H in AN6) alone can almost completely swap the binding patterns of ANI and AN6. Site 44 is less important: R44Q (ANI-QHG) had only a slight quantitative effect on the binding properties of ANI, and the reciprocal change Q44R (AN6-RNR) caused weak binding to H2B2.1 and had only a slight quantitative effect on binding to CD19.1 and H2B2.2. In combination, R44Q and H47N in ANI (ANI-QNG) conferred the same binding properties as AN6, whereas Q44R and N47H in AN6 (AN6-RHR) produced only a modest increase in affinity relative to the change at site 47 alone (AN6-QHR). Adding a change at site 66 had a modest quantitative effect on the binding properties of ANI and AN6 (fig. 5).

We next studied the significance of amino acid 47 of the paired domain in vivo, using the induction of ectopic eyes in Drosophila as a bioassay (Halder, Callaerts, and Gehring 1995) (fig. 6A). Numerous studies have shown that ectopic expression of eyeless and of Pax-6 homologs from other species by means of the GAL4-UAS system (Brand and Perrimon 1993) results in the formation of supernumerary eyes (Halder, Callaerts, and Gehring 1995; Tomarev et al. 1997; Glardon et al. 1997). Notably, mouse Pax-6 induces smaller ectopic eyes than eyeless (Halder, Callaerts, and Gehring 1995). Ectopic expression of a second Drosophila Pax-6 homolog, twin of eyeless (toy), also induces smaller eyes (Czerny et al. 1999), and the same authors demonstrated that the Eyeless and Toy paired domains have distinct DNA-binding properties in vitro. These observations provided the basis for our in vivo analysis of paired domain evolution. Specifically, our rationale was that any major change in the DNA-binding properties of Pax-6, as suggested in vitro for N47H, would interfere with the normal induction of target genes regulated by Eyeless and thus result in smaller ectopic eyes or none at all. In a first set of experiments, we compared the size and frequency of ectopic eyes on antennae, wings, and legs induced by overexpressing Eyeless (transgene UE in fig. 6B) or Eyeless[N47H] (transgene DP6M3 in fig. 6B). In a second set of experiments, we made a similar comparison of ectopic eyes induced by overexpressing Eyeless[Pax2] (transgene M2 in fig. 6B), in which the complete paired box of eyeless was replaced with that of mouse Pax-2, or Eyeless[Pax2-H47N] (transgene M2M3 in fig. 6B). The results were scored (1) by comparing eye size on a relative scale from one (small) to three (large) on antennae, wings, and legs (fig. 6B) and (2) by comparing eye pigment concentrations in thoraces carrying wings and legs with ectopic eyes (fig. 6C). Scoring of the relative sizes of ectopic eyes on antennae, legs, and wings revealed that the formation of ectopic eyes on the antennae is the most sensitive to alterations in the transgenes. The wild-type UAS-eyeless transgene (UE) yielded antennal eyes in 100% of cases. In contrast, the N47H mutation in DP6M3 led to ectopic antennal eyes in only 25% of cases. Swapping the Eyeless paired domain for the Pax-2 paired domain (transgene M2) resulted in an almost complete absence of antennal ectopic eyes. This effect was largely reversed by the H47N mutation (transgene M2M3), which yielded ectopic antennal eyes in 77% of cases. The effects on wings and legs were less dramatic than those on the antennae, but the transgenes still differed clearly in both the size and the presence of ectopic eyes. Scoring of eye sizes on legs and wings clearly indicated a pronounced quantitative effect. To further evaluate the size and frequency of ectopic eye induction on wings and legs, we measured red eye pigment quantitatively. The adult thorax carries wings, legs, and halteres, which in wild-type flies contain no eye pigments. We therefore used the red eye pigment concentration in whole thoraces with wings and legs (bearing ectopic eyes) as a measure of ectopic eye size. The pigment data agreed well with the relative eye-size scores described above. Statistically significant decreases were observed when the N47H mutation was introduced into the Eyeless paired domain (compare DP6M3 with UE) or when the Eyeless paired domain was replaced with the Pax-2 paired domain (compare M2 with UE). Similarly, a statistically significant increase was observed when the H47N mutation was introduced into the Pax-2 paired domain (compare M2M3 with M2).
The overall conclusion of both sets of experiments is that a change toward the Pax-6–specific N at position 47 of the paired domain leads to larger or more frequent ectopic eyes (or both), whereas smaller or fewer ectopic eyes are observed when the change is toward the H47 present in Pax-2, 5, 8. Remarkably, a complete replacement of the Eyeless paired domain with the Pax-2 paired domain did not abolish ectopic eye induction entirely but instead led to the induction of fewer and smaller ectopic eyes. This result strongly suggests that a single amino acid change can confer considerable specificity, but that the interaction of paired domains with their in vivo binding sites may be more flexible than anticipated.
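The text reports statistically significant differences in pigment concentration between crosses but does not name the test used. As one possible choice, and not necessarily the one used in this study, an exact two-sample permutation test on per-sample absorbance values can be sketched as follows; the readings in the example are invented and the function name is hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a: list[float], b: list[float]) -> float:
    """Exact two-sided permutation test for a difference in means.
    Enumerates every split of the pooled values into groups of the original
    sizes, so it is only practical for small samples."""
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), len(a)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        # Small tolerance so splits equal to the observed difference count.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


if __name__ == "__main__":
    # Invented A485 readings for two crosses (one value per five-fly sample).
    cross_1 = [0.31, 0.29, 0.35, 0.33]
    cross_2 = [0.12, 0.15, 0.10, 0.14]
    print(permutation_p_value(cross_1, cross_2))
```

With four samples per cross there are only 70 possible splits, so the smallest attainable two-sided p-value is 2/70; larger sample sizes give finer resolution.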

Discussion

Divergence of Ancestral Supergroups I and II

The DNA-binding properties of the ANI and ANII paired domains are strongly affected by residues 20 and 121, although mutations at the other sites that differ between ANI and ANII also appear to have subtle effects. The paired domain consists of N-terminal and C-terminal subdomains, each of which forms a helix-turn-helix structure (Xu et al. 1995, 1999). Residue 20 is the first residue of the first helix of the N-terminal subdomain and may affect the minor groove contacts made by the β-turn residues at the very N-terminus of the paired domain (Xu et al. 1999). Residue 121 is located in the third helix of the C-terminal subdomain. In Pax-6, asparagine 121 makes a water-mediated major groove contact with a guanine (Xu et al. 1999). In about half of the paired domains, residue 121 is a serine, which according to Xu et al. (1999) can readily make similar DNA contacts. Our data suggest that, at least in vitro, this residue nevertheless has a fairly strong effect on DNA-binding affinity. It is therefore possible that the primary effect of substitutions at positions 20, 22, and 121 is on binding affinity. At the transcriptional level, this could result in the selective use of high-affinity binding sites, with consequent loss of expression of target genes that are regulated primarily by low-affinity sites. An alternative explanation is that N20 and S121 in ANII together produce a structure that cannot bind the test sequences, and that a substitution at either site alters this structure and makes the protein capable of binding.

Because some Pax proteins also contain a homeodomain and because the paired domain and the homeodomain may interact with each other in DNA-binding (Underhill, Vogan, and Gros 1995 ; Fortin, Underhill, and Gros 1998 ), our study represents a simplified approach to simulate the process of the evolution of Pax DNA-binding properties. Nevertheless, our study has provided novel insight into the evolution of paired domains. The ancestors of supergroups I and II have very different binding properties to the panel of test sequences used in this study, and two amino acid substitutions have dominant effects on swapping their binding properties. Because there is no reliable root to the phylogenetic tree of Pax genes, we cannot predict the sequence of the common ancestor of all Pax genes and its binding properties to complete the whole picture of early Pax evolution. However, it is intriguing to speculate that gene duplication of a common ancestor gave rise to the two ancestors of supergroups I and II, and these two ancestor genes mutated at positions 20 and 121 and acquired different DNA-binding properties to initiate the differentiation of the two supergroups. It is possible that the DNA-binding property of the common ancestor resembles the ANI, and the accumulated mutations in sites 20, 121 (and others) might have gradually changed the protein structure to acquire new DNA-binding properties. As measured by the induction of ectopic eyes, our in vivo data clearly demonstrate that even a complete replacement of the paired domain of Eyeless by the one from Pax-2 does not abolish biological activity. This clearly suggests that in vivo, there is significant flexibility with regard to target-binding site recognition and utilization. At the same time it suggests a possible way through which gradual changes can accumulate without necessarily being detrimental to the organism. 
Following a gene duplication, one can envisage that the two copies initially regulate largely the same targets, but that over time their target gene populations diverge. Lastly, our data indicate that the common ancestor of supergroup I had DNA-binding properties very similar to those of modern members of the Pax-2, 5, 8 group, such as Pax-2. Moreover, the inferred ancestral paired domain sequence is very similar to the modern Pax-5 paired domain. Thus, the Pax-2, 5, 8 group appears to have an ancient origin, and the original function may still be well preserved in some of its modern members.
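Statements about "inferred ancestral" residues rest on reconstructing states at internal nodes of the tree. Zhang and Nei (1997) compared parsimony, likelihood, and distance methods for this task; the sketch below shows only the simplest of these, Fitch parsimony, for a single alignment column, and is not necessarily the method used for the reconstructions in this study. The five-leaf topology, its leaf names, and the residues are hypothetical, and the tree is rooted between the two supergroups purely for convenience, since no reliable root is available.

```python
from typing import Dict, Optional, Set, Tuple, Union

# A tree is either a leaf name or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_sets(tree: Tree, leaf_states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Bottom-up pass: candidate residues at this node and minimum changes below it."""
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_sets(left, leaf_states)
    right_set, right_cost = fitch_sets(right, leaf_states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Disjoint child sets force one extra substitution on this node's branches.
    return left_set | right_set, left_cost + right_cost + 1


def assign_states(tree: Tree, leaf_states: Dict[str, str],
                  parent_state: Optional[str] = None,
                  result: Optional[Dict[Tree, str]] = None) -> Dict[Tree, str]:
    """Top-down pass: choose one residue per internal node (one parsimonious solution)."""
    if result is None:
        result = {}
    if isinstance(tree, str):
        return result
    candidates, _ = fitch_sets(tree, leaf_states)
    # Keep the parent's residue when allowed; otherwise break ties alphabetically.
    state = parent_state if parent_state in candidates else sorted(candidates)[0]
    result[tree] = state
    for child in tree:
        assign_states(child, leaf_states, state, result)
    return result


# Hypothetical topology and residues at one alignment column (not data from fig. 1).
SUPERGROUP_I = (("I-1", "I-2"), "I-3")
SUPERGROUP_II = ("II-1", "II-2")
TREE = (SUPERGROUP_I, SUPERGROUP_II)
RESIDUES = {"I-1": "H", "I-2": "H", "I-3": "N", "II-1": "H", "II-2": "H"}

root_set, changes = fitch_sets(TREE, RESIDUES)
states = assign_states(TREE, RESIDUES)
print("root candidates:", sorted(root_set), "minimum changes:", changes)
print("supergroup I ancestor:", states[SUPERGROUP_I])
```

Here a single H-to-N change on the branch to leaf I-3 explains the column, and the supergroup I ancestor is reconstructed as H. Likelihood and distance methods weigh branch lengths and substitution rates, which parsimony ignores.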

Evolution of the Paired Domain Within Supergroup I: Residue 47 is a Key Factor in the Divergence of the Pax-2, 5, 8 and Pax-6 Groups

Our novel approach, combining evolutionary analysis, in vitro binding assays, and in vivo ectopic eye induction, identified residue 47 of the paired domain as the most critical determinant of the difference in sequence recognition between ANI and AN6. Previous structural and in vitro binding studies also identified an essential role for residue 47 (Czerny and Busslinger 1995; Xu et al. 1995, 1999). The agreement of our results with previous reports, together with the novel residues identified in the current study as important for the divergence of the two supergroups (ANI and ANII), argues strongly for the validity and potentially wide usefulness of our multidisciplinary approach. Residue 47 is the first residue of the recognition helix (helix 3). In human Pax-6, residue 47 is an asparagine (N47), whereas Drosophila Paired (Prd) has a histidine at this position (H47). These two residues were shown to interact with DNA in markedly different ways. H47 in Paired forms hydrogen bonds with a guanine at position 4 of the DNA consensus oligonucleotide used for crystallization (Xu et al. 1995). In contrast, N47 in Pax-6 recognizes an AT base pair by means of a van der Waals contact with a thymine at position 4 and a water-mediated contact with the phosphate of the thymine at position 2 (Xu et al. 1999). These different interactions with DNA form the structural basis for the sequence specificity observed in the current study, as well as in two other studies. Jun and Desplan (1996) demonstrated that reciprocal mutations at residue 47 in the Pax-6 and Prd paired domains changed their preferred binding specificity for two consensus binding sequences, PrdL and Pax-6L. Earlier, Czerny and Busslinger (1995) had shown that a combination of changes at sites 42, 44, and 47 can completely swap the binding patterns of Pax-5 and Pax-6. In the current study, we demonstrated that residue 47 is the most critical difference between Pax-6 and Pax-2, 5, 8 within supergroup I.
However, our evolutionary analysis revealed that site 42 is not a good candidate site, because its rate of evolution is relatively high. Indeed, although ANI-RNG, ANI-QNG, and ANI-QNR have Q at position 42 whereas AN6 has L (fig. 1B), they show basically the same binding properties (fig. 2).

As position 47 is mainly responsible for the divergence in DNA-binding properties between the Pax-6 lineage and the common ancestor of supergroup I, it is interesting to note that Pax-4 also has an N at site 47, further supporting the clustering of Pax-4 with Pax-6 (Balczarek, Lai, and Kumar 1997). We propose that the substitution H47N occurred early in one of the duplicated copies of the supergroup I ancestor and gave rise to the common ancestor of the Pax-6/Pax-4 group. Later, gene duplication and further amino acid substitutions gave rise to the Pax-6 and Pax-4 lineages. In the formation of the Pax-6 lineage, the substitution R44Q may have helped to further define the functional specificity of Pax-6; however, H47N remains the most crucial step in the formation of this lineage. As for the Pax-4 lineage, of all the critical amino acid residues we examined, only position 20 differs between Pax-4 and Pax-6. The fact that most amino acid changes found in Pax-4 are unique implies that the distinctive properties of the Pax-4 paired domain evolved after its divergence from the common ancestor of Pax-6 and Pax-4. Further efforts to identify the critical amino acid substitutions that led to the Pax-4 lineage will complement our knowledge of the functional evolution of this important Pax group.

David Irwin, Reviewing Editor

Keywords: paired domains; binding assays; in vivo tests; DNA-binding properties; functional evolution

Address for correspondence and reprints: Wen-Hsiung Li, Department of Ecology and Evolution, University of Chicago, 1101 East 57th Street, Chicago, Illinois 60637. E-mail: whli@uchicago.edu

Table 1 The Relative Rate Score (S) of Amino Acid Substitutions Between ANI and ANII and Between ANI and AN6

Table 2 Constructs with Mutations in Each Candidate Site

Fig. 1.—A, A simplified tree topology for the five groups of paired domains. The ancestral paired domains of supergroup I (ANI), the Pax-6 group (AN6), and supergroup II (ANII) are indicated by arrows. A more comprehensive phylogenetic analysis including more recent data (e.g., sea urchin Pax genes; Czerny et al. 1997) gave essentially the same result. B, The inferred ancestral sequences.

Fig. 2.—EMSA of the representative paired domains from five Pax groups and three ancestral paired domains with seven Pax-5 binding sequences

Fig. 3.—A, EMSA of ancestral paired domains I (ANI) and II (ANII) carrying a single mutation at position 20, 22, or 121, with the binding sequences. B, EMSA of ANI and ANII with combined mutations at positions 20, 22, and 121, with the binding sequences

Fig. 4.—A, EMSA of ANI-NVS and ANII-DIN with test sequence 5S2A at serially diluted concentrations (from the concentration used in the other EMSAs down to 1/2, 1/4, 1/8, or 1/16 of it). B = bound, U = unbound. B, Competitive binding assay of 5S2A with ANI-NVS and ANII-DIN. The cold probe concentrations are indicated at the top. Linear regression of the competitive binding assay of ANI-NVS (C) and ANII-DIN (D) with 5S2A. The amounts of bound and unbound probe, including both cold and hot probe, were calculated from the distribution of hot probe between the bound and unbound fractions on the EMSA
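The calculation described for panels C and D can be sketched as follows, assuming that hot and cold probe bind identically, so that the bound fraction of hot probe applies to all probe. The legend does not state which linearization was regressed; the sketch assumes a Scatchard plot (B/F against B, slope -1/Kd, intercept Bmax/Kd). The protein level, Kd, and probe totals are hypothetical and are used only to generate self-consistent band counts, which the fit should then recover.

```python
import math


def bound_and_free(total_probe_nM, hot_bound, hot_free):
    """Split total (hot + cold) probe into bound and free concentrations.

    Assumes hot and cold probe bind identically, so the bound fraction of
    the hot probe equals the bound fraction of all probe.
    """
    frac_bound = hot_bound / (hot_bound + hot_free)
    bound = frac_bound * total_probe_nM
    return bound, total_probe_nM - bound


def least_squares(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def scatchard_fit(bound, free):
    """Fit B/F = (Bmax - B) / Kd; returns (Kd, Bmax)."""
    ratios = [b / f for b, f in zip(bound, free)]
    slope, intercept = least_squares(bound, ratios)
    kd = -1.0 / slope
    return kd, intercept * kd


def simulate_hot_counts(total_probe_nM, protein_nM, kd_nM, hot_total=1.0e4):
    """Hypothetical EMSA band counts from an exact 1:1 binding equilibrium."""
    s = total_probe_nM + protein_nM + kd_nM
    bound = (s - math.sqrt(s * s - 4.0 * total_probe_nM * protein_nM)) / 2.0
    frac = bound / total_probe_nM
    return frac * hot_total, (1.0 - frac) * hot_total


# Hypothetical protein level, Kd, and hot + cold probe totals (not data from fig. 4).
PROTEIN_NM, KD_NM = 5.0, 2.0
TOTAL_PROBE_NM = [1.0, 2.0, 4.0, 8.0, 16.0]

bound, free = [], []
for total in TOTAL_PROBE_NM:
    hot_b, hot_f = simulate_hot_counts(total, PROTEIN_NM, KD_NM)
    b, f = bound_and_free(total, hot_b, hot_f)
    bound.append(b)
    free.append(f)

kd_est, bmax_est = scatchard_fit(bound, free)
print(f"Kd ~ {kd_est:.2f} nM, Bmax ~ {bmax_est:.2f} nM")
```

Because the simulated data follow the 1:1 model exactly, the fit returns the assumed Kd and active protein concentration; with real gel data the scatter around the line and any curvature would indicate how well a single-site model applies.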

Fig. 5.—EMSA of different mutated paired domains (table 2) derived from the ancestral paired domain I (ANI) and ancestral paired domain 6 (AN6) with the same panel of test sequences used in figure 2. The right panel indicates the amino acid change(s) in ANI or AN6

Fig. 6.—In vivo analysis of the role of residue 47 of the paired domain. A, Ectopic eyes on legs (encircled) and wing (arrow) of an adult fly of the genotype dppblink-GAL4; UAS-DP6M3. The dppblink-GAL4 driver line is expressed in the dpp pattern in imaginal discs. In cells where GAL4 protein is produced, the UAS-DP6M3 transgene is transcribed, producing Eyeless protein carrying the N47H mutation. This Eyeless protein activates the eye developmental pathway, resulting in the formation of ectopic eyes. B, Summary of ectopic eye induction with different transgenes. All ectopic eyes were induced by driving the various transgenes with the dppblink-GAL4 driver line. Transgenes: UE = UAS-Eyeless; DP6M3 = UAS-Eyeless[N47H]; M2 = UAS-Eyeless[Pax-2]; M2M3 = UAS-Eyeless[Pax-2 H47N]. The results for each transgene are pooled data from experiments with independent transgenic lines (see Materials and Methods for the list of transgenic lines used). #Animal: number of animals examined; Size: relative size of the ectopic eyes induced on antennae (A), wings (W), and legs (L), ranging from no eye through very small (1) to large (3). Relative size was determined by comparison with the maximum eye sizes found on the corresponding appendages in flies overexpressing wild type Eyeless (UE). The numbers in columns A, W, and L indicate the percentage of flies that displayed ectopic eyes of a particular relative size on antennae, wings, and legs. C, Red eye pigment concentrations in extracts of thoraces with ectopic eyes on wings and legs; pigment concentration is used as a measure of eye size. Changes from wild type Eyeless resulted in statistically significant reductions in red eye pigment concentration. Replacing N47 of the Eyeless paired domain (transgene UE) with H47 (transgene DP6M3) reduced the pigment concentration to 66% of wild type (P < 0.01).
A complete replacement of the Eyeless paired domain (transgene UE) with the Pax-2 paired domain (transgene M2) reduced this value further, to 55% of wild type (P < 0.001). A single amino acid change at position 47 of the Pax-2 paired domain (transgene M2), from H47 to N47 (transgene M2M3), increased the red eye pigment concentration almost to the value induced by wild type Eyeless protein (96%; compare transgenes UE and M2M3). This increase is highly significant (P < 0.001). n indicates the number of samples, each consisting of five thoraces, taken from independent transgene insertions (see Materials and Methods for details). P values were determined with the unpaired t-test
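The two quantities reported in panel C, pigment level as a percentage of wild type and an unpaired t-test between transgenes, can be sketched as below. The absorbance readings are hypothetical (one value per sample of five pooled thoraces), and the sketch assumes the classical Student's test with pooled variance; the legend says only "unpaired t-test", so a Welch test is also possible. The code computes the t statistic and degrees of freedom with the standard library only; `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` gives the same Student's test, would also return a two-sided P value.

```python
import math
from statistics import mean, variance


def percent_of_wild_type(sample, wild_type):
    """Mean of `sample` as a percentage of the mean of `wild_type`."""
    return 100.0 * mean(sample) / mean(wild_type)


def unpaired_t(a, b):
    """Two-sample Student's t with pooled variance; returns (t, degrees of freedom)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # statistics.variance is the sample (n - 1) variance.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(a) - mean(b)) / std_err, df


# Hypothetical absorbance readings, one per sample of five pooled thoraces.
UE = [0.50, 0.52, 0.48, 0.51, 0.49]
M2 = [0.27, 0.28, 0.26, 0.29, 0.275]

t_stat, df = unpaired_t(UE, M2)
print(f"M2 = {percent_of_wild_type(M2, UE):.1f}% of UE; t = {t_stat:.2f}, df = {df}")
```

These invented readings were chosen so that M2 comes out at 55% of UE; the P values in the figure are the authors' and are not reproduced by this example.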

This study was supported by grants from NIH (GM 57721, HD38387, GM 30998), the Advanced Research Program (ARP) of the Texas Higher Education Coordinating Board, and the University of Houston. H.S. is a recipient of the Schissler Scholarship for Human Genetics. We thank Drs. P. Gruss and G. Saunders for the Pax cDNA probes.

References

Balczarek K. A., Z. C. Lai, S. Kumar. 1997. Evolution of functional diversification of the paired box (Pax) DNA-binding domains. Mol. Biol. Evol. 14:829-842.

Barberis A., G. Superti-Furga, L. Vitelli, I. Kemler, M. Busslinger. 1989. Developmental and tissue-specific regulation of a novel transcription factor of the sea urchin. Genes Dev. 3:663-675.

Baumgartner S., D. Bopp, M. Burri, M. Noll. 1987. Structure of two genes at the gooseberry locus related to the paired gene and their spatial expression during Drosophila embryogenesis. Genes Dev. 1:1247-1267.

Bopp D., M. Burri, S. Baumgartner, G. Frigerio, M. Noll. 1986. Conservation of a large protein domain in the segmentation gene paired and in functionally related genes of Drosophila. Cell 47:1033-1040.

Bopp D., E. Jamet, S. Baumgartner, M. Burri, M. Noll. 1989. Isolation of two tissue-specific Drosophila paired box genes, Pox meso and Pox neuro. EMBO J. 8:3447-3457.

Brand A. H., N. Perrimon. 1993. Targeted gene expression as a means of altering cell fates and generating dominant phenotypes. Development 118:401-415.

Calzone F. J., N. Theze, P. Thiebaud, R. L. Hill, R. J. Britten, E. H. Davidson. 1988. Developmental appearance of factors that bind specifically to cis-regulatory sequences of a gene expressed in the sea urchin embryo. Genes Dev. 2:1074-1088.

Chalepakis G., A. Stoykova, J. Wijnholds, P. Tremblay, P. Gruss. 1993. Pax: gene regulators in the developing nervous system. J. Neurobiol. 24:1367-1384.

Chisholm A. D., H. R. Horvitz. 1995. Patterning of the Caenorhabditis elegans head region by the Pax-6 family member vab-3. Nature 377:52-55.

Czerny T., M. Bouchard, Z. Kozmik, M. Busslinger. 1997. The characterization of novel Pax genes of the sea urchin and Drosophila reveal an ancient evolutionary origin of the Pax2/5/8 subfamily. Mech. Dev. 67:179-192.

Czerny T., M. Busslinger. 1995. DNA-binding and transactivation properties of Pax-6: three amino acids in the paired domain are responsible for the different sequence recognition of Pax-6 and BSAP (Pax-5). Mol. Cell. Biol. 15:2858-2871.

Czerny T., G. Halder, U. Kloter, A. Souabni, W. J. Gehring, M. Busslinger. 1999. Twin of eyeless, a second Pax-6 gene of Drosophila, acts upstream of eyeless in the control of eye development. Mol. Cell 3:297-307.

Czerny T., G. Schaffner, M. Busslinger. 1993. DNA sequence recognition by Pax proteins: bipartite structure of the paired domain and its binding site. Genes Dev. 7:2048-2061.

Dahl E., H. Koseki, R. Balling. 1997. Pax genes and organogenesis. BioEssays 19:755-765.

Engelkamp D., V. van Heyningen. 1996. Transcription factors in disease. Curr. Opin. Genet. Dev. 6:334-342.

Evans B. A., A. J. Howells. 1978. Control of drosopterin synthesis in Drosophila melanogaster: mutants showing an altered pattern of GTP cyclohydrolase activity during development. Biochem. Genet. 16:13-26.

Fortin A. S., D. A. Underhill, P. Gros. 1998. Helix 2 of the paired domain plays a key role in the regulation of DNA-binding by the Pax-3 homeodomain. Nucleic Acids Res. 26:4574-4581.

Fu W., M. Noll. 1997. The Pax2 homolog sparkling is required for development of cone and pigment cells in the Drosophila eye. Genes Dev. 11:2066-2078.

Glardon S., P. Callaerts, G. Halder, W. J. Gehring. 1997. Conservation of Pax-6 in a lower chordate, the ascidian Phallusia mammillata. Development 124:817-825.

Gröger H., P. Callaerts, W. J. Gehring, V. Schmid. 2000. Characterization and expression analysis of an ancestor-type Pax gene in the hydrozoan jellyfish Podocoryne carnea. Mech. Dev. 94:157-169.

Gu X., J. Zhang. 1997. A simple method for estimating the parameter of substitution rate variation among sites. Mol. Biol. Evol. 14:1106-1113.

Halder G., P. Callaerts, W. Gehring. 1995. Induction of ectopic eyes by targeted expression of the eyeless gene in Drosophila. Science 267:1788-1792.

Jun S., C. Desplan. 1996. Cooperative interactions between paired domain and homeodomain. Development 122:2639-2650.

Kozmik Z., S. Wang, P. Dorfler, B. Adams, M. Busslinger. 1992. The promoter of the CD19 gene is a target for the B-cell-specific transcription factor BSAP. Mol. Cell. Biol. 12:2662-2672.

Miller D. J., D. C. Hayward, J. S. Reece-Hoyes, I. Scholten, J. Catmull, W. J. Gehring, P. Callaerts, J. E. Larsen, E. E. Ball. 2000. Pax gene diversity in the basal cnidarian Acropora millepora (Cnidaria, Anthozoa): implications for the evolution of the Pax gene family. Proc. Natl. Acad. Sci. USA 97:4475-4480.

Noll M. 1993. Evolution and role of Pax genes. Curr. Opin. Genet. Dev. 3:595-605.

Quiring R., U. Walldorf, U. Kloter, W. Gehring. 1994. Homology of the eyeless gene of Drosophila to the Small eye gene in mice and Aniridia in humans. Science 265:785-789.

Rubin G. M., A. C. Spradling. 1982. Genetic transformation of Drosophila using transposable element vectors. Science 218:341-353.

Staehling-Hampton K., P. D. Jackson, M. J. Clark, A. H. Brand, F. M. Hoffmann. 1994. Specificity of bone morphogenetic protein-related factors: cell fate and gene expression changes in Drosophila embryos induced by decapentaplegic but not 60A. Cell Growth Differ. 5:585-593.

Stuart E. T., P. Gruss. 1996. PAX: developmental control genes in cell growth and differentiation. Cell Growth Differ. 7:405-412.

Sun H., A. Rodin, Y. Zhou, D. P. Dickinson, D. E. Harper, D. Hewett-Emmett, W.-H. Li. 1997. Evolution of paired domains: isolation and sequencing of jellyfish and hydra Pax genes related to Pax-5 and Pax-6. Proc. Natl. Acad. Sci. USA 94:5156-5161.

Tomarev S., P. Callaerts, L. Kos, R. Zinovieva, G. Halder, W. J. Gehring, J. Piatigorsky. 1997. Squid Pax-6 and eye development. Proc. Natl. Acad. Sci. USA 94:2421-2426.

Underhill D. A. 2000. Genetic and biochemical diversity in the Pax gene family. Biochem. Cell Biol. 78:629-638.

Underhill D. A., K. J. Vogan, P. Gros. 1995. Analysis of the mouse Splotch-delayed mutation indicates that the Pax-3 paired domain can influence homeodomain DNA-binding activity. Proc. Natl. Acad. Sci. USA 92:3692-3696.

Xu W., M. A. Rould, S. Jun, C. Desplan, C. O. Pabo. 1995. Crystal structure of a paired domain-DNA complex at 2.5 Å resolution reveals structural basis for Pax developmental mutations. Cell 80:639-650.

Xu H. E., M. A. Rould, W. Xu, J. A. Epstein, R. L. Maas, C. O. Pabo. 1999. Crystal structure of the human Pax6 paired domain-DNA complex reveals specific roles for the linker region and carboxy-terminal subdomain in DNA binding. Genes Dev. 13:1263-1275.

Zhang J., M. Nei. 1997. Accuracies of ancestral amino acid sequences inferred by the parsimony, likelihood, and distance methods. J. Mol. Evol. 44(Suppl. 1):S139-S146.