DNA-binding study identifies C-box and hybrid C/G-box or C/A-box motifs as high-affinity binding sites for STF1 and LONG HYPOCOTYL5 proteins.

LONG HYPOCOTYL5 (HY5) is a bZIP (basic leucine zipper) transcription factor that activates photomorphogenesis and root development in Arabidopsis (Arabidopsis thaliana). Previously, STF1 (soybean [Glycine max] TGACG-motif binding factor 1), a homologous legume protein with a RING-finger motif and a bZIP domain, was reported in soybean. To investigate the role of STF1, the phenotypes of transgenic Arabidopsis plants overexpressing STF1 and HY5 were compared. In addition, the DNA-binding properties of STF1 and HY5 were extensively studied using random binding site selection and electrophoretic mobility shift assay. Overexpression of STF1 in the hy5 mutant of Arabidopsis restored the wild-type photomorphogenic and root development phenotypes of short hypocotyl, accumulation of chlorophyll, and root gravitropism, with partial restoration of anthocyanin accumulation. These results support the view that STF1 is a homolog of HY5 with a role in light and hormone signaling. The DNA-binding properties of STF1 and HY5 are shown to be similar: both recognize many ACGT-containing elements, with a consensus sequence motif of 5′-(G/A)(G/A)TGACGT(C/G/A)(A/T/G)-3′. The motif reflects a characteristically strong preference for the sequences flanking TGACGT and is larger than the sequences recognized by the G-box binding factor and TGA protein families. The finding that the C-box and the hybrid C/G- and C/A-boxes are higher-affinity binding sites than the G-box, together with the parameters associated with HY5 recognition, defines the criteria for HY5/STF1 protein-DNA interaction in promoter regions. This study helps to predict the precise in vivo binding sites of the HY5 protein from the vast number of putative HY5 genomic binding sites analyzed by chromatin immunoprecipitation on chip.

The bZIP (basic Leu zipper) proteins are a class of transcription factors involved in many plant growth and development processes, including photomorphogenic development and hormone signaling (Jakoby et al., 2002; Cluis et al., 2004). One of the best characterized bZIP factors thought to play a role in photomorphogenic seedling development and hormone signaling in Arabidopsis (Arabidopsis thaliana) is LONG HYPOCOTYL5 (HY5) (Oyama et al., 1997; Ang et al., 1998; Holm et al., 2002; Cluis et al., 2004). The function of HY5 in photomorphogenesis is well illustrated in hy5 mutant seedlings, which have defects in light inhibition of hypocotyl elongation, in light-induced chlorophyll accumulation, and in anthocyanin accumulation (Oyama et al., 1997; Sibout et al., 2006; Shin et al., 2007). The model for light-signaling pathways in photomorphogenic development includes the photoreceptor phytochromes, the ubiquitin ligase CONSTITUTIVE PHOTOMORPHOGENIC1 (COP1), and the positive signaling component HY5 (Deng et al., 1992; Ang and Deng, 1994; Holm et al., 2002). One regulatory circuit involving COP1 is nuclear degradation of HY5. Although degradation of the HY5 protein is mediated by nuclear COP1 in the dark, light exposure results in the extrusion of COP1 from the nucleus into the cytosol, a process that allows accumulated HY5 to interact with DNA and to activate light-regulated genes (von Arnim and Osterlund et al., 2000).
The role of HY5 in hormone signaling was first suggested by the hy5 null allele, which shows altered root morphology (Oyama et al., 1997). The hy5 mutation affects several aspects of root morphogenesis, resulting in an elevated number of lateral roots, less responsiveness to gravitropic stimulus and touching, and longer root hairs in hy5 seedlings than in wild type. The hy5 mutant traits are partly the result of an altered balance in the signaling of auxin (Sibout et al., 2006). Microarray analyses have shown that many auxin-responsive and auxin-signaling genes are misexpressed in hy5 mutants, an indication that the genes encoding auxin-signaling components are one group of the HY5 downstream genes (Cluis et al., 2004; Sibout et al., 2006). HY5 is also involved in cytokinin signaling (Vandenbussche et al., 2007). Cytokinin treatment results in growth responses similar to those induced by blue light, such as the development of leaves and chloroplasts, stimulation of anthocyanin production, and the inhibition of hypocotyl elongation (Chory et al., 1994). It has been proposed that cytokinins increase the level of HY5 by reducing the degradation mediated by COP1 (Vandenbussche et al., 2007).
Previously, STF1, a homologous bZIP protein that acts as a potential regulatory factor for hypocotyl elongation, was reported for soybean (Glycine max; Cheong et al., 1998). STF1 has the unusual feature of having two unrelated structural domains: an N-terminal domain with high sequence homology to the RING-finger domain found in RADIALLY SWOLLEN1 (RSW1), the cellulose synthase catalytic subunit, and a C-terminal HY5-like bZIP domain. A similar structural feature is found in other legume bZIP proteins, including broad bean (Vicia faba) VFBZIPZF and Lotus japonicus LjBZF (Cheong et al., 1998; Nishimura et al., 2002). The role of LjBZF, the gene product of ASTRAY, was inferred from astray (Ljsym77), a root mutant that develops an increased number of nodules compared with the wild type. The astray mutant also shows photomorphogenic mutant phenotypes similar to those observed in hy5 mutants (Nishimura et al., 2002). However, the role of STF1 has not been studied in detail.
This article presents the findings of: (1) an analysis of the DNA-protein interactions of STF1 using both random binding site selection (RBSS) and gel mobility shift assay, and (2) a comparison of the DNA-binding properties and biological functions of STF1 with those of HY5 using a transgenic plant analysis. An in vitro binding analysis of STF1 and HY5 demonstrates that these bZIP proteins preferentially recognize C-, hybrid C/G-, and C/A-box motifs over G-box motifs. The in vitro analysis corresponds with earlier in vivo and functional analyses that identified the predicted locations of the HY5 binding sites in the promoters of anthocyanin biosynthetic genes (Hartmann et al., 2005; Shin et al., 2007). It also helps explain the abundance (approximately 3,900) of in vivo HY5 targets in the Arabidopsis genome identified by coupled chromatin immunoprecipitation and DNA chip hybridization (ChIP-chip; Lee et al., 2007). By identifying the HY5/STF1 recognition elements and the parameters associated with target genes, this study extends our understanding of the roles of HY5 and related bZIP proteins in the regulation of gene expression during plant development.

STF1 Can Replace HY5 in Photomorphogenesis and Hormone Signaling
STF1 is a bZIP factor of soybean that is homologous (71.8%) to the C-terminal half of the HY5 protein in Arabidopsis. It contains conserved amino acid motifs for the casein kinase II phosphorylation site and for COP1 interaction immediately before the bZIP domain (Fig. 1A; Hardtke et al., 2000). To test whether STF1 and HY5 play similar roles in photomorphogenesis and hormonal signaling, a complementation test was performed using the hy5 mutant. The coding region of STF1 was placed under the control of the 35S cauliflower mosaic virus promoter and was stably introduced into Arabidopsis. STF1OX, the STF1 overexpression line, was compared with wild type and with HY5OX, the HY5 overexpression line. Figure 1 shows that expression of both STF1 and HY5 in the hy5 mutants restored normal levels of hypocotyl growth inhibition as well as chlorophyll accumulation in the light-grown seedlings. The accumulation of anthocyanin was partially restored in the STF1OX line. In L. japonicus, a mutation in LjBZF, the STF1-related gene, resulted in the reduction of anthocyanin accumulation (Nishimura et al., 2002). These results suggest that STF1 plays the same role as HY5 in photomorphogenesis (Fig. 1B).
We then compared the other aspects of the hy5 mutant phenotype (i.e. gravitropic response, waving growth of root, lateral root formation, and root hair elongation) that reflect abnormal auxin signaling (Oyama et al., 1997; Cluis et al., 2004). The hy5 mutant grown on agar plates showed widely spread lateral roots that were directed nearly horizontally rather than downward (Fig. 2B). The main roots of the hy5 mutants also showed reduced gravitropism with a slight slant to the left. In addition, hy5 mutants exhibited defects in the touch response; they failed to display the normal wavy pattern of root growth when grown on agar plates set at an angle of 45° (Fig. 2F). The gravitropic and touching responses of the roots were restored in the STF1OX line (Fig. 2, D and H). The enhanced lateral-root formation and the longer root hairs observed in hy5 mutants were also complemented by overexpression of HY5 and STF1 (Fig. 2, K and L). Altogether, the transgenic plant analysis provides further support that STF1 and HY5 have the same role in photomorphogenesis and hormone signaling.
... binding site (Cheong et al., 1998). The HY5 protein interacts with both the G- (CACGTG) and Z- (ATACGTGT) boxes of the light-regulated promoters of the RbcS1A (ribulose bisphosphate carboxylase small subunit) and CHS (chalcone synthase) genes (Chattopadhyay et al., 1998; Yadav et al., 2002). To test whether STF1 and HY5 have similar DNA-binding properties, the binding properties of each were compared with eight different DNA sequences that represent G-, C-, and C/G-box motifs (Fig. 3A). C-box sequences carrying the mammalian cAMP responsive element (CRE; TGACGTCA) motif and the Hex sequence (TGACGTGGC), a hybrid C/G-box (Cheong et al., 1998), were high-affinity binding sites for both proteins (Fig. 3B). No binding or limited binding was observed to as-1 (Lam et al., 1989), nos-1 (Lam et al., 1990), or the AP-1 site (TGACTCA; Kim et al., 1993).
Binding to the palindromic G-box (PA G-box, GCCACGTGGC) was moderate. However, binding activity to the G-box of the light-responsive unit 1 (U1) region of the parsley (Petroselinum crispum) CHS promoter (CHS-U1: TCCACGTGGC; Schulze-Lefert et al., 1989) or the G-box of GmAux28 (TCCACGTGTC) was much weaker than to the PA G-box (Fig. 3, B and C). Gradual increases in protein concentrations resulted in detectable binding to very weak binding sites such as CHS-U1 and as-1 sequences (Fig. 3B).
The N terminus of STF1 contains structurally unrelated domains with high sequence homology to the N-terminal RING-finger domain found in RSW1 (Fig. 1; Cheong et al., 1998). When STF1 and HY5 are compared, the full-length STF1 possesses weaker binding activity than HY5 to both the PA G-box and the CHS-U1 G-box (Fig. 3C). However, the binding affinities of both bZIP proteins were similar to CRE A/T (ATGACGTCAT), a CRE sequence with flanking adenine and thymine (A/T) at positions −4 and +4. Deletion of the STF1-specific N-terminal domain resulted in enhanced binding to the G-box to a level comparable with HY5 (Fig. 3D). These results indicate that the bZIP domains of both STF1 and HY5 have similar binding properties for recognizing ACGT-containing elements (ACEs).
Figure 1. Phenotypes of the hy5 mutant and HY5 and STF1 complementation lines. A, Diagram of the STF1 protein structure. The basic DNA-binding region and Leu zipper region, casein kinase II (CKII in the image) phosphorylation sites, COP1 interaction sites to HY5, and the conserved motifs are indicated at the bottom. B, The effect of the hy5 mutation and HY5 and STF1 complementation lines on hypocotyl elongation, chlorophyll production, and anthocyanin accumulation; the phenotypes of light- and dark-grown seedlings (a and b); the hypocotyl lengths of light- and dark-grown seedlings (c and d); chlorophyll and anthocyanin levels (e and f). Hypocotyl length is the mean ± SE (n ≥ 35) of 6-d-old seedlings grown in either LL or dark. The contents of chlorophyll (e) and anthocyanin (f) are from 5-d-old seedlings grown in constant white light. Lanes 1, 2, 3, and 4 denote wild type (WT in the image), hy5-Ks50 (hy5), 35S::HY5/hy5-Ks50 (HY5OX), and 35S::STF1/hy5-Ks50 (STF1OX), respectively. Error bars represent the SD. [See online article for color version of this figure.]
The in vitro binding experiments presented in this study show that, although the G-box is a known target site for the HY5 protein, the C-box sequences are the preferred binding sites for both STF1 and HY5.
STF1 Exhibits a Distinct DNA-Binding Property and Requires a Larger Recognition Sequence Than Does SGBF1 or STGA1
Three soybean bZIP proteins from different families have been described: SGBF1 (soybean G-box binding factor 1; Hong et al., 1995), STGA1 (soybean TGA1; Cheong et al., 1998), and STF1. To differentiate STF1 binding to ACEs, the electrophoretic mobility shift assay (EMSA) patterns for STGA1, STF1, and SGBF1 were compared using the same sets of binding site probes (Fig. 4A). SGBF1 interacts equally well with the two G-box sequences (PA G-box and CHS-U1). Hex, CRE, and AP-1 sequences are also well recognized. STGA1 bound strongly to the sequences containing TGACG and recognized, with high affinity, most of the sequences selected by STF1. Although some preference for the flanking sequence has been reported for these bZIP proteins, the same kind of flanking base preference was not observed for STF1. STF1, STGA1, and SGBF1 exhibit distinct DNA-binding properties; however, each binds the Hex oligonucleotide (Cheong et al., 1994; Hong et al., 1995; Cheong et al., 1998; Fig. 4A). Thus, the protein-DNA contacts mediated by STF1, STGA1, and SGBF1 were compared in more detail using the Hex sequence. Methylation interference experiments were performed to determine whether methylation of G residues modified binding to these proteins. The data in Figure 4B show that STF1 binding to the Hex sequence requires contacts that are distinct from, and additional to (12-13 bp), those required by STGA1 and SGBF1. STF1 binding to the Hex oligonucleotide was inhibited when the G residues at positions −5, −4, −2, +0, +2, and +3 ("upper" strand in the image) and −0, +4, and +5 ("lower" strand in the image) were methylated. Binding patterns of STGA1 and SGBF1 to Hex are similar to those of TGA1 and GBF1 of Arabidopsis (Schindler et al., 1992b).

Binding Site Selection from Random Oligonucleotides Defines the C-Box and the Hybrid ACEs C/G-Box and C/A-Box as High-Affinity Binding Sites for STF1
In its usual form, EMSA cannot be used to identify the wide spectrum of binding sites recognized by a DNA-binding protein. Thus, to determine the DNA-binding site requirement and consensus sequence of STF1, RBSS was used (Oliphant et al., 1989). Random oligonucleotides were synthesized and allowed to bind with bacterially produced, purified recombinant STF1 at two different salt concentrations (50 and 150 mM KCl), which represent moderate- and high-stringency conditions, respectively. A total of 150 plasmids that contain DNA-binding sites selected from both conditions were isolated and sequenced. The sequences are shown in Figure 5A, where they are arranged according to the reference nucleotide at position +2. Ninety-five percent of the sequences contain the intact TGACGT motif. The consensus binding site for STF1 is thus 5′-RRTGACGTVDNN-3′ [5′-(G/A)(G/A)TGACGT(C/G/A)(A/T/G)-3′] (Fig. 5B).
When analyzed by type of ACE, these sequences can be grouped into four subclasses (Fig. 5C): C-box, where the C residue comes at the +2 position; a hybrid C/G-box, with G at the +2 position; C/A-box, with A at the +2 position; and C/T-box, with T at the +2 position. The C-box subclass contains the largest number of selected binding sites for STF1 (38% at 50 mM KCl and 48% at 150 mM), followed by the C/G-boxes (25.3%) and the C/A-boxes (26%). Only a small number of C/T-boxes (4/100) and non-TGACGT sequences (4/100) were selected. Further arrangement of each subclass identified the significance of the base at position +3 (top strand): STF1 shows a strong preference for A at position +3 when it interacts with the C-box (−3 TGACGTC +2); however, T is preferred at +3 when STF1 interacts with the C/G-box (−3 TGACGTG +2). No C residue was observed at position +3. In addition, most selected sequences show a preference for purines (G and A) at positions −5 and −4 (top strand). At higher salt concentrations, T at position +4 and H (A, T, C) at +5 are preferred (Fig. 5, B and C).
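The subclass assignment described above depends only on the base that follows TGACGT (position +2). A minimal Python sketch of that rule follows; the function name `classify_ace` and the example list are illustrative assumptions, not part of the original analysis, and the sketch scans only the strand as written (the selected sequences in Figure 5A were already aligned to the consensus).

```python
import re
from collections import Counter

# Subclass names from the text, keyed by the base at position +2
# (the first base after the TGACGT motif).
SUBCLASS_BY_PLUS2 = {"C": "C-box", "G": "C/G-box", "A": "C/A-box", "T": "C/T-box"}

def classify_ace(seq):
    """Return the ACE subclass of a site, or 'non-TGACGT' if the motif
    (followed by at least one base) is absent on the given strand."""
    match = re.search(r"TGACGT([ACGT])", seq.upper())
    if match is None:
        return "non-TGACGT"
    return SUBCLASS_BY_PLUS2[match.group(1)]

# Probe sequences quoted elsewhere in the article, used here only as examples.
examples = ["ATGACGTCAT", "GTGACGTGGC", "AGTGACGTTATT", "ACTGACGACGCC"]
tally = Counter(classify_ace(s) for s in examples)
```

Applied to a full list of selected clones, `tally` would give the per-subclass counts from which percentages like those quoted above are derived.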

EMSA Confirms RBSS Analysis and Identifies the Significance of Combination of Half-Site
To confirm that the selected sequences represent the binding affinity of STF1 as well as HY5, a few selected sequences representing each group of ACEs were analyzed using EMSA (Fig. 6). High-affinity binding was observed for all C-box sequences containing the (−3 TGACGTCA +3) motif with flanking purines at positions −4 and −5 (CRE A/T, no. 16-17, no. 4-27, no. 4-38, no. 4-47, no. 12-78), followed by the C/G-box (Hex, no. 4-21) and the C/A-box (no. 4-46), which is consistent with the RBSS data. Since purine bases are strongly favored at positions −5 and −4, the effect of base substitution at these positions was tested using CRE A/T as a reference sequence. STF1 binding to the C-box was profoundly reduced when GA at positions −5 and −4 was converted to either GT (no. 16-15) or TG (no. 1-2). Furthermore, a mutation at position −4 reduced the binding of STF1 to the C-box more than the mutation at position −5 (see no. 16-5 versus no. 1-2). When flanking bases satisfied the preference for purines at positions −5 and −4, only a slight preference for pyrimidine at +4 and +5 was observed. There was no difference in the preference for G or A at positions −5 and −4 (no. 16-17, no. 4-27, no. 4-38, and no. 4-47). The lack of binding, or very weak binding, with number 4-33 (AGTGACGTTATT), number 12-86 (GGTGACGCCAGC), and number 4-34 (ACTGACGACGCC) further confirms that the C/T-box and TGACG-motifs without the ACGT-core are not the preferred binding sites for STF1. The significant reduction in binding to number 4-36 (GATGACGTCTTA) is also consistent with the RBSS data that show T is rarely found at position +3 of the C-box.
Figure 3. Comparison of the DNA-binding properties of STF1 and HY5. A, The seven ACEs and the AP-1 site used as binding-site probes. B, EMSA using different concentrations of STF1 and HY5. Increasing concentrations (100, 250, and 500 nM) of purified STF1 and HY5 protein were added to reaction mixtures containing 20,000 cpm of each binding-site probe (lanes 1-8; PA G-box, CHS-U1, Hex, as-1, nos-1, CRE G/A, CRE A/T, AP-1). Two probes, CHS-U1 and as-1, are very weak binding sites for both bZIP factors, which show binding only at high concentrations. C, Binding affinity of full-length STF1 and HY5 to CRE, PA G-box, and CHS-U1 probes. Radioactivity of the bands corresponding to free and bound DNA was measured from the dried gel using a Bio-Image analyzer (BAS 2500; Fuji Photo Film) and calculated as the percentage of bound versus free DNA. This experiment was performed three times with the same results. D, The effect of the N-terminal domain on STF1 binding to G-box sequences (PA G-box and CHS-U1 G-box). The HY5-like homologous domain is shown as a black bar at the top. Increasing concentrations (lanes 2-7; 0.05, 0.1, 0.2, 0.5, 1.0, and 1.73 μM) of each purified protein were added to the reaction mixtures containing PA G-box and CHS-U1 as binding site probes prior to EMSA.
HY5 also showed a binding preference very similar to that of STF1 (Fig. 6A). This characteristically strong preference for the sequence flanking the TGACGT motif was not observed for STGA1 or SGBF1 (Fig. 6B).
The DNA-binding parameters of STF1 and HY5 are summarized in Figure 7. The ACEs comprise two half-sites, either of which can be symmetric or dependent (Izawa et al., 1993; Niu and Guiltinan, 1994; Fig. 7A).
The degree to which STF1 binds to the ACEs depends on the combination of half-sites. A symmetric half-site can be bound by STF1 when made symmetric (i.e. C-box), whereas a dependent half-site cannot be bound, or is only weakly bound, when made symmetric but can be bound when combined with a symmetric half-site. The RBSS and EMSA analyses show that the dependent half-sites for STF1 are those of the G- and A-boxes. The hybrid ACEs that contain both symmetric and dependent half-sites are good binding sites for both STF1 and HY5, whereas the ACEs comprising two dependent half-sites (i.e. G-box, A-box, and G/A-box [Z-box]) are weak binding sites (Fig. 7B).
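The half-site rule can be written as a small lookup. The sketch below is a hedged illustration only: it treats an ACE as the 6-mer NACGTN, reads the left half-site from the opposite strand (so the base at −2 is complemented), and returns the qualitative labels used in the text. The function names, the three-level labels, and the "not preferred" label for T half-sites (based on the C/T-box result) are assumptions for illustration, not the authors' scoring scheme.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def half_sites(ace):
    """Split a 6-mer of the form NACGTN into (left, right) half-site letters.
    The left half-site is read on the opposite strand, hence the complement."""
    ace = ace.upper()
    if len(ace) != 6 or ace[1:5] != "ACGT" or ace[0] not in COMPLEMENT or ace[5] not in COMPLEMENT:
        raise ValueError("expected a 6-mer of the form NACGTN")
    return COMPLEMENT[ace[0]], ace[5]

def predicted_binding(ace):
    """Qualitative prediction: C is the symmetric half-site, G and A are dependent.
    T half-sites are labeled 'not preferred' (assumption drawn from the C/T-box data)."""
    left, right = half_sites(ace)
    if "T" in (left, right):
        return "not preferred"
    symmetric = [left, right].count("C")
    if symmetric == 2:
        return "strong"   # C-box: two symmetric half-sites
    if symmetric == 1:
        return "good"     # hybrid C/G- or C/A-box
    return "weak"         # G-box, A-box, or G/A-box (Z-box)
```

For example, `predicted_binding("GACGTG")` describes the C/G hybrid core of Hex, and `predicted_binding("TACGTG")` the G/A core of the Z-box.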

Analysis of Target Genes of STF1 and HY5
Detailed analysis of the DNA-binding properties of STF1 identified a set of binding sites recognized by [...] or both STF1 and HY5. The pleiotropic response observed in the hy5 mutant suggests that many genes are regulated by HY5. This is in agreement with the finding of a large number (approximately 3,900) of in vivo HY5 targets in the Arabidopsis genome (Lee et al., 2007).
Figure 4 (caption fragment). [...] Figure 3A. B, Methylation interference assay using Hex as a probe. STF1 binds differently to the Hex sequence than do the two other soybean bZIP proteins. Methylation interference shows that STF1 binds a wider sequence than the GBF (SGBF1) and TGA (STGA1) proteins recognize. Both strands (upper, lower in the image) of the DNA fragment containing the cloned Hex sequence (5′-ggGTGACGTGGCca-3′) were partially methylated and incubated with in vitro generated recombinant bZIP proteins. Free (f) and protein-complexed (b) DNA fragments were separated, eluted, and, after piperidine cleavage, analyzed on a denaturing polyacrylamide gel. Markers labeled G refer to the Maxam-Gilbert sequencing reactions of this DNA fragment (Maxam and Gilbert, 1980). The brackets indicate the location of protected sequences. The DNA sequence of the protected region is given below. Strong protection and weak protection by protein binding during methylation reaction are indicated as circles and triangles, respectively.
Figure 5 (caption fragment). [...] KCl. Binding sites were selected from a pool of oligonucleotides carrying random 13 bp flanked by a defined sequence of 26 bp on either side. DNA sequences that bound to the GST-STF1 fusion protein were selected and analyzed using DNA sequencing. Binding sites were aligned according to the consensus sequence. Nucleotides corresponding to the flanking sequences on either side of the random 13 bp are underlined. An asterisk before a clone number indicates a selected sequence containing more [...]
From the EMSA, the high-affinity binding site for homodimeric STF1 and HY5 was defined as 5′-(G/A)(G/A)TGACGT(C/G/A)(A/T/G)-3′. As G-box sequences and hybrid ACEs are targets for the HY5 protein, the basal binding motif can be described as HBACGTVD, which includes the C-box, C/G-box, C/A-box, and the G- and G/A-boxes. The Z-box (ATACGTGT) is the G/A-box, a hybrid ACE recognized by HY5. A pattern matching analysis of the upstream regions of the whole Arabidopsis genome found that 48.4% (15,516 out of 32,041 sequences) of all genes have the basal consensus HBACGTVD motif in the 1-kb region 5′ upstream of their translation start. The same motif was found in 72% (2,800 out of 3,894) of the genes identified as in vivo HY5 targets (Lee et al., 2007). The motif is therefore 1.48 times more frequent in the target genes than in the whole genome.
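A pattern-matching scan of this kind can be sketched by expanding the IUPAC codes into a regular expression and searching both strands of an upstream sequence. The code below is a minimal illustration under stated assumptions: the helper names are hypothetical, both strands are scanned (the text does not say whether the original analysis did so), and overlapping hits are reported through a lookahead.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")

def motif_regex(motif):
    """Compile an IUPAC motif; the lookahead lets overlapping sites be found."""
    body = "".join(IUPAC[code] for code in motif.upper())
    return re.compile("(?=(" + body + "))")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT_TABLE)[::-1]

def find_sites(seq, motif):
    """Return sorted (start, strand, matched_sequence) tuples.
    Start is always a 0-based coordinate on the given (top) strand."""
    seq = seq.upper()
    pattern = motif_regex(motif)
    n, k = len(seq), len(motif)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    hits += [(n - m.start() - k, "-", m.group(1))
             for m in pattern.finditer(reverse_complement(seq))]
    return sorted(hits)

HIGH_AFFINITY = "RRTGACGTVD"  # high-affinity consensus from the text
BASAL = "HBACGTVD"            # basal binding motif from the text
```

A gene would count as motif-containing if `find_sites(upstream_1kb, BASAL)` returns at least one hit, where `upstream_1kb` is a placeholder for the 1-kb sequence upstream of its translation start.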
The RRTGACGTVD motif, defined as the high-affinity binding motif for HY5/STF1, is found in 516 (13.25%) of the in vivo HY5 targets, whereas 1,707 genes (5.3%) in the whole genome contain this motif. This represents a 2.87-fold enrichment in target genes relative to the whole genome and indicates that the high-affinity consensus motif is more likely to be observed in the proposed HY5-target genes than in the whole genome (Supplemental Tables S1 and S2).
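The enrichment values are ratios of two proportions: the fraction of HY5 target genes carrying a motif divided by the fraction of all genes carrying it. The short sketch below recomputes them from the counts quoted in the text; the function name is an assumption for illustration. With the counts exactly as quoted, the basal-motif ratio reproduces the stated 1.48, whereas the high-affinity ratio evaluates to roughly 2.5 rather than 2.87, so the published value may rest on counts or rounding not shown here.

```python
def fold_enrichment(target_hits, target_total, genome_hits, genome_total):
    """Ratio of motif frequency in the target set to motif frequency genome-wide."""
    return (target_hits / target_total) / (genome_hits / genome_total)

# Counts quoted in the text (in vivo HY5 targets versus the whole genome).
basal_fold = fold_enrichment(2800, 3894, 15516, 32041)        # HBACGTVD
high_affinity_fold = fold_enrichment(516, 3894, 1707, 32041)  # RRTGACGTVD

print(f"HBACGTVD: {basal_fold:.2f}-fold")
print(f"RRTGACGTVD: {high_affinity_fold:.2f}-fold")
```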
HY5 regulates a wide range of genes involved in photosynthesis and hormone signaling (Cluis et al., 2004; Sibout et al., 2006; Lee et al., 2007). First, we compared 3,103 genes differentially regulated by light (Ma et al., 2005). Of these genes, 1,681 (54%) contain the HBACGTVD motif, and 228 (7.3%) contain the RRTGACGTVD motif (Supplemental Table S3). We then compared auxin-responsive genes whose mRNA levels are affected by the hy5 mutation (Cluis et al., 2004; Sibout et al., 2006). Of the 246 auxin-responsive genes, 142 (57.2%) contain HBACGTVD, and eight (3.2%) contain the high-affinity binding site (Supplemental Table S4). These values indicate that high-affinity binding sites are underrepresented in both light-regulated and auxin-responsive genes. The identification of the HBACGTVD motif and the in vivo HY5 binding sites in many genes involved in auxin and cytokinin signaling provides further support for the role of HY5 in hormone signaling (Table I). Genes containing high-affinity binding sites in the promoter region encode proteins with diverse functions (Supplemental Table S2). Many regulatory genes, such as those encoding transcription factors, protein kinases, and ubiquitin ligases, contain the binding motif and are actually bound by the HY5 protein (Lee et al., 2007). The anthocyanin biosynthetic genes have been shown to be light regulated and to be targets of HY5 (Hartmann et al., 2005; Shin et al., 2007). We examined the effect of HY5 expression on the accumulation of mRNA of seven genes encoding phenylpropanoid biosynthetic enzymes: CHS, CHI, F3H, F3′H, DFR, FLS, and LDOX. Expression of the seven genes was drastically reduced by the hy5 mutation and induced in the HY5 overexpression line (Fig. 8, B and C). The STF1 overexpression line showed partial restoration of the target gene expression (data not shown).
Limited induction of LDOX and F3′H was observed upon STF1 overexpression, which is consistent with the partial complementation of anthocyanin accumulation in the transgenic lines. The lack of STF1 mutant soybean plants makes it difficult to address the role of STF1 in anthocyanin biosynthesis. However, the reduction of anthocyanin accumulation in the L. japonicus astray mutant supports a role for STF1 in anthocyanin biosynthesis (Nishimura et al., 2002).
Promoters of all seven genes contain many ACEs. The F3H promoter was extensively analyzed to compare in vitro and in vivo binding to HY5 (Lee et al., 2007; Shin et al., 2007). Among the five ACEs that were bound to HY5 in vitro, two ACEs, the G-box at −464 and the C/G-box at −429, fit the criteria of the HY5 binding motif. These two ACEs are in vivo HY5 target sites (Fig. 8D). The predicted locations of the HY5 binding sites in the seven genes are shown in Figure 8E. These sequences were confirmed as HY5 targets using in vivo binding and functional analyses (Hartmann et al., 2005; Lee et al., 2007; Shin et al., 2007). Although these ACEs are classified as low-affinity binding sequences, the half-site of each ACE complies well with the binding requirements of HY5 and STF1. Altogether, these findings strongly support the view that HY5 is the bZIP protein involved in expression of anthocyanin biosynthetic genes. This study also helps to predict precisely the binding sites in the HY5 target promoters.

DISCUSSION
This study shows that STF1 is a bZIP protein that has roles in photomorphogenic development and hormone signaling, much like HY5 of Arabidopsis. Analyses of the DNA-binding properties and the binding site selectivity of STF1 provide ample information about the spectra of binding sites recognized by this protein as well as the effects of flanking sequences around the ACGT-core motif, which contribute to the binding specificity and affinity of the two related proteins, STF1 and HY5. RBSS identified a consensus site for high-affinity binding: 5′-RRTGACGTVDnn-3′. An interesting feature of the STF1 protein-DNA interaction to emerge from the RBSS and the EMSA analyses is that STF1 has a strict requirement for binding sites that have large sequences and certain combinations of flanking se- [...] +4 and +5 (Figs. 3 and 4). Although STF1 has a narrower binding site spectrum than soybean's TGA1, which is a C-box binding protein (Cheong et al., 1994), the binding of STF1 to the G-box, coupled with its ability to form heterodimers with GBFs, makes it a novel member of the bZIP protein family because of its capability to recognize both C-box and G-box motifs. Many researchers have also reported the importance of the flanking sequences for DNA binding of bZIP proteins (Schindler et al., 1992b; Williams et al., 1992; Izawa et al., 1993, 1994; Niu and Guiltinan, 1994; Hong et al., 1995; Martinez-Garcia et al., 1998). However, this study stresses the importance of a single base at position −4 for STF1 binding, which is in contrast to other bZIP proteins that interact with these C-box sequences in an overlapping manner (Fig. 4B).
Figure 7 (caption fragment). [...] Symmetric half-sites are sequences that can be bound by protein when made symmetric (i.e. C-box), whereas dependent half-sites are sequences that cannot be bound or are weakly bound when made symmetric but that can be bound when combined with a symmetric half-site. The binding affinities to STF1 are indicated. B, The diagram of predicted binding affinity. Thick lines indicate strong or moderate binding of the half-site combinations; thin lines indicate weak binding of half-sites.
Figure 8. Regulation of anthocyanin biosynthetic genes by HY5 and the predicted HY5 binding sites in the promoter region. A, Simplified schematic representation of the biosynthesis pathway of anthocyanins and flavonols. Abbreviations of the seven enzyme designations are: CHS, chalcone synthase; CHI, chalcone isomerase; F3H, flavanone 3-hydroxylase; DFR, dihydroflavonol 4-reductase; LDOX, leucoanthocyanidin dioxygenase; F3′H, flavanone 3′-hydroxylase; FLS, flavonol synthase. B, Semiquantitative RT-PCR. mRNA abundance of genes involved in anthocyanin biosynthesis pathways in the hy5 mutant and HY5 complementation lines grown under LL. Transcript levels of CAB1 (chlorophyll a/b-binding protein 1) are shown for comparative purposes (Lee et al., 2007). ACTIN2 (Act2) mRNA level was used as an internal control. C, Bar graphs represent the relative expression levels of genes as obtained by RT-PCR (B). Band intensities were quantified using Bio-Rad's Quantity One software, and values were normalized against Act2 transcript. D, Diagram of F3H promoter fragments, including ACEs (A) as well as E- and G-boxes (G), according to Shin et al. (2007; top). Thick bars represent fragments identified by in vivo ChIP. The 12 nucleotides of the indicated elements are shown at the bottom. The types of ACEs are indicated to the right. *F3H promoter fragments are named according to Shin et al. (2007). **Prediction is based on the binding parameters of STF1/HY5. WB, Weak binding; NB, no binding. ***This is based on in vitro binding assay (Shin et al., 2007). The base number indicates the center of each element as counted from the translation start site (+1). E, The list of predicted HY5 binding sites in the seven anthocyanin biosynthetic genes that satisfy the binding parameters identified in this study and the fragments identified by in vivo ChIP (Shin et al., 2007). The type of ACE and the predicted binding affinity are indicated on the right. In vitro binding data of direct binding to HY5 protein are not available (NA). ACE is identified by functional analysis and in vitro nuclear protein binding study (Hartmann et al., 2005). All functionally defined ACEs satisfy the criteria of STF1/HY5 binding but are predicted to be weak binding sites.
Detailed analyses of the selected binding sites and the EMSA data identified an optimal combination of flanking sequences at positions +2 and +3 after TGACGT. For C-boxes, the preferred bases are CA (TGACGTCA) and CG (TGACGTCG). The CC (TGACGTCC) or CT (TGACGTCT) combinations are rarely selected, a finding confirmed by the EMSA using the [...] GATGACGTCGT) sequences (Fig. 4). Other preferred combinations of flanking sequences are GG (Hex, TGACGTGG: C/G-box), GT (no. 4-21, TGACGTGT: C/G-box), and AT (no. 4-36, TGACGTAT: C/A-box). The flanking sequences GG, GT, and CA at positions +3 and +4 were also observed in the high-affinity binding site of group 1 factors (GBFs, EmBP1, HBP1a), for which the optimum binding site is CCACGTGG (G-box; Schindler et al., 1992a, 1992b; Niu and Guiltinan, 1994). For hybrid ACEs, the correct base combination should be considered in predicting the target site.
Since C-box motifs and hybrid C/G- or C/A-boxes are the preferred binding sites for HY5 as well as for STF1, both HY5 and STF1 may regulate a wide range of genes in addition to G-box containing genes. The observation that the most dramatic morphological defects in the hy5 mutant are found in the hypocotyls, stems, and roots supports this conjecture (Ang and Deng, 1994;Oyama et al., 1997;Ang et al., 1998). Recently, LjBzf, a gene highly homologous to STF1, was isolated from L. japonicus (Nishimura et al., 2002). Mutation of LjBzf results in the mutant astray (Ljsym77), a root mutant with a higher number of nodules than that of the wild type. The astray mutant shows greening, hypocotyl, and root morphology similar to that of the hy5 mutant, with a reduction in anthocyanin accumulation (Nishimura et al., 2002). The ASTRAY protein has a structure highly similar to that of other legume bZIP proteins, STF1 of soybean and VFBIPZF of broad bean, which contain an N-terminal RING-finger domain found in RSW1, the cellulose synthase catalytic subunit, and the C-terminal HY5-like bZIP domain. Given these similarities, detailed analyses of STF1 DNA-binding properties may enhance our understanding of the roles of related legume bZIP proteins in plant development.
HY5 was reported to target more than 3,000 genes (Lee et al., 2007). Sequence analysis revealed that these genes carry several different types of ACEs. The ACEs are classified into C-box, G-box, and hybrid C/G, C/A, and G/A (Z-box) boxes. The analysis of HY5 target genes suggests that these bZIP proteins could regulate expression of many regulatory genes, such as those encoding transcription factors, kinases, or genes required for cell proliferation and elongation (Supplemental Tables S1 and S2). This strongly indicates that HY5 occupies a high position in the regulatory hierarchy (Lee et al., 2007). The identification of many signal transduction related genes and transcription factors involved in auxin and cytokinin signaling agrees well with a role for HY5 in hormone signaling (Table I). The HY5 binding sites in light-regulated genes fit the role of HY5 in photomorphogenesis.
Overall, the observation that STF1 and HY5 have similar binding properties and physiological roles and the identification of their binding criteria further facilitate our understanding of how these two bZIP proteins function in complex plant developmental processes, such as cell elongation, root development, and photosynthesis.

Plant Materials and Growth Conditions
Standard molecular biology techniques were used according to Sambrook et al. (1989). Arabidopsis (Arabidopsis thaliana) plants were cultivated in a growth chamber with a 16-h light/8-h dark cycle at 22°C under a combination of cool-white fluorescent and incandescent lights at 70 to 100 μmol m⁻² s⁻¹. For generation of the STF1 overexpression (STF1OX) transgenic lines, the full-length coding region of STF1 (Cheong et al., 1998) was amplified and cloned into the BamHI site of pCAMBIA1300 (CAMBIA, Canberra, Australia). For transformation of the hy5 mutant, Arabidopsis hy5-Ks50 (provided by Dr. Kiyotaka Okada, Kyoto University) was used. The 35S::HY5 overexpression line (HY5OX) was obtained from Dr. K. Okada. Fifteen transgenic lines were obtained, and homozygous lines, which express higher levels of the STF1 gene, were established for phenotypic analysis. All plants used in these experiments were in the Wassilewskija background and were grown on half-strength Murashige and Skoog (MS) agar medium containing 2% Suc, except for measurement of hypocotyl length.

Analysis of Hypocotyl Length, Anthocyanin Levels, and Chlorophyll Content
To measure hypocotyl length, sterilized seeds were placed on half-strength MS agar medium without Suc and stratified in the dark at 4°C for 3 d. The seeds were exposed to white light (100 μmol m⁻² s⁻¹) for 1 h, returned to the darkness at 22°C for 23 h, and then placed into either continuous white light or the dark for 6 d. The hypocotyl lengths of 50 seedlings were measured using SCION Image software (Scion).
To measure anthocyanin accumulation, 5-d-old light-grown seedlings were used. Fifty seedlings per sample were incubated overnight in 300 μL of extraction buffer (methanol containing 1% HCl) in the dark. After extraction, 200 μL of distilled water and 200 μL of chloroform were added to each sample, and absorbances were read at 530 and 657 nm. The quantity of anthocyanin was determined by spectrophotometric measurement of the aqueous phase (A530 − A657) and normalized to the total fresh weight of tissues used in each sample.
Relative chlorophyll levels were determined from the same samples used for measuring the quantity of anthocyanin. Chlorophyll was extracted into 1 mL of 80% acetone by shaking the chloroform fractions overnight in the dark. Chlorophyll levels were measured spectroscopically, and the amount was calculated using MacKinney's coefficients and the equation (chlorophyll a+b = 7.15 × OD660 nm + 18.71 × OD647 nm) described by Holm et al. (2002).
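The two pigment calculations described above reduce to simple arithmetic, sketched here in Python. The function and parameter names are hypothetical, and the sketch returns relative values only: the text does not state output units, so none are assumed.

```python
def anthocyanin_per_fresh_weight(a530: float, a657: float, fresh_weight: float) -> float:
    """Relative anthocyanin: (A530 - A657) of the aqueous phase per unit fresh weight."""
    if fresh_weight <= 0:
        raise ValueError("fresh weight must be positive")
    return (a530 - a657) / fresh_weight


def chlorophyll_a_plus_b(od660: float, od647: float) -> float:
    """Relative chlorophyll a+b using the coefficients quoted in the text."""
    return 7.15 * od660 + 18.71 * od647


if __name__ == "__main__":
    # Illustrative readings only, not data from the study.
    print(anthocyanin_per_fresh_weight(a530=0.50, a657=0.10, fresh_weight=0.02))
    print(chlorophyll_a_plus_b(od660=0.30, od647=0.12))
```

Keeping the two calculations as separate functions mirrors the protocol, in which the aqueous phase is read for anthocyanin and the chloroform fraction is re-extracted for chlorophyll.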
Selected binding sites and the DNA-binding sites described above were excised from plasmids by digestion with BamHI and HindIII, end-labeled with [α-32P]dATP, and purified using PAGE. EMSA was then performed as previously described (Hong et al., 1995). Proteins were preincubated in a reaction buffer (20 mM HEPES, pH 7.9; 0.2 mg/mL poly(dI-dC); 0.5 mM dithiothreitol; 0.1 mM EDTA; 50 mM KCl) for 10 min at room temperature and incubated with 2 × 10^4 cpm (0.5 ng) of end-labeled probe DNA for 15 min. The resulting protein-DNA complexes were analyzed by electrophoresis on a nondenaturing 5% polyacrylamide gel using a 0.5× Tris-borate/EDTA electrophoresis buffer. Following electrophoresis, the gels were dried and subjected to autoradiography with intensifying screens at −70°C.

Methylation Interference Analysis
Methylation interference analysis was performed as described previously with a minor modification (Schindler et al., 1992b). The plasmid containing the Hex sequence (5′-gggTGACGTGGcca-3′) was 3′-end labeled on either strand with Klenow and [α-32P]dATP at the BamHI or HindIII sites and blocked with the KpnI or SacI sites, respectively, partially methylated on G residues with dimethylsulfate, and used as a probe. This probe was subjected to a gel mobility shift assay (described below) using a bacterial cell extract prepared from the Escherichia coli strain BL21(DE3)/pLysS harboring pRSET vectors containing SGBF-1S, STGA1, and STF1 cDNAs. The crude extracts were prepared as described previously (Hong et al., 1995). Bound and unbound DNAs were eluted from the gel, cleaved with 1 M piperidine, and analyzed on a 6% denaturing polyacrylamide gel.

Overexpression and Purification of the Proteins
Expression constructs of the pGEX vector (Pharmacia) were used to express the glutathione S-transferase (GST)-fusion proteins (STF1 and HY5) in E. coli BL21(DE3)/pLysS. Purification of the GST-fusion proteins was carried out according to Smith and Johnson (1988). E. coli transformants selected with ampicillin were grown to log phase at 37°C and induced with 1 mM isopropyl-β-D-thiogalactopyranoside (Promega). Three and one-half hours after induction, bacteria were centrifuged, and the pellets were washed with cold STE (0.1 M NaCl; 10 mM Tris-HCl, pH 8.0; 1 mM EDTA, pH 8.0) and resuspended in cold phosphate-buffered saline (PBS; 16 mM Na2HPO4; 4 mM NaH2PO4, pH 7.3; 150 mM NaCl) that contained 0.25 mM phenylmethanesulfonyl fluoride, 0.25 mg/mL leupeptin, 2.5 mg/mL aprotinin, 2.5 mg/mL antipain, 0.25 mg/mL pepstatin A, and 1 mM DTT. After sonication, 0.1% Triton X-100 was added to the lysates, which were then clarified by centrifugation at 12,000g for 10 min at 4°C, and the supernatant was mixed with glutathione-linked agarose beads (Sigma). After a 30-min incubation with gentle shaking at 4°C, the beads were washed three times in cold PBS. For RBSS, the fusion protein was eluted with 10 mM glutathione (Sigma) in PBS. For EMSA, the glutathione agarose beads containing the GST-fusion proteins were equilibrated with a thrombin cleavage buffer (2.5 mM CaCl2; 150 mM Tris-HCl, pH 8.0; 150 mM NaCl; 0.1% β-mercaptoethanol), and the proteins were eluted with a thrombin cleavage buffer containing thrombin (0.024 units/mL, Boehringer Mannheim). After purification, all the proteins were dialyzed in an extraction buffer (20 mM HEPES, pH 7.9; 50 mM β-mercaptoethanol; 0.2 mM phenylmethanesulfonyl fluoride; 1 mM DTT; 1 mM EDTA; 100 mM NaCl; 10% glycerol) at 4°C, aliquoted, and frozen at −70°C. Protein concentration was determined using a combination of the Bradford reagent assay (Bio-Rad) and staining with Coomassie blue after SDS-PAGE.

RBSS
The oligonucleotide synthesized for use in the binding selection was TN64 (5′-CGCGACGTCGGAAGACAAGCTTGTAA(N)13ATAGGATCCCTCACCTCAGACAGAC-3′), which was derived from Schindler et al. (1992b). TN64 contains a randomized sequence of 13 nucleotides flanked by 26 nucleotides at the 5′-end containing a HindIII site and 25 nucleotides at the 3′-end containing a BamHI site. For PCR amplification, the oligonucleotides TNF20 (5′-CGCGACGTCGGAAGACAAAGC-3′) and TNR20 (5′-GTCTGTCTGAGGTGAGGGT-3′) served as forward and reverse primers, respectively. To produce double-stranded random binding DNA sequences, TN64 (400 pmol) and TNR20 (800 pmol) were annealed, and the second strand was synthesized after adding the four deoxyribonucleotide triphosphates (final concentration of 0.25 mM each). Extension was performed using the Klenow fragment of the DNA polymerase. The GST-STF1 fusion protein was preincubated with 6 μg poly(dI-dC), as a nonspecific competitor, for 10 min at room temperature in a reaction mixture (20 mM HEPES, pH 7.9; 50 mM KCl; 0.1 mM EDTA; 0.5 mM DTT; 15% glycerol) prior to addition of the double-stranded synthetic random oligonucleotide, producing a final volume of 30 μL. After an additional 15-min incubation, the DNA-protein complex was separated on a 7.5% polyacrylamide (29:1 acrylamide/bis) gel containing 0.5× Tris-borate/EDTA and 3% glycerol for 1.5 to 2 h at 15 to 20 V/cm. The DNA-protein complex, which comigrated with the GST-STF1 and the α-32P-labeled Hex (5′-GGTGACGTGGCT-3′) probe complex, was located, excised from the gel, and eluted by electroelution into a dialysis bag. DNA was extracted with phenol/chloroform and precipitated with ethanol. Recovered DNA was resuspended in appropriate volumes of deionized distilled water.
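The layout of the TN64 selection oligonucleotide described above can be checked with a small Python sketch. The flanking sequences are copied from the text; the helper names and the use of a seeded random generator to mimic the randomized core are illustrative assumptions, not part of the published protocol.

```python
import random

FLANK_5 = "CGCGACGTCGGAAGACAAGCTTGTAA"   # 5' flank, carries a HindIII site (AAGCTT)
FLANK_3 = "ATAGGATCCCTCACCTCAGACAGAC"    # 3' flank, carries a BamHI site (GGATCC)
N_RANDOM = 13                            # length of the randomized core


def random_tn64(rng: random.Random) -> str:
    """Build one hypothetical member of the TN64 pool with a random 13-nt core."""
    core = "".join(rng.choice("ACGT") for _ in range(N_RANDOM))
    return FLANK_5 + core + FLANK_3


def extract_core(oligo: str) -> str:
    """Return the 13-nt selected region from a full-length TN64-derived sequence."""
    if not (oligo.startswith(FLANK_5) and oligo.endswith(FLANK_3)):
        raise ValueError("sequence does not carry both TN64 flanks")
    return oligo[len(FLANK_5):len(oligo) - len(FLANK_3)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is repeatable
    oligo = random_tn64(rng)
    print(len(oligo), extract_core(oligo))
    print("theoretical pool complexity:", 4 ** N_RANDOM)
```

The lengths add up as stated in the text (26 + 13 + 25 = 64 nucleotides), and a fully randomized 13-nucleotide core corresponds to 4^13 possible sequences.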
PCR was carried out in a final volume of 100 μL containing 80 pmol of each primer, 20 nmol of the four deoxyribonucleotide triphosphates, 10 mM Tris-HCl (pH 8.3 at 25°C), 1.5 mM MgCl2, 50 mM KCl, and 0.5 μL (2.5 units) of Taq polymerase. After initial denaturation for 30 s at 92°C, the selected binding sequences were amplified for 30 s at 92°C, 1 min at 55°C, and 30 s at 72°C for 30 cycles in a thermal cycler (Pharmacia LKB). The amplified DNA of 64 nucleotides was purified on a 2% agarose gel, eluted as described above, and served as the template for the following round of selection. Five selection cycles (binding/gel-shift/elution/PCR) were carried out using a binding buffer containing 50 mM KCl. The final pool of oligonucleotides was digested with BamHI and HindIII, ligated into pBluescript II (Stratagene) SK(+), and transformed into XL1-Blue. A total of 100 clones containing the selected binding site were randomly chosen. The sequences of the inserted DNA were determined by using the 7-deaza-dGTP sequencing kit (USB) with the T3 primer and [α-32P]dATP (Amersham).

Sequence Analysis
The potential targets for STF1/HY5 in the Arabidopsis genome database were searched using the Pattern Matching program on the Arabidopsis Information Resource Web site (http://www.arabidopsis.org). Duplicate entries between the two data sets were removed using Duplicate Remover, a bioinformatic tool available on The Bio-Array Resource for Arabidopsis Functional Genomics Web site (http://bbc.botany.utoronto.ca).
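As an illustration of this kind of pattern search, the following Python sketch scans a sequence on both strands for the high-affinity motif RRTGACGTVD written in IUPAC code. It is a minimal stand-in, not the Pattern Matching program used in the study; the function names are hypothetical, and positions are reported as 0-based forward-strand coordinates by choice.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def iupac_pattern(motif: str) -> "re.Pattern[str]":
    # A lookahead is used so that overlapping matches are also reported.
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(seq: str, motif: str = "RRTGACGTVD"):
    """Return sorted (start, strand, matched_sequence) tuples for both strands."""
    seq = seq.upper()
    pattern = iupac_pattern(motif)
    n, k = len(seq), len(motif)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    # Convert reverse-strand match positions back to forward-strand coordinates.
    hits += [(n - m.start() - k, "-", m.group(1)) for m in pattern.finditer(rc)]
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_sites("CCGATGACGTCATT"):
        print(hit)
```

Because the C-box core TGACGTCA is palindromic, a single site can satisfy the motif on both strands, as in the example sequence; a downstream step would need to decide whether such paired hits count once or twice.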

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Table S1. Genes that carry high-affinity binding sites based on RBSS analysis and that were identified as HY5 targets by ChIP-chip (Lee et al., 2007).
Supplemental Table S2. The 1,705 genes that contain the high-affinity HY5 binding site (RRTGACGTVD); the number of HY5-binding motifs and the position of 12 nt sequence (start and end) are shown with sequences.
Supplemental Table S3. List of light-regulated genes that contain the HBACGTVD (1,681 genes) or RRTGACGTVD (228 genes) sequences (Ma et al., 2005) and their identification as HY5 target genes or not (Lee et al., 2007).