Interaction of SET domains with histones and nucleic acid structures in active chromatin

Changes in the normal program of gene expression underlie a number of human diseases. Epigenetic control of gene expression is programmed by chromatin modifications, the heritable “histone code”, whose major component is histone methylation. This chromatin methylation code of gene activity is established upon cell differentiation and is further controlled by “SET” (methyltransferase) domain proteins, which maintain the histone methylation pattern and preserve it through rounds of cell division. Understanding the molecular principles of epigenetic gene maintenance is essential for the proper treatment and prevention of disorders and their complications. However, the principles of epigenetic gene programming remain unresolved. Here we discuss evidence of how SET proteins determine the required states of target genes and maintain the required levels of their activity. We suggest that, along with other recognition pathways, SET domains can directly recognize nucleosome and nucleic acid intermediates that are specific to active chromatin regions.

The accumulated data suggest that many diseases and metabolic disorders are caused by altered patterns of gene expression (Kaminsky et al. 2006; Perini and Tupler 2006; Maekawa and Watanabe 2007). Chromatin-templated processes are controlled by a complex pattern of posttranslational modifications of the flexible N-terminal tails of histone proteins, including methylation, acetylation, phosphorylation, and ubiquitination, which comprise the heritable "histone code" of gene function (Marmorstein and Trievel 2009), although the effects of the histone code may depend on the particular context (Lee et al. 2010). Chromatin activity is largely determined by the methylation status of specific lysine and arginine residues in the N termini of histones H3 and H4 (Li et al. 2007; Shilatifard 2008). For example, methylation of H3 K4, K36, and K79 promotes gene activation, whereas methylation of H3 K9, H3 K27, and H4 K20 is associated with gene silencing. Knowledge of epigenetic gene regulatory principles is essential for developing targeted therapies.
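To make the associations in the preceding paragraph explicit, they can be restated as a small lookup table. This is a hypothetical illustration rather than a tool from the cited work: the names `MARK_EFFECT` and `associated_effect` are ours, and the table encodes only the marks named in the text, without modeling the context dependence noted by Lee et al. (2010).

```python
# Hypothetical lookup table restating the associations named in the text:
# (histone, lysine) -> effect of methylation at that lysine.
# Context dependence of the histone code is deliberately NOT modeled.
MARK_EFFECT = {
    ("H3", 4): "activation",
    ("H3", 36): "activation",
    ("H3", 79): "activation",
    ("H3", 9): "silencing",
    ("H3", 27): "silencing",
    ("H4", 20): "silencing",
}


def associated_effect(histone: str, lysine: int) -> str:
    """Return the association stated in the text, or 'not listed' otherwise."""
    return MARK_EFFECT.get((histone, lysine), "not listed")


if __name__ == "__main__":
    for histone, lysine in [("H3", 4), ("H4", 20), ("H3", 14)]:
        print(f"{histone} K{lysine} methylation: {associated_effect(histone, lysine)}")
```

Running it reports activation for H3 K4, silencing for H4 K20, and "not listed" for H3 K14, a residue the text does not classify.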
With the exception of Dot1, which methylates H3 K79 (van Leeuwen et al. 2002), the methylation of histone lysines for chromatin activity and silencing is conferred, respectively, by the Trithorax and Polycomb group histone methyltransferases, which contain a conserved 130-amino-acid catalytic "SET" domain [Su(var)3-9, Enhancer of zeste, Trithorax] (Qian and Zhou 2006). The SET domain is a paradigm for both positive and negative regulators of chromatin activity. Less well-conserved pre- and post-SET sequences may flank SET domains at their N- and C-terminal boundaries, respectively. The SET-domain proteins assume full control of maintaining chromatin lysine methylation through rounds of cell division after the gene activity patterns have been established in early embryogenesis by a cascade of maternal and zygotic transcription factors (Breiling et al. 2007). However, it is not fully understood how SET-domain proteins "decide" on the required states of gene activity in a tissue-specific manner, how they initially recognize their target genes, and how this "recognition" is transmitted to progeny cells.
The molecular mechanisms behind SET-domain recognition of chromatin activity states are essential for understanding the molecular etiology of epigenetically related disorders. The best-studied example is cancer (Albert and Helin 2010), although a number of other diseases also result from aberrant chromatin methylation patterns. For example, correct functioning of islet beta cells has been shown to depend on the methyltransferase Set7/9, which is implicated in maintaining the active chromatin status of genes required for glucose-stimulated insulin secretion (Deering et al. 2009). An aberrant histone methylation pattern may be a major mechanism underlying the sustained proinflammatory phenotype of diabetic cells. The SET7/9 protein recognizes and preserves the correct methylation states of K4 in histone H3 in the chromatin of NF-kappa B-dependent inflammatory genes, whose correct functioning is critical for preventing the progression of diabetes and the metabolic syndrome. Trimethylation of histone H3-K9 by Suv39h1 is essential for preventing the pre-activated state of diabetic vascular smooth muscle cells (Villeneuve et al. 2008). Methylation of histone H3 controls the expression of insulin (Cavener 2009). Tissue-specific heritable aberrations in the histone methylation pattern could underlie the propagation of the trans-generational insulin-resistant phenotype in gestational diabetes (Devaskar and Thamotharan 2007). The Ezh2 methyltransferase is implicated in maintaining normal pancreatic beta cell proliferation, likely by preserving histone H3 trimethylation at the Ink4a/Arf locus in islet beta cells, thus preventing beta cell regenerative failure and diabetes (Villeneuve et al. 2008). In addition, the Ezh2 protein is also likely to be involved in the formation of cancerous tissues, including prostate and breast cancers (Simon and Lange 2008).
Deregulation of MLL1, the human homolog of Drosophila Trithorax, results in acute lymphoid and myeloid leukemias (Cosgrove and Patel 2010). Epigenetic alterations are also implicated in the development of cardiac hypertrophy and ischemia (Maekawa and Watanabe 2007; Granger et al. 2008; Kaneda et al. 2009), rheumatoid arthritis (Strietholt et al. 2008), autoimmune disease (Szyf 2010), asthma (Schwartz 2010), and other diseases (Perini and Tupler 2006; Maekawa and Watanabe 2007). There are many such studies, but their practical implications are still limited by insufficient understanding of how SET-domain proteins recognize, maintain, and propagate chromatin activity states to descendant cells.
Data in the literature suggest several, not mutually exclusive, possibilities of how SET proteins may recognize their chromatin targets.
(a) SET-domain proteins can recognize their target genes through direct or indirect interaction with site-specific chromatin-binding factors, including factors recognizing specific histone modifications.
Thus, human MLL1/MLL2 methyltransferases can bind activated estrogen receptor-α through the associated tumor suppressor Menin (Dreijerink et al. 2006); TRR, the major Drosophila H3-K4 methyltransferase, can be targeted to ecdysone-responsive promoters through direct association with the ecdysone nuclear receptor (Sedkov et al. 2003); MLL1 can associate with E2F transcription factor 6; Trithorax and MLL methyltransferases may be targeted to chromatin through association with heat shock protein HSP90 (Tariq et al. 2009); and PRC2 complexes can be site-specifically anchored to DNA by the PHO/PHO-like/YY-1 DNA-binding proteins (Brown et al. 2003), among other examples. The recruitment of SET-domain proteins may also involve direct interactions of histone lysine methyltransferases (HKMTs) with specific DNA sequences; for example, the interaction of MLL1 with DNA through its CXXC domain, which binds non-methylated CpG sites (Cierpicki et al. 2010), could contribute to the stable association of MLL1 with the HoxA9 gene (Milne et al. 2010). Direct interaction of the NSD1, -2, -3 and PR-SET7/8 SET domains with DNA may be essential for the methylation specificity and activity of these enzymes (Li et al. 2009).
The recruitment of SET-domain proteins may involve the recognition of site-specific histone modifications and histone variants. For example, the PHD motifs of MLL1 and Trithorax proteins can recognize histone H3 trimethylated at lysine 4 and thus contribute to stable chromatin association (Chang et al. 2010; Milne et al. 2010). The Suv39h1 and Suv39h2 HKMTs can be targeted to chromatin through association with the C-terminal chromoshadow domain of the chromodomain protein HP1, which selectively binds di- and trimethylated lysine 9 in histone H3. Similarly, E(z) can associate through its Esc subunit with Polycomb, which recognizes H3-K27 trimethylation (Daniel et al. 2005; Schuettengruber and Cavalli 2009). The bromodomains of Trithorax and MLL methyltransferases and of their associated proteins can recognize histone tails acetylated at specific lysine residues (Yang 2004). Association of SET-domain proteins with chromatin may also involve recognition of histone variants. For example, histone variant H3.3, which is preferentially deposited at gene regulatory elements, is enriched in lysine methylation associated with active gene transcription (Ng and Gurdon 2008), suggesting that it may facilitate recruitment of SET proteins, presumably by promoting a more accessible chromatin configuration (ibid).
Many of the HKMT-associated subunits can selectively bind histones with di- and trimethylated substrate lysines in vitro through their histone-recognition motifs. In vivo, however, this recognition of specific histone methylation states most likely ensures proper di- and trimethylation of target lysines through control of the catalytic cycle, rather than recruitment of HKMTs to their chromatin loci per se or basic monomethylation of chromatin. Conserved HKMT subunits, the WD40-repeat proteins (Smith 2008) such as human WDR5 (WDS in Drosophila, Cps30 in yeast), RbBP5, RbAp48/46 (p55 in Drosophila), and Eed (Esc/Escl in Drosophila), can recognize histones through the repeated regions of their β-propeller structures and "present" them to methyltransferase catalytic domains (Ruthenburg et al. 2007a; Suganuma et al. 2008). Wdr5, the common subunit of Trithorax-group HKMT complexes, recognizes dimethylated K4 in histone H3 and promotes H3-K4 trimethylation in vitro (Schuetz et al. 2006; Suganuma et al. 2008; Trievel and Shilatifard 2009), although Wdr5 is dispensable in vivo for association of MLL1 with its chromatin targets and for chromatin di- (but not tri-)methylation (Wysocka et al. 2005; Suganuma et al. 2008). RbAp48, the common subunit of Polycomb PRC2 complexes, recognizes the histone H3 and H4 termini, although within the PRC2 complex its H4 binding is restricted (Suganuma et al. 2008). The Esc and p55 subunits of Drosophila PRC2 are both required for association of PRC2 with nucleosomes in vitro (Nekrasov et al. 2005). Drosophila Esc and Escl and human Eed have been shown to bind histone H3 specifically in vitro in an H3 tail- and modification-independent manner that was essential for E(z)-dependent trimethylation of H3-K27 in vivo (Tie et al. 2007). However, Esc and Escl were dispensable for E(z) targeting and chromatin monomethylation in vivo (Kurzhals et al. 2008). Human Ezh2, in association with Suz12 and Eed, specifically binds trimethylated H3-K27 (Hansen et al. 2008), although it has also been reported that Eed alone can recognize trimethylated K9 or K27 in histone H3 and K20 in H4 (gene-silencing methylation marks), resulting in allosteric activation of the Ezh2 methyltransferase (Margueron et al. 2009; Suganuma and Workman 2010). It has been proposed that these mechanisms may be involved in propagating repressive H3 K27 trimethylation over extended genomic domains, as well as in transmitting gene silencing to progeny cells (ibid).
Proper di- and trimethylation of lysine 4 in histone H3 depend on monoubiquitination of K123 in histone H2B (K120 in mammals) (Shilatifard 2008; Weake and Workman 2008; Shukla et al. 2009). However, lack of H2B ubiquitination does not affect the recruitment of SET-domain methyltransferases to gene-specific loci or the monomethylation of histone lysines. Instead, it likely affects the correct assembly of recruited HKMTs with their WD-repeat histone-binding subunits (Dehe and Geli 2006; Shilatifard 2008), such as Cps35 of yeast Set1 and its homolog Wdr82 of human hSet1, which associate with chromatin in an H2B ubiquitination-dependent but Set1/hSet1-independent manner (Lee et al. 2007; Wu et al. 2008). Thus, complementation of Set1 in a ubiquitination-deficient background with wild-type Cps35 confers Set1 trimethylation activity (Lee et al. 2007). However, some reports contest whether H2B ubiquitination is absolutely required for Set1 trimethylation activity (Foster and Downs 2009; Wang et al. 2009) or for recruitment of Cps35 (Vitaliano-Prunier et al. 2008).
SET proteins associate primarily with the RNA polymerase II holoenzyme, yet components of the Paf elongation complex are essential for recruiting SET protein complexes to transcribed genes (Krogan et al. 2003; Milne et al. 2005). However, the proper enzymatic activity of RNA pol II-recruited methyltransferase complexes depends on the presence of WD-repeat proteins, such as WDR5 for MLL/Trithorax or Cps35 for Set1, and on the monoubiquitination of histone H2B (Ruthenburg et al. 2007b; Shilatifard 2008).
(c) The SET-domain recruitment principles outlined above are comprehensively reviewed in a number of the recent reviews cited here. We would now like to discuss in more detail one more possible mechanism of chromatin recognition by SET-domain proteins, which may be implemented through direct interaction of the SET domain with nucleic acid or histone components of transcriptionally active chromatin.
Evidence for this mechanism has come from the discovery of a high-affinity histone-binding motif in the SET domain of Drosophila Trithorax, which selectively and efficiently binds the N-terminal region of histone H3 (Katsani et al. 2001). The point mutation trxZ11 in the Trithorax SET domain (G3601S), which severely impairs SET-domain function and results in homeotic transformations and lethality in flies, also abolishes the histone-binding ability of the SET domain, suggesting that histone-SET interactions are involved in maintaining the pattern of gene activity in vivo (ibid). As illustrated in Fig. 1, the SET-domain polypeptide of the Set7/9 methyltransferase preferentially methylates SET-associated histone H3, suggesting that tight binding of histone H3 is essential for histone methylation. However, while the SET domains of Trithorax and some other methyltransferases have high affinity for free histone H3, SET domains bind nucleosomes inefficiently (Krajewski and Reese 2010), consistent with observations that various SET domains exhibit high methyltransferase activity on free histones but poorly methylate histones within nucleosomes (Wang et al. 2001; Martin et al. 2006; Krajewski and Reese 2010). This suggests that histone-SET interactions are restricted within compact "canonical" nucleosomes and that the nucleosome particle must be structurally altered for association with the SET domain.
Studies of most SET domains complexed with histone polypeptides suggest that the target lysine residue reaches the cofactor S-adenosylmethionine, located on the opposite face of the catalytic unit, through a narrow "trans-enzyme" lysine-access channel composed of hydrophobic residues (Bottomley 2004; Southall et al. 2009; Wu et al. 2010 and references therein). The steric features of the lysine-access channel determine the SET-domain specificity and the level of methylation. By changing the size of the lysine-binding groove to accommodate mono- or di- and trimethylated lysines, it was possible to convert the SET7/9 monomethylase into a trimethylase (Xiao et al. 2003) and the Dim5 trimethylase into a monomethylase (Zhang et al. 2003). The proper levels of histone methylation in vivo are likely promoted by the WD40-repeat subunits of HKMTs, which control the interactions of the target lysine with the methyl-donating cofactor in the SET-domain lysine-access channel.
Fig. 1 An illustration of the relationship between SET-domain histone binding and histone methylation. Set7/9 methyltransferase preferentially methylates SET-domain-associated histone H3. Bacterially expressed GST-tagged full-size SET7/9 protein, immobilized on glutathione-Sepharose, was incubated for the indicated time with excess HeLa cell core histones and 100 μg/ml BSA in the presence of [3H]-S-adenosylmethionine. The GST-SET beads were sequentially washed in buffers containing 0.2% NP-40 and 0.2, 0.4, and 0.6 M NaCl. Bead-associated proteins ("bound", bd; lanes 2 and 4) and TCA-precipitated pooled wash fractions ("unbound", u; lanes 3 and 5) were resolved on an SDS gel and stained with Coomassie (top panel). The H3-containing slice was excised from the gel, treated with EN3HANCE reagent (Perkin-Elmer), dried, and exposed to film. Lane 1 shows input histones in the reaction buffer.
The strict spatial disposition of the histone terminus within the catalytic domain may explain why histone termini must be sterically accessible for unhindered binding to the SET-domain lysine-access pocket.
The nucleosome consists of a 147-bp DNA fragment wrapped around an octamer of histone proteins H2A, H2B, H3, and H4, arranged as an H3-H4 tetramer flanked on either side by an H2A-H2B dimer (Luger et al. 1997). The flexible histone N termini engage in interactions with extranucleosomal DNA and with one another, both intra- and internucleosomally, and are thus not readily accessible to external proteins (Davey et al. 2002; Zheng and Hayes 2003). Dense nucleosome packaging can be altered by a family of ATP-dependent chromatin remodelers, which can also increase the accessibility of histone octamers through multiple mechanisms, including partial or complete eviction of histone H2A-H2B dimers, nucleosome dimerization, and formation of intra- and internucleosomal DNA loops (Clapier and Cairns 2009). Nucleosomes can be unfolded during transcription, resulting in a transient loss of H2A-H2B dimers (Kulaeva et al. 2007) and the formation of stably altered nucleosomal transcription intermediates (Nacheva et al. 1989; Bazett-Jones et al. 1996). The accessibility of histone termini can be affected by nucleosome-nucleosome interactions, by binding of linker histone H1 (Arya and Schlick 2009; Kan et al. 2007, 2009), or by posttranslational modification of histones (Campos and Reinberg 2009). For example, acetylation of histone N termini, a hallmark of active chromatin regions, weakens the electrostatic interaction of histone tails with DNA (Choi and Howe 2009), which promotes opening of the nucleosome structure and increases the accessibility of histone termini.
Internucleosome interactions within di- and oligonucleosomes and/or remodeling of di- and oligonucleosomes by incorporation of histone H1 significantly stimulated the methyltransferase activity of the Ezh2 SET domain, which exhibited weak methyltransferase activity towards intact mononucleosomes. Remodeling of dinucleosome, but not mononucleosome, templates with purified Isw1 and Isw2 complexes facilitated association of nucleosomes with GST-tagged SET-domain polypeptides of MLL1 and SET7/9 (Krajewski and Reese 2010). It has also been shown that nucleosomes reconstituted from hyperacetylated histones have increased affinity for the GST-SET polypeptide of MLL1 (Krajewski and Vassiliev 2010). Presumably, any process that changes chromatin conformation with a concomitant release of histone N termini could provide a structural basis for binding of SET domains to nucleosomes. Liberation of histone termini could facilitate their proper positioning in the SET-domain lysine-binding pocket, thereby partially eliminating the requirement for auxiliary WD-repeat factors.
In contrast, the yeast Set2 H3-K36 and human PR-Set7/8 H4-K20 methyltransferases exhibit a preference for nucleosomes over free histones (Nishioka et al. 2002; Strahl et al. 2002). Set2, unlike many other SET-domain HKMTs, binds histone H3 peptide inefficiently (and only the non-methylated H3-K36 form), but can tightly bind nucleosomes in vitro through interaction of its N terminus with an H3-K36-like motif in histone H4 (GGVKR, residues 41-45, vs. GGVKK, residues 33-37, in H3), which is relatively exposed (Davey et al. 2002), and with other histone regions (Psathas et al. 2009; Du and Briggs 2010) (see text above). However, in vivo, multiple Set2-nucleosome contacts are essential for H3-K36 di- and trimethylation but are dispensable for recruitment of Set2 to chromatin and for H3-K36 monomethylation (ibid). The nucleosome preference of PR-Set7/8 may be explained by an unusually high flexibility of its lysine-binding channel, such that the substrate itself contributes to the structure of the channel (Xiao et al. 2005). This facilitates docking of the H4-K20 lysine to the methyl-donating cofactor. Consistent with this observation, at low NaCl concentration, PR-Set7/8 is equally efficient on nucleosome and free histone substrates (ibid). In addition, the nucleosome preference of PR-Set7/8 is lost upon deletion of the first 14 amino acids from the N terminus of the protein (Nishioka et al. 2002) or in the presence of short nucleic acid fragments (Li et al. 2009), which likely facilitate conformational changes within the SET domain (ibid).
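The H4 motif proposed as a Set2 contact resembles the sequence around H3 K36. As a simple illustration (not an analysis from the cited studies), the two five-residue motifs quoted above can be compared position by position in an ungapped alignment. The function name `compare_motifs` is hypothetical, and the residue numbering is taken directly from the text.

```python
# Ungapped, position-by-position comparison of the two motifs quoted in the text.
# Each motif is (one-letter sequence, number of its first residue).
H3_MOTIF = ("GGVKK", 33)  # histone H3 residues 33-37 (includes K36)
H4_MOTIF = ("GGVKR", 41)  # histone H4 residues 41-45 (includes K44)


def compare_motifs(a, b):
    """Return (residue_a, residue_b, identical) tuples for equal-length motifs."""
    seq_a, start_a = a
    seq_b, start_b = b
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped comparison needs motifs of equal length")
    return [
        (f"{ra}{start_a + i}", f"{rb}{start_b + i}", ra == rb)
        for i, (ra, rb) in enumerate(zip(seq_a, seq_b))
    ]


if __name__ == "__main__":
    rows = compare_motifs(H3_MOTIF, H4_MOTIF)
    for res_h3, res_h4, same in rows:
        print(f"H3 {res_h3:>4}  H4 {res_h4:>4}  {'identical' if same else 'different'}")
    identity = sum(same for _, _, same in rows) / len(rows)
    print(f"identity: {identity:.0%}")
```

With the numbering given in the text, H3 K36 lines up with H4 K44, and the motifs differ only at the last position (H3 K37 vs. H4 R45), so four of five residues are identical.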
Additional evidence for the ability of SET domains to specifically target "open" chromatin comes from their ability to bind single-stranded nucleic acids. The SET-domain region of diverse HKMTs contains a polypeptide motif named SSBL (SSB-like), located in the pre-SET region or at the boundary of the SET and pre-SET regions (Fig. 2), which can tightly bind single-stranded DNA (ssDNA) and RNA and can recognize ssDNA stretches in supercoiled and in vitro-transcribed DNA (Krajewski et al. 2005). In addition to the proteins indicated in Fig. 2, the ssDNA-binding motif was found (although not mapped precisely) in the SET-domain region of the yeast Set1 and Set2 and human ALR1 proteins; it was not found in the Set7/9 methyltransferase (unpublished observations). The SSBL motif was found in SET domains belonging to the E(z), Ash1, and Trithorax families, but not the Su(var)3-9 family (SET classification as suggested by Alvarez-Venegas and Avramova 2002). Thus, the presence and location (Fig. 2) of the SSBL motif correlate with SET-domain functional activity. The SSBL motif binds single-stranded DNA substrates with selectivity and efficiency similar to those of the major Escherichia coli SSB protein (ibid), suggesting that in vivo the SSBL motif may participate in processes that involve regular SSB proteins, i.e., almost any time single-stranded DNA is present or requires manipulation.
It has been shown that the SSBL motif in the SET domain of Ash1 can recruit Ash1 protein to its target sites in the Drosophila homeotic gene Ultrabithorax (UBX) through association with UBX-associated non-coding RNAs (Sanchez-Elsner et al. 2006). The non-coding RNA transcripts of three Trithorax response elements (TREs) in UBX can mediate transcriptional activation by recruiting Ash1 to the template TREs. The SET domain of Ash1 binds all three TRE transcripts, yet each TRE transcript recruits Ash1 only to the corresponding TRE in chromatin (ibid). The authors suggest that Ash1 is recruited to TREs through association with stretches of ssDNA exposed during ongoing transcription and with the non-coding RNAs generated by transcription (ibid). Schmitt and Paro (2006) suggested that non-coding RNAs that survive mitosis might serve to re-target Ash1 to a TRE, ensuring re-establishment of epigenetic activation after DNA replication. In addition, direct interactions of the SET domain with nucleic acids (RNA or DNA) may regulate HKMT activity and specificity by facilitating proper positioning of the substrate lysine side chain in the SET-domain lysine-access channel, similar to what has been shown for NSD2 and PR-SET7/8 in vitro (Li et al. 2009).
Various types of histone methylation are hallmarks not only of gene-regulatory and transcription initiation regions but also of entire transcribed genes (Li et al. 2007). Functionally active chromatin assumes a more "open" configuration with differential packaging of DNA into nucleosomes (Gilbert and Ramsahoye 2005). Therefore, it can be presumed that H3 and H4 histone termini in functioning chromatin are more readily accessible to SET domains (see text above). Transcriptionally active DNA also contains extended regions of ssDNA, RNA transcripts, and multiple DNA regions with distorted base pairing, which represent potential targets for SET-domain proteins. This, in principle, may help HKMTs discriminate between active and repressed chromatin. The recruitment of SET-domain-containing proteins to their chromatin loci may involve their association with "exposed" histone tails and with ssDNA and RNA structures, which form during transcription and other nucleosome-templated processes. The ability of SET-domain proteins to bind single-stranded nucleic acids may also be involved in the heritable propagation of chromatin methylation patterns. For example, as the replication fork progresses, SET-domain proteins may be distributed between the parental and newly synthesized DNA strands owing to the high affinity of SET domains for ssDNA. In this way, SET-domain proteins may be implicated not only in methylating chromatin for transcription and repression but also in transmitting these states to descendant cells.
In view of the above, it is of interest that the Trithorax SET domain is indispensable for targeting Trithorax protein to transcribed heat shock genes in vivo (Smith et al. 2004). The expressed Trithorax SET-domain polypeptide is recruited in vivo to the heat shock genes only during their active transcription (ibid). In addition, maintaining the active state of the Drosophila Ultrabithorax gene requires initial transcription of the Ultrabithorax regulatory region (Bender and Fitzgerald 2002; Hogga and Karch 2002; Rank et al. 2002). Using the ChIP-on-chip technique, it has been shown that Drosophila Trithorax binds to the promoters of active genes and non-coding transcripts (Beisel et al. 2007). The SET domain of Ezh proteins is required for their recruitment to target genes. A point mutation in the SET domain that prevents association of PRC2 complexes with chromatin does not affect the integrity of the complex (Margueron et al. 2008), suggesting that the SET domains of Ezh1 and Ezh2 are directly implicated in targeting Ezh proteins to chromatin.
The mechanism discussed here addresses only one of many aspects of the interaction of SET-domain-containing proteins with chromatin. However, given the high affinity of SET domains for histones and single-stranded nucleic acids, these interactions may be important in the transcriptional regulatory mechanisms involved in cell determination.