Identification of Methyllysine Peptides Binding to Chromobox Protein Homolog 6 Chromodomain in the Human Proteome*

Methylation is one of the important post-translational modifications that play critical roles in regulating protein functions. Proteomic identification of this post-translational modification and understanding how it affects protein activity remain great challenges. We tackled this problem from the aspect of methylation mediating protein–protein interaction. Using the chromodomain of human chromobox protein homolog 6 as a model system, we developed a systematic approach that integrates structure modeling, bioinformatics analysis, and peptide microarray experiments to identify lysine residues that are methylated and recognized by the chromodomain in the human proteome. Given the important role of chromobox protein homolog 6 as a reader of histone modifications, it was interesting to find that the majority of its interacting partners identified via this approach function in chromatin remodeling and transcriptional regulation. Our study not only illustrates a novel angle for identifying methyllysines on a proteome-wide scale and elucidating their potential roles in regulating protein function, but also suggests possible strategies for engineering the chromodomain–peptide interface to enhance the recognition of and manipulate the signal transduction mediated by such interactions.

such as transcriptional regulation, signal transduction (12), tumorgenesis, and protein degradation. A great variety of PTMs on histones (i.e. histone modifications) can affect the expression of target genes by directly altering chromatin structure or recruiting regulatory protein complexes (13)(14)(15). A series of histone modification readers have been identified that recognize histone modifications and transmit the signal to the next layer in the protein interaction network (16,17). In addition to the discovery of those histone modification readers, structural biologists have solved several crystal structures of histone modification readers and demonstrated the characteristics of various binding pockets that recognize different histone modifications (16).
Polycomb protein comprises a family of histone modification readers that specifically recognize methylated H3K9/ H3K27 (18,19). It was first identified in Drosophila melanogaster as a heterochromatin-associated protein and a component of polycomb repressive complex 1 (PRC1), which is related to gene repression (20). A series of proteins also identified as polycomb group proteins are the components of PRC1 and PRC2 and involve chromatin-related gene repression. Further studies have shown that polycomb group proteins, including polycomb itself, are involved in a wide range of gene regulation activities related to development, sexrelated regulation, tumorgenesis, cell cycle control, and cell differentiation (21)(22)(23)(24)(25). However, little is known about the function of polycomb beyond its role as a polycomb group protein.
Five human homologs of Drosophila polycomb have been found: CBX2, CBX4, CBX6, CBX7, and CBX8 (18,26). Complex structures between CBX proteins and methylated histone peptides H3K9me3, H3K27me2, and H3K27me3 (27)(28)(29)(30) have been resolved. All of those complex structures show the same binding site structure as that of polycomb, in which a threeresidue aromatic cage composed of tryptophan and either tyrosine or phenylalanine is located on the interface, and the side chain of methylated lysine extends into the aromatic cage. The cationinteraction between the three aromatic rings and the positively charged methylated lysine becomes the main stabilizing force for the complex.
Recent studies have also shown that non-histone lysine methylation exists widely in the human proteome. Levy et al. discovered 118 proteins that can be methylated by lysine methyltransferase SETD6 using fluorescent methods and 114 others via radioactive methods; 26 proteins were detected with both methods (31). Dhayalan et al. described 91 nonhistone peptide substrates that can be methylated by lysine methyltransferase SET7/9; among them, 9 proteins were further verified in vivo (32). Additionally, Balasubramaniyan et al. showed that SET7/9 can also methylate FXR (33). The known function for lysine methylation on non-histone proteins is very limited. Only a few non-histone methylated proteins have been well studied. One example is tumor repressor p53, which has multiple lysine methylation sites, such as K370 (34), K372 (35,36), K373 (37), and K382 (38,39). p53K372me stabilizes p53 protein in the nucleus, which is required for the transcription activation of downstream genes (35,36), whereas p53K370me and p53K382me repress the p53 regulation effect on target genes (34,39). A second example is the lysine methylation of DNA methyltransferase DNMT1 (40,41); DNMT1-K142me can lead to its degradation (42). Furthermore, RELA, a component of the pleiotropic transcription factor NF-kB, can be monomethylated by SET6D at K310, resulting in a repressive effect for the RELA-driven processes (43). Despite these discoveries, the proteomic identification of methylation by means of mass spectrometry remains nontrivial because of the lack of effective enrichment methods for specific methylated peptides. In addition, determining which proteins recognize methyllysine is an important but very challenging problem that has not been widely addressed for nonhistone proteins.
Several computational methods related to domain-peptide interaction are available, but only on limited domains, such as SH2 (44), SH3 (45)(46)(47)(48)(49), and PDZ (50 -55). For all of these domains, enough interaction data have already been obtained to train a reasonably good machine learning model. However, there is no effective way to obtain machine learning models for interactions without sufficient experimental data. There are some prediction methods for potential lysine methylation sites (56 -59), but the prediction results give information only on whether lysine sites can be methylated; the relationships among methylated lysine sites, proteins, and their functions remain unclear.
In this study, we first characterized the domain-peptide interface using a mutation matrix generated from molecular dynamics and virtual mutagenesis, which revealed residues critical for binding and suggested possible ways to engineer the recognition interface. We then searched for methyllysinecontaining peptides in the human proteome that may bind to the chromobox protein homolog 6 (CBX6) chromodomain. We exploited a computational pipeline to integrate diverse data, including the binding affinity estimated by the mutation matrix, sequence conservation and secondary structure of the peptide, subcellular location of the protein containing the candidate peptide, mass spectrometry signal of methylation, and co-expression between CBX6 and its putative binder. The peptides selected by this computational pipeline were next examined via peptide microarray binding assay, and 50 nonhistone peptides were confirmed as CBX6 binders. Gene ontology analysis of these CBX6 interacting partners showed that the majority of them function in chromatin modification and transcriptional regulation. Our study provides a novel way to systematically identify methylated peptides in the proteome, illustrate the protein-protein interactions mediated by such methylation, and thus significantly expand our limited knowledge of non-histone protein methylations.

MATERIALS AND METHODS
CBX6 Modeling Procedure-The crystal structure for the CBX6-H3K27me3 complex was obtained from the Protein Data Bank (60) (PDB ID 3I90). The models were prepared for molecular dynamics simulation with the AMBER 9.0 (61) module LEAP. The trimethylated lysine, which was not recognized by the standard residue library of LEAP, was treated as a nonstandard residue. Its atomic charges were calculated with the HF/6 -31G* basis set of Gaussian03 (62) and fitted subsequently with RESP (63). Supplemental Table S1 lists the resulting partial charges on the trimethylated lysine atoms. In the Gauss-ian03 charge calculation, the trimethylated lysine residue was capped (64) by ACE and NME residues so that it could be treated as a complete molecule. The FF03 force field (65) was used to handle all atoms in the model except for the trimethylated zeta nitrogen of H3K27, for which the GAFF force field (66) was used. The CBX6histone model was solvated with a rectangular, periodic box of TIP3P water molecules (67) that extended 10 Å past the solute on all sides. Chloride counter ions were added to neutralize the models.
Molecular Dynamics Simulation-The CBX6-H3K27me3 system was minimized in two stages. In the first stage, the water molecules and counter ions were relaxed while organic molecules were restrained by a harmonic force of 100 kcal/mol Å 2 . In the second stage, the restraints were removed, and the entire system was allowed to minimize. Each stage consisted of 400 steps of steepest descent followed by 1600 steps of conjugate gradient minimization. After the minimization stages, the system was heated from 0 to 300 K over 30 ps. The temperature of 300 K and a pressure of 1 atm were maintained by a Berendsen thermostat and barostat (68) with a coupling time of 0.2 ps. The SHAKE algorithm (69) was used to restrain all bonds involving hydrogen. Initial temperatures were assigned randomly at the beginning of the heating run according to the Maxwell-Boltzmann distribution. The system was equilibrated for 200 ps and then underwent a production run lasting 4 ns. The CBX6 heavy atoms were restrained by a harmonic force of 10 kcal/mol Å 2 in order to ensure the stability of the complex during the equilibration and production runs.
Virtual Mutagenesis-We formed the mutation matrix by identifying the 21 most critical binding residues in CBX6 for binding to H3K27me3, according to Table I. Each of these residues was mutated to all of the other 19 naturally occurring amino acids. This produced 21 ϫ 19 ϭ 399 separate models, each a point mutation of the wild-type complex between CBX6 and histone H3. In a separate operation, the sequence of the protein was preserved and each of the H3 residues featured in the model, except for H3K27, was mutated to each of the other 19 amino acids. This produced another 10 ϫ 19 ϭ 190 separate models. The software program SCAP (70) was used to perform the point mutations. The standard procedure of minimization, heating, and equilibration was performed on each of the models, in addition to 2 ns of production run. Binding free energies were calculated from the second half (the 1.0 -2.0 ns interval) of the production run for each model.

Trajectory Analysis, Free Energy Calculation, and Decomposition
Analysis-The root mean square deviation for each system during the equilibration and production run phases of the molecular dynamics simulation was calculated by the PTRAJ module in the AMBER package (61). As a representative sample, the root mean square deviation for the CBX6/H3K27me3 wild type is shown in Fig. 1. Based on the stabilization of the root mean square deviation after about 1.0 ns, the interval for the binding free energy calculation was chosen for the latter half of each production run, between 1.0 and 2.0 ns. MM-GBSA (71)(72)(73) was used to calculate the binding free energies for each CBX6-histone complex using the MM-PBSA module in AMBER 9.0, according to the general procedure given by Wang and Kollman (74) and Hou et al. (73). The binding free energies ⌬G bind were calculated from 100 snapshots taken at equal intervals between the 1.0-ns and 2.0-ns marks of the production run trajectory. ⌬G bind was calculated as follows: Here, ϽG complex Ͼ, ϽG protein Ͼ, and ϽG peptide Ͼ are the respective individual free energies of the complex, protein, and peptide, and each term is calculated by summing the contributions from the electrostatic potential E elec , van der Waals potential E vdw , and solvation free energy G solv . In turn, G solv is the sum of the polar contribution G polar and nonpolar contribution G nonpolar to the solvation free energy.
G polar was calculated by setting the value of IGB to 2, activating the generalized Born parameters of Onufriev et al. (75). G nonpolar was estimated as 0.0072 times the solvent-accessible surface area (SASA), as measured via the LCPO method (76). A grid size of 0.5 Å was used to solve the Poisson-Boltzmann equation, and the probe radius (77) was set at 1.4 Å. A value of 4.0 was used for the interior dielectric constant, which applied to the region of the protein-histone binding pocket, and the external dielectric constant was given throughout as 80.0.
To investigate the molecular basis of recognition specificity, we conducted component decomposition of the binding free energy. The contributions to the binding free energy were also broken down for individual residues and by component (van der Waals, electrostatic, and solvation) for each of the complexes. Decompositions were performed by invoking the DECOMP module in the MM-GBSA program within AMBER. Each H3 or CBX6 residue was evaluated according to its overall contribution to the binding interaction, as opposed to a pairwise analysis of every CBX6/H3 residue combination.
Mutation Score-Every candidate peptide can be considered as a mutant from the H3K27me3 peptide. Therefore, a site-specific mutation matrix was used to calculate the mutation score, which is the estimated difference in binding energy between the candidate peptide and H3K27me3.
The number represented by i is the amino acid position on the peptide; i ranges from 1 to 11, and position 9 must be a lysine that is assumed to be methylated. M is the mutation matrix generated from Virtual Mutagenesis analysis, and A i is the ith residue of the candidate peptide.
Conservation Score-To evaluate the conservation of each candidate peptide, we first retrieved 42 vertebrate proteomes (supplemental Table S2) from Ensemble (78). For each human protein containing a candidate peptide, the homologous proteins in the other proteomes were found using NCBI BLAST (79). Multiple sequence alignment was then generated for the 43 homologous proteins using CLUSTALW2 (80). The overall conservation score was calculated from the multiple sequence alignment using the BLOSUM62 matrix (81).
Overall conservation score ϭ In Equation 5, A i is the ith amino acid in the candidate peptide and A i m is the ith amino acid in the homolog of mth vertebrate proteome. In addition to the conservation of the entire peptide, the lysine residue to be methylated should also be highly conserved because of the functional constraint, which can be evaluated using the multiple sequence alignment as well. The cutoff values for the overall conservation and lysine conservation scores were chosen as 2200 and 37, respectively.
Mass Spectrometry Evidence-About 30 million mass spectra from human tissues were downloaded from PRIDE (82) and PeptideAtlas (83). The complete collection of human proteins in the Uniprot/Swissprot database was used as the reference database for the MS scan. The scanning programs X!TANDEM (84) and INSPECT (85) were used to perform the scan, with parent mass error set to 1 Da and fragment mass error to 0.2 Da. We searched for peptides containing mono-, di-, or trimethylated lysine. The scanned results from the two programs were then merged.
Secondary Structure-The H3K27 peptide adopts an extended conformation (␤ strand and loop) in the CBX6 -H3K27 complex, and it is clear that the peptide cannot fit to the binding pocket in helical form. Therefore, we removed the candidate peptides that form a helix. For proteins with available structures in the Protein Data Bank, DSSP (86) was used to define the secondary structure for each residue. For proteins without solved structures, two secondary structure prediction methods, PSI-PRED3.0 (87) and SSpro4.0 (88), were used to estimate the secondary structure for each residue. If a residue was predicted as a helix by either method, its secondary structure was specified as helical. Candidate peptides containing more than two helix residues between the 4th and the 10th position were removed.
Solvent-accessible Surface Area-The SASA was calculated for all structure-available peptides. If no structure was available, to avoid noise, the SASA filter was not applied. The difference in SASA for H3K27 peptide before and after binding was calculated using DSSP (86). The SASA in every candidate peptide for each residue was calculated using DSSP and compared with that of H3K27. We required that each residue from the 2nd to the 10th position have more than 70% of the SASA of the corresponding H3K27 residue.
Co-expression between CBX6 and Its Putative Interacting Partners-Data for human microarray experiments were downloaded from the Stanford Microarray Database (89). The ratio between red and green channel intensities was used as the gene expression level. If a gene was represented by multiple probes, the average expression value was used. First, the expression value of each gene was converted to a Z-score such that these values were comparable across different microarray experiments. When calculating the pairwise correlation of two genes using the 10,882 microarray experiments, we discarded experiments in which either gene's expression was missing. Then, mutual information was calculated to evaluate the coexpression of the two genes. The Unigene ID was taken from the Stanford Microarray Database directly as the identifier for each gene. The mapping of the Unigene ID to the Uniprot accession ID was performed by parsing the Uniprot XML file. In cases when the Uniprot accession ID corresponded to multiple Unigene IDs, all of the IDs were considered as one gene, and their expression values were averaged before normalization. The correlation coefficients of expression levels between CBX6 and all available genes from the Stanford Microarray Database were calculated and used to construct a background distribution, based on which a p value was computed for a given co-expression coefficient. We took a loose p value cutoff of 0.5 as the co-expression filter threshold.
Peptide and Protein Synthesis and Screening-The chromodomain of CBX6 protein (amino acids 11-69) was expressed as a GSTtagged fusion protein and purified as described in Ref. 90. The purity of the protein was examined via SDS-PAGE electrophoresis followed by both Coomassie staining and Western blot using an anti-GST-HRP conjugate (Santa Cruz Biotechnology, Santa Cruz, CA). The concentration of purified protein was determined via BCA assay (Amresco). A total of 248 methylated peptides were synthesized by Sigma Aldrich (desalted, mass-spectrometry checked). The peptides were then printed as triplets onto glass slides (ArrayIt), together with a Cy3 marker and an anti-GST (mouse monoclonal) antibody (Thermo) as references.
The peptide microarray was rinsed with TBST buffer (25 mM Tris, 125 mM NaCl, 0.05% Tween-20, pH 8) and then underwent blocking with 5% nonfat milk in TBST (room temperature for 1 h or 4°C overnight). The slide was then incubated at 4°C with CBX6-GST fusion protein at a final concentration of 5 mM in 5% nonfat milk/TBST for 12 h. After being washed three times for 10 min each time with TBST, the slide was incubated with an anti-GST mouse monoclonal IgG antibody (Thermo) at a final concentration of 1 g/ml in 5% nonfat milk/TBST and shaken gently for 1 h at room temperature. A secondary anti-mouse IgG Dylight-488 conjugated antibody (Thermo) was added to a final concentration of 0.1 g/ml after three more cycles of 10-min washing with TBST. The slide was shaken for 1 h at room temperature and washed three times with TBST.
Data Acquisition and Analysis in Peptide Microarray Experiments-The dried microarray slide was scanned using a Hamamatsu Nano-Zoomer 2.0HT Slide Scanning System (Neuroscience Light Microscopy Facility, University of California, San Diego). Data quantification was processed using the microarray processing software ImageJ (91), for which the fluorescence intensity of a microarray spot was defined as its own mean intensity minus the background mean intensity around it on the scanned image.
Pull-down of CBX6 by Methylated Peptides-Methylated peptides (Sigma Aldrich) (0.05 to 0.1 mg) were incubated in PBS (10 mM, pH 7.2) with dry N-hydroxysuccinimide-activated resin (Thermo) to allow covalent coupling of the peptide via the CЈ terminal amine. The resin was then washed abundantly three times with PBS (pH 7.2), and the excessive reactive sites were quenched with 1 M Tris for 1 h. The resin was washed three times with TBS (25 mM, pH 8) and incubated with CBX6 chromodomain overnight at 4°C. The resin was then washed thoroughly six times with TBS, boiled with Laemmli buffer (DTT), and analyzed with SDS-PAGE gel. The CBX6 chromodomain was detected via Western blotting using an anti-GST-HRP conjugate (Santa Cruz Biotechnology, Santa Cruz, CA).

RESULTS
Deciphering the Recognition of Methylated Peptides by the CBX6 Chromodomain-In order to reveal the atomic mechanisms of CBX6-peptide recognition, we performed residue decomposition (see "Materials and Methods") on the CBX6 -H3K27me3 complex to identify the residues critical for binding (Table I). The two residues making the greatest contribution to the binding interaction are Phe11 (Ϫ3.78 kcal/mol) and Trp32 (Ϫ4.77 kcal/mol). These residues are two of the three in the aromatic, hydrophobic cage that interacts with H3K27me3 through van der Waals attractions (Trp35, with a Ϫ1.98 kcal/mol contribution, is the third). Aside from H3K27me3 (Ϫ11.79 kcal/mol), the greatest contributions to the binding free energy from histone residues arise from H3R26 (Ϫ7.22 kcal/ mol), H3K23 (Ϫ6.20 kcal/mol), H3A25 (Ϫ6.05 kcal/mol), and H3A24 (Ϫ4.64 kcal/mol). Trajectory visualization revealed that there are some important non-van der Waals interactions, such as the salt bridge between Glu46 and H3R26 and the hydrogen bond between Glu43 and H3S28. However, a survey of the residue contributions in Table I shows that van der Waals interactions are the main force in the binding between CBX6 and H3, accounting for 73% of the energy that is responsible for the complex binding. To further quantify the effects of the mutation of residues that are critical to the binding, we calculated the mutation matrix (Fig. 2, Supplementary Material) from in silico point mutation experiments (virtual mutagenesis) (73, 74) as a filter for possible CBX-binding peptides.
Proteome-wide Identification of Methyllysine Peptides Binding to CBX6 Chromodomain-CBX6 is a known reader of histone modifications through chromodomain-methyllysine recognition. We hypothesize that it also recognizes methyllysines in non-histone peptides sharing structural and energetic features with histone peptides. In order to identify these peptides in the human proteome, we developed a pipeline to integrate diverse data for a systematic search. We used the 11-residuelong H3K27me3 peptide as the template. Because the methyllysine is at the ninth position of the H3K27 sequence QLAT-KAARKSA, we required that the putative binding peptides of CBX6 contain a lysine at this position and have a binding energy with CBX6 comparable to that of H3K27me3. Because the number of candidate peptides was very large, we employed additional structural and functional criteria to reduce false positives (Fig. 3). We first extracted all 633,969 possible methyllysine-containing peptides (all 11-residue-long peptides containing a lysine at the ninth position) from the 20,280 human proteins documented in the Uniprot/Swissprot database (June 16, 2009) (92) (Fig. 3). If a peptide segment of a protein mediates the interaction with CBX6, the peptide and also the lysine residue to be methylated are under functional constraint and should be conserved during evolution. Therefore, we kept only the peptides that were conserved for the majority of the 11 positions (59,495 peptides passed the cutoff), and especially for the ninth lysine (39,631 peptides remained), across 42 vertebrate proteomes (supplemental Table S2).
Mass spectrometry data are considered as the gold standard for determining PTM. Because of the lack of effective enrichment methods for detecting methylated peptides, it is risky to rely only on the mass spectrometry data to define methyllysine, but a signal of methylation in a mass spectrometry study is still a good indicator that should be considered. Therefore, we searched for methylation signals of the remain-ing peptides in 30 million mass spectra from human tissues. Given that these mass spectroscopy experiments were not designed to search for methyllysine, we included all 9878 peptides that showed signals of mono-, di-, or trimethylation on the lysine at the ninth position.
Next, we applied two filters that accounted for the structural features of the chromodomain binding peptides. Because in the template complex of CBX6-H3K27me3 the H3 peptide adopts an extended conformation, we removed any peptides that are observed to form a helix in crystal structure, if available, or are predicted to form a helix by secondary structure prediction methods (see "Materials and Methods"). About half of the peptides did not pass this filter, and 4153 remained. In addition, the putative binding peptides of the chromodomain should be exposed on the protein surface to form the complex. Therefore, if the structure of the protein containing the candidate peptide was available in the Protein Data Bank (60), we calculated the SASA for the peptide and kept only peptides with SASAs comparable to that of the H3 peptide. As a result, 3423 peptides passed this filter.
To narrow down the candidate peptides for experimental investigation, we filtered the candidates using two functional criteria. Given that the CBX6 protein functions in the nucleus, it is likely that its interacting partners are also localized to the same cellular compartment. For the 1228 peptides that passed this localization filter, we calculated the correlation of gene expression profiles between CBX6 and the proteins containing the candidate peptides. Because the interacting partners are often co-expressed in various cell types or under multiple conditions, we kept only the proteins whose expression levels were strongly correlated with CBX6, which resulted in a list of 376 candidate peptides.
Lastly, the candidate peptides should bind to the CBX6 chromodomain with an affinity comparable to that of the H3 peptide. To prioritize peptides for experimental tests, we ranked the 376 peptides using their scores calculated from the mutation matrix and removed 144 peptides with positive scores (weaker binders to the CBX6 chromodomain than the H3K27me3 peptide). The final 232 peptide sequences were printed on a peptide microarray for an in vitro binding assay (supplemental Table S3).
Peptide Microarray Assay to Identify CBX6 Binders-The chromodomain (residue positions 11-69) of human CBX6 protein was screened against a total of 248 nine-residue-long peptide sequences bearing one methylated lysine at the eighth position using a peptide microarray platform fabricated as described in Ref. 90. The 248 peptides consisted of the 232 selected non-histone methylated peptides, 6 methylated H3K9 and H3K27 peptides (mono-, di-, and trimethylated) as positive controls, and 10 unmodified SH3-domain-binding peptide sequences as negative controls (supplemental Table  S3). All the peptides were immobilized onto the glass chip via an amine group on the C terminus followed by a hexa-carbon spacer to provide more separation and flexibility of the peptide strand relative to the surface. The CBX6 chromodomain was incubated with the printed peptides on the chip under constant agitation to allow sufficient contact. Binding between the protein and any peptide was detected by visualizing the immobilized protein on the chip with a fluorescent-labeled antibody.
We validated the proper functioning of the current peptide microarray platform and the analysis criteria for CBX6 chromodomain binding using the positive and negative controls. Although no negative control peptides showed any binding activity to CBX6 chromodomain on the microarray, the di-and trimethylated H3K9 and H3K27 peptides were identified as strong binders, with the trimethylated H3K27me3 showing the highest signal intensity. Monomethylated H3K9me and H3K27me both showed fluorescence intensity around the binder/non-binder boundary (with only one out of the three signals above the cutoff value), and thus these signals were considered below the confidence threshold. Given that previous solution binding studies (29) reported a weak binding affinity of these peptides toward CBX6 chromodomain (K D Ͼ 500 M for both trimethylated H3K9me3 and H3K27me3) despite the crystal structures for both complexes achieved, our observations suggested a superior detection sensitivity exhibited by this peptide microarray platform relative to conventional solution methods. To confirm the legitimacy of these weak binding interactions detected on the microarray, additional pull-down experiments were conducted for CBX6 chromodomain with the methylated H3K9 and H3K27 peptides. All di-and trimethylated peptide coated resins successfully immobilized the CBX6 chromodomain in the pull-down experiments, with the trimethylated H3K27me3 showing the highest binding affinity (supplemental Fig. S1), consistent with the microarray results.
A gradient of fluorescent intensities from the protein-incubated peptides was observed throughout the microarray, and a statistical cutoff was selected to quantitatively distinguish Identification of Methyllysine Peptides Binding to CBX6 Chromodomain in the Human Proteome the binders from the non-binders. For each peptide, the fluorescent intensity of the printed triplet was individually measured and fit to a mixed Gaussian distribution with two components (non-binding background versus binder) (Fig. 4). A total of 50 non-histone peptides (Table II) were identified as putative binders for the CBX6 chromodomain with significant fluorescence intensities above a statistical cutoff of p Ͻ 0.05 (from the background Gaussian distribution). Given that most of the peptides passing the computational filters might be possible binders to the CBX6 chromodomain, the cutoff selected based on the distribution of these peptides is likely a stringent one that missed some true positives.
Functional Analysis of the Putative Binders-As the interacting partners of polycomb family proteins are still poorly defined, the newly identified putative binding peptides of CBX6 would reveal unknown functions of the chromatin remodeling complexes and their interplay with other important cellular regulators. We input the proteins that contained the binding peptides from the microarray experiments into DAVID (93), a tool for function annotation clustering analysis using Gene Ontology terms. We found that these proteins were functionally clustered into six groups (cluster cutoff: enrichment score Ͼ 1.3; in each cluster, only Gene Ontology terms with p values Ͻ 0.05 are shown in supplemental Table S4).
CBX6 is able to recognize histone repression marks H3K9me3 and H3K27me3 (27)(28)(29)(30). CBX6 is a component of PRC1, which can alter chromatin structure and repress gene transcription. It is not surprising that a significant portion of array-identified CBX6-binding proteins play roles in chromatin modification. For example, NSD1 and ASH1L are two histone methyltransferases with different substrate specificities. NSD1 methylates H3K36 and H4K20, and ASH1L specifically methylates H3K36. These marks are usually found in the gene  body of transcribed genes. We also identified histone acetyltransferases MYST3 and MYST4 that may interact with CBX6. MYST3 and MYST4 are components of the MOZ/MORF complex, which preferentially acetylates H3. In addition, histone demethylase KDM2B, which removes methylation of H3K4 and H3K36, and histone deacetylase SIRT1, which represses gene expression by deacetylating H3K9ac, also potentially interact with CBX6. Although the question of whether these interactions occur in vivo has yet to be addressed, it is very reasonable to speculate that CBX6 mediates cross-talk between different histone-modifying enzymes.
We also found proteins in the putative binder list that play critical roles in chromatin remodeling and transcriptional regulation. SMCE1 is a component of the BAF complex and regulates gene expression via chromatin remodeling. CDYL2 and CHD8 both contain chromodomain and repress the expression of their target genes. CTCF is well known as an insulator of the genome and blocks interaction between enhancers and promoters, preventing the spread of heterochromatin. Moreover, several proteins involved in various gene regulation and development processes besides chromatin remodeling were also pinpointed. AGO2 is required for RNAmediated gene silencing by the RNA-induced silencing complex, MED1 is a component of Mediator complex co-activators of nearly all Pol II-dependent genes, TET1 is a dioxygenase that plays the important role of hydroxylating methylated cytosine and is involved in embryonic stem cell maintenance, RELB is a component of NF-kappaB, SIN3A represses Myc-related genes, and TLE3 inhibits the Wnt signaling pathway.
In addition, the proteins KKCC1 and SFRS1 are in our array-positive list and have also been identified by Levy et al. (31) as substrates of the lysine methyltransferase SETD6. KKCC1 is a calcium/calmodulin-dependent kinase involved in the regulation of cell apoptosis, and SFRS1 is related to exon skipping, a mechanism that ensures correct splicing. Taken as a whole, our study not only identifies putative methyllysine in the human proteome, but also suggests the functional consequences of this PTM in mediating protein-protein interaction and signal transduction.

CONCLUSION AND DISCUSSION
We present here a systematic approach that integrates structural modeling, bioinformatics analysis, and peptide microarray binding assay to identify methyllysine-containing peptides in the human proteome that may bind to the CBX6 chromodomain. The success of this approach with CBX6 suggests its potential broad application to other chromodomains or domains recognizing methylated lysine or arginine. Considering the difficulty of defining PTMs of methylation in proteomic studies, this proof-of-concept study opens a new avenue for defining methylation events.
The newly identified interacting partners of CBX6 reveal many possible cross-talks between different regulatory mech-anisms, including histone modification, chromatin remodeling, DNA methylation, RNA splicing, and microRNA processing. For example, previous epigenomic studies have identified bivalent promoters (94) and enhancers (95,96),Othat is, the promoters/enhancers are "poised" and marked by both active (e.g. H3K4me3 for promoter and H3K4me1/H3K27ac for enhancer) and repressive (H3K27me3 for both promoter and enhancer) histone modifications. The promoters and enhancers can become active or remain silenced, depending upon cellular conditions. CBX6 is a component of the PRC1 complex, which collaborates with PRC2 to silence genes through H3K27me3 (97). Therefore, we speculate that the putative interactions identified by our study might imply cross-talk between multiple chromatin remodeling complexes and that CBX6's interactions with both active (e.g. NSD1, ASH1L, MYST3/4) and repressive (e.g. KDM2B, SIRT1) factors might make it possible for the mechanisms underlying regulation of the poised promoters/enhancers to be deciphered. Although these findings are still waiting for in vivo confirmation, the specific predictions of the methylation site and interaction between CBX6 and other proteins would greatly facilitate further investigation of the biological significance.
Although peptide microarray technology has been widely exploited in the study of protein-peptide interactions, its combination with computational screening has proven to be important in proteomic studies, because the number of candidate peptides is normally huge and it is financially prohibitive to synthesize all of them (90). In this and our previous studies (90), bioinformatics analyses that integrate diverse and complementary information such as binding affinity estimated by mutation matrix and conservation across species have greatly boosted the hit rate of peptide microarrays, allowing us to find binding peptides from the human proteome. We thus believe such an integrated approach is particularly useful in identifying PTMs that are difficult to define otherwise and revealing how these PTMs regulate protein functions.