Stable-isotope-labeled Histone Peptide Library for Histone Post-translational Modification and Variant Quantification by Mass Spectrometry

To facilitate accurate histone variant and post-translational modification (PTM) quantification via mass spectrometry, we present a library of 93 synthetic peptides using Protein-Aqua™ technology. The library contains 55 peptides representing different modified forms from histone H3 peptides, 23 peptides representing H4 peptides, 5 peptides representing canonical H2A peptides, 8 peptides representing H2A.Z peptides, and peptides for both macroH2A and H2A.X. The PTMs on these peptides include lysine mono- (me1), di- (me2), and tri-methylation (me3); lysine acetylation; arginine me1; serine/threonine phosphorylation; and N-terminal acetylation. The library was subjected to chemical derivatization with propionic anhydride, a widely employed protocol for histone peptide quantification. Subsequently, the detection efficiencies were quantified using mass spectrometry extracted ion chromatograms. The library yields a wide spectrum of detection efficiencies, with more than 1700-fold difference between the peptides with the lowest and highest efficiencies. In this paper, we describe the impact of different modifications on peptide detection efficiencies and provide a resource to correct for detection biases among the 93 histone peptides. In brief, there is no correlation between detection efficiency and molecular weight, hydrophobicity, basicity, or modification type. The same types of modifications may have very different effects on detection efficiencies depending on their positions within a peptide. We also observed antagonistic effects between modifications. In a study of mouse trophoblast stem cells, we utilized the detection efficiencies of the peptide library to correct for histone PTM/variant quantification. For most histone peptides examined, the corrected data did not change the biological conclusions but did alter the relative abundance of these peptides. 
For a low-abundance histone H2A variant, macroH2A, the corrected data led to a different conclusion than the uncorrected data. The peptide library and detection efficiencies presented here may serve as a resource to facilitate studies in the epigenetics and proteomics fields.

Molecular & Cellular Proteomics 13
In eukaryotes, histones package and order DNA into nucleosomes. Histones are critical for the structural organization and transcriptional activities of chromatin. These proteins are highly conserved in eukaryotes, but they have very dynamic functions and are subject to many different regulatory mechanisms (1). One of the major mechanisms is enacted via the use of different histone variants and post-translational modifications (PTMs). Over the past two decades, accumulating evidence has suggested that levels and PTM compositions of many histone variants play important roles in cellular processes and human diseases. There are increasing needs for the precise quantification of histone variants and PTMs in biomedical studies. Mass spectrometry (MS) has proved to be a robust and powerful tool for analyzing histones, as it can be used to simultaneously identify and quantify histone variants and PTMs (2).
Histones are rich in basic residues and can be highly modified, resulting in an abundance of different isoforms. As an example, the H3 N-terminal tail carries more than two dozen residues that may be modified (3). Many histone lysine residues can be acetylated or mono-, di-, or tri-methylated. These features make histone analysis challenging for the analytical chemistry field. To overcome these challenges, we previously developed an efficient chemical derivatization method for histone peptides that utilizes propionic anhydride to react with the primary amines on lysine residues (4,5). This method has been adopted by research groups worldwide (6-17) and enables the production of (i) tryptic peptides of a reasonable length for mass spectrometry, because propionylated lysines are no longer cleaved by trypsin, (ii) more hydrophobic histone peptides with better separation and retention on reverse phase (RP) HPLC columns, and (iii) improved fragmentation of less highly charged histone peptides.
Our protocol of chemical derivatization combined with high-resolution bottom-up nano-LC-MS/MS has been widely cited in the literature (6-17). In the majority of these studies, a relative abundance strategy was applied (8,13,14,18,19): histones were extracted from two or more biological samples and processed in parallel. The relative percentage, rather than the absolute amount, of one or more particular histone PTMs is quantified and compared. For a given histone peptide with multiple isoforms, including the unmodified peptide and the peptide with various PTMs and combinations of PTMs, the raw abundance of each isoform is quantified from its MS extracted ion chromatogram. The relative percentage of a specific isoform is calculated as the raw abundance of that isoform divided by the sum of all forms of the same peptide. This strategy eliminates systematic bias caused by sample preparation or instrumentation. Furthermore, stable isotope labeling in cell culture or at the chemical derivatization step may be introduced to enable more precise quantification (5). However, this strategy cannot eliminate the bias in peptide detection efficiency in bottom-up MS. A well-known example of this bias is that phosphorylated peptides generally have lower detection efficiencies than their unphosphorylated counterparts, as recently systematically illustrated by Kuster and colleagues (20). This bias could cause problems in the interpretation of histone PTM data, especially when absolute quantification is needed. It also affects relative quantification: because the data are normalized to the total signal from all isoforms of a peptide (8,13,14,18,19), bias in the quantification of one particular form affects all the other forms.
To overcome these issues, we present here a library of 93 synthetic histone peptides, comprehensive bottom-up nano-LC-MS/MS analyses of these peptides, and correction factors derived by normalizing their detection efficiencies. We hope these data will become a useful resource for more accurate quantitative analyses of histone proteins.

EXPERIMENTAL PROCEDURES
Synthetic Peptide Preparation-Ninety-three synthetic peptides representing various histone tryptic peptides were synthesized via Cell Signaling Technology® Protein-Aqua™ (Table I). As previously described (21-23), these peptides were purified by means of RP-HPLC and analyzed via MALDI-TOF-MS and nanospray tandem MS.
The peptide concentration was measured via amino acid analysis. The stock solution was 25 pmol/μl for all peptides. For each sample preparation, equal volumes (5 μl) of each peptide stock were mixed and aliquoted, resulting in a concentration of 0.27 pmol/μl (25/93 ≈ 0.27) for each peptide. For each sample preparation, 50 to 100 μl of the mixture was subjected to two rounds of propionylation as previously described (5). Samples were then desalted with C18-based STAGE tips and injected onto an online nano-LC-MS setup as described in Ref. 5.
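As a quick check of the mixing arithmetic above, the following Python sketch recomputes the per-peptide concentration of the equimolar mix and the volume that would contain a given amount of each peptide. The helper names are hypothetical, and the volume calculation assumes no dilution or sample loss during propionylation and desalting, which will not hold exactly in practice.

```python
N_PEPTIDES = 93
STOCK_PMOL_PER_UL = 25.0  # concentration of each peptide stock (pmol/ul)
ALIQUOT_UL = 5.0          # volume of each stock added to the mix (ul)


def per_peptide_concentration(stock, aliquot_ul, n):
    """Concentration (pmol/ul) of each peptide after mixing n equal aliquots."""
    amount_pmol = stock * aliquot_ul   # pmol of one peptide added to the mix
    total_volume_ul = aliquot_ul * n   # combined volume of all aliquots
    return amount_pmol / total_volume_ul


def volume_for_amount(target_fmol, conc_pmol_per_ul):
    """Volume (ul) of the mix containing target_fmol of each peptide.

    Assumes no losses or dilution during downstream processing.
    """
    return (target_fmol / 1000.0) / conc_pmol_per_ul


conc = per_peptide_concentration(STOCK_PMOL_PER_UL, ALIQUOT_UL, N_PEPTIDES)
print(f"{conc:.3f} pmol/ul per peptide")                        # 0.269 pmol/ul per peptide
print(f"{volume_for_amount(100, conc):.2f} ul holds 100 fmol")  # 0.37 ul holds 100 fmol
```

The same function shows that 50 to 100 μl of the mix carries roughly 13 to 27 pmol of each peptide into propionylation.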
Nano-Liquid Chromatography Electrospray Ionization Tandem Mass Spectrometry-The samples were loaded onto one of the three instrument setups shown in Table II, all at a flow rate of 300 nL/min. Histone peptides were resolved with a two-step gradient: from 2% to 30% acetonitrile in 0.1% formic acid over 40 min, then from 30% to 95% acetonitrile in 0.1% formic acid over 20 min. The mass spectrometers were operated in data-dependent mode with dynamic exclusion enabled (repeat count: 1; exclusion duration: 0.5 min). MS instrument methods were set up as previously reported (2). Settings for resolution, automatic gain control, and normalized collision energy are listed in Table II. In each cycle, one full MS scan (m/z 290 to 1600) was collected, followed by 10 MS/MS scans using either high-energy C-trap dissociation or collision-induced dissociation in the ion trap (Table II). All isolation windows were set at 2.0 m/z. Singly charged ions and ions on a rejection list of common contaminants (including keratin, trypsin, and BSA; exclusion width = 10 ppm) were excluded from MS/MS.
MS2 Spectra Extraction-MS2 spectra of the 93 synthetic peptides (supplemental Fig. S1) were produced using a modified version of X! Tandem (version Sledgehammer 2013.09.01.1) (24) along with the Trans-Proteomic Pipeline (25). The Lorikeet viewer was used to generate MS spectra figures (26). Both X! Tandem and the Lorikeet viewer were enhanced to allow user-defined amino acid residues with stable isotope labeling. The search was conducted against an in-house database containing the 17 peptide sequences (backbones) represented in the peptide library (Table I), as well as 17 corresponding decoy sequences. The mass tolerance for precursor ions was set at 0.007 m/z. Masses of the heavy residues were supplied to X! Tandem under otherwise unused single-letter codes: B for heavy A, J for heavy G, O for heavy L, U for heavy P, and Z for heavy V. Propionylation (+56.0262 Da) was defined as a static modification on the N termini and on unmodified and mono-methylated lysine residues. Other PTMs were defined as dynamic modifications.
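The mass bookkeeping behind this search setup can be sketched in Python: standard monoisotopic residue masses, heavy residues stored under the single-letter codes listed above, and modification masses including the +56.0262 Da propionyl group. This is an illustrative sketch, not the modified X! Tandem configuration. It assumes that every carbon and nitrogen in a heavy residue is 13C or 15N, and it only counts which modifications are present, not where they sit.

```python
# Monoisotopic residue masses (Da) of the standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
C13_SHIFT = 1.0033548  # 13C minus 12C
N15_SHIFT = 0.9970349  # 15N minus 14N

# Heavy codes from the text, assumed fully 13C/15N labeled:
# code -> (light residue, number of carbons, number of nitrogens)
HEAVY = {"B": ("A", 3, 1), "J": ("G", 2, 1), "O": ("L", 6, 1),
         "U": ("P", 5, 1), "Z": ("V", 5, 1)}
for code, (light, n_c, n_n) in HEAVY.items():
    RESIDUE[code] = RESIDUE[light] + n_c * C13_SHIFT + n_n * N15_SHIFT

WATER = 18.010565
PROTON = 1.007276
MOD = {"propionyl": 56.026215, "ac": 42.010565, "me1": 14.01565,
       "me2": 28.0313, "me3": 42.04695, "ph": 79.966331}


def mh_plus(seq, mods):
    """Singly protonated monoisotopic mass, [M+H]+, of a modified peptide.

    seq: residues, possibly including heavy codes (B, J, O, U, Z).
    mods: one entry per modification instance, e.g. ["propionyl", "me2"].
    """
    neutral = sum(RESIDUE[r] for r in seq) + WATER
    neutral += sum(MOD[m] for m in mods)
    return neutral + PROTON


def mz(mh, z):
    """m/z of the [M+zH]z+ ion computed from the [M+H]+ value."""
    return (mh + (z - 1) * PROTON) / z
```

For example, `mh_plus("TKQTAR", ["propionyl", "me2"])` gives the [M+H]+ of an unlabeled, N-terminally propionylated TKme2QTAR (the dimethylated lysine is not propionylated). Substituting a heavy code such as `B` for A mimics a library peptide, and `mz(..., z)` gives the m/z of each charge state that is later summed during quantification.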
Peptide Quantification-A three-step quantification protocol was conducted. First, the monoisotopic raw abundance of each peptide was manually extracted from the Xcalibur Qual Browser, based on the area under the extracted ion chromatograms at the full MS level. All detectable charge states were summed together, typically [M+H]+, [M+2H]2+, [M+3H]3+, and [M+4H]4+. For examples, please see Ref. 5. Second, for some co-eluting peptides, isotope masking was observed and taken into account. The histone H4 4-17, canonical H2A 4-11, and H2A.Z 1-19 peptide families have isobaric peptides that co-elute because they carry identical numbers of acetylation groups. In addition, two co-eluting peptides in the histone H3 27-40 family also have partial isotope overlap (peptides 41 and 43). To overcome the interference of neighboring ion isotope envelopes in quantifications, the raw abundance of peptides 43, 60, 61, 62, 64, 65, 66, 67, 68, 70, 71, 72, 81, 86, 87, 89, and 90 was corrected based on theoretical distributions of naturally occurring heavy isotopes using emass (27) (supplemental Table S1). For instance, histone H4 4-17 peptides that carry one acetylated lysine (peptides 59-62) co-elute and have sequentially overlapping isotopic peaks in the mass spectrometer (supplemental Fig. S2). Taking the isotope interference between peptides 60 and 61 as an example, the [M+H]+ of peptide 60 is 1545.897, and the third isotopic peak of this peptide is 1547.911, which overlaps with peptide 61 ([M+H]+ = 1547.910; Table I). The theoretical abundance of peptide 60's third isotopic peak is 33.7% of the first peak (27) (supplemental Table S1). Therefore, for each individual MS run, we subtracted 33.7% of peptide 60's raw amount from peptide 61's raw amount. Supplemental Fig. S2 demonstrates that isotope correction was necessary to eliminate the isotope interference effects.
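The overlap correction for co-eluting peptides amounts to a subtraction, sketched below in Python with a hypothetical helper and made-up peak areas. Each overlap entry gives the fraction of a lighter peptide's monoisotopic area that lands on a heavier peptide's monoisotopic m/z, as computed from an isotope model such as emass; for peptides 60 and 61 that fraction is 0.337. Applying corrections from lightest to heaviest and using already corrected source areas is an assumption for chains of three or more overlapping peptides; for a single pair it reduces to the subtraction described above.

```python
def correct_overlap(raw, overlaps):
    """Remove isotope spill-over between co-eluting peptides in one MS run.

    raw: dict mapping peptide number -> raw monoisotopic peak area.
    overlaps: list of (source, target, fraction) tuples, ordered from the
        lightest to the heaviest source; fraction is the height of the
        source's overlapping isotopic peak relative to its monoisotopic peak.
    """
    corrected = dict(raw)
    for source, target, fraction in overlaps:
        # The overlapping isotopic peak scales with the source's own
        # (already corrected) monoisotopic area.
        corrected[target] -= fraction * corrected[source]
    return corrected


# Hypothetical areas for peptides 60 and 61 from a single run.
areas = correct_overlap({60: 1000.0, 61: 800.0}, [(60, 61, 0.337)])
print(areas[61])  # about 463
```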
Third, in order to compare the detection efficiencies across all 93 peptides, we took into account multiple isotopes of each peptide. The higher a peptide's molecular weight, the more naturally occurring heavy isotopes it is likely to contain, and therefore the smaller the fraction of its signal that falls in the monoisotopic peak. Because the first quantification step was based only on the monoisotopic peaks, the quantifications were biased toward smaller peptides. To correct for this bias, we used emass (27) to calculate each peptide's isotope distribution and recalculated each peptide's raw amount based on the isotope-correction factors presented in supplemental Table S1.
Statistics-All t tests were two-tailed. For p values shown in Figs. 4 to 7B and supplemental Tables S5 to S15, paired p values were calculated. For p values used to generate q values (see below) and those shown in Figs. 7C to 7H and supplemental Tables S16 and S17, unpaired p values were calculated.
To correct for multiple testing, we employed the q value method developed by Storey and colleagues (28,29). First, p values were generated by comparing the quantification of each peptide between sample preparations 1 and 2 (Fig. 2A, supplemental Table S2). Subsequently, using the R package qvalue, a list of q values was generated from the 93 p values, with a false discovery rate of 1% (supplemental Table S2) (28,29). The smoother method was applied to estimate π0, with 9 degrees of freedom chosen based on the best fit of the π0 curves to the raw data (supplemental Fig. S3).
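The q value step can be illustrated with a simplified Python version of Storey's procedure. Unlike the analysis described above, which used the R qvalue package with the smoother method, this sketch estimates π0 from a single fixed λ (0.5 by default), so its π0 and q values will generally differ from those of the package; it only shows how q values follow from the p values.

```python
def storey_qvalues(pvals, lam=0.5):
    """Return (q values, pi0) using a single-lambda estimate of pi0.

    Simplified: the R qvalue package can instead smooth pi0 over a grid
    of lambda values.
    """
    m = len(pvals)
    # pi0: estimated proportion of true null hypotheses.
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))
    if pi0 == 0.0:
        raise ValueError("no p values above lambda; choose a smaller lambda")
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return q, pi0
```

With π0 fixed at 1 (for example `lam=0.0` when all p values are positive), the result reduces to Benjamini-Hochberg adjusted p values. Under a 1% false discovery rate, a peptide would be called different between the two preparations when its q value is below 0.01.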
In order to further assess the contribution of different experimental conditions to the technical variance, namely, the two sample preparations and the equipment, we performed a permutation test of unsupervised hierarchical clustering. Each iteration of hierarchical clustering was performed on a random subsample of 60 (out of 93) peptides. The result was then compared to results from the known clusters of samples from either the two sample preparations or the three different instruments (Q-E, Elite, and Velos). p values were generated from a 10,000-repetition run.
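One way to implement such a permutation test is sketched below in Python using only the standard library. The method description does not specify the distance, the linkage, or how a clustering is judged to match the known groups, so Euclidean distance, average linkage, and an exact match of the partition of runs are assumptions here, as are the function names. The function returns the fraction of iterations that reproduce the known grouping; how that frequency was converted into the reported p values is not stated, so the sketch stops there.

```python
import random


def euclid(u, v):
    """Euclidean distance between two runs' abundance vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def average_linkage(dist, k):
    """Agglomerative clustering with average linkage down to k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return {frozenset(c) for c in clusters}


def recovery_rate(runs, labels, n_sub=60, n_iter=10_000, seed=0):
    """Fraction of iterations whose clustering of runs, computed on a random
    subset of n_sub peptides, exactly reproduces the known run groups.

    runs: one list of peptide relative abundances per MS run.
    labels: group label per run (e.g. sample preparation or instrument).
    """
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(idx)
    known = {frozenset(g) for g in groups.values()}
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        sub = rng.sample(range(len(runs[0])), n_sub)  # without replacement
        vecs = [[run[p] for p in sub] for run in runs]
        dist = [[euclid(u, v) for v in vecs] for u in vecs]
        if average_linkage(dist, len(known)) == known:
            hits += 1
    return hits / n_iter
```

Pure Python keeps the sketch self-contained; for 16 runs and 10,000 iterations it is slow but workable, and a NumPy/SciPy version would be the practical choice.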
GRAVY Scores and Isoelectric Points-GRAVY (grand average of hydropathy) scores of the 16 unmodified peptides were calculated using an online GRAVY calculator. The GRAVY score is widely used to measure the hydrophobicity of peptides and is calculated by summing the hydropathy values for each amino acid residue and dividing by the length of the sequence, as defined in Ref. 30.
Table I. Ninety-three histone peptides with various PTMs were synthesized with stable isotope (13C and 15N)-labeled amino acids (underlined in the column "Synthetic Peptide Sequence and PTMs"). me1, -2, and -3 denote mono-, di-, and tri-methylation; ph, phosphorylation; ac, acetylation. The peptide number used in Fig. 2 is shown in the first column. Cell Signaling Technology Inc. product numbers are shown in the second column. The averaged relative abundance in the last column represents the detection efficiencies of these peptides.
Isoelectric points were calculated using the online tool "Compute pI/Mw" from the ExPASy Bioinformatics Resource Portal (31).
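Both sequence descriptors can be computed from the plain amino acid sequence, as in the Python sketch below. The GRAVY function assumes the Kyte-Doolittle hydropathy scale that underlies the standard GRAVY definition. The pI function is a swapped-in approximation: it finds the pH of zero net charge by bisection using a generic set of pKa values (an assumption), not the Bjellqvist values used by the ExPASy tool, so its output will not match the online tool exactly. Neither function accounts for propionylation, PTMs, or heavy-residue codes; pass the unlabeled, unmodified sequence.

```python
# Kyte-Doolittle hydropathy values.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def gravy(seq):
    """Sum of residue hydropathy values divided by sequence length."""
    return sum(KD[aa] for aa in seq) / len(seq)


# Generic pKa values (assumed; NOT the Bjellqvist set used by ExPASy).
PKA_POS = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEG = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of the peptide at a given pH."""
    pos = ["Nterm"] + [aa for aa in seq if aa in PKA_POS]
    neg = ["Cterm"] + [aa for aa in seq if aa in PKA_NEG]
    charge = sum(1.0 / (1.0 + 10 ** (ph - PKA_POS[g])) for g in pos)
    charge -= sum(1.0 / (1.0 + 10 ** (PKA_NEG[g] - ph)) for g in neg)
    return charge


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection on pH for zero net charge (charge falls as pH rises)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For example, `gravy("TKQTAR")` averages the six Kyte-Doolittle values of the unmodified H3 3-8 backbone.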
Tissue Culture and Histone Processing-Trophoblast stem (TS) cells were kindly provided by Dr. Marisa S. Bartolomei (University of Pennsylvania, Perelman School of Medicine). TS cells were cultured and maintained as previously described (32). In brief, TS cells were plated onto feeder mouse embryonic fibroblast cells (inactivated with mitomycin C) in TS cell medium supplemented with 0.1% FGF4 (25 μg/ml stock; Product No. 235-F4, R&D Systems, Minneapolis, MN) and 0.1% heparin (1.5 mg/ml stock; Product No. H3393, Sigma). TS cell medium consisted of RPMI 1640 (pH 7.2) containing 20% FBS, 1% penicillin (500 U/ml) and streptomycin (5 mg/ml), 1% 200 mM L-glutamine, 1% 100 mM sodium pyruvate, and 100 μM β-mercaptoethanol. To collect TS cells free of mouse embryonic fibroblasts, cells were trypsinized and replated for 45 min, and the supernatant was then transferred to a new plate for another 45 min. This second supernatant contained mostly TS cells and was used for protein analysis. To differentiate TS cells, they were separated from mouse embryonic fibroblast cells and cultured in TS cell medium without FGF4 and heparin for 5 to 6 days.
Histones were acid extracted from TS cells and processed with two rounds of chemical derivatization, trypsin digestion, and desalting as previously described (5). For each MS run, 100 fmol of the synthetic peptide library was mixed with 0.4 to 1 μg of endogenous histone peptides. Four and three MS runs of undifferentiated and differentiated TS cell histones, respectively, were conducted on the Q-Exactive, using the parameters shown in Table II. Endogenous peptides were quantified in the same manner as the synthetic peptides, except for isobaric species. As previously illustrated in Ref. 5, we targeted the m/z of the isobaric peptides and quantified the relative abundance of their unique b or y ions at the MS/MS level. We then determined the relative abundance of each isobaric peptide at the MS1 level from the ratios obtained.
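The two-level quantification of isobaric species can be sketched as proportional allocation: the MS1 area shared by co-eluting isobaric isoforms is split according to the relative signal of each isoform's unique b or y ions. The function name is hypothetical, and the sketch assumes that the unique fragment ions of different isoforms are detected with comparable efficiency.

```python
def apportion_ms1(ms1_area, fragment_signal):
    """Split one MS1 peak area among co-eluting isobaric isoforms.

    fragment_signal: dict mapping isoform -> summed intensity of the
        fragment ions (b or y) unique to that isoform in MS/MS.
    """
    total = sum(fragment_signal.values())
    if total <= 0:
        raise ValueError("no isoform-specific fragment signal")
    return {iso: ms1_area * s / total for iso, s in fragment_signal.items()}


print(apportion_ms1(1000.0, {"K5ac": 1.0, "K8ac": 3.0}))
# {'K5ac': 250.0, 'K8ac': 750.0}
```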
Endogenous Histone Peptide Quantification and Correction-Endogenous histone peptides were quantified using three strategies: original data, internal correction using spiked-in heavy peptide standards, and external correction using detection efficiencies generated from the 16 independent runs conducted earlier in the study. For each MS run (see above), the endogenous histone peptides (from TS cells) and the injected synthetic peptides were manually quantified based on extracted ion chromatograms. Using the same quantification strategies described above, we quantified each peptide from its monoisotopic peak, summed across multiple charge states, and corrected for the underrepresentation of large peptides (supplemental Tables S1 and S3). We define the raw abundance of each peptide's monoisotopic peak as R and the isotope correction factor as ICF. For both endogenous (E) and synthetic peptides (Si for internal controls), the abundance of each peptide was obtained by dividing R by ICF. For external controls (Se), we used the relative abundance values from Table I, which were calculated using data from the 16 MS runs presented in Fig. 2.
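A rough way to obtain a correction of this kind is sketched in Python below. Here ICF is taken to be the fraction of a peptide's isotope envelope that sits in the monoisotopic peak; for natural isotope abundances this is exactly the product of the lightest-isotope abundances over all atoms, and dividing R by it estimates the whole envelope. This definition of ICF is an assumption, and the values used in the study came from emass, which computes the full isotope distribution. Atoms in the 13C/15N-labeled residues are enriched rather than at natural abundance, so in this simplified sketch they would be left out of the composition.

```python
# Natural abundance of the lightest isotope of each element
# (IUPAC representative values).
LIGHTEST_ABUNDANCE = {"C": 0.9893, "H": 0.999885, "N": 0.99636,
                      "O": 0.99757, "S": 0.9499}


def monoisotopic_fraction(composition):
    """Probability that a molecule contains only the lightest isotopes.

    composition: dict mapping element symbol -> atom count (unlabeled atoms).
    """
    frac = 1.0
    for element, count in composition.items():
        frac *= LIGHTEST_ABUNDANCE[element] ** count
    return frac


def isotope_corrected_abundance(raw_mono_area, composition):
    """Estimate the total isotope-envelope abundance as R / ICF."""
    return raw_mono_area / monoisotopic_fraction(composition)
```

Larger compositions give smaller monoisotopic fractions and therefore larger upward corrections, which is the bias toward small peptides that this step removes.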
For a given peptide family or histone variant group, a total number of N peptides was included in the quantification. The total abundance of each peptide family was set as 100%, and the relative abundance of each peptide (j) was normalized to the total abundance. The formulas used to calculate the three strategies are presented below.
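First, for the original data, the relative abundance of each peptide j was calculated as
$$\mathrm{RA}_j = \frac{E_j}{\sum_{k=1}^{N} E_k} \times 100\%$$
Second, for internal corrections, the following formula was used:
$$\mathrm{RA}_j = \frac{E_j / S^{\mathrm{i}}_j}{\sum_{k=1}^{N} E_k / S^{\mathrm{i}}_k} \times 100\%$$
Third, for external corrections, the following formula was used:
$$\mathrm{RA}_j = \frac{E_j / S^{\mathrm{e}}_j}{\sum_{k=1}^{N} E_k / S^{\mathrm{e}}_k} \times 100\%$$
where $E_j = R_j/\mathrm{ICF}_j$ is the isotope-corrected abundance of endogenous peptide j, $S^{\mathrm{i}}_j$ is the isotope-corrected abundance of the co-injected synthetic peptide j, and $S^{\mathrm{e}}_j$ is the relative abundance of synthetic peptide j from Table I.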
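The three quantification strategies share one normalization step and differ only in what each endogenous abundance is divided by. In the Python sketch below (hypothetical function names), `E` holds isotope-corrected endogenous abundances for one peptide family, and `S` holds either the isotope-corrected abundances of the co-injected synthetic peptides (internal correction) or the relative abundances from Table I (external correction).

```python
def to_percent(values):
    """Scale the members of one peptide family so they sum to 100%."""
    total = sum(values.values())
    return {pep: 100.0 * v / total for pep, v in values.items()}


def original(E):
    """Strategy 1: no correction for detection efficiency."""
    return to_percent(E)


def corrected(E, S):
    """Strategies 2 and 3: divide each endogenous abundance by the detected
    signal of the equimolar synthetic peptide, then renormalize to 100%."""
    return to_percent({pep: E[pep] / S[pep] for pep in E})


# Hypothetical two-member family.
E = {"Unmod": 10.0, "ac": 10.0}
S = {"Unmod": 1.0, "ac": 4.0}   # the "ac" form is detected four times as well
print(original(E))      # {'Unmod': 50.0, 'ac': 50.0}
print(corrected(E, S))  # {'Unmod': 80.0, 'ac': 20.0}
```

In this toy family, equal raw signals become an 80:20 split once the fourfold detection advantage of the acetylated form is removed.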

RESULTS
Peptide Library Design-To meet the expanding needs of histone PTM analyses in biomedical studies and to provide more quantitative histone PTM characterization, we designed a peptide library standard utilizing the Protein-Aqua™ technology by Cell Signaling Technology® (Table I). The guiding principles in the design of the library were to (i) represent the most biologically important and commonly seen histone PTMs, (ii) have both unmodified and modified forms of the same peptide, (iii) make the synthetic peptides distinguishable from the endogenous peptides at the MS1 level, and (iv) make synthetic peptides that share a backbone but carry different PTMs distinguishable from each other. This study focused on histone proteins that generate suitable peptides for bottom-up MS after our chemical derivatization protocol, namely, histones H3, H4, and H2A variants. We included the following modifications in the library: lysine mono-, di-, and tri-methylation (me1, me2, and me3); lysine acetylation (ac); arginine me1; threonine phosphorylation (ph); serine ph; and N-terminal acetylation (Nac). Ninety-three synthetic peptides were designed, including 55 peptides representing different forms of eight histone H3 peptides, 23 peptides representing four histone H4 peptides, 5 peptides representing two canonical histone H2A peptides, 8 peptides representing one histone H2A.Z peptide, and one peptide each for histone macroH2A and H2A.X. Each peptide had one to three 13C- and 15N-labeled amino acid residues (G, A, P, V, and L) (Table I), making these peptides at least 4 Da heavier than the endogenous peptides. Within the same peptide family (peptides that have the same backbone but different PTMs), a different combination of heavy residues was used for each individual peptide.
Table II. Listed are three instrument setups and settings for MS runs. For Elite and Velos, both high-energy C-trap dissociation (HCD) and collision-induced dissociation (CID) were used for MS/MS scans.
An example of the peptide design is given in Fig. 1. The histone H4 4-17 peptide contains four lysine residues that may be acetylated. Multiple acetylation has been observed in many different organisms in vivo and is strongly correlated with active transcription (33,34). Fig. 1A shows endogenous histone H4 4-17 peptides from mouse TS cells: unmodified peptide (Unmod), one-ac, two-ac, three-ac, and four-ac. These peptides were extracted and processed based on our standard bottom-up MS protocol (5). Acetylation on lysine residues blocks chemical derivatization with propionic anhydride; because an acetyl group (42 Da) is smaller than a propionyl group (56 Da), the more acetylations a peptide has, the lighter and more hydrophilic it is (5). Taking the one-ac population as an example, it contains four different forms: acetylation on K5, K8, K12, or K16. These isobaric peptides have very close retention times in RP-HPLC and can be distinguished only by their MS2 spectra. Fig. 1B shows 16 synthetic peptides of the histone H4 4-17 family, all having distinct m/z values due to different designs of heavy residues (Table I, peptides 58-73). The one-ac peptides are highlighted in different colors. Fig. 1C shows a mixture of endogenous and synthetic peptides. We can accurately differentiate and quantify all five one-ac species, namely the endogenous one-ac peak and the four synthetic one-ac peptides (see below). Note that K16ac elutes slightly later than K5/K8/K12ac (Fig. 1B), and the endogenous one-ac peptide peak is wider than those of the individual synthetic peptides, possibly because of the co-eluting isoforms (Fig. 1C).

Reproducible Technical Replicates among Two Sample Preparations and Three Instruments-All 93 peptides were mixed at the same concentration and chemically derivatized with propionic anhydride following our standard bottom-up MS approach (5). To determine the optimal injection amount, we tested a range from 20 fmol to 100 fmol per peptide on the Q-Exactive. Results showed that 50-fmol and 100-fmol injections gave stable and similar quantifications, whereas runs with an injection amount of less than 50 fmol did not (supplemental Fig. S4). This was further confirmed by runs conducted on two other instruments using up to 150 fmol of sample (Fig. 2). A very similar optimal peptide standard injection amount was obtained in another recent study (35). Moreover, when we injected these peptides as internal controls with endogenous histone peptides, 100 fmol of the synthetic peptide mixture was approximately equivalent to our optimal endogenous histone amount (0.4 to 1 μg) (Fig. 1C). Therefore, we focused on this range in our analyses.
To ensure the reproducibility of our experimental procedures, two batches of peptides were prepared and processed independently by two researchers (sample preparations 1 and 2). The library does not contain any overlapping monoisotopic m/z values among peptides that may co-elute; therefore we can easily extract the raw abundance of each peptide by measuring the peak area in the MS chromatogram. A total of 16 runs with injection amounts from 50 to 150 fmol were conducted on three different mass spectrometers ("Experimental Procedures" and Fig. 2). Figs. 2A and 2B show heat maps representing the relative abundances of the 93 peptides in each run and a comparison between sample preparations 1 and 2, as well as among the three instruments. We did not observe significant variation between the two sample preparations, as measured by q values with a false discovery rate of 0.01 (Fig. 2A, supplemental Table S2, supplemental Fig. S3) (28,29). Additionally, with the null hypothesis that all runs were the same between the two sample preparations, we performed a permutation test of unsupervised hierarchical clustering. Each iteration of hierarchical clustering was performed on a random subsample of 60 (out of 93) peptides. The result was then compared against the two known clusters (sample preparation 1 versus 2). We asked how likely it was for the data to cluster non-randomly into these two clusters over 10,000 repetitions. Clustering by sample preparation was not significant (p = 0.0843). This test further supports the conclusion that the two independent sample preparations did not introduce a significant difference (Fig. 2A). After combining data from both sample preparations, we also assessed the contribution of the instruments to the technical variance by means of unsupervised hierarchical clustering.
For a strict test, we asked how likely it was for the data to cluster non-randomly into the three known clusters corresponding to the different instrument setups (Q-E, Elite, and Velos) over 10,000 repetitions. Proper clustering did not occur in most of the repetitions, and we obtained a p value of 0.9929. For a less strict test, we asked how often any two of the three groups clustered correctly, and we obtained a p value of 0.4943. These results indicate that technical replicates from different instruments are not significantly different, as illustrated by the heat maps in Fig. 2B.
Fig. 1. The one-ac population is highlighted with various colors: endogenous one-ac in maroon, synthetic K16ac in green, synthetic K12ac in navy, synthetic K8ac in olive, and synthetic K5ac in purple. A, the one-ac peak includes four populations (K5/K8/K12/K16ac) that cannot be separated by standard RP-HPLC. B, the four one-ac synthetic peptides contain different 13C- and 15N-labeled residues (highlighted in red) and are therefore distinguishable from each other at the MS1 level. Because all of these peptides co-elute, some isotope overlap was observed (right panel). This can be easily corrected at the peptide quantification step (see "Experimental Procedures" and supplemental Table S1). C, approximately 400 ng of endogenous histone peptides were mixed with 100 fmol of the synthetic peptide library. The MS1 chromatograms of five m/z values representing the endogenous one-ac peptides and the four heavy peptides are shown.
Widespread Dynamic Differences in Detection Efficiencies-We observed a large dynamic range in relative abundance across the 93 peptides in each individual MS run, even though all of the peptides were present at the same concentration. The average relative abundance ranged from 0.0018% to 3.20% (Fig. 2C, Table I), whereas the expected value for an equimolar mixture was 1.08% (100%/93) for each peptide. The averaged value for the peptide with the highest intensity (peptide 30, Kme2SAPATGGVKKPHR, histone H3 27-40) was more than 1700-fold higher than that for the peptide with the lowest intensity (peptide 5, TKme2QTAR, histone H3 3-8) (Table I). It is well known that variations in peptide length, primary sequence, and modifications may all contribute to differences in (i) peptide hydrophobicity, (ii) binding affinity to reverse phase resins, (iii) peptide basicity, and (iv) ionization efficiency at the liquid-gas interface, thus resulting in the large gap observed. In this study we aimed to describe the observed detection efficiencies of histone peptides and to provide quantitative normalization methods based on our quantifications.
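The arithmetic behind these numbers can be written out in a few lines of Python. The expected share of each peptide in an equimolar mix is 100%/93, and the ratio of expected to observed share is one way to express a per-peptide correction factor; the helper names are hypothetical, and only the two averaged values quoted above are used.

```python
N_PEPTIDES = 93
EXPECTED_PERCENT = 100.0 / N_PEPTIDES  # equimolar share of each peptide


def fold_range(observed_percent):
    """Ratio of the highest to the lowest average relative abundance."""
    return max(observed_percent) / min(observed_percent)


def correction_factor(observed_percent, expected=EXPECTED_PERCENT):
    """Multiplier that would restore an observed share to the equimolar one."""
    return expected / observed_percent


print(round(EXPECTED_PERCENT, 2))         # 1.08
print(round(fold_range([3.20, 0.0018])))  # 1778
```

By this measure, a peptide observed at 3.20% would receive a factor of roughly 0.34, and one observed at 0.0018% a factor of roughly 600.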
The peptide length and primary sequence should be the major factors contributing to the detection efficiency of the 16 unmodified peptides in our library. Consistent with the general observations in the field, there was no clear correlation between peptide molecular weight and detection efficiency in the mass spectrometer for the size range of peptides in our study, as illustrated in Fig. 3A. Similarly, neither the GRAVY scores (30) (Fig. 3B) nor the isoelectric points (Fig. 3C) of these peptides correlated with the detection efficiency. To test for any influence of the chemical derivatization on the detection efficiencies, we also analyzed unpropionylated peptides. Because they are in general more hydrophilic than the propionylated peptides and are not retained well on C18 columns, we could reliably detect and quantify only 56 peptides (data not shown), 7 of which were unmodified peptides (28, 49, 50, 52, 78, 83, and 92). Consistent with the propionylated peptides, no correlation was observed between the peptides' detection efficiencies and their molecular weight, GRAVY scores, or isoelectric points (supplemental Fig. S5).
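The text does not state which statistic was used to judge correlation. As an illustration only, a Spearman rank correlation, which does not assume a linear relationship, could be computed between a property such as molecular weight, GRAVY score, or pI and the detection efficiencies of the 16 unmodified peptides; the Python sketch below implements it with average ranks for ties. The input values would come from Table I and are not reproduced here.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```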
Effects of Modifications on Detection Efficiency-Our library contains 87 peptides that carry at least one modification. To assess whether a particular modification affected the detection efficiency, we compared each modified peptide to its unmodified form (Unmod). The total amount of each peptide family was set at 100%, as shown in Figs. 4 to 6. We observed some trends in how different modifications may affect the detection efficiency. Surprisingly, the same modification could have very different effects on different peptides. Note that, to provide relevant information for biological samples, we focused on derivatized synthetic peptides in this study. The propionyl groups were mainly added to primary amine groups, namely, the N termini of peptides, as well as lysine residues with no modification or with mono-methylation. Lysine residues that are acetylated or di- and tri-methylated are not propionylated.
Lysine Acetylation-In our histone peptide library, four peptides (H3 54-63, canonical H2A 4-11, H2A.Z 1-19, and H4 4-17) carry mostly lysine acetylation when they are modified in vivo. Histone H3 54-63 peptide can be acetylated on K56, and this modification is related to chromatin assembly and DNA damage (36). As shown in Fig. 4A, 70.4% of the signal came from the acetylated form, which is a 2.4-fold increase relative to Unmod (p < 0.01). In contrast, when the canonical histone H2A 4-11 peptide was acetylated on either the K5 or the K9 position, the detection efficiency was significantly lower than for Unmod (4.7-fold and 2.7-fold decrease, respectively) (Fig. 4B, supplemental Table S5). The detection efficiency of double acetylation on K5 and K9 was further reduced, with an 11.2-fold decrease relative to Unmod (Fig. 4B).
Intriguingly, K5ac and K9ac were also significantly different from each other (p < 0.01, supplemental Table S5), suggesting a positional effect of lysine acetylation.
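Because each family is normalized to 100%, fold changes relative to Unmod are simple ratios of percentages. The Python sketch below (hypothetical helper names) reproduces the H3 54-63 example, assuming that family contains only the unmodified and K56-acetylated forms: with 70.4% of the signal from the acetylated form, Unmod accounts for the remaining 29.6%, and 70.4/29.6 is about 2.4. Fold changes below 1 correspond to the n-fold decreases reported in the text, with n = 1/fold.

```python
def family_percent(abundance):
    """Express each isoform as a percentage of its peptide family."""
    total = sum(abundance.values())
    return {iso: 100.0 * a / total for iso, a in abundance.items()}


def fold_vs_unmod(percent, iso, unmod="Unmod"):
    """Fold change of one isoform relative to the unmodified form.

    Values below 1 correspond to an n-fold decrease with n = 1 / fold.
    """
    return percent[iso] / percent[unmod]


h3_54_63 = {"Unmod": 29.6, "K56ac": 70.4}  # percent of family signal
print(round(fold_vs_unmod(h3_54_63, "K56ac"), 1))  # 2.4
```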
Similarly, on histone H2A.Z 1-19 peptides, the position of the acetylations played an important role in determining the detection efficiency (Fig. 4C, supplemental Table S6). This peptide can be acetylated on K4, K7, and K11 in vivo (37,38). For the mono-acetylated species, both K7ac and K11ac affected the detection efficiency negatively, showing a 1.6-fold and 1.1-fold decrease relative to Unmod, respectively. When two acetylations were present, K4acK11ac and K7acK11ac had opposite effects: a 1.4-fold increase and a 2.1-fold decrease, respectively. When all three lysines were acetylated, the detection efficiency was not different from that of Unmod (Fig. 4C, supplemental Table S6). These results suggest that multiple lysine acetylations on the same peptide may have antagonistic effects on detection efficiency. Supporting this idea, when we plotted the relative abundance against the number of acetylation events (Fig. 4D), only the averaged one-ac peptide was significantly lower than the other species; Unmod, two-ac, and three-ac were not different from one another (supplemental Table S6).
As previously mentioned, the histone H4 4-17 peptide may have up to four acetylated lysine residues, and therefore a total of 16 isoforms have been observed in many biological models (Figs. 1 and 4E, supplemental Table S7). Our quantifications suggest that K8ac and K16ac had negative effects on detection efficiency, whereas K5ac and K12ac were not different from Unmod. Overall, the peptides of the H4 4-17 family were more similar to each other (relative standard deviation (RSD) = 19.9%) than those of the other three families shown in Fig. 4 (RSD = 57.7%, 96.7%, and 30.0%). On this peptide, acetylations generally reduced detection efficiency (Fig. 4F). The averaged one-ac, two-ac, three-ac, and four-ac peptides were all significantly lower than Unmod (1.1-, 1.3-, 1.2-, and 1.3-fold, respectively). This is consistent with a previous observation by Kelleher and colleagues using a middle-down approach (39).
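The relative standard deviation used to compare peptide families is the standard deviation of the isoforms' relative abundances divided by their mean. The text does not say whether the sample or the population standard deviation was used; this Python sketch assumes the sample version (`statistics.stdev`), and the example values are made up.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation in percent (sample SD / mean)."""
    return 100.0 * stdev(values) / mean(values)


print(round(rsd_percent([1.0, 3.0]), 1))  # 70.7
```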
In contrast to the complicated effects of lysine acetylation and methylation, Rme1 seemed to positively affect the detection efficiencies, based on two peptides in our library. On the histone H3 1-8 peptide, R2me1 showed a 1.2-fold increase relative to Unmod (Fig. 5D) (p < 0.01), and R3me1 on the histone H4 1-17 peptide showed a 1.6-fold increase (Fig. 5E) (p < 0.01).
Threonine and Serine Phosphorylation-As previously mentioned, phosphorylation (ph) on T, S, and Y residues usually has a negative effect on ionization efficiency, which may be due to (i) the suppressive effect of the negatively charged phosphate groups on ionization in positive ion mode or (ii) low retention of phosphorylated peptides in RP-HPLC resulting from their increased hydrophilicity (42). Therefore, the negative effects of phosphorylation should be more pronounced on shorter peptides. Three peptides in our library behaved consistently with this prediction. Relative to their unmodified counterparts, T3ph on H3 3-8, S10ph on H3 9-17, and S28ph on H3 27-40 exhibited 5.9-fold, 2.4-fold, and 1.1-fold decreases, respectively (Figs. 5C, 6B, and 6C; supplemental Tables S10, S12, and S13). Note that our chemical derivatization removes the positive charges on unmodified lysine residues; therefore, the number of lysine residues on each peptide has less impact on the overall charge of the peptides.
Peptides with Multiple Modifications-Three peptide families of the histone H3 N-terminal tail have multiple residues that may be modified differently on their own and in combination. We have included the most commonly seen PTMs on these peptides in our library.
The H3 18-26 peptide has two lysine residues that can be mono-methylated or acetylated in vivo (43). In our experiments, this peptide exhibited the least variation (RSD = 13.1%) among its different isoforms (Fig. 6A, supplemental Table S11). K18me1 and K23ac had similar negative effects on detection efficiency (1.3-fold and 1.2-fold decreases, respectively). Thus me1 seemed to have an effect only at the K18 position, whereas ac had an effect at the K23 position. This observation further supports the idea that the positions of PTMs strongly influence detection efficiency.
As shown in Fig. 6B and supplemental Table S12, all acetylations on the H3 9-17 peptide reduced the signal intensity, similar to the canonical H2A 4-11 peptide (Fig. 4B). K9ac, K14ac, and K9acK14ac exhibited 1.8-, 1.5-, and 2.1-fold decreases relative to Unmod, respectively. K9acK14ac levels were also significantly lower than those of the singly acetylated peptides (supplemental Table S12). The methylation profile of this peptide was similar to those of the H4 20-23 and peptides in that K9me1 had the most abundant signal (21.9%, 2.3-fold increase from Unmod), whereas K9me2 and K9me3 were much lower (1.9- and 2.3-fold decreases). Similar patterns of K9 methylation persisted in combination with K14ac or S10ph (Fig. 6B, supplemental Table S12). All forms of K14ac and S10ph showed consistent negative effects on the detection efficiencies relative to their unmodified counterparts.
Unlike the acetylations on the H3 9-17 peptide, which reduced intensities, K27ac on the histone H3 27-40 peptide was 1.2-fold higher than Unmod (Fig. 6C, supplemental Table S13). Interestingly, the methylation pattern on this peptide differed from those of all previously discussed peptides. All forms of K27 methylation by themselves were significantly higher than Unmod: me1, 1.1-fold; me2, 1.7-fold; and me3, 1.7-fold. The pattern of K27 methylation (K27me2/3 > K27me1) was largely maintained when S28 or K36 was modified, except that K27me3K36me3 was not different from K27me1K36me3 (supplemental Table S13). Moreover, K27me1 in combination with any other modification on S28 or K36 reduced signal intensity (Fig. 6C). For methylation on K36, only K36me2 showed a decrease (1.1-fold) relative to Unmod. However, when K36 methylation was coupled with any form of K27 methylation, the signal was significantly lower than that of the corresponding isoform with unmodified K36 (Fig. 6C, supplemental Table S13). For instance, the K27me2K36me1 isoform was much lower than the K27me2 isoform. S28ph had similar effects. These data suggest that combinatorial PTMs on the histone H3 27-40 peptide generally lower detection efficiency.
Application of the Synthetic Peptides in Histone and Histone PTM Quantification-Our original protocol (5) did not account for variations in peptide detection efficiencies. As mentioned, this might have biased the peptide quantification and data interpretation. With the synthetic peptide library available, we used it to investigate histone variant and PTM changes during mouse TS cell differentiation. These cells were derived from the trophoectoderm layer of mouse embryos 3.5 days post-coitum. During early mammalian development, the trophoectoderm lineage develops into vital extraembryonic placental tissues and is therefore critical for embryonic development. TS cells are widely used as an in vitro trophoblast model (46). Similar to embryonic stem cells, these cells can be differentiated into multiple lineages that mimic extraembryonic tissues (47). We mixed endogenous histone peptides with the synthetic peptide library and analyzed the histone variants and histone PTMs.
First, we quantified the histone H2A variants in undifferentiated TS cells. Three histone H2A peptide pairs (endogenous and synthetic) were used: peptide 83 (HLQLAIR) is shared by canonical H2A and H2A.Z, peptide 93 (GKTGGKAR) comes from H2A.X, and peptide 92 (SAKAGVIFPVGR) comes from macroH2A. As shown in Fig. 7A, the original data were obtained by quantifying the endogenous peptides in four MS runs; in these data, 91.6% of the signal came from canonical H2A/H2A.Z. We also quantified the synthetic peptides in the same runs and used their detection efficiencies to correct the measurement of each individual run: the raw abundance of each endogenous peptide was adjusted based on the relative detection efficiency of its synthetic counterpart in the same MS run ("Experimental Procedures"). After this internal correction (Int_corr), the averaged canonical H2A/H2A.Z amount dropped to 62.8%, whereas H2A.X increased from 8.0% to 36.8%. MacroH2A was very low and remained relatively unchanged (0.37% versus 0.42%). To test whether comparable corrections could be obtained using detection efficiencies from independent runs, we performed external corrections (Ext_corr) using the previously presented data from 16 runs of the synthetic peptides alone (Fig. 2 and Table I). The raw abundance of each H2A variant was corrected based on the relative detection efficiencies in Table I, and the relative abundance of each variant was then calculated from the sum of all corrected H2A variants (set at 100%). The results of Ext_corr were very similar to those of Int_corr: 66.5%, 33.1%, and 0.42% for canonical H2A/H2A.Z, H2A.X, and macroH2A, respectively (Fig. 7A, supplemental Table S14). These corrections enabled us to measure and compare the histone H2A variants more accurately.
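The two corrections described above can be sketched as "divide each endogenous area by its peptide's relative detection efficiency, then renormalize the family to 100%". This is a minimal sketch, not the study's exact procedure: it assumes the synthetic peptides were spiked in equimolar amounts and that relative detection efficiency is each synthetic peptide's area divided by the mean area of the set (the actual formula is in "Experimental Procedures"). The function names, peak areas, and external efficiency values are all hypothetical.

```python
from statistics import mean


def relative_detection_efficiency(synthetic_areas):
    """Relative detection efficiency of each spiked-in heavy peptide.

    Assumes every synthetic peptide was added in the same molar amount, so
    differences in XIC area reflect differences in detection efficiency.
    """
    avg = mean(synthetic_areas.values())
    return {name: area / avg for name, area in synthetic_areas.items()}


def percent_of_family(areas, efficiencies=None):
    """Percent of the family total, optionally after dividing each
    endogenous area by its peptide's detection efficiency."""
    if efficiencies is None:
        efficiencies = {name: 1.0 for name in areas}
    corrected = {name: areas[name] / efficiencies[name] for name in areas}
    total = sum(corrected.values())
    return {name: 100.0 * value / total for name, value in corrected.items()}


# Hypothetical XIC areas from one LC-MS run (arbitrary units); not real data.
endogenous = {"H2A/H2A.Z": 9.0e8, "H2A.X": 8.0e7, "macroH2A": 3.6e6}
synthetic = {"H2A/H2A.Z": 6.0e8, "H2A.X": 1.0e8, "macroH2A": 3.0e8}

original = percent_of_family(endogenous)
# Internal correction: efficiencies measured in the same run.
int_corr = percent_of_family(endogenous, relative_detection_efficiency(synthetic))

# External correction: efficiencies averaged over separate runs of the
# synthetic library alone (hypothetical values standing in for a table).
external = {"H2A/H2A.Z": 1.7, "H2A.X": 0.35, "macroH2A": 0.95}
ext_corr = percent_of_family(endogenous, external)

for name in endogenous:
    print(f"{name:10s} original {original[name]:5.1f}%  "
          f"Int_corr {int_corr[name]:5.1f}%  Ext_corr {ext_corr[name]:5.1f}%")
```

Because each family is renormalized to 100% after correction, only the ratios between efficiencies matter: multiplying every efficiency by the same constant leaves the corrected percentages unchanged. This is one reason externally measured efficiencies can stand in for same-run ones.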
We measured histone PTMs using the same method. Taking histone H4 K20 methylation as an example, we quantified both endogenous and synthetic H4 20-23 peptides (Fig. 7B). Before correction, K20me1 seemed to be the most abundant form (47.5%, Fig. 7B). After either internal or external correction, K20me2 became the most abundant form, increasing from 24.6% to more than 70%, whereas the K20me1 level dropped to around 10%. This result is very similar to that of a previous top-down MS study, which showed that more than 80% of histone H4 carries me2 on K20 in HeLa cells (48). Two peptides in the H4 20-23 family, Kme2VLR and Kme3VLR (peptides 76 and 77), had very low detection efficiencies (Table I). They are therefore more susceptible to technical noise and variation, which may contribute to some of the differences between internal and external corrections in this family (supplemental Table S15). Nevertheless, internal and external corrections shift the data in the same direction and agree with each other more closely than either agrees with the original data.
Next, we applied this method to compare histone variant and PTM levels in undifferentiated and differentiated TS cells. Three datasets were generated: the original quantification, internal correction using the spiked-in heavy peptides (Int_corr), and external correction based on detection efficiencies obtained from the 16 independent runs (Ext_corr). For most peptides examined, all three datasets showed a similar trend of isoform/modification changes after TS cell differentiation; however, the relative ratios of isoforms/modifications changed after data correction. For example, all three datasets showed increased levels of histone H2A.X and H4 K20me2/me3 after TS cell differentiation (Figs. 7C-7H, supplemental Tables S16 and S17), but the corrections changed the magnitude of these increases relative to the original method. The original data showed that the differentiated TS cells had a 5.2-fold increase in H2A.X and a 2.6-fold increase in H4K20me2 relative to the undifferentiated cells, whereas both internal and external corrections reduced these fold changes (H2A.X: 1.9- and 2.4-fold, respectively; H4K20me2: 1.2- and 1.1-fold, respectively). In one case, the corrected data led to the opposite biological conclusion: in the original data, macroH2A showed a 1.4-fold increase when TS cells were differentiated (Fig. 7C), but after correction, differentiated TS cells actually had less macroH2A than undifferentiated TS cells (more than a 1.5-fold decrease) (Figs. 7D and 7E). We think this occurred because the amount of macroH2A peptide is very low (<1% in our data), so its relative abundance depended strongly on the levels of the other H2A peptides, as we normalized the measurements to the sum of the three H2A variants.
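The explanation for the macroH2A reversal can be checked with toy arithmetic. When a minor variant is expressed as a share of the family sum, correcting a major variant's detection efficiency changes the denominator. If that variant also rises during differentiation, the minor variant's share can fall even though its raw share rose. All areas and efficiencies below are hypothetical, chosen only to show the mechanism; they are not the study's data.

```python
def shares(areas, efficiency):
    """Fraction of each variant in the family after dividing its raw area
    by a (hypothetical) relative detection efficiency."""
    corrected = {name: areas[name] / efficiency[name] for name in areas}
    total = sum(corrected.values())
    return {name: value / total for name, value in corrected.items()}


def macro_fold_change(efficiency):
    """Differentiated / undifferentiated ratio of the macroH2A share."""
    return shares(diff, efficiency)["macroH2A"] / shares(undiff, efficiency)["macroH2A"]


# Hypothetical raw peak areas (arbitrary units); NOT data from this study.
undiff = {"H2A": 90.0, "H2A.X": 9.6, "macroH2A": 0.4}
diff = {"H2A": 50.0, "H2A.X": 49.4, "macroH2A": 0.6}

uncorrected = {"H2A": 1.0, "H2A.X": 1.0, "macroH2A": 1.0}
# Hypothetical: the H2A.X peptide is detected 5x less efficiently.
corrected = {"H2A": 1.0, "H2A.X": 0.2, "macroH2A": 1.0}

print(f"uncorrected macroH2A fold change: {macro_fold_change(uncorrected):.2f}")  # 1.50
print(f"corrected macroH2A fold change:   {macro_fold_change(corrected):.2f}")    # 0.70
```

Up-weighting H2A.X inflates the differentiated family total much more than the undifferentiated one, so the apparent 1.5-fold increase in the macroH2A share becomes a decrease.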
In summary, these results demonstrate that our original protocol without correction was robust enough to identify the critical biological changes, especially when the target peptides were abundant. However, the uncorrected data might over- or underestimate some changes between two biological conditions. Therefore, using the detection efficiencies described here should greatly improve the accuracy of histone variant and PTM quantification. Furthermore, for peptides present at very low abundance (e.g. macroH2A), proper correction was critical for accurate biological interpretation.
DISCUSSION
To our knowledge, we have created the largest synthetic peptide library for histones, and here we have presented quantitative and systematic analyses of it. A handful of other studies have also employed synthetic histone peptides, but typically only for individual or a limited number of histone variants/PTMs. For example, Darwanto et al. used two pairs of histone H2B peptides (unubiquitinated and ubiquitinated) to generate standard curves for multiple-reaction-monitoring analyses of H2B ubiquitination (11). Recently, Jaffe and colleagues reported a histone peptide collection including five synthetic histone H3 peptide families. These peptides were labeled with heavy arginine residues and were used to analyze the specificity of histone antibodies in chromatin-immunoprecipitation experiments and to quantify histone PTMs (14). However, they included only histone H3 peptides and did not evaluate the systematic effects of the PTMs. Additionally, because each tryptic peptide carried a single heavy arginine residue (13C6, 15N4), isobaric PTM peptides remained isobaric (e.g. histone H3K9ac versus H3K14ac). The same group used these peptides in a second study, but only as an external guide for retention time windows for MS/MS targeting (49).
In the present study, we extensively tested 93 histone peptides on three different LC-MS/MS instrument setups and obtained very similar results (Fig. 2B). This suggests that the detection efficiency variations are consistent across conditions when our protocols are followed. With Protein-Aqua™ technology, the peptides were designed with different numbers of heavy amino acid residues; therefore, all of the originally isobaric peptides had different m/z values and could be easily distinguished at the MS1 level. Moreover, these peptides could also be measured in the presence of the endogenous peptides (Fig. 1C). This design enabled us to fully evaluate and quantify the different forms of histone peptides.
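The reason different heavy-residue counts make isobaric peptides separable at the MS1 level is simple arithmetic: each labeled residue adds a fixed mass, which shows up as a shift of that mass divided by the charge. The sketch below uses the standard mass shifts for uniformly labeled 13C6 15N4 arginine and 13C6 15N2 lysine as examples only. Which residues were labeled in each library peptide is not stated in this section, and the shared neutral mass is a hypothetical placeholder.

```python
PROTON_MASS = 1.007276  # Da

# Monoisotopic mass shifts of uniformly labeled residues (Da).
HEAVY_ARG_SHIFT = 10.008269  # 13C6 15N4 arginine
HEAVY_LYS_SHIFT = 8.014199   # 13C6 15N2 lysine


def mz(neutral_mass, charge, heavy_arg=0, heavy_lys=0):
    """m/z of an [M+zH]z+ ion after adding heavy-residue mass shifts."""
    mass = neutral_mass + heavy_arg * HEAVY_ARG_SHIFT + heavy_lys * HEAVY_LYS_SHIFT
    return (mass + charge * PROTON_MASS) / charge


# Hypothetical neutral monoisotopic mass shared by two positional isomers
# (e.g. K9ac and K14ac forms of the same peptide); not a real peptide mass.
ISOMER_MASS = 1000.0

light = mz(ISOMER_MASS, 2)                               # both light isomers
isomer_1 = mz(ISOMER_MASS, 2, heavy_arg=1)               # one heavy residue
isomer_2 = mz(ISOMER_MASS, 2, heavy_arg=1, heavy_lys=1)  # two heavy residues

print(f"light isomers (2+): {light:.4f}")
print(f"synthetic isomer 1 (2+): {isomer_1:.4f}")
print(f"synthetic isomer 2 (2+): {isomer_2:.4f}")
```

Both light isomers share one m/z. The two synthetic versions differ from each other by about 4 m/z units at 2+, so their XICs can be extracted independently.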
We examined various aspects of the synthetic peptide data by manually extracting ion chromatograms for the MS analyses, and we also manually validated the MS/MS results (examples are given in supplemental Fig. S1). For each peptide, all detectable charge states were quantified. Because longer peptides contain more naturally occurring heavy isotopes, we found that their relative amounts were underestimated when we measured only the first isotopic peak (i.e. the monoisotopic, all-12C peak). We therefore corrected for this factor using calculated isotope distributions (supplemental Tables S1 to S3). Overall, we are very confident in the MS peak assignments and peptide quantifications.
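The isotope effect described above can be approximated by the probability that a molecule contains only the lightest isotope of every element. This is the product of each element's light-isotope abundance raised to its atom count. The sketch below divides the monoisotopic peak area by that fraction to estimate the all-isotope signal. It is a simplification of a full isotope-distribution calculation, the abundances are approximate natural values, and the elemental compositions are hypothetical. For heavy-labeled synthetic peptides, atoms enriched in 13C or 15N would have to be excluded from the natural-abundance C and N counts.

```python
# Approximate natural abundances of the lightest isotope of each element.
LIGHT_ISOTOPE_ABUNDANCE = {
    "C": 0.9893,    # 12C
    "H": 0.999885,  # 1H
    "N": 0.99636,   # 14N
    "O": 0.99757,   # 16O
    "S": 0.9499,    # 32S
}


def monoisotopic_fraction(composition):
    """Fraction of molecules made only of the lightest isotopes, i.e. the
    share of the total signal that falls in the monoisotopic peak."""
    fraction = 1.0
    for element, count in composition.items():
        fraction *= LIGHT_ISOTOPE_ABUNDANCE[element] ** count
    return fraction


def total_from_monoisotopic(mono_area, composition):
    """Estimate the all-isotope signal from the monoisotopic peak area."""
    return mono_area / monoisotopic_fraction(composition)


# Hypothetical elemental compositions of a short and a long peptide.
short_peptide = {"C": 30, "H": 50, "N": 10, "O": 10}
long_peptide = {"C": 90, "H": 150, "N": 30, "O": 30}

for label, comp in (("short", short_peptide), ("long", long_peptide)):
    print(f"{label}: monoisotopic fraction = {monoisotopic_fraction(comp):.3f}")
```

The longer peptide keeps a much smaller share of its signal in the monoisotopic peak. Without such a correction, it would therefore appear less abundant than an equimolar shorter peptide.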
As previously mentioned, many factors contribute to the final detection efficiencies of peptides, including peptide hydrophobicity, basicity, and binding affinity for reversed-phase resins. We observed no clear relationship between peptide hydrophobicity or basicity and detection efficiency (Fig. 3). We did, however, observe a significant loss of highly hydrophilic peptides, namely peptides 5 and 6 (histone H3 3-8 with K4me2/3), which had the lowest detection efficiencies in the entire library (Table I). Experiments (data not shown) demonstrated that this was mainly due to the poor affinity of C18 resin for these two small, very hydrophilic peptides. Using a different derivatization reagent significantly increased the peptide recovery after desalting with STAGE tips.² Therefore, our synthetic peptide standards can be very beneficial for accurate quantification of histone H3 K4me2/3, arguably the most important histone marks on active gene promoters.
Among the 14 families of histone peptides, we observed very different relative variations, as measured by RSD (Figs. 4 to 6). The RSD values ranged from 13.1% to 173.5%. These results suggest that PTMs, and even the same PTM, have diverse effects on different peptides. Overall, lysine methylation seemed to have larger effects (both positive and negative) than lysine acetylation. Surprisingly, despite the general belief that serine/threonine phosphorylation strongly impairs ionization efficiency, phosphorylation on histone H3 T3, S10, or S28 alone was never the PTM with the most negative effect on its peptide (Figs. 5C, 6B, and 6C). Naturally occurring phosphorylations in biological samples are typically very low in abundance and quite dynamic. In contrast, lysine methylations and acetylations are more abundant and stable, especially on histones (2,3,50). Our results suggest that the difficulty of detecting phosphorylation is probably due to the naturally low amount of this mark and not solely to technical limitations, at least for the histone peptides examined here. Consistently, a recent study also showed that phosphorylated peptides sometimes had higher detection efficiencies than their unmodified counterparts (20), further complicating the view of how these marks affect MS detection efficiency.
The propionyl groups introduced onto histone peptides during our chemical derivatization protocol can also be considered another form of modification. To assess the impact of propionylation on detection efficiencies, we subjected the histone peptide library to our MS pipeline without chemical derivatization. However, the unpropionylated peptides were retained so poorly that we could reliably detect only 56 of the 93 peptides (data not shown). Of these 56, 7 peptides were unmodified (supplemental Fig. S5). We were able to analyze two complete families, the histone H3 54-63 and H3 73-83 peptide families (supplemental Fig. S6). On the histone H3 54-63 peptide, K56ac showed a more consistent increase in detection efficiency relative to its Unmod counterpart (supplemental Fig. S6A). In contrast, on the histone H3 73-83 peptide, the unpropionylated forms (Unmod; K79me1, -2, and -3) were more similar to each other (RSD = 11.4%), whereas K79me2 and -me3 on the propionylated peptides were much higher than their Unmod and K79me1 counterparts (RSD = 86.5%) (Fig. 5 and supplemental Fig. S6B). We concluded that, similar to lysine acetylation and methylation, lysine propionylation has variable effects on detection efficiency when added onto different peptide backbones.
Synthetic heavy-isotope-labeled peptides are usually used as internal standards (for absolute quantification): they are mixed with biological samples and analyzed together via nano-LC-MS/MS. Alternatively, they may be analyzed separately from the endogenous samples, serving as external controls for detection efficiency (14,51). We applied both calibration methods to histones extracted from mouse trophoblast stem cells and differentiated cells and found very similar results with internal and external controls (Fig. 7). For the histone H2A variants, we observed no difference between Int_corr and Ext_corr data (Fig. 7A, supplemental Table S14). For the histone H4 20-23 peptide family, there was some difference between the two sets of corrected data (Fig. 7B, supplemental Table S15); however, both corrections showed the same trend and were more similar to each other than to the original data. Furthermore, the corrected data may change the interpretation of some biological results. In the example shown in Fig. 7, both Int_corr and Ext_corr data show a decrease in macroH2A level after TS cell differentiation, in contrast to the original data. Notably, the 16 runs presented here were conducted over more than five months on three different instruments, suggesting that the detection efficiencies of these peptides remain stable over time. Therefore, external corrections may also be used to control for bias in the detection efficiencies of different peptides, a bias to which all researchers in the epigenetics proteomics field should pay closer attention.
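External correction factors come from runs of the synthetic library alone, possibly acquired months apart on different instruments. One way to pool such runs is sketched below. Each run is first normalized to its own mean area, which removes overall sensitivity differences between instruments. The per-peptide values are then averaged, and their RSD across runs serves as a stability check. This pooling scheme is an assumption for illustration, not necessarily how the values in Table I were computed. The peptide names and areas are hypothetical.

```python
from statistics import mean, stdev


def run_efficiencies(areas):
    """Per-run relative detection efficiency: each peptide's area divided by
    the mean area of the (assumed equimolar) library in that run."""
    avg = mean(areas.values())
    return {name: area / avg for name, area in areas.items()}


def external_efficiencies(runs):
    """Mean per-run efficiency for each peptide across runs, plus its RSD (%)
    across runs as a simple stability measure (needs at least two runs)."""
    per_run = [run_efficiencies(run) for run in runs]
    summary = {}
    for name in per_run[0]:
        values = [run[name] for run in per_run]
        summary[name] = (mean(values), 100.0 * stdev(values) / mean(values))
    return summary


# Hypothetical XIC areas from three runs of the synthetic library alone,
# e.g. on instruments with very different overall sensitivity.
runs = [
    {"pep_A": 2.0e8, "pep_B": 1.0e8, "pep_C": 4.0e7},
    {"pep_A": 1.1e9, "pep_B": 5.2e8, "pep_C": 2.1e8},
    {"pep_A": 5.0e8, "pep_B": 2.6e8, "pep_C": 9.0e7},
]

for name, (eff, rsd) in external_efficiencies(runs).items():
    print(f"{name}: efficiency {eff:.2f} (RSD across runs {rsd:.1f}%)")
```

Normalizing within each run before averaging keeps a single highly sensitive run from dominating the pooled efficiencies. A low RSD across runs is the kind of evidence that justifies reusing the values as external correction factors.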
As demonstrated above, the synthetic peptide library presented here can serve as a very useful tool for quantifying histone PTMs and variants in biological samples. However, some challenges remain in analyzing endogenous histone samples. For example, the H4 4-17 peptide family contains three isobaric groups: the one-ac, two-ac, and three-ac peptides (Fig. 1). Although we routinely target their m/z ([M+2H]2+), it remains difficult to determine the relative amount of each form because some of them lack unique b or y ions. The synthetic library does provide useful information about retention time differences among some isobaric peptides (Fig. 1C). However, correct identification and quantification of these peptides still relies on optimizing the mass spectrometer settings to obtain full coverage in the MS/MS spectrum.
In conclusion, we have provided a systematic evaluation and quantification of the detection efficiencies of our 93-peptide histone library. We found widespread detection efficiency biases that were both sequence and PTM specific, although no discernible trends could be generalized across all the peptides. Nevertheless, we believe these data will be of great benefit in measuring important histone variants and PTMs and will promote more accurate use of mass-spectrometry-based technologies in the epigenetics field. Our data also serve as a caution to those wishing to measure histone PTM abundances: correction factors (internal or external) must be taken into consideration before one makes strong claims about histone PTM/variant stoichiometry. Furthermore, because they represent many different types of protein modifications, these data may serve as a first step toward a more comprehensive understanding of peptide behavior in mass spectrometry. Lastly, we hope this library can be used and mined to extract new parameters or characteristics that could explain the observations made herein and in future studies.