High-Throughput Characterization of Combinatorial Histone Codes

We present a novel method that uses “salt-less” pH gradient weak cation exchange-hydrophilic interaction liquid chromatography (WCX-HILIC) coupled directly to electron transfer dissociation (ETD) mass spectrometry for the automated, on-line, high-throughput characterization of hyper-modified combinatorial Histone Codes. Performed on a low-resolution mass spectrometer, this technique improves on existing methods, reducing sample requirements and analysis time approximately 100-fold. The scheme can identify all of the major combinatorial Histone Codes present in a sample in a 2-hour analysis. The large N-terminal histone peptides are eluted by the combined pH and organic solvent WCX-HILIC gradient and introduced directly via nanoelectrospray ionization into a bench-top linear quadrupole ion trap mass spectrometer equipped with ETD. Each polypeptide is sequenced and its modification sites identified by ETD fragmentation. The isobaric trimethyl and acetyl modifications are resolved chromatographically and confidently distinguished by combining mass spectrometric and chromatographic information.

highly modified histone H3.2 K4me3K9acK14acK18acK23acK27acK36me3. Such detail provided by our proteomic platform will be essential for determining how histone modifications occur and act in combination to propagate the Histone Code during transcriptional events and could greatly enable sequencing of the histone component of human epigenomes.

Introduction:
Eukaryotic nuclear DNA is compacted into chromatin fibers by nucleosomes, each consisting of a 146 bp section of DNA wrapped around a core of histone proteins (1). Dynamic post-translational modifications (PTMs) of the histones, primarily in the accessible N-terminal region or histone 'tail', are an important but incompletely understood component of dynamic gene regulation, epigenetic inheritance of cellular memory, genomic stability, and other nuclear mechanisms 2,3,4,5,6,7. A large body of studies points to the existence of a Histone Code of biological logic written on these proteins through PTMs, which are read by a diverse array of "effector" proteins leading to distinct biological events (3). Many single PTM sites on various histone proteins have been firmly linked to specific physiological processes, such as histone H3 Lys 9 trimethylation (H3K9me3), which is associated with heterochromatin formation (one mode of gene silencing). Nevertheless, the effect that multiple modifications occurring in combination have on the Histone Code signal remains to be determined. Significant progress has been made towards understanding histone modifications using antibody-based detection methods and Bottom Up mass spectrometry 4,5,6. However, these approaches cannot maintain the connectivity between modification sites over long amino acid sequences and therefore do not reveal how these modifications occur and function in concert. Several recent lines of evidence indicate the biological significance of the combinatorial aspects of the Histone Code 2,7,8, prompting research into the sequence analysis of long-range histone PTM patterns.
The technologies capable of determining such long-range PTM patterns, electron capture dissociation (ECD) (9) and electron transfer dissociation (ETD) (10) mass spectrometry (MS), are still relatively new. They have enabled Top Down and Middle Down gas-phase sequencing for combinatorial histone PTM analysis. For example, Kelleher and co-workers have published several papers detailing the analysis of all core histones using ECD on a high-resolution Fourier transform mass spectrometer 11,12,13,14. Because histones H2A and H2B are only modestly modified, and histone H4 is far less complex than histone H3, these proteins could be analyzed fairly thoroughly by a purely Top Down approach. Histone H3, however, has proven a significantly more difficult analytical problem, and a purely Top Down approach has yielded only a limited survey (15). ECD analysis of histones generally requires large amounts of fairly pure sample and, given the sample complexity, potentially long instrument acquisition times (several minutes to hours) to produce a single useful ECD spectrum. Owing to their sensitivity, Bottom Up analyses have revealed more diverse PTMs on H2A, H2B and H4 than Top Down approaches 5,16,17,18,19. ETD has previously been shown to be compatible with on-line chromatography and to have limits of detection and dynamic ranges similar to those of Bottom Up MS. ETD would therefore seem a better fit for Top or Middle Down MS of histones, and improved methods for Top or Middle Down histone analysis remain a priority. In support of this, a few groups have recently used ETD to sequence histone proteins and peptides 20,21. However, because all of these on-line analyses used standard reverse-phase high performance liquid chromatography (RP-HPLC), low chromatographic resolution has, in part, limited them to partial analyses or to the less complicated histones, H2A and H4.
The quality of any LC-MS analysis, as measured by dynamic range, sensitivity and specificity, depends heavily on the quality of the chromatography. This becomes critical for modified histone peptides, where the sample is a complex mixture, spanning a wide concentration range, of large peptides with identical amino acid sequences modified in slightly different ways, resulting in many isobaric structural isomers.
Separation of these physically similar modified histone forms (especially the highly modified histone H3) by any method has proved difficult and non-routine (22).
Chromatographic methods traditionally used in proteomic analyses (RP-HPLC) achieve only marginal separation of large modified histone peptides. The result is complicated Middle Down MS analyses of highly mixed precursor ion tandem mass spectra (several isomeric but uniquely modified species fragmented at once), highlighting the need to resolve histone forms chromatographically before mass spectrometric interrogation. volatile mobile phase additives that render them unsuitable for an on-line LC-MS method. As a consequence, each of the many LC fractions from the upfront separation had to be further purified and separately analyzed by MS afterwards.
Although such methods served as excellent discovery platforms, this process is extremely time consuming, leads to sample loss, and inherently reduces chromatographic resolution, prohibiting extensive studies of the relevance and dynamics of the modified forms discovered 24,27.
Here we present the first on-line nanoflow weak cation exchange-hydrophilic interaction liquid chromatography (WCX-HILIC) LC-MS/MS method for the high-throughput characterization of complex mixtures of hyper-modified combinatorial Histone Codes. The chromatographic separation is performed on a WCX-HILIC PolyCAT A stationary phase (poly(aspartic acid)); however, our mechanism of elution differs from that previously reported (27). The ionic strength gradient (i.e. salt elution) used by off-line methods is replaced with a pH gradient that protonates the stationary phase, removing the cation exchange interaction. This change in elution strategy yields a chromatographic profile similar to that of an ionic strength gradient but renders the method "mass spectrometry friendly", and dramatically improves analysis time, throughput, sample consumption and dynamic range. Whereas previous methods required 50-100 hours of manual MS data acquisition time and over 100 µg of sample to systematically characterize a single histone extract 20,24,27, the method presented here can achieve this with less than 1 µg of sample in as little as about two hours, with an overall improvement in data quality. Because of the improved chromatographic resolution and the inherent concentration of minor forms at the point of ionization in an on-line nanoflow LC-MS method, our dynamic range and limits of detection are significantly improved. Furthermore, the selectivity of the chromatography means that isobaric modifications, most importantly trimethylation and acetylation, can be confidently distinguished and assigned by supplementing the ETD MS/MS with retention time data. Although this is the first work to distinguish acetylation from trimethylation in this manner, it should not be surprising that cation exchange can resolve a modification that removes a positive charge from one that permanently fixes a positive charge.
Thus, high-resolution mass spectrometry, as used previously 12,13,14,15,20,21,25,27, is not a strict requirement.
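To make the isobaric problem concrete, the sketch below computes the monoisotopic mass difference between trimethylation (three CH2 units) and acetylation (C2H2O) from standard atomic masses, together with the resolving power (m/Δm) that would be needed to separate such a pair by m/z alone. The 5,000 Da precursor mass and the charge states in the usage lines are illustrative assumptions, not values measured in this work.

```python
"""Sketch: why trimethylation and acetylation are isobaric at unit resolution."""

# Monoisotopic atomic masses (Da)
C = 12.0
H = 1.00782503207
O = 15.99491461956
PROTON = 1.007276466812

# Trimethylation adds three CH2 units; acetylation adds C2H2O
TRIMETHYL = 3 * (C + 2 * H)
ACETYL = 2 * C + 2 * H + O
DELTA = TRIMETHYL - ACETYL  # roughly 0.0364 Da


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


def required_resolving_power(neutral_mass: float, charge: int) -> float:
    """Resolving power m/Δm needed to separate a me3/ac isobaric pair at this charge."""
    peak_spacing = DELTA / charge  # spacing of the two species on the m/z axis
    return mz(neutral_mass, charge) / peak_spacing


if __name__ == "__main__":
    print(f"me3 = {TRIMETHYL:.5f} Da, ac = {ACETYL:.5f} Da, delta = {DELTA:.5f} Da")
    for z in (5, 8, 10):
        # 5000 Da is an illustrative precursor mass, not a value from this work
        print(z, round(mz(5000.0, z), 3), round(required_resolving_power(5000.0, z)))
```

Because both the m/z value and the peak spacing scale with 1/z, the required resolving power comes out above 10^5 for a 5,000 Da precursor and is nearly independent of charge state, far beyond what a unit-resolution ion trap provides.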
Using histones H3.2 and H4 from butyrate treated HeLa cells (butyrate is a deacetylase inhibitor, so treatment yields a wider range and a more complex mixture of potential forms, ideal for methodological testing), we demonstrate that our nanoflow LC method combined with ETD on a widely available ion trap instrument achieves high-quality, comprehensive characterization of combinatorial Histone Codes.
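The pH-gradient elution described above works by protonating the carboxylate groups of the poly(aspartic acid) stationary phase. As a rough illustration, the Henderson-Hasselbalch relation gives the fraction of carboxylates that remain ionized at a given pH. The single effective pKa of 3.9 used here is an assumption (approximately the value for a free aspartate side chain); the effective pKa of the PolyCAT A phase may differ, and the sketch does not model how pH varies with %B when unbuffered mobile phases are mixed.

```python
"""Sketch: fraction of stationary-phase carboxylates ionized as a function of pH.

Assumes a single effective pKa (ASSUMED_PKA); this is an illustrative value,
not one measured for the WCX-HILIC column.
"""

ASSUMED_PKA = 3.9  # assumption: roughly the free aspartate side-chain pKa


def ionized_fraction(ph: float, pka: float = ASSUMED_PKA) -> float:
    """Henderson-Hasselbalch fraction of -COOH present as -COO- at the given pH."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


if __name__ == "__main__":
    # pH 6.0 and pH 2.5 are the "A" and "B" mobile-phase pH values given in Fig 2
    for ph in (6.0, 4.5, 3.5, 2.5):
        print(f"pH {ph:.1f}: {ionized_fraction(ph):.1%} of carboxylates ionized")
```

Under this assumption roughly 99% of the carboxylates are ionized at pH 6.0 and under 5% at pH 2.5, which is consistent with the cation exchange interaction being largely switched off toward the end of the gradient.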

Sample Preparation
HeLa S3 cells were grown and harvested as previously described (15). In some instances, cells were treated with 10 mM sodium butyrate overnight to increase histone acetylation. After nuclei isolation, histones were acid extracted according to standard protocols (28). Histones were then separated by RP-HPLC into the constituent family members (H2A, H2B, H3.1, H3.2, H3.3, H4 and H1) on a 4.6x250 mm C8 column (Grace Davidson, Deerfield, IL) using a System Gold HPLC (Beckman Coulter, Fullerton, CA) delivering a gradient at 0.8 mL/min from 30%B to 60%B in 100 min (A: 5% acetonitrile, 0.2% TFA; B: 95% acetonitrile, 0.18% TFA). Histone H3.2 was selected, diluted in 100 mM ammonium acetate, pH 4, and digested with GluC protease (Roche, Nutley, NJ) at a protein:enzyme ratio of 10:1 for 5 h at room temperature, after which the reaction was quenched by freezing at -80 °C. The resulting 1-50 AA peptide of histone H3.2 was then further purified by RP-HPLC as described before (5) (1%B/min gradient, same solvents and solvent system as above, except that a 2.1x250 mm column and a 0.2 mL/min flow rate were used). Histone H4 was digested with AspN (Roche, Nutley, NJ) (5:1 ratio, 100 mM ammonium bicarbonate, pH 8.0, for 6 h at 37 °C). The resulting 1-23 AA peptide was purified by C18 STAGE tip solid phase extraction (29), loading in 0.1% acetic acid and eluting in 50% MeCN in 0.1% acetic acid. The eluted peptide was evaporated to near dryness and diluted into the HILIC "A" mobile phase before loading on capillary HILIC columns.
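For reference, the RP-HPLC gradient used to separate the histone families (30%B to 60%B over 100 min, with A = 5% and B = 95% acetonitrile) corresponds to a slope of 0.3%B per minute. The hypothetical helpers below convert elapsed gradient time to %B and to the overall acetonitrile content of the delivered mobile phase; they assume an ideal linear gradient and ignore dwell volume and mixing delays.

```python
"""Sketch: mobile-phase composition during the linear RP-HPLC gradient.

Parameters are taken from the protocol text; an ideal linear gradient is
assumed (no dwell volume or mixing delay).
"""

START_B, END_B, DURATION_MIN = 30.0, 60.0, 100.0  # %B at start/end, gradient length
ACN_IN_A, ACN_IN_B = 5.0, 95.0  # % acetonitrile in mobile phases A and B


def percent_b(t_min: float) -> float:
    """%B at time t (minutes after gradient start), clamped to the gradient window."""
    t = min(max(t_min, 0.0), DURATION_MIN)
    return START_B + (END_B - START_B) * t / DURATION_MIN


def percent_acn(t_min: float) -> float:
    """Overall % acetonitrile delivered at time t (volume-weighted mix of A and B)."""
    b = percent_b(t_min) / 100.0
    return (1.0 - b) * ACN_IN_A + b * ACN_IN_B


if __name__ == "__main__":
    for t in (0, 50, 100):
        print(f"t = {t:3d} min: {percent_b(t):.1f}%B, {percent_acn(t):.1f}% acetonitrile")
```

At 0, 50 and 100 min this gives 32%, 45.5% and 59% acetonitrile, respectively, i.e. an organic slope of 0.27% acetonitrile per minute.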

Chromatography and Mass Spectrometry
A P2000 laser tip puller (Sutter Instruments, Novato, CA) was used to pull a 75μm i.d x

Histone H3 Analysis
Our initial efforts to develop an effective on-line LC-MS method for analyzing complex mixtures of modified histone forms used an ionic strength gradient from the "A" mobile phase described in the experimental section above to a "B" mobile phase of 500 mM ammonium formate, 25% acetonitrile, adjusted to pH 6.0 with ammonium hydroxide, at a gradient velocity of 1%B per minute. Butyrate treated histone H3. After settling on an unbuffered formic acid pH gradient, with 0 mM ammonium formate, pH 2.5, 25% acetonitrile as the "B" mobile phase, we investigated the effects of gradient rate by analyzing H3.  which is modified to full occupancy at every lysine except K37, which has not been reported as modified in any other work to date (5). In this case, distinguishing trimethylations from acetylations is trivial: the retention time requires five acetylations, and there are only five known possible acetylation sites. The assignment is made even more apparent by the close elution of homologous forms containing K4me2 and K36me2 and by the absence of K9me2 or K27me2 forms (or even me1 or unmodified forms) in this chromatographic region.
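The reasoning above, in which the precursor mass fixes only the total number of methyl equivalents (an acetylation counting as three, as in Tables 1 and 2) while retention time fixes the number of acetylations, can be illustrated with a small enumeration. This is a hypothetical sketch under simplifying assumptions stated in the code: only lysines are considered, K37 is treated as unmodified, acetylation is allowed only at K9, K14, K18, K23 and K27, and any lysine may carry me1 to me3. It is not the assignment procedure used in this work, which relies on ETD spectra.

```python
"""Sketch: lysine assignments on the H3 1-50 peptide consistent with a
methyl-equivalent count (hypothetical model, not the authors' software).

Simplifying assumptions:
- only lysine PTMs are considered (arginine methylation is ignored)
- K37 is treated as unmodified
- acetylation is allowed only at K9, K14, K18, K23 and K27
- any lysine may carry me1, me2 or me3
"""
from itertools import product

LYSINES = ("K4", "K9", "K14", "K18", "K23", "K27", "K36")
ACETYL_SITES = {"K9", "K14", "K18", "K23", "K27"}
# Methyl equivalents per state; an acetyl counts as three (same nominal +42 Da as me3)
STATES = {"un": 0, "me1": 1, "me2": 2, "me3": 3, "ac": 3}


def assignments(methyl_equiv, n_acetyl=None):
    """Return every lysine assignment whose methyl equivalents sum to methyl_equiv.

    If n_acetyl is given (e.g. inferred from retention time), only assignments
    with exactly that many acetylations are kept.
    """
    found = []
    for combo in product(STATES, repeat=len(LYSINES)):
        # Reject acetylation at sites assumed not to be acetylatable
        if any(s == "ac" and k not in ACETYL_SITES for k, s in zip(LYSINES, combo)):
            continue
        if sum(STATES[s] for s in combo) != methyl_equiv:
            continue
        if n_acetyl is not None and combo.count("ac") != n_acetyl:
            continue
        found.append(dict(zip(LYSINES, combo)))
    return found


if __name__ == "__main__":
    # 21 methyl equivalents: K4me3 (3) + five acetyls (15) + K36me3 (3)
    print(len(assignments(21)))  # 32: mass alone only forces every lysine to 3 equivalents
    print(assignments(21, n_acetyl=5))  # a single assignment remains
```

For the K4me3K9acK14acK18acK23acK27acK36me3 form, the mass alone is compatible with 32 lysine-level assignments under these assumptions (each of the five acetylatable lysines may be me3 or ac), whereas requiring five acetylations leaves exactly one.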
The evidence for chromatographic resolution of trimethyl from acetyl species is strong, based both on the physical separation mechanism (cation exchange of differentially charged species: trimethyl is a fixed charge, acetyl is mostly uncharged) and on empirically observed trends: a consistent apparent degree of acetylation within chromatographic regions; apparent acetylated species eluting far from analogous me2, me1 and me0 forms; apparent trimethyl species eluting near, and in a consistent order with respect to, analogous me2, me1 and me0 forms; and even consistent regions of mass-retention time space that distinguish acetyl positional isomers unambiguously assigned by MS2. However, to further validate our use of chromatographic information in conjunction with ETD MS2 to distinguish trimethylations from acetylations, we performed a parallel experiment on an LTQ-Orbitrap equipped with ETD. Shown in free of what otherwise would be isobaric interferences (see Fig 7). On even more detailed analysis, however, several partially resolved peaks frequently make up each apparent peak at a given mass, often stepping through variations of methyl placement on a given acetylation theme (see Fig 8). Using this approach, we have comprehensively identified and relatively quantified over 200 modified histone H3.2 forms (see Table 1) in a single LC-MS/MS analysis.
In general, multiple species were resolved chromatographically at each mass. In and at K36 (data not shown). It can be concluded that the earlier peak at 89min is K9me2 and K14ac and unmodified at K36; however, further assignment is not possible.
Although the signal is relatively low, the ETD evidence for the last peak, at 214 min, indicates that it has the same nominal mass as the major peak at 147 min and an ETD spectrum indistinguishable from it on a unit-resolution instrument. Based on the retention time shift to later in the gradient, where otherwise only K23 monoacetylated species are observed, the peak at 214 min is likely a minor modified form in which an acetylation on K14 has been replaced with a trimethylation. Such apparently trimethylated minor counterparts of acetylated forms appear throughout the data, most frequently on K9, for which acetylation has previously been reported (31). Further validation of these novel results on a high-resolution mass spectrometer is clearly needed; however, retention time alone seems capable of distinguishing acetylation from trimethylation, and such observations would likely be difficult or impossible to validate even on a high-resolution mass spectrometer without an on-line separation such as the one presented here.
The complexity of histone H3 means that in some cases very closely related Histone Codes may not fully resolve chromatographically even with our very effective separation method.  Table 1.

Histone H4 Analysis
The histone H4 1-23AA peptide is a significantly less complex sample to analyze due to fewer overall combinations of modifications and better ionization and fragmentation characteristics compared to the histone H3 1-50AA peptide. Our method is capable of separating positional isomers, greatly enhancing the capacity to detect and distinguish minor histone modified forms. As seen in Fig 9A,    Additionally, there are nucleotide-based epigenetic mechanisms such as DNA methylation (39). The payoff of solving the epigenetic conundrum, however, will be great given that epigenetics is essential to the biology of stem cells (40) and how cells differentiate and fail to maintain the required differentiated states as in cancer (41).
Whereas genomics tells us which genes are available for expression and, in general, their propensity for expression, epigenetics directs these parts in a concerted manner to produce a viable differentiated cell. In fact, epigenetics is the mechanism of multicellularity (37). How the Histone Code functions in concert is largely unknown; however, what is known indicates that the way modifications combine, both on a single histone and between different histones within the same nucleosome, is important (8,40,42,43,44,45,46,47,48). This work is among the first capable of accessing such information and the first to do so at a scale and throughput that allows extensive probing of it. Given the sensitivity and throughput of the method, we have high expectations for combining it with approaches that probe complementary epigenetic information, such as genomic location, but that involve significant fractionation and therefore leave smaller total amounts of combinatorial Histone Codes to be elucidated. combinatorial forms of Histone H4 from human embryonic stem cells found that the H4 forms are only partially resolved, requiring significant computation to identify forms, and that some, particularly the diacetyl forms, were not sufficiently distinguishable to allow quantification (20). As shown in Fig 9, these same forms are largely resolved by our method, and quantitative information can essentially be read directly from the chromatogram.
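Reading quantitative information from the chromatogram amounts to integrating the extracted ion chromatogram (XIC) of each resolved form. The sketch below uses trapezoidal integration over retention-time windows; the windows, the form names and the data are hypothetical, and it assumes that isomeric forms ionize with equal efficiency and that the windows do not overlap.

```python
"""Sketch: relative abundance of chromatographically resolved isomers from one
extracted ion chromatogram (hypothetical data and window choices)."""


def trapezoid_area(times, intensities, start, end):
    """Trapezoidal area of the XIC between start and end (inclusive), in counts x min."""
    pts = [(t, y) for t, y in zip(times, intensities) if start <= t <= end]
    return sum((t1 - t0) * (y0 + y1) / 2.0 for (t0, y0), (t1, y1) in zip(pts, pts[1:]))


def relative_abundance(times, intensities, windows):
    """Percent of the summed area falling in each named retention-time window.

    Assumes equal ionization efficiency for the isomeric forms and
    non-overlapping windows.
    """
    areas = {
        name: trapezoid_area(times, intensities, lo, hi)
        for name, (lo, hi) in windows.items()
    }
    total = sum(areas.values())
    return {name: 100.0 * area / total for name, area in areas.items()}


if __name__ == "__main__":
    # Hypothetical XIC (minutes, counts) containing two baseline-resolved peaks
    rt = [0, 1, 2, 3, 4, 5, 6, 7, 8]
    xic = [0, 10, 20, 10, 0, 5, 10, 5, 0]
    print(relative_abundance(rt, xic, {"form_1": (0, 4), "form_2": (4, 8)}))
```

In this toy trace the first peak has twice the area of the second, so the forms are reported at about 67% and 33%. Partially overlapping peaks, as in the H3 data, would need peak deconvolution rather than simple windows.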
The capacity of the method to chromatographically resolve trimethylations from acetylations greatly improves data quality and confidence in assignments. Resolving these isobaric species reduces spectral complexity, and the predictable relationship between modification state and relative retention time increases confidence in PTM assignments. The chromatography resolves these modifications because of the differential charge between the acetyl group, which is primarily uncharged, and the trimethyl group, which carries a fixed positive charge. This capacity has been validated by multiple lines of evidence, including high-resolution mass spectrometry; however, the most useful indicator of the distinction is that analogous me0, me1, me2 and me3 species elute close together in numerical or reverse numerical order, as seen in Fig 8, whereas the acetyl analogs elute much earlier. The ability of this approach to distinguish such isobaric species may be widely applicable to other analyses using low-resolution mass spectrometry, given enough knowledge of the system to predict retention times, or simply from the relative retention times of multiple peaks.
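The relative-retention-time argument can be written as a simple decision rule. In the hypothetical heuristic below, the retention times of the me0, me1 and me2 analogs are fitted with a straight line and extrapolated to predict where the me3 analog should elute; an isobaric candidate near that prediction is labeled me3, one eluting much earlier is labeled ac, and anything else is left unassigned. The linear trend, the 3 min tolerance and the example retention times are assumptions for illustration only, not parameters from this work.

```python
"""Sketch: assign an isobaric me3/ac candidate from the retention-time trend of
its me0, me1 and me2 analogs (hypothetical heuristic)."""


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def classify(analog_rts, candidate_rt, tolerance_min=3.0):
    """Label a candidate peak 'me3', 'ac' or 'unassigned'.

    analog_rts maps methylation degree (0, 1, 2) to retention time in minutes.
    The tolerance is an arbitrary illustrative choice.
    """
    degrees = sorted(analog_rts)
    slope, intercept = fit_line(degrees, [analog_rts[d] for d in degrees])
    predicted_me3 = slope * 3 + intercept  # linear extrapolation of the series
    if abs(candidate_rt - predicted_me3) <= tolerance_min:
        label = "me3"
    elif candidate_rt < predicted_me3 - tolerance_min:
        label = "ac"  # acetyl analogs elute much earlier than the methyl series
    else:
        label = "unassigned"
    return label, predicted_me3


if __name__ == "__main__":
    # Hypothetical retention times (min) for the me0, me1 and me2 analogs
    analogs = {0: 150.0, 1: 152.0, 2: 154.0}
    print(classify(analogs, 156.5))  # ('me3', 156.0)
    print(classify(analogs, 120.0))  # ('ac', 156.0)
```

A linear fit accommodates both numerical and reverse numerical elution order, since only the sign of the slope changes.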
By adapting our method to the capillary nanoflow scale and using on-line ETD, we have reduced sample requirements from >100 µg to <1 µg. With the improved Polymerase II has been shown to have a unique pattern of phosphorylation rather than a particular degree of phosphorylation to switch between active and inactive states (49).
Interestingly, many phosphorylated proteins are frequently found to be hyperphosphorylated, allowing information to be encoded in combinatorial patterns. Yet these patterns of phosphorylation are not well studied. The High Mobility Group proteins exhibit modifications similar to those of histones and are found in association with histones in chromatin (50). Other examples include tubulin (51) and p53 (52). The rate at which such combinatorial codes are being discovered suggests that significantly more examples remain to be found, and the importance of resolving and analyzing such largely isomeric biological codes will grow.

Figure 1.
A complex mixture of Histone H3.2 1-50AA modified forms derived from butyrate treated HeLa cells separated by on-line WCX-HILIC chromatography using an ammonium acetate based ionic strength gradient from "A" (75% ACN and 20 mM propionic acid adjusted to pH 6.0) to "B" (25% ACN and 500 mM ammonium acetate adjusted to pH 6.0 using ammonium hydroxide) at a rate of 1%B per minute.

Figure 2.
A complex mixture of Histone H3.2 1-50AA modified forms derived from butyrate treated HeLa cells separated by on-line pH gradient WCX-HILIC chromatography using various concentrations of ammonium formate in the "B" mobile phase. In all cases "A" consisted of 75% ACN and 20 mM propionic acid adjusted to pH 6.0, and "B" contained 25% ACN and was adjusted to pH 2.5 using formic acid; a gradient of 1%B per minute was used and . (A) 100 mM ammonium formate, (B) 20 mM ammonium formate, (C) 5 mM ammonium formate, (D) unbuffered, approximately 0.5% formic acid.

Three isomeric H3.2 Histone Codes eluting in close proximity but with distinct retention times: K9me2K14acK23acK27me1 at approximately 156 min, K9me1K14acK23acK27me2 at approximately 158 min, and K14acK23acK27me2 at approximately 160 min. The plot is derived from the fraction of the ETD spectrum that each form represents, determined as a consensus of multiple fragment ions, multiplied by the precursor ion intensity. This demonstrates partial chromatographic resolution of very closely related isomers, showing that these forms are distinct species and improving confidence of assignment through the correlation of structural changes with retention time.

Table 1.
A listing of all of the combinatorial Histone Codes found and further validated in the butyrate treated histone H3.2 analysis shown in Figs 4, 5, 6 & 7. Column 1 gives the annotated form. Column 2 gives an approximate percent abundance, based on the sum of all precursor ion intensities that exhibit each Histone Code multiplied by an approximation of the fraction of the resulting ETD spectrum that the Histone Code represents, estimated from the ratios of fragment ions. This information is presented for qualitative purposes only; however, we expect to be able to use such an approach in semi-quantitative analyses with further validation. 'BQL' (below quantitation limit) indicates that the Histone Code was detected but the information is insufficient to approximate its abundance. Column 3 gives the m/z of the precursor ion, and column 4 the number of 'methyl equivalents' to which that m/z value corresponds, where an acetylation counts as three methyl equivalents. Column 5 gives the retention time of the Histone Code.

Table 2.
A listing of all of the combinatorial Histone Codes found and further validated in the butyrate treated histone H4 analysis shown in Figs 8, 9, and 10. Column 1 gives the annotated form. Column 2 gives an approximate percent abundance, based on the sum of all precursor ion intensities that exhibit each Histone Code multiplied by an approximation of the fraction of the resulting ETD spectrum that the Histone Code represents, estimated from the ratios of fragment ions. This information is presented for qualitative purposes only; however, we expect to be able to use such an approach in semi-quantitative analyses with further validation. Column 3 gives the m/z of the precursor ion, and column 4 the number of 'methyl equivalents' to which that m/z value corresponds, where an acetylation counts as three methyl equivalents. Column 5 gives the retention time of the Histone Code.
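The abundance approximation described in the Table 1 and Table 2 captions (each precursor ion intensity multiplied by the fraction of its ETD spectrum attributed to a Histone Code, summed per code and expressed as a percentage) can be sketched as below. This is a hypothetical illustration, not the software used in this work: the data layout, the use of site-diagnostic fragment ion intensities to estimate each spectrum's fractions, and the isomer names are all assumptions.

```python
"""Sketch: approximate percent abundance of each Histone Code from ETD scans
(hypothetical data structures, not the authors' software)."""
from collections import defaultdict


def code_fractions(diagnostic_intensities):
    """Fraction of one ETD spectrum attributed to each code, estimated from the
    intensities of fragment ions diagnostic for each code (e.g. site-determining
    c or z ions). Returns {} if no diagnostic signal is present."""
    total = sum(diagnostic_intensities.values())
    if total == 0:
        return {}
    return {code: i / total for code, i in diagnostic_intensities.items()}


def approximate_abundance(scans):
    """scans: iterable of (precursor_intensity, {code: diagnostic fragment intensity}).

    Each code accumulates precursor intensity x its fraction of the ETD spectrum;
    totals are returned as percentages of the grand total.
    """
    totals = defaultdict(float)
    for precursor_intensity, diagnostics in scans:
        for code, fraction in code_fractions(diagnostics).items():
            totals[code] += precursor_intensity * fraction
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {code: 100.0 * value / grand_total for code, value in totals.items()}


if __name__ == "__main__":
    # Two hypothetical ETD scans of a mixed precursor containing two positional isomers
    scans = [
        (1.0e6, {"isomer_A": 3.0, "isomer_B": 1.0}),
        (5.0e5, {"isomer_A": 1.0, "isomer_B": 1.0}),
    ]
    print(approximate_abundance(scans))
```

In this toy example isomer_A accumulates 1.0e6 and isomer_B 5.0e5 intensity units, giving roughly 67% and 33%. As the captions note, such values are qualitative unless the approach is further validated.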