Defeating Major Contaminants in Fe3+-Immobilized Metal Ion Affinity Chromatography (IMAC) Phosphopeptide Enrichment

Here we demonstrate that biomolecular contaminants, such as nucleic acid molecules, can seriously interfere with immobilized metal ion affinity chromatography (IMAC)-based phosphopeptide enrichments. We address and largely solve this issue by developing a robust protocol that implements methanol/chloroform protein precipitation and enzymatic digestion using benzonase, which degrades all forms of DNA and RNA, before IMAC column loading. This simple procedure resulted in a drastic increase in enrichment sensitivity, enabling the identification of around 17,000 unique phosphopeptides and 12,500 unambiguously localized phosphosites in human cell lines from a single LC-MS/MS run, a 50% increase when compared with the standard protocol. The improved protocol was also applied to bacterial samples, increasing the number of identified bacterial phosphopeptides even more strikingly, by a factor of 10, when compared with the standard protocol. For E. coli we detected around 1300 unambiguously localized phosphosites per LC-MS/MS run. The preparation of these ultra-pure phosphopeptide samples requires only marginal extra cost and sample preparation time and should thus be adoptable by every laboratory active in the field of phosphoproteomics.

Protein phosphorylation plays a key role in cellular signaling (1-3) and its deregulation has been implicated in many human diseases (4,5). The reversible nature of protein phosphorylation allows organisms and cells to rapidly adjust to changing environments without major regulation at the transcriptional or translational level. Consequently, reversible protein phosphorylation is among the most prevalent and extensively studied post-translational modifications. It is estimated that more than 75% of mammalian proteins are phosphorylated at least once during their lifetime (6). However, enrichment of phosphorylated peptides or proteins before mass spectrometric identification is essential because of the sub-stoichiometric nature of protein phosphorylation. Recent advances in enrichment methods, mass spectrometric detection and fragmentation, as well as in site localization assessment, have made the identification of thousands of phosphorylation sites from small amounts of sample feasible (7-9). Most of the widely used enrichment strategies exploit the affinity of phosphate groups toward metals immobilized on carrier resins. These include Fe3+ (10), Ga3+ (11), Zr4+ (12), or Ti4+ (13) immobilized metal ion affinity chromatography (IMAC) and metal oxide affinity chromatography (TiO2 (14), ZrO2 (15) and others).
We recently demonstrated that the use of Fe3+-immobilized metal ion affinity chromatography (IMAC) resin particles in a column format offers a selective, comprehensive and reproducible enrichment of phosphopeptides over a wide range of sample quantities (from 0.1 to 5 mg) (16). In addition, it was shown that under the conditions tested, the Fe3+-IMAC column outperformed batch-based methods of enrichment such as TiO2 and Ti4+-IMAC. Further optimizations of the Fe3+-IMAC enrichment method, including a reduction in gradient time (less than 15 min per enrichment), an increase in the number of consecutive enrichments possible between two Fe3+ chargings of the column (more than 20) and a shorter column recharging time (less than 1 h), have tremendously increased throughput. These adjustments enabled the reproducible and selective identification of more than 10,000 unique phosphopeptides from 1 mg of HeLa digest (17).
Here we focused on further improvements, mostly in sample preparation, which constitutes one of the key steps for any successful experiment. Indeed, when enriching low-abundance phosphopeptides from a complex digest using affinity purification methods, contaminants can interfere with the binding of phosphopeptides to the stationary phase and thus affect the enrichment selectivity, with direct consequences for the attainable number of detected phosphopeptides. We identified some of the main interfering molecular components in Fe3+-IMAC phosphopeptide enrichments, namely nucleic acid-containing biomolecules. We developed a robust protocol implementing enzymatic digestion using benzonase, which degrades all forms of DNA and RNA, before IMAC column loading, which ultimately resulted in drastic increases in electrospray ionization efficiency and MS2 identification rates of phosphopeptides. For human cell lines, the improved sample purity led to a 50% increase in the number of unique phosphopeptides and phosphosites identified. Improvements were even more distinct when this protocol was applied to bacterial lysates, for which the frequency and abundance of phosphopeptides are low and consequently the influence of molecular contaminants is even more problematic (18). Using our optimized protocol, the number of identified endogenous phosphopeptides could be raised by a factor of 10. Therefore, the new protocol may lead to a better comprehension of the so far poorly understood bacterial protein phosphorylation dynamics, which have been shown to correlate with bacterial pathogenicity (19) and/or antibiotic resistance (20). In summary, we developed a universal sample preparation protocol that allows the preparation of ultra-pure phosphopeptide samples, resulting in striking improvements in phosphopeptide identification as well as in lower equipment maintenance.
HeLa and HEK293 Cell Culture-Cells were seeded at 15% density in 15-cm plates, allowed to adhere in full DMEM (Lonza, Basel, Switzerland) containing 10% heat-inactivated fetal bovine serum (Thermo Fisher Scientific, Bremen, Germany), 2 mM L-glutamine (Lonza) and 20 mM HEPES (Sigma-Aldrich), and cultured to ~90% confluence over 2.5 days. Twelve hours before harvesting, growth medium was replaced with fresh prewarmed full DMEM. At harvesting, no dead or floating cells were visible by microscopic examination. Cells were washed twice with ice-cold PBS on-plate, detached by trypsin (Lonza), and collected by low-speed centrifugation at 200 × g for 5 min. Cells were subsequently washed 3 times with ice-cold PBS and pelleted at low speed for 5 min.
Optimized Cell Lysis and Protein Digestion-One volume of cell pellet was resuspended in five volumes of lysis buffer composed of 100 mM Tris-HCl pH 8.5, 7 M urea, 1% Triton, 5 mM TCEP, 30 mM CAA, 10 U/ml DNase I, 1 mM magnesium chloride (Sigma-Aldrich, Steinheim, Germany), 1% benzonase (Merck Millipore, Darmstadt, Germany), 1 mM sodium orthovanadate, phosphoSTOP phosphatase inhibitors and complete mini EDTA-free protease inhibitors. In contrast to the standard protocol, a mixture of urea and Triton was preferred, as SDC inhibits benzonase activity and would therefore require sample dilution before benzonase addition. Otherwise, no significant difference in protein yield or number of phosphopeptides identified was observed between SDC and urea/Triton lysis conditions (data not shown). Cells were further lysed by sonication for 45 min (20 s on, 40 s off) using a Bioruptor Plus. Residual cell debris was removed by ultracentrifugation (140,000 × g for 1 h at 4°C). One percent benzonase was added to the supernatant and the mixture was incubated at room temperature for 2 h. Then, methanol/chloroform protein precipitation was performed as follows: to 1 volume of sample, 4 volumes of methanol (Sigma-Aldrich), 1 volume of chloroform (Sigma-Aldrich) and 3 volumes of ultrapure water were sequentially added with intensive vortexing after each addition. After centrifugation (10 min at room temperature, at 5000 rpm), the upper layer was removed without disturbing the interface and 3 volumes of methanol were added. After thorough vortexing, sonication and centrifugation (10 min at room temperature, at 5000 rpm), the liquid phase was removed, and the white protein precipitate was allowed to air dry. The precipitate was resuspended in digestion buffer composed of 100 mM Tris-HCl pH 8.5, 1% SDC (Sigma-Aldrich), 5 mM TCEP and 30 mM CAA. Trypsin and Lys-C proteases were added at ratios of 1:25 and 1:100 (w/w), respectively, and protein digestion was performed overnight at room temperature.
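The volume and enzyme ratios in this section lend themselves to a small bench calculator. The following is a minimal sketch only: the function names, the microliter and microgram units, and the reading of "1 volume" as the sample volume at the start of precipitation are illustrative assumptions, not part of the published protocol.

```python
from dataclasses import dataclass


@dataclass
class PrecipitationVolumes:
    """Reagent volumes (in microliters) for one methanol/chloroform precipitation."""
    methanol_ul: float
    chloroform_ul: float
    water_ul: float
    methanol_wash_ul: float


def precipitation_volumes(sample_ul: float) -> PrecipitationVolumes:
    """Scale the 4:1:3 (methanol:chloroform:water) additions and the
    3-volume methanol wash to a given sample volume.

    Assumption: every "volume" refers to the starting sample volume.
    """
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    return PrecipitationVolumes(
        methanol_ul=4 * sample_ul,
        chloroform_ul=1 * sample_ul,
        water_ul=3 * sample_ul,
        methanol_wash_ul=3 * sample_ul,
    )


def protease_amounts_ug(protein_ug: float,
                        trypsin_ratio: float = 25.0,
                        lysc_ratio: float = 100.0) -> tuple:
    """Return (trypsin, Lys-C) amounts in micrograms for 1:25 and 1:100 (w/w)."""
    if protein_ug <= 0:
        raise ValueError("protein amount must be positive")
    return protein_ug / trypsin_ratio, protein_ug / lysc_ratio


if __name__ == "__main__":
    print(precipitation_volumes(100.0))
    print(protease_amounts_ug(1000.0))
```

For example, under these assumptions a 100 µl sample calls for 400 µl methanol, 100 µl chloroform and 300 µl water, followed by a 300 µl methanol wash.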
Fe3+-IMAC Enrichment-Iron IMAC enrichments were performed in technical triplicates for both bacterial and human cell line samples. Before Fe3+-IMAC column enrichment, samples were acidified to pH 3.5 using a 10% formic acid solution (Sigma-Aldrich) and the SDC precipitate was pelleted by centrifugation at 14,000 rpm for 5 min. The supernatant was then loaded onto tC18 Sep-Pak resin columns (Waters) for desalting. Samples were washed twice with 1 ml 0.1% formic acid before elution with an elution buffer composed of 30% acetonitrile and 0.1% formic acid. Eluted peptides were immediately frozen in liquid nitrogen and dried down using a lyophilizer. Enrichments were performed as previously described (17). Briefly, lyophilized peptides were dissolved in buffer A (30% acetonitrile, 0.07% trifluoroacetic acid (TFA, Sigma-Aldrich)) and the pH was adjusted, if required, to a value of 2.3 using 10% TFA before injection onto the Fe3+-IMAC column (Propac IMAC-10 4 × 50 mm column, Thermo Fisher Scientific). The enrichment gradient is described in supplemental Table S1, with elution buffer B corresponding to 0.3% NH4OH. The UV absorbance signal was recorded at the outlet of the column, at a wavelength of 280 nm. Collected phosphopeptides were immediately frozen in liquid nitrogen and subsequently dried down using a lyophilizer.
LC-MS/MS-Nanoflow LC-MS/MS analysis was performed by coupling an Agilent 1290 (Agilent Technologies, Middelburg, Netherlands) to an Orbitrap Q-Exactive HF (Thermo Fisher Scientific). Lyophilized phosphopeptides were dissolved in loading buffer consisting of a mixture of 20 mM citric acid (Sigma-Aldrich) and 1% formic acid. Resuspended phosphopeptides (amount corresponding to enrichment of 1.4 mg of total digest in the case of human cell lines and 2 mg in the case of bacterial samples) were injected, trapped and washed on a precolumn (100 μm i.d. × 2 cm, packed with 3 μm C18 resin, Reprosil PUR AQ, Dr. Maisch, Ammerbuch-Entringen, Germany, packed in-house) for 5 min at a flow rate of 5 μl/min with 100% buffer A (0.1% FA, in HPLC grade water). Peptides were then transferred onto an analytical column (75 μm × 60 cm Poroshell 120 EC-C18, 2.7 μm, Agilent Technologies, Santa Clara, CA, packed in-house) before separation at room temperature at a flow rate of 300 nl/min using a 160 min linear gradient, from 6% to 32% buffer B (0.1% FA, 80% ACN) in the case of human cell line samples, and using a 70 min linear gradient, from 8% to 32% buffer B in the case of bacterial samples. Electrospray ionization was performed using 1.9 kV spray voltage and a capillary temperature of 320°C. The mass spectrometer was operated in data-dependent acquisition mode: full scan MS spectra (m/z 375-1,600) were acquired in the Orbitrap at 60,000 resolution with a maximum injection time of 20 ms and an AGC target value of 3e6 charges. Up to 12 precursors were selected for subsequent fragmentation and high-resolution HCD MS2 spectra were generated using a normalized collision energy of 27%. The intensity threshold to trigger MS2 spectra was set to 2e5, and the dynamic exclusion to 24 s in the case of human cell line samples and 12 s in the case of bacterial samples. MS2 scans were acquired in the Orbitrap mass analyzer at a resolution of 30,000 (isolation window of 1.4 Th) with an AGC target value of 1e5 charges and a maximum ion injection time of 50 ms. Precursor ions with an unassigned charge state, as well as those with a charge state of 1+ or of 6+ and higher, were excluded from fragmentation.
Data Analysis-MaxQuant software (version 1.5.6.0) was used to process the raw data files, which were searched against reviewed Homo sapiens or reviewed E. coli K12 databases (UniProt, March 2016, 20265 and 4434 entries, respectively), with the following parameters: trypsin digestion (cleavage after lysine and arginine residues, even when followed by proline) with a maximum of 3 missed cleavages, fixed carbamidomethylation of cysteine residues, and variable phosphorylation of serine, threonine and tyrosine residues as well as variable oxidation of methionine residues. Mass tolerance was set to 4.5 ppm at the MS1 level and 20 ppm at the MS2 level. The False Discovery Rate (FDR) was set to 1%, a score cut-off of 40 was used in the case of nonmodified peptides and the minimum peptide length was set to 7 residues. The MaxQuant-generated tables "evidence.txt" and "Phospho (STY)Sites.txt" were used to calculate the number of unique phosphopeptides and phosphosites identified, respectively, and known contaminants were filtered out. Identified phosphopeptides and phosphosites are provided in supplemental Tables S2 to S7.
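The counting step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' script: the column names ("Localization prob", "Reverse", "Potential contaminant", "Modified sequence") and the "(ph)" tag in modified sequences are assumed to match the MaxQuant 1.5.x output, and treating each distinct modified sequence as one unique phosphopeptide is an assumption about how the counting was done. The 0.75 cut-off is the class I threshold used in this paper.

```python
import csv
from typing import Iterable, Iterator, Mapping


def read_maxquant_table(path: str) -> Iterator[Mapping[str, str]]:
    """Stream rows of a tab-separated MaxQuant output table as dictionaries."""
    with open(path, newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


def is_decoy_or_contaminant(row: Mapping[str, str]) -> bool:
    """True for reverse hits and known contaminants (flagged with '+').

    Assumption: the flags live in 'Reverse' and 'Potential contaminant'.
    """
    return row.get("Reverse") == "+" or row.get("Potential contaminant") == "+"


def count_class_i_sites(rows: Iterable[Mapping[str, str]],
                        min_prob: float = 0.75) -> int:
    """Count phosphosites with localization probability above min_prob."""
    count = 0
    for row in rows:
        if is_decoy_or_contaminant(row):
            continue
        try:
            prob = float(row.get("Localization prob", ""))
        except (TypeError, ValueError):
            continue  # skip rows without a usable probability
        if prob > min_prob:
            count += 1
    return count


def count_unique_phosphopeptides(rows: Iterable[Mapping[str, str]]) -> int:
    """Count distinct modified sequences carrying at least one '(ph)' tag."""
    seen = set()
    for row in rows:
        if is_decoy_or_contaminant(row):
            continue
        modified = row.get("Modified sequence") or ""
        if "(ph)" in modified:
            seen.add(modified)
    return len(seen)
```

A typical call would be `count_class_i_sites(read_maxquant_table("Phospho (STY)Sites.txt"))`, with the file path adjusted to the local MaxQuant output folder.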
Nucleic Acid Marker Ions-RAW files were converted into mascot generic format (mgf) by using Proteome Discoverer 2.2. Peaks with an intensity below 5e3
Experimental Design and Statistical Rationale-Each sample was enriched in triplicate, and each replicate was injected separately into the LC-MS/MS system. Each raw file was separately processed using the MaxQuant software, and error bars displayed in figures correspond to observed standard deviations.
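As a small illustration of how triplicate identifications could be summarized for such error bars, the sketch below computes a mean and a standard deviation. The use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not specify which estimator was used, and the counts in the example are hypothetical placeholders rather than data from this study.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize_replicates(counts: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of replicate counts.

    Assumption: error bars use the sample standard deviation (n - 1).
    """
    if len(counts) < 2:
        raise ValueError("at least two replicates are needed for a standard deviation")
    return mean(counts), stdev(counts)


if __name__ == "__main__":
    # Hypothetical phosphopeptide counts from three enrichment replicates.
    average, spread = summarize_replicates([10, 12, 14])
    print(f"{average} +/- {spread}")
```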

RESULTS AND DISCUSSION
To benchmark our set-up we first performed Fe3+-IMAC column-based phosphopeptide enrichments on samples from human cell lines (here HEK293 and HeLa) using the earlier reported standard sample preparation protocol. Pleasingly, the results we obtained were similar to those reported earlier, both in terms of the UV signal measured at the Fe3+-IMAC column outlet and in terms of phosphopeptide identifications (supplemental Fig. S1) (17). It has been previously shown that the UV signal corresponding to the retained components scales linearly with the input amount (16). However, based on the ratio between the areas under the curve of the flow-through and of the so-called retained phospho-peak (termed R), phosphopeptides would constitute more than 5% of the injected sample, which is evidently implausible. In addition, and in contrast with synthetic (phospho)peptides (21), the MS2 identification rate observed was significantly lower in the case of phospho-enriched samples when compared with proteome samples. These observations led us to investigate the presence of potential contaminants in the retained fraction. After binning of the low mass region of MS2 spectra, it appeared that several ions that did not correspond to any amino acid fragmentation products were present in many of the unidentified MS2 spectra. Among these ions, some marker ions of nucleic acids or (poly)ADP-ribosylation (22) were identified, for example the recurrent ion of m/z 330.06, corresponding to an adenosine derivative fragment (23) (supplemental Fig. S2). To assess whether the presence of nucleic acid contaminants is specific to the cell lysis conditions or to the IMAC format used, we decided to investigate the presence of these interferents in phosphoproteomics datasets generated by two different research groups. The same marker ion was frequently observed in two datasets using different lysis conditions and/or a different enrichment method, namely batch mode TiO2 MOAC (24) (supplemental Fig. S3). This confirmed that nucleic acid contamination in phosphoproteomics analysis is a widespread problem and prompted us to optimize the sample preparation protocol to eliminate such contaminants, which likely originate from incomplete shearing.
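The marker-ion check described above can be approximated with a short script that scans an MGF file and reports the fraction of MS2 spectra containing a peak near m/z 330.06. This is a sketch under stated assumptions rather than the procedure used in this study: the 0.01 Da matching tolerance is hypothetical, the 5e3 intensity cut-off is borrowed from the (incomplete) methods sentence on peak filtering, and the parser handles only the basic MGF layout (BEGIN IONS / END IONS blocks with "m/z intensity" peak lines).

```python
from typing import Iterable, Iterator, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def iter_mgf_spectra(lines: Iterable[str]) -> Iterator[List[Peak]]:
    """Yield the peak list of each BEGIN IONS / END IONS block."""
    peaks = None
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            peaks = []
        elif line == "END IONS":
            if peaks is not None:
                yield peaks
            peaks = None
        elif peaks is not None and line and line[0].isdigit():
            # Peak lines start with a digit; header lines look like KEY=value.
            fields = line.split()
            if len(fields) >= 2:
                peaks.append((float(fields[0]), float(fields[1])))


def has_marker(peaks: List[Peak], marker_mz: float = 330.06,
               tol_da: float = 0.01, min_intensity: float = 5e3) -> bool:
    """True if any sufficiently intense peak lies within tol_da of marker_mz.

    Assumptions: tol_da and min_intensity are illustrative defaults.
    """
    return any(abs(mz - marker_mz) <= tol_da and intensity >= min_intensity
               for mz, intensity in peaks)


def marker_fraction(lines: Iterable[str], **kwargs) -> float:
    """Fraction of spectra in an MGF stream that contain the marker ion."""
    total = 0
    hits = 0
    for peaks in iter_mgf_spectra(lines):
        total += 1
        if has_marker(peaks, **kwargs):
            hits += 1
    return hits / total if total else 0.0
```

With a real file, `marker_fraction(open("run.mgf"))` would return the proportion of MS2 spectra carrying the marker, where "run.mgf" is a placeholder file name.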
To remove these contaminants, several steps were implemented in the sample preparation protocol (Fig. 1). Mechanical (extensive sonication or needle shearing) or thermal (sample boiling) degradation was ineffective and in some cases even resulted in a decrease in the number of detected phosphorylation events (data not shown). Enzymatic digestion of biomolecules containing nucleic acids provided by far the best results. After testing different nucleases and combinations of nucleases and monitoring the corresponding UV signals, it appeared that benzonase outperformed all other tested nucleases. The reason behind this superior digestion efficiency probably resides in the low specificity of benzonase, a genetically engineered endonuclease, leading to cleavage of both DNA (single- and double-stranded) and RNA, without any base preference. However, addition of DNase I complemented the activity of benzonase, and the combination of the two enzymes yielded optimal digestion. Moreover, the presence of nucleases in the lysis buffer minimizes sample loss through a substantial reduction of the sample viscosity caused by large nucleic acid molecules.
This thorough enzymatic digestion led to a significant decrease of the UV signal of the retained components (R) to less than 0.3% of the area corresponding to the flow-through (FT), a value more consistent with the estimated proportion of phosphorylated peptides in protein digests of human cells (Fig. 2A and 2B). Furthermore, the number of observed contaminant marker ions for nucleic acid-containing biomolecules drastically decreased, thus confirming our hypothesis about the nature of the contaminants as well as the efficiency of the protocol presented here (Fig. 2C). More importantly, the elimination of contaminants resulted in a substantial increase in protein phosphorylation identifications. A 50% increase in the number of phosphopeptides or class I phosphosites (localization probability > 75%) was observed for the samples derived from human cell lines (Fig. 2E and 2F). More than 17,000 unique phosphopeptides and 12,500 class I phosphosites were identified from single LC-MS/MS runs of phospho-enriched HeLa cell line samples (supplemental Table S2 and S3). Identical results were obtained for samples from HEK293 cell lines (supplemental Table S3 and S4). No major improvement of the enrichment specificity was observed, as the potential for improvement was very limited. Specificity routinely exceeded 95% (at both the peptide-spectrum match (PSM) and peptide levels) with both the standard and the new protocol (Fig. 2D), indicating that the presence of contaminants does not substantially interfere with the binding specificity of the stationary phase. Thus, enrichment specificity does not constitute an appropriate criterion to evaluate sample purity.
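The R/FT comparison used above amounts to integrating the UV trace over two retention-time windows and taking the ratio of the areas. The sketch below shows one way to do this with the trapezoidal rule. It is illustrative only: the window boundaries are hypothetical inputs chosen by the user, no baseline correction is applied, and the trace in the example is synthetic.

```python
from typing import Sequence, Tuple


def trapezoid_area(times: Sequence[float], signal: Sequence[float],
                   start: float, end: float) -> float:
    """Trapezoidal area of the trace over segments lying fully inside [start, end]."""
    if len(times) != len(signal):
        raise ValueError("times and signal must have the same length")
    area = 0.0
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 >= start and t1 <= end:
            area += 0.5 * (signal[i] + signal[i + 1]) * (t1 - t0)
    return area


def retained_to_flowthrough_percent(times: Sequence[float],
                                    signal: Sequence[float],
                                    ft_window: Tuple[float, float],
                                    r_window: Tuple[float, float]) -> float:
    """Area of the retained peak (R) as a percentage of the flow-through (FT) area.

    Assumption: the two windows are supplied by the user and do not overlap.
    """
    ft_area = trapezoid_area(times, signal, *ft_window)
    r_area = trapezoid_area(times, signal, *r_window)
    if ft_area <= 0:
        raise ValueError("flow-through area must be positive")
    return 100.0 * r_area / ft_area


if __name__ == "__main__":
    # Synthetic trace: a large flow-through peak followed by a small retained peak.
    t = [0, 1, 2, 3, 4, 5, 6]
    uv = [0, 100, 100, 0, 0, 1, 0]
    print(retained_to_flowthrough_percent(t, uv, (0, 3), (4, 6)))
```

A value above a few percent for R/FT would, following the argument in the text, hint at co-retained contaminants rather than phosphopeptides alone.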
Notably, identification rates of phospho-enriched samples were slightly superior to rates observed for normal proteome sample analysis using the same instrument, reaching more than 40% in the case of a 160 min gradient and close to 60% in the case of a 100 min gradient, both at high acquisition speeds (50 ms maximum injection time). This observation thus contradicts the hypothesis that the lower identification rates commonly observed in phosphoproteomics analyses are solely caused by the difficulty of identifying MS2 spectra deriving from HCD fragmentation of phosphopeptides, which had been attributed to the extensive neutral losses of phosphate groups. Interestingly, as seen in Fig. 2G, multiply phosphorylated peptides constituted more than a third of the identified phosphopeptides in both human cell lines, a percentage significantly higher than the proportions usually reported after standard Fe3+-IMAC enrichments (17,25). This could be of interest, as multiple phosphorylation of a substrate is known to have unique biological functions (26,27) and different techniques have therefore been developed to specifically enrich multiply phosphorylated peptides (25,28).
Encouraged by this success, we sought to test the efficiency of the optimized protocol with even more challenging applications. Because of a different cell structure, notably the presence of a cell envelope, and because of a lower stoichiometry of phosphorylation events, bacterial sample preparation for phosphoproteomics is known to be much more critical than preparation of human cell line samples (18). Consequently, any bacterial phosphoproteomics study requires extensive sample clean-up to identify substantial numbers of protein phosphorylations. To assess the efficiency of the presented optimized sample preparation protocol on bacterial samples we investigated Escherichia coli, a widely studied bacterial model system. In addition to the nucleic acid digestion step, methanol/chloroform protein precipitation proved to be very beneficial in the preparation of bacterial samples, as shown in Fig. 3 and supplemental Fig. S4. Comparison of the UV trace profiles displayed in Figs. 3A and 3B shows a remarkable decrease of the retained peak (R) in the UV trace, accompanied by a drastic increase in phosphopeptide sample purity. Notably, the UV signal profile corresponding to the enrichment of the bacterial sample prepared with the optimized protocol was similar to the one observed in the case of human cell line sample enrichment, indicating a comparable degree of sample purity. Again, the elimination of contaminants (Fig. 3C) resulted in a major gain in protein phosphorylation identification, which for the E. coli sample translated into a 10-fold increase in the number of phosphopeptides and phosphosites identified (Fig. 3E, supplemental Table S5 and S6), once more underpinning the importance of sample preparation in (phospho)proteomics. More than 98% of the detected phosphopeptides in E. coli were singly phosphorylated (supplemental Fig. S5), confirming the low prevalence of multiply phosphorylated peptides observed in bacterial phosphoproteomics studies (18). The sample preparation method that we present here offers insight of unparalleled depth into a bacterial phosphoproteome. Indeed, a high proportion of identified phosphosites were never observed in previously published studies, as most of the phosphosites observed until now had been localized on highly abundant metabolic enzymes.
FIG. 1. Human and bacterial samples can be prepared using the same protocol before Fe3+-IMAC column phospho-enrichment. Key steps implemented are the digestion of nucleic acid-containing biomolecules after cell lysis, followed by methanol/chloroform protein precipitation. These optimizations resulted in substantial depletion of background, concomitant with an improvement of the purity of the phospho-enriched samples, leading to strong increases in the number of detected phosphorylation events.
FIG. 2. [...] an important decrease in the UV signal of the on-column retained molecules (annotated R) is observed. Indeed, with the optimized protocol the phosphopeptide UV signal corresponds to less than 0.3% of the signal of the flow-through fraction (annotated FT). C, The decrease of the UV trace of the retained material corresponds to an increase in phosphopeptide sample purity through removal of the main interferents, namely nucleic acid-containing biomolecules. The presented new protocol effectively eliminates these contaminants by enzymatic digestion, as also attested by the decrease of the proportion of MS2 spectra containing the peak of m/z 330.06, a fragment ion marker of nucleic acid-containing biomolecules. D, The optimized protocol enables a remarkable increase in the number of identified protein phosphorylations, even if the enrichment specificity was already around 95% using the standard protocol (at the PSM level, D). E, F, For the human samples the number of unique phosphopeptides (E) and unique class I phosphosites (F) increased by 50% when comparing phospho-enriched samples derived from unstimulated HeLa and HEK293 cell lines, to reach around 17,000 phosphopeptides and 12,500 class I phosphosites identified per LC-MS/MS run. G, Finally, using the new protocol the percentage of multiply phosphorylated peptides identified increased substantially, to about 40% for both cell lines, doubling the numbers observed using the standard protocol.
For both human and bacterial samples, removal of contaminants resulted in better electrospray ionization at the outlet of the LC column, as attested by the increase of Total Ion Current (TIC) in the LC-MS chromatograms, even though the UV trace of the retained peak was much reduced. These improvements, especially notable for the bacterial samples (Fig. 3D), are in line with the expected poor ionization efficiency of nucleic acid-containing biomolecules in positive ion mode ESI. Furthermore, the maximum sample input before reaching saturation of the Orbitrap analyzer or a plateau in terms of number of phosphopeptides identified also increased (supplemental Fig. S6). From a more pragmatic point of view, we also witnessed a decrease in the LC-MS/MS maintenance required, which we once more explain by the higher sample purity obtained using the novel protocol.
FIG. 3. Improved detectability of bacterial phosphopeptides due to enhanced sample preparation. A, B, Improvements by the optimized workflow were even more spectacular when applied to E. coli samples, as visible by the important reduction of the "retained signal" in the UV-trace profile. C, The protein precipitation step alone did not enable the removal of nucleic acid-containing biomolecules (C, annotated std + prec.) but resulted in an increase of the number of phosphopeptides identified and proved to be necessary. D, Strikingly, this reduced UV peak translated into a substantial increase in the total-ion-current chromatograms (TIC) in the LC-MS/MS runs, which we attribute to an increased phosphopeptide sample purity, resulting in better ionization efficiencies, as the huge background of negatively charged nucleic acid-containing biomolecules is depleted in the optimized protocol. E, The increase of phosphopeptide sample purity obtained after preparation with the optimized protocol ultimately led to remarkable increases in the number of identified phosphorylation events: numbers of E. coli unique phosphopeptides and class I phosphosites identified increased by a factor of 10 when compared with the standard protocol.
CONCLUSION
Phosphoproteomics remains a thriving research field, which has greatly contributed to the understanding of molecular mechanisms regulating cellular processes. Sample preparation, instrumentation and data analysis are three important pillars in (phospho)proteomics research, but although we have witnessed staggering progress in the latter two over the last decade, sample preparation has largely remained unchanged. We present here simple optimizations of the sample preparation protocol for phosphoproteomics samples, leading to substantial improvements in the final phospho-enriched sample purity. We demonstrate that the main contaminants are likely derived from nucleic acid-containing biomolecules and developed an efficient method to remove such contaminants. The important decrease in the UV signal observed at the outlet of the Fe3+-IMAC column, accompanied by the decrease in occurrences of nucleic acid marker ions in the LC-MS/MS data, as well as the increase of the ionization efficiency after nuclease digestion, all support our hypothesis. When compared with the standard protocol, the optimizations presented here constitute only a marginal cost increase while remaining time-effective and should be adoptable by every laboratory interested in phosphoproteomics.
More importantly, significant increases were observed in the number of phosphopeptides or phosphosites identified. In the case of human cell line samples, a 50% increase in the number of phosphopeptides or phosphosites identified was observed in the same LC-MS/MS time. Overall, the generation of ultra-pure samples makes it possible to take full advantage of the mass spectrometer's high acquisition speed and to reach an unprecedented depth in human phosphoproteome analysis without fractionation. Furthermore, the presented method led to a 10-fold increase in the number of detected bacterial protein phosphorylations. We believe that this will open the way for investigations of the still largely unknown bacterial phosphoproteome dynamics, which could ultimately result in a better understanding of pathogenic microorganisms.