Separation methodology to improve proteome coverage depth.

So-called ‘in-depth proteomics’, together with the separation methodology applied to improve proteome coverage depth, has become an important issue in mass spectrometry-based proteomics and system-wide cell biology studies. Employing a bottom-up approach and a variety of separation techniques, it allows the identification of proteins with low copy numbers and enables researchers to correlate the number of expressed genes in a cell with the proteome. Here we describe recent advances in this field with emphasis on peptide and protein separation technologies. The discussion is focused both on single injection analyses employing long reversed phase liquid chromatography separations of peptides (‘single shot proteomics’) and on the combination of orthogonal protein and peptide separation methods to achieve maximum protein coverage. Owing to these improvements, in-depth proteomics has now fully entered the field and is being implemented in an increasing number of laboratories.

In recent years, the main foci in mass spectrometry (MS)-based proteomics have been protein quantification and the analysis of posttranslational modifications. While these topics still receive a lot of attention, another aspect has more recently entered the field of MS-based proteomics, especially in cell biology studies: the challenge of identifying as many proteins as possible from a sample, often termed 'in-depth proteomics'.
In-depth proteome analysis with sufficient numbers of identified proteins promises to allow for correlating the number of gene-coding sequences in a genome with the results of transcriptome analyses, with the aim of understanding variations in the genomes of cell types and the patterns of gene expression and translation, a goal often referred to as 'proteogenomics' [1]. The latter is exemplified by the chromosome-centric human proteome project [2], which aims to identify at least one protein encoded by each of the approximately 20,200 gene-coding sequences present in the Uniprot/Swissprot database.
In-depth proteome analysis presents a daunting challenge for current analytical approaches and technology. Proteome analysis is most often carried out using a 'bottom-up' approach, where proteins are represented by peptide proxies generated by proteolytic digestion. Following the mass spectrometric identification, characterization and quantitation of the peptides, the identity, modification state and quantity of the underlying proteins are then assembled from the experimental data through bioinformatics means. This approach multiplies the number and diversity of analytes that need to be handled by mass spectrometric analysis. More importantly though, protein abundances, for example, in human cells span an enormous range, from 1 to 10^7 copies per cell (a difference of seven orders of magnitude), while the corresponding range of transcribed genes runs only from 1 to 10^4 [3]. In human body fluids, this concentration range can be even wider, for example, in human plasma where it is considered to span 12 orders of magnitude [4,5]. In contrast to transcriptome/RNA-Seq analyses of genes, proteins as the actual gene products cannot be amplified in order to detect them, so the concentration range of proteins in a sample translates more or less directly into the linear dynamic range required from any analytical set-up used. Consequently, MS-based proteomics in body fluids usually involves additional prefractionation or enrichment steps [6] and is focused on targeted mass spectrometric detection and quantitation of proteins rather than global analysis [7].
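The way the bottom-up approach multiplies the number of analytes can be made concrete with a small in silico digestion. The following is a minimal sketch, not a validated digestion tool: it assumes the commonly used simplified trypsin rule (cleavage C-terminal to K or R, except when the next residue is P), and the function name `tryptic_digest`, its parameters and the test sequence are hypothetical illustrations that do not come from the text above.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from a simplified in silico tryptic digest.

    Assumed rule: cleave after K or R unless the next residue is P.
    With missed_cleavages > 0, neighbouring fragments are also joined.
    """
    # Step 1: cut the sequence into fully cleaved fragments.
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder

    # Step 2: join up to `missed_cleavages` additional adjacent fragments.
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


# Arbitrary toy sequence: one "protein" becomes three peptides,
# or five when one missed cleavage is allowed.
print(tryptic_digest("AAKPGGRCCKDD"))
# ['AAKPGGR', 'CCK', 'DD']
print(len(tryptic_digest("AAKPGGRCCKDD", missed_cleavages=1)))
# 5
```

Even this toy case shows the analyte count growing with each allowed missed cleavage; modifications, isoforms and truncations would increase it further.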
In spite of these challenges, significant progress has been made, so that nowadays more than 10,000 proteins can be identified from cultures of mammalian cell lines [8][9][10]. Above all, this progress is attributed to the technical improvements realized in the latest generation of mass spectrometers, with gains in sensitivity, speed and resolution that allow scientists to deal with highly complex sample mixtures. State-of-the-art electrospray ionization mass spectrometers are capable of sequencing speeds of up to 20 peptide precursors per second in data-dependent acquisition (DDA) mode. As not all sequencing events are successfully converted into a unique peptide sequence, this translates to the identification of at most 8-9 peptide sequences per second [11]. Instrumental improvements in ion sampling and transmission, scanning speed, resolution, mass accuracy and signal processing/digitization allow for impressively complex peptide mixtures to be eluted into the mass spectrometer for in-depth analysis [12][13][14]. In parallel with improvements in MS hardware, improvements in separation workflows for proteins and peptides prior to MS analysis have also contributed significantly. Even the fastest mass spectrometers are still not able to cope with the demands put up by the complexity of whole proteome samples [15]. Consequently, two separation approaches have been continuously investigated and optimized: On the one hand, a number of groups have optimized the nanoflow reversed-phase liquid chromatography (RP-LC) gradient separation of tryptic peptides that is typically hyphenated to the mass spectrometer. On the other hand, additional dimensions of separation both on the peptide and on the protein level are increasingly being utilized to maximize the depth of analysis.
The performance of gradient separations of peptides is best measured by the achieved peak capacity, that is, the number of peaks that can be separated across the gradient time. For nanoflow RP-LC-based separation of peptides, mainly two factors have been utilized to drive increases in peak capacity: the column length and the size of stationary phase particles [16]. Whereas conservative set-ups were using relatively short (15-120 min) gradients on 50-75 µm inner diameter, 10-15 cm length columns with 3 µm particles for routine operations, many groups are now utilizing longer gradients of up to 10 h on longer columns (25-100 cm) packed with smaller (sub-2 µm) particle sizes to achieve peak capacities in excess of 450 [17][18][19]. To counter the corresponding increase of backpressure that comes with both increased column length and decreased particle size, ultrahigh pressure liquid chromatography systems capable of up to 1000 bar operating pressure have become the industry standard. In addition, column ovens are used to heat columns and solvent lines, which, besides reducing the backpressure, also improves the recovery, especially of hydrophobic peptides [20].
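As a rough numerical illustration of peak capacity, the sketch below uses a widely used approximation, n_c = 1 + t_g / w, where t_g is the gradient time and w the average peak width at the base. Which definition the cited studies used is not stated in the text, so this formula, the function names and the example numbers are assumptions chosen for illustration only.

```python
def peak_capacity(gradient_min, peak_width_min):
    """Approximate gradient peak capacity: n_c = 1 + t_g / w.

    gradient_min   -- gradient time t_g in minutes
    peak_width_min -- average base peak width w in minutes (assumed constant)
    """
    if gradient_min <= 0 or peak_width_min <= 0:
        raise ValueError("gradient time and peak width must be positive")
    return 1.0 + gradient_min / peak_width_min


def peak_width_for_capacity(gradient_min, capacity):
    """Average peak width (minutes) needed to reach a target peak capacity."""
    if gradient_min <= 0 or capacity <= 1:
        raise ValueError("need a positive gradient time and a capacity > 1")
    return gradient_min / (capacity - 1.0)


# Illustrative values, not measurements:
# a 120 min gradient with 15 s (0.25 min) wide peaks.
print(peak_capacity(120, 0.25))  # 481.0

# Peak width that would correspond to a capacity of 451 on a 10 h gradient.
print(round(peak_width_for_capacity(600, 451), 2))  # 1.33
```

Under this approximation, lengthening the gradient only raises the peak capacity if the peaks do not broaden in proportion, which is one reason why column length and particle size are adjusted together with gradient time.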
Another way to increase the performance of RP peptide separations hyphenated to MS includes further miniaturization, for example, by using columns with an inner diameter of 25 µm or smaller. The corresponding decrease in flow rates to the region of 10 nl/min is of great benefit for the nanospray process at the mass spectrometer interface: it enhances the ionization efficiency of peptides and reduces suppression effects between competing analytes, thereby increasing the number of detectable and fragmentable peptides in the MS [21][22][23]. The technical challenges involved in producing and handling gradient separations at these flow rates have prohibited this approach from becoming routine though.
Owing to these improvements in RP gradient separations, one may ask whether a rather simple workflow for in-depth proteome analysis (comprising hydrolysis of the sample with endoprotease, separation of the peptides by long capillary columns and identification of peptides, and thus the proteins, by a current ultra-fast and sensitive MS instrument) might already be sufficient to cover the proteome of a cell. Indeed, several of the leading groups in the field of MS-based proteomics have pursued this approach to establish 'single-shot proteomics', that is, an in-depth characterization of the proteome using just a single-injection LC/MS/MS analysis.
However, the current limit for this type of analysis is the identification of approximately 5000 proteins from yeast and up to 8000 proteins from human cell cultures [8,24,25]. While in studies with yeast this number is sufficient to cover approximately 90% of the expressed proteome [11,26], in human cells only up to 50-60% of the expressed proteome is currently covered, corresponding mainly to the upper half of the expressed proteins' concentration range.
An interesting perspective on the capabilities and limitations of 'single-shot proteomics' using long reversed-phase separations is given in a review by Zubarev [27], who correlates the sample amount, the column length and the gradient time necessary to reach an analysis depth that covers a given percentage of the low-abundance proteins in a proteome. He points out that for reaching a depth of >9000 proteins in a mammalian cell line, at least 1 mg of sample (corresponding to 5 million cells) is required, and also considers the working-time component of in-depth analyses.
From this perspective, 'single-shot proteomics' using current state-of-the-art technology, while presenting a major technological advance, is unlikely to achieve full coverage of the expressed cellular proteome in mammalian systems, even when only protein identification is required. There are two principal reasons for this. First, the dynamic range and sensitivity of the MS and the sample amount required to identify the low-abundance proteins in the range of 10 copies per cell preclude the exclusive use of one-dimensional peptide separation before MS analysis (note that only a maximum of a few µg can be loaded on capillary columns). Second, the proteome consists not only of plain proteins, that is, the gene-coded protein sequences, but also, for example, of proteins that are heavily modified by known and unknown (post)translational events, protein isoforms and naturally truncated proteins. All these further complicate the identification, especially of low-abundance proteins, as the available capacity of the LC-MS system for fragmenting peptide precursors becomes even more heavily utilized. In addition, there is a strong stochastic element in discovery-driven MS analyses, as low-abundance peptides are not reliably selected for fragmentation in DDA workflows, hampering the reliable detection of low-abundance proteins [15]. This limitation becomes even more pronounced when the systematic analysis of posttranslational modifications (PTMs) is required [28]. Consequently, additional separation workflows are indispensable in order to further reduce the complexity of a peptide mixture before it is subjected to the nanoflow RP-LC system coupled to the MS.
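Why low-abundance peptides tend to be missed in DDA workflows can be illustrated with a toy model of intensity-ranked precursor selection. The sketch assumes a 'top-N' scheme with dynamic exclusion, a common DDA heuristic that the text above does not specify; it further assumes that all precursors co-elute with constant intensity and that exclusion never expires. The function names, peptide labels and intensities are hypothetical.

```python
def select_precursors(intensities, top_n, excluded=()):
    """Pick the top_n most intense precursors that are not excluded.

    intensities -- dict mapping a precursor label to its MS1 intensity
    excluded    -- labels already fragmented (dynamic exclusion list)
    """
    candidates = {p: i for p, i in intensities.items() if p not in excluded}
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    return ranked[:top_n]


def run_cycles(intensities, top_n, n_cycles):
    """Simulate consecutive acquisition cycles over co-eluting precursors."""
    excluded = set()
    selected_per_cycle = []
    for _ in range(n_cycles):
        picked = select_precursors(intensities, top_n, excluded)
        selected_per_cycle.append(picked)
        excluded.update(picked)  # simplified: exclusion never expires
    return selected_per_cycle


# Hypothetical co-eluting precursors spanning five orders of magnitude.
scan = {"pepA": 1e9, "pepB": 5e8, "pepC": 2e6, "pepD": 1e4}

print(run_cycles(scan, top_n=2, n_cycles=1))
# [['pepA', 'pepB']]
print(run_cycles(scan, top_n=2, n_cycles=2))
# [['pepA', 'pepB'], ['pepC', 'pepD']]
```

In this simplified picture, the weakest precursor is fragmented only if it is still eluting once the more intense ones have been handled. With real, fluctuating signals near the selection cut-off, this becomes the run-to-run variability described above, and reducing the number of co-eluting peptides by prefractionation gives low-abundance precursors a better chance of being selected.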
Two principal workflows, representing different separation 'philosophies', are frequently applied: (i) addition of further dimensions of separation on the intact protein level, for example, according to apparent protein molecular weight by SDS-PAGE, to isoelectric point by electrophoretic separation (OFFGEL electrophoresis, which is a free-flow isoelectric focusing technique that separates and recovers proteins directly in solution for further analysis [29,30]), or to hydrodynamic radius by size-exclusion chromatography [10] before hydrolysis with endopeptidases; (ii) hydrolysis of the protein sample in solution and subsequent application of additional separations on the peptide level, usually various chromatographic workflows to prefractionate the resulting highly complex peptide mixtures before these are submitted to the nanoflow RP-LC front end of the mass spectrometer.
For protein separation, the approach most frequently taken is still SDS-PAGE. Entire lanes covering a broad range of apparent molecular weight are cut into slices, proteins of the various slices are digested in the gel and extracted peptides from each individual slice are subjected to LC-MS. An advantage of this method is that separation by SDS-PAGE is still the best method in terms of separation range and resolution for entire proteins. In addition, SDS-PAGE is compatible with a very wide range of protein solubilization and sample handling requirements [31,32]. Consequently, when peptides derived from various molecular weight regions are then separated with a capillary column packed with RP material of smaller particle size and operated by ultrahigh pressure liquid chromatography pumps [33], the analysis depth of a sample increases dramatically, and up to 8000 proteins from less than 100 µg of sample material from a human cancer cell line can be identified on a routine basis [34]. Moreover, depending on the scientific task, only proteins from defined molecular weight regions can be investigated and results easily correlated with, for example, western blot analysis. However, although proteins can be concentrated by several precipitation approaches (e.g., acetone, TCA), SDS-PAGE has only a limited loading capacity, and when this is exceeded the separation capability of SDS-PAGE decreases, with high-abundance proteins being smeared out over a large range of apparent molecular weight, hampering the detection of low-abundance proteins in the various molecular weight regions of the PAGE. Therefore, when larger sample amounts are available, further pre-fractionation of proteins by size-exclusion chromatography or OFFGEL separation, followed by SDS-PAGE or another chromatographic separation step, is often the better option [10].
The alternative 'philosophy' relies solely on the multidimensional chromatographic separation of peptides generated by endopeptidase digestion of proteins in solution. Ideally, the chromatographic dimensions should be chosen in a way that they are highly orthogonal, that is, separate peptides by largely independent physicochemical principles; they can either be coupled on-line (LC × LC) or off-line following fractionation in the first dimension [35]. The first highly successful example of a 2D, online peptide separation was the so-called MudPIT approach, in which peptide mixtures are separated first by strong cation-exchange (SCX) chromatography and then by nanoflow RP chromatography. In the originally described approach, both chromatographic stages were combined in a single capillary emitter directly connected to the mass spectrometer [36]. More robust, and more widespread in use, is the off-line separation of peptide mixtures by SCX chromatography followed by the application of single fractions to the nanoflow RP-LC-MS system. Moreover, anion-exchange chromatography has recently been successfully used as the first dimension of peptide separation [37]. Note that most ion-exchange chromatography protocols operate with salt gradients, which at increasing salt concentrations interfere with the subsequent nanoflow RP-LC-MS analysis, despite the fact that RP capillaries are usually equipped with pre-columns for preconcentration and desalting of the peptide mixtures. A notable exception is the approach published by Deeb et al. [38], who used pH steps at constant salt strength rather than increasing salt concentrations to fractionate by strong anion exchange.
To avoid the 'salt problem' associated with ion-exchange chromatography, many groups have implemented the concept of 'orthogonal' RP separations [39]. Here, peptide mixtures are separated by RP chromatography at neutral or basic pH in the first dimension, then fractionated and the fractions analyzed by nanoflow RP-LC-MS under conventional conditions, that is, at pH 2-3, in the second dimension. The two chromatographic dimensions in this set-up show only limited orthogonality; however, this is partially compensated for by the fact that RP separations provide significantly higher chromatographic resolution than ion-exchange separations. In addition, sample transfer from the first to the second dimension is achieved simply by removing excess organic solvent used for the first dimension gradient elution, and the first dimension can be easily up-scaled to analytical or even semi-preparative column dimensions [40].
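The trade-off between orthogonality and first-dimension resolution can be sketched with the product rule for two-dimensional peak capacity, scaled by the fraction of the separation space that is actually used. This is a simplified model under stated assumptions: the coverage factor stands in for orthogonality, the first dimension of an off-line set-up is capped by the number of collected fractions, and all numbers are arbitrary placeholders rather than measured values. The function name and parameters are hypothetical.

```python
def effective_2d_peak_capacity(n_first, n_second, n_fractions, coverage):
    """Estimate the usable peak capacity of an off-line 2D separation.

    n_first     -- peak capacity of the first dimension
    n_second    -- peak capacity of the second (nanoflow RP) dimension
    n_fractions -- number of first-dimension fractions collected
    coverage    -- fraction of the 2D separation space occupied (0..1),
                   used here as a simple proxy for orthogonality
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie between 0 and 1")
    # Assumption: resolution finer than the fraction width is lost
    # when the first-dimension eluate is pooled into fractions.
    first = min(n_first, n_fractions)
    return first * n_second * coverage


# Placeholder values only: a lower-resolution but more orthogonal first
# dimension versus a higher-resolution but less orthogonal one.
ion_exchange_like = effective_2d_peak_capacity(20, 400, 60, coverage=0.75)
high_ph_rp_like = effective_2d_peak_capacity(100, 400, 60, coverage=0.5)

print(ion_exchange_like)  # 6000.0
print(high_ph_rp_like)    # 12000.0
```

With these placeholder inputs, the less orthogonal combination still comes out ahead because its first dimension resolves more fractions, which mirrors the qualitative argument made above. Different inputs can reverse the ranking, so the sketch supports reasoning about the trade-off rather than any particular conclusion.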
A completely different, but also very powerful approach is the use of peptide isoelectric focusing (pIEF) in the first dimension either on gel strips or with the above-mentioned OFFGEL system [41,42]. Instead of proteins separated by their isoelectric point on gel strips carrying ampholytes, peptides are separated into defined pH regions according to their pI [43]. When using gel strips with ampholytes, peptides can be eluted from the gel strips, as in the in-gel digestion of proteins, and further analyzed by nanoflow RP-LC-MS. OFFGEL allows for recovering peptides from solution. However, when only pIEF is used in the first dimension, the separation capacity is comparable to that of the above-mentioned LC workflows. Very recently, two interesting reports described the expansion of the pIEF approach. In a proteogenomic approach, Branca et al. [44] used high-resolution pIEF in a very narrow pH range (3.7-4.9) combined with the calculated pI values of all tryptic peptides predicted from genomic sequences that are translated into six open-reading frames. Using this approach, they identified numerous peptides not present in the Peptide Atlas [45] and several novel protein-coding gene loci. Atanassov and Urlaub [46] combined pIEF with in-gel digestion of proteins and showed that the detection of low-abundance proteins is significantly enhanced when peptides derived from individual gel slices after in-gel digestion are further separated by pIEF, extracted from pIEF slices again and then finally subjected to nanoflow LC-MS.
Finally, in addition to the improvements in MS equipment and in the various separation approaches described above, another issue should be taken into account: sample preparation. Lysis of cells and solubilization are critical issues that are often underestimated when comprehensive in-depth MS-based analyses are designed. Of the reagents used so far, SDS has proven to have by far the best solubilization properties. The fact that SDS acts so powerfully in solubilization of protein material is also a factor in favor of the use of SDS-PAGE as a first step in protein separation for in-depth proteome analysis (see above). However, SDS as a non-volatile surfactant is not compatible with MS per se, and also not with the above-mentioned LC techniques for peptide separation, so it must be removed from the protein/peptide mixture. This problem was solved by the introduction of FASP (filter-aided sample preparation), where samples (cells, tissue, etc.) are solubilized and denatured in the presence of high concentrations of SDS, which is then exchanged for urea by filtration so that the samples can be hydrolyzed under these conditions [47]. Urea can be removed by repeated filtration, and peptides are recovered from the filter. This preparation method has proved its feasibility in various in-depth MS-based proteome analyses when combined with the above-mentioned separation techniques for complex mixtures, such as SCX chromatography, strong anion-exchange chromatography, pIEF or the use of long, heated capillary nanoflow RP-LC columns coupled to the mass spectrometer. It is especially useful when dealing with minute sample amounts, for example from formalin-fixed, paraffin-embedded clinical tissue specimens [48,49].

Expert commentary
In-depth proteomics has entered the field of MS-based proteomics in cell biology and is being implemented in an increasing number of laboratories. Impressive improvements in MS instruments, combined with LC systems and capillary columns packed with material of relatively small particle size that allows much sharper peptide separation, yield spectacular numbers of fragmented peptides and thus of proteins identified. Straightforward sample-preparation techniques, along with the use of novel fractionation approaches at the protein and peptide levels, further enhance protein identification. A variety of two-dimensional separation strategies are available today to tackle samples of high complexity and high dynamic range. The choice of strategy depends on the known properties of the analytical system in hand, that is, the expected complexity and dynamic range, and also on very practical questions, especially of sample transfer between separation steps and into the mass spectrometer.
However, the use of additional dimensions of separation greatly increases the instrument time spent on in-depth proteome analyses, requires higher amounts of starting material and, on account of the necessary fractionation, reduces sample-to-sample reproducibility. Consequently, many laboratories have developed the highly desirable 'single-shot' proteome analysis, where the one-dimensional separation of peptide mixtures on nanoflow RP-LC linked to the mass spectrometer produces impressive results. While it does not yet allow a comprehensive identification of expressed proteins including low-abundance ones, it allows the analysis of significant numbers of biological and technical replicates with increased reproducibility. As its dynamic range is limited compared with that of methods employing multi-dimensional separation, the single-shot strategy currently seems best suited for the analysis of samples obtained in vitro (e.g., cultured cell lysates) rather than those obtained in vivo, especially body fluids, which exhibit a high dynamic range of protein expression.

Five-year view
Coming improvements in MS instrumentation working in DDA mode, combined with increasingly powerful separation approaches, will certainly allow us to dig even more deeply into the proteomes of cells or other biological systems. While the identification of, for example, up to 90% of the proteins expressed in yeast is an impressive achievement in itself, it comes at the cost of low sequence coverage and insufficient characterization of, for example, PTMs for most low-abundance proteins. Consequently, the goal of 'in-depth proteomics' is still far from being reached, and even further improvements will still leave a lot of room for focused approaches, for example, for PTM profiling.
A change in paradigm is available through the so-called data-independent acquisition (DIA) strategies. Instead of stochastically selecting peptides for fragmentation analysis by appropriate settings of the mass spectrometer's acquisition software, DIA strategies aim at recording MS/MS libraries of all peptides present in a complex proteome sample [50,51]. These libraries can be re-interrogated after analysis, enabling researchers to perform iterative analysis on the existing data and extracting qualitative and quantitative information independent of the acquisition parameters. While DIA strategies have not yet reached the same depth of analysis as the highly refined DDA approaches, they promise the steepest slope of improvement for in-depth proteomics analysis in the next 5 years.

Financial & competing interests disclosure
The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript.
This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
No writing assistance was utilized in the production of this manuscript.

Key issues
• 'In-depth proteomics', that is, the attempt to identify all proteins expressed and hence present in a biological system.
• Significant analytical challenges from protein complexity and especially dynamic range, since no protein-amplification technology is available.
• Chromatographic resolution and acquisition speed of the mass spectrometer are maximized to compensate for sample complexity.
• 'Single-shot proteomics' to minimize the impact of additional dimensions of protein/peptide separation on instrument time and reproducibility.
• 90% proteome coverage in yeast and 50-60% in human cell culture are made possible by 'single-shot' analyses.
• Multi-dimensional protein and peptide separations still add an additional level of analysis depth.
• Re-annotations of genomes are performed on the basis of 'in-depth proteomics' studies.