Protein Footprinting Comes of Age: Mass Spectrometry for Biophysical Structure Assessment

Protein footprinting mediated by mass spectrometry has evolved over the last 30 years from proof of concept to commonplace biophysics tool, with unique capabilities for assessing structure and dynamics of purified proteins in physiological states in solution. This review outlines the history and current capabilities of two major methods of protein footprinting: reversible hydrogen-deuterium exchange (HDX) and hydroxyl radical footprinting (HRF), an irreversible covalent labeling approach. Technological advances in both approaches now permit high-resolution assessments of protein structure, including secondary and tertiary structure stability mediated by backbone interactions (measured via HDX) and solvent accessibility of side chains (measured via HRF). Applications across many academic fields and in biotechnology drug development are illustrated, including detection of protein interfaces, identification of ligand/drug binding sites, and monitoring of the dynamics of protein conformational changes. Future prospects for the advancement of protein footprinting in structural biology and biophysics research are also discussed.


INTRODUCTION TO FOOTPRINTING: ROOTS AND HISTORY
"Footprinting"-based technologies were initially developed to understand nucleic acid structure and dynamics. Initially, enzymatic cleavage of DNA was used to sample "exposed" sites, revealing the "protected" DNA binding site for a protein-DNA complex (1). The precision and accuracy of the methods were later refined by Galas and Schmitz, establishing footprinting as a quantitative biophysics tool for studying macromolecular interactions (2,3). Over time, chemical labeling approaches using small reagents, including the hydroxyl radical (Fig. 1A), were developed to precisely define the boundary between exposed and buried regions and to extend footprinting even to in vivo studies (4-6). Early protein footprinting studies included hydrogen exchange footprinting to examine protein structure (advanced by Rosa and Richards) (7,8), whereas alternative approaches to probe conformational change used enzymatic cleavage and gel analysis (similar to those of nucleic acid footprinting) to successfully define antibody epitopes (9).
Advances in mass spectrometry revolutionized protein footprinting starting in the early 1990s. For example, in chemical labeling studies of surface accessible lysine residues (10,11), mass spectrometry methods were employed to identify modification sites based on mass shift (Fig. 1B). At the same time, the hydrogen-deuterium exchange (HDX) field adopted mass spectrometry, introducing intact protein exchange MS (12,13) as well as localization of exchange through cleavage and peptide analysis by MS (13). As protein footprinting techniques evolved (similar to the development of nucleic acid footprinting), enhanced structural resolution and measurement accuracy were needed to move the field forward. In addition, footprinting experiments typically must balance the need for comprehensive labeling (required to get a good signal across many sites) against the need to avoid artifacts of the labeling process itself. In this respect HDX-MS is advantageous, as it represents a minimal perturbation of the protein. For other protein labeling methods, which are typically irreversible, a detailed understanding of the chemistry of the individual reagents, the development of appropriate dosimetry measurements, and the development of quantitative mass spectrometry readouts of modified species with adequate dynamic range have been critical to progress. In this review, we discuss the evolution of reagents and MS methods that now allow accurate medium to high-resolution assessments of protein structure in solution to be routinely completed using micrograms of protein samples that can be poised in a wide range of biochemical states.
Description of Labeling Technologies for Footprinting: Deuterium and OH Radical Labeling Probes Backbone and Side-chain Structure of Proteins
HDX-MS accurately reports the stability of the hydrogen bonding of backbone amide residues by measuring isotopic exchange of hydrogen with its deuterium isotope (14). In continuous HDX labeling, a target protein solution is labeled by dilution or rapid buffer exchange into pure D2O to produce final deuterium concentrations greater than 95% (15). Exchange is monitored for time periods ranging from seconds to days until the desired level of exchange takes place. Aliquots of the reaction are acid quenched at selected time points, and each aliquot is analyzed separately by LC-MS after protease cleavage. Peptide mass values and intensities are extracted from the chromatogram and mass shifts are monitored versus time as changes in the centroid mass (Fig. 2A).
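The centroid mass shift described above can be illustrated with a minimal Python sketch. It assumes that the isotopic envelope of one peptide has already been extracted as paired m/z and intensity values; the function names and the example envelope are hypothetical, and no back-exchange correction is applied.

```python
def centroid_mz(mz_values, intensities):
    """Intensity-weighted average m/z of one peptide isotopic envelope."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("envelope has no signal")
    return sum(m * i for m, i in zip(mz_values, intensities)) / total


def deuterium_uptake(labeled_centroid, unlabeled_centroid, charge):
    """Mass shift in Da: the centroid m/z shift multiplied by the charge state."""
    return (labeled_centroid - unlabeled_centroid) * charge


# Hypothetical doubly charged peptide: same envelope shape, shifted by 1.0 m/z.
unlabeled = centroid_mz([500.0, 500.5, 501.0], [100.0, 60.0, 20.0])
labeled = centroid_mz([501.0, 501.5, 502.0], [100.0, 60.0, 20.0])
uptake = deuterium_uptake(labeled, unlabeled, charge=2)
print(f"uptake = {uptake:.2f} Da")
```

Repeating this calculation for each quench time point would give the uptake-versus-time curve for the peptide; in practice the values are usually corrected for back-exchange, which this sketch omits.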
Hydroxyl radicals suitable for HRF labeling can be generated by various methods such as radiolysis of water with electrons, x-rays or gamma radiation, photolysis of peroxide, transition metal-dependent chemical reactions with peroxide, or high voltage electrical discharge in water (16-28). The hydroxyl radicals abstract hydrogen from aliphatic residues and directly attack sulfur atoms and aromatic rings (29). The peptide-radical species generated are efficiently quenched by oxygen in solution, leading to the final products detected by MS. Typically, in HRF-MS a target protein solution is labeled for defined times and the extent of unmodified and modified species is determined after protease cleavage by mass-based extraction from the chromatogram for peptides of interest (Fig. 2B). Compared with chemical reactions such as Fenton chemistry (30), where minutes of exposure to labeling reagents are required, synchrotron radiolysis (29,31,32) and laser photolysis approaches (33,34) have microsecond to millisecond exposure times.
Experimental Approaches
a. Bottom-up Proteomics to Deconstruct, Extract, and Quantify Structural Information in Footprinting Experiments
Subsequent to labeling of the target protein, mass spectrometry-based bottom-up proteomic approaches are applied to identify and quantitate peptides across the protein sequence that accumulate mass increases as a function of labeling. The ability to associate specific mass accumulations with specific peptides generated by protease cleavage provides details of local structure. Given the special conditions required for minimizing back-exchange in HDX-MS, acid-tolerant proteases such as pepsin are widely used to digest deuterated proteins at pH 2.5 (where exchange is minimized) in an on-line format using immobilized enzyme that provides rapid and reproducible cleavage in minutes (35,36). Aside from rapid protease digestion, the accuracy and precision of HDX-MS data have benefited from the use of ultra-high pressure reverse phase chromatography systems (UPLC) that provide efficient and fast separation of peptide fragments (~6 min) at reduced temperature (0 °C) (35).
Because oxidative modifications are stable and irreversible, the downstream analysis for HRF-MS, especially direct tandem MS detection of modified sites, is more flexible than for HDX. Digestion conditions can be optimized for maximal coverage (e.g. selection of denaturants, reducing reagents, cleanup procedures and enzyme combinations). After digestion, peptide mixtures containing both unmodified and modified species are separated by reverse phase LC and identified by MS. The most common oxygen adducts (+14, +16 Da) usually elute prior to unmodified species because of the increased polarity conferred by the addition of oxygen, whereas Arg or His oxidized species elute later than the unmodified species owing to the different chemistries involved. By comparison with HDX-MS, hydroxyl radical labeling chemistry is complex, and additional limitations include potential scavenging of radicals by solutes such as sugars, glycerol, lipids, and nucleotides, which can interfere with labeling of the desired side chain targets.
b. Data Analysis and Interpretation of Mass Spectrometry-based Footprinting Data
In HDX, deuterium incorporation can be analyzed by the "centroid" or the "theoretical fit" method. The centroid method measures the weighted m/z average of each peptide distribution over the experimental time course (13). The theoretical fit method determines the number of deuterons incorporated into an HDX peptide by fitting multiple theoretical isotopic envelopes of the peptide to the observed isotopic distribution using least squares regression (37,38). In HRF-MS, peptide modification extents and rates (k) are calculated using exact mass-based extracted ion currents from LC-MS. The fraction of unmodified or modified peptide is derived from the ratio of the chromatographic peak area of the unmodified or modified species to the total peak area of all species (unmodified and modified). The modification rate of a peptide can be calculated from the dose-response curve, which plots the unmodified fraction of the peptide, y(t), against exposure time (t) and follows a pseudo-first-order function: $y(t) = e^{-kt}$. Fig. 4A illustrates typical data for the labeling of ACKR3 (atypical chemokine receptor 3) bound with two different ligands (chemokine CXCL12 and small molecule CCX777) as a function of exposure time (40); the y axis reflects the fraction of "unmodified" peptide whereas the x axis reflects the labeling time. Residues in the N terminus and ECL2 of ACKR3 were found to have rates of modification reduced by CXCL12 binding, whereas multiple residues in the transmembrane region vary in the case of the CCX777 complex, as shown in Fig. 4B.
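The HRF-MS rate calculation described above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions rather than the procedure of any particular software package: the function names are hypothetical, the peak areas and time points are invented, and the rate constant is estimated by a simple least squares fit of ln y(t) = -kt through the origin (a nonlinear fit of the exponential is a common alternative).

```python
import math


def fraction_unmodified(area_unmodified, areas_modified):
    """Unmodified peak area divided by the total area of all species."""
    total = area_unmodified + sum(areas_modified)
    if total <= 0:
        raise ValueError("no signal for this peptide")
    return area_unmodified / total


def fit_rate_constant(times, fractions):
    """Least squares estimate of k for ln(y) = -k * t, forced through the origin."""
    numerator = sum(t * math.log(y) for t, y in zip(times, fractions))
    denominator = sum(t * t for t in times)
    if denominator == 0:
        raise ValueError("need at least one nonzero exposure time")
    return -numerator / denominator


# Hypothetical dose response generated with k = 0.5 per unit of exposure time.
times = [0.0, 1.0, 2.0, 4.0]
fractions = [math.exp(-0.5 * t) for t in times]
k = fit_rate_constant(times, fractions)
print(f"k = {k:.3f} per time unit")

# Hypothetical peak areas: one unmodified species and two oxidized products.
y = fraction_unmodified(80.0, [15.0, 5.0])
print(f"fraction unmodified = {y:.2f}")
```

Comparing the fitted k for the same peptide in two states (for example, free and ligand-bound) gives the kind of relative protection readout discussed in this section.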
FIG. 2. Flow chart of HDX-MS (Top) and HRF-MS (Bottom).
A, Accessible hydrogens on the backbone of the target protein are exchanged for deuterons after incubation with D2O on timescales ranging from seconds to days, and the exchange is quenched by lowering the pH and temperature of the reaction buffer. The uptake of deuterons is measured by rapid LC separation (~minutes) and mass spectrometry detection after fast pepsin digestion (~minutes) at low temperature. HDX-MS curves are plotted as deuteron uptake (as a number or %) as a function of exchange time. B, The target protein is exposed to hydroxyl radical labeling and the reaction is stopped by addition of quenchers (e.g. 10 mM methionine amide). The extent of oxidation is measured by LC separation (~hours) and mass spectrometry after proteolysis under optimized conditions. HRF dose response curves are plotted as the fraction of unmodified peptide versus exposure time.

c. Prospects for automation
Data acquisition and analysis of large-scale structural MS data in protein footprinting experiments is a cumbersome task. Promisingly, integrated software has been developed to ease the burden of data analysis. MagTran (41) is software that calculates the centroid m/z for HDX peptides. HX-Express is a semiautomated program for processing HDX-MS data that exports deuterium incorporation rate curves and peak width plots (42). Three hierarchical HDX software packages, the Deuterator (43), HD Desktop (HDD) (44), and HDX Workbench (45), provide more automated features including web-based user interfaces for entering raw data, visualization of complete HDX-MS data and results, and comparison of multiple projects. TOF2H is designed to automate data analysis of MALDI-based HDX (46). HDX analyzer (47) was developed to evaluate changes in structural dynamics from HDX data, with multiple statistical methods included (R, Python, and RPY2).
ProtMapMS (www.neoproteomics.net) is a software package that integrates identification of peptide modifications, plotting of dose response curves, and oxidation rate calculation for HRF data analysis (48). Another package, Byologic™ (www.proteinmetric.com), can identify protein modifications/sequence variants and visually compare MS1 and MS2 spectra of modified and unmodified forms (49). Its capability for accurate relative quantitation of oxidized variants relative to the unmodified sequence provides utility in HRF data analysis.
Use of Structural Mass Spectrometry Data for Structural Assessment
a. Footprinting Provides Protein/Protein Interface Mapping
Binding interfaces involved in protein-protein interactions typically occlude solvent and stabilize the secondary and tertiary structures of the proteins in a complex. Thus, observed reductions in labeling rates upon protein complex formation in protein footprinting experiments (HDX-MS or HRF-MS) are used to identify potential binding sites (50).
Many successful examples of mapping binding interfaces have been published to date. However, as footprinting only reports local information and cannot distinguish direct binding from allosteric effects of binding, other strategies such as mutagenesis, homology modeling, cross-linking or other structural methods are combined with it to support an understanding of the exact nature of the binding epitopes. For example, a combination of HDX-MS and mutagenesis was used to identify the binding interfaces between auto-antibodies derived from patients with acquired thrombotic thrombocytopenic purpura and ADAMTS13, the host metalloproteinase target (39). Epitopes in five loop regions with slowed HDX rates upon binding were detected by HDX-MS (Fig. 3B), whereas mutations in the five proposed binding loops of ADAMTS13 eliminated antibody binding and provided support for the proposed epitope. HDX-MS, like X-ray crystallography, can also identify complex discontinuous epitopes that are missed by linear methods such as peptide scanning and phage display. This is clearly illustrated in a study that defined the binding of factor H binding protein (fHbp), a key virulence factor and vaccine antigen of Neisseria meningitidis, with its bactericidal mAb 12C1 (51), as the MS and crystallography data delineated roles for both N-terminal and C-terminal peptide regions. Both HDX-MS and HRF-MS were used to map the epitope of human epidermal growth factor receptor (EGFR) that binds to adnectin, a targeted biologic derived from the fibronectin domain; these solution data were consistent with those from crystallography (52,53).
Protein footprinting techniques to probe protein-protein interactions have been used heavily in biotechnology research; these studies are quite important for verifying epitopes and paratopes in monoclonal antibody drug development (54). In addition, protein footprinting is valuable for comparing structures of "innovator" therapeutic proteins and their biosimilars; this can help establish the potential equivalency of the biosimilars with the patented drugs. For example, HDX-MS was used to characterize the structure of various versions of insulin (55,56) and several forms of glucocerebrosidase (brand name Cerezyme) for treatment of Gaucher disease (57). Likewise, HRF-MS was used to analyze conformations of therapeutic proteins including recombinant erythropoietin (EPO), interferon α-2A (IFN), and granulocyte colony-stimulating factor (GCSF) (58).

Figure caption (continued): Residues are colored using a gradient where red colored residues demonstrated a lower oxidation rate (more protected) in ACKR3:CXCL12 than in ACKR3:CCX777, whereas blue colored residues demonstrated a higher oxidation rate (less protected) in ACKR3:CXCL12 compared with ACKR3:CCX777. Gray colored residues are not modified, and half-colored circles indicate that an oxidation labeled one or both of two adjacent residues. The regions within the dashed line are the seven transmembrane domains of ACKR3.
b. Footprinting Provides Protein-Drug Interaction Information
Small molecule binding to proteins does not bury a large surface area as in protein-protein interactions; however, protein footprinting can reveal these drug-occupied surface or near-surface pockets. For example, HDX-MS was used to analyze small molecule binding to a functioning PPARγ (peroxisome proliferator-activated receptor gamma) (59). HDX analysis of one such synthetic compound (SR1664) revealed that, in contrast to agonists, SR1664 binding increased the mobility of the PPARγ C-terminal end helix (H11), which is close to another helix (H12) that contributes hydrogen bonds between PPARγ and full agonists (60). In the glioma field, the drug-binding region of a potent inhibitor (SCB4380) targeting PTPRZ (protein tyrosine phosphatase receptor-type Z) was identified by HDX-MS in the catalytic pocket of PTPRZ-D1 (61). Likewise, HRF-MS analysis of serotonin type 4 receptor (5-HT4R) in the presence and absence of an antagonist revealed a ligand-binding pocket consistent with data from other GPCRs and was used as well to define mechanisms of signal transmission (62). Also using HRF-MS, a small molecule partial agonist of atypical chemokine receptor 3 (ACKR3) was mapped to its binding site (Fig. 4B) along with the details of the receptor's binding of chemokine CXCL12 (Fig. 5A) (63). Overall, protein footprinting is seen to be very valuable in mapping the binding of both small and large molecules to proteins for both academic research and drug development.

[...] studied by pulse-labeling HDX-MS, where the protein unfolding and refolding was performed by addition and dilution of denaturant (64). Deuteration of the unfolded part of the protein measured at each folding time was interpreted in the context of existing crystal structures. A study of chaperone (GroEL/ES)-assisted protein (TIM-barrel) folding by use of pulse-labeling HDX-MS showed acceleration of the protein folding by the chaperones (65).
HDX-MS has also been used for studies of protein aggregation or fibril formation resulting from improper protein folding (66), and pulse-labeling HDX-MS was used to monitor refolding of bacteriorhodopsin induced by the denaturant (67). Quench flow HDX-MS can be used to access millisecond time scale dynamics, as seen in cytochrome-c refolding and other protein dynamics studies (68,69).
HRF-MS can monitor protein conformational changes from microseconds to minutes. A recent radiolysis-based HRF-MS study of YiiP, a proton-coupled zinc(II) transporter, elucidated the millisecond conformational changes of water accessibility in the TM5 domain during zinc(II) binding (70). Time-resolved synchrotron radiolytic HRF was also employed to study the dynamic conformational changes in gelsolin upon Ca(II) binding (71) on the millisecond timescale. In both studies, metal ions were introduced to the apo-protein by rapid mixing, and delay times from a few milliseconds to a few minutes (after mixing) were introduced prior to exposure of the samples to radiolysis.
A combination of HDX-MS, HRF-MS and another structural technique, small angle X-ray scattering (SAXS), was employed to investigate the conformational changes of orange carotenoid protein (OCP) upon photo-activation (72). In addition to the global structural changes revealed by SAXS, local conformational changes providing a detailed model of the conformational dynamics were detected by HDX-MS and HRF-MS. Detection of decreases in solvent accessibility of carotenoid binding residues in the N-terminal domain and increases in the C-terminal domain, together with crystal structure data for the OCP and RCP (red carotenoid protein), revealed a 12 angstrom translocation of the carotenoid pigment within the protein upon OCP photoactivation, including time resolved analysis of the reaction (73). This work is an excellent example of the merging of local detailed structural information on dynamics provided by footprinting with global crystallographic and SAXS signatures.
Stocks et al. (74) combined stopped-flow mixing and pH jump to observe unfolding of cytochrome-c, holo-myoglobin, and S100A11 in the 10 ms range; HRF-MS was achieved by laser photolysis of peroxide at time points of interest after mixing. A laser induced temperature jump coupled with laser photolysis of peroxide (called FPOP, or fast photochemical oxidation of proteins) was implemented by Gross et al. in probing folding dynamics of barstar on the sub-millisecond time scale (75). Employing sub-millisecond laminar flow mixing coupled with HRF-MS, investigation of apomyoglobin folding revealed that sub-millisecond structure formation is primarily driven by hydrophobic side chain interaction rather than backbone hydrogen bonding (76). Additionally, FPOP HRF-MS was used to differentiate the conformations of Im7 (native, partially folded and globally unfolded), providing information on individual side chains in the folding process (77).
New Opportunities
a. High Resolution Structure Assessment
Achieving the maximum possible structural resolution for protein footprinting approaches has been an important goal of the field, and has now been achieved by combinations of several strategies. We introduce a figure of merit for comparing structural resolution across protein footprinting experiments, defined as the number of probe sites identified and quantified divided by the total number of residues, expressed as a %. This analytic view is important as protein footprinting may achieve 100% "coverage," in that peptides that cover the entire protein sequence may be detected and analyzed, but sub-peptide level readouts may not be available to probe the fine details of the molecular interactions unless special care is taken. We examined over 60 manuscripts reporting 75 footprinting results, primarily from the last 10 years, to understand the evolving approaches across many laboratories. Supplemental Table S1 summarizes the survey results, including 38 HDX-MS results and 31 HRF-MS results (16 using laser photolysis of peroxide and 15 using radiolysis), and seven other covalent labeling-MS results (1 acetylation, 3 GEE and 3 carbene labeling). Overall, using the above figure of merit, HRF-MS achieved structural resolution of 40% or more in many cases, whereas HDX-MS in many cases attained 100% structural resolution. These levels of structural resolution have been achieved in part through improvements in LC instrumentation (particularly UPLC) to separate peptides more efficiently and more quickly. For HRF-MS, use of multiple proteases including pepsin has been effective in enhancing structural resolution, whereas for HDX-MS, the development of approaches that generate overlapping peptic fragments has been quite successful (78). All these methods quantify the peptide intensities at the MS1 level.
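The figure of merit defined above is a simple ratio, and a short Python sketch makes the arithmetic explicit. The function name and the example numbers are hypothetical and are not taken from the surveyed studies.

```python
def structural_resolution(n_probe_sites, n_residues):
    """Probe sites identified and quantified, as a percentage of all residues."""
    if n_residues <= 0:
        raise ValueError("protein must have at least one residue")
    if not 0 <= n_probe_sites <= n_residues:
        raise ValueError("probe sites must be between 0 and the residue count")
    return 100.0 * n_probe_sites / n_residues


# Hypothetical example: 40 quantified probe sites on a 100-residue protein.
resolution = structural_resolution(40, 100)
print(f"structural resolution = {resolution:.0f}%")
```

Note that this quantity is distinct from sequence coverage: a digest can cover every residue with detected peptides while still reporting on only a fraction of residues individually.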
The "holy grail" of single residue structural resolution conceptually involves quantification at the MS2 level, through peptide fragmentation and tandem ion quantification. Significant progress along these lines has also been achieved by both HRF-MS and HDX-MS.
Reaching single residue structural resolution via MS2 has been challenging for HDX-MS owing to hydrogen scrambling during CID fragmentation. However, in certain cases good structural resolution was achieved with minimal scrambling via ETD fragmentation for β2-microglobulin (structural resolution = 43%) (79) and staphylococcal nuclease (structural resolution = 66%) (80). In both cases, careful detuning was used to avoid scrambling during isolation, which reduced the sensitivity of the approach. Also, charge bias (≥3+) for peptides that are easily detected in ETD/ECD can result in incomplete sequence coverage for some large proteins, limiting application of the method.
In HRF-MS, application of UPLC methods has resolved isobaric peptides where the oxidative modification exists on differing residues. In these cases, a single type of modified species, eluting at a defined retention time and with a characteristic mass, can be extracted to provide single residue footprinting data. These UPLC approaches, in combination with pepsin digestion and denaturation, were applied in HRF-MS footprinting analysis of β-amyloid via radiolysis, achieving 35% structural resolution (81). Using laser photolysis based HRF-MS, investigators have achieved 40% structural resolution for thrombin (82). MS2 approaches are more feasible for HRF-MS, as scrambling is minimal; such quantification approaches have achieved 43% structural resolution for calmodulin (83). Quantitation of the labeling in this case was performed by combined analysis of both MS1 and MS2 intensities of target peptides, as modification induced fragmentation effects can interfere with the analysis. Footprinting structural resolution can also be extended by use of alternative labeling reagents, such as GEE, which labels D and E residues (84,85); these can add up to 10% to the structural resolution depending on the target sequence. Carbene mediated protein footprinting has also been reported as an alternative to label residues that are less reactive to hydroxyl radicals, such as Asp and Glu (86), and carbene labeling designed on an FPOP platform resolved different structural domains of apo and holo calmodulin (86). A bespoke synthesized diazirine-based carbene footprinting reagent with high labeling efficiency was developed to probe the substrate-binding sites of lysozyme and deubiquitinating protease (87).
b. Measuring Quantitative Structural Parameters with Footprinting
Typical footprinting measurements provide a relative comparison between two states of interest. However, inferring absolute structure parameters is possible using information from covalent labeling. For example, measured HRF-MS oxidation rates can be converted to protection factors (PF) using the known intrinsic reactivity of amino acid side chains for hydroxyl radicals (88). These HRF-based PFs show statistically significant correlations with structural factors such as solvent accessible surface area and number of local structural contacts. This was illustrated at the single-residue level for Ca2+-calmodulin, where the solvent accessible surface area values of side chains were predicted from footprinting data (83). Fig. 5 shows an example of applying this new approach, where the crystal structure data of a chemokine (CXCL12) were compared with the natural log PF values derived from experimental HRF-MS data (Fig. 5A, 5B, and 5C). The resulting correlation provided a molecular ruler for inferring the absolute extent of protection of the chemokine when bound to its receptor (ACKR3), as shown in Fig. 5D (63). This combination of high structural resolution and quantitative structure assessment places HRF-MS in a unique position to assess structure in absolute rather than relative terms, the latter being the previous default approach for footprinting.

c. Footprinting in Cells
A longstanding goal of footprinting studies is the examination of macromolecular structure inside living cells. Although this may be out of reach using exchangeable reagents such as deuterium, it is a realistic prospect in the case of irreversible labeling. For nucleic acid footprinting, dimethylsulfate was developed as a footprinting reagent for detection of protein-DNA interactions in cells (89,90); these approaches have been considerably improved over time (91,92).
Radiolytic footprinting has recently been used to monitor ribosome assembly in bacteria through monitoring of RNA-based protections as a function of assembly (93-95), providing novel and powerful readouts of complex cellular structures.
Mass spectrometry to examine protein interactions in cells is appropriately dominated by progress in the cellular crosslinking field (96-99); however, such studies do not provide the level of molecular detail presumed for footprinting, where high coverage of the molecule is the goal. For protein footprinting in cells, the challenges are quite substantial and include identification of reagents that can penetrate cells and modify protein species efficiently, coupled to recovery of sufficient material for MS detection of multiple labeled sites per protein.
Hydroxyl radicals are a useful reagent for in vivo footprinting, but the low ambient oxygen levels inside cells reduce oxidation efficacy relative to in vitro studies. Possible approaches to overcome these limitations have been explored by Espino et al., who used laser photolysis of peroxide to oxidize proteins inside cells (100). In this case the peroxide was able to efficiently penetrate cells, and native enzymes, like superoxide dismutase, provided a source of oxygen. Future approaches will certainly require multidimensional separations, targeted proteomic analysis and pull-down strategies to enhance coverage and provide functional information on cellular structures of interest.

CONCLUSION
Protein footprinting has evolved considerably over 30 years, and its growth has occurred hand in hand with the significant advances in mass spectrometry that have occurred during the same time. At present, protein footprinting is making unique contributions to understanding protein structure and dynamics for macromolecular complexes, membrane proteins, and bio-therapeutics: three of the most challenging and important areas of research in structural biology today. In particular, the modest sample requirements and flexibility in use of solution conditions for conducting footprinting analysis are two of its greatest advantages. Thus, difficult studies such as high-resolution structural kinetic analysis of protein dynamics or examination of the structure of native membrane protein forms are feasible by these methods. Recent advances in technique have pushed the structural resolution closer to the maximum possible, but further increases in signal to noise for detecting labeled species are needed to address more complex systems routinely.

Conflict of interest statement: Dr. Chance is a founder and shareholder of the company NeoProteomics, which has licensed technology from Case Western Reserve University in the field of mass spectrometry including the software ProtMapMS. NeoProteomics also is a contract research organization performing commercial protein footprinting services.