A Functional Bayesian Model for Hydrogen–Deuterium Exchange Mass Spectrometry

Proteins often undergo structural perturbations upon binding to other proteins or ligands, or when they are subjected to environmental changes. Hydrogen-deuterium exchange mass spectrometry (HDX-MS) can be used to explore conformational changes in proteins by examining differences in the rate of deuterium incorporation in different contexts. To determine deuterium incorporation rates, HDX-MS measurements are typically made over a time course. Recently introduced methods show that incorporating the temporal dimension into the statistical analysis improves power and interpretation. However, these approaches rely on technical assumptions that limit their flexibility. Here, we propose a more flexible methodology by reframing these methods in a Bayesian framework. Our proposed framework has improved algorithmic stability, allows us to perform uncertainty quantification, and can calculate statistical quantities that are inaccessible to other approaches. We demonstrate the general applicability of the method by showing that it enables rigorous model selection in a spike-in HDX-MS experiment, improved interpretation in an epitope mapping experiment, and increased sensitivity in a small-molecule case study. Bayesian analysis of an HDX experiment with an antibody dimer bound to an E3 ubiquitin ligase identifies at least two interaction interfaces where previous methods obtained confounded results because of the complex conformational changes that occur on binding. Our findings are consistent with the cocrystal structure of these proteins, demonstrating that a Bayesian approach can identify important binding epitopes from HDX data. We also generate HDX-MS data of the bromodomain-containing protein BRD4 in complex with GSK1210151A to demonstrate the increased sensitivity of adopting a Bayesian approach.

Supplementary Figure S1: Example prior predictive checks. Each panel shows a prior predictive check for a different summary statistic. We simulate from the prior predictive distribution, denoted $y_{\mathrm{rep}}$, and compare the simulations with the observed data $y$. The prior predictive distribution is diffuse but correctly located.

Supplementary Figure S2: Example posterior predictive checks. Each panel shows a posterior predictive check for a different summary statistic. We simulate from the posterior predictive distribution, denoted $y_{\mathrm{rep}}$, and compare the simulations with the observed data $y$. The posterior predictive distribution is concentrated and correctly located.
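The checks in Supplementary Figures S1 and S2 compare a summary statistic of the observed data with its distribution over simulated datasets. A minimal NumPy sketch of that comparison follows; the function name `predictive_check`, the toy data and the choice of statistics are illustrative assumptions, and in practice `y_rep` would be draws from the fitted model's prior or posterior predictive distribution.

```python
import numpy as np


def predictive_check(y, y_rep, stat):
    """Compare a summary statistic of the observed data with its predictive distribution.

    y     : observed data, shape (n,)
    y_rep : datasets simulated from the prior or posterior predictive, shape (S, n)
    stat  : function mapping a 1-D array to a scalar summary statistic
    Returns the observed statistic, the S replicated statistics and the
    fraction of replicates whose statistic is at least as large as the observed one.
    """
    t_obs = stat(y)
    t_rep = np.apply_along_axis(stat, 1, y_rep)
    tail_fraction = float(np.mean(t_rep >= t_obs))
    return t_obs, t_rep, tail_fraction


rng = np.random.default_rng(0)
# Hypothetical observed uptake values and 1000 hypothetical replicated datasets
y = rng.normal(loc=2.0, scale=0.3, size=20)
y_rep = rng.normal(loc=2.0, scale=0.3, size=(1000, 20))

for name, stat in [("mean", np.mean), ("sd", np.std), ("max", np.max)]:
    t_obs, t_rep, frac = predictive_check(y, y_rep, stat)
    lo, hi = np.quantile(t_rep, [0.05, 0.95])
    print(f"{name}: observed={t_obs:.3f}, replicated 90% interval=[{lo:.3f}, {hi:.3f}], "
          f"tail fraction={frac:.2f}")
```

A correctly located predictive distribution places the observed statistic inside the bulk of the replicated values; a diffuse prior predictive distribution simply produces a wide interval around it.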

Control of error rates in null experiments
To demonstrate that our Bayesian analysis controls false discoveries, we perform a permutation experiment using the seven-replicate MBP experiment introduced in the main text. The seven wild-type MBP samples, which contain no structural variant, can be used as a null experiment by falsely partitioning the replicates into two conditions: three samples are arbitrarily labelled condition A and the remaining four condition B. We repeat this random assignment five times. For each permutation, we compute the posterior probability that each peptide is perturbed (the alternative model) and visualise these probabilities in a histogram. The posterior probability is never above 0.05, suggesting excellent control of error rates; that is, we never give confident support to the wrong model. We also check that the probabilities are calibrated by computing the Brier score for each permutation experiment. Plotted as a boxplot, the Brier scores are essentially 0, indicating good calibration.

When a portion of amides undergoes correlated deuterium exchange because of simultaneous unfolding and refolding, we observe EX1 kinetics. EX1 kinetics describe different populations of proteins undergoing different hydrogen-deuterium exchange mechanisms. This results in multimodal deuterated spectra, in which each mode corresponds to a different protein subpopulation, and this behaviour can be challenging to model. Briefly, at the level of each residue, a logistic model is an appropriate model of exchange:

$$D_r(t) = a_r\left(1 - e^{-b_r t}\right),$$

where $r$ indexes residues, $b_r$ denotes the rate constant, $a_r$ the deuterium recovery and $t$ the time of exposure to heavy water. However, bottom-up mass spectrometry measures peptides, so the observation process is:

$$y_i(t) = \sum_{r \in R_i} a_r\left(1 - e^{-b_r t}\right),$$

where $i$ indexes peptides and $R_i$ is the set of exchangeable residues of peptide $i$. In practice, this model is challenging to fit because the number of parameters typically exceeds the number of observations. It is therefore necessary to approximate the kinetics of each peptide. Suggested models include the following [1,2]:

Logistic:
$$y_i(t) \approx a_i\left(1 - e^{-b_i t}\right)$$
Weibull:
$$y_i(t) \approx a_i\left(1 - e^{-b_i t^{p_i}}\right)$$
Sum of logistics:
$$y_i(t) \approx \sum_{k=1}^{K} a_{ik}\left(1 - e^{-b_{ik} t}\right)$$

where $p_i$ is a shape parameter and $K$ is the number of logistic terms. A more complex possibility is a non-parametric model, $y_i \approx f(t)$, but we have chosen to explore whether the simpler models are sufficient to explain the majority of HDX-MS kinetics. Currently, there is no statistical methodology or guidance for determining which approximation to use when performing statistical testing on peptide-centric HDX-MS data. We are particularly interested in determining the preferred model for data with EX1 kinetics. Here, using Bayesian statistics, we test whether a logistic or a Weibull model better describes centroided EX1 kinetics (see methods).

To perform this analysis, we simulate EX1 kinetics for 100 peptides. The length of each peptide was sampled uniformly between 5 and 15, with amino acids chosen uniformly at random. The first mode was assumed to have zero deuterium incorporation, whilst the incorporation of the second mode was sampled uniformly from [0, 1], and the charge state was sampled uniformly between one and eight. Deuterium incorporation measurements were made at 0, 300, 500, 700 and 1000 seconds after exposure to heavy water. The relative proportion of the second mode was assumed to rise exponentially. For each peptide, two replicates were simulated to allow for natural variation. Bimodal spectra were simulated using the natural isotope distribution of each peptide, and their centroids were computed. To examine whether a logistic or Weibull model was preferred, we used the posterior probability of each model as well as the leave-one-out expected log predictive density, a measure of the out-of-sample predictive performance of our models (see methods for more details). Figure S6 shows paired boxplots indicating that the Weibull model was preferred according to both measures. This is despite the Weibull model being additionally penalised by its more diffuse prior.
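The exchange models in this section can be written as plain functions. The following is a minimal NumPy sketch, assuming the logistic form $a(1 - e^{-bt})$, the Weibull form $a(1 - e^{-bt^p})$ with shape parameter $p$, and a sum of $K$ logistic terms; the function names and all parameter values are hypothetical, and the Bayesian fitting used in the text is not shown.

```python
import numpy as np


def peptide_uptake(t, a, b):
    """Peptide uptake as a sum of per-residue terms a_r * (1 - exp(-b_r * t)).

    t    : exposure times, shape (n_times,)
    a, b : per-residue recoveries and rate constants, shape (n_residues,)
    """
    t = np.asarray(t, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Rows are residues, columns are time points; sum over residues
    return np.sum(a[:, None] * (1.0 - np.exp(-b[:, None] * t[None, :])), axis=0)


def logistic(t, a, b):
    """Single-term approximation a * (1 - exp(-b * t))."""
    return a * (1.0 - np.exp(-b * np.asarray(t, dtype=float)))


def weibull(t, a, b, p):
    """Weibull approximation a * (1 - exp(-b * t**p)); p = 1 gives the logistic."""
    return a * (1.0 - np.exp(-b * np.asarray(t, dtype=float) ** p))


def sum_of_logistics(t, a, b):
    """K-term approximation; with one term per residue it equals peptide_uptake."""
    return peptide_uptake(t, a, b)


rng = np.random.default_rng(1)
times = np.array([0.0, 300.0, 500.0, 700.0, 1000.0])  # time points used in the simulation
n_res = 8                                               # hypothetical number of exchangeable residues
a_r = rng.uniform(0.5, 1.0, size=n_res)                 # hypothetical recoveries
b_r = 10.0 ** rng.uniform(-4, -1, size=n_res)           # hypothetical rate constants (1/s)

print("full model:", np.round(peptide_uptake(times, a_r, b_r), 3))
# Illustrative, unfitted parameters for the two approximations
print("logistic:  ", np.round(logistic(times, a_r.sum(), np.median(b_r)), 3))
print("weibull:   ", np.round(weibull(times, a_r.sum(), np.median(b_r), 0.7), 3))
```

The full model needs two parameters per residue, whereas the logistic and Weibull approximations need two and three parameters per peptide, which is why the approximations remain practical with only a handful of time points.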
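In the null permutation experiment described at the start of this section, no peptide is truly perturbed, so every outcome is 0 and the Brier score reduces to the mean squared posterior probability of the alternative model. If every probability is below 0.05, the score is therefore below $0.05^2 = 0.0025$. A minimal sketch follows; the function name `brier_score` and the per-permutation probabilities are hypothetical placeholders, not the values behind the figure.

```python
import numpy as np


def brier_score(prob_alternative, outcome):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    p = np.asarray(prob_alternative, dtype=float)
    o = np.asarray(outcome, dtype=float)
    return float(np.mean((p - o) ** 2))


rng = np.random.default_rng(2)
n_peptides = 200  # hypothetical number of peptides
scores = []
for permutation in range(5):
    # Hypothetical posterior probabilities of the alternative model, all below 0.05
    probs = rng.uniform(0.0, 0.05, size=n_peptides)
    truth = np.zeros(n_peptides)  # null experiment: no peptide is perturbed
    scores.append(brier_score(probs, truth))

print([round(s, 5) for s in scores])
```

A Brier score of 0 corresponds to perfectly confident, correct predictions, so scores this close to 0 indicate both calibration and the absence of spurious support for the alternative model.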

Model selection for structural spike-in experiment
Having analysed simulated data, we next analyse a structural spike-in experiment, in which HDX data on maltose-binding protein (MBP) were generated in seven replicates across four HDX labelling times [3]. Additional experiments were carried out in triplicate for the W169G (tryptophan 169 to glycine) structural variant. Here, MBP-W169G was spiked into the wild-type MBP sample at 5, 10, 15, 20 and 25%, and a further experiment included a 100% mutant sample. All data were analysed on an Agilent 6530 Q-TOF mass spectrometer, and raw spectra were processed in HDExaminer. Since there are two populations of protein (WT or W169G), these HDX data display multimodal exchange dynamics. This allows us to further test our model selection approach in practice. Furthermore, as these experiments are replicated, we can also examine a random-effects model that accounts for random variation in the plateau of the HDX kinetics across replicates (see methods). Thus, there are now three candidate models: a logistic model, a Weibull model, and a Weibull model with random plateaus. To compare these models formally, we again use posterior model probabilities and the leave-one-out expected log predictive density (ELPD). For brevity, we consider the 10% and 15% spike-in experiments.
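The random-plateau model adds a replicate-level random effect to the Weibull plateau. Its exact parameterisation is given in the methods; the sketch below instead assumes that replicate $j$ has plateau $a + u_j$ with $u_j \sim \mathcal{N}(0, \sigma_a^2)$ plus Gaussian measurement noise, and the function names, labelling times and parameter values are all hypothetical.

```python
import numpy as np


def weibull(t, a, b, p):
    """Weibull uptake curve a * (1 - exp(-b * t**p))."""
    return a * (1.0 - np.exp(-b * t ** p))


def simulate_random_plateaus(t, a, b, p, sigma_a, sigma_y, n_rep, rng):
    """Simulate replicate curves whose plateau varies between replicates.

    Replicate j uses plateau a + u_j with u_j ~ Normal(0, sigma_a);
    each observation then receives independent Normal(0, sigma_y) noise.
    Returns the noisy data, shape (n_rep, n_times), and the random effects u.
    """
    u = rng.normal(0.0, sigma_a, size=n_rep)           # replicate-level effects
    mean = weibull(t[None, :], a + u[:, None], b, p)   # broadcast to (n_rep, n_times)
    noise = rng.normal(0.0, sigma_y, size=mean.shape)  # measurement error
    return mean + noise, u


rng = np.random.default_rng(3)
t = np.array([30.0, 300.0, 3000.0, 30000.0])  # hypothetical four labelling times (s)
data, u = simulate_random_plateaus(t, a=4.0, b=0.05, p=0.5,
                                   sigma_a=0.2, sigma_y=0.05, n_rep=7, rng=rng)
print("replicate plateau offsets:", np.round(u, 3))
print(np.round(data, 2))
```

Under this parameterisation the random-plateau model nests the ordinary Weibull model as the special case $\sigma_a = 0$, which is why it is the more complex of the two.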
Figure S7 shows ternary plots for the logistic, Weibull and random-plateau models, where the metrics have been rescaled to indicate relative model preference. Each point represents a modelled peptide. In general, the posterior probability indicates a preference for the Weibull model. Occasionally, the logistic model is preferred, with little support for the more complex random-plateau model. The conclusions are similar when using ELPD, although the random-plateau model receives slightly more support. This suggests that the Weibull model is preferred and that the random-plateau model is unnecessarily complex, although it might be useful for predicting new data. Hence, for under-determined HDX-MS data, where there are fewer measured data points than parameters in the full kinetic model, Bayesian modelling offers a principled approach for choosing an approximate model. In the supplementary material, we also show that our Bayesian approach does not generate false positives in a null permutation experiment.

Supplementary Table S1: A summary of the correspondence between the Bayesian analysis of HDX-MS data and the co-crystal structure. Distance analysis is performed at 8 Å. Solubility analysis is performed as described in the supplement. "HDX-MS analysis peptide" indicates the measured peptide. NA indicates that no peptide with exchangeable residues was measured for that residue.

HDX-MS of BRD4
A 6xHis-BRD4(1-477) construct was expressed and purified as previously described [5]. For HDX labelling experiments, a 2.5 µM stock of BRD4 protein (12.5 pmol) was prepared by dilution in a reference buffer of 50 mM MOPS, 150 mM NaCl, pH 7.2. The inhibitor-bound and apo forms were prepared by the addition of i-BET151 (25 µM final concentration) or an equivalent volume of DMSO (2% final concentration) and pre-incubated for at least 30 min at 1 °C. The reaction was initiated by a 12-fold dilution of 5 µL protein sample (12.5 pmol) in labelling buffer (50 mM MOPS, 150 mM NaCl, pD 7.2 (pH_read = 6.8 with a standard calomel electrode)) at 20 °C using an automated sample handling workflow (LEAP HDX PAL, Trajan Scientific). Samples were labelled for 0, 15, 60, 600, 3600 and 14400 s, in triplicate. Protein samples were quenched and denatured with an equal volume of quench solution (6 M guanidine hydrochloride, 400 mM sodium phosphate pH 2.2, 2% formic acid) for 1 min at 1 °C and immediately injected onto an immobilized nepenthesin-2 column (2.1 mm × 20 mm, Affipro, CZ). The resulting peptides were collected on a precolumn trap (UPLC BEH C18 Vanguard, Waters) for 4 min with 0.2% formic acid, 0.03% TFA in H2O at a flow rate of 100 µL/min. Peptides were then eluted by liquid chromatography (1.7 µm UPLC BEH C18 column, 1.0 × 50 mm, 130 Å pore size, Waters) for 12 min at a flow rate of 20 µL/min at 0 °C, over a gradient of 11-40% of 0.2% formic acid in MeCN, before ramping to 98% for a further 3 min, followed by a 4 min sawtooth gradient cleaning cycle. A LeuEnk and GluFib lock solution was co-injected as an internal standard. Data were acquired on a Synapt G2-Si high definition mass spectrometer (Waters) in the data-independent HDMS acquisition mode. For fully labelled control experiments, BRD4 was labelled for 1 h in 6 M d5-guanidine deuteriochloride diluted in labelling buffer, before quenching as described above. Samples for 100% D control experiments were handled manually.
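As a quick consistency check on the amounts above, the following sketch recomputes the protein load and the deuterium content of the labelling reaction. It assumes that the 12-fold dilution means 5 µL of sample in a 60 µL final volume and that the labelling buffer is 100% D2O; neither is stated explicitly. It also uses the commonly applied correction pD ≈ pH_read + 0.4, which reproduces the stated pD of 7.2.

```python
# Consistency check of the BRD4 labelling arithmetic.
# Assumptions (not stated in the text): a 12-fold dilution means a 60 µL final
# volume, and the labelling buffer is 100% D2O.
stock_conc_uM = 2.5       # BRD4 stock concentration (µM, i.e. pmol/µL)
sample_volume_uL = 5.0    # protein sample volume per labelling reaction (µL)
dilution_factor = 12      # fold dilution into labelling buffer

protein_pmol = stock_conc_uM * sample_volume_uL             # 2.5 pmol/µL x 5 µL
final_volume_uL = sample_volume_uL * dilution_factor        # total reaction volume
buffer_volume_uL = final_volume_uL - sample_volume_uL       # deuterated buffer added
d2o_fraction = buffer_volume_uL / final_volume_uL           # fraction D2O in reaction
labelling_conc_nM = 1000 * stock_conc_uM / dilution_factor  # BRD4 during labelling

ph_read = 6.8
pd = ph_read + 0.4  # commonly used electrode-reading correction

print(f"protein load: {protein_pmol} pmol")
print(f"final volume: {final_volume_uL} µL, D2O content: {d2o_fraction:.1%}")
print(f"BRD4 during labelling: {labelling_conc_nM:.0f} nM, pD: {pd:.1f}")
```

Under these assumptions the reaction contains 12.5 pmol of BRD4, matching the text, at roughly 92% D2O.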
Peptide mapping was performed in a separate experiment with a 25 pmol injection of protein diluted in the unlabelled reference buffer and otherwise treated as above, using the HDMSe acquisition mode. Peptide lists from the peptide mapping experiments were generated in ProteinLynx Global Server (PLGS) 3.0. These lists were then imported into HDExaminer v2.5.0 (Sierra Analytics, Modesto, CA), subjected to further filtering (peptide length < 25; PLGS score > 6.5; products per amino acid > 0.3; |Δppm| < 10) and analysed to determine deuterium uptake. Only peptides with adequate intensity and reliable isotope distributions at all time points in both states were retained.
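The peptide-level filters listed above amount to a simple predicate. This Python sketch applies them to a small table of hypothetical peptides; the `PeptideMatch` class, its field names and the example sequences are illustrative placeholders, not the PLGS or HDExaminer export format. It reads the mass-error criterion as an absolute error below 10 ppm.

```python
from dataclasses import dataclass


@dataclass
class PeptideMatch:
    """Hypothetical record for one identified peptide."""
    sequence: str
    plgs_score: float
    products_per_aa: float
    delta_ppm: float


def passes_filters(p: PeptideMatch) -> bool:
    """Apply the length, score, fragmentation and mass-error thresholds."""
    return (
        len(p.sequence) < 25
        and p.plgs_score > 6.5
        and p.products_per_aa > 0.3
        and abs(p.delta_ppm) < 10
    )


# Hypothetical placeholder peptides
peptides = [
    PeptideMatch("PEPTIDEKA", 7.2, 0.45, -3.1),  # passes every filter
    PeptideMatch("PEPTIDEKB", 5.9, 0.50, 1.2),   # fails: score too low
    PeptideMatch("PEPTIDEKC", 8.1, 0.20, 0.4),   # fails: too few products per residue
    PeptideMatch("PEPTIDEKD", 9.0, 0.60, 12.5),  # fails: mass error too large
]
kept = [p.sequence for p in peptides if passes_filters(p)]
print(kept)
```

All thresholds are strict inequalities, matching the "<" and ">" used in the text, so a 25-residue peptide or an error of exactly 10 ppm is rejected.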

Bayesian analysis controls error rates in null experiments. (1-5) Histograms of the computed posterior probability of the alternative model; the alternative model is never strongly supported. (6) The histogram of Brier scores for the null permutation experiments; the values are close to 0.

Bayesian model selection for HDX
A Weibull model for EX1 kinetics

Examining model selection for EX1 kinetics. Paired boxplots for the two metrics of interest: posterior model probabilities (left) and leave-one-out expected log predictive density (ELPD) (right). Grey lines connect the paired simulations, indicating a preference for the Weibull model not only on average but for the vast majority of peptides.