A Bayesian approach to quantifying uncertainty from experimental noise in DEER spectroscopy Journal of Magnetic Resonance

Double Electron-Electron Resonance (DEER) spectroscopy is a solid-state pulse Electron Paramagnetic Resonance (EPR) experiment that measures distances between unpaired electrons, most commonly between protein-bound spin labels separated by 1.5–8 nm. From the experimental data, a distance dis- tribution P ð r Þ is extracted using Tikhonov regularization. The disadvantage of this method is that it does not directly provide error bars for the resulting P ð r Þ , rendering correct interpretation difﬁcult. Here we introduce a Bayesian statistical approach that quantiﬁes uncertainty in P ð r Þ arising from noise and numerical regularization. This method provides credible intervals (error bars) of P ð r Þ at each r . This allows practitioners to answer whether or not small features are signiﬁcant, whether or not apparent shoulders are signiﬁcant, and whether or not two distance distributions are signiﬁcantly different from each other. In addition, the method quantiﬁes uncertainty in the regularization parameter.


Introduction
Double Electron-Electron Resonance (DEER), also called Pulse Electron-Electron DOuble Resonance (PELDOR), is a solid-state pulse Electron Paramagnetic Resonance (EPR) experiment that measures long-range (1.5-8 nm) distances, r, between paramagnetic species to sub-ångström resolution [1][2][3]. It is commonly applied to proteins and other biomacromolecules immobilized in frozen glassy solutions, and has been used to investigate systems that can be challenging to NMR and X-ray crystallography, such as membrane proteins [4][5][6][7][8][9][10]. DEER provides complete distance distributions, PðrÞ, by measuring the through-space magnetic dipole-dipole interaction between two stable paramagnetic spin labels.
DEER data consist of an oscillatory time-domain signal that can be transformed via regularized numerical inversion into an interspin distance distribution function PðrÞ. Since this inversion is mathematically ill-conditioned, numerical regularization is necessary. Tikhonov regularization is the most widely used regularization procedure. It is a least-squares method that imposes a degree of smoothness in PðrÞ, reflecting the inherent width in the distribution owing to spin label and protein mobility [11][12][13]. The magnitude of the roughness penalty is scaled with a regular-ization parameter, a. This regularization scheme complicates error analysis and uncertainty quantification [13].
The distance distribution PðrÞ is, in principle, rich in structural information, but in practice its usable information is often limited to the dominant spin-spin distance. Peak widths and shapes, which describe structural heterogeneity, are more uncertain as they are highly sensitive to noise levels, degree of regularization, timedomain truncation, background correction, etc. There does not currently exist a general and robust method for fully quantifying uncertainty in the Tikhonov estimate of PðrÞ [4,[12][13][14]. As a result, it can be difficult to draw robust conclusions from DEER distance distributions. The lack of uncertainty quantification prevents access to important structural information.
Concerningly, it is very uncommon for any error or uncertainty estimates to be displayed in PðrÞ. After surveying over 100 recently published papers containing DEER distance distributions, we have found that nearly 90% lacked error estimates on PðrÞ (see SI). This state of affairs may hinder more widespread adoption of DEER spectroscopy. Enabling robust statistical analysis of DEER data will significantly expand the reliability, and therefore the utility, of DEER.
In this paper, we present a Bayesian statistical method to quantify the uncertainty arising from measurement noise and regularization in the DEER distance distribution estimated with Tikhonov regularization. The effects of other sources of uncertainty, such as background correction, can be assessed by extending the method. When evaluating a given PðrÞ, DEER practitioners often request to see the time-domain trace in order to qualitatively assess uncertainty in the PðrÞ. There is an empirical understanding that data with a higher signal-to-noise ratio (SNR) yield more reliable distance distributions, particularly in the case of multi-modal distributions, but quantifying the reliability of PðrÞ remains an open issue. Further, the selection of the value for the regularization parameter a, which is required for Tikhonov regularization, is a source of bias and uncertainty.
Bayesian methods have been successfully implemented in many applications involving the solution of ill-conditioned inverse problems, including medical imaging, underground resource exploration, image reconstruction, and astronomical spectroscopy [15][16][17][18]. Recently, the utility of Bayesian statistics has been recognized in the estimation and analysis of model parameters related to biophysical measurements [19][20][21].
The next section summarizes the basics of DEER and Tikhonov regularization. Then, we introduce a new Bayesian approach that provides a full probabilistic analysis of PðrÞ using a Markov-Chain Monte Carlo (MCMC) algorithm. After that, we illustrate its performance using model distributions. Finally, we discuss how the method can be applied to answer interpretational questions about PðrÞ and how it relates to other forms of error assessment.

The DEER signal
The time-domain DEER signal, VðtÞ, is the integrated spin echo measured as a function of the timing of the pump pulse relative to the probe pulses within the DEER pulse sequence. VðtÞ is modulated by the dipolar interaction between spin labels. It is a combination of the desired modulation due to intramolecular spin-spin dipolar interactions, FðtÞ, and an undesired background contribution due to intermolecular interactions, BðtÞ (Fig. 1). In dilute solutions, VðtÞ can be expressed as a product [1,2] VðtÞ ¼ V 0 Á FðtÞ Á BðtÞ ð 1Þ FðtÞ ¼ ð1 À kÞ þ kSðtÞ ð 2Þ where V 0 is the unmodulated integrated echo intensity, k is the modulation depth (a measure of the efficiency of the DEER pump pulse, 0 6 k 6 1), and SðtÞ is the intramolecular dipolar modulation function. Eq. (2) is based upon the assumption that the sample contains only A-B spin pairs, and is not valid for the case where an A-A pair is excited by the observer sequence. This case is unavoidable in homogenous labeling schemes, as in the common case of two nitroxide radical labels on one protein. A more complete expression contains additional terms, as explained in [22,23]. For the purposes of this work, the specific forms of BðtÞ and FðtÞ are irrelevant; the analysis is only concerned with SðtÞ.
The background BðtÞ is a decaying exponential, where D is the dimensionality of the system (3 for soluble proteins). The decay constant k depends upon the spin concentration and inversion efficiency. For coupled pairs of spin-1/2 in an amorphous glass or powder, and in the absence of orientation selection, SðtÞ is given by the integral where the integral kernel Kðt; rÞ describes the powder average of the dipolar interaction and PðrÞ is the distance distribution probability density function (PDF) R drPðrÞ ¼ 1 À Á that describes the conformational heterogeneity of the ensemble of spin-labeled protein molecules in the sample. The dipolar kernel is [1,2,24] Kðt; rÞ ¼ is the perpendicular dipolar frequency, , and CðzÞ and SðzÞ are the cosine and sine Fresnel integrals, respectively [25,26]. The frequency of oscillation in the time domain is proportional to the inverse cube of the spin-spin distance. With this kernel and the normalized PðrÞ, SðtÞ is normalized such that Sð0Þ ¼ 1 in the absence of noise.
Experimentally, SðtÞ is measured at n discrete time points t i . For the data analysis, P is also discretized, typically at n points r j , although a different number of points is possible. The discretized form of Eq. (4) is where K is here an n Â n matrix, and S and P are n Â 1 vectors, with elements: The step size in the distance domain, Dr, has been incorporated into K in order to simplify notation. This discretization preserves normalization of S and P. After background correction, the DEER experiment yields the noisy signal, S. However, the practitioner seeks the distance distribution, P.

Tikhonov regularization
In order to extract P from S, it is necessary to solve Eq. (7) for P. This would require inverting the matrix K. Unfortunately, K has a large condition number, and so the computation of its inverse is prone to large numerical errors. Thus, Eq. (7) is an ill-conditioned inverse problem [27]. To obtain a solution, numerical regularization is required. The regularization method preferred in the DEER literature is Tikhonov regularization [11][12][13]27], which solves the multidimensional least-squares minimization problem: Fig. 1. The normalized time-domain DEER signal, VðtÞ=V0 (solid black), is the product of two signals: the oscillatory intramolecular modulation, FðtÞ, that provides information about spin-spin distances in biradicals, and an averaged intermolecular signal in the form of a decaying exponential background, BðtÞ (dashed gray). The background is removed and FðtÞ is shifted and rescaled by k, the modulation depth, to obtain the dipolar modulation function, SðtÞ.
The first term in Eq. (11), q, is the residual sum of squares and captures how well the solution P a fits the data in the time domain (time-domain misfit, or prediction error). The second term, g, is a measure of the overall roughness of the solution. The regularization parameter a determines the importance of the roughness relative to the prediction error in the fitting.
Since P is a probability density, it is constrained to be nonnegative. Therefore, the simple analytical solution to the unconstrained problem, which is cannot be used. In order to solve the constrained problem, numerical iterative algorithms have to be employed [27,28].

Regularization parameter selection
The regularization term in Eq. (11) represents an assumption about the distance distribution: it should be smooth. This is a reasonable assumption as it is difficult to imagine a biologically relevant, or microscopically realistic, scenario where the distance between two intrinsically mobile spin labels varies in a very rough and discontinuous fashion on a sub-ångström scale. The degree of regularization is determined by the magnitude of the scalar parameter a, which is a nuisance parameter that must be separately determined. For very small a, roughness is not penalized, the solution is extremely spiky and over-fits the noise. For large a, roughness is over-penalized, P a will be over-smoothed, and the fit to the data will be poor.
Since the value of a significantly influences the form of the solution, it is important to choose an optimal value for a. One commonly employed method is the L-curve criterion [12,13,29]. The L-curve is a parametric curve in the log-log plot of g against q. This curve typically adopts a characteristic ''L" shape. The L-curve criterion considers the optimal value for a to be found at the corner of the L-curve [29]. In practice, the corner is often determined by eye. Alternatively, the point of maximum curvature can be found automatically [30]. This removes some practitioner bias from the data processing procedure and increases reproducibility.
Leave-One-Out Cross Validation (LOO-CV) is another approach [30]. Generally, CV approaches withhold a subset of the data from the Tikhonov analysis and then iteratively adjust a to reduce the error between the fit and the withheld data [30]. The LOO variant of CV excludes each data point, one at a time, from the analysis and then minimizes the total prediction error. This can be expressed as the minimization problem [31] a LOOCV ¼ min a X n i¼1 Kð:; iÞP a À SðiÞ 1 À h a ði; iÞ where h a is the a-dependent matrix KQ a K T and P a is the Tikhonov solution for a particular a. In our hands, LOO-CV outperforms the L-curve criterion. However, it is still a heuristic method and does not provide uncertainty estimates for a.

Bayesian uncertainty quantification
In this section, we present an MCMC method based on Bayesian statistics for quantifying uncertainty in P and a. It utilizes SðtÞ and assumes that background correction has already been performed. We introduce the basic probabilistic equations, show the connection to Tikhonov regularization, and describe the MCMC algorithm, generate representative samples of P and a, and analyze them statistically.

The posterior distribution
We start by considering what is observable and given, and what is unknown, uncertain, and wanted. Here, we consider the intramolecular dipolar modulation function SðtÞ observable and given. SðtÞ is obtained from the experimental data VðtÞ after background correction. It is an inseparable sum of the true signal and corrupting noise. A second useful quantity that can be obtained from experiment is the noise variance. We obtain it experimentally by acquiring and storing each phase-cycled DEER trace separately and then statistically analyzing the distribution of values for each t i across the set of acquired traces. As shown in the SI, this noise is Gaussian with amplitude roughly independent of t i . The noise deviations at adjacent time points are not correlated. Therefore, we can characterize the noise by a single noise variance, r 2 S . (If recording separate traces is not possible on a particular spectrometer, one may use the root mean square difference between a Tikhonov fit and the data as a stand-in. We do not recommend this, as errors in the fit will propagate to the final results.) For notational convenience, we will henceforth use its inverse, called the precision Next, we consider the unknown and uncertain quantities. These include the desired underlying distance distribution P. However, since the problem is not solvable without regularization, a regularization parameter is required as well. There is no absolute certainty about its optimal value. Here, we will use the symbol d and show later that d is proportional to a 2 .
To summarize, S and s are observable and fixed, and P and d are unknown and desired. Our goal is to fully characterize the conditional probability distribution for P and d, given the measured S and s: pðP; djS; sÞ. First, we construct an expression for pðP; djS; sÞ. Like any conditional probability, it is proportional to the full joint probability pðP; djS; sÞ ¼ pðP; d; S; sÞ pðS; sÞ / pðP; d; S; sÞ ¼ pðS; P; d; sÞ: In the second step we omitted the marginal probability pðS; sÞ, since it is a constant independent of P and d. In the last step, we reordered the variables. This joint probability can be written as a chain product of four probabilities pðS; P; d; sÞ ¼ pðSjP; d; sÞ Á pðPjd; sÞ Á pðdjsÞ Á pðsÞ: Next, we introduce specific expressions for the factors on the right-hand side. The first term indicates the conditional probability of a certain S given P, d and s. For this, we can use a Gaussian with the weighted sum-of-squares-deviation in the exponent pðSjP; d; sÞ ¼ pðSjP; sÞ ¼ s (We omit d in the notation, since the expression does not directly depend on it). This expression captures the essence of leastsquares fitting: if P is such that KP fits S well, the magnitude of the exponent is small and the probability large. On the other hand, if the fit is poor, the exponent is large in magnitude, and the probability is small (Fig. 2). The second term in Eq. (17) is the conditional probability of a certain P given values for d and s. For this, we choose a Gaussian smoothness requirement: where L is the same operator as in Tikhonov regularization (commonly the second derivative) and d determines the roll-off rate of the probability as a function of the roughness kLPk 2 . This expression assigns higher probabilities to smoother P, as illustrated in Fig. 2. The third term in Eq. (17) is the probability of a given d. Since we are uncertain about the value of d, we use a very broad distribution. For mathematical convenience and because of its highly tunable shape, we choose a gamma distribution [17] where we omitted s because it is independent of d. The fixed parameters a and b determine the shape and width of the distribution such that its mean is a=b and its variance is a=b 2 . We use a ¼ 1, in which case the expression simplifies to a decaying exponential, b expðÀbdÞ (see Fig. 2). We discuss the selection of b in Section 3.3.
It is not currently clear if our choice of Eq. (20) is optimal; other superior choices could exist. The gamma distribution is, however, commonly used in conjunction with Gaussian priors.
Finally, the last term in Eq. (17), pðsÞ, is a constant, since s is known from experiment. We can therefore drop it and use the experimental s value.
To summarize, our target expression for the conditional probability for P and d, given S and s, is pðP; djS; sÞ / pðSjP; sÞ Á pðPjdÞ Á pðdja; bÞ ð 21Þ with the right-hand side quantities given by Eqs. (18)- (20) and illustrated in Fig. 2. This expression is a form of Bayes' law and constitutes a Bayesian hierarchical model. pðP; djS; sÞ is called the posterior PDF, or simply the posterior. pðSjP; sÞ is called the likelihood and describes how likely it is to measure a certain S given a particular P and s. pðPjdÞ is a called the prior PDF, or simply the prior. It describes our knowledge about P prior to the acquisition of the data (i.e. P is not rough). pðdja; bÞ is a hyper-prior which describes prior knowledge about d. The reason we use a gamma distribution in d is that this makes the total expression a gamma function in d. Eq. (20) is therefore called a conjugate hyper-prior [15]. Similarly, the Gaussian prior in P is conjugate with the likelihood, since the total expression is also Gaussian in P.

Connection to Tikhonov regularization
The particular P that maximizes the posterior probability is the most likely P and is called the maximum a posteriori (MAP) solution: P MAP ¼ max PP0 pðP; djS; sÞ. The same P can be obtained by minimizing the negative logarithm of that probability: max PP0 pðP; djS; sÞ ¼ min PP0 ðÀ log pðP; djS; sÞÞ. Dropping all Pindependent prefactors in pðP; djS; sÞ, taking the negative logarithm, and dividing by s=2 gives À log pðP; djS; sÞ / kS À KPk 2 þ d s This is identical to the Tikhonov functional in Eq. (11) if [15,32,33] Consequently, the P a obtained by Tikhonov regularization is equal to P MAP in the special case where Eq. (23) is true. Eq. (23) is not a general constraint on d or a: s and d are independent variables in the Bayesian model.

Gibbs sampling
In order to obtain visualizable and interpretable results, we need to compute the average and other measures of P over the distribution given in Eq. (21). Due to the multivariate and constrained nature of the expression, this is not possible analytically. Therefore, we need to resort to numerical sampling techniques to generate a set of P and d such that any expectation (e.g. mean) over this set approximates well the corresponding analytical expectation value over the entire posterior distribution, pðP; djS; sÞ.
For this, we use the Gibbs sampler, which is a Markov-Chain Monte-Carlo (MCMC) algorithm for drawing numerical samples from a multivariate probability distribution [34]. The Gibbs sampler generates samples of P and d from pðP; djS; sÞ by successively sampling from the posterior of each variable (P and d) in turn. In our case, the necessary posteriors are pðPjd; S; sÞ and pðdjP; S; sÞ.
To get pðPjd; S; sÞ, we take Eq. (21) and omit terms that do not contain P. As shown in the SI, the resulting expression gives a multivariate normal distribution in P pðPjd; S; sÞ / exp À with mean . Drawing a random sample of P from this multivariate normal distribution under the nonnegativity constraint P P 0 can be done by solving the following randomized minimization problem ( [33], see SI): where v is a random vector sampled from the normal distribution Normalð0; IÞ with an identity covariance matrix, and C L is the lower triangular Cholesky factor of R such that C L C T L ¼ R (see SI). To solve Eq. (25), we use the Fast Non-Negative Least Squares (FNNLS) algorithm [28] (see Section 7).  26), therefore introducing minimal bias [33]. Initially, to achieve such an uninformative flat distribution, we chose b values between 10 À6 and 10 À3 , which render the decay rate of the exponential very slow [33], as in Fig. 2. We found that b had to be 10 À5 or smaller to avoid influencing the results. This choice resulted in estimates for a that were unreasonably large, so we instead used the LOO-CV estimate of a to select b. Recalling the relationship between d and a from Eq. (23), we set the mean of Eq. (26) ðã=bÞ equal to a 2 LOOCV s and solve for b to obtain: b ¼ã where P init is calculated via Eq. (25) using d ¼ a 2 LOOCV s. To draw random samples from the gamma distribution, we use the algorithm of Marsaglia and Tsang [35]. The Gibbs sampler explores the full (P, d) space by taking random steps (Monte Carlo) along each parameter dimension, using the last value sampled for the other parameter (Markov Chain) [15,36]: The symbol $ denotes drawing a random sample from the distribution to its right, and subscripted t refers to the sample index. Note that d t , the sampled value of d, is used immediately in the generation of a sample of P, P t .
The Gibbs sampler produces a chain of P and d values that is guaranteed to converge to the true distribution [15,17,34]. Upon convergence, the algorithm yields a large set of (P t , d t ) samples, distributed over parameter space such that their density is proportional to pðP; djS; sÞ. The initial part of the chain (called the burn-in) is usually discarded, since initially the distribution is not yet stationary and depends on the particular choice of the starting value.
Assessment of convergence needs to be done carefully. We consider the sampling converged when the 2nd, 25th, 50th, 75th, and 98th percentiles for each point in P change by less than 10 À3 nm À1 .
For more easily computed values such as a and the mean of P, we use a multi-chain convergence test [33,37]. This method requires that multiple chains are run from different starting points. At each convergence check, the intra-and inter-chain variances of some test statistic are compared. When the inter-and intra-chain variances coincide, the sampler is considered converged. The details are discussed in the SI.

Analyzing the Gibbs samples
Next, we calculate the median and spread estimates for the set of P vectors and d values from the Gibbs sampler to obtain measures of uncertainty for P and d. For each r j , the values of P t ðr j Þ are sorted into quantiles, using method 8 from [38]. Specifically, we calculate the 2nd, 25th, 50th, 75th, and 98th percentiles. We generate distance distribution plots, where the median P (50th percentile) is shown as a line, and the other quantiles are used to draw error bands. The 2nd and 98th percentiles encapsulate the 96% Bayesian credible interval (BCI). This interval indicates that the parameter in question has a 96% probability of falling within the interval [17]. Likewise, the 25th and 75th percentiles bound the 50% credible interval.
We analyze the distribution of d, and consequently a, in a similar fashion. From the set of d samples, we obtain a mean and the root-mean-square deviation. The method therefore automatically determines a, starting from the LOO-CV estimate, without the need of user choices and the danger of user bias and inconsistency. However, any systematic error in the LOO-CV estimate will affect the final results. We next show that the Bayesian method described above captures well the effect of noise on P. Fig. 3 shows DEER time-domain traces generated from two model distance distributions (one unimodal and one bimodal) and three different levels of noise. For these data, six independent chains per data set converged within 11,500-20,500 iterations. The sets of P and d from the resulting chains represent the posterior distribution and are illustrated in Fig. 4. The results are plotted in terms of a, rather than d. This scatter plot of the value of P at r ¼ 3 nm vs. a for each iteration of two chains of the Gibbs sampler reveals that most of the samples fall near to the model value for P and the a that best recovered the model through Tikhonov regularization. As the distance from the maximum of the posterior increases, the density of points decreases. To illustrate the Markov chains that built this particular posterior, the initial values and the first 25 iterations for two chains are highlighted. The sampler alternates between steps in the d and P dimensions. The initial value starts the Markov Chain in a region of low probability, and then quickly finds the region of high probability, sampling around the posterior. Though highly informative, this view of the posterior would be difficult to visualize for all n ¼ 400 points in each P considered, so we examine the results in terms of quantiles.

Credible intervals
The resulting median and the 50% and 96% credible intervals obtained from the Gibbs sampler for simulated data are shown in Fig. 5. The width of the credible intervals increases with the noise level and decreases with increasing d: variance is reduced at the cost of increased bias (see SI), but s dominates. High noise levels may lead to much more drastic deviations in the distance distribution. Bimodal distributions display wider credible intervals than unimodal distributions for a given noise level, again matching our expectations. The average and standard deviations of a, obtained from d via Eq. (23), are given in Fig. 5.
Small features adjacent to the primary peaks are likely to be artifactual. For example, the small feature at 4.8 nm in the unimodal PðrÞs in Fig. 3 is shown in Fig. 5 to be indistinguishable from zero and therefore artifactual at all noise levels considered.
The credible intervals in Fig. 5 show the persistence of a second peak at 3.9 nm for bimodal PðrÞs in the mid-and high-SNR cases. In the low-SNR case, where Tikhonov regularization is unable to recover a bimodal distribution due to poor data quality, the credible intervals reflect that the second peak cannot be resolved. In some cases, as in Fig. 5 top right panel, the BCIs do not encapsulate all of the original model distribution. This is likely due to a systematic error introduced by the LOOCV method that is unable to deter-mine the optimal regularization parameter. This error affects the MCMC procedure because the LOOCV estimate is used to determine the hyperprior.

Histograms
By generating histograms of single-point estimators such as the mean, the mode, or the variance for each P t , we can gain additional insight. Fig. 6 shows an example for the model data from Fig. 3. The histograms for the high-noise cases are larger than those for the low-and medium-noise cases because the Gibbs sampler took many more iterations to converge. In both the unimodal and bimodal cases, the location of the most populated distance does not significantly change with increasing SNR, though it does become more precise. This stability of the mode with respect to noise has been previously observed [11]. In contrast, the peak width is statistically significantly different at different noise levels, and unrelated to the precision of the mode. However, the confounding effect of overestimation of d due to the systematic error in the LOOCV estimator, which would tend to increase peak width via oversmoothing, cannot be ruled out. The mean likewise shifts with increasing noise.

Regularization parameter
In Tikhonov regularization, error assessment for a is not performed. One distinct advantage of the method described here is that it generates a distribution of d and thereby a, allowing statistical analysis of this parameter. Fig. 7 shows an example. For the high-and medium-SNR cases from Fig. 3, the Bayesian MCMC approach generates a narrowly distributed set of as close to the value that minimizes error between P a and the model P. The discrepancy is greater in the low-SNR case. LOO-CV outperforms the 'L'-curve, but both heuristic methods overestimate a. This systematic overestimation may result in an underestimation of the credible intervals (see Fig. S2).

Mixed-width distribution
One example that is frequently brought up to describe the limitations of Tikhonov regularization is a bimodal PðrÞ consisting of a wide and a narrow Gaussian peak of roughly equal area (Fig. 8) [11,14]. From the noisy time-domain signal corresponding to this PðrÞ, Tikhonov regularization predicts the narrow peak well, but represents the wide peak as a combination of features that have widths similar to that of the narrow peak. The smoothness bias in Tikhonov regularization applies approximately equally over the entire range of r and, in this case, this bias prevents faithful recovery of the original mixed-width bimodal PðrÞ. The Bayesian method outlined here provides error bands that visually indicate   23)). This distance corresponds to the peak maximum of the model P. Chain 1 begins at a ¼ 1 and chain 2 begins at a ¼ 100. The dashed line in the a dimension corresponds to the a that minimizes the error between the Tikhonov solution and the model P.
After a brief burn-in for both chains, the algorithm finds and samples the posterior distribution, visiting regions of high probability more often than those of low probability. The offset from the model values is due to noise and the systematic bias introduced by the LOO-CV estimate of a.
that the shape of PðrÞ is quite uncertain in the area of the wide feature. This shows that the wide peak cannot be recovered as faithfully as the narrow feature.

Discussion
Although not the only source of uncertainty in DEER distance distributions, noise in the time-domain signal contributes spurious features and partially determines the degree of regularization required. Researchers often go to great lengths to minimize the presence of noise. As a first step towards full uncertainty quantification in DEER analysis, the present work successfully quantifies the uncertainty in PðrÞ stemming from experimental noise. This will make the interpretation of DEER data easier and more robust, as more quantitative arguments about the significance or insignificance of a given feature can be made.

Interpretation guidelines
Bayesian credible intervals are a natural way to visualize the uncertainty in PðrÞ given a certain noise level in the data. Features lying within a given credible interval are statistically indistinguish-able at that level of credibility. The credible intervals can therefore guide interpretation of DEER distance distributions and help guard against overinterpretation. Next, we consider the following typical interpretative questions: (a) Significance of small peaks, (b) presence of shoulders, and (c) relevance of the difference between two distributions.
(a) Small side peaks are not uncommon in experimental DEER distance distributions. With the credible intervals, it is possible to assess whether they are significant or not. If the credible interval of a small peak includes the baseline, then the small peak is statistically indistinguishable from zero at that level of credibility and must be considered insignificant given the data. For example, in the top left panel of Fig. 5, the small feature at about 4.7 nm can be considered insignificant. Likewise, the small feature about 1.7 nm in Fig. 8 can be discounted. (b) Shoulders in distributions can indicate conformational subpopulations. If a straight reference line (or a spline) connecting two points on the median distribution to the left and to the right of the shoulder fully falls within the credible intervals, then the shoulder is not significant. In such a case, a P with a shoulder is statistically indistinguishable from a P without a shoulder. For example, in the bottom right panel of Fig. 5, a reference line connecting the median P from the left and right of the higher distance feature about 3.9 nm runs outside of the credible interval, implying that the shoulder is statistically significant. In Fig. 8, the shoulder feature at about 2.6 nm can be considered insignificant, as a line connecting the median P local maximum at about 3.2 nm to the same curve at about 2.2 nm falls entirely within the BCI. Overall, this indicates that both the Tikhonov solution and the P median show detail between 2.5 and 3.5 nm that is not significant. (c) When comparing two distributions, the credible intervals allows the practitioner to quickly identify whether the distributions are significantly different. If the credible intervals of two P do not overlap, then they are different. If there is overlap in the credible intervals, then significance testing can be employed to answer basic questions such as: are the mean spin-spin distances significantly different between these two samples? For example, the BCIs in Fig. 5 for the unimodal and bimodal high-or mid-SNR cases are clearly distinct, revealing a statistically significant difference. Additionally, one can construct histograms as in Fig. 6 and perform a Student's t-test. Fig. 6 demonstrates that PðrÞ peak width can be significantly affected by experimental noise. This implies that quantifying uncertainty in the width of PðrÞ is crucial in studies that compare the relative widths of distributions.

Comparison with other methods
In the literature, the quality of a given PðrÞ is most often assessed by examining the time-domain data, considering the magnitude of the regularization parameter, and then making a qualitative judgement call (see SI). Several attempts have been made to quantify uncertainty in DEER. Those quantitative estimates of error from signal noise that the authors have encountered come from one of three methods: the noise feature of the Validation Tool in the software DeerAnalysis [13]; error propagation [13,27]; and various model-based approaches [14,24,[39][40][41][42][43][44][45].

DeerAnalysis
DeerAnalysis [13] is the most commonly used software for DEER data processing, and it uses Tikhonov regularization to calcu-   the category of bootstrapping methods. The user sets several parameters, including the magnitude of the added synthetic noise and the number of iterations. The noise is therefore not guaranteed to be equal in magnitude to the experimental noise level, compromising accuracy. Convergence is likewise not guaranteed. The error bars displayed are the AE2r bands of the set of calculated P vectors. This visualization is useful if the population is normally, or close to normally, distributed. This is not guaranteed to be the case, especially where P is zero or close to zero. The confidence intervals often include regions of negative probability, which is not physically valid, so negative valued indices of the AE2r bands are set to 0. No error analysis regarding the uncertainty in the choice of a can be performed.
Compared to the validation tool in DeerAnalysis, the present method is more time consuming because of the large number of required iterations. DeerAnalysis also considers additional sources of uncertainty, such as background correction and pulse bandwidth. However, the Bayesian method features convergence criteria, utilizes real noise information, reduces user bias, and provides uncertainty estimates for a.

Propagation
The next method is more limited in scope than either DeerAnalysis or the proposed Bayesian approach. The variance of the noise in SðtÞ, r 2 S , can be propagated from the time domain to the distance domain using r 2 P ¼ r 2 where the primes indicate that only rows and columns in Q a (as defined in Eq. (13)) are retained that correspond to the indices where the constrained Tikhonov solution is non-zero. Commercial EPR spectrometers store acquired data only after signal averaging, so noise statistics are usually unavailable. In this case, the fit RMSD (root mean square difference) between the Tikhonov fit KP and the data S is often used as a stand-in for r S . As a result, any error in the fit will corrupt the calculation. This approach gives no error estimate for regions where the Tikhonov solution is zero. This method is used in the program FTIKREG [27], which was used by DeerAnalysis until 2016. We have used this method in the past [46,47].
Unlike the propagation method, for which true noise statistics are usually unavailable, the proposed Bayesian method uses observed noise information. It also provides estimates for the full r domain, including regions where PðrÞ has been constrained to non-negative values. Finally, the propagation method considers a to be given and fixed, and so provides no uncertainty estimates for the level of regularization applied.

Model fitting
In addition to the Bayesian approach, it is possible to use classical frequentist methods. One such approach to quantify uncertainty is to assume a particular model for PðrÞ, fit the model to the data, and then statistically evaluate the error in the various fit parameters, usually with some variant of the v 2 cost function.
The Bayesian method proposed here quantifies uncertainty arising from time-domain noise and regularization, ameliorating one of the disadvantages associated with Tikhonov regularization compared to model-based approaches. The Bayesian approach so far does not take into account background correction. As with Tikhonov regularization, the proposed method is model-free with respect to PðrÞ. The credible intervals produced by the present method do not include any negative values.
With the Bayesian credible intervals in hand, it is now possible to quantitatively compare model-based and model-free methods. This will require an extensive benchmark set of distributions that are likely to be encountered in practice.

Conclusions
We introduce a Bayesian statistical method for the analysis of DEER spectroscopic data. The method extends the existing method of Tikhonov regularization: First, it not only determines PðrÞ, but it also fully quantifies the associated uncertainty (error) due to noise in the data. Second, it automatically determines the regularization parameter a and provides an associated error estimate. The uncertainty regarding a is included in the overall error estimates for P.
The statistically determined uncertainty estimates are visualized using quantiles. This is an important improvement to the application of DEER spectroscopy as it guides interpretation, reduces practitioner bias, increases reliability, and helps reproducibility.
Although it outperforms current methods, the LOOCV method, used to set the hyperparameter b in the d hyperprior (Eq. (20)), introduces a systematic overestimation of the level of regularization. This error is propagated into the final results, where it may have the effect of underestimating the credible intervals, and should be kept in mind. We hope that a better hyperprior will be found in the future to eliminate this bias.
The Bayesian treatment is far from complete. There are still many other sources of uncertainty affecting the DEER distance distribution that need to be addressed. The error estimates determined here do not capture uncertainty arising from background correction, time-domain truncation, and discretization. Uncertainty in the background correction, in particular, can give rise to very significant uncertainty in the distance distribution [11,13]. The proposed method may be combined sequentially with the validation tool of DeerAnalysis to assure that a good choice for the background function has been used. The Bayesian statistical approach outlined here is flexible, general and robust enough to be extended to capture the influences of, and correlations between, these other sources of uncertainty.

Methods
All computations were carried out using MATLAB R2016a [48]. The dipolar kernel matrix K was calculated using Fresnel integrals [49]. Non-negativity constrained minimization for Tikhonov regularization and for Gibbs sampling was done using the Fast Non-Negative Least Squares (FNNLS) algorithm [28]. Each minimization took between 0.01 and 0.1 s on a standard laptop computer. Uniform and normally distributed pseudo-random numbers were generated with the Mersenne Twister pseudo-random number generator (PRNG) as implemented in MATLAB R2016a using a seed value of 19,560 to generate model data and 12,345 for Gibbs sampler runs. The PRNG was reset before each MCMC run, but not during a run, i.e. not between chains. The gamma-distributed pseudorandom numbers were generated using the algorithm of Marsaglia and Tsang [35,50].