Bootstrap aggregation for model selection in the model-free formalism

Abstract The ability to make robust inferences about the dynamics of biological macromolecules using NMR spectroscopy depends heavily on the application of appropriate theoretical models for nuclear spin relaxation. Data analysis for NMR laboratory-frame relaxation experiments typically involves selecting one of several model-free spectral density functions using a bias-corrected fitness test. Here, advances in statistical model selection theory, termed bootstrap aggregation or bagging, are applied to 15N spin relaxation data, developing a multimodel inference solution to the model-free selection problem. The approach is illustrated using data sets recorded at four static magnetic fields for the bZip domain of the S. cerevisiae transcription factor GCN4.


Introduction
Since the original publications in the early 1980s, the modelfree formalism of Lipari and Szabo (1982a, b) and the related two-step approach of Halle and Wennerström (1981) have served as starting points for extracting dynamical information about macromolecules from NMR spin relaxation data.The original approaches represented intramolecular dynamics using a single generalized order parameter and effective correlation time.In the ensuing decades, increasingly complex models have offered a more refined understanding of internal and overall molecular motions.Extended model-free formalisms characterize intramolecular dynamics using generalized order parameters and effective correlation times for more than one timescale (usually two) (Clore et al., 1990;Gill et al., 2016).Related approaches employ discrete or continuous distributions to more fully capture the range of intramolecular correlation times (Lemaster, 1995;Calandrini et al., 2010;Khan et al., 2015;Hsu et al., 2018Hsu et al., , 2020;;Smith et al., 2019).Other strategies employ physical models or atomistic molecular dynamics simulations for overall rotational diffusion and internal conformational fluctuations, to more directly link the NMR phenomena to underlying physical processes (Tugarinov et al., 2001;Zerbetto et al., 2013;Ollila et al., 2018;Polimeno et al., 2019a, b;Mendelman et al., 2020;Mendelman and Meirovitch, 2021).The availability of extended model-free formalisms, or other approaches with variable numbers of parameters, has created a further dilemma: should a data analysis protocol extract the most exacting information justified by the data or employ the model most robust to experimental variation?
Several authors have addressed model selection by employing the principle of parsimony or Occam's razor (Palmer et al., 1991;Stone et al., 1992;Mandel et al., 1995;d'Auvergne and Gooley, 2003;Chen et al., 2004).These approaches seek to identify the simplest model that explains the data within experimental uncertainties by applying various bias-correcting penalties to the fitness statistic, e.g., F statistic, Akaike information criterion (AIC), or Bayesian information criterion (BIC).These corrections alone often fall short of producing robust inferences and may yield parameter values susceptible to instability in both simulated and realworld replicates.In these situations, the model selection process has failed the principle of "worrying selectively".This criterion suggests, "Since all models are wrong, the scientist must be alert to what is importantly wrong."(Box, 1976).
To illustrate the issue more concretely, a typical data analysis protocol uses a nonlinear weighted least-squares algorithm to fit experimental spin relaxation data with a set of Published by Copernicus Publications on behalf of the Groupement AMPERE.
model-free spectral density functions (Mandel et al., 1995;Gill et al., 2016).The resulting χ 2 residual sum-of-squares variables are penalized for the number of adjustable parameters in each model function, the model with the lowest penalized residual sum-of-squares is selected as optimal, and the best-fit parameters of the model are reported.However, this procedure is subject to model selection error: random statistical variation in the experimental data may lead to one model chosen as optimal for a given data set, but another model, with different set of parameters, may be selected if the experimental data were replicated, with consequent different random variation.The problem of joint model selection and parameter estimation has been explored elegantly by d' Auvergne andGooley (2007, 2008a, b) and by Abergel et al. (2014).
The present paper addresses model selection error by using the approach of bootstrap aggregation or bagging.This concept originated from a desire to improve the performance of machine learning algorithms.Thus, Breiman showed that predictor accuracy and stability improved when averaging predictor values obtained from bootstrap replicates of the original training set (Breiman, 1996).Buja and Stuetzle subsequently extended the use of bagging to generalized statistical analysis and showed sampling with and without replacement yield equivalent improvements (Buja and Stuetzle, 2006).The approach and notation of Efron are used in the following (Efron, 2014).
Bootstrap aggregation improves parameter stability; consequently, the resulting variations in model-free parameter values, for example between atomic sites or functional states in a given macromolecule, are more likely to be biologically or chemically meaningful.Although applicable to most model selection situations, bootstrap aggregation exhibits the most pronounced benefits when the data justify two distinct models with similar degrees of certainty.
Bootstrap aggregation for model-free analysis of NMR spin relaxation rate constants is illustrated by application to backbone amide 15 N spin relaxation data that have been recorded at 1 H magnetic fields of 600, 700, 800, and 900 MHz for the bZip domain of the S. cerevisiae transcription factor GCN4 by Gill and coworkers (Gill et al., 2016).

Theory
In the following, the notation used by Efron is rephrased in terms appropriate for NMR spin relaxation data (Efron, 2014).Laboratory-frame nuclear spin relaxation rate constants for backbone 15 N spins can be transformed into sets of spectral density function values, J (ω), in which ω is an eigenfrequency of the spin system (Farrow et al., 1995;Gill et al., 2016).Laboratory-frame 15 N relaxation rate constants, typically R 1 , R 2 , and the steady-state nuclear Overhauser enhancement (NOE), recorded at a single static magnetic field yield estimates of J (0), J (ω N ), and J (0.87ω H ), in which ω N and ω H are the 15 N and 1 H Larmor frequencies.Thus, the number of spectral density values N = 3G, in which G is the number of static magnetic fields utilized.In the present application, G = 4.The set of experimental spectral densities is described using the following notation: in which the y j = J (ω j ) are ordered in increasing values of ω.The values of J (0) are ordered additionally by increasing values of the static magnetic field.The experimental data sets utilized in the present work are not affected by chemical exchange contributions to spin relaxation, but such contributions can be taken into account from the field dependence of transverse relaxation rate constants prior to the model-free analysis (Kroenke et al., 1998).
The extended model-free spectral density function used to fit 15 N spin relaxation data is given by the following: The set of possible model parameters in this function are given by the following: in which τ m is the (effective) overall rotational correlation time, S 2 f is the square of the generalized order parameter for internal motions on a fast (τ f ≤ 150 ps) timescale, and S 2 s is the square of the generalized order parameter for internal motions on a slow (τ s > 150 ps) timescale (vide infra).The square of the generalized order parameter S 2 = S 2 f S 2 s .Overall rotational diffusion has been assumed to be isotropic for simplicity; this assumption can be relaxed as needed (Lee et al., 1997).The spectral density data are fit with a set of nested models.The full model, Model 5, contains all five parameters, while simpler models, Models 1-4, are generated by fixing the value of one or more parameters, effectively removing such parameters from the model.Thus, The optimal model t 1 and associated parameter values µ are obtained as follows: using the lowest penalized residual sum-of-squares as described above.In the present work, the small-sample AIC C criterion was used for model selection (Hurvich and Tsai, 1989).
In general, a non-parametric bootstrap sample is generated by draws with replacement from the original data y and defined as follows: in which i = 1, . ..B and B is the total number of bootstrap samples.The nature of spectral density data requires care in generating bootstrap samples, and the particular procedure employed in the present work is described in Sect.3. A conventional non-parametric bootstrap determination of the standard deviations of the parameters μ begins by determining fitted parameters for the ith bootstrap sample as follows: in which the fitting model is fixed to the optimal model selected in fitting the original spectral density values, and only model parameter values are optimized.The bootstrap estimate of the standard deviation for the kth parameter is derived from the following expressions: In the conventional approach, the reported results of the data analysis are { μk } and { σ * k }.Model selection error is not assessed.This form of bootstrap simulation is an alternative to Monte Carlo simulations to determine parameter uncertainties, which could be regarded as parametric bootstrap simulations (vide infra).
In contrast to the conventional procedure, bootstrap aggregation determines both the optimal fitted model and associated model parameters for each bootstrap sample.Thus, the optimal model t i is determined for the ith bootstrap sample using the same model selection strategy as for the original data as follows: Unlike the conventional bootstrap procedure, the different members of the set μ * i obtained by bootstrap aggregation represent different models as well as different sets of optimized parameters.The aggregated, or smoothed, estimator of the kth model parameter is given by the following: To make the above formalism concrete, suppose that for a given set of spectral density values, model selection and parameter optimization for B bootstrap samples yields B 2 samples in which Model 2 is optimal and B 3 samples in which Model 3 is optimal, with B = B 2 + B 3 .The bootstrapaggregated estimates of S2 f and τf are given by the following: because Model 3 fixes S 2 f = 1 and τ f = 0.As another example, suppose that for a given set of spectral density values, model selection and parameter optimization for B bootstrap samples yields B 4 samples in which Model 4 is optimal and B 5 samples in which model 5 is optimal, with B = B 4 + B 5 .The bootstrap-aggregated estimates of S2 f and τf are given by the following: because both Models 4 and 5 fit S 2 f as a parameter, but Model 4 fixes τ f = 0.
A smoothed standard deviation for μ can be obtained using the plug-in principle (Efron, 2014).Here, the cumulative distribution functions for the parameters of interest are estimated using the empirical distribution function of the bootstrap replicates.Using the above notation, the number of times that the ith bootstrap replicate, y * i , contains the spectral density value y j is given by the following: With this definition, Y * i is a vector enumerating the representation of each original data point in the ith bootstrap replicate as follows: Further, the average representation of the original spectral density value y j across the B bootstrap replicates is given by the following: The covariance between the representation of the j th spectral density value and the kth model-free parameter value across B bootstrap replicates is given by the following: https://doi.org/10.5194/mr-2-251-2021Magn.Reson., 2, 251-264, 2021 Finally, the smoothed estimate of the standard deviation for the kth model-free parameter is calculated from the following expression: In bootstrap aggregation, the reported results consist of the smoothed estimators { μk } and { σk } incorporating the effects of model selection uncertainty.As noted by Efron, σk ≤ σ u k , in which σ u k is obtained using Eq. ( 8) naively applied to the bootstrap-aggregated data (rather than to data analyzed with a fixed model as above) (Efron, 2014).

Methods
Backbone amide 15 N spin relaxation data have been reported at G = 4 1 H static magnetic fields of 600, 700, 800, and 900 MHz for the bZip domain of the S. cerevisiae transcription factor GCN4 by Gill and coworkers (Gill et al., 2016).Experimental values of R 1 , R 2 , and the steady-state NOE measured at each magnetic field for each residue were converted to spectral density values using the following expressions (Farrow et al., 1995;Gill et al., 2016): NH , r NH = 0.102 nm is the N-H bond length, and σ = −172 ppm is the 15 N chemical shift anisotropy.A single value of J (0) was obtained for each residue as the weighted mean (using propagated experimental uncertainties) of the values obtained from the G static magnetic fields.The uncertainty in the mean J (0) was obtained by jackknife simulations.For each residue, the spectral density values used for model fitting consist of the mean J (0), G values of J (ω N ), and G values of J (0.87ω H ), for a total of nine data points.
To avoid such highly unrepresentative possibilities, bootstrap samples were generated by enumerating the 19 3 = 6859 possible arrangements in which at most two spectral density values from each set of J (0), J (ω N ), and J (0.87ω H ) are duplicated.The 19 possible arrangements of the G = 4 indices {1, 2, 3, 4} and corresponding Y ij for selecting bootstrap samples of J (0), J (ω N ), and J (0.87ω H ) are shown in Table 1.In this table, p ij is a pointer vector selecting data from a particular set of spectral density values.For example p 4j = [1, 2, 3, 1]; applying this pointer to the set of J (0) values would select the J (0) values obtained at 600 (×2), 700, and 800 MHz.The corresponding counter vector Y * 4j = [2, 1, 1, 0] is the numbers of times J (0) values recorded at the different fields were sampled.The process would be repeated for the other sets of spectral density values.For example, the i = 1260th bootstrap sample uses p 4j to select J (0), p 10j to select J (ω N ), and p 6j to select J (0.87ω H ). The full vector Y * i of length N = 12 is obtained by concatenating the individual Y * 4j , Y * 10j , and Y * 6j vectors from the table.With this procedure, the first bootstrap sample is identical to the original data.The mean and uncertainty were determined for J (0) for each bootstrap sample as described above for the original data so that fitting of bootstrap samples was performed in the same fashion as for the original data.
The data were analyzed using three procedures.First, a conventional analysis, Eq. ( 4), was performed in which optimal models t 1 and model parameters { μk } were determined for each amino acid residue (for which data were available) using AIC C .The uncertainties in model parameters, denoted { σk }, were determined by 500 Monte Carlo simulations using the measured experimental uncertainties in the spectral density values (Gill et al., 2016).Second, the optimal model was determined as in the first procedure, but the uncertainties in model parameters, { σ * k }, were determined by the conventional bootstrap, using Eq. ( 8).In both of these approaches, error estimates were obtained while fixing the model for each Monte Carlo or bootstrap sample as the optimal model t 1 selected against the original data.Third, the smoothed model parameters { μk } and uncertainties { σk } were determined by bootstrap aggregation using Eqs.( 10) and ( 19), respectively.In this approach, the optimal model and parameters were determined individually for each bootstrap sample as in Eq. ( 9).A flowchart outlining the process of performing bootstrap aggregation is shown in Fig. 1.Both Models 2 and 3 contain a single generalized order parameter and a single internal effective correlation time.The model selection strategy employed herein assigns Model 2 if the internal correlation is < 0.15 ns and Model 3 if the internal correlation time is ≥ 0.15 ns (vide infra).
Values of a local τ m were optimized for each residue in the well-ordered coiled-coil domain of the protein (residues 26-55).Values of τ m for residues in the basic region (residues 3-25) and disordered C-terminus (residues 56-58) were fixed at 17.5 ns, the average value obtained for ordered residues.A similar approach was used by Gill and coworkers in the original analysis of the relaxation data (Gill et al., 2016).Local values of τ m can be used to determine the overall rotational correlation time or diffusion tensor using established methods (Lee et al., 1997).Alternatively, the fitting process could be modified to globally optimize the overall rotational correlation time or diffusion tensor while independently optimizing generalized order parameters and correlation times for individual residues (Mandel et al., 1995).In this scenario, bootstrap aggregation for the internal dynamical parameters would be performed by the same approach as used herein.

Results
The results of the conventional analysis using AIC C for model selection and Monte Carlo error estimation are shown in Fig. 2. Each of the Monte Carlo simulations was analyzed using the optimal model determined from the original data.The optimal parameters differ slightly from those reported by Gill and coworkers because the present approach used a different spectral density function and model selection method compared to the earlier work (Gill et al., 2016).The results of the conventional analysis using AIC C for model selection and bootstrap resampling for error estimation are shown in Fig. 3.Each of the bootstrap data sets was analyzed using the optimal model determined from the original data.The results for bootstrap aggregation using AIC C to determine the optimal model for each bootstrap sample are shown in Fig. 4. The bootstrap-aggregated smoothed model-free parameters were calculated using Eq. ( 10), and the smoothed parameter uncertainties were calculated using Eq. ( 19).
Bootstrap simulations in which a single optimal model is utilized provide an alternative to Monte Carlo simulations for estimation of (unsmoothed) parameter uncertainties.The uncertainties in σ (S 2 ) obtained from Monte Carlo simulations and σ * (S 2 ) obtained from conventional bootstrap simulations are compared in Fig. 5a.The uncertainties have approximately the same range but are uncorrelated with each other.These results suggest the non-parametric bootstrap samples simulate the actual data distribution in comparable manner as the parametric Monte Carlo simulations but without assuming a normal distribution of spectral density values.The smoothed parameter uncertainty obtained from Eq. ( 19) is compared to the uncertainties from Monte Carlo simulations in Fig. 5b.The increase in σ (S 2 ) compared to σ (S 2 ) reflects the effect of model selection uncertainty.As noted by Efron, the estimate of smoothed parameter uncertainty obtained from Eq. ( 19) is smaller than the naive estimate obtained by applying Eq. ( 8) to the aggregated bootstrap samples (Efron, 2014).To illustrate the advantage of Eq. ( 19), Fig. 5c compares σ u (S 2 ) obtained from Eq. ( 8) and σ (S 2 ) obtained from Eq. ( 19).Similar trends are observed for other model-free parameters (not shown).
Table 2 shows the values of AIC C for each model fit to the original spectral density and the percentage that each model was chosen in the bootstrap aggregation.2, but the uncertainty estimates differ.Regions of the protein are colored as basic region 1 (residues 3-12) (reddish-purple), basic region 2 (residues 13-25) (green), coiled-coil (residues 26-55) (black), and disordered C-terminus (residues 56-58) (orange) (Gill et al., 2016).Values of S 2 , τ m , S 2 f , τ f , S 2 s , and τ s are plotted vs. residue number.Regions of the protein are colored as basic region 1 (residues 3-12) (reddish-purple), basic region 2 (residues 13-25) (green), coiled-coil (residues 26-55) (black), and disordered C-terminus (residues 56-58) (orange) (Gill et al., 2016). of model-free parameters determined from the optimal model for each bootstrap sample.The calculated spectral density function for bootstrap aggregation is compared to the fitted spectral density functions for each model in Figs. 9, 10, and 11.

Discussion
The difficulties posed by conventional model selection strategies, in which a single optimal model is chosen using AIC C or other fitness statistic, are illustrated for the bZip domain of GCN4 in Fig. 2. In particular, some residues in the basic region (residues 3-25) are analyzed using Model 4, in which τ f = 0 and other residues are analyzed with Model 5, in which τ f > 0. The resulting values of the other model-free parameters are systematically affected depending on whether or not τ f = 0.These systematic effects are evident most clearly in the scatter in S 2 f and τ s for residues in the basic region.The advantages of bootstrap aggregation in smoothing over variability in model selection is evident in Fig. 4, in which the residue-to-residue variability of the model-free parameters is reduced.Thus, the distributions of τ f and τ s are much more uniform within the four regions of the protein, suggesting rather uniform timescale processes in each subdomain.The similarity in the distributions for σ (S 2 ) and σ * (S 2 ), shown in Fig. 5a, indicates that the bootstrap procedure adequately samples the distribution of parameter values.That is, the reduction in parameter variability from bootstrap aggregation does not result from restricted sampling.
The results shown for residue Arg 11 in Tables 2 and 3 and Figs.6 and 9 illustrate the mechanics behind bootstrap aggregation.The original optimization against the measured data yielded AIC C values of 33.3 for Model 4 and 34.1 for Model 5.The conventional analysis then selects Model 4 (with τ f = 0) as optimal, even though AIC C for Model 5 is only slightly larger.In contrast the bootstrap analysis suggests that Model 4 would be optimal for 24 % and Model 5 would be optimal for 76 % of randomly chosen data, under the assumption that the bootstrap samples represent the underlying distribution of spectral density values.Bootstrap smoothing then averages each model-free parameter over the empirical distributions shown in Fig. 6, with resulting optimized spectral density curves compared to the original experimental data in Fig. 9.The results for Model 4 in Table 3 and the corresponding vertical orange line in Fig. 9 show that the selection of Model 4 in the conventional analysis results in an estimate for τ s that is skewed toward the lower boundary of the τ s bootstrap distribution.
The results shown for residue Arg 26 in Tables 2 and 3 and Figs.7 and 10 illustrate another advantage of bootstrap aggregation.In this case, the original optimization against the measured data yielded an AIC C value of 23.4 for Model 2, substantially smaller than for any other model, implying a single model might be an adequate description for this residue.However, the bootstrap distribution for the internal correlation times is bimodal.The conventional choice of Model 2 results in an estimate of τ f roughly centered in the distribution, but the smoothed bootstrap estimates identify the presence of two separable timescales for internal motions, one with a mean of 0.052 ± 0.019 and the other with a mean of 0.13 ± 0.08.Residue 26 is at the juncture between the basic region and coiled-coil motif of the GCN4 bZip domain; consequently, the latter effective internal correlation time might represent a vestige of the more pronounced motions evident in the basic region.The critical value of 0.15 ns used to separate fast from slow motions in the present work was chosen empirically to distinguish the two distributions observed for residue 26 (and used for all other residues).More sophisticated clustering algorithms could be used to make this distinction between Models 2 and 3.
The results shown for residue Asp 32 in Tables 2 and 3 and Figs. 8 and 11 illustrate a case of strong agreement between the conventional analysis and bootstrap aggregation when a single motional model is strongly favored by the experimental data.The distributions shown in Fig. 8 then represent the variability in model-free parameters across the bootstrap samples.These results would be comparable to results obtained in Fig. 3, in which the bootstrap samples were used to estimate model-free parameter uncertainties σ * k for a single fixed optimal model.The present application of bootstrap aggregation used spin relaxation data recorded at four static magnetic fields.A total of 6859 bootstrap samples were used to calculate smoothed parameter estimates.Data recorded at three static magnetic fields provide nine spectral density values but allow only 7 3 = 343 bootstrap samples.tic reduction in the size of the bootstrap sample, the relaxation rate constants recorded at 600, 800, and 900 MHz were analyzed for the disordered basic region (residues 3-25).This preserves the same range of sampled frequencies as for the original analysis, but only seven spectral density values are obtained for each residue, after averaging the three values of J (0).The smaller number of spectral density values results in smaller numbers of degrees of freedom when fitting the model-free spectral density models.As a consequence, only Models 3 and 4 were selected for the basic region in the con-  ventional analysis; essentially, the data were not sufficient to determine τ f and τ s simultaneously (Model 5).Nonetheless, bootstrap aggregation was effective in smoothing the effects of model selection error between Models 3 and 4, even with only 343 bootstrap samples (not shown).A number of studies have investigated the number of model parameters that can be determined from backbone amide 15 N relaxation data recorded at high static magnetic fields (Khan et al., 2015;Gill et al., 2016;Abyzov et al., 2016).The present results suggest that measurements at four static magnetic fields are required to fully statistically characterize the information content of such measurements within the extended model-free formalism.

Conclusions
Model selection error is a classical problem in statistics and has been recognized as a concern in the model-free analysis of NMR spin relaxation data since the work of d'Auvergne and Gooley (2007, 2008a, b)  emerged as a powerful approach for incorporating selection error into statistical model-building (Buja and Stuetzle, 2006;Efron, 2014).However, bootstrap aggregation requires sufficient numbers of data points to allow faithful resampling of the distribution of the data.This issue is made more serious by the nature of nuclear spin relaxation data: spectral density values for J (0), J (ω N ), and J (0.87ω H ) are very different and should not be interchanged by resampling.As shown in the present work, resampling within blocks of spectral density values clustered as J (0), J (ω N ), and J (0.87ω H ) recorded at three or four static magnetic fields is sufficient to enable bootstrap aggregation.However, the larger data set available from four static magnetic fields allows more reliable resolution of two internal correlation times, τ f < 0.15 ns and τ s ≥ 0.15 ns.Aggregation improves parameter stability by averaging over all models represented in the bootstrap sample.As applied to 15 N spin relaxation data for the bZip domain of GCN4, bootstrap aggregation reduces residue-to-residue variations in optimal model-free parameters, particularly in the partially disordered basic region.Consequently, trends in the conformational dynamics along the polypeptide backbone that reflect actual physical properties of the protein become more evident.Notably, local maxima in generalized order parameters within the basic region (residues 3-25), most evident for residues 8 and 9 and for residues 14 and 15 in Fig. 4, reflect transient populations of helical conformations observed in molecular dynamics simulations (Robustelli et al., 2013).NMR spin relaxation spectroscopy is a powerful approach for interrogating conformational dynamics of biological macromolecules.Bootstrap aggregation, coupled with experimental NMR spin relaxation mea-surements at multiple static magnetic fields, promises to advance efforts to understand the interplay between conformation and function in biology.
Code and data availability.A Jupyter notebook (Python 3.6) is provided for performing all data analyses reported in the publication.The NMR data analyzed in the publication are available at Mendeley Data (https://doi.org/10.17632/vpwz6mrynr.1;Gill et al., 2021).
Author contributions.AGP conceived the project.Calculations and writing of the paper were performed by TC and AGP.
Competing interests.The authors declare that they have no conflict of interest.Special issue statement.This article is part of the special issue "Geoffrey Bodenhausen Festschrift".It is not associated with a conference.
Acknowledgements.Arthur G. Palmer III acknowledges the support from the National Institutes of Health.Some of the work presented here was conducted at the Center on Macromolecular Dynamics by NMR Spectroscopy located at the New York Structural Biology Center, supported by the NIH National Institute of General Medical Sciences.Arthur G. Palmer III is a member of the New York Structural Biology Center.This paper is dedicated to Prof. Geoffrey Bodenhausen on the occasion of his 70th birthday.
Financial support.This research has been supported by the National Institutes of Health (grant nos.R35GM130398 and P41GM118302).
Review statement.This paper was edited by Malcolm Levitt and reviewed by three anonymous referees.

Figure 1 .
Figure 1.Flow chart for bootstrap aggregation for the model-free formalism.Indices l, m, and n are determined for each bootstrap sample from the index i using modulo arithmetic, in which // represents floor division, and % is the modulo (remainder) operation.The three indices l, m, and n select pointer and counter vectors from Table1.The three pointer vectors are used to generate bootstrap samples for J (0), J (ω N ), and J (ω H ). The three counter vectors are concatenated to form Y * i .

Figure 5 .
Figure 5.Comparison of model-free parameter uncertainties.(a) Uncertainties for S 2 calculated from Monte Carlo, σk , and bootstrap simulations, σ * k , for a single optimal model.(b) Uncertainties for S 2 calculated from Monte Carlo simulations for a single optimal model and smoothed σk calculated from bootstrap aggregation.(c) Uncertainties σ uk and σ for S 2 calculated from bootstrap aggregation, illustrating the smaller variability obtained using Eq.(19) for calculation of parameter sample deviations.

Figure 6 .
Figure 6.Distribution of model-free parameters from bootstrap aggregation for residue Arg 11.Color coding is S 2 f or τ f (reddishpurple) and S 2 s or τ s (blue).The orange line in (c) indicates the value of τ s obtained for the optimal single (unsmoothed) model, Model 4. For clarity, null values of 1 for generalized order parameters and 0 for internal effective correlation times are not shown in the graphs; τ f = 0 is observed 1664 times.

Figure 7 .
Figure 7. Distribution of model-free parameters from bootstrap aggregation for residue Arg 26.Color coding is S 2 f or τ f (reddish-purple) and S 2 s or τ s (blue).The orange line in (c) indicates the value of τ f obtained for the optimal single (unsmoothed) model, Model 2. For clarity, null values of 1 for generalized order parameters and 0 for internal effective correlation times are not shown in the graphs; S 2 f = 1 is observed 2167 times, S 2 s = 1 is observed 3884 times, τ f = 0 is observed 2823 times, and τ s = 0 is observed 3884 times.

Figure 8 .
Figure 8. Distribution of model-free parameters from bootstrap aggregation for residue Asp 32.Color coding is S 2 f or τ f (reddish-purple) and S 2 s or τ s (blue).The orange line in (c) indicates the value of τ f obtained for the optimal single (unsmoothed) model, Model 2. For clarity, null values of 1 for generalized order parameters and 0 for internal effective correlation times are not shown in the graphs; S 2 s = 1 was observed 6650 times.

Table 2 .
Model selection for selected residues.For each residue, the top line lists the AIC C values determined by fitting the original data to Models 1-5.The second line enumerates the percentage of bootstrap samples for which the indicated model exhibited the lowest AIC C .Both Models 2 and 3 contain a single internal effective correlation time.Model 2 is assigned if this correlation time is < 0.15 ns (and Model 3 is not assigned, NA).Model 3 is assigned if this correlation time is ≥ 0.15 ns (and Model 2 is not assigned, NA).

Table 3 .
Model-free parameters for selected residues.For each residue, parameter values for Models 1-5 are calculated from the fit of the original data to the relevant spectral density function, with errors determined by Monte Carlo simulation.The model selected by AIC C is indicated by * .Smooth values are obtained by averaging the best fit parameter values across bootstrap samples as in Eq. (10), with errors determined as indicated in Eqs.(15)-(19).