Nonlinear regression improves accuracy of characterization of multiplexed mass spectrometric assays

The need for assay characterization is ubiquitous in quantitative mass spectrometry-based proteomics. Among many assay characteristics, the limit of blank (LOB) and limit of detection (LOD) are two particularly useful figures of merit. LOB and LOD are determined by repeatedly quantifying the observed intensities of peptides in samples with known peptide concentrations and deriving an intensity versus concentration response curve. Most commonly, a weighted linear or logistic curve is fit to the intensity-concentration response, and LOB and LOD are estimated from the fit. Here we argue that these methods inaccurately characterize assays where observed intensities level off at low concentrations, which is a common situation in multiplexed systems. This manuscript illustrates the deficiencies of these methods, and proposes an alternative approach based on nonlinear regression that overcomes these inaccuracies. We evaluated the performance of the proposed method using computer simulations and using eleven experimental data sets acquired in Data-Independent Acquisition (DIA), Parallel Reaction Monitoring (PRM), and Selected Reaction Monitoring (SRM) mode. When the intensity levels off at low concentrations, the nonlinear model changes the estimates of LOB/LOD upwards, in some data sets by 20-40%. In absence of a low concentration intensity leveling off, the estimates of LOB/LOD obtained with nonlinear statistical modeling were identical to those of weighted linear regression. We implemented the nonlinear regression approach in the open-source R-based software MSstats, and advocate its general use for characterization of mass spectrometry-based assays.

Abbreviations DIA data-independent acquisition LOB limit of blank LOD limit of detection LLOQ lower limit of quantitation/quantification PRM parallel reaction monitoring SRM selected reaction monitoring

Summary
The need for assay characterization is ubiquitous in quantitative mass spectrometry-based proteomics. Among many assay characteristics, the limit of blank (LOB) and limit of detection (LOD) are two particularly useful figures of merit. LOB and LOD are determined by repeatedly quantifying the observed intensities of peptides in samples with known peptide concentrations, and deriving an intensity versus concentration response curve. Most commonly, a weighted linear or logistic curve is fit to the intensity-concentration response, and LOB and LOD are estimated from the fit. Here we argue that these methods inaccurately characterize assays where observed intensities level off at low concentrations, which is a common situation in multiplexed systems. This manuscript illustrates the deficiencies of these methods, and proposes an alternative approach based on non-linear regression that overcomes these inaccuracies. We evaluated the performance of the proposed method using computer simulations, and using eleven experimental datasets acquired in Data-Independent Acquisition (DIA), Parallel Reaction Monitoring (PRM), and Selected Reaction Monitoring (SRM) mode. When the intensity levels off at low concentrations, the non-linear model changes the estimates of LOB/LOD upwards, in some datasets by 20-40%. In absence of a low concentration intensity leveling off, the estimates of LOB/LOD obtained with non-linear statistical modeling were identical to those of weighted linear regression. We implemented the non-linear regression approach in the open-source R-based software MSstats, and advocate its general use for characterization of mass spectrometry-based assays.

Introduction
Assay characterization is important in quantitative bottom-up mass spectrometry-based proteomics. By relating the observed intensities to the concentration of the protein or peptide, it enables quantification of peptides or proteins in clinical research and diagnostics [1,2], or in systems biology research [3]. An important aspect of assay characterization is the estimation of its limit of blank (LOB), limit of detection (LOD) and lower limit of quantification (LLOQ), which quantify the sensitivity and the quantitative reproducibility of the assays. The estimation of these figures of merit in mass spectrometry has been necessary, e.g. for drug of abuse testing [4] or for clinical diagnostics, e.g. with PSA [5]. They are particularly important when the detection limits are near the clinical decision limit, as in the case of troponin [6]. They are also important for evaluating various mass spectrometric workflows [7]. Much technological research aims at estimating and minimizing these values for various experiment types. As of early 2017, over 700 publications indexed on PubMed reported LOB and LOD for mass-spectrometry based experiments. However, there is currently no commonly accepted experimental design or statistical guidelines for deriving these figures of merit, in particular in multiplexed mass spectrometric investigations. Assay characterization involves spiking peptides or proteins in known concentrations in a simple or complex background, or titrating a peptide or protein mixture of known composition. After digestion, separation and spectral acquisition, the observed peak intensities of a peptide are summarized, and related to the true concentrations using statistical models. The statistical models used for this task is the main focus of this manuscript.
Most statistical models rely explicitly or implicitly on the canonical calibration curve [8], which contains three distinct regimes (Figure 1(a)). The noise regime, i.e. the concentrations where peptide intensities are indistinguishable from the chemical and electronic noise. These intensities are referred to as background noise. The linear regime, i.e. the concentrations where the canonical calibration curve is an increasing straight line (referred to as response curve). The slope of the line is the efficiency of the assay. Finally, the saturation regime, i.e. the concentrations where an increase in concentration does not translate into an increase in the intensity of the analyte. The three regimes are defined by the parameters Change and Change Sat in Figure 1(a). Since most experiments do not include concentrations in the saturation regime, this manuscript focuses on the noise and on the linear regime. However, the approach proposed in the manuscript can be easily extended to the full canonical calibration curve.
The concepts of LOB and LOD of a peptide are best defined with respect to the canonical calibration curve, as illustrated in Figure 1 (b) [9]. Assuming that the peptide intensities fall in the linear regime, the LOB is defined as the concentration at which the upper bound of the prediction interval of the background noise intersects the mean intensity of samples with the peptide. The LOD is defined as the concentration at which the upper bound of the prediction interval of the background noise intersects the lower bound of the prediction interval of samples with the peptide. The prediction interval, i.e. the confidence interval of a new intensity (as opposed to the confidence interval of the mean response curve) must be used to accurately reflect the performance of a future assay. The lower limit of quantification or quantitation (LLOQ) is loosely defined as the lowest concentration at which the measurements are acceptable for the intended use [10]. Since multiple definitions of LLOQ exist [11], this manuscript only presents results for the LOB and LOD. However, the discussion can readily be extended to LLOQ.
When mass spectrometric experiments characterize a limited number of peptides, the concentrations are fine-tuned, so that the intensities fall well inside the linear regime, and the canonical calibration curve is observed. Following the canonical calibration curve, statistical analyses fit a linear regression that relates the concentrations and the observed intensities on the original (i.e., not log-transformed) scale. Many methods for fitting linear regression and estimating the LOB, LOD have been proposed [12,13,14,15], compared [9,16,17,18], and implemented in open-source software such as Quasar [19] and Skyline [20]. Most methods acknowledge that the variance of the observed intensities increases with the strength of the intensity, and specify regression with unequal variance, or use a robust estimation procedure.
Increasingly however, mass spectrometric assay characterization is multiplexed, and simultaneously quantifies hundreds or even thousands of peptides using, e.g. Selected Reaction Monitoring (SRM), or more recently Parallel Reaction Monitoring (PRM) or Data-Independent Acquisition (DIA). In these experiments it is often impossible to fine-tune the concentrations of each peptide, and some may fall outside of the linear regime. When the number of concentrations or replicates is relatively small, the intensities from the lower portion of the linear regime are similar to the intensities of the background noise. This creates uncertainty in the shape of the response, and the intensities of the peptide flatten out and produce the apparent calibration curve. The apparent calibration curve is illustrated with the dashed line in Figure 1 (a) and using an experimental dataset in Figure 2. Statistical analysis of multiplexed experiments must appropriately deal with these situations, and a linear model is not appropriate.
To account for the non-linearity of the apparent calibration curve, alternative approaches fitting a logistic response curve [21,22,23] have been proposed. However, as illustrated in Figure 2, they do not fully capture the intensities at low concentrations. Moreover, their estimation fails for peptides that are in linear regime, and for which the intensities do not contain both change points. As a result, the general strategy for the statistical analysis of the entire multiplexed assay characterization experiment is often unclear.
This manuscript proposes to remedy these challenges in highly multiplexed assay characterization experiments. It generalizes the approaches based on linear and logistic curve fits, and enables a more general but intuitive assay calibration. When the concentrations fall entirely in the linear portion of the dynamic range (i.e., in range of concentrations where the canonical and the apparent calibration curves agree), the output of the proposed approach is identical to that of weighted linear regression. When the concentrations deviate from this range and the observed intensities level off, the proposed approach appropriately represents the canonical calibration curve, while accounting for the uncertainty in the estimation. We evaluated the proposed approach using computer simulations, and compared the resulting figures of merit on 11 experimental datasets acquired in SRM, PRM and DIA mode. The evaluations demonstrated that the proposed approach results in a more accurate, and often more conservative, estimation of the figures of merit.

Experimental procedures
We evaluated the performance of the proposed approach using eleven experimental datasets, acquired in Selected Reaction Monitoring (SRM), Parallel Reaction Monitoring (PRM), or Data-Independent Acquisition (DIA), as well as using computer simulation. The datasets are summarized in Table 1. All the experimental datasets were processed with Skyline [20].
Assay development and evaluation experiments Experiments SRM1-SRM8 consisted of heavy isotope-labelled peptides spiked into a complex background in known and varying concentrations, and systematically quantified together with the endogenous peptides. Experiment SRM1 was a multi-laboratory study of SRM performance for human proteins (CPTAC study 7) [24]. Targeted SRM experiments SRM2-SRM7 were all the assays available on the CPTAC assay portal [25] at the time of writing. The DIA experiment is described in detail in Supplementary Section 1.2.
In each dataset, areas under the chromatographic curves corresponding to fragments or transitions of a peptide in a run were summed, and subjected to natural logarithm transformation. In the case of the DIA dataset, only the most intense fragment ions are considered, as described in Supplementary Section 1.2. The summed and log-transformed areas were then normalized across runs, by first equalizing the medians of these values among all the endogenous peptides, and then applying the same shifts to the values of all the heavy peptides. Finally, the normalized values were converted back to the original scale. In the following, we refer to these summed and normalized peak areas as the intensity of the peptide in the run.
Assay used in large clinical studies Experiment CL [2] was a mass spectrometric assay of vitamin D binding globulin (VDBG) in human serum. A human serum sample containing VDBG was diluted into a background of chicken serum lacking VDBG. Heavy-labeled standard peptides were spiked in at a fixed concentration in each run. The assay quantified two peptides of the protein, VLEPTLK and ELPEHTVK. The peak area responses and concentration of these two peptides was calculated using a five-point calibration curve. The concentration values of the peptides were averaged to determine the concentration of VDBG. The intensities were normalized to the heavylabeled standard according to the same procedure as above.
Simulated datasets In the experimental datasets, the true LOB/LOD are unknown, and therefore we can only evaluate the impact of different statistical approaches on their estimation. In contrast, simulated datasets can help us evaluate the accuracy of the results. We simulated two datasets, SIM1 and SIM2 with known figures of merit, as described in Supplementary Section 1.4.1.

Background
To place the proposed approach in the context of existing work, in this section we describe the definitions, notation, and current approaches for assay characterization based on linear [9] and logistic [21] curve fit with unequal variance. We assume that the characterization is performed at the peptide level (i.e., after summarizing all the transitions or fragments of the peptide), and describe the model for a single representative peptide.

Input data and notation
In mass spectrometry-based assay characterization experiments, each peptide is spiked or titrated in L samples in known and distinct concentrations C 1 < . . . < C L . The experiments also typically include blank samples used to quantify the background noise. We denote the blank samples by the true concentration C 0 = 0. The L + 1 samples are analyzed with a mass spectrometer in technical replicates (i.e., runs). The spectral peaks corresponding to the peptide in a run are detected and quantified (e.g., by integrating the area under their chromatographic curve, or by any other method of choice) using a data processing tool such as Skyline [20]. The peak intensities are then summarized (e.g., with the sum of the chromatographic peak areas of all the peptide transitions or fragments) and normalized, to produce a single intensity value per replicate run.
The output of data processing and summarization is the input to peptide assay characterization. It consists of (C i , Y ij ) pairs of true peptide concentrations C i , i = 0, . . . , L and their summary intensities Y ij in the replicate run j, j = 1, . . . , N i on the original (i.e., not logarithm transformed) scale. The total number of intensities for a peptide is N = L i=0 N i .

Linear curve fit
When the concentrations are in the linear portion of the dynamic range, many assay characterizations rely on a linear curve fit on the original (i.e., not log-transformed) scale. Most models explicitly or implicitly specify a two-part statistical model, where one part is for the blank samples, and the other for the samples containing the peptide. For the blank samples, the approaches assume that the background noise randomly fluctuates around a constant. For the samples with the peptide, the approaches assume a linear response curve, that relates the observed intensities to the spiked or titrated concentrations. The fluctuations around this expected relationship are assumed to be Normally distributed, with variance depending on the concentration. In the notation of this manuscript, a typical model is written as: The existing regression-based approaches differ in their assumptions regarding the variance. Some specify a distinct variance parameter σ 2 i for each concentration, as in Equation (1) [9]. A common alternative [12,26] assumes that the variance is proportional to C 2 i , i.e.
The first term of the sum models the variance of the chemical and electronic noise at low concentrations, while the second models its increase with concentration. The C 2 i scaling implies the constant signal to noise ratio in the linear regime. Since the pattern in Equation (2) was not observed for the datasets in this manuscript, the results below follow the approach in Equation (1) and in [9].

Logistic curve fit
Approaches based on logistic curve fit are similar in spirit to the approaches based on a linear curve fit above. They model the mean observed intensities µ i in Equation (1) with a logistic curve. In our notation, a five-parameter logistic curve [27,28] is expressed as: It contains five parameters [29]: a is the intensity value at zero concentration and in the noise regime, b is the intensity value of the saturation regime, and c is the midrange concentration between a and b. Coefficients d and g control the extent to which the response curve moves from the noise regime to the saturation regime. A four-parameter logistic curve sets g = 1 in Equation (3), and restricts the curve to be symmetric around the midpoint C = c, µ i = a + c 2 . In the following, we will compare the proposed approach to the approach based on the five-parameter logistic curve, as it is most widely used and often generates better quality fit.

Figures of merit
The limit of blank (LOB) of a peptide is defined as the highest concentration, at which the expected mass spectrometric intensity in presence of the peptide is indistinguishable from the possible background noise [10,12]. The possible background noise is defined as the 100 × (1 − α) % one-sided prediction interval of the intensity from the blank sample, which we denote as Noise α . Therefore, in the notation of Equation (1), the LOB is the concentration C at which Noise α intersects the response curve Intercept + Slope × C. This definition formalizes the fact that at the LOB, the probability of false positive peptide quantification is α, and the probability of false negative peptide quantification is 0.5. The LOB is illustrated with a dot in Figure 1 The limit of detection (LOD) of a peptide is defined as the highest concentration, at which the possible mass spectrometric intensity in presence of the peptide is indistinguishable from the possible background noise [10,12]. The possible mass spectrometric intensity in presence of the peptide is defined as the lower bound of the 100 × (1 − β)% one-sided prediction interval at concentration C i . At the LOD, the lower bound intersects the possible background noise Noise α . This definition formalizes the fact that intensities from any true concentration above the LOD must not be mistaken for background noise. At the LOD, the probability of false positive peptide quantification is α. The probability of false negative peptide quantification is β. The LOD is illustrated with a triangle in Figure 1 The definitions are easily extended to a logistic curve. In this case, the LOB uses the logistic response curve fit, and the LOD uses the lower bound of the prediction interval of the logistic response curve. In all the cases, the parameters α and β are set in advance. Smaller values of α and β lead to wider prediction intervals [30], and to more conservative figures of merit. In the following we systematically set α = β = 0.1 as is commonly done [10].
An alternative definition of the LOD [14] [31] is the concentration C at which the response curve has intensity Noise α +σ 1 quantile of the standard Normal distribution, σ 1 is the standard deviation of the sample with the peptide at the lowest concentration. Since this definition does not have a straightforward expression of prediction intervals, we do not use it in what follows.

Parameter estimation and prediction intervals
The figures of merit rely on unknown quantities that must be estimated from the data. For the blank samples, an estimate of the expected background noise µ 0 is obtained as the average intensity The upper bound of the (1 − α) × 100 prediction interval of the background noise is then determined as [32]: where For the samples with the peptide, parameters Slope and Intercept in Equation (1) are estimated via weighted least squares [30], as follows. First, the variance parameters σ 2 i are estimated as the sample variance at each spiked or titrated concentration [9]. To increase the resolution beyond the spiked/titrated concentrations, the concentrations are discretized to a finer grid C 1 < C i < C L+L . This finer grid contains the original concentrations, augmented with L = 100 concentrations equally distributed in log space between the minimum and the maximum of the spiked/titrated concentration values. The gridded variance parameters σ 2 i are estimated by linear interpolation between estimates from the closest observed concentrations. Next, the concentration-specific weights are defined as . Finally, the parameters in Equation (1) are estimated by plugging in the weights. The lower bound of the one-sided (1 − β) × 100% prediction interval is obtained with the closed-form expression [13,9]: Here Unfortunately, no closed-form analogy to Equation (5) exists for the logistic curve fit. Therefore, the lower bound of the prediction interval P I β,i is obtained via bootstrap [33]. The bootstrap is an iterative procedure with a large number B of iterations, say B = 2000. At each iteration, the pairs C * i , Y * ij are obtained by sampling with replacement from the observed pairs (C i , Y ij ). The estimatesâ * ,b * ,ĉ * ,d * andμ * i in Equation (3) are obtained via weighted least squares, while plugging in the full-sample weights w i = 1 With the logistic curve fit, the lower bound of the one-sided (1 − β) × 100% prediction interval for a concentration C i is defined as the β × 100% quantile of the mixture of B Normal distributions N μ * i ,σ 2 i with equal weights. The quantile can be approximated by simulation [34], or evaluated analytically on a grid of augmented concentrations above. Here we chose the grid-based evaluation as it is more computationally efficient, as detailed in Algorithm 1 in Supplementary Section 2.1.
Finally, the estimate of LOB is the concentration C i at which the estimated upper bound of the background noise Noise α intersects the fitted curveμ i (obtained with either linear or logistic curve fit). The estimate of LOD is the concentration C i at which the estimated upper bound of the background noise Noise α intersects the estimated lower bound of the prediction interval P I β,i , as shown in Fig 1(b).

Modeling the canonical calibration curve
The proposed approach assumes that the systematic relationship between the true concentrations and the observed intensities follows the canonical calibration curve in the noise and linear regimes, as shown in Fig 1(a). In other words, it generalizes the linear response µ i in Equation (1) by introducing the parameter Change ≥ 0. The intensities from all the concentrations below Change (including the blank samples) are viewed as the background noise, and the intensities of peptides with concentrations above Change fluctuate around the linear response curve. This bilinear model has a "hockey stick" appearance, and is formally written as: The parameter Change defines the functional form of the response curve. When Change ≤ C 1 , all the spiked or titrated concentrations of a peptide are within the linear regime, and Equation (6) is simplified to the linear model in Equation (1). When Change > C 1 some of the concentrations are below the linear regime, the observed intensities from these concentrations are indistinguishable from the background noise, and the three-parameter "hockey stick" pattern is appropriate for the peptide. Therefore, the model can accurately represent all the peptides in a multiplexed experiment.
The model in Equation (6) is easily extended to situations where the spiked or titrated concentrations span noise, linear and saturation regimes in Figure 1

Parameter estimation and model selection
The parameters µ 0 , Slope and Change in Equation (6) are unknown, and must be estimated from the data. The model is first fit to all the intensities of the peptide using the same weighted least squares estimation procedure as in Section 3.2.4. When the linear model is appropriate, the non-linear expression in Equation (6) is too complex (i.e., is over-parametrized), and results in a substantial uncertainty in the estimate of Change. We quantify this uncertainty with the (1 − γ) × 100% confidence interval. We set γ = 0.2 in what follows.
The confidence interval can be derived approximately using large-sample approximation [35], or by a bootstrap procedure. Since the number of replicate runs in assay characterization experiments is relatively small, here we use bootstrap as described in Supplementary Section 2.1. If the confidence interval is located above C 1 , we conclude that the change point is present. Otherwise, we conclude that the observed intensities are consistent with the linear response, and that the simpler linear model in Equation (1) is appropriate. Finally, the selected (linear or "hockey stick") model is refit to the data using weighted least squares.

Prediction intervals and figures of merit
The prediction intervals from this model must reflect the uncertainty in (1) selecting the appropriate model, (2) estimating the unknown parameters of the selected response curve, and (3) random fluctuations of the individual intensities around the response. The combined effect of these three sources cannot be expressed analytically. However, the intervals can be obtained using the bootstrap procedure, similar to that in Section 3.2.4, but with modifications.
At each bootstrap iteration, the pairs C * i , Y * ij are obtained by sampling with replacement from the observed pairs (C i , Y ij ). Parameters in Equation (6) are estimated via weighted least squares, while plugging in the full-sample weights w i = 1 σ 2 i . To reflect the uncertainty due to model selection, the model selection is performed as described in Section 4.2, and either a linear or a "hockey stick" model is fit to the resampled dataset. Figure 3 shows that the bootstrap iterations can vary substantially in both the selected model and its parameter estimates. Therefore, the response curve is obtained by averaging the estimated response curves across the bootstrap iterations. As can be seen from the figure, the resulting response curve may be highly non-linear, and may lack the characteristic "hockey stick" pattern. This non-linear fit reflects the uncertainty in the selection and in the estimation of the parameters of the canonical calibration curve.
Finally, the bounds of the prediction interval are obtained as the quantiles of the mixture of the predictive distributions across the iteration, where the numeric values for each concentration are evaluated on a grid as in Section 3.2.4. The figures of merit are then obtained as in Section 3.2.4. The estimate of LOB is the concentration C i at which the estimated upper bound of the background noise Noise α intersects the fitted curveμ i . The estimate of LOD is the concentration C i at which the estimated upper bound of the background noise Noise α (defined in Equation (4)) intersects the estimated lower bound of the prediction interval.

Implementation
The proposed approach is implemented as part of the free and open-source R-based package MSstats v3.6 and subsequent versions (http://msstats.org/assay-characterization) [36]. The pseudocode for the algorithm is available in Supplementary Section 2.1. The computational cost of the estimation procedure is fairly limited, with the most expensive step being the calculation of bootstrap samples. For instance, Figure 5(b) was obtained on a standard workstation with 3.5 GHz CPU in about one minute. The largest dataset in this manuscript, SRM2, was analyzed in approximately 6 hours. The algorithm can be easily parallelized, and the runtime can be shortened by deploying it on a multi-core computer or on a cluster. The runtime can also be shortened by replacing the bootstrap-based confidence interval of the parameter Change in Equation (6) with an asymptotic confidence interval, but at the expense of an accuracy loss.

The proposed approach accurately fit the data
As illustrated in Figure 2, both linear and logistic curve fits can fail to simultaneously capture the flattening of peptide intensities at low concentrations, and the linearity of the response curve. In contrast, the proposed approach relies on the canonical calibration curve, and on bootstrap-based estimation, to capture both aspects. Figure 3 shows the proposed curve fit for the same peptide as in Figure 2. The curvature of the fit explicitly reflects the uncertainty in the concentration Change, at which the noise regime transitions into the linear response.
Additional examples of a better fit and better interpretation of the proposed approach are shown in Figure 4 for the SRM2 experiment, and in Figure 5 for the DIA experiment. As can be seen, the proposed approach led to higher figures of merit as compared to linear or logistic curve fit. This reduced optimism in the performance of the assays may have important implications for their downstream use.
The proposed approach did not always alter the figures of merit. Supplementary Figure 1 illustrates the fact that, when the concentrations fall entirely within the linear portion of the dynamic range, the proposed approach was identical to that of weighted linear regression. The figure also indicates that the proposed approach was relatively robust to deviations from the linear regime at larger concentrations. The larger concentrations were associated with higher variance, and were therefore given a smaller weight during the estimation procedure. Supplementary Figure 2 presents a case where the proposed approach lowered the estimated figures of merit as compared to an existing approach.

The proposed approach improved the accuracy of LOB/LOD estimates
As we have seen in Figure 4 and in Figure 5, the differences in the model fit between the existing and the proposed approaches impacted the estimates of the figures of merit. Since the true figures of merit in these experiments are unknown, we evaluated the accuracy of the estimates using the simulated dataset SIM1. Figure 6(a) illustrates the accuracy of fit in one simulation instance. It shows that deviations from linear patterns increased the intercept and decreased the slope of the linear fit, as compared to the exact response curve. Although the logistic fit captured the non-linearity of the intensities at low concentrations, it deviated from the linear pattern in the linear regime. These situations led to smaller (i.e., less accurate, too optimistic) figures of merit.
The accuracy of the proposed approach over all the simulation instances is reported in Table 2 and illustrated in Figure 6(b). Both linear and logistic curves dramatically underestimated the true figures of merit. On average, for the simulated data set, the linear curve underestimated the LOB by 63% and the LOD by 25% . The 5-parameter logistic curve fit underestimated the LOB by 53% and the LOD by 6%. In contrast, the proposed approach was slightly conservative, on average it overestimated the LOB by 1% and the LOD by 4%. However, it substantially reduced both the magnitude and the variability of the estimation error. This indicates that the estimates by the proposed approach for the experimental datasets are likely closer to the true values than those by linear and logistic fits.
Supplementary Figure 3 shows that when the number of grid points used to discretize the spiked or titrated concentrations was sufficiently large, it had a relatively small impact on the accuracy of the results. The insufficient number of discretized concentrations can be diagnosed using plots such as Supplementary Figure 3(a), which display overly jagged curves. As shown in Supplementary  Figures 3(b) and (c), this issue can be corrected by using a finer grid.
Finally, Supplementary Figure 4 uses simulated datasets SIM2 and SIM3 to illustrate that the proposed approach accurately estimates the figures of merit for assays twith various combinations of noise, linear and saturation regimes. As compared to the linear fit, the proposed approach substantially reduced the optimism regarding the figures of merit for all the datasets. As compared to the logistic curve fit, it reduced the optimism for many datasets. The extent of the changes depended on the number of peptides for which intensities deviate from the linear pattern, and on the type of these deviations. A larger increase in the LOB and LOD was associated with a larger proportion of peptides with non-linear pattens. For example, in the PRM dataset 84% of peptides had non-linear patterns (i.e., more than 50% of the bootstrap iterations selected the "hockey stick" model with the proposed approach). The proposed approach increased the LOB and LOD by on average 40% as compared to the linear fit. In contrast, dataset SRM1 had only 14% of peptides with non-linear patterns, and the proposed approach did not produce substantial changes in the figures of merit.

The proposed approach produced more conservative figures of merit in the assay development and evaluation experiments
At the same time, the proposed approach did not always produce more conservative figures of merit. Supplementary Figure 2 illustrates one such a case for a peptide from dataset SRM7. The proposed approach produced smaller (i.e., more optimistic) figures of merit in this case. This behavior may be associated with intensities that deviate substantially from the canonical calibration curve, e.g., in presence of interferences or other data quality issues. In the assay development and evaluation experiments, such situation was observed on average in 10% of the peptides over all the datasets.
Supplementary Figure 5 illustrates these patterns for the DIA experiment. It shows that the non-linearity of a response curve of a peptide is strongly associated with the difference between the proposed and linear model-based estimates of LOB and LOD.

The proposed approach produced conservative figures of merit in the clinical research assay
We applied the proposed approach to the clinical research assay experiment. This dataset is representative of clinical research assays in that it is not multiplexed, contains many different concentrations, and a large number of replicates as compared to the assay development and evaluation experiments described previously in this manuscript. The linear fit for this dataset is illustrated in Figure 8(a). The linear curve appears to fit well overall. However, a zoom into the lowest concentrations shows that it failed to accurately represent this particular region, which is of our main interest. The logistic curve fit and the fit by the proposed approach are illustrated in Figure 8(b). Due to the lack of intensities in the saturation range, the logistic curve fit was unstable. It failed to adequately capture the leveling off at low concentrations, and resulted in an overly wide prediction interval. In contrast, the proposed approach accurately represented the data, facilitated the interpretation in the context of the canonical calibration curve, and provided reasonable estimates of the figures of merit.
Other quantities, in addition to LOB and LOD, are often of interest in clinical research assays. These include lower limit of quantification (LLOQ). In this manuscript we used a definition of LLOQ, which is based on a linear curve fit, and is limited to the concentrations in the experiment. The definition and the details of the calculation for this dataset are described in Supplementary Section 3.2. The procedure resulted in LLOQ=1 μg/ mL. Although this quantity is not necessarily related to the LOD, we note that it was below the value obtained by the proposed approach (LOD=1.03), and may therefore be potentially be too optimistic.

Discussion
We introduced a novel statistical approach, which is specifically designed for characterization of multiplexed mass spectrometry-based assays. It expands upon the linear curve fit between the observed peptide intensities and known concentrations by directly incorporating the concept of the canonical calibration curve.
When peptide concentrations fall entirely within the linear portion of the dynamic range, the fit by the proposed approach is identical to that of weighted linear regression. However, when peptide concentrations are small and their intensities flatten out, the proposed approach results in a more accurate and a more interpretable fit. This is particularly important in multiplexed experiments, where it is impossible to ensure that the concentrations of all the assayed peptides are in the linear portion of the dynamic range.
The proposed approach improves the accuracy of the figures of merit as compared to the linear and to the logistic curve fits. When peptide concentrations are below the linear portion of the dynamic range, the proposed approach typically results in more conservative figures of merit. This may have important implications regarding the usability of the assay. At the same time, the concentrations above the linear portion of the dynamic range do not have a strong impact on the quality of the fit. This is due to the fact that intensities from high concentrations have higher variance, and have a smaller weight during the estimation. If needed, extensions of the proposed approach to tri-linear fits can be used to capture concentrations exceeding the linear regime.
The proposed approach is compatible with any strategy for quantifying a peptide in a run. In this manuscript, we chose to sum the intensities of all the transitions or fragments of a peptide that are reported by data processing, because this approach is simple, frequently used, and does not require additional software or manual intervention. However, other strategies are also acceptable. For example, quantification can be performed using top transitions or fragments. The transitions or fragments can be manually or automatically curated to remove interferences. If a curation procedure is used, it must be incorporated into every subsequent use of the assay to avoid overly optimistic characterization of its performance.
The proposed approach can be used for absolute quantification. A byproduct of the estimation procedure is a confidence interval for the parameter Change, which determines the lower bound of the linear portion of the dynamic range. The fit obtained by the proposed approach for concentrations above the confidence interval, or a separate linear fit limited to the concentrations above the confidence interval, can be used to relate the observed intensities of the peptide to the true concentrations.
Since the proposed approach discretizes the true concentrations, its resolution is not limited to the concentrations used in the experimental design. Therefore, it can help compare the performance of alternative assays or workflows for a same peptide. Moreover, discrepancies between the figures of merit by the proposed approach and by the linear fit, in particular in cases where the proposed approach produces lower estimated values, can be used to automatically screen the assays for interferences, or for other data quality issues, which can be subsequently manually inspected.
Overall, we believe that the proposed approach offers multiple benefits, and advocate its general use for characterization of multiplexed mass spectrometry-based assays.

Acknowledgments
The work was supported in part by the US National Science Foundation CAREER award DBI-1054826 to O.V. We thank Ching-Yun Chang for valuable discussions and feedback and Meena Choi for help with the implementation.  The changes between the noise, linear, and saturation regimes occur at concentrations Change and Change Sat . When the number of concentrations or replicates is small, the intensities from the lower or upper range of the linear regime are similar to the intensities from the noise or of the saturation regime. This creates uncertainty in the shape of the response, and produces the apparent calibration curve. The linear portion of the dynamic range is defined as the range of concentrations where the canonical and the apparent calibration curves agree. (b) Estimation of LOB and LOD based on a linear fit, illustrated for experiment SRM7, study IIIA, site 95, peptide ESDTSYVSLK. The LOB is defined as the concentration at the intersection between the upper bound of the 95% one-sided prediction interval of the background noise, and the linear fit. The LOD is defined as the concentration at the intersection between the upper bound of the 95% one-sided prediction interval of the background noise, and the lower bound of the 95% one-sided prediction interval of the linear fit (which also corresponds to the lower bound of the 90% two-sided prediction interval). We emphasize the importance of using prediction intervals (as opposed to confidence intervals for the mean fit, which would have led to more optimistic figures of merit). The quality of fit impacts the estimates of the figures of merit.      The LOB plot shows for each data set the the % of peptides where non-linearity was present in more than 50% of bootstrap iterations. A greater percentage indicates a greater number of peptides with a non-linear characteristic in a particular dataset. Whenever this percentage is large, the linear method substantially underestimates the LOB and LOD at the dataset level compared to the proposed. Due to the lack of intensities in the saturation range, the logistic curve fit was unstable. It failed to adequately capture the leveling off at low concentrations, resulted in an overly wide prediction interval, and could not produce an estimate of LOD. In contrast, the proposed approach accurately represented the data, and provided estimates of both LOB and LOD. While the estimates of LOB were the same for the two fits for this particular peptide, this is not generally the case and depends on patterns of variation in the data.