Quality Control Analysis in Real-time (QC-ART): A Tool for Real-time Quality Control Assessment of Mass Spectrometry-based Proteomics Data*

Liquid chromatography-mass spectrometry (LC-MS)-based proteomics studies of large sample cohorts can easily require from months to years to complete. Acquiring consistent, high-quality data in such large-scale studies is challenging because of normal variations in instrumentation performance over time, as well as artifacts introduced by the samples themselves, such as those because of collection, storage and processing. Existing quality control methods for proteomics data primarily focus on post-hoc analysis to remove low-quality data that would degrade downstream statistics; they are not designed to evaluate the data in near real-time, which would allow for interventions as soon as deviations in data quality are detected. In addition to flagging analyses that demonstrate outlier behavior, evaluating how the data structure changes over time can aid in understanding typical instrument performance or identifying issues such as a degradation in data quality that indicates a need for instrument cleaning and/or re-calibration. To address this gap for proteomics, we developed Quality Control Analysis in Real-Time (QC-ART), a tool for evaluating data as they are acquired to dynamically flag potential issues with instrument performance or sample quality. QC-ART has similar accuracy to standard post-hoc analysis methods with the additional benefit of real-time analysis. We demonstrate the utility and performance of QC-ART in identifying deviations in data quality because of both instrument and sample issues in near real-time for LC-MS-based plasma proteomics analyses of a sample subset of The Environmental Determinants of Diabetes in the Young cohort. We also present a case where QC-ART facilitated the identification of oxidative modifications, which are often underappreciated in proteomic experiments.


Control of data quality is a fundamental need for facilitating scientific reproducibility, and it is also essential for the translation of experimental discoveries to clinical, industrial or environmental applications (1)(2)(3).
In liquid chromatography-mass spectrometry (LC-MS)1-based proteomics studies, it has been demonstrated that performing robust quality control (QC) can improve overall protein quantification and subsequently yield more accurate statistical estimates of differential abundance by detecting outlier data points (4). To date, only a few tools have been developed to assess LC-MS-based proteomics data quality in the context of an entire study (reviewed in (5)), and most of these are implemented as post-hoc analyses to be utilized at the end of the experiment. However, because of the complexity of proteomic studies, especially those involving large sample sets or cohorts, performing QC assessment of proteomics data in real-time would offer significant advantages. The sources of variability in a proteomics experiment that are addressed by QC can be categorized into two groups: biological and technical. The goal of QC is not to remove normal biological variability; however, there are circumstances where an LC-MS analysis displays outlier behavior and should be flagged and evaluated. For example, a sample may display outlier behavior, and further examination may find that the subject had a confounder, such as exposure to a medical drug. The biological profile is likely no longer normal in the context of the experimental design, and thus either the sample would need to be removed or the confounder would need to be dealt with statistically; either way, the issue could be addressed. Thus, there is a clear advantage to performing further analysis of the data, either post-hoc or in near real-time, to identify analyses or samples for further investigation. Technical variability is derived from sample collection, transportation, storage, preparation, and/or instrument performance.
Teasing out the cause of outliers in the category of technical variability can be extremely challenging, but evaluation of data parameters, such as peak intensities and the peptide sequences identified, can improve downstream analysis (6); these parameters can vary depending on the mass spectrometer, the LC column (particularly important for multi-column platforms), the time since the last instrument cleaning, the length of proteolytic digestion, and sample cleanup. These issues are further complicated when the proteomics study requires months or years to complete, as other parameters in the instrument can drift over time. Thus, QC evaluation that differentiates normal change over time from outlier behavior could dramatically improve overall data quality by notifying investigators of the need for instrument maintenance, thus minimizing instrument-related artifacts.
The need for reliable QC approaches for LC-MS-based proteomics studies can be gauged by the increasing number of publications on the topic (4, 7-15). Initial research in this area resulted in several web-based applications that track individual QC metrics on the fly with varying levels of sophistication (7-10). In Wang et al. (11), QC metrics are tracked and quantified, and the uncertainty is partitioned into sources such as lab and instrument type, but the evaluation is focused on the entire experiment rather than individual MS analyses. Amidan et al. (12) proposed a method to identify poor-quality datasets using a supervised learning approach. Because of the dynamic nature of mass spectrometry data, the method performs well post-hoc, but the supervised algorithm is overly specific to the training data and therefore cannot accurately track the quality of experiments in real-time. Bielow et al. (13) developed a software tool (PTXQC) that summarizes QC metrics to allow an expert to curate individual datasets more quickly. However, their method is currently tailored to the QuaMeter metrics (16) only, and the extension to other types of metrics is not immediate. Finally, Bittremieux et al. (14) proposed a powerful tool that is unsupervised in nature and can handle high-dimensional generic QC data.
We have developed a method, QC Analysis in Real Time (QC-ART), that identifies local and global deviations in data quality because of either biological or technical sources of variability. The procedure is similar to that of Matzke et al. (4) in the context of the statistical outlier algorithm employed but adds a dynamic modeling component to analyze the data in a streaming LC-MS environment. We demonstrate the accuracy of QC-ART on data from both label-free and isobarically labeled (i.e. iTRAQ (17)) proteomics studies. QC-ART is general enough to be applied to any LC-MS-based study where appropriate QC metrics can be collected over time, such as metabolomics and lipidomics. Using hand-curated data, QC-ART was validated to achieve similar accuracy to state-of-the-art post-hoc analyses (4) but in real-time. Lastly, the capabilities and benefits of QC-ART for large-scale proteomics studies are demonstrated for data collected from analyses of a sample subset of The Environmental Determinants of Diabetes in the Young (TEDDY) cohort (18). Using QC-ART in this study, we identified multiple types of issues, both naturally occurring and those caused by deliberate, yet subtle, manipulations of instrument operating parameters, and the results were compared with current state-of-the-art methods (14). We also demonstrated the utility of QC-ART in identifying oxidation of tryptophan, tyrosine and cysteine residues, which is often overlooked in peptide identification. QC-ART is available as an online application (https://ascm.shinyapps.io/BAS_QCART), where researchers can perform analyses by uploading their own data. Additionally, the source code necessary to implement QC-ART as a standalone application is freely available as the R package QCART on GitHub (https://github.com/stanfill/QC-ART), and the source code for a corresponding web interface is freely available at https://github.com/stanfill/QC-ART-Web-App.

EXPERIMENTAL PROCEDURES
As discussed above, existing algorithms for QC of LC-MS-based untargeted proteomics data are not designed for streaming applications (14). However, for purposes of comparison to the state-of-the-art, QC-ART is evaluated against several existing algorithms, with results compared in a post-hoc fashion to determine if QC-ART, operating dynamically, performs as well as or better than existing approaches. The two key algorithms used for comparison with QC-ART were Robust Mahalanobis Distance on Peptide Abundance Vectors (RMD-PAV) and an unsupervised QC method, InSPECtor. These two algorithms are described at a high level for comparative purposes, and details of these methods are available in (4) and (14), respectively. For both algorithms, data matrices are represented at the sample level, i.e. in the case where samples are fractionated before LC-MS analysis, the data for a single sample are the sum of all fractions.
Existing Post-Hoc QC Algorithms-RMD-PAV is a post-hoc analysis technique used to identify outlier LC-MS data based on all of the quantified peptide peak intensities for a sample. The observed peptide peak intensities are summarized as an abundance distribution represented by a set of statistical metrics, such as median and skewness. This set of metrics is reduced using robust principal components analysis (rPCA), and the robust Mahalanobis distance is then computed on the metrics transformed into the rPCA coordinate system. The distance computed for each sample is compared against percentiles of the appropriate chi-square distribution to determine how extreme an instrument run is relative to the rest of the data set. A score akin to a p value is used to identify samples that may be outliers.
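To illustrate the scoring idea only (this is not the published RMD-PAV implementation, which is an R routine with a dedicated rPCA step), a minimal sketch might standardize the metrics robustly with median/MAD and compute Mahalanobis distances against a chi-square cutoff; the 5.991 below is the 95th percentile of a chi-square distribution with 2 degrees of freedom:

```python
import numpy as np

def robust_mahalanobis_scores(metrics):
    """Score each row (one instrument run) by its Mahalanobis distance from
    the robust center (median) of the metric matrix, after median/MAD
    standardization of each metric column."""
    X = np.asarray(metrics, dtype=float)
    center = np.median(X, axis=0)                          # robust location
    mad = np.median(np.abs(X - center), axis=0) * 1.4826   # robust scale
    Z = (X - center) / mad
    cov_inv = np.linalg.pinv(np.cov(Z, rowvar=False))
    return np.einsum('ij,jk,ik->i', Z, cov_inv, Z)         # squared distances

# toy metric matrix: rows = runs, cols = e.g. (median, skew) of abundances
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
X[-1] = [8.0, 8.0]                    # one deliberately extreme run
d2 = robust_mahalanobis_scores(X)
flagged = d2 > 5.991                  # chi-square(df=2), 95th percentile
```

Here the planted extreme run in the last row receives by far the largest squared distance and exceeds the chi-square cutoff.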
InSPECTor is a recently developed method that represents the cutting edge in unsupervised outlier detection for large LC-MS experiments. It is based on a local outlier probability distance metric, which identifies potentially outlying instrument runs by finding a group of k instrument runs most like the analysis in question. InSPECtor then uses the standard normal density kernel to estimate the probability of each run being an outlier (15). The probability threshold used to identify outlying instrument runs is chosen by the user and varies from experiment to experiment.
QC-ART-QC-ART is an algorithm that uses a dynamic linear model to flag anomalies while accounting for typical instrument change over time. Furthermore, to perform this task in near real time, only metrics that can be computed in a rapid fashion are used, such as those defined by NIST (19) and proposed along with QuaMeter (16). Fig. 1 illustrates the generic workflow for QC-ART. Once data collection has started, variables are computed for each instrument analysis in near real time. The model is fit using a baseline set of data, and as each new sample is analyzed by the instrument, it is immediately scored. If the data do not show any anomalous behavior in the context of the baseline set, the process continues with the results from analysis of the next sample. However, if the data appear anomalous in the context of the baseline, then the sample is flagged for follow-up by a technician. At this point the user may evaluate the model assumptions and determine if the existing baseline is still appropriate or if a new one should be established. If the user believes that the instrument performance has changed or that the instrument needs to be cleaned, then the process begins again.
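The workflow can be caricatured in a few lines of Python. This sketch replaces the rPCA-based score with a simple standardized distance and updates the baseline statistics only with runs that pass the check; all names are illustrative and are not part of the QCART R package:

```python
import numpy as np

def qc_stream(runs, baseline_size=10, z_cut=3.5):
    """Minimal sketch of the QC-ART loop: establish a baseline, then score
    each incoming run against it. Flagged runs are excluded from baseline
    updates so they cannot mask later anomalies."""
    runs = np.asarray(runs, dtype=float)
    baseline = list(runs[:baseline_size])
    flags = [False] * baseline_size          # baseline runs assumed good
    for x in runs[baseline_size:]:
        mu = np.mean(baseline, axis=0)
        sd = np.std(baseline, axis=0)
        score = np.max(np.abs((x - mu) / sd))  # stand-in for the rPCA score
        bad = score > z_cut
        flags.append(bool(bad))
        if not bad:
            baseline.append(x)               # dynamic update on good runs only
    return flags

rng = np.random.default_rng(1)
runs = np.vstack([rng.normal(size=(10, 3)), [[10.0, 10.0, 10.0]]])
flags = qc_stream(runs)
```

The anomalous eleventh run is flagged without being folded into the baseline, so subsequent scoring remains anchored to good-quality data.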
QC-ART Variables-It has been shown that summary statistics derived from reporter ion distributions from isobarically labeled proteomics data are beneficial in addition to NIST and QuaMeter QC metrics when assessing LC-MS-based proteomics data quality (20). Inspired by these metrics, we considered a large list of potential variables that could be generated rapidly for inclusion in QC-ART. However, to increase the rate at which QC-ART can process data, it was prudent to reduce the number of variables to just those that demonstrate predictive qualities for identifying low-quality data. PCA was used to identify a subset of the initial variables in conjunction with domain expertise associated with common sources of altered LC-MS data quality, such as nanoelectrospray instability. For label-free proteomics data, only the NIST QC metrics are used.
Setting the Baseline-A baseline data set composed of good-quality instrument runs is critical, and its selection is driven by the researcher's goal(s). For real-time analysis, for example to track instrument performance over time, a set of data from the beginning of the experiment, from analyses performed under ideal instrumental conditions, should be chosen. In this way, successive instrument runs with scores significantly far from the quality threshold will signify a shift in data quality that should be evaluated. The dynamic nature of QC-ART allows the baseline to change over time as needed. When using QC-ART to perform post-hoc QC, a baseline that is evenly distributed throughout the course of the experiment in chronological order, which accounts for uncertainty because of variability in instrument performance over time, is selected. We investigate the impact of baseline size and quality on the accuracy of QC-ART in supplemental File S1.

Scoring New MS Data-An rPCA method is used to transform the data, and then the robust Mahalanobis distance between the reduced set of variables is computed to assess similarity, a modification of the existing Sign2 metric (21). Given that the metrics can vary dramatically in scale and the underlying statistical distributions of the metrics are unknown, rPCA is more accurate than traditional PCA methods at identifying outlying observations (22). Similarly, the robust Mahalanobis distance is used to score the instrument runs transformed to the rPCA space because it has been shown to be the preferred multivariate distance in the presence of extreme observations (4). The resultant scores, called QC-ART scores, are not guaranteed to follow a known distribution; therefore, cutoff values based on percentiles of a common distribution are not appropriate. Thus, the baseline scores are used to build a model against which all future scores are compared.
Previous QC methods assume a static linear model to the QC metrics, which implicitly assumes the mechanism generating the data is unchanged throughout the course of the experiment (7,10,13). Because instrument behavior is likely to change over time, we additionally implemented dynamic linear models, whose parameter estimates are continually updated when additional experiments are performed and observed to be high quality (23). Experiments that are identified to be of poor-quality should not be used to update the parameter estimates as doing so could decrease the chances of identifying poor-quality instrument runs performed later in the study.
For both the static and dynamic models, the assumptions associated with the model must be checked as new instrument runs are added to the dataset. See supplemental File S1 for a further discussion of model assumptions and how to verify them. Both the static and dynamic linear models are used to define threshold values for the QC-ART scores. The threshold values are chosen to control the probability of a false positive, i.e. a good-quality instrument run erroneously flagged as being of poor quality. The static model threshold is used to identify changes in instrument quality relative to the chosen baseline set only, whereas the dynamic model threshold is used to identify changes in instrument behavior relative to the baseline set after controlling for the recent behavior of the instrument. Because QC-ART scores are distances, they cannot be less than zero, and a large score indicates that a given instrument run is different compared with the baseline set. Therefore, isolated large scores are interpreted as a single instrument run that warrants further investigation, e.g. because of an occluded electrospray ionization emitter or inappropriate database search parameters, but do not signify a systematic change in instrument quality. Several successive scores above the threshold value, or deviations from the model assumptions, represent a potential systematic change in the data.

Finally, the QC-ART scores on their own are not easily translated from one study to another. However, QC-ART scores can be translated to probabilities by using the probability distribution function implied by the static and dynamic models. Probabilities derived from the static model are interpreted as the probability that an extreme LC-MS instrument run occurred given the baseline data only. Alternatively, probabilities derived from the dynamic model are interpreted as the probability that an extreme LC-MS analysis occurred given the baseline data and recent changes in instrument behavior.
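As a toy illustration of the static-versus-dynamic distinction (the actual QC-ART thresholds come from linear models fit to the baseline scores; this sketch substitutes an exponentially weighted mean and variance, and all parameter names are invented):

```python
import numpy as np

def dynamic_threshold(scores, warmup=10, alpha=0.2, k=3.0):
    """Track an exponentially weighted mean/variance of recent good-quality
    QC-ART scores and flag any score above mean + k*sd. Flagged scores do
    not update the estimates, so a run of poor-quality data cannot inflate
    the threshold and hide later problems."""
    scores = np.asarray(scores, dtype=float)
    mu = scores[:warmup].mean()
    var = scores[:warmup].var()
    flags = [False] * warmup
    thresholds = [mu + k * np.sqrt(var)] * warmup
    for s in scores[warmup:]:
        thr = mu + k * np.sqrt(var)
        bad = s > thr
        flags.append(bool(bad))
        thresholds.append(thr)
        if not bad:                      # only good runs update the model
            var = (1 - alpha) * (var + alpha * (s - mu) ** 2)
            mu = (1 - alpha) * mu + alpha * s
    return flags, thresholds

scores = [1.0] * 10 + [1.0, 50.0]        # stable scores, then a spike
flags, thresholds = dynamic_threshold(scores)
```

A static threshold would instead be computed once from the warmup scores and never updated, which is why it detects departures from the baseline only, not departures from recent instrument behavior.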
RESULTS

We assessed the ability of QC-ART to identify outlying instrument runs using both a previously published, expertly curated label-free proteomics data set and a new isobarically labeled proteomics data set from analysis of a large sample cohort. QC-ART is compared against RMD-PAV and InSPECtor for both data sets.

Real Data Benchmark -Expert Identified Outlier Runs-
The label-free proteomics data derive from analyses of a human lung-derived cell line, Calu-3, infected with Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV). A total of 141 LC-MS runs were performed using an LTQ-Orbitrap instrument, and the data were expertly curated, with 28 LC-MS analyses identified as potentially outlying (4). The same five statistics as previously published (4) were used to summarize each LC-MS run for the RMD-PAV and InSPECtor methods: (1) the fraction of missing peptides, (2) a group-wide correlation coefficient, and the (3) median, (4) skew, and (5) kurtosis of the abundance distribution. QC-ART scores were computed using the same five statistics except for the group-wide correlation coefficient, because it could not be computed until all peptide quantifications were completed. The first ten LC-MS runs that were not suspected of being outliers were used as the baseline for QC-ART, which was applied to each sample in the order in which it was quantitated without using any information from samples that had not yet been quantitated. The InSPECtor method was implemented with a neighborhood size of ten to parallel the baseline choice for QC-ART. Results for other baseline sizes are given in supplemental File S1.
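With the expert-curated outlier labels just described as ground truth, each method's scores can be summarized by an ROC analysis. The area under the curve follows directly from the rank (Mann-Whitney) formulation, sketched here in Python with made-up scores:

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC via the Mann-Whitney formulation: the probability that a randomly
    chosen true-outlier run scores higher than a randomly chosen good run,
    with ties counted as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

For example, scores that perfectly rank the outliers above the good runs give an AUC of 1.0, while uninformative constant scores give 0.5.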
A receiver operating characteristic (ROC) curve analysis was used to compare the ability of QC-ART to identify extreme peptide abundance distributions relative to the RMD-PAV and InSPECtor methods (Fig. 2).

Case Study-Longitudinal Cohort Study-To test the utility of QC-ART for monitoring the quality of LC-MS data in real time, we used it to supervise data from plasma proteomics analyses of a subset of samples from TEDDY (18), a large prospective study with the goal of discovering factors that initiate the autoimmune response and destruction of the pancreatic beta cells, leading to the development of type 1 diabetes. To fulfill this goal, we are performing comprehensive plasma proteomic analyses of TEDDY samples to better understand progression of the disease. A total of 2252 plasma samples from 368 donors were pooled by donor, depleted of the 14 most abundant proteins, then digested with trypsin and labeled with 8-plex iTRAQ reagent according to the manufacturer's recommendations. Each 8-plex iTRAQ set was multiplexed by including 6 TEDDY samples plus one common reference sample (channel 121) that was generated by pooling aliquots from all donors, whereas the remaining iTRAQ channel (119) was not used. Each of the resulting sixty-two 8-plex iTRAQ sets was fractionated into 24 fractions, resulting in 1488 individual samples for LC-MS/MS analysis that in total required 14 months to complete, together with analysis of one independent QC sample (a tryptic digest of the bacterium Shewanella oneidensis) and one blank every 24 fractions, which were used to assess instrument performance. The variables used by QC-ART to monitor these instrument runs are described in Table I. The NIST variables were calculated using the PNNL-developed software SMAQC, which is freely available on GitHub (https://github.com/PNNL-Comp-Mass-Spec).
To monitor instrument performance using QC-ART, a set of instrument runs that were collected during peak instrument performance were chosen. In this instance, the first ten fraction sets of data (a total of 240 LC-MS/MS runs) collected after instrument cleaning, calibration and running an independent QC sample before each iTRAQ set were treated as the baseline for all future instrument runs (Fig. 3A). Additionally, because the fractions might contain completely different peptides, each of the 24 fractions of the iTRAQ sets was treated separately. For example, to assess the quality of the data collected in Fraction 1 of Set 11, the corresponding variables for Fraction 1 of Set 11 were compared against those same variables for Fraction 1 of Sets 1 through 10.
If an instrument run was known or later judged to be of poor quality, its value was not used to update the dynamic threshold model. In practice, the samples that were flagged by QC-ART were reanalyzed at the end of the study because this methodology was under development while the samples were being processed. For future studies employing QC-ART, flagged samples will be reanalyzed immediately. Five events of interest that occurred during the study are labeled in Fig. 3A (and all subsequent figures): (1) iTRAQ sets 16 and 17, (2) test runs with deliberately mistuned instrument parameters, (3) drop in instrument performance, (4) gap in analysis and (5) diluted samples. In July 2015, a version of QC-ART still under development flagged several runs of iTRAQ sets 16 and 17 as poor-quality datasets, at which point the instrument operator stopped the data collection and cleaned the mass spectrometry ion source. These samples were rerun in early August 2015, and the corresponding QC-ART scores returned to normal levels. We also collected 10 samples using deliberately mistuned mass spectrometry parameters and liquid chromatography gradient to assess QC-ART's ability to identify poor-quality data. The mistuned parameters were: (1) changing the mass spectrometer front lens voltage from −8 to −6, which affects the number of ions entering the ion trap; (2) inserting a dead volume in the LC tubing, which affects the chromatographic performance; (3) changing the mass spectrometer S-lens from 69 to 25%, which generates a bias toward lower m/z ions being transmitted to the ion trap; and (4) increasing the number of microscans, which reduces the overall number of collected spectra.
In January 2016, instrument performance dropped significantly (labeled "Drop in instrument performance"), leading to a period of instrument recalibration and cleaning (labeled "Gap"). Finally, in April 2016, a full set of samples appeared to be too concentrated and was diluted before reanalysis, which led to differences in data quality (labeled "Diluted samples").
FIG. 3. QC-ART scores of the iTRAQ data with static and dynamic thresholds when different sets of instrument runs were used as the baseline (red areas in timelines below figures). In all figures, the point circled in green represents a sample that exhibited suspicious oxidation patterns, the gray vertical lines indicate instrument cleaning and recalibration events, and the horizontal lines represent static (gray) and dynamic (yellow) threshold values. A, When the first ten sets are used as the baseline, the poor-quality test runs are correctly identified, but the fundamental shift in instrument performance over time causes the method to flag too many data points later in the experiment. B, Checking model assumptions throughout the course of the cohort study tells the researcher when a new baseline is needed, which allows the method to correctly identify instrument runs of poor quality without negatively impacting the false alarm rate. C, Distributing the baseline throughout the course of the study reduces some of the noise in the QC-ART scores, but the scores cannot be computed in real-time because some of the baseline samples occur late in the study.

Using the first ten sets as the baseline, QC-ART correctly flagged the poor-quality test runs in July 2015. Similarly, the QC-ART scores correctly increased in response to the decrease in instrument performance in January 2016. QC-ART did not, however, identify the diluted samples in late April 2016. QC-ART's inability to detect the difference in sample preparation is because of the difference in instrument behavior before and after the recalibration and cleaning in January 2016. To properly account for the change in instrument performance after the long maintenance period, a new baseline must be chosen. The need for a new baseline is also apparent when assessing model assumptions (supplemental File S1). Fig. 3B illustrates a QC-ART update when model assumptions were violated, that is, when a systematic change in instrument performance was identified. As seen in Fig. 3A, the distribution of the QC-ART scores changes throughout the course of the study, though it is not obvious exactly when that change is significant enough to warrant a new baseline. Tracking the model assumptions through time in conjunction with the QC-ART scores indicated that significant changes in instrument performance occurred in October 2015 and March 2016 (supplemental File S1). To account for the changed instrument behavior, two new baselines were chosen: one in October 2015 and another after the gap in experimental runs in early 2016 (red areas in timeline below Fig. 3B). Both retraining periods coincide with significant instrument maintenance initiated by the technicians and are immediately preceded by large shifts in QC-ART scores. This illustrates that when QC-ART is appropriately trained, it can identify all areas of interest: the poor-quality test runs in July 2015 have very large scores, the scores leading up to the gap in January 2016 increase as the instrument performance drops, and the diluted samples in April 2016 are identified successfully.
QC-ART can also be used as a post-hoc data quality tool by selecting a baseline composed of data from analyses that are distributed throughout the course of the study (Fig. 3C). Compared with the results obtained when the baseline is composed of data collected at the beginning of the study (Fig. 3A), spreading out the baseline greatly reduced the variance in scores, as indicated by, e.g., the reduced QC-ART scores associated with the poor-quality test runs in July 2015. Also, the variability in instrument performance through time was partially accounted for by the alternative baseline selection. Used as a post-hoc data analysis tool, QC-ART could identify the poor-quality test runs in July 2015 and the diluted samples in April 2016. QC-ART could not, however, identify the last group of instrument runs before the gap in January 2016, likely because data acquired just before the gap were included in the baseline set. This illustrates the importance of choosing an appropriate baseline set when using QC-ART.
To validate the QC-ART results, data from each LC-MS/MS analysis of the TEDDY cohort study were analyzed by the InSPECtor and RMD-PAV methods in a post-hoc fashion, i.e. each method was applied once all the samples had been analyzed at least once. The InSPECtor method was applied to all instrument runs using the same variables that informed QC-ART, as reported in Table I (Fig. 4A). Unlike with the QC-ART scores in Fig. 3A, the change in instrument performance over time cannot be detected with the InSPECtor method, which was expected given the local focus of its distance metric. The instrument runs starting in March 2016 do not appear to be different from those that occurred much earlier in the study even though they are quite different when compared directly. However, the upward trend in outlier scores starting in January 2016 indicates that InSPECtor could detect the degradation in instrument performance before the gap in early 2016. Based on the chosen 95% threshold, InSPECtor also identified the poor-quality test runs in July 2015 and data from some of the diluted samples in May 2016.
To apply the RMD-PAV method to the cohort study, the peptide reporter ion intensity data for each sample within each fraction set were extracted using MASIC (24). Because of the complexity of the cohort study, the collected data had to be manipulated to implement RMD-PAV. First, the full data set was reduced such that only the initial analysis of each iTRAQ set was retained. For example, samples from fraction set four were analyzed in May 2015 and March 2016, but only the results from May 2015 were used to compute RMD-PAV scores. Second, the iTRAQ 8-plex configuration used for this study creates results for six individual samples and a pooled reference sample. The RMD-PAV scores were derived from the pooled reference sample data because the data from the individual samples exhibited unwanted variation because of biological differences between the donors. Finally, because the study groups were spread across fraction sets, the suggested group average correlation variable typically used by RMD-PAV was replaced with a global average correlation. That is, the average pairwise correlation between the peptide abundance vectors for each sample was used in place of the average group correlation defined in Equation (1) of Matzke et al. (4). The RMD-PAV scores on the log base two scale that resulted from this implementation, along with a 99.9% threshold, are plotted in Fig. 4B.
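The substitution described above, a global average correlation in place of the group average correlation of Matzke et al. (4), amounts to the following computation; the sketch below operates on a hypothetical peptide-by-sample abundance matrix and is an illustration rather than the published R implementation:

```python
import numpy as np

def global_avg_correlation(abund):
    """For a peptide-by-sample abundance matrix (rows = peptides, columns =
    samples), return each sample's average pairwise Pearson correlation with
    all other samples' abundance vectors."""
    R = np.corrcoef(abund, rowvar=False)          # sample-by-sample matrix
    n = R.shape[0]
    off = R[~np.eye(n, dtype=bool)]               # drop the diagonal (self-corr)
    return off.reshape(n, n - 1).mean(axis=1)     # per-sample mean correlation

# toy matrix: samples 1 and 2 track each other, sample 3 is anti-correlated
A = np.array([[1., 2., 4.],
              [2., 4., 3.],
              [3., 6., 2.],
              [4., 8., 1.]])
avg_corr = global_avg_correlation(A)
```

A sample whose abundance vector disagrees with the rest of the set, as sample 3 does here, receives a low (here negative) average correlation and would contribute to a large RMD-PAV distance.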
The large group of samples with scores above the threshold starting in May 2016 suggests that RMD-PAV could recognize that data processed at the end of the study differ systematically from data collected at the beginning of the study. However, RMD-PAV was not able to differentiate the diluted samples from the other samples that occur after the break. Additionally, the data collected just before the gap do not appear to be of poor quality based on RMD-PAV scores. Note that the interesting data point plotted previously in green and the poor-quality test runs flagged by QC-ART and InSPECtor were not analyzed using RMD-PAV because they were reruns of samples analyzed earlier in the experiment.
The instrument run represented by the green point in Figs. 3 and 4A was flagged by QC-ART under all three baseline choices using both the dynamic and static thresholds. When the individual metrics corresponding to that run were investigated, no individually outlying values were observed. To investigate this issue further, a thorough manual inspection was carried out for several LC-MS runs with similar behavior. Although only small differences were observed in peak intensities and the overall shape of the total-ion chromatogram, some regions of the chromatogram revealed different peak distributions compared with the corresponding fractions in other iTRAQ sets (Fig. 5A). By examining one of those regions (elution time 32-37 min), we observed extensive mass shifts of 15.99 Da, which corresponds to the addition of one oxygen atom (Fig. 5B). The sequence of a peptide in this region was determined to be GQYCYELDEK, corresponding to amino acid residues 177-186 of human vitronectin (Uniprot ID: P04004). Surprisingly, this peptide lacks methionine residues, which are easily oxidized and usually set as a possible modification location in proteomic data analysis. Note that InSPECtor gave this instrument run a score below 50/100 (Fig. 4A), suggesting that it cannot capture subtle changes in data quality as reliably as QC-ART.
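The +15.99 Da pattern observed here can be screened for computationally by pairing peptide masses that differ by one oxygen atom. The sketch below is a hypothetical helper (the function name and ppm tolerance are our assumptions), not part of any tool used in the study:

```python
OXYGEN_MASS = 15.9949  # monoisotopic mass of one oxygen atom, in Da

def find_oxidation_pairs(masses, tol_ppm=10.0):
    """Return index pairs (i, j) whose monoisotopic masses differ by one
    oxygen atom within a ppm tolerance -- candidate unmodified/oxidized
    forms of the same peptide."""
    pairs = []
    for i, m in enumerate(masses):
        target = m + OXYGEN_MASS
        tol = target * tol_ppm * 1e-6  # absolute tolerance at this mass
        for j, m2 in enumerate(masses):
            if abs(m2 - target) <= tol:
                pairs.append((i, j))
    return pairs
```

Candidate pairs found this way would still need to be confirmed by inspecting the spectra and retention times, as was done manually for the GQYCYELDEK example.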
To determine possible oxidation sites, database searches were performed again with MSGF+ (25), this time considering potential oxidation of any amino acid residue. Peptide identifications reported by MSGF+ as having oxidized residues were then analyzed with Ascore (26), a tool that calculates probabilities for site-specific localization of modifications, to confirm the localization of each oxidation. The final oxidation counts were normalized by the total number of occurrences of each amino acid (Fig. 5C), and the results showed enrichment of oxidation at cysteine, methionine, tryptophan and tyrosine residues (Fig. 5D). Although methionine oxidation was expected, cysteine oxidation was not, because the samples were reduced with dithiothreitol during sample preparation. Tryptophan and tyrosine oxidations have been described previously under specific oxidative stress conditions (27, 28), but they are not usually considered during informatics processing for peptide identification.

FIG. 4. Results of the InSPECtor and RMD-PAV methods for the TEDDY iTRAQ data with labeled time periods of interest. A, The InSPECtor outlier scores (%), computed using a neighborhood size of 10, are plotted for each instrument run in chronological order. The horizontal line corresponds to a 95% threshold and the green point represents a sample that exhibited suspicious oxidation patterns. The mistuned sample runs and some of the diluted samples were correctly flagged as poor quality, but the instrument drift was not detected. B, The RMD-PAV scores were computed once for each fraction set. The solid horizontal line corresponds to a 99.9% threshold. Some of the diluted samples were correctly flagged as poor quality, but the large shift in instrument performance at the end of the cohort prevents RMD-PAV from identifying many of the other phenomena of interest during this study.
Moreover, oxidation of cysteine, tryptophan, tyrosine and proline residues was recently identified by an unbiased database search analysis of the HeLa and HEK293 proteomes (29). We then reanalyzed the data, performing the protein database searches with these oxidations considered, which led to an increase of up to 27% in the number of identified peptides (Fig. 5E) but no significant increase in the number of identified proteins (Fig. 5F).
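The normalization step used for Fig. 5C, counting oxidation events per residue type and dividing by how often that residue occurs among all identifications, can be sketched as follows. The function and data layout are illustrative assumptions, not the actual pipeline built around MSGF+ and Ascore:

```python
from collections import Counter

def oxidation_enrichment(peptides, oxidized_sites):
    """Normalize oxidation counts per residue type by the total number of
    occurrences of that residue.

    `peptides` is a list of identified peptide sequences; `oxidized_sites`
    is a list of (peptide_index, position) localizations, e.g. as reported
    by an Ascore-style localization tool.
    """
    # Total occurrences of each amino acid across all identified peptides.
    totals = Counter(aa for seq in peptides for aa in seq)
    # Occurrences of each amino acid at a localized oxidation site.
    oxidized = Counter(peptides[i][pos] for i, pos in oxidized_sites)
    return {aa: oxidized[aa] / totals[aa] for aa in oxidized}
```

Normalizing by residue frequency prevents abundant amino acids from appearing enriched simply because they occur more often in the identified sequences.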
Development of a User-friendly Interface-To make QC-ART more accessible to instrument operators and core facility personnel, we developed a user-friendly online application (https://ascm.shinyapps.io/BAS_QCART), where researchers can perform analyses on QC metrics of their own samples (Fig. 6A). The researcher simply uploads their data to the application, and QC-ART scores and thresholds are then computed using default parameter values (Fig. 6A-6C). We have pre-set reliable threshold values based on our extensive training. However, advanced users can manipulate tuning parameters such as the baseline size and the proportion of variability explained by the principal components (Fig. 6C). The QC-ART scores are plotted in an interactive dot plot along with dynamic and static thresholds to identify instrument runs that may require further evaluation. The source code used by the online application to implement the QC-ART method is freely available as the R package QCART on GitHub (https://github.com/stanfill/QC-ART).

DISCUSSION

QC-ART is a novel and powerful real-time QC tool. Its flexibility sets it apart from existing QC methods, at the cost of more oversight by the researcher. The researcher must choose the baseline data sets, appropriate variables and a model to fit to the scores, but the insights derived from QC-ART are deeper than those currently attainable and offer informative metrics that allow researchers to actively steer data collection. Existing QC tools either automatically choose a baseline set using a machine-learning algorithm, e.g. InSPECtor, or use all instrument runs available, e.g. RMD-PAV. By selecting a baseline of instrument runs at the beginning of a long study, when instrument performance is likely optimal, QC-ART can reveal previously unexplored sources of uncertainty, such as normal m/z instrument drift.
Using a data set that was expertly curated into good- and poor-quality runs, we showed that QC-ART is at least as accurate as RMD-PAV and InSPECtor, with the QC scores for each sample available immediately after peptide identification and other data processing completed, rather than only after all samples were analyzed. In the context of a long-running cohort study, neither InSPECtor nor RMD-PAV could identify the gradual change in instrument performance that was obvious in Fig. 3. The benefit of QC-ART relative to existing post-hoc tools like RMD-PAV derives not only from the baseline flexibility but also from QC-ART's ability to fuse multiple sources of data. The inclusion of NIST variables, such as BPMZ skew, MS1-2B and P-2C, allowed QC-ART to identify samples that were prepared incorrectly but had peptide abundance vectors similar to other runs. Because RMD-PAV is defined solely in terms of peptide abundance data, it was not able to identify the improperly prepared samples. Further, using all other instrument runs as a baseline makes it easy for RMD-PAV to identify large changes in instrument performance, but those large shifts often mask subtle changes such as the slow degradation of instrument performance in January 2016. Finally, by fitting a model to the QC-ART scores and continually checking the assumptions associated with that model, QC-ART was able to pinpoint exactly when the instrument needed service or cleaning. None of the existing QC methods for LC-MS-based proteomics analysis can identify change points in instrument performance with this level of rigor. In addition, QC-ART scores can be modeled either statically or dynamically, allowing QC-ART to identify both global and local changes in instrument behavior. Static thresholds are used exclusively in the literature to identify global outlying observations (7, 10, 13).
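The scoring idea discussed above, measuring each new run's distance from a baseline of early, well-behaved runs, can be sketched as a Mahalanobis-type distance on the QC-metric vectors. This is a simplified illustration of the concept, not a reimplementation of the QCART R package, which additionally reduces dimension with principal components and uses robust estimators:

```python
import numpy as np

def qc_scores(baseline: np.ndarray, new_runs: np.ndarray) -> np.ndarray:
    """Squared Mahalanobis-type distance of each new run's QC-metric vector
    from a baseline of runs assumed to reflect good instrument performance.

    Both arrays are runs x metrics; rows of `new_runs` are scored against
    the mean and covariance of `baseline`.
    """
    mu = baseline.mean(axis=0)
    cov = np.cov(baseline, rowvar=False)
    inv = np.linalg.pinv(cov)  # pseudo-inverse guards against singular cov
    diff = new_runs - mu
    # diff_i . inv . diff_i for each run i
    return np.einsum("ij,jk,ik->i", diff, inv, diff)
```

A static threshold can then be taken as a high quantile of the baseline scores, whereas a dynamic threshold refits a model to the score series as runs accumulate, which is what allows local shifts in instrument behavior to surface.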
QC-ART is an important addition to the existing proteomics QC toolkit and can substantially reduce the amount of time required to identify and re-run samples that may have been subject to unwanted sources of variability.