Dynamic binning peak detection and assessment of various lipidomics liquid chromatography-mass spectrometry pre-processing platforms

Liquid chromatography-mass spectrometry (LC-MS) based lipidomics generate a large dataset, which requires high-performance data pre-processing tools for their interpretation such as XCMS, mzMine and Progenesis. These pre-processing tools rely heavily on accurate peak detection, which depends on setting the peak detection mass tolerance (PDMT) properly. The PDMT is usually set with a fixed value in either ppm or Da units. However, this fixed value may result in duplicates or missed peak detection. Therefore, we developed the dynamic binning method for accurate peak detection, which takes into account the peak broadening described by well-known physics laws of ion separation and set dynamically the value of PDMT as a function of m/z. Namely, in our method, the PDMT is proportional to for FTICR, to for Orbitrap, to m/z for Q-TOF and is a constant for Quadrupole mass analyzer, respectively. The dynamic binning method was implemented in XCMS [1,2] and the adopted source code is available in the Appendix. Our further goal was to compare the performance of different lipidomics pre-processing tools to find differential compounds. We have generated set samples with 43 lipids internal standards differentially spiked to aliquots of one human plasma lipid sample using Orbitrap LC-MS/MS. The performance of the various pipelines using aligned parameter sets was quantified by a quality score system which reflects the ability of a pre-processing pipeline to detect differential peaks spiked at various concentration levels. The quality score indicates that the dynamic binning method improves the performance of XCMS (maximum p-value 9.8·10−3 of two-sample Wilcoxon test). The modified XCMS software was further compared with mzMine and Progenesis. The results showed that modified XCMS and Progenesis had a similarly good performance in the aspect of finding differential compounds. In addition, Progenesis shows lower variability as indicated by lower CVs, followed by XCMS and mzMine. The lower variability of Progenesis improve the quantification, however, provide an incorrect quantification abundance order of spiked-in internal standards.

Liquid chromatography-mass spectrometry (LC-MS) based lipidomics generate a large dataset, which requires high-performance data pre-processing tools for their interpretation such as XCMS, mzMine and Progenesis. These pre-processing tools rely heavily on accurate peak detection, which depends on setting the peak detection mass tolerance (PDMT) properly. The PDMT is usually set with a fixed value in either ppm or Da units. However, this fixed value may result in duplicates or missed peak detection. Therefore, we developed the dynamic binning method for accurate peak detection, which takes into account the peak broadening described by well-known physics laws of ion separation and set dynamically the value of PDMT as a function of m/z. Namely, in our method, the PDMT is proportional to ௭ ଶ for FTICR, to ௭ ଵ . ହ for Orbitrap, to m/z for Q-TOF and is a constant for Quadrupole mass analyzer, respectively. The dynamic binning method was implemented in XCMS [1,2] and the adopted source code is available in the Appendix.
Our further goal was to compare the performance of different lipidomics pre-processing tools to find differential compounds. We have generated set samples with 43 lipids internal standards differentially spiked to aliquots of one human plasma lipid sample using Orbitrap LC-MS/MS. The performance of the various pipelines using aligned parameter sets was quantified by a quality score system which reflects the ability of a pre-processing pipeline to detect differential peaks spiked at various concentration levels. The quality score indicates that the dynamic binning method improves the performance of XCMS (maximum p-value 9.8·10 -3 † of two-sample Wilcoxon test). The modified XCMS software was further compared with mzMine and Progenesis. The results showed that modified XCMS and Progenesis had a similarly good performance in the aspect of finding differential compounds. In addition, Progenesis shows lower variability as indicated by lower CVs, followed by XCMS and mzMine. The lower variability of Progenesis improve the quantification, however, provide an incorrect quantification abundance order of spiked-in internal standards.
Dynamic binning, peak detection, lipidomics, LC-MS pre-processing. There is a significant amount of research directed to the development of LC-MS(/MS) data pre-processing tools. Some are designed primarily for lipidomics while others are more general LC-MS(/MS) pre-processing tools. To name a few, commercial tools such as Progenesis [6] developed by Nonlinear Dynamics and open source tools such as mzMine, [7,8] [15] are available in the field.
The above data processing tools typically contain the following processing modules [20]: data transformation such as resampling and smoothing, peak detection, retention time correction, the grouping of peaks across different samples, filling in missing data, annotation of individual and matched isotope peak clusters with potential metabolite identity, normalization of the quantification values and differential statistical analysis with univariate and multivariate methods or using other method of constructing EICs takes into account the correct ݉ ‫ݖ‬ య మ proportional peak broadening in mz for Orbitrap data in the peak detection step, which lead to more precise quantification.
LC-MS(/MS) pre-processing workflows differ in algorithmic design and involve many different parameters. It is difficult for a user to determine the correct parameters for peak detection to analyze a particular data. An objective evaluation of lipidomic LC-MS(/MS) data pre-processing workflows and used parameter set is important to compare the performance of different pipelines. [21,28] In 2012, Hoekman et al. [29] introduced a scoring method to compare the performances of different quantitative LC-MS/MS pre-processing workflows. The score quantifies the capacity of a pre-processing pipeline, with a given set of parameters, to detect differentially spiked compounds in sample that has the same composition i.e. consist of the same aliquot of one biological (background) sample. In this dataset the spiked compounds are not present in the background sample. We extended this scoring system to determine the distribution of scores which in turn allows to compare pipelines performance using non-parametric significance test. The modified scoring strategy was applied in this study to compare the performance of centWave peak picking with fixed and dynamic EIC construction tolerance in XCMS, as well as mzMine and Progenesis. Finally, we compared the performance of these LC-MS pre-processing workflows after optimizing the parameters for each workflow, to make them comparable as much as possible, despite the algorithmic differences. Dynamic binning is adopting the threshold tolerance of EIC construction in function of mz using equation derived from ion physics of the used mass analyzer.
Peak detection provides crucial information that is required for accurate compound identification and quantification, and accurate construction of EIC is fundamental for accurate peak detection. Setting proper mass tolerance is important for a proper selection of the ions of one isotope of a compound that forms the EIC. Fig. A.1 shows how mass tolerance is related to the width of the selected rectangle. Although there is a detailed guideline for setting the proper mass tolerance in XCMS,[20] the mass tolerance is highly dependent on the mz and the type of mass analyser. As shown in Fig. 1, compound 1 is located at a low mz of 376.3950 Da, which has a small mass peak width, around 0.003462 Da (9.2 ppm), while compound 2 is located at high mz of 1428.1220 Da, which has a larger uncertainty and thus has a mass peak width around 0.025594 Da (18.0 ppm). Thus, a small value of mass tolerance may not be sufficient to cover the uncertainty of ions mz fluctuation in higher mz, resulting in missing intensity ions from peaks and to underestimation of peak quantity and may results in peak splitting. If a larger value is selected for a mass tolerance, merging of mass traces that are close can occur in lower mz leading to missing peaks and quantitative values of mixed peaks.
Typically, this uncertainty (or variability) may result from two sources: the mass fluctuation (MF) and mass dispersion (MD). MF is fluctuation of the peak maxima in the mass spectrum, which can be observed between different subsequent MS1 mass spectra. Fig. A.2  As the above-mentioned uncertainty is dependent on the mz, a fixed mass tolerance value for peak detection may result in peak merging and/or failure to detect peaks at certain mz range. An alternative way is to use a dynamic mass tolerance according to the uncertainty of acquired ions. Thus, the peak detection mass tolerance (PDMT), should be set as a function of the MF (usually defined in ppm unit) and MD. The peak detection mass tolerance in Da unit (PDMT ୈ ୟ ) can be calculated by Mass peak width (∆m): full width at half maximum (FWHM) of mass spectral peak.
Mass resolving power (R): the observed mass (m) divided by the mass peak width (∆m) at 50% height for a well-isolated single mass spectral peak, as illustrated in Eq. (3).
Therefore, the mass peak width (∆m) changes according to mass resolving power (R) and the observed mass (m), as illustrated in Eq. (4).
It follows from Eq. (2), by replacing mass (m) with the mass-to-charge ratio (mz), The underlying physical principle used in the estimation of mz is different for different types of mass analyzers. In an Orbitrap mass spectrometer, the frequency (w) is directly linked to the mz ratio, as illustrated in Eq. (6) In which e represents the electron charge and z represents the number of charges of the ions. The letter k represents the field curvature, which is a constant value. Thus, To find the relationship between the mass resolving power R and mz, we use the derivative of the mass with respect to the angular frequency from Eq. (7) and derive the Eq. (8) after rearrangement: The ratio of mass to mass variation can then be obtained by dividing Eq. (7) In  mobile phase B in 2 min. The percentage of mobile phase B raised to 50% in the next 0.1 min and increased to 54% in next 9.9 min. Mobile phase B raised to 70% in 0.1 min and increased to 99% in 5.9 min and maintained at 99% for 1 min. Then the percentage of mobile phase B went back to 40% in 0.1 min and the system was equilibrated for 3.9 min before the next run started. The MS was set for positive mode and Based on these t-statistics, the quality scores introduced in this paper were calculated. To get the distribution of these quality scores, in each of the concentrations, we randomly selected 9 out of the 10 replicates to calculate the t-statistics and the quality score. This selection was repeated for 20 times resulting in 20 quality scores for each comparison. Vendor Peak Picking filter was selected to export centroid data. The .mzML files were sent to mzMine (version 2.40.1) for peak picking, grouping and isotope filtration. XCMS (version 3.8.2) also used .mzML files as input for peak picking and grouping for both the dynamic and fixed binning methods. The obtained results were further isotope filtered by CAMERA (version 1.43.2). The exported .csv files were used to assess the performances of the different LC-MS pre-processing tools. The fixed binning term reflects one parameter set in ppm used for linearly expanding EIC construction mz range in function of mz.

Dynamic binning implementation.
In XCMS, the centWave[26] algorithm is one of the most commonly used peak detection algorithms, which uses a mass tolerance for EIC construction in ppm unit. In this study, the centWave algorithm in XCMS source code was modified to implement the dynamic binning approach, which is available in the Appendix. The mz of the dataset ranges from 250 Da to 1750 Da, thus the mass tolerance in modified XCMS source code was changed according to the presented theory.

Model validation
To validate our model to set EIC construction threshold, we visually inspected the sampling interval in the profile mode dataset. The sampling intervals are the distances in mz between two adjacent acquisition points, which is determined by an algorithm implemented on the electronic card of Orbitrap instrument performing data processing with goal to have a constant number of sampling points for each peak in the mass spectrum. Since the number of sampling points in Orbitrap mass spectra is constant, then sampling interval increase should be in accordance with our theory on increasing peak width. Fig. 2a shows that the manually checked sampling intervals (red dots in the figure) increase proportional to . This increasing behavior of the sampling intervals set on the acquisition instrument and found in profile mass spectrum is following the peak width increase in function of mz described by Eq. (15). The exact number of sampling intervals defining the m/z interval which contains all ion intensity information for a peak in profile mode may vary depending on instrumental setting such as resolution.
Empirically, this number can be estimated as 4. Myers et al [25] found, in centroided data, that most of the peaks in a mass spectra span typically between 0-3 sampling intervals. They chose a fixed mz tolerance value of 0.01 Da and 0.02 Da for Orbitrap and Q-TOF, respectively. This setting can ensure that most of the centroid maxima of the analysed isotopic ions in consecutive mass spectra (Fig. 1a) are included in the EIC construction. The mz tolerance value is estimated according to 1 sampling interval at the highest mz. In our model, the peak detection mass tolerance in Da unit ‫ܶܯܦܲ(‬ ), calculated according to Eq. (20), is between 1 and 2 sampling intervals (Fig. 2a). This dynamic setting for peak detection mass tolerance could, on one hand, avoid merging EICs at low mz and on the other hand avoid EICs splitting of one isotope peak at high mz values.
In the implemented dynamic binning method, the mz tolerance value in centWave was set to the dynamic ܲ ‫ܦ‬ ‫ܯ‬ ܶ value, while for the fixed binning method 7, 13 or 19 ppm ( Fig. 2b) were used for peak detection for the entire mz range. namely the area under the chromatographic peak is 1.648·10 8 , which becomes 1.646·10 8 after baseline correction, and the maximum intensity measured in the mz and retention time rectangle containing the peak is 6.441·10 6 ( Table 1). Peaks in Fig. 3b shows similar properties as in Fig. 3a.  Fig. 3. Examples of IS peaks not detected with the default 7 ppm fixed EIC construction threshold by XCMS.

Comparison between the dynamic and fixed binning strategy in XCMS
The lower part of the figure shows the scatter plot of mz and retention time, while the upper part shows the peak shape as a scatter plot of intensity and retention time.   (a) The highlighted point is included only with the use of dynamic binning, while it is missing with fixed binning using 7 ppm threshold. This result in a higher area of a peak detected with the dynamic binning method compared to the one obtained with the fixed binning. (b) An example of peak splitting in fixed binning method with 7 ppm threshold. The two green rectangles indicate the two peaks detected by the fixed binning method, while dynamic binning method only detected correctly the one peak indicated by blue rectangle.
Even though the constructed EICs pass XCMS's filtration parameters, the quantification value of detected peaks may be less accurate compared to dynamic binning, because of less accurate EIC construction. The two peaks in Fig. 4 can be detected by using both fixed binning and dynamic binning method, but the quantification values are different (Table 1). For example, the highlighted point with black square in Fig. 4a is included only by the dynamic binning, while from the LC profile (upper part of Fig. 4a) it is obvious that the dots is part of the bell-shaped curve. As a result, the area of the peak detected with the dynamic binning method is higher than the one obtained with the fixed binning (i.e. 1.278·10 7 compared with 1.241·10 7 ). Fig. 4b shows an example of peak splitting using fixed binning peak detection. The two green rectangles indicate that the fixed binning method detected this peak twice at retention time 415 and 424 respectively; while dynamic binning method only detected correctly one peak at retention time 415, as indicated by the blue rectangle.
The difference between the number of detected peaks is small and from the presented data visualization of few examples is not obvious how dynamic binning performs at the whole dataset level. To have an overview of the difference of quantification between dynamic binning and fixed binning method considering all the peaks, we plotted the Bland-Atman plot of the quantification with dynamic and static EIC binning as shown in Fig. 5. The figure shows the comparison in concentration 1/16, 1/8, 1/4, 1/2, and 1 respectively. In each concentration, the x-axis shows the mean of the natural logarithm of peak intensities obtained with the dynamic and fixed binning method (7 ppm), and the y-axis represents the differences of the natural logarithms of the peak intensities obtained with both methods (i.e. In order to assess if the higher quantification values lead to more accurate quantification, we have applied a quality score introduced by Hoekman et al. [29] that assess ability to identify differential spiked-in compounds in samples with the same complex lipid background.   two-samples Wilcoxon non-parametric tests performed between dynamic binning and fixed binning with threshold of 7 ppm. It is also obvious that dynamic binning method is performing better than fixed binning with 13 and 19 ppm thresholds and that the dynamic binning method improves XCMS's quantification performance. Dynamic binning is performing much better than mzMine and perform similarly as Progenesis in 3 out 4 analysis. The quality score assessing the performance to identify spiked-in features in stable molecular background is a cumulative score, i.e. it is also worth exploring the ranking of the spiked-compound-related features amongst the features identified to be differential between spiking levels. The cumulative quality scores in function of ranked list of discriminating features is shown in Fig. 7. In this figure, a binary heat map is included above the x-axis. Each row in this heat map indicates the differential features ranked from the most differential (t-statistics) to the weaker ones from left to right. In these, the colored bars indicate features related to the spiked-in compounds, which features are contributing to the quality cumulatively scores as indicated by the y-axis of the line plots. The white bars, on the contrary, indicate the features that correspond to compounds from the background sample which are non-discriminatory between the different spiked-in concentration levels. These non-discriminatory features will lower the increase in quality score for subsequent lower-ranked IS features. For example, the dynamic binning method contains more coloured bars than white bars amongst the most discriminant features at the left part of the plots, which indicates it detected more differentially spiked-in IS-related features as reflected in the quality score. This result indicates that dynamic binning method detect differential features more accurately as indicated by the higher number of spiked-in related features occurring amongst the most discriminative features compared with fixed binning.
The two factors mentioned above, i.e. (1) larger number of coloured cells which (2) accumulate more abundantly at the left side of the plot, result in its higher cumulative score compared with fixed binning method at 7 ppm, 13 ppm and 19 ppm.    I  S  i  n  d  i  c  a  t  e  d  a  s  c  o  l  o  r  e  d  c  e  l  l  s  ,  w  h  i  c  h  a  r  e  i  n  c  r  e  a  s  i  n  g  t  h  e  c  u  m  u  l  a  t  i  v  e  q  u  a  l  i  t  y  s  c  o  r  e  i  n  t  h  e   y  -a  x  i  s  .  T  h  e  w  h  i  t  e  c  e  l  l  s  i  n  d  i  c  a  t  e  t  h  e  f  e  a  t  u  r  e  s  n  o  t  r  e  l  a  t  e  d  t  o  I  S  a  n  d  t  h  e  r  e  f  o  r  e  t  h  e  s  e  f  e  a  t  u  r  e  s  1  )  a  r  e   n  o  t  c  o  n  t  r  i  b  u  t  i  n  g  t  o  t  h  e  c  u  m  u  l  a  t  i  v  e  q  u  a  l  i  t  y  s  c  o  r  e  i  n  c  r  e  a  s  e  a  n  d  2  )  indicate Progenesis and dynamic binning XCMS has a strong ability to find spiked-in compound related discriminating features and identify less background peaks having the same level across all spiked-in levels.
Apart from the aspect of finding more accurately features with different levels, it is also worth evaluating the precision of the quantification of different methods. Fig. 8 shows the scatter plot of log 2 fold change (x-axis) versus log 10 average abundance of a matched features (y-axis). The y-axis range of XCMS and mzMine is similar (5-11), while Progenesis is quite different (3-9), which indicates differences in the quantification metric. To assess if pipeline preserve the quantitative order of peaks, EICs (Fig. A.7 for compound A, B and C respectively). Multiple possible reasons may exists to explain the abundance difference such as that XCMS and mzMine used the algorithm of CAMERA for isotope filtration, in which usually the highest isotope peaks are used for abundance calculation, while Progenesis used the average of isotope clusters for the abundance calculation. It should also be that the peak are of Compound A and B differs around 1 order of magnitude, which is accurately captured by the dynamic peak picking of XCMS and mzMine, while this difference is almost two order of magnitudes in case of Progenesis pre-processed data. Another explanation can be that the aggregate LC-MS map is constructed from aligned chromatogram of all fold changes. In this setup, the highest spiking level determine the area rectangle where ion intensity summation for all identified isotopes is performed and this rectangle is larger that should be for lower spiking levels features and may include therefore ions that do not belong to the spiking features leading to intensity order change.
The x-axis of Fig. 8  approach using the aggregate map allows detection of lower abundant peak due to the use of average signal across multiple LC-MS map for peak picking, which helps to identifying larger number of less differential peaks as indicated by the higher cumulative scores of Progenesis compared to dynamic binning XCMS approach taking into account all features for the score calculation. The density plot of CV between 0 and 1.5 (i.e. 150%) is shown in Fig. 9. These figures show that Progenesis has the most accurate feature quantification indicated by lower CVs and thus indicating lower variability of features in replicates of five individual concentration levels, while mzMine and XCMS have a wider distribution of CV. The better performance of Progensis is due to the use of aligned aggregate map for peak picking and feature detection, which can detect more accurately features with low variance as discussed previously. Fig. 9.   (Fig. 3) and more accurate quantification ( Fig.   4 and 5). As a result, the dynamic binning method shows a higher quality score (Fig. 6, maximum p-value: 9.8·10 -3 ) and cumulative score (Fig. 7) compared with the fixed binning method. This indicates XCMS's ability to find meaningful compounds (i.e. lipid biomarkers) is improved.
The results from the quality score and cumulative score also show Progenesis achieves similarly high quality scores and cumulative scores in all four fold changes to dynamic binning XCMS method, which is due to the use of aligned aggregate LC-MS for feature detection. Apart from the quality score and cumulative score, to evaluate the stability in quantification for XCMS, mzMine and Progenesis, we also determined the log 2 fold change distribution (Fig. 8) and CV distribution (Fig. 9) based on the intensities of detected features. The results show that Progenesis has lower CVs for plasma related features, which indicates lower variability of features in replicates of five individual concentration levels, followed by XCMS and mzMine. The lower variability of Progenesis may be due to the unique quantification algorithm (Fig. A.7), however, this also altered the order of quantitative values of ISs (Table A.5). The use of aligned aggregate LC-MS maps for feature detection area may be beneficial to quantify peaks with low variability but can lead to artifacts in case of misaligned LC-MS maps and for features with larger variability. Dynamic binning peak picking in XCMS shows similar performance compared to Progenesis without the use of aligned aggregate maps and being exposed to the previous mentioned artifacts, we recommend it used in lipidomics and metabolomics quantitative profiling.    K  a  l  m  a  n  f  i  l  t  e  r  -b  a  s  e  d  X  C  -M  S  i  s  o  t  o  p  e  t  r  a  c  e  f  e  a  t  u  r  e  d  e  t  e  c  t  i  o  n  ,  B  i  o  i  n  f  o  r  m  a  t  i  c  s  .  3  0  (  2  0  1  4  )  2  6  3  6  -2  6  4  3  .  h  t  t  p  s  :  /  /  d  o  i  .  o  r  g  /  1  0  .  1  0  9  3  /  b  i  o  i  n  f  o  r  m  a  t  i  c  s  /  b  t  u  3  5  9  .  [  2  4  ]  A  .  R  a  f  i  e  i  ,  L  .  S  l  e  n  o  ,  C  o  m  p  a  r  i  s  o  n  o  f  p  e  a  k  -p  i  c  k  i  n  g  w  o  r  k  f  l  o  w  s  f  o  r  u  n  t  a  r  g  e  t  e  d  l  i  q  u  i  d  c  h  r  o  m  a  t  o  g  r  a  p  h  y  /  h  i  g  h  -r  e  s  o  l  u  t  i  o  n  m  a  s  s  s  p  e  c  t  r  o  m  e  t  r  y  m  e  t  a  b  o  l  o  m  i  c  s  d  a  t  a  a  n  a  l  y  s  i  s  ,  R  a  p  i  d  C  o  m  m  u  n  .  M  a  s  s  S  p  e  c  t  r  o  m  .  2  9  (  2  0  1