Tutorial: Correction of shifts in single-stage LC-MS(/MS) data

Single-stage LC-MS data (MS1 map) should be comparable for accurate


Introduction
Over the past decade, LC-MS(/MS) technology has been routinely used in proteomics and metabolomics laboratories to analyse complex biological samples [1,2]. However, understanding system-level perturbations and the molecular mechanisms of biological events and diseases requires quantitative values of biomolecules, so that the compounds showing differential levels between sample groups can be determined [3–6].
The non-fragmented single-stage part (MS1) of LC-MS(/MS) data is described by two separation dimensions, mass-to-charge ratio (m/z) and retention time (rt), and one readout, ion intensity (iin). Such data is considered a second-order tensor obtained from a second-order analytical instrument (Fig. 1) [7]. MS1 data contains signals from all compounds that can be detected by an LC-MS(/MS) system and is the signal of choice for label-free quantification [8,10,11]. The MS1 signal is also used for quantification in stable-isotope labelling methods that provide a sample-specific signal in the MS1 domain, such as stable isotope labelling by amino acids in cell culture (SILAC) [12], isotope-coded affinity tags (ICAT) [13] and isotope-coded protein labelling (ICPL) [14]. In the ideal case, the rt and m/z coordinates of a compound would not differ between MS1 maps, so that the MS1 signals of identical compounds could be identified in multiple MS1 maps from these coordinates alone, independently of the identification status and intensity of the compounds. However, these coordinates are not constant and are subject to variation. Some of these variations are correctable by a single monotonic function and others are not. These variabilities, in conjunction with local compound density and other signal processing factors such as the presence of chemical and electronic noise, should be taken into account by the quantitative LC-MS(/MS) data pre-processing workflow in order to provide accurate quantitative tables, with columns (or rows) corresponding to samples and rows (or columns) to compounds, which are subsequently used for statistical evaluation. The iin readout may also vary, for example as a result of differences in the injected sample amount due to variation in the quantity of all or of a subset of compounds in one batch. In addition, ion suppression effects may reduce or increase the iin readout value of the affected compounds. These variations affect the quality of the quantitative table obtained from LC-MS(/MS) pre-processing and ultimately the statistical outcome of biomarker discovery or differential expression analysis.
The minimal MS1 data pre-processing workflow includes only modules for peak detection and matching and assumes no shift in the rt and m/z dimensions or in the iin readout of MS1 data (Fig. 2a). Typical quantitative MS1 LC-MS(/MS) data pre-processing (Fig. 2b) consists of modules for data format conversion, raw data resampling in the retention time and m/z dimensions, denoising, correction for background ion intensity, and peak detection and quantification, followed by correction of shifts occurring in each of the rt and m/z dimensions and in the iin readout of the MS1 data. Algorithms that correct shifts in the rt domain are named retention time alignment methods, algorithms that correct shifts in the m/z domain are called mass (re)calibration, and algorithms that correct "shifts" in the iin readout are classified as normalisation approaches. The term "shift" in the iin readout cannot be interpreted in the same way as in the separation dimensions, but similar phenomena can be observed, e.g. when the total amount of injected sample differs, and the correction of these "shifts" can be treated mathematically in a similar way to that of the separation dimensions. The final step after correction of shifts in the two separation dimensions of MS1 is peak matching, which identifies the MS1 information of identical compounds in multiple LC-MS(/MS) chromatograms based on m/z and rt coordinates, on an identified peptide sequence, or on the similarity between MS/MS spectra in data-dependent acquired LC-MS/MS data. All these steps are required prior to statistical evaluation and are implemented in automated data pre-processing pipelines [15–20]. One of the most critical steps is the accurate correction of shifts in the rt and m/z dimensions and in the iin readout of MS1 maps. Improper correction of shifts may lead to inaccurately matched peaks and to quantification bias, which may ultimately lead to inappropriate conclusions after the statistical analysis. The presence of such errors is often only recognized much later, during experimental validation of the original biomarker discovery results, which contributes to the irreproducibility of biological and preclinical studies and leads to loss of analysis time, research effort and resources [21].

In this tutorial, we focus on the LC-MS(/MS) analysis conditions that result in comparable MS1 LC-MS(/MS) maps and on algorithms that are able to pre-process the obtained MS1 maps accurately. The discussion of LC-MS(/MS) pre-processing is restricted to the sources of variability between MS1 maps of LC-MS(/MS) data, the assessment of this variability and the approaches used to correct it. Special attention is devoted to the physicochemical origins and algorithmic treatment of correctable (monotonic) and non-correctable (non-monotonic) shifts in the m/z and rt separation dimensions and in the iin quantitative readout. This tutorial is aimed at experimental scientists who plan molecular profiling experiments and want to generate MS1 data that can be pre-processed accurately, as well as at bioinformaticians who develop new algorithms for LC-MS(/MS) data pre-processing and quality control.

Definitions and statements
To avoid confusion and to facilitate reading of the article, we define here the terms that are used throughout the manuscript.

Single-stage LC-MS(/MS) or MS1 dimensions: the term dimension is used both for the separation variables (rt or m/z) and for the readout variable (iin) of an MS1 map.

Monotonic shifts: differences (fluctuations) of values in one of the rt and m/z dimensions or in the iin readout of an MS1 map pair, for the same compounds (rt and m/z dimensions) or for the same compounds with the same quantity (iin readout), that can be corrected using a monotonic function.

Non-monotonic shifts: differences (fluctuations) of values in one of the rt and m/z dimensions or in the iin readout of an MS1 map pair, for the same compounds (rt and m/z dimensions) or for the same compounds with the same quantity (iin readout), that remain after correction for the monotonic shift. Monotonic and non-monotonic shifts are always defined in the same dimension (or readout) of MS1 map pairs, i.e. between m/z, between rt or between iin values.

Orthogonality: orthogonality has many definitions in different scientific disciplines. In algebra, two vectors are orthogonal when their dot product is zero. A more general definition of orthogonality relates to synonyms such as independence, non-correlated or non-overlapping properties. Analytical chemistry uses the term orthogonality to measure the similarities and differences of two separation systems, e.g. in liquid chromatography. Camenzuli et al. define the orthogonality measure of two chromatographic separations as a characteristic that describes the degree of independence of the two separation systems [22]. Gilar et al. provided a similar but more practical definition, in which orthogonality is the joint peak capacity of two chromatographic systems, evaluated as the percentage of occupied bins containing the same compound in the complete peak capacity space [23]. The analytical chemistry definition allows orthogonality to be smaller or larger, in contrast to the binary algebraic definition, in which two vectors are either orthogonal or not. Different metrics for orthogonality are reported in the analytical chemistry literature [22–25], and each of them refers to the fraction of the area occupied by common compounds in the separation space of two chromatographic systems. These metrics take values between 0 and 1, where 0 means two equivalent and 1 two fully independent separation systems. Since orthogonality is assessed using the common compounds, its value depends not only on the separation dimensions but also on the chemical space of the analysed compounds. We interpret orthogonality following the analytical chemistry definition.
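The bin-occupancy idea behind these metrics can be illustrated with a minimal sketch. It assumes that the retention times of the common compounds in the two separations are already paired, and it returns only the raw fraction of occupied cells of a square grid. The function name `bin_occupancy` and the grid size are hypothetical choices made for illustration, and the sketch does not reproduce any of the published metrics [22–25], which rescale such a coverage value so that equivalent systems score 0.

```python
def bin_occupancy(rt_a, rt_b, n_bins=10):
    """Fraction of cells of an n_bins x n_bins grid that contain a common compound.

    rt_a[i] and rt_b[i] are the retention times of the same compound i in
    separation system A and B (pairing is assumed to be known).
    """
    if len(rt_a) != len(rt_b) or not rt_a:
        raise ValueError("need two equally long, non-empty retention time lists")

    def to_bin(values):
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero for constant input
        # Scale to [0, 1], map to a bin index; the maximum goes into the last bin.
        return [min(int((v - lo) / span * n_bins), n_bins - 1) for v in values]

    occupied = set(zip(to_bin(rt_a), to_bin(rt_b)))
    return len(occupied) / (n_bins * n_bins)


# Two equivalent separations occupy only the diagonal of the grid.
identical = bin_occupancy(list(range(10)), list(range(10)), n_bins=10)
```

In this sketch two equivalent separations still occupy the diagonal cells, so the raw value is small but not zero; compounds spread over the whole separation space push the value towards 1.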

Conditions for correcting shifts
MS1 data has two separation dimensions (m/z and rt) and one readout (iin), as described in the introduction. Quantitative information of compounds in MS1 data is represented as 3-dimensional Gaussian (or Lorentzian) peaks, where iin is the extent of the peak while rt and m/z represent the location of the peak maximum. The distinction between the iin readout and the m/z and rt dimensions is reflected by the role of these variables: m/z and rt characterise the peak capacity of the analytical system and are related to the physicochemical properties of a compound, while the quantity of a compound is expressed in the iin readout, which is the main interest of the subsequent quantitative statistical analysis.

[Fig. 1 caption, beginning missing] ... The dimensions are mass-to-charge ratio (m/z), retention time (rt) and ion intensity (iin) readout. Chromatographic pairs can show monotonic shift and non-monotonic shift with an orthogonality component. The monotonic shift can be corrected, while the remaining non-monotonic shift, including orthogonality, determines the uncertainty in finding corresponding peaks in the chromatograms using the rt and m/z dimensions. Orthogonality in the iin readout leads to statistical bias and increases false discoveries in statistical differential analysis.

Algorithms correcting for shifts are generally applied to LC-MS(/MS) chromatographic pairs, but some approaches, such as the Continuous Profile Model [26,27], align the complete dataset in one step. This method assumes one common underlying molecular profile, to which all chromatograms are aligned using a hidden Markov model [26]. In pairwise alignment, the MS1 coordinates of the raw data or feature list of one chromatogram (often called the sample chromatogram) are generally corrected to those of the other, non-altered chromatogram, which is considered to be the reference. In this tutorial we discuss pairwise alignment of MS1 maps, but similar conditions apply to methods that align the complete dataset in one step. Shifts may occur in the two separation dimensions and in the readout of an MS1 map; these shifts have a physicochemical and/or instrumental cause or originate from errors of LC-MS(/MS) data pre-processing.

[Fig. 2 caption, beginning missing] ... ) optimal label-free MS1 data pre-processing workflows. Two modules are required for the minimal workflow: a peak detection/quantification module (green) and a module that matches the corresponding peaks across multiple chromatograms (purple). The minimal workflow assumes no monotonic shift and no orthogonality in the rt and m/z dimensions and in the iin readout. The optimal workflow implements modules for correction of monotonic shifts in the rt and m/z separation dimensions and in the iin readout of the MS1 map, corresponding to time alignment (correction in rt), mass (re)calibration (correction in m/z) and normalization (correction in iin). Other modules, such as noise filtering, data reduction and resampling, are additional modules of the workflow. Although not present in current pipelines, an orthogonality assessment and modelling module, e.g. using retention time prediction or feature decharging algorithms, may add precision to the LC-MS(/MS) data pre-processing workflow. The result of LC-MS(/MS) pre-processing is a quantitative table of compounds detected in multiple chromatograms, which serves as input for differential statistical analysis. Scheme b) was adopted from Christin et al. [39].

In the rt and m/z dimensions and in the iin readout of an MS1 map, monotonic shifts can be corrected when the following conditions are met:

1. The sample chromatograms should contain common compounds for alignment in the m/z and rt dimensions, while for normalization (correction in the iin readout) the chromatographic pairs should contain common compounds with the same quantity.

2. The alignment algorithm should accurately identify an adequate number of common peaks for alignment in the rt and m/z dimensions, while for the iin readout the normalisation algorithm should identify common compounds that are present in the same quantity, in sufficient numbers and with sufficient distribution over the range of interest to allow accurate correction.

3. Common compounds should follow the same order in both chromatograms in the m/z and rt dimensions. In the iin readout, the order of the ion intensities of the common compounds present in the same quantity should be the same in the two chromatograms.
It is important to note that an accurate single monotonic correction function applicable to all compounds cannot be derived if one or more of these conditions are not met. The common compounds (in the rt and m/z dimensions) and the common compounds that are present in the same quantity (in the iin readout) in the two chromatograms convey the information that should be used to derive the single monotonic correction function. Once the correction function has been obtained, the rt, m/z and iin values of all other compounds are corrected with it. The requirement that common compounds should have the same quantity in the two chromatograms for alignment in iin is due to the fact that detector response and ion suppression/competition effects may differ between concentration ranges. The condition of having the same compounds in the same quantity may seem too restrictive compared to a requirement of known quantity. However, in the iin readout the signal of a compound may be affected by the other compounds present, e.g. due to ion suppression, while this coupling is negligible in the rt and m/z dimensions, i.e. the influence of other compounds on the rt and m/z of one particular compound in the sample is limited. Using compounds with known but different quantities in the two chromatograms would result in compounds that are in different concentration ranges, and their values could be affected by different detector response and/or ion suppression.

[Fig. 4 caption, beginning missing] ... Fig. 3). The monotonic retention time correction function is shown as a red solid line. The maximal deviation of peptides from the monotonic correction function obtained with the robust kernel density approach and between laboratories is shown with a red dashed line (red D). The green dashed line and the green "d" label show the maximal deviation of ...
When the second condition is not met, common compounds or compounds with the same quantity are present in the two chromatograms, but the correction algorithm is unable to find them in sufficient number, density and accuracy to perform an accurate correction. Besides the number of common compounds and of common compounds with the same quantity, their distribution along the full measured range is important as well. If there are domains with no or few common compounds or compounds with the same quantity, then information for monotonic shift correction is lacking at these locations and local misalignment may occur. In highly complex proteomics samples, common compounds and compounds with the same quantity are present in sufficient number and density across the full measured range. This may, however, be challenging for lower-complexity metabolomics data. A typical example of lacking information is the beginning or end of the chromatogram, where no compounds elute. Another important aspect is the accuracy with which the alignment algorithm selects the common compounds or the compounds present in the same quantity. If mismatched compounds or noise are present to a large extent, then the correction algorithm may be inaccurate. When the third condition is not met, the order of the common compounds or of the compounds with the same quantity is mixed up, and the exact location or quantity of a compound in the other chromatogram cannot be determined by deriving a single monotonic correction function.
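The second and third conditions can be checked with a few lines of code before a correction function is derived. The sketch below is illustrative only: it assumes that the common compounds have already been paired between the reference and the sample chromatogram (the first condition), and the function name `check_alignment_conditions` and the thresholds `min_pairs` and `max_gap_fraction` are hypothetical values that would have to be tuned per dataset.

```python
def check_alignment_conditions(ref_rt, sample_rt, rt_range,
                               min_pairs=20, max_gap_fraction=0.2):
    """Check number, coverage and order of paired common compounds.

    ref_rt[i] and sample_rt[i] are the retention times of common compound i.
    rt_range is the (start, end) of the measured range on the reference axis.
    Thresholds are hypothetical defaults, not recommended values.
    """
    pairs = sorted(zip(ref_rt, sample_rt))  # order by reference retention time
    n = len(pairs)

    # Condition 2a: a sufficient number of common compounds.
    enough_pairs = n >= min_pairs

    # Condition 2b: sufficient distribution, measured as the largest stretch
    # of the reference axis that contains no common compound.
    lo, hi = rt_range
    edges = [lo] + [r for r, _ in pairs] + [hi]
    largest_gap = max(b - a for a, b in zip(edges, edges[1:]))
    well_covered = largest_gap <= max_gap_fraction * (hi - lo)

    # Condition 3: same order in both chromatograms; count pairs of compounds
    # whose elution order is inverted in the sample chromatogram.
    s = [sample for _, sample in pairs]
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])

    return {
        "enough_pairs": enough_pairs,
        "well_covered": well_covered,
        "largest_gap": largest_gap,
        "order_inversions": inversions,
    }
```

A non-zero number of order inversions does not by itself prevent alignment, but it indicates non-monotonic shift that a single monotonic correction function cannot remove.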

Distinction between monotonic and non-monotonic shifts and orthogonality
A correctable shift should be monotonic, since any deviation from monotonicity would break the one-to-one correspondence of the coordinate transformation. Monotonicity also ensures that the shift-correcting function can be mathematically inverted, which in effect swaps the roles of sample and reference in the aligned chromatographic pair. Monotonic and non-monotonic shifts have different physicochemical origins and should be treated differently by algorithms. Monotonic shifts can be corrected, but non-monotonic shifts cannot, unless the physicochemical process that leads to the non-monotonic shift can be fully modelled. It is important to note that a monotonic shift should be corrected with a single monotonic function that is generally applied to all compounds in the MS1 maps. Correction for non-monotonic shift requires compound-specific correction functions obtained from precise modelling of the retention mechanisms or intensity changes of the compounds. The application of a monotonic function to a group of signals is rare, but one example is provided later, where an individual monotonic function is applied to each m/z channel of the MS1 map to correct small fluctuations in ion trap data caused by charge repulsion. Monotonic and non-monotonic shifts are assessed using only compounds that are present in both chromatograms (common compounds) and, in the iin readout, using common compounds that are present in the same quantity in the two chromatograms.
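The invertibility argument can be made concrete with a small sketch. It assumes that landmark retention times of common compounds are available in both chromatograms and uses piecewise-linear interpolation between them as the correction function; this is a simplification chosen for brevity and not one of the alignment algorithms discussed in this tutorial. The name `make_correction` is hypothetical.

```python
from bisect import bisect_right


def make_correction(source_rt, target_rt):
    """Return a piecewise-linear function mapping source times to target times.

    source_rt[i] and target_rt[i] belong to the same common compound.
    At least two landmarks are needed, and they must be strictly increasing
    in both chromatograms, otherwise the mapping is not one-to-one.
    """
    pairs = sorted(zip(source_rt, target_rt))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    if len(xs) < 2:
        raise ValueError("at least two landmarks are required")
    if any(b <= a for a, b in zip(xs, xs[1:])) or any(b <= a for a, b in zip(ys, ys[1:])):
        raise ValueError("landmarks are not strictly increasing in both chromatograms")

    def correct(t):
        # Select the segment containing t; times outside the landmark range
        # are extrapolated with the first or last segment.
        i = min(max(bisect_right(xs, t) - 1, 0), len(xs) - 2)
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        return ys[i] + slope * (t - xs[i])

    return correct


# Swapping the arguments inverts the correction, i.e. swaps sample and reference.
sample_landmarks = [10.0, 20.0, 40.0]
reference_landmarks = [12.0, 25.0, 41.0]
to_reference = make_correction(sample_landmarks, reference_landmarks)
to_sample = make_correction(reference_landmarks, sample_landmarks)
```

Because the landmarks are strictly increasing in both chromatograms, `to_sample(to_reference(t))` returns `t` up to rounding error; with an elution order inversion among the landmarks the construction fails, which mirrors the loss of one-to-one correspondence described above.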
Non-monotonic shift may have two components. The first is related to data pre-processing errors, such as errors in determining the location of a compound signal in the MS1 map (m/z and rt dimensions) or in compound quantification (iin readout). The second is related to elution order inversion of common compounds and can therefore be interpreted according to the analytical chemistry definition of orthogonality. The orthogonality metric should be calculated after correction for the monotonic shift and will inevitably contain the data pre-processing error. Comparable MS1 maps that do not need complex modelling of orthogonality can therefore be obtained for MS1 map pairs that include only monotonic shift and non-monotonic shift consisting of the data pre-processing error component.
Publications so far discuss alignment (correctable monotonic shift) and the assessment of orthogonality in LC-MS(/MS) (and GC-MS or CE-MS) data separately. For example, orthogonality is considered absent in the design of retention time alignment algorithms, even though the existence of elution order inversion, i.e. the presence of small orthogonality, was recognised in multiple articles [28,29]. However, the two phenomena may be present to different extents in various datasets and may influence the performance of both monotonic shift correction and orthogonality assessment algorithms. In the literature, orthogonality was related solely to the retention time domain and was not mentioned for the m/z dimension or the iin readout of the MS1 map [22–25]. By correcting with a single monotonic function, we separate the monotonic shift from the non-monotonic shifts, which may have an orthogonality component. Fig. 3 shows a pair of chromatograms of the same complex proteomics sample with non-linear monotonic shift mixed with orthogonality and with non-monotonic shift due to data pre-processing error. The figure also shows the monotonic retention time correction function and the non-monotonic shift that remains after correcting for the monotonic shift with a single monotonic function applied to all compounds.
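The separation of monotonic from non-monotonic shift can be sketched in code. The example below fits a non-decreasing trend to paired retention times with the pool-adjacent-violators algorithm (isotonic least-squares regression) and reports the residuals as the non-monotonic part. Isotonic regression is used here only because it is short and has no tuning parameters; it is an assumption of this sketch and not the method used to produce Fig. 3. The function names are hypothetical.

```python
def isotonic_fit(y):
    """Least-squares non-decreasing fit to y (pool-adjacent-violators)."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in y:
        blocks.append([v, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def non_monotonic_residuals(sample_rt, ref_rt):
    """Residual of each common compound from the fitted monotonic trend.

    sample_rt[i] and ref_rt[i] belong to the same compound; residuals are
    returned in the input order, in units of the reference retention time.
    """
    order = sorted(range(len(sample_rt)), key=lambda i: sample_rt[i])
    trend = isotonic_fit([ref_rt[i] for i in order])
    residuals = [0.0] * len(order)
    for position, i in enumerate(order):
        residuals[i] = ref_rt[i] - trend[position]
    return residuals
```

The largest absolute residual gives a simple estimate of the uncertainty with which a peak can be located in the other chromatogram after monotonic correction; it contains both the orthogonality component and the data pre-processing error.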
Since orthogonality cannot be corrected without accurate modelling and without knowing the identity of the peak in the MS1 data, the rt or m/z coordinates of a compound cannot be predicted precisely in other LC-MS data, while in the iin readout the normalisation will have limited precision. Fig. 4a shows a scatterplot of the retention times of identical peptides in two chromatograms that were obtained by analysing the same sample using two different LC-MS/MS platforms and gradient LC programs. Non-linear monotonic shift and orthogonality are clearly visible in the plot. Aligning the two chromatograms with the best-fitting monotonic retention time correction function, obtained on the scatterplot using LOWESS regression constrained for monotonicity, results in accurate alignment of peaks that are located on the correction function, while peaks far from this function are misaligned (Fig. 4b and c). Orthogonality in this tutorial is assumed to have a symmetric form around a main monotonic trend, which is generally the case when the goal is to align datasets corrected for non-monotonic shift with small orthogonality component (i.e. strong correlation of the rt of the same compounds in the two chromatograms). This situation may be different when orthogonality is large, e.g. when optimising peak capacity in multidimensional chromatography [22,30]. Another assumption in our discussion of monotonic and non-monotonic shifts is that these shifts are independent between the two separation dimensions, m/z and rt. This does not hold for iin, which leads to the requirement of having the same compounds with the same quantity present in chromatographic pairs. Interactions between the rt and m/z dimensions exist, but their effect is generally small [31,32].

[Fig. 4 caption, continued] ... peptides from the main monotonic retention time correction function in data that was acquired in the same laboratory using the same LC-MS/MS platform and the same eluent program. The difference between red "D" and green "d" is related to the non-monotonic shift of the liquid chromatographic separation and shows the uncertainty in determining corresponding peak locations in two different chromatograms. The peak pairs with red, blue and green circles in the black dashed box correspond to the three peak pairs that are used to illustrate the effect of peak elution order inversion in the extracted ion chromatograms (EICs) of plots b and c, after aligning one of the chromatograms to the other. In plot b), the chromatogram of laboratory 1 was aligned to the chromatogram of laboratory 2, while in plot c) the chromatogram of laboratory 2 was aligned to the chromatogram of laboratory 1. Peptide LTLPQLEIR (green arrows) is located on the monotonic retention time correction function, while the peptides DIAPTLTLYVGK (red arrows) and VHQFFNVGLIQPGSVK (blue arrows) are located far from this function. Retention time alignment using a single monotonic retention time correction function provides well-aligned peaks for the first peptide (green traces). The two other peptides (red and blue arrows) suffer from considerable misalignment, with a retention time error close to the distance D, due to considerable orthogonality. The EICs are normalized to the highest peaks and the Y axis represents ion counts relative to the most abundant signal intensity. Figures adapted from Mitra et al. [33].
Orthogonality can also be considered between the rt and m/z dimensions and the iin readout of a single MS1 map. However, this orthogonality is not related to the assessment of comparable MS1 maps and is therefore outside the scope of this paper.

Shifts and orthogonality in single-stage LC-MS data
In this section, the physicochemical origins of monotonic and non-monotonic shifts in the rt and m/z dimensions and in the iin readout are discussed in detail, along with the algorithms that are used to correct for monotonic shift or to assess the degree of non-monotonic shift. One pertinent problem relates to the definition of the term "same compound" in multiple samples. A chemical compound can be modified in different ways, including chemical modifications, adduct formation and charge state differences, or can be present in forms with different degrees of dissimilarity in chemical and 3D structure, such as diastereomers, cis/trans isomers, structural (constitutional) isomers, chiral isomers and conformers. Table 1 lists molecular variants and modifications and describes how compounds in the same chemical structure family can be discriminated in the rt and m/z dimensions and in the iin readout of the MS1 map.

Retention time dimension
Physicochemical background. The dimension most prone to shift and orthogonality is the chromatographic dimension. Multiple factors may influence the elution time of a compound and result in non-linear retention time shifts between chromatograms, such as slight changes in column/eluent temperature, slight changes in the pH of the eluent, modification of the stationary phase surface (e.g. due to accumulation of non-eluted components from previously analysed samples), degradation of the surface chemistry or mechanical changes of the stationary phase due to high pressure, and slight changes in the solvent delivery and/or mixing system of the liquid chromatography apparatus [9].
Within a quantitative profiling study, orthogonality of separation is a property that should be minimized, since orthogonality lowers the precision with which the retention time of a compound can be predicted in different MS1 maps [33]. Orthogonality may have origins different from those of monotonic shifts, such as the causes of non-linear monotonic shifts listed above. For example, a simple change of the gradient program leads to slight orthogonality. The reason for this orthogonality was already described in the linear solvent strength theory introduced by Snyder and co-workers in the 1960s [34], and this effect was considered by other researchers as well [35,36]. As a consequence, chromatograms acquired with different gradient programs will show different degrees of orthogonality, which in turn determines the maximal accuracy that can be achieved by retention time alignment using a single non-linear monotonic correction function. It is therefore important for scientists who generate and evaluate data to consider that the same LC column, the same gradient program and the same eluent composition should be used to obtain comparable MS1 maps. However, these conditions are not sufficient for obtaining comparable MS1 maps, since they do not account for, e.g., degradation of the LC column or changes in the gradient delivery system.
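Why a change in eluent strength can invert elution order may be illustrated with the basic relation of linear solvent strength theory, log10 k = log10 k_w - S * phi, where k is the retention factor, phi the volume fraction of organic modifier, k_w the retention factor in pure water and S a compound-specific slope. The sketch below is restricted to isocratic conditions for simplicity, and the two compounds and their parameters are hypothetical values chosen only so that their retention curves cross; they are not taken from the cited work.

```python
def retention_factor(log_kw, s, phi):
    """Isocratic retention factor k from log10 k = log10 k_w - S * phi."""
    return 10 ** (log_kw - s * phi)


def elution_order(compounds, phi):
    """Compound names sorted from least to most retained at modifier fraction phi."""
    return sorted(compounds, key=lambda name: retention_factor(*compounds[name], phi))


# Hypothetical compounds: name -> (log10 k_w, S). Different S values make
# the two retention curves cross at phi = 0.25.
EXAMPLE_COMPOUNDS = {"A": (2.0, 4.0), "B": (3.0, 8.0)}

order_weak_eluent = elution_order(EXAMPLE_COMPOUNDS, 0.2)    # A elutes before B
order_strong_eluent = elution_order(EXAMPLE_COMPOUNDS, 0.4)  # B elutes before A
```

Under these assumptions the compound with the larger S responds more strongly to the modifier, so its position relative to the other compound depends on the eluent strength. No single monotonic correction function can map one of the two elution orders onto the other.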
Monotonic shift correction algorithms. In the last two decades, multiple retention time correction algorithms were developed as part of label-free LC-MS(/MS) data pre-processing workflows [19,33,37–50]. A comprehensive review by Smith et al. [9] includes a discussion of 50 open source retention time alignment algorithms. Although several retention time alignment algorithms exist, the general objective of every time alignment algorithm is first to identify peaks (or signals) of the same compound in two (or more) chromatograms and then to provide a retention time transformation function that corrects for monotonic retention time shifts and aligns the LC-MS(/MS) datasets. Retention time correction algorithms can be classified in many ways, for example by: (i) the type of data and the MS1 map dimensions used for the alignment, such as the complete MS1 map, total ion or base peak chromatograms, or peak lists [39]; (ii) whether alignment is performed pairwise or in one step; and (iii) the type of benefit or objective function used to measure the similarity of the chromatographic pair, which is subsequently used to derive the retention time correction function (e.g. the sum of the squared ion intensity distances of the raw data, the correlation of raw ion intensities or the sum of overlapping peak volumes).
One of the most widely used algorithmic approaches to derive the correction function is dynamic time warping (DTW) [51], which identifies the optimal retention time correspondence path. This path is obtained by minimizing the cumulative differences between the LC-MS signals at different sampling points, using either peak lists [52], the TIC [47] or regions of the MS1 maps [53]. Correlation-Optimized Warping (COW) [54] performs segment-wise stretching or shrinking of retention time segments and uses a cumulative benefit function that maximizes segment profile similarity, such as the correlation [54] or the sum of overlapping peak volumes [55]. The combination of segment positions that best fits the reference chromatogram is obtained using dynamic programming. Christin et al. [45] combined the COmponent Detection Algorithm (CODA) with COW; the resulting algorithm includes only information from those LC-MS mass traces of the sample and reference chromatograms that contain low noise and background and a large number of highly abundant peaks. CODA implements a moving window to detect m/z traces with high-quality peak content in different retention time domains. Another algorithm, parametric and semi-parametric time warping ((s)PTW), uses a fitted polynomial as the warping function, which minimizes the profile abundance differences between LC-MS chromatograms using the TIC [56–58] or combined CODA-selected mass traces [53]. OpenMS [59] applies an affine transformation to the retention time coordinates of the sample feature list, using linear regression on features obtained with robust matching (pose clustering) of the rt and m/z coordinates.
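The principle of dynamic time warping can be shown with a minimal sketch on two one-dimensional traces, such as TICs sampled at equal intervals. It uses the textbook recurrence with squared intensity differences as local cost and no path constraints; the DTW-based alignment tools cited above differ in their cost functions, constraints and input data, so this is an illustration under simplifying assumptions and not a re-implementation of any of them. The function name `dtw_path` is hypothetical.

```python
def dtw_path(ref, sample):
    """Return (total cost, warping path) aligning two intensity traces.

    The path is a list of (ref index, sample index) pairs, non-decreasing in
    both indices, that minimizes the summed squared intensity differences.
    """
    n, m = len(ref), len(sample)
    inf = float("inf")
    # cost[i][j]: minimal cumulative cost of aligning ref[:i] with sample[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (ref[i - 1] - sample[j - 1]) ** 2
            cost[i][j] = local + min(cost[i - 1][j - 1],  # advance both traces
                                     cost[i - 1][j],      # advance reference only
                                     cost[i][j - 1])      # advance sample only

    # Backtrack from the end of both traces to recover the correspondence path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# A peak at index 2 in the reference appears one sampling point later in the sample.
total_cost, path = dtw_path([0, 1, 5, 1, 0], [0, 0, 1, 5, 1, 0])
```

The recovered path pairs index 2 of the reference with index 3 of the sample and never steps backwards in either trace, which is how DTW enforces the monotonicity of the retention time correction.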
Table 1
Summary of molecular variants that affect the definition of the same compound (molecular entity). The table lists molecular variants at various levels and describes how they can be distinguished in the rt and m/z dimensions and in the iin readout of MS1 LC-MS(/MS) data.

Chemical modifications (covalent bond changes)
  rt: A difference can be expected; its extent depends on the type and size of the modification.
  m/z: A difference is expected if the molecular mass of the target compound changes.
  iin: Chemical modification leads to differences in ionisation properties, so the same ion intensity may correspond to different amounts of compound.

Same chemical but different isotopic constitution
  rt: No difference in retention time; only a slight difference is expected when deuterium/hydrogen replacement occurs.
  m/z: A difference should be observed when the mass of the intact ion changes.
  iin: No difference between members of this type of compounds is to be expected.

Different charge state
  rt: Certain eluent compositions (e.g. pH) may influence the charge of the compound and therefore its retention time. The effect depends on the time scale of hydrogen exchange and on the pH.
  m/z: In principle the charge states during liquid chromatography influence the charge distribution of the analytes in the MS. The same holds for changing electrospray conditions such as voltage, application of shearing gas (ionspray), a different eluent or the use of eluent modifiers.
  iin: Charge state differences in chromatography or at the MS interface may influence the number of ions formed and may give a different detected response.

Adduct formation (Na+, K+, NH4+, Mg2+, Ca2+ etc.)
  rt: May result in distinct peaks in the LC dimension.
  m/z: Results in distinct peaks if the mass of the compound changes.
  iin: Adduct formation may influence the competition for charges, which could lead to a different detector response.

Diastereomers, cis/trans isomers
  rt: Changes in the physicochemical properties of the analyte may result in a different retention time.
  m/z: Indistinguishable in this dimension without fragmentation.
  iin: Very small (mass defect) or no difference is to be expected.

Constitutional isomers
  rt: May be resolved in the chromatographic domain, but retention times are expected to be close, except when the 3D structure changes substantially.
  m/z: Indistinguishable in this dimension without fragmentation.
  iin: Expected to provide the same response.

Chirality
  rt: May be distinguishable in this dimension under special conditions, e.g. by using chiral counter ions or chiral stationary phases.
  m/z: Indistinguishable in this dimension without fragmentation.
  iin: Expected to provide the same response.

Conformational isomers
  rt: May be resolved in the chromatographic domain, but retention times are expected to be close, except when the 3D structure changes substantially.
  m/z: Indistinguishable in this dimension without fragmentation.
  iin: Expected to provide the same response.

V. Mitra et al. / Analytica Chimica Acta 999 (2018) 37-53

Commonly used time alignment methods use either centroid peak lists or charge-state- and isotope-deconvoluted feature lists. These lists are then used to model a retention time alignment function based on the retention time values of correspondences. Correspondences can be defined as matched peak pairs within certain rt and m/z coordinates or bins, or as matched landmark isotopic features between datasets. However, algorithms such as PEPPeR [60], SuperHirn [18], IDEAL-Q [42] and LCMSWARP [61] use a combination of isotopic feature detection and MS/MS identification to enhance the "landmark matching" process prior to retention time alignment. Many time alignment algorithms perform alignment pairwise, which poses the problem of reference selection. Star-type alignment, in which all other chromatograms are aligned to a single reference, is suboptimal for large datasets containing chromatograms with dissimilar molecular composition. Voss et al. [52] developed the simultaneous multiple alignment (SIMA) of LC-MS peak lists. This algorithm performs pairwise matching of peak lists along a hierarchical tree of chromatogram pairs, with peak list similarity determining the sequence of alignments. Finally, the algorithm calculates a global retention time correction function using a multidimensional kernel function and uses maximum likelihood estimation to derive the common elution profile. It should be noted that the assumption that a global retention time profile exists for a set of MS1 maps may be wrong, e.g. in datasets that contain chromatograms obtained with different gradient programs, because of orthogonality.
Many papers confuse time alignment with the peak or feature matching step and use the terms "feature alignment" or "peak alignment" for peak matching. The origin of this confusion may be that retention time shift correction algorithms need information from common compounds, and one of the goals of shift correction algorithms is to find them. However, the goal of shift correction algorithms is not necessarily to find all common peaks (or signals of common compounds) between chromatograms, but to find enough of them, with a distribution and quality that allow a single monotonic shift correction function to be obtained. After correction of shifts, the final peak matching algorithm is used to identify, with the highest possible accuracy, all corresponding peaks across multiple chromatograms. The monotonicity aspect of shift correction means that the shift correction function cannot change the elution order of the peaks and provides one-to-one correspondences between chromatograms, while peak matching should deal with the remaining non-monotonic shift.
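As a minimal sketch of what a single monotonic shift correction function can look like, the Python fragment below fits a non-decreasing mapping from the retention times of one chromatogram to those of a reference, using landmark pairs and the pool-adjacent-violators algorithm, and applies it by linear interpolation. The landmark values and the function names `fit_monotonic` and `apply_monotonic` are hypothetical, and isotonic regression is used here purely for illustration; it is not the method of any of the tools cited above.

```python
from bisect import bisect_right


def fit_monotonic(pairs):
    """Fit a non-decreasing mapping rt_sample -> rt_reference.

    pairs: iterable of (rt_sample, rt_reference) landmark correspondences.
    Returns knots (xs, ys) with xs sorted and ys non-decreasing.
    """
    # Each block holds [sum of y, number of points, x values of the block].
    blocks = []
    for x, y in sorted(pairs):
        blocks.append([y, 1, [x]])
        # Pool adjacent violators: merge while the previous block mean is larger.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, n, block_xs = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
            blocks[-1][2].extend(block_xs)
    xs, ys = [], []
    for s, n, block_xs in blocks:
        for x in block_xs:
            xs.append(x)
            ys.append(s / n)
    return xs, ys


def apply_monotonic(xs, ys, rt):
    """Map rt through the knots: linear interpolation inside, constant shift outside."""
    if rt <= xs[0]:
        return rt + (ys[0] - xs[0])
    if rt >= xs[-1]:
        return rt + (ys[-1] - xs[-1])
    i = bisect_right(xs, rt)  # xs[i - 1] <= rt < xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (rt - x0) / (x1 - x0)


# Hypothetical landmarks (minutes); (22.0, 20.0) is a mismatched landmark that
# would invert the elution order and is therefore pooled with its neighbour.
landmarks = [(10.0, 10.5), (20.0, 21.0), (22.0, 20.0), (30.0, 30.8), (40.0, 42.0), (50.0, 51.5)]
xs, ys = fit_monotonic(landmarks)
corrected = [apply_monotonic(xs, ys, rt) for rt in (5.0, 20.0, 35.0, 60.0)]
```

Because the fitted values never decrease, the corrected retention times keep the elution order of the input, which is the property that separates shift correction from the subsequent peak matching step.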
The accuracy of the peak matching step depends on how far the algorithm must search for corresponding partners in the two chromatograms; this distance is smaller for data that were successfully corrected for monotonic shift than for data in which considerable monotonic shift remains. Many algorithms combine time alignment and feature matching in one module. Examples are PEPPeR, IDEAL-Q, SIMA [52], LWBMatch [62] and the algorithm developed by Wandy et al. [63], which includes grouping of peaks of related compounds.
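The dependence of peak matching on the size of the search window can be sketched as follows. This is a simplified greedy matcher written for illustration only; the function name `match_peaks`, the tolerance values and the peak coordinates are assumptions and do not reproduce any of the algorithms cited above.

```python
def match_peaks(peaks_a, peaks_b, rt_tol=0.5, ppm_tol=10.0):
    """Greedy one-to-one matching of (rt, mz) peaks within rt and m/z windows.

    Returns a sorted list of (index_in_a, index_in_b) pairs.
    """
    candidates = []
    for i, (rt_a, mz_a) in enumerate(peaks_a):
        for j, (rt_b, mz_b) in enumerate(peaks_b):
            d_rt = abs(rt_a - rt_b)
            d_ppm = abs(mz_a - mz_b) / mz_a * 1e6
            if d_rt <= rt_tol and d_ppm <= ppm_tol:
                # Normalised distance: each term is 1.0 at the edge of its window.
                candidates.append((d_rt / rt_tol + d_ppm / ppm_tol, i, j))
    matches, used_a, used_b = [], set(), set()
    for _, i, j in sorted(candidates):  # closest candidate pairs are accepted first
        if i not in used_a and j not in used_b:
            matches.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return sorted(matches)


# Hypothetical peaks as (retention time in min, m/z).
peaks_a = [(10.0, 500.2500), (10.2, 500.2510), (25.0, 712.3400)]
peaks_b = [(10.1, 500.2502), (25.3, 712.3405), (40.0, 900.0000)]

wide = match_peaks(peaks_a, peaks_b, rt_tol=0.5)
narrow = match_peaks(peaks_a, peaks_b, rt_tol=0.2)
```

With the wider window, two peaks of the first list compete for the same partner and the closer one wins; with the narrower window, the pair that still differs by 0.3 min is no longer found. A residual shift therefore forces a wider window, which in turn increases the risk of mismatches.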
Datasets with considerable peak elution order inversion (orthogonality) were aligned by Bloemberg et al. [64] using mass-trace-optimised PTW. However, PTW does not change the elution order of the peaks, since it derives a monotonic retention time correction function, and cannot deal properly with LC-MS(/MS) pairs with significant elution order inversion. It is also obvious that the analyte/stationary phase retention mechanism that leads to elution order inversion, i.e. orthogonality between two chromatograms, does not depend solely on the m/z of the compound, but rather on other parameters and on the complex retention mechanism of the eluting compounds. This approach, which provides a different retention time correction function for each m/z trace, does not take into account peak elution order inversion within a mass trace.
Non-monotonic shift assessment algorithms. Metrics to measure the amplitude of orthogonality have been developed solely for retention time dimensions and have been used to assess the differences and similarities between chromatography systems. This assessment is based on the joint peak capacity in two-dimensional liquid (2D-LC) or gas chromatography systems. The goal in 2D-LC is to maximise the orthogonality between the first and second separation dimensions and, with it, the peak capacity of the chromatographic system; those algorithms therefore deal with large orthogonality. One of the first metrics for orthogonality was introduced by Gilar et al. [23,65]. This metric measures the occupancy of bins by common peaks, determined on the basis of identified peptide sequences, in the retention space of the two chromatograms. Recently Camenzuli et al. [22] introduced a generic measure of orthogonality that uses the spread of peaks along four lines (equations) that enclose angles of 45° and cross in the middle of the normalised retention time space, which ranges between 0 and 1. The latter approach is independent of the density distribution of peaks and provides an accurate measure of orthogonality. Gilar et al. [24] compared 4 different measures of orthogonality using binning of retention times (correlation coefficients, mutual information, box-counting dimensionality, and surface fractional coverage with different hulls) and concluded that, except for correlation, all orthogonality metrics are related to each other and are suitable for optimising peak capacity in two-dimensional chromatography. Schure et al. [25] recently summarised 20 metrics of orthogonality and assessed their performance using 47 two-dimensional LC chromatograms. This article pointed out that there are many metrics to measure orthogonality. Principal component analysis of the different orthogonality metrics shows that, although the studied metrics are correlated, they capture different aspects of the data.
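The idea behind bin-occupancy metrics can be illustrated with a simplified coverage fraction: the retention times of common peaks in the two separations are normalised, placed on a square grid, and the fraction of occupied cells is reported. The sketch below is an assumption-laden simplification (function name `bin_coverage`, grid size and data are hypothetical) and does not reproduce the exact formulas of the published metrics.

```python
def bin_coverage(rt_pairs, n_bins=5):
    """Fraction of cells of an n_bins x n_bins grid occupied by common peaks.

    rt_pairs: list of (rt_first, rt_second), the retention times of the same
    compound in two separations. Each dimension is normalised to [0, 1].
    """
    firsts = [a for a, _ in rt_pairs]
    seconds = [b for _, b in rt_pairs]

    def to_bin(value, lo, hi):
        if hi == lo:
            return 0
        # The upper edge (normalised value 1.0) is clamped into the last bin.
        return min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)

    occupied = {
        (to_bin(a, min(firsts), max(firsts)), to_bin(b, min(seconds), max(seconds)))
        for a, b in rt_pairs
    }
    return len(occupied) / (n_bins * n_bins)


# Fully correlated separations: peaks lie on the diagonal of the grid.
correlated = [(float(i), float(i)) for i in range(25)]
# Fully orthogonal separations: peaks spread over the whole retention space.
orthogonal = [(float(i), float(j)) for i in range(5) for j in range(5)]

coverage_correlated = bin_coverage(correlated)   # only the 5 diagonal cells
coverage_orthogonal = bin_coverage(orthogonal)   # all 25 cells
```

A coverage close to the diagonal-only value indicates nearly identical separations, which is the regime of small orthogonality that such coarse binning resolves poorly.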
However, so far no published approach assesses orthogonality at the lower end, i.e. small orthogonality between chromatographic separations. Developing metrics to measure small orthogonality is important, since orthogonality causes uncertainty in predicting where a compound will elute in the other chromatogram and therefore determines the domain in which the peak matching algorithm must search for corresponding peaks using rt and m/z coordinates. Many peak matching algorithms try to find a corresponding peak at all costs by allowing a wide search range for corresponding partners, which may lead to mismatched peaks and subsequent statistical error. For this reason, we have developed an approach that assesses the extent of non-monotonic shift, which corresponds to the maximal retention time matching domain after alignment with a single monotonic function. The algorithm determines the uncertainty region used to identify corresponding peaks in the LC-MS(/MS) chromatogram pair of interest and in an LC-MS(/MS) chromatogram pair acquired consecutively in the same analysis batch, where no peak elution order inversion occurs, and compares these regions on the basis of orthogonal residuals to assess the presence of peak elution order inversion or orthogonality [33].
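Peak elution order inversion itself can be made concrete with a simple rank-based proxy: the fraction of pairs of matched peaks whose order differs between two chromatograms (closely related to Kendall's tau). The sketch below is an illustration under assumed data and an assumed function name `inversion_fraction`; it is not the orthogonal-residual approach of ref. [33].

```python
def inversion_fraction(rt_pairs):
    """Fraction of pairs of common peaks that elute in a different order.

    rt_pairs: list of (rt_in_run_1, rt_in_run_2) for the same compounds.
    """
    pts = sorted(rt_pairs)  # sorted by retention time in run 1
    discordant = total = 0
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if pts[i][0] == pts[j][0] or pts[i][1] == pts[j][1]:
                continue  # ties carry no order information
            total += 1
            if pts[j][1] < pts[i][1]:  # later in run 1 but earlier in run 2
                discordant += 1
    return discordant / total if total else 0.0


# Hypothetical matched peaks (minutes): a same-batch pair with a pure monotonic
# shift, and a cross-batch pair in which two compounds swap elution order.
same_batch = [(10.0, 10.4), (20.0, 20.5), (30.0, 30.3), (40.0, 40.6)]
cross_batch = [(10.0, 10.4), (20.0, 31.0), (30.0, 29.5), (40.0, 40.6)]

frac_same = inversion_fraction(same_batch)
frac_cross = inversion_fraction(cross_batch)
```

A monotonic correction function leaves this fraction unchanged, which is why inversions must be handled by the peak matching step or reported as orthogonality.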
Orthogonality between chromatograms also affects the accuracy of retention time normalisation algorithms such as iRT [66,67] or RePLiCal [68], which use the standardised retention times of a reference standard set obtained with a standard mixture or spiked QconCAT proteins. In this case orthogonality decreases the accuracy of the normalised retention times and may even lead to completely false results if the reference standard peaks are mismatched between chromatograms.

Mass to charge ratio dimension
The shifts in the m/z dimension are mainly monotonic and may be caused, for example, by small temperature changes in the room where the instrument is installed in the case of high-resolution Orbitrap and time-of-flight mass analysers, or by the space-charge effect in the case of low-resolution three-dimensional ion trap mass analysers [39]. Because the physics of ion separation is well understood, in theory no orthogonality can occur in the m/z dimension, except for a charge state shift of compounds, which may introduce orthogonality because different compounds, depending on their charge affinity, undergo different changes in charge state distribution. A shift in charge distribution is unconventional in that it happens at discrete m/z values, whereas conventional shifts such as retention time shift occur on a continuous scale. During the electrospray process, ionisation parameters have a large influence on the charge distribution of analytes. For example, ionspray, which combines electrospray with pneumatic nebulisation and is used with normal or capillary LC columns, results in more charges on the same analytes than the pure electrospray ionisation regime, due to the triboelectric effect. The effect on charge depends on the chemical composition of the analytes and therefore differs between analytes, resulting in orthogonality. Fig. 5 shows the considerable charge shift in MS1 maps obtained by analysing the same human blood sample, depleted of the 6 most abundant proteins, on LC-MS platforms differing in LC column diameter, injected sample amount and electrospray ionisation type (ionspray and electrospray) [69]. No orthogonality measure has so far been developed for the m/z dimension, but "orthogonality" due to charge state shifts can be corrected in compound lists by calculating the neutral mass of compounds and summing the intensities of the different charge states. Other aspects of orthogonality may relate to adduct formation of the same analytes.
Adduct formation is often taken into account in untargeted label-free metabolomics LC-MS data pre-processing workflows, and it is corrected for by summing the intensities that belong to the different adduct forms of the same metabolite. However, the detector response may depend on the m/z range, and adducts may alter the ionisation efficiency and therefore the measured signal for a given amount of analyte. These changes in detector signal are generally not taken into account when different types of ion signal are summed in current data pre-processing pipelines.
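The correction for charge state shifts described above (calculate the neutral mass, then sum the intensities of all charge states) can be sketched as follows for protonated ions. The function names, the mass tolerance and the feature values are hypothetical; for brevity the sketch ignores retention time, whereas a real workflow would also require co-elution, and the same grouping idea extends to adducts by replacing the proton mass with the adduct mass.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (rounded)


def neutral_mass(mz, charge):
    """Neutral monoisotopic mass of a positive [M + zH]z+ ion."""
    return charge * (mz - PROTON_MASS)


def collapse_charge_states(features, tol_da=0.01):
    """Sum intensities of features whose neutral masses agree within tol_da.

    features: iterable of (mz, charge, intensity).
    Returns a list of [neutral_mass, summed_intensity], sorted by mass; the
    mass reported for a group is that of its lightest member.
    """
    by_mass = sorted((neutral_mass(mz, z), inten) for mz, z, inten in features)
    collapsed = []
    for mass, intensity in by_mass:
        if collapsed and mass - collapsed[-1][0] <= tol_da:
            collapsed[-1][1] += intensity
        else:
            collapsed.append([mass, intensity])
    return collapsed


# Hypothetical features: the 2+ and 3+ ions of one compound (M = 1000.5 Da)
# and a singly charged ion of another compound.
features = [
    (501.257276, 2, 3.0e5),
    (334.507276, 3, 1.0e5),
    (450.200000, 1, 2.0e4),
]
compounds = collapse_charge_states(features)
```

As noted in the text, simply summing the signals assumes that every ion form has the same detector response, which is generally not guaranteed.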
Mass recalibration algorithms. Several algorithms have been developed to correct for monotonic shift in m/z, with the goal of enhancing mass accuracy, which has become essential for modern high-resolution mass spectrometers. The space-charge effect in low-resolution three-dimensional ion trap instruments may cause a shift in m/z that stays monotonic within a mass spectrum. Space-charge effects are caused by the presence of highly abundant compounds close in m/z to other ions, which results in ion repulsion; this effect may be particularly strong for ions trapped in three-dimensional space [70]. To correct for shifts in the m/z domain, routine calibration of the mass spectrometer based on spiked internal standards [31,39] or ubiquitous background ions and contaminants [71] is performed at regular time intervals or for each acquired mass spectrum. The most widely used approach to devise a single monotonic mass shift correction function is regression using a polynomial function of degree 2-5. Generally one monotonic function is used for all MS spectra of the MS1 map, but it has become more common to use spectrum-specific monotonic correction functions, especially when calibrants such as co-infused compounds or background ions are present in all spectra. Methods that combine prior knowledge of the sample being analysed with multidimensional non-parametric regression have been shown to decrease the standard deviations of m/z errors by 1.8-3.7 fold [31]. The mass correction algorithm that is part of the Bioinformatics Toolbox of Matlab (available from version R2007a) eliminates the monotonic shift in m/z traces caused by the space-charge effect by using advanced data binning algorithms that synchronise all spectra in a dataset to a common mass/charge grid [72-74] (Fig. 6a and b). The space-charge effect, which is influenced by the eluent and by the composition of co-eluting compounds, is strong in ion trap data, where the order of peaks stays the same but the monotonic shift can differ between m/z traces.
This justifies the use of a different monotonic correction function for each m/z trace, in contrast to the rt domain, where a single monotonic correction for all mass traces and compounds is justified. Removal of mass measurement error is required not only for MS1 data processing, but also for the correction of precursor mass error in the assignment of peptide identifications. One way to correct monotonic shift in the m/z dimension is to obtain a monotonic correction function for the difference between the measured m/z of the precursor ion and the theoretical m/z of the identified peptide (Fig. 6c) [75]. Petyuk et al. [31] corrected mass measurement errors for covariates of m/z, such as retention time, ion intensity and other parameters, using a multidimensional non-parametric regression model. Based on the results of the study, the authors expected to reduce the number of false identifications by 2-4 fold after correcting for mass measurement error [31]. Lommen et al. [32] showed that mass error depends on retention time and ion intensity, and correction for these shifts allowed sub-ppm accuracy to be reached for steroid metabolites on a UHPLC-Orbitrap platform. These studies show that minor interactions between MS1 dimensions exist and affect the accuracy of pre-processed LC-MS(/MS) data.
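The polynomial regression approach to mass recalibration mentioned above can be sketched in a few lines. The example fits the signed mass error (in ppm) of a set of calibrants as a second-degree polynomial of the measured m/z and uses the fit to correct measured values. All names, the simulated error function and the calibrant masses are assumptions made for this illustration, and the small least-squares solver is written out only to keep the example self-contained.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest order first)."""
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def fit_recalibration(calibrants, degree=2):
    """calibrants: list of (measured_mz, theoretical_mz).

    Returns a function that maps a measured m/z to a corrected m/z.
    """
    lo = min(m for m, _ in calibrants)
    span = max(m for m, _ in calibrants) - lo
    scaled = [(m - lo) / span for m, _ in calibrants]      # x in [0, 1] for conditioning
    ppm_err = [(m - t) / t * 1e6 for m, t in calibrants]   # signed error in ppm
    coeffs = polyfit(scaled, ppm_err, degree)

    def correct(mz):
        x = (mz - lo) / span
        err = sum(c * x ** k for k, c in enumerate(coeffs))
        return mz / (1 + err * 1e-6)

    return correct


# Simulated calibrants with a smooth, m/z-dependent error of about 2-3 ppm.
theoretical = [200.0, 400.0, 600.0, 800.0, 1000.0, 1200.0]
measured = [t * (1 + (1.0 + 4.0 * (t / 1000) - 2.0 * (t / 1000) ** 2) * 1e-6) for t in theoretical]

correct = fit_recalibration(list(zip(measured, theoretical)))
raw_ppm = [abs(m - t) / t * 1e6 for m, t in zip(measured, theoretical)]
residual_ppm = [abs(correct(m) - t) / t * 1e6 for m, t in zip(measured, theoretical)]
```

In this simulated case the residual error after correction is far below the original error, because the simulated shift is exactly the kind of smooth function that a low-degree polynomial can capture; shifts that depend on retention time or ion intensity need the additional covariates discussed above.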

Ion intensity readout
Experimental variability, such as fluctuation of ionisation efficiency in complex samples (e.g. due to ion suppression), changing eluent composition, differences in electrospray interface and parameter settings, and differences in sample preparation, can influence quantified peptide/protein levels [76]. Ion suppression is a source of orthogonality in the iin readout of LC-MS(/MS) data, since the intensity of compounds may differ depending on the composition of co-eluting compounds [77]. Ion suppression is larger in ionspray, which combines electrospray with pneumatic nebulisation to ionise compounds at high eluent flow rates. Pneumatic nebulisation produces a triboelectric effect, which results in additional charging of compounds depending on their charge affinity [69]. However, ion suppression becomes less important at lower flow rate regimes, where only electrospray dominates, and this effect disappears at very low flow rates of a few nL/min [78]. In the iin domain, methods used to correct monotonic shifts are known as normalisation, and no approach to assess orthogonality is known. When the ion suppression effect is taken into consideration, normalisation should be performed using the same set of compounds, which have the same quantity in the two samples and are sufficiently evenly distributed over the full dynamic range of the detector. The best practice is to use an internal standard mixture for normalisation purposes, with known absolute concentration of all analytes.
Fig. 6. LC-MS map showing the fluctuation of m/z due to the space-charge effect in three-dimensional low-resolution ion trap data (image a). This fluctuation results in small monotonic shifts, which do not change the order of peaks in the m/z dimension and can therefore be corrected with binning algorithms that synchronise all spectra in an LC-MS chromatogram to a common m/z grid (image b). Scatter plot of mass error (difference between the measured precursor m/z and the theoretical m/z calculated from the sequence of the identified peptide), showing non-linear monotonic shift and orthogonality in the m/z dimension of high-resolution Orbitrap LC-MS/MS data (plot c). Correction for monotonic shifts enhances the peptide identification rate, and this option is implemented in some data pre-processing workflows. Images a and b were obtained with an LCQ ion trap LC-MS platform analysing a mix of 7 proteins obtained from the Sashimi data repository (file 7MIX_STD_110802_1 from http://sashimi.sourceforge.net/repository.html). Plot c was obtained from a proteomics analysis of HeLa cells using a Q Exactive Orbitrap LC-MS/MS platform and a 1 h gradient program.
Normalisation approaches. The normalisation step aims to correct monotonic shifts in the iin readout. Commonly applied normalisation approaches use the mean, the median or some global fixed value to correct a constant shift in intensity in each sample [79]. Such normalisation methods remove systematic bias across samples and assume that all peptides behave similarly and independently of their abundance across multiple samples. The constant value is often calculated from a set of unique peptides originating from known house-keeping proteins that are supposed to be tightly regulated and to have similar concentrations in biological samples [80]. Global adjustment can correct for differences in the amount of material loaded on the LC-MS(/MS) system for each sample, but cannot capture more complex (e.g. non-linear and intensity-dependent) biases. LOWESS regression applied in the ion intensity domain, or quantile normalisation, which makes the distribution of peak intensities similar across multiple samples [79,81], can correct for such non-linear bias [79]; however, these approaches assume that the majority of the compounds are the same and have very similar quantities across samples [76]. ANOVA and regression models can effectively remove systematic differences when their sources are known [82]. In order to normalise and model data obtained from varied sample groups, such as disease versus control, a method called normalised spectral index (SIN) was developed. SIN combines three MS abundance features: peptide count, spectral count and fragment-ion (MS/MS) intensity [83]. Most normalisation methods used for label-free proteomics data, such as normalisation to various central tendencies (e.g. mean, median), LOWESS regression and quantile normalisation, originated in microarray studies [79,84].
Specific LC-MS(/MS)-based data normalisation methods have also been developed, which apply a probability-based model for imputing missing events in order to avoid severe biases in the statistical analysis due to compounds present below the detection limit [85]. None of the approaches described above changes the iin order of peaks that originate from the same compounds and have the same quantity in chromatograms, i.e. they perform monotonic transformations.
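Two of the normalisation approaches described above, scaling to a central tendency and quantile normalisation, can be sketched as follows. The function names and intensity values are hypothetical, and the quantile version assumes equally long samples without missing values or ties.

```python
from statistics import median


def median_normalise(samples):
    """Scale each sample so that its median equals the mean of all sample medians."""
    medians = [median(s) for s in samples]
    target = sum(medians) / len(medians)
    return [[v * target / m for v in s] for s, m in zip(samples, medians)]


def quantile_normalise(samples):
    """Give every sample the same intensity distribution.

    The value of rank r in each sample is replaced by the mean of the
    rank-r values over all samples.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [sum(s[r] for s in sorted_samples) / len(samples) for r in range(n)]
    result = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices from lowest to highest
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        result.append(out)
    return result


# Hypothetical intensities of three matched peaks in two samples; the second
# sample was loaded at roughly twice the amount.
samples = [[200.0, 100.0, 400.0], [380.0, 820.0, 220.0]]

scaled = median_normalise(samples)
quantiled = quantile_normalise(samples)
```

Both functions are monotonic within a sample: the rank order of the peak intensities of each sample is the same before and after normalisation.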
Improper normalisation may introduce bias in the statistical analysis, for example when one subclass of compounds differs considerably in one sample group while the remaining compounds remain unchanged between samples (so-called non-closed data) and normalisation is performed using a fixed value such as the sum of ion intensities, the median fold change, the sum of peptide-spectrum matches or the injected sample amount (Fig. 7). This effect is called the size effect, and a ratio-based normalisation approach should be used to avoid such errors [86]. The application of pairwise normalisation made it possible to identify synergistic RAS and CIP2A signalling in HeLa cells before and after phosphopeptide enrichment. In this dataset there is a major shift in phosphopeptide composition before and after phosphopeptide enrichment and before and after stimulation of cells, which leads to major bias in the statistical analysis of the phosphopeptide-enriched samples if the enrichment effect is not taken into account. The enrichment effect was corrected using pairwise normalisation, which calculates a global factor from the median ratio of phosphopeptides that are present in samples both before and after the phosphopeptide enrichment step [87].
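The size effect can be reproduced with three simulated peaks. In the sketch below, one sample is loaded at twice the amount of the reference and only its first peak changes (four-fold). Normalisation to the total sum is compared with a simple median-ratio normalisation against the reference sample; the latter is an assumed, simplified stand-in for the ratio-based and pairwise approaches mentioned above, and all names and values are hypothetical.

```python
from statistics import median


def total_sum_normalise(sample):
    """Express each intensity as a fraction of the summed intensity."""
    total = sum(sample)
    return [v / total for v in sample]


def median_ratio_normalise(sample, reference):
    """Divide the sample by the median of its peak-wise ratios to the reference."""
    factor = median(s / r for s, r in zip(sample, reference))
    return [v / factor for v in sample]


def fold_changes(sample, reference):
    return [s / r for s, r in zip(sample, reference)]


reference = [100.0, 100.0, 100.0]   # three peaks in the reference sample
treated = [800.0, 200.0, 200.0]     # 2x loading; true changes are 4x, 1x, 1x

fc_total_sum = fold_changes(total_sum_normalise(treated), total_sum_normalise(reference))
fc_median_ratio = fold_changes(median_ratio_normalise(treated, reference), reference)
```

Total-sum normalisation halves the true four-fold change and introduces an apparent two-fold decrease in the two unchanged peaks, whereas the median ratio, which is determined by the unchanged majority, recovers fold changes of 4, 1 and 1.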

Order of correction for monotonic shift in rt and m/z dimensions and iin readout of MS1 map pairs
The order of correction for monotonic shifts in the rt and m/z dimensions and in the iin readout, and the position of these modules in LC-MS(/MS) pre-processing workflows, may influence the quality of LC-MS(/MS) pre-processing. In general, correction for monotonic shift in the m/z and rt dimensions should be made before the peak matching step, since peak matching requires accurate rt and m/z coordinates of compounds. Normalisation of the iin readout is generally performed after the peak matching step (Fig. 2). Since orthogonality is rare in the m/z dimension, it is advantageous to perform mass recalibration before retention time alignment. Many retention time alignment algorithms use the m/z of compounds in the peak list or in the raw data; this order therefore ensures that more accurate m/z values are used to identify the common compounds that drive the time alignment process.
Fig. 7. Principle of "effect size" using simulated data of three peaks and two sample groups (red and blue traces). Effect size occurs when one sample class shows a large change in one compound (first peak in the blue traces), or in only part of the compounds, while the other peaks do not change (last two peaks in the blue traces) compared with the peaks in the other sample group (all peaks in the red traces), where the amount of these peaks stays the same. The original situation is shown in plot a), while normalisation using the total sum of peak areas (or compound quantities) lowers the fold change of the peak that has the major quantity change and introduces smaller fold changes in the two peaks that are present in the same quantity. This type of normalisation leads to errors in the subsequent differential statistical analysis. Figure adapted from Filzmoser et al. [86]. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Conclusion
Monotonic and non-monotonic shifts have generally been considered separately, and orthogonality has been considered exclusively in the retention time dimension. In this tutorial we have demonstrated that these two types of shifts should be considered separately along the rt and m/z dimensions and the iin readout of the MS1 part of label-free LC-MS(/MS) data. This has the benefit that the quality of MS1 maps can be assessed in the rt and m/z dimensions and in the iin readout with the same mathematical model (i.e. correctable monotonic shift and non-correctable non-monotonic shift). Accurate quantification of multiple MS1 maps is possible when monotonic shift and non-monotonic shift due to LC-MS(/MS) pre-processing error are present in an LC-MS(/MS) data set. It should be noted that signals obtained with other separation methods and spectroscopy/spectrometry techniques suffer from similar problems, and many algorithms exist that can be adapted to accurately align and pre-process LC-MS(/MS) data. Mass spectrometry coupled to other separation techniques, such as capillary electrophoresis (CE-MS) and gas chromatography (GC-MS), shows monotonic and non-monotonic shifts and orthogonality similar to those of LC-MS(/MS) data. For example, peak elution order inversion was reported in GC-MS and GC×GC-MS data obtained with different acquisition parameters [88-91]. Signals in two-dimensional gel electrophoresis, NIR or NMR show the joint presence of monotonic and non-monotonic shifts with an orthogonality component. One example of an algorithm that could be adapted to pre-process LC-MS(/MS) data is the generalised fuzzy Hough transform algorithm, which has been used to process NMR spectra acquired in one batch. This algorithm follows NMR signals that change gradually, resulting in peak order inversion in acquisition-time-sorted NMR spectra [92].
A similar algorithm could be adapted to model gradually changing orthogonality in retention time in LC-MS(/MS) data, and could be used to determine corresponding peaks in datasets where gradual changes in retention time and elution order occur.
Assessment of small orthogonality in LC-MS(/MS) data is important when peak identity is transferred with the accurate mass and time tag (AMT) approach. AMT uses solely the m/z and rt coordinates of peaks, and the increase in erroneous identification transfer due to peak elution order inversion was demonstrated by Tarasova et al. [35]. When orthogonality in the rt dimension is present, the transfer of peak identity suffers from uncertainty and may lead to false positive and false negative peak annotations. Therefore, it is necessary to accurately assess the presence of orthogonality between peptide identifications in LC-MS/MS chromatograms. The extent of the orthogonality determines the accuracy of identification transfer from LC-MS/MS data to LC-MS(/MS) data and the quality of the annotated and quantitative pre-processed MS1 LC-MS(/MS) maps.
In future, more effort should be made to develop accurate models of orthogonality in the rt and m/z dimensions and the iin readout of MS1 maps, such as the models used to accurately predict the retention time of peptides or metabolites. For example, linear solvent strength theory in liquid chromatography and the three-dimensional structure of peptides were successfully used to predict the retention time of peptides even when different linear elution programs were used [36,93-95]. However, modelling comes with more experimental effort and cost. For example, retention time prediction of peptides measured with different linear gradient programs and eluent flow rates requires peptide standards to be measured under different conditions in order to parametrise the retention time prediction model properly. Similar models should be developed, for example, to simulate the ion suppression process and the changes in charge and adduct distribution of compounds in ionspray or electrospray regimes. Accurate modelling of orthogonality would reduce the effect of peak elution order changes, which determine the uncertainty of matching peaks solely on the basis of m/z and rt coordinates, and would result in smaller analytical variance in the iin readout.
In many LC-MS(/MS) profiling studies the data are acquired in one small analysis batch where orthogonality is absent or limited. However, orthogonality becomes important when data originate from multiple batches or instruments, or when data are acquired in large batches, which will become more and more common in future due to the need for large clinical proteomics and metabolomics studies. We also hope that our tutorial highlights the importance of assessing small orthogonality and that those who generate and evaluate data know the adverse consequences that orthogonality can have on the outcome of quantitative LC-MS(/MS) profiling studies.

Vikram Mitra obtained an engineering degree (B.Eng) from Visvesvaraya Technological University, India in 2007 and then obtained an MSc from the University of Exeter (UK) in 2009. He then started a PhD at Rijksuniversiteit Groningen, the Netherlands. His PhD work involved the development of data processing workflows for label-free LC-MS data. Currently, he is employed as a senior scientist (bioinformatics) at Proteome science plc. His research interests involve the development of methods for data normalisation, quality control and statistical analysis of proteomics datasets for biomarker discovery. He is also working on the development of functional enrichment routines to identify key molecular mechanisms in disease states.
Age K. Smilde is full professor of Biosystems Data Analysis at the Swammerdam Institute for Life Sciences at the University of Amsterdam and as of June 1, 2013 he holds a part-time position as professor at the Department of Food Science at the University of Copenhagen. He has published more than 230 peer-reviewed papers and was the Editor-Europe of the Journal of Chemometrics during the period 1994-2002. He is a co-founder of the Netherlands Metabolomics Centre, a large public/private consortium devoted to all aspects of metabolomics. His research interest is data fusion and multiset methods. For more information, see www.bdagroup.nl.
Rainer Bischoff received his PhD in Chemistry from the University of Göttingen (Germany). After two postdoctoral positions, at the Max-Planck-Institute for Experimental Medicine in Göttingen and at the Department of Biochemistry at Purdue University (West Lafayette, IN), he joined industry (Transgene (Strasbourg, France) and AstraZeneca (Lund, Sweden)), where he commenced protein-related research. He joined the University of Groningen (The Netherlands) as professor of Analytical Biochemistry in 2001. His research interests focus on biomarker discovery and validation, bioinformatics, biopharmaceutical proteins and the development of novel instrumental analytical techniques. He has authored over 200 peer-reviewed publications and book chapters, and is an inventor on 14 patents.
Péter Horvatovich received his PhD in food analytical chemistry from the University of Strasbourg (France) in 2001. After 2 years spent in the pharmaceutical industry at Sanofi-Synthelabo (Budapest, Hungary) and one and a half years at the Bundesinstitut für Risikobewertung (Berlin, Germany), he joined the Analytical Biochemistry group at the University of Groningen (Groningen, The Netherlands), where he has worked for the last 11 years, currently in the position of Associate Professor. His research interests focus on computational mass spectrometry, proteogenomics data integration and biomarker discovery. He has authored more than 60 peer-reviewed publications and book chapters, and he is an editorial board member of the Journal of Proteomics.