Applicability of retention modelling in hydrophilic-interaction liquid chromatography for algorithmic optimization programs with gradient-scanning techniques

Computer-aided method-development programs require accurate models to describe retention and to make predictions based on a limited number of scouting gradients. The performance of five different retention models for hydrophilic-interaction chromatography (HILIC) is assessed for a wide range of analytes. Gradient-elution equations are presented for each model, using Simpson’s Rule to approximate the integral in case no exact solution exists. For most compound classes the adsorption model, i.e. a linear relation between the logarithm of the retention factor and the logarithm of the composition, is found to provide the most robust performance. Prediction accuracies depended on analyte class, with peptide retention being predicted least accurately, and on the stationary phase, with better results for a radient scanning ethod development radient equations diol column than for an amide column. The two-parameter adsorption model is also attractive, because it can be used with good results using only two scanning gradients. This model is recommended as the first-choice model for describing and predicting HILIC retention data, because of its accuracy and linearity. Other models (linear solvent-strength model, mixed-mode model) should only be considered after validating their applicability in specific cases. © 2017 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license


Introduction
Hydrophilic interaction chromatography (HILIC) has become increasingly important for the analysis of highly polar analytes, such as antioxidants [1], sugars (e.g. glycomics [2][3][4]), (plant) metabolites [5][6][7], foodstuffs [8], and environmental pollutants [9]. The exact mechanism of retention in HILIC has been intensively investigated and it is thought to be rather complex. The currently accepted mechanism is a combination of (i) partitioning processes of the analytes between a water-poor organic mobile phase and a water-rich layer absorbed on a polar stationary-phase material [10], and (ii) electrostatic interactions between the analytes and the stationary-phase surface [11]. Therefore, HILIC can best be described as a mixed-mode retention mechanism.
To describe retention in HILIC, several retention models have been investigated. The model most commonly used in reversedphase LC (RPLC) involves a linear relationship between the logarithm of the retention factor (k) and the volume fraction of strong solvent (ϕ). When a linear gradient is used in RPLC this results in so-called linear-solvent-strength conditions [12]. Already in 1979, LSS conditions have been studied and described in detail by Snyder et al. [12] and equations were also derived for situations in which analytes elute before the gradient commences or after it has been completed [12,13]. However, due to the mixed-mode retention mechanism, this linear model may be less suitable to accurately model retention in HILIC.
To describe retention more accurately across a wider ϕ-range, Schoenmakers et al. introduced a quadratic model [13], including relations for the retention factor for analytes eluting within and after a gradient. However, an error function was required to allow partial integration of the gradient equation. This is an impractical aspect of the relationship. Moreover, the model may show deviations from the real values when predicting outside the scanning range. An empirical model proposed by Neue  be obtained for retention under gradient conditions [14]. A retention model based on surface adsorption has also been proposed for HILIC, predicting retention across narrow ranges of water concentrations in the eluent. To account for the observed mixed-mode behaviour found in HILIC, Jin et al. introduced a three-parameter model [15], which was found to precisely describe retention factors in isocratic mode [16]. However, similar to the quadratic model, integration of the corresponding gradient equation was significantly complicated, involving a gamma function and potentially yielding complex numbers.
For efficient method development, an underlying accurate description of the retention mechanism is crucial. Because gradient elution is an essential tool for analysing or scanning samples, accurate description of gradient-elution patterns is also essential. Computer-aided-optimization tools, such as Drylab [17] for 1D-LC, or PIOTR [18] for 1D and 2D-LC, utilize the concept of so-called "scouting" or "scanning" runs to establish retention parameters [19], from which the optimal conditions and the optimal chromatogram can be predicted. Method development in HILIC following these principles has extensively been studied by Tyteca et al. [20,21]. However, the currently employed retention models for HILIC do not allow accurate prediction of retention times of analytes eluting during or after the completion of gradients based on a very limited number of scouting measurements. This hampers the application of such optimization tools for HILIC.
In this work, we present the results of an evaluation study of each of the five above-listed models for predicting retention times in gradient-elution HILIC based on a limited number of scouting runs for a wide range of applications. First, the equation for each retention model is addressed in the context of gradient-elution chromatography. We use Simpson's Rule [22] to approximate the integration of resulting gradient equations when an exact solution does not exist, i.e. for the quadratic and mixed-mode models. The performance of each of these models in computer-aided methoddevelopment programs is assessed.

Theory
In the case that a solute elutes before the start of the gradient, the retention time (t R,before ) can be calculated from where k init depicts the retention factor at the start of the gradient and t 0 the column dead time. The general equation of linear gradients allows calculation of the retention time if a compound elutes during the gradient In this equation k (ϕ) is the retention model, denoting the variation of the retention factor k with the composition parameter . The change in ϕ as a function of time (i.e. the slope of the gradient) is depicted Bϕ = ϕ init + Bt) and is the sum of the system dwell time t D , the waiting time before the gradient is programmed to start t init , and t 0 ( ≈ t D + t init + t 0 ). For useful application of gradient-elution retention prediction models in real cases, it is essential that the retention time cannot only be established if the analyte elutes during the gradient, but also if it elutes after the gradient is completed. In this case, the retention is obtained by integrating the retention model in the following equation Here, t G represents the gradient time. The application of some of the proposed HILIC retention models is complicated, because the integrals in Eqs. (2) and (3) cannot be analytically solved. The application of each of the HILIC models for gradient-elution separations is the main topic of this paper and this will be described in detail in the following sections.

Exponential model
In the exponential model (Eq. (4)), k 0 accounts for the extrapolated retention factor for ϕ = 0 and S denotes the change in the retention factor with increasing mobile phase strength.
This equation is often referred to as the linear-solvent-strength (LSS) equations, because it corresponds to LSS conditions in combination with linear gradients (ϕ = ϕ init + Bt). Schoenmakers et al. derived equations for a compound eluting during (t R,gradient ) and after (t R,after ) the gradient [13].
Here, k final represents the retention factor at the end of the gradient and t G the duration of the gradient.

Neue-Kuss empirical model
The empirical model introduced by Neue and Kuss [14] is given by where the coefficients S 1 and S 2 represent the slope and curvature of the equation, respectively. Integration of the gradient equation yields with F defined as Similarly, introducing Eq. (7) into Eq. (3) and rewriting yields

Adsorption model
The adsorption model is based on confined surface adsorption as used in normal-phase chromatography and is given by where n depicts the ratio of surface areas occupied by a water and a solute molecule [16]. In the events that the compound elutes during or after the gradient retention can be calculated from [23] t R,gradient =

Mixed-mode model
The mixed-mode aspect of HILIC led Jin et al. to propose a tailored model [15] ln k = ln k 0 + S 1 ϕ + S 2 ln ϕ where S 1 is said to account for the interaction of solutes with the stationary phase and S 2 for the interaction of solutes with solvents. While seemingly attractive because of its ability to account for the mixed-mode character of HILIC retention, the relation does pose a practical problem upon integration of the gradient equation. The calculation of the retention time using Eq. (14), requires a gamma function and may possibly result in complex numbers. To circumvent this, we apply Simpsons' Rule [22] to approximate the integral (Eq. (15)).

Simpsons
This rule divides the integral over a function, f, in our case f (ϕ) = e S 1 ϕ · e S 2 in m segments of width ϕ. Subsequently, each segment is approximated by a solvable quadratic equation such that, in essence, the complex integral is replaced by m solvable quadratic equations (Eq. (16)).
Of course, the approximation is accompanied by an error. With the Simpsons' Rule, the maximum error depends on m and can be calculated from where D represents the maximum value of the fourth derivative of the retention model in the integration range from ϕ init to ϕ final . In this study, we set the acceptable calculation error to 0.001, which is much smaller than the typical experimental error. Rewriting the In the case of the mixed-mode model, ϕ x equals ϕ init . Likewise, the mixed-mode retention model can be applied in conjunction with Eq. (3), resulting in

Quadratic model
In comparison with the exponential ("LSS") model, the quadratic model offers a more flexible solution across a broader range of the volume fraction of the strong solvent. Retention is given by Where S 1 and S 2 represent the influences of the volume fraction of strong solvent. Similar to the mixed-mode model, integration of the gradient-elution equation is complex. Effectively, the gradient equations become For calculating the number of partitions m for the Simpson's approximation from (Eq. (18)),]zϕ x now equals ϕ final , because the fourth derivative is largest at the final solvent strength.

Chemicals
Aqueous solutions were prepared using deionized water (Arium 611UV; Satorius, Germany; R=18.2 M cm). Acetonitrile (ACN, LC-MS grade) was obtained from Avantor Performance Chemicals (Deventer, The Netherlands). Ammonium formate, formic acid (reagent grade, ≥95%) and the peptide mix (HPLC peptide standard mixture, H2016) were obtained from Sigma-Aldrich (Darmstadt, Germany). The earl-grey tea was obtained from Maas International (Eindhoven, The Netherlands). The dyes used in this study were authentic dyestuffs obtained from the reference collection of the Cultural Heritage Agency of the Netherlands (RCE, Amsterdam, The Netherlands).

Instrumental
All experiments were carried out on an Agilent 1100 LC system equipped with a quaternary pump (G1311A), an autosampler (G1313A), a column oven (G1316A) and a 1290 Infinity diode-array detector (G4212A) (Agilent, Waldbronn, Germany). In front of the column, an Agilent 1290 Infinity In-Line Filter (G5067-4638) was installed to protect the column. The dwell volume was approximately 1.1 mL. The injector needle was set to draw and eject at a speed of 10 L·min −1 with two seconds equilibration time. Two columns were used, a Waters Acquity BEH Amide (150 × 2.1 mm i.d., 1.7-m particles, 130-Å pore size; Waters, Milford, MA, USA) and a Phenomenex Luna HILIC column (50 × 3 mm, 3 m, 200 Å; Phenomenex, Torrance, CA, USA), further referred to as diol column.

Analytical methods
The mobile phase consisted of acetonitrile/buffer [v/v] 97:3 (Mobile phase A) and acetonitrile/buffer [v/v] 1:1 (Mobile phase B). The buffer was 10 mM ammonium formate at pH 3. For all experiments recorded on the amide column, five different scouting gradient programs were used. All gradients started from 0.0 until 0.5 min isocratic at 100% A, followed by a linear gradient from 100% A to 100% B in 10 (Gradient A1), 17 (Gradient A2), 30 (Gradient A3), 52 (Gradient A4) and 90 (Gradient A5) minutes. In all cases, 100% B was maintained for 2 min, after which a linear gradient of 1 min brought the mobile phase back to the initial composition of 100% A, which was maintained for 20 min to thoroughly re-equilibrate the column. The flow rate was 0.25 mL min −1 .

Sample preparation
The peptide mix was used at a concentration of 1000 ppm in deionized water, and was diluted five times with ACN before analysis. The injection volume was 5 L.
The metabolites were individually prepared as stock solutions in ACN/water with ratios (varying between 9:1 or 8:2 [v/v]) and resulting concentrations depending on the solubility of each compound (see Supplementary Material section S-1). For analysis, mixtures of seven or eight metabolites were prepared with effective metabolite concentrations in these mixtures between 100 and 500 ppm. The injection volume of each mixture was 1 L. The peaks in the chromatograms were identified using individual injections, clearly distinguishable peak patterns and UV-vis spectra.
The dyes were prepared as individual stock solutions of approximately 5000 ppm in water/MeOH (1:1) [v/v]. Mixtures of 20 dyes were prepared and the resulting solution was diluted four times in ACN. The mixtures were injected at a volume of 3 L with effective individual dye concentrations of approximately 25 ppm. The peaks in the chromatograms were identified using the UV-vis spectra.
For the preparation of the earl-grey-tea sample, 200 mL of deionized water was heated until the boiling point was reached. The heating was turned off and the earl-grey-tea bag was submerged in the water for 5 min. After cooling, the resulting solution was diluted five times with ACN prior to analysis. The injection volume was 5 L.

Goodness-of-fit
To establish a good representation of HILIC behaviour, a set of 57 analytes was compiled comprising organic acids, peptides, synthetic and natural dyes and components found in black tea ( Table 1). The set mainly comprises acidic, basic, zwitterionic and neutral compounds with varying polarity and, in most cases, aromatic functionality. The retention times for all analytes on the amide column were recorded with five different gradient programs (see Experimental section, gradients A1-A5). Using the obtained retention times (see Supplementary Material section S-2 for all retention time data), all retention parameters were determined for each retention model by using the nonlinear-programming-solver fminsearch function of MATLAB to fit the retention model to the data. The resulting retention parameters are displayed in Table 1. For most analytes, reasonable values were found. Peptide component 1 exhibited no retention and thus was excluded from this study.
To assess the ability of the different models to describe the retention behaviour, the Akaike Information Criterion (AIC) [24] was used. This criterion provides a measure for the mean-squared error in describing the retention behaviour based on information theory and it can be regarded as a goodness-of-fit indicator. The criterion has been used before to study retention behaviour in HILIC [16] and partition coefficients [25]. One attractive feature of the AIC criterion is that it allows comparison of models with different numbers of terms. For example, a three-parameter model is expected to describe the data better than a two-parameter model. The AIC criterion accounts for this by penalizing the model for each additionally employed parameter, allowing an unbiased comparison. For calculation the AIC, the number of parameters, p, the number of experiments, n, and the sum-of-squares error (SSE) from the fit of the retention model are used.  [16]. This is also reflected in Fig. 1, where the number of found compounds with a given AIC value are grouped in a histogram per retention model. It can be seen that the LSS model performs poorly in most cases and that both the mixed-mode and quadratic models are not robust across the range of compounds.
Conversely, the plot suggests that the adsorption and Neue-Kuss models perform more reliably, in particular for the metabolites and dyes (see Supplementary Material section S-3 for histograms per class of analytes).

Prediction errors
With this insight, the ability of the models to reliably predict retention times based on a smaller number of gradients was studied to allow automatic gradient optimization. To simulate such a situation, the retention times from gradients A2 and A4 were excluded, and the retention parameters were determined once more using exclusively the data from gradients A1, A3 and A5. The obtained retention parameters were used to predict the retention times for gradients A2 and A4.
The results are plotted in Fig. 2A for the amide column, where the errors in prediction of the retention times of all analytes in gradient A2 are reflected by the solid bars and the dashed bars reflect the prediction errors for gradient A4. Most models perform very similarly, with an average prediction error of approximately 2% for retention times of gradient A2 and 2.5% for retention times of gradient A4. A significant deviation of this impression is the performance of the Neue-Kuss model, which is surprising because of the low AIC values found for this model. This can be explained by realizing that this is an empirical model now using just three points (A1, A3 and A5) instead of all five such, was the case for Fig. 1. To investigate to which extent the observations obtained thus far also pertain to other stationary phases, the retention times for 17 of the analytes (peptides and metabolites) were also recorded on a cross-linked diol column, using a similar set of scanning gradients (see Supplementary Material section S-4 for the obtained retention times, determined retention parameters and the calculated AIC values; see Section 3.3.1 for gradient programs and conditions). Again, using exclusively data from gradients D1, D3 and D5, the retention times of gradient D2 and D4 were predicted and the prediction errors are shown in Fig. 2B. Interestingly, the prediction errors across the range of models is much better with, with prediction error values of approximately 0.5% for D2 and 1.0% for D4. Noteworthy is, however, the dramatic performance of the Neue-Kuss model for which the box-and-whisker plots are off-scale. The lower prediction errors suggest a more-regular behaviour of the diol column. It is worth to point out that the obtained retention factors for both columns were rather similar. To rule out that this significantly better performance Table 1 Overview of determined retention parameters and calculated AIC values for all 57 analytes and five studied retention models based on five scanning gradients.   of specific models is caused by the absence of the dyes and/or tea components in the data set, the plot shown as Fig. 2C compares exclusively the prediction errors for the 17 selected analytes. The observed trends support those found in Figs. 2A and 2B.

Repeatability
To study the robustness of fitting the retention parameters and predicting retention times, ten analytes were selected and mea- sured five times with all five gradients on the amide column. The resulting retention curves are shown in Fig. 3. The error bars signify the spread based on the five determinations. Narrow spreads in the retention curves were observed for the LSS and mixed-mode models, but especially for the adsorption model. The spread was larger for a number of components with the Neue-Kuss model, despite the use of five data points for fitting, and it was dramatic for the quadratic model.
One trend that can be observed is that for specific (mostly earlyeluting) analytes the spread is larger towards higher fractions of water. The typical HILIC conditions may no longer be applicable in this range. Earlier-eluting compounds are generally more difficult to fit with the relatively limited gradient range used for scouting.To study the influence of the number of scanning gradients on the error in prediction, retention parameters were determined for each possible combination of gradients where the predicted gradient would fall within the scanning range. For example, gradients A1, A2 and A4 could be combined to predict A3 as the latter falls within the scanning range, whereas this was not the case for predicting gradient A4 from gradients A1, A2 and A3. The results are shown in Fig. 4, with the plot showing the prediction errors for all 56 retained analytes plotted against the number of used scanning gradients. Note that the mixed-mode, quadratic and Neue-Kuss models were not investigated for the use with two scanning gradients as they are three-parameter models. Not surprisingly, a larger number of scanning gradients improves the accuracy of the predictions. However, we can see that for some models this trend is more significant than for other ones. Using four scanning gradients instead of two provides just minor improvements in prediction accuracy for the adsorption model, which appears to perform solidly for the amide column. It should also be noted that the effect of additives, which have been shown to influence the type of interactions in HILIC, have not been evaluated here [26].

Concluding remarks
We have investigated the quality-of-fit and prediction accuracies for five retention models and a wide array of compounds from different classes. For most compound classes, the adsorption model provides the most robust performance in terms of its ability to describe and accurately predict HILIC retention based on a limited number of scanning gradients. Prediction accuracies were generally better for a diol column than for an amide column, with the exception of the Neue-Kuss model which performed poorly when using three scouting runs on either column. While the adsorption model was found to perform robustly for most of the investigated analytes, the data suggest a strong effect of the combination of analyte class and stationary-phase chemistry on model performance. The two-parameter model is also attractive, because it can be used with good results using only two scanning gradients. We recommend that the adsorption model should be the first-choice model for describing and predicting HILIC retention data, unless data are available to demonstrate that other models ("LSS" model, mixed-mode model) perform adequately in specific situations. Based on the present data, we cannot recommend the quadratic and Neue-Kuss models for use in HILIC.