HPLC Retention Behavior of Triacylglycerols Extracted from Soybean Oil by Supercritical CO2

Soybean oil fractions were obtained by collecting extract at different time intervals during supercritical CO2 extraction. Relationships between the retention behavior of seventeen triacylglycerols (TAGs) and their molecular characteristics were studied using a chemometric approach. Quantitative structure-retention relationship (QSRR) analysis was carried out on retention time values (tr) obtained by high performance liquid chromatography to identify the structural features of different TAGs that govern their retention. Principal component analysis (PCA) was performed in order to select the molecular descriptors that best describe the retention behavior of the compounds investigated, and to determine the similarities among molecules. Accurate mathematical models were developed for predicting the retention behavior of some TAGs. The validity of the models was evaluated by suitable statistical and cross-validation parameters.


INTRODUCTION
Pressing and extraction with organic solvents are widely used in the production of vegetable oils. For soybean oil, hexane has long been the preferred extraction solvent, but in recent years supercritical CO2 extraction, as an environmentally friendly technique, has emerged as an alternative to current extraction methods.[5][6][7][8][9][10][11][12][13] It has been shown that the oil extracted from soybeans with supercritical CO2 is of much higher quality than the same oil extracted with hexane. Furthermore, the refinement stages are simplified significantly and the solvent distillation stage is eliminated completely.
Edible oils are composed mainly of triacylglycerols (TAGs), and analysis of TAGs is a critical step in understanding the physicochemical properties of a vegetable oil. TAGs are esters of fatty acids and glycerol. The triacylglycerols of soybean oil contain five different fatty acids: palmitic (16:0), stearic (18:0), oleic (18:1), linoleic (18:2) and linolenic acid (18:3).[14]
The mechanisms of chromatographic separation are very complex and depend on many factors, such as the experimental conditions, the type of chromatographic system and the physicochemical characteristics of the analytes. In order to understand chromatographic processes, it is very useful to establish mathematical models that can predict the retention behavior of analytes in the applied chromatographic system on the basis of their structural characteristics. Determining the correlations between molecular structure and retention behavior of molecules in different chromatographic systems is the main task of the quantitative structure-retention relationship (QSRR) chemometric method. Chemometric analysis, that is, performing calculations on measurements of chemical data, is of great importance in modern science.
Croat. Chem. Acta 87 (2014) 261.
QSRR analysis is very often applied to predict the retention behavior of newly synthesized molecules and to compare quantitatively the separation properties of individual types of chromatographic layers.[17][18][19] In this context, the goal of the present study was to evaluate HPLC retention data by QSRR analysis. The central objective was to establish possible relationships between the retention characteristics and the structural descriptors of the investigated triacylglycerols in order to predict the retention behavior of this class of molecules.

Materials
The supercritical CO2 extraction was performed on the soybean cultivar "Ika", developed at the Agricultural Institute Osijek in Croatia in 2009. The samples were cleaned of impurities. The material was ground and sieved using sieve sets (Erweka, Germany) and the average particle size was determined. The prepared samples were then stored at +4 °C prior to extraction. Moisture was determined by oven drying to constant weight at 105 °C.[20] Commercial CO2 (M/s, Novi Sad, Serbia) was used. HPLC grade acetone and acetonitrile were purchased from Baker J. T., Milan (Italy). Lipid standards (LnLnLn, LLL, OOO) were obtained from Sigma Chemical, St. Louis (USA). All other chemicals were of analytical reagent grade.

Supercritical CO2 Extraction of Soybean Oil
The experiments were performed on a laboratory-scale high pressure extraction plant (HPEP, NOVA-Swiss, Effretikon, Switzerland) described in detail elsewhere.[8,9,21] The main plant parts and properties, according to the manufacturer's specifications, were: a diaphragm-type compressor (pressure range up to 1000 bar), an extractor with an internal volume of 200 mL (p max = 700 bar), a separator (internal volume 200 mL, p max = 250 bar), and a maximum CO2 mass flow rate of 5.7 kg h−1.
The ground soybean sample of 120 g was placed into the extractor vessel. The extracts were collected in previously weighed glass tubes. The amount of extract obtained at regular time intervals was determined by weight using a balance with a precision of ±0.00001 g. Separator conditions were 15 bar and 298 K.
Under the different extraction conditions of pressure (300, 400, 500 bar), temperature (40, 50, 60 °C), CO2 flow rate (0.194, 0.436 kg h−1) and characteristic particle size (0.238, 0.383, 1.059 mm), the extraction process was carried out until the extraction yield became constant. Different fractions, depending on the extraction conditions, were obtained by collecting extract every two hours during the extraction process. After each extraction, the obtained extract was placed into glass vials (25 mL), sealed and stored at +4 °C to prevent any possible degradation.

HPLC Analysis of Studied Compounds
TAGs were analysed by the IUPAC method [22] using a Perkin-Elmer Series 200 high performance liquid chromatography system equipped with an isocratic pump, a refractive index detector and TotalChrom Navigator (HPLC software). The separation was performed on two serially connected PE Pecosphere C18 columns (83×4.6). The analysis was carried out with acetone / acetonitrile (70 : 30) as the mobile phase. Standards and oil samples (5 %) were dissolved in HPLC-grade acetone, and 20 μL aliquots were injected into the column and eluted at a flow rate of 2.5 mL min−1. TAGs were identified by comparing their retention times to those of the standards. Experiments were conducted in triplicate.

Molecular Modeling and in silico Molecular Descriptors
The derivation of in silico molecular descriptors proceeds from the chemical structure of the compounds. In order to calculate the molecular descriptors, all molecules were drawn in the ChemBioDraw Ultra version 12.0 program. The 3D modeling of the examined molecules was carried out using ChemBio3D Ultra version 12.0 software running on an AMD Sempron Processor 3000+. The obtained 3D models were subjected to energy minimization using the molecular mechanics force field method (MM2). The cutoff for structure optimization was set at a gradient of 0.1 kcal Å−1 mol−1. The Austin Model 1 (AM1) method was then used in MOPAC for full geometry optimization of all structures until the root mean square (RMS) gradient fell below 0.0001 kcal Å−1 mol−1.
The values of the molecular descriptors (Table 1) for each molecule in the data set were calculated using ChemBio3D Ultra version 12.0 and ALOGPS 2.1. The descriptors determined for the examined compounds were solubility descriptors (AlogpS and AClogS), molecular volume (MV) and the lipophilicity parameters, logP values, calculated using different theoretical procedures available online (milogP, ClogP, AlogPs, AClogP, logP Kow, XlogP2, XlogP3).

Chemometric Analysis and Model Validation
In chemometric analysis the main problem is how to reduce the number of variables. This can be done by various statistical methods of explorative analysis, classification methods and regression methods. −26 Principal component analysis (PCA) is a technique for reducing the amount of data when correlation is present. It is worth stressing that it is not a useful technique if the variables are uncorrelated. PCA calculates new, latent variables as combinations of the original variables, representing the multidimensional data structure in an optimal way. In a multidimensional space, where the variables define the axes, the data are projected onto a few principal components (PCs) that are linear combinations of the original variables and describe the maximum variation within the data. Each PC is characterized by scores and loadings. Scores are the new coordinates of the projected objects, and loadings reflect the direction of the PC with respect to the original variables. The loadings plot displays relationships between variables and can be used to identify the variables (molecular descriptors in this study) that contribute to the positioning of the objects on the scores plot. The scores plot provides a data overview, displaying patterns or groupings within the data.
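To make the roles of scores, loadings and explained variance concrete, the following minimal Python sketch computes a PCA by singular value decomposition. It is an illustration only: the descriptor matrix is synthetic (random numbers with built-in correlation, not the values of Table 1), the function name `pca_svd` is hypothetical, and autoscaling of the descriptors is an assumption, since the preprocessing used in this study is not stated.

```python
import numpy as np

def pca_svd(X, n_components=2):
    """PCA of an (objects x variables) matrix via SVD of the autoscaled data."""
    X = np.asarray(X, dtype=float)
    # Autoscale: centre each descriptor and divide by its standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # object coordinates on the PCs
    loadings = Vt[:n_components].T                   # PC directions in descriptor space
    explained = S**2 / np.sum(S**2)                  # fraction of total variance per PC
    return scores, loadings, explained[:n_components]

# Synthetic, correlated "descriptors" for 17 objects and 6 variables (made up).
rng = np.random.default_rng(0)
latent = rng.normal(size=(17, 1))
X_demo = latent @ rng.normal(size=(1, 6)) + 0.3 * rng.normal(size=(17, 6))

scores, loadings, explained = pca_svd(X_demo)
print(scores.shape, loadings.shape)
print(np.round(explained, 3))
```

Plotting the two columns of `scores` against each other gives a scores plot, and plotting the two columns of `loadings` gives the corresponding loadings plot.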
Model validation is a very important aspect of any QSRR analysis. −30 Correlation coefficient values closer to 1.0 represent a better fit of the regression, and high values of the F-test indicate that the model is statistically significant. The standard deviation expresses the variation of the residuals about the regression line and should be low for the regression to be significant. The lower the predicted residual sum of squares (PRESS), the better the predictability of the model.[31] If PRESS is lower than the total sum of squares (TSS), which is expressed in terms of the dependent variable y, the model predicts better than the mean of y and can be considered statistically significant. In many cases, r²CV and r²adj are taken as proof of the high predictive ability of estimated mathematical models in QSRR. High values of these statistical characteristics (r²CV, r²adj > 0.5) indicate high predictivity of the equations. Unlike r², r²CV may be negative, which indicates a very poor mathematical model. Also unlike r², which tends to increase upon the addition of any descriptor, r²CV will decrease upon the addition of irrelevant descriptors.
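The relationship between PRESS, TSS and the cross-validated r² can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: a single-descriptor linear model, leave-one-out refitting, made-up numbers rather than data from this study, and hypothetical function names (`fit_line`, `loo_press`).

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def loo_press(x, y):
    """PRESS, TSS and cross-validated r2 from leave-one-out refits of the line."""
    press = 0.0
    for i in range(len(x)):
        # Refit the model without object i, then predict object i.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    my = sum(y) / len(y)
    tss = sum((yi - my) ** 2 for yi in y)
    return press, tss, 1.0 - press / tss

# Made-up descriptor and retention values, for illustration only.
logp_demo = [18.0, 18.9, 19.7, 20.6, 21.4, 22.3]
tr_demo = [5.1, 6.0, 7.2, 8.1, 9.3, 10.0]

press, tss, r2_cv = loo_press(logp_demo, tr_demo)
print(f"PRESS={press:.4f} TSS={tss:.4f} r2_cv={r2_cv:.4f}")
```

When the leave-one-out predictions are worse than simply predicting the mean of y, PRESS exceeds TSS and the returned r2_cv is negative.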

Statistical Methods
The complete regression analysis was carried out by PASS 2005, GESS 2006, NCSS Statistical Software, as well as Statistica v. 8 software.

RESULTS AND DISCUSSION
In our previous paper [10] we explained in detail how different extraction parameters (pressure, temperature, CO2 mass flow rate and characteristic particle size) influenced the extraction yield of soybean oil. An increase in pressure, temperature and CO2 flow rate improved the extraction yield, while a decrease in particle size gave a higher extraction yield because more oil is exposed outside the particles as the surface area increases with particle size reduction. The maximum yield obtained under the different supercritical extraction conditions was 19.33 %, which is very close to the oil yield obtained with n-hexane (20.19 %). We also investigated the tocopherol content of soybean oil obtained by supercritical CO2 under different extraction process conditions.[11] Chemometric analysis was successfully applied to the different tocopherol isomers to model the relationships between their contents in soybean oil. Accurate mathematical models were developed for predicting the total tocopherol content, as well as the content of the δ-tocopherol isomer.
In this study, during the extraction of soybean oil by supercritical CO2, fractions were collected every two hours, so their number depended on the extraction conditions. At a pressure of 300 bar and a temperature of 40 °C the extraction process was carried out for 12 hours, so six different fractions were collected. At a pressure of 400 bar the extraction process was carried out for 8 hours at each temperature (40, 50 or 60 °C), and four different fractions were collected. At a pressure of 500 bar and a temperature of 40 °C the extraction process was the shortest, 6 hours, so three different fractions were collected. In all collected fractions the concentration of TAGs was determined using reversed phase high performance liquid chromatography. The application of this method resulted in successful separation of the triacylglycerols in 15 min, with very simple sample preparation.[32] The chromatogram of every injected sample showed 17 individual triacylglycerol peaks, and their concentrations were calculated from peak areas. Retention times and molecular parameters of the investigated TAGs are given in Table 1. A representative chromatogram of the analysed extract is presented in Figure 1. The major TAG was LLL (trilinolein), followed by LLO (dilinoleoolein), LLP (dilinoleopalmitin) and LOP (linoleooleopalmitin). The concentration ranges of these TAGs across the investigated extraction conditions were as follows: LLL (16.34-23.62 %), LLO (14.61-17.07 %), LLP (10.86-16.82 %) and LOP (11.82-15.44 %). The levels of LnLnLn (trilinolenoin), LnLnL (dilinolenolinolein), LnLO (linolenolinoleoolein), PLP (linoleodipalmitin), OOP (dioleopalmitin), OOO (triolein) and SOP (stearinoleopalmitin) were relatively low (less than 4 %). Similar data for the TAG composition of soybean oil were reported previously,[32][33][34] with specific differences due to the use of different soybean cultivars.

PCA
In order to examine the data for similarities and dissimilarities, PCA was applied to the calculated descriptors of the studied compounds and resulted in a two-component model that explains 93.61 % of the total variance. The first PC explains 78.61 % of the variability, and the second accounts for 15.00 %. Score values and the mutual projections of the loading vectors for the first two PCs are presented in Figure 2.
The obtained results show that PC2 separates the examined compounds into two large groups, already presented in Table 1. The scores plot revealed that the studied triacylglycerols were classified according to the presence of palmitic acid in their structure. Unlike the molecules in the second series, the molecules in the first series contain one palmitic acid residue in their structure. Compound 17 contains two palmitic acid residues. The loadings plot highlights the most influential descriptors responsible for this ordering of the compounds. AlogpS has the highest negative impact on PC2, while the molecular volume has the highest positive impact on it. Therefore, PC2 can be considered a discriminating factor between compounds according to their solubility and molecular size.

QSRR
In the second step, we focused on developing chemometric models that relate the retention characteristics to the structural descriptors of the investigated triacylglycerols. QSRR analysis was performed to quantify the effects of the molecular structure of the triacylglycerols on their retention behavior. Regression analyses, including non-linear regression, were carried out for the two series of triacylglycerols. The specifications of the derived mathematical models are shown in Table 2.
The statistical quality of the resulting models, as given in Table 2, was assessed by the squared correlation coefficient (r²), the standard error of estimation (s) and the sequential Fisher test (F).[34−37] The F-value was used to evaluate the significance of a variable: the higher the F-value, the more stringent the significance level. It is noteworthy that all these equations were derived using the entire data set of compounds and that no outliers were identified. The F-values presented in Table 2 are statistically significant at the 99 % level, since all the calculated F-values are higher than the tabulated values.
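As a hedged illustration of how r², s and F are obtained for a fitted model, the Python sketch below evaluates them for a single-descriptor linear regression. The numbers are made up and do not come from Table 2, the function name `regression_stats` is hypothetical, and the restriction to one descriptor (p = 1) is an assumption made to keep the example short.

```python
import math

def regression_stats(x, y):
    """r2, standard error of estimate s, and F for the line y = a + b*x."""
    n, p = len(x), 1                      # p = number of descriptors in the model
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual SS
    tss = sum((yi - my) ** 2 for yi in y)                        # total SS
    r2 = 1.0 - rss / tss
    s = math.sqrt(rss / (n - p - 1))
    f = ((tss - rss) / p) / (rss / (n - p - 1))
    return {"a": a, "b": b, "r2": r2, "s": s, "F": f}

# Made-up descriptor and retention values, for illustration only.
logp_demo = [18.0, 18.9, 19.7, 20.6, 21.4, 22.3]
tr_demo = [5.1, 6.0, 7.2, 8.1, 9.3, 10.0]

stats = regression_stats(logp_demo, tr_demo)
print({k: round(v, 4) for k, v in stats.items()})
```

The calculated F is then compared with the tabulated F for (p, n − p − 1) degrees of freedom at the chosen significance level.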
In addition, all the models show a squared correlation coefficient greater than 0.9600. However, a high correlation coefficient alone is not enough to select an equation as a model, and hence various statistical approaches were used to confirm the robustness and practical applicability of the equations. There are three important components in any chemometric analysis: development of models, validation of models and utility of the developed models. Validation is a crucial aspect of any chemometric analysis.[38] To test the predictive power of the selected models, the leave-one-out (LOO) procedure was used (Table 3). The PRESS value can be used to compute an r²CV statistic, called cross-validated r², which reflects the predictive ability of the model. This is a good way to validate the predictions of a regression model without selecting another sample or splitting the data. It is quite possible to have a high r² and a very low r²CV. When this occurs, it implies that the fitted model is data dependent. Because PRESS is non-negative, r²CV cannot exceed one, but it can fall below zero; when it falls outside the range of zero to one, it is truncated to stay within this range.
Adjusted r-squared (r²adj) is a version of r² corrected for the number of independent variables in the model, and it indicates whether an incorporated variable is statistically justified. It measures the percentage of explained variation in the dependent variable while taking into account the relationship between the number of cases and the number of independent variables in the regression model. Whereas r² will always increase when an independent variable is added, r²adj will decline if the added variable does not reduce the unexplained variation enough to offset the loss of degrees of freedom, that is, if it does not contribute its fair share.
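For reference, the quantities discussed above can be written out explicitly. The expressions below are the standard textbook definitions and are not printed in this form in the present work; the symbols n (number of compounds), p (number of descriptors in the model), RSS (residual sum of squares) and the leave-one-out prediction ŷ(i) are notation introduced here for illustration.

```latex
\begin{align}
r^2 &= 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}, &
\mathrm{TSS} &= \sum_{i=1}^{n} (y_i - \bar{y})^2, \\
r^2_{\mathrm{adj}} &= 1 - (1 - r^2)\,\frac{n - 1}{n - p - 1}, \\
r^2_{\mathrm{CV}} &= 1 - \frac{\mathrm{PRESS}}{\mathrm{TSS}}, &
\mathrm{PRESS} &= \sum_{i=1}^{n} \bigl(y_i - \hat{y}_{(i)}\bigr)^2 .
\end{align}
```

The factor (n − 1)/(n − p − 1) grows with p, which is why adding a descriptor that barely lowers RSS reduces the adjusted value even though r² itself cannot decrease.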
In many cases r²CV and r²adj are taken as proof of the high predictive ability of chemometric models, with a high value of these statistical characteristics (> 0.5) regarded as sufficient evidence. However, recent reports have shown otherwise.[39] Although a low value of r²CV for the training set can indeed serve as an indicator of the low predictive ability of a model, the opposite is not necessarily true: a high r²CV does not automatically imply a high predictive ability of the model. Thus, a high value of LOO r²CV is a necessary condition for a model to have high predictive power, but it is not a sufficient condition.
The only way to estimate the true predictive power of the models is to test their ability to predict accurately the retention times of the triacylglycerols investigated. To confirm our findings, tr values were calculated from the selected models 2−4, 7, 8, 10−16 and compared graphically with the experimental data (Figure 3). The low scatter of points around the linear relationship, a slope close to unity (> 0.99) and an intercept close to zero (< 0.005) indicate very good agreement between the experimental retention values and the values obtained from the defined mathematical models. This demonstrates the usefulness of the derived models. Also, on the basis of the magnitude of the individual percentage deviation (IPD %), there is close agreement between the observed and calculated retention times (Table 4, Figure 4). The results of this investigation indicate that these models can be successfully applied to predict the retention times of the analysed triacylglycerols. The use of chemometric models for predicting the retention behavior of these triacylglycerols reduces the cost and time of determination.
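The comparison of predicted and experimental retention times can be sketched as follows. The definition of IPD used here, 100 × (predicted − observed) / observed, is an assumption, since the formula is not given in the text; the retention values are made up, and the function names are hypothetical.

```python
def individual_percentage_deviation(observed, predicted):
    """IPD (%) per compound, assumed here as 100 * (predicted - observed) / observed."""
    return [100.0 * (p - o) / o for o, p in zip(observed, predicted)]

def parity_line(observed, predicted):
    """Slope and intercept of the least-squares line predicted = slope*observed + intercept."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    slope = (sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
             / sum((o - mo) ** 2 for o in observed))
    return slope, mp - slope * mo

# Made-up retention times (min), for illustration only.
tr_observed = [5.10, 6.00, 7.20, 8.10, 9.30, 10.00]
tr_predicted = [5.05, 6.08, 7.15, 8.16, 9.24, 10.06]

ipd = individual_percentage_deviation(tr_observed, tr_predicted)
slope, intercept = parity_line(tr_observed, tr_predicted)
print([round(v, 2) for v in ipd])
print(round(slope, 4), round(intercept, 4))
```

A slope near one, an intercept near zero and small IPD values across all compounds together indicate that the model reproduces the experimental retention times well.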
As a result of the detailed statistical validation, it can be concluded that model 3 and model 14 have the best statistical performance and should preferably be used in prediction of retention behavior of studied compounds in the applied chromatographic system.

Figure 1. Representative chromatogram of the HPLC analysis of the analysed extract.

Figure 2. Score values and factor loadings of the calculated descriptors for the first two PCs.

CONCLUSION
A QSRR study was carried out on a training set of 17 TAGs from soybean oil to correlate and predict the HPLC retention times of the studied compounds. Soybean oil fractions were obtained by collecting extract at different time intervals during supercritical CO2 extraction under different process parameters. Molecular modeling and QSRR analysis were performed to quantify the effects of the lipophilicity of the compounds on their retention behavior. Accurate mathematical models were developed for predicting the HPLC retention times of some TAGs. The validity of the models was established using LOO cross-validation. The established models were used to predict the retention times of the investigated compounds, and close agreement between experimental and predicted values was obtained. This indicates that the retention times of the series of TAGs can be successfully modeled using different lipophilicity descriptors (logP values).

Acknowledgements. This work was financially supported by the Ministry of Science, Education and Sport of the Republic of Croatia, project: 073-0730489-0344. The authors are also grateful to the Josip Juraj Strossmayer University of Osijek, Republic of Croatia, for financial support. Furthermore, these results are part of the projects No. 172012 and No. 172014 financially supported by the Ministry of Science and Technological Development of the Republic of Serbia, 2011−2014.

Table 2. Statistical parameters of the relationships between the retention time and lipophilicity of the investigated compounds

Table 3. Cross-validation parameters of the relationships between the retention time and lipophilicity of the investigated compounds

Table 4. Predicted values of the retention time of the investigated triacylglycerols