A Fast Chromatographic Method for Determination of Daidzein and Genistein in Spiked Water River Samples Using Multivariate Curve Resolution

This work reports the development of a fast chromatographic methodology for the quantitation of two phytoestrogens, daidzein (DAI) and genistein (GEN), in river water samples. The proposed method is based on high performance liquid chromatography-diode array detection (HPLC-DAD) data and multivariate curve resolution-alternating least squares (MCR-ALS) second-order calibration. Initially, the method was evaluated by analyzing a synthetic validation set prepared according to a Taguchi design. Subsequently, the method was applied to predict the concentration of the phytoestrogens in spiked river water samples previously processed by solid phase extraction (SPE). With the present chromatographic methodology, the run time was reduced by approximately 50% (from 7.00 to 3.25 min) compared with previous work in the literature. Good precision was achieved even in the presence of non-modeled constituents and strong background. Thus, the proposed method is a rapid and robust alternative for the quantitation of the studied phytoestrogens.


Introduction
As a result of human activities, an increased production of waste has been noted, particularly in natural waters. Consequently, the level of pollutants such as heavy metals, well known for their toxic effects on living beings, has also increased. 1,2 However, little is known about the impact of a class of compounds called emerging contaminants. These compounds are also released to the environment and have recently become the object of a wide variety of studies. 3,4
Within this latter group of contaminants, phytohormones are of great importance, since this class of compounds is known to be bioactive even at low concentrations. 5 Some phytohormones, such as daidzein (DAI) and genistein (GEN), have been detected in environmental waters at alarming levels. 6,7
Phytohormones are non-steroidal polyphenolic metabolites produced by plants 8 that can bind chemically to specific intracellular estrogenic receptors, causing variations of endocrine signals in animals and humans; hence their characterization as phytoestrogens. 9 Figure 1 shows the structural similarity between the phytoestrogens DAI and GEN and basic estrogen structures. Recent results have shown that fish exposed to phytoestrogens may become more aggressive, due to testosterone reduction, and to immunosuppression. 10,11
Phytoestrogens originating in the soy-processing industries (field grain cultivation and the food industry) contaminate both rivers and lakes. 12,13 Research shows that, after consumption, phytoestrogens undergo numerous metabolic conversions. Both their metabolites and their precursor compounds may be absorbed into the bloodstream and then excreted in urine. 14,15 The increased production of soybeans and their derivatives for both eastern and western diets has been associated with a higher incidence of cancer in women. 16 [...] 19,20 A lack of proper water treatment implies that phytoestrogens (among other contaminants) reach the population.
Several methodologies have been proposed to quantify phytoestrogens; liquid (LC) or gas (GC) chromatography coupled with mass spectrometry (MS) 21,22 are the most used methods. Other approaches have also been presented in the literature, 21 such as LC and capillary electrophoresis (CE) combined with absorbance (diode array detector, DAD), 23,24 fluorescence and electrochemical detection, 25,26 immunoassays, 27 and voltammetry. 28 [...] 31 Multiway data can be easily generated with the modern instrumentation available in research and/or industrial laboratories, for example LC-DAD, GC combined with mass spectrometry (GC-MS), and CE-DAD systems (known as hyphenated techniques), or even simple excitation-emission matrices (EEM). 32 For modeling multiway data in the specific case of LC-DAD systems, multivariate curve resolution-alternating least squares (MCR-ALS) is an appropriate algorithm. 33
The development and validation of chromatographic analytical methods involve several steps, among them the analysis of interferences, which is costly and time consuming. The composition of real samples tends to be complex and may vary from sample to sample. An unexpected interferent can arise at any time, which requires the method to be modified and revalidated. The second-order advantage can ensure that a chromatographic method continues to work well in scenarios where conventional strategies fail. Another aspect is that full chromatographic resolution of the analytes is not needed, which allows the use of shorter runs with lower solvent consumption.
[...] 35,36 These applications have the great advantage of being able to circumvent the presence of unexpected constituents in a single sample; this feature is known as the second-order advantage. 32 Thus, steps for the removal of interferents are not required, permitting less costly methods to be developed. 37
This paper presents a method based on high performance liquid chromatography (HPLC) with molecular absorption detection in the ultraviolet-visible (UV-Vis) region for the simultaneous quantitation of two phytoestrogens (DAI and GEN) in spiked river water samples, using multivariate curve resolution with alternating least squares. The predictive ability of the calibration models, built using pure standards of both analytes, was evaluated for: (i) validation samples consisting of mixtures of DAI and GEN standards plus another phytoestrogen, equol (EQL), which was added as a potential interferent, and (ii) river water samples. In addition, an isocratic elution mode was employed, with a run time of less than three and a half minutes, contributing to the development of green analytical methods.

Reagents and solutions
All reagents used in this work were of high purity (≥ 99%). Daidzein, genistein and equol were purchased from Sigma-Aldrich Co. Acetonitrile and water (both HPLC grade) were filtered through a 0.45 µm cellulose filter. Stock solutions of DAI (100.0 mg L⁻¹), GEN (200.0 mg L⁻¹) and EQL (100.0 mg L⁻¹) were prepared in volumetric flasks by dissolving appropriate amounts in acetonitrile.

Apparatus and HPLC procedure
The LC-DAD matrices were recorded on an Ultimate 3000 Dionex liquid chromatograph equipped with a manual injector and a fixed 20 µL loop. Detection was carried out with a diode array detector in the range of 229 to 349 nm with a resolution of 1 nm. A C18 column (Acclaim™ 120) of 150 mm × 4.6 mm, with 5 µm particle size and 120 Å pore size, was used. The LC-DAD matrices were recorded with the Chromeleon 6.1 software (DIONEX CA). The elution of standards and samples was performed in isocratic mode with an acetonitrile:water (70:30, v/v) mixture at a flow rate of 1.0 mL min⁻¹, with the column temperature held constant at 30 °C.

Calibration and validation sets
The calibration set consists of nine pure standards in triplicate for each analyte, in the range of 1000 to 5000 ng mL⁻¹ with equally spaced increments of 500 ng mL⁻¹. All calibration standards were prepared by dilution of an appropriate aliquot of the stock solution.
The validation set was designed to assess the predictive ability of the calibration models and to exploit the second-order advantage. To this end, the validation set consisted of sixteen mixtures (in triplicate) of DAI, GEN and EQL generated according to a Taguchi design 38 with three factors at four levels (1300, 2300, 3300, and 4300 ng mL⁻¹); the concentrations for each of the sixteen validation mixtures are shown in Table 1.
DAI and GEN are the analytes quantified in this work; EQL was added to the validation mixtures to simulate a potential interferent in their quantification by the proposed methodology. EQL was selected because it co-elutes with DAI and GEN under the chromatographic conditions used in this work. Moreover, this compound is also detected in real water samples.
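As an illustration of how a sixteen-run design with three factors at four levels can be built, the following minimal Python sketch uses a 4 × 4 Latin-square construction. This is an assumption made for illustration only: the function names are hypothetical, and the runs generated here are not necessarily those listed in Table 1.

```python
from itertools import combinations, product

LEVELS = [1300, 2300, 3300, 4300]  # ng/mL, the four levels quoted in the text


def sixteen_run_design(levels):
    """Return 16 runs (DAI, GEN, EQL) from a 4 x 4 Latin-square construction.

    Assumption: the third factor takes the level (i + j) mod 4, which gives
    an orthogonal array but may differ from the design actually used.
    """
    n = len(levels)
    runs = []
    for i, j in product(range(n), repeat=2):
        k = (i + j) % n  # Latin-square index for the third factor
        runs.append((levels[i], levels[j], levels[k]))
    return runs


def is_orthogonal(runs):
    """True if every pair of columns contains each level pair exactly once."""
    for a, b in combinations(range(3), 2):
        pairs = {(r[a], r[b]) for r in runs}
        if len(pairs) != len(runs):
            return False
    return True


design = sixteen_run_design(LEVELS)
```

In such a design each concentration level of one compound is combined once with every level of each other compound, so the three concentrations are uncorrelated across the sixteen mixtures.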

River water samples
Eight river water samples were collected at different points along the Cuiá River in Paraíba, Brazil. Each sample was stored in a 1 L amber bottle and acidified to pH 3 with acetic acid. The samples were subsequently processed in triplicate, as explained in the solid phase extraction procedure section.

Recovery
To evaluate the accuracy of the method, the river water samples were spiked with appropriate amounts of each analyte, generating samples at a concentration of 1 ng mL⁻¹. The spiked samples were also processed by solid phase extraction.

Solid phase extraction procedure
The samples (1.00 L, spiked and non-spiked) were loaded on a C18 cartridge pre-conditioned with 6 mL of acetonitrile and 6 mL of water (both HPLC grade), at a flow rate of 3 mL min⁻¹. The eluate was collected and dried under a gentle stream of nitrogen gas. The residues were dissolved in 500 µL of acetonitrile in order to achieve an enrichment factor of 2000. After this process, the samples were stored in vials for further analysis.
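The arithmetic behind the enrichment factor can be checked with a short Python sketch. The function names are hypothetical, and quantitative (100%) recovery through the SPE step is assumed here purely for illustration.

```python
def enrichment_factor(sample_volume_ml, final_volume_ml):
    """Ratio between the loaded sample volume and the final extract volume."""
    return sample_volume_ml / final_volume_ml


def extract_concentration(river_conc_ng_ml, factor, recovery=1.0):
    """Expected concentration in the extract (assumes the given SPE recovery)."""
    return river_conc_ng_ml * factor * recovery


factor = enrichment_factor(1000.0, 0.5)          # 1.00 L reduced to 500 µL
spiked_extract = extract_concentration(1.0, factor)  # 1 ng/mL spike level

# Calibration range quoted in the text: 1000 to 5000 ng/mL
in_calibration_range = 1000.0 <= spiked_extract <= 5000.0
```

Under these assumptions a spike of 1 ng mL⁻¹ in river water corresponds to about 2000 ng mL⁻¹ in the final extract, which falls inside the calibration range of 1000 to 5000 ng mL⁻¹.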

Software and chemometric analysis
The data modeling by MCR-ALS was carried out in the MATLAB® environment using the graphical interface MVC2 developed by Olivieri et al., 39 which is available online. 40
Briefly, MCR is a bilinear decomposition method which assumes that the responses of each constituent of the system are additive, 37 as shown in equation 1:
$$\mathbf{D} = \mathbf{C}\mathbf{S}^{\mathrm{T}} + \mathbf{E} \qquad (1)$$
where D (J × K) contains the signal recorded experimentally in the LC-DAD system and E (J × K) is the residual matrix, 41 with J the number of elution times recorded at K wavelengths. C and Sᵀ are matrices truncated to N factors, which contain information on the pure concentration and spectral profiles, respectively. For well-behaved systems, N is the number of chemical compounds. N must be known initially; it may come from prior knowledge of the samples or from principal component analysis (PCA). 42 Here the value of N was assessed by inspection of the variance explained by each PCA component.
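The inspection of explained variance used to choose N can be sketched in Python as follows. This is a minimal illustration, assuming PCA is computed by singular value decomposition of the uncentered data matrix; the synthetic two-component data and all names are hypothetical and are not taken from this work.

```python
import numpy as np


def explained_variance(D, n_components=10):
    """Percentage of total variance captured by each leading component.

    Assumption: no mean-centering, so the squared singular values of D
    are used directly.
    """
    singular_values = np.linalg.svd(D, compute_uv=False)
    variance = singular_values**2
    return 100.0 * variance[:n_components] / variance.sum()


# Hypothetical data: two overlapped elution peaks and two spectral bands
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 60)[:, None]   # elution time axis (J = 60)
w = np.linspace(0.0, 1.0, 40)[None, :]   # wavelength axis (K = 40)
C = np.hstack([np.exp(-((t - 0.4) / 0.05) ** 2),
               np.exp(-((t - 0.5) / 0.05) ** 2)])
S = np.vstack([np.exp(-((w - 0.3) / 0.1) ** 2),
               np.exp(-((w - 0.6) / 0.1) ** 2)])
D = C @ S + 1e-4 * rng.standard_normal((60, 40))

ev = explained_variance(D)
```

For such data the first two components account for essentially all of the variance, and the remaining ones only describe noise, which is the pattern used to select the number of factors.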
Starting from an initial estimate of C or Sᵀ, obtained by evolving factor analysis 43 or by determination of the purest variables, 44 C (J × N) and Sᵀ (N × K) are estimated and optimized using alternating least squares. 33 In this work, initial estimates of the spectral profiles obtained by the SIMPLISMA method with 10% noise were used in all cases.
The bilinear decomposition described in equation 1 suffers from rotational freedom, i.e., there are many possible solutions for C and Sᵀ. However, the desirable solution is the "chemical solution", in this case pure chromatograms and spectra. Fortunately, the drawback of rotational freedom can be circumvented by applying restrictions to the solutions (C and Sᵀ) obtained in each ALS iteration. Non-negativity and unimodality are the most common restrictions. 32
When I samples are analyzed on the LC-DAD system, I matrices (J × K) are generated. When using traditional algorithms for multiway data (such as parallel factor analysis, PARAFAC), the data are arranged in a cube structure (I × J × K). Such methods are based on the principle of trilinearity, 32 which is commonly not obeyed by LC-DAD data, since the elution times of the analytes can shift from sample to sample.
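The alternating least squares step with a non-negativity restriction can be sketched in Python as below. This is a simplified illustration and not the MVC2 implementation: non-negativity is imposed by setting negative values to zero after each unconstrained least-squares update, unimodality is omitted, and the synthetic data and all names are hypothetical.

```python
import numpy as np


def mcr_als(D, St_init, n_iter=30):
    """Alternate least-squares updates of C and S^T for D ≈ C S^T.

    Assumption: non-negativity is enforced by clipping negatives to zero,
    a crude stand-in for a proper non-negative least-squares solver.
    """
    St = np.array(St_init, dtype=float)
    for _ in range(n_iter):
        C = np.clip(D @ np.linalg.pinv(St), 0.0, None)   # update C given S^T
        St = np.clip(np.linalg.pinv(C) @ D, 0.0, None)   # update S^T given C
    residual = D - C @ St
    lack_of_fit = 100.0 * np.sqrt((residual**2).sum() / (D**2).sum())
    return C, St, lack_of_fit


# Hypothetical two-component data with overlapped peaks and bands
t = np.linspace(0.0, 1.0, 60)[:, None]
w = np.linspace(0.0, 1.0, 40)[None, :]
C_true = np.hstack([np.exp(-((t - 0.4) / 0.05) ** 2),
                    np.exp(-((t - 0.5) / 0.05) ** 2)])
S_true = np.vstack([np.exp(-((w - 0.3) / 0.1) ** 2),
                    np.exp(-((w - 0.6) / 0.1) ** 2)])
D = C_true @ S_true

# Initial spectral estimates: the true spectra with a small perturbation
rng = np.random.default_rng(1)
St_init = S_true + 0.02 * np.abs(rng.standard_normal(S_true.shape))

C, St, lof = mcr_als(D, St_init)
```

The returned lack of fit (in percent) plays the role of the residual fit monitored during the iterations; in practice a convergence criterion on its change would replace the fixed number of iterations used here.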
The MCR-ALS algorithm is capable of handling deviations from trilinearity. The three-way array (I × J × K) is arranged in the form of an augmented matrix (Daug) of size (IJ × K) (column-wise augmented matrix) or of size (J × IK) (row-wise augmented matrix). It is recommended that the augmented mode be the one that breaks the trilinearity. The decomposition presented in equation 1 can be applied to Daug, generating the matrices Caug, Sᵀ, and Eaug for the case of the column-wise augmented matrix. 32
Finally, the score obtained as the sum of the elements of the corresponding profile in each of the sub-matrices of Caug is used to construct a pseudo-univariate model as a function of the concentrations of the calibration standards. The concentration of analyte n is obtained by interpolation in the pseudo-univariate calibration line. 32
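The column-wise augmentation and the calculation of the area scores from Caug can be illustrated with the following Python sketch. The function names and the small arrays are hypothetical and serve only to show the bookkeeping, assuming every sample matrix has the same number of elution times J.

```python
import numpy as np


def augment_columnwise(matrices):
    """Stack I matrices of size J x K on top of each other, giving (I*J) x K."""
    return np.vstack(matrices)


def profile_areas(C_aug, n_samples):
    """Sum each retrieved elution profile within every sample block of C_aug.

    Returns an I x N array: one area score per sample and per component.
    """
    blocks = np.split(C_aug, n_samples, axis=0)       # I blocks of size J x N
    return np.array([block.sum(axis=0) for block in blocks])


# Hypothetical example: three samples, J = 5 elution times, K = 4 wavelengths
samples = [np.full((5, 4), float(i)) for i in (1, 2, 3)]
D_aug = augment_columnwise(samples)

# Hypothetical C_aug for two samples (J = 3) and two components
C_aug = np.arange(12, dtype=float).reshape(6, 2)
areas = profile_areas(C_aug, n_samples=2)
```

Because each sample keeps its own block of elution profiles, retention time shifts between samples do not need to be corrected before the areas are computed.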

Results and Discussion
Calibration set: general considerations
Figure 2 displays the chromatograms obtained for each of the calibration standards with absorbance recorded at 280 nm. A partial separation of the analytes (DAI and GEN) can be observed, although the resolution is less than one.
Another interesting observation can be made by visual inspection of Figure 2: significant peak shifts occur between runs. This problem can be exacerbated by the presence of interferents (in real samples), which typically vary from sample to sample, making the use of peak alignment algorithms unfeasible, as reported by Boeris et al. 45 Given the above, MCR-ALS was chosen for modeling the data because of its advantages, as discussed in the introduction. A previous analysis carried out with the calibration matrices for a single analyte using extended MCR-ALS showed that two contributions (analyte and background profile) were retrieved.

Validation set
MCR-ALS was applied to predict the concentrations in the validation set, composed of sixteen mixtures of DAI and GEN in the presence of EQL (un-modeled). As an example of the modeled data, Figure 3a presents a typical LC-DAD surface for the validation set (sample No. 1, see Table 1).
On inspection of Figure 3a, the strong overlap among the analyte signals, as well as with the signal of the interferent, can be seen. EQL shows a retention time between those of DAI and GEN, overlapping the signals of both analytes simultaneously in both instrumental modes. It is important to remember that for traditional chromatographic methods, which rely on complete resolution of all peaks, an unexpected constituent in a single sample co-eluting with the analytes requires revalidation of the method, a highly time-consuming and laborious task. Figures 3b and 3c show the chromatograms and spectra corresponding to the pure standards of both analytes and the interfering compound (all of them at 5000 ng mL⁻¹). In these figures, the high degree of overlap among the signals in both modes is evident.
As previously mentioned, the first step when modeling data via MCR is to estimate the number of components N by principal component analysis. Under ideal conditions, N must be equal to the chemical rank, i.e., equal to the number of chemical compounds in the validation mixtures, in this case three. However, sometimes a larger N value should be considered for a better model fit. The increased number of components can be attributed to the complexity of the mixture and to the presence of a strong background, which can lead to models with rank deficiency, as has been discussed elsewhere. 46 In this study, the augmented matrices were analyzed by PCA, and the variance explained by each PC for the calibration and validation sets is shown in Table 2.
As can be seen, PCA suggests two factors for both calibration sets and four factors for the validation set; these numbers of factors were used in the decomposition of the data by MCR.
The initial estimates of the pure spectral profiles were obtained using a method based on the detection of pure variables (SIMPLISMA) with 10% noise. The initial profiles estimated for the validation samples can be assigned to DAI, GEN, the interferent EQL, and the background.
The column-wise augmented matrix was then subjected to MCR-ALS decomposition, applying the non-negativity constraint in both modes and unimodality in the time mode, except for the background profile. In all cases ALS converged in thirty iterations or fewer, with a residual fit of less than 0.51, which is in agreement with the typical noise of a DAD detector.
Figure 4 displays the optimized spectral and concentration profiles retrieved by MCR-ALS for the validation set samples. In this figure we observe a very close resemblance to the experimental profiles in both instrumental modes (see Figures 3a and 3b).
Beyond simple visual comparison, the spectra recovered by MCR may also be evaluated through the degree of overlap (S₁₂) between the normalized experimental pure spectrum (s₁) and the retrieved spectrum (s₂). 47 The S₁₂ value is calculated according to equation 2:
$$S_{12} = \frac{\lVert \mathbf{s}_1^{\mathrm{T}} \mathbf{s}_2 \rVert}{\lVert \mathbf{s}_1 \rVert \, \lVert \mathbf{s}_2 \rVert} \qquad (2)$$
The S₁₂ value can vary from 0 to 1, where 1 represents complete overlap between the real and retrieved profiles and 0 indicates that the profiles are very different. By applying equation 2, S₁₂ values of 1.0000 and 0.9998 were obtained for DAI and GEN, respectively, indicating a satisfactory fit and a high similarity for each pair of experimental and retrieved spectra.
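The degree of spectral overlap can be computed with a few lines of Python. The sketch below assumes that S₁₂ is the absolute value of the inner product of the two spectra divided by the product of their norms (the cosine of the angle between them); the function name and the example spectra are hypothetical.

```python
import numpy as np


def spectral_overlap(s1, s2):
    """Cosine-type overlap between two spectra; 1 means identical shape."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    return abs(s1 @ s2) / (np.linalg.norm(s1) * np.linalg.norm(s2))


# Hypothetical spectra on a 229 to 349 nm grid with 1 nm resolution
wavelengths = np.arange(229, 350)
reference = np.exp(-((wavelengths - 260) / 15.0) ** 2)
retrieved = 3.0 * reference                   # same shape, different scale
shifted = np.exp(-((wavelengths - 300) / 15.0) ** 2)

same_shape = spectral_overlap(reference, retrieved)
different_shape = spectral_overlap(reference, shifted)
```

Since the measure is insensitive to scale, a retrieved spectrum that differs from the reference only by an intensity factor still gives an overlap of 1, while a shifted band gives a clearly lower value.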
After the fitting, a pseudo-univariate calibration line was constructed by ordinary least squares linear regression between the concentrations of the calibration standards and the MCR scores. The concentration of the test samples was calculated according to equation 3:
$$c_n = \frac{a_{\mathrm{test},n} - b_0}{b_1} \qquad (3)$$
where a_test,n is the area under the concentration profile for analyte n in the test sample matrix; b₁ and b₀ are the slope and intercept of the pseudo-univariate curve, respectively; and c_n is the concentration of analyte n in the test sample obtained by interpolation on the curve. The statistical parameters corresponding to the validation of the MCR models for DAI and GEN quantification are summarized in Table 3. This table shows that the MCR models were able to predict the concentrations of GEN and DAI in the validation samples, even in the presence of the interferent EQL. Root mean square error of validation (RMSEV) (equation 4) values of 177 (DAI) and 144 ng mL⁻¹ (GEN), and relative error of prediction (REP) (equation 5) values of 5.9% (DAI) and 4.8% (GEN) were obtained.
$$\mathrm{RMSEV} = \sqrt{\frac{1}{I}\sum_{i=1}^{I}\left(y_{\mathrm{pred},i} - y_{\mathrm{nom},i}\right)^{2}} \qquad (4)$$
$$\mathrm{REP} = 100\,\frac{\mathrm{RMSEV}}{\bar{y}_{\mathrm{cal}}} \qquad (5)$$
where y_pred is the concentration predicted by the MCR model for the ith validation sample, y_nom is the nominal concentration, and I is the number of validation samples. In equation 5, ȳ_cal is the average of the nominal concentration values of the calibration set samples. Table 3 also presents several figures of merit: selectivity (SEL), analytical sensitivity (γ), limit of detection (LOD), and limit of quantitation (LOQ), which were calculated according to Bauza et al. 48 and which complement the validation of the MCR models. As can be seen, the analytical sensitivity and selectivity achieved are high, which makes the method attractive considering the LOD and LOQ values reached, computed through the formulas presented by Olivieri. 49 Although the proposed method achieved an LOD in the parts per billion range, a pre-concentration step is needed for real samples.
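The pseudo-univariate calibration and the error measures can be sketched in Python as follows. The sketch assumes the usual definitions (concentration obtained by inverting the fitted line, RMSEV as the root mean squared difference between predicted and nominal values, REP as RMSEV relative to the mean calibration concentration); the function names and the calibration areas are hypothetical.

```python
import numpy as np


def fit_pseudo_univariate(conc, areas):
    """Ordinary least-squares line: area = b1 * concentration + b0."""
    b1, b0 = np.polyfit(conc, areas, deg=1)
    return b1, b0


def predict_concentration(area, b1, b0):
    """Invert the calibration line to obtain the concentration."""
    return (np.asarray(area, dtype=float) - b0) / b1


def rmsev(y_pred, y_nom):
    """Root mean square error of validation."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_nom, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))


def rep(rmsev_value, y_cal):
    """Relative error of prediction, in percent of the mean calibration level."""
    return 100.0 * rmsev_value / float(np.mean(y_cal))


# Nine calibration levels from 1000 to 5000 ng/mL, as described in the text
cal_conc = np.arange(1000, 5001, 500)
cal_area = 0.002 * cal_conc + 0.1     # hypothetical MCR areas
b1, b0 = fit_pseudo_univariate(cal_conc, cal_area)
c_test = predict_concentration(0.002 * 3300 + 0.1, b1, b0)
```

With a mean calibration concentration of 3000 ng mL⁻¹, `rep(177.0, cal_conc)` and `rep(144.0, cal_conc)` give 5.9% and 4.8%, which agrees with the RMSEV and REP values reported for DAI and GEN in the validation set.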

Real samples
The eight river water samples were processed in triplicate using solid phase extraction and run in the HPLC-DAD system under the same conditions used for the calibration and validation sets. A typical LC-DAD landscape, that of the representative sample No. 1, is shown in Figure 5a. For all real samples, only one peak (peak 1) was observed, with an elution time around 2.60 min, together with a significant background (peak 2). The real sample data were analyzed by MCR-ALS in the same way as the validation set samples, and for both analytes the concentration was below the limit of detection of the proposed method. The real samples were then spiked as indicated in the recovery section. The LC-DAD landscape recorded for a typical river water sample (No. 1) spiked with DAI and GEN is presented in Figure 5b. This figure shows the complexity of the analytical problem involved: the heavy organic load in river water samples generates an extremely overlapped signal. The analyte signals are identified by the numbers 3 and 4. Behavior similar to that described for the first real sample (No. 1), in Table 3, was observed for all samples.
Four factors were suggested by the PCA results for the LC-DAD data of the spiked real samples. The number of factors was chosen based on the results in Table 4.
The second column of Table 4 shows the explained variance for the first 10 principal components for the non-spiked samples. After the second factor there were no appreciable variations in the explained variance. After the samples were spiked with DAI and GEN, PCA indicated four factors (see the third column of Table 4).
MCR was conducted using initial estimates of the spectra (by SIMPLISMA with 10% noise), under the non-negativity restriction in both modes and mean unimodality only in the chromatographic mode. Unimodality was not applied to the background profile. Figure 6a (concentration profiles) and Figure 6b (spectral profiles) show the profiles retrieved by MCR-ALS using four factors. The profiles corresponding to DAI and GEN are displayed as solid blue and dashed green lines, respectively, and show good agreement with the experimentally recorded spectra (shown with black circles and squares in Figure 6b), suggesting a good model fit with four factors.
Figure 6 also shows the recovered profile corresponding to the contribution of the interferent (red line with circles). The concentration profile (Figure 6a) shows a wide peak, suggesting co-elution of an organic load signal superimposed on both analytes. The spectral profile (Figure 6b) displays a spectrum with a maximum around 235 nm. It should be noted that in a conventional chromatographic method, co-elution may require revalidation of the method. However, the use of mathematical signal separation overcomes this type of drawback in the analysis of complex samples. The background profile (line with diamonds) recorded for the real samples appears much more intense than in the validation samples (see Figure 4). Finally, the concentrations of GEN and DAI were estimated by interpolation of the MCR areas of the test samples in the pseudo-univariate calibration curve (a typical linear fit is displayed in Figure 7). The pseudo-univariate calibration curve was obtained by a linear fit of the MCR areas of the calibration samples vs. nominal concentration.
The results obtained are presented in Table 5. It is important to note that the results for the DAI and GEN quantitations refer to the real sample concentrations before the SPE step; in other words, the figures of merit take into account the enrichment factor of the samples. Thus, a different magnitude can be seen when comparing these values with those presented in Table 3 (validation set).
As can be seen, the proposed method was able to predict the concentration of both analytes in spiked river water samples even in the presence of interferents and a strong background. This was achieved with acceptable accuracy, as verified by the recovery values, which ranged from 99 to 125% for DAI and from 77 to 111% for GEN, with REP values of 9.38 and 8.51%.
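The recovery calculation can be illustrated with the short Python sketch below. It assumes the usual definition of recovery (found concentration, corrected for any native content, divided by the added concentration); the function name and the found values are hypothetical and were chosen only to span the reported range.

```python
def recovery_percent(found, added, native=0.0):
    """Recovery in percent: (found - native) / added * 100."""
    return 100.0 * (found - native) / added


SPIKE_LEVEL = 1.0  # ng/mL, the spike level described in the recovery section

# Hypothetical found concentrations (ng/mL) referred to the original sample
found_values = [0.99, 1.25, 0.77, 1.11]
recoveries = [round(recovery_percent(f, SPIKE_LEVEL), 1) for f in found_values]
```

Because the non-spiked samples were below the limit of detection, the native content is taken as zero in this sketch.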
Comparing the method reported in this work with the one published by Wang et al., 23 which reports the quantitation of DAI and GEN in river water samples using HPLC-DAD, we noted a reduction of approximately 50% in the run time. The REP values obtained with both methods are quite similar. However, the new method shows higher sensitivity, with LODs of 0.17 ng mL⁻¹ for DAI and 0.20 ng mL⁻¹ for GEN. Moreover, the proposed method makes use of the second-order advantage, making it more robust when unexpected constituents occur in the sample, thus avoiding revalidations.

Parameter            DAI    GEN
LOD / (ng mL⁻¹)      0.17   0.20
LOQ / (ng mL⁻¹)      0.50   0.60
DAI: daidzein; GEN: genistein; RMSEV: root mean square error of validation; REP: relative error of prediction; SEL: selectivity; γ: analytical sensitivity; LOD: limit of detection; LOQ: limit of quantification.

Figure 2. Chromatograms corresponding to the pure calibration standards of daidzein (solid line) and genistein (dashed line) at 280 nm. The black vertical line is the average retention time for each calibration set.

Figure 3. (a) Landscape obtained by LC-DAD for validation sample number 1; (b) chromatograms recorded at 280 nm for pure standard solutions of DAI (solid line), GEN (dashed line) and EQL (line with circles), all at 5000 ng mL⁻¹; and (c) the corresponding spectra of these compounds.

Figure 5. (a) Typical LC-DAD landscape after SPE for a river water sample and (b) landscape obtained for the typical (No. 1) river water sample spiked with 1.0 ng mL⁻¹ of both analytes.

Table 1. Validation set built according to a Taguchi design

Table 2. Explained variance in PCA for calibration and validation sets
PCA: principal component analysis; DAI: daidzein; GEN: genistein.

Table 4. Explained variance in PCA for test set
PCA: principal component analysis.