Simultaneous Determination of Chlorpyrifos and Carbaryl by Spectrophotometry and Boosting Partial Least Squares

Pesticide residues have become a serious threat to human health, and rapid detection methods are needed for the various types of pesticides. This study investigated the feasibility of combining spectrophotometry with boosting partial least squares (boosting-PLS) for the simultaneous determination of chlorpyrifos and carbaryl, two of the most important pesticides in agriculture. The effect of pH on the absorbance spectrum of each component was studied, and pH 5 was found to be optimal. Thirty-six binary mixtures were used to build the boosting-PLS models, and twenty binary mixtures were used to validate the resulting models. The boosting-PLS results were compared with those obtained by full-spectrum PLS and by a PLS algorithm coupled with tabu search (TS) as a wavelength selection tool. Although both the boosting and the wavelength selection approaches improved the predictive ability of the PLS algorithm, boosting-PLS proved superior to the other models. Boosting-PLS is not only feasible but can also avoid the need for wavelength selection.


Introduction
A wide variety of pollutants from industrial, agricultural and other human activities can contaminate aquatic environments. Among environmental pollutants, pesticides are of particular concern because of their toxicity and the prevalence of their use. For example, chlorpyrifos [O,O-diethyl O-(3,5,6-trichloro-2-pyridinyl) phosphorothioate] became one of the best-selling insecticides in the world, with both agricultural and urban uses. 1 This insecticide could be purchased by homeowners for indoor use, but health-related concerns led the United States Environmental Protection Agency (US EPA) to cancel its home indoor and lawn uses in 2001. 2 Chlorpyrifos is slowly hydrolyzed in alkaline medium, but the reaction rate increases with temperature, the presence of metals and elevated pH. In soil, it is initially degraded to 3,5,6-trichloropyridin-2-ol (TCP), which is subsequently degraded to organochlorine compounds. 3 Another widely used pesticide is carbaryl. Carbaryl (1-naphthyl methylcarbamate), a broad-spectrum insecticide, is one of the most widely used carbamate insecticides. It is used extensively in agriculture, including home gardens, where it is generally applied as a dust. Carbaryl is not considered a persistent compound because it is readily hydrolyzed in alkaline medium. 4 It is currently registered with the US EPA for controlling beetles and 90 other insects that cause problems in trees and ornamentals. 5,6 Although these pesticides have relatively low persistence in the environment, they have high acute toxicity to humans and ecosystems. 7,8 Therefore, the determination of these pesticides is of great interest, and different researchers have developed methods to analyze them.
[…][13][14][15] Spectrophotometry is a relatively simple method for the simultaneous determination of various components. It requires less expensive instrumentation and provides high sensitivity. One of the main drawbacks of spectrophotometric methods is their poor selectivity, due to the high degree of spectral overlap. Nowadays, quantitative spectrophotometry has been greatly improved by the use of a variety of multivariate statistical methods, particularly partial least squares regression (PLS). 16,17 Multivariate calibration techniques, e.g., PLS, have been devised for the analysis of mixtures with overlapping spectra. 18 […][20][21][22][23] A difficulty when applying PLS in multivariate calibration is that overfitting may occur. 24 Moreover, it has been shown that variable selection in PLS is necessary for obtaining a parsimonious and robust model. 25 A variable selection procedure allows the informative part of the spectrum, related to the variation of the analyte concentration, to be modeled, while other parts of the spectrum, related to variations in the concentrations of other compounds and/or to background variations, are discarded. Consequently, wavelength selection remains of practical interest because it can increase predictive ability. Different strategies for wavelength selection in multivariate calibration models have been proposed in the literature, including ant colony optimization, [26][27][28] tabu search (TS) 29 and genetic algorithms (GA). 30
Unlike traditional calibration techniques based on a single model, which sometimes give unsatisfactory accuracy and robustness, ensemble methods are based on building a series of models. For example, in boosting, samples in the training set are picked out with probabilities obtained from the previous model: if the previous model predicts a specific sample poorly, the probability of that sample increases so that it is trained more intensively. The final prediction is made by a weighted median of the collected models. The main advantage of such ensemble techniques is that they increase the accuracy and robustness of the calibration. […][37][38][39][40] The aim of the present work was to develop simple, rapid, economical, accurate, precise and reproducible spectrophotometric methods for the determination of carbaryl and chlorpyrifos. Given the advantages of boosting, the combination of boosting-PLS and spectrophotometry for the simultaneous determination of these pesticides was investigated. To our knowledge, this work is the first report on the application of boosting-PLS to pesticide determination. Two other approaches, full-spectrum PLS and tabu search (TS)-PLS, were used for comparison.
Tabu search is a meta-heuristic optimization method that was used by Hageman et al. 29 to solve the wavelength selection problem. The TS process starts from a randomly selected solution (i.e., a subset of wavelengths). For each solution, a PLS model is built on the selected wavelengths, and the corresponding root mean squared error (RMSE) is calculated by cross-validation as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(\hat{x}_i - x_i\right)^{2}} \qquad (1)$$
where $\hat{x}_i$ is the predicted concentration of the component of interest in the $i$th mixture, $x_i$ is the real concentration, and $m$ is the number of samples in the training set.
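As a minimal illustration, equation (1) can be computed as follows. This is a Python sketch (the authors worked in MATLAB), and the function name `rmse` is our own.

```python
import math


def rmse(predicted, real):
    """Root mean squared error of equation (1): sqrt(sum((x_hat - x)^2) / m)."""
    if not real or len(predicted) != len(real):
        raise ValueError("need two non-empty sequences of equal length")
    m = len(real)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, real)) / m)
```

For example, `rmse([10.2, 19.5, 30.4], [10.0, 20.0, 30.0])` is the square root of (0.04 + 0.25 + 0.16) / 3, about 0.387.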
This RMSE value is used as the fitness value. After evaluation of the initial solution, all possible neighbors of the solution are investigated. Neighboring solutions differ only slightly from the current solution and can be reached from it by a simple, basic transformation. 29,41 The new solution is the one that yields the best result among all solutions considered. To avoid toggling between two solutions in a local optimum, TS uses a tabu list containing solutions or steps that are forbidden (tabu). In this work, four tabu lists were employed: the select, deselect, move-to and move-from tabu lists. The lengths of all tabu lists were optimized by trial and error.
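The neighborhood search described above can be sketched as follows. This is a simplified, hypothetical Python version, not the implementation of Hageman et al. or of this study: it uses only the select and deselect moves with their two tabu lists, omits the move-to/move-from operators and any aspiration criterion, and takes the fitness (in this work, the cross-validated RMSE of a PLS model built on the selected wavelengths) as a caller-supplied function.

```python
import random
from collections import deque


def tabu_search(n_wavelengths, fitness, n_iter=50, tenure=3, seed=0):
    """Minimize fitness(mask) over boolean wavelength masks (True = selected)."""
    rng = random.Random(seed)
    # Random initial subset of wavelengths; make sure it is not empty.
    current = [rng.random() < 0.5 for _ in range(n_wavelengths)]
    if not any(current):
        current[0] = True
    best, best_fit = current[:], fitness(current)
    # Recently deselected wavelengths may not be re-selected, and recently
    # selected ones may not be deselected, for `tenure` iterations.
    select_tabu = deque(maxlen=tenure)
    deselect_tabu = deque(maxlen=tenure)
    for _ in range(n_iter):
        candidates = []
        for k in range(n_wavelengths):
            tabu = deselect_tabu if current[k] else select_tabu
            if k in tabu:
                continue
            neighbor = current[:]
            neighbor[k] = not neighbor[k]  # select or deselect wavelength k
            if any(neighbor):              # PLS needs at least one wavelength
                candidates.append((fitness(neighbor), k, neighbor))
        if not candidates:
            break
        # Move to the best non-tabu neighbor, even if it is worse than current.
        fit, k, neighbor = min(candidates, key=lambda c: (c[0], c[1]))
        # Record the move so that it cannot be undone immediately.
        (select_tabu if current[k] else deselect_tabu).append(k)
        current = neighbor
        if fit < best_fit:
            best, best_fit = neighbor[:], fit
    return best, best_fit
```

With a toy fitness such as the number of mismatches to a known target mask, the search recovers that mask; in the real application each fitness call would build and cross-validate a PLS model, so the tabu list lengths (`tenure`) matter for both cost and diversification.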
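The original concept of boosting was proposed by Schapire 42 and has since been intensively developed. 43 There are two ways to establish a boosting regression model. The first is forward stage-wise additive modeling, which modifies the target values to effectively fit residuals. 35 The second changes the sample weights to emphasize samples that were poorly regressed at previous stages of the fitting process. 38 In this work, the second approach is used. In either case, boosting ends with an ensemble of T models, which are used to predict the analyte concentration in a test sample. For a training set containing m samples, the boosting procedure used in this study consists of the following steps:
Step 1: initially, assign equal weights to all samples in the training set:
$$w_i^{1} = \frac{1}{m}, \qquad i = 1, 2, \ldots, m \qquad (2)$$
Step 2: for iterations t = 1, 2, …, T:
(i) Calculate the probability of each sample:
$$p_i^{t} = \frac{w_i^{t}}{\sum_{i=1}^{m} w_i^{t}} \qquad (3)$$
(ii) According to the probability distribution p, select m' samples (m' < m) from the training set to generate a so-called boosting set.
(iii) Develop a PLS model on the basis of the boosting set. Determine the number of latent variables by cross-validation on the current boosting set.
(iv) Use the developed PLS model to predict the concentration of the analyte of interest in all m' samples.
(v) Calculate a square loss for each sample in the boosting set as:
$$L_i^{t} = \left(\frac{\left|\hat{x}_i - x_i\right|}{D^{t}}\right)^{2}, \qquad D^{t} = \max_{i}\left|\hat{x}_i - x_i\right| \qquad (4)$$
Here, the denominator $D^{t}$ is the maximum residual between all predicted and real concentrations.
(vi) Compute an average loss and from it a confidence index (CI), $\beta^{t}$, for the PLS model:
$$\bar{L}^{t} = \sum_{i} p_i^{t} L_i^{t}, \qquad \beta^{t} = \frac{\bar{L}^{t}}{1 - \bar{L}^{t}} \qquad (5)$$
This CI ranges from 0 to 1; a low CI means high confidence in the prediction.
(vii) Update the weights of the samples by the following equation:
$$w_i^{t+1} = w_i^{t}\left(\beta^{t}\right)^{1 - L_i^{t}} \qquad (6)$$
Note that this weight-updating scheme implies that the smaller the loss, the more the weight is reduced. In other words, the weights of poorly predicted samples are increased relative to the others, so these samples are more likely to be picked as members of the next boosting set.
(viii) Renormalize w so that $\sum_{i=1}^{m} w_i^{t+1} = 1$.
(ix) Set t = t + 1. If t ≤ T, repeat steps (i)-(viii); otherwise stop. After T iterations, there are T PLS models.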
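Step 3: the performance of boosting-PLS is evaluated with a test set. For a sample j of the test set, the final prediction combines the predictions of the T PLS models as shown in equation 7:
$$\hat{x}_j = \sum_{t=1}^{T} s_j^{t}\,\hat{x}_j^{t} \qquad (7)$$
where $\hat{x}_j^{t}$ is the prediction of the $t$th PLS model and $s_j^{t}$ is the normalized inverse of the Mahalanobis distance between test sample j and the $t$th boosting set, so that $\sum_{t=1}^{T} s_j^{t} = 1$.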
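The weighting machinery of Steps 1-3 can be sketched in Python as below. The helper names are hypothetical and the PLS fitting itself is left out: the caller supplies residuals, predictions and distances. Two details are assumptions not stated in the text. First, the boosting set is drawn with replacement. Second, whereas the text describes the loss for samples in the boosting set, this sketch computes a loss for every training sample so that every weight can be updated, as in Drucker's AdaBoost.R2. The distances for equation (7) are taken as given, since the text does not say in which space the Mahalanobis distance is measured.

```python
import random


def sample_boosting_set(weights, m_prime, rng):
    """Steps (i)-(ii): turn weights into probabilities (eq. 3), draw m' indices.

    Assumption: samples are drawn with replacement.
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(range(len(weights)), weights=probs, k=m_prime)


def update_weights(weights, residuals):
    """Steps (v)-(viii): square loss, average loss, confidence index, new weights.

    residuals[i] = predicted - real for training sample i (assumption: every
    training sample receives a loss). Returns (new_weights, beta).
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    d = max(abs(r) for r in residuals)            # maximum residual, eq. (4)
    if d == 0:
        return probs, 0.0                          # perfect fit: weights unchanged
    losses = [(r / d) ** 2 for r in residuals]     # square loss in [0, 1], eq. (4)
    avg_loss = sum(p * l for p, l in zip(probs, losses))            # eq. (5)
    if avg_loss >= 0.5:
        raise ValueError("average loss >= 0.5, so the CI would not be below 1")
    beta = avg_loss / (1.0 - avg_loss)             # confidence index, eq. (5)
    new = [w * beta ** (1.0 - l) for w, l in zip(weights, losses)]  # eq. (6)
    s = sum(new)
    return [w / s for w in new], beta              # step (viii): renormalize


def combine_predictions(predictions, distances):
    """Eq. (7): weight each model's prediction by the normalized inverse
    Mahalanobis distance between the test sample and its boosting set."""
    if not predictions or len(predictions) != len(distances):
        raise ValueError("need one positive distance per model")
    inv = [1.0 / d for d in distances]
    total = sum(inv)
    return sum(v / total * p for v, p in zip(inv, predictions))
```

In a full loop, one would call `sample_boosting_set`, fit a PLS model on the drawn samples, compute residuals, and call `update_weights`, repeating T times and keeping each PLS model for the final combination. For instance, with equal starting weights and residuals `[0.0, 1.0, -1.0, 2.0]`, the average loss is 0.375, β is 0.6, and the worst-predicted sample ends up with the largest weight.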

Reagents and solutions
All reagents were of analytical reagent grade. Bidistilled water was used throughout. Stock solutions of carbaryl and chlorpyrifos were prepared separately by dissolving 0.05 g of either pesticide in DMF (dimethylformamide) and diluting to 50 mL with DMF. Working solutions were prepared immediately before use by further dilution of the stock solutions with water.
Citrate buffer (0.15 mol L⁻¹) was prepared, and its pH was adjusted to 5 with concentrated HCl or NaOH.

Apparatus, software and data processing
Electronic absorption spectra were measured with a single-beam UltraSpec 4000 spectrophotometer (Pharmacia Biotech, England) using 10 mm path length quartz cells. The recorded spectra, from 250 to 375 nm, were digitized at 1.0 nm intervals. All spectral measurements were performed using an appropriate blank solution as the reference. A Metrohm model 780 pH meter with a combined glass electrode was used for pH measurements. Data treatment was done in the MATLAB 7.0 environment (The MathWorks, Cochituate Place, MA). The TS-PLS and boosting-PLS algorithms were written in MATLAB.

Procedure
To find the linear concentration range of each pesticide, one-component calibration was performed. Different volumes of a stock solution of the pesticide were injected with a micro-syringe into a cell containing 2.0 mL of citrate buffer solution (pH 5.0), and the absorbance spectra were recorded over the 250-375 nm range against a solvent blank. The linear dynamic range of each compound was determined by regressing the absorbance at the corresponding λmax against concentration.
Two sets of standard solutions are necessary for multivariate calibration: one, called the training set, is used to develop the model, and the other, the test set, is used to evaluate it. In the present study, the training set contained 36 standard solutions and the test set contained 20. Although the concentrations of the analytes in both data sets were within the linear range of each pesticide, the compositions of the most concentrated solutions were chosen so that the absorbance did not exceed the highest absorbance reading of the instrument. The composition of the training set was chosen by a six-level full factorial design, whereas the composition of the test set was chosen randomly. To prepare each solution, aliquots of carbaryl and chlorpyrifos solutions containing appropriate amounts of these pesticides were added to a series of 10.0 mL volumetric flasks, followed by 2 mL of citrate buffer solution (pH 5), and the solutions were diluted to the mark with double-distilled water. UV spectra of the mixtures were recorded in the wavelength range 250-375 nm against a solvent blank, and the absorbance was digitized at 1.0 nm intervals.

Spectral characterization of the analytes
The chemical structures of carbaryl and chlorpyrifos are shown in Figure 1, and the absorbance spectrum of each chemical is shown in Figure 2. As can be seen, carbaryl and chlorpyrifos have highly overlapping spectra, so their mixtures cannot be analyzed by conventional univariate methods. Multivariate calibration methods such as PLS offer a solution to this problem.

Optimization of the experimental conditions
The experimental conditions for the quantitative determination of both pesticides were optimized in a number of preliminary experiments. The influence of pH on the spectrum of each compound at a constant concentration was studied separately in the pH range 1-6, because carbaryl hydrolyzes in neutral and basic solutions. 4 Hydrochloric acid and sodium hydroxide were used for pH adjustment. The absorbance spectra were found to be almost independent of pH within the range 1.0-6.0. However, to prevent hydrolysis of both carbaryl and chlorpyrifos, pH 5 was selected as the optimum value, and a 0.15 mol L⁻¹ citrate buffer was used to adjust the pH.

One-component calibration
To find the linear range of each component, calibration graphs were obtained. The absorption spectra were recorded over 250-375 nm against a solvent blank. For each pesticide, the calibration curve was constructed from several points by plotting the absorbance at its λmax against concentration, and it was evaluated by linear regression analysis. The calibration curves were linear between 1.60 and 45.0 µg mL⁻¹ for carbaryl and between 1.50 and 50.0 µg mL⁻¹ for chlorpyrifos. The characteristic parameters of the regression equations for the individual calibrations from the UV absorption spectra are given in Table 1.
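The regression step of the one-component calibration is ordinary least squares. The sketch below (function name ours) returns the slope, intercept and squared correlation coefficient; the data in the usage check are hypothetical, not the values of Table 1.

```python
def linear_calibration(conc, absorbance):
    """Least-squares fit of absorbance = slope * conc + intercept.

    Returns (slope, intercept, r2), where r2 is the squared correlation coefficient.
    """
    n = len(conc)
    if n < 2 or n != len(absorbance):
        raise ValueError("need at least two paired points")
    mx, my = sum(conc) / n, sum(absorbance) / n
    sxx = sum((x - mx) ** 2 for x in conc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc, absorbance))
    syy = sum((y - my) ** 2 for y in absorbance)
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2
```

Points that fall exactly on a line, e.g. absorbances of 0.002 + 0.02c for c = 2, 10, 20, 30 and 40 µg mL⁻¹, return a slope of 0.02, an intercept of 0.002 and r2 = 1.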

Multivariate calibration
The first step in the simultaneous determination of binary mixtures by multivariate calibration involves the construction of a training set. In this study, a training set of thirty-six binary mixtures was prepared according to a six-level full factorial design. Calibration models were constructed with the following chemometric methods: PLS, TS-PLS and boosting-PLS. The resulting models were validated against a randomly selected test set containing twenty binary mixtures (Figure 3).
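A six-level full factorial design for two analytes pairs each of the six levels of one analyte with each of the six levels of the other, which gives the 36 training mixtures. The level values below are purely illustrative (chosen inside the linear ranges reported above); the actual compositions are those plotted in Figure 3.

```python
from itertools import product

# Hypothetical concentration levels (µg/mL), not the levels used in this study.
CARBARYL_LEVELS = [2.0, 5.0, 8.0, 11.0, 14.0, 17.0]
CHLORPYRIFOS_LEVELS = [2.0, 6.0, 10.0, 14.0, 18.0, 22.0]


def full_factorial(levels_a, levels_b):
    """Every (a, b) pairing of the two level lists: a two-factor full factorial design."""
    return list(product(levels_a, levels_b))


training_design = full_factorial(CARBARYL_LEVELS, CHLORPYRIFOS_LEVELS)
```

Because every level of one analyte appears with every level of the other, the two concentrations are uncorrelated across the training set, which helps the calibration separate their overlapping contributions.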
To build the full-spectrum PLS model, leave-one-out cross-validation (LOO-CV) was used to optimize the number of latent variables (LVs). The LOO procedure is iterative: at each iteration, one sample is left out of the training set, a PLS model with a predefined number of latent variables (factors) is built from the remaining samples, and the concentration of the held-out sample is predicted by this model. The procedure was run for each sample and for each number of factors. The error was expressed as the prediction residual error sum of squares of cross-validation (PRESS-CV):
$$\mathrm{PRESS\text{-}CV} = \sum_{i=1}^{m}\left(\hat{x}_i - x_i\right)^{2} \qquad (8)$$
where $\hat{x}_i$ is the concentration of the component of interest in the $i$th mixture predicted by the model built without that mixture, $x_i$ is the real concentration, and $m$ is the number of mixtures in the calibration set.
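To make the LOO-CV procedure and equation (8) concrete, here is a compact pure-Python sketch built on the standard NIPALS PLS1 algorithm (one analyte at a time). It is an illustration under our own assumptions, not the authors' MATLAB code, and the function names are hypothetical.

```python
import math


def pls1_fit(X, y, n_lv):
    """NIPALS PLS1 on mean-centred data.

    X: list of spectra (one list of absorbances per mixture); y: concentrations.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]
    comps = []
    for _ in range(n_lv):
        # Weight vector proportional to E^T f, normalized to unit length.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        loads = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next latent variable.
        E = [[E[i][j] - t[i] * loads[j] for j in range(p)] for i in range(n)]
        f = [f[i] - t[i] * q for i in range(n)]
        comps.append((w, loads, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict one concentration by applying the same centring and deflation."""
    x_mean, y_mean, comps = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, loads, q in comps:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += t * q
        e = [ej - t * lj for ej, lj in zip(e, loads)]
    return y_hat


def press_loo(X, y, n_lv):
    """Equation (8): sum of squared leave-one-out prediction residuals."""
    press = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_lv)
        press += (pls1_predict(model, X[i]) - y[i]) ** 2
    return press
```

In use, one computes `press_loo(X, y, a)` for a = 1, 2, 3, … and chooses the number of latent variables beyond which PRESS-CV no longer decreases appreciably. For invented, noise-free two-component mixtures built from two overlapping pure spectra, one latent variable leaves a large PRESS while two give a PRESS near zero, mirroring the two-factor optimum reported below.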
In boosting-PLS, two parameters had to be optimized: the ensemble size (the number of boosting cycles) and the boosting set size (the number of samples in each boosting set). Again, the number of LVs for each boosting set was determined by LOO-CV. In TS-PLS, four parameters need to be optimized: the lengths of the select, deselect, move-from and move-to tabu lists. These parameters were optimized by trial and error. For each solution generated in TS, the number of LVs was determined by leave-six-out cross-validation (L6O-CV).
Plots of PRESS-CV versus the number of latent variables for both pesticides are shown in Figure 4. As can be seen, the optimal number of factors was 2 for both pesticides. The results of full-spectrum PLS are summarized in Tables 2 and 3. In Table 3, the RMSE and R² values for both the training and test sets are used as indexes of model performance.
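Table 3 reports RMSE (equation (1)) and R². The text does not define R²; one common definition, assumed in this sketch, is 1 − SS_res/SS_tot computed from predicted and real concentrations.

```python
def r_squared(predicted, real):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(predicted) != len(real) or len(real) < 2:
        raise ValueError("need at least two paired values")
    mean = sum(real) / len(real)
    ss_res = sum((p - r) ** 2 for p, r in zip(predicted, real))
    ss_tot = sum((r - mean) ** 2 for r in real)
    return 1.0 - ss_res / ss_tot
```

Some papers instead report the squared correlation between predicted and real values; the two agree for an unbiased least-squares fit but can differ for test-set predictions.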
In another trial, tabu search was used to select the most informative wavelengths for building the PLS model. Figure 5 shows the wavelengths selected in three different runs of TS in the analysis of chlorpyrifos. Note that several wavelengths were selected from the last part of the spectrum, where the absorbance signals are low. To evaluate the usefulness of this part of the spectrum, PLS was applied to the last thirty-two wavelengths alone, giving R² values of 0.985 and 0.972 for the training and test sets, respectively. This result shows that this part of the spectrum is informative and also indicates the ability of TS to choose suitable wavelengths.
The results of the analysis of chlorpyrifos and carbaryl by TS-PLS are summarized in Tables 2 and 3. In Table 3, it is clear that TS-PLS resulted in more accurate models in […]
Figure 6 shows the influence of the ensemble size and the number of samples in each boosting set on the predictive power of boosting-PLS in the analysis of chlorpyrifos. As indicated in Figure 6, when the ensemble size is smaller than 18, the R² value drops, especially when the number of samples in each boosting set is lower than 10. Regardless of the number of samples, increasing the ensemble size reduces the fluctuation of the R² values. The best results were obtained with an ensemble size of 35 and 25 samples per boosting set.
The concentrations of the test-set mixtures predicted by boosting-PLS are given in Table 2, and the statistical parameters of the resulting boosting-PLS models are collected in Table 3. Table 3 shows that the predictive performance of boosting-PLS was superior to that of full-spectrum PLS and even TS-PLS. In the determination of chlorpyrifos, boosting-PLS improved model performance by 47.94% and 10.35% compared with full-spectrum PLS and TS-PLS, respectively.

Application
To test the practical applicability of the proposed method, several tap water samples spiked with different amounts of the two pesticides were analyzed by the PLS and boosting-PLS approaches. The spectrum of each sample was recorded in the range 250-375 nm after pH adjustment, and five replicate measurements were made. The results are shown in Table 4. Recovery (%) was calculated as $(\hat{x}/x) \times 100$, where $x$ and $\hat{x}$ are the real and estimated concentrations, respectively.
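A small sketch of the recovery calculation, assuming recovery (%) = estimated / real × 100 and averaging over replicate determinations of the same spiked sample. The function names and the replicate values in the example are hypothetical.

```python
def recovery_percent(estimated, real):
    """Recovery (%) of one determination: estimated / real * 100."""
    if real <= 0:
        raise ValueError("spiked (real) concentration must be positive")
    return estimated / real * 100.0


def mean_recovery(replicates, real):
    """Average recovery over replicate determinations of one spiked sample."""
    return sum(recovery_percent(e, real) for e in replicates) / len(replicates)
```

For a hypothetical sample spiked at 10.0 µg mL⁻¹ with five estimates of 9.8, 10.2, 10.0, 9.9 and 10.1 µg mL⁻¹, the individual recoveries are 98-102% and their mean is 100%.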
The good agreement between the obtained results and the actual values indicates that the boosting-PLS approach can be successfully applied to the simultaneous determination of carbaryl and chlorpyrifos.

Conclusion
Spectrophotometric methods in conjunction with chemometric tools such as multivariate calibration, e.g., PLS, provide simple analysis methods with high efficiency and low time consumption. These techniques are widely used for the quantitative analysis of multicomponent mixtures with overlapping spectra. In this work, three chemometric tools, PLS, tabu search-PLS and boosting-PLS, were used for the simultaneous determination of chlorpyrifos and carbaryl. The results revealed that boosting-PLS is the best approach in terms of several performance measures. It appears that, in a sense, boosting-PLS can avoid the need for wavelength selection before modeling, which makes calibration more convenient. However, it is worth noting that these conclusions are valid only for this data set, and further investigations are needed before any general conclusion can be drawn.


Figure 1. Chemical structures of carbaryl and chlorpyrifos.

Figure 3. Composition of training and test sets.

Figure 4. Plots of log(PRESS) versus number of latent variables for both pesticides.

Figure 5. Absorbance spectra of samples in the training set and the wavelengths selected by TS in the analysis of chlorpyrifos in three different runs (a-c).
Figure 6.

Table 1. Parameters of the linear regression equations for each pesticide

Table 2. Composition of the binary mixture samples in the test set and their concentrations predicted by the various PLS models
a Ch and Ca refer to chlorpyrifos and carbaryl, respectively.

Table 3. Statistical parameters of the different PLS models

Table 4. Results for the determination of different concentrations of carbaryl and chlorpyrifos spiked in tap water a