A Sensitive SERS Sensor Combined with Intelligent Variable Selection Models for Detecting Chlorpyrifos Residue in Tea

Chlorpyrifos is one of the most widely used broad-spectrum insecticides in agriculture. Given its potential toxicity and residue in food (e.g., tea), establishing a rapid and reliable method for the determination of chlorpyrifos residue is crucial. In this study, a strategy combining surface-enhanced Raman spectroscopy (SERS) and intelligent variable selection models for detecting chlorpyrifos residue in tea was established. First, gold nanostars were fabricated as a SERS sensor for measuring the SERS spectra. Second, the raw SERS spectra were preprocessed to facilitate the quantitative analysis. Third, a partial least squares model and four outstanding intelligent variable selection models, Monte Carlo-based uninformative variable elimination, competitive adaptive reweighted sampling, iteratively retaining informative variables, and variable iterative space shrinkage approach, were developed for detecting chlorpyrifos residue in a comparative study. The repeatability and reproducibility tests demonstrated the excellent stability of the proposed strategy. Furthermore, the sensitivity of the proposed strategy was assessed by estimating limit of detection values of the various models. Finally, two-tailed paired t-tests confirmed that the accuracy of the proposed strategy was equivalent to that of gas chromatography–mass spectrometry. Hence, the proposed method provides a promising strategy for detecting chlorpyrifos residue in tea.


Introduction
Because of its possible health benefits, tea is the most famous and extensively consumed traditional nonalcoholic herbal beverage globally.As natural antioxidants, tea polyphenols play a significant role in preventing and treating cardiovascular disease, cancer, inflammation, and immune disorders [1].Nevertheless, farmers frequently use different pesticides in tea gardens to protect tea plants from pests, rodents, insects, and weeds, but without following control limits [2].However, the large-scale use of pesticides can accumulate in tea leaves and easily enter drinkable tea brews, which could raise public health concerns.
Organophosphates (OPs) constitute a group of pesticides specifically designed to control pests, weeds, and plant diseases.The application of OPs is still considered the most effective and accepted method to protect plants, which is crucial for improving agricultural productivity and crop yields [3].For example, dichlorvos, diazinon, dimethoate, chlorpyrifos, fenitrothion, monocrotophos, malathion, parathion, and quinalphos are commonly used OPs [4].Among the OPs, chlorpyrifos [O,O-diethyl O-(3,5,6-trichloro-2-pyridinyl)phosphorothioate] has become one of the most widely available broad-spectrum insecticides since its synthesis by a German chemical scientist in the 1930s for destroying flea beetles, flies, fire ants, maize rootworms, mosquitoes, roundworms, termites, and weeds [5].Despite Foods 2024, 13, 2363 3 of 20 variables (IRIV), and variable iterative space shrinkage approach (VISSA), were developed.Then, these intelligent variable selection models, along with a PLS model, were used to predict chlorpyrifos residue in tea samples based on the preprocessed SERS spectra.A comparative study of the prediction results obtained by different models was carried out.The proposed strategy's stability and sensitivity were confirmed by the repeatability and reproducibility tests and the calculated limit of detection (LOD) values.To verify the proposed strategy's accuracy, the paired t-tests were conducted on the detection results yielded by the proposed strategy and those produced by the GC-MS method.In this study, the schematic diagram of the experimental design is shown in Figure 1.
SERS sensor.Second, the SERS spectra of chlorpyrifos-spiked tea extracts were measured by a portable Raman spectrometer.Subsequently, the collected SERS spectra were preprocessed by adopting a spectral data preprocessing strategy.Four intelligent variable selection models, including Monte Carlo-based uninformative variable elimination (MC-UVE), competitive adaptive reweighted sampling (CARS), iteratively retaining informative variables (IRIV), and variable iterative space shrinkage approach (VISSA), were developed.Then, these intelligent variable selection models, along with a PLS model, were used to predict chlorpyrifos residue in tea samples based on the preprocessed SERS spectra.A comparative study of the prediction results obtained by different models was carried out.The proposed strategy's stability and sensitivity were confirmed by the repeatability and reproducibility tests and the calculated limit of detection (LOD) values.To verify the proposed strategy's accuracy, the paired t-tests were conducted on the detection results yielded by the proposed strategy and those produced by the GC-MS method.In this study, the schematic diagram of the experimental design is shown in Figure 1.

Instrumentation
A UV-1800 spectrophotometer (Shimadzu Co., Ltd., Kyoto, Japan) was used to measure ultraviolet-visible (UV-Vis) extinction spectra.Transmission electron microscopy (TEM) using a JEM-2100HR transmission electron microscope (JEOL Co., Ltd., Tokyo, Japan) was used to evaluate the morphology of the SERS sensor.A D8 ADVANCE X-ray diffractometer (Bruker AXS Co., Ltd., Karlsruhe, Germany) was used to examine the SERS sensor's Xray diffraction (XRD) patterns.Energy dispersive X-ray spectroscopy (EDS) was used to analyze the elementary composition of the SERS sensor using field emission scanning electron microscopy (Merlin, Carl Zeiss AG, Oberkochen, Germany).The zeta potential of the SERS sensor was measured by a Zetasizer Nano ZEN3600 (Malvern Instruments Co., Ltd., Worcestershire, UK).A GCMS-QP2010 GC-MS (Shimadzu Co., Ltd., Kyoto, Japan) was used to analyze the content of chlorpyrifos in tea samples.A PS-20A ultrasonator (Shenzhen Dekang Technologies Co., Ltd., Shenzhen, China) was used for ultrasonication.A portable Raman spectrometer (RMS1000, Shanghai Oceanhood Opto-Electronics Tech Co., Shanghai, China), equipped with a laser excitation of 785 nm and a fiber optic Raman probe (RPB-785-1.5-FS),was utilized to measure SERS spectra.The key parameters of the portable Raman spectrometer are as follows: the adjustment range of laser power is 5-500 mW; the spectral shift range is 200-3200 cm −1 ; and the spectral resolution is 2 cm −1 .

Synthesis of Au NSs
First, Au nanoparticles (NPs) were prepared using a reported method [28].Then, 50 mL of 0.25 mM HAuCl 4 •4H 2 O was placed in a round bottom flask and heated to ~100 • C under stirring.Then, 0.75 mL of 1% C 6 H 5 Na 3 O 7 was added and refluxed until the solution color changed to wine-red, indicating the successful synthesis of Au NPs.
Second, following a reported approach [29], a seeded-growth process was applied to synthesize star-like shaped Au NSs based on the prepared Au NPs.First, 250 µL of Au NPs with a diameter of ~17.53 nm was added as a seed solution to 20 mL of 0.25 mM HAuCl 4 •4H 2 O in a 50 mL beaker under stirring at 25 • C.After 5 min, 10 mL of 1 M HCl was added under the same conditions.Then, 0.2 mL of 2 mM AgNO 3 and 0.1 mL of 0.1 M C 6 H 8 O 6 were added quickly.After that, the mixture was stirred for ~30 s, and the solution color changed to blue, indicating the successful synthesis of Au NSs.In this study, the synthesized Au NSs served as the SERS sensor.The process diagram of synthesizing Au NSs is shown in Figure S1.

Sample Preparation
Green tea samples were bought at a nearby grocery and stored under refrigerated conditions.For simulating actual samples, each sample of 1 g green tea was homogeneously mixed with 1 mL of chlorpyrifos standard solutions of various concentrations (0.001, 0.01, 0.1, 1, 10, and 100 µg/mL).After the mixtures were dried at ambient temperature, the chlorpyrifos-spiked tea samples with different concentrations (0.001, 0.01, 0.1, 1, 10, and 100 µg/g) were obtained.Afterwards, 10 mL of CH 2 Cl 2 was mixed with 0.25 g of chlorpyrifos-spiked tea samples in a 25 mL volumetric flask.Then, the mixture was ultrasonicated for 1 h (the power was set to 100 W; and the temperature was set to 30 • C) and filtered through Whatman No.1 filter paper.The process was carried out three times, and the extracts were collected and merged.Subsequently, the extract was concentrated to 1 mL in a rotary evaporator and dried by a nitrogen stream.Finally, the residue was diluted with 1 mL of water containing 5% C 2 H 6 O.After 5 min of centrifuging at 6000 rpm, the supernatant was collected for SERS measurements [30].
Prior to spiking and preparation for SERS measurements, the tea samples were analyzed by GC-MS (a detailed procedure is shown in Supplementary Materials) to confirm that there was no pre-existing chlorpyrifos in the tea samples.

Collection of SERS Spectra and Spectral Preprocessing
First, 100 µL of Au NSs was concentrated 10 times.Then, it was mixed with various concentrations of chlorpyrifos-spiked tea extracts and incubated for 5 min to adsorb chlorpyrifos molecules properly on the surface of Au NSs.Subsequently, 10 µL of the mixture was deposited on a quartz plate for SERS measurements.During SERS measurements, the laser power was adjusted to 100 mW, and the integration time was set to 2 s.Finally, 10 SERS spectra were measured for each concentration, and therefore 60 SERS spectra were acquired in total.
In addition to the chemical information of chlorpyrifos, the raw SERS spectra generally contained multiple data artifacts, for instance, a baseline, instrumental noise, and light scattering effects.To improve the accuracy of the developed models, a spectral data preprocessing approach with four steps was employed [31] to eliminate these data artifacts: (1) asymmetric least squares was adopted for baseline correction; (2) multiplicative scatter correction was used for scatter correction; (3) a Savitsky-Golay smoother was employed for smoothing; and (4) autoscaling was used.

Development of the Partial Least Squares (PLS) Model
PLS is a powerful chemometric tool which can be used for relating two data matrices, X and Y, by a linear multivariate model, but goes beyond traditional multiple linear regression (MLR) in that it also models the structure of X and Y. PLS derives its usefulness from its ability to analyze data with many, noisy, strongly collinear, and even incomplete variables in both X and Y [32].In view of the above advantages, PLS has been widely and successfully used for spectral analysis [33][34][35][36][37][38].This study used PLS to develop a multivariate calibration model for the quantitative analysis of chlorpyrifos content in tea samples.For a spectral matrix X (m × p) and a corresponding response matrix y (m × 1), an expression of the relationship between X and y in the original space is described as follows: where b denotes the regression coefficient matrix, and E denotes the residual matrix.
Projecting the matrix X into a latent space, which is a low dimensional subspace: where W is the projection direction matrix, and T is a low dimensional representation of X.Therefore, Equation (3) is another expression for Equation (1): where Q represents the regression coefficients between T and y.This study used MC-UVE to select optimal spectral wavelengths and develop a multivariate calibration model for the quantitative analysis of chlorpyrifos content in tea samples.For a spectral matrix X (m × p) and a corresponding response matrix y (m × 1), the main procedures of MC-UVE are described as follows.
Step 1: A MC sampling run was performed N times.A portion of samples (e.g., σ × m) were chosen for every sampling run to generate a new spectra matrix X i and a corresponding matrix y i for developing a PLS model, where i = 1, 2, . .., N.
Step 2: Based on the developed PLS models, a coefficient matrix B = [b 1 , b 2 , . .., bp] was obtained, where b j = [b 1j , b 2j , . .., b N j] T .The stability of each wavelength was calculated as follows: where mean(b j ) and std(b j ) denote the mean and standard deviation of the regression coefficients of wavelength j, respectively.
Step 3: Based on the generated stability, a few (e.g., M) of the informative (stable) wavelengths were chosen for the development of the final PLS model.All the wavelengths were ranked in descending order according to their stability, and the stability of the Mth wavelength was set as the cutoff value.If the stability of the wavelength was less than the cutoff value, it was eliminated; otherwise, it was retained.Finally, the optimal spectral wavelengths were obtained.
Referring to previous literature [39], the number of MC sampling runs N was set to 100, and the ratio σ was set to 0.8.A flowchart of MC-UVE is shown in Figure 2A.

Development of the CARS Model
Based on Darwin's evolution theory, Li et al. proposed a variable selection method termed CARS.CARS realizes high computational efficiency by introducing MC and exponentially decreasing function (EDF) strategies and can prevent combination explosion in variable selection.This study employed CARS to select optimal spectral wavelengths and develop a multivariate calibration model for quantitative analysis of chlorpyrifos content in tea samples.For a spectral matrix X (m × p) and a corresponding response matrix y (m × 1), the main procedures of CARS are described as follows.
Step 1: A MC sampling run was performed.In this sampling run, a portion of samples (e.g., σ × m) were chosen to generate a new spectral matrix X ′ and a corresponding matrix y ′ for developing a PLS model.
Step 2: For the PLS model built by X ′ and y ′ , a coefficient vector b = [b 1 , b 2 , . .., b p ] T was obtained.To evaluate each wavelength's importance, a normalized weight for each wavelength was defined as follows: Step 3: The wavelengths with comparatively small weights were eliminated by the EDF strategy.In the ith sampling run, the ratio of wavelengths retained was calculated as follows: where a and k are two constants depending on two conditions: (1) in the first sampling run, all wavelengths were used for developing a PLS model, so r 1 = 1; (2) in the Nth sampling run, there were only two wavelengths retained, so r N = 2/p.Then, a and k could be computed as follows: where ln represents the logarithm (base e).After EDF-based enforced wavelength selection, a new wavelength subset was generated via the adaptively reweighted sampling (ARS) method for developing a PLS model, and the corresponding five-fold root mean squared error of cross-validation (RMSECV) value was calculated.
Step 4: After performing N times MC sampling runs, N wavelength subsets and the corresponding N RMSECV values were obtained.The wavelength subset with the lowest RMSECV value was chosen as the ideal spectral wavelengths.Step 4: After performing N times MC sampling runs, N wavelength subsets and the corresponding N RMSECV values were obtained.The wavelength subset with the lowest RMSECV value was chosen as the ideal spectral wavelengths.
Referring to previous literature [40], the number of MC sampling runs N was set to 500, and the ratio σ was set to 0.8.A flowchart of CARS is shown in Figure 2B.Referring to previous literature [40], the number of MC sampling runs N was set to 500, and the ratio σ was set to 0.8.A flowchart of CARS is shown in Figure 2B.

Development of the IRIV Model
Based on model population analysis (MPA), Yun et al. proposed an effective variable selection method called IRIV.IRIV classifies variables into four categories: strongly informative, weakly informative, uninformative, and interfering variables.During iterations, both the strongly and weakly informative variables are kept, and the variables that are uninformative and interfering are removed.Ultimately, a backward elimination strategy is carried out on the retained variables to obtain the optimal variables.This study employed IRIV to select the optimal spectral wavelengths and develop a multivariate calibration model for the quantitative analysis of chlorpyrifos content in tea samples.For a spectral matrix X (m × p) and a corresponding response matrix y (m × 1), the main procedures of IRIV are described as follows.
Step 1: A binary matrix M (N × p) was generated for sampling, where N denotes the number of sampling runs.In M, half of the elements in each column were set to "1", and the other half were set to "0", indicating that each wavelength's sampling weight was the same.Herein, "1" represents that the corresponding wavelength was selected, while "0" means that the corresponding wavelength was not selected.A new binary matrix M ′ was obtained by permutating elements in each column, and its dimension was the same as that of M. The schematic diagram of this step is shown in Figure S2.
Step 2: A PLS model was developed for each row in M ′ , and the corresponding fivefold RMSECV was calculated.Hence, it was possible to obtain a RMSECV vector (N × 1), which was represented as RMSECV 0 .
Step 3: A novel strategy was introduced to evaluate the importance of each wavelength.For the ith (i = 1,2, . .., p) wavelength, a matrix M1 ′ was generated by changing all the "1" elements in the ith column of M ′ to "0" and all the "0" elements in the ith column to "1", while keeping other columns of M ′ unchanged.Based on M1 ′ , RMSECV i (N × 1) was obtained.Then, Φ 0 and Φ i were employed to evaluate each wavelength's importance using Equation ( 9): where M ′ ki represents the value in the kth row and ith column of M ′ , and M1 ′ ki represents the value in the kth row and ith column of M1 ′ .
Step 4: The mean value of Φ 0 and Φ i were calculated, which were denoted as MEAN i,include and MEAN i,exclude , respectively.It was possible to compute the difference between the two mean values as follows: By using Equation (10) and the Mann-Whitney U test (p = 0.05), the wavelengths were readily divided into the following four groups: Step 5: The strongly and weakly informative wavelengths were retained in place after eliminating the uninformative and interfering wavelengths.Then, the next round was performed by returning to Step 1 until no uninformative or interfering wavelengths remained.
Step 6: A backward elimination strategy was carried out on the retained wavelengths to obtain the optimal wavelengths.
Referring to previous literature [41], the number of binary matrix sampling runs N was set to 500.A flowchart of IRIV is shown in Figure 3A.

Development of the VISSA Model
Combining MPA and weighted binary matrix sampling (WBMS), Deng et al. proposed a wavelength selection approach termed VISSA.In contrast to most current variable selection techniques, VISSA assesses space performance statically at every stage of the optimization process.Furthermore, the performance of the new variable space is better than that of the previous one during optimization.This study employed VISSA to select optimal spectral wavelengths and develop a multivariate calibration model for the quantitative analysis of chlorpyrifos content in tea samples.For a spectra matrix X (m × p) and a corresponding response matrix y (m × 1), the main procedures of VISSA are illustrated as follows:

Development of the VISSA Model
Combining MPA and weighted binary matrix sampling (WBMS), Deng et al. proposed a wavelength selection approach termed VISSA.In contrast to most current variable selection techniques, VISSA assesses space performance statically at every stage of the optimization process.Furthermore, the performance of the new variable space is better than that of the previous one during optimization.This study employed VISSA to select optimal spectral wavelengths and develop a multivariate calibration model for the quantitative analysis of chlorpyrifos content in tea samples.For a spectra matrix X (m × p) and a corresponding response matrix y (m × 1), the main procedures of VISSA are illustrated as follows: Step 1: A binary matrix M (N × p) was generated for sampling, where N is the number of sampling runs.In M, half of the elements in each column were set to "1", and the other half were set to "0", and therefore each wavelength's sampling weight was initially set to 0.5.Herein, "1" represents that the corresponding wavelength was selected, while "0" means that the corresponding wavelength was not selected.By permutating elements in each column, a new binary matrix M′ was obtained, and its dimension was the same as Step 1: A binary matrix M (N × p) was generated for sampling, where N is the number of sampling runs.In M, half of the elements in each column were set to "1", and the other half were set to "0", and therefore each wavelength's sampling weight was initially set to 0.5.Herein, "1" represents that the corresponding wavelength was selected, while "0" means that the corresponding wavelength was not selected.By permutating elements in each column, a new binary matrix M ′ was obtained, and its dimension was the same as that of M. Based on each row in M ′ , N PLS models were built, and N RMSECV values were calculated.Figure S3 displays the schematic representation for this step.
Step 2: A portion of the developed PLS models (e.g., σ × N) that had the lowest RMSECV values were chosen.The mean RMSECV value for those chosen models was calculated and recorded as RMSECV i , where i represents the current optimization step.
The weights for all wavelengths could be calculated as follows: Foods 2024, 13, 2363 10 of 20 where f j represents the frequency of wavelength j in the chosen σ × N models, and w j denotes the weight of wavelength j (0 ≤ w j ≤ 1).
Step 3: WBMS was executed using the wavelength weights that were previously acquired to generate a new binary matrix.Then, N PLS models were built based on the new binary matrix.Referring to the method described in Step 2, the mean RMSECV value was calculated and recorded as RMSECV i+1 .The weights for all wavelengths were also recalculated by referring to Equation (12).
Step 4: RMSECV i and RMSECV i+1 were compared.If RMSECV i+1 < RMSECV i , Steps 3 and 4 were performed until the mean RMSECV value of the updated models exhibited no additional improvement.Finally, the optimal spectral wavelengths were obtained.
Referring to previous literature [42], the number of binary matrix sampling runs N was set to 1000, and the ratio σ was set to 0.05.A flowchart of VISSA is shown in Figure 3B.

Model Evaluation Metrics
The prediction performance of these models was assessed by the root mean squared error of the calibration set (RMSEC), the root mean squared error of the prediction set (RMSEP), the coefficient of determination of the calibration set (R 2 C ), the coefficient of determination of the prediction set (R 2 P ), and the ratio of performance to deviation (RPD).In this study, all algorithms and calculations were performed by using MATLAB R2020a.

Characterization of Au NSs
As displayed in Figure 4A, the UV-Vis extinction spectrum of Au NSs showed a broad peak at 867 nm, whereas the UV-Vis extinction spectrum of Au NPs (~17.53 nm, shown in Figure S4) showed a narrow peak at 527 nm, the same as that reported in previous literature [43,44].The morphology of the Au NSs was validated by a TEM image (shown in Figure 4B).The size of Au NSs was ~64.09 nm, with an average of eight branches.The statistical histogram of the particle size of Au NSs is shown in Figure 4C.The signal of gold in the EDS histogram revealed that gold was the main element of the synthesized Au NSs, as shown in Figure 4D.Furthermore, the presence of the silicon signal was due to the use of a silicon wafer as the EDS sample template, and the carbon signal was from the reducing agents.As shown in Figure 4E, the surface charge of the Au NSs was positive, with the zeta potential value of +53.3 mV.The XRD pattern of Au NSs exhibited peaks at 38.04 • , 44.43 • , 64.70 • , and 77.38 • , and these peaks were related to the crystalline planes of the facecentered-cubic of (111), ( 200), (220), and (311), respectively, indicative of the crystallinity of Au NSs (JCPDS no.99-0056), as shown in Figure 4F.Additionally, the enhancement factor (EF) of Au NSs was estimated to be 1.06 × 10 5 (a detailed calculation is shown in Supplementary Materials), which demonstrated that it was capable of measuring the SERS spectra of chlorpyrifos residue in tea based on Au NSs.

Analysis of the Collected SERS Spectra
The SERS spectra of chlorpyrifos-spiked tea extracts were measured using Au NSs. Figure 5A shows six raw SERS spectra of chlorpyrifos-spiked tea extracts over the 0.001-100 µg/g concentration range.According to the theoretical molecular vibrations of chlorpyrifos calculated by the density functional theory (DFT) in a previous report [45], six primary characteristic peaks found at 619, 1074, 1143, 1266, 1447, and 1566 cm −1 were assigned to P=S stretching, P-O-C stretching, C-H wagging, ring stretching, C-H wagging, and C=C stretching, respectively.From the perspective of the molecular structure of chlorpyrifos, an internal P=S double bond is responsible for its strong adsorption to metal ions, forming new covalent bonds such as Au-S, Ag-S, and Cu-S [46].The high affinity of chlorpyrifos toward metal ions facilitated its Raman signal amplification.The proposed spectral preprocessing approach was then applied to the six raw SERS spectra, and the resultant SERS spectra are shown in Figure 5B.The preprocessed SERS spectra were smoother, and the baseline interference was removed, both of which were advantageous for subsequent quantitative analysis.Figure 5C shows 10 raw SERS spectra of chlorpyrifos-spiked tea extracts with a concentration of 1 µg/g, and the relative standard deviation (RSD) values of the Raman intensities at six characteristic peaks were computed.As shown in Figure 5D, the computed RSD values were all less than 5.0%, which implied the high consistency of the measured SERS spectra and the excellent repeatability of the Au NSs.

Analysis of the Collected SERS Spectra
The SERS spectra of chlorpyrifos-spiked tea extracts were measured using Au NSs. Figure 5A shows six raw SERS spectra of chlorpyrifos-spiked tea extracts over the 0.001-100 µg/g concentration range.According to the theoretical molecular vibrations of chlorpyrifos calculated by the density functional theory (DFT) in a previous report [45], six primary characteristic peaks found at 619, 1074, 1143, 1266, 1447, and 1566 cm −1 were assigned to P=S stretching, P-O-C stretching, C-H wagging, ring stretching, C-H wagging, and C=C stretching, respectively.From the perspective of the molecular structure of chlorpyrifos, an internal P=S double bond is responsible for its strong adsorption to metal ions, forming new covalent bonds such as Au-S, Ag-S, and Cu-S [46].The high affinity of chlorpyrifos toward metal ions facilitated its Raman signal amplification.The proposed spectral preprocessing approach was then applied to the six raw SERS spectra, and the resultant SERS spectra are shown in Figure 5B.The preprocessed SERS spectra were smoother, and the baseline interference was removed, both of which were advantageous for subsequent quantitative analysis.Figure 5C shows 10 raw SERS spectra of chlorpyrifos-spiked tea extracts with a concentration of 1 µg/g, and the relative standard deviation (RSD) values of the Raman intensities at six characteristic peaks were computed.As shown in Figure 5D, the computed RSD values were all less than 5.0%, which implied the high consistency of the measured SERS spectra and the excellent repeatability of the Au NSs.

Detection of Chlorpyrifos Residue in Tea Samples Via Different Models
To predict chlorpyrifos residue in tea samples, five models, PLS, MC-UVE, CARS, IRIV, and VISSA, were applied to the SERS spectra that had been preprocessed.Before developing these models, the preprocessed SERS spectra dataset was partitioned into a calibration set (six spectra were chosen at random for each concentration, 36 spectra in total) and a prediction set (the remaining four spectra for each concentration, 24 spectra in total).Additionally, the values in the response matrix y (i.e., the chlorpyrifos content) were logarithmically preprocessed.The performance of various models was assessed by the model evaluation metrics.The prediction results yielded by different models are tabulated in Table 1.

Detection of Chlorpyrifos Residue in Tea Samples Via Different Models
To predict chlorpyrifos residue in tea samples, five models, PLS, MC-UVE, CARS, IRIV, and VISSA, were applied to the SERS spectra that had been preprocessed.Before developing these models, the preprocessed SERS spectra dataset was partitioned into a calibration set (six spectra were chosen at random for each concentration, 36 spectra in total) and a prediction set (the remaining four spectra for each concentration, 24 spectra in total).Additionally, the values in the response matrix y (i.e., the chlorpyrifos content) were logarithmically preprocessed.The performance of various models was assessed by the model evaluation metrics.The prediction results yielded by different models are tabulated in Table 1.

Results of the PLS Model
PLS was used to develop a multivariate calibration model, which was then used for predicting chlorpyrifos residue concentration in tea samples.As shown in Table 1, the established PLS model generated the subsequent outcomes: the optimal number of LVs (nLVs) = 8, RMSEC = 0.1475, R 2 c = 0.9925, RMSEP = 0.3015, R 2 p = 0.9513, and RPD = 4.5310.The change in RMSECV values during the cross-validation procedure is displayed in Figure 6A.A scatter plot illustrating the correlation between the reference and the PLS model's predicted chlorpyrifos residue concentration is presented in Figure 6B.

Results of the MC-UVE Model
MC-UVE was used to choose the valuable wavelengths from the SERS spectra and develop a multivariate calibration model, which was then used to predict chlorpyrifos residue concentration in tea samples.As shown in Table 1, the developed MC-UVE model generated the following results: nLVs = 10, the number of selected wavelengths (nWAV) = 100, RMSEC = 0.1420, R 2 c = 0.9932, RMSEP = 0.2342, R 2 p = 0.9712, and RPD = 5.8928.Figure 6C shows the wavelengths chosen by the MC-UVE model.Of these 100 chosen wavelengths, the wavelengths at 1056, 1139, 1264, 1266, 1268, 1269, 1271, 1560, 1562, 1564, and 1566 cm −1 were in the neighborhood of the characteristic peaks of chlorpyrifos: 1074, 1143, 1266, and 1566 cm −1 .A scatter plot illustrating the correlation between the reference and the MC-UVE model's predicted chlorpyrifos residue concentration is presented in Figure 6D.
predicting chlorpyrifos residue concentration in tea samples.As shown in Table 1, the established PLS model generated the subsequent outcomes: the optimal number of LVs (nLVs) = 8, RMSEC = 0.1475,  = 0.9925, RMSEP = 0.3015,  = 0.9513, and RPD = 4.5310.The change in RMSECV values during the cross-validation procedure is displayed in Figure 6A.A scatter plot illustrating the correlation between the reference and the PLS model's predicted chlorpyrifos residue concentration is presented in Figure 6B.

Results of the MC-UVE Model
MC-UVE was used to choose the valuable wavelengths from the SERS spectra and develop a multivariate calibration model, which was then used to predict chlorpyrifos residue concentration in tea samples.As shown in Table 1, the developed MC-UVE model generated the following results: nLVs = 10, the number of selected wavelengths (nWAV) = 100, RMSEC = 0.1420,  = 0.9932, RMSEP = 0.2342,  = 0.9712, and RPD = 5.8928.Fig- ure 6C shows the wavelengths chosen by the MC-UVE model.Of these 100 chosen wavelengths, the wavelengths at 1056, 1139, 1264, 1266, 1268, 1269, 1271, 1560, 1562, 1564, and 1566 cm −1 were in the neighborhood of the characteristic peaks of chlorpyrifos: 1074, 1143, 1266, and 1566 cm −1 .A scatter plot illustrating the correlation between the reference and the MC-UVE model's predicted chlorpyrifos residue concentration is presented in Figure 6D.

Results of the CARS Model
CARS was adopted to select the valuable wavelengths from the SERS spectra and develop a multivariate calibration model, which was then used to predict chlorpyrifos residue concentration in tea samples.As shown in Table 1, the developed CARS model obtained the subsequent outcomes: nLVs = 9, nWAV = 23, RMSEC = 0.1315, R 2 c = 0.9944, RMSEP = 0.2117, R 2 p = 0.9745, and RPD = 6.2617.Figure 7A shows the wavelengths chosen by the CARS model.Of these 23 chosen wavelengths, the wavelengths at 624, 626, 628, 1077, 1081, 1258, 1262, 1266, 1271, 1562, 1569, and 1572 cm −1 were in the neighborhood of the characteristic peaks of chlorpyrifos: 619, 1074, 1266, and 1566 cm −1 .A scatter plot illustrating the correlation between the reference and the CARS model's predicted chlorpyrifos residue concentration is presented in Figure 7B. Figure 7C-I displays the change in RMSECV values for various MC sampling runs.The RMSECV values decreased during sampling runs 1 to 277 as the uninformative or interfering wavelengths were eliminated.Subsequently, the RMSECV values increased during sampling runs 290 to 500 due to the informative wavelengths being removed.Figure 7C-II and Figure 7C-III display the regression coefficient path and the number of sampled variables corresponding to various sampling runs.It can be seen from Figure 7C-III that the progress of variable selection of CARS was split into two steps: first, the wavelengths were eliminated rapidly, which implied a "fast selection" phase; second, the wavelengths were eliminated gradually, which signified a "refined selection" phase.

Comparison of Various Models
The RPD value of each variable selection model was higher than 5.0 (Table 1), which signified that all models effectively predicted chlorpyrifos residue concentration in tea samples.Nevertheless, thoroughly examining the prediction performance of the various models remains valuable.Herein, the PLS model was regarded as a benchmark, and the prediction performance of MC-UVE, CARS, IRIV, and VISSA improved in comparison.For the MC-UVE, CARS, IRIV, and VISSA models: (1) the RMSEC values decreased by

Comparison of Various Models
The RPD value of each variable selection model was higher than 5.0 (Table 1), which signified that all models effectively predicted chlorpyrifos residue concentration in tea samples.Nevertheless, thoroughly examining the prediction performance of the various models remains valuable.Herein, the PLS model was regarded as a benchmark, and the prediction performance of MC-UVE, CARS, IRIV, and VISSA improved in comparison.For the MC-UVE, CARS, IRIV, and VISSA models: (1)  In contrast, the prediction ability of the PLS model was inadequate, which was because all wavelengths were included in the modeling.The MC-UVE and CARS models improved the prediction ability compared to the PLS model by selecting informative wavelengths.The outcomes of the IRIV model were superior to those of the MC-UVE and CARS models, which was also validated by the distribution of the selected wavelengths, as they were more concentrated near the characteristic peaks of chlorpyrifos.Nevertheless, the IRIV model considered wavelength importance separately and only partially used wavelength combination information.Hence, it was difficult for the IRIV model to select the optimal wavelength combination.The VISSA model outperformed the other four models, which mainly benefited from the following three aspects: (1) a soft space shrinkage strategy, (2) the optimization of variable space, and (3) the effect of variable combination.
(1) Soft space shrinkage strategy Backward elimination and EDF are hard space shrinkage approaches commonly used in most variable selection methods.Conversely, VISSA offers a more sophisticated variable space shrinkage method known as a soft space shrinkage technique.Compared with the hard space shrinkage approaches, the soft space shrinkage method requires more time to eliminate the variables in each step when there are many variables, but the risk of eliminating essential variables is lower.
(2) Optimization of variable space Decisions on whether variables will be included or removed should be made repeatedly during variable optimization.On-time modeling increases the risk of making poor decisions, such as removing essential variables or choosing uninformative ones, and therefore statistical decisions are required.Moreover, when the variable space shrinks, the newly generated space should perform better than the previous one, which is reflected in each round of VISSA; thus, a decrease in the mean value of RMSECV of sub-models is guaranteed. (

3) Effect of variable combination
The variable combination information is a critical factor that various variable selection methods should be considered.This is because some variables do not significantly influence the model initially; however, when coupled with other variables, they play a significant role in modeling.Unlike other models that directly eliminate this type of variable, less important variables are given small weights by VISSA and are permitted to remain in the model for additional analysis.This process is demonstrated by the change in the weights of each wavelength during the iterations (shown in Figure S5).
In brief, an unambiguous ranking for the various models concerning prediction performance can be summarized as: VISSA > IRIV > CARS > MC-UVE > PLS.

Stability and Sensitivity of the Proposed Method
In this study, the stability of the proposed method was analyzed regarding two aspects: (1) the synthesized SERS sensor Au NSs and (2) the developed intelligent variable selection models.The repeatability of the Au NSs was evaluated in Section 3.2, and a satisfactory result was obtained.In this section, the reproducibility of the Au NSs was assessed by remeasuring the SERS spectra of chlorpyrifos-spiked tea extracts with three distinct concentrations (0.1, 10, and 100 µg/g) on three consecutive days, and 10 SERS spectra were collected for each concentration.Then, the RSD values of the Raman intensities at six characteristic peaks (619, 1074, 1143, 1266, 1447, and 1566 cm −1 ) were computed.As shown in Figure 8, all the computed RSD values were below 5.0%.Subsequently, the remeasured SERS spectra underwent spectral preprocessing and then were fed into the developed models for predicting chlorpyrifos residue concentration.Herein, 30 replicate runs of each intelligent variable selection model were performed for prediction, and then the RSD values of the RMSEP values were recorded for evaluating the stability of the developed intelligent variable selection models.As shown in Figure 9, the RSD values generated by various models on these SERS spectra were all less than 10.0%.Hence, the above results validated that the proposed strategy was stable, and the stability of VISSA model was superior than that of other intelligent variable selection models.intelligent variable selection models.As shown in Figure 9, the RSD values generated by various models on these SERS spectra were all less than 10.0%.Hence, the above results validated that the proposed strategy was stable, and the stability of VISSA model was superior than that of other intelligent variable selection models.LOD is a crucial metric that was used to assess the sensitivity of the proposed strategy.Based on the mean relative error (MRE) evolution method proposed by Oleneva et al. (detailed in Supplementary Materials) [47], the estimated LOD values of the various models were as follows: LOD MC-UVE = 5.6 × 10 −4 µg/g, LOD CARS = 5.1 × 10 −4 µg/g, LOD IRIV = 4.5 × 10 −4 µg/g, and LOD VISSA = 3.9 × 10 −4 µg/g.The MRL of chlorpyrifos in tea set by the European Union is 0.01 mg/kg (1.0 × 10 −2 µg/g).The estimated LOD values of all the models were lower than the MRL value.Hence, the sensitivity of the proposed method is qualified to determine chlorpyrifos residue in tea samples accurately.

Determination of Chlorpyrifos Content in Real Samples
A batch of green tea samples containing chlorpyrifos (the chlorpyrifos concentration range stated by the company is 0.01-0.001µg/g, but the exact concentration for each sample is unknown) was supplied by Pribolab Bioengineering Co., Ltd.(Qingdao, China).The proposed method and the GC-MS method were employed to determine the chlorpyrifos content in tea samples.Table 2 displays the detection results.Furthermore, two-tailed paired t-tests between the detection results of the MC-UVE, CARS, IRIV, and VISSA models and those of the GC-MS method were performed.The generated results were as follows: p-value (MC-UVE vs. GC-MS) = 0.24, p-value (CARS vs. GC-MS) = 0.27, p-value (IRIV vs. GC-MS) = 0.41, and p-value (VISSA vs. GC-MS) = 0.45.The results implied no significant difference between the proposed method and the GC-MS method regarding prediction performance (p-value > 0.05).

Figure 1 .
Figure 1.The schematic diagram of the experimental design of this study.

Figure 1 .
Figure 1.The schematic diagram of the experimental design of this study.

Figure 2 .
Figure 2. (A) A flowchart of MC-UVE.(B) A flowchart of CARS.Figure 2. (A) A flowchart of MC-UVE.(B) A flowchart of CARS.

Figure 2 .
Figure 2. (A) A flowchart of MC-UVE.(B) A flowchart of CARS.Figure 2. (A) A flowchart of MC-UVE.(B) A flowchart of CARS.

Foods 2024 , 21 Figure 4 .
Figure 4. (A) The UV-Vis extinction spectrum of Au NSs.(B) The TEM image of Au NSs.(C) The statistical histogram of the particle size of Au NSs.(D) The EDS of Au NSs.(E) The zeta potential of Au NSs.(F) The XRD pattern of Au NSs.

Figure 4 .
Figure 4. (A) The UV-Vis extinction spectrum of Au NSs.(B) The TEM image of Au NSs.(C) The statistical histogram of the particle size of Au NSs.(D) The EDS of Au NSs.(E) The zeta potential of Au NSs.(F) The XRD pattern of Au NSs.

Figure 6 .
Figure 6.(A) The change in RMSECV values during the cross-validation procedure in the PLS model.(B) A scatter plot illustrating the correlation between the reference and the PLS model's predicted chlorpyrifos residue concentration.(C) The wavelengths selected by the MC-UVE model.(D) A scatter plot illustrating the correlation between the reference and the MC-UVE model's predicted chlorpyrifos residue concentration.(E) The wavelengths selected by the IRIV model.(F) A scatter plot illustrating the correlation between the reference and the IRIV model's predicted chlorpyrifos residue concentration.

Figure 6 .
Figure 6.(A) The change in RMSECV values during the cross-validation procedure in the PLS model.(B) A scatter plot illustrating the correlation between the reference and the PLS model's predicted chlorpyrifos residue concentration.(C) The wavelengths selected by the MC-UVE model.(D) A scatter plot illustrating the correlation between the reference and the MC-UVE model's predicted chlorpyrifos residue concentration.(E) The wavelengths selected by the IRIV model.(F) A scatter plot illustrating the correlation between the reference and the IRIV model's predicted chlorpyrifos residue concentration.

Figure 7 .
Figure 7. (A) The wavelengths selected by the CARS model.(B) A scatter plot illustrating the correlation between the reference and the CARS model's predicted chlorpyrifos residue concentration.(C-I) The change in RMSECV values for various MC sampling runs.(C-II) The regression coefficient path for various MC sampling runs.(C-III) The number of sampled variables for various sampling runs.(D) The wavelengths selected by the VISSA model.(E) A scatter plot illustrating the correlation between the reference and the VISSA model's predicted chlorpyrifos residue concentration.

Figure 7 .
Figure 7. (A) The wavelengths selected by the CARS model.(B) A scatter plot illustrating the correlation between the reference and the CARS model's predicted chlorpyrifos residue concentration.(C-I) The change in RMSECV values for various MC sampling runs.(C-II) The regression coefficient path for various MC sampling runs.(C-III) The number of sampled variables for various sampling runs.(D) The wavelengths selected by the VISSA model.(E) A scatter plot illustrating the correlation between the reference and the VISSA model's predicted chlorpyrifos residue concentration.

Table 1 .
Prediction results of chlorpyrifos content in tea samples by various models.
a : the optimal number of latent variables.nWAV b : the number of selected wavelengths.

Table 1 .
Prediction results of chlorpyrifos content in tea samples by various models.
a : the optimal number of latent variables.nWAV b : the number of selected wavelengths.