Simultaneous Recognition of Species, Quality Grades, and Multivariate Calibration of Antioxidant Activities for 12 Famous Green Teas Using Mid- and Near-Infrared Spectroscopy Coupled with Chemometrics

In this paper, mid- and near-infrared spectroscopy fingerprints were combined to simultaneously discriminate 12 famous green teas and quantitatively characterize their antioxidant activities using chemometrics. A supervised pattern recognition method based on partial least square discriminant analysis (PLSDA) was adopted to classify the 12 famous green teas with different species and quality grades, and then optimized sample-weighted least-squares support vector machine (OSWLS-SVM) based on particle swarm optimization was employed to investigate the quantitative relationship between their antioxidant activities and the spectral fingerprints. As a result, 12 famous green teas can be discriminated with a recognition rate of 100% by MIR or NIR data. However, compared with individual instrumental data, data fusion was more adequate for modeling the antioxidant activities of samples with RMSEP of 0.0065. Finally, the performance of the proposed method was evaluated and validated by some statistical parameters and the elliptical joint confidence region (EJCR) test. The results indicate that fusion of mid- and near-infrared spectroscopy suggests a new avenue to discriminate the species and grades of green teas. Moreover, the proposed method also implies other promising applications with more accurate multivariate calibration of antioxidant activities.


Introduction
Free radicals (FRs) are defined as atomic or molecular species with unpaired electrons on an otherwise open shell configuration [1]. In recent years, it has been confirmed that FR action is implicated in several chronic and ageing diseases, such as cancer, stroke, heart disease, cataracts, rheumatoid arthritis, and Alzheimer's disease, because FR could damage or cause complete degradation of essential molecules in cells [2,3].
There is growing evidence that antioxidants could protect the body against the destructive effects of FR [4,5]. Therefore, antioxidant therapy has emerged as an important approach for maintaining human health and curing diseases.
Consumers are more inclined to take natural antioxidants rather than synthetic ones because of their minimal toxicity. Green tea is considered to be a good source of natural antioxidants [6]. Increasing evidence demonstrates that polyphenolic compounds in green teas, mainly including (-)-epicatechin (EC), (-)-epigallocatechin (EGC), (-)-epicatechin gallate (ECG), (-)-epigallocatechin gallate (EGCG), and (-)-gallocatechin gallate (GCG), exhibit antioxidant activity (AA) by adsorbing and neutralizing free radicals, quenching singlet and triplet oxygen, or decomposing peroxides [6][7][8]. The AA of green tea differs depending on the tea's species, quality grade, and other factors. AA is not only a criterion for evaluating the medical functions of green tea but also an essential standard for quality monitoring and control in the tea production line. Therefore, it is important to simultaneously recognize green teas with different species and quality grades and to perform qualitative and quantitative analysis of their AA.
Nowadays, wet-chemical assays such as 2,2-diphenyl-1-picrylhydrazyl (DPPH), 2,2'-azino-bis(3-ethylbenzothiazoline-6-sulfonic acid) (ABTS), ferric reducing/antioxidant power (FRAP), trolox equivalent antioxidant capacity (TEAC), and oxygen radical absorbance capacity (ORAC) are widely used to measure the antioxidant activity of various food samples [9][10][11]. However, these methods are time consuming and inconvenient for online and rapid analysis. With the advantages of being accurate, cost-efficient, rapid, nondestructive, and reagent-free, spectroscopic techniques including mid- and near-infrared (MIR/NIR) spectroscopy have been utilized as alternatives to wet-chemical analysis [12,13]. In recent years, individual MIR or NIR spectroscopy coupled with multivariate calibration has been successfully used to develop spectra-antioxidant activity relationship models [14][15][16][17][18][19][20][21][22]. For instance, Hu et al. reported the use of ATR-FT-IR spectroscopy for rapid prediction of antioxidant capacity in chocolate [15]. Lucas et al. developed visible-near-infrared reflectance spectroscopy combined with chemometric data analysis methods for predicting total antioxidant capacity in both fresh and freeze-dried cheeses [17]. Li et al. reported the use of single NIR and MIR spectroscopy for rapid determination of the antioxidant activity of Radix Scutellariae from different geographical regions [16].
Green tea is a natural healthy product rich in polyphenolic compounds that contain a large number of hydrogen-containing bonds (i.e., C-H, O-H, and N-H) [23]. Based on molecular vibrations, MIR can represent the absorbance spectra of all chemical bonds in the range from 4000 to 400 cm⁻¹ according to the unique correlation of spectral band positions with certain substructures, whereas NIR spectroscopy covers the wavelength range of 800-2500 nm, which contains overtones and combinations of fundamental vibrations due to the stretching and bending of hydrogen-containing bonds [24]. Therefore, the constituents that contribute to antioxidant activity can be characterized by MIR and NIR spectroscopy. For example, Luypaert et al. [25] utilized near-infrared spectroscopy and partial least squares (PLS) to estimate the total antioxidant capacity of green tea. Zhang et al. [26] predicted the total antioxidant capacity of green tea by a combination of NIR spectroscopy and chemometrics. However, most studies have focused on NIR spectroscopy and chemometrics for the antioxidant activity analysis of green tea, and it remains necessary to evaluate the antioxidant activity of green tea more accurately by also including MIR spectroscopy. Moreover, available information about the antioxidant activity of green tea using data fusion of MIR and NIR is scant. Because MIR and NIR have different characteristics, fusing MIR and NIR spectra could provide a more comprehensive and reliable description than a single spectroscopic technique and thus enable more accurate qualitative and quantitative analysis. To the best of our knowledge, combinatory MIR and NIR spectroscopy coupled with multivariate calibration has not been reported for revealing the spectra-antioxidant activity relationship in green teas.
In this work, a novel strategy based on the combination of MIR and NIR spectroscopy was developed and validated for the discrimination of 12 famous green teas with different species and quality grades, as well as for quantitative characterization of their antioxidant activity. Because the spectroscopic techniques provide multivariate and nonspecific signals, chemometrics methods for pattern recognition and multivariate calibration were employed to extract most useful and relevant information.
This information fusion approach based on MIR and NIR spectroscopy combined with chemometrics was shown to enable reliable classification of green teas and highly accurate quantitative characterization of the spectra-antioxidant activity relationship.

Chemicals and Materials.
Twelve famous green teas with different species and quality ranks were purchased from a representative local tea market; their detailed information is listed in Table 1. 1,1-Diphenyl-2-picrylhydrazyl (DPPH), 2,2'-azinobis(3-ethylbenzothiazoline-6-sulfonic acid) (ABTS), and analytical-grade methanol were obtained from Chinese Medicine Reagent Co., Ltd. Deionized water was collected from a Milli-Q plus purification system (18.25 MΩ·cm).

Sample Preparation and Spectral Acquisition.
Each of the 12 famous green teas included 10 samples. All green tea samples were crushed into powder and passed through a 200-mesh sieve to ensure a homogeneous particle size. The sieved powders were dried under vacuum at 60°C for 24 h and stored in a desiccator until use. For MIR spectroscopy, a Nicolet 6700 FT-IR spectrometer (Thermo Fisher Scientific Inc., USA) was used to collect spectra over the wavenumber range from 4000 cm⁻¹ to 400 cm⁻¹ with a resolution of 4 cm⁻¹. A total of 32 scans were accumulated per measurement, and the samples were scanned in diffuse reflectance mode. For NIR spectroscopy, 1.0 g of powdered sample was weighed accurately and placed into a quartz glass cell for subsequent spectral measurement. NIR spectra were acquired in diffuse reflectance mode with an Antaris II FT-NIR spectrometer (Thermo Electron Co., USA) over the 10000-4000 cm⁻¹ range with a resolution of 8 cm⁻¹. For each spectrum, the number of scans was set to 32. The average of three measured spectra of each sample was used for data processing in this work.

Antioxidant Assay.
The sieved green tea powder was extracted with methanol at a ratio of 1 : 100 (w/v) at 40°C for 20 min and then filtered. The supernatant was used for the antioxidant activity assays. The scavenging activities of each green tea were determined on 10 samples by two distinct antioxidant assays, the DPPH method and the ABTS assay.
In the DPPH method, 90 μL of a 1 mM methanolic DPPH solution was added to 910 μL of methanol-diluted green tea extract (1 : 10). The reaction solutions were mixed vigorously, and their absorbance was measured at 517 nm after standing for 30 min in the dark at room temperature. Each type of green tea was measured with 10 samples. The scavenging activity (SA) was calculated from the measured absorbance.
In the ABTS assay, 0.15 mM, 0.3 mM, 0.6 mM, and 0.9 mM trolox standards were obtained by appropriate dilution with methanol. The hydrogen peroxide solution was diluted 1000 times with distilled water, and the peroxidase was diluted 10 times with phosphate buffer. The ABTS solution was prepared by combining 152 μL phosphate buffer, 10 μL original ABTS solution, and 8 μL diluted hydrogen peroxide solution, giving a volume of 170 μL for the following measurements. The reaction mixtures, each containing 20 μL peroxidase, 10 μL of methanol, trolox standard at one of the above concentrations, or diluted green tea extract (1 : 50), and 170 μL ABTS solution, were measured at 414 nm after incubation for 6 min in the dark at room temperature. The total inhibitory activity of a green tea sample was calculated as the trolox equivalent antioxidant capacity (TEAC). In this research, the antioxidant capacity of the trolox standard (1 mM) was set at 1 mmol/L.
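The two assay calculations can be sketched as follows. The DPPH expression used here is the commonly used percent-inhibition form, SA(%) = (A_control - A_sample) / A_control × 100; it is an assumption, since the authors' own equation is not reproduced in this text. The TEAC step fits a straight line through the trolox standards and reads the sample off that line, which is likewise only illustrative. All function names and absorbance or inhibition values are hypothetical.

```python
def dpph_scavenging_activity(a_control, a_sample):
    """Percent DPPH scavenging from absorbance at 517 nm (assumed standard form)."""
    if a_control <= 0:
        raise ValueError("control absorbance must be positive")
    return (a_control - a_sample) / a_control * 100.0


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def teac_from_inhibition(inhibition, slope, intercept, dilution=50):
    """Trolox-equivalent concentration (mM) of the undiluted extract.

    `dilution` corrects for the 1 : 50 dilution mentioned in the text;
    conversion to mmol/g would need the extraction ratio as well.
    """
    return (inhibition - intercept) / slope * dilution


# Hypothetical trolox calibration: concentrations (mM) vs. percent inhibition.
trolox_mm = [0.15, 0.3, 0.6, 0.9]
inhibition_pct = [10.0, 20.0, 40.0, 60.0]
slope, intercept = fit_line(trolox_mm, inhibition_pct)
sample_teac = teac_from_inhibition(30.0, slope, intercept)
```

The calibration values above are invented so that the arithmetic is easy to follow; real standards would come from the 414 nm readings.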

The Data Fusion of Spectroscopy and Data Processing Method.
The raw instrumental data were acquired individually by the MIR and NIR spectroscopy described above. Data fusion was performed by matrix augmentation. The final MIR-NIR matrix consisted of 3426 variables (1869 MIR variables and 1557 NIR variables) for 120 samples from the 12 famous green teas.
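Matrix augmentation amounts to placing the two spectral blocks side by side for each sample. A minimal sketch, assuming each block is stored as a list of rows with one row per sample in the same order; the function name and the zero-filled placeholder spectra are hypothetical:

```python
def fuse_by_augmentation(mir, nir):
    """Concatenate MIR and NIR variables row by row (samples x variables)."""
    if len(mir) != len(nir):
        raise ValueError("both blocks must describe the same samples")
    return [list(m) + list(n) for m, n in zip(mir, nir)]


# Placeholder spectra with the dimensions reported in the text.
mir = [[0.0] * 1869 for _ in range(120)]
nir = [[0.0] * 1557 for _ in range(120)]
fused = fuse_by_augmentation(mir, nir)  # 120 rows, 3426 columns
```

Whether the blocks were scaled before concatenation is not stated in the text, so no block scaling is applied here.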
In this research, partial least square discriminant analysis (PLSDA) [27], moving window partial least-squares (MWPLS) regression [28,29], and optimized sample-weighted least-squares support vector machine (OSWLS-SVM) based on particle swarm optimization [30] were used to identify the 12 famous green tea samples with different species and grade levels, to select variable regions with higher weights, and to construct appropriate spectra-antioxidant activity relationship models, respectively. All data processing programs were written and run in the Matlab software environment.

Analysis of MIR and NIR Spectral Fingerprints.
The original MIR and NIR spectra of the 12 famous green teas are plotted in Figure 1. The raw MIR and NIR spectra overlap seriously and have poor peak resolution, which makes accurate assignment of specific peaks very difficult. For ease of peak attribution, chemical bonds are denoted as atom-atom, where an atom can be carbon (C), hydrogen (H), oxygen (O), or nitrogen (N). The MIR spectra of the 12 famous green teas are shown in Figure 1(a), and the characteristic absorption peaks can be attributed as follows. In the wavenumber range of 1600-970 cm⁻¹, there are some slight differences in the MIR absorption, while the absorption spectra of different green teas differ considerably in the range of 2790-1905 cm⁻¹. Variations around the peak at 1200-1900 cm⁻¹ can be associated with the C-H group, and the peak at 1500-1200 cm⁻¹ can be associated with the C-O group. The wide region between 3400 cm⁻¹ and 1600 cm⁻¹ mainly consists of overlapping O-H stretching (3500-3020 cm⁻¹) and various N-H bending and stretching vibrations of amide compounds (3400-1621 cm⁻¹). The asymmetric vibration of CH₂ at 3020 cm⁻¹ can be attributed to tea polyphenols, caffeine, and so on. These components also contribute to the O-H stretching vibration.
Figure 1(b) shows the NIR spectra of the 12 famous green teas, and the characteristic absorption peaks can be interpreted as follows: the peak at 4254 cm⁻¹ is the combination absorbance of C-H symmetric stretching and C-H bending in the phenyl group or the second overtone of CH₂ bending; peaks around 4347 cm⁻¹ and 4413 cm⁻¹ can be attributed to the combination absorbance of C-H asymmetric stretching and C-H bending, and these peaks may be relevant to tea polyphenols, catechin, and their derivatives; the band at 4656 cm⁻¹ is due to the combination stretching vibration of C=C and =C-H bands and the combination of the base bands of N-H stretching and bending; the peak around 5188 cm⁻¹ can be explained as the second overtone of C=O stretching bands, the first overtone of C-H stretching bands in aromatic rings, and the combination of the base bands of O-H stretching and bending; the peak at 5797 cm⁻¹ can be assigned to the second overtones of C-H stretching in various groups; and the peak around 6823 cm⁻¹ can be caused by the first overtone of O-H stretching from amino acids and caffeine. In general, the different vibration modes can be attributed to functional groups from chemical components of green teas such as tea polyphenols, amino acids, caffeine, gallic acid, and theobromine. The low spectral resolution was mainly caused by the overlapping signals of multiple components and by the shifts and distortions resulting from their interactions. Fortunately, MIR and NIR spectral fingerprints can still reflect the characteristics of chemical bonds, and the multivariate variations among samples can provide useful information for classification and calibration. Therefore, chemometric methods are required to develop the discriminant model and the spectra-antioxidant activity model.
In the PLSDA models using the individual MIR and NIR data, the optimal number of latent variables was estimated as 5 in both cases by cross validation. Figure 2 shows the dummy codes of the training sets and prediction sets for the 12 green teas with different species and quality grades. The accuracy rates of the training sets and prediction sets for MIR and NIR were all calculated as 100%. To evaluate the accuracy and reliability of the discriminant results, analytical figures of merit (FOMs) including sensitivity (SEN) and selectivity (SEL) were calculated. The PLSDA models constructed with either MIR or NIR data classified the famous green teas with 100% SEN and SEL, as shown in Table 2.
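The dummy coding and the figures of merit can be illustrated with a short sketch. It assumes that each class is coded as a one-hot row, that a sample is assigned to the class with the largest predicted response, and that SEL is computed as the true-negative rate; these conventions and all names are assumptions, and the PLS regression step itself is omitted.

```python
def dummy_code(labels, n_classes):
    """One-hot (dummy) response matrix used as the PLSDA target."""
    return [[1.0 if k == c else 0.0 for k in range(n_classes)] for c in labels]


def decode(y_pred):
    """Assign each sample to the class with the largest predicted response."""
    return [max(range(len(row)), key=row.__getitem__) for row in y_pred]


def sen_sel(true, pred, cls):
    """Sensitivity TP/(TP+FN) and selectivity TN/(TN+FP) for one class."""
    pairs = list(zip(true, pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    tn = sum(1 for t, p in pairs if t != cls and p != cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical example: four samples, two classes, one misclassification.
true_labels = [0, 0, 1, 1]
predicted_responses = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.1, 0.9]]
pred_labels = decode(predicted_responses)
```

With a perfect classifier, as reported for both MIR and NIR, every class would give a pair of 1.0 values.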
In this paper, the spectral fingerprints of green teas were characterized separately by MIR and NIR spectroscopy, because the two techniques capture different chemical information about the antioxidant ingredients. We also observed 100% discriminant accuracy in training and prediction with single MIR and NIR spectroscopy, indicating that an individual spectroscopic technique coupled with supervised pattern recognition provides satisfactory classification capability for mining the tiny chemical differences among famous green teas; these differences in types and contents are closely connected with the variations in their antioxidant activities. However, single instrumental analysis is restricted by its limited chemical information compared with multidimensional instrumental analysis, so it is important to reveal spectra-antioxidant activity relationships with the combinatory MIR-NIR technology, which is discussed in detail in the next section.
The antioxidant activities of the green teas were measured using the reference methods described in the above sections. The values measured using DPPH were in the range from 20.59% to 51.27%. For ABTS, the total antioxidant activity values were characterized by trolox equivalent antioxidant capacity (TEAC) and were in the range of 1.81 to 5.72 mmol/g. The descriptive statistics for these two antioxidant activity indices are presented in Table 3.

Spectra-Antioxidant Activity
In this study, OSWLS-SVM models were developed to relate the single MIR, single NIR, and combinatory MIR-NIR data separately to antioxidant activities. The 120 famous green tea samples were split into a calibration set of 60 samples, a validation set of 30 samples, and a prediction set of 30 samples by the DUPLEX method [31]. It is widely recognized that well-performed variable selection before building a multivariate regression model can improve the predictive ability of the model. In this paper, moving window partial least-squares (MWPLS) regression was used as an interval selection method to extract useful chemical information from the MIR, NIR, and fused MIR-NIR data before the spectra-antioxidant activity relationships were constructed by OSWLS-SVM. The individual and combinatory spectroscopic data intervals with lower sums of squared residues (SSR) and less model complexity were selected to reconstruct the instrumental response matrices.
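The sample split can be sketched in a DUPLEX-like way: each subset is seeded with the two mutually farthest remaining samples, and the subsets then take turns picking the remaining sample that is farthest from their current members. This is a simplified illustration under stated assumptions (Euclidean distance on the raw variables, a three-way generalization, a hypothetical function name), not the authors' implementation.

```python
import math
from itertools import combinations


def _farthest_pair(X, candidates):
    """Indices of the two candidates with the largest Euclidean distance."""
    return max(combinations(sorted(candidates), 2),
               key=lambda pair: math.dist(X[pair[0]], X[pair[1]]))


def duplex_split(X, sizes):
    """Split row indices of X into len(sizes) subsets, DUPLEX-style."""
    if sum(sizes) != len(X) or min(sizes) < 2:
        raise ValueError("sizes must sum to len(X) and each be at least 2")
    remaining = set(range(len(X)))
    subsets = []
    # Seed every subset with the two farthest-apart remaining samples.
    for _ in sizes:
        i, j = _farthest_pair(X, remaining)
        subsets.append([i, j])
        remaining -= {i, j}
    # Alternate: each unfilled subset takes the sample farthest from its members.
    while remaining:
        for subset, size in zip(subsets, sizes):
            if len(subset) < size and remaining:
                best = max(sorted(remaining),
                           key=lambda i: min(math.dist(X[i], X[j]) for j in subset))
                subset.append(best)
                remaining.remove(best)
    return subsets


# Toy one-variable data; the study would use sizes (60, 30, 30) on 120 spectra.
toy = [[float(i)] for i in range(8)]
calibration, validation, prediction = duplex_split(toy, (4, 2, 2))
```

The pairwise search is quadratic in the number of samples, which is unproblematic for 120 spectra.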
In the process of variable selection, the residue lines obtained by MWPLS for the training set with individual and combinatory spectroscopic data are shown in Figure 3. Tea is particularly rich in polyphenols including (-)-epicatechin (EC), (-)-epigallocatechin (EGC), (-)-epicatechin gallate (ECG), (-)-epigallocatechin gallate (EGCG), and (-)-gallocatechin gallate (GCG), which act as antioxidants in vitro and in vivo by scavenging reactive oxygen and nitrogen species and chelating redox-active transition metal ions [32]. Chemical structures of some representative polyphenols in green teas are shown in Figure 4. In addition, tea polysaccharide conjugate (TPC) has also been shown to have potent antioxidant activity [33]. Therefore, the selected variable information based on MIR or NIR could adequately reflect the chemical structures of the antioxidant ingredients, making the OSWLS-SVM models more interpretable and reasonable.
For OSWLS-SVM, a 70-cycle PSO was carried out to search for the sample weights and hyperparameters that minimize the objective function. The estimated sample weights are shown in Figure 5. Most of the samples receive high weights, so the useful information carried by these samples is retained for prediction. The prediction results for antioxidant activities by the DPPH and ABTS methods based on OSWLS-SVM are summarized in Tables 4 and 5, respectively. These results demonstrate that spectroscopic technology coupled with OSWLS-SVM can reveal the potential connection between spectral fingerprints and antioxidant activity and accurately predict the antioxidant activities of green teas. The recovery was determined as the ratio of the predicted radical scavenging activity to the true radical scavenging activity. For DPPH scavenging activity, the average predicted recoveries obtained from the MIR, NIR, and spectroscopic fusion data were 99.8 ± 2.5, 99.7 ± 2.4, and 100.1 ± 1.7%, respectively. For ABTS radical scavenging activity, the average predicted recoveries obtained from the MIR, NIR, and spectroscopic fusion data were 99.7 ± 1.0, 99.9 ± 0.8, and 100.0 ± 0.4%, respectively. Moreover, the root-mean-squared error of calibration (RMSEC) in the calibration set and the root-mean-squared error of prediction (RMSEP) in the prediction set were used to evaluate the accuracy of the calibration models. The model built on spectroscopic fusion data gave the smallest RMSEP, indicating that the combinatory spectroscopic model is better than the models using individual data. The results also show that the OSWLS-SVM model was effective.
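The recovery and root-mean-squared error statistics used above follow directly from their definitions. A minimal sketch, assuming that the "±" values are sample standard deviations of the per-sample recoveries (the text does not say so explicitly); function names and numbers are hypothetical:

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error (RMSEC or RMSEP, depending on the sample set)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def recoveries(actual, predicted):
    """Per-sample recovery in percent: predicted / actual * 100."""
    return [p / a * 100.0 for a, p in zip(actual, predicted)]


def mean_sd(values):
    """Mean and sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    m = sum(values) / n
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))


# Hypothetical DPPH scavenging activities (%) for three prediction samples.
actual = [25.0, 40.0, 50.0]
predicted = [24.5, 40.8, 50.0]
print(rmse(actual, predicted), mean_sd(recoveries(actual, predicted)))
```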
To compare the prediction performance of the MIR, NIR, and MIR-NIR data, the predicted antioxidant activities are plotted against the actual antioxidant activities for the 12 famous green teas in Figure 6. For the OSWLS-SVM model for DPPH using MIR, there is a slight deviation between experimental and predicted values in the prediction set, but a significant deviation exists in the monitoring set, indicating that single MIR information could not fully reflect DPPH scavenging activity despite the high correlation. For the NIR model for DPPH, some predicted values were not very close to the actual values in the training set; in addition, the predictions for several monitoring samples deviate slightly from the actual values for both DPPH and ABTS. NIR alone therefore also seems to be insufficient for characterizing all the potential antioxidant chemical ingredients. For the OSWLS-SVM model using data fusion, the predicted values were very close to the actual values. These results confirm that the combinatory spectroscopic method coupled with OSWLS-SVM was more effective than the single spectroscopic methods. The results in Figure 6 also indicate that the spectra-antioxidant activity relationship models for ABTS were slightly better than the models for DPPH because DPPH and ABTS radicals have different sensitivity to taste ingredients.
The performance of the proposed method was also evaluated and validated by some statistical parameters. A t-test was performed to compare the actual and predicted antioxidant activities estimated by the DPPH and ABTS methods (Tables 4 and 5). With 29 degrees of freedom at the 95% confidence level, the results were as follows: MIR with the DPPH method, $t = 0.0841 < t_{0.05}^{29}$; MIR with the ABTS method, $t = 0.4615 < t_{0.05}^{29}$; NIR with the DPPH method, $t = 0.0820 < t_{0.05}^{29}$; NIR with the ABTS method, $t = 0.0634 < t_{0.05}^{29}$; combinatory spectroscopy with the DPPH method, $t = 0.0441 < t_{0.05}^{29}$; and combinatory spectroscopy with the ABTS method, $t = 0.0054 < t_{0.05}^{29}$.
The results indicate that there is no significant difference between the actual and predicted antioxidant activities. Additionally, data fusion provides an improved capacity to directly predict the antioxidant activities of famous green teas.
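The comparison of actual and predicted values can be sketched as a paired t-test, which gives 29 degrees of freedom for the 30 prediction samples. That the authors used the paired form is an assumption; the critical value 2.045 is the tabulated two-sided 5% point of the t distribution with 29 degrees of freedom, and the function name and data are hypothetical.

```python
import math


def paired_t_statistic(actual, predicted):
    """Absolute paired t statistic and its degrees of freedom (n - 1)."""
    d = [p - a for a, p in zip(actual, predicted)]
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in d) / (n - 1))
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return abs(mean) / (sd / math.sqrt(n)), n - 1


T_CRIT_29 = 2.045  # tabulated two-sided critical value, alpha = 0.05, df = 29

# Hypothetical activities for four samples (the study used 30).
t_value, df = paired_t_statistic([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
no_significant_difference = t_value < T_CRIT_29  # only valid when df == 29
```

A t value below the critical value, as reported for all six models, means the null hypothesis of equal means is not rejected.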
To further compare the accuracy of individual and combinatory spectroscopy, the experimental antioxidant activities were linearly regressed against the predicted antioxidant activities. The calculated intercept and slope were compared with their ideal values (0 and 1) based on the elliptical joint confidence region (EJCR) test [34]. The EJCR test has been widely used and is considered an effective and reliable method for evaluating the accuracy of results [35][36][37][38]. Figure 7 gives the results of the EJCR tests for the DPPH and ABTS methods. With the ideal point (0, 1), marked with a pentacle (★), lying in the center, the ellipse for combinatory spectroscopy was the smallest and the ellipse for NIR was the largest, indicating that the OSWLS-SVM model based on combinatory spectroscopic data performed better than those based on individual spectroscopic data.
The results further confirm that the spectroscopic fusion strategy coupled with OSWLS-SVM could more fully explain the relationship between the overall chemical information of the ingredients and antioxidant activity.
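The EJCR check can be sketched in its usual textbook form: regress the experimental values on the predicted values, then test whether the ideal point (intercept 0, slope 1) satisfies n·Δa² + 2·Σx·Δa·Δb + Σx²·Δb² ≤ 2·s²·F, where Δa and Δb are the differences from the fitted intercept and slope, s² is the residual variance, and F is the critical value of the F distribution with 2 and n − 2 degrees of freedom. The critical value is passed in by the caller (for example 3.34 for α = 0.05 with 2 and 28 degrees of freedom, a tabulated value). The function name and data are hypothetical, and this may differ in detail from the procedure of reference [34].

```python
def ejcr_contains_ideal(predicted, actual, f_crit, ideal=(0.0, 1.0)):
    """Return (inside, intercept, slope) for the joint confidence ellipse test."""
    n = len(predicted)
    sx = sum(predicted)
    sxx = sum(x * x for x in predicted)
    mx, my = sx / n, sum(actual) / n
    # Ordinary least-squares fit of actual = intercept + slope * predicted.
    slope = (sum((x - mx) * (y - my) for x, y in zip(predicted, actual))
             / sum((x - mx) ** 2 for x in predicted))
    intercept = my - slope * mx
    # Residual variance with n - 2 degrees of freedom.
    s2 = sum((y - intercept - slope * x) ** 2
             for x, y in zip(predicted, actual)) / (n - 2)
    da, db = ideal[0] - intercept, ideal[1] - slope
    q = n * da * da + 2.0 * sx * da * db + sxx * db * db
    return q <= 2.0 * s2 * f_crit, intercept, slope


# Hypothetical five-sample example; 9.55 is the tabulated F(0.05; 2, 3).
inside, a, b = ejcr_contains_ideal([1.0, 2.0, 3.0, 4.0, 5.0],
                                   [1.1, 1.9, 3.05, 3.95, 5.0], f_crit=9.55)
```

A smaller ellipse that still contains the ideal point, as reported for the fused data, indicates both unbiased and more precise predictions.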

Conclusions
In this study, we developed a novel and effective alternative for the simultaneous recognition of the species and grades of 12 famous green teas and for the multivariate calibration of the antioxidant activities of green teas based on MIR and NIR spectroscopy coupled with chemometrics. The 12 famous green teas were successfully classified with a 100% recognition rate using MIR or NIR spectroscopy individually coupled with PLSDA. Individual MIR and NIR data as well as combinatory spectroscopic data were then related to their antioxidant activities by OSWLS-SVM. The results confirmed the advantages of data fusion of MIR and NIR for comprehensive and reliable characterization of chemical components in green teas. Furthermore, OSWLS-SVM proved to be an effective approach for characterizing the potential relationship between spectroscopic data and antioxidant activities, opening a new avenue for rapidly and more accurately estimating antioxidant activities in other foodstuffs.

Data Availability
Our data are available on request by interested readers.

Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.

Figure 7: (a) EJCRs for DPPH scavenging activity by applying MIR, NIR, and spectroscopic fusion data; (b) EJCRs for ABTS radical scavenging activity by applying MIR, NIR, and spectroscopic fusion data. The pentacle (★) indicates the ideal point (0, 1).