Development and Validation of Near-Infrared Methods for the Quantitation of Caffeine, Epigallocatechin-3-gallate, and Moisture in Green Tea Production

The quality of tea leaves (e.g., their color, appearance, and taste) is directly influenced by the tea production process, during which the contents of a number of chemical components change. However, the production process is currently controlled by operator experience, so product quality varies considerably. Near-infrared spectroscopy (NIRS) is a time-saving, cost-saving, and nondestructive method, which makes it well suited to quality control of the tea production process. In this study, quantitative analysis models for caffeine, epigallocatechin-3-gallate (EGCG), and moisture content were established by combining NIRS with partial least squares regression (PLSR) for online process monitoring of tea production. The model parameters show that the established models are robust and accurate. The feasibility of the established method was then verified by traditional method validation. Verification of the precision of the instrument and the stability of the sample indicates that the models can be further used to monitor tea product quality online during production.


Introduction
Tea is made from the buds and leaves of Camellia sinensis and is known as the second most consumed drink in the world [1]. Its production process comprises fixation, withering, rolling, fermentation, polling, drying, etc. The different kinds of tea (green, black, and white tea) differ fundamentally in their production processes (Figure 1) [2]. The two best-selling categories are green (unfermented) tea and black (fully fermented) tea, which together account for approximately 98% of the world's tea production and consumption [3].
In 2015, the Intergovernmental Group for Tea, which was established by the Food and Agricultural Organization (FAO), outlined the current market conditions and intermediate-term market prospects for teas (until 2023). The world's green tea exports are expected to grow at an annual rate of 7.1% and reach 750,981 tons by 2023. China is projected to remain dominant, with exports of 458,579 tons [4]. Tea quality is essential to its market value and is conventionally evaluated by skilled tea tasters. Quality is determined by factors such as the color, appearance, and flavor of the tea. These factors are closely related to changes in the contents of certain chemical components, including catechins, moisture, and caffeine, during the production of tea [5][6][7][8].
Among these components, catechins are considered the most health-promoting constituents of green tea, accounting for about 25% of the dry weight of fresh tea leaves [9,10]. The composition of catechins in green tea varies with light, growing location, clone, species, season, and altitude, and above all with processing technology. EGCG, a catechin present at relatively high content, is the main source of the astringency and bitterness of tea beverages and is a biologically active compound considered to be a significant quality factor in tea leaves [5,11,12].
Caffeine, which makes up about 3% of the dry weight of tea leaves, is known for its stimulating properties and is an important quality factor in tea. Together with the catechins among the tea polyphenols, caffeine contributes significantly to the flavor of tea [13,14]. The quality of tea also depends on its moisture content (Figure 2). Green tea is affected by moisture during transportation and storage. Affected green tea undergoes a series of complex biochemical reactions, such as oxidation, polymerization, and degradation, resulting in poor tea quality [15,16]. Water content is an important index of green tea quality and has an important impact on the shape, color, aroma, and taste of green tea [17]. By controlling the moisture content, the deterioration caused by changes in the physicochemical properties of tea leaves can be avoided, and the sensory properties of tea leaves can be kept fresh and stable during long-term storage [18][19][20][21][22].
To improve the efficiency of the tea production process and to develop high-quality tea products, it is important to establish a real-time and reliable analytical method for measuring these chemical compounds in tea samples.
In recent years, NIRS has been widely used as a powerful tool for a series of processes, including raw material testing, product quality control, and process monitoring [23][24][25]. NIRS is a simple, rapid, and nondestructive analytical technique. In addition, compared with other analysis techniques, it enables the analysis of complex matrices without manipulating samples, which greatly shortens the analysis time and has aroused growing interest in the pharmaceutical industry [26,27]. So far, the application of NIRS in process monitoring has been widely reported. Zuo et al. [28] explored the potential applications of NIRS in monitoring the steaming process of Gastrodiae rhizoma. To determine the changes in four chemical components and in moisture during steaming, about 10 laboratory-scale batches were used to establish quantitative models. Wu et al. [29] developed a practical model for online monitoring of the extraction process of red peony by NIRS. The established models were used to monitor the extraction process online in real time. At the same time, considering the long-term application of the developed models, a model updating method was proposed. However, in tea studies, NIRS has mostly been used to determine the content of chemical components in finished tea leaves, and few studies report the use of NIRS to monitor the tea production process [30][31][32][33]. In addition, traditional method validation was not used to check instrument accuracy and sample stability.
In this study, a method for the determination of moisture, caffeine, and EGCG in green tea production was developed by using NIRS and partial least squares regression. The root mean square error of prediction (RMSEP) and the coefficient of determination (R²) of the prediction set were used to evaluate the performance of the final models. Traditional method validation was used to assess the performance of the models in terms of instrument precision and sample stability so that the production process could be monitored in real time. This controllable and high-efficiency production technology could be used in the future as a preliminary basis for large-scale production of high-quality tea.
Figure 1: The production process of various teas.

Preparation of Reference Solutions.
Appropriate amounts of caffeine and EGCG were accurately weighed into two separate 10 mL volumetric flasks. Subsequently, methanol was added to each flask to the 10 mL mark, and the mixtures were dissolved by ultrasonication to obtain reference stock solutions of caffeine and EGCG with concentrations of 3.47 mg/mL and 5.05 mg/mL, respectively. Then, 1 mL of each reference stock solution was pipetted into a 10 mL volumetric flask. Methanol was added to the mark, and the solutions were sonicated and mixed to obtain reference solutions of 0.347 mg/mL and 0.505 mg/mL for caffeine and EGCG, respectively.
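The tenfold dilution described above can be checked with a short calculation. The following is a minimal sketch of the C1·V1 = C2·V2 relation; the function name `diluted_concentration` is an illustrative assumption and is not part of the original method.

```python
def diluted_concentration(stock_mg_per_ml, aliquot_ml, final_volume_ml):
    """Concentration after diluting an aliquot to a final volume (C1*V1 = C2*V2)."""
    if final_volume_ml <= 0 or aliquot_ml < 0:
        raise ValueError("volumes must be positive")
    return stock_mg_per_ml * aliquot_ml / final_volume_ml


# Stock concentrations and volumes as stated in the text.
caffeine_ref = diluted_concentration(3.47, 1.0, 10.0)
egcg_ref = diluted_concentration(5.05, 1.0, 10.0)
print(round(caffeine_ref, 3), round(egcg_ref, 3))
```

Under these inputs the results agree with the stated reference concentrations of 0.347 mg/mL and 0.505 mg/mL.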

Preparation of Sample Solutions.
Tea leaves from different batches were accurately weighed into a 50 mL conical flask, and 25 mL of 70% methanol solution was added. The flask was capped and weighed, and ultrasonic extraction was carried out for 30 minutes (power 500 W and frequency 45 kHz). The conical flask was allowed to cool to room temperature and weighed again, and the lost weight was made up with 70% methanol. The solution was mixed thoroughly and allowed to stand for 10 minutes, and the supernatant was passed through a 0.22 μm filter membrane to obtain the sample solution.

Determination of Moisture Content.
A 5 g sample of tea was weighed into a bottle and subjected to heat treatment at 103 ± 2°C until it reached a constant weight. The weight lost from the sample, expressed as a percentage, represents its moisture content [3].
Figure: moisture content at each production stage: tedding fresh leaves (75.4%), de-enzyming (63.9%), rolling (39.6%), and shaping and making appearance tippy (4.7%).
Journal of Analytical Methods in Chemistry
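The loss-on-drying calculation can be written as a one-line formula. The sketch below assumes that moisture is expressed relative to the initial (wet) sample mass; the function name and the example masses are hypothetical and are not measurements from this study.

```python
def moisture_percent(mass_before_g, mass_after_g):
    """Moisture content as the percentage of mass lost on drying to constant weight."""
    if mass_before_g <= 0:
        raise ValueError("initial mass must be positive")
    if not 0 <= mass_after_g <= mass_before_g:
        raise ValueError("dried mass must lie between 0 and the initial mass")
    return 100.0 * (mass_before_g - mass_after_g) / mass_before_g


# Hypothetical example: a 5.00 g sample that weighs 1.23 g after drying.
print(round(moisture_percent(5.00, 1.23), 1))
```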

Acquisition Conditions.
The ANTARIS II FT-NIR analyser was used for spectral acquisition during the entire tea production process. The acquisition conditions were as follows: detection range 4,000 to 10,000 cm⁻¹ and resolution 8 cm⁻¹. A 5 ± 0.1 g tea sample was put into the sample attachment (rotating cup), and the rotating cup was then mounted on the NIR spectrometer for detection. To eliminate background effects, air was used as the reference, and three spectra of each sample were collected. Between measurements, the cup with the tea sample was rotated by 120°. The average of these three spectra was used in the following analysis. Because relative humidity and room temperature may affect the surface moisture of tea, relative humidity was maintained at 80% and room temperature at 25°C during the collection of spectra.
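Averaging the three replicate spectra taken at different cup rotations is a pointwise mean over the wavenumber axis. The following minimal sketch represents each spectrum as a plain list of absorbance values on a shared wavenumber grid, which is an assumption about the data layout; the function name is illustrative.

```python
def average_spectra(spectra):
    """Pointwise mean of replicate spectra recorded on the same wavenumber grid."""
    if not spectra:
        raise ValueError("at least one spectrum is required")
    n_points = len(spectra[0])
    if any(len(s) != n_points for s in spectra):
        raise ValueError("all spectra must have the same number of points")
    # zip(*spectra) groups the readings taken at the same wavenumber.
    return [sum(values) / len(spectra) for values in zip(*spectra)]


# Hypothetical three-point spectra from the three cup positions.
replicates = [[0.10, 0.20, 0.30], [0.12, 0.22, 0.28], [0.11, 0.21, 0.32]]
mean_spectrum = average_spectra(replicates)
```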

Spectral Pretreatment.
TQ Analyst software was used to analyse the spectra. To improve the prediction performance of the model, it is necessary to transform the NIR spectra to remove irrelevant information and noise. Therefore, several different methods were used to transform the spectra. To correct for the optical path length, the multiplicative scattering correction (MSC) method was chosen, because the path length could not be kept constant owing to the particle size and homogeneity of the samples in this experiment; to address spectral shift, the first-derivative (FD) and second-derivative (SD) methods were chosen; and for filtering the spectra, the Savitzky-Golay (SG), no smoothing (Ns), and Norris derivative (ND) options were chosen. For the quantification of the three components, PLS regression models were developed, and their final performance was compared systematically.
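To illustrate the idea behind MSC, the sketch below regresses each spectrum on the mean spectrum and removes the fitted offset and slope. It is a generic textbook formulation in plain Python, not the TQ Analyst implementation, and the function name and example data are hypothetical.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum x is modelled as x = intercept + slope * reference,
    and the corrected spectrum is (x - intercept) / slope.
    """
    n = len(spectra)
    reference = [sum(col) / n for col in zip(*spectra)]
    ref_mean = sum(reference) / len(reference)
    sxx = sum((r - ref_mean) ** 2 for r in reference)
    if sxx == 0:
        raise ValueError("reference spectrum is flat; MSC is undefined")
    corrected = []
    for spectrum in spectra:
        s_mean = sum(spectrum) / len(spectrum)
        sxy = sum((r - ref_mean) * (v - s_mean) for r, v in zip(reference, spectrum))
        slope = sxy / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in spectrum])
    return corrected, reference


# Hypothetical data: the second spectrum is an offset, scaled copy of the first.
base = [1.0, 2.0, 4.0, 3.0]
scaled = [0.5 + 2.0 * v for v in base]
corrected, reference = msc([base, scaled])
```

In this constructed example the two spectra differ only by an additive offset and a multiplicative factor, so both corrected spectra coincide with the mean spectrum. Derivative and smoothing steps (FD, SD, SG, ND) would be applied separately.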

Division of Calibration Sets and Test Sets.
To ensure that the content range of the calibration set covered that of the test set, the contents of the different batches were sorted in descending order. In total, 96 samples were assigned to the calibration set and 60 samples to the test set.

Model Verification.
We followed the methods of Shi et al. [34]. The root mean square errors of calibration and prediction (RMSEC and RMSEP) and the coefficients of determination of calibration and prediction (RC² and RP²) were used as the evaluation indexes for selecting the best quantitative model. A model is considered high performance if it has a high R² and low cross-validation root mean square error (RMSECV) and RMSEP values.
The quality of a model depends not only on the proper selection of modelling and spectral processing methods and on performance measures such as R², RMSEC, and RMSEP; the stability of the sample and the precision of the instrument also affect the establishment of the model. Therefore, it is necessary to verify the model using traditional method validation.
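The evaluation indexes named in this section can be computed directly from paired reference and predicted values. The following is a minimal sketch using the standard definitions of root mean square error and the coefficient of determination; the exact formulas used by the modelling software may differ, and the example values are hypothetical.

```python
import math


def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(reference, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference (e.g., HPLC) and NIR-predicted contents.
ref = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(rmse(ref, pred), r_squared(ref, pred))
```

Applied to the calibration samples, `rmse` corresponds to RMSEC; applied to the test samples, it corresponds to RMSEP.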

Optimization of the Extraction Efficiency of Components to Be Tested.
Different solvents showed different extraction rates for the components to be tested. In this experiment, the extraction rates of 50% methanol, 70% methanol, 50% ethanol, 70% ethanol, and 95% ethanol were investigated for the components to be tested in the samples. As shown in Figure 4, the HPLC peak areas of each component were used to represent the extraction rates of the different solvents, and 70% methanol exhibited the highest extraction efficiency.

Cluster Analysis of Tea Samples.
Cluster analysis was performed on the Euclidean distances between samples in terms of three relevant chemical components (moisture, caffeine, and EGCG) of the tea production process. As can be seen from Figure 5, the tea samples were grouped into 4 main categories, which basically correspond to the four key processes of tea production. This indicates that the contents of moisture, caffeine, and EGCG change with the tea production process, thus demonstrating the need to use NIR spectroscopy to monitor the changes in moisture, caffeine, and EGCG contents online during tea production. It can also be seen in Figure 6(b) that the peak at 5323 cm⁻¹ in the spectrum is attributed to the second overtone vibration of the carbonyl group, with further peaks assigned to C-H (7212 cm⁻¹), -CH2 (5742 cm⁻¹), and -CH3 (5808 cm⁻¹). The overtone, vibration, and stretching bands of these groups are all related to the chemical structures of the components to be tested: caffeine, EGCG, and moisture [13].

Reference Data Description.
To build the PLS models, 156 samples were selected in the experiment. To avoid subset selection bias, we used the Kennard-Stone (K-S) algorithm, which covers the sample space uniformly by maximizing the Euclidean distances between selected and remaining objects. Finally, we selected 96 spectra (about 62% of the total) as the training set and the remaining 60 spectra as the test set. As seen from Tables 1 and 2, the range of the training set contains the range of the test set. Therefore, the choice of samples is appropriate.
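The Kennard-Stone selection can be sketched in a few lines. The version below is a generic formulation in plain Python (start from the two most distant samples, then repeatedly add the sample whose nearest selected neighbour is farthest away); it is not the code used in the study, and the function name and toy data are hypothetical.

```python
import math


def kennard_stone(points, n_select):
    """Return (selected, remaining) index lists chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Seed the selection with the two mutually most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in (first, second)]
    while len(selected) < n_select:
        # Pick the candidate whose closest selected sample is farthest away.
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


# Toy one-dimensional example; real inputs would be spectra or score vectors.
train_idx, test_idx = kennard_stone([(0.0,), (1.0,), (2.0,), (10.0,), (5.0,)], 3)
```

For the data set described here, the call would use `n_select=96` on the 156 samples, leaving 60 for the test set.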

Quantitative Analysis of the PLS Models.
Owing to the influence of various factors, the original spectrum contains not only the near-infrared spectral information of the sample but also noise. Therefore, it is necessary to preprocess the spectral data by MSC + SD/FD + Ns/SG/ND. In building the PLS models, the data were mean-centred. The performance of the final PLS models was evaluated by RMSECV, RMSEP, and R².
It is well known that the two most critical parameters in PLS modelling and optimisation are the number of PLS factors and the spectral preprocessing method. The optimum number of factors is determined by the minimum RMSECV. Table 3 shows the results of the calibration models with various spectral pretreatment methods for determining the contents of caffeine, moisture, and EGCG. For caffeine, the pretreatment with the lowest RMSECV was the combination MSC + FD + ND, at 0.14580%, but this model used 9 PLS factors. An excessive number of PLS factors can capture sample-specific information during modelling, resulting in worse generalization performance of the PLS model; this phenomenon is known as overfitting. Therefore, the combination MSC + FD + Ns, with an RMSECV of 0.16092%, was selected for caffeine. For moisture content, the RMSECV after MSC + FD + Ns pretreatment was the lowest, at 5.68517%, and for EGCG, the RMSECV after MSC + SD + ND pretreatment was the lowest, at 0.31884%. Table 4 shows the statistics of the best calibration model for the three indicators.
Scatter plots of the partial least squares quantitative models for moisture, caffeine, and EGCG are shown in Figures 7(a)-7(c), respectively. The circles and plus signs in Figure 7 represent the data in the training group and the test group, respectively. The training group reference data were regressed against the training group spectra by partial least squares, and a good linear relationship was obtained for all three components. As seen in Figure 7(a), the RMSEP value of moisture was 5.600, and the correlation coefficients were 0.9812 and 0.9796 for the training and test groups, respectively; as seen in Figure 7 ... training and test groups, respectively; and as seen in Figure 7(c), the RMSEP value of EGCG was 0.395, and the correlation coefficients were 0.9885 and 0.9812 for the training and test groups, respectively. The predictive ability of a quantitative analytical model is directly related to the choice of the number of principal factors. Figures 8(a)-8(c) show the variation of RMSECV values with the number of principal factors for moisture, caffeine, and EGCG, respectively. From Figure 8(a), the optimal number of principal factors is 4 for the moisture model; from Figure 8(b), it is 7 for the caffeine model; and from Figure 8(c), it is 5 for the EGCG model.
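The factor-selection reasoning in this section (minimum RMSECV, tempered by a preference for fewer factors to limit overfitting) can be expressed as a small rule. The sketch below is one possible formalization and is an assumption: the `tolerance` parameter and the RMSECV values are hypothetical and do not come from Table 3 or Figure 8.

```python
def choose_factors(rmsecv_by_factors, tolerance=0.0):
    """Smallest factor count whose RMSECV is within `tolerance` of the minimum.

    tolerance=0.0 reproduces the plain minimum-RMSECV rule; a small positive
    tolerance (e.g., 0.05 for 5%) favours more parsimonious models.
    """
    if not rmsecv_by_factors:
        raise ValueError("no RMSECV values supplied")
    best = min(rmsecv_by_factors.values())
    limit = best * (1.0 + tolerance)
    return min(k for k, v in rmsecv_by_factors.items() if v <= limit)


# Hypothetical cross-validation curve: number of PLS factors -> RMSECV.
curve = {1: 0.90, 2: 0.55, 3: 0.35, 4: 0.30, 5: 0.29, 6: 0.31}
print(choose_factors(curve), choose_factors(curve, tolerance=0.05))
```

With this hypothetical curve, the plain rule picks 5 factors, while a 5% tolerance picks 4, trading a slightly higher RMSECV for a simpler model.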

Method Validation.
After the establishment of the NIR quantitative model, it is necessary to evaluate the applicability of this model. Generally, the prediction results of the NIR method and the measurement results of the reference method are analyzed statistically to judge the performance of the established model. To show that the method is suitable for the corresponding testing requirements, in addition to the abovementioned evaluation indicators (R², RMSEC, and RMSEP), this experiment verifies the method in terms of linearity, accuracy, repeatability, intermediate precision, and robustness according to the guidelines of the International Conference on Harmonization (ICH) [35].
Linearity.
Linearity refers to the degree to which the detection results are directly proportional to the concentration (amount) of analyte in the test product, and it is the basis of quantitative determination. Because the NIR spectrometric method is an indirect analysis method established by multiple linear regression, its linearity study differs from that of traditional analysis methods. It can be evaluated by analyzing the relationship between the NIR predicted value and the true value of the analyte. The linear parameters are reported in Table 5. The slope and intercept of the linear parameters directly affect the prediction error of the model: when the intercept is 0 and the slope is 1, the total error of the model is 0. Therefore, it can be concluded from the regression equations that all the NIR quantitative models established in this experiment have good linearity.
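The slope and intercept used to judge linearity come from an ordinary least-squares fit of NIR predicted values against reference values. A minimal sketch follows; the function name and example data are hypothetical, and the ideal outcome is a slope near 1 with an intercept near 0.

```python
def fit_line(reference, predicted):
    """Least-squares slope and intercept of predicted = slope * reference + intercept."""
    n = len(reference)
    if n < 2 or n != len(predicted):
        raise ValueError("need at least two paired observations")
    mean_x = sum(reference) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    if sxx == 0:
        raise ValueError("reference values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, predicted))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data with a constant bias of +0.1 and no proportional error.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 3.1, 4.1])
```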

Accuracy.
The accuracy of the method is evaluated mainly by comparing the NIR predicted values of the analytes with the reference values; a high degree of agreement indicates high accuracy. The accuracy was evaluated by a paired t-test between the NIR predicted values of the prediction set samples and the reference values, and the results are shown in Table 5. The t-test results showed that the P values of the three models for caffeine, moisture content, and EGCG were all greater than 0.05, indicating that there was no significant difference between the predicted values of the three NIR quantitative models and the reference values, and that the accuracy was good.
Repeatability.
According to the ICH guidelines, repeatability refers to the precision of the method over a short period of time under the same operating conditions. In this experiment, the same operator measured the NIR spectrum of the same sample 6 times on the same day to investigate the repeatability of the method. The results are shown in Table 5. The RSDs of the six predicted values of the three NIR quantitative models were 2.699%, 3.654%, and 1.459%, respectively, indicating good repeatability.
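The paired t-test compares the mean of the per-sample differences with zero. The sketch below computes only the t statistic and its degrees of freedom with the standard library; obtaining the P value requires a t-distribution table or a statistics package, which is outside this sketch. The function name and data are hypothetical.

```python
import math
import statistics


def paired_t_statistic(predicted, reference):
    """t statistic and degrees of freedom for a paired t-test on the differences."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [p - r for p, r in zip(predicted, reference)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical NIR-predicted and reference contents for four samples.
t_value, dof = paired_t_statistic([2.0, 3.0, 4.0, 5.0], [1.0, 3.0, 3.0, 5.0])
```

A |t| below the critical value for the given degrees of freedom at the 0.05 level corresponds to P > 0.05, that is, no significant difference between the two methods.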
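The relative standard deviation (RSD) reported for repeatability is the sample standard deviation divided by the mean, expressed as a percentage. A minimal sketch follows; the six values are hypothetical and are not the measurements behind Table 5.

```python
import statistics


def rsd_percent(values):
    """Relative standard deviation (%) of replicate predictions."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("RSD is undefined for a zero mean")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical six replicate NIR predictions of one sample.
replicates = [3.02, 2.95, 3.10, 2.98, 3.05, 3.00]
print(round(rsd_percent(replicates), 3))
```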

Intermediate Precision.
Intermediate precision considers the effect of random factors on the analytical results within the same laboratory. Because different operators and operating times can cause differences in measurement results, two operators in this experiment measured the same sample on three consecutive days to investigate the intermediate precision of the method. The results are shown in Tables 5 and 6. According to ANOVA, there was no significant difference between the NIR results measured by different operators and at different operating times (P values were all greater than 0.05), indicating that the intermediate precision of the three NIR quantitative models was good.

Robustness.
Robustness examines the reliability of the NIR quantitative model in the normal course of application of the method. The contents of each component determined by HPLC and predicted by NIR were compared by the paired t-test, as shown in Table 7. Before the paired t-test, the F-test was used to compare the variances of the two methods to determine whether they differed significantly, and the results showed that the experimental statistic was lower than the critical value (for a significance level of 0.05). Therefore, it can be concluded that there is no significant difference in standard deviations between the two methods for the sample set.
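The F-test that precedes the paired t-test compares the variances of the two methods. The sketch below computes the variance ratio (larger over smaller) and the matching degrees of freedom; the critical value at the 0.05 level has to be taken from an F table or a statistics package. The function name and data are hypothetical.

```python
import statistics


def f_statistic(values_a, values_b):
    """Variance ratio F >= 1 with (numerator, denominator) degrees of freedom."""
    var_a = statistics.variance(values_a)
    var_b = statistics.variance(values_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("both samples must have non-zero variance")
    if var_a >= var_b:
        return var_a / var_b, len(values_a) - 1, len(values_b) - 1
    return var_b / var_a, len(values_b) - 1, len(values_a) - 1


# Hypothetical HPLC and NIR results for the same three samples.
f_value, df_num, df_den = f_statistic([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

If the computed F is below the critical value, the variances are treated as not significantly different, which matches the conclusion drawn in the text.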

Conclusions
This study assessed the practicability of NIR spectroscopy in the quality control of green tea. First, reliable NIR quantitative models of caffeine, moisture, and EGCG in the production process were established, and their robustness was validated. Then, the feasibility of the proposed method was verified by traditional method validation, including the precision of the instrument and the stability of the sample. It was further demonstrated that the models could be applied to monitoring tea product quality online during production. In addition, compared with the traditional HPLC method, this technology is fast and nondestructive and has clear advantages in terms of the sample volume, number of steps, and time required (see Table 8), which is beneficial for automated plants producing high-quality tea. In summary, this study has shown that real-time, online monitoring of the green tea production process in an automated plant is feasible using NIR combined with PLS. However, further optimisation of the prediction models requires a larger sample set to develop more robust models.

Data Availability
The data used to support the results of this study are consistent with the data in this paper. Any further information is available from the authors upon request.

Conflicts of Interest
The authors declare no conflicts of interest.