Rapid Recognizing the Producing Area of a Tobacco Leaf Using Near-Infrared Technology and a Multi-Layer Extreme Learning Machine Algorithm

A novel recognition method is proposed to identify the producing areas of flue-cured tobacco leaves rapidly and non-destructively using a near-infrared (NIR) spectrometer and a multi-layer extreme learning machine (ML-ELM) algorithm. Compared with the traditional linear discriminant analysis (LDA) and extreme learning machine (ELM) algorithms, the proposed ML-ELM algorithm achieved the highest accuracy, sensitivity and specificity. The ML-ELM models for different producing areas of Yunnan tobacco leaves had the best generalization ability and prediction results. In addition, the three algorithms were also evaluated using chemical index data. The experimental results indicated that ML-ELM achieved the best prediction performance among the three algorithms with both the NIR spectral data and the chemical index data. This indicates that the combination of NIR and ML-ELM can recognize different producing areas of Yunnan tobacco leaves rapidly, accurately, and non-destructively.


Introduction
Tobacco is a crop of high economic value in China. The quality of cigarette products is significantly affected by the intrinsic attributes of the tobacco leaf itself. Classifying tobacco leaves by producing area and quality level plays a crucial role in the quality management of the final cigarette products. At present, recognition of the producing area of a tobacco leaf depends mainly on chemical analysis and human sensory evaluation. 1,2 Chemical analysis is expensive and time-consuming, and it cannot keep pace with tobacco production and processing. Meanwhile, the reliability of human sensory evaluation is not quantified, and the results are sometimes subjective. 3 Therefore, it is important and necessary to develop a new method that is rapid, cheap, highly efficient and more objective.
As a useful analytical chemistry tool, near-infrared (NIR) spectroscopy is non-destructive, cheap, accurate, and fast. 4,5 It has been widely used in the fields of agriculture, 6 medicine, 7 food, [8][9][10] traditional Chinese medicine [11][12][13] and so on. In previous research, 14 tobacco leaves from different producing areas were classified by artificial neural networks together with NIR spectroscopy. Pattern recognition for tobacco leaves of different producing areas, positions and levels was carried out with the Mahalanobis distance criterion based on principal components of the NIR spectra of the leaves. 15 Du et al. 16 built 115 models of tobacco leaves of different producing areas, levels and varieties using the soft independent modeling of class analogy method and NIR. In addition, NIR with least-squares support vector machines was applied to determine the producing areas of tobacco leaves. 17 The reports cited above focused mainly on recognizing the producing areas of tobacco leaves planted in different provinces of China. Few studies have addressed the recognition of producing areas of tobacco leaves cultivated in different cities of a single province, especially Yunnan Province. As the largest tobacco leaf planting area in China, Yunnan Province accounted for 45% of China's total tobacco leaf production in 2020. In fact, the chemical and style characteristics of the tobacco leaves differ considerably among the cities of Yunnan Province, as do climate and altitude. Therefore, it is also necessary to determine the differences in the chemical and style characteristics of tobacco leaves from different cities using a rapid method.
The extreme learning machine (ELM), put forward by Huang et al., 18 has been widely used in classification, 19 regression, 20 feature selection 21 and so on. However, ELM still has some key problems to be solved, especially in processing high-dimensional spectral data. Deep learning can be used to analyze the characteristics of the spectral data. The multi-layer extreme learning machine (ML-ELM) is an unsupervised learning method that combines deep learning and the extreme learning machine. Its learning process proceeds layer by layer. Compared with the traditional ELM algorithm, the ML-ELM algorithm can not only obtain the essential features of the original spectral data but also reduce the dimensionality. Meanwhile, as a kind of deep neural network, the ML-ELM algorithm can approximate complicated functions and does not need iteration when building the calibration model. Compared with the traditional ELM algorithm and other machine learning algorithms, the ML-ELM algorithm offers better generalization performance and speed. It therefore has several advantages over the ELM algorithm in processing spectral data. ML-ELM has already been used in image recognition, hyperspectral data classification, speech recognition and so on. 22 However, there are few applications in the classification of NIR spectral data. Given the above discussion, ML-ELM is well suited to the processing of NIR spectral data.
In this study, a novel classification method using NIR technology and the ML-ELM algorithm was put forward to recognize the producing area of flue-cured tobacco leaves rapidly and non-destructively. The experimental results showed that the combination of NIR spectroscopy and the ML-ELM algorithm is a promising tool for identifying the different producing areas of Yunnan tobacco leaves accurately and non-destructively.

Sample preparation and test
The NIR spectrometer was first preheated for one hour. The tobacco leaves were then scanned after the spectrometer had passed its performance tests. For scanning, all tobacco leaves were ground into powder, which was placed in a rotating sample cup. The absorbance spectra of the tobacco leaf samples were acquired with a NIR-Antaris II spectrometer (Thermo Fisher Scientific, Massachusetts, USA). The wavenumber range was 10,000-4000 cm⁻¹. The spectral resolution was 4 cm⁻¹ and 64 scans were co-added. A polytetrafluoroethylene (PTFE) background disc was used as the spectral reference. Three spectra were recorded for each sample, and their mean was taken as the final spectrum of the sample. 23,24 Two experimental sets, of classes C1F and C2F, from different producing areas were used in this work. The first set (C1F) has 501 samples, harvested in 2019 from Jinggu, Yaoan, Xinping and Luliang cities, Yunnan Province, China. The second set (C2F) has 643 samples, also harvested in 2019, from 4 different producing areas: Xuanwei, Luxi, Jingdong and Malong cities, Yunnan Province, China. The distribution of the 8 producing areas is shown in Figure 1. It can be seen from Figure 1 that some of the 8 producing areas are located very close to each other. As a result, it may be difficult to recognize the producing areas of the tobacco leaves. The samples of each set were randomly divided into 3 parts: calibration, validation and testing samples. For data set 1, 345 samples were chosen as the calibration set, 89 as the validation set and 67 as the test set. For data set 2, 430 samples were chosen as the calibration set, 129 as the validation set and 84 as the test set.
Details of data sets 1 and 2 are given in Table 1. The NIR spectral data of the two sets collected by the NIR spectrometer are shown in Figure 2.
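The replicate averaging and the random three-way split described above can be sketched as follows. This is only an illustrative sketch in Python with NumPy (the study itself used MATLAB); the function names `average_replicates` and `random_split` and the fixed seed are assumptions made here, not part of the original work.

```python
import numpy as np

def average_replicates(scans):
    """Average replicate scans: (n_samples, n_replicates, n_points) -> (n_samples, n_points)."""
    return np.asarray(scans, dtype=float).mean(axis=1)

def random_split(n_samples, n_cal, n_val, n_test, seed=0):
    """Return disjoint, randomly chosen index arrays for calibration, validation and test."""
    if n_cal + n_val + n_test != n_samples:
        raise ValueError("subset sizes must add up to n_samples")
    order = np.random.default_rng(seed).permutation(n_samples)
    return order[:n_cal], order[n_cal:n_cal + n_val], order[n_cal + n_val:]

# Data set 1 (C1F): 501 samples -> 345 calibration / 89 validation / 67 test
cal1, val1, test1 = random_split(501, 345, 89, 67)
# Data set 2 (C2F): 643 samples -> 430 calibration / 129 validation / 84 test
cal2, val2, test2 = random_split(643, 430, 129, 84)

# Hypothetical example: 4 samples, 3 replicate scans each, 5 spectral points
demo_scans = np.arange(60, dtype=float).reshape(4, 3, 5)
demo_mean = average_replicates(demo_scans)
```

Fixing the seed makes a split reproducible, which helps when several algorithms must be compared on identical calibration and test samples.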
The contents of total sugar, reducing sugar, potassium, total plant alkali, chlorine, and total nitrogen are 6 routine chemical indexes of a tobacco leaf. The values of these 6 indexes reflect the quality of a tobacco leaf to some degree, and they have also been used for recognizing the producing areas of tobacco leaves in previous research. 15,16 Therefore, the 6 routine chemical indexes of all the tobacco leaf samples were also determined by the continuous flow analytical method with a Skalar SANPWS flow analyzer (Breda, Netherlands). 25 Table 2 shows the average values and standard deviations of the 6 routine chemical indexes of tobacco leaves in the 8 producing areas. It can be seen from Table 2 that the 6 routine chemical indexes differed among producing areas even though some of the areas are close to each other. For example, the maximum average values of total sugar, potassium, reducing sugar, total plant alkali, chlorine, and total nitrogen were 20.29%, 52.03%, 55.27%, 56.41%, 222.88%, and 15.50% higher, respectively, than the corresponding minimum average values among the 8 producing areas. The differences in the 6 routine chemical indexes among producing areas are therefore substantial, and it is necessary to recognize the producing areas of tobacco leaves from different cities of Yunnan Province.

Theory of ELM algorithm
The ELM algorithm was put forward by Huang et al. 18 The hidden nodes of ELM are usually generated randomly. If the input data are mapped to an L-dimensional ELM random feature space, the network output can be defined as equation 1:

$$f_L(x) = \sum_{i=1}^{L} \beta_i g_i(x) = h(x)\beta \tag{1}$$

where $\beta = [\beta_1, \ldots, \beta_L]^T$ are the output weights, $h(x) = [g_1(x), \ldots, g_L(x)]$ are the hidden node outputs and $g_i(x)$ is the output of the i-th hidden node. Given N training samples $\{(x_i, t_i)\}_{i=1}^{N}$, ELM is to resolve the following learning problem:

$$H\beta = T \tag{2}$$

where $T = [t_1, \ldots, t_N]^T$ are the target labels and $H = [h^T(x_1), \ldots, h^T(x_N)]^T$. The output weights $\beta$ can be calculated by equation 3:

$$\beta = H^{\dagger} T \tag{3}$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of matrix $H$.
To make the solution more robust and to improve its generalization performance, a regularization term is added, as shown in equation 4:

$$\beta = \left(\frac{I}{C} + H^T H\right)^{-1} H^T T \tag{4}$$

where $C$ is the regularization coefficient; its value is assigned randomly after an appropriate number of hidden layers has been set.
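A minimal sketch of a regularized ELM classifier in Python with NumPy is given here for illustration. It assumes sigmoid hidden nodes with input weights and biases drawn uniformly from [-1, 1], one-hot target labels, and the regularized solution β = (I/C + HᵀH)⁻¹HᵀT; the function names, the toy data and the parameter values are hypothetical and are not taken from the original MATLAB implementation.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable form of 1 / (1 + exp(-z))
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def elm_fit(X, T, n_hidden, C=1.0, seed=0):
    """Fit an ELM: random hidden layer, output weights by regularized least squares."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = sigmoid(X @ W + b)                                   # hidden outputs, N x L
    # beta = (I/C + H^T H)^{-1} H^T T, solved without forming the inverse
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ T)
    return W, b, beta

def elm_predict(X, model):
    W, b, beta = model
    return sigmoid(X @ W + b) @ beta

# Toy two-class data (assumption: two well-separated Gaussian blobs)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, (40, 2)), rng.normal(2.0, 0.5, (40, 2))])
y = np.repeat([0, 1], 40)
T = np.eye(2)[y]                                             # one-hot targets

model = elm_fit(X, T, n_hidden=20, C=100.0)
train_accuracy = np.mean(elm_predict(X, model).argmax(axis=1) == y)
```

Because only the output weights are solved for, training reduces to a single linear solve, which is why no iteration is needed.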

Theory of ML-ELM algorithm
Multi-layer neural networks perform poorly when trained with backpropagation (BP) only. Hence, the hidden layer weights in a deep network are initialized using layer-wise unsupervised training, and the whole neural network is then fine-tuned using BP. Similarly to deep networks, the weights of each ML-ELM hidden layer are initialized using an extreme learning machine autoencoder (ELM-AE), which performs layer-wise unsupervised training. However, in contrast to deep networks, ML-ELM does not require fine-tuning.
The activation functions of the ML-ELM hidden layers can be either linear or nonlinear piecewise. If the number of nodes $L_k$ in the k-th hidden layer is equal to the number of nodes $L_{k-1}$ in the (k-1)-th hidden layer, g can be linear; otherwise, g can be nonlinear piecewise, e.g., a sigmoidal function:

$$H^k = g\left((\beta^k)^T H^{k-1}\right) \tag{5}$$

where $H^k$ represents the outputs of the k-th hidden layer of ML-ELM. If k-1 = 0, that layer is the input layer, and $H^{k-1}$ represents the inputs of ML-ELM. $\beta^k$ represents the output weights of the ELM-AE, whose inputs are $H^{k-1}$ at this stage. The output weights $\beta^k$ can be calculated analytically using regularized least squares.
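The layer-wise procedure can be illustrated with a short Python/NumPy sketch. It is a simplified reading under stated assumptions, not the authors' implementation: each ELM-AE uses random orthogonal hidden weights and a sigmoid activation, samples are stored as rows (so a layer output is computed as g(H βᵀ)), a layer is linear when its width equals that of the previous layer, and the final classifier is fitted by regularized least squares on one-hot targets. All function names, the toy data and the parameter values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 0.5 * (1.0 + np.tanh(0.5 * z))      # stable form of 1 / (1 + exp(-z))

def ridge_solve(H, Y, C):
    """Regularized least squares: (I/C + H^T H)^{-1} H^T Y."""
    return np.linalg.solve(np.eye(H.shape[1]) / C + H.T @ H, H.T @ Y)

def random_orthogonal(rng, n_in, n_hidden):
    """Random (n_in, n_hidden) matrix with orthonormal columns or rows."""
    M = rng.standard_normal((n_in, n_hidden))
    if n_in >= n_hidden:
        Q, _ = np.linalg.qr(M)                 # orthonormal columns
        return Q
    Q, _ = np.linalg.qr(M.T)                   # (n_hidden, n_in), orthonormal columns
    return Q.T

def elm_ae_weights(X, n_hidden, C, rng):
    """ELM autoencoder: learn beta (n_hidden x n_in) that reconstructs X from random features."""
    A = random_orthogonal(rng, X.shape[1], n_hidden)
    b = rng.standard_normal(n_hidden)
    b /= np.linalg.norm(b)
    return ridge_solve(sigmoid(X @ A + b), X, C)

def forward(X, layers):
    H = X
    for beta, linear in layers:
        Z = H @ beta.T                         # row-sample form of (beta^k)^T H^(k-1)
        H = Z if linear else sigmoid(Z)
    return H

def ml_elm_fit(X, T, hidden_sizes, C=1e3, seed=0):
    rng = np.random.default_rng(seed)
    layers, H = [], X
    for n_hidden in hidden_sizes:
        beta = elm_ae_weights(H, n_hidden, C, rng)
        linear = n_hidden == H.shape[1]        # linear g when layer widths match
        layers.append((beta, linear))
        H = forward(H, [layers[-1]])
    return layers, ridge_solve(H, T, C)        # no iterative fine-tuning

def ml_elm_predict(X, model):
    layers, W_out = model
    return forward(X, layers) @ W_out

# Toy data standing in for six PCA scores per sample (assumption)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.5, 0.5, (40, 6)), rng.normal(1.5, 0.5, (40, 6))])
y = np.repeat([0, 1], 40)
model = ml_elm_fit(X, np.eye(2)[y], hidden_sizes=(10, 10, 10))
train_accuracy = np.mean(ml_elm_predict(X, model).argmax(axis=1) == y)
```

Every weight matrix is obtained from a closed-form linear solve, so the sketch reflects the property that ML-ELM needs neither backpropagation nor fine-tuning.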
The flow chart for recognizing the producing area of a flue-cured tobacco leaf with the ML-ELM algorithm is shown in Figure 3. The calibration and test samples were treated in sequence by spectral pre-processing, feature extraction using principal component analysis (PCA), parameter determination and ML-ELM classification.

Measures of classification performance
A confusion matrix is a concept from machine learning that contains information about the actual and predicted classifications made by a classification system. It has two dimensions: one is indexed by the actual class of an object, and the other by the class that the classifier predicts. Figure 4 presents the basic form of a confusion matrix for a classification task. A number of measures of classification performance can be defined based on the confusion matrix. Some common measures are given as follows.
Accuracy is the proportion of the total number of predictions that were correct:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$

where TP is true positive, TN is true negative, FP is false positive, and FN is false negative.
Sensitivity is a measure of the ability of a prediction model to select instances of a certain class from a data set, which is defined by the formula:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{7}$$

Specificity is the proportion of actual negatives that were correctly identified:

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{8}$$
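For a multi-class problem such as four producing areas, these measures are typically computed per class in a one-vs-rest manner from the confusion matrix. The following Python/NumPy sketch shows one way to do so; the function names and the small label vectors are hypothetical and serve only as a worked example.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are indexed by the actual class, columns by the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def one_vs_rest_metrics(cm, k):
    """Accuracy, sensitivity and specificity for class k treated as the positive class."""
    tp = cm[k, k]
    fn = cm[k].sum() - tp          # actual k, predicted as another class
    fp = cm[:, k].sum() - tp       # actual other class, predicted as k
    tn = cm.sum() - tp - fn - fp
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical labels for three classes
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
metrics_class0 = one_vs_rest_metrics(cm, 0)
# For class 0: TP = 2, FN = 1, FP = 1, TN = 4
```

In this example, class 0 has an accuracy of 6/8, a sensitivity of 2/3 and a specificity of 4/5.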

Results and Discussion
Following the pre-processing and PCA procedures used in previous reports, [23][24][25] Savitzky-Golay derivative pre-processing was first applied to the spectral data for smoothing and noise removal. 23 A first derivative with an 11-point smoothing window and a second-order polynomial was chosen. Figure 5 shows the spectra after pre-processing. The band assignments are: total sugar, 5050, 5200, and 7194 cm⁻¹; potassium, 5050 and 5200 cm⁻¹; reducing sugar, 4194, 4444, and 4789 cm⁻¹; total plant alkali, 4444 and 4664 cm⁻¹; chlorine, 4194, 4789, and 5200 cm⁻¹; total nitrogen, 4789 and 7194 cm⁻¹. It can also be seen from Figure 5 that the resolution of the spectral data was improved by the pre-processing. 23,24 However, the dimensionality of the spectral data was still very high after pre-processing. Therefore, PCA was used to reduce the dimensionality of the data. 25 Mean centering was applied in the PCA. The PCA results showed that the first six principal components cumulatively accounted for 99.32% and 99.78% of the variance for data sets 1 and 2, respectively. This means that the first six principal components contained the vast majority of the information in both data sets.
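The pre-processing chain described above (Savitzky-Golay first derivative with an 11-point window and a second-order polynomial, followed by mean-centered PCA with six components) can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the original MATLAB code: the function names are hypothetical, the spectra are synthetic, the derivative is taken per spectral point (`delta=1.0`), and the five points at each edge of a spectrum are simply dropped instead of being extrapolated.

```python
import numpy as np

def savgol_first_derivative(spectra, window=11, polyorder=2, delta=1.0):
    """Savitzky-Golay first derivative of each row; output has window - 1 fewer points."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    A = np.vander(offsets, polyorder + 1, increasing=True)   # columns: 1, u, u^2
    # Row 1 of the pseudo-inverse gives the fitted slope at the window center
    coeffs = np.linalg.pinv(A)[1] / delta
    # Convolving with the reversed kernel applies the coefficients as a sliding dot product
    return np.array([np.convolve(s, coeffs[::-1], mode="valid")
                     for s in np.atleast_2d(spectra)])

def pca_scores(X, n_components=6):
    """Mean-centered PCA via SVD: scores, loadings and cumulative explained variance."""
    Xc = X - X.mean(axis=0)                                  # mean centering
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = U[:, :n_components] * S[:n_components]
    return scores, Vt[:n_components], float(explained[:n_components].sum())

# Synthetic "spectra": 20 samples x 200 points built from 3 latent components (assumption)
rng = np.random.default_rng(0)
spectra = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 200))
derivatives = savgol_first_derivative(spectra)
scores, loadings, cumulative = pca_scores(derivatives, n_components=6)
```

The scores (one row per sample, six columns) are what a classifier would then receive as input; the cumulative value plays the role of the 99.32% and 99.78% figures reported for the real data.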
Then, the linear discriminant analysis (LDA), ELM, and ML-ELM algorithms were applied to the loadings and scores obtained by PCA in order to identify the producing areas of the tobacco leaves from their spectral data. To ensure a fair comparison and to avoid randomness in the test results, all calibration and testing samples were randomly chosen, and the three algorithms were run on the same calibration and test splits in each calculation. For ML-ELM, the number of layers is an important parameter. Therefore, the first task was to determine the number of hidden layers of the ML-ELM algorithm so as to achieve better performance with fewer parameters. Here, the sigmoid function was set as the activation function and the number of hidden nodes was set as 10 and 500. It can be seen from Figure 6 that the overall accuracies for both data sets first increased and then decreased as the number of hidden layers increased. The accuracy was highest when the number of hidden layers was 3. Three hidden layers were therefore used for the ML-ELM algorithm in the following experiments.
As mentioned above, the samples were randomly divided into calibration, validation and testing sets. Accuracy, sensitivity, and specificity were used to evaluate the calibration models, the validation results and the testing results for each producing area and each algorithm. To reflect the performance of the different predictors faithfully and to avoid overfitting, the experiment was performed and verified using ten-fold cross-validation, which means that each of the three algorithms was run 10 times. For comparison, the performance of the LDA, ELM and ML-ELM algorithms is shown in Tables 3 and 4 for data sets 1 and 2 in the form of a confusion matrix. As shown in Tables 3 and 4, the accuracy, sensitivity and specificity of the ML-ELM algorithm were the highest for the calibration, validation and prediction samples compared with the LDA and ELM algorithms. These results show that the ML-ELM algorithm performed best in building calibration models for the different producing areas of Yunnan tobacco leaves with NIR spectral data. In addition, the calibration models built by the ML-ELM algorithm had better prediction performance than those built by the LDA and ELM algorithms. A possible reason is that LDA classified the spectral data using the minimum Euclidean distance, which is ineffective when the dimensionality of the spectral data is high. For the ELM algorithm, the number of hidden nodes was set randomly. In contrast, the ML-ELM classification algorithm selected the best number of hidden nodes through unsupervised learning and thus learned more abstract features of the NIR spectral data. To verify the experimental results described above, the producing areas of the tobacco leaves were also recognized using the 6 routine chemical indexes.
The classification and cross-validation methods were the same as in the above experiment using NIR spectral data. The experimental results with chemical indexes are shown in the last three columns of Table 4. The results showed that the accuracy, sensitivity and specificity of the ML-ELM algorithm were the highest among the three algorithms when using chemical indexes. However, the evaluation results of the ML-ELM algorithm using chemical index data were much lower than those using NIR spectral data for each producing area. In addition, NIR spectroscopy is low-cost and effective compared with the chemical index detection method. Considering the experimental results described above, NIR spectroscopy together with the ML-ELM algorithm could be the most effective tool for recognizing different producing areas of tobacco leaves among all the methods tested.
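A generic ten-fold cross-validation loop is sketched below in Python with NumPy. The text does not specify exactly how the folds were formed or how they relate to the calibration, validation and test sets, so this is only one plausible reading: the samples are shuffled, split into 10 folds, and each fold is held out once. The function names, the toy data and the nearest-centroid classifier used as a stand-in for LDA, ELM or ML-ELM are all assumptions.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, held_out_idx) pairs; every sample is held out exactly once."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def cross_validate(fit_predict, X, y, k=10, seed=0):
    """Mean held-out accuracy of fit_predict(X_train, y_train, X_test) over k folds."""
    accuracies = []
    for train, held_out in k_fold_indices(len(y), k, seed):
        y_hat = fit_predict(X[train], y[train], X[held_out])
        accuracies.append(np.mean(y_hat == y[held_out]))
    return float(np.mean(accuracies))

def nearest_centroid(X_train, y_train, X_test):
    """Placeholder classifier: assign each test sample to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

# Toy data: two well-separated classes with six features each (assumption)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.5, 0.5, (50, 6)), rng.normal(1.5, 0.5, (50, 6))])
y = np.repeat([0, 1], 50)
cv_accuracy = cross_validate(nearest_centroid, X, y, k=10)
```

Passing the same `seed` for every algorithm keeps the folds identical across methods, which matches the requirement that all algorithms be compared on the same splits.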
The average elapsed execution time is commonly used to estimate the performance of an algorithm. Here, the average elapsed execution time covers both calibration model building and prediction. All experiments were performed on the same computer, with a Core(TM) i7-8700h CPU at 3.20 GHz, 8 GB RAM and the Windows 7 Professional operating system. All algorithms were implemented in MATLAB. 26 Figure 7 shows the computing times of the LDA, ELM, and ML-ELM algorithms on the two data sets using NIR spectral data. It is evident that the ELM and ML-ELM algorithms were much more efficient than the LDA algorithm. Although the computing time of the ML-ELM algorithm was slightly longer than that of the ELM algorithm, ML-ELM remained the best option for classifying the NIR spectral data of tobacco leaves from different producing areas when classification accuracy is taken into account.

Conclusions
This study proposed a novel method using NIR spectroscopy together with the ML-ELM algorithm to identify the different producing areas of tobacco leaves cultivated in Yunnan Province. The results showed that the proposed method is an alternative strategy for discriminating different producing areas of tobacco leaves rapidly, accurately, and non-destructively. In addition, the ML-ELM algorithm performed much better than the traditional LDA and ELM algorithms on both NIR spectral data and chemical index data. The results indicate that NIR spectroscopy together with the ML-ELM algorithm could be useful for determining the plantation areas of Yunnan tobacco leaves.