Rapid Recognition of Different Sources of Heroin Drugs by Using a Hand-Held Near-Infrared Spectrometer Based on a Multi-Layer Extreme Learning Machine Algorithm

Rapid recognition of the sources of drugs can provide valuable clues and a basis for determining the nature of a case. A novel recognition method is proposed to identify the sources of heroin drugs rapidly and non-destructively by using a hand-held near-infrared (NIR) spectrometer and a multi-layer extreme learning machine (ML-ELM) algorithm. Compared with the traditional linear discriminant analysis (LDA), support vector machine (SVM) and extreme learning machine (ELM) algorithms, the proposed ML-ELM algorithm gave the highest accuracy, sensitivity and specificity. For the 4 cases, the prediction accuracy of the ML-ELM algorithm was 25.33%, 20.00% and 17.33% higher than that of the LDA, SVM and ELM algorithms, respectively. The ML-ELM models for recognizing the different sources of heroin drugs had the best generalization ability and prediction results. The experimental results indicated that the combination of hand-held NIR technology and the ML-ELM algorithm can recognize the different sources of heroin drugs rapidly, accurately and non-destructively on the spot.


Introduction
Heroin is one of the most common drugs in China. It is damaging to people's health and has become a worldwide social problem. Yunnan Province is located in southwest China and borders Myanmar, Laos and Vietnam. Its border is more than 4000 kilometers long and is an important route for drug trafficking. In 2020, 35.4 tons of drugs were seized in Yunnan Province, accounting for 47.45% of the total for China. As two of the most common drugs in China, methamphetamine and heroin account for more than 80% of the illicit drugs. Yunnan Province has been hit hard by drugs, especially methamphetamine and heroin. 1
The drug source denotes the crime scene and the case from which the drugs originate. The drugs in different cases are all produced illegally. Because the raw material sources, producers, production equipment, technical level and production process differ for each illegal drug production factory, the purity, impurities, content of effective ingredients and residual solvent types also differ. This provides the basis for using modern analytical techniques to recognize the source of drugs. Rapid determination of the source of the drugs can help the police to estimate whether the drugs belong to the same case. It can provide valuable clues and a basis for determining the nature of the case. Therefore, determining the sources of drugs rapidly is very important for drug control. The conventional method for determining the sources of heroin drugs is generally based on gas chromatography-mass spectroscopy (GC-MS), which is expensive and time-consuming and cannot be carried out at the crime scene. In order to carry out drug control efforts more effectively and investigate illegal heroin drugs after their confiscation, a fast and reliable qualitative determination is crucial. 2 As a result, it is of great importance to develop a new method that is rapid, cheap, reliable and highly efficient.
Near-infrared (NIR) spectroscopy is a useful analytical chemistry tool with the advantages of being accurate, cheap, fast and non-destructive. 3 In recent years, considerable effort has been invested in applying NIR to drug determination and identification. However, few studies have been reported so far on the recognition of different sources of heroin drugs using NIR technology. Besides, previous research 3 has shown that the NIR spectral data of drugs involve a large number of correlated features, and it is difficult to find the connection between the spectral data and the heroin drug sources. Hence, engineering the features to represent the salient structure of the spectral data is important.
Extreme learning machine (ELM), put forward by Huang et al., 4 has been widely used in many areas, such as classification, 5 regression, 6 feature selection 7 and so on. Multi-layer extreme learning machine (ML-ELM) is one of the unsupervised learning methods using both deep learning and extreme learning machine. 10 However, there are few applications in the classification of NIR spectral data. Considering the above discussion and analysis, ML-ELM is very suitable for the processing of NIR spectral data. 11
In this study, a novel classification method using hand-held NIR technology and the ML-ELM algorithm was put forward to recognize the sources of heroin drugs rapidly and non-destructively. The proposed technique was applied to establish the heroin source recognition model and was tested on actual seized heroin samples.

Experimental
Equipment and samples
A MicroNIR 1700 device (Viavi Solutions, Milpitas, CA, United States) was used to collect the spectral data of the heroin drugs. The spectral range of the device is 900-1650 nm. The MicroNIR is equipped with a 128-pixel detector array, which records data with a nominal spectral resolution of 6.25 nm. The integration time was 10 ms and each spectrum was the average of 60 scans, resulting in a measurement time of 0.60 s. During scanning, the device was placed directly on the heroin samples to acquire the NIR spectra. Meanwhile, a reference plate with a reflectivity of over 99% was placed under the heroin samples.
In this study, a total of 338 drug samples seized by Yunnan police from 4 different case sources were chosen. All the samples were provided by public security bureaus of Yunnan Province. The differences between the samples were very small and could not be recognized by eye. Pictures of the samples cannot be shown here for reasons of confidentiality. High-performance liquid chromatography (HPLC) was used for quantitative determination of the heroin drugs in order to identify the sources of the different cases. The purity of the heroin drug samples of the 4 different cases is shown in Table 1. It can be seen in Table 1 that the average purity differs considerably among the 4 cases. Besides, the statements of the suspects also showed that the samples were from 4 different sources and that the samples in each case were from the same source. In the experimental process, the samples were divided into 3 parts: calibration, validation and testing samples. The calibration and validation samples were chosen randomly. A total of 180 samples were chosen as the calibration set, 83 as the validation set and 75 as the test set. The details of the dataset are shown in Table 1.
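The partition described above can be illustrated with a minimal sketch. It assumes a plain random shuffle of sample indices with a fixed seed; the function name `split_samples`, the seed, and the absence of stratification by case are illustrative assumptions, since the text does not state how the random selection was implemented.

```python
import random

def split_samples(n_samples=338, n_cal=180, n_val=83, n_test=75, seed=0):
    """Randomly partition sample indices into calibration, validation and test sets."""
    if n_cal + n_val + n_test != n_samples:
        raise ValueError("set sizes must add up to the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed is an assumption, for repeatability
    cal = indices[:n_cal]
    val = indices[n_cal:n_cal + n_val]
    test = indices[n_cal + n_val:]
    return cal, val, test

cal_idx, val_idx, test_idx = split_samples()
print(len(cal_idx), len(val_idx), len(test_idx))
```

Keeping the index lists, rather than copies of the spectra, makes it straightforward to reuse exactly the same split for every algorithm being compared.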

Theory of linear discriminant analysis
Linear discriminant analysis (LDA) is a well-known dimension reduction and classification method. It is described here for a binary classification problem. Suppose there is a set of samples belonging to two classes $C_1$ and $C_2$. The total number of samples is $n$; the number of samples in class $C_1$ is $n_1$ and the number in class $C_2$ is $n_2$. If each sample is described by $q$ variables, the data form a matrix $X = (X_{ij})$, $i = 1, \ldots, n$; $j = 1, \ldots, q$, and $x_i$ denotes the $i$-th sample (the $i$-th row of $X$ written as a column vector). We denote by $\mu_k$ the mean of class $C_k$ and by $\mu$ the mean of all the samples:
$$\mu_k = \frac{1}{n_k} \sum_{x_i \in C_k} x_i \tag{1}$$
$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{2}$$
Then, the between-class scatter matrix $S_B$ and the within-class scatter matrix $S_W$ can be defined as:
$$S_B = \sum_{k=1}^{2} n_k (\mu_k - \mu)(\mu_k - \mu)^t \tag{3}$$
$$S_W = \sum_{k=1}^{2} \sum_{x_i \in C_k} (x_i - \mu_k)(x_i - \mu_k)^t \tag{4}$$
LDA determines a vector $\omega$ such that $\omega^t S_B \omega$ is maximized while $\omega^t S_W \omega$ is minimized. This double objective is realized by the vector $\omega_{opt}$ that maximizes the criterion:
$$J(\omega) = \frac{\omega^t S_B \omega}{\omega^t S_W \omega} \tag{5}$$
It can be proved that the solution $\omega_{opt}$ is the eigenvector associated with the sole non-zero eigenvalue of $S_W^{-1} S_B$, if $S_W^{-1}$ exists. Once $\omega_{opt}$ is determined, LDA provides a classifier. 12,13

Theory of support vector machine
The theory of support vector machine (SVM) has been extensively described in the literature. 14,15 Considering a binary classification problem, the objective is to predict, for every object, the class $y \in \{-1, +1\}$ to which it belongs from $m$-dimensional input data represented by a vector written $x = (x_1, x_2, \ldots, x_m)$, with $x_i$ denoting the $i$-th object of the training set. In the case of spectra, $m$ represents the number of wavelengths. The class prediction first requires training on a data set containing the spectra corresponding to $n$ objects or samples with known class, that is to say $n$ pairs $\{x, y\}$.
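For the two-class LDA criterion described in the LDA section, the optimal direction is proportional to $S_W^{-1}(\mu_1 - \mu_2)$. The following is a minimal NumPy sketch of that result on synthetic data; the function names, the nearest-projected-mean decision rule and the toy data are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Return the unit-length Fisher discriminant direction (rows = samples)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the scatter matrices of the two classes.
    S_w = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
    # For two classes the maximiser of the criterion is proportional to S_w^{-1}(mu1 - mu2).
    w = np.linalg.solve(S_w, mu1 - mu2)
    return w / np.linalg.norm(w)

def classify(X, w, X1, X2):
    """Assign each row of X to class 1 or 2 by the nearer projected class mean."""
    m1, m2 = X1.mean(axis=0) @ w, X2.mean(axis=0) @ w
    z = X @ w
    return np.where(np.abs(z - m1) <= np.abs(z - m2), 1, 2)

# Synthetic, well-separated two-class data (purely illustrative).
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
X2 = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(50, 2))
w = fisher_direction(X1, X2)
pred = classify(np.vstack([X1, X2]), w, X1, X2)
print("training accuracy:", np.mean(pred == np.repeat([1, 2], 50)))
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of the within-class scatter matrix explicitly, which is usually the more stable choice when the variables are strongly correlated, as NIR wavelengths tend to be.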
Theory of extreme learning machine algorithm
ELM, proposed by Huang et al., 4 shows that the hidden nodes can be randomly generated. The input data are mapped to an $L$-dimensional ELM random feature space and the network output is given by equation 6:
$$f_L(x) = \sum_{i=1}^{L} b_i g_i(x) = h(x)\, b \tag{6}$$
where $b = [b_1, \ldots, b_L]^T$ is the output weight matrix between the hidden nodes and the output nodes, $h(x) = [g_1(x), \ldots, g_L(x)]$ are the hidden node outputs (random hidden features) for the input $x$, and $g_i(x)$ is the output of the $i$-th hidden node. Given $N$ training samples $\{(x_i, t_i)\}_{i=1}^{N}$, ELM solves the following learning problem:
$$\min_{b} \lVert H b - T \rVert^2 \tag{7}$$
where $T = [t_1, \ldots, t_N]^T$ are the target labels and $H = [h^T(x_1), \ldots, h^T(x_N)]^T$ is the output matrix of the hidden layer. The output weights $b$ can be calculated by equation 8:
$$b = H^{\dagger} T \tag{8}$$
where $H^{\dagger}$ is the Moore-Penrose generalized inverse of matrix $H$.
To obtain better generalization performance and to make the solution more robust, one can add a regularization term as shown in equation 9:
$$b = \left( \frac{I}{C} + H^T H \right)^{-1} H^T T \tag{9}$$
where $C$ is the regularization coefficient; its value is assigned randomly after the appropriate number of hidden layers has been set.
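The regularized ELM solution can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the uniform range of the random weights, the number of hidden nodes, the value of `C`, the one-hot target coding and the toy data are all hypothetical choices, not the settings used in the study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden=50, C=1e3, seed=0):
    """Train a single-hidden-layer ELM with regularised least squares."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    bias = rng.uniform(-1.0, 1.0, size=n_hidden)             # random hidden biases
    H = sigmoid(X @ W + bias)                                 # hidden-layer output matrix
    # Output weights: b = (I/C + H^T H)^{-1} H^T T
    b = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ T)
    return W, bias, b

def elm_predict(X, W, bias, b):
    return sigmoid(X @ W + bias) @ b

# Synthetic two-class data with one-hot targets (purely illustrative).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, size=(40, 2)),
               rng.normal(3.0, 0.3, size=(40, 2))])
labels = np.repeat([0, 1], 40)
T = np.eye(2)[labels]
W, bias, b = elm_train(X, T)
pred = elm_predict(X, W, bias, b).argmax(axis=1)
train_accuracy = (pred == labels).mean()
print("training accuracy:", train_accuracy)
```

Only the output weights are fitted; the input weights and biases stay at their random values, which is why training reduces to a single linear solve.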

Theory of multi-layer extreme learning machine algorithm
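If the number of nodes $L_k$ in the $k$-th hidden layer is equal to the number of nodes $L_{k-1}$ in the $(k-1)$-th hidden layer, $g$ can be chosen as linear; otherwise, $g$ can be chosen as a nonlinear piecewise function, e.g., a sigmoidal function. The output of each hidden layer is computed from the output of the previous layer:
$$H_k = g\left( H_{k-1}\, b_k^T \right) \tag{10}$$
where $H_k$ is the output matrix of the $k$-th hidden layer and $b_k$ is the output weight matrix of the ELM autoencoder trained for the $k$-th hidden layer. If $k = 0$, the input layer $x$ can be considered as the 0th hidden layer.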
The output of the connections between the last hidden layer and the output node $t$ is analytically calculated using regularized least squares. The steps for recognizing the sources of heroin drugs using the ML-ELM algorithm were as follows. Firstly, the calibration and validation samples were subjected to spectral pre-processing. Then, the parameters of the ML-ELM algorithm were determined and the classification models were built. Finally, the prediction results were obtained by using the built ML-ELM models.
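The layer-wise construction can be illustrated with a simplified NumPy sketch. It assumes that each hidden layer is obtained from an ELM autoencoder (random hidden mapping, output weights fitted to reconstruct the layer input) and that the final output weights come from regularized least squares. The layer sizes, the value of `C`, the use of a sigmoid in every layer, the omission of orthogonalized random weights, and the synthetic data are all assumptions made for illustration, not the configuration used in the study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ridge_solve(H, T, C):
    """Regularised least squares: (I/C + H^T H)^{-1} H^T T."""
    return np.linalg.solve(np.eye(H.shape[1]) / C + H.T @ H, H.T @ T)

def ml_elm_train(X, T, layer_sizes=(20, 20, 20), C=1e3, seed=0):
    rng = np.random.default_rng(seed)
    H, layers = X, []
    for n_hidden in layer_sizes:
        # ELM autoencoder: random hidden mapping, output weights reconstruct H.
        W = rng.uniform(-1.0, 1.0, size=(H.shape[1], n_hidden))
        bias = rng.uniform(-1.0, 1.0, size=n_hidden)
        b_k = ridge_solve(sigmoid(H @ W + bias), H, C)  # shape (n_hidden, n_previous)
        H = sigmoid(H @ b_k.T)                          # output of this hidden layer
        layers.append(b_k)
    b_out = ridge_solve(H, T, C)  # last hidden layer -> output nodes
    return layers, b_out

def ml_elm_predict(X, layers, b_out):
    H = X
    for b_k in layers:
        H = sigmoid(H @ b_k.T)
    return H @ b_out

# Synthetic data: 4 classes, 7 variables (purely illustrative).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 30)
centres = rng.normal(size=(4, 7))
X = centres[labels] + 0.05 * rng.normal(size=(120, 7))
T = np.eye(4)[labels]
layers, b_out = ml_elm_train(X, T)
scores = ml_elm_predict(X, layers, b_out)
print("training accuracy:", np.mean(scores.argmax(axis=1) == labels))
```

Changing the length of `layer_sizes` is how the number of hidden layers would be varied in such a sketch when searching for the depth that gives the highest overall accuracy.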

Measures of classification performance
A confusion matrix can be used not only in two-class discriminant analysis but also in multi-class discriminant analysis. Figure 1 presents the basic form of the confusion matrix for a multi-class classification task with the classes $A_1, A_2, \ldots, A_n$. In the confusion matrix, $N_{ij}$ represents the number of samples actually belonging to class $A_i$ but classified as class $A_j$.
A number of measures of classification performance can be defined based on the confusion matrix in Figure 1. Some common measures are given as follows.
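Accuracy is the proportion of the total number of predictions that were correct:
$$AC = \frac{\sum_{i=1}^{n} N_{ii}}{\sum_{i=1}^{n} \sum_{j=1}^{n} N_{ij}} \tag{11}$$
Precision is a measure of the accuracy provided that a specific class has been predicted. For class $A_i$ it is defined by:
$$Pc_i = \frac{N_{ii}}{\sum_{k=1}^{n} N_{ki}} \tag{12}$$
Specificity is the proportion of actual negatives that were correctly identified:
$$Sp = \frac{TN}{TN + FP} \tag{13}$$
In equation 13, TN means true negative and FP means false positive.
Sensitivity is a measure of the ability of a prediction model to select instances of a certain class from a data set. For class $A_i$ it is defined by the formula:
$$Sn_i = \frac{N_{ii}}{\sum_{k=1}^{n} N_{ik}} \tag{14}$$
The traditional F-score ($F_1$ score) is the harmonic mean of precision and sensitivity:
$$F_1 = \frac{2 \cdot Pc \cdot Sn}{Pc + Sn} \tag{15}$$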
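The measures above can be computed directly from a multi-class confusion matrix by treating each class in turn as the positive class. The sketch below is illustrative only: the function name and the 3-class matrix are hypothetical and do not reproduce any result from the study.

```python
def classification_metrics(N):
    """N[i][j] = number of samples of actual class i classified as class j."""
    n = len(N)
    total = sum(sum(row) for row in N)
    accuracy = sum(N[i][i] for i in range(n)) / total
    per_class = []
    for i in range(n):
        tp = N[i][i]
        fn = sum(N[i]) - tp                        # actual class i, predicted otherwise
        fp = sum(N[k][i] for k in range(n)) - tp   # predicted class i, actually otherwise
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        per_class.append({"precision": precision, "sensitivity": sensitivity,
                          "specificity": specificity, "f1": f1})
    return accuracy, per_class

# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted).
confusion = [[8, 2, 0],
             [1, 9, 0],
             [0, 0, 10]]
accuracy, per_class = classification_metrics(confusion)
print(accuracy, per_class[0])
```

For the first class of this hypothetical matrix, TP = 8, FN = 2, FP = 1 and TN = 19, so the sensitivity is 0.8 and the specificity is 19/20 = 0.95.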

Results and Discussion
The NIR spectral data of all the samples were collected with a hand-held NIR spectrometer. No pre-processing operation was performed on the samples. The samples were measured without their packaging: all the physical evidence bags (plastic bags) were opened to avoid the influence of the polymer. The device was placed directly on the heroin samples to acquire the NIR spectra. The NIR spectral data of the dataset of 4 different cases are shown in Figure 2. The peaks at 960-980 nm and 1400-1420 nm are the second and first overtones of O-H, respectively; the peaks in the range of 1490-1600 nm are attributed to the first overtone of N-H. Results in the literature 5 showed that the variables located in the ranges of 1100-1250 nm and 1350-1600 nm had a great influence on heroin.
Savitzky-Golay derivative pre-processing was performed on the spectral data to reduce the influence of instrument noise and improve the signal-to-noise ratio. It can also enhance the small differences in absorption bands and correct for light scattering. Table 2 shows the accuracies of the calibration models using LDA, ELM and ML-ELM with different parameters of the Savitzky-Golay derivative pre-processing operation. It can be seen from Table 2 that the accuracies of the three models were the highest when the parameters were a first derivative, 7 smoothing points and a polynomial order of two. As a result, these parameters were used in the following part. Figure 3 shows that, after pre-processing, the noise in the spectral data was reduced and the small differences in absorption bands were enhanced.
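The selected pre-processing (first derivative, 7 points, second-order polynomial) can be sketched with NumPy alone by deriving the filter coefficients from a local least-squares polynomial fit. The function names, the `delta` spacing parameter and the handling of the spectrum edges (the first and last 3 points are simply dropped) are illustrative assumptions; the software actually used for the pre-processing is not stated in the text.

```python
import math
import numpy as np

def savgol_coefficients(window=7, polyorder=2, deriv=1):
    """Savitzky-Golay weights for the deriv-th derivative at the window centre."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    A = np.vander(offsets, polyorder + 1, increasing=True)  # columns: 1, x, x^2, ...
    # Row `deriv` of the pseudo-inverse yields the fitted coefficient of x^deriv;
    # the derivative at the centre is deriv! times that coefficient.
    return math.factorial(deriv) * np.linalg.pinv(A)[deriv]

def savgol_derivative(y, window=7, polyorder=2, deriv=1, delta=1.0):
    """Apply the filter to a 1-D spectrum; edge points are dropped (assumption)."""
    c = savgol_coefficients(window, polyorder, deriv)
    # Reversing the weights turns the convolution into a sliding weighted sum.
    return np.convolve(y, c[::-1], mode="valid") / delta ** deriv

coefficients = savgol_coefficients()
spectrum = 2.0 * np.arange(20) + 1.0        # synthetic straight line, slope 2
derivative = savgol_derivative(spectrum)
print(coefficients * 28)                    # weights proportional to -3 ... 3
print(derivative)
```

Because the filter fits a polynomial exactly, a straight line with slope 2 gives a constant derivative of 2, which is a convenient sanity check before applying the filter to real spectra.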
Then, the successive projections algorithm (SPA) 16 was used to select the optimal wavelengths. The wavelengths selected by SPA were 1157, 1190, 1200, 1357, 1391, 1425 and 1570 nm. Table 3 shows the accuracies of the calibration models using LDA, ELM and ML-ELM with the wavelengths selected by SPA. It can be seen from Table 3 that the accuracies of the LDA, SVM, ELM and ML-ELM algorithms were higher than those in Table 2. As a result, the above 7 selected wavelengths were used in the following experiments.
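The projection step at the core of SPA can be sketched as follows. Starting from one variable, each remaining variable is projected onto the subspace orthogonal to the variable selected last, and the variable with the largest remaining norm is chosen next. This is a simplified sketch: the function name, the mean-centring of the columns and the fixed starting variable are assumptions, and the full algorithm additionally compares starting variables and subset sizes using a validation set, which is omitted here.

```python
import numpy as np

def spa_select(X, n_select, start=0):
    """Select n_select column indices of X with minimal collinearity (projection phase only)."""
    Xp = np.asarray(X, dtype=float)
    Xp = Xp - Xp.mean(axis=0)            # mean-centre each variable (assumption)
    selected = [start]
    for _ in range(n_select - 1):
        xk = Xp[:, selected[-1]].copy()
        # Project every column onto the subspace orthogonal to the last selected column.
        Xp = Xp - np.outer(xk, xk @ Xp) / (xk @ xk)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0           # never reselect a variable
        selected.append(int(np.argmax(norms)))
    return selected

# Toy matrix: column 1 duplicates column 0 up to scale, column 2 is orthogonal to it,
# and column 3 is almost collinear with column 0.
a = np.array([1.0, -1.0, 1.0, -1.0, 0.0, 0.0])
b = np.array([1.0, 1.0, -1.0, -1.0, 2.0, -2.0])
X_demo = np.column_stack([a, 2 * a, b, a + 0.1 * b])
chosen = spa_select(X_demo, n_select=2)
print(chosen)
```

In the toy example the second variable picked is the orthogonal column, since the redundant and nearly collinear columns carry almost nothing that the first variable does not already contain.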
LDA, 17 SVM, 18 ELM, 19,20 and ML-ELM algorithms were used to identify the spectral data of the different sources of heroin drugs. To achieve a fair comparison and avoid randomness in the test results, all the calibration and validation samples were chosen randomly and all the algorithms were run on the same calibration and test splits for each calculation. For ML-ELM, the number of layers is an important parameter. There exists a specific number of layers at which ML-ELM achieves the highest overall accuracy. If the number of layers is larger or smaller than this value, the ML-ELM algorithm will not achieve its best classification performance and its overall accuracy will not be the highest. Therefore, the first task was to determine the number of hidden layers of the ML-ELM algorithm in order to achieve better performance with fewer parameters. Here, the sigmoid was set as the activation function and the number of hidden nodes was set as 10 and 500. It can be seen from Figure 4 that the overall accuracy on the dataset first increased and then decreased as the number of hidden layers increased. The accuracy was the highest when the number was 3. Three hidden layers were therefore chosen for the ML-ELM algorithm in the following experiments.
Accuracy, precision, sensitivity and F-score were used to evaluate the performance of the calibration models, validation results and testing results of each algorithm. Meanwhile, ten-fold cross validation was used for each experiment in order to avoid over-fitting and to reflect the performance of the different predictors faithfully. Different numbers of main factors (1-10) were tried for the LDA algorithm and the final value was 8, as the calibration, validation and prediction accuracies were the highest when the number of main factors was 8. For the SVM algorithm, the kernel function was the radial basis function (RBF). The penalty coefficient was chosen as 1 by using a particle swarm optimization algorithm in order to achieve the best classification performance. For the ELM algorithm, the number of hidden neurons was set randomly by the computer each time, and the transfer function was the sigmoidal function.
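The ten-fold cross validation mentioned above can be sketched as an index generator. The sketch assumes that the folds are formed by a random shuffle and that the procedure is applied to the 180 calibration samples; neither detail is stated in the text, and the function name and seed are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, held_out) index lists for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        held_out = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, held_out

# Assumed here: cross validation over the 180 calibration samples.
splits = list(k_fold_indices(180))
for training, held_out in splits[:2]:
    print(len(training), len(held_out))
```

Each sample is held out exactly once, so averaging the accuracy over the ten folds uses every calibration sample for both fitting and checking without ever scoring a model on data it was fitted to.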
For the sake of comparison, the performance of the LDA, SVM, ELM and ML-ELM algorithms is shown in Tables 4 and 5 in the form of a confusion matrix. As shown in Tables 4 and 5, the accuracy, precision, sensitivity, specificity and F-score of the ML-ELM algorithm are the highest compared with the LDA, SVM and ELM algorithms. The higher sensitivity means the higher recognition capability

Figure 1. Confusion matrix for a multi-class classification task.

Figure 2. Original spectral data of the heroin drug dataset.

Figure 3. Pre-processing results of the original spectral data of heroin drugs.

Figure 4. Overall accuracy of ML-ELM algorithm dealing with heroin drug spectral data.

Table 1. Details of the heroin drugs dataset

Table 2. Accuracies of calibration models using LDA, ELM and ML-ELM with different parameters of Savitzky-Golay derivative pre-processing operation. LDA: linear discriminant analysis; SVM: support vector machine; ELM: extreme learning machine; ML-ELM: multi-layer-extreme learning machine.

Table 3. Accuracies of calibration models using LDA, ELM and ML-ELM with the selected wavelength using SPA. SPA: successive projections algorithm; LDA: linear discriminant analysis; SVM: support vector machine; ELM: extreme learning machine; ML-ELM: multi-layer-extreme learning machine.

Table 5. Testing results of the heroin drug spectral data using different algorithms. LDA: linear discriminant analysis; SVM: support vector machine; ELM: extreme learning machine; ML-ELM: multi-layer-extreme learning machine. TP: true positive; FP: false positive; FN: false negative; TN: true negative; Pc: precision; Sn: sensitivity; Sp: specificity; AC: accuracy.