Application of bi-modal signal in the classification and recognition of drug addiction degree based on machine learning

Most studies of drug addiction degree rely on statistical scales, addicts' self-reports, and the subjective judgement of rehabilitation doctors; no objective, quantified evaluation has been available. This paper devises a synchronous bimodal signal collection and experimental paradigm using electroencephalography (EEG) and a forehead high-density near-infrared spectroscopy (NIRS) device. The drug addicts are classified into mild, moderate and severe groups with reference to the suggestions of researchers and medical experts. Data from 45 drug addicts (mild: 15; moderate: 15; severe: 15) are collected and used to design an addiction degree detection algorithm based on decision fusion, which classifies addiction as mild, moderate or severe. This paper is the first to use two Convolutional Neural Networks (CNNs) to extract features from the EEG and NIRS data of drug addicts, and it introduces batch normalization into the CNNs, thereby accelerating training, reducing parameter sensitivity, and enhancing robustness. The features output by the two CNNs are converted to the same dimension, each is assigned a weight of 50%, and the weighted features are used for decision fusion. Of the 45 subjects, 27 are used for training, 9 for validation, and 9 for testing. The 3-class accuracy is 63.15%, which provides preliminary support for this method as an approach to measuring drug addiction degree. The method is easy to use, objective, and delivers results in real time.


Introduction
To date, no researchers have applied AI algorithms to the quantified evaluation of drug addiction degree. Based on statistical scales, addicts' self-reports, and the subjective judgement of rehabilitation doctors, this paper classifies drug addicts into mild, moderate and severe groups. An EEG-NIRS synchronous collection experiment is then designed to obtain data from 45 subjects, and machine learning methods are used to classify the degree of drug addiction. The main contributions of this study are: 1) the feasibility and advantages of EEG-NIRS in detoxification research are put forward; 2) an artificial intelligence algorithm is used for bimodal signal processing; 3) an objective and quantitative evaluation of drug addiction level and craving is realized; 4) a basis is provided for formulating targeted rehabilitation training programs in the future.
The electroencephalogram (EEG) is a spontaneous bioelectrical signal recorded on the surface of the scalp. It is noninvasive, involves no radiation, and is low in cost, so it is widely used in research on brain diseases. Near-infrared spectroscopy uses light emission and detection points to measure the hemodynamic function of brain tissue, and can record blood oxygen levels [1][2][3][4]. In the human body, oxygenated hemoglobin (HbO2) and deoxyhemoglobin (Hbb) have specific absorption for near-infrared light at 600-800 nm, whereas other biological tissues in the brain are relatively transparent in this wavelength range. The changes in near-infrared light intensity at 650 and 760 nm can therefore be measured, and the cerebral hemodynamic data obtained according to the Beer-Lambert law [5,6]. EEG captures changes in brain activity over time, while NIRS quantitatively analyzes the blood oxygen metabolism level of brain tissue by spectral measurement [7,8]. Combining the two integrates EEG with the cerebral blood oxygen metabolism level to quantify the degree of drug addiction [9][10][11].
Zhang et al. adopted a multi-modal task method and proposed a new framework for predicting regions of interest (ROIs) on the cortex, improving the accuracy of ROI prediction [12]. Xia et al. proposed a machine learning method integrating a stacked denoising autoencoder (SDAE), which has advantages in electrocardiogram (ECG) classification [13]. Wallois et al. used EEG-NIRS to collect information synchronously and study the brain activation mechanism [14]. In 2015, Balconi et al. studied the connection between EEG-NIRS and emotions, and the findings revealed the correlation between cortical forward networks and emotional stimuli [15]. Fazli et al. synchronized information from the two devices and improved the recognition accuracy of motor imagery to 90% [16]. Tomita et al. found, through visual cortex stimulation experiments, that an EEG-NIRS brain-computer interface recognizes flickering light more accurately than event-related potentials alone [17]. At present, research fusing the results of the two methodologies has been applied to brain function diseases and cognitive analysis [18], the evaluation of mental and movement disorders [19], brain-computer interfaces and other fields [20,21]. In 2017, Zich et al. studied the influence of age on the neural correlates of motor imagery [22]. Safaie et al. studied fusion monitoring with EEG-NIRS [23]. Sawan et al. used EEG-NIRS fusion to monitor intracranial brain function in epilepsy [24]. In 2015, Rand et al. designed a near-infrared probe that can be mixed with EEG electrodes, and verified the feasibility of the scheme through experiments [25]. In 2016, Buccino and other researchers tried to classify four different motor imagery tasks, namely right arm, left arm, right hand and left hand [26]. Shin et al. used EEG-NIRS to study two types of motor imagery, and the final accuracy reached 82% [27]. Guo et al. used deep learning for emotion recognition from expression and gesture signals, with a highest accuracy of 50.42% [28].
This study is based on forty-five drug addicts (all male; 15 mild, 15 moderate and 15 severe) who were pre-screened by behavioral scales, oral accounts and medical examination. Following the designed experimental paradigm, the EEG and NIRS data of these drug addicts were collected. This paper designs a CNN-based algorithm with decision-level fusion to detect the degree of drug addiction. The EEG-CNN and NIRS-CNN networks each extract features related to the degree of addiction, and these are normalized before classification to obtain two feature signals of the same dimension. The two features are then each given a 50% weight. For the final classification, 27 subjects were used as training data, 9 as validation data and 9 as test data.

Participants
A total of 45 subjects were recruited at the Shanghai Qingdong Drug Rehabilitation Center, all of them men in methamphetamine withdrawal. Average age: 31.30 ± 4.64 years. Average education level: 10.07 ± 2.849. History of drug use: 9.929 ± 5.341. Duration of the current detoxification: 11.15 ± 2.73 months.
The inclusion criteria were: 1) meeting the DSM-5 diagnostic criteria for mental disorders caused by psychoactive substances; 2) a detoxification period of less than half a year; 3) junior high school education or above; 4) age 18-41 years; 5) willingness to participate in this study and informed consent. The exclusion criteria were: 1) severe cognitive impairment, or inability to cooperate with the study-related evaluations and examinations; 2) severe somatic diseases; 3) severe psychotic symptoms; 4) participation in other psychological intervention and treatment; 5) a history of abuse of other psychoactive substances (except nicotine). The study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of Shanghai University (Approval No. ECSHU2020-071).
EEG was recorded with wet-electrode wireless EEG acquisition equipment from Neuracle Technology. Electrode positions follow the international standard 10/20 system. Number of electrodes: 64; sampling rate: 1000 Hz; input range: ±375 mV; data transmission mode: WiFi. The equipment can simultaneously collect multi-lead EEG, ECG and other signals. To ensure signal quality, the electrode impedance was kept below 5 kΩ. According to the actual demand, 48 electrodes were selected. Figure 1 shows the distribution of the EEG electrodes.

fNIRS equipment introduction
The fNIRS device was a NIRSIT (OBELAB, Seoul, Korea). Light source type: dual-wavelength VCSEL laser; technique: continuous wave (CW) at 780 nm and 850 nm; spatial resolution: 4 × 4 mm²; temporal resolution: 8.13 Hz; number of light sources: 24; number of detectors: 32; detection depth: 0.2-1.8 cm. The system measures signals at four source-detector (SD) separations, 15, 21.2, 30 and 33.5 mm, which allows changes in the hemodynamic response to be measured at different depths [29]. It is a functional near-infrared spectroscopy (fNIRS) device with 204 channels [30]. Figure 2 shows how the NIRS equipment was used in the experiment.

NIRSIT channel and functional area division
The NIRSIT channels and functional areas are shown in Figure 3. The forehead near-infrared device covers four higher-order functional areas: the dorsolateral prefrontal cortex, the ventrolateral prefrontal cortex, the frontopolar prefrontal cortex, and the orbitofrontal cortex. The channel distribution is as follows: the right dorsolateral prefrontal cortex comprises channels 1, 2, 3, 5, 6, 11, 17 and 18. There were 19, 20, 33, 34, 35, 38, 39 and

Experiment and data collection
The experimental paradigm was written with the E-Prime software package (Psychology Software Tools, Pittsburgh, PA), with each picture numbered. A complete run of the paradigm consists of the following three stages. Figures 4 and 5 are examples of the drug-related pictures and neutral pictures used in the paradigm. Figure 6 shows the whole process of the experimental paradigm, where P denotes a drug picture and N a neutral picture.
The first stage lasts 10 minutes in total: the subjects close their eyes for 5 minutes and then open their eyes for 5 minutes.
The second stage lasts 6 minutes and contains both drug pictures and neutral pictures. Each block lasts 10 seconds and contains 16 pictures, each displayed for 0.6 seconds. The first four pictures of a block, two of which are drug pictures, are displayed in random order; the remaining 12 neutral pictures are then displayed in random order. After each block there is a 4-second interval picture showing a black cross on a white background. During the second stage, whenever the subjects see a drug picture flash by, they must mark it on paper at the same time.
The third stage lasts 4.6 minutes, during which all pictures are neutral. Each block lasts 10 seconds and contains 16 pictures, each displayed for 0.6 seconds, with a 4-second interval between blocks.

Bimodal signal processing architecture based on decision fusion
Decision fusion means that each signal's data first yields an independent decision, and the final decision is then obtained by a weighted calculation over the independent decisions of the individual signals.
Because the EEG and NIRS equipment have different sampling frequencies, the EEG and NIRS data were acquired and processed separately. Each type of data passed through its own CNN for preprocessing, feature extraction, and decision-making; the two outputs were then batch-normalized at the decision level so that each follows a distribution with a mean of 0 and a standard deviation of 1. Figure 7 shows the flow chart of bimodal addiction recognition based on decision-level fusion.
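The weighting step can be illustrated with a minimal sketch. It assumes that each modality produces one score per class, that the scores are standardized to mean 0 and standard deviation 1, and that the fused score is an element-wise weighted sum with equal weights; the function names and the score values are hypothetical, and in the paper the fused data pass through a fully connected layer rather than the simple arg-max used here.

```python
from statistics import fmean, pstdev

def standardize(values, eps=1e-8):
    """Rescale a vector to mean 0 and standard deviation 1."""
    mu = fmean(values)
    sigma = pstdev(values, mu)
    return [(v - mu) / (sigma + eps) for v in values]

def fuse_decisions(eeg_scores, nirs_scores, w_eeg=0.5, w_nirs=0.5):
    """Weighted sum of the standardized per-class scores of each modality."""
    if len(eeg_scores) != len(nirs_scores):
        raise ValueError("both modalities must score the same classes")
    eeg_z = standardize(eeg_scores)
    nirs_z = standardize(nirs_scores)
    return [w_eeg * e + w_nirs * n for e, n in zip(eeg_z, nirs_z)]

CLASSES = ["mild", "moderate", "severe"]
eeg_scores = [0.2, 1.5, 0.3]     # made-up EEG-CNN outputs
nirs_scores = [4.0, 9.0, 11.0]   # made-up NIRS-CNN outputs on a different scale
fused = fuse_decisions(eeg_scores, nirs_scores)
predicted = CLASSES[max(range(len(fused)), key=fused.__getitem__)]
print(predicted)
```

Standardizing first keeps the modality with the larger raw scale (here the NIRS scores) from dominating the equally weighted sum.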

EEG and NIRS filtering
Butterworth filters are used for both the EEG and NIRS networks. The squared magnitude response of an $n$-order Butterworth filter is:
$$|H(\omega)|^2 = \frac{1}{1 + (\omega/\omega_c)^{2n}} = \frac{1}{1 + \epsilon^2 (\omega/\omega_p)^{2n}}$$
where $n$ is the order, $\omega_c$ is the cutoff frequency, and $\omega_p$ is the passband edge frequency, with $\epsilon$ setting the gain at the passband edge. For the NIRS data, $n$ is 6 and the data are band-pass filtered over 0.01 to 3 Hz. This frequency range can remove the interference of heartbeat, breathing, and slow drift on the original data. The EEG data, sampled at 1000 Hz, are also filtered with a Butterworth filter, with $n$ = 2 and a pass band of 0.5 to 7.5 Hz, to remove interference such as ocular and facial myoelectric activity.
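To show how the filter order shapes the roll-off, the sketch below evaluates the standard low-pass Butterworth magnitude formula at the upper band edges and orders quoted in the text. It is only an evaluation of the analytic formula for the low-pass prototype, not the band-pass filter actually applied to the data; the function names are illustrative.

```python
import math

def butterworth_gain(freq_hz, cutoff_hz, order):
    """Magnitude |H| of an ideal low-pass Butterworth filter at freq_hz."""
    return 1.0 / math.sqrt(1.0 + (freq_hz / cutoff_hz) ** (2 * order))

def gain_db(gain):
    """Convert a linear magnitude to decibels."""
    return 20.0 * math.log10(gain)

# Upper band edges and orders quoted in the text.
for label, cutoff, order in [("NIRS", 3.0, 6), ("EEG", 7.5, 2)]:
    at_edge = gain_db(butterworth_gain(cutoff, cutoff, order))
    octave_up = gain_db(butterworth_gain(2 * cutoff, cutoff, order))
    print(f"{label}: {at_edge:.2f} dB at {cutoff} Hz, "
          f"{octave_up:.1f} dB at {2 * cutoff} Hz")
```

At the cutoff frequency the squared magnitude is exactly one half regardless of order; the order only controls how steeply the response falls beyond it, so the order-6 filter attenuates far more one octave above its edge than the order-2 filter does.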

Convolutional layer
Convolutional layers are the core of convolutional neural networks [31,32]. Their main role is to extract features from the input data. The calculation form is as follows:
$$x_j^l = f\left(\sum_{i} x_i^{l-1} * k_{ij}^l + b_j^l\right)$$
where $x_j^l$ is the $j$th feature of layer $l$, $k_{ij}^l$ is the convolutional kernel weight, $x_i^{l-1}$ is the $i$th feature of layer $l-1$, $b_j^l$ is a bias parameter, and $f(\cdot)$ is the activation function.
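As a minimal sketch of this computation, the example below produces one output feature as the activation of the summed convolutions of every input feature with its own kernel, plus a bias. It assumes one-dimensional features, a "valid" sliding window without kernel flipping (the convention in most deep learning libraries), and ReLU as the activation; the input values and kernels are made up.

```python
def relu(x):
    return max(0.0, x)

def conv1d_feature(inputs, kernels, bias, activation=relu):
    """Compute one output feature from all input features of the previous layer.

    inputs  : list of input feature sequences (one per input feature i)
    kernels : list of kernels, one per input feature, all of equal length
    bias    : scalar bias shared by every position of the output feature
    """
    k_len = len(kernels[0])
    out_len = len(inputs[0]) - k_len + 1
    output = []
    for t in range(out_len):
        total = bias
        for x_i, k_ij in zip(inputs, kernels):
            total += sum(x_i[t + s] * k_ij[s] for s in range(k_len))
        output.append(activation(total))
    return output

two_channels = [[1.0, 2.0, 4.0, 3.0], [0.0, 1.0, 0.0, 1.0]]
kernels = [[-1.0, 1.0], [0.5, 0.5]]   # hypothetical kernel weights
feature = conv1d_feature(two_channels, kernels, bias=0.1)
print(feature)
```

The last position sums to a negative value before activation, so ReLU clips it to zero.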

Pooling layer
The pooling layer sub-samples the input features according to specific rules in order to make the network robust to small changes in previously learned features [33,34]. The calculation form is as follows:
$$x_j^l = f\left(\beta_j^l \, \mathrm{down}(x_j^{l-1}) + b_j^l\right)$$
where $x_j^l$ is the $j$th feature of layer $l$, $\beta_j^l$ is the subsampling coefficient, $b_j^l$ is the bias parameter, $\mathrm{down}(\cdot)$ is a sub-sampling function, and $f(\cdot)$ is the activation function.
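The robustness to small changes can be seen in a short sketch. It assumes non-overlapping max pooling as the sub-sampling function, a subsampling coefficient of 1, a bias of 0 and an identity activation; the source does not state which pooling rule or window size was used, so these are illustrative choices.

```python
def down_max(feature, window):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(feature[i:i + window])
            for i in range(0, len(feature) - window + 1, window)]

def pooling_layer(feature, window=2, beta=1.0, bias=0.0,
                  activation=lambda v: v):
    """Apply activation(beta * down(feature) + bias) element by element."""
    return [activation(beta * v + bias) for v in down_max(feature, window)]

signal = [0.1, 0.9, 0.3, 0.2, 0.8, 0.7, 0.5]
shifted = [0.9, 0.1, 0.2, 0.3, 0.7, 0.8, 0.5]   # peaks moved by one sample
print(pooling_layer(signal))
print(pooling_layer(shifted))
```

Both inputs give the same pooled output because each peak stays inside its pooling window after the one-sample shift.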

Normalization of data
In this method, each neuron feature is normalized individually: the mean and variance are calculated on a single training mini-batch at a time and then used for normalization. The specific implementation process is as follows.
Input: values of $x$ over a mini-batch $B = \{x_1 \dots x_m\}$; parameters to be learned: $\gamma$, $\beta$.
Calculate the mean of each mini-batch: $\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i$
Calculate the variance of each mini-batch: $\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2$
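The mini-batch statistics can be put to work in a short sketch. The normalize step and the learned scale-and-shift step shown here follow the standard batch normalization formulation and are an assumption about the steps that follow the mean and variance; the small constant `eps` and the default values of `gamma` and `beta` are illustrative.

```python
from statistics import fmean

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one neuron's activations over a mini-batch, then scale and shift."""
    m = len(batch)
    mu = fmean(batch)                              # mini-batch mean
    var = sum((x - mu) ** 2 for x in batch) / m    # mini-batch variance
    return [gamma * (x - mu) / (var + eps) ** 0.5 + beta for x in batch]

activations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up values
normalized = batch_norm(activations)
print([round(v, 3) for v in normalized])
```

With `gamma` at 1 and `beta` at 0 the output has a mean of approximately 0 and a variance of approximately 1; during training these two parameters are learned, so the network can recover a different scale or offset if that helps.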

Activation function
In this paper, the activation function is the Rectified Linear Unit (ReLU), calculated as follows:
$$f(x) = \max(0, x)$$
The derivative of this activation function is simple to compute, and the output of some neurons is 0.

Full connection layer
The input to the fully connected layer must be one-dimensional, whereas the preceding features are two-dimensional, so each feature is flattened to one dimension in order before being used as the input of the fully connected layer [35,36]. The calculation formula for the fully connected layer is as follows:
$$h_{w,b}(x) = \theta(w^T x + b)$$
where $h_{w,b}(x)$ is the output value of the neuron, $x$ is the input feature vector of the neuron, $w$ is the weight, $b$ is the bias parameter, and $\theta(\cdot)$ is the activation function. The first fully connected layer in this paper uses the ReLU activation function.
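A minimal sketch of the flatten-then-connect step follows. Each neuron outputs the activation of the weighted sum of the flattened input plus a bias, with ReLU as the activation as stated for the first fully connected layer; the feature map, weights and biases are made-up values, and row-major flattening is an assumption.

```python
def flatten(feature_map):
    """Turn a two-dimensional feature map into a one-dimensional vector, row by row."""
    return [v for row in feature_map for v in row]

def relu(v):
    return max(0.0, v)

def dense(x, weights, biases, activation):
    """One fully connected layer: one output per (weight row, bias) pair."""
    return [activation(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
            for w, b in zip(weights, biases)]

feature_map = [[1.0, 2.0], [3.0, 4.0]]    # made-up 2 x 2 feature map
x = flatten(feature_map)                  # [1.0, 2.0, 3.0, 4.0]
weights = [[0.5, 0.0, 0.0, 0.5],          # hypothetical weights for two neurons
           [-1.0, 0.0, 0.0, 0.0]]
biases = [0.0, 0.5]
outputs = dense(x, weights, biases, relu)
print(outputs)
```

The second neuron's weighted sum is negative, so its ReLU output is 0, which is the sparsity effect noted for this activation function.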

Softmax layer
In a CNN, if the final output is a single-label multi-class classification, the softmax function is usually used to normalize the outputs and map them to probability values [37,38]. The softmax calculation formula is as follows:
$$S_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$
where $z_i$ is the value of the output neuron corresponding to the $i$th category.
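The mapping from output values to class probabilities can be sketched as follows. Subtracting the maximum before exponentiating is a common numerical-stability measure that leaves the result unchanged; it is an implementation choice here, not something stated in the source, and the three input values are made up.

```python
import math

def softmax(z):
    """Map raw output values to probabilities that sum to 1."""
    shift = max(z)                       # avoids overflow for large inputs
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

outputs = [1.0, 2.0, 0.5]                # made-up values for mild, moderate, severe
probs = softmax(outputs)
print([round(p, 3) for p in probs])
```

The predicted class is the one with the largest probability, which is always the class with the largest raw output value.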

Adam network optimization algorithm
The Adam algorithm performs first-order optimization of stochastic objective functions. Its core idea is adaptive estimation of low-order moments. The pseudocode for the Adam algorithm is as follows.
Required parameters: learning rate $\alpha$; exponential decay rates for the moment estimates $\beta_1, \beta_2 \in [0,1)$; the stochastic objective function $f(\theta)$ with parameters $\theta$; the initial parameter vector $\theta_0$.
Preparation: $m_0 \leftarrow 0$ (initialize first moment vector); $v_0 \leftarrow 0$ (initialize second moment vector); $t \leftarrow 0$ (initialize time step).
While $\theta_t$ has not converged, the time step is incremented and the moment estimates and the parameters are updated.
Figure 8 shows the EEG-NIRS addiction classification structure. Module 1 is the CNN network architecture. Table 1 lists the EEG model parameters.
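The update loop can be sketched in plain Python. The loop body below follows the standard published form of Adam (moving averages of the gradient and the squared gradient, bias correction, then a scaled step); the default hyperparameter values, the fixed step count used in place of a convergence test, and the toy objective are assumptions for illustration, not settings reported in this paper.

```python
import math

def adam_minimize(grad, theta0, alpha=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Minimize an objective given its gradient function, using Adam."""
    theta = list(theta0)
    m = [0.0] * len(theta)   # first moment estimate
    v = [0.0] * len(theta)   # second moment estimate
    for t in range(1, steps + 1):
        g = grad(theta)
        for i, g_i in enumerate(g):
            m[i] = beta1 * m[i] + (1 - beta1) * g_i
            v[i] = beta2 * v[i] + (1 - beta2) * g_i * g_i
            m_hat = m[i] / (1 - beta1 ** t)   # bias-corrected first moment
            v_hat = v[i] / (1 - beta2 ** t)   # bias-corrected second moment
            theta[i] -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta

def toy_grad(theta):
    """Gradient of the toy objective (theta_0 - 3)^2 + (theta_1 + 1)^2."""
    return [2 * (theta[0] - 3.0), 2 * (theta[1] + 1.0)]

solution = adam_minimize(toy_grad, [0.0, 0.0], alpha=0.05, steps=2000)
print([round(s, 3) for s in solution])
```

Because each step is divided by the running root-mean-square of the gradient, the two parameters move at a similar pace even though their initial gradients differ in size.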

EEG CNN network parameter settings
Before the fusion of the two features, the EEG and fNIRS signals are each processed by their own CNN network, with separate network parameters. Module 3 in Figure 8 normalizes and fuses the EEG and NIRS output features as follows.
EEG features: a batch normalization layer normalizes the EEG features to optimize the one-dimensional feature distribution.
Near-infrared features: a batch normalization layer normalizes the near-infrared features to optimize the one-dimensional feature distribution.
Weight distribution: after normalization, the EEG and NIRS data have the same dimension and the same amount of data, and each type of data is given a 50% weight. The weighted data are then output to the fully connected layer.
Fully connected layer: the one-dimensional feature distribution is analyzed and the final classification is made.

Results
For EEG, a total of 32 electrodes in the central, parietal, and occipital regions were selected, and the sampling frequency was 1000 Hz. The experiment used the data of 45 subjects; each subject's data comprise 56 trials, and each trial contains 1.125 s of 32-channel EEG data following the appearance of a drug picture. The CNN was trained on the data of 27 subjects (60% of the total), validated on 9 subjects and tested on 9 subjects (20% of the total each).
For the fNIRS data, 16 channels in the dorsolateral prefrontal cortex, ventrolateral prefrontal cortex, and frontopolar prefrontal cortex were selected, as these regions are theoretically directly related to drug use. The data format was 10 channels 8 Hz. The experiment used the data of 45 subjects, each containing 56 trial labels, and each trial contains 1.125 s of 16-channel near-infrared data following the appearance of a drug picture. Consistent with the EEG processing, 27 subjects were used for training, 9 for validation and 9 for testing. There were 3024 trial labels of training data, 1008 of validation data and 1008 of testing data. Batch normalization adjusts the variance and the mean of the data through optimization. The method processes the EEG and NIRS features separately and converts them into data distributions of the same dimension and format, which makes it possible to combine the two features.
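A subject-wise split of this kind can be sketched as follows. The sketch assumes subjects are shuffled with a fixed seed and assigned whole to one set, so that no subject's trials appear in more than one set; the source does not say how subjects were assigned or whether the split was balanced across the three addiction groups, and the function name and seed are hypothetical.

```python
import random

def split_subjects(subject_ids, n_train=27, n_val=9, n_test=9, seed=0):
    """Assign whole subjects to the training, validation and test sets."""
    ids = list(subject_ids)
    if n_train + n_val + n_test != len(ids):
        raise ValueError("split sizes must cover every subject exactly once")
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

TRIALS_PER_SUBJECT = 56
train, val, test = split_subjects(range(1, 46))
for name, group in [("train", train), ("validation", val), ("test", test)]:
    print(name, len(group), "subjects,",
          len(group) * TRIALS_PER_SUBJECT, "trials per modality")
```

With 56 trials per subject this gives 1512, 504 and 504 trials per modality. The reported totals of 3024, 1008 and 1008 are exactly twice these figures, which would be consistent with counting the EEG and NIRS trials together, although the text does not state how the totals were obtained.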
In this paper, the CNN uses the Adam optimizer, and the loss is calculated by sparse categorical cross-entropy. The result after 1300 epochs is shown in Figure 9. The loss shows a convergence trend, which indicates that the network structure is stable. Monitoring the accuracy on the test data set showed that after 900 epochs (Figure 10) the accuracy declined markedly overall, that is, over-fitting occurred. To suppress over-fitting, the early stopping method is used to monitor the accuracy of the test set. This method continuously records the test-set accuracy; if the best accuracy is not exceeded for a number of consecutive epochs, it stops the iteration and saves the model with the best accuracy. In this article, the early stopping monitoring period is set to 5000 epochs. Over many experiments, the optimal model generally appeared between 800 and 1000 epochs. Figure 11 shows the average classification accuracy of drug addiction degree. Because the final model differs from one training run to the next, the accuracy is averaged over ten runs; the average 3-class accuracy of the CNN is 63.15%. The ultimate goal of this paper was to realize intelligent judgement of the addiction degree of drug addicts through machine learning, in place of the subjective judgement of traditional behavioral methods. The analysis results were consistent with the behavioral scales and expert judgement.
In the proposed EEG-NIRS bimodal signal classification method, the CNN classifies the degree of addiction by learning features related to the addiction degree of drug addicts, which makes the method objective and easy to use.
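The early stopping rule described above can be sketched on a recorded accuracy curve. The sketch assumes a patience counted in consecutive epochs without improvement and returns the epoch whose model would be kept; the accuracy values are made up, and a real training loop would also save the model weights at each new best epoch.

```python
def early_stopping_epoch(accuracies, patience):
    """Return (best_epoch, best_accuracy) under a patience-based stopping rule."""
    best_acc = float("-inf")
    best_epoch = 0
    for epoch, acc in enumerate(accuracies, start=1):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch   # new best: keep this model
        elif epoch - best_epoch >= patience:
            break                               # no improvement for `patience` epochs
    return best_epoch, best_acc

history = [0.40, 0.52, 0.58, 0.61, 0.63, 0.62, 0.60, 0.59, 0.57, 0.64]
print(early_stopping_epoch(history, patience=3))
print(early_stopping_epoch(history, patience=10))
```

With a patience of 3, training stops before the late improvement in the final epoch is seen; with a patience longer than the run, every epoch is evaluated and the rule simply keeps the best model, which is how a monitoring period longer than the number of training epochs would behave.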

Discussion
In this paper, EEG-NIRS was used to classify the degree of drug addiction from physiological data, which is an improvement on traditional psychological methods. A bimodal machine learning classification algorithm based on decision-level fusion is proposed to classify drug abusers.
In 2016, Das et al. studied EEG-NIRS as a novel technology for brain monitoring [39]. Xing et al. summarized the fine-needle aspiration cytological diagnosis of malignant lymphoma and formed a simple and effective classification diagnosis method [40]. Shin et al. verified through experiments the potential advantages of hybrid EEG-NIRS BCIs in classification accuracy [41]. The bimodal fusion machine learning classification algorithm finally obtained a classification accuracy of 63.15%. This accuracy is not very high; however, adjusting the CNN architecture may improve it.
To objectively evaluate the degree of drug addiction by machine learning, attention should be paid to the following aspects: 1) the integrity of the EEG and NIRS data; 2) in designing the machine learning network, input-level, feature-level and decision-level fusion can be considered for feature extraction, and SVM, LDA and CNN classifiers can be compared. This method combines the personal circumstances of drug users, psychological questionnaires, and doctors' suggestions, and has been verified through long-term experiments for the scientific evaluation of drug addiction craving.
One limitation of this study is that the number of subjects is not large enough; deep learning needs more data as support, and performance should improve with more data. A limitation of the machine learning model is that the bimodal combination does not achieve a higher 3-class accuracy. Researchers who encounter similar problems will need to adjust the machine learning architecture to obtain higher classification accuracy.

Conclusions
This paper set out to find an objective and effective method of evaluating the degree of drug addiction. To this end, an experimental paradigm was designed to induce drug users' craving for drugs. Drug pictures in the paradigm induced craving-related characteristics in the 45 subjects, and a CNN learned these characteristics to classify the drug users' degree of addiction.
In addition, two CNNs were built on the basis of decision-level fusion, giving full play to the respective advantages of the EEG-CNN and the near-infrared CNN, and fusion classification was then achieved. Experiments verified that this method can detect the data features associated with addiction and determine the addiction degree. The algorithm's classification was consistent in pattern with the classification into mild, moderate, and severe drug users provided by researchers and physicians.