A Classification Method of Acupoints and Non-acupoints based on Traditional Features and Wavelet Features

Meridians and acupoints are the basis of TCM theory and play an important role in disease diagnosis and acupuncture treatment. There are still many problems in current research on the electrical signals of acupoints. On the one hand, most studies examine only a few acupoints and do not consider the integrity of the meridian. On the other hand, the lack of targeted feature extraction and classification methods leads to unsatisfactory classification results. Considering these problems, a method combining traditional features and wavelet features is proposed to classify acupoints and non-acupoints. Based on the integrity of the meridians, we first collect the body surface electrical signals of some acupoints and non-acupoints on the twelve meridians of the human body, and then extract traditional and wavelet features from the measured signals. Finally, SVM and XGBoost are each used to classify acupoints and non-acupoints. The experimental results show that this method can effectively improve the classification performance for acupoints and non-acupoints, and that, for the feature vectors constructed in this paper, XGBoost has the better classification capability.


Introduction
In recent years, the problems of western medicine, such as resistance to chemical drugs and the rising cost of medical care, can no longer be neglected. Traditional Chinese Medicine (TCM) has attracted more and more attention because of its safety, convenience and effectiveness. However, compared with western medicine, the application of TCM is constrained in many areas by the lack of an objective and scientific basis for meridian theory [1] and the absence of a clear and independent anatomical structure.
Acupuncture points are the basis of traditional Chinese medicine theory, and their states can reflect the physical health of the human body. Therefore, studying them from an electrical perspective, analyzing their electrical characteristics, and distinguishing them from non-acupuncture points can help to locate acupuncture points precisely. It can also provide research ideas for disease diagnosis and treatment through acupoint electrical signals, which is of some significance for supporting the theory of Chinese medicine.
Current research on the electrical characteristics of meridian points follows two main directions. The first looks for characteristics that differ between acupoints and non-acupoints under applied electrical stimulation, and some results have been achieved. For example, a large number of experimental results show that acupuncture points have low impedance and high capacitance [2]. The second studies the potential characteristics of acupoints without stimulation, and it has been concluded that acupuncture points have a higher potential than non-acupuncture points [3]. Compared with the stimulation-based approach, the study of acupoint potential without stimulation directly collects and analyzes the electrical signals spontaneously generated by the human body, which avoids the influence of external stimulation, so it may be closer to the essence of the electrical characteristics of acupoints [4]. However, there are still many problems in current research on acupoint potential, such as:
1) Most studies are based only on the potential characteristics of a few acupoints and non-acupuncture points on one meridian. They have not studied the similarities and differences between acupuncture points and non-acupuncture points on multiple meridians based on the integrity of the meridian.
2) The lack of targeted feature extraction and classification methods results in low classification accuracy.
3) There are relatively few studies on the electrical signals of acupuncture points and non-acupuncture points without stimulation.
To address these problems, this paper collects the electrical signals of some acupoints and non-acupuncture points on the twelve meridians of the human body based on the integrity of the meridian, considers both the global and local features of the signals, and finally uses machine learning methods to classify acupoints and non-acupoints.
The rest of this article is organized as follows. Section 2 briefly introduces the overall plan of this research. Section 3 introduces the data processing. Section 4 presents the experiments and results. The conclusion and future work are given in Section 5.

The overall plan
The overall plan of this article is shown in Figure 1. First, we collect the raw signals of acupoints and non-acupoints on the twelve meridians of the human body and denoise them. Next, we extract traditional features and wavelet features from the denoised signals. Traditional features capture the overall information of a signal and describe its global characteristics, while wavelet features capture local information and describe the local characteristics of the different sub-band signals. Finally, machine learning methods are used to classify acupoints and non-acupoints.
3 points and some non-meridian non-acupuncture points 2 cm near the acupuncture points on the twelve meridians were collected sequentially. All measured points are on the left side of the subject's body. At each measured point, we collected a 150 s signal at 1000 Hz. The group of subjects consists of 7 healthy volunteers (20-29 years old, 3 males, 4 females). For each subject, we obtained the raw signals of 47 acupuncture points and 38 non-acupuncture points.

Data pre-processing
Each 150 s signal is cut into 3 s segments, and each segment is treated as one acupoint or non-acupoint sample. Of the 7 subjects, the signals of 6 (3 males, 3 females) form the training set, and the signals of the remaining subject form the test set. In both the training set and the test set, the numbers of acupoint and non-acupoint samples are balanced.
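The segmentation and the subject-wise split described above can be sketched as follows. This is only an illustration under stated assumptions: 1000 Hz is taken to be the sampling rate, the windows are assumed to be non-overlapping, and the helper names `segment_signal` and `split_by_subject` are hypothetical. Whether every window of a recording is kept is not stated in the text.

```python
import numpy as np

FS = 1000       # assumed sampling rate in Hz
WINDOW_S = 3    # segment length in seconds, as stated in the text


def segment_signal(signal, fs=FS, window_s=WINDOW_S):
    """Cut a 1-D recording into non-overlapping windows of window_s seconds."""
    signal = np.asarray(signal, dtype=float)
    samples_per_window = int(fs * window_s)
    n_windows = signal.size // samples_per_window
    # Drop any incomplete tail so that every segment has the same length.
    trimmed = signal[: n_windows * samples_per_window]
    return trimmed.reshape(n_windows, samples_per_window)


def split_by_subject(segments, subject_ids, test_subject):
    """Hold out all segments of one subject as the test set."""
    subject_ids = np.asarray(subject_ids)
    test_mask = subject_ids == test_subject
    return segments[~test_mask], segments[test_mask]


# Synthetic stand-in for one 150 s recording; real data would replace it.
recording = np.random.default_rng(0).normal(size=150 * FS)
windows = segment_signal(recording)
print(windows.shape)
```

Splitting by subject rather than by segment keeps segments of the same recording from appearing in both the training set and the test set.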
We first inspected the spectrum of the collected signals and found strong interference at the 50 Hz power frequency and its 100 Hz harmonic, so we used a notch filter to remove the 50 Hz and 100 Hz interference [5]. We then applied wavelet decomposition filtering [6] to remove the remaining noise. As shown in Figure 2, compared with the original signal, the denoised signal has the unwanted noise removed while retaining the useful signal as far as possible, which benefits the subsequent feature extraction.
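A minimal sketch of the notch-filtering step is given below. It assumes a 1000 Hz sampling rate and uses SciPy's `iirnotch` and `filtfilt`; the quality factor of 30 and the zero-phase filtering are illustrative choices, not settings reported in this paper. The subsequent wavelet denoising step is not shown.

```python
import numpy as np
from scipy.signal import filtfilt, iirnotch

FS = 1000  # assumed sampling rate in Hz


def remove_power_line(signal, fs=FS, freqs=(50.0, 100.0), quality=30.0):
    """Apply one zero-phase IIR notch per interference frequency."""
    filtered = np.asarray(signal, dtype=float)
    for f0 in freqs:
        # quality is an assumed value; it sets the notch bandwidth f0 / quality.
        b, a = iirnotch(f0, quality, fs=fs)
        filtered = filtfilt(b, a, filtered)
    return filtered


def band_amplitude(signal, freq, fs=FS):
    """Magnitude of the FFT bin closest to freq."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq))]


# Synthetic 3 s test signal: a 5 Hz component plus 50 Hz and 100 Hz interference.
t = np.arange(3 * FS) / FS
noisy = (np.sin(2 * np.pi * 5 * t)
         + 0.8 * np.sin(2 * np.pi * 50 * t)
         + 0.4 * np.sin(2 * np.pi * 100 * t))
clean = remove_power_line(noisy)
```

Comparing `band_amplitude(noisy, 50)` with `band_amplitude(clean, 50)` is a quick way to check that the interference has been suppressed while the low-frequency content is preserved.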

Traditional features
Traditional features include time domain features, frequency domain features, and nonlinear features. For each denoised signal, this paper extracts 42-dimensional traditional features.

Time domain features.
We perform various operations on the signal amplitude to extract time domain features, which directly measure changes in the amplitude and shape of the measured signal. The statistical features used in this paper include count value, average value, maximum, minimum, range, variance, skewness, and kurtosis, among others. The variance is calculated by equation (1), where $x_i$ is the value at each time point, $N$ is the length of the sequence, and $\langle x \rangle$ is the mean of the sequence.
The kurtosis, calculated by equation (2), characterizes the sharpness of the data distribution.
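The time domain statistics listed above can be sketched as follows. The exact forms of equations (1) and (2) are assumed here: the variance uses the 1/N normalisation, the kurtosis is the fourth standardized moment (not excess kurtosis), and "count value" is taken to mean the number of samples. The function name is hypothetical.

```python
import numpy as np


def time_domain_features(x):
    """Basic amplitude statistics of one segment, using population moments."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    centered = x - mean
    variance = np.mean(centered ** 2)   # assumed 1/N normalisation
    std = np.sqrt(variance)
    return {
        "count": x.size,                # assumed meaning of "count value"
        "mean": mean,
        "max": x.max(),
        "min": x.min(),
        "range": x.max() - x.min(),
        "variance": variance,
        "skewness": np.mean(centered ** 3) / std ** 3,
        "kurtosis": np.mean(centered ** 4) / std ** 4,  # assumed: not excess
    }


features = time_domain_features([1.0, 2.0, 3.0, 4.0, 5.0])
```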

Frequency domain features.
Because acupoint signals are random, time domain analysis alone is somewhat unstable, so we also obtain the frequency domain characteristics of the signal, which further reflect the nature of the body surface electrical signal.
In the frequency domain analysis, the measured data are converted from the time domain to the frequency domain by the FFT (Fast Fourier Transform), and the corresponding features are then extracted. The features used include Shannon entropy, mean power, total power, and signal-to-noise ratio (SNR), among others. The Shannon entropy represents the degree of uncertainty of the sequence. The total power, calculated by equation (3), is the sum of the power spectral density, where $p_i$ is the power spectral density.
$$P_{\mathrm{total}} = \sum_{i} p_i \tag{3}$$
The signal-to-noise ratio, calculated by equation (4), characterizes the proportion of valid information in the signal, where $s$ is the total power of the signal and $n$ is the total power of the noise.
$$\mathrm{SNR} = \frac{s}{n} \tag{4}$$
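The frequency domain features can be sketched as below. Several details are assumptions: the power spectral density is approximated by a one-sided periodogram, the Shannon entropy is computed on the normalised spectrum, and the split into signal and noise power uses a hypothetical `signal_band` parameter, since the text does not say how s and n are separated.

```python
import numpy as np

FS = 1000  # assumed sampling rate in Hz


def power_spectrum(x, fs=FS):
    """One-sided periodogram-style power estimate p_i from the FFT."""
    x = np.asarray(x, dtype=float)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    return freqs, power


def frequency_domain_features(x, fs=FS, signal_band=(0.5, 45.0)):
    freqs, power = power_spectrum(x, fs)
    total_power = power.sum()                  # equation (3): sum of p_i
    prob = power / total_power                 # normalise to a distribution
    prob = prob[prob > 0]                      # avoid log(0)
    entropy = -np.sum(prob * np.log2(prob))
    # Hypothetical split: power inside signal_band counts as signal,
    # everything else as noise.
    in_band = (freqs >= signal_band[0]) & (freqs <= signal_band[1])
    s = power[in_band].sum()
    n = power[~in_band].sum()
    snr = s / n if n > 0 else np.inf           # equation (4): SNR = s / n
    return {
        "shannon_entropy": entropy,
        "mean_power": power.mean(),
        "total_power": total_power,
        "snr": snr,
    }


# A pure 10 Hz tone concentrates its power in a single frequency bin.
t = np.arange(3 * FS) / FS
feats = frequency_domain_features(np.sin(2 * np.pi * 10 * t))
```

For a pure tone the entropy is close to zero, while a broadband noisy signal spreads its power over many bins and gives a larger entropy.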

Nonlinear domain features.
These features are extracted using nonlinear analysis methods. The features in this paper include the coefficient of variation (COV), median absolute deviation (MAD), autocorrelation coefficient (AC), and the number of turning points (NT) [7], among others. The COV, calculated by equation (5), characterizes the degree of dispersion of the signal, where std(X) is the standard deviation of the sequence.
$$\mathrm{COV} = \frac{\mathrm{std}(X)}{\langle x \rangle} \tag{5}$$
NT counts the number of changes in the sign of the slope, in other words the number of signal peaks; its calculation is shown in equation (6), where u(x) denotes the unit-step function.
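A minimal sketch of these four features follows. The COV follows equation (5); the MAD, the lag-1 autocorrelation, and the turning-point count use common textbook definitions that are assumed here, because the corresponding equations are not reproduced in the text. The turning-point count implements the "changes in the sign of the slope" description directly rather than the unit-step formulation.

```python
import numpy as np


def coefficient_of_variation(x):
    x = np.asarray(x, dtype=float)
    return x.std() / x.mean()   # equation (5); undefined for a zero mean


def median_absolute_deviation(x):
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))


def autocorrelation(x, lag=1):
    """Normalised autocorrelation at a positive lag (lag=1 is an assumption)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.sum(centered[:-lag] * centered[lag:]) / np.sum(centered ** 2)


def turning_points(x):
    """Count the changes in the sign of the slope."""
    slope_sign = np.sign(np.diff(np.asarray(x, dtype=float)))
    slope_sign = slope_sign[slope_sign != 0]   # ignore flat steps
    return int(np.count_nonzero(slope_sign[1:] != slope_sign[:-1]))


print(turning_points([1, 3, 2, 4, 3, 5]))
```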

Wavelet features
Wavelet analysis transforms a one-dimensional signal in the time domain into a two-dimensional time-scale space. The time domain signal is projected and decomposed at multiple scales in the wavelet transform domain to obtain signals of different sub-bands. Different sub-band signals contain different local information and describe the signal at a finer level of detail [8]. The basic principles of the discrete wavelet transform are as follows. Assume that there is a continuous signal $f(t)$ on which we perform the discrete wavelet transform. The definitions of the discrete wavelet transform and its inverse are shown in equation (7) and equation (8), where $\psi_{j,k}(t) = 2^{-j/2}\,\psi(2^{-j}t - k)$ is a wavelet sequence and $\psi(t)$ is a wavelet basis function that satisfies certain conditions. $f_j(t)$ represents the component of the signal $f(t)$ at a certain scale ($2^j$). $j$ and $k$ represent the frequency resolution and the time offset, respectively.
Using the Mallat algorithm, we decompose the signal $f(t)$ over a finite number of levels, so that $f(t)$ can be expressed as equation (9):
$$f(t) = A_L + \sum_{j=1}^{L} D_j \tag{9}$$
In equation (9), $L$ is the number of decomposition levels, $D_j$ is the detail component at each scale, and $A_L$ is the low-pass approximation component; more specifically, $A_L = f_L^A(n)$ and $D_j = f_j^D(n)$. The entire frequency band of the signal is thus divided into multiple sub-bands. The approximation coefficients and wavelet coefficients corresponding to these sub-bands are $cA_L, cD_L, cD_{L-1}, \ldots, cD_1$. Let $f(n)$ be the sampled version of the continuous signal $f(t)$; then $f(n)$ can be regarded as the approximation at scale j=1, that is, $A_0(n) = f(n)$. As shown in Figure 3, the discrete signal is decomposed by the wavelet at scales j=2,3,4, which gives the approximation coefficient $cA_3$ and the wavelet coefficients $cD_3, cD_2, cD_1$. Because prior knowledge about acupoint signals is lacking, a thorough decomposition may help to extract more useful local information. In this paper, we use the Daubechies (DB7) wavelet to decompose the signals into 7 levels (j=8) and obtain the wavelet decomposition coefficient vector $C$, which is structured as follows:
$$C = \{cA_7, cD_7, cD_6, cD_5, cD_4, cD_3, cD_2, cD_1\}$$
where $cA_7$ is the approximation coefficient and $cD_7, cD_6, \ldots, cD_1$ are the wavelet coefficients.
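To make the sub-band division concrete, the sketch below lists the nominal frequency range covered by each coefficient set of a 7-level dyadic decomposition. It assumes a 1000 Hz sampling rate and ideal half-band filters, so the ranges are approximate for a real wavelet such as DB7; the function name is hypothetical.

```python
FS = 1000.0   # assumed sampling rate in Hz
LEVELS = 7    # decomposition depth stated in the text


def subband_ranges(fs=FS, levels=LEVELS):
    """Nominal (ideal-filter) frequency range of each DWT coefficient set."""
    nyquist = fs / 2.0
    bands = {}
    for level in range(1, levels + 1):
        # Each level halves the band left over by the previous level.
        upper = nyquist / 2 ** (level - 1)
        lower = nyquist / 2 ** level
        bands[f"cD{level}"] = (lower, upper)
    bands[f"cA{levels}"] = (0.0, nyquist / 2 ** levels)
    return bands


for name, (low, high) in subband_ranges().items():
    print(f"{name}: {low:.2f} to {high:.2f} Hz")
```

Under these assumptions, the finest detail band cD1 spans 250 to 500 Hz, and the approximation cA7 covers everything below about 3.9 Hz.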
For the approximation coefficients and the wavelet coefficients of each band, six statistical features are extracted: maximum, minimum, mean, median, standard deviation, and average energy. Each original signal therefore yields 48-dimensional wavelet features (6 statistics for each of the 8 coefficient sets) [9]. These wavelet features describe different local information of the original signals.
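The construction of the 48-dimensional wavelet feature vector can be sketched as follows. The function takes the list of coefficient arrays as input so that it does not depend on a particular wavelet library. The average energy is assumed to be the mean of the squared coefficients, and the placeholder arrays below stand in for real decomposition output.

```python
import numpy as np

STATISTICS = {
    "max": np.max,
    "min": np.min,
    "mean": np.mean,
    "median": np.median,
    "std": np.std,
    "energy": lambda c: np.mean(np.square(c)),  # assumed form of average energy
}


def wavelet_features(coeffs):
    """Six statistics per coefficient array, concatenated in band order."""
    features = []
    for band in coeffs:
        band = np.asarray(band, dtype=float)
        features.extend(float(stat(band)) for stat in STATISTICS.values())
    return np.array(features)


# Placeholder coefficient arrays for cA7, cD7, ..., cD1 (arbitrary lengths).
rng = np.random.default_rng(0)
example_coeffs = [rng.normal(size=n) for n in (36, 36, 59, 105, 198, 384, 756, 1506)]
vector = wavelet_features(example_coeffs)
print(vector.shape)
```

With PyWavelets, `pywt.wavedec(segment, 'db7', level=7)` returns the coefficient arrays in the order cA7, cD7, ..., cD1, so its result can be passed to `wavelet_features` directly.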

Experiments and Results
In this paper, we obtained 7112 3 s signals (1000 Hz) from the original signals of 6 subjects: 3920 acupoint signals and 3192 non-meridian non-acupoint signals. These signals are used as the training set.
To evaluate the classification ability of the models more accurately, we randomly selected the signals of 12 acupuncture points and 12 non-meridian non-acupuncture points from all the measured signals of the seventh subject as a test set. We repeated this random selection 11 times and obtained 11 different test sets. Each test set has 672 3 s signals (1000 Hz): 336 acupoint signals and 336 non-meridian non-acupoint signals.
We take traditional features as the first type of features, and the combination of traditional features and wavelet features as the second type of features.
First, we extract the first type of features from the training data and test data. In the training stage, we use the training features to train an SVM with the RBF kernel [10] and XGBoost based on the CART regression tree [11], optimize the parameters by grid search, and save the trained models. We then use the 11 test sets with the first type of features to verify the classification performance of the different models. The F1-score is used as the evaluation metric for the classification ability of our models. We then extract the second type of features from the training data and test data and repeat the above process.
As shown in Figure 4 and Table 1, from the perspective of the features, when the second type of features (traditional plus wavelet features) is used, the average F1-score of SVM over the 11 test sets is 0.7373, an increase of 0.0392 over using traditional features alone, and the average F1-score of XGBoost over the 11 test sets is 0.7728, an increase of 0.0292 over using traditional features alone. These results show that, compared with the first type of feature vectors, which capture only the global features of the signals, the second type of feature vectors integrate the global and local features of the signals, have higher separability, and improve the models' ability to classify acupoint and non-acupoint potential signals.
From the perspective of the models, with the combined feature vectors constructed in this paper, the average F1-scores of both SVM and XGBoost over the 11 test sets are above 0.73, which is sufficient to classify acupoints and non-acupoints well. With the same type of feature vectors, the classification performance of XGBoost is better than that of SVM, so it has better generalization ability.
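The evaluation metric can be sketched as follows. This is a plain NumPy illustration of the binary F1-score and of its average over several test sets. Treating the acupoint class (label 1) as the positive class is an assumption, as are the function names; the labels in the example are made up purely for illustration.

```python
import numpy as np


def f1_score_binary(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


def mean_f1(test_sets):
    """Average F1 over several (y_true, y_pred) pairs, e.g. the 11 test sets."""
    return float(np.mean([f1_score_binary(t, p) for t, p in test_sets]))


# Made-up labels: 1 = acupoint, 0 = non-acupoint.
score = f1_score_binary([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```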

Feature importance
To further explore the contributions of traditional features and wavelet features within the second type of features, we calculate the importance of all features in the XGBoost model. Table 2 shows the feature proportions among the Top 10, Top 20, Top 40, and Top 60 most important features. The traditional features are effective and represent the global features of the signals. The wavelet features are also indispensable, as they represent the local characteristics of the signals. Combining global features with local features describes the regular differences between the feature spaces of acupoint and non-acupoint signals more comprehensively, thereby