Arm Motion Classification Using Time-Series Analysis of the Spectrogram Frequency Envelopes

Hand and arm gesture recognition using the radio frequency (RF) sensing modality proves valuable in man–machine interfaces and smart environments. In this paper, we use time-series analysis methods to accurately measure the similarity of the micro-Doppler (MD) signatures between the training and test data, thus providing improved gesture classification. We characterize the MD signatures by the maximum instantaneous Doppler frequencies depicted in the spectrograms. In particular, we apply two machine learning (ML) techniques, namely, the dynamic time warping (DTW) method and the long short-term memory (LSTM) network. Both methods take into account the values as well as the temporal evolution and characteristics of the time-series data. It is shown that the DTW method achieves high gesture classification rates and is robust to time misalignment.


Introduction
Propelled by successes in discriminating between different human activities, radar has recently been employed for automatic hand gesture recognition for interactive intelligent devices [1][2][3][4][5][6]. This recognition proves important in contactless close-range hand-held or arm-worn devices, such as cell phones and watches. A recent project on hand gesture recognition, Google's Soli, monitors contactless interactions with radar embedded in a wrist band and is a good example of this emerging technology [3]. In general, automatic hand or arm gesture recognition through the use of radio frequency (RF) sensors is important for the smart environment. It is poised to make homes more user friendly and more efficient by identifying different motions for controlling instruments and household appliances. The same technology can greatly benefit the physically challenged, who might be wheelchair-confined or bedridden. The goal is to enable these individuals to function independently.
Arm motions have different kinematics than hand motions, especially in terms of speed and time duration. Compared to hand gestures, arm gestures can be more suitable for longer-range contactless man–machine interactions, e.g., commanding appliances, like a TV, from a distant couch. The large radar cross-sections of the arms, vis-à-vis the hands, permit more remote interactive positions in an indoor setting. Further, the ability to use hand gestures for device control can be hindered by impairments such as Parkinson's disease, which induces strong hand tremors.
The nature and mechanism of arm motions are dictated by the elongated bone structure of the arm, defined by the humerus, which extends from the shoulder to the elbow, and the radius and ulna, which extend from the elbow to the wrist. Because of this structure, arm motions, excluding the hands, can be accurately simulated by two connected rods. In this respect, the instantaneous Doppler frequencies corresponding to different points on the upper arm are closely related. The same can be said for the forearm. This is different from hand motions, which involve different and flexible motions of the palm and the fingers, and it is certainly distinct from body motions, which yield intricate micro-Doppler (MD) signatures [7][8][9][10][11][12][13][14][15].
Recent work in automatic arm motion recognition used the maximum instantaneous Doppler frequencies, i.e., the frequency envelopes of the MD signature in the data spectrogram, as features, followed by the nearest neighbor (NN) classifier, and achieved classification rates close to 97% [16]. It was shown that the feature vector consisting of the augmented positive and negative frequency envelopes outperforms data-driven automatic feature extraction, such as principal component analysis (PCA), and provides results similar to those of a convolutional neural network (CNN). Since the NN classifier applies distance metrics to measure the closeness of the test data to the training data, shuffling the envelope values of all test and training data in the same manner will not change the metric or the classification results. In this respect, the frequency envelope values, rather than the actual shape of the envelope, decide the classification performance.
In this paper, with a focus on improving the results in [17], we employ features that capture both the values and the evolution characteristics of the MD signature envelope. The envelope represents the maximum instantaneous Doppler frequencies and, thus, can be considered as a time series. Time-series analysis appears in many application domains, including speech recognition, handwriting recognition, weather readings, and financial recordings [18][19][20]. We consider two common time-series recognition methods, namely, the NN-dynamic time warping (DTW) method (an NN classifier with the DTW distance) [21][22][23][24] and the long short-term memory (LSTM) method [25][26][27][28]. The former is a conventional machine learning (ML) technique that utilizes the DTW distance, i.e., the minimum sum of local distances over all admissible alignments of two series. Its nonlinear warping capability finds an optimal alignment between two time series and can, therefore, determine the similarity between them [29][30][31][32][33][34]. The latter is a deep learning tool that is more appropriate for time series than a CNN; it establishes a memory of the temporal evolution of the data during the training process [35][36][37][38]. The DTW-based NN classifier is shown to outperform those based on the L1 distance norm and the LSTM method, achieving an average classification rate above 99%. Both time-series analysis methods are robust to time misalignment. Similar to [17], our feature vector includes the augmented positive and negative frequency envelopes. However, we also augment these two envelopes with a vector of their differences, which captures the time synchronization of the two envelopes. It is noted that no repetitive motions are considered, and gesture classification is applied to only a single arm motion cycle [39].
The main novelty of our work is that, to the best of our knowledge, this is the first time that time-series recognition methods are employed to classify arm motions using the maximum instantaneous Doppler frequency features. Commonly applied classification methods, such as handcrafted feature-based methods and low-dimensional representation techniques based on PCA and CNN [2,4,5,40], are more suitable for image-like data. The principal motivation for using time-series recognition methods is to exploit the time relations between the different envelope values for improved classification.
The remainder of this paper is organized as follows. In Section 2, we describe a method to extract the MD signature envelopes and discuss two time-series analysis methods, namely, dynamic time warping and long short-term memory. Section 3 describes the arm motion experiments and presents the gesture recognition accuracy of the two time-series analysis methods. Section 4 discusses the robustness of the proposed methods to time misalignment, as well as their time consumption. The paper is concluded in Section 5.

Time-Frequency Representations
The radar back-scattered signal from arms in motion can be analyzed by its MD signature. The MD signature is a time-frequency representation (TFR) that reveals the local frequency behavior of the received signal. A number of TFR methods could be used to represent the MD signature. The spectrogram is a commonly employed TFR based on linear time-frequency analysis. For a discrete-time signal $s(n)$ of length $N$, the spectrogram is obtained by taking the short-time Fourier transform (STFT) of the data and computing its squared magnitude,

$$S(n,k) = \left| \sum_{m=0}^{L-1} s(n+m)\, h(m)\, e^{-j 2\pi m k / K} \right|^2, \qquad (1)$$

where $n = 1, \dots, N$ is the time index, $k = 1, \dots, K$ is the discrete frequency index, and $L$ is the length of the window function $h(\cdot)$. An example of the spectrogram, normalized to 0 dB, is illustrated in Figure 1. The sliding window $h(\cdot)$ is rectangular with length $L = 2048$ (0.16 s), and $K$ is set to 4096.
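The spectrogram computation described above can be sketched in NumPy as follows. This is a minimal sketch, assuming complex (I/Q) baseband samples at 12.8 kHz. The hop of 256 samples is an assumption made for speed (the definition slides the window one sample at a time), and `spectrogram` and `doppler_axis` are hypothetical helper names.

```python
import numpy as np

def spectrogram(s, L=2048, K=4096, hop=256):
    """Squared-magnitude STFT with a rectangular window.

    s   : complex baseband samples (1-D array)
    L   : window length in samples (2048 = 0.16 s at 12.8 kHz)
    K   : DFT length; K > L zero-pads each frame
    hop : frame step in samples (assumed; the text slides by one sample)
    Returns S with shape (num_frames, K), zero Doppler in the middle column.
    """
    s = np.asarray(s)
    h = np.ones(L)  # rectangular window h(m)
    starts = range(0, len(s) - L + 1, hop)
    frames = np.stack([s[n:n + L] * h for n in starts])
    # np.fft.fft uses the kernel exp(-j*2*pi*m*k/K), matching the STFT above
    S = np.abs(np.fft.fft(frames, n=K, axis=1)) ** 2
    return np.fft.fftshift(S, axes=1)

def doppler_axis(K=4096, fs=12800.0):
    """Frequency in Hz of each column of the shifted spectrogram."""
    return np.fft.fftshift(np.fft.fftfreq(K, d=1.0 / fs))
```

For display, `10 * np.log10(S / S.max())` gives the 0 dB-normalized image. With K = 4096 and fs = 12.8 kHz, adjacent bins are 3.125 Hz apart.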
We consider the MD signal as a deterministic signal rather than a stochastic process, and do not assume an underlying frequency modulated signal model that calls for optimum parameter estimation [41][42][43].

Power Burst Curve (PBC)
In real-time processing, the received signal is typically a long time sequence that may contain multiple consecutive arm motions. Finding the onset and offset times of each arm motion is necessary to determine the individual motion boundaries and time spans. These times can be obtained from the PBC [44,45], which measures the signal energy in the spectrogram within specific frequency bands. In particular, we compute

$$S(n) = \sum_{k=K_{N1}}^{K_{N2}} S(n,k) + \sum_{k=K_{P1}}^{K_{P2}} S(n,k). \qquad (2)$$

In the problem considered, the negative frequency indices $K_{N1}$ and $K_{N2}$ correspond to −500 Hz and −20 Hz, respectively, whereas the positive frequency indices $K_{P1}$ and $K_{P2}$ correspond to 20 Hz and 500 Hz. The frequency band around the zero Doppler bin between −20 Hz and 20 Hz degrades the accuracy of the result and is, therefore, not considered. The resulting PBC is indicated by the blue curve in Figure 2 for the example spectrogram in Figure 1.
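The PBC computation can be sketched as below, assuming the spectrogram is stored as a (frames × K) NumPy array with zero Doppler centered and a matching vector of bin frequencies in Hz. The helper name `power_burst_curve` is hypothetical; the band edges are the ±20 Hz and ±500 Hz values given above.

```python
import numpy as np

def power_burst_curve(S, freqs, band=(20.0, 500.0)):
    """Sum spectrogram power over [-500, -20] Hz and [20, 500] Hz per frame.

    S     : spectrogram, shape (frames, K)
    freqs : frequency of each column in Hz
    The |f| < 20 Hz region around zero Doppler is excluded.
    """
    lo, hi = band
    neg = (freqs >= -hi) & (freqs <= -lo)  # negative-frequency band
    pos = (freqs >= lo) & (freqs <= hi)    # positive-frequency band
    return S[:, neg].sum(axis=1) + S[:, pos].sum(axis=1)
```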
To avoid falsely breaking a motion signature into several segments, the original PBC is smoothed by a moving average filter of length $P$. The filtered PBC, $S_f(n)$, is given by

$$S_f(n) = \frac{1}{P} \sum_{p=0}^{P-1} S(n-p), \qquad (3)$$

and is shown in Figure 2 by the red curve. The threshold $T$, which determines the beginning and the end of each motion, is computed by

$$T = S_{f,\min} + \alpha \left( S_{f,\max} - S_{f,\min} \right), \qquad (4)$$

where $\alpha$ depends on the noise floor and is empirically chosen from the range [0.01, 0.2], and $S_{f,\min}$ and $S_{f,\max}$ represent the minimum and maximum values of $S_f(n)$, respectively. In this paper, $\alpha$ is set to 0.1, which places the threshold 10% of the dynamic range above the minimum. The threshold is indicated by the yellow line in Figure 2.
The onset time of each motion is determined as the time index at which the filtered PBC exceeds the threshold, whereas the offset time corresponds to the time index at which the filtered PBC falls below the threshold.
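The smoothing, thresholding, and onset/offset steps can be sketched as follows. This is a sketch under stated assumptions: the moving average is causal with zeros before the first sample, P = 10 is only a placeholder because the text does not fix P, and each offset is returned as the first index at which the smoothed PBC falls back below the threshold. The helper name `detect_motions` is hypothetical.

```python
import numpy as np

def detect_motions(pbc, P=10, alpha=0.1):
    """Smooth the PBC, threshold it, and return (Sf, T, [(onset, offset), ...]).

    P     : moving-average length (placeholder value)
    alpha : threshold fraction of the dynamic range above the minimum
    """
    pbc = np.asarray(pbc, dtype=float)
    # Causal moving average: Sf[n] = mean(pbc[n-P+1 .. n]), zeros before index 0
    Sf = np.convolve(pbc, np.ones(P) / P)[: len(pbc)]
    T = Sf.min() + alpha * (Sf.max() - Sf.min())
    above = (Sf > T).astype(int)
    edges = np.diff(above)
    onsets = list(np.flatnonzero(edges == 1) + 1)    # first index above T
    offsets = list(np.flatnonzero(edges == -1) + 1)  # first index back below T
    if above[0]:       # a motion is already in progress at the start
        onsets.insert(0, 0)
    if above[-1]:      # a motion is still in progress at the end
        offsets.append(len(Sf))
    return Sf, T, list(zip(onsets, offsets))
```

The causal filter delays the falling edge by up to P − 1 samples; a centered average would avoid that delay at the cost of needing future samples.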

Extraction of the Maximum Instantaneous Doppler Frequency Signature
The arm has a bone structure that makes it more rigid than the hand. For example, the motion of any point on the upper arm, which is the part from the shoulder to the elbow, can be inferred from the motion of any other point on the same part. This property motivates us to use the maximum instantaneous Doppler frequencies as principal features. These features represent the positive and negative frequency envelopes in the spectrograms and attempt to capture, among other things, the maximum Doppler frequencies, the time duration of the arm motion event and its bandwidth, and the relative portions of the motion towards and away from the radar. In this respect, the envelopes can accurately characterize different arm motions. The energy-based thresholding algorithm discussed in [17,44] can be applied to extract the envelopes. First, the maximum positive and negative Doppler frequencies are determined by computing the effective bandwidth of each motion from the spectrogram. Second, the positive frequency and negative frequency parts of a spectrogram are used to generate the positive envelope and negative envelope, respectively. The corresponding energies of the two parts, denoted as $E_U(n)$ and $E_L(n)$, are computed separately as

$$E_U(n) = \sum_{k \in \mathcal{K}_U} S(n,k)^2, \qquad E_L(n) = \sum_{k \in \mathcal{K}_L} S(n,k)^2, \qquad (5)$$

where $\mathcal{K}_U$ and $\mathcal{K}_L$ denote the positive-frequency and negative-frequency bin indices, respectively. Figure 3 shows the resulting positive and negative energies for the example considered. These energies are then scaled to define the respective thresholds, $T_U$ and $T_L$:

$$T_U(n) = \sigma_U E_U(n), \qquad T_L(n) = \sigma_L E_L(n), \qquad (6)$$

where $\sigma_U$ and $\sigma_L$ represent the scale factors, both less than 1. These scalars can be chosen empirically, but an effective way to select them is to keep the ratio of the energy to the threshold constant over all time samples. This constant ratio can be found by locating in time the maximum positive Doppler frequency and computing the corresponding energy at this location. In this example, $t_i = 2.54$ s, $f_j = 340$ Hz, and $A(t_i, f_j) = 320$, where $(t_i, f_j)$ and $A(t_i, f_j)$ represent the location of the maximum positive Doppler frequency and its strength, respectively. The corresponding scale factor is then

$$\sigma_U = \frac{A(t_i, f_j)}{E_U(t_i)}. \qquad (7)$$

Once the threshold is computed, the positive frequency envelope is obtained by locating, at each time instant, the Doppler frequency at which the spectrogram, scanned from the highest frequency downward, first reaches or exceeds the threshold. This frequency, in essence, represents the effective maximum instantaneous Doppler frequency. A similar procedure is followed for the negative frequency envelope. The positive frequency envelope, $e_U(n)$, and the negative frequency envelope, $e_L(n)$, are concatenated to form the feature vector $e = [e_U, e_L]$. The extracted frequency envelopes of the example considered are plotted in Figure 4.
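A minimal sketch of the energy-based thresholding follows. It assumes the spectrogram layout (frames × K, zero Doppler centered) and the time-varying thresholds $T_U(n) = \sigma_U E_U(n)$ and $T_L(n) = \sigma_L E_L(n)$ described in the text. The helper names are hypothetical. Taking the outermost bin that reaches the threshold is our reading of the procedure, and returning 0 Hz for frames with no energy is an assumption.

```python
import numpy as np

def frequency_envelopes(S, freqs, sigma_u, sigma_l):
    """Return (e_u, e_l): per-frame maximum positive / negative Doppler frequency.

    S     : spectrogram, shape (frames, K), zero Doppler centered
    freqs : frequency of each column in Hz
    Frames with zero energy, or with no bin reaching the threshold, get 0 Hz.
    """
    pos, neg = freqs > 0, freqs < 0
    f_pos, f_neg = freqs[pos], freqs[neg]
    S_pos, S_neg = S[:, pos], S[:, neg]
    E_u = (S_pos ** 2).sum(axis=1)  # positive-frequency energy E_U(n)
    E_l = (S_neg ** 2).sum(axis=1)  # negative-frequency energy E_L(n)
    T_u = sigma_u * E_u             # time-varying threshold T_U(n)
    T_l = sigma_l * E_l             # time-varying threshold T_L(n)
    e_u = np.zeros(S.shape[0])
    e_l = np.zeros(S.shape[0])
    for n in range(S.shape[0]):
        if E_u[n] > 0:
            hits = f_pos[S_pos[n] >= T_u[n]]
            if hits.size:
                e_u[n] = hits.max()  # outermost positive bin reaching T_U(n)
        if E_l[n] > 0:
            hits = f_neg[S_neg[n] >= T_l[n]]
            if hits.size:
                e_l[n] = hits.min()  # outermost negative bin reaching T_L(n)
    return e_u, e_l

def sigma_from_reference(S, freqs, n_i, f_j):
    """Scale factor sigma_U = A(t_i, f_j) / E_U(t_i) at a reference cell."""
    k_j = int(np.argmin(np.abs(freqs - f_j)))
    E_u = (S[n_i, freqs > 0] ** 2).sum()
    return S[n_i, k_j] / E_u
```

The feature vector is then `np.concatenate([e_u, e_l])`.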

Time-Series Analysis Methods
The extracted maximum instantaneous Doppler frequencies are considered as a time series. To measure the similarity between two time series, the traditional L1 and L2 distances do not take into account the temporal or evolutionary behavior of the series. We seek a similarity measure that accounts for these properties and is robust to time shift and scaling. To fully exploit the characteristics of the maximum instantaneous Doppler frequencies, two time-series analysis methods are presented, namely, the DTW method and the LSTM method. The DTW method is a well-established distance measure that permits time and scale misalignments, and an NN classifier can be applied in conjunction with it [32,46]. On the other hand, the gated design of the LSTM allows the network to exhibit temporal dynamic behavior and, as such, to remain cognizant of past input samples. The LSTM has achieved great success in handwriting recognition and speech recognition [47,48].

Dynamic Time Warping Method
The NN classifier is applied to the MD signature feature vector to discriminate among six arm motions. The DTW distance is one of the principal methods used to calculate the similarity between two motion time series which may vary in time or speed. For instance, similarities in walking patterns could be detected using DTW, even if one person walks faster than the other, or if there are accelerations and decelerations during the course of an observation.
Suppose $X = (x_1, x_2, \dots, x_i, \dots, x_n)$ and $Y = (y_1, y_2, \dots, y_j, \dots, y_n)$ are two time series representing the maximum instantaneous Doppler frequencies. An $n \times n$ distance matrix $D$ is then formed, where the $(i, j)$ element is the distance $D(x_i, y_j)$ between $x_i \in X$ and $y_j \in Y$, typically computed by the L1 or L2 norm. Each element also corresponds to an alignment between $x_i$ and $y_j$. A warping path $W$ is a contiguous sequence of elements of $D$ that defines a mapping between $X$ and $Y$ [21,29,49]:

$$W = w_1, w_2, \dots, w_l, \dots, w_L, \qquad (8)$$

where each $w_l$ corresponds to an element $(i, j)_l$. The warping path is typically restricted by the following three constraints [21,29,49]:
• Boundary conditions: the path begins at $w_1 = (1, 1)$ and ends at $w_L = (n, n)$;
• Monotonicity: given $w_{l_1} = (a, b)$ and $w_{l_2} = (a', b')$ with $l_1 \le l_2$, then $a \le a'$ and $b \le b'$;
• Continuity: given $w_l = (a, b)$ and $w_{l+1} = (a', b')$, then $a' - a \le 1$ and $b' - b \le 1$.

When the distance matrix $D$ is computed by the L2 norm, the diagonal line in Figure 5 represents the Euclidean path, which is just one of all possible paths. The DTW distance corresponds to the path that satisfies the above constraints and also has the minimum warping cost, as illustrated in Figure 5:

$$\mathrm{DTW}(X, Y) = \min_{W} \sum_{l=1}^{L} D(w_l). \qquad (9)$$

The NN classifier is among the most commonly used classifiers in pattern recognition. It is a simple ML classification algorithm that, for each test sample, calculates the distance to all training samples. The DTW distance is chosen as the distance metric because of its superior performance in time-series analysis compared with the conventional L1 and L2 distances. The test sample is assigned the label of the closest training sample according to the DTW distance.
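The NN-DTW classifier can be sketched in a few lines of NumPy. This is the textbook dynamic-programming recursion, not necessarily the authors' MATLAB implementation. It uses the absolute difference as the local distance (the L1 option mentioned above), and its unit steps enforce the boundary, monotonicity, and continuity constraints. `dtw_distance` and `nn_dtw_classify` are hypothetical names.

```python
import numpy as np

def dtw_distance(x, y):
    """DTW distance via dynamic programming.

    D[i, j] holds the minimum cumulative cost of aligning x[:i] with y[:j].
    The allowed steps (i-1, j), (i, j-1), (i-1, j-1) enforce monotonicity and
    continuity; starting at D[0, 0] and reading D[n, m] enforces the boundary
    conditions. The local cost is |x_i - y_j|.
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def nn_dtw_classify(test, train_X, train_y):
    """1-NN: return the label of the training series with the smallest DTW distance."""
    dists = [dtw_distance(test, tr) for tr in train_X]
    return train_y[int(np.argmin(dists))]
```

For example, `dtw_distance([0, 0, 1, 2], [0, 1, 2, 2])` is zero because the repeated samples are absorbed by the warping, whereas a sample-by-sample comparison of the same two series is not.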

Long Short-Term Memory
The CNN and the recurrent neural network (RNN) are two common deep learning tools. The former performs well on spatially distributed data and is mainly used for image classification with inputs of predefined size. On the other hand, the RNN can capture temporal information and is more commonly used in speech recognition and natural language processing. Since we cast the features as time series, we opt to use an RNN to analyze the temporal information embedded in the data. However, the conventional RNN has difficulty retaining long-term memory because gradients tend to vanish when propagated over many time steps. The LSTM is an alternative RNN architecture that can overcome this shortcoming. A detailed explanation of the LSTM can be found in [25][26][27][28].
The diagram in Figure 6 illustrates the architecture of the employed LSTM network. The input layer feeds the time-series data into the network, and the LSTM layer learns temporal information from the input. The fully connected layer combines all the features learned by the LSTM layer for classification; therefore, its output size is equal to the number of classes. The softmax layer normalizes the output of the preceding layer so that it can be used as the classification probabilities. At each time step, the input is two-dimensional: along each dimension, a time series of length $N$, representing the maximum positive or negative instantaneous Doppler frequencies, is considered. That is,

$$X_{\mathrm{input}} = [X_1, X_2, \dots, X_t, \dots, X_N] = \begin{bmatrix} e_{U1} & e_{U2} & \cdots & e_{Ut} & \cdots & e_{UN} \\ e_{L1} & e_{L2} & \cdots & e_{Lt} & \cdots & e_{LN} \end{bmatrix},$$

where $X_t$ is the two-dimensional input vector containing the maximum positive and negative Doppler frequencies at time $t$. Figure 7 shows the details of the LSTM layer. Each LSTM block contains three gates to control the flow of information, namely, the forget gate, the input gate, and the output gate. The hidden state $h_t$ and the cell state $c_t$ of the LSTM layer at time $t$ are obtained by the following equations [25]:

$$\begin{aligned} f_t &= \sigma(W_f X_t + R_f h_{t-1} + b_f), \\ i_t &= \sigma(W_i X_t + R_i h_{t-1} + b_i), \\ g_t &= \tanh(W_g X_t + R_g h_{t-1} + b_g), \\ o_t &= \sigma(W_o X_t + R_o h_{t-1} + b_o), \\ c_t &= f_t \odot c_{t-1} + i_t \odot g_t, \\ h_t &= o_t \odot \tanh(c_t), \end{aligned} \qquad (10)$$

where $f_t$ is the forget gate, $i_t$ is the input gate, $g_t$ is the cell candidate, and $o_t$ is the output gate. $W$, $R$, and $b$ are the input weights, recurrent weights, and biases, respectively, with the subscripts corresponding to the different gates: $W_f$, $R_f$, and $b_f$ are the parameters of the forget gate; $W_i$, $R_i$, and $b_i$ those of the input gate; $W_g$, $R_g$, and $b_g$ those of the cell candidate; and $W_o$, $R_o$, and $b_o$ those of the output gate. $\sigma(\cdot)$ and $\tanh(\cdot)$ represent the sigmoid and hyperbolic tangent activation functions, respectively, and $\odot$ denotes the Hadamard product.
The difference between the conventional RNN and the LSTM network is that the LSTM has three gates to regulate the flow of information, which also gives it its long-term memory. The forget gate $f_t$ learns what information in the time sequence is relevant and decides to keep or forget it accordingly. The previous hidden state $h_{t-1}$ and the current input $X_t$ are passed through a sigmoid function: the smaller the output value, the more information from the previous cell state $c_{t-1}$ is forgotten. The candidate $g_t$ is the output of a tanh function that compresses the inputs, and the input gate $i_t$ controls how much of the candidate is added to the updated cell state. The cell state $c_t$ is thus updated from the previous cell state $c_{t-1}$, the forget gate $f_t$, the input gate $i_t$, and the candidate $g_t$. The output gate $o_t$ decides how much information to output. The hidden state $h_t$, which is also the output of the LSTM block at time $t$, is obtained from the updated cell state $c_t$ and the output gate $o_t$.
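For illustration, the gate updates can be written as a single forward step in NumPy. This sketch covers inference only (no training), follows the standard LSTM gate equations, and leaves out the fully connected and softmax layers. The dict-of-matrices parameter layout, the hidden size H, and the helper names are hypothetical. Running the cell over the 2 × N input and keeping only the final hidden state corresponds to a network that outputs only the last time step.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, the gate activation."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, R, b):
    """One LSTM time step (standard gate equations).

    W[k] (H x 2), R[k] (H x H) and b[k] (H,) hold the input weights, recurrent
    weights and bias for gate k in {"i", "f", "g", "o"}.
    """
    f = sigmoid(W["f"] @ x_t + R["f"] @ h_prev + b["f"])  # forget gate
    i = sigmoid(W["i"] @ x_t + R["i"] @ h_prev + b["i"])  # input gate
    g = np.tanh(W["g"] @ x_t + R["g"] @ h_prev + b["g"])  # cell candidate
    o = sigmoid(W["o"] @ x_t + R["o"] @ h_prev + b["o"])  # output gate
    c = f * c_prev + i * g  # element-wise (Hadamard) products
    h = o * np.tanh(c)
    return h, c

def lstm_last_output(X, W, R, b, hidden):
    """Run the cell over the columns of X (shape 2 x N); return the final hidden state."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for t in range(X.shape[1]):
        h, c = lstm_step(X[:, t], h, c, W, R, b)
    return h
```

A forget gate saturated near 1 together with an input gate near 0 leaves the cell state unchanged from one step to the next, which is how the LSTM carries information across long sequences.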

Arm Motion Experiments
The system used in the experiments is a K-band portable radar sensor from the Ancortek company with one transmitter and one receiver. It transmits a continuous wave (CW) at a carrier frequency of 25 GHz, and the sampling rate is 12.8 kHz.
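As a quick sanity check on these parameters, the Doppler shift of a target moving radially at speed v is f_D = 2v/λ. The sketch below relates the quoted radar settings to the spectrogram axes. It assumes c ≈ 3 × 10^8 m/s and complex (I/Q) sampling, which gives a ±fs/2 Doppler span; the helper names are hypothetical.

```python
C = 3.0e8     # speed of light in m/s (rounded)
FC = 25.0e9   # carrier frequency in Hz
FS = 12.8e3   # sampling rate in Hz

wavelength = C / FC  # carrier wavelength in m (about 1.2 cm)

def doppler_shift(v):
    """Doppler shift in Hz for radial speed v in m/s (positive = toward the radar)."""
    return 2.0 * v / wavelength

def radial_speed(f_d):
    """Inverse relation: radial speed in m/s for a Doppler shift f_d in Hz."""
    return f_d * wavelength / 2.0

max_unambiguous = FS / 2.0  # +/- Doppler span for complex (I/Q) samples, Hz
window_s = 2048 / FS        # spectrogram window length in s
bin_hz = FS / 4096          # spacing between DFT bins for K = 4096, Hz
```

Under these assumptions, the 340 Hz maximum positive Doppler frequency in the envelope example corresponds to a radial speed of about 2.04 m/s. The ±500 Hz band used for the PBC lies well inside the ±6.4 kHz unambiguous span.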
The data analyzed in this paper were collected in the Radar Imaging Lab at the Center for Advanced Communications, Villanova University. The radar was fixed at the edge of a table. The vertical distance between the radar and the participant was approximately three meters. During the experiments, the participants were in a sitting position, and the body remained fixed as much as possible. To mimic typical behavior, the arms always rested at table or chair-arm level at the initiation and conclusion of each arm motion. Different orientation angles and speeds of arm motion were also considered. As shown in Figure 8, five different orientation angles, 0°, ±10°, and ±20°, were chosen, and the participant always faced the radar from each of these angles. Since the speed of arm motion varies from person to person and is also influenced by age, we took into account both normal-speed and slow-speed arm motions. The normal-speed motion is more natural and relatively fast, whereas the slow-speed motion is about 30% slower. The six arm motions were conducted as depicted in Figure 9, i.e., (a) pushing arms and pulling back; (b) crossing arms and opening; (c) crossing arms; (d) rolling arms; (e) stop sign; and (f) pushing arms and opening. In "pushing," both arms moved towards the radar, whereas "pulling" was the opposite motion, in which the arms moved away from the radar. The "pushing" was followed immediately by "pulling," with a very short pause or almost no pause between them. The motion of "crossing arms" describes crossing the arms from a wide stretch. Six people, four men and two women, participated in the experiment. Each arm motion was recorded over 40 s to generate one data segment. The normal and slow arm motions were each recorded twice at each angle.
Each segment contained 12 or 13 repetitions of the same arm motion, and the PBC was applied to determine the onset and offset times of each individual motion. A 5 s time window was utilized to extract every individual motion from the long time sequence. As such, repetitive motions and the associated duty cycles were not considered as features and were not part of the classification. In total, we generated 1913 samples for the six arm motions. Among the six arm motions, we chose the most discriminative one as an "attention" motion for signaling the radar to begin as well as to end. Without the "attention" motion, the radar remained passive, with no interaction with the human. Among all arm motions, "pushing arms and pulling back" and "pushing arms and opening" achieved the highest accuracy. However, the former motion can be confused with common arm motions, such as reaching for a cup or glasses on a table. Thus, "pushing arms and opening" was chosen as the "attention" motion.
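Extracting a fixed 5 s window around each detected motion can be sketched as follows. Centering the window on the midpoint between onset and offset, zero-padding at the record edges, and the helper name `extract_window` are all assumptions. Onset and offset are taken here as raw sample indices; a spectrogram frame index would first be multiplied by the hop size.

```python
import numpy as np

def extract_window(s, onset, offset, fs=12800, win_s=5.0):
    """Return a win_s-second slice of s centered on the detected motion.

    onset, offset : raw sample indices of the motion boundaries
    Samples that fall outside the record are zero-padded.
    """
    s = np.asarray(s)
    W = int(round(win_s * fs))    # 64000 samples for 5 s at 12.8 kHz
    center = (onset + offset) // 2
    start = center - W // 2
    out = np.zeros(W, dtype=s.dtype)
    lo, hi = max(start, 0), min(start + W, len(s))
    out[lo - start:hi - start] = s[lo:hi]
    return out
```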
The spectrograms for the six arm motions at normal speed and zero angle were obtained by performing the STFT and are shown in Figure 10. The corresponding envelopes, obtained with the envelope extraction method, are plotted in Figure 11. The yellow and red curves in the figure are the maximum positive and negative envelopes, respectively. The envelopes clearly enclose the local power distributions well. It is also evident that the MD characteristics of the spectrograms are consistent with the kinematics of each arm motion [16]. For example, in "pushing arms and pulling back," the arms push forward directly, which generates positive frequencies, whereas the "pulling" phase has negative frequencies. At the initiation of "crossing arms and opening," the two arms, resting on the table or chair arms in the ready position, move back slightly. This causes negative frequencies at the beginning. The motion itself can be decomposed into two phases. In the "crossing" phase, the arms first move closer to the radar, which causes positive frequencies, and then move away from the radar, which induces negative frequencies. The "opening" phase is the opposite of the "crossing" phase and also produces positive frequencies first and then negative frequencies. At the conclusion of the arm motion, the arms rest down, causing positive frequencies at the end of the spectrogram. The motion "crossing arms" contains only the first phase of "crossing arms and opening" and has the corresponding MD signature. In "rolling arms," the two arms perform opposite movements: as one arm moves forward along a circular trajectory, the other moves backwards. Therefore, the MD signature has simultaneous positive and negative frequencies. In one motion cycle, the right arm experiences three phases: moving forward, moving backward, and moving forward again. The left arm always performs the opposite motion to the right arm.
For the "stop sign" motion, the arm moves backwards, which causes only negative frequencies. The last arm motion, "pushing arms and opening," includes the pushing, which has positive frequencies, and the opening, which has negative frequencies. Figure 12 shows an example of the "attention" motion at different speeds at 0°. The duration of the normal motion is shorter than that of the slow motion, and its higher speed causes higher Doppler frequencies. The main characteristics and behaviors, however, remain unchanged. Figure 13 shows the "attention" motion at normal speed and at different orientation angles. As the angle increases, the energy becomes lower owing to the reduced antenna gain away from the beam center.

Classification Results
In the previous sections, we discussed the extraction of the maximum instantaneous Doppler frequency signatures and two different time-series analysis methods. In this section, the extracted features are regarded as a sequence that is input to both methods. Classification accuracy is used to evaluate the performance of the two ML methods, and all classification results are obtained through 500 Monte Carlo trials. In each trial, we randomly selected 70% of the data segments for training and 30% for testing. All experiments were performed on an Intel(R) Core(TM) i7-3770 CPU with 16 GB of memory.
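The evaluation protocol can be sketched as a generic loop. The random split over individual samples, the rounding of the training-set size, and the `classify(test, train_X, train_y)` callback interface are assumptions for illustration. The helper name `monte_carlo_accuracy` is hypothetical.

```python
import numpy as np

def monte_carlo_accuracy(X, y, classify, trials=500, train_frac=0.7, seed=0):
    """Average test accuracy over random train/test splits.

    X        : list of feature vectors
    y        : labels, one per feature vector
    classify : callable(test_vector, train_vectors, train_labels) -> label
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    n = len(y)
    n_train = int(round(train_frac * n))
    accs = []
    for _ in range(trials):
        perm = rng.permutation(n)
        tr, te = perm[:n_train], perm[n_train:]  # 70% train, 30% test
        train_X = [X[j] for j in tr]
        pred = [classify(X[i], train_X, y[tr]) for i in te]
        accs.append(np.mean(np.asarray(pred) == y[te]))
    return float(np.mean(accs))
```

Any of the classifiers discussed here (NN with the L1 or DTW distance) can be plugged in as `classify`.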

Classification Accuracy of the LSTM Method
The LSTM structure described in the Long Short-Term Memory subsection is applied. The input data were the maximum instantaneous Doppler frequencies. The LSTM layer output only the last element of the sequence, and its output size was determined by trial and error. During training, the batch size was set to 10, and the maximum number of epochs was 200, where an epoch is one full pass over the entire training set. The optimization solver was stochastic gradient descent with momentum, and the learning rate was held constant at 0.001. The training accuracy and the training loss during the training process are plotted in Figures 14 and 15. The arm motion recognition results with different output sizes are shown in Figure 16. The highest accuracy, 96.67%, was achieved with an output size of 400. The confusion matrix is given in Table 1.

Classification Accuracy of the DTW Method
The DTW distance is robust to time misalignment and time scaling. Figure 17 shows two envelopes of the same motion class with a large misalignment in time. Although the two envelopes are similar in shape, the L2 distance, which only compares corresponding samples of the two time series, yields a large error norm. By applying the DTW distance, the two time series can be well aligned, and the effect of the misalignment is significantly reduced. Similarly, two time series with different time scalings can also be aligned. Figure 18 shows the alignment of the envelopes of two members of the same motion class performed at different speeds. In essence, with the DTW method, two time series that belong to the same motion class exhibit a small distance and high similarity, which reduces the probability of misclassification.

In our previous work [16,17], each of the original extracted envelope features contained 2000 samples for both the positive and negative Doppler frequencies, and the features were directly input into the NN classifier with the L1 distance measure, achieving an overall accuracy of 97.17% [16]. Considering real-time processing, and to avoid the high computational burden of DTW on long time series, we downsampled the envelopes to 200 samples. Figure 19 shows an example of a frequency envelope before and after downsampling. It is evident that the main characteristics of the envelope are maintained after downsampling. To further examine the impact of downsampling, the downsampled features were input into the NN classifier with the L1 distance. This resulted in a classification accuracy of 97.13% [16], which is nearly the same as when using the entire sequence. The corresponding confusion matrix is given in Table 2. Since the arm motion recognition accuracy is essentially unchanged between the original and the downsampled envelopes, we opted to use the downsampled envelope features as the input to the NN-DTW classifier.
The result is an overall accuracy of 98.20%, with the confusion matrix shown in Table 3. Using the DTW distance with the downsampled data, classifying each test sample took about 0.2 s, which makes the method suitable for real-time processing. Comparing Tables 2 and 3, the accuracy of motions (b), (c), (d), and (e) improved by 1% to 3%, whereas that of motion (a) dropped by 2%, for an overall improvement of about 1%.
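The misalignment argument can be made concrete with two synthetic, time-shifted envelopes, comparing DTW against the sample-by-sample L1 distance. The Gaussian-shaped "envelopes" are hypothetical. The downsampling method is an assumption too: the text does not say how the 2000-sample envelopes were reduced to 200, so linear interpolation via `np.interp` stands in here. For equal-length inputs, the L1 distance equals the cost of the diagonal path under the same local cost, so DTW can never exceed it.

```python
import numpy as np

def dtw_distance(x, y):
    """Standard DTW recursion with |x_i - y_j| local cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(x[i - 1] - y[j - 1]) + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def downsample(e, n_out=200):
    """Resample an envelope to n_out points by linear interpolation (assumed method)."""
    e = np.asarray(e, dtype=float)
    return np.interp(np.linspace(0, len(e) - 1, n_out), np.arange(len(e)), e)

def bump(n, center, width):
    """Hypothetical envelope: a Gaussian-shaped Doppler burst of unit peak."""
    k = np.arange(n)
    return np.exp(-((k - center) ** 2) / (2.0 * width ** 2))

# Two hypothetical 2000-sample envelopes of the same shape, offset by 300 samples
x = downsample(bump(2000, 800, 100))
y = downsample(bump(2000, 1100, 100))

l1 = np.abs(x - y).sum()  # sample-by-sample comparison (diagonal path)
d = dtw_distance(x, y)    # warping absorbs most of the time shift
```

In this example `d` comes out as a small fraction of `l1`. The residual is due to the shift not being an exact multiple of the downsampled sample spacing.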
Since we concatenated the positive and negative envelopes to form a long vector, the relation between the time occurrences of corresponding samples in the two envelopes was not captured by the concatenated vector and was not considered by the DTW method. Thus, we also included the vector of the differences between the two envelopes as a feature. The new feature vector is $e_{\mathrm{new}} = [e_U, e_L, e_U - e_L]$. A notably higher average classification rate of 99.12% was achieved, with the confusion matrix shown in Table 4. All motions are classified with an accuracy above 98.50%; in particular, motions (c), (d), and (e) have an accuracy higher than 99%.
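The benefit of the difference term can be illustrated with a small synthetic example. Two hypothetical samples share the same positive envelope, but in one the negative envelope occurs simultaneously and in the other it occurs later. With only [e_U, e_L], DTW warps the negative-envelope half independently and reports an almost zero distance. Appending e_U − e_L exposes the timing difference. The signal shapes and helper names are assumptions.

```python
import numpy as np

def dtw_distance(x, y):
    """Standard DTW recursion with |x_i - y_j| local cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(x[i - 1] - y[j - 1]) + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def feature_vector(e_u, e_l, with_difference=True):
    """Concatenate envelopes: [e_U, e_L], or [e_U, e_L, e_U - e_L] with the difference term."""
    parts = [e_u, e_l, e_u - e_l] if with_difference else [e_u, e_l]
    return np.concatenate(parts)

def bump(n, center, width=5.0):
    """Hypothetical envelope shape: a Gaussian-shaped burst of unit peak."""
    k = np.arange(n)
    return np.exp(-((k - center) ** 2) / (2.0 * width ** 2))

n = 100
e_u = 300.0 * bump(n, 25)        # shared positive envelope (Hz)
e_l_sync = -300.0 * bump(n, 25)  # negative envelope simultaneous with e_u
e_l_late = -300.0 * bump(n, 60)  # same negative envelope, occurring later

d_two = dtw_distance(feature_vector(e_u, e_l_sync, False),
                     feature_vector(e_u, e_l_late, False))
d_three = dtw_distance(feature_vector(e_u, e_l_sync),
                       feature_vector(e_u, e_l_late))
```

Here `d_two` is close to zero, while `d_three` is large. The simultaneous case produces a single difference peak of about 600 Hz that no element of the other sample can match.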

Analysis of the Classification Accuracy with Time Misalignment
The NN classifier with the L1 distance considers only the individual frequency envelope values, whereas the DTW distance takes into account the temporal information. Thus, the NN classifier with the DTW distance is expected to achieve better performance. The onset and offset times of each motion are obtained by the PBC and used to center each individual motion in the middle of the spectrogram.

Analysis of the Time Consumption
In real-time arm motion classification, the execution time of the two time-series recognition methods is an important factor. The software environment was MATLAB R2018b on a Windows 10 computer. For both recognition methods, the input for each arm motion consists of the envelopes downsampled to 200 samples. The training and testing times, obtained from 1343 training samples and 570 test samples over 100 trials, are presented in Table 5. The training process of the LSTM is computationally expensive, since the LSTM requires a large number of memory cells, a large output size, and many epochs, all of which demand a long training time [50]. Once the LSTM network has been trained, the execution time for each test sample is only about 2.95 ms. The NN classifier has no training process; it simply stores the training samples and therefore requires no training time [51,52]. The classification time of the NN-DTW method for each test sample is about 0.2 s. This is much longer than that of the LSTM network, but it remains suitable for real-time processing while providing higher classification accuracy than the LSTM network. It is noted that the computational complexity of the NN classifier with a conventional distance such as L1 is O(Nd), where N is the number of training samples and d is the dimension of the features; with the standard DTW distance, each comparison costs O(d^2), giving O(Nd^2). In either case, the test time of the NN-DTW increases linearly with the number of training samples. Fast NN methods [51,53] and fast DTW methods [30,54] can be applied to achieve a faster implementation.
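One widely used way to cut the quadratic cost of each DTW comparison is to restrict the warping path to a band around the diagonal (a Sakoe-Chiba window). The sketch below is offered only as an example of that idea and is not necessarily the technique of [30,54]. The band width is a free parameter that trades accuracy for speed, and `dtw_band` is a hypothetical name.

```python
import numpy as np

def dtw_band(x, y, band=None):
    """DTW restricted to |i - j| <= band (Sakoe-Chiba window).

    Only about (2 * band + 1) cells per row are evaluated instead of len(y),
    so the cost drops from O(n * m) toward O(n * band). band=None gives full DTW.
    The result is never smaller than the unconstrained DTW distance.
    """
    n, m = len(x), len(y)
    if band is None:
        band = max(n, m)
    band = max(band, abs(n - m))  # the window must still reach cell (n, m)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

With `band=0` and equal-length inputs, the path is forced onto the diagonal and the result reduces to the L1 distance.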

Conclusions
In this paper, we considered time-series analysis methods for effective automatic arm motion recognition based on radar MD signature envelopes. No range or angle information was incorporated into the classifications. Taking advantage of the Doppler continuity of the arm motion, the PBC was used to determine the individual motion boundaries in long time series. The positive and negative frequency envelopes of the data spectrogram were then extracted by an energy-based thresholding algorithm. The feature vector consisted of the augmented positive and negative frequency envelopes and their differences. The augmented feature vector was provided to the NN classifier based on the DTW distance, which is more suitable than the L1 and L2 distance measures for describing the similarity between time series. The LSTM, a time-series analysis method commonly used in ML, was also presented for comparison. The experimental results showed that the NN classifier based on the DTW distance achieves a classification rate above 99%, which is superior to both the existing classifier based on the L1 distance and the LSTM method, with an overall improvement of about 2%. It was also shown that the DTW and LSTM methods are robust to time shifts of the signal.
Future work may consider more diverse arm motions, arm speeds, arm angle orientations, and distances between the radar and the person moving his/her arms. It will be of interest to evaluate the robustness of the arm classification results while the person is in the state of standing or walking.