Unobtrusive human activity classification based on combined time-range and time-frequency domain signatures using ultrawideband radar

In this proposed approach to unobtrusive human activity classification, a two-stage machine learning-based algorithm was applied to backscattered ultrawideband radar signals. First, a preprocessing step suppressed noise and clutter. Then, features of human activities were extracted from a combination of the time-frequency (TF) and time-range (TR) domains. A feature analysis was performed to identify features robust for this kind of classification and to reduce the dimensionality of the feature vector. Subsequently, different recognition algorithms were applied to group activities as fall or non-fall and to categorise their types, and a performance study was used to select the most accurate algorithm. The ensemble bagged tree and fine K-nearest neighbour methods showed the best performance, and the two-stage classification proved more accurate than the one-stage alternative. Finally, the proposed approach, combining the TR and TF domains with two-stage recognition, outperformed reference approaches in the literature, achieving average accuracies of 95.8% for eight-activity classification and 96.9% for distinguishing fall from non-fall activities, with efficient computational complexity.

restrictions for acoustic-based techniques such as the requirement of noise-free acoustic environments and low multipath reflections. Both methods are problematic to install in bathrooms, owing to privacy issues for cameras and high multipath reflections for acoustic equipment. In this regard, among non-wearable techniques, ultrawideband (UWB) radar is an effective technology for remote monitoring of human activities because of its inherent and unobtrusive features: insensitivity to lighting and weather conditions, high range resolution for positioning, penetration through obstacles in behind-wall, indoor, and outdoor scenarios, good immunity against multipath interference, low power consumption, and the ability to discriminate multiple targets [44][45][46]. Therefore, UWB sensors can be used in health monitoring and human care systems, especially to enhance the quality of care for elderly people and patients.
Human fall detection and HAC using UWB sensors is founded on the detection of motion caused by the activity itself; therefore, a specific signature related to each human activity is created. The transmitted radar signal is affected by these motions, which produce changes in frequency, phase, amplitude, and differences in time-of-arrival of the signals backscattered from the target [46]. Therefore, a time-frequency (TF) and time-range (TR) domain analysis is needed to reveal human activity motion signatures such as velocity, the Doppler effect of the different human body parts in motion, and human position versus observation time. All of these revealed signatures are considered fundamental to HAC [24]. For further clarification, Figure 1 shows the general approach to HAC using UWB radar used in most of the literature. It is mainly divided into three steps: (1) preprocessing to clean noisy and cluttered collected data with methods such as bandpass filters, DC removal, and moving average filters [44][45][46][47]; (2) feature extraction using a predefined [48] or automated [49] feature extraction method; and (3) classification of activities using machine learning (ML) and deep learning (DL) algorithms, which can be categorised as supervised [50] or unsupervised [51].
Numerous radar-based methods have been developed to address the HAC problem. TF representation is widely used for HAC. Kim et al. [23] used a support vector machine (SVM) to recognise seven human activities using six micro-Doppler signatures extracted from the TF representation (spectrogram). Kim et al. [24] applied a deep convolutional neural network (DCNN) algorithm instead of an SVM to classify the same activities. In Kim et al. [24] there is no need for manual feature extraction. Automatic feature extraction and recognition processes are done simultaneously using DL algorithms. Jokanovic et al. [26] improved the accuracy of micro-Doppler signature estimation by implementing a multiwindow TF analysis for radar returns using SVM for classification. Jokanovic et al. [27] evaluated a combination of TF and TR domains and demonstrated the superiority of using this combination over the use of a single domain in human motion recognition. Jokanovic et al. [28] again used this combination and proposed a deep learner approach consisting of two stacked autoencoders and a logistic regression classifier to improve activity recognition accuracy. In these publications, each domain (TR and TF) was used separately at each recognition stage. Sadreazami et al. proposed three different DL methods for fall detection [36][37][38]. They used radar time series data obtained by the sum of the range profile of UWB radar returns as input for a deep residual network for automatic feature learning [36]. In work by the same authors [37], the features were learned automatically using DCNN from the radar time series data. The authors [38] also proposed a capsule network-based fall detection (CapsFall) method in which the TR representation of the radar time series was fed into a CapsFall method for automatic feature learning. Du et al. 
[39] introduced a three-dimensional (3D) DL framework for human motion analysis in which reflected radar echoes were transformed into range-Doppler points to obtain a discrete representation of the motion trajectory. Du et al. [40] proposed a segmented convolutional gated recurrent neural network (SCGRNN) method to classify different human activities determined from micro-Doppler spectrograms. Ding et al. [41] addressed the HAC problem using a multilayer classification method. First, signatures in the TR domain were used to categorise the 12 human activities into in situ and non-in situ motions. After that, spectrograms were obtained using the weighted range-time-frequency transform (WRTFT) method. Then, physical empirical and principal component analysis (PCA) features were extracted and used to classify the in situ and non-in situ motions, respectively [41]. Li et al. [42] developed a hierarchical recognition technique in which each activity among six motions was classified with a one-versus-all binary classification. A combination of different domains was used, the extracted features were investigated, and the optimum features were selected for each stage. Qi et al. [43] proposed a multilayer approach combining the conventional K-nearest neighbour (KNN) supervised ML technique and a convolutional neural network (CNN) to distinguish 12 kinds of daily human motion. First, radial features with the KNN classifier divided the motions into two categories: those with and those without radial displacement. Then, a CNN classifier used the feature spectrograms to categorise the two groups of activities into the 12 different motions [43].
This work proposes an approach using a UWB radar (Xethru X4M03) [52] for fall detection and HAC using ML algorithms. First, data were collected with the radar from three participants performing the fall and non-fall activities listed in Table 1. Next, the data were cleaned by applying different signal processing methods, such as moving average filtering and singular value decomposition (SVD), to remove the noise and both stationary and non-stationary clutter. Then, features were manually extracted as TF domain features, TR domain features, and related statistical features. These features contained information that helped distinguish between activities: the Doppler frequency resulting from each motion and the change of range, extracted from the TF and TR domains, facilitated the HAC. After that, a feature evaluation was performed on the 56 extracted features to choose the optimum features providing the best performance as measured by metrics such as accuracy, precision (PR), and sensitivity (SE); only the selected features were used in the proposed algorithm. This selection procedure is itself a contribution, as a study of the effectiveness of 56 TF- and TR-domain features for the HAC problem. Next, once a set of features was extracted, a classification algorithm determined whether an event was a fall or non-fall activity. Fall activities were then categorised by ML algorithms as fall forward, fall backward, fall left, or fall right, and non-fall activities were classified under four categories: sitting, standing, walking, and bending. In this proposed approach, all types of supervised classification methods were evaluated for each classifier, and the recognition method with the highest performance was determined for each stage.
The results of the proposed approach showed improvements over the reference methods and over one-stage classification in terms of accuracy, precision, sensitivity, and computational time. The parameter sensitivity was also investigated. Potential applications of the proposed ML method include, but are not limited to, monitoring the health status of patients in care settings and of elderly and frail people at home or in institutional settings. The method could also find application in workplace occupational health and safety, as well as in first-responder fire and rescue, police, or military settings where fall surveillance is conducted. The novelties of the proposed approach are as follows:

• A fitting model is presented for the received signal, considering both the Doppler frequency components resulting from human activities and the non-stationary clutter for HAC problems.

• The preprocessing stage provides a signal processing technique for non-stationary clutter removal to improve algorithm accuracy. To the best of the authors' knowledge, non-stationary clutter suppression is rarely used in traditional HAC methods.

• The two-stage recognition scheme for human activities consists of three separate classification models founded on a multiple-domain representation: a binary classifier in the first stage and two multiclass classifiers in the second stage, all built with conventional ML techniques. With the selection of an optimum, robust feature set and classifier type for each stage, performance competitive with DL algorithms was achieved at lower computational complexity.

• The most important novelty of this work is a versatile combination of multimodal features. These effective features achieve good accuracy even with conventional ML methods instead of DL methods. An efficient two-stage classification with an optimal combination of multimodal features improves performance compared with one-versus-all hierarchical schemes, with less data redundancy.
The remainder of the work is organised as follows. In Section 2, signal models for HAC and the data collection setup are described. The complete proposed ML-based algorithms applied to classify fall and non-fall human activities are explained in detail in Section 3. Results and discussion are presented in Sections 4 and 5, respectively. Concluding remarks are given in Section 6.

| UWB radar and data collection setup
The X4M03 UWB radar, shown in Figure 2a, was used for data collection in all samples for HAC. The X4M03 operates at 6-8.5 GHz, offering high spatial resolution. The sampling rate in propagation time (fast time) is 23.328 GS/s; each sampling point is referred to as a range bin, and the length of each scan is 1536 range bins. The radar covers a distance range of 0.2-9.9 m with 5.35-cm range resolution. The duration of each measurement scenario was approximately 8-10 s (observation time or slow time), with a frame rate of 400 frames per second. The radar was placed 1 m above floor level (Figure 2b), and 300 data samples were collected in different scenarios at different ranges (R0: 1-5 m), as presented in Table 2. Three healthy subjects (age 20-30 years, two males and one female) participated in the data collection process, performing numerous predefined fall and non-fall activities; schematics of the different activities are shown in Figure 3. The numbers of fall and non-fall activities and their division into training and test data sets are given in Table 2.

| Signal modelling
TABLE 1: Fall and non-fall activities for three subjects.

To become familiar with the signal model, it is essential to define the two independent time scales of the signal. The first is the scale on which wave propagation phenomena occur, that is, the propagation or fast time t, which is generally in the range of nanoseconds and is sampled by the X4M03 radar at a rate of 23.328 GS/s. Therefore, we can define:

t = m t_s,  m = 0, 1, …, M − 1,

where t_s is the sampling interval, typically in the range of a few picoseconds, m is the range bin number, and M is the length of one reflected wave. The second is the observation or slow time T, on the scale of seconds, over which target motions (e.g., falls, sitting, and other non-fall activities) can be observed. T can be expressed as:

T = n T_0,

where T_0 is the pulse repetition interval. Figure 2b shows the measurement schematic of HAC using the X4M03 UWB radar. Initially, the radar transmits a signal s_tx(t) from the transmitter antenna, which propagates into the environment towards the target (the human body). If the measurement environment is considered ideal (that is, no loss, no noise, no antenna mismatch, and no angular dependency), the propagated signal hits the human body and is partially reflected towards the receiver antenna, giving the backscattered signal s_rx(t), which can be written as [45]:

s_rx(t) = Σ_{k=1}^{K} σ_k(t) * s_tx(t),

where n ∈ {0, 1, …, N} is the index of the transmitted signal s_tx(t), K is the total number of detected targets, t is the fast time, and * denotes convolution. σ_k(t) is the time-dependent propagation path of the kth reflection, described as:

σ_k(t) = γ_tx(t) * γ_k(t) * γ_rx(t),

where γ_tx(t) and γ_rx(t) are the impulse response functions (IRFs) of the transmitting and receiving antennas, respectively, and γ_k(t) is the IRF of the kth reflection.
For simplicity, let us first suppose that a single target at k = i is static. The effect of σ_k(t) on the transmitted signal for a static target can be expressed as:

σ_i(t) * s_tx(t) = α s_tx(t − τ_i),

with

τ_i = 2R_i / c,

where α is the complex amplitude, τ_i is the round-trip time delay, c is the speed of light, and R_i is the range of target i. When the target is not static and the main purpose is to determine its motion, we can rewrite Equation (7) as:

σ_j(t) * s_tx(t) = α s_tx(t − τ_j),

where τ_j is the round-trip time delay of the target in motion. In radar, the variation of delay occurs only in observation time; no variation of delay can be observed in fast time during one pulse repetition interval T_0. Therefore, the time-variant round-trip delay τ_j can be written as:

τ_j(T) = τ_0 − (2vT)/c,

where τ_0 = 2R_0/c, R_0 is the initial range, and v is the radial velocity. Following this, the received signal (including both stationary and non-stationary effects) can be represented as:

s_rx(t, T) = Σ_i α_i s_tx(t − τ_i) + Σ_j α_j s_tx(t − τ_j(T)),

where the first and second summations relate to stationary and non-stationary targets, respectively.
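As a rough illustration of this model, the sketch below simulates a radargram containing one stationary reflector and one moving target whose round-trip delay follows τ(T) = τ₀ − 2vT/c. The pulse shape, ranges, and velocity are illustrative assumptions, not parameters from the measurements; only the sampling rate, frame rate, and band come from the text.

```python
import numpy as np

# Sketch: simulate UWB echoes from one static and one moving reflector,
# following the delay model tau(T) = tau0 - 2*v*T/c. Concrete values
# (pulse shape, ranges, velocity) are illustrative, not from the paper.
c = 3e8                       # speed of light (m/s)
fs_fast = 23.328e9            # fast-time sampling rate of the X4M03
t = np.arange(512) / fs_fast  # fast-time axis for one (shortened) scan
T0 = 1 / 400                  # slow-time interval (400 scans per second)

def pulse(t, f0=7.25e9, bw=2.5e9):
    """Gaussian-modulated pulse as a stand-in for s_tx(t)."""
    sigma = 1.0 / bw
    return np.exp(-0.5 * (t / sigma) ** 2) * np.cos(2 * np.pi * f0 * t)

def scan(T, R_static=2.0, R0=3.0, v=1.5):
    """One received scan at slow time T: static + moving reflector."""
    tau_s = 2 * R_static / c            # constant delay (stationary target)
    tau_m = 2 * R0 / c - 2 * v * T / c  # time-variant delay (moving target)
    return pulse(t - tau_s) + pulse(t - tau_m)

# Radargram: N = 400 scans (rows) by M = 512 range bins (columns)
R = np.stack([scan(n * T0) for n in range(400)])
print(R.shape)  # (400, 512)
```

In the resulting matrix, the static reflector produces a constant vertical stripe over slow time, while the moving target traces a sloped track, which is exactly the structure the preprocessing stage later separates.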
To study the Doppler signatures resulting from the human body in different motions, the delay equations should be rewritten with respect to the Doppler frequency. The relation between the Doppler frequency f_D and the radial velocity v can be modelled as [53]:

f_D = 2v/λ_0 = 2v f_0 / c,

where λ_0 is the transmitted wavelength and f_0 is the transmitted frequency. Then, the time-dependent round-trip delay is:

τ_j(T) = τ_0 − (f_D λ_0 / c) T,

where f_D can take the values f_torso, f_limbs, f_head, f_r, and f_h, which are the Doppler frequencies resulting from torso, limb, head, respiration, and heartbeat motions, respectively.
These Doppler signatures are caused by human activity and vital signs and are used as features in the following sections. Therefore, Equation (7) can be rewritten as:

σ_j(t) * s_tx(t) = α s_tx(t − τ_0 + (f_D λ_0 / c) T).

In addition, an important UWB radar parameter for HAC is the range resolution, which can be defined as [54]:

ΔR = c/(2 BW) = c ω / 2,

where BW is the bandwidth and ω is the pulse width. Finally, because real measurement scenarios are not ideal, additional propagation phenomena must be considered in the model of the received signal. The reflection from the target is buried in noise n(t, T) and clutter c(t, T) (both stationary and non-stationary) caused by environmental reflections (e.g., walls, furniture, moving objects) and antenna coupling. The received signal is therefore denoted as:

r(t, T) = s_rx(t, T) + c(t, T) + n(t, T),

where c(t, T) and n(t, T) are the clutter and noise, respectively; both should be removed from the received signal in the preprocessing stage. For further information about UWB radar signal modelling, the reader is referred to the cited literature.

| PROPOSED METHODOLOGY
In this section, different steps of the proposed algorithm for HAC are described. First, an overall algorithm block diagram is presented in Figure 4. The algorithm is divided into three levels.
The first level is a preprocessing stage that cleans the collected radar data by applying different signal processing techniques to remove noise and both stationary and non-stationary clutter, preparing the data for the next stage. The second level begins by representing the cleaned data in the TR and TF domains and extracting all predefined features. Several features are then selected and passed to the first classification stage, which uses supervised ML to distinguish between fall and non-fall activities. In the last level, the outputs of the first stage are classified separately by two classifiers: one for fall activities and one for non-fall activities.

| Preprocessing
The main purpose of this stage is to clean and prepare the collected data for classification.

| Data collection
For HAC using UWB radar, the backscattered signals at different times of arrival, or different ranges from the radar, are gathered successively. The reflected echoes r(t, T) are sampled and recorded in a 2D data matrix R with entries:

R(n, m) = r(m t_s, n T_0),

where m and n are the propagation (fast) and observation (slow) time indices, respectively. The output of the sensor system is the data matrix R (radargram). The range profiles are stored in M range bins, the number of columns of R; each scan, representing one reflected signal, is a data vector of length M. The radar scans are stored as N rows of R, one scan every 0.0025 s (400 scans per second).

| DC removal and stationary clutter suppression
The main purpose of this step is to remove noise, the DC component, and stationary clutter. First, the mean of the collected data matrix over slow time is subtracted. Then, the background subtraction method removes stationary clutter from the backscattered signal R. For each range bin z(m) = R(:, m), the estimated background over observation time is subtracted from the measured signal:

x(n, m) = R(n, m) − p(n − 1, m).

The estimated background p(n − 1, m) is calculated with the moving average method, which uses the average of the previous L samples to estimate the background of the reflected signal:

p(n − 1, m) = (1/L) Σ_{l=1}^{L} R(n − l, m).

The simplest case, L = 1, is a moving target indicator, in which the background is simply assumed equal to the previous sample; here, L = 30 is used.
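A minimal sketch of this step, assuming the per-range-bin moving average described above; the authors' exact implementation may differ in how the first L scans are handled.

```python
import numpy as np

def remove_background(R, L=30):
    """DC removal plus moving-average background subtraction.

    For each range bin (column), the background at scan n is the mean
    of the previous L scans, which is subtracted from the current scan.
    R: (N, M) radargram with N scans and M range bins.
    """
    R = R - R.mean(axis=0, keepdims=True)   # DC removal per range bin
    X = np.zeros_like(R, dtype=float)
    for n in range(1, R.shape[0]):
        lo = max(0, n - L)                  # fewer than L scans at the start
        background = R[lo:n].mean(axis=0)   # average of previous L scans
        X[n] = R[n] - background
    return X
```

With L = 1 this reduces to the moving target indicator mentioned in the text, since the background is then just the previous scan.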

| Non-stationary clutter suppression
In addition, non-stationary clutter should be suppressed. For this purpose, let X ∈ ℝ^{N×M} be the output of the stationary clutter reduction. SVD considers the lag-covariance matrix C = XX^T and the singular decomposition of X. If N ≥ M, the SVD of X can be written as (18):

X = U Σ V^T,

where Σ is the diagonal matrix of singular values, and U and V are the matrix of empirical orthogonal functions and the matrix of principal components, respectively. The data matrix may be thought of as the summation of M individual singular-value matrices, each given by Equation (19):

D_j = U Σ_j V^T,  D_j ∈ ℝ^{N×M},  j ∈ {1, …, M},

where Σ_j is a matrix of the same size as Σ in which all elements are zero except Σ(j, j). Data matrix X can then be rewritten as Equation (20):

X = X_clutter + X_target + X_noise = Σ_{j∈A} D_j + Σ_{j∈B} D_j + Σ_{j∈C} D_j,

where the clutter, target, and noise are represented by the sets of eigentriples A, B, and C, respectively. SVD is used to retain the target signals and remove the contributions of X_clutter and X_noise, that is, the (stationary and non-stationary) clutter and noise. Clutter mainly results from environmental reflections where the measurements are performed, such as from walls, static and non-static objects, and furniture. The SVD operation is applied iteratively, and the data matrix is reconstructed after removing the singular values that represent the clutter. In the proposed method, the second, third, fourth, and fifth singular values contain the target information. When all other singular values are removed, the reconstructed data matrix is Y = (y_{i,j}) ∈ ℝ^{N×M}. For further details, please refer to Qi et al. [43] and Mostafa et al. [44].
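The reconstruction from selected singular components can be sketched as follows. The retained indices follow the text's observation that the second to fifth singular values carry the target information; the thin-SVD convention is an implementation assumption.

```python
import numpy as np

def svd_clutter_suppression(X, keep=(1, 2, 3, 4)):
    """Reconstruct the radargram from selected singular components.

    keep uses 0-based indices, so (1, 2, 3, 4) corresponds to the
    second to fifth singular values reported to contain the target;
    the first (dominant clutter) and the small noise components are
    discarded.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Y = np.zeros_like(X, dtype=float)
    for j in keep:
        # D_j = s_j * u_j v_j^T, the j-th rank-one singular matrix
        Y += s[j] * np.outer(U[:, j], Vt[j])
    return Y
```

Summing the rank-one matrices over all indices reproduces X exactly, which makes the decomposition X = X_clutter + X_target + X_noise easy to verify numerically.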

| Summation and normalisation
The output of the non-stationary clutter suppression is the matrix Y = (y_{i,j}) ∈ ℝ^{N×M}, where N is the number of rows corresponding to observations recorded at different time instants (slow time, across scans) and M is the number of columns representing the spatial samples at different ranges (fast time, within one scan). To prepare the input time series for the proposed HAC ML algorithms and to calculate the spectrogram, the average of each impulse response over fast time is calculated as:

x(n) = (1/M) Σ_{m=1}^{M} y_{n,m}.

Using the X4M03 radar, the reflected signals are measured at different observation times to create a typical radargram (Figure 5), in which the vertical axis is fast time and the horizontal axis is slow time. Figure 5 shows the recorded radar data at the different preprocessing steps for the fall-forward activity. The noise and clutter reduction can be clearly observed after applying the denoising and clutter suppression techniques to the measured data shown in Figure 5a. The data are then ready for the feature extraction step.
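The fast-time averaging can be sketched as below. The max-abs normalisation is an assumption: the text specifies the averaging but not the normalisation convention.

```python
import numpy as np

def to_time_series(Y):
    """Average each scan over fast time and normalise to [-1, 1].

    Y: (N, M) clutter-suppressed radargram. Returns the 1-D slow-time
    series x(n) that feeds the spectrogram and the ML features.
    """
    x = Y.mean(axis=1)             # average over the M range bins
    return x / np.max(np.abs(x))   # scale so |x| <= 1 (assumed convention)
```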

| Feature extraction
After the preprocessing stage, feature extraction is the most critical step in ML-based classification. The received signal contains information about the subject's activity, such as range and Doppler frequency, which can be extracted from the measured data represented in the TR and TF domains. In addition to statistical features in each domain, several useful features are extracted from the combination of the two domains. The total number of extracted features is 56, but not all features are selected for the classification steps. Figure 7 shows the TR representation of the preprocessed data for all eight fall and non-fall activities. The features discussed here are those extracted from data represented in the TR domain. First, two main features can be calculated in this domain: the range difference resulting from the motion and the event width, as shown in Figure 7a. These two features provide important information because the duration of the event and the change in range differ from one activity to another, especially between fall and non-fall activities, as shown in Figure 7a-h. The range variation of fall activities is commonly greater than that of non-fall motions. Furthermore, the duration of fall activities is longer than the time needed to sit, stand, or bend.

| Features of TR domain
After that, a set of empirical and statistical features that characterise the distribution of the radar signal are extracted: power of the received signal, entropy (relative degree of randomness), mean, median, variance, skewness (a measure of the asymmetry of the data), kurtosis (a measure of the peakedness of the distribution), mobility, form factor, mean frequency, median frequency, root mean square, zero-crossing rate (the number of times the signal crosses the zero reference level in a specific time interval), and turns count (the number of zero crossings of the derivative of the signal).
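A few of these statistical features can be computed as sketched below, following the parenthetical definitions for the zero-crossing rate and turns count; other conventions for these two counts exist.

```python
import numpy as np

def tr_statistical_features(x):
    """A subset of the TR-domain statistical features listed in the text.

    x: 1-D averaged radar time series. Skewness and kurtosis are the
    standardised third and fourth central moments (excess kurtosis).
    """
    mu, sd = np.mean(x), np.std(x)
    zc = np.sum(np.diff(np.signbit(x).astype(int)) != 0)    # zero crossings
    turns = np.sum(np.diff(np.signbit(np.diff(x)).astype(int)) != 0)
    return {
        "power": np.mean(x ** 2),
        "mean": mu,
        "median": np.median(x),
        "variance": sd ** 2,
        "skewness": np.mean((x - mu) ** 3) / sd ** 3,
        "kurtosis": np.mean((x - mu) ** 4) / sd ** 4 - 3,   # excess kurtosis
        "rms": np.sqrt(np.mean(x ** 2)),
        "zero_crossings": int(zc),
        "turns_count": int(turns),                          # sign changes of derivative
    }
```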
FIGURE 5: Radargram of the collected radar data at different preprocessing steps for the fall-forward activity: (a) collected data; (b) after DC removal; (c, d) after stationary and non-stationary clutter suppression, respectively; (e) after averaging over fast time.

The last type of feature defined in this section is founded on a PCA applied to the data matrix used to build the radargram shown in Figure 7. Because noise and clutter were removed in the preprocessing stage, the resulting data matrix mainly contains human motion information; therefore, the most dominant eigenvalues are relevant to the human activities. The first principal component (PC) contains spatial information about the subject's activity (the greatest source of motion) in the room: the human body is positioned, or distributed among the range profiles of the radar echoes, differently for each posture. Because the energy contribution of human motion may extend beyond the first PC, further PCs were also considered. In the PCA:

• The first eigenvalue was processed alone. This eigenvalue and its eigenvector were used to reconstruct the matrix containing the spatial distribution of the activity or motion, and the mean, median, skewness, and kurtosis of this matrix were computed and used as features.

• Eigenvalues 2-9 were chosen from the PCA in descending order, because these eight dominant eigenvalues correspond to human motion.
In other words, large-scale movements, such as changes of human position, lie in the first eigenvalue; small-scale movements and posture variations lie in eigenvalues 2-9.
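A sketch of the eigenvalue computation, assuming a PCA over the covariance of the range bins; the exact covariance convention used by the authors is not specified in the text.

```python
import numpy as np

def pca_motion_features(Y, n_eigs=9):
    """Dominant PCA eigenvalues of the clutter-suppressed radargram.

    Following the text: eigenvalue 1 captures large-scale position
    change, eigenvalues 2-9 capture smaller posture variations.
    Y: (N, M) radargram; returns the n_eigs largest eigenvalues of
    the M x M covariance over scans, in descending order.
    """
    C = np.cov(Y, rowvar=False)             # covariance across range bins
    eigvals = np.linalg.eigvalsh(C)[::-1]   # descending order
    return eigvals[:n_eigs]
```

In a full pipeline, the eigenvector of the first eigenvalue would additionally be used to reconstruct the rank-one matrix from which the mean, median, skewness, and kurtosis features are taken.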

| Features of TF domain
In addition to the TR domain, the TF domain is important and traditional for representing radar data collected from human subjects. The short-time Fourier transform (STFT) is the technique most often applied to construct the spectrogram, the typical TF representation. The STFT performs a Fourier transform on the windowed signal [57]:

STFT(n, f) = Σ_k x(k) w(k − n) e^{−j2πfk/f_s},

where w(k) denotes the window function, f_s is the sampling frequency in observation time, and L is the window length. The spectrogram is the modulus squared of the STFT, as shown in Equation (23):

S(n, f) = |STFT(n, f)|².

The spectrograms of the eight fall and non-fall activities are shown in Figure 8a-h. In addition, an important parameter, the energy burst curve (EBC), is calculated in Equation (24) as the spectrogram energy accumulated between a lower-bound frequency f_lb and an upper-bound frequency f_ub. The EBC with f_lb = 0 Hz and f_ub = 200 Hz for the eight different activities is presented in Figure 9. The statistical features extracted from the EBC curves of Figure 9 are the maximum, standard deviation, kurtosis, median, and variance, plus the maximum of the EBC with f_lb = 25 Hz and f_ub = 200 Hz.
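A sketch of the spectrogram and EBC computation using a plain windowed FFT; the window length, hop size, and interpretation of the EBC as the in-band spectrogram energy per time step are illustrative assumptions consistent with the description above.

```python
import numpy as np

def spectrogram_and_ebc(x, fs=400, f_lb=0.0, f_ub=200.0, L=64, hop=32):
    """STFT spectrogram of the radar time series and its energy burst curve.

    x: 1-D slow-time series sampled at fs. Returns the frequency axis,
    frame-centre times, spectrogram S (frames x frequencies), and the
    EBC (energy summed between f_lb and f_ub per frame).
    """
    w = np.hanning(L)
    starts = range(0, len(x) - L + 1, hop)
    frames = np.stack([x[s:s + L] * w for s in starts])   # windowed segments
    S = np.abs(np.fft.rfft(frames, axis=1)) ** 2          # modulus squared
    f = np.fft.rfftfreq(L, d=1.0 / fs)                    # frequency axis (Hz)
    t = (np.array(starts) + L / 2) / fs                   # frame centres (s)
    band = (f >= f_lb) & (f <= f_ub)
    ebc = S[:, band].sum(axis=1)                          # in-band energy
    return f, t, S, ebc
```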
As mentioned, the Doppler and micro-Doppler frequency signatures of fall and non-fall activities, resulting from the motion of the torso, limbs, head, arms, and the whole human body, are modelled in Equation (10) and visible in Figure 8. There is no need to detect the Doppler frequency of each activity precisely. As shown, fall activities contain higher Doppler frequency components that can reach, and in some cases exceed, 150 Hz (Figure 8a,b), whereas for non-fall activities the components do not exceed 80 Hz in any case (Figure 8e-h). Besides, the Doppler signatures of walking are more distinguishable than those of the other activities: the Doppler frequency components of walking are not restricted to a single instant but are distributed throughout the total measurement time (Figure 8h). The power in each of eight frequency intervals (spectrogram windows) is also considered a feature for HAC, as shown in Figure 10. Furthermore, features are computed from the spectrogram itself, such as the spectrogram entropy (a statistical measure of randomness or uncertainty), the total power of the spectrogram, the power in the heartbeat and respiration frequency ranges, and their rates normalised to the total power.

FIGURE 6: Block diagram of the feature extraction and feature evaluation stages.

| Feature evaluation
Feature evaluation is performed before the algorithm is generalised and applied to the test data. After feature extraction from the TR and TF domains, not all features are retained; moreover, each classification step has its own features, although some are shared between the two stages. As shown in Figure 6, the features used by each classifier are those selected in this section. Figure 11a is a scatterplot showing the distribution of the fall and non-fall activities with respect to the values of two features or predictors (as an example, the total power of the spectrogram and the spectrogram entropy). Such scatterplots help to identify which features are more important for classification, that is, which provide more information about the activities and better separate the classes, and consequently help in choosing or discarding the investigated features. First, all features are selected; then, features that do not help the classification are removed, drawing on observation of the scatterplots (changing the feature pairs to examine all the features), until the optimum result is obtained: the best accuracy (accuracy and the other metrics are defined in Section 4) with a minimum number of features. This feature selection method is called sequential backward selection. Figure 11b-d shows the accuracy of the first, second, and last classifiers, respectively, as a function of the number of selected features. Figure 11b shows that the first classifier achieves its optimum with 27 of the 56 features, at 96.9% maximum accuracy; with only six features (power of the spectrogram, spectrogram entropy, range difference, and the power of the fourth, fifth, and sixth spectrogram windows), 92% accuracy is still achieved. For the second classifier (fall activities classifier), the best accuracy of 94.1% is achieved with 35 of the 56 features.
Here, many more features are needed than for the first classifier because the classes are closer to each other (the activities are more similar). As seen in Figure 11c, 18 features are needed to reach approximately 83% accuracy. The most robust features for the second classifier are the range difference, entropy, event width, the power of the fifth, sixth, seventh, and eighth spectrogram windows, eigenvalues 2 and 3, and the maximum and standard deviation of the EBC. For the last classifier (non-fall activities classifier), the best recorded result is 95.6% with 22 selected features; an accuracy of 85.6% can be obtained with only six features: the spectrogram entropy, entropy, total power of the spectrogram, and the power of the fourth, fifth, and sixth spectrogram windows.
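The sequential backward selection procedure can be sketched as below, with a leave-one-out 1-nearest-neighbour accuracy standing in for the cross-validated classifiers actually evaluated in the paper; the stopping rule (break when any removal reduces accuracy) is one reasonable reading of "until we obtain the optimum result".

```python
import numpy as np

def loo_1nn_accuracy(X, y):
    """Leave-one-out 1-NN accuracy (simple stand-in scorer)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # exclude each sample itself
    return np.mean(y[d.argmin(axis=1)] == y)

def backward_selection(X, y, min_features=1):
    """Sequential backward selection: start from all features and
    greedily drop the feature whose removal hurts accuracy least,
    stopping when every removal would reduce accuracy."""
    kept = list(range(X.shape[1]))
    best = loo_1nn_accuracy(X[:, kept], y)
    while len(kept) > min_features:
        trials = [(loo_1nn_accuracy(X[:, [k for k in kept if k != f]], y), f)
                  for f in kept]
        acc, worst = max(trials)         # least harmful removal
        if acc < best:
            break
        best, kept = acc, [k for k in kept if k != worst]
    return kept, best
```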

| Classification
After feature extraction and selection, the resulting vector of selected features for each data sample is the input of the classifier. The classification stage is divided into two main steps. The first step consists of a single classifier; before choosing its type, a performance evaluation of different classifiers, such as SVM and decision tree, is performed, and the same analysis is done for the classifiers of the second step. The purpose of the first classifier is to separate fall activities from non-fall activities. In the next step, two distinct classifiers are used simultaneously: the two output groups of the first-step classifier, fall and non-fall activities, are classified in parallel and independently to specify whether the motion was fall forward, fall backward, fall left, or fall right (fall group), or sit, stand, bend, or walk (non-fall group). The different types of classifiers are evaluated to choose the best classification method for each stage.
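The two-stage scheme can be sketched as follows, with a minimal 1-nearest-neighbour classifier standing in for the bagged-tree and fine-KNN classifiers reported as the best performers; the per-stage feature-subset selection described above is omitted for brevity.

```python
import numpy as np

class NN1:
    """Minimal 1-nearest-neighbour classifier (stand-in only)."""
    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), np.asarray(y)
        return self
    def predict(self, X):
        d = np.linalg.norm(np.asarray(X, float)[:, None] - self.X[None], axis=2)
        return self.y[d.argmin(axis=1)]

class TwoStageHAC:
    """Stage 1 separates fall from non-fall; stage 2 applies a 4-class
    fall classifier or a 4-class non-fall classifier to each group."""
    def fit(self, X, y_fall, y_type):
        # y_fall: 1 for fall, 0 for non-fall; y_type: activity label
        X, y_fall, y_type = map(np.asarray, (X, y_fall, y_type))
        self.stage1 = NN1().fit(X, y_fall)
        self.fall = NN1().fit(X[y_fall == 1], y_type[y_fall == 1])
        self.nonfall = NN1().fit(X[y_fall == 0], y_type[y_fall == 0])
        return self
    def predict(self, X):
        X = np.asarray(X, float)
        is_fall = self.stage1.predict(X)
        out = np.empty(len(X), dtype=object)
        for flag, clf in ((1, self.fall), (0, self.nonfall)):
            idx = np.where(is_fall == flag)[0]
            if idx.size:
                out[idx] = clf.predict(X[idx])
        return out
```

The design point is that the second-stage classifiers only ever see samples routed to them by the binary first stage, which is what lets each stage use its own feature subset and classifier family.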

| RESULTS
All HAC experiments used the X4M03 UWB radar with a slow-time (observation-time) sampling frequency of 400 Hz to cover Doppler frequency components of 0-200 Hz. The experimental setup of the two HAC scenarios is presented in Figure 12. The next two subsections show the results for the training and test data, respectively. In what follows, the metrics used to rate classifier performance are

Accuracy (AC) = (TP + TN)/(TP + TN + FP + FN),
Precision (PR) = TP/(TP + FP),
Sensitivity (SE) = TP/(TP + FN),

where TP and TN denote true positives and true negatives, respectively, FP is the number of false positives, and FN denotes the false negatives. The results of the preprocessing stage and feature evaluation were discussed in previous sections.
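For concreteness, these metrics computed from raw confusion counts (the counts below are illustrative, not taken from the paper):

```python
# Accuracy, precision, and sensitivity from raw confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    ac = (tp + tn) / (tp + tn + fp + fn)   # accuracy
    pr = tp / (tp + fp)                    # precision
    se = tp / (tp + fn)                    # sensitivity (recall)
    return ac, pr, se

# Hypothetical counts for a fall / non-fall split of 300 samples.
ac, pr, se = metrics(tp=130, tn=160, fp=5, fn=5)
print(round(ac, 3), round(pr, 3), round(se, 3))  # 0.967 0.963 0.963
```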

| Results of training data using fivefold cross-validation
The evaluation of the HAC algorithms consisted of two main parts: feature evaluation (Section 3) and performance analysis. In this section, training data were used to evaluate the different classifier types. Only the method with the highest accuracy within each method family is shown in the results tables; other evaluated methods with lower accuracy in the same family are not presented. As an example, suppose the method family is 'tree': the highest accuracy in this family was achieved by the fine tree method, so only the fine tree results are shown in Tables 3 and 4. The same was applied to all method families, with the highest-accuracy method over all families highlighted in green.
Table 3 shows that the best classifier for the first step of the classification stage (to distinguish between fall and non-fall activities) was the ensemble bagged tree classifier, with an averaged accuracy (AC) = 96.9%, precision (PR) = 96.4%, and SE = 98.5% in a fivefold cross-validation, a training time of 3.188 s, and a prediction speed of approximately 800 observations/s. Hence, this method outperformed the other methods by at least 5.3% in accuracy. The results shown in Table 3 correspond to one run; averaged over 10 runs, AC = 96.7%. The second step of classification consisted of two classifiers, also evaluated in a fivefold cross-validation: a fall activities classifier with four classes (fall forward, fall backward, fall left, and fall right) and a non-fall activities classifier with four classes (sit, stand, bend, and walk). Table 4 shows the performance metrics of the different classification methods for the fall and non-fall activities classifiers; the corresponding confusion matrices are given in Tables 5-7.
After evaluating the performance of the classifiers, the two-stage classification approach itself was evaluated by comparing its results with those of one-stage classification (Table 8). The best accuracy in Table 8 is 86.7%, obtained using 44 selected features. A better result is achieved with the two-stage classification using fewer predictors or features. In the confusion matrices, green and red highlights mean true and false predictions, respectively.
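A sketch of how such a fivefold cross-validated accuracy can be obtained for a bagged-tree classifier; scikit-learn's `BaggingClassifier` over decision trees stands in for MATLAB's ensemble bagged tree, and the data are synthetic.

```python
# Fivefold cross-validated accuracy of a bagged-tree classifier on
# synthetic stand-in data (300 samples, 27 selected features).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=27, random_state=0)
clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=30,
                        random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(scores.mean())  # averaged accuracy over the five folds
```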

| Results of validation data
The main goal is to export a model that can categorise human life activities from incoming (test) data, especially with regard to fall and non-fall events. After the performance evaluation of the proposed HAC approach on training data with fivefold cross-validation, the optimum features were selected for each classification step and the best classification method was chosen for each classifier. The model can then be exported on the basis of the performance analysis mentioned earlier, as seen in Figure 13. The results of test data classification using the exported model are listed in Tables 9-12. Table 9 presents the accuracy of the three classifiers used in the proposed approach with and without validation. The results on the test data approximately match those on the training data, with some decrease in accuracy percentage because the test set was small, so errors or false predictions were accentuated in the results. In addition, the fivefold cross-validated classifiers outperform the no-validation classifiers by approximately 6%.
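Exporting a trained model for later use on incoming data can be sketched with Python's `pickle`; the paper exports its model from MATLAB instead, and the classifier and data here are placeholders.

```python
# Serialise a fitted classifier so it can be reloaded later to classify
# incoming (test) data without retraining. Synthetic data for illustration.
import pickle
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = rng.integers(0, 2, size=100)

model = KNeighborsClassifier(n_neighbors=3).fit(X, y)
blob = pickle.dumps(model)          # "export" the trained model

restored = pickle.loads(blob)       # later: reload the exported model
same = bool((restored.predict(X) == model.predict(X)).all())
print(same)
```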

| DISCUSSION
In this section, the results of the proposed algorithm are compared with the state of the art. Table 13 shows the comparison of the proposed approach with the reference methods mentioned in the introduction. The proposed approach outperforms the approaches of the reference papers in terms of accuracy:
• By 3%, compared with Kim and Ling [23], who used only the TF domain and SVM for classification with manual feature extraction to classify seven human activities (AC = 92.8%).
• An HAC work by Kim and Moon [24] also used only the TF data representation, but combined it with a DCNN for automatic feature learning and classification of seven activities; the obtained AC of 90.3% is less than that of our proposed method by approximately 5%. The training time for each fold with 400 epochs was about 1420 s on average, which is much longer than the training time of the proposed method (less than 7 s with fivefold cross-validation), with MATLAB used to train both. In Kim and Moon [24], the number of measurements was 336; three spectrograms were extracted from each measurement, giving 1008 samples in total. The reference method used the whole 300 × 140 spectrogram as input to the DCNN, whereas the proposed method uses features extracted from the spectrogram and radargram as classifier input, which reduces computational complexity.
• By approximately 4%, compared with the method of Jokanovic and Amin [27], in which data are first classified according to range extent (two classes) and then by TF representation (four classes), with an accuracy of 91.25% (our approach has eight classes or activities). The computational time of the method in Jokanovic and Amin [27] is 1.4 s.
• Compared with Jokanovic and Amin [28], the same performance (AC of approximately 97%) was obtained. The classification approach in Jokanovic and Amin [28] used DL algorithms for feature extraction and classified only two activities, whereas our proposed approach distinguishes among eight different types of fall and non-fall activities. The total number of samples in Jokanovic and Amin [28] is 208, whereas the current work uses 300 samples; this makes the comparison more equitable regarding dataset size, with the same performance metrics (accuracy and confusion matrix). No information about computational complexity is given in Jokanovic and Amin [28].
• By 4%, compared with the method of Sadreazami et al. [36] (for the radar dataset). Our proposed approach shows superiority even though a 2D representation was used there.
• By 5% more than the accuracy achieved in Du et al. [40], in which an SCGRNN was used to classify six different human activities using micro-Doppler spectrograms, with segmented features of the spectrograms extracted by a convolution operation. The weakness in Du et al. [39,40] was the inability to capture the local structure of the micromotion signature.
• Compared with the method in [41], in which the range domain was used for binary classification (first stage) into in situ and non-in situ motions with 99.9% accuracy (using SVM). After that, physical empirical and PCA features were extracted from WRTFT spectrograms and used to classify the in situ motions (KNN) and non-in situ motions (bagged trees), respectively.
• By 0.4%, compared with Li et al. [42], in which a one-versus-all hierarchical classification was used with five separate models. Conventional ML algorithms, four SVMs and one bagged tree, were used for classification. This approach has redundancy and computational complexity: the training data for the activities not yet classified were reused in each preceding model before being classified.
• The accuracy of Qi et al. [43] was above 97%. Multistage classification combining conventional KNN and GoogLeNet (a CNN-based method) was used to distinguish 12 kinds of human activities.
Therefore, the use of multiple-domain and multistage classification improves accuracy for motion recognition. DL methods, especially neural network-based methods, show superiority in terms of classification accuracy, but they increase the complexity of learning. Referring to Qi et al. [43], the computational complexity of GoogLeNet, the CNN-gated recurrent unit, and the deep convolutional autoencoder is higher than that of SVM, KNN, bagged tree, and PCA-based classification. Consequently, the complexity of the proposed method is less than that of the methods in Qi et al. [43] because it uses KNN and bagged tree as classifiers. The proposed method provides high classification performance with efficient complexity by defining and selecting a group of robust features from multiple domains and choosing the optimum classifier for each classification stage.
The hyperparameter sensitivity was also investigated. The hyperparameters are the maximum number of splits and the number of learners for the first and third classifiers, and the distance metric and distance weight for the second classifier. These hyperparameters were tuned and the corresponding classification accuracy obtained; the results are listed in Tables 14 and 15.
This study did not consider situations such as the presence of multiple targets or moving obstacles in the radar field, other daily life activities (not included in the training data), and variations owing to irregular motion (because classification is based on micro-Doppler features, such variations can change the micro-Doppler signatures and lead to misclassification). In the future, performance could be enhanced with a larger dataset of actions from both young and older subjects collected over a range of environmental conditions. In addition, in situ measurement could be considered, such as installing the radar in home and workplace settings and recording actual daily life activities.
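The hyperparameter sweep over the classifiers can be sketched as a cross-validated grid search; the data and grids below are illustrative, with the base tree's depth standing in for the maximum number of splits.

```python
# Tune the bagged tree (number of learners) and the KNN (distance metric,
# distance weight) by cross-validated grid search on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

bag = GridSearchCV(BaggingClassifier(DecisionTreeClassifier(max_depth=8),
                                     random_state=0),
                   {"n_estimators": [10, 30, 60]}, cv=3).fit(X, y)
knn = GridSearchCV(KNeighborsClassifier(),
                   {"metric": ["euclidean", "manhattan"],
                    "weights": ["uniform", "distance"]}, cv=3).fit(X, y)
print(bag.best_params_, knn.best_params_)
```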

| CONCLUSION
This work proposed and evaluated a two-stage ML-based approach to HAC applied to data collected from backscattered UWB radar signals. The signal modelling of HAC using UWB radar was discussed and presented. A preprocessing stage was applied to mitigate the contribution of clutter and noise. The cleaned data were represented in two domains, TF and TR, and 56 features were extracted from them, comprising range features, Doppler signature features, and statistical features. Important features were the range difference, event width, spectrogram entropy, total power of the spectrogram, and the power of the third, fourth, fifth, sixth, and seventh spectrogram windows. The features of each classifier were selected to provide the optimum accuracy. The best recognition methods were the ensemble bagged tree, fine KNN, and ensemble bagged tree for the fall/non-fall, fall, and non-fall classifiers, respectively. The proposed two-stage algorithm with eight classes outperformed, by 9%, a single-stage algorithm using an ensemble bagged tree classifier with 44 selected features. Finally, the performance of the proposed approach was compared with the literature, and the proposed approach showed better performance.