Activity recognition with cooperative radar systems at C and K band

Remote health monitoring is a key component in the future of healthcare, with prediction and fall risk estimation applications urgently needed. Radar, through the exploitation of the micro-Doppler effect, is able to generate signatures that can be classified automatically. In this work, features from two different radar systems operating at C band and K band have been used co-operatively to classify ten indoor human activities, with data from 20 subjects, using a support vector machine classifier. Feature selection has been applied to remove redundancies and find a set of salient features for the radar systems, individually and in the fused scenario. Using these methods, we show improvements in classification accuracy from 75 and 70% for the radar systems individually, up to 89% when fused.


Introduction
The increase in the average age of the population worldwide has opened up new healthcare-related challenges requiring solutions that allow people to maintain both independence and a decent quality of life [1]. These challenges often appear as multi-morbidity conditions, which correlate with age and present a problem with wide-ranging repercussions on the individual as well as their supporting healthcare system [2]. Falls and fall-derived injuries are a key risk for older people as they are the first sign of age above disability-free life expectancy. Their effect is visible in the UK, where the National Institute for Health and Care Excellence (NICE) estimates a cost of £2.3 billion per year due to falls [3]. Aside from the financial strain, there are further consequences for the mental and physical well-being of the person, who can experience loss of mobility and loss of confidence.
NICE recommend a risk prediction tool that predicts the likelihood or risk of a person falling and this is the reason behind the desire to monitor activities beyond falls as they can help not just to detect falls but also to prevent them. There is also an appetite to find multifactorial risks behind a fall with abnormalities of gait, muscle weakness, cardiovascular health and osteoporosis being key assessment targets [3].
For this, activity monitoring with the use of various technologies has been suggested: ambient sensors such as radar and ultrasound [4] that work without involving the user; video monitoring with depth cameras, which rely on image-based techniques [5,6]; wearables which use one to one limb traversal information [4,6] and environmental sensors such as pressure pads or fall detecting floors [7].
Ambient sensors are emerging in the assisted living context due to their key benefits compared with the other technologies [4]. The main attraction comes in the form of convenience, as they do not require the end user to interact with the sensor. Their ease of integration into the user's environment and privacy-oriented nature (as they do not take pictures or videos) make them ideal for assisted living applications. Further acceptance issues, involving comfort (as some sensors have to be worn) and image (as some sensors can be perceived as a sign of disability), are less problematic with radar [4].
Classifying activities with radar readings from human movement usually relies on the micro-Doppler effect [8]. This is observed in joint time-frequency (TF) representations of radar returns as minute frequency modulations caused by rotational or oscillatory movements, alongside the central Doppler shift caused by the main translational motion. Micro-Doppler has been used in [9] to estimate different gait patterns and in [10] for elderly fall detection. It has also been applied to detecting armed and unarmed personnel [11] and to hand-gesture recognition [12].
In this paper, we use two radar systems, a K-band continuous wave (CW) and a C-band frequency modulated continuous wave (FMCW), independently and then co-operatively to classify various daily human activities and falls. Previously, we have fused radar with inertial sensors to achieve a similar effect [13,14], but the information fusion of two separate radar systems operating at different frequencies is, to the best of our knowledge, novel for assisted living applications. As radar-on-chip technology makes the development and deployment of multiple radar sensors cost-effective and less complex than before, there is scope to explore what additional information for assisted living applications can be leveraged by multiple, cooperating radar systems operating with frequency diversity.
In our previous work, we evaluated activity recognition with a single radar system, comparing the support vector machine (SVM) and K-nearest neighbour classifiers [15]. Since SVM was the stronger classifier, this paper focuses on its results.
We have also utilised feature selection to retrieve salient predictors from the feature subspace, as in [13], to improve classification. The difference here comes from the use of one-vs-one error coding and the cohesive use of multiple radar systems.
The paper is organised as follows: Section 2 outlines the radar systems, the experimental setup, the activities recorded and the participants. Section 3 outlines the processing of the raw readings and feature extraction. Section 4 discusses the classifier used and the feature selection method, and outlines the need for the fused feature set. In Section 5, we present results demonstrating the improvements brought about by our proposed methods.

Sensor setup
Measurements from two independent radar sensors were collected at James Watt South, University of Glasgow over a period of 1 week. The first radar was an FMCW system operating with a centre frequency of 5. in line with the torso of the participants with the transmit power set at 17 dBm. The second radar was a CW system with a carrier frequency of 24 GHz. Its transceiver was a microstrip patch with a transmit power of 1.2 dBm. Both systems were located in close proximity, with the aspect angle of the target effectively at zero degrees, and were set up as shown in Fig. 1. The distance between the antennas and the participant was 1 m, with added variability depending on the action.

Data collection
This set of recorded activities is based on an assisted living set that has been utilised in [13,15]. The activities comprise three central movement types: dynamic movement, torso traversal and limb-based activity.
• Dynamic movements have a wide range of motions involving translation of the torso and all limbs, as seen in the activities walking and walking with objects.
• Torso traversal activities have a central component of torso movement, which is visible in the activities sitting, standing, bending to pick up objects, checking under bed, bending to tie shoelaces and falling.
• Forelimb-based activities involve the movement of the arms; the activities in this set are interactive ones such as drinking water and receiving a phone call.
False alarms are a key concern in fall detection systems, as valuable resources could be unnecessarily used up following a false fall identification. Therefore, a representative set requires activities similar to the main signature of interest, in other words, confusers. A5: bending to pick up objects, A6: tying shoelaces and A10: checking under bed are considered as confusers for fall. These activities are also listed in Table 1. Twenty male participants between the ages of 21 and 34 years contributed to the data set, with variety in body height and body shape. The length of each recording varied between 5 and 10 s depending on the complexity of the movements.

Data processing
The short-time Fourier transform (STFT) of the radar returns was taken to capture the time-varying properties of the signal. Prior to this, moving target indication (MTI) filtering was performed to remove clutter from static objects such as furniture and walls. The window for the STFT was 0.2 s with an overlap of 95%.
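The processing chain described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the sampling rate `fs`, the synthetic test signal, the Hann window and the mean-subtraction clutter filter (standing in for the MTI filter, whose design is not given here) are all hypothetical choices; only the 0.2 s window and 95% overlap are taken from the text.

```python
import numpy as np

def mti_filter(x):
    """Crude clutter suppression: remove the static (zero-Doppler) component.
    Assumption: a stand-in for the MTI filter, whose design is not specified."""
    return x - np.mean(x)

def spectrogram(x, fs, win_s=0.2, overlap=0.95):
    """Power spectrogram via a Hann-windowed STFT (window type is an assumption)."""
    n_win = int(round(win_s * fs))
    hop = max(1, int(round(n_win * (1.0 - overlap))))
    window = np.hanning(n_win)
    n_frames = 1 + (len(x) - n_win) // hop
    frames = np.stack([x[i * hop:i * hop + n_win] * window for i in range(n_frames)])
    spec = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    freqs = np.fft.fftshift(np.fft.fftfreq(n_win, d=1.0 / fs))
    times = (np.arange(n_frames) * hop + n_win / 2) / fs
    return freqs, times, (np.abs(spec) ** 2).T  # shape: (n_freqs, n_frames)

# Hypothetical complex baseband return: a 40 Hz Doppler tone plus static clutter.
fs = 1000.0
t = np.arange(int(5.0 * fs)) / fs
x = np.exp(2j * np.pi * 40.0 * t) + 5.0  # the constant term mimics static clutter
freqs, times, S = spectrogram(mti_filter(x), fs)
peak_hz = freqs[np.argmax(S.mean(axis=1))]  # strongest Doppler bin after filtering
```

With a 0.2 s window the Doppler resolution is 5 Hz, and the 95% overlap gives a new spectrogram column every 10 ms at this assumed sampling rate.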
The frequency bands for human micro-Doppler are limited to a 100-Hz window, where the simplicity and computational efficiency of the STFT are desirable. Other TF transforms, such as the Wigner-Ville decomposition and the Hilbert-Huang transform, have been proposed but have not been implemented here [16]. The Fourier transform of the spectrogram along the time axis gives the cadence velocity diagram (CVD), which was also utilised for feature extraction [17].
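As a rough sketch of the cadence velocity diagram, the snippet below takes the Fourier transform of each Doppler bin of a spectrogram along time. The spectrogram, the frame rate and the removal of each bin's mean before the transform are assumptions made for illustration, not details from the paper.

```python
import numpy as np

def cadence_velocity_diagram(S, frame_rate):
    """FFT of each Doppler bin's time series.
    S has shape (n_doppler_bins, n_frames); frame_rate is in frames per second."""
    S_centred = S - S.mean(axis=1, keepdims=True)  # drop the DC term per bin (assumption)
    cvd = np.abs(np.fft.rfft(S_centred, axis=1))
    cadence = np.fft.rfftfreq(S.shape[1], d=1.0 / frame_rate)
    return cadence, cvd

# Hypothetical spectrogram: one Doppler bin whose power pulses at 2 Hz.
frame_rate = 100.0
t = np.arange(500) / frame_rate
S = np.zeros((8, t.size))
S[3] = 1.0 + np.cos(2 * np.pi * 2.0 * t)
cadence, cvd = cadence_velocity_diagram(S, frame_rate)
dominant = cadence[np.argmax(cvd.sum(axis=0))]  # repetition rate of the motion
```

Periodic motions such as walking appear as peaks along the cadence axis, so the location and strength of those peaks can serve as features.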

Feature extraction
A plethora of features have been suggested for the classification of micro-Doppler and radar information. These features can be grouped into three categories. Previous applications of these sets of features include measuring lameness in horses [18] and activity classification [13].
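The full feature list is not reproduced in this section, but the Doppler centroid and bandwidth are named among the selected features in the results. The sketch below uses their common power-weighted definitions, which is an assumption about how they were computed; the toy spectrogram and the summary statistics at the end are purely illustrative.

```python
import numpy as np

def doppler_centroid_bandwidth(S, freqs):
    """Per-frame power-weighted mean Doppler (centroid) and spread (bandwidth).
    S has shape (n_freqs, n_frames); freqs holds the Doppler bin centres in Hz."""
    power = S.sum(axis=0)
    power = np.where(power > 0, power, 1.0)  # avoid division by zero in empty frames
    centroid = (freqs[:, None] * S).sum(axis=0) / power
    spread = ((freqs[:, None] - centroid[None, :]) ** 2 * S).sum(axis=0) / power
    return centroid, np.sqrt(spread)

# Toy spectrogram: frame 0 has power only at 0 Hz, frame 1 at -10 Hz and +10 Hz.
freqs = np.array([-20.0, -10.0, 0.0, 10.0, 20.0])
S = np.array([[0, 0], [0, 1], [1, 0], [0, 1], [0, 0]], dtype=float)
centroid, bandwidth = doppler_centroid_bandwidth(S, freqs)

# Hypothetical scalar features summarising each curve over the recording.
features = [centroid.mean(), centroid.std(), bandwidth.mean(), bandwidth.std()]
```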

Classifier
SVM [19] separates data into two classes with a hyperplane defined by the support vectors, the training points closest to the separating boundary. If a clear separation margin is not present in the original feature space, a kernel can be utilised to map the features to a higher-dimensional space for class separation. Different kernels, such as Gaussian, cubic or quadratic, can be chosen depending on the distribution of the data to better fit the classification problem. As this is a multi-class problem, error-correcting output codes [20] are utilised, with one-vs-one coding selected in this instance.
One-vs-one error coding is an exhaustive search in which every pair of classes is compared. For k classes, k(k − 1)/2 binary SVM classifiers are run (45 for the ten activities here), and each decision is expressed as a confidence that the tested observation belongs to a specific class. This is illustrated in Fig. 2 and this is the reason behind the strength of SVM in classification. For each binary pair (i.e. A1 and A2, then A1 and A3, and so on), one class is assigned positive and the other negative [18], and a selectable binary loss function, which evaluates the posterior class probability, gives a confidence value for the decision. Ultimately, the class with the lowest loss value is output as the predicted class.
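The decoding step can be sketched as below. This is an illustration only: the binary learner scores are made up rather than produced by trained SVMs, and the hinge loss is one assumed choice for the selectable binary loss.

```python
import numpy as np
from itertools import combinations

def ovo_coding_matrix(k):
    """Rows are classes, columns are binary learners.
    +1 / -1 mark the two classes in a pair; 0 means the class is unused."""
    pairs = list(combinations(range(k), 2))
    M = np.zeros((k, len(pairs)))
    for col, (pos, neg) in enumerate(pairs):
        M[pos, col], M[neg, col] = 1.0, -1.0
    return M, pairs

def decode(scores, M):
    """Loss-based decoding: average binary loss per class over the learners involving it."""
    loss = np.maximum(0.0, 1.0 - M * scores[None, :]) / 2.0  # hinge loss (assumed choice)
    used = M != 0
    class_loss = (loss * used).sum(axis=1) / used.sum(axis=1)
    return int(np.argmin(class_loss)), class_loss

M, pairs = ovo_coding_matrix(10)  # ten activities give 45 binary learners
# Hypothetical scores: every learner involving class index 8 (A9, zero-based) favours it.
scores = np.array([1.0 if p == 8 else -1.0 if n == 8 else 0.0 for p, n in pairs])
predicted, class_loss = decode(scores, M)
```

Here `predicted` is the zero-based class index with the lowest average loss, mirroring the rule that the class with the least loss is output.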
We have previously shown that SVM is the superior classifier in a multi-class human activity problem [15]. It is also the classifier of choice for activity classification with radar, with [17,19,21] showing its relative strength for this application and warranting its use in this scenario.
Fig. 3a shows the binary loss values for the different classes for the radar systems. In this case, the target class is A9 (fall), which is the class of interest, and we can see that the binary loss for A9 is the lowest among all activities, indicating the highest confidence. This scenario represents the ideal case where both radar systems attribute the features to the correct class. On the other hand, Fig. 3b shows an instance where the decision is less clear. Here we have the binary loss values for A7: drinking from a cup, which the FMCW system correctly classifies. However, the CW radar in this instance picks the confuser A8: taking a call as the predicted class. Fusing helps here, as the fused classifier also selects A7 correctly.

Feature selection
In cases where the feature space is wide and contains many covariate features [22], it is desirable to retain only the most salient features so that classes can be predicted with minimal effort. For this, a few methods have been suggested. Filter methods use the distribution of the features to find distances between points or correlative properties to score features. Wrapper methods iteratively search the feature space for the combination giving the maximum classification performance, either by adding or by removing features one by one. Compared with filter methods, wrapper methods can be more computationally complex, depending on the feature space, as they are iterative and exhaustive. In this work, the wrapper method has been utilised.
Sequential feature selection (SFS) is a forward search wrapper method, which has been utilised to improve prediction for activity classification applications before [13,19].
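A forward search of this kind can be sketched as below. This is not the implementation used in this work: a nearest-centroid classifier stands in for the SVM to keep the example self-contained, and the data, split and function names are hypothetical.

```python
import numpy as np

def nearest_centroid_accuracy(Xtr, ytr, Xte, yte):
    """Stand-in scorer (assumption): the paper uses an SVM, not nearest centroid."""
    classes = np.unique(ytr)
    centroids = np.stack([Xtr[ytr == c].mean(axis=0) for c in classes])
    d = ((Xte[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(classes[np.argmin(d, axis=1)] == yte))

def sequential_forward_selection(Xtr, ytr, Xval, yval, n_select,
                                 score=nearest_centroid_accuracy):
    """Greedily add the single feature that gives the best validation accuracy."""
    selected, history = [], []
    remaining = list(range(Xtr.shape[1]))
    while remaining and len(selected) < n_select:
        trials = [(score(Xtr[:, selected + [j]], ytr, Xval[:, selected + [j]], yval), j)
                  for j in remaining]
        best_score, best_j = max(trials, key=lambda item: item[0])
        selected.append(best_j)
        remaining.remove(best_j)
        history.append(best_score)
    return selected, history

# Hypothetical data: feature 2 separates the two classes, the rest are noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 5))
X[:, 2] += 4.0 * y
idx = rng.permutation(200)
tr, va = idx[:140], idx[140:]
selected, history = sequential_forward_selection(X[tr], y[tr], X[va], y[va], n_select=2)
```

Plotting `history` against the number of selected features gives an accuracy curve of the kind used to pick the best subset size.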

Results
Using the dataset mentioned, the data was split into 70% for training and 30% for testing. The separation was done in a stratified manner so the class ratios were preserved and class imbalance during training was prevented. This process was repeated 20 times after which the average was taken and the mean values are presented in this section.
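The repeated stratified split can be sketched as follows. The number of recordings per activity and the random seed are hypothetical; only the 70/30 ratio and the 20 repetitions come from the text.

```python
import numpy as np

def stratified_split(y, test_fraction=0.3, rng=None):
    """Return train/test index arrays that preserve the per-class proportions."""
    rng = np.random.default_rng() if rng is None else rng
    train_idx, test_idx = [], []
    for c in np.unique(y):
        members = rng.permutation(np.flatnonzero(y == c))
        n_test = int(round(test_fraction * members.size))
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Hypothetical labels: 10 activities with 60 recordings each (counts are illustrative).
y = np.repeat(np.arange(10), 60)
rng = np.random.default_rng(0)
splits = [stratified_split(y, 0.3, rng) for _ in range(20)]
train_idx, test_idx = splits[0]
```

A classifier would then be trained and tested once per split, with the 20 test accuracies averaged to give the reported mean.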
With 21 base FMCW radar features, the overall classification accuracy was 75%. The CW radar with 19 features has even more limited performance with rates of ∼70% when all features are used. In Fig. 4, this is shown as a triangle on the blue line.
First, SFS does not appear to provide a great increase when the standalone radar systems are used independently. However, it does show that selecting ten features can get the classification accuracy to within 2% of the maximum value attainable in both cases.
The real noticeable improvement comes when SFS is used together with fused features from both radars. With all the features pooled together, there is an improvement in classification to 83%, shown as a triangle for the fused plot, without any feature selection. SFS increases this further to ∼90%, denoted by a star on the yellow line in Fig. 4, with only 15 features used out of the 38 total. Out of these 15 features selected in the multi-radar fusion scenario, 4 are extracted from the CW radar data, and 11 from the FMCW radar.
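Pooling the features from both radars amounts to concatenating the per-recording feature vectors. The sketch below is a minimal illustration under assumptions: the array sizes are hypothetical, and the per-feature standardisation using training rows is an added step that the text does not describe.

```python
import numpy as np

def fuse_features(X_fmcw, X_cw, train_idx):
    """Column-wise concatenation followed by standardisation (an assumed step).
    Rows of both arrays must refer to the same recordings."""
    X = np.hstack([X_fmcw, X_cw])
    mean = X[train_idx].mean(axis=0)
    std = X[train_idx].std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled
    return (X - mean) / std

rng = np.random.default_rng(0)
X_fmcw = rng.normal(5.0, 2.0, size=(50, 3))  # hypothetical: 50 recordings, 3 features
X_cw = rng.normal(-1.0, 0.5, size=(50, 2))   # hypothetical: same recordings, 2 features
train_idx = np.arange(35)
X_fused = fuse_features(X_fmcw, X_cw, train_idx)
```

The fused matrix can then be passed to the same classifier and feature selection procedure as the single-radar feature sets.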
Looking at the radar systems individually, the CW selects features from the SVD and physical features (centroid and bandwidth), whereas the FMCW chooses energy curve, centroid, cadence velocity diagram-based features and SVD-based features. For both systems, the SVD features selected are predominantly from the left singular vectors, in other words, the spectral information. Interestingly, in the fused case, the features selected appear to be different from the individual case. While centroid is present for both cases, the bandwidth from the CW appears to be redundant when fused. Understandably, many of the features generated appear to be covariate with their CW and FMCW counterparts. Additionally, this indicates the presence of less significant features, which could be removed to improve efficiency for implementation in real-world systems.
There are also class-based differences between the radars for specific activities, which are shown by analysing the accuracy of the classifiers on a per-class basis. The true positive rate, where the classifier has identified the class correctly (in other words, the accuracy), is shown in Table 1.
Here we compare the accuracy for all three situations: each radar used alone, and both fused together. Looking specifically at A9: fall, which is the activity of interest in our case, the CW system appears to be stronger at detecting it than the FMCW. Fusing appears to bring in the helpful features that identify falls, raising the accuracy to a level similar to the CW case. Some redundancy remains, as fall detection accuracy is 1% lower when fused.
For activities A3, A4, A5 and A6, fusing improves the accuracy from between 69.05 and 83.61% to between 90.39 and 94.22%. The classification accuracy of this small cluster of activities seems to have increased the most through the cooperative use of radar. In Table 2, aside from the clusters A1/A2 and A7/A8, the remaining activities, highlighted in green, are classified with an accuracy above 90%.
Notably, for A2: walking with an object, the accuracy is lower for the fused case at 75.05% when compared with FMCW alone at 83.95%. This appears to be due to the features from the CW influencing the decision process, as it is close to the 73.89% accuracy attained when only CW is used. The accuracy for A1 and A2 remains low even after feature selection and fusion. This is shown in Table 2, where we can see that confusion occurs between the two types of walking movements. This is likely due to both motions being similar and to the variety in walking pattern among the participants. It is also not a severe outcome, as neither is a confuser for A9.
In Table 2, we see that there are missed detections for A9: falls and misidentifications of other classes as falls. 1.95% of A6 were identified as falls. Some fall events are detected as A6 but also surprisingly A8, which is not a confuser for A9.
Although initially it seems the CW performs better individually, Tables 3 and 4 show that there are improvements to be seen with fusion for incorrect classification and missed detections too. Table 3 shows that a lower proportion of other activities is incorrectly detected as A9, but with the fused scenario the erroneous detections occur with A6, as opposed to A5 for CW and A5/A4 for FMCW. The CW radar alone performs similarly, with only A5 being incorrectly identified as falls, but with a higher misclassification rate of 2.95%. The FMCW is the worst performing system here, as it incorrectly identifies 3% of A4 and 2.17% of A5 as A9.
A number of falls are misclassified as other activities, specifically A3, A4, A6 and A8, the proportions of which are shown in Table 4. Here, we see that the 1% loss in fall detection accuracy is offset slightly: without fused features, falls are misidentified as three different classes for both sole radar systems. Fusing reduces this to two classes that are identified incorrectly, and we also see here that the features from the CW system are offsetting the errors from the FMCW. The bias towards A8 that the FMCW has still appears to be there to some extent, as 2.06% of A9: falls are still predicted as A8.
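The per-class accuracy, missed falls and false fall alarms discussed above can all be read from a row-normalised confusion matrix. The sketch below shows one way to extract them; the labels and predictions are made up, and the zero-based index 8 for A9 is an assumption about how the classes are encoded.

```python
import numpy as np

def confusion_matrix_percent(y_true, y_pred, n_classes):
    """Row-normalised confusion matrix in percent; rows are the true classes."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    return 100.0 * cm / np.where(row_sums > 0, row_sums, 1.0)

def fall_report(cm_pct, fall=8):
    """Split fall errors into missed detections (fall row) and false alarms (fall column)."""
    others = [c for c in range(cm_pct.shape[0]) if c != fall]
    missed = {c: cm_pct[fall, c] for c in others if cm_pct[fall, c] > 0}
    false_alarms = {c: cm_pct[c, fall] for c in others if cm_pct[c, fall] > 0}
    return cm_pct[fall, fall], missed, false_alarms

# Made-up predictions for 3 classes, with class index 2 playing the role of "fall".
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 2, 1, 1, 1, 1, 2, 2, 2, 1]
cm = confusion_matrix_percent(y_true, y_pred, 3)
tpr, missed, false_alarms = fall_report(cm, fall=2)
```

In this toy case a quarter of the falls are missed (predicted as class 1) and a quarter of class 0 recordings raise a false fall alarm.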

Conclusion
In this paper, we demonstrated that fusing the features from the FMCW and CW radars brings about improvements in classification accuracy in general, and also reduces missed detections and misidentification of falls as other classes. Using data from the radar returns of the 20 participants, a classification rate of up to 89.54% was achieved with the help of fusion along with SFS. The central benefit of feature fusion appears to be that it brings in the strengths of the feature sets from both sensors; SFS seems to enhance this, as it has more choices from which to make the optimal feature selection.
These methods appear to make remote monitoring viable, but scope for improvement remains, as the system still exhibits missed detections and misclassifications of falls (A9). To attain the desired rates for class identification overall, future work will integrate features that are able to separate activity A1 from A2 and A7 from A8, which are currently lowering the overall average significantly.
Furthermore, diversity in sensor locations, participants and radar systems will be tested to improve overall classification rates.

Acknowledgements
A. Shrestha was supported for his PhD by the UK Engineering and Physical Sciences Research Council (EPSRC) Doctoral Training Award to the School of Engineering.