An Intelligent Implementation of Multi-Sensing Data Fusion With Neuromorphic Computing for Human Activity Recognition

The growing demand for multisensor data fusion has drawn attention to precise human activity recognition (HAR), because fused systems are more reliable and robust than standalone technologies. This article presents a framework that fuses data from multiple sensing systems and applies neuromorphic computing to sense and classify human activities. The data are collected with inertial measurement unit (IMU) sensors, software-defined radios, and radars, and feature extraction and selection are performed on the data. For each of the actions, such as sitting and standing, an activity matrix is generated, which is then fed into a discrete Hopfield neural network as a binary feature pattern for one-shot learning. Following the Hopfield network neurons’ feedback output, the training of neurons is completed after two steps under the Hebbian learning law, and the conformity to the standard activity feature pattern is also determined. According to the probabilistic statistics on inference predictions, the proposed method, that is, neuromorphic computing on the fusion of the three data sources, achieved the highest lower quartile output in the box plot, 95.34%, while the confusion matrix classification accuracy for the two activities was 98.98%. The results show that neuromorphic computing is highly capable of multisensor data-fusion-based HAR. Furthermore, the proposed method can be enhanced by incorporating additional hardware signal processing in the system to enable the flexible integration of human activity data.


I. INTRODUCTION
In recent years, the application of multisensor data fusion technology has become popular for military, industry, and emerging technology development applications [1]. Multisensor information fusion (MSIF) is an information processing technique in which the data from multisensor or multisource hardware are fused and analyzed to complete the required decision making and estimation [2].
MSIF technology is widely used in robotics [3], gait detection [4], remote sensing [5], healthcare [6], and other fields [7]. Research studies have shown that, compared with single-sensor systems, MSIF technology detects and tracks subjects' activities more accurately [8]. Moreover, it can enhance the validity, reliability, and robustness of the entire system, improve data credibility to increase accuracy, expand the time and space coverage, and reinforce the system's real-time performance and information utilization [9].
Muhammad et al. [10] proposed a data-fusion-based system for ensemble computing with the random forest algorithm to predict results from multiple sensors. The results of the study were promising, as it recorded an average accuracy of more than 90% after performing data fusion. Li et al. [11] used the sequential forward selection (SFS) method to fuse the inertial measurement unit (IMU) and radar information into time-series data. These data serve as features to train support vector machine (SVM) and artificial neural network (ANN) algorithms for classification, which increases the accuracy by approximately 6% compared to using a single type of data.
In view of the uneven data quality of different hardware platforms [12], Huang et al. [13] used multiscale features obtained through three sparsity-invariant operations. Their approach relies on a hierarchical multiscale encoder-decoder neural network, which processes sparse input and feature maps for multihardware data. The features of multiple sensors can be fused further to improve the performance of deep learning algorithms. However, a multisensing system normally requires the hardware platforms to work synchronously so that the time axis of the collected data is unified in the coordinate system.
A current research focus revolves around the development of high-accuracy human activity recognition (HAR) systems using the limited data sets available. Traditional machine learning (especially deep learning models) has achieved practicable results in the HAR field [14], but it has also led to a large training data collection overhead [15]. On the upside, deep neural networks handle high-dimensional data well and complete the end-to-end calculation without the more cumbersome process of feature engineering. On the downside, they cause problems such as a huge demand for training samples, complex model structures, and time-consuming training [16]. Moreover, they lose the cognition of features, and it is challenging to know the importance of individual data features [17]. In contrast, neuromorphic computing requires fewer training samples to achieve high-accuracy recognition results [18]. It combines feature engineering, which provides an abstract expression of the object, with the associative memory function of neuromorphic computing, which achieves one-shot learning for HAR.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article presents a novel multisensing HAR system based on a neuromorphic-computing data fusion method. It extends the work presented in [19], where IMU sensor, radar, and universal software radio peripheral (USRP) signals are used for HAR. Our method constructs a feature matrix that fuses information from the different hardware into a unified data input to a Hopfield neural network. The constructed activity feature matrices rely on attention mechanisms to combine IMU, radar, and USRP signals for feature extraction and selection. The multihardware data are then fused and classified with the Hopfield neural network, which gives better classification and recognition accuracies than traditional data fusion results.
The main contributions of this article are as follows.
1) We explore neuromorphic computing methods, based on the Hopfield neural network, in the HAR task. Its advantage is one-shot learning: only one training sample is required, which is friendly to limited data sets.
2) We construct an attention mechanism within the data fusion framework for multisensing device signals. It relies on the TopK calculation for feature selection to obtain feature maps that differ from traditional handcrafted features.
This article is organized as follows. Section II outlines how the human motion data from the IMU, radar, and USRP are collected and modeled. Section III details the feature matrix used for data fusion, the signal preprocessing, and the algorithm calculation workflow. Section IV presents a quantitative evaluation of the application of neuromorphic computing on the fused data set in the context of existing studies in the literature. Finally, Section V summarizes the multisensing data fusion implementation of Hopfield neuromorphic computing for HAR and outlines potential future directions.

A. Data Collection
At present, many types of sensing hardware can capture human movement information, but the signals acquired by any single device are relatively limited. In general, IMU sensors are low cost, easy to use, and less restricted by usage scenarios, and they have been integrated into many wearable devices. However, their serviceable range and accuracy are inferior to those of the USRP and radar, and their performance is constrained by other components, such as batteries and microprocessors. The USRP can achieve higher precision object detection through the Doppler frequency-shift principle, and its hardware performance benefits from high power support. However, it is generally used in fixed scenes and cannot be moved quickly, which means that the capture of object signals is easily affected by factors such as occlusion and limited angle. Comparatively, the radar transmits electromagnetic waves and receives echoes to obtain the distance, speed, and angle of objects. It has good penetration and high resolution. However, it is bulky and complicated to install. Therefore, building on the advantages of these different types of devices, we integrate them to form a multisensing human activity perception system in which the devices complement each other and realize a more stable and reliable HAR task. With data fusion, the IMU sensor, USRP, and radar provide different perception information, which can overcome the limitations and discrepancies of a single device in terms of geometric, spectral, and spatial resolution. Finally, this improves the data quality and thus facilitates the positioning, recognition, and interpretation of human movement information.
The data collection of human body movements was performed using three sensing hardware platforms, as listed in the following.
1) Shimmer 3 IMU sensor [20].
2) Walabot Radar DIY model [21].
3) USRP [22] X300 unit.
First, the IMU sensor was worn on the wrist, where each of its sensors (gyroscope, accelerometer, and magnetometer) provides spatial coordinate information along the X, Y, and Z axes. Then, the radar and the USRP were positioned at a distance of 2 m from the fixed human activity position (see Fig. 1).

B. IMU Sensor, USRP, and Radar Modeling
IMU Sensor Modeling: The IMU [23] comprises a gyroscope, an accelerometer, and a magnetometer, and is used to measure the attitude angle of an object. The gyroscope detects the angular velocity signals relative to the three degrees of freedom (X, Y, and Z) in the coordinate navigation system, and the accelerometer monitors the acceleration signals along the three independent axes (X, Y, and Z) of the object carrier coordinate system. The magnetometer obtains the surrounding magnetic field information. From the geomagnetic vector, it can calculate the angle between the module and the north direction and help correct the angular velocity parameters of the gyroscope. The real-time output, which includes the 3-D angular velocity signal, acceleration signal, and magnetic field information, is used to calculate the object's posture. To capture this information, the voltage signals of the x, y, and z axes in the IMU sensor are digitized at sampling frequencies of 20 Hz for the magnetic field and 400 Hz for the accelerometer and gyroscope. The working current of the sensor is 500 μA with a power supply voltage of 3.3 V, resulting in a total power consumption of 1.65 mW.
Radar Modeling: The radar device used in this article is an off-the-shelf "Walabot DIY" device. The device is designed to use radar technology to detect metal and wooden studs as well as electrical wires inside a wall to assist users with DIY tasks around the home. However, it can also be used to detect human movements [24], [25]. The Walabot radar is a multiple-input and multiple-output (MIMO) device and does not allow its preset parameters to be tuned. Hence, the data for this experiment were collected using the predefined settings of the product.
USRP Modeling: The USRP device is a software-defined radio (SDR) used to enable radio-frequency (RF) communication between two antennas. Two omnidirectional antennas are connected to a single USRP device, one as a transmitter and one as a receiver. The data collection window was set to 5 s, during which the activity took place. During the 5-s communication window, the channel state information (CSI) is captured, reflecting the activity performed. This process is repeated multiple times to capture several samples for each activity, where the amplitude of the RF signals is extracted from the CSI. The USRP was configured to operate at a 2.4-GHz frequency, similar to Wi-Fi, with a 20-MHz bandwidth.
In this article, the USRP is set up to communicate using orthogonal frequency-division multiplexing (OFDM) [26]. Channel estimation is an important feature of OFDM, as it monitors the state of the channel for the purpose of improving performance. Channel estimation does this by using a specified set of symbols known as pilot symbols. These symbols are transmitted with the data; once the receiver antenna receives the data, the received pilot symbols are compared to the expected pilot symbols, which reveals the state of the channel. Fig. 2 shows the raw data as captured by the IMU sensor, radar, and USRP devices, where Fig. 2(a) and (b) represent the sitting and standing activities, respectively. It is worth mentioning that the data collected from the three devices were not synchronized, because the start and end of the data collection window were difficult to control and each sensor was sampled independently at a different rate. This resulted in inconsistent time stamps for the collected actions, as shown in Fig. 2(c).
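The pilot comparison described above can be illustrated with a least-squares estimate, in which each received pilot is divided by the expected pilot. This is a minimal sketch under that assumption; the source does not state which estimator was used, and the pilot values, channel values, and function names below are hypothetical.

```python
def estimate_channel(received_pilots, expected_pilots):
    """Least-squares channel estimate per pilot subcarrier: H = Y / X."""
    if len(received_pilots) != len(expected_pilots):
        raise ValueError("pilot sequences must have the same length")
    return [y / x for y, x in zip(received_pilots, expected_pilots)]


def csi_amplitude(channel):
    """Amplitude of each complex channel coefficient."""
    return [abs(h) for h in channel]


# Hypothetical example: four known pilots sent through a made-up, noise-free channel.
expected = [1 + 0j, -1 + 0j, 1 + 0j, -1 + 0j]
true_channel = [0.5 + 0.5j, 0.8 + 0j, 0.3 - 0.4j, 1.0 + 0j]
received = [h * x for h, x in zip(true_channel, expected)]

estimate = estimate_channel(received, expected)
amplitude = csi_amplitude(estimate)
```

With noise-free pilots the estimate equals the channel used to generate the received symbols; with real captures it would carry measurement noise.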

C. Data Principles
The error of the action-state variable can be summarized as (1). Here, δk is the error value between time i and time j, a is the state quantity, t is the time difference, and dt is the differential of t. Following the normalization of the raw data from each sensing unit, the measured values are converted to a unified coordinate system, which eliminates the time stamp of the information. This is shown for the sitting and standing activities in Fig. 2(d) and (e), respectively.

III. PROPOSED STRUCTURE MATRIX TO DATA FUSION
Fig. 3 shows the framework and the data flow from the multisensing stage to the neuromorphic computing stage for HAR. First, human motion information is collected separately on the different hardware platforms, and features are extracted from the collected raw data.
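Before feature extraction, the unsynchronized streams have to be brought onto a common axis, as described in Section II-C. The sketch below shows one possible way to do this: linear resampling to a fixed length followed by min-max scaling to [-1, 1]. Both choices, the target length, and the function names are assumptions for illustration; the source does not specify the normalization used.

```python
def resample_linear(signal, target_length):
    """Linearly interpolate a 1-D sequence onto target_length evenly spaced points."""
    if target_length < 2 or len(signal) < 2:
        raise ValueError("need at least two samples")
    step = (len(signal) - 1) / (target_length - 1)
    resampled = []
    for k in range(target_length):
        position = k * step
        left = min(int(position), len(signal) - 2)
        fraction = position - left
        resampled.append(signal[left] * (1 - fraction) + signal[left + 1] * fraction)
    return resampled


def normalize_min_max(signal):
    """Scale values to [-1, 1]; a constant signal maps to all zeros."""
    low, high = min(signal), max(signal)
    if high == low:
        return [0.0 for _ in signal]
    return [2 * (value - low) / (high - low) - 1 for value in signal]


# Hypothetical streams of different lengths, standing in for different sampling rates.
imu_stream = [float(i) for i in range(400)]
radar_stream = [float(i % 10) for i in range(100)]
imu_unified = normalize_min_max(resample_linear(imu_stream, 50))
radar_unified = normalize_min_max(resample_linear(radar_stream, 50))

# Small worked case: five samples reduced to three, then scaled.
small = normalize_min_max(resample_linear([0.0, 10.0, 20.0, 30.0, 40.0], 3))
```

After this step every stream has the same number of samples and the same value range, so the sample index replaces the device-specific time stamp.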

A. Feature Extraction and Feature Selection
Feature selection and feature extraction are two important parts of feature engineering. Feature extraction finds the attributes that best represent the uniqueness of the data [27]. Feature selection chooses the appropriate features from the candidate features [28]. It can reduce the dimension of the data and improve the ML model's performance. Fig. 4 shows the process from raw data feature extraction, through the attention mechanism [29] of TopK feature selection [30], to the binarization that yields the human activity feature map. Fig. 4(a) shows the raw multisensing data processed by a tree-based prediction model, which lists the features and produces the heat map after TopK ordering [29], [30]. Fig. 4(b) is the 5×5 feature matrix formed from the best 25 features of the TopK computation. Finally, Fig. 4(c) and (d) show the human activity feature patterns after binarization of the feature values (positive and negative values are mapped to the two binary states).
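The TopK selection and binarization steps can be sketched as follows. The importance scores would come from the tree-based model mentioned above; here they are made-up numbers, and the function names, the 30-feature input, and the treatment of zero as negative are assumptions for illustration.

```python
def top_k_indices(importances, k):
    """Indices of the k largest importance scores, most important first."""
    ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return ranked[:k]


def build_binary_pattern(feature_values, importances, k=25, side=5):
    """Select the top-k features and binarize them by sign into a side x side matrix."""
    if side * side != k:
        raise ValueError("k must equal side * side")
    selected = top_k_indices(importances, k)
    # Positive values map to +1; zero and negative values map to -1 (assumption).
    binary = [1 if feature_values[i] > 0 else -1 for i in selected]
    return [binary[row * side:(row + 1) * side] for row in range(side)]


# Hypothetical fused feature vector with 30 candidates and made-up importances.
importances = [float(i) for i in range(30)]
feature_values = [(-1) ** i * (i + 1) for i in range(30)]
pattern = build_binary_pattern(feature_values, importances)
```

The resulting 5×5 matrix of -1 and +1 entries has the form expected by a discrete Hopfield network once it is flattened to 25 values.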
Feature extraction obtains a new feature space by transforming or mapping the original raw data, such as mapping from 3-D space to 2-D space. The purpose of feature extraction is to use fewer features to represent most of the information in the original data space. Thus, it can improve computing efficiency and mitigate the curse of dimensionality.
The attention mechanism [29] of neural networks is a resource allocation scheme. In neural network learning, a model with stronger expressive ability requires more parameters on the neurons. More information can then be stored on the neurons, but this brings information overload. Therefore, with the attention mechanism, the neural network pays more attention to the information that is most critical to the current task while filtering out irrelevant information and reducing attention to the rest. As a result, information overload can be resolved, and the accuracy and efficiency of the task can be improved by allocating computing resources to the most important parts. Here, the attention mechanism selectively ignores unimportant information according to the importance of the activity features. It then focuses on the important features to express the corresponding activity. The focusing process is reflected in the calculation of feature weight coefficients, where the weights indicate the essential features of the data. Through the heat map of feature correlation, the TopK [31] (K = 25) features are selected to represent the original information of the activity.
The distribution probability of the attention mechanism is given by (2) [32]. Source is the stored data, a series of <Key, Value> data pairs of length Lx, and Query fetches the corresponding value from the stored data as the attention value. The weight coefficient of the Value paired with each Key is obtained from the element Query in the Target. First, the correlation or similarity between Query and each Key is calculated; then, the Values are weighted and summed to give the final Attention value. Essentially, the attention mechanism is a weighted sum of the Values of the elements in Source, while Query and Key are used to calculate the weight coefficient of each Value:
\[ \mathrm{Attention}(\mathrm{Query}, \mathrm{Source}) = \sum_{i=1}^{L_x} \mathrm{Similarity}(\mathrm{Query}, \mathrm{Key}_i) \cdot \mathrm{Value}_i \tag{2} \]
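The weighted sum in (2) can be illustrated with a few lines of Python. This sketch assumes a dot-product similarity normalized by softmax so that the weights form a probability distribution; the source does not state which similarity function is used, and the query, keys, and values below are made up.

```python
import math


def softmax(scores):
    """Turn raw similarity scores into weights that sum to 1."""
    peak = max(scores)
    exponentials = [math.exp(score - peak) for score in scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]


def attention(query, keys, values):
    """Weighted sum of values, weighted by the similarity of the query to each key."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    return sum(w * v for w, v in zip(weights, values)), weights


# Hypothetical source with two identical keys, so both values get equal weight.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0]]
values = [2.0, 4.0]
attention_value, weights = attention(query, keys, values)
```

When one key matches the query more closely than the others, its value dominates the sum, which is the focusing behavior described in the text.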

B. Hopfield Neural Network and Euclidean Distance
Neuromorphic computing is designed here for end-to-end signal processing. First, features are extracted from the raw data through data preprocessing, and the feature map of the corresponding activity is obtained as explained in Section III-A. The binary feature pattern is then fed to the Hopfield neural network [33] for training. Finally, the output of the Hopfield neural network is compared with the corresponding activity feature map. The network can thus recognize whether an input signal has been trained or not, which gives the inference result for the activity. The Hopfield neural network is a fully connected recurrent feedback neural network, and this structure provides the associative memory of neuromorphic computing. Fig. 5 shows the network architecture.
The discrete Hopfield neural network (DHNN) [34] uses binary feedback to realize associative memory. Each neuron applies a step activation function, so its input and output are the binary values −1 and 1. The Hopfield neural network training phase is illustrated in Fig. 6 for the sitting and standing activities. Fig. 6(a) depicts the state space of one Hopfield neural network neuron during training for both activities. The weights of the DHNN are calculated from the binary feature matrix (the 5×5 feature pattern obtained by the above feature extraction and selection of human activity) and trained by the Hebbian learning law [36]. The x-axis indexes the feature matrix input to the corresponding neurons, the y-axis shows the neurons' state changes after the feature matrix is input, and the z-axis gives the state values. As learning proceeds, the state values of all neurons tend to become stable, which means that the neural network training is completed and the neurons have the function of associative memory for the specified feature matrix. Here, the neuron reaches a stable state after two steps of training. Fig. 6(b) displays the entire DHNN weight output after training with the Hebbian learning algorithm on the two preprocessed binary activity patterns. The weight values describe the connection relationships in the neural network architecture. Because the network is fully connected, the weights form a 25×25 array that shows the connection strength between each pair of neurons, which corresponds to the associative memory function for the two learned activity feature matrices.
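A minimal sketch of Hebbian training and recall in a 25-neuron DHNN follows. The two stored patterns are hypothetical stand-ins for the sitting and standing feature patterns, and the synchronous update with ties resolved to +1 is an assumption; the source does not specify the update schedule.

```python
def train_hebbian(patterns):
    """Build the weight matrix from bipolar (-1/+1) patterns with the Hebbian rule."""
    size = len(patterns[0])
    weights = [[0.0] * size for _ in range(size)]
    for pattern in patterns:
        for i in range(size):
            for j in range(size):
                if i != j:  # no self-connections
                    weights[i][j] += pattern[i] * pattern[j] / size
    return weights


def recall(weights, state, max_steps=10):
    """Synchronously update all neurons until the state stops changing."""
    state = list(state)
    for _ in range(max_steps):
        updated = [
            1 if sum(w * s for w, s in zip(row, state)) >= 0 else -1
            for row in weights
        ]
        if updated == state:
            break
        state = updated
    return state


# Hypothetical flattened 5x5 patterns (25 values each), not the paper's data.
sitting = [1 if i < 13 else -1 for i in range(25)]
standing = [1 if i % 2 == 0 else -1 for i in range(25)]
weights = train_hebbian([sitting, standing])

# Corrupt three entries of the sitting pattern and let the network restore it.
noisy = list(sitting)
for i in (0, 7, 20):
    noisy[i] = -noisy[i]
restored = recall(weights, noisy)
```

Each pattern is presented to the Hebbian rule once, which is what makes the training one-shot; the 25×25 `weights` array plays the role of the matrix shown in Fig. 6(b).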
All device signals go through feature extraction to output different feature patterns. The input data are transferred to the neurons, which act like a filter that passes only the data matching the two trained activity feature patterns in the activation state. The output of the Hopfield neural network is then passed to the Euclidean distance algorithm, which estimates the HAR recognition result from the similarity between the network output and the trained feature patterns. The Euclidean distance used to calculate the similarity is the distance between two points, and it is always a nonnegative number [35]. Thus, the similarity value range is between [−1, 1], and its reciprocal will control the result between [0, 1]. At this point, the distance is negatively correlated with similarity. The two trained activities produce a high-probability similarity output, while other data signals produce a low-probability similarity because the neurons are not activated. Finally, the classifier for neuromorphic computing is completed to realize effective HAR.
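The distance-based comparison can be sketched as below. The mapping 1 / (1 + distance) is an assumption chosen because it keeps the score in (0, 1] and decreases as the distance grows; the source states only that a reciprocal bounds the result. The template patterns and labels are hypothetical.

```python
import math


def euclidean_distance(first, second):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(first, second)))


def similarity(first, second):
    """Map a nonnegative distance to a score in (0, 1]; identical vectors score 1."""
    return 1.0 / (1.0 + euclidean_distance(first, second))


def classify(network_output, templates):
    """Return the best-matching label and the similarity to every template."""
    scores = {label: similarity(network_output, pattern) for label, pattern in templates.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical trained templates and a network output equal to the sitting pattern.
templates = {
    "sitting": [1 if i < 13 else -1 for i in range(25)],
    "standing": [1 if i % 2 == 0 else -1 for i in range(25)],
}
label, scores = classify(list(templates["sitting"]), templates)
```

A score near 1 indicates that the network settled on a stored pattern, while an output far from both templates yields low scores for both labels.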

C. Proposed Algorithm Implementation Scheme
Algorithm 1 verifies the feasibility of the whole framework theoretically and shows the specific calculation process of each step in the workflow. To avoid interference between the different types of hardware signals in the calculation, feature extraction is performed separately for each device first, followed by feature-level fusion. This processing helps the different types of signals keep their original information. With the attention mechanism [32] of TopK computing, the most important subfeatures can be extracted from the fused feature set. To help the Hopfield neural network obtain better processing results, the activity feature matrix is converted into a binarized feature pattern by calculating threshold values. Finally, from the similarity between the Hopfield neural network's output and the feature pattern, the confidence of the activity classification is obtained to complete the HAR process.
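The "extract separately, then fuse at the feature level" ordering can be sketched as follows. The four summary statistics are hypothetical placeholders for the paper's features, and the signals are made up; only the ordering of the steps is taken from the text.

```python
import statistics


def extract_features(signal):
    """Hypothetical per-device features: mean, population std, min, and max."""
    return {
        "mean": statistics.mean(signal),
        "std": statistics.pstdev(signal),
        "min": min(signal),
        "max": max(signal),
    }


def fuse_features(device_signals):
    """Extract features per device, then concatenate them into one labeled vector."""
    names, values = [], []
    for device, signal in device_signals.items():
        for feature_name, value in extract_features(signal).items():
            names.append(f"{device}_{feature_name}")
            values.append(value)
    return names, values


# Made-up signals standing in for one recording from each device.
signals = {
    "imu": [0.0, 1.0, 2.0],
    "radar": [5.0, 5.0, 5.0],
    "usrp": [-1.0, 0.0, 1.0],
}
feature_names, fused_vector = fuse_features(signals)
```

Keeping the device name in each feature label makes it possible to trace which hardware contributed a feature after selection.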

IV. EXPERIMENTAL EVALUATION AND DISCUSSION
Compared with the classification performance of data collected using a single hardware platform, the data fusion methodology adopted in this article increases the activity classification accuracy through the feature-level fusion of the IMU sensor, radar, and USRP signals, recording an accuracy of about 98.98% (see the multiclass confusion matrix in Fig. 7). Fig. 7 is a confusion matrix that evaluates the performance of the algorithm by visualizing the correct and incorrect inferences of the classification model. The confusion matrix is a square matrix over the classes: each row represents instances of the true class, and each column represents instances of the predicted class. It therefore shows directly whether the algorithm confuses the two classes. The results show that the Sit-down activity was classified correctly in 100% of cases and the Stand-up activity in 98% of cases, with only 2% of Stand-up instances misclassified as Sit-down. This is further shown in Fig. 8, where a box-and-whisker plot compares the inference probability when using single devices as well as the fusion of two and three of the devices. The inference probability is similar to a confidence coefficient for algorithm performance, and a high and stable inference probability output verifies the stability of the algorithm. The box-and-whisker plot shows statistics of the inference probability results, namely the maximum, upper quartile, median, lower quartile, and minimum, and it reveals the variation and outliers in a set of data. As can be seen, applying neuromorphic computing to fused HAR data from the three hardware devices yields the smallest variation and the highest lower quartile output, 95.34%, compared with machine learning results from a single device and from the data fusion of two hardware devices. This means that the result is the most stable in terms of classification inference performance.
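The two evaluation views described above, the confusion matrix and the five-number summary behind a box-and-whisker plot, can be computed as in this sketch. The labels and probabilities are made-up numbers for illustration and are not the paper's results; the inclusive quartile method is an assumption.

```python
import statistics


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {name: position for position, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, prediction in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of instances on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, and maximum."""
    lower, median, upper = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), lower, median, upper, max(values)


# Made-up predictions: one stand-up instance is confused with sit-down.
truth = ["sit"] * 4 + ["stand"] * 4
predicted = ["sit"] * 4 + ["stand"] * 3 + ["sit"]
matrix = confusion_matrix(truth, predicted, ["sit", "stand"])
overall = accuracy(matrix)

# Made-up inference probabilities for one device configuration.
summary = five_number_summary([0.90, 0.92, 0.94, 0.96, 0.98])
```

Comparing the lower quartile across device configurations, as Fig. 8 does, shows which configuration keeps most of its inference probabilities high.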
Therefore, this evaluation shows that constructing the feature matrix supports data fusion between different hardware, and that the fused data achieve higher accuracy with the neuromorphic computing algorithm. Table I compares the accuracy against traditional machine learning algorithms and shows that better results are achieved through the proposed data fusion method. For instance, Bangaru et al. worked with EMG and IMU sensors and used an ANN to classify human activities. Chung et al. adapted the data fusion method to a 9-axis IMU sensor (magnetometer, accelerometer, and gyroscope) and obtained results from an LSTM network. Based on frequency-modulated continuous-wave (FMCW) radar, Cao et al. implemented a convolutional neural network classifier that processes fused data to recognize human activity signals, and William et al. designed a framework that combines KNN, neural network, and ensemble classifier models to process USRP human activity data. By comparison, our implementation is more accurate than these classifiers. We believe that the recognition findings are favorable, demonstrating that the Hopfield neural network of neuromorphic computing, applied to fused multihardware signal features, effectively recognizes human behavior. Furthermore, our proposed workflow offers greater robustness and accuracy.

V. CONCLUSION
This study proposed a multisensing data fusion architecture for a HAR system that uses neuromorphic computing to integrate different hardware signal data for sensing and classifying human behaviors swiftly and efficiently. It relies on an attention mechanism for feature selection to obtain fused feature maps of the multisensing device signals. One benefit is that the associative memory function of the Hopfield neural network was applied for one-shot learning of human activities: only one training sample is required, which is friendly to limited data sets. Another benefit is that, unlike traditional handcrafted features, the TopK calculation used as an attention mechanism for feature selection produces good feature maps that represent the corresponding human activities. This approach not only addresses the large training sample requirements of traditional machine learning but also allows greater flexibility in fitting multisensing hardware signals. The suggested technique has great potential to help different types of measurement devices achieve system-level data fusion without affecting the accuracy of classification and recognition. Furthermore, validation methods were employed throughout to demonstrate that the method yields a significant improvement in accuracy when IMU sensor, USRP, and radar data are fused. The proposed approach has shown a classification accuracy of approximately 98.98% and has demonstrated the strong potential of neuromorphic computing on multisensing data in HAR.