Article

Classifying Participant Standing and Sitting Postures Using Channel State Information

Department of Computer Science, University of Huddersfield, Huddersfield HD1 3DH, UK
*
Author to whom correspondence should be addressed.
Electronics 2023, 12(21), 4500; https://doi.org/10.3390/electronics12214500
Submission received: 1 October 2023 / Revised: 28 October 2023 / Accepted: 29 October 2023 / Published: 1 November 2023
(This article belongs to the Section Artificial Intelligence)

Abstract

Recently, channel state information (CSI) has been identified as beneficial in a wide range of applications, from human activity recognition (HAR) to patient monitoring. However, these focused studies have resulted in data that are limited in scope. In this paper, we investigate the use of CSI data obtained from an ESP32 microcontroller to identify participants from sitting and standing postures in a many-to-one classification. The test is carried out in a controlled, isolated environment to establish whether a pre-trained model can distinguish between participants. A total of 15 participants were recruited and asked to sit and stand between the transmitter (Tx) and the receiver (Rx) while their CSI data were recorded. Various pre-processing algorithms and techniques were incorporated and tested on different classification algorithms, each of which went through parameter selection to enable a consistent testing template. Performance metrics such as the confusion matrix, accuracy, and elapsed time were captured. After extensive evaluation and testing of different classification models, we established that the hybrid LSTM-1DCNN model has an average accuracy of 84.29% and 74.13% for sitting and standing postures, respectively, on our dataset. The models were also compared on the BedroomPi dataset, where LSTM-1DCNN again performed best. It was also the most efficient model with respect to elapsed training time for both the sitting and standing datasets.

1. Introduction

In recent years, non-intrusive behavioural biometrics have been researched due to their benefits over physiological biometrics such as fingerprint, iris, and facial recognition. They are non-intrusive, meaning that the person does not require physical contact, which makes them convenient for the user [1]. Non-intrusive biometrics are also very hygienic, something that was an issue with hand geometry biometrics during the COVID pandemic, as well as in the healthcare domain [2]. In particular, Wi-Fi signals, a non-intrusive behavioural biometric, have recently been researched extensively for their successful application to human activity recognition (HAR) in fields such as healthcare and smart homes. Wi-Fi sensing is cost-effective because it can be deployed on existing Wi-Fi infrastructure. According to Statista [3], an estimated 93.18% of UK households had Internet access in 2023. This serves as a great incentive to research the use of Wi-Fi signals for HAR. It is for this reason that this paper investigates sitting and standing postures by exploring the potential of channel state information (CSI) extracted from the widely available ESP32 microcontroller.
Wireless sensing refers to the use of wireless signals to detect and track various environmental factors. Wi-Fi signals can be reflected, refracted, and even absorbed by static objects such as walls, or by moving objects, which in the context of this research are humans [4]. These changes in the signal can be analysed to infer the presence and movement of humans. Channel state information (CSI) is the metric used to reveal the conditions of the environment. CSI provides information about the characteristics of the received signal, such as amplitude, phase, and frequency. In this paper, two static actions, sitting and standing, are investigated. As shown in Figure 1, participants were required to sit and stand between the transmitter (Tx) and receiver (Rx) while their CSI data were recorded. Participants were selected to cover a range of physical features, such as height and body weight, to help the various models distinguish between these differences. For example, a person taller than 6 ft will block more of the signal than a shorter person. In addition, body composition causes variations in CSI, since a person with more body mass will absorb more of the signal than a person with less body mass [5].
CSI data-based Wi-Fi sensing has several advantages. First, Wi-Fi sensing is non-intrusive and does not require any additional or wearable sensors, because the process can be performed using the existing Wi-Fi infrastructure. From a user convenience perspective, this is beneficial, as it allows increased usability in the authentication process [6]. Second, it is relatively low-cost compared to other sensing systems, such as device-based wearables. Current CSI toolkits include the Linux 802.11n CSI Tool [7,8] and the Atheros CSI Tool [9,10,11]. This previous research investigated different network interface cards (NICs) for evaluation; such setups are quite expensive, as they often require a router and a laptop or PC, whereas in this paper we utilise the ESP32 microcontroller, a low-cost device capable of capturing rich CSI data. Finally, the CSI approach can provide greater coverage than camera-based systems, which can often be privacy-invasive. Wi-Fi signals can travel through walls and obstacles, allowing for the monitoring and analysis of activities and movements even when the participant is not in the direct line of sight (LoS) [12,13]. However, Wi-Fi sensing is still a developing area of research and there are many challenges to overcome. The two main identified problems are related to performance [8,14,15] and user convenience [16,17,18,19]. Reported accuracy ranges from 80% to 90%, often decreasing as the number of participants increases; 90% was achieved only for a small number of participants, around 11 people. Moreover, a high false acceptance rate (FAR) of 9.43% is considered a security issue [13]. Poor accuracy in the aforementioned papers stems from model complexity: WiPass [13] uses a 256-dimensional deep convolutional neural network, and NeuralWave [15] uses a 23-layer deep CNN. On the other hand, many papers collect large datasets for training by asking participants to perform various gestures; the result is data collection that is unrealistic and limits the scope of deployment. HumanFi [16] asked each user to provide 40 samples of walking back and forth in a straight line; Lin et al. [18] asked for 140 h of data collection covering 20 different types of human activity. Wang et al. [17] explored the combination of gait and breathing; however, they required 18 samples from each participant standing for 15 s per sample. In the studies conducted so far, background interference was not considered, and it is currently not known how accuracy might improve in a controlled environment.
To address the above limitations and further knowledge in this area, we collect data from 15 participants, with five samples of sitting and five of standing per participant, inside an isolation chamber designed to withstand electromagnetic fields (a Faraday cage). The segments are 5 s long, resulting in a total data collection time of 50 s per participant, which is considerably less than in related studies where large data samples are collected to investigate matching capabilities. In this work, we are interested in small data segments that more closely match how such a system would work in the real world, where participants are unlikely to stay stationary for long periods of time. The system is tested on a wide variety of preprocessing techniques, whose outputs are then fed into a variety of classifier models, and performance metrics are recorded. In addition, the datasets are split into sitting and standing to determine whether the models perform better when a different activity is performed by the user.
The contributions presented in this paper are summarised below.
  • The proposed solution collects a much shorter data sample, around 50 s per participant, for training, using existing data collection tools. The method requires each user to be tested only once, performing sitting and standing postures. To the best of our knowledge, no other paper performs classification for these postures on such short data segments.
  • The data are tested back-to-front: features are extracted first to identify the true quality of the raw data before any data preparation techniques are applied.
  • The experiments are performed in an isolation chamber to minimise the external environmental impact. This is reflected in the high-accuracy results from classification. ABLSTM achieved 97.63% and 98.90% for standing and sitting, respectively, without any preprocessing beforehand.
  • Several optimisation techniques have been tested during the data preparation phase to achieve the best accuracy. Performance metrics are also recorded for comparison purposes.
The rest of this article is organised as follows: Section 2 reviews the related work linking CSI and Wi-Fi sensing, as well as how this can be used for HAR. Following on, an introduction to CSI and the advantages of certain processing techniques are discussed in Section 3. Section 4 presents how CSI will be collected, including the setup of the equipment and the environment. Section 5 evaluates the test results and compares the sitting and standing datasets. In Section 6, the limitations of the approach are discussed. Finally, the paper is concluded in Section 7.

2. Related Work

Research on biometrics stemmed from the well-known vulnerabilities of knowledge-based authentication (KBA) systems and token-based authentication (TBA) systems. In two recent works, the authors identified the challenges around knowledge-based systems [20,21]; usability and security were identified as the main challenges for KBA systems. TBA systems are more common than many expect, with radio frequency identification (RFID) technology increasingly being used in domains such as healthcare [22,23] and smart cities [24]. Fatima et al. [25] demonstrate vulnerabilities such as smart card theft, microchip theft, and impersonation.
Biometrics has become an intensively researched topic in order to overcome the challenges and limitations imposed by previous authentication techniques. Biometrics can mainly be classified as physiological and, more recently, behavioural. Palma et al. [26] provide a detailed overview of biometric-based human recognition systems. The physiological biometrics mentioned include fingerprint, face, hand, iris, ear acoustics, vascular patterns, electrocardiogram (ECG), and deoxyribonucleic acid (DNA). However, Joshi et al. [27] performed a security analysis on the fingerprint system alone and identified 16 possible attack points. Devices such as smartphones use biometric authentication systems, such as facial authentication (FA) techniques. Like all other forms of biometrics, FA is vulnerable to presentation and spoofing attacks, as demonstrated by Zheng et al. [28]. Hand geometry is also susceptible to spoofing attacks: Bhilare et al. [29] demonstrated a system with a spoofing acceptance rate of 84.56%, which from a security perspective is not sufficiently secure to be used. All other physiological techniques also suffer from issues such as human characteristics that deteriorate with age, sensing challenges under different illumination, and substantial changes to the biometric feature, such as a cut to the skin.
To overcome these challenges, research has shifted towards behavioural biometrics due to the numerous benefits they have over physiological biometrics from a security and usability perspective. Behavioural biometrics are all non-intrusive [1,30]; moreover, they are difficult to replicate, which immediately mitigates the presentation attacks discussed earlier, and they are generally more secure in the sense that humans do not have to remember passwords or carry token cards. Various behavioural biometrics have been researched, including keystroke dynamics [31,32,33,34], mouse dynamics [35,36], gait [37], and many more.
Recent research regarding behavioural biometrics has investigated ways of utilising existing infrastructure in people’s daily lives, with the main motivation of keeping down expenses as well as providing scope for the research to develop. Furthermore, by leveraging existing Wi-Fi devices in buildings, no wearable sensor or camera-based system that can be privacy-invasive needs to be used; hence, the sensing is conducted in a device-free manner, which is positive in terms of user convenience. Current research has shifted to Wi-Fi sensing and the use of radio frequency signals, which use radio frequency (RF) signals that propagate in the surrounding environment [38]. CSI refers to information about a wireless communication channel between a transmitter and a receiver. The CSI recorded can be affected by various factors such as obstacles that can be static or moving, i.e., furniture, walls, or humans, as well as distance, among others.

2.1. Human Activity Recognition (HAR) Based on Sensors

Recently, HAR from various sensors has received great attention due to its beneficial use in applications in the fields of surveillance systems, health care systems, rehabilitation, and smart homes. Currently, HAR is recorded by internal or external sensing. Internal sensing usually refers to a device attached to the human body; this can range from a smartwatch to smart clothing, whose various sensors contain essential data that can be leveraged for many applications. These wearable devices integrate many sensors, including accelerometers, gyroscopes, and orientation and magnitude sensors, each with different functionality and capturing different sets of data. Wearable devices have come under the spotlight in recent years because the captured data can be used to classify a person's activity in real time and provide assistance and guidance, which is what human activity recognition (HAR) refers to [39]. HAR plays a key role in ambient assisted living (AAL), medical diagnosis, and especially healthcare, among many other areas.
Muaaz and Mayrhofer investigated the security strength of a smartphone-based gait recognition system using zero-effort and live minimal-effort impersonation attacks based on the accelerometer sensor within the smartphone [40], showing that the system had no false positives and that expert impersonators found it difficult to mimic the gait due to the regularity between their steps. However, this system matched impersonators with physical characteristics similar to their targets, and it would find it difficult to distinguish attackers outside of the experiment. Sun et al. also investigated the accelerometer sensor in wearable Internet of Things (WIoT) devices [41], again for gait recognition. The authors proposed a speed-adaptive gait cycle segmentation method and a matching threshold generation method to mitigate the problem of varying walking speeds. [Gait-based identification for elderly users in wearable healthcare systems] also looked at gait recognition for elderly people by alleviating the problem of intra-subject gait fluctuation. The authors proposed a gait template synthesis model and an arbitration-based score-level fusion method to improve the overall accuracy, and the system achieved an average recognition rate of 96.7%. However, both systems would deteriorate because the studies focused on only one age group, and younger generations have intra-subject gait fluctuations that lead to greater variability in the results due to their unstable walking [41,42]. Furthermore, both systems require high computation and memory costs, an ongoing challenge in WIoT devices due to their low-memory nature, which means that the methods have to be very efficient and are often not robust enough to enable high accuracy. Recent advances in wearables have led to necklaces and knee bandages. Chen et al. explored the use of a neck-mounted wearable with embedded infrared (IR) sensors to track facial expressions [43]. Although this seems practical, the camera may be blocked by hair or a beard and is dependent on walking parameters such as speed. Very recently, Lie et al. investigated various wearable sensors in a medical knee bandage, aiming to provide postoperative rehabilitation and protection [44]. They incorporated five sensors, including electromyography sensors (EMG), accelerometers (ACC), electrogoniometers, gyroscopes (GYRO), and microphones (MIC).
Although wearable devices show promise for a range of applications, they come with many limitations and challenges. Some limitations have been noted for each paper listed above; however, the problems below apply to all of them. Firstly, WIoT devices must be worn at all times to function, which is inconvenient and cumbersome, and they may cause skin irritation and other medical problems for some users [45,46]. Although some articles consider the security implications of WIoT, there remains the problem that illegal intruders who are not registered in the system or do not wear these devices cannot be recognised by wearable-device-based systems [47]. One of the biggest challenges in wearable research is that devices with small size, low power consumption, and a small form factor, such as those dedicated to wearable platforms, have strict computational, memory, and energy constraints [48]. A recent study by Tran et al. investigated chronic patients' perceptions of the use of WIoT in healthcare [49]. Although 55% believed that the devices could improve their follow-up and the reactivity of their care, most responses highlighted fears that the devices could replace human intelligence, pose serious risks of hacking, or lead to misuse of private patient data by caregivers. Thus, 22% of the study's users would refuse to use WIoT devices for these reasons.
External sensing refers to devices such as cameras in fixed locations. Vision-based HAR research can be divided by data type into RGB data and RGB-D data. Zerrouki et al. recently investigated HAR based on variations in body shape; the body was segmented into five partitions, and in each frame, area ratios were calculated and fed into the proposed adaptive boosting algorithm [50]. However, two factors severely impact the performance of this system: in dark or dusky conditions, the camera cannot detect the human body due to low illumination levels, and automatic changes in the background make human action recognition challenging and can generate errors and false classifications. Oyedotun et al. applied deep learning for hand gesture recognition on Thomas Moeslund's gesture recognition database [51]. The authors demonstrated that DNNs and stacked denoising autoencoders (SDAEs) are capable of learning the complex hand gesture classification task with lower error rates, achieving recognition rates of 91.33% and 92.83% for CNN and SDAE, respectively. Abraham et al. noted that RGB and depth camera videos are affected by background clutter and illumination changes and are applicable to a limited field of view only, an issue affecting the two papers above [52]. The authors overcame this by presenting a multimodal feature-level fusion approach that combines an RGB camera, a depth sensor, and a wearable, achieving an accuracy of 97.6% on the publicly available UTD-MHAD dataset. However, the system does not incorporate multi-view HAR, and the person whose action is being recognised must be oriented within sight of the operating camera.
The ubiquity of Wi-Fi technology has made it indispensable in our daily lives. Researchers are increasingly exploring its applications in wireless sensing due to its widespread availability and cost-effectiveness. By harnessing commercial Wi-Fi devices, researchers can create innovative sensing solutions without the need for expensive and complex wearable devices or cameras. Although certain wearable devices may be affordable, they often come equipped with low-cost sensors, resulting in subpar performance. The use of Wi-Fi infrastructure presents a viable and economical alternative for advanced sensing technologies. In contrast to vision-based techniques, identification systems rely on Wi-Fi signals capable of penetrating obstacles, making them particularly effective in complex and cluttered environments. Diverging from methods employing inertial measurement units (IMUs) or wearable sensors, Wi-Fi sensing operates in a non-intrusive and device-free manner. Furthermore, in the context of the expanding Internet of Things (IoT) landscape, the widespread presence of Wi-Fi devices facilitates the establishment of a ubiquitous and imperceptible security system through CSI-based biometric sensing. Wi-Fi sensing offers enhanced privacy compared to wearable-based systems or camera-based methods due to its nonintrusive nature and reduced risk of capturing sensitive visual information. Unlike cameras, which can potentially record detailed images or videos of individuals, Wi-Fi sensing operates on the basis of radio waves, which do not capture visual data. Additionally, wearable devices often require physical contact with the body, raising concerns about personal space and consent. Wi-Fi sensing, being device-free, eliminates the need for individuals to wear or carry any tracking equipment, preserving their privacy and reducing the risk of unauthorised data collection. In addition, Wi-Fi signals can be effectively anonymised, ensuring that the identification and tracking of specific individuals remains challenging, further safeguarding user privacy. These factors collectively contribute to Wi-Fi sensing being a more privacy-conscious choice in comparison to wearable or camera-based systems.

2.2. Human Activity Recognition (HAR) Based on CSI

CSI-based HAR has received a lot of attention in recent years because of its advantages over sensor-based HAR. The main benefits include the fact that it is non-intrusive and does not require users to wear any sensors on their bodies. It is insensitive to illumination, making it effective at all times of the day. It also offers greater privacy protection, as no camera operates in the room. Finally, CSI-based HAR is cost-effective, as it is often implemented on existing Wi-Fi infrastructure and does not require any additional hardware. Wi-Fi signals can be described in two different ways: received signal strength (RSS) and channel state information (CSI). RSS is often used in indoor positioning and provides an estimate of the power of the received signals. However, RSS is not stable and cannot capture dynamic changes in the signal while an activity is being performed [53]. In contrast to RSS, which provides a more general indication of signal strength, CSI offers a more detailed and dynamic representation of the signal, making it a better choice for HAR.
The two persistent problems with CSI-based HAR are dynamic environments and recognising new activities with new users, which ultimately require new sampling in different environments, a time-consuming task that is inconvenient for users. Wang et al. [6] propose a multimodal CSI-based HAR system (MCBAR) that aims to address the issue of rarely seen activities in unseen environments, accomplishing this with generative adversarial network (GAN) and semi-supervised learning techniques. This method enables a more robust system and requires only a small amount of data from the participants. In another paper, the authors aim to improve recognition performance by implementing a framework for augmented few-shot learning-based human activity recognition (AFSL-HAR). The framework achieves high recognition rates by including a feature Wasserstein generative adversarial network (FWGAN) module, which can synthesise diverse samples to help the recognition model learn sharper classification boundaries [54].
Schäfer et al. investigated models such as LSTM networks and SVMs to classify eight common activities: EMPTY, LYING, SIT, SIT-DOWN, STAND, STAND-UP, WALK, and FALL [55]. Three experiments were conducted on different platforms, and all achieved high accuracy. Another related paper investigated attention-based bidirectional long short-term memory (ABLSTM) for passive HAR. BLSTM enables the model to learn features in two directions from raw sequential CSI data [56]. LSTM only processes in one direction; however, Chen et al. [56] argued that future CSI is also of great importance for HAR. The attention layer can assign greater weights to the more important features and time steps, leading to greater performance. Shalaby et al. [57] looked at four different deep learning methods, specifically: a convolutional neural network (CNN) with a gated recurrent unit (GRU); a CNN with a GRU and attention; a CNN with a GRU and a second CNN; and a CNN with long short-term memory (LSTM) and a second CNN. They report that dividing the model into two main steps, feature extraction and classification, enabled high performance. Their models achieved good accuracy levels of 99.31%, 99.16%, 98.88%, and 98.95%, compared to the 75% and 95% achieved by the LSTM and ABLSTM models, respectively.
Other papers have investigated transforming CSI data into images and feeding these into CNN networks, whilst others have investigated deep neural networks (DNNs). One work collected CSI for seven different activities from a Raspberry Pi 4 device and converted the data into RGB images to be fed into a 2D CNN classifier [53]. A CNN can analyse the data in parallel rather than sequentially, unlike models such as LSTM; training time is therefore shorter than for other neural networks, and the lower computational complexity leads to greater model efficiency. The model was successful, with an accuracy of around 95%. In other work, the authors used DNNs and four techniques for HAR [58]. The proposed model (HARNN) implements a two-level decision tree, a linear regression method, a noise removal mechanism, and an RNN to recognise human activity. Finally, work has investigated how vulnerable a DNN is to adversarial attacks that add small perturbations to the CSI [56]. When adversarial attacks were implemented, the accuracy was reduced to around 0.3, which presents a great security risk.
Overall, HAR using Wi-Fi CSI is a complicated task affected by numerous surrounding parameters, such as multipath reflections of Wi-Fi signals in the environment where the activities are performed, and the temperature and humidity of the air, which influence the amplitude and phase shift of the received signal. A significant amount of research centres on deep learning techniques. Overfitting is a big problem with deep learning, as a model can achieve very high accuracy during training without learning any complex pattern, so that when exposed to unseen data, the classification is often incorrect. However, the papers mentioned above often fail to compare their techniques to others, and data collection is often the issue. Problems persist with the number of users involved: recent publications involve 3 to 10 users, which does not allow robust findings [6,54,55,56,57,58,59,60]. Papers also cite the issue of dynamic environments, yet typically test only two or three different environments, which does not inspire enough trust in the model being tested.

3. Preliminaries

In Figure 2, we illustrate the testing architecture and the various methods and techniques that have been identified through the literature based on published results.

3.1. CSI Amplitude and Phase

In wireless communication systems, channel state information (CSI) provides information on the characteristics of the wireless channel between the transmitter and the receiver. In this scenario, when the Tx ESP32 transmits a signal x, it is received by the Rx in the form y = Hx + η, where x is the transmitted signal, y is the received signal, η is the noise vector, and H is the channel matrix. The channel matrix H represents the characteristics of the channel, such as path loss, fading, and interference, and can be estimated from the received and transmitted signals. The complex CSI vector contains 64 subcarriers, of which 52 carry data and 12 are null. Among the 52 subcarriers, 48 are usable, as 4 are used as pilot subcarriers. Each element in the matrix is composed of both real and imaginary components, which can then be used to calculate the amplitude and phase for each subcarrier. The amplitude is calculated as follows:
A(i) = √( H_im(i)² + H_re(i)² )
where A(i) is the amplitude of the i-th subcarrier, and H_im(i) and H_re(i) are the imaginary and real components of the channel matrix H for subcarrier i. The same components are used to extract the phase ϕ(i) of subcarrier i from the channel matrix H:
ϕ(i) = atan2( H_im(i), H_re(i) )
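To make the mapping concrete, the snippet below sketches how the amplitude and phase could be computed with NumPy from the raw CSI array reported by the ESP32. The interleaved [imaginary, real] byte ordering is an assumption about the toolkit output and may need adjusting for a particular firmware version.

```python
import numpy as np

# Minimal sketch: the ESP32-CSI toolkit reports each subcarrier as a pair of
# signed bytes, assumed here to be interleaved as [imaginary, real].
def csi_amplitude_phase(raw_csi):
    raw = np.asarray(raw_csi, dtype=np.float64)
    h_im = raw[0::2]                          # imaginary components H_im(i)
    h_re = raw[1::2]                          # real components H_re(i)
    amplitude = np.sqrt(h_im**2 + h_re**2)    # A(i) = sqrt(H_im(i)^2 + H_re(i)^2)
    phase = np.arctan2(h_im, h_re)            # phi(i) = atan2(H_im(i), H_re(i))
    return amplitude, phase

# Example: 64 subcarriers -> 128 interleaved values for one received packet
amp, ph = csi_amplitude_phase(np.random.randint(-128, 128, size=128))
```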

3.2. Signal Processing

The first stage is to extract features from the raw CSI data collected from the ESP32 Rx microcontroller. The four techniques that have received the greatest attention in the literature are amplitude, power spectral density, statistical hand-crafted features, and Doppler shift. Amplitude has received the most attention because of its beneficial uses in various applications: it can detect changes in the environment ranging from large-scale movements to small micro-movements. Power spectral density is a widely used feature that indicates the strength of the signal, from which noise and motion can be easily detected. Statistical features have received both praise and criticism based on their performance in different experiments; however, they have been found to improve the accuracy of classification algorithms [61]. Finally, the Doppler shift will be tested, as it can give extra detail, such as the direction and speed of motion.
Denoising filters will also be applied to the best-performing feature extraction method, which will then be used as the default for all other remaining techniques. The Savitzky–Golay filter is effective for noise removal and can be adapted to the activity by changing the degree of the polynomial and the size of the moving window; in this case, it will be used to filter out high-frequency noise in the original signal. A Butterworth filter will also be tested to remove high-frequency noise and outliers.
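As an illustration, both filters are available in SciPy; the sketch below shows how they might be applied along the time axis of a packets-by-subcarriers amplitude matrix. The window length, polynomial degree, cutoff frequency, and the 80 Hz sample rate used for normalisation are illustrative assumptions.

```python
import numpy as np
from scipy.signal import savgol_filter, butter, filtfilt

def denoise_savgol(amplitude, window=31, polyorder=3):
    # Savitzky-Golay smoothing along the time (packet) axis
    return savgol_filter(amplitude, window_length=window, polyorder=polyorder, axis=0)

def denoise_butterworth(amplitude, cutoff_hz=10.0, fs=80.0, order=4):
    # Low-pass Butterworth filter to suppress high-frequency noise
    b, a = butter(order, cutoff_hz / (fs / 2), btype="low")
    return filtfilt(b, a, amplitude, axis=0)  # zero-phase filtering
```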

3.3. Data Preparation

Once the data have passed through signal processing, application-specific techniques are applied to give the classifier models the best chance of learning patterns. Two methods that will be implemented are least-squares baseline removal and polynomial fitting. Least-squares baseline removal is often used with CSI, as it can effectively remove hardware noise from the ESP32. Polynomial fitting will also be tested, as it can provide a flexible fit depending on the baseline variations of the CSI data: the signal is divided into segments and a polynomial is chosen to best fit each segment, hence the flexibility.
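The two detrending methods can be expressed with a single least-squares routine, as in the hedged sketch below: least-squares baseline removal corresponds to fitting one low-degree polynomial over the whole signal, while segment-wise polynomial fitting uses a higher degree over several segments. The default degree and segment counts are illustrative assumptions.

```python
import numpy as np

def _fit_baseline(segment, degree):
    # Least-squares polynomial baseline for each subcarrier column
    t = np.arange(segment.shape[0])
    design = np.vander(t, degree + 1)
    coeffs, *_ = np.linalg.lstsq(design, segment, rcond=None)
    return design @ coeffs

def remove_baseline(signal, degree=1, n_segments=1):
    # degree=1, n_segments=1 -> least-squares baseline removal;
    # higher degree over several segments -> flexible polynomial fitting
    detrended = np.empty_like(signal, dtype=float)
    for idx in np.array_split(np.arange(signal.shape[0]), n_segments):
        detrended[idx] = signal[idx] - _fit_baseline(signal[idx], degree)
    return detrended
```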
Another common data preparation technique is feature scaling, which is important to avoid presenting biased data to the machine learning (ML) technique, as decisions can be heavily influenced by outliers or dominating features. The feature scaling methods tested are max–min normalisation, Z-score standardisation, and unit vector scaling. All methods preserve the original distribution of the data and allow for easy interpretation by ML models, although they differ in how sensitive the normalisation process is to outliers.
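The three scaling methods reduce to a few lines of NumPy, as sketched below; the small epsilon guarding against division by zero on flat subcarriers is an implementation assumption.

```python
import numpy as np

def max_min(x, eps=1e-12):
    # Rescale each subcarrier column into [0, 1]
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0) + eps)

def z_score(x, eps=1e-12):
    # Zero mean, unit variance per subcarrier column
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

def unit_vector(x, eps=1e-12):
    # Scale each subcarrier column to unit Euclidean norm
    return x / (np.linalg.norm(x, axis=0) + eps)
```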

3.4. Classification Models

Using the pre-processed data from the various techniques shown in Figure 2, we then train three classification models to see which performs best on the different datasets. A significant amount of research focuses on neural networks, specifically LSTM, CNN, and ABLSTM, partly due to their recognition rates on complex patterns. Hyperparameter tuning can be performed on the number of layers, the neurons per layer, activation functions, learning rates, regularisation, dropout, and batch size, each of which can be changed to improve performance. The three chosen models are ABLSTM, CNN, and a hybrid model that combines an LSTM with a 1D CNN, forming LSTM-1DCNN; the CNN acquires the features, and the LSTM learns the dependencies between the identified features. From these model architectures, we can obtain an approach with good robustness and high classification accuracy. Each model has been designed to avoid common deep learning issues, such as overfitting, by adding dropout and early stopping techniques. In addition, our models use SoftMax activation, which produces a probability distribution over the possible output classes.
The first classification model is ABLSTM, chosen for its empirically proven efficacy and its inherent interpretability advantages. ABLSTM integrates the capabilities of bidirectional long short-term memory (BLSTM) networks, which capture sequential dependencies from both past and future context, with attention mechanisms that highlight crucial segments within the input sequences. By incorporating attention mechanisms into LSTM networks, we not only improve the model's ability to capture intricate temporal patterns in our human activity recognition (HAR) data but also gain interpretability: the attention weights generated during training elucidate which parts of the input sequences are crucial to decision making, providing valuable information on the features that guide the model's predictions. This is supported by the authors in [Wi-Fi CSI-Based Passive Human Activity Recognition Using Attention-Based BLSTM], who achieved 96% accuracy and above in two different environments. Furthermore, the choice of ABLSTM aligns with recent advances in explainable artificial intelligence, ensuring that our research contributes not only to the field of HAR but also to the broader discourse on interpretable machine learning models. The configuration of ABLSTM is shown in Table 1. It has a single bidirectional LSTM layer, a dropout layer, and a dense layer. The input to the model is a 3D tensor consisting of the number of samples, the number of subcarriers, and 1, the number of features per subcarrier. The first layer is bidirectional, meaning it effectively has 128 units: 64 for the forward LSTM and 64 for the backward LSTM. The output of this layer concatenates the outputs of the forward and backward LSTMs. The next layer is dropout with a 0.2 dropout rate to prevent overfitting. The final layer is a dense layer with the SoftMax activation function, used to classify the target classes; its output is 2D, the number of samples versus the number of target classes, 15. An early stopping mechanism is applied to stop training if the validation loss does not improve for three consecutive epochs.
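A minimal Keras sketch of this configuration, following the layer sizes stated above, is shown below. The optimizer and loss are assumptions, and the attention mechanism is omitted for brevity, leaving the bidirectional LSTM, dropout, and dense layers summarised from Table 1.

```python
from tensorflow import keras
from tensorflow.keras import layers

N_SUBCARRIERS, N_CLASSES = 64, 15

model = keras.Sequential([
    keras.Input(shape=(N_SUBCARRIERS, 1)),          # (subcarriers, 1 feature)
    layers.Bidirectional(layers.LSTM(64)),          # 64 forward + 64 backward units
    layers.Dropout(0.2),                            # guards against overfitting
    layers.Dense(N_CLASSES, activation="softmax"),  # one probability per participant
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Stop training if validation loss fails to improve for three consecutive epochs
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)
```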
Convolutional neural networks (CNNs) have emerged as a cornerstone of computer vision and pattern recognition due to their ability to automatically learn intricate spatial hierarchies and discriminative features from raw data. In the context of human activity recognition (HAR), where data often involve complex spatial and temporal patterns, CNNs excel at capturing these hierarchies. The convolutional layers employ localised receptive fields to detect low-level features such as edges and corners, gradually combining them to recognise higher-order patterns and shapes. This hierarchical feature extraction aligns seamlessly with the nature of human movements, which often involve nuanced spatial configurations. Additionally, CNNs significantly reduce the dimensionality of the input data through convolutional and pooling layers, enabling the network to focus on the most salient features while discarding irrelevant information; this reduction not only enhances computational efficiency but also aids robust feature learning. Furthermore, CNNs are inherently translation invariant, meaning that they can recognise patterns regardless of their position in the input data. This property is particularly valuable in HAR, where the orientation and position of the subject may vary widely. Taking advantage of these characteristics, CNNs offer a scientifically validated framework to effectively process raw sensor data, automatically learn essential features, and discern complex human activities, making them a principled choice for this research. The CNN architecture can be observed in Table 2. Firstly, the CSI samples are converted into a 3D array of eight rows, eight columns, and three colour channels. Each image is then upscaled by a factor of four in the row and column directions, giving a final shape of 32 × 32 × 3, which is the input to the model. The first layer is a convolutional layer with 32 filters, each of size 3 × 3, with the rectified linear unit (ReLU) activation function. The next layer is a max-pooling layer that reduces the dimensions of the previous layer's output by a factor of two. Two more convolutional layers are applied, similar to the first two layers. A flatten layer is then used to convert the 3D tensor into a 1D vector. Finally, a dense layer with the ReLU activation function and a SoftMax layer are used to produce the class probabilities.
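The sketch below mirrors this description in Keras; the filter count of the later convolutional layers and the width of the dense layer are not stated in the summary above, so the values here are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

N_CLASSES = 15

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),                 # 8x8x3 CSI image upscaled by 4
    layers.Conv2D(32, (3, 3), activation="relu"),   # 32 filters of size 3x3
    layers.MaxPooling2D((2, 2)),                    # halves the spatial dimensions
    layers.Conv2D(32, (3, 3), activation="relu"),   # two further layers, as above
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                               # 3D tensor -> 1D vector
    layers.Dense(128, activation="relu"),           # assumed width
    layers.Dense(N_CLASSES, activation="softmax"),  # class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```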
LSTM networks are inherently designed to capture temporal dependencies and long-term patterns within sequential data, making them exceptionally well suited to modelling human activities, which often exhibit complex temporal structure. By integrating a 1DCNN into the architecture, we can exploit the spatial hierarchy extraction capabilities of CNNs: the 1DCNN layers adeptly identify localised features and patterns within the temporal sequences, augmenting the LSTM's ability to discern subtle nuances in human movements. This hybrid architecture capitalises on the complementary strengths of LSTM and 1DCNN. LSTM excels at capturing sequential dependencies, learning from the historical context of activities, while the 1DCNN efficiently extracts spatial features from the sequential data, allowing the model to focus on the specific segments crucial for classification. The integration of these components fosters a synergistic relationship: the LSTM understands the intricate temporal dynamics, while the 1DCNN captures localised spatial features. This parallel processing power enhances the model's understanding of complex activity sequences, enabling it to decipher nuanced variations in human actions that might be challenging for the individual models to grasp comprehensively. Moreover, the hybrid LSTM-1DCNN architecture helps guard against overfitting: by combining the strengths of LSTM memory cells, which retain long-term context, with the spatial feature extraction prowess of the 1DCNN, the model learns a robust and discriminative representation of activities. Additionally, the hierarchical nature of the hybrid model allows it to generalise effectively from training data to unseen activities, which makes it an extremely advantageous model to test. The architecture of the hybrid model is shown in Table 3. The model consists of nine layers and is used to extract complex patterns from the dataset. The first layer is a 1D convolutional layer containing 64 filters with a kernel size of three, used to extract features, with ReLU chosen for its ability to handle non-linearity; a second 1D convolutional layer is used as layer 3. A max-pooling layer is used to reduce the dimensionality of the output as layer 2, and again as layer 4. Layers 5 and 6 are LSTM layers, designed to handle sequential data. A dense layer with a ReLU activation function is then added, followed by a dropout layer with a rate of 0.5 to prevent overfitting. Finally, a dense layer with an output size equal to the number of classes in the target variable is used; its activation function is SoftMax, which produces probabilities that sum to 1 across the classes.
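A Keras sketch of the nine-layer hybrid follows; the filter count of the second convolutional layer, the LSTM and dense widths, and the training settings are assumptions where the summary above does not state them.

```python
from tensorflow import keras
from tensorflow.keras import layers

N_SUBCARRIERS, N_CLASSES = 64, 15

model = keras.Sequential([
    keras.Input(shape=(N_SUBCARRIERS, 1)),
    layers.Conv1D(64, kernel_size=3, activation="relu"),  # layer 1: feature extraction
    layers.MaxPooling1D(2),                               # layer 2: dimensionality reduction
    layers.Conv1D(64, kernel_size=3, activation="relu"),  # layer 3 (assumed filter count)
    layers.MaxPooling1D(2),                               # layer 4
    layers.LSTM(64, return_sequences=True),               # layer 5: sequential dependencies
    layers.LSTM(64),                                      # layer 6 (assumed width)
    layers.Dense(64, activation="relu"),                  # layer 7 (assumed width)
    layers.Dropout(0.5),                                  # layer 8: overfitting guard
    layers.Dense(N_CLASSES, activation="softmax"),        # layer 9: class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```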

3.5. Ethical Considerations

The ethical implications surrounding Wi-Fi sensing, especially in the context of human activity recognition, warrant careful consideration, particularly as these technologies find applications in sensitive domains such as patient monitoring and personal fitness.
Privacy considerations are paramount, as Wi-Fi sensing has the potential to capture highly personal and sensitive data. Safeguarding individuals’ privacy requires rigorous data anonymisation techniques and strict access controls. Ensuring that the data collected cannot be traced back to specific individuals is crucial to upholding their privacy rights. Furthermore, transparency in data collection practices and clear communication on how Wi-Fi sensing is used is fundamental to obtaining informed consent from individuals, respecting their autonomy.
Data security is another critical ethical concern. Secure transmission and storage of data collected via Wi-Fi sensing methods are essential to prevent unauthorised access and potential breaches. Implementing robust encryption protocols and adherence to data security standards are imperative to protect against data tampering and unauthorised use, thus upholding the integrity and confidentiality of collected information.
In conclusion, addressing the ethical implications of Wi-Fi sensing in human activity recognition requires a comprehensive approach that involves rigorous privacy protection measures, stringent data security protocols, informed consent practices, and clear regulations to prevent misuse. By upholding these ethical principles, the integration of Wi-Fi sensing technologies in sensitive areas can proceed responsibly, respecting individual privacy and autonomy.

4. System Model

4.1. CSI Data Collection

To experiment within the isolation chamber, we use the ESP32-CSI Toolkit (https://github.com/espressif/esp-csi/tree/0edad726c21577358583d9f36368e9f02fd4ea2a/examples/get-started, accessed on 31 October 2023) to collect data from the Wi-Fi-enabled ESP32 microcontroller. This provides a more usable and cheaper alternative to existing methods, which have limitations in large-scale systems due to high deployment costs. In addition, the ESP32-CSI Toolkit is a standalone solution, meaning it can be deployed anywhere and does not require an updated NIC, which is the focal point of many other CSI data collection toolkits. For a pair of transceivers, one antenna works as a transmitter and one antenna works as a receiver. There are therefore a total of 1 × 1 × 64 = 64 CSI values, each delivered as two signed bytes, across two channel frequency response fields, the legacy long training field (LLTF) and the high-throughput LTF (HT-LTF), with subcarrier indexes running from 0 to 63 and from −64 to −1, resulting in a total of 384 values for a 40 MHz bandwidth system. We transmit frames from one ESP32 transmitter microcontroller and collect the CSI from another ESP32 acting as the receiver at a sample rate of 80 Hz with a baud rate of 921,600 to capture all available information. We also show an example from our dataset in Table 4 and Figure 3. In the figure, each distinctively coloured line corresponds to a unique subcarrier from the pool of 64 subcarriers captured by the ESP32 microcontroller. The column headings correspond to the data features generated using the ESP32-CSI toolkit. The top part consists of the channel matrix H, which has been cropped because there are around 400 columns. The first 24 columns give details about the packet preamble, such as the MAC address, ID, RSSI, bandwidth, channel, and timestamps, among other information. From column 25 onwards is the array of CSI data, the part in which we are most interested. Each row of the matrix represents a packet received by the receiver. Below it is the extracted signal wave in real time, showing the full 64 subcarriers. The data displayed are from an anonymous user sitting.
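For illustration, the snippet below sketches how such a capture could be loaded and converted into amplitude and phase matrices. The layout follows the description above (24 metadata columns followed by the CSI array), while the exact column positions and the [imaginary, real] byte ordering are assumptions about the toolkit's CSV output.

```python
import numpy as np
import pandas as pd

def load_csi_csv(csv_path):
    df = pd.read_csv(csv_path)
    raw = df.iloc[:, 24:].to_numpy(dtype=float)  # CSI values start at column 25
    h = raw[:, 1::2] + 1j * raw[:, 0::2]         # channel matrix H, one packet per row
    return np.abs(h), np.angle(h)                # amplitude and phase per subcarrier

# Example: amplitude matrix of shape (packets, subcarriers)
# amp, phase = load_csi_csv("participant01_sitting_1.csv")
```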

4.2. Public Dataset

In addition to collecting our own dataset, it is important to see how the methods generalise to a public dataset in order to fully validate the robustness of each method. The dataset [53] we used was collected inside a large bedroom. The paper did not name the dataset, so we will refer to it as "BedroomPi" from now on. The authors used a Raspberry Pi for CSI data collection and asked 3 volunteers to perform 7 activities 20 times each, resulting in 420 samples. The activities were sitting down, standing up, lying down, running, walking, falling, and bending; for the purposes of this article, only sitting and standing are used and the others are disregarded. The authors employed the Nexmon tool [53] with a Raspberry Pi 4 and a TP-Link Archer C20 as the access point (AP), on channel 36 with a 20 MHz bandwidth. The AP and Pi were positioned 1 m above the ground, which the authors state ensured an unobstructed signal path, and were placed 3 m apart.

4.3. Experiment Setup

To demonstrate the effectiveness of testing various models, we keep the experiment simple. Figure 4 shows the experimental setup for the first five tests, in which the user sits on the red seat. Figure 4b shows the next five tests, in which the user simply stands between the transmitter and the receiver. The transmitter and receiver are placed 1.55 m apart in the isolation chamber, which measures 3 m × 1.6 m. The transmitter is located next to the laptop and the receiver is on the left wall. Before the experiment, users were asked to remove any devices from their pockets, as these can interfere with the signal. Although this was a controlled experiment, the tests performed were only initial, and no markers were placed on the floor to fix the exact location where a user stood. Each test lasts 5 s, resulting in 25 s of sitting data and 25 s of standing data; each user is therefore tested for a total duration of 50 s.

There is no specific rule for the duration of data collection, but it is an important factor in achieving good accuracy. In the HAR research field specifically, there is no set pattern, and durations range from one end of the spectrum to the other. Ref. [62] recorded each activity over 20 s, but emphasised that it started and ended in a stationary state. Ref. [63] followed the same pattern of 20 s starting and ending in stationary positions. Ref. [64] had volunteers perform activities within 10–20 s, so no specific time was set and the experiment was somewhat more relaxed. Moshiri et al. [53] asked 3 volunteers to perform 7 different activities 20 times each, with no set duration. In [65], activities were recorded in 5 s, showing that high accuracy can be achieved with minimal activity recording. Although the papers above mainly test for 20 s, we believe that 5 s is sufficient to record the full activity. The reason is that the experiment needs to be as realistic as possible; previous research gathered data by asking users to repeat activities numerous times, which does not fit into daily life.

Across the two activities, sitting and standing, we record 10 samples per user: 5 sitting and 5 standing. Although this seems a very limited number of samples, we believe that the tested activities have minimal variability, allowing us to capture the inherent variability present in human behaviour. Activities may differ slightly due to factors such as body positioning, slight movements, or environmental conditions; by repeating each activity five times, we capture this variability. Also, in real-world scenarios, participants may become fatigued or bored if activities are repeated excessively, and beyond a certain point additional repetitions will not provide significantly different variations in the data, especially for routine activities. By keeping the repetitions at five, we balance the need for variability with the participants' comfort and engagement, ensuring a more realistic representation of daily activities. By testing various signal processing, data preparation, and classification models, it is possible to achieve good accuracy for HAR. Once the data are collected, four test files are used for training, and the fifth test file is used for testing.
The models were selected based on good performance in the literature [53,55,56,58]: ABLSTM, CNN, and a hybrid model that combines the two, LSTM-1DCNN. Each model was trained for only 10 epochs because the workstation laptop had a 1.4 GHz quad-core Intel Core i5 processor; 10 epochs was judged appropriate for this specification.
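The split described above, four sample files for training and the fifth held out for testing, can be sketched as follows. The file naming scheme and the .npy amplitude format are assumptions for illustration; load_amplitude stands in for whichever loader is used.

```python
import numpy as np

def load_amplitude(path):
    return np.load(path)                       # (packets, subcarriers) amplitude matrix

participants = [f"p{i:02d}" for i in range(1, 16)]       # 15 participants
X_train, y_train, X_test, y_test = [], [], [], []
for label, person in enumerate(participants):
    for sample in range(1, 6):                           # five sitting samples each
        amp = load_amplitude(f"{person}_sitting_{sample}.npy")
        bucket = (X_test, y_test) if sample == 5 else (X_train, y_train)
        bucket[0].append(amp)
        bucket[1].append(np.full(amp.shape[0], label))   # per-packet participant label

X_train = np.concatenate(X_train)[..., None]             # add the feature dimension
y_train = np.concatenate(y_train)
X_test = np.concatenate(X_test)[..., None]
y_test = np.concatenate(y_test)
# model.fit(X_train, y_train, validation_split=0.2, epochs=10, callbacks=[early_stop])
```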

5. Evaluation

5.1. Experiment Results on Our Dataset

5.1.1. Experiment Results for Sitting

Figure 5 shows the confusion matrix for polynomial fitting in the CNN network. The reason for analysing the confusion matrix for polynomial fitting with a CNN model is the increase in accuracy produced by the back-to-front method under test. The use of confusion matrices in this work was inspired by recent work published by Jannat et al. [66]. Instead of performing preprocessing techniques first and then extracting the features, this paper tested the process in reverse: we extracted the feature first, before applying denoising, detrending, and feature scaling methods. The reason for testing this back-to-front method is to demonstrate that good accuracy can still be produced; we discovered that it not only increases accuracy but also allows for minimal preprocessing and hence saves time. The diagonal, highlighted in dark blue, shows the number of correct classifications, whereas all other entries in the matrix are misclassifications. Visually, it is clear that most classifications are correct. However, there were 25 instances in which the model incorrectly classified the person, representing 11.1%. Notably, three of these instances were all linked to one participant, Peter. During testing, Peter experienced difficulty starting the test, which may have affected the results; hence, a few misclassifications were present. Another noticeable error was 91 misclassifications predicting Maomao when the true person was Jenna. This may have happened because of their similar height, as both were relatively short compared to the other participants. However, the numbers are still low; additional metrics such as precision, recall, and F1 score would become informative should the model repeatedly predict the same participants for a given true target.
The test applies the various signal processing and data preparation techniques to the sitting posture dataset, with results shown in Table 5. The results show that amplitude is the best-performing feature across all three models in comparison to handcrafted features, such as statistical features. This is because amplitude provides more discriminative information about the wireless channel: it contains richer detail, whereas handcrafted features reduce the signal to summary statistics, such as mean and kurtosis, which give a poorer indication of environmental effects. As amplitude was the best extracted feature, we used it for the different signal processing techniques and in the data preparation stages. The purpose of extracting the feature first, i.e., the amplitude, was to identify how much quality can be extracted from the raw data alone. It was interesting to see that the Savitzky–Golay filter, polynomial fitting, and max–min normalisation all increased accuracy under this back-to-front testing.
Figure 6 shows the extracted features in more detail for each model. It is immediately evident that amplitude performed best. Phase achieved low accuracy in all three models, especially the LSTM-1DCNN model; the reason is that phase alone is not sufficient to capture HAR, and it is highly sensitive to the position and movement of the ESP32 microcontroller. Performance increased above 94% for all the aforementioned techniques. In particular, the Savitzky–Golay filter improved the score by just under 2%, which is a significant improvement for security applications. The performance increased due to the removal of noise and the extraction of more informative features, which smoothed the wave, as can be seen in Figure 7. Although the user is sitting down, the CSI is still sensitive to micro-movements, hence the peaks in the wave, but the filter minimised these peaks and achieved 94.40% accuracy with LSTM-1DCNN.
In addition to investigating model accuracy, it is worth looking at the elapsed time for each classification model, as shown in Table 6. This is important because efficiency in HAR is pivotal from a user convenience perspective. ABLSTM is the model that takes the most time to learn the data. Its architecture is complex, with 64 units, followed by a dropout layer and a dense layer using SoftMax activation. This complex architecture yields good accuracy but requires an average processing time of 280.85 s for sitting, equivalent to around 4 min and 40 s, and will become too time-consuming as more data are supplied to the model. In comparison, LSTM-1DCNN was the most efficient model, as it combines a CNN and an RNN: the CNN learns the local patterns in the data, whilst the recurrent layers capture the temporal dependencies between those patterns. This model averaged 56.23 s, which is substantially faster than ABLSTM or CNN. The model invites further research to optimise its parameters in a way that improves accuracy whilst keeping the approach efficient.

5.1.2. Experiment Results for Standing

Figure 8 shows the confusion matrix for polynomial fitting in a CNN model for standing. As with sitting, polynomial fitting is used for analysis because of the increase in accuracy obtained with the back-to-front method; for the CNN model, the accuracy increases by 1.89% as a result. There are 36 misclassifications, noticeably more than in the sitting confusion matrix, accounting for 16% of predictions. Although this appears worse than sitting overall, the worst case for a single participant is less severe: the highest number of incorrect predictions for one person is 73, relatively low compared to the 125 incorrect predictions seen for sitting. A potential reason may be human error, as Peter did experience difficulty with the testing, as mentioned previously, and this may have affected the model's performance.
The standing experiment involved users standing between the transmitter and receiver, as shown in Figure 4b. The accuracy results for this dataset can be seen in Table 7. As with sitting, amplitude performed well across all three models, and again ABLSTM proved the best model at classifying users from the amplitude dataset, with an accuracy of 97.63%. However, the same model gave the lowest accuracy of all the extracted features for the Doppler shift. The Doppler shift is estimated from changes in phase over time, and although phase had a higher accuracy for standing than for sitting, it requires greater signal processing.
Figure 9 shows the per-feature results, which are broadly similar to sitting: some features, such as the statistical features, performed slightly better for sitting, whereas phase performed better for standing. The presence of a standing human attenuates the signal because the larger exposed surface area of the body absorbs or reflects more of the signal energy, resulting in a weaker or distorted signal. The extracted CSI is affected more by this increased surface area, and the multipath effect is impacted heavily as a result. Amplitude performed well on all three models; however, the hand-crafted statistical features dropped slightly below the sitting dataset because the calculations operate over a greater range of values as a result of the increased movement.
The elapsed time for standing is also recorded, as seen in Table 8. Again, ABLSTM took the longest to classify the data, with an average of 280.85 s, making it the slowest of all the models; the most efficient was LSTM-1DCNN, with an average of 56.23 s. The hybrid architecture effectively captures temporal dependencies in the input data while also learning local feature representations through convolutional filters, and it has proven effective because it learns both long-term dependencies and short-term patterns in the input sequence. On average, it took around one minute to classify the participants across the various methods.

5.2. Experiment Results on BedroomPi

The only difference between our dataset and BedroomPi is that BedroomPi already comes in amplitude form, with 52 columns (52 subcarriers); therefore, phase, statistical features, and the Doppler shift had to be ruled out on this dataset. However, this does not impact the rest of the experiment, as amplitude was the best-performing feature and was the one used with the remaining filters on our dataset.
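For context, the amplitude used throughout can be recovered from the raw complex CSI values; the sketch below assumes the ESP32 exports interleaved integer pairs per subcarrier, with the (imaginary, real) ordering being a firmware-dependent assumption to verify against the extraction tooling.

```python
# Sketch: recovering per-subcarrier amplitude from raw ESP32 CSI.
# Assumes interleaved (imaginary, real) integer pairs per packet; the pair
# ordering is firmware-dependent and should be checked against the tooling.
import numpy as np

def csi_to_amplitude(raw: np.ndarray) -> np.ndarray:
    """raw: shape (n_packets, 2 * n_subcarriers) of interleaved integers."""
    imag = raw[:, 0::2].astype(float)
    real = raw[:, 1::2].astype(float)
    return np.hypot(real, imag)  # |h| = sqrt(real^2 + imag^2)
```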

5.2.1. Experiment Results for Sitting

Table 9 displays the experimental results obtained from testing the three models, namely ABLSTM, CNN, and LSTM-1DCNN. LSTM-1DCNN again shows the greatest promise, with the highest accuracy for all but two methods; ABLSTM performed better when using the Savitzky–Golay filter and the Butterworth filter. LSTM-1DCNN had an average accuracy of 83.5%, compared with 80.9% for ABLSTM and 77.6% for CNN. Figure 10 shows the confusion matrices of LSTM-1DCNN operating on amplitude, Savitzky–Golay, max–min, and Z-score. All four methods perform well on the hybrid LSTM-1DCNN classifier; for example, the Savitzky–Golay filter misclassifies only a small number of instances, all under 3%, which is strong performance and shows promise for developing this model further under different scenarios. A reason for the high accuracy is that the waves for the three users, as shown in Figure 11, are markedly different, indicating that sitting down involves considerable variation in body movement and that everyone sits down differently. This demonstrates that sitting is an activity that is distinguishable between users for recognition and identification. ABLSTM also performed strongly; its bidirectional layer processes the sequence in two directions, forward and backward, enabling the model to learn complex patterns and perform well.
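The two feature-scaling methods referenced above can be summarised in the short sketch below, applied column-wise (per subcarrier) to a CSI amplitude matrix; the column-wise convention is an assumption for illustration.

```python
# Sketch: max-min and Z-score scaling applied per subcarrier (column-wise).
import numpy as np

def max_min_scale(x: np.ndarray) -> np.ndarray:
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)                  # rescale each column to [0, 1]

def z_score_scale(x: np.ndarray) -> np.ndarray:
    return (x - x.mean(axis=0)) / x.std(axis=0)  # zero mean, unit variance
```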
Table 10 records the other evaluation metric: the time elapsed to compile and train each model on each filtered dataset. It is clear from Table 10 that the hybrid LSTM-1DCNN model is efficient to train, with an overall average elapsed time of 162.68 s; comparing this with CNN, which averages 453.33 s, underlines how efficient it is. Several factors could explain the slow CNN model. CSI data contain both spatial and temporal information about the wireless signal; CNNs are primarily designed to capture spatial patterns, and incorporating both aspects leads to a more complex architecture, with more layers required to learn the fine-grained details within the data.

5.2.2. Experiment Results for Standing

Table 11 presents the results for standing, whilst Figure 12 shows the confusion matrices for the hybrid LSTM-1DCNN, this time for amplitude, Butterworth, polynomial fitting, and Z-score. It is observable from Table 11 that the Butterworth filter and least squares baseline removal are the two methods that pull down the average accuracy of every classifier. It is surprising that Butterworth performs so poorly on these data, considering how widely it is used by the research community: in [67], the authors use a Butterworth high-pass filter to remove noise and achieve 97.32% identification accuracy, and Ming et al. [16] leveraged the Butterworth filter to remove noise from the amplitude, again achieving strong results of 96% identification accuracy using the Fresnel zone model. The feature-scaling methods, however, were again very effective for all three models, recording accuracies of around 93% and above. The average accuracy for ABLSTM, CNN, and LSTM-1DCNN was 84.48%, 81.23%, and 88.71%, respectively. Compared with the sitting dataset from BedroomPi, all three models improved, which is interesting. A possible reason is that, when standing, the torso presents a much larger area to the signal than when sitting, so the signals are reflected, refracted, and scattered more; this is evident in the CSI data and makes classification somewhat easier because the data are more spread out. In the confusion matrix, the Butterworth filter in the top right has a few dark-shaded areas showing some weaknesses, as previously mentioned; perhaps the filter parameters needed to be retuned for this specific dataset to be as effective as other researchers report. Z-score and polynomial fitting performed very well on the standing dataset, whereas polynomial fitting was borderline for sitting. A possible reason is that polynomial fitting works best when the relationship between the data points is nonlinear: the standing dataset has larger variability and greater fluctuations, making it well suited to polynomial fitting, whereas sitting involves less movement and less variability, which is evident in its poorer performance there.
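A minimal sketch of the Butterworth denoising step follows, using SciPy's zero-phase filtering; the filter order, cut-off, and sampling rate are illustrative assumptions and, as noted above, such parameters likely need retuning per dataset.

```python
# Sketch: Butterworth low-pass denoising of one subcarrier's amplitude.
# Order, cut-off, and sampling rate are illustrative assumptions.
from scipy.signal import butter, filtfilt

def butterworth_lowpass(series, cutoff_hz=10.0, fs=100.0, order=4):
    nyquist = fs / 2.0
    b, a = butter(order, cutoff_hz / nyquist, btype="low")
    return filtfilt(b, a, series)  # zero-phase filtering avoids lag
```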
Table 12 shows the elapsed times for the standing dataset. It is immediately clear that the hybrid LSTM-1DCNN is very efficient and substantially outperforms the other two classifier models: its average elapsed time is 169.6 s, compared with 434.76 s for ABLSTM and 520.35 s for CNN. The model has proven effective both performance-wise and efficiency-wise on our dataset and on the BedroomPi dataset, mainly because the initial layers of this hierarchical architecture allow quick but precise convergence during training. The dropout applied alongside the LSTM layers helps prevent overfitting, and combining this with the 1DCNN layers provides solid regularisation that enhances the model's generalisation capability.

5.3. Performance Comparison

Following the evaluation of our tested models on both datasets, LSTM-1DCNN is compared against the models used by the authors of [53], who created the BedroomPi dataset. They tested several methods, of which BLSTM, 2D-CNN, LSTM, and 1D-CNN are comparable to LSTM-1DCNN. The authors reported an overall accuracy, and we follow the same procedure for comparison; Figure 13 displays the results. BedroomPi is a fairly new dataset, and limited work has been published on it to date. Moreover, this article goes one step further than simply performing activity recognition and identifies the users performing these activities, so our LSTM-1DCNN model, which identifies people performing activities, is compared against models that only perform activity recognition. It is clear from Figure 13 that our model performs very well compared with the existing models on this dataset. The best-performing model is 2D-CNN, but only by a minimal margin of 0.1%, and it only recognises activities, whereas our model identifies users from fine-grained features. It is also evident that, individually, 1D-CNN and LSTM do not perform very well, with average accuracies of 87.4% and 89.2%, respectively; by fusing them together, substantial performance increases can be obtained, as our model demonstrates. In the future, we will consider an LSTM-2DCNN and evaluate its impact on performance. A 1D-CNN was appropriate for fusion on both datasets because the data form a time series; a 2D-CNN would be used if the data were first converted to images. Both approaches have merits and will be evaluated further in future work.
It is also interesting that Moshiri et al. [53] recorded the time consumed by each model. Although they recorded the time per step in milliseconds, the 1D-CNN and LSTM models were the most efficient of all tested models in both the training and testing stages: 1D-CNN and LSTM took 9 and 13 milliseconds per step to train, respectively, and 3 and 6 milliseconds to test. This supports the reasoning behind our fast and efficient hybrid LSTM-1DCNN model. It is also observed that specific types of LSTM, including ConvLSTM [68] and DenseLSTM [69], are not very efficient with respect to elapsed time; although they showed promise, both took well over 30 milliseconds. This is mainly due to their complex architecture, especially in DenseLSTM, which has a large number of parameters owing to dense connectivity, meaning that each unit is connected to every unit in the previous and next layers. More parameters require more memory to store and more computation during the forward and backward passes, leading to longer training times. Dense connections also restrict the degree of parallelism that can be exploited: modern deep learning frameworks leverage parallel processing hardware such as GPUs (ours was an Intel Iris Plus Graphics 645 with 1536 MB of memory), but dense connections limit the parallel execution of operations, slowing training further, especially on such hardware.

6. Limitations

There are still limitations in HAR, particularly with sensor-based systems, which this CSI-based system tries to overcome. The first limitation concerns the use of smartphones. Smartphones have recently become dominant over wearables due to their numerous sensors, such as accelerometers, gyroscopes, barometers, and humidity sensors. According to one article, 6.92 billion people have a smartphone, equivalent to 86.29% of the world population [70], hence the scope for research into the use of smartphones for HAR. Furthermore, because of the computing power that is now standard on smartphones, researchers have begun to use them to replace wearable sensors in HAR. Recent articles mention the lack of computational capability of mobile and wearable sensor devices and how this leads to difficulties with onboard, real-time recognition [71,72,73]. One major problem is changes in sensor orientation, especially for smartphone accelerometers. In addition, battery life is a concern, and it is further reduced if applications require real-time monitoring for HAR. With regard to user convenience, carrying a smartphone or wearable at all times may not be suitable for everyday use. Furthermore, wearable and smartphone sensors are fixed and therefore may not be the correct sensor for a specific task, and the recorded data may not be open-source and available for the pre-processing and feature-extraction stages. In one work, the authors designed an accelerometer-based architecture for HAR using a CNN for feature extraction [73]; although it achieves 92% on the activities performed, it has not been tested in different environments, and the sensors worn by the user may affect their daily lives. This is also the case in [74,75,76,77], where users are asked to place their smartphone in their pocket or mount it on their waist, legs, or forearm [78,79,80,81]. Li et al. [82] discuss the HAR dataset named OPPORTUNITY, which was recorded with numerous wearable sensors; small to very large segments of data were missing due to data transmission problems with the wireless sensors during acquisition.
However, challenges remain in sensor-based HAR, and influencing factors often impact the performance of a recognition system. Chen et al. [83] present these challenges in detail, but one that appears in many research papers is the feature-extraction stage. This stage has a great influence on the overall system, and the problem lies with inter-activity similarity, which refers to different activities having similar features. For example, the way a person runs or walks can involve very similar movements, and recognising these subtle differences is a significant challenge for feature-extraction techniques. Nweke et al. [71] examine the problems surrounding hand-crafted statistical features such as the mean, median, standard deviation, and kurtosis; as mentioned above, these features are incapable of capturing the small, subtle movements in such activities. The solution is deep learning and artificial intelligence methods that enable automatic feature extraction and classification and can recognise complex activities, unlike hand-crafted features. One work sets out the advantages and disadvantages of deep learning methods [71]. On reflection, CNNs and RNNs appear to be the two most popular deep learning methods in the literature; for example, Challa et al. [84] and Bianchi et al. [85] produce hybrid models combining a CNN with a bidirectional LSTM and a CNN, respectively, achieving precision between 94% and 97% on different datasets. Although these techniques achieve good performance, limitations remain in the preceding stages, which require significant manual effort to get the data into the correct format; even then, the data are not transferable in the sense that they could be used on other models for comparison. Although deep learning is more capable than typical neural networks, major issues persist, such as overfitting and resource usage. The deep belief network (DBN), however, uses restricted Boltzmann machines (RBMs), allowing it to extract meaningful features from raw sensor data and making it an ideal candidate for HAR [78].
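As an illustration of the hand-crafted statistical features discussed above, the sketch below computes the four named statistics over sliding windows of a signal; the window and step sizes are illustrative assumptions.

```python
# Sketch: hand-crafted statistical features (mean, median, std, kurtosis)
# over sliding windows. Window and step sizes are illustrative assumptions.
import numpy as np
from scipy.stats import kurtosis

def sliding_window_features(series: np.ndarray, window: int = 128,
                            step: int = 64) -> np.ndarray:
    feats = []
    for start in range(0, len(series) - window + 1, step):
        w = series[start:start + window]
        feats.append([np.mean(w), np.median(w), np.std(w), kurtosis(w)])
    return np.array(feats)  # one feature row per window
```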
The concern regarding the study’s reliance on a controlled isolated environment is duly noted. While the controlled environment provides essential standardisation, acknowledging its limitations in representing chaotic scenarios in the real world is crucial for the validity and applicability of the research findings. To address these concerns, future iterations of our research will explore avenues to enhance the ecological validity of the study. One approach could involve the introduction of controlled environmental variables representative of real-world conditions. By incorporating relevant external factors, such as background noise and varying furniture layouts, the study can better simulate the complexities of real-world settings. Additionally, conducting pilot tests in diverse and uncontrolled environments will provide valuable insights into the model’s performance in unpredictable situations.

7. Conclusions

This article has investigated the feasibility of classifying participants performing sitting and standing postures in an isolation chamber cage using three models: ABLSTM, CNN, and LSTM-1DCNN. We tested them in a back-to-front fashion to establish the quality of each technique, recording accuracy, elapsed time, and computational complexity, and to demonstrate that this time-saving method yields good accuracy (Table 9). The experimental results show that amplitude is the best extracted feature, and this was recorded for all three models for both sitting and standing postures (Table 13); the hand-crafted statistical features, by contrast, proved a poor choice based on the models' accuracy. Polynomial fitting, Z-score, and the Savitzky–Golay filter all support the proposed back-to-front method, as the accuracies increased for both sitting and standing. Although ABLSTM provided the highest accuracy of all models and tested techniques (97.63% and 98.90% for standing and sitting postures, respectively), this came at a cost, as it was the least efficient of all. In comparison, LSTM-1DCNN was the best model overall, as seen in Table 13, with average accuracies of 74.13% and 84.29% for standing and sitting, respectively; in addition, its average timing was the lowest of all models. The following tables give a summary of all accuracies and timings.
In our future work, we will investigate the back-to-front fashion more deeply by testing different models and techniques, including polynomial fitting, Savitzky–Golay, Z-score, and tree-structured Parzen estimator (TPE) hyperparameter optimisation to identify the best parameters for the tested models, as sketched below. Furthermore, we will examine the effect of multiple transmitters and how their positions impact performance across many different natural activities.
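As an indication of how the planned TPE search could look, the following minimal sketch uses the Optuna library (one possible TPE implementation; an assumption, as is the search space and the `build_and_score` helper, which stands in for training a model and returning validation accuracy).

```python
# Sketch: TPE hyperparameter search with Optuna. The search space and the
# build_and_score helper below are hypothetical illustrations.
import optuna

def build_and_score(params: dict) -> float:
    # Hypothetical stand-in: train the model with `params` and return
    # validation accuracy. Replaced here by a placeholder value.
    return 0.5

def objective(trial: optuna.Trial) -> float:
    params = {
        "lstm_units": trial.suggest_categorical("lstm_units", [32, 64, 128]),
        "dropout": trial.suggest_float("dropout", 0.1, 0.6),
        "learning_rate": trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True),
    }
    return build_and_score(params)

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=50)
print(study.best_params)
```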

Author Contributions

Conceptualization, O.C., S.K. and S.P.; methodology, O.C., S.K. and S.P.; software, O.C.; validation, O.C., S.K. and S.P.; formal analysis, O.C.; investigation, O.C.; resources, O.C.; data curation, O.C.; writing—original draft preparation, O.C.; writing—review and editing, O.C., S.K. and S.P.; visualization, O.C.; supervision, S.K. and S.P.; project administration, O.C., S.K. and S.P.; funding acquisition, S.K. and S.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available because they contain personally identifiable information.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Khan, S.; Parkinson, S.; Grant, L.; Liu, N.; Mcguire, S. Biometric systems utilising health data from wearable devices: Applications and future challenges in computer security. ACM Comput. Surv. (CSUR) 2020, 53, 1–29. [Google Scholar] [CrossRef]
  2. Kaur, G.; Singh, A.; Singh, D. A comprehensive review on access control systems amid global pandemic. In Proceedings of the 2022 International Conference on Emerging Trends in Engineering and Medical Sciences (ICETEMS), Nagpur, India, 18–19 November 2022; pp. 15–19. [Google Scholar]
  3. Petrosyan, A. UK: Internet Usage Reach 2019–2028|Statista—statista.com. 2023. Available online: https://www.statista.com/statistics/553589/predicted-internet-user-penetration-rate-in-the-united-kingdom-uk/ (accessed on 12 May 2023).
  4. Ma, Y.; Zhou, G.; Wang, S. WiFi sensing with channel state information: A survey. ACM Comput. Surv. (CSUR) 2019, 52, 1–36. [Google Scholar] [CrossRef]
  5. Guo, R.; Li, H.; Han, D.; Liu, R. Feasibility analysis of using Channel State Information (CSI) acquired from Wi-Fi routers for construction worker fall detection. Int. J. Environ. Res. Public Health 2023, 20, 4998. [Google Scholar] [CrossRef]
  6. Wang, D.; Yang, J.; Cui, W.; Xie, L.; Sun, S. Multimodal CSI-based human activity recognition using GANs. IEEE Internet Things J. 2021, 8, 17345–17355. [Google Scholar] [CrossRef]
  7. Liu, L.; Zhang, S. An Indoor Geolocation Algorithm based on CSI and Affine Propagation Clustering. J. Phys. Conf. Ser. 2020, 1650, 022096. [Google Scholar] [CrossRef]
  8. Gu, Y.; Yu, X. WiPass: PIN-free and Device-free User Authentication Leveraging Behavioral Features via WiFi Channel State Information. In Proceedings of the 2021 3rd International Conference on Advances in Computer Technology, Information Science and Communication (CTISC), Shanghai, China, 23–25 April 2021; pp. 120–124. [Google Scholar]
  9. Yang, J.; Zou, H.; Xie, L. SecureSense: Defending Adversarial Attack for Secure Device-Free Human Activity Recognition. arXiv 2022, arXiv:2204.01560. [Google Scholar] [CrossRef]
  10. Kong, H.; Lu, L.; Yu, J.; Chen, Y.; Xu, X.; Tang, F.; Chen, Y.C. Multiauth: Enable multi-user authentication with single commodity wifi device. In Proceedings of the Twenty-second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, Shanghai, China, 26–29 July 2021; pp. 31–40. [Google Scholar]
  11. Staat, P.; Mulzer, S.; Roth, S.; Moonsamy, V.; Heinrichs, M.; Kronberger, R.; Sezgin, A.; Paar, C. IRShield: A countermeasure against adversarial physical-layer wireless sensing. In Proceedings of the 2022 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 22–26 May 2022; pp. 1705–1721. [Google Scholar]
  12. Al-qaness, M.A. Device-free human micro-activity recognition method using WiFi signals. Geo-Spat. Inf. Sci. 2019, 22, 128–137. [Google Scholar] [CrossRef]
  13. Wang, Z.; Jiang, K.; Hou, Y.; Huang, Z.; Dou, W.; Zhang, C.; Guo, Y. A survey on CSI-based human behavior recognition in through-the-wall scenario. IEEE Access 2019, 7, 78772–78793. [Google Scholar] [CrossRef]
  14. Xin, T.; Guo, B.; Wang, Z.; Wang, P.; Yu, Z. FreeSense: Human-behavior understanding using Wi-Fi signals. J. Ambient Intell. Humaniz. Comput. 2018, 9, 1611–1622. [Google Scholar] [CrossRef]
  15. Pokkunuru, A.; Jakkala, K.; Bhuyan, A.; Wang, P.; Sun, Z. NeuralWave: Gait-based user identification through commodity WiFi and deep learning. In Proceedings of the IECON 2018—44th Annual Conference of the IEEE Industrial Electronics Society, Washington, DC, USA, 21–23 October 2018; pp. 758–765. [Google Scholar]
  16. Ming, X.; Feng, H.; Bu, Q.; Zhang, J.; Yang, G.; Zhang, T. HumanFi: WiFi-based human identification using recurrent neural network. In Proceedings of the 2019 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), Leicester, UK, 19–23 August 2019; pp. 640–647. [Google Scholar]
  17. Wang, X.; Li, F.; Xie, Y.; Yang, S.; Wang, Y. Gait and respiration-based user identification using wi-fi signal. IEEE Internet Things J. 2021, 9, 3509–3521. [Google Scholar] [CrossRef]
  18. Lin, G.; Jiang, W.; Xu, S.; Zhou, X.; Guo, X.; Zhu, Y.; He, X. Human Activity Recognition Using Smartphones with WiFi Signals. IEEE Trans.-Hum.-Mach. Syst. 2022, 53, 142–153. [Google Scholar] [CrossRef]
  19. Jawad, S.K.; Alaziz, M. Human Activity and Gesture Recognition Based on WiFi Using Deep Convolutional Neural Networks. Iraqi J. Electr. Electron. Eng. 2022, 18, 110–116. [Google Scholar] [CrossRef]
  20. Alhakami, H. Knowledge based Authentication Techniques and Challenges. Int. J. Adv. Comput. Sci. Appl. 2020, 11. [Google Scholar] [CrossRef]
  21. Wang, C.; Wang, Y.; Chen, Y.; Liu, H.; Liu, J. User authentication on mobile devices: Approaches, threats and trends. Comput. Netw. 2020, 170, 107118. [Google Scholar] [CrossRef]
  22. Abugabah, A.; Nizamuddin, N.; Abuqabbeh, A. A review of challenges and barriers implementing RFID technology in the Healthcare sector. Procedia Comput. Sci. 2020, 170, 1003–1010. [Google Scholar] [CrossRef]
  23. Haddara, M.; Staaby, A. RFID applications and adoptions in healthcare: A review on patient safety. Procedia Comput. Sci. 2018, 138, 80–88. [Google Scholar] [CrossRef]
  24. Fahmy, A.; Altaf, H.; Al Nabulsi, A.; Al-Ali, A.; Aburukba, R. Role of RFID technology in smart city applications. In Proceedings of the 2019 International Conference on Communications, Signal Processing, and Their Applications (ICCSPA), Sharjah, United Arab Emirates, 19–21 March 2019; pp. 1–6. [Google Scholar]
  25. Fatima, H.; Khan, H.U.; Akbar, S. Home Automation and RFID-Based Internet of Things Security: Challenges and Issues. Secur. Commun. Netw. 2021, 2021, 1723535. [Google Scholar] [CrossRef]
  26. Palma, D.; Montessoro, P.L. Biometric-based human recognition systems: An overview. In Recent Advances in Biometrics; IntechOpen Limited: London, UK, 2022; pp. 1–21. [Google Scholar]
  27. Joshi, M.; Mazumdar, B.; Dey, S. Security vulnerabilities against fingerprint biometric system. arXiv 2018, arXiv:1805.07116. [Google Scholar]
  28. Zheng, Z.; Wang, Q.; Wang, C. Spoofing Attacks and Anti-Spoofing Methods for Face Authentication over Smartphones. IEEE Commun. Mag. 2023. Early Access. [Google Scholar] [CrossRef]
  29. Bhilare, S.; Kanhangad, V.; Chaudhari, N. A study on vulnerability and presentation attack detection in palmprint verification system. Pattern Anal. Appl. 2018, 21, 769–782. [Google Scholar] [CrossRef]
  30. Alsaadi, I.M. Study on most popular behavioral biometrics, advantages, disadvantages and recent applications: A review. Int. J. Sci. Technol. Res. 2021, 10, 15–21. [Google Scholar]
  31. Baynath, P.; Soyjaudah, K.S.; Khan, M.H.M. Keystroke recognition using neural network. In Proceedings of the 2017 5th International Symposium on Computational and Business Intelligence (ISCBI), Dubai, United Arab Emirates, 11–14 August 2017; pp. 86–90. [Google Scholar]
  32. Wang, Y.; Wu, C.; Zheng, K.; Wang, X. Improving reliability: User authentication on smartphones using keystroke biometrics. IEEE Access 2019, 7, 26218–26228. [Google Scholar] [CrossRef]
  33. Parkinson, S.; Khan, S.; Crampton, A.; Xu, Q.; Xie, W.; Liu, N.; Dakin, K. Password policy characteristics and keystroke biometric authentication. IET Biom. 2021, 10, 163–178. [Google Scholar] [CrossRef]
  34. Parkinson, S.; Khan, S.; Badea, A.M.; Crampton, A.; Liu, N.; Xu, Q. An empirical analysis of keystroke dynamics in passwords: A longitudinal study. IET Biom. 2023, 12, 25–37. [Google Scholar] [CrossRef]
  35. Siddiqui, N.; Dave, R.; Seliya, N. Continuous authentication using mouse movements, machine learning, and Minecraft. arXiv 2021, arXiv:2110.11080. [Google Scholar]
  36. Qin, D.; Fu, S.; Amariucai, G.; Qiao, D.; Guan, Y. Mauspad: Mouse-based authentication using segmentation-based, progress-adjusted dtw. In Proceedings of the 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Guangzhou, China, 29 December 2020–1 January 2021; pp. 425–433. [Google Scholar]
  37. Delgado-Santos, P.; Tolosana, R.; Guest, R.; Vera-Rodriguez, R.; Deravi, F.; Morales, A. GaitPrivacyON: Privacy-preserving mobile gait biometrics using unsupervised learning. Pattern Recognit. Lett. 2022, 161, 30–37. [Google Scholar] [CrossRef]
  38. Hernandez, S.M.; Bulut, E. Wifi sensing on the edge: Signal processing techniques and challenges for real-world systems. IEEE Commun. Surv. Tutor. 2022, 25, 46–76. [Google Scholar] [CrossRef]
  39. Ramanujam, E.; Perumal, T.; Padmavathi, S. Human activity recognition with smartphone and wearable sensors using deep learning techniques: A review. IEEE Sensors J. 2021, 21, 13029–13040. [Google Scholar] [CrossRef]
  40. Muaaz, M.; Mayrhofer, R. Smartphone-based gait recognition: From authentication to imitation. IEEE Trans. Mob. Comput. 2017, 16, 3209–3221. [Google Scholar] [CrossRef]
  41. Sun, F.; Mao, C.; Fan, X.; Li, Y. Accelerometer-based speed-adaptive gait authentication method for wearable IoT devices. IEEE Internet Things J. 2018, 6, 820–830. [Google Scholar] [CrossRef]
  42. Sun, F.; Zang, W.; Gravina, R.; Fortino, G.; Li, Y. Gait-based identification for elderly users in wearable healthcare systems. Inf. Fusion 2020, 53, 134–144. [Google Scholar] [CrossRef]
  43. Chen, T.; Li, Y.; Tao, S.; Lim, H.; Sakashita, M.; Zhang, R.; Guimbretiere, F.; Zhang, C. Neckface: Continuously tracking full facial expressions on neck-mounted wearables. Proc. Acm Interact. Mob. Wearable Ubiquitous Technol. 2021, 5, 1–31. [Google Scholar] [CrossRef]
  44. Liu, H.; Xue, T.; Schultz, T. On a Real Real-Time Wearable Human Activity Recognition System. In Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies, Lisbon, Portugal, 16–18 February 2023; pp. 16–18. [Google Scholar]
  45. Nkabiti, K.P.; Chen, Y.; Sultan, K.; Armand, B. A deep bidirectional LSTM recurrent neural networks for identifying humans indoors using channel state information. In Proceedings of the 2019 28th Wireless and Optical Communications Conference (WOCC), Beijing, China, 9–10 May 2019; pp. 1–5. [Google Scholar]
  46. Zhang, L.; Wang, C.; Ma, M.; Zhang, D. WiDIGR: Direction-independent gait recognition system using commercial Wi-Fi devices. IEEE Internet Things J. 2019, 7, 1178–1191. [Google Scholar] [CrossRef]
  47. Wang, D.; Yang, J.; Cui, W.; Xie, L.; Sun, S. CAUTION: A Robust WiFi-based human authentication system via few-shot open-set recognition. IEEE Internet Things J. 2022, 9, 17323–17333. [Google Scholar] [CrossRef]
  48. Lattanzi, E.; Donati, M.; Freschi, V. Exploring artificial neural networks efficiency in tiny wearable devices for human activity recognition. Sensors 2022, 22, 2637. [Google Scholar] [CrossRef] [PubMed]
  49. Tran, V.T.; Riveros, C.; Ravaud, P. Patients’ views of wearable devices and AI in healthcare: Findings from the ComPaRe e-cohort. NPJ Digit. Med. 2019, 2, 53. [Google Scholar] [CrossRef]
  50. Zerrouki, N.; Harrou, F.; Sun, Y.; Houacine, A. Vision-based human action classification using adaptive boosting algorithm. IEEE Sensors J. 2018, 18, 5115–5121. [Google Scholar] [CrossRef]
  51. Oyedotun, O.K.; Khashman, A. Deep learning in vision-based static hand gesture recognition. Neural Comput. Appl. 2017, 28, 3941–3951. [Google Scholar] [CrossRef]
  52. Anil Kumar, C.J.; Abraham, C.; Darshan, M.C.; Freddy, D.; Anandakrishnan, P.S. Robust Human Activity Recognition using Multimodal Feature-Level Fusion. Grenze Int. J. Eng. Technol. (GIJET) 2023, 9. [Google Scholar]
  53. Moshiri, P.F.; Shahbazian, R.; Nabati, M.; Ghorashi, S.A. A CSI-based human activity recognition using deep learning. Sensors 2021, 21, 7225. [Google Scholar] [CrossRef]
  54. Wang, Y.; Yao, L.; Wang, Y.; Zhang, Y. Robust CSI-based human activity recognition with augment few shot learning. IEEE Sensors J. 2021, 21, 24297–24308. [Google Scholar] [CrossRef]
  55. Schäfer, J.; Barrsiwal, B.R.; Kokhkharova, M.; Adil, H.; Liebehenschel, J. Human activity recognition using CSI information with nexmon. Appl. Sci. 2021, 11, 8860. [Google Scholar] [CrossRef]
  56. Chen, Z.; Zhang, L.; Jiang, C.; Cao, Z.; Cui, W. WiFi CSI based passive human activity recognition using attention based BLSTM. IEEE Trans. Mob. Comput. 2019, 18, 2714–2724. [Google Scholar] [CrossRef]
  57. Shalaby, E.; ElShennawy, N.; Sarhan, A. Utilizing deep learning models in CSI-based human activity recognition. Neural Comput. Appl. 2022, 34, 5993–6010. [Google Scholar] [CrossRef] [PubMed]
  58. Ding, J.; Wang, Y. WiFi CSI-based human activity recognition using deep recurrent neural network. IEEE Access 2019, 7, 174257–174269. [Google Scholar] [CrossRef]
  59. Ambalkar, H.; Wang, X.; Mao, S. Adversarial human activity recognition using Wi-Fi CSI. In Proceedings of the 2021 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), Virtual, 12–17 September 2021; pp. 1–5. [Google Scholar]
  60. Li, H.; He, X.; Chen, X.; Fang, Y.; Fang, Q. Wi-motion: A robust human activity recognition using WiFi signals. IEEE Access 2019, 7, 153287–153299. [Google Scholar] [CrossRef]
  61. Yang, J.; Chen, X.; Zou, H.; Wang, D.; Xu, Q.; Xie, L. EfficientFi: Toward large-scale lightweight WiFi sensing via CSI compression. IEEE Internet Things J. 2022, 9, 13086–13095. [Google Scholar] [CrossRef]
  62. Yousefi, S.; Narui, H.; Dayal, S.; Ermon, S.; Valaee, S. A survey on behavior recognition using WiFi channel state information. IEEE Commun. Mag. 2017, 55, 98–104. [Google Scholar] [CrossRef]
  63. Fard Moshiri, P.; Nabati, M.; Shahbazian, R.; Ghorashi, S. CSI-Based Human Activity Recognition using Convolutional Neural Networks. In Proceedings of the 11th International Conference on Computer and Knowledge Engineering (ICCKE 2021), Mashhad, Iran, 28–29 October 2022; pp. 7–12. [Google Scholar]
  64. Shahverdi, H.; Nabati, M.; Fard Moshiri, P.; Asvadi, R.; Ghorashi, S.A. Enhancing CSI-Based Human Activity Recognition by Edge Detection Techniques. Information 2023, 14, 404. [Google Scholar] [CrossRef]
  65. Damodaran, N.; Haruni, E.; Kokhkharova, M.; Schäfer, J. Device free human activity and fall recognition using WiFi channel state information (CSI). CCF Trans. Pervasive Comput. Interact. 2020, 2, 1–17. [Google Scholar] [CrossRef]
  66. Jannat, M.K.A.; Islam, M.S.; Yang, S.H.; Liu, H. Efficient Wi-Fi-Based Human Activity Recognition Using Adaptive Antenna Elimination. IEEE Access 2023, 11, 105440–105454. [Google Scholar] [CrossRef]
  67. Xu, Y.; Yang, W.; Chen, M.; Chen, S.; Huang, L. Attention-based gait recognition and walking direction estimation in wi-fi networks. IEEE Trans. Mob. Comput. 2020, 21, 465–479. [Google Scholar] [CrossRef]
  68. Forbes, G.; Massie, S.; Craw, S. Wifi-based human activity recognition using Raspberry Pi. In Proceedings of the 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), Baltimore, MD, USA, 9–11 November 2020; pp. 722–730. [Google Scholar]
  69. Zhang, J.; Wu, F.; Wei, B.; Zhang, Q.; Huang, H.; Shah, S.W.; Cheng, J. Data augmentation and dense-LSTM for human activity recognition using WiFi signal. IEEE Internet Things J. 2020, 8, 4628–4641. [Google Scholar] [CrossRef]
  70. How Many People Have Smartphones Worldwide (May 2023). BankMyCell. Available online: https://www.bankmycell.com (accessed on 12 May 2023).
  71. Nweke, H.F.; Teh, Y.W.; Al-Garadi, M.A.; Alo, U.R. Deep learning algorithms for human activity recognition using mobile and wearable sensor networks: State of the art and research challenges. Expert Syst. Appl. 2018, 105, 233–261. [Google Scholar] [CrossRef]
  72. Wang, Y.; Cang, S.; Yu, H. A survey on wearable sensor modality centred human activity recognition in health care. Expert Syst. Appl. 2019, 137, 167–190. [Google Scholar] [CrossRef]
  73. Wan, S.; Qi, L.; Xu, X.; Tong, C.; Gu, Z. Deep learning models for real-time human activity recognition with smartphones. Mob. Netw. Appl. 2020, 25, 743–755. [Google Scholar] [CrossRef]
  74. Saha, J.; Chowdhury, C.; Roy Chowdhury, I.; Biswas, S.; Aslam, N. An ensemble of condition based classifiers for device independent detailed human activity recognition using smartphones. Information 2018, 9, 94. [Google Scholar] [CrossRef]
  75. Mukherjee, D.; Mondal, R.; Singh, P.K.; Sarkar, R.; Bhattacharjee, D. EnsemConvNet: A deep learning approach for human activity recognition using smartphone sensors for healthcare applications. Multimed. Tools Appl. 2020, 79, 31663–31690. [Google Scholar] [CrossRef]
  76. Nandy, A.; Saha, J.; Chowdhury, C.; Singh, K.P. Detailed human activity recognition using wearable sensor and smartphones. In Proceedings of the 2019 International Conference on Opto-Electronics and Applied Optics (Optronix), Kolkata, India, 18–20 March 2019; pp. 1–6. [Google Scholar]
  77. Milenkoski, M.; Trivodaliev, K.; Kalajdziski, S.; Jovanov, M.; Stojkoska, B.R. Real time human activity recognition on smartphones using LSTM networks. In Proceedings of the 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 21–25 May 2018; pp. 1126–1131. [Google Scholar]
  78. Hassan, M.M.; Uddin, M.Z.; Mohamed, A.; Almogren, A. A robust human activity recognition system using smartphone sensors and deep learning. Future Gener. Comput. Syst. 2018, 81, 307–313. [Google Scholar] [CrossRef]
  79. Chung, S.; Lim, J.; Noh, K.J.; Kim, G.; Jeong, H. Sensor data acquisition and multimodal sensor fusion for human activity recognition using deep learning. Sensors 2019, 19, 1716. [Google Scholar] [CrossRef]
  80. Lawal, I.A.; Bano, S. Deep human activity recognition using wearable sensors. In Proceedings of the 12th ACM International Conference on PErvasive Technologies Related to Assistive Environments, Rhodes, Greece, 5–7 June 2019; pp. 45–48. [Google Scholar]
  81. Ahmed, N.; Rafiq, J.I.; Islam, M.R. Enhanced human activity recognition based on smartphone sensor data using hybrid feature selection model. Sensors 2020, 20, 317. [Google Scholar] [CrossRef] [PubMed]
  82. Li, F.; Shirahama, K.; Nisar, M.A.; Köping, L.; Grzegorzek, M. Comparison of feature learning methods for human activity recognition using wearable sensors. Sensors 2018, 18, 679. [Google Scholar] [CrossRef] [PubMed]
  83. Chen, K.; Zhang, D.; Yao, L.; Guo, B.; Yu, Z.; Liu, Y. Deep learning for sensor-based human activity recognition: Overview, challenges, and opportunities. ACM Comput. Surv. (CSUR) 2021, 54, 1–40. [Google Scholar] [CrossRef]
  84. Challa, S.K.; Kumar, A.; Semwal, V.B. A multibranch CNN-BiLSTM model for human activity recognition using wearable sensor data. Vis. Comput. 2022, 38, 4095–4109. [Google Scholar] [CrossRef]
  85. Bianchi, V.; Bassoli, M.; Lombardo, G.; Fornacciari, P.; Mordonini, M.; De Munari, I. IoT wearable sensor and deep learning: An integrated approach for personalized human activity recognition in a smart home environment. IEEE Internet Things J. 2019, 6, 8553–8562. [Google Scholar] [CrossRef]
Figure 1. Our CSI collection process. A transmitter ESP32 device (Tx) sending packets to a receiver ESP32 device (Rx) whilst participants complete sitting and standing postures in line-of-sight (LoS).
Figure 2. Our system overview.
Figure 3. Received signals of anonymous user sitting.
Figure 4. Experiment setup. (a) Users sitting down. (b) Users standing up.
Figure 5. Confusion matrix of polynomial fitting on CNN for HAR for sitting.
Figure 6. Feature extraction comparison on the classification models—sitting.
Figure 7. The Savitzky–Golay filter smoothing out the wave.
Figure 8. Confusion matrix of polynomial fitting on CNN for HAR for standing.
Figure 9. Feature extraction comparison on the classification models—standing.
Figure 10. LSTM-1DCNN confusion matrix for: (Top left) Amplitude, (Top right) Savitzky–Golay, (Bottom left) Z-score, (Bottom right) max–min.
Figure 11. Savitzky–Golay filtered wave.
Figure 12. LSTM-1DCNN confusion matrix for: (Top left) Amplitude, (Top right) Butterworth, (Bottom left) polynomial fitting, (Bottom right) Z-score.
Figure 13. Accuracy of different models.
Table 1. ABLSTM model layers.

| Layer Name | Input Size | Output Size |
| --- | --- | --- |
| Bidirectional Layer 1 | (132, 192, 1) | (32, 128) |
| Dropout Layer 1 | (32, 128) | (32, 128) |
| Dense Layer 1 | (32, 128) | (32, 15) |
Table 2. CNN model layers.

| Layer Name | Input Size | Output Size |
| --- | --- | --- |
| Convolutional Layer 1 | (32, 32, 32, 3) | (32, 30, 30, 32) |
| Pooling Layer 1 | (32, 30, 30, 32) | (32, 15, 15, 32) |
| Convolutional Layer 2 | (32, 15, 15, 32) | (32, 13, 13, 64) |
| Pooling Layer 2 | (32, 13, 13, 64) | (32, 6, 6, 64) |
| Convolutional Layer 3 | (32, 6, 6, 64) | (32, 4, 4, 64) |
| Flatten Layer 1 | (32, 4, 4, 64) | (32, 1024) |
| Dense Layer 1 | (32, 1024) | (32, 64) |
| Dense Layer 2 | (32, 64) | (32, 15) |
Table 3. LSTM-1DCNN model layers.

| Layer Name | Input Size | Output Size |
| --- | --- | --- |
| Convolutional Layer 1 | (32, 30, 192) | (32, 28, 64) |
| Pooling Layer 1 | (32, 28, 64) | (32, 14, 64) |
| Convolutional Layer 2 | (32, 14, 64) | (32, 12, 32) |
| Pooling Layer 2 | (32, 12, 32) | (32, 6, 32) |
| LSTM Layer 1 | (32, 6, 32) | (32, 6, 64) |
| LSTM Layer 2 | (32, 6, 64) | (32, 32) |
| Dense Layer 1 | (32, 32) | (32, 128) |
| Dropout Layer 1 | (32, 128) | (32, 128) |
| Dense Layer 2 | (32, 128) | (32, 15) |
Table 4. Channel matrix obtained from anonymous user.
CSI DATA11CSI DATA12CSI DATA13CSI DATA14CSI DATA15CSI DATA16CSI DATA17
5022221123
341516161417
−35−1019−920−9
−51−195−196−20
Table 5. Overall performance accuracy on our dataset—sitting.

| Classification Models | ABLSTM | CNN | LSTM-1DCNN |
| --- | --- | --- | --- |
| Signal processing | | | |
| Feature extraction | | | |
| Amplitude | 98.90% | 95.81% | 92.96% |
| Phase | 41.68% | 61.02% | 29.79% |
| Statistical features | 44.77% | 46.46% | 98.40% |
| Doppler shift | 11.96% | 80.67% | 86.93% |
| Denoising filters | | | |
| Savitzky–Golay Filter | 98.63% | 93.36% | 94.40% |
| Butterworth Filter | 38.88% | 38.49% | 82.4% |
| Data Preparation | | | |
| Detrending | | | |
| Least Squares Baseline Removal | 60.37% | 34.59% | 74.77% |
| Polynomial fitting | 99.91% | 97.77% | 95.74% |
| Feature scaling | | | |
| Max–Min | 99.10% | 95.61% | 94.71% |
| Z-Score | 98.29% | 94.79% | 92.75% |
Table 6. Elapsed time on our sitting dataset.

| Classification Models | ABLSTM | CNN | LSTM-1DCNN |
| --- | --- | --- | --- |
| Signal processing | | | |
| Feature extraction | | | |
| Amplitude | 335.61 s | 116.92 s | 63.27 s |
| Phase | 344.88 s | 108.21 s | 57.77 s |
| Statistical features | 28.64 s | 9.28 s | 41.87 s |
| Doppler shift | 387.41 s | 90.70 s | 65.70 s |
| Denoising filters | | | |
| Savitzky–Golay Filter | 329 s | 112.03 s | 64.12 s |
| Butterworth Filter | 313.28 s | 198.40 s | 60.07 s |
| Data Preparation | | | |
| Detrending | | | |
| Least Squares Baseline Removal | 353.25 s | 119.89 s | 69.75 s |
| Polynomial fitting | 352.52 s | 122.77 s | 74.46 s |
| Feature scaling | | | |
| Max–Min | 333.30 s | 120.70 s | 70.53 s |
| Z-Score | 340.84 s | 120.58 s | 58.73 s |
Table 7. Overall performance accuracy on our dataset—standing.

| Classification Models | ABLSTM | CNN | LSTM-1DCNN |
| --- | --- | --- | --- |
| Signal processing | | | |
| Feature extraction | | | |
| Amplitude | 97.63% | 94.14% | 87.94% |
| Phase | 74.43% | 72.16% | 84.03% |
| Statistical features | 33.70% | 36.72% | 68.75% |
| Doppler shift | 11.86% | 79.60% | 59.66% |
| Denoising filters | | | |
| Savitzky–Golay Filter | 97.9% | 92.60% | 78.32% |
| Butterworth Filter | 44.20% | 40.29% | 54.01% |
| Data Preparation | | | |
| Detrending | | | |
| Least Squares Baseline Removal | 46.47% | 29.66% | 57.09% |
| Polynomial fitting | 99.35% | 96.03% | 82.59% |
| Feature scaling | | | |
| Max–Min | 96.13% | 93.47% | 84.07% |
| Z-Score | 97.08% | 94.55% | 84.80% |
Table 8. Elapsed time on our standing dataset.

| Classification Models | ABLSTM | CNN | LSTM-1DCNN |
| --- | --- | --- | --- |
| Signal processing | | | |
| Feature extraction | | | |
| Amplitude | 291.50 s | 92.27 s | 52.81 s |
| Phase | 273.15 s | 98.74 s | 53.69 s |
| Statistical features | 22.62 s | 8.00 s | 37.96 s |
| Doppler shift | 543.66 s | 87.51 s | 59.88 s |
| Denoising filters | | | |
| Savitzky–Golay Filter | 233.51 s | 98.71 s | 53.55 s |
| Butterworth Filter | 280.29 s | 100.20 s | 53.83 s |
| Data Preparation | | | |
| Detrending | | | |
| Least Squares Baseline Removal | 307.24 s | 93.47 s | 63.13 s |
| Polynomial fitting | 291.39 s | 110.60 s | 62.09 s |
| Feature scaling | | | |
| Max–Min | 322.33 s | 117.06 s | 63.87 s |
| Z-Score | 242.77 s | 110.25 s | 61.48 s |
Table 9. Overall performance accuracy on BedroomPi dataset—sitting.

| Classification Models | ABLSTM | CNN | LSTM-1DCNN |
| --- | --- | --- | --- |
| Signal processing | | | |
| Feature extraction | | | |
| Amplitude | 92.39% | 91.67% | 93.75% |
| Denoising filters | | | |
| Savitzky–Golay Filter | 96.52% | 85.74% | 94.51% |
| Butterworth Filter | 57.92% | 53.52% | 57.85% |
| Data Preparation | | | |
| Detrending | | | |
| Least Squares Baseline Removal | 60.56% | 58.97% | 63.13% |
| Polynomial fitting | 75.56% | 75.61% | 87.38% |
| Feature scaling | | | |
| Max–Min | 91.1% | 87.46% | 93.72% |
| Z-Score | 92.06% | 90.43% | 94.42% |
Table 10. Elapsed time on BedroomPi sitting dataset.

| Classification Models | ABLSTM | CNN | LSTM-1DCNN |
| --- | --- | --- | --- |
| Signal processing | | | |
| Feature extraction | | | |
| Amplitude | 556.77 s | 782.51 s | 189.49 s |
| Denoising filters | | | |
| Savitzky–Golay Filter | 547.97 s | 396.5 s | 199.33 s |
| Butterworth Filter | 542.48 s | 408.07 s | 190.63 s |
| Data Preparation | | | |
| Detrending | | | |
| Least Squares Baseline Removal | 374.93 s | 397.68 s | 186.11 s |
| Polynomial fitting | 364.58 s | 394.22 s | 186.11 s |
| Feature scaling | | | |
| Max–Min | 388.68 s | 389.95 s | 187.44 s |
| Z-Score | 488.89 s | 405.12 s | 185.74 s |
Table 11. Overall performance accuracy on BedroomPi dataset—standing.

| Classification Models | ABLSTM | CNN | LSTM-1DCNN |
| --- | --- | --- | --- |
| Signal processing | | | |
| Feature extraction | | | |
| Amplitude | 93.46% | 87.55% | 95.72% |
| Denoising filters | | | |
| Savitzky–Golay Filter | 95.79% | 91.37% | 96.04% |
| Butterworth Filter | 67.58% | 59.76% | 83.71% |
| Data Preparation | | | |
| Detrending | | | |
| Least Squares Baseline Removal | 58.76% | 57.71% | 59.05% |
| Polynomial fitting | 88.5% | 86.76% | 95.19% |
| Feature scaling | | | |
| Max–Min | 93.5% | 91.76% | 95.61% |
| Z-Score | 93.8% | 93.68% | 95.67% |
Table 12. Elapsed time on BedroomPi standing dataset.

| Classification Models | ABLSTM | CNN | LSTM-1DCNN |
| --- | --- | --- | --- |
| Signal processing | | | |
| Feature extraction | | | |
| Amplitude | 503.73 s | 560.5 s | 174.43 s |
| Denoising filters | | | |
| Savitzky–Golay Filter | 352.15 s | 504.8 s | 168.17 s |
| Butterworth Filter | 462.44 s | 528.18 s | 172.48 s |
| Data Preparation | | | |
| Detrending | | | |
| Least Squares Baseline Removal | 304.66 s | 514.92 s | 165.6 s |
| Polynomial fitting | 480.79 s | 500.29 s | 169.7 s |
| Feature scaling | | | |
| Max–Min | 468.62 s | 503.31 s | 166.53 s |
| Z-Score | 470.94 s | 530.44 s | 170.35 s |
Table 13. Summary table of performance.

| Classification Models | ABLSTM | CNN | LSTM-1DCNN |
| --- | --- | --- | --- |
| Sitting accuracy | 69.25% | 73.86% | 84.29% |
| Standing accuracy | 69.88% | 72.92% | 74.13% |
| Sitting time | 311.87 s | 111.95 s | 62.63 s |
| Standing time | 280.85 s | 91.68 s | 56.23 s |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
