Body-Worn Sensors for Recognizing Physical Sports Activities in Exergaming via Deep Learning Model

Obesity and physical inactivity are common issues among today's youth. This has motivated the exergaming solution proposed here, in which users play first-person physical games. This study not only proposes a wearable-sensor-based game for physical fitness but also a multi-purpose system that supports different applications when trained on a domain-specific dataset. The critical tasks of gesture recognition and depiction in virtual reality apply to many domains, including crime detection, fitness, healthcare, online learning, and sports. In particular, the proposed system enables a user to perform, detect, and depict different gestures in a virtual reality game. First, the system pre-processes the input data with a median filter to remove anomalies. Then, features are extracted using a convolutional neural network, power spectral density, skewness, and kurtosis. Next, the features are optimized using grey wolf optimization. Lastly, the optimized feature set is fed to a recurrent neural network for classification. Compared to traditional methods, the proposed system gives better results while being easier to use. Three datasets were used in the experimentation: IMSporting Behaviors (IMSB), which includes badminton and other physical activities; WISDM, which includes common locomotor motions; and ERICA, which includes a variety of exercises. Experimental findings show that the proposed approach outperformed current methods, achieving detection accuracies of 85.01%, 88.46%, and 93.18% on the IMSB, WISDM, and ERICA datasets, respectively.


I. INTRODUCTION
(The associate editor coordinating the review of this manuscript and approving it for publication was Alberto Cano.)

According to global statistics, the gaming sector is now the fastest-growing industry worldwide. The gaming industry, estimated to be worth $179.7 billion in 2021, is predicted to grow at a CAGR of 8.94% from 2022 to 2027 and reach a value of $339.95 billion [1]. Playing games with a mouse and keyboard is old-fashioned, and a new methodology for playing games is essential to further expand this billion-dollar industry.
Apart from that, studies have revealed that people with more screen time are prone to obesity, laziness, sleeplessness, and fatigue [2]. Sleeplessness can cause anxiety, mental disorders, and anger issues in adults. Many of these adults have no physical activity and are addicted to gaming, which has many adverse effects on their behavior and bodies [3]. Because games reach so many people and can harm them, researchers have been seeking ways to make games less harmful and more useful in areas such as health, education, sports, and the military.
Morton Heilig created the first virtual system, known as the Sensorama, in 1957, giving rise to the concept of virtual reality (VR); later, in 1987, researcher Jaron Lanier coined the phrase "virtual reality" [4]. VR headsets are still too expensive for the general public: the Oculus Go, for instance, is priced at up to $545, while the cheapest Oculus headset can be purchased for $249 [5]. Systems such as the Meta Quest 2, HTC Vive Pro 2, Sony PlayStation VR, and Nintendo Wii provide similarly impressive experiences to their users, but they are quite costly. Moreover, these VR devices have long wires attached to computers, which restrict users from moving freely in the space around them. They have no built-in gesture recognition system and can only be used for a single purpose at a time. Powerful computing machines are required for generating VR views and illusions, and the controllers for playing games only respond after their buttons are pressed. Some VR headsets have cameras that continuously track controller movements to trigger actions; in such systems it is common to witness action delays, speed issues, and gesture accuracy issues. There are also configuration issues common to almost all VR headsets, i.e., login, signup, and connectivity-related problems. Hence, a novel, more efficient, wireless, cost-effective, sensor-based wearable system is suggested in order to make a difference. It will help youngsters with obesity and other related issues by encouraging physical exercise.
1) The proposed approach is multi-purpose: it is not limited to gaming and can also be used in other domains such as fitness, robotics, drones, sports, and e-learning.
2) It connects VR and human physical health through playing games in an indoor environment.
3) The system will also enhance the trend of old-fashioned games by introducing a sensor-based wearable device that can control gaming objects precisely through accelerometer data generated from human body gestures.
4) The system creates a unique virtual reality experience using inertial sensors, and it is both affordable and user-friendly.
5) The system removes the wires and the need for powerful computers. Instead, a wireless approach is designed for playing exergames in a VR headset using wearable sensors.
6) An accurate recurrent neural network (RNN) classifier is utilized to predict the values.

The proposed approach uses 6-DOF (degrees of freedom) inertial sensors to measure acceleration. These data are transferred via a transmitter to the computer, where a pre-trained deep-learning model tests them via the RNN classifier. Finally, the gaming behavior is predicted, and the appropriate interface is loaded into the game. The sensor data controls the avatar gestures and sends the real-time gesture data to the pre-trained model. The motions and activity recognition may be shown on the personal computer (PC) at the same time as the user watches in their VR headset.
Numerous head-mounted displays for VR (HMD-VR) exergames are being utilized widely for rehabilitation and to aid the recovery of patients [6]. In addition, VR games have several uses in the medical field, such as preventative health and well-being along with medical evaluations in clinical treatments [7]. VR gaming has a wide range of educational uses as well. The use of games in a classroom increases students' engagement and motivation, whereas first-person VR game usage will keep students active and healthy while having fun [8]. There has been a resurgence in the research related to delivering therapy using VR gaming systems. VR games offer potential in the treatment of numerous ailments, including post-stroke, Parkinson's disease, and others [9].
The following are our system's significant contributions:
• Our suggested system provides a cost-effective solution to very serious issues like childhood obesity and other health-related problems.
• Similar systems exist in the realm of VR, but they are exceedingly expensive and out of the reach of the common man. As a result, the suggested method creates a product that allows individuals to keep in shape while playing games in an indoor environment.
• Our hardware-based system has been developed and tested, making it a reliable solution for physical health and other applications.
• To make our system efficient and affordable, we are incorporating a straightforward and inexpensive VR device.
• The system outperformed existing approaches in terms of accuracy rates.
The remaining article is structured as follows: Section II reviews related research on VR sports action detection using sensors and cameras. The suggested technique is explained in detail in Section III. Section IV describes the datasets used to verify the effectiveness of the suggested strategy and the outcomes of those tests. Section V concludes the paper and outlines goals for the future.

VOLUME 11, 2023

II. RELATED WORKS
For deep learning and machine learning-based systems employing a range of sensors, including inertial measurement units (IMU), cameras, and other fused sensors, several methodologies have been proposed by numerous academics. This section reviews the research on camera-based and wearable sensors-based systems.

A. VIRTUAL REALITY EXERGAMING WITH WEARABLE SENSORS
Virtual reality games and sensors have been utilized in several applications in recent years. I. Paraskevopoulos and E. Tsekleves [10] suggested a system that incorporated affordable, flexible, off-the-shelf motion capture technology with video games specially designed to meet the needs of Parkinson's disease (PD) rehabilitation, although they used larger controllers for the game. D. Fitzgerald et al. developed a VR computer game to lead an athlete through several recommended rehabilitation activities [11].
In an effort to enhance physical performance while preventing or treating musculoskeletal disorders, certified professionals have prescribed training programs to athletes. Using serious games and virtual environments, Mondragón Bernal et al. developed and assessed a system for teaching power distribution operation. Building information modeling from a 115 kV substation was utilized to create a scenario with high technical detail suited for professional training in the VR simulator [12].
Immersive 3D virtual worlds and serious games, i.e., video games meant to be educational, are both growing in popularity. Serious games have only recently been tested for healthcare education. Following a review of educational philosophies highlighting the importance of serious games and virtual simulations as teaching aids, Ma et al. examined various instances of early teaching models and evaluation procedures in their study [13]. They further made recommendations for how to assess their worth in a learning environment.
VR technologies are gaining popularity as a way to model, evaluate, and improve the assembly process. Abidi et al. discussed the development of a haptic virtual reality platform for virtual assembly planning, execution, and evaluation. The technology enables real-time handling of and interaction with virtual components. To examine the advantages and disadvantages of combining haptics with physics-based modeling, the system consists of several software programs, including the Open Haptics, PhysX, and OpenGL/GLUT libraries [14].

B. VIRTUAL REALITY EXERGAMING WITH CAMERA
Researchers have used a variety of approaches while employing camera-oriented VR systems. To put disabled persons at ease, Gerling et al. [15] devised a system that employed the Kinect v2 depth camera to evaluate wheelchair movement and created two Unity VR games. The study's findings were highly encouraging, since the immersive VR experience proved to be a wonderful experience for persons with disabilities.
Stomp Joy, developed by Xu et al. [16], was a camera-based VR game specific to one task: rehabilitation after lower-limb stroke. Sangmin et al. [17] created a VR game in Unity for the A-Camera and A-Visor, demonstrating cutting-edge head-mounted virtual reality controllers that enthusiasts can easily construct for themselves using corrugated cardboard, an Arduino, and sensors.
Another study expands the use of VR in manufacturing by incorporating ideas and research from training simulations into the evaluation of assembly training efficacy and training transfer [18]. A study was carried out by Abidi et al. to evaluate and contrast the virtual assembly training methods used for the first Saudi Arabian vehicle prototype. Three learning contexts were examined: conventional engineering, computer-aided design environments, and immersive VR. Fifteen university students were randomly assigned to the various training contexts [19].
Industrial design, planning, and prototyping are more successful and economical when done in VR. The study by Abdulrahman M. Al-Ahmari and colleagues was primarily concerned with creating a virtual manufacturing assembly simulation system that tackles the limitations of VR settings. Using a virtual environment, their system builds an interactive workbench for examining different assembly options and teaching how to put them together [20]. In contrast to the above systems in the literature, our framework proposes wireless body-worn sensors for controlling 3D game objects, a deep learning-based approach for recognizing sports behaviors, and activity recognition for an indoor gaming activity that is used to predict the label of the considered game activity.

III. OUR APPROACH
This section elaborates on the proposed architecture for active monitoring of human sports-related activities and their conversion into IMU data for recognition in a gaming activity. Such recognition can be very helpful for demonstrating complex sports behaviors in VR, as well as for artificial intelligence-based gaming objects that must recognize a set of sports behaviors. Fig. 1 shows an overview of the system. As the figure shows, wearable sensors generate accelerometer data for a particular gaming activity performed by a human wearing the body-worn device. Publicly available benchmark datasets were used to evaluate the proposed system. The data were pre-processed, and the corresponding features were extracted. The well-known grey wolf optimization (GWO) approach was used for feature optimization over the extracted features. An RNN classifier was then applied to classify these optimized features and recognize the gestures performed by humans. Lastly, the predicted gesture was depicted in the VR game so that the user can play the game in first person in the VR world.

A. DATA PRE-PROCESSING
The ERICA dataset contains 3-axis accelerometer data, obtained by integrating an MPU6050 device with an Arduino. Each value in the data portrays a certain position in 3D space of the body part to which the body-worn device is attached. Hence, each value of the dataset is equally important in the proposed approach, but the data may contain irrelevancy, irregularity, inconsistency, and repetition that can affect the proposed model and generate false predictions [21]. This noise in the data needs to be removed before it is fed to the classifier. To reduce this noise, the data was divided into frames, which improves the quality of the data and enables signal enhancement for data filtering to identify undesired features. It also helps in avoiding irrelevancy, irregularity, inconsistency, and repetition issues [22]. Then, a 3rd-order median filter was used to cancel the significant noise artifacts. The median filter is a nonlinear filter that removes speckle noise from a given signal. It returns the median of the signal within a window of the required size and outperforms the low-pass filter because it reduces noise while keeping the original signal. The median filter is calculated as:

$$\tilde{X} = \begin{cases} X_{(n+1)/2}, & n \text{ odd} \\ \frac{1}{2}\left(X_{n/2} + X_{n/2+1}\right), & n \text{ even} \end{cases}$$

where n is the count of values and X is the ordered set of values in the window [23]. ERICA is a lightweight dataset containing sensor data from three different gym exercises. Every data value is important, and a small proportion of the data requires pre-processing, for which the median filter was applied. Fig. 2 shows the result of the median filter applied over the ERICA dataset in the form of filtered and unfiltered signals: the dotted line displays the filtered wave, and the solid line shows the unfiltered data.
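The filtering step above can be sketched as follows. This is a minimal illustration of a 3rd-order (window size 3) median filter, not the authors' exact implementation, and the sample signal is invented for demonstration:

```python
import numpy as np

def median_filter(signal, order=3):
    """Apply an order-N median filter to a 1-D signal.

    Each output sample is the median of `order` neighbouring input
    samples; edges are padded by repeating the boundary values.
    """
    half = order // 2
    padded = np.pad(signal, half, mode="edge")
    return np.array([np.median(padded[i:i + order])
                     for i in range(len(signal))])

# A single spike (speckle-like noise) is suppressed while the
# underlying step in the signal is preserved:
noisy = np.array([1.0, 1.0, 9.0, 1.0, 5.0, 5.0, 5.0])
print(median_filter(noisy))  # [1. 1. 1. 5. 5. 5. 5.]
```

Because the median of a window ignores isolated outliers, the spike at index 2 disappears while the level change from 1 to 5 survives, which is the property that makes the median filter preferable to a simple low-pass filter here.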

B. FEATURES EXTRACTION
After pre-processing, the data is subjected to feature extraction methods to collect unique features. These features are then passed to the feature optimization module for further processing. We utilized four different feature extraction methods: power spectral density (PSD), skewness, kurtosis, and a convolutional neural network (CNN).

1) POWER SPECTRAL DENSITY (PSD)
PSD determines the power of a signal as a function of frequency, expressed per unit frequency [26]. Watts per hertz (W/Hz) is a typical unit of measurement for PSD. The Fast Fourier Transform (FFT) is used to produce the Discrete Fourier Transform $X(w_i)$ of a signal, where $w_i$ gives the frequency point. The PSD is then estimated as:

$$S(w_i) = \frac{1}{N}\left|X(w_i)\right|^2$$

The average power $P_x$ can be expressed as $P_x = \int S(f)\,df$, where the function $S(f)$, which expresses the power in each infinitesimal frequency component, is referred to as the PSD [27].
PSD shows the energy of fluctuations relative to frequency; in other words, it indicates at which frequencies the variations are strong and at which they are weak. PSD is applied over the three columns of the dataset containing accelerometer data in the x, y, and z axes. These data are collected in the time domain; when PSD is applied over the ERICA dataset, unique features are extracted in the frequency domain. Fig. 3 elucidates the results, showing the signals' power versus frequency. This helps explain the distribution of the data across multiple frequency bands.
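As an illustration of extracting a frequency-domain feature from one accelerometer axis, the following sketch estimates a periodogram-style PSD with NumPy; the 50 Hz sampling rate and the synthetic signal are assumptions for demonstration, not values from the paper:

```python
import numpy as np

fs = 50  # assumed accelerometer sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
# synthetic one-axis accelerometer trace: a 2 Hz movement plus noise
x = np.sin(2 * np.pi * 2 * t) + 0.3 * np.random.default_rng(0).standard_normal(t.size)

# Periodogram estimate of the PSD: S(w_i) = |X(w_i)|^2 / (fs * N)
X = np.fft.rfft(x)
psd = np.abs(X) ** 2 / (fs * x.size)
freqs = np.fft.rfftfreq(x.size, d=1 / fs)

# The location of the spectral peak becomes a compact feature
peak = freqs[np.argmax(psd[1:]) + 1]  # skip the DC bin
print(f"dominant frequency: {peak:.2f} Hz")
```

The peak lands at the 2 Hz movement frequency, showing how a repetitive exercise motion produces a strong, easily extracted spectral feature.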

2) SKEWNESS
Skewness can be defined as a deviation from the normal distribution or symmetrical bell curve in a collection of data. If the curve is inclined towards the right or left, the data are skewed. Skewness can have zero, negative, positive, or undefined values [28], and is calculated as:

$$\text{Skewness} = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{X_i - \bar{X}}{\sigma}\right)^3$$

where N is the total sample count in the data, $X_i$ is the value of sample i, $\bar{X}$ is the mean, and $\sigma$ is the standard deviation. Skewness scores typically fall between −3 and +3.
If the skewness of a distribution is greater than 1 or less than −1, the distribution is strongly skewed; if it is between 0.5 and 1 or between −1 and −0.5, it is mildly skewed. The distribution is considered fairly symmetrical if the skewness is between −0.5 and 0.5 [29]. The skewness of the ERICA dataset is shown in Fig. 4.

3) KURTOSIS
The final feature extraction method used in our proposed system was kurtosis. It can be defined as the cumulative weight of a distribution's tails relative to its center. A set of essentially normal data, visualized as a histogram, reveals a bell-shaped peak with the majority of the data falling within three standard deviations (plus or minus) of the mean [30]. The mathematical formula for kurtosis is:

$$\text{Kurtosis} = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{X_i - \bar{X}}{\sigma}\right)^4$$

where N is the total sample count in the data, $X_i$ is the value of sample i, $\bar{X}$ denotes the mean, and $\sigma$ denotes the standard deviation. In statistics, kurtosis expresses how far a distribution's tails diverge from those of a normal distribution; it thus tells whether a particular distribution's tails contain extreme values [31].
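Both statistics can be computed directly from their defining formulas. The sketch below is a minimal NumPy illustration (the population standard deviation is used, matching the formulas above, and the toy signals are invented):

```python
import numpy as np

def skewness(x):
    """Third standardized moment: (1/N) * sum(((x - mean)/std)^3)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

def kurtosis(x):
    """Fourth standardized moment; equals 3 for a normal distribution."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4)

# A symmetric signal has zero skewness; a heavy right tail makes it positive.
symmetric = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
right_tailed = np.array([0.0, 0.0, 0.0, 0.0, 10.0])
print(skewness(symmetric), skewness(right_tailed))  # 0.0 1.5
print(kurtosis(symmetric))                          # 1.7
```

For accelerometer windows, these two scalars summarize the asymmetry and tail-heaviness of the acceleration distribution, complementing the frequency-domain PSD features.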

4) CONVOLUTIONAL NEURAL NETWORK
To extract features, we applied a CNN over the filtered data to collect the features, while a separate neural network classified them. The feature extraction network operates on the input data directly. A neural network is made up of three kinds of layers: input, hidden, and output. The neurons in a CNN are loosely analogous to biological neurons in the way they take input, analyze it, and pass on a response. The input layer accepts data arrays as input. CNNs may have several hidden layers that employ mathematical operations to extract characteristics from the provided data; examples include convolution, pooling, rectified linear units, and fully connected layers. Formally, the following formula was used to extract the key feature maps via a one-dimensional convolution operation:

$$a_j^{l+1}(\tau) = \sigma\!\left(b_j^l + \sum_{f=1}^{F^l}\sum_{p=1}^{p^l} K_{jf}^l(p)\, a_f^l(\tau - p)\right)$$

where $a_j^l(\tau)$ denotes feature map j in layer l, $\sigma$ is a non-linear function, $F^l$ gives the number of feature maps in layer l, $K_{jf}^l$ is the kernel convolved over feature map f in layer l to form feature map j in layer (l+1), $p^l$ is the length of the kernels in layer l, and $b_j^l$ is a bias vector [25]. When we pass the dataset columns to the CNN-based feature extraction method, we obtain unique features for the dataset's activities. The features extracted by the CNN on the ERICA dataset are demonstrated by three different colors in Fig. 6. The data pre-processing and feature extraction methods described above are shown in Algorithm 1.
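A minimal NumPy sketch of the one-dimensional convolution above follows; it uses the cross-correlation form common in deep learning libraries and randomly initialized kernels, so it illustrates the operation itself rather than reproducing the authors' trained network:

```python
import numpy as np

def conv1d_feature_map(a_prev, kernels, bias, activation=np.tanh):
    """One layer of 1-D convolution over multiple input feature maps.

    a_prev : (F, T) array  -- F input feature maps of length T
    kernels: (J, F, P) array -- kernel K_jf of length P for each output map j
    bias   : (J,) array
    Returns the (J, T-P+1) output feature maps a_j^{l+1}(tau).
    """
    J, F, P = kernels.shape
    T = a_prev.shape[1]
    out = np.zeros((J, T - P + 1))
    for j in range(J):
        for tau in range(T - P + 1):
            # sum over input maps f and kernel taps p
            out[j, tau] = np.sum(kernels[j] * a_prev[:, tau:tau + P])
        out[j] += bias[j]
    return activation(out)

# three accelerometer axes as input maps, two learned output maps
x = np.random.default_rng(0).standard_normal((3, 100))
k = np.random.default_rng(1).standard_normal((2, 3, 5)) * 0.1
features = conv1d_feature_map(x, k, bias=np.zeros(2))
print(features.shape)  # (2, 96)
```

Each output map is a weighted combination of all three axis signals within a sliding window, which is how the CNN layer detects short motion patterns regardless of where they occur in the frame.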

C. FEATURE OPTIMIZATION
Dimensionality reduction was employed next over the multiple datasets to divide and reduce the feature vectors into more manageable groups. A necessary step in the feature selection process of a predictive model is to shrink the feature array and use only the features that matter in a given case; fewer input variables reduce computational cost and can improve performance. Modern advanced feature selection techniques choose a subset of essential features utilizing the strength of optimization algorithms to improve classification outcomes [32]. Most optimization algorithms, including the genetic algorithm, rely on numerous controlling parameters that must be tuned for good performance. The optimization step of the proposed system uses the GWO technique, a recent meta-heuristic whose guiding premise is to model the cooperative hunting behavior of grey wolves in the wild. Compared to other techniques, GWO has a unique model structure. The goal of the GWO is to use population interaction to locate the best areas of a complicated search space [33]. The pack finds its prey by changing the positions of the individual agents with respect to the prey location as follows:

$$\vec{X}(t+1) = \vec{X}_p(t) - \vec{A}\cdot\vec{D}$$

where $\vec{X}_p$ is the prey position, $\vec{X}$ is the grey wolf position, t is the iteration, the dot operator denotes entry-wise multiplication, and $\vec{D}$ is defined as:

$$\vec{D} = \left|\vec{C}\cdot\vec{X}_p(t) - \vec{X}(t)\right|$$

where the coefficient vectors $\vec{A}$ and $\vec{C}$ are computed as:

$$\vec{A} = 2\vec{a}\cdot\vec{r}_1 - \vec{a}, \qquad \vec{C} = 2\vec{r}_2$$

where $\vec{r}_1$ and $\vec{r}_2$ are random vectors with ranges [0, 1] and $\vec{a}$ decreases linearly over the exploration and exploitation iterations; all wolves share the same value of $\vec{a}$. According to these calculations, a wolf can modify its location in the search area around its prey at any random time.
The entire pack hunts based on information provided by the alpha, beta, and delta wolves, who are aware of the whereabouts of the prey, as stated in the following:

$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}$$

where the best candidate positions $\vec{X}_1$, $\vec{X}_2$, $\vec{X}_3$ are calculated as:

$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1\cdot\vec{D}_\alpha, \qquad \vec{X}_2 = \vec{X}_\beta - \vec{A}_2\cdot\vec{D}_\beta, \qquad \vec{X}_3 = \vec{X}_\delta - \vec{A}_3\cdot\vec{D}_\delta$$

and $\vec{D}_\alpha$, $\vec{D}_\beta$, $\vec{D}_\delta$ are calculated as [34]:

$$\vec{D}_\alpha = \left|\vec{C}_1\cdot\vec{X}_\alpha - \vec{X}\right|, \qquad \vec{D}_\beta = \left|\vec{C}_2\cdot\vec{X}_\beta - \vec{X}\right|, \qquad \vec{D}_\delta = \left|\vec{C}_3\cdot\vec{X}_\delta - \vec{X}\right|$$

Fig. 7, Fig. 8, and Fig. 9 display the fitness value of the best solution against the number of iterations when applying GWO over the ERICA, IMSporting Behaviors (IMSB), and WISDM datasets, respectively.
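The GWO update rules above can be sketched as follows. This is a simplified illustration (the bounds, population size, iteration count, and test objective are arbitrary choices for demonstration, not values from the paper):

```python
import numpy as np

def gwo(objective, dim, n_wolves=20, iters=100, lb=-10.0, ub=10.0, seed=0):
    """Minimal grey wolf optimizer following the update rules above."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_wolves, dim))  # wolf positions
    for t in range(iters):
        fitness = np.array([objective(x) for x in X])
        order = np.argsort(fitness)
        # copy so the leaders stay fixed while positions are updated in place
        alpha, beta, delta = (X[order[k]].copy() for k in range(3))
        a = 2 - 2 * t / iters                 # a decreases linearly 2 -> 0
        for i in range(n_wolves):
            X_new = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A = 2 * a * r1 - a            # A = 2a.r1 - a
                C = 2 * r2                    # C = 2.r2
                D = np.abs(C * leader - X[i]) # D = |C.Xp - X|
                X_new += leader - A * D       # Xk = Xp - A.D
            X[i] = np.clip(X_new / 3, lb, ub)  # average of X1, X2, X3
    best = min(X, key=objective)
    return best, objective(best)

# sanity check on the sphere function, whose optimum is at the origin
pos, val = gwo(lambda x: np.sum(x ** 2), dim=5)
print(val)
```

As `a` shrinks, `|A|` falls below 1 and the pack shifts from exploring the search space to converging on the three leaders, which is the mechanism GWO uses to balance exploration and exploitation.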

D. RECURRENT NEURAL NETWORK
Classification of the interactions is carried out by an RNN classifier, which constitutes the last phase of the proposed system. The RNN is a fast, robust, and highly reliable neural network thanks to its unique internal memory [35]. Fig. 10 shows a visual representation of the RNN for the ERICA dataset, which has five hidden LSTM layers, one LSTM input layer, and an output dense layer. In an RNN, x(t) is the input at time step t; a one-hot vector x(1), for instance, may correspond to a word in a text. h(t) serves as the network's "memory" and represents the hidden state at time t; it is determined from the hidden state of the previous time step and the current input. The RNN has input-to-hidden connections, hidden-to-hidden recurrent connections, and hidden-to-output connections, parameterized by weight matrices U, W, and V, respectively. All of these weights (U, V, W) are shared over time. The network output is denoted o(t) [36]. The following set of equations models the RNN forward pass:

$$h(t) = \tanh\left(b + W h(t-1) + U x(t)\right)$$
$$o(t) = c + V h(t)$$
$$\hat{y}(t) = \mathrm{softmax}\left(o(t)\right)$$

These equations describe a recurrent network that converts an input sequence into an output sequence of the same length. The overall loss for a particular series of x and y values is the sum of the losses across all time steps. We assume that the vector of probabilities over the outputs is obtained by passing the outputs o(t) through the softmax function [37], and that the loss L is the negative log-likelihood of the true target y(t) given the current input. The algorithm for feature optimization and classification by RNN is shown below in Algorithm 2.
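The forward-pass equations can be illustrated with a vanilla RNN in NumPy. Note that the paper's model stacks LSTM layers, so this sketch only demonstrates the shared-weight recurrence through U, W, and V, with all layer sizes chosen arbitrarily:

```python
import numpy as np

def rnn_forward(x_seq, U, W, V, b, c):
    """Vanilla RNN forward pass:
        h(t)    = tanh(b + W h(t-1) + U x(t))
        o(t)    = c + V h(t)
        yhat(t) = softmax(o(t))
    """
    h = np.zeros(W.shape[0])
    outputs = []
    for x in x_seq:
        h = np.tanh(b + W @ h + U @ x)           # hidden state: the "memory"
        o = c + V @ h
        y = np.exp(o - o.max()); y /= y.sum()    # numerically stable softmax
        outputs.append(y)
    return np.array(outputs)

rng = np.random.default_rng(0)
n_in, n_hidden, n_classes, T = 3, 8, 4, 10       # e.g. 3-axis accelerometer frames
U = rng.standard_normal((n_hidden, n_in)) * 0.1
W = rng.standard_normal((n_hidden, n_hidden)) * 0.1
V = rng.standard_normal((n_classes, n_hidden)) * 0.1
probs = rnn_forward(rng.standard_normal((T, n_in)),
                    U, W, V, b=np.zeros(n_hidden), c=np.zeros(n_classes))
print(probs.shape)  # (10, 4): one class distribution per time step
```

Because U, W, and V are reused at every step, the parameter count is independent of the sequence length, which is what lets the same network handle activity windows of varying duration.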

IV. EXPERIMENTAL SETUP AND RESULTS
This section discusses the benchmark datasets used in our study, the experimental setup for the proposed system, the statistical evaluation, the results of implementing the proposed architecture, and a comparison of this work with other body-worn systems.
A. DATASETS DESCRIPTION
The first dataset used is IMSB, created by the Intelligent Media Centre, Air University, Islamabad. The IMSB dataset covers six different sports-related interactions: table tennis, football, cycling, badminton, basketball, and skipping. Three tri-axial accelerometers were attached to the knee, wrist, and lower-neck regions of the subjects. The dataset contains motion data from participants performing the six sports mentioned above, in 120 data sequences with exercise periods varying from 40 s to 60 s. A total of 20 subjects were engaged in repetitive behaviors. Fig. 11 shows the plots of raw data from three accelerometers in the x, y, and z coordinates for basketball and badminton behavior. The second dataset used is WISDM. The activities included in this dataset are running, sitting, standing, going up and down stairs, and so on. There are 1,098,207 total samples, including 424,400 walking samples, 342,177 jogging samples, 122,869 upstairs samples, 100,427 downstairs samples, 59,939 sitting samples, and 48,395 standing samples. Fig. 12 represents the walking and jogging behavior in the WISDM dataset. The third dataset utilized was ERICA, which was created for the automated tracking and analysis of exercise activities at the individual level. This dataset was acquired as part of the development of a low-cost, pervasive digital personal training system that combines affordable IoT sensors linked to dumbbells with personal wireless ear-worn devices (earables) to enable fine-grained tracking of a person's free-weight exercise training. A total of 324 samples from three separate free-weight workouts carried out by 27 subjects are included in this dataset. The activities performed are biceps curls, lateral raises, and triceps extensions. Fig. 13 shows the biceps curls and lateral raises behavior of the ERICA dataset.

B. EXPERIMENTAL SETUP
This section gives a brief description of the implementation of our proposed system. A 3D VR game is made in Unity3D; while playing, it is visible to the person on a screen attached to the VR headset and also on the PC through a screen-casting method [42].
A 6-DOF MPU6050 sensor is utilized for capturing the motion data of the human body during exergaming. An Arduino Nano reads the sensor data and sends it via an nRF24L01 to the receiving point, a computer, using serial communication. Further, User Datagram Protocol (UDP) communication was used to establish a connection over a specific IP-based path to convey data from host to destination; in our system, it establishes a channel between the computer and the VR game to convey the predicted results from the proposed model to the game [43]. The results are presented in the form of confusion matrices and a precision-recall table. All processing and experimentation were carried out on a Windows 10 computer running Unity3D and Python, with 16 GB of RAM and a Core i7-7500U CPU running at 2.70 GHz. An nRF24L01 and an MPU6050 on an Arduino Nano were utilized to create the body-worn device. Finally, using the IMSB, WISDM, and ERICA datasets, the suggested system's performance is compared to the precision of existing systems.
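A hypothetical sketch of the UDP hand-off from the Python model to the Unity game might look as follows; the port number and JSON message format are assumptions for illustration, as the paper does not specify them:

```python
import json
import socket

# Assumed address of the Unity game's UDP listener (not from the paper)
UNITY_ADDR = ("127.0.0.1", 5065)

def send_prediction(label: str, confidence: float, sock=None):
    """Serialize a predicted gesture label and send it to the game over UDP."""
    sock = sock or socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    msg = json.dumps({"gesture": label, "conf": confidence}).encode("utf-8")
    sock.sendto(msg, UNITY_ADDR)   # fire-and-forget datagram
    return msg                      # returned for inspection

print(send_prediction("badminton_smash", 0.93))
```

UDP's connectionless, fire-and-forget delivery keeps per-prediction latency low, which matters here because a stale gesture label is worse for gameplay than a dropped one.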

C. STATISTICAL EVALUATION
In this section, we go over the experimental findings of the suggested model on the publicly accessible IMSB, WISDM, and ERICA datasets and compare the results with other state-of-the-art methods.

1) IMSporting BEHAVIORS DATASET
For the IMSB dataset, confusion matrices are used to demonstrate interaction recognition for the different dataset types. A confusion matrix measures the effectiveness of a classifier on the basis of true positives, false positives, true negatives, and false negatives [44]; the number of true positives, i.e., the correctly detected classes, is represented on the matrix diagonal. Table 1 shows the confusion matrix over the IMSB dataset; a few interaction classes with related activity types are confused with each other. The mean accuracy achieved by applying the classifier is 85.01%. The recall, precision, and F1-score for the different classes of the IMSB dataset are shown in Table 2. Hence, an accurate system was developed that recognizes each game with high precision [45]. The results of the gaming interface using the IMSB dataset are presented in Fig. 14. Table 3 compares the classifier's results with other state-of-the-art methods developed over the IMSB dataset. One system used an artificial neural network along with feature extraction methods and achieved an accuracy of 82.83% [46]. Another system utilized a random forest algorithm and achieved an accuracy of 83.42% [47]. A third performed classification through an LSTM with multi-fused feature extraction and achieved an accuracy of 80% [48]. Lastly, a system using a multi-layer perceptron (MLP) achieved an accuracy of 75.90% [49]. The system proposed in this paper achieved an accuracy of 85.01%, outperforming all of these previously proposed systems.

2) WISDM DATASET
For the WISDM dataset, the RNN classifier over the optimized features produced the confusion matrix shown in Table 4. The results for the interaction classes were efficient and acceptable; a small amount of data from some interaction classes was confused, yielding a mean accuracy of 88.46%. Precision is the ratio of correct positive predictions to the total predicted positives, while recall (the true positive rate) is the ratio of correct positive predictions to the total actual positives. The F1-score is the harmonic mean of precision and recall [50]. The precision, recall, and F1-score for the classes of the dataset are given in Table 5. An accurate system was developed that can recognize each game with high precision. The results of the gaming interface using the WISDM dataset are shown in Fig. 15. Table 6 compares the RNN results over the WISDM dataset with other state-of-the-art models. The authors in [51] used a reweighted genetic algorithm and achieved an accuracy of 87.75%. Another system [52] utilized an MLP and achieved a 75.09% accuracy rate. Another applied classification through a CNN and achieved 75.90% accuracy [53], while the Hoeffding tree algorithm achieved an accuracy rate of 75.54% in another proposed method [53]. Lastly, a system utilized support vector machines and achieved 82.77% accuracy [38]. The proposed system in this paper outperformed these systems with an accuracy rate of 88.46%.
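The metrics above can be computed directly from a confusion matrix. The sketch below is a generic NumPy illustration with an invented two-class matrix, not the paper's results:

```python
import numpy as np

def prf_from_confusion(cm):
    """Per-class precision, recall, and F1 from a square confusion matrix
    (rows = true class, columns = predicted class). Assumes every class
    is predicted at least once, so no column sum is zero.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                      # correct predictions per class
    precision = tp / cm.sum(axis=0)       # TP / total predicted positives
    recall = tp / cm.sum(axis=1)          # TP / total actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = tp.sum() / cm.sum()        # diagonal mass / all samples
    return precision, recall, f1, accuracy

# toy two-class example
cm = [[40, 10],
      [ 5, 45]]
p, r, f1, acc = prf_from_confusion(cm)
print(acc)  # 0.85
```

Reading precision down the columns and recall across the rows makes it easy to see which confused class pairs (e.g. related sports motions) drag each metric down.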

3) ERICA DATASET
Because of its relatively lightweight nature, many classifiers generate efficient results over the ERICA dataset. The accuracy attained by the proposed system, using the RNN and optimized features on the ERICA dataset, is 93.18%. The confusion matrix over the ERICA dataset is shown in Table 7; the interaction classes achieved a mean accuracy rate of 93.18%, and despite the complexity of the activities, only a few activities are confused with others. The precision, recall, and F1-score for the recognized activities are given in Table 8.
The outcomes from the gaming interface over the ERICA dataset are shown in Fig. 16. The interface also illustrates the gaming object's gesture position, which provides information about the predicted gaming label. Table 9 compares the RNN results over the ERICA dataset with other state-of-the-art methodologies. According to the table, Radhakrishnan et al. used the ERICA dataset in their experiments and achieved 70.0% accuracy using a random forest classifier [41]. An accuracy rate of 81.7% was attained for identifying gym workouts while monitoring the leg muscles with a pressure-sensing system [54]. A filter-based sensor-fusion activity recognition system using the Kalman filter achieved 84.0% accuracy [55].

V. CONCLUSION
The proposed system effectively implemented a VR first-person game with an accurate deep learning-based gesture recognition system. It addresses a major problem, namely obesity caused by a lack of physical activity, particularly in the young generation. Three datasets were used to evaluate the proposed approach: the IMSB, WISDM, and ERICA datasets. First, each dataset was pre-processed by applying a third-order median filter. Next, features were extracted by four well-known techniques: power spectral density, CNN-based feature extraction, skewness, and kurtosis. Then, the feature sets were reduced through grey wolf optimization to obtain the optimized features. Further, the gestures were classified by applying the RNN classifier. After gesture prediction, the hardware was implemented using an Arduino and motion sensors, and the hardware and software components were integrated through serial communication. Extensive experiments performed over the three datasets demonstrated the effectiveness and efficiency of the system, which achieved remarkable results and outperformed the recognition accuracy of conventional state-of-the-art systems.
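The pre-processing and statistical feature-extraction steps summarized above can be sketched as follows; this is a minimal NumPy-only illustration, and the window size, synthetic signal, and FFT-based PSD estimate are assumptions rather than the authors' exact configuration (the CNN features and grey wolf optimization stages are omitted):

```python
# Minimal sketch of the median-filter pre-processing and the statistical
# features (skewness, kurtosis, power spectral density) named above.
# The synthetic signal and window size are illustrative assumptions.
import numpy as np

def median_filter(x, k=3):
    """Third-order (k=3) median filter over a 1-D signal, edge-padded."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([np.median(xp[i:i + k]) for i in range(len(x))])

def features(x):
    """Skewness, kurtosis, and a simple FFT-based power spectral density."""
    mu, sigma = x.mean(), x.std()
    skew = np.mean(((x - mu) / sigma) ** 3)
    kurt = np.mean(((x - mu) / sigma) ** 4)
    psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    return skew, kurt, psd

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 4 * np.pi, 128)) + 0.1 * rng.standard_normal(128)
clean = median_filter(signal)          # suppress spike anomalies
skew, kurt, psd = features(clean)      # statistical feature vector inputs
print(f"skewness={skew:.3f}  kurtosis={kurt:.3f}  PSD bins={len(psd)}")
```

In the full pipeline, such statistical features would be concatenated with the CNN-based features, reduced by grey wolf optimization, and then passed to the RNN classifier.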
As for limitations, the sensor must be calibrated; otherwise, it will generate wrong results. The sensor must also be placed at the correct body location to obtain the desired gesture result. The dataset was generated from exercises performed by healthy, young persons, so using the body-worn sensors on persons with disabilities may produce incorrect results. Activity labels might appear after a delay of a few seconds if multiple activities are performed within less than 2 seconds. A 5 V power supply is required for the sensor to operate normally.
In the future, we aim to increase the effectiveness of the proposed system by including new features and playable games. Additionally, we plan to create a jacket with body-worn sensors. We also hope to improve the system's precision and provide users with a better interface so that they can enjoy the virtual gaming experience.
MIR MUSHHOOD AFSAR received the B.S. degree in computer science from Air University, Islamabad. He is currently a Research Assistant with the Intelligent Media Centre. His research interests include machine learning, deep learning, camera and sensor-based gesture recognition, and virtual reality.
SHIZZA SAQIB received the bachelor's degree in computer science from Air University, Islamabad. She is currently a Research Assistant with the Intelligent Media Center. Her research interests include machine learning, deep learning, image processing, and virtual reality.
MOHAMMAD ALADFAJ is currently with the Department of Natural Engineering, College of Science, King Saud University, Saudi Arabia.
MOHAMMED HAMAD ALATIYYAH is an Assistant Professor of computer science with the Computer Science Department, Prince Sattam Bin Abdulaziz University, Saudi Arabia. His research interests include the recommender systems and computer vision, such as group recommender systems, travel recommender systems, and drone vision.
KHALED ALNOWAISER received the Ph.D. degree in computer science from Glasgow University, Scotland. He is an Assistant Professor with the Computer Engineering Department, Prince Sattam Bin Abdulaziz University, Saudi Arabia. His research interests include computer vision, optimization techniques, and performance enhancement.
HANAN ALJUAID received the B.S. degree from KAU University and the M.S. and Ph.D. degrees in computer science from UTM University, in 2014. She is currently with the Computer Sciences Department, College of Computer and Information Sciences, Princess Nourah Bint Abdul Rahman University (PNU), Saudi Arabia. Much of her work has been on improving the understanding, design, and performance of pattern recognition, mainly through the application of data mining and machine learning. She has given numerous invited talks and tutorials. She has published numerous articles in pattern recognition, the IoT, and data science. Her research interests include computer vision and NLP. VOLUME 11, 2023