A Survey on Yogic Posture Recognition

Yoga has been a great form of physical activity and one of the promising applications in personal health care. Several studies show that yoga is used as a physical treatment for cancer, musculoskeletal disorders, depression, Parkinson's disease, and respiratory and heart diseases. In yoga, the body should be mechanically aligned with some effort on the muscles, ligaments, and joints for optimal posture. Postural-based yoga increases flexibility, energy, and overall brain activity and reduces stress, blood pressure, and back pain. Body postural alignment is a very important aspect while performing yogic asanas. Many yogic asanas, including uttanasana, kurmasana, ustrasana, and dhanurasana, require bending forward or backward, and if the asanas are performed incorrectly, strain in the joints, ligaments, and backbone can result, which can cause problems with the hip joints. Hence it is vital to monitor correct yoga poses while performing different asanas. Yoga posture prediction and automatic movement analysis are now possible because of advancements in computer vision algorithms and sensors. This survey presents a thorough analysis of yoga posture identification systems using computer vision, machine learning, and deep learning techniques.


I. INTRODUCTION
Around the world, millions of people have died as victims of COVID-19. Heart disease and stroke have become the other major public health problems, and stroke can impair mobility while also being a leading cause of disability. Mental health issues are on the rise, with millions of individuals struggling with depression worldwide. Nowadays, poor nutrition, lack of physical activity, and obesity are the primary causes of numerous health problems.
Yoga has been used as a clinical therapy to improve immunity and assist in chronic conditions such as cardiovascular disease, respiratory disease, cancers, and metabolic disorders. Asanas, together with pranayama for breath control and dhyana for mind relaxation, enhance the innate immune system and can help prevent the onset of chronic arthritis. They can also help with the symptoms of chronic arthritis by enhancing joint mobility and microcirculation. These practices help to alleviate the psychophysiological impacts of prolonged stress. When practicing yoga, serotonin, oxytocin, and melatonin are released, which helps in coping with fear and anxiety during a pandemic [1].
Around 31 million adults in the U.S. practice yoga as a part of their lifestyle, and more than 21 million adults adopt yoga for its health benefits [2]. Yoga in the form of exercise unites the body and mind, and it also minimizes health issues and disease burden [3]. According to a recent analysis [4], yoga can help balance motor function along with physiological parameters, including blood pressure, pulse rate, and body weight. Yoga gives prominence to breathing control, mind control, body control, and posture maintenance, and these main components differentiate yoga from physical exercises. The three primary practices of yoga are Asanas, Meditation, and Pranayama. i) Asana is a Sanskrit word for body posture. It creates lightness in the physical body and corrects imbalances. Muscles, joints, the circulatory system, and the neural system also benefit from asanas. ii) Meditation improves a person's mental health. iii) Pranayama helps to regulate breathing. All the yoga postures are classified based on their utility and pre-position [5]. As per the utility, yoga asanas are classified into three categories: 1. Cultural Posture, 2. Meditative Posture, and 3. Relaxative Posture.
Cultural or Corrective Posture is used to regulate the defective body posture, systematize the different activities, and renovate energy in the body. Cultural postures are commonly performed as daily physical exercise. It is separated into seven categories i) Dynamic Sequences, ii) Inverted Postures, iii) Forward Bending Postures, iv) Backward Bending Postures, v) Twisting Postures, vi) Sideward Bending Postures, and vii) Standing Postures.
Meditative Posture is used to stabilize the body for some advanced pranayama and meditation practices. Siddhasana (Adept's Pose), Padmasana (Lotus Pose), Vajrasana (Diamond Pose), Sukhasana (Comfortable Pose), Swastikasana (Auspicious Pose), and Vrukshasana (Tree Pose) are some of the meditative asanas. Relaxative Posture is performed to recover from conditions like stress and the associated tension. This posture brings physical and mental relaxation. A few common relaxation postures include Makarasana (Crocodile Pose), Shavasana (Corpse Pose), and Balasana (Child's Pose) [6]. As per the pre-position, the major types of poses are standing poses, sitting poses, supine poses, prone poses, balancing poses, forward bend poses, backward bend poses, twisting poses, and inversion poses [7].
Pranayama is the art of deliberately altering one's breathing patterns, typically while seated, to include quick diaphragmatic breathing, slow, deep breathing, trying to breathe via alternate nostrils, and breath holding or retention.
Pranayama exercises four essential aspects of breathing: Pūraka (inhalation), recaka (exhalation), anta kumbhaka (retention of breath inside the body), and bahi kumbhaka (retention of breath outside the body) [8]. Figure 1 depicts the classification of asanas separated only based on utility and body positions.

A. HEALTH BENEFITS OF YOGA
Sun salutation is an ideal activity that produces optimal stress on the cardiovascular and respiratory systems because it combines the slow, dynamic, and static stretching elements of exercise. In order to examine the potential of sun salutation as an exercise and weight-loss tool, four rounds of it were assessed for their effects on the cardiorespiratory and metabolic systems. Six healthy Asian Indian males and females (18 to 22 years) who had practiced sun salutation for more than two years took part in this study. The volunteers were attached to a heart rate tracker and the Oxycon Mobile metabolic system to assess their heart rate and oxygen consumption during the four rounds of sun salutation. The assessment was conducted in a single session lasting around 30 minutes. The volunteers burned approximately 100 kcal every 15 minutes (400 kcal per hour) [9].
Yoga has been accepted as a physical exercise that is safe to practice following a stroke. According to participants, the physical advantages of yoga include ''increasing flexibility, strength, and coordination'' [10]. In another study, a two-week training program in an integrated set of yoga practices, comprising breathing techniques, sun salutation, yogasana (physical postures), pranayama (breathing), dhyana (meditation), and a devotion session, was taught to fifty-three asthmatic patients. They were instructed to perform these practices for sixty-five minutes each day. They were compared with a control group of fifty-three asthmatics, matched for age, gender, and type and severity of their condition, who kept taking their regular medications. Regarding the frequency of asthma attacks per week, drug treatment scores, and peak flow rate, the yoga-practicing group showed a noticeably greater improvement [11].
This controlled trial demonstrated that Sahaja yoga has minimal positive effects on several objective and subjective indicators of the impact of asthma in patients willing to undertake nonpharmacological therapy. Compared with the control group, participants in the yoga group scored marginally higher on the asthma-related quality-of-life mood subscale and had higher peak expiratory flow values. Possible reasons include a change in the passage of ''vital energy,'' as defined by the traditional yogic system, or a change in the dynamics of airway muscle cells [12].
This trial examines the effects of a single session of yoga on cardiovascular physiology and finds that, compared with simply sitting on a chair at rest, yoga raises oxygen consumption (VO2), Metabolic Equivalents (METs), Heart Rate (HR), and percent Maximal Heart Rate (MHR) by 0.35 L/min, 1.67, 20 beats per minute (bpm), and 11%, respectively [13].
This study examines heart rate differences while practicing yoga postures, deep breathing, and relaxation techniques. Sixteen volunteers were instructed through three distinct styles of yoga practice. Polar S610 heart rate monitors measured the one-minute mean heart rate in every practice session. A repeated-measures analysis of variance found no substantial variation among the three yogic styles in either the initial or the final resting heart rate, measured in the fourth and seventy-eighth minute of every session. A substantial variation was discovered between the early and last relaxing postures during the whole 80-minute practice as well as throughout the ''postures only'' phase of the session. Compared to the other two forms of yoga, Bonferroni post hoc tests found the Ashtanga yoga heart rate to be significantly different. There was no discernible variation between hatha yoga and gentle methods. These findings reveal that various yogic exercises might have different health advantages [14].
Participants graded yoga's impact on seven health-related factors, including lifestyle, mental, and physical components. The majority said they strongly agreed with or agreed that yoga had enhanced their strength (87.1%), flexibility (91.6%), stress (82.6%), and mental stability (86.2%). In addition, 57.4% of people reported sleeping better, and 69.3% mentioned their yoga practice had improved the way they lived [15].
This study reveals that in all suspected and proven cases of COVID-19 (stages 1 and 2), patients need to practice relaxation exercises like breathing exercises, meditation, and yoga to manage their stress [16]. The availability of internet platforms that give remote fitness classes such as yoga, Pilates, and workouts, as well as suitable activities that can be completed at home or in a quarantine room, provides an opportunity for those people to maintain or engage in physical activity [17]. Yoga, healthy fresh fruits, vitamins C and D, and zinc boost the immunity of the affected [18]. Sun salutation can help to manage weight and maintain or increase cardiorespiratory fitness [9].
During the practice, better monitoring of yogic posture is mandatory to gain all the health benefits.

B. YOGA RELATED INJURIES
Yoga-related injuries occur for several reasons, including poor instruction, improper posture, self- or instructor-imposed pressure to perform specific complex postures, and a lack of proper guidance in yoga.
Muscular injuries are the most common during any exercise or activity, and yoga is no exception. Sixty-two percent of survey respondents reported at least one musculoskeletal injury that lasted over a month. There were 107 reports of muscular injuries in total during the practice of Ashtanga Vinyasa Yoga. New practice-related injuries occurred at a rate of 1.18 per 1,000 training hours. When recurrences of preceding injuries and non-specific back pain of unidentified origin were added, the injury rate increased to 1.45 per 1,000 training hours. Injuries during yoga were more common in the lower extremities, specifically the hamstrings and knees [19].
The following case studies reveal that an improper stance in a yogic posture can become a health issue during practice. A 15-year-old girl suffered a fracture-separation of the epiphyseal plate of the distal tibia while practicing the lotus yoga posture. In the lotus posture, the ankle must be held in a supinated-inverted position with the foot on the opposite thigh. This posture involves the entire outer compartment of the tibiotarsal joint, including the tibiofibular ligament, and places maximal force on the insertion of the anterolateral region of the epiphysis [20]. A 38-year-old healthy yoga practitioner suffered a low-energy femoral shaft fracture while practicing a specific yogic stance, marichyasana pose B. This posture requires bending the hip and knee to bring the foot into the opposite inguinal crease [21].
Their case report suggests that practicing yoga while using sedative drugs, being elderly, or having ''benign hypermobility'' of the connective tissue requires extra caution. These individuals risk suffering sciatic nerve damage while holding complex and challenging yoga postures for an extended amount of time [22]. Information released on particular injuries linked to yoga has frequently come from relatively small case studies, often focusing on a single person who sustained an injury as an outcome of extreme activities well outside the norm for a yoga session or because of the participant's underlying medical condition [23]. Injuries in yoga are not rare; however, the most frequent adverse effects are musculoskeletal, primarily minor ligament or muscle injuries which heal completely without intervention [21]. In this study, researchers discovered that the incidence of severe injuries in yoga was low compared to certain other physical exercises and that the number of injuries reported by practitioners for each year of practice exposure is low, confirming that yoga is not a high-risk activity. Despite the limits of yogic research studies, the available evidence indicates that participants gain a wide range of wellness and health improvements [23].
Optimal posture is essential to get all the health benefits of yoga; otherwise, poor postures result in minor injuries while practicing yoga.

C. THE ROLE OF COMPUTER VISION IN YOGA
Yoga is widely practiced in yoga centres under the proper guidance of a yoga instructor, who can effectively and thoroughly instruct the practitioner.
The COVID-19 outbreak and its associated lockdown measures have disrupted the work-life balance. Due to global restrictions, individuals were forced to stay in their homes for days or weeks. Meanwhile, all gyms, yoga centres, and sports centres were closed. This has put enormous stress on people's physical and mental health.
Recent advances in vision- and sensor-based methods suggest a remote, trainer-free yoga approach, in which the individual can stand before a device and precisely practice yoga postures without the need for an instructor or even a yoga learning centre [24]. Researchers focus on creating novel digital systems based on computer vision methods to process, analyze, and sense visual information (such as images or videos) from cameras or sensors.
With recent technological advancements in sensors and the emergence of kinetic devices, it is now easier to detect human poses in real time. Human pose estimation is one such technique that is widely utilized for real-time yogic posture detection around the world. It focuses on reconstructing and understanding the asana's posture from depth images.

D. HUMAN POSE ESTIMATION (HPE)
Real-time HPE is a promising and significant research challenge in computer vision. The HPE's core concept is to detect and evaluate human posture [52], [53]. In HPE, the problem of localizing the keypoints or body parts of the human is mainly focused on [54].
HPE automatically discovers the keypoints of human body parts in images and videos. In detecting a single person in an image or video in simple scenarios, a single-person posture estimation algorithm is used to detect the keypoints of a specific posture. If more than one person is detected from the image or video, then multi-person algorithms are employed to detect the keypoints in human body parts [55].
1) Single-Person Pose Estimation: It is used to estimate the stance of a single person from an image or video. Pictorial structure models are conventional methods of this approach; for example, tree and random forest (RF) models are among the most efficient in single-pose estimation. Deep learning techniques have recently been very convincing in object/face detection and HPE [56]. 2) Multi-Person Pose Estimation: Most of these algorithms use a top-down approach, first detecting each person in the image and then extracting the posture of every person [57]. The two major families of multi-person posture estimation algorithms [58] follow either a bottom-up or a top-down approach. On the contrary, all single-person posture estimation approaches are top-down [59].
• Bottom-Up Approach: This approach first detects all body parts in the image and then associates the parts belonging to each distinct person [54], [66], [67], [68], [69]. 3) 2D Posture Estimation: In two-dimensional posture estimation, only the X and Y coordinates of each landmark in the image are predicted. It provides no information regarding the angles of the skeleton. 4) 3D Posture Estimation: In three-dimensional posture estimation, the X, Y, and Z coordinates of each landmark and the angles of each body joint of the human skeleton are estimated.
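As a concrete illustration of 2D keypoint extraction, the sketch below returns normalized (x, y) landmark coordinates with a per-keypoint confidence value for a single person in an image. MediaPipe Pose is used here purely as an example estimator, and the input file name is a hypothetical placeholder; the surveyed works rely on a variety of tools (OpenPose, PoseNet, Kinect SDKs, and others).

    # Minimal single-person 2D keypoint extraction sketch (assumes MediaPipe Pose).
    import cv2
    import mediapipe as mp

    def extract_keypoints(image_path):
        """Return a list of (x, y, confidence) tuples normalized to [0, 1]."""
        image = cv2.imread(image_path)
        rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        with mp.solutions.pose.Pose(static_image_mode=True) as pose:
            result = pose.process(rgb)
        if result.pose_landmarks is None:          # no person detected
            return []
        return [(lm.x, lm.y, lm.visibility)
                for lm in result.pose_landmarks.landmark]

    keypoints = extract_keypoints("tree_pose.jpg")  # hypothetical input image
    print(len(keypoints), "keypoints detected")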

E. HUMAN BODY MODELING
One of the most important aspects of identifying human posture is the choice of a human body model. A body model contains information like human shape and texture [43]. Analysis of the human body model gives all the essential information about the body. All the significant aspects of the human body model include points, segments (each segment is considered a rigid body), and segment groups linked to one another via joints. Human body models are separated into the following categories based on how frequently they are used [70]. Figure 2 shows the three human body models.

1) SKELETON-BASED MODEL
This model is also referred to as the kinematic model [71]. This human body model is simple and adaptable enough to be used in 2D and 3D posture estimation [54], [72]. It represents the set of joints, such as shoulders, elbows, knees, and ankles, and the limb orientations encompassing the skeletal structure of a human body. While the advantages of this model include its flexibility and ease of representation, it also has certain disadvantages; for example, the lack of texture information implies a lack of width and contour details of the human body [70].
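In code, a kinematic model reduces to a set of named joints plus the limb connections between them; the sketch below is a minimal, illustrative (roughly COCO-style) example under that assumption, not the joint map of any particular system surveyed here.

    # Illustrative skeleton-based (kinematic) body model: named joints + limb connections.
    JOINTS = ["nose", "neck", "r_shoulder", "r_elbow", "r_wrist",
              "l_shoulder", "l_elbow", "l_wrist", "r_hip", "r_knee",
              "r_ankle", "l_hip", "l_knee", "l_ankle"]

    LIMBS = [("neck", "r_shoulder"), ("r_shoulder", "r_elbow"), ("r_elbow", "r_wrist"),
             ("neck", "l_shoulder"), ("l_shoulder", "l_elbow"), ("l_elbow", "l_wrist"),
             ("neck", "r_hip"), ("r_hip", "r_knee"), ("r_knee", "r_ankle"),
             ("neck", "l_hip"), ("l_hip", "l_knee"), ("l_knee", "l_ankle")]

    # A 2D pose is then a mapping joint -> (x, y); a 3D pose simply adds a z coordinate.
    pose_2d = {name: (0.0, 0.0) for name in JOINTS}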

2) CONTOUR-BASED MODEL
This model is also referred to as the planar model. It depicts the human body shape by encircling it with a contour [73]. In this model, multiple rectangles approximating the person's body contours represent the human body parts, and it is used for 2D pose estimation [74]. The most widely used contour-based models are the active shape model and the cardboard model [75], [76], [77], [78], [79].

3) VOLUME-BASED MODEL
It is used to estimate 3D poses. Various well-known 3D human body models are used for deep learning-based three-dimensional postural measurement and 3D human mesh extraction [74]. One such model was trained using 60,000 human scans from the GHS3D dataset, which includes full-body scans as well as close-ups of the face and hands [80].
A typical yoga posture recognition system operates in four stages. In the first stage, it collects the user's image or real-time video of yoga postures using various cameras and sensors. The second stage applies a keypoint estimation method to the real-time video stream to automatically extract the user's body posture while performing asanas. The third stage uses machine learning and deep learning approaches to extract essential features of human body posture in images and videos, manually or automatically. The fourth and final stage predicts the asana and analyses the yogic posture performed by the user.

G. PAPER COLLECTION PROCESS
In this process, one of our primary paper collection sources is Google Scholar; in addition, we utilized IEEE Xplore and PubMed. We used nineteen keywords for searching references in all three databases, separated into five groups. General searching keywords are in groups 1 and 2, the types of sensors utilized are considered in group 3, the keypoint detection tools and libraries utilized are mentioned in group 4, and the learning methodologies utilized are considered in group 5. Table 1 illustrates those five groups. We used three combinations of the five keyword groups in the paper selection: combining groups 1 and 2 with group 3, then groups 1 and 2 with group 4, and groups 1 and 2 with group 5. For example, combined keywords include ''Sensor-based yoga posture classification'', ''Yoga posture recognition using Kinect'', and ''Yogic posture prediction using machine learning''. The process of searching and gathering research articles for YPR was carried out from May 2021 onwards.
The eligibility criteria for selecting the collected papers for evaluation: • It should be a journal or conference paper published in a reputed venue.
• It should use motion capture sensors or wearable sensors for data collection.
• It must deal with yogic posture analysis.
• It should propose a novel methodology and produce promising results.
In total, forty-three articles were selected after evaluation for this review of YPR. Figure 4 represents the number of selected papers year-wise.

H. TAXONOMY
This research has revealed a comprehensive study of the existing approaches to yoga posture prediction, classification, and grading.
Our main contributions to this survey: • This paper enumerates the benefits and injuries related to yoga.
• This paper includes a detailed review of the growth of vision-based and sensor-based devices used in YPR. • This paper compares the use of wearables in body vital monitoring with a few proposed methodologies.
• This paper briefly describes the feature extraction techniques and keypoint detection methodologies utilized in YPR.
• This paper compares the features of keypoint estimation libraries and the three versions of Kinect.
• This paper provides a detailed review of utilized machine learning and deep learning models in YPR.
• This paper specifies the major performance evaluation metrics utilized for YPR and analyses the performance of popular YPR algorithms in machine learning, deep learning, and hybrid learning models.
• This paper specifies the modern approaches to yoga pose classification and yoga pose grading methodologies in recent years.
• This is the first comprehensive survey on yogic posture recognition and grading techniques using machine learning methods, to the best of our knowledge. • This paper might be helpful for those who are working on machine learning and deep learning-based yoga posture identification in real time and can guide future researchers in the right direction.
This review is organized as follows: Section II enumerates various sources of data based on vision and sensors, Section III explains the several keypoint detection techniques used in pose estimation, Section IV illustrates the learning models and their evaluation metrics used in yoga posture recognition, Section V specifies the yoga pose prediction approaches, Section VI discusses the inferences of yoga posture recognition systems, and finally, Section VII mentions the future directions and conclusion of this review paper. Figure 5 illustrates the taxonomy of this review.

II. SOURCE OF DATA
Pose Recognition methods are classified into two main categories based on the generated data.

A. VISION BASED
A vision-based pose analysis has the promising ability to provide inexpensive and feasible solutions in the estimation of the human body pose using cameras.

1) CAMERAS
The most basic and classic approach to action identification is to install security cameras within the venue and observe human activities. The information can be reviewed either manually (by a user reviewing all images and videos) or automatically. Computer vision methodologies have been proposed to process and analyze this information and distinguish activities automatically [117]. Vision-based pose recognition systems use cameras, such as RGB cameras, web cameras, and mobile cameras, as their sensors and analyze human motion in images or videos [118], [119]. This approach is inexpensive as it makes use of computer vision technology, which is less costly than specialized sensors. Furthermore, it may run on a CPU-only system using a camera and does not require a GPU for computing, keeping it inexpensive. It does not require any special sensors or advanced technologies to operate, reducing operating costs and making it available to a wide range of people [101].

2) DEPTH CAMERAS
One disadvantage of traditional cameras is their dependency on lighting, which means they will never be able to work in a dark environment. Depth cameras, like Kinect, could operate in dark places, effectively solving this problem. Kinect  produces a wide variety of streaming data, including color, depth, and audio [81], [82], [83], [84], [85], [86], [87].
It can collect more data about the human body and produce an accurate virtual skeleton. Activities can be identified using this knowledge because distinct activities are associated with distinct motions of the body (particularly the skeleton). Apart from the processing complexity, depth-sensing devices are expensive, which is a barrier to using them for activity recognition [117].

B. SENSOR BASED
Many tools are available for yoga posture recognition, including vision-based marker, markerless, and wearable sensors. Vision-based sensors rely on the presence of an infrastructure in the area to be assessed. As a result, this remedy is intrusive for continuous monitoring of everyday actions. Furthermore, vision-based recognition systems are expensive. Wearable (motion) sensors are therefore considered mainly for practical reasons. Following that, innumerable acceleration sensors, gyroscopes, and force sensor nodes have been developed. Wearable sensors are small, inexpensive, and lightweight and are used to acquire data without interfering with daily activities [120]. Figure 6 specifies the sensors utilized in YPR.
Wearables have had a massive effect on healthcare, an area where conventional medical systems are migrating to active models that take better care of a patient's medical status through continuous monitoring in order to diagnose illness at an early stage. Wearables are tiny, portable medical tools that provide immediate access to a patient's health status, often in time to protect life. A wearable device makes it easier to react quickly to changes in the patient's body. Real-time data can be gathered from a wearable device, and findings can be obtained using machine learning algorithms. The user data collected by wearables is then analyzed using machine learning algorithms [121].
Most wearables are marketed toward consumers and monitor body vitals and motion. They can also monitor and identify arrhythmia and aid in exercise and recovery. Wearable gadgets targeted at healthcare professionals, including ECG patch recorders, vests, fitness trackers, smartwatches, and smart clothing with built-in sensors, improve prognostication and early identification of severe decompensation. Wearable technology is developing very quickly. Decisions on the treatment of cardiac failure can benefit from the instruments' increasing precision. The advancement of wearable technologies and information interfaces between consumers, patients, and medical professionals is likely to support healthier lifestyles and disease prevention [122]. Nadi X yoga pants were created for people who seek a step-by-step tutorial on how to perform a yogic posture, along with advice on where to concentrate and an evaluation of the posture's effectiveness [123]. Nadi X yoga pants eliminate the need for an instructor to monitor yogic stances during practice. They give gentle vibration feedback about the optimal alignment for a specific posture during yoga practice.
According to the healthcare global market report in 2022, the global internet of things (IoT) healthcare market is anticipated to increase from $130.26 billion in 2021 to $158.03 billion in 2022, representing a compound annual growth rate (CAGR) of 21.3%. With a CAGR of 22.4%, the worldwide IoT healthcare market is expected to reach $354.66 billion in 2026. The growing use of smart devices and wearables will propel the IoT healthcare market. For instance, the Times of India, an Indian newspaper, reported in April 2022 that there were just over 200 million smart gadget users in India in 2019; the IoT revolution, as well as the rapid digitization caused by the pandemic, increased the number of users to over 2 billion in 2021 [124].
To recognize their activities, people wear various sensing devices in different places: Sensewear, ActiGraph on the right and left wrist, and ActivPal on the right hip [125]. According to the rapid development in sensor technology, sensor-based pose recognition is prevalent and widely used in many areas, such as medical and healthcare applications. They provide accurate information on behavioural activities that promote a healthy and safe way of life [119], [126].
In this pandemic, citizens engage in self-tracking, self-care, and health self-reliance using wearable sensors. Wearables have built-in sensors that can collect body temperature, heart rate, oxygen level, blood pressure, respiratory rate, stress level, and sleep levels through a user interface that interacts with algorithms that extract and process personal health data. Wearables encourage 'do-it-yourself' healthcare, predicting health issues at an early stage and supporting health improvements through specific exercises or daily activities. In an emergency, wearables alert the medical authorities and the concerned family members with a notification message containing the patient's current medical status and exact location. Intelligent yoga mats could be the most exciting technological advancement for yoga practitioners; the motion sensors in the mat are linked to smartphones or computers and provide real-time feedback about the posture, significantly enhancing yoga practice. Table 2 compares the features of wearable devices utilized in body vital monitoring with a few existing approaches during exercise or day-to-day activities.
Due to the lack of proper standardization and efficient interoperability practices, the tight coupling between multiple systems offered by various suppliers continues to be one of the biggest problems of wearables [137]. In contrast to wearables in table 2, the health monitoring approach [136] contains an integrated framework providing all these features, including automated heart rate monitoring, stress analysis, muscle activity analysis, fall prevention, fall detection, an alarm system to seek help in an emergency, and data security.

1) ACCELEROMETER
Wearable accelerometers depend on triaxial Micro-Electro-Mechanical Systems (MEMS), which measure capacitance change. Accelerometers in wearables typically have a resolution of 14-16 bits, with full-scale acceleration ranging from 20 to 160 meters per second squared. Accelerometers have the lowest energy usage of any MEMS motion detector, consuming only a few tens of microwatts on average. The accelerometer also measures gravity, which is suitable for absolute orientation detection and gait phasing with zero velocity update (ZUPD). Linear displacement computation, on the other hand, necessitates double integration of the accelerometer output signal. Physical sensor nonlinear effects generate bias error, which quickly accumulates if ZUPD or other methods are not used to compensate, and this is the fundamental impediment to dead-reckoning motion tracking. Due to the finite sampling rate, the sensor produces an error in the motion measurements that is proportional to the speed and duration of quick movements [120]. Accelerometers are superior to pedometers and actometers in that they react to both the frequency and the severity of motion; in contrast, pedometers and actometers are attenuated by contact or tilting and only record body movement if a specified threshold level is exceeded [89].
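The bias-accumulation problem described above can be made concrete with a toy numerical example; the sampling rate and bias value below are illustrative assumptions, not figures from the cited works.

    # Toy sketch: a constant accelerometer bias grows quadratically once the
    # signal is double-integrated to displacement (dead-reckoning drift).
    import numpy as np

    fs = 100.0                       # sampling rate in Hz (assumed)
    t = np.arange(0, 10, 1.0 / fs)   # 10 seconds of samples
    true_accel = np.zeros_like(t)    # the subject is actually stationary
    bias = 0.05                      # small constant sensor bias in m/s^2
    measured = true_accel + bias

    velocity = np.cumsum(measured) / fs          # first integration
    displacement = np.cumsum(velocity) / fs      # second integration
    print(f"apparent drift after 10 s: {displacement[-1]:.2f} m")   # roughly 2.5 m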
While practicing yoga, the body must be in proper posture, which can be assessed using an accelerometer or gyroscope sensor. Accelerometers are currently the most often used sensors due to their versatility, mobility, and ease of use. Accelerometer sensors offer kinematic data based on acceleration measurements. Characteristics of accelerometer signals are utilized to discriminate between the relative movement of body parts for every transition pose and to examine the smoothness with which exercise was executed [90].

2) GYROSCOPE
A gyroscope is a triaxial MEMS device that measures the angular motion of an object, such as a body part. A gyroscope operates on the Coriolis principle, which derives angular motion from linear movement. Modern gyroscopes have typical resolution and sampling rates equivalent to accelerometers; the maximal angular speed is roughly 1000-2000 degrees per second, while energy usage is an order of magnitude greater. Gyroscope sensors can be mounted on several human body regions, including the ankle, foot, waist, and knee, to monitor body pose and kinematic movements. Gyroscopes have higher bias drift than accelerometers, but their measurements are less sensitive to shocks and to the gravitational field [120].
With sensors like accelerometers and gyroscopes in wearables and smartphones, exercises or physical activities can be easily tracked by capturing the user's body motions. The smartphone collects the 6D sequence of time-stamped sample data equivalent to the 3-axis accelerometer, and gyroscope [91].

3) MAGNETOMETER
A magnetometer is a device that detects the magnetic field's direction, strength, and changes. In wearable technology, Hall-effect sensors detect the Earth's magnetic field. Magnetometers can also help determine the user's absolute alignment for posture recognition. Micromechanical magnetometers typically have lower sampling rates and signal-to-noise ratio (SNR) resolutions, around 10-100 Hz and 8-12 bits. As a result, the magnetometer is used as an assistive motion-sensing component [120]. Magnetometers are one of the few sensing devices that are unresponsive to acceleration and can provide absolute information regarding alignment in 3D space [89].

4) INERTIAL MEASUREMENT UNIT (IMU)
The IMU is a sensor unit that uses a combination of an accelerometer, a gyroscope, and a magnetometer to detect the user's linear acceleration, angular speed, orientation, and the force of gravity. A Mahony filter with an additional Kalman filter is commonly utilized for sensor data fusion in triaxial inertial measurement units. Some devices, such as Bosch Sensortec chips, directly output orientation as quaternions at a sample rate of 100 Hz [120].
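To give a flavour of sensor fusion without reproducing the Mahony or Kalman filters mentioned above, the sketch below uses a much simpler complementary filter for a single (pitch) angle; the samples and the blending factor are illustrative assumptions only.

    # Simplified complementary-filter sketch of accelerometer/gyroscope fusion.
    import math

    def fuse_pitch(pitch_prev, gyro_rate, accel_y, accel_z, dt, alpha=0.98):
        """Blend the integrated gyroscope rate with the gravity-based estimate."""
        pitch_gyro = pitch_prev + gyro_rate * dt          # short-term, drifts
        pitch_accel = math.atan2(accel_y, accel_z)        # long-term, noisy
        return alpha * pitch_gyro + (1.0 - alpha) * pitch_accel

    pitch = 0.0
    # hypothetical stream of (gyro_rate_rad_s, accel_y, accel_z) samples at 100 Hz
    for gx, ay, az in [(0.01, 0.0, 9.81), (0.02, 0.5, 9.79), (0.00, 1.0, 9.76)]:
        pitch = fuse_pitch(pitch, gx, ay, az, dt=0.01)
    print(f"estimated pitch: {math.degrees(pitch):.2f} deg")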
Human motion assessment is now performed using various technologies, including infrared optoelectronic systems (OMC), Magneto-Inertial Measurement Units (MIMUs), and camera-based systems. OMCs are the ''gold standard'' for motion tracking. Even though these processes have high accuracy, they require a secure environment and specialized skills, making them impractical for outdoor environments [138].
MIMUs were one of the most promising motion capture technologies and standards because of their smaller size and low cost. The procedures used to extract orientation information using MIMUs are known as ''sensor fusion'' algorithms that integrate information from several sensors to provide an accurate estimation of kinematic features like joint angles. They could also be used in outdoor applications, but their kinematic estimation accuracy is lower than OMCs. As a result, IMUs could be considered a suitable choice [138].
Among them, Kalman filters are one of the most dependable, efficient, and durable sensor fusion algorithms. Human kinematic assessment in indoor circumstances can utilize an IMU-based Extended Kalman filter. In one study, four IMUs were mounted laterally on the right upper limb and trunk using elastic bands, and the IMUs and reflective markers were braced on 3D-printed plastic supports. Five qualified yoga trainers tested the system during the execution of the sun salutation series. By comparing the joint angle predictions with the results collected from the optoelectronic reference system, Pearson's correlation coefficients and Mean Absolute Errors (MAEs) were measured [138].
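The two agreement metrics used in that comparison are straightforward to compute; the sketch below does so for two made-up joint-angle series (the values are placeholders, not data from [138]).

    # Agreement between IMU-estimated and reference joint angles: Pearson's r and MAE.
    import numpy as np

    imu_angle = np.array([10.2, 35.1, 61.0, 89.4, 60.8, 34.0, 11.1])   # degrees
    ref_angle = np.array([10.0, 34.5, 60.0, 90.0, 61.0, 35.0, 10.5])   # degrees

    pearson_r = np.corrcoef(imu_angle, ref_angle)[0, 1]
    mae = np.mean(np.abs(imu_angle - ref_angle))
    print(f"Pearson r = {pearson_r:.3f}, MAE = {mae:.2f} deg")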
Gupta and Gupta [91] proposed a yoga assistance system with sensor units mounted on various body parts to correct yoga postures with feedback. Each sensor unit comprises a Node Micro Controller Unit (NodeMCU) module and motion sensors, namely an accelerometer and a gyroscope. This module serves as a control unit and transmits the sensor information via Bluetooth. The nine-axis IMU sensor (MPU9250) provides an accelerometer and gyroscope with a sampling rate of 50 Hz. They also created a mobile application that makes data collection fast and easy on the smartphone. Furthermore, a feedback report portraying the accuracy level is prepared and sent back to the trainee's mobile phone. This system predicts and evaluates the sun salutation yoga posture sequence using sensors and a deep neural network.
An interactive yoga training model with motion replication is used in virtual reality. Their system uses sixteen IMUs and six tactors to capture the user's body posture. It compares and analyzes the yoga postures of experts and practitioners and gives feedback to the users for correcting the yoga posture [139]. The IMU seems to be the most common and accurate wearable device for constructing movement analysis applications due to its small size and integrated sensor fusion implementations [120].

5) ELECTROMYOGRAPHY (EMG)
An EMG evaluates muscular actions such as voluntary or involuntary muscular contractions. It can reveal muscular disorders, nerve disorders, and nerve-muscle transmission issues, all of which cause locomotion difficulties. The sensor's EMG electrodes record the electrical signals produced by muscular contractions. These signals can then be analyzed to identify anomalous behaviour after they have been acquired. Electromyographic sensors employ two different types of electrodes: needle-like invasive electrodes for deep, high-sensitivity readings, and non-invasive, lower-sensitivity surface (skin) electrodes. Surface EMG (sEMG) analysis can examine various gait-related characteristics such as muscular properties, paresis, rigidity, and stress [120].
Myoware is often used to monitor muscular activity through electric potential, generally known as sEMG, which has already been utilized in clinical research and diagnosis of neuromuscular disorders. EMG sensors, however, have also found their way into robotic systems, prosthetic devices, and other control devices as microcontrollers and integrated circuits have become much more powerful [113].
However, accelerometers have intrinsic difficulties distinguishing between passive and active movement performance. sEMG sensors, on the other hand, are sensitive to muscular contraction and hence aid in distinguishing between active and passive movements. sEMG analysis is gaining significance in sports and is now the most practical method of determining how hard a muscle is working (although it has some limitations) [90]. A posture recognition system was used during yoga sessions to ensure that specific lower-limb muscle movements are accurate. To collect data, ten people participated in this study and performed five yoga poses. The technique analyses the movement of four lower-limb muscles in both legs using the EMG signal output. The analog-read function of Arduino, a simple open-source platform that relies on hardware and software, provides data-gathering EMG values ranging from 0 to 1,023. The Myoware muscle sensor is a preconfigured printed circuit board that includes the circuitry needed to convert minor fluctuations of muscle energy into analog values that a microcontroller can recognize. After collecting EMG data from the ten participants, a Simple Moving Average (SMA) calculation was utilized to smooth the EMG data and remove noise. Features were then extracted from the EMG data before being fed into machine learning approaches (such as SMO, J48, and Random Forest) for posture detection. According to the results, the Random Forest algorithm produced the most accurate results in identifying yoga poses when compared with the other approaches [113].
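The pipeline described above (SMA smoothing of raw analog readings, window-level feature extraction, and a Random Forest classifier) can be sketched as follows; the data shapes, window sizes, and labels are placeholders and not taken from the cited study.

    # Sketch of an sEMG posture classification pipeline: smoothing -> features -> RF.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def smooth_sma(signal, window=10):
        """Simple moving average over the raw 0-1023 analog EMG readings."""
        kernel = np.ones(window) / window
        return np.convolve(signal, kernel, mode="valid")

    def window_features(signal, win=100):
        """Mean, standard deviation, and peak amplitude per window."""
        feats = []
        for start in range(0, len(signal) - win, win):
            seg = signal[start:start + win]
            feats.append([seg.mean(), seg.std(), seg.max()])
        return np.array(feats)

    raw = np.random.randint(0, 1024, size=5000).astype(float)   # placeholder EMG stream
    X = window_features(smooth_sma(raw))
    y = np.random.randint(0, 5, size=len(X))                     # 5 poses (dummy labels)

    clf = RandomForestClassifier(n_estimators=100).fit(X, y)
    print("training accuracy:", clf.score(X, y))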
The amplitude of the sEMG signal is proportional to the force produced by the muscles, and it was used to assess the muscle effort exerted by people performing a specific asana. The signal should be adjusted before comparing sEMG signals from various individuals; sEMG signals were normalized to characteristics such as the person's height, body weight, limb, and neck [90].

6) ELECTROENCEPHALOGRAPHY (EEG)
Most meditation studies have focused on EEG due to its low price, portability, and non-invasive access to brain activity. Brain waves are altered by specific cognitive and motor changes. EEG has become one of the finest ways of capturing brainwave activity non-invasively with high resolution, compared with many other brain signals. Electrodes are placed on the scalp surface to assess the overall impulses of the cerebral cortex. EEG devices measure the electromagnetic fields linked with broad sets of neurons. Due to the general complexity of mapping spatial activity onto distinct brain areas and electrode locations, EEG is difficult for inexperienced observers to interpret [112].

7) INFRARED (IR) SENSOR
An IR sensor is a radiation-sensitive device with a spectral sensing component covering wavelengths ranging between 780 nm and 50 µm. IR sensors are most extensively used in movement detection; IR sensing elements easily detect physical movements based on heat variation.
Low-resolution IR sensors have been used in a gadget-free yogic pose detection system. It contains a sensory module with eight thermal sensors and an I2C serial port connecting to a WiPy 2.0 (Wi-Fi/Bluetooth module). The WiPy 2.0 unit, which includes an ESP32S microcontroller, interacts with a router and the deep learning server over the Internet. The AMG8833 functions as a low-resolution thermal camera. The sensor is a small surface-mounted device that senses IR signals with wavelengths ranging from 8 to 13 µm and records the observed temperatures as floating-point values with two decimal digits in Celsius. The significant benefits of the AMG8833 in the YPR system are its compactness, low energy consumption, unobtrusiveness, ability to identify immobile subjects, off-the-shelf nature, and low cost compared to other thermal imaging cameras [92].
A unique IoT-based yogic posture recognition system contains three wireless sensor nodes, each interconnecting a wireless unit and low-resolution IR sensors. The wireless sensor nodes are mounted on the ceiling and walls to capture yogic postures in the x, y, and z directions. The authors selected 18 yogis to perform 26 postures throughout two sessions, each lasting up to 20 seconds. Those sessions were recorded, preprocessed, and then transformed into grayscale images. Their model evaluated 93,200 posture images using tenfold cross-validation with a DCNN, producing an accuracy of 99.991% [92].
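The preprocessing step of converting low-resolution thermal frames into grayscale images can be sketched as below; the 8 × 8 frame size and the temperature range are assumptions used purely for illustration.

    # Sketch: map a low-resolution thermal frame (Celsius values) to a grayscale image.
    import numpy as np

    def thermal_to_grayscale(frame_celsius, t_min=18.0, t_max=38.0):
        """Linearly map a temperature grid to 0-255 grayscale values."""
        clipped = np.clip(frame_celsius, t_min, t_max)
        scaled = (clipped - t_min) / (t_max - t_min)
        return (scaled * 255).astype(np.uint8)

    frame = 20.0 + 10.0 * np.random.rand(8, 8)   # placeholder thermal frame
    gray = thermal_to_grayscale(frame)
    print(gray.shape, gray.dtype)                # (8, 8) uint8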

8) FORCE SENSITIVE RESISTOR (FSR) SENSOR
FSR sensors sense the static and dynamic pressure exerted on a target surface. Their reaction range is mainly decided by the fluctuation in electrical resistivity [140].
FSRs are made of a semi-conductive substance or semi-conductive inks between two thin substrates. Shunt type and thru type are the two distinct types of FSR sensors. Shunt-type FSRs are polymeric thick-film sensors with two membranes separated by a thin air gap. One membrane contains two sets of interdigitated traces that are electrically separated from each other, and the other is covered with a special textured, resistive ink. Thru-type FSRs use twin polyester outer substrates as resilient printed circuitry; silver rings with traces are positioned above and below the pressure-sensitive layer, followed by the polymer film [141].
One reported system facilitates real-time yoga practice using an embedded-based intelligent yoga mat (ESYM) to correct yoga postures. For evaluating the pressure exerted in human poses, FSR-type sensors are favoured. The ESYM is created using a network of pressure sensors. At first, the pressure nodes on the ESYM are recognized, and then a pattern is derived using the FSR sensors. Pressure sensor data modules store each sensor's information. For the assessment of yoga poses, a pattern recognition approach was developed. Through the use of a speech unit, biofeedback results are utilized to correct the poses [93].

9) RFID
With the growth of RFID technology, various techniques for human action recognition employing device-free RFID technology have been developed in recent years. Initially, RFID technology's range was limited to a few centimeters, but it has since been extended for both passive and active tags.
RFID technology comprises two major components: readers and tags. A reader is a device that reads tags and collects data from them. It is equipped with an antenna that emits radio waves. RFID tags receive these radio signals and modulate them with data such as an ID. A reader can collect these backscattered signals, which contain the tag information, through its antenna. Tags are tiny chips that can be applied to a variety of objects. There are two kinds of tags: active and passive. Active RFID tags have their own power source, whereas passive RFID tags do not have a battery and rely on the readers' radio signals for energy. As opposed to passive tags, active RFID tags have a greater range. RFID has been embraced in various disciplines because of its passive nature, low price, and unobtrusiveness. RFID is currently frequently employed in scientific work on activity recognition. RFID technology is used by researchers in tracking, localization, posture, gesture, and behaviour recognition [117].
RFitness is a fitness posture system that detects yogic postures using RFID tags on the yoga mat. Multiple commodity RFID tags are attached to the yoga mat to predict the yoga posture, letting different yogic postures activate distinct RFID tags. According to the detected signals from the RFID tags, distinct yogic postures are easily identified using deep neural networks [110].

10) WI-FI
There has been a paradigm change in activity recognition research during the last decade, from device-based methods to device-free ones. Researchers have started to employ Channel State Information (CSI) for activity detection after investigating the features of wireless networks. Several Wi-Fi-based localization, tracking, and fall detection methods have already been proposed. Wi-Fi has the advantage of being unobtrusive, so users are not obliged to carry any device with them [117].
Wiga is a noncontact activity detection system using Wi-Fi in real time. It can detect a sequence of actions; moreover, the user does not need to participate in a training phase. Wiga uses non-intrusive Wi-Fi gadgets that can classify action sequences, which is more comfortable and secure than vision-based and wearable sensor systems. Nevertheless, the wireless signal is sensitive to the ambient environment. Although Wiga works well in this experimental scenario, its accuracy might degrade in a different setting, since a model trained in one environment learns environment-specific properties [111].
The contributions of sensors to yoga posture recognition systems are summarized in Table 3. In these approaches, the sensors are mounted on the yoga practitioner's body [91], [93], [109], [112], [113], [114], [138] or on the walls and ceiling [92]. These sensors predict the entire body posture, muscle activities, and the resting brain states during the practice of yoga.
Many studies have been conducted to develop posture recognition models using cameras, Kinect, wearables, and smart mats. Vision-based methods, in general, have higher accuracy due to higher resolutions; nevertheless, these methodologies raise privacy issues. As a result, many users may not tolerate such systems. On the contrary, the key advantages of wearable device approaches are non-invasiveness and high accuracy. However, transporting and maintaining numerous, or even a single, wearable sensing device is problematic in long-term real-life usage because of maintenance overhead and discomfort. As a result, one of the most optimal alternatives for yogic posture detection would be a privacy-preserving, device-free sensor module that does not use cameras or any wearables [92].

III. FEATURE EXTRACTION
The collected information first goes through feature extraction methods in order to support an effective perception of human actions. The recognition model can be constructed from each feature occurrence using learning approaches. After training, unseen occurrences can be assessed with the recognition model to predict the action performed [32].

A. LOCAL FEATURE EXTRACTION
Local features in the images or videos refer to the patterns or distinctive structures like points, edges, or small image patches. They are primarily associated with an image patch distinct from its surroundings in terms of texture, color, or intensity.
Local descriptors, including the Scale Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF), and Local Binary Patterns (LBP), have been widely used in various computer vision tasks, especially HPE [43]. SIFT was initially developed to recognize objects. SIFT features have become very effective image descriptors, and their use in face analysis has been extensively researched [144], [145], [146], [147]. SIFT and SURF features are utilized to compare the user's portrayal of an asana to a video of the identical asana performed by an expert [107], [148], [149].
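One simple way to realize such a comparison is to count ratio-test SIFT matches between a learner's frame and an expert reference frame, as sketched below; OpenCV's SIFT implementation is assumed to be available, and the file names are hypothetical placeholders rather than data from the cited works.

    # Sketch: compare a learner's asana image to an expert reference with SIFT matches.
    import cv2

    def sift_match_score(img_path_a, img_path_b, ratio=0.75):
        gray_a = cv2.imread(img_path_a, cv2.IMREAD_GRAYSCALE)
        gray_b = cv2.imread(img_path_b, cv2.IMREAD_GRAYSCALE)
        sift = cv2.SIFT_create()
        _, des_a = sift.detectAndCompute(gray_a, None)
        _, des_b = sift.detectAndCompute(gray_b, None)
        matcher = cv2.BFMatcher()
        matches = matcher.knnMatch(des_a, des_b, k=2)
        good = [m for m, n in matches if m.distance < ratio * n.distance]   # Lowe ratio test
        return len(good)

    print(sift_match_score("user_trikonasana.jpg", "expert_trikonasana.jpg"))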

B. GLOBAL FEATURE EXTRACTION
A method for extracting global features based on computer vision is the Histogram of Oriented Gradient (HOG) [150], [151], [152], [153]. It calculates the horizontal and vertical orientation and magnitude of gradients from the entire image. Typically, this method is employed to identify human postures in images. HOG divides the image into blocks, then computes the gradient histogram of every block, and finally concatenates these histograms to create the feature vector [43].
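A minimal HOG extraction sketch is shown below using scikit-image; the cell and block sizes are common defaults chosen for illustration, not values taken from the cited works, and the input is a synthetic placeholder image.

    # Sketch: global HOG descriptor for a single grayscale image.
    import numpy as np
    from skimage.feature import hog

    image = np.random.randint(0, 256, size=(128, 64), dtype=np.uint8)   # placeholder image
    features = hog(image, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2), block_norm="L2-Hys")
    print(features.shape)   # one concatenated global descriptor vector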

C. SKELETAL BASED FEATURE EXTRACTION
Skeletal features extracted from the human skeleton have recently spurred research into yoga activity recognition [97]. Extracted skeletal data provides significant improvements in prediction accuracy. In most cases, depth sensors such as Microsoft Kinect derive skeleton-based key features. This depth sensor provides the body joints' 3D coordinates, and the feature vector can be calculated using the relative distances between the joints [43].
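As a sketch of that idea, the pairwise distances between 3D joint coordinates can be flattened into a (roughly scale-invariant) feature vector; the joint array below is a random placeholder sized for the 25 joints tracked by Kinect V2.

    # Sketch: relative-distance feature vector from 3D joint coordinates.
    import numpy as np
    from itertools import combinations

    def relative_distance_features(joints_xyz):
        """joints_xyz: (N, 3) array of joint coordinates -> flat feature vector."""
        dists = np.array([np.linalg.norm(joints_xyz[i] - joints_xyz[j])
                          for i, j in combinations(range(len(joints_xyz)), 2)])
        return dists / dists.max()          # normalize by the largest distance

    joints = np.random.rand(25, 3)          # 25 Kinect V2 joints (dummy values)
    print(relative_distance_features(joints).shape)   # (300,) pairwise distances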

D. KEYPOINT BASED FEATURE EXTRACTION
In pose estimation, the configuration of the body (pose) is predicted from an image. The process of pose estimation is separated into two basic steps: 1) identifying human body joints/keypoints and 2) grouping those predicted joints. Initially, keypoints are identified at the various joints of the human body, including positions such as the eyes, ears, neck, shoulders, and so on. In the second step, by grouping all those joints, the entire body structure is formed, and these grouped keypoints predict the pose of a human at a given time [154]. The keypoint detection method extracts the x and y coordinates of distinct body parts and their confidence levels. The coordinates of the different body areas derived from such an image then provide sufficient evidence to identify whether the posture is correct or not [94]. Keypoint estimation methods do not need high-tech devices for analyzing pictures or videos. They can predict the following poses and activities: sitting, standing, walking, running, cycling, lying down, fall detection, push-up counting, and activity prediction [155].
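The posture-correctness check described above can be sketched as a joint-angle test: compute the angle at a keypoint from its two neighbours and compare it against a reference value within a tolerance. The keypoint coordinates, reference angle, and tolerance below are illustrative assumptions only.

    # Sketch: keypoint-based correctness check via a joint angle and a tolerance.
    import numpy as np

    def joint_angle(a, b, c):
        """Angle at keypoint b (degrees) formed by keypoints a-b-c, each (x, y)."""
        ba, bc = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
        cosang = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
        return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

    # hip-knee-ankle keypoints of the bent leg in a hypothetical tree-pose frame
    angle = joint_angle((0.52, 0.55), (0.60, 0.62), (0.55, 0.72))
    reference, tolerance = 45.0, 15.0        # assumed target angle and allowed deviation
    print("correct" if abs(angle - reference) <= tolerance else "adjust the knee")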

1) MICROSOFT KINECT
In pose estimation, the major challenge is predicting the coordinates of the human body joints. With the introduction of Kinect, anyone can easily detect and forecast the body joint coordinates in any human pose using depth images. Conventional techniques in this field have mainly relied on common sensors, like RGB cameras, that are often computationally expensive and susceptible to lighting changes and cluttered backgrounds [86]. Figure 7 illustrates Microsoft Kinect's three versions. Table 4 summarizes the Kinect versions utilized in yoga posture recognition systems.
Kinect is a motion-sensing input device released by Microsoft in 2010. Microsoft Kinect contains RGB cameras, an infrared camera, infrared projectors, and microphones. It can be used for gesture recognition, posture recognition, speech recognition, skeletal body detection, and voice control, and it captures real-time RGB and depth video feeds. Compared with traditional RGB videos, the key benefit of employing depth video content is that separating the human in the foreground is easier even when the scene is cluttered. Color information is also absent from depth videos, so the clothes worn by the human subjects cannot affect the segmentation process. Kinect was initially manufactured as a motion controller to avoid physical controllers in 3D gaming applications, and it was a novel product among the numerous competitors in the gaming sector. It permits activity recognition researchers to focus mainly on acquiring robust feature descriptors to specify actions instead of low-level segmentation [158], [159]. Kinect has an inbuilt IR sensor that can detect the distance between the object and the sensor and generates depth images at a rate of 30 frames per second. Furthermore, Kinect's precision and adaptability in detecting the joints of a human body are outstanding, letting it detect complex and challenging postures [86]. The first version of Kinect captures images at a resolution of 640 × 480 pixels [83].
To extract the features of the user's body map, Kinect uses the OpenNI library, and the OpenCV library is used to extract the body contour from the body map [81]. Body contouring information gathered from a single direction cannot precisely compare and contrast the yoga poses of beginners and experts. With the help of two Kinects, the body maps of the practitioners were extracted from the front and side views; this eliminates the limitations of [107] and [139] and surpasses the previous comparison results [84]. Kinect utilizes a built-in infrared laser projector, a multi-array microphone, an RGB camera, and a CMOS sensor to capture images and video. It also has a skeleton tracking tool to detect and track the coordinates of human joints. There has recently been a surge in interest in employing Microsoft's 3D Kinect sensor in vision-based pose identification [86].
Using a Microsoft Kinect, a novel self-trained yogic posture recognition system was designed. The authors selected five yoga practitioners to perform 12 yoga postures five times each and collected 300 videos. First, the user's body map is captured, and the body contour is extracted using the Kinect. A star skeleton, a rapid skeletonization method that connects the centroid of an object to its contour extremes, is utilized as a prominent identifier of human posture in YPR. A distance function is used to examine the divergences between the user's pose and the pre-built traditional yoga poses. The system's pose prediction accuracy was 99.33% [81]. Another model classifies yoga poses based on the detected body joint points in real time using Kinect. Five yoga practitioners performed three yoga poses thrice each, in 5-8 seconds (5 × 3 × 3 = 45 videos). They achieved over 97% accuracy in every calculated angle between the distinct body parts associated with those three poses [86]. The Visual Gesture Builder (VGB) is a tool of Kinect V2 which uses classification techniques to deliver data to discover motions [82], [85].
Kinect V2 records the depth and color images and tracks the entire body skeleton with 25 body joint coordinates. The Kinect and the yogis (children) were placed 1.5 to 3.0 meters apart [87]. In another setup, two Kinects were placed around 2 meters from the yoga mat, in perpendicular and front directions. The perpendicular view orientation is slightly more accurate than the front view orientation in obtaining body maps from the two separate directions. The study assesses the inaccuracy of the body joints and shows that standing poses are captured more accurately than seated and supine yoga poses. Kinect's tracking algorithms become confused when the yoga practitioner moves the head below the waist during practice. Kinect VGB predicted most sampled yoga postures with a high true positive rate of 99.5% and a low false positive rate of 0.03% [85]. Another model uses an innovative approach for practicing yogic poses that can monitor up to six persons simultaneously using Kinect V2; it recognizes six yoga poses using Kinect V2 and the AdaBoost algorithm and produces results with an accuracy greater than 94.78% [82]. A gesture analysis model of yoga using machine learning selected six yoga experts to perform five yoga poses for 1-2 minutes and collected 10-20 GB of video clips for training purposes. It uses the Visual Gesture Builder tool of Kinect V2 and the AdaBoost algorithm to recognize yoga gestures. The authors recruited 20 students (adults) with minimal yoga experience for this course and measured the posture alignment in yoga classes conducted twice weekly for 75 minutes over ten weeks. The system accuracy was over 90% for all yoga postures, with specificity close to 1 [85].
• Azure Kinect: Microsoft has made great efforts to enhance the capabilities of Kinect in business applications, as previous versions of the device had been a failure among consumers in the gaming industry. Due to these continuous efforts, Azure Kinect was released in 2019 using computer vision technology, and it supports several new applications [160]. For building robust body tracking, advanced computer vision, and voice recognition models, Azure Kinect has built-in RGB and depth cameras, a circular array of seven microphones with 360-degree coverage, and an orientation sensor. Many Azure Kinects can be connected into one volumetric capture rig using software tools in a volumetric capture workflow, which permits users to experience interactive virtual reality renderings of human performances. Previous versions supported only gaming, but Azure Kinect supports new fields such as logistics, robotics, healthcare, and retail [161]. Detecting a person's body posture from an image or video can be problematic in the presence of strong articulations, occlusion, or tiny and scarcely visible joints; accurate detection requires contextual information. Whether IR sensors are used or 2D images acquired through a simple RGB camera are transformed into a three-dimensional human posture prediction, each approach has its own set of advantages and disadvantages, and Microsoft's Azure Kinect is among the most well-known solutions. Azure Kinect records the skeletal information, that is, the three-dimensional coordinates of the participant's joints, which are fed as input to the model. It employs them to generate various vectors and angles, allowing for precise postural detection. The device weighs 440 g, significantly less than the Kinect v2. The joint coordinates are expressed in the 3D coordinate system of the depth camera, and the joints are arranged in a hierarchy from the middle of the body towards the extremities. Angles are calculated from these joint coordinates, a Bayesian network then computes the likelihood of each predetermined pose, and the pose with the highest probability is selected as the final posture. The system identifies six body postures of the players: Hands Up, Hands Down, Turn Left, Turn Right, Dive, and Slow Down [157]. Figure 8 illustrates the skeletal body joints or keypoints tracked by the different Kinect versions. Nowadays, the Microsoft 3D Kinect sensor is widely used in vision-based human action recognition techniques [157].
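As a concrete illustration of the vector-and-angle step described above, the following minimal Python sketch (not taken from the cited works) computes the angle at a joint from three 3D keypoints such as those reported by a Kinect or Azure Kinect body-tracking SDK; the coordinate values are made up.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (in degrees) at joint b formed by the segments b->a and b->c."""
    a, b, c = map(np.asarray, (a, b, c))
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Made-up 3D coordinates (in metres) for shoulder, elbow, and wrist
shoulder, elbow, wrist = (0.0, 1.4, 2.0), (0.3, 1.1, 2.0), (0.6, 1.1, 2.3)
print(round(joint_angle(shoulder, elbow, wrist), 1))  # elbow flexion angle
```

The same function can be applied to any joint triple tracked by the sensor, and the resulting angles can then be compared against reference values for a given posture.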

2) VICON MOTION CAPTURE SYSTEM
With the emergence of real-time sensing devices such as the Kinect and Leap Motion detectors, human activity recognition approaches have moved their attention from 2D RGB to 3D RGB-D (red, green, blue, and depth) data. RGB and depth data are examined separately, and the extracted features are provided as inputs to a classifier such as a Markov model, support vector machine, or neural network [97]. The modern computer-aided VICON Motion Capture (MoCap) system recovers the exact shape and posture of the body by providing exact 3D positions of sparse markers carefully positioned on the subject's body. This MoCap system consists of nine cameras, eight infrared sensors and an RGB camera, as illustrated in figure 9. The focal distance and acquisition range were optimized to provide a 95% reconstruction of 3D joint information. Yogic activities were collected by a three-dimensional motion capture device, while body flexibility, balance, and the functionality of the operative body parts were calculated and integrated using information from an EMG sensor. The investigation of the system mainly focused on older persons and their daily actions. The goal is to assist yoga instructors in constructing specific yoga postures using joint or skeletal information, converting it into color texture features using joint angle and joint distance maps (JDM) [97].
Surya namaskar asana consists of a sequence of 12 postures with distinct body motions. Furthermore, the duration of the postures varies based on the user's training, shape, and mass index. Apart from timing flexibility, there are other aspects to consider when practicing yoga, including muscular stiffness and different degrees of joint occlusion over the course of the entire practice. Using merely joint location data as features, which vary considerably irrespective of these causes, could produce an excessive number of false positives and false negatives. Figure 9 specifies the standard nine-camera MoCap system in the indoor environment [97].
In their model, all practitioners were fitted with sixteen low-mass retro-reflective markers at specified body positions in accordance with the Vicon Plug-in Gait model for the lower limbs. All kinematic data were obtained at 100 Hz using a Vicon ten-camera motion capture system. Each of the seven yogic postures was executed in a way that users could practice frequently. Every pose was performed from a balanced standing position three times, with each participant holding the posture for fifteen seconds, and all positions were done on both the left and right legs. For analysis, each pose's three-dimensional ankle displacements and joint moments were extracted and entered into an Excel file. For the middle five seconds of the stance hold, the mean joint displacements and joint moments were determined. The stresses and strains along the three axes of motion were charted using means and standard errors (SE). One shortcoming of this model is that the MoCap features represent movement between the shank and foot segments only and do not account for the numerous joints within the foot [163].
Three-dimensional motion was collected using 39 retro-reflective markers with a 12-camera Vicon system at a sampling frequency of 100 Hz. Markers were positioned on predefined anatomical landmarks specified by the Plug-in Gait model and fixed using double-sided sticky tape. For calibration, static trials were captured while standing in a skeletal posture. Analog data were filtered at 10 Hz. Vicon Nexus was used to compute the angular positions of the spine and the upper and lower extremities during the 12 successive poses of Surya namaskar [109]. The marker placement across the whole body is illustrated in figure 10.
Unlike the Kinect, the Vicon is a marker-based motion-tracking system. Vicon utilizes a global coordinate system, while Kinect does not. Vicon is regarded as a reference-standard laboratory method for measuring movements, whereas Kinect relies on algorithms to improve measurement accuracy. Keypoint detection methodologies estimate the yoga posture from the skeletal keypoints of the head, shoulders, body, and feet. From an image or video, the acceleration of positional changes during yoga is predicted based on the skeletal keypoint information extracted using keypoint extraction tools such as OpenCV, OpenPose, Mask R-CNN, and MediaPipe. This increases the prediction accuracy of the yoga posture.

E. KEYPOINT DETECTION TOOLS
Some standard tools used in yoga posture recognition for extracting keypoints from images or videos are outlined below.

1) OpenCV
Intel's Open-Source Computer Vision (OpenCV) library was written in C and C++. The OpenCV library contains more than 500 functions and 2500 optimized algorithms covering many computer vision areas. OpenCV has been downloaded more than 2 million times, and this number keeps growing at an average of over 26,000 downloads per month [118], [164]. Furthermore, OpenCV has included OpenPose's Part Affinity Fields-based topology in its deep neural network module [57].

2) OpenPose
It is the first real-time multi-person keypoint detection library. It identifies 135 keypoints in total on the human posture in a single image, organized into three categories of keypoint blocks (body with foot, hand, and face keypoints), as illustrated in figure 11 [57], [102], [165]. Existing 2D body pose estimation libraries did not combine face, body, hand, and foot keypoint detectors in a single framework [57].
Features of OpenPose: OpenPose, the keypoint detection library, has many inbuilt features, which are listed below.
• It is compatible with many versions of hardware and software.
• It is compatible with multiple operating systems like Windows, Mac OS, Ubuntu, & Embedded systems.
• It supports some of the hardware like CUDA GPUs, OpenCL GPU, and CPU.
• Users can select their inputs from an image, video, webcam, & IP camera.
• Users can choose to have their results displayed or saved to disks.
• Users can activate or deactivate the face, body, hand, and foot keypoint detectors.
• Users can enable pixel coordinate normalization.
• Users can select the number of GPUs required for their application.
• Fast tracking and visual smoothing when detecting a single person in real time.
Compared with OpenPose, Mask R-CNN and Alpha-Pose have some drawbacks in their libraries: they require users to implement their own frame readers (for image, video, and live streaming data), most of the pipeline, visualization of the results, and JSON or XML output file generation. In addition, face and body keypoint representations are not integrated into these conventional keypoint detection methods, requiring a different library for each use [57].
The OpenPose network utilizes the first ten layers of VGG-19 to extract the features of an image; the extracted features are then given as input to two parallel branches of convolutional layers. The first branch predicts a set of two-dimensional confidence maps, each representing a particular body part of the human skeleton. The second branch predicts a set of two-dimensional vector fields, the part affinity fields, which encode the degree of association between the estimated parts. In the final stage, bipartite graphs are created between pairs of body parts using the confidence maps, and the part affinity fields are used to prune weak links. After the results of each phase are obtained, human skeletal postures are assembled and assigned to each person in the image [166].
Extracted postural features are required for developing a self-trained yoga posture detection model. Nevertheless, this model utilizes manually extracted features and needs a distinct model for every asana. The skeleton is the fundamental characteristic required for expressing many human postures, and there are several methods for obtaining the human body's skeletal structure that can subsequently be used to predict posture. However, such approaches [83], [88], [107] are computationally intensive, unsuitable for general smartphone apps, and sensitive to vibrations. OpenPose measures the user's motion and identifies the position of occluded body parts that are hard to detect in normal situations. Human pose estimation (HPE) has been a rapidly evolving area, and OpenPose transformed it by substantially decreasing computing time without compromising the prediction accuracy of the model [94].
The outcome for every frame of the live stream is obtained in JSON format and consists of the locations of each body part for every person identified in the image. Pose extraction was carried out at the OpenPose network's default resolution for optimal performance; at these settings, the system averaged around three frames per second. These system architectures employ OpenPose for keypoint extraction [96], [167].
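The per-frame JSON output described above can be consumed with a few lines of Python; this is a hedged sketch based on OpenPose's documented "pose_keypoints_2d" layout (a flat list of x, y, confidence triples per detected person), and the file name is hypothetical.

```python
import json
import numpy as np

def load_keypoints(json_path):
    """Return a list with one (num_keypoints, 3) array per person in the frame."""
    with open(json_path) as f:
        frame = json.load(f)
    people = []
    for person in frame.get("people", []):
        kp = np.array(person["pose_keypoints_2d"], dtype=float).reshape(-1, 3)
        people.append(kp)          # each row: (x, y, confidence)
    return people

# keypoints = load_keypoints("frame_000000_keypoints.json")  # hypothetical file
```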

3) MASK R-CNN
Mask R-CNN is an extension of Faster R-CNN [168] that adds one additional branch to the network to predict segmentation masks from every Region of Interest (RoI). Furthermore, it retains the existing classification and bounding-box recognition branches [61].
The mask branch applies a small Fully Convolutional Network (FCN) to each RoI to forecast the segmentation results in a pixel-to-pixel manner. RoIPool attends to RoIs on feature maps to achieve better accuracy and speed, and Mask R-CNN fixes the resulting misalignments with RoIAlign, a straightforward, quantization-free layer that reliably preserves precise spatial information [61].
Compared with Faster R-CNN, Mask R-CNN is simple to set up, adds only a small overhead for the pixel-to-pixel alignment, and operates at five frames per second. Faster R-CNN comprises two main phases. The first, known as the Region Proposal Network (RPN), suggests candidate object bounding boxes. The second, essentially Fast R-CNN, obtains features for each candidate box using RoIPool and performs classification and bounding-box regression; the features used in the two phases can be shared for quicker inference. Mask R-CNN uses the same two stages, but its second stage additionally produces a binary mask for each RoI. Faster R-CNN outputs a labelled class and a bounding-box offset for each object, and Mask R-CNN adds an object mask to these results. R-CNN stands for Region-based CNN, the most popular approach to semantic and instance segmentation [61]. Instance segmentation distinguishes each specific object among similar instances, whereas semantic segmentation classifies all objects of a class as a single entity without any differentiation [169]. The Mask R-CNN library detects 17 keypoints in the human body [102].
Mask R-CNN is based entirely on an instance-first strategy and can easily be extended to predict human pose. In Mask R-CNN, keypoint locations are modelled as one-hot masks, and K masks are predicted, one for each of the K keypoint types. It outperformed all previous single-model results on the COCO instance segmentation tasks [61].
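For readers who want to try a Mask R-CNN-style keypoint predictor without building it from scratch, the sketch below uses torchvision's Keypoint R-CNN, which follows the same two-stage design and outputs the 17 COCO body keypoints per detected person; it is an illustrative substitute, not the implementation used in the cited studies, and the input file name is hypothetical.

```python
import torch
from torchvision.models.detection import keypointrcnn_resnet50_fpn
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

# Pretrained two-stage detector with a keypoint head (17 COCO body keypoints)
model = keypointrcnn_resnet50_fpn(weights="DEFAULT").eval()

img = convert_image_dtype(read_image("yoga_pose.jpg"), torch.float)  # hypothetical image

with torch.no_grad():
    out = model([img])[0]

# out["keypoints"] has shape (num_people, 17, 3): (x, y, visibility) per keypoint
print(out["keypoints"].shape, out["scores"])
```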

4) MediaPipe Pose
It is one of the MediaPipe ML solutions for high-fidelity body pose tracking. MediaPipe Pose uses the BlazePose model to detect 33 three-dimensional landmarks, as illustrated in Figure 12, and extracts a background segmentation mask of the entire human body from real-time video. Existing approaches primarily rely on powerful desktop and laptop environments for body pose tracking, but MediaPipe Pose achieves high real-time performance on desktops, laptops, and recent smartphones [171].
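A minimal sketch of extracting the 33 BlazePose landmarks with the MediaPipe Pose solution API is shown below; the image file name is hypothetical, and the snippet assumes the legacy mp.solutions Python interface.

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
image = cv2.imread("asana.jpg")                       # BGR image loaded with OpenCV

with mp_pose.Pose(static_image_mode=True) as pose:
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.pose_landmarks:
    for idx, lm in enumerate(results.pose_landmarks.landmark):
        # x, y are normalized to [0, 1]; z is relative depth; visibility is in [0, 1]
        print(idx, lm.x, lm.y, lm.z, lm.visibility)
```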

5) MediaPipe Holistic
The holistic model integrates the separate pose, face, and hand models of MediaPipe. It tracks the entire human body, face, and hand landmarks on mobile devices and can be applied to various modern healthcare applications such as fitness trackers, sports analysis, gesture control, and virtual reality. It generates 33 keypoints for pose detection using MediaPipe Pose, 21 keypoints per hand using MediaPipe Hands, and 468 keypoints for face detection using MediaPipe Face Mesh; in total, the MediaPipe Holistic approach detects 543 keypoints on a whole body [172]. Figure 13 illustrates the detected keypoints in the face and hand. Table 6 compares the features of these keypoint estimation libraries.

IV. LEARNING MODELS
Learning models used in YPR are grouped into two categories: machine learning models and deep learning models.

A. MACHINE LEARNING MODELS
Human Activity Recognition systems use machine learning and deep learning approaches to detect the pose of a human stance by extracting sensed signals from various sensors and camera vision systems [119]. Computer vision is used to extract features and to create a skeleton of the body by marking and linking all the joints. The coordinates and the angles formed by the joints can be retrieved and used as predictive features in machine learning models, and multiple ML approaches have been utilized to compute posture recognition accuracy [103]. Figure 14 represents the implemented YPR system using machine learning methods to classify yoga postures. Yogic postures are identified by classifiers trained with machine learning models: the classifiers are first trained with training feature datasets, and the trained classifier is then utilized to predict the specific yoga pose from a testing set of features [43]. Popular machine learning classification models used in yoga posture recognition systems are described below.

1) LOGISTIC REGRESSION (LR)
The LR model is mainly used for binary classification problems, although it can produce binomial, multinomial, and ordinal outcomes. The linear regression model predicts continuous variables, while this model predicts a categorical target. It uses the sigmoid function to map predictions to probability values between 0 and 1 in binary classification problems [179].
LR might be the most well-known discriminative method, and it supports both L1 and L2 regularization. For multiclass problems, it uses the SoftMax activation function to predict the probability of each class. The newton-cg, sag, and lbfgs solvers only support L2 regularization. The liblinear solver supports both regularizations and is the first and best choice for smaller datasets, while the sag and saga solvers are faster for larger datasets [180].
Agrawal et al. [103] use a multinomial formulation with the newton-cg solver to predict ten yoga poses. With the maximum number of iterations for the newton-cg solver set to 1000, 1500, 2000, and 2500, the model achieves average accuracies of 82.15%, 83.02%, 83.79%, and 83.16%, respectively, across all yoga poses.
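The sketch below reproduces this kind of setup with scikit-learn's LogisticRegression and the newton-cg solver; the synthetic feature matrix stands in for the joint-angle/keypoint features used in [103], so the numbers it prints are not comparable to the reported accuracies.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Stand-in features: in practice these would be joint angles / keypoint coordinates
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(solver="newton-cg", max_iter=2000)  # L2-regularized by default
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```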

2) NAÏVE BAYES (NB)
NB is built on the probability model of Bayes' theorem. A Naïve Bayes classifier assumes that the features are independent of one another given the class variable [181]. The model maintains a probability table that is updated from the training data; to predict a new observation, it looks up the probability table based on the observation's feature values [179]. Agrawal et al. [103] created an NB classification model that classifies ten yogic poses with an accuracy of 74.75%.

3) SUPPORT VECTOR MACHINE (SVM)
SVM is a supervised learning method that is inherently a two-class classifier. To solve problems involving a larger number of categories, multiclass SVM is the usual choice: a multiclass SVM generates several classifiers and differentiates classes either one-versus-rest or between every pair of classes [182]. Support vector machines handle both classification and regression problems and classify objects based on examples in the training dataset [179]. SVM classifies data by generating a hyperplane with the greatest possible separation between classes [179], [182].
SVM and kernel SVM (k-SVM) have been used in a comparison study to categorize the resting brainwave patterns associated with Kriya Yoga meditation sessions. The EEG signals of 10 non-meditating and 23 meditating persons were recorded using a 64-channel EEG device with the standard 10/20 electrode placement. A 16-bit resolution EEG device with a 256 Hz sampling rate was used to gather the samples. During data collection, the meditating persons meditated while the non-meditators sat casually. The polynomial kernel produced higher classification rates than the other three commonly used kernel functions, with an average classification rate of 90.82%. The results of SVM and k-SVM for the two groups were compared, and the average accuracies of SVM and k-SVM were 85.54% and 90.82%, respectively. These findings revealed that k-SVM outperforms traditional SVM in differentiating meditating and non-meditating EEG patterns. Among the classifiers, k-SVM is a stronger predictor for detecting non-linearity in EEG time series; for nonlinear signals such as EEG, k-SVM is far superior to SVM [112].
Lee et al. [87] classify four yoga positions performed by children; the linear kernel SVM classifier yields 91.3% accuracy. Among the linear kernel SVM, polynomial kernel SVM, KNN, and Random Forest classifiers evaluated for yoga posture classification, the polynomial kernel SVM produces the better results.
Gupta and Jangid [101] developed an SVM classification model for YPR with an accuracy of 97.64% in classifying four yoga poses. Both the SVM and RF models achieved excellent results, with accuracies above 95%; however, the SVM classifier surpassed RF with a precision of 97.64 percent, which is 1.17 percent higher.
Agrawal et al. [103] created an SVM classification method that uses linear, polynomial, and radial kernel functions and classifies the yoga poses with accuracies of 87.91%, 93.58%, and 98.71%, respectively.
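A hedged scikit-learn sketch of comparing the three kernels mentioned above is given below; the synthetic data is a stand-in for real pose features, so the cross-validation scores are purely illustrative.

```python
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

# Stand-in pose features and labels
X, y = make_classification(n_samples=400, n_features=16, n_informative=8,
                           n_classes=4, random_state=1)

for kernel in ("linear", "poly", "rbf"):
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print(kernel, round(scores.mean(), 3))
```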
Nagalakshmi and Mukherjee [106] have proposed an SVM model that uses linear and radial kernel classifiers to predict 13 yoga asanas, achieving overall accuracies of 71.5% and 59%, respectively. The linear SVM achieves the highest accuracy compared with the K-Nearest Neighbors (KNN) and k-SVM models.

4) DECISION TREE (DT)
The DT classifier is one of the most well-known data categorization algorithms. An essential feature of a DT is its ability to convert complex decision-making issues into simpler processes; it constructs the classifier model as a tree [183].
Decision trees are mainly applied to classification and regression problems. Target variables are categorical in classification, whereas they are continuous in regression. This machine learning model is widely used to predict future outcomes [179].
Agrawal et al. [103] detect the ten yoga postures using the DT classifier with a highest accuracy of 97.71 percent.

5) RANDOM FOREST
RF is an ensemble classification technique widely used in ML for classification and regression. It connects multiple decision trees in parallel, giving each decision tree a different subsample of the dataset as input. For binary classification problems, it collects the binary outputs using majority voting; for regression problems, it takes the average (mean or median) of the continuous outputs. This minimizes the loss and improves accuracy, and the RF model is generally more accurate than the DT learning model [142].
The yoga Posture Recognition model using RF classifiers achieves 94.9% accuracy in classifying the four yoga postures performed by the children [87].
The RF classifier correctly classified four yoga positions, tree pose, triangle stance, warrior I, and warrior II, with an accuracy of 96.47% [68], and it detects the ten yogic postures with accuracies of 99.26%, 99.72%, and 99.90% using different parameter settings [103].

6) K-NEAREST NEIGHBORS
KNN is a non-parametric, lazy-learning algorithm. It makes no assumptions about the data and acts only at the classification stage; it does not learn anything from the underlying data but simply stores it. As a result, classification involves expensive calculations on huge datasets [179].
Lee et al. [87] identify the four yoga postures of children using KNN, which produces the best overall average accuracy of 93.1%, possibly owing to the short duration of the video data.
Nagalakshmi and Mukherjee [106] have created a KNN classifier model that uses the Euclidean distance function for all the yoga posture classifications. In their model, uniform and inverse weight functions are used for prediction: in the uniform weight function, all points in each neighbourhood are weighted equally, whereas with inverse weighting, points are weighted inversely to their distance so that closer points have more weight than farther ones. The model predicts with its highest accuracy of 99.01% using five neighbours, inverse distance weighting, and Euclidean distance. With a k value of 6, the KNN classifier classifies the 13 yoga asanas with an overall accuracy of 71%, a precision of 71.59%, and a recall of 72.76%.
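The following scikit-learn sketch mirrors the KNN configuration described above (Euclidean distance, uniform versus inverse-distance weighting); the data is synthetic and the scores illustrative.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

# Stand-in features for 13 asana classes
X, y = make_classification(n_samples=650, n_features=12, n_informative=10,
                           n_classes=13, random_state=2)

for weights in ("uniform", "distance"):          # "distance" = inverse-distance weighting
    knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean", weights=weights)
    print(weights, round(cross_val_score(knn, X, y, cv=5).mean(), 3))
```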

7) NEURAL NETWORKS (NN)
NN is a set of algorithms that uses interconnected nodes to recognize relationships in data, analogous to how neurons function in the human brain. Neural networks adapt to changes in the input data and produce the best resulting output without the output criteria having to be redesigned. The model relies entirely on the training data for its learning process; once it has learned from the training data, the performance of the network improves automatically. YPR uses neural networks to classify 13 yogic postures with an overall accuracy of around 74% [106].

8) PRINCIPAL COMPONENT ANALYSIS (PCA)
PCA is an unsupervised machine learning approach widely used for reducing the dimensionality of a dataset, and it can also be used for feature extraction. Principal components are computed by a statistical technique that transforms correlated variables into uncorrelated ones. This model is popular in ML and data analysis [142].
To reduce data redundancy, Wiga performs PCA on the collected data. Wiga applies PCA over a small time window to achieve real-time detection with a latency of less than 0.5 seconds while removing duplicate data, and it thereby avoids a separate segmentation procedure. The redundancy-removal algorithm applies PCA at each receiver antenna to reduce the dimensionality of the subcarriers; PCA is computed initially over the Channel State Information (CSI) streams of the three antennas. PCA is then conducted on the CSI streams across a sliding window, termed short PCA (SPCA), to enable sequential action identification in real time. Because most everyday human activities last between 0.5 s and 3 s, the SPCA window duration was set to 0.25, 0.5, 0.75, or 1 second to pick the relevant metrics [111].
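A minimal sketch of short-time PCA over a sliding window, in the spirit of the redundancy-removal step described above, is shown below; the CSI matrix, sampling rate, window length, and number of retained components are all assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

fs = 100                                   # assumed CSI sampling rate (Hz)
window = int(0.5 * fs)                     # 0.5 s window, within the range quoted above
csi = np.random.randn(10 * fs, 30)         # synthetic stream: samples x subcarriers

compressed = []
for start in range(0, csi.shape[0] - window + 1, window):
    segment = csi[start:start + window]
    pca = PCA(n_components=3)              # keep the most informative components per window
    compressed.append(pca.fit_transform(segment))

features = np.vstack(compressed)
print(features.shape)                      # reduced feature stream fed to the classifier
```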

9) AdaBoost
Adaptive Boosting (AdaBoost) is an ensemble method that combines machine learning models to improve performance and minimize the errors produced by the classifiers. In this technique, weights are initially distributed equally across the records for the base classifier. When records are classified incorrectly, the weights of all records are updated and the data are fed to a newly added weak classifier. Weak learners are continually added for the incorrectly classified records until the ensemble becomes a strong learner. The AdaBoost model improves accuracy and produces minimal error [82].
In [82], three datasets covering six Hatha yoga poses were collected to evaluate the performance of a YPR system. The first dataset contained the yoga trainer's pose sequence from the first session, the second contained footage from the second session conducted by the yoga trainee, and the third was made from videos of both sessions but was restricted to 5 clips. After the frames were collected, the system's accuracy was calculated; the third dataset achieved the highest accuracies of more than 94.78% in all postures.
VGB classifies every recorded frame against motion capture patterns using one of two detection technologies: AdaBoost Trigger or RFR Progress. The AdaBoost algorithm merges the outputs of an ensemble of weak classifiers into a total and can create thousands of additional weak classifiers during data analysis. It chooses the relevant features to increase the model's predictive power, so irrelevant features need not be computed, which acts as dimensionality reduction and increases speed. A filter is applied to the raw frame results to remove noise and jitter in the skeleton, and the filtering settings were examined to identify ideal values for yoga postures that minimize false positives and false negatives [85].
In [85], Kinect Studio was utilized to record six yoga trainers performing a set of five yoga asanas (mountain, forward bend, upward salute, side bend, and tree pose), which were then converted to extracted videos using KSConvert. By consensus of two yoga teachers, video clips were labelled, or tagged, throughout the recording wherever a yoga posture was identified. This model achieves accuracies of 95.8% and 98.4% on the yoga postures of trainees and experts, respectively. Table 7 gives an overview of the ML models, and Table 8 illustrates the accuracy of machine learning models in classifying yogic postures.

B. DEEP LEARNING MODELS
The deep learning models most widely used in yoga posture recognition systems are convolutional neural networks, recurrent neural networks, deep neural networks, autoencoders, and hybrid models. Figure 15 shows how deep learning models based on different types of networks are used in YPR.

1) CONVOLUTIONAL NEURAL NETWORK (CNN)
For vision tasks, CNNs have been the most frequently utilized deep learning model. Classical ML approaches rely on hand-crafted features, whereas CNNs automatically learn representative features [96]. CNNs work well for visual recognition tasks, including image categorization and object recognition, especially when massive data is used to train the network. Detecting an incorrect yogic stance requires first identifying, with a CNN, which pose is being executed among the various yogic postures. Because providing keypoint and skeletal annotations for pose identification might not be feasible due to the high computational requirements, a CNN was used to alleviate this problem: it extracts 15 body keypoints from which the skeleton of a user's posture is constructed [94]. Each layer in a CNN performs a distinct role. The three important types of neural layers that make up a CNN are as follows: • Convolutional Layer: In this first layer, the CNN convolves the entire picture into feature maps [186], [187]. A convolutional layer is composed of 3 × 3 filters, each with its own parameters that must be learned, and the height and width of the filters are smaller than those of the input array. Every filter is convolved with the input matrix to generate a neuron-based activation map, and the convolution layer's output volume is obtained by stacking the activation maps along the depth dimension. Every neuron within an activation map is connected only to a relatively limited region of the input volume, since the height and width of every filter are much smaller than those of the input. This local connectivity lets the network learn filters that respond maximally to a local region of the input and exploits the spatially localized correlation of the input: each pixel in an image is much more correlated with adjacent pixels than with distant ones. Furthermore, because the activation maps are created by convolving a filter with the input, the filter parameters are shared across all local positions. This weight-sharing mechanism minimizes the number of parameters needed for successful expression, learning, and generalization [188].
• Pooling Layer: Pooling layers reduce the spatial extent of the incoming volume ahead of the next convolutional layer and have no effect on the volume's depth dimension. The operation of this layer is also known as subsampling or downsampling because the decrease in size results in a concurrent loss of information. However, this loss is advantageous to the network, as the reduction in size directly lowers the computational burden for the network's successive layers and works against overfitting. The most popular methods are average pooling and maximum pooling; maximum pooling has been shown to lead to faster convergence, better selection of invariant features, and improved generalization [186]. A pooling layer is often placed between two successive convolutional layers; by downsampling the representations, the layer reduces the number of elements and computations.
Max pooling is widely utilized because it is much more effective [188].
• Fully Connected Layer: After the convolution and pooling layers, fully connected layers perform high-level reasoning in the CNN. As the name indicates, each neuron in this layer is connected to all activations of the preceding layer. These activations are calculated by a matrix multiplication followed by a bias offset. The two-dimensional extracted features are eventually converted into a one-dimensional feature vector by the fully connected layers, and the generated vector may be classified into a specific set of categories or used as a feature vector for further analysis [186]. The VGG network inspired a three-dimensional Yoga Net model. VGG is a very deep convolutional neural network model that won the ImageNet Large Scale Visual Recognition Challenge for classification and localization tasks in 2014. The Yoga Net architecture follows standard CNN designs and was built in Python using the Keras and TensorFlow packages. This CNN uses the rectified linear unit (ReLU) activation in the convolution layers and SoftMax in the dense layers. In three-dimensional skeleton-based human action classification tasks, combining three-dimensional motion sensory data with two-dimensional color-coded JDMs as input to this deep network proved beneficial. However, joint distances are limited in their ability to reflect rotational joint motions, which carry significant information in human action categorization tasks. Incorporating joint orientation as an extra feature alongside joint distance characteristics reduces the occurrence of false positives in this approach; as a result, joint angle and distance maps (JADMs) are employed rather than JDMs. In multiple trials, JADMs outperformed previous techniques based on JDMs and CNNs in terms of training time and accuracy. Over four weeks, a nine-camera motion capture system was used to capture 42 yoga positions performed by ten subjects at ten different positions. Finally, this model can track the transition of yogic movements over time and could serve as a baseline for a self-assessment yogic approach [97] (a minimal illustrative sketch of such a CNN follows below).
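A minimal Keras/TensorFlow sketch of a CNN built from the three layer types just described (3 × 3 convolutions with ReLU, max pooling, and a SoftMax dense head) is given below; the input size, filter counts, and ten-class output are assumptions, not the architecture of any cited model.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),  # convolutional layer
    layers.MaxPooling2D((2, 2)),                                   # pooling layer
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),                          # fully connected layer
    layers.Dense(10, activation="softmax"),                        # e.g., 10 yoga pose classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```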
The CNN achieves a yogic posture classification accuracy of 95% in training [94]. In [83], the proposed CNN model achieved 90% average accuracy over all yoga postures. In another model, the CNN achieved 98% training, 88% validation, and 91% testing accuracy, while an alternative approach employing a VGG-16 network trained with a transfer learning algorithm achieved accuracies of 98.44% in training, 79.3% in validation, and 72% in testing. The results indicate that the VGG16 model accuracies are lower than those of the CNN model [100].
Some popular CNN architectures, such as AlexNet, VGG16, and ResNet18, are used in [106]. The AlexNet model has eight layers: the first five are convolutional layers, followed by two fully connected layers and one SoftMax layer. The eight-layer AlexNet, 16-layer VGG, and 18-layer ResNet were trained and tested on 13 classes of yoga data comprising 2129 images. In this approach, the deep learning classifiers other than AlexNet perform comparatively poorly, classifying the yoga postures with accuracies of 30% for VGG16, 60% for ResNet18, and 83.05% for AlexNet [106].
For vision-based applications, including facial recognition, object identification and recognition, posture detection, robotics, and autonomous vehicles, CNNs have already proven to be incredibly effective [186].

1) Three-Dimensional CNN (3D-CNN) Model
A three-dimensional CNN model architecture has been developed to identify yoga postures quickly. This model utilizes a three-dimensional convolutional DL framework to recognize yoga poses based on their underlying spatio-temporal relationships. Cell phone cameras with 4K resolution at a 30-fps frame rate were used to record all videos. The sequences are fed into the proposed three-dimensional CNN, whose 3D convolution layers extract discriminative features from the video clips. The derived features are then fed into the SoftMax layer to predict the performed pose from the ten yogic postures [95].
The network consists of repeated three-dimensional convolution, max pooling, average pooling, and fully connected layers with SoftMax activation. The convolution layers in the early and later stages of the network learn to retrieve low- and high-level discriminative information from the yogic pose videos for categorization, while the pooling layers reduce the spatial dimensionality of the feature maps derived from the convolution layers. The average pooling layer aggregates the output features retrieved from the activity video into a one-dimensional vector, which is then categorized into one of ten yoga positions by the SoftMax layer [95]. The network's convolution layers use the ReLU function to perform nonlinear feature transformation; this activation function generates monotone gradients that are quicker and simpler to compute with negligible computational overhead. Moreover, batch normalization is used at every stage of the constructed network to increase speed, accuracy, and stability. Additionally, a dropout layer with a probability of 0.5 forces the network to identify highly resilient characteristics by combining several randomly chosen subsets of neurons; dropout is perhaps the fastest way of aggregating models in a neural network by eliminating random elements [95]. The network is set to automatically pick the best filters that reflect the key features for recognizing yoga poses [92].
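A hedged Keras sketch of a 3D-CNN assembled from the layer types listed above (Conv3D, batch normalization, max and global average pooling, 0.5 dropout, and a SoftMax over ten poses) follows; clip length, resolution, and filter counts are assumptions rather than the published configuration.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(16, 112, 112, 3)),                 # frames x height x width x channels
    layers.Conv3D(32, (3, 3, 3), activation="relu", padding="same"),
    layers.BatchNormalization(),
    layers.MaxPooling3D((1, 2, 2)),                         # downsample space, keep time
    layers.Conv3D(64, (3, 3, 3), activation="relu", padding="same"),
    layers.BatchNormalization(),
    layers.MaxPooling3D((2, 2, 2)),                         # downsample space and time
    layers.GlobalAveragePooling3D(),                        # aggregate into one feature vector
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),                 # ten yoga postures
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```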

2) RECURRENT NEURAL NETWORK (RNN)
RNNs are neural network architectures used to solve sequence prediction problems, which may be one-to-many, many-to-one, or many-to-many. RNNs keep the prior state of a neuron, which aids in processing sequential data; context is therefore preserved, and output is produced while taking earlier learned information into account. RNNs have frequently been used for natural language processing problems where the inputs are naturally modelled as sequences. In action detection or posture categorization tasks there is likewise an interdependence between the previous action and the following one: when identifying the final posture in yoga, the context or understanding of the opening and intermediate poses is equally crucial. Yoga can therefore be conceived of as a series of stances, and RNNs are an excellent option for yoga posture categorization, since the sequential assessment of joint positions can better reflect the interdependence among joint locations [182]. The major issue with RNNs, however, is that they cannot handle long-term dependencies. Recent data may be sufficient for the current task, but in some cases the gap between the relevant earlier data and the current task becomes too large. RNNs fail in these circumstances because they cannot integrate the relevant data: during yoga, when the intermediate movements of a posture become too lengthy, RNNs fail to preserve the record of the initial phases required to estimate the performed action, which is termed the long-term dependency issue of RNNs [182].

4) AUTO ENCODER (AE)
In [83], a pose recognition system combines a CNN with a stacked autoencoder. An autoencoder is an unsupervised method for dimensionality reduction: it encodes the input, after which the input can be reconstructed by decoding through a neural network. Image inputs are fed into the hidden units of a Stacked Auto Encoder (SAE) to extract features, which are subsequently fed into the output units of the SAE to rebuild the input image. The features from the final layers are used as inputs for the classifier, a neural network that maps the extracted features to the output labels. The SAE is followed by a neural network containing 784 input units, 100 hidden units, and as many output units as classes [83]. Two datasets of 12 karanas and 14 karanas containing 864 and 1260 images were collected from YouTube, along with a third dataset containing 400 images of 8 yoga postures; the model achieved accuracies of 86.11%, 97.22%, and 70% on these three datasets [83].
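The following Keras sketch illustrates the stacked-autoencoder pipeline described above, with a 784-unit input compressed to a 100-unit code whose features feed a small SoftMax classifier; the layer sizes follow the text, while everything else (activations, optimizer, the class count of 8) is an assumption.

```python
from tensorflow.keras import layers, models

n_classes = 8

inputs = layers.Input(shape=(784,))
code = layers.Dense(100, activation="relu")(inputs)        # encoder (hidden units)
recon = layers.Dense(784, activation="sigmoid")(code)       # decoder reconstructs the input
autoencoder = models.Model(inputs, recon)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X, X, epochs=20)            # unsupervised pre-training on flattened images X

clf_out = layers.Dense(n_classes, activation="softmax")(code)
classifier = models.Model(inputs, clf_out)
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
# classifier.fit(X, y, epochs=20)             # supervised fine-tuning on pose labels y
```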

5) HYBRID MODEL
In recent times, CNN-LSTM hybrids have been used for sentiment analysis, text categorization, cardiac prognosis, face anti-spoofing, and skeleton-based pose estimation. A combined deep learning approach using CNN and LSTM was implemented for yoga posture recognition on real-time clips. In this model, the CNN layers extract features from the keypoints of every frame acquired through OpenPose, and the LSTM layer delivers the temporal posture estimation [96].
This deep learning model employs a hybrid of CNN and LSTM: CNNs are often used for pattern recognition problems, while LSTMs are used for time-series applications. A time-distributed CNN network extracts features from the two-dimensional keypoint coordinates generated in the preceding stage, the SoftMax layer estimates the probability of each asana in a frame, and an LSTM model examines the changes in those features across frames. A predicted threshold value is used to identify frames in which the user is not practicing yoga, and the impact of polling over frames is explored. This model predicts the yoga postures in every video with 99.04% frame-wise accuracy and 99.38% accuracy with a poll over 45 frames, and it attains a real-time accuracy of 98.92% for a group of 12 diverse persons performing six yoga poses [96].
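A hedged Keras sketch of such a time-distributed CNN followed by an LSTM is shown below; the keypoint count, filter sizes, 45-frame window, and six-class output echo the description above but are otherwise assumptions, not the published architecture.

```python
from tensorflow.keras import layers, models

frames, keypoints, coords, n_poses = 45, 18, 2, 6

model = models.Sequential([
    layers.Input(shape=(frames, keypoints, coords)),
    # Apply the same small 1D CNN to the keypoints of every frame
    layers.TimeDistributed(layers.Conv1D(32, 3, activation="relu", padding="same")),
    layers.TimeDistributed(layers.GlobalAveragePooling1D()),
    layers.LSTM(64),                                   # temporal modelling across frames
    layers.Dense(n_poses, activation="softmax"),       # per-window pose prediction
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```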
Wiga introduces a deep learning model that combines CNN and LSTM to retrieve high-level characteristics. It uses fine-grained CSI as a source and constructs a deep learning model that maps motion-induced signal changes onto activity sequences. Beginning with the measured CSI inputs, Wiga filters out the undesirable signals and their redundant components; it then abstracts deep features using a CNN and models the temporal dependencies of the source with an LSTM. The model was evaluated on 17 yogic postures performed by seven practitioners, and Wiga attains 97.7% and 85.6% yogic posture accuracy for trained and untrained yoga participants, respectively [111]. Table 9 specifies the accuracies achieved by existing yoga posture recognition systems using deep learning models for classification.

C. METRICS FOR PERFORMANCE ANALYSIS
The following metrics are utilized for evaluating the performance of the ML and DL classification models.
1) Confusion Matrix: It is an N × N matrix used for comparing the actual values with the predicted ones, where N is the number of target classes. It measures the classification performance of ML methods and is also referred to as an error matrix, a table layout used to visualize and summarize the performance of a classification model. It has four outcomes used to define the evaluation metrics of the classifier:
• True Positive (TP): The model correctly predicted the specific type of asana.
• False Positive (FP): Model incorrectly predicted the specific type of asana. It is a Type-I error.
• True Negative (TN): The model correctly rejected the specific type of asana.
• False Negative (FN): Model incorrectly rejected the specific type of asana. It is a Type-II error.
2) Accuracy: It is the ratio of the correctly classified asanas out of all the asanas. Accuracy indicates how the model performs across all asanas.
3) Overall Accuracy (OA): It represents the total ratio of accurately classified asanas across all the asanas [105].

4) Precision:
It is the ratio of the number of asanas correctly classified as a specific type to the total number of asanas classified, correctly or incorrectly, as that type.

5) Recall: This is the ratio of correctly identified yoga poses to the overall number of yoga poses in that group.
6) F1-score: It represents the harmonic mean of precision and recall. It attempts to find the balance between precision and recall and specifies how precise and robust the classification model is (the standard formulas for these metrics are given after this list).
7) Activity Error Rate (AER): To maintain consistency with the standard recognized sequence of activities, some activities must be altered, removed, or inserted. AER is defined as the total number of altered, removed, or inserted activities divided by the length of the standard sequence of actions, expressed as a percentage [111].
8) Matthews Correlation Coefficient (MCC): The resultant coefficient of MCC is:
• +1 means perfect prediction and the best agreement between actual and predicted values.
• 0 means random prediction, indicating no agreement or relationship between the actual and predicted values.
9) Overall Correctness Score (OCS): The Yoga Help system [91] assesses the entire sun salutation sequence. If any step of the sun salutation sequence is missed during the performance, the OCS is assigned as zero; otherwise, the OCS is calculated from a, the number of steps performed at the correct speed, and b, the number of steps within acceptable deviation.
10) Support: It specifies the number of actual occurrences of a class in the dataset. Table 10 illustrates the performance evaluation metrics used for the yoga posture recognition systems.
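For reference, the standard formulas for accuracy, precision, recall, and F1-score in terms of the confusion-matrix counts defined above are (these are the usual textbook definitions, not formulas reproduced from the cited works):

```latex
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP},
\]
\[
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.
\]
```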
This section discusses the machine learning algorithms frequently used in YPR and explains how each algorithm differs depending on the application and settings. An algorithm should be chosen based on performance requirements and the type of data. The section offers a comprehensive breakdown of the ML and DL models utilized in YPR so that readers can decide what course of action to take given the availability of new ML methods. Table 11 shows a detailed analysis of the distinct computer vision methodologies utilized in YPR.

V. PREDICTION APPROACHES
The YPR system detects yoga poses through two approaches: vision-based and sensor-based methodologies.

A. YOGA POSTURE CLASSIFICATION
Yoga has recently witnessed extraordinary global popularity and is a great way to exercise at home. Sun Salutation is a form of yoga that strengthens practically every region of the body and incorporates a series of 12 linked stages [91]. Deep learning has recently achieved incredible performance in tackling yogic posture classification due to its remarkable feature learning capability [108]. With the proliferation of motion sensors, it is now possible to collect motion data and monitor the performance of yoga postures. The Yoga Help system uses motion data to recognize the various movements and estimate how well the practitioner performed them [91].
In [88], a computer vision-assisted self-training yoga system was implemented. This system categorizes the yoga positions according to skeleton- and contour-based analysis from front and side perspectives. Five practitioners were selected to perform a total of 12 yoga poses, five times each. The system achieved maximum accuracies of 99.87% and 99.15% for the front and side views, respectively.

B. YOGA POSTURE GRADING
Unlike yoga pose categorization, which seeks to deduce the yoga posture class label, yogic posture grading (YPG) attempts to measure the individual's yogic actions statistically. Although there are numerous studies on yogic posture categorization, there are very few studies on yogic posture grading [108].
The reference posture is the desired yogic posture the client attempts to perform. The target posture and the posture derived from the keypoint prediction model are compared in order to validate the agreement among the various angles and joints, and this resemblance determines whether the participant's posture is correct. One technique for identifying anomalies is to examine the angles between a participant's joints and confirm that those angles lie within the tolerance, defined by yoga expertise, for performing that yogic posture [94].
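A hypothetical Python sketch of such an angle-tolerance check is given below; the joint names, reference angles, and tolerance value are illustrative and not taken from the cited systems.

```python
def grade_pose(user_angles, reference_angles, tolerance_deg=15.0):
    """Return per-joint correctness and the fraction of joints within tolerance."""
    report = {}
    for joint, ref in reference_angles.items():
        diff = abs(user_angles[joint] - ref)
        report[joint] = diff <= tolerance_deg
    score = sum(report.values()) / len(report)
    return report, score

# Illustrative reference and user joint angles (degrees)
reference = {"left_knee": 175.0, "right_knee": 90.0, "left_elbow": 180.0}
user = {"left_knee": 168.0, "right_knee": 72.0, "left_elbow": 179.0}
print(grade_pose(user, reference))   # flags the right knee as outside tolerance
```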
In [24], a computer vision-assisted autonomous yoga learning system was developed. In this interactive learning system, the player's gesture is compared with the standard yoga posture, and a grade is computed using distance transition and pattern matching. Six people were chosen to perform every posture three times in the trials, and the system score was assigned between 0 and 100. Approximately 86 percent of the differences between the scores given by the computer and by the yoga instructor lie within -2.5 to 2.5.
The effectiveness of a user's yogic posture is assessed using a Self-Practice YPR with an OpenPose-based Workout and Overall Coaching Consultant model, for which a statistics-based grading approach is suggested. The algorithm generates an overall score by weighting the selected keypoints. Initially, the poses of 50 practitioners resembling a yoga instructor's pose were recorded through subjective evaluation, and the two angle variations of each chosen keypoint were computed. Second, the angle-difference values were normalized, and the mean score and standard deviation were computed from the distribution. Finally, the postural-assessment intervals for each chosen keypoint are determined by adjusting the threshold based on the angle changes. Some postures concentrate on the hands, while others concentrate on the feet, and every competent yoga teacher is familiar with the prerequisites for each yogic position. With appropriate weighting, the most important keypoints receive a weight of 80, while the other keypoints receive 20 [99].
The yoga teacher visual analysis system compares and assesses recorded yogic video sequences of beginners and professionals using the speeded-up robust features algorithm. It enables users to adjust their yoga postures without the assistance of yoga professionals [107].
Grading a yoga posture then involves selecting two yoga posture photos, one of the student and one of the teacher, retrieving the human skeleton keypoints, and entering them into a posture characteristic encoder; finally, the overall feature similarity between them is computed to generate the posture grade [24], [94], [99], [107], [108]. Table 12 presents a detailed overview of these approaches used in recent ML and DL models.

VI. INFERENCES
This section illustrates some inferences observed through this detailed analysis.
• Yogic posture recognition is one of the major research challenges in computer vision. It is well established that proper yoga posture promotes health benefits and lowers the illness burden, yet the health advantages of practicing yoga are still not widely known.
• Enhanced tools with excellent resolution and accuracy are available for vision-based models to identify yoga poses, but they lack privacy. Hence, newer models utilize multiple sensors to predict yogic posture and achieve high accuracy.
• Machine learning and deep learning models provide accurate results to some extent for estimating yogic posture; beyond that, combinations of these models (hybrid models) are applied to achieve better results.
• In the current scenario, YPR models must be accompanied by skeletal feature extraction, Keypoint estimation tools, angle calculations, and learning algorithms.
• From this detailed analysis, it has been noted that yoga posture can be accurately classified and graded in image, video, and live streams using a variety of ML, DL, and hybrid models.

VII. CONCLUSION
YPR aids in advancing different applications in healthcare, security, gaming, and fitness. Due to their small size and low cost, vision-based detectors and wearable sensors have historically been widely utilized to collect yogic postures. This paper investigates the most recent yoga posture assessment developments using sensing devices, machine learning, and deep learning methodologies. First, the features of vision-based and sensor-based devices are presented. A thorough analysis of keypoint estimation strategies used in recent research follows, highlighting the key merits and shortcomings of each strategy, together with information about the publications, the kinetic models used, and the sensor locations on the human body. The widely used machine learning and deep learning algorithms are then reviewed, along with their reported accuracies. Finally, recent research on classification and grading approaches for yogic posture is summarized.

The discovery of advanced textiles with multi-sensing abilities and the development of sensing devices for smart healthcare open new directions for further research. Yoga, pranayama, and other forms of exercise offer health advantages that can be a breakthrough in healthcare applications and open the door for healthcare sectors to extend their emphasis in the face of the COVID-19 pandemic. This raises awareness about using e-textiles, smart textiles, and other vital human monitoring devices. The use of such wearables to monitor bodily vitals has rapidly increased during the pandemic; it creates self-awareness of health care in every individual across the globe and helps predict health issues.

The existing approaches only predict the pose accurately when all the body parts are visible. During complex yogic postures, some body parts overlap with others and are obscured from view or difficult to detect, and current approaches typically do not label or predict these concealed body parts. Most large-scale pose estimation datasets comprise typical everyday poses, making it difficult for existing models to handle such complicated poses successfully, and existing yoga posture recognition systems have not produced the expected results in complex yogic posture recognition. The investigation of three-dimensional body posture recognition might help here and could address new challenges in the future. A novel approach to detecting multi-person yogic postures in a single frame during yoga practice is another possible extension of YPR; many factors, including background, lighting, and overlapping figures, make predicting multi-person postures even more difficult. Future investigation will focus on enhancing the potential of these systems to detect a broad range of yogic postures performed by multiple persons and on generalizing the methods for real-world use. Models that are robust across multiple camera angles will be essential in applications such as an automated yoga trainer; this requires training the models on a large number of yogic postures captured from a variety of camera angles. New inventions with the right combination of ML methods and wearables will benefit the entire society and conveniently improve the quality of people's lives. An innovative integrated yoga mat combined with a smartphone and optional wearables such as a smartwatch, smart band, or other smart gadgets is a powerful combination for creating a personalized yoga journey in the future.
Future work also includes designing an efficient mobile application for yoga self-assessment and gaming tools for health improvement during this pandemic period, as well as a novel approach to predict the yogic postures of amputees from images.