Face Emotion Recognition Based on Machine Learning: A Review

Computers can now detect, understand, and evaluate emotions thanks to recent developments in machine learning and information fusion. Researchers across various sectors are increasingly intrigued by emotion identification, utilizing facial expressions, speech, and body language.


INTRODUCTION
Despite their best efforts, humans cannot fully suppress emotions; some researchers argue that emotions are inherent abilities. Emotion detection, an automated technique for determining an individual's affective state, is becoming increasingly important in the field of human-computer interaction (HCI) for a variety of applications, such as automobile safety (Hudlicka & Broekens, 2009). Unfortunately, most modern HCI systems lack emotional intelligence, rendering them incapable of processing or understanding emotional data and making decisions based on such information (Newell & Marabelli, 2015). Typically, emotions are assessed by analyzing patterns of facial expressions, head movements, eyelid movements, or a combination of these factors. While the visual sense of facial emotions is valuable for emotion identification, it is not always sufficient (Vankalayapati et al., 2011). In advanced intelligent systems, addressing the disconnect between humans and machines is crucial. A system that cannot recognize human affective states is prone to inadequate responses to those states. Therefore, it is crucial to train machines to interpret and understand human emotional states (Amanoul et al., 2021). As a result, the development of a reliable, accurate, flexible, and resilient emotion identification system becomes imperative for successful implementation in intelligent Human-Computer Interaction (HCI). With the overarching goal of instilling machines with emotions, an increasing number of researchers in artificial intelligence (AI) have explored affective computing, particularly emotion recognition, establishing it as an emerging and promising area of study (Kratzwald et al., 2018). Numerous studies on emotion recognition in audiovisual formats have been conducted over the years. The literature generally exhibits three primary methods: visual-based, audio-visual, and audio-based approaches. Early research primarily concentrated on independently addressing auditory and visual data modalities. The basis of audio-based emotion detection endeavors involves extracting and identifying emotional states from human voice signals (Tawari & Trivedi, 2011; Kashevnik et al., 2021).
An increasing cohort of experts in the fields of ergonomics and intelligent systems is focused on improving the efficiency and flexibility of Human-Computer Interaction. This dedication arises from the rising prevalence of collaboration between humans and machines in diverse contexts. In intelligent HCI systems, computers must exhibit adaptability to accurately comprehend human communication styles and deliver appropriate responses (Zhihan et al., 2022). Human intentions are conveyed through both verbal and nonverbal means, encompassing a spectrum of emotional expressions. The understanding of human emotions and behavior plays a pivotal role in the adaptability of computers, giving rise to a burgeoning field known as affective computing (Shantanu et al., 2022; Shadeeq et al., 2023). The selection of distinctive features for discerning various emotions is a critical consideration in this context. Two categories of features, namely prosodic and spectral features, have been identified as valuable for the identification of emotions in speech (Chen et al., 2005; Wu et al., 2014).

BACKGROUND THEORY
A person's intricate emotional state is shaped by the interplay of behavior, thoughts, and feelings, manifested through psychophysiological reactions to internal or external stimuli. The quest to quantify beauty has prompted numerous studies across diverse fields, including psychology, philosophy, biology, and the arts, with a particular focus on facial analysis and aesthetics (Saeed et al., 2022). Affective computing serves various purposes, notably contributing to intelligent and user-friendly Human-Computer Interaction (HCI). Precise real-time detection of the human operator's emotional state can enhance HCI systems significantly (Dave, 2023). In military and aerospace domains, it is feasible to promptly identify the high-risk functional status of soldiers, pilots, and astronauts in real time. Beyond this, emotion recognition technology finds application in public transit, enhancing driving safety by monitoring a driver's real-time emotional state and preventing risky driving during periods of extreme emotional stress (Zhang et al., 2020). In eLearning systems, the primary emphasis is on single-user face detection, where facial expressions are used to represent the user's emotions, enabling appropriate adjustments to instructional tactics (Ashwin et al., 2020; Abdullah & Abdulazeez, 2021). This interdisciplinary field draws from cognitive science, psychology, and computer science. Emotions wield substantial influence over human behavior, impacting processes like perception, attention, learning, and decision-making (Zhao et al., 2016). Within the training process of machine-learning systems, loss functions are fundamental components; the optimal parameter values for the system are obtained by minimizing the mean loss value across a labeled training set (Saeed et al., 2023). Affective computing applications span diverse domains, including automatic driving assistance. Physiological signals are employed in alert systems to monitor the user's state. For instance, if a driver is too fatigued, unresponsive, or unwell to drive, the system can issue alerts and take appropriate actions, such as reducing speed or stopping the vehicle, to enhance driving safety and security (Kashevnik et al., 2021; Yang et al., 2018).
The rapid advancements in Artificial Intelligence emphasize the urgent requirement for intelligent HCI. Researchers are increasingly drawn to emotional computing as a significant area of study within AI (Anderson & McOwan, 2006). Emotion recognition research aims to improve human-computer interaction by making it smoother, more natural, and friendlier. This entails a transition from machine-centric to human-centric machine design, evolving the computer from a solely logical computing unit into an intuitive, perceptive one.
To achieve this transformation, a crucial prerequisite is the incorporation of affective computing capabilities into the machine or computer.
Without emotional intelligence, the computer or device would be unable to attain a level of intelligence equivalent to that of humans (Zhang et al., 2020; Jenke et al., 2014).

The Concept of Emotion
The initial step toward recognizing emotion involves defining the concept itself. The interdisciplinary nature of the inquiry, spanning computer science, philosophy, and neuroscience, has led to multiple efforts to address this question and formulate a comprehensive definition of emotion. However, consensus is lacking, and discord persists, with no universally accepted definition. The significance of defining emotion is particularly pronounced in Machine Learning (ML), where a clear definition is essential for establishing success criteria.
To address this challenge, a common strategy involves categorizing emotions using two models: continuous and discrete (Chen & Zhang, 2017). The general process of emotional computing based on physiological inputs can be outlined in three phases (Halfon et al., 2011):
i) Feature extraction: This involves extracting features from diverse sources of heterogeneous physiological signals, including respiration, pulse rate, galvanic skin reaction, EEG, and ECG.
ii) Emotion recognition: The second step is the identification of the emotional state through the processed physiological data.
iii) Emotional regulation: The final step involves the control or modification of emotions using psychological techniques, completing the cycle of emotional computing.

Discrete Emotion Spaces
Throughout history, philosophers have delved into the realms of emotion and feelings, with traces of such contemplation dating back to antiquity (Gravier, 2017). During the Roman Empire, emotions were conceptualized and categorized into four fundamental types: fear, pain, lust, and pleasure. Meanwhile, other work posits that emotions have an evolutionary origin (Colzato, 2017), transcending cultural boundaries and supporting the notion of natural selection (Deng & Ren, 2021).
In order to capture the complexities and subtleties of emotions, numerous authors have embraced the concept of continuous multi-dimensional space models. In these models, emotions are measured along predetermined axes within a continuous multi-dimensional space, facilitating easier comparison and classification of emotions (Trnka et al., 2021). This approach offers a framework for overcoming the challenges associated with understanding and categorizing the intricate landscape of human emotions.

Continuous Dimensions
Two important factors need to be taken into account in a continuous portrayal of emotion: the ability to delineate correlations between various emotional states, such as sadness and admiration or trust, and the quantification of a specific condition, distinguishing, for example, between very sad, sad, and not sad. Arousal, ranging from calm to excited, and valence, spanning from negative to positive, are fundamental dimensions taken into account in this context. Figure 1 shows the mapping of several emotions inside the two-dimensional valence-arousal space (Gunes et al., 2011; Sadeeq & Abdulazeez, 2022). This representation aids in capturing the nuanced variations in emotional experiences by considering both the intensity and the positive or negative nature of emotions.
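To make the continuous representation concrete, the following minimal Python sketch places a few discrete emotion labels at rough, hypothetical coordinates in the valence-arousal plane and maps a continuous estimate to its nearest label. The coordinate values are illustrative assumptions, not values taken from a validated affect model.

```python
# Illustrative sketch: discrete emotions placed in the 2-D valence-arousal
# space. Coordinates are rough, hypothetical positions chosen for
# illustration only.
from math import dist

emotion_space = {
    #            (valence, arousal), both in [-1, 1]
    "happy":   ( 0.8,  0.5),
    "excited": ( 0.6,  0.9),
    "calm":    ( 0.5, -0.6),
    "sad":     (-0.7, -0.4),
    "angry":   (-0.6,  0.8),
    "afraid":  (-0.7,  0.6),
}

def nearest_emotion(valence: float, arousal: float) -> str:
    """Map a continuous (valence, arousal) estimate to the closest label."""
    return min(emotion_space,
               key=lambda e: dist(emotion_space[e], (valence, arousal)))

print(nearest_emotion(0.7, 0.4))  # -> "happy"
```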

Autonomic Nervous System
Emotions have been evolutionarily maintained as a crucial mechanism for efficiently mobilizing and coordinating rapid responses from diverse systems when environmental stimuli threaten existence. Despite general agreement with Levenson's theory, theorists diverge on the number of distinct emotional states connected to specific Autonomic Nervous System (ANS) patterns (Mohsin & Beltiukov, 2019; Mendl et al., 2022).
On one side, some theorists assert that the "on" and "off" states represent the only two ANS patterns. Conversely, other researchers propose the existence of numerous ANS activation patterns, each linked to a distinct emotion. The intricate functionality of the ANS introduces an additional challenge in establishing correlations between an individual's emotional state and their present physiological signals. Physiological signal alterations, such as an increase in heart rate or breathing, are more likely attributed to non-emotional ANS functions rather than emotional ones, further complicating the association between physiological responses and emotional states (Giannakakis, 2019).
vi) Electromyography (EMG)
Electromyography gauges the electrical activity of skeletal muscles through the use of needle or surface electrodes. Muscle contractions cause an increase in the amplitude of the EMG signal (Patil & Pawar, 2022; Ibrahim et al., 2016).

vii) Electroencephalography (EEG)
EEG measures the electrical field generated by currents during synaptic connections between neurons in the cerebral cortex (Brienza & Mecarelli, 2019).

viii) Eye Gaze
Methods including electrooculography (EOG), photoelectric, and infrared reflection detect the eye's resting potential as well as variations during vertical and horizontal eye movements. This provides insights into visual attention and emotional responses (Pazvantov & Petrova, 2022). It is essential to use pre-validated emotional stimuli to achieve a comprehensive portrayal of feelings at varying intensities because of the subjectivity and variation in emotion elicitation (Somarathna et al., 2022).

Machine Learning Algorithms
Machine learning algorithms have found applications in various sectors, ranging from medicine to economics. One specific field that focuses on analyzing data patterns in an educational setting is education data mining (Ahmed et al., 2021). Once the necessary dataset has been generated with relevant attributes, employing a robust classification method becomes a crucial next step. For the multi-class classification of human expressions, support vector machines (SVM) are commonly used, frequently in combination with different feature extraction techniques (Raut, 2018). Within the realms of computer vision and machine learning, forecasting face attractiveness stands as a challenging yet pivotal undertaking. The intricacies of human perception and the diversity of facial appearances pose challenges in developing reliable and efficient Face Beauty Prediction (FBP) models (Saeed et al., 2023). The discussion of the benefits and drawbacks of applying deep learning models is essential for the development of glaucoma screening, diagnosis, and detection systems (Kako & Abdulazeez, 2022). Understanding both the advantages and limitations will contribute to the effective utilization of these models in enhancing glaucoma-related healthcare practices.

Support Vector Machine (SVM)
Among the most effective classification methods, Support Vector Machines (SVM) aim to identify the best hyperplane that accurately separates two classes. The concept of a margin is crucial, representing the maximum distance from both classes to prevent any overlap. To manage non-linear data, kernel functions like the polynomial and radial basis function (RBF) are utilized. Rather than using binary classification, a multi-class SVM is frequently used in the context of emotion detection to identify a variety of emotions, including fury, fear, disgust, contempt, happiness, sorrow, and surprise.
When comparing multiple machine learning methods and mitigating variations from the database, k-fold cross-validation is commonly applied (Rajesh & Naveenkumar, 2016). Fine-tuning the variables gamma and C has a notable impact on the accuracy of classifiers, allowing for optimization in both binary and multi-class classification scenarios (Loconsole et al., 2014).
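A minimal scikit-learn sketch of the workflow just described follows: a multi-class RBF-kernel SVM whose C and gamma are tuned with k-fold cross-validation. The synthetic dataset is a stand-in; in practice the feature matrix would hold extracted facial or speech descriptors and the labels would be emotion classes.

```python
# Sketch: multi-class RBF SVM with 5-fold cross-validated tuning of C/gamma.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for extracted emotion features (4 emotion classes).
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__C": [0.1, 1, 10, 100],
              "svc__gamma": [0.001, 0.01, 0.1, 1]}

search = GridSearchCV(pipe, param_grid, cv=StratifiedKFold(n_splits=5))
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```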

Hidden Markov Models (HMM)
Hidden Markov Models (HMM) are statistically useful for revealing hidden patterns within data and are particularly popular for speech-based emotion recognition (Lee et al., 2008). In this approach, a series of observable features serves as input. The advantage of combining Hidden Markov Models and k-Nearest Neighbors (k-NN) lies in the fact that HMM can perform sophisticated computations, while k-NN only needs to classify between the given samples (Zhou et al., 2004). To achieve optimal results for speech emotion recognition, Hidden Markov Models are often employed in serial multiple classifier systems (SVC + HMM). In such systems, HMMs are employed for sample training, and classification is managed by Support Vector Machines (SVM). SVM provides a direct classification rather than a score, enhancing its applicability in this context (Nijs et al., 2016; Sadeeq & Abdulazeez, 2023).
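As a hedged sketch of the HMM idea, the snippet below trains one Gaussian HMM per emotion on sequences of frame-level features (e.g., MFCC frames) and assigns a new sequence to the model with the highest log-likelihood. It assumes the third-party hmmlearn package; the random data is a stand-in for real feature sequences, and this is an illustration of the general technique rather than the specific serial SVC + HMM system described above.

```python
# Sketch: per-emotion Gaussian HMMs classified by maximum log-likelihood.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)

def train_per_emotion(sequences_by_emotion, n_states=3):
    models = {}
    for emotion, seqs in sequences_by_emotion.items():
        X = np.vstack(seqs)               # all frames stacked together
        lengths = [len(s) for s in seqs]  # frame count of each sequence
        m = GaussianHMM(n_components=n_states, covariance_type="diag",
                        n_iter=50)
        m.fit(X, lengths)
        models[emotion] = m
    return models

def classify(models, seq):
    # Pick the emotion whose HMM best explains the observation sequence.
    return max(models, key=lambda e: models[e].score(seq))

# Random stand-in: 5 sequences of 40 frames of 13-dim features per emotion.
data = {}
for shift, emotion in enumerate(["anger", "happiness", "sadness"]):
    data[emotion] = [rng.normal(loc=shift, size=(40, 13)) for _ in range(5)]

models = train_per_emotion(data)
print(classify(models, rng.normal(loc=1.0, size=(40, 13))))
```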

Random Forest Classifiers
In certain instances, decision trees have demonstrated their superiority over Support Vector Machines (SVM). Built on decision trees, random forests improve performance by combining many trees rather than relying on a single classifier to identify the target variable's class. S1 through S5 represent the subset of emotions that are used for detection. Various techniques, including K-Nearest Neighbor, Artificial Neural Networks (ANN), and Linear Discriminant Analysis, are employed for emotion prediction and categorization (Chen et al., 2017). Figure 4 illustrates the classifier for this type, showcasing the architecture and methodology used in these approaches.
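The following minimal sketch shows the random-forest idea described above with scikit-learn: an ensemble of decision trees votes on the emotion label, scored here with cross-validation on synthetic stand-in features.

```python
# Sketch: random-forest emotion classifier on synthetic descriptor vectors.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in for extracted facial descriptors, 5 emotion classes.
X, y = make_classification(n_samples=500, n_features=30, n_informative=12,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(forest, X, y, cv=5)
print("mean CV accuracy:", round(scores.mean(), 3))
```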

METHODS
Figure 5 outlines the key processes essential for building a machine learning system dedicated to emotion recognition.

Fig. 5. Schematic illustration of an emotion identification machine learning technique (Zhang et al., 2020)

The data acquisition protocol in developing a machine learning (ML) system for emotion recognition is susceptible to various issues that introduce noise and external interference into the sensor signal. Factors such as subject movement, electrode disconnection, environmental changes in humidity and temperature, electrostatic artifacts, and unexpected user movements can lead to signal degradation. Consequently, the initial stage in ML system development typically involves applying signal preprocessing techniques to the raw signal. This involves filtering, noise reduction, and outlier removal, synchronizing signals from various sensors, and addressing null values and data loss through methods such as linear interpolation (Hosseini & Khalilzadeh, 2010).
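An illustrative preprocessing chain for one raw physiological channel, following the steps just named, is sketched below: linear interpolation of dropped samples, outlier clipping, and a band-limiting filter. The sampling rate, cutoff, and clipping threshold are arbitrary example values, and the signal is synthetic.

```python
# Sketch: interpolation of missing samples, outlier clipping, low-pass filter.
import numpy as np
import pandas as pd
from scipy.signal import butter, filtfilt

fs = 256.0                                  # sampling rate (Hz), assumed
t = np.arange(0, 10, 1 / fs)
raw = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.random.randn(t.size)
raw[::1000] = np.nan                        # simulate dropped samples

# Fill data loss by linear interpolation (both directions covers edges).
sig = pd.Series(raw).interpolate(method="linear", limit_direction="both")

# Clip outliers beyond 4 standard deviations.
mu, sd = sig.mean(), sig.std()
sig = sig.clip(lower=mu - 4 * sd, upper=mu + 4 * sd)

# Zero-phase low-pass filter to suppress high-frequency noise.
b, a = butter(4, 30.0, btype="low", fs=fs)
clean = filtfilt(b, a, sig.to_numpy())
print(clean.shape)
```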
Convolutional Neural Networks (CNNs) are recognized as trainable systems with the ability to reduce dimensionality and acquire discriminative features. Identifying emotional states can be accomplished through two primary methods: (1) feature-dependent machine learning techniques that rely on feature-class representation, and (2) feature-independent machine learning methods, including approaches within deep learning (DL) (Patil & Pawar, 2022; Haji et al., 2021). To address challenges like overfitting and limited dataset size, researchers have recently turned to transfer learning and components of deep convolutional neural networks (DCNN). However, the implementation of large CNN systems remains challenging due to their computational intensity and the millions of parameters involved, particularly on small devices with limited hardware resources (Saeed et al., 2022). In traditional machine learning system design, a feature engineering stage is often introduced after signal preprocessing to optimize the useful content of physiological signals. Once feature engineering is completed on the incoming input, a classifier outputs the subject's emotion class label (Zhu et al., 1988; Tripathi et al., 2017). These metrics, commonly known as features, offer a succinct description of the signal, enabling comparisons across different signals in transformed dimensions and augmenting the informative characteristics of the signals. These features may be linear or non-linear, unimodal or multimodal, and can belong to the temporal, statistical, or spectral domains. Figure 7 illustrates the data variable types (Patil & Pawar, 2022).
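As a minimal sketch of the feature-independent (deep learning) route, the Keras model below maps 48x48 grayscale face crops (the FER-2013 image format) to seven emotion classes. The layer sizes are illustrative assumptions, not a tuned architecture from any of the reviewed papers.

```python
# Sketch: small CNN for 48x48 grayscale faces, seven emotion classes.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(48, 48, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),   # helps combat overfitting on small data
    tf.keras.layers.Dense(7, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```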
Research findings indicate that Convolutional Neural Networks (CNNs), as a deep data-driven technique, exhibit effectiveness in extracting or predicting facial attractiveness from images (Abdulkareem & Abdulazeez, 2021). Improved performance is achieved by CNN models with deeper structures, larger input images, and smaller convolution kernels (Saeed et al., 2023; Alzubaidi et al., 2021). Genetic algorithms (GA) and differential evolution (DE) are two widely recognized algorithms designed to simulate the genetic process of reproduction (Sadeeq & Abdulazeez, 2023).

LITERATURE REVIEW
K. P. Seng and colleagues proposed an approach aimed at emotion recognition in audio and video streams (Seng et al., 2016). Their method integrates machine learning and rule-based strategies to enhance the effectiveness of emotion recognition. The visual route is established to achieve dimensionality discrimination and reduction, utilizing Bi-directional Principal Component Analysis (BDPCA) and Least-Square Linear Discriminant Analysis (LSLDA). The visual characteristics are subsequently analyzed by an Optimized Kernel-Laplacian Radial Basis Function (OKL-RBF) neural classifier. In the audio route, features such as Mel-scale frequency cepstral coefficients, spectral properties, log-energy, pitch, the teager energy operator, and zero crossing rates are employed. This comprehensive approach seeks to improve recognition efficacy by combining advanced techniques for both visual and auditory emotional cues.
Xiang and Tran (2017) introduced a sophisticated linear model designed to distinguish facial movements in expressive face recordings with diverse linearly-representable characteristics. In contrast to previous approaches that required a clear but somewhat unrealistic dissociation of identity and expression, their approach uses sparse representation just on the residual expression components and simultaneously captures the underlying neutral face. This is accomplished by implicitly subtracting the neutral face and leveraging the low-rank characteristic between frames. In experiments conducted on manually created expression components, their one-shot C-HiSLR, when applied to raw face pixel intensities, demonstrated superior performance compared to traditional shape + SVM models with landmark detection and two-stepped Sparse Representation Classification on CK+.
Shaees et al. (2020) employed transfer learning by training Convolutional Neural Networks (CNNs) with millions of images, allowing the knowledge gained in training to be applied to a different task. AlexNet, the selected pre-trained CNN, uses a hybrid classifier that blends transfer learning with a Support Vector Machine (SVM)-like classification methodology. The evaluation of their approach involved testing it on the Cohn-Kanade+ (CK+) and Natural Visible and Infrared Expression (NVIE) databases, both widely used expression databases.
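A hedged sketch of this hybrid transfer-learning idea follows: a pre-trained AlexNet from torchvision serves as a frozen feature extractor, and a linear SVM classifies the resulting embeddings. The input tensors and labels are random placeholders, and the details (layer truncation, kernel choice) are illustrative assumptions rather than the authors' exact configuration.

```python
# Sketch: frozen AlexNet features + SVM classifier on the embeddings.
import torch
from torchvision.models import alexnet, AlexNet_Weights
from sklearn.svm import SVC

backbone = alexnet(weights=AlexNet_Weights.DEFAULT)
backbone.classifier = backbone.classifier[:-1]   # drop final 1000-way layer
backbone.eval()

with torch.no_grad():
    images = torch.randn(16, 3, 224, 224)        # stand-in face crops
    feats = backbone(images).numpy()             # 4096-d embeddings

labels = [i % 4 for i in range(16)]              # placeholder emotion labels
clf = SVC(kernel="linear").fit(feats, labels)
print(clf.predict(feats[:4]))
```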
Jain et al. (2018) presented a hybrid convolution-recurrent neural network method for facial emotion recognition (FER) in photos. Convolution layers and a recurrent neural network (RNN) are used in the suggested network design to make it easier to discover correlations from facial photos. During the classification process, temporal dependencies in the images are accommodated by the recurrent network. The researchers assessed their hybrid model using two publicly available datasets, demonstrating promising experimental results. In the data preparation stage, each facial outline is initially identified using a face and point-of-interest finder. Following normalization, the eyes, nose, and lips are aligned, and mean subtraction and contrast normalization are applied when processing each facial image through the CNN.
Arora et al. (2018) introduced a system incorporating a random forest classifier for facial expression detection. The experiment assessed the system's performance in recognizing five common emotions (sadness, joy, anger, neutrality, and surprise) utilizing data from the Japanese Female Facial Emotion (JAFFE) database. The proposed framework shows promise for real-life applications, particularly in conjunction with electroencephalograms and brain-computer interfaces. Considering that facial emotions arise from facial muscle deformations, the system utilizes gradient features, well-known for their sensitivity to object deformations, to encode these facial components. Emotion classification is the next testing phase, when assessment parameters like false acceptance rate, false rejection rate, and recognition accuracy are measured. Lasri et al. (2019) devised a system focused on recognizing students' emotions based on facial cues, employing a three-phase approach: face detection using Haar Cascades, normalization, and emotion recognition using Convolutional Neural Networks (CNN) on the FER 2013 database, which includes seven distinct expression types. The findings suggest the feasibility of detecting facial emotions in educational settings, offering potential assistance to teachers in adjusting their presentations based on students' emotional states.
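The face-detection front end used in such pipelines can be sketched with OpenCV's bundled Haar cascade, as below. The file path "photo.jpg" is a hypothetical placeholder, and the 48x48 crop size simply matches the FER-2013 input convention; the detected crops would then be normalized and passed to the emotion classifier.

```python
# Sketch: Haar-cascade face detection and cropping with OpenCV.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("photo.jpg")                # placeholder input image
if img is None:
    raise SystemExit("place a face photo at photo.jpg")

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    crop = cv2.resize(gray[y:y + h, x:x + w], (48, 48))  # CNN input size
    # crop is now ready for normalization and emotion classification
```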
Zhang et al. (2019) presented a method for recognizing facial expressions using image edge detection and a Convolutional Neural Network (CNN). The process involves normalizing the facial expression image, extracting edges using convolution, incorporating edge information onto feature images to preserve texture details, reducing dimensionality through maximum pooling, and ultimately employing a Softmax classifier for expression classification. A simulation experiment evaluates the method's robustness in facial expression identification against complex backgrounds, in which the Fer-2013 facial expression database is combined with the Labeled Faces in the Wild (LFW) dataset for scientific testing purposes.
In order to assess primary emotions in images, like happiness or sadness, Verma built a Convolutional Neural Network (CNN) with two components: the first one predicts the secondary emotion, while the second one analyzes the fundamental emotion (Verma & Verma, 2020). After being trained using the FER2013 and Japanese Female Facial Expression (JAFFE) datasets, the model showed superior capacity to predict emotions from facial expressions compared with existing state-of-the-art approaches.
The authors' use of facial expression analysis has improved the ability to identify emotions. The phases of creating, honing, and testing an algorithm for emotion recognition based on logistic regression are described in their work. The study offers comprehensive information about the optimization process and outcomes in training and test sets (Barrionuevo et al., 2020).
A novel approach to facial emotion recognition using a Convolutional Neural Network (FERC) was proposed by Mehendale (2020). The CNN in FERC is divided into two segments: the first segment eliminates the backdrop of the image, and the second segment concentrates on extracting facial feature vectors. An expressional vector (EV) is used by the FERC model to distinguish between five different categories of typical facial expressions. Supervisory information was gathered from a 10,000-photo collection that included 154 different people. The last perceptron layer modifies weights and exponent values in each iteration of the two-level CNN, which functions in a sequential fashion. Notably, FERC differs from standard methods by employing a single-level CNN, which adds to improved accuracy. Moreover, a new backdrop removal method that is applied prior to expressional vector (EV) production tackles possible problems such as changes in camera distance. Alreshidi and Ullah (2020) introduced a modular system designed for the recognition of human facial emotions, consisting of two machine learning algorithms for offline training and subsequent real-time application, specifically for detection and classification. In another work, a genetic algorithm (GA) was integrated with support vector machine (SVM)-based classification to address a multi-attribute optimization problem involving feature and parameter selection. The research used two datasets: the Multimedia Understanding Group (MUG) dataset and the extended Cohn-Kanade dataset (CK+). The authors contrasted their method with convolutional neural networks (CNNs), a popular method for identifying emotions on faces (Liu, et al., 2020).
A directed graph neural network (DGNN), more precisely a graph convolutional neural network that uses landmark information for facial emotion recognition (FER), was introduced by Ngoc et al. (2020). In this model, landmarks serve as nodes in the graph structure, and the Delaunay method constructs the edges in the directed graph. By leveraging the underlying geometrical and temporal information present on faces, emotional cues are extracted using graph neural networks, aiming to avoid the vanishing gradient issue. Additionally, their model includes a stable temporal block in the graph architecture. A unique method for facial expression identification using deep neural networks inside a decision tree framework was reported by Ruzainie et al. (2021). Discrete Cosine Transform (DCT) coefficients with low frequencies are arranged to represent expression features. These coefficients are discovered by applying the 2-D DCT as an unsupervised feature extractor on the difference images between neutral and expression photos. Geometric properties are not used because of the practical difficulties in obtaining them for practical applications. The first two decision tree nodes focus on three expressions: surprise, smiling, and melancholy. They are implemented by three efficient one-hidden-layer multilayer perceptrons (OHL-MLP), trained by the back-propagation approach. The third node in the decision tree, which is a deep neural network implementation, handles and evaluates the three additional expressions: disgust, fury, and fear. An autoencoder is first applied to the directly concatenated DCT coefficients of several facial components, such as the mouth, nose, and eyes, in order to integrate and fine-tune the features. The next step is to train an OHL-MLP to classify the target expressions.
A system that divides emotions into happy, normal, and shocked categories was presented (Sujanaa, et al., 2021). The video frames in the dataset were retrieved at a rate of twenty frames per second and depict a range of moods. A Haar-based cascade classifier is used to segment the mouth area in the facial images. The system extracts edge and local information as well as gradient information from the emotion image using the local binary pattern (LBP) and the histogram of oriented gradients (HOG). Every mouth image is represented by a single histogram created by combining these elements.
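The combined LBP + HOG descriptor for a mouth-region image can be sketched with scikit-image as below. The grayscale crop is synthetic here, and the parameter choices (neighborhood radius, cell sizes, bin count) are common defaults rather than the authors' exact settings.

```python
# Sketch: fused LBP histogram + HOG vector for one mouth-region crop.
import numpy as np
from skimage.feature import hog, local_binary_pattern

mouth = np.random.rand(64, 64)                  # stand-in grayscale crop

# Uniform LBP with 8 neighbors at radius 1 yields codes 0..9 -> 10 bins.
lbp = local_binary_pattern(mouth, P=8, R=1, method="uniform")
lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

hog_vec = hog(mouth, orientations=9, pixels_per_cell=(8, 8),
              cells_per_block=(2, 2))

descriptor = np.concatenate([lbp_hist, hog_vec])  # single fused descriptor
print(descriptor.shape)
```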
In order to facilitate multimodal emotion detection, Chen et al. (2022) proposed a method to increase heterogeneity between different modalities and create a complementary relationship. Experiments were conducted on the SAVEE and eNTERFACE'05 datasets to assess the accuracy of the suggested method. Kernel canonical correlation analysis was utilized to combine multimodal data, which includes time- and frequency-domain data gathered from voice and facial expressions. To select features from several modalities and lower dimensionality, K-means clustering was used.
The goal was to determine which classifier could best recognize negative emotions including fear, rage, disgust, and melancholy. Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron Neural Network (MLPNN), Radial Basis Function (RBF), K-Nearest Neighbor (KNN), Decision tree (J48), and Neural Network (NN) were among the classifiers whose efficacy was evaluated (Tiwari & Veenadhari). The dimensionality was reduced using Principal Component Analysis (PCA), and feature extraction techniques like the Gabor wavelet, Chi-Square, Local Binary Pattern (LBP), and Histogram of Gradient (HOG) were used.
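A minimal sketch of that comparison protocol follows: PCA for dimensionality reduction feeding several candidate classifiers, each scored by cross-validation. The features are synthetic stand-ins, and the candidate set is a subset of those named above (sklearn's DecisionTreeClassifier stands in for J48, which is a Weka implementation).

```python
# Sketch: PCA + several classifiers compared via 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=100, n_informative=20,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

candidates = {
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=0),
    "MLPNN": MLPClassifier(max_iter=1000, random_state=0),
    "KNN": KNeighborsClassifier(),
    "J48-like tree": DecisionTreeClassifier(random_state=0),
}
for name, clf in candidates.items():
    pipe = make_pipeline(PCA(n_components=20), clf)
    print(name, round(cross_val_score(pipe, X, y, cv=5).mean(), 3))
```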
A unique method for facial expression identification using a convolutional neural network (FERC) was presented by Savakar et al. (2023). Two segments make up the CNN network's FERC model: the first segment eliminates the background of the image, and the second segment extracts the facial feature vector. The FERC model detects five distinct types of regular facial expressions using the expressional vector (EV). The exponent values and weights of the last perceptual layer change with each iteration of the continuous double-level CNN. FERC stands out from typical CNN single-level technology, enhancing accuracy. CNN involves facial recognition, image processing, object identification, and related technologies, utilizing multiple layers in a deep neural network to potentially extract significant features from the data.
Subudhiray et al. (2023) conducted experiments to extract facial traits with the aim of improving facial emotion recognition. They used feature extraction approaches such as Gabor, local binary pattern (LBP), and histogram of oriented gradients (HOG), in addition to conventional k-nearest neighbor (kNN) classification. Performance metrics, such as recall, kappa coefficient, computation time, precision, and average and overall recognition accuracy, were compared. The experiment utilized two databases: Japanese Female Facial Expression (JAFFE) and Cohn-Kanade (CK+).

Dataset Type: The method is evaluated on several benchmark datasets, including the Static Facial Expressions in the Wild (SFEW) 2.0 dataset and the Real-world Affective Faces (RAF) dataset.

(Liu, et al., 2020): The method combines a genetic algorithm (GA) with support vector machine (SVM)-based classification to solve a multi-attribute optimization issue involving feature and parameter selection. In the decision-tree method, the initial node achieves a recognition rate of 99.17% for both surprised and smiling faces; the second node successfully detects sadness in every instance (100%); the last node exhibits accuracy rates of 100.00%, 70.00%, and 55.00% for the facial expressions of fear, rage, and disgust, respectively.

Dataset Type: The recommended recognition algorithm is applied to the extended Cohn-Kanade (CK+) database.

(Sujanaa, et al., 2021): Although it demands the most computing time, Gabor achieves the highest average accuracy. In contrast, HOG exhibited the lowest average accuracy at 55.2% with the shortest calculation time, while LBP demonstrated an average accuracy of 88.2% with a shorter computational time than Gabor.

Dataset Type: Two databases have been utilized: Japanese Female Facial Expression (JAFFE) and Cohn-Kanade (CK+).
DISCUSSION
A hybrid convolutional-recurrent model was evaluated in this context (Jain, et al., 2018). After testing the proposed model in many scenarios and fine-tuning its hyperparameters, it was discovered that combining the two kinds of neural networks (CNN-RNN) greatly enhanced the overall detection results. Furthermore, in a different work, the researchers employed gradients to encode facial traits as components of the face, training a random forest classifier to identify emotions (Arora, et al., 2018). The suggested system was accurate in identifying common emotions, such as happy, sad, angry, neutral, and astonished, when tested on the Japanese Female Facial Emotion (JAFFE) database. Furthermore, a study demonstrated how CNN models can be trained to recognize facial emotions, suggesting that facial expression detection is a feasible application in educational contexts (Lasri, et al., 2019). This capability could enable teachers to adapt their lessons based on the emotional states of students (Zhang, et al., 2019).
The proposed approach aims to mitigate the limitations imposed by artificial design elements and facilitate automatic learning of pattern features. This method utilizes image data from training samples, directly inputting pixel values from each image. Through unconscious, autonomous learning, the model can capture more abstract features of the images. The appropriate initialization of weights during the training phase significantly influences weight updates, contributing to the effectiveness of the suggested strategy. Furthermore, the suggested method demonstrates its capability to enhance the identification of facial expressions, particularly in complex background scenarios. Compared to FRR-CNN and R-CNN models, the suggested model converges substantially faster in complicated backdrop environments. Additionally, the suggested strategy achieves a higher recognition rate, showcasing its effectiveness in handling complex background environments (Verma & Verma, 2020).
The authors employ a two-CNN approach, where the first CNN determines the primary emotion (happiness or sadness), and the second CNN identifies the secondary emotion. The results suggest that this method outperforms current state-of-the-art techniques in accurately detecting emotions from facial expressions. In another study, the development of a logistic regression-based emotion detection system is detailed, outlining the phases of algorithm development, training, and testing (Barrionuevo et al., 2020). The logistic regression algorithm enhances the ability to discern emotions through facial expression analysis. The FERC (Facial Expression Recognition Convolutional) model proposed by Mehendale (2020) consists of a two-part CNN. The initial section concentrates on obtaining vectors of the face's features, whilst the subsequent section removes the image's background. The FERC model's expressional vector (EV) successfully distinguishes five different types of regular facial expressions. With an EV of length 24 values, the model exhibits great accuracy, indicating emotions with 96% accuracy. Neighborhood difference features (NDF) are used by the modular approach presented to identify faces and categorize facial emotions into seven distinct states (Alreshidi & Ullah, 2020). A random forest classifier is trained to categorize facial expressions into seven groups during testing. In another work, support vector machine (SVM)-based classification is combined with a genetic algorithm (GA) to solve a multi-attribute optimization issue that includes feature and parameter selection; this approach is contrasted with convolutional neural networks (CNNs), a popular method for identifying emotions on faces (Liu, et al., 2020).
A directed graph neural network (DGNN) for landmark-based facial expression recognition (FER) was shown in (Ngoc, et al., 2020). The DGNN outperformed cutting-edge image- or video-based algorithms in terms of performance. For the CK+ and AFEW datasets, state-of-the-art performance of 98.47% and 50.65%, respectively, was obtained when the recommended strategy was paired with a conventional video-based method. In the future, the FER system might be expanded to incorporate more modalities, like facial features, auditory signals, and physical movements. The recommended strategy has remarkably high recognition accuracy (Aknand, 2021). The overall test setup and a few challenging face photos demonstrated the technology's potential for real-world commercial applications, such as hospital patient monitoring and surveillance security. The concept of face emotion detection may be extended to body language or speech recognition for new industrial applications. The kind of FER dataset utilized for model training was discovered to have an impact on the FER system's performance in [79]. Trained using publicly available datasets including FER 2013, AffectNet, JAFFE, and extended Cohn-Kanade (CK+), the proposed method showed a 3.33% increase in model validity over the traditional FER approach. A decision tree structure and deep neural networks were used in the method by Ruzainie et al. (2021). The 2-D DCT was used on difference images between neutral and expressive shots for unsupervised feature extraction. High accuracy rates for identifying a range of facial emotions were shown by the final decision tree nodes (Sujanaa, et al., 2021).
A framework consisting of the happy, normal, and surprise emotion categories was proposed (Sujanaa, et al., 2021). The video frames, acquired at a rate of twenty frames per second, were segmented using a Haar-based cascade classifier. The classifier was trained to concentrate on the facial image's mouth area. The histogram of oriented gradients (HOG) and the local binary pattern (LBP) were employed to extract the gradient information from the emotion image. These features were combined into a single histogram representing each mouth image.
A strategy for enhancing heterogeneity between various modalities and creating a complementary relationship between them for multimodal emotion detection was suggested (Chen, et al., 2022). The suggested approach utilized Kernel canonical correlation analysis to merge multimodal information from speech and facial expression. In order to choose features from many modalities and reduce dimensionality, K-means clustering was used. SAVEE and eNTERFACE'05 were the two datasets used in the studies.
Finding the classifier that could most accurately recognize negative emotions including fear, fury, contempt, and sorrow was the aim of research (Tiwari & Veenadhari). The classifiers evaluated were Support Vector Machine (SVM), Radial Basis Function (RBF), Decision tree (J48), Random Forest (RF), Multilayer Perceptron Neural Network (MLPNN), Neural Network (NN), and K-Nearest Neighbor (KNN), using Principal Component Analysis (PCA) as the dimensionality reduction method. Histogram of Gradient (HOG), Local Binary Pattern (LBP), Chi-Square, and the Gabor wavelet were some of the feature extraction techniques used.
A brand-new technique for face expression recognition, the facial expression recognition convolutional neural network (FERC), was unveiled (Savakar, et al., 2023). The two-segment FERC model eliminates the backdrop of the image in the first segment and extracts the face vector in the second. The expressional vector (EV) is used by the FERC model to identify the five distinct categories of typical facial expressions. The study emphasizes the value of deep learning, and more especially the CNN technique and the Keras framework.
In a study, research was done on the recognition of facial expressions using three derived features: the Histogram of Oriented Gradients, Gabor, and the Local Binary Pattern (Subudhiray, et al., 2023). Gabor outperformed the others, achieving a recognition accuracy of 94.8%. For every feature extraction method, performance metrics such as recall, kappa coefficient, computation time, precision, and average and total recognition accuracy were investigated.

CONCLUSION
Recent developments in emotion recognition from visual data, particularly facial expressions, are thoroughly examined in this paper. It includes a range of study topics from the last ten years, such as devices, emotion models, and classification strategies. Out of the twenty-one studies that met the criteria, SVM was applied alone and with GA (Aknand, 2021; Sujanaa, et al., 2021; Tiwari & Veenadhari), yielding accuracies of 98.51% and 93.46%, respectively. The validation accuracy ranged from 93.57% to 96.29%, while the test accuracy varied from 95.85% to 96.56% (Sujanaa, et al., 2021; Tiwari & Veenadhari). The approximate values for the F1-score, recall, and total precision were 0.96, 0.95, and 0.97, respectively. Using k-means produced a greater recognition rate compared to not using it (Chen, et al., 2022). CNNs were widely chosen (Lasri, et al., 2019; Zhang, et al., 2019; Verma & Verma, 2020; Mehendale, 2020; Aknand, 2021; Sujanaa, et al., 2021), and a 1D-CNN achieved an accuracy of 97.44% (Sujanaa, et al., 2021). Compared to existing techniques, the application of CNNs in enhanced deep learning showed greater accuracy in emotion prediction (Mehendale, 2020), which introduced FERC using a two-part CNN and indicated emotions with 96% accuracy using an EV of length 24 values; FERC's single-level CNN approach differs from typical techniques, enhancing accuracy. One algorithm trained approximately 1.5 times faster than the contrast method, utilizing fewer iterations to attain an average recognition rate of 88.56% (Zhang, et al., 2019). The collective findings highlight diverse approaches and promising outcomes in facial emotion recognition.
Figure 2(a) illustrates the SVM classifier, where the decision boundary is optimized by gamma, and C serves as the misclassification penalty function. Figure 2(b) displays the optimal hyperplane generated using the SVM algorithm.

Figure 3 depicts several existing Hidden Markov Model (HMM) topologies: (A) a fully connected HMM; (B) a circular HMM; (C) a left-right HMM.