Intelligent pixels of interest selection with application to facial expression recognition using multilayer perceptron
Highlights
► We identify pixels in a face image that improve emotion classification. ► We create emotion masks via backward feature elimination to improve classification. ► A small number of selected pixels outperforms the full set of frame pixels. ► Accuracy differs greatly between very close feature windows of the same size. ► Positive emotions are most likely to appear in the lower face.
Introduction
Facial expression recognition (FER) is an active research topic and a challenging problem in several domains, including face recognition, human-computer interaction, facial animation and social interaction. In the last decade, researchers from various disciplines have focused on efficient, accurate and fast recognition of facial expressions. Emotions can be detected from physical sensors, images and video. Each sensor type has its own challenges, such as noisy signals, high dimensionality and the quality of selected features. There are many automatic FER studies achieving high accuracy on well-defined datasets. However, these studies still perform poorly under real-world conditions, so a considerable accuracy gap remains for realistic classification scenarios. One way to close this gap is to improve classification results in terms of objective measures. Among other techniques, feature selection is an important step towards better classifiers: feature selection and reduction strategies pick out relevant features to create robust models. In this scope, the majority of previous FER studies treated the face and facial features as a combination of coarse rectangular units [1], [2]. These units are used to locate or extract valuable facial feature information. Despite its implementation simplicity, this representation includes useless and noisy data for the machine learning step. Therefore, there is a need to find local pixels of interest (POI) for FER. Groups of POI form non-rectangular masks that can be used to improve classification performance.
Selecting the best variables and features has become a focus of classification research, where thousands of candidate features are possible. Feature selection is the technique of choosing a subset of relevant features from the original data to reduce the feature count while maximizing classifier performance. Wrapper-based and filter-based feature selection are the two most common approaches in the field. Wrappers evaluate the importance of specific features with respect to a particular learning algorithm [3], whereas filter-based methods reduce the feature space using a specific filter. Even setting aside the computational complexity of wrapper-based methods, the facial area used in FER is small: regions as small as 20×20 to 50×50 pixels suffice for vision-based algorithms.
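The wrapper idea above, in its backward-elimination form mentioned in the highlights, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `evaluate` stands in for training the actual learner (an MLP in the paper) on a feature subset, and the toy evaluator and feature names are assumptions for demonstration.

```python
# Hedged sketch of wrapper-based backward feature elimination:
# repeatedly drop the feature whose removal hurts the classifier's
# score the least, stopping when every removal makes things worse.

def backward_elimination(features, evaluate, min_features=1):
    """Greedy wrapper: keep removing the least useful feature."""
    selected = list(features)
    best_score = evaluate(selected)
    while len(selected) > min_features:
        # Score every candidate subset with one feature left out.
        scores = {f: evaluate([g for g in selected if g != f])
                  for f in selected}
        worst, score = max(scores.items(), key=lambda kv: kv[1])
        if score < best_score:   # every removal hurts: stop
            break
        selected.remove(worst)
        best_score = score
    return selected, best_score

# Toy evaluator (assumed): score rises with useful features present
# and falls slightly for each noisy feature kept.
useful = {"eye_px", "mouth_px"}
def toy_eval(subset):
    return (sum(f in useful for f in subset)
            - 0.1 * sum(f not in useful for f in subset))

subset, score = backward_elimination(
    ["eye_px", "mouth_px", "bg_px", "noise_px"], toy_eval)
# The noisy features are eliminated; the useful pair survives.
```

In a real wrapper, `evaluate` would be a cross-validated accuracy, which is exactly why wrappers are expensive: each candidate subset triggers a full training run.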
In this study, we used an analytic approach that performs wrapper-based feature selection via an exhaustive search over all possible feature windows to find informative pixels and improve FER results. For a given emotion class, we created a corresponding emotion mask to improve the Multilayer Perceptron (MLP) model's performance. Our experiments on different datasets showed that the proposed method gives better results than both full-frame classification and the best traditional feature-window-based classification.
The rest of this paper is organized as follows. In Section 2, we briefly review related work. The database material, mask generation and its application to FER are described in Section 3. Experimental results and discussion are presented in Section 4, followed by the conclusion.
Section snippets
Related works
Whether the classification problem is treated analytically or holistically, there is a need to eliminate redundant and noisy information. Analytic approaches are widely used in the face recognition domain; they are based on the detection of specific facial features such as the eyes, eyebrows, nose and mouth, the locations of facial fiducial points such as the corners of the eyes and mouth, and their geometric relationships. Here a system is solved by considering its subparts and how they work together to produce
Material and methods
In this paper, we make use of non-rectangular emotion masks for the facial emotion recognition problem to improve overall classification results. Fig. 3 shows the general flow diagram of our method.
Our proposed method exhaustively searches for the best feature window position, among all possible combinations over a set of static images, using an Artificial Neural Network (ANN), and creates a non-rectangular mask for a given emotion class. As emotion recognition is a non-linear problem, the
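The mask-building step described above can be sketched as the union of the best-scoring rectangular windows into one boolean pixel-of-interest mask, which is then applied to a face image before classification. This is a minimal sketch: the window coordinates below are illustrative assumptions, not the paper's selected windows.

```python
# Hedged sketch: OR together the winning rectangular feature windows
# into a non-rectangular boolean emotion mask, then keep only the
# pixels of interest in a 50x50 face image.

import numpy as np

def build_mask(shape, windows):
    """Union of (x, y, w, h) rectangles as a boolean mask."""
    mask = np.zeros(shape, dtype=bool)
    for x, y, w, h in windows:
        mask[y:y + h, x:x + w] = True
    return mask

def apply_mask(face, mask):
    """Zero out everything except the pixels of interest."""
    return np.where(mask, face, 0)

# Assumed example windows, e.g. around the mouth and an eye region.
best_windows = [(10, 30, 20, 10), (8, 12, 12, 8)]
mask = build_mask((50, 50), best_windows)
masked = apply_mask(np.ones((50, 50)), mask)
```

Only the masked pixels would then be fed to the MLP, which is how a small set of selected pixels can replace the full 2500-pixel frame.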
Results and discussion
For a 50×50 face image we considered m=224 different search windows Rk, which yields 14,490 different neural networks. When all possible window sizes are fed into the neural network, the outputs are the sets of best and worst window locations, as shown in Fig. 6.
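The window enumeration driving this experiment can be sketched as a sweep over sizes and positions, with one candidate network per window. The exact size range and stride used in the paper are not given in this excerpt, so the values below are illustrative assumptions and the resulting count will not match the paper's 14,490.

```python
# Hedged sketch: enumerate every square search window (size and
# position) over a 50x50 face image; each window corresponds to one
# separately trained neural network. SIZES and the stride of 5 are
# assumed for illustration only.

FACE = 50                  # face image is 50x50 pixels
SIZES = range(5, 51, 5)    # assumed candidate window sizes

def enumerate_windows(face=FACE, sizes=SIZES, stride=5):
    """Yield (x, y, w) for every in-bounds square window."""
    for w in sizes:
        for y in range(0, face - w + 1, stride):
            for x in range(0, face - w + 1, stride):
                yield x, y, w

windows = list(enumerate_windows())
# The last window is the full frame: a single 50x50 placement.
```

Each `(x, y, w)` triple would be trained and scored independently, which is why the exhaustive search costs many hours even on a 50×50 face crop.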
It took approximately 23 h to process the GENKI dataset. For each Rk, we stored the x and y positions giving the highest and lowest accuracy in the test phase. The average training accuracy of the 14,490 different neural networks is 93.6% with a standard
Conclusion
In order to find the best feature window position and size, we performed an exhaustive search over the facial area. Although bigger window sizes yield higher accuracy on average, our experiments showed that the location and size of the windows have a great effect on the emotion classification problem. In many cases, smaller feature windows gave more accurate results than larger ones. In addition, for the same window size there is a large accuracy difference between very close
Acknowledgments
The authors would like to thank the handling editor and reviewers for their constructive comments on this paper. This study is supported by the Multimodal Interfaces for Disabled and Ageing Society (MIDAS) ITEA 2-07008 project.
References (36)
- et al., Combining appearance and motion for face and gender recognition from videos, Pattern Recognition (2009)
- et al., Fast and accurate sequential floating forward feature selection with the Bayes classifier applied to speech emotion recognition, Signal Processing (2008)
- et al., A comparative study of texture measures with classification based on featured distributions, Pattern Recognition (1996)
- et al., Independent component analysis: algorithms and applications, Neural Networks (2000)
- et al., Input variable selection for feature extraction in classification problems, Signal Processing (2012)
- et al., The contribution of the upper and lower face in happy and sad facial expression classification, Vision Research (2010)
- et al., An analysis of facial expression recognition under partial facial image occlusion, Image and Vision Computing (2008)
- et al., Projected gradient method for kernel discriminant nonnegative matrix factorization and the applications, Signal Processing (2010)
- et al., Robust real-time face detection, International Journal of Computer Vision (2004)
- et al., An analytic-to-holistic approach for face recognition based on a single frontal view, IEEE Transactions on Pattern Analysis and Machine Intelligence (1998)
- 3D face recognition using local binary patterns, Signal Processing
- Eigenfaces for recognition, Journal of Cognitive Neuroscience
- Discriminant analysis for recognition of human face images
- Online nonnegative matrix factorization with robust stochastic approximation, IEEE Transactions on Neural Networks and Learning Systems
- NeNMF: an optimal gradient method for nonnegative matrix factorization, IEEE Transactions on Signal Processing
- Manifold regularized discriminative nonnegative matrix factorization with fast gradient descent, IEEE Transactions on Image Processing
- Pattern recognition using feature feedback: application to face recognition, International Journal of Control, Automation and Systems
1. Present address: University of Angers, LISA Laboratory, 62 Avenue Notre Dame du Lac, 49000 Angers, France.