Signal Processing

Volume 93, Issue 6, June 2013, Pages 1547-1556

Intelligent pixels of interest selection with application to facial expression recognition using multilayer perceptron

https://doi.org/10.1016/j.sigpro.2012.08.007

Abstract

This paper presents an automatic way to discover the pixels in a face image that improve facial expression recognition results. The main contribution of our study is a practical method for improving the classification performance of classifiers by selecting the best pixels of interest. Our method exhaustively searches a set of face images for the best and worst feature window positions among all possible combinations using an MLP. It then creates a non-rectangular emotion mask for feature selection in the supervised facial expression recognition problem, eliminating irrelevant data and improving classification performance through backward feature elimination. Experimental studies on the GENKI, JAFFE and FERET databases showed that the proposed system improves classification results by selecting the best pixels of interest.

Highlights

► We discover the pixels in a face image that improve emotion classification.
► We create emotion masks that improve classification using backward feature elimination.
► A small number of selected pixels outperforms the full-frame pixels.
► Accuracy can differ greatly between very close feature windows.
► Positive emotions are likely to occur in the lower face.

Introduction

Facial expression recognition (FER) is an active research topic and a challenging problem in domains including face recognition, human-computer interaction, facial animation and social interaction. In the last decade, researchers from various disciplines have focused on efficient, accurate and fast recognition of facial expressions. Emotions can be detected from physical sensors, images and video. Each sensor type has its own challenges, such as noisy signals, high dimensionality and the quality of the selected features. Many automatic FER studies achieve high accuracy on well-defined datasets; however, they still perform poorly under real-world conditions, so a considerable accuracy gap remains for realistic classification scenarios. One way to close this gap is to improve classification results in terms of objective measures, and feature selection is an important step towards better classifiers. Feature selection and reduction strategies select relevant features in order to build robust models. In this scope, the majority of previous FER studies treated the face and facial features as a combination of coarse rectangular units [1], [2]. These units are used to locate or extract valuable facial feature information, but despite their implementation simplicity they include useless and noisy data for the machine learning step. There is therefore a need to find local pixels of interest (POI) for FER. Groups of POI form non-rectangular masks that can be used to improve classification performance.

The selection of the best variables and features has become a focus of classification research, where thousands of different combinations are possible. Feature selection is the technique of selecting a subset of relevant features from the original data in order to reduce the feature size while maximizing the classifier output. Wrapper-based and filter-based feature selection are the two most common approaches in the field. Wrappers evaluate the importance of specific features with respect to a particular learning algorithm [3], whereas filter-based methods reduce the feature space using a specific filter. Although wrapper-based methods are computationally expensive, the facial area used in FER is a small region that can be represented by patches as small as 20×20 to 50×50 pixels in vision-based algorithms, which keeps an exhaustive wrapper search feasible.
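To make the wrapper idea concrete, the following is a minimal sketch of backward feature elimination driven by an MLP, written against scikit-learn's MLPClassifier and cross_val_score. The data matrix, network size and stopping rule are illustrative assumptions, not the configuration used in this paper.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def backward_elimination(X, y, min_features=10, cv=3):
    """Greedily drop the feature whose removal hurts CV accuracy the least."""
    kept = list(range(X.shape[1]))
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    best_score = cross_val_score(clf, X[:, kept], y, cv=cv).mean()
    while len(kept) > min_features:
        trials = []
        for f in kept:
            subset = [k for k in kept if k != f]
            score = cross_val_score(clf, X[:, subset], y, cv=cv).mean()
            trials.append((score, f))
        score, f = max(trials)           # removal that keeps accuracy highest
        if score + 1e-6 < best_score:    # stop once every removal hurts
            break
        best_score, kept = score, [k for k in kept if k != f]
    return kept, best_score

On raw pixel features this greedy loop quickly becomes expensive, which is one reason to restrict attention to small facial regions before eliminating individual pixels.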

In this study, we use an analytic approach that performs wrapper-based feature selection by exhaustively searching all possible feature windows to find the informative pixels that improve FER results. For a given emotion class, we create the corresponding emotion mask to improve the performance of a Multilayer Perceptron (MLP) model. Our experiments on different datasets show that the proposed method gives better results than both full-frame classification and the best traditional feature-window-based classification.
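The exhaustive window search can be sketched as follows, assuming 50×50 grayscale face crops, a small set of illustrative window sizes and strides, and a compact scikit-learn MLP in place of the exact network and evaluation protocol used in the paper.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def window_search(images, labels, sizes=((10, 10), (20, 20)), stride=5):
    """images: (n, 50, 50) array of face crops; labels: (n,) emotion labels."""
    results = []                                   # (accuracy, y, x, h, w)
    for h, w in sizes:
        for y in range(0, images.shape[1] - h + 1, stride):
            for x in range(0, images.shape[2] - w + 1, stride):
                # Train one network on the pixels inside this window only.
                X = images[:, y:y + h, x:x + w].reshape(len(images), -1)
                X_tr, X_te, y_tr, y_te = train_test_split(
                    X, labels, test_size=0.3, random_state=0)
                clf = MLPClassifier(hidden_layer_sizes=(32,),
                                    max_iter=300, random_state=0)
                clf.fit(X_tr, y_tr)
                results.append((clf.score(X_te, y_te), y, x, h, w))
    return max(results), min(results)              # best and worst windows

Each candidate window trains its own classifier, so the number of networks grows with the number of sizes and positions; recording both the best and the worst windows is what later allows informative and irrelevant regions to be separated.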

The rest of this paper is organized as follows. Section 2 briefly reviews related work. The database material, mask generation and its application to FER are described in Section 3. Experimental results and discussion are presented in Section 4, followed by the conclusion.

Section snippets

Related works

Whether the classification problem is approached analytically or holistically, redundant and noisy information needs to be eliminated. Analytic approaches are widely used in the face recognition domain; they are based on the detection of specific facial features such as the eyes, eyebrows, nose and mouth, the locations of facial fiducial points such as the corners of the eyes and mouth, and their geometric relationships. Here a system is solved by considering its subparts and how they work together to produce

Material and methods

In this paper, we make use of non-rectangular emotion masks for the facial emotion recognition problem to improve overall classification results. Fig. 3 shows the general flow diagram of our method.
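As a minimal sketch of how such a mask can act as a pixel selector, the snippet below builds a boolean 50×50 mask from a set of rectangular windows and trains an MLP only on the masked pixels. Deriving the mask from the union of well-performing windows is a simplifying assumption here; the paper additionally refines the selection with backward feature elimination.

import numpy as np
from sklearn.neural_network import MLPClassifier

def mask_from_windows(windows, shape=(50, 50)):
    """Union of rectangular windows (y, x, h, w) -> boolean pixels-of-interest mask."""
    mask = np.zeros(shape, dtype=bool)
    for y, x, h, w in windows:
        mask[y:y + h, x:x + w] = True
    return mask

def train_with_mask(images, labels, mask):
    """Keep only the masked pixels of each face and fit an MLP on them."""
    X = images.reshape(len(images), -1)[:, mask.ravel()]
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    clf.fit(X, labels)
    return clf

Once poorly performing pixels are removed from such a union, the selected region is no longer rectangular, which is the property that distinguishes an emotion mask from a single coarse feature window.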

Our proposed method exhaustively searches for the best feature window position from a set of static images among all possible combinations using an Artificial Neural Network (ANN) and creates a non-rectangular mask for a given emotion class. As the emotion recognition problem is a non-linear problem, the

Results and discussion

For a 50×50 face image we considered m=224 different search windows Rk, which yields 14,490 different neural networks. When all possible window sizes are fed into the neural network, the outputs are the sets of best and worst window locations, as shown in Fig. 6.

It took approximately 23 h to process the GENKI dataset. For each Rk, we stored the x and y positions giving the highest and lowest accuracy in the test phase. The average training accuracy of the 14,490 different neural networks is 93.6% with a standard

Conclusion

In order to find the best feature window position and size, we performed an exhaustive search over the facial area. Although bigger window sizes tend to achieve higher accuracy, our experiments showed that the location and size of the window have a great effect on the emotion classification problem. In many cases, smaller feature windows gave more accurate results than larger ones. In addition, for the same window size there is a high accuracy difference in very close

Acknowledgments

The authors would like to thank the handling editor and reviewers for their constructive comments on this paper. This study is supported by the Multimodal Interfaces for Disabled and Ageing Society (MIDAS) ITEA 2-07008 project.

References (36)

  • H. Tang et al.

3D face recognition using local binary patterns

    Signal Processing

    (2012)
  • M. Turk et al.

    Eigenfaces for recognition

    Journal of Cognitive Neuroscience

    (1991)
  • K. Etemad et al.

    Discriminant analysis for recognition of human face images

  • N. Guan et al.

    Online nonnegative matrix factorization with robust stochastic approximation

    IEEE Transactions on Neural Networks and Learning Systems

    (2012)
  • N. Guan et al.

NeNMF: an optimal gradient method for nonnegative matrix factorization

    IEEE Transactions on Signal Processing

    (2012)
  • N. Guan et al.

    Manifold regularized discriminative nonnegative matrix factorization with fast gradient descent

    IEEE Transactions on Image Processing

    (2011)
  • S. Gong, P.W. McOwan, C. Shan, Conditional mutual information based boosting for facial expression recognition, in:...
  • G.-M. Jeong et al.

Pattern recognition using feature feedback: application to face recognition

    International Journal of Control, Automation and Systems

    (2010)
1 Present address: University of Angers, LISA Laboratory, 62 Avenue Notre Dame du Lac, 49000 Angers, France.
