Signal Processing

Volume 93, Issue 6, June 2013, Pages 1547-1556

Intelligent pixels of interest selection with application to facial expression recognition using multilayer perceptron

https://doi.org/10.1016/j.sigpro.2012.08.007

Abstract

This paper presents an automatic way to discover the pixels in a face image that improve facial expression recognition results. The main contribution of our study is a practical method for improving the classification performance of classifiers by selecting the best pixels of interest. Our method exhaustively searches a set of face images for the best and worst feature window positions among all possible combinations using an MLP. It then creates a non-rectangular emotion mask for feature selection in the supervised facial expression recognition problem, eliminating irrelevant data and improving classification performance through backward feature elimination. Experimental studies on the GENKI, JAFFE and FERET databases showed that the proposed system improves classification results by selecting the best pixels of interest.

Highlights

► We discover the pixels in a face image that improve emotion classification.
► We create emotion masks that improve classification using backward feature elimination.
► A small number of selected pixels outperforms the full-frame pixels.
► Accuracy can differ greatly between very close feature windows.
► Positive emotions are likely to occur in the lower face.

Introduction

Facial expression recognition (FER) is an active research topic and a challenging problem in domains including face recognition, human-computer interaction, facial animation and social interaction. In the last decade, researchers from various disciplines have focused on efficient, accurate and fast recognition of facial expressions. Emotions can be detected from physical sensors, images and video. Each sensor type has its own challenges, such as noisy signals, high dimensionality and the quality of the selected features. Many automatic FER studies achieve high accuracy on well-defined datasets; however, they still perform poorly under real-world conditions, so a considerable accuracy gap remains for realistic classification scenarios. One way to close this gap is to improve classification results in terms of objective measures, and feature selection is an important step towards better classifiers. Feature selection and reduction strategies select relevant features in order to build robust models. In this scope, the majority of previous FER studies treated the face and facial features as a combination of coarse rectangular units [1], [2]. These units are used to locate or extract valuable facial feature information, but despite their implementation simplicity they include useless and noisy data for the machine learning step. There is therefore a need to find local pixels of interest (POI) for FER. Groups of POI form non-rectangular masks that can be used to improve classification performance.

The selection of the best variables and features has become a focus of classification research, where thousands of different combinations are possible. Feature selection is the technique of selecting a subset of relevant features from the original data in order to reduce the feature size while maximizing the classifier output. Wrapper-based and filter-based feature selection are the two most common approaches in the field. Wrappers evaluate the importance of specific features with respect to a particular learning algorithm [3], whereas filter-based methods reduce the feature space using a specific filter. Although wrapper-based methods are computationally expensive, the facial area used in FER is a small region that can be represented by patches as small as 20×20 to 50×50 pixels in vision-based algorithms, which keeps an exhaustive wrapper search feasible.
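To make the wrapper idea concrete, the following is a minimal sketch of backward feature elimination driven by an MLP, written against scikit-learn's MLPClassifier and cross_val_score. The data matrix, network size and stopping rule are illustrative assumptions, not the configuration used in this paper.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def backward_elimination(X, y, min_features=10, cv=3):
    """Greedily drop the feature whose removal hurts CV accuracy the least."""
    kept = list(range(X.shape[1]))
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    best_score = cross_val_score(clf, X[:, kept], y, cv=cv).mean()
    while len(kept) > min_features:
        trials = []
        for f in kept:
            subset = [k for k in kept if k != f]
            score = cross_val_score(clf, X[:, subset], y, cv=cv).mean()
            trials.append((score, f))
        score, f = max(trials)           # removal that keeps accuracy highest
        if score + 1e-6 < best_score:    # stop once every removal hurts
            break
        best_score, kept = score, [k for k in kept if k != f]
    return kept, best_score

On raw pixel features this greedy loop quickly becomes expensive, which is one reason to restrict attention to small facial regions before eliminating individual pixels.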

In this study, we use an analytic approach that performs wrapper-based feature selection by exhaustively searching all possible feature windows to find the informative pixels that improve FER results. For a given emotion class, we create the corresponding emotion mask to improve the performance of a Multilayer Perceptron (MLP) model. Our experiments on different datasets show that the proposed method gives better results than both full-frame classification and the best traditional feature-window-based classification.
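The exhaustive window search can be sketched as follows, assuming 50×50 grayscale face crops, a small set of illustrative window sizes and strides, and a compact scikit-learn MLP in place of the exact network and evaluation protocol used in the paper.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def window_search(images, labels, sizes=((10, 10), (20, 20)), stride=5):
    """images: (n, 50, 50) array of face crops; labels: (n,) emotion labels."""
    results = []                                   # (accuracy, y, x, h, w)
    for h, w in sizes:
        for y in range(0, images.shape[1] - h + 1, stride):
            for x in range(0, images.shape[2] - w + 1, stride):
                # Train one network on the pixels inside this window only.
                X = images[:, y:y + h, x:x + w].reshape(len(images), -1)
                X_tr, X_te, y_tr, y_te = train_test_split(
                    X, labels, test_size=0.3, random_state=0)
                clf = MLPClassifier(hidden_layer_sizes=(32,),
                                    max_iter=300, random_state=0)
                clf.fit(X_tr, y_tr)
                results.append((clf.score(X_te, y_te), y, x, h, w))
    return max(results), min(results)              # best and worst windows

Each candidate window trains its own classifier, so the number of networks grows with the number of sizes and positions; recording both the best and the worst windows is what later allows informative and irrelevant regions to be separated.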

The rest of this paper is organized as follows. Section 2 briefly reviews related work. The database material, mask generation and its application to FER are described in Section 3. Experimental results and discussion are presented in Section 4, followed by the conclusion.

Section snippets

Related works

Whether the classification problem is approached analytically or holistically, redundant and noisy information needs to be eliminated. Analytic approaches are widely used in the face recognition domain; they are based on the detection of specific facial features such as the eyes, eyebrows, nose and mouth, the locations of facial fiducial points such as the corners of the eyes and mouth, and their geometric relationships. Here a system is solved by considering its subparts and how they work together to produce

Material and methods

In this paper, we make use of non-rectangular emotion masks for the facial emotion recognition problem to improve overall classification results. Fig. 3 shows the general flow diagram of our method.
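As a minimal sketch of how such a mask can act as a pixel selector, the snippet below builds a boolean 50×50 mask from a set of rectangular windows and trains an MLP only on the masked pixels. Deriving the mask from the union of well-performing windows is a simplifying assumption here; the paper additionally refines the selection with backward feature elimination.

import numpy as np
from sklearn.neural_network import MLPClassifier

def mask_from_windows(windows, shape=(50, 50)):
    """Union of rectangular windows (y, x, h, w) -> boolean pixels-of-interest mask."""
    mask = np.zeros(shape, dtype=bool)
    for y, x, h, w in windows:
        mask[y:y + h, x:x + w] = True
    return mask

def train_with_mask(images, labels, mask):
    """Keep only the masked pixels of each face and fit an MLP on them."""
    X = images.reshape(len(images), -1)[:, mask.ravel()]
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    clf.fit(X, labels)
    return clf

Once poorly performing pixels are removed from such a union, the selected region is no longer rectangular, which is the property that distinguishes an emotion mask from a single coarse feature window.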

Our proposed method exhaustively searches for the best feature window position from a set of static images among all possible combinations using an Artificial Neural Network (ANN) and creates a non-rectangular mask for a given emotion class. As the emotion recognition problem is a non-linear problem, the

Results and discussion

For a 50×50 face image we considered m=224 different search windows Rk, which yields 14,490 different neural networks. When all possible window sizes are fed into the neural network, the outputs are the sets of best and worst window locations, as shown in Fig. 6.

It took approximately 23 h to process the GENKI dataset. For each Rk, we stored the x and y positions giving the highest and lowest accuracy in the test phase. The average training accuracy of the 14,490 different neural networks is 93.6% with a standard

Conclusion

In order to find the best feature window position and size, we performed an exhaustive search over the facial area. Although bigger window sizes tend to achieve higher accuracy, our experiments showed that the location and size of the window have a great effect on the emotion classification problem. In many cases, smaller feature windows gave more accurate results than larger ones. In addition, for the same window size there is a high accuracy difference in very close

Acknowledgments

The authors would like to thank the handling editor and reviewers for their constructive comments on this paper. This study is supported by the Multimodal Interfaces for Disabled and Ageing Society (MIDAS) ITEA 2-07008 project.

References (36)

  • H. Tang et al.

3D face recognition using local binary patterns

    Signal Processing

    (2012)
  • M. Turk et al.

    Eigenfaces for recognition

    Journal of Cognitive Neuroscience

    (1991)
  • K. Etemad et al.

    Discriminant analysis for recognition of human face images

  • N. Guan et al.

    Online nonnegative matrix factorization with robust stochastic approximation

    IEEE Transactions on Neural Networks and Learning Systems

    (2012)
  • N. Guan et al.

NeNMF: an optimal gradient method for nonnegative matrix factorization

    IEEE Transactions on Signal Processing

    (2012)
  • N. Guan et al.

    Manifold regularized discriminative nonnegative matrix factorization with fast gradient descent

    IEEE Transactions on Image Processing

    (2011)
  • S. Gong, P.W. McOwan, C. Shan, Conditional mutual information based boosting for facial expression recognition, in:...
  • G.-M. Jeong et al.

Pattern recognition using feature feedback: application to face recognition

    International Journal of Control, Automation and Systems

    (2010)
1 Present address: University of Angers, LISA Laboratory, 62 Avenue Notre Dame du Lac, 49000 Angers, France.
