Geometric-based Feature Extraction and Classification for Emotion Expressions of 3D Video Film

Feature extraction is the most significant step in emotion expression recognition. Recognizing emotion expressions has attracted many researchers in pattern recognition because of its impact on a wide range of applications, especially human-computer interaction, for both images and video. Based on pattern recognition theory, facial expression recognition can be divided into a feature extraction operation and a classification operation. In this paper, geometric-based feature extraction is used to extract the local characteristics (landmarks) of a set of emotion expressions (anger, happiness, sadness, surprise) from images of the BOSPHORUS database in the training stage; classification is then performed with a threshold method applied to the Euclidean distances measured in the neutral image and in the expression image. The trained system is used for feature extraction and classification of 3D (stereoscopic) video film in the testing stage. The method was applied to 40 recorded 3D video films, 10 for each of the four basic emotion expressions, and the recognition rate is 85%.


I. INTRODUCTION
The feature extraction process is essentially facial expression representation: it transforms the raw information from a low-level 2D pixel or 3D vertex based representation into a higher-level representation of the face in terms of its landmarks, spatial configuration, shape, appearance and/or motion. Feature extraction usually reduces the dimensionality of the original input facial data [1].
Manuscript received January 1, 2017; revised April 1, 2017
Based on pattern recognition theory, facial expression recognition can be divided into a feature extraction operation and a classification operation. Many methods have been proposed, including principal component analysis, linear discriminant analysis, nonparametric discriminant analysis, optical flow, Fisher weight maps, and local binary patterns [2].
Many researchers have worked on geometric-based as well as holistic-based feature extraction approaches.
This paper presents a geometric-based feature extraction and classification technique for recognizing four basic emotion expressions (anger, happiness, sadness, surprise). The technique relies on finding the local features (landmarks) of the face in the neutral image and in the expression image for each frame of a 3D (stereoscopic) video film.

II. GEOMETRIC-BASED FEATURE EXTRACTION
In the geometric-based approach, local features (local statistics and locations) of the mouth, eyes, eyebrows, and nose are first extracted from face images, as shown in Fig. 1, and are then used for classification [3]. The most successful of the geometric-based methods is Active Appearance Models (AAM) [4].
Figure 1. Landmark features [5]
Most approaches to facial expression analysis attempt to recognize a small set of prototypic emotional facial expressions, i.e., fear, sadness, disgust, anger, surprise, and happiness [6] and [7]. This practice may stem from the work of Darwin [8], and more recently of Ekman [9], who suggested that basic emotions have corresponding prototypic expressions.
The Facial Action Coding System (FACS) is the best known of the several methods for recognizing facial signals and the most widely used in psychological research [10,11]. FACS describes changes in facial expression in terms of 44 different Action Units (AUs), each of which is anatomically related to the contraction of either a specific facial muscle or a set of facial muscles. FACS provides the rules for detecting AUs in a face image, along with the definitions of the different AUs [12].

A. Geometric-based Features Extraction as Training Level
Local feature extraction is performed on a set of BOSPHORUS database images used as training data. The system is trained to find the distances between specific points that are placed manually on specific regions of the face in the database images. In this work, 14 specific points are taken from the facial landmarks, giving 7 distances for each face in the neutral images and expression images, as illustrated in Fig. 2:
1) 2 points represent the inner eyebrow points. The line between the inner eyebrows (Ib) represents the distance between the inner eyebrows.
2) 4 points: 2 points for the center of the left eye and the center of the left eyebrow, and 2 points for the center of the right eye and the center of the right eyebrow. The line between the center of the left eye and the center of the left eyebrow (Ibyl) represents the distance between the left eye and the left eyebrow. The line between the center of the right eye and the center of the right eyebrow (Ibyr) represents the distance between the right eye and the right eyebrow.
3) 2 points represent the lip vertical distance. The line between the lip vertical points (Lpv) represents the distance between the lip vertical points.
4) 2 points represent the lip horizontal distance. The line between the lip horizontal points (Lph) represents the distance between the lip horizontal points.
5) 4 points: 2 points for the center of the left eye and the left corner of the mouth, and 2 points for the center of the right eye and the right corner of the mouth. The line between the center of the left eye and the left corner of the mouth (Eml) represents the distance between the left eye and the left corner of the mouth. The line between the center of the right eye and the right corner of the mouth (Emr) represents the distance between the right eye and the right corner of the mouth.
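The seven distances listed above can be illustrated with a minimal Python sketch. It assumes each distance is given as a pair of manually marked (x, y) pixel coordinates, which matches the count of 14 points; the function names and the dictionary layout are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]

# The seven distance names used in the paper; each needs two marked points.
FEATURE_NAMES = ("Ib", "Ibyl", "Ibyr", "Lpv", "Lph", "Eml", "Emr")


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def extract_distances(pairs: Dict[str, Tuple[Point, Point]]) -> Dict[str, float]:
    """Map each feature name to the distance between its two marked points.

    `pairs` is a hypothetical container: feature name -> (point_a, point_b).
    """
    missing = [name for name in FEATURE_NAMES if name not in pairs]
    if missing:
        raise ValueError(f"missing point pairs for: {missing}")
    return {name: euclidean(*pairs[name]) for name in FEATURE_NAMES}
```

The same function would be called once for the neutral image and once for the expression image, giving two sets of seven distances per face.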

B. Classification of Geometric-based Features of Training Level
After feature extraction, all the features are analyzed to decide which of them are useful for classification. The Sequential Forward Selection (SFS) algorithm is used to select the suitable local features from the database images for the classification operation.
SFS selects 6 features in this step: Ib1, Lpv1, and Lph1 for neutral images, and Ib2, Lpv2, and Lph2 for expression images. These are used in the classification operation for emotion expression recognition.
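As a rough illustration of how Sequential Forward Selection proceeds, the following Python sketch greedily adds, at each step, the feature that most improves a scoring function. The paper does not state the selection criterion or the stopping rule, so the `score` callback and the fixed target size `k` are assumptions made only for this example.

```python
from typing import Callable, List, Sequence


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    k: int,
) -> List[str]:
    """Greedy SFS: grow the selected set one feature at a time.

    `score` is a hypothetical criterion (for example, validation accuracy)
    evaluated on a candidate subset; higher is better.
    """
    selected: List[str] = []
    remaining = list(features)
    while remaining and len(selected) < k:
        # Try each remaining feature and keep the one giving the best score.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a criterion that rewards Ib, Lpv, and Lph and gives little or no credit to the other four distances, a run with `k=3` would return those three features.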
In the training level, threshold classification is used to train the system to decide the category of each image. This operation depends on the proportion of change in each distance between the neutral image and the expression image, using the local features (Ib1, Lpv1, Lph1, Ib2, Lpv2, and Lph2). Threshold classification operation involves the following:

C. Geometric-based Technique as Testing Level
The main objective of this work is to extract features from the frame sequences of 3D video film. Local feature extraction on the intensity image is performed for frames of the 3D video film by selecting some of these frames as neutral frames and others as expression frames. After face detection, the trained system finds the distances between specific points that are placed manually on specific areas of the face in the 3D frames. There are 6 specific points that must be taken from the facial landmarks:
• 2 points representing the inner eyebrow distance Ib.
• 2 points representing the lip vertical distance Lpv.
• 2 points representing the lip horizontal distance Lph.
Feature extraction operation for 3D video film:
for i := 1 to No. of 3D video frames do
    for j := 1 to first 10 frames of video as neutral frames do
        if prompt == press 'y' then // choose suitable neutral frame
            for k := 1 to No. of 6 points detected manually from each face do
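The part of the procedure that survives above, choosing a neutral frame from the first 10 frames and measuring three distances from six manually marked points, might be sketched in Python as follows. The `accept` callback stands in for the interactive 'y' prompt, and the point ordering (eyebrow pair, lip vertical pair, lip horizontal pair) is an assumption of this sketch, not something the paper specifies.

```python
import math
from itertools import islice
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

Point = Tuple[float, float]


def pick_neutral_frame(
    frames: Iterable[object],
    accept: Callable[[object], bool],
    max_candidates: int = 10,
) -> Optional[int]:
    """Return the index of the first accepted frame among the first
    `max_candidates` frames, or None if none is accepted.

    `accept` is a hypothetical stand-in for the operator pressing 'y'.
    """
    for index, frame in enumerate(islice(frames, max_candidates)):
        if accept(frame):
            return index
    return None


def frame_features(points: Sequence[Point]) -> Dict[str, float]:
    """Compute Ib, Lpv, and Lph from six marked points.

    Assumed order: points[0:2] inner eyebrows, points[2:4] lip vertical,
    points[4:6] lip horizontal.
    """
    if len(points) != 6:
        raise ValueError("expected exactly 6 points")
    features = {}
    for i, name in enumerate(("Ib", "Lpv", "Lph")):
        (x1, y1), (x2, y2) = points[2 * i], points[2 * i + 1]
        features[name] = math.hypot(x1 - x2, y1 - y2)
    return features
```

In use, `frame_features` would be applied to the chosen neutral frame and to each expression frame, so that the change between the two can be examined.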

IV. EXPERIMENTAL RESULTS AND ANALYSIS
The proposed technique is used for local feature extraction and classification of facial expressions in 3D video frames. The proposed system is trained on a well-known 3D facial expression database (the BOSPHORUS database). As an example of applying the geometric-based feature extraction technique to database images, Fig. 3 shows the 6 points marked manually on specific areas of the face (2 points on the inner eyebrows, 2 points on the lip vertical, and 2 points on the lip horizontal) for each neutral image and expression image.
Table I lists the distances between points for the eyebrow distance (Ib1, Ib2), lip vertical distance (Lpv1, Lpv2), and lip horizontal distance (Lph1, Lph2) of every neutral image and expression image, as the local features of each database image. Table I also gives the category of every image, determined from the proportion of change in distance between the neutral image and the expression image; these proportions are shown in Table I for some images of the database. The decision depends on the values of the proportions of change (Ib, Lpv, Lph). When the values of Ib and Lpv are less than zero and Ib is the smallest of them, the decision is Anger, as shown in rows (1,5,9,13). When the value of Lph is greater than zero and it is the largest of them, the decision is Happy, as shown in rows (2,6,10,14). When the value of Lpv is greater than zero and it is the largest of them, the decision is Surprise, as shown in rows (4,8,12,16). Otherwise the decision is Sad, as shown in rows (3,7,11,15).
As an example of applying the geometric-based feature extraction technique to frames of some 3D videos, Fig. 4 shows the 6 points marked manually on specific areas of the face (2 points on the inner eyebrows, 2 points on the lip vertical, and 2 points on the lip horizontal) for each neutral frame and expression frame (for one side of the frame, left or right).
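The decision rule described for Table I can be written as a short Python sketch. Two points are assumptions of this example rather than statements from the paper: the proportion of change is taken as (expression - neutral) / neutral, and "smallest" or "largest" is read as a comparison across all three proportions, with the rules tested in the order Anger, Happy, Surprise, Sad.

```python
def change_ratio(neutral: float, expression: float) -> float:
    """Proportion of change of a distance relative to the neutral image.

    Assumed form: (expression - neutral) / neutral.
    """
    if neutral == 0:
        raise ValueError("neutral distance must be non-zero")
    return (expression - neutral) / neutral


def classify_expression(ib: float, lpv: float, lph: float) -> str:
    """Threshold rule on the three proportions of change (Ib, Lpv, Lph)."""
    ratios = (ib, lpv, lph)
    # Eyebrows drawn together and lips pressed: both ratios negative.
    if ib < 0 and lpv < 0 and ib == min(ratios):
        return "anger"
    # Mouth stretched horizontally.
    if lph > 0 and lph == max(ratios):
        return "happy"
    # Mouth opened vertically.
    if lpv > 0 and lpv == max(ratios):
        return "surprise"
    return "sad"
```

For instance, a frame whose inner eyebrow distance shrinks from 20 to 15 pixels has an Ib proportion of -0.25; if Lpv is also negative and less extreme, the rule returns "anger".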
Table II lists the distances between points for the eyebrow distance (Ib1, Ib2), lip vertical distance (Lpv1, Lpv2), and lip horizontal distance (Lph1, Lph2) of each neutral frame and expression frame, as the local features of each frame of the 3D video film. Table II also gives the class of each frame, determined from the proportion of change in distance between the neutral frame and the expression frame; these proportions are shown in Table II for some frames of the 3D video film. The decision depends on the values of the proportions of change (Ib, Lpv, Lph). When the values of Ib and Lpv are less than zero and Ib is the smallest of them, the decision is Anger, as shown in rows (1,5,9,13). When the value of Lph is greater than zero and it is the largest of them, the decision is Happy, as shown in rows (2,6,10,14). When the value of Lpv is greater than zero and it is the largest of them, the decision is Surprise, as shown in rows (4,8,12,16). Otherwise the decision is Sad, as shown in rows (3,7,11,15).
The test set contains 40 3D video films covering the 4 expression classes. Of these, 34 videos are classified correctly and 6 are misclassified, so the recognition rate over all 3D video films is 85%. The confusion matrix is given in Table III. The geometric-based feature classification method for intensity frames gives a classification rate of 60% for anger, 100% for happy, 80% for sad, and 100% for surprise; overall, the recognition rate for all 3D video films is 85%.
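The reported rates are mutually consistent, which a short Python check makes explicit. The per-class counts of correctly classified videos below are derived from the stated percentages and the 10 videos per class; the function name is hypothetical.

```python
from typing import Dict, Tuple


def recognition_rates(
    correct: Dict[str, int], videos_per_class: int
) -> Tuple[Dict[str, float], float]:
    """Return per-class rates and the overall recognition rate."""
    per_class = {name: n / videos_per_class for name, n in correct.items()}
    overall = sum(correct.values()) / (videos_per_class * len(correct))
    return per_class, overall


# 60%, 100%, 80%, and 100% of 10 videos per class.
correct_counts = {"anger": 6, "happy": 10, "sad": 8, "surprise": 10}
per_class, overall = recognition_rates(correct_counts, 10)
# 34 of 40 videos are correct, so overall is 0.85.
```

The six misclassified videos therefore come from the anger class (4) and the sad class (2).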
The misclassification rate of geometric-based feature classification for intensity frames is 15%. The confusion arises between the sad expression and the anger and surprise expressions, which accounts for this misclassification rate.
In future work, the method is to be developed so that the landmarks used for geometric-based feature classification are located automatically rather than manually. The visual information in the 3D video films in this work is limited to the frontal view; to complete the work, face poses at multiple angles need to be handled.
TECHNIQUE
In this work, geometric-based feature extraction and classification are implemented in two stages: geometric-based feature extraction and classification for the training level, and geometric-based feature extraction and classification for the testing level.
In this paper, the proposed method is presented for 3D video feature extraction and classification. The local features of the geometric-based method (Ib, Lpv, and Lph) are linearly separable for the database images used as training images and for the frames of 3D video used as testing images, for the 4 emotion expressions. These features are computed using a simple difference measure (Euclidean distance) and maximize the objective function for classification. The local features Ibyl and Ibyr are redundant features that give the same performance as the Lpv feature. The Eml and Emr features do not discriminate any of the 4 emotion expressions; therefore, Eml, Emr, Ibyl, and Ibyr have been excluded from the classification process. The threshold classification method for the geometric-based features (Ib, Lpv, and Lph) of the image database at the training level and of successive frames (intensity frames) of 3D video films at the testing level depends on the proportion of distance (the measure of change between the neutral image and the expression image) rather than on the distance itself.

TABLE I.
LOCAL FEATURES AS DISTANCES AND PROPORTION OF DISTANCES CHANGING OF SOME IMAGES OF DATABASE.

TABLE II.
LOCAL FEATURES AS DISTANCES AND PROPORTION OF DISTANCES CHANGING OF SOME 3D VIDEO FRAMES.

TABLE III.
CONFUSION MATRIX FOR GEOMETRIC-BASED FEATURES CLASSIFICATION METHOD OF INTENSITY FRAME