Face Recognition Performance in Facing Pose Variation

There are many real-world applications of face recognition, such as social networking and surveillance, that require good performance in uncontrolled environments. However, much face recognition research is conducted in controlled settings. Compared with controlled environments, uncontrolled environments involve more variation, for example in pose, light intensity, and expression. Therefore, face recognition in uncontrolled conditions is more challenging than in controlled settings. In this research, we discuss how to handle pose variations in face recognition. We address the representation issue by using multi-pose face detection based on the yaw angle of the head, as an extension of existing frontal face recognition with Principal Component Analysis (PCA). The matching issue is then solved by using the Euclidean distance. This combination is known as the Eigenfaces method. The experiments are conducted with different yaw angles and different threshold values to find the optimal results. The experimental results show that: (i) the more pose variations the training face images contain, the better the recognition results, although the processing time also increases; and (ii) the lower the threshold value, the harder it is to recognize a face image, but the higher the accuracy.


I. INTRODUCTION
Nowadays, many human tasks can be performed by computers, for example, recognizing faces. Face recognition has many applications in various fields, including government and business. Generally, these applications can be divided into two groups: (1) Authentication: verifying users who access something, for example, in securing a nuclear reactor [1]. In this application, face recognition has several benefits over traditional passwords or keys, such as avoiding forgotten passwords and the misuse of stolen keys. The accuracy in this kind of application must be high, so the environment should be controlled. This can be done by restricting the number of people and requiring the input images to be frontal faces under steady illumination. (2) Surveillance: monitoring people in a certain area, such as identifying visitors to public services. This application typically has to deal with uncontrolled environments in which input images have background clutter, varying illumination, and large variations in pose. A face recognition system developed by Viisage and deployed at an airport [2] produced many false alarms and was eventually terminated. The inaccuracy of this system was mainly due to the uncontrolled environment.
Received: Nov. 11, 2016; received in revised form: Jan. 16, 2017; accepted: Jan. 17, 2017; available online: Mar. 30, 2017.
In the past, many face recognition algorithms were developed on controlled face databases in which images have simple backgrounds and little variation in pose, illumination, and expression. Recently, researchers have shifted toward developing face recognition algorithms for uncontrolled environments, which pose three main challenges: variation in expression, illumination, and pose. The first challenge is varying expression, which can reduce recognition performance significantly. Our previous research [3] attempted to recognize expressions by using an Active Appearance Model and Fuzzy Logic. The second challenge is illumination variation, which makes faces difficult to recognize. Our previous publication [4] proposed the Multi-Scale Retinex method to address the illumination problem. Finally, pose variation makes matching features between two face images taken from different viewpoints very difficult. In this research, we discuss how to handle pose variations in face recognition. We address the pose variation problem by developing multi-pose face detection based on the yaw angle of the head. The recognition and matching steps use the most common methods, namely Principal Component Analysis (PCA) and the Euclidean distance. We argue that these steps are not crucial for analyzing the pose variation problem, which is more closely related to the face detection phase.
Recent research on the pose variation problem [5] reveals that the best frontal face recognition algorithm on the Labeled Faces in the Wild (LFW) dataset [6] performs poorly in recognizing faces with large pose variations. This algorithm, called High-Dimensional LBP [7], uses a high-dimensional feature based on a single-type Local Binary Pattern (LBP) descriptor and employs a sparse projection method to make the high-dimensional feature practical. High-Dimensional LBP achieves 93.18% accuracy on the LFW dataset under the unrestricted protocol [7], but its accuracy drops to 63.2% on the multi-view and illumination dataset [5]. In fact, pose-invariant face recognition, which many applications require, remains unsolved, as argued in Ref. [8].
Cite this article as: A. A. S. Gunawan and R. A. Prasetyo, "Face Recognition Performance in Facing Pose Variation", CommIT (Communication & Information Technology) Journal 11(1), 1-7, 2017.
In that research, a combination of geometric and statistical modeling techniques was used to solve the pose problem in face recognition. The algorithm reaches 99.19% accuracy in near-frontal face recognition (between −15° and 15°), but the accuracy drops significantly to an average of around 75% when the pose changes to 45° from frontal. This result shows that pose-invariant face recognition has not yet been solved.
Briefly, the process of face recognition can be divided into the following steps:
(1) Face detection, where the computer searches for faces in an image or a video. Commonly, the output of face detection is a bounding box around each face.
(2) Face alignment, where the face image is aligned to a standard template by resizing it and correcting the location and orientation of the face.
(3) Feature extraction, where the computer derives features from the face image by using a representation method.
(4) Face identification, where the input image is compared with the faces in the database to determine to whom it belongs.
The remainder of this paper is organized as follows. The next section reviews face detection and face alignment in detail. Section III discusses feature extraction and face identification. In Section IV, we implement multi-pose face recognition based on Principal Component Analysis (PCA). Section V presents the experimental results measured on several multi-pose face videos. Finally, the last section presents the main conclusions of this research.

II. FACE DETECTION AND FACE ALIGNMENT
A. Face Detection
Viola and Jones [9] proposed a face detection algorithm that achieves real-time face detection with high detection rates. The algorithm is implemented in OpenCV and can detect frontal and profile faces at about five frames per second. The algorithm can be divided into four steps: (1) Haar-like feature selection. Human faces share several similar characteristics, which can be captured by Haar-like (rectangular) features. These features are faster to evaluate than pixel-based processing. (2) Integral image calculation. The rectangular features are computed from the integral image, with which the sum of values in any rectangular subregion of the image can be calculated quickly and efficiently. (3) AdaBoost training. AdaBoost, which stands for Adaptive Boosting, was formulated by Yoav Freund and Robert Schapire [10]. AdaBoost selects a small number of informative rectangular features and combines the resulting simple (weak) classifiers into a strong classifier. (4) Cascading classifiers. The classifiers are arranged in a cascade in order of increasing complexity. The cascade structure increases detection speed by rejecting unpromising regions early and focusing computation on the most probable face regions of the image.
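To make step (2) concrete, the following is a minimal pure-Python sketch of an integral image, a constant-time rectangle sum, and a two-rectangle Haar-like feature. It is an illustration under our own assumptions (images stored as 2D lists, hypothetical function names), not the OpenCV implementation used by the detector.

```python
def integral_image(img):
    """Integral image with a zero row/column of padding:
    ii[y][x] = sum of img[r][c] for all r < y and c < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]                       # running sum along the current row
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum  # plus everything above this row
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y): four lookups, O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle (left half minus right half) Haar-like feature; w is assumed even."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Because every rectangle sum costs four lookups regardless of the rectangle's size, features at any scale can be evaluated in constant time, which is what makes the sliding-window search fast.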

B. Face Alignment
The output of face detection is usually just a rough bounding box around each face, which gives only a rough estimate of the location and scale of the face. The detected face image therefore has to be aligned to a predefined template to compensate for variations in location, size, illumination, and orientation. This alignment process aims to localize and normalize the face image more accurately.
To cope with the pose variation problem, we use a multi-view face recognition approach that combines frontal and profile face detection to capture all views of face poses. Our approach can therefore be considered an extension of existing frontal face recognition. We then apply the following preprocessing steps to the detected faces: (1) the face image is resized to 100 × 100 pixels, (2) the image is converted to grayscale, and (3) the illumination is normalized using histogram equalization. Figure 1 shows detected and aligned frontal and profile faces produced by face detection and alignment in OpenCV.
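The three preprocessing steps can be sketched in pure Python as below. This is a hedged illustration only: the paper performs these steps with OpenCV/EmguCV, whereas here we assume nearest-neighbour resizing, the common ITU-R BT.601 luma weights for grayscale conversion, and a textbook histogram equalization whose rounding may differ from library implementations. All function names are hypothetical.

```python
def resize_nearest(img, out_w=100, out_h=100):
    """Step (1): nearest-neighbour resize of a 2D list to out_h rows by out_w columns."""
    h, w = len(img), len(img[0])
    return [[img[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def to_gray(rgb):
    """Step (2): convert (R, G, B) pixels to 8-bit gray with BT.601 luma weights."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb]


def equalize_hist(gray, levels=256):
    """Step (3): textbook histogram equalization of an 8-bit grayscale image."""
    flat = [p for row in gray for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:               # cumulative distribution of gray levels
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:                 # constant image: nothing to stretch
        return [row[:] for row in gray]
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min))) for c in cdf]
    return [[lut[p] for p in row] for row in gray]


def preprocess(rgb_face):
    """Resize to 100 x 100, convert to grayscale, then equalize the histogram."""
    return equalize_hist(to_gray(resize_nearest(rgb_face)))
```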

III. FEATURE EXTRACTION AND FACE IDENTIFICATION
After a face image has been aligned and normalized, feature extraction is performed to obtain an effective representation of the input data that is robust to geometric and photometric variations [11]. Face identification is then performed by matching the extracted features to recognize the face images of different people. In the identification step, we compare a new input face image with the training face images stored in the face database.

A. Principal Component Analysis
The feature extraction step uses Principal Component Analysis (PCA). The reason for this choice is that natural face images have significant statistical redundancy, which PCA can reduce to form a more compact representation [12]. PCA, also known as the Karhunen-Loève Transform (KLT), gives an orthogonal decomposition of the face image. Its output, called an eigenimage, is a linear projection of the input face image onto the eigenvectors corresponding to the largest eigenvalues of the covariance matrix.
Essentially, the covariance matrix is built from training images taken from many subjects. To compute it, each 2D image with m columns and n rows has to be flattened into a 1D vector. In this research, the images are square (m = n), so each vector has n² × 1 dimensions. Specifically, all face images in our experiments are 100 × 100 matrices that are flattened into 10000 × 1 vectors. Each face image is a 100 × 100 matrix of pixel values:

$$
\begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,100} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,100} \\
\vdots & \vdots & \ddots & \vdots \\
a_{100,1} & a_{100,2} & \cdots & a_{100,100}
\end{bmatrix} \tag{1}
$$

This matrix is flattened row by row into a vector in the 10000 × 1 dimensional space:

$$
\Gamma_i = \begin{bmatrix} a_{1,1} & \cdots & a_{1,100} & a_{2,1} & \cdots & a_{100,100} \end{bmatrix}^{T} \tag{2}
$$

If N individuals are sampled and P images are taken from each of them, the total number of images in the training set is

$$
M = N \times P. \tag{3}
$$

The average face image of the training data, shown in Fig. 2, is computed as

$$
\Psi = \frac{1}{M} \sum_{i=1}^{M} \Gamma_i. \tag{4}
$$

The average face $\Psi$ is then subtracted from each face vector $\Gamma_i$ to obtain $\Phi_i = \Gamma_i - \Psi$.
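The flattening, averaging, and mean-subtraction steps described above can be sketched as follows. This minimal pure-Python version assumes images are stored as lists of rows; the function names are hypothetical and the code is not the authors' EmguCV implementation.

```python
def flatten(image):
    """Row-major flattening of an n x n image (list of rows) into a vector of length n*n."""
    return [pixel for row in image for pixel in row]


def mean_face(gammas):
    """Average face Psi = (1/M) * sum of the M training vectors Gamma_i."""
    M = len(gammas)
    return [sum(values) / M for values in zip(*gammas)]


def subtract_mean(gammas, psi):
    """Phi_i = Gamma_i - Psi for every training vector."""
    return [[g - m for g, m in zip(gamma, psi)] for gamma in gammas]
```

With N persons and P images each, `gammas` simply holds the M = N × P flattened training images.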
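Finally, the covariance matrix $C$ can be written as

$$
C = \frac{1}{M} A A^{T}, \qquad A = \begin{bmatrix} \Phi_1 & \Phi_2 & \cdots & \Phi_M \end{bmatrix}^{T}, \tag{5}
$$

where $M$ is the total number of images. The matrix $A$ has $M \times 100^2$ dimensions, and the matrix $C$ has $M \times M$ dimensions. By computing the orthogonal decomposition of $C$, we obtain $M$ eigenvalues and $M$ eigenvectors $v_k$; mapping each of them back to image space as $u_k = A^{T} v_k$ gives $M$ eigenvectors of $100^2$ dimensions (an $M \times 100^2$ matrix). Because the number of images $M$ is usually much smaller than the image size $n^2$, decomposing the $M \times M$ matrix is far cheaper than decomposing the full $n^2 \times n^2$ covariance matrix, and it yields a much more compact image representation.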
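The small-matrix route described above can be sketched in pure Python as follows. To stay dependency-free, this sketch uses the classical cyclic Jacobi eigenvalue method for the symmetric M × M matrix; that choice, the 1/M scaling, the rule that drops near-zero eigenvalues, and all function names are our assumptions rather than details from the paper. In practice a linear algebra library would normally be used instead.

```python
import math


def small_covariance(phis):
    """C = (1/M) A A^T (M x M), where the rows of A are the mean-subtracted faces Phi_i."""
    M = len(phis)
    return [[sum(x * y for x, y in zip(pi, pj)) / M for pj in phis] for pi in phis]


def jacobi_eigen(S, max_sweeps=50, tol=1e-22):
    """Eigenpairs of a small symmetric matrix via cyclic Jacobi rotations,
    returned as (eigenvalue, eigenvector) pairs sorted by decreasing eigenvalue."""
    n = len(S)
    a = [[float(x) for x in row] for row in S]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * sum(x * x for row in a for x in row):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q); zeroes a[p][q]
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates eigenvectors as columns
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    pairs = [(a[k][k], [v[i][k] for i in range(n)]) for k in range(n)]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


def eigenfaces(phis, rel_eps=1e-10):
    """Map each eigenvector v_k of C to image space, u_k = A^T v_k / ||A^T v_k||.
    Eigenvalues at or below rel_eps times the largest are dropped (assumption)."""
    pairs = jacobi_eigen(small_covariance(phis))
    if not pairs or pairs[0][0] <= 0.0:
        return []
    faces = []
    for lam, vk in pairs:
        if lam <= rel_eps * pairs[0][0]:
            continue
        u = [sum(vk[i] * phis[i][j] for i in range(len(phis))) for j in range(len(phis[0]))]
        norm = math.sqrt(sum(x * x for x in u))
        faces.append((lam, [x / norm for x in u]))
    return faces
```

Each returned $u_k$ is an eigenvector of the full covariance $\frac{1}{M} A^{T} A$ with the same eigenvalue, since $\frac{1}{M} A^{T} A (A^{T} v_k) = A^{T} (C v_k) = \lambda_k A^{T} v_k$.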
If the most significant eigenvectors from the previous step are reshaped back into 100 × 100 matrices, they form face-like images, as can be seen in Fig. 3. This is why the PCA approach to face recognition is also called the eigenface method.
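Displaying an eigenvector as a face-like image requires reshaping it and rescaling its real-valued entries to the 0-255 gray range. A minimal sketch, assuming min-max scaling (the paper does not state which scaling it uses) and a hypothetical function name:

```python
def eigenface_to_image(u, n):
    """Reshape a length n*n eigenvector (row-major) into n rows and
    min-max scale its real-valued entries to 8-bit gray levels 0..255."""
    lo, hi = min(u), max(u)
    span = (hi - lo) or 1.0          # avoid division by zero for a constant vector
    scaled = [round((x - lo) * 255 / span) for x in u]
    return [scaled[r * n:(r + 1) * n] for r in range(n)]
```

For the 100 × 100 faces in this paper, `n` would be 100.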

B. Euclidean Distance
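If a new 100 × 100 face image $\Gamma$ must be recognized, the following steps are performed:
1) The input vector $\Gamma$ is normalized by using the average face image $\Psi$:
$$
\Phi = \Gamma - \Psi. \tag{6}
$$
2) The normalized image is projected onto the eigenspace spanned by the eigenvectors (eigenfaces) $u_k$, $k = 1, \dots, M$, of the training set, that is, $\Omega_{\mathrm{proj},k} = u_k^{T} \Phi$. The projection result is the $M \times 1$ vector
$$
\Omega_{\mathrm{proj}} = \begin{bmatrix} \Omega_{\mathrm{proj},1} & \Omega_{\mathrm{proj},2} & \cdots & \Omega_{\mathrm{proj},M} \end{bmatrix}^{T}. \tag{7}
$$
3) To identify the projected image $\Omega_{\mathrm{proj}}$, all training images $\Phi_i$ also need to be projected onto the eigenspace, yielding $\Phi_{\mathrm{proj},i}$. To estimate the similarity between two projected images and decide whether they come from the same person, we use the Euclidean distance as the similarity measure:
$$
\varepsilon_i = \left\lVert \Omega_{\mathrm{proj}} - \Phi_{\mathrm{proj},i} \right\rVert = \sqrt{\sum_{k=1}^{M} \left( \Omega_{\mathrm{proj},k} - \Phi_{\mathrm{proj},i,k} \right)^{2}}. \tag{8}
$$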

4) After these calculations, the smallest of the Euclidean distances $\varepsilon_i$ between $\Omega_{\mathrm{proj}}$ and the projected training images $\Phi_{\mathrm{proj},i}$ is selected:
$$
\Delta = \min_{i} \varepsilon_i. \tag{9}
$$
Finally, the decision is made by comparing $\Delta$ with a threshold value $\theta$ using the following rules: a) If $\Delta < \theta$, where $\theta$ is chosen heuristically, the input face image is recognized as the person whose training image has the smallest distance. b) Otherwise ($\Delta \geq \theta$), the input face is not recognized as any known person in the face database. By using PCA, the computation is reduced from the $n^2$-dimensional face image to the $M$-dimensional projected image, because in practice the number of face images in the training set is smaller than the number of pixels in an image [12, p. 96]. Our PCA approach to face recognition is shown in Fig. 4.
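Steps 1-4 and the threshold rule can be sketched as follows, assuming the average face, the eigenfaces, and the projected training faces have already been computed. Treating a negative threshold as "no threshold" mirrors the threshold value of −1 used in Section V, but that encoding and all names here are our assumptions. Python's `math.dist` (Python 3.8+) computes the Euclidean distance.

```python
import math


def project(phi, eigenfaces):
    """Projection weights Omega_proj,k = u_k^T Phi for each eigenface u_k."""
    return [sum(u_j * p_j for u_j, p_j in zip(u, phi)) for u in eigenfaces]


def identify(gamma, psi, eigenfaces, train_projections, labels, theta):
    """Return (label, Delta): the nearest training identity if Delta < theta,
    else (None, Delta). A negative theta (e.g. -1) disables the check (assumption)."""
    phi = [g - m for g, m in zip(gamma, psi)]                      # step 1: subtract average face
    omega = project(phi, eigenfaces)                               # step 2: project to eigenspace
    distances = [math.dist(omega, t) for t in train_projections]   # step 3: Euclidean distances
    best = min(range(len(distances)), key=distances.__getitem__)   # step 4: smallest distance
    delta = distances[best]
    if theta < 0 or delta < theta:
        return labels[best], delta
    return None, delta
```

Note that only the comparison against $\theta$ changes with the threshold setting; the nearest identity itself does not depend on it.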

IV. IMPLEMENTATION
The algorithm is implemented on the .NET Framework with the C# programming language and the EmguCV library. The HaarCascade class in EmguCV is used as the object detection classifier. To solve the pose variation problem, we use both frontal and profile face detectors to capture all views of face poses. However, the profile face detector detects only left profile faces. To detect right profile faces, we flip the image horizontally and run the left profile face detector on it. As a result, the detection system can detect all views of face poses. The detectors are based on the face detector developed by Seo [13], which builds a decision tree to detect face images; that detector achieved a 95.4% accuracy rate on its test dataset.
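The right-profile trick described above can be sketched as follows: flip the frame, run the left-profile detector, and mirror each returned box back into the original coordinates. The detector is passed in as a hypothetical callable returning (x, y, w, h) boxes; in the actual system this role is played by the EmguCV Haar cascade, whose API is not reproduced here.

```python
def flip_horizontal(image):
    """Mirror an image (list of rows) left to right."""
    return [row[::-1] for row in image]


def detect_right_profiles(image, detect_left_profile):
    """Detect right profile faces with a left-profile detector.

    detect_left_profile is a hypothetical callable: image -> list of (x, y, w, h).
    A box covering columns [x, x + w) in the flipped frame covers columns
    [W - x - w, W - x) in the original frame of width W.
    """
    width = len(image[0])
    boxes = detect_left_profile(flip_horizontal(image))
    return [(width - x - w, y, w, h) for (x, y, w, h) in boxes]
```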
Our approach can be considered an extension of existing frontal face recognition using Principal Component Analysis. The only difference is that the faces in the training data are stored in various poses. In our initial experiments, we found that identification accuracy decreases if the faces are arranged without regard to the similarity of their poses. To improve the accuracy of the system, the faces in the training data are separated according to the captured pose, namely face images captured (1) from the front, (2) from the right side, and (3) from the left side (see Fig. 5).
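One way to keep the training faces separated by pose, as described above, is sketched below: each probe is matched only against training faces of the same pose. The class name, the pose keys such as "front", "left", and "right", and the assumption that projections are already computed (for example in a per-pose eigenspace) are ours, not the paper's.

```python
import math
from collections import defaultdict


class PoseGallery:
    """Hypothetical training store with one group of projected faces per pose."""

    def __init__(self):
        self._by_pose = defaultdict(list)

    def add(self, pose, label, projection):
        self._by_pose[pose].append((label, projection))

    def nearest(self, pose, probe):
        """Nearest neighbour among training faces of the same pose only.
        Returns (label, distance), or (None, inf) if that pose has no faces."""
        best_label, best_dist = None, math.inf
        for label, projection in self._by_pose.get(pose, []):
            d = math.dist(probe, projection)
            if d < best_dist:
                best_label, best_dist = label, d
        return best_label, best_dist
```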
Because detection and recognition are split into three processes, they take longer when run sequentially. With this sequential approach, the application cannot run in real time during face detection and recognition when the input source is a webcam. Therefore, our application uses the parallel processing facilities provided by .NET Framework 4. The main benefit of parallel processing is that data processing tasks are distributed to multiple cores of the processor (CPU), so that the tasks can be completed more quickly. The achievable speedup depends on the specification of the CPU. Figure 6 shows an example of a left profile face detected by our application.
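The idea of running the three pose-specific detectors concurrently on each webcam frame can be sketched with Python's `concurrent.futures`, used here only as a stand-in for the .NET Framework 4 parallel facilities in the paper. The detector callables are hypothetical. Note that in CPython, threads give real CPU parallelism only when the detectors spend their time in native code that releases the GIL; otherwise a process pool would be needed.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_all_poses(frame, detectors):
    """Run pose-specific detectors on the same frame concurrently.

    detectors: dict mapping a pose name (e.g. 'front', 'left', 'right') to a
    hypothetical callable frame -> list of (x, y, w, h) boxes.
    Returns a dict with the same keys holding each detector's boxes.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(detectors))) as pool:
        futures = {pose: pool.submit(fn, frame) for pose, fn in detectors.items()}
        return {pose: future.result() for pose, future in futures.items()}
```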

V. EXPERIMENTAL RESULTS
For our experiments, we collected our own dataset because no public face dataset is directly related to the pose variation problem. The dataset is designed to serve the objective of the experiments, which is to find the optimal number of pose variations in the database and their quantitative pose positions. The face images used in the experiments were captured against a controlled background made of a large sheet of white paper, in a room with a light intensity of approximately 150 to 300 lux. Furthermore, each face image, whether used as training or input data, is converted to grayscale and histogram equalized to obtain similar gray levels. The angles of the face images are based on yaw movement of the head. The faces in the training data are captured at yaw angles from −75° to 75°, as shown in Fig. 7, giving a total of eleven face images for each person. The angles are measured using a piece of paper with angle lines placed at the center point of the head.
In the first experiment, the training data consist of face images of ten persons, each with eleven images. This experiment evaluates the accuracy of the recognition system without tuning the threshold parameter. This is done by setting the threshold value to −1, so that every detected face returns a recognition result: each detected face is identified as the person with the smallest distance ∆. The experimental result is shown in Table I. In the table, 'recognizable' refers to the detected faces, and 'amount of recognizable' is the number of detected faces. 'Amount of successful' is the number of detected faces that were recognized correctly.
Based on the experimental result above, the accuracy of the system in recognizing the detected faces can be calculated as

$$
\text{Accuracy} = \frac{\text{amount of successful}}{\text{amount of recognizable}} \times 100\%.
$$

The second experiment examines the effect of pose variation in the training data by removing certain poses used in the first experiment. We created five variations based on the number of face poses in the training data, as shown in Table II.
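The accuracy of the system, together with a per-pose breakdown of the kind reported for the detected pose angles, can be computed as follows. The record format (yaw angle, true label, predicted label) and the function names are hypothetical.

```python
from collections import defaultdict


def accuracy_percent(amount_successful, amount_recognizable):
    """Accuracy = amount of successful / amount of recognizable x 100%."""
    if amount_recognizable == 0:
        return None                  # undefined when no face was detected
    return 100.0 * amount_successful / amount_recognizable


def accuracy_by_pose(records):
    """records: iterable of (yaw_angle, true_label, predicted_label), one per detected face.
    Returns {yaw_angle: accuracy in percent}, ordered by angle."""
    detected, successful = defaultdict(int), defaultdict(int)
    for angle, truth, predicted in records:
        detected[angle] += 1
        successful[angle] += int(predicted == truth)
    return {a: accuracy_percent(successful[a], detected[a]) for a in sorted(detected)}
```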
The impact of these training data variations on recognition accuracy, for each detected pose angle, is shown in Table III. The result of the last variation (V) is the same as that of the first experiment.
In the third experiment, we analyze the impact of the threshold value on successful recognition. The threshold is varied from 1000 to 5000 and compared with the smallest distance described in Section III. This experiment uses only variation V in Table II, and the resulting recognition accuracy for each threshold value is shown in Table IV. In this table, 'amount of recognizable under threshold value' is the number of detected faces that are recognized when the system is set to that threshold value. The threshold value therefore controls the recognition accuracy: the lower the threshold value, the fewer faces are accepted, but the higher the recognition accuracy. The last experiment measures the processing time for loading each variation of the face image training data. The result is shown in Table V.
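The threshold experiment can be reproduced on any set of recognition results with a sweep like the one below: for each threshold θ it counts the detected faces whose smallest distance ∆ is below θ and computes the accuracy among them. The input format and function name are hypothetical.

```python
def sweep_thresholds(results, thresholds):
    """results: list of (delta, is_correct) pairs, one per detected face, where delta is
    the smallest distance and is_correct says whether the nearest identity was right.
    Returns {theta: (number of faces accepted with delta < theta, accuracy % or None)}."""
    table = {}
    for theta in thresholds:
        accepted = [ok for delta, ok in results if delta < theta]
        accuracy = 100.0 * sum(accepted) / len(accepted) if accepted else None
        table[theta] = (len(accepted), accuracy)
    return table
```

Lowering θ shrinks the set of accepted faces, and the faces that remain are those closest to a training image, which is the trade-off summarized in the conclusions.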

VI. CONCLUSIONS
Based on our visual observation, failures in recognizing input faces can be caused by digital images that are unclear or have excessive noise, close similarity between one face and another, and poses at angles that are not provided in the training data. These three factors contribute most to the decrease in recognition accuracy. Based on the analysis of the experimental results in Tables I and III-V, our conclusions about recognition performance in facing pose variation are as follows:
1) The more pose variations the training face images contain, the better the recognition results, although the processing time also increases.
2) The lower the threshold value, the harder it is to recognize a face image, but the higher the accuracy. This setting is suitable for authentication applications.
3) The higher the threshold value, the easier it is to recognize a face image, but the lower the accuracy. This setting is suitable for surveillance applications.
4) The best result of our experiments is a 100% recognition accuracy rate with a threshold value of 1000, and the worst result is a 69.91% recognition accuracy rate with a threshold value of −1. A threshold of −1 means that every detected face returns a recognition result.
The main weakness of our approach is that no classification algorithm is applied after dimensionality reduction with PCA. Therefore, adding a classification algorithm will be the first issue in our future research, and it will be addressed using a multi-class classification algorithm such as Support Vector Machines (SVM).