Different Approaches for Face Authentication as Part of a Multimodal Biometrics System

This paper describes different approaches to face authentication from the point of view of features and classification abilities. The authors compare two types of features, Histogram of Oriented Gradients (HOG) and Local Binary Patterns (LBP), including their combination. These features are classified using a Multilayer Neural Network (MLNN) and Support Vector Machines (SVM). Face authentication consists of several steps. The first step uses the Viola-Jones algorithm for face detection. The detected face is resized to a fixed size and then converted into grayscale. Next, feature extraction with a simple Min-Max normalization is applied. The obtained features are evaluated by the classifiers, and for each detected face the authors get a posterior probability as the output of the classifier. The approaches are compared with each other using False Acceptance Rate (FAR), False Rejection Rate (FRR), Equal Error Rate (EER), and Receiver Operating Characteristic (ROC) and Detection Error Tradeoff (DET) curves. The results are verified on the AR Face Database and discussed from the feature extraction and classifier design points of view. The best results were achieved by HOG features with the SVM classifier. Detailed results are listed in the text below.


Introduction
Personal authentication can be divided into three fields according to the methods used. The first field is based on knowledge, which means that the person knows a password. The second field is represented by authentication methods based on possession (identification card, key). The last one is based on biometric authentication. Systems based on biometric authentication verify the identity of a person by using unique physiological features (fingerprint, iris, retina, facial geometry, voice, etc.) [1].
The main advantage of biometric authentication is that a user does not need to remember a password or always carry a key that can easily be stolen. The reasons for using biometric authentication are speed, convenience, precision, high reliability, zero operating cost, practicality and clarity. Biometric authentication can be used in many areas: security of computers and data, building access, the judiciary, user comfort, etc. [2] and [3].
Face recognition is a technology which identifies and verifies a unique facial geometry from a digital image. Face recognition can be divided into two areas. The first area is face identification and the second is face authentication [4]. Face recognition is widely used because facial geometry is one of the most popular biometric characteristics. A digital image of a face can be captured simply and non-invasively with common camera equipment. There are many areas where face recognition can be used (access control, bankcard identification, security monitoring, etc.) [5].
A lot of work has been done in recent years in the field of face authentication as part of a multimodal biometrics system. Sanderson et al. [6] provide a review of important milestones in audio-visual person identification and verification (features, classifiers and fusion techniques). The authors of [6] used eigenfaces as features and GMM for classification in their research. Brunelli et al. [7] used a set of geometric features describing the size and the layout of the different features in the face (eye, mouth, nose, eyebrow). Recognition proceeded by measuring the distance between the unknown descriptive vector and a set of reference vectors (known people). Raghavendra et al. [8] compared four methods for feature extraction (PCA, 2DPCA, LDA, 2DLDA). Each of these feature vectors was classified by a nearest neighbour classifier. Kala et al. [9] used a set of geometric features (width of the eye, length of the eye, length of the mouth, width of the mouth, ...) for face representation. These parameters were classified by an artificial neural network. Barbu et al. [12] used a SIFT-based face recognition technique for feature extraction. The authors measured the distance between feature vectors for classification. In [10] and [13], the authors used the same features (HOG, LBP) as a descriptor of the face. Chandrasheker et al. [10] used SVM and HMM for classification and Xie et al. [13] used only SVM. This paper is focused on face authentication and compares relevant methods to achieve the lowest error rate. The authors compare various parameters (HOG, LBP and their combination) and multiple classifiers (MLNN and SVM). The combination of parameters and classifier with the lowest error rate will be used for a multimodal biometrics system in future work. This multimodal system will consist of voice authentication and face authentication.
The rest of the paper is organised as follows: the second section presents the basic idea of face authentication and is followed by a description of the AR Face Database used in the presented experiment. The results are then presented in section four, and section five contains a discussion of future work and possible improvements.

Face Authentication
Face authentication, or face verification, is a kind of biometric authentication where the facial geometry is used for the verification process. Simply put, the main task of the technology is to decide whether a face in a digital image belongs to an authenticated user or not.
Face authentication is used in many areas such as banking, building access, device access and so on. As already mentioned, the main advantages of this approach are low price, user comfort, contactless nature and sufficient accuracy [4] and [5]. The authors have focused on face authentication because the goal is to design a multimodal biometric authentication system which will consist of both face and voice authentication.
The process of authentication consists of the following steps: face detection, preprocessing (resize, grayscale conversion), feature extraction, classification and decision. These steps are shown in Fig. 1 and described in more detail below.

Face Detection
Face detection is intended to find a face and its coordinates in a given image. The authors used the Viola-Jones detection method [14] in the presented experiment. This method is based on three main components (integral image, AdaBoost training, cascading classifiers). The Viola-Jones algorithm is fast, accurate and well suited to face detection [15].

Preprocessing
Preprocessing adjusts a detected face into a usable form. It consists of two parts. The first part changes the size of the face picture.
The detected face has to be resized for the classifiers (the feature vector must always have the same length). The authors set the size of the detected face to 120×120 pixels.
In the second part, the detected face is converted into grayscale.
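The two preprocessing parts can be illustrated with a minimal sketch. The paper does not state which interpolation or grayscale formula was used, so nearest-neighbour sampling and the common luminance weights (0.299, 0.587, 0.114) are assumptions here, and the function names are hypothetical.

```python
def resize_nearest(image, new_height, new_width):
    """Resize a 2-D list of pixels with nearest-neighbour sampling (assumed method)."""
    old_height, old_width = len(image), len(image[0])
    return [[image[y * old_height // new_height][x * old_width // new_width]
             for x in range(new_width)]
            for y in range(new_height)]


def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of luminance values.

    The weights are the widely used luminance coefficients; the paper
    does not say which conversion was applied.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def preprocess(rgb_face, size=120):
    """Resize a detected face crop to size x size, then convert it to grayscale."""
    return to_grayscale(resize_nearest(rgb_face, size, size))
```

Whatever the size of the detected face crop, the output is always a 120×120 grid, so every later feature vector has the same length.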

Feature Extraction
The most important step of face authentication is the choice of significant parameters/features. These parameters should meet some requirements. First, they should be robust: parameters should not change their characteristics over time. Second, they should be secure, which means that it should not be easy to mimic these parameters. Third, they should be both illumination and rotation invariant [4] and [5]. The most used descriptors are HOG and LBP [11], [16] and [17], and these have also been used in the presented experiment.
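The abstract mentions that a simple Min-Max normalization is applied to the extracted features. A minimal sketch of such a normalization follows; scaling each feature vector to the range [0, 1] on its own is an assumption, since the paper does not say whether the minimum and maximum are taken per vector or over the training set.

```python
def min_max_normalize(features):
    """Scale a list of feature values linearly to the range [0, 1].

    Assumption: minimum and maximum are taken over this one vector.
    A constant vector is mapped to all zeros to avoid division by zero.
    """
    lowest, highest = min(features), max(features)
    if highest == lowest:
        return [0.0 for _ in features]
    return [(value - lowest) / (highest - lowest) for value in features]
```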

1) Histogram of Oriented Gradients
The method is based on evaluating well-normalised local histograms of image gradient orientations in a dense grid. The basic idea is that local object appearance and shape can often be characterised rather well by the distribution of local intensity gradients or edge directions, even without precise knowledge of the corresponding gradient or edge position. The computation algorithm is described in [18]. We used these input arguments for HOG extraction: the size of a HOG cell was set to 8×8 pixels, the number of cells in a block was 4, the number of overlapping cells between adjacent blocks was 1 and the number of orientation histogram bins was set to 9. This setting corresponds to a HOG feature length of 7056 for an image size of 120×120 pixels.
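The stated feature length can be checked with a short calculation. The sketch below assumes that the 4 cells of a block form a 2×2 square and that blocks slide by (block size minus overlap) cells in each direction; the function name is hypothetical.

```python
def hog_feature_length(image_size, cell_size, block_size, block_overlap, num_bins):
    """Length of a HOG descriptor for a square image.

    block_size and block_overlap are given in cells per side
    (assumption: a block of 4 cells is a 2x2 square).
    """
    cells_per_side = image_size // cell_size                      # 120 // 8 = 15
    stride = block_size - block_overlap                           # 2 - 1 = 1
    blocks_per_side = (cells_per_side - block_size) // stride + 1  # 14
    return blocks_per_side ** 2 * block_size ** 2 * num_bins      # 196 * 4 * 9


print(hog_feature_length(120, 8, 2, 1, 9))  # 7056
```

With 15 cells per side, 14×14 = 196 overlapping blocks fit into the image, and each block contributes 4 × 9 = 36 values, which gives 7056.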

2) Local Binary Patterns
The basic LBP method characterises the spatial structure of a local image texture by thresholding a 3×3 square neighbourhood with the value of the centre pixel and considering only the sign information to form a local binary pattern [17]. LBP is defined by Eq. (1):

$$\mathrm{LBP}(x_c, y_c) = \sum_{u=0}^{U-1} s(I_u - I_c)\, 2^{u}, \qquad s(z) = \begin{cases} 1, & z \ge 0 \\ 0, & z < 0 \end{cases} \tag{1}$$

where $x_c$ and $y_c$ are the coordinates of the centre pixel, $I_c$ is the brightness level of the centre pixel, $I_u$ is the brightness level of a neighbouring pixel, $s(I_u - I_c)$ is the threshold function and $U$ is the number of neighbouring pixels.
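The thresholding described above can be sketched for a single 3×3 neighbourhood as follows. The order in which the 8 neighbours are visited (clockwise from the top-left pixel) is an assumption, since the paper does not fix it, and the function names are hypothetical.

```python
# (row offset, column offset) of the 8 neighbours, clockwise from top-left.
# This ordering is an assumption; any fixed ordering gives a valid LBP variant.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, yc, xc):
    """LBP code of the pixel at row yc, column xc of a 2-D grayscale list."""
    centre = image[yc][xc]
    code = 0
    for u, (dy, dx) in enumerate(NEIGHBOUR_OFFSETS):
        # Threshold function: 1 if the neighbour is at least as bright as the centre.
        if image[yc + dy][xc + dx] >= centre:
            code += 2 ** u
    return code


def lbp_map(image):
    """LBP codes of all interior pixels (border pixels lack a full neighbourhood)."""
    return [[lbp_code(image, y, x) for x in range(1, len(image[0]) - 1)]
            for y in range(1, len(image) - 1)]
```

For a uniform patch every neighbour passes the threshold, so the code is 255; a single bright neighbour sets exactly one bit.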

Classification
From the classifier point of view, the authors compare two types of classification methods. The first method is MLNN. This method has appropriate properties for face authentication (high accuracy, generalisation, adaptation) [5]. The second method is SVM. This method is very useful for face authentication because it is primarily intended for binary classification [4].

1) Multilayer Neural Network
The authors used a feedforward Multilayer Neural Network with backpropagation in the experiment [19]. The network consisted of three layers (input, hidden and output layer). The number of neurons in the input layer is determined by the number of extracted parameters for a given detected face (for example, 7056 for HOG).
The number of neurons in the hidden layer was set to 10. The output layer represented two output classes (reference user, imposter). The sigmoid was used as the activation function, with a steepness of 0.5 for each neuron.
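A forward pass through such a three-layer network can be sketched as follows. The paper does not define how the steepness enters the sigmoid, so multiplying the neuron input by the steepness is an assumption (some libraries use a different scaling); the weights, biases and function names are hypothetical, and training by backpropagation is not shown.

```python
import math


def sigmoid(x, steepness=0.5):
    """Sigmoid activation; the role of `steepness` as a plain multiplier is assumed."""
    return 1.0 / (1.0 + math.exp(-steepness * x))


def layer_forward(inputs, weights, biases, steepness=0.5):
    """Output of one fully connected layer: one weight row and one bias per neuron."""
    return [sigmoid(sum(w * v for w, v in zip(row, inputs)) + b, steepness)
            for row, b in zip(weights, biases)]


def mlnn_forward(features, hidden_w, hidden_b, out_w, out_b):
    """Input -> hidden layer (e.g. 10 neurons) -> output layer (2 classes)."""
    hidden = layer_forward(features, hidden_w, hidden_b)
    return layer_forward(hidden, out_w, out_b)
```

With HOG features, `features` would hold 7056 values, `hidden_w` would have 10 rows of 7056 weights, and `out_w` would have 2 rows of 10 weights, one per output class.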

2) Support Vector Machines
SVM is a modern method in the field of machine learning. The principle of classification is to find a hyperplane that separates the training data in the feature space. The optimal hyperplane is the one for which the training data points of the two classes lie in opposite half-spaces and the distance between the hyperplane and the nearest points of each class is the largest. In other words, the goal is to maximise the gap between the two classes (maximum margin). Support vectors are the training data points that determine the decision boundary [4] and [5].
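The maximum-margin idea can be illustrated for the linear case. The sketch below assumes a linear kernel (the paper does not state which kernel was used) and a hyperplane w·x + b = 0 in canonical form, where the support vectors satisfy |w·x + b| = 1; the weights are hypothetical and training is not shown.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b of a feature vector x relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 depending on the half-space in which x lies."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the two margin hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))
```

Maximising the margin is therefore equivalent to minimising the norm of `w` while keeping all training points on the correct side of their margin hyperplane.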

Decision
The last step of the authentication process is the decision whether to allow access or not. The system has to decide whether the user is the reference user or an imposter. The decision is based on a comparison of the maximum value of the score with a threshold. If the maximum value is higher than the threshold, the user is marked as the reference user, otherwise as an imposter.
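The decision rule can be written in a few lines. In this sketch, `scores` is a hypothetical list of classifier outputs (posterior probabilities for the reference user); how the scores are collected is not specified in the paper.

```python
def decide(scores, threshold=0.5):
    """Mark the user as reference if the maximum score exceeds the threshold."""
    return "reference" if max(scores) > threshold else "imposter"
```

A score exactly equal to the threshold is rejected here, because the rule requires the score to be higher than the threshold.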
Measuring the performance of face authentication allows different systems to be compared. The authors used FAR, FRR, EER, and ROC and DET curves to measure performance. The FAR is the measure of the likelihood that the face authentication system will incorrectly accept an access attempt by an imposter. FAR is computed by Eq. (2). The FRR is the measure of the likelihood that the face authentication system will incorrectly reject an access attempt by a reference user. FRR is computed by Eq. (3). EER is the error rate at the operating point where FAR is equal to FRR. The ROC curve shows the relationship between the true positive rate (sensitivity) and the false positive rate (FAR) at various threshold settings. The DET curve is a graphic representation of error rates (FAR vs FRR) for binary classification systems [20].

$$\mathrm{FAR} = \frac{N_{FA}}{N_{IVA}} \tag{2}$$

where $N_{FA}$ is the number of incorrect acceptances and $N_{IVA}$ is the number of all imposter attempts.

$$\mathrm{FRR} = \frac{N_{FR}}{N_{EVA}} \tag{3}$$

where $N_{FR}$ is the number of incorrect rejections and $N_{EVA}$ is the number of all authorized attempts.
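These measures can be computed directly from classifier scores. The sketch below uses the ratios defined above; the score lists are hypothetical, and locating the EER by sweeping a fixed grid of thresholds is one simple approach among several.

```python
def far_frr(genuine_scores, imposter_scores, threshold):
    """Return (FAR, FRR) as fractions at the given threshold."""
    n_fa = sum(1 for s in imposter_scores if s > threshold)   # false acceptances
    n_fr = sum(1 for s in genuine_scores if s <= threshold)   # false rejections
    return n_fa / len(imposter_scores), n_fr / len(genuine_scores)


def equal_error_rate(genuine_scores, imposter_scores, steps=1000):
    """Approximate the EER by sweeping thresholds from 0 to 1.

    Returns (EER, threshold) at the grid point where |FAR - FRR| is smallest.
    """
    best = None
    for i in range(steps + 1):
        threshold = i / steps
        far, frr = far_frr(genuine_scores, imposter_scores, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


# Hypothetical scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
imposter = [0.1, 0.2, 0.3, 0.6]
print(far_frr(genuine, imposter, 0.5))  # (0.25, 0.25)
```

Plotting the FAR and FRR pairs from the same sweep against each other gives the DET curve, and plotting 1 − FRR against FAR gives the ROC curve.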

AR Face Database
The database contains over 4000 colour frontal view images of 126 people's faces (70 men and 56 women) that were taken during two different sessions separated by 14 days. Similar pictures were taken during the two sessions. No restrictions on clothing, eyeglasses, makeup, or hairstyle were imposed upon the participants. Controlled variations include facial expressions (neutral, smile, anger, and screaming), illumination (left light on, right light on, all side lights on), and partial facial occlusions (sunglasses or a scarf) [21].
The authors chose 18 reference participants (13 men and 5 women) and 10 imposter participants (6 men and 4 women) for the experiment. This corresponds to the number of participants in the correlation speech database. For each person, 16 digital images were used for training and 10 for testing of the system.

Experimental Results
The SVM model and the MLNN were trained for each of the 18 reference users. These classifiers were used to distinguish between two classes (the authorised user class and the imposter class). The authorised user class was trained using 16 digital images (this set contained images from both sessions); the 10 remaining images were used for testing. The imposter class was trained as a background model. We used 1 digital image from each of the 17 remaining reference users to train this model. As testing data for imposters, digital images from 10 participants (10 imposters) were used. These participants do not belong to the reference users (the background model was not trained using digital images from these participants). This training process was repeated for LBP only, HOG only and LBP+HOG features. The results for both classifiers with different features are shown in Tab. 1. The table contains the values of FAR and FRR at a threshold of 0.5 (50 %) and the value of EER. The values of these parameters are given in percent. ROC curves for both classifiers are shown in Fig. 2. As shown in Tab. 1, the best results were achieved by HOG features with the SVM classifier. The values of FAR and FRR were 2.7 % and 3.3 % at a threshold of 50 %. This corresponds to an accuracy of 96.9 %. EER was 2.8 % for these parameters. For the MLNN classifier, the lowest EER (3.9 %) was achieved with the same features. The combination of LBP and HOG parameters brings similar results to HOG parameters alone. From the classifier point of view, the SVM classifier achieved better results than the MLNN classifier for all features. More detailed results are listed only for the SVM classifier with HOG features.
From the authentication point of view, we want to achieve the lowest FAR. This means that the threshold has to be set to a high value. Figure 3 shows the values of FAR and FRR depending on the threshold. A FAR of zero was achieved after setting the threshold to 62 %. Table 2 shows the confusion matrix for a threshold of 50 %. As we can see in Fig. 3, the value of FAR decreases with an increasing threshold.
On the other hand, the value of FRR increases, as expected. Figure 4 shows the DET curve with the EER point marked.
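The trade-off between FAR and FRR can be explored by searching for the lowest threshold at which no imposter is accepted and reading off the FRR paid for it. The sketch below works on hypothetical score lists with thresholds in whole percent; it does not reproduce the paper's data.

```python
def zero_far_operating_point(genuine_scores, imposter_scores):
    """Lowest threshold (in whole percent) with FAR = 0, and the FRR at that point.

    A user is accepted when the score is higher than the threshold.
    Returns (threshold_percent, frr), or None if the score lists never allow FAR = 0.
    """
    for percent in range(0, 101):
        threshold = percent / 100
        if all(score <= threshold for score in imposter_scores):
            rejected = sum(1 for score in genuine_scores if score <= threshold)
            return percent, rejected / len(genuine_scores)
    return None


# Hypothetical scores, for illustration only.
genuine = [0.90, 0.70, 0.55, 0.95]
imposter = [0.10, 0.35, 0.61]
print(zero_far_operating_point(genuine, imposter))  # (61, 0.25)
```

In this toy example, raising the threshold to 61 % removes all false acceptances but rejects one of the four genuine attempts, which mirrors the behaviour of the two curves described above.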

Conclusion and Future Work
The aim of our research was to find the best features for face authentication and a suitable classifier with the lowest values of FAR and FRR. The results and knowledge gained from this research will be used for the design of a multimodal biometrics system based on voice and face authentication. Our research was focused on the analysis of the AR Face Database using different parameters and classifiers. We compared LBP and HOG features and their combination. We used two machine learning methods for classification (MLNN and SVM). The comparison was based on the values of FAR, FRR and EER. From the classifier point of view, we achieved lower values of FAR, FRR and EER with the SVM classifier than with the MLNN classifier. This result follows from the properties of the classifiers. The SVM classifier uses "only" support vectors for classification. This means that the SVM classifier does not need a big training data set if the training data set contains suitable support vectors. On the other hand, the MLNN classifier needs a big training data set for precise setting of the neuron weights. We conclude that if only a small training data set is available, the SVM classifier should be used. The experimental results show that the lowest error values were achieved for HOG features. HOG performs better than LBP because, while the local binary pattern feature captures a local pattern, the histogram of gradients investigates the ensemble (histogram) of changes (gradients). Therefore, it is expected that investigating the whole image rather than looking for local patterns should perform better. Classification errors occurred primarily for images where a participant wears a scarf. This means that the values of FAR, FRR and EER will be lower when these images are not used.
If we want to compare our results with other research works, it is important to say that almost all of the works reviewed in the introduction used different databases and/or different experimental setups, so any direct comparison between the numerical results would be meaningless. If we compare only the suitability of classifiers, we can say that the SVM classifier is the most used classifier for face authentication. This fact corresponds to the results of the experiments mentioned in the introduction and to our results.
Future work will be divided into two parts. The first part will be focused on expanding our speech database (Comtech) with photos of faces. The second part will contain the design of the multimodal biometrics authentication system based on voice authentication and face authentication.