Face Recognition Using Holistic Features and Linear Discriminant Analysis Simplification

This paper proposes an alternative face recognition algorithm based on global/holistic features of the face image and a simplified linear discriminant analysis (LDA). The proposed method overcomes the main problem of the conventional LDA: the large processing time required to retrain when a new class is registered into the training data set. The holistic features of the face image are proposed as a dimensional reduction of the raw face image, while the simplified LDA, a redefinition of the between-class scatter using a constant global mean, is proposed to decrease the time complexity of the retraining process. To assess the performance of the proposed method, several experiments were performed on several challenging face databases: the ORL, YALE, ITS-Lab, INDIA, and FERET databases. Furthermore, we compared the experimental results of the developed algorithm to the best traditional subspace methods, such as DLDA, 2DLDA, (2D)^2 LDA, 2DPCA, and (2D)^2 PCA. The experimental results show that the proposed method solves the retraining problem of the conventional LDA, as indicated by a shorter retraining time and a stable recognition rate.


Introduction
The published methods of face recognition can be categorized into three groups [1]: firstly, holistic matching methods, which use the whole face region as raw input to the recognition system; secondly, feature-based (structural) matching methods, which use local features such as the eyes, nose, and mouth, and local statistics (geometric and/or appearance) as input to the recognition system; and thirdly, hybrid methods, which use both the whole face region and local features.

TELKOMNIKA ISSN: 2087-278X

Subspace methods can work in one dimension (1D-PCA/LDA), which is vector-based analysis, or in two dimensions (2D-PCA/LDA) [6][7][8][9][10][11], which is matrix-based analysis. The 2D-PCA outperforms the 1D-PCA. However, the 2D-PCA requires more coefficients for image representation than the 1D-PCA. The two-directional 2D-PCA [8], which works in both the row and column directions of the image, has been proposed in order to achieve higher and more stable accuracy while requiring fewer coefficients for image representation. However, these methods still have the retraining problem.
Like the 1D-PCA, the 1D-LDA essentially works on vectors for feature clustering. Generally, the 1D-LDA performs better than the PCA, because the LDA discriminates the data using both between-class and within-class scatter information. Ref. [16] shows that the LDA provides higher discrimination power than the PCA, and that the classification information of the PCA is spread over all principal components while the LDA's is concentrated in the top few discriminant vectors. However, the 1D-LDA has several difficulties: computational cost and singularity problems. Due to those problems, the direct LDA has been proposed, as described in Ref. [9]. In order to keep the two-dimensional structure of the face image, the 2D-PCA and 2D-LDA have been developed, with the 2D-LDA giving better performance than the 2D-PCA. In order to get more compact features, the two-directional, two-dimensional PCA ((2D)^2 PCA) [8] and PCALDA ((2D)^2 PCALDA) [11] have been developed. The (2D)^2 PCALDA outperforms the other algorithms; therefore, we compare our proposed methods to the (2D)^2 PCALDA. The strength of the 1D-PCA and 1D-LDA for face recognition with DCT-based holistic features as raw input has been presented in Refs. [13][14]. Moreover, we also developed an alternative 1D-PCA [14] that reduces the global-mean dependence of the covariance matrix and a weighted 1D-LDA [13] that improves class separability. However, the LDA and its variations described above still have the retraining problem, because the between-class scatter depends on the global mean. Ref. [15] proposed a new formulation of the scatter matrices to extend two-class non-parametric discriminant analysis and multi-classifier integration, addressing the Gaussian distribution assumption of LDA-based methods. However, that formulation of the non-parametric between-class scatter matrices requires a larger computational cost than the conventional LDA (by about three times).
To address the large computational cost and the retraining problem, we propose a new formulation of S_b that is based on a constant global mean. The proposed formulation of S_b has the same characteristics as the original one in terms of its symmetry.

Holistic Features Extraction
In order to get a compact global representation of the face image, called the holistic features (HF), we apply frequency and moment analysis as face image pre-processing. From this processing, we keep a small set of dominant frequency contents and invariant moment coefficients as the HF. Frequency analysis (i.e., the DCT), which has good energy compactness, is utilized to obtain the dominant frequency content [3][4][5]. In addition, the DCT decomposes the entire face image without geometrical normalization, bounding boxes, or blocking. From the DCT decomposition output, the dominant frequency content is created in three steps: firstly, convert the DCT coefficients to a vector using row ordering; secondly, sort the vector in descending order using the quicksort algorithm; and finally, truncate the vector to its first m elements (i.e., fewer than 100 elements). These steps are performed on both training and query (probe) face images; in the training process, however, they are performed only once.
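The three steps above can be sketched as follows. This is an illustrative implementation, not the authors' code: the function and parameter names are assumptions, and ties in the descending sort are broken by coefficient magnitude, which the paper does not specify.

```python
import numpy as np
from scipy.fft import dctn

def dominant_frequency_features(img, m=64):
    """Keep the m dominant 2D-DCT coefficients of a face image as a
    holistic frequency feature vector. Illustrative sketch of the
    paper's three-step procedure; names are assumptions."""
    coeffs = dctn(img.astype(float), norm='ortho')  # DCT of the whole image, no blocking
    vec = coeffs.flatten()                          # step 1: row ordering
    order = np.argsort(-np.abs(vec))                # step 2: descending sort (by magnitude)
    return vec[order[:m]]                           # step 3: truncate to the first m elements
```

The same routine would be applied once per training image and once per probe image at query time.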
Consider the dominant frequency content: if it is used to reconstruct the face images, the reconstructed images differ from the originals, but we can still recognize that they are face images, as shown in Figure 1(a). This means the dominant frequency content, which lies in the low-frequency components, is sufficient for face image representation. In other words, if an image is transformed to the frequency domain and the high-frequency components are removed, the reconstructed image loses little significant information. Furthermore, if the difference between the reconstructed image and the original is measured by the root mean square error (RMSE), we get the results shown in Table 1. They show that the DCT-based dominant frequency content retains more of the face image's information than the wavelet (DWT)-based one.
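The RMSE comparison described above can be reproduced with a short sketch: zero out all but the dominant DCT coefficients, invert the transform, and measure the error. Function names are illustrative, not from the paper.

```python
import numpy as np
from scipy.fft import dctn, idctn

def truncated_reconstruction_rmse(img, m=64):
    """RMSE between an image and its reconstruction from only the m
    dominant DCT coefficients (all others zeroed). Illustrative sketch."""
    img = img.astype(float)
    coeffs = dctn(img, norm='ortho')
    flat = coeffs.flatten()
    keep = np.argsort(-np.abs(flat))[:m]        # indices of the dominant coefficients
    masked = np.zeros_like(flat)
    masked[keep] = flat[keep]                   # discard everything else
    recon = idctn(masked.reshape(coeffs.shape), norm='ortho')
    return float(np.sqrt(np.mean((img - recon) ** 2)))
```

Because the orthonormal DCT preserves energy, keeping more dominant coefficients can only lower the RMSE, which matches the trend reported in Table 1.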
In order to get an HF that is robust to face pose variations, moment information, which provides an invariant measure of the face image's shape, is considered. The moment information is obtained using invariant moment analysis, which is derived from central moment analysis [17]. The invariant moment set is invariant to translation, scale change, and rotation; therefore, it can be used to get holistic information under pose variations. The invariant moment analysis is applied only to the intensity component of color images. Finally, from both the selected frequency coefficients (f) and the invariant moment set (g), we construct the HF vector x_i, where i is the i-th class of face image. The dimension of x is m+n, where m is the number of selected frequency coefficients and n is the number of selected invariant moments. The strengths of the proposed HF are that it is compact yet has higher discrimination power than the frequency features alone, as shown in Figure 1(b), and that it yields almost identical features for small pose variations of a face image, as proved in Ref. [12]. This means the HF can overcome the large computational cost of CPCA- and CLDA-based face recognition algorithms and the large pose variability of a single face. In this case, the discrimination power was determined by Eq. (11), and the invariant moments of order higher than 4 are not included, because they make the within-class scatter matrix (S_w) singular. This problem arises because those invariant moment values are close to or even zero.
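The first four invariant moments referred to above can be computed from normalized central moments; the paper excludes higher-order moments because their near-zero values make S_w singular. The sketch below uses the classical Hu formulation as an assumption about the moment set of Ref. [17]; the function name is illustrative.

```python
import numpy as np

def invariant_moments(img, n=4):
    """First four Hu invariant moments of a grayscale image, derived from
    normalized central moments; invariant to translation, scale change,
    and rotation. Illustrative sketch, names are assumptions."""
    img = img.astype(float)
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00   # image centroid

    def eta(p, q):
        # normalized central moment of order (p, q)
        mu = (((x - xc) ** p) * ((y - yc) ** q) * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    phi = [
        eta(2, 0) + eta(0, 2),
        (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2,
        (eta(3, 0) - 3 * eta(1, 2)) ** 2 + (3 * eta(2, 1) - eta(0, 3)) ** 2,
        (eta(3, 0) + eta(1, 2)) ** 2 + (eta(2, 1) + eta(0, 3)) ** 2,
    ]
    return np.array(phi[:n])
```

Concatenating this moment vector g with the m dominant DCT coefficients f gives the (m+n)-dimensional HF vector described in the text.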

Features Clustering

Classical LDA
The aim of LDA is to find a data transformation such that the feature clusters are most separable after the transformation. The LDA determines the optimum projection matrix from both the between-class scatter matrix (S_b) and the within-class scatter matrix (S_w), as explained in Refs. [5,9,13].
Suppose we have image samples from L classes, where X_i^k represents the image matrix of the i-th sample of the k-th class, x_i^k represents the feature vector of the i-th sample of the k-th class, i = 1, ..., N_k, and N_k is the number of training samples of class C_k. Let \mu_k be the mean feature vector of class C_k and \mu_a the mean feature vector of all samples. From the data samples, both the S_b and the S_w can be determined as follows:

S_b = \sum_{k=1}^{L} P(x_k)\,(\mu_k - \mu_a)(\mu_k - \mu_a)^T    (1)

S_w = \sum_{k=1}^{L} P(x_k)\,\frac{1}{N_k}\sum_{i=1}^{N_k} (x_i^k - \mu_k)(x_i^k - \mu_k)^T    (2)

where P(x_k) = N_k/L. From both the S_b and S_w, the optimum projection matrix of the LDA, W, is determined so as to satisfy the following criterion:

W_{opt} = \arg\max_W \frac{|W^T S_b W|}{|W^T S_w W|}    (3)
The W = [w_1, w_2, w_3, ..., w_p] satisfying Eq. (3) can be obtained by solving the eigen-problem of S_w^{-1} S_b and then selecting the p orthonormal eigenvectors corresponding to the largest eigenvalues (p < q, where q is the dimension of the input vector x). Finally, the input vector is projected into the new space using the following equation:

y = W^T x    (4)
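The classical LDA training step can be sketched in a few lines of numpy. This is a minimal illustration of the scatter construction and eigen-problem, not the authors' implementation; a pseudo-inverse is used so the sketch does not crash on a singular S_w.

```python
import numpy as np

def lda_projection(X, labels, p):
    """Optimum LDA projection: the p eigenvectors of S_w^{-1} S_b with the
    largest eigenvalues. X is (samples, features). Illustrative sketch;
    class priors are taken as N_k over the total sample count."""
    mu_a = X.mean(axis=0)                                # global mean
    d = X.shape[1]
    Sb, Sw = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        Pk = len(Xc) / len(X)
        diff = (Xc.mean(axis=0) - mu_a)[:, None]
        Sb += Pk * diff @ diff.T                         # between-class scatter (Eq. 1)
        Sw += Pk * np.cov(Xc, rowvar=False, bias=True)   # within-class scatter (Eq. 2)
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    idx = np.argsort(-evals.real)[:p]
    return evecs[:, idx].real                            # W = [w_1 ... w_p]
```

Projection into the new space is then `y = W.T @ x`, matching Eq. (4).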
The main problem of the LDA is the singularity of the scatter matrix due to high data dimensionality and a small number of training samples, the so-called small sample size (SSS) problem. Furthermore, the LDA requires retraining on all samples to obtain the optimal projection matrix whenever new data is registered into the system; this is because every term in the summation of S_b depends on the global mean, so the summation has to be recalculated from scratch (see Eq. 1). Regarding the SSS problem, several methods have been proposed, such as DLDA, RLDA, and PCA+LDA. However, those methods still require large computational costs and memory space, and they still have the retraining problem.

Simplified LDA
In order to decrease the retraining computational load of the LDA algorithm, we simplify the LDA using a constant global mean assignment for all samples. Suppose we have a data cluster of two classes in three dimensions, normalized to the range [0-1], as shown in Figure 2(a). Using Eq. (1), the S_b of this case can be rewritten as

S_b = \sum_{k=1}^{2} P(x_k)(\mu_k - \mu_a)(\mu_k - \mu_a)^T    (5)

Replacing \mu_a with a constant makes the data clusters equally or more separable than with the original global mean. This condition makes the discrimination power of the projected data equal to or higher than that of the original projection.
If \mu_a is moved to the maximum value of the range (\mu_a = [1 1 1]^T, as shown in Figure 2(c)), the predictive S_b^{c2} has the same computational complexity as Eq. (5) but does not require recalculating the global mean; logically, the same holds for S_b^{c1}. For n-dimensional data, all cases of the approximated S_b can be generalized as the following equation.

S_b \approx \sum_{k=1}^{L} P(x_k)(\mu_k - \mu_p)(\mu_k - \mu_p)^T    (8)

where \mu_p is a constant replacing the global mean \mu_a. When a new class x_new comes into the system, Eq. (8) can be updated as

S_b^{new} = S_b^{old} + P(x_{new})(\mu_{new} - \mu_p)(\mu_{new} - \mu_p)^T    (9)
In order to further decrease the retraining time complexity, the updating of the S_w can also be simplified as

S_w^{new} = S_w^{old} + P(x_{new})\,\frac{1}{N_{new}}\sum_{i=1}^{N_{new}} (x_i^{new} - \mu_{new})(x_i^{new} - \mu_{new})^T    (10)

Comparing Eq. (9) with Eq. (1) and Eq. (10) with Eq. (2), both Eq. (9) and Eq. (10) have much less complexity than the original ones. Recalculating S_b using Eq. (9) requires n^2 multiplication operations for case 1, and n^2 multiplication and n^2 addition operations for case 2, while updating S_w using Eq. (10) also needs n^2 multiplication and n^2 addition operations. By substituting the original S_b and S_w of the classical LDA eigen-analysis with the updated ones (Eq. (9) and Eq. (10)), the optimum W, called the sLDA projection matrix, is obtained. When S_b is updated using the case 1 and case 2 models of the constant global mean, the sLDA is called sLDA_C1 and sLDA_C2, respectively. The optimum projection matrices are constructed by selecting a small number (m) of eigenvectors corresponding to the largest eigenvalues. Using these optimum projection matrices, both the training and query data sets are projected as in the classical LDA, using Eq. (4).
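The incremental updates of Eqs. (9) and (10) can be sketched as a single registration step. This is an illustrative sketch with assumed names (`slda_register_class`, `P_new`); the key point is that only the stored S_b and S_w and the constant mean mu_p are needed, never the recomputed global mean.

```python
import numpy as np

def slda_register_class(Sb, Sw, X_new, mu_p, P_new):
    """Update the scatter matrices when a new class is registered, without
    recomputing the global mean: S_b grows by one rank-one term around the
    constant mean mu_p (zeros for case 1, ones for case 2). Sketch with
    illustrative names; P_new is the new class prior."""
    mu_new = X_new.mean(axis=0)
    diff = (mu_new - mu_p)[:, None]
    Sb_new = Sb + P_new * diff @ diff.T                           # Eq. (9)
    Sw_new = Sw + P_new * np.cov(X_new, rowvar=False, bias=True)  # Eq. (10)
    return Sb_new, Sw_new
```

Between retraining steps, only Sb and Sw need to be stored, which is exactly what the implementation section describes.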

The Strength of sLDA
In order to verify whether the approximated S_b provides the same separability as the original one, an evaluation using the discrimination power (DP) [16], which represents the ability to separate features, is performed using the following equation:

DP = \frac{sep_b}{sep_w}    (11)

where sep means separation, with sep_b derived from S_b and sep_w from S_w. As reported in Ref. [16], the DP parameter has been successfully used to analyze which parts of the face carry large discriminant information.
In this case, the DP is examined for all cases of the approximated S_b using data from the well-known ORL database, following the procedure below. We also compare the DP of the proposed methods with that of the original LDA (CLDA).
(i) Obtain the optimum W of the simplified LDA using the procedure described above.
(ii) Transform x_i^k into the projected data y_i^k using Eq. (4).
(iii) Determine the within- and between-class scatters of the projected data y_i^k, called S_b^Pro and S_w^Pro, using Eq. (1) and Eq. (2), respectively.
(iv) Calculate the DP of the projected data by substituting S_b^Pro and S_w^Pro for S_b and S_w in Eq. (11).
The examination results show that all cases of the approximated S_b have almost the same DP as the original LDA algorithm (see Figure 3(a)). The scatters, which were determined by the trace of their most significant eigenvalues, show that case 2 provides higher values than case 1 and than the LDA, as shown in Figure 3(b). This means that cases 1 and 2 possibly have better class separability than the LDA; consequently, cases 1 and 2 provide a slightly higher recognition rate than the CLDA (see Figure 7). This result is also in line with the discrimination power analysis shown in Figure 3.
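Step (iv) reduces to a single scalar once the projected scatters are available. Since Eq. (11) is not fully legible in the source, the trace ratio below is one common formulation of discrimination power and is used here only as an illustrative assumption.

```python
import numpy as np

def discrimination_power(Sb_pro, Sw_pro):
    """Discrimination power of projected features as a between/within
    separation ratio. The trace-ratio form is an assumption, not a
    verbatim reproduction of the paper's Eq. (11)."""
    return float(np.trace(Sb_pro) / np.trace(Sw_pro))
```

A projection that spreads the class means while keeping each class compact yields a larger DP, which is the property the comparison in Figure 3(a) relies on.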

The Implementation for Face Recognition
The block diagram of the proposed face recognition system is presented in Figure 4 and consists of three main processes: feature extraction, retraining, and recognition. The function of the HF is to extract the most useful information from face images using the algorithm presented in section 3. The retraining is done by the sLDA, as presented in section 4.2. At each retraining step, the matrices S_b and S_w have to be saved for the next retraining.
In the recognition process, each face image is extracted into an HF vector, which is then transformed into the new space, giving the sLDA-projected data. The sLDA-projected data of the training samples is matched against the projected query data using the nearest-neighbor rule: the smallest distance score is taken as the best-matching face.
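The recognition step described above can be sketched directly. This is an illustrative sketch with assumed names; the distance metric (Euclidean) is also an assumption, as the paper only specifies the nearest-neighbor rule.

```python
import numpy as np

def recognize(W, train_feats, train_labels, query_feat):
    """Match a query holistic-feature vector against the training set in
    the sLDA-projected space using the nearest-neighbor rule: the smallest
    distance gives the best-matching face. Names are illustrative."""
    Y_train = train_feats @ W            # project training HF vectors (Eq. 4)
    y_query = query_feat @ W             # project the query HF vector
    dists = np.linalg.norm(Y_train - y_query, axis=1)
    return train_labels[np.argmin(dists)]
```

In practice the projected training set `Y_train` would be computed once and cached, so a query costs only one projection and one distance scan.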

Experiments and Results
The experiments are carried out using several challenging face databases: the ORL database [19], the YALE database [21], the ITS-Lab. Kumamoto University database [12], the INDIA database [20], and the FERET database [18]. Each database has special characteristics, which are described below.
The ORL database was taken at different times, under varying lighting conditions, with different facial expressions (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses). All of the images were taken against a dark homogeneous background. The faces of the subjects are in an upright, frontal position (with tolerance for some side movement). The ORL database is a grayscale face database that consists of 40 people, mainly male, with 400 face images in total. An example of the face pose variations of the ORL database is shown in Figure 5.

All the experiments are performed on those databases with the following assumption: (i) the face image size is 128 × 128 pixels (28 pixels/cm), represented using 24 bits per pixel for color images and 8 bits per pixel for grayscale images. In order to know whether our proposed methods work both with and without the HF (i.e., with no dimensional reduction), we performed experiments on small databases (the ORL and ITS databases) and on a large database (the FERET database). In addition, to handle the out-of-memory problem of the LDA and our proposed methods due to the large dimensionality of the input image (128 × 128 pixels), we resized the original face images to 32 × 32 pixels. This size was chosen because some researchers have used it when directly implementing the LDA and its variations for data clustering without dimensional reduction. The results show that our proposed methods work well (see Figure 7), as indicated by recognition rates of more than 89% without the HF (baseline) and more than 96% with the HF. In addition, the results show that the recognition rates of the proposed methods (sLDA_C1 and sLDA_C2) are almost the same as that of the CLDA both with and without the HF, which means the proposed method has the same feature clustering ability as the CLDA. The HF provides a better recognition rate than the baseline because it extracts the most significant information of the face image and removes noise. The HF also provides higher discrimination power than the baseline, as reported in Wijaya et al. (2008), which means the HF already contains the most useful discriminant information.

In order to show that the proposed method requires less retraining time when new data is inserted, the next experiment was performed on the FERET face database by retraining gradually: first, 208 face classes were trained as initial data, and then 20 new face classes were added to the system at a time until 508 face classes were reached. The experimental results are plotted in Figure 8(a). They show that all variations of the sLDA require less retraining time than the CLDA. Even though the retraining time of the sLDA increases, its average increment slope is less than half that of the CLDA. Moreover, the larger the number of classes, the larger the increment of the CLDA's training time, while the sLDA requires an almost constant retraining time. This result proves that the sLDA requires less retraining time, which means the sLDA solves the retraining problem of the CLDA. This is achieved because the sLDA has simpler computational complexity than the CLDA.

Regarding recognition rate stability, our proposed methods give almost the same stability as the CLDA but are more stable than GSVD-ILDA, one of the recent variations of LDA for solving the retraining problem [20]. These results are in line with the previous experimental results: all of the sLDA variations have the same structure as the CLDA but simpler computational complexity. Therefore, all cases of the sLDA are alternative algorithms for feature clustering on large-sample-size databases that require frequent retraining.

Conclusion and Future Work
All of the experimental results prove our hypotheses, as described below. Firstly, the HF as a dimensional reduction of the raw image has been successfully implemented, with good results when the one-dimensional CLDA and sLDA are used for feature clustering. Secondly, all variations of the sLDA-based face recognition have been proved to require less retraining time. Finally, the proposed method (sLDA) outperforms the recent subspace methods (2DPCA- and 2DLDA-based methods), and the sLDA can be used as an alternative solution to avoid recalculating the S_b and the global mean in the retraining process of the LDA. In addition, the sLDA is an alternative solution for clustering large data sets because it does not depend on the global mean. However, the sLDA only handles retraining updates in which the new data belongs to a new class.
A new strategy using statistical prediction will be considered to avoid this problem. In addition, to get more precise verification results, we will consider more local feature analysis involving the eyes, nose, mouth, and context information of the face image.

Figure 1. (a) The reconstructed DCT-based images with sizes from 16 (top-left) to 256 (bottom-right) elements, and (b) the discrimination power as a function of the number of invariant moments


Figure 2. The illustration of two-class clustering of three-dimensional data

From Figure 2(b), when \mu_a is moved to the origin (\mu_a = [0 0 0]^T), the S_b of this case can be simplified as

S_b^{c1} = \sum_{k=1}^{L} P(x_k)\,\mu_k\mu_k^T    (6)

which has less computational complexity, because it does not require recalculating the global mean, and it has the same characteristics as the original one in terms of separability and matrix symmetry.

From Figure 2(c), when \mu_a is moved to the maximum value of the range, the S_b of this case can be written as

S_b^{c2} = \sum_{k=1}^{L} P(x_k)(\mu_k - \mathbf{1})(\mu_k - \mathbf{1})^T    (7)

Both cases provide equally or more separable scatter than the original one, because the original global mean of the training data lies between 0 and 1, while the constant global mean is placed at the maximum (1) or the minimum (0) of the range. Therefore, the distance from each class mean to the original global mean is probably less than or equal to the distance from each class mean to the constant global mean. In other words, the projected between-class scatter W^T S_b W of the constant-mean cases is equal to or larger than that of the original.


Figure 3. (a) The discrimination power of the proposed methods compared to the established LDA, and (b) the comparison of W^T S_b^{org} W

Figure 4. Block diagram of the sLDA-based face recognition

The ITS-Lab database consists of 48 people, and each person has 10 pose orientations, as shown in Figure 5(b); there are 480 face images in total. The face images were taken with a Konica Minolta VIVID 900 camera under varying lighting conditions. The YALE database is a grayscale database consisting of 15 people, each with 11 different facial expressions, illuminations, and small occlusions (by glasses); therefore, there are 165 face images in the database. One of the subjects is shown in Figure 5(c). The INDIA database consists of 61 people (22 women and 39 men), each with eleven pose orientations, as shown in Figure 5(d): looking front, looking left, looking right, looking up, looking up towards the left, looking up towards the right, and looking down. The INDIA database also includes some emotions: neutral, smile, laughter, and sad/disgust. There are 671 face images in total. From the FERET database, 2032 images of 508 individuals were selected, corresponding to four different sets (fa, fb, ql, qr). Examples of face pose variations from these databases are shown in Figure 5(e).

Figure 5. Example of face pose variations of a single subject

Figure 6. The recognition rates of the CLDA and the proposed methods (sLDA_C1 and sLDA_C2), with and without HF

Figure 7. The comparison of the recognition rates of the proposed and established algorithms

Figure 8. (a) Retraining and querying time, and (b) recognition stability for incremental data