Enhanced Asymmetric Bilinear Model for Face Recognition

Bilinear models have been successfully applied to separate two factors, for example, pose variation and identity in face recognition problems. The asymmetric model is a type of bilinear model that describes a system in the most concise way, but few works have explored the application of asymmetric bilinear models to face recognition under illumination changes. In this work, we propose an enhanced asymmetric model for illumination-robust face recognition. Instead of initializing the factor probabilities randomly, we initialize them with a nearest neighbor method and optimize them on the test data. In addition, we update the model of the factor to be identified. We validate the proposed method on a designed data sample and on the extended Yale B dataset. The experimental results show that the enhanced asymmetric model gives promising results and good recognition accuracies.


Introduction
Face recognition has been an active research area due to its applications in security entry systems and automatic law enforcement. Pose variation, illumination changes, and observation noise make it hard to define a model with promising recognition accuracy. Previous work [1][2][3][4][5] has shown that image variations of faces under varying illumination can be modeled by linear spaces, while the view spaces resulting from varying head poses are considered nonlinear [6]. Manifold learning [6,7], bilinear models [8][9][10], linear dimensionality reduction [11,12], and tensor analysis [13][14][15] have all been applied to face recognition. Recently, deep learning techniques have also been employed and have achieved good performance [16][17][18].
Usually, faces need to be aligned before recognition, and much work has been done on face alignment [19][20][21][22][23][24][25][26][27][28][29][30]. It is well known that misalignment causes a drop in recognition accuracy [20]. In our work, we explore the possibility of face recognition without alignment. The proposed algorithm concentrates on separating factors that might affect recognition accuracy; specifically, we use a bilinear model to separate a varying factor from the recognition target, and our experiments show that this separation is both effective and necessary.
Bilinear models [8] are usually used to model systems with two factors and each factor itself is linear given the other factor fixed. Bilinear models for 2D image data have been widely used to solve face recognition problems due to their simplicity in formulation. In this category of solutions, factors introduced in a system are modeled as a symmetric bilinear model or asymmetric bilinear model. In this work, we also introduce asymmetric bilinear model due to its simplicity in representation.
Among all factors of variation in face recognition, illumination change is one of the most influential and intensively studied. Example works on illumination variation in face recognition include [9,10], of which [10] is most closely related to ours. The authors of [10] combine ridge regression with a symmetric bilinear model to separate the illumination and identity factors. A symmetric bilinear model decomposes the input data into two factor matrices, representing the two factors, and an interaction matrix denoting the interactions between them. Instead of the symmetric model, we utilize the asymmetric model, which decomposes the input data into one factor matrix and a second matrix that absorbs both the other factor and the interaction. With fewer parameters to optimize, the asymmetric model does not suffer from the convergence problems of the symmetric model. However, due to noise captured in the image features, the original asymmetric model cannot cope with illumination changes in real-world face recognition problems.
In this work, we propose a modified asymmetric model for accurate face recognition. As in the original asymmetric bilinear model, we calculate probability predictions with Bayesian rules and the expectation-maximization (EM) method. But instead of initializing the factor probabilities randomly, we initialize them with a nearest neighbor method and optimize them on the test data, and we update the model of the factor to be identified. The enhanced asymmetric model is composed of three modules: calculating the factor matrices from the training data, an initialization step for the test data, and an optimization and classification step for the test data. In the first step, we use singular value decomposition on the training data to separate the factor to be identified from the other factors. In the initialization step, a nearest neighbor method initializes the joint probabilities of all factors, which are further updated in the optimization step. In the third step, we update the model of the factor to be identified. We validate the proposed method on manually created data and on cropped images from the extended Yale B dataset. The experimental results show that the enhanced asymmetric model gives promising results and good recognition accuracies.
The rest of this paper is organized as follows: in Section 2, we explain the proposed method in detail, including how Bayesian rules and the EM method optimize the test factors and how the proposed method achieves illumination-robust face recognition and identity-robust illumination recognition; in Section 3, we validate the proposed method on two types of data: designed data and the publicly available extended Yale B dataset; in Section 4, we conclude our work and discuss possible future directions.

Enhanced Asymmetric Bilinear Model
Bilinear models [8] separate the input data into two factors, described as style and content. Content is the factor to be identified (for example, identity), while style is the factor that varies and needs to be separated out. A bilinear model reduces to a linear model when one factor is fixed, and it comes in symmetric and asymmetric forms. The authors of [10] use the symmetric model, taking identity as content and illumination as style. In a symmetric bilinear model, style and content interact through an interaction matrix. In a single-task classification problem, however, we are not interested in both the style and the content matrices. For simplicity, we choose the asymmetric model and modify it to recognize factors accurately.
Suppose we represent the content with a J-dimensional parameter vector b^c and use y^{sc} to denote the K-dimensional input feature vector of style s and content c. An asymmetric bilinear model represents the observation vector as

y_k^{sc} = Σ_j a^s_{kj} b^c_j.

This formulation can also be written as

y^{sc} = A^s b^c,

where A^s denotes the K × J matrix with entries {a^s_{kj}}, absorbing both the style and the style-content interaction. In a symmetric model, the style factor and the interaction terms in this product are represented separately, thus resulting in one more variable.
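As a minimal sketch of this formulation (the dimensions are illustrative, not from the paper), evaluating the asymmetric model for one style and content is a single matrix-vector product:

```python
import numpy as np

# Asymmetric bilinear model: an observation of style s and content c is
# y = A_s @ b_c, where the K x J matrix A_s absorbs the style and the
# style-content interaction, and b_c is the J-dimensional content vector.
K, J = 6, 3                              # feature dim, model dim (illustrative)
rng = np.random.default_rng(0)
A_s = rng.standard_normal((K, J))        # style-specific matrix
b_c = rng.standard_normal(J)             # content parameter vector
y = A_s @ b_c                            # model's observation for (s, c)
```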
As in [8], we use singular value decomposition to fit the asymmetric model to the training data. First we stack the K-dimensional input features of all S styles and C contents into an (SK) × C observation matrix

Y = [y^{11} ⋯ y^{1C}; ⋮ ; y^{S1} ⋯ y^{SC}],

then we apply singular value decomposition to the observation, Y = USV^T. The style matrix A is initialized as the first J columns of US and the content matrix B as the first J rows of V^T.
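The SVD fit above can be sketched in a few lines of NumPy (dimensions are illustrative; the data here is random, only the shapes matter):

```python
import numpy as np

# Fit the asymmetric model by SVD: Y stacks the K-dim feature vectors of
# S styles vertically, one column per content, giving an (S*K) x C matrix.
S, K, C, J = 4, 6, 5, 3                  # styles, feature dim, contents, model dim
rng = np.random.default_rng(1)
Y = rng.standard_normal((S * K, C))      # stand-in for the stacked training data

U, sv, Vt = np.linalg.svd(Y, full_matrices=False)
A = U[:, :J] * sv[:J]                    # style matrices, stacked: first J cols of U*S
B = Vt[:J, :]                            # content vectors: first J rows of V^T

A_styles = A.reshape(S, K, J)            # A_styles[s] is the K x J matrix for style s
approx = A @ B                           # best rank-J approximation of Y
```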

Optimizing Factor Matrix for Test Data.
Given test data Ỹ of an unseen style s̃, the goal is to find the matrix à containing the style factor and the content factors b̃ that approximate the test data as precisely as possible, that is, to minimize the approximation error ||Ỹ − Ãb̃||. The original bilinear model optimizes under several assumptions: the content matrix of the test data equals the content matrix of the training data, and the probability of a test observation given a specific style and content needs no special initialization. Due to noise captured in the input features, the problem in reality is not precisely linear. We therefore add an initialization step and update the content matrix during optimization. According to our experimental results, updating the content matrix instead of keeping it fixed gives more accurate recognition results.
This probability is introduced as a weight in the optimization step. Instead of random initialization, we set the initial p(ỹ | s̃, c̃) to 1 if ỹ has a nearest training neighbor y with style s̃ and content c̃, and to 0 otherwise.

Figure 1: The split between training and test data in the illumination-robust identity recognition application.

The Algorithm.
To minimize the error between the test features and their approximation Ãb̃, we use an EM procedure and add an extra update of the matrix containing the content, which proves to give more accurate recognition results.
(1) Initialize the matrix containing the style factor, Ã, as equal to that computed from the training data, and initialize p(ỹ | s̃, c̃) as 1 if ỹ has a nearest training neighbor y with style s̃ and content c̃, and 0 otherwise.
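The loop can be sketched as follows. This is our simplified reading, not the paper's exact formulation: the variable names and the Gaussian responsibility model are our assumptions. It shows the two modifications named above: nearest-neighbor initialization of the probabilities, and the extra update of the content matrix B.

```python
import numpy as np

# Sketch of the enhanced EM loop (our simplifying assumptions throughout):
# responsibilities are initialized by nearest neighbour rather than at
# random, and the content matrix B is also updated, not kept fixed.
def enhanced_em(Y_test, Y_train, labels, A, B, n_iter=5, sigma=1.0):
    """Y_test: (K, T); Y_train: (K, N); labels: (N, 2) giving the
    (style, content) indices of each training column;
    A: (S, K, J) style matrices; B: (J, C) content vectors."""
    C = B.shape[1]
    T = Y_test.shape[1]

    # Nearest-neighbour initialization of the content responsibilities.
    d = ((Y_test[:, None, :] - Y_train[:, :, None]) ** 2).sum(axis=0)  # (N, T)
    w = np.zeros((C, T))
    for t, n in enumerate(d.argmin(axis=0)):
        w[labels[n, 1], t] = 1.0

    A_new = A.mean(axis=0)                    # init the unseen-style matrix
    for _ in range(n_iter):
        # M-step: refit the unseen-style matrix by least squares ...
        Bw = B @ w                            # (J, T) expected content vectors
        A_new = Y_test @ Bw.T @ np.linalg.pinv(Bw @ Bw.T)
        # ... and (the enhancement) also update the content matrix.
        Yw = Y_test @ w.T / (w.sum(axis=1) + 1e-12)   # weighted class means
        B = np.linalg.pinv(A_new) @ Yw
        # E-step: responsibilities from the reconstruction error.
        err = ((Y_test[:, None, :] - (A_new @ B)[:, :, None]) ** 2).sum(axis=0)
        w = np.exp(-(err - err.min(axis=0)) / (2 * sigma ** 2))
        w /= w.sum(axis=0, keepdims=True)
    return w.argmax(axis=0)                   # predicted content per test column

# Tiny smoke run on random data (shapes only, not a real experiment).
rng = np.random.default_rng(0)
S, K, J, C = 3, 4, 2, 3
A = rng.standard_normal((S, K, J))
B = rng.standard_normal((J, C))
labels = np.array([(s, c) for s in range(S) for c in range(C)])
Y_train = rng.standard_normal((K, S * C))
Y_test = rng.standard_normal((K, 2))
pred = enhanced_em(Y_test, Y_train, labels, A, B)
```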

Illumination-Robust Face Recognition and Identity-Robust Illumination Recognition.
In this section, we explain possible applications of the proposed method to face recognition problems. First, we define identity as content and illumination as style to show the application of the bilinear model to the illumination-robust face recognition problem; see Figure 1. Then we switch content and style, setting illumination as content and identity as style. It is widely acknowledged that illumination changes can be modeled with a linear space, but it is not clear whether the same holds for the identity space. We design the second experiment to show the application of the proposed method to identity-robust illumination recognition and to test the extensibility of the system. In the following section, we show experimental results for these two settings.

Experimental Results
We validate the proposed method with two categories of data: one is manually designed sample data and the other is a publicly available face recognition dataset. The designed sample data serves to verify the soundness of the proposed method; the second category of experiments provides standard results.

Experiment on Designed Data. Suppose we have a tensor Y
in which the content denotes the total number of ones in a feature vector and the style denotes which positions are set to one. For example, in the first style, ones occur in the first, third, fifth, and seventh columns. If we use all contents and styles for training and select one style to test the method, the recognition accuracy should reach 100 percent. So we choose the whole tensor as training data and pick Ỹ as Y(2, :, :).
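One possible construction of such a tensor is sketched below. Only style 1 is pinned down by the text (ones in columns 1, 3, 5, 7); the other styles, the feature dimension, and the number of contents are our illustrative guesses.

```python
import numpy as np

# Designed data: content c puts c ones into the first c slots allowed
# by the style. Styles 2 and 3 below are illustrative assumptions.
D = 8                                    # feature dimension (assumed)
slots = [np.array([0, 2, 4, 6]),         # style 1: columns 1, 3, 5, 7 (1-based)
         np.array([1, 3, 5, 7]),         # style 2: columns 2, 4, 6, 8 (assumed)
         np.array([0, 1, 2, 3])]         # style 3: columns 1-4 (assumed)
C = 4                                    # contents: 1 to 4 ones
Y = np.zeros((len(slots), C, D))
for s, pos in enumerate(slots):
    for c in range(C):
        Y[s, c, pos[: c + 1]] = 1.0

Y_test = Y[1]                            # hold out the second style, Y(2, :, :)
```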
With the proposed method, we get the experimental results in Table 1. The table shows the recognition accuracy and approximation error in each iteration. From the table, we can see that the proposed method reaches 100 percent recognition accuracy on the designed dataset.

Experiment on Extended Yale B Dataset. The extended
Yale Face Database B contains 16128 images of 28 human subjects under 9 poses and 64 illumination conditions, and a subset of cropped images is provided; our experiments use the cropped images. With this dataset, we design two sets of experiments: illumination-robust face recognition and identity-robust illumination recognition.
In illumination-robust face recognition, we pick the illuminations with ids ranging from 1 to 39 as training data and the illumination with id 50 as test data. The experimental results for this setting are shown in Table 2. From the table, we can see that after 3 iterations the recognition accuracy reaches its maximum of 0.92 and then starts decreasing dramatically.
In identity-robust illumination recognition, we pick the persons with ids ranging from 1 to 38 as training data and the person with id 39 as test data. The experimental results for this setting are shown in Table 3. Although the amount of training data is similar, illumination recognition gives better results than identity recognition. This might be because 38 identities are sufficient training data for the identity space, or because illumination is an easier target. From Tables 2 and 3, we can say that the enhanced asymmetric model gives high recognition accuracies: accuracy increases quickly as the approximation error decreases, and within a few steps the method reaches its optimum. We further show a set of experiments exploring the effect of the number of training samples. In the illumination-robust identity recognition application, we use the same test data as in the previous experiment but train on the illuminations of identities 1 to 5 in experiment 1 (Exp1), identities 1 to 9 in experiment 2 (Exp2), and identities 1 to 13 in experiment 3 (Exp3). Experimental results are shown in Figure 2.
From the figure, we can see that, in terms of error, the difference in the number of training samples has an obvious effect in the first and second iterations but becomes subtle in the following iterations. In fact, with more training samples the optimization tends to be less stable. The best accuracy, 92.31%, occurs in all experiment settings: experiments 1 and 2 reach the maximum in the third iteration, while experiment 3 reaches it in iteration 6.
The proposed algorithm is further compared with a nearest neighbor method to show the importance of separating style and content. The nearest neighbor method for the identity recognition problem works as follows: first, varying numbers of training identities are selected (i.e., 5, 9, and 13); then each test sample is compared with all styles (illumination variations in this case) of all contents (face identities in this case); finally, the test sample receives the label of the training sample at minimum distance. The recognition accuracies of the nearest neighbor method are shown in Table 4. From the table, we can see that the raw nearest neighbor method performs very badly on this identity recognition problem, and its accuracy drops as the number of training samples increases. Note that each training sample has a different style from the other training samples in the dataset, so adding training data of more styles, without separating them, confuses the method and makes the recognition accuracy worse.
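The baseline described above can be sketched as follows (our implementation; function and variable names are ours):

```python
import numpy as np

# Raw nearest-neighbour baseline: each test vector simply takes the
# identity label of its closest training vector, with no attempt to
# separate style (illumination) from content (identity).
def nn_identity(Y_test, Y_train, identity_labels):
    """Y_test: (K, T); Y_train: (K, N); identity_labels: (N,)."""
    d = ((Y_test[:, None, :] - Y_train[:, :, None]) ** 2).sum(axis=0)  # (N, T)
    return identity_labels[d.argmin(axis=0)]

# Example: two 1-D training points with labels 0 and 1.
labels = np.array([0, 1])
Y_train = np.array([[0.0, 10.0]])
pred = nn_identity(np.array([[1.0, 9.0]]), Y_train, labels)
```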

Conclusions and Future Works
In this paper, we present an enhanced asymmetric bilinear model and apply it to the illumination-robust face recognition and identity-robust illumination recognition problems. By initializing the content and style probabilities with a nearest neighbor method and adding an update of the content matrix, we achieve good recognition accuracies. In the future, it would be interesting to extend the proposed method to models with more than two factors, and to explore applications of asymmetric models beyond face recognition.