Article for Active Appearance Model Fitting Based on Optimization Methods

The Active Appearance Model (AAM) is one of the most popular techniques for extracting features through precise modelling of human faces under various physical and environmental conditions. In an AAM, fitting the model to the original image is a challenging task. The state of the art offers several optimization methods that are applicable to this fitting problem, which motivates us to review the recent improvements in AAM fitting based on optimization methods. This study provides an up-to-date review of research on AAM optimization; the studies are reviewed and discussed in four sections. The review also explores the limitations of the previous works to facilitate a better understanding of the different kinds of approaches. The aim of this study is to serve as a guide for further studies.


INTRODUCTION
AAM is a computer vision algorithm that matches a statistical model of object shape and appearance to a new image. The Active Appearance Model (AAM) was proposed by Cootes et al. (1998) and Abdulameer et al. (2013); of late, a number of applications employ AAMs for feature extraction, including face modelling for understanding human behaviour and medical imaging projects, such as segmentation of cardiac MRIs or the diaphragm in CT data and registration in functional heart imaging (Bosch et al., 2001; Wang and Chen, 2010).
Basically, an AAM can be developed in three primary phases (Gao et al., 2010; Sethuram et al., 2010):
• Initially, a statistical shape model is built to model the variations in the shape of an object, utilizing a number of annotated training images.
• Next, a texture model is constructed to model the variations of texture, which is represented by pixel intensity.
• Ultimately, an appearance model is constructed by merging the shape and the texture models (Cootes et al., 1998).
In this study, we provide a critical review of active appearance model fitting based on optimization methods and we investigate the limitations of these methods to serve as a guide for further studies.

ACTIVE APPEARANCE MODEL THEORETICAL BACKGROUND
Essentially, an AAM is built in three primary phases (Cootes et al., 1998).

Statistical shape model:
A statistical shape model is constructed from a set of annotated training images. In the 2-D case, a shape is represented by concatenating n point vectors $\{(x_i, y_i)\}$:

$$x = (x_1, y_1, x_2, y_2, \ldots, x_n, y_n)^T$$

The shapes are then normalized by Procrustes analysis (Goodall, 1991) and projected onto the shape subspace created by PCA:

$$x = \bar{x} + P_s b_s$$

where $\bar{x}$ denotes the mean shape, $P_s = \{s_i\}$ is the matrix consisting of a set of orthonormal base vectors $s_i$ that describe the modes of variation extracted from the training set and $b_s$ contains the shape parameters in the shape subspace. Subsequently, based on the corresponding points, the images in the training set are warped to the mean shape to produce shape-free patches.
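The shape-model construction described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the reviewed papers: the function names (`align_similarity`, `procrustes_align`, `build_shape_model`, `project_shape`, `reconstruct_shape`) are hypothetical, the alignment is a simple iterative Procrustes scheme, and the PCA is computed by an SVD of the centred shape vectors.

```python
import numpy as np


def align_similarity(shape, ref):
    """Align a centred (n, 2) shape to a centred reference by rotation and scale."""
    # Orthogonal Procrustes: the optimal orthogonal matrix comes from the SVD of
    # shape^T ref (this simple form does not explicitly exclude reflections).
    u, s, vt = np.linalg.svd(shape.T @ ref)
    rot = u @ vt
    scale = s.sum() / (shape ** 2).sum()
    return scale * shape @ rot


def procrustes_align(shapes, n_iter=5):
    """Iteratively align all shapes (m, n, 2) to their evolving mean shape."""
    centred = shapes - shapes.mean(axis=1, keepdims=True)  # remove translation
    ref = centred[0] / np.linalg.norm(centred[0])
    for _ in range(n_iter):
        aligned = np.stack([align_similarity(s, ref) for s in centred])
        ref = aligned.mean(axis=0)
        ref /= np.linalg.norm(ref)  # keep the reference at unit size
    return aligned


def build_shape_model(shapes, n_modes):
    """Return the mean shape vector and the matrix P_s of the first n_modes modes."""
    aligned = procrustes_align(shapes)
    x = aligned.reshape(len(aligned), -1)  # rows are (x1, y1, ..., xn, yn)
    x_mean = x.mean(axis=0)
    # PCA via SVD: the rows of vt are orthonormal directions of decreasing variance.
    _, _, vt = np.linalg.svd(x - x_mean, full_matrices=False)
    return x_mean, vt[:n_modes].T


def project_shape(x, x_mean, P_s):
    """Shape parameters b_s = P_s^T (x - x_mean)."""
    return P_s.T @ (x - x_mean)


def reconstruct_shape(b_s, x_mean, P_s):
    """Shape vector x = x_mean + P_s b_s."""
    return x_mean + P_s @ b_s
```

Because the columns of `P_s` are orthonormal, projecting a reconstructed shape returns the parameters it was built from, which is a convenient sanity check for such a model.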
Texture model:
The texture model is developed in much the same way as the shape model. Based on the shape-free patch, the texture is raster scanned into a vector $g$, followed by a linear normalization of the texture by the parameters $u = (\alpha, \beta)^T$, so that $g$ is given by:

$$g = \frac{g_{im} - \beta \cdot \mathbf{1}}{\alpha}$$

where $\alpha$ and $\beta$ are, respectively, the scaling and the offset derived from the variance and the mean of the sampled texture $g_{im}$ and $\mathbf{1} = [1, 1, \ldots, 1]^T$ is a vector with the same length as $g_{im}$. Finally, based on PCA, the texture is projected onto the texture subspace:

$$g = \bar{g} + P_g b_g$$

Combined appearance model:
Finally, the coupled relationship between the shape and the texture is analysed by PCA and the appearance subspace is created. In the end, the shape and the texture can be described as follows:

$$x = \bar{x} + Q_s c, \qquad g = \bar{g} + Q_g c$$

where $c$ is a vector of appearance parameters controlling both the shape and the texture and $Q_s$ and $Q_g$ are matrices describing the modes of variation derived from the training set. Thus the final appearance model can be represented as $b = Qc$, where $b$ is the concatenated (suitably weighted) vector of shape and texture parameters and $Q$ is the matrix of eigenvectors obtained from $b$.
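The texture normalization and the combined model can be illustrated as follows. This is a sketch under assumptions that the text above does not fix: the offset is taken as the mean and the scaling as the standard deviation of the sampled texture, and the shape parameters are weighted by a single scalar that equalizes the total shape and texture variance (a common choice in the AAM literature). The function names are hypothetical.

```python
import numpy as np


def normalise_texture(g_im):
    """Linear normalization g = (g_im - beta * 1) / alpha.

    Assumption: beta is the mean and alpha the standard deviation of g_im.
    """
    beta = g_im.mean()
    alpha = g_im.std()
    return (g_im - beta) / alpha, (alpha, beta)


def build_combined_model(B_s, B_g, n_modes):
    """Couple shape and texture parameters with a second PCA.

    B_s: (m, k_s) shape parameters, B_g: (m, k_g) texture parameters,
    one row per training image, both assumed to have zero mean.
    """
    # Assumed scalar weight that brings the shape parameters to the same total
    # variance as the texture parameters before concatenation.
    w = np.sqrt(B_g.var(axis=0).sum() / B_s.var(axis=0).sum())
    b = np.hstack([w * B_s, B_g])  # each row is one concatenated vector b
    # PCA via SVD: the columns of Q are the leading eigenvectors of cov(b).
    _, _, vt = np.linalg.svd(b - b.mean(axis=0), full_matrices=False)
    Q = vt[:n_modes].T
    C = b @ Q  # appearance parameters c for every training image (b ~ Q c)
    return w, Q, C
```

Keeping fewer modes in `Q` than the length of `b` gives a more compact appearance model at the cost of an approximate reconstruction `b ≈ Q c`.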
After the model is created, it must be fitted to new images, that is, the most appropriate model parameters for an object must be identified. This is an unconstrained optimization problem, which is challenging to solve. Generally, it can be addressed by a gradient descent algorithm. Let $p$ denote the AAM's parameter vector, $p^T = (c^T \mid t^T \mid u^T)$, which is the combination of the appearance parameters $c$, the pose parameters $t$ and the texture transformation parameters $u$. Here, $g_s$ is the sampled texture vector of the current image, projected into the texture model frame, and $g_m$ is the texture vector generated by the model. It is assumed that there is a linear relationship between two quantities, the variation of $p$ and the texture difference $r(p) = g_s - g_m$ between the image and the model, which is given by:

$$\delta p = -R \, r(p) \qquad (7)$$

where $\delta p$ is a small variation of $p$ and $R$ is the linear relationship (or gradient matrix) between $\delta p$ and $r$.

Fig. 1: The basic structure of active appearance model
AAM assumes $R$ to be fixed and pre-computes it by multivariate linear regression techniques. Since the relationship $R$ is pre-computed, fitting is carried out as an iterative procedure as follows (Gao et al., 2010) (Fig. 1):
• Sample the texture of an image and project it to the texture model space.
• If $E' < E$, then accept the update; otherwise, try at $k = 0.5$, $0.25$, etc., then go to the first step.
These procedures iterate until there is no more improvement.
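The iterative procedure can be sketched as below: compute the residual $r = g_s - g_m$ and the error $E = |r|^2$, predict $\delta p = -R\,r$, try the update $p + k\,\delta p$ with $k = 1, 0.5, 0.25, \ldots$, and accept the first step that lowers the error. This is a minimal sketch, not a full AAM implementation: `residual` is a hypothetical callable that hides texture sampling and warping, and `estimate_update_matrix` is one assumed way to obtain $R$ by linear regression on known parameter displacements.

```python
import numpy as np


def estimate_update_matrix(residual, p_opt, n_samples=200, scale=0.1, seed=0):
    """Estimate a fixed R by regressing known displacements on their residuals.

    residual(p) returns r = g_s - g_m; p_opt are parameters with (near) zero
    residual. Displacing by dp requires the correction -dp, and since the
    update rule is delta_p = -R r, the regression target is dp = R r.
    """
    rng = np.random.default_rng(seed)
    dP = rng.normal(scale=scale, size=(n_samples, p_opt.size))
    Rs = np.stack([residual(p_opt + dp) for dp in dP])
    # Least-squares solution of Rs @ X = dP; R is the transpose of X.
    X, *_ = np.linalg.lstsq(Rs, dP, rcond=None)
    return X.T


def fit_aam(residual, p, R, max_iter=30, tol=1e-12):
    """Iterate delta_p = -R r with step halving until no further improvement."""
    r = residual(p)
    E = r @ r                                  # E = |r|^2
    for _ in range(max_iter):
        dp = -R @ r                            # predicted parameter variation
        for k in (1.0, 0.5, 0.25, 0.125):
            p_new = p + k * dp
            r_new = residual(p_new)
            E_new = r_new @ r_new
            if E_new < E:                      # accept the first improving step
                break
        else:
            return p, E                        # no step length improved the fit
        gain = E - E_new
        p, r, E = p_new, r_new, E_new
        if gain < tol:                         # improvement has become negligible
            break
    return p, E
```

For a purely linear texture model the predicted update is exact and the loop converges after a single accepted step; in a real AAM the residual depends nonlinearly on $p$, which is why the step-halving and the repeated iterations are needed.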

AAM Fitting based on optimization methods:
According to Peyras et al. (2007), choosing the fitting algorithm is an important concern in AAM. The fitting problem affects the efficiency of face recognition approaches and leads to poor recognition precision (Baek et al., 2009). Of late, several research approaches have been formulated for resolving the fitting problem in AAM. The techniques for enhancing the fitting efficiency of AAM can be categorized into four sections. Figure 2 shows the four categories.

Manifold deformable model:
The first approach uses a manifold deformable model. Christoudias et al. (2004, 2006) have defined the appearance manifold of the human face due to external lighting, where each model has been acquired based on a particular view of the 2D shape of each face in a specific light field. In order to fit the light-field model to the input image, they initially selected the view of the light-field model that is closest to the view of the input image and later employed direct search to match the input image to a point on the light-field manifold. By employing manifold estimation in the consolidated view-based feature spaces, Huang et al. (2010) have proposed a technique for recognizing face features across poses, which is capable of incorporating undetectable views, even when there is a big change in pose. Nonetheless, they evaluated their work on only 25 3D models, which might not be adequate to confirm their work.

Combining the current models:
The second approach combines the current models. Wang et al. (2003) have hybridized Active Shape Models (ASMs) and Active Appearance Models (AAMs) for efficiently interpreting images. In the proposed approach, the ASM local search is employed first; then, in addition to statistical shape constraints similar to those of the ASMs, a global texture constraint based on the subspace reconstruction error is used to assess how well the current model estimate fits the new image. Furthermore, as in the AAM approach, the global texture is employed to estimate and update the parameters of the model. Yan et al. (2002) have proposed a Texture-Constrained Active Shape Model (TC-ASM), which adopts the local appearance model of ASM because of its robustness to varying light conditions. They also use the global texture model of AAM to constrain the shape and to provide an optimization criterion for determining the shape parameters. Making use of the texture-constrained shape enables the search to escape from the local minima of the ASM search, leading to improved fitting results.

Fig. 2: AAM optimization approaches
Fig. 3: The proposed fitting method (Sung et al., 2007)
For addressing the disadvantages of the AAM fitting algorithm, Sung et al. (2007) have hybridized Active Shape Models (ASMs) with AAMs, where the ASM attempts to identify appropriate landmark points using the local profile model. Since the original objective function of the ASM search is not suitable for merging these methods, a gradient-based iterative method has been derived by modifying the objective function of the ASM search. They then proposed a novel fitting approach, which combines the objective functions of both ASM and AAM into a single objective function in a gradient-based optimization framework, as explained in Fig. 3. The AAM utilizes gradient-based optimization approaches for model fitting and consequently, it is very sensitive to the initial model parameters. For addressing this issue, Sung et al. (2008) have combined the active appearance models and the Cylinder Head Models (CHMs), where the global head motion parameters obtained from the CHMs are employed as cues for the AAM parameters for good fitting or re-initialization.

Employ specific AAM:
The third approach employs a specific AAM, which is suitable for various individuals and external variations. Lucey et al. (2006) have employed a person-specific AAM for tracking the object and a generic AAM to determine subject-independent features. As the traditional AAM is always viewpoint-generic, which reduces the fitting precision, Kawarazaki et al. (2011) have proposed a viewpoint-specific AAM for extracting robust facial points under multiple viewpoints.
Berry et al. (2011) have proposed a new image pyramid fitting structure developed specifically for use in an Audio-Visual Speech Recognition (AVSR) system, where the area described by the AAM is reduced over the course of the iterations. This enables the fitting technique to become more precise as the fitting advances. The new fitting structure is applied with the Fixed Jacobian algorithm and is then compared to a conventional strategy, where the shape of the mouth is extracted from a full-face AAM. In this technique, the appearance model is able to represent a shape to within 4 pixels. Nevertheless, for specific individuals, the model is not capable of representing the shape within this limit.
Generally, the traditional AAM often fails when the input image has variations in pose, appearance and lighting that were not included in the training set. Lee and Kim (2009) have proposed a tensor-based AAM for addressing this problem, which applies a multi-linear algorithm to the shape and appearance models of the traditional AAM to enhance the fitting efficiency. The tensor-based AAM comprises an image tensor and a model tensor. The image tensor determines the pose, appearance and illumination of the input image; the model tensor, on the other hand, creates variation-specific AAM basis vectors on line, using a single trained model tensor and the estimated variations. To estimate the variations from the image tensor, they have proposed two distinct techniques:
• Discrete variation estimation
• Continuous variation estimation
The former estimates the pose, expression and lighting of the input image as one of the poses, expressions and lighting conditions of the training images. The continuous variation approach, on the other hand, approximates the pose, expression and lighting of the input image by combining the corresponding basis vectors. Nevertheless, the tensor-based AAM with continuous variation estimation is more time-consuming, since it demands more pre-processing time for determining the image variations.

Introducing new fitting algorithm:
The fourth technique modifies the AAM fitting algorithm itself, by introducing a novel fitting algorithm or enhancing the current one. Donner et al. (2006) have proposed a fast AAM utilizing Canonical Correlation Analysis (CCA), which models the relation between the image differences and the parameter differences, for enhancing the convergence speed of the fitting algorithm. Moreover, Matthews and Baker (2004) have proposed a fast AAM fitting algorithm based on the inverse compositional image alignment algorithm, which does not need a linear relationship between the image differences and the model parameter differences. They achieved better fitting precision than the conventional AAM; furthermore, the model also converges faster. Andreopoulos and Tsotsos (2005) have proposed an extended 2-D AAM fitting approach based on the inverse compositional image alignment algorithm. The extension of the algorithm involves the fitting of 3-D AAMs on short-axis cardiac MRI. Moreover, Xiao et al. (2004) have proposed a 2D+3D AAM, which has an additional 3D shape model. They have also presented an efficient fitting algorithm for the 2D+3D AAM, which adds 3D constraints to the cost function of the 2D AAM. Furthermore, Hu et al. (2004) have proposed an additional extension of the 2D+3D AAM fitting algorithm, called the multi-view AAM (MVAAM) fitting algorithm, which fits a single 2D+3D AAM to images of different views acquired simultaneously from multiple cameras. Nevertheless, the assessment process was not adequate to demonstrate the efficiency of the proposed technique.
Moreover, Sung and Kim (2006) have proposed a new fitting algorithm of 2D+3D AAMs for a multifocal camera system, referred to as stereo AAM (STAAM), for improving the stability of the fitting of 2D+3D AAMs. Furthermore, STAAM reduces the number of model parameters and demonstrates superior fitting stability compared with the existing multi-view AAM. Finally, the AAM fitting problem has been addressed by introducing a new adaptive Artificial Bee Colony (ABC) algorithm (Abdulameer et al., 2014). The introduction of the adaptive ABC has accelerated the AAM fitting and hence the effectiveness of recognition has been enhanced without compromising the recognition performance. Table 1 summarizes the relevant studies with regard to the four sections of fitting optimization methods, together with their limitations.

Table 1: Relevant studies on the four sections of fitting optimization methods and their limitations
Author | Category | Method | Limitation
… | … | … | … 25 3D models, and this is not adequate to confirm their work
Wang et al. (2003) | Combining the current models | ASM+AAM | Lacks the ability to entirely exploit the detailed information present in the given image and is sensitive to initial model parameters
Yan et al. (2002) | Combining the current models | TC-ASM | The dataset was not enough to assess the generalization of the shape prediction from the texture
Sung et al. (2007) | Combining the current models | ASM-AAMs | Model fitting utilizes gradient-based optimization approaches and consequently is very sensitive to initial model parameters
Lucey et al. (2006) | Specific AAM | Person-specific AAM | The traditional AAM is always viewpoint-generic, which minimizes the fitting precision
Kawarazaki et al. (2011) | … | … | …

CONCLUSION
This study has presented a critical review of fitting optimization in the active appearance model technique, which is widely used in face recognition applications. The review covered several research approaches that have been formulated for resolving the fitting problem in AAM. The techniques for enhancing the fitting efficiency of AAM have been categorized into four sections: manifold deformable models, combining the current models, using a specific AAM and introducing a new fitting algorithm. In addition, this review has explored the limitations and drawbacks of the previous studies. From what has been reviewed so far, we can conclude that introducing a new fitting algorithm for AAM could have a strong influence on the fitting process.
In the texture subspace projection $g = \bar{g} + P_g b_g$, $\bar{g}$ is the mean texture, $P_g = \{g_i\}$ is the matrix consisting of a set of orthonormal base vectors $g_i$ describing the modes of variation derived from the training set and $b_g$ includes the texture parameters in the texture subspace.

• Calculate the residual texture vector, $r = g_s - g_m$, and evaluate the fitting accuracy using $E = |r|^2$, where $|\cdot|$ denotes the norm (generally the 2-norm).
• Predict the variation of the model parameters by $\delta p = -R \, r(p)$.
• Update the model parameters $p \rightarrow p + k\delta p$, where $k = 1$ initially.
• With the new model parameters, calculate the new model texture $g_m'$ and resample the texture, $g_s'$.
• Calculate the new residual vector, $r' = g_s' - g_m'$, and $E' = |r'|^2$.

The viewpoint-specific AAM of Kawarazaki et al. (2011), which extracts robust facial points under multiple viewpoints, performs the following functions:
• Gathers the training samples and separates them into various groups based on the viewpoint
• Builds the AAMs for each training group
• If a new test image is given, fits it based on the trained AAMs
• Compares the fitting errors and chooses the minimum one as the final fitting result.
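The group, train, fit and select-minimum-error scheme can be illustrated with a small sketch. It is not the method of the cited paper: a plain linear PCA model stands in for each viewpoint-specific AAM, the fitting error is the squared reconstruction error of that stand-in, and all function names and viewpoint labels are hypothetical.

```python
import numpy as np


def train_viewpoint_models(samples, viewpoints, n_modes):
    """Build one linear (PCA) model per viewpoint group.

    samples: sequence of 1-D feature vectors; viewpoints: one label per sample.
    """
    models = {}
    for v in sorted(set(viewpoints)):
        group = np.stack([s for s, vp in zip(samples, viewpoints) if vp == v])
        mean = group.mean(axis=0)
        _, _, vt = np.linalg.svd(group - mean, full_matrices=False)
        models[v] = (mean, vt[:n_modes].T)  # mean and orthonormal basis
    return models


def fit_linear_model(mean, basis, x):
    """Stand-in for an AAM fit: project x onto the model, return params and error."""
    b = basis.T @ (x - mean)
    error = float(np.sum((x - (mean + basis @ b)) ** 2))
    return b, error


def fit_best_viewpoint(models, x):
    """Fit x with every viewpoint model and keep the one with the smallest error."""
    results = {v: fit_linear_model(mean, basis, x) for v, (mean, basis) in models.items()}
    best = min(results, key=lambda v: results[v][1])
    params, error = results[best]
    return best, params, error
```

Fitting every model and comparing the errors costs one fit per viewpoint group, so the run time grows linearly with the number of viewpoints.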