A survey on computer vision techniques for detecting facial features towards the early diagnosis of mild cognitive impairment in the elderly

In the UK, more and more people are suffering from various kinds of cognitive impairment. Its early detection and diagnosis can be of great importance. However, it is challenging to detect cognitive impairment in the early stage with high accuracy and low costs. Some currently popular methods include cognitive tests and neuroimaging techniques, which have their own drawbacks. Studies have shown that, whilst viewing videos, people with cognitive impairment exhibit abnormal corrugator activity compared to those without cognitive impairment. The aim of this paper is to explore promising computer vision and pattern analysis techniques for detecting cognitive impairment through facial expression analysis. This paper presents a survey of computer vision techniques to detect facial features for early diagnosis of cognitive impairment. Additionally, this paper reviews and compares the advantages and disadvantages of such techniques. Automatic facial expression analysis has the potential to be used for cognitive impairment detection in the elderly. In the case of detecting cognitive impairment through facial expression analysis, it may be better to use a local method of facial components alignment, and employ static approaches in facial feature extraction and facial feature classification.


Introduction
There are a growing number of elderly people in the UK, with a significant number of people over 65 years old experiencing dementia. Dementia is a progressive cognitive impairment which may cause impairment in many cognitive domains such as memory and language ability. A large amount of money is spent on healthcare for dementia patients every year. In addition, the number of people who have dementia is still increasing and it is estimated that there will be about 80 million dementia patients worldwide by 2040 (Ferri et al., 2005).
Mild cognitive impairment (MCI) is an intermediate stage between the expected cognitive decline of normal aging and more serious decline from causes such as dementia. For most patients with MCI, the normal activities of their daily life are not greatly affected (Winblad et al., 2004; Wu & Ho, 2009). Diagnosing MCI is similar to diagnosing mild dementia, with similar symptom profiles (Petersen, 2004). Currently, there are many techniques available to detect cognitive impairment, including MCI and more severe cases of cognitive impairment resulting from dementia. It is critical to detect the symptoms of cognitive impairment at an early stage, monitor the progress of the disease, and provide collective care for elderly people with cognitive impairment, especially for those living in a low-income community. There has been significant progress in the field of detecting and diagnosing cognitive impairment, principally through widely employed cognitive tests and neuroimaging techniques. However, such techniques have their own strengths and weaknesses. Using cognitive tests is one of the most popular methods for detecting cognitive impairment. However, some attributes like age, education and personality will influence the test results and so need to be considered carefully when using these test results to detect cognitive impairment (Petersen et al., 2001). In addition, for face-to-face cognitive tests, professional neurophysiologists are needed to carry out cognitive tests for patients (Wild, Howieson, Webbe, Seelye, & Kaye, 2008).
CONTACT Erfu Yang erfu.yang@strath.ac.uk
Increasingly, diagnostic neuroimaging techniques employed by clinicians are widely used in clinics. There are two kinds of neuroimaging techniques for the early detection and diagnosis of cognitive impairment: magnetic resonance imaging (MRI) and metabolic positron emission tomography (FDG-PET) (Mosconi et al., 2007). However, their major weakness is the costs involved in the examination in the screening stage. The current methods for diagnosing cognitive impairment are still unsatisfactory (Mukadam, Cooper, Kherani, & Livingston, 2015). In particular, new techniques are required to meet the need of early detection of cognitive impairment with high accuracy and low costs.
Furthermore, it has been observed that the emotions of people with cognitive impairment exhibit detectable differences compared to individuals without such impairment (Chen et al., 2017; Behavioural Neurology, n.d.). Emotions are recognized either from facial expressions or from self-reported emotional experience. For instance, Kuan et al. compared the self-reported emotional experience of cognitively impaired people and healthy people after watching film clips (Chen et al., 2017). Each clip had a target emotion; for example, disgust was the target emotion after watching a disgusting film. The participants may experience both target and non-target emotions after watching film clips. Kuan et al. mainly researched the non-target emotions of the participants after watching film clips. They found that the patients with frontotemporal dementia showed more positive and negative non-target emotions, whereas the patients with Alzheimer's disease showed more positive non-target emotions compared with other participants (Chen et al., 2017). Related work was conducted by Julie et al., who invited 20 cognitively impaired people and 20 healthy people to watch videos and compared their reactions when watching these videos. They found that the cognitively impaired participants demonstrated difficulty in facial muscle control and amplification of expressed emotion (Henry, Rendell, Scicluna, Jackson, & Phillips, 2009). Similarly, Marcia et al. found that cognitively impaired people showed more negative emotion while watching negative images, which was attributed to reduced control of negative feelings (Behavioural Neurology, n.d.). In addition, the research from Edmarie et al. found that patients with Alzheimer's disease could experience prolonged states of emotion that outlasted their memory of the events which caused the emotion (Guzmán-Vélez, Feinstein, & Tranel, 2014).
Moreover, some researchers used surface facial electromyography to compare the emotional responses of cognitively impaired participants and healthy control participants after watching emotional video clips. Keith et al., for example, studied the abnormality of facial muscle activity in patients with Alzheimer's Disease (AD) (Burton & Kaszniak, 2006), reporting abnormal corrugator activity in individuals with cognitive impairment compared to a control group, while watching images or videos. Additionally, patients with cognitive impairment had difficulties in controlling their facial muscles and expressing their feelings. In related research, Fiona et al. investigated psychophysiological responses (surface facial electromyography and skin conductance level) from cognitively impaired participants and healthy participants after watching emotional video clips (Kumfor, Hazelton, Rushby, Hodges, & Piguet, 2019). They found that 25 behavioural-variant frontotemporal dementia patients showed an overall dampening of responses while semantic dementia patients showed incongruous emotions.
Automatic facial expression analysis has the potential to be used for cognitive impairment detection in the elderly. However, there are no developed systems for the detection of cognitive impairment through facial expression analysis. This paper presents a survey of the use of computer vision techniques to detect facial features for the early diagnosis of cognitive impairment. The aim is to find promising computer vision techniques for detecting cognitive impairment by analysing facial expressions.
Facial expressions play an important role in communication and in expressing feelings and emotions (Palermo & Rhodes, 2007). Currently, computer vision is a hot topic and many researchers are doing relevant research such as object detection, noise reduction and image classification using deep learning (Gao et al., 2015; Gao, Xue, Sun, Wang, & Zhang, 2016). Additionally, this field has produced significant developments in automatic facial expression analysis using computer vision techniques. Automatic facial expression recognition typically involves three stages: face detection and facial components alignment, facial feature extraction and facial feature classification. Face detection and facial components alignment is often the first step in many approaches (Lee, Kim, Kim, & Whangbo, 2015). In this step, the location of the face and key facial components are determined. The second step is facial feature extraction. Here, some predesignated facial features are extracted using various approaches. Facial feature extraction can involve static approaches or dynamic approaches, which will be discussed in more detail in Section 3. The final step is typically facial feature classification, where images and sequences of facial expressions are classified into some predesigned emotion group or specific facial muscle action unit group using the extracted facial features. Facial feature classification can again be divided into static and dynamic approaches. The major steps in facial expression recognition are illustrated in Figure 1.
This paper is organized as follows. Section 1 provides a general overview and sets out the outline for the literature review. The background on dementia, MCI and research challenges are introduced in Section 2, whilst Section 3 summarizes techniques to recognize facial expressions. Section 4 provides a discussion of the surveyed work, where advantages, drawbacks and research directions are summarized. Finally, Section 5 provides a short summary of the paper.

Dementia and mild cognitive impairment
Dementia is a progressive cognitive disorder, which typically features cognitive impairments such as memory and language impairment. Dementia involves specific neuropathological changes, including extracellular and intraneuronal parenchymal lesions. As detection of these neuropathological changes cannot be carried out when the patients are alive (Dubois et al., 2010), dementia detection often involves a probabilistic diagnosis. Dementia is caused by damage to brain cells, which leads to abnormal cellular operation and communication. Furthermore, damage to different parts of the brain is found to typically relate to different types of dementia ('Dementia - Signs, Symptoms, Causes, Tests, Treatment, Care alz.org,' n.d.). The common types of dementia are Alzheimer's Disease, Dementia with Lewy Bodies, Vascular Dementia, and Fronto-Temporal Dementia (Gaugler, James, Johnson, Scholz, & Weuve, 2016). Among these types, the most common is Alzheimer's disease, with 60% to 80% of dementia patients suffering from this type of dementia (Blennow, de Leon, & Zetterberg, 2006).
In addition, different types of dementia may have different symptoms, and symptoms may also vary across individuals, even those with similar diagnoses. For most patients, the initial symptom relates to difficulty in recalling new information. Later, other symptoms may appear. There are some common symptoms in most dementia patients, such as problems with language, movement, recognition, reasoning & judgment, and changes in personality (Gaugler et al., 2016). As a result, patients' emotional expression may be affected by these symptoms. For instance, poor memory, reasoning and judgment abilities may result in abnormal facial expressions while viewing visual material.
MCI is an intermediate stage between the expected cognitive trajectory due to normal aging and the more serious decline due to dementia. MCI typically features more serious cognitive decline than that expected for the individual's education and age. However, in the early stage of MCI, normal activities in daily life will not be affected greatly. Among such patients, around half can be expected to remain stable in their cognitive conditions, but the remaining group may eventually become patients with dementia. As a result, MCI can be considered as a risk state for dementia (Nestor, Scheltens, & Hodges, 2004;Wu & Ho, 2009). In particular, MCI patients with impairments in episodic memory, verbal abilities, associative recognition impairment and visual-spatial function may have a higher risk of suffering dementia (Arnáiz & Almkvist, 2003;Troyer et al., 2012).

Major challenges
Although there have been many methods and techniques developed to diagnose cognitive impairments, none are without drawbacks. Additionally, few approaches provide effective and low-cost solutions for early detection and diagnosis of cognitive impairment. In the following paragraphs, we highlight some major research challenges about the early detection of cognitive impairment.
Whilst many methods and techniques can be employed to diagnose severe cognitive impairments like dementia, these approaches may not be sensitive enough to detect MCI. For example, the Mini-Mental Status Examination (MMSE) is one of the most widely used cognitive tests employed to detect severe cognitive impairment. In the MMSE, people are asked to answer specific questions which are related to cognitive domains such as language and memory. However, it has been reported that only 18% of MCI subjects can be detected via the MMSE cognitive test (Nasreddine et al., 2005), which indicates that this test has little utility for reliable early detection of MCI.
It is generally difficult to diagnose cognitive impairment in early stages of development. Different individuals will display different symptoms, and more overt symptoms may not be expressed. Indeed, patients with MCI may be able to live independent lives, with the symptoms expressed by such patients appearing minor relative to healthy older adults, and consequently reliable diagnosis is extremely challenging. Within MCI, the detection of minor abnormalities may require extensive testing in a broad range of cognitive domains, such as reaction time, memory, attention and processing speed (Gualtieri, 2004).
A key factor to additionally consider is the challenge of providing a low-cost and effective cognitive impairment detection solution for elderly people living in a low-income community. It is expensive, for example, to detect cognitive impairment using neuroimaging techniques, whilst cognitive testing typically requires professional neurophysiologists to conduct such clinical tests. In both cases, there are substantial financial and personnel costs.

Face detection and alignment
Face detection is always the first step, both for face recognition and for facial expression recognition (Lee et al., 2015). After face detection, face alignment is an important next step to locate the positions of the contour of facial parts such as eyes, nose and mouth. In practice, a reliable face detection system should be able to complete this task in a short time under different lighting conditions, orientations and backgrounds. Face alignment is quite important in many facial expression recognition approaches. Poor facial components alignment may affect the accuracy of facial feature extraction.
Techniques of face alignment can be divided into two groups: local and global methods (Huang, Hsu, & Cheng, 2010). In local methods, the system detects facial components such as mouth corners or the pupils of the eyes. With this approach, as facial landmarks are independent, it is possible that accuracy may be affected by spatial or temporal variance in illumination. A global method can be more reliable, as it employs the whole geometric structure of the face to locate facial landmarks. It is less sensitive to lighting variance.
A promising approach to face alignment, involving aspects of both methods, was proposed by Cootes, Taylor, Cooper, and Graham (1995). The Active Shape Model (ASM) algorithm contains a global shape model and many local feature models. The major steps for the ASM algorithm to find the best facial landmarks can essentially be summarized as follows (Cootes et al., 1995; Huang et al., 2010):
(1) The shape parameter $b$ is initialized to zero.
(2) The shape model points are generated by $x = \bar{x} + Pb$, where $\bar{x}$ is the mean shape model, $P$ is the matrix of eigenvectors corresponding to the largest eigenvalues and $b$ is the shape parameter.
(3) The best landmark $z$ is found by the feature model.
(4) If $|b' - b|$ is less than the predefined threshold, where $b'$ denotes the updated shape parameter, the process is completed. Otherwise, $b$ is set to $b'$ and the process returns to step 2.
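The iteration above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the text does not say how the updated shape parameter is obtained, so the sketch assumes the least-squares projection $b' = P^{T}(z - \bar{x})$, which holds when the columns of $P$ are orthonormal. The function name `fit_asm`, the callable `find_best_landmarks` (standing in for the local feature models), and the default tolerance are hypothetical.

```python
import numpy as np

def fit_asm(mean_shape, P, find_best_landmarks, tol=1e-6, max_iter=100):
    """Iterate the ASM shape-parameter update until b stops changing.

    mean_shape: (2n,) mean shape vector.
    P: (2n, k) matrix whose columns are the leading eigenvectors
       (assumed orthonormal).
    find_best_landmarks: hypothetical callable standing in for the local
       feature models; maps current model points x to best landmarks z.
    """
    b = np.zeros(P.shape[1])                  # step 1: b initialised to zero
    for _ in range(max_iter):
        x = mean_shape + P @ b                # step 2: generate model points
        z = find_best_landmarks(x)            # step 3: best landmarks near x
        b_new = P.T @ (z - mean_shape)        # assumed projection update
        if np.linalg.norm(b_new - b) < tol:   # step 4: convergence check
            return b_new, mean_shape + P @ b_new
        b = b_new                             # otherwise repeat from step 2
    return b, mean_shape + P @ b
```

In practice the shape parameter is usually also constrained to a plausible range after each update; that constraint is omitted here to keep the sketch close to the steps listed above.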
An additional model for face alignment reported in the literature is the Active Appearance Model (AAM) (Lee et al., 2015;Seshadri & Savvides, 2012). Whilst both AAM and ASM can be used to find landmarks for facial components, AAM uses a global appearance model while ASM uses a global shape model and uses a local region to find a better position of facial components. Generally, the ASM method is better at finding accurate landmark positions, while the AAM method has a better ability to endure variations in lighting (Le, Brandt, Lin, Bourdev, & Huang, 2012). In ASM, the first step is to match the average face model to the genuine image (Lee et al., 2015). The model then searches around the image to find the best location for the point in the image and updates the facial model. The traditional ASM method utilizes the Mahalanobis distance to find the best locations for points. This process brings two problems (Huang et al., 2010). The first problem is that the practical landmark positions may be far from the average facial shape template in some facial expressions. As a result, the facial landmarks cannot be located correctly. Additionally, the practical landmark positions may not be located along the normal direction of an edge contour. This will result in errors in accurate location of landmark positions. In order to solve these problems, researchers have subsequently improved the classical ASM, achieving better performance in facial component alignment (Huang et al., 2010;Le et al., 2012;Lee et al., 2015).
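The Mahalanobis-distance search mentioned above can be illustrated with a short sketch. It assumes that each candidate position around a landmark is described by a profile vector, and that a mean profile and covariance matrix have been estimated from training images; the function name `best_candidate` and the toy numbers in the usage note are hypothetical.

```python
import numpy as np

def best_candidate(candidates, mean_profile, cov):
    """Pick the candidate profile closest to the mean profile.

    Closeness is the squared Mahalanobis distance
    (g - mean)^T cov^{-1} (g - mean), evaluated for every candidate g.
    Returns the index of the best candidate and all squared distances.
    """
    cov_inv = np.linalg.inv(np.asarray(cov, dtype=float))
    diffs = np.asarray(candidates, dtype=float) - np.asarray(mean_profile, dtype=float)
    # Row-wise quadratic form: d2[i] = diffs[i] @ cov_inv @ diffs[i]
    d2 = np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs)
    return int(np.argmin(d2)), d2
```

For example, with a mean profile of `[0, 0]` and covariance `diag(4, 1)`, the candidate `[2, 0]` is preferred over `[0, 1.5]` even though the latter is closer in Euclidean terms, because the first feature varies more in the training data.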
In one approach, Yea-Shuan et al. proposed an improved method to locate the facial landmark positions, based on the traditional ASM method (Huang et al., 2010). In the improved method, the corner-type landmarks such as the outer canthus of the eyes are found first and are used to initialize the facial landmark positions. Then, the ASM is used to find the final facial landmark positions. In experiments, the improved method showed better performance than the traditional ASM method (Huang et al., 2010).
Yong-Hwan et al. have also improved the classic ASM method to achieve better accuracy in locating the facial landmark positions for facial components (Lee et al., 2015). Their approach modified the traditional method in the following ways: (i) by using the centre of the eyes to initialize landmark positions and (ii) by improving the model definition file and extending the profile from 1D to 2D. Yong-Hwan et al. tested the improved method using over 700 images of faces. The experimental results showed that their approach led to an increase of more than 10% in detection success rate.
Vuong et al. have also proposed an improved ASM method to find facial landmark positions, especially for high-resolution real-world images (Le et al., 2012). Vuong et al. improved the shape model in two aspects: (i) through component shape fitting and (ii) through configuration model fitting, such that the improved model has more flexibility for shape variations. Vuong et al. also improved profile matching using the standard Viterbi algorithm (Le et al., 2012). Vuong et al. additionally proposed an interactive refinement algorithm to reduce fitting errors. Overall, testing of their approach employed 2000 high-resolution images and achieved good results.

Facial features extraction and representation
Careful selection and extraction of facial features is extremely important: if inadequate features are selected, then even the best classifier cannot provide satisfying recognition results (Shan, Gong, & McOwan, 2009). Facial feature selection is related to facial muscle action units (AU). Combinations of action units make up different emotions. Although it could be argued that there are potentially thousands of facial expressions, most of them differ only slightly. In terms of categorization, Ekman and Friesen proposed the Facial Action Coding System (FACS), which is the most widely used system for facial expression analysis (Ekman & Rosenberg, 2005). Within the Facial Action Coding System, facial expressions are made up of 46 action units, with action units related to specific sets of facial muscles.
Techniques for facial feature extraction can be subdivided into two groups: geometry-based and appearance-based methods (Shan et al., 2009). In geometry-based methods, the shapes and locations of facial components are extracted to form a feature vector which represents the geometry of the face. However, such geometry-based methods cannot be used in some contexts, as they require accurate facial feature location and tracking. The other approach is to employ appearance-based methods. Within this approach, image filters such as Gabor wavelets are applied to the entire face or part of the face to detect changes in the face, but this consumes both time and system memory.
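As a hedged illustration of a geometry-based feature vector, the sketch below turns a set of 2D landmark coordinates into pairwise distances normalised by the inter-ocular distance. The choice of pairwise distances, the normalisation, and the convention that the first two landmarks are the eye centres are assumptions made for this example, not a method taken from the surveyed papers.

```python
import numpy as np

def geometric_features(landmarks, left_eye=0, right_eye=1):
    """Build a scale-invariant geometric feature vector from landmarks.

    landmarks: sequence of (x, y) points; indices left_eye and right_eye
    (hypothetical convention) identify the two eye centres.
    Returns all pairwise landmark distances divided by the eye distance.
    """
    pts = np.asarray(landmarks, dtype=float)
    scale = np.linalg.norm(pts[left_eye] - pts[right_eye])
    i, j = np.triu_indices(len(pts), k=1)   # every unordered pair once
    return np.linalg.norm(pts[i] - pts[j], axis=1) / scale
```

Because every distance is divided by the eye distance, the same face imaged at a different size yields the same vector, which shows why such features depend so heavily on accurate landmark localisation.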
Another way to categorize facial feature extraction is to view approaches as either holistic or local in nature (Tong, Liao, & Ji, 2007). In the holistic method, the whole face is analysed, so more information from the entire face is utilized, which leads to a commensurate increase in computational complexity. In a local method, only part of the face or limited facial components are involved. Thus, local methods are sensitive to changes in a small facial area and are computationally efficient. However, each action unit may be designed separately. Alternatively, there is a third way to categorize approaches. Facial features can be detected by static approaches, where action units are analysed frame by frame, or by dynamic approaches, where facial features are detected from their evolution over time (Tong et al., 2007). A typical classification of facial feature extraction is also shown in Figure 2. In the following section, static and dynamic approaches to facial feature extraction will be reviewed separately.

Static approaches
Static approaches typically extract facial features from one image or one frame from image sequences. Caifeng et al., for example, proposed such a facial expression recognition system based on statistical local features using Local Binary Patterns (LBP) (Shan et al., 2009). Originally, Ojala et al. proposed a two-level LBP which was used as texture description ('A comparative study of texture measures with classification based on featured distributions,' 1996). The advantages of the LBP algorithm include (i) good performance even under illumination changes, and (ii) simplicity in computation. Caifeng et al. tested various machine learning algorithms using several databases (Shan et al., 2009). The experiments showed that the LBP techniques were efficient and effective for recognizing facial expression. Furthermore, Caifeng et al. reported that the best performance was achieved using Support Vector Machine classifiers with Boosted-LBP techniques. Moreover, Caifeng et al.'s experiments demonstrated that the LBP technique performed well in facial expression recognition for low-resolution images. However, this system had one limitation in that it can only process static images and does not exploit the temporal information in facial expressions.
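To make the LBP idea concrete, the following sketch computes the basic 3x3 LBP code for each interior pixel and a 256-bin histogram that could serve as a feature vector. It is a minimal illustration of the basic operator only, not the Boosted-LBP system described above; the function names and the bit ordering (clockwise from the top-left neighbour) are choices made for this example.

```python
import numpy as np

def lbp_image(gray):
    """Basic 3x3 LBP: threshold the 8 neighbours against the centre pixel."""
    g = np.asarray(gray, dtype=float)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Neighbour offsets, clockwise from top-left; bit 0 is the top-left pixel.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[dy:dy + h - 2, dx:dx + w - 2]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes += (neighbour >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes

def lbp_histogram(gray):
    """256-bin histogram of LBP codes, usable as a texture feature vector."""
    return np.bincount(lbp_image(gray).ravel(), minlength=256)
```

Since each code depends only on whether neighbours are brighter than the centre, adding a constant to every pixel leaves the codes unchanged, which is one way to see the robustness to illumination changes noted above. In facial expression systems the histogram is typically computed per face region and the regional histograms are concatenated.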
Maja et al. also presented an automated system which can be used for facial expression recognition, employing both frontal and profile face images (Pantic & Rothkrantz, 2004). Within this approach, a multi-detector is employed to locate the profile contour and some facial components such as the mouth and eyes. The system extracts 10 facial points from profile contours and 19 facial points from the contours of facial components. This system can be used to recognize 32 different action units in facial expressions. The experiment showed that this system had good performance with an accuracy of 86%.

Dynamic approaches
Dynamic approaches often extract facial features from a set of image sequences which contain temporal information. Consequently, the computation complexity is increased. For example, James et al. proposed a facial expression recognition system based on computer vision which was sensitive to subtle changes in faces (Lien, Kanade, Cohn, & Li, 2000). Three different modules were included within this system, which were used to extract facial features. The three modules were facial-feature tracking, dense-flow extraction (which uses a wavelet motion model), and edge & line extraction. This system could recognize action units using dynamic facial expression images.
In a related approach, Ying et al. proposed the Automatic Face Analysis system to recognize facial expression using frontal face images (Tian, Kanade, & Cohn, 2001). This system could recognize action units instead of some basic facial expressions such as happy and sad, and was able to detect and track various facial features such as lips and brows. It recognized the neutral expression, 6 upper face action units and 10 lower face action units. The general recognition accuracy for facial action units was 96% excluding neutral expressions. Ying et al.'s system was tested on different facial image databases and addressed many limitations of other facial expression recognition systems. For example, image alignment was not necessary in the system, and out-of-plane head motion could be handled. Additionally, processing time was greatly improved and the system could process each frame in less than 1 s. The system was robust and could track facial features even when appearance changed greatly.
Maja et al. proposed another system to recognize facial action units instead of typical facial expressions such as happy and sad (Pantic & Patras, 2006). In addition, the system was able to recognize the action units from long profile face images sequences. The system was able to track 15 facial points and recognized 27 action units using profile face videos, achieving a recognition rate of approximately 87%.
In addition, some researchers use dynamic Bayesian networks (DBNs) to analyse action units (Tong et al., 2007). For instance, Yan et al. proposed a method for action unit recognition that models the relationships between action units and their temporal evolution (Tong et al., 2007). In this approach, a DBN was used to represent the relationships between action units and to find temporal changes in the development of action units. The benefit of a DBN approach is its strength in modelling the evolution of emotion from a neutral state to a weak emotion, next to the apex, and to a releasing state at the end. Yan et al.'s work showed that the integration of relationships of action units could improve the action unit recognition accuracy greatly, especially under illumination and face pose changes. In addition, low-intensity action units can be detected by investigating the related high-intensity action units. However, one limitation of the evaluation conducted was that they only focused on detecting the 14 most common action units. The researchers reported that they planned to detect more action units and improve computer vision techniques in subsequent work.
Finally, Guoying et al. proposed a facial expression recognition system which uses spatiotemporal local binary patterns (Zhao & Pietikäinen, 2007). The advantages of this system include both robustness to low-resolution video and simplicity in computation. In addition, the system uses region-based local descriptors which can be employed to recognize facial expression in image sequences. Experimental evaluation demonstrated that the recognition accuracy for facial expression was approximately 96.3% using image sequences in the Cohn-Kanade facial expression database. The system was also tested at different image resolutions and video frame rates, demonstrating good performance in real time.

Facial features classification
Facial feature classification is inherently difficult due to the variety of ways people show their expressions (Lopes, de Aguiar, De Souza, & Oliveira-Santos, 2017). In addition, recognition accuracy may be affected by both the age and the ethnicity of the subjects. Moreover, for the same person, the same expression may have a different appearance in different lighting conditions, environments or poses. Furthermore, the classifier's efficiency for facial expressions recognition may be affected by the quality of the selected features.
In addition, there are many kinds of classifiers which can be used for facial feature classification, such as neural networks, Bayes classifiers and support vector machines (Mandal, Poddar, & Das, 2015). The architectures of different neural networks often vary, each with its own advantages and drawbacks. Also, different neural networks may be applied to different tasks. For example, artificial neural networks (ANN) are widely used for data classification and pattern recognition (Bashyal & Venayagamoorthy, 2008; Mandal et al., 2015). An ANN consists of neurons, where every unit has an input and passes its output to the next layer.
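The layer-by-layer flow of information in an ANN can be sketched as a forward pass. The sketch below is a generic illustration, not the architecture of any surveyed system: the tanh hidden activation, the softmax output, and the function name `forward` are assumptions chosen for brevity.

```python
import numpy as np

def forward(x, layers):
    """Forward pass through a small fully connected network.

    layers: list of (W, b) pairs, one per layer. Hidden layers use tanh;
    the last layer uses softmax so the outputs can be read as class
    probabilities (for example, one per emotion category).
    """
    a = np.asarray(x, dtype=float)
    for W, b in layers[:-1]:
        a = np.tanh(W @ a + b)           # each unit passes its output onward
    W, b = layers[-1]
    logits = W @ a + b
    e = np.exp(logits - logits.max())    # subtract max for numerical stability
    return e / e.sum()
```

The input `x` would be the extracted facial feature vector; training the weights (for instance by backpropagation) is outside the scope of this sketch.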
In some approaches, there are predesigned categories of facial expression, such as Ekman's proposed 6 basic emotions (e.g. anger, sadness, surprise, happiness, disgust and fear) (Cohen, Sebe, Garg, Chen, & Huang, 2003). In such approaches, facial features are extracted from the images as the input, with the output being the predesigned emotion categories. However, they often differ in the features extracted and the classifiers used. Facial expressions can be classified into two groups: positive and non-positive facial expressions (Mandal et al., 2015).
Similarly, classification of facial expressions may also be divided into those using either static or dynamic image sequences (Cohen et al., 2003;Lopes et al., 2017). Static approaches are often easy to use and train, whilst dynamic approaches often involve larger training samples.

Static approaches
Static approaches are also known as frame-based approaches (Silva, Sobral, & Vieira, n.d.). In such approaches, the system only uses the current input image or individual frame. Static approaches include Bayesian Networks, Neural Networks, and linear discriminant analysis (Cohen et al., 2003;Lopes et al., 2017).
For instance, André et al. proposed a method for facial expression recognition which used a Convolutional Neural Network (CNN) and image pre-processing techniques (Lopes et al., 2017). A significant strength of the CNN approach is its ability to achieve good accuracy with big data. Image pre-processing techniques are used to extract facial expression features within this approach. André et al. evaluated their system on three public image databases: CK+, JAFFE and BU-3DFE, subsequently demonstrating a recognition accuracy of 96.76% on the CK+ database.
In a related work, Shishir et al. used the learning vector quantization (LVQ) algorithm to classify seven different facial expressions from static images of human faces (Bashyal & Venayagamoorthy, 2008). The result showed better performance when compared with Shishir's earlier work. In particular, the recognition of facial expressions of fear showed better accuracy compared with multilayer perceptron (MLP) based classification techniques, which was used in the earlier work. In addition, the facial expression recognition accuracy was 85.7% for the entire data set.
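For readers unfamiliar with learning vector quantization, the sketch below shows the basic LVQ1 update rule: the prototype nearest to each training sample is pulled towards the sample if their labels agree and pushed away otherwise. This is a generic textbook variant given under stated assumptions; the learning rate, number of epochs, and function names are hypothetical and do not reproduce the configuration used in the cited work.

```python
import numpy as np

def train_lvq1(X, y, prototypes, proto_labels, lr=0.1, epochs=20):
    """Train prototypes with the LVQ1 rule and return the updated copies."""
    X = np.asarray(X, dtype=float)
    W = np.array(prototypes, dtype=float)
    for _ in range(epochs):
        for x, label in zip(X, y):
            k = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # nearest prototype
            sign = 1.0 if proto_labels[k] == label else -1.0   # attract or repel
            W[k] += sign * lr * (x - W[k])
    return W

def predict_lvq(X, W, proto_labels):
    """Assign each sample the label of its nearest prototype."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return np.asarray(proto_labels)[d.argmin(axis=1)]
```

In a facial expression setting, each row of `X` would be a feature vector extracted from a face image and each prototype would represent one expression class.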
Murari et al. proposed a facial expression recognition system which used higher order Zernike moments for facial feature extraction and an ANN-based classifier for feature classification (Mandal et al., 2015). In the work, facial expressions were classified into two groups: positive and non-positive emotions. The system was evaluated on the Cohn-Kanade (CK) dataset, achieving a recognition accuracy of 69%. It is of interest to note that Murari et al. also conducted a survey in their work, reporting that human subjects were able to recognize facial expressions with an accuracy of 78%, suggesting that machine-based facial expression classification may achieve recognition accuracy comparable to that of human observers.
Silva et al. compared three different classifiers for facial expressions: an ANN, Linear Discriminant Analysis (LDA) and K-Nearest Neighbor (KNN). They focused on recognizing seven basic facial expressions: happiness, anger, sadness, surprise, disgust, fear and neutral. Their work showed that LDA appeared to have the best performance, with a recognition accuracy of approximately 99.5% on the MUG and FEED-TUM databases (Silva et al., n.d.).
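As an illustration of how a frame-based classifier of this kind can be scored, the following minimal sketch implements K-Nearest Neighbor voting and a held-out recognition accuracy in plain Python. The feature vectors, labels and function names are hypothetical, and the sketch does not reproduce the features or experimental protocol of the cited study.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors."""
    ranked = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def recognition_accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of held-out samples whose predicted label is correct."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)

# Hypothetical 2-D feature vectors, one per static face image.
train_x = [(0.9, 0.1), (0.8, 0.2), (1.0, 0.15),
           (0.2, 0.9), (0.1, 0.8), (0.3, 1.0)]
train_y = ["happiness"] * 3 + ["surprise"] * 3
test_x = [(0.85, 0.2), (0.2, 0.85)]
test_y = ["happiness", "surprise"]

print(recognition_accuracy(train_x, train_y, test_x, test_y))
```

Running the same accuracy function with different classifiers on the same split is one simple way to set up a comparison of the kind described above.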

Dynamic approaches
Dynamic approaches are also known as sequence-based approaches (Silva et al., n.d.). Such approaches employ temporal information by using image sequences as input to classifiers. Dynamic approaches include hidden Markov model (HMM) based classifiers (Cohen et al., 2003; Lopes et al., 2017).
Dynamic approaches were studied by Cohen et al. to classify facial expressions using continuous video inputs (Cohen et al., 2003). They researched Bayesian network classifiers, focusing on distribution assumptions and feature dependency structures. With Naive Bayes classifiers, they tested the system with the distribution assumption changed from Gaussian to Cauchy. With Tree-Augmented Naive Bayes (TAN) classifiers, they investigated dependencies between various facial motion features. The system worked in real time for facial feature extraction and facial expression classification. Both the Naive Bayes and the TAN classifiers are static methods, in which the system classifies the facial expression for only one frame of a video at a time. For dynamic facial expression classification, which involves temporal information, they used a multi-level HMM classifier.
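To make the sequence-based idea concrete, the sketch below scores a sequence of per-frame observations with one small discrete HMM per expression, using the forward algorithm, and picks the expression whose model gives the highest likelihood. It is a single-level HMM and not the multi-level architecture of Cohen et al.; the two hidden states, the three observation symbols and all probabilities are hypothetical values chosen for illustration.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | HMM) computed with the forward algorithm.
    start[i], trans[i][j] and emit[i][symbol] are probabilities."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                 for j in range(n)]
    return sum(alpha)

def classify_sequence(obs, models):
    """Return the expression whose HMM assigns obs the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))

# Hidden states: 0 = onset, 1 = apex (left-to-right, apex is absorbing).
# Observation symbols per frame: 0 = neutral-like, 1 = smile-like, 2 = frown-like.
start = [1.0, 0.0]
trans = [[0.6, 0.4],
         [0.0, 1.0]]
models = {
    "happiness": (start, trans, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "anger":     (start, trans, [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
}

print(classify_sequence([0, 0, 1, 1], models))
print(classify_sequence([0, 2, 2, 2], models))
```

In practice the HMM parameters would be estimated from labelled image sequences, and long sequences would call for scaled or log-domain forward variables to avoid numerical underflow.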
Kotsia and Pitas used a multiclass Support Vector Machine (SVM) as the classifier to recognize either six basic facial expressions or a set of facial action units (Kotsia & Pitas, 2007). The experiments used the Cohn-Kanade database and achieved an accuracy of 99.7% for recognizing the six basic facial expressions and 95.1% for facial action unit detection.

Discussion
Facial expressions play an important role in human social activity, reflecting people's emotions, attitudes, social relations and physiological information. It is possible that in the future, automatic facial expression recognition may play an important role in human-computer interaction. Indeed, many researchers are working on this topic and finding effective solutions for facial features extraction and classification (Barroso, Santos, Cardoso, Padole, & Proença, 2016; Quan, Matuszewski, & Shark, 2016; Yifrach, Novoselsky, Solewicz, & Yitzhaky, 2016). However, one prevailing issue is that most current research uses lab-based facial expression images or image sequences. In practice, factors such as lighting conditions, image quality and background complexity may affect the result of recognizing facial expressions and need to be satisfactorily taken into account in a viable system for facial expression analysis.
In practice, a typical facial expression recognition system may consist of three parts: face detection and facial components alignment, facial features extraction and facial features classification. Various techniques may be involved in each step. Additionally, these techniques may be more applicable in a selected group of situations and each has advantages and drawbacks which are discussed in detail below. Table 1 summarizes the characteristics and drawbacks of the reviewed techniques in facial expression recognition. Face detection and facial component alignment is the first step in facial expression recognition in many approaches. AAM and ASM are two popular approaches for facial component alignment. ASM uses local information and is more suitable for finding landmarks for facial components. However, the accuracy of this approach may decrease when there is a change in lighting conditions. On the other hand, AAM uses a global appearance model, which is better able to tolerate changes in lighting conditions. However, its computational complexity is also higher.
Facial features extraction is often an important step in facial features analysis. It will also influence the accuracy of facial features classification. Techniques of facial feature extraction can be divided into different groups. For example, they can be divided into geometric based methods and appearance-based methods, each with their own advantages and disadvantages. Geometric methods may need accurate facial features detection to work effectively, whilst appearance-based methods may be both time and memory consuming. Furthermore, techniques of facial features extraction can be divided into holistic methods and local methods. Holistic methods have the advantage of using the whole face, which contains more information, but the computational complexity is also increased. Local methods may be sensitive to changes in a small area, but each face component may be dealt with separately. Additionally, techniques for facial features extraction can be dichotomized into static or dynamic approaches. Static methods may have advantages in computational simplicity, but they do not take temporal information into account. As a result, this approach cannot analyse the complete evolution of a given facial expression and understand the changes in emotions. Conversely, dynamic methods take the evolution of facial expressions into consideration, but with a corresponding increase in computational complexity.
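The difference between static and dynamic feature extraction can be sketched in a few lines of Python. The example assumes that facial landmarks have already been located in each frame; the landmark names, the pixel coordinates and the two distance features are hypothetical and serve only to show that a static feature is computed within one frame, whereas a dynamic feature describes change between frames.

```python
import math

def static_features(frame):
    """Frame-based: geometric distances measured within a single frame."""
    return {
        "brow_gap": math.dist(frame["left_brow"], frame["right_brow"]),
        "mouth_width": math.dist(frame["mouth_left"], frame["mouth_right"]),
    }

def dynamic_features(frames):
    """Sequence-based: change of each static feature between consecutive frames."""
    per_frame = [static_features(f) for f in frames]
    return [{name: nxt[name] - cur[name] for name in cur}
            for cur, nxt in zip(per_frame, per_frame[1:])]

# Two hypothetical frames with landmark positions in pixel coordinates.
frames = [
    {"left_brow": (30, 40), "right_brow": (70, 40),
     "mouth_left": (35, 80), "mouth_right": (65, 80)},
    {"left_brow": (32, 40), "right_brow": (68, 40),
     "mouth_left": (35, 80), "mouth_right": (65, 80)},
]

print(static_features(frames[0]))
print(dynamic_features(frames))
```

The static function needs only one frame, which keeps it cheap; the dynamic function needs the whole sequence and produces one feature vector per frame transition, which is where the additional computational cost comes from.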
Classification of facial expressions is often the final step in facial feature analysis. Here, the major difficulties relate to differences in facial ethnicity characteristics and the influence of lighting situations, environments or poses (Lopes et al., 2017). Normally, facial features are extracted from the images as the input to the classifier, with the outputs from the classifier being predesigned categories of facial expression (such as the 6 basic emotions of anger, sadness, surprise, happiness, disgust and fear) (Cohen et al., 2003). Additionally, approaches to the classification of facial expressions can be divided into two types: those using static images and those using dynamic image sequences (Cohen et al., 2003). Static approaches in facial features classification include Bayesian Networks, Neural Networks, and linear discriminant analysis. Dynamic approaches include hidden Markov model (HMM) based classifiers. Static approaches are often easier to use and train, but they may have problems in recognizing facial expressions from frames in a video sequence in which the expression is not fully manifested. On the other hand, dynamic approaches contain temporal information and are more suitable for person-dependent systems, because of the changes in facial expressions and the difference in temporal pattern among different individuals. Also, dynamic approaches often require more training samples.
There are several good survey papers which identify and summarize the problems of existing systems for facial feature analysis (Samal & Iyengar, 1992; Fasel & Luettin, 2003; Sandbach, Zafeiriou, Pantic, & Yin, 2012; Zeng, Pantic, Roisman, & Huang, 2009; Sariyanidi, Gunes, & Cavallaro, 2015). Fasel and Luettin (2003), for example, report that many systems only employ frontal face views as their input. Furthermore, many systems assumed that there were only small head motions between two frames. They also focused only on either static images or dynamic image sequences, although it is better to utilize both. Moreover, out-of-plane rotation of faces may also lead to problems in some systems. On the other hand, Zeng et al. considered that it is better to recognize facial features and expressions by fusing both audio and visual information (Zeng et al., 2009).
It can be expected that there are significant improvements in the area of automatic facial expression recognition yet to come. One future direction is that facial expression recognition may become more efficient in various conditions and background complexities. Machine-based facial expression recognition is already approaching the accuracy of human operators, and may ultimately exceed it. Another future direction is that more information, such as voice and body movement, may be involved in facial and emotion recognition processes.
Automatic facial features analysis may have applications in the field of health and medical treatment. Although there are currently many existing methods to detect and diagnose cognitive impairment, such as cognitive tests and neuroimaging techniques, all existing techniques have their own weaknesses. For example, in cognitive testing, it is known that personal attributes like age, education and personality can influence the test results, and additionally such tests typically require professional neurophysiologists. Meanwhile, automatic facial features analysis may have the potential for early detection and diagnosis of cognitive impairment. Automatic systems may be designed for extracting predesigned facial features related to cognitive impairment, such as abnormal corrugator activity. In such an approach, a classifier could be employed to use facial features related to cognitive impairment. In this case, an automated system may be developed as an app on mobile devices and be made available through the internet. In such a scenario, the user may only need to watch pre-prepared emotion-triggering video stimuli on their mobile phone while the phone's camera is recording, so that the analysis of the user's facial features can be processed in the background. In summary, the facial feature-based method may provide a convenient, low-cost and effective solution for early detection and diagnosis of cognitive impairment for elderly people.

Conclusion
This paper presented a survey on computer vision techniques to analyse facial features. The process of facial features analysis can be divided into three parts: face detection and facial components alignment, facial features extraction, and facial features classification. In this paper, the computer vision techniques involved in each step were reviewed and compared in terms of their advantages and drawbacks. For facial components alignment, in a lab-based environment where the lighting condition is even and stable, it may be better to use a local method such as ASM. Conversely, when the lighting condition is complex, it may be better to use a global method such as AAM. For facial feature extraction and facial feature classification, both static and dynamic approaches are available. When compared to static approaches, dynamic approaches require more training samples and are more suitable for person-dependent systems. Additionally, dynamic approaches may also have better performance in analysing facial expressions when there are sufficient training samples. On the other hand, when there are insufficient training samples, it is better to use static approaches.
As there are known differences between healthy individuals and those with cognitive impairment in facial expression, there is a possibility that cognitive impairment can be detected by recognizing specific facial features. Current techniques for detection of cognitive impairment such as cognitive tests and neuroimaging techniques have many drawbacks. Automatic facial expression analysis systems may be an alternative solution for early detection of cognitive impairment for elderly people, with benefits of high accuracy and low cost. In the case of detecting cognitive impairment through facial expression analysis, where lighting conditions might be even and stable and there are not enough training samples in the beginning, it may be better to use a local method for facial components alignment and a static approach for facial feature extraction and facial feature classification.
One future direction is that automatic facial expression recognition may be developed to be more efficient in various conditions and complexity of backgrounds whilst maintaining high accuracy. Machine-based facial expression recognition may ultimately have higher accuracy than human operators. Another future direction is that more information, such as voice and body movement, may be involved in the process of facial expression and emotion recognition.