Screening of Alzheimer’s disease by facial complexion using artificial intelligence

Despite the increasing incidence and high morbidity associated with dementia, a simple, non-invasive, and inexpensive method of screening for dementia is yet to be discovered. This study aimed to examine whether artificial intelligence (AI) could distinguish between the faces of people with cognitive impairment and those without dementia. A total of 121 patients with cognitive impairment and 117 cognitively sound participants were recruited for the study. Five deep learning models with two optimizers were tested. The binary differentiation of dementia / non-dementia facial images was expressed as a “Face AI score”. Xception with Adam was the model that showed the best performance. Overall sensitivity, specificity, and accuracy by the Xception AI system and AUC of the ROC curve were 87.31%, 94.57%, 92.56%, and 0.9717, respectively. A close and significant correlation was found between Face AI score and MMSE (r = −0.599, p < 0.0001). A significant correlation between Face AI score and chronological age was also found (r = 0.321, p < 0.0001). However, MMSE score showed a significantly stronger correlation with Face AI score than did chronological age (p < 0.0001). The study showed that deep learning programs such as Xception have the ability to differentiate the faces of patients with mild dementia from those of patients without dementia, paving the way for future studies into the development of a facial biomarker for dementia.


INTRODUCTION
Dementia is one of the most serious problems facing a global aging population. Diagnosis of dementia is important for early intervention of the disease.
However, a simple, noninvasive, and inexpensive method for screening for dementia has yet to be established.
The perceived age of older adults was shown to be a robust biomarker of aging that is predictive of survival, telomere length [1], DNA methylation [2], carotid atherosclerosis [3], and bone status [4]. It was demonstrated that perceived age reflects cognitive function more closely than chronological age [5]. Furthermore, in the authors' experience, advanced Alzheimer's disease (AD) patients display a specific complexion. Thus, it was postulated that cognitive decline may be expressed in a patient's face.
Deep learning, a major branch of machine learning in artificial intelligence (AI), has remarkably improved the performance and detection capabilities of AI programs since the convolutional neural network (CNN) was first developed [6]. The authors have previously reported how a CNN could be used to distinguish between AD and dementia with Lewy bodies (DLB) on perfusion single photon emission tomography (SPECT) images. The differentiation was made using the cingulate island sign [7]. Based on this information, it was hypothesized that AI software may be able to classify patients as having cognitive impairment or not using facial recognition.
The present study aimed to examine whether AI can distinguish the facial traits of cognitive impairment patients from those of non-dementia patients. The findings of this study lay the foundations for the development of a noninvasive, inexpensive, and rapid screening tool for cognitive impairment using AI.

Demographics
Three patients from the Department of Geriatric Medicine were diagnosed as cognitively normal after several tests and were thus included in the non-dementia arm of the study. Although the participants from the Kashiwa cohort live self-sufficiently in the community, one participant showed declining MMSE scores over the past few years and was thus classified into the cognitive impairment arm of the study. Two patients were diagnosed with dementia with Lewy bodies (DLB), two were diagnosed with idiopathic normal pressure hydrocephalus (iNPH), and one had aphasia due to cerebral infarction. The rest of the participants were diagnosed with Alzheimer's disease. The five non-AD patients were excluded from the analyses. Table 1 summarizes the demographics of participants.

Models examined
Learning curves for the five models (Xception, SENet50, ResNet50, VGG16, and a simple CNN), each trained with the SGD and Adam optimizers, are shown in Figure 1. Xception with Adam showed the best performance, with a peak validation accuracy of more than 94% and a minimum validation loss of less than 0.21. Xception performed slightly better with Adam than with SGD; Adam therefore appeared to be the preferred optimizer for Xception. ResNet50 and SENet50 showed fast learning; however, their validation cross entropy loss did not drop adequately. VGG16 with Adam showed good accuracy; however, cross entropy loss could not be calculated in an early epoch. VGG16 with SGD showed successful learning, with a peak validation accuracy of approximately 91%. The simple CNN with Adam also showed learning, with a peak validation accuracy of approximately 92%. However, the validation cross entropy loss of the simple CNN was not stable.
Based on these results, Xception with the Adam optimizer was chosen for use in this study. Training was run for 28 epochs, as this was where the loss reached its minimum in the model.
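The epoch selection described above can be sketched as follows. This is a minimal illustration, assuming the per-epoch validation losses are available as a plain list; the function name `best_epoch` and the loss values are hypothetical and are not taken from the study.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch at which the validation loss is lowest."""
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    # Search over indices; ties resolve to the earliest epoch.
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1


# Hypothetical validation losses, for illustration only.
example_losses = [0.69, 0.52, 0.41, 0.33, 0.29, 0.31, 0.35]
selected_epoch = best_epoch(example_losses)
```

In practice the learning curves may be noisy, so the chosen epoch would normally be checked against the plotted curves rather than taken from the raw minimum alone.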

The correlation between MMSE and Face AI score was significantly stronger in females (r = −0.661) than in males (r = −0.501) (p = 0.00833) when evaluated using Fisher's Z-transformation method [9].
The difference by sex in the correlation coefficient between age and Face AI score (male: r = 0.229, female: r = 0.388), assessed using the same method, was not significant (p = 0.0565).
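A comparison of two independent correlation coefficients by Fisher's Z-transformation can be sketched as below. This is the common textbook form of the test and may differ in detail from the procedure in reference [9]. The function name is hypothetical, and the sample sizes in the usage lines are placeholders, since the per-sex sample sizes are not stated here; the resulting p-value therefore does not reproduce the reported one.

```python
import math
from statistics import NormalDist


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided z-test for the difference between two independent Pearson r values."""
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than 3 observations")
    # Fisher's Z-transformation: z = atanh(r)
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    # Standard error of the difference between the transformed values
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Correlations from the text; n = 200 per sex is a placeholder assumption.
z_stat, p_val = compare_independent_correlations(-0.661, 200, -0.501, 200)
```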

Analysis of upper and lower half faces
Training was run for 44 epochs for the upper half and 41 epochs for the lower half, as these were where the loss reached its minimum in each model.

DISCUSSION
The deep learning network implemented in this study successfully discerned faces of participants with cognitive impairment from those without dementia.
With an accuracy of 92.56% and an AUC of 0.9717, this method is reliable enough for implementation as an initial screening test for dementia. The relatively low sensitivity (87.31%), which may be attributable to the smaller number of facial images from dementia participants, could be improved to 96.27% by employing ROC analysis. Face AI score correlated significantly more strongly with MMSE than with age. The weaker correlation between Face AI score and age (r = 0.321) implies that the significant difference in age between participants with and without dementia is unlikely to have affected the AI system. If the AI system relied on participants' age, limiting the age range would worsen the results. However, limiting the age range did not impair the accuracy (Figure 2B).
The AI system is highly complex and has a so-called "black box" nature. Although both upper and lower half faces showed excellent performance, the lower half performed slightly better than the upper half, contrary to our expectation (Figure 2C). The system may obtain more information from mouths or wrinkles than from eyes or hair.
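One way in which ROC analysis can raise sensitivity is by moving the decision cutoff applied to the Face AI score. The following is a minimal sketch of that idea, assuming the cutoff is chosen by maximizing Youden's index (sensitivity + specificity − 1). The criterion actually used is not stated here, and the function names and example scores are hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Labels: 1 = cognitive impairment, 0 = non-dementia.

    A score >= threshold is predicted as cognitive impairment.
    Assumes both classes are present in `labels`.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing Youden's index."""
    best = None
    for t in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, t)
        j = sens + spec - 1.0
        # Keep the first (lowest) threshold on ties.
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical scores, for illustration only.
example_scores = [-2.0, -1.0, -0.5, 0.5, -0.2, 0.3, 1.0, 2.0]
example_labels = [0, 0, 0, 0, 1, 1, 1, 1]
default_result = sensitivity_specificity(example_scores, example_labels, 0.0)
tuned_result = youden_threshold(example_scores, example_labels)
```

With these toy values, lowering the cutoff below zero recovers the one missed positive without adding false positives, which is the kind of trade-off a ROC-based cutoff exploits.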
The correlation coefficient between Face AI score and MMSE was significantly higher in females than in males. A previous study by the authors demonstrated a significantly stronger correlation between MMSE and perceived age than with chronological age but only in female participants [5], which may be attributable to the tendency for cognitively healthy older women to pay more attention to their appearances relative to their cognitively impaired counterparts, resulting in a marked difference in perceived age.
The fact that the SENet50, ResNet50, and simple CNN models were unable to reduce loss when discriminating between facial images of dementia and non-dementia participants suggests that the task is more difficult for AI software than discerning differences in SPECT images [7]. This is understandable given that, while SPECT images can be classified by trained nuclear medicine physicians, diagnosis of dementia from facial images alone cannot be done manually. Suitable deep learning networks such as Xception are required to accomplish the complex task of detecting dementia through facial images.

This study has several limitations. Firstly, the study comprised only 484 images as it was performed at a single institution with a single cohort. Moreover, this study may be affected by institutional biases. Although pretraining assisted AI system learning with limited images, further studies employing a larger number of images from multiple centers will be needed to confirm the results. Secondly, the non-dementia participants of this study may have had undetected dementia that did not require nursing care. Similarly, though the majority of dementia participants were assumed to have Alzheimer's disease, this was not confirmed using pathology or amyloid positron emission tomography, and one participant from the Kashiwa cohort did not undergo sufficient tests. Thus, some participants may have been suffering from other forms of dementia such as DLB, vascular dementia, or normal pressure hydrocephalus. The facial differentiation of DLB and AD patients may be an interesting study in the future. Finally, the facial images included in this study were only front-facing images of Japanese participants with neutral expressions. The ability to use images from various angles, of people of varied ethnicities, and with a variety of facial expressions would improve the robustness of the AI system.

CONCLUSIONS
The study showed that deep learning software such as Xception has the ability to differentiate facial images of people with mild dementia from those of people without dementia. This may pave the way for the clinical use of facial images as a biomarker of dementia.

Participants
Dementia patients were recruited mainly from the Department of Geriatric Medicine, The University of Tokyo Hospital. Many of them also participated in a previous perceived age study [5]. The majority of participants without dementia were recruited from the Kashiwa cohort organized by the Institute of Gerontology, The University of Tokyo. Patients from the Department of Geriatric Medicine who were diagnosed as cognitively normal after several tests were included in the non-dementia arm of the study. Participants from the Kashiwa cohort who showed declining MMSE scores over the past few years were classified into the cognitive impairment arm of the study. The non-AD patients were excluded from the analyses. The rest of the participants were diagnosed with Alzheimer's disease based on DSM-IV criteria, and their Hachinski ischemic scale [10] scores were no more than 4.
Except for participants from the Kashiwa cohort, most of the patients were diagnosed by a dementia specialist using psychological tests, information from family, laboratory data, brain structural imaging (X-ray computed tomography or nuclear magnetic resonance imaging), and perfusion single photon emission tomography.
All procedures were approved by the Ethical Review Board at The University of Tokyo Hospital and The University of Tokyo. The clinical study guidelines of the University of Tokyo, which conform to the Declaration of Helsinki (2013), were strictly adhered to. Healthy volunteers, dementia patients and their families were provided with detailed information about the study, and all provided written informed consent to participate.

Image preparation
Front-on portrait images were taken of participants wearing a neutral expression. The images were cropped to a square with the face in the middle of the image. Backgrounds were removed so that the AI does not use the background to differentiate cognitive impairment patients from non-dementia patients.
Images of cognitive impairment and healthy participants were divided into 10 groups (group0 ... group9). All images taken of the same participant were included in the same group.
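The grouping constraint above (all images of one participant in the same group) can be sketched as follows. This is a minimal illustration, assuming a simple round-robin allocation of participants to groups. The actual allocation procedure, including whether it was balanced by diagnosis, is not described here, and the function names and file names are hypothetical.

```python
from collections import defaultdict


def assign_groups(image_to_participant, n_groups=10):
    """Assign images to group0..group{n-1} so no participant spans two groups."""
    participants = sorted(set(image_to_participant.values()))
    # Round-robin allocation of participants (an assumption, see lead-in).
    group_of = {pid: index % n_groups for index, pid in enumerate(participants)}
    groups = defaultdict(list)
    for image in sorted(image_to_participant):
        pid = image_to_participant[image]
        groups["group%d" % group_of[pid]].append(image)
    return dict(groups)


def cross_validation_folds(groups):
    """Yield (held_out_name, training_images, validation_images) per group."""
    for held_out in sorted(groups):
        training = [img for name in sorted(groups) if name != held_out
                    for img in groups[name]]
        yield held_out, training, list(groups[held_out])


# Hypothetical data: 12 participants with 2 images each.
example_images = {"p%02d_img%d.png" % (p, k): "p%02d" % p
                  for p in range(12) for k in range(2)}
example_groups = assign_groups(example_images)
example_folds = list(cross_validation_folds(example_groups))
```

Holding out whole groups in this way keeps images of the same face from appearing in both the training and validation sets of any fold.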

Statistics
The diagnostic and predictive accuracy of the best CNN model was calculated using group-based 10-fold cross validation. The binary differentiation of cognitive impairment / non-dementia facial images was expressed as the "Face AI score". The scores were obtained by applying an inverse sigmoid function to the output predictive value. Face AI score was evaluated using ROC curve analysis and AUC on a per-image basis. Correlations between Face AI score and MMSE / chronological age were assessed. Differences by sex were also examined.
All statistical analyses were performed with Python and the scipy.stats library.
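The score definition and the per-image AUC can be sketched as below. This is a minimal illustration using only the Python standard library, not the analysis code of the study. The clipping constant `eps`, the function names, and the example values are assumptions introduced for the example.

```python
import math


def face_ai_score(predicted_value, eps=1e-7):
    """Inverse sigmoid (logit) of a model output in the range (0, 1)."""
    # Clip to avoid log(0) or division by zero; eps is an assumed constant.
    p = min(max(predicted_value, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def roc_auc(scores, labels):
    """AUC as the chance that a positive image outscores a negative one.

    Labels: 1 = cognitive impairment, 0 = non-dementia. Ties count as 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical model outputs, for illustration only.
example_outputs = [0.10, 0.25, 0.40, 0.60, 0.45, 0.55, 0.75, 0.90]
example_labels = [0, 0, 0, 0, 1, 1, 1, 1]
example_scores = [face_ai_score(p) for p in example_outputs]
example_auc = roc_auc(example_scores, example_labels)
```

Because the inverse sigmoid is monotonic, it does not change the ranking of images or the AUC; it only spreads values near 0 and 1 over an unbounded scale, which can make linear correlation with MMSE or age easier to assess.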

Analysis of upper and lower half faces
The faces were divided into upper and lower half faces. Upper half faces and lower half faces were trained separately with the same model as the whole faces, using the same group-based 10-fold cross validation, and the optimum number of epochs was determined for each. The performance was analyzed as described above.
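The division into half faces can be sketched as follows, assuming the square image is cut at its horizontal midline. The exact dividing line is not specified here, and the function name and toy image are hypothetical.

```python
def split_upper_lower(image_rows):
    """Split an image, given as a list of pixel rows, at the horizontal midline.

    For an odd number of rows, the extra row goes to the lower half.
    """
    middle = len(image_rows) // 2
    return image_rows[:middle], image_rows[middle:]


# Toy 4 x 2 "image", for illustration only.
toy_image = [[1, 1], [2, 2], [3, 3], [4, 4]]
upper_half, lower_half = split_upper_lower(toy_image)
```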

AUTHOR CONTRIBUTIONS
Y.U-K. contributed to conceptualization, dementia patient recruitment, funding acquisition, investigation, and project administration. M.K. contributed to programming deep-learning programs, analyzing data, and writing the initial draft. T.T., B-K.S., and K.I. contributed to participant recruitment in the Kashiwa cohort and establishment of the protocol of the Kashiwa cohort study. T.K. managed dementia patient recruitment. M.F. and T.I. advised on the construction of the AI models. S.O. and M.A. contributed to supervision and funding acquisition. All the authors discussed the project and have read and approved the final manuscript.

ACKNOWLEDGMENTS
The authors thank Mr. Koichi Fujisawa at Sunstar Co., Ltd. for helping to take photographs in the Kashiwa cohort, and Ms. Natalie Okawa for editing the English of this manuscript.