Data Resource Acquisition from People at Various Stages of Cognitive Decline – Design and Exploration Considerations

In this paper we introduce work in progress towards the development of an infrastructure (i.e., design, methodology, creation and description) of linguistic and extra-linguistic data samples acquired from people diagnosed with subjective or mild cognitive impairment and from healthy, age-matched controls. The data we are currently collecting covers several modalities: audio-recorded spoken language samples, transcripts of the audio recordings (text) and eye tracking measurements. The extra-linguistic information will be integrated with the linguistic phenotypes and measurements elicited from audio and text in order to extract, evaluate and model features for machine learning experiments. In these experiments, classification models will be trained on the whole or a subset of the data and used to make predictions on new data, in order to test how well the aforementioned groups can be differentiated. Features will also be correlated with measured outcomes from language-related scores, such as word fluency, in order to investigate whether there are relationships between the variables.


Introduction
Current state-of-the-art diagnostic measures for Alzheimer's Disease (AD) are invasive, expensive, and time-consuming. There is a consensus on the need to identify the disease in its earliest manifestations by applying non-invasive and/or cost-effective methods that could aid the identification of subjects in the preclinical or early clinical stages. Efficient tools that could be applied in routine dementia screening in primary care settings, to identify subjects who may be appropriate for further cognitive evaluation and dementia diagnostics, could give the specialist centres the opportunity to engage in more demanding, advanced investigations, care and treatment. New paths of research that seek further knowledge about AD and its subtypes, as well as tools based on the exploration of several complementary modalities and parameters, such as speech analysis and/or eye testing (cf. Laske et al., 2014, for a review), could be examined and incorporated into established neuropsychological, memory and cognitive tests in order to investigate fairly unexplored features that may be used as potential biomarkers for AD. This paper describes some efforts underway to acquire, assess, analyze and evaluate linguistic and extra-linguistic data from people with subjective (SCI) and mild cognitive impairment (MCI) and healthy, age-matched controls.
SCI, MCI and Alzheimer's disease (AD) lie on a spectrum of disease progression. Subjective cognitive impairment (SCI) is a common diagnosis in elderly people, sometimes suggested to be associated with e.g. depression, stress or anxiety, but it is also a risk factor for dementia (Jessen et al., 2010). Mild cognitive impairment (MCI), on the other hand, is a prodromal state of dementia (Ritchie & Touchon, 2010) in which someone has minor problems with cognition (e.g., problems with memory or thinking). These problems are worse than would normally be expected for a healthy person of the same age, but not severe enough to warrant a diagnosis of dementia or to interfere significantly with daily life.
The rest of this paper is organized as follows: Section 2 presents related work from the Computational Linguistic/Natural Language Processing (CL/NLP) field in the domain of dementia, with focus on classification and prediction methodologies using mainly spoken language (including transcribed data) as well as eye tracking measures. Sections 3 and 4 provide a description of the protocol used in the project. Section 3 briefly discusses the Gothenburg MCI-study (from which the current project is recruiting its subjects) and the ethical issues related to the project, while Section 4 presents the material and design of the various experimental tasks and the procedure for data collection. Post-processing of acquired data is also discussed in Section 4, while Section 5 provides a brief outline of the features we plan to extract from the data and the algorithms to use for classification and statistical analysis. Finally, Section 6 presents the conclusions and future work.

CL/NLP and the area of Dementia
A prerequisite for identifying dementia in its earliest stages is a reliable cognitive examination (Nordlund et al., 2010). Particularly for clinicians, language plays an important role in diagnosis, and investigations include inquiries about the use of language in various forms. New findings aim to provide a comprehensive picture of cognitive status, and promising results have recently thrown more light on the importance of language and language (dis)abilities as an essential factor with a strong impact on specific measurable characteristics that can be extracted by automatic linguistic analysis of speech and text (Ferguson et al., 2013; Szatloczki et al., 2015). The work by Snowdon et al. (2000), "The Nun Study", was one of the earliest studies to show a strong correlation between low linguistic ability early in life and cognitive impairment in later life. By analyzing autobiographies of American nuns, the study could predict who might develop Alzheimer's dementia from the degradation of idea density (that is, the average number of ideas expressed per 10 words; Chand et al., 2010) and syntactic complexity in the nuns' autobiographical writings.
Since then, the body of research and interest in CL/NLP research in the area of processing data from subjects with mental, cognitive, neuropsychiatric, or neurodegenerative impairments has grown rapidly [2]. Automatic spoken language analysis and eye movement measurements are two of the newer complementary diagnostic tools with great potential for dementia diagnostics (Laske et al., 2014). Furthermore, the identification of important linguistic and extra-linguistic features, such as lexical and syntactic complexity, is becoming an established way to train and test machine learning classifiers that can be used to differentiate between subjects with various forms of dementia and healthy controls, or between different types of dementia subjects (Lagun et al., 2011; Roark et al., 2011; Olubolu Orimaye et al., 2014; Rentoumi et al., 2014).
Although language is not the only diagnostic factor for cognitive impairment, several recent studies (Yancheva et al., 2015) have demonstrated that automatic linguistic analysis, primarily of speech samples produced by people with mild or moderate cognitive impairment compared to healthy individuals, can identify objective evidence of measurable (progressive) language disorders. Garrard & Elvevåg (2014) comment that computer-assisted analysis of large language datasets could contribute to the understanding of brain disorders. Although none of the studies presented in the special issue of Cortex vol. 55 moved "beyond the representation of language as text", so that finding reliable ways of incorporating features such as prosody and emotional connotation into data representation remains a future challenge, the editors acknowledged that current research indicates that "the challenges of applying computational linguistics to the cognitive neuroscience field, as well as the power of these techniques to frame questions of theoretical interest and define clinical groups are of practical importance". Indeed, studies have shown that steady changes in the linguistic nature and degree of symptoms in speech and writing appear early and can be identified using language technology analysis (Mortimer et al., 2005; Le et al., 2011). New findings also show a great potential to increase our understanding of dementia and its impact on linguistic degradation such as loss of vocabulary, syntactic simplification, poor speech content and semantic generalization. Analysis of eye movement is also a relevant research technology to apply, and text reading by people with and without mild cognitive impairment may give a clear indication of how reading strategies differ between these groups, an area that has so far not been researched to any significant extent in this particular domain (Fernández et al., 2013, 2014; Molitor et al., 2015). With the help of eye-tracking technology, the eye movements of participants are recorded while a suitable stimulus is presented (e.g., a short text; cf. Section 4.3).
[2] […]banken.gu.se/eng/rapid-2016>); various papers in the workshop series on "Speech and Language Processing for Assistive Technologies", SLPAT (<http://www.slpat.org/>), and the seven Louhi: Workshops on Health Text Mining and Information Analysis (<https://louhi.limsi.fr/2016/>).

The Gothenburg MCI-study and Related Ethical Issues
The ongoing Gothenburg mild cognitive impairment study (Nordlund et al., 2005; Wallin et al., 2016) is an attempt to conduct longitudinal in-depth phenotyping of patients with different forms and degrees of cognitive impairment using neuropsychological, neuroimaging, and neurochemical tools.
The study is clinically based and aims at identifying neurodegenerative, vascular and stress-related disorders prior to the development of dementia. All patients in the study undergo baseline investigations, such as neurological examination, psychiatric evaluation, cognitive screening (e.g., memory and visuospatial disturbance, poverty of language and apraxia), magnetic resonance imaging of the brain and cerebrospinal fluid collection. At biannual follow-ups, most of these investigations are repeated.
The overall Gothenburg MCI-study is approved by the local ethical committee review board (reference number: L091-99, 1999; T479-11, 2011), while the currently described study is approved by the local ethical committee (decision 2016). The project aims at gathering a rather homogeneous group of participants with respect to age and education level (50 with SCI/MCI and 50 controls). Written informed consent is obtained from all participants in the study; the inclusion and exclusion criteria are listed at the end of the paper.

Material and Design of Experiments
The purpose of the acquisition of the data (audio recordings, transcriptions [4] and eye tracking measurements) is to facilitate feature extraction for the machine learning experiments (see Section 5).

Audio Recordings
For the acquisition of the audio signal we use the Cookie Theft picture (see Figure 1) from the Boston Diagnostic Aphasia Examination (BDAE; Goodglass & Kaplan, 1983), which is often used to elicit speech from people with various mental and cognitive impairments. During the presentation of the Cookie Theft stimulus (which illustrates an event taking place in a kitchen), the subjects are asked to tell a story about the picture and describe everything that can be observed, while the story is recorded. For the task, the original English label on the cookie jar, "COOKIE JAR", is replaced with the Swedish label "KAKBURK". The picture is considered an "ecologically valid approximation" to spontaneous discourse (Giles et al., 1996).
[4] Since some of the features to be extracted (e.g. part-of-speech and syntactic labels from the transcriptions) are language-dependent, a language-specific infrastructure is required (in our case for Swedish); for that reason we plan to use available resources, cf. Ahlberg et al. (2013); possible modifications to the transcribed language are also envisaged.
We chose to use the Cookie Theft picture since it provides a standardized test that has been used in various studies in the past; comparisons can therefore be made with previous results, e.g. with research on the DementiaBank database or other collections (MacWhinney, 2007; Williams et al., 2010; Fraser & Hirst, 2016). Moreover, in order to allow the construction of a comprehensive speech profile for each research participant, the speech task also includes reading aloud a short text from the International Reading Speed Texts collection (IReST; Trauzettel-Klosinski et al., 2012), presented on a computer screen. Two texts from this collection are used in connection with the eye tracking experiment (see next section), but only one of them is read aloud and thus combined with eye-tracking recording; cf. Meilán et al., 2012 and 2014, for similar "reading out" text passage experiments. IReST is a multilingual standardized text collection used to assess reading performance, providing multiple equivalent texts for repeated measurements. Specifically, in our project we use the Swedish IReST translations, namely texts "one" and "seven" (Öqvist Seimyr, 2010). For the audio capture of both tasks we use an H2n Handy recorder, and the audio files are saved and stored as uncompressed audio in .wav format (44.1 kHz/16 bit). Recordings are carried out in an isolated environment in order to avoid noise.
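The storage format stated above (uncompressed .wav, 44.1 kHz/16 bit) can be checked automatically before a recording enters the collection. The following is only a minimal sketch using Python's standard `wave` module; the function names `describe_wav` and `matches_protocol` are hypothetical and not part of the project's tooling.

```python
import wave


def describe_wav(path):
    """Read basic format information from an uncompressed WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "bit_depth": wav.getsampwidth() * 8,  # sample width is given in bytes
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
        }


def matches_protocol(info, sample_rate_hz=44100, bit_depth=16):
    """True if the file matches the 44.1 kHz/16 bit format named in the text."""
    return info["sample_rate_hz"] == sample_rate_hz and info["bit_depth"] == bit_depth
```

A check of this kind could be run over each new recording so that files saved with other settings are flagged early.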

Verbatim Transcriptions
The textual part of the infrastructure consists of manually produced transcriptions of the two audio recordings previously described. The digitized speech waveform will be semi-automatically aligned with the transcribed text. During transcription, special attention will also be paid to non-speech acoustic events, including speech dysfluencies such as filled pauses, a.k.a. hesitations ("um"), false starts and repetitions, as well as other features such as laughing. A very basic transcription manual is also being produced to help the human transcribers achieve homogeneous transcriptions. For the transcription, the PRAAT application (Boersma & Weenink, 2013) is used.
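Once dysfluencies are written out in the transcripts, simple counts of them can be derived automatically. The sketch below is illustrative only: it assumes a plain-text transcript in which filled pauses appear as ordinary tokens, the `FILLED_PAUSES` inventory is a placeholder (the real inventory would follow the transcription manual), and repetitions are approximated as immediately repeated words.

```python
import re

# Placeholder inventory; the actual set would be defined by the transcription manual.
FILLED_PAUSES = {"um", "uh", "eh"}


def tokenize(transcript):
    """Lower-case the transcript and keep runs of letters (including å, ä, ö)."""
    return re.findall(r"[^\W\d_]+", transcript.lower())


def disfluency_counts(transcript):
    """Count filled pauses and immediately repeated words in a transcript."""
    tokens = tokenize(transcript)
    filled = sum(1 for tok in tokens if tok in FILLED_PAUSES)
    repetitions = sum(
        1
        for prev, cur in zip(tokens, tokens[1:])
        if prev == cur and cur not in FILLED_PAUSES
    )
    return {"tokens": len(tokens), "filled_pauses": filled, "repetitions": repetitions}
```

False starts and laughter would need explicit markup in the transcript and are not covered by this sketch.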

Eye-Tracking
The investigation of eye movement functions in SCI/MCI, and of any differences or changes in eye movements that could potentially be detected for those patients, is of great importance to clinical AD research. However, until now, eye tracking has not been used to investigate reading in persons with MCI on a larger scale, possibly due to a number of procedural difficulties related to this kind of research. On the other hand, the technology has been applied in a growing body of experiments related to other impairments such as autism (Yaneva et al., 2016; Au-Yeung et al., 2015) and dyslexia (Rello & Ballesteros, 2015). For the experiments we use an EyeLink 1000 Desktop Mount [7] with monocular eye tracking, head stabilization and real-time sample access at 1000 Hz. Head stabilization provides an increased eye tracking range performance. The participants are seated in front of the monitor at a distance of 60-70 cm. While reading, the eye movements of the participants are recorded with the eye-tracking device, and interest areas around each word in the text are defined by taking advantage of the fact that there are spaces between the words. The eye-tracking measurements are used for the detection and calculation of fixations, saccades and backtracks. Fixation analysis is conducted within predefined Areas of Interest (AOI); in our case each word is an AOI.
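The idea of defining one AOI per word from the spaces in the text, and then analysing fixations within those AOIs, can be illustrated as follows. This is a simplified sketch and not the procedure of the eye-tracking software: it assumes a single line of text rendered in a fixed-width font, uses only the horizontal coordinate, and all names (`AOI`, `word_aois`, `fixation_time_per_word`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AOI:
    word: str
    x_start: float  # left edge in pixels (inclusive)
    x_end: float    # right edge in pixels (exclusive)


def word_aois(line, char_width, x_origin=0.0):
    """Split a line at spaces and give each word a horizontal interest area."""
    aois = []
    pos = 0  # character offset of the current word within the line
    for word in line.split(" "):
        if word:
            start = x_origin + pos * char_width
            aois.append(AOI(word, start, start + len(word) * char_width))
        pos += len(word) + 1  # skip the word and the following space
    return aois


def aoi_index(aois, x):
    """Return the index of the AOI containing x, or None if x falls in a gap."""
    for i, aoi in enumerate(aois):
        if aoi.x_start <= x < aoi.x_end:
            return i
    return None


def fixation_time_per_word(aois, fixations):
    """Sum fixation durations per AOI; fixations are (x, duration_ms) pairs."""
    totals = [0.0] * len(aois)
    for x, duration in fixations:
        i = aoi_index(aois, x)
        if i is not None:
            totals[i] += duration
    return totals
```

In practice interest areas are usually padded so that fixations landing in the spaces or slightly above and below the line are still assigned to a word; that refinement is left out here.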

Comparison over a Two-Year Span
The previously outlined experiments/recordings will be repeated two years after the first recording, which takes place during the second half of 2016. In this way we want to analyze whether there are any differences between the two audio and eye-tracking recordings, and at which level and magnitude; namely, to compare and examine whether there are any observable, larger differences or decline in some features, and which these could be. We are aware that more longitudinal data samples over a longer time period would be desirable, but at this stage only a single repetition is practically feasible. In other longitudinal experiments, e.g. investigating the nature and progression of spontaneous writing, patterns of impairment were observed in patients with Alzheimer's disease over a 12-month period; these were dominated by semantic errors (Forbes-McKay et al., 2014).
[7] From SR Research Ltd. <http://www.sr-research.com/>.

Envisaged Analysis and Features
The envisaged analysis and exploration intends to extract, evaluate and combine a number of features from the three modalities selected for investigation. These are speech-related features, text/transcription-related features and eye tracking-related features.

Speech-related Features
A large number of acoustic features have been proposed in the literature, which points to the importance of distinguishing between vocal changes that occur with normal aging and those that are associated with MCI (and AD). We expect that our spoken samples will show different features depending on whether they are produced spontaneously (when talking about the Cookie Theft picture) or consist of read-aloud speech. Finding reliable and robust acoustic features that might differentiate spoken language in SCI/MCI from that of healthy controls remains an ongoing challenge, but the technology is developing rapidly. Roark et al. (2011) used 21 features in supervised machine learning experiments (using Support Vector Machines) on data from 37 MCI subjects and equally many controls (37/37). Features from both the audio and the transcripts included pause frequency, filled pauses, total pause duration and linguistic variables such as Frazier and Yngve scores and idea density; the best result across the various feature configurations was 86.1% for the area under the ROC curve. Pause frequency has been identified as a feature differentiating spontaneous speech in patients with AD from control groups (Gayraud et al., 2011), and may also be used to distinguish between mild, moderate and severe AD (Hoffman et al., 2010). Meilán et al. (2014) used AD subjects and spoken data (sentences shown on a screen and read aloud clearly). They used acoustic measures such as pitch, volume and spectral noise measures. Their method was based on linear discriminant analysis and their results could characterize people with AD with an accuracy of 84.8%. Yancheva et al. (2015) used speech and transcription features from DementiaBank (Cookie Theft descriptions), using 393 speech samples (165/90).
They extracted and investigated 477 different features: lexicosyntactic ones (such as syntactic complexity; word types, quality and frequency), acoustic ones (such as Mel-frequency cepstral coefficients (MFCC), including mean, variance, skewness, and kurtosis; pauses and fillers; pitch and formants; and aperiodicity measures) and semantic ones (such as concept mention). These were used to predict Mini Mental State Examination (MMSE [8]) scores with a mean absolute error of 3.83, while for individuals with more longitudinal samples the mean absolute error improved to 2.91, which suggests that longitudinal data collection plays an important role. König et al. (2015) also looked at MCI and AD subjects (23/26) and examined vocal features (silence, voice, periodic and aperiodic segment length; mean of durations) using a Support Vector Machine (SVM). Their classification accuracy of automatic audio analyses was 79% between healthy controls and those with MCI, 87% between healthy controls and those with AD, and 80% between those with MCI and those with AD. Tóth et al. (2015) also used SVM and achieved an F-score of 85.3% (32 MCI subjects and 19 controls) by starting with eight acoustic features extracted by applying automatic speech recognition (such as speech tempo, i.e. phones per second) and extending them to 83. Finally, Fraser et al. (2016) also looked at DementiaBank and, using 240 samples from AD subjects and 233 from healthy controls, extracted 370 features, such as linguistic variables from transcripts (e.g., part-of-speech frequencies; syntactic complexity and grammatical constituents), psycholinguistic measures (e.g., vocabulary richness) and acoustic variables from the audio files (e.g., MFCC). Using logistic regression, Fraser et al. obtained a classification accuracy of 81% in distinguishing individuals with AD from those without, based on short samples of their language on the Cookie Theft picture description task.
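Pause-based measures such as pause frequency and total pause duration, which recur in the studies summarized above, are straightforward to compute once speech intervals are known. The sketch below is a hedged illustration rather than the method of any cited study: it assumes that speech segments are already available as (start, end) times in seconds (for instance from an aligned transcript), and the 0.25 s minimum pause length is an arbitrary placeholder.

```python
def pause_features(speech_segments, total_duration, min_pause=0.25):
    """Derive simple pause measures from (start, end) speech intervals in seconds.

    A silent gap between two consecutive speech segments counts as a pause
    only if it lasts at least `min_pause` seconds (an assumed threshold).
    """
    segments = sorted(speech_segments)
    pauses = []
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        gap = next_start - prev_end
        if gap >= min_pause:
            pauses.append(gap)
    total_pause = sum(pauses)
    return {
        "pause_count": len(pauses),
        "total_pause_duration": total_pause,
        "pauses_per_minute": len(pauses) / (total_duration / 60.0),
        "mean_pause_duration": total_pause / len(pauses) if pauses else 0.0,
    }
```

Filled pauses would be counted separately from the transcript, since they are voiced and do not appear as silent gaps.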

Text/Transcription-related Features
Many of the previous studies combine both acoustic features and features from the transcriptions; cf. the supplementary material in Fraser et al. (2016). Some of the most common features and measures from transcribed text follow the lexicon-syntax-semantics continuum. These measures include (i) lexical distribution measures (such as type-token ratio, mean word length, long word counts, hapax legomena, hapax dislegomena, automated readability index and Coleman-Liau Index; also lexical and non-lexical fillers or disfluency markers, i.e. "um", "uh", "eh") and out-of-vocabulary rate (Pakhomov et al., 2010); (ii) syntactic complexity markers (such as frequency of occurrence of the most frequent words and deictic markers; [context free] production rules, i.e. the number of times a production rule is used divided by the total number of productions; dependency distance, i.e. the length of a dependency link between a dependent token and its head, calculated as the difference between their positions in a sentence; parse tree height, i.e. the mean number of nodes from the root to the most distant leaf; depth of a syntactic tree, i.e. the proportion of subordinate and coordinate phrases to the total number of phrases and ratio of subordinate to coordinate phrases; noun phrase average length and noun phrase density, i.e. the number of noun phrases per sentence or clause; words per clause); and (iii) semantic measures (such as idea or propositional density, i.e. the operationalization of conciseness: the average number of ideas expressed per words used, or the number of expressed propositions divided by the number of words; a measure of the extent to which the speaker is making assertions, or asking questions, rather than just referring to entities etc.).
[8] MMSE is a brief screening test that quantitatively estimates the severity and progression of cognitive impairment and also cognitive changes over time (Tombaugh & McIntyre, 1992).

Eye Tracking-related Features
Eye tracking data has recently been used in machine learning methods that take advantage of eye dynamics biomarkers (Lagun et al., 2011), with good indication that they can aid the automatic detection of cognitive impairment (i.e., distinguish healthy controls from MCI patients). Several studies provide evidence that eye movements can be used to detect memory impairment and serve as a possible biomarker for MCI and, in turn, AD (Fernández et al., 2013). Basic features we intend to investigate in this study are fixations (that is, the state in which the eye remains still over a period of time), saccades (that is, the rapid motion of the eye from one fixation to another) and backtracks (that is, the relationship between two subsequent saccades where the second goes in the opposite direction to the first); for a thorough description of possible eye-tracking related features cf. Holmqvist et al., 2015:262. Saccades are of particular interest because they are closely related to attention and are thus likely to be disturbed by cognitive impairments associated with neurodegenerative disorders (Anderson et al., 2013). Note that there are many assumptions behind the use of eye tracking technology in experiments designed for people with MCI. For instance, the longer the gaze fixation on a certain word, the more difficult the word is assumed to be for cognitive processing; the durations of gaze fixations could therefore be used as a proxy for measuring cognitive load (Just & Carpenter, 1980). Molitor et al. (2015) provide a recent review of the growing body of literature that investigates changes in eye movements as a result of AD and the alterations to oculomotor function and viewing behavior.
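The definitions of saccades and backtracks given above can be turned into a small computation over a sequence of fixations. The sketch below is an assumption-laden illustration: it considers only horizontal fixation positions on a single line of text, treats each movement between consecutive fixations as one saccade, and counts a backtrack literally as any pair of subsequent saccades with opposite directions. The function names are hypothetical.

```python
from statistics import mean


def saccade_amplitudes(fixation_xs):
    """Signed horizontal distance between each pair of consecutive fixations."""
    return [b - a for a, b in zip(fixation_xs, fixation_xs[1:])]


def count_backtracks(fixation_xs):
    """Count pairs of subsequent saccades whose directions are opposite."""
    saccades = [s for s in saccade_amplitudes(fixation_xs) if s != 0]
    return sum(
        1
        for first, second in zip(saccades, saccades[1:])
        if (first > 0) != (second > 0)
    )


def mean_fixation_duration(durations_ms):
    """Average fixation duration, used here as a rough proxy for processing load."""
    return mean(durations_ms)
```

Under this literal reading a return to forward reading after a regression is also counted; an analysis interested only in leftward regressions would restrict the condition to a forward saccade followed by a backward one.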

Correlation Analysis
We further intend to perform correlation analysis between the features previously outlined and the results of various language-related tests performed in the Gothenburg MCI-study, tests which are applied for assessing possible dementia. Typically, clinicians use tests such as the Mini-Mental State Examination (MMSE), linguistic memory tests and language tests. Such tests include the token test, subtest V, which is a test of syntax comprehension; the Boston naming test; and the word fluency FAS test (the number of words produced beginning with the letters F, A, and S). This investigation intends to identify whether there are variables/features that are (highly) correlated with one class, e.g. MCI, yet uncorrelated with the others, e.g. healthy controls or SCI.
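As an illustration of the kind of correlation analysis described above, the sketch below computes Pearson's correlation coefficient between one extracted feature and one test score. Pearson's r is only one possible choice (the text does not commit to a specific coefficient), the function name is hypothetical, and any numbers fed to it in examples are toy values, not study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)
```

For example, `pearson_r(pause_counts, fas_scores)` would relate a pause-based feature to word fluency scores across participants, where both argument names stand for hypothetical per-participant lists. With ordinal or strongly skewed scores a rank-based coefficient may be more appropriate.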

Conclusions and Future Work
In this paper we have introduced work in progress towards the design and infrastructural development of reliable multi-modal data resources and a set of measures (features), to be used both for experimentation with feature engineering and for the evaluation of classification algorithms that differentiate between SCI/MCI and healthy adults, and also as benchmark data for future research in the area. Evaluation practices are a crucial step in the development of resources and useful for enhancing progress in the field. We therefore intend to evaluate the relevance of the features, compare standard algorithms such as Support Vector Machines and Bayesian classifiers, and perform correlation analysis with the results of established neuropsychological, memory and cognitive tests. We also intend to repeat the experiments after two years in order to assess possible changes at each level of analysis. We believe that combining data from three modalities could be useful, but at this point we do not provide any clinical evidence supporting these assumptions, since the analysis and experimentation studies are planned for year 2 of the project (2017). At this stage, therefore, the paper only provides a snapshot of the current state of the work.
Inclusion criteria
• 50-79 years
• Swedish as a first language and not speaking languages other than Swedish before the age of 5
• Comparable education length of the participants
• No apparent organic cause of symptoms
• Research subjects have read information about the research project and approved voice recording and eye movement measurements
Exclusion criteria
• Participants have dyslexia or other reading difficulties
• Participants have deep depression
• Participants have an ongoing abuse of any kind
• Participants suffer from serious psychiatric or neurological diseases such as Parkinson's, Amyotrophic lateral sclerosis or have/had a brain tumor
• Participants do not understand the question or the context in the selection process
• Participants have poor vision (that cannot be corrected by glasses or lenses), cataract, nystagmus, or cannot see and read on the computer screen
• Participants decline participation during telephone call or later at the recording site
• Participants decline signing the paper of informed consent
• Recordings or eye movement measurements are technically unusable

Figure 1: The Cookie Theft picture