Feasibility of common, enjoyable game play for assessing daily cognitive functioning in older adults

Background Frequent digital monitoring of cognition is a promising approach for assessing endpoints in prevention and treatment trials of Alzheimer’s disease and related dementias (ADRD). This study evaluated the feasibility of the MIND GamePack© for recurrent semi-passive assessment of cognition across a longitudinal interval.
Methods The MIND GamePack consists of four iPad-based games selected to be both familiar and enjoyable: Word Scramble, Block Drop, FreeCell, and Memory Match. Participants were asked to play 20 min/day, 5 days per week (100 min/week), for 4 months. Feasibility of use by older adults was assessed by measuring gameplay time and game performance. We also evaluated compliance through semi-structured surveys. A linear generalized estimating equation (GEE) model was used to analyze changes in gameplay time, and a regression tree model was employed to estimate the days it took for game performance to plateau. Subjective and environmental factors associated with gameplay time and performance were examined, including daily self-reported ratings of memory and thinking ability, mood, sleep, and energy, as well as current location and distractions prior to gameplay.
Results Twenty-six cognitively unimpaired older adults participated (mean age ± SD = 71.9 ± 8.6; 73% female). Gameplay time remained stable throughout the 4 months, with an average compliance rate of 91% ± 11% (1946 days of data across all participants) and weekly average playtime of 210 ± 132 min per participant. We observed an initial learning curve of improving game performance which, on average, plateaued after 22–39 days, depending on the game. Higher levels of self-reported memory and thinking ability were associated with more gameplay time and sessions.
Conclusion MIND GamePack is a feasible and well-designed semi-passive cognitive assessment platform which may provide complementary data to traditional neuropsychological testing in research on aging and dementia.


Introduction
Clinical treatment trials and longitudinal observational research studies of cognition in aging, Alzheimer's disease and related dementias (ADRD) typically rely on infrequent (i.e., annual or semiannual) cognitive measures as endpoints (1,2). However, this approach has proven inefficient in detecting subtle cognitive changes amidst daily variability (e.g., "good days and bad days") in the performance of older adults with or without cognitive decline (3,4). With such variability, ADRD trials may require thousands of participants and long durations of follow-up to accurately and reliably determine whether significant changes in cognition are present (5). This contributes to the protracted time and enormous expense of bringing a new drug to market. Development can take more than 13 years (4,6), and the field has spent over $42.5 billion since 1995 (7), with few meaningful successes to show (8,9).
Frequent, or even continuous, passive monitoring through the use of mobile applications, web-based testing, and home-based digital sensors has emerged as a promising alternative for measuring cognition in aging and ADRD trials (10-12). Studies have shown that using such tools in a hypothetical pre-symptomatic AD trial can reduce the required sample size for detecting treatment effects by at least 50% (13,14). This reduction in sample size is attributed to more frequent and sensitive assessments to detect variability (15), which provide more robust data than infrequent or regular clinical visits. Further, frequently monitoring changes in cognition-related functions helps shorten trial periods by detecting treatment effects early (16). Models suggest continuous monitoring could reduce the cost of developing a new ADRD drug from $5.7 billion to $2 billion (6). Although mobile neuropsychological testing (17) and wearable technologies for activity and sleep (18) have been proposed as possible frequent/continuous monitoring solutions, each of these methods has limitations. Web or mobile neuropsychological tests are designed to mimic more conventional neuropsychological tests but are vulnerable to practice effects and require commitment and concentrated effort by participants, even if designed for ease of use (19,20). Wrist, ring, or pocket wearables do not measure cognition directly (21-23). These drawbacks create a gap between continuous, passive but obscure proxies of cognitive function and more formalized, active, and directed measurements of neuropsychological functions.
As a compromise to achieve denser monitoring of cognition without the limitations of repeated neuropsychological testing, our team has developed a game-based solution for semi-passive, daily monitoring of cognitive functioning. The MIND GamePack© is a cognitive function monitoring platform with a front end of familiar and enjoyable games and interfaces and a back end of cloud-based servers and basic analysis tools, making it a complete solution that can be deployed in clinical trials or longitudinal research. Our selected games, including Memory Match (inspired by Concentration/Memory®), FreeCell (Solitaire) (24-26), Word Scramble (Boggle™), and Block Drop (Tetris®) (27), are popular among older Americans (28,29). Many Baby Boomers and Generation Xers are very comfortable with the iPad touch-screen medium, and many engage with digital games for leisure (28,29). These generations are now reaching an age where they face a higher risk of cognitive decline and dementia. However, their familiarity with consumer electronics, including electronic games, presents an opportunity to implement a game-based solution that appeals to older adults. Such an approach has the potential to generate greater interest and compliance in research and trials (28). Moreover, deploying games to participants' mobile devices is scalable and inexpensive, and game outcomes have been demonstrated to reflect aspects of players' cognitive functions, providing direct measurement of cognition (30-32). Games with multiple difficulty levels and different puzzles/configurations within a level may also help reduce the practice effects, fatigue and boredom, and potential administration and data errors observed in formalized cognitive tests (33).
After refining MIND GamePack for appearance, usability, and back-end operations in beta phase testing with older adults, we instituted a Phase I study to evaluate feasibility in a longitudinal setting. Within this paradigm, we hypothesized that the learning curve would plateau for each game and each participant. We also explored how subjective and environmental factors may account for differences in play between daily sessions. The current study assessed the feasibility of the MIND GamePack with a group of cognitively unimpaired older adults who played in their home environments over a four-month period. The study aimed to gain insights into the real-world application of game play as a sensitive and reliable outcome measure for future research in aging, ADRD and other disorders which may impact cognition.

Study design
The current study investigated the at-home use of MIND GamePack over the course of 4 months. The platform consists of four games: 'Block Drop', 'FreeCell', 'Memory Match', and 'Word Scramble' (Figure 1). The MIND GamePack is an iPadOS-based application which passively extracts raw data and custom-defined summary metrics (Supplementary Figure S1) believed to engage key domains of cognition. Data are collected following a game session, either by game completion or early termination (i.e., quitting a game prior to completion). Participants in the study were individually trained to play each of the games via a standardized procedure. At the same visit, all participants were provided with a study device (iPad tablet) pre-loaded with the software to take home and were asked to play unsupervised for at least 5 min per game per day, 5 days a week, for a total of 100 weekly minutes. Participants received a financial incentive of up to $435 if they complied with the minimum gameplay requirements. This 4-month study had four study intervals: a lead-in period (Day −21 to −14), baseline (Day 1), 6-week follow-up (Day 42), and end-of-study (Day 84).
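The weekly play requirement above can be turned into a simple compliance measure. The sketch below is illustrative only: the text does not give the exact formula used for the compliance rate, so defining it as the fraction of study weeks with at least 100 minutes of play is an assumption, as are the function names and the (study_day, minutes) input layout.

```python
from collections import defaultdict

WEEKLY_TARGET_MIN = 100  # 5 min x 4 games x 5 days, per the study design


def weekly_minutes(sessions):
    """Sum play time per study week.

    `sessions` is an iterable of (study_day, minutes) pairs with study_day
    starting at 1. The tuple layout and week boundaries are assumptions.
    """
    totals = defaultdict(float)
    for study_day, minutes in sessions:
        week = (study_day - 1) // 7 + 1  # days 1-7 -> week 1, 8-14 -> week 2
        totals[week] += minutes
    return dict(totals)


def compliance_rate(sessions, n_weeks):
    """Fraction of study weeks in which the weekly target was met (assumed definition)."""
    totals = weekly_minutes(sessions)
    met = sum(
        1
        for week in range(1, n_weeks + 1)
        if totals.get(week, 0.0) >= WEEKLY_TARGET_MIN
    )
    return met / n_weeks
```

Under this definition, a participant who meets the target in 11 of 12 weeks has a compliance rate of about 0.92.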

Inclusion and exclusion criteria
Eligible participants (Supplementary Table S1) had to meet the following criteria: age 55-90 years, at least a high school diploma or equivalent level of education, native English speaker, Montreal Cognitive Assessment (MoCA) (34) score of 26 or higher, Geriatric Depression Scale (GDS) (35) score of 9 or lower indicating no more than sub-clinical depressive symptomatology, and qualitative evaluation of the Columbia Suicide Severity Rating Scale (C-SSRS) indicating psychiatric stability.

Games and game features
The MIND GamePack includes four games. Block Drop (like Tetris) is a dynamic puzzle game where players control descending blocks of varying 4-square geometric shapes (Supplementary Figure S2), with the goal of aligning blocks in a continuous row and clearing as many rows as possible. To achieve this within 5 min, players use a touch-sensitive user interface (UI) to move and rotate blocks with up, down, right, and left arrow buttons, with the goal of filling gaps on the playing field. When all gaps within a row are filled, the row is cleared. For increased strategy, players may also "hold" descending blocks for later use. For the analysis of Block Drop summary metrics, we extracted the number of lines cleared within a completed 5-min game session. The number of lines cleared is moderately correlated with Trail Making Test A (r = 0.5), a test commonly used to measure motor speed and attention (36).
FreeCell is a derivative of the traditional Solitaire card game in which all cards of a 52-card deck are dealt to the player. The goal is to stack all cards in ascending order by suit on the foundations pile. To accomplish this, the player may move single (or multiple) cards between the foundations pile, four free cells, and the playing field. The player may use their own strategy or request "hints" to solve the "puzzle" in an untimed format. FreeCell is further complicated by employing different rulesets for different piles; in the MIND GamePack version of FreeCell, if the player makes an "incorrect move" the move is withdrawn, the player is notified of the correct ruleset for placement, and they are allowed to try again to advance the puzzle. In the analysis of the data generated from FreeCell, we calculated the ratio of correct moves by dividing the number of correct (i.e., non-error producing) moves by the total number of moves per session. The ratio of correct moves is moderately correlated with Trail Making Test B (r = 0.5), a test commonly used to measure executive function (36).
Memory Match (like Concentration) is a simulated card game where players are given a prespecified amount of time to memorize a matrix of cards before all are turned over. The player must then select match-pairs from memory in an untimed format. Depending on the level selection, the matrix size can range from [2×2] to [8×8], consisting of 2 and 32 matches, respectively. For Memory Match, we calculated a percent accuracy score by dividing the number of correctly selected match-pairs by the total number of flipped cards per session. The percent accuracy score was normalized based on difficulty level. The normalized percent accuracy score is moderately correlated with the California Verbal Learning Test® Third Edition (CVLT3) Short Delay Free Recall (SDFR) test (r = 0.5) and Long Delay Free Recall (LDFR) test (r = 0.6). The CVLT3 SDFR and LDFR tests are used to assess short- and long-term memory recall (37).
Lastly, Word Scramble (like Boggle) is a timed puzzle game in which players are presented with a 5×5 randomized matrix of letters (including the phoneme 'qu'). By selecting individual letters vertically, horizontally, or diagonally (or a combination thereof), players are tasked with finding as many words as possible (3 letters or longer) within a time limit of 5 min. Players can pause, rotate the board, and clear their current selection of letters as a function of the game. For Word Scramble, we calculated a normalized word-found score by determining the number of possible words on each board (range: 40-7,000) using an augmented dictionary of the ENABLE1 Word List (i.e., Enhanced North American Benchmark LExicon), which was compiled for the public domain. We normalized the word-found score by comparing it to the potential number of words available on each board, using z-score transformation. The normalized word-found score is moderately correlated with the Wechsler Adult Intelligence Scale 4th Edition (WAIS-IV) Symbol Search subtest total score (r = 0.6) and the Delis-Kaplan Executive Function System (DKEFS) Verbal Fluency Letter Fluency scaled score (r = 0.4). WAIS-IV Symbol Search is a test of information processing speed (38) and visual perception, while the DKEFS Verbal Fluency Letter Fluency test is a test of semantic abilities (39).
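To make the summary metrics for FreeCell, Memory Match, and Word Scramble concrete, a minimal sketch follows. It is not the platform's implementation. The function names are hypothetical, and for Word Scramble the text does not specify exactly how the z-score transformation was applied, so taking the found/possible proportion per session and z-scoring it across sessions is an assumption. The difficulty-level normalization for Memory Match is not described in enough detail to reproduce and is omitted.

```python
from statistics import mean, pstdev


def ratio_correct_moves(correct_moves, total_moves):
    """FreeCell: correct (non-error) moves divided by all moves in a session."""
    return correct_moves / total_moves if total_moves else 0.0


def percent_accuracy(matched_pairs, cards_flipped):
    """Memory Match: correctly selected match-pairs over total flipped cards."""
    return matched_pairs / cards_flipped if cards_flipped else 0.0


def normalized_word_scores(sessions):
    """Word Scramble: z-scores of the found/possible word proportion.

    `sessions` is a list of (words_found, words_possible) pairs. Computing
    the proportion first and z-scoring across sessions is an assumption.
    """
    props = [found / possible for found, possible in sessions]
    mu, sigma = mean(props), pstdev(props)
    if sigma == 0:
        return [0.0 for _ in props]  # no spread: every session sits at the mean
    return [(p - mu) / sigma for p in props]
```

Dividing by the number of possible words before standardizing is one way to keep boards with very different word counts (40 to 7,000) on a comparable scale.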

Compliance survey
A compliance survey was conducted at week 6 and end-of-study (i.e., week 12) to assess tolerability and participants' experience with the MIND GamePack (Supplementary Table S2). Interview questions assessed the burden of game play time requirements, game preference, and general experience of using the platform. In the survey, participants were asked to rank their "favorite" games on a scale of 1-4, with 1 indicating their favorite game and 4 indicating their least favorite game, at four study intervals.

Subjective and environmental factors
Six subjective and environmental factors that might influence gameplay routines were identified from the literature. At the first login of each day of play only, participants completed a survey within the iPad program before game sessions, rating their sleep quality, energy level, mood, and subjective memory and thinking ability that day on a 1-5 Likert scale. One-item mood assessments have shown sensitivity in detecting changes before and during life events (40). At each login for game play, participants also indicated where they were playing the game (i.e., at home or elsewhere) and whether there were potential distractions in their environment (yes/no) (Supplementary Figure S3).

Data architecture
The Google Cloud Computing Platform (GCP) was used in the MIND GamePack to achieve wireless data transmission, aggregation, analysis, and back-up. Upon completion of individual game sessions and daily surveys, and once a stable Wi-Fi connection was available, de-identified participant game data were sent to Cloud Firestore per session per user. Transmitted data were analyzed within the GCP, annotated into raw data, and held for daily archiving within the GCP cloud bucket storage. Data were then archived per day per user into JavaScript Object Notation (JSON) files for query in bucket storage. Further, data in this format were backed up to a local open-source relational database management system (RDBMS) at Massachusetts General Hospital (MGH) (Figure 2).
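The per-day, per-user archiving step can be pictured with a small sketch. This is a hypothetical illustration, not the platform's code: the record keys ("user_id", "date", "game", "metrics"), the file naming, and the function name are all assumptions, since the actual schema is not given in the text, and no Google Cloud API is called here.

```python
import json
from collections import defaultdict


def archive_by_user_day(records):
    """Group session records into one JSON document per (user, day).

    Each record is a dict with hypothetical keys "user_id", "date", "game",
    and "metrics"; the real schema is not described in the text.
    """
    grouped = defaultdict(list)
    for rec in records:
        grouped[(rec["user_id"], rec["date"])].append(
            {"game": rec["game"], "metrics": rec["metrics"]}
        )
    # Map an illustrative file name to its serialized JSON payload.
    return {
        f"{user_id}_{date}.json": json.dumps(
            {"user_id": user_id, "date": date, "sessions": sessions},
            sort_keys=True,
        )
        for (user_id, date), sessions in grouped.items()
    }
```

Keeping one file per user per day means a single day's sessions can be queried or re-sent without touching the rest of the archive.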

Statistical analysis
Gameplay time and number of game sessions were calculated per participant per day over 4 months. A linear generalized estimating equation (GEE) model was utilized to evaluate the changes in gameplay time and number of sessions (41). To estimate how long it may take for game performance to plateau, a regression tree model was used with the outcome being the performance of each game (42). The regression tree model calculates the mean squared error (MSE) for two chunks of the time series and identifies the day $r_i^*$ at which the lowest MSE was observed (i.e., the day that best splits the time series data). The regression tree model has three steps, where $x_i$ is the day of play and $y_i$ is the game performance on that day:

i. For each participant's data, sort $(x_i, y_i)$ by $x_i$ in ascending order and set $r_i = (x_i + x_{i+1})/2$.
ii. The sum of squared distances is minimized by the mean, so for each $r_i$, compute $T(r_i) = \sum_{j=1}^{i} (y_j - \bar{y}_{1,i})^2 + \sum_{j=i+1}^{n} (y_j - \bar{y}_{i+1,n})^2$, where $\bar{y}_{a,b}$ is the mean of $y_a, \dots, y_b$.
iii. Find the $r_i^*$ that minimizes $T(r_i)$.

Regression tree models were conducted on all participants for an identified game feature, and then averaged across the sample, to identify the mean time at which said feature plateaued. The standardized mean squared error (MSE) was calculated for each participant during the monitoring period, and then averaged across participants for each monitoring day. A multinomial logistic GEE model was used to examine changes in game preference ranks and the likelihood of enjoying a specific game over time. Subjective and environmental factors of gameplay were included in GEE models. The four games' play time, preference, and total game play time were analyzed separately. All the GEE models were adjusted for age, sex, and years of education.
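The single-split search used by the regression tree model to locate the plateau day can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function name `best_split` is hypothetical, candidate thresholds are taken as midpoints between consecutive days (a common convention, assumed here), and the total squared error is minimized directly, which yields the same split as minimizing the MSE for a fixed series length.

```python
def best_split(days, scores):
    """Return (split_day, total_sse) for the best single split of a series.

    For each candidate threshold between consecutive days, a constant (the
    mean) is fitted on each side and the squared errors are summed; the
    threshold with the smallest total is returned.
    """
    pairs = sorted(zip(days, scores))
    xs = [d for d, _ in pairs]
    ys = [s for _, s in pairs]
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")

    def sse(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    best_r, best_t = None, float("inf")
    for i in range(1, n):  # left chunk = ys[:i], right chunk = ys[i:]
        if xs[i - 1] == xs[i]:
            continue  # no threshold can separate identical days
        r = (xs[i - 1] + xs[i]) / 2  # midpoint candidate (assumed convention)
        t = sse(ys[:i]) + sse(ys[i:])
        if t < best_t:
            best_r, best_t = r, t
    return best_r, best_t
```

For a participant whose daily score jumps from a low level to a stable higher level, the returned split day falls between the last low-scoring day and the first high-scoring day, which is how the split can be read as the start of the plateau.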

Participant demographics
Twenty-nine participants were screened, and 2 participants were ineligible (MoCA score < 26). Twenty-seven participants were enrolled, and 1 withdrew, citing a lack of financial incentive. In total, 26/27 enrolled participants completed the 12-week study (Table 1). Participants were recruited primarily through the Mass General Brigham Rally platform.

Feasibility
A total of 1946 days of data were collected across 26 participants. Over the four-month study period, GEE models revealed that total gameplay time (β = −0.005, p = 0.94) and the total number of played sessions (β = −0.01, p = 0.19) remained stable (Table 2), with an average compliance rate of 91% (SD = 11%). On average, participants played for at least 100 min per week for 11 out of 12 weeks. Participants played on average 3.5 h per week (SD = 2.2), which was approximately 2 h above the minimum requested (Figure 3). Table 2 presents the results of the GEE models. The analysis revealed a significant association between self-reported memory and thinking ability and total gameplay time and sessions. Specifically, a one-point increase in daily self-reported memory and thinking ability was associated with 2 more minutes (or 0.34 more sessions) of game play. There was a significant association between self-reported distractions and the number of game sessions played that day (p = 0.01). Participants played 1.3 more sessions when there were distractions from people or things in their environment.
Among the four games, play time and number of played sessions for Block Drop and FreeCell remained stable throughout the study. The number of Memory Match sessions decreased over time (p = 0.01), although play time did not significantly decrease (p = 0.26) (Table 3). Word Scramble gameplay time (p = 0.03) and number of played sessions (p = 0.03) showed significant increases over the study period, indicating an increase of approximately 1 min in playtime per week (or 0.2 session). Figure 4 shows the amount of play time for each game from two participants.
FIGURE 3 Gameplay time and sessions across days.

Learning and time to plateau
The results of the regression tree models are presented in Table 4. Our analysis revealed that for Block Drop, the trajectory of the average number of lines cleared plateaued after approximately 3 weeks of gameplay. For FreeCell, the trajectory of the average ratio of correct moves plateaued after 5 weeks of gameplay. For Memory Match, the trajectory of the average normalized percent accuracy […] Figure 5 demonstrates the regression tree model and the use of MSE to determine the day when the number of lines cleared plateaued for one case example. Figure 6 presents the averaged standardized MSE for each day for the 4 games throughout the study period.

Compliance
Table 5 lists selected quotes from the compliance survey, in which participants were asked how difficult or easy the MIND GamePack was to incorporate into their daily routine. Participants reported that the current study design was well-tolerated and easy to incorporate into their daily routines. Two participants stated they wanted to continue playing the games after the study concluded. One participant commented that this set of games should be available to everyone participating in research.

Discussion
The current study evaluated the feasibility of the MIND GamePack as a game-based tool for daily assessment of cognitive function in older adults. Both quantitative and qualitative data indicated that participants found the games enjoyable and played them consistently over the 4-month period. Participants had a high compliance rate, averaging 91%, meaning that, on average, they met the study requirements in 11 out of the 12 weeks. Game performance stabilized after 22-39 days (3-6 weeks), reflecting a spectrum of diverse learning processes between participants. Such data can be useful in monitoring subtle, yet personal, changes in cognition in day-to-day lives. The platform includes short daily surveys to capture subjective and environmental factors which may further impact game play. Feasibility data support the potential of this game-based platform for endpoints in future longitudinal and interventional research in aging and ADRD.
Before using the MIND GamePack in clinical research, it is essential to characterize the learning curves and plateaus of game performance. If participants keep improving over time, as might occur with "brain exercises," it becomes difficult to distinguish whether improvement is due to an intervention or to natural learning. We adopted a novel approach using regression tree models to estimate the time when game performance plateaued. Despite variations in the features of different games, the average time-to-plateau for participants was approximately 3-6 weeks. This information can be useful in designing future clinical trials that incorporate the MIND GamePack. To ensure accurate assessment, investigators may consider allowing a lead-in period of at least 1 month before baseline assessment and administration of an intervention. The observed differences in the average time-to-plateau of game features may be attributed to the different cognitive demands of each game. For example, Block Drop requires visuomotor and visuospatial abilities, which may improve more quickly than other cognitive domains such as memory (Memory Match) or language (Word Scramble). To better understand the concurrent validity of game features, future studies will explore the correlation between the learning rates of these features and conventional neuropsychological tests.
In the current design, participants were asked to play each of the four games for 5 min per day, 5 days a week. Understanding game preferences and levels of enjoyment in playing games is critical to ensuring future compliance and informing game selection. Because the game preference survey was added after recruitment began and the sample size was limited, our ability to draw conclusions about preferences is restricted; this is an important area to explore in the future. Validated surveys of participant experience are needed to fully capture user experience and the likelihood of participating in clinical trials with the MIND GamePack.
Our study underscores the importance of assessing both subjective and environmental factors that may affect game play and interpretation of play data. Our finding that better self-reported memory and thinking ability prior to daily gameplay was significantly associated with more gameplay time and more sessions suggests the value of regular assessments that capture cognitive performance on both "good" and "bad" days. We also observed a positive association between the number of gameplay sessions and the presence of distractions from other people and things in the environment. This association may be attributed to increased social interactions and discussions about games with family and friends, which may have led to more frequent gameplay sessions. Another explanation might be that participants played more to compensate for the time they were distracted.

Limitations
The MIND GamePack is an easy-to-use, accessible, and cost-effective solution for dense semi-passive monitoring of cognition in aging and ADRD longitudinal research. However, the platform has some limitations that should be considered. First, because play is unsupervised, participants could allow others to play despite being instructed not to permit anyone else to play the games. In the current study, only one participant indicated that they let another person, a family member, play a single session. Second, participants may have preferences for certain games and may choose not to play others, which could limit the evidence for learning and make it challenging to identify meaningful time points using a regression tree model.

Future directions
Additional research is exploring the use of MIND GamePack in older adults with mild cognitive impairment (MCI) and mild-stage dementia due to AD. This will allow us to assess whether individuals with memory problems are able to play the games and adhere to the protocol. The construct validity of game features can be better described using a more heterogeneous population across the ADRD continuum. The cognitive domains for validating the MIND GamePack could include global cognition, memory, motor, visuospatial, language, and executive function. Additionally, we suggest developing more features to assess learning and comparing the learning performance of game features across different levels of cognitive impairment to gain a better understanding of the natural trajectories of learning with and without ADRD. The availability of multiple features will support the application of machine learning and linguistic techniques to characterize cognitive variability and heterogeneity. Such insights could be used in future ADRD research to enhance the use of the MIND GamePack as a robust complementary assessment tool for evaluating disease progression and the effectiveness of interventions being evaluated for stabilization or even symptomatic improvement.

FIGURE 1 MIND GamePack platform. (From left to right, top row): 'Block Drop', 'FreeCell', 'Memory Match'. (From left to right, bottom row): 'Word Scramble', Log-In Screen, Home Screen with compliance visualizations.

FIGURE 2 Data architecture of MIND GamePack.

FIGURE 4 Case examples of gameplay time across days. (A) Gameplay time remained stable across 100 days. Word Scramble was played longer than the other three games. (B) Gameplay time showed an increasing trend. FreeCell was played longer than the other three games.

FIGURE 5 A case example of the regression tree model of the number of lines cleared in Block Drop. The scatter plot shows the number of lines cleared in Block Drop per day. The line plot shows the MSE for each split in the regression tree models. The result of the regression tree model suggested that the participant's game performance plateaued around day 19.

TABLE 5
Quotes from the compliance survey.
"[…] easy - I'm routine driven and I just set it up and it happens…. I experiment trying to play early morning, afternoon, and evening - I vary my play."
66 Female "Easy - I experimented at the beginning playing two in the morning and two in the afternoon. But now I play them all after breakfast…yes I do it after my morning coffee and breakfast."
Burden
57 Female "I play a lot of different games and enjoy them. I have the same ranking of my favorites, but I sort of incorporate it into my day. Even Memorial weekend we were gone to Maine. It was easy to change the games and get onto Wi-Fi."
"[…] is really good quality. I am a developer - I'm impressed."
74 Female "I like the sound effects of the games. I liked the Block Drop when you get 4 and it goes pheww…"
89 Male "It's interesting, I found the animal ones [i.e. levels] to be complex and the drinks were hard - a myriad of different shaped glasses and colors."
"[…] it was kind of fun. I feel like some of the games I want to download on my iPad."
89 Male "It was an interesting challenge overall. Something there for everyone."
70 Female "I really enjoyed, I'm kind of sad it's over. I want to play some more."

TABLE 2
Total gameplay and subjective and environmental factors.

TABLE 3
Changes in play time (in minutes) and sessions (in counts) across games.

TABLE 4
Estimated days that a game performance plateaued.
Finally, it is important to note that the current study focused on older adults with intact cognition.Ongoing research is exploring the use of the MIND GamePack with older adults with mild cognitive impairment (MCI) or mild dementia due to AD.