Usability testing of touchscreen DriveSafe DriveAware with older adults: A cognitive fitness-to-drive screen

Abstract

Background: DriveSafe DriveAware (DSDA) is a cognitive fitness-to-drive screen that can accurately predict on-road performance. However, administration is restricted to trained assessors. General practitioners are ultimately responsible for determining fitness to drive in many countries but lack suitable tools. We converted DSDA to touchscreen to provide general practitioners and other health professionals with a practical fitness-to-drive screen. This necessitated the development of an automatic data collection system. We took a user-centred design approach to test usability of the system with older adults, the group most likely to take the test. Method: Middle-aged and older adult volunteers were asked to try an iPad application to assist in the development of a fitness-to-drive screen. Seventeen males and 18 females (mean age 70 years) participated in four trials; each participant was tested only once. We tested all text and function changes until all older adults could successfully self-administer the screen. Results: Older adults found basic touchscreen functions easy to perform, even when unfamiliar with the technology. Conclusion: Usability testing allowed us to develop a user-friendly touchscreen data collection system and ensured that design errors were not missed. Psychometric evaluation of data gathered with touchscreen DSDA was conducted in a separate study prior to use in clinical practice.

ABOUT THE AUTHORS

Touchscreen DriveSafe DriveAware (DSDA) is a standardised assessment of cognitive fitness to drive that has been administered by driver-trained occupational therapists for many years. Our group converted DSDA into a practical and predictive touchscreen test for general practitioners to use in medical practice. This involved a number of research phases. First, we tested usability of touchscreen DSDA with older adults throughout the design and programming phases to ensure the test was user-friendly. Next, we developed and tested an automatic data collection and scoring system that reflected the decisions that would have been made by a trained assessor.
Finally, we conducted a study to examine the internal validity, reliability and predictive validity of data gathered with touchscreen DSDA. A standardised on-road assessment was the criterion measure. Rasch analysis provided evidence that touchscreen DSDA had retained the strong psychometric properties of original DSDA. The present paper relates to the test development stage.

PUBLIC INTEREST STATEMENT
DriveSafe DriveAware (DSDA) is a valid, predictive test that occupational therapists have used for many years to determine whether drivers can manage the cognitive aspects of driving. However, DSDA is not practical for medical practice. General practitioners (GPs) are ultimately responsible for determining medical fitness to drive, but lack the tools. Therefore, a touchscreen version of DSDA was developed to provide GPs with a practical, valid, predictive test to determine patient cognitive fitness to drive.
Older adults are the patients most likely to take touchscreen DSDA due to age-related changes and the onset of medical conditions such as stroke and dementia that may affect driving. However, older adults can have difficulty with technology due to cognitive changes and reduced vision, hearing, and reaction time. Therefore, older adults were consulted in the touchscreen DSDA development stage to ensure the test was user-friendly. It would not have been possible to develop a successful test without usability testing because we could not have predicted what difficulties older adults might encounter.
General practitioners are the professionals ultimately responsible for determining medical fitness to drive in most countries and are in an ideal position to screen drivers because: 1) patients usually present to them in the first instance; 2) they are required to fill out license authority medical forms (Dobbs et al., 1998; Sims et al., 2012); and 3) there is mandatory reporting of medically "at risk" drivers in jurisdictions of many countries including the US, Canada, and Australia (Austroads, National Transport Commission, 2016; Jang et al., 2007). Surveys show general practitioners believe they should be responsible for making determinations about fitness to drive but lack valid and reliable driver screens that are practical for use in medical practice (Dobbs et al., 1998; Fildes, 2008; Jang et al., 2007; Marshall, Demmings, Woolnough, Salim, & Man-Son-Hing, 2012; Molnar, Patel, Marshall, Man-Son-Hing, & Wilson, 2006; Sims et al., 2012; Wilson & Kirby, 2008; Woolnough et al., 2013; Yale, Hansotia, Knapp, & Ehrfurth, 2003).
The desktop (original) version of DSDA is a cognitive fitness-to-drive test showing promise as a driver-screening instrument. Data gathered with original DSDA are face valid, reliable, sufficiently predictive, test-retest reliable, and trichotomise patients via two evidence-based cut-off scores based on the likelihood of passing an on-road assessment (i.e. "Likely to Pass", "Requires Further Testing", and "Likely to Fail") (Hines & Bundy, 2014; Kay, Bundy, & Clemson, 2009a, 2009b; Kay, Bundy, Clemson, Cheal, & Glendenning, 2012; O'Donnell, Morgan, & Manuguerra, 2018). However, original DSDA is not practical for medical practice and requires a trained administrator. Therefore, we further developed the test so it would be suitable for administration by general practitioners. Prior to development of the new screen, we surveyed a representative sample of 200 Australian general practitioners to identify their preferences regarding a driver screen (Brown, Cheal, Cooper, & Joshua, 2013). General practitioners reported they needed a brief (mean and median 10 min), valid, and simple test. Thus, we designed DSDA to be largely self-administered via iPad, with capacity for a practice nurse to set up and supervise the self-administered components.
Because the majority of drivers likely to take touchscreen DSDA will be older adults, we wanted to be sure that older adults could use tablet technology and feel comfortable with it (Cook et al., 2014; Dixon, Bunker, & Chan, 2007; Matthew et al., 2007; Ryan, Corry, Attewell, & Smithson, 2002; White, Janssen, Jordan, & Pollack, 2015). Older adults experience age-related changes that could affect interaction with a digital tablet (e.g. reduced vision, hearing loss, reduced reaction time, reduced coordination, and cognitive changes). We wanted to design an interface that would consider these limitations so that touchscreen DSDA examined the desired construct (i.e. awareness of the driving environment and one's own driving performance) (Kay et al., 2009a) and not individual differences in ability to enter responses via a touchscreen. First, we reviewed optimum touchscreen design guidelines for older adults. The most relevant for this project included: large targets (minimum .31" or 8 mm); large, simple fonts; high-contrast colours; contrasting targets and backgrounds; caution in the design of drag tasks, including testing with seniors first; avoidance of scrolling; simple and meaningful icons; and avoidance of distracting or irrelevant elements (Kobayashi et al., 2011; Loureiro & Rodrigues, 2014; Schneider, Wilkes, Grandt, & Schlick, 2008).
We took a user-centred design approach to avoid potentially costly, time-consuming user-interface problems in the clinical research phase. Hegde (2013) described usability testing as the cornerstone of best practice when designing medical devices. Usability is defined as the extent to which a product or service can be used with efficiency, effectiveness, and satisfaction by the target users to achieve specified goals in the specified context of use (International Organization for Standardization, 2010). The user-centred design philosophy places end users at the centre of the design process (Dorrington, Wilkinson, Tasker, & Walters, 2016;McCurdie et al., 2012). Elements of the design are refined via an iterative process (Hegde, 2013;McCurdie et al., 2012;Rogers & Mitzner, 2016).
We sought to make the touchscreen version of DSDA as similar as possible to the original version in order to retain test validity. However, we were transitioning from a test where a trained administrator collected, interpreted, and scored variable data via participant verbal responses, to a test where variable data were collected via participant touchscreen responses and scored automatically. This necessitated the development of an automatic variable data collection and scoring system that would reflect the decisions that would otherwise have been made by a trained assessor. In the present study, we addressed the research question "Does the touchscreen data collection system we designed collect variable data in a way that is user-friendly for older adults who may be unfamiliar with the technology?" We sought to answer this question by testing the usability of the touchscreen DSDA data collection system with older adults concurrently with touchscreen DSDA software design and programming.

Method
The University of Sydney Human Research Ethics Committee provided approval for the study. We conducted four rounds of usability testing on 4 days over 1 month. Results from each round informed the next stage of design and programming. We aimed to test approximately 10 participants per round. Testing with larger numbers was not considered beneficial because a repeated pattern of errors emerged after trials with 7-10 individuals. These errors needed to be addressed in programming before further feedback was useful. Each participant was tested only once.

Setting
We conducted Round-1 and Round-4 at a large aged-care residential facility. We conducted Round-2 and Round-3 at a community centre. Round-2 occurred within the context of a social group for older adults. Both centres were located in Sydney, Australia.

Participants
We placed an advertisement in community meeting areas at both centres, asking for older adult volunteers to assist in the development of a fitness-to-drive screen for general practitioners. Potential volunteers were informed that their information would be anonymous; we would provide no advice regarding their driving. Volunteers advised centre staff if they wished to participate. A total of 35 adults volunteered: 17 males and 18 females aged between 41 and 89 (mean age 70 years); 83% (29 participants) were over 65. We collected no identifying data. No one withdrew or was excluded after agreeing to participate. Participant characteristics including iPad use are listed in Table 1.
Round-1 and Round-4 participants lived in supported care units. Eleven were ambulant; two mobilised via wheelchair. Three reported a past stroke; three reported hearing impairment; one reported significant vision impairment. Round-2 participants were all retired, ambulant, generally well, driving, and living independently in the community. Round-3 participants were younger and more active than the other groups. All were in paid employment and driving. Inclusion of a middle-aged group allowed comparison with a different generational cohort. The educational status of the sample was: postgraduate degree (number [n] = 1), university degree (n = 7), college certificate (n = 6), completion of high school (n = 5), completion of middle high school (n = 9), completion of primary school (n = 1), and not reported (n = 6).

Instruments
DriveSafe measures awareness of the driving environment (Kay et al., 2008, 2009a). Touchscreen DriveSafe consists of 10 images of a 4-way intersection (see sample image in Figure 1). Each image includes between two and four potential hazards (i.e. people or vehicles). These hazards appear for 4 s then disappear, leaving only the blank intersection. Participants are asked to recall the hazards that were displayed, touching the blank intersection to identify hazard type, location, and direction of movement.

DriveAware measures awareness of one's own driving abilities (Kay et al., 2009a, 2009b). Touchscreen DriveAware (Kay et al., 2009b) consists of two self-administered questions and five questions administered by a general practitioner or suitably qualified health professional (see sample question in Figure 2). The DriveAware items yield a discrepancy score based on the difference between the patient's self-ratings and the clinician's ratings, or performance in DriveSafe.
Touchscreen DSDA was presented to participants via a third-generation iPad (iOS 9) with a 9.7-inch, 2048 × 1536 pixel (264 pixels per inch), multitouch "retina" display. Headphones and a stylus were available to use depending on each participant's preference.

Procedure
We developed a pilot version of touchscreen DSDA based on screen blueprints, which provided a visual guide of the skeletal framework and arrangement of elements in the iPad application. We performed several rounds of in-house testing and quality checks until we had created a satisfactory version for the present study.
The first author trialled the pilot version at the two centres. At the time of the trial, each participant sat on a chair with the iPad on a table in front. Volume and brightness were adjusted to full. Participants adjusted the position of the iPad to suit their focal length and chose whether to use a stylus or headphones. The examiner recorded participant actions and comments during testing, with attention to apparent ability to understand test requirements, operate functions (e.g. tap, drag, undo, and buttons), and evidence of any anxiety or frustration. The examiner also recorded any technical difficulties related to programming. The first author conducted a brief interview with each participant post testing, including questions regarding test difficulty, ability to understand and follow written and audio instructions, and ability to operate the device. After each round of testing, the first author discussed any difficulties encountered and potential solutions with the iPad application developers. Agreed programming and design changes were made and quality checks performed, followed by re-testing with participants. Testing continued until touchscreen DSDA was fully programmed and participants could independently self-administer the test.

Analysis
The analysis focused on functional outcomes: whether participants could understand the test requirements, successfully perform the associated actions via the touchscreen (e.g. tap a target or adjust an arrow direction), and complete the required task in a timely manner and without errors. We assessed these outcomes against the project goals and objectives presented in Table 2.

Results
The following is a summary of the main challenges encountered and the solutions implemented prior to testing in subsequent rounds.

Test set-up
Usability testing provided important insight regarding optimum test set-up. For example, four Round-1 participants forgot to wear their reading glasses and all failed to attend to the screen at the commencement of each item. Some had difficulty entering responses via touch, largely due to incorrect finger angle when the iPad was flat on the table. We addressed these difficulties by adding a written and audio prompt to put reading glasses on if worn; adding a countdown (i.e. "3, 2, 1") and bell to cue timely attention; and placing the iPad on a stand angled to 20 degrees. Results from Round-2 and Round-3 testing indicated these measures had resolved the testing difficulties; we identified no further test set-up challenges.
Three participants reported hearing loss in Round-4. One had a significant loss, did not wear hearing aids, and could not hear the instructions with full volume. However, all three participants reported hearing the instructions clearly once wearing headphones. One participant had a significant hand tremor that impaired touch ability. A stylus solved this problem. Round-2 participants were asked to try both stylus and finger inputs and could use either equally successfully.

DriveSafe
The primary goal for the DriveSafe evaluation was to determine user-friendly input methods.

Object location input
Round-1 participants (n = 9) triggered unwanted object location responses by resting a hand on the screen or by incorrect touch. Also, we encountered technical difficulties for "tap" functions because some objects were situated too close together. We resolved all difficulties via use of an iPad stand and optional stylus, a programmed 0.5-s delay between taps to allow menus to open, and adjustment of object proximity. We encountered no further difficulties in subsequent rounds.
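The programmed 0.5-s delay between taps amounts to a simple debounce: a second touch arriving too soon after an accepted tap is ignored, so an open menu is not immediately dismissed by an accidental touch. The following Python sketch is an illustrative reconstruction only; the class and method names are ours, not the application's actual code.

```python
import time

class TapDebouncer:
    """Accept a tap only if at least `delay_s` seconds have passed since
    the last accepted tap (cf. the study's programmed 0.5-s delay that
    allowed object-type menus to open before the next touch registered)."""

    def __init__(self, delay_s=0.5, clock=time.monotonic):
        self.delay_s = delay_s
        self.clock = clock            # injectable for testing
        self._last_accepted = None

    def accept(self):
        now = self.clock()
        if self._last_accepted is not None and now - self._last_accepted < self.delay_s:
            return False              # too soon after the previous tap: ignore
        self._last_accepted = now
        return True
```

In a touch handler, `accept()` would gate whether the tap is forwarded to the response-entry logic; touches resting on the screen would simply be swallowed.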

Object type input
Round-1 participants had difficulty with object-type menus overlapping each other or the screen edge, preventing option selection. We overcame these difficulties by programming a 1-cm exclusion zone around the perimeter of the screen and by moving close objects further apart. Round-2 participants failed to notice or use the icon depicting two people (see Figure 3(a)). Therefore, we enlarged it and moved it to a more prominent location on the menu (from left to right). We added forced selection of the two-person icon to a practice item and inserted an error message to provide clarification (i.e. "For 2 people walking together, use the 2-person icon"). We encountered no further difficulties in subsequent rounds.
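The exclusion-zone fix can be pictured as clamping each pop-up menu's origin so that the whole menu stays at least a fixed margin inside the screen edge. A minimal Python sketch, with illustrative names and units; this is our reconstruction, not the application's code:

```python
def clamp_menu_origin(x, y, menu_w, menu_h, screen_w, screen_h, margin):
    """Return a menu origin shifted so the menu lies fully on screen,
    at least `margin` units from every edge (cf. the 1-cm exclusion
    zone around the screen perimeter described in the study)."""
    x = max(margin, min(x, screen_w - margin - menu_w))
    y = max(margin, min(y, screen_h - margin - menu_h))
    return x, y
```

A menu requested near a corner is pushed inward, so its options can never render off screen or under the bezel.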

Direction of movement input
We trialled two direction input methods in Round-1: an 8-way arrow icon (i.e. tap the arrow representing the object's direction) and a drag icon (i.e. drag the object in the desired direction). More people preferred "drag" to "arrows" (n = 10:2). One participant found both options difficult. Participants reported drag provided a more accurate reflection of their intended direction stating, "The arrows are not really spot on" and "I wish there was one in the middle". Because the drag function was the most successful, we discontinued arrow testing. Five participants in Round-2 had difficulty with the drag motion, either not dragging the icon far enough or trying to drag it too far. We resolved this in subsequent rounds by fixing the icon at the tap location and snapping out a ghosted arrow once the participant selected the object type. The ghosted arrow became solid once touched (see Figure 3(b)). We also extended the drag radius and required a pivot action, which older adults found easier than drag alone.
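One way to see why the pivot action was easier than a free drag is that it reduces the direction response to a single angle: the arrowhead rotates about the fixed icon centre toward wherever the finger lands, so drag distance no longer matters. A hedged Python sketch of that geometry (our own reconstruction, not the study's code):

```python
import math

def arrow_angle_deg(icon_x, icon_y, touch_x, touch_y):
    """Angle (degrees, 0-360) of an arrowhead pivoted about a fixed icon
    centre toward the touch point. Only the bearing matters: any touch
    along the desired direction yields the same angle, so the user need
    not drag a precise distance."""
    return math.degrees(math.atan2(touch_y - icon_y, touch_x - icon_x)) % 360.0
```

Touches at different distances along the same bearing produce identical angles, which is the property that made the pivot more forgiving than the original drag.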

Figure 3. (a) Object selection menu: the user first taps the icon on the menu representing the object they recalled being displayed. (b) Object direction selector: the user then pivots the arrowhead to indicate the recalled direction of movement; the icon itself remains fixed.

We wanted to make the touchscreen interface user-friendly without prompting responses. For example, in the original version of DriveSafe, patients could "forget" to provide a verbal response for the "direction" category; this contributed to scoring and served to discriminate amongst individuals. We initially tried "no cue" for this category, but each of the 13 participants forgot to enter at least one response. Lack of a visual cue was therefore not useful for discriminating amongst participants of varying ability levels. We addressed the problem of eliciting a response, whilst minimising cueing, by inserting the ghosted arrow. This prompted participants to enter a direction response but, if they failed to respond, they could still proceed.

Item progression
In Round-1, we identified a significant problem in self-administration of item progression via the "next" button. All participants over-focused on tapping this button, missing the brief display of objects. We corrected this by a) delaying the appearance of the "next" button until a first response had been entered; b) adding written and audio instructions once objects had disappeared (i.e. "Tap where you saw the object"); and c) inserting a message at the conclusion of each item (i.e. "Are you sure you have completed this screen?"). Some participants tried to enter responses before objects had disappeared. This was largely resolved by the aforementioned programming changes. Inserting an error message was not viable, as this would have distracted participants from committing the hazards to memory.
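The item-progression fixes amount to a small piece of interface state: the "next" button is hidden until a first response exists, and the first press raises the confirmation message before advancing. An illustrative Python sketch (names, return values, and structure are ours, not the application's):

```python
class ItemScreen:
    """Progression state for one DriveSafe item (illustrative sketch)."""

    def __init__(self):
        self.responses = []
        self._confirm_shown = False

    @property
    def next_visible(self):
        # The "next" button appears only after a first response is entered.
        return len(self.responses) > 0

    def add_response(self, response):
        self.responses.append(response)

    def press_next(self):
        if not self.next_visible:
            return "ignored"      # button not yet shown; tap does nothing
        if not self._confirm_shown:
            self._confirm_shown = True
            return "confirm"      # "Are you sure you have completed this screen?"
        return "advance"          # proceed to the next item
```

Gating the button this way removes the temptation to tap "next" while the hazards are still on screen, which was the behaviour observed in Round-1.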
Designing a user-friendly "undo" process proved challenging in all rounds. Middle-aged and older participants had contrasting experiences and difficulties. We added undo instructions into the demonstration image and forced undo practice because Round-1 and Round-2 older adults did not understand how to use the undo button. However, the middle-aged participants strongly disliked being forced to practice (e.g. stating, "I don't think I should have to press undo. I didn't make a mistake" and "No. I'm not undoing because I did it right"). Making practice optional in conjunction with additional instructions worked well for all adults.

Understanding of test requirements
In Round-1, only one of the 13 participants understood the test requirements. Participants continually pressed the next button, missing many items. We resolved this in Round-2 with the forced practice of incorrect demonstration items and two levels of in-application instructions. To avoid disrupting test flow for participants who quickly understood test requirements, only participants having difficulties received second-level instructions. The wording of some test instructions confused participants in Round-2. For example, one DriveSafe instruction stated, "You will now see 10 images of an intersection". Including numerals confused the participants (e.g. "That's a problem. I thought I needed to look for 10 items"), so we removed them. Field-testing allowed identification of particular words or colours that were problematic. For example, participants failed to understand the word "marker". Thus, we substituted the word "arrow". Participants reported a red message background made them feel like they had done something wrong. We resolved this by changing the background colour to blue.
Round-4 participants had the least iPad exposure: only one participant reported owning an iPad but rarely using it. The others had never used one. One 79-year-old male with significant hearing loss, full vision loss in one eye, and glaucoma in the other eye, presented a particular challenge (he was still driving). He reported he had never seen an iPad and did not understand the touchscreen concept. Despite this, he completed the test successfully and with ease, wearing headphones. The only difficulty he had was working out how to drag the direction icon. The second level of in-application instructions addressed the difficulty and he required no administrator assistance.
We learned about the need to create an administrator-assisted option for test administration because two participants struggled with the practice items (neither had used an iPad before). A brief verbal prompt addressed their difficulties but the first participant became frustrated after five item repetitions. We developed an administrator-assist procedure to meet needs we observed during the study. For example, we concluded that an administrator should intervene after four unsuccessful practice attempts.
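The administrator-assist rule we settled on can be expressed as a simple threshold over practice attempts. A minimal sketch, assuming a boolean per attempt (True = success); the function name and return values are illustrative only:

```python
def run_practice(attempt_results, threshold=4):
    """Walk through practice attempts and decide the outcome:
    ('passed', n) once an attempt succeeds on attempt n, or
    ('administrator-assist', n) once `threshold` attempts have failed,
    matching the study's conclusion that an administrator should
    intervene after four unsuccessful practice attempts."""
    failures = 0
    for n, succeeded in enumerate(attempt_results, start=1):
        if succeeded:
            return ("passed", n)
        failures += 1
        if failures >= threshold:
            return ("administrator-assist", n)
    return ("incomplete", failures)
```

Capping the count avoids the frustration we observed when one participant repeated a practice item five times unaided.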
We chose not to develop a solution for every observed problem. For example, one participant took twice as long as the others to complete the test and demonstrated behaviours not observed amongst other participants (e.g. placing a car in the foreground of every image although there was no object at this location). These difficulties may have related to test design or to the participant's cognitive deficits. Because we did not observe a similar problem in any other participant, we did not develop a programming solution.

DriveAware
Participants self-administered DriveAware with ease through all rounds of testing. Therefore, we made no adjustments. The original design worked well in practice.

Discussion
Older adults in this study found basic touchscreen operations easy to perform without training, even when unfamiliar with the technology. All participants quickly understood how to interact with the iPad, although 23 of 29 participants aged over 65 had never or rarely used one. Participants' overall response to the touchscreen test was positive; spontaneous comments from four participants indicated they found the test face-valid (e.g. "I think it is fair in the fact that it makes you look at your surrounds, which a lot of older people don't, and just basic road rules I guess" and "That seemed quite good. If my doctor made me do this to check my driving, that would be fair").
DriveSafe was the more difficult subtest to convert to iPad administration because it was the more complex aspect of DSDA. The original version relied on an administrator to interpret patient verbal responses. It was fundamental to consider the motor and cognitive performance required for touch-based functions such as "tap" and "drag" in the DSDA conversion (Findlater, Froehlich, Fattal, Wobbrock, & Dastyar, 2013). Consistent with other research (Cockburn, Ahlström, & Gutwin, 2012; Findlater et al., 2013; Kobayashi et al., 2011), participants found tap intuitive and drag more difficult. However, participants were not precise when entering responses via either action. This may be due to parallax, because the target is obscured by the finger, or due to the additional dexterity and pressure required to "hold" and slide the target against resistance (Cockburn et al., 2012; Findlater et al., 2013; Kobayashi et al., 2011; Loureiro & Rodrigues, 2014). The arrow design that avoided the need for a precise dragging action worked better than the other options trialled. We considered this lack of precision when determining scoring in a subsequent study, allowing generous zones to be scored as correct. Consistent with design guidelines for older adults (Loureiro & Rodrigues, 2014), we found the timing of interface elements critical to smooth progression of the test. For example, participants encountered difficulties in Round-1 because the "next" button appeared before it was needed, resulting in over-focus on this button and missed item displays.
Cueing timely attention to the small iPad screen was fundamental in the conversion because participants determined when to progress to the next item. In touchscreen DSDA, a loud bell and countdown timer successfully cued attention via auditory and visual prompts. We did not standardise distance to the screen, based on touchscreen design guidelines for older adults, which recommend that seniors be free to adjust the iPad distance for comfortable viewing (Loureiro & Rodrigues, 2014). This worked well in practice. Participants did not have difficulty observing the smaller hazards, which varied in size from .4" (10 mm) to 2.4" (62 mm).
We tested all text and function changes across age groups as any change significantly impacted performance. The middle-aged cohort had difficulty with different instructions and functions than the older age groups. Inclusion of participants with challenges common to older age, such as hearing and vision loss, tremors, and potential cognitive changes, avoided user-interface problems that might have occurred if we had considered only the needs of able-bodied users. The performance of participants with these challenges informed test set-up and administration procedures and identified the need for an examiner-assisted version of administration.

Limitations
A limitation of this study was the small sample size, which may have resulted in low-probability errors not being detected (Hegde, 2013). However, we considered a range of other sources to mitigate this: literature review, task analysis, prototyping, interviews, expert reviews, and continuous quality checking.

Conclusion
A user-centred design process allowed us to develop a user-friendly touchscreen data collection system for older adults, the group most likely to take touchscreen DSDA and most likely to struggle with the technology. The approach taken allowed us to test and validate our design assumptions and ensured that design errors were not missed. We believe that conducting usability testing concurrently with iPad application design, programming, and evaluation resulted in significant cost, time, and efficiency benefits. A further study was conducted to develop an automatic data scoring system reflecting the decisions that would otherwise have been made by an expert rater (Cheal, Bundy, Patomella, Scanlan, & Wilson, 2018). A separate study examined the psychometric properties and predictive validity of data gathered with touchscreen DSDA before it could be used in clinical practice (Cheal & Kuang, 2015).