Effects of Different Types of Cognitive Training on Cognitive Function, Brain Structure, and Driving Safety in Senior Daily Drivers: A Pilot Study

Background. The increasing proportion of the elderly in the driving population raises the importance of assuring their safety. We explored the effects of three different types of cognitive training on the cognitive function, brain structure, and driving safety of the elderly. Methods. Thirty-seven healthy elderly daily drivers were randomly assigned to one of three training groups: Group V trained in a vehicle with a newly developed onboard cognitive training program, Group P trained with a similar program but on a personal computer, and Group C trained to solve a crossword puzzle. Before and after the 8-week training period, they underwent neuropsychological tests, structural brain magnetic resonance imaging, and driving safety tests. Results. For cognitive function, only Group V showed significant improvements in processing speed and working memory. For driving safety, Group V showed significant improvements both in the driving aptitude test and in the on-road evaluations. Group P showed no significant improvements in either test, and Group C showed significant improvements in the driving aptitude test but not in the on-road evaluations. Conclusion. The results support the effectiveness of the onboard training program in enhancing the elderly's abilities to drive safely and the potential advantages of a multimodal training approach.


Introduction
The proportion of the elderly in the driving population is increasing rapidly, especially in advanced countries, and assuring their safety as drivers is an issue of huge importance [1,2]. Promotion of age-based license reassessments and the recommendation of driving cessation are measures undertaken in a number of countries. However, in widespread suburban areas, a private car is a necessary transportation method and driving cessation can result in a large negative impact on quality of life [3]. Thus, methods that enhance driving safety and prolong the mobility of elderly people are important.
Previous studies have indicated that cognitive function has a significant effect on the driving safety of elderly people, both in healthy people and those with mild cognitive impairment [4][5][6] (however, also see [7]). By focusing on this association between cognitive function and driving safety, many studies have introduced different types of cognitive training and investigated their transfer effects on driving safety and the maintenance of mobility [8][9][10][11][12][13].
With respect to the general outcomes of cognitive and behavioral training in the elderly, studies have shown the effects of such training interventions on neural structure [14][15][16][17][18][19] and function [20,21] in the elderly. However, to the best of the authors' knowledge, there have been no studies specifically addressing the possible association among cognitive enhancement, neural plasticity, and changes in driving safety. Simultaneously investigating the changes in these aspects can help attain a deeper understanding of how the mechanisms of cognitive training can result in changes in behaviors as complicated as driving. Moreover, by comparing the effects of different types of training on cognitive function, brain structure, and driving safety, one can expect to obtain insight into how to design a more effective training program for the optimal driving safety of the elderly. Therefore, we conducted a pilot intervention study and investigated the effects of three different types of cognitive training on cognitive function, brain structure, and the driving safety of the elderly.
Our first aim was to verify the effectiveness of an onboard cognitive training system that we devised to support the elderly in attaining and maintaining their ability to drive safely. The system was designed to enhance processing speed, executive control, working memory, visual processing, and divided attention, which have all been suggested to have significant relevance to driving safety [4,6,22]. It provided two types of training tasks (with different levels of working memory demands), which required the users in the driver's seat to respond to light stimuli presented in their peripheral visual field with the steering wheel and brake pedal, according to a set of given rules. A more detailed description of the system is provided in Materials and Methods. The system was implemented onboard with the aim of providing elderly drivers with casual and effective cognitive training on a daily basis. To explore the possible positive effects of setting up an onboard cognitive training system, we included a training group that had cognitive training on a personal computer (PC) with a training program that was similar to that used in the onboard cognitive training but that differed in several driving-related factors, such as the required field of view and the motor control involving distributed parts of the body.
Another aim of the study was to investigate the effects of training in a daily intellectual activity, such as puzzle solving, on driving safety. A considerable number of studies have indicated the beneficial effects of such activities in the enhancement or maintenance of cognitive functions in the healthy elderly [23][24][25][26]. However, inconsistencies and debates on the effectiveness exist [27][28][29], and the effects of training in cognitively engaging daily activities are still unclear. In particular, their effects on driving safety have not been directly addressed. Therefore, in this study, we included a crossword-puzzle training group as another cognitive training group, and this type of cognitive training had less apparent relevance to driving safety.
Thirty-seven healthy elderly subjects (age range, 60-75 years) were randomly assigned to one of three training groups: Group V trained in a vehicle with onboard cognitive training, Group P trained with cognitive training on a PC, and Group C trained in solving crossword puzzles. These groups had 24 sessions of training, with each session taking 20 min, over a period of 8 weeks. Before and after the training period (Pre and Post), they underwent tests on cognitive function, brain structure, and driving safety. The changes in cognitive functions induced by the different types of training were evaluated by a neuropsychological test battery that covered a broad range of cognitive functions. The changes in brain structure were investigated using voxel-based morphometry (VBM) analysis of regional gray matter volume (rGMV) and a tract-based spatial statistics (TBSS) analysis of white matter integrity. VBM is a method that is widely used to investigate regional gray matter structural changes induced by training interventions [15,18,19,30,31,32,33]. Fractional anisotropy (FA) maps obtained from diffusion tensor imaging (DTI) give indicators of white matter fiber structural integrity, and changes in FA have also been used to characterize brain structural changes that are induced by training interventions [15,34,35,36,37,38]. TBSS is a method that aims to alleviate the problems of cross-subject FA-map alignment [39]. Driving safety was evaluated by professional driving instructors while the subjects were actually driving on the course of a driving school as well as with a driving aptitude test unit.

Ethics Statement.
This study was approved by the Ethics Committee of the Tohoku University Graduate School of Medicine. Written informed consent was obtained from each subject. The study was conducted according to the principles expressed in the Declaration of Helsinki.

Randomized Controlled Trial Design.
Although it was a pilot study, this study was registered in the UMIN Clinical Trial Registry (UMIN000006268). The study was conducted between July 2011 and December 2011 in Sendai City, Miyagi Prefecture, Japan. The flow diagram of this study is shown in Figure 1. The protocol was approved by the Ethics Committee.
The study was a double-blind intervention. The subjects and testers were blinded to the study's hypothesis. The subjects were only informed that the study was designed to investigate the effects of three different training programs, and they were blinded to the training of the other two groups. The testers were blinded to the group membership of the subjects. A researcher (T. N.) randomly assigned the subjects who were stratified by sex to one of the three groups by a random draw with a computer (see below for details).

Subjects.
The subjects were recruited from local residents through advertisements in a local town paper. The interested applicants were screened first by a semistructured telephone interview and then using a questionnaire. Thirty-nine eligible applicants were invited to Tohoku University for a briefing, and two people declined to participate after the study was explained to them. All consented subjects (n = 37) were right-handed, and they were native Japanese speakers. They were aged within the range of 60-75 years at the time of participation, and they were daily drivers who owned a car and who drove more than three times a week on average. The subjects were not using any medications known to interfere with cognitive function, including benzodiazepines, antidepressants, or other central nervous agents. They had no history of head trauma, mental disease, or diseases known to affect the central nervous system, including thyroid disease, multiple sclerosis, Parkinson's disease, stroke, severe hypertension, or diabetes. To exclude those subjects with potential dementia, the exclusion criterion of Mini-Mental State Examination (MMSE) scores less than 25 was applied [40]. None of the subjects were excluded on the basis of this criterion.
All the subjects provided informed consent to participate in this study, under a procedure approved by the Ethics Committee of the Tohoku University Graduate School of Medicine. After informed consent was obtained, the subjects were randomly assigned to one of the three groups (Group V, P, or C, explained below). Male and female elderly drivers were expected to show distinct driving behaviors [41]. To counterbalance the effect of such baseline differences, stratified randomization was performed [42]. Subjects were blocked into separate strata by sex and allocated separately into the three groups by a random draw from a computerized random number generator, achieving a balanced sex ratio among the groups. One subject withdrew consent before the group allocation, and one subject allocated to Group P dropped out during the intervention period, resulting in a final total of 35 subjects who were included in the analyses (Figure 1). Table 1 summarizes the baseline demographics and neuropsychological characteristics of the subjects. We observed no significant differences in any of the characteristics among the groups.
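The stratified allocation described above can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `stratified_assign`, the fixed seed, and the rule of dealing shuffled subjects round-robin into a shuffled group order are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_assign(subjects, groups=("V", "P", "C"), seed=0):
    """Assign subjects to groups separately within each sex stratum.

    subjects: list of (subject_id, sex) tuples.
    Returns a dict mapping subject_id -> group label.
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    strata = defaultdict(list)
    for subject_id, sex in subjects:
        strata[sex].append(subject_id)

    assignment = {}
    for sex in sorted(strata):
        members = strata[sex]
        rng.shuffle(members)  # random draw order within the stratum
        order = list(groups)
        rng.shuffle(order)  # randomize which groups receive any extra subject
        for i, subject_id in enumerate(members):
            assignment[subject_id] = order[i % len(order)]
    return assignment
```

Within each stratum the group sizes produced this way differ by at most one, which is what keeps the sex ratio balanced across groups.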

Overview of the Interventions.
The subjects were randomly assigned to one of the three intervention groups, Group V, P, or C, each of which involved a specific type of training program: Group V trained in a vehicle with an onboard system that we developed, Group P trained with a similar program but on a PC, and Group C trained to solve crossword puzzles. Subjects visited our laboratory 24 times (3 days a week for 8 weeks) and they performed 20 min of training each time. All the training was conducted in the presence of time keepers (part-time university students) who confirmed proper task execution by the subjects and provided the necessary help with regard to operating the training programs. Before and after the training intervention period, the subjects underwent neuropsychological and behavioral tests, structural brain imaging [magnetic resonance imaging (MRI)], functional brain measurement by magnetoencephalography (MEG; not described in this paper), and two types of driving safety tests. The primary outcome measure was the driving aptitude test. The preintervention (Pre) tests were conducted within a week before the start of each subject's intervention period, and the postintervention (Post) tests were conducted within a week after the end of the intervention period. In the following sections, the training and Pre/Post tests are explained in more detail.

Group V: In-Vehicle Cognitive Training
Onboard Cognitive Training System. We developed a cognitive training system that, once installed, can be utilized easily and constantly by elderly daily drivers. Considering their relative tendency to be less familiar with and/or enthusiastic about using computers and video games, we expected that the elderly people would engage in a cognitive training more easily and continuously if it was implemented onboard for use during the frequent occasions of daily driving (e.g., before leaving/after coming home, or in the parking lot of a shopping mall). It was also expected that the training program situated onboard, entailing the use of the visual field common to driving and operable with the familiar steering wheel and brake pedal, would facilitate the transfer of training effects to actual driving situations. To ensure safety, the system was intended to be used while the car was parked.
The system consisted of five light-emitting diodes (LEDs), a speaker, a controller area network interface, and a mobile computer. The LED lights were placed around the driver's seat in the peripheral visual field of the driver. Four lights (top-right, bottom-right, top-left, and bottom-left) were located approximately equidistant from the center of the driver's view, and one light was beside the left (opposite to the driver's seat) side mirror. The location beside the side mirror was chosen considering the importance of paying attention to the side mirror for driving safety. In the training task, the LEDs rhythmically and randomly presented a pattern of lights with various colors. The users responded with the steering wheel and brake pedal according to a set of given rules. The accelerator pedal was not used. Audio feedback informed the subjects of the accuracy of their responses.
The system provided the following two types of training tasks: an immediate response task and a delayed response task. The immediate response task focused on the functions of processing speed, executive control, visual processing, and divided attention. The delayed response task focused on those listed above and working memory. The rules of the two tasks are explained below.
Immediate Response Task. The users responded to the rhythmically presented light stimuli on the four LEDs located around the center with the steering wheel. When two lights of the same color (green or blue) were presented on the same side (left or right), the subjects were to turn the wheel in the opposite direction as if to avoid an obstacle on that side. Yellow lights were distractors, and thus the subjects were to suppress their response even if two yellow lights were presented on the same side. In addition, red lights were presented randomly on any of the five LEDs, sometimes in synchronization with the rhythm and sometimes out of the rhythm. The users were to respond to the red light regardless of its position. In particular, when two lights of the same color (green or blue) were on the same side and a red light was on one of the remaining LEDs, they were required to conduct both steering and braking responses. Both the steering wheel and brake were to be operated when the corresponding stimuli were shown, and they were to be returned to the neutral position thereafter.
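The response rules above can be written as a small decision function. This is a sketch under stated assumptions, not the system's implementation: the position names, the `immediate_response` function, and the dictionary encoding of a stimulus frame are hypothetical. The text does not say what happens when both sides show a valid pair at once; in that case this sketch simply lets the right-side pair decide.

```python
SIDES = {
    "top_left": "left", "bottom_left": "left",
    "top_right": "right", "bottom_right": "right",
}
TARGET_COLORS = {"green", "blue"}  # yellow pairs are distractors

def immediate_response(lights):
    """Return (steer, brake) for one stimulus frame.

    lights: dict mapping LED position -> color (or None when off). Positions
    are the four SIDES keys plus "mirror" for the LED beside the side mirror.
    steer is "left", "right", or None; brake is True or False.
    """
    steer = None
    for side in ("left", "right"):
        colors = [lights.get(pos) for pos, s in SIDES.items() if s == side]
        # Two lights of the same target color on one side: steer away from it.
        if colors[0] == colors[1] and colors[0] in TARGET_COLORS:
            steer = "right" if side == "left" else "left"
    # A red light on any of the five LEDs calls for braking.
    brake = any(color == "red" for color in lights.values())
    return steer, brake
```

For example, two blue lights on the right together with a red light beside the mirror require both a left steering response and a braking response.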
Delayed Response Task. In this task, the light stimuli were presented in the same way as in the immediate response task. The difference was that the users were to delay their steering responses to the periodic stimuli by N (= 1, 2, . . .) steps. Therefore, the users were required to simultaneously judge the current light pattern, memorize the corresponding operation in order to conduct it N steps later, and conduct a wheel operation in response to the N-steps-back stimuli if necessary. The load on working memory increased as a larger N was used. In the present study, we used N = 1 for the first 12 visits and N = 2 for the later 12 visits. In contrast, brake responses to the random red lights were to be given without delay, as in the immediate response task. In particular, the users were to conduct both steering and braking responses when two lights of the same color (green or blue) were on the same side N step(s) before and a red light was on one of the remaining LEDs.
Table 2: Relationship between the difficulty levels and the periods of rhythmic stimuli in the immediate response task and delayed response task.
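The delay rule for steering can be sketched as a shift of the cued steering commands by the number of delay steps (called `n` here). The function name and the list encoding are assumptions for illustration; braking is not delayed and is therefore left out.

```python
def delayed_steering(commands, n):
    """Shift a sequence of per-step steering commands back by n steps.

    commands: list of "left", "right", or None, one per rhythmic stimulus.
    Returns the steering response expected at each step: the command that
    was cued n steps earlier, or None for the first n steps.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    padding = [None] * min(n, len(commands))
    return padding + commands[: max(len(commands) - n, 0)]
```

With a delay of two steps, a "left" cue at the first stimulus is answered at the third stimulus, so the user must hold two pending operations in mind while judging the current pattern.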
Adaptive Training with Dynamically Adjusted Tempo. In both tasks, three successive correct responses induced a change in the task to a tempo in which the rhythmic stimuli were one level faster, and two successive wrong responses induced a change to a level that had a slower tempo. There were 10 levels for tempo, and Table 2 shows the period of the rhythmic stimuli for each level. The dynamically adjusted tempo made the task involve adaptive training, in which the difficulty was always adequately challenging despite the users' changing learning levels. In addition, the training system provided three types of level-dependent background music, with changes between levels 3 and 4 and between levels 6 and 7.
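The tempo adjustment rule can be sketched as a simple staircase. The class name, the starting level, and the choice to restart a run counter after each level change are assumptions; the text specifies only the three-correct and two-wrong rules and the 10 levels.

```python
class TempoStaircase:
    """Three-up/two-down tempo adjustment over levels 1..10."""

    def __init__(self, level=1, min_level=1, max_level=10):
        self.level = level
        self.min_level = min_level
        self.max_level = max_level
        self.correct_run = 0
        self.wrong_run = 0

    def update(self, correct):
        """Record one response and return the (possibly changed) level."""
        if correct:
            self.correct_run += 1
            self.wrong_run = 0
            if self.correct_run == 3:
                self.level = min(self.level + 1, self.max_level)
                self.correct_run = 0  # assumption: the run counter restarts
        else:
            self.wrong_run += 1
            self.correct_run = 0
            if self.wrong_run == 2:
                self.level = max(self.level - 1, self.min_level)
                self.wrong_run = 0  # assumption: the run counter restarts
        return self.level
```

Because it takes more consecutive successes to speed up than consecutive failures to slow down, a rule of this kind tends to settle at a tempo the user can handle most, but not all, of the time.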
Training Procedure. The onboard system was installed in a passenger car placed in the parking lot of our institute, and the subjects in Group V individually came and performed two sessions of the immediate response task and two sessions of the delayed response task on each training day. Each session took 5 min. For the delayed response task, the required delay N for the steering response to the periodic stimuli was set to N = 1 for the first 12 training days and to N = 2 for the later 12 training days. At the end of each training session, the system provided visual feedback that informed the subject of the transition of performance.

Group P: On-PC Cognitive Training
On-PC Training Tasks. As a counterpart of the onboard training system, we also developed an on-PC cognitive training system. This system also provided the two types of tasks, the immediate response and delayed response tasks, with the same rules as those explained above. The main differences were as follows. (1) This system presented colored stimuli on a PC display instead of LED lights. The four LEDs around the center of the users' visual field were replaced by four corners of a 24-inch display. The visual angle from the center to the four stimuli positions was 20°. The fifth distant LED beside the side mirror was replaced by stimuli presentation at the center position. (2) Instead of a steering wheel and brake pedal, all responses were made with a three-button computer mouse. Steering to the left and right was replaced by clicking the left button with the right index finger and the right button with the right ring finger, respectively. The brake pedal was replaced by clicking the middle button with the right middle finger.
Because of the shared features of the tasks, we assumed that the basic demands on the functions of processing speed, executive control, and working memory would not be different from that of the onboard cognitive training. However, we also expected that because of the comparatively narrower visuospatial and modality distributions in the input stimuli (limited use of peripheral visual field) and the output effectors (only right-hand fingers) than those of the onboard training, the effects on the functions of visual processing and divided attention would be smaller. In addition, the differences in the similarities to actual driving were expected to make the transfer of this on-PC training to driving safety less likely than that of the onboard training.
Training Procedure. Each subject in Group P visited a room in our institute, and they had two sessions of the immediate response task and two sessions of the delayed response task on a PC each time. The parameters of the training tasks were the same as those of the onboard training, and each session took 5 min. For the delayed response task, the required delay N for the left/right-button response to the periodic stimuli was set to N = 1 for the first 12 training days and to N = 2 for the later 12 training days. At the end of each training session, the system gave visual feedback that informed the subjects of the transition of performance.

Group C: Crossword Training Group
Crossword Training Task. We adopted a number crossword puzzle as a training task for Group C. This is a version of a kanji crossword puzzle with numbers in which squares with the same number should contain the same kanji character. Some cells are filled with kanji characters in the initial state, and a set of available characters is also provided. There are no additional clues. The goal is to fill all the white numbered squares so that every succession of white squares makes a meaningful word. We used a PC program that was played using a computer mouse.
For successful performance, the task was supposed to require the ability to make inferences, processing speed, and executive function in order to efficiently try various possibilities, working memory to keep in mind the possible choices and hypotheses, and divided attention to check the consistency of the consequences at different positions simultaneously. Vocabulary was also required; however, it was not critical as the answers consisted of words that most people know.
Training Procedure. Each of the subjects in Group C visited a room in our institute, and they were trained on solving the number crossword puzzle on a PC program for 20 min each time. All the operations were performed using a computer mouse. Their progress was recorded, and if they could not finish a problem in the 20 min, they continued it the next time they came in. If they completed a problem before the time limit, they proceeded to the next problem. If they gave up on a problem, the answer was shown, and they then proceeded to the next problem.

Neuropsychological Tests.
To evaluate the possible effects of the three types of cognitive training on a broad range of cognitive functions, the subjects underwent the following neuropsychological tests in the Pre and Post tests: MMSE [40], the Block Design (BD) subtest in the Wechsler Adult Intelligence Scale (WAIS) III [43], the Frontal Assessment Battery (FAB) at bedside [44], the Word Fluency Test (WFT) [45], the Trail Making Test (TMT) [46], the Symbol-Digit Modalities Test (SDMT) [47], the Spatial Span subtest of the Wechsler Memory Scale-Revised (WMS-R) [48], the Benton Visual Retention Test (BVRT) [49], the Rey-Osterrieth Complex Figure Test (CFT) [50], the Rey Auditory-Verbal Learning Test (AVLT) [51], and the Judgment of Line Orientation (JLO) [52]. The MMSE was used to screen for cognitive impairment, and subjects were required to have a score of 25 or more for inclusion in the analyses (no subjects were excluded). The other test measures were used to evaluate cognitive functions, as explained in the next section.

Analyses of Cognitive Functions.
One way to obtain a general and stable measure of a cognitive construct is to calculate a composite measure of multiple neuropsychological tests relevant to the cognitive function [5,37,53,54]. By summing up the standardized scores (mean, 50; standard deviation, 10) of the relevant test measures explained above with some required sign-flipping so that all the scores indicated better performance when they were higher, we evaluated the following three composite measures for the specific cognitive function domains: (1) processing speed composite, which included TMT-A and SDMT; (2) executive function composite, which included FAB, WFT, and TMT-B; and (3) working memory composite, which included the Spatial Span (SS), BVRT, and AVLT. As stated above, these cognitive domains were among the main targets of the onboard cognitive training. Therefore, we hypothesized that Group V as well as Group P would show significant improvements in these cognitive domains. In addition, by combining BD, WFT, TMT-B, BVRT, CFT-Copy, CFT-Recall, AVLT, and JLO, we calculated a composite measure of cognitive impairment (COGSTAT), which has been reported to be a significant predictor of driving safety in drivers with Alzheimer's disease [5,53,55].
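The composite calculation can be sketched as follows. The text does not state which reference sample was used for standardization; this sketch assumes standardization across the subjects passed in, using the sample standard deviation. The function names `t_scores` and `composite` are hypothetical.

```python
from statistics import mean, stdev

def t_scores(raw, higher_is_better=True):
    """Standardize raw scores to mean 50, SD 10, flipping sign if needed."""
    m, s = mean(raw), stdev(raw)  # assumption: sample SD over these subjects
    sign = 1.0 if higher_is_better else -1.0
    return [50.0 + 10.0 * sign * (x - m) / s for x in raw]

def composite(tests):
    """Sum standardized scores across tests for each subject.

    tests: list of (raw_scores, higher_is_better) pairs, each raw_scores
    list holding one value per subject in the same subject order.
    """
    standardized = [t_scores(raw, hib) for raw, hib in tests]
    return [sum(values) for values in zip(*standardized)]
```

For a processing speed composite, a completion time such as TMT-A would be passed with `higher_is_better=False` and a correct-response count such as SDMT with `higher_is_better=True`, so that a higher composite always means better performance.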
The improvements (Post − Pre) in each composite measure were computed for each subject. To exclude the possible influence of preexisting factors of noninterest, the improvement values of all subjects were fit to a general linear model with their mean-centered age, sex, and Pre scores (baseline) as explanatory variables, and they were adjusted for these confounding variables. Then, the significance of the training effects on the improvements in each composite within each intervention group was tested with the Wilcoxon signed-rank test (one-sided with the alternative hypothesis being that the improvement is positive). We used the nonparametric Wilcoxon test because some of the outcome variables did not satisfy the assumption of normal distribution on which the parametric t-test is based. Because of the exploratory nature of this study, the test results were considered significant at P < 0.05 with the Benjamini-Hochberg (BH) procedure [56] applied to the combined data of the cognitive composites and the driving safety for each group in order to control the false discovery rate (FDR).
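The BH correction can be sketched as the standard step-up adjustment of a family of P values. The function name is hypothetical, and how the family of tests was assembled for each group is taken from the text rather than reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then called significant when its adjusted value is below 0.05, which controls the expected proportion of false discoveries among the rejected hypotheses.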
In addition, for each pair of groups, we conducted the Wilcoxon rank-sum test (two-sided) in order to explore the significant group differences in the improvements in the composites. The test results were considered significant at P < 0.05 with the BH procedure applied in order to control the FDR.
[57] method in MATLAB (The MathWorks, Inc., Natick, MA, USA). The New Segmentation algorithm was applied to all T1-weighted images both before and after the interventions to extract tissue maps that corresponded to gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF). To reduce the risk of missegmenting irrelevant tissues, such as dura, into GM, we used a modified gray matter tissue probability map (TPM) that was obtained by setting to zero the voxels of the standard GM TPM in SPM8 that had values less than 0.25. The GM tissue images obtained were then subjected to the DARTEL template creation tool. A study sample-specific template and the nonlinear deformations that best aligned the images with the template were obtained by iteratively registering the imported images with their averages. The DARTEL normalization tool was used to apply the estimated deformations and spatially normalize the tissue images to the template and then to the Montreal Neurological Institute (MNI) standard space with a 12-parameter affine linear transformation. The warped images were modulated by the Jacobian determinants that were derived from the nonlinear deformation parameters to compensate for individual local volume deformations [58]. Then, all the warped and modulated GM images were smoothed by convolution with an isotropic Gaussian kernel of 12 mm full-width at half maximum. Finally, the signal changes in the rGMV between the Pre and Post images were computed at each voxel for each subject. In this computation, we included only the voxels that showed rGMV values >0.10 in both the Pre and Post images in order to avoid possible partial volume effects around the tissue borders.
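The final step, the voxel-wise rGMV change restricted to voxels above 0.10 in both scans, can be sketched on flat lists of voxel values. The function name and the use of `None` to mark excluded voxels are assumptions; the actual analysis operated on image volumes in SPM8.

```python
def rgmv_change(pre, post, threshold=0.10):
    """Voxel-wise Post - Pre change, keeping only well-populated voxels.

    pre, post: equal-length lists of rGMV values for one subject.
    Returns a list with the change at included voxels and None elsewhere.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same number of voxels")
    return [
        b - a if (a > threshold and b > threshold) else None
        for a, b in zip(pre, post)
    ]
```

Requiring the threshold in both scans means that a voxel is dropped whenever either time point suggests it lies near a tissue border.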
The processed maps representing the rGMV changes between the Pre and Post scans (rGMV Post − rGMV Pre) were then forwarded to the whole-brain group level analysis. Within each intervention group, we tested the rGMV changes (both positive and negative) with a general linear model (one-sample t-test with covariates). In addition, for each pair of groups, we performed one-way analysis of covariance (two-sample t-tests with covariates) in order to find the significant differences in the ways that the rGMV changed between the groups. In both types of tests, the mean-centered age, sex, total gray matter volume in the Pre scan, and total intracranial volume in the Pre scan were included in the model as covariates. The statistical model estimation was performed with SPM8, and the level of statistical significance was set at P < 0.05, family-wise error corrected for multiple comparisons at the nonisotropic adjusted cluster level [59], with an underlying voxel-level threshold of P < 0.001. For the correction, the NS toolbox (http://fmri.wfubmc.edu/cms/software#NS) was used. Nonisotropic adjusted cluster-size tests can and should be applied when cluster-size tests are used for data that is known to be nonstationary (in other words, not uniformly smooth), as is the case for VBM data [59].

Tract-Based Spatial Statistics on Diffusion Tensor Images.
The preprocessing and tract-based spatial statistics (TBSS) analyses of the DTI data were performed with the FSL software package (http://www.fmrib.ox.ac.uk/fsl/, Version 5.0), according to the standard procedures described previously in [39]. First, the DTI images of each subject and each time point (Pre and Post) were prealigned with each other in order to correct for eddy current, distortion, or head motion. Then, the diffusion tensor was calculated and used to obtain an FA map, and nonbrain voxels were excluded. The obtained FA maps were subjected to noise reduction by eroding exterior noise-susceptible voxels, aligned to the FMRIB58 FA MNI standard-space image (1 × 1 × 1 mm³) by nonlinear registration, and averaged to obtain a study-specific mean FA image. A tract skeleton was generated from the mean FA image, and it was thresholded for FA values more than 0.2 in order to restrict further analysis to points within the white matter that had been successfully aligned across subjects. All individual FA maps were projected to the thresholded mean FA skeleton, resulting in the skeletonized FA data for each subject and time point. The projection to the mean FA skeleton is the essence of the TBSS analysis to reduce potential problems of misregistration as a source of false-positive or false-negative results in the subsequent voxel-wise cross-subject statistics [39].
In the same manner as the VBM analysis on the rGMV, the changes in the projected FA signals between Pre and Post (FA Post − FA Pre) were computed for each subject at each voxel in the mean FA skeleton and subjected to statistical tests of the effects of the interventions. We conducted a one-sample test of the FA changes within each group and a two-sample test of the differences in the FA changes between each pair of groups. Mean-centered age and sex were included in the model as covariates. The statistical model estimates were performed with permutation-based cross-subject statistics [60] implemented as the Randomise Tool in FSL. A total of 5,000 permutations were performed for each contrast, and statistical inferences were made with the threshold-free cluster enhancement (TFCE) [61] correction for multiple comparisons. Fully corrected P values less than 0.05 were considered significant.
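The idea behind permutation-based correction for multiple comparisons in a one-sample design can be sketched with sign flipping and a maximum statistic. This is a deliberately simplified stand-in, not what Randomise does in this study: it uses the voxel mean instead of a t statistic, omits covariates and TFCE, and enumerates every sign pattern instead of sampling 5,000 permutations. The function name is hypothetical.

```python
from itertools import product

def signflip_max_stat(changes):
    """Family-wise corrected one-sided p-values via exhaustive sign flipping.

    changes: list of subjects, each a list of Post - Pre values per voxel.
    Returns one corrected p-value per voxel for the hypothesis mean > 0.
    """
    n_subjects = len(changes)
    n_voxels = len(changes[0])

    def voxel_means(signs):
        return [
            sum(s * subj[v] for s, subj in zip(signs, changes)) / n_subjects
            for v in range(n_voxels)
        ]

    observed = voxel_means((1,) * n_subjects)
    # Null distribution of the maximum statistic across voxels.
    max_null = [max(voxel_means(signs))
                for signs in product((1, -1), repeat=n_subjects)]
    return [sum(m >= obs for m in max_null) / len(max_null) for obs in observed]
```

Comparing each voxel against the distribution of the maximum over all voxels is what makes the resulting P values corrected across the whole skeleton. Exhaustive enumeration is only practical for small groups, since the number of sign patterns doubles with each subject.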

Driving Aptitude Tests with a Simulator Test Unit.
Subjects were tested with the CG400 driving aptitude test unit (TKK 7020, Takei Scientific Instruments Co., Ltd., Niigata City, Japan) [62]. This computer-based driving ability test unit has been licensed by the National Police Agency in Japan, and it has been installed in many driving schools. It is widely used as an evaluation system for the driving aptitude of elderly people. The test unit is equipped with a cathode-ray tube monitor, a steering wheel, an accelerator, and brake pedals, and it is in accordance with the National Police Agency System Guidelines [63]. It provides tests on the following four types of tasks: simple response, selective response, handling operation, and divided attention to multiple tasks. Based on the performances of these tasks, the test unit evaluates five-level grades on 14 driving-related aspects: (1) speed of simple responses, (2) stability of simple responses,

On-Road Evaluation by Driving School Instructors.
Subjects visited a local driving school and had a driving test before and after the intervention period. The tests were scheduled on a day different from the days when the other parts of the Pre/Post tests were conducted at our institute but within a week before the start of each subject's intervention period (Pre) and within a week after the end of the intervention period (Post). The tests were conducted during the daytime but not on days with extremely bad weather with limited view or with snow cover on the roads.
The driving test was conducted in the training course of the driving school. Subjects drove a 1.7 km route with  Table 3, and the counts were converted to a five-level grade for each check item, resulting in a total grade of a maximum of 130 for driving with no unsafe actions counted.

Analysis of Driving Safety Changes.
The total grade given by the driving aptitude test unit was analyzed in the same manner as the cognitive composite measures: improvements (Post − Pre) of the total grade were computed for each subject and adjusted for age, sex, and Pre (baseline) total grades. The within-group training effects on the improvements were tested with the Wilcoxon signed-rank test (one-sided with the alternative hypothesis being that the improvement is positive). The between-group differences in the improvements were tested with the Wilcoxon rank-sum test (two-sided).
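As an illustration of the within-group test, the following sketch computes an exact one-sided Wilcoxon signed-rank P value in plain Python. It assumes the improvements have already been adjusted for age, sex, and baseline, drops zero differences, and assigns average ranks to ties; the function names and the example data are hypothetical and not taken from the study.

```python
from itertools import product


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def signed_rank_one_sided(improvements):
    """Return (W+, exact one-sided p) for H1: improvements tend to be positive."""
    diffs = [d for d in improvements if d != 0]  # zero differences are dropped
    ranks = midranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under H0 each rank is equally likely to carry a positive or negative sign.
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus:
            hits += 1
    return w_plus, hits / 2 ** len(ranks)


# Hypothetical adjusted improvements for five subjects.
w_plus, p = signed_rank_one_sided([1, 2, 3, 4, 5])
```

The exact enumeration is practical only for small groups; for larger samples a statistics package or the normal approximation would normally be used.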
The total grade of the on-road driving safety test was first adjusted for the effect of the testers (driving instructors). Then, the data were subjected to the same within-group and between-group analyses as the driving aptitude measure. The means and SEMs of each of the changes in the measures were calculated from the differences in the measure (Post − Pre) that were adjusted for age, sex, and the Pre (baseline) score of the measure. The P values for each group show the significance of the improvements (Post − Pre > 0) obtained from the Wilcoxon signed-rank tests (one-sided) and corrected by the Benjamini-Hochberg (BH) procedure to control the false discovery rate (FDR). The P values for each pair of the groups were from the Wilcoxon rank-sum test (two-sided), which explored the significant group differences in the improvements of each measure with a correction for multiple comparisons with the BH procedure in order to control the FDR.
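The BH adjustment mentioned above can be sketched as follows. This is a generic implementation of the Benjamini-Hochberg step-up procedure, not the authors' code; the function name and the input P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for four measures tested within one group.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Because of the monotonicity step, neighboring raw P values can map to the same adjusted value, so identical corrected P values for different measures are not unusual.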

Performance Progress in the Training Tasks.
As explained in the Materials and Methods, the training tasks of Groups V and P consisted of adaptive training in which the tempo of the presentation of the stimuli (trials) was dynamically adjusted according to the subject's performance level. Hence, the numbers of presented and correctly processed stimuli worked as indicators of the performance progress of the trained tasks. Subjects in both Groups V and P showed sustained progress in the performances of the two training tasks.
In Group C, we did not have an exact quantitative evaluation of the improvements in the ability to perform the training task because the crossword puzzle problem solved by each subject differed day by day according to his/her progress. However, the recordings of the time keepers confirmed that all the subjects made progress so that their answer characters were found more quickly and steadily.

Changes in Cognitive Function.
To characterize the effects of each training intervention on cognitive function, we conducted within-group tests of the improvements in the composite measures of three cognitive domains (processing speed, executive function, and working memory) as well as in a composite measure of cognitive impairment (COGSTAT) [5]. The changes in the composite scores (Post − Pre) were adjusted for age, sex, and the Pre (baseline) score, and the hypothesized improvements (Post − Pre > 0) were tested using the one-sided Wilcoxon signed-rank test for each group, with the BH procedure [56] employed to correct for multiple comparisons. Figure 2 and Table 4 show the summaries of the intervention effects and their significance. Group V showed significant improvements in the processing speed (P = 0.048) and working memory (P = 0.048) composites as well as marginally significant improvements in the executive function (P = 0.076) and COGSTAT (P = 0.076) composites.
Group P showed no significant improvements in any of the composites. Group C showed no significant improvements in any of the composites but showed marginally significant improvements in the working memory (P = 0.092) and COGSTAT (P = 0.078) composites.
In addition, we explored the intergroup differences in the improvements in each cognitive domain with the two-sided Wilcoxon rank-sum tests between each pair of groups, with the BH procedure used for multiple comparison corrections. The statistical tests found no significant intergroup differences for any of the cognitive composites (Table 4; Figure 2).
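For the pairwise between-group comparisons, an exact two-sided Wilcoxon rank-sum test can be sketched in plain Python as shown below. The sketch assumes two small groups of already adjusted improvements and enumerates every way of assigning the pooled ranks to the first group; the function names and data are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_two_sided(x, y):
    """Return (rank sum of x, exact two-sided p) for a difference between x and y."""
    ranks = midranks(list(x) + list(y))
    n_x, n = len(x), len(x) + len(y)
    observed = sum(ranks[:n_x])
    expected = n_x * (n + 1) / 2  # mean rank sum of x under H0
    extreme = 0
    total = 0
    for subset in combinations(range(n), n_x):
        stat = sum(ranks[i] for i in subset)
        # Two-sided: count assignments at least as far from the expectation.
        if abs(stat - expected) >= abs(observed - expected) - 1e-12:
            extreme += 1
        total += 1
    return observed, extreme / total


# Hypothetical improvements in two groups of three subjects.
stat, p = rank_sum_two_sided([1, 2, 3], [4, 5, 6])
```

The number of assignments grows combinatorially with group size, so a statistics library or a normal approximation is the usual choice beyond very small samples.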

Regional Gray Matter Volume Changes.
With the VBM analysis of the one-sample t-tests of the (Post − Pre) differences in the images in each training group, we found several significant rGMV changes in each group. In Group V, significant rGMV increases were observed in the left orbitofrontal cortex (OFC) adjacent to the inferior frontal gyrus (IFG; t = 8.33, P = 0.033; Figure 3(a)). In Group P, significant rGMV increases were observed in the right middle frontal gyrus (MFG; t = 18.24, P = 0.003) and the left superior occipital gyrus (SOG; t = 7.25, P = 0.025; Figure 3(b)). In Group C, significant rGMV increases were observed in the left dorsolateral prefrontal cortex (DLPFC; t = 8.70, P = 0.033; Figure 3(c)) and significant rGMV decreases were observed in the precuneus (t = 22.65, P = 0.002), the medial cerebellum (t = 12.10, P = 0.002), and the right caudate (t = 8.76, P = 0.039; Figure 3(d)). No significant changes were observed for the other within-group contrasts.
In addition, two-sample t-tests that compared the rGMV changes between the pairs of groups revealed that the rGMV increases in the structures around the right caudate were significantly larger in Group P than in Group C (Figure 4). No significant differences in rGMV changes were observed for the other between-group contrasts.
The summary of these rGMV changes is given in Table 5.

Fractional Anisotropy Changes in White Matter.
TBSS analysis was used to investigate the effects of the different types of training on white matter integrity, and we conducted one-sample tests of the FA changes (FA Post − FA Pre) within each group. The significance of the permutation test was corrected for multiple comparisons with the TFCE correction [61], and clusters determined by the threshold (1 − P) > 0.95 (or equivalently P < 0.05) on the corrected P maps were considered significant. The results of the TBSS analysis revealed significant FA increases in white matter tracts around the left intraparietal sulcus (IPS) and the left precuneus of Group C (Figure 5; Table 6). No significant increases or decreases of FA were observed in the other training groups.
In addition, we conducted two-sample tests of the differences in the FA changes between each pair of groups, and we found no significant differences in the FA changes.

Changes in Driving Safety Measures.
The total grades obtained from the driving aptitude test unit were analyzed in the same way as the cognitive composite measures: the (Post − Pre) total score changes were adjusted for age, sex, and the Pre score, and the hypothesized improvements (Post − Pre > 0) were tested with one-sided Wilcoxon signed-rank tests for each group, with the BH correction used for multiple comparisons. The total grades for the on-road driving safety tests were analyzed with the same procedure after they had first been adjusted for the effects of the testers. Moreover, the intergroup differences in the improvements in driving aptitude and the on-road driving safety evaluations were also tested in the same manner as the cognitive composites with the two-sided Wilcoxon rank-sum test between each pair of groups, with the BH procedure used for multiple comparison corrections. Figure 6 and Table 4 show the summaries of the intervention effects on the driving safety measures. Group V showed significant improvements in the driving aptitude (P = 0.015) and on-road evaluation (P = 0.040). Group P showed no significant improvements in either test. Group C showed significant improvements in driving aptitude (P = 0.015) but not in the on-road evaluation. The between-group differences in the driving safety measures were not significant.
Table 5: Summary of regional gray matter volume changes detected by the voxel-based morphometry analysis. P values were family-wise error corrected for multiple comparisons with the nonisotropic adjusted cluster level [59] with an underlying voxel level of P < 0.001.
Figure 4: Significant between-group difference in regional gray matter volume (rGMV) changes. (a) In the structures around the right caudate, the rGMV changes were significantly different between Groups P and C. The colored cluster shows the region that exhibited significant differences in rGMV change with family-wise error corrected P < 0.05 at the nonisotropic adjusted cluster level [59] with an underlying voxel level of P < 0.001. (b) The average of rGMV changes in each group shows that the differences in the region were largely because of the rGMV increases in Group P. The error bars in the graph show the 95% confidence intervals.
The listed clusters were defined by threshold-free cluster enhancement-corrected (1 − P) > 0.95. COG: center of gravity.

Discussion
The purpose of the present intervention study was twofold. The first aim was to verify the effectiveness of the newly developed onboard cognitive training system in enhancing the driving safety of elderly people as well as its effects on related basic cognitive functions and neural plasticity. The other aim was to elucidate the diverse effects that the different types of cognitive training exerted on cognitive functions, brain structure, and driving safety by observing changes in these aspects simultaneously. The results were generally in accordance with these aims and the expectations. Group V, who underwent onboard training, showed significant improvements in the driving safety measures as well as significant or marginally significant improvements in multiple cognitive domains, supporting the expected beneficial effect of the training. The MRI data analyses of neural plasticity detected structural changes that seemed to be associated with the different functional demands for each training task. In the following sections, we discuss each aspect of the training-induced changes in detail.

Changes in Cognitive Functions.
Group V with the onboard training showed significant improvements in the processing speed and working memory composites as well as marginally significant improvements in executive function. These results were consistent with our expectations because the training program focused exactly on these cognitive domains, which are relevant to driving safety [4], along with visual processing and attention [6,22]. The results of Group P, who showed no significant improvements in any of the cognitive domains, were less expected. From the shared basic design of the on-PC training of Group P and the onboard training of Group V, we assumed that the involvement of processing speed, executive control, and working memory functions would be at the same level. In addition, the performance changes in the trained tasks were also similar between the two groups. Therefore, we did not anticipate the qualitative differences in the training effects of Groups V and P on these basic cognitive function domains. One interpretation of these results is that cognitive training can have greater effects if it also involves bodily performance, which the onboard training contained but the on-PC training lacked. This interpretation was consistent with the results of studies that have shown positive effects of aerobic exercise [16,18,19,64] or even very mild physical activity [65] and effects of the combination of cognitive and aerobic training [20] on cognitive functions and neural plasticity. Another possible explanation is that the differences in the ease of operation (or the familiarity with the user interface) and the resultant differences in the comfort, motivation, and engagement in the task caused differences in the transfer effects of the training to cognitive functions.
For example, the speed of processing training, which has been reported to have an enhancing effect on cognitive and daily functions [8][9][10][11], uses a PC with a touch screen instead of a computer mouse and thus makes the training more intuitive and easier, particularly for those who are not familiar with the computer mouse operation. Yet another possible explanation is that our subjects had higher baseline functioning and thus relatively lower sensitivity to training. Studies have observed stronger training effects for participants with relatively poor performance at baseline, suggesting that those with reduced functioning were likely to receive greater benefit from training [8,11]. In either case, the results indicated an advantageous effect of implementing cognitive training onboard, at least for elderly drivers.
Group C, trained to solve crossword puzzles, showed marginally significant improvements in the working memory as well as the COGSTAT composites. These results seemed to support the findings of previous reports that daily intellectual activity, such as puzzle solving, has beneficial effects on cognitive function in the healthy elderly [23][24][25][26]. We also noted the possibility that the training setting for Group C in this study, which required concentrated effort to solve the puzzles within the limited time frame of 20 min, may have entailed larger effects than solving such puzzles as a daily habitual activity [27]. The observation that the crossword puzzle training had marginal effects on working memory as well as on the COGSTAT, which combined memory-related test measures such as CFT-Recall, AVLT, and BVRT, was understandable because of the nature of the task. The number crossword puzzle used in the training required the memorization and retrieval of various types of information, such as the available characters to fill in the squares, the characters already filled in that work as constraints to make meaningful words, and the possible choices and hypotheses in parallel.

Structural Changes in the Brain.
Group V showed significant rGMV increases in the left OFC adjacent to the IFG. This region has been frequently implicated in sensory integration, the processing of affectively salient stimuli, reward prediction, and decision making [66]. These findings may support the speculation that the onboard training task successfully elicited motivated engagement, leading to better transfer effects for a wide range of cognitive domains. In contrast, it has been reported that the rGMVs of the bilateral OFC are negatively correlated with individual differences in impulsivity, suggesting the region's contribution to executive function [67]. In addition, gray matter volumes in the bilateral IFG have been reported to positively correlate with individual differences in processing speed in healthy elderly [68]. The observed rGMV increases may have reflected improvements in these functions.
The regions that showed significant rGMV increases in Group P may correspond to the functional processes required by the training task. For example, the left SOG was involved in the visual processing of stimuli, and the right MFG integrated the bottom-up processing with the top-down control [69]. Although the direct intergroup comparison did not show significant differences, the different patterns of the rGMV changes between Groups V and P suggested that the two groups may have engaged in the training in different ways, despite similarity in the basic task design.
Group C showed the widest range of neural plasticity, which was largely reasonable considering the nature of the training task and its observed effects on working memory. The left DLPFC, which showed significant rGMV increases in Group C, constitutes part of the frontoparietal network essential for working memory [70][71][72]. The significant FA increase in the white matter tracts around the left IPS and the left precuneus that was revealed by the TBSS analysis was consistent with previous findings that working memory training induces significant FA increases in younger adults [38]. In addition, it has been reported that the FA values in the same white matter region positively correlate with processing speed [73]. Interestingly, the regions that showed significant rGMV decreases, including the precuneus, the medial cerebellum, and the right caudate, have also been reported to be involved in working memory [74][75][76]. Recent findings, although largely in younger cohorts, have suggested that, in certain cases, rGMV decreases rather than increases are associated with increased cognitive and behavioral functions [31,33,77]. Therefore, it is possible that these rGMV decreases also reflected the training-induced cognitive enhancements. Another possible explanation for the observed rGMV decreases is age-induced shrinkage. In this context, it has been reported that the cerebellum and caudate are susceptible to age effects [78] but that the precuneus is less affected by age [79]. Considering the intervention period of 2 months, it is not likely that all the observed rGMV decreases in Group C were caused by an aging effect. Further studies are needed to clarify this point.

Changes in Driving Safety.
Group V showed significant improvements in both driving safety measures: the driving aptitude test and the on-road evaluation. These results indicated that the onboard cognitive training system indeed had a transfer effect on the ability to drive safely. In contrast, Group P did not show significant improvements in either of the measures. These results were consistent with our initial hypothesis that the differences in the driving-related factors of the training tasks, such as the required field of view for the processing of stimuli and the motor control involving distributed body parts, would make the onboard training more effective than the on-PC training. Another explanation is that the training effects on the basic cognitive domains of processing speed, executive function, and working memory, which were significant or marginally significant only in Group V, were transferred to driving safety. This interpretation may again explain the discrepancy from the studies which showed that driving safety improvements could be achieved with computer-only methods [8][9][10][11]: our subjects in Group P, who lacked significant training effects on the basic cognitive functions, possibly due to the difficulty of operation as discussed above, may also have lacked the transfer effects to driving safety. Note that all the subjects were daily drivers who reportedly owned a car and who drove more than three times a week on average. Therefore, it is not likely that simple contact with the driving environment in Group V induced the difference. Either way, the results again suggested the advantage of an onboard training system over a general (on-PC) training of a similar type.
Group C showed significant improvements in the driving aptitude measures but not in the on-road driving safety evaluation. This can also be interpreted in terms of the difficulty of the far-transfer effect. From its purpose as an aptitude evaluation system, the driving aptitude test continuously poses situations with high cognitive demands, and thus, the improvements in the general cognitive function domains would have better benefitted the evaluation. In comparison, the on-road evaluations involved more factors specific to the domain of driving. It has been repeatedly reported that the far-transfer effects from training to a task with less similarity are harder to achieve than the near-transfer effects from training to a task with larger overlap in the underlying processes, especially for older adults [80][81][82][83][84][85]. Therefore, the effects of crossword training on working memory and other cognitive functions may have been better transferred to the "nearer" driving aptitude tests and not to the "farther" on-road evaluations.

Limitations.
The major limitation of this study was the small sample size, which was mainly because of the relatively limited resources for the requirements of isolation in order to prevent interactions between the subjects, supervision, and control in the training intervention. Consequently, we had only limited results from the intergroup analyses, which would have been more conclusive if they had been significant, compared with the within-group results reported in the present study. Therefore, a possible future direction would be to replicate and extend the results of the present study with a larger sample and less strict (more casually controlled) trial design. One might argue that another way to establish the intergroup effects is by introducing a no-intervention control group. However, this would be less helpful because comparisons with a no-intervention control cannot dissociate the confounding effects from participation in the trial itself. In addition, we explored the direct association between the cognitive enhancements, neural plasticity, and the changes in driving safety with correlational analyses, as has been reported in a cross-sectional study [86]; however, we did not find any significant relationships (not shown in this paper). To determine if this was due to insufficient statistical power or if this originated from the complexity of the relationships of cognitive function through the substrate to the driving behavior [7], a study with a larger sample is required.
Another remaining issue is the maintenance of the training effects. How long are the effects of the training sustained once the training is terminated? How effective is the addition of booster training and what is the best timing [87,88]? Additional follow-up studies are required to answer these questions.

Real-World Applicability of the Onboard Cognitive Training.
As described in the Materials and Methods, the onboard cognitive training system was developed with the expectations that it can be utilized by elderly daily drivers easily and continuously as a natural extension of daily driving activities and can facilitate the transfer of training effects to actual driving situations. The obtained results generally supported the usefulness of the onboard training.
Regarding the installation of the onboard system, it can be largely covered by existing or commonly available equipment. The training programs can be implemented on a smartphone or on the computer of an automotive navigation system. The information from the steering wheel and brake pedal can be collected through a standardized vehicle network bus, which is available in most modern cars. Most cars are also equipped with audio speakers. Although the LED lights and their controller need to be newly introduced, they are relatively inexpensive. In total, we expect that the onboard training program can provide a viable option for enhancing elderly drivers' safety in real life.

Conclusions
To the best of our knowledge, this is the first study to simultaneously investigate the possible effects of different types of cognitive training on cognitive enhancement, neural plasticity, and changes in driving safety. The results showing improvements in cognitive functions and driving safety verified the effectiveness of the onboard training system that we developed and suggested that implementing a training system in the car facilitates enhancements of driving safety. However, the results of the behavioral improvements induced by crossword puzzle training, which had less apparent relevance to driving, also implied potential usefulness. Combined with the neural plasticity results from the structural analyses of the MRI data, these different types of training can contribute to driving safety through different underlying mechanisms. Thus, one future direction worth exploring is a multimodal training approach [89] that combines these different types of training. Despite the limitations discussed above, we believe that the reported results and implications provide useful insights for future studies on the enhancements of driving safety and broad daily activities of elderly people.