Behavioural performance improvement in visuomotor learning correlates with functional and microstructural brain changes

A better understanding of practice-induced functional and structural changes in our brains can help us design more effective learning environments that provide better outcomes. Although there is growing evidence from human neuroimaging that experience-dependent brain plasticity is expressed in measurable brain changes that are correlated with behavioural performance, the relationship between behavioural performance and structural or functional brain changes, and particularly the time course of these changes, is not well characterised. To understand the link between neuroplastic changes and behavioural performance, 15 healthy participants in this study followed a systematic eye movement training programme for 30 minutes daily at home, 5 days a week and for 6 consecutive weeks. Behavioural performance statistics and eye tracking data were captured throughout the training period to evaluate learning outcomes. Imaging data (DTI and fMRI) were collected at baseline, after two and six weeks of continuous training, and four weeks after training ended. Participants showed significant improvements in behavioural performance (faster task completion time, lower fixation number and fixation duration). Spatially overlapping reductions in microstructural diffusivity measures (MD, AD and RD) and functional activation increases (BOLD signal) were observed in two main areas: extrastriate visual cortex (V3d) and the frontal part of the cerebellum/Fastigial Oculomotor Region (FOR), which are both involved in visual processing. An increase of functional activity was also recorded in the right frontal eye field. Behavioural, structural and functional changes were correlated. Microstructural change is a better predictor for long-term behavioural change than functional activation is, whereas the latter is superior in predicting instantaneous performance. 
Structural and functional measures at week 2 of the training programme also predict performance at week 6 and 10, which suggests that imaging data at an early stage of training may be useful in optimising practice environments or rehabilitative training programmes.


Introduction
There is growing evidence from human neuroimaging studies (Zatorre et al., 2012; Fields, 2015; Collins, 2015; Tucker and Luu, 2012; Wang et al., 2014; Sampaio-Baptista and Johansen-Berg, 2017) that learning is associated with measurable structural brain changes. This alteration in structure supports the corresponding changes in the functional properties of individual neurones, giving rise to behavioural change (Sampaio-Baptista and Johansen-Berg, 2017).
Practice-induced structural changes of cortical and subcortical areas have been reported to occur in grey and white matter. While most studies consider the long-term effect of practice, some recent studies have demonstrated measurable changes at much faster [...] (Schmithorst and Wilke, 2002; Maguire et al., 2000; Sato et al., 2015). These classical studies consistently show larger local brain volume in task-relevant brain areas in skilled participants compared to untrained controls. Significant positive correlations between grey matter volume and the duration of practice are commonly reported and support the claim that the observed differences are training-induced (Draganski et al., 2006; Cannonieri et al., 2007; Maguire et al., 2000).

Diffusivity changes in response to short duration perceptual and motor training
In addition to the volumetric brain structure changes discussed above, brain microstructure, as measured by DTI, has been shown to be a good predictor for skills, such as musical aptitude ( Spray et al., 2017 ), musical performance ( Oechslin et al., 2010 ), gymnastics ( Deng et al., 2018 ), reading skills ( Horowitz-Kraus et al., 2014 ), or working memory ( Takeuchi et al., 2010 ), as well as in clinical diagnosis in a wide range of areas, ranging from hallucination proneness ( Spray et al., 2018 ) to multiple sclerosis ( Roosendaal et al., 2009 ).
Plasticity studies can broadly be split into studies that compare groups that received extensive training with control groups, and studies that measure rapid learning-induced microstructural changes within subjects. Studies that compare groups commonly show that extensive training is linked to increased fractional anisotropy (FA) and decreased diffusivity measures (e.g. Steele et al., 2013; Moore et al., 2017 (motor training); Salminen et al., 2016 (n-back memory); and Cao et al., 2016 (cognitive training)). Increases in FA and decreases in diffusivity measures, however, are not reported in all training studies: Imfeld et al. (2009), for example, showed that diffusivity measures in the corticospinal tract of early-onset musicians were higher, and FA lower, than in controls. Oechslin et al. (2010), presumably using the same imaging data, showed that mean FA in the superior longitudinal fasciculus was negatively correlated with pitch identification performance. Similar data were reported by Vandermosten et al. (2016), who showed that, for a group of trained phoneticians, the FA in auditory areas was reduced compared to controls. Golestani et al. (2011), also for phoneticians, showed systematic gross morphological differences in the transverse gyri of the trained group vs controls. These structures are thought to be established in utero, highlighting that influences of domain-specific predispositions cannot be distinguished from training-related morphological differences when comparing controls with groups that received specialist training (and may well have been selected for this training because of specific predispositions). This significant confound can be avoided by looking at the effect of training on a novel task within groups, although this approach precludes the study of very long-term training effects since the same participants have to be scanned at least twice, before and after training.
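For reference, FA and the diffusivity measures (MD, AD and RD) referred to throughout are conventionally derived from the three eigenvalues of the diffusion tensor. The following is a minimal sketch of these standard definitions and is not the processing pipeline used in this study; the function name dti_scalars and the example eigenvalues are hypothetical.

```python
import math

def dti_scalars(l1, l2, l3):
    """Standard DTI scalars from tensor eigenvalues, with l1 >= l2 >= l3.

    Hypothetical helper for illustration only; not the study's pipeline.
    """
    md = (l1 + l2 + l3) / 3.0          # mean diffusivity
    ad = l1                            # axial diffusivity (largest eigenvalue)
    rd = (l2 + l3) / 2.0               # radial diffusivity
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    # FA ranges from 0 (isotropic diffusion) to 1 (diffusion along one axis only)
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0
    return {"MD": md, "AD": ad, "RD": rd, "FA": fa}

# Example with made-up eigenvalues (units: mm^2/s)
example = dti_scalars(1.7e-3, 0.3e-3, 0.2e-3)
```

With these definitions, a reduction in MD, AD or RD means that water diffuses less freely in the voxel, while an increase in FA means that diffusion has become more directional.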
A number of recent studies have shown diffusivity reduction and/or FA increases in task-relevant brain areas after very short training periods. For a lexical training task, rapid diffusivity reduction has been shown in language-related brain areas, while Tavor et al. (2020) showed that short-duration (45 min) keyboard training causes significant MD reduction in motor areas. Similarly, training on memory tasks causes rapid changes in the hippocampus: Antonenko et al. (2016) showed that reductions in diffusivity and increases in FA are seen in older participants (mean age 69 years), suggesting that the observed changes are not age-limited. Sagi et al. (2012) and Tavor et al. (2013) showed significant changes to hippocampal microstructure in participants trained to navigate in a car racing game. Interestingly, in the Tavor et al. (2013) study, the transient changes returned to baseline after approximately a week. This is consistent with data by Blumenfeld-Katzir et al. (2011) that link DTI changes observed in rats trained to perform a navigation task over five days to structural plasticity in astrocytes. The involvement of these glial cells suggests that the rapid microstructural changes seen in rats and humans may not be the result of direct changes to neural processes, but perhaps instead reflect changes that are precursors to more long-term neural changes. Furthermore, at least in the short term and in rats, rapid diffusivity changes appear to depend on the nature of the novel experience rather than exposure time.
In summary, while long-term DTI training studies show conflicting changes in DTI parameters with learning, which may be explained by group differences, short-term studies show more consistent increases in FA and/or reductions in diffusivity measures, even after very short training on novel tasks. These changes are often correlated with performance and do not appear to be limited by the age of the participants (Antonenko et al., 2016; Madden et al., 2009; Nichols and Joanisse, 2016; Bennett et al., 2011). They may reflect transient changes in supporting tissue, for example astrocytes (Theodosis et al., 2008), rather than directly reflect neural plasticity.
Functional and structural changes to neural tissue can be caused by a number of mechanisms and affect grey matter, white matter, and extraneuronal structures (review: Zatorre et al., 2012 ). Unfortunately, the effect of training on these structures and the relationship between the different DTI parameters is not well understood ( Golestani et al., 2011 ). Longitudinal studies that include multiple scans may provide the necessary data to better understand how practice induces change in DTI measures of microstructure in the human brain ( Zatorre et al., 2012 ;Wang et al., 2014 ;Sampaio-Baptista and Johansen-Berg, 2017 ).
To explore the underlying learning mechanisms during longitudinal studies (Kelly and Garavan, 2005; Cao et al., 2016), DTI needs to be combined with other imaging modalities (e.g. fMRI). If the observed microstructural changes have a causal relationship to behavioural changes, then one would expect diffusivity measures to correlate not only with behavioural performance, but also with functional brain activation measures (Straathof et al., 2019). Moreover, if microstructural changes are apparent within days (Sagi et al., 2012) rather than months of the onset of a new practice task, then this offers the possibility to directly measure brain microstructural change before, during and after training.

fMRI activation changes in response to short duration motor and perceptual training
In general, reports of fMRI activation changes with learning are not entirely consistent (Kelly and Garavan, 2005). Some authors report an increase in activation (Huang et al., 2013; Larcombe et al., 2018), others a decrease over time (Schneiders et al., 2011; Erickson et al., 2007; Farrar and Budson, 2017), and yet others show simultaneous increases in some areas and decreases in other areas, consistent with functional reorganisation (Kelly and Garavan, 2005). A possible explanation for these inconsistent results is that there are distinct mechanisms and temporal phases in the time course of learning, related to differential dynamics of blood oxygen level-dependent (BOLD) activity (Yotsumoto et al., 2008). The following review will focus on motor and perceptual learning, which are at the core of the task we use.
One of the earliest reports of fMRI experiments that provide evidence of experience-dependent reorganisation of task-relevant areas was provided by Karni et al. (1995), who showed that motor learning results in persistent and task-specific BOLD response increases in motor cortex. Later work on motor sequence learning and motor adaptation confirmed these findings and emphasized the role of sleep-dependent consolidation (Debas et al., 2010; review: Doyon and Benali, 2005).
A substantial amount of fMRI data supports proposals that perceptual training, like motor training, also leads to changes in local connections within task-relevant areas. In the visual domain this has been shown for tasks ranging from texture (Schwartz et al., 2002) or curvature discrimination (Maertens and Pollmann, 2005) to motion perception (Shibata et al., 2012, 2016). Similar effects have been demonstrated in other sensory modalities: short-term discrimination training leads to neural activation increases, for example for auditory pitch discrimination (de Souza et al., 2013) or tactile grating orientation discrimination training (Hodzic et al., 2004), that were correlated with behavioural performance gains.
While perceptual learning necessarily modifies mostly sensory brain areas, similar effects have been shown in non-perceptual tasks. One example is mental practice, the cognitive rehearsal of movements, which in stroke patients led to simultaneous improvements in clinical motion measures and increased functional activation in bilateral motor cortex and ipsilateral superior parietal cortex (Page et al., 2009). Censor et al. (2012) argue that perceptual and motor learning show similar characteristics in terms of temporal dynamics and the interactions between primary cortical and higher-order brain areas, which are suggestive of a common learning mechanism for both types of tasks.
In visual perceptual learning (VPL), initial functional activation increases are consistently reported with learning (Yotsumoto et al., 2008; Frank et al., 2018; Hadjikhani et al., 2001; Furmanski et al., 2004; Kourtzi et al., 2005), but the further time course of functional activation changes with learning is much less clear. Yotsumoto et al. (2008), for instance, found that functional activation in the primary visual cortex (V1) initially increased, but returned to the level observed before training while behavioural performance still improved. The authors hypothesize that there are distinct temporal phases in the time course of perceptual learning, which are expressed in differential dynamics of BOLD activity. The initial activation increase can be explained in terms of an early encoding phase, while the drop in activation as learning progresses is explained by the consolidation in a later retention phase.
The reduction in functional activity to baseline levels, however, is not seen in other studies: most strikingly, Frank et al. (2018) measured behavioural performance and functional activation during a VPL task that required participants to discriminate complex motion patterns and found that performance and task-specific functional activation increases were still measurable three years after participants completed the initial VPL training.
A possible explanation of these differential results is that functional activation changes can be explained by different models: if learning leads to neuroplastic changes that make pre-existing task-relevant neural circuitry more efficient, then a reduction of activity would be expected. On the other hand, more efficient neural circuitry might enable participants to direct attentional resources better to task-relevant areas, leading to more local increases of activation in relevant areas (Büchel and Friston, 1997). Another, partial, explanation for activation increases, particularly in self-paced tasks, is the presence of rate effects (Nyberg et al., 2006): increased activation may be caused by an increase in the number of stimuli participants can process in a given time (Liu et al., 2005), or by attending to task-relevant aspects, such as the modality, of the signals.
Functional activation changes in many VPL studies ( Bi et al., 2014 ;Yotsumoto et al., 2014 ;Frank et al., 2016 ;Kang et al., 2018 ) are reflected in structural changes. Cortical thickness measures, for example, were associated with successful behavioural improvement in two VPL training studies ( Bi et al., 2014 ;Frank et al., 2016 ) that also reported increase of functional activation with learning. Similarly, FA increases have been shown to accompany functional activation increases with training, and these FA increases were correlated significantly with behavioural improvement ( Kang et al., 2018 ). A better understanding of the time course of functional activation patterns, in particular when combined with microstructural data, may therefore shed light on the dynamics of learning induced neuroplastic changes.
The aim of this study is to test the hypothesis that training induces correlated changes in microstructural MR parameters, functional activation measures and behavioural data. To investigate the potential correlations between multiple outcome measures we chose a task which has clearly defined outcome measures, is novel for all participants to control for previous experience, and is relevant for clinical applications. The task we chose is a 'visual search' task that is used for rehabilitation of patients with hemianopia (visual field loss, typically caused by injury to visual cortex; Rowe et al., 2017). Indeed, for clinical and rehabilitative purposes, several studies have explored the impact of scanning eye movements into the affected visual field for hemianopia patients (Hanna et al., 2017). The training aims to compensate for visual field defects by improving the speed and accuracy of eye movements into the defective visual field portion (Aimola et al., 2014; Jacquin-Courtois et al., 2013; Lane et al., 2010; Ong et al., 2015; Pambakian et al., 2004). Rowe et al. (2017) proposed a specific eye movement training programme for stroke survivors with hemianopia and showed an improvement in vision-related quality of life with training. In this intervention, stroke survivors learn to make systematic bilateral eye movements to scan the environment over six weeks. The task provides precisely measurable data on performance and eye movement parameters (Mannan et al., 2010; Pambakian et al., 2004; Wang, 2001), which can be directly correlated with MR imaging parameters. While the task has previously been used in neuro-rehabilitation, it can be applied to healthy participants and provides insights into the underlying neural mechanisms of perceptual and motor learning that reflect general principles of brain plasticity in adults (Kang et al., 2018).
Using a visuomotor learning task that is carried out over a relatively extended period means that we can scan participants multiple times during the training and these repeated scans enable us to test how early parameter changes predict longer-term changes or behavioural outcomes.

Specific hypotheses
In this novel visuomotor learning task, we expect to see significant performance improvements over time in terms of processing speed, in terms of the required number of gazes to perform the task and in terms of gaze fixation duration.
(1) Most fMRI studies in motor or visual perceptual learning, as discussed previously, report early increases of fMRI signal as a consequence of training. We therefore hypothesize that BOLD activation in functionally relevant areas should increase.
(2) In addition to testing participants on the trained task, data were collected for an untrained, involuntary, eye movement task. We expect this task to activate broadly similar brain areas in fMRI, but do not expect to see significant performance or functional activation changes for this control task.
(3) As discussed in the introduction, many learning studies show a reduction in diffusivity measures (MD, AD and RD) and an increase in FA with training. We therefore expect to see an increase in FA and/or a reduction in water diffusivity measures (MD, AD and RD) in the functionally relevant brain areas.
(4) One of the aims of this study is to characterise the time course of behavioural, functional and structural change. Being able to demonstrate correlations between these parameters would strengthen the argument that the observed changes are directly or indirectly related and have predictive value.

Participants
To determine the number of participants required for the experiment, a power analysis (G*Power, V3.1.9.4; Faul et al., 2020) was carried out. The effect size was estimated from previous results published by Sagi et al. (2012), who conducted a similar learning study (Cohen's d = 0.95). For the DTI analysis we have a clear a priori hypothesis: decreased diffusivity measures and increased FA. The minimum participant number for a one-tailed, paired-samples analysis (comparing pre- and post-training data) with an error probability of α = 0.05 and a power (1 − β error probability) of 0.95 was estimated as 14 (actual power 0.956).
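As a rough plausibility check of this sample size, the calculation can be approximated with the normal distribution. The sketch below is not the exact noncentral-t computation that G*Power performs; it uses a textbook normal approximation with an optional small-sample correction term (z_alpha squared over two), and the function name approx_paired_n is hypothetical.

```python
import math
from statistics import NormalDist

def approx_paired_n(d, alpha=0.05, power=0.95, small_sample_correction=True):
    """Approximate n for a one-tailed paired-samples t-test.

    Normal approximation only (an assumption of this sketch); the exact
    value reported in the text comes from G*Power, not from this formula.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # one-tailed critical value
    z_beta = NormalDist().inv_cdf(power)         # quantile for the target power
    n = ((z_alpha + z_beta) / d) ** 2
    if small_sample_correction:
        # Correction for using a t rather than a z test with small n
        n += z_alpha ** 2 / 2.0
    return math.ceil(n)

n_corrected = approx_paired_n(0.95)
n_uncorrected = approx_paired_n(0.95, small_sample_correction=False)
```

For d = 0.95 the corrected approximation gives 14 participants, in line with the reported estimate, whereas the uncorrected normal approximation gives 12 and therefore underestimates the requirement at this sample size.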
In the current study, 15 healthy participants (7 female, 8 male) with normal vision were recruited (mean age = 35.9 years, range 21-60).
[Figure caption, beginning missing] ... asked to look at during the task. The numbered circles and fixation point are permanently visible; participants systematically look at all numbered targets, alternating left and right. The next target always contains a small shape (circle or triangle), which participants have to identify. As soon as a response is given the current target disappears and a random shape is presented at the next target position. The targets are repeatedly presented (379 stimuli) on three different axes (horizontal and two oblique axes). The digital version was used for training at home (30 min daily), lab testing and for the voluntary fMRI task.
All participants gave informed consent and the study was approved by University of Liverpool Ethics Committee (reference number: 1596).

Training design
All participants followed an eye movement training programme developed for hemianopia neurorehabilitation: VISION visual scanning training. The basic training paradigm remained unchanged, except that the visual targets were presented on a computer screen instead of an A4 landscape card to enable us to measure behavioural performance and eye tracking data, such as correct eye fixation points and gaze duration.
The digital version of the VISION visual scanning training card ( Fig. 1 ) was created using PsychoPy2 (v1.82.01; Peirce et al., 2019 ), and consists of numbered circles radiating out from a central fixation target along three axes, one horizontal, the other two spanning the diagonals of the screen. The participants learn to move their gaze quickly between a sequence of permanently visible and numbered targets as indicated in the diagram. Participants start just to the right of the fixation point (target 1, right) then move to the left hemifield (1, left), then to target (2) on the right etc.

Training task and stimuli
Participants were asked to identify a small (0.45° visual angle, grey colour) symbol at the fixation target positions by pressing an appropriate button on a custom key-pad (left for triangle, right for circle). The first target appeared at the first numbered position near the centre of the screen at the beginning of each training session. Subsequent stimuli were presented in predictable positions but with a random shape (triangle or circle), systematically alternating between hemifields along the horizontal as well as oblique planes to ensure stimulation of a wide visual area. Only one target shape was visible at a time; the next target was presented in the opposite hemifield as soon as participants responded to the current stimulus. The three axes of targets were always fixated in the same order: horizontal, then bottom-left to top-right, and then top-left to bottom-right.
As in the original training paradigm, participants were instructed to minimise head movements and keep at a fixed distance from the computer screen to achieve at least a 30-degree field of vision to the right and left sides. Participants were asked to respond as rapidly and accurately as possible. A minimum performance criterion of > 75% correct (the midpoint between chance and ceiling performance) was defined for all behavioural and functional tasks to ensure participants were attending to the stimuli.

Training paradigm
Participants were asked to continually train at home for 30 min daily, five days a week and for six consecutive weeks. Performance data (response times, accuracy, and overall training duration) were recorded automatically and checked every week (results: Table 2 ).

Behavioural performance data
In addition to the performance monitoring that was carried out while participants were training at home, each participant was assessed in the laboratory, using an eye tracker, every week. The data gathered at home were used to check compliance with the training regime, while the data recorded in the lab, using common equipment and a controlled environment for all participants, are the basis of the performance data reported here.

Testing procedure
Behavioural performance was measured for each subject at baseline (week 0), once every week for six weeks of training and at a follow-up test, one month after the training ended, to quantify visuomotor learning gains (Table 1).
All participants performed a session of the same 379 targets at each data collection point. The test session was performed on a laptop screen of 38.3 × 21.6 cm (width × height) at a viewing distance of 50 cm. A chin rest was used to minimise head movements. A display-mounted eye tracker (The EyeTribe, model ET1000), individually calibrated for each test session, was used to record gaze position data.

Behavioural performance measurement
The principal behavioural performance measure was the mean response time (RT): the average time taken by participants to respond to each stimulus in the experiment. For experiments conducted in the lab and training runs at home, the number of targets in each run was fixed. The fMRI experiment was a block design, with alternating 'task' and 'rest' conditions; here the total execution time (task block duration) was fixed, so that the number of targets processed varied between participants and MRI sessions.
In addition to the behavioural RT measure, two eye tracking measures were recorded: the Mean Fixation Duration (MFD), with fixations estimated from the eye tracking data as periods where the eye position varied by less than 0.5° over a minimum of 60 ms (Mannan et al., 2010), and the total number of fixations (TNF), i.e. the number of fixations (gazes) required to complete the task. Learning-related behavioural changes in this study were expected for eye movement parameters rather than for response accuracy, where we expect near-ceiling performance; see also Tables S1, S2 and S3 in the supplementary material (section B).
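The fixation criterion described above (gaze position varying by less than 0.5° for at least 60 ms) can be sketched as a simple dispersion-threshold detector. This is a minimal illustration under stated assumptions and not the analysis code used in the study: the sample format (time in ms, x and y in degrees), the dispersion measure (the larger of the horizontal and vertical range) and the function names are all hypothetical.

```python
def detect_fixations(samples, max_dispersion_deg=0.5, min_duration_ms=60.0):
    """Return (start_ms, end_ms) pairs for detected fixations.

    samples: list of (t_ms, x_deg, y_deg) tuples sorted by time (assumed format).
    """
    fixations = []
    start, n = 0, len(samples)
    while start < n:
        end = start
        x_min = x_max = samples[start][1]
        y_min = y_max = samples[start][2]
        # Grow the window while the gaze stays within the dispersion limit
        while end + 1 < n:
            _, x, y = samples[end + 1]
            nx_min, nx_max = min(x_min, x), max(x_max, x)
            ny_min, ny_max = min(y_min, y), max(y_max, y)
            if max(nx_max - nx_min, ny_max - ny_min) >= max_dispersion_deg:
                break
            x_min, x_max, y_min, y_max = nx_min, nx_max, ny_min, ny_max
            end += 1
        if samples[end][0] - samples[start][0] >= min_duration_ms:
            fixations.append((samples[start][0], samples[end][0]))
            start = end + 1   # continue after this fixation
        else:
            start += 1        # too short: slide the window start forward
    return fixations

def fixation_summary(fixations):
    """Total number of fixations (TNF) and mean fixation duration (MFD, ms)."""
    tnf = len(fixations)
    mfd = sum(e - s for s, e in fixations) / tnf if tnf else None
    return tnf, mfd

# Toy gaze trace: two stable periods of 80 ms each, then a brief 20 ms glance
trace = [(t, 0.0, 0.0) for t in range(0, 100, 20)]
trace += [(t, 5.0, 0.0) for t in range(100, 200, 20)]
trace += [(200, 10.0, 0.0), (220, 10.0, 0.0)]
tnf, mfd = fixation_summary(detect_fixations(trace))
```

In the toy trace the final glance lasts only 20 ms and is therefore not counted, so the two longer periods determine both TNF and MFD.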

Stimulus presentation in the scanner
Participants were asked to lie still in the MRI scanner in a low light environment. The stimuli were presented via an MRI compatible monitor (NordicNeuroLab, model LCD 3.0.4, https://www.nordicneurolab.com). Participants saw the stimuli
Table 1. Data collection protocol: the table shows the behavioural performance measures in the lab (baseline, six visits during the training period and a follow-up visit one month after the training period) and the MRI scan visits (baseline, two weeks later, end of the training period and a follow-up visit one month after the training period). During the MRI scan visits the fMRI data (voluntary and involuntary), DTI data and in-scanner behavioural performance data were collected.

Image data analysis: general approach
Our image analysis is split into three parts. Firstly, we identify regions of significant change in functional and diffusivity measures over the six-week course of training. This analysis was limited to task-relevant brain regions, i.e. areas that show significant functional activation at any time during training. Following on from this analysis, we analyse the time course of change in the functional and diffusivity measures, focussing on the regions identified in the previous step. The final analysis tests whether neuroimaging measures correlate with behavioural performance and whether it is possible to extrapolate (predict) measures in time.

Functional imaging tasks
Two functional task sessions, voluntary and involuntary eye movement, were run for each participant at each MRI scan visit. For the voluntary eye movement task, stimulus presentation and participant instructions were identical to those used during training. For the involuntary eye movement task, serving as a control task, participants were asked to respond to circle or triangle targets (grey colour, size varied randomly between 0.92° and 1.14° of visual angle), which were presented in random positions to create involuntary saccades.
A standard block design of 15 blocks with a 32 s block duration (Maus et al., 2010), alternating 16 s of rest and 16 s of activity, was applied to measure the fMRI contrast changes for both tasks. Each functional task took 8 min (160 volumes) to complete. All sessions started with 16 s of rest and participants were requested to fixate a central fixation point during the rest periods.
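The timing of this block design can be checked with a few lines of arithmetic. The sketch below generates the task-block onsets and derives the total run length; the repetition time it reports is only implied by the figures above under the assumption that the 160 volumes were acquired continuously over the whole run, since the fMRI TR is not stated here. The function name block_design is hypothetical.

```python
def block_design(n_blocks=15, rest_s=16.0, task_s=16.0, n_volumes=160):
    """Return total duration (s), task-block onsets (s) and the implied TR (s)."""
    block_s = rest_s + task_s                  # one block = rest followed by task
    total_s = n_blocks * block_s
    # Each run starts with rest, so the first task block begins after rest_s
    task_onsets = [i * block_s + rest_s for i in range(n_blocks)]
    implied_tr = total_s / n_volumes           # assumes continuous acquisition
    return total_s, task_onsets, implied_tr

total_s, task_onsets, implied_tr = block_design()
```

With the values given in the text, 15 blocks of 32 s add up to 480 s (8 min), the task blocks start at 16 s, 48 s, 80 s and so on, and 160 volumes over 480 s would correspond to one volume every 3 s.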
The response accuracy and the total number of responses to the targets were recorded using PsychoPy2 ( Peirce et al., 2019 ). Both tasks were performed at each of four fMRI scan visits (week 0, 2, 6 and 10, Table 1 ) that coincided with DTI scans and performance tests. Global activation patterns elicited during the training task are compared with those seen during the control task to demonstrate learning specific changes.

fMRI statistical analysis
Analysis of the functional data was performed with SPM12 (Statistical Parametric Mapping software, https://www.fil.ion.ucl.ac.uk/spm/ ) running on MATLAB R2016a. Pre-processing of the functional data was performed using the SPM12 default batch (preproc_fMRI.m) starting with slice time correction, before realignment of all functional scanning sessions to a common image. For each participant, the EPI volumes for all eight sessions, voluntary and involuntary task at each of the four visits, were processed at the same time. All images within each session were spatially realigned to the first volume of the session, unwarped and corrected for motion artefacts.
The T1-weighted (MPRAGE) structural image, collected during the baseline scan, was segmented into white matter, grey matter and cerebrospinal fluid for spatial normalization. All EPI images for each individual were realigned to this single, high-resolution structural scan.
To complete the preprocessing, all EPI images were normalised to Montreal Neurological Institute (MNI) space, resampled (1 × 1 × 1 mm³ voxels using trilinear interpolation) and spatially smoothed with a 6 mm Gaussian kernel (full-width at half-maximum) for the group analysis.

First level fMRI data analysis
Data were analysed using a random-effects model (Friston et al., 1999), implemented in a two-level procedure. In the first level, fMRI signals for individual subjects were modelled in a General Linear Model (GLM) by a design matrix, modelling the onsets and the durations of each block in the voluntary and involuntary task. In addition to the four regressors that were used for further analysis (rest_voluntary, act_voluntary, rest_involuntary and act_involuntary), six motion parameters, representing translational and rotational motion in three dimensions each and computed at the realignment stage, were modelled as nuisance regressors. From the four task-related regressors, two contrasts were created (act_voluntary vs rest_voluntary and act_involuntary vs rest_involuntary) for each visit. These contrasts form the basis of the ROI definition because they enable us to restrict further analysis to task-relevant brain regions in the fMRI and DTI data.
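To make the contrast definitions concrete, the two first-level contrasts can be written as weight vectors over the design-matrix columns. The sketch below is illustrative only: the column order, the motion regressor names and the helper contrast_vector are assumptions and do not reproduce the SPM12 batch used in the study.

```python
# Task regressors named in the text; the column order is an assumption
TASK_REGRESSORS = ["rest_voluntary", "act_voluntary",
                   "rest_involuntary", "act_involuntary"]
# Hypothetical names for the six realignment (nuisance) parameters
MOTION_REGRESSORS = ["trans_x", "trans_y", "trans_z",
                     "rot_x", "rot_y", "rot_z"]
COLUMNS = TASK_REGRESSORS + MOTION_REGRESSORS

def contrast_vector(positive, negative, columns=COLUMNS):
    """Weights of +1 and -1 on two regressors, 0 on everything else."""
    weights = [0.0] * len(columns)
    weights[columns.index(positive)] = 1.0
    weights[columns.index(negative)] = -1.0
    return weights

voluntary_contrast = contrast_vector("act_voluntary", "rest_voluntary")
involuntary_contrast = contrast_vector("act_involuntary", "rest_involuntary")
```

The nuisance regressors receive zero weight in both contrasts, so they absorb motion-related variance without contributing to the activation estimate.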

Second level fMRI analysis 1: ROI definition
In fMRI and DTI analysis, it is common to define independent regions of interest (ROI) to increase sensitivity to small changes of the parameters of interest, here relatively subtle changes in fMRI or DTI parameters over six weeks of training ( Froeling et al., 2016 ).
For DTI analysis, in particular for white matter tracts, it is common to use anatomically or functionally defined ROIs to extract white matter tracts that connect these regions ( Anon, 2007 ;Revill et al., 2014 ;Beer et al., 2013 ) while in functional imaging independent functional scans are commonly used to define ROIs ( Poldrack, 2007 ).
Training effects in DTI data are not restricted to white matter, but have also been shown to modulate grey matter diffusivity ( Fields, 2011 ;Kühn et al., 2014 ;Noppeney et al., 2005 ;Blumenfeld-Katzir et al., 2011 ). Functionally relevant grey matter areas should therefore be included in the DTI analysis.
We expect relatively small structural or functional learning-induced changes over time but expect these neuroplastic changes to be restricted to task-relevant grey matter regions, or to white matter tracts that connect relevant regions. To enhance the specificity of our analysis we therefore define a ROI that is used in both the fMRI and the DTI analysis: group-wise fMRI activation maps (voluntary eye movement vs rest, thresholded at pTFCE < 0.05; Spisák et al., 2019) were computed separately for each of the four visits. These four maps were combined using a logical union (OR) operation to ensure that any clusters that were significantly active at any time of the training were included in the resulting ROI.
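The union step can be illustrated with binary masks stored as flat lists of voxels. This is a minimal sketch with made-up data; the function name union_mask and the toy masks are hypothetical, and a real analysis would operate on 3D image volumes rather than short lists.

```python
def union_mask(masks):
    """Voxel-wise logical OR over several binary masks of equal length."""
    if len({len(m) for m in masks}) != 1:
        raise ValueError("all masks must have the same number of voxels")
    # A voxel is in the ROI if it is significant at any visit
    return [any(voxel_values) for voxel_values in zip(*masks)]

# Toy thresholded maps for the four visits (True = significantly active)
week0 = [True, False, False, False, False]
week2 = [True, True, False, False, False]
week6 = [False, True, True, False, False]
week10 = [False, False, True, False, False]
roi = union_mask([week0, week2, week6, week10])
```

Using the union rather than the intersection keeps regions that only become active (or stop being active) during training, at the cost of a larger and less specific ROI.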
The DTI and fMRI data were acquired with different image acquisition sequences and were processed independently. To allow for any inaccuracies in the alignment of the two modalities, the fMRI-based ROI masks were further smoothed using a 6 mm Gaussian kernel. The resulting mask consequently extends significantly into white matter adjacent to functionally defined regions: 55.9% of voxels in the global mask are located in white matter tracts identified from the DTI FA map (FA ≥ 0.15; Dellani et al., 2007); see supplementary material (section A, Fig. S1).
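The ROI construction described above (a union of per-visit activation masks, followed by a check of how much of the mask lies in white matter) can be sketched as follows. This is an illustration on random toy volumes, not the authors' pipeline: the arrays stand in for thresholded group maps and an FA image, and the Gaussian smoothing step is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
shape = (8, 8, 8)  # toy volume; real maps would be loaded from image files

# Hypothetical thresholded activation maps, one boolean volume per visit.
visit_masks = [rng.random(shape) < 0.10 for _ in range(4)]

# Logical union (OR): a voxel is kept if it is significant at any visit.
roi = np.logical_or.reduce(visit_masks)

# Hypothetical FA map; voxels with FA >= 0.15 are treated as white matter.
fa = rng.random(shape) * 0.6
white_matter = fa >= 0.15

# Share of ROI voxels that fall inside the white-matter mask.
wm_fraction = (roi & white_matter).sum() / roi.sum()
print(f"ROI voxels: {roi.sum()}, fraction in white matter: {wm_fraction:.1%}")
```

The union guarantees that every per-visit mask is a subset of the final ROI, which is the property the text relies on.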

Second level fMRI analysis 2: training-induced functional activation change
For the second-level analysis (group analysis), contrast images from the first-level analysis were entered into a flexible ANOVA (repeated measures) analysis (Friston et al., 2002). This analysis was performed separately for the two functional experiments (voluntary and involuntary) and consisted of four contrasts. Two contrasts characterise the activation at baseline (week 0) and at the end of training (week 6) relative to rest: ACT w0 > REST w0 and ACT w6 > REST w6. The remaining two contrasts compare activation post-training (week 6) with the baseline (week 0): ACT w0 > ACT w6 and ACT w0 < ACT w6. Cluster-level probability values, thresholded at pFWE < 0.05, were computed from a voxel-level threshold of p-unc < 0.001. The cutoff for the cluster size used in the analysis was 845 voxels. All analyses were limited to the ROI.

DTI data acquisition
Diffusion-weighted images were collected immediately following the fMRI scans at each of the four visits (see Table 1). An EPI sequence with the following parameters was applied: repetition time (TR) = 3200 ms, echo time (TE) = 90 ms, flip angle = 90°, slice thickness = 2.5 mm, FOV = 220 × 220 voxels, 50 slices. As DTI sequences suffer from spatial distortions along the phase encoding direction (Thakkar et al., 2020), two diffusion-weighted sequences were acquired with reversed phase encoding directions, resulting in pairs of images with distortions in opposite directions. The principal DTI sequence (posterior/interior) consisted of a non-diffusion-weighted T2 image (b-value = 0 s/mm²) followed by 64-direction diffusion-weighted images (b-value = 1000 s/mm²).

DTI data analysis
DTI data were analysed using the FMRIB Software Library (FSL 5.0.9, http://www.fMRIb.ox.ac.uk/fsl/ ; Woolrich et al., 2009) in a pipeline (Burzynska et al., 2010) that included: (1) topup, to estimate and correct susceptibility-induced distortions (Andersson et al., 2003), (2) eddy, to correct for eddy currents and motion artefacts in diffusion data (Andersson and Sotiropoulos, 2016), (3) BET (brain extraction tool), to remove the skull and non-brain tissue (Pechaud et al., 2006), and (4) DTIFIT, for voxel-by-voxel calculation of the diffusion tensors (Basser et al., 1994). The following DTI parameters were computed from the tensor eigenvalues $\lambda_1 \geq \lambda_2 \geq \lambda_3$: axial diffusivity, $\mathrm{AD} = \lambda_1$; radial diffusivity, $\mathrm{RD} = (\lambda_2 + \lambda_3)/2$; mean diffusivity, $\mathrm{MD} = (\lambda_1 + \lambda_2 + \lambda_3)/3$; and fractional anisotropy (FA).
Instead of TBSS (tract-based spatial statistics; Smith et al., 2006), the conventional DTI analysis, which consists of nonlinear registration followed by projection onto an alignment-invariant tract representation (the "mean FA skeleton"), we followed the work of Schwarz et al. (2014), who showed that ANTs (Advanced Normalization Tools; Avants et al., 2011) groupwise registration methods enable more sensitive detection of true structural changes and greater specificity in resisting false positives that may arise from misregistration. As diffusion gradients are specified in native space (Thakkar et al., 2020), DTI analysis was also carried out in native space for each participant. To perform voxel-based analysis (VBA), we used ANTs (antsMultivariateTemplateConstruction2.sh; Klein et al., 2010) to generate an FA template image for each individual subject. All DTI maps (FA, MD, AD and RD) were then realigned in native space using this FA template. DTI parameter changes between visits were computed in native space as a first level of analysis.
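As a worked illustration of the four DTI measures named above, the sketch below computes AD, RD, MD and FA from three tensor eigenvalues. The FA expression is the standard textbook definition and is an assumption here, since the paper's own FA formula is not reproduced in this section; the eigenvalues are hypothetical.

```python
import math

def dti_scalars(l1, l2, l3):
    """Return AD, RD, MD and FA for tensor eigenvalues l1 >= l2 >= l3."""
    ad = l1
    rd = (l2 + l3) / 2
    md = (l1 + l2 + l3) / 3
    # Standard FA definition: normalised dispersion of the eigenvalues.
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0
    return ad, rd, md, fa

# Hypothetical eigenvalues in units of 10^-3 mm^2/s.
ad, rd, md, fa = dti_scalars(1.7, 0.3, 0.2)
print(f"AD={ad:.3f} RD={rd:.3f} MD={md:.3f} FA={fa:.3f}")
```

Note that MD equals (AD + 2 RD)/3, which is why MD can be treated as a composite of the other two diffusivity measures.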
To transform the DTI maps into a common space, transformation matrices mapping the individual FA templates into a common space (FMRIB58_FA_1 mm.nii.gz) were computed using antsRegistrationSyNQuick.sh. The same transformation was then applied to all DTI maps using antsApplyTransforms ( Avants et al., 2011 ).
The group analysis (as a second level of analysis) was then conducted after transforming the results of individual calculations into a common space. To increase the signal to noise ratio (SNR) ( Schwarz et al., 2014 ), all computed images were smoothed with a 5 mm Gaussian kernel after transformation to the common space. The FSL-randomise script ( https://fsl.fMRIb.ox.ac.uk/fsl/fslwiki/Randomise ; Winkler et al., 2014 ) was applied with default settings ( z = 2.3 and p = 0.05) and by running 5000 permutations for statistical inference.
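The logic of permutation-based inference on within-subject change can be sketched for a single value per participant. This is only an illustration of the sign-flipping idea for paired differences; it is not FSL randomise, which operates on whole images and adds cluster-level inference. The function name and the MD change values are hypothetical.

```python
import random

def sign_flip_p_value(diffs, n_perm=5000, seed=0):
    """One-sided permutation p-value for mean(diffs) < 0 via sign flipping."""
    rng = random.Random(seed)
    n = len(diffs)
    observed = sum(diffs) / n
    count = 0
    for _ in range(n_perm):
        # Under the null hypothesis the sign of each difference is arbitrary.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if permuted <= observed:
            count += 1
    # Count the observed statistic as one member of the permutation set.
    return (count + 1) / (n_perm + 1)

# Hypothetical per-participant MD changes (week 6 minus baseline), n = 15.
md_change = [-0.04, -0.02, -0.05, -0.01, -0.03, -0.06, -0.02, 0.01,
             -0.04, -0.03, -0.05, -0.02, -0.01, -0.03, -0.04]
p = sign_flip_p_value(md_change)
print(f"one-sided permutation p = {p:.4f}")
```

With 5000 permutations the smallest attainable p-value is 1/5001, which sets the resolution of the test.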
In a joint functional and DTI analysis, it is common to consider white matter tracts that connect functionally relevant grey matter areas. We performed a separate TBSS analysis to identify changes located in white matter tracts and found that all clusters of significant changes in the TBSS analysis were included in the main ROI mask. Instead of duplicating the material in the main text, we therefore cover the TBSS analysis in the supplementary materials (section A, Fig. S2).

fMRI and DTI data extraction for time course analysis
The time-course analysis is based on brain areas that showed significant functional or structural change at week 6 (the end of training) compared to baseline.

fMRI data extraction
The MarsBar ROI toolbox for SPM (release 0.42; Brett et al., 2002) and REX (http://web.mit.edu/swg/rex) were used to extract, for all other time points, the mean activation in those areas that exhibited significant change at week 6. The labels of these regions were obtained using SPM-Anatomy toolbox v.1.7. All clusters fell into occipital (v3d) and cerebellar (OC) regions (see Fig. 5a in the results section) and the right frontal eye field (FEF; see Fig. 4 in the results section). Where more than one activation cluster was located within a single anatomical region, the clusters were combined.

DTI data extraction
Analogous to the fMRI processing, mean DTI parameters (AD, RD, MD and FA) were calculated over all voxels showing significant change. Clusters for each of the four DTI parameters were extracted and analysed separately because the size and shape of regions of change differed across DTI parameters. This means that each parameter mean is based on a mask that is specific to the parameter under consideration (see Fig. 5 b for MD mask for time course data extraction; supplementary material, Fig. S6a for AD mask data extraction, S6b for RD mask data extraction; Fig. S4 for FA mask data extraction). Where more than one activation cluster was located within a single anatomical region the clusters were combined into a single data extraction mask (for time course analysis) to mirror the fMRI analysis.

Statistical analysis for all variables
To evaluate performance gains during training, paired-samples t-tests were computed using the Statistical Package for the Social Sciences (SPSS V 16.0) for Windows.
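For readers who want to see what the paired-samples comparison computes, a minimal sketch follows. It is not the SPSS procedure, only the textbook statistic; the response times are hypothetical values for 15 participants, which gives the 14 degrees of freedom reported in the results.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, df) for a paired-samples t-test on before - after."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD with n - 1 in the denominator
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical response times (s) at baseline and after training.
rt_before = [0.82, 0.95, 0.71, 0.64, 1.10, 0.77, 0.69, 0.88,
             0.92, 0.58, 0.75, 0.81, 1.02, 0.66, 0.73]
rt_after = [0.50, 0.55, 0.45, 0.41, 0.62, 0.49, 0.44, 0.52,
            0.51, 0.40, 0.47, 0.46, 0.58, 0.43, 0.42]
t_stat, df = paired_t(rt_before, rt_after)
print(f"t({df}) = {t_stat:.2f}")
```

A positive t here means that response times were lower after training than at baseline.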
For further analysis testing whether changes in behavioural performance, functional activation patterns, and structural measures are coincident, we correlated these measures.
We expect an improvement in performance measures (decrease in RT, MFD and TNF), a decrease in all diffusivity measures (MD, AD, RD) and an increase in fractional anisotropy (FA) with training. For this reason ( Ruxton and Neuhaeuser, 2020 ), one-sided statistical tests were used for these parameters.
Similarly, the existing literature ( Yotsumoto et al., 2008 ;Hadjikhani et al., 2001 ;Furmanski et al., 2004 ;Kourtzi et al., 2005 ;Little et al., 2004 ) shows consistent functional activation increases in motor and perceptual learning tasks. We therefore used one-sided t-tests, hypothesizing an increase in mean functional activity with training, in all direct comparisons of fMRI data.
Behavioural performance data (RT and TNF) and DTI measures for the correlational analysis were expressed as change relative to the first measure recorded for each participant to minimise inter-subject variability:

$$\Delta_t = \frac{P_t}{P_0}$$
where the relative performance change at time $t$, $\Delta_t$, is the ratio of the absolute performance $P_t$ to the performance at baseline $P_0$.
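The baseline normalisation described above amounts to dividing each participant's measurements by that participant's first value. A minimal sketch, with hypothetical participant identifiers and response times:

```python
def relative_change(series):
    """Express each measurement as a ratio to the first (baseline) value."""
    baseline = series[0]
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return [value / baseline for value in series]

# Hypothetical response times (s) at weeks 0, 2, 6 and 10.
rt_by_visit = {
    "p01": [0.80, 0.60, 0.48, 0.47],
    "p02": [1.00, 0.70, 0.55, 0.50],
}
rt_relative = {pid: relative_change(v) for pid, v in rt_by_visit.items()}
print(rt_relative["p02"])
```

Values below 1 indicate a reduction relative to baseline, so two participants with different absolute response times become directly comparable.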
The functional imaging data are already proportional signal change estimates and were therefore used as recorded. The significance level for all correlation analyses was set at p ≤ 0.05. Because the measures are highly correlated with each other (see Fig. 7 for summary data), they violate the independence assumption that underlies multiple comparison correction; we therefore present uncorrected p values for the correlation analysis.

Results
All 15 participants completed the training period with the minimum daily training session (30 min daily) and exceeded the minimum performance requirement (75% correct target identification in each task session). Thus, all behavioural and imaging data were included in the data analysis. Table 2 and section B in the supplementary materials (Table S1, Table S2 and Table S3) contain more details about the behavioural performance accuracy of the participants during the training and task sessions.
Behavioural performance improvements were maintained one month after the training ended in all behavioural measures. The mean RT at week 10 was significantly lower than at baseline (0.47 s ( ± 0.10 s) vs 0.80 s ( ± 0.19 s), t(14) = 10.542, p < 0.001) but not different to the RT measured at week 6, Fig. 2 . See supplementary material, section C, for the other behavioural measures.

Impact of training -imaging data
Our participants learned to execute a complex sequence of alternating speeded voluntary eye movements. To show that behavioural and functional changes were the result of training, we asked all participants to also carry out an involuntary eye movement (control) task, which allows training effects to be dissociated between the two tasks. We hypothesized an increase in functional activation and in the DTI FA measure, and a decrease in the diffusivity measures (MD, RD, AD), in task-relevant brain areas for the trained task only.
Both MRI functional tasks (voluntary and involuntary eye movement) cause similar global brain activation patterns ( Fig. 3 a and 3 b). A significant increase in fMRI signal with training, however, was only shown in the voluntary (trained) task, Fig. 3 c. No systematic training induced functional changes were shown for the involuntary (control) eye movement task, Fig. 3 d.
At the end of the training period, significant fMRI activation changes were observed in the frontal eye field (FEF), the occipital lobe and cerebellum ( Figs. 4 and 5 a), p (FWE) < 0.05 while participants performed the trained task.
Significant mean diffusivity reductions were also observed in matching regions, the occipital lobe and cerebellum (p(FWE) < 0.05, see Fig. 5b). The functional and structural brain changes overlapped mainly in the extrastriate area (v3d) and the Oculomotor Cerebellum region (OC), Fig. 5c.
We hypothesized that, at the end of the training period, significant increases in fMRI activation and FA and a simultaneous reduction in diffusivity measures would be seen. Our data show overlapping functional and structural changes in two functionally relevant areas. These changes are seen only for the trained task, not for the control task. In the FEF, functional but no structural changes are seen. Fractional anisotropy (FA), one of the key DTI measures, did not show significant change in any region (Fig. S4, section D in supplementary materials).
The pattern of change for the three diffusivity measures, AD, RD and MD, is similar. As the MD measure is a composite of AD and RD, the discussion below will focus on MD changes. Additional details are provided in the supplementary materials, including maps showing AD and RD data (section D, Fig. S6), and more details about the affected functional and structural brain areas (section D, Fig. S7).
Fig. 5 (caption, partial). Panel (c) shows areas where fMRI and DTI changes overlap. Note that the SPM fMRI images (panel a) are 'flipped' to facilitate comparison with the FSL representation of the MD images (panel b). All analyses were limited to the ROI area, which means that all significant functional and structural clusters are located in the ROI mask (supplementary material, Fig. S1a).

Time course of functional and structural change
Behavioural measures (Figs. 2 and S3) show the typical exponential performance improvement that was first described a century ago (Thurstone, 1919). The extracted microstructural imaging data show matching gradual MD reductions in both the visual and the cerebellar MD target regions (Fig. 6b and 6d) over the six weeks of training. At week 6, MD was significantly reduced in both target regions (MD visual, t(14) = 5.93, p < 0.001; MD cerebellum, t(14) = 8.08, p < 0.001).
Functional activation changes were measured for the trained, voluntary eye movement task as well as for a control task involving involuntary eye movements (voluntary: Fig. 6a, Fig. 6c and Fig. S8; involuntary: Fig. S9). Separate repeated measures ANOVAs with the factors time (weeks 0, 2, 6 and 10) and condition (voluntary, involuntary), and with participant as the repeated measure, were conducted for each of the three regions of interest (visual cortex, cerebellum and FEF). All three analyses showed significant main effects of learning duration (factor 'Time') and significant interactions between 'Time' and the factor 'Condition', consistent with differential learning effects for the two conditions. Significant main effects of 'Condition' were seen in visual cortex and cerebellum, but not in the FEF region. Post-hoc paired t-tests show that the activation increases between week 0 and week 6 are significant in all three areas for the voluntary eye movement task but not for the control task (Table 3).
Functional activations and mean diffusivity changes in the visual and cerebellar target areas, as shown in Fig. 6, partially reverted to baseline after the cessation of training. MD values in the visual area at week 10 remained significantly below baseline, t(14) = 1.72, p < 0.05, whereas the MD in the cerebellum was no longer significantly different from baseline, t(14) = 1.2, p < 0.12 (Fig. 6b and 6d). Functional activation during the voluntary task, which was increased significantly relative to baseline at weeks 2 and 6 during training, had reduced at the week 10 test to a point where it was no longer significantly different from baseline (visual cortex t(14) = −1.50, p = 0.07; cerebellum t(14) = −1.27, p = 0.11; and FEF t(14) = −1.55, p = 0.7), Fig. 6a and 6c.

Correlations between behavioural, functional and structural changes
If structural and functional adaptation supports behavioural adaptation, then the degree of this change should be related to behavioural performance changes at the individual level. The following sections describe correlations between measures, but also between neuroimaging measures recorded at week 2 and behavioural outcome measures recorded at weeks 6 and 10 to test whether early structural or functional change predicts longer term outcomes.

Overall pattern of correlations
The diagram in Fig. 7 shows an overview of correlations between neuroimaging parameters that changed significantly over the course of training and behavioural performance measures. In each case the change at weeks 2, 6, and 10, relative to baseline, is the basis for the correlation.
The measures are grouped by modality and marked by blue triangles: fMRI, top left; DTI diffusivity measures, middle; and behavioural measures, lower right. In each case the diagram shows that most measures within the two imaging modalities and behaviour are correlated. This confirms that the measures represent systematic, and gradual, changes that coincide over more than one brain region. Correlations across measures are much sparser: the most obvious correlations are between diffusivity measures and behavioural performance, identified by yellow shaded boxes in Fig. 7. AD, RD and MD measured at weeks 2 and 6 in the cerebellum are correlated with behavioural measures recorded at weeks 2-10.
Fig. 6. Impact of the training on imaging data: fMRI data is shown on the left (red), MD data is shown on the right (blue). The extracted imaging data represent, separately, the mean measures (fMRI beta and MD) of all significant clusters in the visual cortex and cerebellum where significant changes are seen after six weeks of training. Both metrics follow roughly exponential behaviour during training, but then revert towards baseline during the month after training ceases, while behavioural performance remains static.
Table 3. Summary statistics of the ANOVA main training effects: the table shows the main effects of condition (voluntary vs involuntary) and time (weeks 0, 2, 6 and 10) and their interaction for three separate ANOVAs that were computed for the three brain areas where significant change was seen. Significant interactions and a main effect of test time are seen in all three areas. Post-hoc paired t-tests, comparing the mean BOLD response after training with the baseline measure in each area, are shown on the right. A significant main effect of condition is seen in visual cortex and cerebellum, but not in FEF. Post-hoc tests show significant activation increases in all three areas for the trained, but not for the control, task.
Fig. 7. The diagram shows significant correlations (p < 0.05, one-sided, no correction) between neuroimaging and behavioural parameters. The matrix labels consist of three components. The first is the type of measurement (fM: fMRI; MD, RD, AD: diffusivity measures; RT: response time). The next letters identify the target area for the neuroimaging data (vis = visual cortex, cb = cerebellum, fef = frontal eye field) or, for behavioural data, where the data was recorded (lab = laboratory, sc = scanner). The final component is a number identifying the week in which the data was recorded (weeks 2, 6, 10). All changes are relative to the baseline at week 0. The size and colour of each circle indicate the degree and polarity of the correlation; see the colourbar on the right.
Two further clusters of correlations link diffusivity measures to functional activation in the visual cortex, light blue blocks in Fig. 7 . Correlations between functional activation patterns and behavioural data are very limited (top right corner in Fig. 7 ).
We include this data because it illustrates that the measures are highly correlated, especially within imaging modalities and across time. The data show clear structure, which is incompatible with the random pattern that would be expected if the correlations were type I errors arising from noisy data. It is important to note that many of the correlations in this bulk analysis, for example across different measures, in different brain areas, or recorded at different times, do not represent particularly meaningful comparisons. The detailed data discussed in the following sections represent planned comparisons for a small subset of possible correlations to test whether structural or functional change predicts performance.

Structural vs behavioural changes
Performance changes at the end of the training period are correlated with all three diffusivity measures (MD, AD and RD) in the cerebellum, but not in visual areas. Fig. 8 shows the MD data against mean RT measured in the lab, r(13) = 0.58, p < 0.01.
See supplementary materials (section E, Table S4 and Table S5) for additional correlation analysis between structural and lab behavioural data.

Fig. 8. Correlation between RT change and MD change in cerebellum: the graph shows a significant positive correlation between relative RT and MD changes amongst participants (numbers identify participants) over the six-week training period. All changes are relative to measures recorded at baseline (week 0).

Functional vs behavioural changes
A wide range of DTI diffusivity measures were correlated with behavioural performance measures obtained under controlled laboratory conditions; see Fig. 8 and section E, Tables S4 and S5 in supplementary materials. We therefore expected to also see correlations between the functional activation and behavioural performance changes. Our data did not show this correlation at any time point. A possible explanation for this discrepancy is that the structural and performance changes discussed in the previous section are long-term changes, while functional activation reflects short-term behaviour in the scanner that is affected by additional task and environmental constraints. To further investigate this, correlations with performance, measured in the scanner instead of the lab environment, were computed.
The overall performance data in the scanner was similar to that seen in the lab. The mean RT while participants performed the voluntary fMRI task in the scanner reduced significantly from 0.69 s (± 0.10 s) at baseline to 0.45 s (± 0.18 s) after six weeks of training, t(14) = 8.92, p < 0.001. Similar significant changes were seen after two weeks of training (t(14) = 12.92, p < 0.001) and a month after training ceased (t(14) = 12.05, p < 0.001) (see also Fig. S8d in supplementary materials, section F).
The functional data for the involuntary eye movement task shows similar overall activation patterns, but no significant activation change during the training period and no correlation of activation with either behavioural or structural change data, as expected for a control task, supplementary material (section F, Fig. S9).

Overall functional vs behavioural changes: voluntary vs involuntary task
The correlation analyses presented so far aimed to investigate parameter changes seen at discrete time points. This is important to evaluate to what extent individual data is related and can be used to make outcome predictions. The measure, however, is fragile because it considers parameter changes between baseline and single time points. Using single measures means that the data may be more affected by measurement noise. An alternative approach is to test whether general trends, for example fMRI activation vs RT, are correlated for the population. This data is much more robust because all measurements can be considered at once, but does not allow individual outcome predictions to be made. We present it here because it shows that, overall, functional activation is correlated with RT, particularly for the visual cortex. More importantly the analysis enables us to address one of the potential confounds in the experiment: Both tasks that participants performed in the scanner are self-timed; this means that participants process increasing numbers of targets per unit time as they become more proficient at the task. What if the functional changes we see in the fMRI signal are not a direct result of learning, but simply reflect faster processing? One might, for example, expect activity in circuits controlling eye-movements to increase as participants perform the task faster, but these increases would be possible without direct effects of learning in these areas. The data in Fig. 9 shows all fMRI data against RT in the voluntary and involuntary tasks. If rate, and not learning, modulated activation, then one would expect to see higher BOLD signals in task relevant areas for shorter average RTs. This is clearly seen for the learned task. The same effect, however, should also be visible for the involuntary task where we see a broad range of mean RTs. 
The involuntary task does not show any link between RT and fMRI signal in any of our target regions, indicating that rate effects do not account for the differences seen.

Outcome prediction
Cerebellar MD changes ( Fig. 8 ) as well as AD and RD changes (supplementary material, Table S4) at the end of training were correlated with behavioural performance improvements. One of the key motivations for this research was to test whether early structural change (our week 2 data) can predict future neuroplastic or performance improvements because this would provide the basis for rehabilitative outcome prediction.
Interestingly, as shown in Fig. 10 , the cerebellar MD change at week 2 predicts the changes of both the RT in lab at week 6, r(13) = 0.43, p < 0.05 ( Fig. 10 c), and MD at week 6, r(13) = 0.48, p < 0.03 ( Fig. 10 d). The other diffusivity measures, AD and RD, showed similar predictive properties as shown in the supplementary materials (section H, Fig. S6 and Fig. S11).
Functional activation data in visual cortex after two weeks of practice did not predict the behavioural changes in the lab but, surprisingly, predicted in-scanner performance well, both at the end of training and four weeks after the end of training (visual fMRI week 2 vs RT in scanner week 6, r(13) = −0.64, p < 0.005, Fig. 11a; visual fMRI week 2 vs RT in scanner week 10, r(13) = −0.64, p < 0.004, Fig. 11b).
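The prediction analyses above are across-participant correlations between an early measure (week 2) and a later outcome (week 6 or 10). The sketch below shows how such a correlation and its associated t statistic with n − 2 degrees of freedom (13 for 15 participants) can be computed. The function names and data are hypothetical, and no p-value is derived here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r, n):
    """Convert r to a t statistic with n - 2 degrees of freedom (|r| < 1)."""
    df = n - 2
    return r * math.sqrt(df / (1 - r * r)), df

# Hypothetical relative changes (ratio to baseline) for 15 participants.
md_week2 = [0.99, 0.97, 0.98, 0.95, 0.96, 0.99, 0.94, 0.97,
            0.98, 0.96, 0.95, 0.97, 0.99, 0.96, 0.98]
rt_week6 = [0.70, 0.62, 0.66, 0.55, 0.60, 0.64, 0.52, 0.63,
            0.61, 0.58, 0.59, 0.57, 0.68, 0.62, 0.65]

r = pearson_r(md_week2, rt_week6)
t_stat, df = r_to_t(r, len(md_week2))
print(f"r({df}) = {r:.2f}, t = {t_stat:.2f}")
```

A positive r in this setting means that participants with larger early MD reductions also show larger later RT reductions, since both measures are expressed as ratios to baseline.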

Discussion
The primary aim of the study was to investigate the link between behavioural performance on a visuomotor learning task and functional and structural parameters derived from neuroimaging.
As expected from previous work ( Little et al., 2004 ), all participants showed significant behavioural performance improvements over the course of the training period: the mean response time, the number of fixations required to complete the task, and the mean fixation duration required for individual decisions reduced significantly while response accuracy remained high ( > 90%). The results are consistent with two previous studies ( Mannan et al., 2010 ;Pambakian et al., 2004 ), which applied visual search training for rehabilitative purposes on hemianopia patients. Performance was measured four weeks after the cessation of training and remained at levels seen at the end of the training. This indicates that the training effect was persistent as would be expected for motor and perceptual learning.

Neuroplastic change
The principal finding is that scanning training leads to significant, bilateral functional activation increases in extrastriate visual cortex (v3d), the frontal part of the Oculomotor Cerebellum (OC) ( Kheradmand et al., 2016 ) and in the right FEF. The functional activation changes in extrastriate visual cortex and cerebellum were accompanied by a significant decrease in DTI mean diffusivity measures in these areas and the underlying connecting white matter tracts. No significant changes in FA were observed.

Functional activation during voluntary eye movements
Extrastriate cortex : Previous fMRI studies ( Büchel et al., 1998 ;Hietanen et al., 2006 ) highlighted the role of extrastriate area in visual processing. A longitudinal eye movement training study in hemianopia patients also reported an increase of brain activation following the training in this area ( Nelles et al., 2010 ). The current results suggest a key role of extrastriate cortex in visuomotor learning.
Our data show bilateral (although right-dominant) functional and structural plasticity at the junction between the intraparietal and transverse occipital sulci (IPS/TOS) and bilateral changes at dorsal V3 (v3d). This is consistent with data reported by Corbetta and colleagues (Corbetta et al., 1998), who showed selective activation in the IPS/TOS area when participants performed saccadic shifts to peripheral stimuli. Another fMRI study (Müller-Plath, 2008), using a visual search task, reported a role of IPS/TOS and dorsal V3 (v3d) in eye movement programming, although the specific role of v3d in eye movement processing is still not clearly understood.
Fig. 9 (caption, partial). Training reduces RT and increases fMRI activation in visual cortex (panel a) and FEF (panel c) significantly, while the correlation is marginally significant in cerebellum (panel e). Equivalent changes are not seen for the involuntary eye movement task that served as a control (panels b, d, f).
Cerebellum: Our results showed significant functional and structural alteration in the frontal part of the cerebellum (oculomotor region: the fastigial oculomotor region and central lobule, see Fig. 5a and 5b in the results section), consistent with previous studies (Wang et al., 2002; Fuchs et al., 2010; Patel and Zee, 2015; Kheradmand and Zee, 2011; Blurton et al., 2012) that highlight the important role of the cerebellum in eye movement control. The oculomotor cerebellum is, for example, involved in the control of gaze shifting and the rapid redirection of gaze from one object to another (Wang et al., 2002; Fuchs et al., 2010). The fastigial oculomotor region, which also changed functionally in the current study, facilitates saccade initiation (Patel and Zee, 2015; Kheradmand and Zee, 2011) and makes saccades fast, accurate and consistent (Robinson, 2001). Our findings mirror those of Blurton and colleagues (Blurton et al., 2012), who reported functional plasticity in the cerebellum following saccade adaptation training. This suggests that the cerebellum plays a key role in sequence learning (Hikosaka et al., 2002; Debaere et al., 2004), which is the basis of our scanning task.
Fig. 10 (caption, partial). ... cerebellum over the training period; (c) shows a significant correlation between the improvement in completion time (at week 6) and the reduction in mean diffusivity after two weeks of training (week 2), and (d) shows the correlation amongst participants between the initial stage of MD change (week 2) and the later stage of MD change (week 6) during visuomotor learning.
Fig. 11. Outcome prediction in scanner: (a) shows a significant correlation between the voluntary fMRI increase in the visual area after two weeks of training (week 2) and the reduction in scanner RT at the end of the training period (week 6); (b) shows a significant correlation between the voluntary fMRI increase in the visual area after two weeks of training (week 2) and the reduction in scanner RT a month after training ended (week 10).
Frontal eye field (FEF): A functional MRI study by Cornelissen et al. (2002 ) suggested a standard anatomical area for FEF at mean Talairach coordinates averaged over subjects and hemispheres of x = 40.6 ( ± 4.9), y = -8.5 ( ± 3.8), z = 55.4 ( ± 4.5) mm. Our functional results show a significant functional activation increase roughly in this anatomical area (MNI x = 44, y = − 11.89, z = 52.10, see Fig. 5 ). This activation change was significant two and six weeks after training commenced. No significant structural alterations, however, were seen in FEF in the current study. The FEF is known to play a key role in saccade initiation ( Alkan et al., 2011 ), as well as in cognitive processes that require gaze control, such as attentional orienting, visual awareness and perceptual modulation ( Vernet et al., 2014 ).
In addition to FEF, an area in right motor cortex (BA4, MNI coordinates: x = 54, y = −2, z = 26; see supplementary materials, section D, Fig. S7c) showed a significant functional change at the end of the training period in the current study, suggesting a role in visuomotor learning.
Despite the crucial role of the supplementary eye field (SEF) and dorsolateral prefrontal cortex (dlPFC) in saccadic control, the current data did not show evidence of a selective functional activation increase in these areas while participants performed the voluntary and involuntary eye movement tasks. The main roles of SEF and dlPFC are in the inhibition of unwanted reflexive saccades (Vernet et al., 2014; Pouget, 2015). SEF is also involved in sequencing saccades with body movements (Pouget, 2015; Pierrot-Deseilligny et al., 2004). A possible explanation for the absence of significant change in both SEF and dlPFC during the current functional tasks may be that the trained task requires systematic/predictive eye movements but does not include antisaccades or saccades combined with body movement.

Linking neuroplastic to behavioural change
A key finding in this study is that changes in diffusivity in the cerebellum, and, to a lesser extent, functional activation changes, are significantly correlated with behavioural performance changes at set time points. Behavioural and functional activation changes were seen for the trained voluntary eye movement task but not while participants performed involuntary eye movements, our control task. These findings are consistent with the interpretation that neuroplastic changes in our target areas support performance improvements. A closer look at the time course of these changes, however, suggests that this link is not direct.

Functional activation changes over time
Although a wide range of neuroimaging studies (Zatorre et al., 2012; Sagi et al., 2012; Cao et al., 2016; Thomas et al., 2009; Madden et al., 2009; Scholz et al., 2009; Mukai et al., 2007) link learning with brain alterations, the mechanisms underlying brain plasticity are still not fully understood (Fields, 2013) and the results are not entirely consistent. It has been argued that longitudinal learning studies can strengthen our understanding of the link between practice and neuroplasticity (Sampaio-Baptista and Johansen-Berg, 2017). Most studies, as discussed in the introduction, show activation increases during motor or perceptual learning over the course of training (Yotsumoto et al., 2008; Hadjikhani et al., 2001; Furmanski et al., 2004; Kourtzi et al., 2005). Some studies report that these BOLD response increases were maintained for months or years after training ended (Frank et al., 2018; Bi et al., 2014). Frank et al. (2018), for instance, reported that performance and functional activation changes caused by training on visual motion patterns were maintained three years after the completion of training. Others show contradictory data: Thomas et al. (2009), for example, report reduced activation in directly task-relevant areas, but a simultaneous activation increase in middle frontal cortex in a mirroring learning task. Yotsumoto et al. (2008) provide evidence for differential dynamics of BOLD activity during learning. They showed increasing activity and performance improvements in the early stages (day 1 and 10) of training, followed by a decrease of activation to baseline while performance remained unchanged. Similar data, at a much shorter time scale, are reported by Mukai et al. (2007), who scanned participants while they trained on a contrast discrimination task: while the training group showed overall activation increases in task-relevant areas compared to a control group, this activation reduced significantly over the first hour of training.
These findings, as Yotsumoto et al. (2008) argue, are consistent with a consolidation model in which learning leads to efficiency gains and therefore to reduced metabolic demand.
The functional imaging data reported here (Fig. 6 and Fig. S8 in supplementary materials) show significant initial increases relative to baseline (week 2), but no significant change in activation while training continued between weeks 2 and 6. This finding is consistent with a model that assumes changing dynamics in the BOLD response as training progresses. We do not, however, observe the activation reduction reported by Yotsumoto et al. (2008) while training was ongoing; instead, we observed a reduction towards, but not all the way to, baseline after training ceased, while performance was maintained.
The findings that functional activation patterns do not change monotonically during training or linearly with performance further strengthen the case for considering functional activation at multiple time points. They may explain why functional activation in visual cortex, recorded at week 2, but not at later stages, is a good predictor for performance at later stages.

Microstructural change over time
The current study showed exponential changes in behavioural performance, which were associated with more gradual microstructural alterations. This gradual reduction in diffusivity is consistent with a linear pattern of brain changes over time reported by Erickson et al. (2011) in a longitudinal study. The rather more gradual change of diffusivity measures, compared to fMRI data, may explain the better correlation with behavioural performance and suggests different time-courses of change in functional and microstructural plasticity measures.
A trend of structural changes to return to the baseline level after the completion of the training course, consistent with our findings, was reported in some longitudinal studies (Scholz et al., 2009; Lövdén et al., 2012). Microstructural alterations resulting from spatial navigation training, for example, reverted to the baseline four months after the completion of the training course (Lövdén et al., 2012). This is consistent with data presented by Sagi et al. (2012), who showed that the effects of learning modulated DTI measures in humans and rats within 24 h. They argue that, at least in rats, the measured MD decreases during training are linked to an increase in the number of synaptic vesicles, an increase in the number of astrocytic processes, and an increase in brain-derived neurotrophic factor (BDNF), which they argue is indicative of long-term potentiation. It is interesting to note the astrocyte involvement: if glial cell activity is upregulated during active learning phases and affects diffusivity measures, then a full or partial reversion of activity in these cells after training ceases might explain the partial reversal of the diffusivity reduction that we observed after active learning ceased.

Eye movement processing and fMRI activation
One of the surprising results of this study is the high degree of correlation between mean RT in the scanner and mean fMRI activation (Fig. S10 in supplementary materials). This finding raises the important question of whether RT directly predicts observed fMRI activation, for example via rate effects, rather than reflecting neuroplastic change.
Rate effects that reflect improvements in motor performance can affect the results in studies of learning-related changes in brain activity (e.g., Nyberg et al., 2006; Rao et al., 1996; Riecker et al., 2003). Nyberg et al. (2006) showed that motor training in a finger tapping task increased functional activity in SMA and right cerebellum. These effects could be dissociated from pure rate effects.
If one, or all, of our target areas were purely relay areas where activation increases directly reflect more frequent or faster eye movements, then increased metabolism, without any underlying neuroplastic change, might be expected as participants improve over time. This explanation is not plausible for two reasons. The most direct evidence is provided by the fMRI measures during involuntary tasks: here, in contrast to the trained task, raw RT and BOLD response data are not correlated, meaning that in our target regions the RT does not directly predict the BOLD signal (Fig. 9 in results section). A second argument for the hypothesis that the functional activation changes in this region are the result of learning is that the DTI analysis showed diffusivity changes in the same region. These changes, in contrast to fMRI activity, are clearly not task specific.

Structural changes supporting learning
The learning of new skills relies upon changes in brain function, and this functional adaptation depends on the capacity of the nervous system to modify its structure (Scholz et al., 2009). Our correlation analysis, which shows that behavioural performance, functional activation and microstructural change are linked in functionally relevant brain areas, is consistent with this.
An increase of FA and a reduction in MD, AD and RD are commonly associated with learning (Sampaio-Baptista and Johansen-Berg, 2017; Madden et al., 2009), and may represent microstructural alterations (e.g. myelination processes, gliogenesis, angiogenesis and fibre remodelling; Wang et al., 2014; Sampaio-Baptista and Johansen-Berg, 2017; Cao et al., 2016). The present data show a significant reduction in MD and AD during training, which is consistent with increased density of cellular membranes, for example of glial cells, that restrict the freedom of water diffusion (Ruxton and Neuhaeuser, 2020). Similarly, RD significantly reduces during training, and this is consistent with increased myelination (Cao et al., 2016) in white matter tracts because a growing myelin sheath restricts water diffusivity perpendicular to the direction of the axons.
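For readers less familiar with the four indices discussed here, the sketch below shows how MD, AD, RD and FA are conventionally derived from the three eigenvalues of a diffusion tensor. It uses the standard textbook definitions and is not a description of the processing pipeline used in this study; the function name `diffusion_metrics` and the example eigenvalues are illustrative assumptions, not values from our data.

```python
import math


def diffusion_metrics(l1, l2, l3):
    """Return MD, AD, RD and FA for one voxel from its tensor eigenvalues.

    Hypothetical helper for illustration only; eigenvalues are assumed to be
    non-negative and expressed in the same units (e.g. mm^2/s).
    """
    # Sort so that ev[0] is the principal (largest) eigenvalue.
    ev = sorted((l1, l2, l3), reverse=True)

    md = sum(ev) / 3.0            # mean diffusivity: average of all three
    ad = ev[0]                    # axial diffusivity: along the main axis
    rd = (ev[1] + ev[2]) / 2.0    # radial diffusivity: perpendicular to it

    # Fractional anisotropy: normalised spread of the eigenvalues (0 to 1).
    num = sum((x - md) ** 2 for x in ev)
    den = sum(x ** 2 for x in ev)
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0

    return {"MD": md, "AD": ad, "RD": rd, "FA": fa}


if __name__ == "__main__":
    # Illustrative eigenvalues only (not measured in this study).
    print(diffusion_metrics(1.7e-3, 0.3e-3, 0.2e-3))
```

Because RD averages only the two smaller eigenvalues, a change that restricts diffusion perpendicular to the principal axis lowers RD and MD while leaving AD unchanged, which is why the three diffusivity indices are reported separately.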
While this study presents robust reductions in MD, AD and RD that are correlated with behavioural data, the increase in FA, which we hypothesized to see, did not come close to significance in any of our target regions. FA changes are consistently reported for long-term training studies (Zatorre et al., 2012); it would therefore be interesting to test whether more intense training or longer training periods would change this finding (Lövdén et al., 2013; Cao et al., 2016).

Outcome prediction and rehabilitation
A key finding of this study was that diffusivity measures (MD, AD and RD), recorded after two weeks, predicted behavioural performance gains and structural change at the end of training. Functional activity at week two predicted performance at later time points in the scanner but not performance in the lab. One possible explanation for this discrepancy may lie in the complex and non-monotonic temporal dynamics that have been reported for functional activation changes during training, which we discuss in previous sections (Yotsumoto et al., 2008). A second consideration is that fMRI activation, like performance, is an instantaneous measure of activity in the scanner. It is well known, and not surprising, that running experiments in the novel, and perhaps threatening, scanner environment has a negative impact on performance (Gutchess and Park, 2006; van Maanen et al., 2016). Our behavioural data are consistent with this: the target detection accuracy in the scanner, for example, was significantly lower than in the lab (t(14) = 2.197, p < 0.01; see Tables S1 and S2, supplementary material Section B). Environmental changes will affect some participants more than others but are likely to affect functional and behavioural measures simultaneously. One would hope that any detrimental effects of the scanner environment resolve as participants get used to the environment, for example over repeated scans in learning studies. A potential issue here is that any baseline measures are taken at the first scan, where differential effects are likely to be largest and will affect all subsequent outcome measures. It is nevertheless difficult to derive a general recommendation to use 'in scan' performance, because these measures are constrained by the total time available for scans.
A possible application of the finding that microstructural data provide early outcome prediction lies in neurorehabilitation. One area of application is in specific situations where overt behavioural performance measures are difficult to obtain because of the patient's condition. An example of this would be very early aphasia interventions, which have shown promise after strokes, but where outcome measures (better fluency) are obtained weeks or months after admission (Godecke et al., 2012). Many patients would not be able to understand instructions or reliably execute them in functional imaging tasks, so that structural scans that do not require active patient participation in cognitive tasks offer a promising alternative. Using structural rather than functional imaging to predict outcomes, however, has potential advantages well beyond the narrow application areas where communication with patients is impaired. Our participants, even though they were healthy volunteers and were scanned repeatedly, showed different performance patterns in the scanner compared to performance in the lab. These performance differences are likely to be exacerbated for patient groups that are scanned infrequently, which is likely in a rehabilitative setting, or are older than our participants, because divided attention costs are larger for older than younger adults (Gutchess and Park, 2006; van Maanen et al., 2016). The fMRI environment, therefore, may have a disproportionate effect on the cognitive performance of older adults, who are more likely to require neurorehabilitation than young participants. DTI structural scans therefore offer the prospect of providing data that are easier to capture and more robust than functional imaging, particularly for patients with a range of cognitive issues, and for whom the experience is likely to be novel and frightening.

Age as a factor
Janacsek et al. (2012) measured the effect of age on participant performance using an implicit sequence learning (ASRT) task (Howard and Howard, 1997). This task, like our task, requires participants to identify stimuli presented in a predictable sequence of positions on the screen. The most important finding was that improvements in learning rate were seen in adolescents, learning rate was stable between 15 and 59 years, and then gradually reduced for older participants. Reaction times followed a U-shaped pattern with a minimum between 18 and 29 years, while accuracy monotonically increased up to very old age. Finding that learning rate does not change until we reach a relatively high age is consistent with reports of rapid learning-induced DTI changes seen in older participants (average 69 years; Antonenko et al., 2016), showing that microstructural change is not restricted to younger participants.
Our participants were aged between 18 and 60 years, and our analysis showed that age was not significantly related to any of the outcome measures, which were expressed as parameter change over time. We therefore did not include age as a covariate in our analysis. Age, however, was significantly correlated with the majority (11/17) of the absolute measures. On this basis we argue against the use of absolute neurometrics or performance measures as the basis for systematic evaluation of neuroplastic change over time.
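The distinction between absolute measures and change scores can be illustrated with a minimal sketch. The code below is not the analysis used in this study: the helper names `pearson_r` and `change_scores` are hypothetical, and the numbers are synthetic values constructed so that the absolute measure scales with age while the change over time does not.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def change_scores(baseline, followup):
    """Per-participant change over time (follow-up minus baseline)."""
    return [f - b for b, f in zip(baseline, followup)]


# Synthetic illustration only (not data from this study).
age = [20, 30, 40, 50, 60]
baseline = [1.0, 1.1, 1.2, 1.3, 1.4]            # absolute measure rises with age
training_effect = [-0.05, -0.03, -0.05, -0.03, -0.05]
followup = [b + d for b, d in zip(baseline, training_effect)]

r_absolute = pearson_r(age, baseline)            # close to 1 by construction
r_change = pearson_r(age, change_scores(baseline, followup))  # close to 0
```

In this constructed case an analysis of absolute values would be dominated by age, whereas the change scores isolate the training-related component, which is the reasoning behind expressing outcome measures as change over time.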

Limitations
Our participants were scanned four times over the course of the experiment. Our data indicate differential time courses of functional, microstructural and behavioural performance measures. It would be interesting to document very early neuroplastic changes in response to the task (Yotsumoto et al., 2008), as well as to extend the post-intervention period beyond the four weeks we used (Frank et al., 2018).
We show that functional activation measures predict learning outcomes measured in the lab less well than instantaneous performance in the scanner. One reason for this may be that our task was inherently self-paced, so that neural efficiency gains associated with the processing of individual stimuli could be counteracted by the increasing numbers of stimuli that participants were able to process with training. It would be instructive to consider this effect more systematically in future studies, as this may lead to stronger evidence for fMRI activity reduction through consolidation.
The present study was designed to identify co-localised functional and structural changes in a VPL task and to quantify the time course of these changes. While the number of participants we used for these experiments lies within the range typically used for longitudinal studies (Yotsumoto et al., 2008; Kourtzi et al., 2005; Frank et al., 2014: 15, 19 and 26 participants, respectively), we would have preferred to use a much larger participant number for the correlational analysis.

Conclusion
Visual search training led to an improvement in visual motor skills: fewer fixations and reduced fixation duration were required to perform the task. The behavioural improvement was associated with significant functional and structural brain plasticity. Microstructural alterations and functional activation changes coincide in two anatomical brain areas: the extrastriate area (IPS/TOS and V3d) and the cerebellar oculomotor region, both of which are known to be involved in visual eye movement processing.
The current study shows that microstructural changes can be reliably detected after relatively short durations of training and that these measures correlated well with functional and behavioural measures. We therefore argue these measures should be recorded routinely in learning paradigms because they provide further evidence that practice outcomes and neurometrics are related.
Water diffusivity indices (MD, AD and RD) may provide accurate measures that predict long-term learning outcomes at the initial stages of training.

Statement of data availability
All data and code are available from the authors upon reasonable request.

Funding
The Saudi Cultural Bureau in London supported this work, as the first author is a sponsored PhD student.