Reaction Time Variability and Brain White Matter Integrity

Objective: Mean speed of responding is the most commonly used measure in the assessment of reaction time (RT). An alternative measure is intraindividual variability (IIV): the inconsistency of responding across multiple trials of a test. IIV has been suggested as an important indicator of central nervous system functioning, and as such, there has been increasing interest in the associations between IIV and brain imaging metrics. Results, however, have been inconsistent. The present study seeks to provide a comprehensive evaluation of the associations between a variety of measures of brain white matter integrity and individual differences in choice RT (CRT) IIV. Method: MRI brain scans of members of the Lothian Birth Cohort 1936 were assessed to obtain measures of the volume and severity of white matter hyperintensities, and the integrity of brain white matter tracts. CRT was assessed with a 4 CRT task on a separate occasion. Data were analyzed using multiple regression (N range = 358–670). Results: Greater volume of hyperintensities and more severe hyperintensities in frontal regions were associated with higher CRT IIV. White matter tract integrity, as assessed by both fractional anisotropy and mean diffusivity, showed the smallest effect sizes in associations with CRT IIV. Associations with hyperintensities were attenuated and no longer significant after controlling for M CRT. Conclusions: Taken together, the results of the present study suggested that IIV was not incrementally predictive of white matter integrity over mean speed. This is in contrast to previous reports, and highlights the need for further study.

General Scientific Summary
Variability in speeded cognitive test performance has been argued to be a potential early marker of cognitive decline and progression into mild cognitive impairment in aging. Evidence as to the robustness of this relationship and its potential neurological underpinnings is varied. Our results suggest that average speeded performance, not variability, may be more reliably related to various measures of the brain. These findings are in contrast to much of the extant literature, highlighting the need for further research.
Keywords: white matter hyperintensities, diffusion tensor imaging, reaction time, intraindividual variability, cognitive aging
Supplemental materials: http://dx.doi.org/10.1037/neu0000483.supp
Speed of information processing and reaction time (RT) have been studied as integral parts of human cognitive capacities since the 19th century (Cattell & Galton, 1890), with interest persisting over the subsequent years (Deary, 2000; Diehl, Hooker, & Sliwinsky, 2015). It has been proposed that speed of basic information processing represents a fundamental and tractable element of human general cognitive abilities (Jensen, 2006), which has led to a huge degree of interest in studying its neurological basis (Eckert, 2011; Penke et al., 2010, 2012). Processing speed has also been suggested to be a key capacity in the study of cognitive aging (Madden, 2001; Salthouse, 1996). Classically, RT studies have focused on some measure of central tendency, or average speed over trials. More recently, however, focus has switched to how speed of responding may vary across a set of trials (Hultsch, MacDonald, & Dixon, 2002). Within-individual variability in RT is correlated with average RT, but debate remains as to which is the most fundamental, and whether they share neuroanatomical correlates.
Intraindividual variability (IIV) in cognitive assessment indexes the consistency of a person's responses across a short period of time. In the context of an RT task, IIV is the amount of trial-to-trial variability. It provides a complement to the more widely used mean (or other index of central tendency) RT across a number of trials. IIV is a trait-like characteristic of an individual, in that people more variable on one cognitive task are also more variable on different tasks, and those more variable within a testing occasion are also more variable across occasions (Hultsch, MacDonald, Hunter, Levy-Bencheton, & Strauss, 2000). IIV is significantly correlated with higher level cognitive functioning; for example, with general mental ability (Deary, Der, & Ford, 2001; Rabbitt, Osman, Moore, & Stollery, 2001), with less variable people tending to have higher cognitive ability. There is a growing interest in IIV due, in part, to its predictive value. IIV predicts change in cognitive abilities over time (Lövdén, Li, Shing, & Lindenberger, 2007; MacDonald, Hultsch, & Dixon, 2003; Nilsson, Thomas, O'Brien, & Gallagher, 2014) and mortality (Deary & Der, 2005a). Furthermore, IIV differentiates between groups of different neurological health statuses, for example, mild cognitive impairment (MCI) versus no-MCI (Dixon et al., 2007), and dementia versus no dementia (Hultsch et al., 2000). People with MCI or dementia are, on average, slower and more variable, and there is some evidence that IIV has predictive value over and above that of RT mean (Dixon et al., 2007; Hultsch et al., 2000).
IIV increases with age from young adulthood (see Dykiert, Der, Starr, & Deary, 2012; Hultsch, Strauss, Hunter, & MacDonald, 2008, for reviews). The mechanisms underpinning this age effect are not well understood. The simplest explanation for the increased IIV is that it is driven by general slowing that occurs with age. In other words, as mean RT increases, so does the IIV. However, a number of researchers have argued that the increasing of IIV is the primary phenomenon which, in turn, leads to an increased mean RT. Several theories have been proposed as possible mechanisms of IIV; for example, higher frequency of attentional blocks (Bunce, Warr, & Cochrane, 1993) or lapses of intention (West, Murphy, Armilio, Craik, & Stuss, 2002), which are related to poorer executive functioning. The lapses or blocks lead to very long RTs on trials on which they occur, thus increasing the overall IIV. Naturally, these long RTs also lead to an increase in mean RT.
A primary focus of research examining the biological basis of RT and IIV has been on the brain. Life-course changes in IIV (decrease in childhood, relative stability in adulthood and an increase in older age) closely map onto the maturation and degeneration of the brain across the life span (MacDonald, Nyberg, & Bäckman, 2006). Specifically, accumulating evidence from recent imaging studies has highlighted brain white matter and its integrity as potentially important for RT. Table 1 summarizes the results from a number of recent studies which have considered the associations between IIV in an RT task and metrics derived from brain imaging. A few generalities may be taken from the content of Table 1. First, across modalities, there is some evidence that the effects of IIV may be independent of mean RT (e.g., Jackson, Balota, Duchek, & Head, 2012; Fjell, 2007). Second, in studies focused on brain volume measures, there has been some consistency in effects located in the frontal white matter (Bunce et al., 2007; Haynes et al., 2017; Lövdén et al., 2013). Studies focused on white matter connectivity have also found support for the importance of frontal associations (Fjell, Westlye, Amlien, & Walhovd, 2011; Moy et al., 2011). Although these results, at face value, suggest a consistency of effects observed across studies, the issue of whether RT variability is related to white matter (WM) micro- and macrostructure is far from resolved. There is much heterogeneity in the studies described here, in particular with reference to the RT tasks on which intraindividual variability is calculated; the measures of WM integrity adopted; and the size, age, and make-up of the samples.
Variability in cognitive performance is not a unitary concept. Even the same measure, for example individual standard deviation, may represent qualitatively different aspects of human performance, depending on the task or the timescale from which it was derived. For example, there are different components to different tasks, such as perceptual, motor, reasoning, decision making, or inhibition of irrelevant response. Variability in some of these components might have different neural underpinnings. Variability in CRT, which is a relatively simple task requiring only a minimal amount of cognitive processing (a selection and execution of an appropriate response), may not be readily comparable to variability in a task that still uses RT as its "score" but requires more complex cognitive operations (e.g., inhibition of an irrelevant response or performing operations in working memory). Consistent with this notion, there are reports of different IIV-WM integrity associations from tasks of different difficulty (e.g., Bunce et al., 2007; Deary et al., 2006; Fjell et al., 2011; Haynes et al., 2017; Mella et al., 2013). Of note is that both higher and lower associations have been reported for more complex tasks. Further, Fjell et al. (2011) demonstrated that associations of IIV and measures of WM integrity differed, not only in magnitude but also in spatial distribution in the brain, depending on whether congruent (less demanding) or incongruent (more demanding) trials were selected for the calculation of IIV. Given these findings, we propose that the question of whether there is a relationship between RT variability and WM micro/macrostructure needs to be addressed by a series of studies focused on specific RT tasks.
A second important consideration is the size and structure of the samples used in the extant research. The problem of low power in studies with small samples is generally accepted, insofar as underpowered studies are less likely to detect a true effect (i.e., they are more likely to produce false negatives). However, two issues associated with small samples that are underappreciated are that (a) even the effects that are found to be significant are less likely to reflect a true effect and (b) the effect sizes of significant results are more likely to be overestimated (Button et al., 2013). Sample sizes of studies reviewed in Table 1 vary from 25 to 526. For reference, a sample size required to achieve 80% power to detect a small (r = .1) or medium (r = .3) correlation at α = .05 would be 782 and 84, respectively. This is clearly an oversimplification of the process of appropriate power calculations; however, considered in light of these estimates, none of the studies reviewed were sufficiently powered to detect a small effect and more than half had insufficient power to detect a moderate effect. Finally, and as noted above, age plays an important role in the relationship between RT IIV and brain integrity. Fjell et al. (2011) found that the association between white matter microstructure and IIV increases with age, with stronger associations found among older participants (age ≥52 years) than younger participants (age <52 years). Moy et al. (2011) found that effects for radial diffusivity (RD) and mean diffusivity (MD) were no longer significant after controlling for age (range = 20–66+ years in their sample). Given the potential complexity of the relationship with age, birth cohort studies may be particularly useful as they provide built-in control for age effects.
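The sample-size figures quoted above for small and medium correlations can be approximated with a short calculation. The sketch below uses the Fisher z approximation, which is an assumption on our part (the text does not say which method or software produced 782 and 84); it lands within a participant or two of those figures. The function name `n_for_correlation` is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a correlation r with a two-sided test.

    Fisher z approximation (an assumption, not the source's stated method):
    N = ((z_{1 - alpha/2} + z_{power}) / atanh(r))^2 + 3
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for target power
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


small = n_for_correlation(0.1)   # "small" effect, r = .1
medium = n_for_correlation(0.3)  # "medium" effect, r = .3
print(small, medium)
```

Because the approximation differs slightly from an exact calculation, the printed values should be read as ballpark checks on the quoted figures rather than as replacements for them.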
In the current study, we seek to provide a comprehensive assessment of the associations between a variety of volumetric and connective brain imaging metrics and variability in a single task (CRT task) in a large birth cohort, thus minimizing the influence of test comparison, age effects and small sample estimate inconsistency.

Method

Participants
Participants in the current study were from the Lothian Birth Cohort 1936 (LBC1936), a longitudinal study of cognitive aging. The study sample consists of surviving members of the Scottish Mental Survey 1947, most of whom were tested on an IQ-type test at school at approximately age 11 years. The LBC1936 participants were mostly resident in Edinburgh and its surrounding area (i.e., the Lothians, Scotland) at recruitment to Wave 1 of the study at about age 70 years. Study protocols for initial recruitment and subsequent waves of data collection, including brain MRI, are reported in detail elsewhere (Deary, Gow, Pattie, & Starr, 2012; Deary et al., 2007; Wardlaw et al., 2011).
The LBC1936 sample consisted of 1,091 participants in Wave 1 (age M = 69.5 years, SD = 0.8), of whom 866 participants returned in Wave 2 (age M = 72.5 years, SD = 0.7). Of these, 855 were invited to undertake MRI, of whom 728 initiated the MRI protocol. For the current study, we retained only those participants for whom quantitative estimates of WMH volume were available. Reasons for absence of WMH data included the use of a shortened sequencing protocol with some participants who were uncertain or anxious, premature termination of the scan due to issues such as claustrophobia, poor quality data due to movement artifacts, and a number of other health and safety reasons (for discussion of some of these issues see Sandeman et al., 2013). The resultant total study sample comprised 671 participants (see Table 2 for sample demographics). All data used in the current cross-sectional study come from Wave 2 of testing.

CRT Variability and Mean
CRT mean and variability were measured using a portable device designed for the U.K. Health and Lifestyle Survey (Cox, Huppert, & Whichelow, 1993). The box includes a high-contrast LCD display and five response keys labeled 1, 2, 0, 3, 4 arranged in a shallow arc. In the four CRT task, participants placed their second and middle fingers of each hand on the buttons labeled 1, 2, 3, and 4. Participants were presented with a number (1, 2, 3, or 4) on the LCD screen and had to press the corresponding button as quickly as possible. The test consisted of a total of 48 trials: eight practice trials and 40 test trials. Within the test trials, each of the numbers (1, 2, 3, and 4) appeared 10 times in a random order. The time between trials varied randomly from 1 s to 3 s across all trials.
The box provides the mean and standard deviation of both the correct and incorrect responses; however, only data from the correct responses were available for the current study. Deary and Der (2005b) reported the test-retest reliability of M CRT and SD CRT based on a sample of 49 adults (age M = 37.1 years, SD = 11.4). M CRT had a test-retest stability of 0.92, whereas for SD CRT the test-retest reliability was 0.73.
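As an illustration of the per-participant summaries involved, the sketch below computes the mean and standard deviation of correct-trial RTs, together with the coefficient of variation (SD CRT divided by M CRT) used in the analyses. The trial values, the function name `crt_summary`, and the choice of the sample (n − 1) standard deviation are all assumptions made for illustration; the device's internal computation is not described in the text.

```python
from statistics import mean, stdev


def crt_summary(rts_ms, correct):
    """Return (mean, SD, CV) of reaction times over correct trials only."""
    kept = [rt for rt, ok in zip(rts_ms, correct) if ok]
    m = mean(kept)
    sd = stdev(kept)  # sample SD (n - 1 denominator); an assumption
    return m, sd, sd / m  # CV = SD / M


# Hypothetical trial data in milliseconds, not taken from the study
rts = [610, 655, 702, 580, 940, 630, 615, 690]
ok = [True, True, True, True, False, True, True, True]

m_crt, sd_crt, cv_crt = crt_summary(rts, ok)
print(round(m_crt, 1), round(sd_crt, 1), round(cv_crt, 3))
```

Dropping the single incorrect (and slow) trial before summarizing mirrors the restriction to correct responses; including it would inflate both the mean and the variability estimate.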
In the current analyses, our primary variable of interest is SD CRT. However, for comparison purposes with previous studies, we also report results for models in which the dependent variable is the coefficient of variability (CV; CV = SD CRT/M CRT). As is expected, these three variables show moderate to high positive correlations (M CRT and SD CRT: r = .62; M CRT and CV CRT: r = .16; SD CRT and CV CRT: r = .86).
Wardlaw et al. (2011) provide full details of the brain imaging protocol. In brief, participants underwent whole brain structural and high angular resolution 2-mm isotropic voxel diffusion MRI (7 T2- and 64 diffusion-weighted [b = 1,000 s/mm²] axial single-shot spin-echo echo-planar imaging volumes) on a GE Signa Horizon HDxt 1.5T clinical scanner (General Electric, Milwaukee, WI), using a self-shielding gradient set (maximum gradient strength 33 mT/m) and an eight-channel phased-array head coil. The structural MRI included axial T2- (1-mm × 1-mm × 2-mm voxels), T2*- (1-mm × 1-mm × 2-mm voxels), and FLAIR-weighted (1-mm × 1-mm × 4-mm voxels) scans, and a high resolution T1-weighted volume scan (1-mm × 1-mm × 1.3-mm voxels) acquired in the coronal plane. All sequences were collected with contiguous slice locations, whereas the acquisition parameters for the T2-, T2*-, FLAIR and diffusion MRI protocols, that is, field-of-view (256 mm × 256 mm in all cases), imaging matrix, slice thickness and location, were chosen to allow easier coregistration between scans.

Quantitative White Matter Hyperintensity (WMH) Volumes
Prior to image segmentation, all structural scans were coregistered using FLIRT, a linear automatic registration tool from the FMRIB Software Library (http://www.fmrib.ox.ac.uk/fsl). We used a validated multispectral image processing tool, MCMxxxVI (Valdés Wardlaw et al., 2011; http://sourceforge.net/projects/bric1936) for segmentation of brain tissue volumes from the four structural scans, that is, T2-, T1-, T2*- and FLAIR-weighted MRI, to measure: intracranial volume (all soft tissue structures inside the cranial cavity including brain, dura, cerebrospinal fluid and venous sinuses); total brain volume (the actual brain tissue volume without the superficial or ventricular cerebrospinal fluid); cerebrospinal fluid (all cerebrospinal fluid inside the cranial cavity including the ventricles and superficial subarachnoid space); and WMH volumes. MCMxxxVI does not distinguish hyperintense and hypointense areas of cerebromalacia due to old cortical/subcortical infarcts or lacunes from WMH and cerebrospinal fluid, respectively. Therefore, these areas were masked out from the respective binary masks by thresholding in FLAIR sequences using a region-growing algorithm from Analyze 10.0 (http://www.analyzedirect.com/Analyze). Where stroke lesions were confluent with WMH, the boundary between the two was determined by evaluation of the WMH in the contralateral hemisphere and neuroradiological knowledge. Brain tissue volumes were measured blind to participant information. For the current study, we use the total WMH volume (cm³) after first residualizing for overall brain size (intracranial volume, cm³).
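Residualizing WMH volume for intracranial volume can be sketched as follows. The text does not specify the procedure, so this assumes the common approach of taking residuals from a simple linear regression of WMH volume on intracranial volume; the function name `residualize` and the volumes are hypothetical.

```python
from statistics import mean


def residualize(y, x):
    """Residuals of y after simple linear regression on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical volumes in cm^3, for illustration only
icv = [1350.0, 1420.0, 1510.0, 1480.0, 1390.0, 1600.0]
wmh = [4.2, 9.8, 12.5, 6.1, 15.3, 11.0]

wmh_resid = residualize(wmh, icv)
print([round(v, 2) for v in wmh_resid])
```

By construction the residuals have mean zero and are uncorrelated with intracranial volume, so any later association with CRT measures cannot be attributed to overall head size.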

Qualitative White Matter Lesion Location
Qualitative visual ratings of the intensity and location of WMH were scored using the Wahlund scale based on the FLAIR and T2-weighted scans (Wahlund et al., 2001). Hyperintensities were rated both bilaterally and as an overall score in the frontal, parieto-occipital and temporal lobes, as well as the basal ganglia and infratentorial regions. Hyperintensities were rated on a four-point scale (for basal ganglia: 0 = no hyperintensities; 1 = one focal hyperintensity; 2 = more than one focal hyperintensity; 3 = confluent hyperintensities; for other regions: 0 = no hyperintensities; 1 = focal hyperintensity; 2 = beginning confluence hyperintensity; 3 = diffuse involvement of the entire region) by Zoe Morris, a trained neuroradiologist. This process results in an ordered categorical variable. On inspection of the score distribution (see Table 2), it became clear there were very low frequency cells which would be problematic for the estimation of the association beta coefficients. As such, we created binary variables for each area by combining 0 and 1 and 1 and 2 scores from the original scale.
For the current analysis, we used the overall Wahlund ratings rather than considering the left and right hemispheres individually. Correlations between left and right hemispheres were moderate to high for all lobes and regions (frontal lobe = .86; parieto-occipital lobe = .89; temporal lobe = .64; basal ganglia = .72; infratentorial region = .88).
Note. Both white matter tract integrity general fractional anisotropy factor (WMT gFA) and white matter tract integrity general mean diffusivity factor (WMT gMD) are regression-based factor scores, standardized to mean of 0 and standard deviation 1. CRT = choice reaction time; FA = fractional anisotropy; MD = mean diffusivity.

Tract Segmentation
The diffusion MRI data were preprocessed using FSL tools (FMRIB, Oxford, UK; http://www.fmrib.ox.ac.uk) to extract the brain, remove bulk patient motion and eddy current induced artifacts, and generate parametric maps of fractional anisotropy (FA). Underlying connectivity data were generated using BedpostX/ProbTrackX with the default settings of a two-fiber model per voxel, and 5,000 probabilistic streamlines with a fixed separation of 0.5 mm between successive points (Behrens, Johansen-Berg, Jbabdi, Rushworth, & Woolrich, 2007).
Twelve tracts of interest were identified using probabilistic neighborhood tractography, a novel approach for automatic and reproducible tract segmentation (Clayden, Storkey, & Bastin, 2007), as implemented in the TractoR package for fiber tracking analysis (Clayden et al., 2011; http://www.tractor-mri.org.uk). Briefly, this method works by segmenting the same fasciculus-of-interest across a group of subjects from single seed point tractography output by modeling how individual tracts compare to a predefined reference tract in terms of their length and shape (Clayden et al., 2007). In practice, multiple native space seed points are placed in a cubic neighborhood of voxels (typically 7 × 7 × 7) surrounding a seed point transferred from the center of the reference tract, which is defined in standard space, with the tract that best matches the reference chosen from this group of 'candidate tracts'. Tracts assessed were the genu and splenium of corpus callosum, and bilateral anterior thalamic radiations, rostral cingulum bundles, arcuate, uncinate and inferior longitudinal fasciculi. Tract masks generated by probabilistic neighborhood tractography were overlaid on the FA parametric maps and tract-averaged values of these biomarkers, weighted by the connection probability, determined for each tract in every participant.
To ensure that the segmented tracts were anatomically plausible representations of the fasciculi of interest, a researcher (i.e., Susana Muñoz Maniega) visually inspected all masks blind to the other study variables and excluded tracts with aberrant or truncated pathways. In general, probabilistic neighborhood tractography was able to segment the 12 tracts of interest reliably (see Clayden, Storkey, Muñoz Maniega, & Bastin, 2009) in the majority of participants, with tracts that did not meet quality criteria, such as truncation or failing to follow the expected path, ranging from 0.3% for the splenium of corpus callosum to 16% for the left anterior thalamic radiation, with a mean of 5%. (Failures in tract segmentation are typically caused by underlying tractography errors in BedpostX/ProbTrackX resulting from finite image resolution, small registration mismatches in the component diffusion MRI volumes and measurement noise.) From the point of view of substantive investigations, the 12 tracts represent a good balance between projection (anterior thalamic radiation), commissural (genu and splenium of the corpus callosum) and association (arcuate fasciculus, rostral cingulum, uncinate fasciculus, and inferior longitudinal fasciculus) fibers which connect a wide variety of brain regions. In the current study, we used FA and MD data for each of the 12 tracts to compute a metric of overall white matter integrity following Penke et al. (2012).
Confirmatory factor analytic models were fit in Mplus 7.4 (Muthén & Muthén, 1998) using full information maximum likelihood estimation to account for the small proportions of missing data. A single general integrity factor was modeled, with all 12 tracts loading on it. Separate models were fit for FA (gFA) and MD (gMD). Values for the left and right hemispheres of each tract were allowed to correlate in order to account for the local dependence. Regression-based factor scores were estimated from these models and used in subsequent analyses.
In addition, and for information, we also ran models using the hemispheric tract average for each of the tracts listed above.

Health Covariates
Participants were asked a series of questions on their medical history by an interviewer, which were responded to with simple Yes/No answers. The questions asked whether participants had a history of cardiovascular disease, hypertension (being treated for), high cholesterol, diabetes, blood circulation problems (being treated for), or stroke. In all statistical analyses (see next section), these variables were included as individual binary covariates. This was done to provide some statistical control for any shared variance between presence of disease, imaging metrics and speeded performance.

Statistical Analyses
We built two regression models for each of our predictor variables of interest: WMH, general white matter integrity FA and MD, and Wahlund ratings of white matter lesion severity and location. In addition to the primary imaging variables of interest, we also include results from analyses using the hemispheric averages (or single values for the genu and splenium of the corpus callosum) for each white matter tract as predictors of speed variability in place of the general FA and MD variables.
In the first model (Model 1), we included age, sex, health covariates and the brain integrity measure of interest. We chose to include age as, despite the use of a birth cohort and thus a narrow age range, both RT and white matter are sensitive to effects of age. Correlations between age and the focal RT and white matter variables are provided in the online supplemental material. In the second model (Model 2), we additionally included M CRT. Here we were interested in the extent of attenuation of any main effects after controlling for average RT.
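The logic of comparing Model 1 with Model 2 can be illustrated with the textbook formula for a standardized coefficient with two correlated predictors. This is a deliberately simplified sketch: it ignores age, sex, and the health covariates, and the two correlations involving the imaging measure are hypothetical values chosen for illustration. Only the SD CRT and M CRT correlation (r = .62) comes from the text.

```python
def beta_adjusted(r_y1, r_y2, r_12):
    """Standardized slope for predictor 1 on y, holding predictor 2 constant.

    r_y1: correlation of the outcome with predictor 1 (imaging measure)
    r_y2: correlation of the outcome with predictor 2 (M CRT)
    r_12: correlation between the two predictors
    """
    return (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)


r_sd_mean = 0.62   # SD CRT with M CRT, as reported in the text
r_sd_wmh = 0.10    # hypothetical: SD CRT with an imaging measure
r_mean_wmh = 0.14  # hypothetical: M CRT with the same imaging measure

beta_model1 = r_sd_wmh  # with one standardized predictor, beta equals r
beta_model2 = beta_adjusted(r_sd_wmh, r_sd_mean, r_mean_wmh)
print(round(beta_model1, 3), round(beta_model2, 3))
```

With these illustrative inputs the imaging coefficient shrinks toward zero once M CRT enters the model, which is the kind of attenuation the second model is designed to reveal. The size of the shrinkage depends on how strongly the imaging measure correlates with M CRT relative to SD CRT.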
The primary focus of this study was the prediction of SD CRT as a measure of variability. However, for completeness and following reviewer comments, given the exploratory nature of the study, we also estimate models using both M CRT and the CV CRT as dependent variables. In the analyses of M CRT, the second-stage models additionally include SD CRT. Results for assumption checks for all models are included in the online supplemental material.
To assess the robustness of our findings, a number of sensitivity analyses were conducted. First, to assess the influence of outlying values, we reran our final models using robust regression using the Huber method as implemented in the rlm function in the MASS package in R (Venables & Ripley, 2002). Second, we reran our final models on the basis of subsets of the full data set: those individuals with a history of cardiovascular disease (CVD) or stroke (n = 214) and those with a history of neither (n = 457).
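To give a sense of what Huber-type robust regression does, the sketch below fits a single-predictor line by iteratively reweighted least squares with Huber weights and a MAD-based scale estimate. It is a minimal illustration in Python, not the rlm implementation used in the study; the function name `huber_line`, the tuning constant default, the iteration count, and the data are all assumptions.

```python
from statistics import median


def huber_line(x, y, k=1.345, iters=50):
    """Fit y = a + b*x by iteratively reweighted least squares (Huber weights).

    With iters=1 the result is the ordinary least squares fit, because the
    first pass uses unit weights.
    """
    w = [1.0] * len(x)
    a = b = 0.0
    for _ in range(iters):
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        b = sxy / sxx
        a = my - b * mx
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        scale = median(abs(r) for r in resid) / 0.6745  # MAD-based scale
        if scale == 0:
            break
        # Full weight for small residuals, down-weighting for large ones
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
    return a, b


# Hypothetical data: a line with slope 0.5, small noise, and one outlier
x = list(range(10))
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
y = [2 + 0.5 * xi + e for xi, e in zip(x, noise)]
y[9] += 10  # single outlying observation

a_ols, b_ols = huber_line(x, y, iters=1)
a_huber, b_huber = huber_line(x, y)
print(round(b_ols, 3), round(b_huber, 3))
```

Comparing the two slopes shows the purpose of the sensitivity check: if robust and ordinary estimates agree closely on real data, outlying observations are unlikely to be driving the results.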

Results
Descriptive statistics are shown in Table 2. The frequency of the Wahlund ratings indicates that hyperintensities are primarily located, and have greater severity, in the frontal and parieto-occipital regions. As often observed in aging samples, the most commonly reported medical problems were hypertension and high cholesterol.
Results of regression diagnostic tests suggested no evidence of multicollinearity, heteroscedasticity, nonindependence of errors, nonnormality of model residuals, or influential observations (see the online supplemental material). Reaction time variables often display high levels of skew and kurtosis; however, this problem is usually less severe in CRT compared with simple RT. As can be seen in Table 2, in our study SD CRT was relatively normally distributed. As such, we did not deem it necessary to rerun models using the log transform of RT, as is often reported in the literature. Tables 3 through 8 display the standardized regression coefficients for models including WMH volume, gFA, gMD, Wahlund ratings, individual tract average FA and individual tract average MD, respectively. In order to facilitate comparisons across models, each table contains results from all models with SD CRT, M CRT, and CV CRT as the outcome variables of interest. Given the large number of models and tests in this exploratory study, p values are reported for completeness, but we refrain from interpreting individual effects based on these (for the full model results tables, see Tables S2 through S18 in the online supplemental material).
Across all models, the variance accounted for by age, sex, health covariates and the focal brain imaging variables was small, ranging from 2.6% to 4.8% of variance explained for SD CRT, with ranges of 3.3% to 5.4% and 0.9% to 3.9% of variance explained for M CRT and CV CRT, respectively. As expected given the strong positive correlation between M CRT and SD CRT, the addition of M CRT to models predicting SD CRT increased variance explained to between 36.0% and 41.1%. Similarly, and again as expected, when SD CRT was added to models predicting M CRT, variance explained increased to between 36.6% and 41.7%. (See Tables S2 through S18 in the online supplemental material for full results of model F-test variances explained.)

Covariate Effects
For age and sex, the direction of the effects indicated that female participants and those who were older had higher SD CRT, and thus were more variable in performance, whereas men and older participants had higher M CRT, indicating they were slower on average across trials. Of the health covariates, all effect sizes were small (absolute β range = .00 to .11), with effects generally indicating that those who had a history of a medical condition were both more variable and had a higher M CRT.

Main Effects
Across models, WMH volume demonstrated consistently larger effects (β range = .05 to .14), with the direction of the effects indicating that increased WMH volume was associated with greater variability and higher average CRT. The largest effects across models were seen with the white matter tract variables. Of the general measures, the largest effect was for gFA predicting M CRT (β = 0.15). Considering the specific tract averages, the strongest effects were seen for both FA and MD in the genu of the corpus callosum, particularly with M CRT.
However, these effects were in the opposite direction to that which might be expected. Positive associations between FA and SD CRT (β = 0.14) and M CRT (β = 0.17) suggest that higher values of FA are associated with higher average reaction time and greater variability, with opposite effects and interpretation for MD (β = −0.08 and −.022, respectively). Finally, of the models including the Wahlund ratings, the largest effects were seen for ratings in frontal regions (β range = .06 to .15), indicating those with greater severity of lesion in this region were both more variable and had higher M CRT.

Comparison Across SD CRT, M CRT, and CV CRT
Across all models, the largest effects were seen for the coefficients predicting M CRT, indicating that the imaging variables were more strongly associated with average performance than variability. However, it is important to note that in absolute terms these effects were still small, and thus the differences in the coefficients predicting SD CRT, M CRT, and CV CRT were also small.
The difference in coefficients from Model 1 to Model 2 for both SD CRT and M CRT provides information on the degree to which the effects of the imaging variables on SD CRT and M CRT are attenuated by the inclusion of the other CRT summary variable. Across models, the inclusion of either SD CRT or M CRT resulted in attenuations to the effects of the focal imaging variables. The magnitudes of these attenuations varied. Fractionally larger attenuations were evident in models predicting SD CRT when M CRT was added as a predictor than in the reverse specification. However, these differences were marginal. Many coefficients that were near zero in the original models (Model 1 across tables) showed varied attenuations in magnitude and direction of effect, fluctuating around zero.

Sensitivity Checks
To assess the sensitivity of our results, we reran all final models using robust regression (see Tables S19 through S21 in the online supplemental material). The pattern of results was identical to the main models with respect to the direction, magnitude, and associated inferential statistics.
Next, we considered whether those individuals with a history of CVD or stroke drove the observed effects in the total sample. Parameter estimates were compared for the final models in which the sample was split into those without a history of either CVD or stroke (n = 456) and those with a history of either condition (n = 214; see Tables S22 through S24 in the online supplemental material). Again, the pattern of results was largely similar to the main analyses. For each model, for each outcome, effect sizes were generally smaller. However, the direction of effects and the relative size of effects were broadly consistent. Taken collectively, the results of the sensitivity checks indicated that the patterns of results were generally robust to individual influential cases and were not driven by those within the sample with a history of CVD.

Discussion
In the full sample, WMH volume was positively associated with SD CRT; people with larger WMH volume were more variable in CRT responses. Ratings of hyperintensity severity in frontal lobes, but not in other brain areas, were positively associated with SD CRT. Neither FA nor MD estimates of general white matter tract integrity were significantly associated with SD CRT. Across all models, when M CRT was included in the models, an attenuation of effects for white matter variables was observed. Therefore, in the current sample, there were no incremental effects of various metrics of white matter on CRT variability over and above the average effects of CRT speed. Across all models, the variance explained by white matter measures was small (range = 0.9% to 4.8%).
To investigate the relationships of the imaging measures with different dependent measures derived from the CRT test, we reran our models using both CV CRT and M CRT as the dependent variable. With CV CRT as the dependent variable, the association of the frontal hyperintensity ratings remained, as did ratings of hyperintensities in the parieto-occipital region. However, the direction of these effects differed: in the case of the parieto-occipital association, lower ratings of hyperintensities were associated with greater CRT variability. The counterintuitive direction of this effect suggests this may be a chance finding. It should be noted here that, although frequently used, CV CRT is a crude measure of IIV adjusted for mean, and a number of issues associated with its use and interpretation have been raised (see, e.g., Hultsch et al., 2008, for discussion).
When models were reestimated using M CRT as the dependent variable, associations were found with WMH volume, gFA, and gMD. These relationships were found to be independent of SD CRT. Given the large number of statistical tests reported in the current study, it is perhaps more informative to consider the magnitude of the standardized effects. The coefficients from the various imaging metrics are larger in magnitude when predicting M CRT than when predicting SD CRT (see Tables 3 through 8). Only in the case of Wahlund ratings (Table 6) are the standardized effects approximately equal.
Taken together, the results of the present study suggest that various metrics of WM integrity show limited associations with either SD CRT or M CRT when both variables are included in the models. Put differently, imaging metrics are not incrementally predictive of either SD or M CRT when the other RT measure is controlled for. This is in contrast to some previous reports suggesting that IIV's association with WM integrity is largely independent of mean RT speed (Bunce et al., 2007; Fjell et al., 2011; Moy et al., 2011).
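For readers less familiar with the three summary measures, the following minimal sketch shows how M, SD, and CV could be derived from one participant's trial-level RTs. It assumes the conventional definition of the coefficient of variation (SD divided by the mean) and the sample (n − 1) standard deviation; the handling of error trials and outliers in the study itself is not reproduced here, and the function name `crt_summaries` and the example RT values are hypothetical.

```python
import statistics


def crt_summaries(rts_ms):
    """Return mean (M), standard deviation (SD), and coefficient of
    variation (CV = SD / M) for a list of reaction times in milliseconds."""
    m = statistics.fmean(rts_ms)
    sd = statistics.stdev(rts_ms)  # sample SD (n - 1 denominator); an assumption
    return {"M": m, "SD": sd, "CV": sd / m}


# Two hypothetical participants: the second is uniformly twice as slow.
fast = crt_summaries([500, 600, 700])
slow = crt_summaries([1000, 1200, 1400])
print(fast)
print(slow)
```

In this example the slower participant has twice the SD of the faster one but the same CV, which illustrates why CV is described as a mean-adjusted measure of IIV.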
The results support findings from earlier work linking white matter integrity and information processing speed in the current sample. For example, Valdés Hernández et al. (2013) showed that WMH load is associated with both general cognitive ability and with information processing speed in old age. Using diffusion tensor MRI indicators of white matter tract integrity rather than hyperintensities, Penke et al. (2012) showed that the association between the integrity of white matter tracts and general cognitive ability was fully mediated by the speed of basic information processing. However, neither of the previous studies in the current sample has considered CRT variability. As outlined in the introductory paragraphs, there remain some differences in the methodologies applied across studies (for example, the specific WMH measures, composition of the samples, and different RT tests), which make drawing meaningful direct comparisons difficult. More importantly, many of the previous studies were underpowered. Comparing our results to those studies with comparable sample sizes, for example Haynes et al. (2017; see Table 1), similar patterns of results are seen. Specifically, the effects of interest are largely small, and when models are estimated including both metrics of average RT and variability in RT, effects are attenuated and drop below nominal alpha thresholds.
Our study is one of the largest-sampled studies to consider SD CRT, M CRT, and CV CRT alongside multiple different metrics of brain white matter. Though exploratory in nature, such an approach can be advantageous in an area where current findings contain inconsistencies. For example, our study is one of few large studies to address the issue of whether the association between frontal WMH and CRT IIV might be explained by the shared association of these two variables with mean RT. The current study does not resolve the question of whether variability or average RT performance is more strongly related to fundamental measures of the brain. However, we hope that others will follow in adding further evidence from adequately powered studies to help resolve the existing inconsistencies in the results. Such studies would benefit from both more extensive and detailed proxy behavioral measures of speed, and more biologically derived metrics of brain activation. Some investigations of this type have been carried out and they provide interesting insights. For example, studies of blood oxygen-level-dependent signal variability in the brain suggested that, at least within the brain itself, IIV and mean signal are distinct quantities (Garrett, Kovacevic, McIntosh, & Grady, 2011).
Moreover, contrary to the general consensus regarding RT IIV, it appears that as far as brain signals are concerned, greater, not smaller, variability may be advantageous, representing greater adaptability from moment to moment (Garrett et al., 2011, 2013; McIntosh, Kovacevic, & Itier, 2008). This underlines the complexity of the topic and warrants further research into more nuanced aspects of variability.
The effects noted in the current study are generally small. However, the current sample is sufficiently large and the measures sufficiently reliable that we judge these effect sizes unlikely to have been much biased by features of the study, and we consider the estimates presented here to be robust. Of course, the practical question of the importance of such small effects for an individual's daily functioning remains; however, small effects can have importance at the population level.
The analyses also suggested a number of covariates to be significant predictors of the various measures of CRT (SD CRT, M CRT, and CV CRT). Specifically, in a majority of models, sex was a significant covariate with the direction of the effect suggesting that women were more variable in performance than men. Interestingly, given the narrow age range of the current sample (birth cohort), age had a significant effect on CRT measures. In the case of predicting SD CRT this was true only in the models without M CRT. In predicting M CRT, age remained a significant covariate after the inclusion of SD CRT. Collectively, this pattern of results suggests that in the current sample, at least, age is primarily associated with overall M CRT and not variability. Although it may seem surprising given the narrow range of age, and thus low variance, to see significant effects, speeded performance has been shown to be sensitive to the effects of aging. On a practical note, the significant effects also validate the inclusion of age as a covariate in our models.
This study provided a systematic exploratory investigation of the association between white matter integrity and RT IIV in a large sample of older adults. Sample sizes are often small in neuropsychological studies (but see Anstey et al., 2007; Bunce et al., 2007; Haynes et al., 2017, for notable exceptions), which can exacerbate the problem of unreliable, unreproducible results. A greater number of large, adequately powered studies is needed to clarify the effects. Another strength of the present study is the use of multiple measures of white matter integrity, providing a comprehensive look at the associations with RT IIV and speed. It should be noted that we considered only the volume and location of WMHs and white matter tract integrity. It is known that as the brain ages, and in the presence of increasing degrees of WMHs, the integrity of normal-appearing white matter becomes increasingly nonnormal. Future studies should include measures of the tissue integrity of the whole brain, which may provide some additional insight into the associations between integrity and the average and variability in speeded test performance.
The RT measures used in the present study were calculated from 40 trials, which is a modest number by comparison with some other studies investigating RT variability in relation to white matter integrity. Even though 20 trials have previously been shown to be an adequate number for this type of investigation (Bunce et al., 2013), we know that the reliability of both IIV and mean RT increases, and perhaps more importantly, the discrepancy in reliabilities between the two measures decreases, with a larger number of trials (Schmiedek, Lövdén, & Lindenberger, 2009). Therefore, having more trials from which mean RT and IIV are calculated is highly encouraged in future studies in this field.
In conclusion, we found little evidence that white matter integrity explains variance in SD CRT over that explained by M CRT. The results suggest that the association between WMH load and CRT variability might be secondary to the association of WMH with average CRT. At least in the case of CRT and in the cross-sectional analysis of the current sample, neither WMH load nor white matter tract integrity appears to be a strong candidate to explain age differences in CRT IIV. Further investigations into putative causes of increased CRT in older adults are required.
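The expected gain in reliability from adding trials can be sketched with the Spearman-Brown prophecy formula from classical test theory. This is an illustrative approximation rather than the method used by Schmiedek et al. (2009), and the starting reliabilities for mean RT and IIV at 40 trials are hypothetical values chosen only to show the pattern; they are not estimates from the present data.

```python
def spearman_brown(reliability, k):
    """Predicted reliability when the number of trials is multiplied by k,
    given the reliability at the original length (Spearman-Brown formula)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie in [0, 1]")
    if k <= 0:
        raise ValueError("k must be positive")
    return k * reliability / (1 + (k - 1) * reliability)


# Hypothetical reliabilities at 40 trials (assumed for illustration only).
rel_mean_40 = 0.80
rel_iiv_40 = 0.60

for trials in (40, 80, 160):
    k = trials / 40  # lengthening factor relative to 40 trials
    rel_mean = spearman_brown(rel_mean_40, k)
    rel_iiv = spearman_brown(rel_iiv_40, k)
    gap = rel_mean - rel_iiv
    print(f"{trials} trials: mean RT {rel_mean:.3f}, IIV {rel_iiv:.3f}, gap {gap:.3f}")
```

Under these assumed starting values, both predicted reliabilities rise with the number of trials and the gap between them narrows, consistent with the pattern described above.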