Aerobic exercise is associated with region-specific changes in volumetric, tensor-based, and fixel-based measures of white matter integrity in healthy older adults

White matter integrity and cognition have been found to decline with advancing adult age. Aerobic exercise may be effective in counteracting these declines. Generally, white matter integrity has been quantified using a volumetric measure (WMV) and with tensor-based parameters, such as fractional anisotropy (FA) and mean diffusivity (MD), the validity of which appears to be compromised in the presence of crossing fibers. Fixel-based analysis techniques claim to overcome this problem by yielding estimates of fiber density (FD), cross-section (FC)


Introduction
The human brain undergoes a range of structural changes as senescence progresses. The deterioration of white matter integrity has been widely documented using various methods of quantification with both cross-sectional and longitudinal data (see reviews by Gunning-Dixon et al., 2009; Liu et al., 2016; Wassenaar et al., 2019). White matter volume (WMV), which is used to index macrostructural changes (i.e., general atrophy in aging), has been shown to increase until middle age and then quickly decrease starting around 60 years old (e.g., Bendlin et al., 2010; Geng et al., 2016; Raz et al., 2005). Diffusion tensor-derived measures of white matter integrity, which are used to approximate the direction and magnitude of the diffusion of water molecules through brain tissue, also show age-related changes; fractional anisotropy (FA), higher values of which are thought to indicate a higher degree of organization of white matter tracts, has been found to decrease with older age (e.g., Beck et al., 2021; Bendlin et al., 2010; Damoiseaux et al., 2009; Hsu et al., 2008; Kennedy and Raz, 2009; Lövdén et al., 2014), and this decrease has been shown to accelerate with age (Sexton et al., 2014). Mean diffusivity (MD), lower values of which are thought to indicate restricted movement of molecules and are thus typically interpreted as an indicator for denser tissue, has in turn been shown to increase with older age (e.g., Beck et al., 2021; Bendlin et al., 2010; Hsu et al., 2008), with this change again accelerating in older individuals (Sexton et al., 2014). The decreases in WMV and degradation of white matter microstructure indexed with FA and MD are thought to contribute to age-related cognitive decline, particularly in perceptual speed (see review by Bennett and Madden, 2014). For example, both FA and MD have been repeatedly shown to be related cross-sectionally to performance on tasks of perceptual speed (e.g., Bendlin et al., 2010; Hong et al., 2015; Kennedy and Raz, 2009; Kerchner et al., 2012; Turken et al., 2008).
Given this association between certain measures of white matter integrity and cognition, at least cross-sectionally, efforts have been made to slow or even reverse the deterioration of white matter, with the aim of ameliorating age-related cognitive decline. In this regard, aerobic exercise (i.e., physical activity resulting in increased cardiovascular fitness) has been proposed as a lifestyle factor intervention. Overall, small but significant effects of physical fitness and physical activity on WMV have been found (see Sexton et al., 2016, for a meta-analysis). Mixed results have been found regarding microstructural integrity measured with FA and MD and its association with physical fitness and activity. Some cross-sectional studies have shown positive correlations with FA and negative correlations with MD (Gow et al., 2012; Johnson et al., 2012; Z. Liu et al., 2012; Tseng et al., 2013a), while others found no association (Burzynska et al., 2015; Marks et al., 2011; Tian et al., 2014). Longitudinally, one study showed that individual differences in fitness change were associated with change in FA and short-term memory (Voss et al., 2013), while another showed decreases in FA and no change in MD after exercise training (Clark et al., 2019). In a recent review, Erickson et al. (2019) concluded that there is moderate evidence supporting an association between aerobic exercise and cognition in a number of domains in older adults (also see Barha et al., 2017). Of note, an earlier review including twelve randomized controlled trials comparing aerobic exercise and a variety of control conditions found no evidence for an effect of aerobic exercise on cognition (Young et al., 2015). Additionally, associations between white matter integrity and perceptual speed have been found in both younger (Magistro et al., 2015) and older adults (Papp et al., 2014). However, the role of white matter integrity in the relationship between aerobic exercise and cognition is still unclear, as relatively few intervention studies have investigated this question (see Stillman et al., 2020) and even fewer have produced positive results (e.g., Voss et al., 2013).
Although many of the aforementioned studies used FA and MD as indicators of white matter integrity, the use of these diffusion tensor metrics has received criticism. The diffusion tensor model from which these metrics are derived is not fiber-specific, and is therefore not able to adequately distinguish between contributions from individual fibers to voxel-wise metrics in voxels with more complex multi-fiber geometry, such as crossing fibers (Raffelt et al., 2012, 2015, 2017). This poses a significant limitation, as 60-90% of voxels within white matter have been estimated to contain crossing fibers (Jeurissen et al., 2013). In order to make up for this shortcoming, newer metrics have been developed using "fixels," which represent specific fiber bundles within a voxel (Raffelt et al., 2017). Within each voxel, a fiber orientation distribution (FOD) of the fixels is computed using constrained spherical deconvolution (Dhollander et al., 2016; Jeurissen et al., 2014). These FODs are then used to calculate apparent fiber density (FD), reflecting the density of fibers within a specific bundle, and fiber cross-section (FC), reflecting the diameter of a specific fiber bundle and commonly transformed to log(FC), as well as the product of FD and FC, fiber density and cross-section (FDC). These metrics are currently thought to be a more reliable measure of the physiological properties of white matter fibers than FA and MD. Supporting this, a combined histological and ex vivo MRI study in rodents (Rojas-Vite et al., 2019) found that histologically measured axonal density in both the optic nerve and optic chiasm correlated with FD in these regions estimated with constrained spherical deconvolution. Some studies have now also investigated the association of these fixel-based metrics with age. Choy et al. (2020) found that all three metrics showed widespread negative associations with age in a cross-sectional sample, particularly in anterior regions of the brain. Similarly, Kelley et al. (2021) found reduced FD, FC, and FDC in older adults versus younger adults, particularly in fronto-limbic areas. To our knowledge, the association between age-related decreases in the fixel-based metrics and cognition in healthy aging has not yet been explored, though a number of studies have found reduced values of FD, FC, and FDC in patients with Alzheimer's disease (Mito et al., 2018) and Parkinson's disease (Li et al., 2020; Rau et al., 2019; Zarkali et al., 2020). There has also been no investigation of the effects of an aerobic exercise intervention on the fixel-based metrics to date.
The current study thus aims to comprehensively investigate the relationship between aerobic exercise, various white matter integrity metrics, and perceptual speed. The current sample of older adults participated in a six-month intervention, either in an aerobic exercise group or an active control group. We expected to find (i) positive changes in WMV in exercisers as compared to controls. Given previous findings, we also expected to find (ii) increased FA values and decreased MD values as an effect of exercise, or overall decreases in FA and increases in MD, with no significant exercise effect. Thirdly, given the inverse association between age and fixel-based metrics, we expected to see (iii) an amelioration of age-related negative changes in FD, FC, and FDC as a result of aerobic exercise. We also explored correlations between cardiovascular fitness, white matter metrics, and performance on a task indexing perceptual speed. In this regard, we expected to see (iv) positive relationships between greater cardiovascular fitness, indicators of greater white matter integrity, and better performance on a perceptual speed task.

Sample and study design
In the current set of exploratory analyses, we investigated the effects of aerobic exercise on white matter integrity in previously sedentary older adults by comparing individuals who participated in a physical training intervention group to those who did not engage in exercise. To isolate the effects of exercise, we used a subset of data from the AKTIV study, which investigated cognitive and physical exercise intervention effects in older adults. A full description of the study can be found in Wenger et al. (2022), and other findings using this sample on gray matter structural integrity and psychosocial functioning are reported in Polk et al. (2022) and Düzel et al. (2022), respectively. We repeat all relevant details regarding subject recruitment, intervention design, and acquired measures for the current analyses here.
Healthy older adults from 63 to 78 years old were recruited if they met none of the following exclusion criteria: magnetic resonance imaging (MRI) contraindications; they could not meet the time requirements of the study; not right-handed; engaging in aerobic exercise more than once every two weeks; fluent in a language other than German or English, or fluent in more than two languages; or receiving medical treatment for Parkinson's, gout, rheumatism, heart attack, stroke, cancer, severe back problems, severe arrhythmia, severe chronic liver or kidney failure, severe disease of the hematopoietic system, mental illness (e.g., depression), or neurological disease (e.g., epilepsy, brain tumor).
Before the start of the intervention, participants first underwent a physical assessment including cardiopulmonary exercise testing (CPET) at the Charité - Universitätsmedizin Berlin, and were then invited to the Max Planck Institute for Human Development, Berlin for a baseline MRI session and cognitive testing (T1). Out of the 201 individuals invited to participate, 42 dropped out or were excluded due to existing medical conditions or claustrophobia in the scanner before the training. Participants trained at home in one of four intervention groups (active control, language, aerobic exercise, or combined language and aerobic exercise) for three months before being scanned a second time using the same sequences and completing the same cognitive battery at T2. After a total of six months of at-home training, participants underwent MRI, cognitive testing, and a physical assessment a final time (T3). A further 17 participants dropped out during the training citing physical complaints (e.g., pain during exercise), disinterest, time constraints, or unspecified reasons, leaving a total of 142 participants who completed the study.
The ethics committee of the German Psychological Society (DGPs) approved the study and written informed consent was collected from all participants.

Interventions
In the current set of analyses, we focused on the isolated effects of aerobic exercise versus a sedentary lifestyle, without additional cognitive training, and compared the exercise-only group (EG) to the active control group (ACG).
Forty participants completed the study in the EG (mean age = 69.8 years, 50% females). Aerobic exercise was implemented with a stationary bicycle (DKN Ergometer AM-50) which was synchronized with a tablet (Lenovo TB2-X30L TAB) via Bluetooth. Using this tablet, participants could access their personalized interval training program, the initial level of which was determined at the first physical assessment (30 min at 25-140 Watts, M = 67.8, SD = 26.65). Participants were asked to exercise three to four times a week with no restrictions as to time of day. After each session, participants indicated their perceived exertion via the Borg Rating of Perceived Exertion Scale, which includes ratings from 6 (no exertion at all) to 20 (maximal exertion). If participants indicated a rating below 12 (too easy) or above 15 (too difficult), the intensity of the training could be adjusted remotely. Training intensity increased automatically approximately every two weeks by 3 min and three to four Watts. A collection of pre-selected literature was also available on the tablet, and participants were instructed to read at a slow pace for 15 min on days when they completed an exercise session, or for 45 min on days when they did not. In total, participants were expected to engage in study-related activity for approximately 45 min a day for at least six days each week. Finally, in-person group sessions of five to ten individuals each were conducted once a week, during which participants in the EG engaged in a stretching and toning course led by an external instructor. Adherence to the aerobic exercise intervention was defined as engaging in an average of 90 min of exercise a week for at least 21 weeks (≥1890 min total) with no pauses of longer than two weeks, as well as a slight steady increase in training difficulty over the course of the study, as was automatically implemented by the interval training application (see Table 1 in Results for adherence rates).
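The quantitative parts of the adherence definition and the Borg-based adjustment rule can be sketched as follows. This is only an illustrative sketch, not the study's software: the function names, the weekly granularity of the training log, and the treatment of a week with zero logged minutes as a "pause" are assumptions, and the criterion of a steady increase in training difficulty is not modeled.

```python
from typing import List

# Thresholds taken from the text: 90 min/week over 21 weeks, pauses of at most 2 weeks.
REQUIRED_TOTAL_MIN = 90 * 21
MAX_PAUSE_WEEKS = 2

def is_adherent(weekly_minutes: List[float]) -> bool:
    """Check total training time and pause length from a hypothetical weekly log."""
    if sum(weekly_minutes) < REQUIRED_TOTAL_MIN:
        return False
    pause = 0  # number of consecutive weeks without any logged exercise
    for minutes in weekly_minutes:
        pause = pause + 1 if minutes == 0 else 0
        if pause > MAX_PAUSE_WEEKS:
            return False
    return True

def needs_adjustment(borg_rating: int) -> bool:
    """Flag a session rated below 12 (too easy) or above 15 (too difficult)."""
    if not 6 <= borg_rating <= 20:
        raise ValueError("Borg ratings range from 6 to 20")
    return borg_rating < 12 or borg_rating > 15
```

For example, a participant who logs 90 min in each of 21 weeks meets the total-time criterion exactly, whereas three consecutive weeks without training would violate the pause criterion regardless of total minutes.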
Thirty-five participants completed the study in the ACG (mean age = 70.7 years, 40% females). They also received a tablet and were asked to read the selected literature for 45 min a day on at least six days of the week. In-person group sessions for participants in the ACG consisted of a book club, where groups discussed short stories led by external facilitators (http://shared-reading.de/). Adherence in the ACG was defined as at least 1890 total minutes of reading during the study (see Table 1 in Results for adherence rates).

Cardiovascular fitness
Cardiovascular fitness was measured as peak oxygen uptake, or VO2peak, relativized by body weight in kilograms, using CPET with a bicycle ergometer (Ergoselect 100k, Ergoline GmbH, Bitz, Germany) and the Quark Clinical-based Metabolic Cart using the standard Breath-by-Breath setup and the V2Mask (Hans Rudolph, Inc.).

Preprocessing and calculation of voxel-wise values
T1-weighted images were preprocessed using the longitudinal preprocessing pipeline with default parameters of the Computational Anatomy Toolbox (CAT12, Structural Brain Mapping group, Jena University Hospital) in Statistical Parametric Mapping (SPM12, Institute of Neurology). Images were smoothed using an 8-mm full-width half-maximum (FWHM) standard Gaussian kernel.
Diffusion-weighted images were preprocessed using MRtrix (version 3.0_RC3; Tournier et al., 2019), FSL (FMRIB's Software Library, version 6.0.2; Jenkinson et al., 2012; Smith et al., 2004; Woolrich et al., 2009),

Note. M = mean; SD = standard deviation. Age, sex, years of education, and total minutes spent in intervention were calculated among those participants who fully adhered to the intervention and were included in the current analyses.
To calculate FD, log(FC), and FDC, we followed the "Fibre density and cross-section - Multi-tissue CSD" tutorial from the MRtrix3 documentation (https://mrtrix.readthedocs.io/en/latest/fixel_based_analysis/mt_fibre_density_cross-section.html; Tournier et al., 2019). In order to conduct repeated measures ANOVA on the fixel-based metrics in SPM, voxel-wise metrics were calculated for FD, log(FC), and FDC. For FD and FDC, the fixel-wise values were summed across directions to obtain total FD and FDC per voxel; for log(FC), a weighted average was calculated across directions, where the log(FC) value in the direction with the greatest FD was weighted most heavily. These voxel-wise metrics were converted to NIfTI format for longitudinal analysis in SPM. Finally, the FD, log(FC), and FDC maps were smoothed in SPM using a 10-mm FWHM standard Gaussian kernel.
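The reduction from fixel-wise to voxel-wise values described above can be illustrated for a single voxel. The sketch below is illustrative only and does not reproduce the MRtrix3 commands used in the study: the function names are hypothetical, and using FD itself as the weight for log(FC) is an assumption that is merely consistent with the description that the direction with the greatest FD is weighted most heavily.

```python
from typing import Sequence

def voxel_sum(fixel_values: Sequence[float]) -> float:
    """Total FD (or FDC) in a voxel: the sum over all fixels in that voxel."""
    return sum(fixel_values)

def voxel_log_fc(fd: Sequence[float], log_fc: Sequence[float]) -> float:
    """FD-weighted mean of log(FC) across the fixels of one voxel (assumed weighting)."""
    if len(fd) != len(log_fc):
        raise ValueError("fd and log_fc must have one entry per fixel")
    total_fd = sum(fd)
    if total_fd == 0:
        return 0.0  # voxel without fiber density; log(FC) = 0 corresponds to FC = 1
    return sum(w * v for w, v in zip(fd, log_fc)) / total_fd

# Hypothetical voxel containing two crossing fiber populations
fd = [0.6, 0.2]
log_fc = [0.2, -0.2]
total_fd = voxel_sum(fd)               # 0.6 + 0.2
mean_log_fc = voxel_log_fc(fd, log_fc)  # (0.6*0.2 + 0.2*(-0.2)) / 0.8
```

In this hypothetical voxel the dominant fixel (FD = 0.6) pulls the weighted log(FC) toward its own value, which is the behavior the description calls for.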
More details regarding the preprocessing of MRI data and calculation of voxel-wise metrics can be found in the supplementary materials.

Digit Symbol Substitution task
Perceptual speed was assessed with the Digit Symbol Substitution task (DSST; Wechsler, 1981). The DSST consists of a key to nine unique digit-symbol pairs, and rows of unpaired digits. Participants are asked to complete as many pairs as possible with the corresponding symbol within 90 s. Each correct answer is scored as 1, a single incorrect answer is counted as 0, and after two consecutive incorrect answers, responses are no longer counted.
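The scoring rule can be written out as a short function. This is a sketch under the assumption that responses are available as an ordered list of correct/incorrect flags; the function name and input format are hypothetical and not taken from the study materials.

```python
from typing import List

def score_dsst(responses: List[bool]) -> int:
    """Score ordered DSST responses (True = correct, False = incorrect).

    Each correct response adds 1, an isolated error adds 0, and scoring
    stops once two consecutive errors have occurred.
    """
    score = 0
    consecutive_errors = 0
    for correct in responses:
        if correct:
            score += 1
            consecutive_errors = 0
        else:
            consecutive_errors += 1
            if consecutive_errors == 2:
                break  # later responses are no longer counted
    return score
```

Under this rule, the hypothetical sequence correct, correct, error, correct, error, error, correct, correct yields a score of 3, because the final two correct responses follow two consecutive errors.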

Statistical analyses

Repeated measures ANOVA
To investigate group differences in change in VO2peak and DSST score, ANOVAs with time point as a within-subject factor (T1, T2, T3) and group as a between-subject factor (ACG, EG) were conducted, with age, sex, and years of education included as covariates. These were run using commands from the rstatix package (Kassambara, 2021) in R (R Core Team, 2021), version 4.1.2 (2021-11-01), using RStudio (RStudio Team, 2021), version 2021.09.2+382. Post-hoc t-tests were run using base R stats commands.

Flexible factorial analysis investigating time-by-group interactions
To investigate group differences in change in the white matter metrics, flexible factorial models in SPM12 were used to compute voxel-wise statistics. This model, in contrast to the permutation-based models typically used in TBSS or FBA, can account for the fact that an individual's scans at different time points are not independent of one another. Smoothed WMV maps, smoothed FA and MD maps, and smoothed voxel-wise FD, log(FC), and FDC maps were entered into flexible factorial models with subject as a within-subject factor, time point as a within-subject factor (T1, T2, T3), and group as a between-subject factor (ACG, EG). Age, sex, and years of education were entered into the model as covariates of no interest. We tested for a time-by-group interaction to investigate whether changes across time points differed between groups. A threshold of p < .050 with correction for false discovery rate (FDR) at the peak level was applied first, and if no significant clusters were revealed, a more liberal threshold of p < .001, uncorrected, was applied. In all cases, correction for non-isotropic smoothness was applied in CAT12 with a cluster extent threshold of k > 100. Missing data were excluded case-wise: no WMV data were missing, and four cases were excluded from the diffusion tensor-derived and fixel-based metrics due to missing scans at one time point.
To investigate the directions of effects found, within-subject mean values at each time point were extracted from significant clusters using the REX: Response Exploration for Neuroimaging Datasets toolkit in MATLAB (Duff et al., 2007). Paired t-tests were conducted in R on these within-subject means to inspect within-group changes post-hoc.

Correlations at baseline and change-change correlations
Finally, the relationships between VO2peak, white matter integrity metrics, and DSST score were investigated using Pearson correlations with the Hmisc R package (Harrell, 2021), and differences between correlations calculated within-group were examined with the cocor R package (Diedenhofen and Musch, 2015). Baseline correlations were calculated, as well as correlations between percent change from T1 to T3 in VO2peak, extracted white matter metrics from clusters showing group differences in change, and DSST score. Missing data points were excluded pair-wise. Correction for FDR was applied to baseline correlations and change-change correlations separately.
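The main computational steps of this analysis (percent change, Pearson correlation with pair-wise exclusion, and FDR adjustment) can be sketched in a few lines. The sketch below is illustrative and is not the Hmisc-based code used in the study: the function names are hypothetical, missing values are assumed to be coded as `None`, and the Benjamini-Hochberg procedure is assumed as the FDR method because the text does not name one.

```python
import math
from typing import List, Optional, Sequence, Tuple

def percent_change(t1: float, t3: float) -> float:
    """Percent change from T1 to T3 relative to the T1 value."""
    return 100.0 * (t3 - t1) / t1

def pearson_pairwise(x: Sequence[Optional[float]],
                     y: Sequence[Optional[float]]) -> Tuple[float, int]:
    """Pearson r using only pairs in which both values are present; returns (r, n)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n

def fdr_bh(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Applying `fdr_bh` separately to the family of baseline correlations and to the family of change-change correlations would correspond to the separate corrections described above.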

Results
A description of the sample can be found in Table 1. Participants who did not meet compliance criteria were excluded, and one further participant in the EG was excluded due to technical difficulties. This resulted in n_ACG = 32 and n_EG = 29 included in the analyses. A post-hoc sensitivity analysis using G*Power (version 3.1.9.6) indicated that, with α = 0.05, 1 − β = 0.95, and a study design with two groups and three time points, a sample size of N = 61 could reliably capture time-by-group interaction effects with a critical F ≥ 3.073 and correlations with a coefficient of r ≥ 0.438.

Group differences in cardiovascular fitness change
Means and standard deviations of VO2peak at T1 and T3 for each group are reported in Table 2. A repeated measures ANOVA including age, sex, and years of education as covariates revealed a significant time-by-group interaction in VO2peak, F(1, 53) = 6.091, p = .017, Hedges' g = 0.009. Post-hoc pairwise t-tests indicated a significant increase in VO2peak within exercisers, t(28) = 4.959, p < .001, with a mean percent change of 12.8% (SE = 2.28), but not within controls, t(29) = 1.279, p = .211, with a mean percent change of 3.7% (SE = 2.22).

Whole-brain fractional anisotropy
No clusters showing significant time-by-group effects in FA were found at a threshold of p_FDR < .050, k > 100. At p_uncorrected < .001, one cluster was revealed (see Fig. 1). In this cluster in the left part of the genu of the corpus callosum (136 voxels; peak F = 23.35; peak voxel: x = −10, y = 27, z = 8), the EG showed a significant decrease, t(28) = −4.229, p < .001, while the ACG showed a significant increase in FA, t(27) = 5.149, p < .001.

Whole-brain mean diffusivity
Regarding changes in MD, no clusters survived the threshold of p_FDR < .050, but two clusters showing a significant time-by-group interaction were revealed at the more lenient threshold of p_uncorrected < .001 (see Fig. 1). One cluster was found in the right posterior corona radiata extending into the splenium of the corpus callosum (rPCR/splenium; 358 voxels; peak F = 18.88; peak voxel: x = 26, y = −38, z = 28), in which the EG showed a significant increase in mean MD, t(28) = 4.530, p < .001, and the ACG showed a significant decrease, t(27) = −2.663, p = .013. The other cluster was found in the right superior longitudinal fasciculus (rSLF; 100 voxels; peak F = 18.38; peak voxel: x = 34, y = −25, z = 28), in which the EG again showed a significant increase in mean MD, t(28) = 2.450, p = .021, and the ACG showed a significant decrease, t(27) = −3.458, p = .002.

Whole-brain fiber cross-section
No significant clusters were revealed when testing for group differences in change in log(FC), either at the initial threshold of p_FDR < .050 or the more liberal threshold of p_uncorrected < .001.

Whole-brain fiber density and cross-section
Finally, two clusters were found in which there were significant group differences in change in FDC at p_uncorrected < .001 (see Fig. 1), both of which almost entirely overlapped with those found in FD: one in the right dmPFC (386 voxels; peak F = 24.79; peak voxel: x = 3, y = 65, z = 26 in study-specific space) and one in the right dlPFC (130 voxels; peak F = 16.97; peak voxel: x = 38, y = 78, z = 25 in study-specific space). These clusters did not survive FDR correction either. The pattern of within-group change mirrored that seen in FD: in the dmPFC, the EG decreased significantly from T1 to T3, t(28) = −3.208, p = .003, while the ACG increased, t(27) = 3.931, p < .001. In the dlPFC, the EG showed no significant change, t(28) = −1.705, p = .099, and the ACG showed a significant increase, t(27) = 3.399, p = .002.
Means and standard deviations of extracted mean values from each cluster showing a significant time-by-group interaction can be found in Table 2.

Cognition
DSST score means and standard deviations are reported in Table 2. No significant time-by-group effects were found in DSST when controlling for age, sex, and years of education, F(2, 102) = 2.696, p = .072, Hedges' g = 0.010. Post-hoc pairwise t-tests indicated a significant increase in DSST score within the EG, t(26) = 2.213, p = .036, but not within the ACG, t(29) = −0.163, p = .872.

Correlations
Baseline correlations between VO2peak, DSST, and extracted mean values within each of the clusters showing significant time-by-group differences, as well as correlations between percent change in white matter metrics, all corrected for FDR, can be found in Table 3.
Regarding correlations with percent change in cardiovascular fitness, a positive correlation was found between percent change in VO2peak and percent change in WMV in the splenium of the corpus callosum, r(57) = 0.33, p_FDR = .029. Negative correlations were found between percent change in VO2peak and percent change in FD in the dmPFC, r(55) = −0.32, p_FDR = .034, and percent change in FDC in the dmPFC, r(55) = −0.30, p_FDR = .047. No differences between within-group correlation coefficients were found.
Correlations with percent change in DSST score were also detected. A positive correlation was found with percent change in WMV in the rACR/genu of the corpus callosum, r(51) = 0.35, p_FDR = .026. Weak negative correlations with percent change in FD and FDC in the dlPFC were also found, r(49) = −0.31, p_uncorrected = .014 and r(49) = −0.31, p_uncorrected = .023, though these did not survive FDR correction, p_FDR = .051 and p_FDR = .053, respectively. No group difference was found in the change-change correlation between DSST and WMV. The correlation between percent change in DSST score and percent change in FD was significantly different between groups, Fisher's z = 2.399, p = .016; the [...] found within the ACG, r(24) = 0.08, p = .712. See Fig. 2 for visualization of significant correlations between percent changes in variables of interest.
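A comparison of two correlation coefficients from independent groups, as in the between-group test above, can be sketched with Fisher's r-to-z transformation. This is a minimal illustration rather than the cocor implementation: the function name is hypothetical, and it is an assumption that the reported statistic corresponds to this classical test for independent groups, which is one of several tests that cocor provides.

```python
import math
from typing import Tuple

def fisher_z_independent(r1: float, n1: int, r2: float, n2: int) -> Tuple[float, float]:
    """Compare two independent Pearson correlations; returns (z, two-sided p)."""
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than three observations")
    z1 = math.atanh(r1)  # Fisher r-to-z transformation
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p from the standard normal
    return z, p

# Hypothetical values, not taken from the study
z_stat, p_value = fisher_z_independent(0.5, 30, 0.0, 30)
```

Note that r(df) in the text reports degrees of freedom, so the group size passed to such a function would be df + 2.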
Of note, removing the visual outlier in percent change in VO2peak did not affect the significance of the results: percent change in WMV in the cluster found in the splenium and percent change in VO2peak were positively correlated, r = 0.33, p_uncorrected = .010; percent change in FD in the cluster found in the dmPFC and percent change in VO2peak were negatively correlated, r = −0.35, p_uncorrected = .009; and percent change in FDC in the cluster found in the dmPFC and percent change in VO2peak were negatively correlated, r = −0.32, p_uncorrected = .017. Similarly, removing the two visual outliers in percent change in DSST score resulted in a correlation of r = 0.33, p_uncorrected = .015, between percent change in WMV in the cluster found in the rACR/genu and percent change in DSST score; a within-controls correlation of r = 0.05, p_uncorrected = .806, and a within-exercisers correlation of r = −0.48, p_uncorrected = .015, between percent change in FD in the dlPFC cluster and percent change in DSST score; and a within-controls correlation of r = 0.08, p_uncorrected = .712, and a within-exercisers correlation of r = −0.50, p_uncorrected = .011, between percent change in FDC in the dlPFC cluster and percent change in DSST score.
No significant correlation was found between change in VO2peak and change in DSST score, r(51) = 0.02, p_FDR = .882.
Finally, given that we found an effect of aerobic exercise on white matter integrity and in order to better understand the relationships between the white matter metrics themselves, post-hoc Pearson correlations were calculated using the Hmisc R package (Harrell, 2021). We extracted mean FA, MD, FD, log(FC), and FDC from the most robust clusters showing group-by-time effects, namely those clusters in the rACR/genu and splenium in which the EG showed no change while the ACG showed decreases in WMV, using REX. These results can be found in Table 4.

Discussion
This study investigated the effects of aerobic exercise on several white matter integrity metrics including (i) WMV, derived from voxel-based morphometry, (ii) FA and MD, derived using diffusion tensor models, and (iii) FD, log(FC), and FDC, derived using fixel-based analyses. In particular, given the known weaknesses of diffusion tensor modeling, we were interested in whether fixel-based analysis would be better suited to capturing exercise-induced changes in white matter integrity in a sample of healthy older adults. We also looked at the associations between the changes in each of the metrics used to capture white matter integrity. Finally, we looked at (iv) correlations with change in cardiovascular fitness and change in a cognitive task indexing perceptual speed.
Participants in the aerobic exercise group engaged in at-home interval training on a stationary bike for three to four days a week for six months, leading to an increase in cardiovascular fitness (VO2peak) compared to active control participants. This indicates that at-home aerobic exercise that is personalized to the individual is an effective intervention for cardiovascular fitness in older adults. This finding is discussed in greater detail in Polk et al. (2022).
We found evidence of exercise-induced maintenance of WMV in the current sample, substantiating our first hypothesis. Namely, we found two clusters, one in the rACR extending into the genu of the corpus callosum, and one in the splenium of the corpus callosum, in which change over six months was significantly different between the group engaging in regular aerobic exercise and the sedentary group. This finding is consistent with several cross-sectional studies which found effects of physical activity on WMV (Benedict et al., 2013; Erickson et al., 2007; Gow et al., 2012; Ho et al., 2011; Tseng et al., 2013b). Furthermore, we were able to replicate findings from a previous six-month intervention study in which older adults in a similar age range (60-79 years) participated in either a supervised aerobic exercise group or a nonaerobic stretching and toning group (Colcombe et al., 2006). This study also found more positive change in WMV in anterior white matter, namely in the genu of the corpus callosum, as an effect of exercise. Our study differs from the previous one in that the current design implemented a flexible, at-home training regimen, rather than in-lab exercise sessions under supervision from a personal trainer. This indicates that at-home exercise, which may be more accessible for older adults, has a similar impact on WMV as supervised, in-lab exercise.
Additionally, change in WMV in the splenium was correlated with change in cardiovascular fitness, with more positive change in VO2peak associated with reduced loss of WMV, supporting our fourth hypothesis. This effect seemed to be general, as no difference between within-group correlation coefficients was detected, indicating that this relationship was not specifically exercise-induced. However, the significant increase in VO2peak seen in the EG over the course of the intervention may have contributed to the maintenance of WMV in the splenium. This is consistent with a number of cross-sectional (e.g., Erickson et al., 2007; Ho et al., 2011) and longitudinal (e.g., Colcombe et al., 2006) studies finding associations of physical activity, aerobic exercise, and cardiovascular fitness with WMV in both frontal and parietal areas. Finally, change in WMV in the rACR/genu of the corpus callosum was correlated with change in DSST score, with reduced loss of volume being associated with a greater increase in score from T1 to T3, further supporting our fourth hypothesis. This corroborates cross-sectional findings of anterior corpus callosum size being associated with performance on the DSST (Fling et al., 2011). Again, this effect seemed to be general, with no group difference in correlation, but given the preservation of WMV in the EG compared to the ACG, as well as the significant increase in DSST score among exercisers, a causal relationship between aerobic exercise, anterior corpus callosum WMV, and performance on the DSST seems plausible.

Note. rACR = right anterior corona radiata; WMV = white matter volume; FA = fractional anisotropy; MD = mean diffusivity; FD = fiber density; FDC = fiber density and cross-section; log(FC) = logarithm of fiber cross-section. *significant at p_FDR < .05.
The findings of the diffusion tensor model-derived metrics, FA and MD, did not support our hypothesis that exercise should mitigate decreasing FA and increasing MD. In the current sample, we found that FA decreased within exercisers and increased within controls in the left part of the genu of the corpus callosum. In the rPCR/splenium as well as in the rSLF, MD values in the EG increased while they decreased in the ACG. Notably, the clusters showing group-by-time interactions in FA and MD did not survive correction for multiple testing, so we interpret these results only with caution in the following. In part, this finding corroborates earlier work by Clark et al. (2019), who found widespread decreases in FA and increases in MD in a group of 57- to 86-year-old individuals who participated in supervised aerobic exercise in the form of walking for six months, although this report did not include comparisons with a control group. Additionally, Voss et al. (2013) found no group-level effects of one year of supervised aerobic walking, compared to a flexibility, toning, and balance condition, on white matter integrity as measured by whole-brain FA in older adults aged 55-80 years. Our study differs from both the Clark et al. (2019) and Voss et al. (2013) studies in the type and context of aerobic exercise implemented; while both of the previously mentioned studies administered a supervised, in-lab aerobic walking paradigm, the current design used at-home, stationary bicycle-based interval training. Future studies may consider how different types of aerobic exercise (e.g., walking vs. biking, at-home vs. supervised in-lab) could have a different impact on the white matter metrics of FA and MD in older adults in order to more systematically understand their patterns of change in the context of aging and aerobic exercise.
Regarding this finding within the scope of aging, in a study comparing whole-brain FA values between younger and older adults, a number of regions, including the cingulum bundle, in which the current analyses showed a decrease in FA among exercisers but not controls, were found to have greater FA values in older adults as compared to younger adults (Kelley et al., 2021). The increases in FA found in this area within controls may therefore be consistent with age-related decline, and the decrease in FA induced by aerobic exercise in older adults would then be indicative of a protective effect of exercise. Regarding the cluster of MD in the rPCR/splenium of the corpus callosum, in which MD increased in the EG while it decreased in the ACG, it may be important to consider both the biological underpinnings of MD here, as well as the specific location of this cluster. Within regions where white matter tracts are highly unidirectional, such as within the body of the corpus callosum, FA and MD values may accurately map onto a number of factors indicating white matter integrity, including fiber coherence, fiber diameter and density, and myelination (Basser and Pierpaoli, 1996; Beaulieu, 2002; Pierpaoli and Basser, 1996). However, the diffusion tensor model has known shortcomings in the face of crossing fibers (Raffelt et al., 2012, 2015, 2017). Indeed, Kelley et al. (2021) found strong negative voxel-wise correlations between FA and a measure of multi-fiber complexity, an index of the number of crossing fibers, throughout the brain. In the current analyses, one of the clusters in which MD showed a time-by-group effect was located at the intersection of two major white matter tracts: the corpus callosum and the corona radiata. The increase of MD in this area may thus be representative of an increase in the prominence of crossing fibers in this area, which is plausibly beneficial at the convergence of major tracts. This could also explain an increase in WMV (although not in an overlapping cluster, but located in a similar area in the opposite hemisphere), as an increase in myelination in multiple directions would increase the proportion of white matter found within a voxel. In support of this interpretation, we found a positive correlation between percent change in WMV in the splenium and percent change in MD in the rPCR/splenium of the corpus callosum (see Table 3). Ultimately, one should consider the drawbacks of the diffusion tensor model when interpreting age- and exercise-induced patterns of change in FA and MD.
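To make the crossing-fiber limitation of the tensor model concrete, the following minimal sketch computes FA and MD from the three eigenvalues of a diffusion tensor using the standard definitions. The eigenvalues are hypothetical and not taken from the study data, and the "crossing" voxel is approximated by simply averaging two perpendicular diagonal tensors, which is a deliberate simplification of how a single tensor would be fitted in such a voxel.

```python
import math


def mean_diffusivity(evals):
    """MD: mean of the three diffusion tensor eigenvalues."""
    return sum(evals) / 3.0


def fractional_anisotropy(evals):
    """FA: normalized spread of the eigenvalues (0 = isotropic, 1 = fully anisotropic)."""
    md = mean_diffusivity(evals)
    num = sum((lam - md) ** 2 for lam in evals)
    den = sum(lam ** 2 for lam in evals)
    if den == 0:
        return 0.0
    return math.sqrt(1.5 * num / den)


# Hypothetical eigenvalues in units of 1e-3 mm^2/s (illustrative, not study data).
single_bundle = (1.7, 0.3, 0.3)

# Crude stand-in for two equal, perpendicular bundles in one voxel: average the
# diagonal tensors diag(1.7, 0.3, 0.3) and diag(0.3, 1.7, 0.3) element-wise.
crossing = tuple((a + b) / 2.0 for a, b in zip((1.7, 0.3, 0.3), (0.3, 1.7, 0.3)))

for label, evals in (("single bundle", single_bundle), ("crossing", crossing)):
    print(label, "FA:", round(fractional_anisotropy(evals), 3),
          "MD:", round(mean_diffusivity(evals), 3))
```

In this toy case, FA is markedly lower in the crossing voxel even though each of the two underlying bundles is as coherent as the single bundle, which is the kind of ambiguity that fixel-based metrics are intended to avoid.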
Finally, FD and FDC findings were quite similar to one another, which is unsurprising as FDC is simply the product of FD and FC. Two clusters in the PFC were found, one in the dmPFC and one in the dlPFC. However, the effects again ran opposite to the direction we hypothesized given previous findings regarding age-related effects on the fixel-based parameters (Choy et al., 2020; Kelley et al., 2021). Namely, significant decreases in both FD and FDC were observed in the dmPFC cluster within exercisers, while controls showed increases. In the dlPFC cluster, the EG showed no change, but the ACG still showed significantly more positive change than the EG. This seems to indicate that the density as well as combined density and cross-section of fiber bundles decreased as an effect of aerobic exercise. Again, these clusters did not survive correction for multiple testing and are interpreted with caution here.
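The close correspondence between FD and FDC follows directly from the definition of FDC as the product of FD and FC. The short sketch below illustrates this with hypothetical values (not study data): when FC is stable over time, percent change in FDC equals percent change in FD, and on a logarithmic scale the product becomes a sum.

```python
import math


def fdc(fd, fc):
    """Fiber density and cross-section: the product of FD and FC."""
    return fd * fc


def percent_change(before, after):
    """Percent change from a baseline value to a follow-up value."""
    return 100.0 * (after - before) / before


# Hypothetical fixel values at baseline and follow-up (illustrative only).
fd_t1, fd_t3 = 0.50, 0.45   # FD decreases by 10%
fc_t1, fc_t3 = 1.00, 1.00   # FC is stable

print("FD  % change:", percent_change(fd_t1, fd_t3))
print("FDC % change:", percent_change(fdc(fd_t1, fc_t1), fdc(fd_t3, fc_t3)))

# On the log scale, the product is additive: log(FDC) = log(FD) + log(FC).
print(math.isclose(math.log(fdc(0.5, 1.2)), math.log(0.5) + math.log(1.2)))
```

Under these assumptions, any divergence between FD and FDC results must come from change in FC.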
Interestingly, in the dmPFC, changes in both FD and FDC values were negatively correlated with change in VO2peak, with decreases in fitness being associated with greater increases in FD and FDC. This effect seemed to be unrelated to the exercise intervention, as the group-wise correlations were not significantly different. Altogether, this seems to suggest that increases in FD and FDC in this specific region are associated with age-related decline, and that aerobic exercise and improved cardiovascular fitness may ameliorate this decline. Notably, the correlation with change in VO2peak was also not significant within either group, which could indicate a lack of power, and the overall correlation should be interpreted with caution, given the group differences in both FD and FDC changes, as well as in VO2peak change. In the dlPFC cluster, changes in FD and FDC were found to be weakly negatively correlated with change in DSST score, and these correlations were significantly different between groups, with the EG showing a significant negative correlation and the ACG showing no association. This suggests that, on a functional level, lower levels of FD and FDC in this area of the dlPFC could be beneficial in aging, with greater exercise-induced declines being associated with greater improvement in performance on a task requiring a range of cognitive processes, including perceptual speed and executive function.
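Comparisons of within-group correlation coefficients such as those described above are commonly carried out with a Fisher r-to-z test for two independent correlations. The sketch below shows that approach as an illustration only: the text does not state which test was used here, and the correlation values and group sizes in the example are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher r-to-z transform of a correlation coefficient."""
    return math.atanh(r)


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided z-test for the difference between two independent correlations.

    Returns the z statistic and the two-sided p-value.
    """
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical example: a negative correlation in one group, none in the other.
z, p = compare_independent_correlations(r1=-0.6, n1=40, r2=0.0, n2=40)
print("z =", round(z, 2), "p =", round(p, 4))
```

A non-significant result from such a test, as reported for the dmPFC correlations, does not show that the groups have equal correlations; with moderate group sizes the test has limited power.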
Together, these findings suggest that the directionality of age-related changes as well as exercise-induced changes may be region-specific in terms of functional adaptivity. Notably, both clusters found in the current analyses of FD and FDC were localized at the border between white matter and gray matter, in what is known as superficial white matter. Studies using FA and MD have found an inverse relationship between superficial FA and age (Nazeri et al., 2015), as well as cross-sectional associations between diffusion tensor metrics and cognitive function (Reginold et al., 2016). However, there has been little to no investigation of region-specific associations between age, cognition, and structural integrity in these superficial white matter areas due to methodological limitations (Kirilina et al., 2020).
Finally, we were interested in the relationships between the various white matter metrics themselves, especially across the different analysis techniques. We therefore extracted all the metrics of interest from the most robust clusters showing group-by-time interaction effects, namely the cluster in the rACR extending into the genu of the corpus callosum and the cluster in the splenium of the corpus callosum. At baseline, as expected, WMV was positively correlated with FA and the fixel-based metrics, and negatively with MD; FA was positively correlated with the fixel-based metrics, MD was negatively correlated with FD and FDC, and FA and MD were negatively correlated; and FD, FDC, and log(FC) were positively correlated with one another. This indicates a linear relationship among the metrics when observing them cross-sectionally. However, we observed no significant correlations among percent changes in these metrics, with the exceptions of FA and MD, which were negatively correlated, and FD and FDC, which were almost perfectly positively correlated. The absence of significant change-change correlations could indicate that the variance in change measured with these metrics was not different from zero, restricting our ability to measure change-change correlations; future methodological studies should aim to test how reliably changes in these metrics, particularly in the newer fixel-based metrics, can be captured. Alternatively, if the absence of significance is indicative of a true null result, this could mean that, while changes in individual metrics within a given analysis pipeline are associated with one another, each of the analysis pipelines captures a different aspect of white matter integrity, and each of these aspects changes in a differentiated manner.
While WMV is a more general measure of white matter integrity that specifically does not account for underlying anatomical structures (e.g., cellular structures, fibers or fiber bundles), the diffusion tensor-based and fixel-based metrics aim to capture microstructural white matter integrity and changes therein. Given the previously discussed drawbacks of the diffusion tensor model, which potentially limit the validity of its metrics as measures of microstructural integrity in a majority of white matter, it may be beneficial to devote future research to more fully understanding the anatomical underpinnings of age-related decreases in FD, FC, and FDC in humans. As the fixel-based metrics were designed to capture changes in fiber bundles, the use of these metrics of course does not preclude the use of, e.g., WMV to understand more general atrophy due to aging, and indeed these two methods could be used complementarily to more deeply investigate how aging affects the brain, and in turn, how age-related deterioration of white matter can be reversed through interventions such as aerobic exercise.
Altogether, the findings of the current study provide evidence of an effect of aerobic exercise on WMV, tensor-based FA and MD, and fixel-based FD and FDC in older adults, and show that increases in WMV in the corpus callosum and reductions in FD and FDC in superficial WM in the PFC are related to increased fitness as well as improvement on a cognitive task indexing perceptual speed.

Limitations
The current study has a number of limitations that warrant mentioning. First, given the interventional nature of the study, the sample was only moderately sized. This may have reduced our power to find robust effects that would survive correction for multiple comparisons. Future studies should aim to replicate the current findings with larger sample sizes. Second, the sample in the current study is relatively homogeneous: participants were recruited from an area with relatively high socioeconomic status, they were well-educated on average, and indicated no major health problems, despite not engaging in physical activity on a regular basis. Thus, there could be other protective factors at play apart from aerobic exercise. Future studies should aim to generalize the findings of the current study in more heterogeneous samples. Finally, regarding the white matter findings using diffusion tensor model- and fixel-based metrics, the clusters in which groups differed in change, which were found at an uncorrected threshold of p < .001, did not survive correction for false discovery rate, so these results should be interpreted with caution, as previously mentioned. Furthermore, both clusters of FD and FDC change in the dmPFC and dlPFC were localized at the edge of the cortex in superficial white matter. These fiber bundles are more complicated to study than deep white matter, given their proximity to gray matter, and may also be more susceptible to noise during MR acquisition, leading to difficulties in the estimation of certain WM tracts (Guevara et al., 2020; Kirilina et al., 2020; Reveley et al., 2015). Additionally, the current study used b-values of 710 s/mm² and 2850 s/mm² for multi-shell analyses. However, a recent study investigating the impact of different b-values on the estimation of FD in a sample of children and adolescents (8-18 years old) found that multi-shell schemes including higher b-values of 4000 s/mm² or even 6000 s/mm² improved the sensitivity of tract-specific FD to age associations (Genc et al., 2020). Future research interested in associations between age and WM integrity measured with fixel-based metrics may therefore consider acquiring MR data with sequences that specifically target superficial WM, or with diffusion-weighted imaging using higher b-values.
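To illustrate why a cluster found at an uncorrected threshold of p < .001 can fail correction for false discovery rate, the following sketch implements the Benjamini-Hochberg step-up procedure. This is an assumption for illustration: the text above does not specify which FDR procedure was applied, and the p-values and number of tests in the example are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level q.

    Benjamini-Hochberg step-up procedure: find the largest rank k such that
    p_(k) <= (k / m) * q, then reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            max_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical example: one test with p = .0009 among 1,000 tests.
pvals = [0.0009] + [0.5] * 999
print(any(benjamini_hochberg(pvals)))   # the p = .0009 test does not survive

# With only a handful of tests, the same p-value does survive.
print(benjamini_hochberg([0.0001, 0.0009, 0.04, 0.2]))
```

The more tests are carried out, the smaller an individual p-value must be to survive, which is one reason larger samples are needed for effects of this size to be detected robustly.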

Conclusion
The current study investigated the effects of aerobic exercise on white matter structure in older adults using voxel-based morphometry, the diffusion tensor model, and fixel-based metrics. We found strong evidence that aerobic exercise is protective of WMV in the corpus callosum, replicating previous findings. We also found that these changes were positively correlated with both changes in cardiovascular fitness and changes in performance on a cognitive task indexing perceptual speed. We also found weak evidence of a decrease in FA and an increase in MD within exercisers. While this was contrary to our expectations, there have been previous reports of similar findings. Moreover, controls showed greater increases in FA and decreases in MD than exercisers, adding to the existing skepticism of the interpretation that increased FA and decreased MD are always beneficial in aging. However, given the criticisms of the diffusion tensor model regarding crossing fibers, perhaps other metrics that do not have these same weaknesses are better suited to investigate age-related changes. Finally, we found weak evidence for decreased FD and FDC in exercisers as compared to controls in frontal regions near the cortex. Changes in FD and FDC were negatively correlated with changes in VO2peak overall, as well as with changes in DSST scores within exercisers. This suggests that the generalized interpretation that higher density and cross-section of white matter fibers are always associated with better outcomes in aging may not be precise enough. Specifically, changes in FD and FDC in deep versus superficial WM may have different functional implications, and these metrics should be investigated in a region-specific manner to understand potentially differentiated biological underpinnings of FD and FDC changes in different brain areas.

Author contribution statement
SEP assisted with data acquisition, preprocessed imaging data, analyzed the data, interpreted the results, and wrote the manuscript. MMK preprocessed imaging data and revised the manuscript. NCB designed the neuroimaging protocol and revised the manuscript. CM and JP performed physical assessments including cardiopulmonary exercise testing and revised the manuscript. BW designed the physical assessment protocol and revised the manuscript. SK designed the study and revised the manuscript. UL designed the study, interpreted the results, and revised the manuscript. SD designed the study, interpreted the results, and revised the manuscript. EW designed the study, preprocessed imaging data, interpreted the results, and revised the manuscript.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. Group differences in change in white matter volume (WMV), fractional anisotropy (FA), mean diffusivity (MD), fiber density (FD), and fiber density and cross-section (FDC). Yellow/orange-colored clusters represent more positive changes in exercisers than controls; cyan-colored clusters represent more negative changes in exercisers than controls. WMV, FA, and MD are calculated and displayed in MNI space, whereas FD and FDC are calculated and displayed in a study-specific space. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
Note. DSST = Digit Symbol Substitution task; WMV = white matter volume; FA = fractional anisotropy; MD = mean diffusivity; FD = fiber density; FDC = fiber density and cross-section; rACR = right anterior corona radiata; rPCR = right posterior corona radiata; rSLF = right superior longitudinal fasciculus; dmPFC = dorsomedial prefrontal cortex; dlPFC = dorsolateral PFC. *Significant at pFDR < .05.

Fig. 2. Significant correlations between percent change in VO2peak, white matter integrity metrics extracted from clusters with significant time-by-group interactions, and Digit Symbol Substitution task score. Overall correlations are shown in opaque black for those correlations that do not show significant group differences (top row: left, center, right; bottom row: left); for these, group-wise correlations are shown as transparent color-coded lines. Correlations that differ significantly between groups are represented by opaque color-coded lines (bottom row: center, right); for these, overall correlations are shown as transparent black lines. WMV = white matter volume; FD = fiber density; FDC = fiber density and cross-section; dmPFC = dorsomedial prefrontal cortex; DSST = Digit Symbol Substitution task; ACR = anterior corona radiata; dlPFC = dorsolateral prefrontal cortex. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Table 1
Sample demographics and intervention specifics.

Table 2
Means and standard deviations of VO2peak, Digit Symbol Substitution task score, and white matter metrics extracted from clusters showing significant time-by-group interactions.

Table 3
Correlation coefficients of baseline and percent change correlations between VO2peak, Digit Symbol Substitution task score, and white matter metrics extracted from clusters showing significant time-by-group interactions.

Table 4
Correlation coefficients of baseline and percent change correlations between white matter metrics extracted from clusters showing robust time-by-group interactions.