Spinal cord magnetic resonance imaging and spectroscopy detect early-stage alterations and disease progression in Friedreich ataxia

Abstract Friedreich ataxia is the most common hereditary ataxia. Atrophy of the spinal cord is one of the hallmarks of the disease. MRI and magnetic resonance spectroscopy are powerful and non-invasive tools to investigate pathological changes in the spinal cord. A handful of studies have reported cross-sectional alterations in Friedreich ataxia using MRI and diffusion MRI. However, to our knowledge no longitudinal MRI, diffusion MRI or MRS results have been reported in the spinal cord. Here, we investigated early-stage cross-sectional alterations and longitudinal changes in the cervical spinal cord in Friedreich ataxia, using a multimodal magnetic resonance protocol comprising morphometric (anatomical MRI), microstructural (diffusion MRI), and neurochemical (1H-MRS) assessments.We enrolled 28 early-stage individuals with Friedreich ataxia and 20 age- and gender-matched controls (cross-sectional study). Disease duration at baseline was 5.5 ± 4.0 years and Friedreich Ataxia Rating Scale total neurological score at baseline was 42.7 ± 13.6. Twenty-one Friedreich ataxia participants returned for 1-year follow-up, and 19 of those for 2-year follow-up (cohort study). Each visit consisted in clinical assessments and magnetic resonance scans. Controls were scanned at baseline only. At baseline, individuals with Friedreich ataxia had significantly lower spinal cord cross-sectional area (−31%, P = 8 × 10−17), higher eccentricity (+10%, P = 5 × 10−7), lower total N-acetyl-aspartate (tNAA) (−36%, P = 6 × 10−9) and higher myo-inositol (mIns) (+37%, P = 2 × 10−6) corresponding to a lower ratio tNAA/mIns (−52%, P = 2 × 10−13), lower fractional anisotropy (−24%, P = 10−9), as well as higher radial diffusivity (+56%, P = 2 × 10−9), mean diffusivity (+35%, P = 10−8) and axial diffusivity (+17%, P = 4 × 10−5) relative to controls. Longitudinally, spinal cord cross-sectional area decreased by 2.4% per year relative to baseline (P = 4 × 10−4), the ratio tNAA/mIns decreased by 5.8% per year (P = 0.03), and fractional anisotropy showed a trend to decrease (−3.2% per year, P = 0.08). Spinal cord cross-sectional area correlated strongly with clinical measures, with the strongest correlation coefficients found between cross-sectional area and Scale for the Assessment and Rating of Ataxia (R = −0.55, P = 7 × 10−6) and between cross-sectional area and Friedreich ataxia Rating Scale total neurological score (R = −0.60, P = 4 × 10−7). Less strong but still significant correlations were found for fractional anisotropy and tNAA/mIns. We report here the first quantitative longitudinal magnetic resonance results in the spinal cord in Friedreich ataxia. The largest longitudinal effect size was found for spinal cord cross-sectional area, followed by tNAA/mIns and fractional anisotropy. Our results provide direct evidence that abnormalities in the spinal cord result not solely from hypoplasia, but also from neurodegeneration, and show that disease progression can be monitored non-invasively in the spinal cord.


Introduction
Friedreich ataxia (FRDA) is the most common inherited ataxia, with an incidence ranging from 1:20 000 to 1:250,000, or less depending on countries. 1,2 Genetic transmission is autosomal recessive and is caused by mutations in the frataxin gene (in most cases a GAA repeat expansion in Intron 1) leading to reduced expression of frataxin. 3,4 Symptoms include limb and gait ataxia, dysarthria and cardiomyopathy. 5 Onset is typically in the second decade of life, 6 and there is currently no effective disease-modifying treatment.
Nikolaus Friedreich recognized spinal cord degeneration as a hallmark of the disease 7 and spinal cord atrophy was reported in early MRI studies. [8][9][10][11][12] In the pathogenesis of FRDA, substantial demyelination and gliosis of the posterior and lateral columns of the spinal cord occur prior to cerebellar pathology. 5 In particular, dorsal root ganglia, posterior roots and posterior columns of the spinal cord show significant pathological alterations. [13][14][15] Recent cross-sectional MRI studies showed that spinal cord cross-sectional area (CSA) is significantly smaller in FRDA than in controls and correlates negatively with disease severity. [15][16][17] However, longitudinal MR data documenting disease progression in the spinal cord are still lacking. With many clinical trials ongoing or on the horizon, including gene therapy trials, MR could play a key role in assessing disease progression and the effect of prospective treatments. Consequently, 'natural history' longitudinal MR data documenting disease progression in the spinal cord are urgently needed.
Beyond anatomical imaging, advanced MR modalities, such as diffusion MRI (dMRI) and magnetic resonance spectroscopy (MRS) have the potential to respectively reveal microstructural and neurochemical alterations not visible in conventional anatomical images.
Diffusion MRI 18 relies on the anisotropic diffusion of water in organized tissues, such as the brain white matter or spinal cord, to recover microstructural and connectivity information through local biophysical modelling and tractography. 19 Axonal membranes hinder the diffusion of water molecules, a phenomenon that can be quantified through MRI by taking measurements along multiple orientations. Diffusion tensor imaging (DTI) models diffusion MRI data by assuming Gaussian diffusion at each location of the imaged tissue. Diffusion MRI of the spinal cord is challenging and has only recently received growing attention and seen new developments in the neuroimaging community, [20][21][22][23][24][25][26] with applications in disorders such as degenerative cord compression, 27 multiple sclerosis, 28 amyotrophic lateral sclerosis 23 and traumatic spinal cord injury. 29 To the best of our knowledge, only one study has reported crosssectional data using dMRI in the spinal cord in FRDA, 30 and no longitudinal dMRI data have been published.
Proton magnetic resonance spectroscopy ( 1 H-MRS) allows non-invasive measurement of the concentration of multiple metabolites in multiple organs, including the spinal cord. Although challenging to implement, 1 H-MRS in the spinal cord has been shown to be feasible. [31][32][33][34] A number of studies have reported neurochemical alterations in the spinal cord in neurodegenerative diseases, [35][36][37] and changes in N-acetyl-aspartate (NAA) concentrations in a longitudinal assessment of lesions of the spinal cord in multiple sclerosis. 38 However, no data have been reported using MRS in the spinal cord in FRDA.
The objective of the present study was to study pathological alterations and longitudinal changes in the cervical spinal cord in an early-stage cohort of individuals with FRDA, using an advanced multimodal MR protocol (MRI, dMRI and MRS). We hypothesized that FRDA participants would show significant alterations in morphometry, microstructure and neurochemistry relative to controls, even at early-stage, and that FRDA participants would show changes over time.

Participants
All procedures were conducted in accordance with the Declaration of Helsinki and were approved by the University of Minnesota Institutional Review Board. All participants provided informed consent (adults) or, for minors, informed assent with their parents providing informed consent.
Twenty-eight individuals with FRDA (mean age: 19.0 ± 7.3 years, range: 11-35, 14 M/14F) and 20 age-and gendermatched control individuals (20.4 ± 7.1 years, range: 10-35, 11 M/9F) were recruited for this study ( Table 1). The FRDA group was at an early stage of the disease (time from diagnosis: 2.3 ± 2.6 years; disease duration: 5.5 ± 4.0 years). With 'earlystage' defined as functional staging ≤ 3 (corresponding to ambulatory, i.e. walking with or without assistance), 26 of 28 FRDA participants were at early-stage at baseline, with a majority (n = 22) at Stage 2 or below. FRDA participants were recruited through the University of Minnesota Ataxia Research Center and the Friedreich's Ataxia Research Alliance. Control participants were recruited through the local CMRR volunteer list and word of mouth. Eligibility criteria for FRDA participants were: confirmed FRDA diagnosis. Exclusion criteria for all participants were: neurological disease (other than FRDA), contraindication for MR scan, pregnancy, claustrophobia, restless leg syndrome (difficult to stay still in the scanner) as well diabetes, smoking and history of alcoholism due to their potential confounding effect on metabolite concentrations with MRS. For the same reason, all subjects were asked to withhold antioxidants such as multivitamins for 3 weeks prior to scanning. All subjects were also asked to withhold benzodiazepine or any medication specific to ataxia symptoms for 36 h prior to scanning.
Twenty-one participants with FRDA returned for followup (referred to as the 'longitudinal cohort') approximately 1 year later (time between baseline and 1-year follow-up = 13.7 ± 2.2 months), and 19 participants also returned for a 2-year follow-up (time between baseline and 2-year followup = 26.8 ± 3.4 months). In the following, we refer to '1-year' and '2-year' follow-ups for brevity. Controls were scanned only at baseline. Reasons for not returning for 1-year and 2-year follow-up included: too progressed (n = 1), rod placement (n = 2), brain arteriovenous anomaly (n = 1), claustrophobia (n = 1) and no longer interested or too busy (n = 4).
Study visits took place at the Center for Magnetic Resonance Research, University of Minnesota from late 2013 to 2018. Each visit included three ∼1 hr scanning sessions at 3 Tesla over 1-2 days: One for brain anatomical and diffusion MRI, one for spinal cord dMRI, and one for spinal cord MRS. Because of the large amount of data, we focus in this paper on the spinal cord findings, with the brain anatomical MRI used only for cervical spinal cord morphometry.
In addition to MR scans, all patients were assessed with the original Friedreich Ataxia Rating Scale (FARS), 39 which includes total neurological score (FARS total neuro, range 0-117), activities of daily living (ADL, range: 0-36), functional staging (range: 0-6), timed 9-hole peg test (9HPT) with dominant and non-dominant hand and timed 25 ft walk test. In addition, Scale for the Assessment and Rating of Ataxia (SARA) scores (range: 0-40) were collected except for 15 FRDA participants at baseline.
An ataxia neurologist (K.O.B.) reviewed spinal cord T 2 images acquired prior to DTI (which were not suitable for spine morphometry due to the large slice thickness but suitable for visual inspection of the spinal cord in the sagittal plane). There was no central canal narrowing, cord contact or cord compression in any of the subjects. Mild multilevel disc bulging and disc desiccation was seen in 6 patients and 8 controls. However, these signs are common even in younger individuals and are not indicative of spinal cord pathology in the absence of other findings.

MR data acquisition
Measurements were initially performed on a 3 Tesla Siemens MAGNETOM Trio scanner (Siemens, Erlangen, Germany) with Syngo software version VD13D. The Trio scanner was upgraded to a MAGNETOM Prisma fit scanner with Syngo software version VE11C during the study. The potential impact of this upgrade was carefully investigated and additional scans and analyses were performed to ensure that the upgrade did not bias our results (see 'Scanner upgrade' section below).
On both scanners, the body transmit RF coil was used. On Trio, a 32-channel receive head coil was used for brain acquisitions, and a 20-channel receive head-neck coil was used for spine acquisitions. On Prisma, a 64-channel receive headneck coil was used for all acquisitions.

Spinal cord diffusion MRI
Sagittal T 2 -weighted 2D TSE images with low resolution in the slice direction were acquired for positioning the dMRI volume of interest and for identification of cervical levels, with the following parameters: field of view 260 × 260 mm 2 , matrix 384 × 384, 13 slices, 3 mm slice thickness, 0.7 × 0.7 mm 2 in-plane resolution, TR/TE = 3500/112 ms, flip angle 160deg.
The cervical spine was scanned axially, covering levels C2 to C7, with 1.12 × 1.12 mm 2 (field of view 118 × 62 mm 2 ) inplane resolution and 3.3 mm slice thickness (30 slices), as illustrated in Fig. 2. A segmented-readout EPI sequence (RESOLVE) 40 was used in combination with parallel imaging to reduce artefacts due to susceptibility changes at tissue interfaces around the spine, as previously reported. A finger pulse oximeter was used for cardiac triggering. The following parameters were used: TR/TE = 4500/66 ms, in-plane acceleration (iPAT) = 2, 30 directions, b-value 650 s/mm 2 , six b = 0 volumes. Data were collected with reversed phase-encode blips (AP and PA), resulting in pairs of images with distortions going in opposite directions. From these pairs, the susceptibility-induced off-resonance field was estimated and corrected using a method similar to that described in a study by Andersson et al. 41 Subject motion and eddy current-induced distortions were corrected 42 as previously reported. 23

Spinal cord magnetic resonance spectroscopy
Sagittal T 2 -weighted images similar to those acquired before spinal cord diffusion were used to position an 8 × 6 × 30 mm 3 (1.44 ml) voxel in the spine, centred on the C4-C5 intervertebral disc. B 0 shimming was performed using Siemens image- based shimming with a reduced field of view. Average water linewidth in the voxel was 13.0 ± 3.4 Hz after shimming. MR spectra were acquired using an optimized semi-LASER sequence with TE = 28 ms, TR = 5 s, and 2 × 128 averages. A detailed description of the sequence, including outer-volume suppression and VAPOR water suppression can be found in a study by Oz et al.. 43 Individual shots were saved separately. A finger pulse oximeter was used for cardiac triggering. Unsuppressed water reference signals (NT = 4) were collected for eddy current correction and to serve as an internal concentration reference for quantification. Water signal was also measured at different TEs (28,35,50,70,100,200, 300, 500, 800, 1000, 1500, 3000 and 4000 ms) with one average per TE and TR = 15 s. The resulting T 2 curve was fit with a bi-exponential function to determine the fraction of CSF and tissue in the voxel, assuming a T 2 value of 740 ms for CSF. 44 The entire procedure was repeated twice, yielding two spectra with 128 transients each, for a total of 256 transients.
Both individual shots and accumulating spectrum were monitored in real time for spectral quality by the operator. In case of movement (resulting in degraded water suppression, degraded water linewidth and/or lipid signal visible in single shots or in accumulated spectrum), the spectral acquisition was stopped (keeping the shots already acquired). Spine T 2 -weighted images were reacquired to reposition the voxel, adjust B 0 shim again if necessary, and acquire the remaining shots.

MR data processing Available data
In all, there were 88 visits (20 controls at baseline, 28 FRDA at baseline, 21 FRDA at 1 year, 19 FRDA at 2 years), and 3 data sets were acquired at each visit: Brain anatomical MRI (used for upper cervical spine morphometry), spinal cord dMRI and spinal cord MRS. Of those 264 data sets, 12 data sets (<5%) could not be used primarily because of significantly different from CTRL, P < 0.05, two-tailed two-sample unpaired t-test. # Slope significantly lower than zero, P < 0.05, one-tailed one-sample t-test.
excessive motion artefacts but also artefacts due to braces, or missing data due to subject discomfort or anxiety. The final number of data sets used for the analysis was as follows: Baseline controls: 20

Anatomical MRI
Brain T 1 -weighted images were analyzed using the Spinal Cord Toolbox (SCT) 45 version 4.0 to extract spinal cord CSA and eccentricity at C1, C2 and C3 levels. The C2-C3 intervertebral disc was manually delineated to initialize the automatic labelling of vertebrae, after which deep learning algorithms in the SCT toolbox were used to segment the cord.

Diffusion MRI
The diffusion images were preprocessed to remove noise 46 and Gibbs-ringing artefacts. 47 The images were then corrected for motion, as well as geometric and eddy current distortions using FSL (topup and eddy) 42 and bias-field corrected. Subsequently, diffusion tensor fitting using robust estimation of tensors by outliers rejection 48 was performed to estimate fractional anisotropy (FA), mean diffusivity (MD), radial diffusivity (RD) and axial diffusivity (AD). 49 The mean b = 0 image was segmented to aid with the registration of the PAM50 template 50 into the native diffusion space of each participant. This registration step was significantly different from CTRL, P < 0.05, two-tailed two-sample unpaired t-test. # Trend for slope significantly lower than zero, raw P = 0.02, adjusted P = 0.08, one-tailed one-sample t-test.
initialized using the non-linear deformation fields generated from the registration of the spine T 2 -weighted image to the PAM50 template during the segmentation of the spine T 2 -weighted image. Since it is difficult to identify the vertebral levels in the axially acquired diffusion image, the warping fields between the high-contrast spine T 2 -weighted image and the template were used to match vertebrae and thus indirectly and automatically label the vertebral levels in the diffusion image. Values for FA, MD, RD and AD at each spinal cord level were extracted using the sct_extract_metric command of SCT, using combined label 51 for whole-cord white matter (sum of labels 0:29), combined label 53 for dorsal columns (sum of labels 0:3) and labels 4:5 for cortico-spinal tract.

Magnetic resonance spectroscopy
Individual shots in the first NT = 128 acquisitions were inspected visually and any shot that had severely degraded water suppression, or showed lipid contamination, was removed. In most cases, SNR of metabolites on single shots was not sufficient for shot-to-shot frequency correction, therefore shots were averaged in blocks of 4 or 8 (depending on SNR), then frequency-aligned before summation. The procedure was repeated for the second NT = 128 acquisition. The two resulting summed spectra were frequency-aligned and summed to yield a final single spectrum (NT = 256). In some participants, one of the two spectra was either not acquired due to time constraints or not usable due to spectral quality, and only one NT = 128 spectrum was used in the analysis instead of NT = 256. Spectra were rejected if: (i) LW > 0.2ppm (25 Hz) or (ii) SNR < 2 or (iii) height of lipid peak greater than 50% of tNAA. LW and SNR values were those estimated by LCModel.

Effect of scanner upgrade
The Trio scanner was upgraded to Prisma in the middle of the study. Of 21 participants who returned for follow-up, nine participants were scanned on Trio at baseline and 1-year follow-up, and on Prisma at 2-year follow-up (noted 'TTP'), two participants were only scanned on Trio ('TT') (no 2-year follow-up), seven participants were scanned on Trio at baseline and Prisma thereafter ('TPP') and three participants were scanned only on Prisma ('PPP'). Scanning parameters were matched as closely as possible on Trio and Prisma.
To measure any potential bias due to this scanner upgrade, we scanned five additional subjects (not part of the n = 20 control participants) on Trio before the upgrade and on Prisma after the upgrade (see Supplementary Table 1). All measured parameters remained stable before and after the upgrade, with one exception: all diffusivity metrics (AD, RD and MD) were on average 25% higher on Prisma than on Trio. Therefore, in order to mitigate this effect, all underestimated values obtained on Trio were corrected (multiplied) by a factor of 1.25. The reason for this difference was traced to a single mismatched parameter (coil combination mode) set to 'Sum of Squares' on Trio, and 'Adaptive Combine' on Prisma ('Adaptative Combine' did not exist in VD13D). This resulted in different noise distributions on the two systems 52 and affected diffusivity estimates (but not FA). Briefly, the Trio's 'Sum of Squares' method leads to a diffusion signal that follows a non-central-χ 2 distribution with an elevated noise floor which artificially increase the diffusion signal, thereby underestimating diffusivity measures. Since we did not keep the raw per-coil data (so-called 'TWIX' data), we were unable to reconstruct the Trio data with the 'Adaptive Combine' method.
Following this correction of diffusivity values, we performed two analyses of the longitudinal data (which would be the most sensitive to any effect of the scanner upgrade) to rule out any remaining bias: One with all the data, and a second one with 'same scanner only' data: For each participant, we kept only time points from the same scanner, i.e. for 'TTP' we kept only 'TT', for 'TPP' we kept only 'PP' and for 'PPP' we kept 'PPP'. The 'same scanner only' longitudinal results were then compared to those obtained with all data (see Supplementary Table 2). We found that longitudinal results were consistent between the two analyses, confirming that longitudinal results from all data were not significantly affected by the scanner upgrade.

Statistical analysis
Missing data were left blank. To reduce the extent of multiple testing adjustment, the following metrics were chosen a priori (outcome measures): Morphometry: CSA and eccentricity of whole cord at C2-C3 obtained from SCT. The average of C2-C3 was chosen to allow comparison with manual segmentation performed at C2-C3 (Supplementary Figure 1). Cross-sectional results also include separate CSA for grey matter and white matter.
Diffusion: FA, MD, AD, RD averaged over C3-C6 (C2 and C7 were excluded because of inconsistent B0 shimming and image quality at the edge of the field of view).
Spectroscopy: Water-referenced concentrations in tNAA, mIns, tCr, tCho, and ratio tNAA/mIns. The ratio tNAA/mIns was chosen because it combines the changes in tNAA (decrease) and mIns (increase) that is observed in many neurodegenerative diseases, maximizing the effect size. For longitudinal slopes and associations between clinical metrics and MR metrics, only the ratio tNAA/mIns was chosen due to higher variability observed in water-referenced concentrations.
MR metrics in patient and control groups were compared at baseline using linear models with group as the primary variable of interest, and age and sex as covariates.
Among FRDA participants only, associations of clinical metrics (response variables) with MR metrics (predictor variables), using data from all three visits, were estimated using linear mixed models, with subjects as random effects.
For each MR metric and each clinical metric, withinparticipant longitudinal slopes across all visits were estimated using linear regressions. These slopes were tested using one-sided one-sample t-tests. For each variable, we hypothesized that longitudinal changes would go in the same direction as cross-sectional differences at baseline. For example, if CSA was lower in FRDA participants than in controls at baseline, we hypothesized that CSA in FRDA participants would continue to decrease over time (slope < 0). The sample size of the longitudinal cohort (n = 21) was sufficient to detect slopes > 0 (or < 0) with an effect size of 0.5 (or −0.5) or better, with α = 0.05 and β = 0.7.
For analyses that were repeated for multiple metrics, reported P-values were corrected for Type I error inflation due to multiple testing using the Holm-Bonferroni procedure across metrics within each MR modality (morphometry: four metrics at baseline, two metrics longitudinally; diffusion: four metrics; spectroscopy: five metrics at baseline, one metric longitudinally; clinical: five metrics).

Data availability
The data that support the findings of this study are available from the corresponding author upon request. The data are not publicly available due to their containing information that could compromise the privacy of research participants. Because FRDA is a rare disease, there is a risk that even de-identified data, which contain information such as age, age of onset, GAA repeat length, or clinical scale scores, could be used to identify individual FRDA participants. Especially with children, we are erring on the side of caution and making data available on request, rather than publicly available.

Spectroscopy
In spite of the small volume, the main resonances (tCho, tCr, tNAA, mIns) were clearly visible in MR spectra (Fig. 3). The average water linewidth was 13.0 ± 3.4 Hz, average metabolite linewidth returned by LCModel was 15.4 ± 3.8 Hz and average signal-to-noise ratio (SNR, with noise defined as 2*RMS noise ) returned by LCModel was 5.0 ± 1.6. Examples of spectra with varying spectral quality are provided in Supplementary Figure 2.
The spectral pattern was markedly different in the FRDA participants relative to controls, with a lower tNAA peak at 2.01 ppm and a higher mIns peak at 3.55 ppm (Fig. 3). The concentration of tNAA was 36% lower (P = 6 × 10 −9 ) and the concentration of mIns was 37% higher (P = 2 × 10 −6 ) in FRDA participants than in controls ( Table 2). As a result, the ratio tNAA/mIns was 52% lower in FRDA participants compared with controls (P = 2 × 10 −13 ). There was almost no overlap in tNAA/mIns values between the two groups. All patients had a tNAA/mIns ratio below 0.85, while all controls had a tNAA/mIns ratio higher than 0.85, with one exception: a 10 y.o. control participant had a tNAA/mIns ratio of 0.6.
All cross-sectional results, as well as cross-sectional effect sizes (Cohen's d), are summarized in Table 2.

Longitudinal cohort findings
Annual rates of change (slopes) for each MR metric and each clinical metric are summarized in Table 3. Effect size is reported as standardized response mean (SRM) defined as mean(slope)/SD(slope).

Morphometry
Spinal cord morphometry at C2-C3 using SCT showed a mean rate of atrophy of −2.4% per year (SRM = −0.95), with no significant change in eccentricity (Fig. 1). These findings were broadly consistent across individual vertebral levels C1, C2 and C3, with the largest effect size (−1.23) for the average C1-C2 (Supplementary Table 3). The effect size was smaller for C3 (−0.78) than for C1 or C2 (−1.13 and −1.07, respectively) due to higher standard deviation at C3, possibly reflecting the decreasing coil sensitivity and lower SNR at the edge of brain T 1 images. Longitudinal atrophy was observed only in white matter. There was no significant longitudinal change in grey matter (Supplementary Table 3).
In a separate analysis, spinal cord area and eccentricity were also obtained through manual segmentation of the spinal cord at C2-C3 using SpineSeg 53 by three independent raters in a blind fashion (see Supplementary Figure 1). The rate of atrophy from the average CSA across the three raters (−2.9%) was slightly higher than the one found using the automated method SCT (−2.4%) and had slightly lower SRM (−0.87 versus −0.95). All three individual raters found a decrease in CSA over time, but with lower SRM (ranging from −0.62 to −0.79), suggesting that manual segmentation introduces more variability than SCT (Supplementary Figure 1).

Diffusion
FA averaged across C3-C6 showed a trend (raw P < 0.02, corrected P < 0.08) in yearly decrease of −3.2% with a Cohen's d of −0.50. There was no significant change for any of the diffusivity metrics (MD, RD, AD) averaged across C3-C6. In post hoc analyses (Supplementary Table 4), FA values appeared to be most reliable (lower standard deviation and greater effect size) at the centre of the imaging volume (C4-C5) were B 0 shimming is optimal and image quality is higher. FA values averaged between C4 and C5 confirmed the trend in yearly decrease with a change of −5.6% (raw P = 0.0002) and Cohen's d of −1.0. Spectroscopy tNAA/mIns showed a significant (P = 0.03) yearly decrease of −5.8% with a SRM of −0.51. Because individual metabolite concentrations had higher variability than the ratio tNAA/mIns, slopes for individual metabolites were not included in the main analysis but are provided in Supplementary Table 5 for information. While none of the individual metabolites showed a statistically significant change over time, numerical results suggest that the longitudinal decrease in tNAA/mIns may be driven both by a decrease in tNAA (−3.9%/year) and an increase in mIns (+0.7%/year). This is supported by the fact that the ratio tNAA/tCr decreased and the ratio mIns/tCr increased, whereas tCr was relatively stable (Supplementary Table 5

Correlations of MR metrics with clinical scales
Many of the measured MR parameters correlated with clinical parameters ( Table 4). The strongest effects were observed for the spinal cord CSA, followed by FA and tNAA/ mIns. CSA correlated negatively with all clinical parameters: FARS, SARA, ADL, functional staging and 9HPT. For example, CSA correlated negatively with FARS total neuro with R = −0.60 and P = 4 × 10 −7 (Fig. 4). FA correlated negatively with FARS total neuro, functional staging and ADL, whereas tNAA/mIns negatively correlated with functional staging, ADL and 9HPT non-dominant. There was no significant correlation between any of the MR metrics and GAA1 (shorter GAA repeat), GAA2 (longer GAA repeat) or age at onset.

Discussion
To our knowledge, these are the first quantitative longitudinal MR results ever reported in the spinal cord in FRDA. Our data demonstrate the ability to monitor disease progression non-invasively in the spinal cord in FRDA, which is critical to assess how potential treatments affect the nervous system in upcoming clinical trials.
In addition, while a handful of studies have reported crosssectional results in the spinal cord in FRDA with morphometry, and one recent study with DTI but none with MRS, our study provides unique multimodal cross-sectional data. The  multimodal approach allowed assessment of anatomy, microstructure and neurochemistry in the same subjects, providing different and complementary measures of neurodegeneration. Finally, use of an early-stage cohort (as we did here, with only 5.5 years from onset and 2.3 years from diagnosis) will be crucial for planning of treatment development and testing going forward because efficacy of potential treatments would presumably be maximal if administered in the early stage of the disease, and because the faster disease progression at early stage would make it easier to detect an effect of potential treatments.
All three MR modalities (anatomical MRI, dMRI and MRS) showed large cross-sectional alterations in FRDA participants relative to controls, even at such an early stage. In

Morphometry
The smaller spinal cord area and higher eccentricity in FRDA compared with controls is consistent with previously reported results using MRI, 11,12,[15][16][17] as well as post-mortem pathology results. 54 While these cohorts are not directly comparable, results are remarkably consistent between studies, and show that the upper cervical spinal cord is generally ∼30-40% smaller in FRDA relative to controls, and eccentricity is ∼10% higher.

Diffusion MRI
Our dMRI cross-sectional findings are consistent with a recent study that reported lower FA and higher diffusivity in total white matter and in specific white matter tracts in FRDA. 30 Both studies found that FA in total white matter across C3-C6 (C2-C5 in 30 ) was about 25% lower in FRDA participants compared with controls, albeit Hernandez et al. 30 reported a smaller cross-sectional effect size (Cohen's d) of about −0.85 (compared with −2.5 here), which could be due to the larger range of disease severity in their cohort (IQR for FARS total neuro = 41 versus 14.8 in the present study) and therefore larger SD, or due to differences in MR acquisition parameters affecting the SNR of images. Both studies also found that RD in total white matter was much higher in FRDA than in controls (+56% in our study, about +160% in 30 ), while the increase in AD was smaller (+17% in our study, +6% in 30 ) The lower FA and higher diffusivity observed in the spinal cord are similar to those observed in the CNS in many neurodegenerative diseases and are consistent with neurodegeneration.

Magnetic resonance spectroscopy
Higher tNAA and lower mIns have been observed in numerous neurodegenerative disorders and are not specific to FRDA. However, alterations observed in the spinal cord in the present study were surprisingly large and much larger than those observed in the brain in FRDA. 55 In fact, the differences are so pronounced that disease status can practically be determined from visual inspection of the spectra alone.
Myo-inositol is often associated with glia and myelination, while tNAA, synthesized only in neurons, is considered a neuronal marker. Therefore, our findings suggest both neuronal loss and abnormal myelination.

Longitudinal effect sizes
Among MR metrics, the largest longitudinal effect size (SRM) was found for CSA, with an effect size of −0.95 at C2-C3. Post hoc analyses showed that SRM was even higher for CSA at C1-C2 (−1.23) (Supplementary Table 3). The lower effect size at C3 (−0.78) could be due to lower SNR at the edge of the field of view in the brain T 1 -weighted image. Studies with larger spinal cord coverage would be needed to investigate longitudinal changes in lower cervical and thoracic areas. However, we hypothesize that C1-C2 will provide close to maximum SRM because that region contains the highest number of ascending axons, and therefore would presumably undergo the largest absolute decrease

Figure 4 Correlation between FARS total neuro
and cross-sectional area. Correlation was determined using cross-sectional area as predictor variable and FARS total neuro as response variable. Data from all three visits were used. R = −0.60, P < 7·10 −6

Table 4 Regression coefficients between clinical parameters and MR parameters
Regressions were performed using data from all three time points. Numbers indicate the R coefficient between the corresponding clinical metric (response variable) and MR metric (predictor variable). Blue colour indicates negative correlation and red colour indicates positive correlation. Higher colour intensity indicates higher absolute correlation coefficient. Values are in mm 2 for CSA, 10 -3 mm 2 /s for MD, RD, AD, and unitless for eccentricity and tNAA/mIns. * P < 0.05, ** P < 0.005, *** P < 0.0005. in volume from neurodegeneration. This hypothesis is supported by cross-sectional data from Dogan et al.: 17 with the largest relative difference in FRDA versus controls in the upper thoracic spine, but the largest absolute difference in the upper cervical spine.
The longitudinal decrease in FA averaged over C3-C6 was borderline significant after correction for multiple testing, but post hoc analysis suggests that FA averaged over C4-C5 could be more precise. Indeed, when looking at individual levels (Supplementary Table 4), the measurement noise (SD of annual slope) was highest at the edges of the DTI FOV (C2 and C7) and lowest at the centre of the FOV (C4-C5), implying that the measurement was most precise at C4-C5. Differences in the values of annual slopes between individual spinal cord levels were well within the noise of the measurement and were not statistically significant.
Note that the SD on the slope for the average over C3-C6 (which we chose a priori as outcome measure for statistical analysis) was nearly the same as for the average over C4-C5, and was less than or equal to the SD on the slope at C4 or C5 taken individually (showing that averaging across cervical levels does improve precision). Therefore, while taking the average over C4-C5 may have been slightly more precise in retrospect, results averaged over C3-C6 still provided close to optimal precision.
A tract-specific analysis using SCT, similar to what was recently reported in 30 , did not reveal improved longitudinal effect size (SRM) in the dorsal columns and cortico-spinal tract (Supplementary Table 6).
For MRS, the ratio tNAA/mIns was by far the most sensitive longitudinal metric as expected. Other ratios such as tNAA/tCr and mIns/tCr were less sensitive to disease progression, but suggest that both a decrease in tNAA and an increase in mIns contribute to the longitudinal decrease in tNAA/mIns (Supplementary Table 5).
Overall, MR metrics had lower effect sizes than clinical metrics, with spinal cord CSA coming the closest. However, now knowing where to look, we expect that MR acquisition can be further optimized to improve the effect size of MR metrics. For example, spinal cord CSA was determined here using brain T 1 images with 1 mm isotropic resolution and suboptimal SNR below C3 cervical level. Future studies could easily use 0.8 mm isotropic and improved coverage of the cervical cord by activating corresponding coil elements in the 64-channel spine + neck receive coil.
We also noted that SRMs for clinical metrics in the present study appear higher than those reported in literature. 56 For example, one study reported an increase in FARS total neuro of 4.1 ± 7.84 (Table 10 in a study by Patel et al. 56 ) in a young cohort (BL age < 16) after 1 year, corresponding to an SRM of 0.52. In the present study, the annual increase in FARS total neuro (from 1-year data only) was 5.2 ± 4.5 corresponding to an SRM of 1.14 (Supplementary Table 7), even though the cohort was older (BL age = 19). It appears that the higher SRM in our study is primarily due to lower SD (4.5 versus 7.84), which could be explained by the fact that all our clinical assessments were performed by a single rater (no interrater variability).

True annual effect size
Effect sizes (SRMs) reported in Table 3 are for annual slopes obtained from 2-year data (i.e. three time points per participant). To plan clinical trials, it would also be useful to know the 'true' annual SRM, namely SRM size obtained from 1-year data only (i.e. two time points per participant). These SRMs are reported in Supplementary Table 7. As expected, true annual SRMs were lower than annual SRMs obtained from 2-year data. For example, the effect size for CSA at C2-C3 was −0.95 for annual slope from 2-year data, and −0.71 for annual slopes from 1-year data only.

Hypoplasia versus neurodegeneration
The large differences observed in FRDA participants relative to controls (for example, −52% lower tNAA/mIns) suggest that alterations may already be present well before onset. This is consistent with the hypothesis that alterations in the spinal cord are largely due to hypoplasia rather than neurodegeneration. 54 While previously reported correlations between CSA and measures of disease severity at baseline suggested that neurodegeneration was also present in FRDA, 15 our longitudinal data provides direct evidence and quantifies this neurodegeneration, with a rate of atrophy of −2.4% per year in our early-stage cohort.

MR metrics for monitoring disease progression in pre-manifest individuals
While CSA was the MR metric with the strongest longitudinal effect size in our cohort, other MR metrics could also play a key role, especially during the pre-manifest phase. For example, it is possible that neurochemical (MRS) and microstructural (DTI) alterations occur earlier during development than alterations in CSA (morphometry). While identification of pre-manifest individuals with FRDA is currently rare, perinatal genetic testing will likely become more widespread in the near future once the first effective treatments become available. MR modalities capable of detecting changes during the pre-manifest phase could therefore play a crucial role in assessing treatment efficacy during that phase, and in determining how early to start treatment.

Possible improvements to the MR protocol
With results now providing a better picture of the expected cross-sectional alterations and longitudinal changes in the spinal cord in FRDA, the MR protocol could be streamlined and shortened substantially, benefitting from recent advances in hardware and pulse sequences. For example, while spine dMRI acquisition lasted about 40 min in the present study, it can be reduced to ∼4 min. 21 Similarly, the length of spinal cord MRS was about 40 min (including all adjustments and shimming). Shorter TR, fewer shots (due to improved SNR with available 64-channel head-neck coils), faster shimming using FASTMAP, and improved workflow through automation of adjustment steps now allow acquisition of similar data in about 10 min. Therefore, data for all three modalities investigated here (morphometry, DTI, MRS) can be acquired in <30 min of scan time in future studies (not counting subject installation and clean-up), even if the MRS needs to be interrupted and repeated due to motion. This scan duration is comparable to that of the generic spinal cord protocol, 21 with the magnetization transfer acquisitions replaced by MRS.

Limitations of the study
Our study has a number of limitations. First, follow-up scans were only performed in FRDA participants, and not in controls. Therefore, slopes of change over time in FRDA were tested compared to zero, assuming that control values remain stable during the follow-up period. Indeed, published studies suggest that longitudinal changes in control participants are unlikely to explain the changes observed in our FRDA participants (average age: 19-20 years, range: 10-35 years). For example, large cross-sectional studies in healthy participants across age have shown that CSA increases from age 10 to early-mid-twenties and remains stable from 20 s to 30 s. 57,58 Therefore, we would not expect to see a decrease in CSA in controls in our cohort. For spine MRS, concentrations reported in healthy controls show less than 1% decrease in tNAA and no change in mIns across ages in the 20-70 age range. 59 For dMRI, FA and MD have been shown to be fairly stable, with less than 0.2% change per year in the 10-40 age range. 60 Therefore, it seems unlikely that the observed decrease in CSA, tNAA/mIns and DTI metrics in the present study would be due to normal aging.
Another limitation is that the scanner was upgraded midway through the study. However, the test-retest data obtained before and after upgrade, and the fact that we find similar results with 'same scanner' data and with 'all data' gives us high confidence that the longitudinal changes reported here were not substantially affected by the scanner upgrade.

Need for larger, multi-site study
Finally, while our single-site study provides the first quantitative longitudinal results in the spinal cord in FRDA, they were obtained in an early-stage cohort and cannot be generalized to later disease stages. Natural history studies of clinical scales have shown that disease progression in FRDA is faster at early stage, 56 as well in younger patients and in classical onset FRDA as opposed to late onset. 61 Our cohort in the present study included only classical onset FRDA participants, most of whom were young and early-stage. The average annual increase (slopes from 2-year data) was 5.3 ± 3.7 for FARS total neuro, 2.0 ± 1.1 for SARA and 2.0 ± 1.4 for ADL (Table 3). Based on natural studies of clinical scales, changes in MR metrics are likely slower in older cohorts, late onset cohorts and/or non-ambulatory cohorts.
Larger multi-site studies are needed to cover the entire range of disease severity and duration and gain a fuller picture of disease progression in the spinal cord in FRDA. A new multi-site longitudinal neuroimaging study (TRACK-FA 62 ) has now started and aims to scan 200 patients and >100 controls at three time points over 24 months. 63 This international effort, involving seven sites on four continents, will use an improved and streamlined neuroimaging protocol that includes the modalities presented in this paper.