Jacobian integration method increases the statistical power to measure gray matter atrophy in multiple sclerosis☆

Gray matter atrophy provides important insights into neurodegeneration in multiple sclerosis (MS) and can be used as a marker of neuroprotection in clinical trials. Jacobian integration is a method for measuring volume change that uses integration of the local Jacobian determinants of the nonlinear deformation field registering two images, and is a promising tool for measuring gray matter atrophy. Our main objective was to compare the statistical power of the Jacobian integration method to commonly used methods in terms of the sample size required to detect a treatment effect on gray matter atrophy. We used multi-center longitudinal data from relapsing–remitting MS patients and evaluated combinations of cross-sectional and longitudinal pre-processing with SIENAX/FSL, SPM, and FreeSurfer, as well as the Jacobian integration method. The Jacobian integration method outperformed these other commonly used methods, reducing the required sample size by a factor of 4–5. The results demonstrate the advantage of using the Jacobian integration method to assess neuroprotection in MS clinical trials.


Introduction
Multiple sclerosis (MS) is an inflammatory, demyelinating disease of the central nervous system. Although multiple focal lesions in white matter are the pathologic and imaging hallmarks of MS, gray matter is also involved. Gray matter pathology, which has been known from early post-mortem studies (Dawson, 1916) but overlooked for many decades, has recently become a new focus of MS research (Kutzelnigg et al., 2005;Lucchinetti et al., 2011). Several postmortem (Bo et al., 2007;Kutzelnigg et al., 2005), in vivo magnetic resonance imaging (MRI) (Mainero et al., 2009), and MR spectroscopy studies (Caramanos et al., 2009) have shown that gray matter pathology appears to be independent of white matter pathology, suggesting distinct mechanisms of tissue destruction. Pathological studies have shown that there is significant gray matter demyelination in MS, the extent of which can exceed that of white matter Kutzelnigg et al., 2005).
However, cortical lesions are rarely visible on conventional MRI (Geurts et al., 2005a(Geurts et al., , 2008. Advanced MRI techniques such as double inversion recovery and phase-sensitive inversion recovery can improve sensitivity to leukocortical and intracortical lesions (Geurts et al., 2005b;Nelson et al., 2007) but fail to capture the large bands of subpial demyelination seen on histopathology (Seewann et al., 2012). Tissue loss in gray matter (gray matter atrophy), which apparently results from lesional as well as non-lesional pathology (Wegner et al., 2006) and represents overall destructive pathology including neurodegeneration, can be measured by conventional MRI.
Measures of cortical gray matter tissue loss or atrophy are clinically relevant, as they correlate with cognitive impairment (Amato et al., 2007), are more closely associated with physical disability than whole brain atrophy (Fisher et al., 2008), and appear to be less influenced by so-called "pseudoatrophy" than whole brain or white matter atrophy (Nakamura et al., 2010;Tiberio et al., 2005). Indeed, these properties make cortical gray matter atrophy attractive as an outcome measure in clinical trials, particularly as therapeutic targets shift from suppression of inflammation to neuroprotection and remyelination.
The longitudinal measurement of cortical volume change on MRI is not an easy task because the cortex is thin and convoluted, and the relaxation behavior of both cortex and white matter can change with pathology. To be useful as an outcome measure in MS, it is critical to determine an optimal strategy to quantify gray matter atrophy with high statistical power. The objectives of this study were: (1) to assess the reproducibility of various analysis pipelines to measure cortical or gray matter volume, (2) to quantify cortical or gray matter atrophy over time in an MS population, (3) to compare these pipelines in terms of required sample size, and (4) to assess factors in study design (image resolution and study duration) that influence the statistical power to detect a clinical effect on the rate of cortical atrophy over time.

Material and methods
We used a scan-rescan dataset to calculate reproducibility and a longitudinal clinical study of MS patients to measure the required sample sizes to detect gray matter atrophy.

Subjects
Subjects for the scan-rescan dataset were 20 healthy normal controls (age = 30 ± 4 years, 10 females) (Aubert-Broche et al., 2006). Subjects for the longitudinal dataset came from a multi-center clinical study (Assessment Study of Steroid Effect in Relapsing Multiple Sclerosis Subjects Treated with Glatiramer Acetate, ASSERT, NCT00203047) involving 414 relapsing-remitting MS (RRMS) patients. A cohort of 287 patients (mean baseline age = 39.9 ± 9.0, proportion of female = 73.2%) who completed at least two MRI sessions was studied here. All patients were randomized to either with glatiramer acetate alone or with glatiramer acetate plus 1250 mg of prednisone given orally for 5 days every 4 months.

Segmentation of MS lesions
T2-lesions in white matter were automatically segmented using a multispectral Bayesian classifier (Francis, 2004) with PD-weighted, T2weighted, and T1-weighted images, and then reviewed by experts and manually corrected as necessary. No cortical gray matter lesions were identified, as the scanning sequence was not designed to be sensitive to gray matter lesions.

Image analysis
The T1-weighted images for each subject were analyzed by combinations of cross-sectional and longitudinal pre-processing with crosssectional segmentation-based and longitudinal registration-based algorithms. The following section describes the details of the pre-processing and atrophy measurement methods. All methods were fully-automated except for the MS-lesion segmentation described above.
Conventional cross-sectional pre-processing (XPP): As shown in Fig. 1, XPP consisted of (XPP-1) N3 intensity-non-uniformity correction (Sled et al., 1998); (XPP-2) MS-lesion filling (Battaglini et al., 2012) (to reduce bias in gray matter volumes due to the impact of variable white matter MS lesion loads on image intensity distributions) (Nakamura and Fisher, 2009); and (XPP-3) standard ICBM-space registration (using the ICBM 2009c Nonlinear Symmetric Template) (Fonov et al., 2009), using a hierarchical registration technique (Nakamura, 2011). Briefly, the hierarchical registration procedure involved estimating the affine transformation parameters in multiple steps: (1) two rotations (y-and z-rotations) by maximizing the left and right inter-hemispheric symmetry, (2) x-rotation and z-translation by normalized mutual information (NMI) registration to align anterior-posterior on the y-axis, (3) multi-seed optimization for a global scaling factor using NMI, and (4) Fig. 1. Flowchart describing the cross-sectional (XPP) and longitudinal (LPP) pre-processing pipelines. three scaling and three shearing parameters again estimated with NMI. The Nelder-Mead simplex method was used to optimize each step with NMI as the cost function (Pluim et al., 2003). For lesion filling, the mean and standard deviation of normal-appearing white matter (NAWM) were estimated from initial segmentation of normal-appearing brain tissue using FMRIB's Automated Segmentation Tool (FAST) (Zhang et al., 2001).
Longitudinal pre-processing (LPP): The longitudinal pre-processing began with the XPP and added the following: (LPP-1) Intra-subject registration using pairreg from FSL, which includes skull-based corrections for scaling and skewing between each pair of images , that corrects for potential voxel size mis-calibration and, to some extent, geometric distortion (Caramanos et al., 2010); (LPP-2) An unbiased subject-specific linear template using all combinations (between each time-point) of linear transformations . The matrix average was calculated as in Leung et al. (2012) using octave (http://www.octave.org); (LPP-3) Differential intensity correction, which estimates the bias field as a median-filtered ratio map with respect to the subject-specific template Lewis and Fox, 2004); (LPP-4) Template-to-standard-space registration using the hierarchical method, as previously described; (LPP-5) Consistent standard space (ICBM 2009c Nonlinear Symmetric Template (Fonov et al., 2009)) registration by concatenating the native-totemplate affine registration matrix and template-to-standard-space registration matrix; (LPP-6) Field-of-view matching by removing all voxels that were not in the image at any time-point, and (LPP-7) Longitudinal lesion-filling using combined lesion masks, where the lesion masks from each time-point were transformed to the subject-specific template, combined, transformed back to the native space, and filled with NAWM intensities similar to the lesion filling in XPP (Nakamura et al., 2010). The flowchart of the two pre-processing pipelines is shown in Fig. 1.
Statistical Parametric Mapping (SPM, http://www.fil.ion.ucl.ac.uk/ spm) is a software suite of MATLAB functions and subroutines. We used the latest version, SPM8b. Of the many pipelines in SPM8b, we are interested in the "Segment" function (Ashburner and Friston, 2005). It is a cross-sectional algorithm where each image is independently analyzed. This segmentation produces tissue probability maps from which maps of gray matter, white matter, and cerebrospinal fluid classes are obtained. The tissue class with the highest probability is assigned at that voxel. For this work, we are interested only in the gray matter voxels. Since SPM requires good initial spatial normalization, we performed linear spatial normalization prior to SPM analysis using the hierarchical registration method in XPP (Nakamura, 2011) with the stereotactic ICBM 2009c Nonlinear Symmetric Template image (Fonov et al., 2009). The resulting volume, therefore, is a head-size normalized total (= cortical + deep + cerebellar) gray matter volume. SIENAX is the cross-sectional version of the Structural Image Evaluation using Normalization of Atrophy (SIENA) method  and is part of FSL (http://www.fmrib.ox.ac.uk/fsl/). Currently, SIENA cannot measure cortical atrophy and was not used in this study. Briefly, in SIENAX, the brain is extracted from the volume using the Brain Extraction Tool (BET) (Smith, 2002) and then classified using FAST into gray matter, white matter, and cerebrospinal fluid (Zhang et al., 2001). FAST corrects  for spatial intensity variations as well as partial volume, and uses a hidden Markov random field model and expectation-maximization algorithm (Zhang et al., 2001). SIENAX calculates a v-scaling factor to normalize the brain volumes so that they are comparable in the standard stereotaxic space. This scaling is the determinant of the skull-constrained brain registration matrix that registers the subject MRI and standard template. In the end, SIENAX outputs normalized and non-normalized volumes for cortical and total gray matter. In this study, we used normalized cortical volume.
FreeSurfer is a freely available image analysis package (http://surfer. nmr.mgh.harvard.edu) that has both cortical surface reconstruction (Dale et al., 1999) and volumetric segmentation (Fischl et al., 2002(Fischl et al., , 2004a. In this study, the surface-based cortical thickness is used to measure cortical gray matter atrophy. The images are analyzed first cross-sectionally; then the unbiased longitudinal scheme is applied to improve the consistency (Reuter et al., 2012). XPP or LPP is not applied for FreeSurfer because FreeSurfer has its own longitudinal pipeline. The FreeSurfer version is 5.1.
The Jacobian integration method is a longitudinal registration-based method and a type of tensor-based morphometry (Ashburner et al., 1998). We used a variant of the longitudinal pipeline developed in the Image Processing Laboratory at the McConnell Brain Imaging Centre at the Montreal Neurological Institute (Guizard et al., 2012). Briefly, the Jacobian integration method consisted of the following: (1) skull-based intra-subject registration using pairreg , (2) transformation and resampling of both images into an isotropic halfway space using sinc interpolation, (3) symmetric nonlinear registration of the two affine-halfway-transformed images using SyN (Avants et al., 2008), (4) calculation of the local Jacobian determinants of nonlinear displacement fields, and (5) integration of Jacobian determinants within the baseline cortical masks obtained from FAST (Zhang et al., 2001). The Jacobian determinants are calculated from numerical integration and not analytical integration of functions used for nonlinear registration. The output of the Jacobian integration method is a percent change in volume; it is not a cross-sectional measure.

Effect of study designs
We evaluated the effect of image resolution and duration of trials using the Jacobian integration method. To assess the effect of image resolution, the rate of cortical atrophy was measured from higherresolution sagittal MRIs (1.5 × 1.0 × 1.0 mm 3 ) and separately using standard-resolution axial MRIs (1.0 × 1.0 × 3.0 mm 3 ). It should be noted that this evaluation of image resolution does not test the pure effect of resolution change because the pulse timing parameters are not the same. Nonetheless, we find that this evaluation is more realistic than synthetic averaging of slices, and the result is directly applicable to real-world clinical trials.
For the statistical effect of the study duration, the rate of cortical atrophy was measured from baseline to year 1, baseline to year 2, and baseline to year 3.

Statistical analysis
For scan-rescan analysis, we measured the absolute percent volume change.
In order to determine and compare the statistical power of each pipeline, we estimated the sample size (per arm) required to detect pre-specified treatment effects (10-90%) in the longitudinal data set, without accounting for normal aging. We compared combinations of the following analysis pipelines (1) XPP + SPM, (2) LPP + SPM, (3) XPP + SIENAX, (4) LPP + SIENAX, (5) cross-sectional version of FreeSurfer, (6) longitudinal version of FreeSurfer, (7) XPP + Jacobian integration method, and (8) LPP + Jacobian integration method. The effect of study design was investigated using Jacobian integration method only.
We used the "pwr" package (http://cran.r-project.org/web/packages/ pwr/index.html) in R to estimate the sample size required to detect treatment effects with 80% power, 0.05-significance level, and 10-90% treatment effects. The treatment effect was assumed to start immediately and remain constant over 3 years. The 95% confidence interval was estimated by bootstrapping 10,000 times. The sample size was calculated from the longitudinal data and independent of scan-rescan data.
For the analyses using SIENAX, SPM, and FreeSurfer, the atrophy rate was calculated from the difference in cortical gray matter volume, total gray matter volume, and cortical thickness, respectively. For the Jacobian integration method, the output is a direct measure of atrophy rate in percent change. The atrophy rates were annualized before calculating  The required sample size per arm for each pipeline for varying treatment effects with fixed power of 80% and 0.05 significance level. Table 1 shows the same values with 95% confidence interval. Values greater than 1000 are omitted here as such trials are not realistic. Cross-sectional FreeSurfer, FreeSurfer (x) values are not displayed because their values are above 1000. Abbreviations: XPP = cross-sectional pre-processing; LPP = longitudinal pre-processing. the sample sizes. As in Healy et al. (2009), we defined outliers a priori as having an annualized change greater than 10%/year, and these were eliminated from the analysis. The number of outliers for each technique is reported in Results section.
From the longitudinal dataset, we removed data from seven sites that changed scanners during the data acquisition period. We also did not analyze images with incomplete supratentorial brain coverage or severe artifacts. For high-resolution MRIs, there were 279 baseline and year-one image pairs, 159 baseline and year-two image pairs, and 71 baseline and year-three image pairs. The respective numbers were 274, 158, and 71 for the standard-resolution MRI image-pairs. There was no effect of treatment on whole brain or cortical atrophy, and the following analysis was performed on the combined group. Fig. 2 shows an example of input and resulting images from a single RRMS subject. Fig. 3(a) shows the percent volume change in cortical or total gray matter atrophy from each pipeline, and Fig. 3(b) shows the required sample size for varying treatment effects. Table 1 shows the corresponding values with 95% confidence intervals. The numbers of outliers (defined as N10%/year change a priori) was none from Jacobian integration method; 4 from SPM8 and SIENAX with XPP; and 2 from SPM8 and SIENAX with LPP; 16 for cross-sectional for Jacobian integration, XPP + Jacobian, LPP + Jacobian, XPP + SPM8, LPP + SPM8, XPP + SINEAX, LPP + SIENAX, cross-sectional FreeSurfer, and longitudinal FreeSurfer, respectively. Compared to XPP, LPP decreased the required sample size on average by 38% and 57% for SPM8 and SIENAX, respectively. The Jacobian integration method showed further improvement and had a 58% reduction on average compared to the next best result, LPP + SPM8. Compared to conventional XPP + SIENAX, Jacobian integration reduced the sample size required to see a change in cortical gray matter by more than 5 fold.

Effect of study designs
MRI image resolution was found to be a significant contributor to study power. As shown in Fig. 4 and Table 2, low-resolution MRIs (3 mm slice thickness) required an average of 34% more subjects to detect differences. The same figure also shows that longer studies require fewer patients. A post-hoc analysis with a subset of patients who had completed 4 MRIs did not significantly change these results.

Discussion
The results of the current study showed that the longitudinal Jacobian integration method was superior to commonly-used crosssectional methodsreducing the required sample size by 4-5 fold in the MS population studied. The required sample size was reduced when commonly-used cross-sectional methods were applied on longitudinally-pre-processed images (Nakamura et al., 2012), but the improvement with the Jacobian integration method far exceeded that improvement. Our results suggest that longitudinal methods such as the Jacobian integration method have substantial advantages for measuring cortical and gray matter atrophy in future clinical trials.
The fact that the average atrophy rates varied from −0.519%/year with the Jacobian integration method + LPP to −1.218%/year with LPP + SIENAX emphasizes that the interpretation of atrophy data requires caution. We cannot directly compare atrophy rates across different analysis methods. The current literature on cortical atrophy using Jacobian integration methods in MS is limited. In Anderson et al. (2009), the authors used a Jacobian integration method to compare the rates of gray matter atrophy in normal controls versus that in patients with RRMS and failed to detect a significant differencelikely due to the study's small sample size. In another study, Anderson et al. applied a Jacobian integration method in patients with Alzheimer's disease and showed a pattern similar to our findings of power improvement with respect to SIENAX (Anderson et al., 2012).
The estimated sample size required by SIENAX in the current study was larger than that of a previous report by Healy et al. (2009). The latter study reported approximately 70-250 patients per arm for a 2-year annual MRI study with 50% treatment effect and 80% power, whereas Required sample size per arm to detect treatment effect on cortical atrophy except for SPM, which uses gray matter atrophy. The range is a 95% confidence interval obtained from bootstrapping 10,000 times. Abbreviations: XPP = cross-sectional pre-processing; LPP = longitudinal pre-processing. our equivalent analysis with XPP-SIENAX required approximately 450 patients per arm (data not shown). A number of important differences between the studies may underlie this discrepancy: first, study designs were different (measurement of total gray matter atrophy from monthly MRIs in Healy et al. vs. cortical atrophy from annual MRI in the current study); and, second, the atrophy rates in the patient populations were very different (from −1.9 to −3.6%/year in Healy et al. and from −0.56% to −1.22% in the current study). It is plausible that the MS population studied here, all of whom were on treatment with glatiramer acetate, was more stable than the population in that study.
The current study did not include a placebo arm, which would have allowed for an estimation of actual treatment effect. Nevertheless, future MS clinical trials are unlikely to have placebo arms due to ethical considerations ; therefore, modeling the treatment effect against a placebo population may overestimate the statistical power when applied to a trial with an active comparator arm. As a result, our estimations are more directly applicable to future clinical trials. The current study also did not examine cortical atrophy in normal controls, which could have allowed us to account for normal aging. However, cortical, or gray matter, atrophy due to normal aging is very small compared with MS-related atrophy at the group level (−0.028 ± 0.24 and −0.23 ± 0.34% per year, respectively, for normal controls and patients with RRMS in Fisher et al. (2008); and −0.06 ± 0.16 and −0.15 ± 0.27% per year in Anderson et al. (2009)); thus, we believe that neglecting normal aging effects in the present calculation introduces little error. However, in a population where atrophy due to normal aging is not small (e.g., Alzheimer's disease), this approach could overestimate the power.
FreeSurfer did not perform well in the current study as shown by the large sample sizes and many outliers. It is possible that images used in the longitudinal study (single FLASH without signal averaging or multi-echo) are not optimal for FreeSurfer. Nonetheless, it emphasizes that the Jacobian integration method is robust with respect to the T1weighted sequence details as its performance in low-resolution images is relatively similar to that from high-resolution MRIs. This is a potentially important practical advantage of Jacobian integration, as it can be used almost equally as well on images with 3 mm slice thickness, which is standardly acquired in MS clinical trials.
We measured atrophy in all cerebral gray matter (deep and cortical gray matter) directly using different methods, and compared their statistical power. Cortical gray matter forms about 80% of total gray matter and does not include deep structures such as the thalami, which are known to show disproportionately high rates of atrophy, even in early MS (Henry et al., 2008). Importantly, cortical gray matter also does not include cerebellar gray matter, and the segmentation of which on conventional MRI scans is highly contaminated by partial volume effects. Sample sizes were not very different for the two cross-sectional approaches (namely, SPM and SIENAX), and all things considered, it is difficult to assess whether one metric should be preferred in the clinical trial setting.
A constant and immediate treatment effect was assumed here. However, it is possible that gray matter may demonstrate pseudoatrophy (an acceleration of atrophy in the first year following the initiation of treatment), depending on the nature of the treatment being initiated . Previous studies have suggested that white matter volume change is predominantly affected by pseudoatrophy whereas gray matter is less sensitive to fluctuations associated with changes in inflammation (Nakamura et al., 2010;Tiberio et al., 2005). However, we cannot exclude the possibility that pseudoatrophy can occur in gray matter as well (Horakova et al., 2008;Nakamura et al., 2010). It is also possible that there is a tissue-specific delayed effect of treatment on atrophy; that is, conventional anti-inflammatory treatments may reduce inflammation in white matter first, followed by reduced Wallerian degeneration, and ultimately neuroprotection. Such treatment-specific mechanisms of action may also play a role in the dynamics of cortical atrophy.
In conclusion, our results clearly show that longitudinal registrationbased methods such as the Jacobian integration method described here have greater statistical power for detecting treatment effects on gray matter atrophy than the commonly-used cross-sectional segmentationbased methods, even when the latter are combined with longitudinal pre-processing. Our results should help in the planning of new clinical trials assessing neuroprotection in MS.