White matter compartment models for in vivo diffusion MRI at 300mT/m

This paper compares a range of compartment models for diffusion MRI data on in vivo human acquisitions from a standard 60mT/m system (Philips 3T Achieva) and a unique 300mT/m system (Siemens Connectom). The key aim is to determine whether both systems support broadly the same models or whether the Connectom higher gradient system supports significantly more complex models. A single volunteer underwent 8h of acquisition on each system to provide uniquely wide and dense sampling of the available space of pulsed-gradient spin-echo (PGSE) measurements. We select a set of promising models from the wide set of possible three-compartment models for in vivo white matter (WM) that previous work and preliminary experiments suggest as strong candidates, but extend them to fit for compartmental T2 and diffusivity. We focus on the corpus callosum where the WM fibre architecture is simplest and compare their ability to explain the measured data, using Akaike's information criterion (AIC), and to predict unseen data, using cross-validation. We also compare the stability of parameter estimates in the presence of i) noise, using bootstrapping, and ii) spatial variation, using visual assessment and comparison with anatomical knowledge. Broadly similar models emerge from the AIC and cross-validation experiments in both data sets. Specifically, a three-compartment model consisting of either a Bingham distribution of sticks or a Cylinder for the intracellular compartment, an anisotropic diffusion tensor (DT) model for the extracellular compartment, as well as an isotropic CSF compartment, performs consistently well. However, various other models also perform well and no single model emerges as clear winner. The WM data (with virtually no CSF contamination) do not support compartmental T2 but partially support compartmental diffusivity. Evaluation of parameter stability favours simpler models than those identified by AIC or cross-validation. They suggest that the level of complexity in models underpinning currently popular microstructure imaging techniques such as NODDI, CHARMED, or ActiveAx, where the number of free parameters is about 4 or 5 rather than 10 or 11, may reflect the level of complexity achievable for a useful technique on current systems, although the 300mT/m data may support more complex models.


Introduction
Magnetic resonance (MR) microstructure imaging uses mathematical models to relate MR signals in each image voxel to microscopic tissue features, and thus estimate and map those features over an image volume. In diffusion MRI, the standard DT model has two key limitations: first it is too simple to explain the data over a wide range of b-values and orientations; and second, it lacks specificity to particular tissue features. A variety of alternative biophysical diffusion MRI models have emerged over the last decade to address these limitations. These models underpin the emerging generation of microstructure imaging techniques that are now starting to replace DT-imaging in a range of biological and clinical studies into tissue microstructure variation, as in e.g. Winston et al. (2014) and Kunz et al. (2014). Stanisz et al. (1997) pioneered the multi-compartment representation of separate diffusive processes in nervous tissue. The model has three geometric compartments: ellipsoids for restricted intra-axonal water, anisotropically hindered extracellular water (with diffusivity and relaxation constants different to the intracellular space) and isotropically restricted glial cell water, with exchange between all compartments. The Ball-Stick model (Behrens et al., 2003) is the simplest two-compartment model that can capture anisotropy. The two compartments exhibit restricted axonal diffusion (through a Stick) and isotropic extra-axonal diffusion (through a Ball). The Composite Hindered and Restricted Model of Diffusion (CHARMED) by Assaf et al. (2004) replaces Behrens' intracellular Stick with impermeable cylindrical fibres, and replaces the extracellular Ball with a full DT. Initially the model used a fixeddiameter distribution to estimate fibre orientation and volume fraction. Later work on AxCaliber (Assaf et al., 2008) additionally estimates the parameters of a gamma distribution of axon diameters (GDR-cylinders). ActiveAx Dyrby et al., 2013) combines elements of Stanisz's model and AxCaliber to obtain the Minimal Model of White Matter Diffusion (MMWMD), which is designed to be the simplest model that adequately fits diffusion MRI data from coherent WM while providing estimates of axon density and diameter. The model has a single axon diameter and cylindrically-symmetric extracellular diffusion, which makes it parsimonious enough to obtain orientationally invariant parameter estimates.
The simple models above do not account for fibre direction inhomogeneity which is abundant in the brain even at a sub-voxel level (Zhang et al., 2011b(Zhang et al., , 2012Jeurissen et al., 2013). A wide family of multiple fibre reconstruction algorithms (Seunarine and Alexander, 2009;Tournier et al., 2011) aim to recover multiple fibre orientations or the fibre orientation distribution, but these are not directly relevant here as they do not separate different tissue compartments. However, various compartment models incorporate the idea. Hosey et al. (2005) and Behrens et al. (2007) extend the Ball-Stick model to multiple Sticks, capturing multiple fibre populations with distinct orientation. Similarly, Assaf and Basser (2005) extend CHARMED to multiple intracellular compartments. The DIAMOND model (Scherrer et al., 2013) also employs discrete fibre populations and the number of fibres within each voxel is determined via a model selection framework based on the generalization error. Zhang et al. (2011b) extend the MMWMD, relaxing the assumption of straight parallel fibres to a Watson distribution of orientations. This led to the simpler NODDI model (Zhang et al., 2012), which further assumes zero axon diameter (i.e. Watson distributed Sticks rather than Cylinders). Later work (Tariq et al., 2014) incorporates dispersion anisotropy by replacing the Watson distribution with a Bingham distribution. Concurrently with NODDI, Sotiropoulos et al. (2012) extended the Ball-Stick model to Ball-Rackets also using a Bingham distribution of Sticks, and Bingham mixtures, as in fact proposed earlier in Kaden et al. (2007). Jeurissen et al. (2013) combine the constrained spherical deconvolution model of the fibre orientation distribution from Tournier et al. (2007) with additional grey matter and CSF compartments.
The wide-ranging set of available models relating the diffusion signal to microstructural tissue features raises the question about which are most appropriate in different situations. A series of recent studies have aimed to identify the combination of compartments that best explain the diffusion MRI signal from WM over the accessible range of the measurement space. Panagiotaki et al. (2012) provide a taxonomy of simple compartment models for non-dispersive fibres, by collecting and enriching earlier models. Using the Bayesian Information Criterion (BIC) they then compare the performance in explaining data from fixed rat nerve tissue acquired with a wide range of b-values and diffusion times over long scan times (65 h) and using a high gradient strength (400 mT/m) animal imaging system. Richardson et al. (2013) repeated the study using non-fixed tissue in 'viable' in vivo state showing consistency of model ranking but inconsistency of parameter estimates with fixed tissue. Ferizi et al. (2014) extended the acquisition to multi-shell high angular resolution diffusion imaging (HARDI) on a live human volunteer, using a standard human 3T scanner with a maximum gradient strength of 60 mT/m. Each study concludes that, to capture the broad trends in the signal, at least three compartments are required, characterising: i) cylindrically restricted intracellular diffusion; ii) hindered anisotropic extracellular diffusion; and iii) isotropically free and/ or restricted diffusion. Another work by Ferizi et al. (2013) extends the set of models to include fibre dispersion via a Watson distribution of fibres and found that, in one of the most coherent WM structures such as the corpus callosum, the dispersed fibre models described the signal better than models with a single orientation. All these experiments show that even in regions of tissue with supposedly the simplest geometry the models required to explain the data are surprisingly complex.
The recent development of human MR systems with 300 mT/m gradients, in particular the Connectom scanner (Setsompop et al., 2013), is a major step towards the long-term translation of microstructure imaging techniques to widespread clinical practice. Dyrby et al. (2013) illustrate the benefits of stronger gradients for mapping axon diameters and other parameters in WM using fixed post-mortem tissue and a small-bore animal imaging system. The first experiments verifying those findings on live human subjects are now beginning to emerge from the Connectom scanner Duval et al., 2014;Huang et al., 2015).
Here we explore the generalisability of earlier model comparison results, which use standard human scanners with up to 60 mT/m gradients, to the wider measurement space of human data accessible with the Connectom scanner. We construct a multi-shell HARDI protocol for the Connectom scanner with a wide range of b-values and diffusion times. It is similar in spirit and size to that acquired in Ferizi et al. (2014) but exploits the wider measurement space afforded by the 300 mT/m gradients. We concentrate on a subset of models used in previous work (Panagiotaki et al., 2012;Ferizi et al., 2013Ferizi et al., , 2014) that perform consistently well and compare model rankings obtained from the Connectom data sets and data from Ferizi et al. (2014) using the AIC, bootstrapping and cross-validation. We introduce a few methodological enhancements over previous works (Panagiotaki et al., 2012;Ferizi et al., 2013Ferizi et al., , 2014. First, we fit the models voxel-wise over more homogeneous regions of the genu, midbody and splenium, rather than fit to data averaged across the whole of the corpus callosum. Second, we account explicitly for the variable TE among measurements, by incorporating a compartmental T 2 decay into the modelling. Third, we also study models with compartmentally different diffusivity. We further look at the sensitivity of parameter estimates to noise by looking across bootstrap data sets and to spatial variation by mapping parameters over the corpus callosum, which provide a complementary evaluation of the models.

Methods
This section describes the data acquisition and the identification of regions of interest within the corpus callosum in which we run the model comparison. It then defines the set of models to compare, which enhance earlier models by including both T 2 and compartmental diffusivity effects, and reduce the full set to simplify the comparison. Finally, it describes the methods we use for parameter estimation and model comparison.

Data acquisition
We describe below the acquisition protocols, which are also summarised in Table 1. Ethical approval and written consent was obtained prior to scanning.

The Achieva protocol
This protocol is as described in Ferizi et al. (2014). A PGSE sequence was used on a clinical 3T Philips Achieva system, with cardiac gating and TR = 4 s. Every one of the 32 HARDI shells has a unique set of 45 directions with three interwoven b = 0 acquisitions. Each shell has a combination of Δ = {30, 50, 70, 90} ms, δ = {6, 10, 15, 22} ms, and |G| = {55, 60} mT/m, giving 1,536 measurements and achieving a b-value range of 218 to 10,308 s/mm 2 . TE varied from 51 to 127 ms. The SENSE factor was 1.10.
To cover this protocol, one healthy volunteer is scanned in two separate non-stop sessions, each lasting about 4 h. The nine slices, centred on the mid-sagittal plane, are 4 mm-thick with 2 mm × 2 mm in-plane resolution. The SNR of b = 0 images varies from about 30 at TE = 51 ms to 5 for TE = 127 ms. The images are then co-registered using Flirt (Jenkinson et al., 2002).

The Connectom protocol
This acquisition used the MGH/Harvard-UCLA Magnetom Skyra Connectom (Siemens Healthcare) scanner, which has a novel AS302 gradient system with a custom-built 64-channel coil, capable of |G| = 300 mT/m and a slew rate of 200 T/m/s. A PGSE (Tanner and Stejskal, 1968) sequence was used, with GRAPPA parallel imaging, an acceleration factor of 2, cardiac gating and TR = 1 s. The protocol contains 48 HARDI shells, each with 90 directions (45 unique pairs of opposite directions) and 10 interwoven b = 0 acquisitions. Each shell has a unique combination of Δ = {22, 40, 60, 80, 100, 120} ms, δ = {3, 8} ms, and |G max | = {60, 100, 200, 300} mT/m (as can be seen in Table 1, most of the actual |G| values used in the scanning are very close to |G max | so, for simplicity, we will make no distinction between the two). The b-value therefore ranges from 50 to 45,900 s/mm 2 . Each shell uses the minimum TE possible for a given combination of δ and Δ; TE thus ranges from 49 to 152 ms. (We note that the difference between the echo spacing, and thus echo train length, for the Connectom scanner and conventional scanners, such as Achieva, at the 2 mm resolution used, is relatively minor; for our protocol, they were 0.53 ms and 0.72 ms respectively. In addition, the 34.2 mT/m gradient for the EPI readout in this Connectom acquisition was not particularly high, compared with Achieva's 35 mT/m, and so any potential Maxwell terms would be negligible.) For this protocol, the same healthy control as in the Achieva acquisition is scanned over two 4 h non-stop sessions. The imaged volume comprises twenty 4 mm-thick whole-brain sagittal slices covering the corpus callosum left-right. The image size is 110 × 110 and the in-plane resolution is 2 mm × 2 mm. The SNR of b = 0 images is about 35 at TE = 49 ms and 6 at TE = 152 ms. All Connectom images are corrected for eddy current distortions and co-registered using FSL Flirt/Fnirt (Jenkinson et al., 2002) and the eddy module (www.fmrib.ox.ac.uk/fsl/eddy).

Compartment models
We considered only three-compartment models following earlier findings in Panagiotaki et al. (2012) and Ferizi et al. (2013Ferizi et al. ( , 2014. The compartments are shown in Fig. 1, and a more detailed description is given in the Appendix A. The full set of candidate models for each compartment leads to a very large set of three-compartment models. We focus here on a small subset that this earlier work suggests are the strongest candidates. In particular, following Alexander et al. (2010) and Zhang et al. (2012), we consider only isotropic free diffusion for the third compartment, modelling CSF contribution, since we consider only in vivo data. For the intracellular compartments, we consider only Cylinder, Bingham-sticks, and one/two Sticks (we regard the Stick-Stick combination as a single intracellular compartment). Preliminary work suggests possible benefits of Bingham over Watson models of dispersion. However, those experiments reveal that models which contain a more complex compartment than a single-radius Cylinder, such as GDR-cylinders (Assaf et al., 2008;Panagiotaki et al., 2012), dispersed Cylinders (Zhang et al., 2011b) or multiple distinct Cylinders (Zhang et al., 2011a), show only a marginal fitting advantage, and that they also take a very long time to fit and are quite unstable. So we decided to not include these complex models, to keep the set small and the results more comprehensible. Similarly, we consider only Ball and full Tensor for the extracellular compartment, excluding the Zeppelin models to keep the set compact.
Much previous work (Alexander, 2008;Alexander et al., 2010;Zhang et al., 2011bZhang et al., , 2012Panagiotaki et al., 2012;Ferizi et al., 2013Ferizi et al., , 2014 assumes equal diffusivity of the material in the intracellular and extracellular compartments. For each of our compartment models, we evaluate performance with and without this constraint. To highlight the difference in later results, the models with separate intra/extracellular diffusivities have a "-diff" suffix. In the model comparison we add a combination of two cylindrically symmetric tensors, Zeppelins, with unrelated orientations. This type of model, often referred to as a mixture of Gaussians, has been used to recover intra-voxel populations of fibres (Tuch et al., 2002;Parker and Alexander, 2003). We also further consider two established models from the earlier literature, NODDI and MMWMD as implemented similarly in Zhang et al. (2012) and Alexander et al. (2010). Both models use a simple tortuosity approximation, which the other models we consider here do not. They also have fixed diffusivity parameters, here to 1.9 and 1.7 μm 2 =s respectively.

T 2 effect
Varying TE means that we need to model T 2 effects. The total signal model is then: where f i , f e and f c are the weights of the normalised intracellular, extracellular, and CSF compartment signals S intra , S extra and S c , respectively; the values of compartmental T 2 decay rates are indexed similarly; S 0 is the proton density signal (which is TE-independent, and obtained from fitting to the b = 0 signal; see next section).

Model fitting
We use a two-stage model fitting procedure. The first step estimates the T 2 of tissue separately in each voxel by fitting to the (TE-dependent) b = 0 signal a bi-exponential model in which one component is from tissue and the other from CSF. A preliminary analysis of voxels fully inside WM regions shows no significant departure from a monoexponential decay, so we assume equal T 2 within the intracellular and extracellular compartments. When fitting the bi-exponential model, the value of T 2 in CSF is fixed to 1,000 ms. We fix this a priori as a more precise value of CSF is unlikely to be estimated with the range of TEs we use; this is also what we approximately estimate from an ROI in the ventricles. Thus, for each voxel we estimate the volume fraction of CSF, the S 0 and the T 2 of the tissue. These three estimates are then fixed for all the subsequent model fits, which enable a more direct comparison of the classes of WM compartments on their ability to fit the diffusionweighted signal.
In the second step, we fit each model using the Levenberg-Marquardt algorithm with an offset-Gaussian noise model, as in Panagiotaki et al. (2012) and Ferizi et al. (2014). The offset-Gaussian objective function is an approximation to a non-central chi noise model (Gudbjartsson and Patz, 1995), and accounts for the bias in low signal measurements introduced by the noise floor. It minimises the sum-ofsquared-errors: where N is the number of measurements,S i is the i-th measured signal, S i its prediction from the model, and σ is the noise standard deviation. To find σ for the Connectom data we used the background signal, as well as ROI signal in b = 0 images. For Achieva data we only used the ROI signal, as these images are acquired with a limited FOV. In both cases the σ estimates matched reasonably well with the observed signal noise floor in the plots, i.e. the residual signal along the fibres at the highest b-value. Each fitting was repeated 500 times, each time randomly perturbing from a central starting point with equal volume fractions, axial WM diffusivity 2 μm 2 =s, radial diffusivity 1 μm 2 =s, and cylinder radius 2 μm. The CSF diffusivity was fixed throughout to 3 μm 2 =s.

Model ranking
We compare the models in two ways. First, we use the Akaike information criterion (Akaike, 1974) where, given the data, L is the likelihood of the model with K number of free parameters. It is standard practice to drop the constant terms in log (L) because the absolute value of AIC is not informative, and that the AIC is not used to compare the performance of models across data sets (Burnham and Anderson, 2002). Further, as the noise level σ is found from the images, thus known a priori in the model fit, the first term in the AIC equates to the SSE of Eq.
(2). The lower the AIC, the better the model. Compared with other similar studies which use BIC (Panagiotaki et al., 2012;Ferizi et al., 2014), here we switched to the AIC following the recommendation in Burnham and Anderson (2002); however, for our large data sets the differences between AIC and BIC are minor. In addition to the AIC, we use cross-validation (Stone, 1974) and boostrapping (Efron, 1979) as complementary methods for model comparison and to further investigate the stability of the model ranking and parameter estimation. We use leave-one-out cross-validation at the level of HARDI shells, i.e. at each iteration we leave out a complete HARDI shell, fit the model to the remaining data, and use the fitted parameters to estimate all measurements in the missing shell. The final score is the average fitting error over all shells (32 for the ACH-genu data and 48 for the CON-genu data). When bootstrapping, we sample with replacement in the original data to get the same number of measurements. We bootstrap fifty times for each of the four voxels in the genu, giving a total of 200 data sets to which we fit the models. For our data, that number of bootstraps is sufficient for the variance in the mean parameter estimates over bootstraps to stabilise within 5% of its value.
Each model is fitted separately in each voxel of the regions of interest defined in Section 2.2, and the results reported in Section 3 are, unless otherwise stated, averages over all the voxels in the region.

Results
This section starts with some images of the corpus callosum. Then we show the ranking obtained via various model selection procedures after fitting the models. Finally, we evaluate the parameter estimates and stability obtained for each model. Fig. 2 shows the diffusion-weighted images from the Connectom scanner with gradient applied in the direction perpendicular to the fibres of the corpus callosum. Fig. 3 shows the ROI-average diffusion weighted signal for each combination of pulse sequence parameters in the CON-genu data set, split into the four gradient strengths used, together containing 4,800 measurements. The measurements provide good coverage of the signal range. Anisotropy is apparent even at the highest diffusion weighting of 45,900 s/mm 2 , clearly supporting models that assume restricted diffusion within axons. There were no obvious differences between the 45 pairs of opposite diffusion encoding directions, other than what is expected from measurement noise. (In two shells with δ = 3 and 8 ms, |G| = 60 mT/m, and Δ = 20 ms, four or five pairs showed unexpected differences at around twice the level of noise standard deviation. We ran the model fitting with and without these outliers, and the changes observed in the model parameter estimates were insignificant.) Fig. 4 shows the distribution of log-signal at b = 0 versus echo time TE for WM (genu, midbody and splenium) and CSF. In both WM and CSF the data show no significant departure from the mono-exponential model, so for each scanner we concluded that, at least for these data sets, the intracellular and extracellular T 2 are indistinguishable. Voxel-wise estimated T 2 of ACH-genu was 59 ± 2 ms, and of CON-genu was 51 ± 1 ms. Away from the genu, there was variation across the regions of the corpus callosum. A similar difference can be seen for higher T 2 in the splenium, at 70 ± 1 ms and 61 ± 1 ms for ACH-splenium and CON-splenium, respectively; but we found more comparable 63 ± 3 ms for ACH-midbody and 64 ± 2 ms for CON-midbody. The T 2 estimate from ACH-csf data averaged about 1,500 ± 400 ms, and 700 ± 86 ms from CON-csf data, although these estimates are less stable because the slope of the linear fit is very shallow. These estimates are in the order of previous estimates (MacKay et al., 1994;Whittall et al., 1997) of about 71 ms for genu, and over-1,000 ms for CSF. Fig. 5 compares AICs of each model for the three regional data sets from both scanners. The scores for each voxel are shown as coloured marks, and the star marker gives the average over the four-voxel region. We can roughly split the models into three groups. The first group of six models, marked with an H on the plots, generally has the lowest AIC; this group includes five models with separate compartmental diffusivities, comprising all four Tensor models as well as the combination of Ball with Bingham, and the remaining Tensor-Bingham model. The models in group marked with L generally have the highest AICs, and includes all combinations of extracellular Ball and intracellular Stick/s. The remaining models form a third group, marked with M on the plot, that have broadly intermediate AIC scores. The division into these groups is present in both genu and splenium data sets. AICs from the midbody region are less consistent, arising largely because of partial volume with CSF. We concentrate from here onwards on the genu data sets for a more detailed analysis of model performance. Fig. 6 shows the AICs of the models for the subsets of the CON-genu data sets with different gradient strengths. The plots show clearly how the group structure emerges in the plots as the gradient strength increases; specifically, model combinations of Ball with Cylinder are increasingly favoured, but those with Stick/s are not. Bingham-based models perform best across the four datasets. The results from CON-genu-60 and ACH-genu are largely similar, since both use similar gradient strengths; however, we note that some differences would arise as ACH-genu uses much larger δ, and so has higher b-value measurements than CON-genu-60. Fig. 7 shows how the ranking of models varies over bootstrap iterations and cross-validation folds. In bootstrapping, at each iteration, we compute the AIC of each model from the seen data; in crossvalidation, we compute the SSE of the unseen data. Thus we obtain a model ranking from each iteration of each procedure. The positional Fig. 2. Images of the corpus callosum showing the signal for gradient direction perpendicular to the fibres, at each |G|, for each pulse time δ, but only for the smallest and largest diffusion times Δ. The grey scale is inverted and adjusted in each case so as to give a reasonable contrast between the corpus callosum and the background. Signal still persists even at b = 45,900 s/mm 2 . Fig. 3. Diffusion weighted signal in the CON-genu data set averaged over the four voxels in the middle of the genu. For clarity, the signal is split across each gradient strength. The legend gives the b-value in units of s/mm 2 and, in the last plot, also diffusion and pulse times (δ|Δ), which are the same across the four plots, in units of ms. The x-axis gives the cosine of the angle between the applied gradient vector G and the fibre direction n; at 0 on the left the gradient is perpendicular to the fibres, but as it approaches 1 on the right the gradient direction becomes parallel with the fibre. variance diagrams show a histogram for each model of the position in the ranking over all iterations, pooled over the four genu voxels. For example, the top left diagram shows that Tensor-Bingham-CSF-diff and Tensor-Cylinder-CSF-diff consistently give the lowest AIC, whereas the bottom left diagram shows that their ranking by SSE on the unseen shells has lower rank more often.

Model ranking
The bootstrap and cross-validation results broadly reflect the group structure observed from AIC in Fig. 5. The leave-one-shell-out strategy in our cross validation creates much greater variation in the model ranking than the random selection of measurements in the experiment that does not consider the shell structure of the acquisition scheme (results not shown). This shows that the full distribution of b-values is much more influential on the choice of model than the choice of gradient directions.
There is a similar picture of model ranking stability when using CON-genu data. The group structure remains in bootstrapping and cross-validation, though there are more variations within the groups. Fig. 8 illustrates the fit of three models to the raw signal from just one of the voxels from the four-voxel neighbourhoods contributing to ACH-genu and CON-genu; the results are similar in the other voxels. We pick a representative from each group in the ranking of ACH-genu in Fig. 5: Tensor-Bingham-CSF-diff (top performing group, H), Ball-Bingham-CSF (M group), and Ball-Stick-CSF (L group). The model signal is shown as solid line, whereas the raw data is shown with markers. We can see that all three models capture the ACH-genu data reasonably well. The plots on the right reveal that higher b-value shells of CON-genu data enable a greater differentiation between the models. In particular, the three top-right plots reveal that the simpler Ball-Stick-CSF is clearly worse at capturing the 5,400 and 6,500 s/mm 2 shells. This pattern continues with even higher b-value shells in the bottom panel of plots. Beyond the 14,550 s/mm 2 shell we see differences between the two Bingham models: while Ball-Bingham-CSF cannot capture larger b-value shells, the extra complexity of Tensor-Bingham-CSF-diff model allows it to capture higher b-value shells more closely. Table 2 compares parameter estimates from each model from the voxels in ACH-genu (top) and CON-genu (bottom). Each estimate is the mean obtained after fitting models separately in each of the four voxels. Most models fitted to ACH-genu data give higher intracellular than extracellular volume fraction, which is consistent with what we expect from histological studies of WM tissue (Aboitiz et al., 1992;Highley et al., 1999); however, combinations of Tensor with Cylinder or Stick give lower intracellular volume fraction estimates. The coupled diffusivities are highest in Bingham models, which makes sense because they can explain reduced apparent diffusivity by fibre dispersion. In models Fig. 4. The blue markers of the plots show the b = 0 signals from one-of-four ROI voxels across TE values, as used in the estimation of T 2 ; the T 2 statistics in each subplot are taken across the four ROI voxels. Achieva plots are on the top-half, with Connectom plots below them; refer to the main text for the names of the data sets. Each of sixteen Achieva TEs has six b = 0 signals, while the Connectom data has forty b = 0 signals for each of the twelve TEs. The gradient of the fitted line, shown in red, is an estimate of the negative inverse of the T 2 value (the T 2 values inside each plot are estimates extracted from a bi-exponential fit, however, which also includes a CSF component). The distribution of points in WM did not suggest two separate rates of T 2 decay, therefore we fixed the intra/extracellular T 2 of WM compartments to be the same.

Parameter values and stability
with separate compartmental diffusivities, the intracellular diffusivity is generally higher than the extracellular diffusivity in Ball models, but the opposite applies to Tensor models (except Tensor-Bingham-CSF-diff). The Bingham distribution is more dispersed in Ball than Tensor models, most likely because the model has to compensate for the lack of extracellular anisotropy.
We see similar patterns in volume fraction, dispersion and diffusivity for the models fitted to CON-genu data. Again, separating compartmental diffusivities generally makes the volume fractions closer.
The parameter values from simple mixture models are revealing to compare. Specifically, we see in Stick-Stick and Zeppelin-Zeppelin models which provide more angular flexibility that the second tissue compartment, of small volume size in the case of Stick, is almost perpendicular to the first. Rather than capturing intrinsic anatomical features this is a sign that these models fail to capture the signal variation that arises from dispersion, and so end up with unexpected configurations simply to capture that effect, as illustrated in Jbabdi et al. (2012).
Parameter estimates from ACH-genu and CON-genu are broadly consistent, especially for simpler models. For these simpler models, the radius estimates are generally higher in ACH-genu. This suggests that the higher gradient strengths of CON-genu help ameliorate the overestimation of axon diameter index, consistent with Dyrby et al. (2013). Table 3 summarizes results using bootstrap samples to study the variance of the different parameter estimates. Comparison with Table 2 shows that the estimates from the full data set are reasonably similar to the mean across the bootstrap data sets. In addition, simpler models tend to have higher stability in the parameter estimation, i.e. the parameter estimates have lower variance across bootstrap samples. Fig. 9 shows the spatial variation over the corpus callosum of parameter estimates obtained in four different models. Specifically, we pick two relatively simple models representing established techniques in the literature, MMWMD and NODDI, and two more complex models, Tensor-Cylinder-CSF-diff and Tensor-Bingham-CSF-diff, that have similar combinations of components but consistently outperform the simpler models in our model comparison experiments. The key Fig. 5. Model ranking for each data set. Three model clusters are identified. On top of the ranking are six models, five of which have separate compartmental diffusivities (all four Tensor models, and Ball-Bingham), and the remaining Tensor-Bingham model; these are marked with H beside the plots. Combinations of Ball with Stick/s, marked with L, rank last. Middleranking models are marked with M. The colours are simply to discriminate rows and the coloured markers, one for each component voxel, correspond to the coloured name to the left. Each voxel has a different marker shape, which is consistent from row to row. The black stars show the mean over the voxels in each row. AIC values are meaningful within each data set, but not across the data sets. Fig. 6. The stability of model ranking with gradient strength. All four gradient-specific data sets reveal clear differences between model classes. Specifically, Ball with Cylinder models are favoured by higher gradients/b-values, whereas Ball with Stick/s combinations are not. CON-genu-60 uses similar gradient strengths as ACH-genu but it has a lower b max~2 , 000 s/mm 2 vs. b max~1 0, 000 s/mm 2 . observation is that the simpler models produce smoother parameter maps that are more consistent with trends in volume fraction that we expect to observe. In particular, the parameter maps from the complex models fitted to the Achieva data set are noisy whereas those from the simpler models are much smoother. The Connectom data appears to support the complex models better and produces more qualitatively acceptable parameter maps; the variation is still higher than for the simple models but it may reflect genuine anatomical differences. These maps emphasise that fitting data and predicting unseen data better are not the only important criteria in selecting an appropriate model for parameter mapping, as simpler models offer greater stability and so may reflect a more correct sensitivity to pathology or underlying anatomical differences.

Discussion
In this work we sampled a wide measurement space of in vivo human data accessible with the Connectom scanner and its 300 mT/m gradients with the aim of determining whether similar compartment models for WM explain these unique data as more accessible data from standard systems with 60 mT/m. Thus, we compare various models using acquisitions that span the range of possible PGSE measurements as widely and densely as possible from both the Connectom and a standard Achieva scanner. We compare models in several ways: AIC and cross-validation to determine their ability to explain measured data and predict unseen data, parameter variance across bootstrap samples to assess stability in the presence of noise, and spatial variation of parameter estimates to assess stability more qualitatively and to check for recovery of expected trends in the corpus callosum.
First, we observe (through Figs. 2 and 3) clear signal and signal anisotropy in Connectom data even with b-values as high as 45,900 s/mm 2 , which shows very strong diffusion restriction. We found (in Fig. 4) that neither data set supports separate T 2 parameters for the intracellular and extracellular compartments, but they do support a separate T 2 for CSF. The estimate of T 2 varies across the corpus callosum, being lower in the genu than in the splenium and the midbody. We see T 2 differences across the two data sets, but the trends are similar: Achieva T 2 values are about 10 ms higher in both the genu and splenium regions. Fig. 7. The AIC of bootstrap data sets (top) and SSE of cross-validation data sets (bottom) from the ACH-genu and CON-genu (see sub-Section 2.6 for more details). In bootstrapping we construct 200 data sets, fifty for each of the four voxels in the ROI. In cross-validation, each HARDI shell is left aside at a time. Clearly, there is greater uncertainty when predicting unseen shells in the cross-validation, especially for the richer CON-genu data set. Generally, in all diagrams the six H-ranked models, shown in blue, are on top, while the red-marked L-models are last. The number of model parameters are shown on the left (excluding three, S 0 , CSF volume, and T 2 of WM).
Though T 2 values are in the order of previous estimates, slight variations can be expected compared with other experiments specifically designed to measure tissue T 2 . The fixed 1,000 ms T 2 of CSF in both datasets may slightly affect WM T 2 . There are differences in the EPI-train and bandwidth in the two acquisitions that may contribute to the differences in observed T 2 , as may the T 2 ⁎ effects. T 1 relaxation can potentially affect the T 2 estimate, but the linear fit in Fig. 4 suggests that this effect is negligible (which is to be expected from a relatively high white matter T 1~8 00 ms, and a higher density of shells at lower TEs). However, we believe the dominant effect is partial volume contamination with CSF, particularly in the midbody, which differ in the two acquisitions.
We also found (from Figs. 5 and 7) that a broadly consistent group of three-compartment models emerge as best at explaining both the Achieva and Connectom data from the genu. Specifically, Tensor-Bingham or Tensor-Cylinder combinations perform strongly, using both AIC and cross-validation, with Tensor-Stick-Stick and Ball-Bingham also ranking consistently highly. The high performing models give a qualitatively reasonable fit across the range of b-values with no obvious sign of overfitting (Fig. 8). Moreover, we see (in Fig. 6) that higher gradients data favour more the complex fibre configurations when compared with the relatively simple Stick combinations.
The highest performing models, in particular Tensor-Bingham, benefit only slightly from having separate intracellular and extracellular diffusivity parameters. The separation appears more beneficial for models with Stick and Cylinder intracellular compartment, which may suggest that the performance enhancement simply comes from the additional parameter enabling these models to capture signal variation arising from orientation dispersion. Both midbody data sets have more partial volume artefacts, and provide a less clear comparison, although the results broadly support those from the genu and splenium.
The analysis of parameter stability (Tables 2 and 3, and Fig. 9) favours simpler models. First, the extra parameter introduced by separating intracellular and extracellular diffusivities makes the resulting volume fraction parameter estimates less in line with what we expect from histology, especially in Tensor models. Furthermore, separating the diffusivity parameters often increases the variance of diffusivity parameter estimates over bootstrap samples. Parameter maps over the corpus callosum appear noisier for the complex models emerging from the AIC and cross-validation rankings than simpler models underpinning standard techniques, such as NODDI and ActiveAx/MMWMD, although the maps from more complex models appear less noisy in the 300 mT/m Connectom data than the 60 mT/m Achieva data.
Our results suggest two broad conclusions about the choice of models for microstructure imaging. First, the choice of models to underpin microstructure imaging applications goes beyond the ability of the models to explain and predict data, as assessed by AIC and crossvalidation. While these quantities are informative to evaluate, and fitting and predicting data are clearly desirable qualities, a useful microstructure imaging technique must also provide realistic models, with Fig. 8. Plots illustrating the quality of fit for three selected models fitted to ACH-genu and CON-genu. We select one voxel from the genu ROI, whose raw signal is shown with markers and the model signal shown as solid line. Though the models are fitted to all the data, for clarity, plots show only a few selected high and low b-value shells, of |G| = 60 mT/m for ACH-genu, and 300 mT/m for CON-genu, leaving out b = 0 measurements. The shells have the same Δ (30, 50, 70, 90 ms for ACH-genu, and 22, 40, 80, 100, 120 ms for CON-genu), increasing in b-value from top to bottom. While the Ball-Stick model clearly fits the signal worse, differences between the two Bingham models are more subtle. However, the Tensor model has extra flexibility to capture signals across the full range, e.g. matching the higher b-value shells in CON-genu better than the Ball-Bingham model. stable and repeatable parameter estimates that reflect the underlying tissue architecture. The stability requirement favours simple models with complexity similar to those in current microstructure imaging techniques, such as NODDI. Tailored experiment design (Alexander, 2008) and more reliable location of globally optimal parameter estimates (Daducci et al., 2015) may help stabilise parameter estimates from more complex models to some extent. However, it seems unlikely that PGSE acquisition protocols on current widely available scanners, such as the Achieva, will support much more complex models than those in currently popular techniques. The second broad conclusion is that no single model clearly outperforms all the others. While most microstructure imaging techniques adopt a single model for all voxels, future techniques may benefit from retaining a range of models and making a selection at each voxel or by averaging predictions from multiple models in the spirit of Burnham and Anderson (2002) and explored to some extent by Stamm et al. (2014).
This study has a number of limitations and areas for further work. We study only a subset of the wide set of possible three compartment models, with the aim of investigating consistency at different gradient strengths rather than identifying the best from all possible models. Table 2 Parameter estimates from model fitting to the full data set of ACH-genu (top) and CON-genu. The CSF volume fraction and T 2 are the same across all models. κ 1 gives the degree of fibre dispersion and κ 2 the extent of fanning. Angle 1 refers to the angle, in degrees, between the first fibre compartment and the direction perpendicular to the mid-sagittal slice; similarly, Angle 2 corresponds to the second fibre compartment (the 2nd Stick or Zeppelin). In each data set, the SSE scores are projected onto a 0 (best fit) to 100 (worst fit) scale.
Preliminary work suggests little benefit from obvious extensions of the set of models we present results from here, such as Watson or Bingham distributions of cylinders (Zhang et al., 2011b), GDR-cylinders (Assaf et al., 2008;Panagiotaki et al., 2012), or mixtures of Watson and Bingham distributions of sticks (Kaden et al., 2007;Sotiropoulos et al., 2012). While the results suggest that certain models provide less reliable estimates than others, we cannot conclude that the signal is not sensitive to the effects modelled by any compartment in particular. Because the data we acquire are not optimised in the estimation of any specific parameter we can only draw general conclusions about model compartments.
Though we here report AIC, as suggested in Burnham and Anderson (2002), we also checked the BIC results for comparison, which agree strongly with the AIC. But we expect that AIC and BIC differences would become significant for smaller and sparser data sets. It could well be that none of these models are self-sufficient enough in describing the datato quote the statistician G.E.P. Box, "all models are wrong, but Table 3 The stability of parameters across 200 data sets (fifty bootstrap sets for each of the four ROI voxels). In addition to the mean estimate across all fits to the bootstrap data sets, we also report beside it, in blue superscript, the standard deviation as a percentage of the mean. More complex models generally have higher parameter variance. some are useful"but they nonetheless provide us with useful and interesting statistics, which no doubt will inform even better models in the future. In this study we start with a simple and established family. However, other better models may exist, and our data will be available online for others to compare other variants (for more information go to the 'Tutorials/Compartment Models' section on the Camino website http://cmic.cs.ucl.ac.uk/camino/). One possibility is to enrich each model with effects such as compartmental exchange/permeability or time-dependent diffusivity (Novikov et al., 2014). Moreover, such models may well provide benefits in other regions of the brain with more complex fibre architecture; possibly by also exploiting measurements from other pulse sequences, such as Stimulated-Echo Acquisition Mode (STEAM) (Merboldt et al., 1969;Tanner, 1970), Oscillating-Gradient Spin-Echo (OGSE) (Callaghan and Stepišnik, 1995), or Double Pulsed-Field Gradient Sequence (dPFG) sequences (Mitra, 1995;Cheng and Cory, 1999;Komlosh et al., 2007).
Artefacts in the experimental acquisition protocol can potentially affect the results, so we focus only on the mid-sagittal corpus callosum and, furthermore, mostly on just the genu, since our highly experimental acquisition produced the most usable data in that region. We cannot rule out that hidden artefacts affect the results in this area too, although we believe their influence to be minor. Future work with optimised pulse sequences will help eliminate such affects, and also enable a comparison of results to determine consistency across multiple subjects. Other minor effects may also influence results. For example, we do not correct for alterations to the B-matrix (Leemans and Jones, 2009) from movement, although those corrections are minor and are unlikely to have a major effect. Similarly, we account for non-central chi noise bias in only a simple way via the offset-Gaussian fitting objective function and more sophisticated techniques, e.g. from Veraart et al. (2013), might be beneficial when extrapolating from the results in this paper to more complete microstructure imaging techniques. Although the other areas of the corpus callosum appear to give similar results, more peripheral WM with greater orientation dispersion and prevalent crossings are likely to yield different conclusions. For example, Table 2 shows that in fact the Bingham distributions from the genu are quite isotropic so a Watson distribution would probably be sufficient there; however, in regions of fanning fibres the anisotropy of the Bingham distribution is likely to be more important (Tariq et al., 2014). Other pulse sequences may also provide access to other parameters that the models we use do not consider. For example, STEAM and Pulsed-Gradient Stimulated-Echo (PGSTE) may inform on WM exchange rates (as in Nedjati-Gilani et al. (2014) and Nilsson et al. (2009), respectively) and dPFG may inform on fibre curvature (Jespersen and Buhl, 2011).
One final point to highlight is that we include T 2 parameters in models for the diffusion MRI signal for the first time in an extensive model selection experiment. The T 2 parameters are essential to explain the signal well and careful estimation of their values is important, as even quite small errors can have a significant influence on the estimation of the other parameters. Additional experiments in Ferizi (2014) using the same data sets we report here study these effects in more detail.

Model for A comp
The Tensor model has A comp = exp(− bq t Dq), where D ¼ d ∥ nn T þ d ⊥ 1 n ⊥ 1 n T ⊥ 1 þ d ⊥ 2 n ⊥ 2 n T ⊥ 2 is the diagonalised Tensor matrix with eigenvectors {n , n ⊥ 1 , n ⊥ 2 } and their respective eigenvalues {d ∥ ; d ⊥ 1 ; d ⊥ 2 }, and q as the normalised wave vector. As special cases of the above Tensor, the isotropic Ball has d ∥ ¼ d ⊥ 1 ¼ d ⊥ 2 , whereas the anisotropic (one-  . Intracellular volume fraction maps of four representative models fitted to the whole corpus callosum Achieva data set (left) and Connectom data set (right). NODDI and MMWMD are simplified constructions of Tensor-Bingham-CSF-diff and Tensor-Cylinder-CSF-diff, respectively. dimensional) Stick, which is used to model the single Stick, two Sticks or Bingham-distributed sticks, has d ⊥1 ¼ 0 ¼ d ⊥2 .
The Cylinder model uses the Gaussian phase distribution approximation (Murday and Cotts, 1968;Stepišnik, 1993;Vangelderen et al., 1994;Alexander, 2008). This restricted signal is a product of the parallel A ∥ and radial A ⊥ , both being functions of applied diffusion gradient of magnitude G, with radial component G ⊥ and axial G ∥ relative to the cylinder axis. The axial signal is: whereas the radial signal component is: where R is the cylinder radius, d ∥ and d ⊥ are the apparent axial and radial diffusion coefficients, and α m R is the m-th zero of the derivative of the Bessel function of the first kind, order one.