Machine learning based compartment models with permeability for white matter microstructure imaging

Abstract Some microstructure parameters, such as permeability, remain elusive because mathematical models that express their relationship to the MR signal accurately are intractable. Here, we propose to use computational models learned from simulations to estimate these parameters. We demonstrate the approach in an example which estimates water residence time in brain white matter. The residence time &tgr;i of water inside axons is a potentially important biomarker for white matter pathologies of the human central nervous system, as myelin damage is hypothesised to affect axonal permeability, and thus &tgr;i. We construct a computational model using Monte Carlo simulations and machine learning (specifically here a random forest regressor) in order to learn a mapping between features derived from diffusion weighted MR signals and ground truth microstructure parameters, including &tgr;i. We test our numerical model using simulated and in vivo human brain data. Simulation results show that estimated parameters have strong correlations with the ground truth parameters (Symbol) for volume fraction, residence time, axon radius and diffusivity respectively), and provide a marked improvement over the most widely used Kärger model (Symbol). The trained model also estimates sensible microstructure parameters from in vivo human brain data acquired from healthy controls, matching values found in literature, and provides better reproducibility than the Kärger model on both the voxel and ROI level. Finally, we acquire data from two Multiple Sclerosis (MS) patients and compare to the values in healthy subjects. We find that in the splenium of corpus callosum (CC‐S) the estimate of the residence time is 0.57±0.05 s for the healthy subjects, while in the MS patient with a lesion in CC‐S it is 0.33±0.12 s in the normal appearing white matter (NAWM) and 0.19±0.11 s in the lesion. In the corticospinal tracts (CST) the estimate of the residence time is 0.52±0.09 s for the healthy subjects, while in the MS patient with a lesion in CST it is 0.56±0.05 s in the NAWM and 0.13±0.09 s in the lesion. These results agree with our expectations that the residence time in lesions would be lower than in NAWM because the loss of myelin should increase permeability. Overall, we find parameter estimates in the two MS patients consistent with expectations from the pathology of MS lesions demonstrating the clinical potential of this new technique. Symbol. No caption available. Symbol. No caption available. HighlightsSome tissue parameters remain elusive because mathematical models are intractable.We propose to use machine learning to estimate these parameters, here permeability.Simulation results show an excellent agreement between estimations and ground truth.New technique performs better than the standard Karger Model.In‐vivo results consistent with pathology of MS lesions showing clinical potential.


Introduction
Techniques such as AxCaliber (Assaf et al., 2008) and ActiveAx  use computational models to provide estimates of tissue microstructure properties, such as cell size or packing density, from diffusion-weighted (DW) MR data. These models use simple geometries such as cylinders and spheres to represent axons and other cells, so that closed form mathematical expressions can be derived that closely approximate the expected MR signal. However, some tissue properties such as permeability or fibre undulation, have previously remained elusive because mathematical models that express their relationship to the MR signal accurately are intractable. Numerical simulations (Hall and Alexander, 2009) provide flexible forward models that can capture such effects, however they are computationally expensive to use in solving the corresponding inverse problem to estimate parameters from data. Here we propose to use a machine learning approach to learn the mapping between the underlying model parameters and the diffusion signal using numerical simulations, and then use that mapping to estimate the parameters. We demonstrate this idea on the residence time τ i of water molecules inside the white matter axonal fibres.
There is a widespread interest in developing imaging biomarkers based on the permeability associated with the intra-axonal water residence time τ i . Permeability is an important property of axonal membranes, and its changes have been associated with the condition of the insulating myelin sheet  and with various neurological conditions, such as Multiple Sclerosis (MS) or Parkinson's disease (Volles et al., 2001). DW MRI is potentially able to estimate τ i as it is sensitive to the motion of water molecules within tissue. However, progress has been limited by lack of sufficiently accurate mathematical models relating τ i to the MR signal.
The analytical expressions used to calculate signals of white matter tissue assume that the water molecules are completely restricted and thus ignore or use simplistic models of permeability (Van Gelderen et al., 1994;Callaghan, 1997;Codd and Callaghan, 1999). A new approach by Grebenkov et al. (2014), assuming high gradients and narrow pulses, may help to address these issues but current results are only applicable to compartment sizes substantially larger than human cells.
Other mathematical models, such as the Kärger model and apparent exchange rate (AXR) imaging, explicitly incorporate τ i by considering the effect of exchange among water pools. Of the two approaches, the Kärger model (Kärger et al., 1988) is most commonly used (Stanisz et al., 1997;Lätt et al., 2009;Nilsson et al., 2010) as it is compatible with data acquired using widely available pulsed gradient spin echo (PGSE) and stimulated echo (STE) imaging sequences. It accounts for inter-compartmental water exchange by coupling the DW MR signals due to the separate compartments via τ i . However, it relies on the assumption that the two individual pools of water are well mixed and it does not model restriction which in white matter tissue is prominent due to the presence of axonal membranes which restrict the motion of water molecules. The Kärger model may still provide a reasonable description of the diffusion-weighted signal in the long time limit, however the cell membranes have to be close to impermeable, and in the case of fast exchange the model fails (Fieremans et al., 2010). Even though these limitations have been known for over 20 years, there have been no improvements to the model due to the mathematical intractability of the problem. AXR imaging (Lasič et al., 2011; has recently been introduced as an alternative to estimating τ i via Kärger model; however it requires specialised double diffusion encoding imaging sequences (DDE) (Shemesh et al., 2016) and it again relies on strong assumptions about the compartmentation of water into a 'fast' and 'slow' pool. The estimated AXR parameter also conflates τ i with intra-axonal volume fraction f, making it difficult to disentangle the origin of any measured change. Furthermore, AXR is based on Gaussian compartments, with time-independent Gaussian diffusion in each compartment, and allows for only two compartments otherwise further conflation occurs (Lasič et al., 2016).
Given the inherent difficulties involved in deriving accurate analytical models of exchange with permeable axonal membranes, there have been alternative approaches which bypass mathematical models altogether and use simulations to learn how permeability affects measured DW MRI signals. Nilsson et al. (2010) generate libraries of microstructure parameters and their corresponding DW MR signals from Monte Carlo (MC) simulations, and use them to find the nearestneighbour microstructure parameters that matched unseen data, i.e. combinations of tissue parameter values not explored in the training set. However, nearest-neighbour matching typically has poor generalisation. Furthermore, using the raw signals to perform the matching is also potentially inefficient, as it requires new libraries to be generated for each acquisition protocol used and each library entry is specific to a particular fibre orientation.
Our machine learning approach constructs a mapping between microstructural parameters of interest and orientationally invariant features derived from the DW MR data. We use MC simulations to generate synthetic signals from a library of histologically relevant microstructure indices. A random forest regressor then learns the mapping between features derived from DW MR signals and ground truth microstructure parameters using the synthetic data, providing an efficient and accurate method for estimating microstructure parameters, including τ i , that generalises smoothly between training examples. We compare our approach to the Kärger model using simulated data and demonstrate that the trained random forest can be used to estimate sensible and consistent estimates of microstructure indices from in vivo healthy human brain white matter. Finally, we demonstrate the clinical relevance of this approach using data acquired from two patients with MS.

Imaging protocol
We use an orientationally invariant, DW-PGSTE protocol with "echoplanar imaging" (EPI) sequence, optimised (Alexander, 2008) for a two-compartment model with exchange, assuming a maximum imaging time of approximately 30 minutes. PGSTE sequences are particularly well suited for estimating τ i , as they allow us to probe long diffusion times by exploiting the longer T 1 compared to T 2 relaxation rates. Here we use an orientationally invariant protocol, which does not depend on the orientation of the fibres and is optimised assuming that the fibre orientation is unknown. The protocol is optimised for the following biophysically plausible tissue parameters, assuming that axons are modelled as infinitely long, randomly packed parallel cylinders: intra-axonal volume fraction f=0.7, intrinsic diffusivity d = 2 × 10 m s −9 2 −1 , apparent diffusivity in the perpendicular direction in the extracellular space d = 0.7 × 10 m s ⊥ −9 2 −1 , axon radius R=1 μm, intracellular residence times τ ∈ {0.05, 0.1, 0.2, 0.4, 1} s i , T 1 =0.832 s (Stanisz et al., 2005). Exchange is incorporated in the optimisation via the Kärger formalism. This choice of model and parameter values in the protocol optimisation are not critical, here we design it to find a combination of measurements broadly sensitive to the white matter parameters of interest: f, d ⊥ , d, R and τ i . For example, axon radius of 1 μm is relatively large for human brain (Aboitiz et al., 1992), however since the clinical scanners are not sensitive to small diameters , choosing a smaller R would not make any difference in the optimal sequence. The resulting protocol contains 108 measurements divided into 4 equal shells, with Δ ranging from 0.102 to 0.412 s. The final protocol, with nominal b values, is shown in Table 1. The additional crusher and slice select gradients used in the PGSTE sequence result in altered diffusion weighting, which we account for in the data generation (Alexander and Dyrby, 2013;Lundell et al., 2014).

Monte Carlo simulations
The first step in constructing our computational model is to use Monte Carlo simulations (Hall and Alexander, 2009) in combination with the imaging protocol in Table 1 to generate synthetic diffusion MR signals from digital phantoms with a wide range of plausible ground truth microstructure indices.
Each phantom is characterised as a unique combination of five parameters: the mean μ R and standard deviation σ R of the axon radius distribution, the intrinsic diffusivity of the spins d, the intracellular volume fraction f and the intracellular water residence time τ i . Here, intracellular residence time τ i is defined as the average time a water molecule spends inside the intracellular space represented by the white matter axonal fibres. This is connected to permeability through the expression: k = R τ 2 * i , where R is the axon radius and k is the permeability. This, and the more general form of this expression are described in Jespersen et al. (2005) The membrane permeability k is specified via a probability p of the water molecule stepping through a membrane when it encounters it. The molecule will be reflected back with a probability of 1−p. A value of p=0 corresponds to impermeable membranes while a value of p=1 corresponds to fully permeable membranes. The relationship between the probability p and the membrane permeability k is given by the expression: p k = * 6 dt d , where dt is the simulation time increment and d is the intrinsic diffusivity.
To mimic the structure of in vivo human brain data, we model white matter as a collection of non-abutting parallel cylinders with radii drawn from a gamma distribution. We construct 12,500 unique white matter substrates, with substrate parameters randomly selected in the ranges: (to ensure that the distributions have a non-zero mode, matching the distributions observed in histology (Aboitiz et al., 1992)), f ∈ [0.4, 0.7], τ ∈ [0.02, 0.95] s i , d ∈ [0.8, 2.2] × 10 m s −9 2 −1 . The cylinders are randomly packed in the substrates as described in Hall and Alexander (2009), with example substrates shown in Fig. 1.
All simulations are performed using 100,000 spins and 2000 time steps. We chose these values as they provide precision of 10 −10 of the unweighted signal, which is several orders of magnitude smaller than realistic signal noise (Hall and Alexander, 2009). The diffusion step size calculated as in Hall and Alexander (2009) is between 0.5 μm and 0.9 μm, depending on the diffusivity of each substrate. We also choose 100,000 non-abutting parallel cylinders in a substrate to avoid variation due to sampling of the gamma distribution (Hall et al., 2014).
We generate two sets of signals for each geometry: noise-free and noisy. As spins undergo T 1 relaxation during the mixing time TM between the second and third 90 degree RF pulses, and thus also between the two diffusion gradients, measurements made using longer Δ (and so longer TM) experience more relaxation leading to lower signal intensities and signal to noise ratios (SNR). For the noisy data set we scale the signals by TM T exp(− / ) 1 using T = 0.832 s 1 for white matter at 3 T (from Stanisz et al., 2005). We then add Rician noise, choosing the standard deviation of the noise σ so that the SNR of the b=0 images with Δ=0.102 s is 20 which reflects our in-vivo data.

Data acquisition
We scan two healthy subjects and two MS patients. The first healthy volunteer is a 32 year old male, and the second healthy volunteer is a 30 year old female. Both healthy volunteers are scanned twice, with the rescan taking place within a month of the initial scan. The first MS patient is a 22 year old female with relapsing remitting multiple sclerosis (RRMS), a disease duration of 2 years and expanded disability status scale (EDSS) of 2. The second MS patient is a 63 year old male with secondary progressive multiple sclerosis (SPMS), a disease duration of 25 years and EDSS of 6. Both MS patients were scanned once. None of the MS patients recruited into the study experienced a relapse or a course of corticosteroids three months prior to imaging. We obtained written informed consent for all participants, and the study was approved by our local research ethics committee.
All data are acquired using the imaging protocol in Table 1 on a 3 T Philips Achieva scanner, using a 32-channel receive-only RF coil, SENSE acceleration factor of 2.5, full k-space acquisition, and receiver bandwidth of 1948 Hz/pixel. Additional imaging parameters are as  follows: TE=0.068 s, TR=12 s, FOV=256 mm×256 mm, matrix size=128×128, slice thickness=4 mm, number of slices=40 (the first scan of the first volunteer has 30 slices). The total imaging time is approximately 40 minutes. For the MS patients, in addition to the diffusion-weighted data we also acquire a high resolution (1 mm×1 mm×3 mm) T 2 -weighted scan for the purposes of creating a lesion mask. Following the data acquisition, some post-processing steps are performed. We perform eddy current and motion correction using the eddy tool in FSL (Smith et al., 2004). As the model that we learn here for white matter is specific to axons that resemble parallel cylinders, it is not applicable in regions containing CSF, grey matter or highly dispersed or crossing white matter fibres. We mask out these voxels by computing maps of linearity C =

Random forest regression
Random forest regression is a machine learning technique which here aims to learn the mapping between the microstructure parameters and the features of the DW MRI signal. The regressor is a collection of decision trees trained on separate randomly chosen samples from the available training data (Criminisi et al., 2013). Each tree estimates a set of parameters of interest from an input feature vector, and the final estimation is obtained by averaging the estimations from all trees in the forest (Criminisi et al., 2013). In order to do this, an initial training step is required in which every tree in the forest learns a mapping from a vector of signal features to a set of ground-truth microstructure parameters. The trained forest can then be used to estimate the parameters given a previously unseen feature vector. Here, the MC simulations provide training data with known microstructure parameters (specified in the simulation); the simulation outputs DW-MR signals from which we derive the features that form input to the mapping.
Each tree in the forest is trained as follows: at the parent node of the tree, we perform an initial linear regression to find a relationship between the training features and parameters. We then search the feature space for a partition such that having separate regressions on either side of the partition improves the estimation. If such a partition exists the node is split, resulting in a pair of child nodes. We then continue this procedure for the child nodes, and all subsequent nodes, until there is no more improvement in the estimation. Randomness is typically introduced into the forest in two key ways. The first method, Fig. 2. A schematic overview of how testing, training and estimation is done with random forest regression. In the appendix we go into further details of the random forest regressor itself.

Feature extraction
The feature vector used in this study comprises rotationally invariant features derived from diffusion tensor and 4th order spherical harmonic (SH) fits to each shell of data separately. The features that we calculate from the DT are the eigenvectors λ 1 , λ 2 , λ 3 , the mean diffusivity MD and the fractional anisotropy FA for each voxel. From the spherical harmonic fits, we calculate the mean and peak ADC, peak dispersion (i.e. the eigenvalues of the hessian matrix at the peak), anisotropy, skewness and kurtosis of the apparent diffusion coefficient profile, as well as simple combinations of the SH coefficients given by and the kurtosis is and S is the unit sphere. This gives 15 features from each shell for a total of 60. When fitting the DT and SH models, we compensate for the additional STE gradients as described in Lundell et al. (2014). Although some of these features are not mutually independent, we include all of them in order to have a comprehensive set to ensure we can find the most informative split criteria.

Training, testing and estimation
In Fig. 2 we show a schematic overview of how training, testing and estimation are done using the random forest regression. We use a widely used and freely available random forest regressor in the sci-kit learn python toolkit (Pedregosa et al., 2011) with 100 trees and a maximum depth of 20. Accuracy generally increases with the number of trees and tree depths. Preliminary experiments found that we obtain diminishing returns in accuracy above these values, but computational complexity increases. We use bagging to inject randomness into the forest. In the Appendix we provide an overview of the random forest regression method we use. More in-depth details of the implementation are available on the scikit-learn website (http://scikit-learn.org/). We train separate regressors for the noisy and noise-free data sets resulting in two distinct forests. The forests are trained on 10,000 of the 12,500 feature vectors, with the remaining 2500 previously unseen feature vectors used for testing. This is repeated for 100 randomly generated test and training sets to investigate bias due to the choice of training data. The testing phase allows us to evaluate how well the regressors perform by directly comparing the estimations to the known ground truth values. When estimating from the noise-free test feature vectors we use the random forest trained on noise-free data, whereas we use the random forest trained on the noisy data to estimate from the noisy test feature vectors.
Following the training and testing stages, we use the noisy random forest regressor on the in vivo human data sets. We use the noisy rather than noise-free random forest regressor as it has similar noise characteristics to the actual MR data.
The parameters that we estimate during the random forest regression describe the underlying microstructure properties of the tissue. We combine μ R and σ R , which describe size distribution of cells, into a single volume-weighted radius index α (Alexander et al., 2010), so that the final set of parameters we estimate are f, τ i , α and d.

Kärger model fitting
The most commonly used mathematical model to estimate τ i from diffusion-weighted PGSE or PGSTE data is the Kärger model (Stanisz et al., 1997;Lätt et al., 2009;Nilsson et al., 2010). Therefore, in order to compare our approach to the current state of the art method, we fit a two-compartment Kärger model to the 2,500 noise-free and noisy test data sets and the masked white matter voxels from the in vivo human data sets.
The intracellular compartment, with volume fraction f, is modelled using randomly packed, parallel cylinders characterised by a single radius index α and with an intra-axonal water residence time τ i . The extracellular space is modelled as a cylindrically symmetric DT with diffusivities, d and d ⊥ . d is assumed to be the same in both compartments, and d ⊥ is estimated from d and f using the tortuosity model in Szafer et al. (1995).
Prior to model fitting, each measurement is normalised by the mean b=0 measurement with the same TM to eliminate T 1 effects. For all data sets the model is fit using Markov Chain Monte Carlo (MCMC) (Gilks et al., 1994) with an offset Gaussian noise model (Panagiotaki et al., 2012) (assuming different noise standard deviations σ for each shell of data, which we estimate a priori) to sample from the posterior distribution over the model parameters. The burn-in phase for the MCMC consists of 10,000 steps, after which we collect 1000 samples at an interval of 100 steps.
2.6. Data analysis 2.6.1. Simulated data To assess how well the random forest estimates and Kärger model estimates match the known ground truth values in the test set, we use Bland-Altman plots and calculate for each parameter. Bland-Altman plots show the means of the ground truth and estimated values plotted against their differences, allowing us to assess the bias in our estimates. Points on the Bland-Altman plots are colour-coded with colour bars showing the percentage error. Due to better visibility we limit the colour bar to −50 to 50 %, with points outside of this range taking the same colour as the two maximum range points. The supplementary material includes additional visualisation of the agreement using scatter plots.
To determine whether the choice of training set used for the random forest regression introduces bias, we calculate the mean and standard deviation of the correlation coefficients for each parameter over all 100 training/test sets.

Healthy subjects
We use the scan-rescan data from the two healthy volunteer subjects to investigate intra-subject reproducibility on both the region of interest (ROI) and voxel level.
For the ROI analysis, we manually define regions in the splenium (CC-S) and genu (CC-G) of the corpus callosum, the left (ALIC-L) and right (ALIC-R) anterior limbs of the internal capsule and the left (CST-L) and right (CST-R) corticospinal tracts on the scan and rescan FA maps for both volunteers. We calculate the mean parameter estimates from the random forest and Kärger model in each ROI for both the scan and rescan data and plot scatter plots of scan versus rescan estimates and Bland-Altman plots for f, τ i , α and d individually. Bland-Altman plots show the mean of the scan and rescan parameter estimates against their difference; for highly reproducible parameters the points on a Bland-Altman plot should be closely clustered about the zero difference line. We also investigate the regional variability of each model parameter by calculating the coefficient of variation (CoV), defined as the ratio of the standard deviation to the mean, i.e. s μ and represents the variability of a parameter in relation to its mean value over the population. To do this, for each parameter, we pool together the scan and rescan parameter estimates for both volunteers in each region.
We also investigate intra-subject reproducibility on the voxel level. For each subject, we warp the parameter maps estimated from the rescan data into the space of the scan data using the FLIRT tool in FSL (Jenkinson et al., 2002) in order to obtain the correspondence between voxels. To assess the similarities between the scan and rescan, we again use Bland-Altman plots and calculate correlation coefficients for each parameter. The supplementary material includes additional visualisation of the agreement using scatter plots.

MS subjects
To investigate the sensitivity of the random forest and the Kärger model to tissue damage in MS lesions, an expert clinical researcher (NC) marked the lesions on the high resolution T 2 -weighted images. We also mark additional ROIs in the contralateral normal appearing white matter (NAWM) for comparison. The ROI and lesion masks are then registered to the mean b=0 image from shell 1 of the diffusionweighted data using FLIRT (Jenkinson et al., 2002). We use this shell for registration as it has the shortest diffusion time and so provides the best contrast.
In the first subject, in the early stages of MS, two of the marked lesions overlap the white matter mask completely and are investigated further. For the second subject, in the late stage of MS, the lesions are much more widespread, overlapping most of the white matter mask used to select voxels for analysis. However for this subject, our analysis is limited by the availability of contralateral NAWM for comparison and thus we only use one lesion in the CST. We then calculate the mean parameter estimates from both models in the lesions and control ROIs and compare. We then used t-test to find out if there is a statistically significant difference between parameter values in the lesions compared to those in NAWM. Fig. 3 shows Bland-Altman plots of f, α, τ i and d for both the random forest regressor and the Kärger model for noise-free data generated from cylinder substrates. The data points are colour-coded according to how close the estimates are to the actual values and the percentage error is shown on the colour bars. The solid black line indicates the mean difference between ground truth and estimated parameters and the dashed lines show the 95% limits of agreement. For the random forest model, points are clustered about the zero difference line, indicating low bias. However, despite the absence of noise the recovery of the parameters is not perfect. In particular, for f, τ i and α, there is some bias in the estimated values which depends on the ground truth value, for example, large values of τ i are consistently underestimated. However, the noise-free performance provides a benchmark for the best estimation we can achieve given the data available. Correlation coefficients are also strong for all parameters (f: R 2 =0.88, τ i : R 2 =0.95, α: R 2 =0.82, d: R 2 =0.99).

Simulations
Estimates obtained from the Kärger model are generally less accurate and more biased. In particular, τ i is consistently underestimated for all ground truth values, and although low diffusivities are well-recovered, the model underestimates high diffusivities. Correlations between ground truth and estimated values of f (R 2 =0.75) and d (R 2 =0.99) are good, however the model struggles to recover τ i (R 2 =0.60) and α (R 2 =0.11). It consistently underestimates τ i , and provides no correlation between ground truth and estimated α. Fig. 4 shows similar plots, but for simulated data with SNR=20. For the random forest model, the results for all parameters are consistent with those obtained from the noise-free data, although the 95% limits of agreement are slightly wider. Although the mean difference lines are mostly close to zero, again we see that there is some bias in our estimates, which depends on the ground truth values. For f (R 2 =0.70), we see that the larger volume fractions tend to be underestimated slightly whereas low f are slightly overestimated. This is also the case for τ i (R 2 =0.70). Residence times of up to approximately 0.6 s are estimated well, after which the estimates level off. The estimations of axon radius index are weaker than of the other parameters (R 2 =0.48), but we still see a positive correlation, indicating that the method is still able to distinguish small axons from large axons even in the presence of noise. Diffusivities are again very well estimated (R 2 =0.98).
The addition of noise weakens the performance of the Karger model still further. The 95% limits of agreement are much wider than for the random forest model, and again there is more bias in τ i and d. Similar to the random forest model, there is some bias in f (R 2 =0.65), although the Kärger model tends to underestimate low values and slightly overestimate higher values. Once again, the Kärger model struggles to estimate high d, despite having a high correlation coefficient (R 2 =0.96). The Kärger model is again much worse at estimating τ i (R 2 =0.40) and α (R 2 =0.02). As can be seen from both the Bland- Altman plot and the correlation coefficients, estimations of α are now essentially noise. Table 2 shows mean correlation coefficients for f, τ i , α and d averaged over the 100 training/test data sets for both noise-free data and data with SNR=20. The correlations between the ground truth and estimated values are very consistent for all parameters and both noise levels, indicating that while the random forest model needs sufficient coverage of the parameter range, it is not biased by the choice of training data. Fig. 5(a) shows scan and rescan parameter maps estimated using the random forest regressor across representative axial, coronal and sagittal slices for the first healthy volunteer. An initial visual inspection suggests good agreement between the scan and rescan parameters, with positive correlation coefficients (f: R 2 =0.57, τ i : R 2 =0.45, α: R 2 =0.48, d: R 2 =0.65) and similar trends observed across all the major white matter tracts. The values estimated for all parameters are within plausible ranges. Estimates of volume fraction f are in the range 0.44-0.65. The upper bound is slightly lower than expected, but as suggested by the simulation results in Fig. 4 large f tends to be underestimated by the random forest when the data is noisy. However, we still see the expected high-low-high trend in f across the mid-sagittal CC. Estimates of τ i are consistently in the range 0.4-0.5 s across the major tracts. The scan and rescan maps of d are also highly consistent, with estimations for most voxels in the range 1.4-1.8×10 −9 m 2 s −1 . Fig. 5(b) shows the scan and rescan parameter maps estimated using the Kärger model across the same slices. Estimates of volume fraction f are in the range 0.2-0.8. We see the high-low-high trend in f more distinctly in the Kärger model data than in the random forest data although estimated volume fractions in the midbody of the CC (0.25-0.3) are lower than expected, especially as the white matter masks were generated using a linearity threshold of 0.5. The scan and rescan maps of f are generally in good agreement though with R 2 =0.57, which is the same correlation coefficient as that of the random forest estimate. Maps of τ i are noisier and show less consistency between the scan and rescan data with R 2 =0.28, lower than that of the random forest estimate. Moreover, the location of these areas varies from scan to rescan. For example there are several voxels in the splenium of the mid-sagittal CC which have low τ i in the scan data but much higher τ i in the rescan data (shown by the arrows in Fig. 5(b)). Both the scan and rescan maps of α are very noisy, with an extremely low correlation coefficient of R 2 =0.06 and no discernible patterns in the data. Estimates of d however do appear reproducible (R 2 =0.66), and in the range of 0.8-1.4×10 −9 m 2 s −1 .

Healthy subjects
To provide a more quantitative analysis, we warp the parameters maps derived from the rescan data into the space of the scan data using the FLIRT tool in FSL (Jenkinson et al., 2002). Bland-Altman plots of scan versus rescan parameter estimates of f, τ i , α and d across corresponding voxels for both the random forest and Kärger model are displayed in Fig. 5(c). The data points are colour-coded based on the difference between scan and rescan parameter estimates, with colour indicating percentage error. The colour scale is the same for the random forest and Kärger model for each parameter. Neither model shows significant bias for any parameter, and all points are clustered about the zero difference line. For f, τ i and α, the random forest model has better reproducibility as suggested by the narrower 95% limit lines, but for d these limits are roughly the same for both methods. The correlation coefficients for f (random forest: R 2 =0.57, Kärger model: R 2 =0.57), d (random forest: R 2 =0.65, Kärger model: R 2 =0.66), τ i (random forest: R 2 =0.45, Kärger model: R 2 =0.28) and α (random forest: R 2 =0.48, Kärger model: R 2 =0.06) also support these findings. Scatter plots showing these correlations for both subjects are included in the supplementary material.
Figs. 6(a)-(c) show equivalent parameter maps and scatter plots for volunteer 2. As for subject 1, the scan and rescan maps for the random forest model show good visual similarities for all parameters. We see the high-low-high trend in f and the low-high-low trend in α across the mid-sagittal CC and estimates of τ i and d are consistent between scans as well as between the two subjects. For the Kärger model, reprodu-

Table 2
Means and standard deviations of the correlation coefficients between ground truth and estimated values using 100 different training and test data sets for the random forest model, for both SNR = ∞ and SNR=20.

SNR = ∞ SNR=20
Parameter  cibility is again reasonable for f and d but the maps of τ i and α are noisy. Fig. 6(c) shows voxelwise Bland-Altman plots of scan versus rescan parameters. As for the first volunteer, these plots show that there is no bias between scan and rescan estimates for either method, although again the random forest model has better reproducibility for f, τ i and α as indicated by the narrower 95% limit of agreement lines. The correlation coefficients for the random forest and Kärger model are very similar for f (random forest: R 2 =0.73, Kärger model: R 2 =0.73) and d (random forest: R 2 =0.72, Kärger model: R 2 =0.75), but the reproducibility of residence time (random forest: R 2 =0.63, Kärger model: R 2 =0.35) and axon radius metrics (random forest: R 2 =0.56, Kärger model: R 2 =0.02) is much higher for the random forest than the Kärger model. Following on from the voxelwise analysis of scan-rescan reproducibility, we analyse the reproducibility of the indices across regions of interest (ROIs). We manually define ROIs in the splenium (CC-S) and genu (CC-G) of the corpus callosum, the left (ALIC-L) and right (ALIC-R) anterior limbs of the internal capsule and the left (CST-L) and right (CST-R) corticospinal tracts on the scan and rescan data for both volunteers. Fig. 7 shows an example ROI mask overlaid on the FA image of the scan data for volunteer 2. We calculate the mean parameter estimates from the random forest and Kärger model in each ROI for all data sets. The top row of Fig. 7 shows scatter plots of scan versus rescan estimates of f, τ i , α and d. Results from the Kärger model (blue) and the random forest (pink) are shown on the same plots. The random forest shows high reproducibility for all parameters at the ROI level. Estimates of d from the Kärger model show good reproducibility, matching that of the random forest. However f, τ i and in particular α show lower reproducibility. The bottom row of Fig. 7 shows a Bland-Altman plot, which plots the mean of the scan and rescan parameter estimates against their difference. Dashed lines indicate the 95% limits for each method. The results of the Bland-Altman analysis confirm the results of the correlation plots. For d, the data points from both the random forest and Kärger model show similar spread about the zero line, indicating that the methods have equivalent reproducibility. However, for the other three parameters the data points for the random forest are more closely clustered around the zero line than the data points for the Kärger model, indicating that the scan and rescan parameter estimates from the random forest model show better agreement.
Finally, we pool the scan and rescan parameter estimates for both subjects across the six regions of interest and calculate the mean μ, standard deviation σ and coefficient of variation (CoV) of each parameter in each region (as described in the Methods section). Tables 3 and 4 show the results for the random forest and Kärger model respectively. CoVs are lower for all parameters when estimated using the random forest as opposed to the Kärger model. The two methods perform similarly for d; both methods result in similar standard deviations, but the CoVs for the Kärger model appear slightly higher than for the random forest due to the slightly lower mean estimates of d. However, for f, τ i and α, CoVs calculated from random forest parameter estimates are substantially lower than for the Kärger model.

MS patients
In Fig. 8, we present the parameter maps of the first MS subject (early stage MS). First from the left is a b=0 image with the overlaid lesion mask (red areas in squares) and respective normal appearing white matter tissue (NAWM) (green areas in circles). The remaining four columns are parameter maps from the same slice estimated using the random forest model (top row) and Kärger model (bottom row). An initial visual inspection of parameter maps, suggests that neither the random forest nor the Kärger model detects any obvious parameter differences in the lesion in genu (top area in a square), compared to the respective NAWM tissue (top area in a circle). For the lesion in splenium though (bottom area in a square), there are differences compared to the NAWM (bottom area in a circle): τ i estimated using the random forest is reduced relative to the NAWM area and the volume fraction estimated by Kärger model is lower compared to the NAWM. When comparing random forest and Kärger model parameter estimates, random forest provide higher values than Kärger model for all four parameter maps over all white matter voxels.
Equivalent parameter maps for the second subject with late stage MS are shown in Fig. 9. In this subject the lesions are much more widespread, overlapping most of the white matter mask used to select voxels for analysis, including the genu and the corticospinal tracts. An initial visual inspection of the whole white matter mask shows more dramatic changes in the parameter maps for both the random forest and Kärger model compared to the first MS subject in early stage MS. For the random forest model, there is a reduction in f and τ i and an increase in α in the lesion area (red area in a square) compared to the contralateral NAWM (green area in a circle). For the Kärger model, f and τ i are lower in the lesion areas compared to the contralateral NAWM, but there also appears to be increase in d. To provide a more quantitative analysis, we calculate the mean parameter estimates from both models in the lesions we marked in Figs. 7 and 8 (red areas in squares), and their respective contralateral NAWM ROIs (green areas in circles), as shown in Table 5. In the lesion in the genu of subject 1, the diffusivity is higher than that in NAWM. The difference between the other model parameters is not statistically significant. For the lesion in the splenium of subject 1, the random forest estimates a lower τ i of 0.19 s in the lesion compared to 0.33 s in the contralateral NAWM. The random forest also estimates a higher α in the lesion compared to the NAWM. The Kärger model does not show any differences in τ i or α, but it does suggest a reduction in axonal volume fraction in the lesion, with f dropping from 0.35 in the NAWM to 0.26.
For the lesion in the CST of subject 2, the quantitative analysis confirms the initial visual inspection. Using the random forest model, we find that f drops from 0.52 to 0.47 and τ i is reduced from 0.56 s in the NAWM to 0.13 s in the lesion. There is also an increase in α from 3.3 μm to 4.3 μm, as well as an increase in d, although for d the change is the same order of magnitude as the standard deviation across the ROI. Estimates from the Kärger model show a larger decrease in f from 0.41 in NAWM to 0.18 in the lesion. τ i also decreases from 0.71 s in the NAWM to 0.33 s in the lesion, although the standard deviation is much higher when using the Kärger model compared to the random forest. The intrinsic diffusivity increases from 1.50×10 −9 m 2 s −1 to 1.84×10 −9 m 2 s −1 . There is also an increase in the mean value of α in the lesion compared to the NAWM, but the standard deviation is much larger than the apparent change. Lesion parameter values for which there is a statistically significant difference (p < 0.02) to the values in NAWM are marked with stars. Table 3 Mean, standard deviation and coefficient of variation of all parameters estimated using the random forest model in all six regions of interest.  Table 4 Mean, standard deviation and coefficient of variation of all parameters estimated using the Kärger model in all six regions of interest.

Discussion
In this study we have introduced a computational framework to learn a computational model that maps features derived from DW MR signals to microstructure parameters. This approach supports the estimation of microstructure parameters that have previously proven elusive due to lack of accurate mathematical models relating them to the MR signal. The key example we focus on here is the residence time of water between intra and extracellular spaces and we used random forest regression to learn the mapping. Simulations show that the parameter correlations from the new computational model approach are much stronger than those obtained using the Kärger model, particularly for residence time τ i and axon diameter index α. The trained random forest also estimates sensible microstructure parameters from in vivo human data, even for parameters such as α and τ i which only weakly influence the DW MR signals available from human scanners . From scanrescan data of healthy subjects, we show that results from the random forest have better reproducibility than the Kärger model on both the voxel and ROI level. Finally, data acquired from two MS patients demonstrate the clinical potential of estimates of exchange-related parameters obtained this way. In both subjects, we observe reductions in f and τ i and increases in α in lesions compared to NAWM when using the random forest model, which is consistent with expectations from the pathology of MS lesions. In particular we expect the axon diameter index to be higher in lesions than NAWM, because small axons are more vulnerable to MS pathology (DeLuca et al., 2006;De Luca et al., 2004;Huang et al., 2016); axonal loss corresponds to lower f; and myelin damage can be expected to reduce τ i .

Simulations
Our simulation results, which cover a wide range of ground truth parameters, including both situations in which the system is close to well-mixed (short τ i ; long Δ) and not well-mixed (long τ i ), show that the Kärger model performs worse than the random forest approach as it is less able to recover the underlying ground truth parameters accurately. Kärger model consistently underestimates τ i , even short τ i , i.e. less than 0.1 s, where for the longest diffusion times of 0.406 s and 0.412 s in the protocol, the system approximates the well-mixed conditions for which the Kärger model is valid. However, the comparison is not totally fair, as the random forest is trained directly on simulated data, while Kärger model, although used to optimise the protocol for sensitivity to τ i , is not. Nevertheless, the comparison is worthwhile, because, the study still provides a useful insight into the limitations of the Kärger model's assumptions, i.e. that the system is well-mixed, in situations where they are known to be violated, as in white matter. We show that for realistic white matter models Kärger model provides inaccurate estimates, which could be argued are expected as Kärger model is not meant to be valid under those assumptions. Nevertheless, it is worthwhile to do the comparison because Kärger model provides the current standard model of exchange for PGSE and PGSTE data, even when diffusion is restricted, Fig. 9. Mean b=0 image and parameter maps showing f, τ i , α and d estimated using the random forest and estimated using the Kärger model for the second MS subject. Masks for the lesion and NAWM are overlaid on the mean b=0 image. The lesion is in CST and is coloured in red (in a square), and the NAWM is also in CST but contralateral to the lesion and is coloured in green (in a circle).  Nedjati-Gilani et al. NeuroImage 150 (2017) [119][120][121][122][123][124][125][126][127][128][129][130][131][132][133][134][135] and the evaluation of its performance under these more realistic conditions is important.
To investigate the performance of the Kärger model further, we also consider the case in which the Kärger model is fitted to test data generated from the Kärger model, both with and without noise i.e. the situation in which the Kärger model should explain the underlying data exactly (results not shown here). For noise-free data, the Kärger model estimates f (R 2 =0.98), τ i (R 2 =0.76) and d (R 2 =1) well. Estimates of τ i are still not exact as, even at an maximum diffusion time of 0.412 s, the signals generated from the Kärger model for any τ i greater than ≈0.6 s are virtually indistinguishable. Estimates of α are still poor (R 2 =0.21) but as discussed earlier the protocol was not optimised for maximum sensitivity to this parameter. For noisy data with SNR=20, estimations of f (R 2 =0.87) and d (R 2 =0.96) are still strong, but results for τ i (R 2 =0.35) and α (R 2 =0.09) are poor. This implies that estimates of τ i from the Kärger model are extremely sensitive to noise, and thus the model is unlikely to be able to provide meaningful estimates of intracellular water residence time from data obtained under realistic acquisition conditions.

Healthy subjects
The scan-rescan experiments in healthy subjects show that the random forest has good reproducibility compared to the Kärger model. However for all parameters the correlation coefficients are much lower than the very high reproducibility of DT-derived parameters such as FA, which typically has a CoV of 0.8-3.0% (Vollmar et al., 2010) or NODDI parameters (Tariq et al., 2013) which achieve CoV of around 5-6%. Difficulties in registering the scan and rescan volumes due to different numbers of slices in the data sets contribute, but mainly this arises because sensitivity of the signal to τ i and α is weak. However, we also note that the CoVs for τ i estimated using the random forest model with the current protocol compare favourably to values published in a recent reproducibility study of AXR imaging (Lampinen et al., 2014). Lampinen et al estimate the CoV of AXR in the CC, ALIC and CST and report values of 58%, 41% and 45% respectively. Their study differs from ours in that they combine AXR estimates from the left and right ALICs and CSTs and from the genu and splenium of the CC; if we combine our data in a similar manner, the CoVs for τ i would be 8.9%, 22.3% and 15.6%. However, they also have a larger number of subjects and note that at least in the CST approximately half of the variance can be attributed to inter-subject differences.
Random forest estimates for the residence time and the axon diameter are closer to the reported values in literature than those of the Kärger model. There are numerous voxels in which Kärger model estimates τ i to be between 0.01 and 0.06 s. These values are an order of magnitude smaller than the values reported in literature (Finkelstein, 1976;Quirk et al., 2003). Estimates of d using Kärger model are in the range 0.8-1.4×10 −9 m 2 s −1 which is lower than what we would typically expect in the in-vivo white matter tissue  The ROI analysis shows that for f, τ i and α, CoVs calculated from random forest parameter estimates are substantially lower than for the Kärger model, suggesting that the random forest may provide more consistent estimations of microstructure indices across a population. We also note that for the random forest method, the values of f that are estimated fall in a narrower range than those estimated using the Kärger model, which may artificially narrow the 95% limits slightly. This is due to the range of volume fractions present in the training set (f=0.4-0.7) which was chosen to be representative of white matter tissue. Also, the correlation coefficients are much higher for the second volunteer than the first volunteer, which may be due to a better registration between the data sets.

MS subjects
In MS subjects, we observe reductions in f and τ i and increases in α in lesions compared to NAWM when using the random forest model. This is intuitive as the breakdown of myelin in the lesion is likely to make the residence time shorter compared to the NAWM, the damage of the tissue is likely to cause cell loss and the small axons are found to be preferentially damaged (DeLuca et al., 2006;De Luca et al., 2004;Huang et al., 2016). Analysing further and comparing the NAWM in MS subjects to the same ROIs in healthy subjects (presented in Table 3, CC:S, CC:G, CST-R) we observe that the trend continues, i.e. we observe reductions in f and τ i and increases in α in patients' NAWM compared to healthy subjects. It has been shown previously that NAWM in MS subjects is somewhat damaged compared to the normal tissue (Filippi et al., 2012) and our results support this suggestion. These are just preliminary results because the number of both MS and healthy subjects is quite small. However, the trends are plausible and the estimated parameters sensible. We also found that, for one of the lesions in the first MS subject, almost no difference between the parameters in the lesion and the NAWM tissue was found. This could be because the lesion was relatively new compared to the other lesion and the damage has not yet fully developed. However, using the Kärger model for that lesion produces counterintuitive results as we found that the residence time was shorter in the NAWM than in the lesion itself. This is the opposite trend to what we expect, as damage to the myelin sheath is thought to increase permeability and thus reduce the residence time. We did not observe this when using the random forest model, and therefore believe that the difference is more likely due to noisy estimation of τ i using the Kärger model. In future, we plan to translate the technique for estimating residence time τ i we introduced here. We show some promising new results, however a larger study of MS patients is necessary in order to investigate whether the parameters of the random forest model, in particular τ i , could act as suitable biomarkers for detecting and tracking changes in MS pathology.

Limitations and further work
One of the limitations of the work we present here is that the training uses a very simple model of white matter tissue. Specifically, the model uses long straight circular cylinders that mimic axon bundles and does not account for myelin water, curvature of axons, dispersion or crossing fibres. However, the general idea we present here, of computational models based on simulations and machine learning extends naturally to much more complex models. Future work will investigate incorporation of such effects, although these require further development of the simulation system we use to construct training data.
Another limitation of the current study is that the machine learning procedure is tested on the same type of substrates that it was trained on. We note however that although the model of the substrate is the same, the model parameters between the training and the test data are different so the results demonstrate generalization to some extent. Future work will explore a meaningful evaluation using more diverse test data with additional effects (e.g. undulation, dispersion or a histology-based-mash). However, this would require substantial further development of our simulation system and is outside of the scope of this paper. One important advance for the future is to use the random forest regression to obtain measures of uncertainty (as in Tanno et al., 2016) to highlight situations where the signals are unfamiliar so the model produces unreliable results.
The success of random forest regression is dependent on the sensitivity of the diffusion MR signal to the parameters of interest. This can be seen in the simulation results where for τ i greater than the maximum diffusion time, the difference in signals (and thus in the features) due to differences in τ i is within the standard deviation of the noise, and so cannot be differentiated. The estimations of axon radius index are weaker than that of the other parameters (R 2 =0.48), with errors going as high as 100% in some cases as can be seen in Figs. 3 and 4, which is simply due to a low sensitivity of the signal to axon diameter for the gradient strength used . Nevertheless, for both residence time and axon diameter we still see a positive correlation between the estimates and the ground truth, indicating that the method is able to distinguish small axons from large axons and very permeable from impermeable axons even in the presence of noise. The sensitivity of the signal and hence the reproducibility of our parameters can potentially be improved by optimising the acquisition protocol further, e.g. by improving the angular resolution (subject to time constraints) or increasing the maximum diffusion time or gradient strength to provide better contrast (also subject to SNR constraints) or using the random forest model in the protocol optimisation instead of the Kärger model.
The results we obtained in-vivo are inherently difficult to validate. Accurate estimates of τ i are not obtainable via histology, however, an existing study of intra-to extra-axonal water exchange across the whole in vivo rat brain using relaxometry and contrast agents suggests a mean τ i of approximately 0.55 s (Quirk et al., 2003) aligning with our estimates. Furthermore, studies on sphingomyelin membranes (which are found in axonal membranes) estimate permeabilities of ≈1×10 −6 ms −1 (Finkelstein, 1976), which correspond to residence times of 0.3-0.6 s for axons with radii of 0.5-1 μm. The maps of α are slightly noisier, but we observe the characteristic low-high-low trend across the CC (Aboitiz et al., 1992) in both scan and rescan maps. Although it is possible that the random forest distinguishes the small axons in the genu and splenium from the larger ones in the midbody and cortico-spinal tract, other effects not accounted for in the model (dispersion, undulation, etc.) mean that it does not accurately recover actual mean diameters. The scan and rescan maps of d are highly consistent. The estimations for most voxels lie in the range 1.4-1.8×10 −9 m 2 s −1 , which is plausible for white matter tissue.
We used the same intrinsic diffusivity and the same MR parameters such as T 1 and T 2 inside and outside the axons. This assumption may be inaccurate, particularly in pathology, and departures would indeed disrupt the parameter estimates. The method of using machine learning to generate computational models extends naturally to situations where these parameters differ in different water pools. However, accurate estimations of such parameters would demand richer data sets with higher b-values -the computational models do not affect fundamental limitations on identifiability, as discussed for example in Jelescu et al. (2016).
One possible way to improve performance of the Karger model in recovering ground truth parameters in simulation is to constrain its parameters to the range used in the simulation. We note that the random forest model is not constrained in this way either, as the linear models at each leaf node can extrapolate, although in practice the random forest does limit parameter estimates mostly within that range. Even so, constraining the Kärger model to the same range is unlikely to improve performance. For example, in the noise free case, 86% of the Kärger model's estimates of intracellular exchange time and 99% of the Kärger models estimates of axon radius index are within the same range as the random forest training values anyway. The intracellular exchange rate is constrained to be positive in the Kärger model, and of the 14% of estimates that do lie outside of the random forest training set range most are simply slightly closer to zero. For these values, constraining Kärger model further will result in them converging to the lower limit, which will not improve the correlation with the ground truth values.
The random forest models that we use are non-linear although they do use a linear model at the leaf nodes. Linear models at the leaf nodes prove to be a good trade off between the segregation of the training set and the complexity of the models at each leaf node, however the future work may explore alternative leaf node models such as constant quadratic etc. Criminisi et al. (2013).

Conclusions
We propose to use machine learning to construct computational models that relate the MR signal to microstructure parameters. We demonstrate the idea by testing the ability to estimate residence time τ i using clinical scanner settings. This can potentially be important as the axonal membrane permeability, which is associated with the residence time, is hypothesised to be correlated with the condition of the insulating myelin sheaths around the axonal fibre . Numerous white matter pathologies of the human central nervous system, such as MS, spinal cord injury and leukodystrophies, are characterised by damage to the myelin; thus sensitive and specific biomarkers, such as the residence time τ i that we estimate here, could improve the diagnosis of and treatment for these conditions.
The computational modelling approach we use here opens doors to estimating a wide range of other parameters for which mathematical models are intractable: undulation; properties of the extra-cellular space; etc. The approach, as used here, also neatly avoids the need for any analytical model of tortuous extra-cellular diffusion for which only approximations are available (Burcaw et al., 2015). Given that these compartment models tend to overestimate key microstructure parameters such as the axon diameter , including the effects of these intractable parameters not only gives us clinically useful parameters, but may also improve the estimation of other microstructure indices. Although the mapping we learn is specifically for randomly packed, parallel, non-abutting cylinders and a STE imaging sequence, the approach easily extends to other tissue configurations and pulse sequences. In future we plan to incorporate models of fibre dispersion into our MC simulations to extend our technique to dispersed white matter fibre regions as well as grey matter. We also plan to investigate more specialised pulse sequences, such as the AXR sequence  or optimised generalised gradients (Drobnjak et al., 2010(Drobnjak et al., , 2011, which may improve sensitivity as well as allowing us to compare other analytic models, or other kinds of machine learning technique, e.g. deep learning. A random forest regressor is formed of a collection of regression decision trees and its output is a weighted average of the estimate from each individual decision tree (see Fig. A10). A decision tree is formed of internal nodes (represented by grey circles in Fig. A11), terminal nodes (the blue rectangles) and edges which connect all the nodes in a hierarchical fashion. Each internal node has one incoming and two outgoing edges and stores a test (split function) that it applies to the incoming data, which is a multi-dimensional vector that, here, contains rotationally invariant features of the MRI signal corresponding to a substrate. After the test, the data is sent along one of the outgoing edges depending on the result of the test. The terminal nodes store the predictor that relates the incoming data to the final answer, which here is the estimate of the microstructure parameters of interest.  The training phase decides on three things: the internal structure of the tree, i.e. what internal and leaf nodes exist; the test at each internal node; and the regression function at each leaf node. It does that through a greedy splitting process: it starts with all the input training data V at the root node and constructs a linear mapping from V to the corresponding labels. Here V is a set of feature vectors V FV FV FV = { , , …, } n 1 2 each derived from the diffusion MRI signals of n different substrates and the labels are the corresponding ground truth microstructure parameters for each substrate. The training phase then seeks a test that divides the training data into two subsets V 1 and V 2 , each with an independent linear mapping, in such a way as to maximise the information gain over the single mapping at the root node. If no pair of mappings increases the information over the single one, the root node remains a leaf node. Otherwise it splits into two siblings containing the two subsets and the process continues recursively until no information gain can be found. At the end of the training stage, each tree encodes a piece-wise linear mapping between the feature vectors (MRI signal features) and the parameters to be estimated (the four microstructure parameters f τ α d , , , i ). For the testing stage, previously unseen feature vectors are fed into the random forest. Using the mapping learned during the training stage, the forest returns estimated tissue parameters for each of the previously unseen substrate feature vectors. Random forests repeat the same training process for each decision tree but introduce randomness in different ways. Here we use bagging, which trains each decision tree on a different, randomly selected subset of data. Randomisation is introduced into the training process as it has been shown to improve the performance of the forest and its robustness to noise (Criminisi et al., 2013).
A more in-depth discussion together with details of the implementation is available in scikit online documentation in http://scikit-learn.org and papers (Criminisi et al., 2013) and (Pedregosa et al., 2011).