Keywords

1 Introduction

The presence of free water (FW) in a white matter (WM) voxel is known to confound the interpretation of the diffusion tensor indices [10]. However, fitting a multi-compartment diffusion MRI model that includes a FW compartment is an ill-posed problem, as it often requires spatial regularisation or the introduction of prior knowledge for the tissue compartment [12, 14]. These regularisation schemes potentially reduce the sensitivity of the tensor indices to WM alteration (e.g., in pathology). Additionally, the voxel-wise fitting of the FW model estimates a non-zero CSF volume fraction in deep WM, where it is expected to be zero [13].

A T1w image is usually acquired along with the diffusion sequence in a typical study. T1w images generally have high resolution, good SNR and a high contrast between tissues. We propose here to use the CSF PVF estimated from the T1w image as a prior in order to fit a two-compartment model on the diffusion data. We hypothesize that the extra information can improve the stability of the fitting and allow to relax the constraint on the MD.

A similar approach was previously proposed in [5]. However, the crucial issue of mis-registration between the T1w image and the diffusion data was not addressed. In a typical study, susceptibility-induced distortions associated with the echo planar imaging (EPI) and the eddy current induced distortions limit the accuracy of the co-registration. Susceptibility-induced distortions can be adressed using a separately acquired fieldmap that allows to accurately unwarp the EPI images [9]. Alternative methods can be used to estimate such fieldmap when the latter was not acquired in the experiment, but they require the acquisition of images with opposite phase-ecoding directions [2]. However, in many dMRI group studies, no fieldmap or opposite phase encoded images are available. Consequently, using directly co-registered T1w-derived tissue PVFs in a diffusion model may result in significant deviations in the estimation of the tensor FA or MD.

In order to avoid the co-registration issue, we propose to use a data-driven approach to learn a mapping between diffusion features and the tissues’ PVFs estimated from the T1w images. Supervised machine learning methods have already been successfully applied in diffusion MRI. For example, it was shown that the NODDI model can be successfully fitted on single shell data by applying a random forest regression method [1].

In this work, we compare four supervised machine learning approaches in their ability to predict the tissue PVFs. We use a range of diffusion features estimated from models applicable to low b-value single-shell data, that is typically encountered in group studies. We also evaluate the robustness of our approach to the error in registration between the T1w and diffusion images. Finally, we use the CSF PVF predicted by our method as a prior while fitting the FW model and assess how the additional information relaxes the constraints on the diffusion tensor parameters.

2 Materials and Methods

2.1 Data Acquisition and Pre-processing

We performed our experiments on two similar datasets of two separate subjects. Data was acquired on a 3T Siemens Verio scanner. Diffusion data consisted of a standard DWI acquisition with TR = 9600 ms, TE = 76 ms and b = 1000 \(\mathrm{s}\,\mathrm{mm}^{-2}\), with 60 non-collinear diffusion gradient directions in addition to 6 non-weighted b = 0 images. The voxel size was 2 mm isotropic, with a parallel imaging acceleration factor of 2, for a total scan time of 12 min. The structural T1w image was acquired at a resolution of 1 mm isotropic for a scan time of 6 min.

Another set of diffusion data was also acquired, consisting of an Inversion Recovery (IR) sequence where the CSF signal was nulled. The acquisition parameters were: TR = 20.6 s, TI = 2300 ms, b = 1000 \(\mathrm{s}\,\mathrm{mm}^{-2}\), with 60 non-collinear diffusion gradient directions in addition to 6 non-weighted b = 0 images. This acquisition allows retrieving tensor indices that are not contaminated by the CSF contribution to the signal [11].

The diffusion data was preprocessed to correct for movement and eddy currents distortions with EDDY [3]. The diffusion features consisted of 17 parameters resulting from: the DWIs (mean b0, mean b1000), the diffusion tensor (FA, MD, tensor mode and the 3 tensor eigenvalues), Ball&Stick model [4] (volume fractions of the 3 fibres, total fibre volume fraction, isotropic diffusivity and the dispersion of the 3 fibres orientations) and finally probabilistic tractography (visitation maps for full brain tractography). In order to add information from the neighbouring voxels, we added a smoothed version of each feature map (\(\sigma =\) 6 mm) as well as its gradient, for a total of 51 features.

The tissue PVFs were estimated on the T1w image using FAST [17]. These PVFs were then registered to the diffusion data with FLIRT BBR [7].

2.2 Predicting Tissues Volume Fraction

Our method relies on a voxel-wise regression, that learns a mapping between the features derived from the diffusion data and the corresponding tissue PVFs computed on the T1w image. A supervised machine learning (ML) approach requires a training and a testing dataset. In our case, training is performed on one subject and testing on another subject. To train the ML model, we assume that T1 and diffusion images are accurately registered.

2.3 Comparison of the Machine Learning Approaches

We compare the performance of 4 machine learning approaches: random forest (RF), support vector machine regression (SVR), Gaussian process (GP) and neural networks (NN). The methods’ hyper-parameters were optimised using grid search for the RF and SVR, gradient descent for the Gaussian process and an empirical manner for the NN. The optimisation of the 4 ML approaches (RF, SVR, GP and NN) was based on the correlation between the predictions and the T1 segmentation on a validation set, except for the GP, where the parameters were optimised only on the training set. Each of the 4 methods was optimised on the same set of 20000 voxels.

The performance of each ML approach was similarly assessed with the correlation between their PVF predictions and the PVF estimated from the T1, for each tissue (WM, GM and CSF). The correlation was computed on the full brain of the test subject. It ensured that the performance measure includes false positives, e.g. non-zero WM volume fraction in the CSF as well as false negatives, e.g. zero WM volume fraction in the middle of the white matter.

2.4 Mis-registration Between T1 and Diffusion Space

The main motivations to use a ML approach is the mis-registration between T1 and diffusion space. Therefore, we evaluated how the registration error affects the performance of the learning process. We introduced additional registration error between structural and diffusion data on the learning dataset, and computed how the performance of the segmentation on the test dataset was impacted.

2.5 Application to FW Modelling

In this experiment, we fitted the FW model described in [14]. We used the predicted CSF volume fraction as a prior for the FW compartment. The mean of the prior is equal to the volume fraction predicted, with a correction factor to account for the different T1 and T2 relaxation times of the tissues, with \(FWF=\frac{\hat{S}_{fw}}{\hat{S}_{fw}+\hat{S}_{T}}\), where \(S_{X}=\hat{S}_{X}(1-\exp [-\mathrm {TR}/T_{1,X}])\exp [-\mathrm {TE}/T_{2,X}]\). The standard deviation of the tissue PVF prior is set to 0.1, which approximately corresponds to the standard deviation of the error between our model prediction and the T1 segmentation. Compared to [14], we relaxed the prior on the MD, with its standard deviation to 0.5 \(\times 10^{-3}\,\mathrm{mm}^{2}\,\mathrm{s}^{-1}\).

3 Results

3.1 Machine Learning Approaches Comparison

The random forest optimal architecture consisted in 100 fully grown trees with 90% of the features used at each split. The SVR was an \(\epsilon \)-SVR (\(\epsilon =0.01\)) with a Gaussian kernel (\(\gamma =0.1\)) and a regularisation parameter \(C=1\). The best perfomance for the Gaussian process was obtained with a Gaussian kernel with a length-scale of 3.6 and a noise level parameter of 0.1. Finally, the best NN architecture consisted in 3 hidden layers with 200 units each and a rectified linear activation function.

In Fig. 1, we show the performance obtained with the 4 ML approaches for the 3 tissue types. Overall, the RF exhibits the highest correlation between the predictions and the T1 segmentation, for the 3 tissue types (\(r\,=\,0.83\), 0.88 and 0.93 for the CSF, GM and WM respectively). It is closely followed by the NN. The SVR and GP perform less well. In addition to its high accuracy, the random forest method was also the fastest. The training phase was performed in a few minutes, whereas the other methods required a few hours.

Fig. 1.
figure 1

Comparison of the four ML approaches. Correlation between T1w-derived and predicted (from diffusion data) partial volume segmentation using random forest (RF), Gaussian process (GP) and support vector regression (SVR).

Figure 2 shows the segmentation maps predicted by the RF method, as well as the T1w image segmentation. For a given voxel, the tissue with the highest PVF is represented. Visually, the RF segmentation maps are very similar to the T1 segmentation. The RF still gives high WM PVF prediction in the crossing areas (\({>}0.95\)). We note that the RF segmentation is better in the subcortical structures as they contains higher GM PVFs.

Fig. 2.
figure 2

Segmentation maps predicted with the RF method. Tissues’ partial volume fraction obtained from the T1 and RF method in diffusion space.

3.2 Mis-registration Between T1 and Diffusion Space

Figure 3 shows the correlation and MSE obtained on the test dataset as a function of the mis-registration (translation along the x, y and z axes) artificially introduced into the training dataset. To compute the scores, we used the RF regression. The plots show that the correlation is at its maximum when there is no mis-registration, reaching 0.943, with an MSE of 0.017. The correlation gradually decreases (and the MSE increases) as the registration error increases. Interestingly, we see that a mis-registration along the x axis (left-right orientation) has more impact on the performance than in the y and z axes.

Fig. 3.
figure 3

Impact of the mis-registration on the RF performance. Correlation and MSE between predicted and ground truth PVF on the test data as a function of the misregistration between diffusion and structural space on the training data.

Here, the key result is that the method is very robust to misregistration of the training dataset. Indeed, the correlation and the MSE obtained on the test dataset stays almost constant when the training error remains under 2 mm, which corresponds to the voxel size of our diffusion data. The robustness of our approach validates its applicability to datasets where EPI distortions are significant and cannot be corrected.

3.3 Synthetic T1

We performed an additional experiment to predict the absolute T1 intensities at their native resolution (1 mm), using RF and the diffusion features up-sampled to 1mm. In the Fig. 4, the RF predictions are shown together with the T1w original image in the same space. We observe that the predictions look very similar to the original T1w image, and the correlation coefficient between the two images is high (\(r\,=\,0.948\)). The contrast between grey and white matter is excellent on the original T1w image, as expected from this modality. The WM/GM contrast is also very good in the predictions. This is an interesting result, given that the prediction is obtained from a single-shell acquisition.

Fig. 4.
figure 4

Prediction of T1w image intensities. Original T1w and T1w image predicted with the random forest from upsampled diffusion data

The main outcome of this experiment is the quality of the predictions in terms of resolution. Although the diffusion features were derived from data with a resolution of 2 mm, the predictions at 1 mm appear excellent. The predicted T1w is slightly blurrier than the original one, but the actual resolution is very close to the original T1w. In term of the resolution element (RESEL) [15] the original T1w linear RESEL is 17.2 mm and the predicted T1w RESEL is 19.4 mm. The latter is lower than the RESEL obtained with spline interpolation, which is in this case 22.5 mm. Importantly, this result shows that our method takes advantage of the sub-voxel “super-resolution" information contained in the diffusion data.

3.4 Application to FW Modelling

We show here how the prior (derived from the RF predictions) on the CSF compartment affects the fitting of the FW model to the diffusion data. We used the RF predictions, as they are the most accurate. As a reference, we also included the results obtained with the T1-derived CSF PVFs, which are prone to registration error. Finally, we show the results obtained with the same model, without the prior on the CSF volume fraction.

In Table 1, we show the FA and MD obtained in the regions where partial voluming between the CSF and the WM occurs. These regions are defined by a WM PVF \(\ge 0.5\), a CSF PVF \(\ge 0.1\), and a GM PVF \(=0\). We compare our fitting with the inversion recovery acquisition where the CSF signal was nulled. We indicated the correlation as well as the MSE, the latter being more sensitive to biased estimates. The table shows that the best results (in bold) are obtained when we used the CSF PVFs predicted by the random forest as a prior on the FW volume fraction. Importantly, when we used the RF CSF prior, the CSF PVF estimated in the deep WM was constant with a value of 0. When the prior is not used, the model estimates a non-zero FWF (on average 0.1) in crossing fibres areas, which affects the tissue MD and FA.

These results indicate that the fitting benefits from the extra information inferred with the RF approach. Importantly, we showed that the PVFs estimated from the T1 should not be directly used when fitting the model on the diffusion data.

Table 1. Correlation coefficient (r) and RMSE between the FA and MD predicted with the different priors and the inversion recovery ground truth.

4 Discussion

Machine Learning Approaches.

Amongst the 4 machine learning techniques we evaluated, we found that the random forests outperforms other approaches. It is closely followed by the NN method, while the SVR and GP are both under-performing. This result is not unexpected, as the RF approach is known to be very efficient, scalable, and effective for a wide range of supervised learning problems. The neural network approach has been shown to be promising as well, although it requires a more careful tuning and is more computationally demanding than the RF. Additionally, the NN can be naturally extended to use the spatial neighbourhood with convolutional neural networks (CNN). We tried to use a 3D CNN method that was but we unfortunately faced implementation issues, as the optimisation of the weights in the context of regression proved to be difficult.

Diffusion Features.

In this work, we employed diffusion features derived from well known models (diffusion tensor and ball and stick), as well as the tractography visitation map. One of the motivations was to make the method applicable to acquisitions with different parameters (e.g. number and orientation of diffusion gradients, resolution, number of b-values). Another motivation was that the features are rotationally independent, and therefore they do not require the training and testing datasets to be aligned. A model free approach was used in [6]. The signal intensities from each diffusion gradient were directly used as features in order to infer NODDI and DKI parameters. While their approach appears to be promising, it is restricted to the case where the same diffusion protocol is used for an entire study.

Diffusion Data Segmentation.

We successfully applied our ML approaches to partial volume segmentation. The segmentation maps appeared homogeneous, even though no spatial regularisation was used. However, it heavily relies on the accuracy of the T1 segmentation, which was inaccurate, notably in the sub-cortical grey matter structures. FAST, being purely driven by the voxelwise T1w intensities, generally does not give good thalamus segmentation, whereas the dMRI-based PVF prediction more accurately estimated larger GM content in the thalamus.

Application to FW Modelling.

The first motivation of our work was to improve the fitting of a two-compartment model that accounts for CSF contamination. By adding an informative prior on the CSF volume fraction, we were able to prevent the model from fitting a CSF compartment in deep white matter. As a result, the micro-structural properties estimated in the tissue are not biased either by the MD prior or by the presence of a spurious CSF compartment. Secondly, the CSF PVF prior enables us to relax the prior on the MD. We showed that the extra information provided by the RF reduces the bias on the MD estimated by the model in areas where partial volume effects are present. We note that our work could easily be extended to a more complex diffusion model with a CSF compartment, such as NODDI [16], or models with a GM compartment, such as multi-tissue spherical deconvolution [8].

Implications for the Future.

The analysis of diffusion data was until recently exclusively based on an explicit model that often require advanced diffusion acquisition protocols. Our results showed that a machine learning approach can provide extra-information that is useful for solving ill-posed problems. A corollary is that the diffusion data contains redundant information. It was indeed demonstrated that a machine learning approach (neural networks) can reduce the amount of data needed to fit NODDI model by a factor of 12 [6]. In another study, it was shown that using an ML based approach can infer NODDI micro-structural parameters more accurately than a direct fitting on the data [13]. They showed that their method is robust to mesoscopic changes of the white matter (i.e., the geometric padding of the fibres).

5 Conclusion

In this work, we showed the potential of ML approaches to retrieve quantities from the diffusion data that cannot be derived with an explicit model. Our experiments were limited to one test subject, as inversion recovery DWI data is uncommon in larger dataset. Finally, on top of ensuring anatomical accuracy, our method also provides useful information that allows to retrieve unbiased tensor indices throughout the entire WM.