Estimating myelin-water content from anatomical and diffusion images using spatially undersampled myelin-water imaging through machine learning

Myelin is vital for healthy neuronal development, and can therefore provide valuable information regarding neuronal maturation. Anatomical and diffusion weighted images (DWI) possess information related to the myelin content and the current study investigates whether quantitative myelin markers can be extracted from anatomical and DWI using neural networks. Thirteen volunteers (mean age 29y) are included, and for each subject, a residual neural network was trained using spatially undersampled reference myelin-water markers. The network is trained on a voxel-by-voxel basis, resulting in a large amount of training data for each volunteer. The inputs used are the anatomical contrasts (cT1w, cT2w), the standardized T1w/T2w ratio, estimates of the relaxation times (T1, T2) and their ratio (T1/T2), and common DWI metrics (FA, RD, MD, λ1, λ2, λ3). Furthermore, to estimate the added value of the DWI metrics, neural networks were trained using either the combined set (DWI, T1w and T2w) or only the anatomical (T1w and T2w) images. The reconstructed myelin-water maps are in good agreement with the reference myelin-water content in terms of the coefficient of variation (CoV) and the intraclass correlation coefficient (ICC). A 6-fold undersampling using both anatomical and DWI metrics resulted in ICC = .68 and CoV = 5.9%. Moreover, using twice the training data (3-fold undersampling) resulted in an ICC that is comparable to the reproducibility of the myelin-water imaging itself (CoV = 5.5% vs. CoV = 6.7% and ICC = .74 vs ICC = .80). To achieve this, beside the T1w, T2w images, DWI is required. This preliminary study shows the potential of machine learning approaches to extract specific myelin-content from anatomical and diffusion-weighted scans.


Introduction
Myelin is a layered, fatty substance wrapped around the axons and acts as an electrical insulator, allowing for an efficient and faster transport of electrical signals in the central nervous system. Myelination is most active during the first 2 years of life, but is an experiencedependent process that continues well into adulthood. For example, disorders such as Alzheimer's disease ( Kavroulakis et al., 2018 ), multiple sclerosis ( Kolind et al., 2015 ), epilepsy ( Drenthen et al., 2019b ), Parkinson's disease , major depressive disorder ( Sacchet and Gotlib, 2017 ), and schizophrenia ( Lang et al., 2014 ). Therefore, in vivo myelin-specific markers could provide helpful insights into several neurological and neuropsychiatric disorders ( Drenthen et al., 2020 ).
White matter is relatively bright on T1-weighted (T1w) MR images due to myelin-bound cholesterol, while the T2-weighted (T2w) contrast of white matter is low due to motion-restricted protons in the myelinwater. Therefore, previously both the T1-and T2-weighted images have been combined to enhance myelin contrast by calculating the ratio of T1-and T2-weighted image intensities ( Glasser and Van Essen, 2011 ). Despite the enhanced myelin contrast, the T1w/T2w ratio fails to provide information that is specific to the myelin content only. Likewise, diffusion-weighted imaging (DWI) is often used to study microstructural white matter abnormalities. However, while DWI measures can provide information that is related to the myelin content, it is non-specific and therefore not suitable for the quantification of myelin ( Laule et al., 2007 ). Dedicated myelin-specific imaging techniques, such as myelinwater imaging, often have a low spatial resolution and require complex image processing as well as tailored MRI sequences which add to the total acquisition times ( > 7 min), hampering their appropriateness in clinical studies ( Drenthen et al., 2019a ).
Machine learning is a rapidly emerging technique in the field of medical imaging ( Hosny et al., 2018 ;Lundervold and Lundervold, 2019 ). Machine learning algorithms can be trained to reveal complex patterns in data. Previously, neural networks have already been successfully applied to accelerate the calculation times of the post-acquisition myelin analysis ( Lee et al., 2020 ;Liu et al., 2020 ). Here, we aim to extract myelin-specific information from anatomical and DWI scans, using neural networks. While anatomical and DWI techniques are not suitable for absolute quantification of myelin, combined they might exhibit an underlying (complex and non-linear) relation to the myelin content. Therefore, provided with reference myelin measurements, machine learning algorithms could, in theory, be trained to calculate the myelin content from anatomical and DWI scans. In this study, we aim to estimate quantitative myelin-water information from anatomical and DWI images by using spatially undersampled (e.g. acquiring fewer data) myelin-water imaging as a reference (resulting in faster scanning). There are many machine learning algorithms including statistical approaches (e.g. (non-)linear regression models) and neural networks. When large amounts of data are available, neural networks (i.e. deep learning) are often considered a preferred option. Furthermore, an important strength of neural networks is that they can handle nonlinear and multidimensional dependencies between in and output data ( Shahid et al., 2019 ). The network can subsequently be used to obtain whole brain myelin-specific maps with a resolution comparable to the input data.
As a first step to explore whether MWF can be estimated from commonly clinical scans, we assessed whether machine learning approaches can derive full MWF images, with clinical scans and undersampled MWF scans as input. If successful, this could be a step towards MWF maps based on clinical scans, without MWF acquisition. Neural network will be trained using spatially undersampled reference myelin markers obtained from dedicated MRI scans. After learning, the network can, in theory, reconstruct the whole brain myelin map. To assess the reliability of our results, we compare the deviations between the reconstructed myelin-specific map and the reference measurements to the betweenscan reproducibility (which was assessed previously ( Drenthen et al., 2019a )).

Myelin quantification
A singular value decomposition (SVD) filter ( Bydder and Du, 2006 ) was used to reduce noise in the multi-echo data, and a Gaussian kernel of 5 mm full-with half maximum (FWHM) served to spatially smooth the GRASE images. Subsequently, myelin-water fraction (MWF) maps were calculated using an adapted version of the orthogonal matching pursuit (OMP) ( Drenthen et al., 2018 ). To solve the OMP, a basis set of 1000 logarithmically spaced relaxation functions (T2 range 15 to 2000 ms) was used. The algorithm was regularized by running 20 instances with random initializations of the solution, as was described in more detail in ( Drenthen et al., 2018 ). The code used to run the OMP algorithm is publicly available on GitHub ( https://github.com/ GSDrenthen/Non-Negative-OMP ).
The decay of the T2 signal is highly influenced by B1 inhomogeneities and imperfect slice profiles due to the slice-selective excitation pulses. Therefore, to model the decays more accurately, the relaxation curves in the basis set were determined using the extended phase graph algorithm (EPG) ( Weigel, 2015 ). Furthermore, the slice profile was estimated using the Fourier transform of the slice-selective excitation pulse (small-tip approximation) ( McPhee and Wilman, 2017 ). Subsequently, by integrating the EPG relaxation profiles across the slice the basis set was corrected for imperfect slice profiles ( Drenthen et al., 2019a ). The B1 inhomogeneities were calculated by solving the OMP problem for a range of possible B1 errors and subsequently selecting the B1 error that corresponds to the lowest residual of the fit. A B1 error range of 0.5-1.0 was used, which is equivalent to a flip angle range of 90-180°.
The MWF was calculated as the ratio of myelin-water associated T2 components (15-40 ms) to all T2 components (15-2000 ms). The inhouse code used to calculate the slice profile and MWF is made public on GitHub ( https://github.com/GSDrenthen/calc _ MWF ).

Input data
The GRASE image acquired with TE = 100 ms was used as T2w contrast image. A bias field correction was applied to both the T1w and T2w images, after which the T1w image was registered to the undersampled TE = 10 ms GRASE image by zero-filling the missing slices and subsequently using the coregistration routine of the statistical parametric mapping (SPM12, https://www.fil.ion.ucl.ac.uk/spm/software/spm12/ ) toolkit ( Friston et al., 2007 ). Besides the T1w and T2w contrasts (cT1w and cT2w) the standardized T1w/T2w (sT1w/T2w) ratio was calculated ( Cooper et al., 2019 ). Furthermore, from the T1w and T2w images, the T1 and T2 relaxation time maps were approximated using the MRI signal formulas for MPRAGE sequences (T1) ( Marques et al., 2010 ) and spin echo (T2) imaging. To estimate the signal intensity bias, first the signal formulas are solved for the CSF, where relaxation times are assumed (T1 = 4273 ms, T2 = 1577 ms ( Drake-Pérez et al., 2018 )). Subsequently, the intensity bias is used to solve the signal formulas for a predefined range of T1 and T2 time values to estimate a basis set of T1 and T2 contrasts. The T1 and T2 relaxation maps are then approximated voxel-wise by finding the value from the basis set that is closest to the cT1w and cT2w. While this method does not represent the exact underlying T1 and T2 relaxation times, it provides the neural network with a non-linear scaling of the T1w and T2w contrasts that is related to the relaxation times.
The preprocessing of the DTI data were performed using ExploreDTI v4.8.6 ( Leemans et al., 2009 ). The diffusion MRI data was first corrected for head displacement, including B-matrix rotation, and eddy current induced geometric distortions. Subsequently, an EPI distortion correction was performed by a non-linear transformation of the diffusion images to the T1w image (which was already transformed to the GRASE space). From the corrected DWI images, principal eigenvalues ( 1 , 2 ,

Regions of interest
The white matter was parcellated into four lobes: the frontal, temporal, parietal and occipital lobes using Freesurfer (version 5.1) ( Fischl and Dale, 2000 ). First, the T1w images were parcellated based on the 68 cortical regions in the Desikan-Killiany atlas ( Desikan et al., 2006 ). Subsequently, each white matter voxel was assigned to the most proximal cortical region. Furthermore, using Freesurfer, the splenium, the genu, the whole corpus callosum, the cortical gray matter, and the thalamus were automatically segmented from the T1w images. This resulted in 8 white matter regions of interest (ROIs) (the genu and splenium of the corpus callosum, the whole corpus callosum, the lobular white matter, and all white matter) and 2 gray matter ROIs (cortical gray matter, and the thalamus).

Machine learning
For each subject, a neural network was trained using 5 (out of 26) of the simultaneously acquired slices, corresponding to an undersampling factor of 6 (i.e. using only 1 of the 6 acquired packages). To ensure that extreme MWF values (e.g. very high and very low) are available for training, the slices are maximally spaced and selected such that they intersect the splenium, an area known to be highly myelinated ( Fig. 1 ). The networks are trained on a voxel basis, resulting in a large training dataset for each volunteer (i.e. > 10.000). Furthermore, to investigate the added value of more training data, additionally, for each subject neural networks were trained using 10 (out of 26) slices. This corresponds to a undersampling factor of 3. Moreover, to estimate the added value of the DWI metrics compared to simply the anatomical images, neural networks were trained using either the combined set of the anatomical and diffusion images or only the anatomical (T1w, T2w) images.
A residual network (ResNet) architecture for non-linear regression problems was used ( Chen et al., 2020 ). The optimal depth and width of the ResNets were individually assessed for each volunteer using a 5-fold cross-validation grid search. Number of layers was iteratively assessed over range; [3 4 5 6 7] while the number of nodes in each layer was evaluated for range; [20 30 40 50 60]. This resulted in neural networks with a median depth of 5 (range 3-6) and a median width of 30 (range 20-50). For each layer, rectified linear units (ReLU) are used as nonlinear activation functions. To prevent vanishing gradients due to dying ReLU's ( Lu et al., 2019 ), parametric ReLU activation functions are used ( He et al., 2015 ). These are similar to a leaky ReLU ( He et al., 2015 ) with a leak rate that is estimated during the training. The networks are trained with a batch size of 8 and using 100 epochs, and early stopping is used to prevent overfitting (with a patience of 20 epochs). The available training data is split into a train set (80%) and validation set (20%) to determine the early stopping point. This early stopping strategy was empirically assessed, and revealed to be a good balance between allowing the network to have enough epochs to learn, and to prevent the effects of overfitting.
The distribution of MWF values in the gray and white matter resembles a Rayleigh distribution (i.e., an imbalanced distribution with a long tail). Therefore, a standard mean squared loss function is likely to underestimate the MWF values. To cope with the imbalanced data, we opt to use a custom loss function with an adaptive weight that is related to the MWF distribution. First, we fit a Rayleigh distribution to the MWF values and, subsequently, invert it to put higher weights on the cases in the tails ( Fig. 2 ): Where is the scale parameter corresponding to the Rayleigh distribution fit of the input MWF values.
The adaptive moment estimation (Adam) ( Kingma and Ba, 2015 ) was used to update the network parameters and the network weights were initialized using the Glorot uniform initializer ( Glorot and Bengio, 2010 ).
Prior to training, all input data was linearly scaled to the range of [0 1]. Fig. 3. For a representative subject, the A) T1-weighted image, B) reference MWF map, reconstructed MWF maps C) with DWI and D) without DWI features, and E) and F) corresponding absolute voxel-wise difference w.r.t. the reference MWF map, respectively.
The neural network was written in Python v3.7 using the Keras library in TensorFlow ( Abadi et al., 2016 ). The code to build the neural network architecture is made public on GitHub ( https://github.com/ GSDrenthen/MWF _ NN ).

Statistical analysis
To assess the agreement between the reference and the estimated MWF, the coefficient of variation (CoV), and intraclass correlation coefficient (ICC) were determined for each ROI ( Drenthen et al., 2019a ). The CoV was determined as the within-subject standard deviation (SD ws ) relative to the overall mean for all subjects ( = * 100% ). An ICC was calculated to measure the absolute agreement using a 2-way mixed model ( Liljequist et al., 2019 ). Results are considered to be in good agreement when ICC > 0.60 ( Cicchetti, 1994 ).

Results
In Fig. 3 , the T1w image (A) of a representative subject is shown alongside the corresponding reference MWF map (B). Fig. 3 C and D show the myelin maps that were reconstructed using neural networks with the combined set of features (DWI, T1w and T2w) and only the anatomical features (T1w, T2w), respectively. Fig. 4 shows the loss corresponding to the estimated MWF map in Fig. 3 C as a function of epochs. Fig. 5 shows the agreement between reference and reconstructed MWF of the ROIs for the neural networks trained using the combined set of features (DWI, T1w and T2w) and only the anatomical features (T1w, T2w) with a 6-fold undersampling, respectively. Qualitatively, the combined set of features shows a better agreement compared to using only the anatomical features (points in the scatter plot are closer to the reference line). Table 1 lists the mean and standard deviations of the ROIs from the reference and reconstructed MWF maps for both neural networks (combined set, and anatomical only) for a 6-fold undersampling factor. Furthermore, the difference between the two MWF values as well as the reproducibility measures are given. In terms of the reproducibility measures, the MWF map reconstructed with both DWI and anatomical images (ICC = 0.68, and CoV = 5.9%) shows a better agreement with the reference compared to using only the features from the anatomical images (ICC = 0.43 and CoV = 10.0%). The ResNet using only the anatomical features especially fails to correctly estimate the myelin content in the temporal white matter, corpus callosum and splenium.
Training with more input data (i.e. 10 input slices instead of 5), results in higher ICC values of 0.74 (for T1w, T2w, and DWI) and 0.52 (for T1w and T2w only) ( Table 2 ).

Fig. 4.
Loss as a function of epochs for the training (solid) and test (dashed) set for a representative subject. Note that in this case the early stopping criteria is met after 65 epochs.

Discussion
This preliminary study shows the potential of machine learning approaches to extract myelin-specific content from anatomical and DWI scans. The present application could greatly reduce the scanning time of myelin quantification from 7:30 min to 1:15 min (6-fold), while maintaining a low CoV of 5.9% and an ICC that is indicative of a good agreement between reconstructed and reference myelin maps (ICC = 0.68). Moreover, the CoV and ICC of a 3-fold undersampling (scanning time from 7:30 min to 2:30 min) is comparable to the reproducibility of the full multi-slice GRASE sequence itself (CoV = 5.5% vs. CoV = 6.7% and ICC = 0.74 vs ICC = 0.80). To achieve this, DWI is required in addition to the T1w and T2w images. The T1w and T2w images alone are insufficient to ensure a good agreement (ICC < 0.60).
While the T1w/T2w ratio can enhance the myelin contrast, it does not sufficiently capture the myelin-water related information. This is in line with a prior study that showed a stronger correlation between MWF and DWI metrics compared to MWF and T1w/T2w images ( Uddin et al., 2019 ). This is further emphasized by similar ICC and CoV values when omitting the T1/T2 ratio as input to the neural networks (CoV = 5.5% vs. CoV = 5.4% and ICC = 0.74 vs ICC = 0.75). In our study, especially the white matter area's that have an on average higher (i.e. the corpus callosum) or lower (i.e. temporal white matter) myelin content show deviations from the reference. The spatial intensity variation of the T1w and T2w images between these regions is insufficient to distinguish the differences in myelin content. This is especially evident from the Bland-Altman plots, showing a clear bias towards underestimation of regions with higher MWF values. While this bias is most evident in the case where only T1w and T2w images are used as training, a smaller but similar effect is visible when DWI information is added. Hence, adding DWI information can, to a certain degree, prevent the underestimation of higher MWF values. Nevertheless, there will remain regions with high MWF, such as the splenium of the corpus callosum, that cannot be reconstructed accurately. One possible explanation for this is that these very high MWF values are vastly outnumbered by the MWF values of the lobular white matter. Future research, aimed at data augmentation and oversampling techniques, could potentially promote the estimation of these high MWF values. Therefore, the current method is most promising in the lobular white matter regions.
Machine learning methods, neural networks especially, require an abundant amount of data. Therefore, the current study uses a voxel-wise Fig. 5. The agreement between the reference and estimated MWF for the ten ROIs and all subjects is shown. Left shows the results for the combined set of features (DWI, T1w and T2w) and on the right is the results for only the anatomical features (T1w, T2w).

Table 1
Results for a 6-fold undersampling. Mean and standard deviation of the reference and reconstructed MWF values in the ROIs are shown for the neural networks trained with the combined anatomical and DWI metrics, as well as neural networks trained with only anatomical metrics. Additionally, reproducibility measures are shown for each ROIs, as well as a mean of all ROIs. DWI, diffusion-weighted images; CC, corpus callosum; WM, white matter; cGM, cortical gray matter; ROI, region of interest; MWF, myelin-water fraction; RC, repeatability coefficient; CoV, coefficient of variation; ICC, intra-class correlations coefficient. White matter All 11.8 ± 0.9 11.8 ± 1.0 2.0 ± 1.9 .89 11.8 ± 0.9 12.3 ± 1.1 3.9 ± 3.8 .61 Frontal 11.8 ± 0.9 11.9 ± 1.0 2.5 ± 2.  approach, resulting in > 10.000 in-and output pairs. However, the drawback of this approach is that the spatial information is lost. The myelin in the brain has clear spatial patterns, and this additional information could improve the reconstruction of myelin markers. For example, methods that use additional information on neighboring voxels, or training on the images directly using convolutional neural networks (CNN) could yield a more accurate reconstruction. Furthermore, another limitation of the voxel-wise approach is that errors related to image registration cannot be excluded. The current study shows the feasibility of myelin-water content extraction from anatomical and DWI in a relatively homogenous sample of young and healthy adults. Therefore, future clinical studies are needed to evaluate the performance in various myelin-associated pathologies (e.g. Alzheimer's disease ( Kavroulakis et al., 2018 ), multiple sclerosis ( Kolind et al., 2015 ), epilepsy ( Drenthen et al., 2019b ), Parkinson's disease , major depressive disorder ( Sacchet and Gotlib, 2017 ), and schizophrenia ( Lang et al., 2014 )). It should be especially investigated how the proposed methods performs in, for example, white matter lesions.
This study opens up new possibilities, for example, a neural network could be trained to extract myelin content from T1w, T2w and DWI scans directly, without requiring additional myelin-water imaging data. However, the contrasts of standard anatomical T1w, T2w images are qualitative measures and therefore not directly suitable for such an application. Therefore, more research is needed (e.g. intensity normalization procedures) to extract myelin-specific content from anatomical and DWI directly. Since anatomical scans are necessary for neuroradiological assessment, they are generally available for all patient exams. Moreover, DWI data is also commonly acquired in many clinical studies. By applying a machine learning approach on readily available clinical scans, not only can the spatial resolution and total scan time of future clinical studies investigating the myelin content be improved, but also the myelin content can be calculated retrospectively in many prior clinical cohorts. Furthermore, a successful first application of the proposed approach would open novel and unique possibilities for patients that cannot sustain long scanning times, such as elderly patients, patients with movement disorders, young children and infants.
To conclude, myelin-water maps can be reconstructed from anatomical and diffusion-weighted scans with an agreement that is comparable to the reproducibility of the scan itself. These results show the potential of machine learning approaches to extract specific myelin-content from anatomical and diffusion-weighted scans.

Data and code availability statement
The data and code that support the findings of this study are available from the corresponding author, GSD, upon reasonable request.

Declaration of Competing Interest
None.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.