Enhancing the estimation of fiber orientation distributions using convolutional neural networks

Local fiber orientation distributions (FODs) can be computed from diffusion magnetic resonance imaging (dMRI). The accuracy and ability of FODs to resolve complex fiber configurations benefits from acquisition protocols that sample a high number of gradient directions, a high maximum b-value, and multiple b-values. However, acquisition time and scanners that follow these standards are limited in clinical settings, often resulting in dMRI acquired at a single shell (single b-value). In this work, we learn improved FODs from clinically acquired dMRI. We evaluate patch-based 3D convolutional neural networks (CNNs) on their ability to regress multi-shell FODs from single-shell FODs, using constrained spherical deconvolution (CSD). We evaluate U-Net and High-Resolution Network (HighResNet) 3D CNN architectures on data from the Human Connectome Project and an in-house dataset. We evaluate how well each CNN can resolve FODs 1) when training and testing on datasets with the same dMRI acquisition protocol; 2) when testing on a dataset with a different dMRI acquisition protocol than used to train the CNN; and 3) when testing on a dataset with a fewer number of gradient directions than used to train the CNN. This work is a step towards more accurate FOD estimation in time-and resource-limited clinical environments.


Introduction
Diffusion magnetic resonance imaging (dMRI) can be used to investigate the organization of white matter (WM). WM is composed of neuronal axon fiber bundles that impose a preferential direction to the diffusion of water molecules [1,2], resulting in anisotropic diffusion along the axon fiber bundles [3,4].
WM tissue microstructural organization, such as axon diameter [5] and local fiber orientation distribution (FOD or fODF) [6], can be estimated from dMRI acquisitions. FODs can be used to perform fiber tractography [7,8], which has an important role in presurgical planning [9][10][11][12] and connectome analyses [13]. A common method to estimate local fiber orientation is diffusion tensor imaging (DTI) [3]. However, DTI only models single fiber populations and cannot resolve complex fiber configurations such as fiber crossings [6]. More robust methods for representing FODs have been proposed based on spherical deconvolution [14][15][16] or other approaches that estimate diffusion orientation distribution functions from q-space [16,17].
Constrained spherical deconvolution (CSD) estimates a more complex FOD from dMRI signals [4,18]. Single-shell single-tissue CSD (S-CSD) models each voxel as a single compartment with one corresponding FOD, irrespective of the underlying tissue components [18]. However, S-CSD FODs suffer from partial volume effects (PVEs) when multiple tissues are present. Multi-shell multi-tissue CSD (M-CSD) reduce PVEs by modeling one anisotropic compartment, corresponding to WM, and two isotropic compartments, corresponding to grey matter (GM) and cerebrospinal fluid (CSF) [4]. M-CSD relies on decay behaviors of distinct tissue types across multiple b-values (or shells) to separate each voxel into three components. However, M-CSD is limited in clinical settings since it requires dMRI acquisitions with multiple shells (MS) which has longer acquisition times compared to DTI or single-shell (SS) dMRI.
Single-shell 2-tissue CSD (2TS-CSD) and single-shell 3-tissue CSD (SS3T-CSD) [19] attempt to resolve PVEs by modeling isotropic compartments, similar to M-CSD, using the b = 0 s/mm 2 image as a second shell. With the additional shell, a multi-tissue signal profile is computed by 2TS-CSD assuming one anisotropic component (WM) and one isotropic component (either GM or CSF). SS3T-CSD uses an iterative approach to fit a CSD model for the three tissues compartments by first fitting one pair of components (WM and GM) and then fitting another pair of components (WM and CSF) [19].
FODs that are able to resolve complex fiber configurations typically require specific dMRI acquisition protocols with a high number of gradient directions (b-vecs), a high maximum b-value and/or multiple bvalues [20][21][22][23]. A higher number of gradient directions and higher b-values improve FOD angular resolution, enabling FODs to better resolve complex fiber configurations, such as fiber crossings [24,25]. However, local fiber reconstruction is more accurate for images with a high signal-to-noise ratio (SNR), which can be achieved from low b-value shells [25]. Despite these advantages, clinical uptake is limited due to longer acquisition times and the need for expert staff to set up the acquisition protocols [26]. Therefore, improving FOD modeling for clinically available dMRI acquisitions is an active topic of research.
Deep learning (DL) has been successfully used for a variety of medical imaging tasks [27] to learn an underlying mathematical representation that non-linearly maps data between representations [28]. DL regression has been presented to improve dMRI spatial resolution [29], to map model coefficients from q-space [30] to either neurite orientation dispersion and density imaging (NODDI) [31] or diffusion kurtosis imaging (DKI) models [32], to output high quality CSD coefficients from CSD coefficients computed on dMRI with fewer gradient directions [33], to fit SH coefficients between different shells [34], and to map from S-CSD coefficients to FODs computed from histology [35].
In Tanno et al. [29]; a CNN patch-based regression was presented to infer higher resolution patches and quantify uncertainty. In a different work, Golkov et al. [30] computed model coefficients (DKI or NODDI) from q-space signal intensities using a multilayer perceptron network (MLP). In Koppers et al. [34]; a MLP was trained using SH coefficients calculated from one or a combination of shells to infer the SH coefficients of the same order for a different shell. In Nath et al. [35]; a neural network composed of regular hidden and residual layers (ResDNN) was trained to map S-CSD coefficients to FODs computed from macaque histology. In Lin et al. [33]; 3D CNNs were used to estimate M-CSD coefficients of a good quality MS dMRI acquisition from M-CSD coefficients computed from dMRI with fewer gradient directions.
In this work, we aim to compute a more accurate and reliable FOD from SS dMRI, the most common dMRI acquisition in clinical applications. Our goal is to demonstrate that 3D CNNs, U-Net [36] and High-Resolution Network (HighResNet) [37], can regress M-CSD coefficients from 2TS-CSD coefficients. We present an application of CNNs that has not been proposed elsewhere in the literature and is non-obvious from other proposed regression tasks for dMRI which have focused on regression between shells using MLPs [34] or regression from few gradient directions to many gradient directions for MS acquisitions [33]. We present an extensive evaluation involving: (1) training and testing on datasets with the same dMRI acquisition protocol, (2) testing on datasets with different dMRI acquisition protocols than the training dataset, and (3) regressing from dMRI with fewer gradient directions than the training dataset.

Pipeline overview
Our pipeline consists of the following steps. We construct a paired dataset composed of one SS dMRI and one MS dMRI (Section 2.2) from the Human Connectome Project (HCP) [38] and an in-house dataset which we refer to as QS dataset (see Section 3.1 for a detailed description of the datasets). For each dataset, we compute two CSD models, 2TS-CSD and M-CSD (Section 2.3) from the SS dMRI and MS dMRI, respectively. Then, we train a CNN to regress the M-CSD coefficients from the 2TS-CSD coefficients (Section 2.4).

Training dataset
From a MS dMRI a paired SS dMRI dataset is constructed by selecting one shell from the MS dMRI. In this work, we select a shell that has at least a minimum number of gradient directions for a given b-value that can characterize the angular frequency components of the dMRI signal [25]. For the HCP dataset, we construct a SS dMRI for each of the three b-values (1000, 2000, 3000 s/mm 2 ) where each shell has 90 directions. For the QS dataset, we construct a SS dMRI for two b-values (700, 2500 s/mm 2 ) with 32 and 64 directions, respectively.

CSD modeling
CSD models FODs from the dMRI signal intensities. A dMRI signal intensity is approximated by convolving CSD coefficients with a signal attenuation profile, also known as a response function [18]. The CSD coefficients are computed by minimizing differences between the dMRI signal intensities and signal intensities approximated from the CSD coefficients using iterative linear least-squares. A soft regularizer is also applied to enforce a non-negativity constraint on the CSD coefficients.
Although l max = 8 provides FODs with a high angular contrast for high b-values (b > 1000 s/mm 2 ) [25], in this work, we focused on l max = 4, comprising 15 coefficients, to establish proof of concept.
After computing CSD coefficients, we applied multi-tissue informed log-domain intensity normalization [39] to correct intensity inhomogeneities.

M-CSD
M-CSD coefficients are computed following the equation [4]: where d i is the vector of dMRI signal intensities in the i-th shell, x j is an unknown vector of coefficients for the FOD of tissue j, C i,j is the matrix relating the coefficients for the FOD of tissue j to the dMRI signal intensities measured on the i-th shell in q-space by spherical convolution. An additional constraint is imposed on the coefficients where A j is a tissue specific matrix relating the coefficients of the FOD for tissue j to their signal amplitudes, effectively imposing non-negativity on all coefficients. To perform the optimization, we use the algorithm available within MRtrix [40].

2TS-CSD
For the 2TS-CSD, a similar approach to the M-CSD is used where the b = 0 s/mm 2 (b-zero) image is used as a second shell. To ensure Equation (1) has a unique solution, we set j = 2, reducing the number of tissue components to two. We model one compartment as isotropic corresponding to CSF, and the other compartment as anisotropic, corresponding to WM. The CSF is selected as the isotropic compartment as it leads to a more accurate fit of the FOD compared to using GM as the isotropic compartment [19].

CNN training
We evaluate two 3D CNN architectures, a 3D High-Resolution Network (HighResNet) [41] and a 3D U-Net [36], on their ability to regress M-CSD coefficients from 2TS-CSD coefficients. Fig. 1 shows a graphical representation of the network architectures. For both networks, patch-based training is used. Patch-based training requires a reduction in the effective receptive field (ERF) to reflect the selected patch size (Section 2.5) and avoid distortions at the patch boundary [42]. Details of how the ERF is reduced for each network are described below.

HighResNet
The original HighResNet architecture is comprised of three levels of dilated convolutions and nine residual connections resulting in 0.81 M trainable parameters. HighResNet was originally proposed as a compact network that could achieve large ERFs [42] without requiring a downsample-upsample pathway to capture low and high level features [36,43]. Dilated convolutions are used to produce accurate predictions and detailed probabilistic maps near object boundaries [41]. In this paper, we modified the HighResNet architecture by reducing the number of layers to achieve the desired ERF. The final HighResNet architecture comprises two levels of dilated convolutions and four residual connections resulting in 0.16 M trainable parameters. A parametric rectified linear unit (PReLU) activation function was used in place of a ReLU. PReLU adaptively learns the rectifier parameters and has been shown to improve CNNs performance in other applications [44].

U-Net
The original 3D U-Net is a "U"-shaped network that has a downsample-upsample pathway composed of 14 convolutional layers [36] resulting in 19.08 M trainable parameters. We adapted the U-Net architecture to achieve the desired ERF by reducing the network to ten convolutional layers resulting in 3.93 M trainable parameters. We removed one encoder block (2 × (conv. + batch norm. + PReLu) + max pooling) and one decoder block (concat. + up-sampling + 2 × (conv. + batch norm. + PReLu)).

Data augmentation
Classic techniques for on-the-fly augmentation including axis flipping, scaling, and rotation have been successfully applied to CNN training for small 3D medical imaging datasets [37,41,45]. However, applying these techniques as implemented in traditional medical image processing tools is not appropriate for CSD coefficients as they are in the SH domain and not the 3D spatial domain. Therefore, we apply 3D random rotations in the SH domain.

Training setup
Each network is trained with an RMSprop optimizer to minimize the L 2 loss between the M-CSD coefficients and the CSD coefficients regressed by the CNN, measured by loss(y,ŷ) = ‖y−ŷ‖ 2 2 2 where y are the ground truth coefficients (M-CSD) and ŷ are the coefficients regressed by the network. The initial parameters for each CNN are set using the He uniform initialization [44]. Each CNN is trained for 400 epochs, with a weight decay of 1E − 6. Training starts with a learning rate of 3E − 2, which is then reduced by 1/2 every 50 epochs. Training weight decay, initial learning rate, and number of epochs were experimentally chosen based on observed convergence. Patches are uniformly sampled within the binary mask corresponding to the intracranial space computed using the skull-stripped algorithm of MRtrix [40].
For each iteration in an epoch, a subject from the training set is randomly selected. Subsequently, CSD coefficients are augmented by applying a random FOD rotation in the range of [− 25, 25] degrees. From this augmented data, 40 patches of size 32 × 32 × 32 × 15 were randomly sampled from within the intracranial space, where 15 is the number of 2TS-CSD coefficients. The number of patches were experimentally selected to achieve low validation loss while being able to be loaded on the available graphics processing unit (GPU) memory. An epoch finishes when all subject from the training set have been selected once. For every epoch, a new set of random rotations and patches are computed for each subject.

Datasets
We use two datasets to conduct our analysis: the publicly available HCP [38] and a dataset acquired at the National Hospital for Neurology and Neurosurgery, Queen Square (QS). The details of each dataset and the applied preprocessing are described below.

QS dataset
The QS dataset is comprised of 50 volumetric MS dMRI scans acquired from patients with epilepsy who appeared "structurally normal" on a T1-weighted MRI (T1). Small lesions may be present focally in the GM but should not distort or affect the WM FOD computation. All patients underwent MRI scanning as part of routine clinical care. Images are acquired on a 3T GE MR750 including a T1 sequence (MPRAGE) and a MS dMRI sequence with 2 mm isotropic resolution and the gradient directions 11,8,32, and 64 at b = 0, 300, 700, and 2500 s/mm 2 , respectively and one b = 0 s/mm 2 with reverse phase-encoding. The MS dMRI is corrected for signal drift, geometric distortions and eddy- Fig. 1. HighResNet architecture [41] and U-Net architecture [36] used in this work. current induced distortions as in Mancini et al. [11].

HCP dataset
We use a subset of the HCP dataset comprised of 45 subjects [38]. Images are acquired on a 3T scanner with the following parameters: 1.25 mm isotropic resolution with 90 gradient directions for each b = 1000, 2000, and 3000 s/mm 2 and 18 images at b = 0 s/mm 2 . Images in the HCP dataset are corrected following the protocols described in Sotiropoulos et al. [38] prior to download.

Evaluation metrics
To evaluate the CSD model accuracy, we compute mean absolute error (MAE) and the angular correlation coefficient (ACC) [46] for all voxels in the WM (Section 2.3). WM voxels are identified using geodesic information flows (GIF) to segment the WM as a binary mask [47]. The "ground truth" CSD is the M-CSD computed using dMRI with all gradient directions and shells.
MAE measures how well CSD coefficients match with lower values indicating high similarity to the ground truth CSD coefficients. MAE is computed as MAE(y,ŷ) = |y−ŷ| n where y is the ground truth M-CSD coefficients and ŷ are either the CSD coefficients inferred from a trained network or 2TS-CSD coefficients.
ACC is computed between two different sets of SH coefficients u, the M-CSD coefficients, and v, from either the CSD coefficients inferred from a trained network or 2TS-CSD coefficients, both models have the SH order j, where ACC are in the range [− 1, 1], where 1 implies a perfect linear correlation between two functions on a sphere, and − 1 implies a negative correlation [48]. Note, in this work the value − 1 is not achievable due to the non-negativity constraint imposed on the CSD coefficients.

Experiments
We assess CNN performance in the following scenarios: 1) training and testing on datasets with the same dMRI acquisition protocol (Experiment 1 -Intra-Scanner Acquisition Performance); 2) testing on datasets with different dMRI acquisition protocols than the training dataset (Experiment 2 -Inter-Scanner Acquisition Performance); and 3) testing on datasets with fewer dMRI gradient directions than the training dataset (Experiment 3 -Downsampled Imaging Performance). Additionally, we assess performance variability within specific brain regions. The details of each experiment are described below. For all experiments, the "ground truth" CSD is the M-CSD computed using the complete set of gradient directions and shells (Section 2.2).

Experiment 1 -Intra-Scanner Acquisition Performance
We assess how well CNNs were able to regress M-CSD coefficients when using dMRI acquired on the same scanner and with the same protocol for training and testing. In this experiment, 5-fold crossvalidation is conducted where 3 folds are used for training, 1 fold for validation, and 1 fold for testing.

Experiment 2 -Inter-Scanner Acquisition Performance
We assess the generalizability of CNNs to regress M-CSD coefficients from dMRI acquired from different acquisition protocols and a different scanner. We use CNNs trained in Experiment 1 -Intra-Scanner Acquisition Performance, without additional model tuning, to test on a dataset with different dMRI acquisition protocol than used during training. For example, a CNN trained on QS dataset using 2TS-CSD coefficients derived from the b = 700 s/mm 2 shell is evaluated on the HCP dataset using 2TS-CSD coefficients derived from the b = 2000 s/mm 2 shell.

Experiment 3 -Downsampled Imaging Performance
We assess the robustness of CNNs when acquisitions have fewer gradient directions than the training dataset. Using the same test datasets as in Experiment 1 -Intra-Scanner Acquisition Performance, we subsampled the number of gradient directions in the dMRI by 25%, 50%, and 75%. CNNs trained from Experiment 1 -Intra-Scanner Acquisition Performance, with no further tuning, were used to regress M-CSD coefficients from 2TS-CSD coefficients computed from the subsampled dMRI. CNNs were tested with both 2TS-CSD coefficients computed from the same dMRI acquisition protocol (as in Experiment 1 -Intra-Scanner Acquisition Performance) and dMRI with different acquisition protocols (as in Experiment 2 -Inter-Scanner Acquisition Performance).
Each subsampled dMRI was created as follows. For the original dMRI dataset, we first reorder the set of gradient directions such that if a scan is terminated prematurely, at any point, the acquired gradient directions will still be close to optimally distributed on the half-sphere [40]. Then, we truncated the number of gradient directions for both b = 0 s/mm 2 and the selected shell to generate the SS dMRI with the desired reduction in gradient directions. Finally, the 2TS-CSD is computed for the subsampled SS dMRI as described in Section 2.3.

Variability within brain regions
We assess the performance of the CNNs to regress M-CSD coefficients within the specific brain regions: frontal, occipital, parietal, and temporal lobes and the corpus callosum (CC). All brain regions were determined from a segmentation generated using [47] with WM labels derived from the Hammers Atlas [49].

Implementation
All experiments were performed on a workstation equipped with an Intel CPU (Xeon® W-2123, 8 × 3.60 GHz; Intel), 32 GB of memory and a NVIDIA GPU (GeForce Titan V) with 12 GB of on-board memory. All code was implemented in Python 3.6. PyTorch 1.6.0 [50] was used for network training. NiftyNet 0.4.0 [37] was used for data loading and sampling. Data augmentation was performed using SHtools 4.5 [51]. The code used to train the CNNs is available online. 1

For all Tables and Figures, the acronym QS 700-HCP 2000 CNN U-
Net indicates that CSD coefficients are from a CNN trained on the QS dataset using the 2TS-CSD coefficients from b = 700 s/mm 2 shell and tested using as input 2TS-CSD coefficients from the HCP dataset from the b = 2000 s/mm 2 shell. The method 2TS-CSD (QS 700) indicates the baseline approach computed from the QS dataset computed for the b = 700 s/mm 2 shell. The "ground truth" is always the M-CSD coefficients computed using the complete set of gradient directions and shells. Table 1 reports the mean, standard deviation, and median MAE and ACC between the indicated CSD coefficients and the M-CSD coefficients when training and testing on data with the same dMRI acquisition protocol. CSD coefficients from U-Net are the most similar to M-CSD coefficients while the baseline 2TS-CSD coefficients are the least similar. Pronounced ACC improvements are found in the QS dataset (9% for the mean ACC) compared to the HCP dataset (5% for the mean ACC) for the best performing CNNs. Additional experiments where we trained the CNNs using l max = 8, comprising 45 coefficients, are found in the Supplementary Material Section 1. Similar results were found for l max = 8 as for l max = 4. Finally, in Section 3 of Supplementary Material, we qualitatively evaluated the ability of the regressed CSD coefficients to perform tractography on QS 2500 and HCP 2000, generating four WM language-related tracts using ROI-based probabilistic tractography [11]. Tractography performed using CSD coefficients regressed from CNNs results in tracts more similar to tracts generated from M-CSD coefficients and with fewer spurious streamlines than tracts generated from 2TS-CSD coefficients (Supplementary Figures 2-3).

Experiment 2 -Inter-Scanner Acquisition Performance
We evaluate how well the CNNs perform when testing on datasets with different dMRI acquisition protocols than used during training. As models trained with the HCP dataset for different b-values had similar performance in Experiment 1 -Intra-Scanner Acquisition Performance (Section 4.1), we select one model (HCP 2000) for evaluation in this experiment. Table 2 reports the mean, standard deviation, and median MAE and ACC between the indicated CSD coefficients and the M-CSD coefficients. U-Net and HighResNet were quantitatively more similar to M-CSD coefficients than the 2TS-CSD coefficients.
CSD coefficients regressed from CNNs have greater similarity to M-CSD coefficients compared to the 2TS-CSD coefficients indicated by the CDFs skewing toward higher ACC values as shown in Fig. 3.
Figs. 4 and 5 show ACC heatmaps for the WM voxels. ACC heatmaps show higher errors in WM voxels far from the boundary in the CSD    coefficients regressed from CNNs for Experiment 2 -Inter-Scanner Acquisition Performance compared to Experiment 1 -Intra-Scanner Acquisition Performance. However, CSD coefficients regressed from CNNs better capture finer details in regions containing fiber crossings and voxels near the WM boundaries compared to the 2TS-CSD coefficients. Figs. 6 and 7 show a visual representation of the FODs as glyphs. FODs for the QS dataset show a higher qualitative agreement in single fiber populations compared to multiple fiber populations. FOD amplitudes were observed to be smaller than expected for the QS 700 dataset when CSD coefficients were regressed with CNNs training on the HCP dataset. This may occur due to a poor angular contrast in the low b-value (b = 700 s/mm 2 32 directions) of the testing data, which the CNNs cannot properly regress.
As shown in Fig. 7, when evaluating on the HCP dataset, CNNs trained on QS 2500 were capable of resolving fiber crossings while CNNs trained on QS 700 did not capture these finer details. However, both CNNs resolved FOD scaling and small rotations.

Experiment 3 -Downsampled Imaging Performance
We evaluated how well CNNs regressed CSD coefficients from 2TS-CSD computed from dMRI with subsampled gradient directions. For this experiment, we selected the best CNNs from Experiment 1 -Intra-Scanner Acquisition Performance for each dataset, QS 2500 and HCP 2000. Fig. 8 shows box plots of ACC for different amounts of gradient direction subsampling. CSD coefficients regressed from CNNs have better performance compared to 2TS-CSD coefficients for all levels of subsampling. This demonstrates robustness in our method to appropriately regress coefficients regardless of the number of gradient directions.
We qualitatively evaluated our method on its ability to generate accurate and reliable fiber bundles for QS 2500 and HCP 2000 when dMRI with 50% of the gradient directions to regress CSD coefficients was used as input. Tractography computed from CSD coefficients regressed from CNNs resulted in tracts more similar to tracts generated from M-CSD coefficients and with fewer spurious streamlines than tracts generated from 2TS-CSD coefficients as shown in Supplemental Material Figs. 2 and 3.

Variability within brain regions
We evaluate how well CNNs regressed CSD coefficients within specific brain regions to identify if there are spatial dependencies (Section 3.3.4). We selected the best CNNs from Experiment 1 -Intra-Scanner Acquisition Performance for each dataset, QS 2500 and HCP 2000. Fig. 9 shows ACC for the CC and frontal, occipital, parietal, and temporal lobes.
When regressing CSD coefficients for the HCP dataset, ACC between the regressed coefficients and M-CSD coefficients was higher compared to 2TS-CSD coefficients using CNNs trained on QS 2500. When regressing CSD coefficients for the QS dataset, CNNs trained on HCP 2000 had good performance in specific regions (occipital and temporal lobes) but relatively poor performance in other regions (CC, frontal and parietal lobes). However, differences in ACC for the CNN trained on HCP 2000 and the 2TS-CSD coefficients are relatively small and may be the result of random variation within models. One factor related to performance is the 2TS-CSD coefficients had relatively good performance (>0.9 for all regions), therefore, regression of CSD coefficients using CNNs had very little room to improve. The performance of CNNs to regress CSD coefficients on subsampled SS dMRI showed higher agreement with the M-CSD than the 2TS-CSD coefficients for all brain regions.
We evaluated our method on its ability to mitigate PVEs. We show a boxplot (Supplementary Figure 1) for the ACC in voxels at the WM/GM interface, the region most likely to contain PVEs. We demonstrate that CSD coefficients regressed from CNNs are more similar to M-CSD coefficients than 2TS-CSD coefficients at the WM/GM interface.

Discussion
We evaluated a patch-based CNN to regress M-CSD coefficients from 2TS-CSD coefficients. We demonstrated quantitatively (Tables 1 and 2 and Figs. 8 and 9) and qualitatively (Figs. 2-7) that CNNs can accurately regress CSD coefficients using data that are common in clinical settings -SS dMRI. CNNs can regress M-CSD coefficients from data with the same dMRI acquisition protocol as the training set (Experiment 1 -Intra-Scanner Acquisition Performance); generalize well to dMRI data acquired with different protocols than the training dataset (Experiment 2 -Inter-Scanner Acquisition Performance); and are robust to dMRI with fewer gradient directions than the training dataset (Experiment 3 -Downsampled Imaging Performance).
We evaluated two common neural network architectures, U-Net [36] and HighResNet [37]. Both architectures perform similarly throughout all experiments. The aim of this work was not to find the best CNN to perform CSD coefficient regression but to show the capability of DL to enhance CSD coefficients for clinically available dMRI acquisition protocols. Overall larger improvements, in terms of MAE and ACC, were observed in clinical protocols (QS) compared to research protocols (HCP). The HCP dataset has high spatial and angular resolution, which allows the 2TS-CSD to resolve complex fiber configurations and compute very accurate FODs while for the QS dataset individual shells are unable to properly capture these differences.
In this work, we investigated CNN-based regression methods for l max = 4, comprising 15 coefficients, to establish proof of concept. However, we evaluated the ability to regress M-CSD coefficients for l max = 8 for the best performing CNNs (QS 2500, HCP 2000) to assess a real-world setting. We found similar performance between both l max orders (Supplementary Material Section 1).
Similar works have proposed DL-based approaches for regressing FODs. Koppers et al. [34] trained a MLP using SH coefficients calculated from one shell or combination of shells to infer SH coefficients of the same order for a different shell. Koppers et al. [34] regress SH coefficients between shells. In our work, we use CNNs to regress M-CSD coefficients from 2TS-CSD coefficients to go from a SS to MS representation.
Nath et al. [35] trained a neural network composed of regular hidden and residual layers (ResDNN) to regress from S-CSD coefficients to FODs computed from histology obtained from a macaque model. FODs computed from histology have higher resolution than FODs computed from dMRI. Due to a lack of ground truth in human data, the model was validated measuring reproducibility in a scan-rescan dataset comprising 12 subjects. ACC was computed between two FODs obtained from two different scanning sessions. In this work, we are interested in regressing M-CSD coefficients from 2TS-CSD coefficients, recognizing that even M-CSD coefficients have limitations for both angular and spatial resolution of WM fibers [4]. Hence, we measured ACC between the regressed coefficients and the "ground truth" M-CSD coefficients. Direct comparisons between Nath et al. [35] and our work are not possible, as ACC between scan-rescan FODs should be higher than ACC between CSD coefficients of varying quality.
Lin et al. [33] used a 3D CNN to regress M-CSD coefficients from M-CSD coefficients computed on dMRI with downsampled gradient directions (similar to Experiment 3 -Downsampled Imaging Performance). Lin et al. [33] evaluated regression of M-CSD coefficients from S-CSD coefficients for one subject in the HCP dataset. For this subject, ACC ranged from 0.95 to 0.96 in WM regions and 0.87 to 0.89 in partial volume regions when dMRI was downsampled to 45 gradient directions at b = 2000 s/mm 2 shell (50% of the original gradient directions). In comparison, our method had median ACC ranged from 0.92 to 0.96 in 45 patients when the S-CSD model was computed on 50% of the gradient directions at b = 2000 s/mm 2 shell. Hence, our CNNs have similar performance to Lin et al. [33].
Our approach may enable faster commercial dMRI acquisition with fewer gradient directions, thereby reducing acquisition times and facilitating translation in time-limited clinical environments [26]. Our work has the potential to estimate CSD coefficients from SS dMRI with similar quality to M-CSD coefficients. This method could be applied to improve the analysis of retrospective data where changing the acquisition is not possible.
There are two key limitations in this work. First, we used datasets to train our CNNs from the same scanner with the same acquisition protocol. Although we demonstrate that our approach is capable of generalizing across dMRI acquisition protocols (Experiment 2 -Inter-Scanner Acquisition Performance and Experiment 3 -Downsampled Imaging Performance), further improvements in the CNNs may be obtained by combining datasets during training. Secondly, we did not validate on subjects with pathologies that would distort WM tissue connectivity, such as tumors. Although the QS dataset [11] was acquired from patients with epilepsy, if any small lesions or abnormalities were present they were not big enough to distort normal WM anatomy. One future avenue of research is to evaluate this approach on data with pathologies that distort normal anatomy and validate it in a clinical setting, e.g. presurgical planning.

Conclusions
In this work, we presented a 3D patch-based convolutional neural network (CNN) to regress multi-shell, multi-tissue constrained spherical deconvolution (M-CSD) coefficients from single-shell 2-tissue CSD coefficients (2TS-CSD). Two CNN architectures, U-Net and HighResNet, were evaluated on their ability to regress CSD coefficients 1) on the same Fig. 9. ACC for specific brain regions tested on QS 2500 (Top) and on HCP 2000 (Bottom). ACC for specific brain regions when testing with dMRI containing all gradient directions (Left) and 50% of the gradient directions (Right). HR indicates HighResNet. dataset; 2) across different diffusion MRI (dMRI) acquisition protocols, and 3) on dMRI with fewer gradient directions than the training dataset. Our approach enables robust multi-tissue CSD FODs on SS dMRI acquisition protocols with few gradient directions, allowing faster dMRI acquisition in clinical settings. Further validation is required to demonstrate this approach generalizes to datasets acquired at multiple sites and on patients with brain pathologies that distort normal anatomy, such as tumors.