Image quality transfer and applications in diffusion MRI

ABSTRACT This paper introduces a new computational imaging technique called image quality transfer (IQT). IQT uses machine learning to transfer the rich information available from one‐off experimental medical imaging devices to the abundant but lower‐quality data from routine acquisitions. The procedure uses matched pairs to learn mappings from low‐quality to corresponding high‐quality images. Once learned, these mappings then augment unseen low quality images, for example by enhancing image resolution or information content. Here, we demonstrate IQT using a simple patch‐regression implementation and the uniquely rich diffusion MRI data set from the human connectome project (HCP). Results highlight potential benefits of IQT in both brain connectivity mapping and microstructure imaging. In brain connectivity mapping, IQT reveals, from standard data sets, thin connection pathways that tractography normally requires specialised data to reconstruct. In microstructure imaging, IQT shows potential in estimating, from standard “single‐shell” data (one non‐zero b‐value), maps of microstructural parameters that normally require specialised multi‐shell data. Further experiments show strong generalisability, highlighting IQT's benefits even when the training set does not directly represent the application domain. The concept extends naturally to many other imaging modalities and reconstruction problems. Graphical abstract Figure. No Caption available. HighlightsImage quality transfer propagates information from rare or expensive high quality images to abundant or cheap low quality images.Dramatically outperforms interpolation in resolution enhancement of diffusion MRI.Enables tractography to recover fine pathways normally only accessible at 1.25 mm resolution from 2.5 mm data sets.Provides plausible NODDI and SMT maps from single‐shell input data.Requires only off‐the‐shelf and computationally light machine learning and imaging tools and complementary to other sparse reconstruction and super‐resolution techniques.


Introduction
Bespoke MRI scanners and imaging protocols can produce very high quality data uniquely informative on anatomy and physiology. For example, the Human Connectome Project (HCP) developed specialised human MRI scanners fitted with 100 mTm −1 and 300 mTm −1 gradient coils Setsompop et al., 2013), which are up to an order of magnitude more powerful than standard clinical scanners. In combination with custom imaging and image reconstruction innovations  and a lengthy acquisition protocol, these devices provide unprecedented image resolution, signal levels, and coverage of the measurement space revealing ultra-fine anatomical detail from live subjects. Such experimental devices showcase the potential for future clinical imaging systems (Vu et al., 2015), but provide little immediate benefit to current clinical practice where hardware is more modest and patient-imaging time is very limited. Moreover, translation is slow, because market forces demand strong evidence that mass production has major benefit. Even when available, the latest technology is often prohibitively expensive outside elite research centres, so the state of the art remains inaccessible to most practitioners.
Image quality transfer (IQT) aims to bridge this gap between rare experimental systems and accessible commercial systems. The technique learns mappings from low quality (e.g. clinical) to high quality (e.g. experimental) images exploiting the similarity of images across subjects, regions, modalities, and scales: image macro-and meso-structure is highly predictive of sub-voxel content. The mapping may then operate directly on low-quality images to estimate the corresponding high-quality images, or serve as a prior in an otherwise ill-posed imagereconstruction routine.
Such a procedure has the potential to enhance or enable a wide range of desirable imaging or image analysis operations. These include long-standing challenges in medical imaging, such as enhancement of spatial resolution (Greenspan et al., 2002;Coupé et al., 2013;Scherrer et al., 2012;van Steenkiste et al., 2016;Ning et al., 2016) and harmonisation of multi-centre studies (Mirzaalian et al., 2016) or longitudinal studies that straddle scanner upgrades, as well as more recent developments, such as image reconstruction from sparse acquisition (compressed sensing or MR fingerprinting) (Lustig et al., 2007;Ma et al., 2013), or predicting one contrast or modality from another ("image synthesis" or "modality propagation") (Jog et al., 2015;Ye et al., 2013;Burgos et al., 2015;Bahrami et al., 2016).
We construct two simple and practical implementations of IQT via patch-regression using i) a global linear transformation (IQT-GL), and ii) a non-linear random-forest (IQT-RF). To demonstrate the concept and benefits of IQT, we focus on using it to enhance spatial resolution and the downstream benefits to brain connectivity mapping. We also investigate the potential for IQT to enrich the information content of images by estimating maps of microstructural indices that normally rely on a "multi-shell" acquisition, but from "single-shell" input.
Image resolution is a major limiting factor in tractography. IQT can reconstruct high resolution images from low resolution input by learning a mapping from low resolution to high resolution images. We demonstrate here that this operation improves dramatically on interpolation, the standard technique that has well-documented limitations (Dyrby et al., 2014), and enables tractography to identify white matter pathways in low-resolution images that otherwise require specialised high-resolution data. Diffusion MRI super-resolution techniques, e.g. (Coupé et al., 2013;Scherrer et al., 2012; van Steenkiste et al., 2016;Ning et al., 2016), potentially offer similar benefits. However, most rely on a specialised acquisition with partially overlapping anisotropic voxel grids, cf. Greenspan et al. (2002), and/or involve complex processing pipelines (Coupé et al., 2013), so are not widely used. In contrast, IQT operates naturally on any existing data set and requires no special acquisition. In fact, the IQT operation is complementary to the reconstruction procedures in Scherrer et al. (2012), van Steenkiste et al. (2016) and Ning et al. (2016).
Additional experiments aim to recover parameter maps provided by neurite orientation dispersion and density imaging (NODDI) (Zhang et al., 2012) and microscopic diffusion anisotropy mapping based on the spherical mean technique (SMT)  from single-shell data sets, i.e. with only one non-zero b-value. NODDI maps indices of neurite (axons and dendrites) density and their geometric configuration (orientation dispersion); SMT  maps the per-axon microscopic diffusion tensor independent of intra-voxel orientation distribution, as well as orientation dispersion. Both exemplify the microstructure-imaging paradigm (Assaf et al., 2013), but require non-standard multi-shell acquisitions with multiple non-zero b-values and fail given single-shell data sets, which are routinely acquired for diffusion tensor imaging (DTI) (Basser et al., 1994). IQT learns mappings from single-shell (DTI) to multi-shell (NODDI/SMT) parameter maps. The ability to estimate such parameters from singleshell input data potentially enables reanalysis of a wide range of historical brain-imaging studies to reveal more specific information on group differences or disease mechanisms. The closest previous work on this topic (Golkov et al., 2016) learns a prior from examples to stabilise estimates of advanced diffusion MRI indices, specifically NODDI parameters and diffusion kurtosis (Jensen et al., 2005), from sparse data sets. However, the data sets in Golkov et al. (2016) are still multishell so the parameter-estimation problem remains well posed.

Methods
This section first specifies the various data sets and diffusion MRI models the experiments in the next section use to demonstrate IQT. We then describe the general implementation of IQT through patch regression and detail the specific versions designed for resolution enhancement and microstructural parameter mapping.

Data sets
We make use of four data sets to test and evaluate our implementations of IQT.

HCP data set
The main data set for training and testing is the diffusion MRI data provided by the WU-Minn HCP, release Q3 . The data from each subject includes 288 diffusion weighted images (DWIs) (acquired twice with reversed phase encoding direction) with 1.25 mm isotropic voxels; 18 have nominal b=0 and the three highangular-resolution-diffusion-imaging (HARDI) shells of 90 directions have nominal b-values of 1000, 2000, and 3000 smm −2 ; the precise values vary spatially, see Sotiropoulos et al. (2013), which fully details the acquisition. The data are corrected for distortions (susceptibilityinduced, eddy currents and subject motion), see Glasser et al. (2013).
The HCP cohort contains broadly healthy, young-adult (22-36 years) subjects, who vary in race, gender, and handedness all of which affect brain structure. We identify various subgroups of the HCP cohort to evaluate generalisability. Subgroup 1, which we refer to as HCP1, is as heterogeneous as possible containing a mixture of "White -Hispanic/Latino", "White -Not Hispanic", and "Black or African Am.", unrelated, male and female, and left and right-handed subjects in their 20s and 30s. Subgroups 2 (HCP2) and 3 (HCP3) are as homogeneous as possible while being as different as possible from one another. HCP2 subjects are unrelated "White -Not Hispanic" righthanded females aged 22-27. HCP3 subjects are "Black or African Am." unrelated males aged 31-36, predominantly left-handed. HCP1 and HCP2 include 8 subjects for training and a separate 8 subjects for testing. HCP3 has only a training set of 8 subjects. Support vector machine classification, c.f. Klöppel et al. (2008), of whole structural images from subjects in HCP2 and HCP3 achieves around 90% accuracy on average over an 8-fold cross-validation test. Thus, brain structure strongly discriminates the two groups.
A further 8 training sets of 8 subjects (HCP-Random 1-8), randomly chosen from the remaining subjects in HCP Q3, enable tests of variability of performance over training sets.

HCP Lifespan data set
The HCP Lifespan Pilot Phase 1a, which is available online at http://lifespan.humanconnectome.org, has a much wider age range than the main HCP cohort including 26 subjects from 8-75 years. The HCP Lifespan diffusion-imaging protocol is a shortened version of the main HCP protocol with lower resolution (1.5 mm isotropic voxels) and only two HARDI shells, with b = 1000 and 2500 smm −2 . However, the protocol exploits the optimised features of the HCP scanners, providing data of considerably better quality than standard sequences.

Monkey data set
We also use 13 diffusion MRI data sets each from a separate perfusion-fixed vervet monkey (Chlorocebus aethiops) brain. The age range is 5-83 months. The animals were obtained from the Behavioral Science Foundation, St. Kitts and were socially housed in enriched environments. The experimental protocol was reviewed and approved by the Institutional Review Board of the Behavioral Science Foundation acting under the auspices of the Canadian Council on Animal Care. The brains were prepared using the protocol in Dyrby et al. (2011). They were imaged on a 4.7 T Varian MR system at the Danish Research Centre for Magnetic Resonance, Denmark, with imaging protocol similar to the 300 mTm −1 protocol in Dyrby et al. (2013), but with less directions per shell to enable full-brain imaging. Thus, each data set has three HARDI shells, with b-values approximately 2000, 3000, and 9500 smm −2 , 16 b=0 images, and isotropic 0.5 mm voxels.

Prisma data
We acquired two non-HCP in-vivo human data sets on separate subjects using the clinical 3 T Siemens Prisma scanner in fMRIB, Oxford. The scanner is equipped with a modern gradient set of 80 mT/ m maximum strength. The publicly available CMRR Multiband sequence (Moeller et al., 2010) has been installed. These features allow acquisitions that resemble the main HCP diffusion MRI protocol. Subjects 1 and 2 are healthy male adults (29 and 33 years old respectively). Both data sets include a standard 3D T1-weighted MPRAGE (1 mm isotropic resolution) and diffusion MRI data with three 90-direction HARDI shells, b-values of 1000, 2000, and 3000 smm −2 , and 21 b=0 images, each for two image resolutions: 2.5 mm and 1.35 mm isotropic voxels (TE = 75 ms, TR=2.17 s, multiband factor=3 and TE = 94.6 ms, TR = 5.59 s, multiband factor=3, respectively). All data were acquired twice with reversed phase encoding direction (AP and PA). The HCP preprocessing pipelines  were used to correct for distortions in the diffusion MRI data and obtain transformations to T1 and MNI space. The Prisma scanner is less powerful than the bespoke HCP scanner and cannot achieve sufficient signal at 1.25 mm resolution, but the 1.35 mm data provides a pseudo ground-truth for IQT resolution enhancement of the 2.5 mm data.

Diffusion MRI models and fitting
We make use of four diffusion MRI models: i) the diffusion tensor (DT) model (Basser et al., 1994) and ii) Mean Apparent Propagator (MAP) MRI (Özarslan et al., 2013), to illustrate IQT resolution enhancement; and iii) NODDI (Zhang et al., 2012) and iv) SMT , for IQT parameter mapping. Initial demonstrations of IQT resolution enhancement use DTI for simplicity. However, MAP-MRI provides a more general signal representation than DTI and can capture orientational variance in fibre crossing regions and across multiple b-values (for a fixed diffusion time), which IQT tractography requires. We demonstrate IQT parameter mapping by reconstructing NODDI and SMT maps from single-shell DTI maps.
We fit the DT model to only the b = 0 images and b = 1000 smm −2 shell of the HCP and Lifespan data sets. Since the monkey data come from ex-vivo samples, the diffusivity of water in the tissue is 3-4 times lower (Dyrby et al., 2011;D'Arceuil et al., 2007). Thus, we use the b = 0 images and b = 3000 smm −2 shell (87 gradient directions); the latter shows signal attenuation most similar to the b = 1000 smm −2 HCP data. In all cases we use weighted linear least squares, cf. Jones and Basser (2004), accounting in the HCP data for the spatially varying b-value and gradient directions.
We fit the MAP basis up to order 4, giving 22 coefficients, by linear (unweighted) least squares to all three shells of the HCP data again accounting for spatial variation. The choice of scale parameters (see Özarslan et al. (2013)) u u u = = = 1.2 × 10 mm empirically minimises the error in fitting the signals in the HCP data.
The NODDI model, see Zhang et al. (2012) for details, has five free parameters: the intra-cellular volume fraction f ICVF ; the free-water volume fraction f ISO ; the mean θ ϕ ( , ) and variance parameter κ of a Watson distribution of fibre orientations. The orientation dispersion index (ODI) is a function of κ (Zhang et al., 2012). We fit the NODDI model via non-linear fitting using the NODDI toolbox https://www. nitrc.org/projects/noddi_toolbox version 0.9.
SMT  has two primary unknown parameters: the axial λ and radial λ ⊥ diffusivities of microscopic tissue compartments, which provide the microscopic FA, μFA. The framework also estimates the fibre orientation distribution p, which the scalar orientation dispersion entropy H(p) summarises. We fit the model using the implementation from , which is available at https:// ekaden.github.io.

IQT implementation
Our implementations of IQT use patch regression, i.e. learn mappings from a 3D neighbourhood, or "patch", of N 1 voxels in the input (low-quality) data set to a corresponding patch of N 2 voxels in the output (high-quality) data set. Open-source code is available from the authors on request. Input and output voxels are vector-valued containing p 1 and p 2 values, respectively. The regression problem requires a training set of patch pairs T | | , where each input patch x i has dimension p N 1 1 and each output patch y i has dimension p N 2 2 . Image reconstruction considers each patch of an input image independently; the learned mapping provides the output patch at the corresponding location. Thus, the IQT reconstruction step simply iterates over the image volume executing the mapping at each location to piece together the output image.
We consider a hierarchy of three types of mapping to implement the regression: global linear (for IQT-GL), regression trees, and regression forests (for IQT-RF).

Linear regression
Global linear regression computes a linear transformation matrix G YX = † , where the columns of Y are the training output-patches, y i , and the columns of X the corresponding input-patches, x i ; X † is an appropriate pseudo inverse of X. For an unseen input patch x, the estimate of the corresponding output patch is Gx.

Regression trees
The regression tree implements a piecewise linear regression over the space of input data points (Breiman, 2001;Criminisi and Shotton, 2013). Each internal node in the tree sends data points into left or right subtrees by thresholding one of J scalar functions of x, called features, F F , …, J 1 . Each leaf node contains a linear transformation with the same structure as the global linear transformation G defined above. Thus, for test input-patch x, the output estimate is G x t where G t is the linear transformation associated with the leaf node t at which the data point arrives after traversing the tree.
Regression-tree training estimates the optimal i) tree structure, i.e. which nodes exist and which are internal and leaf nodes, ii) choice of feature and feature threshold at each internal node, and iii) linear transformation at each leaf node. We use a greedy search here similar to Criminisi and Shotton (2013). To control for overfitting, the algorithm reserves half the training set, T, as a validation set V. It selects features, thresholds, and fits linear models using the remaining half of T and accepts only splits that reduce the residual error of V. The procedure starts by assigning all training points to a single root node and computing a global linear transformation. It then seeks the feature-and-threshold combination that partitions the data with maximum information gain is the information content of the root node, I L and I R that for the left and right child nodes, respectively, and For each candidate F j , a standard golden search locates the threshold τ j that maximises the information gain. We select the F j with maximum information gain and accept the split if it reduces the residual errors on V. Specifically, the test compares the residual errors from the root node j j or G R in the right node otherwise. We retain the split only if < LR P ; otherwise, the root node remains a leaf node. If the split at the root node does reduce errors on V, the training process continues and recursively splits the training data into finer subsets assigned to left and right child nodes until the residual error of V stops reducing.

Regression forests
Regression forests (Criminisi and Shotton, 2013) use multiple regression trees constructed from different bootstrap samples of the available training data. Outputs are element by element averages of the estimate from each tree weighted by the error covariance S t −1 (c.f. Eq. (3)) of the linear transformation G t , estimated during training.

IQT resolution enhancement
The IQT resolution-enhancement mapping takes as input a n n n (2 + 1) × (2 + 1) × (2 + 1) cubic patch of voxels, so that N n = (2 + 1) 1 3 , and outputs an m m m × × cubic patch of voxels, so that N m = 2 3 . The output patch is a cubic array of subvoxels corresponding to the central voxel of the input patch, as Fig. 1 (left) illustrates. Each voxel in each patch contains p p = 1 2 model parameters. For example, for DTI resolution enhancement with n=2 and m=2, each input vector, x, contains the p = 6 1 independent elements of the DT in each voxel of a 5 × 5 × 5 low-resolution patch and each output vector, y, contains the p = 6 2 elements of each DT in the 2 × 2 × 2 high resolution patch; the mapping is thus from 750  to 48  . Fourth-order MAP has p = 22 2 , leading to correspondingly higher dimensional mappings.
Training pairs come from downsampling each DWI by a factor of m in each dimension, fitting the model (e.g. DT or MAP) in each voxel of both the downsampled and full resolution image, and associating n (2 + 1) 3 patches in the downsampled image with the m 3 patch in the full resolution image corresponding to the central voxel of the lowresolution patch. For downsampling, here we simply take the mean DWI intensity in each block of m m m × × voxels, thus assuming an idealised point-spread function. The training data includes only patches wholly contained in the brain mask. We randomly subsample T from the pool of patch-pairs available from a set of training images without replacement. The HCP images each contain around 7.8 × 10 5 brain voxels on average, which leads to a pool of around 3×10 6 patch pairs (allowing overlapping patches) with m=2 and n=2 in a training set of 8 HCP subjects. We consider sampling rates of up to 1 in 2, which hits the limits of available computer memory and use similar training set sizes from the Lifespan and Monkey data sets.
For IQT models trained on ex-vivo monkey data, a simple correction makes image patches from the in-vivo data sets compatible during reconstruction. Specifically, since the average eigenvalues of DTs fitted to the Monkey b = 3000 smm −2 shell are approximately half of those from the HCP or Lifespan b = 1000 smm −2 shell, we halve all the elements of each DT in the input patch, estimate the output patch from the regression model, and double all elements of the output to recover compatibility with the original data.

IQT parameter mapping
IQT parameter mapping estimates one set of image contrasts from another so, in general, p p ≠ 1 2 . We assume here that the voxel grid of the output images is the same as the input, i.e. N m = = 1 2 , as in Fig. 1 (right). Thus to estimate NODDI or SMT parameters from DTI data, the mapping takes as input a n (2 + 1) 3 cubic patch of DTs, so that N n = (2 + 1) 1 3 and p = 6 1 . For NODDI, the mapping outputs all five parameters at the central voxel of the input patch, so that p = 5 2 ; for SMT it outputs λ , λ ⊥ , and H(p) so p = 3 2 . Training pairs come from fitting DT maps to just the b = 0 and 1000 smm −2 images (input) and matched NODDI or SMT maps fitted to full data sets (output).

Split features
For F F , …, J 1 , we use the following features of x in IQT-RF to define partitions of the training data in the trees for both applications: • The three eigenvalues of the DT in the central voxel.
• The linearity, planarity, isotropy (Westin et al., 1999) and trace of the DT in the central voxel.
• The means of each of the features above over the central 3 × 3 × 3 cube (if input patch size n ≥ 1 -see Fig. 1) and those over the whole n (2 + 1) 3 cube (if n ≥ 2).
Thus here J=7 if n=0, J=15 if n=1, and J=23 if n ≥ 2. We quantify the orientational variance using the principal eigenvalue of the mean dyadic tensor, as in Alexander and Barker (2005).
The set includes only orientationally and spatially invariant features to ensure generalisability among various types of training and test data. Additional features like absolute DT orientation and spatial location within the brain potentially enable the regression to build models specific to particular anatomic locations. Their inclusion leads to small reductions of error scores when training and testing within the HCP data set where the brain is consistently oriented within each image. However, they reduce the generalisability among data sets, because anatomical locations correspond less well between, say, monkey brains and HCP subjects than between different HCP subjects. Thus we exclude such features here.

Boundary effects
During IQT training, we use only patches wholly contained within the brain regions of the images. During testing, however, ignoring partial patches leaves a rim of unassigned values around the edge of the brain, which can erode a significant portion of the image for large patches. We avoid this effect by completing partial patches with the conditional mean of all missing entries in x, given the available entries, derived from a Gaussian model of the set of input patches in T. Specifically, we compute the mean x and covariance of the training input-patch library, where X is the matrix with columns x i , i T = 1, …, | | and X is the matrix of the same size as X, but with each column equal to x. These two quantities provide a multi-variate Gaussian model of the distribution of input patches. Equation 5.13 of Prince (2012) provides the conditional mean of a multivariate Gaussian distribution given a subset of components. We use that expression directly to replace the missing entries in boundary patches. Once completed in this way, we pass the completed x through the IQT mapping in the usual way. D.C. Alexander et al. NeuroImage 152 (2017) 283-298 Tractography Tractography experiments use the probabilistic tractography algorithm in Sotiropoulos et al. (2013) via the multi-shell multi-fibre reconstruction in Jbabdi et al. (2012). IQT tractography simply runs the same algorithm after resolution enhancement with IQT. Specifically, we fit the MAP coefficients to low resolution DWIs and enhance resolution using IQT to obtain high resolution MAP coefficient maps. Those high resolution maps provide DWIs following the HCP protocol, through Eq. (24) in Özarslan et al. (2013), which provides the input to the tractography package. IQT could work with any other tractography package in a similar way.

Error metrics
We use two simple error metrics to evaluate IQT performance. To compare DTI maps, we use the mean (over N S test set subjects) median (over the set Ω i ( ) in the i-th subject of brain voxels contained only in patches completely within the brain mask) root-mean-squared error (of the six independent DT elements D D , …, We refer to this metric as DT-RMSE. We have also computed a range of other performance metrics, including orientational difference, as in Alexander and Barker (2005), and normalised predicted signal error, as in Ning et al. (2015), but all produce similar trends and conclusions so we present only DT-RMSE.
To compare NODDI maps, we use the mean (over subjects) of the mean (over NODDI parameters), median (over voxels) absolute error, which we refer to as NODDI-MAE. Only the three scalar NODDI parameters, f ICVF , f ISO , and κ, contribute and we normalise κ to [0, 1], like f ICVF and f ISO .

Experiments and results
This section starts by demonstrating and evaluating IQT resolution enhancement, which we use to: a) highlight the advantage over standard operations such as interpolation, b) study the performance of IQT-GL and IQT-RF as a function of algorithm parameters, and c) demonstrate and evaluate generalisability beyond the training data. It then demonstrates the potential for parameter mapping from undersampled data sets using NODDI and SMT. We exemplify the downstream benefits of IQT resolution enhancement in tractography and, finally, verify results using independent test data (the Prisma data). Table 1 summarises the usage of the various data sets in each experiment and result. Fig. 2 demonstrates IQT image-resolution enhancement and its generalisability. The test image (top left) is a 1.25 mm resolution DT map not in any training set. We show the direction-encoded colour (DEC) map (Pajevic and Pierpaoli, 1999), which summarises the full DT map with DT fractional anisotropy (FA) (Basser and Pierpaoli, 1996) as intensity and principal orientation as colour (red for left-right, green for anterior-posterior, blue for inferior-superior). The corresponding low-resolution image (top middle-left) comes from block averaging DWIs prior to model fitting to obtain a 3.75 mm resolution DT map (27 times larger voxels). The remaining DEC maps all show 1.25 mm resolution DT maps obtained from the 3.75 mm resolution map by interpolation or IQT. The IQT transformations operate on the low resolution DT maps; all use m=3 and n=2. The figure compares reconstructions from IQT-GL and IQT-RF both trained on HCP data (row 3 left) and IQT-RF trained on Lifespan and Monkey data (row 3 right). The IQT-GL transformation uses a training set of T | | ≈ 1.6 × 10 6 patch-pairs drawn from the HCP1 training set, as does each of the 8 trees in HCP-trained IQT-RF. Lifespan-trained IQT-RF also has 8 trees with similar T | | to the HCP trees; Monkey-trained IQT-RF has 64 trees with T | | ≈ 0.2 × 10 6 so that the total number of training samples is similar to the HCP and Lifespan-trained forests. The choices correspond to the best operating points identified in later quantitative results ( Fig. 6(a)); the optima are similar for m=2 and m=3. Linear and cubic interpolation is on the downsampled DWIs prior to model-fitting.

Resolution enhancement
The difference maps in rows 2 and 4 are sums of absolute differences of DEC maps (before clipping) against ground truth ("Hires gold standard" in the figure). The bar chart (row 2 left) summarises later quantitative results ( Fig. 5 and Fig. 7(b)) showing mean reconstruction errors (DT-RMSE) over the eight subjects in the HCP1 test set.
IQT (e.g. third row middle-left) approximates the corresponding full-resolution image (top left) more closely than interpolation (e.g. top right). IQT provides sharper edges throughout the brain (e.g. pink arrows) than interpolation, producing less blurred images, and retains the high FA of the full resolution image, which interpolation artificially reduces. IQT also avoids artefactual "hotspots" (white arrows and pink arrows) that interpolation can cause (Dyrby et al., 2014). The pink arrows highlight IQT's ability to recover small structures obscured by partial volume effects at the resolution of the input data. Even IQT-GL improves substantially on interpolation, although the non-linear IQT-RF has additional benefits. For example, within homogeneous regions such as the corpus callosum (yellow ellipses), IQT-RF captures the smooth transition from red to yellow apparent in the "Hi-res gold standard", whereas IQT-GL produces blockier output. The difference maps highlight better detail at tract interfaces (e.g. cyan circles) with IQT-RF than IQT-GL. In general, IQT avoids the strong anatomical structure that appears in the difference maps from interpolation. Even given only training data from the Lifespan and Monkey data sets, IQT-RF shows clear advantages over interpolation. In particular, the IQT-RF Monkey image (row 3 right) shows remarkable consistency with the IQT-RF HCP image (row 3 left-middle) given that the former is synthesised only from patches of ex-vivo fixed monkey-brain images. Fig. 3 provides a more detailed comparison of IQT and interpolation. It compares DTI principal direction maps in the cortex from a fullresolution HCP image (not in the training set) with those from the same image downsampled to 2.5 mm resolution and subsequently interpolated and IQT-RF reconstructed 1.25 mm images. A sharp transition in fibre orientation occurs across the boundary between grey matter and white matter (inner black contour) from tangential to orthogonal to the boundary. The transition is only visible with sufficiently high image resolution (see e.g. Dyrby et al., 2011) and in   the figure is clear at 1.25 mm resolution but not at 2.5 mm (black arrows). IQT recovers the transition, as well as the directionality within the cortical ribbon, even though the low-resolution image contains little information in these areas, e.g. where indicated with black arrows. Interpolation fails to recover the transition as it cannot resolve the partial volume effects inherent in the low resolution input data, as discussed in Dyrby et al. (2014). In addition, IQT-RF recovers complex directional structure better in the white matter, particularly at interfaces, e.g. the area to the right of the white arrows. Supplementary Fig. S1 shows an equivalent to Fig. 3 without using boundary completion. The comparison reveals where boundary completion comes into play and confirms, qualitatively, efficacy of the approach. For example, at the crown of the gyrus containing the black arrow, the visual match of IQT-RF with the original high resolution image is closer than that from interpolation. Supplementary Fig. S2 shows the random forest that provides the reconstruction in Fig. 3 to give some insight into its operation. Fig. 4 demonstrates IQT NODDI and SMT parameter mapping from single-shell data. IQT training uses the HCP1 training set with n=1 and m=1 (best operating point from Fig. 6(b)), T | | = 1.6 × 10 6 , and 8 trees. The second column comes from inputting just the b = 0 and 1000 smm −2 images to the standard NODDI and SMT estimation tools, as used for the first column. The SMT algorithm uses random starting points for single-shell input.

Parameter mapping
As Zhang et al. (2012) and  predict, conventional estimation techniques ("Direct Fitting" in the figure) fail given singleshell input, producing strongly disrupted maps, e.g. noisy appearance of f ICVF and μFA. IQT-GL also performs poorly and fails to recover the structure of most parameter maps. The improvement IQT-RF provides is striking. It recovers the broad patterns of contrast between grey matter, white matter, and cerebro-spinal fluid in all parameters, albeit , the corresponding image after downsampling the DWIs to 3.75 mm isotropic resolution (top left-middle), those after reconstruction to full resolution via linear (top right-middle) and cubic (top right) interpolation prior to DT fitting, and after reconstruction using IQT (third row) trained on other HCP data sets (left), the Lifespan data set (right-middle), and fixed monkey-brain data (right). The second and fourth rows show difference maps against the "Hi-res gold standard". The bar chart on the second row compares average reconstruction errors over the 8 images in the HCP1 test setlower is better.
with less point-to-point variation (cyan ellipses) and weaker contrast (yellow ellipses) than the "Original".

Quantitative evaluation
Fig. 5 compares high-resolution DTI reconstruction errors (DT-RMSE from Eq. (4)) on the HCP1 test set for various algorithms as a function of per-tree training set size, T | |. As in Figs. 2 and 3, lowresolution input data comes from block averaging DWIs prior to model fitting and linear and cubic interpolation is on the downsampled DWIs prior to model-fitting. All quantitative results on resolution enhancement, in this and subsequent figures, use m=2 (factor of 8 difference in voxel volume), which is representative of practical situations. However, this is in contrast to the qualitative results in Fig. 2, which use m=3 (factor of 27 voxel volume difference) to emphasise visible differences in reconstruction quality better. In the IQT-RF data points, in all figures in this section, the number of trees is inversely proportional to the number of training samples per tree so that the total amount of training data is comparable; from left to right, the number of trees is 64, 32, 16, 8, 4. Fig. 5 shows that standard-deviation over subjects (error bars in the figure) is consistent among the different algorithms. However, the subject-specific error score is highly correlated among algorithms, because individual anatomical features, such as the size of the ventricles, strongly drive its absolute value. For this reason, standard deviation over training sets is more meaningful when comparing performance between algorithms than standard deviation over subjects. Later results (Fig. 7(b)) show that the standard deviation over training sets has a value of around 10 mm s −7 2 −1 , which is negligible on the scale of Fig. 5. This suggests that even small differences in error score, such as between IQT-OneTree (a single regression-tree) and IQT-RF at T | | ≈ 1.6 × 10 6 , are strongly significant at about 60 standard deviations. Similarly, the differences in performance between interpolation and IQT-GL, or IQT-GL and IQT-RF, are highly significant. Thus we conclude that, as the qualitative results in Fig. 2 suggest, all IQT implementations outperform linear and cubic interpolation, and non-linear random forests (IQT-RF) outperform non-linear decision trees (IQT-OneTree), which outperform global-linear regression (IQT-GL). The flexibility of the non-linear mappings captures significant extra detail. Moreover, as is common in learning applications, the consensus approach of the forest provides subtle but significant additional benefit over the single tree; this is both because it enables the learning to consider more training data, and because it mitigates the greedy tree-learning algorithm hitting local minima. Fig. 5 also shows that while IQT-GL benefits little from increasing T | |, and interpolation is obviously independent of T | |, IQT-RF performance increases steadily with T | |. Thus, IQT-RF favours a small number of complex trees, from large T | |, over a large number of smaller trees trained with less data. Fig. 6 plots reconstruction errors against T | | for various n to guide the choice of the parameter. Panel (a) is for IQT resolution enhancement and (b) for NODDI parameter mapping. For comparison to the minimum NODDI-MAE score for IQT-RF of 0.10 × 10 −2 (n=1, maximum T | |), the corresponding score for "Direct Fitting" (Fig. 4) is 1.81 × 10 −2 (18 times worse than IQT-RF), and the best case for IQT-GL (n=2, largest T | |) is 0.65 × 10 −2 (6 times worse than IQT-RF). We omit error bars in both panels for clarity, but they have similar magnitude to those in Fig. 5.
In resolution enhancement, performance of IQT-GL increases with n for sufficient T | |. This reflects the increase in linear-model complexity ); the number of parameters increases to 98,832 (=(7 × 6 + 1) × (2 × 6 3 3 )) for n=3. In contrast, the random forests find an appropriate level of complexity of the model for any n. However, computer memory limitations enforce a Fig. 4. IQT reconstruction of NODDI and SMT parameters from DTI; trained on HCP1. The left column shows gold standard NODDI (top three rows) and SMT (bottom three rows) parameter maps estimated using standard tools from the full data set (all 3 non-zero b-value shells) of the same HCP subject as Fig. 2. The second column shows corresponding maps estimated using standard tools from just the b = 0 images and b = 1000 smm −2 shell. The third and fourth columns show maps recovered by IQT-GL and IQT-RF, respectively, from just the b = 1000 smm −2 DTI. D.C. Alexander et al. NeuroImage 152 (2017) 283-298 trade-off between number of data points the training set can contain (maximum at small n) and the information content of each data point (maximum at large n). Fig. 6(a) shows that n=2 optimises that trade off.
Supplementary results in figure S3 explore the dependence of performance on noise level in the input data and orientation with respect to the training data (representing a change in head orientation). They show that performance degrades gracefully as noise increases and that performance is orientation dependent. Other results (not shown) reveal that reconstruction error reduces rapidly as the number of trees in the random forest increases from 1 to 3, but stabilises above 4; increasing the number of trees further, even for high T | |, provides little gain. Moreover, for fixed training set size, little improvement arises from increasing the number of source images in the HCP1 training set above its setting here of 8. Corresponding results for MAP-MRI resolution enhancement (not shown) show similar trends to those for DTI: forests outperform single trees, global linear transformations, and standard interpolation in terms of reconstruction error; n=2 is a good choice, although for MAP-MRI n=1 performs similarly to n=2 and we thus use n=1 for MAP-MRI driven tractography later, as the boundary completion is faster to run. Fig. 6(b) shows that n=1 optimises the neighbourhood size for IQT NODDI parameter estimation. The comparison to n=0 confirms that the neighbourhood information is important and improves substantially on a direct estimation of the NODDI parameters from each DT in isolation. Fig. 7 evaluates generalisability of IQT beyond its training set. Panel (a) evaluates the effect of choice of training set within the HCP data set, while (b) compares the performance of models trained on HCP, Lifespan, and Monkey data sets. Fig. 7(a) tests the hypothesis that IQT models generalise well across demographic groups, i.e. performance depends little on the choice of subjects in the training set. Results show that reconstruction errors on a test set of images from a very homogeneous group (HCP2: white, right-handed, females in their 20s) depend very little on whether the training set comes from other individuals in the same group, a set of individuals in a distinct homogeneous group (HCP3: black, lefthanded, males in their 30s), or a diverse group (HCP1: with all combinations of the characteristics in HCP2 and HCP3). In fact, training on HCP1 produces slightly lower errors than training on HCP2 on both test sets, highlighting that a precise match of demographics between the training and test sets has little importance. Error scores vary substantially among test sets, reflecting the relatively high variability over subjects, which, as discussed earlier, depends on gross anatomy, such as ventricle volume. Fig. 7(b) tests wider generalisability of IQT-RF by comparing DTI reconstruction errors for the HCP1 test set using IQT-RF trained on various different data sets. The variability of the error scores among the HCP Random 1-8 training sets (grey markers) allows us to evaluate the standard deviation over training set, which is of order 10 mm s −7 2 −1 . This illustrates the within-subject variability (in contrast to the betweensubject variability that the standard deviation over subjects reflects) that we refer to earlier in the discussion of Fig. 5 and confirms that performance depends only weakly on the choice of HCP training subjects. Although error scores from the Lifespan and Monkey data sets are outside the range of variability of the HCP training sets, so significantly worse statistically, both training sets still produce good approximations to the ground truth. Both produce much lower error scores than interpolation techniques: comparison of error scores with Fig. 5 shows that IQT-RF trained on Monkey data produces similar error scores to IQT-GL trained on HCP data (see also Fig. 2 inset). Fig. 7(b) also shows that for the Lifespan training sets, performance increases with T | | favouring a small number of complex trees, as with the HCP training sets. However, for the Monkey data sets, a larger  | |, and patch size, n. Panel (a) is for IQT resolution enhancement with m=2 using HCP1 training and test sets. Panel (b) is for IQT NODDI parameter estimation. The black markers and lines show error scores for models trained on HCP1 and grey those for HCP2 training; the test set in both cases is from HCP1. D.C. Alexander et al. NeuroImage 152 (2017) 283-298 number of simpler trees proves favourable; this promotes generalisability better for this training data, which is more distinct from the test data than the HCP or Lifespan training data.
Tractography Fig. 8 demonstrates IQT's direct benefits to tractography in ameliorating ambiguities that arise from limited spatial resolution. The top row replicates the finding in Sotiropoulos et al. (2013) that probabilistic tractography separates the four distinct pathways from the cortical hand area to the thalamus, brainstem, spinal cord and putamen (Haines, 2012) when using 1.25 mm resolution data sets. Fig. 4 of Sotiropoulos et al. (2013) also shows that tractography cannot separate the pathways using 1.5 mm or 2 mm resolution images. However, following IQT MAP-MRI resolution enhancement (n=1, m=2, HCP1 training data, T | | ≈ 1.6 × 10 6 , 8 trees in IQT-RF), probabilistic tractography recovers the four pathways even from 2.5 mm resolution input data. The finding is consistent over the HCP test subjects. In contrast, the same tractography procedure after linear or cubic interpolation misses at least one of the four pathways and produces various false positives (yellow arrows).

IQT on independent data sets
Figs. 9-11 demonstrate IQT-RF, trained on HCP data, on the independently acquired Prisma data sets. The key distinction with earlier results in Figs. 2 and 3, is that the low resolution data set is acquired at 2.5 mm resolution rather than obtained by the same downsampling procedure used in training. The results confirm efficacy of IQT-RF on realistic data. Specifically, IQT-RF sharpens weak structures in data from widely available scanners (Fig. 9), produces plausible NODDI parameter maps even from very sparse input data sets (Fig. 9), recovers structure at the white-matter-grey-matter boundary that interpolation cannot (Fig. 10), and enables tractography to separate thin pathways from data acquired at 2.5 mm isotropic resolution (Fig. 11). Fig. 9 illustrates DTI resolution enhancement and NODDI parameter mapping via IQT-RF on the Prisma data set from Subject 1. The increase in detail in the high resolution images obtained from IQT-RF is clear in many areas of the DEC maps, particularly after zooming in (rows 2 and 4). For example, the small red, blue, and yellow (top to bottom) structures marked with white arrows are clearly discernible in the IQT-RF maps, but not in the low-resolution 2.5 mm data, even after cubic interpolation of the source DWIs to 1.25 mm resolution. The figure also shows that performance degrades gracefully as the number of input DWIs decreases. The second column of the figure uses all 90 b = 1000 smm −2 DWIs to construct the DTI input to IQT-RF. Such a data set would require about 15 min to acquire on a modern clinical system. However, the third and fourth columns use only 30 and 6 DWIs, respectively, representing sparser 5 minute or 1 minute acquisitions. Very short acquisitions like these are common in clinical situations where subjects cannot stay still, e.g. infants (Counsell et al., 2003), or are in the middle of a surgical operation (Winston et al., 2012). The figure shows that even using very sparse input to IQT, the mappings we learn produce high resolution images with good consistency to those from less noisy input and to the pseudo-ground-truth. IQT-RF also produces NODDI maps that reflect the ground-truth contrast. IQT-RF preserves the pattern of contrast even with the thirty-direction input, although the maps become visibly disrupted with six-direction input. Fig. 10 compares cortical DTI principal direction maps in the 1.35 mm and 2.5 mm resolution data sets with those after interpolation and IQT-RF resolution-enhancement from 2.5 mm to 1.25 mm resolution. The figure illustrates areas where IQT-RF recovers the sharp tangential-orthogonal transition at the white-grey matter boundary that interpolation loses (black arrows and ellipse). It also shows areas that match the 1.35 mm pseudo ground-truth less well (yellow arrow and grey matter above black ellipse). In the white matter of the gyrus indicated by the yellow arrow, IQT-RF and interpolation both produce a slightly different orientation (more purple) to the 1.35 mm data (more green). However, the orientations in both reconstructions reflect the source 2.5 mm data, so the difference may arise from imperfect spatial alignment of the 1.35 mm and 2.5 mm data sets; small residual misalignments always remain even though state-of-the-art image registration (Anderson and Sotiropoulos, 2016) maps both data sets to MNI space. IQT-RF recovers the sharp tangential-orthogonal transition across the white-grey matter boundary on the left side of that sulcus, albeit displaced (again possibly arising from imperfect spatial alignment with the pseudo-ground truth), whereas interpolation does not recover the transition at all. The grey matter at the top of the sulcus (right of yellow arrow, above black ellipse) has complex structure in the 1.35 mm data set that is lost in the 2.5 mm data; neither Fig. 7. Quantitative evaluation of generalisability of IQT resolution enhancement both within the HCP data set (panel (a)) and across different data sets (panel (b)). Panel (a) plots errors for training and test sets of demographically distinct HCP sub-groups with m=2 and n=2. The same colours indicate the same test set (HCP1 or HCP2) and the same markers indicate the same training set (HCP1, HCP2, or HCP3). HCP1 contains diverse subjects while HCP2 and HCP3 are homogeneous but distinct. In the legend, "TrainXTestY" represents training on HCPX and testing on HCPY. Panel (b) shows reconstruction errors on the HCP1 test set for IQT-RF with m=2 and n=2 trained on Monkey data, Lifespan data, the HCP1 and HCP2 training sets, as well as the eight HCP-Random training sets. The error bars show one standard deviation over subjects.
D.C. Alexander et al. NeuroImage 152 (2017) 283-298 interpolation nor IQT-RF recover the full complexity of the region. IQT-RF relies on boundary completion in much of that area. Nevertheless, reassuringly, IQT-RF remains consistent with the lowresolution data producing similar output to interpolation. Fig. 11 shows probabilistic tractography from the hand area of the motor cortex using the full 1.35 mm data set, and 1.25 mm data sets reconstructed by interpolation and IQT-RF MAP-MRI resolution enhancement (n=1, m=2) from the 2.5 mm Prisma data sets using all 90 directions. Fig. 8. IQT-enhanced tractography. A: Probabilistic tractography maps (thresholded at 0.35%) for one HCP subject (not in any training set) overlaid on a group-mean T1weighted image. Tractography maps come from the original high resolution data set with seed region in the hand area of the motor cortex, together with corresponding maps using data sets reconstructed via linear and cubic interpolation, IQT-GL, and IQT-RF, after downsampling to 2.5 mm resolution (8 times larger voxels). The blue arrows point to four cortical projections that tractography identifies in the original data (from medial to lateral: cortico-thalamic, cortico-bulbar, cortico-spinal and cortico-striatal). B: Error and correlation scores between probabilistic tractography maps from the original high resolution data set and each reconstruction; bar-chart metrics are averages over the 8 test subjects from HCP1. Individual performance on all test subjects are root-meansquare errors and correlations normalised to the worst and best case, respectively, so the worst error score is 100 and the best correlation score is 1. D.C. Alexander et al. NeuroImage 152 (2017) 283-298 In both subjects, tractography separates the four cortical projections (as described in Fig. 8 and indicated by the blue arrows) from the pseudo ground-truth 1.35 mm data set. After interpolation of the 2.5 mm data set to 1.25 mm resolution, tractography cannot separate the four pathways, in particular missing the cortico-bulbar projection to the brainstem (2nd projection from the left), and produces several false positives. Tractography on the 1.25 mm IQT-RF data sets finds four pathways, recovering the cotico-bulbar projection. In subject 2, tractography on the IQT-RF 1.25 mm data set separates the three medial pathways even more clearly than the 1.35 mm data set. In both subjects, the position and shape of the most lateral (cortico-striatal) projection is somewhat disrupted in comparison to the 1.35 mm data set. However, the match between the IQT-RF tractography and the pseudo groundtruth is remarkable given that the former uses less than one sixth of the input data.

Discussion
We have introduced the notion of IQT, proposed two straightforward implementations, and demonstrated the potential to transfer information from a unique custom-built high-power imaging system to lower-quality data representative of widely available machines. The current implementations of IQT require only off-the-shelf machine learning tools and simple and standard data preprocessing. They require no image smoothing, denoising, spatial normalisation, or specialised acquisition sequence or protocol. Although training IQT-RF can require days of processor time, a straightforward image reconstruction with IQT-RF typically requires about 10 minutes of processor time (similar to DTI or MAP-MRI processing, but <5% of the standard NODDI computation), since it requires just one linear transformation per image patch. Boundary completion increases computation. However, both operations also parallelise naturally.
IQT resolution enhancement shows clear improvements over interpolation, which is the standard way to estimate sub-voxel information in imaging applications from visualisation by a radiologist to sophisticated post-processing procedures, such as tractography. IQT avoids artefacts commonly associated with interpolation (Dyrby et al., 2014), such as hot-spots/blurrring in Fig. 2 and the partial volume effects in Fig. 3, and recovers fine detail lost at low resolution. Although higher order interpolation can also reduce such artefacts to some extent (Dyrby et al., 2014), such methods all approximate high resolution voxels as averages over low-resolution neighbourhoods. IQT's strategy is fundamentally different and has the substantial advantage of exploiting knowledge of likely corresponding high resolution image structure. The technique should be beneficial in any application that requires sub-voxel inference.
Future work must compare performance of IQT's mappings learned from training data, with more recent interpolation techniques, such as sparse reconstruction or non-local means (Coupé et al., 2013;Manjon et al., 2010). Preliminary tests with the basic algorithm in Manjon et al. (2010) find best performance running it directly on the DT element maps, which gives DT-RMSE of 1.1 × 10 mm s −4 2 −1 on the HCP1 test set.
Thus, it scores in between cubic interpolation and IQT-GL; see bar chart in Fig. 2. The later technique (Coupé et al., 2013) tailored specifically for DWI will be interesting to compare once the code is freely available.
The generalisability that Fig. 2 and 7 illustrate is important, because it reveals the potential for IQT to enhance images distinct Fig. 10. Visualisation of DTI principal directions in and around the cortex in Prisma data, Subject 1, before and after resolution enhancement. Fig. 9. IQT-RF for the non-HCP data set from Subject 1. The top row (columns 2-4) shows DEC maps from the b = 1000 smm −2 shell of the 2.5 mm data set with all 90 directions (middle left), a subset of 30 of the 90 directions (middle right), and a subset of 6 directions (right). The top left panel shows the DEC map after cubic interpolation of the 90 DWIs used in the middle-left panel to 1.25 mm resolution. The third row compares DEC maps from the 1.35 mm pseudo ground-truth data set (left) with those from 1.25 mm DTI maps reconstructed by IQT-RF from the 2.5 mm DTI maps with varying numbers of input images in the corresponding columns of the top row. To show differences more clearly, rows 2 and 4 zoom in on the area marked by the white rectangle in the left-most image of row 3 in the corresponding image immediately above; the white arrows highlight structures visible in the IQT maps that are not discernible in the low-resolution images. The last three rows compare ground-truth NODDI parameter maps (left), fitted to the whole 1.35 mm data set, to parameter maps reconstructed by IQT from DTI fitted to 90 (and 21 b=0 images), 30 (plus 5 b=0), and 6 (plus 1 b=0) (middle left to right, respectively) directions from the b = 1000 smm −2 shell of the 1.35 mm data set. In each case, IQT-RF uses an eight-tree random forest learned from HCP1 with T | | ≈ 1.6 × 10 6 ; n=2 for DTI resolution enhancement, and n=1 for NODDI mapping.
D.C. Alexander et al. NeuroImage 152 (2017) 283-298 from the training set. Specifically, reconstruction of high-resolution DTI degrades remarkably little as we move through IQT trained on i) HCP data from very similar subjects to the test data; ii) HCP data from a distinct demographic; iii) data from a distinct acquisition protocol and very different demographic (the HCP Lifespan data set); and iv) data from fixed monkey brains. This shows the great potential of IQT to enhance any data set even when training data is not directly repre-sentative. Nevertheless, the robustness of performance in the presence of pathology remains a key question for healthcare applications. Future work must determine whether IQT can benefit cohort studies and whether training data from patients is necessary to do so. A related challenge is to quantify uncertainty in IQT output, to highlight areas of low confidence e.g. from lack of representation in the training data. The current implementations naturally provide such estimates, via Eq. (3), although future work must evaluate their precision. See Tanno et al. (2016) for preliminary work on IQT uncertainty quantification in the presence of pathology. More generally, future work could explore ideas from domain adaptation (Blitzer et al., 2006) to address the challenges of generalisability particularly in situations where direct training data is hard to come by. One natural question for future work on resolution-enhancement is whether IQT can predict even higher resolution images than its training data on the assumption of fractal properties of brain images, i.e. self-similarity across scale. This is reasonable over a limited range of scales, as, for example, we can expect similar partial volume effects to arise going from 1.25 mm to 2.5 mm voxels as from 0.625 mm to 1.25 mm. Fig. 12 shows an initial exploration of the idea. It shows 0.625 mm and 0.313 mm isotropic resolution DEC maps obtained via IQT-RF, trained with 1.25 mm images downsampled to 2.5 mm, but using the original 1.25 mm image as input for one and two iterations, respectively. The figure focusses on regions of interest in increasingly (top to bottom) lateral sagittal slices. The top row shows a slice near the midline of the brain and focusses on a region in the cerebellum. IQT-RF reveals the fine tree-like structure of the cerebellar white matter, e.g. the thin pathways ending at the white horizontal arrows, which are difficult to discern at the original resolution. The middle row illustrates clearer depiction after IQT-RF super-resolution of the fine interdigitating pathways in the brain stem at the level of the pons (blue and red area). IQT-RF infers sub-voxel crossing structures (e.g. white vertical upward arrow) and distinct subareas (white vertical downward arrows) obscured by partial volume effects at the original resolution. The bottom row shows that IQT-RF highlights the fine separating pathways in the lateral areas of the corpus callosum (white diagonal arrows) projecting to different gyri; the pattern is consistent with those in very high resolution ex-vivo DTI and polarised light imaging, e.g. (Mollink et al., 2016). Fig. S4 shows equivalent images obtained from cubic interpolation of each DWI prior to fitting the DT. The interpolated images do reveal some of the same hidden detail, but are more blurred. For example, the fine pathways of the cerebellum are less clear than from IQT-RF. Although these results are compelling, validation is challenging due to lack of ground truth. Future validation work with very high resolution data is necessary to confirm efficacy.
IQT-enhanced tractography reveals thin pathways in standard resolution data previously identifiable only in specialised high-resolution data. We concentrate here on just one well-known tractography task to illustrate the potential and further work will evaluate the benefits more generally. However, exploitation of IQT in this way promises major benefits in neuroanatomy Wedeen et al., 2012) and patient connectivity studies Warren et al., 2013), where tractography has provided fundamental advances in understanding. It offers similar benefits in neurosurgical planning e.g. (Winston et al., 2012), which relies increasingly on tractography, but with low-resolution data from limited acquisition time; Fig. 9 shows feasibility of IQT in such scenarios. Moreover, the benefits that IQT offers tractography are mostly complementary to other advances in tractography, such as global and microstructureinformed tractography (Sherbondy et al., 2010;Reisert et al., 2014;Daducci et al., 2015) or machine-learning based tractography (Neher et al., 2015). A recent evaluation of signal models in Ning et al. (2015) suggests that other signal representations can recover fibre orientations more accurately than MAP-MRI, so future work may examine the impact of the precise signal representation IQT uses and make choices that further enhance performance. Fig. 8 shows that occasionally (e.g. Fig. 11. Probabilistic tractography maps, overlaid on T1 maps in MNI space, on both subjects for various acquired and reconstructed data sets. D.C. Alexander et al. NeuroImage 152 (2017) 283-298 subject 7) tractography matches ground truth better after cubic interpolation than IQT-RF. This occurs despite the IQT-RF reconstructed MAP-MRI map always having lower error compared to ground truth than interpolation. In this particular case, IQT in the seed region relies on boundary completion, which reduces accuracy and disrupts the tractography process from the start. IQT parameter mapping gives tantalising results in recovering multi-shell microstructure-imaging parameter maps from single-shell input. The ability to estimate subtle microstructure parameters from single-shell DTI data potentially enables reanalysis of a wide range of historical brain-imaging studies to reveal more specific information on group differences or disease mechanisms. However, this estimation problem is extremely challenging and ill-posed, as the effects of neurite density and orientation dispersion are inseparable in isolated voxels from single-shell input data (Zhang et al., 2012). IQT manages to resolve much of the ambiguity by exploiting the local image (i.e. patch) structure. For example, within-voxel orientation dispersion from NODDI or SMT must have some consistency with the distribution of orientations in surrounding voxels; IQT-RF with non-trivial patch size learns such constraints implicitly. Nevertheless, the current implementation is not sufficient to replace the multi-shell acquisition. The ellipses in Fig. 4 highlight a loss of detail. Fig. 13 shows how this manifests in a group study: IQT-NODDI recovers the known trend across age-groups of NODDI's f ICVF parameter, which increases to middle age then decreases towards old age (Chang et al., 2015). However, the absolute values are biased upwards and the ranges are narrower. This reflects the fact that, even considering neighbourhood configurations, the mapping from DTI to NODDI is underdetermined so IQT-RF cannot capture subtle variations in parameter values. Despite these limitations, the current implementation may be sufficient to provide pilot data motivating reacquisition of existing single shell data sets using a multi-shell protocol. Alternative input parameters, such as crossing fibre orientations, T1 or T2-weighted intensity patterns, white-matter-grey-matter classification, or even the weak parameter estimates from the second column of Fig. 4, may improve performance. In general though, IQT parameter mapping may be more directly useful in stabilising estimates from very sparse, but multi-shell, data sets, as Golkov et al. (2016) and Fig. 9 explore. Future work will test the potential further.
Although the simplicity of the current IQT implementations is appealing, many other implementations are feasible and may improve performance. More precise simulation of low-quality from high-quality data to generate matched training pairs is certainly possible (Jog et al., 2015). Other learning algorithms, e.g. deep learning via convolutional neural networks (Dong et al., 2016), may capture more detailed and precise IQT mappings and may help specifically in learning mappings at the boundary to help tractography from cortical regions. Enforcing Fig. 12. "Super-resolution" DTI maps obtained from IQT-RF. The first column shows DEC maps from three sagittal slices at increasingly lateral positions (top to bottom) from the same data set used for Fig. 2 at the original 1.25 mm-isotropic resolution. The second column shows the regions of interest marked in the first column at the original 1.25 mm-isotropic resolution. The third and fourth columns show the same regions of interest in 0.625 mm and 0.313 mm isotropic resolution data sets obtained from IQT-RF with m=2 and n=2 after one and two iterations, respectively. Fig. 13. Box plots showing the trends across the different age groups in the Lifespan data set of the NODDI f ICVF parameter from standard NODDI fitting and IQT-RF estimation from DTI after training on HCP1 (8 training images, T | | ≈ 1.6 × 10 6 , 8 trees, n=1, m=1). The markers show the mean, median, standard deviation, and range over the white matter skeleton from the popular tract-based spatial-statistics method (Smith et al., 2006). compatibility of neighbouring patches, e.g. via a Markov random field or AutoContext (Tu and Bai, 2010), may help resolve remaining ambiguities. For example, close inspection of the 0.313 mm images in Fig. 12 reveals subtle artefacts, e.g. a blockiness to the image in the vicinity of the upward vertical arrow in the middle row, that arise when pushing the technique to its limit. Neighbourhood constraints and uncertainty evaluation can add further constraints to avoid such effects.
All these candidate IQT implementations support many other important image reconstruction and analysis challenges, such as: synthesising other image contrasts, e.g. estimating T2-weighted images from T1 (Jog et al., 2015;Ye et al., 2013), to reduce acquisition time in clinical studies, or estimating X-ray CT images from MRI (Burgos et al., 2015), to avoid irradiating patients in cancer-therapy planning; learning mappings among imaging protocols to reduce confounds in multicentre studies or studies that straddle scanner upgrades (Mirzaalian et al., 2016); or removing artefacts e.g. from signal drop-out or subject motion. The demonstrations we make here illustrate the potential of IQT on its own to bring the power of tomorrow's imaging techniques into today's clinical applications. However, even greater future potential lies in IQT's complementarity to rapid image acquisition strategies, such as compressed sensing (Lustig et al., 2007), MR fingerprinting (Ma et al., 2013), or simultaneous multislice (Moeller et al., 2010;Setsompop et al., 2012). The combination offers great promise in realising practical low-power MR, or other imaging, devices such as portable desktop, ambulance or battlefield MRI scanners (Cooley et al., 2015, or in intra-operative imaging applications with a very tight acquisition-time budget, e.g. (Winston et al., 2012). Longer-term, these ideas support a future medical-imaging paradigm exploiting coupled design of a) bespoke high-powered devices to capture databases of high quality images, and b) widely deployed cheap and/or low-power devices designed specifically to exploit the rich information from (a).