Brain microstructure by multi-modal MRI: Is the whole greater than the sum of its parts?

The MRI signal is dependent upon a number of sub-voxel properties of tissue, which makes it potentially able to detect changes occurring at a scale much smaller than the image resolution. This "microstructural imaging" has become one of the main branches of quantitative MRI. Despite the exciting promise of unique insight beyond the resolution of the acquired images, its widespread application is limited by the relatively modest ability of each microstructural imaging technique to distinguish between differing microscopic substrates. This is mainly due to the fact that MRI provides a very indirect measure of the tissue properties in which we are interested. A strategy to overcome this limitation lies in the combination of more than one technique, to exploit the relative contributions of differing physiological and pathological substrates to selected MRI contrasts. This forms the basis of multi-modal MRI, a broad concept that refers to many different ways of effectively combining information from more than one MRI contrast. This paper will review a range of methods that have been proposed to maximise the output of this combination, primarily falling into one of two approaches. The first one relies on data-driven methods, exploiting multivariate analysis tools able to capture overlapping and complementary information. The second approach, which we call "model-driven", aims at combining parameters extracted by existing biophysical or signal models to obtain new parameters, which are believed to be more accurate or more specific than the original ones. This paper will attempt to provide an overview of the advantages and limitations of these two philosophies.


Introduction
Magnetic resonance imaging (MRI) has made an unprecedented contribution to our understanding of the brain, thanks to its ability to take extremely detailed pictures of this organ non-invasively. As our understanding of the MR signal increased, and hardware development allowed us to push the boundaries further and further, a range of image contrasts, each reflecting different properties of the tissue, has become available.
This has prompted a shift from qualitative to quantitative MRI that represented a true revolution in the application of MRI for research (Tofts, 2003a), particularly with the development of techniques able to detect changes occurring at the microstructural level.
Most of these techniques have proven extremely sensitive to tissue abnormalities, albeit at the price of poor specificity. The MRI signal is a very indirect measure of the tissue properties we are interested in, and despite the influence that factors such as myelin content and axonal packing have on the contrast, the variety of factors that contribute to the overall signal prevents a one-to-one association between MRI biomarkers and biological substrate. In order to overcome this intrinsic limitation of MRI, increasingly sophisticated models of signal behaviour have been developed, in an attempt to link the MRI signal to specific tissue features such as axon diameter or permeability (e.g., Alexander et al., 2010; Coatleven et al., 2014; Kaden and Alexander, 2013). However, these applications remain associated with prohibitively long scan times and poor reproducibility.
How can we access non-invasive imaging biomarkers with improved specificity?
The answer lies in the versatility of MRI: by combining several MRI contrasts we can exploit the relative contributions of differing pathological substrates to selected MRI contrasts and substantially increase the sensitivity to specific substrates. A way of picturing this is by imagining that many tissue components have been "encoded" via different filters in each MRI technique. Multi-modal MRI is thus the way to decode them.
Multi-modal MRI is a broad concept that refers to any attempt to combine information coming from more than one MRI contrast. The possible approaches thus span from simply measuring several MRI parameters in the same individuals, to developing joint models, to using complex computational approaches to derive new measures.
In this paper we will review some examples of multi-modal imaging with the aim of identifying the advantages of this approach while highlighting at the same time the challenges and pitfalls associated with it.
The paper is organised as follows: first we will review the main components of brain tissue that we may want to characterise, and the MRI techniques that so far hold the most promise for achieving this goal. Next we will discuss the evidence that supports the complementarity of some of these techniques. Finally, we will review the most popular methods for the acquisition and analysis of multi-modal data.
What are we trying to measure?
The aim of microstructural imaging is to quantify the properties of tissue components, such as myelin, axons, dendrites, glia, and to characterise pathological features such as demyelination, inflammation, axonal loss. In other words, the ultimate goal of microstructural imaging is to be able to provide non-invasive histology. While the same principles apply to the study of white and grey matter, this paper will focus primarily on the former tissue. This is because most of the work done to date concerns the white matter, and models of the MRI signal in white matter are usually less ambiguous than those of the grey matter.
The white matter of the human brain is composed of tightly packed myelinated and non-myelinated axons and glial cells. The glial cells include oligodendrocytes, astrocytes, microglia, and oligodendrocyte progenitor cells (Walhovd et al., 2014). Pathology in the white matter thus consists mainly of demyelination, axonal degeneration and loss, and gliosis. In addition, iron, which is stored in the ferritin protein, tends to accumulate with age and neurodegenerative processes, although its concentration levels are higher in the grey matter (particularly in the basal ganglia) than in white matter (Connor et al., 1990; Hallgren and Sourander, 1958). Each of these abnormalities, alone or in combination, induces similar changes in MRI biomarkers, complicating their interpretation. Most MRI parameters tend to share some degree of variance, and disentangling these contributions is essential to understand the pathophysiology of neurological disorders and therefore to develop treatments. In addition to disease, measuring white matter changes is also relevant to understanding the mechanisms underpinning plastic changes occurring to the brain as a consequence of maturation, ageing, training and lifestyle (Kleim et al., 2002; Scholz et al., 2009; Zatorre et al., 2012).
How are we trying to measure it?
Here we will provide a brief overview of some basic concepts that may be needed to understand the following sections. While an extensive review of each technique is beyond the scope of this paper, interested readers can refer to the references provided below for more details. This list of techniques is not meant to be exhaustive: other MRI methods that offer insight into microstructure exist. Here we have included the most popular ones and also those that so far have been most consistently combined in a multi-modal fashion.

Diffusion MRI
The contrast in diffusion MRI (dMRI) arises from the interaction between the random motion of water molecules and the obstacles (constituted by membranes, organelles, cells, etc) they encounter within tissue (LeBihan, 1990). If such obstacles are not distributed uniformly, but rather form ordered "barriers" to diffusion, then diffusion becomes anisotropic. Diffusion tensor imaging (DTI) can model diffusion anisotropy, and allows a number of scalar indices to be derived, that can be used to characterise tissue microstructure (Basser et al., 1994). Fractional anisotropy (FA) has become one of the most popular MRI-derived indices in clinical studies, and it has been applied to the study of many neurological and psychiatric disorders (Pierpaoli et al., 1996). However, changes to anisotropy are difficult to interpret because different effects, such as myelin loss and axonal degeneration, could result in the same FA change (Beaulieu, 2009). A more comprehensive picture can be gained by looking at the eigenvalue changes at the same time, or at the so-called axial (AD) and radial (RD) diffusivities (Song et al., 2005). However, care must be taken, as axial and radial diffusivity may be meaningless in regions of crossing fibres (Wheeler-Kingshott and Cercignani, 2009). This is a consequence of the diffusion tensor being inadequate to describe diffusion in such a system. dMRI has evolved over the years to account for these problems, yielding more complex models, which can account for more than one fiber bundle per voxel, and even for multiple compartments. Many of these advanced models (e.g., diffusion kurtosis imaging (Jensen et al., 2005)) increase sensitivity to microstructural changes but still do not provide specific information about the nature of the detected changes. 
More direct measurements of specific features, such as axonal density or radius, can in principle be achieved by exploiting dMRI (Alexander et al., 2010), but they require prohibitively long acquisition times and specialised equipment.
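As a minimal illustration of the DTI-derived scalar indices mentioned above, the following sketch computes mean diffusivity (MD), FA, AD and RD from the three eigenvalues of a diffusion tensor. The function name and the example eigenvalues are illustrative assumptions and are not taken from any specific software package.

```python
import numpy as np

def dti_scalars(evals):
    """Return MD, FA, AD and RD from diffusion tensor eigenvalues.

    evals: array-like of shape (..., 3), in any order (e.g. in um^2/ms).
    """
    # Sort eigenvalues in descending order along the last axis.
    lam = np.sort(np.asarray(evals, dtype=float), axis=-1)[..., ::-1]
    md = lam.mean(axis=-1)
    num = np.sum((lam - md[..., None]) ** 2, axis=-1)
    den = np.sum(lam ** 2, axis=-1)
    # Guard against division by zero; FA is set to 0 for an all-zero tensor.
    fa = np.sqrt(1.5 * num / np.where(den > 0, den, 1.0))
    ad = lam[..., 0]                 # axial diffusivity: largest eigenvalue
    rd = lam[..., 1:].mean(axis=-1)  # radial diffusivity: mean of the other two
    return md, fa, ad, rd

# Hypothetical white matter voxel with one dominant diffusion direction.
md, fa, ad, rd = dti_scalars([1.7, 0.3, 0.3])
```

Because FA depends on all three eigenvalues, an increase in RD and a decrease in AD can both lower it, which is why AD and RD are often inspected alongside FA.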

Magnetization transfer imaging
Magnetization transfer (MT) is based on the exchange of magnetization occurring between groups of spins characterised by different molecular environments (Wolff and Balaban, 1989), namely those in free water and those bound to lipids and proteins. It is generally accepted that myelin dominates this process in the white matter. While MT imaging has been available for 3-4 decades now, the traditional approach to quantify it was based on the so-called MT ratio (MTR) (Dousset et al., 1992). MTR is the percentage difference of two images, one with off-resonance saturation (to which only macromolecular protons are sensitive) and one without. By increasing the number of acquisitions to 3, it becomes possible to separate the contributions of MT and T1, and therefore to reduce the impact of other factors, including the T1-shortening effects of iron, on image contrast (Helms et al., 2008). Analytical models of the MT-weighted signal exist (Henkelman et al., 1993), where each pool (A for the free pool, B for the bound pool) is characterised by its longitudinal relaxation rate (RA and RB), its transverse relaxation time (T2A and T2B), and its spin density (M0A and M0B). The exchange rate constant between the pools is R. Assuming that myelin is the main contributor to MT in white matter, the ratio M0B/M0A, known as the pool size ratio (PSR or F), is believed to reflect myelin content. Some authors prefer to use the bound pool fraction (BPF), which is given by M0B/(M0B + M0A). Several animal studies support this assumption (Ou et al., 2009b; Turati et al., 2015). MT is also known to be sensitive to inflammation and pH (Henkelman et al., 2001; Louie et al., 2009), and consistently it was recently suggested that MT parameters other than F might be sensitive to activated microglia or astrocytosis (Harrison et al., 2015) and metabolism (Giulietti et al., 2012). One of the limitations of MT is that typically the quantification of macromolecular protons is relative to the amount of liquid protons.
This of course makes it impossible to distinguish between cases of increased water (e.g., oedema) and decreased lipid-proteins (e.g., demyelination) (Stanisz et al., 2004). In addition, macromolecules other than myelin might affect MT measurements.

T1, T2, T2*, and T2' relaxometry
T1 and T2 are known to be extremely sensitive to white matter microstructure (Kucharczyk et al., 1994). The properties of myelin, in particular, cause the relaxation times of the water trapped within its layers to be much shorter than those of intra and extracellular water (MacKay et al., 1994). This can be exploited in multi-component relaxometry, also known as multi-exponential T2 (MET2). The technique involves sampling the signal at several echo-times, and estimating the spectrum of T2-values (MacKay et al., 2006), with each peak corresponding to a different water component. In a departure from the original technique, the multi-component driven equilibrium single pulse observation of T1 and T2 (mcDESPOT) approach (Deoni et al., 2003) allows whole brain myelin water fraction (MWF) maps to be obtained in under 10 min. One of the differences between mcDESPOT and MET2 is in the assumptions made about water exchange. While MET2 methods typically assume there is no exchange, such an assumption has been questioned, particularly in the presence of myelin thinning. Neglecting this term might lead to underestimating the myelin water fraction, and should be considered as a limitation of these methods.
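The multi-exponential T2 idea can be sketched as a non-negative least-squares (NNLS) fit of a multi-echo decay onto a grid of candidate T2 values, with the MWF taken as the fraction of signal below a T2 cutoff. This is a simplified illustration only: the coarse T2 grid, the 35 ms cutoff, the noise-free synthetic voxel and the absence of regularisation are all assumptions made here, whereas practical implementations typically regularise the fit and must cope with noise.

```python
import numpy as np
from scipy.optimize import nnls

def t2_spectrum(signal, te_ms, t2_grid_ms):
    """Non-negative amplitudes of a T2 spectrum fitted to a multi-echo decay."""
    # Each column of the basis is the decay curve of one candidate T2 value.
    basis = np.exp(-np.outer(te_ms, 1.0 / t2_grid_ms))
    amplitudes, _ = nnls(basis, signal)
    return amplitudes

def myelin_water_fraction(amplitudes, t2_grid_ms, cutoff_ms=35.0):
    """Fraction of the total signal with T2 below the (assumed) cutoff."""
    return amplitudes[t2_grid_ms < cutoff_ms].sum() / amplitudes.sum()

# Synthetic voxel: 15% myelin water (T2 = 20 ms) and 85% other water (T2 = 80 ms).
te = np.arange(1, 33) * 10.0              # 32 echoes with 10 ms spacing
t2_grid = np.geomspace(10.0, 640.0, 13)   # coarse log-spaced grid (illustrative)
signal = 0.15 * np.exp(-te / 20.0) + 0.85 * np.exp(-te / 80.0)

amps = t2_spectrum(signal, te, t2_grid)
mwf = myelin_water_fraction(amps, t2_grid)
```

Note that this model, like MET2 in general, contains no exchange term between the water pools.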
The signal in gradient echo sequences decays with a faster time constant than T2, due to the presence of local external magnetic field inhomogeneities causing an additional dephasing of the magnetization. This shorter decay time is known as T2*. T2* is related to any factor causing local susceptibility changes such as the presence of iron (Haacke et al., 2005).
R2' (= 1/T2'), defined as the difference between 1/T2* and 1/T2, should provide a more direct measure of field inhomogeneities and thus of iron (Ordidge et al., 1994). However, this parameter tends to be small in magnitude and often affected by noise, which limits its precision and its use.
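The relationship between T2, T2* and R2' can be illustrated with a short sketch that fits mono-exponential decay times to synthetic gradient-echo and spin-echo signals and then takes the difference of the two rates. The echo times, the noise-free signals and the function names are assumptions chosen for illustration.

```python
import numpy as np

def fit_decay_time(signal, te_ms):
    """Mono-exponential decay time (ms) from a log-linear least-squares fit."""
    slope, _ = np.polyfit(te_ms, np.log(signal), 1)
    return -1.0 / slope

def r2_prime(t2_star_ms, t2_ms):
    """R2' in s^-1, defined as 1/T2* - 1/T2 (inputs in ms)."""
    return 1000.0 / t2_star_ms - 1000.0 / t2_ms

te_gre = np.array([5.0, 10.0, 15.0, 20.0, 25.0])  # gradient-echo times (ms)
te_se = np.array([20.0, 40.0, 60.0, 80.0])        # spin-echo times (ms)
s_gre = 100.0 * np.exp(-te_gre / 40.0)            # simulated with T2* = 40 ms
s_se = 100.0 * np.exp(-te_se / 80.0)              # simulated with T2 = 80 ms

t2_star = fit_decay_time(s_gre, te_gre)
t2 = fit_decay_time(s_se, te_se)
r2p = r2_prime(t2_star, t2)
```

Since R2' is a difference between two estimated rates, the noise of both measurements propagates into it, consistent with the limited precision noted above.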
T2* contrast is also exploited in techniques such as blood-oxygenation level dependent (BOLD) contrast, arterial spin labelling, and susceptibility-weighted imaging.

Quantitative susceptibility mapping
Quantitative susceptibility mapping (QSM) is becoming increasingly popular. Magnetic susceptibility is related to iron content, myelin properties, fiber orientation and blood flow (Haacke et al., 2005), and thus QSM has great potential in the context of microstructural imaging. This technique aims at measuring quantitatively the local magnetic susceptibility independently from the sample orientation. The data are acquired typically using flow-compensated gradient echo images, with a set of parameters that depend on the tissue of interest (Haacke et al., 2015). The relevant information is contained in the phase of the MRI signal, and requires careful processing in order to be extracted. The processing includes phase-unwrapping (i.e. removing the discontinuities caused by the fact that the phase is defined between -π and π) and removal of background fields caused by bulk field inhomogeneities. Once the local magnetic field is estimated, estimating the magnetic susceptibility requires the inversion of an ill-posed problem, and only recently have appropriate reconstruction methods become available (e.g., Liu et al., 2009; Schweser et al., 2013; Shmueli et al., 2009). Thanks to these developments, and to the advent of ultra-high magnetic fields for human MRI, QSM has become feasible in vivo, and it has been applied to the study of iron distribution, demyelination and oxygen metabolism (Wang and Liu, 2015).
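To make the ill-posed inversion step more concrete, the sketch below simulates the field produced by a toy susceptibility distribution using the unit dipole kernel in k-space, and then inverts it with a simple thresholded k-space division. This is only one of several possible inversion strategies and is not necessarily the method used in the works cited above; the threshold value, the spherical phantom and all function names are assumptions, and phase unwrapping and background field removal are taken as already done.

```python
import numpy as np

def dipole_kernel(shape, b0_dir=(0.0, 0.0, 1.0)):
    """Unit dipole kernel in k-space, D(k) = 1/3 - (k.b0)^2 / |k|^2."""
    kx, ky, kz = np.meshgrid(*[np.fft.fftfreq(n) for n in shape], indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    kb = kx * b0_dir[0] + ky * b0_dir[1] + kz * b0_dir[2]
    kernel = np.zeros(shape)
    nonzero = k2 > 0
    kernel[nonzero] = 1.0 / 3.0 - kb[nonzero] ** 2 / k2[nonzero]
    return kernel  # D(0) is undefined and is set to 0 here

def forward_field(chi, kernel):
    """Relative field perturbation produced by a susceptibility map."""
    return np.real(np.fft.ifftn(kernel * np.fft.fftn(chi)))

def tkd_inversion(field, kernel, threshold=0.2):
    """Thresholded k-space division: |D| is clamped to at least `threshold`."""
    safe = np.where(np.abs(kernel) < threshold,
                    threshold * np.sign(kernel), kernel)
    inverse = np.zeros_like(kernel)
    usable = safe != 0
    inverse[usable] = 1.0 / safe[usable]
    return np.real(np.fft.ifftn(inverse * np.fft.fftn(field)))

# Toy phantom: a sphere of susceptibility 1 (arbitrary units) in a 32^3 volume.
n = 32
grid = np.indices((n, n, n)) - n // 2
chi_true = (np.sum(grid**2, axis=0) < 6**2).astype(float)

kernel = dipole_kernel(chi_true.shape)
field = forward_field(chi_true, kernel)
chi_est = tkd_inversion(field, kernel)
corr = np.corrcoef(chi_true.ravel(), chi_est.ravel())[0, 1]
```

The kernel vanishes on a cone in k-space, so plain division is unstable there; clamping the kernel trades some underestimation and streaking for stability.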

Proton density quantification
The proton density (PD) quantifies the amount of MR-visible protons contributing to the MRI signal and therefore it is related to the tissue water content. Water content variations are often associated with pathological processes such as oedema but also with maturation and ageing (Neeb et al., 2006). In addition, assuming that the MR-visible protons in the brain correspond to the "liquid" protons in free water, it has been suggested that the quantity (1 - water content) can be used as a measure of the macromolecular content, in the form of macromolecules and lipid tissue volume or MTV (Mezer et al., 2013). The MRI signal intensity is intrinsically proportional to PD through the equilibrium magnetization M0; exact quantification, however, is hampered by a number of confounding factors, including field inhomogeneities and receiver coil profile (Tofts, 2003b). Once M0 is known, a quantitative estimation of the water content requires some kind of calibration, in order to normalize PD values to a pure water standard. Thanks to the development of accurate methods to correct for the bias, recently PD quantification has gained momentum.
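The calibration step and the definition of MTV can be summarised in a few lines. The sketch assumes that the PD map has already been corrected for field and receive-coil bias and that a pure-water reference value is available (for example from cerebrospinal fluid); the numerical values are purely illustrative.

```python
import numpy as np

def water_content(pd_map, pd_water_reference):
    """Water content as PD normalised to a pure-water reference value."""
    ratio = np.asarray(pd_map, dtype=float) / pd_water_reference
    return np.clip(ratio, 0.0, 1.0)

def macromolecular_tissue_volume(pd_map, pd_water_reference):
    """MTV defined as 1 - water content."""
    return 1.0 - water_content(pd_map, pd_water_reference)

# Hypothetical bias-corrected PD values (arbitrary units) for three voxels.
pd_map = np.array([1000.0, 820.0, 700.0])
mtv = macromolecular_tissue_volume(pd_map, pd_water_reference=1000.0)
```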

1H MR spectroscopy
MR spectroscopy (MRS) measures the concentration of chemical compounds (known as metabolites) that contain hydrogen (1H). Other nuclei can be studied (e.g., phosphorus, sodium or fluorine), but here we will focus on 1H MRS. The physical principle behind MRS is "chemical shift", i.e. the difference in resonant frequency between each metabolite and water. This difference depends on the 'electron cloud' (Buonocore and Maddock, 2015), a term that refers to the field produced by the electrons surrounding the nucleus. Thanks to chemical shift, a spectrum showing the peak resonant frequency of each metabolite can be obtained using MR. The peak area is estimated as a measure of relative concentration. The metabolites of greatest interest are: N-acetyl-aspartate (NAA), which is seen only in neurons and axons and is believed to reflect both density and function of nervous cells; choline (Cho), a marker of membrane turnover, typically elevated in tumours; creatine (Cr), which is often used as a reference for quantifying other metabolites; myo-inositol (mI), a glial cell marker; and lastly glutamate + glutamine (Glx), lactate, and GABA. Absolute quantification of metabolite concentration is challenging, and thus often it is expressed as a ratio between the metabolite of interest and another one (typically Cr), which is assumed to remain stable in the condition under study. This is, however, not ideal, as changes to Cr have been observed for example in tumours (Hattingen et al., 2008; Howe et al., 2003).
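The use of peak areas and metabolite ratios can be illustrated on a synthetic spectrum. In this sketch each resonance is modelled as a single Lorentzian line at an approximate chemical shift (NAA near 2.01 ppm, Cr near 3.03 ppm, Cho near 3.20 ppm); the amplitudes, linewidth and integration window are assumptions, and in practice dedicated spectral fitting software is used rather than simple windowed integration.

```python
import numpy as np

def lorentzian(ppm, centre, amplitude, fwhm):
    """Lorentzian line with the given peak amplitude and full width at half maximum."""
    half = fwhm / 2.0
    return amplitude * half**2 / ((ppm - centre) ** 2 + half**2)

def peak_area(ppm, spectrum, centre, half_window=0.08):
    """Area under the spectrum within +/- half_window ppm of a peak centre."""
    mask = np.abs(ppm - centre) <= half_window
    step = ppm[1] - ppm[0]
    return spectrum[mask].sum() * step

ppm = np.linspace(0.5, 4.5, 4001)
spectrum = (lorentzian(ppm, 2.01, 1.5, 0.04)     # NAA
            + lorentzian(ppm, 3.03, 1.0, 0.04)   # Cr
            + lorentzian(ppm, 3.20, 0.6, 0.04))  # Cho

naa_cr = peak_area(ppm, spectrum, 2.01) / peak_area(ppm, spectrum, 3.03)
```

The estimated ratio is close to, but not exactly, the simulated amplitude ratio of 1.5, because the tail of the neighbouring Cho line leaks into the Cr window.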
Imaging using multiple modalities: overlapping or complementing information?
The possibility of measuring many different physical quantities non-invasively has an enormous potential for characterising biological changes in tissue, with the final goal of devising the appropriate combination of quantitative MRI parameters for diagnosing and monitoring neurological and psychiatric disorders. Although the most informative way of combining several MRI parameters is not immediately obvious, several examples found in the literature support the notion that some complementary information can be obtained when using more than one MRI technique. In the attempt to prove the specificity to myelin of the bound pool fraction (BPF) from quantitative MT (qMT) and of RD, Ou et al. (2009a) used retinal ischaemia as a model of axonal damage with no demyelination in control mice and of axonal damage with demyelination in shiverer mice (confirmed by immunohistochemistry). BPF and RD were significantly different between control and shiverer mice, but not between injured and uninjured eyes. By contrast, AD and relative anisotropy differed significantly between injured and uninjured eyes, but not between mouse strains. These data suggest that MT is selectively sensitive to demyelination, while dMRI could potentially be sensitive to both demyelination and axonal damage, although the well-known limitations of RD and AD must be taken into account (Wheeler-Kingshott and Cercignani, 2009). When qMT and dMRI were combined to characterise damage along the cortico-spinal tract of patients with benign multiple sclerosis (MS), it was consistently shown that MT-derived PSR was significantly different from that of controls, while FA was not. This mismatch was interpreted as indicative of extensive demyelination in the absence of axonal loss (Spano et al., 2010). A similar approach was followed by Narayanan et al. (2006), who combined quantitative MT (as a marker of myelin density) with NAA/Cr from 1H MRS (as a marker of axonal loss) to study the brain of patients with MS.
In this small sample, both BPF and NAA/Cr were found to be altered compared to healthy controls, but no correlation was found between the two, suggesting that axonal damage is not strictly related to demyelination outside of visible lesions. Coupled with examples from similar studies, these confirm that multiple contrasts can be complementary. Nevertheless, the aim of multi-modal imaging is to go beyond the acquisition of multiple modalities analysed "in parallel", and to combine the different parameters to obtain novel biomarkers, greater than the sum of their parts. By doing so, it should also become possible to account for the collinearity of several MRI measures. So, what is the optimal way of combining them, and what are the obstacles that may prevent us from achieving this goal?
How to acquire multiple modalities
The first challenge of a multi-modal MRI protocol is in the acquisition strategy. There are essentially two alternative approaches that can be followed. The first one is to independently acquire the modalities of interest, while the second one is to develop specialised acquisition sequences which allow weighting along more than one dimension (e.g., diffusion and T2), accompanied by analytical models able to disentangle the information provided by each single weighting.
The use of independent acquisition has the advantage of simplicity, with sequences often available as commercial products, already available quantitative models, and usually comparatively high signal-to-noise ratio (SNR). However, bringing together data from separate acquisitions is not without problems. This is especially relevant when the two modalities require different types of readout. dMRI is typically obtained using single-shot EPI, which suffers from geometric distortions (Jezzard and Balaban, 1995). As these effects are non-linear, simple image realignment or affine registration does not compensate for them, and sophisticated approaches are required to match them with data obtained from spin-warp acquisitions. An example of the impact this geometric mismatch can have on a multi-modal protocol is given by Mohammadi et al. (2015), who combined dMRI and MT to compute the g-ratio, i.e., the ratio of the inner to the outer axonal diameter (Stikov et al., 2015) (discussed below). Due to susceptibility distortions, some voxels of the corpus callosum showed an unrealistically high g-ratio of approximately 1. More convincing values are obtained after correcting for distortions (see Fig. 1).
Depending on the combination of parameters of interest, this issue might be addressed by serial acquisitions that share the same basic readout. For example, multi-parameter mapping (MPM) (Helms et al., 2009; Weiskopf et al., 2013) collects 3 multi-echo spoiled gradient-echo sequences with predominant T1-, PD-, and MT-weighting, respectively. The multiple echoes can be used to obtain estimates of T2*, but also for averaging to boost the SNR of each acquisition. From the T1- and PD-weighted scans it is straightforward to extract the amplitude of spoiled gradient echo (apparent PD) and T1. These quantities can then be used to derive the "MT saturation" (Helms et al., 2009), a phenomenological quantity that, albeit not absolute, reflects the density of macromolecular protons after removing the confounding effect of T1.
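The chain from the three weighted signals to the MT saturation can be sketched under one explicit assumption: that the spoiled gradient-echo signal follows the small flip angle approximation S = A·α·R1·TR / (α²/2 + δ + R1·TR), where A is the signal amplitude, α the flip angle, R1 = 1/T1 and δ the MT saturation (zero without the MT pulse). The expressions below follow algebraically from that assumed signal equation; the function names, flip angles, TR and tissue values are illustrative and are not taken from the cited protocols.

```python
import numpy as np

def spgr_signal(amplitude, alpha, tr, r1, delta=0.0):
    """Assumed small flip angle model of the spoiled gradient-echo signal."""
    return amplitude * alpha * r1 * tr / (alpha**2 / 2.0 + delta + r1 * tr)

def r1_and_amplitude(s_pd, s_t1, alpha_pd, alpha_t1, tr_pd, tr_t1):
    """Apparent R1 (s^-1) and amplitude A from the PD- and T1-weighted signals."""
    r1 = 0.5 * (s_t1 * alpha_t1 / tr_t1 - s_pd * alpha_pd / tr_pd) / (
        s_pd / alpha_pd - s_t1 / alpha_t1)
    amplitude = s_pd / alpha_pd + s_pd * alpha_pd / (2.0 * tr_pd * r1)
    return r1, amplitude

def mt_saturation(s_mt, alpha_mt, tr_mt, r1, amplitude):
    """MT saturation (delta) from the MT-weighted signal, given R1 and A."""
    return (amplitude * alpha_mt / s_mt - 1.0) * r1 * tr_mt - alpha_mt**2 / 2.0

# Simulate one voxel with hypothetical values: A = 1000, R1 = 1 s^-1, delta = 0.015.
alpha_pd, alpha_t1, alpha_mt = np.deg2rad([6.0, 21.0, 6.0])  # flip angles (rad)
tr = 0.025                                                   # repetition time (s)
s_pd = spgr_signal(1000.0, alpha_pd, tr, 1.0)
s_t1 = spgr_signal(1000.0, alpha_t1, tr, 1.0)
s_mt = spgr_signal(1000.0, alpha_mt, tr, 1.0, delta=0.015)

r1, amp = r1_and_amplitude(s_pd, s_t1, alpha_pd, alpha_t1, tr, tr)
mtsat = mt_saturation(s_mt, alpha_mt, tr, r1, amp)
```

Because R1 and A enter the last expression explicitly, the resulting δ no longer carries the T1 dependence that affects the simple MTR.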
The idea of combining multiple gradient-echo sampling (for T2* decay) with other weightings was first implemented for measuring T2 and T2* at the same time (and therefore T2') in the method originally called gradient echo sampling of FID and echo (GESFIDE) (Ma and Wehrli, 1996), and further developed into the gradient echo sampling of the spin echo (GESSE) by Yablonskiy and Haacke (1997). These are early examples of combined acquisitions that remove some of the problems associated with independent measurements.
When data are acquired for the purpose of estimating sub-voxel compartments, the geometric mismatch can have important consequences, as even small differences in the resolution of separate acquisitions can introduce bias, if images are differently interpolated (see Fig. 2). Joint acquisitions can address this issue. These acquisitions also tend to be more time-efficient than independent ones, although the complexity of the mathematical models might impose some constraints on signal-to-noise ratio (SNR). Examples were introduced already in the late 90s, although restricted to in vitro experiments, due to the required scan time.
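The reason why the order of interpolation and model fitting matters can be shown with a deliberately simple stand-in for a multi-compartment model: a mono-exponential T2 fit. Interpolating the raw signals of two neighbouring voxels and then fitting does not give the same result as fitting each voxel and then interpolating the parameter, because the model is non-linear in its parameters. The T2 values, echo times and 50/50 weighting are assumptions made for illustration.

```python
import numpy as np

def fit_t2(signal, te_ms):
    """Mono-exponential T2 (ms) from a log-linear least-squares fit."""
    slope, _ = np.polyfit(te_ms, np.log(signal), 1)
    return -1.0 / slope

te = np.arange(20.0, 161.0, 20.0)   # 8 echo times from 20 to 160 ms
t2_a, t2_b = 40.0, 120.0            # two neighbouring voxels (hypothetical)
sig_a = np.exp(-te / t2_a)
sig_b = np.exp(-te / t2_b)

# Option 1: interpolate the raw signals (here a 50/50 average), then fit.
t2_interp_then_fit = fit_t2(0.5 * (sig_a + sig_b), te)

# Option 2: fit each voxel separately, then interpolate the parameter maps.
t2_fit_then_interp = 0.5 * (fit_t2(sig_a, te) + fit_t2(sig_b, te))

bias_ms = t2_interp_then_fit - t2_fit_then_interp
```

The two options differ by several milliseconds in this toy case, and the size of the discrepancy depends on how different the neighbouring voxels are.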
Early studies in excised tissue (Andrews et al., 2006; Peled et al., 1999) attempted to establish the relationship between T2-species and diffusion behaviour using some variant of the diffusion-weighted (DW) Carr-Purcell-Meiboom-Gill (CPMG) sequence. This acquisition consists of a standard pulsed-gradient spin-echo (PGSE) preparation (van Dusschoten et al., 1996) followed by a CPMG train of 180° pulses. An echo is collected after each refocusing pulse, thus mapping the effects of T2 decay. By altering the amplitude and the direction of the diffusion gradients, it is possible to modulate the amount of diffusion weighting and to evaluate anisotropy. Although a 2D-spectrum could in principle be obtained through 2D inverse Laplace transform, in practice this is an ill-posed problem. Alternatively, a T2-spectrum can be obtained for each diffusion weighting separately, enabling the apparent diffusion coefficient (ADC) of the water compartments corresponding to each T2 peak to be studied. Andrews et al. (2006) further modified this approach by adding a double inversion recovery (DIR) preparation to the DW-CPMG. This allowed them to selectively suppress the signal from non-myelin components exploiting T1 compartmentalisation instead of T2. The results obtained in these separate experiments were fairly inconsistent, particularly with respect to anisotropy: some studies support the notion that the shortest T2 component (identified with myelin water) is strongly anisotropic in diffusion (Andrews et al., 2006), while others do not. One of the possible reasons for the incongruence is in the choice of the total number of T2 peaks to be modelled, which differed among these studies. An additional complication is constituted by the fact that T2 compartments do not necessarily correspond to diffusion compartments. For example, while intra- and extra-cellular water compartments are expected to have differing diffusion coefficients and behaviour (Assaf and Basser, 2005; Stanisz et al., 1997), it is still under debate whether their T2 can be distinguished (Bjarnason et al., 2005; Whittall et al., 1997). Therefore characterising the diffusion properties of a specific T2 peak might not have a straightforward interpretation.

Fig. 1. Example of susceptibility-induced geometric distortions in the single-shell dMRI data and their effects on the estimated MR-based g-ratio map. The MR g-ratio and contrast-inverted b = 0 maps (ib0) from the original (A,B,F,G) and susceptibility-distortion corrected dMRI data (C,D,H,I) of a representative subject were compared to the subject's MT map (E,J), which did not suffer from susceptibility artifacts. The spatial mismatch between anatomical structures in the single-shell dMRI and MT data (see contours in red) was strongly reduced after susceptibility correction. The susceptibility-related mismatches between uncorrected dMRI and MT maps led to a severe locally varying bias in the g-ratio maps [e.g., crosshair highlights one of the voxels with an unrealistic g ≈ 1 at the edge of the genu].
Despite these limitations, the approach remains valid in principle, and attempts to measure diffusion and T2 at the same time have recently gained momentum, after the suggestion that DTI parameters might be TE-dependent (Qin et al., 2009). While this might be simply explained by SNR considerations (i.e., the difference in DTI parameters is caused by a bias due to decreasing SNR at larger TEs), the most intriguing interpretation is that TE determines the relative contribution of each water compartment to the global signal. This view, however, suggests that a comprehensive description of water compartments can only be gained by developing joint models that take into account diffusion and T2 behaviour at the same time. Advances in hardware enable increasingly shorter TEs to be used in conjunction with diffusion weighting (Fan et al., 2016). Using such a system, Tax et al. (2017) obtained a dataset with b values ranging from 500 to 7000 s/mm², and TE ranging from 47 to 127 ms. Their data confirmed a dependency of DTI indices on TE, and suggested that the combined diffusion and T2 spectrum might be resolved in the human brain, providing novel information about water compartments in white matter. Novel computational approaches able to resolve 2D spectra have also been proposed (Benjamini and Basser, 2016).
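The compartmental interpretation of the TE dependence can be illustrated with a toy two-compartment signal in which each compartment has its own diffusivity and T2. The compartment fractions, diffusivities and T2 values below are hypothetical, chosen only to show the direction of the effect; b is expressed in ms/µm² (1 ms/µm² = 1000 s/mm²) and diffusivities in µm²/ms.

```python
import numpy as np

def two_compartment_signal(b, te_ms, fractions, diffusivities, t2s_ms):
    """S(b, TE) = sum_i f_i * exp(-b * D_i) * exp(-TE / T2_i)."""
    f = np.asarray(fractions, dtype=float)
    d = np.asarray(diffusivities, dtype=float)
    t2 = np.asarray(t2s_ms, dtype=float)
    return float(np.sum(f * np.exp(-b * d) * np.exp(-te_ms / t2)))

def apparent_adc(te_ms, b_low=0.0, b_high=1.0, **tissue):
    """Apparent diffusion coefficient from two b-values at a fixed TE."""
    s_low = two_compartment_signal(b_low, te_ms, **tissue)
    s_high = two_compartment_signal(b_high, te_ms, **tissue)
    return np.log(s_low / s_high) / (b_high - b_low)

# Compartment 1: slow diffusion, long T2. Compartment 2: fast diffusion, short T2.
tissue = dict(fractions=[0.5, 0.5], diffusivities=[0.5, 1.5], t2s_ms=[90.0, 60.0])

adc_short_te = apparent_adc(47.0, **tissue)
adc_long_te = apparent_adc(127.0, **tissue)
```

With these assumed values, the long-T2, slowly diffusing compartment contributes relatively more signal at the longer TE, so the apparent ADC decreases with TE even though no tissue property has changed.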
Another interesting example of combined acquisitions was proposed by De Santis et al. (2016), and combines inversion recovery (IR) with dMRI. This sequence was developed for the purpose of mapping T1 (exploiting the IR preparation) along specific white matter tracts, even when fibers cross within a single voxel (exploiting dMRI). The feasibility of this approach in vivo was demonstrated, although the scan time remains too long for clinical translation. With future advances in hardware and sequence design, similar schemes are likely to become more manageable and open the possibility of incorporating other contrasts in a single acquisition.

How to combine multiple modalities
The strategies used to combine MRI techniques can be broadly classified into 2 categories: data-driven approaches, which rely on multivariate and/or machine learning methods; and model-driven approaches, which attempt to combine parameters extracted by existing biophysical or signal models to obtain new parameters, which are believed to be more accurate or more specific to a given substrate than the original ones.

Data driven methods
The most informative way of combining MRI parameters is not immediately obvious. A simple but effective way is using linear regression or similar methods. The interdependence between MRI parameters in this case becomes useful and can be exploited to maximise the amount of information derived from multi-parametric protocols. One parameter that is sensitive to several biological substrates, such as T1, can be modelled as a linear combination of other MR parameters, each used as a surrogate of one or more of these substrates. The unexplained, or residual, variance is then assumed to measure the tissue component which was not modelled by any of the surrogates. A few examples of successful application of linear regression can be found in the literature. Ciccarelli

Fig. 2. Effects of resampling raw data to fit with multi-compartment models. These images show the percentage difference in the estimated NODDI parameters when the raw data are downsampled (top row) or upsampled (bottom row) before performing the fitting. The resulting maps were compared with maps that were resampled after the fitting. The original voxel size was 2.5 × 2.5 × 2.5 mm³. While these effects are small, it is conceivable that combining 2 or more multi-component models that undergo different degrees of resampling might introduce non-negligible errors.

M. Cercignani, S. Bouyagoub NeuroImage 182 (2018)

These examples highlight one of the potential problems with MRI biomarkers, namely their collinearity. T1 tends to correlate with T2 and MT-derived quantities; T2* and MT might share some variance: overall there is some overlap between the quantities that we are hoping to use to measure differing underlying pathology. Some of these data-driven approaches attempt to remove this collinearity, and to isolate the unique contribution of each specific technique.
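The regression strategy described in this section, in which one parameter is modelled as a linear combination of surrogate parameters and the residual is taken as a measure of what the surrogates do not explain, can be sketched on synthetic data. Everything in this example is hypothetical: the latent "substrates", the weights linking them to the simulated MRI parameters, and the choice of R1 as the parameter being modelled.

```python
import numpy as np

def residualise(target, surrogates):
    """Residual of `target` after linear regression on the surrogate maps."""
    design = np.column_stack([np.ones(len(target))] + list(surrogates))
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coeffs, coeffs

rng = np.random.default_rng(0)
n_vox = 2000
myelin = rng.normal(size=n_vox)   # hypothetical latent substrates
iron = rng.normal(size=n_vox)
other = rng.normal(size=n_vox)    # substrate with no dedicated surrogate

mt = myelin + 0.1 * rng.normal(size=n_vox)      # surrogate for myelin
r2_star = iron + 0.1 * rng.normal(size=n_vox)   # surrogate for iron
r1 = 0.6 * myelin + 0.3 * iron + 0.4 * other    # sensitive to all three

residual, _ = residualise(r1, [mt, r2_star])
corr_other = np.corrcoef(residual, other)[0, 1]
corr_mt = np.corrcoef(residual, mt)[0, 1]
```

By construction the residual is uncorrelated with the surrogates, and in this idealised setting it tracks the unmodelled component; with real data it also absorbs measurement noise and any non-linear dependence.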
More sophisticated approaches exploit multivariate methods, which provide tools able to reduce the dimensionality of the data and to extract from them some "latent variables" that better represent the characteristics of the object under study. Examples include principal component analysis (PCA), independent component analysis (ICA), and factor analysis, all of which re-express the data into a series of components obtained as linear combinations of the original observations. The difference between these three methods is in the criteria used to define these linear combinations. Two or more MRI modalities might be differently sensitive to several microscopic properties of tissue at the same time. Applying a data reduction approach might help to identify the common "latent" source of contrast, ideally related to a specific substrate. A nice example is provided by the multivariate myelin estimation model (MMEM) proposed by Mangeat et al. (2015). Assuming that T2* and MTR are both sensitive to myelin content, but also affected by other factors such as iron content and tissue orientation (T2*) or inflammation and pH (MTR), they combined them using ICA to identify their shared information, assumed to reflect only myelin density in the human cortex (Fig. 3). These methods are relatively simple to implement and potentially useful for MRI modality combination. However, it must be noted that they are unable to distinguish between the intrinsic variability of a parameter due to the underlying microstructure, and the variability dependent on measurement error and image inhomogeneity. In addition, the interpretation of the resulting components is not always straightforward, and in some cases the latent variables may remain elusive in their meaning.
Fig. 3. Example of data-driven multi-modal application: multivariate myelin estimation model (MMEM). MMEM aimed to estimate a cortical myelin map using MTR, T2*, cortical thickness (CT) and B0 orientation maps. The MMEM was divided into two steps. Firstly, two maps were estimated using multi-linear regressions: one using MTR, CT and B0 orientation (ME_MTR) and one using T2*, CT and B0 orientation (ME_T2*). ME_MTR and ME_T2* maps represent myelin-correlated values corrected for partial volume effect and fiber orientation. In order to merge MTR and T2* within the same framework, both linear regressions were performed with a common dependent variable (BMM). Secondly, the shared information between ME_MTR and ME_T2* was extracted using ICA decomposition, for each subject. The ICA decomposed the signal into two components that are mathematically independent. The 'so-called' first component of the ICA was the source that shares the highest variance between ME_MTR and ME_T2*, the hypothesis being that the first component of the ICA was an indicator for myelin content. Reproduced with permission from Mangeat et al. (2015). Copyright 2015 Elsevier Inc.
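As a rough illustration of how a data-reduction step can recover a shared "latent" source from collinear contrasts, the sketch below applies PCA (via a singular value decomposition) to three synthetic voxel-wise maps. It is not the MMEM pipeline, which uses regression followed by ICA; the map names, mixing weights and noise levels are hypothetical values chosen only for the demonstration.

```python
import numpy as np

def pca_latent(maps):
    """PCA of an (n_voxels, n_modalities) array via SVD.

    Returns component scores, loadings (rows = components) and the
    fraction of variance explained by each component.
    """
    # Standardise each modality so that units do not drive the result.
    z = (maps - maps.mean(axis=0)) / maps.std(axis=0)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s
    explained = s**2 / np.sum(s**2)
    return scores, vt, explained

rng = np.random.default_rng(0)
n_voxels = 5000

# Hypothetical shared source (e.g. a myelin-like property) plus
# modality-specific variability; all weights are illustrative only.
latent = rng.normal(size=n_voxels)
mtr = 0.8 * latent + 0.3 * rng.normal(size=n_voxels)
r2_star = 0.6 * latent + 0.5 * rng.normal(size=n_voxels)
r1 = 0.7 * latent + 0.4 * rng.normal(size=n_voxels)

maps = np.column_stack([mtr, r2_star, r1])
scores, loadings, explained = pca_latent(maps)

# How well does the first component track the simulated shared source?
corr_with_latent = abs(np.corrcoef(scores[:, 0], latent)[0, 1])
print("variance explained:", np.round(explained, 3))
print("|corr(PC1, latent)|:", round(corr_with_latent, 3))
```

In this toy setting the first component tracks the simulated source closely, but it also absorbs part of the modality-specific noise, which mirrors the limitation noted above: the decomposition cannot by itself separate microstructural variability from measurement error.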
Machine learning (ML) approaches constitute a more advanced class of computational methods, which can be used to associate a combination of MRI techniques with a range of microstructural features (Lemm et al., 2011). ML methods rely on training an algorithm (most commonly some kind of classifier) to identify the features of interest using real data (Ashburner and Kloppel, 2011). Once trained, the classifier can be used on previously unseen data. Although there are no published examples to date, in the context of multimodal imaging, animal or post-mortem data could be used to associate a combination of MRI parameters with a specific tissue substrate validated with histology, and then translated into clinical applications. One of the limitations of ML algorithms is that they require a very large number of observations in order to produce reliable associations (i.e., in order to be appropriately trained). This might not always be possible in the context of biological samples.

Model-driven methods
This family of methods differs from multivariate models because it attempts to combine parameters extracted by existing biophysical or signal models to obtain new parameters, which are believed to be more accurate or more specific than the original ones. Biophysical models refer to those that explain the MR signal as a function of biological properties, for example axon diameter distribution (Alexander et al., 2010; Assaf et al., 2008), whereas signal models explain the MR signal using mathematical or statistical properties, for example diffusion kurtosis (Jensen et al., 2005).
A simple example of this kind is driven equilibrium single pulse observation of T1 and T2 (DESPOT1 and DESPOT2), a method for quantifying T1 and T2 based on steady-state free precession (SSFP) sequences (Deoni et al., 2003). The SSFP signal equation is a function of both T1 and T2, as both longitudinal and transverse magnetization are brought into dynamic equilibrium through the application of repeated RF pulses (Young et al., 1986). In order to disentangle T1 and T2, an independent measure of either one is required. T1 is therefore estimated using spoiled gradient echo at variable flip angles (Bluml et al., 1993), enabling T2 to be extracted. The method can be generalised to assume multiple water components, characterised by separate relaxation times (Deoni and Kolind, 2015). The multi-component version (mcDESPOT) yields maps of the fractions of myelin water as well as of intra- and extra-cellular water (which cannot be distinguished using this method), and has been used in multiple studies to characterise myelination and other microstructural properties of tissue (Combes et al., 2017; Deoni et al., 2012; Kitzler et al., 2012; Kolind et al., 2013).
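The T1 estimation step from spoiled gradient echo data at variable flip angles can be sketched with the commonly used linearised fit. The example assumes the standard steady-state spoiled gradient echo signal equation, perfect spoiling and an accurately known flip angle (no B1 correction). The function names and the acquisition values are hypothetical, and this is not necessarily the implementation used in the cited work.

```python
import numpy as np

def spgr_signal(m0, t1, tr, flip_rad):
    """Steady-state spoiled gradient echo signal (T2* decay ignored)."""
    e1 = np.exp(-tr / t1)
    return m0 * np.sin(flip_rad) * (1.0 - e1) / (1.0 - e1 * np.cos(flip_rad))

def fit_t1_vfa(signals, flip_rad, tr):
    """Linearised variable flip angle fit.

    Rearranging the signal equation gives
        S / sin(a) = E1 * S / tan(a) + M0 * (1 - E1),
    so a straight-line fit yields E1 (slope) and M0 (from the intercept).
    """
    y = signals / np.sin(flip_rad)
    x = signals / np.tan(flip_rad)
    slope, intercept = np.polyfit(x, y, 1)
    t1 = -tr / np.log(slope)
    m0 = intercept / (1.0 - slope)
    return t1, m0

# Illustrative, noise-free example (values are assumptions, in seconds).
tr = 0.015
flip_angles = np.deg2rad([3.0, 8.0, 13.0, 18.0])
signals = spgr_signal(m0=1000.0, t1=1.0, tr=tr, flip_rad=flip_angles)

t1_hat, m0_hat = fit_t1_vfa(signals, flip_angles, tr)
print(f"T1 = {t1_hat:.4f} s, M0 = {m0_hat:.1f}")
```

With the T1 map in hand, the SSFP signal depends on only one remaining relaxation time, which is what allows T2 to be extracted in the second step.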
The sensitivity of relaxometry to myelin can be further exploited by combining these techniques with dMRI. While dMRI is highly sensitive to tissue geometry and integrity, the long echo times typically required to achieve sufficient diffusion weighting result in no signal contribution from the fast decaying myelin component. In principle the complementarity of the two techniques can be exploited to obtain separate estimates of the volumes of myelin, extra-cellular and intra-cellular spaces. Their complementarity derives from the fact that multi-compartment models of dMRI can easily separate the intra-cellular and the extra-cellular volume fractions, but typically are not sensitive to myelin, while mcDESPOT does not distinguish intra- and extra-cellular spaces and measures their volume fractions as a combined sum. Bouyagoub et al. (2017) proposed a simple model that requires the separate acquisition of mcDESPOT and neurite orientation dispersion and density imaging (NODDI; Zhang et al., 2012) to yield separate maps of the intra-cellular and extra-cellular volume fractions along with myelin maps (Fig. 4).
Fig. 4. Model-driven example of multi-modal approach. By exploiting the sensitivity of NODDI and mcDESPOT to different water compartments (top), it is possible to derive volume fraction maps for intracellular, extracellular, CSF and myelin water (Bouyagoub et al., 2017).
As discussed in the joint acquisition section, multi-compartment models of diffusion such as NODDI provide estimates of tissue fraction that are relaxation-weighted. This means that acquisition parameters such as TE might affect the results, but also that in the presence of abnormal T2 values, the contribution of T2 and diffusion changes to abnormal volume fractions cannot be disentangled. Even in the healthy brain, CSF, which constitutes the bulk of isotropic diffusion volume fraction, has much longer T2 than parenchyma, potentially affecting the estimation of the isotropic diffusion component in white matter. Combining T2 and diffusion in a single acquisition is feasible, as discussed above, but challenging. An alternative option is to independently reconstruct the corresponding T2 spectrum, and then to feed these values into the diffusion model. It was recently shown that combining mcDESPOT, which provides estimates of intra/extra-cellular T2 and CSF T2, and NODDI allows the latter to be adjusted to incorporate T2 values, accounting for different T2 from different compartments and thus removing the bias (Bouyagoub et al., 2016). See Fig. 5 for an example.
The local magnetic susceptibility is known to be affected by the orientation of white matter fibers with respect to the main magnetic field. Other MRI contrasts that reflect iron and myelin characteristics might have some dependency on the orientation of microstructure. Combining these techniques with DTI enables the orientation-dependency of these parameters to be investigated and modelled. This has been done for QSM, relaxometry (Gil et al., 2016) and the macromolecular pool absorption lineshape in MT (Pampel et al., 2015).
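The effect of relaxation weighting on diffusion-derived compartment fractions, and the kind of correction that independent T2 estimates make possible, can be illustrated with a minimal sketch. It assumes mono-exponential T2 decay in each compartment, no exchange between compartments and identical T1 weighting. The function name and all numerical values are hypothetical, and this is not necessarily the formulation adopted by Bouyagoub et al. (2016).

```python
import numpy as np

def t2_correct_fractions(signal_fractions, t2_values, te):
    """Convert T2-weighted signal fractions into volume fractions.

    Each observed fraction is assumed proportional to v_i * exp(-TE / T2_i),
    so the weighting is undone by multiplying by exp(+TE / T2_i) and
    renormalising so that the fractions sum to one.
    """
    signal_fractions = np.asarray(signal_fractions, dtype=float)
    t2_values = np.asarray(t2_values, dtype=float)
    unweighted = signal_fractions * np.exp(te / t2_values)
    return unweighted / unweighted.sum()

# Illustrative white matter voxel: [parenchyma, CSF]; values are assumptions.
te = 0.08                          # echo time of the diffusion scan (s)
t2 = np.array([0.07, 2.0])         # compartment T2 values (s)
true_volume = np.array([0.95, 0.05])

# Forward model: what a relaxation-weighted fit would report.
weighted = true_volume * np.exp(-te / t2)
observed = weighted / weighted.sum()

corrected = t2_correct_fractions(observed, t2, te)
print("observed isotropic fraction :", round(float(observed[1]), 3))
print("corrected isotropic fraction:", round(float(corrected[1]), 3))
```

In this toy voxel the long T2 of CSF inflates the apparent isotropic fraction to about 0.14, against a simulated true value of 0.05, which is the direction of bias described above for white matter.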
A clear example of the augmented information provided by combining two MRI techniques rather than focusing on a single contrast is given by the g-ratio framework (Stikov et al., 2015), extensively reviewed by other papers in this issue. The g-ratio is the ratio of the inner to the outer diameter of a myelinated axon, and is known to be associated with the speed of conduction along the axon (Rushton, 1934). A method for estimating the so-called "aggregate g-ratio" was proposed, building on simple geometric considerations and exploiting the respective sensitivities of MT imaging to myelin, and of dMRI to intra-axonal water fraction (Stikov et al., 2015). Similar approaches based on combining dMRI and relaxometry have also been suggested (Melbourne et al., 2014). Because of its link with conduction velocity, the g-ratio can be directly linked to axonal physiology and function, and represents an ideal candidate tool for exploring the structure-function relationship. Soon after its introduction, this framework was applied to the study of brain maturation, and of the variability within the healthy population (Dean et al., 2016; Mohammadi et al., 2015).
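The geometric argument behind the aggregate g-ratio can be sketched as follows. The example assumes that a myelin volume fraction (MVF) has already been obtained from a calibrated myelin-sensitive measure and that the diffusion model is NODDI, so that the axonal volume fraction (AVF) is derived from the intra-cellular and isotropic fractions. The calibration step is not shown, and the function name and input values are hypothetical.

```python
import numpy as np

def aggregate_g_ratio(mvf, v_ic, v_iso):
    """Aggregate g-ratio from myelin and diffusion-derived fractions.

    AVF = (1 - MVF) * (1 - v_iso) * v_ic   (NODDI fractions rescaled to
                                            the non-myelin water signal)
    g   = sqrt(AVF / (AVF + MVF)) = sqrt(1 - MVF / FVF)
    Voxels with no fibre volume (FVF = 0) are returned as NaN.
    """
    mvf, v_ic, v_iso = (np.asarray(a, dtype=float) for a in (mvf, v_ic, v_iso))
    avf = (1.0 - mvf) * (1.0 - v_iso) * v_ic
    fvf = avf + mvf
    with np.errstate(divide="ignore", invalid="ignore"):
        g = np.sqrt(avf / fvf)
    return np.where(fvf > 0, g, np.nan)

# Illustrative voxel: MVF = 0.3, intra-cellular fraction = 0.6, no CSF.
g_example = float(aggregate_g_ratio(0.3, 0.6, 0.0))
print(f"aggregate g-ratio = {g_example:.3f}")  # sqrt(0.42 / 0.72), about 0.764
```

Because the same expression is applied voxel by voxel, array inputs (whole maps) can be passed in place of the scalars used here.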
An important observation with respect to model-driven approaches is that they rely heavily on the correctness of the biophysical modelling that links the signal behaviour in each technique to biological parameters. Any bias would propagate into the multi-modal model, sometimes in an unpredictable fashion. In addition to systematic bias, one must not forget that any MRI parameter is affected by noise, and such noise will propagate into the newly derived index resulting from their combination. In some cases this effect can be estimated using the propagation of error equation (Bevington and Robinson, 2002), which shows how combining two noisy measures may blur any signal beyond detection. A very simple example is the combination of R2 (= 1/T2) and R2* (= 1/T2*) for computing R2':
$$R_2' = R_2^* - R_2 \qquad (1)$$
If the variances associated with R2 and R2* are, respectively, $\sigma_1^2$ and $\sigma_2^2$, then according to the propagation of error equation, and assuming that the two measurement errors are uncorrelated, the variance associated with R2' is given by:
$$\sigma_{R_2'}^2 = \sigma_1^2 + \sigma_2^2 \qquad (2)$$
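The noise penalty incurred when R2' is computed as the difference R2* - R2 can be checked numerically. The Monte Carlo sketch below assumes independent Gaussian errors on the two measurements, so that their variances add. The "true" relaxation rates and noise levels are hypothetical values chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 200_000

# Hypothetical ground truth (s^-1) and measurement standard deviations.
r2_true, r2_star_true = 12.0, 20.0
sd_r2, sd_r2_star = 1.0, 1.5

# Simulate repeated, independent noisy measurements.
r2 = r2_true + sd_r2 * rng.normal(size=n_samples)
r2_star = r2_star_true + sd_r2_star * rng.normal(size=n_samples)
r2_prime = r2_star - r2

# Propagation of error for a difference of uncorrelated quantities.
predicted_var = sd_r2**2 + sd_r2_star**2
empirical_var = r2_prime.var(ddof=1)

# The derived index has a smaller mean and a larger variance than either
# input, so its signal-to-noise ratio is lower.
snr_r2_star = r2_star_true / sd_r2_star
snr_r2_prime = (r2_star_true - r2_true) / np.sqrt(predicted_var)

print(f"predicted variance: {predicted_var:.3f}")
print(f"empirical variance: {empirical_var:.3f}")
print(f"SNR of R2*: {snr_r2_star:.1f}, SNR of R2': {snr_r2_prime:.1f}")
```

If the two errors were correlated, a covariance term would have to be included and the simple sum would no longer hold.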

MR fingerprinting
A completely new approach to multiparametric MRI is MR fingerprinting (MRF) (Ma et al., 2013). The MRF concept relies on the idea that unique signal evolutions can be generated for different tissues through the continuous variation of the acquisition sequence parameters. Once the data are collected, pattern recognition can be used to associate each signal evolution with a specific tissue defined by a dictionary containing signal evolutions from all possible combinations of parameters. This enables T1/T2 values to be associated with that specific tissue, as well as other MR quantities (depending on the acquisition parameters that are varied during acquisition). This is conceptually comparable to matching a person's fingerprint to a database. MRF relies on the use of undersampling techniques, such as compressed sensing (Lustig et al., 2007), to make the acquisition feasible in terms of scan time. Although still in its infancy, MRF offers a series of advantages compared to standard MR acquisition. It allows the collection of multiparametric data in a short time; it is robust against field inhomogeneities and motion artifacts; and it is extremely versatile. Providing a detailed review of MR fingerprinting and its applications goes beyond the scope of this paper, but we foresee an increasing role for this approach in both clinical and research applications of MRI.
Fig. 5. Exploiting multi-modal data to correct model bias. Fig. 5A shows the isotropic component estimated by NODDI in a healthy participant. The isotropic fraction appears unrealistically high in the white matter, and one of the possible explanations for this is that NODDI fractions are relaxation-weighted. Bouyagoub et al. (2016) used each compartment's T2 estimated by mcDESPOT to correct for this, yielding the maps shown in Fig. 5B.
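The dictionary-matching step of MRF can be sketched in a few lines. The signal model below is a deliberately simplified, hypothetical stand-in for the Bloch simulation used in practice (an inversion-recovery term multiplied by a T2 decay term, sampled with pseudo-random timings). The matching itself uses the usual normalised inner product between the measured evolution and each dictionary entry. All names and values are assumptions made for illustration.

```python
import numpy as np

def toy_signal(t1, t2, ti, te):
    """Toy signal evolution; NOT a Bloch simulation of a real MRF sequence."""
    return (1.0 - 2.0 * np.exp(-ti / t1)) * np.exp(-te / t2)

def build_dictionary(t1_grid, t2_grid, ti, te):
    """Simulate and normalise one signal evolution per (T1, T2) pair."""
    params = np.array([(t1, t2) for t1 in t1_grid for t2 in t2_grid if t2 < t1])
    atoms = np.array([toy_signal(t1, t2, ti, te) for t1, t2 in params])
    atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)
    return params, atoms

def match(signal, params, atoms):
    """Return the (T1, T2) pair whose atom best correlates with the signal."""
    scores = np.abs(atoms @ signal)
    return params[np.argmax(scores)]

rng = np.random.default_rng(2)
n_timepoints = 200
ti = rng.uniform(0.05, 3.0, n_timepoints)    # pseudo-random timings (s)
te = rng.uniform(0.01, 0.15, n_timepoints)

t1_grid = np.linspace(0.5, 2.0, 16)          # 0.1 s steps
t2_grid = np.linspace(0.04, 0.12, 9)         # 10 ms steps
params, atoms = build_dictionary(t1_grid, t2_grid, ti, te)

# "Measured" evolution for a voxel with T1 = 1.0 s, T2 = 0.08 s, plus noise.
measured = toy_signal(1.0, 0.08, ti, te) + 0.005 * rng.normal(size=n_timepoints)
t1_hat, t2_hat = match(measured, params, atoms)
print(f"matched T1 = {t1_hat:.2f} s, T2 = {t2_hat:.3f} s")
```

The resolution of the estimates is set by the spacing of the dictionary grid, and any quantity not represented in the simulated signal model cannot be recovered by the matching.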

MRI and nuclear medicine
Multi-modal imaging can be thought of as a more general approach, going beyond the boundaries of MRI. By combining MRI with other, complementary, neuroimaging techniques, such as neurophysiological and nuclear medicine methods, it is theoretically possible to characterise more complex systems. For example, positron emission tomography (PET) can exploit ligands to specific neurobiological substrates to provide high specificity. Thanks to the spread of hybrid modalities, the marriage between MRI and nuclear medicine is likely to be a long and happy one. To date, examples of combined MRI and PET are still limited; nevertheless it was shown that biomarkers obtained by bringing together fluorodeoxyglucose (FDG) PET, gray matter volumetrics and dMRI can better explain memory disorders in patients with AD than each single metric in isolation (Walhovd et al., 2009). Similarly, depressive symptoms in MS can be partially explained by functional connectivity of the hippocampus combined with imaging of PET ligands based on the 18-kDa translocator protein (TSPO), sensitive to the activation of microglia, and thus to acute inflammation (Colasanti et al., 2014). More quantitative approaches are likely to be developed in the near future.

Conclusions
Although some degree of overlap exists between the information accessible through the main microstructural imaging techniques based on MRI, their complementarity ensures that their combination can be exploited to make the whole greater than the sum of its parts. While this is generally accepted and supported by a large body of data, the best way of bringing together different methods is still controversial. Multivariate methods offer a simple solution, but a somewhat complicated interpretation of the results. Joint models provide a more direct description of the microstructure, but they require more complex data acquisition strategies and a large degree of approximation, and are subject to a number of biases. Ultimately, validation will be essential to understand the real potential of these methods, and their implementation will require the combined efforts of physicists, computational scientists and biologists.