Towards in vivo g-ratio mapping using MRI: Unifying myelin and diffusion imaging

Highlights • Second review on the topic of g-ratio mapping using MRI. • A summary of the most recent developments in the field, providing methodological background. • Discussion of pitfalls associated with g-ratio mapping using MRI.


Introduction
The g-ratio is a geometrical invariant of axons quantifying their degree of myelination relative to their cross-sectional size. Coupled with the axonal diameter, the g-ratio is a key determinant of neuronal conduction velocity (Rushton, 1951; Chomiak and Hu, 2009; Schmidt and Knösche, 2019). Signal transmission along different axonal fibres can be regulated and synchronised by varying the degree of myelination, and therefore the g-ratio, to optimize cognitive function, sensory integration and motor skills (Fields, 2015).
As the central nervous system appears to communicate at physical limits to constrain metabolic demands (Salami et al., 2003; Hartline and Colman, 2007; Coggan et al., 2015), small deviations from the optimal g-ratio value (0.6-0.8, (Rushton, 1951; Chomiak and Hu, 2009)) may have strong functional impact. For example, histological investigation has shown that the cortical g-ratio is higher in patients with multiple sclerosis, probably because of the de- and re-myelination processes associated with the disease and its progression (Albert et al., 2007). To understand such processes and their functional implications, clinical research and diagnostics would benefit greatly from the capacity to measure the g-ratio of fibre pathways in vivo.
Until recently, information about g-ratio distributions has only been accessible by invasive methods such as ex vivo electron microscopy (Hildebrand and Hahn, 1978), which restricted analyses to small numbers of axons and a limited number of brain regions or pathways. The g-ratio measured by such techniques is denoted the microscopic g-ratio because of the extremely fine spatial resolution that can be achieved. Using MRI to investigate the g-ratio in vivo would be highly desirable as it could provide whole-brain information on a voxel-wise basis. Stikov et al. proposed the methodology by which such a non-invasive MR-based "aggregate" g-ratio could be measured (Stikov et al., 2011), which we denote in this review interchangeably the "MR g-ratio" or "g-ratio mapping". The MR g-ratio framework measures the ensemble average of an underlying, unresolved, microstructural distribution of g-ratios. Making the strong assumption that the g-ratio is constant within a voxel, Stikov et al. demonstrated, via a geometrical plausibility argument (Stikov et al., 2011), that this aggregate MR g-ratio can be computed on a voxel-wise basis from the ratio of the myelin and axonal volume fractions (MVF and AVF respectively). Establishing this relation was important because both the MVF and AVF can be estimated by combining biophysical models (Alexander et al., 2019; Novikov et al., 2019) and quantitative MRI within a framework known as in vivo histology using MRI.
The challenge for, and validity of, in vivo g-ratio mapping centres on how precisely and accurately the AVF and MVF can be measured with the chosen MRI techniques. Three years ago, Campbell et al. thoroughly reviewed the methods of g-ratio mapping and highlighted potential pitfalls (Campbell et al., 2018). A key take-home message of their review was the introduction of the qualifying term "weighted" into the name MR g-ratio, i.e. aggregated g-ratio weighted mapping. They proposed this qualifier to acknowledge the impact that any miscalibration between the MR-based myelin proxy and the true MVF would have. Typically, ex vivo electron microscopy measures of the MVF act as the gold standard for methodological assessment and calibration.
Despite the challenges associated with accurate measurement and calibration of the MVF and AVF, many studies have exploited the potential of in vivo g-ratio weighted imaging for a variety of different applications (see Table 1 for full details). These have ranged from g-ratio mapping in infants (Melbourne et al., 2016) and children (Dean et al., 2016) to healthy adults (Mancini et al., 2018; Berman et al., 2019; Drakesmith et al., 2019), during healthy aging (Cercignani et al., 2017; Berman et al., 2018) and as a result of pathological change (Hagiwara et al., 2017; Hori et al., 2018; Kamagata et al., 2019; Yu et al., 2019).
Since the review by Campbell et al. (Campbell et al., 2018), awareness has increased that g-ratio mapping with MRI can be confounded by the degree of miscalibration of the MRI-based MVF proxy. Furthermore, new methodological studies have been published on g-ratio weighted mapping, e.g. to assess its repeatability (Ellerbrock and Mohammadi, 2018a), and the reproducibility when the particular proxies used for the AVF and MVF are varied (Ellerbrock and Mohammadi, 2018a). A series of validation studies has also been conducted by the Does lab (West et al., 2018a, 2018b) based on extensive histological data coupled with ex vivo MRI. These studies have provided insight into the relationship between the MR g-ratio and the various MVF measures through comparison with the current gold standard of electron microscopy. A particular strength of these studies has been the inherently large dynamic range of the MVF and g-ratio owing to the fact that hyper- and hypo-myelinating mouse models were used. This has enabled the validity and sensitivity of g-ratio mapping to be more thoroughly investigated.
In this review, we provide the background information necessary to understand the MRI methodologies that have been used to date to quantify the MVF and AVF (or fibre volume fraction, FVF) in vivo, specifically in the context of g-ratio mapping. We seek to unify the nomenclature describing the various myelin and diffusion models. Moreover, we use the findings of the aforementioned methodological studies in simulation-based experiments to further understand the impact on the MR g-ratio of currently used calibration methods.
We examine the accuracy of the estimates based on the three myelin markers most commonly used for g-ratio mapping: the bound pool fraction derived from quantitative magnetisation transfer methods, the macromolecular tissue volume derived from proton density mapping, and the myelin water fraction derived from myelin water imaging.
Finally, we provide an outlook on emerging approaches and what will be required to make g-ratio mapping with MRI a viable clinical tool.

Biomarkers
Columns: study; biomarker for the axonal or fibre volume fraction (AVF or FVF); biomarker for the myelin volume fraction (MVF); subjects or participants; remarks. LA.x and LM.x refer to limitations pertinent, respectively, to the AVF or MVF measure used.
Study: (Stikov et al., 2011). AVF or FVF: DWI1 (DTI). MVF: SPGR (qMT). Subjects or participants: 5C. Remarks: First model relating g-ratio to MVF and AVF. It assumed constant g-ratio in a voxel, and parallel axons. FA was also related to FVF assuming parallel fibres. LA.1, LM.1, LM.9
AVF or FVF: DWI2.5 (NODDI). MVF: SPGR (qMT). Subjects or participants: 1C; 1P; 1Mc. Remarks: Revised g-ratio model. In this model, the g-ratio is still assumed to be constant in a voxel but the model was extended to non-parallel axons.

Methodology
Biological tissue is formed of multiple microenvironments, which we refer to as compartments or pools. From an MRI perspective, key compartments in an imaging voxel comprised of human brain tissue are those formed of aqueous and non-aqueous protons (Fig. 1a). The aqueous protons ($f_W$) appear in a variety of microenvironments including water trapped within the myelin sheaths of fibre pathways ($f_{MW}$), or contained within the intra- ($f_{AW}$) and extra-cellular spaces ($f_{EW}$), and cerebrospinal fluid ($f_{CSF}$). The non-aqueous protons are bound to macromolecules ($f_B$), including myelin ($f_{BM}$) and other macromolecules such as proteins ($f_{BNM}$). We express these compartments as fractions of the imaging voxel under the simplifying assumption that, while the relative contribution will spatially vary, every voxel is fully described by its content of water and bound protons, i.e. $f_W + f_B = 1$. Of these tissue compartments, it is the axonal and myelin-associated compartments that are important in the context of in vivo g-ratio mapping (section 2.1).
With MRI we tailor our experiments to maximise our sensitivity to specific compartments with the aim of quantifying the MVF and AVF respectively. To date, g-ratio mapping studies have either used relaxometry (Fig. 1b) or magnetisation transfer (Fig. 1c) techniques to quantify the myelin compartment (section 2.2), while diffusion imaging has been used to quantify the axonal compartment (Fig. 1d and section 2.3). These different imaging modalities have each evolved specific nomenclature over the course of their development. In this review, we aim, wherever possible, to unify these disparate notations using the fractional contributions outlined above and illustrated in Figure 1.

Figure 1:
To facilitate modelling, brain tissue is decomposed into four distinct tissue compartments (and CSF) that are of key relevance from an MRI perspective. These cover two broad categories: non-aqueous macromolecule-bound protons ($f_B$) and aqueous protons ($f_W$), each of which may ($f_{MW}$, $f_{BM}$) or may not ($f_{AW}$, $f_{EW}$, $f_{CSF}$, $f_{BNM}$) be associated with myelin (a). Myelin water imaging specifically focuses on characterising the distinct water micro-environments to quantify the myelin water fraction, MWF, i.e. the fractional contribution from myelin-associated water, $f_{MW}$, relative to all water (b). Magnetisation transfer approaches focus instead on distinct macromolecular-bound and free water compartments, which can exchange magnetisation, to quantify the bound pool fraction (BPF), i.e. the fractional contribution from the macromolecular environment ($\mathrm{BPF} = f_B/(f_B + f_W)$, c). The diffusion weighted signal is sensitive to intra-axonal and extra-axonal water compartments, and potentially to an isotropic diffusion compartment such as CSF. By decomposing the signal, the intra-axonal water fraction (AWF) can be isolated, i.e. $\mathrm{AWF} = f_{AW}/(f_{AW} + f_{EW} + f_{CSF})$ (d).
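As a minimal sketch of how the unified notation maps onto the three modality-specific fractions, the following Python snippet computes the MWF, BPF and AWF of a single voxel from its compartment fractions. The class name, function names and numerical values are hypothetical and chosen purely for illustration; they are not measured tissue values.

```python
from dataclasses import dataclass


@dataclass
class VoxelFractions:
    """Hypothetical container for the compartment fractions of one voxel."""
    f_mw: float   # myelin water
    f_aw: float   # intra-axonal water
    f_ew: float   # extra-cellular water
    f_csf: float  # cerebrospinal fluid
    f_bm: float   # myelin-associated bound (non-aqueous) protons
    f_bnm: float  # non-myelin bound (non-aqueous) protons

    @property
    def f_w(self) -> float:
        """All aqueous protons."""
        return self.f_mw + self.f_aw + self.f_ew + self.f_csf

    @property
    def f_b(self) -> float:
        """All macromolecule-bound protons."""
        return self.f_bm + self.f_bnm


def myelin_water_fraction(v: VoxelFractions) -> float:
    """MWF: myelin water relative to all water."""
    return v.f_mw / v.f_w


def bound_pool_fraction(v: VoxelFractions) -> float:
    """BPF = f_B / (f_B + f_W)."""
    return v.f_b / (v.f_b + v.f_w)


def axonal_water_fraction(v: VoxelFractions) -> float:
    """AWF = f_AW / (f_AW + f_EW + f_CSF); myelin water is excluded."""
    return v.f_aw / (v.f_aw + v.f_ew + v.f_csf)


# Illustrative values only; they are chosen so that f_W + f_B = 1.
voxel = VoxelFractions(f_mw=0.10, f_aw=0.30, f_ew=0.25, f_csf=0.05,
                       f_bm=0.20, f_bnm=0.10)
print(myelin_water_fraction(voxel),
      bound_pool_fraction(voxel),
      axonal_water_fraction(voxel))
```

Note that the three fractions use different denominators, which is one reason why none of them can be equated with a volume fraction without a further calibration step.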

2.1 The Aggregate g-ratio Model
Assuming a circular cross-section of axons, the microscopic g-ratio of an individual axon indexed by $i$ is defined as $g_i = r_i / R_i$, where $r_i$ and $R_i$ are the inner and outer radii of the fibre respectively (see Fig. 2a). All further considerations target the white matter (WM), which is considered to be composed of three discrete, non-overlapping compartments: axonal (A), myelin (M), and extracellular (E). In this case, any sample volume of WM can be described by the volume fractions (VF) of each compartment, which sum to one, i.e. $\mathrm{AVF} + \mathrm{MVF} + \mathrm{EVF} = 1$. Using this WM model, Stikov and colleagues (Stikov et al., 2011) suggested that the aggregated g-ratio in an MRI volume (Fig. 2b) can also be defined in terms of volume fractions as:
$$g_{\mathrm{MRI}} = \sqrt{1 - \frac{\mathrm{MVF}}{\mathrm{MVF} + \mathrm{AVF}}} \qquad (1)$$
To derive the relationship in Eq. (1) (see also (Stikov et al., 2011)), the g-ratio in an MRI voxel is assumed to be constant (Fig. 2c), whereas there is no restriction on the orientation of the axons in the voxel (Fig. 2d). Shortly after the g-ratio model was introduced, West et al. (West et al., 2016) suggested that $g_{\mathrm{MRI}}$ is in fact capturing the fibre-area-weighted mean (Fig. 2e) of all the microscopic g-ratios in the voxel (Fig. 2f). If the assumptions of Eq. (1) hold, this model can also be used with other imaging modalities (e.g., electron microscopy, where the MVF and AVF have been measured after segmentation of the image). This efficient process allows the microscopic information obtained by these other modalities to be summarised over a spatial scale comparable to an MRI voxel, and therefore to be compared directly with the MR-based g-ratio in validation studies. The aggregate g-ratio model has been developed specifically for white matter (Stikov et al., 2011; Campbell et al., 2018), where biomarkers of the MVF and AVF can be measured with MRI. In the following sections we will first outline the methods that have been used to date to quantify MVF and AVF in the context of g-ratio mapping.
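The relation between the volume fractions and the aggregated g-ratio can be illustrated with a short numerical sketch. It assumes an idealised geometry of parallel, circular fibres, for which the AVF and MVF are proportional to the summed axonal and myelin cross-sectional areas. Under that assumption, the aggregated g-ratio equals the square root of the fibre-area-weighted mean of the squared microscopic g-ratios. The function names, radii and sample area below are hypothetical.

```python
import math


def aggregated_g_ratio(mvf: float, avf: float) -> float:
    """Aggregated g-ratio from volume fractions: sqrt(1 - MVF / (MVF + AVF))."""
    fvf = mvf + avf
    if fvf <= 0.0:
        raise ValueError("MVF + AVF must be positive")
    return math.sqrt(1.0 - mvf / fvf)


def area_weighted_g_ratio(inner_radii, outer_radii) -> float:
    """Square root of the fibre-area-weighted mean of squared microscopic g-ratios."""
    weights = [R * R for R in outer_radii]  # proportional to fibre area
    g_squared = [(r / R) ** 2 for r, R in zip(inner_radii, outer_radii)]
    weighted_sum = sum(w * g2 for w, g2 in zip(weights, g_squared))
    return math.sqrt(weighted_sum / sum(weights))


# Two hypothetical parallel fibres (radii in micrometres) in a sample of
# cross-sectional area 5 square micrometres.
inner = [0.3, 0.6]
outer = [0.5, 0.8]
sample_area = 5.0

avf = sum(math.pi * r * r for r in inner) / sample_area
mvf = sum(math.pi * (R * R - r * r) for r, R in zip(inner, outer)) / sample_area

g_from_fractions = aggregated_g_ratio(mvf, avf)
g_from_radii = area_weighted_g_ratio(inner, outer)
print(g_from_fractions, g_from_radii)
```

Both routes give the same value because the sample area cancels in the ratio MVF/(MVF + AVF), leaving only the summed squared radii.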

Figure 2:
Schematic summary of the aggregated g-ratio model and its relation to the microscopic g-ratios. Myelinated axons (a) are represented by annular cylinders with myelin (blue) and axonal (yellow) compartments (b-f); other microstructural compartments are agglomerated in the background (grey). The aggregated g-ratio ($g_{\mathrm{MRI}}$) can be formulated as a function of the axonal and myelin volume fractions (AVF and MVF respectively, b). In this model, all axons within a voxel are assumed to have the same g-ratio. In the initial model suggested by Stikov et al. in 2011, the axons were also assumed to be orientated in parallel (c). This assumption was subsequently relaxed, allowing arbitrary axonal orientation (d). West et al. (2016) showed that the aggregated g-ratio is related to the fibre area-weighted mean of the microscopic g-ratios (f); in the figure the weights are represented by the degree of transparency to indicate the weighting towards larger fibres (e).

Myelin Volume Fraction
A variety of different MRI-based measures have been used to characterise the myelin content within a voxel (Alonso-Ortiz et al., 2015; MacKay and Laule, 2016; Sled, 2018). Here we focus on myelin-water imaging (MWI) and magnetization transfer (MT) imaging. In both cases, each of which will be discussed in turn, the measure aims to be reflective of the fractional myelin content within the imaging volume, i.e. the MVF. This is done by quantifying either the myelin water fraction ($\mathrm{MWF} = f_{MW}/f_W$, Fig. 1b) or the bound pool fraction ($\mathrm{BPF} = f_B/(f_B + f_W)$, Fig. 1c). In either case, an additional calibration step is clearly required to convert the measure to the MVF ($(f_{MW} + f_{BM})/(f_W + f_B)$) in order to accurately compute the g-ratio (West et al., 2018b). As noted by Campbell et al. (Campbell et al., 2018), this calibration step is crucial to the accuracy and precision of g-ratio mapping and will be discussed in detail in section 3.

MWF based on Myelin Water Imaging
Starting from Figure 1, the simplest water imaging model quantifies the density of free water protons within an imaging voxel, i.e. the proton density (PD) (Tofts, 2004). Under an assumption of complete longitudinal recovery within each repetition time, TR, the extrapolated MR signal at an echo time, TE, of 0 ms ($S_0$) is proportional to the product of the fractional water content, $f_W$, a calibration factor, $C$, that accounts for the concentration of protons in the voxel relative to that of free water, and the spatially-varying receive field sensitivity, $B_1^-$: $S_0 = C\, f_W\, B_1^-$ such that $f_W + f_B = 1$ (Fig. 1c). The receive field modulation must be estimated and removed ($S_0' = C\, f_W$, see section 2.2.3) prior to final calibration, which is done with respect to a reference, e.g. cerebrospinal fluid (CSF). This is equivalent to assuming that the volume fraction of macromolecules in the reference is zero ($f_B \approx 0$ and $f_W = 1$), i.e. $S_{0,\mathrm{CSF}}' = C$. The remaining contents of the voxel have recently been referred to as the macromolecular tissue volume ($\mathrm{MTV} = 1 - f_W = f_B$) (Mezer et al., 2013). PD mapping typically makes no distinction between different water microenvironments (e.g. myelin water vs. non-myelin water) and instead estimates the sum of contributions from all compartments (Fig. 1b,c) under the assumption of a monoexponential signal decay. Therefore, MTV might vary with the minimum echo time, as well as the echo spacing, at which the signal was sampled (more details can be found in (Tofts, 2004)).
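The calibration chain described above (remove the receive field, normalise to a CSF reference, take the complement) can be sketched for a single voxel as follows. This is a simplified illustration and not a full PD mapping pipeline: it assumes that the extrapolated signal and the receive sensitivity are already known, and all function names and numbers are hypothetical.

```python
def proton_density(s0: float, receive_field: float,
                   s0_ref: float, receive_field_ref: float) -> float:
    """Fractional water content after receive-field removal and normalisation.

    s0, receive_field         : extrapolated signal and receive sensitivity in the voxel
    s0_ref, receive_field_ref : the same quantities in the reference region (e.g. CSF),
                                where the water fraction is assumed to be 1
    """
    s0_corrected = s0 / receive_field            # remove receive-field modulation
    reference = s0_ref / receive_field_ref       # equals the calibration factor
    return s0_corrected / reference


def macromolecular_tissue_volume(pd: float) -> float:
    """MTV is the non-water remainder of the voxel."""
    return 1.0 - pd


# Illustrative numbers: calibration factor 1000 (arbitrary units),
# water fraction 0.7 in tissue and 1.0 in CSF, different receive sensitivities.
s0_tissue = 1000.0 * 0.7 * 1.2
s0_csf = 1000.0 * 1.0 * 0.9

pd_tissue = proton_density(s0_tissue, 1.2, s0_csf, 0.9)
mtv_tissue = macromolecular_tissue_volume(pd_tissue)
print(pd_tissue, mtv_tissue)
```

Because the reference value enters as a divisor, any error in the reference region or in the receive field estimate propagates directly into PD and therefore into MTV.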
By contrast, myelin water imaging (MWI, (Alonso-Ortiz et al., 2015)) extends this model to encompass multiple distinct water compartments, each with specific relaxation behaviour contingent on the local microenvironment. MWI quantifies myelin-associated aqueous protons in a voxel as a fraction of the total MR visible water signal, i.e. $\mathrm{MWF} = f_{MW}/f_W$ as defined in Figure 1b. To date, three main approaches to myelin water imaging have been used for g-ratio mapping using MRI. Each technique exploits a different relaxation property to stratify the different tissue water compartments (MacKay and Laule, 2016): (1) multi-echo spin echo imaging to quantify compartment-specific transverse relaxation times, T2 (Melbourne et al., 2016; West et al., 2018a), (2) multi-echo gradient echo imaging to quantify compartment-specific effective transverse relaxation times, T2* (Jung et al., 2018), and (3) multi-compartment driven equilibrium single pulse observation of T1 and T2 (mcDESPOT, (Deoni et al., 2008; Dean et al., 2016; Drakesmith et al., 2019)) to distinguish fast and slow relaxing compartments based on their distinct T1 and T2 relaxation and exchange behaviour.
In MWI, the MWF is most commonly estimated by characterising the proportion of the water signal originating from different microstructural environments based on their distinct transverse relaxation times (T2). To do this, it is assumed that the residency time, $\tau$, of the protons in each water pool is sufficiently long that their distinct relaxation behaviour can be discerned. The case $\tau \gg T_2$ indicates a slow exchange regime, which can equivalently be described by an exchange rate, $k = 1/\tau \ll 1/T_2$. In this case, multiexponential behaviour, with a component originating from each of the water pools having distinct amplitude and relaxation times, can be discerned (Zimmerman and Brittin, 1957).
Indeed, T2 distributions from normal brain have been shown to contain multiple peaks that can be attributed to myelin water trapped between the lipid bilayers, intra/extracellular water and cerebrospinal fluid (Whittall et al., 1997; MacKay and Laule, 2016).
To quantify distinct T2 times, data are typically acquired using a multi-echo spin echo readout with a range of echo times. Each voxel is assumed to contain contributions from an unspecified number of slow or non-exchanging environments, each with distinct T2 decay times. Fitting the data to this model is typically done with a regularised nonnegative least squares approach (Whittall and MacKay, 1989;MacKay et al., 2006), in which the regularisation ensures smoothly varying signal amplitudes as a function of T2.
After fitting, the myelin compartment is assigned to the short T2 peaks, requiring a threshold T2 time to be specified. The MWF is then estimated as the area under the peaks below this threshold T2 time relative to the area under all peaks (MacKay et al., 2006). This ignores any differential weighting that might be present, for example due to compartment-specific T1 times. In white matter, at least two different T2 relaxation times have been reported, which are associated with different tissue compartments (MacKay et al., 2006; Cercignani et al., 2018): (1) myelin water having a T2 of about 15 -at 3T. It should also be noted that the T2 relaxation times of the intra- and extra-cellular spaces likely differ (Dortch et al., 2013; Veraart et al., 2018; McKinnon and Jensen, 2019) and that there is exchange between these compartments that also influences the T2 distribution in white matter (Sled et al., 2004). These effects will be revisited in section 3.1.1 but have also been discussed in detail elsewhere (Does, 2018).
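The two ingredients of this estimate, a multi-exponential signal model in the slow exchange regime and a threshold applied to the fitted T2 distribution, can be sketched as follows. The regularised non-negative least squares fit itself is omitted, so the T2 distribution is simply assumed to be given. The T2 grid, amplitudes and the 40 ms threshold are hypothetical values chosen for illustration, not recommendations.

```python
import math


def multi_exponential_signal(te_ms: float, t2_ms, amplitudes) -> float:
    """Signal at echo time te_ms as a sum of non-exchanging T2 components."""
    return sum(a * math.exp(-te_ms / t2) for t2, a in zip(t2_ms, amplitudes))


def mwf_from_t2_distribution(t2_ms, amplitudes, t2_threshold_ms: float) -> float:
    """Amplitude below the T2 threshold relative to the total amplitude."""
    total = sum(amplitudes)
    if total <= 0.0:
        raise ValueError("T2 distribution has no signal")
    short_t2 = sum(a for t2, a in zip(t2_ms, amplitudes) if t2 < t2_threshold_ms)
    return short_t2 / total


# Hypothetical fitted T2 distribution: a short-T2 peak, a dominant
# intermediate peak and a small long-T2 component.
t2_grid_ms = [10.0, 15.0, 20.0, 60.0, 70.0, 80.0, 2000.0]
amplitudes = [0.02, 0.08, 0.02, 0.20, 0.50, 0.15, 0.03]

mwf = mwf_from_t2_distribution(t2_grid_ms, amplitudes, t2_threshold_ms=40.0)
decay = [multi_exponential_signal(te, t2_grid_ms, amplitudes)
         for te in (0.0, 10.0, 20.0, 40.0, 80.0)]
print(mwf, decay)
```

The sketch makes the dependence on the threshold explicit: moving it across a peak changes the MWF discontinuously, which is why the choice of threshold has to be reported alongside the estimate.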
A similar approach uses a multi-echo gradient echo acquisition in lieu of acquiring spin echoes. In this case compartment-specific T2* times are estimated instead of T2 (Lenz et al., 2012;Sati et al., 2013). This approach is more time efficient, less vulnerable to transmit field inhomogeneity and less demanding from an RF power perspective since refocusing pulses are not required. However, given that the signal is governed by the more rapid T2* decay, it does suffer from reduced signal-to-noise ratio (SNR) relative to spin echo approaches. Fitting complex-valued data to the multi-exponential decay model has been shown to increase the robustness of MWF estimates made with this approach (Nam et al., 2015b).
Rather than modelling distinct tissue compartments solely from the decay of the transverse magnetisation, the mcDESPOT approach integrates spoiled gradient echo (SPGR) and balanced steady-state free precession (bSSFP) images, acquired with different nominal flip angles, to fit a two-compartment model of the steady state signal (Deoni et al., 2008). The combination of these two acquisition types allows both T1 (SPGR) and T2 (bSSFP) to be estimated (Deoni et al., 2013). In the mcDESPOT model distinct relaxation times are determined for a fast and a slow relaxing pool, as well as the exchange rate ($k$), or residency time ($\tau$), of the two pools in the condition of chemical equilibrium. The fast relaxing pool is subsequently assumed to be myelin-associated water, allowing the MWF to be quantified. The relaxation and exchange of these two pools is modelled using the Bloch-McConnell equations, which allows analytical solutions for the steady state signal to be derived (McConnell, 1958; Liu et al., 2016). Fitting the acquired data to these signal models requires seven distinct model parameters to be estimated: T1, T2 and fractional amplitude for each compartment as well as the exchange between them.

BPF based on Magnetisation Transfer
Like PD mapping, magnetisation transfer (MT) based approaches simplify the characterisation of white matter to two distinct pools (Fig. 1c). In this case one is comprised of an aqueous environment and the other a non-aqueous environment that, in the context of g-ratio mapping, is assumed to be associated with myelin. "Free" water, such as that found within the intra- or extra-cellular compartments, has a sharp resonance linewidth. By contrast, the "bound" non-aqueous protons have a much more heterogeneous microenvironment, leading to a much broader range of MR frequencies and consequently a very short T2 in the range of tens of microseconds. The transverse magnetisation component of the bound pool is therefore undetectable with MRI, unless ultra-short TE approaches are adopted (Sheth et al., 2016; Jang et al., 2020; Weiger et al., 2020).
However, given its broad range of resonant frequencies, this bound pool can be selectively saturated through the application of an off-resonance radiofrequency pulse prior to conventional excitation and signal detection. This pre-pulse can selectively saturate the longitudinal magnetisation of the bound pool while leaving the free pool largely unaffected. Subsequently, the process of magnetisation transfer (MT), primarily occurring through dipolar coupling between the bound and free pools, leads to an observable reduction in the measured signal intensity (Wolff and Balaban, 1989; Sled and Pike, 2001; Sled, 2018; van Zijl et al., 2018). MT techniques capture the proportion of magnetisation in the bound pool relative to the free pool through the pool size ratio ($F = f_B/f_W$ (Sled and Pike, 2001) and Fig. 1c). Analogously to the MWF in MWI, the BPF is defined as the magnetisation fraction within the bound pool relative to the total magnetisation in both pools (Sled, 2018): $\mathrm{BPF} = f_B/(f_B + f_W)$ (Fig. 1c). In the first g-ratio mapping studies, the measured BPF was calibrated against histological data to convert it to an estimate of the MVF and combined with a diffusion-based measure of the FVF to estimate the g-ratio (Stikov et al., 2011).
The simplest means of probing the macromolecular bound pool via MT is to acquire an image using a pre-pulse with a single off-resonance frequency interleaved with a standard excitation pulse. The magnetisation transfer ratio (MTR) is defined as the normalised signal decrease relative to a reference image with only the standard excitation pulses (Henkelman et al., 2001). While this measure has been shown to be reflective of myelin content via histological analysis (Schmierer et al., 2004), hardware dependencies remain and reduce its comparability across individuals. A more robust measure that incorporates correction for both spatially varying T1 and $B_1^+$ effects is the magnetisation transfer saturation (MTsat). This measure quantifies the percentage saturation per TR of the steady state SPGR signal that would result from a dual excitation sequence, and is dependent on the BPF (Helms et al., 2008). MTsat has been shown to be more robust to $B_1^+$ inhomogeneity than MTR and to empirically correlate with the pool size ratio, F (Campbell et al., 2018).
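Since the pool size ratio and the bound pool fraction are built from the same two pool sizes, one can be converted into the other algebraically: BPF = F/(1 + F) and F = BPF/(1 - BPF). The sketch below shows this conversion together with the MTR as the normalised signal decrease. Function names and signal values are hypothetical.

```python
def bpf_from_pool_size_ratio(pool_size_ratio: float) -> float:
    """BPF = F / (1 + F), with F the bound-to-free pool size ratio."""
    return pool_size_ratio / (1.0 + pool_size_ratio)


def pool_size_ratio_from_bpf(bpf: float) -> float:
    """F = BPF / (1 - BPF); undefined for BPF = 1 (no free pool)."""
    if not 0.0 <= bpf < 1.0:
        raise ValueError("BPF must lie in [0, 1)")
    return bpf / (1.0 - bpf)


def magnetisation_transfer_ratio(s_reference: float, s_saturated: float) -> float:
    """Normalised signal decrease caused by the off-resonance pre-pulse."""
    return (s_reference - s_saturated) / s_reference


# Illustrative values only.
bpf = bpf_from_pool_size_ratio(0.25)
mtr = magnetisation_transfer_ratio(s_reference=1000.0, s_saturated=600.0)
print(bpf, pool_size_ratio_from_bpf(bpf), mtr)
```

For small bound pools the two measures are numerically close, but they diverge as the bound pool grows, so it matters which of the two a reported calibration refers to.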
More comprehensive modelling of the two magnetisation pools is obtained through quantitative MT (qMT) imaging. This approach aims to separate the contributions of the free and bound pools by explicitly modelling the distinct T1 and T2 relaxation times of the pools and incorporating the exchange between them, under the assumption of chemical equilibrium. The absorption lineshape of the bound pool must also be modelled, and is often assumed to be super-Lorentzian, with a T2 in the region of tens of microseconds (Morrison and Henkelman, 1995). With this approach, the BPF can be estimated from the fractional magnetisation contributions of the two pools. To estimate this extended set of parameters, multiple images, sampling the so called z-spectrum, are acquired, each using a pre-pulse with a different off-resonance frequency (Sled and Pike, 2001;Cabana et al., 2015;Sled, 2018).
An intriguing, but not yet validated, approach that has also been used in the context of g-ratio mapping is to use multi-compartment Bloch simulations to model the myelin volume fraction within the voxel directly (Warntjes et al., 2016; Hagiwara et al., 2017).

Protocol Considerations for MVF mapping
A range of different protocols can be used to estimate the proton density, and by consequence the macromolecular tissue volume (Warntjes et al., 2007; Volz et al., 2012; Baudrexel et al., 2016; Mezer et al., 2016; Wang et al., 2018; Callaghan et al., 2019; Lorio et al., 2019). This approach requires an estimate of the receive field sensitivity, $B_1^-$, which can be obtained by constrained model fitting or measurement (Mezer et al., 2016). The normalisation step to express PD as a fraction, or more commonly a percentage, of the concentration of protons in pure water requires a reference region to be defined, e.g. within the CSF-filled ventricles. However, the optimal choice of the normalisation region will depend on the acquisition scheme since sufficient SNR is required for robust estimation (CSF was used in (Berman et al., 2018) and white matter in (Ellerbrock and Mohammadi, 2018a)). The accuracy and precision of the PD estimation will in turn dictate the accuracy and precision of the MTV estimate. The mapping of PD was introduced in the context of fully relaxed signal (i.e. TR >> T1). To achieve reasonable scan times this requirement can be relaxed, but it is then necessary to correct for spatially varying T1 recovery.
Multi-compartment MWI necessitates short echo times to adequately sample the decay of the short T2 myelin-associated water compartment (Whittall et al., 1999). This extends the minimum achievable TR and can lead to long acquisition times, particularly for spin echo based approaches, unless spatial coverage or resolution are sacrificed. Acquiring multiple spin echoes in a single readout increases temporal efficiency, but the train of pulses can lead to the refocusing of echoes from unwanted pathways, i.e. the production of stimulated echoes, when the transmit field, $B_1^+$, is inhomogeneous. Correction schemes based on simulating the impact of these echoes (e.g. (Lebel and Wilman, 2010)) have been proposed and can be incorporated into the fitting procedure. 2D slice-selective approaches are also vulnerable to distorted slice profile effects, which can be mitigated either by modifying the sequence to ensure a sufficiently broad refocusing width, or by accounting for the effect during processing (Lebel and Wilman, 2010; Nöth et al., 2017).
The large number of refocusing pulses also increases the specific absorption rate (SAR) of the sequence and can be a limiting factor at higher field strengths.
MTR and MTsat are time efficient means of quantifying the effect of magnetisation transfer. As highlighted earlier, MTsat is more robust to hardware effects. In addition, high resolution maps can be obtained with whole brain coverage in reasonable scan times, making it particularly appealing for clinical studies. This efficient method was used in the first group study mapping the g-ratio in vivo. However, a limitation of these rapid approaches is that they are semi-quantitative. The saturation of the bound pool, and therefore of the free pool via magnetisation transfer, will depend on the particular off-resonance pulse used, most notably the power and offset frequency. For further details, acquisition protocols and software for estimating this parameter see e.g. (Tabelow et al., 2019).
qMT approaches circumvent this limitation by quantifying specific physical parameters. However, the extended datasets required to fit the full qMT model lead to a trade-off between scanning duration and spatial resolution and/or coverage. To constrain the model fits, parameters can be fixed, e.g. the T1 of the free and bound pool can be set equal to each other, or an "observed" T1 can be separately measured and integrated into the fitting to relate the T1 times of the bound and free pools. For further details and software available for fitting such models, see e.g. (Cabana et al., 2015).
Clearly brain tissue can be characterised by a very broad range of physical parameters. The multi-parameter mapping (MPM) quantitative MRI protocol offers a comprehensive approach providing high resolution, whole brain estimates of (single compartment) T1, T2*, PD, MTV and MTsat, with correction for transmit and receive field effects, in clinically feasible scan times (Callaghan et al., 2019; Tabelow et al., 2019). As such it provides simple proxies for both the macromolecular (via MTsat & MTV) and free water pools (PD) in a single protocol.

Axonal Volume Fraction and Fibre Volume Fraction
Diffusion MRI is the method of choice to separate the intra- and extra-axonal tissue compartments ($f_{AW}$ and $f_{EW}$, Fig. 1d) because of the distinct diffusion properties of water in these compartments. However, as detailed above, the myelin-associated water compartment has a short T2. This means that diffusion-weighted MRI is insensitive to myelin water because of the comparatively long minimum echo time required to accommodate the application of diffusion gradients. Nonetheless, there are several different diffusion-based approaches available to probe the intra-axonal tissue compartment. Detailing each of these goes beyond the scope of g-ratio mapping. For those interested in more details, we refer to other excellent reviews (e.g. (Alexander et al., 2019; Novikov et al., 2019)). Here we will specifically focus on those approaches that have been used to date to estimate the intra-axonal volume fraction for the purpose of computing the aggregated g-ratio. These studies can be subdivided into two categories: the studies that have used standard DTI data and those that have used multi-shell (and even more advanced) diffusion MRI protocols. Each category will be discussed in turn.

FVF from DTI data
The first category of studies requires only a limited set of measurement parameters, including only a single b-value and a modest number of diffusion directions, as defined by the DTI protocol. Therefore, these studies refrain from explicitly modelling more than one tissue compartment. A feature of these studies was the interpretation of diffusion-MRI based measurements of the axonal compartment as the FVF rather than the AVF, which, given the insensitivity of diffusion MRI to myelin water, is probably incorrect as we will discuss further in the next section.
DTI: The first g-ratio mapping study by Stikov et al. (Stikov et al., 2011) used simulations, in which axons were modelled as straight, parallel cylinders, to establish a second order relationship between the fractional anisotropy (FA) of the diffusion tensor and the total FVF.
The assumption of straight and parallel cylinders, however, restricted the application of this model to white matter regions with well aligned fibres. As a result, it has only been applied to the corpus callosum to date (Stikov et al., 2011;Berman et al., 2018).
TFD: The diffusion model used by Mohammadi et al. (Mohammadi et al., 2015) for g-ratio mapping was based on the tract-fibre density (TFD), which is not restricted to well-aligned fibre pathways and thus could be applied across the whole brain. The TFD was derived from fibre orientation distributions (Reisert et al., 2013) and assumed to be directly proportional to the FVF. The proportionality constant that related TFD to FVF was combined with the calibration coefficient for the MTsat myelin marker used to capture the MVF. The resulting calibration constant was estimated by referencing against a ground truth g-ratio value from the literature. This calibration approach will be further discussed in the context of myelin biomarkers in section 3.2.
However, Ellerbrock et al. (Ellerbrock and Mohammadi, 2018a) recently showed the TFD-based parameter to be less stable in terms of repeatability and comparability than estimates derived from the Neurite and Orientation Dispersion in Diffusion Imaging (aka NODDI) model (Zhang et al., 2012), discussed in more detail in the next section.

AVF from multi-shell diffusion MRI data
Using a more extensive set of experimental measurement parameters, i.e. multiple b-values (aka diffusion shells), allows the second category of studies to use a more principled model for the diffusion signal, the so-called "standard model" (Novikov et al., 2019). The standard model is built upon well-established signal models for two tissue compartments (for a summary see, e.g., (Novikov et al., 2019)), the axonal and extra-cellular compartments (Fig. 3a.ii). A restricted signal component is assumed to come from the axonal compartment, which is modelled as impermeable sticks (Fig. 3a.iii). A hindered signal component describes the extra-cellular space, which is modelled using a 3D anisotropic diffusion tensor. Such an example is depicted in Figure 3c for the White Matter Tissue Integrity (WMTI) model, showing the axially-symmetric ellipsoidal tensor composed of axial ($D_{e,\parallel}$) and perpendicular ($D_{e,\perp}$) extra-cellular diffusivities.
In contrast to g-ratio studies based on DTI data, those using multi-shell diffusion MRI data also acknowledge the fact that the direct contribution of myelin water to the diffusion MRI signal is negligible (Fig. 3a.ii). As a consequence, their models take into account that the axonal compartment estimated from the visible MRI signal in a typical diffusion experiment is not the AVF but the axonal water fraction (AWF). The AVF is estimated by rescaling the AWF, accounting for the unsampled MVF, i.e.:
$\mathrm{AVF} = (1 - \mathrm{MVF})\,\mathrm{AWF}$ (2)
Fig. 3 (caption fragment): The values are for the in vivo case. NODDI does not assume parallel fibres, but rather accounts for fibre dispersion, which is described by a Watson distribution (Stoyan, 1988). NODDI can therefore be used in regions with more dispersed fibre orientations (as depicted in (b)).
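As a minimal sketch of the rescaling in Eq. (2) and of the aggregated g-ratio it feeds into, the snippet below converts an AWF into an AVF and then evaluates g = sqrt(AVF / (AVF + MVF)). This form of the g-ratio is assumed here to correspond to Eq. (1), following the volume-fraction argument of Stikov et al.; the function names and input values are illustrative assumptions, not measurements.

```python
import math


def avf_from_awf(awf: float, mvf: float) -> float:
    """Eq. (2): rescale the axonal water fraction by the non-myelin volume."""
    return (1.0 - mvf) * awf


def aggregated_g_ratio(avf: float, mvf: float) -> float:
    """Aggregated g-ratio from volume fractions (assumed form of Eq. 1)."""
    fvf = avf + mvf  # fibre volume fraction
    if fvf <= 0.0:
        raise ValueError("fibre volume fraction must be positive")
    return math.sqrt(avf / fvf)


# Illustrative (hypothetical) voxel values.
awf_example, mvf_example = 0.5, 0.3
avf_example = avf_from_awf(awf_example, mvf_example)
g_example = aggregated_g_ratio(avf_example, mvf_example)
print(f"AVF = {avf_example:.3f}, g-ratio = {g_example:.3f}")
```

Note that the MVF enters twice: once directly in the fibre volume fraction and once through the rescaling of the AWF, which is why errors in the myelin biomarker propagate strongly into the g-ratio.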
WMTI: The WMTI model (Fieremans et al., 2011) contains signal contributions from the intra-axonal and extra-cellular compartments and is therefore directly related to the "standard model" of diffusion MRI. In this model, the signal fraction of sticks (Fig. 1d) is used directly as a proxy for the AWF, while its complement (Fig. 1d) estimates the extra-cellular water fraction (Fig. 3c). In addition to the AWF, WMTI simultaneously estimates the intra-axonal diffusivity ($D_{a,\parallel}$) and two extra-cellular diffusivities ($D_{e,\perp}$ and $D_{e,\parallel}$) of an axially-symmetric ellipsoidal tensor. However, it assumes parallel fibres and has therefore been applied only to the corpus callosum (West et al., 2018a).
mcSMT: Like WMTI, the multi-compartment Spherical Mean Technique (mcSMT) model developed by Kaden et al. (Kaden et al., 2016) is based on the "standard model". However, instead of assuming parallel fibres, it uses the SMT to factor out the contribution of fibre orientation. As a result, it can be applied to the whole brain. Similar to the WMTI model, mcSMT estimates the signal fraction of the intra-axonal space, $\nu$. This has been used as a proxy for the AWF in g-ratio mapping (West et al., 2018a). In the mcSMT model, the intra- and extra-cellular parallel diffusivities are assumed to be equal ($D_{a,\parallel} = D_{e,\parallel}$) and the tortuosity model (Szafer et al., 1995) is used to relate the extra-cellular parallel and perpendicular diffusivities to each other via $\nu$: $D_{e,\perp} = (1 - \nu)\,D_{e,\parallel}$.
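The tortuosity constraint can be illustrated with a short sketch. It assumes the relation stated above, i.e. that the perpendicular extra-cellular diffusivity equals the parallel diffusivity scaled by one minus the intra-axonal signal fraction. The function name, the symbol names and the diffusivity value of 1.7 μm²/ms are illustrative assumptions.

```python
def tortuosity_perpendicular_diffusivity(d_parallel: float, nu_intra: float) -> float:
    """Perpendicular extra-cellular diffusivity from the tortuosity model."""
    if not 0.0 <= nu_intra <= 1.0:
        raise ValueError("intra-axonal signal fraction must lie in [0, 1]")
    return (1.0 - nu_intra) * d_parallel


# Hypothetical parallel diffusivity in um^2/ms, for illustration only.
D_PARALLEL = 1.7
for nu in (0.2, 0.5, 0.8):
    d_perp = tortuosity_perpendicular_diffusivity(D_PARALLEL, nu)
    print(f"nu = {nu:.1f} -> D_perp = {d_perp:.2f} um^2/ms")
```

The constraint removes one free parameter from the fit: the denser the axons, the lower the perpendicular diffusivity is forced to be.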

NODDI: The most commonly used method to estimate the AWF in g-ratio mapping has been the NODDI model (Fig. 3d, (Zhang et al., 2012; Stikov et al., 2015)). In contrast to the aforementioned models, NODDI is a 3-compartment signal model. It models not only the two signal compartments from the intra-axonal and extra-cellular space but also a third, isotropic signal component (with an associated isotropic signal fraction, Fig. 1d) to account for any partial-volume contamination by freely diffusing water, e.g. in CSF. To compensate for the increased number of model parameters and stabilize model fitting, the diffusion constants are fixed (Fig. 3b). The AWF is computed such that the intra-axonal signal fraction is corrected for the contribution of the CSF compartment, to ensure the g-ratio WM model assumption, i.e. AVF + MVF + EVF = 1 (Fig. 1d).
NODDI accounts for fibre dispersion using the single-parameter Watson distribution (Stoyan, 1988;Jespersen et al., 2012), making it applicable for whole brain g-ratio mapping.
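The CSF correction of the NODDI intra-axonal signal fraction can be sketched as follows. The sketch assumes the correction takes the form AWF = (1 − ν_iso) · ν_ic, the form commonly used when NODDI is combined with a myelin marker for g-ratio mapping; the symbols ν_ic and ν_iso, the function name and the example values are assumptions introduced for illustration.

```python
def noddi_awf(nu_ic: float, nu_iso: float) -> float:
    """CSF-corrected axonal water fraction from NODDI signal fractions.

    nu_ic:  intra-axonal signal fraction of the non-CSF signal (assumed symbol)
    nu_iso: isotropic (free water) signal fraction (assumed symbol)
    """
    for name, value in (("nu_ic", nu_ic), ("nu_iso", nu_iso)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return (1.0 - nu_iso) * nu_ic


# Same intra-axonal fraction, without and with CSF partial volume.
awf_no_csf = noddi_awf(nu_ic=0.6, nu_iso=0.0)
awf_with_csf = noddi_awf(nu_ic=0.6, nu_iso=0.2)
print(f"AWF without CSF: {awf_no_csf:.2f}, with 20% CSF: {awf_with_csf:.2f}")
```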
CHARMED: Compared to other diffusion models that have been used for g-ratio mapping, the Combined Hindered and Restricted Models of water diffusion (CHARMED) approach makes the fewest assumptions. It models diffusion in the extra-cellular space by a full ellipsoidal tensor (whereas the NODDI and WMTI models assume an axially-symmetric ellipsoid), and, in principle, it can account for crossing fibre configurations (Assaf et al., 2004;Assaf and Basser, 2005) unlike the standard NODDI approach. The CHARMED model can be further extended to additionally estimate axon diameters (e.g. (Assaf et al., 2008;Alexander et al., 2010;Huang et al., 2016)). This has been used by Duval et al. for g-ratio mapping in the spinal cord (Duval et al., 2017) and by Yu et al. (Yu et al., 2019) in patients with multiple sclerosis. However, such a protocol requires more extensive (and time-consuming) data acquisition.

Protocols for AVF mapping
While the first category of studies requires only a standard single-shell DTI protocol (Stikov et al., 2011; Mohammadi et al., 2015; Berman et al., 2018), the minimum protocol requirement for the second category of studies depends on the model to be used for AWF mapping. The WMTI model parameters can be estimated from the diffusion kurtosis tensor measurement (Fieremans et al., 2011; Jespersen et al., 2018). The NODDI, mcSMT, and WMTI model parameters can be estimated from a two-shell diffusion MRI protocol composed of a "lower" (~1 ms/μm²) and a "higher" (~2 ms/μm²) diffusion weighting¹. In contrast to the aforementioned models, the CHARMED model typically requires a more extended diffusion MRI protocol: Drakesmith et al. used a five-shell diffusion MRI dataset for g-ratio weighted imaging (Drakesmith et al., 2019). Extending the CHARMED model to also estimate axon diameters requires an even more advanced protocol in which the b-values and additional diffusion parameters, such as diffusion sensitization times, also have to be varied (see (Duval et al., 2017) for g-ratio mapping).
Typical protocol-associated issues that can introduce biases include ceiling effects (i.e. an intra-axonal signal fraction of 1 in white matter), which can be encountered with NODDI if b-shells are sub-optimally sampled (recommendations for optimal sampling are provided in (Zhang et al., 2012)). Rician noise in low SNR data can also lead to bias that can propagate into the AWF parameters from WMTI. Mapping accurate AWF parameters in the spinal cord comes with additional challenges because of increased susceptibility to nonlinear motion (e.g. due to swallowing, (Yiannakas et al., 2012)), physiological noise (e.g. (David et al., 2017)), or partial volume effects due to its small size (1 cm in diameter).
¹ Note that these parameters are for in vivo imaging and will be different for ex vivo MRI. For example, in the study by (West et al., 2018a) the lower and higher diffusion weightings were at ~3 ms/μm² and ~6 ms/μm², respectively.

Challenges for aggregated g-ratio mapping
The most important prerequisite of g-ratio mapping with MRI is that the biomarkers of MVF and AVF be accurate. Two key requirements for an accurate biomarker are model validity and a one-to-one correspondence between the MRI-biomarker and the gold standard MVF and AVF. While the first point can be investigated by theoretical evaluation of the model, the second point is typically not fulfilled, necessitating a calibration step.
Another important challenge is related to imaging artefacts and their impact on the multimodal combination of MVF and AVF biomarkers. In this section, we will first discuss the question of model validity associated with MRI-based MVF and AVF biomarkers, then we will use a simulation experiment based on ex vivo data to improve our understanding of the calibration step, and finally we discuss imaging artefacts associated with the multimodal combination of MRI data.

Model Validity
It is important to bear in mind that "all models are wrong but some are useful" 2 . In the following sections we will cover some of the key model assumptions made to facilitate in vivo mapping of the AVF and MVF and enable g-ratio mapping. We will also discuss the consequent limitations of application.

MVF models
The simplest model for estimating the MVF is based on PD mapping, in which a mono-exponential decay, i.e. a single water compartment, is typically assumed when extrapolating the signal to a TE of 0 ms to remove confounding T2(*) decay. This assumption is clearly not valid, and the constituent water compartments within a voxel will have variable influence depending on the echo times and spacings used (Whittall et al., 1999). This will be the case for both PD mapping and MWI. In general, a longer apparent T2(*), and a smaller fractional contribution from short T2 components, are observed as the first TE is increased. It is also important to fully sample the decay, which requires sufficiently long echo times to capture any slowly decaying compartments. However, when fitting magnitude data with long echo times, significant biases can be introduced by the Rician noise distribution and greatly alter the measured T2(*) values (Bjarnason et al., 2013). As noted earlier, complex-valued fitting can be particularly beneficial when the MWF is characterised via the shorter T2* decay (Nam et al., 2015b) of gradient echo imaging.
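The effect of the mono-exponential assumption can be made concrete with a small noise-free simulation. The sketch below generates a two-compartment decay, fits a single exponential by log-linear least squares, and extrapolates to TE = 0 for two different first echo times. The compartment fractions and relaxation times (15% with 10 ms, 85% with 50 ms), the echo spacing and all function names are hypothetical choices made for illustration only.

```python
import math


def two_pool_signal(te_ms, pd=1.0, f_short=0.15, t2_short=10.0, t2_long=50.0):
    """Noise-free signal of two non-exchanging water pools (illustrative values)."""
    return pd * (f_short * math.exp(-te_ms / t2_short)
                 + (1.0 - f_short) * math.exp(-te_ms / t2_long))


def monoexponential_fit(echo_times, signals):
    """Log-linear least squares fit; returns (signal at TE = 0, apparent T2)."""
    n = len(echo_times)
    log_signals = [math.log(s) for s in signals]
    mean_te = sum(echo_times) / n
    mean_log = sum(log_signals) / n
    sxx = sum((te - mean_te) ** 2 for te in echo_times)
    sxy = sum((te - mean_te) * (y - mean_log)
              for te, y in zip(echo_times, log_signals))
    slope = sxy / sxx
    intercept = mean_log - slope * mean_te
    return math.exp(intercept), -1.0 / slope


def fit_for_first_te(first_te_ms, n_echoes=6, spacing_ms=5.0):
    """Sample the two-pool decay from a given first TE and fit one exponential."""
    tes = [first_te_ms + i * spacing_ms for i in range(n_echoes)]
    return monoexponential_fit(tes, [two_pool_signal(te) for te in tes])


pd_early, t2_early = fit_for_first_te(2.0)   # first echo at 2 ms
pd_late, t2_late = fit_for_first_te(10.0)    # first echo at 10 ms
print(f"first TE  2 ms: PD = {pd_early:.3f}, apparent T2 = {t2_early:.1f} ms")
print(f"first TE 10 ms: PD = {pd_late:.3f}, apparent T2 = {t2_late:.1f} ms")
```

In this toy setting the extrapolated PD falls below the true value of 1 and decreases further as the first TE increases, while the apparent T2 lengthens, which mirrors the behaviour described in the text.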
Moreover, it has recently been shown that MWF depends on the orientation of fibres with respect to the external magnetic field (Birkl et al., 2020). Sensitivity to B0 inhomogeneity can also bias model fits, as can phase errors caused by physiological effects, such as breathing, and by eddy currents (Nam et al., 2015a), and motion, which distorts the decay (Magerkurth et al., 2011). Vulnerability to physiology and motion, together with partial volume effects, is particularly problematic for spinal cord imaging (Duval et al., 2017; Hori et al., 2018). More generally, these potential sources of artefact can manifest differently in vivo and ex vivo, meaning that while some techniques may work well in post mortem data, e.g. achieving cross-validation with histological data, they may not necessarily work well in vivo.
Models assuming two pools, either distinct non-exchanging water pools in myelin water imaging (Fig. 1b) or a bound and free pool that interact via magnetisation transfer (Fig. 1c), are also limited by the fact that they do not describe the full complexity of the tissue's microstructure. Higher numbers of pools are undoubtedly present (c.f. even the simplified model of Fig. 1a) but are unlikely to be distinguishable based on distinctly observable relaxation behaviour, either because of exchange or because it would require unattainable measurement precision. Simulation studies of more complete models have helped us to better understand the limitations of these simplifications.
In MWI, a slow exchange rate is central to the possibility of differentiating water pools, and their fractional sizes, based on experimentally distinguishable T2 times. As the exchange rate increases to a more intermediate regime, distinct compartments may still be discernible, but the relaxation times will appear reduced, as will the MWF (Does, 2018). The situation is further complicated by the presence of noise, which, even at low levels, can further broaden the distribution of apparent relaxation times, and even lead to distinct water environments merging in the three pool case (Does, 2018). The rate of magnetisation transfer exchange between macromolecular and water pools is an order of magnitude larger than the diffusion-driven exchange rate between water compartments (c.f. non-directional exchange rates of 10 s⁻¹ and 100 s⁻¹ respectively, (Levesque and Pike, 2009)). Theoretical analysis of a four pool model (analogous to Fig. 1a) has also shown that inter-compartmental exchange could substantially alter the estimated MWF, but that the qMT-based BPF is more robust (Levesque and Pike, 2009). In support of this theoretical analysis, much greater variation in MWF than BPF has been seen in the spinal cord, not only ex vivo (Dula et al., 2010) but also in vivo (Harkins et al., 2012). The variability observed across tracts was consistent with variable exchange due to differences in axon diameter and myelin thickness, the key determinants of the g-ratio.
Much of the extensive validation work for the MWI technique has been conducted ex vivo, and often with samples at room temperature. Both of these factors serve to slow the rate of exchange, increasing the validity of the slow exchange assumption (Does, 2018). Therefore, one must again exercise caution when extrapolating the validity of MWF metrics from ex vivo findings to the in vivo situation.
Although these three and four pool models are likely to be closer to the true tissue microarchitecture, inversion of such a complex model would be difficult in terms of both precision and bias. Indeed, even in the context of the two pool models that have been used to date for g-ratio mapping, the parameterisation must be supported by the data.
The comparatively high parameterisation of the mcDESPOT model has necessitated the use of advanced fitting procedures, such as stochastic genetic or region contraction algorithms (Deoni et al., 2008, 2013). The achievable precision and accuracy of the approach has been called into question (Lankford and Does, 2013; West et al., 2019) and it has been shown to suffer from degeneracy when seeking to determine optimal model parameters, which is only resolved by using a simpler model, excluding exchange (West et al., 2019). A common requirement of all model types, including those capturing the AWF, is that any fixed parameters, e.g. as might be assumed in qMT models where the T1 of the free and bound pools may be assumed to be equal (Cabana et al., 2015), be appropriate to the population under consideration, be they adults, children or indeed patients.
While it is also incorrect to assume that the non-aqueous compartment of tissue is entirely comprised of myelin, this has been shown to be the dominant source of the MT contrast mechanism in WM (Eng et al., 1991). In reality, the bound pool can be associated not only with the lipids and proteins of the myelin sheath, but also with any other macromolecule-bound protons (see Fig. 1a), e.g. glial cells (MacKay and Laule, 2016).
MWF will not only capture water within myelin sheaths surrounding axons but also that associated with any myelin debris in pathological cases, as has been shown in peripheral nerve (Webb et al., 2003). Similarly, MT-based measures lack specificity. Hence it should be borne in mind that although alterations in myelin content will change the measured MT effect, an alteration in MT effects cannot be uniquely attributed to a change in myelin and may be driven by other macromolecular changes. The derived MVF is also used to correct for the fact that the diffusion signal is insensitive to this compartment (by rescaling AWF). However, this neglects the non-myelin macromolecular contribution within the imaging voxel (Fig. 1a).
Finally, MWF is most commonly estimated within the myelin-rich white matter but has been shown to be significantly lower in grey matter (MacKay et al., 2006;MacKay and Laule, 2016). In this case, a further difficulty relates to whether or not sufficient sensitivity can be achieved in vivo to reliably estimate MWF in grey matter.

AWF models
Examples of strong simplifications used by the AWF models are that: the restricted compartment is solely associated with axons that can be modelled as impermeable sticks without cross-section, and that diffusion in the extra-cellular space is assumed to be Gaussian. The assumption that the restricted compartment is solely associated with axons is expected to be approximately correct in white matter because the density of other cells is small relative to the density of axons. In grey matter, however, the restricted diffusion signal will depend not only on the axonal compartment but on the density of cells of all types (soma and glia) as well. Thus, the simple three compartment approach of non-CSF tissue stipulating that AVF+MVF+EVF=1, where the combination of one diffusion and one myelin biomarker can be used to estimate AVF, no longer holds because the diffusion biomarker derived from the restricted signal component will be weighted by both AVF and EVF.
In addition to these model limitations, there is another problem associated with all of the approaches used for g-ratio mapping to date: they are based on the standard model comprised of compartments accounting for restricted and hindered diffusion. This model is known to suffer from a degeneracy of parameter estimates (Jelescu et al., 2016a) when measured with a linear diffusion weighting approach, i.e. the typical Stejskal and Tanner (Stejskal and Tanner, 1965) diffusion weighting scheme, which has been the case for all the aforementioned g-ratio mapping studies.
Alternatively, prior assumptions motivated by the biological composition of the tissue can be imposed to stabilize the parameter estimation. The NODDI, mcSMT, and WMTI models make particularly strong use of prior assumptions to allow the remaining model parameters to be estimated from data that can be acquired in a clinically feasible imaging time (see section 2.3.3). NODDI and mcSMT use a tortuosity model (Szafer et al., 1995) to relate the perpendicular extra-axonal diffusivity to the parallel extra-axonal diffusivity scaled by "one minus the neurite density" ($D_{e,\perp} = (1 - \nu)\,D_{e,\parallel}$, with $\nu$ the neurite density), i.e. the higher the neurite density in the tissue the lower the perpendicular diffusivity. As discussed in (Jelescu et al., 2015), NODDI additionally fixes the diffusivities to predefined values. This latter assumption would be problematic if different populations were studied for which these diffusivities were not applicable, e.g. children, patients, or postmortem brains. WMTI, on the other hand, assumes that all fibres are aligned in parallel, restricting its application to anatomical regions that support this assumption, e.g. it has been applied to the corpus callosum (West et al., 2018a). However, even in the corpus callosum axons are not necessarily aligned fully in parallel. This might be another reason (in addition to the fixed diffusivities used in NODDI) for the systematically smaller AWF estimates when using WMTI as compared to NODDI, as reported, e.g., in (Jelescu et al., 2015). Of course, this list of model assumptions that should be borne in mind is not exhaustive (Jelescu and Budde, 2017; Novikov et al., 2019). The Watson distribution used in NODDI can model fibre dispersion in a single fibre population, but cannot describe more complex fibre scenarios, such as crossing fibres. Nevertheless, it accounts, to a certain degree, for the variability of fibre alignment within fibre pathways and thus might be better suited for whole brain g-ratio mapping than models that assume strictly parallel fibre configurations.

Calibration for MVF
Assuming that the diffusion-based AWF is accurate³, the relation between the myelin biomarker and the MVF still needs to be established via a calibration step. This calibration is particularly important since it is not only required to quantify the MVF, but also to convert the AWF to AVF (Eq. 2). Histological investigations suggest that the relationship between typical myelin biomarkers and the MVF is linear (Fig. 4, (West et al., 2018b)), with a slope and an offset as unknown coefficients that need to be calibrated. It is expected that these coefficients will depend on instrumental variables and may therefore vary with MR systems, sequence parameters, as well as myelin biomarker models. Such dependency clearly limits the reproducibility and comparability of the MR-based g-ratio. Using simulations, Campbell et al. (2018) demonstrated that imperfect calibration can not only introduce a bias in the g-ratio, but can even cause the g-ratio to depend on the fibre volume fraction, negating the major strength of the g-ratio, i.e. that it is independent of FVF. Their simulations revealed that this dependence was different if the miscalibration was present only in the offset or only in the slope, and they coined the phrase aggregated g-ratio weighted imaging (Campbell et al., 2018).
To reduce these dependencies, two calibration methods have been used for in vivo g-ratio mapping. These have utilised a region of interest (ROI) in which either (a) the myelin biomarker was calibrated against the reference MVF, or (b) the measured g-ratio was calibrated against the reference g-ratio. We refer to these approaches collectively as single-point calibration methods. From Equation (4) it is clear that the single-point calibration methods are insufficient to establish a one-to-one correspondence between the MVF and the MRI-based myelin biomarker. One problem, for example, could be that the calibration coefficient will depend on the myelin biomarker within the reference ROI if the offset is non-zero (see Eq. (4)). This raises the following questions: How much does the MR-based g-ratio deviate from the ground truth? How large is this deviation relative to the expected dynamic range of the g-ratio, e.g. pathology-related differences?
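The limitation of a single-point calibration in the presence of an offset can be sketched numerically. The example assumes, for illustration, that the true relation has the form MVF = slope · marker + offset, and that the single-point calibration estimates only a scaling factor from one reference ROI. The direction of the linear relation, the coefficient values and the function names are assumptions, not values taken from the studies discussed here.

```python
def single_point_calibration_errors(slope, offset, marker_ref, markers):
    """Error of single-point-calibrated MVF relative to a linear ground truth.

    Assumed ground truth (hypothetical form): MVF = slope * marker + offset.
    The calibration uses one reference ROI and ignores the offset.
    """
    mvf_ref = slope * marker_ref + offset
    scaling = mvf_ref / marker_ref  # single-point calibration factor
    return [scaling * m - (slope * m + offset) for m in markers]


markers = [0.05, 0.15, 0.25]  # hypothetical myelin marker values
errors_with_offset = single_point_calibration_errors(0.9, 0.05, 0.15, markers)
errors_no_offset = single_point_calibration_errors(0.9, 0.0, 0.15, markers)
for m, e in zip(markers, errors_with_offset):
    print(f"marker = {m:.2f}: MVF error = {e:+.4f}")
```

With a non-zero offset the calibrated MVF is exact only at the reference point and deviates with opposite sign on either side of it; with zero offset the single-point calibration recovers the true MVF everywhere.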
Although the simulations in (Campbell et al., 2018) improved our understanding of the pitfalls of g-ratio mapping, they did not directly answer these questions. However, experimental data from the Does lab (West et al., 2018a, 2018b) could help answer these questions. In those experiments, the authors reported the changes of the g-ratio and the associated myelin-volume fractions in a range of mouse models spanning hypo- to hyper-myelination using both MRI and electron microscopy. The MRI-based data included three biomarkers of myelin content: MWF, BPF, and MTV.
Since in this case MTV was derived from the MWI experiment (i.e. with a multicompartment model) we denote it . In the following, we will use the range of reported values for MVF from histology, and AWF from diffusion MRI, to generate ground truth parameters for a subsequent simulation-based experiment.
We will evaluate the bias and error between the ground truth g-ratio and that obtained by calibrated MRI measures using Bland-Altman analyses (Bland and Altman, 1986). In the Bland-Altman plots (Fig. 5), bias captures the mean difference between the two, whereas error captures the deviation from a one-to-one relationship between the ground truth and the MR g-ratio. While a potential bias can be retrospectively corrected, the error in the g-ratio mapping method will define its sensitivity and ability to detect change or differences between individuals, groups or over time. Any error must be lower than the expected difference between groups or due to pathology if the g-ratio mapping method using MRI is to be of use to reliably assess these differences.
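A Bland-Altman analysis of this kind can be sketched in a few lines. The example assumes that the difference is taken as MR g-ratio minus ground truth g-ratio, that the error is the interval of ±1.96 sample standard deviations of the differences, and that both are expressed relative to the dynamic range of the ground truth. The sign convention, the function name and the data are illustrative assumptions.

```python
import statistics


def bland_altman(ground_truth, measured):
    """Return (bias, error half-width) of measured versus ground truth values."""
    differences = [m - g for g, m in zip(ground_truth, measured)]
    bias = statistics.mean(differences)
    half_width = 1.96 * statistics.stdev(differences)  # sample standard deviation
    return bias, half_width


# Hypothetical g-ratios, for illustration only.
g_truth = [0.80, 0.85, 0.90, 0.95]
g_mr = [0.78, 0.84, 0.91, 0.97]

bias, half_width = bland_altman(g_truth, g_mr)
dynamic_range = max(g_truth) - min(g_truth)
print(f"bias  = {100 * bias / dynamic_range:+.1f}% of dynamic range")
print(f"error = +/-{100 * half_width / dynamic_range:.1f}% of dynamic range")
```

A uniform offset between the two g-ratios changes only the bias, whereas a slope different from one or scatter around the identity line widens the error interval.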

Ex vivo simulation experiment
To generate a realistic range of ground truth values for the MVF and AWF, we used the histology-based MVF values reported in (West et al., 2018b), which range from approximately 0.015 to 0.285 (cf. Fig. 7 in (West et al., 2018b)) and the MRI-based AWF values as reported in , ranging from approximately 0.3 to 0.7 (cf. Fig.   13 in ). Note all measurements are derived from the same animals.
Then, we used Eqs. (1) and (2) to generate the ground truth g-ratio values ( , which ranged from 0.79 to 0.97) 4 . To generate the MRI-based myelin marker, we used the linear relationships reported in (West et al., 2018b) between the histological MVF (here: the ground truth MVF) and three myelin biomarkers: = 0.45 + 0.086 (Fig. 4a 5 ), = 0.89 − 0.016 (Fig. 4b), and = 0.75 − 0.047 (Fig. 4c 6 ). Note that the calibration of was independent of the experimental data in (West et al., 2018b) but based on literature values from an independent experiment. Therefore, was used in the following simulations instead of . This was opposite to the calibrated , which was estimated using the experimental data in (West et al., 2018b). Also note that the requires an intrinsic calibration to normalize the water content (see section 2.2.1).
In this simulation experiment, we compared the ground truth g-ratio with the non-calibrated g-ratio values and with the calibrated g-ratio values using either the g-ratio (Fig. 5b,e) or MVF (Fig. 5c,f) single-point calibration (SPC) methods (depicted as scatter plots in Fig. 5a-c and Bland-Altman plots in Fig. 5d-f). The SPC-reference values were based on the average in control mice (blue symbols in Fig. 7 (West et al., 2018b): MVF ≈ 0.175 and g-ratio ≈ 0.85). An index denoted the myelin biomarker that was used to generate the g-ratio (blue, green, and black crosses; black denotes the MTV-based g-ratio). When calculating the g-ratios, an upper and lower limit was applied, meaning that if the squared g-ratio exceeded 1, the g-ratio value was set to one, and if the squared g-ratio was below 0, the g-ratio was set to zero. In the results, we report the bias and error of the Bland-Altman analyses relative to the dynamic range of simulated ground truth g-ratios (maximum minus minimum = 0.18).
⁴ Note that these values are larger than the g-ratios assessed via ex vivo histology in (West et al., 2018a), ranging from 0.75 to 0.88. This may be because the latter were determined from myelinated axons only. The g-ratios in these simulations, however, are based on the combination of histological MVF and diffusion MRI-based AWF. The diffusion-based AWF should be sensitive to both myelinated and unmyelinated axons (Beaulieu and Allen, 1994a; Beaulieu, 2002). Since the g-ratio of unmyelinated axons is 1, a larger ground truth g-ratio is expected than the EM-based values in (West et al., 2018a).
⁵ Note that the linear equation reported in Figure 7 (West et al., 2018b) had a negative offset (−0.086 rather than +0.086). However, this is assumed to be in error since the offset must be positive to describe the black curve.
⁶ This linear equation was generated from the normalized water content reported in Figure 8 (West et al., 2018b), and the conversion to MTV was done according to (Berman et al., 2018).
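The upper and lower limits applied to the g-ratio can be sketched as a clamp on the squared g-ratio before taking the square root. The sketch assumes g² = 1 − MVF/FVF with FVF = MVF + (1 − MVF) · AWF, so that a miscalibrated MVF below 0 or above 1 pushes g² outside [0, 1]. The function name and the input values are hypothetical.

```python
import math


def clamped_g_ratio(mvf: float, awf: float) -> float:
    """g-ratio with the squared value limited to the interval [0, 1]."""
    avf = (1.0 - mvf) * awf          # Eq. (2)
    fvf = mvf + avf                  # fibre volume fraction
    if fvf == 0.0:
        raise ValueError("fibre volume fraction must be non-zero")
    g_squared = 1.0 - mvf / fvf
    g_squared = min(1.0, max(0.0, g_squared))  # apply upper and lower limits
    return math.sqrt(g_squared)


# A plausible voxel, and two miscalibrated MVF values that trigger the limits.
print(clamped_g_ratio(mvf=0.3, awf=0.5))    # within limits
print(clamped_g_ratio(mvf=-0.05, awf=0.5))  # g^2 > 1, set to one
print(clamped_g_ratio(mvf=1.2, awf=0.5))    # g^2 < 0, set to zero
```

Values that hit either limit no longer vary with the underlying MVF, which is one way an abrupt change in slope can appear in a scatter plot of MR against ground truth g-ratios.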

Simulation results
The results are summarized in Figure 5 as well as in Table 2. Without calibration, we found that the bias was smallest for the BPF-based g-ratio (6.6%) and largest for the MTV-based g-ratio (-41.7%). The error was smallest for the MWF-based g-ratio (1.0%), moderate for the MTV-based g-ratio (7.1%) and largest for the BPF-based g-ratio (25.5%). Regardless of calibration method (i.e. MVF or g-ratio reference) the calibration reduced the bias for the MWF-based g-ratio (ca. -3%) and the MTV-based g-ratio (ca. -8%) but increased it for the BPF-based g-ratio (ca. 11%). Importantly, the calibration increased the error for MWF-based (ca. 6%) and MTV-based (ca. 21%) g-ratios and had almost no effect on the BPF-based g-ratio. Altogether, the BPF-based g-ratio exhibited comparatively large error, exacerbated by its significantly compressed dynamic range (≈ 2%) as compared to the g-ratios using the other MVF proxies (≈ 25%, ≈ 29%, ≈ 21%). Note that the abrupt change in the slopes of the MTV-based g-ratio values in Fig. 5 (black crosses) was due to reaching the upper limit for the MR g-ratio (i.e. the calibration led to a ceiling effect for this biomarker).
Table 2 (caption): The relative bias and error introduced by the g-ratio-based (3rd row) and MVF-based (4th row) single-point calibration as assessed by the Bland-Altman analysis for three different myelin biomarkers: Bound Pool Fraction (2nd column), calibrated Myelin Water Fraction (3rd column), and Macromolecular Tissue Volume (4th column). The reference values for the single-point calibrations are depicted in the last column. The bias and error are defined via the respective Bland-Altman analysis illustrated in Fig. 5. Bias is defined as the mean difference between the MR and ground truth g-ratios. Error is defined as the interval of ±1.96 standard deviations of this difference. Here, bias and error are presented in percentage, relative to the dynamic range of the ground truth g-ratios (maximum minus minimum = 0.18).

What we can learn from the simulation experiment
We learned from this simulation that the single-point calibration can reduce the bias in the g-ratio (i.e. two out of three MRI-based g-ratio values became closer to the ground truth) but it comes at the cost of an increased error (i.e. the deviation from a one-to-one correspondence between the MR and the ground truth g-ratio increased after calibration).
We expect that the latter feature is of more relevance to typical g-ratio studies, where longitudinal changes in the g-ratio or changes in the g-ratio between groups will likely be investigated. Moreover, the simulations showed that MWF and MTV are reasonable biomarkers for the g-ratio in terms of their error. Perhaps surprisingly, they perform best, in terms of error, when no calibration is performed. BPF, on the other hand, is degraded as an MVF biomarker when using single-point calibration, but also suffers from larger error when no calibration is performed. Interestingly, the two better performing MVF biomarkers, i.e. MWF and MTV, both involved a calibration step in their computation, unlike the BPF. For MWF the calibration was purely based on literature values, whereas MTV was calibrated against a grey matter value specific to each brain.
Based on these simulations, a number of conclusions can be drawn. First, the single-point calibration method is insufficient to calibrate the g-ratio for the investigated scenarios where the offset parameter was non-zero. Second, at least in contexts consistent with those investigated here (e.g. fixed tissue, ex vivo MRI, mice), the BPF-based g-ratio should not be used without more sophisticated calibration methods that are capable of accurately estimating both the slope and offset (see, e.g., (West et al., 2018b)). Third, MWF and MTV might be better biomarkers for g-ratio weighted imaging because they can be readily used without calibration. Some important caveats to these conclusions are outlined in the following.
Since gold standard information is missing when investigating in vivo data, we have used ex vivo data to generate ground truth data for a simulation experiment. It is important to bear in mind that the presented results may not translate to the in vivo case because of potentially different model validity (see section 3.1.1). Data quality can also vary considerably between in vivo and ex vivo imaging scenarios, not only because of the use of fixed tissue ex vivo (and concomitant mitigation of physiological and motion corruption as well as the capacity for markedly longer scanning protocols) but also because of the different MRI techniques and non-clinical imaging systems used (West et al., 2018b). As a consequence, the relative differences in performance and impact of calibration for each of the g-ratios derived from different myelin biomarkers (MWF, BPF, and MTV) should be interpreted with care. This important point is further emphasised in Figure 6, which demonstrates that the small dynamic range in the BPF-based g-ratio predicted by simulation (Fig. 6a) does not manifest in comparable in vivo g-ratio maps (Fig. 6b). These in fact show a greater dynamic range for the MTsat-based g-ratio and higher correspondence between the two approaches after calibration. This contrasting observation might be due to the reasons outlined above, the use of somewhat different techniques in vivo and ex vivo, or fixation issues, e.g. fixation has been shown to strongly alter MR parameters in normal appearing white matter (Schmierer et al., 2008).
In summary, these simulations show that the single-point calibration, used in virtually all in vivo g-ratio mapping studies to date (Table 1), does not fully resolve the issue of converting MR proxies to the true MVF and can even increase bias and error in the g-ratio estimates. Therefore, further methodological development and validation are required to find the optimal means of ensuring the necessary validity and sensitivity of the MR g-ratio.

Mohammadi, 2018a), were acquired using the protocol described in the caption of Fig. 7. Note that the MR g-ratios ("g3" and "g4") in the original publication were erroneous due to a reported mistake, see corrigendum (Ellerbrock and Mohammadi, 2018b). Here, the correct maps are depicted.
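The calibration issue summarised above can be illustrated with a minimal numerical sketch. It assumes a hypothetical linear relation between an MR myelin proxy and the true MVF with a non-zero offset; the slope, offset, volume fractions and function names are invented for illustration and are not taken from the simulations in this review. The aggregate g-ratio is computed with the relation g = sqrt(AVF / (AVF + MVF)). For simplicity, the two-parameter fit is calibrated on the same samples it is applied to, which would not be possible in practice.

```python
import numpy as np

def g_ratio(mvf, avf):
    """Aggregate g-ratio from myelin and axonal volume fractions."""
    return np.sqrt(avf / (avf + mvf))

def single_point_calibration(proxy, proxy_ref, mvf_ref):
    """Scale the proxy to match a known MVF in one reference region.

    This implicitly assumes that the offset of the proxy-MVF relation is zero.
    """
    return proxy * (mvf_ref / proxy_ref)

def linear_calibration(proxy, proxy_cal, mvf_cal):
    """Fit MVF = slope * proxy + offset on calibration samples and apply it."""
    slope, offset = np.polyfit(proxy_cal, mvf_cal, deg=1)
    return slope * proxy + offset

# Hypothetical ground truth (illustrative values only).
true_mvf = np.array([0.10, 0.20, 0.30, 0.40])
avf = np.full_like(true_mvf, 0.35)

# Assumed proxy model: true MVF = TRUE_SLOPE * proxy + TRUE_OFFSET.
TRUE_SLOPE, TRUE_OFFSET = 2.0, 0.05
proxy = (true_mvf - TRUE_OFFSET) / TRUE_SLOPE

# Single-point calibration using the last sample as the reference region.
mvf_single = single_point_calibration(proxy, proxy[-1], true_mvf[-1])
# Two-parameter calibration estimating both slope and offset.
mvf_linear = linear_calibration(proxy, proxy, true_mvf)

g_true = g_ratio(true_mvf, avf)
g_err_single = g_ratio(mvf_single, avf) - g_true
g_err_linear = g_ratio(mvf_linear, avf) - g_true

print("g-ratio error, single-point:", np.round(g_err_single, 3))
print("g-ratio error, slope+offset:", np.round(g_err_linear, 3))
```

In this toy setting the single-point calibration is exact only at the reference sample, and its error grows with distance from that sample, whereas the slope-and-offset fit recovers the assumed MVF.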

Unification of Multi-modal data
The aggregated g-ratio weighted imaging approach combines two complementary MRI contrasts, sensitive to the axonal-water and myelin volume fractions respectively. Given that each quantitative MRI technique is typically vulnerable to a specific set of artefacts, the combination of multiple data types needs to take care not to amplify these artefacts such that they obscure or corrupt the quantity of interest. For example, we have previously demonstrated that modality-specific spatial distortions, arising from inhomogeneous magnetic susceptibility distributions in the brain, can prevent voxel-wise spatial correspondence of the AWF and MVF proxies from being achieved and lead to erroneous g-ratio estimates. Even after correcting the susceptibility-induced distortions using dedicated tools (Ruthotto et al., 2012, 2013), residual misalignments between the EPI-based diffusion data and the gradient-echo based magnetisation transfer saturation maps can persist. The most obvious reason for residual misalignments is, of course, insufficient susceptibility distortion correction, but partial-volume effects in the EPI-based diffusion data associated with the typically lower spatial resolution, the EPI readout, and eddy current distortions can also lead to lower white-matter tissue probability in the diffusion data relative to the MTsat map (Fig. 7).
Here, we suggest using the overlap between two modality-specific white-matter tissue probability maps (TPMs) to remove regions in the resulting g-ratio maps (Fig. 7a.v and 7b.v) that do not overlap between the two MRI contrasts, i.e. the region outside the red contours in Fig. 7a.iii and Fig. 7b.iii. In the example of Fig. 7, the TPMs were generated from the MTsat (Fig. 7a.iii and 7b.iii) and NODDI (Fig. 7a.iv and 7b.iv) maps, respectively.

500, 1000, 2500 s/mm²), each shell with 60 directions and additional b=0 images interspersed every 10 images, resolution: 1.6 mm isotropic, repetition time: 5300 ms, echo time: 73 ms. All scans were performed on a 3T PRISMA MRI (Siemens Healthcare, Erlangen, Germany), using the Siemens 1-channel transmitter (Tx) / 64-channel receiver (Rx) head-coil. The spatial distortions were reduced using the ACID toolbox (www.diffusiontools.com).
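The overlap masking suggested above can be sketched as follows. This is a minimal illustration rather than the implementation used for Fig. 7: the function name, the probability threshold of 0.5 and the toy arrays are assumptions, and the two TPMs are assumed to be already resampled onto the same voxel grid as the g-ratio map.

```python
import numpy as np

def mask_g_ratio_by_tpm_overlap(g_map, wm_tpm_a, wm_tpm_b, threshold=0.5):
    """Keep g-ratio values only where both white-matter TPMs exceed the threshold.

    g_map, wm_tpm_a and wm_tpm_b must share the same shape (same voxel grid).
    Voxels outside the overlap are set to NaN. The threshold is an assumed value.
    """
    g_map = np.asarray(g_map, dtype=float)
    overlap = (np.asarray(wm_tpm_a) > threshold) & (np.asarray(wm_tpm_b) > threshold)
    return np.where(overlap, g_map, np.nan), overlap

# Toy 2x2 example with invented values.
g_map = np.array([[0.70, 0.60],
                  [0.65, 0.80]])
tpm_mtsat = np.array([[0.90, 0.90],
                      [0.20, 0.95]])   # white-matter probability from the MTsat map
tpm_noddi = np.array([[0.80, 0.30],
                      [0.90, 0.99]])   # white-matter probability from the NODDI map

g_masked, overlap = mask_g_ratio_by_tpm_overlap(g_map, tpm_mtsat, tpm_noddi)
print(g_masked)
```

Requiring both probabilities to exceed the threshold is a conservative choice: a voxel that is classified as white matter in only one contrast, for instance because of residual distortion, is discarded.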

Validation of g-ratio mapping
Clear in vivo validation of g-ratio mapping is highly desirable, but generally unfeasible. We therefore typically rely on ex vivo histology for validation. A number of differences between these two imaging scenarios have been highlighted in previous sections. Here we summarise key points specifically pertinent to the ex vivo histology (e.g. electron microscopy) gold standard scenario. Most importantly, one has to consider the change in tissue composition that occurs when going from the in vivo to the ex vivo situation.
In this case, the MRI signal and its parameters can change significantly due to, e.g., (i) autolysis (varying post-mortem interval (Shepherd et al., 2009)), (ii) fixation and the associated changes of cross-linking proteins, tissue shrinkage, and slowed diffusion processes (Schmierer et al., 2008; Shepherd et al., 2009), and (iii) temperature changes (Birkl et al., 2016). These changes affect diffusion (Dyrby et al., 2011) and other important MR parameters, such as T1, T2* (Streubel et al., 2019) and susceptibility (based on signal phase) contrasts. However, despite these changes, the most important MRI mechanisms (e.g. diffusion anisotropy and relaxation mechanisms) are still present after fixation (Roebroeck et al., 2008). Nonetheless, it is necessary to characterize these differences in MRI parameters to enable translation and interpretation across in vivo and ex vivo measurements.

g-ratio
To date, only two studies have compared g-ratio measurements from ex vivo histology with MRI (Stikov et al., 2015; West et al., 2018a). Stikov et al. (2015) compared the g-ratio measured with in vivo MRI and ex vivo histology in a macaque monkey. West et al. (2018a) compared g-ratio maps based on the WMTI, mcSMT and NODDI models to the equivalent g-ratio measured using gold standard histology techniques in mouse models. All three methods showed a moderate linear correspondence. It is important to note that for the NODDI model to work, the fixed diffusivities had to be adjusted empirically, suggesting that it is less well suited, at least for ex vivo data. Another interesting finding was that a simplified g-ratio model, in which the extra-axonal volume fraction was assumed to be zero (i.e. the fibre volume fraction was assumed to be one), such that AVF = 1 - MVF, performed as well as the above-mentioned diffusion signal models. The conclusions from this finding could be quite radical, i.e. that it is not necessary to measure both diffusion MRI and myelin markers to estimate changes in g-ratio across a strongly myelinating process. However, caution is again required since the gold-standard g-ratio (measured by histology) did not account for the contribution of unmyelinated axons, on which the MRI g-ratio is also expected to depend. Finally, it is important to highlight that, to date, no human specimen has been used to validate the g-ratio. This, however, would be a crucial step in linking ex vivo histology with our target in vivo application, i.e. g-ratio mapping in the human brain.
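To make the simplified model mentioned above concrete, the following sketch compares the aggregate g-ratio computed from separate MVF and AVF estimates with the simplified variant in which AVF = 1 - MVF. It assumes the aggregate relation g = sqrt(AVF / (AVF + MVF)); the function names and numerical values are illustrative only.

```python
import numpy as np

def g_ratio(mvf, avf):
    """Aggregate g-ratio from myelin (MVF) and axonal (AVF) volume fractions."""
    mvf = np.asarray(mvf, dtype=float)
    avf = np.asarray(avf, dtype=float)
    return np.sqrt(avf / (avf + mvf))

def g_ratio_simplified(mvf):
    """Simplified model: AVF = 1 - MVF, so that g reduces to sqrt(1 - MVF)."""
    mvf = np.asarray(mvf, dtype=float)
    return g_ratio(mvf, 1.0 - mvf)

# Illustrative values: the simplified model needs the myelin marker only.
mvf = np.array([0.20, 0.30, 0.36])
avf = np.array([0.30, 0.35, 0.40])   # e.g. from a diffusion model (invented values)

print("full model:      ", np.round(g_ratio(mvf, avf), 3))
print("simplified model:", np.round(g_ratio_simplified(mvf), 3))
```

Because the simplified variant depends on MVF alone, any change it reports in the g-ratio is a monotonic function of the change in the myelin marker, which is why it can track strongly myelinating processes without diffusion data.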

MVF
Here we discuss comparisons between myelin-sensitive MRI-based metrics and the gold standard MVF measured via histology that have been carried out in the context of g-ratio mapping. In early work, Stikov et al. compared the PSR estimated via MRI with the MVF estimated from electron microscopy (EM) in the corpus callosum of a macaque. They found a linear dependence between these quantities but did not find the relationship to be significant, perhaps due to limited myelin-related variance present in the data. Using mouse models spanning hypo- and hyper-myelinated conditions has allowed a broader variance in myelination to be investigated (West et al., 2018b

AVF
Validation of AVF presents some distinct challenges. The AVF estimate derived from diffusion-based metrics (either via FVF or AVF = (1 - MVF) AWF) is sensitive to the pool of myelinated axons but also influenced by the unmyelinated axons Allen, 1994a, 1994b; Beaulieu, 2009; Jones, 2010). By contrast, gold standard EM-based assessment of volume fractions often focuses on the myelinated axons only West et al., 2018a, 2018b; Zaimi et al., 2018; Tabarin et al., 2019). The myelin sheath provides protection against autolysis and acts as a contrast-enhancer for microscopy, making myelinated axons more likely to be present and more easily detectable than unmyelinated axons (Olivares et al., 2001). In 2D EM, unmyelinated axons can also be confused with non-neuronal processes from cells like astrocytes or microglia. Ideally, a high-resolution microscopy approach combined with a neuron-specific stain, e.g. for neurofilaments, should be used to assess the AVF by encompassing all axons. Jelescu et al. (2016b) compared MRI-based AWF with a histological counterpart (via Eq. (2)), including both myelinated axons and an estimate of unmyelinated axons, in mouse models with different degrees of myelination. They found a linear relation, though not a 1:1 correspondence. This is an indication that MRI-based AWF also needs to be calibrated.
However, it is expected that miscalibration of AWF will have less effect on MR g-ratio than miscalibration of MVF because it has been shown that (de)myelination-related changes in the g-ratio are strongly driven by changes in MVF and less so by the AWF (West et al., 2018a).
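The relative impact of miscalibrating the two inputs can be explored with a small numerical sketch. It uses the relation AVF = (1 - MVF) AWF given above together with the aggregate relation g = sqrt(AVF / (AVF + MVF)). The chosen volume fractions and the 10% scaling errors are purely illustrative assumptions, and the outcome holds for these values only, not in general.

```python
import numpy as np

def g_from_mvf_awf(mvf, awf):
    """Aggregate g-ratio from MVF and AWF, using AVF = (1 - MVF) * AWF."""
    avf = (1.0 - mvf) * awf
    return np.sqrt(avf / (avf + mvf))

# Illustrative white-matter-like values (assumed, not measured).
mvf_true, awf_true = 0.30, 0.50
g_ref = g_from_mvf_awf(mvf_true, awf_true)

# Apply the same relative scaling error (10%) to each input in turn.
dg_mvf = g_from_mvf_awf(1.1 * mvf_true, awf_true) - g_ref
dg_awf = g_from_mvf_awf(mvf_true, 1.1 * awf_true) - g_ref

print(f"reference g-ratio:            {g_ref:.3f}")
print(f"change for 10% error in MVF:  {dg_mvf:+.3f}")
print(f"change for 10% error in AWF:  {dg_awf:+.3f}")
```

For these example values, the same relative error shifts the g-ratio more when it is applied to MVF than when it is applied to AWF, and in the opposite direction.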

Conclusion and Outlook
This review provides methodological background for the MRI techniques pertinent to aggregate g-ratio weighted mapping, with the aim of improving understanding of the currently used biomarkers and of providing insight into the potential and, particularly, the pitfalls. G-ratio weighted mapping has the potential to achieve non-invasive mapping of this functionally-relevant microstructural parameter by utilising the strength of multi-contrast quantitative MRI and biophysical models (also known as in vivo histology using MRI). The main take-home messages of this review are: (1) to fully benefit from the advantages of the aggregated g-ratio model, further work on a more appropriate calibration method is necessary to enable simultaneous estimation of both the slope and offset of the relationship between MRI markers and the true MVF; (2) more ex vivo histology gold standard measurements of human brain tissue are required to assess the typical range of MR g-ratio values that can be expected in vivo; (3) the quest to find the most appropriate MRI biomarkers for MVF and AVF for the in vivo situation is ongoing. In particular, there is currently a lack of validation studies for biomarkers of the AVF compartment using diffusion-based metrics. A major challenge here will be the estimation of the contribution to the AVF from unmyelinated axons (and potentially cells as well) via histology.
Other models that combine WMTI parameters and fibre dispersion (as defined by a Watson distribution, e.g. in NODDI) (Jelescu et al., 2015; Jespersen et al., 2018) might have the potential to combine the sensitivity of WMTI to compartmental diffusivities with the less strict assumption about fibre alignment of the NODDI model. However, they suffer from model-inherent degeneracies (Jelescu et al., 2016a). One proposed solution to this degeneracy is to combine linear encoding schemes with planar or spherical diffusion sequences (Reisert et al., 2018; Coelho et al., 2019). A few studies have compared the diffusion anisotropy and intra-cellular signal fraction from linear diffusion weighting with those from planar diffusion weighting sequences: Henriques et al. (2019) did so ex vivo in mice, and another study did so in vivo in humans. However, these techniques have not yet been used for aggregated g-ratio weighted imaging. Another study has revealed a one-to-one correspondence between a simplified NODDI model and the mean diffusivity and fractional anisotropy as measured with DTI. NODDI-DTI might help to link the models of g-ratio mapping studies based on a standard DTI protocol to those models that were based on more advanced diffusion MRI protocols. However, it has not yet been applied to g-ratio mapping either. Future directions might also include the use of generative signal models that directly depend on the MR g-ratio (e.g., Bowtell, 2012, 2013) to allow its extraction, or alternatively estimating the g-ratio from a multi-compartment GRE signal model (Thapaliya et al., 2018, 2020). New approaches that promise greater specificity to myelin (e.g. ihMT (Varma et al., 2015; Ercan et al., 2018; Duhamel et al., 2019)) and intra-axonal (Shemesh et al., 2016) compartments may also improve our capacity to directly map the g-ratio in the human brain in vivo.

Acknowledgements
We would like to thank Nadège Corbin, Luke J.

Limitations of the AWF estimation approach (LA)
LA.1 Assumes parallel fibres and thus can be applied only in regions where this assumption is not violated (typically it has been applied in the corpus callosum).
LA.2 The TFD (Reisert et al., 2012) has been assumed to be proportional to the fibre volume fraction, neglecting the contribution of the myelin water. It relies on a tractography algorithm and thus inherits the associated limitations. G-ratios based on this method show a larger scan-rescan variability than NODDI-based g-ratios (Ellerbrock and Mohammadi, 2018a).
LA.3 NODDI and mcSMT relate the perpendicular extra-axonal diffusivity to the parallel extra-axonal diffusivity scaled by "one minus the neurite density": $D_{e,\perp} = (1 - \nu)\, D_{e,\parallel}$. Moreover, NODDI and mcSMT impose a one-to-one scaling between the intra- and extra-cellular parallel diffusivities: $D_{a,\parallel} = D_{e,\parallel}$.
LA.4 NODDI fixes it to a constant value (for in vivo healthy adults the diffusivities are usually assumed to be: $D_{a,\parallel} = D_{e,\parallel} = 1.7\ \mu\mathrm{m}^2/\mathrm{ms}$ and $D_0 = 3\ \mu\mathrm{m}^2/\mathrm{ms}$).
LA.5 The WMTI model assumes parallel fibres and thus can be applied only in regions where this assumption is not violated (typically it has been applied in the corpus callosum, but whether the model assumptions are sufficiently met there is unclear).
LA.6 These studies did not provide sufficient information to assess the specific implementation of the diffusion model.
Limitations of the MVF estimation approach (LM)
LM.1 Requires a conversion factor to convert the MRI-based myelin marker to the myelin volume fraction, which is done via histological data in different species. If this conversion factor is incorrect, the g-ratio will not be decoupled from the FVF.
LM.2 MTsat depends not only on the bound pool fraction but also on the rate of exchange, k, between the bound and free pools. Moreover, it is a semi-quantitative measure because it depends on the particular off-resonance pulse used in the sequence, most notably its power and offset frequency (Helms et al., 2008).
LM.3 Results in biased (over-)estimates (West et al. 2019) with artifactually high precision (West et al. 2019; Lankford and Does, 2013).
LM.4 Requires estimation of the proton density and therefore a normalisation factor, e.g. the proton density in CSF. The optimal choice of the normalisation region will depend on the acquisition scheme, and will dictate the precision and accuracy of the MTV estimate. The modulation of the receiver coil's sensitivity also needs to be removed, either by constrained model fitting or measurement (Mezer et al. 2016).
LM.5 The MTR suffers from the same limitations as MTsat, but retains dependence on both T1 and transmit-field inhomogeneities. These additional dependencies make it more prone to error as demonstrated e.g. in .
LM.6 When fitting magnitude multi-echo data with long echo times, significant biases can be introduced by the Rician noise distribution that can greatly alter the measured T2(*) values (Bjarnason et al. 2013). Fitting results are sensitive to the choice of TE and echo spacing, e.g. higher apparent T2(*) and smaller fractional contributions from short T2 species as the first echo time is increased (Whittall et al., 1999; Cercignani et al., 2018). A broad range of echo times is required to fully characterise both long and short T2 components. Short echo times are required to acquire a signal with appreciable contribution from myelin, which is particularly problematic for gradient echo imaging due to the very short T2* of myelin.
LM.7 Error can result from the sensitivity to B1+ effects, both inhomogeneity, which can lead to stimulated echoes distorting the decay, and slice profile effects for 2D acquisitions (Lebel and Wilman, 2010). Power deposition can also be problematic, particularly at UHF.
LM.8 Sensitivity to B0 inhomogeneity can bias model fits (Nam et al. 2015a). Phase errors caused by breathing and eddy currents can also lead to errors if uncorrected (Nam et al. 2015b).
LM.9 Assumes a two-pool model, which is a simplification, but likely sufficient to be supported by in vivo data acquired in the human brain (Levesque and Pike, 2009).
LM.10 The model validity is unknown.

Bound pool fraction
Bound pool magnetisation relative to the combined bound and free pool magnetisation amplitudes as measured using qMT.

PSR (F) Pool size ratio
Bound pool magnetisation relative to free pool magnetisation amplitude as measured using qMT.

AVF Axonal volume fraction
The fraction of the imaging voxel volume that is intra-axonal.

AWF Axonal water fraction
The fraction of the MRI water signal originating from the axonal compartment.

MVF Myelin volume fraction
The fraction of the imaging voxel volume associated with myelin. This includes both the myelin itself and the water trapped between its bilayers.

MWF Myelin water fraction
The fraction of the MRI water signal identified as exhibiting faster relaxation and attributed to the water trapped within the myelin sheath.

EVF Extra-cellular volume fraction
The fraction of the imaging voxel volume that lies outside the fibres.

FVF Fibre volume fraction
The fraction of the imaging voxel volume occupied by the fibres, i.e. by the axons and their myelin sheaths (FVF = AVF + MVF).

PD Proton density
The concentration of MR-visible water relative to the concentration in the same volume comprised entirely of water.

MTV(F) Macromolecular tissue volume (fraction)
The (fractional) volume of the imaging voxel that is comprised of macromolecules, i.e. that is not MR-visible water.

MTsat Magnetisation transfer saturation
The steady state signal loss as a result of magnetisation transfer between the bound and free pools.