Fixel-based Analysis of Diffusion MRI: Methods, Applications, Challenges and Opportunities

Diffusion MRI has provided the neuroimaging community with a powerful tool to acquire in-vivo data sensitive to microstructural features of white matter, up to 3 orders of magnitude smaller than typical voxel sizes. The key to extracting such valuable information lies in complex modelling techniques, which form the link between the rich diffusion MRI data and various metrics related to the microstructural organisation. Over time, increasingly advanced techniques have been developed, up to the point where some diffusion MRI models can now provide access to properties specific to individual fibre populations in each voxel in the presence of multiple "crossing" fibre pathways. While highly valuable, such fibre-specific information poses unique challenges for typical image processing pipelines and statistical analysis. In this work, we review the "fixel-based analysis" (FBA) framework, which implements bespoke solutions to this end. It has recently seen a stark increase in adoption for studies of both typical (healthy) populations as well as a wide range of clinical populations. We describe the main concepts related to fixel-based analyses, as well as the methods and specific steps involved in a state-of-the-art FBA pipeline, with a focus on providing researchers with practical advice on how to interpret results. We also include an overview of the scope of all current FBA studies, categorised across a broad range of neuroscientific domains, listing key design choices and summarising their main results and conclusions. Finally, we critically discuss several aspects and challenges involved with the FBA framework, and outline some directions and future opportunities.


Introduction
Diffusion MRI (dMRI) has revolutionized our capabilities to study white matter (WM) microstructure and organization in healthy and diseased populations ( Jones, 2010 ;Johansen-Berg & Behrens, 2013 ): it enables us to visualize WM fibre bundles and measure properties of their microstructure non-invasively, in-vivo and without relying on ionizing radiation. Over more than 2 decades, numerous dMRI guided studies have demonstrated that clinical populations present with altered WM organization in various specific WM fibre tracts (see reviews: e.g., Deprez et al., 2013 ;Hulkower et al., 2013 ;Pasternak et al., 2018 ). These studies have also generally reported significant, moderate-to-high correlations between disease symptoms and dMRI derived metrics sensitive to WM microstructure, with more severe changes in WM microstructure typically relating to more pronounced symptoms.
The key factor enabling such studies to assess this valuable information lies in complex modelling techniques, which form the link between the rich diffusion MRI data and various metrics related to the microstructural aspects of interest ( Novikov, Kiselev, & Jespersen, 2018 ). These include a range of biophysical models, such as the composite hindered and restricted model of diffusion (CHARMED) ( Assaf & Basser, 2005 ) or the neurite orientation dispersion and density imaging (NODDI) ( Zhang et al., 2012 ) model, which aim to model the diffusion signal as distinct microstructural compartments with biophysical parameters; as well as more generalized representations of the diffusion signal, including diffusion tensor imaging (DTI) ( Basser & Pierpaoli, 1996 ), diffusion kurtosis imaging (DKI) ( Jensen et al., 2005 ) and diffusion spectrum imaging (DSI) ( Wedeen et al., 2008 ). Some of the most commonly used approaches to date for studies of WM microstructure have been based on DTI, which provides general information on the local orientation of white matter fibres as well as metrics describing the fractional anisotropy (FA) and mean diffusivity (MD). Preprocessing, various fitting strategies, and post-processing for DTI are welldocumented ( Van Hecke et al., 2015 ;Mori & Tournier, 2014 ). Statistical analyses are often performed using region-of-interest (ROI) approaches, or voxel-based analysis (VBA) with statistical enhancement mechanisms such as threshold-free cluster enhancement (TFCE) ( Smith & Nichols, 2009 ), but bespoke frameworks such as tract-based spatial statistics (TBSS) ( Smith et al., 2006 ) have also been proposed to address specific challenges with registration and smoothing. However, DTI (as well as several other commonly used models or signal representations) is unable to correctly represent complex geometrical WM fibre configurations generally referred to as "crossing fibres ", leading to problems with interpretation and limited biological specificity of associated metrics, as well as various detrimental effects on processing techniques such as streamline tractography ( Farquharson et al., 2013 ;Jones, 2010 ;Jones et al., 2013 ). The aforementioned statistical analysis approaches also lack mechanisms to leverage information from multiple distinct fibre populations within voxels.
To address these challenges, a statistical analysis framework named "Fixel-Based Analysis " ( FBA ) was proposed ( Raffelt et al., 2015. In this context, a "fixel " refers to an individual fibre population within a vo xel , allowing for fibre-specific metrics to quantify WM properties and changes. Unlike voxels, fixels relate directly to the underlying WM anatomy itself. In a typical FBA, fixels are derived from WM fibre orientation distributions (FODs) as computed by constrained spherical deconvolution (CSD) techniques ( Tournier et al., 2007 ;Jeurissen et al., 2014 ;Dhollander & Connelly, 2016 ). A corresponding fixel-wise measure of apparent fibre density ( Raffelt et al., 2012b ), more broadly referred to as "fibre density " (FD) , can be computed directly from the FODs themselves as well ( Raffelt et al., 2015 ). Apparent FD is a measure of white matter microstructure : its value is approximately proportional to total intra-axonal volume under certain conditions ( Raffelt et al., 2012b ;Genc et al., 2020a ). Interestingly, macroscopic differences of fibre-bundle cross-section (FC)  can also be measured on a fixelwise level by leveraging information from individual subject warps to a common template space, effectively resulting in the fixel-wise equivalent of the traditional tensor-based morphometry (TBM) ( Ashburner & Friston, 2000 ) approach. Finally, the fixel-wise analysis of a combined fibre density and cross-section (FDC)  measure leads to an approach very similar to the well-known voxel-based morphometry (VBM) ( Ashburner & Friston, 2000 ) method. Using this entire range of techniques to its full potential for the first time, an example FBA study was presented in Raffelt et al. 2017 , comparing a clinical group of patients with temporal lobe epilepsy to healthy controls. This revealed statistically significant reductions in both apparent FD and FC in fibre pathways of the affected temporal lobe in patients as compared to controls. Furthermore, the combined FDC measure enabled a more sensitive assessment of fixel-wise effects, with greater effect sizes detected than when testing apparent FD or FC alone. The core tools to implement such FBA studies ( Raffelt et al., 2015 have been made available as part of the MRtrix3 software package . Since the original description of the FBA framework, several FBA studies have been undertaken, with a particular surge in published studies in the most recent years (as can be appreciated in Fig. 1 ). Despite this quickly emerging base of recent empirical work, the scope and methodological aspects of the FBA framework have not been critically reviewed yet.
In this work, we review the FBA framework. We (1) provide an overview of the main concepts related to the FBA framework and describe the methods and specific steps involved in modern FBA pipelines, (2) include an overview of the scope of all current FBA studies, categorised across a broad range of neuro-scientific domains and (3) critically discuss a range of aspects and challenges involved with the FBA framework and its various applications.

Fixel-based Analysis (FBA): Concepts and Methods
In this section, we provide an overview of the main concepts and methods of the FBA framework, and specific steps involved in a state-ofthe-art fixel-based analysis (FBA) pipeline. While the FBA framework is unique in that it allows researchers to investigate fibre-specific properties extracted from diffusion MRI (dMRI) data, the pipeline otherwise relatively closely reflects the structure of a "traditional " voxel-based analysis (VBA) pipeline. Conceptually, the core difference lies in the introduction of a new type of grid element, the "fixel ", which refers to a specific individual fibre population within a voxel. While this may seem like a relatively simple and straightforward adaptation of VBA at first sight, working with fixels (and fibre orientation distributions, from which fixels are typically derived) rather than voxels poses various unique challenges for some of the key steps of a typical standard VBA pipeline. A range of works have proposed and implemented specific solutions to address these challenges ( Raffelt et al., 2011( Raffelt et al., , 2012a( Raffelt et al., , 2012b( Raffelt et al., , 2015, which has resulted in the current FBA framework. studies per year. When both a preprint as well as a subsequent peer reviewed publication exist for a given study, this was only counted once (towards the year of publication of the peer reviewed work). Incomplete data for 2021 are not included in this plot. However, we found an additional 17 FBA studies in the first two months of 2021, resulting in a total of 75 published FBA studies (see Supplementary Document 2 ).

From voxels to fixels
In a VBA approach, image values are analyzed on a voxel-wise basis. To this end, individual subject images are spatially registered and warped to a common template space. Because the voxel grid for which the original image values are sampled is a discrete regular lattice, warping images to a template space requires regridding of the images to a new (common) voxel grid. This is however trivially enabled by various interpolation methods, which allow to sample the original image values for any set of 3D spatial coordinates. This then establishes spatial correspondence between voxels across all subject images after warping to and regridding in template space, allowing for direct comparison and statistical analysis at a voxel-specific level without requiring any spatial hypothesis a priori. Region-of-interest (ROI) type of analyses can benefit from this approach as well, as ROIs can be defined just once on the template for areas where image registration has aligned all images with sufficient accuracy.
The FBA framework is centered around the concept of a fixel , a specific fibre population within a voxel ( Raffelt et al., 2015 ), enabling analysis of individual fibre-specific properties in the presence of crossing fibre populations. In addition to a 3D position in the spatial domain, fixels also have a (2D) orientation in the angular domain. While fixels are an adequate choice of grid element for the purpose of mapping fibre-specific metrics, they are fundamentally different from voxels in the context of image processing: the fixel "grid " is derived directly from modelling of the dMRI data themselves in each voxel ( Raffelt et al., 2015 ). This has several notable implications ( Raffelt et al., 2015: 1 Unlike voxels -which cover the entire spatial domain at regular positions independently of what is represented in the image data -the fixels' presence and orientation is directly tied to white matter (WM) anatomy , as shown in Fig. 2 . In the angular domain, fixels can have any orientation. In the spatial domain, fixel positions are still limited to the discrete positions of an underlying voxel grid. However, fixels do not exist everywhere in space: some voxels contain no fixels. On the other hand, some fixels share the same space: some voxels contain multiple fixels. 2 Because fixel orientations are linked to the WM anatomy and obtained from the image data themselves, spatial transformations of fixel-wise image data require corresponding reorientations of the fixels , i.e., spatial transformations imply angular transformations. For non-rigid transformations, these reorientations can differ for fixels in different voxels and even for individual fixels contained in the same voxel: the angles between fixels in the same voxel can change. On the upside, because fixels can have any orientation, no regridding is required in the angular domain: the local (forward) angular transformation can be applied directly to the fixel orientation. 3 Even though the spatial positions of fixels are still restricted to those of an underlying voxel grid, the spatial regridding required for image transformation cannot be trivially overcome by interpolation methods in the same way as for voxel-wise image data: there is no implied notion of which fixels in neighbouring voxels "belong together ". This is made even more clear (and challenging) by the fact that neighbouring voxels can contain different numbers of fixels, and some voxels contain no fixels at all. Put differently, the fixel grid does not exist consistently throughout the spatial domain. 4 Even if the spatial regridding problem would be overcome (and proper fixel reorientation be applied) to map individual subject fixel images to a common template space, this still does not establish fixelwise correspondence across all subject images. Even though the images should align up to the accuracy of image registration and the voxel grid is shared, the fixel grids' local presence and orientations still relate to the individual subjects' anatomy. Establishing a common fixel grid is a unique challenge in and of itself. 5 Finally, for the purpose of statistics, VBA typically relies on spatial smoothing (e.g., to boost signal-to-noise ratio and increase normality of residuals) and statistical cluster enhancement (to improve sensitivity). Both of these require a notion of local voxel neighbourhoods . While defining an equivalent concept for fixels poses yet another challenge, this also provides a unique opportunity: a cluster of fixels in a local part of a given WM tract could be in a neighbourhood entirely separated from the fixels of another crossing tract , even when these tracts overlap spatially (i.e., share voxels).
Overall, while a fixel grid is a logical and sensible extension of a voxel grid, it is clearly not a trivial one. The FBA framework provides solutions to the above challenges. All FBA studies to date have relied on WM fibre orientation distributions (FODs) from which both fixels and fixelwise metrics are derived ( Raffelt et al., 2015 ). In this context, an FBA pipeline "circumvents " the challenges related to points 1 and 3 above by delaying the computation of fixels from (voxel-wise) FODs until after the registration, warping and regridding of subject FOD images to the Derivation of a fixel "grid " and fixel-wise apparent fibre density (FD). Left : voxel-wise WM FODs (here obtained from 3-tissue CSD) serve as the source from which both the fixel grid and fixel-wise apparent FD metric are computed. In this image, WM FODs in a coronal slice are overlaid on an FOD-based directionallyencoded colour map ( Dhollander et al., 2015 ) ( red = mediolateral, green = anteroposterior, blue = superoinferior ). Middle : the fixels are obtained by segmenting the FODs into their individual "peaks ". Unlike the regular lattice structure of the underlying voxel grid, the fixel grid's presence and orientations are tied to the WM anatomy itself. Right : apparent fibre density (FD), a fixel-wise metric, is computed as the integral of each FOD lobe ( hot colour scale ). The underlying voxel intensities show the total voxel-wise apparent FD ( grey colour scale ). Both fixel-wise and voxel-wise apparent fibre densities are expressed in arbitrary units . common template space ( Raffelt et al., 2011( Raffelt et al., , 2012a( Raffelt et al., , 2015. While point 2 (reorientation) conceptually goes hand in hand with point 3 (spatial regridding), it is then in practice separated and performed after fixels are derived in template space . Finally, for points 4 and 5, unique strategies are implemented ( Raffelt et al., 2015 ). A common fixel grid is derived from an average FOD template and angular correspondence of subject fixels to the common fixel grid is established by identifying subject fixels within a certain angular threshold. Fixel neighbourhoods are locally derived by computing connectivity between fixels, informed by template-based streamline tractography. All these steps and solutions are evident in the structure of the state-of-the-art FBA pipeline as described below.

Fixel-wise metrics and apparent fibre density
In voxel-wise MRI data, the measurement(s) for each voxel relate to underlying properties of tissue within the volume of the voxel . The same holds for fixel-wise data. However, the presence of multiple fixels in a voxel allows an individual specific fixel-wise metric to relate to only part of the contents of the voxel. The specific location of these fixelwise compartments within the voxel is typically not known, due to the partial volume effect. Rather, fixel-wise metrics relate to properties of the population of fibres along or close to the orientation of the fixel . WM axons of different crossing fibre populations might for instance even interdigitate within the voxel.
Fixel-wise metrics can be obtained from advanced dMRI models. Some dMRI models or signal representations only estimate voxelaveraged properties: e.g., the tensor from DTI allows for the extraction of a single principal orientation of diffusion, and voxel-wise metrics such as FA and MD can be calculated ( Basser & Pierpaoli, 1996 ). Other (typically multi-compartment) models represent fixels explicitly, along with corresponding fixel-wise metrics: e.g., the CHARMED model includes separate compartments for individual fibre populations and estimates a signal fraction for each of these ( Assaf & Basser, 2005 ). Note that not all multi-compartment models are necessarily multi-fibre models: e.g., the NODDI model has a single intra-and extra-cellular compartment (both relating to only a single fibre population), as well as an isotropic freewater compartment, and yields voxel-averaged measures of neurite density and orientational dispersion ( Zhang et al., 2012 ).
All FBA studies to date have relied on CSD techniques ( Tournier et al., 2007 ;Jeurissen et al., 2014 ;Dhollander & Connelly, 2016 ); these estimate a WM FOD in each voxel, and some additionally estimate separate grey matter (GM) and cerebrospinal fluid (CSF) signal compartments. The WM FOD as obtained from these techniques is a free-form continuous angular function, i.e., it does not explicitly represent fixels and fixel-wise metrics. However, the FOD typically shows an angular contrast with several "peaks " clearly relating to individual fibre populations. In a quantitative context, the amplitude of the WM FOD is referred to as the apparent fibre density (AFD) ( Raffelt et al., 2012b ). The AFD along a given orientation of the WM FOD is mostly proportional to the dMRI signal perpendicular to it. Hence, the AFD is approximately proportional to the total amount of intra-cellular volume of axons along this orientation under certain conditions, including a sufficiently long diffusion gradient pulse duration (e.g., ≥ 30 ms), a relatively high b-value (e.g., ≥ 3000 s/mm 2 ) and for a certain scale of microstructural features (e.g., axon diameters ≤ 6 m) ( Raffelt et al., 2012b ). Recent findings have demonstrated that the accuracy and specificity of AFD can be improved by using higher b-values and single-shell data (as opposed to multi-shell data, which includes lower b-values), as these choices help to suppress extra-axonal signal ( Genc et al., 2020a ) (see also the later section on "Requirements and effects of acquisition parameters " for an in-depth discussion on this topic).
In the FBA framework, fixels are derived directly from the WM FODs themselves by segmenting each FOD "lobe " (this refers to the shape of the FOD peaks when visualised by radial scaling of amplitudes) ( Raffelt et al., 2015 ), as shown in Fig. 2 . The fixel-wise total AFD is obtained by integrating the AFD values across the corresponding lobe. The final fixel-wise metric is often referred to more broadly as "fibre density " (FD) . Because the apparent FD is approximately proportional to the total intra-cellular volume of axons within the voxel (and along the fixel), it can not distinguish between effects specific to axon count or axon diameter(s): both factor into the apparent FD metric . Also note that apparent FD is largely not sensitive to myelin , as myelin-associated water has a very short T2 relaxation time and therefore contributes little to the dMRI signal ( Raffelt et al., 2012b ). Finally, signal related to other non-WM tissues, cells and fluids can be teased out from the WM FOD to render the apparent FD more specific to WM only, by using 3-tissue CSD techniques such as multi-shell multi-tissue CSD (MSMT-CSD) ( Jeurissen et al., 2014 ) and single-shell 3-tissue CSD (SS3T-CSD) ( Dhollander & Connelly, 2016 ).
While apparent FD is approximately (linearly) proportional to intraaxonal volume, it doesn't provide a direct absolute or standardized volume measurement of it. CSD techniques in this context are applied to the dMRI signal without voxel-wise normalisation by the b = 0 image ( Raffelt et al., 2012b ), unlike most other dMRI modelling techniques. Not only is apparent FD expressed in arbitrary units , it also requires correction for spatial intensity inhomogeneities (bias fields) of the dMRI data as well as some form of global intensity normalization to render it comparable between different subjects within a study ( Raffelt et al., 2012bDhollander et al., 2021 ). This is in addition to using the same acquisition hardware and parameters throughout any given study, as well as using a single common study-specific response function (per tissue) with the CSD method for all subjects to be compared (as reflected in relevant steps of the pipeline; see also Fig. 3 ).

Fixel-based Analysis pipeline
Even though the basic components enabling the FBA framework were fully established earlier ( Raffelt et al., 2015, the implementation of various FBA studies has since been improved due to the introduction of new techniques related to preprocessing and dMRI signal modelling . Of particular note in this context is the introduction of 3-tissue CSD techniques, which can estimate GM and CSF signal compartments in addition to the WM FOD ( Jeurissen et al., 2014 ;Dhollander & Connelly, 2016 ). Other than an increased specificity of the WM FOD, it also enables a more robust approach to global intensity normalization and bias field correction. The latter is then informed by and performed on the 3-tissue CSD derived signal compartments themselves , rather than as a "preprocessing " step on the original dMRI data. This practice and the accompanying structure of the (preprocessing) pipeline have been adopted across recent FBA studies, as 3-tissue CSD processing has become possible for both multi-shell as well as single-shell dMRI data since the introduction of the SS3T-CSD method ( Dhollander & Connelly, 2016 ).
As mentioned, the FBA pipeline otherwise closely reflects the overall structure of a "traditional " VBA pipeline. Compared to VBA, most additional steps relate to either dMRI-specific preprocessing earlier on in the pipeline, and specific solutions to deal with the challenges of fixels and fixel-wise metrics later in the pipeline. The information of local deformations obtained from the spatial warps of subject images to template space can be used in a fixel-wise fashion as well. This yields the fixel-based equivalents of tensor-based morphometry (TBM) via computation of the fixel-wise "fibre-bundle cross-section " (FC) , and voxel-based morphometry (VBM) ( Ashburner & Friston, 2000 ) by combining FD and FC into "fibre density and cross-section " (FDC) . The core tools to run the FBA pipeline  have been made available as part of the MRtrix3 software package  ; https://www.mrtrix.org ); and are often complemented with other tools, e.g., for motion and distortion corrections ( Jenkinson et al., 2012 ; https://fsl.fmrib.ox.ac.uk/fsl/ ) or SS3T-CSD ( https://3tissue.github.io ). A schematic overview of all steps, their relationships and the overall flow of the pipeline is provided in Fig. 3 . In Supplementary Document 1 , we describe each step with a focus on its purpose, interpretation, and practical aspects for consideration by researchers, and we list additional software resources.
Ultimately, the FBA pipeline yields fixel-wise statistical results and a specific p-value is assigned to each individual fixel (even in the presence of multiple different fixels in the same voxel). Fig. 4 shows a typical result using some of the most common visualization techniques typically relied upon in published FBA studies. Notably, all these visualizations present the exact same fixel-wise result. While the cropped streamlines tractogram visualization is more convenient to observe and explore the result as a whole, it otherwise still only shows those areas where individual fixels effectively reached a threshold for statistical significance. However, it's not surprising that FBA results often feature some anatomical "continuity " or a pattern of "clusteredness ". On the one hand, for a range of biological mechanisms it makes sense that larger parts of WM tracts would be involved or affected, but on the other hand this is also promoted inherently in the FBA framework itself, e.g., by connectivitybased fixel-wise smoothing and the connectivity-based fixel enhancement (CFE) mechanism ( Raffelt et al., 2015 ).

FBA studies: Applications
We have performed a systematic search to retrieve all currently published FBA application studies, as defined and detailed in Supplementary Document 2 . Note we also included research preprints with the intention of more exhaustively sampling the current scope of applications. Since the introduction of the FBA framework, 75 FBA studies (66 peer-reviewed, 9 preprints) have been published. The adoption of the FBA framework has seen a stark increase over time ( Fig. 1 ).
We summarized the main results and conclusions of each study in Supplementary Document 3 . Finally, we also documented all key study parameters and outcomes in a comprehensive overview in Supplementary Document 4 .

FBA and other dMRI analysis strategies
FBA is one amongst a range of techniques which have been used to assess and analyse white matter microstructure. Diffusion MRI data is also commonly analysed using voxel-based analysis (VBA) of various metrics derived from different diffusion and microstructure models. Another popular analysis framework developed specifically for dMRI data Fig. 3. The FBA pipeline reflects the general structure of a VBA pipeline, but with many additional steps to appropriately process dMRI data (red boxes), FOD images (blue boxes) and fixel-wise image data (gold boxes) . The left column shows the main flow of the pipeline. The FBA framework avoids problems related to spatial interpolation of fixel-wise image data by warping FOD images to template space instead, and delaying definition of fixels to a later stage in the pipeline. In the right columns , each box names a processing step and its resulting output. Subject-level steps are performed once per individual subject image in the study, whereas study-level steps are computed only once for the entire study. All steps are described in detail in Supplementary Document 1 . Apart from the introduction of 3-tissue CSD techniques and log-domain intensity normalisation, this pipeline matches all steps described originally . It is also broadly in line with the online documentation provided with the MRtrix3 software .
is tract-based spatial statistics (TBSS) ( Smith et al., 2006 ). The majority of studies applying either VBA or TBSS on dMRI data have focused on metrics derived from diffusion tensor imaging (DTI) ( Basser & Pierpaoli, 1996 ), or more recently the neurite orientation dispersion and density imaging (NODDI) microstructure model ( Zhang et al 2012 ).
Of note is that several FBA studies themselves have also additionally included results based on DTI-derived metrics, e.g., fractional anisotropy (FA): of all 75 FBA studies we included for this review, 33 studies (44%) included additional DTI-based results via either VBA, TBSS and/or region of interest based analyses. Given limitations of the DTI model and problems with interpretation of derived metrics ( Fig. 5 ), it is somewhat surprising to see such results are still included quite often. One explanation might be the desire to more directly relate findings to previous studies in the same application area (e.g., similar clinical groups), where their conclusions typically did rely solely on DTI-based findings. In some cases, these studies combined and directly compared FBA and DTI findings, for instance exhibiting larger effect sizes using the FBA framework compared to analyses based on DTI results ( Adanyeguh et al., 2018 ). Others reported lower sensitivity of voxel-wise DTI metrics in detecting group-wise differences when compared to fibre-specific FBA results, particularly in crossing-fibre regions ( Raffelt et al., 2015 ;Gajamange et al., 2018 ;Mito et al., 2018 ;Zarkali et al., 2020 ). In another study, almost no overlap between significant voxel-wise DTI and fixel-wise FBA findings was found ( Lyon et al., 2019 ), which is quite remarkable. Understanding these discrepancies remains an ongoing challenge. Their impact is relevant, as it confounds which WM structures are reported to be associated to specific conditions in the literature over time.
FBA offers two key advantages over alternative dMRI analysis techniques: sensitization to microstructure-specific properties independent of local fibre geometry, and specificity of the analysis and results with   Mito et al. (2018) , whereby a cohort of Alzheimer's disease patients was tested for FDC decreases compared to healthy control subjects). Panel A : direct visualization of the fixel-wise statistical results by colouring each fixel according to its p-value. Due to the particular choice of the ( hot colour ) scale bar limits, fixels with p < 0.05 are highlighted, whereas others are black. Panel B : the same result is visualized by cropping a whole-brain streamlines tractogram. Parts of streamlines are only shown when they intersect voxels containing fixels with p < 0.05 , while running along an orientation close to those fixels. Coloring here was chosen similar to Panel A to highlight that this is merely a different visualization of the same result. Panel C : the benefit of the streamlines visualization of an FBA result is that it more easily allows to identify larger continuous patterns in the result, e.g., relating to known anatomy of WM tracts. Here, the cropped streamlines visualization is shown in 3D for the whole brain (augmented by a glass brain volume). The researchers have additionally labeled the streamlines via targeted tractography approaches based on prior knowledge of WM tract anatomy.
respect to individual fibe-specific effects. While DTI metrics have proven to be sensitive to certain changes of white matter microstructural properties, they are inherently non-specific to axonal properties, and conflated by extra-axonal signal contamination as well as various aspects of fibre geometry (e.g., crossing fibres, dispersion, etc.), rendering biophysical interpretations challenging, non-intuitive or even misleading ( Jones et al. 2013 ;Bach et al. 2014 ;Beaulieu 2009 ). The example in Fig. 5 illustrates how a genuine decrease of fibre density (FD) in presence of crossing fibre populations might for instance result in an increase of FA as derived from DTI. Typical studies of, e.g., neurodegeneration, based on DTI might not even recover such regions as only decreases of FA would often be hypothesised and tested for. But even when tested and recovered, such an effect would be counter-intuitive. Some FBA studies have incorporated DTI analyses to highlight these issues in areas with crossing fibres. Grazioplene et al. (2018) demonstrated in a schizophrenia cohort that significant group differences of FA substantially overlapped with regions containing complex fibre architecture: they conclude that DTI findings could be lacking in specificity due to macro-structural complexity and thus may not necessarily reflect group differences in microstructural properties. Mito et al. (2018) recovered regions of increased FA in crossing fibre regions in Alzheimer's disease patients, and explicitly demonstrated these to be misleading findings reflecting inherent issues of voxel-wise FA values.
Alternative multi-compartment methods, such as neurite orientation dispersion and density imaging (NODDI) ( Zhang et al., 2012 ) have been proposed to quantify white matter microstructural properties with greater specificity to intra-cellular properties, and separate these from effects due to geometry. For example, NODDI incorporates a separate parameter for orientational dispersion of the neurite distribution, independently of (the magnitude of) neurite density. In DTI, on the other hand, both such effects are "entangled " in the FA metric, leading to the aforementioned problems. However, another distinct advantage of FBA is its ability to analyse individual fibre-specific properties separately , whereas VBA approaches are inherently unable to assign significant effects to specific fibre populations due to partial voluming. Even when models (such as NODDI) do address and disentangle certain fibre geometry confounds, they do not per se model individual fibre populations. For example, while NODDI does account for dispersion, it does not define this for separate fibre populations within a voxel. Hence, genuine fibre crossing configurations are fitted as a single population with a large amount of dispersion, and neurite density is not separately quantified for crossing neurite populations.
Some researchers might be interested in comparing results directly between or across different analysis frameworks and dMRI models. While the improved specificity of FBA relative to other voxel-based analysis approaches has been well documented, it is typically difficult to relate the nature of the effects of the various other voxel-wise diffusion metrics to a given fibre-specific effect. Individual studies relying on different clinical populations, acquisition parameters, and image processing steps only add further to the complexity of such direct comparisons. Therefore we would generally caution users against attempting to infer intuitive or even complex relationships between results, as the lack of specificity of voxel-based approaches and models renders this theoretically impossible without strong assumptions.
The aforementioned tract-based spatial statistics (TBSS) framework ( Smith et al., 2006 ) constitutes another popular approach to analyze voxel-wise metrics (e.g., derived from DTI or NODDI). In the context Patients show a fixel-specific decrease of apparent FD in the SLF, with both other tracts unaffected. DTI-based analysis will yield a counterintuitive result in this scenario, whereby the FA is increased in such voxels. Such a change might go undetected (when increases of FA are not tested for), could be misinterpreted (as if a certain aspect of WM microstructure has "improved "), and cannot be attributed to any specific individual fibre population or combinations thereof due to lack of fibre-specificity. Finally, note this change has even impacted on the diffusion tensor's main orientation, whereas no individual WM tract orientations had in reality been affected.

Table 1
Comparison of key defining aspects of voxel-based analysis (VBA), tract-based spatial statistics (TBSS) and fixel-based analysis (FBA). Note that tensorbased morphometry (TBM) and voxel-based morphometry (VBM) are regarded as a type of VBA in this context.

Voxel-based analysis (VBA)
Tract-based spatial statistics (TBSS) Fixel-based analysis (FBA) Domain of analysis Entire voxel grid within the brain. Only voxels on a mean (template) FA "skeleton ".
Fixel-level specificity for individual fixels in a voxel. Alignment & correspondence Image registration to a common template space and spatial interpolation.
Image registration to a common template space. Thinning of FA template to obtain a mean FA skeleton. Project maximum subject FA value perpendicular to mean FA skeleton onto the skeleton voxels.
FOD-based image registration to a common FOD template. Segmentation of template fixels and subject fixels. Bespoke fixel correspondence criteria to assign reoriented subject fixels to template fixels.

Statistics
Correction for a large number of comparisons. Spatial smoothing and threshold-free cluster enhancement (TFCE).
Correction for a reduced number of comparisons (less voxels on the FA skeleton).
Correction for a very large number of comparisons (typically more fixels than voxels). Connectivity-based fixel-wise smoothing and connectivity-based fixel enhancement (CFE).
of this review and the topic of fixel-specificity, to avoid confusion on the TBSS naming (in particular the term "tract-based "): this is effectively a voxel-based technique. The problems with VBA that TBSS aims to address are of an entirely different nature: they relate to challenges with alignment of subject images (due to limited precision and accuracy of image registration techniques) as well as the dependence of VBA on an arbitrary amount of smoothing (which does impact strongly on the result) ( Smith et al., 2006 ). To put it differently: TBSS mostly addresses existing problems of VBA related to establishing voxel-wise correspondence between images . While FBA also implements a bespoke strategy towards establishing correspondence between subject data, this is rather to tackle new challenges introduced by the nature of fixels (see also the earlier section "From voxels to fixels "). Interestingly, the challenges addressed by TBSS do remain largely present for FBA in principle. However, due to the incorporation of FOD-based population template construction and registration in the pipeline, image alignment is expected to be more accurate in the first place ( Raffelt et al., 2011( Raffelt et al., , 2012a. We provide a general overview comparing the key defining aspects of the VBA (also covering TBM and VBM), TBSS and FBA approaches towards analysis in Table 1 . For specific details on TBSS, we refer the reader to Smith et al. (2006) and Smith et al. (2007) . All relevant details on the FBA framework are provided in the earlier section "Fixel-based Analysis pipeline " and Supplementary Document 1 . Finally, as mentioned in the earlier section "Fixel-wise metrics and apparent fibre density ", other diffusion modelling and parameter estimation techniques can also yield fixel-wise measures. The CHARMED model estimates signal fractions for individual fibre populations in a voxel ( Assaf & Basser, 2005 ). Another example is the "Bayesian estimation of diffusion parameters obtained using sampling techniques with modelling of crossing fibres " (BEDPOSTX) technique ( Behrens et al., 2007 ;Jbabdi et al., 2012 ), which similarly estimates fixel-wise parameters. The key difference with CSD techniques is that the aforementioned methods compute fixel-wise metrics directly from the dMRI data, whereas CSD techniques yield a free-form continuous FOD first, from which fixels are obtained later on in the FBA pipeline (see Fig. 2 for an illustration of fixel derivation from FODs, and Fig. 3 for the order of these steps in the pipeline). This effectively makes it possible to analyse fibre-specific parameters obtained from other estimation strategies such as CHARMED or BEDPOSTX using the FBA framework . However, since the CSD-based FBA pipeline implements the transition from subject space to template space by warping FOD images to avoid the problems associated with spatial interpolation of fixel-wise data, a few adjustments have to be made to achieve this . Such a pipeline should warp the (preprocessed) dMRI data themselves directly to template space (without reorientation). Obtaining the fixels and their orientations as well as the fixel-wise parameters (e.g., applying CHARMED or BEDPOSTX) should then be performed in template space , after which fixels can be reoriented similarly to the original pipeline ( Fig. 3 ). A solution would also have to be implemented for deriving a common fixel analysis mask. Note that for this purpose, FODs obtained from a CSD technique could still be relied upon to build a study-specific FOD template from which a fixel analysis mask can be derived. However, the final fixel-wise statistics would be performed directly on the parameters derived from the other dMRI modelling technique (e.g., CHARMED or BEDPOSTX).

Requirements and effects of acquisition parameters
Since the main goal of FBA is to investigate fibre-specific effects, resolving individual crossing fibres in the first place is of course essential. In this context, so-called "high angular resolution diffusion imaging " (HARDI) gradient schemes are commonly employed to collect dMRI data ( Tuch et al., 2002 ). As the name suggests, HARDI schemes are designed to acquire images for a large number of diffusion gradient directions, uniformly distributed over the angular domain, and typically at a constant amount of diffusion-weighting (i.e., a specific b-value, referred to as a "shell "). Hence, with the capacity to resolve crossing fibres in mind, two key parameters are to be considered: the number of diffusion gradient directions and the b-value . Tournier et al. (2013) have systematically investigated the required number of gradient directions to capture the angular contrast of dMRI data for a range of b-values, up to b = 5000 s/mm 2 . Generally, while the signal (and thus also the signal-to-noise ratio (SNR)) of dMRI data decreases for higher b-values, the angular contrast increases with b-value . However, higher angular contrast implies higher angular frequencies of the signal, thus also increasing the required number of gradient directions to capture all features of this signal well. Tournier et al. (2013) confirmed this by investigating the angular frequency content of the signal via a spherical harmonics (SH) representation (the angular equivalent of a Fourier basis). Specifically, they found that terms beyond an SH order of 8 were negligible for all b-values up to b = 5000 s/mm 2 . In their b-value sampling range, the trend of requiring higher SH orders also levelled off around b = 3000 s/mm 2 . The mathematical equivalent to sample an order 8 SH signal equals 45 diffusion gradient directions . What these results thus suggest is that 45 directions constitutes a sufficient HARDI sampling to capture all features in the signal, and that those features themselves don't manifest much stronger beyond b = 3000 s/mm 2 (at least in the range up to b = 5000 s/mm 2 ).
However, in practice it might still be desirable to acquire data for more than 45 gradient directions: SNR at high b-values is typically very low, and hence more data points are useful for a robust fit of various models. On the other hand -and often overlooked -when resolving the WM FOD using a CSD method, the non-negativity constraint on the FOD amplitude also "injects " information into the model fitting process, an inherent effect referred to as "super-resolved " CSD ( Tournier et al., 2007 ). This effect is substantial for most FODs throughout the WM, as these are typically very sparse in the angular domain (i.e., large parts of the angular domain have zero FOD amplitude). In practice, this means reasonable quality WM FODs can be resolved with even less than 45 gradient directions sampled. How far this can be stretched reliably is hard to determine though, and the exact extent of it would also depend on the local fibre configuration (i.e., the sparsity of the FOD). In light of this and the aforementioned contribution of more images to the overall SNR, 45 gradient directions can still reasonably be argued to be a good (minimum) target to aim for when designing a HARDI protocol for the purpose of FBA . Of the 75 FBA studies we included, 66 studies (88%) used data with 45 gradient or more gradient directions (for the highest b-value), whereas 6 studies (8%) still managed to run FBA with 30 or less gradient directions (for the highest b-value). Overall, HARDI gradient schemes appear to be well adopted in practice.
While both the number of diffusion gradient directions and the bvalue thus have an impact on the overall qualitative aspects of the WM FODs, several FBA studies using a low number of gradient directions and/or low b-values have still yielded fairly encouraging results, which demonstrates that FBA is effectively feasible for such data as well as sensitive to significant effects. FBA is certainly technically compatible with a range of angular resolutions and b-values, as these parameters do not preclude any preprocessing steps, 3-tissue CSD reconstruction (using either MSMT-CSD or SS3T-CSD), intensity normalization, template construction, fixel segmentation and reorientation, or any other steps in a state-of-the-art FBA pipeline (see also the earlier section "Fixel-based Analysis pipeline " and Supplementary Document 1 ).
However, the main caveat lies in the interpretation of the apparent FD metric (see also the earlier section "Fixel-wise metrics and apparent fibre density "). At a sufficiently high b-value, e.g., b = 3000 s/mm 2 or similar, increased specificity to the intra-axonal water signal results in more accurate measures of apparent fibre density that are approximately proportional to the total amount of intra-cellular volume of axons under certain conditions ( Raffelt et al., 2012b ;Genc et al., 2020a ). This is achieved due to the strong attenuation of extra-axonal water signals at such high b-values. At lower b-values, such as those commonly acquired for DTI processing (e.g., b = 1000 s/mm 2 or similar), signals from the extra-cellular space outside the axons will contribute to the apparent FD metric, undermining the intended definition of the latter and thus rendering biological interpretation challenging and fundamentally limited. For example, a clinical patient group may experience substantial changes to the extra-cellular architecture, which would be artificially reflected as a (group) difference in apparent FD. Furthermore, effective differences in apparent FD due to actual changes of intra-axonal volume are likely to induce concomitant changes to the extra-cellular volume and architecture. Hence, most measured apparent FD effects will effectively be biased at low b-values. For example, a decrease of intraaxonal volume should result in a decrease of apparent FD reflecting it; but if this leads to a corresponding increase of the volume of the extracellular space, the signal from the latter at a low b-value would also increase and thus counteract the expected decrease in apparent FD. In such a scenario, apparent FD effect sizes are diminished and sensitivity of the FBA to apparent FD is negatively impacted. Interestingly, this also challenges the use of multi-shell data for the purposes of quantifying apparent FD, as this introduces lower b-values as well. While it might be intuitively appealing to use all (or generally "more ") data to compute the WM FOD, this is not necessarily compatible with the very assumptions on which apparent FD relies ( Genc et al., 2020a ). Indeed, the MSMT-CSD equations ( Jeurissen et al., 2014 ) apply to each b-value shell in the data, and thus lower b-values will weigh in, again introducing undesirable extra-cellular signal contributions into the apparent FD metric. Genc et al. (2020a) demonstrate this via simulations as well as in-vivo data in a relevant FBA scenario. Their results revealed that (1) apparent FD was estimated less accurately when lower b-value or multishell data were used and showed a larger dependency on extra-cellular signal, as compared to single-shell high b-value data and (2) using lower b-value or multi-shell data also led to reduced sensitivity (in an experiment involving age-related patterns of development). Of all 75 studies sampled in this review, 44 studies (59%) were limited to data with b ≤ 2500 s/mm 2 . Of these, 20 studies were even limited to b ≤ 1000 s/mm 2 data. While the remaining 31 studies (41%) did work with datasets with a maximal b > 2500 s/mm 2 , 12 of those relied on multi-shell data and included lower b-value shells to resolve the WM FOD from which the apparent FD metric was estimated. Dimond et al. (2020) did have multishell data available, but for the reasons described above they chose to only use the highest b-value shell ( + b = 0) with SS3T-CSD to compute apparent FD for FBA (and lower b-value data was used only for a separate analysis of DTI-derived metrics). Overall, we note that -in contrast to HARDI gradient schemes -higher b-values are still relatively less well adopted.
In conclusion, on the one hand, it is technically entirely feasible to run a state-of-the-art 3-tissue CSD based FBA pipeline even on data limited to, e.g., only 30 gradient directions and/or b-values as low as b = 1000 s/mm 2 . This opens up possibilities for revealing fixel-specific effects in many existing older datasets, or for long-running studies that have already "locked in " their dMRI protocols. However, further work may be required to assess the reliability of specific conclusions drawn from data including low b-values. On the other hand, for new studies it is highly advisable to collect HARDI data with, e.g., ≥ 45 gradient directions and b ≥ 3000 s/mm 2 , so as to ensure both good WM FOD quality (which promotes robust processing) and specificity to intra-axonal signals, enabling proper quantitative interpretation of the apparent FD metric at the core of a typical FBA.

Challenges with interpretation of FD and FC
Beyond the effects of acquisition parameters, which can complicate or limit interpretation of apparent FD effects as explained above (e.g., due to partial sensitisation to extra-axonal signals at limited b-values), other challenges with and limitations of the apparent FD metric exist. As mentioned in the earlier section on "Fixel-wise metrics and apparent fibre density ", apparent FD does not tease apart effects of axon count and axon diameters. This is an important consideration when interpreting apparent FD changes, as without a proper context, e.g., thinking of decreases in apparent FD strictly as a loss of individual axons could constitute a critical misinterpretation of findings. Moreover, apparent FD is largely not sensitive to myelin ( Raffelt et al., 2012b ), and thus decreases in apparent FD do not necessarily reveal demyelination (nor do apparent FD increases imply myelinogenesis), even though they might accompany or eventually follow it in a number of (biologically) realistic scenarios. Note that, while dMRI signal in general is not sensitized to myelin, it is still a popular choice to study myelin ( Mancini et al., 2020 ). This is possible due to myelin changes indirectly affecting the geometrical architecture of the extra-axonal space, which in turn influences parameters of certain dMRI models. However, such parameters are also affected by a range of other effects, so they cannot be specifically interpreted as myelin ( Mancini et al., 2020 ).
Apart from the aforementioned notes on the specific sensitization of apparent FD, another challenge is involved with its interpretation: the mere fact that it represents a local (apparent) density metric has surprisingly complex implications, which could easily be overlooked or misunderstood. The local FD of axons provides us with a measure approximately proportional to the amount of "axonal matter " present per unit of volume, i.e., within a voxel (and along the fixel orientation). Hence, this depends not only on those axons themselves, but also on all the other (non-axonal) space or volume in between . For example, a decrease in FD could result from vasogenic edema (as might occur, e.g., after traumatic brain injury), whereby an excess of fluid accumulates in the interstitial matrix and causes it to expand: this might simply move the axons further apart without otherwise affecting their individual size (i.e., diameter). Interestingly, note that several other multi-compartment dMRI models also involve local (voxel-wise or fixel-wise) density metrics: for example, the NODDI model yields a neurite density measure ( Zhang et al., 2012 ). As such, considerations related to interpreting a density metric are similarly relevant.
In the vasogenic edema example, the FD metric is sensitive to the decrease of the number of axons within a given voxel , even though no actual axons were lost: some were merely displaced outside the voxel, into other voxels. Macroscopically, a swelling of the tract might thus be observed (which "compensates " for the decreased FD). In the FBA framework, the latter macroscopic piece of the puzzle can be assessed by computing the fibre-bundle cross-section (FC) metric, which expresses this property for different subject images relative to a common template. This is obtained from the warps mapping each subject to the template space, and thus relies on accurate image registration (see the section "Fibre-bundle cross-section (FC) computation " in Supplementary Document 1 ). This information can then be combined with the FD metric in a strategy akin to VBM ( Ashburner & Friston, 2000 ), resulting in the fibre density and cross-section (FDC) metric . In our edema example, the FD decrease would be offset by a similar FC increase (i.e., the swelling), resulting in an unchanged FDC . The latter would then ultimately reflect the fact that no actual axons were lost in the overall bundle. However, FDC would indeed not be sensitive to the vasogenic edema effect, even though it might still be of critical biological relevance. Ultimately, the complete picture is only provided by assessing FD, FC and FDC and jointly considering their individual decreases or increases. This leads to many different possible combinations of effects, of which we have provided a range of basic examples and more complex scenarios in the "Combined fibre density and cross-section (FDC) computation " section in Supplementary Document 1 . Yet another more complex example involving crossing fibre tracts is presented in Fig. 6 . The latter illustrates that effects within one bundle can result in (surprising) concomitant effects in another crossing bundle. We should note, however, that these examples do not demonstrate any technical limitations of the framework. Rather, they illustrate that reasoning about density and/or cross-sectional effects is not as straightforward as it might seem at first sight, and interpreting such results requires careful consideration of any possible underlying scenario that might explain these.
Finally, even the specificity in separating FD and FC effects is not entirely clear-cut in practice: it is limited by the accuracy and precision with which image registration is able to map subject images to the common template space and as such establish spatial (or fixel-wise) correspondence between them. The effects of registration on density and volume assessments have been known and well described since the introduction of VBM ( Ashburner & Friston, 2000 ), but are sometimes overlooked in practice. In the context of FBA, these imply that part of what "should have been " an FC effect can be underestimated and partially transfer into an FD effect instead when image registration does not entirely bridge the spatial gap between images. This will for certain anatomical structures always be the case up to an extent, because non-rigid registration algorithms rely on spatial regularization to robustly produce a sufficiently smooth mapping between images. Moreover, the amount of "transfer " of such FC effects to FD will depend on the size of the anatomy, and generally be more pronounced for thinner structures (in particular those approaching the acquisition voxel size) . The opposite is possible as well, when strong intensity differences due to pronounced FD effects might induce a nonlinear deformation (and thus FC effect), especially when using a sum of squared differences metric to drive registration. Because the specificity to distinguish FD and FC effects thus depends on spatial resolution, sizes of different anatomical structures and a range of image registration parameters (e.g., regularization or smoothness of the warp) which are under arbitrary control of researchers, FD and FC effect sizes can not be meaningfully directly compared.
FD is often said to relate to microstructural effects, while FC reflects macro-structural effects. This is a useful intuition to introduce and explain complex combinations of FD, FC and FDC effects and motivate researchers to carefully consider various biological scenarios that might explain their results. However, we conclude that caution is advised, as the "separation " between FD and FC is less clear-cut due to practical limitations of methods.

Multimodal studies
Combining complementary information from different (MRI) contrasts or modalities may allow for more comprehensive and insightful conclusions than reporting FBA (or other dMRI analysis) results in isolation (for reviews, see Damoiseaux and Greicius, 2009 ;Straathof et al., 2019 ;Suárez et al., 2020 ). This is particularly important when studying brain injured patients whereby white matter damage is not occurring Fig. 6. Complex pitfalls when interpreting fixel-specific FD and FC changes (example similar to Raffelt et al., 2017 ; but with critical corrections to the number of axons in the illustrated voxel). This example demonstrates how changes in one bundle of axons can explain concomitant changes in another crossing bundle. A bundle of crossing orange ( vertical ) and blue ( perpendicular to this page ) axons are shown, and measured in the ( black ) voxel. The orange tract suffers microscopic axon loss, followed by a macroscopic collapse (atrophy) of the tissue. The blue tract features no actual axon loss, but merely "joins in " the collapse of tissue due to the available space. Upon initial axon loss, all results are intuitive: only FD of the orange fibre population is decreased (and this is also reflected in its FDC), while all other properties ( orange FC; blue FD and FC) are unaffected. However, when the tissue collapses (atrophy), a set of complex effects plays out across FD and FC values of both orange and blue bundles . The FD of both bundles increases (!), whereas FC jointly decreases. Compared to the original "healthy " setting, the effect size of FD alone underrepresents the impact to the orange tract, but also describes an increase (!) for the blue tract. Arguably, FDC is "easier " to interpret (only showing impact on the orange bundle, with the blue bundle unaffected throughout); but in turn it is not sensitized to the atrophy, which might itself indicate a biologically relevant stage or transition in a complex disease process.
in isolation from other brain alterations, such as GM atrophy, changes in functional connectivity, and neuro-inflammation. Of the 75 studies sampled in this review, 22 employed multimodal data and analysis techniques in combination with FBA of dMRI data. However, only 9 of these multimodal MRI studies quantified the relationship between fixel-wise metrics and other (e.g., structural or functional) MRI metrics ( Adanyeguh et al., 2018 ;Boonstra et al., 2020 ;Gajamange et al., 2018 ;Luo et al., 2020 ;Mizuguchi et al., 2019 ;Park et al., 2021 ;Sanchez et al., 2020 ;Savard et al., 2020 ;Vaughan et al., 2017 ). For example, Luo et al. (2020) reported that apparent FD of the fornix column and body, and FC of ventral cingulum correlated with composite amyloid and tau levels in Alzheimer's disease patients. As another example, Savard et al. (2020) observed that the amount of grey matter atrophy was strongly related to reduced apparent FD and FC in patients with fronto-temporal dementia. Multimodal studies can also improve interpretation of findings with regards to structure-function relationships, e.g., progression of disease with reference to cognition and behavior. Savard et al. (2020) were able to dissociate the contribution of apparent FD, FC and GM volume to semantic symptoms and executive dysfunction in fronto-temporal dementia, adding to our understanding of differing pathophysiological paths to both types of impairment and suggesting targets for therapy.
Despite encouraging findings, some multimodal MRI studies combining FBA results with other modalities show a number of common limitations. In many of these studies, the reported correlations between fixelwise metrics and other measures were still just weak to moderate. Also, appropriate correction for multiple comparisons was not always performed. Sometimes uncorrected thresholds and trends were reported for correlation analyses between fixel-wise and other metrics. Such issues are not specific to FBA studies or underlying methodology though: these are very common among multimodal studies in general, and this problem has only recently started to attract more attention ( Alberton et al., 2020 ). Generally, studies should aim to employ p-values adjusted for multiple comparisons; not only for the number of fixels, but also for the testing of multiple contrasts (within FBA) as well as for multiple experiments involving (combinations of) different modalities ( Alberton et al., 2020 ). While this would better ensure the validity of statistical results, trends can still be reported to help motivate future (multimodal) imaging studies. Unambiguous documentation of which results are supported by what kind of correction(s) is key, and the choice of words and language used in results, discussion and conclusion sections should be carefully considered accordingly.
Not all studies have found significant relationships between fixel-wise metrics and other measures. For example, Adanyeguh et al. (2018) reported no significant correlations between fixel-wise metrics and atrophy scores in patients with cerebellar ataxia. Failure to detect significant correlations could be explained by a more complex, non-linear relationship between both metrics. However, not all intuitively formulated hypotheses of this nature are necessarily valid in the first place (i.e., a correlation might genuinely not exist even when two effects are independently observed or described in a particular cohort). Regardlessly, studies would generally benefit from more advanced statistical analyses to reveal potentially non-linear relationships between fixel-wise and other brain metrics. Due to the magnitude and nature of FD and FC metrics (respectively volumetric and related to surface area), various non-linear transformations (e.g., a logarithm) arguably present as sensible candidates.
Researchers have also explored associations between various brain measurements and fixel-wise metrics, affording researchers greater freedom to pinpoint effect locations across the brain with increased specificity. Mizuguchi et al. (2019) reported that resting state functional connectivity between right lateral prefrontal cortex and left striatum was positively correlated with FC in the right anterior corona radiata. Boonstra et al. (2020) found that cerebellar decrease of FDC in multiple sclerosis (MS) patients was associated with cerebellar white matter atrophy and lesion load. Savard et al. (2020) observed in fronto-temporal dementia patients that reductions of apparent FD and FC in tracts of a fronto-temporal network were strongly linked to the amount of GM atrophy of peak nodes within this network.

Behavioral relevance of FBA results
Several FBA studies have investigated associations between fixelwise metrics and behavioural scores as a secondary aim, e.g., Adab et al. (2020) studied bimanual coordination performance, and Verhelst et al. (2019) looked at verbal working memory. Others incorporated clinical outcomes, e.g., via Mini-Mental State Examination ( Luo et al., 2020 ). These studies often examined such associations using correlation analyses between fixel-wise metrics and the relevant outcomes of interest across populations. For example, Choy et al. (2020) found significant negative correlations between age and fixel-wise metrics across multiple tracts in healthy adults, while Pannek et al. (2020) observed developmental improvements in cognitive and motor performance to be positively associated with fixel-wise metrics in infants. It should be noted that the majority of correlation coefficients of fixel-wise metrics with behavioural outcomes were often weak to moderate in strength across those studies implementing such analyses. Remarkably, Verhelst et al. (2019) found correlations between traditional DTI metrics and verbal working memory which were not present for the fixel-wise metrics. As mentioned in previous sections, interpreting disparate results between FBA and analysis of voxel-wise DTI metrics remains an ongoing challenge, due to the non-specific nature of DTI and its derived metrics (see also the earlier section "FBA and other dMRI analysis strategies ").
Whilst correlational analyses are important to understand brainbehavior relationships, some studies have employed other advanced analytical approaches, including multivariate profile analysis ( Genc et al., 2018 ) and mediation analyses ( Adab et al., 2020 ). For example, the mediation analyses performed by Adab et al. (2020) revealed that FDC partially mediates the relationship between age and bimanual coordination in the splenium and genu of the corpus callosum. Similar to the challenges of multimodal studies (see the earlier section "Multimodal studies "), it might be relevant to also explore non-linear relationships between fixel-wise metrics and behavioral metrics.
With increasing numbers of fixel-wise metrics and behavioral measures, the number of combinations and thus statistical tests (or "contrasts ") can easily grow. Without careful consideration, this can increase the prevalence of type 1 errors ( Alberton et al., 2020 ). However, properly correcting for these will then impact on the overall statistical power of studies. In this context, FBA in particular already faces a challenge due to the large numbers of individual fixels that are often analyzed; even though it implements the connectivity-based fixel enhancement (CFE) mechanism to partially address this (see also the earlier section "FBA and other dMRI analysis strategies "). Other strategies include using fixel regions of interest , either obtained from significant results of a prior FBA contrast (e.g., Mito et al., 2018 ) or by defining tracts of interest using an a priori anatomical hypothesis (e.g., Adab et al., 2020 ).

Longitudinal FBA studies
The majority of FBA studies thus far have primarily focused on revealing cross-sectional group differences of fixel-wise metrics. However, it is often of clinical interest to examine longitudinal changes in brain microstructure, particularly in response to development, aging, disease, or training interventions. Of the 75 studies reviewed, we identified 14 that have investigated changes in fixel-wise metrics over time. On the one hand, these changes were often assessed within one specific group of participants, and statistical analyses were performed comparing metrics between different time points (e.g., Mizuguchi et al., 2019 ;Verhelst et al., 2019 ;Rau et al., 2019 ). On the other hand, e.g., Genc et al. (2018 ) and Kelly et al. (2020) directly analysed the changes in fixel-wise metrics over the time period that each subject was studied, and assessed the relationship between such changes and developmental factors. This was in practice achieved by pre-computing the difference in fixel-wise metrics between different time points for each participant, reflecting a measure of change over time. Those were then analyzed in turn via a whole-brain FBA, either cross-sectionally to determine whether the change over time was different between groups, or to test whether changes over time were associated with phenotypic and clinical characteristics.
An important consideration for researchers relates to the definition and pre-computation of changes in fixel-wise metrics over time. In certain studies, the actual time difference between "time points " might vary to a certain extent across subjects due to how these "time points " are defined or when data could be acquired from the subjects. In these cases, it might be sensible to quantify change per unit of time , i.e., by normalizing the pre-computed change in fixel-wise metric by the time difference. Whether this is desirable or not, however, depends on the kind of change that is studied (or hypothesized). In this context, selecting time points for acquisition of data and modelling changes of FD and FC over time can prove highly challenging and involves a priori assumptions on the biological and biophysical processes. Note for example the surprisingly complex effects of atrophy (see the section "Combined Fibre Density and Cross-section (FDC) computation " in Supplementary Document 1 ) or vasogenic edema (see the earlier section "Challenges with interpretation of FD and FC ") on the FD and FC metrics, whereby they might (non-monotonically) go up and down over time. Being able to measure this, critically depends on sampling particular time points in the first place.
Another challenge many longitudinal studies are facing, involves missing data for some time point(s) of certain subjects. These scenarios call for statistical analyses which can more appropriately deal with this, such as mixed effects modelling. Some FBA studies have computed (average) fixel-wise metrics in a range of white matter pathways, in order to accurately model mixed effects due to missing data at the tract level rather than fixel level ( Dimond et al., 2020 ;Genc et al., 2020b ). More generally, for particularly complex statistical challenges it might be useful to extract tract-wise or fixel ROI-wise metrics and process these in dedicated advanced statistical software packages. The results of such advanced statistical analyses can often also be visualized in bespoke ways (using specialized plots), which might otherwise not be possible for many individual fixels (or it would at least defeat the purpose of a clear and thus useful visualization). Similarly, it might help to avoid over-interpretation of complex results.
Finally, another methodological aspect that is frequently brought up when implementing longitudinal FBA studies relates to the construction of the study-specific (FOD) template. Generally, the considerations in this context are not different to those for cross-sectional analyses, or non-FBA (e.g., VBA or VBM) studies (see the section on "Study-specific FOD template construction " in Supplementary Document 1 ). The FOD template serves as a common reference point for the study: for example, while the FC metric values are locally expressed relative to the template (by virtue of being calculated from the subject-to-template warps), they scale to it equally across all images of subjects and time points. Hence, relative FC effects between subjects or time points are not affected by the choice of a (common) template, even though the actual FC values themselves are .

Dealing with WM lesions in FBA studies
Many clinical populations, such as patients with traumatic brain injury (TBI), stroke, multiple sclerosis (MS), dementias, stroke and other neurodegenerative diseases are clinically heterogeneous due to the presence of lesions in variable (and often widespread) locations and of different types and sizes. These include large focal lesions, diffuse axonal injuries, white matter (T2) hyperintensities (WMHs), inflammation, and edema. Even in healthy elderly subjects, lesions can present in a similarly challenging fashion. The specific effects lesions have on FBA -both processing steps and results -is relatively unexplored in the literature.
Due to the complex nature of the FBA pipeline and its many different interacting steps (see also the earlier section "Fixel-based Analysis pipeline " and Supplementary Document 1 ), rigorous quality assessments are highly recommended at most stages of the pipeline (including preprocessing steps to deal with artefacts and motion, brain mask estimation, response function estimation, 3-tissue CSD, etc.) towards the estimation of the relevant fixel-wise metrics, especially in populations where large focal lesions, inflammation and edema are present. The reasons for this stem mostly from the fact that lesions can severely alter image intensities, but also geometry. For example, such lesions could affect the accuracy and precision of image registration and even the construction of a study FOD template itself. Several aspects of lesions and how they impact on 3-tissue modelling and metrics have been studied outside of the FBA framework, e.g., by Mito et al. (2020) and Khan et al. (2020Khan et al. ( , 2021 . Generally, it is important for FBA studies to take an appropriately cautious approach when lesioned subject populations are included. One typical practice for studies is to actively exclude participants with such extensive amounts of lesioned tissue that it would otherwise lead to specific problems with the processing or statistical analysis. For example, Verhelst et al. (2019) chose to only examine TBI patients with diffuse axonal injuries and excluded those with larger focal lesions . While on the one hand it makes sense to specifically study a TBI subpopulation with a focus on deficits more likely caused by white-matter disconnections, on the other hand being restricted to such an approach has inherent limitations in other scenarios. Consistently excluding participants might sometimes result in non-representative samples across the literature describing particular populations. One avenue to address this challenge was recently suggested: to enable more specific insights into rare or heterogeneous populations, a shift from group studies to singlecase approaches could be considered ( Attye et al., 2021 ;Chamberland et al., 2020 ). Defining robust pipelines for single-case FBA inspired approaches could be an interesting direction for future research. In this context, to the best of our knowledge, only Fekonja et al. (2021) implemented a modest initial attempt at subject-specific analysis of two randomly selected cases from their study on corticospinal tract impairment in patients with tumours.
Some studies have attempted to address challenges due to lesions by performing analyses within subdivisions of cohorts, e.g., based on shared lesion characteristics, which can as such limit the amount of heterogeneity. Wallace et al. (2020) performed an FBA that combined a largely heterogeneous sample of mild, moderate, and severe TBI patients. While descriptions of exclusion criteria were not provided, they performed subgroup analyses separately for mild TBI and moderatesevere TBI participants. As another example, Egorova et al. (2020) performed a whole-sample FBA of a cohort of stroke patients, but additionally also analyzed right hemisphere stroke and left hemisphere stroke patients separately (all three analyses as a cross-sectional comparison with healthy controls). Note the latter example showcases an aspect that is relevant beyond lesions specifically: lateralized pathologies. These are particularly challenging to study, as fixel-wise apparent FD, FC and FDC show widespread and non-trivial laterality even in the healthy brain ( Honnedevasthana Arun et al., 2021 ;Verhelst et al., 2021 ). Notably, Verhelst et al. (2021) urged caution with the typical approach of flipping brain images that is sometimes relied upon for studying lateralized pathology, as this might lead to false positive findings unrelated to the effect of interest. They formulated a range of caveats and advice in this context, concluding that it might often be preferable to avoid the brain flipping strategy altogether for this purpose and analyze the (differently lateralized) patient groups separately , as was done in Egorova et al. (2020) . More broadly, pathological tissue might affect nearby (or more remote) WM structure and function differentially, depending on its specific location.
Some FBA studies have assessed lesion load or volume measurements, as derived from other MRI modalities such as fluid-attenuated inversion recovery (FLAIR) or susceptibility-weighted imaging (SWI) data ( Boonstra et al., 2020 ;Egorova et al., 2020 ;Gottlieb et al., 2020 ). In these studies, lesion volumes have typically not been integrated in the FBA itself, e.g., as a covariate of non-interest or variable of interest. Instead, they were reported or analyzed separately. Some studies, e.g., Boonstra et al. (2020) , have furthermore (visually) inspected the lesion segmentations and nearby regions for the purposes of quality assessment of certain steps, e.g., image registration.
Finally, particular studies in MS ( Gajamange et al. 2018 ), stroke Gottlieb et al., 2020 ), and mild cognitive impairment and Alzheimer's disease ( Mito et al., 2018 ) have computed WM FODs using SS3T-CSD. By including additional isotropic signal compartments for other tissues and fluid in the model, 3-tissue CSD techniques resolve WM FODs that are more specifically sensitized to the anisotropic signal from axons. Critically, this allows for resolving and preserving the angular contrast of the WM FODs in presence of infiltrating pathology ( Aerts et al., 2019 ). Preservation of the aforementioned angular contrast of WM FODs in turn enables FOD-guided population template construction and registration of subject FOD images to this template to result in better spatial alignment (similar to how 3-tissue CSD techniques increase the same contrast for axonal projections into the cortical GM). Furthermore, accurate fixel segmentation is then also possible in lesioned regions or those infiltrated by pathological tissue. Ultimately, it also enables more accurate and specific apparent FD measures by removing signal contributions unrelated to the intra-axonal space. However, note that the use of high b-values is additionally recommended to further suppress extra-axonal signal contributions ( Raffelt et al., 2012b ;Genc et al., 2020a ) (see also the earlier sections on "Fixel-wise metrics and apparent fibre density " and "Requirements and effects of acquisition parameters ").

Limitations and future challenges
The FBA framework has introduced a unique capability, where the partial volume effect between different crossing fibre populations has been largely tackled. Yet this still does not represent the "ultimate " specificity to disentangle effects for all relevant distinct fibre populations. Several WM bundles in the brain "funnel " together along substantial portions of their length. This can not be overcome by modelling at the local voxel-level or even fixel-level alone: it is effectively an inherent limitation to the dMRI measurements in isolated voxels . This also causes major problems, e.g., for fibre tractography and might even be the most important reason it is challenged by large amounts of false positive connections ( Maier-Hein et al., 2017 ). For FBA, this becomes a challenge when interpreting results in terms of known anatomy. The increased fibre-specificity might even impose a false sense of confidence in labelling those results, with a bias towards larger or more commonly known bundles (with a further risk of relating these to the wrong cortical regions or even functions). An opportunity exists to develop objective labelling strategies , based on carefully curated prior knowledge.
Another challenge lies in the obvious complexity of the FBA framework (e.g., see Figure 3 ). Compared to VBA, many additional steps are necessary to appropriately address the unique nature of dMRI data, FOD images and fixel-wise image data. While several of these steps are relatively straightforward for researchers, in the sense that they are largely automated, others do introduce new user-defined parameters or choices and a need for specialized quality assessments at various stages in the pipeline. With so many complex interactions between steps, it can be hard to anticipate how certain choices early on in the pipeline might ultimately impact on the final result. For some of the bespoke mechanisms, default parameters do exist: e.g., Raffelt et al. (2015) determined reasonable choices for the smoothing extent and CFE parameters. However, it is unknown to what extent these generalize beyond those initial experimental findings. Similarly, the practice of spatial upsampling has been adopted for its benefits towards improving image contrast (with further downstream benefits for other pipeline steps, e.g., image registration), but a systematic investigation of up-sampling resolutions and their effect on FBA study results has not been performed to date. Yet other user-defined choices exist, e.g., in determining appropriate thresholds or criteria to derive the common fixel analysis mask and its spatial (fixel) extent. Most current FBA studies have adopted existing parameters and choices without further questioning whether these could be improved or tailored to fit their research questions better. This might be addressed in the future by more systematic investigations of the effects of certain FBA pipeline parameters and choices. Reproducibility studies could play another key role in increasing our understanding of the strengths of the framework, but might also reveal possible pitfalls for researchers undertaking FBA studies. While there exists some evidence on good test-retest reliability and long -term stability of 3-tissue CSD methods ( Newman et al., 2020 ), the same has not been pursued yet for derived fixel-wise FD, FC or FDC values (note that this additionally depends on registration steps, fixel definition, and fixel correspondence computation). Furthermore, beyond the preprocessing and complexities involved with computing these metrics, the FBA framework as a whole includes many other steps. Future studies should look into the test-retest reliability of the more final steps and outputs of the FBA pipeline. Reproducing entire FBA study outcomes, either using the same data, newly acquired data of the same subjects, or an entirely new sample of subjects, would also help to further establish the robustness of the framework. Finally, more tools and guidance to assist researchers in assessing the quality of FBA results could prove to be of added value. We have provided recommendations and state-of-the-art best practices in Supplementary Document 1 for all individual steps of the FBA pipeline. These should help to ensure high quality accurate FBA results by allowing researchers to perform diligent quality checks at each stage of the pipeline. However, assessing the quality of the final result remains a difficult challenge.
While analyzing fibre density and fibre-bundle cross-section using the FBA framework provides more (fibre) specific white matter assessments, relating these to the real underlying biophysical and biological mechanisms is also still challenging. More studies and validation are essential to provide further insights and validate specific FBA results against gold standard histological measurements, in order to better understand the cellular mechanisms underlying fixel-wise effects in white matter ( Al-Amin et al., 2020 ;Malhotra et al., 2019 ;Wu et al., 2020 ). Despite some FBA studies effectively incorporating other multimodal MRI data and analysis strategies (e.g., Adanyeguh et al., 2018 ;Boonstra et al., 2020 ;Luo et al., 2020 ;Mizuguchi et al., 2019 ;Vaughan et al., 2017 ), there still exists scope for improved integration of information derived from different modalities (when available) with FBA results. Specifically, in order to extract relevant information from various brain measurements, it has been suggested to validate these against different parameters of another framework, such as connectome embedding ( Rosenthal et al., 2018 ). Of particular interest might be other modalities sensitized to myelin, especially given that apparent FD itself is not (directly) sensitive to myelin. A good candidate might for instance be relaxo-metry, from which a myelin water fraction can be obtained that correlates relatively well with histology ( Mancini et al., 2020 ). An interesting opportunity in this context relates to a technique developed by De Santis et al. (2016) , which combines dMRI and relaxo-metry to resolve fibre-specific values for the longitudinal relaxation time (T1). Such fibre-specific measurements could be analyzed directly with the FBA framework, either to compare or relate to apparent FD, or to augment it.
Most current FBA studies performed group-based analyses, which may prove insufficient to further our understanding of the pathophysiology and management of rare conditions. Group analyses are unable to reflect individual differences between patients and cannot entirely account for between-subject heterogeneity, e.g., in lesion topography ( Attye et al., 2021 ;Chamberland et al., 2020 ). Given this, there is a relevant need for a paradigm shift from groupwise comparisons (e.g., a group of patients, compared to a group of controls) to more individualized profiling (i.e., a single patient, compared to a group of reference controls) of fixel-wise and other metrics, which would aid in conceptualizing both microstructural and macro-structural changes in white matter across rare or clinically heterogeneous populations. Continued work in this area will hopefully allow fixel-wise metrics to be used as diagnostic or prognostic biomarkers ( Atkinson et al., 2001 ), providing new and increased value for both researchers and clinicians alike.

Conclusion
We reviewed the FBA framework for the analysis of whole-brain fibre-specific properties of white matter micro-and macrostructure, as typically derived from diffusion MRI data. Similar to voxel-based analysis, FBA enables analysis of the whole brain without a priori hypothesis as to which parts or structures of the brain might show (significant) effects of interest. However, it allows for this in a truly fibre-specific manner where effects can manifest individually even for different fibre populations within a single voxel. This brings a range of unique challenges, for which the framework implements bespoke solutions. Since its original development, the framework has seen a stark increase in adoption across diverse application areas, yielding unique and valuable insights into various clinical populations as well as healthy subjects. However, limitations and challenges remain, in particular related to validation and translation. Interpretation of results -while greatly improved over other approaches due to fibre-specificity -should still be performed cautiously and is not always trivial due to the complex nature of and interactions between microstructural properties of WM tissue.

Data and Code Availability Statement
There are no relevant data related to this review paper, apart from the parameters and details sourced directly from the 75 FBA studies. These are all included in the table in Supplementary Document 4 .