Towards a representative reference for MRI-based human axon radius assessment using light microscopy

[...] Finally, we analyzed the error due to outstandingly large axons in $r_{\mathrm{eff}}$. Compared to $r_{\mathrm{arith}}$, $r_{\mathrm{eff}}$ was estimated with higher accuracy (maximum normalized-root-mean-square-error of $r_{\mathrm{eff}}$: 8.5 %; $r_{\mathrm{arith}}$: 19.5 %) and lower bias (maximum absolute normalized-mean-bias-error of $r_{\mathrm{eff}}$: 4.8 %; $r_{\mathrm{arith}}$: 13.4 %). While $r_{\mathrm{arith}}$ was confounded by variation of the image intensity, variation of $r_{\mathrm{eff}}$ seemed anatomy-related. The largest axons contributed between 0.8 % and 2.9 % to $r_{\mathrm{eff}}$. In conclusion, the proposed method is a step towards representatively estimating $r_{\mathrm{eff}}$ at MRI voxel resolution. Further investigations are required to assess generalization to other brains and brain areas with different axon radii distributions.


Introduction
The MRI signal generated by an ensemble of protons probing the local, microscopic environment in human brain tissue can contain information about microstructural tissue features such as the axonal radius (Alexander et al., 2010; Andersson et al., 2020; Assaf et al., 2008; Veraart et al., 2020). The axonal radius is a key determinant of neuronal communication in the human brain because it is related to, e.g., the neuronal conduction velocity (Drakesmith et al., 2019; Schmidt and Knösche, 2019; Waxman, 1980). The estimation of the axonal radius and other microstructural features via biophysical modeling of the MRI signal (Alexander et al., 2019) is an active area of research because of its potential to partially replace or complement invasive ex-vivo histology with non-invasive, in-vivo, quantitative MRI approaches (Stikov et al., 2015; Weiskopf et al., 2021). However, before these models can [...] lated to the human brain.
As the tail of the axon radii distribution may vary between humans and other mammals (Biedenbach et al., 1986; Leenen et al., 1982), the MRI-visible effective radius $r_{\mathrm{eff}}$ for humans may be shifted with respect to other species. This shift may be further reinforced by the reduced capability to resolve small axons in human MRI systems when compared to preclinical MRI systems (Drobnjak et al., 2016; Nilsson et al., 2017; Veraart et al., 2020). For human brain, the current gold standard for the validation of $r_{\mathrm{eff}}$ (Alexander et al., 2010; Horowitz et al., 2015; Innocenti et al., 2015; Veraart et al., 2020) stems from neuroanatomical studies (Aboitiz et al., 1992; Caminiti et al., 2009; Graf von Keyserlingk and Schramm, 1984; Liewald et al., 2014) of small ensembles of axons (100-1000 axons), aiming to evaluate the arithmetic mean radius ($r_{\mathrm{arith}}$) on manually annotated electron microscopy (EM) images.
As $r_{\mathrm{arith}}$ is determined by the bulk of the axon radii distribution, estimates of $r_{\mathrm{arith}}$ can be expected to be less sensitive to the ensemble size than estimates of $r_{\mathrm{eff}}$. For $r_{\mathrm{eff}}$, however, small-ensemble estimates can strongly under- or overestimate the $r_{\mathrm{eff}}$ of typical MRI voxels (Mordhorst et al., 2021), because the tail of the axon radii distribution is insufficiently sampled. Although high-resolution, large-scale light microscopy (lsLM) cannot resolve small axons as accurately as EM, an lsLM-based approach might be appropriate to generate a histological gold standard for the validation of MRI-based radius estimation in human brain tissue. Because of the large field-of-view of lsLM, covering cross-sections of 1 mm² or larger, it is possible to capture large ensembles of $10^5$ to $10^6$ axons per section and thus sample the tail of the axon radii distribution more accurately. Moreover, lsLM has the advantage of being fast, cheap and simple to perform compared to EM. As the assessment of axon radii on large field-of-view microscopy data renders manual annotation infeasible, automated approaches, e.g., methods based on convolutional neural networks (CNN), are required. So far, CNN-based methods for large two- or three-dimensional scanning or transmission electron microscopy (SEM/TEM) sections have been trained on images of perfusion-fixed mice or rats (Abdollahzadeh et al., 2021; Zaimi et al., 2018). However, it is unlikely that the models generated in these studies translate well to immersion-fixed human brain tissue with higher tissue degradation.

In this study, we investigate the potential of lsLM and CNN-based segmentation to map the distribution of axon radii in a human corpus callosum specimen. We quantify the capability of the proposed method to estimate the MRI-visible $r_{\mathrm{eff}}$ and $r_{\mathrm{arith}}$, which is commonly reported in neuroanatomical studies, by evaluating the estimation errors on six lsLM sections. While reference data for the frequency-weighted $r_{\mathrm{arith}}$ can be generated through manual annotation with reasonable effort, the tail-weighting of $r_{\mathrm{eff}}$ makes it necessary to accurately capture the tail of the axon radii distribution and thus to investigate larger ensembles of axons than can be realistically annotated. To address this challenge, we merge manually annotated radii from different sources into composite axon radii distributions, combining the accurate resolution of the bulk of axon radii in EM with representative sampling of the tail of the axon radii distribution on large-field-of-view lsLM subsections. Additionally, we investigate whether our method is capable of capturing anatomy-related, spatial variation of $r_{\mathrm{arith}}$ and $r_{\mathrm{eff}}$ in the presence of low-frequency image intensity variation, e.g., due to staining heterogeneity. Finally, we analyze the potential error due to individual, outstandingly large axons in $r_{\mathrm{eff}}$.

Ensemble mean axon radii
For a discrete axon radii distribution with $N(k)$ axons of radius $r(k)$ in bin $k$, the arithmetic mean radius can be defined as

$$ r_{\mathrm{arith}} = \sum_k w_{\mathrm{arith}}(k)\, r(k), \qquad w_{\mathrm{arith}}(k) = \frac{N(k)}{\sum_{k'} N(k')}. \tag{1} $$

The MRI-visible, effective mean radius ($r_{\mathrm{eff}}$) (Burcaw et al., 2015; Sepehrband et al., 2016; Veraart et al., 2020) can be estimated from the intra-axonal signal of diffusion MRI (dMRI). Clinical acquisition employs pulsed-gradient spin echo dMRI sequences with wide pulses, i.e., using pulse widths ≥ 10 ms (Burcaw et al., 2015). In the wide-pulse limit,

$$ r_{\mathrm{eff}} = \left( \frac{\sum_k N(k)\, r(k)^6}{\sum_k N(k)\, r(k)^2} \right)^{1/4} = \left( \sum_k w_{\mathrm{eff}}(k)\, r(k) \right)^{1/4}, \qquad w_{\mathrm{eff}}(k) = \frac{N(k)\, r(k)^5}{\sum_{k'} N(k')\, r(k')^2}. \tag{2} $$

While $r_{\mathrm{arith}}$ is frequency-weighted ($w_{\mathrm{arith}}(k)$) and therefore determined by the bulk of the axon radii distribution, $r_{\mathrm{eff}}$ is weighted ($w_{\mathrm{eff}}(k)$) towards the tail of the axon radii distribution because $w_{\mathrm{eff}}(k)$ scales with the fifth power of $r(k)$. Each radius $r(k)$ denotes the radius of a circular approximation of the axonal body of a myelinated axon with equivalent area (West et al., 2016) (hereafter denoted as circular equivalent).
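To make the different weighting of the two ensemble means concrete, the following minimal sketch evaluates the frequency-weighted mean and the wide-pulse effective radius for a binned distribution. The bin radii and counts are hypothetical values chosen for illustration only, and the function names are not taken from the study.

```python
def r_arith(radii, counts):
    """Frequency-weighted arithmetic mean radius of a binned distribution."""
    total = sum(counts)
    return sum(n * r for r, n in zip(radii, counts)) / total


def r_eff(radii, counts):
    """Tail-weighted effective radius, (<r^6> / <r^2>)^(1/4)."""
    sixth = sum(n * r**6 for r, n in zip(radii, counts))
    second = sum(n * r**2 for r, n in zip(radii, counts))
    return (sixth / second) ** 0.25


# Hypothetical binned distribution: radii in micrometres and axon counts.
radii = [0.25, 0.5, 1.0, 2.0]
counts = [400, 500, 90, 10]

mean_radius = r_arith(radii, counts)
effective_radius = r_eff(radii, counts)
print(f"r_arith = {mean_radius:.3f} um, r_eff = {effective_radius:.3f} um")
```

In this toy ensemble only 1 % of the axons fall into the largest bin, yet they dominate the sixth-moment sum, which is why the effective radius lies well above the arithmetic mean.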

Axon radii ranges
Throughout this manuscript, we generated composite axon radii distributions by combining axon radii distributions from different sources at particular thresholds. As a consequence, we partitioned the axon radii distribution into three parts:
• Large axons ($r \geq 1.6$ μm) represent the tail of the axon radii distribution and therefore contribute strongly to the tail-weighted $r_{\mathrm{eff}}$. The threshold was chosen so that the estimated $r_{\mathrm{eff}}$ decreased by 50 % when axons above this threshold were removed from the pooled axon radii ensemble of the corpus callosum lsLM sections evaluated with a prototype of the proposed method.
• Small axons ($r < 0.3$ μm) are below the resolution limit of lsLM.
• Medium-sized axons ($0.3$ μm $\leq r < 1.6$ μm) constitute the bulk of the axon radii distribution together with small axons.

Data acquisition
Tissue preparation
Four human white matter samples of four different subjects were used in this study: a corpus callosum (CC, male, 74 years, post-mortem delay (PMD): 24 hours, cause of death (COD): multi organ failure), a corticospinal tract (CST, female, 89 years, PMD: 24 hours, COD: heart failure), an optic chiasm (OC, male, 59 years, PMD: 48 hours, COD: multi organ failure) and a sample obtained from the area dorsolateral of the olivary nucleus including the anterolateral system (AS, male, 81 years, PMD: 24 hours, COD: multi organ failure). Following standard procedures, blocks were immersion-fixed in 3 % paraformaldehyde and 1 % glutaraldehyde in phosphate-buffered saline at pH 7.4. Then, smaller blocks of 1 to 4 mm edge length were cut, contrasted with osmium tetroxide and uranyl acetate, dehydrated in graded acetones, embedded in Durcupan resin and cut into semi-thin (∼500 nm) and ultra-thin (∼50 nm) sections. Semi-thin sections were stained with 1 % toluidine blue for imaging with lsLM.

Axon radius estimation pipeline
Axon radius estimation was divided into three steps: semantic segmentation, instance segmentation and radius approximation (see Fig. 2). To perform semantic segmentation, i.e., to classify each pixel as either axon, myelin or background, we applied a CNN (see Section 2.5) in a sliding window manner (see Fig. 2a). To identify axon instances from individual pixels, we applied connected-component labeling (see Fig. 2b). For each axon instance, the circular equivalent radius was approximated (see Fig. 2c).

Table 1
The dataset of human tissue samples. The following tissue samples were investigated: a corpus callosum (CC), a corticospinal tract (CST), an optic chiasm (OC) and an anterolateral system (AS). Sections were assigned exclusively to the training or test dataset.
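The instance segmentation and radius approximation steps of the pipeline can be illustrated with a small sketch. It labels connected axon pixels of a binary mask and converts each instance area into a circular equivalent radius, $r = \sqrt{A/\pi}$. The 4-connectivity, the toy mask and the pixel size of 0.1 μm are assumptions made for this example; the connectivity and pixel size used in the study are not stated in this section.

```python
import math
from collections import deque


def label_components(mask):
    """Label 4-connected foreground regions of a binary mask (list of rows).

    Returns the label image and a dict mapping label -> area in pixels.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    areas = {}
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or labels[r0][c0]:
                continue
            next_label += 1
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            area = 0
            while queue:  # breadth-first flood fill of one instance
                r, c = queue.popleft()
                area += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    if inside and mask[rr][cc] and not labels[rr][cc]:
                        labels[rr][cc] = next_label
                        queue.append((rr, cc))
            areas[next_label] = area
    return labels, areas


def circular_equivalent_radius(area_px, pixel_size_um):
    """Radius of the circle whose area equals the instance area."""
    return math.sqrt(area_px * pixel_size_um**2 / math.pi)


# Toy binary axon mask with two instances (1 = axon pixel).
axon_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
]
labels, areas = label_components(axon_mask)
PIXEL_SIZE_UM = 0.1  # hypothetical pixel size
radii_um = {k: circular_equivalent_radius(a, PIXEL_SIZE_UM) for k, a in areas.items()}
```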

Semantic segmentation
Training data annotation
For training of the CNN, we manually annotated 64 lsLM subsections of similar size (70 × 70 μm² to 120 × 120 μm²) originating from different sections of the four tissue samples: 46 CC subsections, 4 OC subsections, 4 CST subsections and 10 AS subsections. To avoid fitting to the test data, whole lsLM sections were used exclusively for training or testing (see Table 1). To cover a wide range of appearance in axon shape and image contrast, some subsections were only partially annotated, i.e., the remaining pixels were assigned an ignore label and were not considered during training. As large axons were expected to have particular relevance for $r_{\mathrm{eff}}$, but occur with low frequency, we assigned higher priority to the annotation of these axons.
The manual annotation of individual axons followed the approach described in Zaimi et al., 2018 : first, the myelin sheath was annotated, then the enclosed axonal body was filled. Remaining pixels were assigned a background label. At a later stage, we generated initial segmentations of the myelin sheaths using an early prototype of the CNN. Here, the procedure for the segmentation of myelin sheaths changed as follows: initial segmentations of myelin sheaths were refined, myelin sheaths of missed fibers were annotated and myelin sheaths of falsely segmented fibers were removed. Manual annotations were carried out using GIMP ( The GIMP Development Team ) or ITK-SNAP ( Yushkevich et al., 2006 ).
The manual annotation was performed by a total of six raters (M. Morozova, B. Fricke, J.M. Oeschger, S. Papazoglou, T. Tabarin and L. Mordhorst). Each manually annotated subsection was crosschecked by a second rater. Initially, manual annotations were carried out in collaboration with two experts (i.e., M. Morawski and M. Morozova) who were furthermore consulted in case of doubt.
Network architecture
We used a CNN of the U-Net family (Ronneberger et al., 2015; Yakubovskiy, 2020) (see Fig. 3a), i.e., we followed its general architecture of consecutive encoding and decoding paths with skip connections between shallow and deep layers processing features of the same spatial resolution. In U-Nets, the resolution is reduced after each encoder block while the number of channels is increased; this process is reversed along the decoding path. For the encoding path, we employed transfer learning, i.e., we used EfficientNet-B3 (Tan and Le, 2019) encoders pretrained on the ImageNet dataset (Deng et al., 2009). In the decoding path, we used two sequences of 3 × 3 convolutions with batch normalization (BN) and rectified linear activation units (ReLU) (see Fig. 3b). These sequences were framed with concurrent spatial and channel squeeze and excitation (scSE) (Roy et al., 2018) modules. While the encoding path decreased the spatial resolution by using one convolution with stride two in each encoder block, the decoding path increased spatial resolution using nearest neighbor interpolation as an initial step of each decoder block. Using skip connections between the encoding and decoding path, we concatenated the outputs of encoder blocks, i.e., the output of the pretrained EfficientNet-B3 after stages one, three, and four, with decoder blocks processing features of the same resolution (after applying interpolation). Outputs were obtained using 3 × 3 convolution with softmax activation, yielding pixel-wise pseudo-probabilities for axon, myelin and background. In total, the network had ∼3.8 million trainable parameters.

The human corpus callosum sample. The schematic of the sample (a) highlights the regions used for training (blue) and testing (red). For each region, one large-scale light microscopy (lsLM) section was acquired. For the $N = 6$ test regions (red), matching lsLM and electron microscopy (EM) subsections were acquired: two sections from genu (G1, G2), two sections from midbody (M1, M2) and one section each from isthmus (I1) and splenium (S1). For section G1, the lsLM (b) and its matching EM section (c) are depicted as well as examples of subsections that were magnified to cover the same spatial extent (20 × 20 μm²) at common resolution.

Fig. 3. Network architecture. (a) U-Net (Ronneberger et al., 2015) architecture, following the approach of an encoding (top row) and decoding path (bottom row) with skip connections (dashed arrows) between encoding and decoding path. The encoding path consists of the first six stages of a pretrained EfficientNet-B3 (EN-B3) (Tan and Le, 2019) model. The decoding path used the fundamental decoder blocks illustrated in (b): each decoder was composed of upsampling by nearest neighbor interpolation, concatenation of the encoded features at same resolution, and two sequences of 3 × 3 convolutions (Conv 3×3) with batch normalization (BN) and rectified linear activation units (ReLU) framed by squeeze and excitation (scSE) (Roy et al., 2018) modules. The skip connections (dashed arrows) connected the intermediate features of the encoding path (after EfficientNet-B3 stages one, three, and four) with corresponding decoder outputs at the same resolution. The final outputs were obtained by applying Conv 3×3 with a softmax activation to the output of the last decoder, yielding pixel-wise pseudo-probabilities for axon, myelin and background. Annotated numbers denote spatial resolution and the number of channels, e.g., $512^2 \times 3$ denotes a tensor with 512 × 512 pixels and 3 channels, which corresponds to the input and output size used during training.
Input preprocessing
Inputs were standardized per color channel with respect to the training dataset, i.e., we computed the channel-wise mean and standard deviation across all pixels of the training dataset; then, for each input during training, we subtracted the channel-wise mean and divided by the channel-wise standard deviation.
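A minimal sketch of this channel-wise standardization is given below. Images are represented as flat lists of RGB tuples and the population standard deviation is used; both choices, as well as the toy pixel values, are assumptions for illustration.

```python
import math


def channel_stats(images):
    """Channel-wise mean and standard deviation over all pixels of all images."""
    sums = [0.0, 0.0, 0.0]
    squares = [0.0, 0.0, 0.0]
    n_pixels = 0
    for image in images:
        for pixel in image:
            n_pixels += 1
            for c in range(3):
                sums[c] += pixel[c]
                squares[c] += pixel[c] ** 2
    means = [s / n_pixels for s in sums]
    stds = [math.sqrt(q / n_pixels - m * m) for q, m in zip(squares, means)]
    return means, stds


def standardize(image, means, stds):
    """Subtract the channel-wise mean and divide by the channel-wise std."""
    return [tuple((p[c] - means[c]) / stds[c] for c in range(3)) for p in image]


# Hypothetical training set: two tiny images of two RGB pixels each.
training_images = [
    [(0, 10, 100), (2, 30, 104)],
    [(4, 50, 108), (6, 70, 112)],
]
means, stds = channel_stats(training_images)
standardized = [standardize(img, means, stds) for img in training_images]
```

The statistics are computed once on the training data and then reused unchanged for every input, so that training and inference see identically scaled intensities.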
Training
We trained the model for 200 epochs, using pseudo-epochs of 150 randomly drawn training patches of 512 × 512 pixels. We used mini-batch gradient descent with a mini-batch size of 4, Nesterov momentum (0.95), an initial learning rate of $10^{-2}$ and a learning rate decay of $\gamma = 0.2$ every 50 epochs after an initial 100 epochs to minimize a Lovász-softmax loss (Berman et al., 2018). All weights of the CNN were modified during training. The training phase took about 45 minutes on an NVIDIA Quadro RTX 6000 GPU. We used a framework (Falcon et al., [...]

Hyperparameter optimization
To determine the initial learning rate, $\gamma$ and the number of epochs used above, we carried out a grid search for the initial learning rate and $\gamma$ using optuna (Akiba et al., 2019) in a 4-fold cross-validation (CV) approach. CV splits were conducted at the level of entire lsLM training subsections. We considered the averaged dice score for axon and myelin as the target metric, which we evaluated every 10 epochs on entire subsections of the validation set of the particular CV fold. Each model was trained for at least 150 epochs. To avoid overfitting, we stopped training when the target metric did not increase for three consecutive validation steps, i.e., 30 epochs. We then chose the hyperparameters, i.e., learning rate, $\gamma$ and the number of epochs, so that they optimized the mean of the above target metric across all CV folds.
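The step-wise learning rate decay and the stopping rule used during hyperparameter optimization can be sketched as follows. The sketch assumes that the first decay is applied at epoch 100 and that the stopping criterion is only checked once the minimum of 150 epochs has been reached; neither detail is specified in the text, and the function names and validation scores are hypothetical.

```python
def learning_rate(epoch, initial_lr=1e-2, gamma=0.2, constant_epochs=100, step=50):
    """Step decay: constant at first, then multiplied by gamma every `step` epochs.

    Assumption: the first decay happens at epoch `constant_epochs`.
    """
    if epoch < constant_epochs:
        return initial_lr
    n_decays = (epoch - constant_epochs) // step + 1
    return initial_lr * gamma**n_decays


def stopping_epoch(val_scores, eval_every=10, min_epochs=150, patience=3):
    """Epoch at which training stops, given one validation score per evaluation.

    val_scores[i] is the score after (i + 1) * eval_every epochs.
    """
    best = float("-inf")
    stale = 0  # consecutive evaluations without improvement
    for i, score in enumerate(val_scores):
        epoch = (i + 1) * eval_every
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
        if epoch >= min_epochs and stale >= patience:
            return epoch
    return len(val_scores) * eval_every


# Hypothetical dice scores: improving until epoch 120, flat afterwards.
scores = [0.50 + 0.01 * i for i in range(12)] + [0.60] * 8
stop = stopping_epoch(scores)
```

Under these assumptions a 200-epoch run uses $10^{-2}$ for epochs 0 to 99, $2 \cdot 10^{-3}$ for epochs 100 to 149 and $4 \cdot 10^{-4}$ for the remaining epochs.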

Test dataset
To generate reference data for the evaluation experiments detailed in the following sections, we manually annotated multiple lsLM subsections and one EM subsection for each test region $i \in \{1, \dots, N\}$ (see Fig. 4):
(a) To assess the axon segmentation performance of the semantic segmentation model (see Section 2.5), we manually annotated all axons on five lsLM subsections $S_{\mathrm{LM}}^{(i,j)}$ (with $j \in \{1, \dots, N_{\mathrm{LM}} = 5\}$) in small field-of-views of 28 × 28 μm² (see Fig. 4 b.1). Only axonal bodies were manually annotated. Individual axons were manually annotated as follows: the outline of the axonal body was defined, then the enclosed region was filled. Manual annotations were crosschecked as described in Section 2.5.
(b) To capture the tail of the axon radii distribution, we manually annotated large axons on three lsLM subsections $S_{\mathrm{lsLM}}^{(i,j)}$ (with $j \in \{1, \dots, N_{\mathrm{lsLM}} = 3\}$) in large field-of-views with an equivalent square area of 350 × 350 μm² (see Fig. 4 b.2). Exhaustive manual annotation of all axons was considered infeasible due to the large field-of-view. Only axonal bodies were manually annotated. An early prototype of the proposed method was used as guidance for the rater to detect and initially segment large axons. Then, the segmentation of detected axons was manually refined; missed axons were annotated; falsely detected axons were removed. Manual annotations were crosschecked as described in Section 2.5.
(c) To capture the bulk of the axon radii distribution, we manually annotated all axons on one matching EM subsection $S_{\mathrm{EM}}^{(i)}$ in small field-of-views, ranging from equivalent square areas of 54 × 54 μm² to 87 × 87 μm² (on average: 75 × 75 μm²) (see Fig. 4 b.3). Outlines of axonal bodies were approximated as polygons by M. Morozova. To convert these polygons to axon segmentation masks, we assigned pixels inside polygons an axon label and classified remaining pixels as background.

Performance of the semantic segmentation network
To assess the capability of the semantic segmentation model (see Section 2.5) to segment axons, we considered the binary, pixel-wise classification task of discriminating between axon and background. As we evaluated the capability to segment axons, we did not consider myelin, i.e., we generated binary axon prediction masks and treated all non-axon pixels as background. We evaluated the axon segmentation performance both at the level of individual pixels and at the level of axon instances.

Pixel-wise segmentation performance
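To quantify the axon segmentation performance at the level of individual pixels, we evaluated segmentation metrics (Eqs. (3) to (6)) on pairs of binary axon masks obtained through manual annotation and prediction using the semantic segmentation model on small-field-of-view subsections $S_{\mathrm{LM}}^{(i,j)}$. From pixel-wise comparison of pairs of manually annotated and predicted axon masks, we determined the number of false negatives (|FN|), the number of false positives (|FP|), the number of true positives (|TP|) and the number of true negatives (|TN|). Finally, we computed

$$ \text{precision} = \frac{|\mathrm{TP}|}{|\mathrm{TP}| + |\mathrm{FP}|}, \tag{3} $$

$$ \text{recall} = \frac{|\mathrm{TP}|}{|\mathrm{TP}| + |\mathrm{FN}|}, \tag{4} $$

$$ \text{balanced accuracy} = \frac{1}{2} \left( \frac{|\mathrm{TP}|}{|\mathrm{TP}| + |\mathrm{FN}|} + \frac{|\mathrm{TN}|}{|\mathrm{TN}| + |\mathrm{FP}|} \right) \tag{5} $$

and

$$ \text{dice} = \frac{2\,|\mathrm{TP}|}{2\,|\mathrm{TP}| + |\mathrm{FP}| + |\mathrm{FN}|} \tag{6} $$

for each of the $N \cdot N_{\mathrm{LM}}$ subsections of all test regions and summarized each metric by the mean across subsections.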
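As an illustration of the pixel-wise evaluation, the sketch below counts the confusion-matrix entries for a pair of binary masks and derives precision, recall, balanced accuracy and dice using their standard definitions. The two toy masks are hypothetical.

```python
def confusion_counts(reference, prediction):
    """Count TP, FP, FN, TN from two binary masks given as lists of rows."""
    tp = fp = fn = tn = 0
    for ref_row, pred_row in zip(reference, prediction):
        for ref, pred in zip(ref_row, pred_row):
            if ref and pred:
                tp += 1
            elif pred:
                fp += 1
            elif ref:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(tp, fp, fn, tn):
    """Standard pixel-wise metrics for the axon-versus-background task."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "precision": tp / (tp + fp),
        "recall": recall,
        "balanced_accuracy": 0.5 * (recall + specificity),
        "dice": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical masks: the prediction misses half of the annotated axon pixels.
manual = [[1, 1, 0, 0], [1, 1, 0, 0]]
predicted = [[1, 0, 0, 0], [1, 0, 0, 0]]
counts = confusion_counts(manual, predicted)
metrics = segmentation_metrics(*counts)
```

In this example precision exceeds recall, which corresponds to a prediction with more false negatives than false positives.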

Instance-wise segmentation performance
We assessed the axon segmentation performance at the level of axon instances as a function of the axon radius. Two measures were considered: an instance-wise evaluation of the dice coefficient and a comparison of the number of undetected axons (i.e., false negatives) and falsely detected axons (i.e., false positives). For this analysis, we pooled all manually annotated axons over all small- and large-field-of-view lsLM subsections $S_{\mathrm{LM}}^{(i,j)}$ and $S_{\mathrm{lsLM}}^{(i,j)}$ across all test regions. The instance-wise dice coefficient was assessed for pairs of manually annotated axons and their best-matching axon from the prediction, following a similar approach as in Abdollahzadeh et al., 2019. The instance-wise dice coefficient was computed using Eq. (6) for pairs of predicted and manually annotated binary axon masks in which we considered only the respective two matching axons, whereas remaining pixels were considered to be background. For each manually annotated axon, the best-matching predicted axon was determined in terms of the highest instance-wise dice coefficient. Manually annotated axons with no best-matching predicted axon, i.e., axons for which the maximum instance-wise dice coefficient was zero, were considered to be false negatives. Then, we binned manually annotated axons by their radii (spacing: 0.1 μm) and computed the mean dice coefficient per bin. To disentangle the contribution of false negatives from the contribution of under- or oversegmentation of correctly detected axons to the mean dice coefficients, we repeated the analysis without taking false negatives into account for the computation of the mean dice coefficients.
To compare over- and underdetection of axon instances as a function of the axon radius, we computed |FN| and |FP| (here: at axon instance level) per axon radius bin. While |FN| was immediately available from the computation of mean dice coefficients, we determined |FP| as the number of predicted axons that were not assigned as a best-matching axon to any manually annotated axon.
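The instance-wise matching described in this section can be sketched as follows, with each axon represented as a set of pixel coordinates. The representation, the function names and the toy instances are assumptions for illustration; binning by radius is omitted for brevity.

```python
def dice(a, b):
    """Dice coefficient between two axon instances given as sets of pixels."""
    return 2 * len(a & b) / (len(a) + len(b))


def match_instances(reference, predicted):
    """Best-matching dice per reference axon plus instance-level FN and FP counts."""
    best_dice = []
    matched_predictions = set()
    for ref in reference:
        scores = [dice(ref, pred) for pred in predicted]
        best = max(scores, default=0.0)
        best_dice.append(best)
        if best > 0:
            matched_predictions.add(scores.index(best))
    n_fn = sum(1 for d in best_dice if d == 0)  # reference axons without any overlap
    n_fp = len(predicted) - len(matched_predictions)  # predictions never chosen as best match
    return best_dice, n_fn, n_fp


# Hypothetical instances: one good match, one partial match, one miss, one false detection.
reference_axons = [
    {(0, 0), (0, 1), (1, 0), (1, 1)},
    {(5, 5), (5, 6)},
    {(9, 9)},
]
predicted_axons = [
    {(0, 0), (0, 1), (1, 0)},
    {(5, 6), (5, 7)},
    {(20, 20)},
]
best_dice, n_fn, n_fp = match_instances(reference_axons, predicted_axons)
```

Averaging `best_dice` with and without the zero entries mirrors the two variants of the mean dice coefficient, i.e., with and without taking false negatives into account.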

Error of estimated $r_{\mathrm{arith}}$ and $r_{\mathrm{eff}}$
In this section, we evaluated different error metrics of estimates of $r_{\mathrm{arith}}$ and $r_{\mathrm{eff}}$, i.e., $\hat{r}_{\mathrm{arith}}^{(i,j)}$ and $\hat{r}_{\mathrm{eff}}^{(i,j)}$, for axon radii distributions predicted on large-field-of-view lsLM subsections $S_{\mathrm{lsLM}}^{(i,j)}$. Corresponding reference values, i.e., $r_{\mathrm{arith}}^{(i,j)}$ and $r_{\mathrm{eff}}^{(i,j)}$, were generated from different axon radii distributions obtained through manual annotation of $S_{\mathrm{lsLM}}^{(i,j)}$ and matching EM subsections $S_{\mathrm{EM}}^{(i)}$ (see Fig. 5).

Error metrics
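To assess the error of $\hat{r}_{\mathrm{arith}}^{(i,j)}$ with respect to $r_{\mathrm{arith}}^{(i,j)}$ (and the error of $\hat{r}_{\mathrm{eff}}^{(i,j)}$ analogously), we considered three different error metrics. Using the residuals

$$ \varepsilon^{(i,j)} = \hat{r}_{\mathrm{arith}}^{(i,j)} - r_{\mathrm{arith}}^{(i,j)} $$

and denoting the $n$-th moment of $\varepsilon$ (and others analogously) as

$$ \langle \varepsilon^n \rangle = \frac{1}{N \cdot N_{\mathrm{lsLM}}} \sum_{i=1}^{N} \sum_{j=1}^{N_{\mathrm{lsLM}}} \left( \varepsilon^{(i,j)} \right)^n, $$

we assessed accuracy in terms of the normalized-root-mean-square-error

$$ \mathrm{NRMSE} = \frac{\sqrt{\langle \varepsilon^2 \rangle}}{\langle r_{\mathrm{arith}} \rangle}, \tag{8} $$

the bias in terms of the normalized-mean-bias-error

$$ \mathrm{NMBE} = \frac{\langle \varepsilon \rangle}{\langle r_{\mathrm{arith}} \rangle} \tag{9} $$

and the normalized-residual-standard-deviation

$$ \mathrm{NRSD} = \frac{\sqrt{\langle \varepsilon^2 \rangle - \langle \varepsilon \rangle^2}}{\langle r_{\mathrm{arith}} \rangle}. \tag{10} $$

Note that NRMSE (see Eq. (8)) can be expressed in terms of Eqs. (9) and (10) using the following decomposition:

$$ \mathrm{NRMSE}^2 = \mathrm{NMBE}^2 + \mathrm{NRSD}^2. \tag{11} $$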
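The three error metrics and their decomposition can be checked numerically with a short sketch. It assumes that all metrics are normalized by the mean reference value and that the residual standard deviation is the population standard deviation, which makes the squared NRMSE equal to the sum of the squared NMBE and the squared NRSD. The estimates and reference values are hypothetical.

```python
import math


def moment(values, order=1):
    """n-th raw moment of a list of values."""
    return sum(v**order for v in values) / len(values)


def error_metrics(estimates, references):
    """Return (NRMSE, NMBE, NRSD), normalized by the mean reference value (assumption)."""
    residuals = [e - r for e, r in zip(estimates, references)]
    norm = moment(references)
    nrmse = math.sqrt(moment(residuals, 2)) / norm
    nmbe = moment(residuals) / norm
    variance = moment(residuals, 2) - moment(residuals) ** 2
    nrsd = math.sqrt(max(variance, 0.0)) / norm  # clamp tiny negative rounding errors
    return nrmse, nmbe, nrsd


# Hypothetical estimates and reference radii in micrometres.
estimates = [1.1, 0.9, 1.3]
references = [1.0, 1.0, 1.0]
nrmse, nmbe, nrsd = error_metrics(estimates, references)
```

The decomposition separates a systematic offset (NMBE) from the scatter around that offset (NRSD).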

Error of $\hat{r}_{\mathrm{eff}}$
To assess the error of estimates of the tail-weighted $r_{\mathrm{eff}}$, we compared estimates ($\hat{r}_{\mathrm{eff}}^{(i,j)}$) obtained from predictions on large-field-of-view lsLM subsections $S_{\mathrm{lsLM}}^{(i,j)}$ against reference values ($r_{\mathrm{eff}}^{(i,j)}$) computed from composite reference axon radii distributions $\mathcal{D}_{\mathrm{eff}}^{(i,j)}$. The tail of $\mathcal{D}_{\mathrm{eff}}^{(i,j)}$ was sampled from manual annotations on $S_{\mathrm{lsLM}}^{(i,j)}$. The bulk of $\mathcal{D}_{\mathrm{eff}}^{(i,j)}$ was sampled from manual annotations on matching EM subsections $S_{\mathrm{EM}}^{(i)}$ and rescaled according to a scaling factor $s^{(i,j)}$ to compensate for the smaller axon ensemble size of $S_{\mathrm{EM}}^{(i)}$ as compared to $S_{\mathrm{lsLM}}^{(i,j)}$ (see Fig. 5d). This composition of $\mathcal{D}_{\mathrm{eff}}^{(i,j)}$ was motivated as follows: accurate representation of the tail required exhaustive manual annotation of the tail of the axon radii distribution of $S_{\mathrm{lsLM}}^{(i,j)}$; the bulk of the axon radii distribution of $S_{\mathrm{lsLM}}^{(i,j)}$ could not be sampled through manual annotation with reasonable effort due to the large ensemble size. Instead, we assumed that the bulk of the axon radii distribution could be representatively sampled from smaller axon ensembles annotated on $S_{\mathrm{EM}}^{(i)}$. To generate the reference axon radii distribution $\mathcal{D}_{\mathrm{eff}}$ for one subsection, we determined its numbers of axons $N_{\mathrm{eff}}(k)$ with radii $r(k)$ per bin $k$ using the corresponding numbers of axons manually annotated on $S_{\mathrm{EM}}$ and $S_{\mathrm{lsLM}}$, i.e., $N_{\mathrm{EM}}(k)$ and $N_{\mathrm{lsLM}}(k)$. For the bulk of $\mathcal{D}_{\mathrm{eff}}$, $N_{\mathrm{eff}}(k)$ was obtained by rescaling $N_{\mathrm{EM}}(k)$ according to $s$. For the tail of $\mathcal{D}_{\mathrm{eff}}$, $N_{\mathrm{eff}}(k)$ was equal to $N_{\mathrm{lsLM}}(k)$. Thus,

$$ N_{\mathrm{eff}}(k) = \begin{cases} s \cdot N_{\mathrm{EM}}(k), & r(k) < 1.6\ \mu\mathrm{m} \\ N_{\mathrm{lsLM}}(k), & r(k) \geq 1.6\ \mu\mathrm{m}. \end{cases} \tag{12} $$

As there was no obvious choice of $s$, we determined a lower ($s_{\downarrow}$) and upper ($s_{\uparrow}$) bound for $s$ (see Fig. 6).
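The construction of a composite reference distribution can be sketched as follows: bins below the 1.6 μm threshold take the rescaled EM counts, bins at or above it take the counts annotated on lsLM. The bin radii, counts and scaling factors are hypothetical, and the effective radius is computed as the fourth root of the ratio of the sixth to the second moment.

```python
TAIL_THRESHOLD_UM = 1.6  # threshold between bulk and tail


def composite_counts(radii, counts_em, counts_lslm, scale):
    """Per-bin counts of the composite distribution: scaled EM bulk, lsLM tail."""
    return [
        n_lslm if r >= TAIL_THRESHOLD_UM else scale * n_em
        for r, n_em, n_lslm in zip(radii, counts_em, counts_lslm)
    ]


def r_eff(radii, counts):
    """Tail-weighted effective radius, (<r^6> / <r^2>)^(1/4)."""
    sixth = sum(n * r**6 for r, n in zip(radii, counts))
    second = sum(n * r**2 for r, n in zip(radii, counts))
    return (sixth / second) ** 0.25


# Hypothetical bins (radii in micrometres) and manually annotated counts.
radii = [0.2, 0.6, 1.2, 1.8, 2.4]
counts_em = [300, 500, 40, 1, 0]   # all axons on a small EM subsection
counts_lslm = [0, 0, 0, 30, 4]     # only large axons on a large lsLM subsection

reference_counts = composite_counts(radii, counts_em, counts_lslm, scale=20)
r_eff_small_scale = r_eff(radii, reference_counts)
r_eff_large_scale = r_eff(radii, composite_counts(radii, counts_em, counts_lslm, scale=40))
```

Because the bulk contributes more to the second moment than to the sixth moment, a larger scaling factor lowers the reference value, which is why bounds on the scaling factor translate into bounds on the reference effective radius.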
$s_{\downarrow}$ was determined as the ratio between the number of lsLM-resolvable ($r \geq 0.3$ μm) axons predicted on $S_{\mathrm{lsLM}}$ ($\hat{N}_{\mathrm{lsLM},\, r \geq 0.3\,\mu\mathrm{m}}$) and the number of lsLM-resolvable axons manually annotated on $S_{\mathrm{EM}}$ ($N_{\mathrm{EM},\, r \geq 0.3\,\mu\mathrm{m}}$):

$$ s_{\downarrow} = \frac{\hat{N}_{\mathrm{lsLM},\, r \geq 0.3\,\mu\mathrm{m}}}{N_{\mathrm{EM},\, r \geq 0.3\,\mu\mathrm{m}}}. \tag{13} $$

This choice of $s_{\downarrow}$ was due to the observation that the semantic segmentation network was more likely to miss axons than to falsely detect axons (see Section 3.1). Therefore, $s_{\downarrow}$ was likely to underrepresent the bulk. In contrast, we determined $s_{\uparrow}$ as the ratio of the subsection areas of $S_{\mathrm{lsLM}}$ and $S_{\mathrm{EM}}$, denoted as $A_{\mathrm{lsLM}}$ and $A_{\mathrm{EM}}$:

$$ s_{\uparrow} = \frac{A_{\mathrm{lsLM}}}{A_{\mathrm{EM}}}. \tag{14} $$

We assumed that $s_{\uparrow}$ would overrepresent the bulk because we expected a higher axon density in $S_{\mathrm{EM}}$ than in $S_{\mathrm{lsLM}}$ due to the lack of large non-fiber structures such as blood vessels in $S_{\mathrm{EM}}$.

An over- or underrepresentation of the bulk of the axon radii distribution leads to an error in $r_{\mathrm{eff}}$. Due to the tail-weighting of $r_{\mathrm{eff}}$, we hypothesized that using a reference axon radii distribution with overrepresented bulk ($\mathcal{D}_{\mathrm{eff}\uparrow}$) would lead to an underestimation of $r_{\mathrm{eff}}$, whereas using a reference axon radii distribution with underrepresented bulk ($\mathcal{D}_{\mathrm{eff}\downarrow}$) would lead to an overestimation of $r_{\mathrm{eff}}$, i.e., $r_{\mathrm{eff}\uparrow} < r_{\mathrm{eff}} < r_{\mathrm{eff}\downarrow}$. Therefore, we assessed the error metrics of $\hat{r}_{\mathrm{eff}}$ with respect to both reference values $r_{\mathrm{eff}\downarrow}$ and $r_{\mathrm{eff}\uparrow}$ and used the maximum absolute value per error metric as an upper bound for the true error. Moreover, we assessed the dynamic range of errors by investigating the error metrics of $\hat{r}_{\mathrm{eff}}$ with respect to reference values obtained based on scaling factors in the range between $s_{\downarrow}$ and $s_{\uparrow}$. To this end, we computed reference values $r_{\mathrm{eff}}(s_{\mathrm{interp}})$ for

$$ s_{\mathrm{interp}}(\alpha) = (1 - \alpha)\, s_{\downarrow} + \alpha\, s_{\uparrow}, \qquad \alpha \in [0, 1]. \tag{15} $$

To enable combination of differently sized axon radii distributions, the axon radii distribution of axons manually annotated on $S_{\mathrm{EM}}^{(i)}$ was rescaled by a scaling factor $s^{(i,j)}$ (see Section 2.8.2 for details). The tick on the x-axis denotes the threshold that partitioned the axon radii distribution into bulk ($r < 1.6$ μm) and tail ($r \geq 1.6$ μm) axons. The insets emphasize the tail of the axon radii distribution.

Fig. 7. Schematic of axon radii distributions used to assess the error of $\hat{r}_{\mathrm{eff}}$ for one subsection $S_{\mathrm{lsLM}}$. (a-b) Reference axon radii distribution $\mathcal{D}_{\mathrm{eff}}(\alpha)$ (purple and pink) and predicted axon radii distribution (yellow) as described in Section 2.8.2, Fig. 5b and Fig. 5d. (c-e) Axon radii distributions generated to assess the error of $\hat{r}_{\mathrm{eff}}$ due to predicted axon radii in distinct axon radii ranges. These axon radii distributions used the predicted axon radii distribution (b) in the large (c), medium-sized (d) and small (e) axon radii range and axon radii of $\mathcal{D}_{\mathrm{eff}}(\alpha)$ (see (a)) in the remaining ranges. Axon radii distributions in (c-e) partially relied on axon radii of $\mathcal{D}_{\mathrm{eff}}(\alpha)$ (see (a)), thereby inheriting a dependency on the sweep variable $\alpha$, which determined the scaling of the bulk of $\mathcal{D}_{\mathrm{eff}}(\alpha)$ as described in Section 2.8.2. Vertical bars (a-e) mark values of $r_{\mathrm{eff}}$ computed from the respective axon radii distributions. The ticks on the x-axes denote the two thresholds that partitioned the axon radii distribution into small ($r < 0.3$ μm), medium-sized ($0.3$ μm $\leq r < 1.6$ μm) and large ($r \geq 1.6$ μm) axons. The insets emphasize the tail of the axon radii distribution.
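The two bounds of the scaling factor and the sweep between them can be sketched as below. Linear interpolation in a sweep variable between 0 and 1 is an assumption of this sketch, as are the function names, the axon counts and the subsection areas.

```python
def scale_lower(n_predicted_lslm_resolvable, n_annotated_em_resolvable):
    """Count-based bound: predicted lsLM axons over annotated EM axons (r >= 0.3 um)."""
    return n_predicted_lslm_resolvable / n_annotated_em_resolvable


def scale_upper(area_lslm_um2, area_em_um2):
    """Area-based bound: ratio of lsLM and EM subsection areas."""
    return area_lslm_um2 / area_em_um2


def scale_interp(alpha, s_low, s_up):
    """Assumed linear interpolation: alpha = 0 gives s_low, alpha = 1 gives s_up."""
    return (1 - alpha) * s_low + alpha * s_up


# Hypothetical counts; areas follow the typical subsection sizes (350 um and 75 um edge length).
s_low = scale_lower(16000, 1000)
s_up = scale_upper(350.0**2, 75.0**2)
sweep = [scale_interp(a / 4, s_low, s_up) for a in range(5)]
```

Evaluating the reference effective radius and the error metrics for each value in `sweep` yields the dependence of the errors on the sweep variable.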

Error of $\hat{r}_{\mathrm{arith}}$
To assess the error of estimates of the bulk-determined $r_{\mathrm{arith}}$, we compared estimates ($\hat{r}_{\mathrm{arith}}^{(i,j)}$) obtained from predictions on large-field-of-view lsLM subsections $S_{\mathrm{lsLM}}^{(i,j)}$ against reference values ($r_{\mathrm{arith}}^{(i,j)}$) obtained from manual annotations on matching EM subsections $S_{\mathrm{EM}}^{(i)}$ (see Fig. 5c). The choice of an EM-based reference was due to its accurate representation of the bulk of the axon radii distribution, including small axons below the resolution limit of lsLM. As only one EM subsection $S_{\mathrm{EM}}^{(i)}$ existed per test region, we used the same reference $r_{\mathrm{arith}}^{(i,j)}$ for all lsLM subsections per region, i.e., $r_{\mathrm{arith}}^{(i,1)} = \dots = r_{\mathrm{arith}}^{(i,N_{\mathrm{lsLM}})}$. Note that the generation of $r_{\mathrm{arith}}^{(i,j)}$ was simplified in comparison to the approach used for $r_{\mathrm{eff}}^{(i,j)}(\alpha)$ in Section 2.8.2: instead of computing $r_{\mathrm{arith}}^{(i,j)}$ in analogy to $r_{\mathrm{eff}}^{(i,j)}(\alpha)$ from composite axon radii distributions $\mathcal{D}_{\mathrm{eff}}^{(i,j)}(\alpha)$ combining EM- and lsLM-based axon radii distributions, we calculated $r_{\mathrm{arith}}^{(i,j)}$ exclusively from EM-based axon radii distributions. The motivation for this simplification was as follows: first, EM accurately captures the bulk of the axon radii distribution that determines $r_{\mathrm{arith}}$; second, we avoided the dependency of $r_{\mathrm{arith}}^{(i,j)}$ and of the derived error metrics for $\hat{r}_{\mathrm{arith}}$ on $\alpha$.

Sensitivity of $\hat{r}_{\mathrm{arith}}$ and $\hat{r}_{\mathrm{eff}}$ to variation of the image intensity
We assessed whether spatially varying image intensity, e.g., introduced by staining heterogeneity, affected the capability of our method to map anatomy-related, spatial variation of $\hat{r}_{\mathrm{arith}}$ and $\hat{r}_{\mathrm{eff}}$ across whole lsLM sections. For qualitative analysis, we generated spatially smoothed maps of $\hat{r}_{\mathrm{arith}}$ and $\hat{r}_{\mathrm{eff}}$ by computing the average of randomly positioned subsections (equivalent square area: 350 × 350 μm²) and visually compared the patterns of the spatially smoothed maps to those of the corresponding lsLM images. For quantitative analysis, maps of $\hat{r}_{\mathrm{arith}}$, $\hat{r}_{\mathrm{eff}}$ and the image intensity were generated similarly to those above but sampled on an equally spaced grid (grid pixel area: 350 × 350 μm²). To obtain a scalar value for the image intensity, we applied gray scale conversion. Then, grid pixels of sections with similar axon radii distribution (G1, G2, M1, M2) were pooled and the correlation between image intensity and mapped radii was computed. As visual inspection suggested that small axons were particularly difficult to resolve in strongly stained areas, the above experiments were performed with and without considering small axons to test this hypothesis.
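The quantitative part of this analysis amounts to correlating one gray value per grid pixel with the radius mapped to that grid pixel. The sketch below uses the ITU-R BT.601 luma weights for gray scale conversion, which is an assumption because the conversion used in the study is not specified here, and computes Pearson's correlation coefficient on hypothetical values.

```python
import math


def grayscale(rgb):
    """Gray value from an RGB triple using BT.601 luma weights (assumption)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical grid pixels: mean RGB color and mapped arithmetic mean radius (um).
grid_colors = [(90, 80, 160), (120, 110, 180), (150, 140, 200), (180, 170, 220)]
grid_radii = [0.62, 0.58, 0.55, 0.50]

intensities = [grayscale(c) for c in grid_colors]
correlation = pearson_r(intensities, grid_radii)
```

A correlation close to zero would indicate that the mapped radius is insensitive to the staining intensity, whereas a strong correlation points to an intensity-related confound.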

Sensitivity of $r_{\mathrm{eff}}$ to outstandingly large axons
To evaluate how much $r_{\mathrm{eff}}$ is affected by outstandingly large axons, we investigated how $r_{\mathrm{eff}}$ changed as a function of a varying threshold $r_{\mathrm{thr}}$ when only axons with $r < r_{\mathrm{thr}}$ were considered for the computation of $r_{\mathrm{eff}}$. $r_{\mathrm{thr}}$ was chosen to cover the whole range of observed axon radii for a given axon radii distribution. In particular, we assessed the worst case in which the largest individual axon was missed. To exclude estimation errors from this experiment, we considered only reference data, i.e., the reference axon radii distributions generated in Section 2.8.2. To rather over- than underestimate the sensitivity to outstandingly large axons, we used reference axon radii distributions $\mathcal{D}_{\mathrm{eff}\downarrow}^{(i,j)}$ with underrepresented bulk. Furthermore, to carry out this analysis at a scale as close as possible to the cross-sectional size of typical voxels of a human MRI system (1 mm² or larger), we computed $r_{\mathrm{eff}\downarrow}^{(i)}$ from pooled axon radii distributions, combining the axon radii distributions of all lsLM subsections per test region, yielding $\mathcal{D}_{\mathrm{eff}\downarrow}^{(i)} = \bigcup_{j=1}^{N_{\mathrm{lsLM}}} \mathcal{D}_{\mathrm{eff}\downarrow}^{(i,j)}$ for the $i$-th test region. Thereby, we obtained $r_{\mathrm{eff}\downarrow}^{(i)}$ from the largest axon ensembles available for each test region, based on combined areas of about 0.37 mm² ($\approx N_{\mathrm{lsLM}} \cdot (350\ \mu\mathrm{m})^2$).
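The threshold sweep and the worst case of a missed largest axon can be sketched as follows. The ensemble of individual radii is hypothetical and far smaller than the pooled ensembles analyzed in the study, so the resulting contribution of the largest axon is exaggerated and only illustrates the mechanics.

```python
def r_eff(radii):
    """Effective radius of individual radii, (sum r^6 / sum r^2)^(1/4)."""
    return (sum(r**6 for r in radii) / sum(r**2 for r in radii)) ** 0.25


def r_eff_below(radii, threshold):
    """Effective radius when only axons with r < threshold are considered."""
    return r_eff([r for r in radii if r < threshold])


def largest_axon_contribution(radii):
    """Relative decrease of r_eff when the single largest axon is removed."""
    full = r_eff(radii)
    without_largest = sorted(radii)[:-1]
    return (full - r_eff(without_largest)) / full


# Hypothetical ensemble (radii in micrometres) with one outstandingly large axon.
radii = [0.5] * 1000 + [1.0] * 100 + [2.0] * 5 + [3.5]

sweep = [(t, r_eff_below(radii, t)) for t in (1.5, 2.5, 4.0)]
contribution = largest_axon_contribution(radii)
```

Larger ensembles dilute the influence of any single axon, which is the motivation for pooling all subsections of a test region before performing this analysis.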

Table 2
Pixel-wise segmentation metrics. Each value in the table denotes a mean value of the corresponding metric (see Eqs. (3) to (6) ) over all manually annotated small-fieldof-view subsections ( , ) LM (see Section 2.6 .(a)).  Table 2 lists pixel-wise segmentation metrics: balanced accuracy, dice, precision and recall as defined in see Eqs.

Segmentation performance
(3) to (6) . Higher precision than recall indicates that the number of false negatives was larger than the number of false positives. Fig. 8 shows segmentation metrics evaluated at the level of axon instances as a function of the axon radius. The mean dice coefficient increased as a function of the axon radius in the range from 0 . 0 μm to 1 . 4 μm ( Fig. 8 a). For larger axons, the mean dice coefficient varied only little and was always higher than 0.88, regardless of whether false negatives were considered or not to compute the mean dice coefficient. In contrast, mean dice coefficients of smaller axons were determined by the large fraction of false negatives, indicated by the difference between gray and white bars. The number of false negatives per axon radius was mostly higher than the number of false positives, in particular for axons with < 1 μm ( Fig. 8 b). Fig. 9 shows estimates of arith (i.e., ̂ arith ) against reference values ( arith ) and denotes accuracy, bias and random error in terms of NRMSE, NMBE and NRSD as defined in Eqs. (8) to (10) . ̂ arith deviated from the line of unity, yielding an NRMSE of 19.5 % (see Fig. 9 ). NMBE and NRSD contributed with similar magnitude ( ∼ 14%) to the NRMSE (see Eq. (11) for a decomposition of NRMSE into NMBE and NRSD ).   Fig. 10 is based on repeated comparison between estimates and reference values as illustrated for arith in Fig. 9 , but shows only the above error metrics as a function of . Here, determined the scaling of the bulk of axon radii distributions  eff ( ) , interpolating between lower ( ↓ ) and upper bound ( ↑ ) scaling factors (see Eq. (15) ). Generally, NMBE varied as a function of , whereas the NRSD was less dependent on (see Fig. 10 , center and bottom row). NMBE and NRSD translated into NRMSE (see Fig. 10 , top row), yielding an overall NRMSE between 7.2 % to 8.5 % (see Fig. 10 a). 
The overall NRSD (7.1 % to 7.3 %) was predominantly determined by a contribution of large axons that was independent of the sweep variable (6.9 %) and complemented by a smaller, sweep-variable-dependent contribution of medium-sized axons (1.4 % to 2.1 %) (see Fig. 10a-c, bottom row). In contrast, the overall NMBE (-3.7 % to 4.8 %) was predominantly determined by a strongly sweep-variable-dependent contribution of medium-sized axons (-1.3 % to 6.5 %) and a smaller, sweep-variable-independent contribution of large axons (-2.9 %) (see Fig. 10a-c, center row). Small axons had a small sweep-variable-dependent NMBE (0.4 % to 0.9 %) and small errors overall, i.e., NRMSE was at most 1.1 % (see Fig. 10d).

Fig. 10. [...] (top row) accuracy as evaluated by NRMSE (see Eq. (8)); (center row) bias as evaluated by NMBE (see Eq. (9)); (bottom row) the residual standard deviation NRSD (see Eq. (10)). Each column depicts the aforementioned errors with respect to reference values r_eff due to erroneous axons in distinct axon radii ranges: (a) entire axon radii range, estimating overall errors; (b) large axon radii range (r ≥ 1.6 μm); (c) medium-sized axon radii range (0.3 μm ≤ r < 1.6 μm); (d) small axon radii range (r < 0.3 μm). Errors in (a-d) are shown as a function of a sweep variable, which determined the scaling of the bulk of the reference axon radii distributions of r_eff. These reference axon radii distributions were used to compute reference values r_eff in (a-d) and estimates of r_eff (r̂_eff,large, r̂_eff,medium and r̂_eff,small) in (b-d). Here, sweep-variable values of 0 and 1 correspond to using lower (↓) and upper (↑) bounds of the scaling factor (see Eq. (15)). Error metrics were evaluated over 18 lsLM subsections. Note that NRMSE combines NMBE and NRSD as described in Eq. (11).

Fig. 11. Sensitivity of estimates of the arithmetic mean radius r_arith and the MRI-visible effective axon radius r_eff to variation of the image intensity. Depicted are: spatially smoothed maps of estimates r̂_arith and r̂_eff (a), the lsLM image of section M1 (b) adjusted to illustrate the correlation with maps of r̂_arith (a), and scatter plots between ensemble mean axon radii (r̂_arith and r̂_eff) and lsLM image intensities (c). The correlation plots (c) pool across four sections (G1, G2, M1, M2). The p-values have been multiplied by the number of sections to correct for multiple comparisons (the reported Pearson's value is the correlation coefficient).
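The split of accuracy into bias and random error can be sketched numerically. Eqs. (8) to (11) are not reproduced in this section, so the sketch below assumes that all three metrics are normalized by the mean reference value and that NRSD uses the population standard deviation of the residuals; under these assumptions NRMSE² = NMBE² + NRSD². The function name and the radii are hypothetical and purely illustrative.

```python
import math

def error_metrics(estimates, references):
    """Return (NRMSE, NMBE, NRSD), each normalized by the mean reference value.

    Assumed normalization; the exact form of Eqs. (8)-(10) is not shown here.
    """
    n = len(references)
    ref_mean = sum(references) / n
    residuals = [e - r for e, r in zip(estimates, references)]
    mbe = sum(residuals) / n                                      # mean bias error
    rmse = math.sqrt(sum(d * d for d in residuals) / n)           # root-mean-square error
    rsd = math.sqrt(sum((d - mbe) ** 2 for d in residuals) / n)   # spread around the bias
    return rmse / ref_mean, mbe / ref_mean, rsd / ref_mean

# Hypothetical reference and estimated mean radii in micrometres.
reference = [0.50, 0.55, 0.60, 0.65]
estimate = [0.58, 0.60, 0.71, 0.70]
nrmse, nmbe, nrsd = error_metrics(estimate, reference)

# Under the stated assumptions the three metrics add in quadrature.
quadrature_gap = abs(nrmse ** 2 - (nmbe ** 2 + nrsd ** 2))
```

The same relation is consistent with the values reported for r̂_arith: an NMBE of 13.4 % and an NRSD of roughly 14 % combine to about 19.4 %, close to the reported NRMSE of 19.5 %.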

Sensitivity of r̂_arith and r̂_eff to variation of the image intensity
The spatial variation of r̂_arith resembled the image intensity distribution of the corresponding lsLM section (see Fig. 11a, top row and Fig. 11b). In contrast, maps of r̂_eff had a high local heterogeneity, which was not observed in the image intensity distribution of the corresponding lsLM section (see Fig. 11a, bottom row and Fig. 11b). These observations were supported by a strong correlation (Pearson's correlation coefficient: 0.80, p < 10⁻⁵) between r̂_arith and the image intensity, which was reduced when small axons were discarded (Pearson's correlation coefficient: 0.50, p < 10⁻⁵) (see Fig. 11c, top left and top right). In contrast, r̂_eff did not show a significant correlation with the image intensity (see Fig. 11c, bottom row).
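The correlation analysis can be illustrated with a minimal sketch: Pearson's correlation coefficient between image intensity and an ensemble mean radius, followed by the correction for multiple comparisons described in the caption of Fig. 11 (multiplying the p-value by the number of sections). The data values, the example p-value and the function names are hypothetical; the cap at 1 is a common convention and an assumption here.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correct_p_value(p_value, n_comparisons):
    """Multiply the p-value by the number of comparisons, capped at 1."""
    return min(1.0, p_value * n_comparisons)

# Hypothetical patch-wise image intensities and mean radius estimates (micrometres).
intensity = [0.2, 0.4, 0.5, 0.7, 0.9]
r_arith_hat = [0.50, 0.54, 0.57, 0.60, 0.66]

coefficient = pearson(intensity, r_arith_hat)
p_corrected = correct_p_value(1e-6, 4)   # four sections: G1, G2, M1, M2
```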

Sensitivity of r_eff to outstandingly large axons
r_eff increased nonlinearly as a function of the axon radius r, but with decreasing slope (see Fig. 12a). For large r, there is a step-wise dependence between r_eff↓ and r due to the sparse occurrence of large axons. Compared to other regions, the influence of the largest axon on r_eff↓ was particularly strong in region M2: r_eff↓ decreased by 14.3 % when the largest axon was discarded (see region M2 in Fig. 12a). The largest axon was much larger than other axons across all regions and its elongated shape suggested that this axon was oriented almost parallel to the cutting plane, i.e., its axon radius was strongly overestimated by the circular equivalent approximation (see Fig. 12b). The influence of the largest axon was smaller for the remaining regions: r_eff↓ decreased by 0.8 % to 2.9 % when discarding the largest axon (see Fig. 12a). For region S1, there is a plateau between 1.3 μm ≲ r ≲ 1.6 μm, indicating that no axons were sampled in this axon radii range. As the axon radii distributions of r_eff↓ used the bulk of the axon radii distribution (i.e., r < 1.6 μm) from matching, small-field-of-view EM subsections (see Section 2.8.2), it seems that axons with r ≳ 1.3 μm were not representatively sampled in the EM subsections of region S1.

Fig. 12. [...] The elongated shape of this axon is likely due to the axon being oriented almost parallel to the cutting plane of the two-dimensional section. When discarding this axon, r_eff↓ decreased from 3.07 μm to 2.63 μm, i.e., a decrease of 14.3 %. For the remaining regions, the decrease of r_eff↓ ranged from 0.8 % to 2.9 %. The total number of axons ranged from 3.7 ⋅ 10⁴ to 6.8 ⋅ 10⁴. r_eff↓ used the lower bound (↓) of a scaling factor, which determined the scaling of the bulk of the reference axon radii distributions of r_eff↓ (see details in Sections 2.8.2 and 2.10).
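The leverage of a single large axon can be illustrated with a small sketch. Eq. (2) is not reproduced in this section; the code assumes the commonly used wide-pulse form r_eff = (⟨r⁶⟩/⟨r²⟩)^(1/4). The ensemble below is synthetic and hypothetical, chosen only to show the direction of the effect, and does not reproduce the reported values.

```python
def r_eff_wide(radii):
    """Effective radius in the wide-pulse limit, assuming (<r^6>/<r^2>)^(1/4)."""
    m6 = sum(r ** 6 for r in radii) / len(radii)
    m2 = sum(r ** 2 for r in radii) / len(radii)
    return (m6 / m2) ** 0.25

def relative_change_without_largest(radii):
    """Relative change of r_eff when the single largest axon is discarded."""
    full = r_eff_wide(radii)
    trimmed = sorted(radii)[:-1]          # drop the largest radius
    return (r_eff_wide(trimmed) - full) / full

# Synthetic ensemble of 10,000 axons (radii in micrometres): mostly small
# axons, a few large ones and one outstandingly large axon.
radii = [0.4] * 9000 + [1.0] * 990 + [2.0] * 9 + [4.0]
change = relative_change_without_largest(radii)
```

Because the sixth moment dominates the numerator, one axon out of 10,000 can shift r_eff noticeably, which mirrors the observation that the largest axons represented only 0.001 % to 0.003 % of the ensembles.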

Discussion
We investigated the potential of CNN-based segmentation on high-resolution, large-scale light microscopy (lsLM) sections to narrow the scale gap between histological reference data and MRI voxels for the validation of diffusion MRI-based effective axon radius (r_eff) estimation in human brain tissue. The proposed pipeline accurately estimates r_eff in a human corpus callosum on sections spanning several cross-sections of typical voxels of human MRI systems (1 mm² or larger) and is thus a promising candidate for the validation of MRI-based r_eff estimation in the human brain. However, the arithmetic mean radius (r_arith), which is commonly reported in neuroanatomical studies, is less accurately estimated.

Estimation error of r_arith and r_eff
To assess the estimation error of r_eff representatively for cross-sections of MRI voxels (1 mm² or larger) of a human MRI system, sufficient sampling of the tail of the axon radii distribution is required. Therefore, we investigated large ensembles of axons representing at least 10,000 axons per sample. To address the challenge of generating reference data for r_eff on large ensembles of axons, we captured the tail (r ≥ 1.6 μm; also denoted as large axons) of the axon radii distribution by exhaustive manual annotation and complemented the tail with the bulk (r < 1.6 μm) from close-by cut, small-field-of-view EM sections. To compensate for the smaller ensemble size in EM, we rescaled the axon radii distribution according to a scaling factor. As the true scaling factor was unknown, we estimated lower and upper bounds of the scaling factor and assessed the estimation error of r_eff for scaling factors within this range to estimate an upper bound and the dynamic range of the different error metrics.
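A minimal sketch of the composite reference distribution follows, assuming that the scaling factor is a linear interpolation between its lower and upper bound (Eq. (15) is not reproduced here) and that r_eff takes the wide-pulse form (⟨r⁶⟩/⟨r²⟩)^(1/4). The sweep variable is called `alpha` here as a hypothetical name; the radii and the bounds of the scaling factor are made-up illustrative values.

```python
def interpolate_scale(alpha, scale_low, scale_high):
    """Scaling factor for a sweep value alpha in [0, 1] (assumed linear)."""
    return scale_low + alpha * (scale_high - scale_low)

def r_eff_composite(bulk_radii, tail_radii, scale):
    """Wide-pulse r_eff of a composite distribution.

    Bulk axons (from EM) are counted `scale` times to compensate for the
    smaller EM ensemble; tail axons (from lsLM) are counted once.
    """
    m6 = scale * sum(r ** 6 for r in bulk_radii) + sum(r ** 6 for r in tail_radii)
    m2 = scale * sum(r ** 2 for r in bulk_radii) + sum(r ** 2 for r in tail_radii)
    return (m6 / m2) ** 0.25   # ratio of sums equals ratio of means

# Hypothetical radii in micrometres: bulk (r < 1.6) and tail (r >= 1.6).
bulk_em = [0.2, 0.3, 0.4, 0.5, 0.8, 1.2]
tail_lslm = [1.7, 2.0, 2.6]

# Reference values at the lower bound, midpoint and upper bound of the sweep.
sweep = [
    r_eff_composite(bulk_em, tail_lslm, interpolate_scale(alpha, 5.0, 20.0))
    for alpha in (0.0, 0.5, 1.0)
]
```

In this toy setting, increasing the weight of the bulk lowers the reference r_eff, which illustrates why the bias of medium-sized axons depends on the unknown scaling factor.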
Across the entire range of axon radii, we conclude that the proposed method is better suited to estimating r_eff than r_arith due to higher accuracy (maximum normalized-root-mean-square-error: 8.5 % versus 19.5 %) and lower bias (maximum absolute normalized-mean-bias-error: 4.8 % versus 13.4 %). Assessment of individual ranges revealed that erroneous, large axons predominantly determine the estimation accuracy of r_eff, followed by medium-sized axons (0.3 μm ≤ r < 1.6 μm). A decomposition of the accuracy into bias and residual standard deviation revealed that the residual standard deviation was predominantly determined by large axons and had a small dynamic range. The bias, however, had a large dynamic range due to the scaling factor-dependent bias of medium-sized axons. Since the true scaling factor is unknown, the true bias cannot be quantified for medium-sized axons. Small axons (r < 0.3 μm) below the resolution limit of lsLM introduced only a minor overestimation, even when they were neglected altogether for estimating r_eff (see Appendix A). Thus, the potential of lsLM to sample the tail of the axon radii distribution in large fields of view outweighs its limited capability to resolve small axons for mapping r_eff.
While we assessed the presented pipeline with particular focus on the ensemble mean radii of segmented axons, i.e., r_arith and r_eff, we employed pixel-wise optimization to train the semantic segmentation model. We evaluated the commonly used dice coefficient per axon instance and found that it reflected the better suitability for estimating r_eff: larger axons were better segmented.
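For orientation, the pixel-wise metrics can be sketched with their standard definitions; Eqs. (3) to (6) are not reproduced in this section, so agreement with them is an assumption. The masks are hypothetical, flattened binary axon masks (1 = axon pixel).

```python
def pixel_metrics(predicted, truth):
    """Standard pixel-wise metrics for two flattened binary masks."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)           # true positives
    fp = sum(1 for p, t in pairs if p and not t)       # false positives
    fn = sum(1 for p, t in pairs if not p and t)       # false negatives
    tn = sum(1 for p, t in pairs if not p and not t)   # true negatives
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "precision": tp / (tp + fp),
        "recall": recall,
        "dice": 2 * tp / (2 * tp + fp + fn),
        "balanced_accuracy": 0.5 * (recall + specificity),
    }

# Hypothetical masks with more missed axon pixels (FN = 2) than spurious ones (FP = 1).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
metrics = pixel_metrics(predicted, truth)
```

Since precision and recall share the same numerator, precision exceeds recall exactly when false negatives outnumber false positives, which is the reading applied to Table 2.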

Mapping anatomy-related, spatial variation across whole sections
Toluidine blue staining introduces low-frequency variation of image intensity across lsLM sections. In a spatial correlation analysis, we identified this variation as a confounding factor for mapping r_arith but not for mapping r_eff. In the light of moderate errors, spatial variation of r_eff seems anatomy-related. As r_arith was particularly confounded when small axons were taken into account, small axons seem particularly prone to staining effects. The inaccurate resolution of small axons may explain the observed overestimation of r_arith. For r_eff, the correlation with the image intensity was hardly affected by inclusion or rejection of small axons, which underlines their minor contribution towards r_eff.

Sensitivity of r_eff to outstandingly large axons
Due to the tail-weighting of r_eff, individual, outstandingly large axons may strongly contribute towards r_eff and thus strongly decrease estimation accuracy in case of erroneous segmentation. We assessed this potential source of error by discarding the largest axon for the computation of r_eff in axon ensembles representing at least 35,000 axons.
The strongest contribution (14.3 % in region M2) of an individual axon was due to an outlier. Across the remaining regions, the contribution was smaller (0.8 % to 2.9 %), but still notable, considering that these axons represented only 0.001 % to 0.003 % of the axon ensembles. For the outlier region M2, the largest axon (r = 9.46 μm) was oriented almost parallel to the cutting plane, resulting in an elongated shape. Thus, the circular equivalent approximation may largely overestimate axon radii and bias the estimation of r_eff. To avoid such outliers, axon radii may be estimated based on the minor axes of ellipsoids fitted to the axon areas.
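The overestimation for obliquely cut axons can be made concrete with an idealized geometric sketch. It assumes a straight circular cylinder whose axis is tilted away from the section normal, so that the cross-section is an ellipse with the true radius as semi-minor axis; the circular equivalent radius is taken as sqrt(A/π). The function name and the tilt angle are hypothetical. Real axons are neither straight nor perfectly circular, so the minor axis may underestimate the radius in practice.

```python
import math

def radii_from_oblique_cut(true_radius, tilt_deg):
    """Radius approximations for an idealized cylinder cut at an angle.

    tilt_deg is the angle between the cylinder axis and the section normal
    (0 = perpendicular cut, values near 90 = axon almost in the cutting plane).
    Returns (circular equivalent radius, minor axis radius).
    """
    semi_minor = true_radius
    semi_major = true_radius / math.cos(math.radians(tilt_deg))
    area = math.pi * semi_major * semi_minor          # area of the elliptical cut
    circular_equivalent = math.sqrt(area / math.pi)   # radius of a circle of equal area
    return circular_equivalent, semi_minor

# An axon of 1 micrometre radius running at 80 degrees to the section normal.
ceq_radius, minor_radius = radii_from_oblique_cut(1.0, 80.0)
```

In this idealized case the circular equivalent radius grows as 1/sqrt(cos(tilt)), whereas the minor axis radius stays at the true value.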
The investigated lsLM subsections (area: ∼0.37 mm²) were smaller than the cross-section of a typical MRI voxel (1 mm² or larger). In the latter, we expect reduced potential of individual axons to bias r_eff due to the larger axon ensemble size.

Limitations and future directions
Although the proposed method accurately estimated r_eff for different axon radii distributions sampled across the corpus callosum, further investigation is required to assess how well the model generalizes and how well the overall method translates to other brain areas.
Recent automated methods for large-scale axon segmentation used different acquisition techniques and segmentation algorithms (Abdollahzadeh et al., 2021; Zaimi et al., 2018), which, however, were trained on perfusion-fixed brain tissue of mice or rats. The method of Abdollahzadeh et al., 2021 is tailored towards three-dimensional data and is therefore not immediately comparable to the proposed method. Although the two-dimensional method of Zaimi et al., 2018 employs a similar approach of subsequent, U-Net-based (Ronneberger et al., 2015) semantic and instance segmentation, our method differs, e.g., in details of the U-Net architecture and by employing transfer learning. In comparison, the method of Zaimi et al., 2018 yielded slightly higher metrics for segmentation of axons in human tissue transmission electron microscopy (TEM) data, e.g., a dice coefficient of 0.81 as compared to 0.77 in our lsLM-based approach, and higher metrics for other mammals in both scanning electron microscopy (SEM) and TEM data (e.g., mean dice coefficient > 0.9). However, due to the different microscopy and tissue preparation techniques, immediate comparison between these results is difficult. In a future study, a segmentation model could be trained using the framework of Zaimi et al., 2018 with the presented data to benchmark the aforementioned method against the proposed method for mapping r_eff. In the present study, the primary focus was to assess the feasibility of generating reference data for r_eff using automated axon radius segmentation on large-field-of-view lsLM sections.
Estimates of individual axon radii from two-dimensional cross-sections can be biased for axons that are non-orthogonally oriented to the cutting plane (Abdollahzadeh et al., 2019; Andersson et al., 2020; Lee et al., 2019). While approximating axon radii based on the minor axes of fitted ellipsoids may underestimate axon radii, the circular equivalent approximation may overestimate individual axon radii. This bias has been reported to be similar for individual circular equivalent and minor axis approximations in terms of absolute deviation from the along-axis median (∼10 %) in the corpus callosum of sham-operated rats (Abdollahzadeh et al., 2019). However, further investigations are required to assess how this bias translates towards estimation accuracy of r_eff. In our analyses, we identified a potential bias of r_eff based on circular equivalent radii caused even by individual axons that were oriented almost parallel to the cutting plane (see Section 3.4). For the two-dimensional reference used in this work, estimates of r_eff based on minor axis radii approximations yielded similar accuracy (maximum normalized-root-mean-square-error: 8.1 %) and bias (maximum absolute normalized-mean-bias-error: 5.2 %) as estimates of r_eff based on circular equivalent radii (see Appendix B). Two-dimensional cross-sections cannot capture along-axon variation of the axon radius, although strong along-axon variation has been reported at the level of individual axons in the corpus callosum of mice (Lee et al., 2019), rats (Abdollahzadeh et al., 2019) and monkeys (Andersson et al., 2020). However, at the ensemble level, good agreement between axon radii distributions estimated from two and three dimensions has been reported for an ensemble of 54 large axons (arithmetic mean radius: 1.35 μm) within a section of the monkey corpus callosum (Andersson et al., 2020). In fact, the aforementioned study concluded that it may be feasible to compensate for the inability to capture along-axon variation by sufficient in-plane sampling. Following this hypothesis, our proposed method may complement three-dimensional microscopy studies of small ensembles of axons with large-ensemble sampling in two dimensions.
To assess the estimation error of r_arith, we compared lsLM-based estimates against EM-based references from close-by cut sections. This choice of reference has two limitations: first, the EM-based axon ensembles were smaller, i.e., covering only 5 % to 10 % of their lsLM-based counterparts; second, spatial misalignment arose from section-to-section distance and unknown in-section location. However, the section-to-section distance may not render the choice of reference data unsuitable, given that previous studies have reported good agreement between axon radii distributions across comparable distances (Andersson et al., 2020). Furthermore, we assumed that representative estimation of the frequency-weighted r_arith is enabled by accurate resolution of frequently occurring axons rather than by a large ensemble size or exact spatial alignment. Consequently, we regarded EM as a more suitable reference than lsLM because EM can resolve all frequently occurring axons, including small axons below the resolution limit of lsLM. Indeed, we found small axons to be particularly prone to variation of the image intensity in lsLM, which in turn led to systematic overestimation of r_arith. While the residual standard deviation observed for r_arith may partially be due to the choice of unrepresentative reference data, the systematic overestimation of r_arith seems to reflect a bias of the proposed pipeline.
To assess the estimation error of r_eff, we computed reference values from composite axon radii distributions, combining the bulk of axon radii from EM-based, manual annotations with the tail from lsLM-based, manual annotations. In one particular region, the assumption that EM can accurately capture the bulk of axon radii in small fields of view seemed to be violated for larger axons of the bulk of the axon radii distribution. For this region, the reference value of r_eff may have been less accurate than in other regions.
We have limited analyses to the definition of the MRI-visible, effective radius r_eff measured with diffusion MRI in the wide-pulse limit (Burcaw et al., 2015; Sepehrband et al., 2016; Veraart et al., 2020). However, our pipeline can also be used to estimate r_eff in the short-pulse limit (Burcaw et al., 2015; Sepehrband et al., 2016) with lower accuracy (maximum normalized-root-mean-square-error: 10.6 %) and higher bias (maximum absolute normalized-mean-bias-error: 9.2 %) (see Appendix C). The lower performance for short-pulse estimates of r_eff is likely due to the fact that r_eff in the short-pulse limit is less tail-weighted than r_eff in the wide-pulse limit. Consequently, the decreased segmentation performance for axons of the bulk of the axon radii distribution becomes more relevant for r_eff in the short-pulse limit.
The manual annotation of microscopy slides is prone to errors and inter-observer variability, in particular in the presence of staining and tissue degradation due to the immersion-fixation used in this study. Employing strategies that address noisy and uncertain manual annotations, e.g., by design of specific loss functions, may improve axon segmentation accuracy (Karimi et al., 2020) and thus radius estimation accuracy.

Conclusion
The presented pipeline is a step towards mapping the MRI-visible, effective radius (r_eff) by combining high-resolution, large-scale light microscopy (lsLM) with deep learning. As the two-dimensional lsLM sections span the cross-sectional scale of typical MRI voxels (1 mm³ or larger), the proposed method may complement three-dimensional microscopy studies of small ensembles of axons with large-ensemble sampling in two dimensions. Since the pipeline is based on the fast, cheap and simple-to-perform lsLM measurement, it can easily be used beyond the realm of MRI-based radius models, e.g., to generate a representative, neuroanatomical atlas of the ensemble of large axons across the human corpus callosum. However, before this can be done, generalization to different brains has yet to be demonstrated.

Code and data availability
The source code and training data used in this study will be made publicly available upon publication of this study on https://github.com/quantitative-mri-and-in-vivo-histology/ls_axon_segmentation.

Ethics
For samples used in this study, the entire procedure of case recruitment, acquisition of the patient's personal data, the protocols and the informed consent forms, performing the autopsy and handling the autopsy material have been approved by the responsible authorities (Approval #205/17-ek).

Declaration of Competing Interest
The Max Planck Institute for Human Cognitive and Brain Sciences has an institutional research agreement with Siemens Healthcare. NW was a speaker at an event organized by Siemens Healthcare and was reimbursed for the travel expenses.

Fig. A.13. Assessment of the error of r̂_eff due to undetected small axons for one lsLM subsection. (a) The reference axon radii distribution of r_eff, combining bulk from EM (purple) and tail from lsLM (pink). (b) Axon radii distribution of (a) with small axons neglected altogether. The sweep variable determined the scaling of the bulk of the reference axon radii distribution as described in Section 2.8.2. Vertical bars (a-b) mark values of r_eff computed from the respective axon radii distributions. The ticks on the x-axes denote the two thresholds that partition the axon radii distribution into small (r < 0.3 μm), medium-sized (0.3 μm ≤ r < 1.6 μm) and large (r ≥ 1.6 μm) axons. The insets emphasize the tail of the axon radii distribution.

Fig. A.14. [...] (b) the bias as evaluated by NMBE (see Eq. (9)); (c) the residual standard deviation as evaluated by NRSD (see Eq. (10)). The axon radii distribution of r̂_eff,small,max neglected small (r < 0.3 μm) axon radii altogether, thereby simulating a potential incapability of large-scale light microscopy (lsLM) to detect small axons. All errors (a-c) are shown as a function of a sweep variable, which determined the scaling of the bulk of the reference axon radii distributions of r_eff. These reference axon radii distributions were used to compute both reference values r_eff and estimates r̂_eff,small,max. Here, sweep-variable values of 0 and 1 correspond to using lower (↓) and upper (↑) bounds of the scaling factor (see Eq. (15)). Error metrics were evaluated over 18 lsLM subsections. Note that NRMSE combines NMBE and NRSD as described in Eq. (11).

Credit authorship contribution statement
axons observed in Section 3.2 (using estimates r̂_eff,small), all errors were increased, i.e., the maximum NRMSE increased from 1.1 % to 1.8 %. However, all errors remained much smaller than corresponding errors of medium-sized (maximum NRMSE: 6.8 %) or large (maximum NRMSE: 7.5 %) axons observed in Section 3.2.

Appendix B. Error of r̂_eff for minor axis approximations of axon radii
Throughout the manuscript, we used the circular equivalent approximation for individual axon radii (see Section 2.1). Here, we assessed the error of r̂_eff with individual axon radii approximated from minor axes of ellipsoids fitted to axonal areas (short: minor axis radii).
Methods. To fit ellipsoids to axonal areas, we used an implementation (van der Walt et al., 2014) of the non-iterative least-squares approach described in Halir and Flusser, 1998. Axon radii were then computed by halving the length of minor axes of fitted ellipsoids. The assessment of the error of r̂_eff for minor axis radii was carried out analogously to the procedure described in Section 2.8.2. In particular, we used the same procedure to generate reference axon radii distributions. However, to compute r̂_eff and r_eff, we determined minor axis radii for associated axons of the reference axon radii distributions and the axon radii distribution predicted on [...]
Results. Fig. A.15 shows accuracy (NRMSE; see Eq. (8)), bias (NMBE; see Eq. (9)) and the residual standard deviation (NRSD; see Eq. (10)) of r̂_eff with respect to reference values r_eff based on minor axis approximations of individual axon radii. The NRMSE was between 5.8 % and 8.1 % (see Fig. A.15a). The NMBE had a smaller maximum absolute value and a larger dynamic range (-3.4 % to 5.2 %) than the NRSD (5.7 % to 6.2 %) (see Fig. A.15b-c). When compared to the errors for circular equivalent-based r̂_eff observed in Section 3.2, the maximum NRMSE and the maximum absolute NMBE were comparable (NRMSE: 8.1 % versus 8.5 %; NMBE: 5.2 % versus 4.8 %), whereas the maximum NRSD was slightly lower (6.0 % versus 7.3 %). However, the dynamic ranges of errors for minor axis-based r̂_eff were similar to those for circular equivalent-based r̂_eff.

Appendix C. Error of r̂_eff in the short-pulse limit
The MRI-visible, effective mean radius can be estimated from the intra-axonal signal and is determined by the pulse length of the specific sequence. Throughout the manuscript, we used the definition of the effective radius in the wide-pulse limit as defined in Eq. (2). In the short-pulse limit, the effective radius r_eff,SP can be analogously defined (Burcaw et al., 2015; Sepehrband et al., 2016). To assess the error of estimates of r_eff,SP, i.e., r̂_eff,SP, we repeated the analysis in Section 2.8.2 for r̂_eff,SP with respect to reference values r_eff,SP computed from the reference axon radii distributions of r_eff,SP.
Results. Fig. A.16 shows accuracy (NRMSE; see Eq. (8)), bias (NMBE; see Eq. (9)) and the residual standard deviation (NRSD; see Eq. (10)) of r̂_eff with respect to reference values r_eff. The NRMSE was between 5.3 % and 10.6 % (see Fig. A.16a). The NMBE had a larger absolute maximum value and a much larger dynamic range (-2.5 % to 9.2 %) than the NRSD (5.6 % to 5.9 %) (see Fig. A.16b-c). When compared to the errors for wide-pulse r̂_eff observed in Section 3.2, maximum NRMSE and maximum absolute NMBE were higher (NRMSE: 10.6 % versus 8.5 %; NMBE: 9.2 % versus 4.8 %), whereas the maximum NRSD was lower (5.9 % versus 7.3 %). Furthermore, the dynamic ranges of both NRMSE and NMBE for short-pulse estimates r̂_eff,SP were higher than those for wide-pulse estimates r̂_eff. Thus, the sweep-variable-dependent error introduced by scaling of the bulk of the reference axon radii distributions has a larger impact for short-pulse r_eff,SP due to the weaker tail-weighting of r_eff,SP.
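The weaker tail-weighting in the short-pulse limit can be illustrated with a small sketch. Neither Eq. (2) nor the short-pulse definition is reproduced in this section; the code assumes the commonly used forms r_eff = (⟨r⁶⟩/⟨r²⟩)^(1/4) for the wide-pulse limit and r_eff,SP = (⟨r⁴⟩/⟨r²⟩)^(1/2) for the short-pulse limit. The axon ensemble and the function names are hypothetical.

```python
def moment(radii, order):
    """Raw moment <r^order> of an axon radii ensemble."""
    return sum(r ** order for r in radii) / len(radii)

def r_eff_wide(radii):
    """Assumed wide-pulse form: (<r^6>/<r^2>)^(1/4)."""
    return (moment(radii, 6) / moment(radii, 2)) ** 0.25

def r_eff_short(radii):
    """Assumed short-pulse form: (<r^4>/<r^2>)^(1/2)."""
    return (moment(radii, 4) / moment(radii, 2)) ** 0.5

# Hypothetical ensemble (micrometres): many small, fewer medium, few large axons.
radii = [0.3] * 700 + [0.8] * 250 + [2.0] * 50
bulk_only = [r for r in radii if r < 1.6]

r_arith = moment(radii, 1)
wide_all, short_all = r_eff_wide(radii), r_eff_short(radii)

# Fraction of each metric that remains when the tail is removed; a larger
# fraction means the bulk matters more for that metric.
wide_bulk_share = r_eff_wide(bulk_only) / wide_all
short_bulk_share = r_eff_short(bulk_only) / short_all
```

For any ensemble these assumed definitions satisfy r_arith ≤ r_eff,SP ≤ r_eff, so the short-pulse value sits closer to the bulk of the distribution, which is consistent with bulk-related errors weighing more heavily in the short-pulse limit.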

Supplementary material
Supplementary material associated with this article can be found, in the online version, at 10.1016/j.neuroimage.2022.118906.

Fig. A.15. [...] (b) the bias as evaluated by NMBE (see Eq. (9)); (c) the residual standard deviation as evaluated by NRSD (see Eq. (10)). Both r̂_eff and r_eff were estimated using minor axis approximations of individual axon radii. All errors (a-c) are shown as a function of a sweep variable, which determined the scaling of the bulk of the reference axon radii distributions of r_eff. These reference axon radii distributions were used to compute reference values r_eff. Here, sweep-variable values of 0 and 1 correspond to using lower (↓) and upper (↑) bounds of the scaling factor (see Eq. (15)). Error metrics were evaluated over 18 large-scale light microscopy (lsLM) subsections. Note that NRMSE combines NMBE and NRSD as described in Eq. (11).

Fig. A.16. Error of estimated effective axon radii in the short-pulse limit. Depicted are three different error metrics of estimates r̂_eff,SP of the MRI-visible, effective axon radius in the short-pulse limit r_eff,SP with respect to reference values r_eff,SP: (a) the accuracy as evaluated by NRMSE (see Eq. (8)); (b) the bias as evaluated by NMBE (see Eq. (9)); (c) the residual standard deviation as evaluated by NRSD (see Eq. (10)). All errors (a-c) are shown as a function of a sweep variable, which determined the scaling of the bulk of the reference axon radii distributions of r_eff,SP. These reference axon radii distributions were used to compute reference values r_eff,SP. Here, sweep-variable values of 0 and 1 correspond to using lower (↓) and upper (↑) bounds of the scaling factor (see Eq. (15)). Error metrics were evaluated over 18 large-scale light microscopy (lsLM) subsections. Note that NRMSE combines NMBE and NRSD as described in Eq. (11).