Methods for quantitative susceptibility and R2 ∗ mapping in whole post-mortem brains at 7T applied to amyotrophic lateral sclerosis

Susceptibility weighted magnetic resonance imaging (MRI) is sensitive to the local concentration of iron and myelin. Here, we describe a robust image processing pipeline for quantitative susceptibility mapping (QSM) and R2 ∗ mapping of ﬁxed post-mortem, whole-brain data. Using this pipeline, we compare the resulting quantitative maps in brains from patients with amyotrophic lateral sclerosis (ALS) and controls, with validation against iron and myelin histology. Twelve post-mortem brains were scanned with a multi-echo gradient echo sequence at 7T, from which suscepti- bility and R2 ∗ maps were generated. Semi-quantitative histological analysis for ferritin (the principal iron storage protein) and myelin proteolipid protein was performed in the primary motor, anterior cingulate and visual cor- tices. Magnetic susceptibility and R2 ∗ values in primary motor cortex were higher in ALS compared to control brains. Magnetic susceptibility and R2 ∗ showed positive correlations with both myelin and ferritin estimates from histology. Four out of nine ALS brains exhibited clearly visible hyperintense susceptibility and R2 ∗ values in the primary motor cortex. Our results demonstrate the potential for MRI-histology studies in whole, ﬁxed post-mortem brains to investigate the biophysical source of susceptibility weighted MRI signals in neurodegenerative diseases like ALS.


Introduction
Magnetic resonance imaging (MRI) is able to provide a range of imaging contrasts sensitive to different properties of the tissue environment, and is thus a promising tool to study neurodegenerative diseases. However, even quantitative MRI methods are not intrinsically biologically specific, and interpretation of results requires comparisons against a more specific reference such as histology. In this study, we consider the use of susceptibility weighted MRI measures that are purported markers of iron and myelin in the context of a specific neurodegenerative disease, amyotrophic lateral sclerosis (ALS). In particular, we present methods for quantifying magnetic susceptibility and R2 * in whole, fixed postmortem brains, with direct comparisons to histology in the same tissue.
Susceptibility weighted MRI is sensitive to the para-and diamagnetic properties of tissue ( Durrant et al., 2003 ). This sensitivity forms the foundation of contrast in images derived from both QSM and R2 * mapping ( Haacke et al., 2015 ;Duyn, 2013 ;Wang and Liu, 2015 ;Deistung et al., 2013 ). In the brain, these maps have been shown to relate to the concentration of iron Langkammer et al., 2010 ) and myelin  in tissue, which have para-and diamagnetic properties respectively relative to water. A fundamental difference between QSM and R2 * mapping is that para-and diamagnetic inclusions have opposite effects on magnetic susceptibility ( ) estimates obtained from QSM, but the same effect on R2 * rate. In neurodegenerative diseases, changes in tissue myelin and iron content may co-occur. As myelin and iron have opposing susceptibility signs (relative to water), the combination of QSM and R2 * mapping has the potential to shed light on the overall pathology, provided their histopathological correlates can be clearly defined. For example, demyelination and accumulation of iron both lead to a positive shift in , but an opposing effect on R2 * (decrease in R2 * due to demyelination and increase in R2 * due to iron deposition).
To date, several studies have reported the correlation of R2 * and susceptibility with brain iron and myelin content using either whole, unfixed post-mortem brains (in situ) or small, fixed brain tissue samples Langkammer et al., 2010 ;Zheng et al., 2012 ;Zheng et al., 2013 ;Sun et al., 2015 ;Hametner et al., 2018 ); however, no studies have applied QSM in whole, fixed post-mortem brains. Whole, fixed brains present additional challenges including the presence of air bubbles, which induce rapid focal field variations, and the influence of brain shape on the macroscopic magnetic field homogeneity, which can make shimming difficult. To directly compare whole-brain QSM to histology in the same tissue, we therefore need to develop bespoke imaging approaches and processing pipelines for whole, fixed post-mortem brains.
Here, we demonstrate this approach in an exemplar neurodegenerative disease. ALS is a neurodegenerative disease of the motor system, characterized by rapidly progressive and irreversible loss of motor function, ultimately leading to death ( Kiernan et al., 2011 ). Similar to other neurodegenerative diseases, ALS pathology is believed to follow a standard progression within the cerebral cortex that starts in the primary motor cortex (M1) ( Brettschneider et al., 2013 ;Tan et al., 2015 ) and gradually progresses into non-motor cortices ( Brettschneider et al., 2013 ;Tan et al., 2015 ). In end-stage disease (corresponding to postmortem brains), M1 would be expected to exhibit substantial pathology, anterior cingulate cortex (ACC) to exhibit intermediate pathology, and secondary visual cortex (V2) to be largely unaffected ( Tan et al., 2015 ). Therefore, development of MRI biomarkers related to these regions would be helpful for studying disease mechanisms; for tracking disease progression and response to therapeutics; and potentially for early diagnosis. A study comparing R2 * mapping with iron histology in a single ALS brain demonstrated elevated R2 * in the deep layers of M1 that corresponded to iron accumulation ( Kwan et al., 2012 ). Recent in vivo QSM studies of ALS have shown elevated susceptibility in the motor cortex and a range of subcortical nuclei ( Schweitzer et al., 2015 ;Costagli et al., 2016 ;Acosta-Cabronero et al., 2018 ), hypothesised to be driven by an increased concentration of iron.
We present a pipeline for comparing and R2 * to semi-quantitative histology in whole, fixed post-mortem brains. Our motivation for scanning whole brains, as opposed to small tissue samples, is the ability to study the relationship between MRI and histopathology across different brain regions that are affected at different disease stages ( Pallebage-Gamarallage et al., 2018 ). Susceptibility weighted MR images were acquired using a multi-echo gradient echo (GRE) acquisition in whole, fixed post-mortem brains from nine ALS patients and three control subjects. These data were subsequently processed using a custom analysis pipeline to generate quantitative susceptibility and R2 * maps. Brains were then sectioned and stained using immunohistochemistry targeting myelin proteolipid protein (PLP) and ferritin in three brain regions with a different pathological stage profile (M1, ACC and V2). Our results provide proof-of-principle for using MRI-histology comparisons to study the histopathological correlates of neurodegeneration at the whole brain level.

Post-mortem brain samples
Post-mortem brains were obtained from the Oxford Brain Bank, University of Oxford, under its generic Research Ethics Committee approval (15/SC/0639). In total, twelve whole, post-mortem formalin fixed brains were scanned with our multi-echo GRE acquisition. The cohort consisted of nine donors with a clinical diagnosis of ALS, and three control donors with no known neurological disorder. Diagnosis was confirmed by a neuropathologist. Patient demographics, post-mortem information, and relevant tissue handline properties are listed in Table 1 .

Sample preparation and MRI data acquisition
Brains were imaged in a 7T whole-body human scanner (Siemens Healthcare, Erlangen, Germany) using a 1Tx/32Rx head coil and a multi-echo GRE sequence. To prepare each brain for scanning, the brain was first removed from storage in formalin. Excess formalin remaining within the ventricles was removed by mildly compressing the brain whilst being held with the cerebellum pointing downwards. Excess formalin on the brain surface was removed using paper towels. The brain was then placed in a plastic bag filled with a susceptibility-matched perfluorocarbon liquid (3M TM Fluorinert TM FC-3283) and gently compressed/rotated to fill the ventricles with Fluorinert and remove any excess air-bubbles. This process also removed any remaining formalin within the ventricles, which has a lower density than Fluorinert (and hence floats to the top of the bag). This process continued until no new air-bubbles/formalin were observed. The brain was subsequently transferred and sealed in a 3D printed brain-shaped container (Fig. S1 in the Supplementary material), keeping the fourth ventricle pointing upwards to prevent loss of Fluorinert during the transition. The 3D printed shell was subsequently filled with Fluorinert and gently manipulated to remove any new air bubbles that may have formed during the transfer. This brain-shaped container was then placed in an external brain holder with a spherical chamber (Fig. S1 in the Supplementary material) that was again filled with Fluorinert. This exterior holder achieves two aims: first, ensuring that brains are consistently positioned in the main magnetic field; and second, providing a spherical overall shape that minimizes field distortions in the brain. Both of these aims will benefit reproducible and high-quality susceptibility weighted data (details are shown in Figure S2 in the Supplementary material).
Susceptibility weighted data were collected using a 3D multi-echo GRE sequence with following parameters: TEs = 2,8.6,15.2,21.8,28.4,35 ms with monopolar readout and non-selective RF pulse, TR = 38 ms, flip angle = 15°, bandwidth = 650 Hz/pixel, GRAPPA factor = 2 and inplane resolution = 0.5 × 0.5 mm 2 . Over the course of our experiment, changes were made to the slice thickness, Cases 1-3/4-7 were scanned with slice thickness = 1.3/1.2 mm and matrix size = 384 × 310 × 104, total scan time = 11 min. Cases 8/9-12 were scanned with slice thickness = 0.55/0.6 mm and matrix size = 384 × 310 × 208, total scan time = 22 min. GRE acquisitions were repeated four times for all brains and raw, multi-channel k-space data were collected. To perform comparative analyses at a similar voxel size, the raw k-space data from Cases 8-12 were subsequently truncated along the slice direction to downsample the data to closely match the slice thickness of Cases 1-7 (1.1 mm for Case 8 and 1.2 mm for Case 9-12). As part of a larger study, structural, relaxometry and diffusion data were also acquired ( Pallebage-Gamarallage et al., 2018 ).

. Image reconstruction and alignment
Open-ended fringe lines (non-physical phase singularities) were observed in the manufacturer-supplied reconstruction, which would severely corrupt the QSM processing. These fringe lines were observed also in single RF channels in the region of highly focal signal loss induced by the coil sensitivity profiles. We therefore developed a custom reconstruction pipeline for the raw k-space data, which is described in detail below. Briefly, the stages are: (i) perform custom GRAPPA reconstruction; (ii) remove the coil-specific phase offset; (iii) combine phase images across coils; and (iv) align repeat scans. This reconstruction produces a phase image for each echo that is free of open-ended fringe lines.
The phase offset for each channel, 0 ( ⃗ ) , was removed by subtracting the phase of the first short TE echo ( TE 1 ) ( 1 = 2 ) from all subsequent echoes ( Robinson et al., 2017 ;Bernstein et al., 1994 ). This produces five phase images Δ ( TE n ′ ) for each channel with equivalent echo times ′ = − 1 (specifically, 6.6, 13.2, 19.8, 26.4 and 33 ms). Crucially, this step also removes the time-independent phase offset, which was also present in TE 1 . For each equivalent echo time, a coil-combined phase image was generated using a complex sum over all the coils. Magnitude images were combined using a simple "sum-ofsquares " method to produce the magnitude data of all 6 echoes.
However, before we could perform this correction to remove 0 ( ⃗ ) , we had to remove an observed image misalignment between different echoes along the phase encoding direction. The amount of misalignment increased with longer echo times within each repeat (the largest misalignment was found between the first and sixth echo in Case 12, details are shown in Figure S3 in the Supplementary material). This misalignment was hypothesised to be induced by B 0 drift caused by temperature changes of the passive shim material ( Foerster et al., 2005 ), as has been reported previously for long-duration post-mortem scans ( Miller et al., 2011 ). Before removing the coil-specific phase offsets, we therefore aligned the first echo to each subsequent echo. By aligning the first echo, which is only used for the 0 ( ⃗ ) correction, we avoid interpolating the later echoes that are passed into subsequent stages of the pipeline. This linear image registration was constrained to translation along the phase encoding axis using FLIRT (FMRIB's Linear Image Registration Tool) with a custom schedule file ( Jenkinson et al., 2002 ;Jenkinson and Smith, 2001 ;Greve and Fischl, 2009 ). This was done separately for the four repeats.
Finally, image misalignments between different repeats were observed, again consistent with B 0 drift caused by temperature changes of the passive shim material, details are described in Section 2.3.3 . Thus, another registration step was performed to align all coil-combined phase images Δ ( TE ′ ) across repeats. This alignment used FLIRT with a specific schedule file constrained to translations along x, y and z axes ( Jenkinson et al., 2002 ;Jenkinson and Smith, 2001 ;Greve and Fischl, 2009 ). This alignment used the second echo from repeat 1 as the reference, and registered all other echoes from the 4 repeats to this reference.

Phase unwrapping and field map estimation
Δ ( ⃗ ) was estimated using a voxel-wise non-linear fit of the complex data, ( ⃗ , ′ ) , over the co-registered coil-combined echoes ( Liu et al., 2013 ;Kressler et al., 2009 ) by minimizing: where is the gyromagnetic ratio and Δ ( ⃗ ) is the z-component of the magnetic field perturbation. The cost function is evaluated only for voxels in a mask that was generated by thresholding the magnitude image from the first echo. Assuming linear phase evolution, 0 will have been removed by our coil combination. We include 0 ( ⃗ ) in the Δ ( ⃗ ) fitting process as an indication of the goodness of the fit. Small remaining estimates of 0 may correspond to a slightly non-linear phase evolution due to tissue microstructure. However, any large, focal estimates of 0 would be indicative of noisy voxels with unreliable phase ( Schweser et al., 2011 ), which should not be included in the later steps of QSM processing. Fitting was performed using the lsqnonlin function in MATLAB (R2016a). Fitting was initialized with estimates of 0 ( ⃗ ) and Δ ( ⃗ ) obtained from the first two echoes of our coil combined data. To achieve this, a phase difference map between the first two echoes was generated: and spatially unwrapped using PRELUDE (Phase Region Expanding Labeller for Unwrapping Discrete Estimates) ( Jenkinson, 2003 ). The phase map at the first echo was also spatially unwrapped and our initializations were estimated as follows: This initialization offers some advantages. Firstly, without initialization of the fitting, phase wraps may remain in the estimated Δ ( ⃗ ) map due to rapid temporal phase evolution in tissue that exceeds the Nyquist limit. By providing an initialization consisting of a Δ ( ⃗ ) map on the order of the true Δ ( ⃗ ) , we can produce wrap-free Δ ( ⃗ ) maps that are estimated from all the echoes in our offset-corrected phase data. Secondly, spatial phase unwrapping is only performed on the first echo and the phase difference between the first two echoes to generate 0 ( ⃗ ) and Δ ( ⃗ ) . These images will contain few phase wraps compared to the phase images acquired at higher echo times, enabling robust unwrapping performance and initialization estimates with minimal artefacts. If phase evolution was purely linear, these difference maps would be flat and close to zero. In practice, these phase difference maps consistently exhibit a linear phase gradient along the readout direction (a-c). A 1D linear phase was estimated and removed along the readout direction using the methodology described in ( Tendler and Bowtell, 2019 ), resulting in much flatter phase difference maps (d-f).
As shown in Figure S4 in the Supplementary material, voxels with very large and focal 0 ( ⃗ ) values occurred at the edges of the brain and were found to be due to the presence of vessels and air bubbles. These voxels are sources of artefacts of no interest in QSM. Therefore, the brain mask was refined by thresholding this 0 ( ⃗ ) map to remove these focal field outliers.
When comparing the predicted phase evolution from ′ 1 to the observed phase evolution in later echoes, we consistently observed a linear phase gradient along the readout direction ( Fig. 1 , similar to the effect found by Tendler et al. ( Tendler and Bowtell, 2019 )), which was hypothesised to be due to eddy currents or an accumulative mistiming of the centre of readout gradient (resulting a shift of k-space centre). These phase gradients would lead to apparent gradients in field offset along the readout direction and need to be removed. These shifts were estimated by a 1D linear fit along the readout direction over the phase increments of Δ ( ′ 2 ) to Δ ( ′ 5 ) with reference to Δ ( ′ 1 ) , and subsequently removed from Δ ( ′ 2 ) to Δ ( ′ 5 ) .

Averaging, background field removal and dipole inversion
Four repeats of our GRE sequence were obtained per post-mortem brain sample (except Case 2, where only data from two repeats were available). Prior to each repeat, the scanner performs a whole-brain center frequency calibration, such that the phase over the whole brain should average to zero for each echo. Investigation of this mean temporal phase evolution over the four repeats instead revealed a non-zero, variable slope of phase evolution that consistently increased from the first to last repeat (quantified as the mean frequency offset Δf of the Δ ( ⃗ ) map, Fig. 2 A). This is likely due to B 0 drift induced by temperature drift of the passive shim material inside the scanner bore during acquisition ( Miller et al., 2011 ). These GRE data were obtained as part of a larger study ( Pallebage-Gamarallage et al., 2018 ) acquiring multi-modal MRI data within each brain. Over the course of this acquisition, the gradient duty cycle ranges from an intensive 24-hour diffusion-weighted protocol to a moderate T1/T2 weighted protocol, making it likely that the scanner temperature was unstable during the GRE acquisition. Even with an accurate frequency calibration at the beginning of each GRE repeat, this could lead to frequency drift during each repeat. In Fig. 2 B, these same results are plotted as the change in Δf between adjacent repeats (red boxes). Comparing this to the translations along the phase encoding direction between repeats (blue circles), we see a very similar pattern. These results would be consistent with the hypothesis that both the between-repeat image misalignments and mean Δf differences are caused by B 0 drift induced by temperature changes of the scanner's passive shim material. If this is true, these results indicate that scanner  frequency evolution ( Δ = Δ ) over the whole brain for each of the 4 repeats (only two repeats were available for Case 2, which is not presented here). The dashed horizontal line corresponds to the Δf value that accumulates phase of with echo spacing of 6.6 ms ( 2 × 6 . 6 ×10 −3 = 75 . 76 rad ∕s ), which leads to aliasing when Δf is above this value (indicated by the light green dots). This was observed in Cases 1, 5 and 12. (B) Blue circles: Translations along the phase encoding direction needed to align echo six of repeats 2-4 to echo two of repeat 1. Red squares: Increments of mean Δf over the whole brain between the labelled repeat and the preceding repeat. The correspondence of patterns between blue and red squares is consistent with our hypothesis that both effects are driven by a common source, hypothesised to be due to B 0 drift induced by temperature change of the passive shim material.(For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) temperature stabilized during the course of GRE acquisitions, demonstrated by the decreasing change in Δf and decreasing translations over the course of multiple repeats.
Averaging the phase data from 4 repeats prior to fitting for Δ ( ⃗ ) would require a correction for this B 0 drift to avoid phase cancellation. Instead, Δ ( ⃗ ) maps from 4 repeats were estimated and averaged to obtain the final Δ ( ⃗ ) map. We found the latter approach to be more robust than the former to certain artefacts in the data. In particular, one brain showed evidence of Fluorinert leakage during the course of the scan, leading to local changes in Δ ( ⃗ ) due to the difference in susceptibility between Fluorinert and air.
The experimental setup developed here differs from in vivo imaging in several important ways of relevance to QSM processing and algorithms that are optimized for in vivo imaging may not be appropriate post-mortem. Hence, an evaluation of the robustness of different algorithms for background field removal and dipole inversion in post-mortem data processing was performed. Three existing algorithms for background field removal were evaluated: variable kernel SHARP (v-SHARP) ( Schweser et al., 2011 ;Li et al., 2011 ), projection onto dipole fields (PDF)  and Laplacian boundary value (LBV) ( Zhou et al., 2014 ). We also evaluated three algorithms for dipole inversion: thresholded k-space division (TKD) ( Haacke et al., 2010 ), morphology enabled dipole inversion (MEDI) ( Liu et al., 2012 ) and streaking artifact reduction for QSM (STAR-QSM) ( Wei et al., 2015 ). Representative susceptibility maps from these comparisons are shown in Fig. 3 . Overall, we found that the combination of v-SHARP and STAR-QSM was the most robust for our given post-mortem datasets, yielding susceptibility maps which did not contain large scale inhomogeneities, had few steaking artefacts and did not appear over-smoothed.
Thus, background fields in the estimated Δ ( ⃗ ) maps were removed using the v-SHARP algorithm ( Schweser et al., 2011 ;Li et al., 2011 ) with our refined mask, a maximal kernel size of 12mm and regularisation parameter of 0.02mm − 1 . Finally, the susceptibility maps were generated using the STAR-QSM algorithm ( Wei et al., 2015 ).
In QSM, the centre of k-space of the estimated susceptibility map is undefined ( 3 ), which makes it a measure of tissue's relative susceptibility, an internal reference region is usually used, thus, the measured susceptibility values were referenced to the whole brain by setting the mean susceptibility value across the whole brain to zero. Fig. 3. Susceptibility maps from combinations of three background field removal methods and three dipole inversion methods. Red arrows indicate streaking artefacts or remaining large scale inhomogeneities on other susceptibility maps compared to maps generated using STAR-QSM and v-SHARP. Susceptibility maps generated with MEDI generally showed over-smoothed contrast, for instance, there was less contrast within white matter regions but very sharp edges at the grey and white matter boundary. Note implementations of PDF, LBV and MEDI were from the MEDI toolbox and the implementation of STAR-QSM was from the STI suite.(For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

R2 * post-processing pipeline
This same multi-echo GRE scan was used to generate R2 * estimates from the magnitude images. R2 * maps were estimated using voxel-wise fitting of the magnitude images of all 6 echoes by minimizing: where ( , ⃗ ) is the experimental magnitude data, 0 ( ⃗ ) is the extrapolated signal at TE = 0 ms, and ( ⃗ ) represents the noise floor.
As described above, B 0 drift induced effects were observed and corrected during the processing of phase data. This is likely due to temperature changes of the passive shim material during scanning, which followed a long diffusion protocol with significantly higher gradient duty cycle than the GRE scans. Previous work has reported a linear dependence of T2 * measurements on sample temperature ( He et al., 2009 ). To test for such effects in our data, the T2 * maps from individual repeats of each brain were estimated and the mean T2 * values over whole brain were calculated. Our results found a small change (0.41% ± 0.29%) in the mean T2 * values from repeats 4 relative to repeat 1 over the 11 brains (changes in ms for each brain and repeat are shown in Figure  S5 in the Supplementary material). As the mean T2 * value drifts over 4 repeats were small, no correction of the temperature effect was performed. The final R2 * maps were generated using averaged magnitude images over the four repeats for each brain.
The processing pipeline for generating susceptibility and R2 * maps is displayed in Fig. 4 .

MR image analysis
Three regions of interest (ROIs) were analyzed ( Fig. 5 ) in the MRI data. These regions were chosen to represent different stages of ALS: M1 (Stage I), ACC (Stage II and III) and V2 (no pathology, as internal control). M1 was further separated into face, hand and leg areas. Masks of M1 regions were created in MNI space and subsequently registered into individual diffusion space and manually refined in each hemisphere by an experienced researcher familiar with M1 anatomy (R.M.) using the mean diffusivity maps from the diffusion acquisition as a reference. ROI masks were subsequently registered to the GRE data using FLIRT ( Jenkinson et al., 2002 ;Jenkinson and Smith, 2001 ;Greve and Fischl, 2009 ). Masks of ACC and V2 were hand-drawn based on the location of the tissue block stained for histology. Briefly, the histological stained slides were carefully mapped to the corresponding location in the 3D structural images by visually comparing photographs of the coronal sampled brain slices to the MR image. Masks of ACC and V2 were manually drawn in the structural space to match the cortical region on the stained slides (described in Section 2.6 ), and finally registered to GRE data using FLIRT ( Jenkinson et al., 2002 ;Jenkinson and Smith, 2001 ;Greve and Fischl, 2009 ).
Due to a change in brain bank protocols, two types of fixative (10% neutral buffered formalin (NBF) and 10% formalin) were used for fixation of the brains. The effects of the two types of fixative on R2 * and susceptibility values were analysed. As shown in Fig. 6 A and 6 B, differences were found between two fixative groups in the mean R2 * value over the whole brain (29.7% difference between the group means) and the M1 region (29.4% difference between the group means). Fixation effect (characterized as an indicator variable) was regressed out from mean R2 * value over the whole brain. As the mean susceptibility value across the whole brain was set to zero for each brain, only the group Fig. 4. Illustration of the post-processing pipeline. Green arrows indicate the pre-processing steps including coil combination and mask generation, red arrows indicate the R2 * mapping processing steps, and blue arrows indicate the QSM processing steps.(For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) difference in averaged susceptibility value in the M1 region between the two fixative groups was investigated ( Fig. 6 C) and the group difference was found to be not significant (7.5% difference between the group means).
Effects of post-mortem delay and time in fixative before scanning on the susceptibility and R2 * maps were also investigated: no significant associations were found.
Although care was taken to detect and remove the voxels corresponding to air bubbles as described above, some large positive or negative susceptibility values induced by remaining air bubbles still existed in the final susceptibility maps. An outlier detection process was performed on the susceptibility and R2 * values within our anatomically defined ROIs prior to measuring the mean value. Outliers were defined as values more than three scaled median absolute deviations away from the median. These outliers were excluded from the ROI analysis. Following outlier rejection, mean susceptibility and R2 * values within each ROI mask were calculated.

Histological analysis
Following MRI, the brains were systematically sampled and ROIs were extracted as described by Pallebage-Gamarallage et al. ( Pallebage-Gamarallage et al., 2018 ). Briefly, the M1 leg region was represented on the medial surface of the paracentral lobule at the banks of the interhemispheric fissure. The M1 hand area (hand knob) was recognised by the inverted omega on the precentral gyrus at the anterior bank of the central sulcus. Similarly, the M1 face region was identified on the precentral gyrus, at the anterior bank of the central sulcus, lateral to the intersection of inferior frontal sulcus and precentral sulcus. Subsequently, the brains were sliced in a coronal plane for the ACC and V2 to be sampled. The ACC was extracted from the coronal slice represented with the striatum (caudate and putamen) and the anterior limb of the internal capsule rostral to the anterior commissure and posterior to the head of the caudate nucleus. V2 was sampled adjacent to the primary visual cortex (V1), identified by the stria of Gennari at the banks of the calcarine  6. Comparison of the effects of the two types of fixative on susceptibility and R2 * maps. Mean R2 * values over the whole brain is plotted for the two types of fixative (A). Mean R2 * (B) and susceptibility (C) values in the whole primary motor cortex (M1) region are plotted for the two types of fixative. Mean R2 * values in the 10% neutral buffered formalin (NBF) group were significantly lower compared to the 10% formalin group over whole brain ( p = 0.0031, Hedges's g = − 3.287) and in M1 ( p = 0.0004, Hedges's g = − 3.059), whereas no significant difference was found for the mean susceptibility values in two fixative groups ( p = 0.58, Hedges's g = 0.286) in M1. * * p < 0.01; * * * p < 0.001 fissure. Sampled tissues were processed for paraffin embedding and 6 m thick paraffin sections were cut and mounted on 75 × 26 mm slides for immunohistochemistry. Sections were deparaffinised in xylene, rehydrated in graded ethanol and water. Endogenous peroxidase activity was blocked by 3% H 2 O 2 (in phosphate buffered saline) for 30 min prior to carrying out microwave antigen retrieval in citrate buffer. Sections were incubated with primary antibodies against ferritin (1:3000, Sigma F5012) and PLP (1:1000, Bio-Rad MCA839G) for 1 h. Proteins were vi-sualised with DAKO EnVision Detection Systems where secondary antibody HRP (horseradish peroxidase) rabbit/mouse serum was applied for 1 h followed by 3,3'-diaminobenzidine (DAB) chromogen. Sections were counterstained with haematoxylin, dehydrated with graded ethanol, cleared in xylene and mounted with DPX mounting medium.
Whole slides were digitised at x20 objective magnification with Aperio ScanScope®AT Turbo (Leica Biosystems) high throughput slide scanner. Previously validated ( Pallebage-Gamarallage et al., 2018 ) preset Aperio Colour Deconvolution algorithm thresholds (version 9.1, Leica Biosystems) unique to each stain were applied to quantify the stained area fraction (SAF). The SAF is the ratio of positively stained area normalised by the total area within a region of interest (ROI). Cortical ROIs were defined by an experienced histologist based on histological and neuroanatomical landmarks on each slide (as described previously in Section 2.6 ). Example annotations of cortical ROIs on histology slides are shown in Figure S6 in the Supplementary material. Representative digital histology and markup images of ferritin and PLP stains are shown in Figure S7 in the Supplementary material. The SAF calculations were validated by a neuropathologist. Note that the measured SAF values have no units.
Ferritin staining was performed in the left hemisphere for M1 (leg and face areas, in two separate experiments), ACC and V2. In order to compare across regions in light of high cross-batch variability in ferritin staining, we conducted ferritin staining in two batches containing all samples, with each batch including one M1 region, ACC and V2. As a result of this cross-batch variability, the ferritin SAF calculated from two batches had substantial differences in the range of SAF values. To combine the ferritin results across experiments, SAFs were standardized within each batch (demean then divide by the standard deviation) prior to combined analysis of both batches. Thus, ferritin SAFs reported have zero mean, and therefore will include negative "fractions ". PLP staining was also performed in both hemispheres for M1 (leg, face and hand areas), ACC and V2. No significant effects were found when comparing fixative type, postmortem delay, or time in fixative before scanning for the histology data.

Statistical analysis
All statistical analyses were performed using SPSS version 25.0 (IBM, Armonk, NY). Pearson correlation analyses were used to assess (i) the relation between susceptibility and R2 * values, (ii) the relation between ferritin and PLP, and (iii) the relation between pairs of MRI and histological stains.
Welch's t-tests were used to compare (i) the group differences (ALS vs controls) in mean susceptibility and R2 * in the M1 and (ii) the group differences (ALS vs controls) in ferritin and PLP stains in available M1 areas. Hedges's g was calculated for each t-test as effect size, which has a similar interpretation to Cohen's d but is generally considered more accurate for small sample sizes.
All p-values presented are for two-tailed tests. A p-value less than 0.05 was considered significant. Fig. 7 depicts the four ALS cases that exhibited visually apparent hyperintensities in susceptibility and R2 * within the cortical grey matter of M1 compared to the surrounding tissue, including the adjacent primary sensory cortex. For example, this is clearly visible around the 'hand knob' in Case 6 and 9, but also extending towards face and leg areas. The remaining brains did not exhibit obvious visible hyperintensities in M1. Susceptibility and R2 * maps from all brains are shown in Figure S8 in the Supplementary material and are available at https://doi.org/10.5281/zenodo.3925203 . ALS brains were found to have greater mean susceptibility and R2 * values in M1 when compared to controls ( Fig. 8 ).

Susceptibility and R2 * maps
The mean susceptibility and R2 * values were compared across all regions (M1, ACC and V2) and showed a strong positive linear correlation with a slope of 0.93 ppb − 1 s − 1 ( Fig. 9 ). In comparison, a previous in vivo study by Deistung et al. ( Deistung et al., 2013 ) found no significant correlation between susceptibility and R2 * values for various cortical grey matter structures including motor, sensory, occipital, prefrontal and temporal cortices. The linear correlation presented here is largely driven by the regional differences between the three regions that were found to have different levels of ferritin and myelin content based on histological results. The observed consistency between the susceptibility and R2 * values reflects the common factors driving both QSM and R2 * mapping, which relate to relative susceptibility offsets in different tissue compartments.

Histological results and their relation to MRI-derived measures
As shown in Fig. 10  to control brains, although the group differences were not significant. Both susceptibility and R2 * showed positive correlations with ferritin and PLP, when the leg and face area of M1, ACC and V2 regions were analysed together ( Figs. 11 and 12 ). As both ferritin and PLP contribute to estimated and R2 * values, and a positive correlation between ferritin and PLP SAFs was found (r = 0.35, p = 0.011), partial correlation (PLP was regressed out when correlating /R2 * with ferritin, and vice versa) results are also reported.

Discussion
Acquisition of high-quality quantitative susceptibility mapping in whole, post-mortem brains has both challenges and opportunities that are not present in vivo. Opportunities include long scan times to in- crease SNR and spatial resolution, as well as the ability to compare against histology drawn from a broad range of brain regions. The proposed experimental setup and image processing pipeline for quantitative susceptibility and R2 * mapping were developed to address the specific challenges associated with post-mortem acquisition. First, the absence of skull will in general alter the macroscopic background field compared to in vivo. By embedding the brain within a large sphere, we can improve the overall shim quality. Second, reliable positioning of the brain is important for achieving consistent QSM results, particularly in the white matter where susceptibility anisotropy causes the estimated susceptibility value to depend on orientation to the B0 field. This challenge was overcome through the use of an internal brain-shaped holder that could be reliably positioned within the outer sphere. Third, the presence   of air bubbles causes severe signal dropout and phase aliasing, while small air bubbles induce strong dipole-shaped fields around the region that may lead to streaking artefacts during the dipole inversion step. Finally, post-mortem scans tend to be associated with long scan times, during which gradient-intensive sequences can cause significant heating of the gradient hardware. If this heating is transferred to the passive shim material inside the scanner bore, it can lead to significant B 0 drift during the course of acquisitions ( Foerster et al., 2005 ). In our study, image misalignments along the phase encoding direction and differences in the slope of phase evolution between different GRE repeats were observed that would be consistent with such a B 0 drift. In addition, T2 * values have previously been demonstrated to be sensitive to temperature. Changes in temperature inside the scanner bore could be transferred to the sample, depending on air circulation and heat capacity of the sample while embedded in the susceptibility-matching fluid. In our study, very small differences in mean T2 * values over whole brain between the four GRE repeats were found for all brain samples. This suggests that the sample temperature was relatively stable during the course of GRE acquisitions. Importantly, although changes in passive shim temperature and sample temperature could both be caused by gradient heating, they need not be coupled, since the two effects depend on details of heat transfer between different physical systems. The above-mentioned issues have been addressed by appropriately constrained post-processing corrections in our study. Similarly, histology data requires optimisation. For both ferritin and PLP, non-specific (background) DAB staining was avoided using treatment with 3% H 2 O 2 to quench peroxidase activity, which can interact with DAB. The quality of immunohistochemistry was assessed during protocol development by an experienced histologist, and validated by a clinical neuropathologist. We now describe for each stain the cellular specificity that was observed for DAB and prior literature reporting similar results. PLP staining was observed to be specific to axonal fibres of the cortical grey matter and no staining was observed in neurons or oligodendrocytes. Myelin PLP staining was most dense in M1 and V2 and less dense in ACC. This agrees with previous reports that ACC is substantially less myelinated than the M1 and V2 ( Nieuwenhuys and Broere, 2017 ;Glasser et al., 2016 ). In PLP, the majority of the axonal fibres were observed to run vertically within the grey matter perpendicular to the grey and white matter boundary, with other fibres running horizontally along the length of the cortical layers, with some bundling of vertical fibres, consistent with previous findings ( Turner, 2019 ). Ferritin staining was apparent in myelin, glial cells and oligodendrocytes.

ALS
Previous work showed that myelinated fibres colocalised with ferritin staining in both grey and white matter ( Fukunaga et al., 2010 ). Furthermore, microglial ferritin staining has been demonstrated in the frontal cortex of Alzheimer's disease ( van Duijn et al., 2013 ) and in the deep layers of the motor cortex of ALS ( Kwan et al., 2012 ). Since the majority of iron is bound to ferritin, studies have validated intracellular ferritin as a surrogate marker for iron content in tissue ( van Duijn et al., 2013 ;Reinert et al., 2019 ). Shortcomings of immunostaining with respect to quantification are described below.
Susceptibility and R2 * both showed positive correlation with the ferritin stained area fraction ( Fig. 11 ). This is as expected, due to the paramagnetic properties of iron Langkammer et al., 2010 ), what was not expected, however, is that both susceptibility and R2 * values additionally showed positive correlations with PLP. Positive correlation is expected for R2 * estimates ( Zhang et al., 2016 ), but a negative correlation would be predicted for susceptibility due to the diamagnetic properties of myelin. A similar result to ours has been previously reported ( Hametner et al., 2018 ), which has been attributed to a correlation between iron and myelin content in grey matter, which was also observed in our data. To explore whether this explanation is consistent with our data (i.e. that the apparent correlation was in effect an indirect relationship mediated by cortical iron content), we performed a partial correlation analysis. Regressing the ferritin SAF out of both PLP SAF and susceptibility estimates prior to correlation reduced the positive correlation, but did not produce the expected negative correlation. To the extent that ferritin SAF accounts for iron (caveats on this point are discussed below), the partial correlation analysis was unable to explain this finding. This positive partial correlation of PLP with susceptibility is not in agreement with the diamagnetic nature of myelin and remains unexplained.
More generally, it is worth noting that the correlations presented here are driven primarily by regional differences rather than variation within region, reflected as a clustering of regions in Fig. 11 . In this situation, correlations are prone to identifying apparent relationships between any properties exhibiting regionally-specific differences. In general, we did not observe meaningful correlations within individual regions, with the only significant correlation being between ferritin SAFs and susceptibility in the ACC, although this does agree well with the overall inter-regional correlations. The lack of significant correlations in individual regions could reflect low statistical power in the presence of small effect size (differences between brains) and small samples (n = 12).
This unexpected correlation between PLP SAF and susceptibility could reflect some limitations in the immunohistochemistry methods used within our study. First, DAB staining does not follow the Beer-Lambert law (the optical density is not representative of the protein concentration). This is the primary motivation for semi-quantitative analysis based on SAF rather than optical density. One drawback of measuring SAF is the need to use thresholding to determine positively stained area. In this work, a single threshold for each stain was manually chosen by a histologist which could potentially fail to account for slide to slide variability. A pipeline that enables automated thresholding of digital histology images is currently under development ( Otsu, 1979 ;Geijs et al., 2018 ). Second, ferritin staining is not directly quantitative of iron load. Each ferritin complex can carry a variable number of iron atoms, meaning that ferritin-associated iron is not necessarily proportionate to the protein content measured here. Third, although ferritin is the principal iron storage protein, non-ferritin iron would also contribute to MR susceptibility signal. This would lead to both an underestimate in the apparent correlation with iron load and, potentially, an imperfect deconfounding of iron from PLP stains. Other techniques that more directly measure iron concentration, such as the mass spectrometry technique used in Langkammer et al. , could offer a more accurate quantification of iron in tissue. However, such techniques require heating the sample which may damage the tissue chemical structure and preventing further use of the histology sample.
Previous studies have reported susceptibility and R2 * hyperintensities in M1 of ALS patients ( Schweitzer et al., 2015 ;Costagli et al., 2016 ;Acosta-Cabronero et al., 2018 ). We observed similar effects in cortical grey matter of M1 in four out of nine ALS brains, corresponding to different sub-regions. This variability might relate to disease heterogeneity. Furthermore, in our study, ALS brains were found to have greater mean susceptibility and R2 * values in M1 compared with control brains, while ALS brains also exhibited greater ferritin and PLP stained area fractions compared with control brains, although group differences were not statistically significant. The result of higher PLP SAFs in the M1 region in ALS brains compared to control brains ( Fig. 10 B) is potentially surprising as neurodegeneration is more often associated with demyelination. A related result has been previously reported by Meadowcroft et al. ( Meadowcroft et al., 2015 ), where an increased staining and optical density of subcortical white matter in the primary motor cortex was found in ALS tissue compared to controls using a different myelin stain (Luxol fast blue). Disease-related iron and myelin changes in M1 and their contributions to the MRI susceptibility signal need further investigation.
This study had several limitations. First, the GRE acquisition protocol could be better optimised, as the readout train included unnecessary dead time. To give a feel for the potential gains, if one wished to leave the exact TEs and TR unchanged (e.g. if the first echo required a particularly short TE = 2 ms), a bandwidth of 210 Hz/pixel could be achieved for echoes 2-5, increasing the SNR by 1.76x for these echoes. Second, registration between histology slides and MRI data was not performed, requiring us to correlate at the level of region-of-interest rather than pixel-wise. A pipeline that enables automated registration of histology slides to 3D MRI images using dissection photos as an intermediary is currently under development ( Huszar et al., 2019 ). This registration pipeline aims to enable pixel-wise comparisons between MRI and histology acquired within the same sample. Third, we were only able to obtain ferritin histology in limited regions of interest over two separate experiments. This limitation reflects challenges in ferritin staining, with quality of stain being operator dependent such that direct comparison requires that all slides are processed as a single batch. To enable combination of stained area fractions measured from two batches, ferritin SAFs were standardized statistically within batch to account for technician-related bias. Fourth, QSM is a measure of tissue's relative susceptibility, which is generally made in comparison to an internal reference region (e.g. cerebrospinal fluid). In our study, we lack a reliable reference region that can be confidently considered to be free of pathology (indeed, even the region chosen to represent very low pathology, V2, may be affected). As such, susceptibility values were referenced to the whole brain by setting the mean susceptibility of whole brain to zero. Fifth, only three control brains are included, which limits statistical power in this current cohort. Sixth, we observed a variation of R2 * values with distance to the tissue surface in post-mortem brains ( Figure  S9 in the Supplementary material). This may be due to the time it takes for fixative to penetrate into the tissue ( Dawe et al., 2009 ) combined with leakage of the fixative from the tissue surface. We are developing a regression-based correction for this effect based on forward simulation of the diffusion of fixative over time ( Qi et al., 2017 ).

Conclusion
We have developed a QSM and R2 * processing pipeline for whole, fixed post-mortem brain acquisition. Comparison with semi-quantitative ferritin and myelin histology demonstrates the potential for susceptibility and R2 * maps to serve as a quantitative imaging marker for the diagnosis of ALS or other neurodegenerative diseases.

Authors' contributions
CW established the QSM and R2 * processing pipeline, processed the MR data, conducted high-level analyses and drafted the manuscript. SF contributed to design and construction of the brain holder and established the post mortem MRI protocols and carried out MRI acquisition. OA developed the sampling strategy and advised on histology analysis. SBC performed histology staining and analysis. MC provided GRAPPA reconstruction algorithms. AL performed histology staining and analysis. RALM generated ROI masks. JM contributed to the analysis of histology stained images. MPG carried out systematic sampling, block face photography, immunohistochemistry, histology analysis. MRT supported cohort characterisation. KLM conceived the study design, contributed to establishment of the post mortem MRI protocol and QSM/R2 * processing pipeline and data analysis. BCT conceived the study design, contributed to QSM/R2 * processing pipeline, provided algorithms and software for QSM pre-processing, and data analysis. All authors critically appraised the manuscript.

Declarations of Competing Interest
None