Magnetic resonance imaging in multiple sclerosis animal models: A systematic review, meta-analysis, and white paper

Highlights
• This is an overview of preclinical MRI studies in neuroinflammatory diseases.
• We summarized experimental setup, MRI methodology, and risk of bias of these studies.
• We propose guidelines to improve standardization of preclinical MRI studies.
• Implementing these reporting guidelines could facilitate clinical translation.
• This study can serve as a framework for future preclinical studies using MRI.


Introduction
A growing number of large cooperatives, including the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Human Connectome Project (HCP) (Petersen et al., 2010; Van Essen et al., 2013), aim to standardize reporting on neuroimaging in humans. Whereas standardized reporting on neuroimaging in clinical research, including the use of magnetic resonance imaging (MRI) as a fundamental tool in the diagnosis and monitoring of multiple sclerosis (MS), has received much attention, no such attempts have been made in preclinical neuroimaging research. This gap is surprising, since MRI is also widely used in preclinical research to screen for drug efficacy and to investigate pathogenic aspects in animal models, especially in MS animal models, in which both inflammatory and demyelinating pathology are readily detectable using MRI. One concern is that differences in experimental MRI scanning and in the reporting of technical imaging details can impede comparisons between studies. Comprehensive reporting of methodological details is also key for potential replication of findings (Kilkenny et al., 2010), an issue that is receiving a great deal of attention in preclinical research (Justice and Dhillon, 2016; Kilkenny et al., 2010; Steward and Balice-Gordon, 2014). Thus, improved reporting of methodological imaging details can maximize the availability and utility of the information gained from every animal experiment, which can ultimately prevent unnecessary animal experiments in the future. Finally, keeping track of the abundance of preclinical MS neuroimaging studies published so far has proven difficult.
Therefore, we set out to provide a comprehensive overview of preclinical MRI studies in the field of neuroinflammatory and demyelinating diseases, summarizing experimental setup, MRI methodology, and risk of bias. Through a meta-analysis, we also investigated the efficacy of assessed therapeutic approaches using MRI outcome measures and histological measures of disease activity in MS animal models. In order to increase standardization of experiments, we propose minimal reporting guidelines on technical aspects and experimental setup for future preclinical MRI studies, with the goal of improving successful translation of preclinical findings for potential therapeutic interventions for MS.

Materials and methods
This systematic review summarizes preclinical studies assessing therapies and/or pathogenic aspects of MS in corresponding animal models using MRI. The inclusion criteria and method of analysis were specified in advance and documented in a protocol, which was published on PROSPERO (registration number: CRD42019134302). We used the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) Guidelines (Moher et al., 2015).

Search strategy and paper selection
A comprehensive search string to identify publications assessing MRI in MS animal models was generated. The following databases were searched for matches: EMBASE, go3R, Medline, PubMed, Scopus, and Web of Science (last search 01 May 2020). See Supplementary search string for the exact string. All animal species, publication dates, and languages were included in the database search.
Publications were included in this systematic review if they met the following inclusion criteria: (1) the publication was an original peer reviewed full publication that published unique data; and, (2) since MS animal models are generally defined by neuroinflammatory (e.g. experimental autoimmune encephalomyelitis or Theiler's murine encephalomyelitis) and/or demyelinating pathology (e.g. cuprizone or lysolecithin), the publication used an animal model with a neuroinflammatory/demyelinating pathological substrate in conjunction with any MRI outcome.
Publications were included in the meta-analysis if they met the following inclusion criteria in addition to the ones listed above: (3) the publication contained at least one adequate control group (i.e. vehicle or no treatment); (4) an outcome measure related to MRI was used; and (5) the publication provided an effect measure, animal numbers, and a measure of variability for the respective experimental groups.
Publications were screened for relevance by one reviewer. Reviews were excluded but used as a source for potential studies and for discussion.

Data extraction
The following study characteristics were extracted from the full texts by two independent reviewers: (1) parameters on model organisms and disease model: type of animal model, tested intervention, application regimen, species, strain, sex, housing conditions, weight, and number of animals per group; (2) parameters on MRI scanning: anesthesia, technical details on the MRI scanner (supplier, coils, gradients, magnetic field strength), technical details on MR imaging parameters (pulse sequence, echo/repetition time, field of view, matrix size, and others), contrast agent type, and dosage. As study outcome measures, we extracted the mean and variance (standard deviation [SD] or standard error of the mean [SEM]) of all available MRI outcome parameters. BVI checked whether all data were extracted correctly. Disagreements between the two reviewers were resolved by jointly assessing the data in the publications and coming to a consensus. The interrater agreement was 71% for MRI outcomes.
When possible, data were extracted from text or tables; if not, data were extracted from graphs using universal desktop ruler software (AVP Software Development, USA). When the group size was reported as a range (e.g., 6-7), the mean number of animals was used in our analysis (e.g. 6-7 = 6.5).
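The range-to-mean convention above can be captured in a short helper. This is an illustrative sketch only (the function name is ours, not from the study), assuming group sizes are reported either as plain numbers or as hyphenated ranges:

```python
def mean_group_size(size):
    """Parse a reported group size; a range such as '6-7' becomes its mean."""
    s = str(size)
    if "-" in s:
        lo, hi = (float(part) for part in s.split("-"))
        return (lo + hi) / 2
    return float(s)
```

For example, `mean_group_size("6-7")` returns 6.5, matching the convention described above.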

Quality assessment
We scored the risk of bias according to a five-item checklist derived from the consensus statement 'Good laboratory practice' in the modelling of stroke (Macleod et al., 2009): implementation in the experimental setup of any measure of randomization, any measure of blinding, prior sample size calculation, statement on animal welfare, and statement of a potential conflict of interest. For each of these items, a 'yes', a 'NR' (not reported), or a 'no' was scored. As a sixth item, we also scored whether the study was in accordance with the ARRIVE guidelines (Kilkenny et al., 2010).

Meta-analysis
Data were analyzed using the software Comprehensive Meta-Analysis (CMA, version 3.0). Different studies used different scales to measure the same outcome; thus, instead of the raw mean difference, we calculated the Hedges' g standardized mean difference (SMD): the mean of the experimental group minus the mean of the control group, divided by the pooled SD of the two groups.
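The authors computed this in CMA; as a minimal, hypothetical Python sketch, Hedges' g is Cohen's d multiplied by the standard small-sample bias correction J (all names below are illustrative, not from the study):

```python
import math

def hedges_g(m_treat, sd_treat, n_treat, m_ctrl, sd_ctrl, n_ctrl):
    """Hedges' g: bias-corrected standardized mean difference."""
    df = n_treat + n_ctrl - 2
    # Pooled standard deviation of the two groups
    sd_pooled = math.sqrt(((n_treat - 1) * sd_treat ** 2 +
                           (n_ctrl - 1) * sd_ctrl ** 2) / df)
    d = (m_treat - m_ctrl) / sd_pooled        # Cohen's d
    j = 1 - 3 / (4 * df - 1)                  # small-sample correction J
    return j * d
```

For two groups with means 10 and 8, a common SD of 2, and n = 7 and 6, this yields g of about 0.93; without the correction J, the raw Cohen's d would be 1.0.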
In order to adequately represent weight of individual experiments in the meta-analysis, control groups were adjusted in case they served for more than one experimental group. In that case, the number of observations in that control group was divided by the number of experimental groups served.
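The adjustment described above amounts to dividing the shared control group's observations across the experimental groups it serves, so that it is not double-counted. A trivial sketch (function name hypothetical):

```python
def adjusted_control_n(n_control, n_experimental_groups):
    """Split a shared control group's observations across the experimental
    groups it serves, so its weight is not counted multiple times."""
    return n_control / n_experimental_groups
```

For example, a control group of 12 animals serving 3 experimental groups contributes 4 observations to each comparison.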
Individual SMDs were subsequently pooled to obtain an overall SMD and 95% confidence interval. Since we did not expect one true underlying effect of all the meta-analyzable studies, we used the random effects model [14], which takes into account the precision of individual studies and the variation between studies and weighs each study accordingly.
Sources of heterogeneity were explored using I², which describes the percentage of the variability in effect estimates that is due to heterogeneity rather than sampling error (Higgins and Thompson, 2002). We expected the variance to be comparable within the subgroups (i.e., the pooled treatments); therefore, we assumed a common across-study variance across subgroups. No sub-subgroup analyses were calculated due to the low number of experiments per therapeutic approach.
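The random-effects pooling and I² described above can be sketched with the DerSimonian-Laird estimator, a common choice for this model; CMA was the software actually used, and all names below are illustrative:

```python
import math

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling with I2 heterogeneity."""
    k = len(effects)
    w = [1 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Cochran's Q and the between-study variance tau^2
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # I2: share of variability due to heterogeneity, not sampling error
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return pooled, ci, i2
```

With two equally precise studies of SMD 0.5 and 1.5 (variance 0.1 each), this returns a pooled SMD of 1.0 with I² = 80%, illustrating how between-study spread inflates I² without shifting a symmetric pooled estimate.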
We used funnel plots, Trim and Fill analysis, and Egger regression to assess potential publication bias. Because SMDs may cause funnel plot distortion, we plotted the SMD against √n, a sample size-based precision estimate (Zwetsloot et al., 2017).

Results

Study selection process
The study selection process followed the PRISMA flow (Moher et al., 1999). A search string for MS animal models and MRI was used in conjunction with an animal filter. A total of 9079 publications were retrieved via EMBASE, Medline, PubMed, Scopus, and Web of Science, of which 4112 publications remained after deduplication. After initial screening of titles and abstracts, 499 publications were included in the full-text assessment. Of these, 300 unique publications met our inclusion criteria for the synthesis on experimental methods (Supplementary reference list). Of these 300, 67 unique publications investigated a potential MS therapy, and 49 unique publications contained quantitative structural MRI data and could therefore be used for the quantitative synthesis of therapy effects on MRI (meta-analysis). The remainder were excluded according to the criteria listed in Fig. 1.
The first report using MRI in an MS animal model was published in 1985 (Stewart et al., 1985). It showed that MRI lesions were apparent in primate brains prior to experimental autoimmune encephalomyelitis (EAE) symptom onset. The first report using an MS animal model to assess a therapy for its remyelinating potential in vivo that also met our inclusion criteria, however, was not published until 1994 (Namer et al., 1994). Thus, all studies included in the meta-analysis were published between 1994 and 2020.

Assessed therapies
A total of 44 different therapies (47 different therapeutic approaches) were investigated in 49 of the publications. The assessed therapies are listed in Supplementary Table 2.
In vivo imaging only was performed in 232 publications (77%, Fig. 2B), ex vivo imaging only in 36 publications (12%), and both in vivo and ex vivo MR imaging in 32 publications (11%). The brain only was imaged by most publications (228, 76%), followed by the spinal cord only (37, 12%); both brain and spinal cord were imaged by 26 publications (9%). 86 publications acquired longitudinal neuroimaging (37% of in vivo imaging publications). The longest imaging follow-up time was 12 months.

Fig. 1. PRISMA flow chart of the study selection process (Moher et al., 1999). Deduplication refers to removing identical studies found in multiple medical databases (e.g. the same reference in EMBASE and MEDLINE). Four duplicate studies were removed at the eligibility stage. Abbreviations: MRI, magnetic resonance imaging; MRS, magnetic resonance spectroscopy; MS, multiple sclerosis.
Of 62 publications acquiring DWI, 27 reported the maximum b value (44%). b values ranged from 50 to 3000 s/mm². Twenty-five publications reported the pulse duration (δ, 40%) and the time between the pulses (Δ, 40%). Only a few publications reported the number of diffusion directions (11 publications, 18%) or the diffusion gradient strength (9 publications, 15%). Of note, more than one-third of publications using DWI were published in the three years prior to our search cutoff date.

Study quality and risk of bias
Poor reporting in preclinical studies is a known issue, and many items of commonly used risk of bias tools are therefore scored as unclear risk of bias. We therefore scored the risk of bias according to a five-item checklist derived from the consensus statement 'Good laboratory practice' in the modelling of stroke (Macleod et al., 2009). These items were also scored in a comparable study in EAE (Vesterinen et al., 2010) and in a study of toxic demyelination (TD) models (Hooijmans et al., 2019). Compliance with animal welfare regulations or an approved animal license was reported in 80% of cases (EAE: 32%, TD: 58%). Blinding of the experiment at any level was reported in 29% of publications (EAE: 16%, TD: 38%). Due to the experimental setup, one publication (< 1%) was not able to blind its researchers, and this was explicitly reported. A statement about conflicts of interest was reported in 35% of publications (EAE: 6%, TD: 38%). Thirteen percent of publications reported randomization at any level (EAE: 9%, TD: 5%). Four publications (1%) reported a prior sample size calculation (EAE: < 1%, TD: 2%). These findings are summarized in Fig. 2C.
Finally, as a sixth item, we checked whether the publication was in accordance with the ARRIVE guidelines, an initiative to improve the reporting standards of animal research (Kilkenny et al., 2010). Three publications reported being in accordance with the ARRIVE guidelines (1%, Fig. 2C).

Fig. 2 (caption, partial). Proportional study characteristics on type of anesthesia for imaging, magnetic resonance imaging (MRI) scanner supplier, field strength of MRI scanner, scanned central nervous system region(s), and use of contrast agent. The top portion of each bar represents the remaining pooled categories per characteristic or the proportion of studies that did not report on that particular study characteristic. (C) Risk of bias assessment of eligible studies using a six-item checklist (animal welfare reporting, blinding of experiments, statement of a potential conflict of interest, randomization in experimental setup, prior sample size calculation, and accordance with the ARRIVE guidelines (Kilkenny et al., 2010; Macleod et al., 2009)). For each of these items, 'yes', 'NR' (not reported), or 'no' was scored. Except for the item animal welfare statement, the majority of studies have unclear risk of bias (i.e., not reported; orange bar). Abbreviations: Bru, Bruker; Cup, cuprizone; EAE, experimental autoimmune encephalomyelitis; Gd, gadolinium; Iso, isoflurane; K/X, ketamine-xylazine; marm, marmosets; NR, not reported; Var, Varian; SC, spinal cord. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

MRI measures as outcome
Pooling the individual effect sizes of all therapies in our meta-analyses showed that the therapies described in the literature had a beneficial effect on MRI outcomes (e.g. volume of T2 brain lesion load; standardized mean difference (SMD): 1.24, 95% CI: [1.06, 1.34], p = 0.021, Table 1). The overall heterogeneity between the studies was moderate (I² = 37%).
In order to obtain a more detailed overview of the efficacy of the various therapies included in this review, we also analyzed the effect of the 47 different therapeutic approaches on MRI outcomes separately. Twenty-eight therapeutic approaches led to a significant improvement of MRI outcomes (Fig. 3). For the remaining 19 therapeutic approaches, no statistically significant results were found. Of note, in most cases, only one study was available per therapeutic approach. The median sample size [interquartile range, IQR] was 7 [5-10] for the treatment groups and 6 [4-8] for the control groups.

Histological markers of (re-)myelination as outcome
Pooling the individual effect sizes of all therapies in our meta-analyses showed that the therapies described in the literature had a beneficial effect on histological outcomes of (re-)myelination (e.g. number of thinly myelinated axons in electron microscopy; SMD: 1.72, 95% CI: [1.09, 2.30], p = 0.014, Table 1). The overall heterogeneity (Higgins and Thompson, 2002) between the studies was high (I² = 80%), however, reflecting the anticipated differences between interventions, models used, and study design.
In order to obtain a more detailed overview of the efficacy of the various therapies included in this meta-analysis, we also analyzed the effect of the 24 different therapeutic approaches on (re-)myelination histology markers separately (with the corresponding method to assess (re-)myelination in square brackets). Seven therapeutic approaches led to a significant improvement of histological outcomes of (re-)myelination (Supplementary Fig. 1). For the remaining 17 therapeutic approaches, no statistically significant results were found. The median sample size [IQR] was 7 [5.38-9.5] for the treatment groups and 5.25 [3-7.25] for the control groups.

Histological markers of neuroinflammation as outcome
Pooling the individual effect sizes of all therapies in our meta-analyses showed that the therapies described in the literature had a beneficial effect on histological outcomes of neuroinflammation (e.g. number of inflammatory CD3+ cells within parenchymal lesions; SMD: 1.20, 95% CI: [0.93, 1.55], p = 0.019, Table 1). The overall heterogeneity between the studies was substantial (I² = 61%).
In order to obtain a more detailed overview of the efficacy of the various therapies included in this meta-analysis, we also analyzed the effect of the 18 different therapeutic approaches on inflammatory histology markers separately (with the corresponding method to assess inflammation in square brackets). Ten therapeutic approaches led to a significant improvement of histological outcomes of neuroinflammation (Supplementary Fig. 2). For the remaining 8 therapeutic approaches, no statistically significant results were found. The median sample size [IQR] was 6 [5-7.5] for the treatment groups and 5 [4-6] for the control groups.

Histological markers of neurodegeneration as outcome
In total, 5 publications also assessed the neuroregenerative/-protective potential of therapeutic approaches. Immunohistochemistry/fluorescence for neurofilament or SMI32 were used by 2 publications each.
Pooling the individual effect sizes of all therapies in our meta-analyses showed that the therapies described in the literature had a positive effect on histological outcomes of neurodegeneration (e.g. number of SMI-positive axons within neuroinflammatory lesions; SMD: 0.81, 95% CI: [0.10, 1.51], p = 0.044, Table 1). The overall heterogeneity between the studies was substantial (I² = 61%).
In order to obtain a more detailed overview of the efficacy of the various therapies included in this meta-analysis, we also analyzed the effect of the 5 different therapeutic approaches on neurodegeneration histology markers separately (with the corresponding method to assess neurodegeneration in square brackets). Only one therapeutic approach (olesoxime [neurofilament]) led to a significant improvement of histological outcomes of neurodegeneration (Supplementary Fig. 3). For the remaining 4 therapeutic approaches, no statistically significant results were found. The median sample size [IQR] was 4.5 [3.75-5.25] for the treatment groups and 4.5 [4-5.25] for the control groups.

Correlation analysis
We next asked how well MRI outcome measures correlate with histological markers of (re-)myelination or neuroinflammation. For this, we plotted the SMDs of the MRI outcomes against the SMDs of these histological outcomes. A positive correlation was found between non-contrast-enhanced MRI outcomes (i.e. structural T1-weighted/T2-weighted and MTI measures as well as DWI measures) and measures of (re-)myelination (Fig. 4A). SMDs of MRI outcomes showed no correlation with measures of neuroinflammation (Fig. 4B). Only 5 studies histologically assessed neurodegeneration; hence, we did not assess this correlation.
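The correlation reported in Fig. 4 is a Pearson correlation across paired SMDs. A minimal sketch (assuming paired lists of MRI and histology SMDs; the function name is ours):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between paired SMDs (e.g. MRI vs. histology)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Perfectly proportional pairs give r = 1.0; the study's observed value for (re-)myelination was r = 0.63.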

Publication bias
In order to assess publication bias, we visually inspected the funnel plot and calculated Egger's regression. The funnel plot is a graphical representation of trial size plotted against the reported effect size. An uneven scattering on both sides of the summary effect size indicates publication bias. Visual inspection of the funnel plot indicated the presence of publication bias for MRI data (Fig. 5). This finding was supported by Egger's regression showing statistically significant evidence for small study effects (p = 0.001).
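Egger's regression as described above regresses the standardized effect on precision; an intercept far from zero indicates funnel-plot asymmetry (small-study effects). A minimal ordinary-least-squares sketch (names hypothetical, not the study's actual implementation):

```python
def egger_regression(effects, ses):
    """Egger's test: regress effect/se on 1/se; a non-zero intercept
    suggests funnel-plot asymmetry (small-study effects)."""
    y = [e / s for e, s in zip(effects, ses)]   # standardized effects
    x = [1 / s for s in ses]                    # precisions
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx                           # estimates the pooled effect
    intercept = my - slope * mx                 # bias indicator
    return intercept, slope
```

For a perfectly symmetric data set (identical effects at varying precision), the intercept is exactly zero and the slope recovers the common effect.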

Discussion
MRI is widely used in preclinical research to investigate putative therapeutic approaches or pathogenic aspects in MS animal models. Tracking the large number of published studies in this field has proven difficult, however. In order to obtain an overview of these studies, we systematically reviewed methodological details of preclinical studies using MRI in MS animal models. Furthermore, a meta-analysis of therapeutic approaches provides evidence for a solid correlation between MRI outcome measures and histological measures of (re-)myelination.

Risk of bias and reporting of methodological details
Accumulating evidence suggests low reproducibility rates in the life sciences (Justice and Dhillon, 2016), including neuroscience (Steward and Balice-Gordon, 2014). A recent report indicates that, in the United States alone, the cumulative prevalence of irreproducible preclinical research exceeds 50%, with an approximate cost of $28 billion/year (Freedman et al., 2015). Insufficient reporting of experimental details in (pre-)clinical research can contribute to this lack of reproducibility (Carp, 2012; Collins and Tabak, 2014). This problem is particularly apparent in MRI research, where small differences in imaging protocol can lead to large differences in tissue contrast (Amiri et al., 2018). Therefore, detailed and accurate reporting of the methodology used and the results obtained is key for future reproducibility of findings from MRI studies and, more importantly, for successful translation of preclinical findings to clinical trials. Reduction of methodologically inappropriate animal experiments could also ultimately reduce animal numbers.

Fig. 3. Forest plot of the included studies for MRI outcomes. The diamond indicates the global estimate and the whiskers its 95% confidence interval (CI). The numbers listed after each therapy are: the exact effect size with its 95% CI, the number of included studies for a certain intervention (ns), and the total numbers of treated animals (nt) and control animals (nc). The capital letters in round brackets indicate whether the corresponding therapy has also been tested for (re-)myelination (M), inflammation (I), and/or neurodegeneration (N). The gray bar indicates the 95% CI of the overall effect size. The dotted line indicates an SMD of 0, i.e. studies whose whiskers overlap this dotted line do not show statistically significant SMDs between therapy and control group. Also consider Supplementary Figs. 1-3 for effects on (re-)myelination, inflammation, and/or neurodegeneration. References are provided in the Supplementary information.
Estimating the risk of bias in our included publications, by scoring whether measures to avoid bias were reported in six separate domains, suggests an overall high risk of bias among the included studies, albeit in the range of other published studies in the field (Hooijmans et al., 2019; Vesterinen et al., 2010). It has been shown that studies that report on measures to avoid bias, such as blinding of experimenters, yield substantially lower efficacy estimates (Macleod et al., 2008; Vesterinen et al., 2010). Thus, it is highly recommended to include such measures in the experimental design of any planned study.
Results from our systematic review show that an abundance of different species and MS animal models has been used in conjunction with MRI. Many studies did not report key methodological details of the experimental setup: e.g. 21% of all studies did not include information on the sex of the animals used, and 19% of all studies did not report the total number of animals studied. Guidelines for reporting experiments involving animals have been published to tackle this problem (Kilkenny et al., 2010; Landis et al., 2012); they are still insufficiently implemented in scientific practice, however (Baker et al., 2014): only 3 out of 300 publications in our systematic review reported being in accordance with the ARRIVE guidelines.

Minimal reporting guidelines
While the overall reporting of technical details regarding the MRI system and image acquisition was reasonable, some important methodological details were seldom reported: many studies did not report the gradient system, receiver bandwidth, flip angle magnitude, field of view, matrix size, or the gradient strength and number of directions in DWI. There was also high variability in which technical aspects were reported and which were omitted. The poor reporting and the variability in reporting are due to a lack of reporting guidelines. Whereas such reporting guidelines have been proposed for general aspects of preclinical animal research (Kilkenny et al., 2010) and for clinical trials (Moher et al., 2001), no such guidelines are available for preclinical neuroimaging studies. Thus, based on the findings in this review and our experience with neuroimaging in animals, we propose minimal reporting guidelines (Table 2). The reporting suggestions are grouped according to experimental steps, i.e. details on the MRI system, details on animal anesthesia, details on sequence(s), details on contrast media (if applicable), and details on ex vivo imaging (if applicable). Even though we did not include other disease models in our analysis, these guidelines could be applied to any preclinical neuroscience research using MRI. We also ask referees and journal editors to scrutinize papers for these details. Complete reporting of relevant information is key for potential replication of findings (Kilkenny et al., 2010).

Results from the meta-analyses
The meta-analysis of the MRI outcomes showed that therapies tested in MS animal models had an overall beneficial effect on the model disease course. The same is true for the outcomes (re-)myelination, neuroinflammation, and neurodegeneration. However, the effect-size summaries and the therapy effect sizes should be interpreted with caution due to mostly small study sample sizes, differences in study design characteristics, and the overall low number of studies. They should therefore not be used as a rank order of potency.
Interestingly, there was a statistically significant positive correlation between SMDs from the non-contrast-enhanced MRI outcomes and histological measures of (re-)myelination. This suggests that non-contrast-enhanced MRI outcomes reflect the underlying (re-)myelination status reasonably well. Surprisingly, despite the relative success of therapeutic development to modulate inflammation, our data did not support a significant correlation between MRI outcomes and histological outcomes of neuroinflammation. There were also too few studies available to perform subgroup analyses for specific imaging outcomes such as contrast-enhancing lesions. More studies are thus needed to address whether there is a correlation between specific imaging findings and underlying histopathology in MS and/or corresponding animal models. This holds particularly true for measures of neurodegeneration: only four publications concomitantly assessed histological measures of neurodegeneration and MRI. Moreover, no therapies are currently approved to mitigate neurodegeneration in MS, which is present even early in the disease course (Trapp et al., 1998). A deeper understanding of certain MR image features and their underlying histopathology could facilitate the choice of adequate outcomes in clinical trials (Maggi et al., 2014).

Fig. 4. Correlation analysis between standardized mean difference (SMD) of the MRI outcomes and histological markers of (re-)myelination (A) or neuroinflammation (B). The analysis indicates a statistically significant correlation between SMDs of non-contrast-enhanced MRI outcomes and SMDs of (re-)myelination (r = 0.63, p < 0.001). No statistically significant correlation was found between SMDs of MRI outcomes and neuroinflammation.
For the assessment of therapy efficacy, most publications used a T2 and/or contrast-enhancing lesion burden measure. These outcomes are also commonly used in the design of clinical trials and thus reflect sound outcome measures also for preclinical research (van Munster and Uitdehaag, 2017). It is noteworthy that DWI, including tractography, is increasingly used in preclinical neuroimaging research and is a popular choice for determining white matter microstructure in vivo (Jelescu and Budde, 2017). Hence, DWI has particular relevance for demyelinating and/or neurodegenerative pathology, both of which are hallmarks of MS (and to some degree MS animal models) (Lassmann and Bradl, 2016;Trapp et al., 1998). Recent attempts at standardizing DWI methodology in preclinical research further support its benefit in MS animal model neuroimaging (Anderson et al., 2020). Yet, the specificity of DWI findings needs to be further validated in correlative histopathology studies (Budde et al., 2009). Also, a careful choice of imaging parameters, such as gradient strength, gradient duration and diffusion time, is key for reliable DWI results (Jelescu and Budde, 2017).
Of note, only a few studies used MRI brain/spinal cord atrophy as an outcome measure, even though this outcome is increasingly being used in clinical trials. A potential reason for this discrepancy is technical limitations during post-processing of images, which may impede the determination of a reliable atrophy rate in smaller-scale brains, such as those of rodents, especially within the mostly brief time frame of animal studies (Kurniawan, 2018). However, a considerable number of studies performed longitudinal neuroimaging for up to 12 months. Such longitudinal assessment of disease processes can greatly support pathophysiological understanding, particularly in neuroinflammation, a highly dynamic pathology (Maggi et al., 2017).
Finally, visual inspection of the funnel plot and testing with Egger's regression indicated publication bias, whereby effect sizes are overestimated. It has been suggested that publication bias may account for at least one-third of the efficacy reported in systematic reviews of animal stroke studies (Van der Worp et al., 2010). Similar overestimations of effect sizes are likely true for other model diseases, including MS.

Limitations
Our review has some limitations. (1) Many key methodological details of the animal studies included in our review were poorly reported. Unfortunately, this also holds true for many other systematic reviews of animal studies (Hooijmans et al., 2015), a situation that seriously hampers reliable risk of bias assessment. Although this limits our ability to reliably estimate the validity of the results of the included studies, we nevertheless included the poorly reported papers in this review because papers that do not report essential details are not necessarily methodologically impaired (Green and Higgins, 2005). (2) For the meta-analysis, the number of studies was low, while the variability between the studies was considerable. This influences the reliability of the conclusions drawn from this systematic review. To account for this, we anticipated heterogeneity by using a random-effects rather than a fixed-effects model for the meta-analysis. (3) We did not perform post-hoc power calculations due to their limited validity (Levine and Ensom, 2001). It is worth reiterating that sample sizes were small in most of the included studies, in line with previous findings from a large systematic review in neuroscience (Button et al., 2013). Small sample sizes imply low power, which lowers the likelihood that a statistically significant result reflects a true effect (Marino, 2017).

Table 2 (excerpt): • Gradient performance (e.g. 200 mT/m) • Coil (e.g. 8-channel phased-array coil)

Conclusions
Our systematic review summarizes preclinical studies using MRI in MS animal models. We show that, whereas preclinically used MRI outcomes correlate well with underlying measures of (re-)myelination, reporting on certain technical aspects of MRI acquisition is poor. We therefore propose minimal, non-onerous reporting guidelines for studies using MRI in a preclinical setup. These guidelines address the important problem of insufficient methodological reporting and accompanying lack of experimental reproducibility. Taken together, findings from our study will inform preclinical researchers on adequate reporting of technical aspects of MRI acquisition. We hope this will encourage successful replication of future results and, eventually, successful bench-to-bedside translation of promising therapeutic approaches.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.