In vitro evaluation of cerebrospinal fluid velocity measurement in type I Chiari malformation: repeatability, reproducibility, and agreement using 2D phase contrast and 4D flow MRI

Phase contrast magnetic resonance imaging, PC MRI, is a valuable tool allowing for non-invasive quantification of CSF dynamics, but has lacked adoption in clinical practice for Chiari malformation diagnostics. To improve these diagnostic practices, a better understanding of PC MRI based measurement agreement, repeatability, and reproducibility of CSF dynamics is needed. An anatomically realistic in vitro subject specific model of a Chiari malformation patient was scanned three times at five different scanning centers using 2D PC MRI and 4D Flow techniques to quantify intra-scanner repeatability, inter-scanner reproducibility, and agreement between imaging modalities. Peak systolic CSF velocities were measured at nine axial planes using 2D PC MRI, which were then compared to 4D Flow peak systolic velocity measurements extracted at those exact axial positions along the model. Comparison of measurement results showed good overall agreement of CSF velocity detection between 2D PC MRI and 4D Flow (p = 0.86), fair intra-scanner repeatability (confidence intervals ± 1.5 cm/s), and poor inter-scanner reproducibility. On average, 4D Flow measurements had a larger variability than 2D PC MRI measurements (standard deviations 1.83 and 1.04 cm/s, respectively). Agreement, repeatability, and reproducibility of 2D PC MRI and 4D Flow detection of peak CSF velocities was quantified using a patient-specific in vitro model of Chiari malformation. In combination, the greatest factor leading to measurement inconsistency was determined to be a lack of reproducibility between different MRI centers. Overall, these findings may help lead to better understanding for application of 2D PC MRI and 4D Flow techniques as diagnostic tools for CSF dynamics quantification in Chiari malformation and related diseases.


Introduction
The dynamic movement of cerebrospinal fluid (CSF) has long been the subject of scientific investigation, and its important functional role to support central nervous system health is increasingly realized. For this reason, non-invasive phase contrast magnetic resonance imaging (PC MRI) quantification of CSF dynamics has been pursued for diagnosis, prognosis, and treatment of neurological diseases such as hydrocephalus [1,2], Chiari malformation [3], and syringomyelia [4,5]. Variabilities in CSF dynamics, such as increased CSF velocities and/ or flow rate, are thought to be indicative of Chiari malformation and related neurological disorders [6,7]. Single-plane two-dimensional, through-plane encoded PC MRI (2D PC MRI) and time-resolved three-dimensional velocity encoded PC MRI (4D Flow) are promising modalities that allow for CSF dynamics characterization. 2D PC MRI is one of the best known non-invasive methods and currently the only method for both qualitative and quantitative CSF characterization [8]. Clinical application of 2D PC MRI is widely varied with use in visualizing morphological and functional alterations in normal pressure hydrocephalous patients as well as CSF flow assessment in Chiari malformation populations with and without syringomyelia [9]. 4D Flow has shown potential to advance in vivo assessment of complex hemodynamic and CSF flow patterns [10][11][12]. Originally developed for cardiovascular applications [13], 4D Flow has been applied to analyze CSF velocity differences between healthy controls and Chiari malformation patients, with and without syrinx formation [14]. Contrast-enhanced MRI techniques have also been applied to quantify relatively slow timescale transport phenomena, such as CSF solute transport in humans [15][16][17]. Additionally, MRI has been applied to quantify short timescale phenomena such as dynamic motion of CSF due to respiration and other maneuvers using real-time PC MRI [18][19][20][21] and time-slip MRI [22,23]. These methods show promise to help reveal new insights about CSF system physiology in health and disease.
At present, the diagnostic relevance of PC MRI-based measurement of CSF velocity dynamics remains under debate by the medical community. For example, the recently published National Institutes of Health common data elements (CDEs) for Chiari malformation clinical research does not include any recommended measurements related to CSF dynamics [24]. The lack of adoption of CSF dynamics as a standard measure for Chiari malformation is likely due to the conflicting findings reported in previous studies comparing CSF velocities in Chiari malformation patients and healthy controls [25][26][27][28]. For example, some investigators report elevated CSF velocities in Chiari malformation patients' pre-surgical treatment, and others reported decreased pre-surgical CSF velocities in Chiari malformation patients compared to post-surgery. Also, there are conflicting reports of both elevated and decreased CSF velocities in healthy subjects compared to Chiari malformation patients.
These conflicting findings were discussed in a review by Shaffer et al. [6].
To address the need for improved CSF dynamics quantification, the present study aims to quantify the agreement, reproducibility, and repeatability of 4D Flow and 2D PC MRI measurement of CSF velocities at the craniovertebral junction. Our focus was the craniovertebral junction CSF velocities because these velocities are thought to potentially be a diagnostic indicator of Chiari malformation. To mitigate normal physiological variation in CSF velocities, our approach utilized a subject-specific high-resolution 3D printed model of a Chiari malformation patient with computer controlled pulsatile CSF pump [29]. We hypothesized that 2D PC MRI and 4D Flow would have strong measurement agreement, repeatability, and reproducibility.

Literature review
We conducted a meta-analysis of all CSF velocity quantification studies applied in Chiari malformation (Table 1). These studies show a range of peak CSF velocities in healthy controls and Chiari patients depending on the measurement position along the spine, voxel size, slice thickness, and number of phases. Figure 1 provides a summary of Table 1 results in terms of the average CSF velocities reported in the studies at each axial slice position along the spine (FM to C5) for healthy subjects (N = 91 included across all studies analyzed) and Chiari malformation patients that have not received decompression surgery (N = 166 included across all studies analyzed). Additional file 1: Fig. S1 contains Forest plots depicting the meta-analysis for each imaging methodology and treatment group, showing the spread of reported peak systolic CSF velocities. This meta-analysis shows peak CSF velocities are elevated in Chiari malformation compared to healthy subjects and the axial position of greatest CSF velocity elevation is most commonly reported at the FM -C1 vertebral level (Fig. 1). However, the standard deviation of peak CSF velocities is considerable compared to group differences and this variance makes specification of a diagnostic threshold for patients versus controls difficult. Notably, several studies included in the meta-analysis had Chiari cohorts with syringomyelia, which is known to affect CSF dynamics [14]. The comorbidity of Chiari and syringomyelia complicates the assessment of Chiari CSF dynamics and requires further investigation to accurately describe the contributions of Chiari and syringomyelia to the CSF dynamics.
Reproducibility and repeatability of CSF velocity measurements, measured in cm/s for individual voxels collected for a region of interest at the craniocervical junction, have not been specifically investigated. A number of studies have been conducted on the reliability of arterial hemodynamics using 4D Flow [30] and 2D PC MRI [31,32] measurements. However, arterial flow velocities are typically one order of magnitude greater than CSF velocities. Thus, the reproducibility/ repeatability results from these arterial hemodynamics studies are difficult to apply for CSF velocities. Repeatability of 2D PC MRI CSF and cerebral blood flow (mm 3 /s) measurements have been investigated and shown to have moderate in vivo test-retest repeatability [33]. In that study, the authors did not quantify reliability of CSF velocity measurement (cm/s) that has been a focus of interest for CSF-based Chiari malformation diagnostic tests. Repeatability of in vivo 2D PC MRI measurements of CSF flow at the aqueduct of Sylvius has been examined and found to have moderate repeatability [33]. However, aqueductal CSF velocities are typically greater than at the craniocervical junction. Also, the CSF space geometry at the craniocervical junction is more complex than the tube-shaped aqueductal geometry. The craniocervical junction anatomy is an annulus shape that contains spinal cord nerve roots, neuroaxis curvature, and tonsillar descent in Chiari malformation patients. Poor 4D Flow accuracy has been found during timeframes corresponding to low CSF flow rate [34], but further research is necessary before clinical application is feasible. While many studies have previously quantified repeatability and operator effects for PC MRI hemodynamic and cerebral blood flow characterization [35][36][37][38][39][40][41][42], few studies have quantified these parameters for PC MRI CSF dynamics characterization (Table 2). Overall, these previous CSF dynamics studies are stratified into focuses on the cerebral aqueduct, the spinal subarachnoid space (SAS), and the C2-C3 area and are summarized in Table 2. These studies consistently reported strong intra/inter operator agreement and peak velocity measurements are independent of the operator, therefore intra/inter-operator effects are null in this context and were not investigated. A study by Tawfik et al. [43] detailed 2D PC MRI measurement repeatability at the cerebral aqueduct and reported a peak velocity standard deviation of 1.9 cm/s, which is comparable to the 1.83 cm/s peak velocity standard deviation we found in the cervical spine (Table 2). In vivo studies by Sakhare Error bars represent pooled reported standard deviation for studies included in each group. The total number of healthy and Chiari malformation patient studies included is 91 and 166, respectively (see Table 1 for individual values). FM = foramen magnum  [33] and Luetmer et al. [44] reported standard deviations of 2D PC MRI CSF flow between 0.04 and 0.98 mL/s but did not look at peak velocity values. Pahlavian et al. [34] performed an accuracy study on 4D Flow quantification of CSF dynamics using a 3D printed in vitro model similar to the one used here and found fairly high accuracy (95% CI ± 1.8 cm/s, Table 2) but did not quantify repeatability nor reproducibility of measurements. These accuracy results from Pahlavian et al. were of similar range as the reproducibility results of this study. To our knowledge, this is the first study to specifically detail the agreement between 2D PC MRI and 4D Flow quantification of peak CSF velocities and characterize reproducibility of measurements across different scanners.
The large variance in CSF velocities reported in Chiari malformation patients versus controls ( Fig. 1) in literature is likely due to the wide range in PC MRI acquisition methods and post-processing techniques. Factors contributing to inconsistency in PC MRI measurement results can be summarized as follows: (1) human error introduced by operator region of interest selection and variance of measurement location particularly with 2D techniques [45], (2) inconsistency in eddy current offset correction [34], (3) spatial resolution of MRI slices [8], (4) temporal resolution of number of phases sampled per cardiac cycle [46], (5) transient impact of respiration on time-average CSF flow measured by PC MRI [18,21,22] (6) Orientation of the neck angulation [47], (7) normal physiological variance in CSF flow [48], (8) noise and other imaging artefacts generated from subject motion in the MRI scanner [34,45,47,49], (9) respiration-induced B 0 variations [50].

Study design
Experiments were performed using an in vitro subject-specific CSF flow model of a Chiari malformation patient that was tested at five different MRI scanners at four different scanning centers. The centers were physically located as follows: Center 1, University Hospital in Cologne Germany (3T Achieva, Philips Healthcare, Best, Netherlands); Center 2, Emory University in Atlanta, Georgia, U.S.A (Siemens 3T PrismaFit, Atlanta, Georgia, U.S.A); Center 4, University Hospital in Basel Switzerland (3T, MAGNETOM Prisma, Siemens Healthcare, Erlangen, Germany); Centers 3 and 5 were both located at University Hospital in Lausanne Switzerland (3T Pris-maFit and 3T Tim Trio, respectively, Siemens Healthcare, Erlangen, Germany). To quantify repeatability, the flow model was scanned three times at each center using both 2D PC MRI immediately followed by 4D Flow MRI. To quantify reproducibility, results were compared across the five centers. Agreement between 2D PC MRI and 4D Flow CSF velocity measurements were also quantified. Results were statistically analyzed within and across MRI centers and between measurement techniques using a linear mixed effects model.

Subject specific in vitro csf flow model and experimental set-up
To control a consistent CSF flow waveform and anatomic shape across MRI measurement centers, we utilized a computer-controlled in vitro model CSF flow system previously developed by our research group [51] (Fig. 2a). The model was designed based on T2-weighted anatomical MRI data collected for a five-year-old Chiari malformation patient with 6.8 mm cerebellar tonsillar descent below the foramen magnum (FM), as described in Bunck et al. [14]. The spinal subarachnoid space was manually segmented from the medulla to the upper thoracic spine based on the T2-weighted images. Dorsal and ventral spinal cord nerve rootlets (NR) were added to the model segmentation based on ex-vivo anatomic measurements of nerve root location, radicular line, and descending angle. The model was printed by stereolithography with a spatial resolution of 75 µm (see Fig. 2b for model dimensions).
4D Flow images were acquired to quantify the subjectspecific CSF flow waveform in the same Chiari malformation patient. CSF flow rate as a function of time was quantified based on a region of interest located at the C2-C3 vertebral level. This waveform was input to an inhouse designed computer-controlled oscillatory syringe pump with pulse-trigger output (for MRI cardiac gating). To allow MRI scanning, the syringe output was connected to the in vitro models via polyethylene tubing. The pump was positioned outside of the scanner operating room with tubing connected to the in vitro model through the waveguide. Tubing was taped to the floor and scanner bed during operation to minimize tubing movement/vibrations during operation. Complete details on the in vitro system dimensions and characterization are provided by Thyagaraj et al. [51]. Scanning was repeated three times at each location. 4D Flow measurements from MRI machines are prone to eddy current offsets arising from non-uniformity of magnetic fields, therefore a static fluid body was placed next to the in vitro model during scanning for a post-processing eddy current offset correction. After affixing the static fluid bodies in place, each trial consisted of a 2D PC MRI scan immediately followed by a 4D Flow scan. Between subsequent trials, the model was manually repositioned by approximately a few centimeters within the scanner bed to mimic realistic conditions in clinics. This repositioning was to mimic the altered position that may occur if a human subject were to be re-scanned in the scanner bed with slightly different body orientation.

In vitro imaging protocol
Imaging parameters were chosen to represent standard clinical procedures such that these results best represent the repeatability and reproducibility seen clinically.
4D Flow and 2D PC MRI images were collected at each center using the following settings (Table 3), adapted from a previous protocol [51]. We sought to have identical imaging parameters applied across all MRI machines and across the 4D Flow and 2D PC MRI protocol. In brief, 4D flow datasets were collected in the sagittal orientation with velocity encoding of 15 cm/s, prospective gating, 16 phases per cardiac cycle leading to a temporal resolution of 30 ms, repetition time (TR) of 7.5 ms, echo time (TE) of 4.6 ms, flip angle (FA) = 5°, with 1.5 mm isotropic resolution. Prospective gating of the model was based on the heart rate recorded in conjunction with the subject specific waveform collected for the computercontrolled model.
2D PC MRI data was collected at nine axial slice positions along the model located as shown in Fig. 3 with distance between axial planes in Table 4. Total imaging time was approximately 15 min for the 4D Flow protocol and ~ 30 s for each 2D PC MRI scan. Slice positions relative to one another (i.e. foramen magnum to C1 vertebral level) was set to be identical across all MRI centers. The 4D Flow acquisition covered the entire region where 2D PC MRI slices were located.

MRI post-processing
Both 4D Flow and 2D PC MRI data were post-processed using GTFlow software (version 2.2.4, Gyrotools Inc, Zurich, Switzerland) by a single person at a center core lab. An eddy current offset correction was applied based on the static fluid body placed next to the in vitro model, to offset errors arising from non-uniformity of the magnetic field [52]. The flow field was also inspected and corrected for any aliasing artefacts when present. 2D PC MRI velocity data at each of the nine axial positions was exported as Matlab (version R2014b, Mathworks Inc, Natick, MA) readable files for quantitative comparison of CSF velocities. At each 2D PC MRI slice position, a 4D flow slice was selected and also exported to Matlab. To quantify peak systolic CSF velocity, first the phase corresponding to peak systole was identified using the maximum spatially averaged velocity of all pixels with non-zero velocities, defined as follows: where i represents phase number, N represents total number of non-zero velocities, and V n represent the thru-plane CSF velocity for the respective pixel. The peak systolic value was then measured as the pixel within the phase of peak systole having the greatest velocity value.

Statistics
Because trial, scanning center, and scan type could have significant effects, we developed the following linear mixed-effects model for each replicate:   Table 4. A = anterior, P = posterior, S = superior, I = inferior where y is the velocity measurement along the spine, xs are binary covariates, βs are the fixed effects, and zs the random effects. Specifically, x 1 indicates whether the treatment group is 4D Flow MRI or not, each of the x k s with k= 2,⋯,5 indicates whether the measurement was taken at the jth scanning center, and each of the x k s with k= 6,⋯,13 indicates whether the measurement was taken at one of the eight axial positions (C1, C2M, C2B, C3, C4, C5, C6, and C7). In this model, β 0 represents the baseline, which is the mean velocity measurement from 2D PC MRI at scanning center #1 at the FM position along the spine. In other words, this model estimates the difference between another scanning center and Center 1, as well as between another axial position and FM. The baseline may also be the overall mean, or another center or axial position. Our analysis aims to test whether the regression coefficient is significantly different from 0; this is the same hypothesis no matter which baseline is chosen. Additionally, z represents random effects of the scanning centers and axial slice position (note that the treatment of 4D versus 2D is assumed to be a fixed effect and not included in the random effects), which follow a multivariate normal distribution with mean of zero and a symmetric variance-covariance matrix: where z = (z 1 , ⋯, z 13 ) is the column vector of all the random effects, 0 is a vector of zeros, and ∑ is the variance-covariance matrix. We used the Matlab (Ver. 2019a Mathworks Corp., Natick, MA) function "fitlme" to estimate the parameters in this linear mixed-effects model and test whether each of the fixed effect sizes is significantly different from zero. If so, this would indicate a statistically significant impact on the parameter from treatment groups, scanning centers, or axial position of velocity measurements. Using this linear mixed-effects model, we obtained p-values for the following 14 fixed effect sizes: the baseline (Center 1), scan type (4D Flow MRI or not), scanning centers (Centers 2-5), axial position of measurement (C1, C2M, C2B, C3, C4, C5, C6, and C7). We accounted for multiple comparisons by applying the Bonferroni correction where the threshold for significant p-values was adjusted to be α/14, where α is the experimentwise type I error rate.

Results
MR images were collected over three trials at five scanning centers using 2D PC MRI and 4D Flow. Trial 3 at Center 4 and trial 2 at Center 5 were excluded from analysis due to a bubble detected in the entrance tubing during scanning; all other scanning centers (Centers 1, 4, and 5) had three successful trials for each imaging modality that were included in analysis.

Agreement of CSF velocity detection by 4D flow versus 2D PC MRI
Our statistical analysis concluded that 4D Flow and 2D PC MRI are comparable methods for CSF velocity measurements at any scanning center and for any vertebral position. No evidence was found indicating disagreement (p = 0.86, Table 5) and there was moderate agreement seen in the Bland Altman Plot (Fig. 4). In all, there was an average difference of 0.02 cm/s between measurements of each scan type with a 95% confidence interval (CI) of −0.28 to 0.24 cm/s (Table 5) and a maximum difference of 2.9 cm/s (Fig. 4). No individual center had perfect agreement between 4D Flow and 2D PC MRI values. While the measurements showed no discernable trend relating to axial position of measurement, relative clusters formed for each scanning center showing that scanning center likely effects velocity measurement. Notably, a linear trend arose wherein the average velocity and difference between imaging modalities linearly decreased from Center 1 to Center 5, sequentially.

Repeatability
Repeatability within centers was relatively consistent with confidence intervals less than ± 2 cm/s (15% of the average measured value of 14 cm/s), (Fig. 5). 4D Flow and 2D PC MRI had similar degrees of repeatability, with some centers showing potentially more consistency of 2D PC MRI measurements and some showing better consistency of 4D Flow measurements. Comparatively, Center 2 showed the greatest degree of repeatability (STD = 0.87 cm/s, Table 6), Center 1 showed the worst degree of repeatability (STD = 1.50 cm/s, Table 6), and Centers 3-5 had relatively moderate repeatability (STD = 1.06 cm/s, 1.18 cm/s, and 1.25 cm/s, respectively, Table 6).
Each center appeared to have a relative offset value of measurements, indicating a calibration factor may be useful in future comparative studies of PC MRI measurement values.

Discussion
This study quantifies agreement, repeatability, and reproducibility of 2D PC MRI and 4D Flow characterization techniques for the measurement of CSF flow velocities at the craniovertebral junction in Chiari malformation. We found that agreement between 2D PC MRI and 4D Flow was good, repeatability within any one scanner was fair, and reproducibility across centers was poor. An anatomically realistic in vitro CSF flow model was used to conduct experiments performed at five MRI scanning centers. Peak systolic velocities were found to range from 8.3 to 17.3 cm/s, which falls within the range of values reported in Chiari malformation patients (Table 1).

Table 5 Effect sizes and corresponding p values estimated from the linear mixed-effects model for velocity measurements
The mean effect size is provided, along with the 95% confidence interval (CI). We used Bonferroni correction to account for multiple testing.

Agreement
Peak systolic velocity values for 2D PC MRI and 4D Flow had overall good agreement for all centers analyzed with an average difference of 0.02 cm/s with 95% CI of −0.28 To 0.24 cm/s (Table 5). This finding supports that either technique can be used within a scanning center and the results would be comparable within exact slices. In clinical practice, a specific slice location is required for 2D PC MRI, while the slice location to be analyzed with 4D Flow is selected after image acquisition by re-slicing of the data. This provides added flexibility for analysis of CSF peak velocities that is not possible using 2D PC MRI. Our approach aimed to acquire 2D PC MRI and 4D Flow with similar spatial and temporal resolution (Table 3). However, it was not possible to identically match all scanner parameters which may have led to some differences in results across protocols. Notably, variance across measurements was greater in 4D Flow results than 2D PC MRI (STD = 1.83 cm/s and 1.04 cm/s, respectively, Table 6). 4D Flow datasets seem to be closer to zero on average than the 2D PC MRI datasets yet the 2D PC MRI data has a narrower range of values than the 4D Flow data. That is to say, 2D PC MRI had less variance overall than 4D Flow across centers but failed to accurately estimate the mean as well as 4D Flow, therefore indicating greater precision and less accuracy in 2D PC MRI than 4D Flow measurements (Fig. 5). Without a "Gold Standard" known peak CSF velocity in the in vitro model, the underlying factor leading to this variance requires further research. This technique-based measurement variance can be seen in Fig. 4, where all measurements lie within ± 3 cm/s. Here, the overall good agreement between the techniques is apparent, but the relative clustering of values based on scanning center reveals an important insight into the reproducibility and repeatability of techniques. These center-based clusters could be due to scanner-specific effects at each center, wherein each scanner has a quantifiable effect on the measurements it makes. With a more focused research study, these scanner-effects can be understood and potentially mitigated by use of a standardized scanner calibration technique.

Repeatability
Repeatability of measurement values within any scanner was dependent on each individual scanning center. This variance could be due to axial slice location relative to the model anatomy as peak velocity can vary significantly across the caudal brain and cervical spine. The difference in peak CSF velocity across axial positions was found to be significantly different from the foramen magnum (FM) baseline in four of eight locations (p < 0.05/14 = 0.0036, Table 5). Therefore, some variance is expected in the model and will likely be even greater in vivo. Figure 5 provides a visual depiction of the repeatability of either technique in each center where each measurement value was subtracted from the average peak systolic CSF velocity across axial positions for each center. Specifically, the horizontal bars across each box represent the median of each dataset; the closer this median bar is to zero, the better the repeatability within that center for that scanning technique.

Reproducibility
Overall, the most important factor leading to measurement inconsistency in our study was lack of reproducibility across MRI scanning centers. Figure 6 shows that across axial positions, each center tended to have a relative offset based on the specific scanner used. In general, Center 1 reported the highest values for peak systolic CSF velocity followed by each other center sequentially, with Center 5 generally having the lowest reported peak systolic velocity values. This scanner-specific relative offset could be indicative of systemic difference across scanners. As mentioned above, this relative offset at each center is an important source of variance between scanning centers and could potentially be corrected by a standardized calibration procedure. This variance between scanning centers could potentially be due to scanner specific field inhomogeneity, eddy current generated during scanning, and/or inconsistency of the in vitro experimental set up. It is possible to eliminate eddy current offsets in 4D Flow scans by use of a zero-flow condition, whereby the resultant velocity field from that measurement can be applied for correction, but was not done here as this research is clinically oriented and sought to mimic in vivo conditions. Therefore, variance due to eddy current offsets during scanning are expected to be representative of those found in a clinical setting. We sought to reduce the effect of scanner specific field inhomogeneity with postprocessing techniques, but it is to be expected in every clinical setting that there will be local magnetic field inhomogeneities and/or gradient imbalances that could be inconsistent over the whole field of view. To mitigate any potential experimental inconsistency, experiments were conducted with identical conditions across all centers including use of identical tubing, fittings, and computer controlled oscillatory pump and identical control waveform (see "Methods"). Additional details on the in vitro system are also provided by Thyagaraj et al. [51]. Further, reproducibility varies slightly at different axial positions of imaging. This reproducibility is exaggerated at lower vertebral positions, with statistically significant differences at the C3, C4, C6, and C7 positions. Notably, C7 had the greatest significance (p = 9.5 × 10 -12 < 0.05 /14 = 0.0036) and the largest effect size (−2.17, −1.24) of any vertebral position. At higher axial positions, specifically the FM-C2 levels, the difference is not significant. Therefore, we do not believe vertebral position contributed greatly to the lack of reproducibility.

Case study-comparison of centers 2 and 3
Centers 2 and 3 utilized the same machine and provide an interesting case study, therefore a secondary statistics model was utilized wherein Center 2 was used as the reference rather than Center 1 (Additional file 1: Table S1). A statistically significant difference was found between Center 2 and Centers 4 and 5 (p = 1.4 × 10 -6 and 4.1 × 10 -22 , respectively); no significant difference was found between Center 2 and Centers 1 and 3 (p = 0.27 and 0.11, respectively). These results show that while Centers 2 and 3 utilized the same machine and had a small amount of clustering (Fig. 4), Center 2 was most similar to Center 1, therefore utilizing the same type of machine does not guarantee how similar results will or will not be. Based on this, scanner calibration procedures should potentially be developed based on individual scanning machines.

Relevance of findings to clinical diagnostics for chiari malformation
A meta-analysis of similar studies in literature and previous investigations of healthy and Chiari CSF dynamics reveal important insights for the clinical application of novel PC MRI peak velocity quantifications in the cervical spine. Figure 1 shows that these previous investigations of CSF dynamics reported consistently elevated peak CSF velocities in Chiari patients compared to healthy controls at every vertebral level, with a maximum difference of 6.9 cm/s at the C1 position. This difference points towards an underlying physiology of Chiari malformation at the C1 vertebral position that could be leveraged for improved diagnostics pending reliable detection, which requires disagreement between groups to be less than the effect size. Good agreement between 2D PC MRI and 4D Flow measurements indicates both methods would be acceptable in clinical use to characterize CSF dynamics. We also found intra-scanner repeatability of either measurement type to be good, but inter-scanner reproducibility was poor. This lack of reproducibility may help us understand previous studies with conflicting results regarding Chiari CSF dynamics. Mitigation of the lack of reproducibility across centers could be achieved with a standardized calibration procedure such as generating scanner specific reference values for healthy volunteers.

Limitations
Several limitations have been identified within this study, the use of an in vitro model being the primary limitation. To better understand each parameter and its specific effects on measurement variability, a simple model with an analytical solution could be investigated in future studies with varying levels of complexity, though this was not done here as the focus of this research was for Chiari Malformation applications. We utilized an in vitro subject specific model of a pediatric Chiari patient, but in vivo studies are needed to understand the full range of physiologically-rooted variability that can occur such as the impact of respiration, movement artefacts, etc. The use of a pediatric Chiari patient for a subject specific model results in data that is not representative of all conditions and individual anatomies, limiting the application of these results to adult populations and other disease populations. The precise 3D flow field in the in vitro models has not been validated for any specific Chiari patient. This model used one representative flow waveform to control the oscillatory pump, which introduces further specificity of these results and limits a broader application. Importantly, the use of a rigid model here is not realistic to in vivo situations. Rigid model structures do not model tissue properties therefore flow field characteristics of such models cannot be accurately assessed and must be taken into consideration in the application of results shown here. Future studies to detail and develop an in vitro model of these tissue properties are necessary. Depending on the sensitivity of the measurements, the location of the imaging plane can cause error in the results and introduce an operator bias in the data. Operator dependence has been detailed by previous studies [45,53] and therefore was not included here as a parameter of interest.
This study focused on measurement agreement, repeatability, and reproducibility and did not quantify a "Gold Standard" measurement for quantification of accuracy. While we sought to set up identical experiments at each center for each trial, the computer-controlled oscillatory pump can only control the waveform input to the inlet. Although tubing was relatively rigid, the exact waveform at the model outlet cannot be known without independent quantification. The exact waveform could be quantified by independent measurement of CSF velocities by flow measurement with laboratory bench-top devices but was outside the scope of this study. Finally, the use of computational flow dynamics could be used to further characterize velocity field errors and an accuracy based on the input flow waveform. Computational flow dynamics have been detailed in a study previously done by our group [34] and therefore were not included here.

Conclusion
A patient-specific in vitro model of Type I Chiari malformation was used to quantify agreement, repeatability, and reproducibility of 2D PC MRI and 4D Flow quantification of peak CSF velocities. The single greatest factor leading to measurement inconsistency of peak CSF velocities was lack of inter-scanner reproducibility. Taken in combination, the results help identify sources of error that can be improved to allow better application of CSF velocity detection for medical diagnostic purposes. Overall, both 2D PC MRI and 4D Flow techniques show promise as diagnostic tools to quantify CSF dynamics in Chiari malformation.