Brain volume loss in individuals over time: Source of variance and limits of detectability

BACKGROUND
Brain volume loss measured from magnetic resonance imaging (MRI) is a marker of neurodegeneration and predictor of disability progression in MS, and is commonly used to assess drug efficacy at the group level in clinical trials. Whether measures of brain volume loss could help guide management of individual patients depends on the magnitude of the changes over a given interval relative to physiological and technical sources of variability.


GOAL
To understand the relative contributions of neurodegeneration vs. physiological and technical sources of variability to measurements of brain volume loss in individuals.


MATERIAL AND METHODS
Multiple T1-weighted 3D MPRAGE images were acquired from a healthy volunteer and an MS patient over varying time intervals: 7 times on the first day (before breakfast at 7:30 AM and then every 2 h for 12 h), each day for the next 6 working days, and 6 times over the remainder of the year, on 2 Siemens MRI scanners: 1.5T Sonata (S1) and 3.0T TIM Trio (S2). Scan-reposition-rescan data were acquired on S2 for daily, monthly and 1-year visits. Percent brain volume change (PBVC) was measured from baseline to each follow-up scan using FSL/SIENA. We estimated the effect of physiologic fluctuations on brain volume using linear regression of the PBVC values over hourly and daily intervals. The magnitude of the physiological effect was estimated by comparing the root-mean-square error (RMSE) of the regression of all the data points relative to the regression line, for the hourly scans vs the daily scans. Variance due to technical sources was assessed as the RMSE of the regression over time using the intracranial volume as a reference.


RESULTS
The RMSE of PBVC over 12 h, for both scanners combined, ("Hours", 0.15%), was similar to the day-to-day variation over 1 week ("Days", 0.14%), and both were smaller than the RMS error over the year (0.21%). All of these variations, however, were smaller than the scan-reposition-rescan RMSE (0.32%). The variability of PBVC for the individual scanners followed the same trend. The standard error of the mean (SEM) for PBVC was 0.26 for S1, and 0.22 for S2. From these values, we computed the minimum detectable change (MDC) to be 0.7% on S1 and 0.6% on S2. The location of the brain along the z-axis of the magnet inversely correlated with brain volume change for hourly and daily brain volume fluctuations (p < 0.01).


CONCLUSION
Consistent diurnal brain volume fluctuations attributable to physiological shifts were not detectable in this small study. Technical sources of variation dominate measured changes in brain volume in individuals until the volume loss exceeds around 0.6-0.7%. Reliable interpretation of measured brain volume changes as pathological (greater than normal aging) in individuals over 1 year requires changes in excess of about 1.1% (depending on the scanner). Reliable brain atrophy detection in an individual may be feasible if the rate of brain volume loss is large, or if the measurement interval is sufficiently long.


Introduction
Brain atrophy measured from magnetic resonance imaging (MRI) scans is an important in vivo marker of neurodegeneration and predictor of disability progression in MS (Rudick and Fisher, 2013;Sormani et al., 2013), and is commonly used to assess the efficacy of disease-modifying drugs at the group level in clinical trials. It also would be advantageous to use measures of brain atrophy to assess response to therapy in individual patients, if the measurements could be made sensitively and precisely enough.
Image processing techniques exist to measure changes in brain volume that have shown high sensitivity and precision, on average, in experiments where a group of subjects are scanned, repositioned, and immediately rescanned. However, the fluctuations in measured brain volume seen on repeated scans of individuals can be much greater than the mean magnitude of brain volume fluctuations seen on scan-reposition-rescan experiments for a group (Biberacher et al., 2016), and can be on the same order as the change in brain volume expected in MS patients over one year. Since changes of this magnitude can be induced over the course of a few hours in dehydration/rehydration experiments (Duning et al., 2005; Kempton et al., 2009; Nakamura et al., 2014a), it is plausible that normal physiological fluctuations in brain water content over the course of a day (e.g. before vs. after meals, before or after coffee consumption, morning vs. evening) could dominate short-term fluctuations (on the time scale of hours) of brain volume in individuals (Sampat et al., 2010; Nakamura et al., 2015). In addition, disease-related processes other than irreversible tissue loss, such as development or resolution of focal or diffuse inflammation in a person with MS, could contribute to changes in brain volume in individuals on the time scale of days to months (Sampat et al., 2010; De Stefano and Arnold, 2015; Rudick et al., 1999). Alternatively, technical sources of fluctuations in brain volume due to drifts in gradient linearity, subtle differences in contrast due to varying radiofrequency inhomogeneity and differences in subject positioning (Cover et al., 2011) could be the main contributors to measured fluctuations in brain volume in individuals over these time frames. Technical factors such as gradient linearity may, in turn, be different from scanner to scanner.
The primary objective of this study was to clarify the magnitude of physiological changes in brain volume, which may impede the use of this measurement to assess neurodegeneration in individual patients over intervals of 1 year or less.

Study design
This was a pilot, observational MRI study performed on two subjects, one healthy volunteer (male, age = 36.4 years) and one MS patient (male, age = 35.3 years), in whom we measured changes in brain volume on images acquired on two different scanners over intervals varying from a few minutes to hours/days/months. The study was approved by the Research Ethics Board of the Montreal Neurological Institute and Hospital, and informed consent was obtained from both participants.

Data collection procedures
We performed MRI scans before breakfast at 7:30 AM and then every 2 h for 12 h, including before and after each meal (7 scans). We then scanned the subjects at the same time of day/prandial state (7:30-8:00 AM, before breakfast) each day for 6 more working days (6 scans), once per month for 5 months (5 scans), and again at 1 year (1 scan) for a total of 19 longitudinal sessions per subject. Subjects were scanned on two scanners in the McConnell Brain Imaging Centre of the Montreal Neurological Institute: a 1.5T Siemens Sonata, VA35 software, 8-channel coil (Scanner 1) and a 3T Siemens TIM Trio, VB17, 32-channel coil (Scanner 2). The following MRI sequences were acquired: (1) Alzheimer's Disease Neuroimaging Initiative (ADNI)-protocol, 3D MPRAGE, high-resolution (1 × 1 × 1 mm³ voxel size) T1-weighted sequence, (2) 3D gradient-recalled echo, T1-weighted sequence at a typical clinical/clinical trials resolution (1 × 1 × 3 mm³ voxel size), (3) dual-echo, turbo spin-echo sequence yielding proton density (PD)-weighted and T2-weighted images (both 1 × 1 × 3 mm³), and (4) turbo spin-echo FLAIR (1 × 1 × 3 mm³). The acquisition time was approximately 30 min per session. The scanner manufacturer-provided distortion correction feature was enabled on the 3T TIM Trio, but a corresponding feature was not available on the older VA35 software of the Sonata. For each exam, care was taken to place the subject in a consistent position with respect to the center of the magnet (isocenter), using external landmarks and the scanner's positioning lasers. After the intensive first day, we acquired scan-reposition-rescan data on Scanner 2 for daily, monthly and 1-year scans, for an additional 12 scanning sessions. In these rescan sessions, after the first set of scans were acquired, the subjects were removed from the scanner and then put back in to repeat the scans.
This provided data on the short-term technical reproducibility of the measurements, including effects due to repositioning and scanner tuning, since physiological fluctuations would be negligible over this time frame (eating or drinking between scan and rescan sessions was not permitted).

Data analysis
MRI images were preprocessed using our longitudinal pipeline, including bias-field correction with N3 (Sled et al., 1998), registration to a standard coordinate space (Nakamura et al., 2014b), and correction of differences in intensity nonuniformity between visits using N3. Whole brain atrophy measurements were made on the N3-intensity-corrected, T1-weighted, 3D MPRAGE images. Percent brain volume change (PBVC) was calculated from baseline to each follow-up scan using the Structural Image Evaluation, using Normalisation, of Atrophy (SIENA) software (Smith et al., 2002), version 2.6, part of the FMRIB Software Library (FSL) (Jenkinson et al., 2012), version 5.0.4. Intracranial volume (ICV), a measure of head size independent of fluctuations in brain volume due to pathology or physiology in adults, was measured using segmentation with BEaST (Eskildsen et al., 2012), and used as a measure of volume change due to instrumental factors. The position of each subject's head relative to the scanner isocenter was estimated for each scan using the z-coordinate of the brain centre (predefined as the centre of mass between the thalamus) relative to the scanner isocentre z-coordinate (the origin, defined as 0). The error in repositioning each follow-up scan relative to the baseline reference scan was defined as the difference in the z-coordinate of the brain centre between the two scans (z-shift).
We estimated the effect of physiologic fluctuations on brain volume using linear regression of the PBVC values over hourly and daily intervals. The magnitude of the physiological effect was estimated by comparing the root-mean-square (RMS) error of the regression, i.e. the square root of the mean squared difference (visualized as the vertical distance on the PBVC axis) of all the data points relative to the regression line, for the hourly scans vs the daily scans. Variance due to technical sources was similarly assessed as the RMS error of the regression over time using the intracranial volume as a reference. The effect of head position on PBVC was assessed by correlating z-shift with PBVC from short-term (hourly and daily) MRIs.
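As a rough illustration of the RMS error described above, the following Python sketch fits an ordinary least-squares line to a series of PBVC values and returns the square root of the mean squared vertical residual. The function name and the sample values are hypothetical and are not study data; the analysis code actually used in the study is not given in the text.

```python
from math import sqrt


def regression_rmse(times, values):
    """RMS of the vertical residuals of `values` about their least-squares line.

    The mean squared residual divides by the number of points (n), matching
    the "square root of the mean squared difference" wording in the text.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    # Ordinary least-squares slope and intercept.
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    residuals = [v - (slope * t + intercept) for t, v in zip(times, values)]
    return sqrt(sum(r * r for r in residuals) / n)


# Hypothetical PBVC values (%), for illustration only (not study data).
hours = [0, 2, 4, 6, 8, 10, 12]
pbvc_hourly = [0.00, 0.10, -0.05, 0.12, -0.08, 0.05, -0.02]
print(f"RMS error about the regression line: {regression_rmse(hours, pbvc_hourly):.3f}%")
```

Applying the same function to the hourly and to the daily series would give the two numbers that the comparison in the text relies on.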
The minimal detectable change (MDC) of brain volume was calculated as $\mathrm{MDC} = 1.96 \times \sqrt{2} \times \mathrm{SEM}$, where SEM is the standard error of the mean. The SEM was calculated in R (R Core Team, 2016) as the square root of the residual variance of a linear mixed-effects model, where the model included a fixed effect and a random subject effect. The MDC can be described as the size of change below which there is more than a 95% probability that no real change occurred.
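The MDC arithmetic can be sketched in Python as follows. This sketch assumes the conventional form MDC = 1.96 × √2 × SEM (95% confidence, with √2 accounting for measurement error in both scans of a pair); that form is an assumption here, although it does reproduce the MDC values reported in the Results from the reported SEM values. The function name is illustrative.

```python
from math import sqrt

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def minimal_detectable_change(sem, z=Z_95):
    """MDC under the assumed conventional formula z * sqrt(2) * SEM."""
    return z * sqrt(2) * sem


# SEM values reported in the Results for each scanner.
for name, sem in (("Scanner 1", 0.259), ("Scanner 2", 0.215)):
    print(f"{name}: MDC = {minimal_detectable_change(sem):.1f}%")
# prints:
# Scanner 1: MDC = 0.7%
# Scanner 2: MDC = 0.6%
```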

Results
The PBVC measured from baseline is plotted in Fig. 1 over three time scales (hours, days and months) for both subjects (healthy volunteer and patient with MS) on both Scanner 1 (1.5T Sonata) and Scanner 2 (3T TIM Trio). The plots graphically depict the extent of the variability in PBVC over the three time scales. Table 1 shows the variability of PBVC in terms of RMS error from the regression line over various intervals. The RMS error over one day, for both scanners combined ("Hours", 0.148%), was similar to the day-to-day variation over 1 week ("Days", 0.139%), and both were smaller than the RMS error over the course of the year ("Months", 0.213%). All of these variations, however, were smaller than the scan-reposition-rescan RMS error (0.32%). Thus, technical sources of variability dominated the measurements, precluding any measurement of brain volume fluctuations consistent with physiological changes.
The variability of PBVC for the individual scanners followed the same trend, with Scanner 2 (the 3T TIM Trio) showing less variability than Scanner 1. The standard error of the mean (SEM) for PBVC was 0.259 for Scanner 1, and 0.215 for Scanner 2. From these values, we computed the minimum detectable change (MDC) to be 0.7% on Scanner 1 and 0.6% on Scanner 2. Table 2 shows the mean intracranial volumes (ICV) and their standard deviations for each subject on each scanner. The mean ICVs measured on the two scanners were very similar, with the measurements at 3T (Scanner 2) being slightly smaller (by 0.6% and 1.8% for Subject 1 and Subject 2, respectively). The standard deviation on the ICV measured across all the visits was larger on Scanner 1 (3.8 ml, 0.26% of mean ICV) than on Scanner 2 (1.7 ml, 0.11% of mean ICV).
The dependence of PBVC on the position of the brain within the scanner relative to the baseline (reference) scan is plotted for hourly and daily measurements in Fig. 2. There was a significant correlation (p = 0.004) between the z-shift and PBVC, with a slope of 0.025 percentage unit change in PBVC for every millimeter difference in position of the brain centre between the baseline and follow-up scans.
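To give a sense of scale for the reported slope, the Python sketch below multiplies it by a few repositioning errors. It assumes the linear relationship holds over the range shown and reports magnitudes only; the function name and the example shifts are illustrative and are not taken from the study.

```python
SLOPE_PCT_PER_MM = 0.025  # reported slope: percentage units of PBVC per mm of z-shift


def pbvc_magnitude_from_z_shift(z_shift_mm, slope=SLOPE_PCT_PER_MM):
    """Magnitude of the PBVC difference associated with a given z-shift,
    assuming the reported linear relationship applies."""
    return abs(slope * z_shift_mm)


# Illustrative repositioning errors in millimetres (not study data).
for shift in (2, 5, 10):
    print(f"z-shift of {shift} mm: about {pbvc_magnitude_from_z_shift(shift):.3f} percentage units of PBVC")
```

Under this assumption, a repositioning error of 10 mm would correspond to about 0.25 percentage units of PBVC.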

Discussion
The first main finding of this study was that brain volume changes attributable to cyclic, time-of-day-related physiological fluctuations (which could include, for example, shifts in brain hydration state, blood volume, CSF volume or hormonal levels) were, at the individual level, below the measurement error. The RMS errors for the hourly changes over one day were similar to the RMS errors computed from daily scans performed at the same time of day over a week. In addition, both errors were below the scan-reposition-rescan variability, which isolates the variability due to measurement error. Of the many factors that can affect the stability of MRI scan data (including temporal and spatial radiofrequency stability, impacting image contrast, and gradient stability, impacting geometric distortion), the dependence of PBVC on relative brain position in the scanner strongly implicated geometric distortion due to gradient nonlinearity as the main factor leading to variability in brain volume change measurements. This is consistent with previous observations based on distortion correction using geometric phantoms (Fonov et al., 2010). Gradient linearity depends on the specific scanner make and model, as well as siting and maintenance details for individual machines. For example, of the two scanners in this study, the long-bore Siemens TIM Trio (Scanner 2, considered a "high-end", research-oriented scanner when initially purchased) has better, more linear gradients than the older, more clinically-oriented Siemens Sonata (Scanner 1). In general, scanners with long bores will have better gradient linearity than scanners with short bores; other design considerations also will have an impact. Newer scanners will have distortion correction software available; this should be used consistently when brain volume measurements are to be performed.
We previously reported a significant effect of time-of-day on the percentage change in brain parenchymal fraction (BPF) in a large clinical trial dataset of MS patients (N = 755 patients, 3269 scans) (Nakamura et al., 2015). Using mixed-effects modeling, the fixed effect for time-of-day showed a linear decline in brain volume of 0.09% over 12 h, from 7 am to 7 pm. While the linear decline with time-of-day was highly significant in 755 patients, the magnitude of the decline was well within the fluctuations of BPF seen within an individual in that study (Nakamura et al., 2015). A decline of 0.09% over 12 h is also well within the error of an individual PBVC measurement observed in our current study.
The other main finding of this work was that the minimum detectable change, below which changes of PBVC at the individual level have a 95% probability of not being real, was ±0.7% for Scanner 1 (representative of "clinical" scanners), and ±0.6% for Scanner 2 (representative of a well-maintained, high-end research scanner). While this may seem to contradict the published scan-rescan error of 0.15% for the SIENA method (Smith et al., 2002), one has to appreciate that this number from the SIENA paper represents the median absolute PBVC across a group of 16 healthy controls on scan-reposition-rescan data. While this reliability measure is relevant for group studies, it does not reflect the variability that can be seen on a single measurement from one pair of scans. In the original SIENA paper, the largest individual PBVC on scan-reposition-rescan was reported to be as high as +0.6% on images acquired with a resolution of 1 mm (Smith et al., 2001). The implication of our finding is that the presence of brain atrophy between two scans can only be reliably ascertained at the individual level when PBVC exceeds the MDC. In order to determine whether brain atrophy is in the pathological range, and use it to guide clinical decisions, the PBVC must be sufficiently large to interpret, either due to the degree of pathology or the length of the measurement intervals.
Recent work establishing cut-offs to distinguish MS-related brain atrophy from that due to normal aging suggests annualized PBVC cut-offs of −0.40% (80% specificity, 65% sensitivity), −0.46% (90% specificity, 56% sensitivity) or −0.52% (95% specificity, 49% sensitivity) (De Stefano et al., 2016). These cut-offs were determined by statistical analysis of PBVC distributions in groups of healthy controls and MS patients of similar age scanned with long-term follow-up (mean 7.5 years in MS patients and 6.3 years in healthy controls) on the same scanner, specifically to minimize variability due to physiological fluctuations and measurement errors on the annualized rates of atrophy. These cut-offs did not depend on age within the age range of the participants in that study (mean 37 years, range 18-63 years for the MS group) (De Stefano et al., 2016). Another group arrived at similar cut-offs for interpreting SIENA PBVC as atrophy in individual MS patients, also using long-term follow-up data acquired on a single scanner (Uher et al., 2019). Combining this with our MDC findings, we would conservatively propose that a PBVC measured over 1 year be interpreted as atrophy beyond that of normal aging (in the age range typical of MS patients) only if the PBVC is larger (more negative) than −1.12% on Scanner 2 and −1.22% on Scanner 1. Longer intervals improve the ability to detect atrophy because the MDC only applies to the PBVC measurement and not the time interval. Over 2 years, for example, still using the −0.52%/year cut-off, the minimum pathological change cut-off would be (−0.52 × 2) − 0.6 = −1.64% (or −0.82%/year) on Scanner 2.
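The threshold arithmetic in the preceding paragraph can be reproduced with a short Python sketch. It simply adds the MDC once to the cut-off accumulated over the interval, as the text describes; the function name is illustrative, and the inputs are the 95%-specificity cut-off and the MDC values quoted above.

```python
def pathological_pbvc_threshold(annual_cutoff, mdc, years=1.0):
    """PBVC (%) beyond which a change over `years` is read as pathological.

    The annualized cut-off scales with the interval, whereas the MDC applies
    once to the PBVC measurement itself, independent of the interval.
    """
    return annual_cutoff * years - mdc


CUTOFF = -0.52  # %/year, 95% specificity (De Stefano et al., 2016)

one_year_s2 = pathological_pbvc_threshold(CUTOFF, mdc=0.6)
one_year_s1 = pathological_pbvc_threshold(CUTOFF, mdc=0.7)
two_year_s2 = pathological_pbvc_threshold(CUTOFF, mdc=0.6, years=2)

print(f"Scanner 2, 1 year:  {one_year_s2:.2f}%")
print(f"Scanner 1, 1 year:  {one_year_s1:.2f}%")
print(f"Scanner 2, 2 years: {two_year_s2:.2f}% ({two_year_s2 / 2:.2f}%/year)")
# prints:
# Scanner 2, 1 year:  -1.12%
# Scanner 1, 1 year:  -1.22%
# Scanner 2, 2 years: -1.64% (-0.82%/year)
```

The annualized threshold moves toward the underlying cut-off as the interval grows, because the fixed MDC is spread over more years.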
Our findings agree with recent work by Opfer et al., who analyzed data from 3 reliability datasets using SIENA with optimized pre-processing and estimated a "within-patient fluctuation" (WPF) rate of ±0.54%, leading to a minimum PBVC of −1.06% (at a cutoff of −0.52% true PBVC) or −0.94% (at a cutoff of −0.4% true PBVC) to be considered in the pathological range (Opfer et al., 2018). These rates are comparable to ours, and the small remaining differences observed could be due to the specific scanners involved and/or details of the SIENA pre-processing (Biberacher et al., 2016).
However, care should be taken in trying to generalize these cutoffs, as they depend on a number of factors, and only apply to the SIENA method for assessing whole-brain volume change (Andorra et al., 2018). Modifications to the SIENA processing such as alternative brain extraction methods as well as specific SIENA settings can alter the variance of the PBVC measure (Biberacher et al., 2016) and, therefore, the MDC. Other factors that may impact the cutoffs include the age of the individual, time from initiation of a new therapy (due to the potential impact of pseudoatrophy, i.e. a loss of brain volume due to reduction of inflammation), scanner and T1-weighted imaging sequence details (Andorra et al., 2018). Thus, these cutoffs should be regarded as approximate, and the interpretation of a brain volume change near these cutoffs, at the individual level, requires local imaging experience and the clinical judgement of an informed neurologist.
Fig. 2. The location of the brain along the z-axis of the magnet relative to the baseline scan (z-shift) correlated with brain volume change for hourly and daily brain volume fluctuations (p = 0.004) with a slope of 0.025 percentage units of PBVC per mm change in z-shift. Circles represent Subject 1 (healthy control) and triangles represent Subject 2 (person with MS).

Conclusions
Consistent diurnal brain volume fluctuations attributable to physiological shifts were not detectable in this small study. Technical sources of variation dominated measured changes in brain volume in individuals until the volume loss exceeded around 0.6-0.7%. Reliable interpretation of measured brain volume changes as pathological (greater than normal aging) in individuals with multiple sclerosis requires changes per year in excess of about 1.1% (depending on the scanner, as well as the measurement technique), which means that the sensitivity of the measurement would be low over an interval of one year or less. Reliable brain atrophy detection in individual patients may be feasible if the rate of brain volume loss is large due to the type of pathology, or if the measurement intervals are sufficiently long.

Declaration of interest
Dr. Sridar Narayanan reports research grants from the Canadian Institutes of Health Research and the Myelin Repair Foundation, and personal fees from NeuroRx Research and Genentech. Dr. Kunio Nakamura reports research grants from NIH, DOD, NMSS, Biogen, Sanofi Genzyme, and Novartis, and personal fees from Sanofi Genzyme (speaking), NeuroRx (consulting), and Biogen (license royalties). Dr. Douglas Arnold reports consultant fees and/or grants from Acorda, Adelphi, Alkermes, Biogen, Celgene, Frequency Therapeutics, Genentech, Genzyme, Hoffman LaRoche, Immune Tolerance Network, Immunotec, MedDay, Merck-Serono, Novartis, Pfizer, Receptos, Roche, Sanofi-Aventis, Canadian Institutes of Health Research, MS Society of Canada, International Progressive MS Alliance, and an equity interest in NeuroRx Research.