Test–retest variability underlying fMRI measurements
Introduction
Functional Magnetic Resonance Imaging (fMRI) is a widely applied method for measuring brain activation in humans. For some purposes of fMRI, such as planning of neurosurgery (Rutten et al., 2002), the definition of phenotypes in genetic studies (Turetsky et al., 2007), or clinical trials predicting the outcome of pharmacological treatment (Chen et al., 2007), a high degree of reliability is required, meaning that differences with retesting should be minimal. However, it is well known that activation maps in the same subjects can contain substantial variation across sessions (McGonigle et al., 2000).
This is not surprising, as the fMRI signal contains not only activation-related signal (i.e. the Blood Oxygen Level Dependent (BOLD) signal) but also noise. This noise is produced both by the scanner and by human physiological processes such as heartbeat and respiration (Kruger and Glover, 2001; van Buuren et al., 2009). Because of this noise, the estimate of the true BOLD signal in a given voxel will fluctuate around the true underlying mean BOLD signal. We postulate that this true underlying BOLD signal would be revealed if one could obtain a number of scans approaching infinity during each experimental session. In regular fMRI experiments, however, the number of obtainable samples is limited, so noise in the fMRI signal is an important factor determining reliability (Bennett and Miller, 2010).
Besides noise, estimates of the underlying BOLD signal can also differ because there are true underlying BOLD signal changes between sessions. This true variation, as opposed to variation due to noise, refers to between-session signal changes that are larger than would be expected from noise alone. More specifically, we define the true variation as the variation in signal that would be measured with a number of scans approaching infinity. In this study we aim to estimate the amount of true variation. An estimate of true variation in the underlying BOLD signal yields a theoretical limit on the reliability of individual fMRI measurements. Such a limit is important not only for assessing the feasibility of future fMRI studies, but also for providing a more elaborate background for the general interpretation of fMRI results.
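To make the distinction between noise and true variation concrete, the following sketch simulates two sessions in which each voxel's estimate equals the underlying signal plus measurement noise. Because the noise contribution to the expected between-session squared difference is known, it can be subtracted to recover the true between-session variation. All values (signal mean, SDs, voxel count) are illustrative assumptions, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)

n_voxels = 20_000
true_mean = 1.0      # mean underlying BOLD signal (% change), illustrative
sigma_true = 0.15    # true between-session SD of the underlying signal
sigma_noise = 0.20   # noise SD of each session's estimate

# Underlying signal per session = mean + session-specific true change
underlying_1 = true_mean + rng.normal(0.0, sigma_true, n_voxels)
underlying_2 = true_mean + rng.normal(0.0, sigma_true, n_voxels)

# With a finite number of scans, each estimate is underlying signal + noise
est_1 = underlying_1 + rng.normal(0.0, sigma_noise, n_voxels)
est_2 = underlying_2 + rng.normal(0.0, sigma_noise, n_voxels)

# E[(est_1 - est_2)^2] = 2*sigma_true^2 + 2*sigma_noise^2, so the known
# noise variance can be subtracted to isolate the true variation
msd = np.mean((est_1 - est_2) ** 2)
sigma_true_hat = np.sqrt(max(msd / 2 - sigma_noise**2, 0.0))

# True variation expressed as a percentage of the mean underlying signal
print(round(100 * sigma_true_hat / true_mean, 1))
```

With noiseless, infinitely long sessions the subtraction step would be unnecessary; the point of the sketch is that the noise term must be estimated and removed before between-session differences can be attributed to true variation.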
We also attempt to address the nature of the variability in the underlying BOLD signal by partitioning the between-session variation into two components: global effects and spatial pattern. These components possibly have different sources. First, fMRI signals (i.e. noise and BOLD signal) can vary due to global whole-brain variations, affecting the amplitudes of BOLD responses to a similar extent across the entire brain. This type of variation scales the amplitudes of BOLD responses (and their estimates) by roughly the same factor throughout the brain, but leaves the spatial pattern of activation relatively unchanged (see Fig. 1A for a schematic representation). Second, the underlying signal can also differ because of changes in the spatial pattern of activation. The pattern of activation begins to differ when the amount of activation in one voxel changes relative to that in another, after whole-brain variation in the amplitude of activation is taken into account (see Fig. 1B for a schematic representation). Variation in the spatial pattern of activation will be assessed per brain area.
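The distinction between the two components can be illustrated with a small simulation: a purely global effect rescales every voxel by the same factor and leaves the spatial correlation between sessions at 1, whereas a pattern change lowers it. The amplitudes and noise levels below are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Session-1 activation amplitudes across voxels (arbitrary units)
pattern = rng.gamma(2.0, 1.0, 1000)

# (A) Global effect: every amplitude scales by a common factor; the
# spatial pattern (relative activation across voxels) is unchanged.
session_global = 1.3 * pattern

# (B) Pattern change: voxel amplitudes change relative to each other.
session_pattern = pattern + rng.normal(0.0, 0.5, 1000)

# Spatial correlation is insensitive to global scaling but is reduced
# by pattern change
r_global = np.corrcoef(pattern, session_global)[0, 1]
r_pattern = np.corrcoef(pattern, session_pattern)[0, 1]
```

This is why a correlation-style measure isolates pattern variability, while amplitude variability must be assessed from the scaling factor between sessions.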
Subjects performed a visual (blocked design) and a motor inhibition task (event related design) on two occasions separated by one week. We assessed the presence of global changes in the amplitude of the pattern of BOLD activation and changes in the spatial pattern of underlying BOLD activation in individual subjects. Results show that the underlying patterns of activation are relatively stable over sessions for all brain areas, while the whole brain amplitude of activation is more variable.
Background
The purpose of the analysis was to estimate the variability in the true underlying BOLD signal between sessions. We wanted to express this true underlying variability as a percentage of the mean underlying signal. We assumed that the true underlying BOLD signal would be found when we acquired an infinite number of scans during each session. However, as we extrapolate this estimate of variation in underlying BOLD from sessions that are in reality limited in duration, we also assume the ideal
SDparallel and SDorthogonal
The SDparallel and the SDorthogonal were determined for the two tasks and for different brain areas. The results can be seen in Fig. 4. The scatterplots of t-values and the fitted lines that underlie these estimates can be seen for all subjects in Supplement 1 for the visual task, and in Supplement 2 for the motor inhibition task. Results were tested univariately with one-sample t-tests and Bonferroni corrected (p = 0.05) for the number of comparisons (70, equal to the number of cortical segments),
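As a rough sketch of how such a t-value scatterplot could be decomposed into variation along a fitted line (a shared amplitude change) and orthogonal to it (a pattern change), one could project the centered session-1/session-2 t-values onto the first principal axis. The fitting procedure and names below are our illustrative assumptions and may differ from the published definitions of SDparallel and SDorthogonal:

```python
import numpy as np

def parallel_orthogonal_sd(t1, t2):
    """Split between-session scatter of voxelwise t-values into a component
    along (parallel to) and perpendicular to (orthogonal to) a fitted line.

    Illustrative sketch: the line is the first principal axis of the
    centered data, i.e. a total-least-squares fit.
    """
    X = np.column_stack([t1, t2])
    X = X - X.mean(axis=0)
    # First right-singular vector = direction of the fitted line
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    axis = vt[0]
    proj_parallel = X @ axis                         # coordinates along the line
    proj_orth = X @ np.array([-axis[1], axis[0]])    # perpendicular coordinates
    return proj_parallel.std(), proj_orth.std()

rng = np.random.default_rng(2)
t1 = rng.gamma(2.0, 2.0, 2000)             # session-1 t-values (illustrative)
t2 = 1.2 * t1 + rng.normal(0, 0.4, 2000)   # rescaled + small pattern noise

sd_par, sd_orth = parallel_orthogonal_sd(t1, t2)
```

In this synthetic case the pattern is stable but the amplitude is rescaled between sessions, so the orthogonal spread is much smaller than the parallel spread, mirroring the study's qualitative finding.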
Discussion
We estimated differences in underlying BOLD activation between sessions. The amount of variability in the underlying BOLD signal (i.e. true variation) can yield a theoretical limit of fMRI reliability of individual measurements. In this test–retest study, subjects were scanned one week apart while performing a visual and motor inhibition task. We specifically investigated variations in the spatial pattern of activation, and global changes in the amplitude of the spatial pattern of activation.
Conclusions
In summary, results from this study show that underlying patterns of BOLD activation are relatively stable across sessions, while the amplitude of the activation is more variable. The small pattern variability that we observed was caused by a general phenomenon of the most active voxels also showing the most variation, irrespective of brain area. Furthermore, this pattern variability was present mostly on a very local scale (neighboring voxels). The variability in the amplitudes (global
References (35)
- et al. Long-term test–retest reliability of functional MRI in a classification learning task. NeuroImage (2006)
- et al. Measuring fMRI reliability with the intra-class correlation coefficient. NeuroImage (2009)
- et al. Brain imaging correlates of depressive symptom severity and predictors of symptom improvement after antidepressant treatment. Biol. Psychiatry (2007)
- et al. Cortical surface-based analysis. I. Segmentation and surface reconstruction. NeuroImage (1999)
- et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage (2006)
- et al. Cortical surface-based analysis. II: Inflation, flattening, and a surface-based coordinate system. NeuroImage (1999)
- et al. Test–retest reliability of event-related functional MRI in a probabilistic reversal learning task. Psychiatry Res. (2009)
- et al. Single-trial discrimination for integrating simultaneous EEG and fMRI: identifying cortical areas contributing to trial-to-trial variability in the auditory oddball task. NeuroImage (2009)
- et al. An event-related fMRI study of the neurobehavioral impact of sleep deprivation on performance of a delayed-match-to-sample task. Brain Res. Cogn. Brain Res. (2004)
- et al. Variability in fMRI: an examination of intersession differences. NeuroImage (2000)
- Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends Cogn. Sci.
- Test–retest reliability of fMRI activation during prosaccades and antisaccades. NeuroImage
- Investigation of low frequency drift in fMRI signal. NeuroImage
- Head-repositioning does not reduce the reproducibility of fMRI activation in a block-design motor task. NeuroImage
- Striatal dysfunction in schizophrenia and unaffected relatives. Biol. Psychiatry
- Within-subject variation in BOLD-fMRI signal changes across repeated measurements: quantification and implications for sample size. NeuroImage
- How reliable are the results from functional magnetic resonance imaging? Ann. N. Y. Acad. Sci.