NeuroImage

Volume 60, Issue 1, March 2012, Pages 717-727
Test–retest variability underlying fMRI measurements

https://doi.org/10.1016/j.neuroimage.2011.11.061

Abstract

Introduction

A high test–retest reliability is of pivotal importance for many disciplines in fMRI research. To assess the current limits of fMRI reliability, we estimated the variability in the true underlying Blood Oxygen Level Dependent (BOLD) activation, by which we mean the variability that would remain in the theoretical case in which an unlimited number of scans could be obtained in each measurement.

Methods

In this test–retest study, subjects were scanned twice, one week apart, while performing a visual and a motor inhibition task. We addressed the nature of the variability in the underlying BOLD signal by separating, for each brain area and each subject, the between-session differences in the spatial pattern of BOLD activation from the global (whole brain) changes in the amplitude of that spatial pattern.

Results

We found evidence for changes in the true underlying spatial pattern of BOLD activation for both tasks across the two sessions. The sizes of these changes in pattern activation were approximately 16% of the total activation within the pattern, irrespective of brain area and task. After spatial smoothing, this variability was greatly reduced, which suggests it takes place at a small spatial scale. The mean between-session differences in the amplitude of activation across the whole brain were 13.8% for the visual task and 23.4% for the motor inhibition task.

Conclusions

Between-session changes in the true underlying spatial pattern of BOLD activation are always present, but occur at a scale that is consistent with partial volume effects or spatial distortions. We found no evidence that the reliability of the spatial pattern of activation differs systematically between brain areas. Consequently, between-session changes in the amplitude of activation are probably due to global effects. The observed variability in amplitude across sessions warrants caution when interpreting fMRI estimates of the height of brain activation. A Matlab implementation of the algorithm used is available for download at www.ni-utrecht.nl/downloads/ura.

Introduction

Functional Magnetic Resonance Imaging (fMRI) is a widely applied method for measuring brain activation in humans. For some purposes of fMRI, such as planning for neurosurgery (Rutten et al., 2002), the definition of phenotypes in genetic studies (Turetsky et al., 2007), or clinical trials predicting the outcome of pharmacological treatment (Chen et al., 2007), a high degree of reliability is demanded, meaning that differences upon retesting should be minimal. However, it is well known that activation maps in the same subjects can contain substantial variation across sessions (McGonigle et al., 2000).

This is not surprising, as the fMRI signal contains not only activation-related signal (i.e., Blood Oxygen Level Dependent (BOLD) signal) but also noise. This noise is produced both by the scanner and by human physiological processes such as heartbeat and respiration (Kruger and Glover, 2001, van Buuren et al., 2009). Because of this noise, the estimate of the true BOLD signal in a given voxel will fluctuate around the true underlying mean BOLD signal. We postulate that this true underlying BOLD signal would be revealed if one could obtain a number of scans approaching infinity during each experimental session. In regular fMRI experiments, however, the number of obtainable samples is limited, so noise in the fMRI signal is an important factor in determining reliability (Bennett and Miller, 2010).
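The averaging argument above can be made concrete with a small simulation. This is a hypothetical illustration (the signal value, noise level, and Gaussian noise model are assumptions, not the paper's data): with a finite number of scans the estimated mean fluctuates around the true value, and the standard error shrinks only as the square root of the number of scans.

```python
import numpy as np

def estimate_bold(true_bold, noise_sd, n_scans, seed=0):
    """Average n_scans noisy samples of a constant (hypothetical) BOLD amplitude."""
    rng = np.random.default_rng(seed)
    scans = true_bold + noise_sd * rng.standard_normal(n_scans)
    return scans.mean()

# With few scans the estimate fluctuates widely around the true value of 1.0;
# the standard error of the mean shrinks as noise_sd / sqrt(n_scans), so the
# true underlying signal is recovered only as n_scans grows without bound.
for n in (50, 500, 50_000):
    print(n, estimate_bold(true_bold=1.0, noise_sd=4.0, n_scans=n))
```

Typical task-fMRI noise is far from white Gaussian (it contains drifts and physiological rhythms), so this sketch only conveys the limiting behaviour, not a realistic noise model.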

Besides noise, estimates of the underlying BOLD signal can also differ because there are true underlying BOLD signal changes between sessions. This true variation, as opposed to variation due to noise, refers to between-session signal changes that are larger than would be expected on the basis of noise alone. More specifically, we define the true variation as the variation in signal that would be measured with a number of scans approaching infinity. In this study, we aim to estimate the amount of true variation. An estimate of true variation in the underlying BOLD signal yields a theoretical limit on the reliability of individual fMRI measurements. Such a theoretical limit is important not only for assessing the feasibility of future fMRI studies, but also for providing a more elaborate background for the general interpretation of fMRI results.

We also address the nature of the variability in the underlying BOLD signal by partitioning the between-session variation in underlying BOLD signal into two terms: global effects and spatial pattern. These two variability terms may have different sources. Firstly, fMRI signals (i.e., noise and BOLD signal) can vary due to global whole-brain variations, affecting the amplitudes of BOLD responses to a similar extent across the entire brain. This type of variation would thus scale the amplitudes of BOLD responses (and their estimates) by roughly the same factor throughout the brain, but leave the spatial pattern of activation relatively unchanged (see Fig. 1A for a schematic representation). Secondly, the underlying signal could also differ because of changes in the spatial pattern of activation: the pattern changes when the amount of activation in one voxel changes relative to that in another, after whole-brain variation in the amplitude of activation has been taken into account (see Fig. 1B for a schematic representation). Variation in the spatial pattern of activation will be assessed per brain area.
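The partition described above can be sketched in a few lines. The following is a hypothetical illustration, not the authors' published algorithm (which is available as Matlab code from their download page): given per-voxel activation estimates from two sessions, a single least-squares scale factor captures the global amplitude change, and the residual after removing that scaling represents change in the spatial pattern.

```python
import numpy as np

def decompose_between_session(t1, t2):
    """Split session-2 activations into a global rescaling of session 1
    plus a residual pattern change (illustrative sketch, not the authors'
    exact algorithm).

    t1, t2 : 1-D arrays of per-voxel activation estimates (e.g. t-values)
    over one brain area, from the two sessions.
    """
    # Least-squares slope through the origin: the single factor by which
    # the whole pattern is scaled between sessions (global amplitude change).
    scale = np.dot(t1, t2) / np.dot(t1, t1)
    # What remains after removing the global scaling is, under this model,
    # a change in the spatial pattern itself.
    pattern_change = t2 - scale * t1
    return scale, pattern_change

# A purely global change leaves no pattern residual:
t1 = np.array([0.5, 2.0, 4.0, 1.0])
scale, resid = decompose_between_session(t1, 1.2 * t1)
print(np.isclose(scale, 1.2), np.allclose(resid, 0.0))  # True True
```

Under this toy model, any nonzero residual after rescaling would be attributed to pattern change; the paper's Fig. 1A/1B distinction maps onto the `scale` and `pattern_change` terms respectively.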

Subjects performed a visual (blocked design) and a motor inhibition task (event related design) on two occasions separated by one week. We assessed the presence of global changes in the amplitude of the pattern of BOLD activation and changes in the spatial pattern of underlying BOLD activation in individual subjects. Results show that the underlying patterns of activation are relatively stable over sessions for all brain areas, while the whole brain amplitude of activation is more variable.

Section snippets

Background

The purpose of the analysis was to estimate the variability in the true underlying BOLD signal between sessions. We wanted to express this true underlying variability as a percentage of the mean underlying signal. We assumed that the true underlying BOLD signal would be found when we acquired an infinite number of scans during each session. However, as we extrapolate this estimate of variation in underlying BOLD from sessions that are in reality limited in duration, we also assume the ideal…

SDparallel and SDorthogonal

The SDparallel and the SDorthogonal were determined for the two tasks and for different brain areas. The results are shown in Fig. 4. The scatterplots of t-values and the fitted lines that underlie these estimates are shown for all subjects in Supplement 1 for the visual task, and Supplement 2 for the motor inhibition task. Results were tested univariately with one-sample t-tests and Bonferroni corrected (p = 0.05) for the number of comparisons (70, equal to the number of cortical segments)…
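The parallel/orthogonal split of the t-value scatter can be illustrated with a total-least-squares (PCA) sketch. This is an assumption-laden reconstruction from the description of fitted lines through scatterplots of t-values, not the authors' exact algorithm (that is in the downloadable Matlab code): spread along the orthogonally fitted line plays the role of SDparallel (amplitude), spread perpendicular to it the role of SDorthogonal (pattern).

```python
import numpy as np

def sd_parallel_orthogonal(t1, t2):
    """Illustrative split of between-session t-value scatter into spread
    along the orthogonally fitted line (SDparallel) and spread
    perpendicular to it (SDorthogonal), via PCA / total least squares.
    """
    pts = np.column_stack([t1, t2])
    # Covariance of the 2-D scatter; its eigenvectors are the fitted line
    # and its perpendicular, its eigenvalues the variances along each.
    cov = np.cov(pts, rowvar=False)
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # ascending order
    sd_orth, sd_par = np.sqrt(eigvals)
    return sd_par, sd_orth

# Two identical sessions: all spread lies along the fitted line, so the
# orthogonal (pattern) component vanishes.
rng = np.random.default_rng(0)
t1 = rng.standard_normal(1000)
sd_par, sd_orth = sd_parallel_orthogonal(t1, t1)
print(sd_orth < 1e-6, sd_par > 1.0)  # True True
```

Orthogonal regression is used here rather than ordinary least squares because both sessions' t-values are noisy estimates, so neither axis is an error-free predictor.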

Discussion

We estimated differences in underlying BOLD activation between sessions. The amount of variability in the underlying BOLD signal (i.e., true variation) yields a theoretical limit on the reliability of individual fMRI measurements. In this test–retest study, subjects were scanned one week apart while performing a visual and a motor inhibition task. We specifically investigated variations in the spatial pattern of activation, and global changes in the amplitude of the spatial pattern of activation.

Conclusions

In summary, results from this study show that underlying patterns of BOLD activation are relatively stable across sessions, while the amplitude of the activation is more variable. The small pattern variability that we observed was caused by a general phenomenon of the most active voxels also showing the most variation, irrespective of brain area. Furthermore, this pattern variability was present mostly on a very local scale (neighboring voxels). The variability in the amplitudes (global…

References (35)

  • K.A. Norman et al., Beyond mind-reading: multi-voxel pattern analysis of fMRI data, Trends Cogn. Sci. (2006)
  • M. Raemaekers et al., Test–retest reliability of fMRI activation during prosaccades and antisaccades, NeuroImage (2007)
  • A.M. Smith et al., Investigation of low frequency drift in fMRI signal, NeuroImage (1999)
  • D.A. Soltysik et al., Head-repositioning does not reduce the reproducibility of fMRI activation in a block-design motor task, NeuroImage (2011)
  • M. Vink et al., Striatal dysfunction in schizophrenia and unaffected relatives, Biol. Psychiatry (2006)
  • B.B. Zandbelt et al., Within-subject variation in BOLD-fMRI signal changes across repeated measurements: quantification and implications for sample size, NeuroImage (2008)
  • C.M. Bennett et al., How reliable are the results from functional magnetic resonance imaging?, Ann. N. Y. Acad. Sci. (2010)