A precision neuroscience approach to estimating reliability of neural responses during emotion processing: Implications for task-fMRI

Recent work demonstrating low test-retest reliability of neural activation during fMRI tasks raises questions about the utility of task-based fMRI for the study of individual variation in brain function. Two possible sources of the instability in task-based BOLD signal over time are noise or measurement error in the instrument, and meaningful variation across time within-individuals in the construct itself—brain activation elicited during fMRI tasks. Examining the contribution of these two sources of test-retest unreliability in task-evoked brain activity has far-reaching implications for cognitive neuroscience. If test-retest reliability largely reflects measurement error, it suggests that task-based fMRI has little utility in the study of either inter- or intra-individual differences. On the other hand, if task-evoked BOLD signal varies meaningfully over time, it would suggest that this tool may yet be well suited to studying intraindividual variation. We parse these sources of variance in BOLD signal in response to emotional cues over time and within-individuals in a longitudinal sample with 10 monthly fMRI scans. Test-retest reliability was low, reflecting a lack of stability in between-person differences across scans. In contrast, within-person, within-session internal consistency of the BOLD signal was higher, and within-person fluctuations across sessions explained almost half the variance in voxel-level neural responses. Additionally, monthly fluctuations in neural response to emotional cues were associated with intraindividual variation in mood, sleep, and exposure to stressors. Rather than reflecting trait-like differences across people, neural responses to emotional cues may be more reflective of intraindividual variation over time. 
These patterns suggest that task-based fMRI may be able to contribute to the study of individual variation in brain function if more attention is given to within-individual variation approaches and to psychometrics, beginning with improving reliability beyond the modest estimates observed here and establishing the validity of task fMRI beyond the suggestive associations reported here.

Participants were asked to charge their device and to synchronize it with an app-based database every day, and received reminders via text and phone call if they had not synchronized their device for three consecutive days. Participants were also asked to synchronize their data in person at the monthly visits. Failure to manually sync the data to their phone for a period of seven consecutive days resulted in those data being overwritten and therefore going missing. Aggregating across all participants, 32.47% of the 10,106 total study days were missing sleep data (ranging from 1.8% to 88.8% per participant). While it is important to note that six subjects had more than 60% of their maximum possible data missing, the minimum number of available sleep observations per subject was 48 days, which is three to sixteen times more daily data than prior actigraphy-based sleep studies with adolescents to date (Carpenter et al., 2017; Cohodes et al., 2020; Doane and Thurston, 2014; Harbard et al., 2016; Littlewood et al., 2019). To address the possibility that sleep data was not missing completely at random, we used all available data and a maximum likelihood estimator to fit multilevel models. This approach is robust to the data being missing at random, that is, to missingness that is conditional on observed values of the modeled outcome, in this case sleep variables (Matta et al., 2018). It is not possible to determine empirically whether data are missing not at random, as this depends on unobserved values, but it is unlikely that differences in sleep duration or neural function cause participants to remove or turn off their watches.
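The missingness figures above (an overall percentage plus a per-participant range) can be summarized along the following lines. This is a minimal sketch, not the authors' code: the function name `missing_summary`, the record layout (one list of daily values per participant, with `None` marking a day without sleep data), and the toy values are all hypothetical.

```python
def missing_summary(records):
    """Summarize missing days.

    records: dict mapping a participant id to a list of daily sleep values,
             where None marks a study day with no sleep data (assumed layout).
    Returns (overall percent missing, dict of percent missing per participant).
    """
    per_person = {}
    total_days = 0
    total_missing = 0
    for pid, days in records.items():
        n_missing = sum(value is None for value in days)
        per_person[pid] = 100.0 * n_missing / len(days)
        total_days += len(days)
        total_missing += n_missing
    overall = 100.0 * total_missing / total_days
    return overall, per_person


# Toy data for illustration only (not study data).
toy = {
    "p01": [7.5, None, 8.0, 6.9],
    "p02": [None, None, 7.2, None, 6.5],
}
overall, per_person = missing_summary(toy)
print(round(overall, 2), per_person)
```

Note that the overall percentage pools days across participants, so participants with more study days contribute more to it than a simple average of the per-participant percentages would allow.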

Model specification and priors for Bayesian models
The following describes the priors used for each Bayesian model extracted directly from model fits.

Power analysis simulation
We have 30 participants and 292 total observations. To properly estimate statistical power, it is necessary to use simulation methods in which the data-generating model reflects expectations of the data to be observed.
For detailed guidance on how to perform a power calculation for multilevel models, please see Arend and Schäfer (2019) and Green and MacLeod (2016). We performed the following power analysis using the simr package (Green and MacLeod, 2016) in R. We assumed 30 participants (level-2 units) and 292 total observations (level-1 units), with no more than 10 observations nested in each level-2 unit. We assumed an ICC of 0.10. All other parameters were set such that we evaluated standardized regression coefficients between 0.05 and 0.29. We ran 10,000 total iterations across 13 effect sizes (also see Rodman et al., 2021, for a nearly identical power analysis with small differences in assumptions; see this paper's repository for code).
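The logic of such a simulation can be sketched in a few lines. The code below is an illustration under stated assumptions, not the simr analysis itself: it is written in Python rather than R, it swaps the multilevel model for a simpler within-person (person-mean-centered) regression, and the data-generating choices (a standard normal predictor, total outcome variance fixed at 1, a two-sided test at alpha = 0.05 with an approximate critical t value) are assumptions made for this example. All function names are hypothetical.

```python
import math
import random

T_CRIT = 1.969  # approximate two-sided 5% critical t value for 261 df (assumption)


def group_sizes(n_people=30, n_obs=292, max_per_person=10):
    """Spread n_obs over n_people as evenly as possible."""
    base, extra = divmod(n_obs, n_people)
    sizes = [base + 1 if j < extra else base for j in range(n_people)]
    assert max(sizes) <= max_per_person
    return sizes


def simulate_once(beta, sizes, icc, rng):
    """Simulate one data set and test the within-person slope.

    Assumed model: y_ij = beta * x_ij + u_j + e_ij, with x_ij ~ N(0, 1),
    Var(u_j) / (Var(u_j) + Var(e_ij)) = icc, and Var(y) = 1.
    """
    resid_var = 1.0 - beta ** 2
    tau = math.sqrt(icc * resid_var)            # between-person SD
    sigma = math.sqrt((1.0 - icc) * resid_var)  # within-person SD
    sxx = sxy = syy = 0.0
    for n_j in sizes:
        u = rng.gauss(0.0, tau)
        x = [rng.gauss(0.0, 1.0) for _ in range(n_j)]
        y = [beta * xi + u + rng.gauss(0.0, sigma) for xi in x]
        mx, my = sum(x) / n_j, sum(y) / n_j
        # Person-mean centering removes the random intercept u_j.
        for xi, yi in zip(x, y):
            sxx += (xi - mx) ** 2
            sxy += (xi - mx) * (yi - my)
            syy += (yi - my) ** 2
    slope = sxy / sxx
    rss = syy - slope * sxy
    df = sum(sizes) - len(sizes) - 1
    se = math.sqrt(rss / df / sxx)
    return abs(slope / se) > T_CRIT


def power(beta, n_sims=500, icc=0.10, seed=1):
    """Proportion of simulated data sets in which the slope is detected."""
    rng = random.Random(seed)
    sizes = group_sizes()
    hits = sum(simulate_once(beta, sizes, icc, rng) for _ in range(n_sims))
    return hits / n_sims


if __name__ == "__main__":
    for b in (0.05, 0.17, 0.21):
        print(b, power(b))
```

Because the estimator and assumptions differ from the simr analysis, the values printed by this sketch need not match the power estimates reported in this supplement, which come from the authors' own simulation.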
According to this analysis, we would expect power of 80% for effect sizes around B = 0.17 and 95% power for B = 0.21.

Supplemental Tables
Table S1. Compensation Schedule
Table S2. Effect of time on contrast of Fear > Neutral
Table S3. Effect of time on contrast of Fear > Neutral, parcellated analysis
Table S4. Between- and within-person correlations among predictor variables
Table S5. Within-person response to aversive cues covaries with within-person negative mood
Table S6. Within-person response to aversive cues covaries with within-person negative mood, parcellated analysis
Table S7. Within-person variation in neural response to aversive cues based on within-person variation in sleep
Table S8. Within-person variation in neural response to aversive cues based on within-person variation in sleep, parcellated analysis
Table S9. Within-person response to aversive cues covaries with within-person stressful life events
Table S10. Within-person response to aversive cues covaries with within-person stressful life events, parcellated analysis
Table S11. Within-person response to aversive cues covaries with within-person chronic stress
Table S12. Within-person response to aversive cues covaries with within-person chronic stress, parcellated analysis

Table S3. Effect of time on contrast of Fear > Neutral, parcellated analysis
Notes: Estimates are medians of the posterior distributions for the parameter of interest. The 95% credible interval provides one possible range of plausible parameter values. Please see the manuscript for more information about how parcels were selected.
The data for each parcel was scaled by its standard deviation (across all voxels, participants, and waves), which means that the estimates can be interpreted as the expected change in terms of standard deviations of the BOLD contrast for each unit increase in the variable of interest.
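The scaling described in this note can be illustrated with a short sketch. It assumes that the contrast values for one parcel have already been pooled into a single flat list across voxels, participants, and waves, and that the sample standard deviation is used; the function name `scale_parcel` and the toy values are hypothetical.

```python
import statistics


def scale_parcel(values):
    """Divide pooled contrast values for one parcel by their standard deviation.

    values: flat list of BOLD contrast values for a single parcel, pooled
            across voxels, participants, and waves (assumed layout).
    The data are scaled only, not mean-centered, following the note above.
    """
    sd = statistics.stdev(values)  # sample SD; the exact SD variant is an assumption
    return [v / sd for v in values], sd


# Toy values for illustration only (not study data).
raw = [0.8, -0.4, 1.6, 0.2, -1.0, 0.6]
scaled, sd = scale_parcel(raw)

# A slope of b raw contrast units per unit of a predictor corresponds to
# b / sd standard deviations per unit of that predictor after scaling.
raw_slope = 0.3
scaled_slope = raw_slope / sd
```

After scaling, the pooled values have a standard deviation of 1, which is what allows estimates from different parcels to be read on a common standard-deviation scale.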

Table S4. Between- and within-person correlations among predictor variables
Between-person Within-person

Table S6. Within-person response to aversive cues covaries with within-person negative mood, parcellated analysis
Notes: Estimates are medians of the posterior distributions for the parameter of interest. The 95% credible interval provides one possible range of plausible parameter values. Please see the manuscript for more information about how parcels were selected.
The data for each parcel was scaled by its standard deviation (across all voxels, participants, and waves), which means that the estimates can be interpreted as the expected change in terms of standard deviations of the BOLD contrast for each unit increase in the variable of interest.

Notes: Estimates are medians of the posterior distributions for the parameter of interest. The 95% credible interval provides one possible range of plausible parameter values. Please see the manuscript for more information about how parcels were selected.

Hemisphere
The data for each parcel was scaled by its standard deviation (across all voxels, participants, and waves), which means that the estimates can be interpreted as the expected change in terms of standard deviations of the BOLD contrast for each unit increase in the variable of interest.

Table S10. Within-person response to aversive cues covaries with within-person stressful life events, parcellated analysis
Notes: Estimates are medians of the posterior distributions for the parameter of interest. The 95% credible interval provides one possible range of plausible parameter values. Please see the manuscript for more information about how parcels were selected.

Hemisphere Parcel Label Estimate [99.988% CI] Location
The data for each parcel was scaled by its standard deviation (across all voxels, participants, and waves), which means that the estimates can be interpreted as the expected change in terms of standard deviations of the BOLD contrast for each unit increase in the variable of interest.

Figure S5. Zero-order association of size, smoothness and within-person reliability.

Figure S6. Effects on amygdala response to Fear - Neutral

Table S1. Compensation Schedule

Table S2. Effect of time on contrast of Fear > Neutral

Table S5. Within-person response to aversive cues covaries with within-person negative mood

Table S7. Within-person variation in neural response to aversive cues based on within-person variation in sleep

Table S8. Within-person variation in neural response to aversive cues based on within-person variation in sleep, parcellated analysis

Table S9. Within-person response to aversive cues covaries with within-person stressful life events