Psychometric considerations in the measurement of event-related brain potentials: Guidelines for measurement and reporting

https://doi.org/10.1016/j.ijpsycho.2016.09.005

Highlights

  • The psychometric properties of event-related potentials (ERPs) are context dependent

  • Reliability should be evaluated on a study-by-study basis

  • Guidelines for reporting reliability in studies using ERPs are proposed

  • Generalizability theory is recommended for calculating reliability estimates

Abstract

Failing to consider psychometric issues related to reliability and validity, differential deficits, and statistical power potentially undermines the conclusions of a study. In research using event-related brain potentials (ERPs), numerous contextual factors (population sampled, task, data recording, analysis pipeline, etc.) can impact the reliability of ERP scores. The present review considers the contextual factors that influence ERP score reliability and the downstream effects that reliability has on statistical analyses. Given the context-dependent nature of ERPs, it is recommended that ERP score reliability be formally assessed on a study-by-study basis. Recommended guidelines for ERP studies include 1) reporting the threshold of acceptable reliability and reliability estimates for observed scores, 2) specifying the approach used to estimate reliability, and 3) justifying how trial-count minima were chosen. A reliability threshold for internal consistency of at least 0.70 is recommended, and a threshold of 0.80 is preferred. The review also advocates the use of generalizability theory for estimating score dependability (the generalizability theory analog to reliability) as an improvement on classical test theory reliability estimates, suggesting that the latter is less well suited to ERP research. To facilitate the calculation and reporting of dependability estimates, an open-source Matlab program, the ERP Reliability Analysis Toolbox, is presented.

Section snippets

Introduction: measurement in psychophysiology

It is now widely understood that some of the most exciting work in psychopathology involves discovering and understanding relevant brain mechanisms, without falling prey to naïve reductionism (Lilienfeld, 2007, Miller, 1996, Miller, 2010). Despite the enthusiastic press such work receives, it is far from clear how to proceed. Many avenues beckon, and some of the most exciting research tools, being relatively new, are often the most primitive, demanding, and fickle. Some in the field have

Psychometric properties are context-dependent

Reliability and validity are not universal properties of a measure but depend on a specific population and context, and they should be continually assessed and refined (Smith and McCarthy, 1995). Thus, a measure cannot be said to be reliable or valid in some general sense. It is commonplace to claim score reliability by citing previous psychometric studies (Vacha-Haase et al., 2000, Whittington, 1998), based on the common misunderstanding that reliability is a fundamental property
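To make this concrete, consider a worked example with purely hypothetical variance figures, using the classical definition of reliability developed in the next section. Holding error variance fixed at $\sigma^2_E = 20$, a heterogeneous sample with true-score variance $\sigma^2_T = 80$ yields

$$r_{XX'} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E} = \frac{80}{80 + 20} = .80,$$

whereas a range-restricted sample with $\sigma^2_T = 20$ yields $20/(20 + 20) = .50$. The same instrument, administered identically, can thus be reliable in one population and unreliable in another.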

Generalizability theory

Historically, psychometric studies of ERP scores have utilized classical test theory (CT theory) to estimate reliability. CT theory estimates reliability by partitioning observed variance in a score into two parts: true-score variance, which is assumed to be systematic, and error variance, which is assumed to be random. The reliability estimate is a ratio of true-score variance to observed-score variance and thus only incorporates one source of error at a time. For example, estimates of
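As a sketch of the contrast, consider a single-facet design in which persons ($p$) are crossed with trials ($t$). Generalizability theory decomposes an observed single-trial score into person, trial, and residual effects (the person × trial interaction confounded with error):

$$X_{pt} = \mu + \nu_p + \nu_t + \nu_{pt,e}, \qquad \sigma^2_X = \sigma^2_p + \sigma^2_t + \sigma^2_{pt,e}.$$

For a score averaged over $n_t$ trials, the dependability coefficient (for absolute decisions) and the generalizability coefficient (for relative decisions) are

$$\Phi = \frac{\sigma^2_p}{\sigma^2_p + (\sigma^2_t + \sigma^2_{pt,e})/n_t}, \qquad E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pt,e}/n_t},$$

so multiple sources of error enter a single estimate simultaneously rather than one at a time (cf. Brennan, 2001).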

Guidelines for reliability in ERP studies

Acceptable levels of reliability largely depend on the context in which measurements are to be used. Various recommendations for reliability are available based on whether a scale is in the early stages of development, whether group differences are being examined, or whether a clinical decision will be made based on a score (e.g., Nunnally and Bernstein, 1994). Due to the various recommendations about reliability standards in psychometric texts, studies that attend to such issues nearly always
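One way to justify a trial-count minimum is a decision study: project dependability across candidate trial counts and take the smallest count that clears the chosen threshold. Below is a minimal Python sketch under the single-facet persons × trials design described above; the variance components are hypothetical, and the function names are illustrative rather than the ERA Toolbox API.

```python
def dependability(var_p, var_t, var_pt_e, n_trials):
    """Phi coefficient for absolute decisions in a persons x trials design."""
    return var_p / (var_p + (var_t + var_pt_e) / n_trials)

def min_trials(var_p, var_t, var_pt_e, target=0.80, max_n=512):
    """Smallest trial count whose projected dependability reaches `target`."""
    for n in range(1, max_n + 1):
        if dependability(var_p, var_t, var_pt_e, n) >= target:
            return n
    return None  # target unreachable within max_n trials

# Hypothetical variance components, e.g., from a pilot G study
var_p, var_t, var_pt_e = 4.0, 0.5, 12.0
print(min_trials(var_p, var_t, var_pt_e, target=0.70))  # -> 8
print(min_trials(var_p, var_t, var_pt_e, target=0.80))  # -> 13
```

With these hypothetical components, the projection crosses .70 at 8 trials and .80 at 13; reporting the full curve alongside the chosen threshold makes a trial-count minimum easy to evaluate.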

Impact of reliability on statistical analyses

The cost of ignoring psychometric properties of measurements, such as score reliability, is particularly high when comparing tasks, environments, or groups. Several related psychometric issues, discussed by Chapman and Chapman (1973, 1978, 2001), Melinder et al. (2005), Miller and Chapman (2001), Miller et al. (1995), Strauss (2001), and Zinbarg et al. (2010), are crucial in comparing groups but are rarely engaged. Although historically discussed with

Further considerations of statistical power

In the context of both classical test theory and generalizability theory, low reliability reflects high error variance in measurements, which attenuates observed effect sizes. Because the statistical power to detect an effect depends on effect size, reliability and statistical power are directly related. We recommend that a power analysis for a particular effect in a particular study specify how small an effect is worth finding in that context (Miller and Yee, 2015) and that
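The attenuation is straightforward to quantify. Under classical test theory assumptions, measurement error inflates within-group variance, so an observed standardized group difference shrinks by the square root of reliability; the same arithmetic underlies the differential-deficit problem discussed above, because two tasks with different reliabilities yield different observed effect sizes for the same true deficit. A brief Python sketch using a standard normal approximation to two-sample power (all numbers purely illustrative):

```python
from scipy.stats import norm

def observed_d(true_d, reliability):
    """Unreliability inflates within-group SD, attenuating Cohen's d."""
    return true_d * reliability ** 0.5

def power_two_sample(d, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided two-sample test."""
    ncp = d * (n_per_group / 2) ** 0.5   # noncentrality for equal groups
    z = norm.ppf(1 - alpha / 2)
    return 1 - norm.cdf(z - ncp) + norm.cdf(-z - ncp)

for rel in (1.0, 0.8, 0.6):
    d = observed_d(0.5, rel)
    print(f"reliability={rel:.1f}  observed d={d:.2f}  "
          f"power at n=50/group: {power_two_sample(d, 50):.2f}")
```

At n = 50 per group, a true d of 0.5 gives roughly .70 power with perfect measurement but only about .49 when score reliability falls to .60, which is why reliability belongs in any power analysis.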

Improving reliability

ERP score reliability is intimately related to EEG recording procedures, task design, and measurement approach. Any efforts to improve recording procedures to reduce contamination from background EEG noise or isolate the phenomenon being studied from overlapping processes should improve score reliability by reducing measurement error (i.e., variability in ERP scores unrelated to the phenomenon of interest). Resources are available exploring various considerations for setting up an EEG lab to
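The benefit can be stated compactly. Assuming independent trial-level error, averaging $n$ trials reduces error variance to $\sigma^2_E/n$, so

$$r_{\bar{X}} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E/n} = \frac{n\,r_1}{1 + (n - 1)\,r_1},$$

the familiar Spearman-Brown form, where $r_1$ is single-trial reliability. Cleaner recordings shrink $\sigma^2_E$ itself, raising reliability at every trial count rather than requiring more trials.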

ERP component validity

Similar to score reliability, determining score validity is part of the continual process of refining a measure and ideally is evaluated in every study that uses that measure (Strauss and Smith, 2009). Validating an ERP measure requires understanding the events that elicit it and its relationship to the psychological and/or biological construct it is believed to manifest. Each subsequent study of the phenomenon implicitly tests whether it co-occurs with the proposed events, examines the

Software as a black box

Numerous analysis packages are available that implement increasingly diverse and complex analysis methods. As a consequence, investigators relying on such packages are becoming more distant from the algorithms on which they rely and less cognizant of the judgment calls needed to apply those algorithms. Meanwhile, as equipment costs decline (e.g., dense-array EEG systems) and equipment accessibility improves, the user base expands, so the risk of people with less and less training using those

ERP Reliability Analysis (ERA) Toolbox

We are not aware of any general analysis package that provides such tools for calculating psychometric characteristics of data sets. To pursue the reliability analyses recommended here, and with the caveats just noted, we offer such a software package. Clayson has developed an accessible reliability-computation package that can be readily integrated into a data-analysis path using common Matlab programming skills (http://peclayson.github.io/ERA_Toolbox; Clayson and Miller, 2017, in this issue).
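For readers who want to see the kind of computation involved, the following self-contained Python sketch estimates single-facet variance components from a complete persons × trials score matrix via ANOVA expected mean squares. It is an illustration only, not the ERA Toolbox's estimation method or API, and the simulated data are hypothetical; estimators that accommodate unequal trial counts across participants are preferable for real data, where trial loss is routine.

```python
import numpy as np

def variance_components(scores):
    """ANOVA-based variance components for a complete persons x trials matrix.

    scores: 2-D array, rows = persons, columns = trials (no missing cells).
    Returns (var_p, var_t, var_pt_e) as in a single-facet G study.
    """
    n_p, n_t = scores.shape
    grand = scores.mean()
    ss_p = n_t * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_t = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_t
    ms_p = ss_p / (n_p - 1)
    ms_t = ss_t / (n_t - 1)
    ms_res = ss_res / ((n_p - 1) * (n_t - 1))
    var_p = max((ms_p - ms_res) / n_t, 0.0)  # truncate negative estimates
    var_t = max((ms_t - ms_res) / n_p, 0.0)
    return var_p, var_t, ms_res

# Hypothetical single-trial ERP scores: 40 persons x 30 trials
rng = np.random.default_rng(0)
scores = (rng.normal(5.0, 2.0, (40, 1))      # person effects
          + rng.normal(0.0, 0.5, (1, 30))    # trial effects
          + rng.normal(0.0, 4.0, (40, 30)))  # residual noise
var_p, var_t, var_pt_e = variance_components(scores)
phi = var_p / (var_p + (var_t + var_pt_e) / 30)
print(f"Estimated dependability of a 30-trial average: {phi:.2f}")
```

Estimates of this kind feed the decision-study projection sketched earlier in this review.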

Summary

The present paper has discussed a variety of fundamental psychometric issues arising in ERP research and how failing to consider them potentially undermines the contributions of a study. The paper has also advocated the routine reporting of reliability or dependability estimates as a way to address some of these issues and to improve the transparency of ERP research. To date, doing so has been strikingly rare, undermining confidence in the available ERP literature. The ERA Toolbox provides a

Acknowledgements

The authors thank Scott A. Baldwin, J. Christopher Edgar, Tzvetan Popov, and Cindy M. Yee-Bradbury for comments on an earlier draft of this paper.

References (123)

  • K. Jerger et al., P50 suppression is not affected by attentional manipulations, Biol. Psychiatry (1992)
  • N. Kathmann et al., Sensory gating in normals and schizophrenics: a failure to find strong P50 suppression in normals, Biol. Psychiatry (1990)
  • J.S. Lamberti et al., Within-session changes in sensory gating assessed by P50 evoked potentials in normal subjects, Prog. Neuro-Psychopharmacol. Biol. Psychiatry (1993)
  • M.J. Larson, Commitment to cutting-edge research with rigor and replication in psychophysiological science, Int. J. Psychophysiol. (2016)
  • M.J. Larson et al., Sample size calculations in human electrophysiology (EEG and ERP) studies: a systematic review and recommendations for increased rigor, Int. J. Psychophysiol. (2017)
  • M.J. Larson et al., Making sense of all the conflict: a theoretical review and critique of conflict-related ERPs, Int. J. Psychophysiol. (2014)
  • S. Laszlo et al., A direct comparison of active and passive amplification electrodes in the same amplifier system, J. Neurosci. Methods (2014)
  • C.M. Michel et al., EEG source imaging, Clin. Neurophysiol. (2004)
  • G.A. Miller, Another quasi-30 years of slow progress, Appl. Prev. Psychol. (2004)
  • J.V. Patterson et al., P50 sensory gating ratios in schizophrenics and controls: a review and data analysis, Psychiatry Res. (2008)
  • J. Rentzsch et al., Test-retest reliability of P50, N100 and P200 auditory sensory gating in healthy subjects, Int. J. Psychophysiol. (2008)
  • A. Riesel et al., The ERN is the ERN is the ERN? Convergent validity of error-related brain activity across different tasks, Biol. Psychol. (2013)
  • R. Adcock et al., Measurement validity: a shared standard for qualitative and quantitative research, Am. Polit. Sci. Rev. (2001)
  • A. Anastasi, Psychological Testing (1997)
  • S.A. Baldwin et al., The dependability of electrophysiological measurements of performance monitoring in a clinical sample: a generalizability and decision analysis of the ERN and Pe, Psychophysiology (2015)
  • G.L. Barkley et al., MEG and EEG in epilepsy, J. Clin. Neurophysiol. (2003)
  • F. Baugh, Correcting effect sizes for score reliability: a reminder that measurement and substantive issues are linked inextricably, Educ. Psychol. Meas. (2002)
  • A. Brand et al., The precision of effect size estimation from published psychological research: surveying confidence intervals, Psychol. Rep. (2016)
  • R.L. Brennan, Generalizability Theory: Statistics for Social Science and Public Policy (2001)
  • C.H. Brunia et al., Correcting ocular artifacts in the EEG: a comparison of several models, J. Psychophysiol. (1989)
  • S.M. Cassidy et al., Retest reliability of event-related potentials: evidence from a variety of paradigms, Psychophysiology (2012)
  • L.J. Chapman et al., Disordered Thought in Schizophrenia (1973)
  • L.J. Chapman et al., The measurement of differential deficit, J. Psychiatr. Res. (1978)
  • L.J. Chapman et al., Commentary on two articles concerning generalized and specific cognitive deficits, J. Abnorm. Psychol. (2001)
  • M. Chmielewski et al., What is being assessed and why it matters: the impact of transient error on trait research, J. Pers. Soc. Psychol. (2009)
  • P.E. Clayson et al., How does noise affect amplitude and latency measurement of event-related potentials (ERPs)? A methodological critique and simulation study, Psychophysiology (2013)
  • J. Cohen et al., Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (2003)
  • M.G.H. Coles et al.
  • E.W. Cook et al., Digital filtering: background and tutorial for psychophysiologists, Psychophysiology (1992)
  • L.J. Cronbach, My current thoughts on coefficient alpha and successor procedures, Educ. Psychol. Meas. (2004)
  • L.J. Cronbach et al., Test validation, in: Educational Measurement (1971)
  • L.J. Cronbach et al., The Dependability of Behavioral Measures: Theory of Generalizability for Scores and Profiles (1972)
  • E. Donchin et al., Publication criteria for studies of evoked potentials (EP) in man. Report of the methodology committee
  • J.C. Edgar et al., Digital Signal Processing (2016)
  • X. Fan et al., Confidence intervals for effect sizes: confidence intervals about score reliability coefficients, please: an EPM guidelines editorial, Educ. Psychol. Meas. (2001)
  • D. Foti et al., Psychometric considerations in using error-related brain activity as a biomarker in psychotic disorders, J. Abnorm. Psychol. (2013)
  • R. Freedman et al., Neurobiological studies of sensory gating in schizophrenia, Schizophr. Bull. (1987)
  • R. Freedman et al., The genetics of sensory gating deficits in schizophrenia, Curr. Psychiatry Rep. (2003)
  • W.J. Gehring et al., A neural system for error detection and compensation, Psychol. Sci. (1993)
  • A. Gelman et al., Beyond power calculations: assessing type S (sign) and type M (magnitude) errors, Perspect. Psychol. Sci. (2014)