Psychometric considerations in the measurement of event-related brain potentials: Guidelines for measurement and reporting
Introduction: measurement in psychophysiology
It is now widely understood that some of the most exciting work in psychopathology involves discovering and understanding relevant brain mechanisms, without falling prey to naïve reductionism (Lilienfeld, 2007, Miller, 1996, Miller, 2010). Despite the enthusiastic press such work receives, it is far from clear how to proceed. Many avenues beckon, and some of the most exciting research tools, being relatively new, are often the most primitive, demanding, and fickle. Some in the field have
Psychometric properties are context-dependent
Reliability and validity as properties of a measure are not universal but are dependent on a specific population and context, and they should be continually assessed and refined (Smith and McCarthy, 1995). Thus, a measure cannot be said to be reliable or valid in some general sense. It is commonplace to claim score reliability by citing previous psychometric studies (Vacha-Haase et al., 2000, Whittington, 1998), based on the common misunderstanding that reliability is a fundamental property
Generalizability theory
Historically, psychometric studies of ERP scores have utilized classical test theory (CT theory) to estimate reliability. CT theory estimates reliability by partitioning observed variance in a score into two parts: true-score variance, which is assumed to be systematic, and error variance, which is assumed to be random. The reliability estimate is a ratio of true-score variance to observed-score variance and thus only incorporates one source of error at a time. For example, estimates of
Guidelines for reliability in ERP studies
Acceptable levels of reliability largely depend on the context in which measurements are to be used. Various recommendations for reliability are available based on whether a scale is in the early stages of development, whether group differences are being examined, or whether a clinical decision will be made based on a score (e.g., Nunnally and Bernstein, 1994). Due to the various recommendations about reliability standards in psychometric texts, studies that attend to such issues nearly always
Impact of reliability on statistical analyses
The cost of ignoring psychometric properties of measurements, such as score reliability, is particularly high when comparing tasks, environments, or groups. Several related psychometric issues, discussed by Chapman and Chapman, 1973, Chapman and Chapman, 1978, Chapman and Chapman, 2001, Melinder et al. (2005), Miller and Chapman (2001), Miller et al. (1995), Strauss (2001), and Zinbarg et al. (2010), are crucial in comparing groups but rarely engaged. Although historically discussed with
Further considerations of statistical power
In the context of both classical test theory and generalizability theory, low reliability can be thought of as measurements with high error variance, which undermines effect size. Because the statistical power to find an effect depends on effect size, there is a direct relationship between reliability and statistical power. We recommend that a power analysis for a particular effect in a particular study specify how small an effect is worth finding in that context (Miller and Yee, 2015) and that
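The reliability-power link can be made concrete with the classical attenuation relation, observed d = true d × √reliability: error variance inflates the within-group spread and shrinks the standardized effect. The sketch below uses illustrative numbers and a textbook normal-approximation sample-size formula, not any calculation from this paper.

```python
import math

def attenuated_d(true_d, reliability):
    """Observed standardized group difference when the dependent
    measure has the given reliability (classical attenuation:
    error variance inflates the within-group SD)."""
    return true_d * math.sqrt(reliability)

def n_per_group(d, z_alpha=1.96, z_beta=0.84):
    """Approximate per-group n for a two-sample t-test at
    alpha = .05 (two-tailed) and 80% power, via the normal
    approximation n = 2 * (z_alpha + z_beta)^2 / d^2."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

true_d = 0.5  # medium effect on a perfectly reliable measure
for rel in (1.0, 0.8, 0.6, 0.4):
    d_obs = attenuated_d(true_d, rel)
    print(f"reliability={rel:.1f}  observed d={d_obs:.2f}  "
          f"n/group={n_per_group(d_obs)}")
```

Under these assumptions, dropping score reliability from 1.0 to 0.4 shrinks an observed d of 0.5 to about 0.32 and roughly two-and-a-half-fold inflates the sample needed for the same power.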
Improving reliability
ERP score reliability is intimately related to EEG recording procedures, task design, and measurement approach. Any efforts to improve recording procedures to reduce contamination from background EEG noise or isolate the phenomenon being studied from overlapping processes should improve score reliability by reducing measurement error (i.e., variability in ERP scores unrelated to the phenomenon of interest). Resources are available exploring various considerations for setting up an EEG lab to
ERP component validity
Similar to score reliability, determining score validity is part of the continual process of refining a measure and ideally is evaluated in every study that uses that measure (Strauss and Smith, 2009). Validating an ERP measure requires understanding the events that elicit it and its relationship to the psychological and/or biological construct it is believed to manifest. Each subsequent study of the phenomenon implicitly tests whether it co-occurs with the proposed events, examines the
Software as a black box
Numerous analysis packages are available that implement increasingly diverse and complex analysis methods. As a consequence, investigators relying on such packages are becoming more distant from the algorithms on which they rely and less cognizant of the judgment calls needed to apply those algorithms. Relatedly, as equipment costs decline (e.g., dense-array EEG systems) and equipment accessibility improves, the user base expands, so the risk of people with less and less training using those
ERP reliability analysis (ERA) toolbox
We are not aware of any general analysis package that provides such tools for calculating psychometric characteristics of data sets. In order to pursue the reliability analyses recommended here, and with the caveats just noted, we offer such a software package. Clayson has developed an accessible reliability-computation package that can be readily integrated into a data-analysis path using common Matlab programming skills (http://peclayson.github.io/ERA_Toolbox; Clayson and Miller, 2017-in this
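The ERA Toolbox itself is Matlab-based and uses its own estimation machinery; purely as an illustration of the kind of quantity such an analysis yields, the following Python sketch computes a dependability coefficient for a one-facet (subjects × trials) design from ANOVA expected mean squares, applied to simulated data. It is not the toolbox's API or algorithm.

```python
import random

def dependability(data):
    """Dependability of a k-trial average in a crossed subjects x
    trials design: sigma2_p / (sigma2_p + (sigma2_t + sigma2_res) / k),
    with variance components from ANOVA expected mean squares.
    data: list of subjects, each a list of k trial scores."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    subj_means = [sum(row) / k for row in data]
    trial_means = [sum(row[j] for row in data) / n for j in range(k)]
    ms_subj = k * sum((m - grand) ** 2 for m in subj_means) / (n - 1)
    ms_trial = n * sum((m - grand) ** 2 for m in trial_means) / (k - 1)
    ms_resid = sum(
        (data[i][j] - subj_means[i] - trial_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    ) / ((n - 1) * (k - 1))
    var_p = max((ms_subj - ms_resid) / k, 0.0)   # person variance
    var_t = max((ms_trial - ms_resid) / n, 0.0)  # trial variance
    return var_p / (var_p + (var_t + ms_resid) / k)

# Hypothetical data: 40 subjects x 30 trials, person SD 2, noise SD 3.
random.seed(0)
true_scores = [random.gauss(0, 2) for _ in range(40)]
data = [[mu + random.gauss(0, 3) for _ in range(30)] for mu in true_scores]
phi = dependability(data)
print(round(phi, 2))
```

Because trial (occasion) variance enters the error term, this is the absolute coefficient appropriate when scores are interpreted against a fixed standard; dropping var_t from the denominator gives the relative coefficient for rank-order comparisons.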
Summary
The present paper has discussed a variety of fundamental psychometric issues arising in ERP research and how failing to consider these issues potentially undermines the contributions of a study. The paper has also advocated the routine reporting of reliability or dependability estimates as a way to address some of these issues and to improve the transparency of ERP research. To date, doing so has been strikingly rare, undermining confidence in the available ERP literature. The ERA Toolbox provides a
Acknowledgements
The authors thank Scott A. Baldwin, J. Christopher Edgar, Tzvetan Popov, and Cindy M. Yee-Bradbury for comments on an earlier draft of this paper.
References (123)

- Improving the rigor of psychophysiology research. Int. J. Psychophysiol. (2017)
- et al. Test-retest reliability of the P50 mid-latency auditory evoked response. Psychiatry Res. (1991)
- et al. Meta-analysis of the P300 and P50 waveforms in schizophrenia. Schizophr. Res. (2004)
- et al. Generalizability theory: a practical guide to study design, implementation, and interpretation. J. Sch. Psychol. (2014)
- The secret lives of experiments: methods reporting in the fMRI literature. NeuroImage (2012)
- et al. ERP Reliability Analysis (ERA) Toolbox: an open-source toolbox for analyzing the reliability of event-related potentials. Int. J. Psychophysiol. (2017)
- et al. P50 suppression among schizophrenia and normal comparison subjects: a methodological analysis. Biol. Psychiatry (1997)
- et al. Interpreting abnormality: an EEG and MEG study of P50 and the auditory paired-stimulus paradigm. Biol. Psychol. (2003)
- et al. Effects of crossmodal divided attention on late ERP components. II. Error processing in choice reaction tasks. Electroencephalogr. Clin. Neurophysiol. (1991)
- Meta-analysis and the science of schizophrenia: variant evidence or evidence of variants? Neurosci. Biobehav. Rev. (2004)