Psychometric considerations in the measurement of event-related brain potentials: Guidelines for measurement and reporting
Introduction: measurement in psychophysiology
It is now widely understood that some of the most exciting work in psychopathology involves discovering and understanding relevant brain mechanisms, without falling prey to naïve reductionism (Lilienfeld, 2007, Miller, 1996, Miller, 2010). Despite the enthusiastic press such work receives, it is far from clear how to proceed. Many avenues beckon, and some of the most exciting research tools, being relatively new, are often the most primitive, demanding, and fickle. Some in the field have
Psychometric properties are context-dependent
Reliability and validity as properties of a measure are not universal but are dependent on a specific population and context, and they should be continually assessed and refined (Smith and McCarthy, 1995). Thus, a measure cannot be said to be reliable or valid in some general sense. It is commonplace to claim score reliability by citing previous psychometric studies (Vacha-Haase et al., 2000, Whittington, 1998), based on the common misunderstanding that reliability is a fundamental property
Generalizability theory
Historically, psychometric studies of ERP scores have utilized classical test theory (CT theory) to estimate reliability. CT theory estimates reliability by partitioning observed variance in a score into two parts: true-score variance, which is assumed to be systematic, and error variance, which is assumed to be random. The reliability estimate is a ratio of true-score variance to observed-score variance and thus only incorporates one source of error at a time. For example, estimates of
Guidelines for reliability in ERP studies
Acceptable levels of reliability largely depend on the context in which measurements are to be used. Various recommendations for reliability are available based on whether a scale is in the early stages of development, whether group differences are being examined, or whether a clinical decision will be made based on a score (e.g., Nunnally and Bernstein, 1994). Due to the various recommendations about reliability standards in psychometric texts, studies that attend to such issues nearly always
Impact of reliability on statistical analyses
The cost of ignoring psychometric properties of measurements, such as score reliability, is particularly high when comparing tasks, environments, or groups. Several related psychometric issues, discussed by Chapman and Chapman, 1973, Chapman and Chapman, 1978, Chapman and Chapman, 2001, Melinder et al. (2005), Miller and Chapman (2001), Miller et al. (1995), Strauss (2001), and Zinbarg et al. (2010), are crucial in comparing groups but rarely engaged. Although historically discussed with
Further considerations of statistical power
In the context of both classical test theory and generalizability theory, low reliability can be thought of as measurements with high error variance, which undermines effect size. Because the statistical power to find an effect depends on effect size, there is a direct relationship between reliability and statistical power. We recommend that a power analysis for a particular effect in a particular study specify how small an effect is worth finding in that context (Miller and Yee, 2015) and that
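The reliability-power link can be made concrete with the classical attenuation relation, observed d = true d × √reliability: error variance inflates the within-group spread and shrinks the standardized effect. The sketch below uses illustrative numbers and a textbook normal-approximation sample-size formula, not any calculation from this paper.

```python
import math

def attenuated_d(true_d, reliability):
    """Observed standardized group difference when the dependent
    measure has the given reliability (classical attenuation:
    error variance inflates the within-group SD)."""
    return true_d * math.sqrt(reliability)

def n_per_group(d, z_alpha=1.96, z_beta=0.84):
    """Approximate per-group n for a two-sample t-test at
    alpha = .05 (two-tailed) and 80% power, via the normal
    approximation n = 2 * (z_alpha + z_beta)^2 / d^2."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

true_d = 0.5  # medium effect on a perfectly reliable measure
for rel in (1.0, 0.8, 0.6, 0.4):
    d_obs = attenuated_d(true_d, rel)
    print(f"reliability={rel:.1f}  observed d={d_obs:.2f}  "
          f"n/group={n_per_group(d_obs)}")
```

Under these assumptions, dropping score reliability from 1.0 to 0.4 shrinks an observed d of 0.5 to about 0.32 and roughly two-and-a-half-fold inflates the sample needed for the same power.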
Improving reliability
ERP score reliability is intimately related to EEG recording procedures, task design, and measurement approach. Any efforts to improve recording procedures to reduce contamination from background EEG noise or isolate the phenomenon being studied from overlapping processes should improve score reliability by reducing measurement error (i.e., variability in ERP scores unrelated to the phenomenon of interest). Resources are available exploring various considerations for setting up an EEG lab to
ERP component validity
Similar to score reliability, determining score validity is part of the continual process of refining a measure and ideally is evaluated in every study that uses that measure (Strauss and Smith, 2009). Validating an ERP measure requires understanding the events that elicit it and its relationship to the psychological and/or biological construct it is believed to manifest. Each subsequent study of the phenomenon implicitly tests whether it co-occurs with the proposed events, examines the
Software as a black box
Numerous analysis packages are available that implement increasingly diverse and complex analysis methods. As a consequence, investigators relying on such packages are becoming more distant from the algorithms on which they rely and less cognizant of the judgment calls needed to apply those algorithms. Relatedly, as equipment costs decline (e.g., dense-array EEG systems) and equipment accessibility improves, the user base expands, so the risk of people with less and less training using those
ERP reliability analysis (ERA) toolbox
We are not aware of any general analysis package that provides such tools for calculating psychometric characteristics of data sets. In order to pursue the reliability analyses recommended here, and with the caveats just noted, we offer such a software package. Clayson has developed an accessible reliability-computation package that can be readily integrated into a data-analysis path using common Matlab programming skills (http://peclayson.github.io/ERA_Toolbox; Clayson and Miller, 2017-in this
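The ERA Toolbox itself is Matlab-based and uses its own estimation machinery; purely as an illustration of the kind of quantity such an analysis yields, the following Python sketch computes a dependability coefficient for a one-facet (subjects × trials) design from ANOVA expected mean squares, applied to simulated data. It is not the toolbox's API or algorithm.

```python
import random

def dependability(data):
    """Dependability of a k-trial average in a crossed subjects x
    trials design: sigma2_p / (sigma2_p + (sigma2_t + sigma2_res) / k),
    with variance components from ANOVA expected mean squares.
    data: list of subjects, each a list of k trial scores."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    subj_means = [sum(row) / k for row in data]
    trial_means = [sum(row[j] for row in data) / n for j in range(k)]
    ms_subj = k * sum((m - grand) ** 2 for m in subj_means) / (n - 1)
    ms_trial = n * sum((m - grand) ** 2 for m in trial_means) / (k - 1)
    ms_resid = sum(
        (data[i][j] - subj_means[i] - trial_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    ) / ((n - 1) * (k - 1))
    var_p = max((ms_subj - ms_resid) / k, 0.0)   # person variance
    var_t = max((ms_trial - ms_resid) / n, 0.0)  # trial variance
    return var_p / (var_p + (var_t + ms_resid) / k)

# Hypothetical data: 40 subjects x 30 trials, person SD 2, noise SD 3.
random.seed(0)
true_scores = [random.gauss(0, 2) for _ in range(40)]
data = [[mu + random.gauss(0, 3) for _ in range(30)] for mu in true_scores]
phi = dependability(data)
print(round(phi, 2))
```

Because trial (occasion) variance enters the error term, this is the absolute coefficient appropriate when scores are interpreted against a fixed standard; dropping var_t from the denominator gives the relative coefficient for rank-order comparisons.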
Summary
The present paper has discussed a variety of fundamental psychometric issues arising in ERP research and how failing to consider these issues potentially undermines the contributions of a study. The paper has also advocated the routine reporting of reliability or dependability estimates as a way to address some of these issues and to improve the transparency of ERP research. To date, doing so has been strikingly rare, undermining confidence in the available ERP literature. The ERA Toolbox provides a
Acknowledgements
The authors thank Scott A. Baldwin, J. Christopher Edgar, Tzvetan Popov, and Cindy M. Yee-Bradbury for comments on an earlier draft of this paper.
References (123)

- Improving the rigor of psychophysiology research. Int. J. Psychophysiol. (2017)
- et al. Test-retest reliability of the P50 mid-latency auditory evoked response. Psychiatry Res. (1991)
- et al. Meta-analysis of the P300 and P50 waveforms in schizophrenia. Schizophr. Res. (2004)
- et al. Generalizability theory: a practical guide to study design, implementation, and interpretation. J. Sch. Psychol. (2014)
- The secret lives of experiments: methods reporting in the fMRI literature. NeuroImage (2012)
- et al. ERP Reliability Analysis (ERA) Toolbox: an open-source toolbox for analyzing the reliability of event-related potentials. Int. J. Psychophysiol. (2017)
- et al. P50 suppression among schizophrenia and normal comparison subjects: a methodological analysis. Biol. Psychiatry (1997)
- et al. Interpreting abnormality: an EEG and MEG study of P50 and the auditory paired-stimulus paradigm. Biol. Psychol. (2003)
- et al. Effects of crossmodal divided attention on late ERP components. II. Error processing in choice reaction tasks. Electroencephalogr. Clin. Neurophysiol. (1991)
- Meta-analysis and the science of schizophrenia: variant evidence or evidence of variants? Neurosci. Biobehav. Rev. (2004)