Statistical power and false positive rates for interdependent outcomes are strongly influenced by test type: Implications for behavioral neuroscience

Frankot, Michelle; Mueller, Peyton M.; Young, Michael E.; Vonder Haar, Cole

doi:10.1038/s41386-023-01592-6

Article
Published: 04 May 2023

Neuropsychopharmacology volume 48, pages 1612–1622 (2023)

Abstract

Statistical errors in preclinical science are a barrier to reproducibility and translation. For instance, linear models (e.g., ANOVA, linear regression) may be misapplied to data that violate assumptions. In behavioral neuroscience and psychopharmacology, linear models are frequently applied to interdependent or compositional data, which includes behavioral assessments where animals concurrently choose between chambers, objects, outcomes, or types of behavior (e.g., forced swim, novel object, place/social preference). The current study simulated behavioral data for a task with four interdependent choices (i.e., increased choice of a given outcome decreases others) using Monte Carlo methods. 16,000 datasets were simulated (1000 each of 4 effect sizes by 4 sample sizes) and statistical approaches evaluated for accuracy. Linear regression and linear mixed effects regression (LMER) with a single random intercept resulted in high false positives (>60%). Elevated false positives were attenuated in an LMER with random effects for all choice-levels and a binomial logistic mixed effects regression. However, these models were underpowered to reliably detect effects at common preclinical sample sizes. A Bayesian method using prior knowledge for control subjects increased power by up to 30%. These results were confirmed in a second simulation (8000 datasets). These data suggest that statistical analyses may often be misapplied in preclinical paradigms, with common linear methods increasing false positives, but potential alternatives lacking power. Ultimately, using informed priors may balance statistical requirements with ethical imperatives to minimize the number of animals used. These findings highlight the importance of considering statistical assumptions and limitations when designing research studies.
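The Monte Carlo logic described in the abstract can be illustrated with a minimal sketch. This is not the authors' simulation or their models (they report linear, mixed-effects, logistic, and Bayesian regressions). It is a simplified analogue built on assumptions chosen only for illustration: 100 trials per subject, 10 subjects per group, subject preferences drawn from a Dirichlet(2, 2, 2, 2) distribution, and two stand-in tests on the first option only (a trial-pooled z-test that ignores subjects, and a t-test on per-subject proportions). All function and constant names are hypothetical. Both groups are drawn from the same distribution, so every significant result is a false positive.

```python
import math
import random
import statistics

N_OPTIONS = 4        # four mutually exclusive choices, so proportions sum to 1
N_TRIALS = 100       # trials per subject (assumed for illustration)
N_PER_GROUP = 10     # subjects per group (assumed for illustration)
Z_CRIT = 1.96        # two-sided 5% critical value, standard normal
T_CRIT_DF18 = 2.101  # two-sided 5% critical value, Student's t with 18 df


def simulate_subject(rng):
    """Return one subject's choice counts across the four options."""
    # Subject-specific preferences: a Dirichlet(2, 2, 2, 2) draw,
    # built by normalizing independent gamma variates.
    gammas = [rng.gammavariate(2.0, 1.0) for _ in range(N_OPTIONS)]
    total = sum(gammas)
    probs = [g / total for g in gammas]
    # Each trial yields exactly one choice, so picking one option more
    # often necessarily means picking the others less often.
    choices = rng.choices(range(N_OPTIONS), weights=probs, k=N_TRIALS)
    return [choices.count(option) for option in range(N_OPTIONS)]


def pooled_z_test(group_a, group_b):
    """Naive test on option 0: pool all trials and ignore subject identity."""
    n_a = len(group_a) * N_TRIALS
    n_b = len(group_b) * N_TRIALS
    hits_a = sum(counts[0] for counts in group_a)
    hits_b = sum(counts[0] for counts in group_b)
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False
    return abs((hits_a / n_a - hits_b / n_b) / se) > Z_CRIT


def subject_level_t_test(group_a, group_b):
    """Two-sample t-test on per-subject proportions for option 0."""
    props_a = [counts[0] / N_TRIALS for counts in group_a]
    props_b = [counts[0] / N_TRIALS for counts in group_b]
    # Equal group sizes, so the pooled variance is the mean of the two variances.
    pooled_var = (statistics.variance(props_a) + statistics.variance(props_b)) / 2
    se = math.sqrt(pooled_var * 2 / N_PER_GROUP)
    if se == 0:
        return False
    t = (statistics.mean(props_a) - statistics.mean(props_b)) / se
    return abs(t) > T_CRIT_DF18


def false_positive_rates(n_datasets=1000, seed=1):
    """Estimate the false-positive rate of each test under the null."""
    rng = random.Random(seed)
    naive_hits = 0
    subject_hits = 0
    for _ in range(n_datasets):
        # Both groups share one generating process: there is no true effect.
        group_a = [simulate_subject(rng) for _ in range(N_PER_GROUP)]
        group_b = [simulate_subject(rng) for _ in range(N_PER_GROUP)]
        naive_hits += pooled_z_test(group_a, group_b)
        subject_hits += subject_level_t_test(group_a, group_b)
    return naive_hits / n_datasets, subject_hits / n_datasets


naive_fpr, subject_fpr = false_positive_rates()
print(f"pooled-trial z-test false-positive rate:  {naive_fpr:.3f}")
print(f"subject-level t-test false-positive rate: {subject_fpr:.3f}")
```

Under these assumptions the pooled test should reject far more often than the nominal 5%, because it treats correlated trials from the same animal as independent, while the subject-level test should stay close to 5%. The sketch says nothing about statistical power, which the abstract identifies as the weakness of the better-calibrated models.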

**Fig. 6: Results from Experiment 4 on the validation data set.**

Data availability

Code for this project will be shared on the corresponding author’s GitHub repository (https://github.com/VonderHaarLab) at the time of publication and is also included as Supplement S2. Underlying data for TBI simulations (control set) is publicly available at https://odc-tbi.org/, dataset identifier 703. Data for cue manipulation simulations (validation set) was obtained and used with permission from Dr. Catharine Winstanley at the University of British Columbia.

Acknowledgements

We would like to thank the researchers in the Vonder Haar laboratory and the Winstanley laboratory who helped to collect the original data which were used to support the simulations in this manuscript.

Funding

This work was supported by the NINDS (R01-NS110905; R01-NS110905-05S1) and Ohio State University.

Author information

Authors and Affiliations

Injury and Recovery Laboratory, Department of Neuroscience, Ohio State University, Columbus, OH, USA
Michelle Frankot, Peyton M. Mueller & Cole Vonder Haar
Department of Psychology, West Virginia University, Morgantown, WV, USA
Michelle Frankot
Department of Psychological Sciences, Kansas State University, Manhattan, KS, USA
Michael E. Young

Contributions

Conceptualization: MF, CV; Formal analysis: MF, PM, MY; Funding acquisition: CV; Writing – first draft: MF; Writing – review and editing: MF, PM, CV, MY.

Corresponding author

Correspondence to Cole Vonder Haar.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

S1_Supplement

S2_Simulations

S3_Published_reanalaysis

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Frankot, M., Mueller, P.M., Young, M.E. et al. Statistical power and false positive rates for interdependent outcomes are strongly influenced by test type: Implications for behavioral neuroscience. Neuropsychopharmacol. 48, 1612–1622 (2023). https://doi.org/10.1038/s41386-023-01592-6

Received: 31 October 2022
Revised: 23 March 2023
Accepted: 20 April 2023
Published: 04 May 2023
Issue Date: October 2023
DOI: https://doi.org/10.1038/s41386-023-01592-6
