Statistical power and false positive rates for interdependent outcomes are strongly influenced by test type: Implications for behavioral neuroscience

Abstract

Statistical errors in preclinical science are a barrier to reproducibility and translation. For instance, linear models (e.g., ANOVA, linear regression) may be misapplied to data that violate their assumptions. In behavioral neuroscience and psychopharmacology, linear models are frequently applied to interdependent or compositional data, including behavioral assessments in which animals concurrently choose between chambers, objects, outcomes, or types of behavior (e.g., forced swim, novel object, place/social preference). The current study used Monte Carlo methods to simulate behavioral data for a task with four interdependent choices (i.e., increased choice of a given outcome decreases the others). A total of 16,000 datasets were simulated (1000 for each combination of 4 effect sizes and 4 sample sizes), and statistical approaches were evaluated for accuracy. Linear regression and linear mixed effects regression (LMER) with a single random intercept produced high false positive rates (>60%). These elevated false positives were attenuated in an LMER with random effects for all choice levels and in a binomial logistic mixed effects regression. However, these models were underpowered to reliably detect effects at common preclinical sample sizes. A Bayesian method incorporating prior knowledge from control subjects increased power by up to 30%. These results were confirmed in a second simulation (8000 datasets). These data suggest that statistical analyses may often be misapplied in preclinical paradigms, with common linear methods inflating false positives and potential alternatives lacking power. Ultimately, using informed priors may balance statistical requirements with the ethical imperative to minimize the number of animals used. These findings highlight the importance of considering statistical assumptions and limitations when designing research studies.
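The interdependence at the heart of the abstract can be illustrated with a minimal Monte Carlo sketch. This is not the authors' published code (which is written in R); it is a standalone Python illustration with hypothetical choice probabilities, showing that when each simulated animal's trials are drawn from a multinomial choice process, any shift of probability toward one option necessarily comes at the expense of the others:

```python
import random

def simulate_subject(n_trials, probs, rng):
    """Draw n_trials choices among len(probs) options; return per-option counts."""
    counts = [0] * len(probs)
    for _ in range(n_trials):
        r = rng.random()
        cum = 0.0
        for i, p in enumerate(probs):
            cum += p
            if r < cum:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # guard against floating-point shortfall in cum
    return counts

def simulate_group(n_subjects, n_trials, probs, seed=0):
    """Simulate a group of subjects performing the same four-choice task."""
    rng = random.Random(seed)
    return [simulate_subject(n_trials, probs, rng) for _ in range(n_subjects)]

# Hypothetical baseline ("control") probabilities for the four options.
control_probs = [0.10, 0.60, 0.20, 0.10]
# A simulated "effect" shifts probability toward option 4 -- and, because the
# probabilities must sum to 1, necessarily away from another option.
shift = 0.15
treated_probs = [0.10, 0.60 - shift, 0.20, 0.10 + shift]

group = simulate_group(n_subjects=10, n_trials=100, probs=treated_probs)
for counts in group:
    # Compositional constraint: per-subject counts always sum to the trial total,
    # so the four outcome measures are not independent of one another.
    assert sum(counts) == 100
```

Analyzing each of the four count columns as if it were an independent outcome is precisely the misapplication the study quantifies; models that respect the multinomial structure (e.g., binomial/multinomial logistic mixed effects regression) build the sum-to-one constraint in directly.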


Fig. 1: Task and simulation design.
Fig. 2: Experimental design.
Fig. 3: Results of Experiment 1.
Fig. 4: Results of Experiment 2.
Fig. 5: Results of Experiment 3.
Fig. 6: Results from Experiment 4 on the validation data set.


Data availability

Code for this project will be shared on the corresponding author’s GitHub repository (https://github.com/VonderHaarLab) at the time of publication and is also included as Supplement S2. Underlying data for TBI simulations (control set) are publicly available at https://odc-tbi.org/, dataset identifier 703. Data for cue manipulation simulations (validation set) were obtained and used with permission from Dr. Catharine Winstanley at the University of British Columbia.


Acknowledgements

We would like to thank the researchers in the Vonder Haar laboratory and the Winstanley laboratory who helped to collect the original data which were used to support the simulations in this manuscript.

Funding

This work was supported by the NINDS (R01- NS110905; R01-NS110905-05S1) and Ohio State University.

Author information

Authors and Affiliations

Authors

Contributions

Conceptualization: MF, CV; Formal analysis: MF, PM, MY; Funding acquisition: CV; Writing – first draft: MF; Writing – review and editing: MF, PM, CV, MY.

Corresponding author

Correspondence to Cole Vonder Haar.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Frankot, M., Mueller, P.M., Young, M.E. et al. Statistical power and false positive rates for interdependent outcomes are strongly influenced by test type: Implications for behavioral neuroscience. Neuropsychopharmacol. 48, 1612–1622 (2023). https://doi.org/10.1038/s41386-023-01592-6

