Statistical testing for baseline differences between randomised groups is not meaningful

Harvey, L. A.

doi:10.1038/s41393-018-0203-y

Editorial
Published: 04 October 2018

Editor-in-Chief

L. A. Harvey¹

Spinal Cord volume 56, page 919 (2018)

It is still common to see statistical testing of baseline data of clinical trials in an attempt to prove that two or more groups to whom patients have been randomised are comparable. For example, groups may be statistically compared on variables such as age, sex or type of injury that are measured before randomisation and before any intervention has been administered. p values less than 0.05 are interpreted as evidence that the groups are not comparable and hence do not provide a fair basis from which to compare the effects of the intervention. At one level this may seem like a reasonable approach. However, at another level, this practice defies the logic of hypothesis testing and encourages ongoing misuse of statistics.

This is not a new revelation. To the contrary, these issues have been talked about for nearly 30 years by leading biostatisticians [1]. Nonetheless, the practice persists. Here are some comments that should dampen enthusiasm for using p values in this way:

“….performing a significance test to compare baseline variables is to assess the probability of something having occurred by chance when we know that it did occur by chance. Such a procedure is clearly absurd.” p. 126 [2].

“P-values for baseline differences do not serve a useful purpose, since they are not testing a useful scientific hypothesis.” p. 2928 [3].

“With few exceptions, the statistical literature is uniform in its agreement on the inappropriateness of using hypothesis testing to compare the distribution of baseline covariates between treated and untreated subjects in RCTs.” p. 142 [4].

“Indeed the practice can accord neither with the logic of significance tests nor with that of hypothesis tests….I suspect that the practice has originated through confused and false analogies with significance and hypothesis tests in general.” p. 1716 [5].
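The point made in these quotations can be illustrated with a small simulation. The sketch below is not from the editorial; it assumes a hypothetical standardised baseline variable (mean 0, known SD 1), 60 patients per trial, and a simple two-sided z-test. Every trial is randomised correctly, so any baseline difference arises by chance, and roughly 5% of trials should still return p < 0.05.

```python
import random
from statistics import NormalDist, mean


def baseline_p_value(group_a, group_b, sd=1.0):
    """Two-sided z-test p value for a difference in means (SD assumed known)."""
    se = sd * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_trials(n_trials=2000, n_patients=60, seed=1):
    """Fraction of correctly randomised trials with baseline p < 0.05."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_trials):
        # Hypothetical baseline variable, measured before allocation.
        values = [rng.gauss(0, 1) for _ in range(n_patients)]
        # Random allocation: shuffle, then split into two equal groups.
        rng.shuffle(values)
        half = n_patients // 2
        if baseline_p_value(values[:half], values[half:]) < 0.05:
            significant += 1
    return significant / n_trials


fraction = simulate_trials()
print(f"Trials with baseline p < 0.05: {fraction:.3f}")
```

Under these assumptions the printed fraction should sit close to 0.05, which is simply the chosen significance threshold. The test tells us nothing about whether randomisation was done properly.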

Even if we ignore the criticisms of statistical testing for baseline differences, there is the added problem that a non-significant p value may merely reflect a small sample size. That is, there may be large differences that statistical testing fails to detect. And what if there is a significant difference on one baseline variable? It would be rather surprising if there were not, given the high number of variables that are typically measured at baseline. A single p value less than 0.05 among many baseline statistical tests may just reflect a spurious finding.
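Both problems can be put into rough numbers. The sketch below is illustrative only and rests on two assumptions not made in the editorial: the baseline tests are treated as independent (in practice baseline variables are often correlated), and power is approximated with a two-sided z-test on a standardised difference. The function names are hypothetical.

```python
from statistics import NormalDist


def prob_at_least_one_significant(n_tests, alpha=0.05):
    """Chance of at least one p < alpha among n independent null tests."""
    return 1 - (1 - alpha) ** n_tests


def power_two_sample_z(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for a standardised difference."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Expected z statistic when the true difference is effect_size SDs.
    shift = effect_size * (n_per_group / 2) ** 0.5
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)


for k in (1, 5, 10, 20):
    print(f"{k:2d} baseline tests: P(at least one p < 0.05) = "
          f"{prob_at_least_one_significant(k):.2f}")

# A large imbalance (0.8 SD) with only 10 patients per group.
print(f"Power to detect it: {power_two_sample_z(0.8, 10):.2f}")
```

With 10 independent baseline tests the chance of at least one p value below 0.05 is about 40%, so a single "significant" baseline difference is unremarkable. Under the same approximation, a difference as large as 0.8 SD would be detected less than half the time with 10 patients per group, so a non-significant result is no assurance of balance.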

Spinal Cord encourages authors to refrain from statistically testing for possible baseline imbalance in randomised studies.

References

[1] Bland JM, Altman DG. Comparisons against baseline within randomised groups are often used and can be highly misleading. Trials. 2011;12:264.
[2] Altman DG. Comparability of randomised groups. Statistician. 1985;34:125–36.
[3] Pocock SJ, Assmann SE, Enos LE, Kasten LE. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Stat Med. 2002;21:2917–30.
[4] Austin PC, Manca A, Zwarenstein M, Juurlink DN, Stanbrook MB. A substantial and confusing variation exists in handling of baseline covariates in randomized controlled trials: a review of trials published in leading medical journals. J Clin Epidemiol. 2010;63:142–53.
[5] Senn S. Testing for baseline balance in clinical trials. Stat Med. 1994;13:1715–26.

Author information

Authors and Affiliations

University of Sydney, Sydney, Australia
L. A. Harvey

Corresponding author

Correspondence to L. A. Harvey.

About this article

Cite this article

Harvey, L.A. Statistical testing for baseline differences between randomised groups is not meaningful. Spinal Cord 56, 919 (2018). https://doi.org/10.1038/s41393-018-0203-y

Issue Date: October 2018
DOI: https://doi.org/10.1038/s41393-018-0203-y

This article is cited by

Efficacy and safety of 0.05% micellar nano-particulate (MNP) cyclosporine ophthalmic emulsion in the treatment of moderate-to-severe keratoconjunctivitis sicca: a 12-week, multicenter, randomized, active-controlled trial
- A Tarakeswara Rao
- Amit Gupta
- Shoibal Mukherjee
BMC Ophthalmology (2023)
Effects of a 7-Day Pornography Abstinence Period on Withdrawal-Related Symptoms in Regular Pornography Users: A Randomized Controlled Study
- David P. Fernandez
- Daria J. Kuss
- Mark D. Griffiths
Archives of Sexual Behavior (2023)
The trick does not work if you have already seen the gorilla: how anticipatory effects contaminate pre-treatment measures in field experiments
- Barak Ariel
- Alex Sutherland
- Matthew Bland
Journal of Experimental Criminology (2021)
A simple checklist, that is all it takes: a cluster randomized controlled field trial on improving the treatment of suspected terrorists by the police
- Brandon Langley
- Barak Ariel
- Cristobal Weinborn
Journal of Experimental Criminology (2021)
The adaptive designs CONSORT extension (ACE) statement: a checklist with explanation and elaboration guideline for reporting randomised trials that use an adaptive design
- Munyaradzi Dimairo
- Philip Pallmann
- Yuki Ando
Trials (2020)
