Sir

I agree with the points you make about statistical significance under the heading 'Significant' in your News Feature 'Disputed definitions' (Nature 455, 1023–1028; 2008). However, you do imply that the term 'significant' means simply above or below the 5% level — a figure chosen by the statistician R. A. Fisher for practical reasons and used in the days when people did arithmetic by hand and referred to printed tables.

Nowadays, of course, personal computers do more general calculations and report probability (P) values directly. A P-value may be exact (obtained by counting permutations), an asymptotic approximation, or derived from a model by repeated simulation. It then has to be reported and interpreted. Too many scientists, and too many editors, take the line you reproach and use statistical significance as a criterion of importance.
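To make the first of these routes concrete, here is a minimal sketch (names and data invented for illustration) of an exact P-value obtained by counting permutations: every reassignment of the pooled observations to the two groups is enumerated, and the P-value is the fraction of reassignments whose mean difference is at least as extreme as the one observed.

```python
# Illustrative sketch only: an exact two-sided permutation P-value for a
# difference in group means. Function name and data are hypothetical.
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Fraction of all label permutations whose absolute difference in
    means is at least as large as the observed one."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    count = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        if diff >= observed - 1e-12:   # tolerance for floating-point ties
            count += 1
        total += 1
    return count / total
```

With three observations per group, only 20 reassignments exist, so the smallest attainable P-value is 1/20; this is one reason small studies rarely reach conventional significance, whatever the true effect.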

In addition, significance is calculated in respect of a null model, chosen by the researcher and often in the knowledge that it is untenable. Why would you make measurements to compare groups if you expected to find no differences? A small P-value may therefore be pure fiction as a measure of knowledge gained. This comes on top of any undisclosed history of data selection and of cherry-picking results during the data analysis.

Conversely, numbers obtained from small surveys rarely demonstrate clear-cut (significant) results for individual questions, and a pattern of non-significant results in an expected direction across a range of questions could still be worth reporting as indicative. When the null hypothesis is a straw man, it may be more interesting not to be able to demonstrate the anticipated effect — for example, in a pay survey that finds no gender differences.

I endorse your view that what may seem to be sophistry is a crucial distinction. Compare, for example, the statement “The observed differences could occur 5% of the time if the true effect is zero” with the statement “The probability that the true effect is zero is 5%”. Not only is the latter statement wrong, it does not match the scientific question, which should be to estimate, at a given probability, the minimum size of the effect. Another common variation is to report “no differences between groups” on the basis of t-tests that check for a difference only between the group means.
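The last point can be shown with a small invented example: two groups constructed to have identical means but wildly different spreads. A t-test compares only the means, so it would report "no differences between groups" here, even though the groups plainly differ.

```python
# Illustrative sketch with invented data: equal means, very unequal spreads.
group_a = [9.9, 10.0, 10.1, 9.95, 10.05]
group_b = [2.0, 18.0, 5.0, 15.0, 10.0]

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    # Sample variance with the usual n - 1 denominator.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Both means are 10.0, so a t-test finds nothing to reject,
# yet the variances differ by a factor of several thousand.
print(mean(group_a), mean(group_b))
print(variance(group_a), variance(group_b))
```

A test of means answers only the question it was asked; concluding "no differences between groups" from it quietly substitutes a broader claim.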

For scientists, talking statistics can be more dangerous than what your interviewee described as “talking Swahili in Louisiana” — unless they grasp the grammar as well as the words.