A reanalysis of Lord’s statistical treatment of football numbers

https://doi.org/10.1016/j.jmp.2009.01.002

Abstract

Stevens’ theory of admissible statistics [Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677–680] states that measurement levels should guide the choice of statistical test, such that the truth value of statements based on a statistical analysis remains invariant under admissible transformations of the data. Lord [Lord, F. M. (1953). On the statistical treatment of football numbers. American Psychologist, 8, 750–751] challenged this theory. In a thought experiment, a parametric test is performed on football numbers (identifying players: a nominal representation) to decide whether a sample from the machine issuing these numbers should be considered non-random. This is an apparently illegal test, since its outcomes are not invariant under admissible transformations for the nominal measurement level. Nevertheless, it results in a sensible conclusion: the number-issuing machine was tampered with. In the ensuing measurement-statistics debate Lord’s contribution has been influential, but has also led to much confusion. The present aim is to show that the thought experiment contains a serious flaw. First it is shown that the implicit assumption that the numbers are nominal is false. This disqualifies Lord’s argument as a valid counterexample to Stevens’ dictum. Second, it is argued that the football numbers do not represent just the nominal property of non-identity of the players; they also represent the amount of bias in the machine. It is a question about this property, not a property that relates to the identity of the football players, that the statistical test is concerned with. Therefore, only this property is relevant to Lord’s argument. We argue that the level of bias in the machine, indicated by the population mean, conforms to a bisymmetric structure, which means that it lies on an interval scale. In this light, Lord’s thought experiment, interpreted by many as a problematic counterexample to Stevens’ theory of admissible statistics, conforms perfectly to Stevens’ dictum.

Section snippets

Admissible statistics and the measurement-statistics debate

In typical introductory statistics classes, psychology students are taught that the level of measurement should be taken into account when choosing a statistical test. For example, a t test should not be performed on data that are of a nominal or ordinal level. Exactly why this rule should be followed is rarely explained and not widely known among psychologists; therefore we reiterate the rationale for it. Suppose mathematical proficiency of children was measured on an ordinal level. In such a
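The rationale for the rule can be illustrated with a small sketch. On an ordinal scale, any strictly increasing rescaling of the scores is an admissible transformation, so a conclusion should not change when one is applied. The scores and the square-root rescaling below are hypothetical and chosen only for illustration; the comparison of means stands in for the comparison on which a t test rests.

```python
import math
from statistics import mean, median

# Hypothetical ordinal proficiency scores for two groups of children.
group_a = [1, 1, 9]
group_b = [3, 3, 3]

def rescale(scores):
    """Strictly increasing rescaling: admissible for ordinal data."""
    return [math.sqrt(x) for x in scores]

def a_exceeds_b(stat, a, b):
    """True if the chosen statistic is larger for group a than for group b."""
    return stat(a) > stat(b)

# The comparison of means depends on the arbitrary choice of scale values.
mean_raw = a_exceeds_b(mean, group_a, group_b)
mean_rescaled = a_exceeds_b(mean, rescale(group_a), rescale(group_b))

# The comparison of medians is unaffected by the same rescaling.
median_raw = a_exceeds_b(median, group_a, group_b)
median_rescaled = a_exceeds_b(median, rescale(group_a), rescale(group_b))

print("mean(A) > mean(B):    ", mean_raw, "->", mean_rescaled)
print("median(A) > median(B):", median_raw, "->", median_rescaled)
```

With these values the statement "group A has the higher mean" is true before the rescaling and false after it, whereas the statement about medians keeps its truth value. This is the sense in which a mean-based conclusion is not invariant at the ordinal level.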

What do the football numbers measure?

The professor in Lord’s thought experiment repeatedly emphasizes that the numbers are nominal representations of the uniqueness of the players. Now, the numbers can certainly be used to distinguish players on the field; but this is not the property for which the statistician uses the numbers. Instead, the professor asks a question and draws a conclusion about the machine—namely that it was unlikely to be in its original state (randomly shuffled by the professor) when the freshman numbers were

Measuring machines

Lord’s statistician uses the statistical results to make an inference about the state of the vending machine, and decides that the freshman mean did not come from the machine in its original state. Thus, Lord’s inference concerns the state of the machine relative to another (possible) state of the machine. His reference class is not a set of football players, but a set of possible states of the machine (e.g., fair and biased states). Insofar as measurement is taking place in the thought
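The kind of inference described here can be sketched as a test of a sample mean against the known contents of the machine. Everything in the sketch is an assumption made for illustration: the machine contents, the freshman sample, and the choice of a two-sided z test are hypothetical and are not Lord’s figures or necessarily his exact procedure. The correction for sampling without replacement is ignored for simplicity.

```python
import math
from statistics import NormalDist, fmean, pstdev

# Hypothetical machine contents in its original state: the numbers 0..99.
machine_numbers = list(range(100))
pop_mean = fmean(machine_numbers)
pop_sd = pstdev(machine_numbers)

# Hypothetical freshman sample, deliberately skewed toward low numbers.
freshman_sample = [3, 7, 12, 5, 20, 15, 9, 1, 18, 11, 6, 14, 2, 22, 8, 10]

def z_test_mean(sample, mu, sigma):
    """Two-sided z test of a sample mean against a known population."""
    standard_error = sigma / math.sqrt(len(sample))
    z = (fmean(sample) - mu) / standard_error
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

z, p_value = z_test_mean(freshman_sample, pop_mean, pop_sd)
print(f"z = {z:.2f}, two-sided p = {p_value:.2g}")
```

A large negative z indicates that a sample mean this low would be very unlikely if the machine were in its original state. Note that the conclusion concerns the machine, not the players who wear the numbers.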

Conclusion

We have examined extensively why the test in Lord’s thought experiment appears to be inadmissible, while at the same time it leads to a scientifically useful and informative conclusion. In doing so we found that Lord’s argument depends on the assumption that the football numbers represent a property on the nominal level. Not only was it shown that it is immaterial to the argument that the numbers represent nominal uniqueness of the players, it was also shown that another property can be

Acknowledgments

We would like to extend our thanks to professors William H. Batchelder and R. Duncan Luce for several helpful suggestions on earlier versions of this manuscript.
