Bradley-Terry Models in R

This paper describes the R add-on package BradleyTerry , which facilitates the speciﬁcation and ﬁtting of Bradley-Terry logit models to pair-comparison data. Included are the standard ‘unstructured’ Bradley-Terry model, structured versions in which the parameters are related through a linear predictor to explanatory variables, and the possibility of an order or ‘home advantage’ eﬀect. Model ﬁtting is either by maximum likelihood or by bias-reduced maximum likelihood in which the ﬁrst-order asymptotic bias of parameter estimates is eliminated. Also provided are a simple and eﬃcient approach to handling missing covariate data, and suitably-deﬁned residuals for diagnostic checking of the linear predictor; these are new methodological contributions which will be discussed in greater detail elsewhere.


Bradley-Terry model 1.Introduction
The Bradley-Terry model (Bradley and Terry 1952) assumes that in a 'contest' between any two 'players', say player i and player j (i, j ∈ {1, . . . , K}), the odds that i beats j is α i /α j , where α i and α j are positive-valued parameters which might be thought of as representing 'ability'. For a good general introduction see Agresti (2002). Applications are many, ranging from experimental psychology to the analysis of sports tournaments to genetics (for example, the allelic transmission/disequilibrium test of Sham and Curtis (1995) is based on a Bradley-Terry model in which the 'players' are alleles). The model can alternatively be expressed in the logit-linear form logit[pr(i beats j)] = λ i − λ j , where λ i = log α i for all i. Thus, assuming independence of all contests, the parameters λ i , λ j , etc., can be estimated by maximum likelihood using standard software for generalized linear models, with a suitably specified model matrix. The primary purpose of the BradleyTerry package, implemented in the R statistical computing environment (Ihaka and Gentleman 1996;R Development Core Team 2003), is to facilitate the specification and fitting of such models, including special cases in which the ability parameters are related to available explanatory variables through a linear predictor of the form λ i = p r=1 β r x ir .

Example: analysis of journal citations
The following comes from page 448 of Agresti (2002), extracted from the larger table of Stigler (1994). The data are counts of citations among four prominent journals of statistics: The coefficients here are maximum likelihood estimates of λ 2 , λ 3 , λ 4 , with λ 1 (the log-ability for Biometrika) set to zero as an identifying convention.
Note the use of the special right-hand-side formula '..', which is used to specify the linear predictor λ i − λ j of the standard Bradley-Terry model.
If a different 'reference' journal is required, this can be achieved using the optional refcat argument: for example, making use of update to avoid re-specifying the whole model, > update(citeModel, .~., refcat = "JASA") Call: BTm(formula = citations~.., refcat = "JASA") Coefficients: . The use of the standard Bradley-Terry model for this application is of course rather questionable -for example, citations within a published paper can hardly be considered independent, and the model discards potentially important information on self-citation. Stigler (1994) provides arguments to defend the model's use despite such concerns.

Abilities predicted by explanatory variables
In some application contexts there may be 'player-specific' explanatory variables available, and it is then natural to consider model simplification of the form in which ability of each player i is related to explanatory variables x i1 , . . . , x ip through a linear predictor with coefficients β 1 , . . . , β p . See, for example, Springall (1973). The BTm function allows such models to be specified in a natural way using the standard S -language model formulae.
As a very simple illustration with just one predictor, consider the citations model above but with ability determined by the journal's country of origin: > journalNames <-levels(citations$winner) > journalData <-data.frame(origin = c("UK", "USA", "USA", "UK"), + row.names = journalNames) > citeModel2 <-BTm(citations~origin, data = journalData) > citeModel2 Call: BTm(formula = citations~origin, data = journalData) The UK journals have an estimated advantage in (log) ability of 1.273 over the USA journals. This model saves two parameters, but at the expense of severe lack of fit: clearly journals' ability to be cited varies significantly within at least one of the two countries of origin.
The 'standard' Bradley-Terry model from §1.2 above could have been specified in the same way: > journal <-as.factor(row.names(journalData)) > BTm ( The special model formula '..' used in §1.2 provides a convenient shorthand for the specification of this model.

Missing values
The NA values in the journal-citation data above appear in data rows that are not used in the Bradley-Terry model. Such rows in the data frame of contest results (i.e., the left-hand side of the model formula) are simply discarded by BTm.
Where there are missing values in player-specific predictor (or explanatory) variables which appear on the right-hand side of the model formula, it will typically be very wasteful to discard all contests involving players for which some values are missing. Instead, such cases are accommodated by the inclusion of one or more parameters in the model. If, for example, player 1 has one or more of its predictor values x 11 , . . . , x 1p missing, then the combination of (1) and (2) above yields for all other players j. This results in the inclusion of a 'direct' ability parameter for each player having missing predictor values, in addition to the common coefficients β 1 , . . . , β p -an approach which will be appropriate when the missingness mechanism is unrelated to contest Journal of Statistical Software 5 success. The same device can be used also to accommodate specified departures from a structured Bradley-Terry model, whereby some players have their abilities determined by the linear predictor but others do not.
As a simple illustration, consider the previous citations model in which country of origin is unknown for one of the journals (say, Communications in Statistics): The fit of this model -which in effect allows distinct abilities for JASA and Communications in Statistics, is better (as evidenced by the much-reduced deviance) than the previous model, but is still unacceptable. The two UK journals differ significantly in ability, as may be seen from a summary of the original three-parameter fit: The estimated difference of 0.269 between JRSS-B and the 'reference' journal Biometrika is highly significant (although the correlations likely in this dataset have probably caused the significance of all such comparisons to be overstated in these results).

Order effect
In certain types of application some or all contests have an associated 'bias', related to the order in which items are presented to a judge or with the location in which a contest takes place, for example. A natural extension of the Bradley-Terry model (1) where z = 1 if i has the supposed advantage and z = −1 if j has it. (If the 'advantage' is in fact a disadvantage, δ will be negative.) The scores λ i then relate to ability in the absence of any such advantage.
As an example, consider the baseball data given in Agresti (2002) Here there are 7 teams, and for example Milwaukee beat Detroit 4 times at home (home.adv is 1) and 3 times away from home (home.adv is −1). The 'standard' Bradley-Terry model without a home-advantage parameter is fitted as before: This reproduces the results given on page 438 of Agresti (2002): the home team has an estimated odds-multiplier of exp(0.3023) = 1.35 in its favour.

Ability scores
The function BTabilities extracts estimates and standard errors for the log-ability scores λ 1 , . . . , λ K . These will either be 'direct' estimates, as in the standard Bradley-Terry model or for players with one or more missing predictor values, or 'model-based' estimates of the form λ i = p r=1β r x ir for players whose ability is predicted by explanatory variables. As a simple illustration, estimates in the origin-predicts-ability model for journal citation data are obtained by: Here precision is of course overstated (the reported standard errors are too small), since this particular model was a poor fit to the data.

Residuals
There are two main types of residuals available for a Bradley-Terry model object.
First, there are residuals obtained by the standard methods for models of class glm. These all deliver one residual for each contest or type of contest. These residuals estimate the error in the linear predictor; they are obtained by suitable aggregation of the so-called 'working' residuals from the glm fit. From these residuals it is immediately evident, for example, that the origin-predicts-ability model understates the ability of JASA and overstates the ability of Communications of Statistics (and similarly for JRSS-B versus Biometrika). The weights attribute indicates the relative information in these residualsweight is roughly inversely proportional to variance -which may be useful for plotting and/or interpretation; for example, a large residual may be of no real concern if based on very little information. Weighted least-squares regression of these residuals on any variable already in the model is null. For example: > resids <-BTresiduals(citeModel2) > journalData$origin[2] <-"USA" ## ie the previous value is restored > lm(resids~origin, weights = attr(resids, "weights"), + data = journalData) Call: lm(formula = resids~origin, weights = attr(resids, "weights"), data = journalData) Coefficients: (Intercept) originUSA 1.690e-16 -4.391e-16

Bias-reduced estimates
Model-fitting in BTm is by default computed by maximum likelihood, using an internal call to the glm function. An alternative is to fit by bias-reduced maximum likelihood (Firth 1993): this requires additionally the brlr package, and is specified by the optional argument br = TRUE. The resultant effect, namely removal of first-order asymptotic bias in the estimated coefficients, is often quite small. One notable feature of bias-reduced fits is that all estimated coefficients and standard errors are necessarily finite, even in situations of 'complete separation' where MLEs take infinite values (Heinze and Schemper 2002).

Model search
In addition to update() as illustrated above, methods for the generic functions add1() and drop1() are provided. These can be used in the standard way for model elaboration or specialization, and their availability also allows the use of step() for automated exploration of a set of candidate player-specific predictors.
9. Setting up the data

Contest results
The left-hand side of the model formula supplied to BTm is a data frame with at least two columns. The citations object shown in §1 above is an example; baseball in §4 is another. Each row represents a contest result. One column (either named "winner", or the first column if no column has that name), is a factor indicating contest winners; another (either "loser", or column 2) indicates contest losers. An optional numeric column named "Freq" contains the frequency of each result; if this column is absent, all frequencies are taken to be 1.
If order.effect is specified, it should be a numeric vector of the same length as the number of rows in the contest-results data frame. It may be convenient to store such a vector in the same data frame, as was done in the baseball dataset above. Values should be 1 where the winner is advantaged by the effect, -1 where the loser is advantaged, and 0 where neither player is advantaged.
To use only certain rows of the data in the analysis, the subset argument may be used in the call to BTm. This should either be a logical vector of the same length as the number of rows in the contest-results data frame, or a numeric vector containing the indices of rows to be used.

Predictors
Variables which appear in the right-hand side of the model formula are 'player'-level predictor variables. The safest approach is to put all potential predictor (explanatory) variablesincluding factors and any offset term -into a data frame like journalData above, with one row per (potential) player, and with row names the names of players exactly as used in the "winner" and "loser" columns of the contest-results data frame. The data argument to BTm, which applies only to right-hand side variables, is then used to identify the data frame in which predictors (and any offset) can be found.
An offset in the model can be specified using the offset argument to BTm, which should be a vector of length equal to that of the other right-hand side variables (and which should, for tidiness, come from the same data frame as other predictors).
The BradleyTerry package does not provide: • any methods for dealing with ties, i.e., contests in which neither player wins.
• any facilities either for handling contest-specific (as opposed to player-specific) predictor variables, except for the possibility of an order effect as described above.
These extensions to the Bradley-Terry model can be achieved in R (or elsewhere) by fitting suitably constructed log-linear models -see, for example, Critchlow and Fligner (1991) and Dittrich, Hatzinger, and Katzenbeisser (1998). They are outside the scope of the BradleyTerry package, whose purpose is to simplify the specification and fitting of Bradley-Terry models with player-specific predictors (including of course the 'saturated' case of the standard Bradley-Terry model (1)).
A useful extension of the BradleyTerry package would be to allow the inclusion of a playerspecific random effect, as in with the {U i } distributed independently as N (0, σ U ) for example, to allow for imperfect representation of ability by the linear predictor β r x ir . Work on this is in progress.