Estimating Extensive Form Games in R

This article introduces new software, the games package, for estimating strategic statistical models in R. In these models, the probability distribution over outcomes corresponds to the equilibrium of an underlying game form. We review such models and provide derivations for one example, including discussion of alternative motivations for the stochastic component of the models. We introduce the basic functionality of the games package, such as how to estimate players' utilities for outcomes as a function of covariates. The package implements maximum likelihood estimation for the most commonly used models of strategic choice, including three extensive form games and an ultimatum bargaining model. The software also includes functions for bootstrapping, plotting fitted values with their confidence intervals, performing non-nested model comparisons, and checking for global convergence failures. We use the new software to replicate Leblang's (2003) analysis of speculative currency attacks.


Introduction
The games package provides functions for estimating extensive form models of strategic interaction in the R language (R Core Team 2013). These are random-utility models of choices by multiple agents, each of whom conditions his or her actions on the likely decisions of the other players. In these models, the distribution of outcomes is determined by the equilibrium of the underlying game. The goal is to estimate how the players' utility for each possible outcome varies as a function of observed variables. These models are appropriate for situations where the ultimate outcome following one actor's choice depends on actions taken by another, which are common in social science. In such cases, standard estimators like binary-choice regression or Heckman selection models will lead to incorrect inferences (Signorino 1999, 2002; Signorino and Yilmaz 2003). The games package currently implements full information maximum likelihood estimation functions for four models of strategic choice:
egame12, a discrete extensive form game with two players and three terminal nodes.
egame122, a discrete extensive form game with two players and four terminal nodes.
egame123, a discrete extensive form game with three players and four terminal nodes.
ultimatum, the ultimatum bargaining game.
The package also provides various post-estimation functions, including convergence checks, plotting of fitted values, and non-nested model comparison tests.
The models available in the games package include those that have been used most widely in the analysis of strategic interaction within the political science literature. The egame12 model, in which there are two players and three possible outcomes, has been used to analyze speculative attacks in currency markets (Leblang 2003), deterrence of international conflict (Signorino and Tarar 2006), the efficacy of economic sanctions (McLean and Whang 2010), and Latin American governmental crises (Helmke 2010). The egame122 model, with two players and four possible outcomes, has been applied to data on candidate entry into U.S. Congressional races (Carson 2003, 2005), appeals court decisions (Randazzo 2008), and coalition formation between lobbyists (Holyoke 2009). The ultimatum bargaining model has been used to analyze lab experiments of the ultimatum game (Ramsay and Signorino 2009) and bargaining between rulers and subjects over war mobilization (Kedziora 2012).
All of the models implemented in the games package are recursive, sequential move games with finite time horizons. As such, they represent one part of a broader literature on statistical modeling of competitive behavior. Much of the earliest econometric work in this area focused on static games with simultaneous moves, with applications to labor force participation and market entry (Bjorn and Vuong 1984; Bresnahan and Reiss 1991; Berry 1992). More recent work on static games has focused on semiparametric estimation via iterative methods (Aradillas-Lopez 2010; Bajari, Hong, Krainer, and Nekipelov 2010). Similar techniques have been developed to estimate the parameters of dynamic games from observed data (Hotz and Miller 1993; Aguirregabiria and Mira 2002; Pakes, Ostrovsky, and Berry 2007; Pesendorfer and Schmidt-Dengler 2008). Another important area of the econometric literature on strategic interaction is the estimation of auction models. These techniques use bid data to estimate features of the distribution of private valuations of the goods being auctioned (Laffont, Ossard, and Vuong 1995; Guerre, Perrigne, and Vuong 2000; Bajari and Hortaçsu 2005). To be clear, the games package currently provides statistical tools for estimating recursive, sequential games, and not simultaneous move games or dynamic games with infinite time horizons.
Estimation of discrete strategic models was previously implemented as the program Strat (Signorino 2003a) using the GAUSS programming language (Aptech Systems, Inc. 2006). The games package provides new models and additional post-estimation functionality over that available in Strat, and it is implemented in the popular R language. For analysis of other models with a recursive structure, Bas, Signorino, and Walker (2007) provide a multistage method similar to Heckman's (1979) two-step estimator for selection models. This method, statistical backward induction (SBI), uses only logistic or probit regression and can be easily implemented in virtually any statistical software. However, like the Heckman two-step estimator, statistical backward induction is inefficient relative to full information maximum likelihood. The purpose of the games package is to provide efficient maximum likelihood estimation for the most common extensive-form models of strategic choice.
Section 2 of this paper provides a brief introduction to strategic statistical models, including a derivation of a simple example. Section 3 discusses the details of their implementation in the games package (Signorino and Kenkel 2014) and provides a replication of Leblang (2003). The post-estimation functionality is covered in Section 4. Package games is available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=games.

Strategic statistical models
Every strategic statistical model is associated with a game form and solution concept. First, the structure of the interaction must be known: the number of players, the order in which they move, the number of actions each has available, and the possible outcomes. The purpose is to estimate players' utilities for each outcome, usually as a function of covariates, from data on observed outcomes of the game being played. This requires the introduction of a stochastic component, so that there is a non-degenerate probability distribution over outcomes for any given set of coefficients (i.e., those on the covariates describing players' utilities). In particular, we will specify where error enters the model and calculate the equilibrium outcome for each given set of parameters and stochastic shocks, using the appropriate solution concept for the assumed stochastic structure (see below). The probability of each outcome can then be obtained by assuming a distribution for the error terms.
The choice of stochastic structure is crucial for the estimation and interpretation of utility parameters. The games package implements methods for two cases:
Agent error: Each player's utility over outcomes is fixed and common knowledge. However, there are perceptual or implementation errors that can lead to a player not choosing the action that maximizes her expected utility calculated in terms of the outcome payoffs. This can be represented as a shock $\alpha_{mj}$ to Player $m$'s expected utility for taking action $j$, where the shock is realized immediately before $m$ makes her action choice (and hence is unknown to the preceding players). We typically assume that each $\alpha_{mj}$ is drawn independently from a normal or logistic distribution. The solution concept under agent error is quantal response equilibrium (McKelvey and Palfrey 1998), wherein each player anticipates the probability of "mistakes" by others and adjusts her expectations accordingly.
Private information: There is a different stochastic shock to each player's utility for each outcome. We will write this as $\pi_{mk}$, for Player $m$ and outcome $k$. The key assumption is that each player fully knows her utility for each outcome, but only knows the distribution of the shocks to the other players' outcome utilities. The solution concept in this case is perfect Bayesian equilibrium: each player takes the action that gives her the highest expected utility, with respect to the realized shocks to her preferences and her expectations about the actions the other players will take. Whereas the distribution over outcomes was induced by the possibility of wrong decisions in the agent error case, now it comes from the fact that observationally indistinguishable players may have different privately known preferences. Except in the statistical ultimatum model, we will assume that each $\pi_{mk}$ is drawn from a normal distribution.
We illustrate both of these stochastic structures in the egame12 example below. It is important to recognize that different assumptions about the form of the error correspond to distinct statistical models, and thus may yield different results. Users of the games package should select the form of error with care, taking into account which set of assumptions best fits their application. The implications of different error structures for strategic models are discussed in detail in Signorino (2003b).

Illustration: The egame12 model
The egame12 model, with two players and three possible outcomes, is the simplest strategic model. The players are indexed $m = 1, 2$, and each chooses an action $a_m \in \{L, R\}$. The outcomes are indexed $Y = 1, 3, 4$. The structure of the interaction is as follows:
1. Player 1 chooses his action $a_1$. If $a_1 = L$, the game ends and the outcome is $Y = 1$. Otherwise, if $a_1 = R$, Player 2 gets to move.
2. Player 2 chooses her action $a_2$. If $a_2 = L$, the outcome is $Y = 3$; if $a_2 = R$, the outcome is $Y = 4$.
This structure is illustrated in the game trees in Figure 1. Let $p_1$ and $p_2$ denote the action probabilities for Player 1, where $p_1 = P(a_1 = L)$ and $p_2 = P(a_1 = R)$. Let Player 2's action probabilities, conditional on 2's move being reached, be $p_3 = P(a_2 = L \mid a_1 = R)$ and $p_4 = P(a_2 = R \mid a_1 = R)$.
Each player's utility depends on the outcome of the game; utility to Player $m$ for outcome $k \in \{1, 3, 4\}$ is denoted $U_{mk}$. When estimating this game, we usually model each of these utilities as a linear function of known covariates, $U_{mk} = x_{mk}\beta_{mk}$, with the goal of estimating the coefficients $\beta_{mk}$. Each player chooses her action $a \in \{L, R\}$ to maximize her expected utility, $EU_m(a)$. Since Player 2 moves last, her expected utilities are simply $EU_2(L) = U_{23}$ and $EU_2(R) = U_{24}$. Similarly, since the action $a_1 = L$ is a game-ending move, Player 1's expected utility from this is $EU_1(L) = U_{11}$. However, Player 1's expected utility from the action R depends on Player 2's choice, so we have $EU_1(R) = p_3 U_{13} + p_4 U_{14}$.
We observe $N$ plays of the game, each with realized outcome $Y_i$ and associated regressors $x_{mki}$ for each player $m$ and outcome $k$. Our goal is to estimate $\beta = (\beta_{mk})_{m \in \{1,2\}, k \in \{1,3,4\}}$, the set of coefficients describing players' utilities for particular outcomes, via maximum likelihood. Once we assume a stochastic structure (agent error or private information), we can calculate the observation-wise choice probabilities $p_{1i}, \ldots, p_{4i}$ for any $\beta$. The log-likelihood is then
$$\log L(\beta \mid Y) = \sum_{i=1}^{N} \Big[ \mathbf{1}\{Y_i = 1\} \log p_{1i} + \mathbf{1}\{Y_i = 3\} \log (p_{2i}\, p_{3i}) + \mathbf{1}\{Y_i = 4\} \log (p_{2i}\, p_{4i}) \Big]. \tag{1}$$
The calculation of these choice probabilities is the subject of the following subsections.
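As a minimal sketch of the log-likelihood in Equation 1, the following base R function sums the log probability of each observed outcome, given vectors of choice probabilities. The function name egame12_loglik and the toy probabilities are hypothetical and are not part of the games package; only the outcome coding 1, 3, 4 comes from the text.

```r
# Log-likelihood of the egame12 model given observation-wise choice probabilities.
# y  : observed outcomes, coded 1, 3, or 4 as in the text
# p1 : P(a1 = L)
# p3 : P(a2 = L | a1 = R);  p4 : P(a2 = R | a1 = R)
egame12_loglik <- function(y, p1, p3, p4) {
  p2 <- 1 - p1                                  # P(a1 = R)
  out_prob <- ifelse(y == 1, p1,
              ifelse(y == 3, p2 * p3, p2 * p4)) # probability of the observed outcome
  sum(log(out_prob))
}

# Toy example: three plays of the game with made-up probabilities
y  <- c(1, 3, 4)
p1 <- c(0.6, 0.3, 0.2)
p4 <- c(0.5, 0.4, 0.7)
ll <- egame12_loglik(y, p1 = p1, p3 = 1 - p4, p4 = p4)
ll
```

In an actual estimation routine, the probability vectors would themselves be functions of $\beta$, and this sum would be handed to a numerical optimizer.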

Agent error
Consider observation $i \in \{1, \ldots, N\}$ and assume that each player receives a stochastic shock $\alpha_{mji}$ to her expected utility for choosing $j \in \{L, R\}$, where each $\alpha_{mji}$ is drawn independently from a normal distribution with mean 0 and variance $\sigma^2$. To solve for the quantal response equilibrium of the game, we will proceed via backward induction, solving for Player 2's choice probabilities in order to find Player 1's.
If Player 2's turn is reached, her choice determines the outcome for sure. We thus have $EU_{2i}(L) = U_{23i} = x_{23i}\beta_{23}$ and $EU_{2i}(R) = U_{24i} = x_{24i}\beta_{24}$. The ex ante probability that Player 2 chooses R is
$$p_{4i} = P(U_{24i} + \alpha_{2Ri} \ge U_{23i} + \alpha_{2Li}) = \Phi\left(\frac{x_{24i}\beta_{24} - x_{23i}\beta_{23}}{\sigma\sqrt{2}}\right), \tag{2}$$
where $\Phi(\cdot)$ is the standard normal cumulative distribution function (CDF). Since Player 2 must choose from L or R, we have $p_{3i} = 1 - p_{4i}$.
We now can solve for Player 1's choice probabilities. If Player 1's action is L, the outcome is $Y_i = 1$ for certain. However, if he chooses R, the outcome is a lottery over outcomes 3 and 4, with probabilities $p_{3i}$ and $p_{4i}$ respectively. Since Player 1 also receives a shock to his expected utilities by action, his ex ante chance of choosing R is
$$p_{2i} = P(p_{3i} U_{13i} + p_{4i} U_{14i} + \alpha_{1Ri} \ge U_{11i} + \alpha_{1Li}) = \Phi\left(\frac{p_{3i} U_{13i} + p_{4i} U_{14i} - U_{11i}}{\sigma\sqrt{2}}\right). \tag{3}$$
Because Player 1 must choose L or R, we have $p_{1i} = 1 - p_{2i}$. We can then estimate the agent error model for a given dataset by substituting Equations 2 and 3 into the log-likelihood function, Equation 1.
As in standard binary dependent variable models (e.g., generalized linear models (GLMs) with a logit or probit link), the statistical model is not identified with respect to the scale parameter $\sigma$, so it cannot be estimated (Signorino 1999; Lewis and Schultz 2003). The scale parameter is fixed to $\sigma = 1$ in all of the extensive-form models in the games package, so that each estimated utility coefficient $\hat{\beta}_{mkj}$ can be interpreted as an estimate of the ratio $\beta_{mkj}/\sigma$. Alternatively, we allow for $\sigma$ to be modeled as a function of covariates, in which case the regression coefficients for these variables can be estimated. For details, see the example in Section 3.3.
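To make the agent error probabilities and the scale problem concrete, the sketch below computes the choice probabilities in base R. It assumes normal shocks, so that the difference of two independent shocks has standard deviation $\sigma\sqrt{2}$. The function agent_probs and the utility values are illustrative assumptions, not package code. Multiplying every utility and $\sigma$ by the same constant leaves all probabilities unchanged, which is why $\sigma$ cannot be estimated separately.

```r
# Agent-error choice probabilities for egame12 with normally distributed shocks.
# u11, u13, u14 : Player 1's utilities for outcomes 1, 3, 4
# u23, u24      : Player 2's utilities for outcomes 3, 4
# sigma         : standard deviation of each action-specific shock
agent_probs <- function(u11, u13, u14, u23, u24, sigma = 1) {
  p4 <- pnorm((u24 - u23) / (sigma * sqrt(2)))    # Player 2 chooses R
  p3 <- 1 - p4
  eu1R <- p3 * u13 + p4 * u14                     # Player 1's expected utility from R
  p2 <- pnorm((eu1R - u11) / (sigma * sqrt(2)))   # Player 1 chooses R
  list(p1 = 1 - p2, p2 = p2, p3 = p3, p4 = p4)
}

# Hypothetical utilities with sigma = 1 ...
base <- agent_probs(u11 = 0.5, u13 = 0, u14 = 1.2, u23 = 0, u24 = 0.8)
# ... and the same utilities multiplied by 3, with sigma = 3
scaled <- agent_probs(u11 = 1.5, u13 = 0, u14 = 3.6, u23 = 0, u24 = 2.4, sigma = 3)

same_probs <- isTRUE(all.equal(base, scaled))
same_probs
```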

Private information
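Assume there is an additive shock $\pi_{mki}$ to each outcome utility $U_{mki}$, where each $\pi_{mki}$ is drawn independently from a normal distribution with mean 0 and variance $\sigma^2$. We will again proceed by backward induction, this time to solve for the perfect Bayesian equilibrium.
Player 2 will choose R if and only if $U_{24i} + \pi_{24i} \ge U_{23i} + \pi_{23i}$. The ex ante probability of Player 2 choosing R is therefore
$$p_{4i} = P(U_{24i} + \pi_{24i} \ge U_{23i} + \pi_{23i}) = \Phi\left(\frac{x_{24i}\beta_{24} - x_{23i}\beta_{23}}{\sigma\sqrt{2}}\right). \tag{4}$$
As before, $p_{3i} = 1 - p_{4i}$. We can now write Player 1's expected utility for choosing R in the private information case as
$$EU_{1i}(R) = p_{3i}(U_{13i} + \pi_{13i}) + p_{4i}(U_{14i} + \pi_{14i}).$$
The ex ante probability of Player 1 selecting R is
$$p_{2i} = P(EU_{1i}(R) \ge U_{11i} + \pi_{11i}) = \Phi\left(\frac{p_{3i} U_{13i} + p_{4i} U_{14i} - U_{11i}}{\sigma\sqrt{1 + p_{3i}^2 + p_{4i}^2}}\right). \tag{5}$$
Then, as in the agent error case, we can estimate the model by letting $p_{1i} = 1 - p_{2i}$ and substituting Equations 4 and 5 into the log-likelihood function, Equation 1. As before, for identification we must either set $\sigma = 1$ or model $\sigma$ as a function of regressors.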
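The private information probabilities can be sketched the same way. The only change from the agent error case is the denominator for Player 1: the composite shock $\pi_{11i} - p_{3i}\pi_{13i} - p_{4i}\pi_{14i}$ has variance $\sigma^2(1 + p_{3i}^2 + p_{4i}^2)$, which depends on Player 2's choice probabilities. The function private_probs and the numeric utilities below are hypothetical illustrations, not package code.

```r
# Private-information choice probabilities for egame12 with normal shocks.
private_probs <- function(u11, u13, u14, u23, u24, sigma = 1) {
  p4 <- pnorm((u24 - u23) / (sigma * sqrt(2)))   # Player 2 chooses R
  p3 <- 1 - p4
  eu1R <- p3 * u13 + p4 * u14                    # systematic part of EU_1(R)
  # standard deviation of pi_11 - p3 * pi_13 - p4 * pi_14
  sd1 <- sigma * sqrt(1 + p3^2 + p4^2)
  p2 <- pnorm((eu1R - u11) / sd1)                # Player 1 chooses R
  list(p1 = 1 - p2, p2 = p2, p3 = p3, p4 = p4, sd1 = sd1)
}

# Player 2 indifferent on average: p3 = p4 = 0.5, the smallest possible variance
even <- private_probs(u11 = 0.5, u13 = 0, u14 = 1.2, u23 = 0, u24 = 0)
# Player 2 almost surely chooses R: p4 close to 1, variance close to its maximum
skewed <- private_probs(u11 = 0.5, u13 = 0, u14 = 1.2, u23 = 0, u24 = 10)

c(even = even$sd1^2, skewed = skewed$sd1^2)
```

With $\sigma = 1$, the variance of Player 1's composite shock therefore ranges between 1.5 and 2, compared with a constant 2 in the agent error model.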
The agent error and private information models are very similar, but not observationally equivalent. Equations 2 and 4 are identical, so the action probabilities for Player 2 are the same under either model (holding fixed $\beta$). However, the expressions for Player 1's choice probabilities, Equations 3 and 5, are slightly different. In the agent error case, the choice probability for Player 1 depends on the difference in expected utility shocks, $\alpha_{1Li} - \alpha_{1Ri}$, which is identically distributed across observations $i \in \{1, \ldots, N\}$. In the private information model, the probability depends on the difference of the weighted outcome utility shocks, $\pi_{11i} - p_{3i}\pi_{13i} - p_{4i}\pi_{14i}$. This is normally distributed with mean 0 regardless of the values of $p_{3i}$ and $p_{4i}$, but its variance (taking $\sigma = 1$) is $1 + p_{3i}^2 + p_{4i}^2$ and thus may vary across observations. In particular, the variance is least when $p_{3i} = p_{4i} = 0.5$ and greatest when $p_{3i} = 1$ or $p_{4i} = 1$. In practice, this does not make for a major difference, and the two models usually yield similar estimates of $\beta$.

Identification of model parameters
Lewis and Schultz (2003) provide a necessary condition for identification in discrete-choice strategic models: no regressor, including the constant, may appear in all of a player's utility equations for the outcomes reachable after her move. All of the fitting functions in the games package enforce this condition. One way to accomplish it is to fix each player's utility to 0 for one outcome. This comes without loss of generality, since Von Neumann-Morgenstern utilities are unique only up to a positive affine transformation.
The necessary condition is sufficient if there is enough variation in the regressor and outcome data. As an example of insufficient variation in the regressors, consider the egame12 model with: where $x$ is a binary covariate. This specification meets the necessary condition, since $U_{13}$ and $U_{23}$ are both fixed to 0. We have that $p_{4i} = \Phi(\beta_{240}/\sqrt{2}) \equiv p_4$ for all $i$. Let $b$ be any real number and consider the alternative parameters It is easily verified from Equations 2 and 3 (for the agent error model) or Equations 4 and 5 (for the private information model) that the choice probabilities are the same for all observations under $\beta$ and $\tilde{\beta}$. The two sets of parameters are thus observationally equivalent, so the model parameters are not identified. This kind of identification problem is averted if there is at least one continuous covariate included in the model, as then For other examples of identification failure due to insufficient variation in the covariates, see Lewis and Schultz (2003).
The other identification problem that may arise is separation, a common issue in models of discrete choice (Albert and Anderson 1984). For example, in the egame12 model, suppose there is a covariate $x_j$ (or more generally a linear combination of covariates) that enters Player 2's utility such that for some critical value $\bar{x}$. Then, as in ordinary logistic or probit regression models, maximum likelihood estimates (MLEs) of the parameters associated with this covariate do not exist. In practice, the parameter estimates will drift toward infinity and the fitting procedure will typically fail to converge. Separation holds trivially if one of the outcomes of a model is not observed, so another necessary condition for existence of an MLE is that each outcome be observed at least once in the data.
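The effect of separation can be seen in a toy single-equation probit, which stands in here for Player 2's decision. In the hypothetical data below, the response equals 1 exactly when the covariate is positive, so the log-likelihood keeps increasing as the slope grows and no finite maximizer exists. The data and function names are illustrative only.

```r
# Perfectly separated toy data: y = 1 exactly when x > 0
x <- c(-2, -1, -0.5, 0.5, 1, 2)
y <- as.numeric(x > 0)

# Probit log-likelihood as a function of the slope b (no intercept)
probit_loglik <- function(b) {
  sum(y * pnorm(b * x, log.p = TRUE) +
      (1 - y) * pnorm(-b * x, log.p = TRUE))
}

slopes <- c(1, 2, 4, 8)
ll <- sapply(slopes, probit_loglik)
ll   # strictly increasing toward 0: a larger slope always fits better
```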
In addition to these global identifiability issues, there is also the possibility that a numerical maximization procedure will stop at a point that is not a strict local maximum (e.g., a saddle point or a flat spot in the likelihood function). A sufficient condition for an estimate $\hat{\beta}$ to be a strict local maximum is that the Hessian matrix be negative definite at $\hat{\beta}$; all fitting functions in the games package issue a warning if the result does not satisfy this condition.
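A check of this kind can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's own implementation: it fits a simple probit on simulated data with optim (standing in for the optimizer used by the package) and inspects the eigenvalues of the log-likelihood Hessian at the solution.

```r
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, pnorm(0.3 + 0.8 * x))   # simulated probit data

# Negative log-likelihood, because optim() minimizes by default
negll <- function(b) -sum(dbinom(y, 1, pnorm(b[1] + b[2] * x), log = TRUE))

fit <- optim(c(0, 0), negll, method = "BFGS", hessian = TRUE)

# fit$hessian is the Hessian of the negative log-likelihood,
# so the Hessian of the log-likelihood is its negation
H <- -fit$hessian
eig <- eigen(H, symmetric = TRUE)$values

# Negative definite Hessian: all eigenvalues strictly below zero
is_strict_max <- all(eig < 0)
if (!is_strict_max) warning("Hessian is not negative definite at the estimate")
is_strict_max
```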

The statistical ultimatum model
The ultimatum game is a workhorse model of bargaining in economics and in political science, in which one player makes a "take it or leave it" offer to the other. The equilibrium size of an offer depends on the proposer's expectations about what will be accepted, which standard models like ordinary least squares (OLS) fail to account for. To facilitate analysis of bargaining data, we implement the statistical ultimatum game of Ramsay and Signorino (2009) via the ultimatum function. The structure of the game is:
1. Player 1 makes an offer $y \in [M, Q]$ to Player 2.
2. Player 2 can accept or reject the offer.
(a) If accepted, payoffs are $Q - y$ for Player 1 and $y$ for Player 2.
(b) If rejected, payoffs are $R_1 + \epsilon_1$ and $R_2 + \epsilon_2$ respectively, where $\epsilon_1$ and $\epsilon_2$ are independent logistic variables with scale parameters $s_1$ and $s_2$. The reservation values $R_m$ are common knowledge, but the realized stochastic terms $\epsilon_m$ are privately known by the players.
In applications, the reservation values are modeled as a function of covariates, $R_{mi} = x_{mi}\beta_m$, and the goal is to estimate $\beta_1$, $\beta_2$, $s_1$, and $s_2$. For example, experimental economists have investigated whether there are cross-cultural differences in play of the ultimatum game in lab settings; e.g., if Americans make systematically lower offers (see Botelho, Harrison, Hirsch, and Rutström 2005). To test this, one would include an indicator for nationality in the equations for the players' reservation values.
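The log-likelihood of the ultimatum model can be derived as follows (for full details, see Ramsay and Signorino 2009). First, the probability that Player 2 accepts a given offer $y$ is
$$P(\text{accept} \mid y) = P(y \ge R_2 + \epsilon_2) = \Lambda(y - R_2; s_2),$$
where $\Lambda(\cdot; s)$ is the logistic CDF with scale parameter $s$. We can then characterize $y^* = h(\epsilon_1)$, the unconstrained optimal offer for Player 1 with private shock $\epsilon_1$, as the solution to the concave optimization problem
$$\max_y \; P(\text{accept} \mid y) \cdot (Q - y) + (1 - P(\text{accept} \mid y)) \cdot (R_1 + \epsilon_1).$$
After differentiating and rearranging, we obtain the implicit definition
$$y^* = Q - R_1 - \epsilon_1 - \frac{\Lambda(y^* - R_2; s_2)}{\lambda(y^* - R_2; s_2)},$$
where $\lambda(\cdot; s)$ is the logistic probability density function (PDF) with scale parameter $s$. For the logistic distribution, the ratio $\Lambda(z; s)/\lambda(z; s)$ equals $s(1 + e^{z/s})$. Finally, by transforming the distribution of $\epsilon_1$, we obtain expressions for the PDF and CDF of $y^*$. Writing $g(y) = Q - R_1 - y - s_2(1 + e^{(y - R_2)/s_2})$ for the value of the shock at which $y$ is the optimal offer, which is strictly decreasing in $y$, these are
$$f_{y^*}(y) = \lambda(g(y); s_1)\,\big(1 + e^{(y - R_2)/s_2}\big), \qquad F_{y^*}(y) = 1 - \Lambda(g(y); s_1).$$
Due to the constraint that offers must be between $M$ and $Q$, the observed offer will be $y = M$ if $y^* < M$ and $y = Q$ if $y^* > Q$. Letting $\delta_i$ be an indicator for whether the offer was accepted in observation $i$, we obtain the log-likelihood
$$\log L = \sum_{i=1}^{N} \Big[ \mathbf{1}\{y_i = M\} \log F_{y^*}(M) + \mathbf{1}\{M < y_i < Q\} \log f_{y^*}(y_i) + \mathbf{1}\{y_i = Q\} \log\big(1 - F_{y^*}(Q)\big) + \delta_i \log \Lambda(y_i - R_{2i}; s_2) + (1 - \delta_i) \log\big(1 - \Lambda(y_i - R_{2i}; s_2)\big) \Big],$$
where $f_{y^*}$ and $F_{y^*}$ are evaluated at the reservation values $R_{1i}$ and $R_{2i}$ of observation $i$.
The main criterion for identification in the statistical ultimatum game is the usual condition that the variables within each $x_m$ be linearly independent; otherwise, the corresponding $\beta_m$ is unidentified. Unlike in most discrete-choice models, the scale parameters $s_1$ and $s_2$ are individually identified in the ultimatum model. This is a consequence of the shared observability of the offer $y$ (see Ramsay and Signorino 2009). In small samples, however, estimation of $s_2$ may cause numerical difficulties; when this occurs, users may instead treat its value as fixed.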
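The implicit definition of the optimal offer can be solved numerically. The sketch below is a hedged illustration in base R, not the package's code: it finds the root of Player 1's first-order condition with uniroot and compares it against direct maximization of the expected utility with optimize. It relies on the logistic identity that the ratio of CDF to PDF at $z$ equals $s(1 + e^{z/s})$. The function names and the numeric values of $Q$, $R_1$, $R_2$, and $s_2$ are hypothetical.

```r
# Unconstrained optimal offer y* for Player 1, given private shock e1.
# Solves y = Q - R1 - e1 - s2 * (1 + exp((y - R2) / s2)).
optimal_offer <- function(e1, Q, R1, R2, s2) {
  foc <- function(y) y - (Q - R1 - e1 - s2 * (1 + exp((y - R2) / s2)))
  # foc is strictly increasing in y, so the root is unique
  uniroot(foc, interval = c(-50, 50), tol = 1e-10)$root
}

# Player 1's expected utility from offering y, used for a direct check
expected_utility <- function(y, e1, Q, R1, R2, s2) {
  p_accept <- plogis(y - R2, scale = s2)   # P(accept | y)
  p_accept * (Q - y) + (1 - p_accept) * (R1 + e1)
}

# Hypothetical values: Q = 10, reservation values R1 = 2 and R2 = 3, s2 = 1
y_star <- optimal_offer(e1 = 0, Q = 10, R1 = 2, R2 = 3, s2 = 1)
y_direct <- optimize(expected_utility, interval = c(0, 10), maximum = TRUE,
                     e1 = 0, Q = 10, R1 = 2, R2 = 3, s2 = 1)$maximum

# A better outside option for Player 1 (larger e1) lowers the offer
y_star_high <- optimal_offer(e1 = 1, Q = 10, R1 = 2, R2 = 3, s2 = 1)
c(root = y_star, direct = y_direct, higher_shock = y_star_high)
```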

Specification and estimation
In this section and those below, we replicate Leblang's (2003) analysis of speculative currency attacks to illustrate the package's functionality. The dataset is available in package games as leblang2003.
R> library("games")
R> data("leblang2003", package = "games")
R> names(leblang2003)
 [1] "outcome"      "preelec"      "postelec"     "rightgov"
 [5] "unifgov"      "lreserves"    "realinterest" "lexports"
 [9] "capcont"      "overval"      "creditgrow"   "service"
[13] "USinterest"   "contagion"    "prioratt"     "nation"
[17] "month"        "year"

Each observation is a country observed in a particular year. The assumed data-generating process follows the egame12 model, with two players and three potential outcomes. Player 1 is "the market," which decides whether or not to initiate a speculative attack on Player 2's (the country's) currency. If the market decides not to attack, the game ends. If there is an attack, the country decides whether to devalue the currency or defend its exchange-rate peg. The observed distribution of outcomes can be obtained by tabulating the outcome variable.
We assume that the market is strategic, incorporating its expectations of the country's response into its initial decision of whether to make a currency attack. The source of uncertainty is assumed to be private information about payoffs, which yields outcome probabilities given by Equations 4 and 5. The market's utility for the three possible outcomes, and each country's utility for defending the currency, is assumed to be a linear function of observed covariates. For identification, the country's utility for devaluation is fixed to 0. See the original paper or the help page for leblang2003 for information on the covariates and specific assignments to each utility equation.

Modeling player utilities
The typical use of a strategic model is to estimate the effect of observed factors on players' utility for each possible outcome. To avoid an overabundance of parameters and potential inefficiency, analysts will typically want to make some exclusion restrictions, i.e., to leave some regressors out of some utility equations. This necessitates the use of multiple model formulas, which we handle via the Formula package (Zeileis and Croissant 2010). The variables to include in each utility are specified using the standard formula syntax, and each set is separated by a vertical bar (|). For example, in the egame12 model, an analyst may want to use the specification
$$U_{11} = \beta_{11,0} + \beta_{11,1} x_1, \quad U_{13} = 0, \quad U_{14} = \beta_{14,0} + \beta_{14,1} x_1 + \beta_{14,2} x_2, \quad U_{24} = \beta_{24,0} + \beta_{24,2} x_2,$$
where $x_1$ and $x_2$ are observed variables. The appropriate Formula syntax is y ~ x1 | 0 | x1 + x2 | x2.
In some of the more complex models, such as egame123 with its eight utility equations, writing the model formulas manually may be daunting or prone to error. We provide two options to ease the process. First, users may specify the model formulas as a list; the fitting functions then use the internal function checkFormulas to convert it to the appropriate Formula object.
R> f1 <- list(u11 = y ~ x1, u13 = ~ 0, u14 = ~ x1 + x2, u24 = ~ x2)
R> games:::checkFormulas(f1)

(Elements of the list need not be named; in fact, the names are ignored.) Second, the function makeFormulas provides interactive prompts for constructing the model formulas step by step. The user only needs to supply the name of the model he or she intends to fit and a character vector containing outcome descriptions. For the Leblang data, the appropriate call would look like makeFormulas(egame12, outcomes = c("no attack", "devaluation", "defense")).
The following menu will appear at the R console:

Equation for player 1's utility from no attack:

1: fix to 0
2: intercept only
3: regressors, no intercept
4: regressors with intercept

Selection:

If 3 or 4 is selected, the user will be prompted to enter a space-separated list of variables to include in the utility equation of interest. We use functions from stringr (Wickham 2010) in parsing the input. The same menu will then be displayed for Player 1's utility from devaluation, Player 1's utility from defense, and Player 2's utility from defense. The final prompt will ask for the name of the variable (or variables; see Section 3.2 below) containing information on the observed outcomes. The function will then return the Formula specification corresponding to the given input, which can be supplied as the formulas argument of the appropriate fitting function.

Dependent variable specification
For most of the models included in the games package, there are a few different ways that the dependent variable might be stored in the dataset. For example, all of the following are plausible representations of the outcome variable in the Leblang data:
Numeric indicators for the final outcome, where 1 means no currency attack, 2 means devaluation in response to an attack, and 3 means defense against an attack.
Factor indicators for the final outcome, where the levels correspond to no attack, devaluation, and defense respectively.
Binary variables representing each player's action. The first would be coded 0 when there is no attack and 1 when there is an attack. The second would be coded 0 when the targeted country devalues and 1 when it defends the currency peg.
The games package allows for all of these types of specifications. To use a numeric or factor indicator for the final outcome, the form of the specification is simply y ~ ., as in typical model formulas. To use binary indicators, the names of the indicators should be separated with + signs on the left-hand side; for example, y1 + y2 ~ . is a valid specification. When using binary indicators, unobserved outcomes (in this case, the value of y2 when y1 == 0) should not be coded as NAs, as this will typically result in those observations being removed from the dataset.
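The correspondence between these representations can be illustrated with a few lines of base R. The vectors y1 and y2 below are hypothetical toy data, not the Leblang dataset; the sketch only shows how binary action indicators map onto the numeric and factor codings described above.

```r
# Hypothetical action indicators for five observations
y1 <- c(0, 1, 1, 0, 1)   # 1 = the market attacks
y2 <- c(0, 0, 1, 0, 1)   # 1 = the country defends; set to 0 (not NA) when y1 == 0

# Numeric coding: 1 = no attack, 2 = devaluation, 3 = defense
outcome_num <- ifelse(y1 == 0, 1, ifelse(y2 == 0, 2, 3))

# Factor coding with descriptive levels
outcome_fac <- factor(outcome_num, levels = 1:3,
                      labels = c("no attack", "devaluation", "defense"))

table(outcome_fac)
```

Coding y2 as 0 rather than NA when there is no attack keeps those rows in the model frame under the usual na.action defaults.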
The method of specifying the dependent variable has no effect on the estimation results, as shown in the next example.

[1] TRUE
The only difference is in the construction of the names of the utility equations. When binary action indicators are used, the outcome names are inferred from the names of the action variables. When numeric or factor outcome variables are used, their values/levels are used as the outcome names.

Model fitting
Once the formula has been constructed, it is straightforward to fit a strategic model. All of the fitting functions contain the arguments data, subset, and na.action, which are used in the typical way to construct the model frame. In addition, the method argument is passed to maxLik (from the maxLik package; Henningsen and Toomet 2011) to select an optimization routine, and other parameters to control the process (e.g., reltol, iterlim) can be passed as named arguments.
Each fitting function returns an object inheriting from two S3 classes. The first is the 'game' class, for which most of the methods of interest are defined, including print and summary. The second is the name of the particular model that was fit; this is used by the predict methods. For the most part, the elements of a 'game' object are the same as those of 'lm' and 'glm' objects (e.g., coefficients, vcov). Pertinent differences include:
The log.likelihood element contains the vector of the n observation-wise log-likelihoods evaluated at the parameter estimate, for use in non-nested model tests (see Section 4.2 below).
The y element contains the outcome variable represented as a factor whose levels are the outcome names.
The link and type elements store the link function and source of error respectively.
The equations element contains the names of the utility equations and scale terms; this is used by the print method for 'game' objects and latexTable to group the parameters estimated.
Fitted ultimatum models contain some additional elements, which are discussed below.
The nonparametric bootstrap is implemented as part of the fitting process via the boot argument of the model functions. To run the bootstrap on a model that has already been estimated, use update as in the next example. A status bar is printed by default, but it can be suppressed by setting bootreport = FALSE.

R> summary(leb1)
Call:
egame12(formulas = flb1, data = leblang2003, link = "probit", type = "private", boot = 100)

To see the asymptotic, normal-theory standard errors instead, supply the option useboot = FALSE to the summary call. The other arguments for the fitting functions depend on whether the model is one of the discrete extensive form games or the statistical ultimatum game.

Extensive-form models
The stochastic structure of the extensive-form models is specified via the arguments link and type. The link argument is used to specify the distributional form of the error terms: "probit" for normal, "logit" for type I extreme value. The type argument specifies whether the source of randomness is "agent" error or "private" information. Normal errors must be used in the case of private information; if a model is specified with link = "logit" and type = "private", a warning will be issued and a probit link will be enforced. The error variance $\sigma^2$ normally is not estimable on its own, as noted above in Section 2. This is no longer the case if $\sigma$ is modeled as a function of known covariates: The argument sdformula is used to estimate $\gamma$ for such a model. The formula should be one-sided, with nothing to the left of the ~, as in the following example.
The extensive-form models also allow for estimation of the error variance when the average payoffs are known to the analyst, such as in data from lab experiments. In this case, the payoffs can be specified with the fixedUtils argument. The only information needed from the model formula is the outcome variable, i.e., the formulas argument can be written in the form y ~ . or y ~ 1. When the argument fixedUtils is used, the default behavior is to estimate a single common scale parameter, as in the next example.

The ultimatum model
In the ultimatum model, each observation consists of the value of the offer made by Player 1 and whether Player 2 accepted it. By assumption, there is an exogenous upper bound on the size of the offer, which is specified via maxOffer, and a lower bound specified via minOffer (default 0). It is important to be able to identify which offers were at one of these boundary points, since the log-likelihood of an observation depends on whether the offer was interior. If offers are stored as floating-point numbers, naive equality tests may misclassify some boundary observations as interior. To mitigate this, we use the argument offertol and code an offer x as meeting the lower bound if x < minOffer + offertol and the upper bound if x > maxOffer - offertol. Unless there are extremely slight differences between observed offers, on the order of $1 \times 10^{-8}$, the default value of offertol should suffice for most analyses.
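The tolerance rule can be sketched in base R as follows. The helper classify_offer is hypothetical, and the default tolerance used here, sqrt(.Machine$double.eps), is an assumption chosen to match the order of magnitude mentioned above; consult the package documentation for the actual default of offertol.

```r
# Classify offers as at the lower bound, at the upper bound, or interior,
# using the tolerance rule described in the text.
classify_offer <- function(x, minOffer = 0, maxOffer,
                           offertol = sqrt(.Machine$double.eps)) {
  ifelse(x < minOffer + offertol, "lower",
         ifelse(x > maxOffer - offertol, "upper", "interior"))
}

# Hypothetical offers; the third is the upper bound recorded with rounding noise
offers <- c(0, 0.15, 0.3 - 1e-12, 0.3)

naive <- offers == 0.3                         # exact equality misses the third offer
tolerant <- classify_offer(offers, maxOffer = 0.3)

data.frame(offers, naive, tolerant)
```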
The arguments s1 and s2 are for fixing the scale parameters of the stochastic component of the players' reservation values. If either of these is left unspecified, it is estimated. We recommend fixing s2, since attempts to estimate it often run into numerical stability issues (Ramsay and Signorino 2009).
The model formula for ultimatum should be written in the form offer + accept ~ R1 | R2, where R1 and R2 contain the variables for Player 1's and 2's reservation values respectively. Some researchers may only have access to data on offer size, but not whether the offer was accepted. For such datasets, run ultimatum with the argument outcome = "offer" and specify the model formula as offer ~ R1 | R2. Parameters for Player 2's reservation value are still estimable in this case, since the optimal offer for Player 1 depends on his or her expectations of the probability of acceptance. Even when acceptance data are available, the option outcome = "offer" may be useful for making formal comparisons of the statistical ultimatum model to OLS models of offer size, as in Ramsay and Signorino (2009). For more on model comparison, see Section 4.2 below.
We illustrate the statistical ultimatum game with data from a classroom experiment. Each gender variable is an indicator for whether the proposer (1) or receiver (2) is female. We investigate whether players' reservation values (the amount they get if the offer is rejected) are a function of their gender.

Convergence
The log-likelihood functions for strategic models are not globally concave, so convergence to a global maximum is not guaranteed. We provide two methods to avert convergence problems: well-chosen default starting values and a likelihood profiling method.
In all of the egame models, the default starting values come from statistical backward induction (SBI), an equation-by-equation method that uses ordinary probit or logistic regression models to obtain consistent estimates of the parameters (Bas et al. 2007). For example, in an egame12 model with logit link, the procedure is as follows. Let $Y_i$ be an indicator for whether Player $i$ chooses R.
1. Obtain the estimate $\hat{\beta}_{24}$ by running a logistic regression of $Y_2$ on $X_{24}$ within the subset of observations for which $Y_1 = 1$ (i.e., Player 2's choice is observed).
2. (a) Compute the fitted probabilities of Player 2's actions, $\hat{p}_3$ and $\hat{p}_4$, for all observations using $\hat{\beta}_{24}$.
(b) Obtain the estimates by running a logistic regression of $Y_1$ on the expectation-transformed data matrix $[\, -X_{11} \;\; \hat{p}_3 X_{13} \;\; \hat{p}_4 X_{14} \,]$. The estimated coefficient vector is $(\hat{\beta}_{11}, \hat{\beta}_{13}, \hat{\beta}_{14})$.
3. Multiply the obtained estimates by $\sqrt{2}$ to obtain starting values for the full-information procedure.
The applications of SBI to egame122 and egame123 are similar. It is less straightforward to generate starting values for the ultimatum model. We use a similar two-step procedure, but it has not been verified as consistent and sometimes yields non-finite likelihoods, in which case starting values of zero (except for the intercept) are used.
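The statistical backward induction steps for an egame12 model with logit link can be sketched in base R. This is a minimal illustration on simulated data, not the package's internal code: the variable names, the simulated parameter values, and the choice of which utility equations contain a constant are all assumptions made for the example.

```r
set.seed(42)
n <- 2000
dat <- data.frame(x1 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n), z = rnorm(n))

# Simulate choices from a logit game (all parameter values are arbitrary)
p4 <- plogis(0.5 + 1.0 * dat$z)                          # Pr(Player 2 plays R)
p3 <- 1 - p4                                             # Pr(Player 2 plays L)
eu_R <- p3 * (1.0 * dat$x3) + p4 * (0.5 + 1.0 * dat$x4)  # Player 1's expected utility of R
u_L <- 1.0 * dat$x1                                      # Player 1's utility of L
dat$y1 <- rbinom(n, 1, plogis(eu_R - u_L))
dat$y2 <- ifelse(dat$y1 == 1, rbinom(n, 1, p4), NA)      # observed only if Y1 = 1

# Step 1: logit of Y2 on X24 among observations where Player 2 moves
fit2 <- glm(y2 ~ z, family = binomial(link = "logit"),
            data = dat, subset = y1 == 1)

# Step 2(a): fitted probabilities of Player 2's actions for all observations
p4_hat <- predict(fit2, newdata = dat, type = "response")
p3_hat <- 1 - p4_hat

# Step 2(b): logit of Y1 on the expectation-transformed data matrix
Xt <- cbind(u11_x1 = -dat$x1,
            u13_x3 = p3_hat * dat$x3,
            u14_const = p4_hat,
            u14_x4 = p4_hat * dat$x4)
fit1 <- glm(dat$y1 ~ 0 + Xt, family = binomial(link = "logit"))

# Step 3: rescale by sqrt(2) to form starting values
start <- sqrt(2) * c(coef(fit1), coef(fit2))
print(round(start, 2))
```

The vector start would then be supplied as starting values to a full-information optimizer; here it only demonstrates the sequence of regressions.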
To assess convergence of an already-fitted model, we implement likelihood profiling via the profile method for 'game' objects. As in the MASS package's (Venables and Ripley 2002) profile method for 'glm' objects, this entails refitting the model numerous times, each time holding a single parameter at some value other than the original estimate. In the case of GLMs, this profiling procedure is typically used to estimate likelihood-ratio confidence regions (McCullagh and Nelder 1989, p. 254). However, it can also serve as a rough global convergence check: if the log-likelihood of any of the refitted models is greater than that of the original fit, then by definition the original procedure did not converge to a global maximum. When this is the case, as in the following example, the profile method applied to a 'game' object issues a warning.

R> plot(profstu1)
See Figure 2 for the output from this example. Slightly lower values of both the intercept and the gender coefficient for Player 1 appear to yield better-fitting models.
When the profile method for 'game' objects finds parameters that yield a higher log-likelihood than the original fit, these can be used as starting values in re-estimation of the model via the profile argument of the fitting function.
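The logic of using profiling as a convergence check can be sketched outside the package. The example below uses a simple logistic log-likelihood and two hypothetical helpers, profile_loglik and profile_check, which are illustrative assumptions and not the games implementation: each parameter is held at values away from the estimate, the remaining parameters are re-optimized, and any refit with a higher log-likelihood is flagged.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))
X <- cbind(1, x)

# Numerically stable logistic log-likelihood
loglik <- function(b) {
  eta <- drop(X %*% b)
  sum(y * plogis(eta, log.p = TRUE) + (1 - y) * plogis(-eta, log.p = TRUE))
}

# Hold parameter j at each grid value and maximize over the other parameters
profile_loglik <- function(est, j, grid) {
  sapply(grid, function(v) {
    obj <- function(free) {
      b <- est
      b[j] <- v
      b[-j] <- free
      -loglik(b)
    }
    -optim(est[-j], obj, method = "BFGS")$value
  })
}

# Flag parameters for which some profiled refit beats the original fit
profile_check <- function(est, width = 1, steps = 10, tol = 1e-6) {
  ll0 <- loglik(est)
  better <- vapply(seq_along(est), function(j) {
    grid <- est[j] + seq(-width, width, length.out = steps)
    any(profile_loglik(est, j, grid) > ll0 + tol)
  }, logical(1))
  if (any(better)) warning("a profiled fit exceeds the original log-likelihood")
  better
}

mle <- unname(coef(glm(y ~ x, family = binomial)))
check_mle <- profile_check(mle)                        # converged fit
check_bad <- suppressWarnings(profile_check(c(0, 0)))  # deliberately poor "fit"
```

For the converged fit no profiled model does better, while the deliberately poor starting point is flagged for both parameters. Because the logistic log-likelihood is concave, the failure here is artificial; in strategic models the same check can reveal genuine convergence to a local maximum.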

Reporting results
A natural form to present the results of a fitted strategic model is in a table where each row is a covariate and each column is a utility equation. We provide the function latexTable to automatically generate LaTeX code for such tables. Table 1 was generated with the following code:
R> leb1cap <- paste("Replication of \\citeauthor{Leblang2003}'s",
+    "\\citeyearpar{Leblang2003} results.")
R> latexTable(leb1, caption = leb1cap, label = "tab:leb1", floatplace = "p")
Additional arguments include digits for the number of digits printed, rowsep for the point spacing between rows, and useboot for the use of bootstrap vs. normal-theory standard errors.

Predicted probabilities
The raw output from the model fitting functions in package games describes the effect of each covariate on players' utilities for different outcomes. Some analysts may instead be interested in how each variable affects the probability of a particular outcome occurring. Such probabilities are nonlinear functions of the covariates; for example, they are given by Equations 2 and 3 for egame12 models with agent error. Following popular developments in the political science literature (King, Tomz, and Wittenberg 2000), we provide the function predProbs to analyze how the predicted probability of each outcome changes as a function of certain covariates. The general procedure is:
1. Select a "covariate of interest," $X_j$.
2. Hold all other variables at their central values (means for continuous variables, medians for binary or ordinal variables, modes for others) or some other pre-specified "profile" $X_{-j} = (X_{j'})_{j' \neq j}$.
3. Using the estimated model, find the predicted probability of each potential outcome over the observed range of $X_j$, while holding $X_{-j}$ fixed.
4. Calculate confidence intervals for the predicted values using a parametric or nonparametric bootstrap.
5. Plot the results.
The only mandatory arguments for predProbs are model, for the fitted model object, and x, a character string containing the name of the variable of interest (partial matches are allowed). If x is numeric, then the default behavior is to evaluate predicted probabilities at 100 grid points along the range observed for x in the data used to fit the model (i.e., the data frame model$model). The number of grid points and range of values can be controlled via the arguments n and xlim respectively. If x is a factor variable, all available levels are used.
Additional named arguments can be used to change the default profile of values for the covariates other than x. These arguments should be specified as varname = value, where varname exactly matches the name of the variable in the data frame used to fit the model and value is an expression that is evaluated within the model frame, model$model. For example, to set a variable y to its observed 10th percentile, use the argument y = quantile(y, probs = 0.1).
Confidence intervals for the predictions are calculated by resampling. If model has a boot.matrix element containing nonparametric bootstrap results, these are used. Otherwise, a matrix of parametric bootstrap results is constructed by taking 1,000 samples from a multivariate normal distribution whose mean is $\hat{\beta}$ and whose variance matrix is the inverse of the negative Hessian of the estimates. The default is to compute a 95% confidence interval; this can be controlled by the ci argument. It normally takes a few seconds to compute the fitted values for all of the bootstrapped coefficients, so a status bar is displayed. This can be suppressed by setting report = FALSE.
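The general procedure, including the parametric bootstrap, can be sketched with an ordinary logistic regression standing in for a strategic model. Everything in this example (the simulated data, the variable names x and w, and the use of glm) is an assumption for illustration; predProbs itself operates on fitted 'game' objects and may differ in detail.

```r
set.seed(123)
n <- 400
dat <- data.frame(x = rnorm(n), w = rbinom(n, 1, 0.4))
dat$y <- rbinom(n, 1, plogis(-0.3 + 0.8 * dat$x + 0.5 * dat$w))
fit <- glm(y ~ x + w, family = binomial, data = dat)

# Steps 1-2: vary x over its observed range, hold the binary w at its median
grid <- data.frame(x = seq(min(dat$x), max(dat$x), length.out = 100),
                   w = median(dat$w))
Xg <- model.matrix(~ x + w, data = grid)

# Step 3: predicted probabilities at the point estimates
p_hat <- plogis(drop(Xg %*% coef(fit)))

# Step 4: parametric bootstrap, drawing coefficients from N(beta_hat, V),
# where V = vcov(fit) is the inverse of the negative Hessian
B <- 1000
k <- length(coef(fit))
draws <- matrix(rnorm(B * k), nrow = B) %*% chol(vcov(fit))
draws <- sweep(draws, 2, coef(fit), "+")
p_boot <- plogis(Xg %*% t(draws))                           # 100 x B matrix
ci <- apply(p_boot, 1, quantile, probs = c(0.025, 0.975))   # 2 x 100 matrix

# Step 5: plot the predictions with 95% confidence bands
plot(grid$x, p_hat, type = "l", ylim = c(0, 1), xlab = "x", ylab = "Pr(y = 1)")
lines(grid$x, ci[1, ], lty = 2)
lines(grid$x, ci[2, ], lty = 2)
```

Percentile intervals are used here because the predicted probability is a nonlinear function of the coefficients, so symmetric normal-theory intervals on the probability scale could fall outside [0, 1].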
We illustrate with an example from Leblang's data. Suppose we are interested in the estimated effect of currency reserves on the outcome probabilities when contagion is high (currency attacks are occurring elsewhere) but all other variables are held at their central values. We would use predProbs as follows.
The plot method for 'predProbs' objects can be used for visualization of the output. The number of plots that can be produced from predProbs output is equal to the number of outcomes in the corresponding fitted model (e.g., three for an egame12 model). To deal with this, we have written the plot method for 'predProbs' objects to behave similarly to the plot method for 'gam' objects in the gam package (Hastie 2011), in which each fitted model corresponds to as many plots as there are covariates.
R> par(mfrow = c(2, 2))
R> plot(predleb1)
See Figure 3 for the output. If no additional arguments are specified to plot(x), where x is a 'predProbs' object, all of the plots are printed in sequence. If ask = TRUE is specified, then an interactive menu is used for plot selection:
R> plot(predleb1, ask = TRUE)
Make a plot selection (or 0 to exit):
1: plot: Pr(no attack)
2: plot: Pr(devaluation)
3: plot: Pr(defense)
4: plot all terms
The argument which can be used to select one of these without bringing up the menu; e.g., plot(predleb1, which = 2) will produce only the plot for the devaluation outcome. In each case, all of the standard plotting arguments can be used to control the output. To change the line type used for the confidence bands, use the argument lty.ci.

Non-nested model comparisons
It is not possible to express traditional discrete-choice models like logistic regression as "restricted" strategic models, or vice versa. Therefore, standard likelihood ratio tests are inappropriate for comparing the fit of a strategic model to that of a GLM; a non-nested model comparison is necessary (Clarke and Signorino 2010). The games package implements Vuong's (1989) test for strictly non-nested models and Clarke's (2006) distribution-free test via the vuong and clarke functions respectively. Each test compares two models, under the null hypothesis that the two have an equal Kullback-Leibler distance from the true model. Both use test statistics formed from the log-likelihood contributions of each individual observation. The main difference is that Clarke's test has greater power in small samples, whereas Vuong's depends on asymptotic properties. We implement both tests with the recommended BIC-based correction to penalize overparameterization.
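To make the construction of the two test statistics concrete, the following base R sketch computes them for a pair of logistic regressions with different covariates. The functions vuong_sketch and clarke_sketch are hypothetical stand-ins written for this illustration, using the textbook forms of the statistics with a BIC-style correction; they are not the package's vuong and clarke functions, whose details may differ.

```r
set.seed(7)
n <- 600
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.2 + d$x1 + 0.5 * d$x2))

m1 <- glm(y ~ x1 + x2, family = binomial, data = d)  # contains the true covariates
m2 <- glm(y ~ x3, family = binomial, data = d)       # irrelevant covariate only

# Log-likelihood contribution of each observation
pointwise_ll <- function(fit) dbinom(fit$y, 1, fitted(fit), log = TRUE)

# BIC-style correction: half the difference in parameter counts times log(n)
bic_corr <- function(fit1, fit2, n) {
  (length(coef(fit1)) - length(coef(fit2))) * log(n) / 2
}

# Vuong: standardized, corrected sum of log-likelihood differences
vuong_sketch <- function(fit1, fit2) {
  m <- pointwise_ll(fit1) - pointwise_ll(fit2)
  n <- length(m)
  stat <- (sum(m) - bic_corr(fit1, fit2, n)) / (sqrt(n) * sd(m))
  c(statistic = stat, p.value = 2 * pnorm(-abs(stat)))
}

# Clarke: sign test on the corrected individual differences
clarke_sketch <- function(fit1, fit2) {
  m <- pointwise_ll(fit1) - pointwise_ll(fit2)
  n <- length(m)
  b <- sum(m - bic_corr(fit1, fit2, n) / n > 0)
  c(statistic = b, p.value = binom.test(b, n, p = 0.5)$p.value)
}

v <- vuong_sketch(m1, m2)
cl <- clarke_sketch(m1, m2)
print(v)
print(cl)
```

A positive Vuong statistic, or a count of positive differences well above n/2 in the Clarke test, favors the first model. In this simulation the first model contains the covariates that generated the data, so both sketches should favor it.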
The simplest use of the non-nested test functions is to compare two strategic models to each other. For example, we can use them to determine whether agent error or private information is more appropriate for Leblang's data.

Vuong test for non-nested models
Model 1 log-likelihood: -482
Model 2 log-likelihood: -482
Observations: 7240
Test statistic: -0.15
Neither model is significantly preferred (p = 0.88)
Neither stochastic structure appears to be significantly preferred over the other.
It is somewhat less straightforward to compare strategic to non-strategic models. Vuong's and Clarke's tests can be applied only to pairs of models for which the dependent variable is exactly the same. In a strategic model like egame12, the dependent variable for each observation is the outcome reached, i.e., the vector of all decisions made by the players. By contrast, in a standard (binary) logistic regression model, the dependent variable is an indicator for whether one particular outcome was reached. To allow for comparisons in such cases, the vuong and clarke functions have outcome arguments. For example, we could compare the strategic model of Leblang's data to a logistic regression in terms of their ability to predict the occurrence of speculative attacks as follows.

Vuong test for non-nested models
Model 1 log-likelihood: -430
Model 2 log-likelihood: -431
Observations: 7240
Test statistic: -9.8
Model 2 is preferred (p < 2e-16)
The logistic regression is preferred despite having a lower log-likelihood, since it fits 8 parameters compared to the strategic model's 19. The argument outcome1 = 1 is used to indicate that the strategic model should be evaluated in terms of its fit with the market's decision not to initiate a currency attack. We would have used outcome1 = 2 to consider the outcome of a speculative attack followed by devaluation, and outcome1 = 3 for an attack followed by defense. Of course, these could not have been compared to logit1, and vuong or clarke would stop with an error after detecting models with different dependent variables.

Conclusion
We have provided new software, the games package, for estimating strategic statistical models in R. Such models are appropriate for the analysis of data where multiple agents make decisions sequentially and where the agents' expectations of each other's actions determine their choices. The new software implements multiple strategic models, including a statistical bargaining game. The software is easy to use and includes post-estimation features such as non-nested comparison tests and plots of fitted values with measures of uncertainty. We show that the software can be used to easily replicate one well-known analysis of a strategic statistical model (Leblang 2003).