Eﬀect Displays in R for Multinomial and Proportional-Odds Logit Models: Extensions to the eﬀects Package

Based on recent work by Fox and Andersen (2006), this paper describes substantial extensions to the eﬀects package for R to construct eﬀect displays for multinomial and proportional-odds logit models. The package previously was limited to linear and generalized linear models. Eﬀect displays are tabular and graphical representations of terms – typically high-order terms – in a statistical model. For polytomous logit models, eﬀect displays depict ﬁtted category probabilities under the model, and can include point-wise conﬁdence envelopes for the eﬀects. The construction of eﬀect displays by functions in the eﬀects package is essentially automatic. The package provides several kinds of displays for polytomous logit models.

An "effect display" is a table or graph meant to represent a term in a statistical model.Effect displays are generalizations of similar methods, such as Fisher's (1936) "adjusted means" in the analysis of covariance, Goodnight and Harvey's (1978) "least-squares means" in analysis of variance and covariance, and Searle, Speed, and Milliken's (1980) "estimated population marginal means."To create an effect display, predictors in a term are allowed to range over their combinations of values, while other predictors in the model are held to "typical" values.
The primary motivation for effect displays is the difficulty of interpreting many statistical models directly from their estimated coefficients.For example, to interpret a linear model with interactions requires that main effects -and possibly lower-order interactions -marginal to an interaction be combined with the interaction, an operation that can entail mental gymnastics.Similarly, the coefficients of regression splines defy direct interpretation.These difficulties are compounded in generalized linear models, where interest usually inheres more directly in the scale of the response variable than in the scale of the linear predictor on which the coefficients of the model are expressed.
Effect displays for generalized linear models were introduced by Fox (1987), and an implementation in the effects package for the statistical programming environment R (Ihaka and Gentleman 1996; R Development Core Team 2009) was described in Fox (2003).The multinomial and proportional-odds logit models are common statistical models for polytomous (i.e., multi-category) categorical responses; these models are described in many sources (see, e.g., Fox 2008, Section 14.2).Fox and Andersen (2006) extended effect displays to multinomial and proportional-odds logit models where the principal obstacle was the derivation of standard errors for the effects.Fox and Andersen's results are given in Appendix A, which describes the extension of the effects package to multinomial and proportional-odds logit models.The package is available from the Comprehensive R Archive Network at http: //CRAN.R-project.org/package=effects.

Review of effect displays for generalized linear models
The computation of effect displays is particularly simple for a generalized linear model, with linear predictor η = Xβ and link function g(µ) = η; here, µ is the expectation of the response vector y.Let β represent an estimate of β, and V ( β) the estimated covariance matrix of β.Let the rows of X 0 include all combinations of values of predictors appearing in a high-order term of the model, along with typical values of the remaining predictors.The structure of X 0 , with respect for example to contrasts and interactions, is identical to that of the model matrix X.We compute fitted values η 0 = X 0 β to represent the term in question, consequently absorbing terms marginal to it.An "effect display" is a graph of these fitted values as a function of the predictors in the term, or of the fitted values transformed to the scale of the response, g −1 ( η 0 ).The standard errors of η 0 , the square-root diagonal entries of X 0 V( β)X 0 , may be used to compute point-wise confidence intervals for the effects; the end-points of these confidence intervals may also be transformed to the scale of the response.
To illustrate effect displays for a generalized linear model, we develop a logistic-regression model for the Titanic data set in the effects package.This data set contains information about the survival status, sex, age, and passenger class of 1309 passengers on the ill-fated 1912 maiden voyage of the ocean liner Titanic:

R> library("effects")
Loading required package: lattice Loading required package: grid Loading required package: MASS Loading required package: nnet Loading required package: colorspace Attaching package: 'effects' The following object(s) are masked from package:datasets : Thus, the three-way interaction among the predictors is non-significant, but each of the twoway interactions is statistically significant.We proceed to remove the three-way interaction and refit the model: R> titanic.2<-update(titanic.The allEffects function in the effects package will compute effects for all high-order terms in a model -that is, terms that are not marginal to any others in the model: R> titanic.2.all <-allEffects(titanic.2, typical = median, given.values= + c(passengerClass2nd = 1/3, passengerClass3rd = 1/3, sexmale = 0.5)) R> print(titanic.2.all, digits = 3) By default, the effect function, which allEffects calls, sets covariates -in this case, age is the only covariate -not included in a term to their mean values; factors are set to their proportional distribution in the data by averaging over contrasts.In the call to allEffects, we overrode these defaults, specifying typical = median to compute the median rather than the mean of excluded covariates, and setting given.values= c(passengerClass2nd = 1/3, passengerClass3rd = 1/3, sexmale = 0.5) to fix the distribution of passenger class to 1/3 in each level and the distribution of sex to half male and half female.Also by default, effect sets age to 10 values evenly spaced over its range; we can optionally supply the number of values for a covariate via the default.levelsargument or we can set the values directly, as illustrated below.Notice that the print method for the object returned by allEffects reports tables of the effects, which, by default, are on the scale of the response variable -for a logit model, on the probability scale.There is also a summary method (not employed here), which provides greater detail.
To plot the effects on one page (producing Figure 1): R> plot(titanic.2.all, ticks = + list(at = c(.01,.05,seq(.1, .9, by = .2),.95,.99)),ask = FALSE) The default is to plot on the scale of the linear predictor, where the linearity of the model is preserved, but to label the response axis on the generally more easily interpreted scale of the response.Because the plotted effects span a wider range on the probability scale than is accommodated by the default tick marks, we specify custom tick marks via the ticks argument; setting ask = FALSE produces a multi-panel display of all of the high-order terms in the model.The broken lines on the graphs represent point-wise 95-percent confidence envelopes around the fitted values, and the panels showing age give the marginal distribution of age in the "rug-plot" at the bottom of the graphs.The panel at the top left of Figure 1 displays the passenger-class × sex interaction: For female passengers, survival decreased with class, while for male passengers, survival was highest for first-class passengers but essentially the same for second-and third-class passengers.Female third-class passengers had approximately the same level of survival as male first-class passengers.
The top-right panel depicts the class-by-age interaction: Survival declined with age for all three passenger classes, but most dramatically for second-class passengers.First-class passengers had quite a high level of survival, and third-class passengers a relatively low level of survival, at all ages.
The lower-left panel shows the sex-by-age interaction: For both sexes, survival decreased with age, but age made more of a difference for males than for females, and the youngest males survived at approximately the same rate as the oldest females.
The 95-percent point-wise confidence envelopes suggest that all of the effects are reasonably precisely estimated (on the probability scale, which is expanded at the extremes of the logit scale), except at the highest ages.

Warning message:
In analyze.model(term,mod, xlevels, default.levels): passengerClass:sex:age does not appear in the model As before, we supply custom tick marks, and we restrict age, which is highly skewed, to run from 0 to its 99th percentile, via the argument xlevels = list(age = 0:65)).We can specify the three-way interaction as either "passengerClass*sex*age" or "passengerClass:sex:age", but the order of the predictors must be the same as in the model.
The display in Figure 2 is, we maintain, easier to digest than the separate panels for the three two-way interactions shown in Figure 1, but the general pattern of the results is as we have described above.A reasonable brief summary is that survival on the Titanic was of "women and children first," but modified by passenger class.
Because the model fit to the data is linear in age on the logit scale, it is possible to arrive at these conclusions by examining the estimated coefficients of the model, but to do so requires Figure 2: Effect display for the three-way interaction passengerClass × sex × age, a term that is not in the model fit to the Titanic data, but to which all of the two-way interactions in the model are marginal.laboriously adding terms according to relationships of marginality among them, a process that is automated in the effect displays.Moreover, the effect plots in Figures 1 and 2 label axes on the more easily interpreted probability scale.

Effect displays for multinomial logit models
The multinomial logit model is probably the most common regression model for polytomous categorical data.The effects package uses the implementation of the multinomial logit model in the multinom function from the nnet package (associated with Venables and Ripley 2002), one of the "recommended" packages that ship with standard R distributions.
The multinomial logit model takes the following form: Here, µ ij is the probability that the response variable is in its jth of m levels for observation i; x i is the ith row of the model matrix X; and β j is a vector of p regression coefficients pertaining to level j of the response.The first level of the response is treated specially to insure that m j=1 µ ij = 1; this choice is arbitrary and inconsequential, in the sense that the model produces the same set of fitted probabilities regardless of the response level singled out for special treatment.The interpretation of the regression coefficients, however, depends upon the selection of "baseline" level for the response, because log Thus, the coefficients represent effects on the log-odds of membership in level j versus level 1 of the response.Differences in coefficients give effects on the log-odds of membership in other pairs: Producing effect displays on the probability scale for the multinomial logit model is relatively straightforward, because the model yields fitted probabilities µ 0j for specified values of the linear predictors x 0 β j .The only difficult point is the calculation of standard errors for the effects.We adopt the strategy of displaying effects on the probability scale, but computing standard errors and confidence intervals on the scale of the individual-level logits, log [µ 0j /(1 − µ 0j )]: Because the probabilities are bounded by 0 and 1, while the logits are unbounded, confidence intervals on the logit scale are better-behaved.We outline this procedure in the Appendix A; a more extended treatment may be found in Fox and Andersen (2006).
The data frame BEPS in the effects package contains data from the 1997-2001 British Election Panel Study.We adapt an example that appeared in Fox and Andersen (2006), which, in turn, was based on work by Andersen, Heath, and Sinnott (2002).The focus of the example (available via the command example(BEPS)) is the interaction between political knowledge and electoral choice: Individuals who are familiar with political parties' platforms are expected to be more likely to align their votes with their preferred positions on issues, in this case, with the issue of the UK's integration into the European Union.The data set contains the following variables:

R> summary(BEPS)
vote is a factor with three levels, Conservative, Labour, and Liberal Democrat.The Conservatives were generally opposed to increased UK participation in the EU, while the other two parties adopted a pro-Europe position.
Europe is an 11-point scale measuring the respondent's attitudes towards European integration, with high scores representing "Eurosceptic" (i.e., anti-EU) sentiment.
political.knowledgeassesses the respondent's familiarity with the parties' positions on the European-integration issue.Scores range from 0 (representing knowledge of none of the parties' positions) to 3 (knowledge of all three positions).
Age, gender, perceptions of economic conditions over the past year (both national and household), and evaluations of the leaders of the three major parties are included in the model as "control variables" because of their known relationship to voting.
After exploration of the data, we model Europe as a three-degree-of-freedom regression spline.(Fox and Andersen 2006, employed  Partly because the model is for logits comparing each of the other two parties to the Conservatives, partly because of the interaction, and partly because of the use of a B-spline, it is very difficult to interpret the Europe × political.knowledgeinteraction from the estimated coefficients.
Although the coefficients for gender and household economic conditions are not statistically significant, neither variable is the focus of the research, and we leave both terms in the model.Effects are computed for the Europe × political.knowledgeinteraction as follows: R> europe.knowledge<-effect("bs(Europe, 3)*political.knowledge",beps, + xlevels = list(Europe = seq(1, 11, length = 50), + political.knowledge= 0:3), given.values= c(gendermale = 0.5)) #  To obtain "safe predictions" of the effects in a model that uses a data-dependent basis for the model matrix (such as the present model, which employs a regression spline), the effect function refits the model, adapting the strategy described in Hastie (1992, Section 7.3.3).
To produce a smooth graph, we set Europe to 50 evenly spaced values between 1 and 11, even though the variable itself takes on only integer values; political.knowledge is set to integer values between 0 and 3; the dummy regressor for gender, gendermale, is fixed to 0.5, representing a group composed equally of men and women.The default effect plot for the Europe × political.knowledgeinteraction is shown in Figure 3: R> plot(europe.knowledge)An alternative "stacked-area" representation of the interaction, using traditional politicalparty colors (and suppressing the uninformative rug-plot at the bottom of each panel), is displayed in Figure 4: R> plot(europe.knowledge,style = "stacked", + colors = c("blue", "red", "orange"), rug = FALSE)  It is clear from both effect displays that as voters' knowledge increased, they more accurately aligned their votes with their political opinions: At the lowest level of political knowledge (the left-most column of Figure 3 and the left panel of Figure 4), the relationship between attitude towards European integration and vote was weak, and indeed there was slightly higher support for the pro-Europe Liberal Democratic Party among the most Eurosceptic voters than among those most favorable to European integration of the UK.At the highest level of political knowledge, support for the anti-Europe Conservative Party increased dramatically with opposition to European integration, as support for Labour and the Liberal Democrats declined.The confidence envelopes in Figure 3 show that these effects are quite precisely estimated.
We reiterate that the use of a regression spline in this model makes direct interpretation of the estimated coefficients a practical impossibility.Even if the we had fit a linear term in attitude towards European integration, however, the complicated structure of the multinomial logit model together with the interaction between attitude towards Europe and political knowledge would have proved an obstacle to direct interpretation of the coefficients.

Effect displays for proportional-odds models
The proportional-odds model, perhaps the regression model most commonly used for an ordinal response, is often derived by assuming a linear regression for a latent response variables ξ, where x i is the ith row of the model matrix, β is a vector of regression coefficients, and α is an intercept parameter (which will prove redundant).Although the latent response cannot be observed directly, a binned version of it, y, with m levels is available: where the "thresholds," , are parameters to be estimated from the data along with the regression coefficients.The cumulative distribution function of the observed response is for j = 1, 2, ..., m − 1.Assuming that the errors ε i are normally distributed produces an ordered probit model; assuming that the errors are logistically distributed, with distribution function produces an ordered logit model: logit[Pr(y i > j)] = log e Pr(y i > j) The intercept α in Equation 2 is set to 0 to identify the model, in effect fixing the origin of the latent response.This model is called the "proportional-odds model" because different cumulative log-odds, logit[Pr(y i > j)] and logit[Pr(y i > j )], differ by the constant α j − α j , and therefore the odds themselves are proportional regardless of the value of x i .Each level of y save the last has its own intercept, which is the negative of the corresponding threshold, but there is a common coefficient vector β.
The WVS data frame in the effects package contains data from the second wave of the World Value Survey (Inglehart 2000), conducted between 1995 and 1997.Forty-two countries participated in this survey, which collected data on social attitudes and related information employing a common questionnaire.We focus here on four countries (with sample sizes given in parentheses): Australia (1874), Norway (1127), Sweden (1003), and the United States (1377).The response variable in our illustration, which is adapted from, but not identical to, an example in Fox and Andersen (2006), is the answer to the question, "Do you think that what the government is doing for people in poverty in this country is about the right amount, too much, or too little?"Predictors include country, gender, religion (whether or not the respondent belonged to a religion), education (whether or not the respondent held a university degree), and age (in years): R> summary(WVS) The command example(WVS) will reproduce the results in this section.
After exploring the data, we fit the following model, which includes interactions between country and the other predictors (because the focus here is on national differences), and employs a four-degree-of-freedom regression spline in age.We employ the polr function in the MASS package (associated with Venables and Ripley 2002), which is one of the recommended packages in R: As was true of the multinomial-logit model, it is difficult to interpret the results from the estimated coefficients.We will instead present three illustrative effect displays for the country × age interaction.The first display (Figure 5), which is the default, includes point-wise confidence bands around the fitted effect: Because there are a few very old individuals in the data set, we plot to the 99th percentile of age, setting age to integer values between 18 and 83 (which produces a smooth plot).We also fix the proportion of men to 0.5; religion and education are implicitly set to their observed proportions in the data (which are, respectively, 0.85 church attenders and 0.21 with post-secondary degrees).
The second display (Figure 6) is a "stacked-area" graph, obtained by setting the argument style="stacked" to the plot method for polytomous effect-display objects: The colors for the levels of the ordered response are selected using the sequential_hcl function in the colorspace package (Ihaka, Murrell, Hornik, and Zeileis 2009;Zeileis, Hornik, and Murrell 2009); default colors for multinomial logit models are selected using the rainbow_hcl function in the same package.
Finally, Figure 7 shows an effect display for the country-by-age interaction on the scale of the latent response, obtained by specifying the argument latent=TRUE to the effect function: R> plot(effect("country*bs(age,4)", wvs.2, xlevels = list(age = 18:83), + given.values= c(gendermale = 0.5), latent = TRUE), rug = FALSE) Notice that the display includes horizontal lines at the category thresholds of the response.
The three displays reveal a similar pattern: Support for government action for the poor generally declined with age in all four countries, but the relationship to age was stronger in Australia and the US than in Norway or Sweden, and at virtually all ages, support for the poor was lower in the US than in the other three countries.The estimated modal response in Norway and Sweden was 'too little' at virtually all ages, and 'about right' in the US at almost all ages; in Australia, the modal response was 'too little' among young respondents and 'about right' among older respondents.Figure 5 suggests that probabilities of category membership are estimated with reasonable precision, except at the highest ages in Norway and Sweden, but Figure 7 shows that there is substantial uncertainty in the estimated effects on the scale of the latent response.
As in the previous example, the use of a B-spline in this model, here for age, makes it impractical to interpret the coefficient estimates directly.Had we specified a linear effect for age, direct interpretation would have been possible in principle, at least on the scale of the latent response, but would have required mental arithmetic to sum coefficients marginal to each high-order term.

A. Standard errors for the displays
This appendix adapts Fox and Andersen's (2006) results on standard errors for effect displays in polytomous logit models to the notation of the present paper, which, in turn, reflects the parametrization of the multinomial and proportional-odds logit models in the multinom and polr functions.
Because the fitted probabilities in the multinomial logit model are nonlinear functions of the model parameters (from Equation 1), Fox and Andersen derive approximate standard errors for effects by the delta method.Each fitted probability for an effect is computed at a particular model vector x 0 ; differentiating the corresponding fitted probabilities µ 0j with respect to the parameters β j of the model produces Let V ( β) represent the covariance matrix of the estimated coefficients β j stacked into a column vector, By a second application of the delta method, Confidence bounds computed on the logit scale using the standard error V ( λ 0j ) can then be translated back to the probability scale.
The proportional-odds model (Equation 3) is for cumulative logits, and so expressions for individual-category probabilities µ 0j = Pr(Y 0 = j) are required:

Figure 1 :
Figure1: Effect display for high-order terms in the binary logistic-regression model fit to the Titanic data.The red broken lines give point-wise 95-percent confidence envelopes around the fitted effects.

Figure 3 :
Figure 3: Effect plot for the Europe × political.knowledgeinteraction in the multinomial logit model fit to the BEPS data.

Figure 6 :
Figure 6: Stacked-area effect plot for the country × age interaction.

Figure 7 :
Figure7: Effect display for the country × age interaction, plotted on the scale of the latent response.The horizontal lines give the estimated thresholds between levels of the observed response.

passengerClass*sex*age effect plot
a simpler model in which both Europe and political.knowledgeentered the model linearly.)