MNP: R Package for Fitting the Multinomial Probit Model

MNP is a publicly available R package that fits the Bayesian multinomial probit model via Markov chain Monte Carlo. The multinomial probit model is often used to analyze the discrete choices made by individuals recorded in survey data. Examples where the multinomial probit model may be useful include the analysis of product choice by consumers in market research and the analysis of candidate or party choice by voters in electoral studies. The MNP software can also fit the model with different choice sets for each individual, and complete or partial individual choice orderings of the available alternatives from the choice set. The estimation is based on the efficient marginal data augmentation algorithm that is developed by Imai and van Dyk (2005).


Introduction
This paper illustrates how to use MNP, a publicly available R (R Development Core Team, 2012) package, to fit the Bayesian multinomial probit model via Markov chain Monte Carlo. The multinomial probit model is often used to analyze the discrete choices made by individuals recorded in survey data. Examples where the multinomial probit model may be useful include the analysis of product choice by consumers in market research and the analysis of candidate or party choice by voters in electoral studies. The MNP software can also fit the model with different choice sets for each individual, and complete or partial individual choice orderings of the available alternatives from the choice set. We use Markov chain Monte Carlo (MCMC) for estimation and computation. In particular, we use the efficient marginal data augmentation MCMC algorithm developed by Imai and van Dyk (2005).
MNP can be installed in the same way as other R packages via the install.packages("MNP") command. Appendix ?? gives instructions for obtaining R and installing MNP on Windows, Mac OS X, and Linux/UNIX platforms. Only three commands are necessary to use the MNP software: mnp() fits the multinomial probit model, summary() summarizes the MCMC output, and predict() gives posterior prediction based on the fitted model. (In addition, coef() and vcov() allow one to extract the posterior draws of the model coefficients and covariance matrix.) To run an example script, start R and run the following commands:

library(MNP) # loads the MNP package
example(mnp) # runs the example script

Details of the example script are given in Sections 3 and 4. Three appendices describe installation, the commands, and version history. We begin in Section 2 with a brief description of the multinomial probit model that MNP is designed to fit.

The Method
MNP implements the marginal data augmentation algorithms for posterior sampling in the multinomial probit model. The MCMC algorithm we implement here is fully described in Imai and van Dyk (2005); we use Scheme 1 of their Algorithm 1.

The Multinomial Probit Model
Suppose we have a dataset of size n with p > 2 choices and k covariates. Here, choices refer to the number of classes in the multinomial model. The word "choices" is used because the model is often used to describe how individuals choose among a number of alternatives, e.g., how a voter chooses which candidate to vote for among four candidates running for a particular office. We focus on the case when p > 2 because when p = 2, the model reduces to the standard binomial probit model, which can be fit via the glm(, family = binomial(probit)) command in R. The multinomial probit model differs from the ordinal probit model in that the former does not assume any inherent ordering on the choices. Thus, although the individuals may have preferences among the available alternatives, these orderings are individual-specific rather than being characteristics of the alternatives themselves.
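As a quick illustration of this reduction (not from the paper; the data below are simulated), the p = 2 case fits directly with glm():

```r
# Simulated binary-choice data; the true model is
# Pr(y = 1) = pnorm(-0.5 + 1.2 * x).
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, pnorm(-0.5 + 1.2 * x))
# With p = 2 the multinomial probit reduces to this binary probit.
fit <- glm(y ~ x, family = binomial(link = "probit"))
coef(fit)  # estimates should be near the true values (-0.5, 1.2)
```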
The ordinal probit model can be fitted via an MCMC algorithm in R by installing a package called MCMCpack (?).
Under the multinomial probit model, we assume a multivariate normal distribution on the latent variables, W_i = (W_i1, . . . , W_i,p−1):

W_i = X_i β + e_i,   e_i ∼ N(0, Σ),   for i = 1, . . . , n,   (1)
where X_i is a (p − 1) × k matrix of covariates, β is a k × 1 vector of fixed coefficients, e_i is a (p − 1) × 1 vector of disturbances, and Σ is a (p − 1) × (p − 1) positive definite matrix. For the model to be identified, the first diagonal element of Σ is constrained, σ_11 = 1. Note that starting with version 2.6-1, we use the restriction trace(Σ) = p as the default identification strategy, following the recommendation of Burgette and Nordheim (2009). This avoids the arbitrariness of fixing one particular diagonal element.
The response variable, Y_i, is the index of the choice of individual i among the alternatives in the choice set and is modeled in terms of this latent variable, W_i, via

Y_i(W_i) = 0 if max(W_i) < 0,   and   Y_i(W_i) = j if max(W_i) = W_ij > 0,   (2)

where Y_i equal to 0 corresponds to a base category.
The matrix X_i may include both choice-specific and individual-specific variables. A choice-specific variable is a variable that has a value for each of the p choices, and these p values may be different for each individual (e.g., the price of a product in a particular region where an individual lives). Choice-specific variables are recorded relative to the baseline choice and thus there are p − 1 recorded values for each individual. In this way a choice-specific variable is tabulated as a column in X_i. Individual-specific variables, on the other hand, take on a value for each individual, but are constant across the choices, e.g., the age or gender of the individual. These variables are tabulated via their interaction with each of the choice indicator variables. Thus, an individual-specific variable corresponds to p − 1 columns of X_i and p − 1 components of β.
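To make the layout concrete, here is a small sketch (the values and variable names are illustrative, not from the paper) of how one individual's covariate matrix could be assembled with p = 3 choices, one choice-specific variable (price), and one individual-specific variable (age):

```r
p <- 3
price <- c(1.2, 0.8, 1.0)  # choice-specific: one value per choice; choice 3 is the baseline
age   <- 35                # individual-specific: constant across choices
# A choice-specific variable is recorded relative to the baseline,
# leaving p - 1 values that form a single column of X_i.
price.col <- price[1:(p - 1)] - price[p]
# An individual-specific variable interacts with the p - 1 choice
# indicators, contributing p - 1 columns to X_i.
age.cols <- diag(p - 1) * age
X_i <- cbind(price = price.col, age.cols)  # a (p - 1) x k matrix, here k = 3
X_i
```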

The Multinomial Probit Model with Ordered Preferences
In some cases, we observe a complete or partial ordering of the p alternatives. For example, we may observe the preferences of each individual among different brands of a product. We denote the outcome variable in such situations by Y_i = (Y_i1, . . . , Y_ip), where i = 1, . . . , n indexes individuals and j = 1, . . . , p indexes the alternatives. If Y_ij = Y_ij′ for some j ≠ j′, we say individual i is indifferent to the choice between alternatives j and j′, but treat the data as if the actual ordering is unknown. In other words, formally we insist on strict inequalities among the preferences, but allow for some inequalities to be unobserved. The preference ordering is assumed to satisfy the usual axioms of preference comparability. Namely, preference is connected: for any j and j′, either Y_ij ≥ Y_ij′ or Y_ij ≤ Y_ij′. Preference also must be transitive: for any j, j′, and j′′, Y_ij ≥ Y_ij′ and Y_ij′ ≥ Y_ij′′ imply Y_ij ≥ Y_ij′′. For notational simplicity and without loss of generality, we assume that Y_ij takes an integer value ranging from 0 to p − 1. We emphasize that we have not changed the model from Section 2.1. Rather, we simply have more observed data: the index of the choice of individual i, Y_i, can be computed from the ordering. Thus, we continue to model the preference ordering in terms of a latent (multivariate normal) random vector,

Y_ij = #{W_ij′ : W_ij′ < W_ij, j′ = 1, . . . , p},   (3)

where W_ip = 0, the distribution of W_i is specified in equation (1), and #{· · ·} indicates the number of elements in a finite set. This model can be fitted via a slightly modified version of the MCMC algorithm in Imai and van Dyk (2005). In particular, we need only modify the way in which W_ij is sampled and use a truncation rule based on equation (3).
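The counting rule just described can be sketched in a few lines of R (a minimal illustration with made-up latent values, assuming a larger W_ij means a more preferred alternative):

```r
# Latent utilities for one individual with p = 4 alternatives;
# the last element is the base alternative with W_ip = 0.
W <- c(0.4, -1.1, 2.3, 0)
# Each alternative's rank is the number of alternatives whose
# latent utility is strictly smaller, giving values 0 to p - 1.
Y <- sapply(W, function(w) sum(W < w))
Y  # 2 0 3 1: alternative 3 is most preferred, alternative 2 least
```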

Prior Specification
Our prior distribution for the multinomial probit model is

β ∼ N(0, A^−1)   and   Σ ∼ Inverse-Wishart(ν, S),

where A is the prior precision matrix of β, ν is the prior degrees of freedom parameter for Σ, and the (p − 1) × (p − 1) positive definite matrix S is the prior scale for Σ; we assume the first diagonal element of S is one. The prior distribution on Σ is proper if ν ≥ p − 1, the prior mean of Σ is approximately equal to S if ν > p − 2, and the prior variance of Σ increases as ν decreases, as long as this variance exists. We also allow for an improper prior on β, which is p(β) ∝ 1 (i.e., A = 0). Alternate prior specifications were introduced by McCulloch and Rossi (1994) and McCulloch et al. (2000); the relative advantages of the various prior distributions are discussed by McCulloch et al. (2000), Nobile (2000), and Imai and van Dyk (2005). We prefer our choice because it allows us to directly specify the prior distribution on the identifiable model parameters, allows us to specify an improper prior distribution on the regression coefficients, and results in a Monte Carlo sampler that is relatively quick to converge. An implementation of the sampler proposed by McCulloch and Rossi (1994) has recently been released in the R package bayesm (Rossi and McCulloch, 2005).

Prediction under the Multinomial Probit Model
Thus, the posterior predictive distribution accounts for both variability in the response variable given the model parameters (i.e., the likelihood or sampling distribution) and the uncertainty in the model parameters as quantified in the posterior distribution. Monte Carlo evaluation of the posterior predictive distribution is easy once we obtain a Monte Carlo sample of the model parameters from the posterior distribution: we simply sample according to the likelihood for each Monte Carlo sample from the posterior distribution. This involves sampling the latent variable under the model in (1) and computing the preferred choice using (2) or the ordering of preferences using (3).

(Footnote 1: Algorithm 2 of Imai and van Dyk (2005) allows for a nonzero prior mean for β. Because the update for Σ in this sampler is not exactly its complete conditional distribution, however, this algorithm may exhibit undesirable convergence properties in some situations.)
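This procedure can be sketched in base R as follows (a self-contained toy, not MNP's internals: the "posterior draws" of β are simulated stand-ins, Σ is fixed at the identity, and all dimensions and values are made up):

```r
set.seed(1)
p <- 3; k <- 2; M <- 5000
Xstar <- matrix(c(1, 0, 0.5, -0.2), nrow = p - 1)  # a (p - 1) x k covariate matrix
# Stand-in posterior draws of beta (in practice these come from the MCMC output).
beta.draws <- matrix(rnorm(M * k, mean = c(0.5, -0.3)), ncol = k, byrow = TRUE)
choice <- integer(M)
for (m in 1:M) {
  W <- Xstar %*% beta.draws[m, ] + rnorm(p - 1)  # draw latent W given beta (Sigma = I)
  # Base category 0 if all latent utilities are negative, otherwise the argmax.
  choice[m] <- if (max(W) < 0) 0 else which.max(W)
}
prop.table(table(choice))  # Monte Carlo posterior predictive probabilities
```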

Example 1: Detergent Brand Choice
In this and the next section, we describe the details of two examples of MNP. In this section we use a market research dataset to illustrate the fitting of the multinomial probit model. In Section 4 we fit the multinomial probit model with ordered preferences to a Japanese election dataset. We also describe how to perform convergence diagnostics of the MCMC sampler and analysis of the Monte Carlo output of MNP using an existing R package. Additional examples of MNP can be found in Imai and van Dyk (2005).

Preliminaries
Our first example analyzes a typical dataset in market research. The dataset contains information about the brand and price of the laundry detergent purchased by 2657 households, originally analyzed by Chintagunta and Prasad (1998). The dataset contains the log prices of six detergent brands (Tide, Wisk, EraPlus, Surf, Solo, and All) as well as the brand chosen by each household. (Type help(detergent) in R for details about the dataset.)
We fit the multinomial probit model by using choice as the outcome variable and the other six variables as choice-specific covariates. After loading the MNP package, this can be accomplished using the following three commands:

data(detergent)
res <- mnp(choice ~ 1,
           choiceX = list(Surf = SurfPrice, Tide = TidePrice,
                          Wisk = WiskPrice, EraPlus = EraPlusPrice,
                          Solo = SoloPrice, All = AllPrice),
           cXnames = c("price"), data = detergent,
           n.draws = 10000, burnin = 2000, thin = 3, verbose = TRUE)
summary(res)

The first command loads the example dataset and stores it as the data frame called detergent. The second command fits the multinomial probit model. The default base category in this case is All. (The default base category in MNP is the first factor level of the outcome variable, Y.) Each household chooses among the six brands of laundry detergent, i.e., p = 6. We specify the choice-specific variables, choiceX, using a named list. The elements of the list are the log price of each detergent brand and they are named after the levels of the factor variable, choice. We also name the coefficient for this set of choice-specific variables by using cXnames. The argument data allows us to specify the name of the data frame where the data are stored. The model estimates five intercepts and the price coefficient, as well as 14 parameters in the covariance matrix, Σ.
We use the default prior distribution: an improper prior distribution for β and a diffuse prior distribution for Σ with ν = p = 6 and S = I. We sample 10,000 replications of the parameter from the resulting posterior distribution, saving every fourth sample after discarding the first 2,000 samples, as specified by the arguments n.draws, thin, and burnin. The argument verbose = TRUE specifies that a progress report and other useful messages be printed while the MCMC sampler is running. We emphasize that these results are preliminary because convergence has not yet been assessed. Thus, we delay interpretation of the fit until Section 3.3, after we discuss convergence diagnostics in Section 3.2. Note that coef(res) and vcov(res) allow one to extract the posterior draws of the model coefficients and covariance matrix if desired. Type help(mnp) in R for details.
We store the output from each of the three chains as an object of class mcmc, and then combine them into a single list, where the first command loads the coda package and the second command saves the results as an object of class mcmc.list called res.coda. We exclude the 7th column of each chain because this column corresponds to the first diagonal element of the covariance matrix, which is always equal to 1. The following command computes the Gelman-Rubin statistic from these three chains:

gelman.diag(res.coda, transform = TRUE)

where transform = TRUE applies a log or logit transformation as appropriate to improve the normality of each of the marginal distributions. Gelman et al. (2004) suggest computing the statistic for each scalar estimate of interest, and running the chains until the statistics are all less than 1.1.
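The chain-combination commands referred to above are not printed in this excerpt. A plausible sketch, in which the object names res1, res2, and res3 and the $param component holding the posterior draws are assumptions about the fitting code rather than something shown here, is:

```r
library(coda)  # loads the coda package
# Combine three independent MNP chains into an mcmc.list, dropping
# column 7, the first diagonal element of Sigma, which is fixed at 1.
res.coda <- mcmc.list(mcmc(res1$param[, -7]),
                      mcmc(res2$param[, -7]),
                      mcmc(res3$param[, -7]))
```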
Inference is then based on the Monte Carlo sample obtained by combining the second half of each of the chains.The output of the coda command lists the value and a 97.5% upper limit of the Gelman-Rubin statistic for each parameter.The Gelman-Rubin statistics are all less than 1.1, suggesting satisfactory convergence has been achieved.
(Note that the 97.5% upper limits are also less than 1.1; a more conservative user might want to obtain a set of longer Markov chains and recompute the Gelman-Rubin statistics.) It may also be useful to examine the change in the value of the Gelman-Rubin statistic over the iterations. The following command produces a graphical summary of the progression of the statistics over iterations:
gelman.plot(res.coda, transform = TRUE, ylim = c(1, 1.2))

where ylim = c(1, 1.2) specifies the range of the vertical axis of the plot. The results appear in Figure 1 as a cumulative evaluation of the Gelman-Rubin statistic over iterations for nine selected parameters. (Three coefficients appear in the first row; three covariance parameters appear in the second row; and three variance parameters appear in the third row.) The coda package can also be used to produce univariate time-series plots of the three chains and univariate density estimates of the posterior distribution. The following commands create these graphs for the price coefficient.
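Those commands are not shown in this excerpt; a sketch, assuming the mcmc.list res.coda built earlier with a coefficient column named "price", is:

```r
# Trace plots of the three chains and a density estimate of the
# posterior distribution of the price coefficient.
plot(res.coda[, "price"])
```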

Final Analysis and Conclusions
In the final analysis, we combine the second half of each of the three chains. This is accomplished using a command that saves the last 25,000 draws from each chain as an mcmc object and combines the mcmc objects into a list. The output shows the mean, standard deviation, and various percentiles of the posterior distributions of the coefficients and the elements of the variance-covariance matrix. The base category is the detergent All. Separate intercepts are estimated for each detergent. The price coefficient is negative and highly statistically significant, agreeing with the standard economic expectation that consumers are less likely to buy more expensive goods.
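The command itself is elided in this excerpt; a sketch, again assuming fitted objects res1, res2, and res3 with 50,000 draws each stored in their $param component, is:

```r
# Keep the last 25,000 draws of each chain, drop column 7 (fixed at 1),
# and summarize the combined draws.
res.coda <- mcmc.list(mcmc(res1$param[25001:50000, -7]),
                      mcmc(res2$param[25001:50000, -7]),
                      mcmc(res3$param[25001:50000, -7]))
summary(res.coda)
```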
MNP also allows one to calculate the posterior predictive probabilities of each alternative being the most preferred given a particular value of the covariates. For example, one can calculate the posterior predictive probabilities using the covariate values of the first two observations by using the predict() command.

We follow the commands used in Section 3.2 and compute the Gelman-Rubin statistic for each parameter. Upon examination of the resulting statistics, we determined that satisfactory convergence has been achieved; for example, the value of the Gelman-Rubin statistic is less than 1.01 for all the parameters. Hence, we base our final analysis on the combined draws from the second half of the three chains (i.e., a total of 75,000 draws, using 25,000 draws from each chain). Posterior summaries can be obtained using the coda package as before. One of the findings is that older voters tend to prefer LDP, as indicated by the statistically significant positive age coefficient for LDP. This is consistent with the conventional wisdom of Japanese politics that the stronghold of LDP is elderly voters.
To further investigate the marginal effect of age, we calculate the posterior predictive probabilities of party preference under two scenarios.First, we choose the 10th individual in the survey data and compute the predictive probability that a voter with this set of covariates prefers each of the parties.
This can be accomplished with the predict() command. The result indicates that, under the model, we should expect 36% of voters with these covariates to prefer LDP, 32% to prefer NFP, 21% to prefer SKG, and 11% to prefer JCP. Next, we change the value of the age variable of this voter from 50 to 75, while holding the other variables constant, and recompute the posterior predictive probabilities. We obtain the following results:

         JCP      LDP      NFP      SKG
[1,] 0.06548 0.485467 0.249667 0.199387

The comparison of the two results shows that changing the value of the age variable from 50 to 75 increases the estimated posterior predictive probability of preferring LDP most by more than 10 percentage points. Interestingly, the predictive probability for SKG changes very little, while that of NFP decreases significantly. This suggests that older voters tend to prefer LDP over NFP.
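The commands for the two scenarios are not reproduced in this excerpt. A hypothetical sketch, in which the survey data frame japan, its age variable, and the fitted object res are assumed from the surrounding example, is:

```r
x <- japan[10, ]           # covariates of the 10th survey respondent
predict(res, newdata = x)  # scenario 1: age = 50
x[, "age"] <- 75           # scenario 2: raise age to 75, all else equal
predict(res, newdata = x)  # recompute the posterior predictive probabilities
```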
Predictions of individual preferences given particular values of the covariates can be useful in interpreting the fitted model. Consider a value of the (p − 1) × k matrix of covariates, X*, that may or may not correspond to the values for one of the observed individuals. We are interested in the distribution of the preferences among the alternatives in the choice set given this value of the covariates. Let Y* be the preferred choice and Y* = (Y*_1, . . . , Y*_p) indicate the ordering of the preferences among the available alternatives. As an example, one might be interested in Pr(Y* = j | X*) for some j. By varying X*, one could explore how preferences are expected to change with covariates. Similarly, one might be interested in how relative preferences such as Pr(Y*_j > Y*_j′ | X*) are expected to change with the covariates. In the context of a Bayesian analysis, such predictive probabilities are computed via the posterior predictive distribution. This distribution conditions on the observed data, Y = (Y_1, . . . , Y_n), but averages over the uncertainty in the model parameters.

The summary(res) command gives a summary of the output, including the posterior means and standard deviations of the parameters. The summary is based on the single MCMC chain produced with this call of MNP. Before we can reliably draw conclusions based on these results, we must be sure the chain has converged. Convergence diagnostics are discussed and illustrated in Section 3.2. The result of the call of summary(res) is as follows.

Call:
mnp(formula = choice ~ 1, data = detergent,
    choiceX = list(Surf = SurfPrice, Tide = TidePrice,
        Wisk = WiskPrice, EraPlus = EraPlusPrice,
        Solo = SoloPrice, All = AllPrice),
    cXnames = c("price"), n.draws = 10000, burnin = 2000,
    thin = 3, verbose = TRUE)

Figure 1: The Gelman-Rubin Statistic Computed with Three Independent Markov Chains for Selected Parameters in the Detergent Example. The first row represents three coefficients, the second row represents three covariances, and the third row represents three variance parameters.
