Estimation of Random Utility Models in R : The mlogit Package

mlogit is a package for R which enables the estimation of random utility models with individual and/or alternative specific variables. The main extensions of the basic multinomial model (heteroscedastic, nested and random parameter models) are implemented.


Introduction
Random utility models is the reference approach in economics when one wants to analyze the choice by a decision maker of one among a set of mutually exclusive alternatives.Since the seminal works of Daniel Mc Fadden (McFadden 1974) who won the Nobel prize in economics "for his development of theory and methods for analyzing discrete choice", a large amount of theoretical and empirical literature have been developed in this field. 1hese models rely on the hypothesis that the decision maker is able to rank the different alternatives by an order of preference represented by a utility function.For a choice of an alternative l among a set of J alternatives, each alternative is therefore characterized by an utility value U l and the alternative l is chosen if and only if U l > U j ∀ j = l.
These models are called random utility models because the researcher is unable to measure the whole level of utility, but only part of it.Therefore, the utility for alternative l is written as: U l = V l + l where V l is a function of some observable covariates and unknown parameters to be estimated, and l a random deviate which contains all the unobserved determinants of the utility.Alternative l is therefore chosen if j < (V l − V j ) + l ∀ j = l and this choice can be written in probabilistic terms: Different hypothesis on the distribution of lead to different flavors of random utility models.Early developments of these models were based on the hypothesis of identically and independent errors following a Gumbell distribution.2Much more general models have since been proposed, based on much less restrictive distribution hypothesis, and often estimated using simulations.
The first version of mlogit was posted in 2008, it was the first R package allowing the estimation of random utility models.Since then, other package have emerged (see Sarrias and Daziano 2017, page 4 for a survey of revelant R pakages).mlogit still provides the widests set of estimators for random utility models and, moreover, its syntax has been adopted by other R packages, especially by gmnl (Sarrias and Daziano 2017) and mnlogit (Hasan, Wang, and Mahani 2016) which, respectively, implements advanced mixed logit models and estimates efficiently multinomial logit models on large data sets.
The article is organized as follow.Section 1 explains how the usual formula-data and testing interface can be extended in order to describes in a very natural way the model to be estimated.Section 2 describe the landmark multinomial logit model.Section 3 and 4 present two important extensions of this basic model: section 3 presents models that relax the iid Gumbell hypothesis and section 4 introduces slope heterogeneity by considering some parameters as random.Section 5 concludes.

Data management, model description and testing
The formula-data interface is a critical advantage of the R software.It provides a practical way to describe the model to be estimated and to store data.However, the usual interface is not flexible enough to deal correctly with random utility models.Therefore, mlogit introduce tools to construct richer data.framesand formulas.
1.1.Data management mlogit is loaded using: library("mlogit") It comes with several data sets that we'll use to illustrate the features of the library.Data sets used for multinomial logit estimation concern some individuals, that make one or a sequential choice of one alternative among a set of mutually exclusive alternatives.The determinants of these choices are covariates that can depends on the alternative and the choice situation, only on the alternative or only on the choice situation.
Such data have therefore a specific structure that can be characterized by three indexes: the alternative, the choice situation and the individual.These three indexes will be denoted alt, chid and id.Note that the distinction between chid and id is only relevant if we have repeated observations for the same individual.
Data sets can have two different shapes: a wide shape (one row for each choice situation) or a long shape (one row for each alternative and, therefore, as many rows as there are alternatives for each choice situation).mlogit deals with both format.It provides a mlogit.datafunction that take as first argument a data.frameand returns a data.frame in "long" format with some supplementary information about the structure of the data.

Wide format
Train 3 is an example of a wide data set: data("Train", package = "mlogit") head (Train,  This data set contains data about a stated preference survey in Netherlands.Each individual has responded to several (up to 16) scenarios.For every scenario, two train trips are proposed to the user, with different combinations of four attributes: price (the price in cents of guilders), time (travel time in minutes), change (the number of changes) and comfort (the class of comfort, 0, 1 or 2, 0 being the most comfortable class).
This "wide" format is suitable to store individual specific variables.Otherwise, it is cumbersome for alternative specific variables because there are as many columns for such variables that there are alternatives.
For such a wide data set, the shape argument of mlogit.data is mandatory, as it default value is "long".The alternative specific variables are indicated with the varying argument which is a numeric vector that indicates their position in the data frame.This argument is then passed to reshape that coerced the original data.frame in "long" format.Further arguments may be passed to reshape.For example, as the names of the variables are of the form price_A, one must add sep = '_' (the default value being ".").
The choice argument is also mandatory because the response has to be transformed in a logical value in the long format.To take the panel dimension into account, one has to add an argument id.var which is the name of the individual index.
Tr <-mlogit.data(Train,shape = "wide", choice = "choice", varying = 4:11, sep = "_", alt.levels = c("A", "B"), id.var = "id", opposite = c("price", "comfort", "time", "change")) Note the use of the opposite argument for the 4 covariates: we expect negative coefficients for all of these, taking the opposite of the covariates will lead to expected positive coefficients.We next convert price and time in more meaningful unities, hours and euros (1 guilder was 2.20371 euros): Tr$price <-Tr$price / 100 * 2.20371 Tr$time <-Tr$time / 60 head(Tr, 3) An index attribute is added to the data, which contains the two relevant indexes: chid is the choice situation index and alt the alternative index.As a id.var is provided, the index contains a third column, the individual index.This attribute is a data.framethat can be extracted using the index function, which returns it:this data.frame.
head(index(Tr), 3) Long format There are four transport modes (air, train, bus and car) and most of the variable are alternative specific (cost for monetary cost, ivt for in vehicle time, ovt for out of vehicle time, freq for frequency).The only individual specific variables are dist (the distance of the trip), income (household income), urban (a dummy for trips which have a large city at the origin or the destination) and noalt the number of available alternatives.The advantage of this shape is that there are much fewer columns than in the wide format, the caveat being that values of dist, income and urban are repeated four times.
For data in "long" format, the shape and the choice arguments are no more mandatory.
To replicate published results (latter in the text), we'll use only a subset of the choice situations, namely those for which the 4 alternatives are available.This can be done using the subset function with the subset argument set to noalt == 4.This can also be done within mlogit.data,using the subset argument.
The information about the structure of the data can be explicitly indicated using choice situations and alternative indexes (respectively case and alt in this data set) or, in part, guessed by the mlogit.datafunction.Here, after subsetting, we have 2779 choice situations with 4 alternatives, and the rows are ordered first by choice situation and then by alternative (train, air, bus and car in this order).
The first way to read correctly this data frame is to ignore completely the two index variables.In this case, the only supplementary argument to provide is the alt.levels argument which is a character vector that contains the name of the alternatives in their order of appearance: MC <-mlogit.data(ModeCanada,subset = noalt == 4, alt.levels = c("train", "air", "bus", "car")) Note that this can only be used if the data set is "balanced", which means than the same set of alternatives is available for all choice situations.It is also possible to provide an argument alt.var which indicates the name of the variable that contains the alternatives MC <-mlogit.data(ModeCanada, subset = noalt == 4, alt.var = "alt") The name of the variable that contains the information about the choice situations can be indicated using the chid.varargument: MC <-mlogit.data(ModeCanada,subset = noalt == 4, chid.var= "case", alt.levels = c("train", "air", "bus", "car")) Both alternative and choice situation variable can also be provided: MC <-mlogit.data(ModeCanada,subset = noalt == 4, chid.var= "case", alt.var = "alt") and dropped from the data frame using the drop.indexargument: MC <-mlogit.data(ModeCanada,subset = noalt == 4, chid.var= "case", alt.var = "alt", drop.index= TRUE) head(MC)

Model description
Standard formulas are not very practical to describe random utility models, as these models may use different sets of covariates.Working with random utility models, one has to consider at most five sets of covariates: • alternative and individual specific covariates x ij with generic coefficients β, • individual specific covariates z i with alternative specific coefficients γ j , • alternative and individual specific covariates w ij with alternative specific coefficients δ j , • alternative specific covariates t j with a generic coefficient ν, • individual specific covariates v i that influence the variance of the errors.
Ignoring for the moment the 5th set of covariates, the observable part of the utility index for alternative j is: As the absolute value of utility is irrelevant, only utility differences are useful to modelize the choice for one alternative.For two alternatives j and k, we obtain: It is clear from the previous expression that coefficients of individual specific variables (the intercept being one of those) should be alternative specific, otherwise they would disappear in the differentiation.Moreover, only differences of these coefficients are relevant and may be identified.For example, with three alternatives 1, 2 and 3, the three coefficients γ 1 , γ 2 , γ 3 associated to an individual specific variable cannot be identified, but only two linear combinations of them.Therefore, one has to make a choice of normalization and the simplest one is just to set γ 1 = 0.
Coefficients for alternative and individual specific variables may (or may not) be alternative specific.For example, transport time is alternative specific, but 10 mn in public transport may not have the same impact on utility than 10 mn in a car.In this case, alternative specific coefficients are relevant.Monetary cost is also alternative specific, but in this case, one can consider than 1$ is 1$ whatever it is spent for the use of a car or in public transports.In this case, a generic coefficient is relevant.
The treatment of alternative specific variables don't differ much from the alternative and individual specific variables with a generic coefficient.However, if some of these variables are introduced, the ν parameter can only be estimated in a model without intercepts to avoid perfect multicolinearity.
Individual-related heteroscedasticity (see Swait and Louviere 1993) can be addressed by writing the utility of choosing j for individual i: , where has a variance that doesn't depend on i and j and σ i is a parametric function of some individual-specific covariates.As the overall scale of utility is irrelevant, the utility can also be writen as: with homoscedastic errors.if V ij is a linear combination of covariates, the associated coefficients are then divided by σ i .
A logit model with only individual specific variables is sometimes called a multinomial logit model, one with only alternative specific variables a conditional logit model and one with both kind of variables a mixed logit model.This is seriously misleading: conditional logit model is also a logit model for longitudinal data in the statistical literature and mixed logit is one of the names of a logit model with random parameters.Therefore, in what follows, we'll use the name multinomial logit model for the model we've just described whatever the nature of the explanatory variables used.
mlogit package provides objects of class mFormula which are built upon Formula objects provided by the Formula package. 5The Formula package provides richer formulas, which accept multiple responses (a feature not used here) and multiple set of covariates.It has in particular specific model.frameand model.matrixmethods which can be used with one or several sets of covariates.
To illustrate the use of mFormula objects, let's use again the ModeCanada data set and consider three sets of covariates that will be indicated in a three-part formula: • cost (monetary cost) is an alternative specific covariate with a generic coefficient (part 1), • income and urban are individual specific covariates (part 2), • ivt (in vehicle travel time) is alternative specific and alternative specific coefficients are expected (part 3).
Some parts of the formula may be omitted when there is no ambiguity.For example, the following sets of formulas are identical: By default, an intercept is added to the model, it can be removed by using + 0 or -1 in the second part.The model matrix contains J − 1 columns for every individual specific variables (income and the intercept), which means that the coefficient associated to the first alternative (air) is set to 0. It contains only one column for cost because we want a generic coefficient for this variable.It contains J columns for ivt, because it is an alternative specific variable for which we want alternative specific coefficients.

Testing
As for all models estimated by maximum likelihood, three testing procedures may be applied to test hypothesis about models fitted using mlogit.The set of hypothesis tested defines two models: the unconstrained model that doesn't take these hypothesis into account and the constrained model that impose these hypothesis.
This in turns define three principles of tests: the Wald test, based only on the unconstrained model, the Lagrange multiplier test (or score test), based only on the constrained model and the likelihood ratio test, based on the comparison of both models.
Two of these three tests are implemented in the lmtest package (Zeileis and Hothorn 2002): waldtest and lrtest.The Wald test is also implemented as linearHypothesis in package car (Fox and Weisberg 2010), with a fairly different syntax.We provide special methods of waldtest and lrtest for mlogit objects and we also provide a function for the Lagrange multiplier (or score) test called scoretest.
We'll see later that the score test is especially useful for mlogit objects when one is interested in extending the basic multinomial logit model because, in this case, the unconstrained model may be difficult to estimate.For the presentation of further tests, we provide a convenient statpval function which extract the statistic and the p-value from the objects returned by the testing function, which can be either of class anova or htest.

Random utility model and the multinomial logit model 2.1. Random utility model
Remind from the introduction that, in a random utility model, the probability of choosing an alternative l among a set of mutually exclusive ones is: Denoting F −l the cumulative density function of all the s except l , this probability is: Note that this probability is conditional on the value of l .The unconditional probability (which depends only on β and on the value of the observed explanatory variables) is obtained by integrating out the conditional probability using the marginal density of l , denoted f l : The conditional probability is an integral of dimension J − 1 and the computation of the unconditional probability adds on more dimension of integration.

The distribution of the error terms
The multinomial logit model (McFadden 1974) is a special case of the model developed in the previous section.It is based on three hypothesis The first hypothesis is the independence of the errors.In this case, the univariate distribution of the errors can be used, which leads to the following conditional and unconditional probabilities: which means that the conditional probability is the product of J −1 univariate cumulative density functions and the evaluation of only a one-dimensional integral is required to compute the unconditional probability.
The second hypothesis is that each follows a Gumbel distribution, whose density and probability functions are respectively: where µ is the location parameter and θ the scale parameter.The first two moments of the Gumbel distribution are E(z) = µ + θγ, where γ is the Euler-Mascheroni constant (≈ 0.577) and V(z) = π 2 6 θ 2 .The mean of j s is not identified if V j contains an intercept.We can then, without loss of generality suppose that µ j = 0, ∀j.Moreover, the overall scale of utility is not identified.Therefore, only J − 1 scale parameters may be identified, and a natural choice of normalization is to impose that one of the θ j is equal to 1.
The last hypothesis is that the errors are identically distributed.As the location parameter is not identified for any error term, this hypothesis is essentially an homoscedasticity hypothesis, which means that the scale parameter of the Gumbel distribution is the same for all the alternatives.As one of them has been previously set to 1, we can therefore suppose that, without loss of generality, θ j = 1, ∀j ∈ 1 . . .J. The conditional and unconditional probabilities (4) then further simplify to: The probabilities have then very simple, closed forms, which correspond to the logit transformation of the deterministic part of the utility.

IIA property
If we consider the probabilities of choice for two alternatives l and m, we have P l = e V l j e V j and P m = e Vm j e V j .The ratio of these two probabilities is: This probability ratio for the two alternatives depends only on the characteristics of these two alternatives and not on those of other alternatives.This is called the IIA property (for independence of irrelevant alternatives).
Consider for example inter urban trips between two towns, with 3 modes available.The initial situation is presented in table 1, which indicates that the initial market shares are 30% for car, 10% for plane and 60% for train.Consider now that a low-cost airline company enters the market, so that the price of plane ticket decreases from 150 to 100 euros.
price We consider in the table two scenarios of evolution of the market shares (columns s1 and s2).Both indicates that the market share of the plane increases and that the market shares of the other two modes decrease.For the first scenario (column s1), almost all the increase of plane market share is due to former train users who now take the plane.The car market share decreases very slightly (from 30 to 29%).This situation may be realistic if car trips are mainly family trips, as train and plane trips mainly trips for professional purpose.This situation cannot be predicted by a multinomial logit model, because of the IIA property; the ratio of the probabilities of choosing train and car should be identical before and after the decrease of plane price ticket.This is the case in column s2, where the relative decrease of the market shares for car and train are identical.
IIA relies on the hypothesis that the errors are identical and independent.It is not a problem by itself and may even be considered as a useful feature for a well specified model.However, this hypothesis may be in practice violated, especially if some important variables are omitted.

Interpretation
In a linear model, the coefficients can be directly considered as marginal effects of the explanatory variables on the explained variable.This is not the case for the multinomial logit model.However, meaningful results can be obtained using relevant transformations of the coefficients.

Marginal effects
The marginal effects are the derivatives of the probabilities with respect to the covariates, which can be be individual-specific (z i ) or alternative specific (x ij ): • For an individual specific variable, the sign of the marginal effect is not necessarily the sign of the coefficient.Actually, the sign of the marginal effect is given by β l − j P ij β j , which is positive if the coefficient for alternative l is greater than a weighted average of the coefficients for all the alternatives, the weights being the probabilities of choosing the alternatives.In this case, the sign of the marginal effect can be established with no ambiguity only for the alternatives with the lowest and the greatest coefficients.
• For an alternative-specific variable, the sign of the coefficient can be directly interpreted.The marginal effect is obtained by multiplying the coefficient by the product of two probabilities which is at most 0.25.The rule of thumb is therefore to divide the coefficient by 4 in order to have an upper bound of the marginal effect.
Note that the last equation can be rewriten: dP il /P il dx ik = −γP ik .
Therefore, when a characteristic of alternative k changes, the relative change of the probabilities for every alternatives except k is the same, which is a consequence of the IIA property.

Marginal rates of substitution
Coefficients are marginal utilities, which cannot be interpreted.However, ratios of coefficients are marginal rates of substitution.For example, if the observable part of utility is: , join variations of x 1 and x 2 which ensure the same level of utility are such that: dV = β 1 dx 1 + β 2 dx 2 = 0 so that: For example, if x 2 is transport cost (in $), x 1 transport time (in hours), β 1 = 1.5 and β 2 = 0.2, β 1 β 2 = 30 is the marginal rate of substitution of time in terms of $ and the value of 30 means that to reduce the travel time of one hour, the individual is willing to pay at most 30$ more.Stated more simply, time value is 30$ per hour.

Consumer's surplus
Consumer's surplus has a very simple expression for multinomial logit models, which was first derived by Small and Rosen (1981).The level of utility attained by an individual is U j = V j + j , j being the alternative chosen.The expected utility, from the searcher's point of view is then: E(max j U j ), where the expectation is taken over the values of all the error terms.Its expression is simply, up to an additive unknown constant, the log of the denominator of the logit probabilities and is often called the "log-sum": If the marginal utility of income (α) is known and constant, the expected surplus is simply E(U) α .

Application
Random utility models are fitted using the mlogit function.Basically, only two arguments are mandatory, formula and data, if a mlogit.dataobject (and not an ordinary data.frame) is provided.

ModeCanada
We first use the ModeCanada data set, which was already coerced to a mlogit.dataobject (called MC) in the previous section.The same model can then be estimated using as data argument a mlogit.dataobject: or a data.frame.In this latter case, further arguments that will be passed to mlogit.datashould be indicated: ml.MC1b <-mlogit(choice ~cost + freq + ovt | income | ivt, ModeCanada, subset = noalt == 4, alt.var = "alt", chid.var= "case") mlogit provides two further usefull arguments: • reflevel indicates which alternative is the "reference" alternative, i.e. the one for which the coefficients of individual specific covariates are 0, • alt.subset indicates a subset of alternatives on which the estimation has to be performed; in this case, only the lines that correspond to the selected alternatives are used and all the choice situations where not selected alternatives has been chosen are removed.
We estimate the model on the subset of three alternatives (we exclude bus whose market share is negligible in our sample) and we set car as the reference alternative.Moreover, we use a total transport time variable computed as the sum of the in and the out of vehicule time variables.
MC$time <-with(MC, ivt + ovt) ml.MC1 <-mlogit(choice ~cost + freq | income | time, MC, alt.subset = c("car", "train", "air"), reflevel = "car") The main results of the model are computed and displayed using the summary method: The frequencies of the different alternatives in the sample are first indicated.Next, some information about the optimization are displayed: the Newton-Ralphson method (with analytic gradient and hessian) is used, as it is the most efficient method for this simple model for which the log-likelihood function is concave.Note that very few iterations and computing time are required to estimate this model.Follows the usual table of coefficients and some goodness of fit measures: the value of the log-likelihood function, which is compared to the value when only intercepts are introduced, which leads to the computation of the McFadden R 2 and to the likelihood ratio test.
The fitted method can be used either to obtain the probability of actual choices (type = "outcome") or the probabilities for all the alternatives (type = "probabilities").Note that the log-likelihood is the sum of the log of the fitted outcome probabilities and that, as the model contains intercepts, the average fitted probabilities for every alternative equals the market shares of the alternatives in the sample.
sum(log(fitted(ml.MC1, type = "outcome"))) ## [1] -1951.344 logLik(ml.MC1) ## 'log Lik.' -1951.344 (df=9) apply(fitted(ml.MC1, type = "probabilities"), 2, mean) ## car train air ## 0.4575659 0.1672084 0.3752257 Predictions can be made using the predict method.If no data is provided, predictions are made for the average of the sample on which the estimation as been performed.which is an illustration of the IIA property.If train time changes, it changes the probabilities of choosing air and car, but not their ratio.
We next compute the surplus for individuals of the sample induced by train time reduction.This requires the computation of the log-sum term (also called inclusive value or inclusive utility) for every choice situation, which writes: Estimation of Random Utility Models in R: The mlogit Package For this purpose, we use the logsum function, which works on a vector of coefficients and a model.matrix.The basic use of logsum consists on providing as unique argument (called coef) a mlogit object.In this case, the model.matrixand the coef are extracted from the same model.

ivbefore <-logsum(ml.MC1)
To compute the log-sum after train time reduction, we must provide a model.matrixwhich is not the one corresponding to the fitted model.This can be done using the X argument which is a matrix or an object from which a model.matrixcan be extracted.This can also be done by filling the data (a data.frame or an object from which a data.framecan be extracted using a model.framemethod), and eventually the formula argument (a formula or an object for which the formula method can be applied).If no formula is provided but if data is a mlogit.dataobject, the formula is extracted from it.
ivafter <-logsum(ml.MC1, data = NMC) Surplus variation is then computed as the difference of the log-sums divided by the opposite of the cost coefficient which can be interpreted as the marginal utility of income: Consumer's surplus variation range from 0.6 to 31 Canadian $, with a median value of about 4$.
Marginal effects are computed using the effects method.By default, they are computed at the sample mean, but a data argument can be provided.The variation of the probability and of the covariate can be either absolute or relative.This is indicated with the type argument which is a combination of two a (as absolute) and r (as relative).For example, type = "ar" means that what is measured is an absolute variation of the probability for a relative variation of the covariate.effects(ml.MC1, covariate = "income", type = "ar") ## car train air ## -0.1822177 -0.1509079 0.3331256 The results indicate that, if income has doubled, the probability of choosing air increases by 33 points of percentage, as the probabilities of choosing car and train decrease by 18 and 15 points of percentage.
For an alternative specific covariate, a matrix of marginal effects is displayed.effects(ml.MC1, covariate = "cost", type = "rr") ## car train air ## car -0.9131273 0.9376923 0.9376923 ## train 0.3358005 -1.2505014 0.3358005 ## air 1.2316679 1.2316679 -3.1409703 The cell in the l th row and the c th column indicates the change of the probability of choosing alternative c when the cost of alternative l changes.As type = "rr", elasticities are computed.For example, a 10% change of train cost increases the probabilities of choosing car and air by 3.36%.Note that the relative changes of the probabilities of choosing one of these two modes are equal, which is a consequence of the IIA property.
Finally, in order to compute travel time valuation, we divide the coefficients of travel times (in minutes) by the coefficient of monetary cost (in $).The value of travel time ranges from 23 for train to 37 Canadian $ per hour for plane.

NOx
The second example is a data set used by Fowlie (2010), called NOx.She analyzed the effect of an emissions trading program (the NOx budget program which seeks to reduce the emission of nitrogen oxides) on the behavior of producers.More precisely, coal electricity plant managers may adopt one out of fifteen different technologies in order to comply to the emissions defined by the program.Some of them require high investment (the capital cost is kcost) and are very efficient to reduce emissions, some other require much less investment but are less efficient and the operating cost (denoted vcost) is then higher because pollution permits must be purchased to offset emissions exceeding their allocation.
The focus of the paper is on the effects of the regulatory environment on manager's behavior.Some firms are deregulated, whereas other are either regulated or public.Rate of returns is applied for regulated firms, which means that they perceive a "fair" rate of return on their investment.Public firms also enjoy significant cost of capital advantages.
Therefore, the main hypothesis of the paper is that public and regulated firms will adopt much more capitalistic intensive technologies than deregulated and public ones, which means that the coefficient of capital cost should take a higher negative value for deregulated firms.Capital cost is interacted with the age of the plant (measured as a deviation from the sample mean age), as firms should weight capital costs more heavily for older plants, as they have less time to recover these costs.

Logit models relaxing the iid hypothesis
In the previous section, we assumed that the error terms were iid (identically and independently distributed), i.e. uncorrelated and homoscedastic.Extensions of the basic multinomial logit model have been proposed by relaxing one of these two hypothesis while maintaining the hypothesis of Gumbell distribution.

The heteroskedastic logit model
The heteroskedastic logit model was proposed by Bhat (1995).The probability that U l > U j is: which implies the following conditional and unconditional probabilities There is no closed form for this integral but it can be efficiently computed using a Gauss quadrature method, and more precisely the Gauss-Laguerre quadrature method.

The nested logit model
The nested logit model was first proposed by McFadden (1978).It is a generalization of the multinomial logit model that is based on the idea that some alternatives may be joined in several groups (called nests).The error terms may then present some correlation in the same nest, whereas error terms of different nests are still uncorrelated.
Denoting m = 1 . . .M the nests and B m the set of alternatives belonging to nest m, the multivariate distribution of the error terms is: The marginal distributions of the s are still univariate extreme value, but there is now some correlation within nests. 1 − λ m is a measure of the correlation, i.e. λ m = 1 implies no correlation.In the special case where λ m = 1 ∀m, the errors are iid Gumbel errors and the nested logit model reduce to the multinomial logit model.It can then be shown that the probability of choosing alternative j that belongs to nest l is: and that this model is a random utility model if all the λ parameters are in the 0 − 1 interval.7 Let us now write the deterministic part of the utility of alternative j as the sum of two terms: the first one (Z j ) being specific to the alternative and the second one (W l ) to the nest it belongs to: We can then rewrite the probabilities as follow: Then denote I l = ln k∈B l e Z k /λ l which is often called the log-sum, the inclusive value or the inclusive utility. 8We then can write the probability of choosing alternative j as: The first term P j|l is the conditional probability of choosing alternative j if the nest l is chosen.It is often referred as the lower model.The second term P l is the marginal probability of choosing the nest l and is referred as the upper model.W m + λ m I m can be interpreted as the expected utility of choosing the best alternative in m, W m being the expected utility of choosing an alternative in this nest (whatever this alternative is) and λ m I m being the expected extra utility gained by being able to choose the best alternative in the nest.The inclusive values link the two models.It is then straightforward to show that IIA applies within nests, but not for two alternatives in different nests.
A consistent but inefficient way of estimating the nested logit model is to estimate separately its two components.The coefficients of the lower model are first estimated, which enables the computation of the inclusive values I m .The coefficients of the upper model are then estimated, using I m as covariates.Maximizing directly the likelihood function of the nested model leads to a more efficient estimator.

Applications
ModeCanada Bhat (1995) estimated the heteroscedastic logit model on the ModeCanada data set.Using mlogit, the heteroscedastic logit model is obtained by setting the heterosc argument to TRUE: ml.MC <-mlogit(choice ~freq + cost + ivt + ovt | urban + income, MC, reflevel = 'car', alt.subset = c("car", "train", "air")) hl.MC <-mlogit(choice ~freq + cost + ivt + ovt | urban + income, MC, reflevel = 'car', alt.subset = c("car", "train", "air"), heterosc = TRUE) coef(summary(hl.MC)) [11:12, ] ## Estimate Std.Error z-value Pr(>|z|) ## sp.train 1.2371829 0.1104610 11.200182 0.000000e+00 ## sp.air 0.5403239 0.1118353 4.831425 1.355592e-06 The variance of the error terms of train and air are respectively higher and lower than the variance of the error term of car (set to 1).Note that the z-values and p-values of the output are not particularly meaningful, as the hypothesis that the coefficient is zero (and not one) is tested.The homoscedascticity hypothesis can be tested using any of the three tests.A particular convenient syntax is provided in this case.For the likelihood ratio and the wald test, one can use only the fitted heteroscedastic model as argument.In this case, it is guessed that the hypothesis that the user wants to test is the homoscedasticity hypothesis.
lr.heter <-lrtest(hl.MC, ml.MC) wd.heter <-waldtest(hl.MC, heterosc = FALSE) or, more simply: The Wald test can also be computed using the linearHypothesis function from the car package : library("car") lh.heter <-linearHypothesis(hl.MC, c('sp.air= 1', 'sp.train = 1')) For the score test, we provide the constrained model as argument, which is the standard multinomial logit model and the supplementary argument which defines the unconstrained model, which is in this case heterosc = TRUE.sc.heter <-scoretest(ml.MC, heterosc = TRUE) sapply(list(wald = wd.heter,lh = lh.heter,score = sc.heter,lr = lr.heter),statpval) ## wald lh score lr ## stat 25.196 25.196 9.488 6.888 ## p-value 0.000 0.000 0.009 0.032 The homoscedasticity hypothesis is strongly rejected using the Wald test, but only a the 1 and 5% level for, respectively, the score and the likelihood ratio tests.JapaneseFDI Head and Mayer (2004) analyzed the choice of one of the 57 European regions belonging to 9 countries by Japenese firms to implement a new production unit.data("JapaneseFDI", package = "mlogit") jfdi <-mlogit.data(JapaneseFDI,chid.var= "firm", alt.var = "region", group.var= "country") Note that we've used an extra argument to mlogit.datacalled group.varwhich indicates the grouping variable, which will be used later to define easily the nests.There are two sets of covariates: the wage rate wage, the unemployment rate unemp, a dummy indicating that the region is eligible to European funds elig and the area area are observed at the regional level and are therefore relevant for the estimation of the lower model, whereas the social cotisation rate scrate and the corporate tax rate ctaxrate are observed at the country level and are therefore suitable for the upper model.
We first estimate a multinomial logit model: ml.fdi <-mlogit(choice ~log(wage) + unemp + elig + log(area) + scrate + ctaxrate | 0, data = jfdi) Note that, as the covariates are only alternative specific, the intercepts are not identified and therefore have been removed.We next estimate the lower model, which analyses the choice of a region within a given country.Therefore, for each choice situation, we estimate the choice of a region on the subset of regions of the country which has been chosen.Moreover, observations concerning Portugal and Ireland are removed as these two countries are mono-region.
lm.fdi <-mlogit(choice ~log(wage) + unemp + elig + log(area) | 0, data = jfdi, subset = country == choice.c& !country %in% c("PT", "IE")) We next use the fitted lower model in order to compute the inclusive value, at the country level: where B g is the set of regions for country g.When a grouping variable is provided in the mlogit.datafunction, inclusive values are by default computed for every group g (global inclusive values are obtained by setting the type argument to "global").By default, output is set to "chid" and the results is a vector (if type = "global") or a matrix (if type = "region") with row number equal to the number of choice situations.If output is set to "obs", a vector of length equal to the number of lines of the data in long format is returned.The following code indicates different ways to use the logsum function:   The results of the fitted models are presented in table 2 using the texreg package.
For the nested logit models, two tests are of particular interest: • the test of no nests, which means that all the nest elasticities are equal to 1, • the test of unique nest elasticities, which means that all the nest elasticities are equal to each other.
Once again, the three tests strongly reject the hypothesis.

The probabilities
For the standard logit model, the probability that individual i choose alternative j is: Suppose now that the coefficients are individual-specific.The probabilities are then: The mixed logit model consists on considering the β i 's as random draws from a distribution whose parameters are estimated.The probability that individual i choose alternative l, for a given value of β i is: To get the unconditional probability, we have to integrate out this conditional probability, using the density function of β.Suppose that V il = α + β i x il , i.e. there is only one individual-specific coefficient and that the density of β i is f (β, θ), θ being the vector of the parameters of the distribution of β.The unconditional probability is then: which is a one-dimensional integral that can be efficiently estimated by quadrature methods.If V il = β i x il where β i is a vector of length K and f (β, θ) is the joint density of the K individual-specific coefficients, the unconditional probability is: This is a K-dimensional integral which cannot easily be estimated by quadrature methods.The only practical method is then to use simulations.More precisely, R draws of the parameters are taken from the distribution of β, the probability is computed for every draw and the unconditional probability, which is the expected value of the conditional probabilities is estimated by the average of the R probabilities.

Individual parameters
The expected value of a random coefficient (E(β)) is simply estimated by the mean of the R draws on its distribution : β = R r=1 β r .Individual parameters are obtained by first computing the probabilities of the observed choice of i for every value of β r : where y ij is a dummy equal to one if i has chosen alternative j.The expected value of the parameter for an individual is then estimated by using these probabilities to weight the R β values: βi = r P ir β r r P ir

Panel data
If there are repeated observations for the same individuals, this longitudinal dimension of the data can be taken into account in the mixed logit model, assuming that the random parameters of individual i are the same for all his choice situations.Denoting y itl a dummy equal to 1 if i choose alternative l for the t th choice situation, the probability of the observed choice is: The joint probability for the T observations of individual i is then: and the log-likelihood is simply i ln P i .

Application
The random parameter logit model is estimated by providing a rpar argument to mlogit.This argument is a named vector, the names being the random coefficients and the values the name of the law of distribution.Currently, the normal ("n"), log-normal ("ln"), zero-censored normal ("cn"), uniform ("u") and triangular ("t") distributions are available.For these distributions, two parameters are estimated which are, for normal related distributions, the mean and the standard-deviation of the underlying normal distribution and for the uniform and triangular distribution, the mean and the half range of the distribution.For these last two distributions, zero-bounded variants are also provided ("zbt" and "zbu").These two distributions are defined by only one parameter (the mean) and their definition domain varies from 0 to twice the mean.
It's often the case that we are willing to impose that the distribution of a random parameter takes only positive or negative values.For example, the price coefficient should be negative for every individual.In this case, "zbt" and "zbu" can be used.The use of "ln" and "cn" can also be relevant but, in this case, if only negative values are expected, one should consider the distribution of the opposite of the random price coefficient.This can easily be done using the opposite argument of mlogit.data.For example, if opposite = c("price", "time") is used, mlogit.datareturns the opposite of the two variables, so that the corresponding coefficients should now be positive.
R is the number of draws, halton indicates whether halton draws (see Train 2003, chapter 9) should be used (NA indicates that default halton draws are used), panel is a boolean which indicates if one wishes to use the panel data version of the log-likelihood.
Correlations between random parameters can be introduced only for normal-related distributed random parameters, using the correlation argument.If TRUE, all the normalrelated random parameters are correlated.The correlation argument can also be a character vector indicating the random parameters that one wishes to be correlated.

Train
We first use the All the coefficients are highly significant and have the predicted positive sign (remind than an increase in the variable comfort implies using a less comfortable class).The coefficients can't be directly interpreted, but dividing them by the price coefficient, we get monetary values: coef(Train.ml)[-1]/ coef (Train.ml)We obtain the value of 26 euros for an hour of traveling, 5 euros for a change and 14 euros to travel in a more comfortable class.
We then estimate a model with three random parameters, time, change and comfort.We first estimate the uncorrelated mixed logit model: The summary method supplies the usual table of coefficients, and also some statistics about the random parameters.Random parameters may be extracted using the function rpar which take as first argument a mlogit object, as second argument par the parameter(s) to be extracted and as third argument norm the coefficient (if any) that should be used for normalization.This is usually the coefficient of the price (taken as a non random parameter), so that the effects can be interpreted as monetary values.This function returns a rpar object, and several methods/functions (summary, mean, med for the median and stdev for the standard deviation) are provided to describe it: time.value<-rpar (Train.mxlcAs the change attribute seems to be weakly correlated with the two other random parameters, the correlation can be restricted to the time and comfort attributes by filling the correlation argument with a character vector. Train.mxlc2 <-update (Train.mxlc, correlation = c("time", "comfort")) cor.mlogit (Train.mxlc2)## time comfort ## time 1.0000000 0.3909467 ## comfort 0.3909467 1.0000000 The presence of random coefficients and their correlation can be investigated using any of the three tests.Actually, three nested models can be considered, a model with no random effects, a model with random but uncorrelated effects and a model with random and correlated effects.We first present the three tests of no correlated random effects: For example, the probabilities of dying using the water taxi and the helicopter are respectively of 2.55 and 18.41 out of 100,000 trips.This feature enables the authors to estimate the value of a statistical life.For an individual i, the utility of choosing alternative j is: where p j is the probability of dying while using alternative j, c j and t j the monetary cost and the transport time of alternative j and w i the wage rate of individual i (which is supposed to be his valuation of transportation time).C ij = c j + w i t j is therefore the individual specific generalized cost for alternative j. β il and β ic are the (individual specific) marginal utility of surviving and of expense.The value of the statistical life (VSL) is then defined by: The two covariates of interest are cost (the generalized cost in $PPP) and risk (mortality per 100,000 trips).The risk variable being purely alternative specific, intercepts for the alternatives cannot therefore be estimated.To avoid endogeneity problems, the authors introduce as covariates marks the individuals gave to 5 attributes of the alternatives: comfort, noise level, crowdedness, convenience and transfer location.We first estimate a multinomial logit model.data("RiskyTransport", package = "mlogit") RT <-mlogit.data(RiskyTransport,shape = "long", choice = "choice", chid.var= "chid", alt.var = "mode", id.var = "id") ml.rt <-mlogit(choice ~cost + risk + seats + noise + crowdness + convloc + clientele | 0, data = RT, weights = weight) Note the use of the weights argument in order to set weights to the observations, as done in the original study.The ratio of the coefficients of risk and of cost is 9.84 (hundred of thousands of $), which means that the estimated value of the statistical life is a bit less than one million $.We next consider a mixed logit model.The coefficients of cost and risk are assumed to be random, following a zero-bounded triangular distribution.Individual-level parameters can be extracted using the fitted method, with the type argument set to parameters.indpar <-fitted(mx.rt,type = "parameters") We can then compute the VSL for every individual and analyse their distribution, using quantiles and plotting the empirical density of VSL for African and non-African travelers (as done in León and Miguel 2017, table 4, p.219 and figure 5, p.223).

Conclusions
mlogit estimates a large set of random utility models with a unified and friendly interface.Some of these models haven't been presented in this article, namely the rank-ordered logit model, the overlapping nested logit model, the paired combinatorial logit model and the multinomial probit model.Moreover, it provides usefull functions and methods which compute and return useful results, like predicted probabilities, inclusive values, marginal effects and consumer surplus.

4.
The random parameters (or mixed) logit model 4.1.Derivation of the model A mixed logit model or random parameters logit model is a logit model for which the parameters are assumed to vary from one individual to another.It is therefore a model that takes the heterogeneity of the population into account.

Table 1 :
Market shares

Table 2 :
Nested logit models for the choice by Japanese firms of a european region Train data set, previously coerced to a mlogit.dataobject called Tr.We first estimate the multinomial model: both alternatives being virtual train trips, it is relevant to use only generic coefficients and to remove the intercept: