Predicting Lessee Switch Behavior using Logit Models

Modeling the mode choice of an individual is a challenging task. In this paper, the vehicle choice of lessees is discussed. Vehicle choice is predicted by fitting three different logit models: standard, nested and cross-nested multinomial logistic regression. Both nested and cross-nested logit relax error term distribution assumptions and therefore allow for correlations across alternative vehicle choices. It is shown that allowing for correlation across alternatives is the proper way of modeling lessees’ vehicle choice. © 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the Conference Program Chairs.


Introduction
Modeling individuals' choices in selecting transportation modes has been a large area of research. However, predicting and analyzing individuals' evaluation of mode alternatives, and their corresponding decision of mode among a set of interrelated choices, remains complex. Discrete choice models are the typical family of models used to analyze and predict an individual's choice of one alternative from a set of mutually exclusive and collectively exhaustive alternatives [7]. These types of models are widely discussed in the literature, and rose to fame when Daniel McFadden won the Nobel Prize in economics for his development of theory and methods for analyzing discrete choice [8]. Discrete choice models have had considerable influence on the growth of the mode choice modeling field, by trying to accommodate both observed and unobserved effects on an individual's choice. In such models, it is assumed that an individual's preference for an alternative is captured by a value, called utility, and that the individual selects the alternative with the highest utility. Concurrently, the assumption is made that the analyst does not have complete information, and therefore a factor of uncertainty is considered [1]. Discrete choice models are widely used due to the extent of literature available, and the relative ease of interpretation of such models.

In this paper several models, all part of the discrete choice family, are applied to a specific use case related to mode choice. Multiple types of discrete choice models have been applied to mode choice analysis ([9], [2], [3]). This paper revolves around logit models, a branch of the discrete choice family. Logit models are well represented in the literature and are the most widely used models for mode choice ([4], [6], [5]). Even though multinomial logit choice models have historically been most prominent in the field of mode choice modeling, such models impose certain correlation assumptions. These assumptions might not prove accurate. The nested logit and cross-nested logit models discussed in this paper build on the multinomial logit model, and each relaxes these correlation assumptions.
The research in this paper is conducted for a leasing company with establishments throughout the world. The obtained insights should lead to improved alignment of separate entities within the company. For instance, buying, selling, and leasing of vehicles are all related, and should therefore be aligned to maximize profit. To understand customer behavior, and to provide tailored offers to these customers, it could prove greatly advantageous to model the switch behavior of the company's lessees. A switch can be thought of as the choice of vehicle, given that the customer has had a leasing contract with the company. That is to say, upon termination of the customer's current contract, the leasing company would like to predict the make of vehicle the customer will most likely lease next. Each constructed model should therefore predict which make of vehicle a lessee is most likely to lease after termination of the current contract. It is expected that the more assumptions on relations across alternatives are relaxed, the better a model performs in predicting these switches.

Data
The provided data consist of 69,952 matched contracts or switches occurring between January 2014 and March 2018. One data point depicts two contracts, one for the previous vehicle and the second for the current vehicle. In addition, both contracts contain vehicle and contract characteristics that are used as modeling variables. The provided contracts originate from ten different branches of the company, with each branch depicting a different country. The company considers each client to belong to a particular client segment: Corporate, International, Private or Small Enterprise (SME). By definition, it is not possible to switch client segments. In addition, each vehicle of the company's fleet is placed within different segments. To be more precise, all vehicles are associated with the following segments: brand classification, vehicle segment, OEM group, make and model. All these labels are predefined by the leasing company and are stated within each provided contract. Note that most segments depend on the higher-level segment. That is, if the variable model is known, one knows the variables make, OEM group, vehicle segment and brand classification by definition. Note that a vehicle can be present in multiple client segments and countries. Additionally, if only the variable make is known, one knows the variable OEM group by definition. Knowing the make does not necessarily imply that the vehicle segment is known. Since vehicles are classified into vehicle segments based on the model of vehicle, it is possible for makes to belong to multiple vehicle segments. This property proves extremely convenient when subdividing makes into nests for nested logit.
Aside from variables segmenting the vehicles of the fleet, each contract provides the following information: customer ID, vehicle ID, fuel type, vehicle type, body style, lease type, catalogue price, commercial discount amount, standard discount percentage, total accessories amount, total options amount, ufwt (unfair wear and tear) amount, start mileage, end mileage, contract mileage, intro date model, end date model, sale date, sale amount, termination info, start date contract, end date contract and contract duration. The variables mileage per month and switch quarter are extracted by the analyst. Not all variables are used for prediction purposes. Some variables are either too highly correlated, or variables are omitted due to a lack of descriptive quality.
The pre-processing of data can be separated into two parts: processing of numerical variables, and processing of categorical variables. First, processing of numerical variables is discussed. All missing values of numerical variables contained in contracts are replaced with the mean value of the concerned variable after grouping by the variables country, client segment and make. To illustrate, if the variable catalogue price is missing for a Volkswagen Golf of a driver from the SME segment of the Spanish branch, it is replaced with the mean catalogue price within that combination of country, client segment and make. In addition, outliers are set to either the determined lower or upper boundary of the concerned variable.
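The imputation and outlier handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure as implemented for the study: the column names (`country`, `client_segment`, `make`, `catalogue_price`) and the helper names are hypothetical, and the lower and upper boundaries are taken as given because the text does not state how they were determined.

```python
from collections import defaultdict
from statistics import mean

GROUP_KEYS = ("country", "client_segment", "make")  # hypothetical column names


def impute_group_mean(rows, column, group_keys=GROUP_KEYS):
    """Replace missing (None) values of `column` by the mean of the row's group."""
    observed = defaultdict(list)
    for row in rows:
        if row[column] is not None:
            observed[tuple(row[k] for k in group_keys)].append(row[column])

    filled = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left untouched
        key = tuple(row[k] for k in group_keys)
        if new_row[column] is None and observed[key]:
            new_row[column] = mean(observed[key])
        filled.append(new_row)
    return filled


def clip_to_bounds(value, lower, upper):
    """Set an outlier to the nearest boundary; values inside the range are unchanged."""
    return max(lower, min(upper, value))
```

Rows whose group has no observed value at all are left as missing in this sketch; the text does not say how such cases were treated.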
To use numerical variables for prediction, these need to be scaled. Prior to this procedure, all variables depicting a monetary value (catalogue price, commercial discount amount, total accessories amount, total options amount, and ufwt amount) are transformed by taking the natural logarithm of the original value. Zero values, for which the natural logarithm is not defined, are replaced with the minimum value for which it is defined. The idea behind the logarithmic transformation is to push the variable towards being normally distributed. After the logarithmic transformation, the monetary variables are treated as any other numerical variable. Next, all numerical variables are standardized to have zero mean and a standard deviation of 1. This transformation prevents functions present in discrete choice models from saturating.
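A minimal sketch of the two scaling steps, under two assumptions that the text leaves open: zeros are replaced by the smallest positive value observed in the same variable, and the population standard deviation is used for standardization. The function names are illustrative only.

```python
import math
from statistics import mean, pstdev


def log_transform(values):
    """Natural log of each value; zeros are first replaced by the smallest
    positive value in the column (an assumed reading of the text)."""
    smallest_positive = min(v for v in values if v > 0)
    return [math.log(v if v > 0 else smallest_positive) for v in values]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Example with made-up monetary amounts: log first, then standardize.
scaled = standardize(log_transform([0.0, 100.0, 1000.0, 10000.0]))
```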
Processing of categorical variables occurs in a slightly different fashion. No missing values occur in the data. The provided data do contain values such as 'unknown', or 'country did not supply a value'. These values rarely occur and therefore do not form a significant problem. States of categorical variables that rarely occur are either set to the state other, or merged with an already-existing state. For instance, the variable client segment is reduced to three states, since Private is merged with SME.
Lastly, the provided vehicle leasing data are split into training and test data. These data are identical for all models used in this study. The most recent 10% of the data are used as test data, where recency is based on the date a switch occurred. Holding out the most recent data allows models to be evaluated on data that are seen as the most representative of the current situation. Prior to splitting data into train and test data, all data are shuffled to avoid imbalanced data.
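The hold-out of the most recent 10% can be sketched as below. The sketch covers only the chronological split; how the shuffle mentioned in the text interacts with it is not specified, so it is left out. The key name `switch_date` is hypothetical, and any date representation that sorts chronologically (for example ISO-formatted strings or `datetime.date` objects) is assumed.

```python
def chronological_split(rows, date_key="switch_date", test_fraction=0.10):
    """Return (train, test), where test holds the most recent share of rows."""
    ordered = sorted(rows, key=lambda row: row[date_key])
    n_test = round(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]
```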

Choice modeling on Vehicle Leasing Data
This section describes the reasoning behind the selected nests for the nested and cross-nested models. No nests are required for a standard multinomial logistic regression model; fitting such a model on the vehicle leasing data is straightforward, and the results of the multinomial logit model are used as starting values for both the nested and cross-nested models. Besides pointing the maximization procedure in the right direction, this procedure significantly reduces computation time.
All three models use the same explanatory variables, which can be classified as numerical and categorical variables. The numerical variables consist of the variables catalogue price, commercial discount amount, standard discount percentage, total accessories amount, ufwt amount, mileage per month and contract duration. The remaining variables consist of the categorical variables country, client segment, fuel type, vehicle type, vehicle segment, body style, lease type, switch quarter and make. Regarding categorical variables, rather than estimating one β per state of the variable, one β per state of the variable per alternative is estimated. Namely, for state Diesel of the variable fuel type, one β per alternative is estimated (β_{Diesel-Audi}, β_{Diesel-BMW}, etc.).
One of the βs per categorical variable is held fixed; all other βs are estimated with respect to the fixed β. In addition, a categorical variable with n unique states produces n − 1 (times the number of unique alternatives) different βs to be estimated, since the nth state is a perfect linear combination of the previous n − 1 states. Additionally, one β per alternative is estimated for each numerical variable (β_{catalogue price-Audi}, β_{catalogue price-BMW}, etc.). The multinomial logistic regression uses all these explanatory variables to analyze and predict.
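To make the role of the alternative-specific βs concrete, the following is a minimal sketch of how they translate into choice probabilities in a standard multinomial logit. It assumes a dummy-coded feature dictionary; the feature names and coefficient values are hypothetical and are not estimates from this study, and the alternative with all coefficients at zero plays the role of the fixed reference.

```python
import math


def mnl_probabilities(features, betas):
    """features: {feature name: value} for one lessee.
    betas: {alternative: {feature name: coefficient}}.
    Returns {alternative: choice probability} under a multinomial logit."""
    utilities = {
        alt: sum(coefs.get(name, 0.0) * x for name, x in features.items())
        for alt, coefs in betas.items()
    }
    max_u = max(utilities.values())  # subtract the maximum for numerical stability
    exp_u = {alt: math.exp(u - max_u) for alt, u in utilities.items()}
    total = sum(exp_u.values())
    return {alt: e / total for alt, e in exp_u.items()}


# Hypothetical example: one numerical variable and one dummy for fuel type.
example_features = {"catalogue_price": 0.3, "fuel_type=Diesel": 1.0}
example_betas = {
    "Audi": {},  # reference alternative, coefficients fixed at zero
    "BMW": {"catalogue_price": 0.4, "fuel_type=Diesel": 0.2},
    "Renault": {"catalogue_price": -0.5, "fuel_type=Diesel": 0.1},
}
example_probs = mnl_probabilities(example_features, example_betas)
```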
The independence of irrelevant alternatives (IIA) assumption does not allow for correlation across alternatives. However, one can imagine that, for instance, adding a Fiat 500 to the choice set should not change the decision maker's choice when they are looking for an SUV type of vehicle. To allow for correlations across alternatives, the alternatives are divided into nests. This division proves relatively straightforward due to the vehicle segmentation provided by the leasing company. The leasing company assigns each model of vehicle to one particular vehicle segment. Note, however, that a make of vehicle can belong to multiple vehicle segments, since a make consists of multiple models. The exact distribution of the target variable new make over the variable new vehicle segment is shown in Table 1.
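As an illustration of how nests introduce correlation, the sketch below computes nested logit choice probabilities. It assumes a parameterisation with one nest parameter μ_k ≥ 1 per nest and a top-level scale of one, in which μ_k = 1 for all nests reproduces the standard multinomial logit. The utilities and the assignment of makes to nests in the example are hypothetical and do not reproduce the nesting structure of this study.

```python
import math


def nested_logit_probabilities(utilities, nests, nest_params):
    """utilities: {alternative: systematic utility V_i}.
    nests: {nest: [alternatives]}, each alternative in exactly one nest.
    nest_params: {nest: mu_k}, assumed >= 1.
    Returns {alternative: P(nest) * P(alternative | nest)}."""
    nest_sums = {
        nest: sum(math.exp(nest_params[nest] * utilities[a]) for a in alts)
        for nest, alts in nests.items()
    }
    # Inclusive value of each nest: (1 / mu_k) * log(sum_j exp(mu_k * V_j)).
    inclusive = {
        nest: math.log(total) / nest_params[nest] for nest, total in nest_sums.items()
    }
    denom = sum(math.exp(iv) for iv in inclusive.values())

    probs = {}
    for nest, alts in nests.items():
        p_nest = math.exp(inclusive[nest]) / denom
        for a in alts:
            p_within = math.exp(nest_params[nest] * utilities[a]) / nest_sums[nest]
            probs[a] = p_nest * p_within
    return probs


# Hypothetical utilities and nest assignment, for illustration only.
example_utilities = {"Audi": 0.2, "BMW": 0.1, "Renault": -0.3, "Fiat": -0.1}
example_nests = {"nest_a": ["Audi", "BMW"], "nest_b": ["Renault", "Fiat"]}
example_nl = nested_logit_probabilities(
    example_utilities, example_nests, {"nest_a": 1.5, "nest_b": 1.2}
)
```

Setting every μ_k to one in this sketch gives the same probabilities as a plain softmax over the utilities, which is a convenient sanity check.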
From Table 1, it becomes clear that all makes belong to multiple vehicle segments. For the nested multinomial logistic regression model, alternatives are restricted to belong to only one nest. To determine the nesting structure of this model, each alternative is assigned to the vehicle segment in which it occurs most often. A schematic representation of the nesting structure of the model is shown in Figure 1. Observe that only four unique vehicle segments are assigned to be a nest: C, D, LCV and SUV. Each nest contains at least two alternatives.
Regarding cross-nested multinomial logistic regression, the restriction that each alternative belongs to one nest is dropped: each alternative is allowed to be contained in multiple nests. The model estimates the allocation parameters α, indicating the degree to which an alternative belongs to a particular nest. Using Table 1, we determined the nests each alternative belongs to and the starting values of the corresponding allocation parameters α. First, each entry of this table is divided by the total number of occurrences of the alternative in the data. This results in a table denoting, for each alternative, the fraction of its occurrences that fall in a particular vehicle segment. These fractions function as starting values for the allocation parameters α in the cross-nested multinomial logistic regression model. All fractions less than 0.10 are set to zero. For interpretability reasons, the allocation parameters are usually scaled such that Σ_k α_ik = 1 ∀i, with i denoting the alternative and k the nest. Therefore, all fractions less than 0.10 but greater than zero are added to other α of the same alternative. Since these α solely function as starting values for the cross-nested model, this should not pose a problem.
Figure 2 depicts a schematic overview of the cross-nested structure.
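The construction of starting values for the allocation parameters can be sketched as follows. The text does not state how the mass of the dropped fractions is redistributed, so this sketch assumes it is spread proportionally over the remaining nests by renormalizing; the counts in the example are made up and are not taken from Table 1.

```python
def allocation_start_values(counts, threshold=0.10):
    """counts: {nest: number of times one alternative occurs in that vehicle segment}.
    Returns {nest: starting value of alpha}, summing to one over the kept nests."""
    total = sum(counts.values())
    fractions = {nest: c / total for nest, c in counts.items()}
    kept = {nest: f for nest, f in fractions.items() if f >= threshold}
    if not kept:
        raise ValueError("no nest reaches the threshold for this alternative")
    kept_total = sum(kept.values())
    # Assumption: dropped mass is redistributed proportionally (renormalization).
    return {nest: f / kept_total for nest, f in kept.items()}


# Made-up counts for a single alternative across three vehicle segments.
example_alpha = allocation_start_values({"C": 50, "D": 45, "SUV": 5})
```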

Results
This section describes the obtained results by all three discussed logit models. Recall that all models utilize the same data, and that estimates of the multinomial logit model are used as starting values for both the nested and cross-nested models.
Table 2 reports the results for all discrete choice models. The leftmost column states the performance measures, whereas the remaining columns give the values of these measures for the standard, nested, and cross-nested multinomial logistic regression models, respectively. The measures ρ² and ρ̄² are the likelihood ratio index and the adjusted likelihood ratio index, respectively. The latter is a slight adjustment of the former that takes into account the number of estimated parameters K. The cross-nested model achieves the best results on all performance measures. Comparing the three models, the more restrictive assumptions are relaxed, the better the model fits the training data. Since the performance of all three models is roughly similar, it needs to be checked whether dropping the IIA assumption is relevant. We do this by examining the significance of the nest coefficients.
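For reference, the two fit measures can be computed as in the sketch below, which uses their common textbook definitions. The text does not state which reference log-likelihood was used; the sketch assumes the log-likelihood of a model with all parameters set to zero. The numbers in the example are made up and are not results from Table 2.

```python
def likelihood_ratio_index(ll_model, ll_null):
    """rho^2 = 1 - LL(model) / LL(null), with both log-likelihoods negative."""
    return 1.0 - ll_model / ll_null


def adjusted_likelihood_ratio_index(ll_model, ll_null, n_params):
    """Adjusted rho^2 = 1 - (LL(model) - K) / LL(null), penalizing K parameters."""
    return 1.0 - (ll_model - n_params) / ll_null


# Made-up log-likelihood values, for illustration only.
example_rho2 = likelihood_ratio_index(-600.0, -1000.0)
example_adj_rho2 = adjusted_likelihood_ratio_index(-600.0, -1000.0, 20)
```

Because the penalty K enters the numerator, the adjusted index is never larger than the unadjusted one for the same model.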
To do so, consider the estimates of the nest parameters stated in Table 3. The nested model collapses to a standard multinomial logistic regression model if all nest parameters are equal to one, λ_k = 1 ∀k, so the relevant question is whether the estimates differ from one. Both nest D and nest SUV are significant irrespective of the maintained significance level. Nest C is deemed appropriate depending on the maintained significance level. Albeit slightly, all estimated nest parameters are greater than one, hence validating the relaxation of IIA. Further relaxation of assumptions leads to the cross-nested model, of which the nest parameters are displayed in Table 4. Interestingly, all nest parameters are significant and all estimates of these parameters differ significantly from one, hence relaxing assumptions is again validated. Observe that nest MPV has the highest estimated parameter value. The higher the nest parameter estimate, the more strongly the alternatives within the nest are correlated with each other, rather than with alternatives outside the nest.
Table 4. Relevance of the nest parameters of the cross-nested logit model. No statistics are displayed for nest LCV: this parameter was initially estimated with infinite standard error, and on the second run it was therefore held fixed at the estimated value of the first run.
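A nest parameter can be checked against one with a simple t-statistic, as in the sketch below. This is an illustration under assumptions: a normal approximation is used for the p-value, and the estimate and standard error in the example are made up rather than taken from Table 3 or Table 4.

```python
import math


def t_stat_against_one(estimate, std_err):
    """t-statistic for the null hypothesis that the nest parameter equals one."""
    return (estimate - 1.0) / std_err


def two_sided_p_value(t_stat):
    """Two-sided p-value under a standard normal approximation:
    P(|Z| > |t|) = erfc(|t| / sqrt(2))."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


# Made-up estimate and standard error, for illustration only.
example_t = t_stat_against_one(1.2, 0.05)
example_p = two_sided_p_value(example_t)
```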

Even though all nest parameters of the cross-nested model are deemed significant and relevant, Table 5 shows that, for some allocation parameters, inclusion of the corresponding alternative in the concerned nest is not strongly supported by the given data. In other words, some allocation parameters are deemed insignificant. The third column of this table states the starting value of the respective allocation parameters. Recall that these starting values are solely based on the nesting structure provided by the vehicle leasing company. Note that most estimated values do not differ much from the provided starting value of the parameter. Interestingly enough, estimates of allocation parameters that differ much from the corresponding starting value are typically significant. Lastly, note that for each of the alternatives at least one of the allocation parameters is significant, indicating that inclusion in at least one of the nests is supported by the data.
Next, the relevance and influence of the explanatory variables is discussed. Note that insignificant parameters are also displayed. These parameters serve two purposes. Firstly, insignificance of parameters could serve an explanatory purpose regarding inclusion in models. Secondly, exclusion of these parameters causes both the final log-likelihood of models and the achieved accuracy on test data to worsen. Therefore, all insignificant estimates were maintained and used for prediction. Table 6 portrays the estimates for the parameters of the variable make. Per state of make, only the two parameters with the highest estimated value are shown. Note that for nearly all these βs, the one indicating make loyalty has the highest value. Only for the parameters β_{makeBMW-BMW} and β_{makeRenault-Renault} is the loyalty estimate not the highest for that particular state of the variable. In addition, note that all these estimates are significant. This table portrays the estimates of the multinomial logistic regression model. Similar phenomena are observed for both the nested and cross-nested models.

Discussion
Logit models have historically proven successful for analyzing and predicting mode choice. This paper presented a mode choice case study by which performance of several logit models, each relaxing correlation assumptions more, was compared. It was expected that the models in which correlation assumptions were relaxed would perform best. The presented results did indeed indicate better performance of such models. More precisely, these models fit training data better and generalize better on unseen data, obtaining higher accuracy on test data.
The standard multinomial logit model was estimated first. Due to its guarantee of convergence, such a model proves extremely convenient as a starting point when relaxing assumptions on the distribution of error terms. When defining a nesting structure, the estimated parameters of the standard model were taken as starting values for the parameters of the nested models. The presented results indicate that the nesting structure defined by the leasing company proves accurate, with almost all estimated nest parameters being significant. Relaxing assumptions on the error term distribution does indeed improve the performance of models, albeit slightly. From the estimation results it can be concluded that correlations across alternatives indeed exist, hence assigning each make to one or multiple nests is justified.
Table 5. Statistics on the allocation parameters α. In α_ik, i denotes the alternative and k the corresponding nest. All values for which statistics are not displayed were fixed at run-time: these parameters were at first estimated with infinite standard error, and on the second run they were fixed at the output value of the first run.
In conclusion, relaxing assumptions on the error term distribution allows for better capturing of correlations across alternatives. For the leasing company to use the models discussed in this paper, the company should keep track of the drivers of their vehicles. Doing so will allow for accurate matching of contracts, whilst concurrently enhancing predictive power by addition of explanatory variables.