Joint prediction of travel mode choice and purpose from travel surveys: A multitask deep learning approach

The prediction and behavioural analysis of travel mode choice and purpose are critical for transport planning and have attracted increasing interest in research. Traditionally, the prediction of travel mode choice and trip purpose has been tackled separately, which fails to fully leverage the information shared between travel mode and purpose. This study addresses this gap by proposing a multitask learning deep neural network framework (MTLDNN) to jointly predict mode choice and purpose. We empirically evaluate and validate this framework using the household travel survey data in Greater London, UK. The results show that this framework has significantly lower cross-entropy loss than multinomial logit models (MNL) and single-task-learning deep neural network models (STLDNN). On the other hand, the predictive accuracy of MTLDNN is similar to that of STLDNN and significantly higher than that of MNL. Moreover, in terms of behaviour analysis, the substitution pattern and choice probability of MTLDNN regarding input variables largely agree with those of MNL and STLDNN. This work demonstrates that MTLDNN is efficient in utilising the information shared by travel mode choice and purpose, and is capable of producing behaviourally reasonable substitution patterns across travel modes. Future research could develop more advanced MTLDNN frameworks for travel behaviour analysis and generalise MTLDNN to other travel behaviour topics.


Introduction
The analysis and prediction of travel mode choice are of pronounced importance for transport planning and travel demand forecasting. Traditionally, the predominant approach to modelling travel mode choice has been discrete choice models (DCMs), which rely on predefined utility specifications for each alternative in the choice set (Domencich and McFadden, 1975). Recently, researchers have been increasingly interested in applying machine learning (ML) techniques such as deep neural networks (DNN) to analyse travel mode choices. Rather than relying on predefined utility functions and pre-selected input variables, ML methods are capable of automatically identifying the non-linear relationship between input features and mode choice. Therefore, ML methods have demonstrated considerable predictive power and interpretability in mode choice analysis.
On the other hand, trip purpose prediction has also received increasing attention in transport planning and research. Trip purpose refers to why people travel, such as education, work, recreation, and business. The methods for predicting trip purpose are classified into three categories, namely rule-based models (using predefined heuristic rules), statistical methods (using logistic models), and ML methods (Gong et al., 2014). Similar to mode choice analysis, ML methods have been shown to outperform rule-based and statistical methods in terms of predictive accuracy for trip purposes.
Until now, the prediction of travel mode choice and the prediction of trip purpose have been two separate research fields, and no previous attempt has been made to predict these two trip characteristics simultaneously. The joint prediction of travel mode and trip purpose is made possible by two factors. First, empirical studies show a considerable dependence between travel mode ("how people travel") and trip purpose ("why people travel"). Second, the recent development in multitask learning (MTL) and DNNs provides an opportunity to address the problem of jointly predicting travel mode and trip purpose, as an alternative to the separate prediction of mode or trip purpose.
This study proposes a framework that uses multitask learning deep neural networks (MTLDNNs) to analyse and predict travel mode and trip purpose based on travel surveys. This framework starts with several shared hidden layers that capture the information shared between the two tasks and ends with some task-specific layers that model each task, as shown in Fig. 1. We first describe the structure and components of this framework and then apply it to a travel survey dataset collected in London, England, which is a household survey designed to monitor long-term trends in personal travel and to inform the development of transport policy. In the experiments, we demonstrate the predictive performance of this MTLDNN framework in comparison with the classical DNN and multinomial logit models. To demonstrate the behavioural interpretability of this framework, we illustrate the relationship between choice probabilities and key input variables and the substitution patterns between mode alternatives. Overall, this study shows that the MTLDNN framework is appropriate for jointly analysing and predicting travel mode and trip purpose due to its theoretical flexibility and predictive performance.
The paper is organised as follows. Section 2 reviews the dependence between travel mode and trip purpose, as well as the existing studies that use multitask learning for travel behaviour and choice models. Section 3 introduces the test of independence between mode and purpose and the MTLDNN for analysing mode and purpose. Section 4 presents data and experiment settings, and then Section 5 discusses model performance and the behavioural information in MTLDNN. Finally, Section 6 summarises the key findings and proposes future research directions.

Dependence between travel mode and trip purpose
Travel mode and trip purpose are among the most crucial characteristics for travel surveys and transport behaviour research. Empirical studies have shown that these two attributes are correlated and dependent on each other and that one attribute plays an important role in modelling and predicting the other.
Multiple studies have adopted travel mode as one of the input variables for predicting trip purpose and reported that travel mode has a significant effect on the prediction of trip purpose. In the work by Ermagun et al. (2017), travel modes are statistically significant variables in the nested logit model for predicting trip purpose, and the mode of "Bus" is within the top ten most important variables in the random forest model for predicting trip purpose. Likewise, travel mode was the second most important variable in a trip-purpose random forest model developed for large-scale GPS-based travel surveys (Yazdizadeh et al., 2019). On the other hand, a body of research has confirmed the feasibility and importance of using trip purpose for accurate prediction of travel mode in travel surveys. Cheng et al. (2019) designed a random forest approach to predicting travel mode choice, in which trip purpose ranked fifth in variable importance among the 20 input variables.
Although using trip purpose to predict travel mode (or the other way round) leads to improved predictive accuracy, this approach requires the presence of one attribute and hence does not apply to datasets where both attributes are unavailable. Unlike other trip information (e.g., distance, travel time) that can be passively collected by smartphones, travel mode and purpose each require user input and are expensive to collect, especially at a large scale. For these reasons, it is of crucial importance to jointly predict mode and purpose based on the shared information between them.

Discrete choice and machine learning methods for travel behaviour analysis
For decades, DCMs have been used to model and examine individual decision making in transportation, including travel modes, trip purposes, travel scheduling, and travel routes, among others (Annaswamy et al., 2018; Ben-Akiva et al., 1996; Cantarella and de Luca, 2005; de Dios Ortúzar and Willumsen, 2011). DCMs consist of a wide range of models, including Multinomial Logit (MNL), Nested Logit (NL), Cross-Nested Logit (CNL), and Mixed Logit (MXL) models (Ben-Akiva and Lerman, 2018). These models have been widely used in travel behaviour research as they have clear mathematical structures and can provide economic information about travel behaviours. In turn, the economic information extracted from DCMs can provide insights to guide transport policies. For instance, DCMs can derive the market shares of different modes, which reveal the popularity of mode alternatives. Moreover, the substitution pattern of mode alternatives uncovers how the choice probability and market share vary with input variables (e.g. travel distance). However, each DCM is based on model assumptions, such as the independence of irrelevant alternatives for the MNL model. If these assumptions are violated, the parameter inference and model prediction will be biased and inaccurate. These assumptions limit the applicability of DCMs, especially when dealing with panel data.
ML algorithms have been adopted and applied in different fields of quantitative travel behaviour studies, including car ownership prediction (Paredes et al., 2017), license plate recognition (Li et al., 2019), and traffic flow prediction (Ren et al., 2020). Notably, ML models have been used as an alternative to the traditional DCMs for modelling and predicting travel mode (Wang et al., 2020a) and trip purposes (Ermagun et al., 2017). The theoretical foundation of this line of research is that the modelling of travel mode or purpose can be considered as a general classification problem, which can then be addressed by ML classification algorithms (or ML classifiers). The main advantage of ML classifiers is that they are more flexible than DCMs due to fewer model assumptions, which leads to a higher predictive accuracy. Moreover, compared with DCMs, machine learning models have more complicated model structures, which makes it possible to model non-linear relationships between variables. The ML classifiers that are commonly used for travel mode or purpose prediction include support vector machines, classification trees, random forests, and DNNs. Notably, comparative studies consistently report that DNNs achieve a higher predictive accuracy for travel mode and purposes than DCMs (Xia et al., 2023; Zhao et al., 2020).
A major limitation of ML models for travel behaviour research (and also other fields) is that, compared with DCMs, these models are not readily interpretable and it is challenging to extract reliable economic information from them. Wang et al. (2020a) demonstrate the feasibility of generating a wide range of economic information regarding travel mode choice from DNN models. While the economic information extracted from DNNs is mostly reasonable and consistent with DCMs, some of the information is unreliable, which is caused by three inherent issues of DNNs: high sensitivity to model hyperparameters, model nonidentification, and local irregularity. Therefore, challenges still exist in terms of extracting reliable economic information from DNNs for travel behaviours.

Multitask learning for travel behaviour research
As discussed above, most ML classifiers for travel behaviour analysis are designed for a single task, such as estimating travel mode or trip purpose. In contrast, MTL is a mechanism that improves the generalisation of ML models on a task by sharing the information and representations between related tasks (Caruana, 1997). To achieve this, MTLDNN trains tasks in parallel whilst using a shared representation. In comparison with traditional single-task ML, MTLDNN can achieve a better prediction performance on different tasks without the loss of model interpretability, due to the shared information across different tasks. In addition, MTLDNN can greatly reduce the risk of overfitting on one task, as the model has to seek a representation that simultaneously captures all of the given tasks (Ruder, 2017). A summary of the advantages and disadvantages of three types of methods (DCM, single-task ML, and MTLDNN) is provided in Table 1.
MTLDNN has been widely and successfully applied to related tasks in different domains. In natural language processing, MTLDNN has been used to simultaneously predict semantic components of different levels, including part-of-speech tags, chunks, and named entity tags (Collobert and Weston, 2008; Hashimoto et al., 2016). In image recognition, MTLDNN has displayed remarkable performance in two pairs of tasks: semantic segmentation and surface normal prediction, and object detection and attribute prediction (Misra et al., 2016). In urban analytics, MTLDNN has been utilised to learn individual geodemographic attributes (including age, gender, income level, and car ownership) from public transport travel patterns (Zhang et al., 2020).
MTLDNN shares similar ideas with the simultaneous estimation of choice models in travel behaviour research. Specifically, MTLDNN has the potential to jointly model and estimate household car ownership and vehicle kilometres travelled (Zegras, 2010), auto ownership and mode choice (Train, 1980), and mode choice and psychological or attitudinal factors (Lyon, 1984; Morikawa et al., 2002). However, the use of MTLDNN for choice models is very limited. The only exception is that researchers used MTLDNN to jointly model revealed and stated preferences in travel surveys (Wang et al., 2020b).
In recent years, various MTLDNN architectures have been developed. The first MTLDNN was proposed by Caruana (1997), which contains shared hidden layers between all tasks and task-specific layers. The MTLDNN architecture was then improved by designing varying regularisation mechanisms and adding network components that control the differences and similarities of tasks (Argyriou et al., 2007; Evgeniou, 2005; Long et al., 2015; Misra et al., 2016; Ruder et al., 2019; Yang and Hospedales, 2019). A comprehensive overview of the history and development of MTL and MTLDNN is provided by Ruder (2017). Although more advanced MTLDNN architectures exist, this study uses the classical MTLDNN architecture, as it is straightforward and efficient for the tasks of modelling and predicting mode and purpose.

Test of independence between mode and purpose
This study uses Pearson's chi-square test to test the independence between mode and purpose choices in travel survey data. Pearson's chi-square test is a hypothesis testing method that tests the independence between multiple categorical variables. This test is based on a contingency table between these variables, which displays their multivariate frequency distribution (see Table 2 for an example). Assuming that the dimensions of mode and purpose choices are $K_m$ and $K_p$, and $O_{ij}$ is the co-occurrence frequency of mode $i$ and purpose $j$, the chi-square test statistic is calculated as:

$$\chi^2 = \sum_{i=1}^{K_m} \sum_{j=1}^{K_p} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \tag{1}$$

where $E_{ij}$ is the expected frequency of mode $i$ and purpose $j$ under the null hypothesis of independence, i.e. the product of the corresponding row and column totals divided by the total number of observations.
The p-value is then calculated by comparing the above test statistic with the $\chi^2$ distribution with $(K_m - 1)(K_p - 1)$ degrees of freedom. By comparing the p-value with the predefined significance level, one can decide whether the null hypothesis of no relationship should be rejected. In this study, the selected significance level is 0.01, and a p-value smaller than 0.01 indicates a statistically significant dependence between travel mode and purpose.

Table 3
The selected and fitted models for comparison.
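The independence test described in this section can be sketched in a few lines of plain Python. This is a minimal illustration rather than the code used in the study: the function name `chi_square_independence` and the toy counts are assumptions made for the example, and the p-value step is left to a statistics library (for instance `scipy.stats.chi2.sf(statistic, dof)`).

```python
def chi_square_independence(observed):
    """Return (statistic, degrees_of_freedom, expected) for a contingency table.

    `observed` is a list of rows; rows index modes and columns index purposes.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Toy table with 2 modes x 2 purposes (illustrative counts, not survey data).
toy_counts = [[30, 10], [10, 30]]
stat, dof, expected = chi_square_independence(toy_counts)
print(stat, dof)
```

For the toy table every expected count is 20, so the statistic is 20 with one degree of freedom, which exceeds the 0.01 critical value of 6.635 and would lead to rejecting independence.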

Multitask learning neural network for mode and purpose
The MTLDNN for analysing mode and purpose is described as follows. We let $x_i \in \mathbb{R}^d$ denote the input variables for mode and purpose, where $i \in \{1, 2, \cdots, N\}$ are the indices of observations and $d$ represents the input dimension. The output choices of mode and purpose are denoted by $y_{m,i}$ and $y_{p,i}$, where $m$ and $p$ stand for mode and purpose, respectively; $y_{m,i} \in \{0, 1\}^{K_m}$ and $y_{p,i} \in \{0, 1\}^{K_p}$, where $K_m$ and $K_p$ are the dimensions of mode and purpose choices, respectively. Both $y_{m,i}$ and $y_{p,i}$ are binary vectors. Due to the constraint that exactly one alternative of mode (or purpose) is true, $\sum_{k=1}^{K_m} y_{m,i}[k] = 1$ and $\sum_{k=1}^{K_p} y_{p,i}[k] = 1$. As represented by Fig. 1, the feature transformation of mode and purpose can be represented as:

$$V_{m,i} = (g_m^{M_2} \circ \cdots \circ g_m^{1}) \circ (g_0^{M_1} \circ \cdots \circ g_0^{1})(x_i) \tag{2}$$

$$V_{p,i} = (g_p^{M_2} \circ \cdots \circ g_p^{1}) \circ (g_0^{M_1} \circ \cdots \circ g_0^{1})(x_i) \tag{3}$$

where $M_1$ denotes the depth of shared layers; $M_2$ denotes the depth of task-specific (or non-shared) layers for each task; $g_0$ represents the transformation of one shared layer; $g_m$ and $g_p$ represent the transformation of one layer in mode and purpose, respectively. Specifically, except for the output layer, the transformation functions (including $g_m$, $g_p$, and $g_0$) comprise ReLU and linear transformation:

$$g^{l}(x) = \mathrm{ReLU}(W^{l} x + b^{l})$$

where $W^{l}$ and $b^{l}$ are the weights and biases of layer $l$. Overall, Equations (2) and (3) depict the multitask learning deep neural network (MTLDNN) architecture shown in Fig. 1: $(g_0^{M_1} \circ \cdots \circ g_0^{1})$ represents the shared layers, while $(g_m^{M_2} \circ \cdots \circ g_m^{1})$ and $(g_p^{M_2} \circ \cdots \circ g_p^{1})$ represent the task-specific layers for mode analysis and purpose analysis, respectively. The choice probability functions in mode and purpose are calculated by a standard softmax activation function, which is commonly used in multi-class classification, as follows:

$$P(y_{m,i}[k] = 1 \mid x_i; w_m, w_0) = \frac{e^{V_{m,i}[k]}}{\sum_{j=1}^{K_m} e^{V_{m,i}[j]}} \tag{4}$$

$$P(y_{p,i}[k] = 1 \mid x_i; w_p, w_0) = \frac{e^{V_{p,i}[k]}}{\sum_{j=1}^{K_p} e^{V_{p,i}[j]}} \tag{5}$$

in which $w_m$ and $w_p$ represent the task-specific parameters in $g_m$ and $g_p$, and $w_0$ represents the shared parameters in $g_0$. Equation (4) computes the choice probability for each of the $K_m$ mode alternatives, and Equation (5) computes the probability of each of the $K_p$ trip purposes, given the input data.
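To make the architecture concrete, the forward pass of shared layers followed by two task-specific heads and softmax outputs can be sketched with NumPy. This is an illustrative sketch only: the layer widths, the values $d = 8$, $K_m = 5$, $K_p = 7$, the random initialisation, and all function names are assumptions chosen for the example, not the configuration used in the study.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(v):
    # Subtract the row-wise maximum for numerical stability.
    e = np.exp(v - v.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def init_layers(rng, sizes):
    """Random (W, b) pairs for consecutive layer sizes (illustrative init)."""
    return [(rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def hidden_stack(x, layers):
    # Each hidden layer is a linear transformation followed by ReLU.
    for W, b in layers:
        x = relu(x @ W + b)
    return x


def task_head(h, layers):
    # ReLU on every task-specific layer except the final (output) layer.
    h = hidden_stack(h, layers[:-1])
    W, b = layers[-1]
    return h @ W + b


def mtldnn_forward(x, shared, mode_head, purpose_head):
    h = hidden_stack(x, shared)                # shared layers g_0
    p_mode = softmax(task_head(h, mode_head))  # mode choice probabilities
    p_purpose = softmax(task_head(h, purpose_head))  # purpose probabilities
    return p_mode, p_purpose


rng = np.random.default_rng(0)
d, K_m, K_p = 8, 5, 7                           # assumed sizes
shared = init_layers(rng, [d, 16, 16])          # M_1 = 2 shared layers
mode_head = init_layers(rng, [16, 16, K_m])     # M_2 = 2 layers for mode
purpose_head = init_layers(rng, [16, 16, K_p])  # M_2 = 2 layers for purpose
x = rng.normal(size=(4, d))                     # four synthetic observations
p_mode, p_purpose = mtldnn_forward(x, shared, mode_head, purpose_head)
print(p_mode.shape, p_purpose.shape)
```

Both heads read the same hidden representation `h`, which is how information is shared between the two tasks; only the layers after `h` are task-specific.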
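The MTLDNN model is trained by empirical risk minimisation (ERM), in which the objective contains classification errors and regularisation terms:

$$\min_{w_0, w_m, w_p} \; -\frac{\theta_m}{N} \sum_{i=1}^{N} \sum_{k=1}^{K_m} y_{m,i}[k] \log P(y_{m,i}[k] = 1 \mid x_i; w_m, w_0) - \frac{\theta_p}{N} \sum_{i=1}^{N} \sum_{k=1}^{K_p} y_{p,i}[k] \log P(y_{p,i}[k] = 1 \mid x_i; w_p, w_0) + \lambda_1 \left( \|w_0\|_1 + \|w_m\|_1 + \|w_p\|_1 \right) + \lambda_2 \left( \|w_0\|_2^2 + \|w_m\|_2^2 + \|w_p\|_2^2 \right) \tag{6}$$

Equation (6) consists of four parts. The first two parts are the empirical risks of predicting mode and purpose, respectively, which take the form of cross-entropy loss. The third part is the L1-class regularisation term with weight $\lambda_1$, where $\|w_0\|_1 = \sum_j |w_{0,j}|$ and so forth. The fourth part is the L2-class regularisation term with weight $\lambda_2$, where $\|w_0\|_2^2 = \sum_j w_{0,j}^2$ and so forth. Equation (6) incorporates four hyperparameters ($\theta_m$, $\theta_p$, $\lambda_1$, $\lambda_2$) with the constraint that $\theta_m + \theta_p = 1$. Specifically, $\theta_m$ and $\theta_p$ adjust the relative importance of mode and purpose prediction, while $\lambda_1$ and $\lambda_2$ adjust the absolute magnitudes of layer weights. The larger $\lambda_1$ or $\lambda_2$, the larger the weight decay in the training of DNN models.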
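The four-part training objective described above can be written as a short NumPy function. This is a hedged sketch: averaging the cross-entropy over observations, the function names, and the small `eps` added inside the logarithm are assumptions made for the example, and the numbers below are toy values rather than results from the study.

```python
import numpy as np


def cross_entropy(y_onehot, probs, eps=1e-12):
    # Mean negative log-probability assigned to the chosen alternative.
    return -np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1))


def mtldnn_objective(y_m, p_m, y_p, p_p, weights,
                     theta_m=0.5, lam1=0.0, lam2=0.0):
    """Weighted task losses plus L1 and L2 penalties on all weight arrays."""
    theta_p = 1.0 - theta_m                      # constraint theta_m + theta_p = 1
    l1 = sum(np.abs(w).sum() for w in weights)   # L1 penalty
    l2 = sum((w ** 2).sum() for w in weights)    # squared L2 penalty
    return (theta_m * cross_entropy(y_m, p_m)
            + theta_p * cross_entropy(y_p, p_p)
            + lam1 * l1 + lam2 * l2)


# Toy example: two observations, two alternatives per task.
y_m = np.array([[1.0, 0.0], [0.0, 1.0]])
p_m = np.array([[0.5, 0.5], [0.5, 0.5]])   # uninformative mode predictions
y_p = np.array([[1.0, 0.0], [0.0, 1.0]])
p_p = np.array([[1.0, 0.0], [0.0, 1.0]])   # perfect purpose predictions
weights = [np.array([1.0, -2.0])]          # stand-in for w_0, w_m, w_p
loss = mtldnn_objective(y_m, p_m, y_p, p_p, weights,
                        theta_m=0.5, lam1=0.1, lam2=0.01)
print(loss)
```

Raising `theta_m` shifts the optimisation towards the mode task, while `lam1` and `lam2` penalise large weights regardless of which task they serve.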

Multinomial logit models for mode and purpose
This study compares the DNNs for analysing mode and purpose against two baseline MNL models that analyse mode and purpose, respectively. For convenience, the MNL models that predict mode and purpose are called MNL-M and MNL-P, respectively. The utility function of MNL-M follows a linear structure:

$$U_{m,i}[k] = w_{m,k}^{\top} x_{m,i} + \varepsilon_{m,i}[k] \tag{8}$$

where $w_m = (w_{m,1}, \ldots, w_{m,K_m})$ are the parameters for travel mode analysis; $\varepsilon_{m,i}$ is the random utility term; $x_{m,i}$ are the independent variables for mode choice. Then, the choice probability function of each of the $K_m$ mode alternatives in DCMs is computed as follows:

$$P(y_{m,i}[k] = 1 \mid x_{m,i}; w_m) = \frac{e^{w_{m,k}^{\top} x_{m,i}}}{\sum_{j=1}^{K_m} e^{w_{m,j}^{\top} x_{m,i}}} \tag{9}$$

Similar to MNL-M, the utility function of MNL-P follows a linear structure, as follows:

$$U_{p,i}[k] = w_{p,k}^{\top} x_{p,i} + \varepsilon_{p,i}[k] \tag{10}$$

where $w_p = (w_{p,1}, \ldots, w_{p,K_p})$ are the parameters for trip purpose; $\varepsilon_{p,i}$ is the random utility term; $x_{p,i}$ are the independent variables for analysing trip purpose.
The probability function of each of the $K_p$ trip purposes in DCMs is as follows:

$$P(y_{p,i}[k] = 1 \mid x_{p,i}; w_p) = \frac{e^{w_{p,k}^{\top} x_{p,i}}}{\sum_{j=1}^{K_p} e^{w_{p,j}^{\top} x_{p,i}}} \tag{11}$$

In the specification of the two utility functions, we included all theoretically relevant independent variables. We used the maximum likelihood estimation method to estimate the parameters.
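As an illustration of the MNL baseline, the choice probabilities and the log-likelihood can be computed as follows. The study states only that maximum likelihood estimation was used; the plain gradient-descent loop below is a stand-in optimiser chosen for brevity, and the data, function names, learning rate, and iteration count are all assumptions. No reference alternative is fixed here, so only differences between the coefficient columns are identified.

```python
import numpy as np


def mnl_probabilities(X, W):
    V = X @ W                                # systematic utilities, shape (N, K)
    V = V - V.max(axis=1, keepdims=True)     # numerical stability
    expV = np.exp(V)
    return expV / expV.sum(axis=1, keepdims=True)


def negative_log_likelihood(X, y, W):
    P = mnl_probabilities(X, W)
    return -np.sum(np.log(P[np.arange(len(y)), y]))


def fit_mnl(X, y, n_alternatives, lr=0.1, n_iter=500):
    """Maximise the likelihood by gradient descent on the mean NLL."""
    n_obs, n_features = X.shape
    W = np.zeros((n_features, n_alternatives))
    Y = np.eye(n_alternatives)[y]            # one-hot encoded choices
    for _ in range(n_iter):
        P = mnl_probabilities(X, W)
        W -= lr * X.T @ (P - Y) / n_obs      # gradient of the mean NLL
    return W


# Toy data: an intercept plus one attribute (e.g. distance), two alternatives.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0, 0, 1, 1])
W_hat = fit_mnl(X, y, n_alternatives=2)
print(negative_log_likelihood(X, y, W_hat))
```

In practice a dedicated estimation package would also report standard errors, which are needed for coefficient tables such as Tables A3 and A4.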

Data summary
We utilised the annual National Travel Survey (NTS) data of the UK from 2005 to 2016, which are publicly provided by the Department for Transport (Department for Transport, 2020). Respondents are divided into nine regions according to the household address they provided, namely North East, North West, Yorkshire and the Humber, East Midlands, West Midlands, East of England, London, South East, and South West. The NTS data are collected through a combination of face-to-face interviews and 7-day self-reported written travel diaries. The NTS data also provide individual socio-demographic information about respondents and their households, including gender, income, and car ownership. This dataset enables researchers to connect travel patterns with individual characteristics. Each year, the survey encompasses individuals from all age groups and involves around 16,000 participants from 7,000 households in England. From 2005 to 2016, a total of 121,765 respondents from 69,208 households participated in this survey. After simple data cleaning, the database contains a total of 2,100,492 observations with detailed travel information, including participant ID, mode, purpose, date and time of start and end point, location of the start and end point (government office region level, coarse-grained), travel duration, and travel distance. As neither the specific start and end location of each trip nor the home/work locations of respondents are available, constructing the trip chain of a participant is challenging. Therefore, we consider each trip as an instance in the NTS data without constructing trip chains. In the future, if more details of the trips and the home/work locations of the respondents become available, we can construct the trip chains to achieve more accurate prediction of travel behaviours.
To demonstrate the utility of the proposed framework, in this case study, we focus on the subset of travel survey data of London in the year 2005.

Experiment setup
This study compares the performance of MTLDNNs to STLDNNs and two baseline MNLs.The utility specification and estimation results of the MNLs are presented in Table A3 and Table A4.When training DNN models, the model hyperparameters that define the architecture and regularisation should be carefully selected, as the model performance largely depends on the hyperparameters.Therefore, we predefined the hyperparameter space (see Table A5) and used a grid search approach to select the optimal hyperparameters.The model performance is evaluated by the hold-out method, in which the original survey data is divided into training and testing sets with the ratio of 7:3.The training set is used for model training while the testing set is used for model evaluation.
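The experiment protocol of a 7:3 hold-out split combined with a grid search over a predefined hyperparameter space can be sketched as follows. The search space shown here is hypothetical (the actual grid is given in Table A5), and `toy_evaluate` is a placeholder for training a model with a given configuration and returning its loss.

```python
import itertools
import random


def holdout_split(n_obs, train_ratio=0.7, seed=0):
    """Shuffle observation indices and split them into training and testing sets."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_ratio * n_obs))
    return idx[:cut], idx[cut:]


def grid_search(space, evaluate):
    """Return (best_config, best_score), minimising evaluate(config)."""
    names = sorted(space)
    best_config, best_score = None, float("inf")
    for values in itertools.product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical hyperparameter space; the values are illustrative only.
space = {
    "M1": [1, 2, 3],            # depth of shared layers
    "M2": [1, 2],               # depth of task-specific layers
    "lam1": [0.0, 1e-3],        # L1 regularisation weight
    "lam2": [0.0, 1e-3],        # L2 regularisation weight
    "theta_m": [0.3, 0.5, 0.7], # weight of the mode task
}


def toy_evaluate(config):
    # Placeholder score standing in for "train the model, return its loss".
    return (abs(config["M1"] - 2) + abs(config["theta_m"] - 0.5)
            + config["lam1"] + config["lam2"])


train_idx, test_idx = holdout_split(1000)
best_config, best_score = grid_search(space, toy_evaluate)
print(len(train_idx), len(test_idx), best_config)
```

An exhaustive grid grows multiplicatively with each added hyperparameter (72 configurations in this toy space), which is why the space is usually kept small.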

Data summary and the dependence between mode and purpose
The market share of each mode and the proportion of each travel purpose are shown in Table 2. The most popular travel mode, Car or Van, takes up 58.29% of total trips, as opposed to 2.69% for the least popular mode, Bicycle. In terms of travel purpose, the two most common purposes are Leisure (26.41%) and Commuting (24.03%), while the least common purpose is Business (4.80%). Moreover, Table 2 cross-tabulates mode and purpose, with the most popular travel purpose for each travel mode highlighted in bold. It is shown that the distribution of trip purpose varies considerably across travel modes, and vice versa.
In addition, we performed the Chi-square test to test whether mode and purpose are independent of each other. As shown in Panel 2 of Table 2, the Chi-square statistic is 3059.947, with 24 degrees of freedom and a p-value smaller than 0.01. This indicates that there is a statistically significant dependence between mode and purpose, which is the foundation for the utility of multitask learning that co-predicts mode and purpose.

Model performance
Here, we compare the cross-entropy loss and predictive accuracy of MTLDNN for predicting mode and purpose with those of the baseline STLDNN and MNL. The notation of the selected models is detailed in Table 3. The hyperparameter values of the selected DNNs are in Table A6. The model performance is detailed in Table 4.
Generally, MTLDNN models outperform STLDNN and MNL models regarding cross-entropy loss in the testing set. When measured by cross-entropy loss for predicting mode, MTLDNN-M outperforms STLDNN-M and STLDNN-P by 0.214 and 0.289, respectively. On the other hand, per the cross-entropy loss for predicting purposes, MTLDNN-P outperforms STLDNN-M and STLDNN-P by 0.251 and 0.211, respectively. In addition, the MTLDNNs are less likely to overfit on the training samples than the STLDNNs, as MTLDNNs have a smaller difference in cross-entropy loss between the training and testing sets. This demonstrates the advantage of MTLDNN that it considerably reduces the risk of overfitting (Baxter, 1997; Ruder, 2017). On the other hand, the cross-entropy loss of MTLDNN or STLDNN is much lower than that of MNL, which indicates the superior predictive power of MTLDNN and STLDNN over MNL.

Table A3
Estimated coefficients of the MNL-M model.
The predictive accuracy of MTLDNN models is close to that of STLDNN. Specifically, for predicting mode in the testing data, MTLDNN-P slightly outperforms STLDNN, whereas MTLDNN-M performs worse than the STL models. Regarding the predictive accuracy of purpose, MTLDNN-M performs slightly better than the STL models whilst MTLDNN-P underperforms STLDNN. In addition, the predictive accuracy of MTLDNNs and STLDNNs is significantly higher than that of MNL.
The model comparison demonstrates the inconsistency between cross-entropy loss and predictive accuracy, which can be explained by the differing weighting mechanisms (Wang et al., 2020b). On the one hand, the predictive accuracy assigns equal weights to all instances based on the binary loss. On the other hand, the cross-entropy loss assigns a large weight to the 'confident but incorrect predictions' (Wang et al., 2020b), which is shown in Equations (6) and (7).
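The weighting difference between the two measures can be seen in a small numerical example. The two sets of predicted probabilities below are invented for illustration and are not outputs of the models in this study: both make exactly one wrong prediction, but the second is far more confident in its mistake.

```python
import math


def accuracy(y_true, probs):
    # Share of observations whose highest-probability alternative is the chosen one.
    hits = sum(
        1 for y, p in zip(y_true, probs)
        if max(range(len(p)), key=p.__getitem__) == y
    )
    return hits / len(y_true)


def cross_entropy(y_true, probs):
    # Mean negative log-probability assigned to the chosen alternative.
    return -sum(math.log(p[y]) for y, p in zip(y_true, probs)) / len(y_true)


y_true = [0, 0, 1, 1]
# Hesitant model: modest probabilities, one incorrect prediction (last row).
model_a = [[0.6, 0.4], [0.6, 0.4], [0.4, 0.6], [0.6, 0.4]]
# Confident model: sharp probabilities, one confident but incorrect prediction.
model_b = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.99, 0.01]]

acc_a, acc_b = accuracy(y_true, model_a), accuracy(y_true, model_b)
ce_a, ce_b = cross_entropy(y_true, model_a), cross_entropy(y_true, model_b)
print(acc_a, acc_b, round(ce_a, 4), round(ce_b, 4))
```

Both models reach the same accuracy of 0.75, yet the confident model has roughly twice the cross-entropy loss because its single error is assigned a probability of only 0.01 for the chosen alternative.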
We argue that cross-entropy loss is a more appropriate measure than predictive accuracy in travel behaviour research, a point also discussed by Train (2009). On the one hand, the cross-entropy loss is equivalent to the negative log likelihood in maximum likelihood estimation and therefore has a clear theoretical interpretation. In fact, log likelihood and the derived measures are often used to measure how well discrete choice models fit the data. On the other hand, predictive accuracy corresponds to the "percent correctly predicted", a measure that "should actually be avoided" in practice. The reason is that this measure presumes that the decision-maker is expected to choose the alternative with the highest probability given by the model, which is inconsistent with the meaning of probabilities and the objective of specifying choice probabilities.

Analysis of substitution patterns of mode choice
The substitution pattern of the mode alternatives is one of the most important pieces of economic information extracted from travel mode analysis. It reveals how the mode choice probability and market share vary with input variables (e.g. travel distance) (Wang et al., 2020a). In practice, it provides researchers and transport planners with an understanding of how the market share of each mode changes with trip attributes. Here, we

Table A4
Estimated coefficients of the MNL-P model.

Fig. 1. The MTLDNN architecture for jointly predicting mode choice and trip purpose.

Table 1
A comparison of three types of methods for travel behaviour analysis.

Table 2
Contingency table of mode and purpose and the Chi-square test.

Table 4
Performance comparison of six models.

Table A1
Descriptive statistics of the survey data.