Evaluating geospatial context information for travel mode detection

Detecting travel modes from global navigation satellite system (GNSS) trajectories is essential for understanding individual travel behavior and a prerequisite for achieving sustainable transport systems. While studies have acknowledged the benefits of incorporating geospatial context information into travel mode detection models, few have summarized context modeling approaches and analyzed the significance of these context features, hindering the development of an efficient model. Here, we identify context representations from related work and propose an analytical pipeline to assess the contribution of geospatial context information for travel mode detection based on a random forest model and the SHapley Additive exPlanation (SHAP) method. Through experiments on a large-scale GNSS tracking dataset, we report that features describing relationships with infrastructure networks, such as the distance to the railway or road network, significantly contribute to the model’s prediction. Moreover, features related to the geospatial point entities help identify public transport travel, but most land-use and land-cover features barely contribute to the task. We finally reveal that geospatial contexts have distinct contributions in identifying different travel modes, providing insights into selecting appropriate context information and modeling approaches. The results from this study enhance our understanding of the relationship between movement and geospatial context and guide the implementation of effective and efficient transport mode detection models.


Introduction
Knowledge regarding individuals' usage of travel modes is an indispensable element in contemporary travel behaviour studies. Travel mode choices are formed due to the everyday needs and constraints of individuals (Hägerstrand, 1970) and are generally influenced by travel-related factors such as cost, time, ability to generalize to new populations and to transfer to new geographical areas. Moreover, researchers have increasingly recognized the importance of geospatial context information, such as bus stops, land cover, and points of interest (POIs), in characterizing movements with different travel modes (Semanjski et al., 2017). By representing context information as features and incorporating them into the framework, detection performance can be significantly improved (Roy et al., 2022; Zeng et al., 2023). However, the relationship between movement and context can be depicted in multiple ways, and there is currently no consensus on which types of geospatial context information are most critical for mode detection. As a result, detection approaches create a set of features based on their own interpretation, typically including more contextual variables than necessary as input to ML models.
To address these gaps, we propose a pipeline to systematically evaluate the contribution of geospatial context information for travel mode detection. Specifically, we thoroughly review recent literature to identify common and natural context representations, which are then implemented based on open-source geospatial data. We develop a random forest (RF) model, reported as one of the best-performing ML models for this task. Finally, we apply SHapley Additive exPlanation (SHAP), a type of feature attribution method, to the trained RF model to evaluate the features' impacts. In short, our contributions are summarized as follows:
• We implement a comprehensive set of features for travel mode detection. The features are summarized from previous research and describe motion characteristics and geospatial context information, covering a broad spectrum of context modelling approaches.
• We quantitatively evaluate the importance of geospatial context features to the detection performance. Our results suggest that geospatial network features contribute the most, and the separation of travel modes can benefit from features describing their respective infrastructures.
• We conduct experiments on a large-scale GNSS tracking dataset. The dataset includes individual movements within Switzerland and involves seven travel modes: bicycle, boat, bus, car, train, tram, and walk, making this one of the largest studies in terms of geographical area and detection scheme.
• We include only open-source geospatial data in the feature construction, ensuring the framework's reproducibility and generalizability. We open-source our framework to provide a benchmark for further reference 1.

Related work
Travel mode detection is the task of inferring the travel mode utilized by an individual given a movement trajectory. Here, we focus on detecting travel modes from trajectories recorded using smartphone GNSS sensors, as they provide dense mobility traces that can lead to fine-grained mode detection results (Huang et al., 2019). The general approach for this line of mode detection studies involves three steps (Shen and Stopher, 2014; Prelipcean et al., 2017; Zeng et al., 2023). First, rule-based algorithms are designed to detect mode transfer points (MTPs) from raw GNSS track points, which segment the continuous trajectory into stages conducted with a single travel mode. Then, characteristic features such as speed or heading are extracted from the stages. Finally, rule-based heuristics, statistical methods or ML classifiers are developed to infer the travel mode based on these features. As the detection of MTPs has become the de-facto preprocessing standard (Tsui and Shalaby, 2006; Schuessler and Axhausen, 2009), we discuss the feature extraction and the method development in the following sections.

Feature extraction and importance assessment
To identify typical features for travel mode detection, we consult review papers (Shen and Stopher, 2014; Gong et al., 2014; Prelipcean et al., 2017) and select representative studies published in recent years. An overview of features implemented in the reviewed literature can be found in Table 1. We distinguish between features that characterize the movement (motion feature) and features that describe the relationship between motion and context information (geospatial context feature).
Overall, motion features have been introduced since the emergence of the problem and are still the most widely used input variables. Speed and acceleration features can be found in nearly all studies and are considered the most straightforward indicators for distinguishing travel modes (Tsui and Shalaby, 2006; Xiao et al., 2015). The bearing rate that reflects the heading change stability and the length of the stage is mainly applied to separate motorized and non-motorized travels (Stenneth et al., 2011; Xiao et al., 2015). Jerk, which measures the change rate in acceleration, has gained popularity due to its introduction as an additional channel input for deep learning (DL) models (Dabiri and Heaslip, 2018). Other motion features, including duration, altitude, GNSS accuracy, and distance between track points, have been less often employed in recent years. Operationally, variables observed per track point, such as speed and acceleration, need to be aggregated into a single value to describe a movement trajectory. Apart from the most often calculated average value, researchers note that a nearly maximum value such as the 85th percentile should be used as an additional indicator (Biljecki et al., 2013), allowing for robustness to noise (Schuessler and Axhausen, 2009). In addition, a few methods consider more sophisticated statistical indicators (e.g., standard deviation, mode, and skewness) to describe the variable's distribution over the movement trajectory (Xiao et al., 2017; Wu et al., 2022).
Comparatively, geospatial context data has been used less frequently for travel mode detection. Based on their represented context information, we divide these features into four categories:
• Infrastructure networks. These features quantify the trajectory's proximity to networks that allow movements with specific travel modes, typically implemented using different distance measures. For example, Stenneth et al. (2011) represented the rail network feature with the average Euclidean distance between each track point and its closest rail line. Roy et al. (2022) adopted the Hausdorff distance to obtain the furthest distance between track points and the network. Rasmussen et al. (2015) and Wu et al. (2022) both used a threshold-based method that calculates the proportion of track points whose Euclidean distance to the network is below a given threshold.
• Public transport stations. They are the most commonly implemented geospatial context features because of their easy accessibility from open-source map services and their effectiveness in distinguishing public transport from car travel (Chen et al., 2010). Proximity to public transport stations is either considered for the whole trajectory (e.g., by measuring the distance to each track point) (Stenneth et al., 2011; Roy et al., 2022) or for the movement's start and end points (e.g., by only considering the trajectory's endpoints) (Zong et al., 2017; Wang et al., 2018a).
• Public transport timetables and real-time locations. Besides static geospatial contexts, previous studies integrate information that reflects the actual public transport service situations (Sadeghian et al., 2022; Stenneth et al., 2011). However, this dynamic information is only openly accessible in limited areas of the world (e.g., large cities), hindering its application in large-scale travel mode detection studies.
• Land use and land cover (LULC). Relatively few studies consider LULC contexts in the task. Biljecki et al. (2013) proposed using the water body feature to identify boat movements, and Roy et al. (2022) argued that including LULC features helps distinguish non-motorized travel modes. It has long been recognized that the built environment influences an individual's travel mode choice (Cheng et al., 2019; Tamim Kashifi et al., 2022), suggesting a considerable potential to incorporate LULC features for travel mode detection.
Although many studies have recognized the importance of geospatial context features (Biljecki et al., 2013; Semanjski et al., 2017), few have attempted to quantify their significance in improving mode detection performance. Comparative studies have demonstrated that including geospatial context information can significantly increase accuracy, ranging from 18% (Stenneth et al., 2011) to 77% (Roy et al., 2022). However, due to variations in the implemented features, it is challenging to assess and compare the contribution of different context categories to the outcome, hindering the development of an efficient travel mode detection model.

Travel mode detection methods
The widespread use of smartphone GNSS sensors for collecting travel diaries has spurred researchers to develop approaches for automatically detecting travel modes from raw GNSS tracking data. As a first attempt, rule-based heuristics with human-crafted rule sets to differentiate each travel mode were proposed (Stopher et al., 2008; Chen et al., 2010; Gong et al., 2012). These systems matched experts' understanding of mode usage but failed to perform satisfactorily when applied to noisy real-world GNSS records. Therefore, statistical methods such as fuzzy logic (Schuessler and Axhausen, 2009; Biljecki et al., 2013) and Bayesian networks (Xiao et al., 2015) were applied to account for ambiguity in allocating modes to observed motion and context characteristics. Later attempts introduced ML for learning classification "rules" directly from input features, allowing more flexibility in the decisions and being more robust to real-world noise. Examples of ML-based approaches include decision trees (Zheng et al., 2008), RF (Wang et al., 2018a), support vector machines (Semanjski et al., 2017), and artificial neural networks (Roy et al., 2022). Among these, RF has become increasingly popular as it can effectively handle high feature dimensionality and multicollinearity (Fernández-Delgado et al., 2014). Studies comparing various mode detection methods have consistently found that RF achieved the best performances among classical ML algorithms (Stenneth et al., 2011; Dabiri and Heaslip, 2018; Sadeghian et al., 2022).
Thanks to the availability of large-scale datasets, DL models have sparked a new paradigm for travel mode detection. Examples include convolutional neural network (CNN) (Dabiri and Heaslip, 2018; Yazdizadeh et al., 2020) and recurrent neural network (RNN) (Kim et al., 2022), which can learn multi-level representations from input features, enabling them to describe highly non-linear relationships. Additionally, DL models can effectively capture consecutive travel mode choice patterns (Zeng et al., 2023), which is challenging to consider in the standard three-step mode detection framework (Bolbol et al., 2012). However, current DL mode inference models only accept a limited motion feature set as input (e.g., speed, acceleration, jerk and bearing in Dabiri and Heaslip (2018)), and approaches to include relevant geospatial context information are still to be explored. The modelling approaches and feature attribution results obtained from this study offer valuable insights that can inspire geospatial context integration into DL models for travel mode detection.

Methodology
We present a framework for evaluating the importance of geospatial context information in travel mode detection. The overall pipeline is illustrated in Figure 1. First, we extract motion and geospatial context features from the GNSS movement trajectory (§3.1). These features provide a comprehensive characterization of movements performed with different travel modes. Then, we implement an RF classifier to identify the travel mode with the extracted feature set (§3.2). Finally, based on the classification outcomes, we evaluate the contribution of the features to the model's prediction using SHAP (§3.3). In the following, we provide a more detailed description of each step.

Figure 1: Pipeline for evaluating the contribution of geospatial context information for travel mode detection. (Diagram labels: trajectory, travel mode label, random forest, performance evaluation, feature importance evaluation.)

Feature Extraction
As a first step, a typical travel mode identification framework requires extracting meaningful features.
We represent a movement trajectory travelled with a single travel mode by user $u_i$ using a tuple $\langle m, c, g(s) \rangle$, where $m$ represents the employed travel mode, $c$ is the associated geospatial context, and $g(s)$ denotes the time-ordered track points that constitute the movement, i.e., $g(s) = (q_k)_{k=1}^{n}$. A track point $q$ is a tuple $q = \langle p, t \rangle$, where $p = \langle x, y \rangle$ includes spatial coordinates in a reference system, e.g., latitude and longitude, and $t$ is the time of recording. Following the notation, we distinguish between motion features, where only track points $g(s)$ are involved, and geospatial context features, where both context information $c$ and track points $g(s)$ are needed.

Motion Feature. Table 2 provides an overview of all motion features included in the study. Basic movement metrics such as length, duration, speed, acceleration, and bearing rate are calculated from track points. We obtain the average and the 85th percentile value of these metrics for each trajectory. More specifically, we calculate the length $\Delta d_k$ and duration $\Delta t_k$ of travel between two consecutive track points $q_k, q_{k-1} \in (q_k)_{k=1}^{n}$:

$$\Delta d_k = \lVert p_k - p_{k-1} \rVert_2, \qquad \Delta t_k = t_k - t_{k-1},$$

where $\lVert \cdot \rVert_2$ denotes the Euclidean distance with $p_k$ and $p_{k-1}$ represented in a planar coordinate system.
The length $D$ and duration $T$ are obtained by summing all these intervals, i.e., $D = \sum_{k=2}^{n} \Delta d_k$, $T = \sum_{k=2}^{n} \Delta t_k$. Moreover, the speed $v_k$ and acceleration $a_k$ of track point $q_k \in (q_k)_{k=1}^{n}$ are obtained from these intervals: the speed is $v_k = \Delta d_k / \Delta t_k$, and the acceleration is the change in speed between consecutive track points divided by the elapsed time. Another essential feature that distinguishes travel modes is the direction change, which is often represented using the bearing rate (Yazdizadeh et al., 2020; Sadeghian et al., 2022). The bearing rate $b_k$ of a track point is the absolute difference between the bearings of two sequential track points. We then calculate the average speed $V$, average acceleration $A$ and average bearing rate $B$ of the trajectory by averaging over all its track points:

$$V = \frac{1}{n-1} \sum_{k=2}^{n} v_k, \qquad A = \frac{1}{n-1} \sum_{k=2}^{n} a_k, \qquad B = \frac{1}{n-1} \sum_{k=2}^{n} b_k.$$

The 85th percentile speed $V_{85th}$, acceleration $A_{85th}$ and bearing rate $B_{85th}$ are obtained by ordering the $(v_k)_{k=2}^{n}$, $(a_k)_{k=2}^{n}$ and $(b_k)_{k=2}^{n}$ sequences in ascending order and selecting the 85th percentile value, respectively.
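The motion features described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the backward-difference convention for acceleration, the `atan2`-based bearing, and the linear-interpolation percentile are all assumptions made for the example.

```python
import math


def percentile(values, q):
    """Percentile (q in [0, 100]) with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def motion_features(points):
    """points: time-ordered (x, y, t) tuples in a planar CRS (metres, seconds).

    Needs at least three points with strictly increasing timestamps.
    """
    steps = list(zip(points, points[1:]))
    d = [math.hypot(x1 - x0, y1 - y0) for (x0, y0, _), (x1, y1, _) in steps]
    dt = [t1 - t0 for (_, _, t0), (_, _, t1) in steps]
    v = [dk / tk for dk, tk in zip(d, dt)]
    # Assumption: acceleration as a backward difference of consecutive speeds.
    a = [(v[k] - v[k - 1]) / dt[k] for k in range(1, len(v))]
    # Assumption: bearing of each step in degrees clockwise from north.
    bearing = [math.degrees(math.atan2(x1 - x0, y1 - y0)) % 360.0
               for (x0, y0, _), (x1, y1, _) in steps]
    # Absolute bearing difference; no wrap-around handling in this sketch.
    b = [abs(bearing[k] - bearing[k - 1]) for k in range(1, len(bearing))]
    return {
        "length": sum(d),
        "duration": sum(dt),
        "speed_mean": sum(v) / len(v),
        "speed_p85": percentile(v, 85),
        "acc_mean": sum(a) / len(a),
        "acc_p85": percentile(a, 85),
        "bearing_rate_mean": sum(b) / len(b),
        "bearing_rate_p85": percentile(b, 85),
    }


features = motion_features([(0, 0, 0), (3, 4, 1), (3, 14, 2), (23, 14, 3)])
```

The dictionary contains the eight motion features (length, duration, and the mean and 85th percentile of speed, acceleration and bearing rate).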
Geospatial Context Feature. Table 3 provides an overview of the geospatial context features considered in the study. These features are either obtained with the trajectory's endpoints (Table 3 Endpoints) or all points that form the trajectory (Table 3 All points). The latter can be further categorized following the geometric type of the context data, i.e., whether the contexts exist in the form of points (e.g., POI), networks (e.g., road network) or areas (e.g., residential area). We include abundant geospatial context features to exhaustively consider different context modelling approaches in previous studies (see Section 2).
We start by implementing features that only depend on the endpoints of a trajectory. These features measure the closeness of trajectory endpoints to the geospatial context, assuming that trajectories shall start and end at predefined locations (Stopher et al., 2008; Gong et al., 2012). Operationally, we regard the set $Q$ with $m(Q)$ point objects as the context $c = Q$, and identify the minimum distance between $Q$ and the trajectory start point $q_1$ as well as the end point $q_n$, respectively:

$$d(q_1, Q) = \min_{o \in Q} \lVert p_1 - o \rVert_2, \qquad d(q_n, Q) = \min_{o \in Q} \lVert p_n - o \rVert_2.$$

We use the minimum and maximum of the two distances to quantify the closeness of a trajectory to the point context:

$$D_Q^{min} = \min\big(d(q_1, Q), d(q_n, Q)\big), \qquad D_Q^{max} = \max\big(d(q_1, Q), d(q_n, Q)\big).$$

We construct $Q$ using context data regarding railway station, tram stop, bus stop, car parking, bicycle parking and ship landing stage, respectively, which results in 12 features in this category.
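As an illustration of the endpoint features, the following sketch computes the smaller and larger of the two endpoint-to-context distances for one point context. The function names and the toy coordinates are hypothetical; a brute-force nearest-neighbour search is used for brevity, whereas a spatial index would be preferable for country-wide point layers.

```python
import math


def nearest_distance(point, context_points):
    """Euclidean distance from one (x, y) point to its closest context point."""
    return min(math.dist(point, c) for c in context_points)


def endpoint_features(trajectory, context_points):
    """Return (min, max) of the start-point and end-point distances to the context."""
    d_start = nearest_distance(trajectory[0], context_points)
    d_end = nearest_distance(trajectory[-1], context_points)
    return min(d_start, d_end), max(d_start, d_end)


# Toy stage in planar coordinates and three hypothetical tram stops.
stage = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
tram_stops = [(0.0, 3.0), (10.0, -4.0), (100.0, 100.0)]
d_min, d_max = endpoint_features(stage, tram_stops)
```

Repeating the call for each of the six point contexts yields the 12 endpoint features.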
Besides, previous studies have measured the proximity of the entire trajectory to geospatial contexts (Semanjski et al., 2017), which accounts for dynamic context interactions during the movement process. Considering the same point object set $Q$, we now measure the average minimum distance between $Q$ and every point $q \in (q_k)_{k=1}^{n}$ that forms the trajectory:

$$D_Q = \frac{1}{n} \sum_{j=1}^{n} \min_{o \in Q} \lVert p_j - o \rVert_2.$$

We implement features measuring the distance to railway stations $D_{railS}$, tram stops $D_{tramS}$, bus stops $D_{busS}$ and general POIs (e.g., public and catering facilities) $D_{POIS}$ using the respective point context data.
In addition, movements using specific travel modes are confined by their infrastructures, such as trains and trams that operate on established tracks. This characteristic can be quantified by measuring the distance between the trajectory and infrastructure networks. Here, we consider the infrastructure network $N = (L_k)_{k=1}^{m(L)}$ that consists of $m(L)$ line objects as the context $c = N$. We then calculate the average of the closest distance between every trajectory point $q \in (q_k)_{k=1}^{n}$ and the network $N$:

$$D_N = \frac{1}{n} \sum_{j=1}^{n} \min_{1 \le k \le m(L)} f(q_j, L_k), \tag{8}$$

where $f(q_j, L_k)$ measures the Euclidean distance between the point $q_j$ and the line $L_k$. As a result, network distances of a trajectory to the railway network $D_{railN}$, to the tram network $D_{tramN}$, to the road network $D_{roadN}$, and to the pedestrian and bike network $D_{pedN}$ are obtained.
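The network distance can be sketched as follows, under the simplifying assumption that each line object is a single straight segment given by its two end coordinates (real network layers consist of polylines, which can be split into such segments). The function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment with end points a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both end points coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the supporting line and clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def mean_network_distance(trajectory, segments):
    """Average, over all track points, of the distance to the closest segment."""
    closest = [min(point_segment_distance(q, a, b) for a, b in segments)
               for q in trajectory]
    return sum(closest) / len(closest)


rail = [((0.0, 0.0), (10.0, 0.0))]
stage = [(0.0, 3.0), (5.0, 1.0), (12.0, 0.0)]
d_rail = mean_network_distance(stage, rail)
```

In this toy case the three track points lie 3, 1 and 2 units from the segment, so the feature value is their mean.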
The last feature category relates to built and natural environments that are particularly attractive to or only accessible by specific travel modes. These LULC contexts are typically available in area formats and can be denoted using a set $(E_k)_{k=1}^{m(E)}$ that contains $m(E)$ non-overlapping area objects. Analogous to Equation 8, we include features describing the distance to residential areas $D_{resident}$, public green spaces $D_{green}$ and forest areas $D_{forest}$, as travels close to these LULC contexts were reported to be dominated by active modes and short-distance public transport (Semanjski et al., 2017; Roy et al., 2022). In addition, we obtain the trajectory's proportion on water $P_{water}$ for describing boat travels:

$$P_{water} = \frac{1}{n} \sum_{j=1}^{n} \sum_{k=1}^{m(E)} \mathbb{1}\big(within(q_j, E_k)\big),$$

where $within(q_j, E_k)$ represents the spatial analysis function for determining whether $q_j$ is within $E_k$, and $\mathbb{1}(\cdot)$ is the indicator function.
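The proportion-on-water feature amounts to counting the track points that fall inside any water polygon. The sketch below uses a standard ray-casting point-in-polygon test on simple polygons given as vertex lists; the function names are hypothetical, and holes or multipolygons, which occur in real land-cover layers, are not handled.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside the simple polygon (vertex list)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def proportion_within(trajectory, polygons):
    """Share of track points located inside at least one of the polygons."""
    hits = sum(any(point_in_polygon(q, poly) for poly in polygons)
               for q in trajectory)
    return hits / len(trajectory)


lake = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
stage = [(5.0, 5.0), (2.0, 8.0), (15.0, 5.0), (-1.0, -1.0)]
p_water = proportion_within(stage, [lake])
```

Because the area objects are assumed not to overlap, counting a point once when it lies in any polygon is equivalent to summing the indicator over all polygons.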

Random Forest Classification
Given a feature set that describes the trajectory, travel mode detection can be viewed as a classification problem that aims to obtain a (non-linear) mapping $g(\cdot)$ between the feature set $\mathcal{X}$ and the ground-truth travel mode label $c$, i.e., $c = g(\mathcal{X})$. When a new trajectory is observed, the learned mapping $g(\cdot)$ is used to predict a travel mode label $\hat{c}$. Many supervised ML models have been proposed for learning $g(\cdot)$ (Dabiri and Heaslip, 2018), among which tree-based models, especially the RF model, are reported to achieve optimum performances (Stenneth et al., 2011; Yang et al., 2022).
RF is an ensemble method based on the decision tree classifier, with a binary tree structure that partitions the feature space into a set of mutually exclusive regions. These partitions are performed by searching all possible feature splits and selecting the one that maximises the Gini impurity gain. For a candidate splitting feature $X_k \in \mathcal{X}$, the Gini impurity index is calculated as:

$$Gini(X_k) = 1 - \sum_{i=1}^{C} pr_i^2,$$

where $C$ is the number of categories in $X_k$, and $pr_i$ represents the sample proportion of the $i$-th class. The gain for a split is calculated by comparing the Gini impurity in the parent node and the one after performing the split. A decision tree is grown until pre-specified termination criteria are met, or until all leaves are pure, meaning that only samples from the same category are included (Breiman et al., 2017).
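A small worked example may help to make the split criterion concrete. The sketch below computes the Gini impurity of a node and the gain of a binary split, where the impurity after the split is taken as the size-weighted average of the two child nodes (the usual convention, assumed here). The function names and toy labels are illustrative only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_i pr_i^2 of the class labels in a node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    after = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - after


parent = ["car"] * 4 + ["walk"] * 4
# A split that separates the two modes perfectly, e.g. on a speed threshold.
gain_perfect = gini_gain(parent, ["car"] * 4, ["walk"] * 4)
# A split that leaves both children as mixed as the parent.
gain_useless = gini_gain(parent, ["car", "car", "walk", "walk"],
                         ["car", "car", "walk", "walk"])
```

The perfect split recovers the full parent impurity of 0.5, whereas the uninformative split yields no gain.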
While the decision tree is robust to the inclusion of irrelevant features and produces inspectable models, it tends to overfit the training set and has low generalization ability (Hastie et al., 2009). RF significantly alleviates the overfitting issue by maintaining multiple decision trees, each trained with a different training set, constructed by random sampling from the original training dataset with replacement. In addition to this sample randomization, often referred to as bootstrap aggregating, RF introduces additional randomness in the feature selection process. Only a random subset of features is considered at each node split, ensuring the diversity of the learned decision trees (Breiman, 2001). During prediction, each constructed decision tree outputs a prediction class label, and the majority voting strategy is used to determine the final classification result.
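The three sources of RF behaviour described above (bootstrap sampling, random feature subsets, and majority voting) can be illustrated in isolation. This is only a sketch of the individual ingredients with hypothetical helper names; it does not grow any trees.

```python
import random
from collections import Counter


def bootstrap_sample(samples, rng):
    """Draw len(samples) items with replacement (one bag per tree)."""
    return rng.choices(samples, k=len(samples))


def random_feature_subset(features, k, rng):
    """Pick k distinct candidate features for one node split."""
    return rng.sample(features, k)


def majority_vote(tree_predictions):
    """Final label: the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)
stage_ids = list(range(10))
bag = bootstrap_sample(stage_ids, rng)
candidates = random_feature_subset(
    ["speed_mean", "acc_p85", "dist_rail", "dist_road"], 2, rng)
label = majority_vote(["car", "bus", "car", "tram", "car"])
```

Because sampling is done with replacement, a bag typically contains duplicates and omits some of the original stages.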

Evaluating feature importance
With a well-trained RF model, we evaluate the contribution of individual features to the prediction result with SHAP (Lundberg and Lee, 2017) and TreeExplainer (Lundberg et al., 2020). SHAP is a game-theoretic approach to explain the output of ML models using Shapley values (Shapley, 1952) that fairly distribute players' contributions when they collectively achieve an outcome. The concept can be generalized to ML to quantify the contribution of each feature that collectively delivers the model's output (Štrumbelj and Kononenko, 2014). More formally, the Shapley value $\phi_{X_k}$ of feature $X_k$ is its marginal contribution to the model prediction, averaged over all possible models trained with different feature combinations:

$$\phi_{X_k} = \sum_{S \subseteq \mathcal{X} \setminus \{X_k\}} \frac{|S|! \, (|\mathcal{X}| - |S| - 1)!}{|\mathcal{X}|!} \big[ v(S \cup \{X_k\}) - v(S) \big], \tag{11}$$

where $|\cdot|$ denotes the cardinality of a set and $v(S)$ is the prediction value of a model trained with the feature set $S$. The exact computation of Shapley values for an arbitrary model has been proven to be NP-hard (Matsui and Matsui, 2001), posing computational challenges to their widespread adoption. Facing this challenge, Lundberg and Lee (2017) proposed SHAP to estimate Shapley values, and subsequently, TreeExplainer was presented to exactly compute Shapley value explanations for tree-based models (such as RF) in polynomial time (Lundberg et al., 2020).
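To make the definition tangible, the following sketch computes exact Shapley values by enumerating all feature subsets for a toy value function. It is a brute-force illustration of the formula, exponential in the number of features, and not the TreeExplainer algorithm; the feature names and the value function `toy_value` are invented for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values; value maps a frozenset of features to a number."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):  # subset sizes 0 .. n-1
            for subset in combinations(others, size):
                s = frozenset(subset)
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(s):
    """Hypothetical payoff: two useful features with a small interaction."""
    v = 0.0
    if "speed" in s:
        v += 0.5
    if "dist_rail" in s:
        v += 0.2
    if "speed" in s and "dist_rail" in s:
        v += 0.1
    return v


names = ["speed", "dist_rail", "dist_green"]
phi = shapley_values(names, toy_value)
```

The interaction term is shared equally between the two interacting features, the irrelevant feature receives zero, and the values sum to the payoff of the full feature set minus that of the empty set.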
We use the TreeExplainer to obtain SHAP value explanations at the level of individual observations. The overall importance $\phi_{X_k}$ of feature $X_k$ is calculated as the average absolute SHAP values over all considered data samples:

$$\phi_{X_k} = \frac{1}{N} \sum_{i=1}^{N} \lvert \phi_{i,X_k} \rvert,$$

where $\phi_{i,X_k}$ is the SHAP value for sample $i$ for feature $X_k$, and $N$ is the considered sample size. A higher absolute SHAP value suggests a stronger influence of the feature on the prediction. Therefore, we can assess the impact of context features in travel mode detection by analyzing $\phi_{X_k}$ for all implemented motion and geospatial context features.
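The aggregation from per-sample SHAP values to an overall importance score is a column-wise mean of absolute values. A minimal sketch, assuming the SHAP values are available as a list of rows (one per sample) with one entry per feature; the numbers below are made up for illustration.

```python
def mean_abs_importance(shap_rows, feature_names):
    """Average absolute SHAP value per feature over all samples."""
    n = len(shap_rows)
    return {name: sum(abs(row[j]) for row in shap_rows) / n
            for j, name in enumerate(feature_names)}


feature_names = ["speed_p85", "dist_rail", "dist_green"]
# Hypothetical SHAP values for four samples (rows) and three features (columns).
shap_rows = [
    [0.5, -0.25, 0.0],
    [-0.5, 0.25, 0.0],
    [0.25, 0.5, 0.125],
    [-0.25, -0.5, -0.125],
]
importance = mean_abs_importance(shap_rows, feature_names)
ranking = sorted(importance, key=importance.get, reverse=True)
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling out, which would otherwise make all three features in this toy example look irrelevant.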
In addition, we implement feature importance assessment methods employed by previous travel mode detection studies (Yang et al., 2022; Wu et al., 2022), including mean decrease in impurity (MDI), permutation importance and drop column importance, to complement and support the result obtained using SHAP. These methods are based on different principles and are described in detail in Appendix A.

GNSS tracking data and preprocessing
We utilize a large-scale longitudinal GNSS tracking dataset for the case study. The tracking dataset was recorded within the SBB Green Class (GC) E-Car pilot study conducted by the Swiss Federal Railways (SBB) from November 2016 to December 2017 (Martin et al., 2019). The study, involving 139 Switzerland-based participants, aimed to evaluate the effect of a mobility-as-a-service (MaaS) offer on individuals' mobility behaviour. The participants were provided with a MaaS bundle and were asked to install a GNSS-tracking application on their smartphones that records their daily movement with a high temporal resolution. Based on motion measurements such as speed and acceleration obtained from built-in smartphone sensors, the

We implement a series of preprocessing steps to prepare the dataset for travel mode detection following previous research on travel behaviour analysis (Hong et al., 2023) and mode detection (Stopher et al., 2008; Dabiri and Heaslip, 2018). The detailed steps can be found in Appendix B. After preprocessing, we obtain 365,307 stages with user-validated mode labels grouped into bicycle, boat, bus, car, train, tram, and walk. The travel mode frequency is shown in Table 4, suggesting a highly imbalanced number of class labels, ranging from 367 stages for the class boat to 155,177 for the class walk.

The geospatial context data are obtained from OpenStreetMap (OSM) and Swiss Map Vector 25 (SMV25) 3. OSM is an open-source project that provides users with free and easily accessible digital map resources and is considered the most successful and prevailing volunteered geographic information (VGI) project (Hong and Yao, 2019). We retrieve historical feature layers from early 2017 in Switzerland from OSM to match the time frame of the GC tracking study. These layers include transport infrastructure, traffic-related POI, general POI, places of worship, road and railway infrastructure, as well as land cover type. As the water layer of the 2017 dataset does not include main lakes across Swiss borders, this layer is taken from the latest version of OSM (mid-2022). Besides, we retrieve information regarding the size of a river and whether it is navigable for boats from SMV25. Table 5 provides an overview of all layers used as input to extract geospatial context features.

Model training and evaluation metrics
Our pipeline is implemented in Python using the trackintel (Martin et al., 2023), scikit-learn (Pedregosa et al., 2011) and SHAP (Lundberg and Lee, 2017) libraries. We randomly split the GC dataset into non-overlapping train and test sets with a ratio of 8:2. We then perform a grid search using five-fold cross-validation on the training set to determine the optimum hyper-parameters. The detailed ranges and the final selected hyper-parameter set are given in Appendix C. We finally retrain the RF model using all the training data and evaluate the model performances on the held-out test set. SHAP values are obtained from test data samples using path-dependent feature perturbation for the Shapley value function (Equation 11), whose results are regarded as "true to the data" and reflect the natural mechanisms in the real world (Chen et al., 2020).
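The training protocol can be sketched with scikit-learn as follows. This is an illustrative outline on synthetic data, not the authors' code: the dataset from `make_classification`, the small hyper-parameter grid and the random seeds are placeholders, and the actual search ranges are those given in Appendix C.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the stage-level feature table (assumption).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

# Random 8:2 split into non-overlapping train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Placeholder grid; five-fold cross-validation on the training set only.
param_grid = {"n_estimators": [20, 50], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="f1_macro")
# With the default refit=True, the best configuration is retrained
# on the complete training set after the search.
search.fit(X_train, y_train)

# Evaluate once on the held-out test set.
y_pred = search.predict(X_test)
test_f1 = f1_score(y_test, y_pred, average="macro")
```

Keeping the test set out of the grid search ensures that the reported score is not influenced by hyper-parameter selection.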
We use the F1 score to assess the performance of our travel mode detection model. For each travel mode category, the F1 score is the harmonic mean of precision and recall, which are obtained by constructing the confusion matrix and counting the true positive (TP), false positive (FP), and false negative (FN) samples from the model:

$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}.$$

We use the average F1 score across classes, which weights the performance for each travel mode equally regardless of its number of instances, thus providing a suitable measure under class imbalance.
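The class-averaged F1 score can be worked through by hand on a small example. The sketch below counts TP, FP and FN per class and averages the resulting F1 scores without weighting; the function names and the toy labels are illustrative, and undefined ratios (zero denominators) are set to zero by assumption.

```python
def per_class_f1(y_true, y_pred):
    """F1 score of every class from TP, FP and FN counts."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


y_true = ["walk", "walk", "walk", "car", "car", "boat"]
y_pred = ["walk", "walk", "car", "car", "car", "walk"]
scores = per_class_f1(y_true, y_pred)
average_f1 = macro_f1(y_true, y_pred)
```

Here the single boat stage is missed, so the boat class scores zero and pulls the average down even though five of the six stages belong to well-detected classes, which is exactly the sensitivity to rare classes that motivates the measure.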

Feature extraction
We extract 32 features for each movement stage, consisting of 8 motion features and 24 geospatial context features. Figure 2 shows the distribution of three example features from different categories, highlighting clear distinctions between the considered travel modes. For instance, the 85th percentile of acceleration distinguishes low acceleration modes such as boat and walk from high acceleration ones such as car and tram (Figure 2A). Additionally, the stage endpoints for all travel modes except tram have a considerable distance to tram stops (Figure 2B). Finally, distance to road network successfully differentiates travel modes that operate on the road network (i.e., bus and car) from those that do not occupy road spaces, such as boat and train (Figure 2C). These examples demonstrate that features characterize movements from different perspectives and facilitate the separation of travel modes.

We conduct a correlation analysis to investigate the relationship between features. Figure 3 shows the heatmap of Spearman's rank correlation coefficient ρ between pairs of features, which accounts for differences in feature distributions and scales. The majority of light grid colours representing ρ values close to 0 show that most features are not strongly correlated, indicating that the implemented feature set effectively captures diverse movement characteristics without much redundant information. However, there are some exceptions. We observe darker grid colours for motion features, showing high positive correlations between length and duration as well as speed and acceleration. We also report high negative correlations between bearing rates and other motion features. Moreover, we find strong positive correlations between features related to train infrastructure and between features related to tram infrastructure. Highly correlated features have limited influence on the travel mode detection performance since RF is relatively robust to feature collinearity during training (Fernández-Delgado et al., 2014). However, the interpretation of individual feature contributions may be affected depending on the employed feature attribution method (Hastie et al., 2009; Molnar, 2020).
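Spearman's ρ is the Pearson correlation of the rank-transformed values, which is why it is insensitive to the scale and distribution of the features. A minimal sketch with hypothetical function names and toy values, assigning average ranks to ties:

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y (both must vary)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


length = [1.0, 2.0, 3.0, 4.0, 5.0]
duration = [1.0, 4.0, 9.0, 16.0, 25.0]   # monotone but non-linear in length
bearing_rate = [50.0, 40.0, 30.0, 20.0, 10.0]
rho_pos = spearman_rho(length, duration)
rho_neg = spearman_rho(length, bearing_rate)
```

The non-linear but monotone pair reaches the maximum correlation, and the strictly decreasing pair reaches the minimum.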

Travel mode identification result
We train an RF model using these features to identify travel modes for movement stages. The confusion matrix and the precision, recall and F1 score of travel modes are presented in Table 6. We achieve an overall accuracy of 93.0% and an average F1 score of 83.0%. Regarding individual classes, we report reliable detection for the most frequently observed travel modes, namely car, train, and walk, as indicated by their high F1 scores of over 90%. In addition, the RF model achieved high performance for the tram mode with an F1 score of 96.2%. On the other hand, the model experiences difficulty in correctly classifying less frequently observed travel modes, such as bicycle, bus, and boat. Specifically, movement stages recorded as bicycle modes are often misclassified as walk or car modes. This could result from the bicycle mode being formed from both conventional and electric bikes, leading to a wide range of movement behaviours. We also observe that bus trips are frequently misclassified as car movements, likely due to their similar motion characteristics and shared road infrastructure. In summary, the overall performance of the mode detection model provides an excellent foundation for estimating the relative importance of each input feature. Yet, we must consider the performance variations between travel modes when interpreting the feature importance result.

Evaluation of Feature Importance
Based on the trained RF model, we analyse the feature contribution to distinguishing travel modes using various feature importance assessment methods. This section focuses on feature importance obtained using
value factors all belong to motion features (Figure 4B). On the contrary, motion features are insufficient in separating bus and tram modes, whose most contributing elements are related to their corresponding geospatial context, such as bus and tram stops (Figure 4D and H). Here, it is evident that identifying bus and tram modes benefits from different context modelling approaches. We observe a high SHAP value of the endpoint-context distance for detecting bus trips, whereas the trajectory-context distance contributes more to predicting tram labels. Not surprisingly, identifying boat travel benefits the most from the water body feature (Figure 4G), which describes the unique characteristics of boats travelling on water. In summary, this detailed analysis reveals that feature importance is travel mode dependent, and features with low overall importance might be essential for identifying specific travel modes. Besides general motion features, geospatial context features that reveal characteristics of particular modes are also crucial for travel mode detection.

Discussion
This study proposes a feature attribution framework for travel mode detection models. We implement a set of motion and context features to train an RF model, which obtains convincing travel mode identification results. The SHAP feature attribution analysis reveals the essential role of geospatial features in the task.
Note that, given the variety of mode labels considered on a large-scale GNSS tracking dataset, the achieved performance is high compared with recent studies (Yang et al., 2022; Kim et al., 2022; Wu et al., 2022). Although the exact performance figures are not directly comparable due to the differences in the employed dataset and preprocessing steps, the result demonstrates the selected features' effectiveness and the RF model's capability. Moreover, the successful detection lays a solid foundation for subsequent feature importance assessment, as feature attribution methods aim to disentangle the decision-making process within the model.
We use SHAP values to measure the importance of features and compare them with those obtained from other importance assessment methods. Despite differences in relative importance, all assessment methods regard network features as the most contributing factors, and LULC features (except water body) as being of minor significance. The discrepancies mainly lie in the attribution of motion features, which are most heavily affected by feature correlation. Permutation and drop column importance both suffer from this issue, underestimating the contributions from motion features. MDI and SHAP reflect the internal decision process of the RF model, which is relatively robust to feature collinearity. Consequently, these two methods obtain similar attribution results for motion features. Nevertheless, MDI utilizes training data to derive its result and cannot reflect the model's generalization ability for unseen data. Therefore, with a solid foundation from game theory, SHAP is the most suitable and accurate approach to demonstrate the relative importance of geospatial context features.
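The game-theoretic idea behind SHAP can be sketched by computing exact Shapley values for a tiny, hand-written value function. This is not the tree-specific algorithm applied to the RF model. It is a brute-force enumeration over feature coalitions that is only feasible for a handful of features, and the value function `toy_score_gain` is a hypothetical stand-in for "model score when only these features are known".

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n_features):
    """Exact Shapley values for a set function value(frozenset) -> float."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Probability weight of a coalition of this size preceding feature i.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_score_gain(subset):
    """Hypothetical gain: features 0 and 1 are redundant, feature 2 is independent."""
    gain = 0.0
    if 0 in subset or 1 in subset:
        gain += 0.6   # either of the two correlated features is enough
    if 2 in subset:
        gain += 0.3
    return gain

phi = shapley_values(toy_score_gain, 3)
print("Shapley values:", [round(v, 3) for v in phi])
print("sum:", round(sum(phi), 3), "full-model gain:", toy_score_gain(frozenset({0, 1, 2})))
```

Two properties are visible in the toy example: the attributions sum to the total gain, and two fully redundant features split their shared contribution rather than each receiving zero.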
The SHAP importance value for features helps us understand why the RF detection model may confuse specific travel modes. We show this by comparing each travel mode's most contributing feature set, given in Figure 4B-H. Bicycle movements are detected relying on road and pedestrian networks as well as motion features, which are also the most contributing factors for identifying walking and car trips. Similarly, the distance to the road network is most influential for both car and bus trips. The detection model will likely misclassify a travel mode if its distributions in the most contributing features are similar to other modes. In other words, the lack of distinctive features for bicycle and bus movements limits their prediction performance, leaving room for improving existing and introducing new feature designs.
Although we have systematically evaluated the importance of geospatial context data, there are several points to consider when interpreting the results. Firstly, it is essential to distinguish the context feature importance obtained from our pipeline from the general significance of the underlying context information. Our study presents only one way to represent context information as features, and feature assessment results may vary with different modelling approaches. However, we note that the features and their representations implemented in this study are carefully selected following a systematic literature review. Secondly, the GC dataset is subject to common GNSS tracking quality issues, such as spatio-temporal gaps and spatial uncertainties in GNSS recordings (Zhao et al., 2021), which may affect the representativeness of the features and consequently influence the assessment results. Future studies should analyze the robustness of the assessment results to the quality of GNSS tracking data.

Conclusion
Methods proposed for travel mode detection from GNSS tracking data are increasingly powerful, yet little is known regarding the model's underlying working mechanism and how the model outputs a travel mode. To address this gap, this study introduces an analytical framework for assessing the significance of geospatial context information in travel mode detection models. Concretely, we review common feature representations from recent work and implement an exhaustive set of features that describe motion characteristics and geospatial context interactions of a moving trajectory. We analyse the correlations between these features and use them for training an RF model that learns to correctly identify the travel mode. Using the constructed RF model, we employ feature attribution methods to evaluate the influence of individual features on obtaining the output mode label. The framework is tested on a longitudinal GNSS tracking dataset containing user-labelled travel modes for trips over 52,000 user days.
The feature attribution results obtained in this study demonstrate that geospatial network features, such as distance to the road network, are more critical than motion features, such as speed and acceleration, when classifying an extensive list of travel modes. This finding highlights the importance of incorporating network features in travel mode detection models, especially given that many existing studies rely heavily on complex motion features without considering the geospatial context. We also find that features describing relations between movement and geospatial point entities help identify public transport travel with designated start and end stations. Additionally, our results suggest that the majority of LULC features do not significantly contribute to the task, emphasizing the need for further modelling work to represent the relationship between LULC and movements. Finally, we identify the most contributing features for detecting each mode, providing insights into the contexts that should be emphasized when aiming to classify specific travel modes accurately.
The proposed travel mode detection framework can be readily applied to movement datasets collected from other parts of the world, thanks to the high-quality and easily accessible OSM worldwide map service (Boeing et al., 2022). The study provides valuable guidance for feature selection, effective feature design, and building efficient travel mode detection models. Additionally, our results can inspire novel context integration approaches for DL models to represent the relationship between movement and geospatial context.

relationship, the drop in the model score indicates how much the model depends on the feature. We compute the permutation importance on the test set without retraining the RF model. However, it is essential to note that this importance score does not reflect the intrinsic predictive value of a feature by itself, but rather how important the feature is for a particular model. For example, features deemed low importance for a poor model could be essential for a good model. Therefore, ensuring a strong predictive power of the model is crucial before using this score. Additionally, permutation importance is biased for highly correlated features. As a well-trained model may still achieve good performance when one of the correlated features is shuffled, the permutation importance measure tends to underestimate the contribution of correlated features.
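The permutation procedure described above can be sketched as follows: a fixed, already "trained" model is scored on held-out data, one feature column at a time is shuffled, and the mean drop in score is recorded. Everything here is an illustrative assumption: the rule-based `predict` function stands in for the RF model, the two-feature dataset is invented, and accuracy is used for brevity where the text uses the F1 score.

```python
import random

def accuracy(y_true, y_pred):
    """Share of correctly predicted labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each column is shuffled; the model is not retrained."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = accuracy(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances

# Hypothetical test set: column 0 = mean speed (m/s), column 1 = unrelated noise.
X = [[1.2, 0.7], [1.5, 0.1], [0.9, 0.9], [1.1, 0.3],
     [12.0, 0.8], [15.5, 0.2], [9.8, 0.6], [20.1, 0.4]]
y = ["walk"] * 4 + ["car"] * 4

def predict(row):
    """Stand-in for a trained model: it only ever looks at the speed column."""
    return "walk" if row[0] < 3.0 else "car"

baseline, importances = permutation_importance(predict, X, y)
print("baseline accuracy:", baseline)
print("importance per feature:", importances)
```

The noise column receives an importance of exactly zero because the model never uses it, which illustrates that the score measures dependence of this particular model on a feature rather than the feature's intrinsic predictive value.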
Figure S2 shows the mean decrease of the F1 score when randomly shuffling each feature. Most feature permutations have a limited effect on the model's performance, as indicated by an F1 score decrease of less than 3%, which might be due to their observed inter-correlations (see Figure 3). The network features have much higher contributions than all other feature categories, with distance to road network (2.19) being the most important among them. Also, including proportion on water

Drop Column Importance. The drop column importance method evaluates the contribution of a feature by retraining a model without it. The underlying idea is that training a model without a feature will not significantly affect the performance if the feature is unimportant. However, this method is computationally expensive, requiring retraining the model as many times as the number of features. Additionally, it can underestimate the importance of correlated features. In extreme cases where two features are completely correlated, dropping one will not impact the model, as the remaining feature contains all the necessary information for training, resulting in a zero importance score.
Drop column importance is measured by the decrease in the F1 score of the test set, averaged across five different model (re-)trainings, as shown in Figure S3. This method can also produce negative values, which suggests the prediction performance increases when the feature is excluded from the model. Dropping each feature does not significantly influence the model's performance, with most differences in F1 score of less than 1%. We still observe a relatively high contribution of the network features, especially distance to road network (2.19). The distinction between all other features is small. The motion features demonstrate low importance compared to the other assessment methods, most likely due to their strong correlation (see Figure 3), which diminishes their contribution when measured using drop column importance.
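The extreme case of completely correlated features can be reproduced with a small sketch. A nearest-centroid classifier is used as a deliberately simple, deterministic stand-in for the RF model, so a single retraining per dropped feature suffices instead of the five averaged runs described above. The data and helper names are hypothetical, and accuracy replaces the F1 score for brevity.

```python
def fit_centroids(X, y):
    """Train a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))

def score(X_train, y_train, X_test, y_test):
    """Retrain on the training set and return test accuracy."""
    centroids = fit_centroids(X_train, y_train)
    hits = sum(predict(centroids, r) == t for r, t in zip(X_test, y_test))
    return hits / len(y_test)

def without(rows, cols):
    """Copy of the rows with the given column indices removed."""
    return [[v for j, v in enumerate(r) if j not in cols] for r in rows]

def drop_importance(X_train, y_train, X_test, y_test, cols):
    """Score of the full model minus the score of a model retrained without cols."""
    baseline = score(X_train, y_train, X_test, y_test)
    reduced = score(without(X_train, cols), y_train, without(X_test, cols), y_test)
    return baseline - reduced

# Hypothetical data: columns 0 and 1 are identical copies, column 2 is constant.
X_train = [[1.0, 1.0, 0.5], [2.0, 2.0, 0.5], [8.0, 8.0, 0.5], [9.0, 9.0, 0.5]]
y_train = ["walk", "walk", "car", "car"]
X_test = [[1.5, 1.5, 0.5], [8.5, 8.5, 0.5]]
y_test = ["walk", "car"]

single = [drop_importance(X_train, y_train, X_test, y_test, {j}) for j in range(3)]
joint = drop_importance(X_train, y_train, X_test, y_test, {0, 1})
print("drop one column at a time:", single)
print("drop both correlated columns:", joint)
```

Dropping either copy alone costs nothing because the other copy carries the same information, whereas removing both at once degrades the model, so a zero drop column score does not by itself show that the underlying information is unimportant.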

Figure 2: Boxplot showing the feature distribution categorized based on the ground truth travel mode. Three example features, 1.6 $A_{85th}$ (A), 2.4 $D^{max}_{tram}$ (B) and 2.19 $D_{roadN}$ (C), are selected to show the effectiveness of the implemented features in separating travel modes. Outlier values are excluded.

Figure 3: Spearman's rank correlation coefficient between pairs of features. All pairs have two-tailed P < 0.05 except for 2.21 $P_{water}$ and 2.6 $D^{max}_{bus}$ (P = 0.13).
Travel mode category. SHAP values are calculated at the level of individual observations and can thus reflect detailed feature attribution for each travel mode category. We present the five most crucial features for detecting each travel mode in Figure 4B-H. Although these features vary, we observe shared patterns for different travel modes. The corresponding network feature is always the most important contributing factor for modes restricted to specific infrastructures. Typical examples include cars, buses, trains and trams, whose infrastructure network is the most contributing feature, considerably exceeding the importance of the second most crucial one (Figure 4C, D, F, and H). In addition, motion features such as average speed and acceleration are commonly assigned high SHAP values, showing their importance in distinguishing all travel modes. Their contributions are particularly evident in identifying walking, where the highest SHAP

(2.21) significantly impacts the model's performance. Other decisive contributing factors include motion features (1.1 and 1.3), endpoint features (2.5, 2.6, 2.11, and 2.12) and distance to residential areas (2.23). The standard deviation across different permutation runs, indicated by the error bars, is very low, suggesting the permutation importance result is relatively stable.

Figure S2: The feature importance evaluated using permutation. The error bars indicate standard deviations calculated from five different permutations.

Figure S3: The feature importance evaluated using the drop column method. The error bars represent standard deviations calculated from five different model fits.

Figure S4: Travel mode prediction performance when altering the hyper-parameters of RF. The error bars represent standard deviations calculated from five-fold cross-validation. We show the performances for sample-weighted and original RF when tuning the maximum tree depth with 150 estimators (A), and when tuning the number of estimators with a maximum tree depth of 21 (B).

Table 1: Summary of input features in related literature (PT: public transport).

Table 2: Description of motion features.

Table 3: Description of geospatial context features.
application segments the recorded GNSS traces into stages of continuous movements and staypoints where users are stationary. It additionally imputes the travel mode labels for stages (also based on motion measurements), which are later confirmed or corrected by the study participants. The GC dataset consists of ∼230 million GNSS track points, aggregated into 465,195 stages with travel mode labels (car, e-car, train, bus, tram, bicycle, e-bicycle, walk, airplane, boat, coach), and includes information about individual travel behaviour for 52,251 user days.

Table 4: Travel mode frequency of stages.

Table 5: The sources, geometry types and descriptions of the considered geospatial context data.

Table 6: Confusion matrix and performances for travel mode identification.

model does not deem their contained information crucial. Additionally, geospatial point features moderately contribute to generating the output. Context information regarding public transport, including rail, tram and bus stops (2.1-2.6 and 2.13-2.15), is more valuable than the information related to car, bicycle and ship parking (2.7-2.12). The distance to POIs (2.16) is among the most influential features in this category. Lastly, except for the high contribution of the water body feature (2.21), the other LULC features (2.22-2.24) have limited impacts on the detection model, as shown by their lowest SHAP values compared to all other features. We believe that motion and other context features are sufficient in distinguishing travel modes, and the ancillary LULC context information does not provide additional knowledge for the RF model.