MODELLING TRAVEL TIME DISTRIBUTION AND ITS INFLUENCE OVER STOCHASTIC VEHICLE SCHEDULING

Due to the paucity of well-established modelling approaches or well-accepted travel time distributions, the existing travel time models are often assumed to follow certain popular distributions, such as normal or lognormal, which may lead to results deviating from actual ones. This paper proposes a modelling approach for travel times using distribution fitting methods based on the data collected by Automatic Vehicle Location (AVL) systems. With the proposed approach, a compound travel time model can be built, which consists of the best distribution models for the travel times in each period of a day. The compound model is then applied to stochastic vehicle scheduling to study the influence of different travel time models. Results show that the compound model fits the actual travel times more precisely under various traffic situations, whilst the on-time performance of the resulting vehicle schedules can be improved. The research findings also have potential benefit for other research based on travel time models in public transport, including timetabling, service planning and reliability measurement.


Introduction
In public transport (e.g. bus, tram and train), a service line contains a series of intermediate stops between two termini (or with one terminal in a circle line). The service from one terminal to another is called a trip, the duration of which is called a trip time and consists of a set of segment travel times between consecutive stops and a set of dwell times at the intermediate stops. The travel time on each route segment usually varies dramatically between different (e.g. peak or off-peak) periods of a day due to fickle traffic, uncertain passenger demand, vehicle malfunction, etc. It is well known that travel time is hard to measure and predict precisely (Chakroborty, Kikuchi 2004; Vu, Khan 2010; Ng et al. 2011). However, travel times and their distributions are essential and fundamental for research in different areas of public transport, including, for example, bus route design (Kuo et al. 2013), service planning (Furth, Muller 2007), reliability measurement (Van Lint et al. 2008; Mazloumi et al. 2010), timetable optimization (Yan et al. 2012), bus dispatching (Dessouky et al. 1999), and vehicle scheduling (Zhao et al. 2006; Xu, Shen 2012). Moreover, travel time is also a concern of passengers and is one of the key criteria for bus service quality (Abkowitz, Tozzi 1987; Uniman et al. 2010; Van Oort et al. 2012; Mori et al. 2015). Therefore, modelling travel times properly is of great significance to research and practice in public transport.
Since the distribution of travel times can reflect the nature and shape of the travel time variability, it has been increasingly applied as a measurement of travel times (Srinivasan et al. 2014). Various distribution models have been suggested for travel times, amongst which the normal and lognormal distributions are the most popular. For instance, in a study of travel time reliability measurement, Mazloumi et al. (2010) suggested that the travel time tended toward a normal distribution in peak time and a lognormal distribution in off-peak time. In a study of the timetable optimization problem, Yan et al. (2012) aimed to minimize the early and late arrival times, with stochastic bus travel times assumed to follow the lognormal distribution. Dessouky et al. (1999) summarized the travel time distribution models applied in thirteen studies on dispatching problems, in which the normal, lognormal and gamma distributions were mostly used. Kieu et al. (2015) proposed a comprehensive hybrid approach to investigate the travel time distribution, in which the lognormal distribution was finally selected from 23 candidate distribution models to describe the travel time variability and was then applied to set the recovery times in a timetable. Taylor and Susilawati (2012) found that the Burr distribution fitted the bus travel times in a real-world instance better, since it has a flexible shape and is able to describe data with significant skewness. In general, research based on assumed travel time distributions has been active in many areas for decades. However, it is still hard to judge which of these commonly assumed models generally fits actual travel time distributions, since the observed travel time distributions vary dramatically between scenarios.
Amongst the application areas of travel time distributions in public transport, vehicle scheduling is selected here to demonstrate the significance of modelling travel time distributions, since travel times are essential for vehicle scheduling and affect the on-time performance and operating cost of vehicle schedules (Ceder 2007). The Vehicle Scheduling Problem (VSP) is concerned with the allocation of a set of predetermined service trips to a fleet of vehicles in such a way that the total number of vehicles and the operating cost are minimized (Huisman et al. 2004). In a resulting schedule, the daily work of a vehicle starts with a pull-out from a depot, followed by a sequence of service trips, and ends with a pull-in to the depot. In traditional VSPs, the travel time for each service trip (i.e. the scheduled trip time) is assumed to be fixed and is set deterministically in advance. The scheduled trip time profoundly affects the on-time performance and cost of vehicle schedules (Zhao et al. 2006). Early practice of setting scheduled trip times was normally based on human experience (Furth, Muller 2007), which results in a lack of accuracy.
In the past decades, transport systems have been increasingly equipped with Automatic Vehicle Location (AVL) systems. The large amounts of AVL data recorded by these systems have facilitated the setting of scheduled trip times for vehicle scheduling. Various methods have been developed for setting fixed scheduled trip times more accurately based on the AVL data (Muller, Furth 2000; TRB 2003; Furth et al. 2006; Salicrú et al. 2011).
To achieve better on-time performance and cost efficiency of vehicle schedules, stochastic instead of fixed scheduled trip times are considered in vehicle scheduling. In such vehicle scheduling methods, a set of scenarios of the trip time probability is generally applied, which is derived from the AVL data in advance. For example, Huisman et al. (2004) proposed a robust solution approach to the dynamic VSP, in which a stochastic programming problem was solved periodically with the consideration of future travel times in multiple scenarios. Each scenario provided the travel time probability based on historical data. Later, using these scenarios, Naumann et al. (2011) proposed a stochastic approach for vehicle scheduling aiming to reduce planned costs and disruption costs caused by delays.
Obviously, introducing stochastic trip times to the VSP has increased the reliance on travel time distributions. However, the quantity and quality of the AVL data may be unreliable since these systems are often designed and deployed for monitoring purposes. Such shortcomings would bring inaccuracy into the scenarios derived from the data without calibration and hinder the application of the scenarios to the VSP. Consequently, either these scenarios or the commonly assumed travel time distributions (e.g. normal or lognormal) may deviate from actual travel time distributions and influence the accuracy of computational results.
Different from the common practice of using assumed distribution models, this paper proposes a modelling approach for travel times using distribution fitting methods based on AVL data. The proposed modelling approach aims to build a compound travel time model consisting of the best distribution models for the travel times in each period of a day. The details of the approach are presented in Section 1. To demonstrate its value and efficiency, an application to the Stochastic Vehicle Scheduling Problem (SVSP) is carried out, and the SVSP model is described in Section 2. Experiments are presented in Section 3, where results show that the compound model fits the actual travel times more precisely under various traffic situations, whilst the on-time performance of the resulting vehicle schedules can be improved. Finally, the concluding remarks are given in the last section.

Modelling travel time distributions using distribution fitting methods
To obtain a compound model that fits actual travel time distributions, a series of basic statistical tools are employed in an integrated manner in our travel time modelling approach. The basic idea is as follows: an empirical travel time distribution for each period of a day is first obtained based on AVL data; then a set of candidate distribution models is suggested in light of the normality of the empirical distributions; finally, the model that best fits the actual travel times is decided for each period of a day using distribution fitting methods, and a compound travel time model is then constituted from these best distribution models for each period.

Obtain the empirical distributions of travel times based on AVL data
With the data pre-processing method introduced by Xu and Shen (2012), the observed travel time samples are first extracted from the AVL data. Considering the factors that can distort travel times, the AVL data corresponding to different kinds of days, such as workdays, weekends, holiday seasons and periods with seasonal climatic conditions or long-term construction projects, is normally classified into different sample sets. Moreover, some missing or wrong AVL records may exist due to errors in the recording or matching procedure; therefore, these incorrect or missing records are corrected, filtered out or made up in advance.
Given a set of pre-generated samples, an empirical distribution can be produced by counting up the frequency of occurrences for each travel time value. However, the travel time on a service line, especially on a bus line, varies dramatically between different periods (e.g. peak and off-peak periods) throughout a day. Therefore, to model the travel time distribution, it would be better to partition the service span of a day into a series of Homogeneous Travel Time (HTT) bands, and then to build the empirical distribution model for each HTT band.
In practice, the HTT bands are often simply decided by human schedulers according to the distinct peak and off-peak periods. To get more precise HTT bands, Xu and Shen (2012) propose a heuristic partitioning method and Shen et al. (2014) propose a k-means based clustering method for partitioning the HTT bands based on a sample set of travel times derived from original AVL data. To avoid very short HTT bands, the resulting HTT bands can be adjusted manually and finally decided by operators in practice. Figure 1 gives an example, where the service span is partitioned into seven HTT bands based on the observed travel times.
Having identified the HTT bands, the empirical distribution of travel times for each HTT band can be calculated based on the corresponding sample set. From the empirical distribution, the shortest travel time (i.e. a lower bound) and the longest travel time (i.e. an upper bound) can be easily identified, which represent the free-flow speed and the slowest speed of vehicles respectively. Figure 2 illustrates an example of two distinct empirical distributions of the travel time associated with two HTT bands.
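The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the record layout (departure hour, trip time in minutes) and the rounding of trip times to whole minutes are hypothetical and are not taken from the paper; the two band intervals reuse the AM peak and Day periods quoted for Figure 2.

```python
from collections import Counter

def group_by_band(records, bands):
    """Assign each (departure_hour, trip_time_min) record to its HTT band.

    bands maps a band name to a half-open interval (start_hour, end_hour).
    Records outside every band are ignored.
    """
    grouped = {name: [] for name in bands}
    for departure_hour, trip_time in records:
        for name, (start, end) in bands.items():
            if start <= departure_hour < end:
                grouped[name].append(trip_time)
                break
    return grouped

def empirical_distribution(trip_times_min):
    """Return ({trip time: relative frequency}, lower bound, upper bound)."""
    # Assumption: trip times are binned to whole minutes before counting.
    counts = Counter(round(t) for t in trip_times_min)
    n = sum(counts.values())
    pmf = {t: c / n for t, c in sorted(counts.items())}
    return pmf, min(pmf), max(pmf)

# Hypothetical records, for illustration only.
bands = {"AM peak": (7.5, 11.5), "Day": (11.5, 16.5)}
records = [(8.0, 50), (9.0, 52), (9.5, 50), (12.0, 45), (15.0, 47)]
for name, samples in group_by_band(records, bands).items():
    print(name, empirical_distribution(samples))
```

The returned lower and upper bounds correspond to the shortest and longest observed travel times in the band.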

Check the normality
Given an HTT band, the travel times are frequently assumed to follow a normal distribution, or a lognormal distribution when a skewed distribution is considered appropriate (Li et al. 2006; Pu 2011). However, by observing the empirical distributions of travel times (Figure 2), it can be found that the patterns and shapes of the distributions can vary considerably, even between different HTT bands on the same line.
To select a set of proper candidate distributions, quantitative information on how close an empirical distribution is to a normal distribution is obtained by checking its normality. The skewness and kurtosis are computed first, which measure the asymmetry and the peakedness/flatness of the distribution respectively.
Evidence provided in Van Lint et al. (2008) has shown that differences in the shapes of travel time distributions significantly affect travel time reliability. Specifically, a skewed and wide distribution can result in worse travel time reliability. Therefore, quantifying the shape of the empirical distribution in each HTT band in terms of skewness and kurtosis helps to characterize the travel time reliability of the bus line. For instance, given the two empirical distributions in Figure 2, it can be seen that the distribution at AM peak (7:30…11:30) appears more peaked than that at Day (11:30…16:30), and both distributions are recognizably left skewed, i.e. their mass is concentrated on the left with a long right tail (positive skewness). The computational results show that the kurtosis values for AM peak and Day are 5.93 and 3.45 respectively, and the skewness values are 1.17 and 0.58 respectively.
Based on the skewness and kurtosis, the Jarque-Bera (J-B) test is then carried out to examine whether the empirical distribution has the skewness and kurtosis matching a normal distribution at, e.g., the commonly adopted significance level of 0.05. That is, the test rejects the null hypothesis (of normality) at the 5% significance level if its p-value is less than 0.05. According to the J-B test, the hypothesis of normality is rejected for the two empirical distributions in Figure 2, so skewed distribution models would fit better. Using this approach, the normality of the distributions for all HTT bands can be tested. Experiments have shown that left skewed empirical distributions are mostly observed, while in some cases either symmetric or slightly right skewed distributions exist. Therefore, the commonly used symmetric and left skewed distribution models, such as normal, lognormal, gamma and Weibull, are selected as the candidate distribution models for the fitting. Moreover, the Burr distribution should also be an appropriate candidate as it can be either left skewed or right skewed. The Probability Density Functions (PDFs) and parameters of the candidate distribution models are given in Table 1, where the normal distribution is a truncated normal distribution.
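As a rough illustration of the normality check, the following Python sketch computes the sample skewness, the (non-excess) kurtosis and the Jarque-Bera statistic with the standard library only. The function name is hypothetical. The kurtosis convention (a normal distribution scores 3) is an assumption chosen because it is consistent with the values of 5.93 and 3.45 quoted above, and the p-value uses the asymptotic chi-square distribution with two degrees of freedom, which may be inaccurate for small samples.

```python
import math

def normality_check(samples, alpha=0.05):
    """Return (skewness, kurtosis, JB statistic, p-value, rejected)."""
    n = len(samples)
    mean = sum(samples) / n
    # Central moments of order 2, 3 and 4 (biased, i.e. divided by n).
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Survival function of chi-square with 2 degrees of freedom.
    p_value = math.exp(-jb / 2)
    return skewness, kurtosis, jb, p_value, p_value < alpha
```

A positive skewness returned by this function corresponds to the long right tail that the text calls left skewed.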

Distribution fitting for candidate distribution models
Given a set of candidate distribution models, a distribution fitting method is devised to select the most suitable distribution model from the candidates. The main steps are as follows. Firstly, the maximum likelihood estimation method is adopted to compute the distribution parameters for each candidate. Secondly, each candidate distribution model is truncated with the lower and upper bounds corresponding to the empirical distribution. Thirdly, the Goodness-Of-Fit (GOF) is tested by the Kolmogorov-Smirnov (K-S) test: for any candidate model, if the p-value of the K-S test is larger than the pre-defined significance level α (say α = 0.05), the model is not rejected and is accepted as fitting the data. Fourthly, the models are compared further using Akaike's Information Criterion (AIC) (Akaike 1974), which measures the relative quality of a candidate model by trading off the GOF against the number of parameters of the model. As a result, the best fitted distribution for each HTT band can be decided, and together these distributions constitute a compound model for the travel times during an entire day.
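The fitting steps can be sketched with SciPy as below. This is a simplified illustration, not the authors' implementation. Several choices are assumptions: the mapping of the candidate names to SciPy distributions (in particular, "Burr" is taken to be the Burr Type XII family, `scipy.stats.burr12`), fixing the location parameter at zero for the positive-support families, and omitting the truncation step. Note also that K-S p-values computed with parameters estimated from the same sample are only approximate.

```python
import numpy as np
from scipy import stats

# Assumed mapping from the candidate names in the text to SciPy families.
CANDIDATES = {
    "normal": stats.norm,
    "lognormal": stats.lognorm,
    "gamma": stats.gamma,
    "Weibull": stats.weibull_min,
    "Burr": stats.burr12,
}

def fit_candidates(samples, alpha=0.05):
    """Fit each candidate by MLE; report K-S p-value, acceptance and AIC."""
    x = np.asarray(samples, dtype=float)
    results = {}
    for name, dist in CANDIDATES.items():
        # Fix loc = 0 for positive-support families (an assumption).
        fixed = {} if name == "normal" else {"floc": 0}
        try:
            params = dist.fit(x, **fixed)
        except (RuntimeError, ValueError):
            continue  # skip a family whose optimiser fails on this sample
        loglik = float(np.sum(dist.logpdf(x, *params)))
        k = len(params) - len(fixed)  # number of free parameters
        ks = stats.kstest(x, dist.cdf, args=params)
        results[name] = {
            "params": params,
            "ks_pvalue": ks.pvalue,
            "accepted": ks.pvalue > alpha,
            "aic": 2 * k - 2 * loglik,
        }
    return results

def best_by_aic(results):
    """Pick the lowest-AIC model among accepted ones (all, if none accepted)."""
    accepted = {n: r for n, r in results.items() if r["accepted"]}
    pool = accepted or results
    return min(pool, key=lambda n: pool[n]["aic"])
```

Applying `best_by_aic(fit_candidates(samples))` to the sample set of each HTT band would yield one model per band, which is the idea behind the compound model.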

Modelling the SVSP
The model of the traditional VSP is given first. Let $(i, j)$ denote any arc between two trips or between a depot and a trip. Given a set of depots $D$, $A_d$ denotes the set of all arcs corresponding to the depot $d \in D$, and $P_d$ and $Q_d$ denote the sets of pull-out arcs from and pull-in arcs to the depot $d \in D$ respectively. The traditional VSP can be presented as a network flow model in Equations (1)-(5); for an overview of vehicle scheduling models, see Bunte and Kliewer (2009).
In Equations (1)-(5), $x_{ij}$ is a decision variable with $x_{ij} = 1$ if the arc $(i, j)$ is selected and $x_{ij} = 0$ otherwise. In the objective function (1), $C_{ij}$ is the operating cost associated with the arc $(i, j)$, and $C_{veh}$ is a large constant that penalizes the utilization of an additional vehicle. Constraints (2) and (3) are the flow-conservation constraints. Constraint (4) guarantees that each trip is covered by exactly one vehicle, and constraint (5) [...] Idle time and deadhead refer to the waiting time and empty movement between any two consecutive trips respectively. In the traditional VSP model, the trip times are fixed; therefore, the idle time is calculated as: [...]
Different from the traditional VSP model, the SVSP model is proposed by Shen et al. (2016), where each trip has a stochastic trip time. In the SVSP model, the idle time $ID_{ij}$ of any trip-link arc $(i, j)$, as illustrated in Figure [...] Moreover, when stochastic trip times are considered, the feasibility of linking any two trips $i$ and $j$ is no longer deterministic. Therefore, $IF_{ij} = t_i^e - T_j^s + DH_{ij}$ is defined as the infeasible time of the arc $(i, j)$, and $IF_{ij} = 0$ if the arc is time feasible. The larger the infeasible time is, the more likely a delay is to occur, which indicates worse on-time performance of the trip.
Hence, the penalty of the infeasible time $P_{ij}$ is defined as: [...] Based on Equations (7) and (8), the SVSP is established by adapting the objective function as Equation (9), where $a$ is a non-negative coefficient that balances the conflicting objectives of minimizing the cost and maximizing the on-time performance. In light of this objective function, any vehicle schedule can be evaluated in view of the cost and the penalty of infeasible time based on trip time distributions.
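To make the notion of infeasible time more concrete, the sketch below evaluates it for a single trip-link under a discrete trip time distribution. It is an illustration under assumptions, not the penalty of the SVSP model, whose exact form is not reproduced here. It reads the end time of the first trip as its departure time plus a random trip time, so that the infeasible time becomes max(0, trip time + deadhead - allowance), where the allowance is the difference between the departure times of the two linked trips. The function name and the example probabilities are hypothetical.

```python
def infeasibility_measures(pmf, allowance_min, deadhead_min=0.0):
    """Return (expected infeasible time, probability of infeasibility).

    pmf maps a trip time in minutes to its probability.
    allowance_min is the gap between the departure times of the two trips.
    """
    expected_infeasible = 0.0
    prob_infeasible = 0.0
    for trip_time, prob in pmf.items():
        # Infeasible time is zero whenever the link is time feasible.
        infeasible = max(0.0, trip_time + deadhead_min - allowance_min)
        expected_infeasible += prob * infeasible
        if infeasible > 0:
            prob_infeasible += prob
    return expected_infeasible, prob_infeasible

# Hypothetical distribution, for illustration only.
pmf = {50: 0.5, 55: 0.3, 65: 0.2}
print(infeasibility_measures(pmf, allowance_min=60))
```

Both quantities shrink as the allowance grows, which is why a distribution with an underestimated right tail makes tight links look safer than they are.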

Experimental results
The proposed modelling approach has been implemented. Experiments have been carried out on Route 4 of Haikou Bus (HKB4 for short) in China, for which the AVL data was recorded from May to September 2010. In this instance, only workdays are considered. Furthermore, the data recorded during the first week of September is excluded, since HKB4 passes by three schools and the travel times on school opening days may vary dramatically from those on normal workdays. In this section, the data set of HKB4 is analysed first, after which three groups of experiments are presented. The first is on modelling trip time distributions. The second is on the evaluation of a vehicle schedule based on the different distribution models. The third is on applying the established distribution model to the SVSP, which aims to demonstrate the value of our proposed modelling approach.

Analysis of the trip time distributions of HKB4
The service span of a day is partitioned into 7 and 6 HTT bands for the outbound and inbound directions respectively. The empirical distributions and the median trip times for each HTT band on the outbound and inbound are shown in Figures 4 and 5 respectively. Based on the shapes of the empirical distributions in Figures 4 and 5, the following three traffic situations can be distinguished: 1) As shown in Figures 4a, 4c, 4e, 4g, 5a, 5f, the trip time distributions are approximately symmetric and the median trip times are low, which indicates that the traffic conditions are mostly free-flow. This situation is called the free flow situation; 2) As shown in Figures 4b, 5b, 5c, 5d, 5e, the median trip times are low and the distributions are significantly left skewed, which reflects that the traffic conditions are free-flow in most cases while congestion occurs in some cases. This situation is called the mixed traffic situation; 3) As shown in Figures 4d, 4f, the median trip times are high, and the distributions are slightly right skewed. This shape indicates that the traffic conditions are mostly congested, while in a few cases buses experience smoother traffic, which is consistent with the conclusion of Van Lint et al. (2008). This situation is called the heavily congested situation.

Experiments of modelling the trip time distributions of HKB4
Before the distribution fitting, the normality check is carried out for each HTT band. The results show that the trip times follow left skewed distributions at most HTT bands and follow normal or right skewed distributions in rare situations. In light of this finding, five distributions have been selected as the candidate distributions: normal, lognormal, gamma, Weibull and Burr. Amongst them, the gamma and lognormal distributions are two commonly used left skewed distributions, while the Burr distribution is flexible and can be left or right skewed by adjusting the parameters (it can also revert to a shape close to a normal distribution). Based on these five candidate distributions, the trip time distributions of HKB4 are modelled for each HTT band. The corresponding fitted parameters are shown in Table 2, while the results of the normality check and K-S tests are listed in Table 3.
It can be seen from Table 3 that the Burr distribution has been accepted for 12 out of 13 HTT bands. This result is slightly different from that of Kieu et al. (2015), where the lognormal distribution was suggested as the best fitted. However, Kieu et al. (2015) also indicate that the Burr distribution is the best fitted if the bus travel time is long on average with a long-tailed and left skewed distribution. In the case of HKB4, the trip time pattern fits this description since this bus line passes through the downtown of the city and delays occur frequently. Consequently, the Burr distribution adapts well to most of the HTT bands. Table 3 also shows that the normal and lognormal distributions are each accepted for 6 HTT bands, while the gamma distribution is accepted for 5 HTT bands. However, the Weibull distribution is rejected for all the HTT bands.
To demonstrate the fitting of the distribution models to empirical distributions with different shapes, 3 HTT bands corresponding to the distinguished traffic situations (i.e. the free flow, mixed traffic and heavily congested situations) are given as examples. As shown in Figures 6a, 6b, 6c respectively, the empirical distribution at AM peak outbound has a typical left skewed shape, while it has an approximately symmetric shape at Day outbound and a slightly right skewed shape at PM peak outbound.
It can be seen from Figure 6a that the normal and lognormal distributions are obviously flatter than the empirical distribution, while the Burr distribution matches the data well. In Figures 6b, 6c, the difference amongst the three distribution models is not obvious. The Burr distribution shows good adaptation, reverting to an approximately symmetric shape, although it is slightly more peaked than the other two distribution models.
Next, the AIC provides further comparison of these distributions. The best fitted distributions in terms of AIC index for each HTT band are listed in Table 4.
From Table 4, we can see that for the outbound of HKB4, the normal distribution is the best model for 3 HTT bands: AM early, Eve peak and Night. The Burr distribution best fits the AM and PM peaks, and the lognormal distribution fits the HTT bands following the AM and PM peaks. The situation for inbound is quite different, where the Burr distribution is the best fitted model for every HTT band. The compound travel time model for the entire day is then established from the best fitted distributions for each HTT band.
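One way to picture a compound model is as a lookup from the departure time to the distribution selected for the corresponding HTT band. The Python sketch below illustrates this with the standard library only. All band boundaries and parameter values are hypothetical placeholders rather than the fitted values of Tables 2 and 4, and the Burr family is assumed to be Burr Type XII, sampled through its inverse CDF.

```python
import bisect
import math
import random

# Hypothetical band start times (hours) and per-band models.
# The last band is open-ended in this sketch.
BAND_STARTS = [6.0, 7.5, 11.5, 16.5]
BAND_MODELS = [
    ("normal", {"mu": 45.0, "sigma": 3.0}),
    ("burr", {"scale": 50.0, "c": 20.0, "k": 1.0}),
    ("lognormal", {"mu": math.log(48.0), "sigma": 0.08}),
    ("burr", {"scale": 58.0, "c": 18.0, "k": 1.2}),
]

def model_for(departure_hour):
    """Return the (family, parameters) pair of the band containing the time."""
    idx = bisect.bisect_right(BAND_STARTS, departure_hour) - 1
    if idx < 0:
        raise ValueError("departure is before the first HTT band")
    return BAND_MODELS[idx]

def sample_trip_time(departure_hour, rng=random):
    """Draw one trip time (minutes) from the model of the matching band."""
    family, p = model_for(departure_hour)
    if family == "normal":
        return rng.gauss(p["mu"], p["sigma"])
    if family == "lognormal":
        return rng.lognormvariate(p["mu"], p["sigma"])
    # Burr Type XII inverse CDF, with F(x) = 1 - (1 + (x/scale)^c)^(-k).
    u = rng.random()
    return p["scale"] * ((1.0 - u) ** (-1.0 / p["k"]) - 1.0) ** (1.0 / p["c"])
```

A single-distribution model is the special case in which every band shares the same family.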

Evaluation results of a given vehicle schedule
As described in Formulae (7) and (8), the operating cost and the penalty of infeasible time are two measures introduced in the SVSP model, which are functions of the trip time distribution. Therefore, any vehicle schedule can be evaluated based on a given trip time distribution model. In this experiment, a pre-compiled vehicle schedule S0 is first given and then evaluated based on five different distribution models: the empirical, normal, lognormal and Burr distributions and the compound model. The evaluation results are displayed in Table 5, where the RPD denotes the Relative Percentage Deviation over the evaluation result of the empirical distribution. It can be seen from Table 5 that the evaluation results, especially the total penalties, vary significantly between trip time models. Compared with the penalty measured by the empirical distribution, the results of the normal and lognormal distributions are significantly lower, while the results of the Burr distribution and the compound model are slightly higher. To provide insight into the measurement of the cost and penalty based on different distribution models, a further analysis is carried out and presented as follows.

Analysis on the cost and penalty of the trip-link arcs
To further examine the influence of different trip time distribution models on the operating cost and penalty of a schedule, this section analyses the influence on the cost and penalty of a trip-link, since the operating cost of a schedule is usually defined as the sum of the costs of all the links in the schedule.
Firstly, we take typical trip-links from AM peak, Day and PM peak of HKB4 outbound respectively as examples for the measurement of cost and penalty. Figures 7-9 show their costs and penalties calculated based on different trip time models, where the deadhead time is assumed to be zero to make the figures clearer. No generality is lost with this assumption since the deadhead time is deterministic. Moreover, it should be noted that in our proposed VSP model a trip does not have a fixed duration as is usually predetermined in the traditional VSP; therefore, to measure the feasibility of a trip-link, we define a max trip allowance for any trip as the difference between the departure times of the two trips in the link. It can be seen from Figures 7a, 8a, 9a that the costs obtained based on different distribution models are very close. In contrast, as shown in Figure 7b referring to AM peak (mixed traffic situation), the penalties obtained based on the normal and lognormal distributions are recognizably underestimated compared with those of the empirical distribution and the Burr distribution, and drop nearly to 0 when the max trip allowance approaches 60 min. Figure 8b, referring to Day (free flow situation), shows that the penalties obtained based on the normal and lognormal distributions are closer to those of the empirical distribution, while the Burr distribution causes slight overestimation. Figure 9b, referring to PM peak (heavily congested situation), shows that the penalties obtained based on different distribution models are close; meanwhile, the Burr distribution causes slight overestimation when the max trip allowance is smaller than 65 min and slight underestimation when it is between 65 and 70 min.
Therefore, the Burr distribution fits the empirical distribution considerably better than the normal and lognormal distributions in the mixed traffic situation, where the empirical distribution is heavily left skewed, and the three distributions are all close to the empirical distribution in the other two traffic situations. From Figures 7-9, we can find that under different traffic situations, different trip time models have different degrees of proximity to the empirical distribution in view of the cost and penalty. Hence, we believe that it might be necessary to use a compound model instead of a single model to better characterize the travel time in various traffic situations.

Analysis on the discontinuous distribution during off-peak periods
Moreover, we found that during off-peak periods, e.g. AM early, the empirical distributions may be discontinuous. For instance, the value of the empirical distribution at AM early as shown in Figure 10a, is 0 when the trip times are 36, 52, 53 and 54 min. This phenomenon, not caused by missing data, may be caused by low frequency and high on-time performance during off-peak periods.
To gain an insight into the travel time distribution during off-peak periods, we take a typical trip-link arc from the AM early band as an example for the measurement of penalty. Figure 10 shows the distributions fitted to the empirical distribution and the penalty of a trip-link arc calculated based on different trip time models at AM early. Figure 10b illustrates the penalties calculated based on the fitted distribution models (the Burr and normal distributions respectively) and the empirical distribution. It can be seen that the penalty calculated based on the empirical distribution is obviously lower than that of the fitted distribution models, and it decreases to nearly 0 when the maximum trip allowance reaches 50 min. In contrast, the penalties calculated based on the Burr and normal distributions descend more smoothly towards the right tail. However, the Burr and normal distributions both cause overestimation of the penalty in comparison with the empirical distribution. From Figure 10, we can find that distribution fitting can be used as a tool to fill in discontinuous empirical distributions; however, the fitted distribution might overestimate the probabilities of the travel times corresponding to the discontinuous parts.

Experiment of applying different trip time distributions to the SVSP model
To validate the efficiency of the proposed approach, the compound model is applied to the SVSP presented in Section 2. Meanwhile, to reveal the impact of using different distribution models in the SVSP, four other models (i.e. the empirical, normal, lognormal and Burr distributions) are also applied to the SVSP. The SVSP is solved using CPLEX, which is a well-known commercial optimization software package. The coefficient a in the SVSP model is set to 1.5 based on the experiments presented in Shen et al. (2016). The resulting schedules are given in Table 6. Five schedules S1…S5 have been produced as shown in Table 6, where S1 is produced based on the empirical distribution to serve as a benchmark schedule, S2…S4 are produced based on the normal, lognormal and Burr distributions respectively, and S5 is produced based on the compound model established before. To compare the schedules consistently, the empirical distribution now serves as the common distribution for measuring the cost and penalty of all the schedules, where the evaluation value of a schedule is defined as the sum of its cost and weighted penalty, and the RPD denotes the relative percentage deviation over the benchmark schedule S1.
As shown in Table 6, S2 and S3 have lower costs and much higher penalties than S1, which means that the on-time performance of S2 and S3 is much worse than that of S1. Therefore, the normal and lognormal distributions are not suitable as the trip time distribution models for the SVSP in this case. The costs of S4 and S5 are close to that of S1; however, their on-time performance is better than that of S1 as they have lower penalties.
The conclusion of this experiment is twofold. First, the Burr distribution and the compound model are both suitable for the SVSP in this case. In view of the evaluation value, the compound model is slightly better than the Burr distribution. Second, the normal and lognormal distributions can lead to significantly worse on-time performance compared with the empirical distribution due to their underestimation of the penalties.
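The two comparison measures used in this experiment can be written down in a few lines. The sketch below follows the verbal definitions in the text (evaluation value as cost plus weighted penalty with a = 1.5, and RPD relative to a benchmark); the function names, the sign convention of the RPD and the numbers in the example are assumptions for illustration and are not values from Table 6.

```python
def evaluation_value(cost, penalty, a=1.5):
    """Sum of the cost and the penalty weighted by the coefficient a."""
    return cost + a * penalty

def rpd(value, benchmark):
    """Relative percentage deviation of value over the benchmark."""
    return 100.0 * (value - benchmark) / benchmark

# Hypothetical figures, for illustration only.
benchmark = evaluation_value(cost=1000.0, penalty=100.0)
candidate = evaluation_value(cost=980.0, penalty=190.0)
print(benchmark, candidate, rpd(candidate, benchmark))
```

With this sign convention, a positive RPD means the schedule evaluates worse than the benchmark and a negative RPD means it evaluates better.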

Concluding remarks
Modelling travel time distributions has great significance for researchers and practitioners in public transport. However, there is currently a paucity of well-established modelling approaches and well-accepted travel time distributions for this purpose. This paper has proposed a modelling approach for travel times using distribution fitting methods based on AVL data. In this approach, a series of basic statistical tools are employed in an integrated manner to build a compound travel time model, which consists of the best distribution models for the travel times in each period of a day. Although these statistical tools are traditional, their integrated use to build a compound travel time model has great practical significance. Research related to or based on travel time distributions can benefit from using the proposed modelling approach instead of relying on a single pre-assumed travel time distribution.
A series of experiments have been carried out based on the AVL data of HKB4. The empirical distributions and the corresponding HTT bands are first obtained from the raw AVL data of HKB4, from which three traffic situations have been revealed. Afterwards, three groups of experiments have been carried out. The first is on building a compound model with the proposed modelling approach. The second is on the evaluation of vehicle schedules, and the third is on stochastic vehicle scheduling based on the different distribution models.
Experimental results have shown that different distribution models have little impact on the measurement of cost. In contrast, the normal and lognormal distributions can lead to underestimation of the penalty in the cases where the travel time is significantly left skewed, which would consequently cause low on-time performance of vehicle schedules. Moreover, the compound model fits the actual travel times more precisely under various traffic situations. In the experiments on the SVSP model, it has been found that the cost and on-time performance of compiled vehicle schedules can be improved by using the compound model, and the Burr distribution is the second-best choice.
Although the modelling approach for travel time distributions is tested based on the historical trip time samples derived from AVL data, it can also be applied to travel times at the route segment level, or when the historical data are insufficient or incomplete. In light of this, vehicle scheduling for a newly opened bus line can potentially benefit from the proposed travel time modelling approach. Moreover, the proposed approach would be of benefit to the research and practice of public transit planning, timetabling and service reliability measurement, etc.