A new low complexity bus travel time estimator for fleet management system

The experience of using public transportation can be improved by estimating bus arrival times and notifying passengers, so the accuracy of the estimate directly affects this experience. As the number of buses, stations, and service areas increases, the volume of data collected in the cloud makes the data processing for travel time estimation more challenging. To address this challenge, this paper considers a distributed method for estimating bus arrival time, in which data processing is decentralized and distributed across the buses. In addition, using the Kalman filter and updating the estimated values at short intervals reduces the estimation error. A complexity analysis shows that the proposed method significantly reduces the computational load in the cloud, which makes it implementable in metropolitan areas. Results on a real-world dataset show that the proposed method performs well in terms of mean square error (MSE) and root mean square error (RMSE).


Introduction
In recent years, with the development of wireless telecommunication network technology and various applications, including the Internet of Things, special attention has been paid to vehicle communications [1]. Standards have even been set for vehicle-to-vehicle, vehicle-to-infrastructure, and vehicle-to-everything communication. In addition, the smart city has been another topic of interest to researchers in recent years [2], [3].
As part of the smart city, the intelligent transportation system uses the vehicle communication infrastructure to collect information. This system generates large datasets, and their processing and security pose challenges that several papers have addressed [4], [5], [6], [7], [8].
An intelligent transportation system plays a vital role in the passengers' experience. In this system, each bus collects information. Processing this information on a server is a challenging task that must satisfy sustainability, accessibility, and economic development as the main objectives [9]. This information can be used to estimate bus travel time for passengers or to reduce traffic congestion.
Many environmental factors affect the travel of a bus. Some of these factors, such as weather conditions in different seasons, weekends, and daylight hours, change periodically, whereas others, such as natural disasters, driver behavior, bus breakdowns, the number of passengers at each station, the traffic condition of each street, and the geographical area, are entirely random. As a result, the system is complex, and the travel time of each bus is difficult to estimate [10]. The estimator must also have a reasonable computational complexity so that the server can process the data of a large number of buses.
The fleet management system is a server that works with Automated Vehicle Location (AVL) to determine the estimated arrival time of each bus at each station. The information of each bus is collected and then processed on the server. Methods for estimating bus arrival time fall into two general categories. In the first category, each bus estimates its arrival time based on its previous information and the information of its route. In the second category, the data of all buses and routes are collected at the server, which performs the estimation using a neural network. Although neural network and machine learning methods are more accurate, their high complexity makes them impractical for large cities with many buses.
The main contributions of this paper are summarized as follows:
• This paper presents a distributed method for estimating bus arrival time based on the Kalman filter. In the proposed method, each bus estimates its arrival time at the next station using the Kalman filter. Estimating the bus arrival time includes two steps: coarse estimation and fine estimation. To offload computation, the estimation algorithm is distributed among the buses. Intermediate calculations for the estimated time are stored in the server, and the estimated arrival times are displayed to users when needed.
• Unlike other works, this paper calculates the computational complexity for the server. Inspired by fog computing, the estimation algorithm is distributed so that the server performs as few calculations as possible. Computational complexity charts are presented in terms of estimated time accuracy, the number of buses, and the number of stations.
• The proposed method is more practical than other methods: it achieves good estimation accuracy with low computational complexity. Owing to this balance between accuracy and complexity, the method can be implemented across an entire fleet management system.
The rest of this paper is organized as follows. Section II reviews current research on bus travel time estimation, and Section III describes the proposed model. Section IV compares our fog-based estimation with the centralized method in terms of complexity. Section V presents the performance of the proposed method on a real-world dataset. Finally, Section VI concludes the paper and reviews the main contributions.

Literature Review
The fleet management system includes components such as a server acting as the management center, AVL, a bus transmit/receive unit, a radio system, a bus GPS unit, and an on-board computer. The necessary information (including GPS data and other information required by the management center) is prepared by the on-board computer and sent to the server by the transmit/receive unit over the radio system, which can be a cellular network. In AVL, the location of the bus is extracted and passed to the management center [11].
The automatic passenger counting system introduced in [12] extracts information on passengers boarding and alighting at each station [13]. Estimating or predicting bus arrival times, also known as bus travel time estimation, is influenced by variables such as the number of passengers, the number of stations, the length of the route, environmental conditions, and driving behavior. The relationship between these factors and their effect on bus arrival time is discussed in [14].
Traffic flow is an influential factor in estimating the arrival time of the bus, and extensive research has been done in this area. Most of these studies collect comprehensive data from buses and then apply the proposed processing method. Machine learning-based methods constitute a significant part of this research. For example, [15] introduces a method for estimating traffic flow using a deep belief network and a multitask learning model. In [16], another approach based on the restricted Boltzmann machine and a recurrent neural network is used for the same purpose. Genetic algorithm-based methods are also used in [17] and [18] to predict traffic flow and traffic congestion.
Incidents are another factor that creates traffic congestion and affects bus arrival time, as discussed in several articles [19], [20], [21]. Although traffic and incident analysis may help bus arrival time estimation, it adds extra processing complexity and overhead to the estimator system, and the resulting improvement may not be significant. Bus arrival time estimation can be viewed as estimating a parameter of the system, and existing methods fall into three categories. In the first category, estimates are based on bus history. The second category uses statistical estimation methods. The third category uses methods based on machine learning and neural networks [22].
One of the easiest ways to estimate the arrival time is to use the historical data of each route, as in [23], [24], [25]. In this method, the average of the historical data for each path is taken as the estimated value for that path. Although this method is simple, its estimate is inaccurate and reacts slowly to events such as accidents. As a result, more accurate estimation methods are needed.
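As a point of reference, the historical-average (HA) estimator described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the history is available as (link, observed travel time in seconds) pairs; the function name `historical_average` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def historical_average(history):
    """Historical-average (HA) estimate per link.

    history: iterable of (link_id, observed_travel_time_s) pairs.
    Returns a dict mapping each link to the mean of its past observations.
    """
    by_link = defaultdict(list)
    for link_id, travel_time in history:
        by_link[link_id].append(travel_time)
    # Every past observation carries the same weight in the estimate.
    return {link: mean(times) for link, times in by_link.items()}


history = [("A-B", 120.0), ("A-B", 130.0), ("B-C", 90.0), ("A-B", 110.0)]
print(historical_average(history))  # {'A-B': 120.0, 'B-C': 90.0}
```

Because each of the n observations has weight 1/n, a single trip delayed by an accident moves the estimate by only 1/n of that delay, which illustrates why HA reacts slowly to such events.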
In [26], a Kalman filter-based bus travel time estimator was proposed. In this method, each path is divided into equal-length sections, and a Kalman filter is used to estimate the travel time in each section. The accuracy of this method increases with the number of sections. However, that paper does not discuss the computational complexity on the server, which depends on the number of sections. In addition, dwell time is not estimated in that paper.
In [9], the authors estimate the travel time of vehicles other than buses using a graph-based method. This method is unsuitable for estimating bus travel time because it does not consider dwell time at bus stops. Moreover, the method is intended for traffic estimation.
In [27], Kalman filter-based bus travel time prediction was compared with an artificial neural network approach. Although the experimental results show that the neural network gives a more accurate estimate, the need for a large dataset to train the network makes the Kalman filter preferable.
A log-normal autoregressive modeling approach was introduced in [28]. In that approach, the complexity of the estimation method is high because the calculations are concentrated on the server, which therefore needs a large amount of memory and a very powerful processor.
The server processes extensive data to estimate the arrival time of buses in metropolitan areas, where the numbers of buses and stations are high. In big data applications, common data processing methods are inefficient, and newer techniques are needed to process the data properly [29]. Real-time processing of this big data is critical when estimating bus arrival time because delays in calculation increase the estimation error.
Data loss is one of the challenges of the intelligent transportation system and can occur for various reasons. It can degrade the estimated values and reduce the performance of the estimation algorithm. In [30], the authors used machine learning to estimate the lost data. These estimated data, together with the other data, complete the dataset and make the bus arrival time estimation algorithm more efficient.

Proposed Method
Figure 2 shows the architecture of the proposed method. In this architecture, the bus has GPS modules, a local database, a transmit/receive module for exchanging information with the server, timers, and the implemented algorithms. The server includes a transmit/receive module for exchanging data with buses, the AVL, an arrival time calculation unit, and back-end processing for other tasks such as preparing information for end users. The server is connected to the database to store and retrieve data.
Figure 3 shows the flowchart of the proposed distributed algorithm, which runs on each bus. Each bus determines its position from GPS feedback and runs the appropriate algorithm. Estimating the arrival time at a station requires three types of estimation: the travel time between two stations, which we call link travel time estimation; the time to reach the next station from a point between two stations, which we call mini link travel time estimation; and the dwell time at the station.
According to the flowchart, each bus checks its location. When the bus reaches a station, it retrieves the latest Kalman filter parameters for link travel time estimation from the server. After running the Kalman filter, the bus sends the updated parameters and the link travel time estimate to the server, which updates them in the database. Therefore, the value registered in the database is always the latest updated value. If the link travel time needs to be displayed, the server reads the estimated value from the database.
The bus then checks its location to detect when it leaves the station. After it leaves the station, Algorithm 2 is run to estimate the dwell time, using parameters received from the server. If the bus is at the terminal, there is nothing more to do. Otherwise, timer $T_3$ is started, and Algorithm 3 runs when it expires.
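The bus-side control flow described above can be summarized as a small event handler. The sketch below is hypothetical: the event names and action strings are our own labels for the flowchart steps, not identifiers from the paper, and the server is assumed to act only as parameter storage.

```python
from enum import Enum, auto
from typing import List


class Event(Enum):
    ARRIVED_AT_STATION = auto()
    LEFT_STATION = auto()
    T3_EXPIRED = auto()


def bus_actions(event: Event, at_terminal: bool) -> List[str]:
    """Return the flowchart actions a bus performs for one event (hypothetical labels)."""
    if event is Event.ARRIVED_AT_STATION:
        # Link travel time: fetch Kalman parameters, run Algorithm 1 on the bus,
        # then send the updated parameters and the estimate back for storage.
        return ["fetch_link_params", "run_algorithm_1", "send_link_update"]
    if event is Event.LEFT_STATION:
        # Dwell time: same fetch/run/send pattern with Algorithm 2.
        actions = ["fetch_dwell_params", "run_algorithm_2", "send_dwell_update"]
        if not at_terminal:
            # Between stations, mini link updates are driven by timer T3.
            actions.append("start_t3")
        return actions
    if event is Event.T3_EXPIRED:
        # Algorithm 3 keeps intermediate values on the bus and sends only
        # the final link estimate; the timer is then restarted.
        return ["run_algorithm_3", "send_link_estimate", "restart_t3"]
    raise ValueError(f"unknown event: {event}")


print(bus_actions(Event.LEFT_STATION, at_terminal=False))
```

No branch performs Kalman computation on the server side; the server only serves and stores parameters, which is the point of the distributed design.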
Based on the Kalman filter, the estimate $\hat{t}_{n+1}$ is calculated as a weighted linear combination of the latest observation and the average of previous observations:
$$\hat{t}_{n+1} = g_{n+1}\, t_n + \left(1 - g_{n+1}\right) \bar{t}_n \quad (1)$$
where $g_{n+1}$ is the Kalman filter gain for time index $n+1$, $t_n$ is the latest observation at time index $n$, and $\bar{t}_n$ is the average of previous observations up to time index $n$. At each step, the Kalman filter gain is updated as
$$g_{n+1} = \frac{e_n}{e_n + \sigma_n^2} \quad (2)$$
where $e_n$ and $\sigma_n^2$ are the Kalman filter error and the variance of previous observations, respectively. Like the gain, the filter error, the average, and the variance of previous observations must also be updated at each step (Eqs. (3)-(5)).
According to the flowchart in Figure 3, each fog node (bus) runs Algorithm 1 when it reaches a station. This algorithm updates the travel time estimate between two stations, and the updated value is then sent to the server. The initial values of the estimation process are selected randomly; after some steps, the algorithm converges to the actual value.

Algorithm 1
Require: latest Kalman filter parameters for link travel time estimation ($e_n$, $\sigma_n^2$, $\bar{t}_n$, and $n$).
Ensure: updated parameters and estimated value.
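The update performed by Algorithm 1 (and, with dwell time observations, by Algorithm 2) can be sketched as a scalar Kalman recursion. This is a minimal sketch under stated assumptions: we assume the estimate $\hat{t}_{n+1} = g_{n+1} t_n + (1-g_{n+1})\bar{t}_n$ with gain $g_{n+1} = e_n/(e_n+\sigma_n^2)$, an error update $e_{n+1} = \sigma_n^2 g_{n+1}$, and running (Welford-style) updates of the mean and population variance. The names `KalmanState` and `kalman_step` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class KalmanState:
    """Parameters a bus fetches from and returns to the server (hypothetical names)."""
    e: float       # Kalman filter error e_n
    sigma2: float  # variance of previous observations sigma_n^2
    t_bar: float   # average of previous observations
    n: int         # number of previous observations


def kalman_step(state: KalmanState, t_obs: float) -> Tuple[KalmanState, float]:
    """One update with the latest observation t_obs; returns (new state, t_hat_{n+1})."""
    denom = state.e + state.sigma2
    # Assumed gain: a large filter error relative to the variance favours the observation.
    g = state.e / denom if denom > 0 else 0.0
    # Weighted linear combination of the latest observation and the past average.
    t_hat = g * t_obs + (1.0 - g) * state.t_bar
    # Assumed updates: error shrinks with the gain; mean and variance are running values.
    n_new = state.n + 1
    t_bar_new = state.t_bar + (t_obs - state.t_bar) / n_new
    sigma2_new = (state.n * state.sigma2
                  + (t_obs - state.t_bar) * (t_obs - t_bar_new)) / n_new
    return KalmanState(state.sigma2 * g, sigma2_new, t_bar_new, n_new), t_hat


state = KalmanState(e=100.0, sigma2=100.0, t_bar=120.0, n=1)
state, estimate = kalman_step(state, 130.0)
print(estimate, state)  # 125.0 KalmanState(e=50.0, sigma2=75.0, t_bar=125.0, n=2)
```

In this sketch the gain and the estimate are computed on the bus; only the parameters and the resulting estimate need to be exchanged with the server.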
The fog node updates the estimated dwell time when leaving the station.This is done using Algorithm 2. Similarly, this algorithm is also distributed in fog nodes, and the server is solely responsible for storing and reading updated values.

Algorithm 2
Require: latest Kalman filter parameters for dwell time estimation ($e_n$, $\sigma_n^2$, $\bar{t}_n$, and $n$).
Ensure: updated parameters ($e_{n+1}$, $\sigma_{n+1}^2$, and $\bar{t}_{n+1}$) and estimated value ($\hat{t}_{n+1}$).

Algorithm 3 then runs at specific times after the bus leaves the station and updates the estimated time of arrival at the next station. In a similar way, Algorithm 3 estimates the link travel time through the speed of the bus, using a Kalman filter analogous to Eqs. (1)-(5) (Eqs. (6)-(10)), where $m$ is the time index and $v$ is the speed of the bus. The fine estimate of the link travel time (based on the mini link estimates) is then calculated by Eq. (11), where $\Delta x_L$ is the length of link $L$. Therefore, two estimates are available for the link travel time. Finally, to increase the accuracy of the estimate, the link travel time is updated by Eq. (12), where $\hat{t}_{n+1,L}$ is the coarse estimate obtained from Algorithm 1. Note that $T_{up}$ is the update interval of the mini link estimation, and this parameter affects the complexity of the method, which is analyzed in the next section. Algorithm 3 stores the intermediate calculation parameters in the bus but sends the final value of the link estimate to the server.
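One expiry of timer $T_3$ in Algorithm 3 can be sketched in the same way, applied to the bus speed. This is a hypothetical sketch: it assumes the speed filter has the same form as a scalar travel time filter, and it stands in for Eq. (11) with the assumed fine estimate $\Delta x_L / \hat{v}_{m+1}$. The fusion of the coarse and fine estimates in Eq. (12) and steps 4 to 6 of Algorithm 3 are not reproduced. Names such as `mini_link_step` are ours.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SpeedState:
    """Mini link filter parameters kept on the bus (hypothetical names)."""
    e: float       # filter error e_m
    sigma2: float  # variance of previous speed observations
    v_bar: float   # average of previous speed observations (m/s)
    m: int         # time index (number of previous updates)


def mini_link_step(state: SpeedState, v_obs: float,
                   link_length_m: float) -> Tuple[SpeedState, float]:
    """One T3 expiry: update the speed estimate and return a fine link time (s)."""
    denom = state.e + state.sigma2
    g = state.e / denom if denom > 0 else 0.0
    v_hat = g * v_obs + (1.0 - g) * state.v_bar
    # Assumed stand-in for Eq. (11): link length divided by estimated speed.
    fine_s = link_length_m / v_hat if v_hat > 0 else math.inf
    # Assumed updates of error, running mean and running variance.
    m_new = state.m + 1
    v_bar_new = state.v_bar + (v_obs - state.v_bar) / m_new
    sigma2_new = (state.m * state.sigma2
                  + (v_obs - state.v_bar) * (v_obs - v_bar_new)) / m_new
    return SpeedState(state.sigma2 * g, sigma2_new, v_bar_new, m_new), fine_s


state = SpeedState(e=4.0, sigma2=4.0, v_bar=10.0, m=1)
state, fine = mini_link_step(state, v_obs=12.0, link_length_m=1200.0)
print(round(fine, 1), state)  # 109.1 SpeedState(e=2.0, sigma2=3.0, v_bar=11.0, m=2)
```

As in Algorithm 3, the `SpeedState` stays on the bus and only the resulting link estimate would be sent to the server; a smaller $T_{up}$ simply means more such steps per link.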

Algorithm 3
Require: latest Kalman filter parameters for mini link speed estimation ($e_m$, $\sigma_m^2$, $\bar{v}_m$), $m$, $T_{up}$, $t_{m+1}$, and the coarse estimate ($\hat{t}_{n+1,L}$).
Ensure: updated parameters ($e_{m+1}$, $\sigma_{m+1}^2$, and $\bar{v}_{m+1}$) and estimated value ($\hat{t}_{n+1}$).
3: if $mT_{up} \le \hat{t}_{n+1,L}$ then
4:
7: end if
8: Calculate $\hat{v}_{m+1}$ from Eq. (6) and $\hat{t}_{n+1}$ from Eq. (12).
9: Update $e_{m+1}$, $\sigma_{m+1}^2$, and $\bar{v}_{m+1}$ from Eqs. (8)-(10) and let $m \leftarrow m + 1$.
10: Restart timer $T_3$.

Evaluation
In this section, we evaluate the proposed method. Because of its fog-based nature, the evaluation is divided into two parts. In the first part, the complexity of the proposed method is evaluated in terms of calculations and the number of variables. In the second part, the bus arrival time is estimated using the dataset provided in [31].

Complexity Analysis
The proposed method transfers the calculations from the cloud to the fog nodes, which distributes the computations. We therefore compare the complexity of the proposed method with previous centralized methods. In addition, because reading from and writing to the database is time-consuming, we compare the number of database reads and writes.
In centralized methods, the server must perform all the processing related to each Kalman filter. In contrast, in the distributed method, no Kalman filter processing is performed on the server. Also, in the proposed method, the results of intermediate calculations are stored in each node, which reduces the number of variables read from and written to the database.
Let $N$, $M$, $T_L$, and $T_{up}$ be the number of buses, the number of bus stops on the path, the average link travel time, and the estimation update time in each link, respectively. Table 1 shows the comparison details of the proposed and centralized methods. Note that each Kalman filter has six variables, which must be stored as intermediate calculations after the Kalman filter has been executed.
Figure 4 shows the number of variables stored in the database for the proposed and centralized methods. According to the figure, the proposed method reduces the number of variables exponentially. This reduces memory usage and thus speeds up access to the variables.
As shown in Figure 4, the reduction achieved by the proposed method grows exponentially as the number of Algorithm 3 estimates increases. The proposed method reduces the number of variables by drastically reducing the number of variables related to

Estimation performance
The dataset provided in [31] is used to evaluate the proposed bus arrival time estimation method. It contains timestamped bus locations in Helsinki, Finland. The Historical Average (HA) method is one of the simplest and least complex methods for estimating bus travel time, so we compare the proposed method with it. Figure 5 shows the mean square error (MSE) of the proposed method compared with HA. As shown in Figure 5, the estimator performance improves as the number of samples increases, and the proposed method has a lower error even at low iteration numbers. After ten iterations, the error of the proposed method is at most 15%, whereas the HA method still has a high error even after 20 samples.
In addition to the number of iterations, the MSE also depends on the time of day. In other words, during hours when traffic is likely and less predictable, the estimation error differs.
Figure 6 shows the estimation error at different times of the day. The proposed method works well at all hours and has its lowest error between 11:00 and 12:00. On average, the HA method has almost twice the error of the proposed method.
The MSE metric used here expresses a relative error rate, so it may be misleading in some cases: a low relative error can still correspond to a large absolute error. Another metric that quantifies the error is the root mean square error (RMSE), defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{r=1}^{N}\left(t_{ac,r} - \hat{t}_r\right)^2}$$
where $N$ is the number of iterations, $t_{ac,r}$ is the actual travel time in the $r$th iteration, and $\hat{t}_r$ is the estimated travel time in the $r$th iteration. Unlike the MSE metric, which is normalized to the actual value, we normalize this metric per kilometer along the path. Figure 7 shows the average RMSE versus the number of iterations. According to this figure, the proposed method has an average error of less than 5 seconds per kilometer after ten iterations, whereas the HA method has larger errors that can exceed 60 seconds per kilometer.
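The RMSE definition above, together with the per-kilometer normalization, can be sketched as follows. We assume the normalization divides the RMSE by the path length in kilometers, giving seconds per kilometer; the function names and sample numbers are hypothetical.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], estimated: Sequence[float]) -> float:
    """Root mean square error over N iterations."""
    if len(actual) != len(estimated) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    return math.sqrt(squared / len(actual))


def rmse_per_km(actual: Sequence[float], estimated: Sequence[float],
                path_length_km: float) -> float:
    """RMSE normalized by path length, in seconds per kilometer (assumed normalization)."""
    return rmse(actual, estimated) / path_length_km


actual = [100.0, 110.0, 120.0, 130.0]     # t_ac,r in seconds
estimated = [102.0, 108.0, 122.0, 128.0]  # estimates in seconds
print(rmse(actual, estimated), rmse_per_km(actual, estimated, 0.5))  # 2.0 4.0
```

Normalizing by length makes RMSE values comparable across routes of different lengths, which a raw RMSE in seconds would not be.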
The results in Figure 8 show that the proposed method performs well at different times of the day in terms of average RMSE. It should be noted that although Figure 6 shows good performance from 11:00 to 12:00 in terms of MSE, the average RMSE in this period is not as good. Because the MSE is normalized to the actual travel time, the prolonged travel time during this period makes the MSE lower than in other periods.

Conclusion
Estimating the arrival time of the bus is of great importance in the intelligent transportation system. As the number of buses and stations increases, a large amount of data reaches the cloud, which must process it properly. Because estimation is only one of the cloud's processing tasks, reducing its processing load contributes to the overall performance of the intelligent transportation system. In this paper, a Kalman filter-based algorithm is implemented and distributed on each bus, so that each bus is responsible for processing part of the data. The complexity analysis showed that the proposed method substantially reduces the cloud processing load. Moreover, the number of cloud accesses to the database has been reduced by eliminating the exponential term in fine estimation. The results of applying the proposed method to a dataset also showed that, although each bus updates the estimated value through its own calculations, the estimated value remains close to the actual value.

Fig. 4 Number of saved variables for distributed and centralized methods.

Fig. 5 MSE of the proposed method compared with HA in different iterations.

Fig. 6 MSE of the proposed method and the HA method at different hours of the day.

Fig. 7 Average RMSE per iteration for proposed method and HA method.

Fig. 8 Average RMSE of the proposed method and the HA method at different times of the day.