An Efficient Solving Method to Vehicle and Passenger Matching Problem for Sharing Autonomous Vehicle System

School of Transportation Science and Engineering, Beihang University, Beijing 100191, China
Institute of Transport Studies, Department of Civil Engineering, Monash University, Australia
Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China
Institute of Rail Transportation of Jinan University, Electrical and Information College of Jinan University, Zhuhai 519070, China


Introduction
Autonomous vehicles (AVs) are regarded as a promising mode of transportation that provides increased mobility, enhanced customer satisfaction, and reduced infrastructure costs (e.g., [1][2][3]). In an autonomous transportation system, shared AVs (SAVs) are considered one of the most efficient ways to provide on-demand service. According to the number of passengers carried at one time, existing research on SAVs can be classified into two categories: car-sharing and ride-sharing.
In the car-sharing mode, each SAV takes only one passenger at a time. Unlike in traditional car-sharing systems (e.g., Boyaci et al. [4], Huang et al. [5]), the major concern for an SAV system is how to assign SAVs to provide point-to-point service, rather than how to determine depot locations and relocate vehicles among stations to balance supply and demand (e.g., Ma et al. [6], Levin et al. [7], Chen et al. [2], Miao et al. [8]). In the ride-sharing mode, multiple passengers are permitted to ride in one car simultaneously. If the system control platform can match passengers and vehicles efficiently, overall vehicle occupancy will increase and traffic congestion will be relieved by reducing the number of on-road vehicles. Therefore, this paper focuses on how to build a control platform that enables passengers to share a car.
For modeling the ride-sharing service, most research refers to the classical Dial-a-Ride Problem (DARP) model proposed by Cordeau [9]. The essential idea of DARP is to cast the routing problem as an arc-based flow conservation model. In this way, vehicle routes can be decoded by linking the visited arcs. Although this method can express the routing problem as a mixed integer model, its drawback is that valid arcs inevitably have to be generated for identical requests. Meanwhile, the number of variables in this model grows exponentially as the number of passengers increases. To solve this model efficiently, efforts have been devoted to exploring optimal solutions. One of the prevailing algorithms is the branch-and-cut method, which aims at eliminating invalid arcs and cutting redundant domains, e.g., Liu et al. [10], Bongiovanni et al. [11]. Other methods have also been tried to find optimal routes. For example, Hosni et al. [12] adopted the Lagrangian decomposition method to split the multivehicle routing problem into several single-vehicle problems. Mahmoudi and Zhou [13] combined forward dynamic programming with a Lagrangian relaxation approach to search for feasible solutions. Cordeau and Laporte [14] applied a Tabu search heuristic to solve the DARP and proposed a framework of neighborhood evaluation. Häme and Hakula [15] proposed a heuristic algorithm that maximizes the number of customers served by a single vehicle. Diana and Dessouky [16] proposed an insertion heuristic to handle the computational complexity and investigated the relationship between solution quality and computational cost.
However, most existing methods try to explore detailed vehicle trajectories. This makes it difficult to search for feasible paths as demand increases. Vazifeh et al. [17] transformed the dispatching of vehicles into a minimum path cover problem on directed graphs and tested graph algorithms with massive taxi data from New York City. Building on this method, the vehicle and passenger matching problem might be solved effectively by investigating the similarity of passenger trip trajectories and classifying passengers into groups. Our idea herein is to simplify the researched problem by directly exploring the relationship between passengers. In this way, passengers can be classified into several groups. Then, by checking the connections between groups and assigning one vehicle to each group, the number of vehicles needed can be derived. With this idea, we propose an algorithm that can handle various passenger demands. In summary, the main objectives of this paper are to (i) simplify the researched problem by linking passengers instead of directly matching vehicles and passengers; and (ii) design a cluster-based algorithm that finds solutions for high traffic demand with high efficiency. The rest of the paper is organized as follows. In Section 2, the vehicle dispatching model for SAV-based transportation systems is proposed. Section 3 presents an efficient heuristic algorithm developed for the proposed model. In Section 4, comparative results of various scenarios are discussed. Finally, Section 5 presents some conclusions and remarks.

The Dispatching Problem of an AV-Based Transportation Service System
This paper mainly discusses how to plan AV service routes and determine the number of dispatched vehicles for the SAV transportation system. An optimization model is developed for matching vehicles and passengers. The details of the proposed model are described in this section.
2.1. Problem Statement. For the SAV system, a major problem is to optimize vehicle service paths while saving on the number of vehicles. In general, passengers are characterized by origin-destination (OD) pairs and time windows. In detail, let U be the set of all requests, where o(u) and d(u), ∀u ∈ U, respectively represent the requested pickup (origin) and delivery (destination) nodes. Each request u ∈ U has an earliest and a latest pickup time index and an earliest and a latest drop-off time index. Based on passenger requests, a directed graph G = (A, E) can be constructed, where A is the set of OD nodes and E is the set of edges carrying the travel times between nodes. Let TT represent the set of path travel times in the road network; the edge weights of the graph are taken as travel times, e.g., TT(o(u), d(u)). It should be noted that travel times are treated as fixed values for simplification. For vehicles, V is the set of available AVs. We assume that the AV fleet is homogeneous with respect to capacity, which is equal to a preset value. Under this background, the goal of optimizing the AV transportation system is to minimize the system cost, i.e., the number of used vehicles. Based on existing research, the DARP model can be extended so that the objective is to minimize the number of vehicles instead of the total travel distance. Since the DARP model is a node-flow model, its solution domain is denoted by 4 × {V} ⊗ {U} ⊗ {U}. When the numbers of available vehicles and user demands are large, a great deal of time is spent searching for optimal vehicle paths. To reduce the complexity of the mathematical model and to present it in a simple way, we build the optimization model using only the relationship between passengers, i.e., λ(u, w), which equals 1 if passengers u and w are served by the same vehicle and 0 otherwise.
In this way, passengers are divided into groups and the solution space is reduced to 0.5 × {U} ⊗ {U}. Furthermore, detailed vehicle paths can be obtained by translating the passenger relationships. For example, suppose four passengers (u = 1, 2, 3, 4, ∀u ∈ U) are waiting for service. If λ(1, 2) = 1 and λ(1, 4) = 1, passengers 1, 2, and 4 are carried by the same vehicle and passenger 3 is assigned to another vehicle. This implies that two vehicles in total are required to serve these four passengers. The whole framework of the researched problem is summarized in Figure 1. Table 1 lists some important parameters and variables.
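The four-passenger example can be reproduced with a small sketch that merges passengers connected by λ(u, w) = 1 and counts the resulting groups. This is only an illustration under the assumption that one vehicle serves each group; the function name `count_vehicles` and the union-find bookkeeping are hypothetical and not part of the original model.

```python
def count_vehicles(passengers, linked_pairs):
    """Group passengers connected by lambda(u, w) = 1 and count the groups.

    Assumption: each connected group is served by exactly one vehicle.
    """
    parent = {u: u for u in passengers}

    def find(u):
        # Follow parent pointers to the group representative (with path halving).
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for u, w in linked_pairs:
        root_u, root_w = find(u), find(w)
        if root_u != root_w:
            parent[root_w] = root_u  # merge the two groups

    groups = {}
    for u in parent:
        groups.setdefault(find(u), []).append(u)
    ordered = sorted(sorted(g) for g in groups.values())
    return len(ordered), ordered


# lambda(1, 2) = 1 and lambda(1, 4) = 1, as in the example above
n_vehicles, groups = count_vehicles([1, 2, 3, 4], [(1, 2), (1, 4)])
print(n_vehicles, groups)  # 2 [[1, 2, 4], [3]]
```

Merging by connectivity relies on the passenger relationship being transitive: two passengers linked to a common passenger end up in the same group.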

2.2. The Optimization Model of Passenger-Passenger Matching. Before formulating the optimization model, some characteristics of this system are predefined as follows: (i) All passenger requests should be served; (ii) All passengers are assumed to be willing to share rides with others; (iii) All passengers' pick-up and delivery times are within their expected time windows; (iv) Each passenger is served by only one vehicle; (v) Vehicle capacity should not be exceeded when carrying multiple passengers; and (vi) AVs are controlled by a central control system.
A vehicle that carries more passengers, i.e., one for which more passengers are associated with each other, increases the probability of saving vehicles. That is to say, maximizing the number of associated passengers helps minimize the total number of vehicles. Accordingly, the objective function is formulated as Equation (1), subject to the following constraints: (1) The transitivity relation of passengers. To determine the correlation between multiple passengers, the transitivity of passenger relationships is expressed in Equation (2). It means that passengers u′ and u″ are connected if they are both linked to a passenger u.
To make the model easier to solve, Equation (2) is rewritten as Equation (3). (2) Service time constraint. Since the desired pick-up and drop-off times are collected in advance, the service time of each request should lie within these limits, as reflected in Equations (4) and (5). For requests u and w, if they are assigned to the same vehicle (λ(u, w) = 1), the time differences arising from their spatial separation also need to be considered in determining the variables p(u) and g(u), as expressed in Equations (6)-(8), where M is a very large number. Equation (6) implies that the time difference between picking up requests u and w by a vehicle should not be less than the minimum travel time between origin nodes o(u) and o(w). Similarly, Equation (7) states that the time difference between dropping off requests u and w should not be less than the minimum travel time between destination nodes d(u) and d(w). In addition, the time constraint between picking up one request and dropping off the other is considered in Equation (8).
(3) Vehicle capacity constraint. To express the status of onboard passengers at a given time, we discretize the simulation time into uniform intervals and use an auxiliary variable z_1(u, t) to represent the state of each request. If z_1(u, t) = 1, request u is being served at the current time t. Otherwise, request u is waiting to be picked up or has arrived at its destination. Using this binary factor z_1(u, t), we can count the total number of onboard passengers at any time t, as formulated in Equation (15), where the state variable z_1(u, t) is an indirect representation of the relation between the actual pick-up time p(u) and drop-off time g(u), as shown in Equation (16). The nonlinear constraint (16) is difficult to implement directly in commercial solvers. To reduce the computational complexity, we apply the linear transformations given by Equations (17)-(23). To accomplish this, we introduce another auxiliary variable z_2(u, t) associated with the request state. Given the pick-up and drop-off times, the state of a request can be divided into three phases. The first is the waiting phase: the request is waiting to be served and the pick-up time has not been reached, so the times in this phase are earlier than the actual pick-up time p(u). Equation (17) should always hold in this situation; z_2(u, t) and z_1(u, t), restricted by Equation (22), take the values 1 and 0 respectively to satisfy Equations (18)-(20), i.e., the waiting state corresponds to z_1(u, t) = 0 and z_2(u, t) = 1. The second is the onboard phase: the request is being transported to its destination but has not yet arrived. This phase starts at the pick-up time and ends at the drop-off time, i.e., p(u) ≤ t ≤ g(u). During the onboard phase, Equations (17) and (20) hold only if z_1(u, t) = 1 and z_2(u, t) = 0. The last phase, the off-board phase, occurs when a request has been completed and the time is later than the drop-off time g(u).
Hence, Equation (20) should always hold for off-board situations. Under this constraint, it can be shown that z_1(u, t) = 0 and z_2(u, t) = 0 is the only solution that satisfies Equations (17) and (19). In this way, Equations (17)-(23) are equivalent to Equation (16) and can be added to the model in linear form.
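The role of the state variable z_1(u, t) in the capacity constraint can be illustrated with a short sketch. It assumes that all requests in `schedule` ride in the same vehicle, that time is measured in discrete steps, and that a request is on board exactly when p(u) ≤ t ≤ g(u); the function names and the sample schedule are hypothetical and only meant to show the counting idea, not the linearized constraints themselves.

```python
def onboard_indicator(pickup, dropoff, t):
    """z_1(u, t): 1 while the request is on board (pickup <= t <= dropoff), else 0."""
    return 1 if pickup <= t <= dropoff else 0


def max_occupancy(schedule, horizon):
    """Largest number of onboard passengers over time steps 0..horizon.

    `schedule` maps each request to (p(u), g(u)); all requests are assumed
    to share one vehicle.
    """
    peak = 0
    for t in range(horizon + 1):
        load = sum(onboard_indicator(p, g, t) for p, g in schedule.values())
        peak = max(peak, load)
    return peak


def capacity_ok(schedule, horizon, capacity):
    """Check that the onboard count never exceeds the vehicle capacity."""
    return max_occupancy(schedule, horizon) <= capacity


# Hypothetical pick-up and drop-off steps for three requests in one vehicle
schedule = {1: (0, 5), 2: (2, 6), 3: (6, 9)}
print(max_occupancy(schedule, 10))    # 2
print(capacity_ok(schedule, 10, 1))   # False
```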

Solution Method
A mixed integer optimization problem is formulated to solve request matching and vehicle dispatching. Current practice is to obtain optimal solutions of such problems with commercial software such as Gurobi and CPLEX. However, these solvers are known to be limited by computation time once travel demand increases significantly.
To solve large-scale instances within an acceptable time, we intend to discover potential shared trips by exploring passengers' characteristics, thereby reducing the computational complexity. Shared trips are trips whose schedules overlap in both the time and space dimensions. This is similar to cluster analysis (e.g., [19]), which finds related elements and divides a set into several independent clusters. Thus minimizing the number of vehicles can be regarded as minimizing the number of independent clusters. The minimization implies that each cluster should include as many passengers as possible.
The relationship matrix Π is shown in Figure 3, where the element π_{u,k} is the weight of a passenger pair. The detour distance, overlapping trip, and waiting time are all feasible parameters for these weights. To give an intuitive evaluation of the service system, we take the saved travel distance as the weight. The weight is thus formulated as π_{u,k} = R(u) + R(k) − R(s), where R(u) and R(k) are the travel distances of the two passengers served individually and R(s) denotes the travel distance for the passenger pair (u, k). It should be pointed out that a passenger pair is not valid if the two passengers cannot be matched under the time and space constraints. In this situation, we set R(s) to a very large number. In addition, diagonal elements are also invalid pairs and are set to 0 in the relationship matrix.
With this matrix, the task of passenger clustering can be regarded as finding the sharing pairs with the maximum total saved travel distance. This process is similar to the job assignment problem and is stated by the model in Equations (24a)-(24d).
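A minimal sketch of how the weights π_{u,k} = R(u) + R(k) − R(s) could be computed is given below. It makes several assumptions that are not stated in the paper: nodes are grid coordinates with Manhattan distances, time windows are ignored, R(s) is taken as the shortest route that picks up both passengers before dropping either off, and validity is supplied by a hypothetical `is_valid` callback. `BIG` stands in for the "very large number" used for invalid pairs.

```python
BIG = 10**6  # stands in for the "very large number" assigned to invalid pairs


def manhattan(a, b):
    """Grid distance between nodes a = (x, y) and b = (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def shared_distance(req_u, req_k):
    """R(s): shortest route that picks up both passengers, then drops both off."""
    (o_u, d_u), (o_k, d_k) = req_u, req_k
    best = None
    for first, second in ((o_u, o_k), (o_k, o_u)):
        for third, fourth in ((d_u, d_k), (d_k, d_u)):
            route = [first, second, third, fourth]
            length = sum(manhattan(route[i], route[i + 1]) for i in range(3))
            best = length if best is None else min(best, length)
    return best


def relationship_matrix(requests, is_valid=lambda u, k: True):
    """pi[u][k] = R(u) + R(k) - R(s); diagonal entries stay 0."""
    n = len(requests)
    pi = [[0] * n for _ in range(n)]
    for u in range(n):
        for k in range(n):
            if u == k:
                continue
            r_u = manhattan(*requests[u])
            r_k = manhattan(*requests[k])
            r_s = shared_distance(requests[u], requests[k]) if is_valid(u, k) else BIG
            pi[u][k] = r_u + r_k - r_s
    return pi


# Three hypothetical requests given as (origin, destination)
requests = [((0, 0), (4, 0)), ((1, 0), (3, 0)), ((0, 4), (4, 4))]
for row in relationship_matrix(requests):
    print(row)
```

In this example, passengers 0 and 1 travel along the same row, so sharing saves distance and π_{0,1} is positive, while pairing either of them with passenger 2 yields a negative weight.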
In this way, the vehicle dispatching problem is converted into segmenting the passenger set. If each cluster is taken as a passenger with a specific OD pair and time restriction, traffic demand is reduced to the number of clusters in this cluster-based method. In this simplified network, passengers no longer intersect in both the time and space dimensions, i.e., multiple passengers cannot share one vehicle at the same time. The problem is thus approximately translated into a multiple travelling salesman problem (mTSP) with time windows. To solve this model, we again adopt the cluster-based method to classify passengers and derive the number of required vehicles. The cluster-based method has been shown to significantly simplify the optimization model of this paper. The whole process of the proposed algorithm can be summarized in two parts, i.e., passenger clustering and cluster compression, as shown in Figure 2. The detailed description of this algorithm is as follows:
3.1. Part I: Passenger Clustering. To divide passengers into groups, the first step is to describe the sharing ability of passengers for selecting sharing pairs. It should be noted that a sharing pair means two passengers carried by one AV at the same time. Based on passengers' OD pairs and service time restrictions, we can roughly evaluate the sharing ability of any two passengers and generate a relationship matrix.
Journal of Advanced Transportation
If the set of sharing pairs is empty, stop the iteration and output the passenger groups; otherwise go to Step 3.
Step 3: Passenger number redefinition. Take each group as a new passenger, and relabel them.
Step 4: Relationship matrix recalculation. Since a new passenger represents a passenger group, the total travel distance R(s) for a passenger pair should be calculated with a heuristic method or dynamic programming. For simplicity, we adopt the insertion algorithm to find a feasible visiting sequence and evaluate the total travel distance. Then regenerate the relationship matrix and go back to Step 2.
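The sharing-pair determination and refinement of Part I (solve the assignment model, delete diagonal, invalid, and repeated pairs, then merge groups that share a passenger) can be sketched as follows. The sketch swaps in a brute-force search over permutations for the Hungarian algorithm, which is only practical for a handful of passengers; a polynomial-time assignment solver would replace `best_assignment` in practice. The threshold used to detect invalid pairs and all function names are assumptions made for illustration.

```python
from itertools import permutations

BIG = 10**6  # weights at or below -BIG/2 are treated as invalid pairs (assumption)


def best_assignment(pi):
    """Maximum-weight assignment by brute force; stand-in for the Hungarian algorithm."""
    n = len(pi)
    return max(permutations(range(n)),
               key=lambda perm: sum(pi[u][perm[u]] for u in range(n)))


def sharing_groups(pi):
    """One round of Step 2: select pairs, refine them, and merge into groups."""
    perm = best_assignment(pi)
    pairs = set()
    for u, k in enumerate(perm):
        if u == k:                  # diagonal pair (u, u): delete
            continue
        if pi[u][k] <= -BIG // 2:   # invalid pair: delete
            continue
        pairs.add((min(u, k), max(u, k)))  # (u, k) and (k, u) collapse to one pair

    groups = []
    for u, k in sorted(pairs):
        # Merge every existing group that already contains u or k.
        touching = [g for g in groups if u in g or k in g]
        merged = {u, k}.union(*touching)
        groups = [g for g in groups if g not in touching] + [merged]

    matched = set().union(*groups) if groups else set()
    groups += [{u} for u in range(len(pi)) if u not in matched]  # unmatched passengers
    return sorted(sorted(g) for g in groups)


pi = [[0, 2, -4],
      [2, 0, -6],
      [-4, -6, 0]]
print(sharing_groups(pi))  # [[0, 1], [2]]
```

Repeating this round after relabeling each group as a new passenger and recomputing the matrix gives the iteration described in Steps 2 to 4.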

Part II: Cluster Compression.
This part aims to lower the upper bound and move it towards the optimal solution. If each cluster is served by one vehicle, the number of clusters gives an upper bound for the vehicle dispatching problem. To reduce the number of used vehicles, clusters are recombined, as shown in Figure 4. It should be noted that, for the clusters output from Part I, a vehicle cannot serve multiple clusters at the same time. Hence, if each cluster is considered as a passenger (Figure 4(b)), the problem is similar to the mTSP with time windows. For the mTSP, the objective is to minimize the total travel distance. Thus the goal of this part is to dispatch the minimum number of vehicles to visit these passengers with the least travel distance. The basic idea is to prejudge whether any two passengers can be served by one vehicle. For passengers u and u′, assuming that they are served by the same vehicle, the possible serving sequences are o(u) → d(u) → o(u′) → d(u′) and o(u′) → d(u′) → o(u) → d(u). If either sequence satisfies the pick-up and drop-off time restrictions, the two passengers are marked as a link, as shown in Figure 4(c). The weight of a link is the travel distance from drop-off point d(u) to pick-up point o(u′), or from drop-off point d(u′) to pick-up point o(u). To choose appropriate links, we again use the Hungarian algorithm. A link is then considered as a new passenger in the network, as shown in Figure 4(d). The same process is applied again to link passengers until no link can be established. At the end, independent passengers (see Figure 4(e)) are defined as branches, where each branch requires one vehicle and its passengers can be derived from the associated links. The whole iterative process is as follows:
Step 1: Preliminary. Based on the information submitted by passengers, recalculate the OD and time requirements for the clusters and relabel the clusters.
In the assignment model of Part I, ω_{u,k} is a binary variable that equals 1 if passengers u and k are combined and 0 otherwise.
Equation (24a) denotes the objective of maximizing the total saved travel cost. Equation (24b) is the weight expression for passenger pairs. Equations (24c) and (24d) state that each passenger should be matched with a passenger to share a ride.
The results of this model might contain invalid pairs, which should be filtered out to obtain valid sharing pairs. In the relationship matrix, diagonal pairs are invalid but may appear in the results. Hence, the pair (u, u) is deleted if ω_{u,u} = 1. Repeated pairs are another issue. For example, if ω_{u,k} and ω_{k,u} are both equal to 1, they both indicate that passengers u and k form a group. Thus, pair (u, k) or (k, u) needs to be deleted to simplify the sharing pairs. The aforementioned model and refining process are only the first step in determining passenger pairs and cannot be used directly to find the minimum number of passenger groups. This is because more pairs might be found between passenger pairs and single passengers. To minimize the number of passenger groups, an iterative process is conducted to explore new passenger pairs. The whole procedure of passenger clustering is as follows:
Step 1: Preliminary. Collect passenger OD pairs and expected service times. Label them and mark them as original passengers. Then calculate the relationship matrix based on Equation (24b).
Step 2: Sharing pair determination. Solve Equations (24a)-(24d) with the Hungarian algorithm and obtain sharing pairs by deleting invalid pairs and repeated pairs. Then take each sharing pair as a group. If a passenger appears in two groups simultaneously, merge these two groups into one, until all groups are independent. Unmatched passengers are each regarded as a group.

In this uniformly distributed scenario, the minimum vehicle number for the 24-request case can be estimated as follows. First we consider ride sharing within the same OD pair, i.e., assigning one vehicle to each OD pair. In this situation, four passengers can be served and two passengers are left for each OD pair. Since paths A → B → D → C and D → C → A → B are sharing rides, dispatching one vehicle for each path will completely serve the remaining passengers. In this way, 6 vehicles are needed to serve 24 passengers. Similarly, the minimum numbers for the other instances are 8, 10, and 12 vehicles. Our model and a typical DARP model [9] are solved for this case using the Gurobi solver. Their results are listed in Table 2. For the four cases, our model obtains the correct minimum vehicle numbers given above. However, the DARP model could not find the correct minimum number for the 24 and 40 cases within a short time.
In the randomly generated scenario, any two nodes of the network can form an OD pair with stochastically given time windows. In this case, one passenger might share rides with more passengers, but only within the time window restrictions. As presented in Table 3, our model still gives optimal values. The computation times of our model do not increase drastically when exploring the best passenger combination. This implies that our model is better suited to solving the vehicle dispatching problem.
For these instances, the proposed cluster-based algorithm and a heuristic algorithm (the insertion algorithm [20]) are also used to find dispatching plans. The results of these algorithms are listed in Tables 4 and 5. In the fixed OD pair case, the minimum vehicle numbers of the two algorithms are equal to the optimal values and the computation times are shorter than those of the Gurobi solver.

The clusters are labeled with new passenger indexes u*. Then generate a 0-1 matrix to express the connectivity of these re-created passengers, and mark gap distances as weight values for links and a very large number for non-links.
Step 2: Link selection. Apply the Hungarian algorithm to find the links with the minimum accumulated gap distance. Then filter the selected links to produce independent links; this process is similar to Equation (23).
Step 3: Stop criterion. If no feasible link is available, stop the calculation and output the branches; otherwise go to Step 4.
Step 4: Passenger set updating. Recalculate the OD and time limitation for each independent link and relabel the passengers. Then go back to Step 2.
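The link matrix used in Part II can be sketched as follows. The sketch assumes grid nodes with Manhattan distances, a travel time of 60 seconds per grid link, and a simplified feasibility test in which the vehicle leaves the first cluster at its earliest pick-up time and must reach the second cluster's origin no later than that cluster's latest pick-up time. The dictionary keys, function names, and the directed form of the matrix (row served first, column served second) are illustrative assumptions rather than the paper's exact procedure.

```python
BIG = 10**6      # weight marking a non-link
LINK_TIME = 60   # seconds per grid link (assumed uniform)


def manhattan(a, b):
    """Grid distance between nodes a = (x, y) and b = (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def gap_if_linkable(first, second):
    """Gap distance d(first) -> o(second) if `second` can follow `first`, else None."""
    gap = manhattan(first["d"], second["o"])
    earliest_dropoff = first["earliest_pickup"] + LINK_TIME * manhattan(first["o"], first["d"])
    arrival = earliest_dropoff + LINK_TIME * gap
    return gap if arrival <= second["latest_pickup"] else None


def link_matrix(clusters):
    """w[i][j] = gap distance if cluster j can be served right after cluster i, else BIG."""
    n = len(clusters)
    w = [[BIG] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            gap = gap_if_linkable(clusters[i], clusters[j])
            if gap is not None:
                w[i][j] = gap
    return w


# Two hypothetical clusters, each treated as a single passenger
clusters = [
    {"o": (0, 0), "d": (2, 0), "earliest_pickup": 0, "latest_pickup": 120},
    {"o": (2, 1), "d": (4, 1), "earliest_pickup": 300, "latest_pickup": 420},
]
print(link_matrix(clusters))  # [[1000000, 1], [1000000, 1000000]]
```

Here the second cluster can be served after the first with a gap of one link, while the reverse order violates the first cluster's latest pick-up time, so only one direction is marked as a link.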

Numerical Experiments
In this section, a set of cases is generated to examine the validity of our model and the performance of the proposed cluster-based algorithm. To provide sufficient comparisons, computational results of a typical DARP model and the insertion algorithm are also calculated. From these results, the application scope of our model and the proposed algorithm is determined. In addition, the effect of sensitive parameters on the minimum number of vehicles is analyzed for the SAV system.

Results of Small Scale Problem.
For the small scale problem test, the request size ranges from 24 to 48, defined on a simple 5 × 5 network (Figure 5). The travel time of each link between adjacent nodes is assumed to be 60 seconds. Two simple scenarios are employed: fixed and random OD pairs.
In the first scenario, we explain our model with predetermined sharing paths. The OD pairs are designed to lie in diagonal directions (A → D, B → C, D → A, and C → B). Each OD pair is allocated the same number of requests.

At the free-flow speed of 30 mph [7], i.e., assuming vehicles travel at the maximum speed, the travel time on each link is approximately 60 seconds. The time interval between the earliest and latest times at pick-up or drop-off nodes is still set to 120 seconds.
In this section, random requests ranging from 500 to 4000 are tested (Table 7). The proposed algorithm finds ride-sharing patterns with fewer vehicles and shorter travel distance. In terms of vehicle number, the proposed cluster-based algorithm saves 12-42 vehicles compared with the insertion algorithm. As for travel distance, a reduction of 100-600 miles is obtained. The average occupancy rate (average number of passengers per vehicle) obtained with the cluster-based algorithm ranges from 3.6 to 5.32, which is higher than the insertion algorithm's range of 3.31 to 5.12 (Figure 6). Detailed passenger number distributions are presented in Figure 7. The distributions resemble a normal distribution, with the peak frequency in the middle. The cluster-based algorithm's distributions always have a larger median value. For example, the mean value of the cluster-based algorithm is 4 for the 500-request case, which is one passenger more than that of the insertion algorithm. This means that the cluster-based algorithm is able to arrange more shared rides than the insertion algorithm.

For the random case, most results of the cluster-based algorithm reach the best values, while the heuristic algorithm could not give any optimal values. For the 28- and 40-request cases, although the dispatching plans derived from the cluster-based algorithm need one more vehicle, they still need fewer vehicles than those of the insertion algorithm. This shows that the cluster-based algorithm has a better chance of finding optimal or near-optimal values. Nevertheless, compared with the insertion algorithm, the cluster-based algorithm might need slightly more time to obtain superior results.
To further analyze our model's adaptability, we conduct a series of tests to observe how the computation time varies with increasing requests on the same road network, as shown in Table 6. The minimum numbers of vehicles found by the cluster-based algorithm are 8, 9, 8, 13, 12, and 13, while the optimal values of our model are 8, 8, 8, 11, 12, and 11. Although the cluster-based algorithm does not find the optimal values in all cases, it has an overwhelming advantage in computation time. The computation time of Gurobi exceeds 1000 seconds when the number of requests reaches 100, so several hours might be needed to find the optimal value for larger instances. Therefore, the cluster-based algorithm is a better choice when near-optimal solutions for hundreds of requests are needed in a short time.

Comparison Test for Large-Scale Case.
In addition to the 5 × 5 road network, a larger 20 × 20 grid representing a city downtown is adopted with randomly generated requests. The length of each link is set to 2640 ft, and the free-flow speed is 30 mph [7].

This implies that the SAV system might not gain more benefits by introducing larger vehicles with more seats under this condition. This is mainly because the probability that numerous passengers have very similar OD pairs and time windows is relatively low. The average number of passengers served by a vehicle at the same time will not be large, especially when demand is lower than 3000. Thus a vehicle with four seats is enough under this condition. The influence of the time windows (the difference between the earliest and latest expected pick-up or drop-off times) on the system's total vehicle number is shown in Figure 9. As the time windows increase from 2 minutes to 16 minutes, the total vehicle number gradually decreases from 139 to 93 (Figure 9(a)), which also shortens the total travel distance (Figure 9(b)). This indicates that the system succeeds in finding shared rides for more passengers when the time restriction is extended. Since the time interval is a key parameter that reflects passengers' willingness to wait, passengers might be picked up later than their earliest expected times if it is set to be longer. If this parameter is very large, passengers intuitively have more opportunities to share a ride with others. Under this condition, the number of vehicles becomes smaller. As more trips are integrated into one trip, the total travel distance is also reduced.

Conclusions
In this paper, we formulate the vehicle dispatching problem of the SAV transportation system as a 0-1 integer programming model. Unlike existing vehicle routing optimization models, our model focuses on exploring the similarity of passengers' demands in the time and space dimensions to classify passengers into groups, from which the number of required vehicles is derived indirectly. To solve this model, a cluster-based algorithm is proposed for classifying passengers. The whole process consists of two parts: (1) the Hungarian algorithm is introduced to select appropriate passengers for sharing trips and determine an upper bound on the required vehicles; (2) a reunion process that links sharing trips is conducted to lower the upper bound. Since the Hungarian algorithm needs only polynomial time, the computational complexity of the proposed algorithm is greatly reduced.

Compared with the insertion method, the cluster-based algorithm yields a larger number of highly loaded vehicles. For example, in the 4000-request case, 52 vehicles are assigned 8 passengers by the cluster-based algorithm, while only 7 vehicles are assigned the same number of passengers by the insertion algorithm.

Sensitivity Analysis.
In this section, we conduct a sensitivity analysis to examine how the performance of the proposed algorithm is affected by key input parameters. Figure 8 presents the required number of vehicles under various vehicle capacities. It should be noted that a capacity of 1 means that only one passenger is served at a time by a vehicle with four seats, i.e., ride-sharing is not included. For a capacity of 7, seven passengers are allowed to be carried at the same time. For simplicity, we denote vehicle capacities of 1, 4, and 7 as conditions I, II, and III respectively. For condition I, the number of required vehicles is much higher than under the other two conditions, while the results of conditions II and III differ little. If condition II is taken as the benchmark, vehicles with higher capacity contribute little to reducing the total vehicle number.

The reduced complexity makes the proposed algorithm applicable to large-scale cases. The validity and efficiency of our vehicle dispatching model and the proposed cluster-based algorithm are demonstrated through a series of tests. First, the model and algorithm are applied to small-size passenger requests. Results show that the proposed algorithm can always find optimal or near-optimal solutions when compared with the optimal values obtained from the optimization solver. We also list results of a typical DARP model [9] and the insertion algorithm for further analysis. The comparison of computation times and solution gaps indicates that the proposed algorithm has an advantage in gathering passengers to share a vehicle and in driving the objective function towards the best value. For large-size cases, the new algorithm still performs better than the insertion algorithm in minimizing the number of vehicles and the total travel distance. Finally, the effect of key input parameters on the number of vehicles is discussed.
It is concluded that enlarging the vehicle capacity beyond four does not further reduce the number of used vehicles, whereas extending the waiting time has a positive effect on decreasing the number of vehicles.
Throughout this study, we mainly investigate how to minimize the number of used vehicles for the SAV system with given demands, which is a static dispatching method. To broaden the application scope of the cluster-based algorithm, dynamic or on-line planning is an interesting research direction. Furthermore, we assume that all passengers are willing to accept ride sharing, which considers the dispatching problem only from the system's point of view. We will extend our model by introducing customized passenger demands in further research. In addition, the price charged is a key factor in passengers' decisions, which might bring a trade-off between passenger cost and system revenue. Price optimization will therefore also be considered in our future research to enrich our model.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.