Hybrid Clustering Algorithms with GRASP to Construct an Initial Solution for the MVPPDP

Abstract: Mobile commerce (m-commerce) contributes to increasing the popularity of electronic commerce (e-commerce), allowing anybody to sell or buy goods using a mobile device or tablet anywhere and at any time. As demand for e-commerce increases tremendously, the pressure on delivery companies increases to organise their transportation plans to achieve profits and customer satisfaction. One important planning problem in this domain is the multi-vehicle profitable pickup and delivery problem (MVPPDP), where a selected set of pickup and delivery customers must be served within a certain allowed trip time. In this paper, we propose hybrid clustering algorithms with the greedy randomised adaptive search procedure (GRASP) to construct an initial solution for the MVPPDP. Our approaches first cluster the search space in order to reduce its dimensionality, then use GRASP to build routes for each cluster. We compared our results with state-of-the-art construction heuristics that have been used to construct initial solutions to this problem. Experimental results show that our proposed algorithms achieve excellent performance in terms of both quality of solutions and processing time.


Introduction
Today, most people rely on electronic commerce (e-commerce) to meet their needs. E-commerce has several advantages that encourage people to rely on it rather than traditional purchasing methods, such as availability of products at any time, low prices, privacy, comfort, and quality assurance. In addition to those advantages, the spread of mobile commerce (m-commerce) has increased the popularity of e-commerce, allowing people to sell or buy goods using a mobile device or tablet anywhere and at any time. As the demand for e-commerce has increased, the pressure on delivery companies to organise their transportation plans to achieve profits and customer satisfaction has also increased.
The means of transport used by millions of people all over the world to move themselves or their goods make the transport and shipment sector a hotspot in the research field. Most of this research aims to optimise the planning and organisation of the different means of transport used: land, air, and sea. Land transport is considered the most important, because it is used daily in different countries and has a huge volume that includes buses, trucks, cars, trains, trams, motorcycles, and more. In addition, the increasingly harmful impact of land transport encourages researchers in different fields (operations research, computer science, and industrial engineering) to find the best solutions possible. There are two negative impacts of land transport: environmental and economic. According to environmental research, transport plays a major role in spreading pollution and increasing global warming, which results in many diseases that have negative effects on people's health. Therefore, researchers aim to find solutions that decrease congestion and the environmental damage caused by harmful emissions of greenhouse gases such as carbon dioxide. On the other hand, the economic effects on companies that work in the transport and shipment sector include the poor organisation of the means they use. Furthermore, the unexploited space in trucks results in huge losses for these companies in addition to the environmental impact, with statistics indicating that 15% to 30% of accidents, traffic congestion, and pollution are caused by empty trucks [Gansterer, Hartl and Vetschera (2019)].
There are many problems that model the means of land transport, presented by researchers to optimise services and minimise their harmful impacts, such as vehicle routing problems, bus scheduling problems [Saha (1970)], cash transportation problems [Yan, Wang and Wu (2012)], railroad blocking problems [Barnhart, Jin and Vance (2000)], and others [Díaz-Parra, Ruiz-Vanoye, Bernábe Loranca et al. (2014); Cornillier, Boctor and Laporte (2008); Zhu, Hu and Wang (2012)]. One of the best-known combinatorial optimisation problems designed to optimise transportation network management and organise the distribution of goods and vehicles is the vehicle routing problem (VRP). The goal of the problem is to distribute goods to a set of customers by finding the best route(s). One or more vehicles with limited capacity from a homogeneous fleet are used to transfer goods to customers. Each vehicle must start its trip from a depot and return to it after serving its customers. In the literature, different types of VRPs have been presented over the last 50 years. Although these types vary in constraints and complexity, they share a common goal of minimising the harmful impacts of transport besides reducing cost [Lin, Choy, Ho et al. (2014)]. One important variant of the VRP is the pickup and delivery problem (PDP), which aims to reduce the total transportation cost when people or goods are moved. It differs from the VRP in the type of service provided, with each good or customer transferred between two predefined locations: a pickup node and a delivery node. Several real-world applications of the PDP include the distribution of beverages and the collection of empty bottles and cans, the shipping of cargo, and the transportation of raw materials from suppliers to factories. Also, many extensions of the PDP have been presented in the literature that include slight changes to the workings of the original PDP.
The selective pickup and delivery problem (SPDP) is a relatively new variant of the PDP that differs from standard PDPs by serving only some customers rather than all of them. This means the SPDP is valuable to companies that have limited resources and can realise profits by finding the best routes to serve only the profitable customers. There are two types of SPDPs: (1) SPDPs subject to minimising the travelling cost only (e.g., [Ting and Liao (2013); Liao and Ting (2010); Ho, Sin and Szeto (2016)]) and (2) SPDPs subject to minimising the travelling cost and maximising the profit collection (e.g., [Coelho, Munhoz, Haddad et al. (2012); Bruck, dos Santos and Arroyo (2012)]). Gansterer et al. [Gansterer, Küçüktepe and Hartl (2017)] presented the multi-vehicle profitable pickup and delivery problem (MVPPDP), which belongs to the second type of SPDP. It aims to find the best routes by minimising the travelling cost and maximising the profit collection in a one-day planning horizon. The MVPPDP is a static problem with a central depot, a set of customers, a predefined number of requests, products, and a homogeneous fleet of vehicles. Each request is defined by a customer pair: a pickup customer and a delivery customer. Thus, the products are transported from a pickup customer and delivered to a delivery customer to earn a profit from the service. Some real-life applications of the MVPPDP include food delivery mobile apps and delivery company apps. Since the MVPPDP is an NP-hard problem (because it is a variant of the SPDP, which is itself a special case of the PDP) [Gansterer, Küçüktepe and Hartl (2017)], metaheuristic algorithms can be utilised to find good solutions within a reasonable processing time, especially for medium- and large-sized problems. In this paper, we propose an approach based on clustering algorithms, which have proven efficient at speeding up the search process and decreasing the processing time.
We used K-means, adaptive K-means and ant colony optimisation (ACO) algorithms to cluster the search space of the MVPPDP. In addition, we modified the greedy randomised adaptive search procedure (GRASP) that was used in Alhujaylan et al. [Alhujaylan and Hosny (2019)] to construct an initial solution for the MVPPDP after the clustering phase. The main contributions of this paper are as follows: 1) helping delivery companies and food delivery mobile apps to efficiently plan their services by speeding up the selection of customers that can be served and finding the shortest routes to obtain the profits and achieve customer satisfaction, 2) efficient transport planning that can help decrease the environmentally harmful effects of emissions of CO2 and other gases, besides saving energy resources, and 3) novel solution approaches to construct an initial solution for the problem, which outperform state-of-the-art methods in the literature. The rest of this paper is organised as follows. A review of related work is presented in Section 2. Section 3 introduces the formal problem definition. The proposed method is described in detail in Section 4. Section 5 illustrates the experimental results. Finally, conclusions and future work are presented in Section 6.

Related work
The MVPPDP belongs to the second type of SPDP, which aims to minimise the travelling costs and maximise the profits collected by visiting only the profitable customer pairs. Both the MVPPDP and the profitable tour problem (PTP) share the same goal of finding a good route that maximises the difference between the total collected profit and the total travelling cost. The main difference between them is that in the PTP there are no constraints that must be considered on the vehicle route, such as a maximum trip time, vehicle capacity limits, or precedence constraints [Archetti, Speranza and Vigo (2014)]. Studies that address the PTP are rare in the literature [Toth, Paolo and Vigo (2014)]. Therefore, we first present work related to the MVPPDP. Then, some of the studies that have been done on the SPDP with minimising travelling cost and maximising profit collection are briefly presented. Five algorithms have been presented to solve the MVPPDP. The first four, presented by Küçüktepe et al. [Küçüktepe (2014); Gansterer, Küçüktepe and Hartl (2017)], used the variable neighbourhood search (VNS) metaheuristic. The construction phase of VNS was done using two heuristics: the greedy construction heuristic (C1) and the two-stage cheapest insertion heuristic (C2), while the improvement phase was based on applying the following neighbourhood operators: the pairwise forward exchange, the pairwise backward exchange, relocate pairs, forward insertion, backward insertion, inter tour exchange, inter dummy exchange, inter tour insert, inter dummy insert, and the gravity centre exchange. The authors adopted two strategies for applying the neighbourhood operators: a sequential (Seq) approach that uses a predefined sequence of neighbourhoods and a self-adaptive (Sea) approach that determines the sequence of neighbourhoods by adapting itself according to the search status.
Finally, an alternative algorithm based on a guided local search (GLS) metaheuristic was implemented for comparison with the proposed approach. The proposed algorithm was tested on 36 new randomly generated data instances of different sizes. Experimental results showed that both variants of general VNS outperformed GLS in terms of solution quality. The fifth algorithm was proposed by Haddad [Haddad (2017)]. A combination of an iterated local search and random variable neighbourhood descent was proposed to solve the profitable pickup and delivery problem; therefore, the algorithm was named IPPD. The IPPD algorithm was not limited to accepting only feasible solutions during the search. The C1 heuristic that was used in Gansterer et al. [Gansterer, Küçüktepe and Hartl (2017)] was adopted to construct the initial solution. To improve the solution, several of the following neighbourhood moves were then applied: pair swap or pair shift, block swap or block shift, pickup or delivery shift, inter pair swap or inter pair shift, inter block swap or inter block shift, gravity centre exchange, insert operator and remove operator. The proposed algorithm was tested on the benchmark instances proposed by Gansterer et al. [Gansterer, Küçüktepe and Hartl (2017)]. It proved its efficiency in addressing small- and medium-sized instances, for which it was able to find new best solutions for six instances. However, its performance was not adequate for large-sized instances. Recently, Alhujaylan et al. [Alhujaylan and Hosny (2019)] used the construction phase of the well-known GRASP to build initial solutions for the MVPPDP. They compared the method's performance with two construction heuristics that were previously used in the literature to build initial solutions of the MVPPDP. The experimental results showed that the proposed method outperformed the other construction heuristics, especially on small-sized instances, where it obtained eight new best initial solutions for the problem.
One of the practical applications of the SPDP was the distribution of soft drinks and collection of recyclable containers by a Quebec-based company, addressed by applying three heuristics: the nearest neighbour heuristic, the first petal heuristic, and the second petal heuristic. The empirical results indicated that these heuristics contributed to decreasing the travelled distance by 23% [Privé, Renaud, Boctor et al. (2006)]. Another application of the SPDP is the single vehicle routing problem with deliveries and selective pickups (SVRPDSP), which has been solved using mixed integer linear programming (MILP) and a Tabu search (TS) algorithm. The proposed algorithm was applied to instances derived from the VRP library, and it was able to produce near-optimal solutions for 68 instances [Gribkovskaia, Laporte and Shyshou (2008)]. Additionally, the selective multi-depot vehicle routing problem with pricing required an organised solution to collect cores of durable goods from customers to encourage them to buy new products. A rich neighbourhood TS (TS-RN) coupled with two MILP models was proposed to solve this problem. According to the results, the proposed approach succeeded in terms of both efficiency and accuracy when it was tested on 40 randomly generated instances [Aras, Aksen and Tekin (2011)]. The instances used in Gribkovskaia et al. [Gribkovskaia, Laporte and Shyshou (2008)] were also used to test two hybrid general variable neighbourhood searches (HGVNSs) and a hybrid metaheuristic based on an evolutionary algorithm (EA), which were proposed by Coelho et al. [Coelho, Munhoz, Haddad et al. (2012); Bruck, dos Santos and Arroyo (2012)] to solve the SVRPDSP. Both metaheuristics proved to be robust and effective. For the interested reader, other applications of the SPDP can be found in Coelho et al. [Coelho, Munhoz, Ochi et al. (2016)].

Problem definition
The MVPPDP is a static problem with predefined constituents. We assume that there is a central depot that receives customer requests. These requests consist of pairs of pickup and delivery customers. To serve these requests, we have a set of homogeneous vehicles that transport a number of homogeneous products from a selected (partial) set of pickup customers to their corresponding delivery customers, such that a certain profit is gained. The constraints of the MVPPDP that should be considered are the following:
• The pairing constraint: For each request there is a predefined customer pair (pickup and delivery).
• The precedence constraint: The pickup customer must be visited before the delivery customer.
• The trip time constraint: Each vehicle has a certain daily travel time limit that cannot be exceeded while customers are being served.
• The capacity constraint: Each vehicle has a limited capacity that cannot be exceeded when products are being collected from pickup customers.
• Each vehicle should start/end its journey from/at the depot with an empty load.
• Each customer must be visited only once.
The formal definition of the MVPPDP is given by Küçüktepe [Küçüktepe (2014)] as follows: Let G = (V, A) be a graph, where V = {0, …, 2n+1} is the vertex set. The vertices 0 and 2n+1 represent the central depot. P = {1, …, n} is the set of pickup customers, while D = {n+1, …, 2n} is the set of delivery customers. The arc set is A = {(i, j): i, j ∈ V, i ≠ j}, such that a non-negative routing cost c_ij is associated with each arc. It is assumed that a revenue p_i is collected when visiting each delivery vertex i. Also, for a pickup vertex i there is a supply q_i > 0, while for the corresponding delivery vertex there is a demand q_{n+i} = −q_i. It is also assumed that there is no supply or demand for the depot (i.e., q_0 = 0). Finally, there is a set of vehicles K = {1, …, m}, such that each vehicle has a maximum load capacity Q and a maximum tour time limit T. The objective of the MVPPDP is to maximise the total collected revenue minus the total routing cost:

max Σ_{k∈K} Σ_{(i,j)∈A} (p_j − c_ij) · x_ijk    (1)

where x_ijk is a binary decision variable that is equal to one if arc (i, j) is used by vehicle k and zero otherwise, and the revenue p_j is taken as zero for pickup vertices and the depot. For the details of the mathematical model of the MVPPDP, the reader is referred to Gansterer et al. [Gansterer, Küçüktepe and Hartl (2017); Alhujaylan and Hosny (2019)].
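To make the objective concrete, the following sketch evaluates one vehicle route as collected revenue minus travel cost on a toy instance. The helper names and the Euclidean cost model are illustrative assumptions, not part of the formal model above.

```python
import math

def dist(a, b):
    """Euclidean travel cost between two coordinate pairs (an assumption)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def route_value(route, coords, revenue):
    """Collected revenue minus travel cost for one vehicle route.

    route: vertex indices starting and ending at the depot (vertex 0).
    revenue: revenue[v] is the profit collected at delivery vertex v
    (zero for the depot and for pickup vertices).
    """
    cost = sum(dist(coords[route[i]], coords[route[i + 1]])
               for i in range(len(route) - 1))
    profit = sum(revenue[v] for v in route)
    return profit - cost

# Toy instance: depot 0, pickup 1, delivery 2 (revenue 10).
coords = {0: (0, 0), 1: (3, 0), 2: (3, 4)}
revenue = {0: 0, 1: 0, 2: 10}
# Depot -> pickup -> delivery -> depot: cost 3 + 4 + 5 = 12, profit 10.
print(route_value([0, 1, 2, 0], coords, revenue))  # -2.0
```

A negative value, as here, means the pair is not worth serving, which is exactly why the MVPPDP selects only a profitable subset of customer pairs.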

Proposed method
We divided our proposed method for solving the MVPPDP into two phases: a clustering phase and a routing phase. The details of each phase are presented below.

Clustering phase
The purpose of the clustering phase is to try to reduce the search space by dividing customers into clusters based on some relatedness measure. After this, selected customers from each cluster will be visited by one vehicle whose route will be planned in the routing phase. We proposed three clustering algorithms to cluster the MVPPDP search space: 1) a K-means clustering algorithm, 2) an adaptive K-means clustering algorithm, and 3) an ACO-based clustering algorithm. In the MVPPDP, since each request has a pair of customers (pickup and delivery), the coordinates of the midpoint between the pickup customer and the delivery customer are computed to represent the request pair as follows:

m_i = ((x1 + x2)/2, (y1 + y2)/2)

where i represents the index number of a given request, i ∈ {1, …, n}, (x1, y1) represents the coordinates of the pickup customer, and (x2, y2) represents the coordinates of the delivery customer. Thus, the midpoints of requests were considered in all the clustering algorithms that have been used. The details of the algorithms are presented in the following sub-sections.
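The midpoint representation above can be sketched in a few lines (helper names are illustrative):

```python
def request_midpoint(pickup, delivery):
    """Midpoint of a pickup/delivery pair, used as the point to cluster."""
    (x1, y1), (x2, y2) = pickup, delivery
    return ((x1 + x2) / 2, (y1 + y2) / 2)

# Two toy requests, each given as (pickup coords, delivery coords).
requests = {1: ((0, 0), (4, 2)), 2: ((6, 6), (8, 10))}
midpoints = {i: request_midpoint(p, d) for i, (p, d) in requests.items()}
print(midpoints)  # {1: (2.0, 1.0), 2: (7.0, 8.0)}
```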

K-means clustering algorithm
The K-means algorithm, which was developed by MacQueen [MacQueen (1967)], is one of the most well-known and simplest unsupervised learning algorithms used to solve the familiar clustering problem. The K-means algorithm has numerous advantages that help make it usable for many clustering problems. For example, it is easy to understand and implement, it has high performance speed, and it is relatively efficient, with a time complexity of O(nKt), where t is the number of iterations required for cluster convergence, n is the number of objects in the dataset, and K is the number of clusters. The general idea of the K-means algorithm is to classify a given data set into an a priori number of clusters, with each cluster treated as a separate group. Dividing the problem into subproblems in this fashion eases and accelerates the solving of the problem. A flowchart of the K-means clustering algorithm is illustrated in Fig. 2. In the following steps, we explain how the algorithm works. Step 1: Initialisation of parameters In our K-means algorithm, two initial parameters need to be entered: the number of clusters K and the coordinates of customers.
Step 2: Initial clustering solution From each cluster j, j ∈ {1, 2, …, K}, a random request is selected to be the initial centroid of the cluster. The Euclidean distances between the rest of the requests (i.e., those not selected as cluster centroids) and the cluster centres {c_1, c_2, …, c_K} are then computed. Each request is assigned to the nearest cluster centre.
Step 3: Generation of a new clustering solution For each cluster j, a new cluster centre is calculated by computing the mean of the coordinates of all requests that belong to that cluster. We then repeat Step 2, with each request reassigned to the appropriate cluster based on the distance between it and the new cluster centres.

Step 4: Final clustering solution
To generate the final clustering solution, Step 3 is repeated until convergence (i.e., until no change is observed between the current iteration's clustering solution and the previous one) [Du (2010)].
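The four steps above can be sketched as follows. This is a minimal illustration over request midpoints, not the exact implementation; the paper selects random initial centroids, while here the first K points are used so the sketch is deterministic.

```python
import math

def kmeans(points, k):
    """Plain K-means over request midpoints (Steps 1-4 above)."""
    centres = [points[i] for i in range(k)]      # deterministic initialisation
    while True:
        clusters = [[] for _ in range(k)]
        for p in points:                         # Step 2: nearest-centre assignment
            j = min(range(k), key=lambda j: math.dist(p, centres[j]))
            clusters[j].append(p)
        # Step 3: recompute each centre as the mean of its members
        new_centres = [
            tuple(sum(v) / len(pts) for v in zip(*pts)) if pts else centres[j]
            for j, pts in enumerate(clusters)
        ]
        if new_centres == centres:               # Step 4: stop when nothing changes
            return clusters
        centres = new_centres

# Two well-separated groups of request midpoints.
midpoints = [(0, 0), (1, 0), (0, 1), (9, 9), (10, 9), (9, 10)]
clusters = kmeans(midpoints, 2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```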

Adaptive K-means clustering algorithm
One drawback of the classical K-means algorithm is that it does not take into consideration the particularities of the MVPPDP. Therefore, the main goal of the proposed adaptive K-means clustering algorithm is to cluster the search space with consideration of the MVPPDP's objective: maximising profit and minimising travelling cost. In other words, it differs from the K-means clustering algorithm by considering profits in addition to the distance between customers. Thus, in addition to the distance between the request and the cluster centre, our adaptive K-means algorithm also considers the "appropriateness" of the request within the cluster. The concept of appropriateness, which was used in Haddad [Haddad (2017)] for inserting new requests into a route, has been adapted to our method as follows: after the centre c_j is computed for each cluster j, a request i is assigned to the cluster for which its appropriateness, the ratio of the request's revenue to its distance from the cluster centre, is highest:

App(i, j) = p_i / dist(m_i, c_j)

where p_i is the revenue of request i and dist(m_i, c_j) is the Euclidean distance between the request's midpoint and the centre of cluster j. The pseudocode is given in Algorithm 1.

Algorithm 1: Pseudocode for adaptive K-means clustering
Input: K (the number of clusters), midpoint coordinates of requests i = {1, …, n}
/* Initial clustering solution */
For j = 1 to K
    Choose randomly a request to be an initial cluster centroid.
End For
• Compute the distance between the rest of the requests and the cluster centres.
• For each request, compute the appropriateness of the request within each cluster.
• Assign each request to the cluster with the maximum appropriateness value.
/* Generating a new clustering solution */
Repeat
    For j = 1 to K
        Reassign a new cluster centre by computing the mean of the requests that belong to cluster j.
    End For
    • Compute the distance between the requests and the new cluster centres.
    • For each request, compute the appropriateness of the request within each cluster.
    • Assign each request to the cluster with the maximum appropriateness value.
Until no change between the current clustering solution and the previous one
/* Final clustering solution */
Output: a solution with a set of clusters

ACO-based clustering algorithm

Ant colony optimisation (ACO) is a metaheuristic algorithm that mimics the cooperative behaviour of ants foraging in search of food. It was first introduced by Dorigo et al. [Colorni, Dorigo and Maniezzo (1991)]. The ACO algorithm is one of the most well-known of the swarm intelligence algorithms used to solve numerical and combinatorial optimisation problems, and it is particularly useful for problems which require finding the shortest path as a goal (see for example [Dahan, El Hindi, Mathkour et al. (2019); Bell and McMullen (2004)]). Additionally, clustering with ACO or with other models inspired by the behaviour of ants has been used as an alternative to traditional clustering algorithms. See Jafar et al. [Jafar and Sivakumar (2010)] for a survey of interesting clustering approaches based on ant behaviour. The pseudocode for our MVPPDP clustering-based ACO algorithm is presented in Algorithm 2, where the meaning of each notation is as follows: Max_Iter: maximum number of iterations; Num_Cust: number of customers; Pop_Size: size of the population; K: number of clusters; and τ_{i,j}: the pheromone trail matrix of size n × K, where n refers to the number of requests and K refers to the number of clusters. The details of each step of the clustering using ACO are presented below.
Step 1: Initialisation of parameters Several parameters need to be set before starting the algorithm, such as the population size (Pop_Size), the number of clusters (K), the number of customers (Num_Cust), the maximum number of iterations (Max_Iter), the pheromone trail matrix, which is initialised to a low value, and the evaporation rate (ρ).
Step 2: Initial population construction Initially, each artificial ant starts building its solution by randomly assigning requests into groups (clusters), such that each pair of pickup and delivery customers belongs to the same cluster.
Step 3: Solution evaluation The quality of each solution is evaluated in terms of the value of the objective function (OF), which aims to achieve high profit at the lowest possible cost. The calculation of the objective function depends on computing the centre of gravity (CG) for each cluster, which is used later to indicate the degree of appropriateness of customers within clusters. The concepts of centre of gravity and appropriateness, which were used in [Haddad (2017)] for inserting new requests into a solution, have been adapted for our method as follows: First, the centre of gravity CG_j = (X_j, Y_j) of each cluster j is computed, with the Cartesian coordinates (x_i, y_i) of each request i in the cluster weighted by the request's revenue p_i:

X_j = Σ_{i∈j} p_i x_i / Σ_{i∈j} p_i ,  Y_j = Σ_{i∈j} p_i y_i / Σ_{i∈j} p_i

Then, to determine whether each request is in the appropriate cluster, as done in the adaptive K-means algorithm, the appropriateness is computed by considering the distance between the request and the cluster centre with respect to the request's revenue:

App(i, j) = p_i / dist_i(CG_j)

where dist_i(CG_j) refers to the distance between request i and the CG of cluster j. After this, the OF for each solution is computed as the sum of the appropriateness values over all clusters. Then, the best solution, which has the maximum OF value, is selected and memorised in the Best_Solutions matrix. The pheromone trail matrix is then updated, as shown in the next step.
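The evaluation step can be sketched as follows, assuming the appropriateness is the revenue-to-distance ratio described above (function names and the toy data are illustrative):

```python
import math

def centre_of_gravity(cluster, coords, revenue):
    """Revenue-weighted mean of the request midpoints in one cluster."""
    total = sum(revenue[r] for r in cluster)
    x = sum(revenue[r] * coords[r][0] for r in cluster) / total
    y = sum(revenue[r] * coords[r][1] for r in cluster) / total
    return (x, y)

def appropriateness(r, cg, coords, revenue):
    """Revenue-to-distance ratio: high for profitable requests near the centre."""
    return revenue[r] / math.dist(coords[r], cg)

# Toy cluster of three request midpoints with their revenues.
coords = {1: (0, 0), 2: (4, 0), 3: (4, 3)}
revenue = {1: 10, 2: 30, 3: 10}
cg = centre_of_gravity([1, 2, 3], coords, revenue)
print(cg)  # (3.2, 0.6) -- pulled towards the high-revenue request 2
# The objective function of a solution sums these values over all clusters.
of = sum(appropriateness(r, cg, coords, revenue) for r in [1, 2, 3])
```

Note how the high-revenue request dominates both the centre of gravity and the appropriateness scores, which is the intended bias towards profitable customers.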
Step 4: Pheromone update The pheromone trail matrix has an important role in improving the quality of solutions during the progress of the algorithm. In our approach, we adopt an offline pheromone update [Talbi (2009)]. Thus, updating the pheromone trail matrix includes two phases: • An evaporation phase: To avoid premature convergence and increase the diversification and exploration of the search space, all the values of the pheromone trail matrix τ_{i,j} are reduced automatically by a fixed proportion, which is called the evaporation rate (ρ):

τ_{i,j} = (1 − ρ) · τ_{i,j}

where the trail value τ_{i,j} represents the pheromone concentration of request i associated with cluster j.

• A reinforcement phase:
To memorise the characteristics of the best solution (S*) obtained in the current iteration, the values of the pheromone trail matrix for the best solution found are increased by a positive value Δ, only for those requests that have been included in the best solution:

τ_{i,j} = τ_{i,j} + Δ, for each assignment of request i to cluster j in S*

Step 5: Generating new solutions For the process of generating new solutions, we were keen to achieve two goals: first, keeping track of the good solutions obtained in previous iterations, in order to continue searching in those areas; second, increasing the diversification of solutions to prevent getting stuck in a local optimum. To achieve the first goal (intensification), the values in the pheromone matrix helped us to focus on the promising areas that contain good solutions. To achieve the second goal (diversification), we used a simple heuristic of adding some randomness. Thus, each artificial ant constructs its solution using the following strategy: 1. The pheromone trail matrix values are normalised using this equation:

P_{i,j} = τ_{i,j} / Σ_{l=1}^{K} τ_{i,l}

where P_{i,j} is the normalised pheromone probability of request i belonging to cluster j, and K is the number of clusters.
2. A random number in the range between 0 and 1 is generated. 3. The request is assigned to a cluster by comparing the cumulative normalised pheromone probabilities to the random number: the request is assigned to cluster j if the random number is greater than or equal to the cumulative probability of the clusters before j and less than the cumulative probability including j (a roulette-wheel selection).

4. Steps 2 and 3 are repeated for all requests to generate the first new solution. The same steps are repeated to generate the rest of the new solutions in the population.
After this, the new solutions are evaluated again, and the pheromone matrix is updated as shown in Steps 3 and 4. This process is repeated until the stopping criterion is reached, which in our method is reaching the maximum number of iterations.
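Steps 4 and 5 above can be sketched together as follows. The parameter values (ρ, Δ) and the data layout are assumptions for illustration only.

```python
import random

def update_pheromone(tau, best_solution, rho=0.5, delta=1.0):
    """Offline pheromone update: evaporate every trail, then reinforce
    the assignments used by the best solution of the iteration.

    tau[r] holds the trail values of request r, one entry per cluster;
    best_solution maps each request to its cluster in the best solution.
    """
    for r in tau:
        for j in range(len(tau[r])):
            tau[r][j] *= (1 - rho)      # evaporation phase
    for r, j in best_solution.items():
        tau[r][j] += delta              # reinforcement phase

def assign_request(tau_row, rng):
    """Roulette-wheel cluster choice over normalised trail values."""
    total = sum(tau_row)
    u = rng.random()
    cumulative = 0.0
    for j, t in enumerate(tau_row):
        cumulative += t / total
        if u < cumulative:
            return j
    return len(tau_row) - 1             # guard against rounding error

tau = {1: [1.0, 1.0], 2: [1.0, 1.0]}    # two requests, two clusters
update_pheromone(tau, {1: 0, 2: 1})     # best solution: 1 -> cluster 0, 2 -> cluster 1
print(tau[1])  # [1.5, 0.5]
cluster = assign_request(tau[1], random.Random(7))
```

After one update, request 1 is three times more likely to be placed back into cluster 0 than into cluster 1, which is the intensification effect described above.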
Step 6: Selecting best solution After several iterations, the final best clustering solution (the one with the maximum OF) is selected from the Best_Solutions matrix. This clustering solution is the best among all good solutions obtained at each iteration. Thus, all requests have now been assigned to clusters, and this grouping will be used later in the routing phase to assign a vehicle to visit the customers in each cluster.
Algorithm 2: Pseudocode of the proposed clustering-based ACO
Step 1: Initialisation of parameters: Pop_Size, K, Num_Cust, Max_Iter, a pheromone trail matrix τ_{i,j}, and the evaporation rate ρ.
Step 2: Constructing initial population
For S = 1 to Pop_Size
    Assign the requests randomly to different clusters.
End For
Step 3: Evaluation phase
For each solution S
    Compute the centre of gravity for each cluster.
    Compute the appropriateness of requests within clusters.
    Compute the objective function for the solution.
End For
Select the best solution S* that has the maximum objective function, and memorise it in the Best_Solutions matrix.
Step 4: Pheromone update
Evaporate all pheromone trails, then reinforce the trails of the requests included in S*.
Step 5: Generating new solutions
For Iter = 1 to Max_Iter
    Each ant builds a new solution by assigning requests to clusters according to the normalised pheromone probabilities.
    Repeat Step 3 (Evaluation) and Step 4 (Pheromone update).
End For
Step 6: Selecting best solution
Select the best solution in the Best_Solutions matrix to be the final clustering solution.

Routing phase

After the MVPPDP search space has been divided into clusters, with each customer pair placed in the appropriate cluster, we must choose which customer pairs will be served and in what order. In other words, the routing phase starts. However, there are many restrictions to consider before starting the routing phase. First, each cluster must be served by only one vehicle, which means that the number of vehicles used is equal to the number of clusters. Second, each vehicle should start and end its journey at the depot with an empty load. Furthermore, in addition to the pairing, profit, and cost constraints that were considered in the clustering phase, precedence, trip time, and vehicle capacity
are other constraints that should not be violated when constructing routes. The routing phase in the literature is usually divided into two sub-phases: the solution construction phase and the solution improvement phase. In this paper, we considered just the first sub-phase by proposing a new approach that is based on GRASP, as explained next. GRASP is a multi-start metaheuristic that is commonly applied to solve different combinatorial optimisation problems. It was first introduced by Feo et al. [Feo, Thomas and Resende (1995)]. It consists of two phases: a construction phase and an improvement phase. The construction phase is used to build an initial feasible solution, while the second phase is a local search used to improve the initial solution to reach a local optimum. The construction phase of GRASP combines greedy and randomised features: the greedy feature selects a set of candidate solutions based on a specific goal (i.e., maximum profit or minimum distance) and sorts the set in what is called the restricted candidate list (RCL), while the randomised feature randomly selects one of the best candidate solutions from the RCL. Combining the two features helps GRASP to be fast, competitive, and able to find quality solutions in a reasonable time. GRASP has successfully contributed to solving multiple variants of VRPs [Layeb, Ammi and Chikhi (2013)]. In this paper, we used the construction phase of the well-known GRASP to construct an initial solution of the MVPPDP. To increase the chance of getting effective solutions, we used the concept of a population metaheuristic, which creates a population that contains a set of solutions. All these solutions are improved through a number of iterations until the stopping criterion is reached, at which time the best solution is selected. We refer to the method used in Alhujaylan et al. [Alhujaylan and Hosny (2019)] as GRASP, while we refer to our new approach that uses clustering algorithms as GRASP with clustering, or GRASP(C).
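The greedy-randomised selection that underlies the GRASP construction phase can be sketched as follows. The function and variable names, the scoring values, and the RCL size are illustrative assumptions, not the paper's exact parameters.

```python
import random

def grasp_select(candidates, score, rcl_size, rng):
    """One greedy-randomised pick: rank candidates by a greedy score,
    keep the best rcl_size of them in the RCL, then choose at random."""
    ranked = sorted(candidates, key=score, reverse=True)  # greedy feature
    rcl = ranked[:rcl_size]
    return rng.choice(rcl)                                # random feature

# Toy greedy scores, e.g., insertion ratios of unvisited customer pairs.
scores = {"a": 5.0, "b": 4.0, "c": 1.0, "d": 0.5}
rng = random.Random(0)
picked = grasp_select(list(scores), scores.get, rcl_size=2, rng=rng)
print(picked in {"a", "b"})  # True
```

Restricting the random choice to the top of the ranking is what keeps the construction both fast and diverse across the population of solutions.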
The first difference between the two versions is that GRASP in Alhujaylan et al. [Alhujaylan and Hosny (2019)] constructs the initial solution directly without clustering, which means all customer pairs are candidates for selection, while in GRASP(C) the search space is first clustered, and the initial solutions are then constructed for each cluster based on the positions of the customer pairs. The rest of the modifications are as follows.

Selecting seed customers: Each solution contains a set of routes. The number of routes is equal to the number of clusters, with each route served by only one vehicle. Each route in a solution is constructed by first selecting a seed customer based on the computed customer benefit (CB). In GRASP, the CB is calculated by dividing the revenue gained from the delivery customer by the distance between the customers in the pair. By contrast, in GRASP(C) the CB is calculated based on the distance between the depot and the pickup customer. Thus, the pickup customer that is geographically closest to the depot is selected to be the seed customer, which is inserted into the route first.

Constructing the routes: To fill the route with unvisited customers, the following process is repeated until either the maximum time allowed for a trip is reached or all unvisited customers have been served. In GRASP [Alhujaylan and Hosny (2019)], all unvisited customer pairs are inserted individually in the best position, such that the best position for the pickup customer is selected before that of the delivery customer, in order to meet the precedence constraint. To clarify the meaning of best position, we present the following example [Alhujaylan and Hosny (2019)]. Assume that an unvisited customer pair (c, -c) is selected for insertion into the best position in this route.
To satisfy the precedence constraint, the best position of the pickup customer is assigned first by computing its insertion cost using the equation

IC(i, c, j) = d(i, c) + d(c, j) - d(i, j),

where d(·,·) represents the distance cost between two vertices, and i and j are the vertices immediately before and after the candidate slot. This equation is applied for all slots in the route, and the position that has the lowest insertion cost is selected for the pickup customer. The same equation is then used to insert the delivery customer, taking the position of the pickup customer into consideration. After determining the best position for the customer pair, the insertion ratio (IR) is computed as

IR = r(c, -c) / (IC(c) + IC(-c)),

where r(c, -c) is the revenue of customer pair (c, -c) and IC(c) and IC(-c) are the insertion costs of the pickup and delivery customers, respectively. Using this method, the unvisited customer pairs are inserted into the candidate solution set (CSS) in descending order based on IR values, and the first elements of the CSS are then assigned to the RCL. After that, one unvisited customer pair is selected randomly from the RCL and inserted into the route. This process is repeated for the remaining unvisited customer pairs until either the maximum trip time is reached or no more customers need to be served. This approach, however, is very time consuming; therefore, we modified it in GRASP(C) by changing the method for computing the insertion ratio to

IR2 = r(c, -c) / d(v0, c),

where d(v0, c) is the distance between the depot v0 and the pickup customer c. The unvisited customer pairs are then inserted into the CSS in descending order based on IR2 values, after which we assign the first elements of the CSS to the RCL. Next, one unvisited customer pair is selected randomly from the RCL and inserted into the best position in the route with respect to the vehicle capacity, time, and precedence constraints.

Evaluate solution: After constructing all routes, the quality of the solution is evaluated using Eq. (1). The value of the objective function will later be compared to the values for other solutions.

Local best solution: After all the solutions in the population are constructed, the values of their objective functions are compared.
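These computations can be sketched as follows. This is a minimal illustration assuming Euclidean distances over (x, y) coordinates; the paper only states that d is a distance cost between vertices, so `dist` and the helper names are our assumptions.

```python
import math

def dist(a, b):
    # Assumed Euclidean distance between two (x, y) points.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def insertion_cost(route, pos, customer):
    """Cost of inserting `customer` between route[pos-1] and route[pos]:
    d(i, c) + d(c, j) - d(i, j)."""
    i, j = route[pos - 1], route[pos]
    return dist(i, customer) + dist(customer, j) - dist(i, j)

def best_position(route, customer):
    # Try every internal slot and keep the cheapest one.
    costs = [(insertion_cost(route, p, customer), p)
             for p in range(1, len(route))]
    return min(costs)

def ir2(revenue, depot, pickup):
    # GRASP(C) insertion ratio: pair revenue over depot-to-pickup distance.
    return revenue / dist(depot, pickup)
```

Note how IR2 avoids scanning every slot of every route when ranking candidates: the best position is computed only for the single pair drawn from the RCL, which is the source of the speed-up discussed later.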
The solution with the maximum objective function value is selected to be the local best solution in this iteration.

Final best solution: All the local best solutions selected in each iteration are then compared, and the one with the maximum objective function value is chosen to be the initial solution for the MVPPDP.

The outline of the construction phase of our GRASP(C) is presented in Algorithm 3, using the same parameters as Algorithm 2. The meanings of the new notations are as follows: Num_Clusters: number of clusters; CSS: candidate solution set; RCL: restricted candidate list; US: un-served pairs of customers; and SM: solutions matrix that contains the best solutions in the population for each iteration.

Computational experiments
The computational experiments aim to compare the performance of our proposed algorithms with the construction heuristics: the greedy construction heuristic (C1) and the two-stage cheapest insertion heuristic (C2) [Küçüktepe (2014)], and GRASP [Alhujaylan and Hosny (2019)]. All proposed algorithms were coded in MATLAB (R2017b) and executed on a laptop computer with an Intel Core i7-4510U CPU @ 2.00 GHz (2601 MHz, two cores, four logical processors). Before we present the results of the experiments, the datasets used and the parameter tuning details are described in the following sub-sections.

Test instances
The same instances that were used in Gansterer et al. [Gansterer, Küçüktepe and Hartl (2017)] are used here to test our approach. There are 36 instances, classified into three groups: small size (20 and 50 customers served by two and three vehicles, respectively), medium size (100 and 250 customers served by four and five vehicles, respectively), and large size (500 and 1000 customers served by six and eight vehicles, respectively). Each group has 12 instances. These instances differ from each other with respect to time and revenue. The total time limit is set to be either small or large, within the range 2500 to 15000, to generate short and long routes. The revenue amounts are set to be either equal for all customers, proportional to the demands, or random. The revenue is gained from the delivery customer after delivery of the goods coming from the pickup customer. The quantity of goods is set to be an integer value from 1 to 50.

Algorithm 3: The construction phase of GRASP(C)
For i = 1 to Max_Iter
  For j = 1 to Pop_Size
    For k = 1 to Num_Clusters
      Phase 1: Seed vertex selection
      Step 1: Compute the distance between the depot and the pickup customers
      Step 2: Select the pair that has the closest pickup customer to be the seed customer
      Phase 2: Route construction
      While the maximum route time is not violated
        Step 3: Compute the insertion ratio (IR2) for all unvisited customers
        Step 4: Put all the candidate customer pairs in the CSS in descending order of IR2
        Step 5: Assign half of the candidate customer pairs in the CSS to the RCL
        Step 6: Pick one customer pair randomly from the RCL and insert it in its best position in the route after checking the precedence, time, and capacity constraints
      End While
    End For
    Step 7: Compute the objective function for the solution, and assign the solution to the SM
  End For
  Step 8: Select the solution that has the highest objective function in the SM and assign it to the Final-Best-Solutions matrix
End For
Step 9: Select the solution that has the highest objective function in the Final-Best-Solutions matrix to be the initial solution for the MVPPDP
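As a rough illustration, the nested construction loop of Algorithm 3 can be sketched in Python; this is a skeleton only, where `build_route` and `objective` stand in for the route-construction and Eq. (1) evaluation steps described in the text.

```python
def construct_initial_solution(clusters, build_route, objective,
                               max_iter=100, pop_size=30):
    """Skeleton of the GRASP(C) construction loop: at each iteration,
    build a population of solutions (one route per cluster), keep the
    local best, and finally return the best over all iterations."""
    final_best = None
    for _ in range(max_iter):
        population = []
        for _ in range(pop_size):
            # One route per cluster, each served by a single vehicle.
            solution = [build_route(cluster) for cluster in clusters]
            population.append(solution)
        local_best = max(population, key=objective)   # local best solution
        if final_best is None or objective(local_best) > objective(final_best):
            final_best = local_best                   # final best solution
    return final_best
```

The default values of `max_iter` and `pop_size` mirror the tuned values reported in the parameter-tuning experiments below.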

Parameter tuning

ACO parameter tuning
There are only three parameters that need to be tuned in the ACO: the population size, the number of iterations, and the evaporation rate (ρ). Nine datasets were used to test these parameters: 50 customers with fixed revenue to generate a short route (50-F-S), 50 customers with proportional revenue to generate a short route (50-P-S), 50 customers with random revenue to generate a short route (50-R-S), 250 customers with fixed revenue to generate a short route (250-F-S), 250 customers with proportional revenue to generate a short route (250-P-S), 250 customers with random revenue to generate a short route (250-R-S), 1000 customers with fixed revenue to generate a short route (1000-F-S), 1000 customers with proportional revenue to generate a short route (1000-P-S), and 1000 customers with random revenue to generate a short route (1000-R-S). Since the time constraint is not considered in the clustering phase, we took only the instances that generate short routes, because the results are the same as for those that generate long routes. The details of testing each parameter are as follows.

Population size
Each data instance was tested with different population sizes: 10, 20, 30, and 40. Tab. 1 illustrates the results of tuning the population size. These show that increasing the population size beyond 30 did not lead to an improvement in the OF. Thus, the population size was taken to be 30.

Number of iterations
Each data instance was also tested with different numbers of iterations: 100 and 200. Tab. 2 presents the values of the OF after setting the population size to 30, as determined in the previous experiment. The results of testing showed that there was no enhancement in the OF values after 100 iterations. Thus, the maximum number of iterations was taken to be 100.

Evaporation rate
Once again, the same instances were used to select a suitable value for the evaporation rate, which is used to increase diversification of the search and to prevent the algorithm from becoming stuck in local optima. After setting the population size and number of iterations to 30 and 100, respectively, several tests were run with values of ρ = 0.2, 0.5, and 0.8. Tab. 3 illustrates that there was no change in the OF when the value of ρ was changed.
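The exact pheromone update formula is not spelled out in this section; a minimal sketch, assuming the standard ACO rule τ ← (1 − ρ)·τ + Δτ and a dictionary-based pheromone matrix, would be:

```python
def evaporate(pheromone, rho=0.5, deposit=None):
    """Standard ACO pheromone update (an assumption, since the paper does
    not spell out its exact formula): tau <- (1 - rho) * tau + delta.

    `pheromone` maps a key (e.g. a pair/cluster combination) to a trail
    value; `deposit` maps the same keys to the reinforcement delta."""
    deposit = deposit or {}
    return {key: (1.0 - rho) * tau + deposit.get(key, 0.0)
            for key, tau in pheromone.items()}
```

Under this rule a larger ρ forgets old trails faster, which matches the role of ρ as a diversification parameter described above.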

Parameter tuning for GRASP(C)
In GRASP(C), only two parameters normally need to be tuned: the size of the RCL and the number of iterations (Max_Iter). In our method, we added a third parameter, the population size (Pop_Size). Empirical experiments were performed to select the most suitable value for each parameter. Since the routing process needs more time than the clustering process, we selected only medium-sized instances with different situations (revenue either equal, proportional to demand, or random; time either long or short) to test these parameters. Six medium-sized data instances were used, each containing 100 customers served by four vehicles with a capacity of 80. The descriptions of these instances are as follows: 100 customers with fixed revenue to generate a short route (100-F-S), 100 customers with fixed revenue to generate a long route (100-F-L), 100 customers with proportional revenue to generate a short route (100-P-S), 100 customers with proportional revenue to generate a long route (100-P-L), 100 customers with random revenue to generate a short route (100-R-S), and 100 customers with random revenue to generate a long route (100-R-L). The details of testing each parameter are below.

The restricted candidate list (RCL) size
Since we want to create a population that contains sets of solutions, the diversity of solutions is important. Thus, half of the solutions found in the CSS (candidate solution set) were assigned to the RCL; that is, if the number of initial solutions in the CSS was n, then ⌈n/2⌉ solutions were assigned to the RCL. This value was chosen by trial, because selecting a small RCL resulted in a population of similar solutions.
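This sizing rule can be written in one line; the sketch below assumes the CSS is already sorted in descending order of insertion ratio.

```python
import math

def rcl_from_css(css):
    """Assign the top half (rounded up) of the CSS to the RCL.

    Assumes `css` is already sorted in descending order of IR2,
    so slicing keeps the best-ranked candidate pairs."""
    return css[:math.ceil(len(css) / 2)]
```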

Number of iterations
The number of iterations was set to be 10, 50, 100, or 500. As seen in Tab. 4, there was no enhancement of OF values once the number of iterations was increased to more than 100.

Population Size
Several values of population size were tested: 10, 20, 30, and 40. As seen in Tab. 5, after setting the number of iterations to 100, increasing the population size to more than 30 did not lead to an enhancement of the OF values.

Experimental results
In this experiment, we show the results for the MVPPDP using our GRASP(C) approach, in which the initial solution was constructed after clustering the search space using one of the clustering methods K-means, adaptive K-means, or ACO. Both ACO and GRASP(C) were run five times for every dataset, since they are stochastic approaches, while K-means and adaptive K-means were run once each, because they are deterministic approaches. We compare our results with the GRASP proposed in Alhujaylan et al.
[Alhujaylan and Hosny (2019)] (as explained in Section 4.2), and with the greedy construction heuristic (C1) and the two-stage cheapest insertion heuristic (C2) presented in Küçüktepe [Küçüktepe (2014)].

Tab. 6 presents our results for the MVPPDP using our methods: K-means-GRASP(C), adaptive K-means-GRASP(C), and ACO-GRASP(C). The results are calculated in terms of the OF of the solution (i.e., the gained profit), which is equal to the total revenue minus the total travelling cost, as previously shown in Eq. (1). Thus, the larger the OF value, the better the solution obtained. The results of our proposed methods are also compared with the results of C1, C2 [Küçüktepe (2014)], and GRASP [Alhujaylan and Hosny (2019)] in Tab. 6. In the table, the first column presents the name of the instance. The following two columns present the results of the C1 and C2 algorithms in terms of the best OF value. The remaining columns show the results of GRASP, K-means-GRASP(C), adaptive K-means-GRASP(C), and ACO-GRASP(C), in terms of the average and best objective values over five runs. For each group of instances of a particular size, the average results are shown in the highlighted row.

As can be seen from the results in Tab. 6, all the proposed algorithms demonstrated good performance, on average, in solving the small, medium, and large-sized data instances, compared to previous approaches from the literature. We achieved new best solutions (bold values) for 25 sets of data instances with K-means-GRASP(C), for 26 sets with adaptive K-means-GRASP(C), and for 13 sets with ACO-GRASP(C). We also found solutions (underlined values) that were better than those of at least one of the algorithms C1, C2, or GRASP, as follows: 11 with K-means-GRASP(C), 10 with adaptive K-means-GRASP(C), and 13 with ACO-GRASP(C).
On the other hand, when we compared our construction heuristics, which rely on first clustering the search space and then constructing the initial solution using GRASP(C), we found that adaptive K-means-GRASP(C) outperformed both K-means-GRASP(C) and ACO-GRASP(C) in the number of new best solutions produced (26, compared to 25 and 13, respectively). Looking at the overall average results for each category, though, K-means-GRASP(C) performs better than the other two methods on the small-sized instances (20 and 50 customers), while adaptive K-means-GRASP(C) outperformed the other methods on the medium and large-sized instances (with the exception of the instances with 1000 customers). The reason for this last observation could be that K-means clustering is based on distance only, which means that customer pairs that are geographically close are grouped together. Since this particular category of instances is very challenging, due to the large number of customers, decreasing the distance between customer pairs by grouping them in the same cluster makes it possible to serve more customer pairs without exceeding the total trip time, thus increasing the value of the OF. In addition, looking at the average results for each instance set, adaptive K-means-GRASP(C) outperformed C1 and C2, while ACO-GRASP(C) demonstrated acceptable performance compared to those heuristics, although it was inferior to both K-means-GRASP(C) and adaptive K-means-GRASP(C). The reason for that could be the approach used in our ACO, which has a significant effect on the results. In fact, both K-means and adaptive K-means are based on static computations during the clustering process: K-means clusters customer pairs based on distance, and adaptive K-means clusters them based on distance and revenue.
In contrast, ACO clustering is based on a criterion other than distance and revenue, namely a random insertion heuristic that is used to insert the customer pairs into the appropriate clusters. In other words, for each customer pair, a random number is generated and compared to the pheromone matrix values of the clusters, and a cluster with a value greater than the random number is selected. Thus, the randomness of ACO might contribute to inaccurate clustering of customer pairs.

Fig. 3 presents the performance of all the proposed construction heuristics compared with C1, C2, and GRASP in terms of profits. It can be observed from this figure that both K-means-GRASP(C) and adaptive K-means-GRASP(C) have comparable results, which clearly outperform all the other construction heuristics. We also compared the construction heuristics based on execution time (in seconds). Since execution times are not reported in Küçüktepe [Küçüktepe (2014)], we compared our proposed algorithms with the GRASP of Alhujaylan et al. [Alhujaylan and Hosny (2019)], as shown in Tab. 7. Tab. 7 illustrates that all our clustering algorithms play an effective role in speeding up the search process. Also, the modifications made to GRASP contribute to making GRASP(C) faster than its predecessor (recall that GRASP selects the best position for each unvisited customer pair and then computes their IRs to select one pair randomly in descending order, whereas GRASP(C) first computes IR2, selects one pair randomly from the first elements in descending order, and then selects the best position for that one pair only). Moreover, the time performance of our algorithms K-means-GRASP(C), adaptive K-means-GRASP(C), and ACO-GRASP(C) is comparable, although ACO-GRASP(C) shows a slightly shorter processing time. In general, the results in Tabs. 6 and 7 and Figs.
4 and 5 indicate that the proposed algorithms contribute to achieving excellent performance in terms of both quality of solutions and processing time compared with all rival algorithms, GRASP, C1, and C2.
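For concreteness, the ACO cluster-assignment rule described above (a random number compared against the clusters' pheromone values) can be sketched as follows. The fallback used when no cluster's pheromone value exceeds the random draw is our assumption, since the text does not specify one.

```python
import random

def assign_to_cluster(pheromone_row, rng=random.random):
    """Random insertion heuristic sketched from the description in the
    text: draw a random number and pick a cluster whose pheromone value
    exceeds it. `pheromone_row` holds the trail value of each cluster
    for one customer pair.

    Fallback (our assumption): if no value exceeds the draw, take the
    cluster with the strongest trail."""
    r = rng()
    for cluster, tau in enumerate(pheromone_row):
        if tau > r:
            return cluster
    return max(range(len(pheromone_row)), key=lambda c: pheromone_row[c])
```

This makes the source of the inaccuracy concrete: a pair can land in the first cluster that merely clears the random threshold, not necessarily the geographically best one.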

Conclusion
In this paper, we presented new heuristics to construct initial solutions for the multi-vehicle profitable pickup and delivery problem (MVPPDP). The heuristics are based on first clustering the search space of the MVPPDP using one of three clustering methods: K-means, adaptive K-means, and ant colony optimisation (ACO). A modified version of the greedy randomised adaptive search procedure (GRASP) was then used to construct the initial MVPPDP solution based on the results of each clustering algorithm. We compared our results with those of other construction heuristics that have previously been used for the MVPPDP. The experimental results demonstrated the effectiveness of our algorithms in terms of both solution quality and processing time. The results obtained in our research can help small-scale pickup and delivery companies with limited resources to improve their planning by reducing their costs and increasing their profits. In future work, we will improve the initial solutions by using population metaheuristics that combine two features: 1) intensification, to increase the opportunities for finding good solutions, and 2) diversification, to prevent becoming stuck in local optima. Moreover, suitable neighbourhood operators will be carefully selected to be applicable under the constraints of the MVPPDP, thus increasing the chances of producing high-quality solutions.