Two-stage stochastic programming for the inventory routing problem with stochastic demands in fuel delivery

The inventory routing problem (IRP) arises in the joint practices of vendor-managed inventory (VMI) and vehicle routing problem (VRP), aiming to simultaneously optimize the distribution, inventory and vehicle routes. This paper studies the multi-vehicle multi-compartment inventory routing problem with stochastic demands (MCIRPSD) in the context of fuel delivery. The problem with maximum-to-level (ML) replenishment policy is modeled as a two-stage stochastic programming model with the purpose of minimizing the total cost, in which the inventory management and routing decisions are made in the first stage while the corresponding recourse actions are implemented in the second stage. An acceleration strategy is incorporated into the exact single-cut Benders decomposition algorithm and its multi-cut version respectively to solve the MCIRPSD on the small instances. Two-phase heuristic approaches based on the single-cut decomposition algorithm and its multi-cut version are developed to deal with the MCIRPSD on the medium and large-scale instances. Comparing the performance of the proposed algorithms with the Gurobi solver within limited time, the average objective value obtained by the proposed algorithm has decreased more than 7.30% for the medium and large instances, which demonstrates the effectiveness of our algorithms. The impacts of the instance features on the results are further analyzed, and some managerial insights are concluded for the manager.


Introduction
This paper focuses on a multi-vehicle multi-compartment inventory routing problem with stochastic demands that occurs, for instance, in the background of petrol station replenishment. This problem is also denoted as the petrol station replenishment problem, fuel replenishment (delivery) problem, or refined oil secondary distribution problem (Cornillier et al., 2008;Chowmali & Sukto, 2020;Li & Zhang, 2020). This problem is always considered as a vehicle routing problem due to vehicle distribution cost accounting for more than 60% of the total cost of the current refined oil logistics system . However, one of the best applications now for collaboration between refined oil depots and petrol stations is a vendor-managed inventory (VMI) system in refined oil supply chain management. Under VMI, the oil supplier is responsible for managing the inventories from all petrol stations. Hence, the oil distributor is then faced with solving an integrated problem of determining replenishment quantities and timing for petrol stations and organizing vehicle routes for those replenishments simultaneously. Such an integrated problem in the literature is well known as the Inventory Routing Problem (IRP) (Coelho et al., 2014).
Generally, a vehicle is divided into several identical compartments, and the liquid oil in one compartment must be delivered completely to a single customer for the consideration of transportation safety. In addition, vehicles loaded with hazardous chemicals such as refined oil are forbidden to perform replenishment tasks during the day, and can only occur during 0 am to 5 am. Petrol stations or other customers are replenished only once by a fleet of heterogeneous vehicles from a single oil depot during the working time.
Most of the literature on such a problem assumes a constant consuming rate at each customer, a fleet of identical vehicles at the distributor, and focuses on developing solution methods that design proper quantities and vehicle route schedules. In fact, the actual demand at each customer is often stochastic, vehicles at the supplier are usually heterogeneous, each type vehicle might be equipped with a different number of compartments. After the actual demand of each customer is observed at the end of the period, the recourse cost can be calculated by the supplier. Failing to take these facts into account would certainly result in more cost of the collaboration system such as VMI.
Our problem then could be seen as a combination of two classical variants of the inventory routing problem (IRP): an inventory routing problem with multi-vehicle multi-compartment (MCIRP) and an inventory routing problem with stochastic demands (IRPSD). For the convenience of expression, our problem is defined as the multi-compartment inventory routing problem with stochastic demands (MCIRPSD).
The remainder of this paper is organized as follows: Section 2 provides a review of the literature related to this paper. Section 3 describes our problem, introduces assumptions and notation, and then gives two formal mathematical models. After that, the solution methodology is developed in Section 4 and computational results, based on MCIRPSD instances, are discussed in Section 5. Finally, conclusions and future directions for further research are presented in Section 6.

Literature review
The inventory routing problem (IRP) has received more and more attention in recent decades. The most recent surveys of the IRPs are provided by Andersson et al. (2010), Mosca et al. (2019), Raa and Aghezzaf (2009), Roldán et al. (2017). In addition, Coelho et al. (2014) and Adulyasak et al. (2015) present a comprehensive literature review on the IRP, based on the main characteristics e.g., variants, model, and solution approaches.
There are many variants of the IRP, one of the important features is whether to consider the type of vehicle and the number of compartments. The majority of the literature only considers a fleet of homogeneous vehicles consisting of a single compartment, heuristic approaches and exact algorithms are developed for this single-vehicle single-compartment IRP with various constraints (Gruler et al., 2020;Hiassat et al., 2017;Wang et al., 2018). Several IRP literatures about multi-vehicle or multi-compartment can be found in green inventory routing problems (Cheng et al., 2017;Micheli & Mantella, 2018), livestock collection problems (Oppen et al., 2010), and fuel replenishment problems (Popović et al., 2012).
Another essential stream of the IRP discusses two fundamental replenishment policies. The first replenishment policy introduced by Archetti et al. (2012) is called order-up-to-level policy (OU), once the customer is visited, the replenishment volume delivered will restore the inventory level to maximum storage capacity. The second replenishment policy called the maximum-to-level policy (ML) is a relaxation of the OU policy, which relaxes the requirement that the quantities received must restore the maximum inventory capacity at the customer. Obviously, the second replenishment policy has more flexibility and complexity, which results in higher challenges. The multi-vehicle multi-product IRP with the ML policy was proposed by Nikzad et al. (2019), who adopt a matheuristic algorithm. Mirzapour et al. (2019) developed a hybrid algorithm based on the L-shaped method for the multi-objective IRP with multi-vehicle multi-product and OU policy described above. More research for the OU and ML policies has been investigated in Su et al. (2020) and .
Special cases of the continuous-time IRP were also studied. Usually, researchers discretize the infinite or finite time and add an extra constraint on the maximum working time of the vehicle during each period. The work studied by Raa and Aghezzaf (2009) belongs to the continuous-time version. Several papers related to this research have presented heuristic solution approaches (Chitsaz et al., 2016;Hemmati et al., 2016;Raa & Dullaert 2017). For the first cyclic IRP with stochastic demands, additional safety stock, and expedited replenishments are used by Raa and Aouam (2021) to buffer the demand variability, the algorithm incorporates the local-search procedure into a metaheuristic.
Studies on the stochastic demand IRP reformulate the problem as a two-stage stochastic programming model (Agra et al., 2018;Cho et al., 2018). The modeling idea we adopt in this article belongs to this method. Li et al. (2020) constructed a twostage stochastic model for the IRP in the context of the fuel distribution with multi-depot multi-vehicle, where determining the routes and quantities lies in the first stage and deciding the recourse actions is in the second stage. The reformulated deterministic equivalent model was solved by a CPLEX solver. For the single-vehicle allocation problem of the limited medical reserves,  regarded the donated supplies as the recourse action, the objective is to minimize the allocation cost and the expected recourse cost. The optimal solutions are obtained using Gurobi solver. This modeling approach can be found in other applications, such as transportation network protection problems (Liu et al., 2009), biodiesel supply chains problems (Marufuzzaman et al., 2014), emergency system management problems (Moreno et al., 2018;Wang 2020). However, the multi-vehicle multi-compartment IRP mentioned above focuses on the situation where the demands of the customers are deterministic. Moreover, the existing literature on the IRP with stochastic demands rarely considers both continuous-time during a single period and the multi-vehicle multi-compartment. According to the real situation, all features must be taken into account in constructing the distribution scheme. To the best of our knowledge, Li et al. (2020) studied the multi-vehicle multi-compartment IRP under stochastic demands. In order to reduce the complexity of the problem, they only consider the direct distribution mode (one vehicle only visits a single customer) in their paper without organizing optimal routes. Also, maximum working time for each vehicle during a period is not considered. The purpose of this paper is to fill this gap in the research by modeling the MCIRPSD when the time is continuous and developing a powerful solution method.
The main contributions of this research are summarized as follows: First, a comprehensive variant of the IRP is studied, considering multi-compartment, stochastic demand, a fleet of heterogeneous vehicles and their maximum working time during a single period in fuel delivery. This problem of a single period can be further extended to the case of an infinite period.
Second, a two-stage stochastic programming model is proposed to design the quantities delivered to customers, the number of employed vehicles and corresponding vehicle routes. The objective of the model is to minimize the total cost consisting of distribution cost (fixed and variable costs for vehicles, loading costs for compartment) and expected recourse cost (shortage costs and inventory costs).
Finally, the accelerated Benders decomposition algorithms including single-cut algorithm and multi-cut version are developed to solve the two-stage model of smaller instances. For larger instances, the accelerated decomposition approaches are treated as the second-phase to improve the performance of the two-phase heuristic algorithms. Computational and managerial insights can be obtained based on the numerical experiments and the results.

Problem definition and mathematical model
In this section, the MCIRPSD is formally defined and the two-stage stochastic programming model is given. The MCIRPSD problem is considered where customers are replenished with a single refined oil product by a fleet of heterogeneous vehicles from a supplier depot (e.g., refined oil depot) at the beginning of a single period. Every customer's daily demand (sales volume) is stochastic. The oil depot has sufficient available supply (Raa& Aouam, 2021). Fig. 1 presents the timing of the task we assume in this study.  At the end of each day (e.g., from 23 pm to 24 pm), the supply depot makes a distribution plan for the next day according to each customer's inventory level and limited storage capacity, the deliveries are performed before the beginning of the next daily sales task (e.g., from 1 am to 5 am), and the recourse costs are calculated for all customers at the end of the next day (e.g., from 22 pm to 23 pm). There are several types of vehicles available at the supplier depot, the number of compartments in any type vehicle differs from others. Each customer can be replenished at most once during maximum working time for every vehicle per period. Each vehicle takes the same time to unload oil products at any of the customers. The customer's storage capacity and replenishment times are limited while demand is uncertain per period, then three cases might happen at the end of each period (ⅰ) customer's demand is larger than the sum of replenishment and initial inventory (inventory level from previous period), this means that the customer will be in a stockout state; (ⅱ) the demand is less than the sum of replenishment and initial inventory, in this case the customer will be in a surplus state; (ⅲ) the sum of replenishment and initial inventory satisfies the daily demand accurately. As mentioned above, the MCIRPSD could be formulated as a twostage stochastic programming model. In the first stage, before the realization of the stochastic demand, the model decides (1) when and how much oil product to deliver to the customer; (2) how to organize customers into different routes during a single period; (3) how to assign each route to a proper vehicle. In the second stage, while the stochastic information has been revealed, the model decides (1) how much shortage costs will be paid due to case (ⅰ); or (2) how much inventory costs will be paid for case (ⅱ) or other costs to pay for dealing with the surplus inventory.
The objective of the MCIRPSD is to minimize the sum of the first-stage cost and the recourse cost (second-stage cost) per period. The first-stage cost consists of fixed costs for vehicles, loading costs for oil products and routing costs. The recourse cost includes shortage costs for the stockout and inventory costs for the surplus inventory. In addition, the ML policy is adopted as a replenishment policy because of the high flexibility.

Assumptions
There are several necessary assumptions itemized: (1) Each vehicle capacity and customer replenishment are integer multiples of a single compartment's capacity.
(2) Before the start of the daily sales task, each customer's inventory level is equal to the previous period and remains unchanged until it is replenished. (3) The daily stochastic demand of each customer follows a Poisson distribution with identical known parameters (Li et al., 2020).

Notation
To formulate the MCIRPSD model, we define the following notations.

Decision variables
i vk e : the time when vehicle k K ∈ of type v V ∈ arrives at customer i C ∈ ; i vk x : non-negative integer variable corresponding to delivery quantity when customer i C ∈ is replenished by vehicle k K ∈ of type v V ∈ (integral multiple of a single compartment's capacity); : non-negative continuous variable corresponding to surplus inventory level at customer i when ξ is realized; : non-negative continuous variable corresponding to amount of shortage at customer i when ξ is realized.

Two-stage stochastic programming model
The first-stage mathematical model of the MCIRPSD is as follows: (1 ), , , , , The second-stage mathematical model is: The first-stage problem determines the routes of vehicles, the delivery quantity and arriving time to each customer before demand is revealed. The objective function (1) minimizes the distribution cost and recourse cost per period, consisting of the fixed costs for vehicles, the loading costs for compartments and the route costs, where ( ( , )) E Q x ξ ξ denotes the expected recourse cost for the second-stage problem. Constraints (2) and (3) imply each vehicle being used starts from and returns to the oil depot respectively. Constraints (4) represent flow balance. Constraints (5) indicate that the total delivery quantity from a single route cannot exceed the vehicle's capacity. Constraints (6) impose the delivery quantity constraint on a visited customer. Constraints (7) ensure that every customer is replenished at most once during one period. Constraints (8) define the arrival time constraint of the vehicle between two adjacent customers. Constraints (9)-(12) describe the domains of the firststage decision variables. After the demand is known, the second-stage problem minimizes the recourse cost based on the first-stage variable x , as stated in formulation (13). Constraints (14) design the relationship between the customer's initial inventory, replenishment, shortage, surplus, and actual demand. The rest constraints describe the non-negativity of the second-stage variables.

Deterministic equivalent mixed-integer programming model
This study focuses on the two-stage stochastic programming under the realizations of finite scenarios is q s S ∀ ∈ is the actual demand level at customer i under realization s . As discussed above, the twostage stochastic programming model of the MCIRPSD can be specified as an equivalent deterministic mixed-integer formulation: subject to (2)- (12) , , Note that the two-stage stochastic programming model is a complete recourse problem due to its second-stage problem being always feasible for any given first-stage decision variables values (Birge & Louveaux 2011).

Solution approach
The two-phase heuristic approaches are based on accelerated Benders decomposition algorithms, whose primal version called classical Benders decomposition was formally proposed by Benders (1962). Basically, the classical Benders decomposition is a procedure in which all decision variables are divided into distinctive subsets so that the corresponding solutions of each subset are not processed considering all constraints simultaneously. This decomposition approach was later extended and applied by Van Slyke and Wets (1969), who introduced the so-called L-shaped method.
This approach decomposes the primal problem into two relatively simpler formulations, called master problem (BMP) and subproblem (BSP), respectively . For two-stage stochastic programming, the first-stage problem is modeled as a master problem and each realization of the second-stage problem is regarded as a subproblem naturally. The master problem is solved as a relaxation of the primal problem and provides first-stage variables values for the subproblem to get a feasible solution. At each iteration, the Benders cuts (i.e., feasibility and optimality cuts) are generated and added to the master problem to improve the feasible solution obtained in the last iteration, such cuts could be seen as new constraints that must be satisfied by the master problem. Both problems are solved iteratively until the optimal solution to the primal problem is found.

Classic Benders decomposition
In this section, we designed the classical Benders decomposition method to solve our two-stage stochastic programming model. Naturally, the Benders master problem (BMP) is as follows: subject to (2)-(12) The Benders subproblem (BSP) is given by Firstly, the classical decomposition algorithm is designed based on the single-cut, which aggregates the dual information of the linear programming subproblems corresponding to all realizations of the stochastic scenarios to find optimality cuts. Then the single-cut based algorithm is extended to the multi-cut version decomposition algorithm. The multi-cut approach makes full use of the dual information associated with each subproblem of the random realization to generate the optimality cuts without losing any information. Furthermore, since the stochastic programming model proposed in Section 3.4 is a complete recourse problem, the feasibility cuts need not be considered (Noyan, 2012).

Single-cut based Benders decomposition algorithm
The BMP is a relaxation of the primal problem, its optimal value provides a lower bound for the primal optimal function value. In addition, the single-cut decomposition algorithm will solve | | S subproblems associated with all realizations of the stochastic demands. The subproblems' total objective value is equal to the sum of the expected function values over all subproblems. Then, the Benders master problem (BMP 1) in single-cut algorithm can be reformulated as follows: where θ is the lower bound of subproblems' total objective function value, while the upper bound is denoted by ( , ) s S Q x s θ ∈ =  . Z. Li and P. Jiao / International Journal of Industrial Engineering Computations 13 (2022)   513 The Benders subproblem (BSP 1) in single-cut algorithm is: Step 2. 6: If * = θ θ , then stop and return the optimal BMP 1 solution * x . The stopping criterion * = θ θ (Step 6) here need to be relaxed to guarantee that the algorithm can eventually converge and terminate with an acceptable optimality gap ε . For the single-cut decomposition algorithm, θ is the upper bound of the subproblems' total cost and * θ is the lower bound. Then, the optimal termination criterion for the single-cut algorithm can be written as follows: θ θ ε θ *

− ≤
The algorithm is terminated when the optimal stopping criterion is achieved, where the optimality gap ε is a given stopping tolerance value (Noyan 2012, Sikora 2021  and the coefficient matrix

Multi-cut based Benders decomposition algorithm
To design the multi-cut based Benders composition algorithm, we introduce extra variables , s s S θ ∀ ∈ associated with the lower bound of each subproblem's expected function value to reconstruct the master problem's objective function.
Then, the BMP 2 in multi-cut algorithm can be reformulated as follows: Similarly, the upper bound of each subproblem's expected objective function value is denoted by ( , ) s EQ x s θ = . The Benders subproblem (BSP 2) in multi-cut algorithm is: Step 5 are the same as those in the single-cut algorithm. Furthermore, a similar termination criterion for the multi-cut algorithm can be represented as follows: The algorithm should be terminated when the optimal stopping criterion is achieved, where the optimality gap ε is the tolerance value adopted in the single-cut version.

Acceleration strategy
The classical decomposition algorithm SC or MC can be applied to solve the two-stage stochastic programming model directly, in which the first-stage problem contains plenty of integer variables. That means the algorithm has to solve a mixed-integer programming formulation. In other words, the algorithm would deal with a huge computational challenge, because more CPU time will be consumed at each iteration to obtain the master problem's optimal solution.
In this study, a simple strategy is incorporated into Step 3 to improve the efficiency of the decomposition algorithms. This strategy is to restrict the solution time of Step 3 to reduce each BMP's computing time. At each iteration, this strategy allows the optimal solution of the BMP to be replaced by a feasible solution obtained within a restricted computing time. Regardless of the feasible solution's quality, the algorithm accepts the feasible solution and executes subsequent steps such as constructing optimality cuts and inserting them to the BMP. Such optimality cuts can always continuously optimize the solution space of the BMP in each iteration, may be at the cost of more iterations.
Other acceleration strategies have been embedded into the decomposition algorithm to generate optimal (or near-optimal) solutions to the Benders master problem more efficiently (Laporte & Louveaux 1993). The multi-cut decomposition algorithm with acceleration strategy and the single-cut version are called "A-MC" and "A-SC", respectively.

Two-phase heuristic algorithm
Although the above-mentioned acceleration technology can speed up the convergence of the decomposition algorithms. It is still hard for the improved algorithms to solve more complicated instances, because relative low-quality solutions obtained in limited time are quite difficult to produce satisfactory optimality cuts, especially for our MCIRPSD problem. As indicated above, the number of iterations and computation time required by the algorithms may still be unacceptable in practice.
To remedy this issue, inspired by different distribution modes (i.e., the matching relationship between vehicle capacity and customer maximum demand), two-phase heuristic approaches based on the accelerated Benders decomposition algorithms are developed. In the first phase, the original customers set is divided into two subsets (i.e., the customers set using direct distribution mode and another set) and then the former set's total cost is calculated. In the second phase, the improved decomposition algorithms are respectively used to produce another set's distribution schemes. The two-phase heuristic algorithm can be summarized as Algorithm 3.

Algorithm 3
The two-phase heuristic algorithm Input: Data on the customers, vehicles, costs, and stochastic parameters; Output: The optimal solutions * * * , , x y z , and the optimal total cost of primal problem _ total cost ; The details of the two-phase heuristic algorithm are listed as follows: Stage 1. Divide the customers set into two subsets, and compute the total cost _ direct total of the direct distribution scheme.
1. According to the known random demand scenario s q , obtain each customer's maximum demand mq : = max{ } s s S mq q ∈ The maximum demand value is an integer. 2. By the first assumption, the replenishment quantity for each customer must be an integer multiple of a single compartment capacity. The maximum demand mq obtained in Step 1 is used as the replenishment quantity represented by the number of compartment. The advantage of this operation is that the quantity delivered to each customer will cover the demand fully during this period after completing a replenishment.
3. Assign the customer whose replenishment mq exceeds 2 max Q to a group, and then assign the corresponding number ( _ direct vehicle ) of large-capacity vehicles to serve each customer directly without designing the optimal routes. 4. According to all the relevant parameters (i.e., distance, replenishment, and various costs) and the objective function, compute the total cost _ direct total under the direct delivery mode.

The number of remaining customers denotes as _ C direct vehicle
− and the upper bound of the replenishment for such customers is 2 max Q . Determine the total quantity of vehicles K required to serve all customers: For another group with the remaining customers (calculated as _ C direct vehicle − ), the scale of this instance relative to the primal instance has been reduced. We can solve this instance associated with fewer customers and obtain the total cost _ optimize total by the algorithms introduced in Section 4.2.
The total cost of the primal problem is denoted as follows: _ _ _ total cost direct total optimize total = + For simplicity, the two-phase algorithms based on the improved Benders decomposition algorithms are represented by "A-MC*" and "A-SC*".

Experiments
In this section, the computational results for some problem instances are presented to illustrate the effectiveness and performance of the methods described in Section 4.2 and 4.3. The optimization model and all the algorithms are coded in Python 3.8 and executed on a 64-bit workstation equipped with an Intel(R) Xeon(R) CPU E5-2650 clocked at 2.20 GHz and 24 GB RAM, running on a Linux operating system with 2 twelve-core. Gurobi 9.1.2 is used as the BMP problem solver and BSP problem solver in decomposition and two-phase algorithms. For the deterministic equivalent mixed-integer programming model proposed in Section 3.4, a time limit of 2500 seconds is imposed on the Gurobi solver per instance.

Problem instances
For the computational experiments, we adopt a subset of the R101 benchmark instance created by Solomon (1987) to test our algorithms. In this section, instances are designed with the following features: (1). The instance size is between 10 and 50 customers, small instances are constructed with 10 and 20 customers ( =10, 20 C ), medium instances are generated with 30 and 40 customers ( =30, 40 C ), and the largest instances are developed with 50 customers ( =50 C ).
(2). The vehicle type is set to 2 ( { } 1, 2 V = ), the single compartment capacity is fixed at 5 tons. The vehicle capacity of the two types is 10 tons and 20 tons ( 2, 4, v Q v V = ∀ ∈ ) respectively. (3). Although there are a large number of scenarios for all customers, the demand scenarios from different customers are dependent, and all scenarios can be divided according to weather conditions such as extreme weather or not, and so on. As a result, it is reasonable to merge most similar scenarios into fewer ones (Li et al., 2020). Then, the number of scenarios with stochastic demand is between 2 and 5 ( =2,3, 4,5 S ). For both decomposition algorithms, the optimality gap ε is set to 0.05. For the route costs, Euclidean distance is used as the distance between two locations, then the sum of the length for all routes is regarded as the route costs. For the extension to the instance of random demand, we generate stochastic demands according to Poisson distribution with parameter =2 λ .

Performance of the algorithms
In the first experiment, the improved decomposition algorithms are validated by benchmarking them against classical algorithms. This experiment is conducted on 4 instances with 8 customers. Table 1 reports the results obtained on MCIPRSD instances. For each method, Table 1 gives the CPU time ("t") of the instances solved to optimality and the improvement percentage of the best solution time relative to the worst time ("imp"). The following naming convention is used to uniquely remark each instance: C-S, where C denotes the number of customers, S the number of scenarios. The computational results reported can be summarized as follows: (1). All methods could solve the instances to optimality in 780.00 seconds.
(2). For the classical decomposition algorithms, the MC outperforms SC, and it is obvious that the SC has the worst performance in all instances. The maximum CPU time (minimum CPU time) of MC for all instances is 453.22 (108.95) seconds, while the CPU time of SC is 779.38 (300.39) seconds respectively.
(3). For the improved decomposition algorithms, A-MC is also better than A-SC, the maximum time (minimum time) of A-MC for all instances is 82.69 (59.63) seconds, while the CPU time of A-SC is 161.66 (68.38) seconds. The efficiency of both algorithms has been improved more than the classical versions. The minimum improvement percentage of the CPU time (imp) are 30.16% and 75.66% respectively.

Decomposition algorithms for small instances
The next experiment measures the performance of the improved decomposition algorithms on small instances. In this experiment, we solved 8 small instances with the Gurobi solver and improved algorithms respectively. As described above, running time is limited to 2500 seconds on the solver for the MCIRPSD model per instance. The computing time of 50 seconds is imposed on the problem BMP of the improved algorithms.
Detailed results for each method and instance are given in Table 2. The first column of the table gives the information of the instance. For each method, this table gives the best (optimal or near-optimal) objective function values ("best", "bestM", "bestS") computed within the limited computing time, the solution time ("t") in seconds, and the percentage gap between the algorithm's best cost and the solver's, calculated as 1 ( )/ M M gap best best best = − and 2 ( )/ S S gap best best best = − .
For the instances of 10 customers, the solver solves all to optimality, the same optimal solutions for 3 instances (10-3, 10-4, 10-5) are also found by the A-SC approach, the method A-MC only finds the optimal solution for the 10-3 instance. Generally, the high-quality solutions (e.g., the highest gap is only 3.19%) are obtained by our algorithms for all instances. From the perspective of solution time, the running time of the solver is no more than 100 seconds, and the time does not show a strict increasing trend with the increase of the instance size. This trend is also reflected in the solution time of the A-MC algorithm. However, the CPU time of the A-SC algorithm gradually increases with the size of the instance increase.
For the instances of 20 customers, the solver can only get the near-optimal total cost (also called upper bound in other literature) in the limited time. Similarly, none algorithm obtained the optimal solutions, the percentage gap1 is between 1.22% and 2.84% while the gap2 is between 0.43% and 3.51%. The last row of this table presents the average value of all computational results. The average total cost value of the solver is the best among the three methods, although the A-MC method has poor performance, its average gap is only 1.40%. When comparing the solution time among the three methods, we observe that the A-MC significantly outperforms the Gurobi solver and A-SC algorithm.

Fig. 2. Iterative process of the upper and lower bounds for subproblem of the 20-5 instance
Taking the 20-5 instance as the example, the upper boundθ and lower bound * θ for the problem BSP of both algorithms are shown in Fig. 2. The method A-MC requires 31 iterations while the A-SC algorithm needs 78 iterations. It is clear that the problem BSP's lower bound is continuously tight, but the upper bound always fluctuates up and down instead of falling steadily. Since the upper boundθ associated with the first-stage feasible solution * x needs to be obtained indirectly by solving the problem BMP, which is an MVMCIRP problem, the quality of the feasible solution obtained within the imposed time limit cannot be controlled. This is why the upper bound preforms such an unusual characteristic. However, from the overall trend, the feasible solution is approximating the optimal solution with the iteration of each algorithm.

Two-phase algorithms for medium and large instances
This experiment evaluates the two-phase heuristic algorithms for instances with more than 20 customers. We solved all 12 instances under increasing numbers of the customer, i.e., 30, 40,50 C = . In this section, the same solution time settings were adopted for the solver and the problem BMP in the decomposition algorithms. Table 3 shows the results of these methods on each instance, together with the average value and difference of these results. The columns "gap1(%)" and "gap2(%)" in Table  3 have a slight difference (i.e., added symbol "-") compared to the columns in Table 2 because these two methods (A-MC* and A-SC*) perform better than the solver in some medium instances (i.e., 30-2, 40-2, 40-4, 40-5) and all large instances.
When the number of customers increases, our two-phase algorithms are more and more powerful than the solver, particularly well on the 50-2 instance (the percentage gap reaching -47.45%). On the other hand, for those instances with the same customers, it is obvious that the computing time goes down when the number of scenarios increases. With larger customer and scenario numbers, this trend becomes more apparent, ranging from 4244.80 to 7.30 (6342.04 to 6.93) seconds for the large instances. These results in Table 3 confirm that better performance can be obtained by visiting some customers directly with more demand and without organizing the routes. This increases the actual number of vehicles required and fixed costs, but such a method decreases the customers who need replenishment and route design by the algorithms. Next, Table 4 reports more detailed costs obtained by the three methods described above about the instances with 40 and 50 customers. Note that the direct delivery is applied to serve the customer whose maximum demand exceeds 2 max Q (more than 2 compartments in our settings) in the first-phase algorithm. Firstly, we get the sum of the vehicle costs, compartment loading costs and routing costs, called the first-stage objective function value from the Gurobi solver and this cost value is denoted by "costf", and then remark the recourse cost consisting of shortage costs and inventory holding costs as "costr". In the following, the cost of direct delivery is denoted by "costMd" ("costSd") in Table 4, including vehicle fixed costs, compartment loading costs, and routing costs in the first phase. Similarly, the sum of the vehicle fixed costs, loading costs and routing costs obtained from the second-phase algorithms (i.e., A-MC and A-SC methods) is denoted by "costMg" ("costSg"). Finally, we record the recourse costs obtained from both algorithms as "costMr" and "costSr". With the number of customers and scenarios increases, higher demand for a customer will appear more frequently and the number of customers who need direct service increases, resulting in a significant increase in the direct delivery costs "costMd" and "costSd". Conversely, fewer customers need the second-phase algorithms to arrange their optimal replenishment and routes, which leads to lower "costMg" and "costSg". In addition, it can be seen that the recourse costs "costr" in our two-phase algorithms are equal or smaller than the solver with increasing sizes of instances. Furthermore, in order to compare intuitively the first-stage costs and the recourse costs obtained by the three methods, Table  5 shows the improvement percentage gap of the costs difference between the algorithms and solver. The gap formulations here are the improvement percentage of the algorithms costs relative to the solver costs: 1 ( )/ The differences for the first-stage costs "costf", "costMf" and "costSf" are relatively limited for most instances, but it is the second-stage cost that cause all the difference. The second-phase algorithms (A-MC and A-SC) do provide sufficient advantages for our two-phase algorithms, which lead to less recourse costs. At the same time, this effect is becoming more apparent with larger customer numbers, i.e., the highest improvement gap is 72.68%.

Sensitivity analysis of instance features
The influences of two features to the results are analyzed in this subsection, i.e., the number of employed vehicles and the ratio of fixed costs between the large-capacity vehicle and the small-capacity vehicle. The medium instance 30-5 with 26 vehicles is regarded as an appropriate example. Let the number of used vehicles vary from 24 to 28 while fixing other parameters. The total cost and running time corresponding to different numbers of vehicles are obtained by our two-phase algorithms. Fig. 3 presents the impact of the number of employed vehicles on the total cost. The total costs from both algorithms are almost the same affected by the number of employed vehicles. We can find that the total costs decrease significantly as the number increases from 24 to 25. When increasing this number up to 26, its impact on the total cost is very small. The total cost is no longer affected until the number exceeds 26. According to the results detailed in Fig. 4, we can claim that the running time increases with the number of vehicles and the increased running time is relatively limited as the number of vehicles increases from 24 to 25. However, we can also make such a claim for the number of vehicles, i.e., a larger number (more than 25) result in a higher running time, especially for the method A-SC*.
Based on the above analysis, a larger number of vehicles avoids the occurrence of the higher total cost but hence leads to a longer running time to formulate the distribution schedule. As a trade-off, an appropriate amount can not only effectively decrease the total cost but also actually reduce the formulation time of the distribution policies, which illustrates the importance of an outstanding method to determine the vehicles' number.
For the medium instance 30-5 with 26 vehicles in the previous section, the fixed cost ratio is 2 (200 yuan for the large-capacity vehicle and 100 yuan for the smaller one). Let the ratio vary from 1 to 5 while fixing the cost of the small-capacity vehicle.
The total cost and the number of vehicles employed corresponding to different ratios are obtained by the two-phase algorithms. Fig. 5 presents the influence of the fixed cost ratio on the total cost. The total costs from both algorithms are almost the same affected under the same ratio of the fixed cost. We can find that the total costs increase significantly with the ratio increases from 1 to 5. Obviously, a higher fixed cost ratio (higher fixed cost for the large-capacity vehicle) results in more total cost. The number of the vehicles used by both algorithms are the same under the same ratio, except for the case where the ratio is 3. Since the fixed cost of the large-capacity vehicle is too large (the fixed cost of the large-capacity vehicle is at least 3 times of the small-capacity vehicle), which results in more fixed cost on each compartment of oil, so it cost-effective to use more small-capacity vehicles instead of one larger vehicle. Particularly, when the ratio exceeds 2, two additional small-capacity vehicles are used in most instances, and the number of the large-capacity vehicles used is decreased by 1. In addition, for instance 30-5 with 27 vehicles, when the ratio varies from 1 to 5, the corresponding number of the vehicles used is 27-0, 26-1, 24-4, 24-4, 24-4. It can be concluded that when the ratio is no more than 3, the larger the fixed cost ratio, the more small-capacity vehicles are used in the instance. As mentioned before, these experiments confirm that our two-phase solution approaches are quite extraordinary in designing routes, delivering replenishment, and employing applicable vehicles to search for solutions that provide a better total cost for the MCIRPSD.

Conclusions
This paper studies a new inventory routing problem in fuel delivery. Specifically, we consider a multi-vehicle multicompartment inventory routing problem with stochastic demand and ML policy (MCIRPSD) in a single period. Before obtaining the accurate demand from the customers, the appropriate replenishment and routes are determined for all customers by balancing the distribution cost of vehicles and routes with the expected recourse cost that result from surplus and stocking out. To describe such a problem, a two-stage stochastic programming model is proposed and its deterministic equivalent MILP model is formulated.
The MCIRPSD consists of determining, in the first-stage, the delivery quantities and arriving time, the vehicles and routes that will be used in the single period. Therefore, the first-stage problem is a continuous-time, multi-vehicle multi-compartment inventory routing problem with ML policy, which contains integer variables and continuous variables and is NP-hard. In order to deal with the two-stage problem, improved decomposition algorithms and two-phase heuristic algorithms based on the former are presented. The proposed problem and model can be solved for different scale instances, and then different distribution strategies could be evaluated by the decision-makers. From the computational results, we can observe that the improved decomposition approaches can find the optimal and near-optimal solutions compared to the equivalent MILP model using Gurobi solver on instances with up to 20 customers. Moreover, our two-phase algorithms are quite powerful in obtaining good solutions for different medium and large instances with up to 50 customers.
In addition, sensitivity analysis is conducted to assess the impacts of the number of vehicles and the fixed cost ratio on the results. More vehicles obtain relatively stable total cost but result in unacceptable running time, less numbers get more total cost but satisfactory running time. According to the trade-off and preference of total cost and waiting time, a more appropriate number of vehicles can be employed based on the method in this paper. Once the fixed cost of the both vehicles is known and the fixed cost ratio is equal to 1, it is better to employ enough large-capacity vehicles for the distributor. When this ratio is more than 2, relatively more small-capacity vehicles and less large vehicles should be employed by the manager.
In terms of future research, we could consider many relevant extensions of the MCIRPSD. So far, we assume that the oil depot always has enough available inventory at the beginning of each period. Then considering a limited supply at the oil depot is the first extension. Another interesting extension for further research is the improvement of the decomposition algorithms. The solution time is limited for the problem BMP, but it is worthwhile to develop other more effective heuristic or exact approaches to accelerate the computing time for the BMP.