Shared Mechanism-Based Self-Adaptive Hyperheuristic for Regional Low-Carbon Location-Routing Problem with Time Windows

In this paper, we consider a variant of the location-routing problem (LRP), namely, the regional low-carbon LRP with reality constraint conditions (RLCLRPRCC), which is characterized by clients and depots that located in nested zones with different speed limits. The RLCLRPRCC aims at reducing the logistics total cost and carbon emission and improving clients satisfactory by replacing the travel distance/time with fuel consumption and carbon emission costs under considering heterogeneous fleet, simultaneous pickup and delivery, and hard time windows. Aiming at this project, a novel approach is proposed: hyperheuristic (HH), which manipulates the space, consisted of a fixed pool of simple operators such as “shift” and “swap” for directly modifying the space of solutions. In proposed framework of HH, a kind of shared mechanism-based self-adaptive selection strategy and self-adaptive acceptance criterion are developed to improve its performance, accelerate convergence, and improve algorithm accuracy. The results show that the proposed HH effectively solves LRP/LRPSPD/RLCLRPRCC within reasonable computing time and the proposed mathematical model can reduce 2.6% logistics total cost, 27.6% carbon emission/fuel consumption, and 13.6% travel distance. Additionally, several managerial insights are presented for logistics enterprises to plan and design the distribution network by extensively analyzing the effects of various problem parameters such as depot cost and location, clients’ distribution, heterogeneous vehicles, and time windows allowance, on the key performance indicators, including fuel consumption, carbon emissions, operational costs, travel distance, and time.


Introduction
Location-routing problem (LRP) is the most significant and widely studied combinational optimization problem, with real-word applications in distribution logistic and transportations, such as obnoxious waste and disaster relief [1,2]. Lowcarbon logistics is combination of depots location and routing vehicle with concerning environment pollution. Recently, several locating, routing, and distributing of LRP simulations are of social, economic, and environmental significance [3], which lead to obtain importance in studying low-carbon logistics. This inspired us to define a LRP with considering environmental effect that seeks to reduce total logistics cost and carbon emission with replacing the part of travel distance/time by fuel consumption cost.
Low-carbon location-routing problem (LCLRP) was proposed by Zhang et al. [4]. The goal is to determine the facility location and vehicle routes with low-carbon emission in order to minimize the total carbon emission consisted of depots fixed carbon emission and vehicle travel carbon emission, which addressed the low-carbon objective of environment friendly routing and depot location. Regional LCLRP (RLCLRP) was developed by Koc et al. [5] to assess the effect of depot, clients, and fleet on the logistic cost, fuel consumption, and carbon emission by replacing the routing distance with fuel consumption cost under the established 2 Mathematical Problems in Engineering speed zones. There are some considerable differences between RLCLRP and the proposed model in this paper, namely, RLCLRPRCC. RLCLRP concerns the economic and environmental effect but RLCLRPRCC also takes clients satisfaction into account, namely, the hard time windows of clients. In RLCLRP, delivery demand of clients is the only activity, but in the proposed model, the pickup is also an important planning. The arcs-specific speed and speed zones in RLCLRP are fixed but in this paper are stochastic. Moreover, RLCLRP is solved using pollution-and-location-heterogeneous adaptive large neighborhood search and RLCLRPRCC is tackled by hyperheuristic with shared mechanism-based selection strategy and self-adaptive acceptance.
The main contributions of this paper are as follows: (i) Problem modeling. The proposed problem takes several real-world conditions into account, such as heterogeneous fleet, simultaneous pickup and delivery, special speed zones, and hard time windows, and considers the economic, social, and environmental effect. (ii) Hyperheuristic framework is presented and a pool of low-level heuristics (LLH) are established. To the best of our knowledge, this framework is firstly introduced to solve RLCLRPRCC. (iii) Shared mechanism-based choice strategy and selfadaptive acceptance criterion are developed. Aiming to refine the performance of hyperheuristic framework, shared mechanism-based choice strategy and self-adaptive acceptance criterion are designed to accurately and real-timely evaluate the current performance of the LLHs. (iv) Extensive experiments on RLCLRPRCC were carried out to assess the effects of the key parameters on the total cost, carbon emission, travel distance/time, etc. Several suggestions on management were also provided.
The reminder of this study is organized as follows. Section 2 provides a brief review on problem and proposed heuristics. The mathematical formulation and problem description are carried out in Section 3. The proposed approach is detailed in Section 4. Section 5 evaluated the proposed model and hyperheuristic and finally conclusions are drawn in Section 6.

Problem Domain
Review. LRP deals with the combination of two types NP-hard decisions that often emerge in logistics: the location-allocation problem (LAP) and vehicle routing problem (VRP) (Ouhader and Kyal) [8] (Sun) [9]. Among the applications of LRP, LRP considering environmental effect such as carbon emission/fuel consumption, namely, LCLRP, has recently emerged as one of the most addressed, which also handles two NP-hard problems: LAP and pollution-routing problem (PRP) (Bektas and Laporte) [10] or green vehicle routing problem (GVRP) (Liu et al.) [11].
LCLRP is the vehicle routing problem affected by depots location and magnitude, which concerns fuel consumption and carbon emission. Reducing the fuel consumption and improving the transportation efficiency at an operational level would be the most straightforward course of planning activities (Lin et al.) [12]. As stated in Demir et al. [13], evaluation and reduction of fuel consumption and carbon emissions call for excellent estimate model to be fed into logistics actions and methods. Moreover, inspired from various characteristics and nature of transportation actions, type and nature of estimation model are also significant for accurately evaluating fuel consumption and carbon emissions. Meanwhile, fuel consumption has been stressed the importance as a key indicator of approximating carbon emission, as the amount of fuel consumption is directly proportional to the amount of carbon emission (Poonthalir and Nadarajan) [3]. As time went on, the mathematical models estimating fuel consumption/carbon emission have encouraged diverse thinking. Among various fuel consumption and carbon emission models, three main categories can be included as follows: (i) Factor models. Few key factors are concerned as simple fuel consumption, such as vehicle load, travel distance, etc. Typical cases are fuel consumption rate model (Xiao et al.) [14], FCR with speed (Liu et al.) [11], fuel consumption calculation (FCC) (Poonthalir and Nadarajan, Kuo et al.) [3,15], and models proposed by Li et al. [16].
The relationship of the above three models can be drawn as follows. The first factor models are considered as the simple fuel consumption of macroscopic models and the last two are mutually transformational. For various different natures of transportation actions, the rationality and correctness of corresponding fuel consumption models play a role in approximating the fuel consumption and carbon emission. Among the factors of the above three models, five main categories can be derived: vehicle parameters, traffic condition, weather condition, driver habit/skill, and provided facility. For example, Poonthalir and Nadarajan [3] emphasized that the drivers may travel with some likely Mathematical Problems in Engineering 3 speed. And the number and location of depots or fleet size and type are also the key points in affecting the fuel consumption and carbon emission. For better understanding of fuel consumption/emission models and factors, the readers are recommended to the papers of Lin et al. [11] and Demir et al. [13,23].
Among various factors affecting fuel consumption and carbon emission, fleet size and type are of particular significance and much practical in most distribution activities (Koc et al., Hoff et al.) [5,22,24,25]. Koc et al. [24] illustrated the benefits of mix fleet in reducing carbon emission. Li and Fu [26] concluded that heterogeneous fleet can efficiently reduce fuel consumption/carbon emission or number of fleet and increase the capacity utilization rate of fleet. Koc et al. [5] also demonstrated that the usage of mix fleet can reduce logistic total cost and capacity utilization rate. Pitera et al. [27] explored rules of thumb for vehicle assignment within mix fleet to obtain an understanding of simple implementation, such as assigning cleaner vehicles to routes with more clients and longer travel distances. Xiao et al. [28] emphasized that mix fleet is concerned with individualized features including fleet types, carbon emissions rates/models, load capacities, and so on.
Vehicle speed is another key factor in impacting on the fuel consumption and conversion rate and carbon emission (US Department of Energy) [29]. As stated in Barth and Boriboonsomsin [30], when the vehicle velocity is less than 30mph, carbon emission and fuel consumption will nonlinearly grow rapidly; that is, when the vehicle speed decreases from 30mph to 12.5mph or 12.5mph to 5mph, carbon emission and fuel consumption per one mile will be double. The different vehicle velocities are assumed with respect to different times of day to incorporate traffic regulations (Kazemian and Aref) [31] or traffic congestion. Aiming at reflection of speed limits, several main manners were applied and described as follows. The first one for representing traffic congestion is featured that different speed or travel time is to formulate it as a step function of time and uses a simple method to obtain continuous travel times (Kuo [15], Xiao et al. [28], Kazemian and Aref [31], Mirmohammadi et al. [32], Figliozzi [33], Soysal et al. [34], Franceschetti et al. [35], etc.) or only one speed limit is assigned to a specific vehicle which is traversing a particular arc (Afshar-Bakeshloo et al.) [36]. Then, Poonthalir and Nadarajan [3] proposed a novel strategy to simulate the varying speed environment using triangular distribution of probability density function. In their method, expected values within several speed intervals, representing the likely speed of driver in each interval, were given in advance to calculate the average expected speed. The third one is the method proposed by Koc et al. [5], namely, the arc-specific speeds and speed zones. Three speed zones with different constant vehicle velocity are established to represent the speed limits, and the cost of depots differs in each zone. The authors designed the Cheapest Path Calculation (CPC) strategy to obtain the nondominated set of paths determined by the nature of CMEM model and speed zones. This work is based on the strategy proposed by Koc et al. [5], but the location and velocity of each nested speed zone are stochastic, namely, V zone1 <V zone2 <V zone3 and V zone3 ∼U(60, 80), V zone2 ∼U (40,60), and V zone1 ∼U (20,40). The reason is that each travel time or distribution duration of each vehicle in city logistics could keep constant in a few hours, but they will vary with the traffic conditions even if speed limits are fixed, aiming at artificial/automatic applying adequate speed in line with different traffic conditions resulting in selecting the low-cost path, just like the approach timing coefficient (Zhang et al.) [37]. Additionally, several real-world conditions are considered in the proposed mathematical model: simultaneous pickup and delivery and hard time windows.
Reverse logistic, namely, simultaneous pickup and delivery, was proposed by Karaoglan et al. [38] to provide better service for clients in LRP (LRPSPD). Karaoglan et al. [39] proposed two types of LRPSPD models, namely, node-based and flow-based. Yu et al. [7,40] developed simulated annealing (SA) heuristic for LRPSPD, with high quality results. Beyond enterprise economic and environmental effect, clients' satisfaction level is also an extremely important factor in wining clients' heart for a long term, that is, clients service time windows. Zhang et al. [41] applied membership function to estimate the degree of satisfaction. Afshar-Bakeshloo et al. [36] developed service-level function of fuzzy time windows. Wang and Li [42] concerned clients satisfactory by penalizing the vehicle arrive early or lately outside time windows.
To our best knowledge, the differences between our paper and Wang and Li [42] are detailed: (1) type of clients' time windows, (2) model of fuel consumption and carbon emission, (3) solution algorithms, and so on. Therefore, there is no published work simultaneously integrating in a case study of LRP with arc-specific speeds and speed zones, heterogeneous fleet, simultaneous delivery and pickup, and hard time windows.

Approach Review.
As in "no free lunch theorems" (Wolpert and Macready) [43], even though one framework and method may be the best performing within a few research areas, no one can perform ideally in all situations. Moreover, it is beyond all doubt that it is hard task for tester without a deep knowledge in solution domain, as studying existing search-based approaches is generally domain-dependent. The ideal of hyperheuristic (HH) was defined by Denzinger et al. [44] and Crowling et al. [45] as "heuristics to choose heuristics". Posteriorly, an extensive version was developed by Burke et al. [46] as a methodology and classified into two types: heuristic selection and heuristic generation (heuristics to generate heuristics). This paper focuses on the former based on single-point-search method, and brief description and review are provided hereafter.
In the framework of HH, two levels are concerned: HLH and LLH. The HLH manipulates the space consisted of a fixed pool of LLHs instead of directly transforming the space of solutions (Kalender et al.) [47], which is independent with problem domain. Two main categories can be considered in HLH: selection mechanism and acceptance criterion (Özcan et al.) [48]. The role of heuristic selection mechanism is to intelligently select appropriate heuristics from the pool of LLHs, whilst acceptance criteria are to decide whether to accept or reject resultant solution after applying the selected LLH (Sabar et al.) [49]. By analyzing the source of feedback information, three modules can be considered in selection mechanism: on-line, off-line, and no-learning. Choice function (Cowling et al.) [45], Fitness Rate Rank based Multi-Armed Bandit (FRR-MAB) (Li et al.) [50], reinforcement learning (Nareyek) [51], tabu search (Zamli et al.) [52], etc. are examples for adaptively selecting the appropriate LLHs by evaluating the performance of each LLH, and simple random (SR), random descent (RD), random permutation (RP), etc. are viewed as no-learning methods. Acceptance criterion can be classified into two types: deterministic and nondeterministic methods. The former determinately accepts the resultant result, such as all moves (AM), only improving (OI), improving and equal (IE), etc., whilst SA, great deluge (GD), and Monte Carlo (MC) are instanced as the nondeterministic methods.
To the best of our knowledge, hyperheuristics have not been used thus far to address proposed problems. Moreover, in proposed framework of HH, a kind of shared mechanismbased self-adaptive selection strategy and self-adaptive acceptance criterion are developed to improve the performance, accelerate convergence, and improve accuracy; the details of proposed approaches will be provided in Section 4.

Mathematical Formulation
In this paper, fuel consumption and carbon emission are instantly estimated using Comprehensive Modal Emission Model (CMEM), which can be viewed as a state-of-theart microscopic fuel consumption/carbon emission model because of its ease of application (Demir et al.) [23]. The proposed model of RLCLRPRCC is provided in Section 3.2.

Model of CMEM.
Three modules are concerned in CMEM: engine power, engine speed, and fuel rate. The first module for a vehicle of type h∈H is gained by the total tractive power requirement TPR h : where VW h is the total weight (kg) of vehicle of type h∈H; a is acceleration (m/s 2 ) and is gravitational constant (m/s 2 ); is road angle (typically 0); C d,h is the coefficient of aerodynamic drag; is air density (kg/m 3 ); A is the frontal surface area (m 2 ); C r is coefficient of rolling resistance; v is the vehicle speed (m/s). Additionally, to consider the driving losses of vehicle engine and usage of vehicle accessories (P acc ), such as air-condition and other power loads (typically 0), the below is applied: where TP h is the second-by second engine power output (kW) and tf is the vehicle drive train efficiency. Following modules consider the engine speed and fuel consumption rate and are approximated as follows: where N h is the engine speed (rpm); S is the engine/vehicle speed ratio in top gear L ; R(L) is the gear ratio in gear L= {1, 2, . . . , L g }; is fuel-to-air mass ratio, FCR h (g/s) is the fuel consumption rate; k h is engine friction factor; V h is engine displacement (L); is the efficiency parameter for diesel engines. When a vehicle of type h travels d at a constant velocity v, calculation of fuel consumption FC h (g) is as follows: Therefore, FC h (L) can be represented as follows: where = /44 , =1/1000 tf , h =0.5C d,h A h , and =a+ sin + C r cos . is conversion factor from g to L. In the above equation, the first is the engine module which is proportional to travel time; the second part is moment module and the last is speed module. Moreover, as the amount of fuel consumption is directly proportional to the amount of carbon emission, is used to represent the conversion factor, namely, each liter of oil consumed will exhaust 2.32 kg of carbon dioxide.

Proposed Formulation.
The RLCLRPRCC can be defined on a complete and directed graph G=(V, E) with a vertex set V and an edge set E. V consists of clients set I={1, 2, . . . , } and depots set J={1, 2, . . . , }. Each client i∈I has q i demand for delivering and p i demand to pick up and is satisfied by a single vehicle and depot within a specific service time windows [e i , l i ]. Each depot j∈J has a fixed opening cost FD j and capacity CD j . The fleet H={L 1 , L 2 , M} consists three types of vehicles with different specific parameters provided by Koc et al. [5] and the fixed renting costs per a single travel are FV∈ ¥{38, 44, 54}. E is constructed by the parameters concerning each edge, and E={( , ) : , ∈ , ̸ = }\{( , ) : ∈ , ∈ }. d ijh , Q ijh , FC ijh , and TT ijh are, respectively, the distance, Mathematical Problems in Engineering 5 dynamic load, fuel consumption, and travel time of an edge (i, j) travelled by a vehicle of type h∈H; AT ih is the arrive time at client i∈I by vehicle of type h∈H. In additional, C fc is the price of 1 L fuel, C cm is the price of 1 kg carbon emission, and V is the penalty price per minute caused by waiting clients if vehicle arrives early. Some assumptions should be described as follows: (1) each client must be satisfied only once within specific time window; (2) each vehicle must return to the departure depot; (3) the load of each edge must be less than the capacity of this vehicle of type h; (4) the load of each depot must not exceed its capacity; (5) the number of fleet in each depot is unlimited; (6) vehicles are assigned to service clients based on the principle of maximum load rate; (7) penalty cost will be calculated if vehicle arrives early; (8) vehicle must arrive at client nodes before closing time windows, etc.
The definition of decision variables is as follows: x ijh =1 if vehicle of type h travels from node i∈V to node j∈V; y j =1 if the depot j∈J is opened; z ij =1 if client i∈I is serviced by depot j∈J. The mathematical formulation proposed by Koc et al. [5] for RLCLRP can be modified to formula the presented model of RLCLRPRCC, as follows: subject to The objective function (6) is the total logistic cost including vehicle renting cost, depot opening cost, fuel consumption, carbon emission cost and penalty cost. Constraint (7) makes sure that each client is visited only once by a single vehicle. Constraint (8) guarantees that the number of edges entering and leaving each node is the same. Constraints (9) and (10) demonstrate that each client is serviced only once by a single depot and vehicle. Constraints (11)-(13) forbid infeasible routings that do not return to the departure depot. Constraint (14) ensures that the load of each depot must not exceed its capacity. Constraints (15) and (16) illustrate that the load of each vehicle leaving and entering a depot must equal to all clients' delivery and pickup demand, respectively. Constraint (17) is the load dynamic equilibrium constraint. Constraints (18) and (19) make sure that the load of vehicle in each edge must be less than its capacity and lager than 0. Constraints (20) and (21) guarantee that the total load of the vehicles leaving/arriving a depot is equal to the total delivery/pickup demand of all clients assigned to it. Constraint (22) is the equitation to calculate arrive time of a vehicle. Constraint (23) enforces that each vehicle must arrive at a client before closing windows time. Constraints (24)-(26) define some integrality constraints for the decision variables.   Figure 1 shows the framework of a hyperheuristic solver implemented in this paper, including two levels. At a lowlevel, a set of heuristics for scheduling, which has a corresponding expertise, is defined. And at a high level, a shared mechanism-based choice strategy and self-adaptive acceptance criterion are developed to select promising heuristic and maintain diversity of selection. The proposed hyperheuristic approach is a single-point search framework, which is characterized by tending to get stuck in local optimal solutions. Aiming at overcoming this rebarbative bug, each run starts with randomly selecting an individual from initial population which is randomly generated, and an appropriate LLH is selected from a fixed set of operators. After implementing the selected operator, we update heuristic space according to the proposed highlevel strategy and obtain a promising operator via roulette selection. The stopping criterion is based on the maximum numbers of iterations.

Problem Domain
The individual in the proposed HH consists of a set of routes, given by R= { 1 , 2 , . . . , }, where r i is a set of clients served by ith vehicle and the opening depot is inserted at two ends of each route. Each compete route is stored in subcell array. Aiming at fast evaluating solution, the properties of each route also are included in the second rows of route array, such as total cost, departing/returning load and type of vehicle in each route. We just need compute fitness of a single solution by simply add the total costs of all routes and the costs of opening depots. The computation complexity of correction processing is O(1) and it does not significantly affect computational efficiency. Moreover, the adopted representation can avid restore feasible and obtain feasible children solutions, and so does the following LLHs in Section 4.1.2.   To clarify the solution representation, a simple example of 15 clients, 4 vehicles and 5 potential location of depots is illustrated in Figure 2. The length of all arrays represents 4 routings, the left represents the tracing of 4 routes including serviced clients' order and opening depots, the right is the properties of 4 routes with the type of vehicles in the last position. Finally, the population initialized, called N pop , are randomly created.

Low-Level Heuristics.
As the factors (e.g., solution representation and approach parameters) of preexisting methods bring much inconvenience and difficult to develop hyperheuristics and reduced LLHs are flexible response to different application, several reduced operators specified to RLCLRPRCC are developed in this paper. The module of (operators set) is composed of 16 LLHs h 1 , h 2 , . . ., h 16 across two pools of heuristics: mutational heuristic (MH) and local search/hill climber (HC), which are identified by the role in improving or worsening the solution. The LLH search strategies are presented in the following list: (1) h 1 . Swaps two adjacent clients within a single routing.
(2) h 2 . Moves two adjacent clients to a different position within a single routing.
(3) h 3 . Moves a single client from vehicle to another.
Mathematical Problems in Engineering 7 (4) h 4 . Swaps two clients from different vehicles.
(5) h 5 . Diversifies the selected depots by opening a new one and randomly assigning between 1 and 2/3 of the routes to it or closes one depots and all routes of this depots are assigned to another depot.
(6) h 6 . Collapses each route into a single client, that is, "Super-Client", and the concerned distance is the small insertion cost of the depot in the original route; similar to Barreto et al. [62], if infeasible route is obtained, the insertion cost will be set to infinity.
(7) h 7 . Decomposes a single route into two different routes.
(8) h 8 . Merges two routes into a single route with randomly selecting a depot resulting in a feasible route.
(9) h 9 . Exchanges two edges inside the routes. The first 8 LLHs belong to MH, which have a good global optimization capability to help search escape from the local optima. The remainder is conductive as local optimization to improve the solution quality with ensuring FIR≥0. The constraints in Section 3.1 must be satisfied in obtained solutions by the above 16 operators, aiming at avoiding the feasible restoration method.

Proposed High-Level Selection
Strategy. The HLH of HH framework is proposed to choose an appropriate operator to search the solution domain space. The performance of operators may differ dynamically in different search phase. Therefore, how to estimate the performance and choose the most appropriate operators is of importance in developing the HLH. To acquire it, the strategy that can track the performance of LLHs in real-time should be developed; namely, the selected probability of each LLH is updated based on its real-time or phase performance.
Inspired from the paired performance of Choice Function (Cowling et al.) [45] and FRR-MAB (Li et al.) [50], a shared mechanism combined adaptive selection strategy is developed and characterized as follows: (1) sliding window organized as a first-in-first-out (FIFO) queue is applied to store recent application of operators and fitness improvement rate (FIR) values; (2) it is considered that achieving the fitness improve values are the joint effort of operators within sliding window; (3) FIR between parent and child fitness are not suitable for estimating the performance of Poor/Mutational Heuristics (PH), a novel strategy is developed; (4) FRR-MAB is used to calculate the credit values of PH and Elite Heuristics (EH). In the following, the details of the above are described.
In the shared mechanism, the sliding window is a twodimensional list of W rows and 2 columns. The first column records the operator index number and the second column records the corresponding FIR. However, only those operators with FIR>0, instead of all operators, are stored. As it is believed that the joint effort is achieved by the operators with FIR>0. Moreover, if the FIR gained by other operators is less than 0, sliding window will be cleared. Then sliding window resumes to work. In this paper, the W value is set to 4, that is only 4 operators can share the performance of FIR obtained by recent operator. As to the allocation method, performance value obtained are suggested to be in direct proportion to FIR obtained by recent operator with biased ratio method, that is br={ 1 , 2 , . . . , } and br 1 +br 2 +. . .+br W =1 and br 1 ≤br 2 ≤ . . . ≤br W . For example, biased ratio is set at {0.1, 0.15, 0.25, 0.5}, the above two constraints are satisfied, then the performance value for recent operator with obtaining incumbent FIR, which have to be shared to others in sliding window, can get 0.5FIR, the nearest operator, namely operator of last iteration, can share 0.25FIR, and the third and last can share 0.15FIR and 0.1FIR. The nearer operators implemented, the lager performance value can share. In the shared mechanism, the cumulative performance of each operator is nominated as SFIR.
The sliding window ensures that the stored FIR values reflect the current situation of the search, and only the operator with FIR>0 can share FIR of other operators. However, how to estimate the operators with FIR<0 is also significant in selecting an appropriate operator. In this paper, a kind of classification is adapted to partition the operators into two groups according to their global efficiency, and the most promising operators are listed in Elite list (EL) and the low-ranking operators are included in Low Ranking List (LRL). The operators in EL are called Elite Heuristics (EH), otherwise Poor Heuristics (PH). The global efficiency of operators adopts the Global FIR (GFIR) to classify the operators, as follows: where i is the ith operators, t is the incumbent iteration. Those operators with GFIR>0 can be viewed as EH, and others are PH, and if GFIR of all operators are less than 0, then GFIR is sorted in descending order and the first half are EH, others are PH. In this paper, operators are designed as local search heuristics; therefore the first strategy is adopted. For EH, the performance is estimated using the above shared mechanism; otherwise a novel method is developed to track the performance of PH by achieving FIR using the improvement rate between best solution of incumbent iteration and the last iteration implementing PH, that is, cumulative improvement rate of best fitness, called PFIR.
where bf t0 , bf t are, respectively, the best fitness of last iteration and current iteration. To our best knowledge, the most widely approach is to treat all operators as a whole and utilize a single criterion, but this may result in selecting the inappropriate operators in Iterated Local Search (ILS). For example, the credit assignment in FRR-MAB may lead to negative total reward, as the FIR of PH always are negative.
Moreover, how to select an appropriate type (PH or EH) is of significance according to situation of the search; as stated in Ozcan et al. [48], it might be desirable to apply a HC/EH whenever a MH/PH is implemented and vice versa. In this paper, the probability of selecting PH depends on the number of iteration without improving the best solution, nominated as TQ; NH is the number of operators; is control factors. The lager TQ value, the lager probability of selecting PH.
Equitation (30) is the function to calculate the credit value (CV), FRR is credit assignment value after normalizing, is the set of operators, N is the set of usage count of all operators, C is a scaling factor to control the tradeoff between exploitation and exploration, the former favors the arms with best empirical rewards, and the latter focuses on the infrequently tried arms. The FRR represents two performance indicators, namely, normalization value of SFIR of EH and PFIR of PH. According the description of the above definition, the pseudocode of the share mechanism procedure and high-level selection strategy are showed at Algorithms 1-2.
In each iteration, a heuristic of the pool of operators is chosen by roulette selection with the selection probability of the ith operator, as follows:

Proposed High-Level Acceptance Criterion.
Each application of a low-level heuristic takes an incumbent solution and modifies it to construct a new child solution. The child solution is then considered for accepting as the solution in the coming iteration. If the new solution is not accepted, then it is discarded. If the new solution is at least as good as the solution, then it is automatically accepted as incumbent solution regardless of the acceptance criterion specified by HH. In this paper, the nonimproving solutions are accepted with a probability p ac as follows: where TQ is the number of iteration without improving the best solution and NH is the number of operators. The above equitation demonstrates that the probability of accepting the nonimproving solution depends on the TQ instead of the quality of child solution.

Computational Evaluation
Implementation aspects and evaluations of the proposed mathematical model and approach are presented and discussed in the following sections.

Implementation Aspects and Configuration of Parameters.
The algorithm was parallel programmed in Matlab 2018a and results are achieved by using a 4.0 GHz Intel Core i7-6700K CPU with 8 GB of RAM and running Windows 10; concerning to the component of the algorithm, the parameters in the proposed HH play a significant role in the quality of solutions. Therefore, an initial experiment with various parameters was carried out and the following were found to be the most suitable. The maximum number of iteration (T max ) depends directly of the number of clients, depots, and the maximum op is operator of the last iteration; nop is current selected operator; FIRop is the fitness improvement rate obtained by op; Tmax and t is the maximum/current iteration, et al. number of vehicles; the reasoning is that the higher the value is, the more intensified of the search for good solutions is, and more time-consuming it becomes (especially in large instances), so the upper limit is set at 10 5 . max = min (10 ( + + ) 2 , 10 5 ) The configuration parameters used in shred mechanism were suggested as follows: br 4 ∼U(0.4, 0.6), br 3 =0.5(1br 4 ), br 2 =0.3(1-br 4 ) and br 1 =0.2(1-br 4 ). The selection of PH/EH uses the control factor =1.6. Acceptance factor in adaptive acceptance criterion is ∼U(2, 2.4). The parameter related into FRR-MAB follows the default values suggested in Li et al. [50], scaling factor (C=0.5), if needed, and the sliding window size (W=50), which are also used by Srickler et al. [57].
The potential threat to validity of the above parameter configure can be identified by the virtue of instances and empirical experiments; namely, parameter configure differs with the characteristics of instances.

Test Instances.
As the instances for RLCLRP is lack and RLCLRPRCC is firstly studied in this paper, the test instances in this paper are randomly generated. The speed of three zones are set as follows: V zone3 ∼U(60, 80), V zone2 ∼U(40, 60), and V zone1 ∼U(20, 40) (km/h), and the location and size of three zones are stochastic with keeping nested nature. We generated four sets of instances, namely, LCLIENT, LDEPOT, LTW, and LVEHICLE, which are used to evaluate the impact of distribution of clients/depots, clients' time windows and fleet on the total cost (TT), carbon emission (CE), travel distance (TD), travel time (TT), etc. In the first set, four subsets are generated: (1) all clients clustered zone 1, denoted by CC1; (2) all clients clustered zone 2, denoted by CC2; (3) all clients clustered zone 3, denoted by CC3; and (4) clients located randomly, denoted by CR. LDEPOT is similar to LCLIENT. The third set also includes four subsets by the different l i and e i , namely, l i -e i = ×ST i , and ∈ {1, 1.5, 2, ∞}. The last one also contains four subsets characterized by different fleet: (1) only L1 vehicle, denoted by L1; (2) only L2, denoted by L2; (3) only M, denoted by M; and (4) heterogeneous fleet, denoted by (HF). In each set, only corresponding parameters are changed and other parameters are kept the same, and the clients' time windows are randomly selected in the instance C101 proposed by Solomon et al. [63] and modified by dividing 10. The service time of each client depends on its total demand and the maximum ST is set at 540 (s); the delivery and pickup

Efficiency of High-Level Selection Strategy.
We have conducted a preliminary on the LRP/LRPSPD benchmark instances, for assessing the high efficiency of the proposed high-level strategy. To this end, we have compared the solutions of proposed algorithm with the optimal results obtained by the hyperheuristic with FRR-MAB and the bestknown solution (BKS) in the literature. The used set instances are one of three benchmark sets, namely set by Barreto et al. [62]. The results for this set can be seen in Tables 1 and 2. The first column displays the name of each instance, followed by BKS, then results concerning two HH approaches with proposed selection strategy and FRR-MAB are demonstrated: best solution (BS). Gap 1 /Gap 2 are the gaps to BKS and BS/mean results over 20 runs; SD is the standard deviation of the minimum results over 20 runs. Finally, MT is the mean computing time in second over 20 runs. From the BS in Table 1, HH-SMAA can obtain 17 BKS and two new BKS, one of them reaching an improvement of 0.3% to the previous BKS. Only 10 BKS are achieved by HH-FRR-MAB with median SD value at 4.25, which is much larger than the values of HH-SMAA. For the results of LRPSPD, 13 BKS are obtained and 5 new BKS are found with one reaching an improvement of over 1.6% to pervious solution. HH-FRR-MAB can 11 BKS and four BS are better than previous results. Moreover, aiming at demonstrating the efficiency of the proposed HH, Friedman statistical experiment is carried out between the results of the two approaches, and the results are provided in Table 3. The obtained 2 values are larger than critical values, indicating that there is a significant difference between the above two approaches and the performance of the proposed HH is statistically superior to HH-FRR-MAB. Table 4 presents the key performance indicators obtained by the proposed approach for solving LCLIENT set, aiming at analyzing the effect of clients' distribution on the total cost (TC) in CNY, vehicle cost (VC) in CNY, fuel and carbon emission cost (FEC) in CNY, penalty cost (PC) in CNY, travel distance (TD) in km, travel time (TT) in minutes, and carbon emission (CE) in kg. The first column displays the name of the instance. From the results in Table 4, all clients in zone 1 can reduce all performance indicators but not MT among the results compared with other clients' distribution, as shown in Figure 3. Compared with the results of CC2, CC1 seems to averagely reduce 5.56%TC, 2.96% VC, 20.82%FEC/CE, 37.77%PC, 19.25%TD, and 7.87%TT. The results of CC3 reveal that it is time-consuming, expensive, and long-distance for servicing this zone's clients even if the speed limits of vehicle    The ratio of each part in objective function seems to have nothing with the distribution of clients, as shown in Figure 4, among four ratios, the largest is the depots cost with ranging from 44% to 60% and average value at 51.68%, followed by vehicle or fuel consumption and carbon emission cost around 22%, and the ratio of penalty cost is the least, reaching a ratio of 4.12% on average. The efficiency of the proposed model is reported in Table 5. The values in Table 5 indicate the increment of the results of RLRPRCC-TT/RLRPRCC-TD, referring to the travel time/distance as the third part in the objective function instead of fuel consumption and carbon emission cost. Figure 5 reveals the change among the above three models, compared to RLRPRCC-TT, the proposed model can reduce 1.95% TC, 16.46%FEC, and 6.83% TD sacrificed by increasing 4.27%VC, 20.71%PC/VW, and 8.44%TT. Meanwhile, the proposed mathematical model also can reduce 1.57% TC, 14.08%FEC, and 6.68%TD with increasing 3.04%VC, 22.94%PC/VW, and 6.76%TT.

Effect of Clients' Distribution.
The above results demonstrate that the distribution of clients significantly affects the key performance indicators, and the type of CC1 always achieves the minimum values. Moreover, the proposed model in this paper can reduce around 2% logistics total cost, 15% fuel consumption and carbon emission cost and carbon emission, and 7% vehicle travel distance on average, which illustrates the efficiency of the presented mathematical model.

Effect of Depots' Distribution.
In this paper, the depot costs are strongly correlated with daily rental of depots (depending on their location), cargo management cost, and shipment charge. The distribution of depots in one city determines cost and location of each depot, which jointly influences the key performance indicators. The results of LDEPOTS instances are carried out in Table 6. From the observation of values in Table 6, the distribution of depots in zone 3 can significantly decrease 59%TC, 1.62%FEC/CE and 3.83%TD when compared with DC1, 29%TC for DC2    Figure 6. In Figure 7, the ratio of each part in objective function is displayed in boxplot, the ratio of depots' cost in DC3 is minimum with 53.66% ratio on average, and the minimum ratio of others can be found in the type of DC1. Each ratio of parts in objective function differs in different type of depots' distribution. Therefore, the logistics company should preferentially optimize the maximum ratio among four parts for selecting the most favorable depots in obtaining the minimum total costs, as the depots' cost average accounts for more than 50%. Table 7 displays the increment of results obtained by RLRPRCC-TT and RLRPRCC-TD. Figure 8 shows the increment of each performance indicator in RLRPRCC-TT and RLRPRCC-TD, which indicates that our proposed model can reduce 3.76%/3.07TC, 41.57%/38.31%FEC/CE, and 18.80%/22.12%TD on average. Therefore, our model also has high efficiency in solving these instances.

Effect of Clients' Time Windows.
In this section, we analyze the effect of variations of clients time windows; as stated in Section 5.1, there exist 4 settings for indicating allowance of time windows, that is, l i = ×ST i +e i , and ∈ {1, 1.5, 2, ∞}, and the e i values and other parameters are kept the same across all variations. The comparison of four variations is provided in Tables 8 and 9. Table 8 provides the results of the instances with =1. Table 9 presents the decrement values of key performance indicators, that is TC, PC, FEC/EC, TD, and TT. From the observation in Table 9, the average decrement of the above 5 indicators increase as the value increases, but the penalty cost of =1 is less than it of =1.5. For each performance indicator, the increment of TC, VC, FEC/EC, PC/WT, TD, and TT ranges from 3.93% to 19.45%, from -0.36% to 23.94, from 16.54% to 33.60%, from 5.43% to 42.96%, and from 8.84% to 25.36%, respectively. Figure 9 reveals the tendency of 6 performance indicators affected by the value, including TC, VC, FEC/EC, PC/WT, TD, and TT. Our results demonstrate that the clients' time windows have significantly impact on most of key performance indicators, and the logistics company should take appropriate prejudice way to select appropriate clients to service.    Table 11 provides results compared the results in Table 10  The above results suggest that the instances used heterogeneous fleet of vehicles can effectively reduce total logistics cost, fuel consumption, carbon emission, vehicle waiting time, and travel distance/time and increase the capacity utilization rate and number of vehicles, even though the usage of vehicle type depends directly on the demand and time windows of clients. Therefore, the logistics company   Size of Instance should rent various type of fleet composition based on nature of demand and time windows of clients for deducing cost, environmental effect and increasing clients satisfactory. Figure 10 illustrates the computing performance of the proposed method; it goes without saying that the average computing time has relationship with the size of instances, such as the computing time of {50-5} goes up to 10 times of that of {25-5}, and the computing time of {75-7} is three times of that {50-5}. Moreover, the running time on average of {100-10} is 1.2 times of that of {75-7}. The above phenomenon indicates that the computing time also subjects to the parameters in hyperheuristic, such as the maximum iteration and the implemental environment.

Conclusion
This paper proposed a new model for regional low-carbon location-routing problem with simultaneous pick-ups and deliveries, hard time windows, and heterogeneous fleet. For solving the understood NP-hard problem, a novel approach was developed, that is, shared mechanism-based self-adaptive hyperheuristic (HH-SMAA). A three-index exponential-size MIP mathematical model was first introduced to minimize the logistics total cost which consists of total depot, vehicle, fuel consumption, and carbon emission cost and penalty cost, where the penult is regarded as routing cost for considering environmental effect and the latter is vehicle waiting cost (if arrive early) with respect to client satisfaction and maintain client loyalty for a long term.
To validate our proposed HH-SMSA, the LRP/LRPSPD benchmark instances, namely, Barreto sets, were carried out to evaluate its performance and reliability by comparing with the improved hyperheuristic with Fitness Rate Rank based Multi-Armed Bandit (FRR-MAB). Furthermore, four sets of new benchmark instances were purposefully constructed to assess the effects of various problem parameters, such as depot and client distribution, client time windows allowance, fleet, etc., on key performance indicators, including total cost, depot cost, vehicle cost, fuel consumption and carbon emission cost, penalty cost, vehicle travel distance and time, etc. From the above analysis and conclusions, several managerial insights for logistics company for a short/long term were provided.
Future works may define multiobjective function model for RLCLRPRCC with respect to economic effectiveness, environmental effect, and client satisfactory; corresponding multiobjective hyperheuristics will be developed. Moreover, the future studies may also consider temporal dimension in RLCLRPRCC with respect to time-dependent speeds account for traffic conditions.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.