A multi-agent based cooperative approach to scheduling and routing. European Journal of Operational Research. ISSN 0377-2217 (In Press)

Abstract

In this study, we propose a general agent-based distributed framework where each agent implements a different metaheuristic/local search combination. Moreover, an agent continuously adapts itself during the search process using a direct cooperation protocol based on reinforcement learning and pattern matching. Good patterns that make up improving solutions are identified and shared by the agents. This agent-based system aims to provide a modular, flexible framework to deal with a variety of different problem domains. We have evaluated the performance of this approach using the proposed framework, which embodies a set of well-known metaheuristics with different configurations as agents, on two problem domains: Permutation Flow-shop Scheduling and Capacitated Vehicle Routing. The results show the success of the approach, yielding three new best known results for the Capacitated Vehicle Routing benchmarks tested, while the results for Permutation Flow-shop Scheduling are commensurate with the best known.

1. INTRODUCTION

Heuristics are rules of thumb for solving specific computationally hard problems. Researchers and practitioners use heuristics when exact methods fail to produce any solutions with a "reasonable" quality in a "reasonable" … Ross, 2014). In this study, we take an alternative approach and use cooperating agents, where each agent is enabled to take a different approach with different parameter settings.

By cooperative search we mean that (meta)heuristics, executed in parallel as agents, have the ability to share information at various points throughout a search. To this end, we propose a modular agent-based framework where the agents cooperate using a direct peer-to-peer asynchronous message-passing protocol. An island model is used where each agent has its own representation of the search environment.
Each agent is autonomous and can execute different metaheuristic/local search combinations with different parameter settings. Cooperation is based on the general strategies of pattern matching and reinforcement learning, where the agents share partial solutions to enhance their overall performance.

The framework has the following additional characteristics. By using ontologies (see Section 3.2), we aim to provide a framework that is flexible enough to be used on more than one type of combinatorial optimisation problem with little or no parameter tuning. This is achieved by using our scheduling and routing ontology to translate target problems into an in- … Salesman Problem. Kouider and Bouzouia (2012) propose a direct-communication multi-agent system for job shop scheduling where each agent is associated with a specific machine in a production facility. Here a problem is decomposed into several sub-problems by a "supervisor agent". These are passed to "resource agents" for execution and then passed back to the supervisor to build the global solution.

Little work has been done on asynchronous direct cooperation where partial solutions are rated and their parameters are communicated between autonomous agents all working on the total problem. So far, no direct cooperation strategy has been applied to more than one problem domain in combinatorial optimisation. To this end, the agents are truly autonomous and not synchronised. There is a gap in the literature regarding agents cooperating directly and asynchronously where the communication is used for the adaptive selection of moves with parameters.

The outline for the rest of the paper is as follows. Section 2 provides formal problem statements for the two case studies. Section 3 describes the proposed modular multi-agent framework for cooperative search, while Section 4 describes how it is implemented.
In Section 5 we discuss the experimental design. In Section 6 we report the results of the tests where, to the best of our knowledge, for three of the capacitated vehicle routing instances we achieved better results than have been reported in the literature. Finally, Section 7 presents conclusions and suggestions for future work.

In this section we offer brief problem descriptions of the case studies applied to the agent-based framework proposed in this paper. We chose these instances as they are representative scheduling and routing problems. The … (Juan et al., 2015). This makes them a good fit with the partial solutions identified by the system.

A solution can hence be represented, uniquely, by a permutation S = (σ_1, ..., σ_j, ..., σ_n), where σ_j ∈ J indicates the job in the j-th position. The completion time C_{σ_j,i} of job σ_j on machine i can be calculated using the following formulae:
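In code, this completion-time calculation takes the form of the standard PFSP recurrence: a job cannot start on machine i before it finishes on machine i − 1, nor before the preceding job in the permutation releases machine i. The indexing convention p[i][j] (processing time of job j on machine i) and the function names below are our own rendering, not the paper's:

```python
def completion_times(p, S):
    """Return C[i][j], the completion time of the j-th job of the
    permutation S on machine i.

    p is indexed p[machine][job]; S is a tuple of job indices.
    """
    m, n = len(p), len(S)
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            job = S[j]
            earlier_machine = C[i - 1][j] if i > 0 else 0.0   # same job, previous machine
            earlier_position = C[i][j - 1] if j > 0 else 0.0  # previous job, same machine
            # C_{sigma_j, i} = max(C_{sigma_j, i-1}, C_{sigma_{j-1}, i}) + p_{i, sigma_j}
            C[i][j] = max(earlier_machine, earlier_position) + p[i][job]
    return C

def makespan(p, S):
    """The usual PFSP objective: completion time of the last job on the last machine."""
    return completion_times(p, S)[-1][-1]
```

The makespan C_{σ_n,m} is then the quantity the metaheuristics minimise.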

The Capacitated Vehicle Routing Problem (Dantzig and Ramser, 1959) can be defined in the following graph-theoretic notation. Let G(V, E) be an undirected complete graph where V = {v_0, v_1, v_2, ..., v_n} is the vertex set and E is the edge set.

Let the vertices v_i, i ∈ {1, ..., n}, represent the customers who are expecting to be serviced with deliveries, and let v_0 be the service depot. Associated with each customer vertex v_j is a non-negative demand d_j, which must be met each time a delivery is made. The depot v_0 has zero demand.

The set E represents the set of roads that connect the customers to each other and to the depot. Thus each edge e ∈ E is defined as a pair of vertices (v_i, v_j). Associated with each edge is a cost c_{i,j} of the route between the two vertices.

Finally, there is also a set of unlimited trucks, each with the same loading capacity. The aim is to service all the customers, visiting each one only once, using as few trucks as possible. In any potential delivery round, a customer's demand has to be taken into account: the total demand of the customers on the round must not exceed the capacity of the vehicle. This means that it is normally not possible to visit all customers with one truck. As a consequence, each delivery round for a truck is called a route.

The goal of the CVRP is to minimise the overall travelling distance needed to service all customers, with their varying demands, using a given number of trucks, each with the same fixed capacity.
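As an illustration of this objective, the capacity constraint and the route-cost sum can be sketched as below; the concrete representation (routes as lists of customer indices, the depot as vertex 0, a cost matrix cost[i][j]) is our own assumption, not the paper's:

```python
def route_is_feasible(route, demand, capacity):
    """A route (list of customer indices, depot excluded) is feasible
    when its total demand does not exceed the vehicle capacity."""
    return sum(demand[v] for v in route) <= capacity

def total_distance(routes, cost):
    """Sum cost[i][j] over every leg of every route, including the legs
    out from and back to the depot (vertex 0)."""
    dist = 0.0
    for route in routes:
        path = [0] + list(route) + [0]  # each route starts and ends at the depot
        dist += sum(cost[a][b] for a, b in zip(path, path[1:]))
    return dist
```

A CVRP solution is then a set of feasible routes that together visit every customer exactly once, with total_distance as the value to minimise.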

This problem was proved to be NP-Hard by Garey and Johnson (1979).

2.3. Benchmark instances

We used the following benchmark instances for testing the experiments described in Section 5. For PFSP, we selected 12 benchmark problems from Taillard (1993) …

The framework makes use of two types of agent: launcher and metaheuristic agents.

• The launcher agent is responsible for queueing the problem instances … to specific data and as such is generic.

A search proceeds with the launcher reading a number of problem instances into memory. It converts them into objects that can be defined by the ontology for scheduling and routing (Section 3.2 below) and then sends each object, one at a time, to the metaheuristic agents to be addressed. For a given problem instance, the metaheuristic agents participate in a communication protocol which is in effect a distributed metaheuristic that enables them to search collectively for good-quality solutions. This is a sequence of messages passed between the metaheuristic agents, and each message is sent as a consequence of internal processing conducted by each agent. One iteration of this protocol is called a conversation and is based upon the well-known contract net protocol (FIPA, 2009). In order to arrive at a good solution the agents will conduct 10 such conversations.
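The control flow just described might be sketched, schematically, as follows; the agent interface (receive, best) and the helper callables are our assumptions, not the framework's actual JADE code:

```python
N_CONVERSATIONS = 10  # number of conversations per instance, taken from the text

def run_search(instances, to_ontology, agents, run_conversation):
    """Schematic launcher loop: translate each instance into ontology
    objects, dispatch it to every metaheuristic agent, then let the
    agents hold a fixed number of conversations before recording the
    best solution value found. to_ontology and run_conversation stand
    in for the framework's problem translation and contract-net round.
    """
    results = []
    for raw in instances:
        problem = to_ontology(raw)
        for agent in agents:
            agent.receive(problem)          # agents search only after receiving the problem
        for _ in range(N_CONVERSATIONS):
            run_conversation(agents)        # one iteration of the cooperation protocol
        results.append(min(agent.best for agent in agents))
    return results
```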

To understand the pattern matching protocol it is necessary to explain the proposed model for scheduling and routing used throughout the framework. … The ontology used by the framework generalises these notions as abstract objects.

• SolutionElements: A SolutionElement is an abstract object that can represent a problem-specific object such as a job in PFSP or a customer or depot in CVRP.

• Edge: An Edge object contains two SolutionElement objects. These are used to represent pairs of jobs or customers in a permutation and will be used in the cooperation protocol to identify good patterns in improving permutations.

• Constraints: The Constraints interface sits between the high-level framework and the concrete constraints used by a specific problem. These are used to verify a valid permutation.

• NodeList: A NodeList object is a list of SolutionElement objects or Edges. It represents a schedule of jobs in the PFSP. In the case of CVRP, a NodeList represents a Route and is therefore a sub-list of a full permutation.

• SolutionData: A SolutionData object is a list of NodeList objects and is therefore the permutation that is optimised by the framework. In this study it represents a schedule of jobs in PFSP, or a collection of routes in CVRP.

All message passing in the framework, including the whole ontology, is written in XML. This can be advantageous as many benchmark problems these days are also in XML, making the interface between problem definition and ontology seamless in practice. Figure 1 shows the structure of the ontology and how SolutionElements are the interface between the framework and a concrete problem.

The framework features a method of Edge selection and short-term memory. A conversation, as has been explained already, is a type of distributed heuristic. Its purpose is to identify constituent features of incumbent solutions that are likely to lead to the building of improving solutions. This is achieved by using objects defined in the ontology. SolutionData … NodeLists are built of n − 1 Edges and n SolutionElements.
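A minimal sketch of how these ontology objects might be rendered in code (the class names follow the paper; the Python modelling, including edges_of, is our own) is:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SolutionElement:
    name: str               # e.g. a job id in PFSP, a customer id in CVRP

@dataclass(frozen=True)
class Edge:
    a: SolutionElement      # an Edge pairs two adjacent SolutionElements
    b: SolutionElement

def edges_of(node_list):
    """A NodeList of n SolutionElements yields its n - 1 Edges."""
    return [Edge(x, y) for x, y in zip(node_list, node_list[1:])]
```

Breaking a NodeList into Edges in this way is what lets the agents exchange and score partial solutions.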

The initiator agent collects all the Edge objects from all the other agents into a list and scores them by frequency. Here, frequency is the number of times an Edge appears in the initiator's list. The only Edge objects that are retained are the ones whose score equals the number of agents participating in the conversation. The idea here is that if an Edge occurs in all incumbent solutions, it is likely to be an Edge that will be part of an improving solution. These retained good Edges are then shared by the initiator with the other agents.

Another feature is the learning mechanism, whereby each agent keeps a short-term memory of good Edges. This is a queue of good Edges that operates somewhat like a Tabu list. An agent's queue is populated during the first conversation with edges from the incumbent solution produced by its metaheuristic. Thereafter the queue is maintained at a fixed fraction, 20%, of the size of the candidate solution for the problem instance at hand. In subsequent conversations, as new edges not already in the list arrive, they are pushed onto the front of the queue while other edges are popped off the back, so that the size of the list does not change.
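The retention rule and the Tabu-like queue described above can be sketched as follows, assuming edges are hashable pairs (each agent's list is deduplicated before counting, so the score is the number of agents containing the edge); the function names are ours:

```python
from collections import Counter, deque

def good_edges(edge_lists):
    """edge_lists: one list of edges per agent. An edge is retained only
    when its frequency equals the number of participating agents, i.e.
    it appears in every incumbent solution."""
    n_agents = len(edge_lists)
    freq = Counter(e for edges in edge_lists for e in set(edges))
    return {e for e, c in freq.items() if c == n_agents}

def update_memory(memory, new_edges, capacity):
    """Short-term memory kept as a queue, Tabu-list style: edges not
    already present are pushed onto the front, and old edges fall off
    the back once the capacity (20% of instance size) is reached."""
    for e in new_edges:
        if e not in memory:
            memory.appendleft(e)
            while len(memory) > capacity:
                memory.pop()
    return memory
```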

The Edges in the short-term memory are used at the start of each conversation to modify the behaviour of the agent's metaheuristic, enabling it to find better solutions.

The basic idea of this learning mechanism is that both the RandNEH and …

The framework conducts a search where each agent is launched and registers with the JADE platform that hosts the framework. Once this is complete, the agents wait for the launcher agent to read in a problem from file. The launcher will then send the problem to each of the metaheuristic agents. Only when the metaheuristic agents receive that problem from the launcher do they embark on a search.

Figure 2 shows the edge selection protocol used by the metaheuristic agents. One complete execution of the algorithm illustrated is a conversation. In any conversation, there will be an agent that takes on the role of initiator while the others are responders. In the very first conversation agent1 will always take on the role of initiator. Thereafter, any agent can be the initiator, but it is determined in the previous conversation which agent will be the initiator for the current conversation (see below). In Figure 2 an agent taking on the role of initiator starts a conversation.

At the start of a conversation, each agent takes a list of Edge objects either generated from a previous conversation or generated by the launcher agent (see I1 and R1 in Figure 2).

The agents then find new incumbent solutions using their given heuristics in conjunction with the edges provided in the previous step (see I2 and R2 in Figure 2).

The initiator breaks its incumbent solution into edges and then invites the responder agents to do the same and send them to the initiator (I3 and R3 of Figure 2).

The receiving agents also send the value of their best-so-far solution. This will be used by the initiator to determine which agent will be the new initiator in the next conversation (see I4 in Figure 2).

In I4, the initiator receives the Edge objects from the responding agents and collects them together. Each Edge object is scored and ranked based on frequency. This can be seen in box I4 of Figure 2 as the function getScore. In I4 of Figure 2, through the function getInitiator, the initiator also determines which metaheuristic agent is going to be the initiator in the next conversation. This is achieved by choosing the agent with the best objective function value to be the initiator.

The initiator then sends the good Edge objects found during this conversation to the receiving metaheuristic agents.

Each agent keeps a pool, or short-term memory, of high-scoring Edge objects. The pool acts as a sort of queue and its length is set when the agent is launched. In this study all the agents have a pool size of 20% of the length of the instance currently being optimised. During the first conversation each agent populates its pool as good edges are identified. Once the pool is up to size, it is maintained as a queue as described in Section 3.3.
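Two of the pieces named above, getInitiator and the pool sizing, can be sketched as follows; this is our rendering of the description, not the framework's code (both problems are minimisation, so "best" means lowest objective value):

```python
def get_initiator(best_values):
    """best_values: {agent_name: best-so-far objective value}. The agent
    with the best (lowest) value initiates the next conversation."""
    return min(best_values, key=best_values.get)

def pool_capacity(instance_length):
    """Pool size is 20% of the length of the instance being optimised."""
    return max(1, int(0.2 * instance_length))
```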

The other metaheuristic agents receive the lists of Edge objects from the initiator (see box R4 in Figure 2). They also update their internal memories, or pools, as described above. In boxes I5 and R5 of Figure …

In this section we discuss the experimental design. … which will produce different edges that will enhance the performance of the distributed edge selection algorithm.

In both case studies each metaheuristic is allowed to run for 12 seconds each time it is called.

The main hypothesis to be tested in these experiments is that cooperating agents produce better results than their stand-alone equivalents. The results are also compared with state-of-the-art results for each of these benchmarks. To this end, for each instance of the tests the following scenarios were run:

The CVRP tests were conducted as follows, with α-values selected on … The PFSP tests were conducted similarly, but without the need for α-values.

They are tested in this way so that stand-alone agents running just one metaheuristic at a time can be compared statistically with groups of cooperating agents, in order to test the main hypothesis.

Every instance is tested 20 times. The resulting values are then used to evaluate the performance of the test. In particular, the average and minimum values of the 20 runs for each problem are taken. These are compared with the known optimal or best values for each problem instance.

To test the hypothesis that agents cooperating by edge selection perform better than stand-alone agents, Wilcoxon signed-rank tests are conducted for each benchmark instance at a 95% confidence level. We used the Wilcoxon test rather than the t-test because we cannot guarantee that the test … where both approaches consistently achieve the same value.
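For reference, the Wilcoxon signed-rank statistic used in these comparisons can be computed as below. This is a textbook sketch, not the paper's analysis code: zero differences are dropped and tied absolute differences share their average rank, and a statistics library would normally supply the p-value on top of the rank sums.

```python
def wilcoxon_w(xs, ys):
    """Return (W_plus, W_minus): the rank sums of positive and negative
    differences between the paired samples xs and ys."""
    diffs = [x - y for x, y in zip(xs, ys) if x != y]   # drop zero differences
    order = sorted(range(len(diffs)), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # group tied absolute differences together
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1                            # average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus
```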

The results for each problem are averaged and the average percentage … The results are also analysed to find the best result of each group of agents over the 20 runs of each problem instance.
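The averaging used in such comparisons is typically the average percentage deviation of the runs from the known optimal or best value; the exact formula is not spelled out in the text, so the following is our assumption:

```python
def avg_pct_deviation(run_values, best_known):
    """Mean of 100 * (v - best_known) / best_known over the runs."""
    return sum(100.0 * (v - best_known) / best_known
               for v in run_values) / len(run_values)

def best_of_runs(run_values):
    """The best (minimum) value found over the runs of one instance."""
    return min(run_values)
```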

These are then passed to a metaheuristic to build a new putative solution to the problem. Given this, it is interesting to study the patterns (edges) identified by each agent and compare them to the final solution found by the …

These are all the unique edges identified during this search. These edges are identified multiple times, but the table only shows them once. Indeed, some edges (highlighted in bold) identified by the system do end …

12, and 16 agents cooperating against a stand-alone agent. As before, we tested for statistical significance using the Wilcoxon signed-rank test at the 95% confidence level. Table 6 lists these results using the same notation as used in …

In Table 7 we report the results of our tests for the secondary hypothesis.

As with the PFSP results, we can see a gradual improvement as more agents are added. But again it seems it is necessary to double the number of agents …

… best results for these instances. We were also able to show, for groups of 8, 12 and 16 agents compared with the stand-alone equivalent, that cooperation by pattern finding is better than no cooperation. Finally, we are also able to show that doubling the number of agents each time leads to improving results, as shown in Figure 4.

In this study we propose a general agent-based distributed framework where each agent implements a different metaheuristic/local search combination. An agent continuously adapts itself during the search process using a cooperation protocol based on reinforcement learning and pattern finding.

Good patterns that make up improving solutions are identified by frequency of occurrence in a conversation and shared with the other agents. …

The distributed computing framework presented can be run on a local network of personal computers, each using 2GB of memory.

The framework also aims to be generic and modular, needing very little parameter tuning across the different problem types tested so far. It has been applied successfully to PFSP and CVRP. It has also been used to model