Deadlock-Detection via Reinforcement Learning

Minimizing makespan is a central research topic in scheduling, and deadlock detection and prevention is one of its fundamental issues. Building on what was learned in this class, we develop a reinforcement learning (RL) approach to this optimization problem. We evaluate the RL model on forty classical buffer-less benchmarks, compare it with alternative algorithms, and obtain near-optimal results.
*Corresponding author: Mengmeng Chen, Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, USA, Tel: 407 823-2204; E-mail: CHENMM@Knights.ucf.edu
Received May 02, 2017; Accepted June 01, 2017; Published June 16, 2017
Citation: Chen M, Rabelo L (2017) Deadlock-Detection via Reinforcement Learning. Ind Eng Manage 6: 215. doi:10.4172/2169-0316.1000215
Copyright: © 2017 Chen M, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Introduction
In buffer-less settings, deadlock (DL) occurs frequently in resource-sharing environments and concurrent computing systems. A deadlock is a state in which each member of a group of actions is waiting for some other member to release a lock [1]. Once a DL state occurs, the workflow is stuck in a fixed loop and never completes. Figure 1 presents a typical scheduling problem: 3 jobs must be processed on 3 different machines in different sequences, and each machine can process only one job at a time. Sequencing the jobs to minimize the total makespan (the total processing time) without deadlock is a typical optimization problem. Because resources are limited, deadlock happens frequently; finding a feasible solution is already hard, and finding the globally optimal deadlock-free solution is harder still.
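To make the makespan objective concrete, the following Python sketch computes the makespan of a given operation order for a small job shop. The instance data, the `JOBS` layout and the `makespan` helper are hypothetical illustrations and are not taken from Figure 1. For simplicity the sketch ignores blocking, so it shows only the objective and not the buffer-less constraint discussed in this paper.

```python
# Hypothetical 3-job, 3-machine instance (not the one in Figure 1).
# Each job is a list of (machine, processing_time) pairs in required order.
JOBS = [
    [(0, 3), (1, 2), (2, 2)],
    [(1, 2), (2, 1), (0, 4)],
    [(2, 4), (0, 3), (1, 1)],
]


def makespan(jobs, order):
    """Makespan of the schedule built from `order`.

    `order` is a list of job indices; each occurrence of a job index
    schedules that job's next unscheduled operation as early as possible.
    Blocking (the buffer-less constraint) is deliberately ignored here.
    """
    next_op = [0] * len(jobs)        # next operation index per job
    job_ready = [0] * len(jobs)      # time at which each job is free
    n_machines = 1 + max(m for job in jobs for m, _ in job)
    machine_ready = [0] * n_machines  # time at which each machine is free
    for j in order:
        machine, duration = jobs[j][next_op[j]]
        start = max(job_ready[j], machine_ready[machine])
        finish = start + duration
        job_ready[j] = finish
        machine_ready[machine] = finish
        next_op[j] += 1
    return max(job_ready)


interleaved = makespan(JOBS, [0, 1, 2] * 3)
one_by_one = makespan(JOBS, [0, 0, 0, 1, 1, 1, 2, 2, 2])
print(interleaved, one_by_one)
```

On this toy instance the interleaved order finishes earlier than running the jobs one after another, which is the kind of gain a scheduler looks for.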
There are several ways to handle deadlock: (1) do nothing, (2) kill the workflow, and (3) preempt and roll back. Compared with killing the workflow, deadlock detection algorithms are more efficient in most cases; in addition, a deadlock-free scheduler enables real-time control of engineering systems. Preventing or avoiding deadlock helps keep system performance, measured by makespan, at an acceptable level.
A simple head-tail scheduling example is presented in Figure 2. Each rectangle stands for a resource Ri and each symbol Pi for a job; an empty rectangle means that no job is being processed on that resource. Arrows stand for the action of each job. Second-level DL has been addressed in previous articles. Figure 3 presents a deadlock. Job P3 is to move to resource 3 and then to resource 1, but once it moves the system is deadlocked because P1 and P2 can no longer move. If P2 moves to resource 3 first, then P2 and P3 are deadlocked, since each is heading to the resource the other occupies and there is no buffer.
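The circular wait described above can be checked mechanically. The sketch below is a minimal illustration, assuming that every job occupies exactly one resource and requests at most one next resource; the `holds` and `wants` dictionaries and the example states are hypothetical and do not reproduce Figures 2 or 3.

```python
def find_deadlock(holds, wants):
    """Return the jobs forming a circular wait, or [] if there is none.

    holds: job -> resource it currently occupies.
    wants: job -> resource it needs next (missing or None if it needs none).
    With no buffers, a job waits for whichever job occupies its next
    resource, so each job has at most one outgoing wait-for edge.
    """
    occupant = {resource: job for job, resource in holds.items()}
    for start in holds:
        path = []
        job = start
        # Follow the wait-for chain until it ends or revisits a job.
        while job is not None and job not in path:
            path.append(job)
            job = occupant.get(wants.get(job))
        if job is not None:
            return path[path.index(job):]
    return []


# Two jobs heading for each other's resource: a head-on deadlock.
print(find_deadlock({"P2": "R2", "P3": "R3"}, {"P2": "R3", "P3": "R2"}))
# One job heads for a free resource, so the chain ends without a cycle.
print(find_deadlock({"P1": "R1", "P2": "R2"}, {"P1": "R2", "P2": "R3"}))
```

Because each job has at most one outgoing edge, following the chain from every job is enough; no general graph search is needed in this simplified setting.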
In this paper we present a new reinforcement learning approach to this scheduling optimization problem. We test our algorithm on classical buffer-less benchmarks and compare the results with the optimal solutions.

Related Work
Local and global deadlock detection in component-based systems is NP-hard [2]. Wysk et al. [3] developed a deadlock detection method based on integer programming to find the optimum makespan. Specifically, they added constraints ensuring that an agent does not release a resource until it has been assigned the next one. However, integer programming can only be used on small problems, as it takes a very long time to run. Their integer programming formulation is shown below.
Formulation:
In view of the modeling frameworks in the existing literature, the three strategies for handling DL and the corresponding research work are as follows:
• Deadlock Prevention organizes resource usage by each process to ensure that at least one process is always able to get all the resources it needs [4]. Mixed integer programming [5] and region theory [6] are used to solve for elementary siphons [7-9] to avoid deadlock. The Banker's algorithm, originally due to Dijkstra, is discussed in [6,10]. However, these algorithms have many limitations: they need a fixed number of processes, no further process can be started during execution, and they need a fixed amount of resources.

• Deadlock Avoidance restricts resource allocation, based on the current system state and the agents' future resource requests, so that deadlock is avoided. Petri net models [3,11-14] are well developed in this area. Digraph and automata models [6,9,12,15,16] have also been built to handle avoidance problems. However, current avoidance algorithms are not able to handle high-level DL.

• Deadlock Detection and Recovery focuses on quickly resolving a deadlock once it is detected. This can be done in a couple of ways [6,14], such as aborting certain actions or adding buffers. Still, detecting deadlock and scheduling a deadlock-free path may be more convenient.
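As a point of reference for the Banker's algorithm mentioned under deadlock prevention, the sketch below shows its standard safety check in Python. The function name `is_safe` and the example numbers are illustrative assumptions and are not taken from the cited works. The check also makes the stated limitations visible: the process count and the resource totals are fixed inputs.

```python
def is_safe(available, allocation, need):
    """Banker's safety check.

    available: free units per resource type.
    allocation[p]: units of each type currently held by process p.
    need[p]: units of each type process p may still request.
    Returns True if some order lets every process run to completion.
    """
    work = list(available)
    finished = [False] * len(allocation)
    progressed = True
    while progressed:
        progressed = False
        for p, (held, required) in enumerate(zip(allocation, need)):
            if not finished[p] and all(r <= w for r, w in zip(required, work)):
                # Process p can finish and hand back everything it holds.
                work = [w + h for w, h in zip(work, held)]
                finished[p] = True
                progressed = True
    return all(finished)


# Illustrative state with five processes and three resource types.
allocation = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
need = [[7, 4, 3], [1, 2, 2], [6, 0, 0], [0, 1, 1], [4, 3, 1]]
print(is_safe([3, 3, 2], allocation, need))
# Two processes each holding what the other needs, with nothing free.
print(is_safe([0, 0], [[1, 0], [0, 1]], [[0, 1], [1, 0]]))
```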
Considerable artificial intelligence and operations research effort has been applied to scheduling problems. Zhang and Dietterich [17] were the first to apply reinforcement learning here. Mahadevan et al. [18] proposed a reinforcement learning algorithm that combines different scheduling problems for the optimization of transfer lines in manufacturing systems. Another maintenance-based approach built on simplified reinforcement learning was suggested by Zeng and Sycara [19].

Research Methodology
We first define deadlock at different levels. A first-level DL is a set of agents in which each agent requests a resource held by another agent in the set. A second-level DL is a set of agents for which any action results in a first-level DL. A high-level DL is a set of agents for which any action results in a second-level DL. Figure 4 illustrates how a high-level DL arises.
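One way to read these level definitions is as a bounded look-ahead over moves. The Python sketch below is an illustration under simplifying assumptions and is not the ranking-matrix method of this paper: every job is already in the system, `routes[j]` is the ordered list of resources job `j` must visit, `pos[j]` is its current index in that list, and a job on its last resource can always leave. All names are hypothetical.

```python
def circular_wait(routes, pos):
    """True if some jobs wait on each other in a cycle (first-level DL)."""
    occupant = {routes[j][p]: j for j, p in enumerate(pos) if p < len(routes[j])}

    def blocker(j):
        nxt = pos[j] + 1
        if nxt >= len(routes[j]):
            return None  # job j is on its last resource and can leave
        return occupant.get(routes[j][nxt])

    for start in occupant.values():
        seen = []
        j = start
        while j is not None and j not in seen:
            seen.append(j)
            j = blocker(j)
        if j is not None:
            return True
    return False


def successors(routes, pos):
    """All states reachable by moving one job to its next (free) resource."""
    occupied = {routes[j][p] for j, p in enumerate(pos) if p < len(routes[j])}
    for j, p in enumerate(pos):
        if p >= len(routes[j]):
            continue  # job j has already left the system
        nxt = p + 1
        if nxt == len(routes[j]) or routes[j][nxt] not in occupied:
            yield pos[:j] + (nxt,) + pos[j + 1:]


def dl_level(routes, pos, max_level=5):
    """Deadlock level of a state, or 0 if none is found within max_level."""
    if circular_wait(routes, pos):
        return 1
    if max_level <= 1:
        return 0
    levels = [dl_level(routes, nxt, max_level - 1) for nxt in successors(routes, pos)]
    if not levels or 0 in levels:
        return 0  # no moves left to make, or at least one move escapes
    return 1 + max(levels)


# Two jobs approaching each other through a single free resource r1.
head_on = (("r0", "r1", "r2"), ("r2", "r1", "r0"))
print(dl_level(head_on, (0, 0)))
```

In the example no job is blocked yet, but either possible move creates a circular wait, so the state is a second-level DL under this reading. A return value of 0 means only that no deadlock was found within the look-ahead bound.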
We use a ranking matrix to formulate this action system: $S = [s_{ij}]_{M \times N}$ denotes the state matrix of the system, with $i \in I = \{1, 2, \dots, M\}$ and $j \in J = \{1, 2, \dots, N\}$.

Proposition 1
For agents/jobs $A = \{a_i,\ i = 1, \dots, M_A\}$ in the detection system, we have that,
If we apply Proposition 2 to the previous example in ranking-matrix form, the state matrix will be:
As mentioned above, we consider all collaborative action teams that seek to optimize a global reward, and we assume that our reinforcement learning approach can model the corresponding multi-action stochastic system and provide a search algorithm. Therefore, there is at least one action sequence that maximizes the expected return over all movements [20-26].

Definition 1
Let $S_i \subseteq S$ be the system state $i$, where $s_i = \{A(\pi_1), A(\pi_2), \dots, A(\pi_i)\}$, $\{A_i\}$ denotes the actions to be executed, and $\pi_i$ denotes the policy of each action. The action $A_i$ at state $i$ and the reward value $R(i)$ are represented by:
Here $P(\pi)$ represents the performance of policy $\pi$, and $R(\cdot)$ represents the local action reward parameter.

Definition 2
Under the set $S_o \subseteq S$, the policy:
is defined as the expected local reward $R(\cdot)$, with the discount factor $\gamma$ set to 1.
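With the discount factor set to 1, the return of an episode is simply the sum of its local rewards. The Python sketch below shows this in the standard reinforcement learning form and estimates a policy's performance as the average return over sampled episodes. It is a generic illustration, not a reconstruction of the definitions above; the function names and the reward values are hypothetical.

```python
def discounted_return(rewards, gamma=1.0):
    """Sum of gamma**t * rewards[t]; with gamma = 1 this is a plain sum."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_performance(episodes, gamma=1.0):
    """Average return over sampled episodes (each a list of local rewards)."""
    returns = [discounted_return(episode, gamma) for episode in episodes]
    return sum(returns) / len(returns)


# Hypothetical local rewards collected along two episodes.
print(discounted_return([1, 2, 3]))
print(discounted_return([1, 2, 3], gamma=0.5))
print(estimate_performance([[1, 2, 3], [0, 4]]))
```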

Evaluation
We test on 40 classical scheduling benchmark problems. These problems are designed with complex structure to increase their difficulty, and in a buffer-less system finding a schedule is harder still. A Gantt chart can be drawn from the DL-free timesheet obtained for each benchmark [27-31].
Figures 8 and 9 present benchmark LA08 (15 × 5) and benchmark LA16 (10 × 10). We test the performance of our algorithm on these 40 benchmarks, recording the number of backtracking steps, and we also compare the running time with and without DL detection. The algorithm is written in Matlab, and the workflow of our RL algorithm is shown in Figure 10.
The results of testing our algorithm on the 40 benchmarks are given in Table 1. From the results, we found that:
(1) All 40 benchmarks can be solved by the algorithm within an acceptable time frame.
(2) Our results are very close to the optimal solutions. This shows that our policy-based RL approach is effective in reducing time and cost.
(3) Because of the system difficulty, the number of backtracking steps increases as the system becomes larger. It is 0 for the first 15 benchmarks and increases as the number of states increases. However, for all benchmark problems the number of backtracking steps stays low.
(4) Once a DL event occurs, our scheduling algorithm can rearrange and generate a new DL-free timesheet within 1 second. Therefore, we expect that our DL-free algorithm can be applied to other systems with a similar structure. Additionally, with more computing power the algorithm becomes a qualified tool for real-time operation systems (Table 1).
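To illustrate what a backtracking count measures, the sketch below runs a plain depth-first search for a DL-free move sequence in a simplified buffer-less model. It prunes any move that creates a circular wait and counts how often it must back out of a dead end. This is not the RL algorithm evaluated above (which is written in Matlab); the model, the function names and the two-job instance are hypothetical.

```python
def has_circular_wait(routes, pos):
    """True if some jobs wait on each other in a cycle."""
    occupant = {routes[j][p]: j for j, p in enumerate(pos) if p < len(routes[j])}

    def blocker(j):
        nxt = pos[j] + 1
        if nxt >= len(routes[j]):
            return None  # job j is on its last resource and can leave
        return occupant.get(routes[j][nxt])

    for start in occupant.values():
        seen = []
        j = start
        while j is not None and j not in seen:
            seen.append(j)
            j = blocker(j)
        if j is not None:
            return True
    return False


def dl_free_sequence(routes, start):
    """Return (moves, backtracks); moves is None if no DL-free sequence exists.

    routes[j] is the ordered list of resources job j visits, and
    start[j] is its current index in that list.
    """
    backtracks = 0

    def search(pos):
        nonlocal backtracks
        if all(p == len(r) for p, r in zip(pos, routes)):
            return []  # every job has left the system
        occupied = {routes[j][p] for j, p in enumerate(pos) if p < len(routes[j])}
        for j, p in enumerate(pos):
            if p == len(routes[j]):
                continue  # job j is already finished
            nxt = p + 1
            if nxt < len(routes[j]) and routes[j][nxt] in occupied:
                continue  # next resource is busy and there is no buffer
            new_pos = pos[:j] + (nxt,) + pos[j + 1:]
            if has_circular_wait(routes, new_pos):
                continue  # pruned by first-level DL detection
            rest = search(new_pos)
            if rest is not None:
                return [j] + rest
            backtracks += 1  # new_pos was a dead end despite the check
        return None

    return search(tuple(start)), backtracks


# Two jobs sharing resources r1 and r2 in opposite directions.
routes = [["r0", "r1", "r2", "r3"], ["r3", "r2", "r1", "r4"]]
moves, backtracks = dl_free_sequence(routes, (0, 0))
print(moves, backtracks)
```

In this instance the search first advances job 0 into a state from which every move creates a circular wait, so it backtracks once before finding a sequence in which job 1 clears the shared resources first. Detecting such higher-level states before entering them is what keeps the backtracking count low.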

Conclusion
Based on the ranking matrix, graph model and reinforcement learning, we propose a new DL detection algorithm and use it to analyze the general pattern of the high-level DL detection problem in discrete systems on the forty classical benchmark problems. Because of the heavy computation, some instances may take a very long time, but this should ease as computing speed continues to increase.
The algorithm was developed for a buffer-less environment, which is much more difficult than typical real-world settings. Therefore, we believe that it can be extended to other resource-sharing systems.
Building on this DL detection algorithm and relaxing certain constraints, new limited-buffer DL detection algorithms can be developed and widely applied in mechanical systems and parallel computing systems, which we see as a promising direction for future work.