Strategically Patrolling in a Chemical Cluster Addressing Gas Pollutants’ Releases through a Game-Theoretic Model

Chemical production activities in chemical clusters, if not well managed, pose great threats to the surrounding air environment and impose a great burden on emergency handling. It is therefore urgent for an inspection agency in a chemical cluster to develop suitable pollution control strategies for monitoring chemical production processes. Apart from static monitoring resources (e.g., monitoring stations and gas sensor modules), patrols by mobile vehicle resources are arranged to better detect illegal releases from emission spots in different chemical plants. However, the commonly used patrolling strategies (i.e., the fixed route strategy and the purely randomized route strategy) have been shown to be non-optimal, and they fail to account for the interaction with intelligent chemical plants. Therefore, in this paper we propose the Chemical Cluster Environmental Protection Patrolling (CCEPP) game to tackle the problem. Combined with a source estimation process, the game is modeled to detect illegal releases by chemical plants by randomly and strategically arranging the patrolling routes and intensities across chemical sites. In this game-theoretic model, the players (patroller and chemical sites), strategies, payoffs, and game solvers are modeled in sequence. More importantly, the game model also considers traffic delays and the patrollers' bounded cognition of patrolling plans; a discrete Markov decision process is used to model this stochastic process. The model is illustrated by a case study. The results imply that the patrolling strategy suggested by the CCEPP game outperforms both the fixed route strategy and the purely randomized route strategy.


Introduction
The so-called chemical clusters are formed due to economies of scale, environmental factors, and other collaboration benefits (e.g., social motives and legal requirements) [1]. However, within such clusters, the situation is that the air pollutants produced in the process of chemical production are often illegally released to the surrounding environments instead of being purified and treated [2]. In extreme situations, the accidental releases caused by spontaneous or anthropogenic activities can exert dramatic implications [3]. Through transportation and dispersion (e.g., atmospheric flow), humans and the environment can be undesirably exposed to these dangerous gas emissions and eventually suffer from the harmful or fatal effects. Therefore, in order to avoid the release of gas substances of high toxicity and ensure the protection of environment and human health, air quality monitoring and gas emission detection have to be performed regularly [4]. At present, a core issue of concern to those who manage the chemical cluster is the effective prevention and mitigation of impacts caused by risk accidents and the implementation of effective management that can ensure safe production and social stability [5].
To this end, Qiu et al. [6] proposed a method that combines a drone-based monitoring platform with a source estimation method to estimate contaminant sources in a chemical cluster. However, that work falls short of providing practical and efficient patrolling routes for mobile inspection resources. Fortunately, according to Tambe et al. [7], game theory offers a sound mathematical approach for modeling the deployment of limited resources in situations involving multiple stakeholders. In particular, one application of game theory in the security domain is the scheduling of patrols (i.e., the travelling of mobile resources to different locations at different intervals) [8], and this kind of patrol has also been introduced in other domains.
Security games have been successfully deployed in many areas (airports [9], ports [10], and trains [11]) to protect infrastructure by randomizing patrolling and monitoring schedules. A game-theoretic concept was also deployed by Aguirre et al. [12] to define a multi-agent patrolling strategy on a national border. Similar works include Basilico et al. [13] and Gatti [14]. The zero-sum graph patrolling games defined by Alpern et al. [15,16] and Papadaki et al. [17] can be solved exactly on some special graphs, such as the line graph. However, the zero-sum assumption is unreasonable in many situations, and in the line graph patrolling game, a quantitative risk assessment of the line was absent [18]. To solve this problem, the Pipeline Patrolling Game (PPG) was subsequently proposed by Amirali et al. [19] in the form of a Bayesian Stackelberg game based on security risk assessment. Beyond the security domain, patrol scheduling has also emerged in the environmental protection domain in recent years. Green Security Games (GSGs) are typical representatives, whose applications mainly focus on scheduling patrols to protect forests, fish, wildlife, etc. [20]. However, few works have recognized the importance of scheduling patrols in a chemical cluster to intelligently detect spontaneous or anthropogenic industrial emission activities. Different from the above-mentioned game-theoretic applications in the security domain [7,21] and in the environmental domain [22][23][24][25], Zhu et al. [26,27] proposed the Chemical Plant Environmental Protection (CPEP) game and the extended CPEP game in succession, which were the first works to optimize audits and detections of illegal releases of atmospheric contaminants in chemical clusters. 
In these models, the game-theoretic model in conjunction with source estimation methods was utilized to better schedule the static inspection resources (i.e., high-accuracy monitoring stations and gas sensor modules) for detecting the irregularities of chemical plants. Besides the static monitoring resources, mobile inspection resources are highly recommended to monitor the chemical production process for their flexibility and mobility. In this paper, the Stackelberg game is therefore applied for scheduling the patrol of mobile inspection resources in a chemical cluster. For analyzing the patrolling in a chemical cluster, the patrolling object is modeled as a graph, in which the nodes are different chemical sites in the cluster. The game-theoretic model involves intelligent interactions between patrol teams (i.e., the defender) and chemical plants (i.e., many attackers). To model this interaction precisely and practically, several challenges and uncertainties remain to be solved. (i) It is unavoidable to face the challenge of large state space to represent strategies for the players since the game takes place on a road network. (ii) The patroller would travel in the graph and stay inside some nodes for a certain period of time and implement inspections to detect illegal releases of chemical plants through source estimation methods. However, the strategy generated by our model cannot guarantee the 100% success rate of catching the violations of chemical plants. (iii) Due to traffic delays or some other cognitive reasons, the patroller may not follow the patrolling schedules precisely. Thus, the present paper proposes a Chemical Cluster Environmental Protection Patrolling (CCEPP) game, answering the question of how to optimally randomize patrolling in a chemical cluster and deal with the aforementioned challenges and uncertainties. 
In this way, the proposed method not only facilitates the decision-making process of a patrolling route for the patroller team, but also addresses the atmospheric pollutants controlling problem, as well as reduces the risk of accidental gaseous pollutant leaks.
The remainder of the paper is organized as follows: Section 2 deals with the road network modeling problem, the CCEPP game and its corresponding game solver are proposed in Section 3, Section 4 involves an illustrative case study and experimental results, and conclusions and future directions are drawn in Section 5.

Road Network Modeling
A typical patrolling scenario can be described as follows: Patrol teams drive the inspection vehicles randomly, either patrolling inside a chemical site or travelling on the public road to another chemical site. While patrolling inside a chemical site, the team uses its monitoring facilities to collect atmospheric data for source estimation. If the patrol team is patrolling inside a chemical site while some of the emission sources in this site are releasing excessive atmospheric contaminants, then, in addition to being captured by the monitoring data from static inspection resources (i.e., monitoring stations and gas sensor modules), these emission sources have a certain probability of being detected by the patrol team. After staying for a period of time, the patrol team travels to another site via a connecting road. To depict such a patrolling process, which evolves in both space and time, the road network has to be modeled first.

Graphic Modeling
Several criteria are used to determine the nodes and edges of a graph based on the practical road network: (i) If a chemical site has only one chemical plant, it usually has one vehicle entrance; (ii) if a chemical site has several chemical plants and each chemical plant has several emission sources, we assume that each chemical plant in this site has a vehicle entrance, and that all of these entrances are fully connected; (iii) the vehicle entrances are usually located beside a public road; (iv) if two entrances belonging to different sites cannot be connected through a straight segment, a crossroad has to be added. Therefore, we model the road network as a graph G(V, E), where V denotes the set of nodes of the graph (i.e., the vehicle entrances of each site and the crossroads), and E is the set of edges of the graph (i.e., the roads between different nodes).
For the sake of illustration, an example of a small part of the Shanghai chemical cluster is given in Figure 1. There are six chemical sites in this picture, indexed as site 'A', site 'B', and so forth. Chemical sites 'A' and 'D' have two vehicle entrances, while the other chemical sites have only one entrance. Moreover, three crossroads are added to link sites that cannot be connected through a straight segment. Black dotted lines in this figure show the actual traffic roads on which the patroller drives. Based on the above-mentioned criteria, the graphic model of the cluster shown in Figure 1 is displayed in Figure 2. The nodes in Figure 2 are the sites' entrances (i.e., 'A1', 'A2', 'B1', 'C1', 'D1', 'D2', 'E1', 'F1') and the crossroads (i.e., 'Cr1', 'Cr2', 'Cr3'). Further, edges 'e1' to 'e12' are constructed to reflect the vehicle roads based on the actual connection relationships of these nodes.
Based on the graphic model, we illustrate the graphic patrolling problem of mobile vehicles as follows: (i) A patroller (or several patrol teams) starts her patrolling from a node (i.e., the dummy source node); (ii) she moves along the nodes and edges of the graph; (iii) when arriving at a node (a chemical site), she decides whether to stay at the node for a specific period of time $t_i^p$ (to inspect inside the site), or to move on to another site immediately, which takes a period of time $t_e^m$, without patrolling the current site; (iv) after the maximum travelling time budget $T$ of a patrol is expended, the patroller terminates the patrolling and goes back to the dummy source node.
With the definitions of the inspecting time $t_i^p$ and the travelling time $t_e^m$, we can define a superior connection matrix $sC$ of graph G. Based on the superior connection matrix, an algorithm is proposed in Section 2.3 to construct a transition graph. An entry $sC(i, j)$ of the matrix represents the time cost of moving from node i to node j of graph G. There are two possible scenarios regarding the relationship between nodes i and j: (i) The two nodes belong to different chemical sites, or at least one of them is a crossroad node. In this case, $sC(i, j)$ is equal to the travelling time $t_e^m$ that the patroller needs to move from node i to node j; or (ii) the two nodes are different entrances of the same chemical site, or are the same node. In this situation, $sC(i, j)$ is equal to the inspection time $t_i^p$ of that site. Table 1 shows an example of this matrix with practical values. For instance, $t_1^m$ is the driving time from node 'A1' to 'B1' and $t_A^p$ denotes the time needed to inspect inside chemical site 'A'. All time-related data are given in minutes.
Table 1. Superior connection matrix for Figure 2 with the practical numbers.
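The two cases that define the superior connection matrix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node names, site labels, and time values are hypothetical (they are not the values of Table 1), the function name `superior_connection_matrix` is ours, and marking unconnected pairs and crossroad self-entries with infinity is an assumption borrowed from the condition sC(cn, nd) < ∞ used in Table 2.

```python
import math

def superior_connection_matrix(nodes, site_of, travel_time, inspect_time):
    """Build sC as a dict of dicts; math.inf marks pairs with no direct move."""
    sC = {i: {j: math.inf for j in nodes} for i in nodes}
    for i in nodes:
        for j in nodes:
            same_site = site_of[i] is not None and site_of[i] == site_of[j]
            if same_site:
                # Case (ii): same node, or two entrances of one site -> inspection time.
                sC[i][j] = inspect_time[site_of[i]]
            elif (i, j) in travel_time:
                # Case (i): different sites or a crossroad -> travelling time.
                sC[i][j] = travel_time[(i, j)]
            elif (j, i) in travel_time:
                # Roads are assumed to be two-way with equal travelling time.
                sC[i][j] = travel_time[(j, i)]
    return sC

# Hypothetical toy network: site 'A' with two entrances, site 'B', one crossroad.
nodes = ["A1", "A2", "B1", "Cr1"]
site_of = {"A1": "A", "A2": "A", "B1": "B", "Cr1": None}  # None marks a crossroad
travel_time = {("A1", "B1"): 3, ("A2", "Cr1"): 2, ("Cr1", "B1"): 1}  # minutes
inspect_time = {"A": 5, "B": 4}  # minutes

sC = superior_connection_matrix(nodes, site_of, travel_time, inspect_time)
```

With this representation, a row `sC[i]` lists every node reachable from node i in one move together with its time cost, which is the lookup needed when the transition graph is constructed.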

Time Discretization
To simplify the transition graph model, a discretization of time is necessary. We discretize time with an equal granularity of h minutes (the length of the time slice is determined by practical conditions; it is set to one minute in this study, but it could also be one second or one hour). In this way, a time vertex is added every h minutes until the maximum inspection time budget T has been expended. Moreover, since the time dimension of the transition graph is continuous in reality, the patroller's travelling time, the patroller's patrolling time, and the attacker's attack period are not necessarily integer multiples of the time slice. Thus, these time-related parameters are rounded to the closest integer number of time slices in order to discretize the time dimension. Consequently, any action of the patroller or the attacker happens at a time vertex, i.e., at the beginning of a time slice.
Although the time dimension of the transition graph is continuous in reality, such a discretization mentioned above is also reasonable and feasible. For one thing, the attacker's actions can be enumerated by discretizing the time, as seen in Section 3.1. For another thing, if the length of a time slice is short enough, the discretization model can also describe the reality well.
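The rounding step described above can be illustrated with a short Python sketch. The helper names `to_time_slices` and `time_vertices` are hypothetical, and rounding exact halves upward is our assumption, since the text only asks for the closest integer number of time slices.

```python
import math

def to_time_slices(duration_minutes, h=1.0):
    """Round a continuous duration to the closest whole number of h-minute slices."""
    if h <= 0:
        raise ValueError("time slice length h must be positive")
    # floor(x + 0.5) rounds halves upward, avoiding Python's round-half-to-even.
    return math.floor(duration_minutes / h + 0.5)

def time_vertices(T, h=1.0):
    """Time vertices 0, h, 2h, ... up to the inspection time budget T (minutes)."""
    return [k * h for k in range(to_time_slices(T, h) + 1)]
```

For example, with h = 1 a driving time of 7.4 min is represented by 7 time slices, and with h = 10 a budget of T = 30 yields the time vertices 0, 10, 20, and 30.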

Transition Graph Modeling
A transition graph tG(tV, tE) is defined based on the graphic model of a chemical cluster and the time discretization. A node (or state) in tG is denoted by a tuple (t, i), wherein t ∈ [0, T) represents the current time step and i ∈ {1, 2, ..., |V|} denotes the node of graph G(V, E) at which the patroller is located. If the patroller chooses an action at site $i_1$ at time $t_1$ and thereby moves to site $i_2$ at time $t_2$, a directed edge of tG connects the two nodes $(t_1, i_1)$ and $(t_2, i_2)$.
In the transition graph, we need only enumerate a polynomial number of nodes (or states) and edges instead of an exponential number of pure strategies. The goal of this paper is to compute the optimal probability flow (i.e., the marginal coverage vector) and to sample from this vector to create inspection schedules for the inspection agency. However, due to traffic delays or cognitive reasons, the patroller may not follow the patrolling schedules precisely, so this uncertainty has to be incorporated into the transition graph. Markov decision processes (MDPs) provide a mathematical paradigm for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. In our situation, a discrete MDP can model the discrete-time stochastic control process of the actions and states. To be specific, the process is in a state s at each time step, and the decision maker may choose any action a available in state s. At the next time step, the process moves randomly into a new state e with probability $P_a(s, e)$ and yields a corresponding reward $R_a(s, e)$. Since neither the discount factor γ nor the immediate reward $R_a(s, e)$ is considered in this paper, the discrete MDP is represented by the 3-tuple $(S, A, P_a(s, e))$, whose components are defined as follows.
S is a finite set of states. Each state s ∈ S is a tuple (t, i). Each vertex tv ∈ tV in the transition graph corresponds to a state s, and thus S is equal to tV; A is a finite set of actions, which corresponds to the set of actions available from the current state s, i.e., the set of sites connected to site i; $P_a(s, e) = \Pr(s_{t+1} = e \mid s_t = s, a_t = a)$ represents the probability that action a taken in state s leads to state e.
Table 2 shows an iterative algorithm for generating the transition graph tG(tV, tE). The essence of this algorithm is to find all the connected nodes of each node in the graphic model. In this table, dis(dsn, nd) denotes the shortest distance in graph G from the dummy source node dsn to node nd. An example of the transition graph tG for the chemical cluster, using the travelling and inspecting times in Table 1, is shown in Figure 3 to further illustrate how the algorithm works. We further assume an inspection time budget of T = 30 and a dummy source node close to 'Cr2'.
Table 2. An algorithm of generating the transition graph.

Algorithm: Generating the Transition Graph
1. Construct an empty temporary node list tNL, an empty node list tV, and an empty edge set tE;
2. Construct node tv = (0, dsn), in which dsn is the patrolling source node in graph G;
3. Initialize tNL ← tv, tV ← tv;
4. While tNL is not empty, do:
  4.1. Get the first node in tNL, denoted as the current node cv = (ct, cn);
  4.2. Construct the follow-up nodes of cv:
    4.2.1. In graph G, find all the connected nodes of cn, represented as ccn = {nd ∈ V | sC(cn, nd) < ∞};
    4.2.2. For each node nd that belongs to ccn, if ct + sC(cn, nd) ≤ T + dis(dsn, nd) holds, construct a new node nv = (ct + sC(cn, nd), nd) and a directed edge ne from cv to nv (the state transition should also be considered);
    4.2.3. Add edge ne to list tE;
    4.2.4. If nv is already in tV, continue; otherwise, insert nv into tNL and add nv to tV;
  4.3. Remove cv from tNL;
5. End.
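The algorithm of Table 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: sC is taken to be a dict of dicts with `math.inf` for unconnected pairs, dis(dsn, nd) is computed with Dijkstra's algorithm over the travel entries of sC (our choice of shortest-path method), all time costs are assumed to be positive integers of time slices, and the state transition probabilities mentioned in the algorithm are omitted.

```python
import heapq
import math
from collections import deque

def shortest_times(sC, source):
    """dis(source, nd): shortest time from source to every node (Dijkstra)."""
    dist = {n: math.inf for n in sC}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in sC[u].items():
            if v != u and w < math.inf and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def build_transition_graph(sC, dsn, T):
    """Return (tV, tE): the states (time, node) and directed edges of tG."""
    dis = shortest_times(sC, dsn)
    start = (0, dsn)
    tNL = deque([start])   # temporary node list, processed first-in first-out
    tV = {start}
    tE = set()
    while tNL:
        ct, cn = tNL.popleft()          # get and remove the current node cv
        for nd, cost in sC[cn].items():
            if cost == math.inf:
                continue                # nd is not connected to cn
            # Budget check, prolonged by the time the next team needs to reach nd.
            if ct + cost <= T + dis[nd]:
                nv = (ct + cost, nd)
                tE.add(((ct, cn), nv))
                if nv not in tV:
                    tV.add(nv)
                    tNL.append(nv)
    return tV, tE
```

Because every move consumes a positive amount of time, each edge leads to a strictly later state, so the resulting graph is acyclic and the loop terminates.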
In Figure 3, the x axis and the y axis denote the time dimension and the nodes of the graphic model, respectively. Therefore, a possible node of tG can be any coordinate in this figure. Node 1 (at the left-hand side of the figure) is (0, 'Cr2'), which means that at time 0, the patroller starts from her source node (i.e., 'Cr2'). Thereafter she has three choices: (i) To go to site 'C' with a driving time $t_6^m$ and reach node 2 (i.e., (2, 'C')); (ii) to go to site 'D' (more precisely, entrance 'D2') with a driving time $t_5^m$ and reach node 3 (i.e., (1, 'D2')); (iii) to go to site 'E' with a driving time $t_7^m$ and reach node 4 (i.e., (2, 'E')). Subsequently, at these new nodes (e.g., 2, 3, or 4), the patroller faces the same choice (i.e., to patrol the current chemical site or to move to an adjacent chemical site). Finally, when the total patrol time budget is expended, the patroller terminates the patrol and returns to her source node. In Figure 3, the indexes of some nodes are marked to clarify this patrolling problem.
In Figure 3, a fixed patrolling route can be described as a series of edges $(te_1, te_2, \ldots, te_{len})$ satisfying the following three conditions in the transition graph: (i) The in-degree of the start node of $te_1$ is 0; (ii) the out-degree of the end node of $te_{len}$ is 0; (iii) the edges $te_j$ and $te_{j+1}$ (j = 1, 2, ..., len − 1) are linked, which means that the end node of $te_j$ is the start node of $te_{j+1}$. For instance, the bold black line in Figure 3 refers to a fixed patrolling route, which ends with the patroller going back to 'Cr2'.
A purely randomized patrolling route can be defined as: "At any node of the transition graph, the patrol team takes each edge outgoing from the node with an equal probability." For example, in this figure, when the patroller is at node 1 (0, 'Cr2'), she would go to node 2, 3 or 4 with an equal probability of 1/3, and so forth.
In this paper, the patroller is required to prolong her patrolling in a chemical site to keep the coverage of each chemical site continuous until the next patroller is able to arrive at the site (see step 4.2.2 in Table 2). For instance, in Figure 3, the maximum patrolling time budget is set to 30 min, but the patrolling in chemical site 'A' is not stopped until 38 min. The reason is that the shortest time in which the next patrol team can arrive at chemical site 'A' from source node 'Cr2' is 8 min. If the current patroller did not prolong her patrolling in this chemical site and the next patrol team started at time 30 from the source node 'Cr2', chemical site 'A' would not be inspected during the time interval between 30 and 38. This method may increase the patroller's workload. However, this can be compensated for by setting the value of T slightly smaller than the patroller's real workload.
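As a baseline illustration, the purely randomized patrolling route defined above can be sampled with a few lines of Python. The sketch assumes that the transition graph is given as a set of edges between (time, node) states and that it is acyclic (every move takes positive time); the function name `sample_random_route` and the toy edge set are hypothetical.

```python
import random

def sample_random_route(tE, start, rng=None):
    """Walk from start, taking each outgoing edge with equal probability."""
    rng = rng or random.Random()
    successors = {}
    for s, e in tE:
        successors.setdefault(s, []).append(e)
    route, current = [], start
    # Stop at a node with out-degree 0, i.e., the end of a patrol.
    while current in successors:
        nxt = rng.choice(sorted(successors[current]))  # sorted for reproducibility
        route.append((current, nxt))
        current = nxt
    return route

# Hypothetical toy transition graph with states (time, node).
toy_edges = {
    ((0, "S"), (2, "A")),
    ((2, "A"), (4, "S")),
    ((2, "A"), (5, "A")),
    ((4, "S"), (6, "A")),
}
route = sample_random_route(toy_edges, (0, "S"), random.Random(0))
```

The returned list of edges satisfies the three conditions of a patrolling route stated above: it starts at the source state, consecutive edges are linked, and it ends at a state without outgoing edges.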

Chemical Cluster Environmental Protection Patrolling Game
The CCEPP game is proposed in this section to model practical interactions between the patroller and the chemical plants. We introduce the game from four aspects in succession: players modeling, strategy modeling, payoff modeling, and the solution of the game.

Players Modeling
Players of the CCEPP game are the patrol teams (i.e., mobile inspection resources, such as vehicles, helicopters, or drones) on the one hand and the chemical plants on the other. Hereafter, patrol teams are referred to as the "leader" or "defender" and the chemical plants are referred to as the "follower" or "attacker". The aim of the defender is to optimize the patrolling plans of mobile vehicles so as to detect more irregularities of chemical plants and thereby improve her payoff. After observing the actions taken by the defender, the attacker attempts to release excessive air pollutants to maximize his profits. Moreover, both players in this game are assumed to be perfectly rational, for two basic reasons: first, both players in the CCEPP game are able to perceive their situation and the opposing player's actions accurately; second, the players tend to maximize their payoffs by intelligently planning their strategies. Future work can be devoted to extending the model to deal with boundedly rational attackers.

Attacker's Strategy
To simplify the modeling of the attacker's strategy, several reasonable assumptions are made: (i) The number of working release spots differs across release scenarios; (ii) within a chemical site i, the release spots have the same working start time and the same working duration; (iii) the working durations of release spots in different chemical sites usually differ, because the categories of atmospheric contaminants differ between chemical sites. Therefore, an attacker's strategy involves three parts: (i) Determining a time to start the release; (ii) determining a release scenario to use (denoted by the working duration of the release spots); and (iii) determining the number of release spots to use.
Specifically, the formulation of the attacker's pure strategy is listed in Formula (1).
where t represents the release start time, $k_i$ denotes the working duration of the release spots (e.g., five minutes), and $rs_i$ is the number of working release spots (a positive integer between 1 and the total number of release spots in the site). As can be seen in Figure 3, a bold red line in each chemical site represents one attack scenario of the attackers. For instance, the bold red line in chemical site 'F' represents a release of site 'F' starting at time 2, with a release duration of 10 min. Formula (1) implies that each chemical site chooses one attack scenario to implement. Moreover, based on the above definitions, the number of pure strategies for a site i can be computed through Formula (2), in which $S_i^a$ is the number of pure strategies owned by the attacker; T denotes the total number of time slices into which the maximum inspection budget is divided; Sce is the number of different release scenarios; and $RS_i$ represents the total number of release spots owned by site i.
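To make the size of the attacker's strategy space concrete, the pure strategies of one site can be enumerated directly. The sketch below assumes that the three components (start time, release scenario, and number of working release spots) can be chosen independently and that start times range over the time slices 0, ..., T − 1; under these assumptions the count is the product T · Sce · RS_i, which is our reading of the definitions above and not a reproduction of Formula (2). The function name and the example values are hypothetical.

```python
from itertools import product

def attacker_pure_strategies(T, durations, total_spots):
    """All tuples (t, k_i, rs_i) for one site under the independence assumption.

    T           -- number of time slices in the inspection budget
    durations   -- one working duration k_i per release scenario (len = Sce)
    total_spots -- RS_i, the total number of release spots of the site
    """
    start_times = range(T)                     # assumed: t in {0, ..., T - 1}
    spot_counts = range(1, total_spots + 1)    # rs_i in {1, ..., RS_i}
    return list(product(start_times, durations, spot_counts))

# Hypothetical site: 30 time slices, two scenarios (5 and 10 min), 3 release spots.
strategies = attacker_pure_strategies(30, [5, 10], 3)
print(len(strategies))  # 30 * 2 * 3 strategies
```

The release of site 'F' mentioned above (start time 2, duration 10 min) would correspond to tuples of the form (2, 10, rs) in such an enumeration.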

Defender's Strategy
As illustrated in the definition of the transition graph, a flow through the graph corresponds to a specific defender patrolling strategy, and that flow is represented by a marginal coverage vector $\prod_{(s,e) \in tE} c_a(s, e)$. To be specific, two different states in the transition graph are connected by an edge, and the marginal probability on that edge is the probability that the inspection resources traverse it. Moreover, once the optimal flow is computed, we can sample from it to generate random patrol schedules. Here, the defender's strategy is denoted as Formula (3).
where $c_a(s, e)$ denotes the marginal coverage probability of the patroller assigned to the edge from node s to node e (reach state s, execute action a, and end up in state e); and $\prod$ is the Cartesian product over all edges in tG (i.e., all (s, e) ∈ tE). Further, the marginal probability of the inspection resources reaching state s and executing action a is defined as $\omega_a(s)$. We then define a dummy source node $s^+$ to represent a root node which has no incoming edges, while a dummy sink node $s^-$ represents a terminal node, reached at the maximum inspection time budget T, with only incoming edges. An intermediate node of tG is a node that has both incoming and outgoing edges. Moreover, two properties should be satisfied based on graph flow theory: (i) For each intermediate node, the sum of all incoming probabilities must equal the sum of all outgoing probabilities (i.e., the flow into a state s is equal to the flow out of the state); (ii) the sum of the probabilities leaving the root node, or entering the terminal node, corresponds to the quantity of mobile inspection resources. Formulas (4) and (5) express these two equalities.
$$\sum_{in \in \{e \in tV \mid (e, s) \in tE\}} c_a(in, s) = \sum_{a} \omega_a(s) = \sum_{out \in \{o \in tV \mid (s, o) \in tE\}} c_a(s, out), \tag{4}$$
Moreover, due to actual traffic delays or a patroller's bounded cognition of patrolling plans, the same action a taken by a patroller in a state s may not lead to a fixed state e. The following equation describes the relationship between the marginal probabilities and the state transition probabilities: the marginal coverage probability $c_a(s, e)$ equals the product of the marginal probability $\omega_a(s)$ and the probability $P_a(s, e)$ of successfully transitioning to state e.
$$c_a(s, e) = \omega_a(s) P_a(s, e) \quad \forall s, e \in tV, \tag{6}$$
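The relationship between the marginal probabilities and the transition probabilities, together with the flow conservation property, can be checked numerically on a toy example. In the sketch below the states, the action labels, and the 0.9/0.1 delay probabilities are hypothetical, and the dictionary layout (keys `(s, a)` for ω and `(s, a, e)` for c) is our own choice of representation.

```python
def edge_coverage(omega, P):
    """c_a(s, e) = omega_a(s) * P_a(s, e) for every state s, action a, outcome e."""
    c = {}
    for (s, a), w in omega.items():
        for e, p in P[(s, a)].items():
            c[(s, a, e)] = w * p
    return c

def flow_is_conserved(c, state, tol=1e-9):
    """True if the probability flowing into state equals the flow leaving it."""
    inflow = sum(v for (s, a, e), v in c.items() if e == state)
    outflow = sum(v for (s, a, e), v in c.items() if s == state)
    return abs(inflow - outflow) <= tol

# Hypothetical states: one patrol team drives from crossroad 'Cr' to site 'A';
# with probability 0.1 a traffic delay makes it arrive one time slice late.
s0, on_time, late, end = (0, "Cr"), (1, "A"), (2, "A"), (4, "A")
omega = {(s0, "go_A"): 1.0, (on_time, "stay"): 0.9, (late, "stay"): 0.1}
P = {
    (s0, "go_A"): {on_time: 0.9, late: 0.1},
    (on_time, "stay"): {end: 1.0},
    (late, "stay"): {end: 1.0},
}
c = edge_coverage(omega, P)
```

Here the flow is conserved at the two intermediate states `on_time` and `late`, while the root state `s0` only emits flow, whose total corresponds to the single patrol team.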

Payoff Modeling
In this paper, the owner of a chemical site is assumed to choose an attack scenario to perform. Further, there will be two possible results when he chooses to attack, being: (i) The attack fails, or (ii) the attack is successfully implemented. In case the attack succeeds, the patroller will suffer a loss L i d (from the pressure of public opinion and authorities) and the attacker will obtain a gain G i a (from releasing excessive air contaminants without purification treatment). If the attack fails, the patroller will acquire a reward R i d and the attacker will suffer a penalty P i a . Both reward R i d and penalty P i a come from forfeit of attackers. Also, an expenditure r · C d is defined to represent the cost for conducting a patrolling through the chemical cluster. Values used in this paper of all these parameters are determined by experts from the environmental protection authorities of Shanghai's chemical cluster. Then, the defender's and attacker's payoff are formulated in Formulas (7) and (8), wherein f ( f ) is the probability that the attack would fail from the defender's (the attacker's) perspective and ρ i is the prior probability.
In Formulas (7) and (8), f ( f ) is a variant related to the attacker's strategy and defender's strategy modeled in Section 3.2. In the following paragraphs, the calculation of the probability f ( f ) is studied.
The probability that the irregularities of chemical sites would be detected by the static inspection resources (i.e., monitoring stations and gas sensor modules) is denoted by parameter f cpep and can be computed through the Chemical Plant Environmental Protection (CPEP) game or can be assessed by environmental experts as well. Moreover, we represent the probability that the patroller would detect the irregularities of chemical sites successfully as parameter f p . Considering the case that an excessive release can be detected by mobile resources and static resources at meantime, we therefore formulate the probability f ( f ) as Formula (9).
In Formula (9), $f_{cpep}$ is a site-specific constant. Taking into account the characteristics of source estimation methods [28][29][30][31], we selected two main factors to model the probability $f_p$. A release in site $i$ starts at time $t$ and lasts for $k_i$ time slices with $rs_i$ release spots. If the selected patrol of a defender assigns certain marginal coverage probabilities to site $i$ after the release time $t$, the inspection resources have a certain amount of time to collect the release data used for source estimation. Moreover, the number of working release spots in an attack scenario significantly influences the detection probability. Generally speaking, the probability $f_p$ decreases as the number of working release spots increases.
Furthermore, we define an effective data collecting time $t_{\mathrm{eff}}$, which consists of two components: $t_{\mathrm{overlap}}$ and $t_{\mathrm{after}}$. The time $t_{\mathrm{overlap}}$ is the data collecting time that falls within the overlap between the release of the chemical site and the patroller's stay in the site. The time $t_{\mathrm{after}}$ is the data collecting time that falls after the release has finished. Based on the characteristics of source estimation methods, pollution data collected during the overlap time is more useful than data collected after the release has finished. Therefore, the effective data collecting time can be formulated as Formula (10).
in which $\varepsilon$ is a real number larger than 1 and $t_{\mathrm{overlap}}$ can be represented by […]. If we denote the time at which the patroller starts staying in site $i$ as $st$, the staying behavior of the patroller can be represented by a tuple of starting time and staying duration $(st, t_i^p)$. Five situations are distinguished to calculate the effective data collecting time $t_{\mathrm{eff}}$.

Situation 1:
If $st + t_i^p \le t$ holds, it means that the release has not yet started when the patroller leaves the site. In this situation, the effective data collecting time $t_{\mathrm{eff}}$ equals 0.

Situation 2:
If […] holds, it means that only the overlap time is the data collecting time. In this situation, the effective data collecting time $t_{\mathrm{eff}}$ equals $\varepsilon \cdot t_{\mathrm{overlap}}$.

Situation 3:
If $st > t$ and $st + t_i^p \le t + k_i$ hold, it also means that only the overlap time is the data collecting time. In this situation, the effective data collecting time $t_{\mathrm{eff}}$ equals $\varepsilon \cdot t_{\mathrm{overlap}}$.

Situation 4:
If $t < st \le t + k_i$ and $st + t_i^p > t + k_i$ hold, it means that the whole patrolling time $t_i^p$ in this site is data collecting time. In this situation, the effective data collecting time is $t_{\mathrm{eff}} = \varepsilon \cdot t_{\mathrm{overlap}} + t_{\mathrm{after}}$, in which $t_{\mathrm{after}}$ equals $t_i^p - t_{\mathrm{overlap}}$.

Situation 5:
If $st > t + k_i$ holds, it also means that the whole patrolling time $t_i^p$ in this site is data collecting time. In this situation, however, there is no overlap, and the effective data collecting time $t_{\mathrm{eff}}$ equals $t_{\mathrm{after}}$.
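The case distinction above can be sketched with plain interval arithmetic. The function below is an illustration under stated assumptions, not the authors' implementation: the stay is treated as the interval from st to st + t_p, the release as the interval from t to t + k, and the function name `effective_time` is hypothetical. The example values use the 10-minute release and ε = 3 from the case study.

```python
def effective_time(st, t_p, t, k, eps):
    """Return (t_overlap, t_after, t_eff) for one stay and one release.

    st, t_p : start time and duration of the patroller's stay in the site
    t, k    : start time and duration of the release
    eps     : weight (> 1) given to data collected during the overlap
    """
    stay_end = st + t_p
    release_end = t + k
    # Time during which the patroller is on site while the release is running.
    t_overlap = max(0, min(stay_end, release_end) - max(st, t))
    # Time the patroller spends on site after the release has finished.
    t_after = max(0, stay_end - max(st, release_end))
    return t_overlap, t_after, eps * t_overlap + t_after


# Release from t = 10 to t = 20, stays of 5 time slices, eps = 3.
for start in (2, 12, 17, 22):
    print(start, effective_time(start, 5, 10, 10, 3))
```

A stay that ends before the release starts yields zero, a stay inside the release yields only weighted overlap time, a stay that straddles the end of the release yields both components, and a stay after the release yields only unweighted time.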
After listing these situations, the probability $f_p$ can be formulated through Formula (11), in which $\sigma_{sit}$ is a parameter that depends on the effective data collecting time $t_{\mathrm{eff}}$ and the working release spots $rs_i$, denoted by Formula (12). In Formula (12), $\sigma_i$ is a positive real number related to the detection probability. $S_i^{sit}$ represents the set of states satisfying the specific situation (the set of states relates to all time steps associated with site $i$).

Game Solver and Solution Definition
In the CCEPP game, it is assumed that the attacker can collect information about the patroller's patrolling routes. For instance, attackers could acquire information about the patrolling route through long-term observation of patrolling teams or by stealing the patrolling plans. Therefore, we assume that the CCEPP game is played sequentially by the patroller and the attacker. First, the patroller (the game leader) commits to a patrolling strategy $\vec{c}$ (see Formula (13)); subsequently, the attacker (the game follower) responds with his optimal strategy (see Formula (14)). It is worth mentioning that the patrolling team is able to compute the optimal strategy of the attackers and then schedule her own optimal strategy accordingly. A Stackelberg equilibrium (SE) for the CCEPP game is a patroller-attacker strategy pair that satisfies the following conditions:

By discretizing the time dimension, the finite number of attacker strategies can be enumerated. For a given attacker strategy, the payoff functions $u_d$ and $u_a^i$ are both linear polynomials of $\vec{c}$. Therefore, a multiple linear programming (MultiLPs) algorithm [32] can be used to calculate the Stackelberg equilibrium for the CCEPP game, as shown in Table 3.

Table 3. MultiLPs algorithm for calculating the Stackelberg equilibrium for the Chemical Cluster Environmental Protection Patrolling (CCEPP) game.

Initialization
For each attacker strategy $(t, k_i, rs_i)$, calculate $u_a^i$ and $u_d$, which are linear polynomials of $\vec{c}$;

Linear Programming (LP)
Suppose that the attacker strategy $(t^{\#}, k_i^{\#}, rs_i^{\#})$ is the attacker's best response, which means:
The defender would then aim at:
The Stackelberg equilibrium achieves:

In the linear programming step, the defender solves an LP problem whose objective function is Formula (16) and whose constraints are Formulas (15), (4), (5) and (6). The MultiLPs algorithm carries out this LP step once for each attacker strategy. In addition, if the value of $c_a(s, e)$ is constrained to be either 0 or 1, the optimal fixed patrolling route for the patrol team is generated instead.
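The idea of solving one LP per attacker strategy can be illustrated on a deliberately tiny example. The sketch below is not the CCEPP model: it assumes two sites, a single attacker type, and a one-dimensional patroller strategy x (the probability of covering site 0, with 1 − x going to site 1), so that each LP reduces to intersecting intervals. The names `solve_1d_lp` and `multiple_lps` and all payoff numbers are hypothetical. With the 764-dimensional strategy of the case study, a general LP solver would be needed instead.

```python
def solve_1d_lp(objective, constraints, lo=0.0, hi=1.0):
    """Maximise a*x + b over lo <= x <= hi subject to p*x + q >= 0 for each (p, q).

    Returns (value, x), or None if the feasible interval is empty.
    """
    for p, q in constraints:
        if p > 0:
            lo = max(lo, -q / p)
        elif p < 0:
            hi = min(hi, -q / p)
        elif q < 0:
            return None
    if lo > hi:
        return None
    a, b = objective
    x = hi if a > 0 else lo
    return a * x + b, x


def multiple_lps(u_a, u_d):
    """u_a[j] and u_d[j] are (slope, intercept) payoffs in x when the attacker plays j.

    Returns (defender payoff, x, attacker strategy) for the best of the LPs.
    """
    best = None
    for j in range(len(u_a)):
        # Strategy j must be a best response: u_a[j](x) - u_a[other](x) >= 0.
        constraints = [(u_a[j][0] - u_a[o][0], u_a[j][1] - u_a[o][1])
                       for o in range(len(u_a)) if o != j]
        result = solve_1d_lp(u_d[j], constraints)
        if result is not None and (best is None or result[0] > best[0]):
            best = (result[0], result[1], j)
    return best


# Hypothetical payoffs. Site 0: R=5, L=10, G=6, P=4. Site 1: R=3, L=6, G=4, P=4.
attacker = [(-10.0, 6.0), (8.0, -4.0)]   # attack site 0, attack site 1
defender = [(15.0, -10.0), (-9.0, 3.0)]
print(multiple_lps(attacker, defender))
```

Each LP fixes one attacker strategy as the best response and maximises the defender's payoff under that restriction; the equilibrium is the best solution over all feasible LPs.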

Experimental Settings
In the illustrative case, experiments are conducted on the Shanghai chemical cluster to explain how the CCEPP game works in a real industrial setting. The part areas, graphic model, and transition graph of the Shanghai chemical cluster are shown in Section 2 (i.e., Figures 1-3). The maximum patrolling time budget T is assumed to be 30 minutes, as the energy of the mobile patrolling vehicles allows at most 40 minutes of driving and inspecting. Table 1 shows the patroller's moving time between different chemical sites and the inspecting time inside each site. The patroller's starting node is set to 'Cr2', which means that a patrol team starts her patrolling plan from this node. Further parameters and simplifying assumptions of this case are given hereafter.
For the sake of clarity, we assume that the attacker has only one release scenario to perform, that this attack scenario lasts for 10 minutes with all release spots working during that period, and that the patrol team follows the patrolling schedules precisely. Table 4 gives the model inputs for the case study: the defender's reward $R_d^i$ and loss $L_d^i$ from detecting and not detecting the attacker's irregularities, respectively; the attacker's gain $G_a^i$ and penalty $P_a^i$ from a successful release and from a failed release, respectively; the cost of sending a patrol team (i.e., $C_d = 2$); and the probability $f_{cpep}$ that the release is detected by static inspection resources. The probability that the patroller can catch the release (i.e., $\sigma_i$) should be provided by environmental protection authorities. However, for the sake of simplicity, we assume that if the patroller is patrolling in the $i$th chemical site while the attacker is releasing atmospheric contaminants, there is a probability of 0.05 that the attacker's behavior is detected by the patroller in each time slice. Moreover, the parameter $\varepsilon$ is assumed to be 3 in this paper. The unit of all the monetary parameters can be, for instance, k¥. It is worth mentioning that all these data are estimates from the environmental protection authorities in the Shanghai chemical cluster. In general, the values of the rewards ($R_d^i$), losses ($L_d^i$), and detection probability ($f_{cpep}$) are accurate, as they are the authorities' estimates of their own data. The gains ($G_a^i$) and penalties ($P_a^i$) for the attacker may be uncertain because they are estimates of the attackers' data from the perspective of the defender. For instance, "The gain of a successful release in chemical site 'A' is 60" means that the patroller thinks the attacker will receive a value of 60 from this release. However, the study of these parameters is not covered in this paper.
Future research will consider unknown adversaries (i.e., cases in which the exact values of parameters related to the opponents are unknown). Future research can also focus on the sensitivity of the model to the values selected for the parameters $R_d^i$, $L_d^i$, $G_a^i$, $P_a^i$ and $\varepsilon$.

Game Modeling
In this illustrative case study game, there are seven players, namely a patrol team and six chemical sites. The six chemical sites are independent, and thus the game follows the standard paradigm of a Bayesian Stackelberg game. It is further assumed that the six chemical sites share the same prior probability (i.e., 1/6). Since only one attack scenario is considered, each attacker has only m = 1 × 30 × 1 = 30 pure strategies, each being to start releasing excessive air pollutants at a time t ∈ {0, 1, . . . , 29}. The patroller has 764 possible actions, shown as edges in Figure 3. This means that the patroller's strategy can be written as a vector of 764 entries, in which each entry denotes the marginal coverage probability of an edge in the transition graph.
According to Formulas (7) and (8), the patroller's and the attacker's payoffs can be computed. The payoffs are represented as linear polynomials of the patroller's strategy (i.e., $\vec{c}$), while the attacker's strategy determines the coefficients of the polynomials. Figure 4 shows the SE of the game developed for this case, computed by the MultiLPs algorithm shown in Table 3. The patroller's optimal patrolling strategy is represented by the black (and bold) lines. The number associated with each line denotes the probability that the defender takes this action. For instance, c1 = 0.3546 means that the patrol team should drive from the start node 'Cr2' to chemical site 'C' with a probability of 0.3546 at time 0. Similarly, c2 = 0.5637 denotes that at time 0, the patrol team should drive from the start node 'Cr2' to chemical site 'D2' with a probability of 0.5637. When the patrol team is at node 1 (0, 'Cr2'), the patroller has three possible actions (i.e., move to chemical site 'C', or 'D2', or 'E'). The marginal coverage probabilities on the edges '1-2', '1-3', and '1-4' are equal to the conditional probabilities of taking these actions, because the total probability of being at node 1 is 1. Furthermore, in patrolling practice, if the patroller arrives at a node in the figure, the conditional probabilities of the following actions can be calculated by the formula $c_a(s, out) / \sum \omega_a(s)$. For instance, the probability that the patroller arrives at the node (1, 'D2') in Figure 4 is $\sum \omega_a(s) = 0.5637$, and the conditional probabilities that the patroller should take the three actions (i.e., either patrolling in site 'D', or driving to crossroad 'Cr2', or driving to crossroad 'Cr1') are $c_{a_1}(D2, D2) = 0.47155 / 0.5637 = 0.8365$, $c_{a_2}(D2, Cr2) = 0.091165 / 0.5637 = 0.1617$, and $c_{a_3}(D2, Cr1) = 0.000987 / 0.5637 = 0.0018$, respectively. Under this optimal patrolling strategy, the payoffs for the patroller and the attacker are −6.616 and 3.188 (on average), respectively.
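The conversion from marginal coverage probabilities to conditional action probabilities at a node is a simple normalisation, sketched below with the three values quoted above for node (1, 'D2'). The function name and the dictionary keys are hypothetical labels chosen for illustration.

```python
def conditional_probabilities(outgoing):
    """Normalise the marginal probabilities on a node's outgoing edges.

    outgoing maps an action label to the marginal coverage probability of its edge.
    """
    total = sum(outgoing.values())
    if total == 0:
        # The node is never reached, so no action is ever taken from it.
        return {action: 0.0 for action in outgoing}
    return {action: c / total for action, c in outgoing.items()}


# Marginal probabilities on the edges leaving node (1, 'D2'), as quoted in the text.
outgoing_d2 = {"patrol D": 0.47155, "to Cr2": 0.091165, "to Cr1": 0.000987}
for action, p in conditional_probabilities(outgoing_d2).items():
    print(f"{action}: {p:.4f}")
```

Because the probability of reaching a node equals the sum of the marginals on its outgoing edges whenever the patrol continues from that node, the three conditional probabilities sum to 1.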
The detailed information of this optimal patrolling strategy is listed in Table A1.

Next, let us compare the SE generated by the CCEPP game with the purely randomized route strategy. In real patrolling practice, patrollers often schedule the patrolling route randomly. In terms of the transition graph in Figure 3, this strategy simply assigns equal probabilities to all edges that start from the same node. For example, when the patrol team arrives at node 1 (0, 'Cr2'), she would move to site 'C', or 'D2', or 'E' with the same probability, namely 1/3. However, the purely randomized patrolling strategy does not take into consideration the hazardousness level of each chemical site. An intelligent attacker is then free to attack whichever site he prefers, since all the chemical sites are equally patrolled. Under this purely randomized patrolling strategy, we first calculate the marginal coverage probabilities on each edge, then compute the probabilities in each overlap situation, and finally compute the payoffs of the patroller and the attacker. According to Formulas (7) and (8), the corresponding payoffs for the patroller and the attacker are −8.254 and 4.054 (on average), respectively. Compared to the SE of the CCEPP game, the defender's payoff drops from −6.616 to −8.254. This result reveals that the SE strategy of the CCEPP game gives a higher probability of discovering the attacker's illegal behavior.
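The first step of this calculation, turning "equal probability on every outgoing edge" into marginal coverage probabilities, can be sketched as a forward pass over the transition graph. The graph below is a hypothetical three-step fragment with made-up travel times, not the 764-edge graph of Figure 3, and the function name `uniform_marginals` is an assumption.

```python
def uniform_marginals(successors, start):
    """Marginal probability of each edge when every branch is equally likely.

    successors maps each node to its list of next nodes and must list the
    nodes in time order, so that a node is processed after its predecessors.
    """
    node_prob = {node: 0.0 for node in successors}
    node_prob[start] = 1.0
    edge_prob = {}
    for node, nexts in successors.items():
        for nxt in nexts:
            share = node_prob[node] / len(nexts)
            edge_prob[(node, nxt)] = share
            node_prob[nxt] += share
    return edge_prob


# Hypothetical fragment: nodes are (time, location) pairs.
toy_graph = {
    (0, "Cr2"): [(1, "C"), (1, "D2"), (1, "E")],
    (1, "C"): [(2, "C"), (2, "Cr2")],
    (1, "D2"): [(2, "D2"), (2, "Cr2")],
    (1, "E"): [(2, "Cr2")],
    (2, "C"): [],
    (2, "D2"): [],
    (2, "Cr2"): [],
}
edges = uniform_marginals(toy_graph, (0, "Cr2"))
for edge, p in edges.items():
    print(edge, round(p, 4))
```

The resulting edge probabilities play the same role as the entries of the strategy vector in the SE, which is what makes the two strategies directly comparable through the payoff formulas.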

Results and Discussions
Moreover, in current patrolling practice, some patrollers may follow a fixed route strategy rather than the purely randomized strategy. This scenario is illustrated by the black (and bold) line in Figure 3. However, if a fixed patrolling route is scheduled, the patroller's real-time location is fully predictable for intelligent attackers, since intelligent attackers collect useful information before an attack. In the transition graph, if the probability of an action being taken is further constrained to be either 0 or 1 (i.e., c ∈ {0, 1} instead of c ∈ [0, 1]), then a vector $\vec{c}$ that satisfies Formulas (4) and (5) represents a fixed route strategy. By running the MultiLPs algorithm under this constraint, the optimal fixed route strategy is obtained, as shown in Figure 5. The fixed patrolling route is as follows: the patroller starts from 'Cr2'; she goes to chemical site 'D2' and then moves back to 'Cr2'; she goes to chemical site 'E' and then moves back to 'Cr2'; she goes to chemical site 'D2' and back to 'Cr2' again; subsequently, she drives to chemical site 'C' and then moves to 'Cr3'; she goes to chemical site 'A2' and moves back to 'Cr3'; then she goes to chemical site 'C', crossroad 'Cr2', chemical site 'D2', crossroad 'Cr1', chemical site 'B', and chemical site 'A1' in sequence; finally, she patrols site 'A'. If the patroller follows the fixed patrolling route and the attacker plays his best response, the corresponding payoffs for the patroller and the attacker are −8.35 and 4.15 (on average), respectively. It is worth mentioning that neither the patroller's optimal fixed patrolling route nor the attacker's best response is unique. For instance, given the patroller's fixed route, the attacker may be indifferent between several times at which to start his attack. However, the players' payoffs would not be different. Therefore, only one optimal fixed patrolling route is shown in this paper.
Comparing these three strategies, the SE generated by the CCEPP game clearly outperforms both the purely randomized route strategy and the fixed route strategy, as shown in Table 5. By implementing the SE strategy, the defender decreases her loss: her payoff increases from −8.254 (or −8.35) to −6.616. The result reveals that the SE strategy has a higher probability of detecting the attacker's illegal releases and brings a higher reward to the defender. To be specific, the more hazardous chemical sites are assigned higher marginal coverage probabilities, which implies that the holder of a hazardous chemical site is highly likely to obey air regulations and keep to permitted emissions after being caught several times. Therefore, playing a CCEPP game is essential for the patrolling team in her daily management work, because the game not only improves her payoffs and detects the illegal discharge behaviors of chemical sites, but also reduces the risk of hazardous gas leakage incidents.

Conclusions
The atmospheric pollution prevention problem in chemical clusters has drawn great concern around the world. Although some work has been done on scheduling the use of static inspection resources in a chemical cluster, mobile inspection resources are highly recommended for monitoring the chemical production process because of their flexibility and mobility. However, the currently widely used patrolling strategies (i.e., the purely randomized route strategy and the fixed route strategy) have obvious drawbacks. To this end, the chemical cluster environmental protection patrolling (CCEPP) game was developed to aid the inspection agency in effectively scheduling patrols on different chemical sites. In this game, the intelligent interactions between the patroller and the holders of chemical plants were considered. Practical road network constraints and a reasonable time discretization were modeled in this game as well. Moreover, a simple source estimation process in each chemical site was also considered. Finally, the MultiLPs algorithm was introduced to solve this game.
An illustrative case study was implemented to demonstrate how our proposed CCEPP game works in the Shanghai chemical cluster. The results of the case study show that the patroller obtains higher expected payoffs by strategically randomizing patrolling routes, and that the chemical plants that are more likely to release pollutants are patrolled more often, that is, they are assigned higher marginal coverage probabilities. The patrolling strategy from the Stackelberg equilibrium outperforms both any fixed patrolling route and the purely randomized routes. Further, the more hazardous chemical sites are assigned higher marginal coverage probabilities, signifying that the holders of hazardous chemical sites are more likely to choose to obey air regulations after several punishments. In other words, by strategically scheduling the patrols, the surrounding ecosystem and residential environment will be largely improved on the one hand, and the risks of hazardous gas leakage incidents will be considerably reduced on the other hand.
The proposed CCEPP game can be extended in several directions. Firstly, the current model assumes that the defender's estimates of the attackers' data are correct. In reality, the model should be extended to consider unknown opponents. Secondly, the attacker is assumed to know only the probability of each action that the patroller would take. A more realistic situation is that the intelligent adversary not only knows these probabilities, but also observes the current location of the patroller. To model this situation, a stochastic game is recommended. Thirdly, the patrolling inside the chemical sites is modeled simply. In reality, it is better for the patroller to determine the locations of releasing spots through source estimation methods. Therefore, future work should also focus on the application of source tracing algorithms in the CCEPP game, which would assist the patroller in tracing the releasing source and verifying the irregularity.