1 Introduction

In the last decade, process mining, also known as workflow mining, has emerged as a hot topic in the field of business process management (BPM), providing a bridge between data mining and BPM. The starting point of process mining is a collection of historical information, known as an event log, recorded by information systems. Based on event logs, the goal of process mining is to output a process model that reflects the real execution paths of the business process in an information system. The mined process model can be applied to many tasks in BPM, including process model design, analysis of business process bottlenecks, etc. [1, 2].

Recently, the explosive growth of block-chain technology has brought new challenges to BPM. Block-chain can make traditional inter-organizational business processes more secure and trustworthy, and it enables BPM to be applied in many fields, such as international finance and secure data exchange (e.g. bank account information, personal health information). Weber et al. [3] first described a proposal to support inter-organizational processes through block-chain. Mendling et al. [4] pointed out that collecting event logs on trusted nodes in the chain and discovering the real business process among organizations are big challenges in the field of block-chain-based BPM. In this work, we do not consider the collection of event logs and assume that (1) all the required process information has been collected and recorded in an event log; (2) the process information from different organizations has been converted to a unified format. Thus, we focus only on how to mine a high-quality process model from an event log.

Mining a high-quality process model from event logs is not an easy task. On the one hand, besides common process structures (e.g. sequence, parallel and choice structures), the process mining algorithm needs to handle many special structures, such as duplicated activities, invisible activities, non-free-choice structures, long-distance dependence structures, and so on [5]. On the other hand, event logs in real life are often noisy or incomplete, which may greatly affect the mining results. There are many reasons for the appearance of noisy data in an event log, such as server outages, network congestion, etc. In addition, for a complicated process model, it is possible to obtain an event log that does not contain all possible execution paths; such an event log is incomplete (or unbalanced).

To evaluate the quality of a process model, researchers have proposed many effective indicators. The most commonly used indicators are replay fitness, precision and simplicity [6]. Replay fitness quantifies the extent to which the discovered model can accurately reproduce the traces recorded in the event log (a trace is a sequence of events and represents a complete path of activity execution). However, replay fitness alone cannot establish the quality of a process model. For instance, the flower model is an extreme case which is able to replay any sequence of events. To handle this problem, researchers usually employ precision to quantify the fraction of the behavior allowed by the model but not observed in the event log. Simplicity captures the complexity of the process model: an overly complex model is undesirable because it is hard to understand. How to balance these competing indicators to discover an optimal process model is an important task in the design of a mining algorithm.

Computational intelligence (CI) is an efficient technique for handling such a complex task. However, current CI-based process mining algorithms, including Genetic Miner [7], Evolutionary Tree Miner [8, 9] and ProDiGen [10], suffer from slow convergence and premature convergence. The reason for the low efficiency of these algorithms is that all of them are based on the genetic algorithm and their genetic operators are very simple. This paper proposes a new CI-based process mining algorithm, named PSOMiner, which is based on the discrete particle swarm optimization technique. There are three major innovations in this work. Firstly, a series of arithmetic operators is proposed for velocity updating and position updating on the causal matrix. Secondly, a local mutation operator is designed to help PSOMiner escape from a local optimum when premature convergence is detected. Thirdly, each position in a particle (i.e. a candidate process model) is assigned a score during the calculation of the fitness value; this score is then employed to guide the local mutation and improve the efficiency of the algorithm.

The remainder of this paper is structured as follows. Section 2 reviews works related to this paper. Section 3 gives background knowledge of process mining and particle swarm optimization (PSO). Then, the details of PSOMiner are given in Sect. 4. Section 5 explains the experiments and analyzes the experimental results in depth. The last section gives conclusions.

2 Related Work

The basic idea of process mining was proposed by Cook and Wolf in 1995 [11]. Their aim was to automatically discover process models from event logs of the software process. Later, Agrawal et al. [12] applied the technique to workflow management and officially named it process mining. So far, a large number of achievements have emerged in this field. Generally, these works can be categorized into four groups.

(1) The α algorithm and its extensions

A milestone in the field of process mining is the α-algorithm proposed by van der Aalst et al. [13]. The algorithm can discover a structural workflow net (SWF-net) from event logs under the assumption that the event log is complete. It models the business process by a Petri net and focuses on discovering four classical relations (i.e. the sequence, causal, parallel and choice relations) among activities from event logs. Based on the α algorithm, de Medeiros et al. proposed the α+ algorithm [14], which can discover length-one and length-two loops from event logs. Wen et al. proposed the α# algorithm [15] and the α++ algorithm [16], which are able to handle invisible activities and non-free-choice structures respectively. The common characteristic of the α-series algorithms is that they extract the basic relations among activities by scanning the traces in event logs to construct the process model. The process model mined by α-series algorithms has low structural complexity and is thus easy to understand. However, these algorithms do not consider replay fitness and precision, so the quality of the obtained process model is low.

(2) Heuristic algorithms.

Weijters et al. proposed a heuristic algorithm for process mining, named Heuristic Miner (HM) [17]. The HM algorithm quantifies the dependency among activities by counting the occurrence frequency of certain patterns in an event log. Since observed patterns with low frequency in an event log are considered to be noise, the HM algorithm is robust to noise. Unfortunately, some correct but low-frequency patterns are easily mishandled, so the HM algorithm is unable to obtain a perfect process model. Later, the authors proposed an improved algorithm, called Flexible Heuristic Miner [18], which significantly improves the performance of the HM algorithm. Besides that, Greco et al. proposed the CNMiner algorithm [19], which considers precedence constraints in the heuristic method. The disadvantage of heuristic algorithms is that their performance mainly depends on the number of structure patterns: the more patterns there are, the higher the time complexity of the algorithm.

(3) Region-based algorithms.

Bergenthum et al. first proposed a process mining algorithm based on language-based regions [20]. However, the algorithm does not consider the precision of the process model, and its time complexity increases exponentially with the size of the event log. Werf et al. proposed ILP Miner, which is also based on language-based regions [21]. ILP Miner employs integer linear programming to construct the process model. van Zelst et al. proposed Hybrid ILP Miner [22], which changes the objective function of ILP Miner and adds a method to handle low-frequency behavior in an event log. Leemans et al. proposed Inductive Miner [23], which employs a divide-and-conquer strategy: the algorithm starts from a single node and recursively combines adjacent local process models into a larger process model.

(4) CI-based algorithms.

CI is a powerful tool for solving problems in data mining, nonlinear optimization and other fields. These methods simulate the biological evolution process and have very strong search ability and robustness. Medeiros et al. first applied a genetic algorithm to process mining (named Genetic Miner) [7]. Through a well-defined fitness function and crossover/mutation operators, Genetic Miner is able to obtain a process model which is highly consistent with the event log. Moreover, many problems, including non-free-choice structures, invisible activities, duplicated activities, etc., are solved in a unified framework. Vázquez-Barreiros et al. proposed another algorithm, ProDiGen [10], which is also based on a genetic algorithm. However, the hierarchical objective function used in ProDiGen often yields an unsatisfactory process model which has perfect replay fitness but very poor precision. Buijs et al. proposed the Evolutionary Tree Miner (ETM) [8], which employs a process tree to represent a process model. Moreover, the objective function of ETM considers replay fitness, precision and simplicity, so it can find high-quality process models.

3 Preliminaries

3.1 Process Mining

Process mining aims to extract knowledge from an event log and produce a process model which is required to be consistent with the real business process. A process model is usually represented by a Petri net.

Definition 1

A Petri net over a set of activities \(A\) is denoted by a tuple \(N = \left( {P,T,F,\pi } \right)\), where:

  1. \(P\) is a finite set of places;

  2. \(T\) is a finite set of transitions, and \(P \cap T = \emptyset\);

  3. \(F \subseteq \left( {P \times T} \right) \cup \left( {T \times P} \right)\) is a finite set of directed arcs connecting places and transitions;

  4. \(\pi :T \to A \cup \left\{ \tau \right\}\) is a function mapping transitions to either activities or \(\tau\), such that \(\forall_{a \in A} \exists_{t \in T} \pi \left( t \right) = a\).

The symbol \(\tau\) denotes an invisible activity in a process model. Let \(N = \left( {P,T,F,\pi } \right)\) be a Petri net and \(a,b \in P \cup T\) be two nodes in \(N\). If \(< a,b > \in F\), we say that \(a\) is an input node of \(b\) and \(b\) is an output node of \(a\). We use the symbol \(\bullet a = \left\{ {b \in P \cup T\left| { < b,a > \in F} \right.} \right\}\) to denote the set of input nodes of \(a\). Similarly, the symbol \(a \bullet\) represents the set of output nodes of \(a\).

Figure 1 gives an illustrative example of a process model represented by a Petri net, in which the circles denote places and the squares denote transitions. For a business process, the activities are modeled by transitions and the causal dependencies among activities are modeled by places and arcs. It is easy to see that there are eleven transitions and ten places in Fig. 1. The two places named “start” and “end” denote the beginning place and the ending place respectively.

Fig. 1 An illustrative example of a process model

A Petri net is a token-based system. A token is represented by a black dot (see the “start” place). If every input node (i.e. place) of a transition holds a token, the transition is enabled and can be fired. Firing a transition removes a token from each input place and then puts a token into each output place. For example, in Fig. 1, the input place of transition “A” contains a token, so transition “A” is enabled; firing it removes the token from the “start” place and puts a token into place “P1”. Then, transition “B” is enabled. It should be noted that “P3” is an OR-split structure, that is to say, exactly one of the four following transitions (i.e. “C”, “D”, “E” and “F”) can be fired when “P3” holds a token. Besides, “P1” and “P5” are OR-join structures, and transitions “D” and “J” represent AND-split and AND-join structures respectively. More details can be found in [13].
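The token game described here can be sketched in a few lines of Python. The snippet below hand-codes a simplified fragment of Fig. 1 (only transitions “A” and “B”, each with a single input place); the encoding is our own illustration, not an implementation from the paper.

```python
# Input and output places of each transition in a simplified fragment of Fig. 1.
inputs = {"A": ["start"], "B": ["P1"]}
outputs = {"A": ["P1"], "B": ["P3"]}

def enabled(t, marking):
    # A transition is enabled iff every input place holds at least one token.
    return all(marking.get(p, 0) > 0 for p in inputs[t])

def fire(t, marking):
    # Firing removes a token from each input place and adds one to each output place.
    assert enabled(t, marking)
    m = dict(marking)
    for p in inputs[t]:
        m[p] -= 1
    for p in outputs[t]:
        m[p] = m.get(p, 0) + 1
    return m

m0 = {"start": 1}     # initial marking: one token in the "start" place
m1 = fire("A", m0)    # the token moves from "start" to "P1", enabling "B"
```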

Although the Petri net provides a powerful tool for process modeling, it is hard to directly apply a CI-based method to a Petri net. To solve this problem, Medeiros et al. [7] proposed the causal net (also known as the causal matrix) to represent the process model. The definition is given below.

Definition 2

[7]: A causal matrix is a tuple \(\Pi = \left( {A,C,I,O} \right)\), where:

(1) \(A\) is a finite set of activities;

(2) \(C \subseteq A \times A\) is the causality relation;

(3) \(I:A \to P\left( {P\left( A \right)} \right)\) is an input condition function;

(4) \(O:A \to P\left( {P\left( A \right)} \right)\) is an output condition function.

Furthermore, since a process model represented by a Petri net often needs to be compared with a model represented by a causal matrix, a method for converting a Petri net to a causal matrix is given below.

Definition 3

Let \(N = \left( {P,T,F,\pi } \right)\) be a Petri net. The rules for mapping \(N\) to a causal matrix \(\Pi = \left( {A,C,I,O} \right)\) are:

(1) \(A = \pi \left( T \right)\).

(2) \(C = \left\{ {\left( {t_{1} ,t_{2} } \right) \in T \times T\left| {t_{1} \bullet \cap \bullet t_{2} \ne \emptyset } \right.} \right\}\).

(3) \(I:T \to P\left( {P\left( T \right)} \right)\) such that \(\forall_{t \in T} \, I\left( t \right) = \left\{ { \bullet p\left| {p \in \bullet t} \right.} \right\}\).

(4) \(O:T \to P\left( {P\left( T \right)} \right)\) such that \(\forall_{t \in T} \, O\left( t \right) = \left\{ {p \bullet \left| {p \in t \bullet } \right.} \right\}\).

For explanation, the process model in Fig. 1 is converted to the causal matrix shown in Table 1. Each row in the table corresponds to an activity and has two columns, i.e. the input of the activity \(I\left( a \right)\) and the output of the activity \(O\left( a \right)\). The input and output are each denoted by a set whose elements are also sets; for distinction, these elements are called subsets in the remainder of this paper. It should be noted that activities in the same subset of \(I\left( a \right)\) have an OR-join relation and those in different subsets have an AND-join relation. On the other hand, activities in the same subset of \(O\left( a \right)\) have an OR-split relation and those in different subsets have an AND-split relation.

Table 1 The process model in Fig. 1 as a causal matrix

Take transition “B” as an instance: it has two input places, “P1” and “P2”. The input transition of “P1” is “A” and the input transition of “P2” is “C”, thus \(I\left( B \right) = \left\{ {\left\{ {A,C} \right\}} \right\}\). The output place of “B” is “P3”, whose following transitions are “C”, “D”, “E” and “F”, thus \(O\left( B \right) = \left\{ {\left\{ {C,D,E,F} \right\}} \right\}\).
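As an illustration, rows of Table 1 can be encoded directly as nested sets; this dictionary encoding is our own sketch, with one inner frozenset per OR-group and several inner frozensets for an AND-structure.

```python
# Input/output condition functions I and O for two rows of Table 1.
# One inner frozenset = activities in an OR relation;
# several inner frozensets = an AND-split / AND-join.
I = {"B": {frozenset({"A", "C"})}}             # I(B) = {{A, C}}: OR-join
O = {
    "B": {frozenset({"C", "D", "E", "F"})},    # O(B) = {{C, D, E, F}}: OR-split
    "D": {frozenset({"G"}), frozenset({"H"})}, # O(D) = {{G}, {H}}: AND-split
}
```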

Then, the definitions of event, trace and event log are given below.

Definition 4 (Event, trace, and event log)

Let \(A\) be a set of activities. An event is denoted as \(e = \omega \left( a \right)\), where \(a \in A\) and \(\omega\) is a function relating each activity to an event. Let \(\Gamma\) be a universe of events; a trace \(t \in \Gamma^{*}\) is a finite sequence of events. Let \(\Sigma = \Gamma^{*}\) be the universe of traces; an event log \(L\) is a multi-set of traces, i.e. \(L = \mathcal{B}\left( \Sigma \right)\).

It is not difficult to see that the process model in Fig. 1 may produce many different execution paths, each represented by a sequence of events (i.e. a trace), and these traces constitute an event log, e.g. \(\{ ABEIK^{50}, ACFIK^{20}, ACEBF^{100}, ABCBEIK^{80}, ABDGHJ^{60}, ABDHGJ^{45}, \ldots \}\). The superscript represents the number of times the trace appears in the event log.
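An event log of this form is naturally a multi-set. As a sketch, the example traces above can be stored as a `Counter` keyed by trace; the data structure is our own choice, not the paper's.

```python
from collections import Counter

# The example event log as a multi-set of traces; the counts follow the
# superscripts in the text.
log = Counter({
    ("A", "B", "E", "I", "K"): 50,
    ("A", "C", "F", "I", "K"): 20,
    ("A", "C", "E", "B", "F"): 100,
    ("A", "B", "C", "B", "E", "I", "K"): 80,
    ("A", "B", "D", "G", "H", "J"): 60,
    ("A", "B", "D", "H", "G", "J"): 45,
})
total_traces = sum(log.values())   # total number of recorded traces
```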

3.2 Particle Swarm Optimization

PSO, which imitates the social behavior of bird flocking, is one of the most popular CI algorithms [24]. The PSO algorithm is initialized with a population of particles randomly distributed in the search space. Each particle imitates a bird and has two properties, i.e. its position \(x_{i} = \left( {x_{i,1} ,x_{i,2} , \cdots ,x_{i,n} } \right)\) and its flying velocity \(v_{i} = \left( {v_{i,1} ,v_{i,2} , \cdots ,v_{i,n} } \right)\), where n is the dimension of the problem. In addition, each particle is assigned a fitness value calculated by an optimization function. In each generation, each particle updates its velocity and position using the following rules [25]:

$$ v_{i,j} = \omega v_{i,j} + c_{1} r_{1,j} \left( {pbest_{i,j} - x_{i,j} } \right) + c_{2} r_{2,j} \left( {gbest_{j} - x_{i,j} } \right) $$
(1)
$$ x_{i,j} = x_{i,j} + v_{i,j} $$
(2)

where \(pbest_{i,j}\) and \(gbest_{j}\) are the values of the j-th dimension of the best-so-far positions yielded by the i-th particle and by the whole population respectively. In addition, \(\omega\) is the inertia weight, \(r_{1,j}\) and \(r_{2,j}\) are two random numbers, and \(c_{1} ,c_{2}\) are two acceleration parameters which control the speed of flying toward the best-so-far positions found by the particle itself and by the population. In this way, particles learn from their self-cognition and social interaction to approximate the optimal solution of the problem.
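For reference, rules (1) and (2) amount to the following update loop; the parameter values (\(\omega = 0.7\), \(c_{1} = c_{2} = 2.0\)) are common defaults chosen for illustration, not values taken from the paper.

```python
import random

def update_particle(x, v, pbest, gbest, w=0.7, c1=2.0, c2=2.0):
    """One continuous PSO step: apply rule (1) then rule (2) per dimension."""
    new_x, new_v = [], []
    for j in range(len(x)):
        r1, r2 = random.random(), random.random()
        vj = w * v[j] + c1 * r1 * (pbest[j] - x[j]) + c2 * r2 * (gbest[j] - x[j])
        new_v.append(vj)            # rule (1): inertia + cognitive + social terms
        new_x.append(x[j] + vj)     # rule (2): move the particle
    return new_x, new_v
```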

4 Discrete Particle Swarm Optimization Algorithm for Process Mining

4.1 Framework of PSOMiner

Most CI algorithms for process mining follow the approach of genetic evolution, such as Genetic Miner, ETM, ProDiGen, etc. All of them suffer from premature convergence and slow convergence speed because of the poor search ability of their genetic operators.

In this work, we propose a novel CI algorithm for process mining based on the PSO technique, called PSOMiner. The flowchart of PSOMiner is shown in Fig. 2. Firstly, the algorithm initializes a population of particles by the heuristic method proposed in [7]. Then, the particles are evaluated by the fitness function. If a stopping criterion is met, the algorithm outputs the best solution. Otherwise, it checks whether the algorithm has fallen into a local optimum (i.e. premature convergence). If so, the algorithm applies the guided local mutation to help itself escape from the local optimum; otherwise it applies the PSO algorithm to quickly approach the optimal solution. There are two termination conditions: (1) the number of consecutive generations in which the best fitness value has not changed exceeds a predefined threshold, or (2) the number of generations reaches an upper bound. Premature convergence is detected via the variance of the fitness values of all particles. The details of the PSO algorithm as well as the guided local mutation technique are introduced below.
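The decision flow of Fig. 2 can be summarized by the following pseudocode; every name here is a placeholder for a component described later in Sect. 4, not an API from the paper.

```
procedure PSOMiner(log):
    swarm <- initialize_population(log)            // heuristic initialization [7]
    loop:
        evaluate fitness of every particle         // formula (3)
        update the best-so-far solution
        if best fitness unchanged for too many generations,
           or the generation limit is reached:
            return the best solution               // termination conditions (1), (2)
        if variance of the fitness values is low:  // premature convergence detected
            apply guided local mutation
        else:
            apply PSO velocity/position update     // Sect. 4.3
```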

Fig. 2 The flowchart of PSOMiner

4.2 Fitness Function

In this work, particles are evaluated by a fitness function composed of three indicators: completeness \(F_{c}\), precision \(F_{p}\) and simplicity \(F_{s}\). The formula is given below, where \(\lambda_{c} ,\lambda_{p} ,\lambda_{s}\) are three weight coefficients.

$$ fitness = \lambda_{c} \times F_{c} + \lambda_{p} \times F_{p} + \lambda_{s} \times F_{s} $$
(3)

The completeness \(F_{c}\) quantifies the extent to which a process model can properly replay the traces in an event log. For an event log without noise, the completeness of a perfect process model should be 1.0, i.e. all traces can be properly replayed on the process model. A simple way to define the completeness metric is the number of correctly replayed traces divided by the total number of traces. However, this definition is too coarse. Consider two process models, one totally incorrect and the other missing just one arc: the above method cannot distinguish the two models because neither of them can properly replay the event log. This work therefore follows the definition of completeness in [7], shown in formula (4). The function \(allParsedActivities\left( {\Pi ,L} \right)\) returns the total number of activities in the event log \(L\) that can be parsed without problems by the causal matrix \(\Pi\), and \(numOfActivitiesInLog\left( L \right)\) gives the total number of activities in \(L\).

$$ F_{c} = \frac{{allParsedActivities\left( {\Pi ,L} \right)}}{numOfActivitiesInLog\left( L \right)} $$
(4)
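Assuming the per-trace replay results are already available (the replay procedure itself depends on the causal-matrix semantics and is not shown), formula (4) reduces to a weighted ratio. The function and argument names below are our own sketch.

```python
def completeness(parsed, length, counts):
    """Formula (4): fraction of activity occurrences parsed without problems.

    parsed[t]: activities of trace t replayed without problems
    length[t]: total number of activities in trace t
    counts[t]: multiplicity of trace t in the event log
    """
    all_parsed = sum(parsed[t] * counts[t] for t in counts)
    total = sum(length[t] * counts[t] for t in counts)
    return all_parsed / total
```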

A process model may not be a good solution even if its completeness is 1.0. An extreme instance is the flower model, i.e. a model in which, for any activity, the inputs and outputs are all the remaining activities. The flower model can replay arbitrary traces, so its completeness value is the best possible; however, it is meaningless. Precision is an indicator quantifying the fraction of the behavior allowed by the model that is not seen in the event log. However, it is difficult to give an exact definition of precision because that would require detecting all the extra behavior in the process model. Therefore, our definition of precision takes into account all the activities that are enabled while the model replays the event log (see formula 5). The function \(allEnabledActivities\left( {\Pi ,L} \right)\) returns the number of activities enabled while the log \(L\) is replayed on the model \(\Pi\).

$$ F_{p} = 1 - \frac{{allEnabledActivities\left( {\Pi ,L} \right)}}{{\max_{\Pi ^{\prime} \in \Omega } \left( {allEnabledActivities\left( {\Pi ^{\prime},L} \right)} \right)}} $$
(5)
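In code, formula (5) needs only the enabled-activity count of the current model and the counts of all models in the population \(\Omega\); this sketch (with our own names) makes the normalization by the worst model explicit.

```python
def precision(enabled, population_enabled):
    """Formula (5): fewer enabled activities during replay means higher precision.

    enabled: allEnabledActivities(model, log) for the current model
    population_enabled: the same count for every model in the population
    """
    return 1 - enabled / max(population_enabled)
```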

The third dimension of the fitness is simplicity, which measures the complexity of a process model [10]. The definition of the indicator is shown in formula (6), where the symbols \(\phi\) and \(\psi\) denote the subsets forming the AND-join and AND-split structures respectively. As introduced above, AND-split (or AND-join) structures are modeled by activities in different subsets of \(O\left( a \right)\) (or \(I\left( a \right)\)). Therefore, what we need to do is to sum the sizes of the subsets in \(I\left( a \right)\) and \(O\left( a \right)\).

$$ F_{s} = 1 - \frac{{\sum\nolimits_{a \in A} {\left( {\sum\nolimits_{\phi \in I\left( a \right)} {\left| \phi \right|} + \sum\nolimits_{\psi \in O\left( a \right)} {\left| \psi \right|} } \right)} }}{{\max_{\Pi \in \Omega } \left( {\sum\nolimits_{a \in A} {\left( {\sum\nolimits_{\phi \in I\left( a \right)} {\left| \phi \right|} + \sum\nolimits_{\psi \in O\left( a \right)} {\left| \psi \right|} } \right)} } \right)}} $$
(6)
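With the causal matrix encoded as sets of subsets (as in Sect. 3.1), formula (6) can be sketched as follows; as in formula (5), the denominator normalizes by the most complex model in the population. Function names are ours.

```python
def raw_complexity(I, O):
    # Sum of subset sizes over every activity's input and output conditions.
    return sum(len(s) for a in I for s in I[a]) + \
           sum(len(s) for a in O for s in O[a])

def simplicity(model_raw, population_raws):
    # Formula (6): simpler models (smaller raw complexity) score closer to 1.
    return 1 - model_raw / max(population_raws)
```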

4.3 PSO Algorithm

Different from S-PSO [24, 25], a particle in PSOMiner is naturally represented by a causal matrix, which is more complex than the representation of particles in most S-PSO works. All the concepts and operators, such as position, velocity, position updating and velocity updating, need to be redefined. These new concepts and operators are introduced next.

4.3.1 Position

In PSO, the position of a particle usually represents a solution to the problem. In this work, the position of a particle is naturally represented by the causal matrix. Formally, the position is denoted by a tuple \(Pos = \left( {Pos^{I} ,Pos^{O} } \right)\), where \(Pos^{I} = \left\{ {I\left( a \right)\left| {a \in A} \right.} \right\}\) is a set that represents the input structure of each node and \(Pos^{O} = \left\{ {O\left( a \right)\left| {a \in A} \right.} \right\}\) is a set that represents the output structure of each node. Take the process model in Fig. 1 as an example: \(Pos^{I} = \left\{ {\left\{ {} \right\},\left\{ {\left\{ {A,C} \right\}} \right\},\left\{ {\left\{ B \right\}} \right\},\left\{ {\left\{ B \right\}} \right\},\left\{ {\left\{ B \right\}} \right\},\left\{ {\left\{ B \right\}} \right\},\left\{ {\left\{ D \right\}} \right\},\left\{ {\left\{ D \right\}} \right\},\left\{ {\left\{ {E,F} \right\}} \right\},\left\{ {\left\{ G \right\},\left\{ H \right\}} \right\},\left\{ {\left\{ I \right\}} \right\}} \right\}\). It can be seen that the representation of a particle position in PSOMiner is much more complex than that in traditional S-PSO: the former contains subsets of arbitrary length, while the latter contains subsets of fixed length (such as {(a, b), (b, c), …}).

4.3.2 Velocity

In PSO, the velocity determines the speed and direction with which a particle updates its position. Similar to S-PSO, the velocity in this work is defined as a set of elements with possibilities. The definition of the velocity \(V\) is shown in formula (7).

$$ V = \left\{ {e/p\left( e \right)\left| {e \in P\left( A \right)} \right.} \right\} $$
(7)

where \(e\) is an element of the power set of the activities \(A\) (i.e. \(e \subseteq A\)) and \(p\left( e \right) \in \left[ {0,1} \right]\) is the possibility of the subset. Note that an element with possibility zero is ignored for the sake of convenience.

4.3.3 Velocity Update

The velocity updating rule in PSOMiner (see formula 8) differs from the traditional one because we do not consider the velocity of the last generation. The reason for this choice is that the algorithm selects only one node to update in each generation, so the historical velocity provides little information for the search for the optimal solution.

$$ V_{i,j} = c_{1} r_{1,j} \left( {pbest_{i,j} - Pos_{i,j} } \right) + c_{2} r_{2,j} \left( {gbest_{j} - Pos_{i,j} } \right) $$
(8)

The symbols \(V_{i,j}\) and \(Pos_{i,j}\) denote the velocity and the position of the j-th node (including the input set and the output set) of the i-th particle respectively, \(pbest_{i,j}\) is the j-th node of the historically best position found by the i-th particle, and \(gbest_{j}\) is the j-th node of the historically best position found by all particles. The meanings of the other symbols are the same as in traditional PSO. Next, all the related operators are introduced in detail.

(1) Position—Position (e.g.\(pbest_{i,j} - Pos_{i,j}\))

It has been explained that the position of each particle contains two factors, i.e. \(Pos_{i}^{I}\) and \(Pos_{i}^{O}\). Both of them are sets composed of subsets of \(P\left( A \right)\). This section makes no distinction between them, i.e. all the definitions proposed in this section can be applied to both factors. A minus operator between two positions is given below.

$$ Pos_{i,j} - Pos_{k,j} = \left\{ {e\left| {e \in Pos_{i,j} \wedge e \notin Pos_{k,j} } \right.} \right\} $$
(9)

For explanation, an example is shown in Fig. 3, in which three different structures of the output of node “B” are given. Let \(gbest_{j}^{O} = \left\{ {\left\{ A \right\},\left\{ {C,D} \right\},\left\{ E \right\}} \right\}\), \(pbest_{i,j}^{O} = \left\{ {\left\{ {C,D,E} \right\}} \right\}\) and \(Pos_{i,j}^{O} = \left\{ {\left\{ {C,D} \right\},\left\{ {D,E} \right\}} \right\}\); then \(gbest_{j}^{O} - Pos_{i,j}^{O} = \left\{ {\left\{ A \right\},\left\{ E \right\}} \right\}\) and \(pbest_{i,j}^{O} - Pos_{i,j}^{O} = \left\{ {\left\{ {C,D,E} \right\}} \right\}\). This operator finds the promising structures that appear in a good solution but do not appear in the current solution.
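With positions encoded as sets of frozensets, the minus operator of formula (9) is plain set difference; the sketch below reproduces the running example (encoding ours).

```python
def pos_minus(p1, p2):
    # Formula (9): keep the subsets present in p1 but absent from p2.
    return p1 - p2

gbest = {frozenset("A"), frozenset("CD"), frozenset("E")}
pbest = {frozenset("CDE")}
pos   = {frozenset("CD"), frozenset("DE")}
d_g = pos_minus(gbest, pos)   # {{A}, {E}}
d_p = pos_minus(pbest, pos)   # {{C, D, E}}
```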

Fig. 3 An example for explaining the operators in PSOMiner

(2) Real number × Position (e.g. \(c_{1} r_{1,j} \left( {pbest_{i,j} - Pos_{i,j} } \right)\))

The multiplication operator between a positive coefficient \(c\) and a position \(Pos\) yielded in the last step is defined by formula (10).

$$ c \times Pos = \left\{ {{e \mathord{\left/ {\vphantom {e {p\left( e \right)\left| {e \in Pos} \right.}}} \right. \kern-\nulldelimiterspace} {p\left( e \right)\left| {e \in Pos} \right.}}} \right\},p\left( e \right) = \left\{ \begin{gathered} c{\text{ if }}c < 1 \hfill \\ 1{\text{ if }}c \ge 1 \hfill \\ \end{gathered} \right. $$
(10)

By multiplying by a positive coefficient \(c\), the result of the minus operation between two positions becomes a set with possibilities, i.e. it is transformed into a velocity. Continuing the example above, assume \(c_{1} r_{1,j} = 0.92\) and \(c_{2} r_{2,j} = 0.84\); then \(c_{1} r_{1,j} \left( {pbest_{i,j}^{O} - Pos_{i,j}^{O} } \right) = \left\{ {{{\left\{ {C,D,E} \right\}} / {0.92}}} \right\}\) and \(c_{2} r_{2,j} \left( {gbest_{j}^{O} - Pos_{i,j}^{O} } \right) = \left\{ {{{\left\{ A \right\}} / {0.84}},{{\left\{ E \right\}} / {0.84}}} \right\}\).
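Representing a velocity as a dict from subset to possibility (our own encoding), formula (10) becomes a one-line tagging operation; the numbers continue the paper's running example.

```python
def scale(c, pos):
    # Formula (10): tag every subset with possibility min(c, 1).
    return {e: min(c, 1.0) for e in pos}

v1 = scale(0.92, {frozenset("CDE")})                 # c1 * r1 = 0.92
v2 = scale(0.84, {frozenset("A"), frozenset("E")})   # c2 * r2 = 0.84
```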

(3) Velocity + Velocity (e.g. \(c_{1} r_{1,j} \left( {pbest_{i,j} - Pos_{i,j} } \right) + c_{2} r_{2,j} \left( {gbest_{j} - Pos_{i,j} } \right)\))

The addition operator between two velocities is given by formula (11).

$$ V_{1} + V_{2} = \left\{ \begin{gathered} \left\{ {{{e_{1} } \mathord{\left/ {\vphantom {{e_{1} } {p\left( {e_{1} } \right),}}} \right. \kern-\nulldelimiterspace} {p\left( {e_{1} } \right),}}{{e_{{2}} } \mathord{\left/ {\vphantom {{e_{{2}} } {p\left( {e_{2} } \right)}}} \right. \kern-\nulldelimiterspace} {p\left( {e_{2} } \right)}}\left| {e_{1} \cap e_{2} = \emptyset } \right.} \right\} \hfill \\ \left\{ {{{e_{1} } \mathord{\left/ {\vphantom {{e_{1} } {p\left( {e_{1} } \right)}}} \right. \kern-\nulldelimiterspace} {p\left( {e_{1} } \right)}}\left| {e_{1} \cap e_{2} \ne \emptyset \wedge p\left( {e_{1} } \right) \ge p\left( {e_{2} } \right)} \right.} \right\} \hfill \\ \left\{ {{{e_{2} } \mathord{\left/ {\vphantom {{e_{2} } {p\left( {e_{2} } \right)}}} \right. \kern-\nulldelimiterspace} {p\left( {e_{2} } \right)}}\left| {e_{1} \cap e_{2} \ne \emptyset \wedge p\left( {e_{1} } \right) < p\left( {e_{2} } \right)} \right.} \right\} \hfill \\ \end{gathered} \right. $$
(11)

where \({{e_{1} } / {p\left( {e_{1} } \right)}} \in V_{1}\) and \({{e_{2} } / {p\left( {e_{2} } \right)}} \in V_{2}\). The principle behind this operator is simple: the structure with the higher possibility is kept for the position update. Since the possibilities are generated randomly, this method enhances the diversity of the particles. Continuing the above example, \(c_{1} r_{1,j} \left( {pbest_{i,j} - Pos_{i,j} } \right) + c_{2} r_{2,j} \left( {gbest_{j} - Pos_{i,j} } \right) = \left\{ {{{\left\{ A \right\}} / {0.84}},{{\left\{ {C,D,E} \right\}} / {0.92}}} \right\}\). Based on these three operators, the velocity updating rule given by formula (8) can be performed.
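A sketch of the addition operator of formula (11), with velocities encoded as dicts from subset to possibility (encoding ours): disjoint subsets are both kept, and where subsets overlap only the higher-possibility one survives.

```python
def add_velocities(v1, v2):
    """Formula (11): merge two velocities, resolving overlaps by possibility."""
    out = dict(v1)
    for e2, p2 in v2.items():
        clashes = [e1 for e1 in out if e1 & e2]   # subsets overlapping e2
        if not clashes:
            out[e2] = p2                          # disjoint: keep both
        elif all(p2 > out[e1] for e1 in clashes):
            for e1 in clashes:                    # e2 beats every overlap
                del out[e1]
            out[e2] = p2
    return out

v = add_velocities({frozenset("CDE"): 0.92},
                   {frozenset("A"): 0.84, frozenset("E"): 0.84})
# {E} overlaps {C,D,E} with a lower possibility, so it is dropped.
```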

4.3.4 Position Update

So far, a velocity is ready for updating the current position of the particle. However, it is impossible to simply add a velocity to a position because the former contains possibilities. We employ the method proposed in [25] to remove the possibilities from the velocity and, at the same time, filter out the elements whose possibilities are lower than a threshold \(\alpha\) (see formula 12).

$$ V^{\alpha } = \left\{ {e\left| {{e \mathord{\left/ {\vphantom {e {p\left( e \right)}}} \right. \kern-\nulldelimiterspace} {p\left( e \right)}} \in V \wedge p\left( e \right) \ge \alpha } \right.} \right\} $$
(12)

Then, the obtained \(V^{\alpha }\) can be used to update the position of the particle. The operator is given in formula (13).

$$ Pos_{i} + V_{i}^{\alpha } = \left\{ e \mid e \in \left\{ e^{\prime} / { \cup }V_{i}^{\alpha } \mid e^{\prime} \in Pos_{i} \right\} \vee e \in V_{i}^{\alpha } \right\} $$
(13)

where \(\cup\) denotes the generalized union operation and \(e^{\prime} / { \cup }V_{i}^{\alpha }\) means removing the elements of \(\cup V_{i}^{\alpha }\) from \(e^{\prime}\). The purpose is to keep the promising structures in \(V^{\alpha }\). Continuing the above example, let \(V_{i,j}^{0.8} = \left\{ {\left\{ A \right\},\left\{ {C,D,E} \right\}} \right\}\); thus \(\cup V_{i,j}^{0.8} = \left\{ {A,C,D,E} \right\}\) and \(Pos_{i,j} + V_{i,j}^{0.8} = \left\{ {\left\{ {C,D} \right\},\left\{ {D,E} \right\}} \right\} + \left\{ {\left\{ A \right\},\left\{ {C,D,E} \right\}} \right\} = \left\{ {\left\{ A \right\},\left\{ {C,D,E} \right\}} \right\}\). The result is shown in Fig. 4.
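The thresholding of formula (12) and the position update of formula (13) can be sketched together. This is a minimal illustration with assumed data structures (a position as a list of activity sets, a velocity as a dict from frozen sets to possibilities); the function names are ours.

```python
def threshold_velocity(v, alpha=0.8):
    """Keep only velocity structures whose possibility reaches the
    threshold alpha, dropping the possibilities (formula 12 sketch)."""
    return [e for e, p in v.items() if p >= alpha]

def update_position(pos, v_alpha):
    """Apply a thresholded velocity to a position (formula 13 sketch).

    Every element of the generalized union of v_alpha is removed from
    the old structures, then the new structures are added.
    """
    union = set().union(*v_alpha) if v_alpha else set()
    new_pos = [e - union for e in pos]       # e' / ∪V^α
    new_pos = [e for e in new_pos if e]      # drop structures emptied out
    for e in v_alpha:                        # keep the promising structures
        if e not in new_pos:
            new_pos.append(set(e))
    return new_pos
```

Replaying the paper's example: `update_position([{'C','D'}, {'D','E'}], [{'A'}, {'C','D','E'}])` yields `[{'A'}, {'C','D','E'}]`, matching Fig. 4.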

Fig. 4
figure 4

The result of position update

4.4 Model Repair

Usually, the new particle is not consistent. For example, let the j-th and k-th activities of the i-th particle be “B” and “E”, respectively; it is possible to obtain \(Pos_{i,j}^{O} = \left\{ {\left\{ {C,E} \right\},\left\{ F \right\}} \right\}\) while \(Pos_{i,k}^{I}\) does not contain activity “B”. In addition, the input of the “start” activity as well as the output of the “end” activity may not be empty. Therefore, a repair operation is required after the position update. First, we give a definition of consistency to determine whether a causal matrix is consistent.

Definition 5: Let \(\Pi = \left( {A,C,I,O} \right)\) be a causal matrix. It is consistent if it satisfies the following conditions:

  • \(O\left( {start} \right) \ne \emptyset \wedge I\left( {start} \right) = \emptyset\), where start is the initial activity;

  • \(I\left( {end} \right) \ne \emptyset \wedge O\left( {end} \right) = \emptyset\), where end is the last activity;

  • for \(\forall a^{\prime} \in \cup I\left( a \right)\),\(a \in \cup O\left( {a^{\prime}} \right)\), where \(a,a^{\prime} \in A \wedge a \ne start \wedge a^{\prime} \ne end\);

  • for \(\forall a^{\prime} \in \cup O\left( a \right)\),\(a \in \cup I\left( {a^{\prime}} \right)\), where \(a,a^{\prime} \in A \wedge a \ne end \wedge a^{\prime} \ne start\).

Next, the pseudo code of particle repair is shown in algorithm particle_repair(\(\Pi\), \(a\)). The algorithm first sets \(I\left( {start} \right)\) and \(O\left( {end} \right)\) to empty. Then, it traverses all nodes \(a^{\prime} \in A\) in the particle and repairs \(I\left( {a^{\prime}} \right)\) and \(O\left( {a^{\prime}} \right)\) iteratively.
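The consistency conditions of Definition 5 can be sketched as a check like the following. The encoding of I and O as dictionaries mapping each activity to a list of activity sets is an assumption for illustration; the repair procedure itself is given by the paper's pseudo code.

```python
def is_consistent(activities, I, O, start, end):
    """Check the consistency conditions of Definition 5 for a causal
    matrix. I and O map each activity to a list of sets of activities
    (its input/output structures)."""
    union = lambda sets: set().union(*sets) if sets else set()
    # condition 1: O(start) nonempty and I(start) empty
    if not O[start] or I[start]:
        return False
    # condition 2: I(end) nonempty and O(end) empty
    if not I[end] or O[end]:
        return False
    for a in activities:
        if a != start:
            # every input dependency must be mirrored by an output
            for a2 in union(I[a]):
                if a not in union(O[a2]):
                    return False
        if a != end:
            # every output dependency must be mirrored by an input
            for a2 in union(O[a]):
                if a not in union(I[a2]):
                    return False
    return True
```

For instance, a sequential model start → B → end passes the check, while removing B's output structure breaks the mirroring condition and the check fails.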

figure a
figure b

4.5 Local Mutation

Although PSO is a stochastic algorithm with strong search ability, it often suffers from premature convergence. In this work, a guided local mutation method is proposed to help the algorithm escape the local optimum when premature convergence is detected during the evolution.

To guide the local mutation, the incorrect node in a particle needs to be located. Conformance checking is a hot topic in process mining. The most popular technique for conformance checking is based on alignment [26], in which the A* algorithm is used to find the optimal alignment between a process model and an event trace. However, this method is not appropriate in this work, because finding the optimal alignment is too time-consuming. Besides, it can only locate an incorrect area rather than a specific node, which is too coarse to guide the local mutation.

In this paper, we assign each node a score using a fine-grained method (see formula 14). Besides accurately locating the incorrect node, another advantage of this method is that the score can be obtained along with the calculation of the fitness value of the particles [27], so it performs better than the alignment-based method. The formula contains three factors: allTokenParsedOnActivity, allMissingTokenOnActivity and allTokenLeftOnActivity. They represent, respectively, the total number of tokens parsed on the activity, the number of missing tokens on the activity and the number of tokens left unconsumed on the activity during the log replay.

$$ Score_{a} = \frac{\# allMissingTokenOnActivity + \# allTokenLeftOnActivity}{{\# allTokenParsedOnActivity}} $$
(14)

The pseudo code of the local mutation is given in Algorithm 2. For each particle, it first selects an activity by the roulette wheel strategy: the higher the score of a node, the more likely it is to be selected. Then, the algorithm randomly selects a mutation strategy, namely adding an arc, deleting an arc or redistributing the structure, and applies it to the selected node.
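The scoring of formula (14) and the roulette wheel selection used by the local mutation can be sketched as follows. The function names and the tie-handling when all scores are zero are our assumptions; only the score formula and the proportional-selection idea come from the paper.

```python
import random

def activity_score(missing, left, parsed):
    """Error score of an activity during log replay (formula 14).
    A higher score means the node is more likely to be incorrect."""
    return (missing + left) / parsed if parsed else 0.0

def roulette_select(scores, rng=random.random):
    """Pick an activity with probability proportional to its score
    (the roulette wheel strategy). `scores` maps activity -> score."""
    total = sum(scores.values())
    if total == 0:                 # all nodes look correct: pick uniformly
        return random.choice(list(scores))
    r = rng() * total
    acc = 0.0
    for activity, s in scores.items():
        acc += s
        if r <= acc:
            return activity
    return activity                # guard against floating-point rounding
```

For example, with scores {A: 0.1, B: 0.9}, B is selected nine times more often than A on average.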

figure c

5 Experiments

This section gives the experimental results and their analysis. The experiments consist of three parts. First, we compared the search ability and convergence speed of different combinations of strategies to prove the effectiveness of PSOMiner. Second, we compared PSOMiner with three state-of-the-art process mining algorithms on a group of noise-free event logs and a group of noisy event logs to evaluate the performance and the anti-noise ability of the proposed algorithm, respectively. Finally, PSOMiner was run on real-world event logs to further validate its performance.

5.1 Event Logs and Experimental Settings

There are 32 different event logs used in the experiments, divided into three groups. The first group contains 14 noise-free event logs generated by different process models [28]Footnote 2. The process models were produced by the CPN Tool and have different structural characteristics, such as choice structures, parallel structures and loop structures of different lengths. The details of the event logs are shown in Table 2, in which the activities column gives the number of activities contained in each process model and the events column gives the number of events simulated by the CPN Tool. Each event log contains 300 traces. The event logs of the second group were produced by randomly adding noise to the event logs of the first group: with a probability of 10%, we randomly applied one of the following three operations to a trace, namely swapping two adjacent events, adding an event, or removing an event. The third group contains four event logs from real production environments, named Sepsis, BPIC2012, BPIC2013cp and BPIC2013inc respectively.Footnote 3 The latter three event logs are from the BPI Challenge. For example, the BPIC2013cp event log is from the IT problem management system of Volvo and contains 6660 events forming 1487 traces. The Sepsis event log is from the information system of a hospital, which records the process of diagnosis and treatment of sepsis; it contains 15,214 events forming 1050 traces.
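The noise-injection procedure for the second group can be sketched as follows. This is an illustrative reconstruction: the paper specifies only the three operations and the 10% probability, so the remaining details (e.g. which event is inserted by the "add" operation) are assumptions.

```python
import random

def add_noise(trace, rng=random):
    """Inject noise into one trace by randomly applying one of three
    operations: swap two adjacent events, add a (duplicated) event,
    or remove an event."""
    trace = list(trace)
    op = rng.choice(['swap', 'add', 'remove'])
    if op == 'swap' and len(trace) >= 2:
        i = rng.randrange(len(trace) - 1)
        trace[i], trace[i + 1] = trace[i + 1], trace[i]
    elif op == 'add' and trace:
        # assumption: the added event duplicates an existing one
        trace.insert(rng.randrange(len(trace) + 1), rng.choice(trace))
    elif op == 'remove' and trace:
        del trace[rng.randrange(len(trace))]
    return trace

def noisy_log(log, p=0.10, rng=random):
    """Apply add_noise to each trace with probability p (10% here)."""
    return [add_noise(t, rng) if rng.random() < p else list(t) for t in log]
```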

Table 2 Event logs

The parameter settings are shown in Table 3. ParticleSize is set to 1–2 times the number of activities in the event log; the reason is the heuristic population-initialization algorithm, which filters out the causal matrices that do not satisfy the constraints. MaxGeneration is the upper bound on the number of generations, and MaxNotChanged means the algorithm stops once the number of consecutive generations in which the best fitness value does not change exceeds this threshold. Parameters \(\lambda_{c} ,\lambda_{p} ,\lambda_{s}\) are set to 0.7, 0.2 and 0.1, respectively, because whether a process model can reproduce the event log is the characteristic users care about most. Parameter \(\mu\) is used for premature-convergence detection: if the variance of the fitness values of the particles falls below 0.1, we consider the algorithm to have converged prematurely and apply the guided local mutation strategy to help it escape the local optimum. Parameter \(\alpha\) is set to 0.8. Parameters \(c_{1} ,c_{2}\) are set to 2, because the efficiency of the algorithm would be greatly decreased if \(c\) were too small (e.g. below 1.0), as there would be a high probability that \(c \times r\) is less than \(\alpha\); conversely, if \(c\) were too large, the randomness of the algorithm would be reduced.

Table 3 Setting of parameters

5.2 Effectiveness

In this section, we focus on validating the effectiveness of PSOMiner. Four combinations of strategies were evaluated: PSO + GM (Guided local Mutation, a.k.a. PSOMiner), PSO + RM (Random local Mutation), PSO and GA (genetic algorithm). GA is the Genetic Miner proposed in [7]; for comparison, we modified its fitness function and stopping conditions but left the rest unchanged. The metrics are the best fitness value achieved by each algorithm and the number of generations needed to achieve it. To ensure the reliability of the results, each algorithm was run 20 times and the arithmetic averages of the results were calculated.

Experimental results are shown in Figs. 5 and 6. It is easy to see that the search ability of PSO is worse than that of GA, because the GA algorithm employs optimization techniques that enhance its search ability. Moreover, both the number of generations and the achieved fitness value of PSO are the lowest among the four algorithms, which demonstrates that PSO easily falls into local optima. The quality of the results achieved by PSO + RM is slightly worse than that achieved by GA: it is inferior to GA on 9 of the 14 event logs, but its convergence is obviously faster. Finally, compared with PSO + RM and GA, PSO + GM achieves much better results in both fitness value and convergence speed: it achieves the best result on 12 of the 14 event logs and ties for the best on the remaining 2, and it converges faster than both PSO + RM and GA. These experimental results prove the effectiveness of PSOMiner.

Fig. 5
figure 5

Fitness value achieved by four different methods

Fig. 6
figure 6

Number of generations of four different algorithms

5.3 Comparison with State-of-the-Art Algorithms

This section compares PSOMiner with three state-of-the-art process mining algorithms, ETM, Hybrid ILP Miner and HM, to evaluate the performance of the proposed algorithm. We use ProM 6.9 [29], the most popular platform for process mining; the parameters of the three comparison algorithms are set to their defaults. It should be pointed out that the output models of the three algorithms differ: ETM outputs a process tree, Hybrid ILP Miner a Petri net, and HM a heuristic net. Fortunately, ProM 6.9 provides functions to convert process trees and heuristic nets into Petri nets. For comparison, we convert the obtained Petri nets to causal matrices based on Definition 3.

To evaluate process models represented by causal matrices, we employ four metrics proposed in [7]: behavior precision (BP), behavior recall (BR), structural precision (SP) and structural recall (SR). The former two metrics require replaying an event log on both the obtained process model and the original process model. BP evaluates how much behavior is allowed by the obtained model but not by the original model; BR evaluates the opposite. If the obtained model fits the original model perfectly, both BP and BR will be 1.0. The latter two metrics are based on the causal relations of the obtained and original process models: SP evaluates how many causal relations the obtained model has that are not in the original model, and SR evaluates the opposite.

In the experiments, both PSOMiner and ETM were run 20 times because they are stochastic algorithms, and the arithmetic means of the four metrics of the output models were calculated. First, the experiments were performed on the event logs of the first group (i.e. the noise-free event logs). The results are shown in Table 4, with the optimal values in bold to facilitate the analysis. From the table, it is easy to see that ETM, Hybrid ILP Miner, HM and PSOMiner mine the best process model (i.e. one identical to the original model) on 2, 8, 10 and 12 event logs, respectively. We can conclude that PSOMiner outperforms the other three state-of-the-art process mining algorithms on noise-free event logs.

Table 4 Comparing PSOMiner with three state-of-the-art algorithms on noise-free event logs

The proposed algorithm lost on only two event logs, ma7 and Herbst6p18. To find out why PSOMiner did not achieve the best results on them, the most frequently obtained result on ma7 was analyzed in depth (see Fig. 7). In the figures, OR-S, OR-J, AND-S and AND-J denote OR-Split, OR-Join, AND-Split and AND-Join structures, respectively. The incorrect part is the input of activity “D”, highlighted in red. The original input of activity “D” is \(I\left( D \right) = \left\{ {\left\{ {C,E} \right\},\left\{ {E,F,B} \right\},\left\{ G \right\}} \right\}\), whereas the obtained one is \(I\left( D \right) = \left\{ {\left\{ {C,E} \right\},\left\{ {F,B} \right\},\left\{ G \right\}} \right\}\). The reason for this phenomenon is that the position update operation removes all elements of \(\cup V_{i}^{\alpha }\) from the particle’s position of the last generation. For example, assume \(Pos = \left\{ {\left\{ {C,E} \right\},\left\{ {E,F,B} \right\}} \right\}\) and \(V^{\alpha } = \left\{ {\left\{ {C,E} \right\},\left\{ G \right\}} \right\}\); then \(Pos + V^{\alpha } = \left\{ {\left\{ {C,E} \right\},\left\{ {F,B} \right\},\left\{ G \right\}} \right\}\), so the intersection of the two sets is removed. The proposed algorithm suffers from the same problem on the other event log.

Fig. 7
figure 7

The original process model of a7 (left) and the obtained process model of a7 (right)

Furthermore, the experiments were performed on the 14 event logs with 10% noise. Experimental results are given in Table 5. From the results, we can see that PSOMiner achieves the best performance on most of the metrics over these event logs. However, this alone cannot prove that the process models mined by PSOMiner are better than those of the other algorithms, because a process model may score very well on three of the four metrics but very poorly on the last one. For fairness, we propose a new metric, \(fscore\), which integrates the four metrics (see formula 15). The results are given in Fig. 8. PSOMiner is obviously superior to the other three algorithms on 8 event logs and slightly superior on 5, and it is only slightly inferior to Hybrid ILP Miner on the last event log (i.e. L2LS). The results demonstrate that PSOMiner has good anti-noise ability.

$$ fscore = \frac{{B_{p} \cdot B_{r} }}{{B_{p} + B_{r} }} + \frac{{S_{p} \cdot S_{r} }}{{S_{p} + S_{r} }} $$
(15)
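Formula (15) translates directly into code; a minimal sketch (the guards against zero denominators are our addition):

```python
def fscore(bp, br, sp, sr):
    """Combined quality metric integrating behavior precision/recall
    and structural precision/recall (formula 15)."""
    behavioral = bp * br / (bp + br) if bp + br else 0.0
    structural = sp * sr / (sp + sr) if sp + sr else 0.0
    return behavioral + structural
```

A model that matches the original perfectly (all four metrics equal to 1.0) scores 1.0, the maximum of this metric.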
Table 5 Comparing PSOMiner with three state-of-the-art algorithms on event logs with 10% noise
Fig. 8
figure 8

f-score of four algorithms on event logs with 10% noise

5.4 Results on Real Data Sets

This section gives the results of experiments performed on the four real event logs. Since the underlying process models of these event logs are unknown, the metrics used in the previous sections are no longer applicable. Instead, three alignment-based metrics, namely alignment-based fitness, alignment-based precision and their combination (i.e. f-score) [30], were employed to evaluate the performance of PSOMiner on real event logs. Moreover, for fairness, we refer to the benchmark results given in [31]. The output of PSOMiner (i.e. a causal matrix) was converted to Petri nets based on the method in [7], since the evaluation tool only supports Petri nets as input.

PSOMiner was run 20 times on each event log. The results are shown in Table 6. Compared with the results of ETM in [31], it is easy to see that the quality of the process models mined by PSOMiner is superior to that of the models mined by ETM. Especially on BPIC2012, BPIC2013cp and BPIC2013inc, the f-score obtained by PSOMiner is much better than that obtained by ETM. Moreover, compared with the other state-of-the-art algorithms, the f-scores obtained by PSOMiner on the four event logs are also very competitive. These results prove that PSOMiner performs well on real event logs.

Table 6 The experimental results of performing PSOMiner on four real event logs

6 Conclusion and Future Work

In this paper, we propose a novel PSO algorithm, named PSOMiner, to solve the problem of process mining. Its key novelty is a series of operators based on the causal matrix designed for the position and velocity updates in PSOMiner. Besides that, a guided local mutation technique is embedded to address premature convergence.

We validated the performance of the algorithm through a large number of experiments. The experimental results show that (1) PSOMiner has strong search ability and fast convergence; (2) compared with three state-of-the-art process mining algorithms, PSOMiner achieves the optimal results on 12 of 14 noise-free event logs and 13 of 14 noisy event logs, demonstrating good performance and anti-noise ability; and (3) PSOMiner achieved competitive results in the last experiment, proving that it performs well on real-world event logs.

Except for the evaluation step, the steps of PSOMiner are easy to parallelize, so modern parallel or distributed techniques, e.g. GPUs or clusters, can be employed to further speed up the algorithm. Beyond that, CI-based process mining can be further studied from two aspects. On the one hand, current CI-based process mining techniques can be applied to deeper tasks in this field, such as cross-organizational process mining and blockchain-oriented process mining. On the other hand, recent CI techniques can be studied to propose new CI-based process mining algorithms with higher performance.