Scalable Alignment of Process Models and Event Logs: An Approach Based on Automata and S-Components

Given a model of the expected behavior of a business process and an event log recording its observed behavior, the problem of business process conformance checking is that of identifying and describing the differences between the process model and the event log. A desirable feature of a conformance checking technique is that it should identify a minimal yet complete set of differences. Existing conformance checking techniques that fulfill this property exhibit limited scalability when confronted with large and complex process models and event logs. One reason for this limitation is that existing techniques compare each execution trace in the log against the process model separately, without reusing computations made for one trace when processing subsequent traces. Yet, the execution traces of a business process typically share common fragments (e.g. prefixes and suffixes). A second reason is that these techniques do not integrate mechanisms to tackle the combinatorial state explosion inherent to process models with high levels of concurrency. This paper presents two techniques that address these sources of inefficiency. The first technique starts by transforming the process model and the event log into two automata. These automata are then compared based on a synchronized product, which is computed using an A* heuristic with an admissible heuristic function, thus guaranteeing that the resulting synchronized product captures all differences and is minimal in size. The synchronized product is then used to extract optimal (minimal-length) alignments between each trace of the log and the closest corresponding trace of the model. By representing the event log as a single automaton, this technique allows computations for shared prefixes and suffixes to be made only once. The second technique decomposes the process model into a set of automata, known as S-components, such that the product of these automata is equal to the automaton of the whole process model. A product automaton is computed for each S-component separately. The resulting product automata are then recomposed into a single product automaton capturing all the differences between the process model and the event log, but without minimality guarantees. An empirical evaluation using 40 real-life event logs shows that, used in tandem, the proposed techniques outperform state-of-the-art baselines in terms of execution times in a vast majority of cases, with improvements ranging from several-fold to one order of magnitude. Moreover, the decomposition-based technique leads to optimal trace alignments for the vast majority of datasets and close to optimal alignments for the remaining ones.


Introduction
Modern information systems maintain detailed business process execution trails. For example, an enterprise resource planning system keeps records of key events related to a company's order-to-cash process, such as the receipt and confirmation of purchase orders, the delivery of products, and the creation and payment of invoices. Such records can be grouped into an event log consisting of sequences of events (called traces), each consisting of all event records pertaining to one case of a process.
Process mining techniques [1] allow us to exploit such event logs in order to gain insights into the performance and conformance of business processes. One widely used family of process mining techniques is conformance checking [2]. A conformance checking technique takes as input a process model capturing the expected behavior of a business process, and an event log capturing its observed behavior. The goal of conformance checking is to identify and describe the differences between the process model and the event log. A desirable feature of a conformance checking technique is that it should identify a minimal yet complete set of behavioral differences.
Existing conformance checking techniques that fulfill these properties [3,4,5] exhibit limited scalability when confronted with large and complex event logs. For example, in a collection of 40 real-life event logs presented later in this paper, the execution times of these techniques are over 10 seconds in about a quarter of cases and over 5 seconds in about half of cases. This hampers the use of these techniques in interactive settings, as well as in use cases where conformance checking must be applied repeatedly, for example in automated process discovery [6], where several candidate models need to be compared by computing their conformance with respect to a given log.
This paper presents two complementary techniques to address these shortcomings. The first technique starts by transforming the process model and the event log into two automata. Specifically, the event log is transformed into a minimal Deterministic Acyclic Finite State Automaton (DAFSA), while the process model is transformed into another automaton, namely its reachability graph. These automata are then compared using an error-correcting synchronized product, computed via an A* heuristic with an admissible heuristic function, which guarantees that the resulting synchronized product captures all differences with a minimal number of error corrections. The synchronized product is then used to extract optimal (minimal-size) alignments between each trace of the log and the closest corresponding trace of the model, as well as statements describing the behavior observed in the log but not captured in the model, and vice versa, using the verbalization technique presented in [4].
A limitation of this first technique is that as the level of concurrency in the process model increases, the size of the automaton of the process model grows exponentially, thus hampering scalability. For example, consider the loan application process model displayed in Fig. 1 using BPMN notation. The process starts when a credit application is received; then the credit history, the income sources, personal identification and other financial information are checked. Once the application is assessed, either a credit offer is made, the application is rejected or additional information is requested (the latter leading to a re-assessment). Note that the number of possible interleavings of the parallel activities increases rapidly: in this case, the four tasks in parallel can be executed in 4! = 24 different orders, thus leading to a combinatorial explosion when computing an automaton from this process model. To address this shortcoming, the paper proposes a second technique wherein the process model is first decomposed into a set of automata, known as S-components, such that the product of these automata is equal to the automaton of the whole process model. An error-correcting product automaton is computed for each S-component separately and the resulting product automata are then recomposed into a single error-correcting product automaton capturing all the differences between the process model and the event log, but without minimality guarantees.
Coming back to the example in Fig. 1, the second technique starts by decomposing the model into four sub-models, each containing one of the four parallel tasks. Each of these concurrency-free models is then handled separately, thus avoiding the computation of all possible interleavings and reducing the search space for computing the minimal error-correcting synchronized product.
This paper is an extended and revised version of a previous conference paper [7]. This latter paper introduced the first technique mentioned above (automata-based conformance checking). With respect to the conference version, the additional contributions are the decomposition-based technique, an extensive empirical evaluation based on 40 real-life datasets, as well as correctness proofs both for the automata-based and the decomposition-based technique.
The next section discusses existing conformance checking techniques. Section 3 introduces definitions and notations related to finite state machines, Petri nets and event logs. Next, Section 4 introduces the automata-based technique, while Section 5 presents the technique based on S-component decomposition. Finally, Section 6 presents the empirical evaluation while Section 7 summarizes the contributions and discusses avenues for future work.
For the sake of uniformity, $L$ denotes a finite set of labels, $\tau$ is a special "silent" label and we write $\Sigma = L \cup \{\tau\}$. We use Z-notation [20] operators over sequences. Given a sequence $c = \langle x_1, x_2, \dots, x_n \rangle$, $|c|$ denotes its size, and head and tail retrieve the first and last element of the sequence, respectively, i.e., $|c| = n$, $\mathit{head}(c) = x_1$ and $\mathit{tail}(c) = x_n$. The element at index $i$ in the sequence $c$ is retrieved as $c[i] = x_i$. The operators for and after retrieve the elements up to and after index $i$ in a sequence, respectively. For example, $\mathit{for}(c, i) = \langle x_1, \dots, x_i \rangle$ and $\mathit{after}(c, i) = \langle x_{i+1}, \dots, x_n \rangle$. Finally, MultiSet denotes the multiset representation of a sequence.
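The sequence operators above can be illustrated with a minimal Python sketch. The function names `at`, `for_` and `multiset` are hypothetical stand-ins chosen here (Python reserves `for`), and tuples are assumed as the representation of sequences; indices are 1-based as in the text.

```python
from collections import Counter

def head(c):
    """First element of a non-empty sequence."""
    return c[0]

def tail(c):
    """Last element of a non-empty sequence (as defined in the text)."""
    return c[-1]

def at(c, i):
    """Element at 1-based index i, i.e. c[i] in the text."""
    return c[i - 1]

def for_(c, i):
    """Elements x_1..x_i (`for` is a Python keyword, hence the underscore)."""
    return c[:i]

def after(c, i):
    """Elements x_{i+1}..x_n."""
    return c[i:]

def multiset(c):
    """Multiset of the elements of c, with multiplicities."""
    return Counter(c)

c = ("B", "D", "C", "D", "G")
print(head(c), tail(c), at(c, 3), for_(c, 2), after(c, 2), multiset(c)["D"])
```

Note that `for_(c, i) + after(c, i)` always reconstructs `c`, which is the property the later definitions rely on when splitting a trace into an aligned prefix and a remaining suffix.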

Finite state machines
A pervasive concept in our approaches is that of a finite state machine (FSM), which is defined as follows.
Definition 3.1 (Finite State Machine (FSM)). Given the set of labels Σ, a finite state machine is a directed graph F = (N, A, s, R), where N is a finite non-empty set of states, A ⊆ N × Σ × N is a set of arcs, s ∈ N is an initial state, and R ⊆ N is a set of final states.
An arc in an FSM is a triplet $a = (n_s, l, n_t)$, where $n_s$ is the source state, $n_t$ is the target state and $l$ is the label associated to the arc. We define functions $\mathit{src}(a) = n_s$ to retrieve the source state, $\lambda(a) = l$ to retrieve the label and $\mathit{tgt}(a) = n_t$ to retrieve the target state of $a$. Furthermore, given a node $n \in N$ and an arc $a = (n_s, l, n_t) \in A$, let $n \triangleright a = n_t$ if $n = n_s$, and $n \triangleright a = n$ otherwise. The sets of incoming and outgoing arcs of a state $n$ are defined as $\blacktriangleright n = \{a \in A \mid n = \mathit{tgt}(a)\}$ and $n \blacktriangleright = \{a \in A \mid n = \mathit{src}(a)\}$, respectively. Finally, a sequence of (contiguous) arcs in an FSM is called a path.
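As an illustration of Definition 3.1 and the helper functions on arcs, the following is a minimal Python sketch. The class and method names (`Arc`, `FSM`, `incoming`, `outgoing`, `step`, `is_path`) are hypothetical and chosen here for readability; they are not taken from the authors' implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Arc:
    src: str    # source state n_s
    label: str  # label l
    tgt: str    # target state n_t

@dataclass
class FSM:
    states: set
    arcs: set
    initial: str
    finals: set

    def incoming(self, n):
        """Arcs whose target is n."""
        return {a for a in self.arcs if a.tgt == n}

    def outgoing(self, n):
        """Arcs whose source is n."""
        return {a for a in self.arcs if a.src == n}

    def step(self, n, a):
        """Target of a if a leaves n, otherwise n itself."""
        return a.tgt if a.src == n else n

    def is_path(self, arcs):
        """True if consecutive arcs are contiguous."""
        return all(x.tgt == y.src for x, y in zip(arcs, arcs[1:]))

a1, a2 = Arc("s", "A", "n1"), Arc("n1", "B", "f")
fsm = FSM({"s", "n1", "f"}, {a1, a2}, "s", {"f"})
print(fsm.outgoing("n1"), fsm.step("s", a1), fsm.is_path([a1, a2]))
```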

Petri nets
Process models are normative descriptions of business processes and define the expected behavior of the process. Over the years, several business process modelling languages have been proposed, such as Petri nets, BPMN and EPC. In the context of this work, business processes are represented as a particular family of Petri nets, namely labelled free-choice sound workflow nets. This formalism uses transitions to represent activities, and places to represent resource containers. The formal definition of labelled Petri nets is given next.
Definition 3.2 (Labelled Petri net). A (labelled) Petri net, or simply a net, is the tuple $N = (P, T, F, \lambda)$, where $P$ and $T$ are disjoint sets of places and transitions, respectively (together called nodes); $F \subseteq (P \times T) \cup (T \times P)$ is the flow relation, and $\lambda : T \to \Sigma$ is a labelling function mapping transitions to the set of task labels $\Sigma$ containing the special label $\tau$.
Transitions labeled with $\tau$ describe invisible actions that are not recorded in the event log when executed. A node $x$ is in the preset of a node $y$ if the flow relation contains an arc from $x$ to $y$ and, conversely, a node $z$ is in the postset of $y$ if it contains an arc from $y$ to $z$. Then, the preset of a node $y$ is the set $\bullet y = \{x \in P \cup T \mid (x, y) \in F\}$ and the postset of $y$ is the set $y\bullet = \{z \in P \cup T \mid (y, z) \in F\}$.
Workflow nets [21] are Petri nets with two special places, an initial and a final place. Formally, a labelled workflow net $WN$ consists of a net $N = (P, T, F, \lambda)$ and two places $i$ and $o$, where $N$ is a labelled Petri net, $i \in P$ is the initial and $o \in P$ is the final place, and the following properties hold:
• The initial place $i$ has an empty preset and the final place $o$ has an empty postset, i.e., $\bullet i = o\bullet = \emptyset$.
• If a transition $t^*$ were added from $o$ to $i$, such that $\bullet i = o\bullet = \{t^*\}$, then the resulting net is strongly connected.
The execution semantics of a net can be represented by means of markings. A marking $m : P \to \mathbb{N}_0$ is a function that associates places to natural numbers representing the number of tokens in each place at a given execution state. As we will later work with the so-called incidence matrix of a Petri net, we define the semantics already in terms of vectors over places. Fixing an order $\{p_1, \dots, p_k\} = P$ over all places, we write a marking $m$ as a column vector $m = \langle m(p_1), \dots, m(p_k) \rangle$. We slightly abuse notation and write $m$ for both the function and the column vector; further, we represent $m$ as the multiset of marked places in our examples. In vector notation, the preset $\bullet t$ of any transition $t$ defines a column vector $N^-(t) = \langle x_1, \dots, x_k \rangle$ with $x_i = 1$ if $p_i \in \bullet t$, and $x_i = 0$ otherwise. Correspondingly, we define $N^+(t) = \langle z_1, \dots, z_k \rangle$ with $z_i = 1$ if $p_i \in t\bullet$, and $z_i = 0$ otherwise, for the postset of $t$. We lift $+$, $-$, and $\le$ to vectors by element-wise application.
A transition $t$ is enabled at a marking $m$ if each pre-place of $t$ contains a token in $m$, i.e., $N^-(t) \le m$. An enabled transition $t$ can fire and yield a new marking $m' = m - N^-(t) + N^+(t)$ by consuming from all its pre-places ($N^-(t)$) and producing on all its post-places ($N^+(t)$). A marking $m'$ is reachable from another marking $m$ if there exists a sequence of firing transitions $\sigma = \langle t_1, \dots, t_n \rangle$ such that $\forall\, 1 \le i \le n : N^-(t_i) \le m_{i-1} \wedge m_i = m_{i-1} - N^-(t_i) + N^+(t_i)$, where $m_0 = m$ and $m_n = m'$. A marking $m$ is $k$-bounded if every place holds at most $k$ tokens at $m$, i.e., $m(p) \le k$ for any $p \in P$. A net equipped with an initial marking and a final marking is called a (Petri) system net. The following definition of a system net refers specifically to workflow nets.
A system net $SN$ is a triplet $SN = (WN, m_0, M_f)$, where $WN$ is a labelled workflow net, $m_0$ denotes the initial marking and $M_f$ denotes the set of final markings.
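The enabling condition and the firing rule in vector notation can be sketched in a few lines of Python. The net below (three places, two transitions) and the names `N_MINUS`, `N_PLUS`, `enabled` and `fire` are hypothetical and only serve to illustrate the arithmetic; markings are assumed to be tuples ordered by place.

```python
def enabled(m, pre):
    """t is enabled at m iff N^-(t) <= m element-wise."""
    return all(p <= tokens for p, tokens in zip(pre, m))

def fire(m, pre, post):
    """Return m' = m - N^-(t) + N^+(t); raise if t is not enabled."""
    if not enabled(m, pre):
        raise ValueError("transition is not enabled at this marking")
    return tuple(tokens - p + q for tokens, p, q in zip(m, pre, post))

# Hypothetical net with places (p1, p2, p3) and transitions t1, t2.
N_MINUS = {"t1": (1, 0, 0), "t2": (0, 1, 0)}
N_PLUS = {"t1": (0, 1, 0), "t2": (0, 0, 1)}

m0 = (1, 0, 0)
m1 = fire(m0, N_MINUS["t1"], N_PLUS["t1"])   # token moves from p1 to p2
m2 = fire(m1, N_MINUS["t2"], N_PLUS["t2"])   # token moves from p2 to p3
print(m1, m2)
```

Here `m2` is reachable from `m0` via the firing sequence t1, t2, whereas t2 is not enabled at `m0`.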
A system net is $k$-bounded if every reachable marking in the net is $k$-bounded. This work considers 1-bounded system nets that are sound [22], i.e., where from any marking $m$ reachable from $m_0$ we can always reach some $m_f \in M_f$, there is no reachable marking $m > m_f$ with $m_f \in M_f$ that contains a final marking, and each transition is enabled in some reachable marking. Figure 2 shows the system net representation for our running example. The reachability graph [23] of a system net $SN$ contains all possible markings of $SN$ (denoted as $M$). Intuitively, a reachability graph is a non-deterministic FSM where states denote markings, and arcs denote the firing of a transition from one marking to another. The reachability graph for the running example is depicted in Fig. 3, showing markings as multisets of places. In this figure, every node contains the places with a token at each of the reachable markings. The complexity for constructing a reachability graph of a safe Petri net is $O(2^{|P \cup T|})$ [24].
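To make the state explosion concrete, the sketch below builds the reachability graph of a hypothetical safe workflow net with an AND-split, k concurrent tasks and an AND-join, using a plain breadth-first exploration. It is an illustration under these assumptions (markings as sets of marked places, helper names invented here), not the construction used in the paper's tool.

```python
from collections import deque

def parallel_net(k):
    """Hypothetical net: transition name -> (preset, postset) over place names."""
    trans = {"split": ({"i"}, {f"a{j}" for j in range(k)}),
             "join": ({f"b{j}" for j in range(k)}, {"o"})}
    for j in range(k):
        trans[f"t{j}"] = ({f"a{j}"}, {f"b{j}"})
    return trans

def reachability_graph(trans, m0):
    """Breadth-first exploration; markings are frozensets (safe nets only)."""
    markings, arcs, queue = {m0}, set(), deque([m0])
    while queue:
        m = queue.popleft()
        for t, (pre, post) in trans.items():
            if pre <= m:                          # t is enabled at m
                m2 = frozenset((m - pre) | post)  # fire t
                arcs.add((m, t, m2))
                if m2 not in markings:
                    markings.add(m2)
                    queue.append(m2)
    return markings, arcs

def count_paths(arcs, m, mf):
    """Number of paths from m to mf (the graph is acyclic here)."""
    if m == mf:
        return 1
    return sum(count_paths(arcs, tgt, mf) for (src, _, tgt) in arcs if src == m)

markings, arcs = reachability_graph(parallel_net(4), frozenset({"i"}))
paths = count_paths(arcs, frozenset({"i"}), frozenset({"o"}))
print(len(markings), paths)
```

With four concurrent tasks the parallel block alone contributes 2^4 = 16 markings, and there are 4! = 24 complete paths, one per interleaving.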

Event logs
Event logs, or simply logs, record the execution of activities in a business process. These logs represent the executions of process instances as traces, i.e., sequences of activity occurrences (a.k.a. events). A trace can be represented as a sequence of labels, such that each label signifies an event. Although an event log is a multiset of traces containing several occurrences of the same trace, we are only interested in the distinct traces in the log and, therefore, we define a log as a set of traces. Figure 4 depicts an example of a log containing activities of the loan application process in Fig. 1. For readability purposes, Fig. 4 uses the letters next to each of the activities in the model in Fig. 1.
Definition 3.6 (Trace and event log). Given a finite set of labels $L$, a trace is a finite sequence of labels $\langle l_1, \dots, l_n \rangle \in L^*$, such that $l_i \in L$ for any $1 \le i \le n$. An event log L is a set of traces.

Automata-based conformance checking
The objective of conformance checking is to identify an ideally minimal set of differences between the behavior of a given process model and a given log. As illustrated in Fig. 5, the first approach proposed in this paper does so by constructing an error-correcting product between the reachability graph of the model and an automaton-based representation of the log (called DAFSA). (1) First, the input process model is expanded into a reachability graph. (2) In parallel, the event log is compressed into a minimal acyclic and deterministic FSM, a.k.a. DAFSA. The resulting reachability graph and DAFSA are then compared (3) to create an error-correcting synchronized product automaton, herein called a PSP, short for Partially Synchronized Product. Each state in the PSP is a pair consisting of a state in the reachability graph and a state in the DAFSA. A PSP represents a set of optimal trace alignments that can be used for (4) deriving statements that diagnose behavioral differences via further analysis. The rest of this section starts by introducing some necessary concepts and is followed by a description of each of the steps.

τ-less reachability graph of a process model
Even though τ-transitions represent invisible steps that are not recorded in an event log, they are captured in the reachability graph of a Petri net. In principle, we assume that a Petri net has a minimal number of τ-transitions, for instance, by applying structural reduction rules that preserve all visible behavior [25]. However, not all τ-transitions can be removed by structural reduction of the Petri net. We therefore remove the remaining τ-transitions through behavior-preserving reduction rules on the reachability graph, using the breadth-first search algorithm given in Alg. 1. Intuitively, for every marking $m$ reached by a τ-transition $a_1 = (m_1, \tau, m) \in A_R$ and any outgoing transition $a_2 = (m, l, m_2) \in A_R$, the algorithm replaces $a_1$ with $a_{12} = (m_1, l, m_2)$ (lines 6-8 and lines 19-21). This replacement is repeated until all arcs representing τ-transitions are removed. In case all incoming arcs of a state $m$ get replaced, we also remove $m$ and its outgoing arcs (lines 12-16). Function replaceTau also handles the case of another outgoing τ-labeled transition $a_2 = (m, \tau, m_2)$ by a depth-first search along τ-transitions in $A_R$ (lines 22-24). The algorithm then removes each remaining τ-transition $a = (m_1, \tau, m_f)$ targeting the final marking while introducing new replacement arcs $a' = (m_2, l, m_f)$ for each incoming arc of $m_1$, i.e., for each $(m_2, l, m_1) \in A_R$ (line 17 and function replaceTauBackwards). The reachability graph returned by Alg. 1 is now free of τ-transitions. Figure 6 shows the τ-less reachability graph of the loan application process.
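The effect of removing τ-arcs can be illustrated with the textbook closure-based elimination of silent moves, sketched below in Python. This is deliberately not Alg. 1: instead of rewiring arcs backwards from the final marking, the sketch marks a state as final whenever a final state is τ-reachable from it. The function names and the tuple encoding of arcs are assumptions made for this example.

```python
TAU = "tau"

def tau_closure(arcs, s):
    """All states reachable from s using only tau-labelled arcs (including s)."""
    seen, stack = {s}, [s]
    while stack:
        n = stack.pop()
        for (src, label, tgt) in arcs:
            if src == n and label == TAU and tgt not in seen:
                seen.add(tgt)
                stack.append(tgt)
    return seen

def remove_tau(arcs, initial, finals):
    """Return (tau-free arcs, final states) accepting the same visible sequences."""
    states = {initial} | {x for (s, _, t) in arcs for x in (s, t)}
    new_arcs, new_finals = set(), set()
    for s in states:
        closure = tau_closure(arcs, s)
        if closure & finals:
            new_finals.add(s)
        for (src, label, tgt) in arcs:
            if src in closure and label != TAU:
                new_arcs.add((s, label, tgt))   # skip over the silent steps
    # Keep only what is still reachable from the initial state.
    reach, stack = {initial}, [initial]
    while stack:
        n = stack.pop()
        for (src, _, tgt) in new_arcs:
            if src == n and tgt not in reach:
                reach.add(tgt)
                stack.append(tgt)
    return {a for a in new_arcs if a[0] in reach}, new_finals & reach

arcs = {("m0", "A", "m1"), ("m1", TAU, "m2"), ("m2", "B", "m3")}
clean_arcs, clean_finals = remove_tau(arcs, "m0", {"m3"})
print(sorted(clean_arcs), clean_finals)
```

In the example, the silent step between `m1` and `m2` disappears and `m2` becomes unreachable, mirroring the removal of intermediate markings described above.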

Compact DAFSA representation of an event log
Event logs can be represented as Deterministic Acyclic Finite State Automata (DAFSA), which are acyclic and deterministic FSMs. A DAFSA can represent words, in our case traces, in a compact manner by exploiting prefix and suffix compression. Daciuk et al. [26] present an efficient algorithm for constructing a DAFSA from a set of words. In the constructed automaton, every word is a path from the initial to a final state and, vice versa, every path from the initial to a final state is one of the given words. We reuse this algorithm to construct a DAFSA from an event log, where the words are the set of traces. The complexity of building the DAFSA is $O(|L| \cdot \log n)$, where $L$ is the set of distinct event labels, and $n$ is the number of states in the DAFSA.
Figure 6. Tau-less reachability graph of the running example.
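Prefix and suffix compression can be illustrated with a small Python sketch. It does not implement the incremental algorithm of Daciuk et al. [26]; as a simpler stand-in, it first builds a prefix tree (which shares prefixes) and then merges states with identical outgoing behavior bottom-up (which shares suffixes). All names are hypothetical.

```python
def build_dafsa(traces):
    """Return (root, states, arcs, finals) of a minimal acyclic automaton."""
    trie, final = {0: {}}, set()        # state id -> {label: child id}
    for trace in traces:
        n = 0
        for label in trace:
            if label not in trie[n]:
                trie[len(trie)] = {}
                trie[n][label] = len(trie) - 1
            n = trie[n][label]
        final.add(n)

    registry, rep = {}, {}              # signature -> representative state

    def canon(n):
        # Two states are equivalent iff they agree on finality and on their
        # labelled arcs to (already canonical) successors.
        sig = (n in final,
               tuple(sorted((l, canon(ch)) for l, ch in trie[n].items())))
        rep[n] = registry.setdefault(sig, n)
        return rep[n]

    root = canon(0)
    states = set(rep.values())
    arcs = {(n, l, rep[ch]) for n in states for l, ch in trie[n].items()}
    return root, states, arcs, {rep[n] for n in final}

def accepts(root, arcs, finals, word):
    """Follow the (deterministic) arcs labelled by word from the root."""
    n = root
    for label in word:
        nxt = [t for (s, l, t) in arcs if s == n and l == label]
        if not nxt:
            return False
        n = nxt[0]
    return n in finals

log = [("A", "B", "D"), ("A", "C", "D")]
root, states, arcs, finals = build_dafsa(log)
print(len(states), len(arcs))
```

For the two example traces, the prefix tree has six states, while the merged automaton has four: the common prefix A and the common suffix D are each stored once.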
The prefix of a state $n \in N_D$ is the sequence of labels associated to the arcs in a path from the initial state to $n$ and, analogously, the suffix of $n$ is the sequence of labels in a path from $n$ to a final state. The prefix of the initial state and the suffix of a final state is $\{\langle\rangle\}$. A state $n$ can have several prefixes, which are denoted by $\mathit{pref}(n) = \bigcup_{(n_s, l, n_t) \in \blacktriangleright n} \{x \oplus l \mid x \in \mathit{pref}(n_s)\}$, where $\blacktriangleright n$ is the set of incoming arcs of $n$ and $\oplus$ denotes the concatenation operator. Similarly, the set of suffixes of $n$ is represented by $\mathit{suff}(n) = \bigcup_{(n_s, l, n_t) \in n \blacktriangleright} \{l \oplus x \mid x \in \mathit{suff}(n_t)\}$, where $n \blacktriangleright$ is the set of outgoing arcs of $n$. Prefixes and suffixes are said to be common iff they are shared by more than one trace. In this example, the two common prefixes in nodes $n_2$ and $n_{11}$, as well as the common suffixes from nodes $n_4$ and $n_5$, are shared by two traces in the event log.

Error-correcting synchronised product
The computation of similar and deviant behavior between an event log and a process model is based on an error-correcting synchronized product (PSP) [27]. Intuitively, the traces represented in the DAFSA are "aligned" with the executions of the model by means of three operations: (1) synchronized move (match), where the process model and the event log can execute the same task/event with respect to their label; (2) log operation (lhide), where an event observed in the log cannot occur in the model; and (3) model operation (rhide), where a task in the model can occur, but the corresponding event is missing in the log. Both a trace in a log and an execution represented in a reachability graph are totally ordered sets of events (sequences). An alignment aims at matching events from both sequences that represent the tasks with the same labels, such that the order between the matched events is preserved. An event that is not matched has to be hidden using the operation lhide if it belongs to the log, or rhide if it belongs to an execution in the model. For example, given a trace in a log $\langle D, B, C, E, G \rangle$ and an execution in a model $\langle B, D, C, A, E, G \rangle$, it is possible to match the events with label E, and either the events with label B or the events with label D, but not both. Finally, the events that are not matched need to be hidden.
In our context, the alignments are computed over a pair of FSMs, a DAFSA and a reachability graph; therefore, the three operations match, lhide and rhide are applied over the arcs of both FSMs. A match is applied over a pair of arcs (one in the DAFSA and one in the reachability graph), whereas lhide and rhide are applied only over one arc. We record the type of operation and the involved arcs in a triplet called synchronization, where ⊥ denotes the absence of an arc in case of lhide and rhide.
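The trace/execution example above can be worked through with a short Python sketch. It aligns one trace with one fixed execution by dynamic programming, assuming a cost of 1 per lhide or rhide and 0 per match; this is a simplification for illustration, since the PSP aligns a trace against all executions of the model at once. The function name `align` is hypothetical.

```python
def align(trace, execution):
    """Return (cost, ops) of a cheapest order-preserving alignment."""
    n, m = len(trace), len(execution)
    # cost[i][j]: fewest hides needed to align trace[i:] with execution[j:]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            options = []
            if i < n:
                options.append(1 + cost[i + 1][j])        # lhide trace[i]
            if j < m:
                options.append(1 + cost[i][j + 1])        # rhide execution[j]
            if i < n and j < m and trace[i] == execution[j]:
                options.append(cost[i + 1][j + 1])        # match
            cost[i][j] = min(options)
    # Trace back one optimal alignment, preferring match, then rhide, then lhide.
    ops, i, j = [], 0, 0
    while i < n or j < m:
        if (i < n and j < m and trace[i] == execution[j]
                and cost[i][j] == cost[i + 1][j + 1]):
            ops.append(("match", trace[i]))
            i, j = i + 1, j + 1
        elif j < m and cost[i][j] == 1 + cost[i][j + 1]:
            ops.append(("rhide", execution[j]))
            j += 1
        else:
            ops.append(("lhide", trace[i]))
            i += 1
    return cost[0][0], ops

total, ops = align(("D", "B", "C", "E", "G"), ("B", "D", "C", "A", "E", "G"))
print(total, ops)
```

Under these unit costs, a cheapest alignment of the two sequences matches C, E and G plus exactly one of B or D, and hides the remaining three events.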
All possible alignments between the traces represented in a DAFSA and the executions represented in a reachability graph can be inductively computed as follows. The construction starts by pairing the initial states of both FSMs and then applying the three defined operations over the arcs that can be taken in the DAFSA and in the reachability graph; each application of an operation (synchronization) yields a new pairing of states. Note that the alignments between (partial) traces and executions are implicitly computed as sequences of synchronizations.
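Given a sequence of synchronizations $\varepsilon = \langle \beta_1, \dots, \beta_m \rangle$ with $\beta_i = (op_i, a_{i,D}, a_{i,R})$, $1 \le i \le m$, we define two projection operations $\varepsilon{\upharpoonright}_D$ and $\varepsilon{\upharpoonright}_R$ that retrieve the sequence of arcs for the DAFSA and the reachability graph, respectively. The projection onto $D$ is the sequence $\varepsilon{\upharpoonright}_D = \langle a_{1,D}, \dots, a_{m,D} \rangle \upharpoonright A_D$ of the $D$-entries in $\varepsilon$ restricted to the arcs in $D$ (i.e., removing all ⊥). Correspondingly, $\varepsilon{\upharpoonright}_R = \langle a_{1,R}, \dots, a_{m,R} \rangle \upharpoonright A_R$. Thus, $\varepsilon{\upharpoonright}_D$ ($\varepsilon{\upharpoonright}_R$) contains the arcs of all match and lhide (rhide) triplets. On top of that notation, we are interested in the sequence of labels represented by a sequence of arcs, shorthanded as $\lambda(\varepsilon{\upharpoonright}_D) = \langle \lambda(a_1), \dots, \lambda(a_n) \rangle$.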
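The two projections can be sketched in Python as follows. Synchronizations are assumed to be triples `(op, arc_D, arc_R)` with arcs encoded as `(source, label, target)` and `None` standing in for ⊥; the state names in the example are made up.

```python
BOTTOM = None  # stands for the missing arc of an lhide or rhide

def project(eps, side):
    """Arcs of eps on side "D" (DAFSA) or "R" (reachability graph), dropping BOTTOM."""
    idx = 1 if side == "D" else 2
    return [beta[idx] for beta in eps if beta[idx] is not BOTTOM]

def labels(arcs):
    """Sequence of labels carried by a sequence of arcs."""
    return [label for (_, label, _) in arcs]

eps = [
    ("match", ("n0", "B", "n1"), ("m0", "B", "m1")),
    ("lhide", ("n1", "D", "n2"), BOTTOM),
    ("rhide", BOTTOM, ("m1", "A", "m2")),
]
print(labels(project(eps, "D")), labels(project(eps, "R")))
```

The projection onto D keeps the match and lhide arcs (labels B, D), while the projection onto R keeps the match and rhide arcs (labels B, A).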
The set of all proper alignments for a given trace $c$ is denoted as $\mathcal{C}(c, R)$. We write $\varepsilon_{c|op} = \{\beta = (op, a_D, a_R) \in \varepsilon_c\}$ for the synchronizations of a particular operation $op$ in a given alignment $\varepsilon_c$.
A cost can be associated to a proper alignment for a given trace. If an asynchronous lhide or rhide move is associated to a non-τ label then the cost increases. Assuming that the cost of hiding a non-τ transition is 1, the cost of an alignment $\varepsilon$ is the number of its lhide and rhide synchronizations that carry a non-τ label:
$$\mathit{cost}(\varepsilon) = |\{\beta = (op, a_D, a_R) \in \varepsilon \mid op \in \{\mathit{lhide}, \mathit{rhide}\} \wedge \lambda(\beta) \neq \tau\}|,$$
where $\lambda(\beta)$ is the label of the arc involved in $\beta$.
All alignments can be collected in a finite state machine called PSP [27]. Every state in the PSP is a triplet $(n, m, \varepsilon)$, where $n$ is a state in the DAFSA $D$, $m$ is a state in the reachability graph $R$ and $\varepsilon$ is the sequence of arcs taken in $D$ and in $R$ to reach $n$ and $m$; every arc of the PSP is a synchronization of $D$ and $R$; the pairing of the initial states is the initial state of the PSP; and the final states are those with no outgoing arcs.
Definition 4.6 (PSP). Given a DAFSA $D$ and a reachability graph $R$, their PSP $P$ is a finite state machine $P = (N_P, A_P, s_P, R_P)$, where $N_P \subseteq N_D \times M \times \mathcal{C}$ is the set of nodes, $A_P \subseteq N_P \times S(D, R) \times N_P$ is the set of arcs, $s_P = (s_D, m_0, \langle\rangle) \in N_P$ is the initial node, and $R_P = \{f \in N_P \mid f \blacktriangleright = \emptyset\}$ is the set of final nodes, i.e., the nodes without outgoing arcs.
The PSP contains all possible alignments; however, we are interested in the proper alignments with minimum cost. These alignments are called optimal. The computation of all possible alignments can become infeasible when the search space is too large. Thus, we use an A* algorithm [28] to consider the most promising paths in the PSP first, i.e., those minimizing the number of hides. We define the cost function for the A* as follows.
Definition 4.7 (A*-cost function). Let L and $P$ be a given event log and PSP, then for every trace $c \in$ L and every node $x = (n, m, \varepsilon) \in P$ we define a cost function $\rho(x, c) = g(x, c) + h(x, c)$ that relies on the current cost function $g$ and a heuristic function $h$ for estimating future hides for a given trace. We define functions $g$ and $h$ as follows:
$$g(x, c) = \begin{cases} \mathit{cost}(\varepsilon) & \text{if } \lambda(\varepsilon{\upharpoonright}_D) = \mathit{for}(c, |\lambda(\varepsilon{\upharpoonright}_D)|) \\ \infty & \text{otherwise} \end{cases}$$
$$h(x, c) = \min_{f_{Model} \in F_{Model}(x)} \left( |F_{Log}(x, c) \setminus f_{Model}| + |f_{Model} \setminus F_{Log}(x, c)| \right)$$
Function $g$ returns the current cost for a given node $x$ in the PSP and a given trace $c$ to align. If the trace labels of the partial alignment of $x$, i.e. $\lambda(\varepsilon(x){\upharpoonright}_D)$, fully represent a prefix of $c$, then the cost of $\varepsilon(x)$ is that of the cost function defined in Def. 4.5. Otherwise, node $x$ is not relevant to trace $c$ and the cost is set to $\infty$ to avoid considering this node in the search. Function $h$ relies on two functions $F_{Log}$ and $F_{Model}$. $F_{Log}(x, c) = \mathit{MultiSet}(c) \setminus \mathit{MultiSet}(\lambda(\varepsilon{\upharpoonright}_D))$ denotes the multiset of future trace labels and $F_{Model}$ is the set of multisets of future model labels. The set of future model labels $F_{Model}(x)$ is computed in a backwards breadth-first traversal over the strongly connected components of the reachability graph from each of its final markings. The multisets of task labels are collected during the traversal and stored in each node of the graph. All labels from cyclic arcs inside strongly connected components are gathered during the traversal with a special symbol $\omega$ representing that the label can be repeated any number of times. To keep the function underestimating when comparing these labels, we set the multiplicity of these labels to infinity for the term $F_{Log} \setminus f_{Model}$ and to 0 for the term $f_{Model} \setminus F_{Log}$, i.e. we assume that repeated task labels match all corresponding labels in the trace. Observe that $h$ assumes that all events with the same label in $F_{Log}$ and $f_{Model}$ are matched. This is clearly an optimistic approximation, since some of those matches might not be possible. Because $h$ never overestimates the remaining cost, it is admissible, which guarantees the optimality of the computed alignments.
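The multiset comparison behind the heuristic can be sketched with Python's `collections.Counter`, whose subtraction discards non-positive counts and therefore behaves like multiset difference. The sketch assumes that the candidate multisets of future model labels are already given and ignores the special symbol ω for repeated labels; the function names and example multisets are hypothetical.

```python
from collections import Counter

def future_log(trace, aligned_labels):
    """Multiset of trace labels that still have to be aligned."""
    return Counter(trace) - Counter(aligned_labels)

def h(f_log, f_models):
    """Smallest symmetric multiset difference over the candidate model futures."""
    def unmatched(f_model):
        return (sum((f_log - f_model).values())      # labels only left in the log
                + sum((f_model - f_log).values()))   # labels only left in the model
    return min(unmatched(f_model) for f_model in f_models)

f_log = future_log(("D", "B", "C", "E", "G"), ("D", "B"))
candidates = [Counter(["C", "A", "E", "G"]), Counter(["C", "E", "G", "F", "F"])]
estimate = h(f_log, candidates)
print(f_log, estimate)
```

In the example the remaining trace labels are C, E and G; the first candidate future requires at least one extra hide (for A) and the second at least two, so the estimate is 1. The estimate ignores ordering, which is why it can only underestimate the true remaining cost.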
Algorithm 2 shows the procedure to build the PSP, where an A* search is applied to find all optimal alignments for each trace in a log. The algorithm chooses a node with minimal cost $\rho$, such that if it pairs two final states (one in the DAFSA and one in the reachability graph), representing the alignment of a complete trace, then it is marked as an optimal alignment. Otherwise, the search continues by applying lhide, rhide and match. As shown in [4], the complexity for constructing the PSP is in the order of $O(3^{|N_D| \cdot |M|})$, where $N_D$ is the set of states in the DAFSA and $M$ is the set of reachable markings of the Petri net.
Algorithm 3: Construct the PSP with Prefix- and Suffix Memoization (replaces lines 2 and 14 of Alg. 2; the block replacing line 2 reuses common prefix alignments).
In order to optimize the computation of the PSP, two memoization tables are used: prefix and suffix. Both tables store partial trace alignments for common prefixes and suffixes that have been aligned previously. The integration of these tables requires the modification of Alg. 2, as shown in Alg. 3. For each trace $c$, the algorithm starts by checking if there is a common prefix for $c$ in the prefix memoization table. If this is the case, the A* search starts from the nodes stored in the memoization table for the partial trace alignments that have been previously observed. In the case of common suffix memoization, the algorithm checks at each iteration whether the current pair of nodes and the current suffix is stored in the suffix memoization table. If this is the case, the algorithm appends nodes to the A* search for each pair of memoized final nodes and appends all partial suffix alignments to the current alignment instead of continuing the regular search procedure. By reusing the information stored in these tables, the search space for the A* search is reduced.
The approach illustrated so far produces a PSP containing all optimal alignments. Nevertheless, if only one optimal alignment is required, then the algorithm can be easily modified to stop as soon as the first alignment is found. Overall, the complexity of the proposed approach is exponential in the worst case, i.e. $O(|\Sigma| \cdot \log n + 2^{|P \cup T|} + 3^{|N_D| \cdot |M|})$.
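The variant that stops at the first optimal alignment can be sketched as a best-first search over pairs of trace position and model state. The Python sketch below is a simplification under stated assumptions: it aligns a single trace (so the DAFSA degenerates to a chain), it assumes a τ-less reachability graph given as `(source, label, target)` arcs, and it uses a pluggable heuristic that defaults to 0, which is trivially admissible. It is not Alg. 2 and omits memoization; all names are hypothetical.

```python
import heapq
from itertools import count

def astar_align(trace, arcs, m0, finals, h=lambda i, m: 0):
    """Return (cost, synchronizations) of one cheapest alignment, or None."""
    out = {}
    for (s, label, t) in arcs:
        out.setdefault(s, []).append((label, t))
    tie = count()                      # unique tie-breaker for the heap
    queue = [(h(0, m0), 0, next(tie), 0, m0, ())]
    best = {}                          # cheapest g seen per (position, state)
    while queue:
        _, g, _, i, m, eps = heapq.heappop(queue)
        if i == len(trace) and m in finals:
            return g, list(eps)        # trace fully read, final state reached
        if best.get((i, m), float("inf")) <= g:
            continue
        best[(i, m)] = g
        succ = []
        if i < len(trace):
            succ.append((1, i + 1, m, ("lhide", trace[i])))
        for (label, t) in out.get(m, []):
            succ.append((1, i, t, ("rhide", label)))
            if i < len(trace) and trace[i] == label:
                succ.append((0, i + 1, t, ("match", label)))
        for (c, i2, m2, beta) in succ:
            g2 = g + c
            heapq.heappush(
                queue, (g2 + h(i2, m2), g2, next(tie), i2, m2, eps + (beta,)))
    return None

# Hypothetical model: B and D in either order, then C, A, E, G.
ARCS = {("m0", "B", "m1"), ("m0", "D", "m2"), ("m1", "D", "m3"),
        ("m2", "B", "m3"), ("m3", "C", "m4"), ("m4", "A", "m5"),
        ("m5", "E", "m6"), ("m6", "G", "m7")}
best_cost, best_eps = astar_align(("D", "B", "C", "E", "G"), ARCS, "m0", {"m7"})
print(best_cost, best_eps)
```

Because the hypothetical model allows B and D in either order, the only deviation left for this trace is the missing task A, which is hidden on the model side.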
The A* search will compute the cost of performing the following possible synchronizations: (match, B), (lhide, B), (rhide, A), (rhide, B), (rhide, C) and (rhide, D). Out of these six possibilities it will only explore (match, B) 1 and (rhide, A), which have a cost of one. Other synchronizations like (rhide, B) 2 will never be explored since they have a cost of three and there exist nodes with a lower cost. The A* search will continue exploring the possible synchronizations until all optimal alignments are discovered.

Deterministic alignments
A trace can have several optimal alignments; however, in order to have a deterministic computation of a single optimal alignment, we define an order on the construction of the PSP. This order is imposed on the operations, with the following precedence order: match > rhide > lhide, and on the lexicographic order of the activity labels. We apply this precedence order at each iteration of the A* search on the set of candidate nodes of the queue that all have the lowest cost values w.r.t. $\rho$. In that way, the A* search will still always explore the cheapest nodes first and is guaranteed to find an alignment with optimal cost. The precedence order merely provides a tool to deterministically select an optimal alignment from the set of optimal alignments with a specific order of operations and activity labels already during the exploration of the search space.
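The precedence order can be expressed as a composite sort key: cost first, then operation rank, then label. The Python sketch below assumes candidates are given as `(rho, op, label)` triples describing the cost of a queued node and its last synchronization; this encoding and the names `OP_RANK` and `pick` are hypothetical.

```python
OP_RANK = {"match": 0, "rhide": 1, "lhide": 2}   # match > rhide > lhide

def pick(candidates):
    """Choose the next node: lowest rho, then operation rank, then label."""
    return min(candidates, key=lambda c: (c[0], OP_RANK[c[1]], c[2]))

queue = [(1, "lhide", "B"), (1, "rhide", "D"), (1, "rhide", "A"), (2, "match", "C")]
chosen = pick(queue)
print(chosen)
```

The match candidate is not chosen here because its cost is higher: the precedence order only breaks ties among the cheapest nodes, so optimality of the search is unaffected.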
We choose to prioritize rhide over lhide synchronizations in the preference order to increase the number of match synchronizations in the returned optimal alignment. We would like to remind the reader that an increase in match synchronizations does not change the cost function for an alignment as per Def. 4.7. An alignment with more match synchronizations, however, can link the observed trace more closely to the process model. The following lemma shows that for optimal alignments, more rhide synchronizations lead to more match synchronizations.
Lemma 4.1. Let $\varepsilon_c$ be an optimal alignment for a trace $c$. For any other optimal alignment $\varepsilon'_c$ for $c$ such that $\lvert\varepsilon'_{c|rhide}\rvert < \lvert\varepsilon_{c|rhide}\rvert$, it holds that $\lvert\varepsilon'_{c|match}\rvert < \lvert\varepsilon_{c|match}\rvert$.
Proof. Given two optimal alignments $\varepsilon_c$, $\varepsilon'_c$, it holds that these two alignments have the same cost according to Def. 4.5, i.e. $\mathit{cost}(\varepsilon_c) = \mathit{cost}(\varepsilon'_c)$, and these two alignments are proper according to Def. 4.4. Further, we assume that $\varepsilon_c$ has more rhide synchronizations than $\varepsilon'_c$, i.e. $\lvert\varepsilon'_{c|rhide}\rvert < \lvert\varepsilon_{c|rhide}\rvert$. As a first step, we assume that $\varepsilon_c$ has exactly one more rhide synchronization than $\varepsilon'_c$, i.e. $\lvert\varepsilon_{c|rhide}\rvert = \lvert\varepsilon'_{c|rhide}\rvert + 1$. The cost of an alignment is the number of rhide and lhide synchronizations disregarding all synchronizations involving $\tau$. Since we remove all τ-labelled transitions in Alg. 1, the cost of an alignment equals exactly the number of rhide and lhide synchronizations. By the assumptions, $\varepsilon_c$ has one more rhide synchronization than, and the same cost as, $\varepsilon'_c$ and so it follows that $\varepsilon'_c$ has exactly one more lhide synchronization for a trace label than $\varepsilon_c$, i.e. $(\mathit{lhide}, \ell) \in \varepsilon'_c \wedge (\mathit{lhide}, \ell) \notin \varepsilon_c \wedge \ell \in c$. Since both alignments properly represent the trace, the sum of their lhide and match synchronizations is equal to the size of the trace $|c|$. Therefore, $\varepsilon_c$ needs to have one more match synchronization than $\varepsilon'_c$, in particular $(\mathit{match}, \ell) \in \varepsilon_c \wedge (\mathit{match}, \ell) \notin \varepsilon'_c \wedge \ell \in c$. The general case of multiple rhide synchronizations follows from inductive reasoning. If an optimal alignment $\varepsilon_c$ has $x$ more rhide synchronizations than another optimal alignment $\varepsilon'_c$, then $\varepsilon'_c$ must have $x$ more lhide synchronizations than $\varepsilon_c$ because they have the same cost. Similarly, $\varepsilon_c$ must have $x$ more match synchronizations than $\varepsilon'_c$ since the number of lhide and match synchronizations needs to be equal to the size of the trace $|c|$. Hence, it holds for two optimal alignments $\varepsilon_c$, $\varepsilon'_c$ with $\lvert\varepsilon'_{c|rhide}\rvert < \lvert\varepsilon_{c|rhide}\rvert$ that $\lvert\varepsilon'_{c|match}\rvert < \lvert\varepsilon_{c|match}\rvert$ and thus the proof is complete. □
Algorithm 4 shows the modified procedure to construct a PSP containing one deterministic optimal alignment for a given trace $c$, which differs from Alg. 2 by using the deterministic selection criteria explained above (line 10), and terminating when the entire trace has been read and the final state in $R$ has been reached (line 13).
Algorithm 4: Revised for one-optimal: Construct the PSP
Line 10: choose a tuple $n_{act} = ((n_D, m, \epsilon), \rho) \in Opt$ with the following priorities: $op(\epsilon(|\epsilon|))$: match > rhide > lhide, and choosing $\lambda(\epsilon(|\epsilon|))$ in lexicographical order;

Note that the "final" node $(r_D, r_R, \epsilon_c)$ returned in line 13 defines a sequence $\epsilon_c$ of synchronizations. Next, we show that $\epsilon_c$ is indeed an optimal alignment of $c$ to R. Let $\phi(c, P) = \epsilon_c$ be a function that "extracts" $\epsilon_c$ out of the constructed PSP P returned by Alg. 4.

Proof (Sketch). In order to prove that $\epsilon_c$ is a proper alignment, we show that it fulfils the two properties in Def. 4.4.
(1) The projection on the DAFSA reflects the trace, i.e. $\lambda(\epsilon_c|_D) = c$. Recall that the projection of any proper alignment onto D contains only match or lhide operations. Alg. 4 starts at the initial state of the DAFSA for every given trace, iterates over the trace (lines 9-21) and adds lhide operations (line 17) and match operations (line 19) for outgoing arcs with the next label of the trace. Every alignment $\epsilon_c$ returned by Alg. 4 then fulfils this property by construction, as it needs to fulfil the condition $\epsilon|_D = c$ in line 12 for determining whether a given alignment is final.
(2) $\epsilon_c|_R$ is a path from $m_0$ to a final marking $m_f \in M_f$. Recall that the projection of any proper alignment onto R contains only match or rhide operations. The algorithm always starts to add arcs from the initial marking of the reachability graph. At every iteration of the main loop (lines 9-21) it adds either arcs with match operations in line 19 or arcs with rhide operations in line 20 from the set of outgoing arcs of the current marking in the reachability graph. The algorithm then adds a new node to the queue that contains the target of the added arc. By lines 18 and 20, subsequent arcs are only added if they are outgoing arcs of the node m reached in R, and thus they always form a path in R. This path always starts from the initial marking and ends in a final marking as per the condition in line 12, and thus it is a path through the reachability graph.
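The two properties above can be checked mechanically. The following sketch assumes a hypothetical encoding that is not taken from the paper: an alignment is a list of `(operation, label, model_arc)` triples, where `model_arc` is a `(source, label, target)` arc of the reachability graph, or `None` for lhide.

```python
def is_proper_alignment(alignment, trace, model_arcs, initial, finals):
    """Check properties (1) and (2) for a candidate alignment."""
    # Property (1): match and lhide steps, read in order, spell out the trace.
    log_projection = [label for op, label, _ in alignment
                      if op in ("match", "lhide")]
    if log_projection != list(trace):
        return False
    # Property (2): match and rhide steps form a path through the model
    # from the initial marking to some final marking.
    current = initial
    for op, label, arc in alignment:
        if op == "lhide":
            continue  # lhide does not move the model
        if arc not in model_arcs or arc[0] != current or arc[1] != label:
            return False
        current = arc[2]
    return current in finals

# Hypothetical model: 0 --A--> 1 --B--> 2, with final marking 2.
arcs = {(0, "A", 1), (1, "B", 2)}
good = [("match", "A", (0, "A", 1)), ("lhide", "C", None),
        ("rhide", "B", (1, "B", 2))]
proper = is_proper_alignment(good, ["A", "C"], arcs, 0, {2})
incomplete = is_proper_alignment(good[:2], ["A", "C"], arcs, 0, {2})
```

The second call fails property (2) because the model path stops before a final marking.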
Proof (Sketch). Algorithm 4 finds alignment $\epsilon_c$ inside the while loop in function align (lines 5-21). Potential alignments are inserted into a queue in lines 17, 19 and 20. In line 10, a candidate alignment with a minimal cost function value with respect to ρ is chosen from the queue. In each iteration of the while loop, the active candidate alignment is checked for being final in line 12. Once a candidate alignment $\epsilon_c$ is found to be final, it is returned by the function. Since all candidate alignments ε in the queue are selected and then removed in increasing order of their cost function value ρ(ε), the first alignment that is a proper alignment for trace $c$ has a minimal value for $\rho(\epsilon_c)$. If $h(x) = 0$ held for every node $x$, candidate alignments would always be picked according to the cost function g alone, and trivially the first final alignment would also be optimal, since all alignments with smaller cost would already have been investigated.
Observe that for all final states $f \in R_P$, $h(f) = 0$, since every final state in the PSP represents a proper alignment: a proper alignment fully represents the trace, i.e. $F_{Log} = \emptyset$, and its projection on the reachability graph represents a path, i.e. $F_{Model} = \emptyset$. It follows that $\epsilon_c$ is optimal w.r.t. ρ whenever function h underestimates the remaining cost to an optimal solution for every investigated node, which is in line with the optimality criterion of the A* search algorithm [28].
We show that our definition of function h fulfils this criterion by analyzing how it estimates future hides for any given node. Let x be a candidate node. Function h compares the multiset of future log labels, given by the labels of trace $c$ minus the already aligned trace labels $\epsilon(x)|_D$, with every possible multiset of future model labels leading to any of the final markings. The multisets of future task labels represent possible paths in the reachability graph to a final marking, and a path to a final node in the DAFSA representing the suffix of trace $c$. By comparing multisets to find deviations, the context of task labels is dropped, so h may yield a lower cost than g. Repeated task labels are also assumed to be matched in these multisets and thus are not taken into account in the comparison. Finally, function h takes the minimum over all multiset comparisons, such that it always finds the closest final marking in terms of distance. Given that the multisets represent possible paths, the value of h can be at most as high as the true cost of a path, and it underestimates the cost whenever the abstractions obscure differences due to context or cyclic structures. Thus, h underestimates the true cost to the closest final marking, and the alignment $\epsilon_c$ is minimal with respect to ρ.
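The argument above can be illustrated with a small sketch of a label-based lower bound. This is not the paper's definition of h, only a simplified stand-in under stated assumptions: repeated labels are treated as matched by comparing label sets, and the candidate model label collections (`model_options`) are assumed to be given. The names `hide_estimate` and `h` are illustrative.

```python
def hide_estimate(future_log, future_model):
    """Lower bound on hides for one candidate model path.

    A label that occurs only in the remaining trace needs at least one lhide;
    a label that occurs only on the model path needs at least one rhide.
    Order and repetitions are ignored, so the estimate never exceeds the
    true remaining cost for that path.
    """
    log_labels, model_labels = set(future_log), set(future_model)
    return len(log_labels - model_labels) + len(model_labels - log_labels)

def h(future_log, model_options):
    """Take the most optimistic estimate over all candidate model paths."""
    return min(hide_estimate(future_log, option) for option in model_options)

# Remaining trace <A, B, B>; two hypothetical label collections to a final marking.
estimate = h(["A", "B", "B"], [["A", "C", "D"], ["A", "B"]])
```

Dropping order and repetitions can only hide deviations, never invent them, which is why such an estimate stays below the true remaining cost.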

Taming concurrency with S-Components
The automata-based technique presented in the previous section suffers from a fundamental scalability limitation: it needs to materialize the reachability graph, and the size of the reachability graph increases exponentially with the number of parallel activities. This section presents a novel (quasi-optimal) divide-and-conquer approach based on the decomposition of the model into parallelism-free sub-models, so-called S-Components.
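The exponential growth can be observed on a toy example. The sketch below assumes a safe net (at most one token per place), so a marking can be represented as a set of places. The net `parallel_block(k)` with one AND-split, k concurrent tasks and one AND-join is a hypothetical example, not a model from the paper.

```python
from collections import deque

def parallel_block(k):
    """Safe net: split -> k concurrent tasks -> join.

    Returns (transitions, initial_marking), where each transition maps to a
    pair (consumed places, produced places).
    """
    pre = [f"p{j}" for j in range(k)]
    post = [f"q{j}" for j in range(k)]
    transitions = {
        "split": (frozenset({"i"}), frozenset(pre)),
        "join": (frozenset(post), frozenset({"o"})),
    }
    for j in range(k):
        transitions[f"task{j}"] = (frozenset({pre[j]}), frozenset({post[j]}))
    return transitions, frozenset({"i"})

def reachable_markings(transitions, initial):
    """Breadth-first exploration of all markings reachable from `initial`."""
    seen, queue = {initial}, deque([initial])
    while queue:
        marking = queue.popleft()
        for consumed, produced in transitions.values():
            if consumed <= marking:  # transition is enabled
                successor = (marking - consumed) | produced
                if successor not in seen:
                    seen.add(successor)
                    queue.append(successor)
    return seen

sizes = {k: len(reachable_markings(*parallel_block(k))) for k in (2, 4, 8)}
```

For this net the number of markings is 2^k + 2, whereas each of the k sequential sub-nets obtained by keeping a single task has only four markings.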

Decomposition of the Petri net
The decomposition approach considers uniquely-labelled sound free-choice workflow nets, a subclass of workflow nets [21,29]. A workflow net is uniquely labelled if every non-silent label is assigned to at most one transition. Soundness was defined in Sect. 3.2. A net is free-choice iff whenever two transitions $t_1$ and $t_2$ share a common pre-place $s$, then $s$ is their only pre-place; in a free-choice net, concurrency and choices are clearly separated. The formal definitions are given below. A workflow net is free-choice iff for any two transitions $t_1, t_2 \in T$: $s \in {}^\bullet t_1 \cap {}^\bullet t_2$ implies ${}^\bullet t_1 = {}^\bullet t_2 = \{s\}$. A workflow net is uniquely-labelled iff for any $t_1, t_2 \in T_N$, $\lambda(t_1) = \lambda(t_2) \neq \tau \implies t_1 = t_2$. A system net is uniquely-labelled, sound, and free-choice if the underlying workflow net is.
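Both structural conditions can be checked directly from the presets and labels of the transitions. The sketch below follows the two definitions as stated above. The dictionary-based net encoding, the `TAU` placeholder and the function names are assumptions made for illustration.

```python
from itertools import combinations

TAU = "tau"  # placeholder for the silent label

def is_free_choice(presets):
    """presets: dict mapping each transition to its set of pre-places.

    Two transitions sharing a pre-place s must both have {s} as their
    entire preset.
    """
    for t1, t2 in combinations(presets, 2):
        for s in presets[t1] & presets[t2]:
            if not (presets[t1] == presets[t2] == {s}):
                return False
    return True

def is_uniquely_labelled(labels):
    """labels: dict mapping each transition to its label (TAU if silent)."""
    visible = [label for label in labels.values() if label != TAU]
    return len(visible) == len(set(visible))

choice = is_free_choice({"a": {"p"}, "b": {"p"}, "c": {"q"}})
mixed = is_free_choice({"a": {"p", "q"}, "b": {"p"}})
unique = is_uniquely_labelled({"t1": "A", "t2": TAU, "t3": TAU})
```

In the second net, transition `a` synchronizes on `p` and `q` while `b` competes for `p`, which mixes choice and concurrency and therefore violates the free-choice condition.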
An S-Component [21,29] of a net is a substructure in which every transition has exactly one incoming and one outgoing arc (i.e. it does not contain parallelism). A well-formed free-choice workflow net is covered by S-Components: every place, arc and transition of the workflow net is contained in at least one S-Component, which is also a workflow net. Figure 12 shows 4 different S-Components of the running example of Fig. 2. Each S-Component contains one of the four tasks A, B, C or D that can be executed in parallel. Note that S-Components overlap on non-concurrent parts of the net, e.g., on $p_9$, as indicated by nodes with solid borders.
Before we explain the decomposition of a workflow net into S-Components, we need to introduce the concept of the incidence matrix of a Petri net. Recall from Sect. 3.2 that a marking $m = \langle m(p_1), \ldots, m(p_k) \rangle$ is a column vector over the places $P = \{p_1, \ldots, p_k\}$, and that the vectors $N^-(t)$ and $N^+(t)$ describe the tokens consumed and produced by $t$ on each $p \in P$. The resulting effect of $t$ on $P$ is $N(t) = N^-(t) + N^+(t)$. The incidence matrix of a net N is the matrix $N = \langle N(t_1) \ldots N(t_r) \rangle$ of the effects of all transitions $T = \{t_1, \ldots, t_r\}$. Given a firing sequence σ in WN starting in $m_0$, let the row vector $y = \langle y_1, \ldots, y_r \rangle$ specify how often each $t_i$, $i = 1, \ldots, r$, occurred in σ. For any such row vector, the marking equation $m = m_0 + N \cdot y$ yields the marking reached by firing σ. Figure 11 shows how the marking equation of the Petri net of our sample loan application process in Fig. 2 gives a new marking from the initial marking.

Figure 11. Marking equation to reach marking (p10) for our loan application example.
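The marking equation can be evaluated with plain list arithmetic. The sketch below uses a hypothetical three-place sequence net rather than the loan application net of Fig. 2. The function name `apply_marking_equation` and the row-per-place encoding of the incidence matrix are assumptions made here.

```python
def apply_marking_equation(m0, incidence, y):
    """Return m = m0 + N * y.

    incidence[p][t] is the effect of transition t on place p,
    y[t] is how often transition t fired.
    """
    return [
        m0[p] + sum(incidence[p][t] * y[t] for t in range(len(y)))
        for p in range(len(m0))
    ]

# Places p1, p2, p3; transitions t1: p1 -> p2 and t2: p2 -> p3.
N = [
    [-1, 0],   # p1: consumed by t1
    [1, -1],   # p2: produced by t1, consumed by t2
    [0, 1],    # p3: produced by t2
]
m = apply_marking_equation([1, 0, 0], N, [1, 1])
```

Firing each transition once moves the single token from p1 to p3.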
The decomposition of a sound free-choice Petri net into S-Components is based on its place invariants. A place invariant is an integer solution J to the equation $J \cdot N = 0$, describing that the number of tokens (weighted by J) is constant over all reachable markings, i.e., $J \cdot m_0 = J \cdot m$ for all reachable markings $m$ of N, because $J \cdot N \cdot y = 0$ [29]. The equation $J \cdot N = 0$ has an infinite number of solutions. We are interested in the unique set PI of non-trivial place invariants (different from 0) that are minimal (not linear combinations of other place invariants of N); this set can be obtained through standard linear-algebra techniques. Each minimal place invariant J possibly defines an S-Component as a subnet of the workflow net consisting of the support of J [29]. The workflow net can be decomposed into n S-Component subnets, where n is the number of minimal place invariants of the workflow net, i.e. $|PI|$. We next define an S-Component net and the decomposition of a workflow net.
For the set of all minimal place invariants PI of WN, the S-Component decomposition C is a non-empty set of S-Component workflow nets that cover WN, i.e. C = {WN J | J ∈ PI}.
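Whether a given vector is a place invariant, and which places form its support, can be verified directly. The sketch below does not compute the minimal invariants (that needs a linear-algebra solver). It only checks candidates on a hypothetical net with one AND-split over two tasks, and the names `is_place_invariant` and `support` are illustrative.

```python
def is_place_invariant(J, incidence):
    """Check J * N = 0, with incidence[p][t] the effect of t on place p."""
    transitions = range(len(incidence[0]))
    return all(
        sum(J[p] * incidence[p][t] for p in range(len(J))) == 0
        for t in transitions
    )

def support(J, places):
    """Places with a non-zero weight in J."""
    return {p for p, weight in zip(places, J) if weight != 0}

# Hypothetical net: p0 -split-> {p1, p2}; p1 -a-> p3; p2 -b-> p4; {p3, p4} -join-> p5.
places = ["p0", "p1", "p2", "p3", "p4", "p5"]
N = [
    [-1, 0, 0, 0],   # p0
    [1, -1, 0, 0],   # p1
    [1, 0, -1, 0],   # p2
    [0, 1, 0, -1],   # p3
    [0, 0, 1, -1],   # p4
    [0, 0, 0, 1],    # p5
]
J_a = [1, 1, 0, 1, 0, 1]   # follows the branch through task a
J_b = [1, 0, 1, 0, 1, 1]   # follows the branch through task b
component_a = support(J_a, places)
```

Each of the two invariants selects one sequential branch, while the all-ones vector is not an invariant because the split produces two tokens from one.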

Conformance checking with S-Component decomposition
This section introduces a novel divide-and-conquer approach to speed up the conformance checking between a system net and an event log. The division of the problem relies on the decomposition of the workflow net into S-Component workflow nets as introduced in Section 5.1.
The following definition introduces trace projection, an operation that filters out the events with labels not contained in the alphabet of a particular S-Component.
The novel divide-and-conquer approach decomposes the workflow net into concurrency-free sub-workflow nets (S-Components), computes partial alignments between projected traces and S-Components, and recomposes the partial alignments to create alignments for each trace in the log. Note that the alignments are partial because the projected traces are only parts of a complete trace. In the following, we explain the full procedure, illustrated in Fig. 13 and defined in Alg. 5, as we obtain and recompose partial alignments for the trace $\langle B, D, A, E, F, G \rangle$ in our running example and the S-Component workflow nets (Fig. 12). Observe that in our running example there are four S-Component workflow nets, each representing the execution of one of the parallel activities A, B, C and D.
Algorithmic idea. Algorithm 5 starts by computing the reachability graphs for each of the computed S-Components $SN_i$ (having alphabet $L_i$), as well as the DAFSAs of the projected logs with alphabet $L_i$ (see lines 1-3). It continues by taking each trace in the log, in this case the trace $\langle B, D, A, E, F, G \rangle$, and projecting it onto the alphabet of each S-Component, $c{\restriction}_{L_i}$. Thus, four partial traces are created: $\langle B, E, F, G \rangle$, $\langle D, E, F, G \rangle$, $\langle E, F, G \rangle$ and $\langle A, E, F, G \rangle$. The traces share the subsequence $\langle E, F, G \rangle$, as the corresponding transitions are in the sequential part of the net and hence in all S-Components in Fig. 12.
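Trace projection itself is a simple filter. The sketch below reproduces the four partial traces of the running example. The alphabets are reduced to the labels mentioned in the text and are therefore an assumption (the S-Components of Fig. 12 may contain further labels).

```python
def project(trace, alphabet):
    """Keep only the events whose label belongs to the given alphabet."""
    return [label for label in trace if label in alphabet]

trace = ["B", "D", "A", "E", "F", "G"]
# Assumed alphabets: one parallel activity each, plus the shared sequential part.
alphabets = {
    "SN_B": {"B", "E", "F", "G"},
    "SN_D": {"D", "E", "F", "G"},
    "SN_C": {"C", "E", "F", "G"},
    "SN_A": {"A", "E", "F", "G"},
}
projected = {name: project(trace, alphabet) for name, alphabet in alphabets.items()}
```

Since C does not occur in the trace, the projection onto the S-Component of C consists only of the shared suffix.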
Then, we compute the deterministic optimal alignment $\epsilon_i$ of each projected trace $c{\restriction}_{L_i}$ to its S-Component $SN_i$ (by calling Alg. 4 in line 5 of Alg. 5); we call each $\epsilon_i$ a projected alignment. Figure 13 shows the four optimal alignments $\epsilon_1$-$\epsilon_4$ retrieved by Alg. 4 for our running example. Note that because each $SN_i$ is sequential, the reachability graph of each $SN_i$ has the same size as $SN_i$ itself. Thus the $k$ projected alignment problems are exponentially smaller than the alignment problem on the reachability graph of the original SN.
Once the partial alignments have been computed, we iterate over the original trace $c$ and compose the projected alignments $\epsilon_i = \beta^i_1 \beta^i_2 \ldots \beta^i_{n_i}$ between $D_i$ and $R_i$ of $SN_i$, $i = 1, \ldots, k$, along all "shared" synchronizations into a "global" composed alignment $\epsilon_c$ between D and R of SN. For example, in Fig. 13, all projected alignments have a "shared" synchronization match(E) (as E is shared by all 4 S-Components), so we first advance in each of $\epsilon_1$-$\epsilon_4$ over their non-shared synchronizations (one step in each individual S-Component) and then compose all match(E) synchronizations of $\epsilon_1$-$\epsilon_4$ into one match(E) synchronization of $\epsilon_c$.
Technically, we compose the projected alignments ε i into ε c by composing the arcs of all the D i and of all the R i along the trace c. The composed arcs over the D i have to form a path through D and the composed arcs over the R i have to form a path through R. Then ε c is an alignment of c to SN by Def. 4.4.
Partially composing k FSMs. Recall that each synchronization $\beta_i = (op_i, b_i, a_i)$ of a projected alignment $\epsilon_i$ refers to an arc $b_i = (n_i, \ell, n'_i)$ of DAFSA $D_i$ and/or an arc $a_i = (m_i, \ell, m'_i)$ of the reachability graph $R_i$ of $SN_i$; $D_i$ and $R_i$ are FSMs.
Next we describe how we technically compose nodes and arcs of $D_i$, $i = 1, \ldots, k$, and $R_i$, $i = 1, \ldots, k$, respectively, into composed nodes (as vectors of nodes) and composed arcs (as partial vectors of arcs). The data structures resemble those of the PSP in Def. 4.6. A composed node of all the $R_i$, $i = 1, \ldots, k$, is a vector $m = \langle m_1, \ldots, m_k \rangle$, $m_i \in N_{R_i}$; and $n = \langle n_1, \ldots, n_k \rangle$, $n_i \in N_{D_i}$, is a composed node of all the $D_i$.
We will compose nodes and arcs of $D_i$, $i = 1, \ldots, k$, and $R_i$, $i = 1, \ldots, k$, along the visible events in the trace $c$. The composition begins at the initial composed nodes $n = \langle s_{D_1}, \ldots, s_{D_k} \rangle$ and $m = \langle s_{R_1}, \ldots, s_{R_k} \rangle$ (lines 7-8 of Alg. 5). We iterate over the trace $c$ with a counter $pos_c$ and advance separate counters $pos_i$, $i = 1, \ldots, k$, for each component (lines 9-10 of Alg. 5). The "next" synchronization $\beta^i_{pos_i}$, $i = 1, \ldots, k$, in each partial alignment gives an arc $a_i$ of $R_i$ and/or an arc $b_i$ of $D_i$. We partially compose a subset of those arcs sharing the same label (lines 11-39), as explained next.
First, we give some technical notation for partially composing the D i and R i arcs, and then we explain the loop for the composition.
Suppose we are at the composed marking $m = \langle m_1, \ldots, m_k \rangle$ of all the $R_i$, and the next arcs we shall follow in the $R_i$ are $a_1, \ldots, a_k$, $a_i = (m_i, \ell_i, m'_i)$. We may follow only those arcs together that share the same label. The partial composition of these arcs for some label $\ell$ is the vector a = ⟨â 1 , . .

Partially composing alignments. We can now explain how we compose the projected alignments $\epsilon_i$ into $\epsilon_c$ by composing the arcs of the $D_i$ and $R_i$ in the order in which they occur in the $\epsilon_i$, $i = 1, \ldots, k$. We "replay" trace $c$ starting from an empty composed alignment, with all projected alignments at $pos_i = 0$ and at the initial composed nodes $n$ and $m$ for the $D_i$ and $R_i$.
The next event to replay is $\ell = c(pos_c)$ (line 11 of Alg. 5). The next projected synchronizations are $\beta_i = \epsilon_i(pos_i)$, $i = 1, \ldots, k$, with $\beta_i = (op_i, a_i, b_i)$. Two cases may arise.
1. For all S-components $i$ that have $\ell \in L_i$ in their alphabet, the next synchronization $\beta_i$ involves arcs labeled with $\ell = \ell(\beta_i)$ (lines 25-39 in Alg. 5). In this case, all S-components "agree" and we can synchronize the $D_i$ arcs and the $R_i$ arcs in the $\beta_i$ of those S-components into a synchronization for $\ell$ in $\epsilon_c$. Again, three cases may arise.
(a) All synchronizations $\beta_i$ labeled with $\ell$ agree on the operation lhide (lines 25-29 in Alg. 5). The partially composed arc a = (n, $\ell$, n a) of the $D_i$ in the new synchronization (lhide, a, ⊥) describes that all S-components make an lhide step together (i.e., no S-component fires a transition for event $\ell$). The new synchronization is appended to $\epsilon_c$ and we advance the position $pos_i$ for all S-components involved in this composition.
(b) All synchronizations $\beta_i$ labeled with $\ell$ agree on the operation match (lines 30-36 in Alg. 5). We append to $\epsilon_c$ a new match synchronization with partially composed $D_i$ arcs and $R_i$ arcs, describing that all involved S-components make a match step together.
(c) The partial alignments of some S-components disagree on the operation, i.e., we have conflicting partial solutions (lines 37-39). In this case we fall back to computing a global alignment without decomposition (line 40).
2. There are S-components $i$ that have $\ell \in L_i$ in their alphabet, but whose next synchronization $\beta_i$ is not labeled with $\ell$, i.e. $\ell \neq \ell(\beta_i)$ (the set C in line 12 of Alg. 5). These S-components have to "catch up" with rhide synchronizations to reach a state where they can participate in an lhide or match synchronization over $\ell$ (lines 13-24). However, such S-components may only catch up together: suppose there is an S-component $i$ having as next synchronization an rhide over $\ell(\beta_i) = x \neq \ell$; then all S-components with $x$ in their alphabet (set $lab_x$ in line 15) must have an rhide synchronization on $x$ as their next synchronization (set $sync_x$ in line 14). If we find such a set (line 16), then we can compose an rhide synchronization from the $R_i$ arcs in $sync_x$ and append it to $\epsilon_c$. This step may have to be repeated if there is another S-component that still has to catch up. If the projected alignments disagree on the next rhide, we have conflicting partial solutions and fall back to computing a global alignment without decomposition (lines 22-24).

Note that in this way, we consecutively construct two paths: one through the composition of the $D_i$ (by the a = (n, $\ell$, n a) arcs) and one through the composition of the $R_i$. In Sect. 5.4 we formally state that these paths correspond to paths through D and R and thus $\epsilon_c$ is an alignment; the proof is given in Appendix C.
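The case analysis above can be condensed into a simplified recomposition loop. The sketch below is not Alg. 5: it works on `(operation, label)` pairs instead of composed arcs, signals a conflict with an exception instead of realigning globally, treats an rhide step on any label as a catch-up step, and emits an lhide for events unknown to every S-component. All names and the two-component example (a parallel activity A or B followed by C) are assumptions made for illustration.

```python
class RecompositionConflict(Exception):
    """Projected alignments disagree; a caller would fall back to a global alignment."""

def recompose(trace, alignments, alphabets):
    k = len(alignments)
    pos = [0] * k          # one position counter per projected alignment
    composed = []

    def nxt(i):
        return alignments[i][pos[i]] if pos[i] < len(alignments[i]) else None

    def try_rhide(i):
        # Component i may catch up only together with all components knowing x.
        step = nxt(i)
        if step is None or step[0] != "rhide":
            return False
        knows_x = [j for j in range(k) if step[1] in alphabets[j]]
        if any(nxt(j) != step for j in knows_x):
            return False
        for j in knows_x:
            pos[j] += 1
        composed.append(step)
        return True

    def catch_up(is_done):
        while not is_done():
            if not any(try_rhide(i) for i in range(k)):
                raise RecompositionConflict

    for label in trace:
        involved = [i for i in range(k) if label in alphabets[i]]
        if not involved:
            composed.append(("lhide", label))   # assumption of this sketch
            continue
        # Case 2: catch up until every involved component is at a step for `label`.
        catch_up(lambda: all(nxt(i) is not None and nxt(i)[0] != "rhide"
                             and nxt(i)[1] == label for i in involved))
        # Case 1: all involved components must agree on lhide or match.
        ops = {nxt(i)[0] for i in involved}
        if len(ops) != 1:
            raise RecompositionConflict
        for i in involved:
            pos[i] += 1
        composed.append((ops.pop(), label))
    # Remaining model-only steps after the trace has been read.
    catch_up(lambda: all(nxt(i) is None for i in range(k)))
    return composed

def cost(alignment):
    return sum(1 for op, _ in alignment if op in ("lhide", "rhide"))

alphabets = [{"A", "C"}, {"B", "C"}]
eps = [[("rhide", "A"), ("match", "C"), ("lhide", "A")],
       [("rhide", "B"), ("match", "C"), ("lhide", "B")]]
result = recompose(["C", "A", "B"], eps, alphabets)
```

In this example both components first catch up with an rhide step, then synchronize on match(C), and the remaining events A and B become lhide steps, giving a recomposed cost of 4.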
Explanation by example. In the example of Fig. 13, the partial alignments ε 1 -ε 4 are composed to ε c as follows.

Optimality is not guaranteed under recomposition
The recomposition of partial alignments in Alg. 5 is not necessarily optimal. Figure 14 shows a pair of S-Components, each representing a parallel activity A or B followed by a merging activity C, and a trace $\langle C, A, B \rangle$, where the merging activity is misallocated before the parallel activities. The two optimal projected alignments according to the sorting from subsection 4.4 then each include an rhide synchronization for the parallel activity, a match synchronization for the merging activity C and an lhide synchronization for the parallel activity. Note that both projected alignments are optimal in cost. Once the projected alignments are recomposed, the cost of the recomposed alignment is 4: $\langle r(A), r(B), m(C), l(A), l(B) \rangle$. However, there exists another proper alignment with a lower cost of 2: $\langle l(C), m(A), m(B), r(C) \rangle$. The recomposed alignment is not optimal, although the projected alignments are, because each projected alignment is chosen as one of multiple possible optimal alignments with the same cost, without considering which choices would globally minimize the cost after recomposition. In this example, with another kind of sorting the projected alignments could also be $\langle l(C), m(A), r(C) \rangle$ and $\langle l(C), m(B), r(C) \rangle$, which would recompose to the optimal alignment. With the current sorting introduced in subsection 4.4, a misallocated merging activity adds a cost of one over the optimal cost per S-Component workflow net, and this can happen multiple times when the parallel block is enclosed in a cyclic structure. Hence, the worst-case cost over-approximation of the proposed recomposition algorithm for a given trace $c$ is $k \cdot \#i$, where $k$ is the size of the S-Component decomposition and $\#i$ is the number of maximal repetitions of a label in $c$ that is also contained in a parallel block in the process model.
Transforming the recomposition procedure into a minimization problem of selecting the best projected alignments for recomposition would, however, increase the calculation overhead exponentially, since every trace can have exponentially many optimal alignments for each S-Component workflow net. Thus, selecting the best optimal projected alignments can be computationally more expensive than calculating only one-optimal alignments for the initial workflow net and event log. However, the reachability graphs of workflow nets without parallel constructs are polynomial in size, which speeds up the calculation of one-optimal projected alignments, and thus the proposed technique can provide significant speed-ups over the original technique on process models with parallelism.

Figure 14. Counter-example to optimality of a recomposed alignment.
Even though the presented approach computes non-optimal results, the evaluation shows that both the fraction of affected traces and the degree of over-approximation are rather low. The results obtained in the evaluation of this novel approach are often close to optimal.

Addressing invisible label conflicts
The recomposition of synchronizations from the partial alignments of the S-components in Alg. 5 relies on the unique labeling. In this way, arcs in the reachability graphs of different S-components can safely be related to each other. However, if a uniquely labeled process model contains a τ-labeled transition, Alg. 1 reduces these τ-labeled transitions by contraction with subsequent visible edges. This may lead to two arcs in the reachability graph carrying the same label D but describing different effects, a hidden form of label duplication. Applying Alg. 5 on such a model may lead to two partial alignments where the composed synchronizations agree on label D, but the underlying arcs in the reachability graphs disagree, leading to a "hidden" recomposition conflict not detected by Alg. 5. The resulting $\epsilon_c$ would no longer form a path through the process model.
In the following, we illustrate the problem by an example and discuss a simple change to Alg. 1 that ensures a unique labeling over all reachability graphs (global and projected). For such reachability graphs, Alg. 5 always returns an alignment, which we prove formally. Figure 15 shows an example with trace $\langle A, B, D \rangle$ and a process model where the parallel tasks B and C can be skipped. The process model is decomposed into two S-Component nets, one for each of the two parallel activities. When the trace is projected onto the S-Component with activity C, the obtained alignment matches both trace activities and skips activity C with the τ transition. [...] joining τ-transition). The alignment for the first S-component uses the former whereas the alignment for the second S-component uses the latter, leading to the conflict described above.

Figure 15. Recomposed alignment that can not be replayed on the process model.

Figure 16 shows how to relabel arcs in the reachability graph to avoid "hidden" label duplication. First, we add a unique index to each τ-transition at the start of Alg. 1 so that all τ-transitions are uniquely labeled. Second, we alter Alg. 1 to maintain the identity of the removed τ-transitions in the next visible transition $\ell$. In particular, when replacing an arc $(n_1, \ell, n_2)$ for an arc with label $\tau_i$, we create an extended label $(\tau_i, \ell)$ for the replacement arc. Let Alg. 1* be this modification of Alg. 1 and let Alg. 5* be the variant of Alg. 5 that invokes Alg. 1* instead of Alg. 1. Alg. 1* and the extended labels are not used for the PSP construction in Algs. 2 and 4. We omit the technical details. The changes in Fig. 16 lead to the following differences: the transition $\tau_1$ can now be distinguished from transition $\tau_3$. During the recomposition, there will be a label conflict between the extended labels $(\tau_1, D)$ and $(\tau_3, D)$. As a result, this trace will be aligned on the original reachability graph to ensure that the alignment forms a path through the process model. The following theorem states that the recomposed alignment is a proper alignment.
The formal proof by induction on the length of $c$ is given in Appendix C. The core argument is to show that the markings and the transition firings of SN can be reconstructed from the vector $m$ of markings of the S-component nets. Further, the arcs in the reachability graphs of the S-component nets are isomorphic to the transitions. As a result, the effect of the original transition in SN can be recomposed from the effects in the S-component nets. The latter argument requires the uniqueness of arcs in the reachability graphs provided by Alg. 1*.
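The idea of extended labels can be illustrated with a simplified, single-step contraction. This sketch is not Alg. 1 or Alg. 1*: it only replaces a silent arc that is directly followed by a visible arc, keeps all other arcs, and uses a hypothetical graph in which the label D can be reached either by skipping with `tau1` or by joining with `tau3`. The function name and the string encoding of silent labels are assumptions made here.

```python
def contract_silent_arcs(arcs):
    """arcs: list of (source, label, target); silent labels look like 'tau1'.

    A silent arc (n1, tau_i, n2) followed by a visible arc (n2, l, n3) is
    replaced by one arc (n1, (tau_i, l), n3) whose extended label remembers
    which silent transition was removed.
    """
    def is_silent(label):
        return isinstance(label, str) and label.startswith("tau")

    contracted = [arc for arc in arcs if not is_silent(arc[1])]
    for n1, tau, n2 in arcs:
        if not is_silent(tau):
            continue
        for source, label, target in arcs:
            if source == n2 and not is_silent(label):
                contracted.append((n1, (tau, label), target))
    return contracted

# Hypothetical fragment: D is reachable after skipping (tau1) or after B and a join (tau3).
arcs = [(0, "tau1", 4), (0, "B", 3), (3, "tau3", 4), (4, "D", 5)]
contracted = contract_silent_arcs(arcs)
extended = [label for _, label, _ in contracted if isinstance(label, tuple)]
```

With plain labels both replacement arcs would carry the label D. With extended labels they differ, so a recomposition step can detect that two projected alignments took different silent routes.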

Evaluation
We implemented our approach in a standalone open-source tool. Given an event log in XES format and a process model in BPMN or PNML (the latter is the serialization format of Petri nets), the tool returns several conformance statistics such as fitness and raw fitness cost. Optionally, a list of one-optimal alignments for each unique trace as well as their individual alignment statistics can also be extracted. The tool implements both the Automata-based approach described in Section 4 and the extended approach with the S-Components improvement described in Section 5.
Using this tool, we conducted a series of experiments to measure the quality and time performance of both our approaches against two state-of-the-art conformance checking techniques: 1) the newest version of the one-optimal alignment with the ILP marking equation, first presented in [30] and implemented in ProM in the PNetReplayer package (ILP Alignments); and 2) the one-optimal alignment approach using the extended marking equation presented in [5] and implemented in ProM in the Alignment package (MEQ Alignments). We implemented multi-threading with each unique trace, and in the S-Components variant, also for each S-Component.
The two benchmark implementations use optimized data structures and efficient hashcodes [31]. Accordingly, we optimized our software implementation using similar techniques, so as to achieve results that are as comparable as possible. Specifically, we optimized the queueing mechanism by improving the selection of suitable solutions, merging overlapping solutions and prioritizing longer solutions with the same cost in order to find an optimal solution faster.

Setup
We measured the quality of alignment in terms of alignment cost (Def. 4.7) per trace. We chose to report the alignment cost over other conformance measures such as fitness, as it better allows one to pinpoint over-approximation of the result. Given that the worst-case complexity of the alignment problem is exponential, we decided to apply a time bound of 10 minutes to each experiment. We note that previous experiments reported that in certain cases the computation of an alignment may take over a dozen hours [11]. The experiments were run multi-threaded with a fixed number of 16 threads for each approach to achieve a comparable computation setup. Each experiment was run five times and we report the average results of runs #2 to #4 to avoid the influence of the Java class loader and to reduce variance.
The experiments were conducted on a 22-core Intel Xeon CPU E5-2699 v4 with 2.30GHz, with 128GB of RAM running JVM 8. This machine can execute up to 44 threads per socket.

Datasets
We used two datasets of log-model pairs from a recent benchmark on automated process discovery [32] to investigate the performance over a wide range of log and process model characteristics. The first dataset consists of twelve public event logs. These logs in turn originate from the 4TU Centre for Research Data. They include the logs of the Business Process Intelligence Challenge (BPIC) series, BPIC12 [33], BPIC13$_{cp}$ [34], BPIC13$_{inc}$ [35], BPIC14 [36], BPIC15 [37], BPIC17 [38], the Road Traffic Fines Management process log (RTFMP) [39] and the SEPSIS Cases log (SEPSIS) [40]. These logs record process executions from different domains such as finance, healthcare, government and IT service management. The BPIC logs from the years 2011 and 2016 were excluded since they do not represent real business processes. We observe that in the benchmark, some of those logs (marked with "f") were filtered from infrequent behavior with the technique in [41] to avoid state-space explosion in the computation of fitness and precision. We retained this filtering step because we experienced the same problem with the unfiltered logs, i.e. it was not possible to compute the raw fitness cost for most of the approaches assessed. Moreover, keeping this filtering allows us to retain exactly the same dataset as used in the benchmark, for compatibility purposes. The second dataset is composed of eight proprietary logs sourced from several organizations around the world, including healthcare, banking, insurance and software vendors.
Each of the two datasets comes with four process models per log, which have been discovered using four state-of-the-art automated discovery methods in the benchmark in [32], namely: Inductive Miner [42], Split Miner [43], Structured Heuristics Miner [44] and Fodina [45]. We discarded the process models discovered by the latter two methods for our experiments, since they may lead to process models with transitions with duplicate labels (and in some cases also to unsound models), which our S-Components extension does not handle. This resulted in a total of 40 log-model pairs for our evaluation. Table 1 reports the log characteristics. We have logs of different sizes in terms of total traces (681-787,667) or total number of events (6,808,706). The difficulty of the conformance checking problem, however, is more related to the number of distinct traces (0.01%-97.5%), the number of distinct events and the trace length. The logs thus feature a wide range of characteristics, from simple to complex logs, for the conformance checking problem. For reference, we made the public logs and corresponding models, together with all the results of our experiments, available online [46].

Table 2 reports the statistics of the process models obtained with Inductive Miner (IM) and Split Miner (SM), for each log in our evaluation. Specifically, this table reports size (number of places, transitions and arcs), number of transitions, number of gateways (XOR-splits, AND-splits) and size of the resulting reachability graph from the Petri net (in case of a BPMN model, it is the Petri net obtained from this model). In addition, if a Petri net has at least one AND-split, we also report the number of S-Components and for each of them the following statistics: their average Petri net size, average number of transitions, average number of XOR-splits and average size of the resulting reachability graph.
Inductive Miner is designed to discover highly-fitting models. As a result, the models often exhibit a large reachability graph as the models need to cater for a large variety of executions present in the logs. Split Miner strikes a trade-off between fitness and precision by filtering the directly-follows graph of the log before discovering the model. That leads to process models with a smaller state space, but with a possibly higher number of fitness mismatches. Altogether, these models present two different scenarios for conformance checking: the models discovered by Inductive Miner require a large state space to be traversed with a low to medium number of mismatches per trace, while the models of Split Miner have a smaller state space with a medium to high number of mismatches per trace.

Results
The S-Components approach outperforms the other approaches in 8 out of 40 log-model pairs, the Automata-based technique performs best in 28 out of 40 cases and the extended MEQ Alignments approach performs best in 3 out of 40 cases. In total, the S-Components approach times out ("t/out" in the table) in 2 cases, the Automata-based approach in 3, ILP Alignments in 1 and the extended MEQ Alignments approach in 6 cases. All approaches timed out on PRT2 (IM), which has a huge state space of 5,515,357 nodes and arcs in the reachability graph. The S-Components approach actually manages to compute alignments quickly for this log-model pair since the S-Component reachability graphs are very small, but it times out when some traces conflict with each other in the recomposition algorithm and need to be aligned on the original reachability graph, which is much larger. The S-Components approach manages to compute alignments for BPIC14$_f$ (IM), which was not possible within the 10-minute timeout for the Automata-based approach.
The Automata-based approach performs better than both state-of-the-art approaches, ILP and MEQ Alignments, by one to two orders of magnitude. For example, for BPIC17 f (IM) it takes 680 ms against 20.7 sec for ILP Alignments, and for BPIC12 (SM) it takes 4,578 ms against 188,489 ms for ILP Alignments. When the state space reduction of the S-Components is effective, it shows the potential to improve over other approaches by at least one order of magnitude, e.g. for BPIC12 (IM) it improves from 121,845 ms (Automata-based) to 44,301 ms and for BPIC14 f (IM) from 84,102 ms (ILP) to 7,789 ms. In total, the Automata-based approach improves over both baseline approaches by one order of magnitude in ten datasets and the S-Component approach in three datasets. The S-Component extension improves over the Automata-based approach by one order of magnitude in five cases. The process models discovered by Split Miner do not feature parallel constructs, except for the model discovered from the BPIC12 log. Thus, for all other Split Miner models, the performance of the S-Component extension is the same as that of the Automata-based approach. In the BPIC12 (SM) case, the Automata-based technique outperforms the S-Component approach, even though the latter decomposes the model along its parallel constructs. This is because the combined state space of the S-Component reachability graphs is larger than the reachability graph of the original process model. This can already happen for process models with a small amount of parallel behavior relative to a large amount of other behavior, i.e. the model from the BPIC12 log has one parallel block with two parallel transitions against a total of 85 transitions.
Since the advantages of the S-Components decomposition are limited to a specific type of process models (those with large state spaces due to a high degree of parallelism), we derived an empirical rule to decide when to use the S-Components improvement on top of our Automata-based approach. Accordingly, we apply this improvement if the sum of the reachability graph sizes of all S-Components is smaller than the size of the original reachability graph of the process model. We added the execution times for this hybrid approach to Table 3. The hybrid approach manages to outperform all other approaches in 30 out of 40 cases and performs second best in five more cases. We note that the reported execution time of the hybrid approach does not include the time required to decide whether or not to apply the S-Components improvement. If we end up selecting the S-Components, we do not actually need additional time, since the reachability graphs for the S-Component nets are computed anyway as part of the decomposition approach. If we select the base approach, this leads to two cases: the model does not have parallelism or it does. If it does not, we detect this case by checking all transitions of the Petri net, which is a linear operation, so the time is negligible. If the model has parallelism, we need to calculate the reachability graphs for every S-Component net. In practice, this time was always negligible in our experiments, but there can be very large process models for which this operation may be expensive. However, in these cases, it is likely that we would select the S-Component approach anyway.
Table 4 shows the optimal costs for the subset of log-model pairs in which the S-Components approach over-approximates the optimal cost of the alignments, i.e. 6 out of 40 cases. For completeness, the full table with optimal costs for all datasets can be found in Appendix A.
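The empirical selection rule of the hybrid approach can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function name `use_s_components` and the example sizes are hypothetical, and "size" is assumed to be the number of nodes and arcs of a reachability graph.

```python
from typing import Sequence


def use_s_components(original_rg_size: int, component_rg_sizes: Sequence[int]) -> bool:
    """Decide whether to apply the S-Components improvement (hypothetical helper).

    original_rg_size: size of the reachability graph of the whole process model.
    component_rg_sizes: sizes of the reachability graphs of the S-Component nets;
    assumed to be empty when the model has no parallelism.
    """
    if not component_rg_sizes:
        # No parallelism detected: fall back to the base Automata-based approach.
        return False
    # Apply the decomposition only if it shrinks the combined state space.
    return sum(component_rg_sizes) < original_rg_size


# Illustrative numbers only (not taken from Table 2).
highly_parallel = use_s_components(1_000_000, [120, 95, 210])
barely_parallel = use_s_components(300, [280, 290])
```

With these illustrative inputs, the first model would be decomposed, whereas the second would be handled by the base approach because its components together are larger than the original reachability graph.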
The difference between the S-Component approach and all other approaches with optimal costs ranges from 0.002 to 0.052 per trace. We further broke down the over-approximation into two columns: the fraction of traces in the log that were affected by an over-approximation, which ranges from 0.2 to 5.2%, and the average fitness-cost that was over-approximated in the affected traces, which ranges from 1 to 2 mismatches more than the optimal number. We observe that the approach never under-approximates and always returns proper alignments. By design, the Automata-based approach always has the same cost as the ILP or the MEQ Alignments and thus is always optimal.
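The relation between the per-trace difference and its two-column breakdown can be made explicit. The symbols below are introduced here for illustration only and are not part of the paper's notation: $N$ is the number of traces in the log, $A$ is the set of traces affected by an over-approximation, and $\Delta_i \geq 0$ is the cost difference of trace $i$ with respect to the optimal cost, with $\Delta_i = 0$ for unaffected traces.

```latex
\[
  \underbrace{\frac{1}{N}\sum_{i=1}^{N}\Delta_i}_{\text{difference per trace}}
  \;=\;
  \underbrace{\frac{|A|}{N}}_{\text{fraction of affected traces}}
  \cdot
  \underbrace{\frac{1}{|A|}\sum_{i \in A}\Delta_i}_{\text{average over-approximation in affected traces}}
\]
```

For instance, if 0.2% of the traces are affected and each of them has one additional mismatch, the difference per trace is $0.002 \cdot 1 = 0.002$, which is consistent with the lower end of the reported range.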
One example of over-approximation can be observed in the SEPSIS dataset (IM) for the trace ⟨CRP, Leucocytes, LacticAcid, ER Registration, ER Triage, ER Sepsis Triage, IV Antibiotics, IV Liquid⟩. The optimal alignment for this trace, retrieved with ILP Alignments, is ⟨(rhide, ER Registration), (match, CRP), (match, Leucocytes), (match, LacticAcid), (lhide, ER Registration), (match, ER Triage), (match, ER Sepsis Triage), (match, IV Antibiotics), (match, IV Liquid)⟩ with a cost of 2, because task ER Registration is misplaced after the parallel block. The S-Components approach finds instead the following alignment: ⟨(lhide, CRP), (lhide, Leucocytes), (lhide, LacticAcid), (match, ER Registration), (match, ER Triage), (match, ER Sepsis Triage), (match, IV Antibiotics), (match, IV Liquid)⟩ with a cost of 3. As shown in Figure 17, in the process model, task ER Registration appears before the parallel block, while in the trace it occurs after the activities of the parallel block. As a result, the S-Component approach will hide all the activities in the parallel block, i.e. CRP, Leucocytes and LacticAcid, and then match the activity ER Registration. When recomposing the projected alignments, however, the added alignment cost will be 3 instead of 2. Note that the alignment of the S-Components approach is still a proper alignment, i.e. it still represents the trace and forms a path through the process model.
Figure 17. Sepsis Inductive Miner process model.
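The costs of the two SEPSIS alignments discussed above can be recomputed with a few lines of code. This is a small sketch under two assumptions: every lhide and rhide move costs 1 while a match costs 0 (which agrees with the stated costs of 2 and 3), and the helper names `alignment_cost` and `log_projection` are hypothetical. The sketch only checks that each alignment represents the trace; it does not check that it forms a path through the process model, since the model of Figure 17 is not encoded here.

```python
from typing import List, Tuple

Move = Tuple[str, str]  # (operation, activity label)

trace = ["CRP", "Leucocytes", "LacticAcid", "ER Registration",
         "ER Triage", "ER Sepsis Triage", "IV Antibiotics", "IV Liquid"]

# Optimal alignment reported for ILP Alignments.
optimal: List[Move] = [
    ("rhide", "ER Registration"), ("match", "CRP"), ("match", "Leucocytes"),
    ("match", "LacticAcid"), ("lhide", "ER Registration"), ("match", "ER Triage"),
    ("match", "ER Sepsis Triage"), ("match", "IV Antibiotics"), ("match", "IV Liquid"),
]

# Alignment reported for the S-Components approach.
s_components: List[Move] = [
    ("lhide", "CRP"), ("lhide", "Leucocytes"), ("lhide", "LacticAcid"),
    ("match", "ER Registration"), ("match", "ER Triage"),
    ("match", "ER Sepsis Triage"), ("match", "IV Antibiotics"), ("match", "IV Liquid"),
]


def alignment_cost(alignment: List[Move]) -> int:
    """Count the hide moves (assumed unit cost); matches are free."""
    return sum(1 for op, _ in alignment if op in ("lhide", "rhide"))


def log_projection(alignment: List[Move]) -> List[str]:
    """Keep the moves that consume a trace event (match and lhide)."""
    return [label for op, label in alignment if op in ("match", "lhide")]


optimal_cost = alignment_cost(optimal)            # cost of the optimal alignment
s_components_cost = alignment_cost(s_components)  # cost after recomposition
both_represent_trace = (log_projection(optimal) == trace
                        and log_projection(s_components) == trace)
```

Both alignments project onto the same trace, so the over-approximation only affects the number of hide moves, not the correctness of the alignment with respect to the log.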

Threats to validity
A potential threat to validity is the number of threads used in our experiments (16). A different number of threads can lead to different results. For that reason, we repeated our experiments in single-thread mode. The results are reported in Appendix B and are consistent with those obtained by the multi-threaded evaluation.
Another potential threat to validity is the selection of datasets. We decided to use two datasets of real-life log-model pairs from a recent discovery benchmark [32]. These datasets exhibit a wide range of structural characteristics and originate from different industry domains, so they provide a good representation of reality. However, the models discovered by Split Miner did not contain many parallel structures and thus did not highlight the strengths of the S-Components decomposition. This calls for further experiments with models with a higher degree of parallelism and, more generally, with very large real-life log-model pairs. Such datasets are not publicly available at the time of writing. An alternative is to use artificial datasets as in [47].
A final threat to validity is posed by the number of methods used for automated process discovery (two). Potentially we could have chosen a larger number of methods. The choice of Split Miner and Inductive Miner was determined by both pragmatic reasons (other methods such as Structured Heuristics Miner return models with duplicate labels which we cannot handle, or led to models for which fitness could not be computed) as well as by the need to test two extreme cases: models with large state spaces versus models with large degrees of parallelism. Moreover, they are the best performing automated discovery methods according to the benchmark in [32]. So, all considered, they constitute a sufficiently representative set of discovery methods.

Conclusion
This paper presented an automata-based technique for conformance checking of process models against event logs. Specifically, the paper showed that the problem of conformance checking can be mapped to that of computing a minimal error-correcting product between an automaton representing the event log (its minimal DAFSA) and an automaton representing the process model (its reachability graph). The resulting product automaton can be used to produce sets of optimal alignments between each trace in the log and a corresponding trace in the model.
The use of a DAFSA to represent the event log allows us to benefit from both prefix and suffix compression of the traces in the log. This is a distinctive feature of the proposal with respect to existing trace alignment techniques, which compute an alignment between each trace in the log and the model, without any reuse across traces. The empirical evaluation reported in the paper shows that this approach outperforms state-of-the-art trace alignment techniques in a clear majority of cases.
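The reuse of computations across traces can be illustrated with a strongly simplified sketch. It deviates from the technique of the paper in several ways, all of which are assumptions made for brevity: the log is stored in a prefix tree (which shares prefixes only, whereas a DAFSA also shares suffixes), the product is explored with plain Dijkstra search (i.e. A* with a zero heuristic), only the alignment cost is returned rather than the alignment itself, and every hide move costs 1. The function names and the toy model are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Set, Tuple


def build_prefix_tree(traces: Sequence[Sequence[str]]):
    """Build a prefix tree; node 0 is the root. Returns (children, end_node)."""
    children: List[Dict[str, int]] = [{}]
    end_node: Dict[Tuple[str, ...], int] = {}
    for trace in traces:
        node = 0
        for label in trace:
            if label not in children[node]:
                children.append({})
                children[node][label] = len(children) - 1
            node = children[node][label]
        end_node[tuple(trace)] = node
    return children, end_node


def alignment_costs(traces: Sequence[Sequence[str]],
                    model: Dict[str, List[Tuple[str, str]]],
                    initial: str, finals: Set[str]) -> Dict[Tuple[str, ...], int]:
    """Minimal number of hide moves per trace, via one search over the product."""
    children, end_node = build_prefix_tree(traces)
    dist = {(0, initial): 0}
    queue = [(0, 0, initial)]  # (cost, log node, model state)
    while queue:
        cost, n, m = heapq.heappop(queue)
        if cost > dist[(n, m)]:
            continue  # stale queue entry
        successors = []
        for child in children[n].values():
            successors.append((cost + 1, child, m))            # lhide: skip a log event
        for label, m_next in model.get(m, []):
            successors.append((cost + 1, n, m_next))           # rhide: skip a model step
            if label in children[n]:
                successors.append((cost, children[n][label], m_next))  # match
        for new_cost, n2, m2 in successors:
            if new_cost < dist.get((n2, m2), float("inf")):
                dist[(n2, m2)] = new_cost
                heapq.heappush(queue, (new_cost, n2, m2))
    # In a prefix tree each node has a unique path from the root, so the
    # distance to (end node, final state) is the optimal cost of that trace.
    return {t: min(dist.get((node, f), float("inf")) for f in finals)
            for t, node in end_node.items()}


# Toy model (hypothetical): the sequence a, b, c.
toy_model = {"s0": [("a", "s1")], "s1": [("b", "s2")], "s2": [("c", "s3")]}
toy_log = [("a", "b", "c"), ("a", "c"), ("a", "b", "b", "c"), ("b", "a", "c")]
costs = alignment_costs(toy_log, toy_model, "s0", {"s3"})
```

In this sketch, the three traces starting with a share the product states reached after matching a, so that part of the search is performed once for all of them.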
The proposed automata-based technique suffers from combinatorial explosion when the process model contains a large number of concurrent branches. To address this shortcoming, we combined the automata-based conformance checking technique with a technique to decompose a process model (specifically a Petri net) into a collection of concurrency-free components, namely S-components. Each of these S-components (which corresponds to an automaton) can be aligned separately against a projected version of the log, in such a way that the alignments can be recomposed into a correct (although not necessarily optimal) alignment. The evaluation showed that this decomposition-based approach achieves lower execution times than the monolithic automata-based approach when the number of S-components is high, in part thanks to the fact that the decomposition-based approach lends itself to parallel computation. The evaluation also showed that the decomposition-based approach computes optimal alignments in the majority of cases. In those model-log pairs where it does not find the optimal (minimal) alignments, the over-approximation is small (one or a handful of moves) and it only occurs for a small percentage of traces (5% or less).
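The notion of a projected version of the log can be illustrated as follows. The sketch assumes that a trace is projected onto an S-component by keeping only the events whose labels occur in that component; the component alphabets and the helper name `project_trace` are hypothetical and do not stem from the evaluated models.

```python
from typing import List, Sequence, Set


def project_trace(trace: Sequence[str], labels: Set[str]) -> List[str]:
    """Keep only the events whose label belongs to the given S-component."""
    return [event for event in trace if event in labels]


# Hypothetical model: A, then B and C in parallel, then D.
# Each S-component follows one of the two parallel branches.
component_1 = {"A", "B", "D"}
component_2 = {"A", "C", "D"}

trace_1 = ["A", "B", "C", "D"]
trace_2 = ["A", "C", "B", "D"]

projections_1 = (project_trace(trace_1, component_1), project_trace(trace_1, component_2))
projections_2 = (project_trace(trace_2, component_1), project_trace(trace_2, component_2))
```

In this example, the two interleavings of B and C yield identical projected traces on each component, which suggests why aligning against concurrency-free components can require less work than aligning against the full model.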
The proposed technique still fails to perform satisfactorily on a handful of the event logs used in the evaluation. Further improvements may be achieved by designing better heuristic functions to guide the A* algorithm.
In this article, we combined S-components decomposition with an automata-based approach to align each S-component against the event log. This combination is natural, since each S-component corresponds to a concurrency-free slice of the process model, which can be seen as an automaton. It is possible, however, to combine this S-component decomposition approach with existing exact or approximate trace alignment techniques, including the trace alignment techniques of Adriansyah et al. [3] or Van Dongen [5]. An avenue for future work is to explore the relative performance of the S-component decomposition approach with other conformance checking algorithms.
This article focused on the problem of identifying unfitting log behavior. Another avenue for future work is to extend the approach to detect additional model behavior, for example by adapting the ideas proposed in [4] in the context of event structures.
Proof of Thm 5.1.1. We show $\lambda(\epsilon_c|_D) = c$ by induction on the prefixes $c'$ of $c$ in the for-loop in lines 10-39. For the empty prefix $c' = \langle\rangle$ before the for-loop, $\epsilon_c = \langle\rangle$. In each iteration of the for-loop with $pos_c \leq |c|$, the prefix $c'$ is extended with $\ell = c(pos_c) \in L$, and the current prefix of $\epsilon_c$ is extended with a synchronization $(lhide, (n, \ell, n'), \perp)$ (line 27) or $(match, (n, \ell, n'), (m, \ell, m'))$ (line 33). Thus, the proposition holds for both prefixes. The only other extension of the current prefix of $\epsilon_c$ in Alg. 5 is with synchronizations $(rhide, \perp, (m, \ell, m'))$ in line 18, which do not occur in $\lambda(\epsilon_c|_D)$.

Appendix A. Complete Cost comparison and order of approximation
Proving Thm 5.1.2 requires some further notation, definitions, and observations on Petri nets. For a $WN$, let $C = \{WN_1, WN_2, \ldots, WN_k\}$ be the set of S-Components of $WN$. By abuse of notation, let $C(t)$ be the set of S-components in which $t$ is contained, i.e. $WN_i = ((P_i, T_i, F_i, \lambda_i), i_i, o_i) \in C(t)$ iff $t \in T_i$, for each $t \in T$; the sets $C(p)$, $p \in P_i$, are defined accordingly.
In a sound free-choice net $WN$, the pre- and post-sets of a transition $t$ (together) cover the same S-components, which follows from $WN$ being covered by S-components [48] and the free-choice structure: