The Connection between Process Complexity of Event Sequences and Models discovered by Process Mining

Process mining is a research area focusing on the design of algorithms that can automatically provide insights into business processes. Among the most popular algorithms are those for automated process discovery, whose ultimate goal is to generate a process model that summarizes the behavior recorded in an event log. Past research has largely aimed to improve process discovery algorithms irrespective of the characteristics of the input log. In this paper, we take a step back and investigate the connection between measures capturing characteristics of the input event log and the quality of the discovered process models. To this end, we review the state-of-the-art process complexity measures, propose a new process complexity measure based on graph entropy, and analyze this set of complexity measures on an extensive collection of event logs and corresponding automatically discovered process models. Our analysis shows that many process complexity measures correlate with the quality of the discovered process models, demonstrating the potential of using complexity measures as predictors of process model quality. This finding is important for process mining research, as it highlights that not only algorithms, but also connections between input data and output quality should be studied.


Introduction
Recent years have seen a drastic increase in the availability of event sequence data and corresponding techniques for analyzing business processes, healthcare pathways, or software development routines [46,28,16]. Process mining is a research area focusing on the design of techniques that can automatically provide insights into business processes by analyzing historic process execution data, known as event logs [45,15]. In process mining research, various algorithms have been developed for automated process discovery. A recent study found a rich spectrum of 35 distinct groups of such algorithms scattered over more than 80 studies [6]. Much of this research on automated process discovery is motivated by the ambition to improve process discovery outputs, in terms of high precision and recall, while producing models that are simple and easy to understand [3].
So far, this stream of research on improving process discovery algorithms has been largely driven by the implicit assumption that a better algorithm would generate a better process model, no matter the characteristics of the input event log. In fact, there are good reasons to question this narrow focus. First, research on computer experiments highlights that studying the effect of input data characteristics on output is an important objective in many research areas [39,27].
Second, research on classifiers demonstrates the benefits of selecting algorithms based on characteristics of the input data [20,36,35]. Third, in various application domains, establishing a solid understanding of how input characteristics influence output has led to fundamental algorithmic innovations [38]. For these reasons, Kriegel et al. recommend factoring in the variation of input parameters over meaningful ranges when comparing algorithms [22].
In this paper, we revisit the output quality of automated process discovery algorithms in light of these arguments. More specifically, we investigate the empirical connections between measures capturing process complexity in terms of process behavior recorded in an event log and the quality of the process models discovered from that event log, as well as which of these process complexity measures can serve as a suitable predictor of process discovery quality. To this end, we first review process complexity measures defined in prior research studies.
We analyze their characteristics and categorize them according to what perspective of process complexity they capture. Then, noting that each measure relates to a different perspective, we propose a new measure of process complexity based on graph entropy, which can exhaustively capture process complexity from multiple perspectives. Lastly, we analyze the process complexity measures using a prototypical implementation and an evaluation over an extensive set of event logs and their corresponding automatically discovered process models. Our analysis shows that many process complexity measures (including our novel measure) correlate with the quality of the discovered process models. Our findings demonstrate the potential of using process complexity measures as predictors for the quality of process models discovered with state-of-the-art process discovery algorithms. Such a result is important for process mining research, as it highlights that not only algorithms, but also connections between input data and output quality should be studied.
The remainder of the paper is structured as follows. Section 2 summarizes prior research on measuring process complexity and related studies. Section 3 presents our process complexity measure, how it is calculated, and which properties it satisfies. Section 4 presents our evaluation and the main findings of this study. Section 5 concludes the paper and draws ideas for future work.

Background and Related Work
In this section, we contextualize our study by discussing related work with a focus on the quality of discovered process models, automated process discovery algorithms, and process complexity.

Quality of Discovered Process Models
Process discovery is the task that encompasses the understanding of a business process behavior and the representation of that behavior in the form of a process model [15]. Process mining research has developed various algorithms for automated process discovery. These algorithms analyze the information recorded in an input event log (i.e., the process execution data capturing the process behavior) and automatically generate a process model as output. Several measures have been defined for assessing the quality of a discovered process model [45]; precision, fitness, and simplicity are the most frequently used ones. Precision measures to which extent the behavior allowed by the process model can be found in the event log. Fitness measures to which extent the behavior contained in the event log can be reproduced by the process model. Both precision and fitness range between 0 and 1, where a fitness of 1 means that all the sequences of events contained in the event log can be reproduced by the process model. Several simplicity measures have also been defined [24]. Some of them take into account the structure of a process model (e.g., model size), others the behavioral variability of a process model (e.g., control flow complexity). In our example in Figure 1, it is apparent that Model (c) is structurally more complex than Models (a) and (b), since it has more nodes and edges. On the other hand, Model (c) is behaviorally simpler than Models (a) and (b), since it allows for less variation. Accordingly, its control flow complexity is lower.

Automated Process Discovery Algorithms
Over the past decade, more than 80 research papers have proposed new algorithms for automated process discovery [6]. Often, the process models they produce differ significantly in terms of how they trade off precision, fitness, and simplicity [6]. The latest benchmark study [6] compares and evaluates seven of the most effective state-of-the-art algorithms, among them α$ [18], Fodina (FO), the Structured Heuristics Miner (SHM), and the Hybrid ILP Miner (HILP). The study showed that only some of the algorithms were capable of discovering fitting, precise, and simple process models across the whole benchmark dataset of 24 real-world event logs, and that all of them exhibited a substantial variance of performance. HILP and α$ often discovered unsound models (i.e., models containing behavioral errors, such as deadlocks) and rarely produced highly fitting, precise, or simple process models. While FO and SHM often produced accurate models, these were usually highly complex and difficult to interpret.
Even though the study by Augusto et al. [6] does not discuss this aspect, their results suggest a connection between the input event log features and the quality of the automatically discovered process models. Also other works have tried to select the most suitable discovery algorithm based on event log characteristics [36,35], but without studying the connection between log complexity and model quality. For this reason, we hypothesize that characteristics of the event log influence the quality of discovered process models.

Process Complexity as a Factor of Process Discovery Quality
It is a challenge for discovery algorithms to generate process models that are easy to understand. Often, the generated models are overly complex. These complex models are called "spaghetti models". Van der Aalst emphasizes that "spaghetti-like structures are not caused by the discovery algorithm but by the variability of the process" [43]. In line with this observation, several proposals have been made for pre-processing event logs independent of the discovery algorithm applied.
These proposals build on clustering, supervised sequence labeling, sequential patterns, or text matching [11,40,14]. They highlight the potential to improve process discovery outputs by modifying characteristics of the event log data as an input.
To assess the empirical connection between log complexity and process discovery quality, we first need measures that capture the complexity of an event log. Complexity has mostly been studied for process models rather than for the processes recorded in event logs [24,25,37,26]. The few studies that consider process complexity more explicitly stem from computer science, organization science, and management science. The corresponding measures are categorized in Table 1 and described next.

Table 1. Process complexity measures for event logs (measure, label, and reference).

Size
  Number of Events — magnitude [17]
  Number of Event Types — variety [17]
  Number of Sequences — support [17]
  Minimum, Average, Maximum Sequence Length — TL-min, TL-avg, TL-max [45]
  Average Time Difference between Consecutive Events — (time) granularity [17]

Variation
  Number of Acyclic Paths in Transition Matrix — LOD [31]
  Number of Ties in Transition Matrix — t-comp [19]
  Lempel-Ziv Complexity — LZ [30]
  Number and Percentage of Unique Sequences — DT(#), DT(%) [45]
  Average Distinct Events per Sequence — structure [17]

Distance
  Average Affinity — affinity [17]
  Deviation from Random — dev-random [30]
  Average Edit Distance — avg-dist [30]
The first category includes size measures. Various properties of an event log can be easily counted including the number of events, sequences, and event types, the minimum, maximum, and average sequence length, and the average and minimum time difference between two events (proposed by Günther [17,Ch.3]).
The second category contains measures related to the variation of the process behavior recorded in the event log. Several of these measures take the transition matrix derived from the directly-follows relations observed in the event log as a starting point. Pentland [31] proposes to calculate complexity as the number of acyclic paths implied by the transition matrix derived from the event log. Haerem et al. [19] use a slight variation based on what they call the number of ties, which in essence is the count of directly-follows relations observed in the event log. Pentland [30] also proposes a variation measure based on the number of operations required to compress the event log with the Lempel-Ziv algorithm. Finally, the (absolute and relative) number of distinct sequences [45] and the average number of distinct events per sequence [17] also provide an indication of variation.
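To make the variation category concrete, the following sketch computes three of these measures for a toy event log given as a list of activity sequences. This is our own illustrative implementation, not the code of the cited studies; the function and variable names are ours.

```python
def variation_measures(log):
    """Simple variation measures for a log given as a list of
    activity sequences (tuples of event labels)."""
    # DT(#) and DT(%): absolute and relative number of unique sequences
    distinct = set(log)
    dt_abs, dt_rel = len(distinct), len(distinct) / len(log)
    # t-comp: number of "ties", i.e., distinct directly-follows relations
    ties = {(a, b) for seq in log for a, b in zip(seq, seq[1:])}
    # structure: average number of distinct events per sequence
    structure = sum(len(set(seq)) for seq in log) / len(log)
    return dt_abs, dt_rel, len(ties), structure

log = [("a", "b", "c"), ("a", "c", "b"), ("a", "b", "c"), ("a", "b", "b")]
print(variation_measures(log))  # (3, 0.75, 5, 2.75)
```

Here, four recorded sequences contain three distinct variants, five distinct directly-follows pairs, and on average 2.75 distinct activities per sequence.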
The third category refers to distance measures, for which several distance notions have been defined. Günther [17] proposes the notion of affinity, which is based on the overlapping directly-follows relations of two event sequences. His proposed complexity measure, average affinity, is calculated as the mean affinity over all pairs of event sequences [17,Ch.3]. This measure is closely related to the deviation from random of the transition matrix proposed by Pentland [30]. Furthermore, Pentland proposes a second distance measure, the average edit distance between event sequences, building on notions of classical optimal matching [12].
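As an illustration of the distance category, here is a minimal sketch of avg-dist, assuming the standard Levenshtein edit distance over activity sequences (the cited work builds on optimal matching, which generalizes this notion):

```python
from itertools import combinations

def levenshtein(s, t):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        curr = [i]
        for j, b in enumerate(t, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (a != b)))  # substitution
        prev = curr
    return prev[-1]

def avg_edit_distance(log):
    """avg-dist: mean edit distance over all unordered pairs of sequences."""
    pairs = list(combinations(log, 2))
    return sum(levenshtein(s, t) for s, t in pairs) / len(pairs)

log = [("a", "b", "c"), ("a", "c", "b"), ("a", "b")]
print(avg_edit_distance(log))  # 1.333... (distances 2, 1, 1 over three pairs)
```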
Each of these measures has its limitations and blind spots. We can easily identify cases where one measure indicates a difference while other measures are unaffected. We make the following observations on the relationship between two event logs L1 and L2:

Observation O1: Assume that L1 and L2 have the same size measures, e.g., the same number of events, event types, and sequences. Still, the variation and distance measures of the two logs can differ substantially.

Observation O2: L1 and L2 can have the same number of variants, and hence the same variation measures, but differ substantially in size.

Observation O3: Assume that L1 and L2 have the same distance measures, with each pair of sequences having, e.g., an edit distance of 1 on average. If L1 includes each sequence of L2 twice, the size measures will be substantially different. Also, the variation can be quite different if L1 includes many sequences that are the same while a few are very different, as compared to L2 where all sequences have rather little distance.
These observations clearly show that there is no unique process complexity measure capable of capturing information regarding size, variation, and distance at once. In the next section, we propose a new measure that addresses this challenge.

Process Complexity based on Graph Entropy
In this section, we propose a novel measure of process complexity based on graph entropy. More generally, entropy has been used for assessing the behavior of process representations and corresponding logs at the language level [34,33]; these measurements, however, do not account for the distribution of variants in an event log. Graph entropy is particularly suited as an underlying concept because it can capture size, variation, and distance in an integral way. To this end, we have to map an event log to a graph structure that acknowledges equivalences between sequences without introducing abstractions.
In Section 3.1, we define the notion of an extended prefix automaton. Section 3.2 defines graph entropy for extended prefix automata, proves monotonicity, and relates the proposed measure to Observations 1-3.

Event Logs as Prefix Automata
In this section, we define the extended prefix automaton for an event log. The concept of a prefix automaton was introduced by Munoz-Gama and Carmona in [29], based on concepts described by Van der Aalst et al. in [44]. The concept of a prefix automaton is particularly suited for our purpose of describing the complexity of an event log. Prefix automata describe sequences without loss of information and without introducing abstraction. They account for equivalent prefixes but do not introduce complexity that is not present in the event log. Figure 2 illustrates the idea of a prefix automaton for an example event log with four sequences. We observe that overlapping prefixes of the sequences lead to joint paths in the prefix automaton. All paths originate from the root. There are as many variants in this event log as there are nodes on the right-hand side without successors; these are four in our case. We now revisit basic notions of events, event sequences, and event logs, upon which we will define the construction of the extended prefix automaton. A plain event log L_plain ∈ E* is a finite sequence of events (with events potentially relating to different cases). A plain event log is ordered by event timestamps and not by cases.
The connection between an event log L and a corresponding plain event log L_plain is that L_plain interleaves the events of all cases of L in timestamp order. Munoz-Gama and Carmona [29] define a prefix automaton PA = (S, T, A, s_0), with S being a set of states, A the set of activities, T ⊆ S × A × S the set of transitions, and s_0 the initial state. Based on this, we introduce the concept of an extended prefix automaton.
Compared to the prefix automaton in [29], our automaton is extended in two ways. First, we define the seq function that maps each state s ∈ S to the set of events seq(s) ⊆ E having the same prefix as the state itself. Second, we define a partitioning function C that splits the extended prefix automaton EPA into 0 ≤ k ≤ |L| partitions, where |L| refers to the number of traces in the event log L. We discuss partitioning in more detail below.

Definition 2 (Extended prefix automaton). An extended prefix automaton is a tuple EPA = (S+, T, A, C, seq, root), where S+ = S ∪ {root} extends the state set, T and A are the transitions and activities as before, and:

• C ∈ S+ → N_0 ∪ {⊥} is a partitioning function, defining for each state s ∈ S+ the partition to which it belongs. C(s) refers to the partition the state s belongs to; a state can only belong to one partition. We write C(root) = ⊥ because the root node does not belong to any partition.

• seq ∈ S → ℘(E) maps each state to the events having the same prefix as the state. Note that every event in the log L (or L_plain) corresponds to one and only one state: ∀e ∈ L ∃s ∈ S : e ∈ seq(s) and ∀s, s′ ∈ S : s ≠ s′ ⇒ seq(s) ∩ seq(s′) = ∅.

• root ∈ S+ is the entry state of the automaton and corresponds to the empty prefix. Note that the root has no incoming transitions.

Although the concept of accepting states is not mentioned in [29] and mentioned only informally in [44], we state explicitly here that in an extended prefix automaton all states are accepting states.¹ This means that every trace σ ∈ L can be replayed (recall = 1), but the extended prefix automaton can also produce shorter sequences that were not observed in the log L (precision ≤ 1). Note that this is no harm to our ambition of obtaining a representation of the event log with 100% precision and recall when we store observed event sequences using the extended prefix automaton.

¹ An alternative is to introduce sink nodes, similar to the root node, as the only accepting states. We do not consider this alternative here, because it does not allow the incremental construction of the prefix automaton while events of running cases are continuously added.
Consider our previous example sequences and Figure 3. The example illustrates how the events of the event log are mapped to states with the help of the function seq(s). All sequences start with activity a, so one state s_1^1 suffices to capture all observed behavior up to that point. The second event in sequences 1 and 4 is b, which is captured by the state s_2^1 such that C(s_2^1) = 1 and seq(s_2^1) = {b_1, b_4}. In sequences 2 and 3, the second event is c, which requires the extended prefix automaton to branch and introduce another state in a new partition. Each time such branching occurs, a new partition is introduced, and the final number of partitions equals the number of observed process variants.
For its construction, we use the corresponding plain event log L_plain. Algorithm 1 shows how the extended prefix automaton is constructed. The algorithm iterates over the events in the log L_plain and uses the variables last_AT, pred_AT, current_AT, and current_c. The variable last_AT is a mapping used to store, for every case ID, the latest activity type added to the automaton for that case. While it would be possible to search for it at every iteration, such a mapping increases the efficiency of the algorithm. pred_AT stores the activity type of the current event's predecessor, while current_AT stores the activity type of the event in question. current_c stores the partition number of the current activity type. Whenever a new state has to be added for an event e, three cases can be distinguished:

1. The preceding state already has outgoing transitions, none of which carries the activity type of the new event. The automaton branches, and the new state opens a new partition.

2. The preceding state is root and it has no outgoing transitions, which means e is the first event in L_plain. In this case, the new state defines the first partition in the extended prefix automaton. Note that this case does not differ from the previous one conceptually, but requires a slightly different implementation.

3. The preceding state has no outgoing transitions but is not the root state. Since pred_AT has no outgoing transitions, the new activity does not add any path, and thus the new state belongs to the same partition as its predecessor.
We assume that all look-up functions in this algorithm can be implemented with constant computational complexity O(1). Then, iterating over the set of events E drives the complexity of this calculation. The complexity of calculating the extended prefix automaton is accordingly O(|E|).
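The construction can be sketched as follows, under our simplifying assumption that the log is given as case-ordered traces rather than as L_plain (which makes the last_AT bookkeeping unnecessary); class and function names are ours, not from Algorithm 1:

```python
class State:
    def __init__(self, partition):
        self.children = {}     # activity -> successor State
        self.partition = partition
        self.events = 0        # |seq(s)|: number of events mapped to this state

def build_epa(traces):
    """Build a simplified extended prefix automaton from case-ordered traces."""
    root = State(partition=None)
    states = []
    next_partition = 0
    for trace in traces:
        current = root
        for activity in trace:
            if activity in current.children:
                current = current.children[activity]   # prefix already known
            else:
                # Branching or an empty root opens a new partition;
                # otherwise the new state joins its predecessor's partition.
                if current is root or current.children:
                    next_partition += 1
                    partition = next_partition
                else:
                    partition = current.partition
                new_state = State(partition)
                current.children[activity] = new_state
                states.append(new_state)
                current = new_state
            current.events += 1
    return root, states

root, states = build_epa([("a", "b", "c"), ("a", "c", "b"), ("a", "b", "c")])
print(len(states), {s.partition for s in states})  # 5 states in partitions {1, 2}
```

On these three traces, the shared prefix ⟨a⟩ branches into ⟨b,c⟩ and ⟨c,b⟩, so one new partition is opened at the branch, giving two partitions for two variants.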

Graph Entropy of Extended Prefix Automata
Entropy is an appropriate concept for defining complexity measures due to its properties of monotonicity [34]. Some applications of entropy have been developed by Polyvyanyy et al. for defining precision and recall measures for conformance checking at the language level, in which eigenvalues of process models and event logs are calculated iteratively with polynomial complexity [34,33,32]. Here, we highlight the opportunity to use entropy as an underlying concept for calculating the process complexity of an event log based on extended prefix automata in linear time.
More specifically, we define four entropy measures: variant entropy and sequence entropy, as well as their corresponding normalized versions. We build on the measures proposed by Dehmer et al. [13], who define graph entropy based on a partitioning of the graph into partitions X_i.
We calculate variant entropy based on the extended prefix automaton by only considering its structure and not the number of events associated with each state. We apply the formula with X = S (this is S+ without the root) and the partitions S_i = {s ∈ S | C(s) = i} induced by the partitioning function C. Note that we do not consider the set of transitions T, because every state has exactly one incoming transition. We obtain the variant entropy:

E_v(EPA) = |S| · log(|S|) − Σ_{i=1..max(C)} |S_i| · log(|S_i|)   (2)

The absolute variant entropy grows with the number of states in the automaton. To compare event logs of different size, we normalize it by its maximum value, yielding the normalized variant entropy:

E_v,norm(EPA) = E_v(EPA) / (|S| · log(|S|))

Both versions of the variant entropy measure are based on the number of states in a partition in (2). This is an abstraction. Each state s ∈ S is associated with a non-empty set of events whose cases share the same prefix, i.e., seq(s) ≠ ∅. Each event e ∈ L belongs to one and only one such set. This implies that every event can also be assigned to one and only one partition in the extended prefix automaton. In this way, we obtain a measure that reflects the frequencies of events and corresponding prefixes. Thus, we calculate sequence entropy based on the number of events in a partition by extending (2). For the sake of readability, we define seq_i(S) = ∪_{s ∈ S_i} seq(s) and seq(S) = ∪_{s ∈ S} seq(s), and define the sequence entropy in the following way:

E_s(EPA) = |seq(S)| · log(|seq(S)|) − Σ_{i=1..max(C)} |seq_i(S)| · log(|seq_i(S)|)

Same as variant entropy, the absolute sequence entropy measure depends on the number of states in the extended prefix automaton, but additionally also on the number of events associated with each state. The same idea of normalization can be applied to sequence entropy, resulting in the normalized sequence entropy:

E_s,norm(EPA) = E_s(EPA) / (|seq(S)| · log(|seq(S)|))

Lemma 1. The sequence entropy measure is monotonous with respect to an increasing number of events.
This property makes sequence entropy particularly suitable for measuring process complexity based on event logs.
Proof. In order to prove monotonicity of the sequence entropy measure, we have to show that adding one event does not decrease the measure. To this end, we have to show that the following inequality holds for |seq(S_2)| = |seq(S_1)| + 1:

|seq(S_2)| · log(|seq(S_2)|) − Σ_{i=1..max(C_2)} |seq_i(S_2)| · log(|seq_i(S_2)|) ≥ |seq(S_1)| · log(|seq(S_1)|) − Σ_{i=1..max(C_1)} |seq_i(S_1)| · log(|seq_i(S_1)|)

We observe a corner case. If max(C_2) = max(C_1) = 1, then each summation equals the preceding term, such that both the left-hand and the right-hand side of the inequality yield zero.

We rearrange the inequality by bringing the sums onto the right-hand side:

|seq(S_2)| · log(|seq(S_2)|) − |seq(S_1)| · log(|seq(S_1)|) ≥ Σ_{i=1..max(C_2)} |seq_i(S_2)| · log(|seq_i(S_2)|) − Σ_{i=1..max(C_1)} |seq_i(S_1)| · log(|seq_i(S_1)|)

Now we can distinguish two cases. If max(C_2) > max(C_1), then there must be one new partition that includes only one event. As a result, the right-hand side becomes 1 · log(1) = 0, such that the inequality holds true because |seq(S_2)| · log(|seq(S_2)|) is larger than |seq(S_1)| · log(|seq(S_1)|), due to each of its factors being larger, respectively.

Let us consider the alternative case that max(C_2) = max(C_1). In this case, there exists exactly one index x where seq_x(S_2) and seq_x(S_1) differ, with |seq_x(S_2)| = |seq_x(S_1)| + 1, while all other summands remain unchanged. Furthermore, we can assume that there is a natural number m < |seq(S_1)| such that |seq_x(S_1)| = |seq(S_1)| − m. Substituting into the rearranged inequality and cancelling the unchanged summands yields:

(|seq(S_1)| + 1) · log(|seq(S_1)| + 1) − |seq(S_1)| · log(|seq(S_1)|) ≥ (|seq(S_1)| − m + 1) · log(|seq(S_1)| − m + 1) − (|seq(S_1)| − m) · log(|seq(S_1)| − m)

Both sides now have the form (t + 1) · log(t + 1) − t · log(t), with t = |seq(S_1)| on the left-hand side and t = |seq(S_1)| − m on the right-hand side. This expression grows monotonically in t: replacing the smaller value |seq(S_1)| − m with the larger value |seq(S_1)| turns the right-hand side into the left-hand side while only increasing it. Hence the inequality holds in this case as well, which completes the proof.
Observation 2 states that event logs L_1 and L_2 can have the same number of variants, but different size. Let us assume that L_1 is a duplication of L_2 such that |L_1| = 2 · |L_2|. Then, the number of variants remains the same: both logs yield the same automaton structure and the same partitions, while every event set doubles, i.e., |seq_i(S)| with respect to L_1 is twice its value with respect to L_2. For evaluating the impact on the entropy measure, we calculate the difference E_s(L_1) − E_s(L_2). With S referring to the automaton of L_2, we obtain:

E_s(L_1) = 2 · |seq(S)| · log(2 · |seq(S)|) − Σ_{i=1..max(C)} 2 · |seq_i(S)| · log(2 · |seq_i(S)|)

Since Σ_i |seq_i(S)| = |seq(S)|, the additional log(2) terms cancel out and E_s(L_1) = 2 · E_s(L_2). Hence, the difference E_s(L_1) − E_s(L_2) = E_s(L_2), which is greater than zero whenever the automaton has more than one partition. Duplicating an event log thus increases sequence entropy while leaving the number of variants unchanged.
Observation 3 states that two logs L_1 and L_2 with the same edit distance between cases can differ in size and in their number of variants. Regarding size, we already demonstrated that duplicating the event log increases entropy if there is more than one partition. If we assume a constant number of states and events in the extended prefix automaton, then it suffices to show that one partition with twice the amount of events (2 · |seq_i(S)|) yields a lower entropy than two partitions with |seq_i(S)| events each. For these two cases, the summands in the summation are (2 · |seq_i(S)|) · log(2 · |seq_i(S)|) for the fewer and larger partitions, and 2 · (|seq_i(S)| · log(|seq_i(S)|)) for double the amount of partitions with half the number of events. It is easy to see that (2 · |seq_i(S)|) · log(2 · |seq_i(S)|) > 2 · |seq_i(S)| · log(|seq_i(S)|), because the left-hand side contains the additional term 2 · |seq_i(S)| · log(2). Since these summands are subtracted in E_s, more partitions yield a higher entropy. To summarize, we observe that our sequence entropy yields a process complexity measure for event logs that grows monotonously as events are added, no matter where in the event log. In comparison to the other measures presented in Section 2, we observe that the critical Observations 1-3 are well addressed by our entropy-based measure.
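Both entropy measures reduce to a partition-based computation. The sketch below assumes the formula |X| · log|X| − Σ_i |X_i| · log|X_i| with the natural logarithm; variable names are ours. It also checks the duplication argument for Observation 2 numerically:

```python
from math import log

def graph_entropy(partition_sizes):
    """|X|*log|X| - sum_i |X_i|*log|X_i|, where partition_sizes[i] = |X_i|."""
    total = sum(partition_sizes)
    return total * log(total) - sum(n * log(n) for n in partition_sizes)

def normalized_entropy(partition_sizes):
    """Normalize by the maximum value |X|*log|X|."""
    total = sum(partition_sizes)
    return graph_entropy(partition_sizes) / (total * log(total))

# Variant entropy counts states per partition, sequence entropy events per partition.
states_per_partition = [3, 2]    # 5 states in 2 partitions
events_per_partition = [7, 2]    # 9 events in the same partitions
print(graph_entropy(states_per_partition), graph_entropy(events_per_partition))

# Duplicating the log doubles every event count and doubles the sequence entropy,
# so E_s(L1) - E_s(L2) = E_s(L2) > 0 whenever there is more than one partition.
doubled = [2 * n for n in events_per_partition]
assert abs(graph_entropy(doubled) - 2 * graph_entropy(events_per_partition)) < 1e-9
```

A log collapsing into a single partition yields entropy 0, the minimum, as expected for a log with a single variant.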

Evaluation
We have implemented our graph entropy-based process complexity measures as a Python application, which also computes all log complexity measures discussed in Section 2. The application receives an event log as input (either in CSV or XES format) and calculates all complexity measures or a user-selected subset of them. Henceforth, for simplicity, we refer to these measures as log complexity measures, to distinguish them from the complexity associated with process models. We focus our evaluation and the analysis of the different complexity measures on the following two research questions:

RQ1. How does log complexity affect the quality of automatically discovered process models?

RQ2. What log complexity measures could be used as a proxy to predict the quality of automatically discovered process models?
In the following, we describe in detail the analysis that we conducted towards answering the two research questions.

Dataset and Setup
For our experiments, we first selected the collection of 24 event logs that Augusto et al. used for their benchmark [6]. We extended this collection with eight event logs from the Business Process Intelligence Challenges 2019 [47] (three of the eight) and 2020 [48] (five of the eight). Hence, the collection of event logs we use for our experiments includes a total of 32 event logs (20 of which are publicly available and 12 private). To the best of our knowledge, this is the largest collection of real-world event logs used in a process mining study so far, representing an increase of 33.3% over the former largest collection [6]. The public subset of the collection of event logs previously used by Augusto et al. [6] is available for download from the 4TU Research Data Centre [7]. The event logs we used in our experiments record the executions of business processes from a variety of domains, including healthcare, finance, government, and IT service management. They exhibit a heterogeneous degree of complexity across the different complexity measures. Tables 3-5 show the complexity of each event log, reporting our novel graph entropy-based complexity measures (Table 3) as well as the complexity measures from prior research (Tables 4 and 5). The labels of the state-of-the-art log complexity measures are reported in Table 1 in Section 2, while the labels of our measures are reported in Table 2. Given our interest in understanding how log complexity measures relate to the quality of automatically discovered process models, the second component of our evaluation dataset includes the process models automatically discovered from the 32 event logs using the three most reliable algorithms to date, according to [6]: the Evolutionary Tree Miner (ETM) [9], the Inductive Miner - Infrequent (IM) [23], and the Split Miner (SM) [3]. Table 6 shows the quality of the process models automatically discovered by ETM, IM, and SM, covering both their accuracy (i.e., fitness and precision) and their complexity. The corresponding measures are: fitness, precision, F-score (of fitness and precision), size, and control flow complexity (CFC). We recall that fitness quantifies the amount of behavior contained in the event log that the process model is able to replay, while precision quantifies the amount of behavior allowed by the process model that can be found in the event log. The F-score of fitness and precision is the product of the two measurements divided by their sum and multiplied by two. Over the past decade, several fitness and precision measures have been proposed [41], each of them suffering from different limitations (from approximation to low scalability).
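Written out, the F-score is the harmonic mean of fitness and precision; a one-line sketch:

```python
def f_score(fitness, precision):
    """Harmonic mean: 2 * fitness * precision / (fitness + precision)."""
    return 2 * fitness * precision / (fitness + precision)

print(round(f_score(0.9, 0.6), 2))  # 0.72
```

The harmonic mean penalizes imbalance: a model with fitness 0.9 but precision 0.6 scores well below the arithmetic mean of 0.75.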
The results shown in Table 6 report the alignment-based fitness and precision measures proposed by Adriansyah et al. [2,1]. The choice of measures was guided by the goal to maintain consistency with the latest benchmark results [6].
The alignment-based fitness [2] is calculated as one minus the normalized sum of the minimal alignment cost between each trace in the event log and the closest corresponding trace that can be replayed by the process model. The alignment-based precision [1], instead, builds a prefix automaton of the event log and then replays the process model on top of it, assessing the number of times the process model behavior diverges from the behavior of the prefix automaton. The size of a process model is the number of nodes composing the process model, while the CFC of a (BPMN) process model captures the amount of branching induced by its split gateways [24].
Starting from the log complexity measurements and the process model quality measurements, we calculated the Pearson correlation and the Kendall correlation [21] for each pair of corresponding measurement series (log complexity, process model quality) and assessed their statistical significance. While the Pearson correlation focuses on the likelihood of an existing linear relation between two measurement series (e.g., between the magnitude of the event logs and the fitness of the models discovered by IM), the Kendall correlation tells us whether two measurement series exhibit the same rank. The Kendall correlation is particularly useful when two measurement series do not exhibit the same trend (e.g., a linear or an exponential relation), yet they rank the objects of the measurements identically or similarly (to a certain degree, assessed via statistical significance). In our context, the Kendall correlation allows us to understand if a log complexity measure calculated for a set of logs ranks the logs identically or similarly to the rank yielded by a process model quality measure.
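For intuition, both coefficients can be sketched in a few lines of pure Python. This is a simplified illustration: our tau variant is the plain tau-a without tie correction, and the actual analysis additionally requires significance tests (e.g., via scipy.stats):

```python
from math import sqrt
from itertools import combinations

def pearson(xs, ys):
    """Linear correlation of two measurement series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def kendall_tau(xs, ys):
    """Rank correlation (tau-a): concordant minus discordant pairs,
    normalized by the total number of pairs."""
    pairs = list(combinations(range(len(xs)), 2))
    score = sum(
        1 if (xs[i] - xs[j]) * (ys[i] - ys[j]) > 0
        else -1 if (xs[i] - xs[j]) * (ys[i] - ys[j]) < 0
        else 0
        for i, j in pairs)
    return score / len(pairs)

xs = [3.4, 4.8, 9.0, 12.5]      # e.g., a log complexity measure
ys = [0.91, 0.88, 0.75, 0.70]   # e.g., precision of the discovered models
print(pearson(xs, ys), kendall_tau(xs, ys))  # both strongly negative
```

The example illustrates the distinction made above: the two series are perfectly anti-ranked (tau = -1), even though their relation need not be linear.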
Furthermore, we conduct a regression analysis to study the potential to predict process model complexity based on log complexity.
We leverage the results of the correlation analysis to collect evidence for answering RQ1. Then, we select the subset of log complexity measures that correlate the most with the quality measures of automatically discovered process models, and we explore if and how they could be used to estimate a-priori the quality of the automatically discovered process models (answering RQ2). Table 9 shows the overlap of statistical significance and the corresponding p-values. A pair of measures that correlate negatively is identified with a dash symbol between brackets. To reinforce statistical significance, in the following we refer only to the results reported in Table 9. The correlation summary reported in Table 9 can be used to analyze which log complexity measures affect the process model quality measures the most, and vice versa. This analysis allows us to answer RQ1. Focusing on the quality measures of the process models automatically discovered by ETM (columns 2 to 6, Table 9), we notice that only one of the log complexity measures correlates with the precision and the CFC of ETM's process models (with a p-value at 0.05). This highlights that both the precision and the CFC of the process models discovered by ETM are scarcely affected by the complexity of the input log. A similar observation can be made for the fitness of the process models discovered by IM. By design, IM always strives to discover a highly fitting process model [23]. Consequently, the overall complexity of the input log does not influence the fitness of IM: no correlation exists between the log complexity measures and the fitness of IM's process models. Finally, also the CFC of the models discovered by SM appears resistant to log complexity, correlating with only three complexity measures: affinity, avg-dist, and TL-avg.
Next, we look at the process model quality measures that correlate the most with the log complexity measures.

Results of Regression Analysis
We now want to investigate to what extent log complexity measures could be used to predict a process model quality measure. To this end, for each log complexity measure that correlates with a process model quality measure in Table 10, we assessed the residual errors (minimum, median, and maximum) and the coefficient of determination (R²) of the corresponding linear regression model, where the log complexity measure is the predictor variable and the process model quality measure is the outcome variable. The linear regression models that exhibit the highest R² should be preferred, since R² is defined as 1 − SSr/SSt, where SSr is the residual sum of squares and SSt is the total sum of squares. In general, the higher the R², the better the linear regression model fits the data. The results of this analysis are reported in Table 11. Looking at the process model accuracy measures (fitness ETM, precision IM, F-score IM, fitness SM, and F-score SM), we note that the best complexity measures to predict them are avg-dist (for ETM and IM) and nseq-e (for SM). This shows that the accuracy of the process models discovered by ETM, IM, and SM is affected more by the variation of process behavior recorded in the event logs than by the absolute amount of process behavior. Therefore, automatically discovering a process model is likely to be more challenging when the event log records a small amount of process behavior that varies greatly than when the event log records a huge amount of process behavior that varies little. Looking at the model complexity measures Size ETM, CFC ETM, CFC IM, and Size SM, a single log complexity measure, variety, appears to affect them all strongly.
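The kind of regression assessment described above can be reproduced with standard routines; the sketch below fits a simple linear model, computes the residual errors, and checks R² against its definition 1 − SSr/SSt (the data points and the choice of predictor/outcome are synthetic placeholders, not values from Table 11):

```python
import numpy as np
from scipy.stats import linregress

# Synthetic example: predictor = a log complexity measure (e.g., avg-dist),
# outcome = a model quality measure (e.g., fitness). Values are illustrative.
x = np.array([0.10, 0.25, 0.40, 0.55, 0.70, 0.85])
y = np.array([0.97, 0.91, 0.86, 0.79, 0.74, 0.68])

fit = linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

# Coefficient of determination: R² = 1 - SSr / SSt.
ss_r = np.sum(residuals ** 2)        # residual sum of squares
ss_t = np.sum((y - y.mean()) ** 2)   # total sum of squares
r_squared = 1 - ss_r / ss_t          # equals fit.rvalue ** 2

# Residual errors as reported in the analysis: minimum, median, maximum.
res_min, res_med, res_max = residuals.min(), np.median(residuals), residuals.max()
```

For each (predictor, outcome) pair one would compare the resulting R² values and prefer the predictor yielding the highest one, as done in Table 11.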
It is possible to generate more accurate regression models by using uncorrelated multi-predictors or non-linear regression models (e.g., generalized additive models). Ideally, having an accurate regression model to estimate the quality of the automatically discovered process models could save considerable time for process analysts, especially if an optimization is required during process discovery [6,8]. In fact, one could select the automated process discovery algorithm to be used based on the results of the regression model predictions. However, the design of such an accurate regression model and its systematic evaluation would deserve a separate study, a very different dataset (e.g., a large collection of event logs, even artificial), and a different type of evaluation, as seen in previous studies [26]. We also note that seminal studies in this area were conducted by Ribeiro et al. [36,35]; however, these took into account neither the log complexity measures nor the novel automated process discovery algorithms designed in the past five years (including Fodina [50] and Split Miner [3]). Furthermore, Ribeiro et al. propose a black-box prediction system, which does not focus on the connection between log complexity measures and the quality of the discovered process model.

Results of Computational Performance
Finally, we compared the computational performance of all the log complexity measures, assessing their average, maximum, minimum, and median execution times over the 32 event logs. The results are shown in Table 12. We did not report the execution times of derived measures (e.g., the percentage of distinct traces, which is derived from the total number of distinct traces and support) or of ratios (e.g., our normalized graph entropy measures). We note that all the complexity measures have a median execution time below one second, with the exception of affinity, which has a median execution time of 9.6 seconds.
Indeed, affinity was the slowest measure to compute, followed by our graph entropy-based complexity measures and avg-dist. However, while affinity has an average execution time well above a minute, our measures and avg-dist have average execution times in the order of seconds, which is reasonable given that these measures are not designed for real-time use.
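Execution-time statistics of the kind reported in Table 12 can be collected with a small timing harness; the sketch below illustrates one way to do so (the `variety` function and the toy logs are placeholders for illustration, not the measures or event logs used in the study):

```python
import time
import statistics

def time_measure(measure, logs):
    """Time a complexity measure on each event log and summarize the results."""
    times = []
    for log in logs:
        start = time.perf_counter()
        measure(log)
        times.append(time.perf_counter() - start)
    return {
        "avg": statistics.mean(times),
        "max": max(times),
        "min": min(times),
        "median": statistics.median(times),
    }

# Placeholder measure: variety, i.e., the number of distinct activity labels.
def variety(log):
    return len({activity for trace in log for activity in trace})

# Placeholder logs: each log is a list of traces, each trace a list of labels.
toy_logs = [
    [["a", "b", "c"], ["a", "c"]],
    [["a", "b"], ["b", "c"], ["a"]],
]
timings = time_measure(variety, toy_logs)
```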

Threats to Validity
The findings reported in this study should be interpreted in light of the size and variety of the 32 event logs in our dataset. Although these event logs are a good approximation of event logs one may find in real-world scenarios, both in terms of complexity and variety of domains, they cannot capture all possible business processes observable in industrial settings. It is worth noting that the original submission of this study's manuscript included only 24 event logs [7] and that the additional 8 event logs were added following the reviewers' suggestions. Nonetheless, the changes in the statistical analysis were minor. For instance, the total number of pairs of measures that correlated according to both the Pearson and Kendall correlations increased from 84 to 90 (see Table 9). Even more importantly, neither the seven log complexity measures that correlated the most (Table 10) nor the most robust predictors (Table 11) changed.
Another threat to the validity of our evaluation is the selection of the quality measures of automatically discovered process models, which in the past few years have been under scrutiny [42,41] in the process mining research community.
However, these research studies showed that no existing precision measure is ideal, while newly designed ones are either approximate [4] or computationally inefficient [32,34]. Although one may argue that one precision measure is better than another, we note that the choice of precision measure does not substantially affect our results. The findings are reliable and accurate in light of the quality measures we used, which remain the most popular to date.

Conclusions
With this paper, we provide two major contributions to measuring process complexity and to assessing process mining algorithms. First, we analyzed existing measures for process complexity that are based on event logs. Each of these measures emphasizes different complexity criteria including size, variation, and distance. We defined new measures of process complexity based on graph entropy, which capture all three concerns of process complexity by adhering to monotonicity. Second, we evaluated the identified set of process complexity measures, including our novel measures, using a benchmark collection of event logs and their corresponding automatically discovered process models. The goal of our evaluation was to investigate which empirical connections hold between the process complexity measures and the quality of discovered models. Our results show that many process complexity measures (including our novel measure) correlate with the quality of the discovered process models and that it is possible to use process complexity measures as predictors for the quality of process models discovered with state-of-the-art process discovery algorithms.
The findings we reported in this paper are important for process mining research, as they highlight that not only algorithms but also empirical connections between input data complexity and output quality should be investigated.
Our results demonstrate the potential of examining the concept of process complexity and its corresponding measures in connection with automated process discovery, and there are various opportunities to extend this approach to related research problems. Additional aspects of event log data, such as data complexity, could be used to study connections with further output parameters, such as the process model's usefulness as perceived by analysts.