research-article

Mining Frequent Infix Patterns from Concurrency-Aware Process Execution Variants

Authors:
Michael Martini, RWTH Aachen University, Aachen, Germany
Daniel Schuster, Fraunhofer FIT, Sankt Augustin, Germany and RWTH Aachen University, Aachen, Germany
Wil M. P. van der Aalst, Fraunhofer FIT, Sankt Augustin, Germany and RWTH Aachen University, Aachen, Germany

Proceedings of the VLDB Endowment, Volume 16, Issue 10, pp. 2666–2678. https://doi.org/10.14778/3603581.3603603

Published: 01 June 2023

Abstract

Event logs, as considered in process mining, document a large number of individual process executions, each of which consists of various executed activities. To cope with the vast number of process executions in event logs, process mining uses the concept of variants, which group process executions with identical ordering relations among their executed activities. Variants are an integral concept of process mining and help process analysts explore, filter, and manage large amounts of event data. In this paper, we consider concurrency-aware variants that allow activities within a process execution to be partially ordered, that is, the execution of individual activities can overlap in time. However, the number of variants is often vast, making it challenging for process analysts to explore event data. Therefore, we present a novel approach to frequent pattern mining from concurrency-aware variants. We show that mining frequent patterns from concurrency-aware variants can be reduced to the frequent subtree mining problem. Further, we compare our proposed algorithm to a state-of-the-art frequent subtree mining algorithm and find that it exhibits improved performance on real-life event logs.
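
To make the notion of a concurrency-aware variant more concrete, the following is a minimal Python sketch of how process executions with identical ordering relations could be grouped. It is not the authors' algorithm and does not perform pattern mining. It assumes that every activity carries a start and an end timestamp, that each activity label occurs at most once per execution, and that one activity precedes another only if it ends before the other starts. The names `variant_key`, `group_into_variants`, and the example log are hypothetical.

```python
from collections import defaultdict


def variant_key(execution):
    """Build a hashable description of the ordering relations in one execution.

    `execution` is a list of (activity, start, end) tuples. Assumption for this
    sketch: each activity label appears at most once per execution.
    """
    activities = frozenset(activity for activity, _, _ in execution)
    # Activity a precedes b if a has ended before b starts. Pairs that are in
    # neither direction overlap in time and are treated as concurrent.
    precedes = frozenset(
        (a, b)
        for a, _, a_end in execution
        for b, b_start, _ in execution
        if a != b and a_end < b_start
    )
    return activities, precedes


def group_into_variants(log):
    """Group case identifiers by their ordering relations (one group per variant)."""
    variants = defaultdict(list)
    for case_id, execution in log.items():
        variants[variant_key(execution)].append(case_id)
    return dict(variants)


# Hypothetical event log: case id -> list of (activity, start, end).
log = {
    # "check" and "call" overlap in time in c1 and c2, so they are concurrent.
    "c1": [("register", 0, 1), ("check", 2, 5), ("call", 3, 4), ("decide", 6, 7)],
    "c2": [("register", 10, 12), ("call", 13, 18), ("check", 14, 15), ("decide", 20, 21)],
    # In c3, "check" ends before "call" starts, which adds an ordering relation.
    "c3": [("register", 0, 1), ("check", 2, 3), ("call", 4, 5), ("decide", 6, 7)],
}

variants = group_into_variants(log)
for (_, precedes), case_ids in variants.items():
    print(sorted(case_ids), "ordering relations:", len(precedes))
```

In this sketch, c1 and c2 fall into the same variant although their timestamps differ, because only the ordering relations are compared. The execution c3 forms its own variant because it orders "check" before "call".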

Published in
Proceedings of the VLDB Endowment, Volume 16, Issue 10
June 2023, 295 pages
ISSN: 2150-8097
Editors: Georgia Koutrika (Athena Research Center), Jun Yang (Duke University)
Publisher: VLDB Endowment
Badges: Artifacts Available / v1.1