Abstract
Event logs, as considered in process mining, document a large number of individual process executions, each consisting of various executed activities. To cope with the vast number of process executions in event logs, process mining uses the concept of variants, which group process executions with identical ordering relations among their executed activities. Variants are an integral concept of process mining and help process analysts explore, filter, and manage large amounts of event data. In this paper, we consider concurrency-aware variants, which allow the activities within a process execution to be partially ordered; that is, the executions of individual activities can overlap in time. However, the number of variants is often vast, making it challenging for process analysts to explore event data. Therefore, we present a novel approach to frequent pattern mining from concurrency-aware variants. We show that mining frequent patterns from concurrency-aware variants can be reduced to the frequent subtree mining problem. Further, we compare our proposed algorithm to a state-of-the-art frequent subtree mining algorithm and observe improved performance on real-life event logs.
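To make the notion of a variant concrete, the following minimal sketch groups process executions by their activity ordering. It is a simplification, not the approach of this paper: it assumes totally ordered traces (plain activity sequences) rather than concurrency-aware, partially ordered variants, and the function name `group_into_variants` and the toy log are hypothetical.

```python
from collections import Counter


def group_into_variants(event_log):
    """Count how many process executions share each activity sequence.

    Assumption: every process execution is a totally ordered list of
    activity labels, so a variant is simply a distinct sequence.
    """
    return Counter(tuple(trace) for trace in event_log)


# Hypothetical toy event log: each inner list is one process execution.
toy_log = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "check", "approve"],
]

variants = group_into_variants(toy_log)
for variant, count in variants.most_common():
    print(count, " -> ".join(variant))
```

Even in this simplified setting, the number of distinct variants can grow quickly with log size, which is the motivation for mining frequent patterns across variants rather than inspecting each variant individually.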