Transformer Models for Activity Mining in Knowledge-Intensive Processes

Khandaker, Faria; Senderovich, Arik; Yu, Eric; Carbajales, Sebastian; Chan, Allen

doi:10.1007/978-3-031-25383-6_2

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 460)

Included in the following conference series:

International Conference on Business Process Management

Abstract

Mining useful information to analyze knowledge-intensive business processes requires data that describes activities of knowledge workers. Emails are widely used in organizations to provide support in the functioning of knowledge-intensive processes. The recent COVID-19 pandemic has increased reliance on technologies such as email to help facilitate communication within organizations to make up for the lack of face-to-face contact. In this work, we propose an activity mining technique, which receives an incoming email message, classifies the sender’s intent and translates it into a set of business process activities. Specifically, we leverage deep learning language models to first classify the email body into a group of intents, which are then mapped to related activities. To our knowledge, we propose the first transfer-learning based solution for mining activity information from emails. The effectiveness of our solution was evaluated on real-world data coming from email exchanges between knowledge workers. Our results based on unsupervised experiments and a field study show that transformer models can be used to semantically label emails and that mapping activities to matched intents is highly accurate.
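
The two-stage idea described in the abstract (classify an email body into intents, then map matched intents to activities) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the intent labels, the activity names, the `INTENT_TO_ACTIVITIES` table, the `mine_activities` helper, and the threshold are all hypothetical. In particular, `keyword_overlap_scorer` is a toy stand-in for a pre-trained transformer classifier, so that the example runs without downloading a model.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical intent labels and intent-to-activity mapping (illustrative only,
# not the labels or activities used by the authors).
INTENT_TO_ACTIVITIES: Dict[str, List[str]] = {
    "request meeting": ["Check calendar availability", "Send meeting invitation"],
    "request information": ["Locate requested information", "Reply with information"],
    "share document": ["Review shared document", "File document"],
}


def keyword_overlap_scorer(text: str, label: str) -> float:
    """Toy stand-in for a transformer classifier.

    Returns the fraction of the label's words that occur in the text.
    A real system would replace this with a language-model score.
    """
    text_words = set(re.findall(r"[a-z]+", text.lower()))
    label_words = re.findall(r"[a-z]+", label.lower())
    return sum(word in text_words for word in label_words) / len(label_words)


def mine_activities(
    email_body: str,
    scorer: Callable[[str, str], float] = keyword_overlap_scorer,
    threshold: float = 0.75,  # assumed cut-off, chosen only for this example
) -> Tuple[List[str], List[str]]:
    """Stage 1: score every candidate intent. Stage 2: map matched intents to activities."""
    matched = [
        intent
        for intent in INTENT_TO_ACTIVITIES
        if scorer(email_body, intent) >= threshold
    ]
    activities: List[str] = []
    for intent in matched:
        for activity in INTENT_TO_ACTIVITIES[intent]:
            if activity not in activities:  # keep order, drop duplicates
                activities.append(activity)
    return matched, activities


if __name__ == "__main__":
    intents, activities = mine_activities(
        "Could we request a meeting next week to discuss the budget?"
    )
    print(intents)
    print(activities)
```

Because the scorer is passed in as a function, a model-based classifier could be swapped in without changing the mapping stage; that separation is a design choice of this sketch, not a claim about the authors' implementation.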

Notes

1. Note that a sentence can be represented as a bag of words or a sequence; our problem formulation is agnostic to how sentences are defined.
2. For more information about these pre-trained models, visit https://huggingface.co/.

Author information

Authors and Affiliations

Faculty of Information, University of Toronto, Toronto, Canada
Faria Khandaker & Eric Yu
School of Information Technology, York University, Toronto, Canada
Arik Senderovich
IBM Centre for Advanced Studies, Markham, Canada
Sebastian Carbajales & Allen Chan

Corresponding author

Correspondence to Faria Khandaker.

Editor information

Editors and Affiliations

University of Seville, Sevilla, Spain
Cristina Cabanillas
University of Agder, Kristiansand, Norway
Niels Frederik Garmann-Johnsen
Kiel University, Kiel, Germany
Agnes Koschmider

About this paper

Cite this paper

Khandaker, F., Senderovich, A., Yu, E., Carbajales, S., Chan, A. (2023). Transformer Models for Activity Mining in Knowledge-Intensive Processes. In: Cabanillas, C., Garmann-Johnsen, N.F., Koschmider, A. (eds) Business Process Management Workshops. BPM 2022. Lecture Notes in Business Information Processing, vol 460. Springer, Cham. https://doi.org/10.1007/978-3-031-25383-6_2

DOI: https://doi.org/10.1007/978-3-031-25383-6_2
Published: 09 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25382-9
Online ISBN: 978-3-031-25383-6
eBook Packages: Computer Science, Computer Science (R0)