Abstract
Modern Big Data Analytics services must comply with non-functional requirements such as privacy in order to align with legislation such as the General Data Protection Regulation (GDPR). The Telco industry, in particular, has been using Big Data Analytics solutions for service continuity: call center conversations are automatically transcribed to extract valuable insights and enhance customer service. Such data inevitably contains Personally Identifiable Information (PII), which, if not handled properly, hampers privacy-sensitive service operations. To meet these requirements we created Deperson, an efficient rule-based data anonymization service that enables companies to anonymize customer data effectively while preserving its utility for further analysis. As a proof of concept, Deperson has been integrated into an existing Big Data Analytics solution in the Customer Contact Analytics department of a major Dutch Telco provider to ensure compliance with the GDPR. Based on dictionary look-ups and pattern-matching rules, Deperson removes PII with an accuracy of 0.82 while retaining the information essential for analysis. Our proof of concept shows that Deperson plays a significant role in enabling the extraction and further processing of valuable insights from customer data without risking non-compliance with the GDPR.
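The abstract names Deperson's two rule families: dictionary look-ups and pattern-matching rules. A minimal sketch of that general idea, not the actual Deperson implementation (the dictionary entries, regex patterns, and placeholder format below are illustrative assumptions), could look like:

```python
import re

# Hypothetical name dictionary; the real service would use curated,
# Dutch-language dictionaries and a richer rule set.
NAME_DICT = {"jan", "sophie", "de vries"}

# Illustrative pattern-matching rules for structured PII
# (Dutch-style phone numbers, e-mail addresses, postcodes).
PATTERNS = {
    "PHONE": re.compile(r"\b(?:\+31\s?|0)\d{9}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "POSTCODE": re.compile(r"\b\d{4}\s?[A-Z]{2}\b"),
}

def anonymize(text: str) -> str:
    """Replace PII with category placeholders, keeping sentence structure."""
    # Rule family 1: pattern matching for structured identifiers.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    # Rule family 2: dictionary look-up for known names (case-insensitive).
    for name in NAME_DICT:
        text = re.sub(rf"(?i)\b{re.escape(name)}\b", "<NAME>", text)
    return text

print(anonymize("Sophie called from 0612345678 about invoice at jan@kpn.com"))
# → <NAME> called from <PHONE> about invoice at <EMAIL>
```

Replacing PII with category placeholders rather than deleting it is what preserves utility for downstream analysis: an analyst can still see that a phone number or name was mentioned, and where, without learning whose it was.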
Notes
1. Available online at: https://github.com/kpnDataScienceLab/deperson.
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Borovits, N., Bardelloni, G., Tamburri, D.A., Van Den Heuvel, WJ. (2023). Anonymization-as-a-Service: The Service Center Transcripts Industrial Case. In: Monti, F., Rinderle-Ma, S., Ruiz Cortés, A., Zheng, Z., Mecella, M. (eds) Service-Oriented Computing. ICSOC 2023. Lecture Notes in Computer Science, vol 14420. Springer, Cham. https://doi.org/10.1007/978-3-031-48424-7_19
Print ISBN: 978-3-031-48423-0
Online ISBN: 978-3-031-48424-7