‘Right to Be Forgotten’: Analyzing the Impact of Forgetting Data Using K-NN Algorithm in Data Stream Learning

Libera, Caio; Miranda, Leandro; Bernardini, Flávia; Mastelini, Saulo; Viterbo, José

doi:10.1007/978-3-031-15086-9_34

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13391)

Included in the following conference series:

International Conference on Electronic Government

Abstract

New international regulations concerning personal data management guarantee the ‘Right to Be Forgotten’: individuals may request that their data be erased from third-party tools and services. This requirement is especially challenging for machine learning estimators, which must then forget portions of their knowledge. In this paper, we investigate the impact of such learning and forgetting policies in data stream learning. In data stream mining, the sheer volume of instances typically makes it infeasible to store the data or to retrain the learning models from scratch. Hence, more efficient solutions are needed to deal with the dynamic nature of online machine learning. We modify an incremental k-NN classifier so that it can erase its past data, and we investigate the impact of data forgetting on the resulting predictive performance. Our proposal is compared against the original k-NN algorithm on seven non-stationary stream datasets. Our results show that the forgetting-enabled algorithm achieves prediction patterns similar to those of the vanilla one, although it yields lower predictive performance at the beginning of the learning process. This is typical cold-start behavior, often observed in data stream mining applications, and is not necessarily related to the forgetting mechanisms employed.
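
The idea of an incremental k-NN classifier that can erase past data may be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ForgettingKNN`, the `owner` tag that links an instance to a data subject, and the `forget` method are all hypothetical, and a plain sliding window with Euclidean distance and majority voting is assumed.

```python
from collections import Counter, deque
import math


class ForgettingKNN:
    """Hypothetical sliding-window k-NN that can erase stored instances."""

    def __init__(self, k=3, max_window=1000):
        self.k = k
        # Bounded window: the oldest instance is dropped once it is full.
        self.window = deque(maxlen=max_window)

    def learn_one(self, x, y, owner=None):
        # `owner` is an assumed tag identifying the data subject behind (x, y).
        self.window.append((tuple(x), y, owner))

    def forget(self, owner):
        # Erase every stored instance that belongs to `owner`.
        kept = [item for item in self.window if item[2] != owner]
        removed = len(self.window) - len(kept)
        self.window = deque(kept, maxlen=self.window.maxlen)
        return removed

    def predict_one(self, x):
        if not self.window:
            return None  # cold start: nothing has been stored yet
        query = tuple(x)
        # Rank stored instances by Euclidean distance and keep the k closest.
        nearest = sorted(self.window, key=lambda item: math.dist(query, item[0]))
        votes = Counter(label for _, label, _ in nearest[: self.k])
        return votes.most_common(1)[0][0]


model = ForgettingKNN(k=3)
for point in [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0)]:
    model.learn_one(point, "a", owner="u1")
for point in [(1.0, 1.0), (1.0, 0.9), (0.9, 1.0)]:
    model.learn_one(point, "b", owner="u2")

before = model.predict_one((0.05, 0.05))  # neighbours still include u1's data
removed = model.forget("u1")              # erasure request from subject u1
after = model.predict_one((0.05, 0.05))   # prediction without u1's data
```

Because a lazy learner such as k-NN keeps its knowledge as stored instances, erasing a subject's data in this sketch amounts to deleting entries from the window; the prediction for the same query can change once those neighbours are gone.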

Notes

1. Available at https://scikit-multiflow.github.io/.
2. Available at https://github.com/dlcaio/research-project-data-stream-learning.

Acknowledgments

This research was supported by the Coordination for the Improvement of Higher Education Personnel (CAPES), Process n. 88882.183880; and PIBIC/CNPQ/UFF. We also gratefully acknowledge Albert Bifet and Paristech University, who hosted Flávia Bernardini for a week and allowed us to have discussions to achieve these results.

Author information

Authors and Affiliations

Fluminense Federal University, Niterói, RJ, Brazil
Caio Libera, Leandro Miranda, Flávia Bernardini & José Viterbo
University of São Paulo, São Carlos, SP, Brazil
Saulo Mastelini

Corresponding author

Correspondence to Leandro Miranda.

Editor information

Editors and Affiliations

Delft University of Technology, Delft, The Netherlands
Marijn Janssen
Corvinus University of Budapest, Budapest, Hungary
Csaba Csáki
Linköping University, Linköping, Sweden
Ida Lindgren
University of the Aegean, Samos, Greece
Euripidis Loukis
Linköping University, Linköping, Sweden
Ulf Melin
Danube University Krems, Krems an der Donau, Austria
Gabriela Viale Pereira
University of Granada, Granada, Spain
Manuel Pedro Rodríguez Bolívar
University of Macedonia, Thessaloniki, Greece
Efthimios Tambouris

Cite this paper

Libera, C., Miranda, L., Bernardini, F., Mastelini, S., Viterbo, J. (2022). ‘Right to Be Forgotten’: Analyzing the Impact of Forgetting Data Using K-NN Algorithm in Data Stream Learning. In: Janssen, M., et al. Electronic Government. EGOV 2022. Lecture Notes in Computer Science, vol 13391. Springer, Cham. https://doi.org/10.1007/978-3-031-15086-9_34

DOI: https://doi.org/10.1007/978-3-031-15086-9_34
Published: 30 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15085-2
Online ISBN: 978-3-031-15086-9
eBook Packages: Computer Science, Computer Science (R0)
