‘Right to Be Forgotten’: Analyzing the Impact of Forgetting Data Using K-NN Algorithm in Data Stream Learning

  • Conference paper
Electronic Government (EGOV 2022)

Abstract

New international regulations concerning the management of personal data guarantee the ‘Right to Be Forgotten’: individuals may request that their data be erased from third-party tools and services. This requirement is especially challenging for machine learning estimators, which must then forget portions of their knowledge. In this paper, we investigate the impact of such learning and forgetting policies in data stream learning. In data stream mining, the sheer volume of instances typically makes it infeasible to store the data or to retrain the learning models from scratch. Hence, more efficient solutions are needed to deal with the dynamic nature of online machine learning. We modify an incremental k-NN classifier so that it can erase its past data, and we investigate the impact of data forgetting on the resulting predictive performance. Our proposal is compared against the original k-NN algorithm on seven non-stationary stream datasets. Our results show that the forgetting-enabled algorithm achieves prediction patterns similar to those of the vanilla one, although it yields lower predictive performance at the beginning of the learning process. This is typical cold-start behavior, often observed in data stream mining applications, and is not necessarily related to the employed forgetting mechanisms.
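To make the idea of a forgetting-enabled incremental k-NN concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each stored instance carries an identifier of the person it belongs to, and that forgetting means removing all of that person's instances from the classifier's window. The names `ForgettingKNN`, `owner_id`, `learn_one`, `predict_one`, and `forget` are hypothetical and chosen only for illustration.

```python
from collections import Counter, deque
import math


class ForgettingKNN:
    """Illustrative incremental k-NN over a bounded window with explicit forgetting.

    Assumption: every instance is tagged with an owner_id so that all data
    belonging to one person can be erased on request.
    """

    def __init__(self, k=3, max_window=1000):
        self.k = k
        # Each entry is (owner_id, features, label); the oldest entry is
        # dropped automatically once the window is full.
        self.window = deque(maxlen=max_window)

    def learn_one(self, owner_id, x, y):
        """Store one labelled instance (k-NN has no other training step)."""
        self.window.append((owner_id, tuple(x), y))

    def forget(self, owner_id):
        """Erase every stored instance of owner_id; return how many were removed."""
        before = len(self.window)
        self.window = deque(
            (entry for entry in self.window if entry[0] != owner_id),
            maxlen=self.window.maxlen,
        )
        return before - len(self.window)

    def predict_one(self, x):
        """Majority vote among the k nearest stored instances (Euclidean distance)."""
        if not self.window:
            return None  # cold start: nothing has been learned yet
        x = tuple(x)
        nearest = sorted(self.window, key=lambda e: math.dist(e[1], x))[: self.k]
        votes = Counter(label for _, _, label in nearest)
        return votes.most_common(1)[0][0]


model = ForgettingKNN(k=3)
model.learn_one("alice", [0.0, 0.0], "a")
model.learn_one("alice", [0.1, 0.0], "a")
model.learn_one("alice", [0.0, 0.1], "a")
model.learn_one("bob", [5.0, 5.0], "b")

before = model.predict_one([0.0, 0.0])   # neighbours are alice's instances
removed = model.forget("alice")          # erase everything alice contributed
after = model.predict_one([0.0, 0.0])    # only bob's instance remains
```

Because k-NN keeps its knowledge as stored instances, deleting those instances removes their influence on later predictions without any retraining. The same deletion shrinks the window, so predictions made shortly after a large erasure rest on fewer neighbours.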

Notes

  1. Available at https://scikit-multiflow.github.io/.

  2. Available at https://github.com/dlcaio/research-project-data-stream-learning.

Acknowledgments

This research was supported by the Coordination for the Improvement of Higher Education Personnel (CAPES), Process n. 88882.183880; and PIBIC/CNPQ/UFF. We also gratefully acknowledge Albert Bifet and Paristech University, who hosted Flávia Bernardini for a week and allowed us to have discussions to achieve these results.

Corresponding author

Correspondence to Leandro Miranda.

Copyright information

© 2022 IFIP International Federation for Information Processing

About this paper

Cite this paper

Libera, C., Miranda, L., Bernardini, F., Mastelini, S., Viterbo, J. (2022). ‘Right to Be Forgotten’: Analyzing the Impact of Forgetting Data Using K-NN Algorithm in Data Stream Learning. In: Janssen, M., et al. Electronic Government. EGOV 2022. Lecture Notes in Computer Science, vol 13391. Springer, Cham. https://doi.org/10.1007/978-3-031-15086-9_34

  • DOI: https://doi.org/10.1007/978-3-031-15086-9_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15085-2

  • Online ISBN: 978-3-031-15086-9

  • eBook Packages: Computer Science (R0)
