Privacy-Preserving Synthetic Educational Data Generation

Vie, Jill-Jênn; Rigaux, Tomas; Minn, Sein

doi:10.1007/978-3-031-16290-9_29


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13450)

Included in the following conference series:

European Conference on Technology Enhanced Learning


Abstract

Institutions collect massive amounts of learning traces, but they may not be allowed to disclose them because of privacy concerns. Synthetic data generation opens new opportunities for research in education. In this paper, we present a generative model for educational data that can preserve the privacy of participants, and an evaluation framework for comparing synthetic data generators. We show how naive pseudonymization can lead to re-identification threats and suggest techniques to guarantee privacy. We evaluate our method on existing massive open educational datasets.
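To make the re-identification threat of naive pseudonymization concrete, here is a minimal sketch on a toy log. It is not the authors' method or data: the salted, truncated SHA-256 pseudonyms, the toy log shape, and the function names (`pseudonymize`, `make_toy_log`, `reidentification_rate`) are all hypothetical. The assumed attacker knows k of a student's (item, correct) attempts and searches the released, pseudonymized traces for the only one that contains all of them.

```python
import hashlib
import random
from collections import defaultdict


def pseudonymize(user_id: str, salt: str = "hypothetical-salt") -> str:
    """Naive pseudonymization: replace an ID with a truncated salted SHA-256 digest."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()[:12]


def make_toy_log(n_users=200, n_items=1000, attempts=10, seed=0):
    """Toy log of (user_id, item_id, correct) triples; each user tries distinct items."""
    rng = random.Random(seed)
    log = []
    for u in range(n_users):
        for item in rng.sample(range(n_items), attempts):
            log.append((f"student{u}", item, rng.random() < 0.6))
    return log


def reidentification_rate(log, k, salt="hypothetical-salt", seed=0):
    """Fraction of users whose pseudonym is singled out by k known (item, correct) pairs."""
    rng = random.Random(seed)
    released = defaultdict(set)     # what gets published: pseudonym -> trace
    true_traces = defaultdict(set)  # ground truth, used only to score the attack
    for user, item, correct in log:
        released[pseudonymize(user, salt)].add((item, correct))
        true_traces[user].add((item, correct))
    hits = 0
    for user, trace in true_traces.items():
        # Assumed side knowledge: k attempts the attacker observed for this student.
        known = set(rng.sample(sorted(trace), k))
        # Released traces consistent with everything the attacker knows.
        candidates = [p for p, t in released.items() if known <= t]
        if candidates == [pseudonymize(user, salt)]:
            hits += 1
    return hits / len(true_traces)


if __name__ == "__main__":
    log = make_toy_log()
    for k in range(4):
        print(f"k={k}: {reidentification_rate(log, k):.2f} of students re-identified")
```

With no side knowledge (k = 0) no student is singled out, but on a sparse log like this one a few known attempts typically suffice: the hash hides the name, not the trace, and the trace itself is the quasi-identifier.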

J.-J. Vie and T. Rigaux—Equal contribution.



Author information

Authors and Affiliations

SODA, Inria Saclay, 1 rue Honoré d’Estienne d’Orves, 91120, Palaiseau, France
Jill-Jênn Vie & Tomas Rigaux
CEDAR, Inria Saclay, 1 rue Honoré d’Estienne d’Orves, 91120, Palaiseau, France
Sein Minn


Corresponding author

Correspondence to Jill-Jênn Vie.

Editor information

Editors and Affiliations

Pontificia Universidad Católica de Chile, Santiago, Chile
Isabel Hilliger
Universidad Carlos III de Madrid, Madrid, Spain
Pedro J. Muñoz-Merino
KU Leuven, Leuven, Belgium
Tinne De Laet
Universidad de Valladolid, Valladolid, Spain
Alejandro Ortega-Arranz
The Open University, Milton Keynes, UK
Tracie Farrell


About this paper

Cite this paper

Vie, JJ., Rigaux, T., Minn, S. (2022). Privacy-Preserving Synthetic Educational Data Generation. In: Hilliger, I., Muñoz-Merino, P.J., De Laet, T., Ortega-Arranz, A., Farrell, T. (eds) Educating for a New Future: Making Sense of Technology-Enhanced Learning Adoption. EC-TEL 2022. Lecture Notes in Computer Science, vol 13450. Springer, Cham. https://doi.org/10.1007/978-3-031-16290-9_29


DOI: https://doi.org/10.1007/978-3-031-16290-9_29
Published: 05 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16289-3
Online ISBN: 978-3-031-16290-9
eBook Packages: Computer Science, Computer Science (R0)
