
A multi-source credit data fusion approach based on federated distillation learning

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Data imbalance and privacy disclosure have become the main problems in multi-source credit data fusion: the former causes conflicts during the fusion process, while the latter poses serious security risks. Although federated learning protects data privacy, it introduces high communication costs and can yield inaccurate fusion results. To fuse data effectively, this paper proposes an approach based on federated distillation learning that exchanges synthetic distillation data instead of the model parameters used in traditional transfer schemes, reducing time cost and improving accuracy without compromising data privacy; each client trains a model on its local data and learns interactively with the server's model. Specifically, a decision tree model is used to distill knowledge from the credit data, replacing traditional parameter transfer, and a Generative Adversarial Network is used on the server to balance the data distribution and alleviate class imbalance. Experimental results show that the proposed method improves both overall performance and imbalanced-data handling by at least 3%.
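As a rough illustration of the exchange described above, the sketch below (a minimal Python example, not the authors' exact protocol) shows clients fitting local decision trees on their private credit data and sharing only soft predictions on a common reference set, which the server averages into distilled targets. The synthetic data from make_classification, the three-client split, and the hard pseudo-label refinement step are all illustrative assumptions.

```python
# Minimal federated-distillation sketch (illustrative only): clients exchange
# soft predictions on a shared reference set instead of model parameters.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Imbalanced synthetic "credit" data standing in for the private client datasets.
X, y = make_classification(n_samples=3000, n_features=20, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)

# Hold out an unlabeled reference set visible to every party (this plays the
# role of the synthetic distillation data) and split the rest across 3 clients.
X_ref = X[:500]
client_X = np.array_split(X[500:], 3)
client_y = np.array_split(y[500:], 3)

# 1) Each client trains a local decision tree on its own private data.
local_models = [DecisionTreeClassifier(max_depth=5, random_state=0).fit(Xc, yc)
                for Xc, yc in zip(client_X, client_y)]

# 2) Clients upload only soft predictions on the shared reference set.
soft_preds = [m.predict_proba(X_ref) for m in local_models]

# 3) The server aggregates the soft predictions into distilled targets.
distilled = np.mean(soft_preds, axis=0)

# 4) Each client refines its model with the distilled knowledge. Decision trees
#    cannot fit probabilistic targets directly, so this sketch falls back to
#    hard pseudo-labels; a full implementation would match the soft targets.
pseudo_labels = distilled.argmax(axis=1)
for Xc, yc, model in zip(client_X, client_y, local_models):
    model.fit(np.vstack([Xc, X_ref]), np.concatenate([yc, pseudo_labels]))
```

Compared with shipping full model parameters each round, only a reference-set-by-classes prediction matrix travels over the network, which is where the communication savings come from; the GAN-based rebalancing described in the abstract would be applied on the server before aggregation and is omitted from this sketch.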



Funding

This work was supported by the National Key Research and Development Program of China (2019YFB1404602).

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed equally to the article. All authors reviewed the manuscript.

Corresponding author

Correspondence to Zhoubao Sun.

Ethics declarations

Conflicts of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, X., Sun, Z., Mao, L. et al. A multi-source credit data fusion approach based on federated distillation learning. Int. J. Mach. Learn. & Cyber. 15, 1153–1164 (2024). https://doi.org/10.1007/s13042-023-02032-z

