DOI: 10.1145/3565698.3565765
Research Article
Open Access

VFLens: Co-design the Modeling Process for Efficient Vertical Federated Learning via Visualization


ABSTRACT

As a decentralized training approach, federated learning enables multiple organizations to jointly train a model without exposing their private data. This work investigates vertical federated learning (VFL), which addresses scenarios where collaborating organizations share the same set of users but hold different features, and only one party holds the labels. Although VFL performs well, practitioners often face uncertainty when preparing non-transparent internal and external features and samples for the VFL training phase. Moreover, to balance prediction accuracy against the resource consumption of model inference, practitioners need to know which subset of prediction instances genuinely requires invoking the VFL model. To this end, we co-design the VFL modeling process by proposing an interactive real-time visualization system, VFLens, that helps practitioners with feature engineering, sample selection, and inference. A usage scenario, a quantitative experiment, and expert feedback suggest that VFLens helps practitioners boost VFL efficiency at lower cost and with sufficient confidence.
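To make the setting concrete, the sketch below illustrates, in plain NumPy with made-up party names, feature shapes, and weights, the vertically partitioned data layout the abstract describes: two parties share user IDs but hold disjoint feature columns, and only the active party holds the labels. This is not the VFLens system or the paper's training protocol; production VFL aligns samples with private set intersection and exchanges only encrypted intermediate results, for which the plain Python operations here merely stand in.

```python
# Toy illustration (not VFLens) of a vertical federated learning data layout:
# party A (active) holds labels and some features, party B (passive) holds
# other features for an overlapping set of users.
import numpy as np

rng = np.random.default_rng(0)

# Party A (active): user ids, 3 features, and the labels.
ids_a = np.array([101, 102, 103, 105, 108])
X_a = rng.normal(size=(len(ids_a), 3))
y_a = rng.integers(0, 2, size=len(ids_a))

# Party B (passive): overlapping but different user ids, 4 other features, no labels.
ids_b = np.array([102, 103, 104, 105, 109])
X_b = rng.normal(size=(len(ids_b), 4))

# Step 1: sample alignment. Real VFL uses private set intersection so neither
# party learns ids outside the overlap; here it is an ordinary intersection.
# Because both id arrays are sorted, boolean masking keeps rows in matching order.
common = np.intersect1d(ids_a, ids_b)
X_a_aligned = X_a[np.isin(ids_a, common)]
y_aligned = y_a[np.isin(ids_a, common)]
X_b_aligned = X_b[np.isin(ids_b, common)]

# Step 2: split-model style forward pass. Each party maps its own features
# through a local linear layer; the active party sums the partial logits and
# computes the loss without ever seeing party B's raw features.
w_a = rng.normal(size=(3,))
w_b = rng.normal(size=(4,))
logits = X_a_aligned @ w_a + X_b_aligned @ w_b
prob = 1.0 / (1.0 + np.exp(-logits))
loss = -np.mean(y_aligned * np.log(prob + 1e-9)
                + (1 - y_aligned) * np.log(1 - prob + 1e-9))
print(f"aligned samples: {len(common)}, toy logistic loss: {loss:.3f}")
```

The sketch also hints at the costs the abstract targets: every extra feature column and every aligned sample adds encrypted computation and communication in a real deployment, which is why feature engineering, sample selection, and deciding which instances need the federated model at all are worth supporting with tooling.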


Supplemental Material

VFL.mp4 (MP4 video, 11 MB)



Published in

Chinese CHI '22: Proceedings of the Tenth International Symposium of Chinese CHI
October 2022, 342 pages
ISBN: 9781450398695
DOI: 10.1145/3565698

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 12 February 2024


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 17 of 40 submissions, 43%
