ABSTRACT
As a decentralized training approach, federated learning enables multiple organizations to jointly train a model without exposing their private data. This work investigates vertical federated learning (VFL), which addresses scenarios where the collaborating organizations share the same set of users but hold different features, and only one party owns the labels. Although VFL performs well, practitioners often face uncertainty when preparing opaque internal and external features and samples for the VFL training phase. Moreover, to balance prediction accuracy against the resource cost of model inference, practitioners need to know which subset of prediction instances genuinely requires invoking the VFL model. To this end, we co-design the VFL modeling process by proposing an interactive real-time visualization system, VFLens, that helps practitioners with feature engineering, sample selection, and inference. A usage scenario, a quantitative experiment, and expert feedback suggest that VFLens helps practitioners improve VFL efficiency at lower cost and with greater confidence.
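The vertically partitioned setting the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: two hypothetical parties share some users but hold disjoint feature sets, and only party A owns the labels. In a real VFL deployment, sample alignment is performed with a private set intersection (PSI) protocol so that neither party reveals its full user list; a plain set intersection stands in for PSI here.

```python
# Minimal sketch of the VFL data setup: same users, different
# features, labels held by one party only.  All names and values
# are illustrative assumptions.

party_a = {  # user_id -> (features held by A, label)
    "u1": ([0.2, 1.5], 1),
    "u2": ([0.7, 0.3], 0),
    "u3": ([0.1, 0.9], 1),
}
party_b = {  # user_id -> features held by B (no labels)
    "u2": [3.1, 0.8],
    "u3": [1.2, 2.4],
    "u4": [0.5, 0.6],
}

# Sample alignment: only users present on both sides enter training.
# In practice this intersection is computed privately via PSI.
shared_ids = sorted(party_a.keys() & party_b.keys())

# Each aligned training row spans A's and B's feature slices;
# the labels never leave party A.
rows = [party_a[u][0] + party_b[u] for u in shared_ids]
labels = [party_a[u][1] for u in shared_ids]

print(shared_ids)  # ['u2', 'u3']
print(rows)        # [[0.7, 0.3, 3.1, 0.8], [0.1, 0.9, 1.2, 2.4]]
print(labels)      # [0, 1]
```

During training, each party computes over its own feature slice locally and exchanges only protected intermediate results, which is what makes the feature- and sample-preparation questions the abstract raises nontrivial: neither side can directly inspect the other's columns.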
VFLens: Co-design the Modeling Process for Efficient Vertical Federated Learning via Visualization