ABSTRACT
With growing concern over data privacy, split learning has become a widely used distributed machine learning paradigm in which two participants, a non-label party holding the raw features and a label party holding the raw labels, jointly train a model. Although no raw data is exchanged between the two parties during training, several works have shown that data privacy, and label privacy in particular, remains vulnerable in split learning, and have proposed defense algorithms against label attacks. However, these algorithms come with limited theoretical guarantees on privacy preservation. In this work, we propose a novel Private Split Learning Framework (PSLF). In PSLF, the label party shares with the non-label party only the gradients computed from flipped labels, which strengthens the privacy of the raw labels; in addition, we design an extra sub-model trained on the true labels to improve prediction accuracy. We also design a Flipped Multi-Label Generation (FMLG) mechanism, based on randomized response, for the label party to generate flipped labels. FMLG is proven differentially private, and the label party can trade off privacy against utility by setting the DP budget. Furthermore, we design an upsampling method to further protect the labels against several existing attacks. We evaluate PSLF on real-world datasets, demonstrating its effectiveness in protecting label privacy while achieving promising prediction accuracy.
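To make the randomized-response idea behind FMLG concrete, the following is a minimal sketch of generalized randomized response for label flipping. This is not the paper's exact FMLG mechanism; it is a standard construction (the function name `rr_flip` and the interface are illustrative) that keeps the true label with probability e^ε / (e^ε + k − 1) and otherwise draws uniformly from the remaining k − 1 labels, which satisfies ε-label differential privacy:

```python
import math
import random

def rr_flip(label, num_classes, epsilon, rng=random):
    """Generalized randomized response for a categorical label.

    Keeps the true label with probability e^eps / (e^eps + k - 1);
    otherwise returns one of the other k - 1 labels uniformly at
    random. The ratio of output probabilities for any two inputs is
    bounded by e^eps, giving epsilon-label-DP.
    """
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + num_classes - 1)
    if rng.random() < keep_prob:
        return label
    # Flip: choose uniformly among the remaining labels.
    others = [c for c in range(num_classes) if c != label]
    return rng.choice(others)

# A larger budget keeps the label almost always; as epsilon -> 0 the
# output approaches a uniform draw over all classes.
rng = random.Random(0)
flip_rate = sum(rr_flip(1, 2, 4.0, rng) != 1 for _ in range(10_000)) / 10_000
print(flip_rate)  # empirically close to 1 / (e^4 + 1), i.e. roughly 0.02
```

The single parameter ε mirrors the privacy/utility trade-off described in the abstract: a small budget flips labels often (strong privacy, noisier gradients), a large budget rarely flips (weaker privacy, higher utility).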
PSLF: Defending Against Label Leakage in Split Learning