Chinese Journal of Intelligent Science and Technology ›› 2024, Vol. 6 ›› Issue (1): 64-75. doi: 10.11959/j.issn.2096-6652.202345

• Academic Papers •

Research on the explainability of vertical federated learning models based on human-in-the-loop

Xiaohuan LI1,2, Junbai ZHENG1,2, Jiawen KANG3, Jin YE2, Qian CHEN1,4

  1. Guangxi University Key Laboratory of Intelligent Networking and Scenario System (School of Information and Communication, Guilin University of Electronic Technology), Guilin 541004, China
    2.Guangxi Research Institute of Integrated Transportation Big Data, Nanning 530025, China
    3.School of Automation, Guangdong University of Technology, Guangzhou 510006, China
    4. School of Architecture and Transportation Engineering, Guilin University of Electronic Technology, Guilin 541004, China
  • Received: 2023-09-12; Revised: 2023-12-07; Online: 2024-03-15; Published: 2024-03-15
  • Contact: Qian CHEN, e-mail: chenqian@mails.guet.edu.cn
  • About the authors: Xiaohuan LI (1982- ), male, Ph.D., professor and doctoral supervisor at the School of Information and Communication, Guilin University of Electronic Technology. His main research interests include intelligent computing, the industrial Internet of things, and space-air-ground networks.
    Junbai ZHENG (1997- ), male, master's student at the School of Information and Communication, Guilin University of Electronic Technology. His main research interests include explainable machine learning and federated learning.
    Jiawen KANG (1989- ), male, Ph.D., professor under the Young Hundred Talents program at the School of Automation, Guangdong University of Technology. His main research interests include privacy protection, blockchain, and the industrial Internet of things.
    Jin YE (1970- ), female, Ph.D., professor and doctoral supervisor at the Guangxi Research Institute of Integrated Transportation Big Data. Her main research interests include network protocol design and intelligent computing.
    Qian CHEN (1984- ), female, M.S., associate professor at the School of Information and Communication, Guilin University of Electronic Technology. Her main research interests include the Internet of things, transportation big data, and intelligent computing.
  • Supported by:
    The National Natural Science Foundation of China (U22A2054); The Key Science and Technology Project of Guangxi (AA22068101)

Abstract:

Vertical federated learning (VFL) is commonly used for cross-domain data sharing in high-risk scenarios, where users need to understand and trust model decisions before the models can be put into practice. Existing research primarily focuses on the trade-off between explainability and privacy within VFL, and fails to fully meet users' needs for building trust in and fine-tuning models. To address these issues, we proposed an explainable vertical federated learning method based on human-in-the-loop (XVFL-HITL), which incorporated user feedback into VFL's Shapley value-based explanation process through a distributed HITL structure, using the knowledge of all participants to correct training data and enhance model performance. Furthermore, considering privacy concerns, the additive property of Shapley values was employed to merge the feature contributions of all parties other than the current participant into a single aggregated value, which effectively protected each participant's feature privacy. Experimental results on benchmark data indicated that the explanations produced by XVFL-HITL were effective while preserving users' feature privacy. Moreover, compared with feature selection by VFL-Random and by VFL-Shapley (which applies SHAP directly), XVFL-HITL improved model accuracy by approximately 14% and 11%, respectively.
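The privacy-preserving aggregation step can be illustrated with a short sketch. The Python fragment below is not from the paper; the attribution values, base value, and the assignment of features to participants are all hypothetical. It only demonstrates the additivity property the method relies on: the sum of a group's Shapley values equals that group's total contribution, so non-local attributions can be reported as one number without breaking the decomposition of the prediction.

    import numpy as np

    # Hypothetical per-sample Shapley values, e.g. from a SHAP-style
    # explainer run on the joint VFL model; values are illustrative only.
    phi = np.array([0.12, -0.05, 0.30, 0.08, -0.11, 0.02])
    feature_names = ["x0", "x1", "x2", "x3", "x4", "x5"]

    local_idx = [0, 1, 2]      # features held by the explaining participant
    external_idx = [3, 4, 5]   # features held by all other participants

    base_value = 0.41                     # explainer's expected model output
    prediction = base_value + phi.sum()   # additivity: f(x) = E[f] + sum(phi_i)

    # By additivity, the sum of a group's Shapley values is exactly that
    # group's total contribution, so collapsing the other parties'
    # attributions into one aggregate keeps the decomposition exact
    # while hiding their per-feature details.
    external_contribution = phi[external_idx].sum()

    for i in local_idx:
        print(f"{feature_names[i]}: {phi[i]:+.3f}")
    print(f"other parties (aggregated): {external_contribution:+.3f}")
    print(f"check: {base_value:.2f} + {phi.sum():+.3f} = {prediction:.3f}")

Because the decomposition stays exact, the displayed explanation still sums to the model output, while the per-feature contributions of the other participants remain hidden.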

Key words: vertical federated learning, explainability, human-in-the-loop, Shapley value
