ABSTRACT
Federated multilingual modeling (FMM) has become an essential approach in natural language processing (NLP) as linguistic diversity grows and data privacy requirements tighten. However, FMM faces two primary challenges: (1) the high communication cost of transmitting model updates over the network, and (2) parameter interference, which arises because languages exhibit both shared and language-specific features. To address these issues, we introduce a communication-efficient framework for multilingual modeling (MM) that combines low-rank adaptation with a hierarchical language tree. Our method freezes the base model's weights and communicates only the low-rank adaptation (LoRA) parameters, significantly reducing communication cost. In addition, we mitigate parameter conflicts by aggregating LoRA parameters within language families, organized by their genealogical relationships, rather than merging the parameters of all languages together. Experimental results show that the proposed model outperforms established baselines while markedly reducing communication overhead.
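The aggregation scheme described above can be illustrated with a minimal sketch. All names here (`LANGUAGE_FAMILY`, `fedavg`, `aggregate_by_family`) are hypothetical and not taken from the paper; the toy "LoRA updates" are flat vectors standing in for the low-rank matrices that would be exchanged in practice, and the frozen base model is simply never transmitted.

```python
# Illustrative sketch (not the paper's implementation): family-wise
# aggregation of LoRA parameters in a federated multilingual setting.
# Only the LoRA updates travel over the network; the base model stays
# frozen on every client and is never communicated.

LANGUAGE_FAMILY = {  # toy two-level language tree: language -> family
    "es": "romance", "fr": "romance", "it": "romance",
    "de": "germanic", "nl": "germanic",
}

def fedavg(updates):
    """Element-wise average of equally weighted client update vectors."""
    n = len(updates)
    return [sum(vals) / n for vals in zip(*updates)]

def aggregate_by_family(client_updates):
    """Average LoRA parameters only within each language family,
    so unrelated languages do not interfere with each other."""
    buckets = {}
    for lang, lora in client_updates.items():
        buckets.setdefault(LANGUAGE_FAMILY[lang], []).append(lora)
    return {family: fedavg(ups) for family, ups in buckets.items()}

# Toy per-client LoRA updates (real ones are low-rank matrices A, B).
updates = {
    "es": [0.2, 0.4], "fr": [0.4, 0.6],
    "de": [1.0, 0.0], "nl": [0.0, 1.0],
}
print(aggregate_by_family(updates))  # one averaged update per family
```

A global FedAvg would instead average all four clients into a single vector; restricting the average to each family node of the tree is what limits cross-family parameter interference.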
FedHLT: Efficient Federated Low-Rank Adaption with Hierarchical Language Tree for Multilingual Modeling