Communication-Efficient Distributed Minimax Optimization via Markov Compression

Conference paper
Neural Information Processing (ICONIP 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14447))


Abstract

Recently, the minimax problem has attracted considerable attention due to its wide applications in modern machine learning, such as generative adversarial networks (GANs). With the exponential growth of data volumes and increasing problem sizes, designing distributed algorithms to train high-performance models has become imperative. However, distributed algorithms often suffer from communication bottlenecks. To address this challenge, we propose a communication-efficient distributed compressed stochastic gradient descent ascent algorithm, abbreviated DCSGDA, in a parameter-server setting. To reduce the communication cost, each client in DCSGDA transmits compressed gradients of the primal and dual variables to the server at each iteration. In particular, we leverage a Markov compression mechanism that accommodates both unbiased and biased compressors while mitigating the negative effect of compression errors on convergence. Specifically, we show theoretically that DCSGDA still achieves linear convergence in the presence of compression errors, provided that the local objective functions are strongly-convex-strongly-concave. Finally, numerical experiments demonstrate the communication efficiency and efficacy of the proposed DCSGDA.
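To make the communication pattern concrete, below is a minimal, hypothetical Python sketch of a compressed gradient descent ascent loop in a parameter-server style, where each client compresses the difference between its gradients and a recursively updated reference state (a Markov-chain-like compression state) for both the primal and dual blocks. This is only an illustration of the general idea on a toy strongly-convex-strongly-concave problem: the top-k compressor, step sizes, state-update rule, and all names (topk, run_sketch, alpha, etc.) are assumptions of this sketch, not the exact DCSGDA updates or analysis from the paper.

```python
# Hypothetical sketch of compressed stochastic gradient descent ascent with a
# Markov-style compression state; not the authors' exact algorithm.
import numpy as np


def topk(v, k):
    """Biased top-k compressor: keep the k largest-magnitude entries of v."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out


def run_sketch(num_clients=8, dim=20, k=4, eta=0.05, alpha=0.5, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    mu = 1.0
    # Local saddle functions f_i(x, y) = (mu/2)||x||^2 + x^T A_i y - (mu/2)||y||^2,
    # strongly convex in x and strongly concave in y.
    A = rng.normal(size=(num_clients, dim, dim)) / np.sqrt(dim)
    x, y = rng.normal(size=dim), rng.normal(size=dim)
    # Per-client compression states (recursively updated reference points).
    hx = np.zeros((num_clients, dim))
    hy = np.zeros((num_clients, dim))
    # Server-side aggregates of the reference points (kept in sync below).
    Hx, Hy = np.zeros(dim), np.zeros(dim)
    for _ in range(iters):
        qx_sum, qy_sum = np.zeros(dim), np.zeros(dim)
        for i in range(num_clients):
            gx = mu * x + A[i] @ y       # gradient w.r.t. the primal variable x
            gy = A[i].T @ x - mu * y     # gradient w.r.t. the dual variable y
            qx = topk(gx - hx[i], k)     # compress the difference to the state
            qy = topk(gy - hy[i], k)
            hx[i] += alpha * qx          # recursive (Markov-chain-like) state update
            hy[i] += alpha * qy
            qx_sum += qx                 # only the compressed messages are "sent"
            qy_sum += qy
        gx_hat = Hx + qx_sum / num_clients   # server estimate of the average gradients
        gy_hat = Hy + qy_sum / num_clients
        Hx += alpha * qx_sum / num_clients   # keep server states consistent with clients
        Hy += alpha * qy_sum / num_clients
        x -= eta * gx_hat                    # descent step on the primal variable
        y += eta * gy_hat                    # ascent step on the dual variable
    return float(np.linalg.norm(x)), float(np.linalg.norm(y))


if __name__ == "__main__":
    # Both norms should shrink toward the saddle point (0, 0) of the averaged objective.
    print(run_sketch())
```

In this template, swapping the biased top-k operator for an unbiased random sparsifier or a quantizer leaves the communication pattern unchanged: clients only ever transmit the compressed differences, while the shared reference states keep the server's gradient estimates consistent with the clients' states.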

This work was supported in part by the National Natural Science Foundation of China under Grant 62176056, and in part by the Young Elite Scientists Sponsorship Program by the China Association for Science and Technology (CAST) under Grant 2021QNRC001.



Author information

Corresponding author

Correspondence to Shaofu Yang.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Yang, L., Zhang, Z., Che, K., Yang, S., Wang, S. (2024). Communication-Efficient Distributed Minimax Optimization via Markov Compression. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14447. Springer, Singapore. https://doi.org/10.1007/978-981-99-8079-6_42

  • DOI: https://doi.org/10.1007/978-981-99-8079-6_42

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8078-9

  • Online ISBN: 978-981-99-8079-6

  • eBook Packages: Computer Science, Computer Science (R0)
