On the Training Instability of Shuffling SGD with Batch Normalization

Wu, David X.; Yun, Chulhee; Sra, Suvrit

Computer Science > Machine Learning

arXiv:2302.12444 (cs)

[Submitted on 24 Feb 2023 (v1), last revised 14 Aug 2023 (this version, v3)]

Authors: David X. Wu, Chulhee Yun, Suvrit Sra

Abstract: We uncover how SGD interacts with batch normalization and can exhibit undesirable training dynamics such as divergence. More precisely, we study how Single Shuffle (SS) and Random Reshuffle (RR), two widely used variants of SGD, interact surprisingly differently with batch normalization: RR leads to a much more stable evolution of the training loss than SS. As a concrete example, for regression using a linear network with batch normalization, we prove that SS and RR converge to distinct global optima that are "distorted" away from gradient descent. Thereafter, for classification we characterize conditions under which training divergence for SS and RR can, and cannot, occur. We present explicit constructions to show how SS leads to distorted optima in regression and divergence in classification, whereas RR avoids both distortion and divergence. We validate our results empirically in realistic settings and conclude that the separation between SS and RR used with batch normalization is relevant in practice.
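The abstract contrasts two ways of ordering the data in SGD. The sketch below is a minimal illustration, assuming the standard definitions: SS draws one random permutation once and reuses it in every epoch, while RR draws a fresh permutation at the start of each epoch. It further assumes that mini-batches are formed by cutting each epoch's order into consecutive chunks. It also shows one intuition, offered as an illustrative gloss rather than the paper's analysis: because batch normalization uses per-batch statistics, under SS a sample is normalized against the same batch-mates every epoch, whereas under RR its batch-mates change. All helper names (`epoch_orders`, `minibatches`, `batch_norm`, `normalized_feature_per_epoch`) and the toy data are hypothetical.

```python
import random


def epoch_orders(n, num_epochs, mode, seed=0):
    """Return the sample-index order used in each epoch (hypothetical helper).

    mode="SS": Single Shuffle, one permutation drawn once and reused every epoch.
    mode="RR": Random Reshuffle, a fresh permutation drawn at the start of each epoch.
    """
    rng = random.Random(seed)
    if mode == "SS":
        perm = list(range(n))
        rng.shuffle(perm)
        return [list(perm) for _ in range(num_epochs)]
    if mode == "RR":
        orders = []
        for _ in range(num_epochs):
            perm = list(range(n))
            rng.shuffle(perm)
            orders.append(perm)
        return orders
    raise ValueError(f"unknown mode: {mode!r}")


def minibatches(order, batch_size):
    """Split one epoch's index order into consecutive mini-batches (assumed scheme)."""
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def batch_norm(values, eps=1e-5):
    """Normalize a batch of scalars with that batch's own mean and biased variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def normalized_feature_per_epoch(sample, orders, features, batch_size):
    """For each epoch, return the BN output of `sample` inside the batch it landed in."""
    out = []
    for order in orders:
        for batch in minibatches(order, batch_size):
            if sample in batch:
                normed = batch_norm([features[i] for i in batch])
                out.append(normed[batch.index(sample)])
                break
    return out


if __name__ == "__main__":
    # Hypothetical toy scalar features, one per training sample.
    features = [0.3, -1.2, 2.5, 0.0, 1.1, -0.7, 3.4, 0.9]
    for mode in ("SS", "RR"):
        orders = epoch_orders(len(features), num_epochs=4, mode=mode, seed=0)
        vals = normalized_feature_per_epoch(0, orders, features, batch_size=4)
        print(mode, [round(v, 3) for v in vals])
```

Under SS, the printed values for sample 0 are identical in every epoch because its mini-batch never changes. Under RR, they generally vary from epoch to epoch as its batch-mates change.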

Comments:	ICML 2023 camera-ready version, added references; 75 pages
Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC)
Cite as:	arXiv:2302.12444 [cs.LG]
	(or arXiv:2302.12444v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2302.12444

Submission history

From: David X. Wu
[v1] Fri, 24 Feb 2023 04:10:54 UTC (477 KB)
[v2] Fri, 16 Jun 2023 16:40:45 UTC (479 KB)
[v3] Mon, 14 Aug 2023 05:22:10 UTC (473 KB)