Hallmarks of Optimization Trajectories in Neural Networks and LLMs: The Lengths, Bends, and Dead Ends

Singh, Sidak Pal; He, Bobby; Hofmann, Thomas; Schölkopf, Bernhard

Computer Science > Machine Learning

arXiv:2403.07379 (cs)

[Submitted on 12 Mar 2024]

Title:Hallmarks of Optimization Trajectories in Neural Networks and LLMs: The Lengths, Bends, and Dead Ends

Authors:Sidak Pal Singh, Bobby He, Thomas Hofmann, Bernhard Schölkopf

View PDF HTML (experimental)

Abstract:We propose a fresh take on understanding the mechanisms of neural networks by analyzing the rich structure of parameters contained within their optimization trajectories. Towards this end, we introduce some natural notions of the complexity of optimization trajectories, both qualitative and quantitative, which reveal the inherent nuance and interplay involved between various optimization choices, such as momentum, weight decay, and batch size. We use them to provide key hallmarks about the nature of optimization in deep neural networks: when it goes right, and when it finds itself in a dead end. Further, thanks to our trajectory perspective, we uncover an intertwined behaviour of momentum and weight decay that promotes directional exploration, as well as a directional regularization behaviour of some others. We perform experiments over large-scale vision and language settings, including large language models (LLMs) with up to 12 billion parameters, to demonstrate the value of our approach.

Comments:	Preprint, 51 pages
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
Cite as:	arXiv:2403.07379 [cs.LG]
	(or arXiv:2403.07379v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2403.07379

Submission history

From: Sidak Pal Singh [view email]
[v1] Tue, 12 Mar 2024 07:32:47 UTC (15,321 KB)

Computer Science > Machine Learning

Title:Hallmarks of Optimization Trajectories in Neural Networks and LLMs: The Lengths, Bends, and Dead Ends

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:Hallmarks of Optimization Trajectories in Neural Networks and LLMs: The Lengths, Bends, and Dead Ends

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators