Block-wise Training of Residual Networks via the Minimizing Movement Scheme

Karkar, Skander; Ayed, Ibrahim; de Bézenac, Emmanuel; Gallinari, Patrick

arXiv:2210.00949 (cs)

[Submitted on 3 Oct 2022 (v1), last revised 6 Jun 2023 (this version, v2)]

Abstract: End-to-end backpropagation has a few shortcomings: it requires loading the entire model during training, which can be impossible in constrained settings, and it suffers from three locking problems (forward locking, update locking and backward locking), which prohibit training the layers in parallel. Solving layer-wise optimization problems can address these problems and has been used in on-device training of neural networks. We develop a layer-wise training method, particularly well-adapted to ResNets, inspired by the minimizing movement scheme for gradient flows in distribution space. The method amounts to a kinetic energy regularization of each block that makes the blocks optimal transport maps and endows them with regularity. It works by alleviating the stagnation problem observed in layer-wise training, whereby greedily trained early layers overfit and deeper layers stop increasing test accuracy after a certain depth. We show on classification tasks that the test accuracy of block-wise trained ResNets is improved when using our method, whether the blocks are trained sequentially or in parallel.
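The abstract describes each block as one step of a minimizing movement scheme: a block is trained on its own auxiliary loss plus a kinetic-energy penalty on how far it moves its inputs. The sketch below is a toy illustration under our own assumptions, not the authors' implementation. Features are 1-D, each residual block is affine, r(x) = a*x + b, and a squared-error loss toward targets t stands in for the auxiliary classifier loss. The names train_block and train_greedy, the step size tau, and the plain gradient-descent optimizer are all hypothetical. For a block with input h, the per-sample objective is (h + r(h) - t)^2 + r(h)^2 / (2*tau), so a small tau forces each block to move points only a little. Blocks are trained sequentially, each on the frozen output of the previous one.

```python
# Toy sketch (not the authors' code): greedy block-wise training of a
# residual network where each block's loss gets a kinetic-energy penalty.
# Assumptions for illustration only: 1-D features, affine residual
# blocks r(x) = a*x + b, and a squared-error auxiliary loss toward
# targets t in place of each block's auxiliary classifier loss.
from typing import List, Tuple

Block = Tuple[float, float]  # hypothetical (a, b) parameters of r(x) = a*x + b


def train_block(xs: List[float], ts: List[float], tau: float,
                lr: float = 0.05, steps: int = 2000) -> Block:
    """Fit one block x -> x + r(x) on frozen inputs xs by gradient descent.

    Per-sample objective: (x + r(x) - t)**2 + r(x)**2 / (2 * tau),
    i.e. auxiliary loss plus a kinetic-energy term on the displacement.
    """
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, t in zip(xs, ts):
            r = a * x + b                      # displacement applied by the block
            g = 2.0 * (x + r - t) + r / tau    # d(objective)/dr
            grad_a += g * x                    # chain rule: dr/da = x
            grad_b += g                        # chain rule: dr/db = 1
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def train_greedy(xs: List[float], ts: List[float], tau: float,
                 n_blocks: int = 3) -> Tuple[List[Block], List[float]]:
    """Train blocks one after another, each on the frozen output of the previous one."""
    blocks: List[Block] = []
    h = list(xs)
    for _ in range(n_blocks):
        a, b = train_block(h, ts, tau)
        blocks.append((a, b))
        h = [x + a * x + b for x in h]         # freeze the block, push features forward
    return blocks, h


if __name__ == "__main__":
    xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
    ts = [2.0 * x + 1.0 for x in xs]
    for tau in (0.5, 5.0):
        blocks, h = train_greedy(xs, ts, tau)
        print(tau, [(round(a, 3), round(b, 3)) for a, b in blocks])
```

Because the toy targets are affine in x, each block's optimum closes the fraction 2*tau/(2*tau + 1) of the remaining gap t - h: with tau = 0.5 every block moves points halfway, whereas with tau = 5 the first block already covers about 91% of the distance. In this toy setting, the kinetic term is what spreads the displacement across depth rather than concentrating it in the earliest block.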

Comments:	1st International Workshop on Practical Deep Learning in the Wild at AAAI 2022
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2210.00949 [cs.LG]
	(or arXiv:2210.00949v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2210.00949

Submission history

From: Skander Karkar
[v1] Mon, 3 Oct 2022 14:03:56 UTC (729 KB)
[v2] Tue, 6 Jun 2023 13:48:11 UTC (729 KB)