Embarassingly Simple Dataset Distillation

Feng, Yunzhen; Vedantam, Ramakrishna; Kempe, Julia

arXiv:2311.07025 (cs)

[Submitted on 13 Nov 2023]

Abstract: Dataset distillation extracts a small set of synthetic training samples from a large dataset, with the goal that a model trained on this small set achieves competitive performance on test data. In this work, we tackle dataset distillation at its core by treating it directly as a bilevel optimization problem. Re-examining the foundational backpropagation through time (BPTT) method, we study its pronounced gradient variance, computational burden, and long-term dependencies. We introduce an improved method, Random Truncated Backpropagation Through Time (RaT-BPTT), to address them. RaT-BPTT combines truncation with a random window, which stabilizes the gradients and speeds up the optimization while still covering long dependencies. This allows us to establish a new state of the art on a variety of standard dataset benchmarks. A deeper dive into the nature of distilled data unveils pronounced intercorrelation. In particular, subsets of distilled datasets tend to perform much worse than directly distilled smaller datasets of the same size. Leveraging RaT-BPTT, we devise a boosting mechanism that generates distilled datasets containing subsets with near-optimal performance across different data budgets.
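
The idea of truncation with a random window can be illustrated with a minimal JAX sketch. This is not the authors' implementation: the linear model, the squared-error losses, the plain gradient-descent inner loop, the uniform sampling of the window's end point, and every name below (`inner_loss`, `sgd_step`, `meta_loss`, `rat_bptt_grad`) are assumptions chosen only to keep the example small. The inner problem trains a model on the synthetic data; the outer gradient with respect to the synthetic inputs is backpropagated through only the last `window` inner steps of an unroll whose length is drawn at random.

```python
import jax
import jax.numpy as jnp


def inner_loss(w, x_syn, y_syn):
    # Inner objective: fit a linear model w on the synthetic (distilled) data.
    return jnp.mean((x_syn @ w - y_syn) ** 2)


def sgd_step(w, x_syn, y_syn, lr):
    # One gradient-descent step of the inner problem.
    return w - lr * jax.grad(inner_loss)(w, x_syn, y_syn)


def meta_loss(x_syn, y_syn, w_start, x_real, y_real, window, lr):
    # Unroll `window` inner steps starting from w_start, then score the
    # resulting model on real data (the outer objective).
    w = w_start
    for _ in range(window):
        w = sgd_step(w, x_syn, y_syn, lr)
    return jnp.mean((x_real @ w - y_real) ** 2)


def rat_bptt_grad(key, x_syn, y_syn, w0, x_real, y_real,
                  total_steps, window, lr):
    # Assumption: the end of the window is sampled uniformly from
    # {window, ..., total_steps}; randint's upper bound is exclusive.
    end = int(jax.random.randint(key, (), window, total_steps + 1))

    # Steps before the window run outside the differentiated function,
    # so the resulting weights act as a constant in the outer gradient.
    w = w0
    for _ in range(end - window):
        w = sgd_step(w, x_syn, y_syn, lr)

    # Differentiate the outer loss w.r.t. the synthetic inputs through
    # the last `window` steps only (the truncation).
    grad_fn = jax.grad(
        lambda xs: meta_loss(xs, y_syn, w, x_real, y_real, window, lr)
    )
    return grad_fn(x_syn), end
```

In an outer loop, one would call `rat_bptt_grad` with a fresh key at every iteration and update `x_syn` with the returned gradient, so that different iterations differentiate through different segments of the training trajectory.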

Comments:	Short version appears at NeurIPS 2023 WANT workshop
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:2311.07025 [cs.LG]
	(or arXiv:2311.07025v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2311.07025

Submission history

From: Yunzhen Feng
[v1] Mon, 13 Nov 2023 02:14:54 UTC (7,798 KB)
