Understanding The Effectiveness of Lossy Compression in Machine Learning Training Sets

Underwood, Robert; Calhoun, Jon C.; Di, Sheng; Cappello, Franck

arXiv:2403.15953 (cs)

[Submitted on 23 Mar 2024]

Abstract: Machine Learning and Artificial Intelligence (ML/AI) techniques have become increasingly prevalent in high-performance computing (HPC). However, these methods depend on vast volumes of floating-point data for training and validation, and that data must be shared over a wide area network (WAN) or transferred from edge devices to data centers. Data compression can be a solution to these problems, but an in-depth understanding of how lossy compression affects model quality is needed. Prior work largely considers a single application or compression method. We designed a systematic methodology for evaluating data reduction techniques for ML/AI, and we use it to perform a comprehensive evaluation of 17 data reduction methods on 7 ML/AI applications, showing that modern lossy compression methods can achieve a 50-100x compression ratio improvement for a 1% or less loss in quality. We identify critical insights that guide the future use and design of lossy compressors for ML/AI.

Comments:	12 pages, 4 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
ACM classes:	I.2.6; E.2; C.4
Cite as:	arXiv:2403.15953 [cs.LG]
	(or arXiv:2403.15953v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2403.15953

Submission history

From: Robert Underwood
[v1] Sat, 23 Mar 2024 23:14:37 UTC (5,014 KB)
