STREAMLINE: Streaming Active Learning for Realistic Multi-Distributional Settings

Beck, Nathan; Kothawade, Suraj; Shenoy, Pradeep; Iyer, Rishabh

Computer Science > Machine Learning

arXiv:2305.10643 (cs)

[Submitted on 18 May 2023]

Abstract: Deep neural networks have consistently shown strong performance in real-world applications such as autonomous vehicles and satellite imaging, largely by leveraging large corpora of labeled training data. However, learning unbiased models requires a dataset that represents the diverse range of realistic scenarios for a given task. This is challenging in settings where data arrives in high-volume streams and each scenario occurs in randomly interleaved episodes at varying frequencies. We study realistic streaming settings in which data instances arrive from, and are sampled from, an episodic multi-distributional data stream. Using submodular information measures, we propose STREAMLINE, a novel streaming active learning framework that mitigates scenario-driven slice imbalance in the working labeled set through a three-step procedure: slice identification, slice-aware budgeting, and data selection. We evaluate STREAMLINE extensively on real-world streaming scenarios for image classification and object detection. Compared with current baselines, STREAMLINE improves performance on infrequent yet critical slices of the data by up to 5% in accuracy on our image classification tasks and by up to 8% in mAP on our object detection tasks.
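The abstract names the three steps but not their internals. The sketch below is a hypothetical illustration of how such a round could fit together, not the authors' implementation. It assumes each slice is represented by a few exemplar embeddings, assigns streamed points to slices by mean clipped cosine similarity, splits an integer labeling budget inversely to how many labeled points each slice already has, and selects points within each slice by greedily maximizing a facility-location-style submodular mutual information. The function names (`identify_slices`, `slice_budgets`, `flqmi`, `greedy_select`, `streamline_round`) and the weight `eta` are illustrative assumptions.

```python
import math

# Hypothetical sketch of one slice-aware streaming selection round.
# Embeddings are plain lists of floats; each slice is represented by a few
# exemplar embeddings (an assumption for illustration, not from the paper).


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sim(u, v):
    """Non-negative similarity, clipped so the facility-location terms stay monotone."""
    return max(0.0, cosine(u, v))


def identify_slices(stream, exemplars):
    """Step 1 (assumed rule): assign each point to the slice whose exemplars it matches best on average."""
    labels = []
    for x in stream:
        scores = {s: sum(sim(x, e) for e in ex) / len(ex) for s, ex in exemplars.items()}
        labels.append(max(scores, key=scores.get))
    return labels


def slice_budgets(labeled_counts, slices, budget):
    """Step 2 (assumed rule): split an integer budget across slices, inversely to labeled counts."""
    weights = {s: 1.0 / (1 + labeled_counts.get(s, 0)) for s in slices}
    total = sum(weights.values())
    raw = {s: budget * w / total for s, w in weights.items()}
    budgets = {s: math.floor(r) for s, r in raw.items()}
    # Give leftover units to the slices with the largest fractional remainders.
    leftover = budget - sum(budgets.values())
    by_remainder = sorted(slices, key=lambda s: raw[s] - budgets[s], reverse=True)
    for s in by_remainder[:leftover]:
        budgets[s] += 1
    return budgets


def flqmi(subset, queries, eta=1.0):
    """Facility-location-style mutual information between a subset and query exemplars."""
    if not subset:
        return 0.0
    coverage = sum(max(sim(q, a) for a in subset) for q in queries)
    relevance = sum(max(sim(q, a) for q in queries) for a in subset)
    return coverage + eta * relevance


def greedy_select(candidates, queries, k, eta=1.0):
    """Step 3: greedily pick up to k candidate indices with the largest marginal flqmi gain."""
    chosen, remaining = [], list(range(len(candidates)))
    for _ in range(min(k, len(candidates))):
        base = flqmi([candidates[i] for i in chosen], queries, eta)
        best = max(
            remaining,
            key=lambda i: flqmi([candidates[j] for j in chosen + [i]], queries, eta) - base,
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


def streamline_round(stream, exemplars, labeled_counts, budget):
    """One hypothetical round: identify slices, budget them, then select within each slice."""
    slices = list(exemplars)
    labels = identify_slices(stream, exemplars)
    budgets = slice_budgets(labeled_counts, slices, budget)
    picked = []
    for s in slices:
        idx = [i for i, lab in enumerate(labels) if lab == s]
        local = greedy_select([stream[i] for i in idx], exemplars[s], budgets[s])
        picked.extend(idx[j] for j in local)
    return sorted(picked)


# Toy example: 2-D embeddings, a frequent "day" slice and a rare "night" slice.
exemplars = {"day": [[1.0, 0.0]], "night": [[0.0, 1.0]]}
stream = [[0.9, 0.1], [1.0, 0.05], [0.8, 0.2], [0.1, 0.9], [0.95, 0.0]]
labeled_counts = {"day": 2, "night": 0}
print(streamline_round(stream, exemplars, labeled_counts, budget=3))  # prints [3, 4]
```

With two labeled "day" points and none for "night", the budget of 3 is split 1 to "day" and 2 to "night", so the single night-like point (index 3) is selected alongside the most day-like one (index 4). Because this sketch does not reallocate unused budget, one "night" unit goes unspent when that slice has too few candidates; a fuller treatment would redistribute it.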

Comments:	20 pages, 14 figures, 2 tables
Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2305.10643 [cs.LG]
	(or arXiv:2305.10643v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2305.10643

Submission history

From: Nathan Beck
[v1] Thu, 18 May 2023 02:01:45 UTC (3,798 KB)