OEBench: Investigating Open Environment Challenges in Real-World Relational Data Streams

Diao, Yiqun; Yang, Yutong; Li, Qinbin; He, Bingsheng; Lu, Mian

arXiv:2308.15059 (cs)

[Submitted on 29 Aug 2023 (v1), last revised 15 Dec 2023 (this version, v3)]

Abstract: Extracting timely insights from relational data streams is an active research topic. Data streams can present unique challenges, such as distribution drifts, outliers, emerging classes, and changing features, which have recently been described as open environment challenges for machine learning. Although incremental learning for data streams has been studied, existing evaluations are mostly conducted on synthetic datasets. Thus, a natural question is what those open environment challenges look like and how existing incremental learning algorithms perform on real-world relational data streams. To fill this gap, we develop an Open Environment Benchmark named OEBench to evaluate open environment challenges in real-world relational data streams. Specifically, we investigate 55 real-world relational data streams and establish that open environment scenarios are indeed widespread, which presents significant challenges for stream learning algorithms. Through benchmarks with existing incremental learning algorithms, we find that increased data quantity may not consistently improve model accuracy in open environment scenarios, where machine learning models can be significantly compromised by missing values, distribution drifts, or anomalies in real-world data streams. Current techniques are insufficient to effectively mitigate the challenges brought by open environments. More research is needed to address real-world open environment challenges. All datasets and code are open-sourced at this https URL.

Subjects:	Machine Learning (cs.LG); Databases (cs.DB)
Cite as:	arXiv:2308.15059 [cs.LG]
	(or arXiv:2308.15059v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2308.15059

Submission history

From: Yiqun Diao
[v1] Tue, 29 Aug 2023 06:43:29 UTC (5,894 KB)
[v2] Sun, 3 Sep 2023 14:43:31 UTC (5,908 KB)
[v3] Fri, 15 Dec 2023 09:04:01 UTC (5,999 KB)