Abstract
Cardinality estimation (CE) is a fundamental and critical problem in DBMS query optimization, and deep learning techniques have made significant breakthroughs in CE research. However, besides requiring training data large enough to cover all possible query regions for accurate estimation, current query-driven CE methods also suffer from workload drift. Retraining or fine-tuning requires cardinality labels as ground truth, and obtaining those labels through the DBMS is expensive. We therefore propose CEDA, a novel domain-adaptive CE system. CEDA achieves more accurate estimates by automatically generating workloads as training data according to the data distribution in the database, and by incorporating histogram information into an attention-based cardinality estimator. To handle workload drift in real-world environments, CEDA adopts a domain adaptation strategy that makes the model more robust, so it performs well even on an unlabeled workload whose feature distribution differs greatly from that of the training set.
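To make the histogram idea concrete, the following is a minimal sketch of how per-column histogram statistics can summarize a data distribution and yield a selectivity feature for a range predicate; the function names are illustrative assumptions, not CEDA's actual API, and a learned estimator would consume such features rather than use the uniformity estimate directly.

```python
def build_histogram(values, num_bins=4):
    """Equi-width histogram over a numeric column: returns (bin_edges, bin_counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0
    counts = [0] * num_bins
    for v in values:
        # Clamp the top edge value into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts

def estimate_range_cardinality(edges, counts, lo, hi):
    """Estimate rows satisfying lo <= value <= hi, assuming uniformity within each bin."""
    total = 0.0
    for i, count in enumerate(counts):
        bin_lo, bin_hi = edges[i], edges[i + 1]
        overlap = max(0.0, min(hi, bin_hi) - max(lo, bin_lo))
        if bin_hi > bin_lo:
            total += count * overlap / (bin_hi - bin_lo)
    return total
```

In a learned pipeline, the bin counts (or the per-predicate estimates they induce) would be concatenated with the encoded query as input features, letting the model correct the uniformity assumption from data.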