Accelerating the Discovery of Data Quality Rules: A Case Study

Authors

  • Peter Z. Yeh, Accenture
  • Colin A. Puri, Accenture
  • Mark Wagman, Accenture
  • Ajay K. Easo, Accenture

DOI:

https://doi.org/10.1609/aaai.v25i2.18865

Abstract

Poor-quality data is a growing and costly problem that affects many enterprises across all aspects of their business, ranging from operational efficiency to revenue protection. In this paper, we present an application, Data Quality Rules Accelerator (DQRA), that accelerates Data Quality (DQ) efforts (e.g., data profiling and cleansing) by automatically discovering DQ rules for detecting inconsistencies in data. We then present two evaluations. The first evaluation compares DQRA to existing solutions and shows that DQRA either outperformed or achieved performance comparable with these solutions on metrics such as precision, recall, and runtime. The second evaluation is a case study in which DQRA was piloted at a large utilities company to improve data quality as part of a legacy migration effort. DQRA was able to discover rules that detected data inconsistencies directly impacting revenue and operational efficiency. Moreover, DQRA significantly reduced the effort required to develop these rules compared to the state of the practice. Finally, we describe ongoing efforts to deploy DQRA.

Published

2011-08-11

How to Cite

Yeh, P., Puri, C., Wagman, M., & Easo, A. (2011). Accelerating the Discovery of Data Quality Rules: A Case Study. Proceedings of the AAAI Conference on Artificial Intelligence, 25(2), 1707-1714. https://doi.org/10.1609/aaai.v25i2.18865