Abstract
We study the problem of augmenting relational tuples with inconsistency awareness and answering top-k queries under a set of denial constraints (DCs). We define a notion of inconsistent tuples with respect to a set of DCs and introduce two measures of inconsistency degree, which account for single and multiple violations of constraints, respectively. To compute these measures, we leverage two models of provenance, namely why-provenance and provenance polynomials. We investigate top-k queries that rank the answer tuples by their inconsistency degrees. Since one of our measures is monotonic and the other is non-monotonic, we design an integrated top-k algorithm that computes the top-k results of a query w.r.t. both inconsistency measures. By means of an extensive experimental study, we gauge the effectiveness of inconsistency-aware query answering and the efficiency of our algorithm with respect to a baseline in which query results are fully computed and ranked afterwards.
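As a rough illustration of the baseline mentioned above (compute every answer first, rank afterwards), the following minimal sketch may help. It is not the paper's algorithm or its measures: the toy relation, the constraint `dc_zip_city`, the use of a plain per-tuple violation count as the inconsistency degree, and the choice to rank the least inconsistent tuples first are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical toy relation; attribute names are illustrative only.
rows = [
    {"id": 1, "zip": "69001", "city": "Lyon"},
    {"id": 2, "zip": "69001", "city": "Paris"},
    {"id": 3, "zip": "75001", "city": "Paris"},
    {"id": 4, "zip": "69001", "city": "Lyon"},
]

def dc_zip_city(t1, t2):
    """Hypothetical denial constraint: two tuples must not share a zip
    code while disagreeing on the city. Returns True on a violation."""
    return t1["zip"] == t2["zip"] and t1["city"] != t2["city"]

def violation_counts(rows, constraints):
    """Count, for each tuple, the violating pairs it takes part in.
    Assumed stand-in for an inconsistency degree, not the paper's
    definition. Unordered pairs suffice because the constraint used
    here is symmetric."""
    counts = {r["id"]: 0 for r in rows}
    for t1, t2 in combinations(rows, 2):
        for dc in constraints:
            if dc(t1, t2):
                counts[t1["id"]] += 1
                counts[t2["id"]] += 1
    return counts

def baseline_top_k(rows, constraints, k):
    """Baseline: score every tuple, sort the full result, keep k.
    Ties are broken by id so the output is deterministic."""
    counts = violation_counts(rows, constraints)
    ranked = sorted(rows, key=lambda r: (counts[r["id"]], r["id"]))
    return [(r["id"], counts[r["id"]]) for r in ranked[:k]]

print(baseline_top_k(rows, [dc_zip_city], 2))
```

The sketch scores and sorts the whole relation before returning anything, which is the cost a dedicated top-k algorithm aims to avoid by stopping early.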