Abstract
Extensible Markup Language (XML) has been widely adopted as a standard for exchanging and integrating data from multiple sources. Users can explore large datasets through declarative query interfaces such as XQuery and XPath. However, the results of queries posed to such heterogeneous data sources are often inconsistent because of anomalies arising from structural and semantic inconsistencies, which significantly reduces the ability of the system to provide accurate query answers. Most prior work on finding consistent query answers (CQAs) is not extensible enough to handle data constraints that hold only conditionally on XML data with inconsistent structures. This paper proposes an approach, called SC2QA, which utilizes XML conditional functional dependencies (XCSDs) to compute consistent answers for queries posed to arbitrary XML data and thereby improve information quality. An XCSD is a structured and content-based functional dependency that holds conditionally on certain objects with diverse structures. The query answer is computed by qualifying queries with appropriate information derived from the interaction between the query and the XCSDs. Experiments on synthetic datasets demonstrate the effectiveness of SC2QA.
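To make the idea of a dependency that holds only conditionally more concrete, the following is a minimal Python sketch and not the SC2QA algorithm itself. It assumes a hypothetical constraint of the form "for bookings whose carrier is Qantas, fare determines tax" and checks it over records with two different layouts. The element names, sample data, and helper functions are all invented for illustration. The sketch also filters answers after evaluation, which is a simplification; the abstract describes qualifying the query itself.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data: bookings merged from sources with different layouts
# (fare/tax nested under <trip> in one record, flat in the others).
XML = """
<bookings>
  <booking><carrier>Qantas</carrier><trip><fare>Economy</fare><tax>50</tax></trip></booking>
  <booking><carrier>Qantas</carrier><fare>Economy</fare><tax>50</tax></booking>
  <booking><carrier>Qantas</carrier><fare>Business</fare><tax>120</tax></booking>
  <booking><carrier>Qantas</carrier><fare>Business</fare><tax>150</tax></booking>
  <booking><carrier>Tiger</carrier><fare>Economy</fare><tax>20</tax></booking>
  <booking><carrier>Tiger</carrier><fare>Economy</fare><tax>35</tax></booking>
</bookings>
"""

def text(elem, tag):
    # ".//tag" matches the tag at any depth, so nested and flat layouts both work.
    node = elem.find(f".//{tag}")
    return node.text.strip() if node is not None and node.text else None

def meets(rec, condition):
    # True when the record satisfies every (tag, value) pair of the condition.
    return all(text(rec, t) == v for t, v in condition.items())

def violating_groups(records, condition, lhs, rhs):
    # Among records meeting the condition, collect LHS values with conflicting RHS values.
    seen = defaultdict(set)
    for rec in records:
        if meets(rec, condition):
            seen[text(rec, lhs)].add(text(rec, rhs))
    return {key for key, values in seen.items() if len(values) > 1}

def consistent_answers(records, condition, lhs, rhs):
    # Keep records that are outside the condition or belong to a conflict-free group.
    bad = violating_groups(records, condition, lhs, rhs)
    return [
        (text(rec, "carrier"), text(rec, lhs), text(rec, rhs))
        for rec in records
        if not (meets(rec, condition) and text(rec, lhs) in bad)
    ]

records = ET.fromstring(XML.strip()).findall("booking")
answers = consistent_answers(records, {"carrier": "Qantas"}, "fare", "tax")
for row in answers:
    print(row)
```

In this sketch the two Qantas Business bookings are dropped because they disagree on tax, while the Tiger bookings are kept even though their taxes differ, since the assumed condition does not apply to them.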
Cite this article
Vo, L.T.H., Cao, J. & Rahayu, W. Structured content-based query answers for improving information quality. World Wide Web 18, 889–912 (2015). https://doi.org/10.1007/s11280-014-0287-z
DOI: https://doi.org/10.1007/s11280-014-0287-z