Structured content-based query answers for improving information quality

World Wide Web

Abstract

Extensible Markup Language (XML) has been widely adopted as a standard for exchanging and integrating data from multiple sources. This allows users to explore large datasets through declarative query languages such as XQuery and XPath. However, the results of queries posed to such heterogeneous data sources are often inconsistent because of anomalies arising from structural and semantic inconsistencies, which significantly reduces the system's ability to provide accurate query answers. Most prior work on finding consistent query answers (CQAs) does not extend to data constraints that hold only conditionally on XML data with inconsistent structures. This paper proposes an approach, called SC2QA, which uses XML conditional functional dependencies (XCSDs) to compute consistent answers for queries posed to arbitrary XML data and thereby improve information quality. An XCSD is a structured, content-based functional dependency that holds conditionally on certain objects with diverse structures. The query answer is computed by qualifying the query with appropriate information derived from the interaction between the query and the XCSDs. Experiments on synthetic datasets demonstrate the effectiveness of SC2QA.
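
To make the idea of a conditionally holding dependency concrete, the following is a minimal Python sketch and not the SC2QA algorithm itself. It assumes a hypothetical constraint "for bookings whose carrier is Qantas, the fare determines the tax", hypothetical element names (`booking`, `carrier`, `fare`, `tax`), and a deliberately conservative notion of consistency: a queried element counts as a consistent answer only if it takes part in no violation of the constraint. Values are looked up at any depth so that records with different structures can be compared.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical data: the last booking nests <carrier> differently from the others.
DOC = """
<bookings>
  <booking><carrier>Qantas</carrier><fare>150</fare><tax>20</tax></booking>
  <booking><carrier>Qantas</carrier><fare>150</fare><tax>25</tax></booking>
  <booking><carrier>Qantas</carrier><fare>300</fare><tax>40</tax></booking>
  <booking><trip><carrier>Tiger</carrier></trip><fare>150</fare><tax>10</tax></booking>
</bookings>
"""

def value(elem, tag):
    """Return the text of the first descendant named `tag`, at any depth."""
    node = elem.find(f".//{tag}")
    return node.text.strip() if node is not None and node.text else None

def conflicting(elems, cond, lhs, rhs):
    """Ids of elements that satisfy `cond` yet disagree on `rhs` for equal `lhs`."""
    groups = defaultdict(list)
    for e in elems:
        # The dependency applies only to elements matching the condition.
        if all(value(e, tag) == wanted for tag, wanted in cond.items()):
            groups[value(e, lhs)].append(e)
    bad = set()
    for members in groups.values():
        if len({value(m, rhs) for m in members}) > 1:
            bad.update(id(m) for m in members)
    return bad

def consistent_answers(root, predicate, cond, lhs, rhs):
    """Answer the query `predicate`, keeping only elements outside every conflict."""
    elems = root.findall("booking")
    bad = conflicting(elems, cond, lhs, rhs)
    return [e for e in elems if predicate(e) and id(e) not in bad]

root = ET.fromstring(DOC)
answers = consistent_answers(
    root,
    predicate=lambda e: value(e, "fare") == "150",  # the user's query
    cond={"carrier": "Qantas"},                     # condition of the dependency
    lhs="fare",
    rhs="tax",
)
taxes = [value(e, "tax") for e in answers]
```

Here the two Qantas bookings with fare 150 disagree on tax and are excluded, while the Tiger booking falls outside the condition and is returned. Filtering after the fact is only one possible design; the abstract describes qualifying the query itself with information derived from the XCSDs.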

Author information

Corresponding author

Correspondence to Loan T. H. Vo.

About this article

Cite this article

Vo, L.T.H., Cao, J. & Rahayu, W. Structured content-based query answers for improving information quality. World Wide Web 18, 889–912 (2015). https://doi.org/10.1007/s11280-014-0287-z

  • DOI: https://doi.org/10.1007/s11280-014-0287-z
