DOI: 10.1145/2513190.2513196
research-article

INDREX: in-database distributional relation extraction

Published: 28 October 2013

ABSTRACT

Relation extraction transforms the textual representation of a relationship into the relational model of a data warehouse. Early systems, such as SystemT by IBM or the open source system GATE, solve this task with handcrafted rule sets that the system executes document by document. The user must therefore work through a highly interactive and iterative process of reading a document, expressing rules, testing these rules on the next document, and refining them. Until now, these systems have leveraged neither the full potential of built-in declarative query languages nor the indexing and query optimization techniques of a modern RDBMS, which would enable interactive rule refinement across documents and on the entire corpus. We propose the INDREX system, which for the first time enables a user to describe corpus-wide extraction tasks in a declarative language and to run interactive rule refinement queries. To enable this functionality, we extend a standard PostgreSQL installation with a set of white-box user-defined functions that enable corpus-wide transformations from sentences into relationships. We store the text corpus and the rules in the same RDBMS that already holds domain-specific structured data. As a result, (1) the user can leverage this data to further adapt rules to the target domain, (2) the user does not need an additional system for rule extraction, and (3) the INDREX system can leverage the full power of the built-in indexing and query optimization techniques of the underlying RDBMS. In a preliminary study, we report on the feasibility of this approach and show multiple queries in INDREX on the Reuters Corpus, Volume 1.
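
The idea of running an extraction rule as one declarative query over the whole corpus, inside the database that also holds the domain data, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the paper: it uses SQLite through Python's `sqlite3` module instead of PostgreSQL so that it runs without a server, and the tables `sentence` and `company`, the function `extract_arg`, and the regular expression are hypothetical stand-ins for INDREX's actual schema and user-defined functions.

```python
import re
import sqlite3

# Hypothetical pattern for a single relation type; not taken from the paper.
ACQUIRE = re.compile(
    r"(?P<buyer>[A-Z]\w+) (?:acquired|bought) (?P<target>[A-Z]\w+)"
)


def extract_arg(text, group):
    """Return the named argument of the first pattern match, or None."""
    match = ACQUIRE.search(text)
    return match.group(group) if match else None


conn = sqlite3.connect(":memory:")
# Register the Python function so that SQL queries can call it
# (a stand-in for a user-defined function inside the RDBMS).
conn.create_function("extract_arg", 2, extract_arg)

conn.executescript("""
    CREATE TABLE sentence (doc_id INTEGER, sent_id INTEGER, text TEXT);
    CREATE TABLE company (name TEXT PRIMARY KEY);
""")
conn.executemany("INSERT INTO sentence VALUES (?, ?, ?)", [
    (1, 1, "Acme acquired Globex last year."),
    (1, 2, "Analysts were surprised."),
    (2, 1, "Initech bought Hooli for an undisclosed sum."),
    (3, 1, "Umbrella acquired Soylent in March."),
])
# Structured domain data that already lives in the same database.
conn.executemany("INSERT INTO company VALUES (?)", [("Acme",), ("Initech",)])

# One declarative query applies the rule to every sentence in the corpus
# and joins the result with the domain table of known companies.
RULE = """
    SELECT s.doc_id,
           extract_arg(s.text, 'buyer')  AS buyer,
           extract_arg(s.text, 'target') AS target
    FROM sentence AS s
    JOIN company AS c ON c.name = extract_arg(s.text, 'buyer')
    ORDER BY s.doc_id
"""
acquisitions = conn.execute(RULE).fetchall()
print(acquisitions)  # [(1, 'Acme', 'Globex'), (2, 'Initech', 'Hooli')]
```

Refining the rule here means editing the query or the pattern and rerunning it on the full corpus, in contrast to the document-by-document loop described above. The join with `company` shows how existing structured data can narrow a rule to the target domain.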


        Published in

        DOLAP '13: Proceedings of the sixteenth international workshop on Data warehousing and OLAP
        October 2013
        110 pages
        ISBN: 9781450324120
        DOI: 10.1145/2513190

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        DOLAP '13 paper acceptance rate: 13 of 26 submissions, 50%
        Overall acceptance rate: 29 of 79 submissions, 37%
