A Framework for the Automatic Extraction of Rules from Online Text

  • Conference paper
Rule-Based Reasoning, Programming, and Applications (RuleML 2011)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6826)

Abstract

The majority of knowledge on the Web is encoded in unstructured text and is not linked to formalized knowledge, such as ontologies and rules. A potential solution to this problem is to acquire this knowledge through natural language processing and text mining methods. Prior work has focused on automatically extracting RDF- or OWL-based ontologies from text; however, the type of knowledge acquired is generally restricted to simple term hierarchies. This paper presents a general-purpose framework for acquiring more complex relationships from text and then encoding this knowledge as rules. Our approach starts with existing domain knowledge in the form of OWL ontologies and Semantic Web Rule Language (SWRL) rules and applies natural language processing and text matching techniques to deduce classes and properties. It then captures deductive knowledge in the form of new rules. We have evaluated our framework by applying it to web-based text on car rental requirements. We show that our approach can automatically and accurately generate rules for requirements of car rental companies not in the knowledge base. Our framework thus rapidly acquires complex knowledge from free text sources. We are expanding it to handle richer domains, such as medical science.
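As a rough illustration of the pipeline the abstract outlines (existing ontology vocabulary, text matching against free text, and a new rule as output), the following minimal Python sketch may help. It is not the authors' implementation: it substitutes plain substring matching for the natural language processing the paper describes, and the lexicon entries, the class and property names (`Driver`, `hasAge`, `QualifiedDriver`), and the function names are all hypothetical. Only the `swrlb:` comparison built-ins are taken from the SWRL vocabulary.

```python
import re

# Hypothetical lexicon mapping surface terms to ontology names.
# These entries are illustrative assumptions, not taken from the paper.
CLASS_TERMS = {"renter": "Driver", "driver": "Driver"}
PROPERTY_TERMS = {"years old": "hasAge", "age": "hasAge"}
COMPARATORS = {
    "at least": "swrlb:greaterThanOrEqual",
    "at most": "swrlb:lessThanOrEqual",
}


def first_match(text, lexicon):
    """Return the ontology name of the earliest-occurring lexicon term, or None."""
    hits = [(text.find(term), name) for term, name in lexicon.items() if term in text]
    return min(hits)[1] if hits else None


def sentence_to_rule(sentence, consequent="QualifiedDriver"):
    """Map one requirement sentence to a SWRL-style rule string.

    Returns None when the sentence lacks a class term, a property term,
    a comparator phrase, or a number.
    """
    text = sentence.lower()
    cls = first_match(text, CLASS_TERMS)
    prop = first_match(text, PROPERTY_TERMS)
    comp = first_match(text, COMPARATORS)
    number = re.search(r"\d+", text)
    if not (cls and prop and comp and number):
        return None
    return (
        f"{cls}(?x) ^ {prop}(?x, ?v) ^ {comp}(?v, {number.group()})"
        f" -> {consequent}(?x)"
    )


if __name__ == "__main__":
    print(sentence_to_rule("Renters must be at least 25 years old."))
```

A sentence that mentions no known class, property, comparator, or number yields `None`, so unrelated text on a page produces no rule in this toy version. A realistic system would need linguistic analysis to handle paraphrase and sentence structure, which this sketch deliberately leaves out.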

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hassanpour, S., O’Connor, M.J., Das, A.K. (2011). A Framework for the Automatic Extraction of Rules from Online Text. In: Bassiliades, N., Governatori, G., Paschke, A. (eds) Rule-Based Reasoning, Programming, and Applications. RuleML 2011. Lecture Notes in Computer Science, vol 6826. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22546-8_21

  • DOI: https://doi.org/10.1007/978-3-642-22546-8_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22545-1

  • Online ISBN: 978-3-642-22546-8

  • eBook Packages: Computer Science, Computer Science (R0)
