A Framework for the Automatic Extraction of Rules from Online Text

Hassanpour, Saeed; O’Connor, Martin J.; Das, Amar K.

doi:10.1007/978-3-642-22546-8_21

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6826)

Included in the following conference series:

International Workshop on Rules and Rule Markup Languages for the Semantic Web

Abstract

The majority of knowledge on the Web is encoded in unstructured text and is not linked to formalized knowledge, such as ontologies and rules. A potential solution to this problem is to acquire this knowledge through natural language processing and text mining methods. Prior work has focused on automatically extracting RDF- or OWL-based ontologies from text; however, the type of knowledge acquired is generally restricted to simple term hierarchies. This paper presents a general-purpose framework for acquiring more complex relationships from text and then encoding this knowledge as rules. Our approach starts with existing domain knowledge in the form of OWL ontologies and Semantic Web Rule Language (SWRL) rules and applies natural language processing and text matching techniques to deduce classes and properties. It then captures deductive knowledge in the form of new rules. We have evaluated our framework by applying it to web-based text on car rental requirements. We show that our approach can automatically and accurately generate rules for requirements of car rental companies not in the knowledge base. Our framework thus rapidly acquires complex knowledge from free text sources. We are expanding it to handle richer domains, such as medical science.
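
As a rough illustration of the pipeline the abstract outlines (match terms in free text to known ontology vocabulary, then emit a rule), the following minimal Python sketch may help. It is not the authors' implementation: the class labels, the sentence pattern, the `hasAge` property, and the `Eligible...` consequent are all hypothetical, and simple string similarity from the standard library stands in for the paper's natural language processing and text matching steps. Only `swrlb:greaterThanOrEqual` is taken from the standard SWRL built-ins.

```python
import re
from difflib import SequenceMatcher

# Hypothetical ontology class labels mapped to class names (not from the paper).
CLASS_LABELS = {"driver": "Driver", "renter": "Renter", "vehicle": "Vehicle"}

# Toy sentence pattern for minimum-age requirements (an assumption for illustration).
MIN_AGE = re.compile(
    r"(?P<subject>[A-Za-z]+) must be at least (?P<age>\d+) years",
    re.IGNORECASE,
)


def match_class(word, threshold=0.8):
    """Return the class whose label is most similar to `word`, or None."""
    word = word.lower()
    score, label = max(
        (SequenceMatcher(None, word, label).ratio(), label)
        for label in CLASS_LABELS
    )
    return CLASS_LABELS[label] if score >= threshold else None


def sentence_to_rule(sentence):
    """Map a minimum-age sentence to a SWRL-style rule string, or None."""
    found = MIN_AGE.search(sentence)
    if found is None:
        return None
    cls = match_class(found.group("subject"))
    if cls is None:
        return None
    age = found.group("age")
    # hasAge and Eligible<Class> are hypothetical ontology terms.
    return (
        f"{cls}(?x) ^ hasAge(?x, ?a) ^ swrlb:greaterThanOrEqual(?a, {age})"
        f" -> Eligible{cls}(?x)"
    )


print(sentence_to_rule("Drivers must be at least 25 years old."))
```

Here the plural "Drivers" is matched to the `Driver` class by string similarity, and a sentence that does not fit the pattern yields `None`. A real system would need parsing and a far richer set of rule templates than this single pattern.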

Author information

Authors and Affiliations

Stanford Center for Biomedical Informatics Research, Stanford, CA, 94305, U.S.A.
Saeed Hassanpour, Martin J. O’Connor & Amar K. Das

Editor information

Editors and Affiliations

Department of Informatics, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece
Nick Bassiliades
Queensland Research Laboratory, NICTA, PO Box 6020, 4067, St. Lucia, QLD, Australia
Guido Governatori
Computer Science Department, Corporate Semantic Web, Freie Universität Berlin, Königin-Luise-Str. 24/26, 14495, Berlin, Germany
Adrian Paschke

Cite this paper

Hassanpour, S., O’Connor, M.J., Das, A.K. (2011). A Framework for the Automatic Extraction of Rules from Online Text. In: Bassiliades, N., Governatori, G., Paschke, A. (eds) Rule-Based Reasoning, Programming, and Applications. RuleML 2011. Lecture Notes in Computer Science, vol 6826. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22546-8_21

DOI: https://doi.org/10.1007/978-3-642-22546-8_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22545-1
Online ISBN: 978-3-642-22546-8
eBook Packages: Computer Science, Computer Science (R0)
