Text Mining via Information Extraction

Feldman, Ronen; Aumann, Yonatan; Fresko, Moshe; Liphstat, Orly; Rosenfeld, Binyamin; Schler, Yonatan

doi:10.1007/978-3-540-48247-5_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1704)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

Knowledge Discovery in Databases (KDD), also known as data mining, focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Given a collection of text documents, most approaches to text mining perform knowledge-discovery operations on labels associated with each document. At one extreme, these labels are keywords that represent the results of non-trivial keyword-labeling processes; at the other extreme, they are nothing more than a list of the words within the documents of interest. This paper presents an intermediate approach, which we call text mining via information extraction, in which knowledge discovery takes place on a more focused collection of events and phrases that are extracted from, and used to label, each document. These events, together with additional higher-level entities, are then organized in a hierarchical taxonomy and used in the knowledge-discovery process. This approach was implemented in the Textoscope system. Textoscope consists of a document retrieval module, which converts retrieved documents from their native formats into the SGML documents used by Textoscope; an information extraction engine, which is based on a powerful attribute grammar augmented by rich background knowledge; a taxonomy-creation tool with which the user can help specify higher-level entities that inform the knowledge-discovery process; and a set of knowledge-discovery tools for the resulting event-labeled documents. We evaluate our approach on a collection of newswire stories extracted by Textoscope's own agent. Our results confirm that text mining via information extraction serves as an accurate and powerful technique by which to manage knowledge encapsulated in large document collections.
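The idea of running knowledge discovery over event-labeled documents and a taxonomy can be illustrated with a minimal Python sketch. It is not Textoscope's implementation: the toy labels, the flat one-level taxonomy, the function names `lift` and `cooccurrences`, and the use of simple pair co-occurrence counting as the discovery operation are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical output of an information extraction step: each document is
# labeled with the entities and events extracted from it (toy data).
doc_labels = {
    "doc1": {"IBM", "Microsoft", "merger"},
    "doc2": {"IBM", "Intel", "joint venture"},
    "doc3": {"Microsoft", "Intel", "merger"},
}

# Hypothetical one-level taxonomy: extracted label -> higher-level entity.
taxonomy = {
    "IBM": "company",
    "Microsoft": "company",
    "Intel": "company",
    "merger": "corporate event",
    "joint venture": "corporate event",
}


def lift(labels, taxonomy):
    """Return the labels plus the taxonomy parent of each known label."""
    return labels | {taxonomy[label] for label in labels if label in taxonomy}


def cooccurrences(doc_labels, min_support=2):
    """Count label pairs that appear together in at least min_support documents."""
    counts = Counter()
    for labels in doc_labels.values():
        # Sort so that each unordered pair is always counted under one key.
        for pair in combinations(sorted(labels), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Patterns over the raw extracted labels only.
raw_patterns = cooccurrences(doc_labels)

# Patterns after adding higher-level taxonomy nodes to every document.
lifted_docs = {doc: lift(labels, taxonomy) for doc, labels in doc_labels.items()}
lifted_patterns = cooccurrences(lifted_docs)
```

In this toy data, lifting labels to taxonomy nodes surfaces a pattern (companies co-occurring with corporate events) that is supported by more documents than any pair of raw labels, which is the kind of benefit a hierarchical taxonomy is meant to provide.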

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Bar-Ilan University, Ramat-Gan, Israel
Ronen Feldman, Yonatan Aumann, Moshe Fresko, Orly Liphstat, Binyamin Rosenfeld & Yonatan Schler

Editor information

Editors and Affiliations

Computer Science Department, UNC Charlotte, Charlotte, N.C. 28223 and Institute of Computer Science, Polish Academy of Sciences,
Jan M. Żytkow
Faculty of Informatics and Statistics, University of Economics, Prague, nám. W. Churchilla 4, 130 67, Prague, Czech Republic
Jan Rauch

About this paper

Cite this paper

Feldman, R., Aumann, Y., Fresko, M., Liphstat, O., Rosenfeld, B., Schler, Y. (1999). Text Mining via Information Extraction. In: Żytkow, J.M., Rauch, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1999. Lecture Notes in Computer Science, vol 1704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-48247-5_18

DOI: https://doi.org/10.1007/978-3-540-48247-5_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66490-1
Online ISBN: 978-3-540-48247-5
eBook Packages: Springer Book Archive
