How to Rank Terminology Extracted by Exterlog

Saneifar, Hassan; Bonniol, Stéphane; Laurent, Anne; Poncelet, Pascal; Roche, Mathieu

doi:10.1007/978-3-642-19032-2_9


Part of the book series: Communications in Computer and Information Science (CCIS, volume 128)

Included in the following conference series:

International Joint Conference on Knowledge Discovery, Knowledge Engineering, and Knowledge Management

Abstract

In many application areas, systems report events in a kind of textual data usually called log files. Log files report the status of systems and products, or even the causes of problems that can occur. The information extracted from the log files of computing systems can be considered one of the important resources of information systems. Log files are considered a kind of “complex textual data”, i.e., multi-source, heterogeneous, and multi-format data. In this paper, we aim in particular at exploring the lexical structure of these log files in order to extract the terms used in them. These terms will be used to build a domain ontology and to enrich the features of the log file corpus. Given the features of such textual data, applying classical information extraction methods is not easy, particularly for terminology extraction. Here, we introduce a newly developed version of Exterlog, our approach to extracting terminology from log files, which is guided by the Web to evaluate the extracted terms. We score the extracted terms with a Web- and context-based measure. We favor the terms most relevant to the domain and emphasize precision by filtering terms based on their scores. The experiments show that Exterlog is a well-adapted approach to terminology extraction from log files.
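
As a rough illustration of the filtering step described in the abstract, the sketch below ranks candidate terms by score and keeps those at or above a threshold, then measures precision against a hand-labelled set. The scores, the threshold, the example terms, and the function names are all hypothetical. The abstract does not specify the Web- and context-based measure itself, so the scores are treated here as a given input rather than computed.

```python
from typing import Dict, List, Set, Tuple


def rank_terms(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort candidate terms by score, highest first (ties broken alphabetically)."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def filter_terms(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep only the terms whose score is at or above the threshold, in ranked order."""
    return [term for term, score in rank_terms(scores) if score >= threshold]


def precision(selected: List[str], relevant: Set[str]) -> float:
    """Fraction of selected terms that are judged relevant (0.0 if nothing is selected)."""
    if not selected:
        return 0.0
    return sum(1 for term in selected if term in relevant) / len(selected)


# Hypothetical scores standing in for the output of a Web- and context-based measure.
scores = {
    "clock tree": 0.82,
    "timing report": 0.64,
    "file name": 0.35,
    "end of": 0.11,
}
# Hypothetical expert judgment of which candidates are real domain terms.
relevant = {"clock tree", "timing report"}

kept = filter_terms(scores, threshold=0.5)
print(kept, precision(kept, relevant))
print(precision([t for t, _ in rank_terms(scores)], relevant))
```

Raising the threshold trades recall for precision: fewer candidates survive, but a larger share of them are expected to be genuine domain terms, which is the trade-off the abstract says is favored.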

Author information

Authors and Affiliations

LIRMM - Univ. Montpellier 2 - CNRS, Montpellier, France
Hassan Saneifar, Anne Laurent, Pascal Poncelet & Mathieu Roche
Satin IP Technologies, Montpellier, France
Hassan Saneifar & Stéphane Bonniol

Editor information

Editors and Affiliations

IST - Technical University of Lisbon, Av.Rovisco Pais, 1, 1049-001, Lisbon, Portugal
Ana Fred
Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
Jan L. G. Dietz
Informatics Research Centre, Henley Business School, University of Reading, RG6 6UD, Reading, UK
Kecheng Liu
Department of Systems and Informatics, Polytechnic Institute of Setúbal – INSTICC, Rua do Vale de Chaves - Estefanilha, 2910-761, Setúbal, Portugal
Joaquim Filipe

About this paper

Cite this paper

Saneifar, H., Bonniol, S., Laurent, A., Poncelet, P., Roche, M. (2011). How to Rank Terminology Extracted by Exterlog. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowlege Engineering and Knowledge Management. IC3K 2009. Communications in Computer and Information Science, vol 128. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19032-2_9

DOI: https://doi.org/10.1007/978-3-642-19032-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19031-5
Online ISBN: 978-3-642-19032-2
eBook Packages: Computer Science, Computer Science (R0)
