Abstract

In many application areas, systems report the events that occur in a kind of textual data usually called log files. Log files report the status of systems and products, or even the causes of problems that can occur. The information extracted from the log files of computing systems can be considered one of the important resources of information systems. Log files are regarded as a kind of “complex textual data”, i.e. multi-source, heterogeneous, and multi-format data. In this paper, we aim in particular at exploring the lexical structure of these log files in order to extract the terms they use. These terms will be used to build a domain ontology and to enrich the features of the log file corpus. Given the features of such textual data, applying classical information extraction methods is not an easy task, particularly for terminology extraction. Here, we introduce a newly developed version of Exterlog, our approach to extracting terminology from log files, which is guided by the Web to evaluate the extracted terms. We score the extracted terms with a Web- and context-based measure. We favor the terms most relevant to the domain and emphasize precision by filtering terms based on their scores. The experiments show that Exterlog is a well-adapted approach to terminology extraction from log files.
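The scoring and filtering step mentioned above can be illustrated with a small sketch. The abstract does not give the formula of the measure, so this example assumes a Dice-style association score computed from search-engine hit counts. The names `dice_score` and `rank_terms`, the hit counts, and the threshold are all hypothetical and are not taken from Exterlog.

```python
from typing import Dict, List, Tuple


def dice_score(hits_pair: int, hits_a: int, hits_b: int) -> float:
    """Dice coefficient from page-hit counts (an assumed stand-in measure)."""
    denom = hits_a + hits_b
    return 2.0 * hits_pair / denom if denom else 0.0


def rank_terms(
    candidates: List[Tuple[str, str]],
    hits: Dict[str, int],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Score two-word candidate terms, drop those below the threshold,
    and return the rest sorted by decreasing score."""
    scored = []
    for a, b in candidates:
        term = f"{a} {b}"
        score = dice_score(hits.get(term, 0), hits.get(a, 0), hits.get(b, 0))
        if score >= threshold:
            scored.append((term, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical hit counts; a real system would query a search engine.
example_hits = {
    "clock": 100,
    "signal": 100,
    "clock signal": 50,
    "the": 10000,
    "the clock": 50,
}
example_candidates = [("clock", "signal"), ("the", "clock")]
ranked = rank_terms(example_candidates, example_hits, threshold=0.1)
print(ranked)
```

Raising the threshold keeps fewer, higher-scoring terms, which trades recall for precision in the way the abstract describes.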

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Saneifar, H., Bonniol, S., Laurent, A., Poncelet, P., Roche, M. (2011). How to Rank Terminology Extracted by Exterlog. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowlege Engineering and Knowledge Management. IC3K 2009. Communications in Computer and Information Science, vol 128. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19032-2_9

  • DOI: https://doi.org/10.1007/978-3-642-19032-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19031-5

  • Online ISBN: 978-3-642-19032-2

  • eBook Packages: Computer Science, Computer Science (R0)
