
Improving Hierarchical Document Signature Performance by Classifier Combination

  • Conference paper
Neural Information Processing. Theory and Algorithms (ICONIP 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6443)

Abstract

We present an experimental classifier-combination framework for part-of-speech (POS) tagging in which four different POS taggers are combined to improve sentence-similarity results obtained with the Hierarchical Document Signature (HDS). Abstracting the available information into structures that people can readily access is important. The way people think and talk is hierarchical: any one sentence presents limited information, and that information is always linked to further information. HDS is therefore a useful way to represent sentences when measuring their similarity. POS tagging plays an important role in HDS, but the available POS taggers are imperfect: they tend to mis-tag words that are not properly cased or that do not match the corpus on which the taggers were trained. We therefore use different weighted voting strategies to overcome some of these drawbacks. We compare individual taggers with combined taggers under the different voting strategies; the results show that the combined taggers perform better than the individual ones.
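As a rough illustration of what weighted voting over several taggers can look like, the following Python sketch sums a per-tagger weight for each candidate tag and keeps the highest-scoring tag for every token. It is only a minimal sketch under stated assumptions: the tagger names, the weights, the example tags, and the alphabetical tie-break are all hypothetical and are not taken from the paper, whose actual voting strategies are not described in the abstract.

```python
from collections import defaultdict


def weighted_vote(tag_sequences, weights):
    """Combine per-token tags from several taggers by weighted voting.

    tag_sequences: dict mapping tagger name -> list of tags, one per token.
    weights: dict mapping tagger name -> non-negative weight (hypothetical,
             e.g. each tagger's accuracy on held-out data).
    """
    lengths = {len(tags) for tags in tag_sequences.values()}
    if len(lengths) != 1:
        raise ValueError("all taggers must tag the same number of tokens")
    n_tokens = lengths.pop()

    combined = []
    for i in range(n_tokens):
        # Accumulate the weight of every tagger that proposes each tag.
        scores = defaultdict(float)
        for name, tags in tag_sequences.items():
            scores[tags[i]] += weights[name]
        # Assumption: ties are broken alphabetically so the result is
        # deterministic; max() returns the first maximal element.
        combined.append(max(sorted(scores), key=lambda tag: scores[tag]))
    return combined


# Hypothetical outputs of four taggers for the tokens "the dog barks".
outputs = {
    "tagger_a": ["DT", "NN", "VBZ"],
    "tagger_b": ["DT", "NN", "NNS"],
    "tagger_c": ["DT", "VB", "NNS"],
    "tagger_d": ["DT", "NN", "VBZ"],
}
# Illustrative weights only; not measured values.
accuracy_weights = {
    "tagger_a": 0.97,
    "tagger_b": 0.95,
    "tagger_c": 0.90,
    "tagger_d": 0.96,
}

print(weighted_vote(outputs, accuracy_weights))
```

With equal weights the third token would be a two-against-two tie; unequal weights let the more trusted taggers decide it, which is the general motivation for weighting votes rather than counting them.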

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liao, J., Mendis, B.S.U., Manna, S. (2010). Improving Hierarchical Document Signature Performance by Classifier Combination. In: Wong, K.W., Mendis, B.S.U., Bouzerdoum, A. (eds) Neural Information Processing. Theory and Algorithms. ICONIP 2010. Lecture Notes in Computer Science, vol 6443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17537-4_84

  • DOI: https://doi.org/10.1007/978-3-642-17537-4_84

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17536-7

  • Online ISBN: 978-3-642-17537-4

  • eBook Packages: Computer Science, Computer Science (R0)
