Part of the book series: Advances in Intelligent Systems and Computing (AINSC, volume 167)

Abstract

Supervised classifiers are limited by the amount of annotated corpora available. Active learning circumvents this bottleneck by reducing the number of annotated examples required. In this paper, we analyze the benefits of combining active learning with bagging on three tasks: Quotation Start, Noun Phrase Chunking, and Text Chunking. We use query-by-committee as the query strategy to actively select the examples to be annotated. With these techniques, we reduce the annotation effort by up to 62.50%, depending on the task, while obtaining the same quality as passive supervised learning.
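
To make the selection step concrete, the following is a minimal sketch of query-by-committee with a bagged committee. It is an illustration under stated assumptions, not the implementation used in the paper: the abstract does not name the base learner or the disagreement measure, so the toy 1-nearest-neighbor learner, the vote-entropy score, and the names `bootstrap`, `train_nearest_neighbor`, `vote_entropy`, and `select_queries` are all hypothetical choices made here.

```python
import math
import random
from collections import Counter


def bootstrap(labeled, rng):
    """Draw a bootstrap replicate: sample the labeled set with replacement."""
    return [rng.choice(labeled) for _ in labeled]


def train_nearest_neighbor(labeled):
    """Toy base learner (an assumption): 1-nearest-neighbor on 1-D features."""
    def predict(x):
        # Return the label of the closest training example.
        return min(labeled, key=lambda pair: abs(pair[0] - x))[1]
    return predict


def vote_entropy(votes):
    """Committee disagreement (an assumption): entropy of the vote distribution."""
    total = len(votes)
    counts = Counter(votes).values()
    return -sum((c / total) * math.log(c / total) for c in counts)


def select_queries(labeled, pool, committee_size=5, batch_size=1, seed=0):
    """Pick the pool examples on which a bagged committee disagrees most."""
    rng = random.Random(seed)
    # Bagging: each committee member is trained on its own bootstrap replicate.
    committee = [
        train_nearest_neighbor(bootstrap(labeled, rng))
        for _ in range(committee_size)
    ]
    # Score every unlabeled example by how much the members' votes diverge.
    scored = [(vote_entropy([member(x) for member in committee]), x) for x in pool]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [x for _, x in scored[:batch_size]]


if __name__ == "__main__":
    labeled = [(0.0, "O"), (1.0, "O"), (9.0, "B"), (10.0, "B")]
    pool = [0.5, 5.2, 9.5]
    print(select_queries(labeled, pool, batch_size=1))
```

In an active learning loop, the selected examples would be sent to an annotator, added to the labeled set, and the committee retrained; the loop stops once the target quality or the annotation budget is reached.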

Author information

Corresponding author

Correspondence to Ruy Luiz Milidiú.

Copyright information

© 2012 Springer-Verlag GmbH Berlin Heidelberg

About this paper

Cite this paper

Milidiú, R.L., Schwabe, D., Motta, E. (2012). Active Learning with Bagging for NLP Tasks. In: Wyld, D., Zizka, J., Nagamalai, D. (eds) Advances in Computer Science, Engineering & Applications. Advances in Intelligent Systems and Computing, vol 167. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30111-7_14

  • DOI: https://doi.org/10.1007/978-3-642-30111-7_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30110-0

  • Online ISBN: 978-3-642-30111-7

  • eBook Packages: Engineering, Engineering (R0)
