Question Answering System for Incomplete and Noisy Data

Methods and Measures for Its Evaluation

  • Conference paper
Advances in Information Retrieval (ECIR 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2633)

Abstract

We present a question answering system that can handle noisy and incomplete natural language data, together with methods and measures for evaluating question answering systems. The system is based on the vector space model and on linguistic analysis of the natural language data. In the evaluation, we test eight different preprocessing schemes for the data and conclude that lemmatization combined with splitting compound words into their constituents gives significantly better results than the baseline. The evaluation process is based on stratified random sampling and bootstrapping. To measure the correctness of an answer, we award partial credit as well as full credit.
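The abstract names the evaluation ingredients (stratified random sampling, bootstrapping, partial and full credit) without spelling out the procedure. The following is a minimal sketch of how they can fit together, not the authors' method. It assumes that questions are grouped into strata (for example by question type), that each answer is scored 1.0 for full credit, 0.5 for partial credit and 0.0 otherwise, and that the stratum weights are known population proportions summing to 1. The credit values, the strata, the toy scores and the function names `stratified_mean` and `stratified_bootstrap_ci` are all hypothetical.

```python
import random
from statistics import mean

# Hypothetical credit scheme (not from the paper): 1.0 = full, 0.5 = partial, 0.0 = wrong.

def stratified_mean(scores_by_stratum, weights):
    """Stratified estimate of the mean score: sum over strata h of W_h * mean(scores_h)."""
    return sum(weights[h] * mean(s) for h, s in scores_by_stratum.items())

def stratified_bootstrap_ci(scores_by_stratum, weights, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling with replacement inside each stratum."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    estimates = []
    for _ in range(n_boot):
        # Keep each stratum's sample size fixed, as in the original stratified sample.
        resampled = {h: rng.choices(s, k=len(s)) for h, s in scores_by_stratum.items()}
        estimates.append(stratified_mean(resampled, weights))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Toy, made-up scores for illustration only; not data from the paper.
scores = {
    "who": [1.0, 0.5, 1.0, 0.0],
    "when": [1.0, 1.0, 0.5],
}
weights = {"who": 0.6, "when": 0.4}

print(stratified_mean(scores, weights))
print(stratified_bootstrap_ci(scores, weights))
```

Resampling inside each stratum mirrors the way a stratified sample is drawn; pooling all scores and resampling them globally would ignore the sampling design and can misstate the variability of the estimate.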

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Aunimo, L., Heinonen, O., Kuuskoski, R., Makkonen, J., Petit, R., Virtanen, O. (2003). Question Answering System for Incomplete and Noisy Data. In: Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2003. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36618-0_14

  • DOI: https://doi.org/10.1007/3-540-36618-0_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-01274-0

  • Online ISBN: 978-3-540-36618-8

  • eBook Packages: Springer Book Archive