WebShodh: A Code Mixed Factoid Question Answering System for Web

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10456)

Abstract

Code-Mixing (CM) is a natural phenomenon observed in many multilingual societies and is becoming the preferred medium of expression and communication in online and social media fora. Despite this, current Question Answering (QA) systems do not support CM and are designed to work with a single interaction language. This assumption makes it inconvenient for multilingual users to interact naturally with a QA system, especially when they do not know the right word in the target language. In this paper, we present WebShodh, an end-to-end web-based factoid QA system for CM languages. We demonstrate our system with two CM language pairs: Hinglish (matrix language: Hindi, embedded language: English) and Tenglish (matrix language: Telugu, embedded language: English). The lack of language resources such as annotated corpora, POS taggers, or parsers for CM languages poses a major challenge for automated processing and analysis. In view of this resource scarcity, we assume only the existence of bilingual dictionaries from the matrix languages to English and use them to lexically translate the question into English. We then use this loosely translated question for downstream analysis such as Answer Type (AType) prediction, answer retrieval, and ranking. Evaluation shows that our system achieves an MRR of 0.37 for Hinglish and 0.32 for Tenglish. We have hosted the system online and plan to use it to collect more CM question and answer data for further improvement.
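
As an illustration of two ideas named in the abstract, the sketch below shows a word-by-word dictionary lookup of the kind that lexical translation suggests, and the standard definition of Mean Reciprocal Rank (MRR). It is a minimal sketch under stated assumptions, not the system described in the paper: the function names, the toy romanized-Hindi lexicon, and the exact-match criterion for a correct answer are all hypothetical choices made for this example.

```python
from typing import Dict, Sequence

# Hypothetical toy lexicon (romanized Hindi -> English), for illustration only.
TOY_LEXICON: Dict[str, str] = {
    "bharat": "india",
    "ki": "of",
    "kya": "what",
    "hai": "is",
}


def lexical_translate(question: str, lexicon: Dict[str, str]) -> str:
    """Translate token by token; tokens missing from the lexicon
    (for example, embedded English words) are kept unchanged."""
    tokens = question.lower().split()
    return " ".join(lexicon.get(tok, tok) for tok in tokens)


def mean_reciprocal_rank(
    ranked_answers: Sequence[Sequence[str]], gold: Sequence[str]
) -> float:
    """Average, over questions, of 1/rank of the first correct answer.
    A question whose gold answer is not in its ranked list contributes 0.
    Assumption: correctness is judged by exact string match."""
    if not ranked_answers:
        return 0.0
    total = 0.0
    for candidates, answer in zip(ranked_answers, gold):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == answer:
                total += 1.0 / rank
                break
    return total / len(ranked_answers)


if __name__ == "__main__":
    print(lexical_translate("Bharat ki capital kya hai", TOY_LEXICON))
    ranked = [["a", "b"], ["x", "y", "z"], ["p"]]
    print(mean_reciprocal_rank(ranked, ["a", "z", "q"]))
```

The loosely translated output keeps the matrix-language word order, which is why the abstract calls the translation "loose". For the MRR call above, the reciprocal ranks are 1, 1/3, and 0, so the mean is 4/9.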

Notes

  1. Mixing of Spanish-English, Hindi-English, Telugu-English, Portuguese-Spanish and French-Japanese language pairs respectively.

  2. Hindi is one of the most spoken languages in India, with 370 million native speakers, and is an official language along with English. Telugu is the most spoken Dravidian language in South India, with about 70 million native speakers.

  3. http://emnlp2014.org/workshops/CodeSwitch/call.html.

  4. http://fire.irsi.res.in/fire/home.

  5. This video was recorded in real time to demonstrate the speed of the system for practical purposes.

Author information

Corresponding author

Correspondence to Khyathi Raghavi Chandu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Chandu, K.R., Chinnakotla, M., Black, A.W., Shrivastava, M. (2017). WebShodh: A Code Mixed Factoid Question Answering System for Web. In: Jones, G., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2017. Lecture Notes in Computer Science, vol 10456. Springer, Cham. https://doi.org/10.1007/978-3-319-65813-1_9

  • DOI: https://doi.org/10.1007/978-3-319-65813-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65812-4

  • Online ISBN: 978-3-319-65813-1

  • eBook Packages: Computer Science, Computer Science (R0)
