Verb Sense Annotation for Turkish PropBank via Crowdsourcing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9623)

Abstract

Extracting meaning representations from sentences requires a corpus annotated with semantic roles. Unfortunately, building such a corpus requires a tremendous amount of manual work, both to create semantic frames and to annotate the corpus. We have therefore divided the annotation task into two microtasks, verb sense annotation and argument annotation, and employed crowd intelligence to perform them. In this paper, we present our approach to crowdsourcing the verb sense disambiguation task and the challenges it poses, and we introduce the resulting resource of 5855 annotated verb senses with 83.15% annotator agreement.
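As an illustration of how crowd judgments for a verb sense microtask might be aggregated, the following is a minimal sketch. It assumes majority voting per item and defines agreement as the share of judgments that match the majority label; the abstract does not state how the reported agreement was computed, so this is not necessarily the paper's formula. The function names, item identifiers, and sense labels (such as `al.01`) are hypothetical.

```python
from collections import Counter


def aggregate(judgments):
    """Return (majority_label, agreement) for one item's crowd judgments.

    Agreement here is the fraction of judgments equal to the majority
    label. This definition is an assumption made for illustration.
    """
    counts = Counter(judgments)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(judgments)


def mean_agreement(items):
    """Average the per-item agreement over all items, as a percentage."""
    scores = [aggregate(judgments)[1] for judgments in items.values()]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical toy data: item id -> sense labels chosen by four workers.
toy = {
    "sent1:al": ["al.01", "al.01", "al.02", "al.03"],
    "sent2:gel": ["gel.01", "gel.01", "gel.01", "gel.01"],
}

for item_id, judgments in toy.items():
    print(item_id, aggregate(judgments))
print(mean_agreement(toy))
```

Crowdsourcing platforms often weight each judgment by a worker trust score; that refinement is left out here to keep the sketch short.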

Notes

  1. In PropBank, only Arg0 and Arg1 are associated with specific semantic content. Arg0 is used for the actor, agent, experiencer, or cause of the event; Arg1 represents the patient, if the argument is affected by the action, or the theme, if the argument is not structurally changed.

  2. https://www.mturk.com.

  3. https://crowdflower.com.

  4. The METU-Sabancı Treebank is a morphologically and syntactically analyzed balanced corpus with 5635 sentences.

  5. In Light Verb Constructions (LVC) with the verb ol, the nominal dependent is linked to the predicate ol with the MWE dependency type; in the English PropBank, the copula is not annotated.

  6. Number of questions per page - 1.

Author information

Corresponding author

Correspondence to Gözde Gül Şahin.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Şahin, G.G. (2018). Verb Sense Annotation for Turkish PropBank via Crowdsourcing. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_35

  • DOI: https://doi.org/10.1007/978-3-319-75477-2_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75476-5

  • Online ISBN: 978-3-319-75477-2

  • eBook Packages: Computer Science, Computer Science (R0)
