Abstract
Extracting meaning representations from sentences requires a corpus annotated with semantic roles. Unfortunately, building such a corpus requires a tremendous amount of manual work, both to create semantic frames and to annotate the corpus. We have therefore divided the annotation task into two microtasks, verb sense annotation and argument annotation, and employed crowd intelligence to perform them. In this paper, we present our approach to crowdsourcing the verb sense disambiguation task and the challenges it raises, and we introduce a resource of 5855 annotated verb senses with 83.15% annotator agreement.
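As a rough illustration of how an agreement figure of this kind can be derived from crowd judgments, the sketch below aggregates worker answers by majority vote and reports the share of judgments that match the majority sense. The abstract does not state how agreement was measured, so this measure is an assumption; the function names, instance ids, and sense ids are hypothetical as well.

```python
from collections import Counter


def majority_sense(labels):
    """Return (sense, votes) for the most frequent label.

    Ties are resolved in favour of the label that was seen first.
    """
    counts = Counter(labels)
    sense, votes = counts.most_common(1)[0]
    return sense, votes


def agreement_rate(annotations):
    """Fraction of all judgments that match their item's majority sense.

    This is an assumed, simple agreement measure, not necessarily the
    one used to obtain the figure reported in the abstract.
    """
    matching = 0
    total = 0
    for labels in annotations.values():
        _, votes = majority_sense(labels)
        matching += votes
        total += len(labels)
    return matching / total if total else 0.0


# Hypothetical judgments: verb instance id -> sense ids chosen by workers.
judgments = {
    "sent1:al": ["al.01", "al.01", "al.02"],
    "sent2:ol": ["ol.03", "ol.03", "ol.03"],
    "sent3:et": ["et.01", "et.02", "et.01", "et.01"],
}

print(f"{agreement_rate(judgments):.2%}")  # prints 80.00%
```

Here 8 of the 10 judgments agree with their item's majority sense. A chance-corrected coefficient would be a stricter alternative to this raw proportion.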
Notes
- 1. In PropBank, only Arg0 and Arg1 are associated with a specific semantic content. Arg0 is used for the actor, agent, experiencer, or cause of the event; Arg1 represents the patient, if the argument is affected by the action, or the theme, if the argument is not structurally changed.
- 2.
- 3.
- 4. The METU-Sabancı Treebank is a morphologically and syntactically analyzed balanced corpus of 5635 sentences.
- 5. In Light Verb Constructions (LVC) with the verb ol, the nominal dependent is linked to the predicate ol with the MWE dependency type; in the English PropBank, the copula is not annotated.
- 6. Number of questions per page - 1.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Şahin, G.G. (2018). Verb Sense Annotation for Turkish PropBank via Crowdsourcing. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_35
DOI: https://doi.org/10.1007/978-3-319-75477-2_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75476-5
Online ISBN: 978-3-319-75477-2
eBook Packages: Computer Science (R0)