Verb Sense Annotation for Turkish PropBank via Crowdsourcing

Şahin, Gözde Gül

doi:10.1007/978-3-319-75477-2_35

Gözde Gül Şahin (ORCID: orcid.org/0000-0002-0332-1657)

Conference paper
First Online: 21 March 2018

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9623)

Abstract

To extract meaning representations from sentences, a corpus annotated with semantic roles is required. Unfortunately, building such a corpus demands a tremendous amount of manual work, both to create the semantic frames and to annotate the corpus. We therefore divide the annotation task into two microtasks, verb sense annotation and argument annotation, and employ crowd intelligence to perform them. In this paper, we present our approach to crowdsourcing the verb sense disambiguation task and the challenges it raises, and we introduce the resulting resource of 5855 annotated verb senses with 83.15% annotator agreement.
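
As an illustration of how crowd judgments on a verb sense microtask might be combined, the following minimal Python sketch aggregates worker labels by majority vote and reports, per item, the fraction of workers who chose the winning sense. This is an assumption made for illustration only: the abstract does not state how the 83.15% agreement figure was computed, and the item identifiers and sense labels below (such as "ol.01") are hypothetical.

```python
from collections import Counter


def aggregate(judgments):
    """Majority-vote each item and compute its agreement.

    judgments: dict mapping an item id to the list of sense labels
    chosen by the crowd workers for that item (hypothetical format).
    Returns a dict mapping item id -> (winning label, agreement),
    where agreement is the share of workers who chose the winner.
    Ties go to the label that was seen first.
    """
    results = {}
    for item, labels in judgments.items():
        label, votes = Counter(labels).most_common(1)[0]
        results[item] = (label, votes / len(labels))
    return results


def mean_agreement(results):
    """Average the per-item agreement over all items."""
    return sum(agreement for _, agreement in results.values()) / len(results)


# Hypothetical judgments: three workers per verb occurrence.
judgments = {
    "sent1:ol": ["ol.01", "ol.01", "ol.02"],
    "sent2:gel": ["gel.01", "gel.01", "gel.01"],
}
results = aggregate(judgments)
overall = mean_agreement(results)
```

Averaging per-item agreement is only one of several possible definitions; a chance-corrected measure would give different numbers on the same judgments.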

Notes

1. In PropBank, only Arg0 and Arg1 are associated with specific semantic content. Arg0 is used for the actor, agent, experiencer, or cause of the event; Arg1 represents the patient if the argument is affected by the action, and the theme if the argument is not structurally changed.
2. https://www.mturk.com.
3. https://crowdflower.com.
4. The METU-Sabancı Treebank is a morphologically and syntactically analyzed, balanced corpus of 5635 sentences.
5. In Light Verb Constructions (LVC) with the verb ol, the nominal dependent is linked to the predicate ol with the MWE dependency type; in the English PropBank, the copula is not annotated.
6. Number of questions per page - 1.

Author information

Authors and Affiliations

Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
Gözde Gül Şahin

Corresponding author

Correspondence to Gözde Gül Şahin.

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Şahin, G.G. (2018). Verb Sense Annotation for Turkish PropBank via Crowdsourcing. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_35

DOI: https://doi.org/10.1007/978-3-319-75477-2_35
Published: 21 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75476-5
Online ISBN: 978-3-319-75477-2
eBook Packages: Computer Science, Computer Science (R0)
