ABSTRACT
This paper discusses statistical methods for collocation extraction. We test the hypothesis that combining several methods gives better results than applying any single one. In the first stage, we propose two methods for combining MI and t-score rankings and evaluate the results on attributive and verbal collocations against data attested in a dictionary. In the second stage, we use regression analysis to tune coefficients that further improve the best method found in the first stage. These results are evaluated against native speakers' intuitions and support our main hypothesis in most cases.
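To make the idea of combining MI and t-score rankings concrete, here is a minimal Python sketch. The abstract does not say how the two proposed combination methods work, so the mean-rank scheme below is an assumption chosen only for illustration, not the authors' method. The MI and t-score formulas are the standard corpus-linguistic definitions; the function names, bigram counts, and corpus size are all hypothetical.

```python
import math

def mi(f_xy, f_x, f_y, n):
    """Mutual information: log2 of observed over expected co-occurrence."""
    return math.log2(f_xy * n / (f_x * f_y))

def t_score(f_xy, f_x, f_y, n):
    """t-score: (observed - expected) / sqrt(observed)."""
    return (f_xy - f_x * f_y / n) / math.sqrt(f_xy)

def ranks(scores):
    """Map each key to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {key: i for i, key in enumerate(ordered, start=1)}

def combined_ranking(counts, n):
    """Order bigrams by the mean of their MI rank and t-score rank.

    Assumption: mean rank is an illustrative combination scheme only.
    """
    mi_rank = ranks({b: mi(*c, n) for b, c in counts.items()})
    t_rank = ranks({b: t_score(*c, n) for b, c in counts.items()})
    mean_rank = {b: (mi_rank[b] + t_rank[b]) / 2 for b in counts}
    return sorted(counts, key=mean_rank.get)

# Hypothetical counts: bigram -> (f_xy, f_x, f_y), with corpus size N.
counts = {
    ("prime", "minister"): (1500, 2000, 1800),
    ("in", "a"): (2000, 20000, 25000),
    ("rancid", "butter"): (3, 5, 150),
}
N = 1_000_000

for bigram in combined_ranking(counts, N):
    print(" ".join(bigram))
```

On these invented counts, MI alone favours the rare pair and t-score alone favours the frequent ones, while the mean rank puts the pair that does well on both measures first. A weighted variant, with coefficients fitted by regression as in the second stage described above, would replace the plain mean with a weighted sum.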
Index Terms
- In Search of Lost Collocations: Combining Measures to Reach the Top Range