DOI: 10.1145/3143699.3143731
short-paper

In Search of Lost Collocations: Combining Measures to Reach the Top Range

Published: 21 June 2017

ABSTRACT

The paper discusses statistical methods for collocation extraction. We test the hypothesis that combining several methods yields better results than applying any single one. At the first stage, we propose two methods for combining mutual information (MI) and t-score rankings and evaluate them on attributive and verbal collocations against the collocations attested in a dictionary. At the second stage, we use regression analysis to tune the coefficients of the best method from the first stage, which improves it further. These results are evaluated against native speakers' intuitions and confirm our main hypothesis in most cases.
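The abstract does not specify the two combination methods, so the following is only a minimal Python sketch of the general idea under stated assumptions. It computes MI and t-score for word pairs with the usual textbook formulas (expected frequency E = f(w1) · f(w2) / N, MI = log2(f(w1, w2) / E), t = (f(w1, w2) − E) / √f(w1, w2)), merges the two rankings by averaging each candidate's rank positions (an assumed stand-in, not necessarily either of the paper's methods), and scores the merged list as precision at k against a gold list, a simplified stand-in for the dictionary-based evaluation. The function names, toy corpus, and gold list are hypothetical.

```python
import math
from collections import Counter


def association_scores(bigrams):
    """Return {(w1, w2): (MI, t_score)} for every distinct bigram.

    Expected frequency E = f(w1) * f(w2) / N, where f(w1) counts w1 in first
    position, f(w2) counts w2 in second position, and N is the number of
    bigram tokens.
    """
    n = len(bigrams)
    pair_freq = Counter(bigrams)
    first_freq = Counter(w1 for w1, _ in bigrams)
    second_freq = Counter(w2 for _, w2 in bigrams)
    scores = {}
    for (w1, w2), observed in pair_freq.items():
        expected = first_freq[w1] * second_freq[w2] / n
        mi = math.log2(observed / expected)
        t_score = (observed - expected) / math.sqrt(observed)
        scores[(w1, w2)] = (mi, t_score)
    return scores


def rank_positions(scores, index):
    """1-based rank of each candidate under one measure (1 = strongest)."""
    ordered = sorted(scores, key=lambda pair: scores[pair][index], reverse=True)
    return {pair: position for position, pair in enumerate(ordered, start=1)}


def combine_by_mean_rank(scores):
    """Hypothetical combination: order candidates by the mean of their MI and t-score ranks."""
    mi_rank = rank_positions(scores, 0)
    t_rank = rank_positions(scores, 1)
    return sorted(scores, key=lambda pair: (mi_rank[pair] + t_rank[pair]) / 2)


def precision_at_k(ranking, gold, k):
    """Share of the top-k candidates that appear in a gold list (e.g. a dictionary)."""
    return sum(1 for pair in ranking[:k] if pair in gold) / k


if __name__ == "__main__":
    # Toy corpus of (modifier, noun) bigrams; purely illustrative data.
    corpus = ([("strong", "tea")] * 3 + [("powerful", "tea")]
              + [("strong", "argument")] + [("powerful", "computer")] * 2)
    gold = {("strong", "tea"), ("powerful", "computer")}  # hypothetical gold list
    ranking = combine_by_mean_rank(association_scores(corpus))
    print(ranking)
    print(precision_at_k(ranking, gold, 2))
```

Averaging rank positions rather than raw scores sidesteps the fact that MI and t-score live on different scales. A weighted sum of the two measures, with weights fitted by regression, would be closer in spirit to the second stage described above.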


      • Published in

        ACM Other conferences
        IMS2017: Proceedings of the International Conference IMS-2017
        June 2017
        302 pages
        ISBN: 9781450354370
        DOI: 10.1145/3143699

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • short-paper
        • Research
        • Refereed limited

        Acceptance Rates

        IMS2017 paper acceptance rate: 46 of 101 submissions, 46%. Overall acceptance rate: 46 of 101 submissions, 46%.