DOI: 10.3115/1220575.1220680

Using the web as an implicit training set: application to structural ambiguity resolution

Published: 06 October 2005

ABSTRACT

Recent work has shown that very large corpora can act as training data for NLP algorithms even without explicit labels. In this paper we show how surface features and paraphrases in queries against search engines can be used to infer labels for structural ambiguity resolution tasks. Using unsupervised algorithms, we achieve 84% precision on PP-attachment and 80% on noun compound coordination.
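The abstract does not spell out the decision procedure. As a rough illustration only, the sketch below shows one simple way web counts can label a PP-attachment case "v n1 p ...": compare how strongly the preposition associates with the verb versus the object noun, using normalized phrase counts. The `hits` callable stands in for search-engine page-hit counts and is a hypothetical interface. The scoring rule is a generic n-gram heuristic and is not necessarily the paper's model, which also uses paraphrases and other surface features.

```python
from typing import Callable, Dict, Optional

# Hypothetical interface: maps an exact phrase query to a page-hit count.
HitCounter = Callable[[str], int]


def pp_attachment(v: str, n1: str, p: str, hits: HitCounter) -> Optional[str]:
    """Label the PP in "v n1 p ..." as verb ('V') or noun ('N') attachment.

    Compares the normalized scores c("v p") / c(v) and c("n1 p") / c(n1);
    returns None when there is no evidence or the scores tie.
    """
    c_v, c_n1 = hits(v), hits(n1)
    if c_v == 0 or c_n1 == 0:
        return None  # cannot normalize without unigram counts
    verb_score = hits(f"{v} {p}") / c_v
    noun_score = hits(f"{n1} {p}") / c_n1
    if verb_score > noun_score:
        return "V"
    if noun_score > verb_score:
        return "N"
    return None


def counts_from_dict(table: Dict[str, int]) -> HitCounter:
    """Wrap a dict of made-up counts as a HitCounter (for offline testing)."""
    return lambda query: table.get(query, 0)


if __name__ == "__main__":
    # Toy counts for illustration only, not real search-engine figures.
    toy = counts_from_dict({
        "eat": 1000, "spaghetti": 200,
        "eat with": 50, "spaghetti with": 4,
    })
    print(pp_attachment("eat", "spaghetti", "with", toy))  # V
```

With the toy counts, the verb score is 50/1000 = 0.05 and the noun score is 4/200 = 0.02, so the sketch prefers verb attachment. Normalizing by the unigram counts keeps a very frequent verb or noun from winning on raw frequency alone. A real system would replace `counts_from_dict` with calls to a search engine and would need to handle the noise and inconsistency of reported hit counts.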

      • Published in

        HLT '05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
        October 2005
        1054 pages

        Publisher

        Association for Computational Linguistics

        United States


        Qualifiers

        • Article

        Acceptance Rates

        HLT '05 paper acceptance rate: 127 of 402 submissions (32%). Overall acceptance rate: 240 of 768 submissions (31%).
