Article

Using the web as an implicit training set: application to structural ambiguity resolution

Authors:
Preslav Nakov, University of California at Berkeley, Berkeley, CA
Marti Hearst, University of California at Berkeley, Berkeley, CA

HLT '05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, October 2005, pages 835–842
https://doi.org/10.3115/1220575.1220680

Published: 6 October 2005

ABSTRACT

Recent work has shown that very large corpora can act as training data for NLP algorithms even without explicit labels. In this paper, we show how surface features and paraphrases in search-engine queries can be used to infer labels for structural ambiguity resolution tasks. Using unsupervised algorithms, we achieve 84% precision on PP-attachment and 80% precision on noun compound coordination.

Index Terms
1. Computing methodologies
  1. Artificial intelligence
2. Hardware
  1. Power and energy
    1. Power estimation and optimization

Published in
HLT '05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
October 2005
1054 pages
Conference Chair: Raymond J. Mooney, The University of Texas at Austin
Publisher: Association for Computational Linguistics, United States
Qualifiers: Article

Acceptance Rates
HLT '05 paper acceptance rate: 127 of 402 submissions (32%)
Overall acceptance rate: 240 of 768 submissions (31%)

Article Metrics
- Total citations: 22
- Total downloads: 386
- Downloads (last 12 months): 25
- Downloads (last 6 weeks): 5