A geometric view on bilingual lexicon extraction from comparable corpora

Authors:
E. Gaussier, Xerox Research Centre Europe, Meylan, France
J.-M. Renders, Xerox Research Centre Europe, Meylan, France
I. Matveeva, University of Chicago, Chicago, IL
C. Goutte, Xerox Research Centre Europe, Meylan, France
H. Déjean, Xerox Research Centre Europe, Meylan, France

ACL '04: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, July 2004, Pages 526–es
https://doi.org/10.3115/1218955.1219022

Published: 21 July 2004


ABSTRACT

We present a geometric view on bilingual lexicon extraction from comparable corpora, which allows us to re-interpret the methods proposed so far and to identify unresolved problems. This motivates three new methods that aim to solve these problems. Empirical evaluation shows the strengths and weaknesses of these methods, as well as a significant gain in the accuracy of the extracted lexicons.



Published in
ACL '04: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
July 2004, 729 pages
General Chair: Donia Scott, ITRI, University of Brighton
Publisher: Association for Computational Linguistics, United States
Overall Acceptance Rate: 85 of 443 submissions, 19%