An IR approach for translating new words from nonparallel, comparable texts

Authors:
Pascale Fung
University of Science and Technology, Clear Water Bay, Hong Kong

Lo Yuen Yee
University of Science and Technology, Clear Water Bay, Hong Kong

ACL '98/COLING '98: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1, August 1998, Pages 414–420
https://doi.org/10.3115/980845.980916

Published: 10 August 1998


Published in
ACL '98/COLING '98: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1
August 1998
768 pages
Program Chairs: Christian Boitet (Université Joseph Fourier, Grenoble, France), Pete Whitelock (Sharp Laboratories of Europe Ltd., Oxford, United Kingdom)
Publisher: Association for Computational Linguistics, United States
Overall Acceptance Rate: 85 of 443 submissions, 19%