Abstract
Long-sentence analysis has been a critical problem in machine translation because of its high complexity. Intrasentence segmentation has been proposed as a method for reducing parsing complexity. This paper presents a two-step segmentation method: (1) identifying potential segmentation positions in a sentence and (2) selecting an actual segmentation position amongst them. We have attempted to apply machine learning techniques to the segmentation task: “concept learning” and “genetic learning”. By learning the “SegmentablePosition” concept, the rules for identifying potential segmentation positions are postulated. The selection of the actual segmentation position is based on a function whose parameters are determined by genetic learning. Experimental results are presented which illustrate the effectiveness of our approach to long-sentence parsing for MT. The results also show improved segmentation performance in comparison to other existing methods.
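The two-step structure described in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the word list standing in for the learned “SegmentablePosition” rules, the two toy features (segment balance and a comma indicator), and the hand-set weights, which in the paper's method would instead be parameters determined by genetic learning.

```python
from typing import List, Optional, Sequence

# Hypothetical stand-in for the learned "SegmentablePosition" rules:
# a position is a candidate if the word there is a comma or a connective.
CANDIDATE_WORDS = {",", "and", "but", "because", "which", "that"}


def candidate_positions(words: Sequence[str]) -> List[int]:
    """Step 1: return indices where a new segment could begin.

    The first and last words are excluded, since splitting there
    would leave an empty or one-word segment.
    """
    return [
        i
        for i, word in enumerate(words)
        if 0 < i < len(words) - 1 and word.lower() in CANDIDATE_WORDS
    ]


def score(words: Sequence[str], pos: int, weights: Sequence[float]) -> float:
    """Hypothetical scoring function: a weighted sum of two toy features.

    balance  -- ratio of shorter to longer segment length (1.0 = even split)
    is_comma -- 1.0 if the candidate word is a comma, else 0.0
    """
    left, right = pos, len(words) - pos
    balance = min(left, right) / max(left, right)
    is_comma = 1.0 if words[pos] == "," else 0.0
    return weights[0] * balance + weights[1] * is_comma


def select_position(
    words: Sequence[str], candidates: Sequence[int], weights: Sequence[float]
) -> Optional[int]:
    """Step 2: pick the highest-scoring candidate, or None if there is none."""
    if not candidates:
        return None
    return max(candidates, key=lambda pos: score(words, pos, weights))


if __name__ == "__main__":
    sentence = (
        "The parser failed , because the sentence was long "
        "and the grammar was ambiguous"
    ).split()
    candidates = candidate_positions(sentence)
    # Hand-set weights; assumed here only to make the sketch runnable.
    chosen = select_position(sentence, candidates, weights=(1.0, 0.5))
    print(candidates, chosen)
```

Keeping candidate identification separate from selection mirrors the split in the abstract: the first function could be replaced by rules obtained through concept learning, and the weights passed to the second could be tuned by an evolutionary search, without changing the surrounding code.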
Cite this article
Kim, SD., Zhang, BT. & Kim, Y.T. Learning-based Intrasentence Segmentation for Efficient Translation of Long Sentences. Machine Translation 16, 151–174 (2001). https://doi.org/10.1023/A:1019896420277
DOI: https://doi.org/10.1023/A:1019896420277