Abstract
This paper reviews the current state of the art in Natural Language Processing for Hebrew, both theoretical and practical. The Hebrew language, like other Semitic languages, poses special challenges for developers of natural language processing programs: the writing system, the rich morphology, the unique word-formation process of roots and patterns, and the lack of linguistic corpora that document language usage all contribute to making computational approaches to Hebrew difficult. The paper briefly reviews the field of computational linguistics and the problems it addresses, describes the special difficulties inherent to Hebrew (as well as to other Semitic languages), surveys a wide variety of past and ongoing works, and attempts to characterize future needs and possible solutions.
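Two of the difficulties named above, root-and-pattern word formation and the writing system, can be illustrated with a toy example. The Python sketch below is not the paper's method. It makes several simplifying assumptions: words are written in a Latin transliteration, a pattern marks each root-consonant slot with `C`, and unvocalized script is modeled crudely by deleting every vowel letter. The function names `interdigitate` and `strip_vowels` are hypothetical.

```python
# Toy illustration of Semitic root-and-pattern morphology.
# Assumptions (not from the source): Latin transliteration, 'C' marks a
# slot for the next root consonant, and vowel deletion stands in for
# unvocalized Hebrew script. Real morphology and orthography involve many
# more rules (e.g. vav/yod written as vowel letters), which are ignored here.

VOWELS = set("aeiou")


def interdigitate(root: str, pattern: str) -> str:
    """Insert the consonants of `root`, in order, into the 'C' slots of `pattern`."""
    slots = pattern.count("C")
    if slots != len(root):
        raise ValueError(f"pattern {pattern!r} needs {slots} radicals, got {len(root)}")
    radicals = iter(root)
    return "".join(next(radicals) if symbol == "C" else symbol for symbol in pattern)


def strip_vowels(word: str) -> str:
    """Crude stand-in for unvocalized script: drop every vowel letter."""
    return "".join(ch for ch in word if ch not in VOWELS)


if __name__ == "__main__":
    root = "gdl"  # the root g-d-l, associated with 'bigness' or 'growth'
    patterns = {
        "CaCaC": "he grew",
        "CiCeC": "he raised",
        "hiCCiC": "he enlarged",
        "CaCoC": "big",
        "miCCaC": "tower",
    }
    for pattern, gloss in patterns.items():
        word = interdigitate(root, pattern)
        print(f"{pattern:7} -> {word:7} {strip_vowels(word):5} ({gloss})")
```

Under these assumptions, `gadal`, `gidel` and `gadol` all reduce to the same skeleton `gdl`. This hints at why an analyzer for undotted text must choose among several readings of a single written form.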
About this article
Cite this article
Wintner, S. Hebrew Computational Linguistics: Past and Future. Artificial Intelligence Review 21, 113–138 (2004). https://doi.org/10.1023/B:AIRE.0000020865.73561.bc
DOI: https://doi.org/10.1023/B:AIRE.0000020865.73561.bc