Published by De Gruyter Mouton, December 23, 2016

Linguistic typology in natural language processing

  • Emily M. Bender
From the journal Linguistic Typology

Abstract

This paper explores the ways in which the field of natural language processing (NLP) can and does benefit from work in linguistic typology. I describe the recent increase in interest in multilingual natural language processing and give a high-level overview of the field. I then turn to a discussion of how linguistic knowledge in general is incorporated in NLP technology before describing how typological results in particular are used. I consider both rule-based and machine learning approaches to NLP and review the literature on predicting typological features as well as work that leverages such features.

Acknowledgments

I would like to thank Antske Fokkens, Gina-Anne Levow, and Olga Zamaraeva for helpful discussion in the preparation of this paper. All remaining errors and infelicities are my own.

Received: 2016-8-3
Revised: 2016-9-6
Published Online: 2016-12-23
Published in Print: 2016-12-1

©2016 by De Gruyter Mouton

Downloaded on 28.4.2024 from https://www.degruyter.com/document/doi/10.1515/lingty-2016-0035/html