Chinese Syntactic and Typological Properties Based on Dependency Syntactic Treebanks
This paper offers a quantitative analysis of the syntactic and typological properties of Chinese based on five Chinese dependency treebanks. The study shows that the mean dependency distance of Chinese is 2.84; that 40-50% of dependencies hold between non-adjacent words; that Chinese is a mixed language with a preference for governor-final dependencies and SV, VO and AdjN orders; and that the mean dependency distance of governor-initial dependencies is greater than that of governor-final ones. Methodologically, the paper uses five treebanks that differ in text genre and annotation scheme as a resource for studying the syntactic features of a language. Drawing on several treebanks limits the influence of any single corpus on the results, which makes the conclusions more reliable and robust. Where suitable treebanks are available, the method can readily be applied to other languages, which gives it a broad theoretical and cross-linguistic scope.
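The measures named in the abstract can be illustrated with a minimal sketch. It assumes that dependency distance is the absolute difference between the linear positions of a governor and its dependent, that the root has no governor and is excluded, and that a dependency counts as governor-final when the governor follows the dependent. The toy sentence and the function name `dependency_stats` are hypothetical and are not taken from the paper or its treebanks.

```python
from statistics import mean

# Hypothetical toy sentence, NOT from the paper's treebanks.
# Each pair is (dependent position, governor position); 0 marks the root.
toy_sentence = [(1, 2), (2, 0), (3, 4), (4, 2), (5, 2)]


def dependency_stats(sentences):
    """Compute distance and direction measures over a list of sentences."""
    final_distances = []    # governor follows the dependent
    initial_distances = []  # governor precedes the dependent
    for sentence in sentences:
        for dependent, governor in sentence:
            if governor == 0:
                continue  # the root has no governor, so no distance
            distance = abs(governor - dependent)
            if governor > dependent:
                final_distances.append(distance)
            else:
                initial_distances.append(distance)
    all_distances = final_distances + initial_distances
    total = len(all_distances)
    return {
        "mean_dd": mean(all_distances),
        # share of dependencies linking non-adjacent words (distance > 1)
        "non_adjacent_share": sum(d > 1 for d in all_distances) / total,
        "governor_final_share": len(final_distances) / total,
        "mean_dd_governor_final": mean(final_distances) if final_distances else None,
        "mean_dd_governor_initial": mean(initial_distances) if initial_distances else None,
    }


stats = dependency_stats([toy_sentence])
print(stats)
```

Run over each treebank separately, a function of this kind would give one set of figures per corpus, which could then be compared across genres and annotation schemes in the way the abstract describes.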
This content is open access.