Abstract
To investigate the feasibility of using complex networks in the study of linguistic typology, this paper builds and explores 15 linguistic complex networks based on the dependency syntactic treebanks of 15 languages. The results show that it is possible to classify human languages by means of the following main parameters of complex networks: (a) average node degree, (b) clustering coefficient, (c) average path length, (d) network centralization, (e) diameter, (f) power exponent of the degree distribution, and (g) the coefficient of determination of the fitted power-law distribution. The precision of this method is similar to that achieved by modern word order typology. This paper addresses two problems of current linguistic typology. First, the language samples used in typological studies are not real texts; second, typological studies pay too much attention to local language structures when choosing typological parameters. The network approach captures global typological features of language better; it not only enhances typological methods but is also valuable for developing applications of complex networks in the humanities, social sciences, and life sciences.
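As an illustration of the seven parameters listed in the abstract, the following is a minimal sketch of how they could be computed for a network built from (dependent, head) word pairs. It is not the authors' implementation. The function names, the undirected and unweighted treatment of edges, the Freeman-style degree centralization, and the plain log-log least-squares fit for the power exponent are all assumptions made here for brevity; the paper may rely on different tools and fitting procedures.

```python
from collections import defaultdict, deque
from math import log


def build_network(dependencies):
    """Undirected word network with one edge per distinct (dependent, head) pair."""
    adj = defaultdict(set)
    for dependent, head in dependencies:
        if dependent != head:  # ignore self-loops
            adj[dependent].add(head)
            adj[head].add(dependent)
    return dict(adj)


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def fit_power_law(degrees, n):
    """Least-squares line through (log k, log P(k)); returns (exponent, R^2).

    Assumption: a plain log-log regression, chosen for brevity only.
    Returns (None, None) when fewer than two distinct degrees exist.
    """
    counts = defaultdict(int)
    for k in degrees.values():
        counts[k] += 1
    if len(counts) < 2:
        return None, None
    xs = [log(k) for k in counts]
    ys = [log(c / n) for c in counts.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return -slope, r2


def network_parameters(adj):
    """Compute the seven parameters named in the abstract for one network."""
    n = len(adj)
    if n == 0:
        raise ValueError("empty network")
    degrees = {v: len(nb) for v, nb in adj.items()}

    # Mean local clustering coefficient; nodes with degree < 2 contribute 0.
    clustering_sum = 0.0
    for nb in adj.values():
        k = len(nb)
        if k >= 2:
            links = sum(1 for a in nb for b in adj[a] if b in nb) / 2
            clustering_sum += 2 * links / (k * (k - 1))

    # Average path length and diameter over all reachable ordered pairs.
    total, pairs, diameter = 0, 0, 0
    for v in adj:
        for u, d in bfs_distances(adj, v).items():
            if u != v:
                total += d
                pairs += 1
                diameter = max(diameter, d)

    # Freeman degree centralization (assumed definition; 1.0 for a star graph).
    kmax = max(degrees.values())
    central = (sum(kmax - k for k in degrees.values()) / ((n - 1) * (n - 2))
               if n > 2 else 0.0)

    gamma, r2 = fit_power_law(degrees, n)
    return {
        "average_degree": sum(degrees.values()) / n,
        "clustering_coefficient": clustering_sum / n,
        "average_path_length": total / pairs if pairs else 0.0,
        "centralization": central,
        "diameter": diameter,
        "power_exponent": gamma,
        "r_squared": r2,
    }


if __name__ == "__main__":
    # Hypothetical toy input: (dependent, head) pairs, not data from the paper.
    toy = [("the", "boy"), ("boy", "ate"), ("apple", "ate"), ("the", "apple")]
    for name, value in network_parameters(build_network(toy)).items():
        print(name, value)
```

Applied to one treebank per language, a routine of this kind would yield one seven-dimensional vector per language, which is the sort of input a cluster analysis could then group. The all-pairs breadth-first search is quadratic in the number of nodes, so a real treebank-sized network would likely call for a dedicated graph library.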
About this article
Cite this article
Liu, H., Li, W. Language clusters based on linguistic complex networks. Chin. Sci. Bull. 55, 3458–3465 (2010). https://doi.org/10.1007/s11434-010-4114-3
DOI: https://doi.org/10.1007/s11434-010-4114-3