Language clusters based on linguistic complex networks

  • Article
  • Applied Physics
Chinese Science Bulletin

Abstract

To investigate the feasibility of using complex networks in the study of linguistic typology, this paper builds and explores 15 linguistic complex networks based on the dependency syntactic treebanks of 15 languages. The results show that human languages can be classified by means of the following main parameters of complex networks: (a) average degree of the nodes, (b) clustering coefficient, (c) average path length, (d) network centralization, (e) diameter, (f) power exponent of the degree distribution, and (g) the determination coefficient of the power-law fit to the degree distribution. The precision of this method is similar to that achieved by modern word order typology. This paper tries to address two problems of current linguistic typology: first, the language samples used in typological studies are not real texts; second, typological studies pay too much attention to local language structures when choosing typological parameters. The network approach captures global typological features of language better; it not only enhances typological methods but is also valuable for developing applications of complex networks in the humanities and in the social and life sciences.
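As a rough illustration of how some of the parameters listed above might be computed, the following minimal Python sketch builds an undirected word network from dependency pairs and derives the average degree, clustering coefficient, average path length, and diameter. It is not the authors' implementation: the function names, the `(dependent, head)` input format, and the toy sentence are assumptions made for this example, and network centralization and the power-law fit are left out.

```python
from collections import defaultdict, deque


def build_network(dependencies):
    """Build an undirected word network from (dependent, head) pairs (assumed input format)."""
    graph = defaultdict(set)
    for dependent, head in dependencies:
        if dependent != head:  # ignore self-loops
            graph[dependent].add(head)
            graph[head].add(dependent)
    return dict(graph)


def average_degree(graph):
    """Mean number of neighbours per node."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


def clustering_coefficient(graph):
    """Mean local clustering coefficient; nodes with fewer than two neighbours count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Each edge between two neighbours is seen twice, hence the division by 2.
        links = sum(1 for u in nbrs for v in graph[u] if v in nbrs) / 2
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def path_statistics(graph):
    """Return (average path length, diameter) over all reachable node pairs."""
    lengths = [
        d
        for source in graph
        for target, d in shortest_path_lengths(graph, source).items()
        if target != source
    ]
    return sum(lengths) / len(lengths), max(lengths)


# Toy dependency pairs for "the dog chased the cat" (hypothetical data, word types as nodes).
toy_dependencies = [
    ("the", "dog"),
    ("dog", "chased"),
    ("the", "cat"),
    ("cat", "chased"),
]
toy_network = build_network(toy_dependencies)
avg_path_length, diameter = path_statistics(toy_network)
print(average_degree(toy_network), clustering_coefficient(toy_network))
print(avg_path_length, diameter)
```

In a study of this kind, one such parameter vector per language could then be passed to a standard cluster analysis; the choice of clustering method is not specified in the abstract and is not assumed here.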

Author information

Corresponding author

Correspondence to HaiTao Liu.

About this article

Cite this article

Liu, H., Li, W. Language clusters based on linguistic complex networks. Chin. Sci. Bull. 55, 3458–3465 (2010). https://doi.org/10.1007/s11434-010-4114-3

  • DOI: https://doi.org/10.1007/s11434-010-4114-3
