A statistical probe into the word frequency and length distributions prevalent in the translations of Bhagavad Gita

Rajput, Nikhil Kumar; Ahuja, Bhavya; Riyal, Manoj Kumar

doi:10.1007/s12043-018-1709-8

Published: 15 February 2019

Pramana, Volume 92, article number 60 (2019)

Nikhil Kumar Rajput¹,
Bhavya Ahuja¹ &
Manoj Kumar Riyal²

Abstract

A statistical study has been conducted on the Bhagavad Gita. Four measures have been derived for the original text in Sanskrit and its translations in Hindi, English and French. First, word frequency distributions for the documents were modelled. A power law was observed, with the longest tail in the case of Sanskrit. For the other versions, the distributions closely followed the Zipf–Mandelbrot pattern. Second, the Kullback–Leibler (KL) divergence between the documents was computed; the highest values were recorded for the divergence of each of the three translations from the Sanskrit text. Next, a Shannon entropy-based measure, the vocabulary quotient, was calculated to estimate the vocabulary richness of the texts; it was highest for the Bhagavad Gita in Sanskrit. Finally, word-length distributions were obtained, with the longest word length found in Sanskrit. The results are attributed to the inflectional nature of Sanskrit.
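The four kinds of measurement named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The whitespace tokenisation, the add-alpha smoothing over the union of word types, and all function names are assumptions made for illustration. The abstract does not define the vocabulary quotient, so only the underlying Shannon entropy is computed, and it does not say how texts in different languages were aligned for the KL divergence.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace tokenisation; the abstract does not
    # describe how the texts were actually split into words.
    return text.lower().split()


def rank_frequency(tokens):
    # Word frequencies sorted from most to least frequent (rank 1, 2, ...).
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_mandelbrot(rank, c, a, b):
    # Zipf-Mandelbrot form f(r) = c / (r + b)^a; b = 0 gives a pure power law.
    return c / (rank + b) ** a


def shannon_entropy(tokens):
    # Entropy in bits of the empirical word distribution.
    counts = Counter(tokens)
    n = sum(counts.values())
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def kl_divergence(p_tokens, q_tokens, alpha=1.0):
    # D(P || Q) in bits. Assumption: add-alpha smoothing over the union of
    # word types, so that words missing from one text do not give log(0).
    p_counts, q_counts = Counter(p_tokens), Counter(q_tokens)
    vocab = set(p_counts) | set(q_counts)
    p_total = len(p_tokens) + alpha * len(vocab)
    q_total = len(q_tokens) + alpha * len(vocab)
    total = 0.0
    for word in vocab:
        p = (p_counts[word] + alpha) / p_total
        q = (q_counts[word] + alpha) / q_total
        total += p * math.log2(p / q)
    return total


def word_length_distribution(tokens):
    # Maps word length to number of tokens. Lengths are counted in Unicode
    # code points, which is only a rough proxy for scripts such as Devanagari.
    return Counter(len(token) for token in tokens)
```

Applied to two tokenised texts, `kl_divergence(a, b)` and `kl_divergence(b, a)` generally differ because the KL divergence is not symmetric, so a comparison between a source text and a translation has to state the direction.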

Author information

Authors and Affiliations

Department of Computer Science, Ramanujan College, University of Delhi, New Delhi, 110 019, India
Nikhil Kumar Rajput & Bhavya Ahuja
Department of Physics, Veer Chandra Singh Garhwali Uttarakhand University of Horticulture and Forestry, Tehri Garhwal, 246 123, India
Manoj Kumar Riyal

Corresponding author

Correspondence to Bhavya Ahuja.

About this article

Cite this article

Rajput, N.K., Ahuja, B. & Riyal, M.K. A statistical probe into the word frequency and length distributions prevalent in the translations of Bhagavad Gita. Pramana - J Phys 92, 60 (2019). https://doi.org/10.1007/s12043-018-1709-8

Received: 02 July 2018
Revised: 08 August 2018
Accepted: 27 August 2018
Published: 15 February 2019
DOI: https://doi.org/10.1007/s12043-018-1709-8
