Published August 15, 2017 | Version v1
Journal article Open

On a parametric model of the word-length distribution, using literary texts in Spanish, Italian and Swedish as examples

Creators

Description

We study the regularities that the relative frequencies of word lengths obey when the entire series of relative frequencies is divided into several segments.
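As an illustration of the quantity being studied, the following minimal sketch computes the relative frequencies of word lengths for a piece of text. The function name `word_length_frequencies` and the tokenization rule (a word is a maximal run of letters) are assumptions made for this example; the record does not state how the article splits text into words.

```python
import re
from collections import Counter


def word_length_frequencies(text):
    """Return {word length: relative frequency} for the words in `text`."""
    # Assumption: a word is a maximal run of Unicode letters
    # (digits, underscores and punctuation act as separators).
    words = re.findall(r"[^\W\d_]+", text)
    counts = Counter(len(word) for word in words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {n: counts[n] / total for n in sorted(counts)}


# Toy input for illustration only; real use would pass a whole literary text.
freqs = word_length_frequencies("El ingenioso hidalgo don Quijote de la Mancha")
```

The returned dictionary is keyed by the word length n, so it can be sliced directly into the length segments described in the paragraphs that follow.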

In the case of the Spanish language, there are four segments: lengths 1-2 (a linear function with positive slope); lengths 3-5 (a second-order polynomial opening upward); lengths 6-11 (a linear function with negative slope); lengths 12 and more (a geometric progression with a common ratio less than 1). Throughout, n denotes the length of the word (the number of letters in it).
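The formulas themselves are not reproduced in this record. Purely as a reading aid, the four-segment shape described for Spanish could be written in the following generic form, where p(n) is the relative frequency of words of length n. The coefficient symbols a_i, b_i, c_2, c and q are placeholders introduced here, not the article's notation, and no numerical values are implied.

```latex
p(n) =
\begin{cases}
a_1 n + b_1, & n = 1, 2, \quad a_1 > 0,\\
a_2 n^2 + b_2 n + c_2, & n = 3, 4, 5, \quad a_2 > 0,\\
a_3 n + b_3, & n = 6, \dots, 11, \quad a_3 < 0,\\
c\, q^{\,n-12}, & n \ge 12, \quad 0 < q < 1.
\end{cases}
```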

In the case of the Italian language, there are also four segments: lengths 1-3 and 4-6 (second-order polynomials opening downward); lengths 7-11 (a geometric progression with a common ratio less than 1); lengths 12 and more (a geometric progression with a common ratio less than 1).

In the case of the Swedish language, there are three segments: lengths 1-3 (a second-order polynomial opening upward); lengths 4-6 (a second-order polynomial opening downward); lengths 7 and more (a geometric progression with a common ratio less than 1).

The coefficients of these equations are parameters that can be estimated for a given text from its statistical characteristics.
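The record does not say how the coefficients are estimated. One plausible approach, shown here only as a sketch, is ordinary least squares on each segment: a polynomial fit for the linear and second-order segments, and a linear fit on the logarithm of the frequencies for the geometric tail. The helper names `polyfit_ls` and `geometric_fit` are hypothetical, and the numbers at the bottom are synthetic check data, not values from the article.

```python
import math


def polyfit_ls(xs, ys, degree):
    """Least-squares polynomial fit; returns [c0, c1, ...], lowest power first."""
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def geometric_fit(ns, ps):
    """Fit p(n) = first * ratio**(n - ns[0]) via a linear fit on log p.

    Requires every frequency in `ps` to be strictly positive.
    """
    shifted = [n - ns[0] for n in ns]
    log_first, log_ratio = polyfit_ls(shifted, [math.log(p) for p in ps], 1)
    return math.exp(log_first), math.exp(log_ratio)


# Synthetic check data (not from the article): an exact downward parabola
# on lengths 4-6 and an exact geometric progression on lengths 7-10.
quad = polyfit_ls([4, 5, 6], [0.15, 0.17, 0.15], 2)
first, ratio = geometric_fit([7, 8, 9, 10], [0.08, 0.04, 0.02, 0.01])
```

The sign of the leading coefficient (`quad[2]`) indicates whether the parabola opens upward or downward, and a fitted `ratio` below 1 corresponds to a decaying tail. A second-order polynomial fitted to only three lengths passes through all three points exactly, so on such segments the fit is an interpolation rather than a smoothing.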

Five texts in Spanish and Swedish and six texts in Italian were considered. Then all the texts in a given language were combined into a single text, and the word-length distribution of the combined text was considered.
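Combining several texts into one is equivalent to summing their word-length counts before normalizing. The sketch below illustrates this under that reading; the function name `pooled_frequencies` and the toy counts are assumptions for illustration, not data from the article.

```python
from collections import Counter


def pooled_frequencies(length_counts_per_text):
    """Combine per-text word-length counts into one relative-frequency table."""
    total_counts = Counter()
    for counts in length_counts_per_text:
        total_counts.update(counts)  # adds counts length by length
    total = sum(total_counts.values())
    return {n: total_counts[n] / total for n in sorted(total_counts)}


# Hypothetical toy counts {word length: number of words} for two short texts.
text_a = Counter({1: 2, 2: 6, 3: 2})
text_b = Counter({2: 10, 3: 20})
pooled = pooled_frequencies([text_a, text_b])
```

Pooling in this way weights longer texts more heavily than averaging the per-text relative frequencies would: here the pooled frequency of length 2 is 16/40, whereas the mean of the two per-text frequencies (6/10 and 10/30) is larger.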

Files

Палий И. А. .pdf

Files (1.2 MB)

md5:e5ba4802c69b25295f5fbb5c97dfeed2

Additional details

Related works

Is compiled by
2414-2948 (ISSN)
Is identical to
http://www.bulletennauki.com/palii (URL)