Principle of Least Effort and Sentence Length in Public Speaking

The analysis of sentence lengths in the inaugural speeches of US presidents and the annual speeches of UK party leaders is carried out. Transcripts of the speeches are used, rather than the oral production. It is discovered that the average sentence length in these speeches decreases linearly with time, with the slope of 0.13 ± 0.03 words/year. It is shown that among the analyzed distributions (log-normal, folded and half normal, Weibull, generalized Pareto, Rayleigh) the Weibull is the best distribution for describing sentence length. These two results can be considered a consequence of the principle of least effort. The connection of this principle with the well-known principles of maximum and minimum entropy production is discussed.


Introduction
The study of natural languages is extremely important not only for the human and social sciences, but also for the sciences that study the development patterns of complex systems (synergetics, cybernetics, etc.). An important section of language science is quantitative linguistics, which uses mathematical methods to establish language laws (note that the objectives and methods of quantitative linguistics go beyond the mere study of linguistic laws, see, e.g., [1,2]). At present, several similar laws are considered, and among them, the most famous are Zipf's law, Herdan's law, Brevity law, and Menzerath-Altmann's law [3][4][5][6][7]. Such laws, found mostly by statistical methods, indicate existing regularities between various elements of language (phonemes, words, etc.).
The most important element of language is the sentence-the object of this study. According to the Cambridge dictionary, a sentence is a group of words, usually containing a verb, that expresses a thought in the form of a statement, question, instruction, or exclamation. Sentences have semantic completeness; they express a particular thought of a person and serve to communicate it with other people. Based on the above sentence qualities, the study of these structural units is essential for cognitive science, which is of great interest. A metaphor from atomic physics would be very appropriate to illustrate this, especially for representatives of the natural sciences. Many properties of an atom are estimated by radiation (spontaneous and stimulated) that an atom emits and/or absorbs. A person (human brain) also "emits" and perceives elementary flows of thought in the form of sentences and the characteristics of this "human radiation" can reveal a lot about both the person and their environment.
An important quantitative characteristic of a sentence is its length, which can be measured in various ways (the number of letters, words, etc.). The study of sentence lengths does not require special linguistic training and can be easily processed by computer. As a result, this value has been studied for a long time and is used to determine the The purpose of this work is to check the applicability of the Weibull distribution to the distribution of sentence lengths and to discover the law of the sentences length decrease over time. The results can provide additional justification for the applicability of the principle of least effort to elementary units of human speech that carry a particular thought.

Data for Analysis
The object of this research was to study the public speeches of politicians. Previously, this has beencarried out several times (see, e.g., [32][33][34]). However, the objectives of those studies were different from the objective of this work (readability and sentiment analysis, letter frequency distribution, etc.). Political speeches are a convenient object of research, since this is a form of oral speech that is well-documented for sufficiently long times. Political speeches are positioned between spoken and written ways of expressing thoughts. Unlike spoken speech, the speech under consideration is more meaningful, prepared ahead of time, and less spontaneous, from the speaker's point of view. At the same time, in comparison with written speech, political speeches are more focused on the listener, and, therefore, have a greater emotional component and the tendency to be easily understood. As a result, political speeches are extremely valuable research material. Such speeches are usually focused on some "average" citizen-the voter-therefore, the processing of such data reflects the temporal changes in the majority of native speakers.
We analyzed text transcripts of the 59 inaugural speeches of US presidents from 1789 to 2021 and 224 texts of speeches of UK Party leaders from 1895 to 2018 (available in [35] and [36], respectively). The studied speeches of US presidents are uniformly distributed every four years. The time distribution of speeches of UK Party leaders was not so uniform (due to copyright, the appearance of a new large party in parliament in 1977, etc.), but much more extensive. Note that no UK speeches were processed for 1898, 1914-1917, 1931, 1938-1940, 1944, 1952-1954, or1959. The list of analyzed speeches is presented in Appendixes.
Sentence length was calculated from period to period, the unit of measurement was the words between spaces (prior to analysis, we replace all question marks, exclamation marks, and ellipses with a period, and also remove all dots used when writing decimal numbers). Note that the selected unit of measure for sentence length is not exclusive. Words were selected as a unit of measure for sentence length primarily because of the simplicity and the great prevalence of this approach. It is necessary to note that according to [37], sentence length is robust with respect to the selection of the unit of measurement. Thus, the choice of the word (and, e.g., not letters) will not lead to a change in the results of further analysis. The calculation was carried out automatically using a developed and tested computer program (see, example in Figure 1).  Despite the fact that the studied speeches belonged to a long period of time, the total number of words in the speeches did not change reliably ( Figure 2). The average length of speech in words was 2331 ± 355 for the US and 5434 ± 774 for the UK.
Statistical analysis was performed using the well-known and widespread professional commercial product Statistica 12.0 (TIBCO Software). The data (year of the speech and values corresponding to the processed sentence lengths of speeches) are in open ac-  Despite the fact that the studied speeches belonged to a long period of time, the total number of words in the speeches did not change reliably ( Figure 2). The average length of speech in words was 2331 ± 355 for the US and 5434 ± 774 for the UK. Despite the fact that the studied speeches belonged to a long period of time, the total number of words in the speeches did not change reliably ( Figure 2). The average length of speech in words was 2331 ± 355 for the US and 5434 ± 774 for the UK.
Statistical analysis was performed using the well-known and widespread professional commercial product Statistica 12.0 (TIBCO Software). The data (year of the speech and values corresponding to the processed sentence lengths of speeches) are in open access [38].  Statistical analysis was performed using the well-known and widespread professional commercial product Statistica 12.0 (TIBCO Software). The data (year of the speech and values corresponding to the processed sentence lengths of speeches) are in open access [38].

Change in Sentence Length over Time
The parameters characterizing sentence length were calculated. They are listed below.

1.
The average sentence length. To calculate this parameter, the total number of words in a speech was divided by the number of sentences. The change in this parameter over time is shown in Figure 3. The figure demonstrates that the average sentence length decreases linearly, with the slopes for USA and UK practically coinciding, and are equal to 0.13 ± 0.03 and 0.14 ± 0.01, respectively. On average, over 100 years, from 1900 to 2000, the average sentence length for both sets decreases from 30 to 16 words in a sentence, that is, the length is reduced by almost twice.
low. 1. The average sentence length. To calculate this parameter, the total number of words in a speech was divided by the number of sentences. The change in this parameter over time is shown in Figure 3. The figure demonstrates that the average sentence length decreases linearly, with the slopes for USA and UK practically coinciding, and are equal to 0.13 ± 0.03 and 0.14 ± 0.01, respectively. On average, over 100 years, from 1900 to 2000, the average sentence length for both sets decreases from 30 to 16 words in a sentence, that is, the length is reduced by almost twice. 2. The median is known to be a stable characteristic of the distribution, it is almost unaffected by outliers. According to Figure 4, the median sentence length distribution decreases linearly in both sets with time. The slopes of the lines for USA and UK are 0.11 ± 0.02 and 0.11 ± 0.01, respectively. It can be seen that the lines are very close and practically coincide with ones for average sentence lengths.

2.
The median is known to be a stable characteristic of the distribution, it is almost unaffected by outliers. According to Figure 4, the median sentence length distribution decreases linearly in both sets with time. The slopes of the lines for USA and UK are 0.11 ± 0.02 and 0.11 ± 0.01, respectively. It can be seen that the lines are very close and practically coincide with ones for average sentence lengths.

3.
The decrease in sentence length over time is also demonstrated in Figure 5, where the time dependence of the maximum sentence length is presented. The decrease in this parameter in both sets is approximately linear. 3. The decrease in sentence length over time is also demonstrated in Figure 5, where the time dependence of the maximum sentence length is presented. The decrease in this parameter in both sets is approximately linear. The final results of this section are summarized in Table 1.  The final results of this section are summarized in Table 1.

Analysis of the Sentence Length Distribution Law
To analyze the sentence length distribution law, a number of speeches of the US presidents were excluded from the initial data. First, small speech texts, containing less than 40 sentences were excluded (these are speech texts of 1789, 1793, 1797, 1813, 1829, 1833, 1849, 1865, 1869, 1905, and 1945). Second, since it is the texts of public oral speeches that are analyzed, the texts of 1953, 1961, 1973, and 1981 were excluded because these speeches were not spoken, but were only written. Third, speech texts of 1801, 1805, 1837, 1877, 1881, 1893, 1941, 1965, and 1969 were not processed, since these speech texts have a multimodal distribution (the reasons for this and the analysis of these distributions could be the subject of a separate work). Thus, the analysis of the distribution law was carried out at 31 inaugural speeches of US presidents. All 224 speeches of the UK party leaders were analyzed. However, 31 texts were excluded due to the low significance level (<0.05) of the results obtained in relation to all tested distribution laws. Single outliers were excluded from the datasets before data analysis.
Six distributions with no more than two parameters, such as log-normal, Weibull, folded normal, half normal (normal), generalized Pareto, Rayleigh were analyzed in order to find the best theoretical distribution that describes the studied empirical distributions. The ranking of these distributions by the quality of data description was carried out according to the Kolmogorov-Smirnov criterion: the larger the p-level value, the better this distribution describes the empirical data and, accordingly, the higher its place in compari-  Tables 2 and 3 show the number of times one of the six listed sentence length distributions was among the top three (see Appendix A for more information).  Table 2 shows that, in 14 inaugural speeches of the US presidents, the Weibull distribution took first place in terms of significance, in another 14 it took second place, and in 3, it took third place. Thus, the Weibull distribution is the only distribution that adequately describes all speeches and takes the top three places. The average distribution significance level, where Weibull was in the first place, is 0.73, and for the second and third places, it is 0.5 and 0.3, respectively (see Appendix A). The log-normal distribution, ranked in the top three for 19 speeches, describes the data somewhat worse than Weibull one. The average significance levels for the log-normal distribution are 0.67 for the first place (13 speeches), 0.37 for the second place (5 speeches) and 0.08 for the third place (1 speech). Similar results can be seen for UK speeches (see Table 3 and Appendix A).
Thus, according to the performed statistical analysis, the Weibull distribution is the most preferable for describing the studied speeches. The Weibull distribution (cumulative distribution function) has the form 1 − exp(− (x/λ) k ), where λ and k are the scale and shape parameters respectively. Examples of the experimental data description using the Weibull distribution are presented in Figure 6. Note that the one-parameter Rayleigh distribution, ranked third in the description quality according to the analysis results, is a special case of the Weibull distribution, where the shape parameter is equal to two (see Tables 2 and 3).
ranked in the top three for 19 speeches, describes the data somewhat worse than Weibull one. The average significance levels for the log-normal distribution are 0.67 for the first place (13 speeches), 0.37 for the second place (5 speeches) and 0.08 for the third place (1 speech). Similar results can be seen for UK speeches (see Table 3 and Appendix 1). Thus, according to the performed statistical analysis, the Weibull distribution is the most preferable for describing the studied speeches. The Weibull distribution (cumulative distribution function) has the form 1-exp(-(x/λ) k ), where λ and k are the scale and shape parameters respectively. Examples of the experimental data description using the Weibull distribution are presented in Figure 6. Note that the one-parameter Rayleigh distribution, ranked third in the description quality according to the analysis results, is a special case of the Weibull distribution, where the shape parameter is equal to two (see Tables 2 and  3).   Figure 7 that the scale parameter reliably decreases over time. Since this parameter is known to be directly proportional to the mean, median, and mode of the Weibull distribution, this once again confirms the above statement about the decrease in the average (and the most probable) sentence length. The shape parameter for US speeches does not reliably change over time and is equal to 1.9 ± 0.1. At the same time, the shape parameter for UK speeches is slightly increasing, changing from 1.5 to 1.8 over the past 100 years. This is the only difference found when comparing sentence lengths for US and UK speeches. Since the change over time is not large (the slope of the line is 0.002 ± 0.001), for reliability, an analysis of this result using additional data is required. Table 4 summarizes the results on the behavior of the parameters of the Weibull distribution over time.
Entropy 2021, 23, x 9 of 27 the scale parameter reliably decreases over time. Since this parameter is known to be directly proportional to the mean, median, and mode of the Weibull distribution, this once again confirms the above statement about the decrease in the average (and the most probable) sentence length. The shape parameter for US speeches does not reliably change over time and is equal to 1.9 ± 0.1. At the same time, the shape parameter for UK speeches is slightly increasing, changing from 1.5 to 1.8 over the past 100 years. This is the only difference found when comparing sentence lengths for US and UK speeches. Since the change over time is not large (the slope of the line is 0.002 ± 0.001), for reliability, an analysis of this result using additional data is required. Table 4 summarizes the results on the behavior of the parameters of the Weibull distribution over time.    Thus, the time behavior of the parameters of the Weibull distribution allows us to conclude that, over the past two hundred years, sentence length distribution has become less fuzzy, the width of the peak decreases, and its abscissa is slightly shifted to the left. This is demonstrated in Figure 9. As a result, over time, speeches become composed of similar in length and shorter sentences, the difference in length decreases. In terms of sentence lengths, the text becomes more ordered. This can be seen in Figure 10, where information entropy (Shannon entropy) is presented as a function of time. The calculation of this value was based on sentence length distribution histograms containing the probabilities of detecting sentence length in a speech at a certain length interval. This is demonstrated in Figure 9. As a result, over time, speeches become composed of similar in length and shorter sentences, the difference in length decreases. In terms of sentence lengths, the text becomes more ordered. This can be seen in Figure 10, where information entropy (Shannon entropy) is presented as a function of time. The calculation of this value was based on sentence length distribution histograms containing the probabilities of detecting sentence length in a speech at a certain length interval.

Conclusions
Based on the calculation of sentence lengths in the text transcripts of the inaugural speeches of the US presidents for 228 years and the annual speeches of the UK party leaders for 123 years, two main results were obtained: Figure 10. Information entropy S versus the time t. Red circles indicate data for USA, and black triangles indicate data for UK. The linear regression equation for USA is 7.7-0.003t (the slope of the line is 0.003 ± 0.001) and for UK is 8.2-0.003t (the slope of the line is 0.003 ± 0.001).

Conclusions
Based on the calculation of sentence lengths in the text transcripts of the inaugural speeches of the US presidents for 228 years and the annual speeches of the UK party leaders for 123 years, two main results were obtained: 1. The average sentence length for both US and UK speeches decreases linearly with time with the slope of 0.13 ± 0.03 words/year and, on average, from 1900 to 2000, sentence length decreased with time from 30 to 16 words.
2. Sentence length distribution for both US and UK speeches is better described by the Weibull distribution (in particular, in comparison with the log-normal). The scale parameter of this distribution reliably decreases over time from 35 to 15. The shape parameter for US speeches does not change over time and is equal to 1.9 ± 0.1, and the shape parameter for UКspeeches slightly changes over time from 1.5 to 1.8.
These two results are in agreement with the principle of least effort: the speaker, attempting to minimize both their efforts and the listeners' effort, tends to choose the shortest possible sentence length from a potential set of sentences of approximately the same content. As a result, on the one hand, sentence length distribution begins to correspond to the distribution of minimum values-the Weibull distribution, and on the other hand, at time intervals significantly longer than the speech preparation time, the average sentence length decreases. The detected change over time in the scale parameter of the Weibull distribution and in information entropy indicates that sentence length in public speeches is gradually becoming less diverse; it is being unified and standardized.
Here we highlight the following idea. When establishing the distribution type for empirical data, most important are not statistical tests, but rather the theoretical justification. If we accept the principle of least effort, then the Weibull distribution clearly follows from it. If one assumes that the principle of least effort is not suitable here, then obviously, they must propose some other theoretical justification-their principle-and theoretically derive, for example, gamma or lognormal distributions from it. Currently, we do not see such attempts. The G. Zipf's principle, in our opinion, is very profound and productive, and many interesting consequences can be obtained from it. It has great potential, which has not yet been fully embraced by modern linguists. Our work and a number of works (see, e.g., [27,28,[39][40][41][42][43]) show how useful it can be.
An interesting continuation of this work can be the verification of the obtained results using breath groups [6]. Detecting correlations and differences in such a collaborative analysis of breath groups (largely related to human physiology) and sentence lengths (largely related to cognitive processes) is a very interesting task. One of the problems in this direction will be a significantly smaller statistical database for breath groups in comparison with an almost limitless database for sentence lengths. Another interesting development of this work, in the scope of currently well-established directions connected to language complex networks (see, e.g., [39][40][41][42][43]), could be an analysis of the data obtained, here, from the position of the principle of compression, which appeared as a development of the ideas of G. Zipf [43]. It seems to us that the results of this work, combined with the principle of compression and with the use of Kolmogorov complexity ideas (existing inalgorithmic information theory) could be very promising. Found patterns for English also require validation for other languages, including artificial, as well as using other methods and linguistic units (letters, initial characters, words, etc.). In this regard, works [44][45][46] may be useful.
In conclusion, we return to the metaphor from physics given in the introduction. The analysis of the atom emission spectra and the Planck formula for wavelength distribution revolutionized the understanding of atomic properties, leading to the formulation of the laws of the quantum world-quantum mechanics. The distribution law of the lengths of utterances (sentences) "emitted" by the brain, corresponding to the Weibull distribution, is also able to stimulate the development of brain sciences. One of the possible directions related to brain biophysics may be the study of the energetic basis of the origin and development of thought and language. There are a number of works in this direction, in particular [4,6,[47][48][49][50][51][52][53]. Considering thought as a complex non-equilibrium process, it can be concluded that its development matches the well-known principle of maximum entropy production. According to this principle, causes (stimuli) generate such responses that maximize the thermodynamic entropy production [47,54,55]. One of these responses in the course of the evolution of human thinking was the origin of the language. This revolutionary bifurcation process led to an abrupt increase in energy consumption and, as a result, an increase in the entropy production in a nonequilibrium system, i.e., in neural networks of humans who have become users of language. Naturally, a spontaneously emerged structure (network) could not be optimal at inception: only a certain basic structure (framework) of language was formed, which had some imperfections. Subsequently, being already at the high level of energy consumption and entropy production achieved after bifurcation, the nonequilibrium system began to evolve for a rather long time, trying to minimize energy consumption [54,55]. This minimization will no longer return the system to its previous, pre-bifurcation values of entropy production; however, due to the optimization of the neural network processes responsible for language, a small decrease is possible. According to nonequilibrium thermodynamics, this optimization process is already progressing in accordance with the Prigogine minimum production principle [47,54,55]. Its linguistic analogue can be considered the principle of least effort (least effort assumes less energy spent on communication, and, consequently, less energy dissipation). The information on the "simplification" of language-a decrease in its entropy, discovered in this work, can be considered a confirmation that language is currently going through a second (minimizing) stage of development.

Conflicts of Interest:
The authors declare no conflict of interest.