Competition and fragmentation: a simple model generating lognormal-like distributions

The current distribution of language size in terms of speaker population is generally described using a lognormal distribution. Analyzing the original real data we show how the double-Pareto lognormal distribution can give an alternative fit that indicates the existence of a power law tail. A simple Monte Carlo model is constructed based on the processes of competition and fragmentation. The results reproduce the power law tails of the real distribution well and give better results for a poorly connected topology of interactions.

The astonishing similarity between biological and language evolution has attracted the interest of researchers familiar with analyze genetic properties in biological populations with the aim of describing problems of linguistics [1,2,3]. Their techniques, for instance, have succeeded in explaining some interesting features related to the coexistence of the about 7, 000 languages present on Earth.
More recently, the effort to connect evolutionary biology with linguistics has emerged into a study that describes the effects of competition between languages on language evolution. The work of Abrams and Strogatz in 2003 [4], who analyzed the stability of a system composed of two competing languages, can be considered as the starting point of this new research line. In the following years, other groups, simultaneously, developed new analytical and computational models [5,6,7,8,9,10,11,12]. An overview of the fast increasing literature on language competition can be found in refs. [13,14].
Languages are by no way static. They continuously evolve, changing, for example, their lexicon, phonetic, and grammatical structure. This evolution is similar to the evolution of species driven by mutations and natural selection [15]. Following the common picture of biology [16], changes in the language structure may be seen as the result of microscopic stochastic changes caused by mutations. Natural selection, which may be caused by competition between individuals, positively selects some of these small changes, depending on their reproductive success. A sequence of macroscopic observations corresponds to such a microscopic picture. In language evolution, these macroscopic events are, for instance, the origination of two languages from an ancestor one -for example, the emergence of the Romance languages from Latin -or the extinction of a language.
In this work we model the evolution of languages from a macroscopic point of view. More precisely, the microscopic processes responsible for the differentiation of one language into two new languages are not implemented here. Effectively, we neglect the microdynamics that generates language changes, at the level of individuals, and we just describe their effect on extinction and differentiation at the level of languages, throughout a phenomenological mechanism of growth and fragmentation. Language change is determined by the dynamics of the size of its population. The fact that rare languages are less attractive for people to both learn and use is the mechanism considered as the origin of these population size changes. Consequently, this mechanism introduces a sort of frequency dependent reproductive success for different languages. Statistical data supporting this conjecture can be found in Ref. [17]. Languages documented as declining are negatively correlated with population size. This phenomenon is similar to the Allee effect in biology [18]. With a simple computational model, based on the above described mechanisms, we compare simulation results with empirical data of the distribution of population sizes of languages (DPL) of Earth's actually spoken languages [19] (see Fig. 1).
Several attempts have recently been made to reproduce the DPL. Two works focused on the apparently lognormal shape of the DPL. Tuncay [20] described language differentiation by means of a process of successive fragmentations, in combination with a multiplicative growth process. In a recent paper by Zanette [21], the dynamics of language evolution is considered as a direct consequence of the demographic increase of the speaker populations, which is modeled by means of a simple multiplicative process. Unsurprisingly, these models obtain pure lognormal distributions for the DPL, as expected from the application of the central limit theorem for multiplied random variables [22,23]. Unfortunately, the DPL is known to significantly differ from a pure lognormal shape [17].
The Schulze model [8] relies on ideas already successfully applied to model biological evolution. Languages are identified by a bit string that represents their characteristic features. New languages are produced by mutations of these features and small languages are discriminated by competition. These simulations, during the transient towards the stationary state, are able to generate data with a distribution similar to the DPL. A review of the Schulze model and its application to different problems connected to language interaction can be found in Ref. [14].
Another model, the Viviane model, simulates human settlement on an unoccupied region. Languages suffer local mutations, until the available space becomes completely populated [12]. The introduction of a bit string representation into the Viviane model was able to generate new results which reproduce the DPL over almost the entire range well, except for large language sizes [24,25]. The bit string approach gives an explanation for the deviation of the DPL from a lognormal distribution for small population sizes. The DPL changes its shape depending on the method used to distinguish different languages from dialects. When restricting the comparison between languages to be based on a small number of different features in language structure, the deviation for small language populations appears. This interpretation was confirmed by using a simple model [26] that neglects the geographic effects present in the Viviane model.
A thorough look at the DPL may suggest that the deviations from the lognormal shape could be due to power law decays [27]. We investigate this idea by fitting the DPL with a double-Pareto lognormal distribution and additionally comparing it to the simulation results of our new model. The paper is organized as follows. First we analyze the DPL and show that the double-Pareto lognormal distribution [28,29] gives an alternative fit. The next section introduces the computational model. In the last two sections, the simulation results are presented and the conclusions are given, respectively.

Statistical analysis of the distribution of languages
Reference [19] provides the number of people speaking a given language as their own mother tongue. There, 6,912 different languages have been classified and for 6,142 of them their speaker population was estimated. In the data set of 6,142 languages we also verify a big difference between the median, 6000 speakers, and the average of the number speakers of a language, circa 1.052 × 10 6 speakers. This discrepancy is related to the fact that only 326 languages have at least one million speakers. When they are assembled, these languages account for more than 95% of world's population with the remaining 5816 (94.6%) languages encompassing the left-over speaking population. This draws attention to the fat tailed behaviour of the distribution of the number of speakers. Starting from these data, we make a histogram by counting the number of languages with population size enclosed in a bin with values between exp(0.625n) and exp(0.626(n + 1)) where n = 0, 1, 2, .... This distribution defines the DPL. A pure lognormal shape can be described by, where A, µ and σ are the parameters of the distribution. A lognormal distribution corresponds to a parabola in a double-logarithmic plot (see Fig. 1). In order to calculate the parameter values by maximum log-likelihood estimation we have to transform the distribution to obtain a probability density function. As the size of the bins in the DPL increases exponentially, the frequency of languages with a certain population size is calculated by dividing the DPL by the bin width which leads to, after further normalization in order to satisfy L c(L) = 1.
The fit by maximum log-likelihood estimation with the lognormal distribution of eq. (2) gave the parameter values µ = 8.70 and σ = 3.20 with a Kolmogorov-Smirnov distance of D KS = 0.0253. These values of the lognormal distribution are the same as the ones obtained in ref. [17], where log 10 (L) was used instead of ln(L).
Unfortunately, as stated before, the DPL of real data is known to significantly differ from lognormality [17] and deviations can be easily observed for small and large values of L.
In particular, a thorough look on the DPL shows that it does not seem to decay exponentially. Hence, we have tried to fit the data using a distribution that can account for such a power law decay. As both tails could show power laws, we assumed the so-called double Pareto lognormal distribution, introduced by Reeds [28,29], where Φ (x) represents the cumulative distribution function of the normal distribution up to In this case, using the maximum log-likelihood procedure, we have estimated the following values for the function parameters: α = 0.682, β = 0.603, τ = 2.33, and ν = 8.89. This set of parameters yields D KS = 0.0198 which is slightly smaller than the distance presented by the log-normal fit. With these values we can compute the critical P value, P c , which is given by The closer P c is to zero, the poorer the fit is. With the values presented previously we have P C ≃ 1.6 × 10 −2 for the fit for Eq. (3) and P C ≃ 8 × 10 −4 for the lognormal adjustment. This fact suggests that the double Pareto distribution allows a better description of real data. Figure 1 shows the results of this analysis comparing the corresponding DPLs. It is likely to obtain better results by using a fitting function with a larger number of adjustable parameters. The Akaike information criterion [30] provides an estimation whether the introduction of new parameters into the fitting procedure is useful, k denotes the number of fitting parameters and RSS is the residual sum of squares. Smaller values of this criterion correspond to better results. We obtain AIC = −10.25 for the fitting with the lognormal function and AIC = −12.08 for the one with the double-Pareto lognormal function.

Interaction versus fragmentation
This section introduces our model. As we stated before, we describe the behavior of languages on the macroscopic scale of sub-populations, neglecting the languages internal structure, constituted by the speaking preferences of each individual. Each language i is characterized exclusively by the number of its speakers L i . The origination of one new language is obtained throughout fragmentation. Fragmentation is implemented as follows: at each time step, each language can break into two new languages with a fixed probability F : For the sake of simplicity, the two new languages contain exactly half of the population of the ancestor language (fragmentations into two parts of unequal size do not alter the qualitative shape of the distribution of the fragments size). Taking into account only the fragmentation process, the number of languages, N L , increases until each language has only one speaker. For large numbers of N L , there is an interval, during the time evolution, when the population distribution displays a pure lognormal shape. It is interesting to remember that a simple soluble model analog to this fragmentation process is the discrete sequential fragmentation of a segment. In this case, through a rate equation approach, it is possible to show that an explicit asymptotic solution is given by a lognormal distribution. This is independent on the number of pieces of each breaking event [31].
The interaction between languages is implemented by a term that controls the growth of each language in dependence of its relative population size. At each time step, the population of each language i follows the rule: where L i (t) denotes the number of speakers of language i at time step t, I the interaction is the total number of speakers. The second term, which can be positive or negative, enhances the reproductive success of the languages with more speakers, and causes the decrease of the number of rare language speakers. The third term, which is controlled by the parameter V , is a cause of random death that avoids the unlimited growth of the population. The term limits the growth of each language to have less than V speakers. It is important to notice that this third term is necessary because the second one causes an uncontrolled growth in the total population. In fact, the second term does not conserve the total population because languages with less than one speaker (L i (t) < 1) are removed from the simulation. We perform the simulations over two different topologies. In the first, the languages are ordered in a chain, with periodic boundary conditions, and interact with their nearest neighbors (NN). New languages, generated by fragmentation, are placed between the ancestor language and the following one. This implementation describes restricted local interactions, that can only happen between neighbor languages, in a 1-dimensional space.
In the second implementation, the interaction is carried out between all languages, in a mean field like model. For each iteration, every single language interacts with the whole set of languages. This model corresponds to the ideal case of a fully connected world, without geographic constraints.

Simulation results
All our simulations runs begin with one language having a number of speakers that can vary from 1, 000 to 1, 000, 000. However, the final state of the system does not depend on the initial conditions. The system is characterized by three global quantities: the number of languages N L , the total population N T ot and the distribution of language sizes in terms of speaker population. This distribution is averaged over time steps. The results are collected after the stationary state is reached, i.e. with both N L and N T ot fluctuating around some fixed value. Note, that eq. (7) allows the case N T ot > V for N L ≥ 2.

Local interactions
We start with languages arranged on a 1-dimensional array. This setup can be interpreted as the most simple implementation to describe small range interactions corresponding to a weakly connected world.
First, we study the model dependence on the parameter V . Without the interaction term, this parameter would directly determine the population size, and it is normally called carrying capacity or Verhulst factor in models of population dynamics. The presence of the second term in Eq. (7) generates a different behavior. As it can be observed in Fig. 2a, the averaged total population size saturates for large V values. In contrast, after a short transient, for sufficiently large values of V , the number of languages, N L , increases linearly with V , as shown in Fig. 3a.
Looking at Fig. 2b, we can see how, for large V values, it is possible to rescale all the DPL towards a common scaling behavior. The V dependence of the distributions becomes clear when looking at the inset of the same figure. In fact, for small V one language completely dominates the system. By increasing V , the interaction term becomes weaker, and the fragmentation process takes over, leading to a larger number of languages. For even larger V values, the distribution moves towards smaller languages sizes maintaining its shape when displayed in a double-logarithmic scale as shown in Fig. 2b.
In Fig. 3b, we show the power law dependence of the total population size and of the number of languages with the parameter F , which controls the fragmentation process. Figure 4a shows the shape of the DPL for different values of the parameter V and of the interaction strength I. It is possible to obtain a data collapse by rescaling the distribution with the value of V and the population size with the factor V /I. It is interesting to note that, in log-scale, the broadness of the distribution does not depend on V or I. As can be clearly observed, the shape of the curve deviates strongly from a lognormal shape for small and large population values. In fact, we collected data over sufficiently many decades to be able to show that the curves decay like a power law for small and large population sizes.  For smaller F we obtain broader distributions. From the analysis of this data we obtain scaling relations for the maximum value of the distribution (scaling with V F 2.3 /I) and for its normalization (V F 1.6 ). This last scaling relation follows exactly from the V and F dependence on the averaged number of languages, N L , shown in Fig. 3b.
Using these scaling relations we can estimate the parameters values for running a simulation that can reproduce the DPL of real data. In fact, these parameters correspond to the ones that allow the rescaling of the DPL of real data to collapse with the ones shown in Fig 4b. Taking F = 2 · 10 −5 from adjusting the broadness of the distribu- tion, the values of the others parameters are I = 1.5 · 10 3 , V = 5 · 10 11 . Unfortunately, these values correspond to simulations with too demanding computational time. For this reason, we are forced to just test the quality of the collapse of rescaled real data and the rescaled simulation, and we can not do a direct comparison. We carry out a simulation with the parameters F = 2 · 10 −5 , I = 10 7 and V = 3.2 · 10 12 . In Fig. 5, we can see how the DPL collapses well with our simulation if we neglect very small population sizes (L < 100). In addition, our model can reproduce the behavior of the fat tail well (which corresponds to frequently spoken languages).

Mean field
The increasing connection between people speaking different mother languages suggests exploring the behaviour of our model located on a topology with more links than those of a 1-dimensional array. For this reason, we decided to explore the other limiting situation: a fully connected model (mean field like description). Even if this is a quite unrealistic implementation, we would like to estimate the upper bound for our dynamics, corresponding to a fully globalized world. Figure 6 shows the normalized total population, N T ot /V , and the normalized language number, N L /V , for different V values. In this topology, both quantities approximately grow linearly with V and N T ot does not reach any saturation value. This novel behavior is due to the different form of carrying out the sum in the second term of Eq. 7. In fact, in this implementation, the sum is taken over all languages and For large values of V , the distribution approaches a fixed lognormal-like shape. In Fig 7, we show the data collapse for different values of the parameters V , I and F . Surprisingly, the scaling relations are really simple and all DPLs can be perfectly collapsed into each other. On the logarithmic scale, the width of the distribution remains fixed and it does not change when varying the parameters values. Even the Fdependence, leading to a different broadness of the DPL in the previous implementation, disappears. The distribution clearly shows two power law decays with exponents 1 and −2.
These results are quite intriguing. On the one hand, the fixed value of the x ~ x -2 distribution broadness does not allow the reproduction of the real data of actual language sizes. On the other hand, it suggests an interesting conjecture. If the increasing connection between people, in the future, will allow the description of their interaction using a mean field like approximation, our model may provide some predictions. Independently on the population growth, the width of the DPL distribution will get narrower. This means that, despite of the probable growth of the world population, more languages will become extinct than new languages appear. The number of less spoken languages will strongly decrease.

Conclusions
A careful analysis of the distribution of population sizes of languages (DPL) suggests that the behavior of its tail is better described by a power law. For this reason, a fit of the real data using the so-called double-Pareto lognormal distribution generates better results than the fit with a pure lognormal distribution.
These deviations from a pure lognormal distribution mean that simple models that reproduce exactly such distributions must be improved. We implemented a new toy model on a 1-dimensional topology that, starting from a macroscopic description at the sub-population level, accounts for two general mechanisms: language competition and fragmentation. These ingredients are sufficient to generate distributions that well reproduce real data, particularly the behavior of frequently spoken languages (the right tail of the distribution).
Moreover, we studied our model on a fully connected network, in which all subpopulations are in mutual contact (mean field like behavior). This implementation can be a good approximation for describing the trend for the interaction patterns on a future, characterized by more and more interconnected communities. This model version allows a simple prediction that confirms previous conjectures about the future mass extinction of languages [17]: despite of the probable growth of the world's population, more languages will become extinct than new languages will appear.
Finally, we want to recall that, as our toy model is based on simple and general rules, it may be characteristic for systems other than language evolution as well. In fact, systems with an interplay between a fragmentation process and a size dependent growth should exhibit similar patterns. A good example comes from the description of the growth of companies [32,33,34] or from the analysis of the mutual fund size distribution [35]. In these systems it is possible to describe the size distributions in terms of log-normal-like distributions In fact, as a zeroth order approximation, log-normal distributions can naturally be generated by a multiplicative growth processes, in which the company size, at any given time, is given as a multiplicative factor times the size of the company at a previous time. For the logarithm of the company size, this process becomes an additive process and so the distribution converges to a normal distribution, obeying the central limit theorem (Gibrat's law [36] and theory of breakage [37]). Small variations in this approach, as the introduction of some type of cutoff, can modify the lognormal distribution into a heavy tailed distribution [22]. This is obtained, for example, by introducing creation and annihilation processes [38,39]. At this level of description, none of the components of the random process depend on size. Anyway, it is well known that the growth rates of companies are size dependent [32,40,41] and once size effects are taken into account the predictions for the distribution can become more quantitative [35]. This description of the growth of companies is similar to our approach, in the sense that the dynamics of creation and annihilation in growing phenomena can be analogously described introducing the process of fragmentation as well as its counterpart, the phenomenon of coalescence. Our work points out the importance of taking into account these mechanisms, in modelling systems usually described in terms of birth and death processes or random growth [42].