Diversity in complex systems: relating parts of the distribution to the whole

Despite the utility of the Hill numbers, measuring diversity within complex systems and complex datasets, particularly regarding parts of a distribution to the whole, that is, different portions of diversity types relative to the entire system or dataset, is a challenging issue. In this paper, we attempt to relate the diversity of a part (or parts) of a distribution to the diversity of the whole. We derive this relationship for the Hill numbers $^qD$, and use these results to further examine the effect of a freely varying type on the diversity of the whole distribution.


Measuring diversity
As identified by Sornette (2009) and others (Newman 2005, Barabási 2009), probability distributions are the 'first quantitative characteristics' of both complex systems and complex datasets (Sornette 2009, p. 2). In terms of diversity (both with regard to species richness and evenness), this makes them highly useful, as measurements on a wide range of complex systems and complex datasets are well approximated by their shape, particularly as the sample size $n$, and with it the diversity, tends to infinity. For example, within the complexity sciences, the literature on measuring diversity is vast and includes a number of different mathematical formulations, such as the Gini–Simpson index, Shannon entropy and the Hill numbers (Jost 2006, Leinster and Cobbold 2012, Chao and Jost 2015, Hsieh et al 2016, Pavoine and Marcon 2016, Jost 2018). The Hill numbers and Rényi entropies are also important measures used to derive quantum uncertainty relations in quantum theory, as well as the maximal entropy principle in statistical inference (see Jizba et al 2015, Jizba and Korbel 2018 and references therein).
Across these measures, the Hill numbers $^qD$ provide a highly useful way to measure the diversity of a distribution: they compute the number of equiprobable types of a uniform distribution that maintains the same level of entropy (Rényi entropy, to be precise) (MacArthur 1965, Hill 1973, Peet 1974, Jost 2006, Gaggiotti et al 2018, Jost 2018). The parameter $q$ favours the types with lower frequencies if $0 < q < 1$ and the types with higher frequencies for $q > 1$. For $q = 1$, $^1D$ weights each type in proportion to its relative frequency of occurrence, and boils down to $e^H$, where $H$ is the Shannon entropy of the distribution. Still, despite the utility of the Hill numbers, measuring diversity within complex systems and complex datasets, particularly regarding parts of a distribution to the whole, that is, different portions of diversity types relative to the entire system or dataset, is a challenging issue. In this paper, we attempt to relate the diversity of a part (or parts) of a distribution to the diversity of the whole. We derive this relationship for the Hill numbers $^qD$, and use these results to further examine the effect of a freely varying type on the diversity of the whole distribution.
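To make the definition concrete, the following minimal sketch (ours, not from the paper; the function name `hill_number` is our own) computes $^qD$ directly from a probability vector, treating $q = 1$ as the limiting case $e^H$:

```python
import math

def hill_number(p, q):
    """Hill number (diversity of order q) of a probability vector p.

    For q != 1: D_q = (sum of p_i^q) ** (1 / (1 - q)).
    For q == 1: the limit is exp(H), where H is the Shannon entropy.
    """
    p = [x for x in p if x > 0]  # the convention 0 * log 0 = 0
    if q == 1:
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

# A uniform distribution over K types has diversity exactly K for every q.
uniform = [0.25] * 4
print(hill_number(uniform, 0), hill_number(uniform, 1), hill_number(uniform, 2))

# Any deviation from uniformity lowers the diversity below K.
skewed = [0.7, 0.1, 0.1, 0.1]
print(hill_number(skewed, 2) < 4)
```

Note how lower $q$ rewards rare types: for the skewed vector above, $^qD$ decreases monotonically as $q$ grows.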
Our interest in this issue is based on a series of papers we have written on diversity in complex systems and complex datasets, in which we introduced a modification of $^1D$ called case-based entropy $C_c$ to more effectively measure the true diversity of a dataset of study, including the restriction of diversity within complex systems (Rajaram and Castellani 2012, 2014). As a measure, $C_c$ is based on a modification of the Shannon–Wiener entropy $H$, and is inspired by Jost's paper on entropy and diversity (Jost 2006), in which he states that: 'In physics, economics, information theory, and other sciences, the distinction between the entropy of a system and the effective number of elements of a system is fundamental. It is this latter number, not the entropy, that is at the core of the concept of diversity in biology' (p. 363).
To our knowledge, such an approach, relative to the probability distributions of complex systems or complex datasets in general (particularly big data), has not been derived or explored before. As such, for us, Jost's quote (and, in turn, our development of the Shannon–Wiener entropy $H$ into $C_c$) provides one potentially useful way of advancing the statistical measurement of diversity within complex systems in general. In addition, we anticipate using our results to further develop case-based entropy and to derive new properties of the same.
Remark 1.1. We note that the choice of $q$ actually encodes the kind of correlation that exists between the parts $P_i$ of the distribution (Jizba and Korbel 2018). The choice $q = 1$ means that the parts are independent and the strong system-independence condition is satisfied, i.e., disjoint parts are independent. Other choices of $q$ correspond to a sub- or super-exponential relationship between the number of distinguishable states and distinguishable sub-systems, as found in strongly correlated systems. So, inherently, the choice of $q$ selects the kind of correlations that the partitions $P_i$ have, and it is with that caveat that we endeavour to establish the relationship between the diversity of the parts and the whole.

Purpose of current study
In terms of the current study, we consider a general probability distribution with a random variable $X$ as shown in table 1 (signifying different types or categories), where $x_i$ denotes the $i$-th type, with probability $p_i$ and frequency $f_i$. We ask the following question: if $P_i$ denotes a partition of the indices $\{1, \ldots, K\}$ and $^qD_{P_i}$ represents the diversity of that partition, what is the relationship between $^qD_{P_i}$ and the diversity of all $K$ types given by $^qD_K$? And in particular, how does $^qD_K$ change if the probability $p_K$ of a single type (say the $K$-th type) varies from 0 to 1? In other words, we are interested in the functional relationship $^qD_K(p_K)$ as $p_K$ varies from 0 to 1. These results also address a key aspect of diversity and its relationship with the probability (or relative frequency) of the free type; namely, they show that as $p_K$ increases, there exists an optimal value of $p_K$ at which the maximal diversity, equal to one more than the diversity of the known types, i.e., $^qD_K = {}^qD_{K-1} + 1$, is attained, after which the diversity starts to decrease. Moreover, there exists a larger probability $\hat{p}_K$ at which the diversity of all $K$ types is again equal to the diversity of the $K-1$ types; in some sense, at $p_K = \hat{p}_K$, it is as though the $K$-th type does not appear to exist. For all probabilities $\hat{p}_K < p_K < 1$ the diversity keeps decreasing, and when $p_K = 1$ the $K$-th type dominates entirely, leading to a total diversity of 1, i.e., $^qD_K = 1$.
We have organized the paper as follows. In section 2, we give a brief introduction to the idea of the measurement of diversity. In section 3, we prove the two main theorems in the paper, for $^1D_K$ and for $^qD_K$, $q \neq 1$, respectively. In section 4, we use the results from section 3 to investigate the function $^qD_K(p_K)$. We finally conclude with a summary in section 5. At the outset, we mention that superscripts on the left are labels for diversities, distinguishing the choices of $q$ amongst the Hill numbers. We attach an appendix to clarify some of the notation used in the paper.

A formal introduction to diversity
Diversity, as a measure, counts the 'richness' (number of types) of a distribution in relation to the 'evenness' (equal probability of occurrence) of its diversity types (MacArthur 1965, Hill 1973, Peet 1974, Jost 2006, Rajaram and Castellani 2012, 2014). The intuition behind this definition is that if all of the types in the distribution occur with the same probability, then diversity should simply be equal to the number of types $K$; and any deviation from uniformity in probabilities will always lead to a lower value of diversity.

Table 1. General dataset with complexity types $x_i$, each having a probability $p_i$ and a frequency $f_i$.
Definition 2.1. Given an ordered set of types numbered $i \in \mathbb{N}$ and their corresponding probabilities $p_i$, the diversity $^qD_K$ of the entire distribution for some complex system or dataset is defined as the number of equiprobable types needed to yield the same value of the entropy $H_q$.
Shannon entropy is defined as below:

$H = -\sum_{i=1}^{K} p_i \ln p_i$.   (1)

Rényi entropy is defined as below:

$H_q = \frac{1}{1-q} \ln \sum_{i=1}^{K} p_i^q, \quad q \neq 1$.   (2)

It was shown (MacArthur 1965, Hill 1973, Peet 1974, Jost 2006, Rajaram and Castellani 2012, 2014) that definition 2.1 implies that the total diversity $^qD_K$ is given by:

${}^1D_K = e^H = \exp\left(-\sum_{i=1}^{K} p_i \ln p_i\right)$,   (3)

${}^qD_K = \left(\sum_{i=1}^{K} p_i^q\right)^{\frac{1}{1-q}}, \quad q \neq 1$.   (4)

Furthermore, we denote the part of the distribution with indices in order from 1 to $i$ by $\{1, i\}$. The partial diversity for such a part can be written as follows:

${}^1D_{\{1,i\}} = \exp\left(-\sum_{r=1}^{i} \frac{p_r}{c_i} \ln \frac{p_r}{c_i}\right), \qquad {}^qD_{\{1,i\}} = \left(\sum_{r=1}^{i} \left(\frac{p_r}{c_i}\right)^q\right)^{\frac{1}{1-q}},$   (5)

where $c_i = \sum_{r=1}^{i} p_r$ and the $p_r/c_i$ are the marginal probabilities. We note that equations (3)–(5) can be rewritten in terms of the frequencies $f_i$, with $N = \sum_{i=1}^{K} f_i$ and $F_i = \sum_{r=1}^{i} f_r$, as below:

${}^1D_{\{1,i\}} = \exp\left(-\sum_{r=1}^{i} \frac{f_r}{F_i} \ln \frac{f_r}{F_i}\right)$,   (6)

${}^qD_{\{1,i\}} = \left(\sum_{r=1}^{i} \left(\frac{f_r}{F_i}\right)^q\right)^{\frac{1}{1-q}}, \quad q \neq 1$,   (7)

with the total diversities obtained by taking $i = K$. For an arbitrary disjoint partition $P_i$ with indices not necessarily in order (we note that $\{1, i\}$ has indices in order from 1 to $i$), the partial diversities in equation (5) can be rewritten as follows, with $^qD_{P_i}$ denoting the diversity of the partition $P_i$:

${}^1D_{P_i} = \exp\left(-\sum_{r \in P_i} \frac{p_r}{c_{P_i}} \ln \frac{p_r}{c_{P_i}}\right)$,   (8)

${}^qD_{P_i} = \left(\sum_{r \in P_i} \left(\frac{p_r}{c_{P_i}}\right)^q\right)^{\frac{1}{1-q}}, \quad q \neq 1$,   (9)

where $c_{P_i} = \sum_{r \in P_i} p_r$. Consequently, the partial diversities in terms of frequencies in equations (6) and (7) can also be rewritten for arbitrary partitions with the appropriate notation:

${}^qD_{P_i} = \left(\sum_{r \in P_i} \left(\frac{f_r}{F_{P_i}}\right)^q\right)^{\frac{1}{1-q}}, \quad \text{with } F_{P_i} = \sum_{r \in P_i} f_r \ \text{(and analogously for } q = 1\text{)}.$   (10)
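As a numerical sketch (the helper names below are our own, not from the paper), the total diversity and the partial diversity of a part are computed by renormalising the part by its cumulative probability, as in equations (3)–(5):

```python
import math

def hill_number(p, q):
    """Diversity of order q of a probability vector (cf. equations (3), (4))."""
    p = [x for x in p if x > 0]
    if q == 1:
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

def partial_diversity(p, part, q):
    """Diversity of the part with the given index set (cf. equation (5)):
    renormalise the part's probabilities by its cumulative probability c."""
    c = sum(p[r] for r in part)
    return hill_number([p[r] / c for r in part], q)

p = [0.4, 0.3, 0.2, 0.1]
total = hill_number(p, 1)                # diversity of all four types
part = partial_diversity(p, [0, 1], 1)   # diversity of the first two types only
print(total, part)
```

The partial diversity of the first two types comes out close to 2 here, because their marginal probabilities (4/7 and 3/7) are nearly even.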

Parts of the diversity distribution to the whole
Theorem 3.1. Given a probability distribution similar to table 1, the diversity $^qD_K$ of the entire distribution for some complex system or dataset, the diversities $^qD_{P_i}$ of disjoint parts $P_1, \ldots, P_m$, and their respective cumulative probabilities $c_{P_i} = \sum_{r \in P_i} p_r$ are related as follows:

${}^1D_K = \prod_{i=1}^{m} \left(\frac{{}^1D_{P_i}}{c_{P_i}}\right)^{c_{P_i}}$,   (11)

${}^qD_K = \left(\sum_{i=1}^{m} c_{P_i}^{\,q} \left({}^qD_{P_i}\right)^{1-q}\right)^{\frac{1}{1-q}}, \quad q \neq 1$.   (12)

Proof 3.1.
• The case $q = 1$: Think of the given distribution as a pooling of its disjoint parts, and rename the part $P_i$ as the part $l$. The probability of the part of the distribution corresponding to indices in $P_l$ is then $p_l = \sum_{r \in P_l} p_r = c_{P_l}$, the sum of all the probabilities of the individual types comprising the part, and the marginal probabilities within the part are $p_r / c_{P_l}$ for $r \in P_l$. The Shannon entropy of the whole therefore splits as

$H = -\sum_{l=1}^{m} \sum_{r \in P_l} p_r \ln p_r = -\sum_{l=1}^{m} c_{P_l} \ln c_{P_l} + \sum_{l=1}^{m} c_{P_l} H_{P_l}$,

where $H_{P_l}$ is the Shannon entropy of the marginal distribution of the part $P_l$. Exponentiating both sides and using ${}^1D_{P_l} = e^{H_{P_l}}$ yields equation (11).
• The case $q \neq 1$: The idea of the proof is the same, i.e., start with the pooling of $m$ disjoint parts, with the ${}^1D_{P_l}$'s now replaced by ${}^qD_{P_l}$'s. However, the calculation for the pooling is different:

$\sum_{r=1}^{K} p_r^q = \sum_{l=1}^{m} c_{P_l}^{\,q} \sum_{r \in P_l} \left(\frac{p_r}{c_{P_l}}\right)^q = \sum_{l=1}^{m} c_{P_l}^{\,q} \left({}^qD_{P_l}\right)^{1-q},$

and raising both sides to the power $\frac{1}{1-q}$ gives equation (12), by the same renaming convention, i.e., $P_i$ as the part $l$, as in the $q = 1$ case.
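The identities in equations (11) and (12) can be checked numerically. The sketch below (our own function names, not from the paper) pools an arbitrary disjoint split and compares the result against the directly computed total diversity:

```python
import math

def hill(p, q):
    """Diversity of order q of a probability vector."""
    p = [x for x in p if x > 0]
    if q == 1:
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

def pooled_diversity(parts, q):
    """Recover the total diversity from the parts alone, via theorem 3.1.

    `parts` is a list of lists of raw probabilities; c_i is each part's
    cumulative probability and D_i its partial (renormalised) diversity.
    """
    cs = [sum(part) for part in parts]
    Ds = [hill([x / c for x in part], q) for part, c in zip(parts, cs)]
    if q == 1:  # equation (11): product of (D_i / c_i)^(c_i)
        return math.prod((D / c) ** c for D, c in zip(Ds, cs))
    # equation (12): (sum of c_i^q * D_i^(1-q))^(1/(1-q))
    return sum(c ** q * D ** (1 - q) for D, c in zip(Ds, cs)) ** (1 / (1 - q))

p = [0.30, 0.25, 0.20, 0.15, 0.10]
parts = [p[:2], p[2:]]  # any disjoint split of the indices works
for q in (0.5, 1, 2, 3):
    assert abs(hill(p, q) - pooled_diversity(parts, q)) < 1e-12
```

Because the underlying algebraic identity is exact, the agreement holds to machine precision for every split and every choice of $q$.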
Remark 3.1. If we consider each part $P_i$ in the derivation of the above theorem to be exactly one type, i.e., $P_i = \{i\}$ so that $^qD_{P_i} = 1$ and $c_{P_i} = p_i$, then equation (11) reduces to equation (3), and equation (12) reduces to equation (4).
Remark 3.2. We can restrict ourselves to a portion of the distribution, say from $l = 1$ to $l = k$. Then theorem 3.1 remains true for any sub-partition of such a restriction. In other words, the result is true for a part of a part of a distribution, and for all nested sub-parts. In this case the probabilities have to be renormalised as $p_l \mapsto p_l / c_{P_k}$ and $c_l \mapsto c_l / c_{P_k}$, i.e., all probabilities and cumulative probabilities should be divided by the sum of all probabilities in the partition $P_k$. So this result is self-similar in nature. We also note that whenever all but one of the diversities appearing in equation (11) or (12) are known, the remaining (total or partial) diversity can be computed; this can easily be checked numerically. Since entropy is the natural log of diversity, and diversity is the exponential of entropy, any result related to diversity is also a result related to entropy and vice versa. We finish this section by stating a corollary of theorem 3.1 which relates the entropy of parts of a distribution to the total entropy.

Remark 3.3 (Example). Let us consider a probability distribution of the form in table 1, split into disjoint parts. Computing the partial diversities and the cumulative probabilities of the parts, one can check numerically that equations (11) and (12) recover the total diversity, and that any one unknown diversity (total or partial) can be recovered from the others.
Corollary 3.1. Given a probability distribution similar to table 1, the entropy $H_q^K$ of the entire distribution for some complex system or dataset, and the entropies $H_q^{P_i}$ of disjoint parts with respective cumulative probabilities $c_{P_i}$, are related as follows:

$H^K = \sum_{i=1}^{m} c_{P_i} H^{P_i} - \sum_{i=1}^{m} c_{P_i} \ln c_{P_i}$,   (13)

$H_q^K = \frac{1}{1-q} \ln\left(\sum_{i=1}^{m} c_{P_i}^{\,q}\, e^{(1-q) H_q^{P_i}}\right), \quad q \neq 1$.   (14)

Proof 3.2. The proof follows by taking the natural logarithm of equations (11) and (12), respectively.
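Before moving on, the self-similar property of remark 3.2 can also be checked numerically. The sketch below (our own code and variable names) restricts to a part, renormalises, and verifies that equation (11) holds one level down, for sub-parts of that part:

```python
import math

def hill(p, q):
    """Diversity of order q of a probability vector."""
    p = [x for x in p if x > 0]
    if q == 1:
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

# Restrict to a part of the distribution, renormalise by its cumulative
# probability, and apply theorem 3.1 to a partition of that part.
p = [0.30, 0.25, 0.20, 0.15, 0.10]
part = p[:3]
restricted = [x / sum(part) for x in part]    # renormalised part
sub_parts = [restricted[:1], restricted[1:]]  # a disjoint split of the part

cs = [sum(s) for s in sub_parts]
Ds = [hill([x / c for x in s], 1) for s, c in zip(sub_parts, cs)]
pooled = math.prod((D / c) ** c for D, c in zip(Ds, cs))

# Equation (11) holds for the part and its sub-parts: same law, one level down.
assert abs(pooled - hill(restricted, 1)) < 1e-12
```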

Diversity and the probability of the free type
As the reader may recall, our results also address a key aspect of diversity and its relationship with the probability (or relative frequency) of the free type. To demonstrate, we apply theorem 3.1 to investigate the effect of changing the probability of the last type $K$ on the total diversity $^qD_K$. Changing the probability $p_K$ means renormalising the probabilities of the remaining $K-1$ types so that the total probability equals 1. We note that, equivalently, we could simply rewrite all formulas in terms of actual frequencies and change the frequency $f_K$ of the $K$-th type. Using probabilities, however, allows us to draw graphs on the interval between 0 and 1, which is more convenient (and visually useful) for comparing the variations across different choices of $q$.
We state a corollary to theorem 3.1, which can be easily proved by choosing the partition $P_1 = \{1, \ldots, K-1\}$, $P_2 = \{K\}$.

Corollary 4.1. Given a probability distribution like in table 1, the diversity $^qD_{K-1}$ of the first $K-1$ types, the diversity $^qD_K$ of the whole distribution and the probability $p_K$ of the $K$-th type are related as follows:

${}^1D_K = \left(\frac{{}^1D_{K-1}}{1-p_K}\right)^{1-p_K} \left(\frac{1}{p_K}\right)^{p_K}$,   (15)

${}^qD_K = \left((1-p_K)^q \left({}^qD_{K-1}\right)^{1-q} + p_K^{\,q}\right)^{\frac{1}{1-q}}, \quad q \neq 1$.   (16)

Proof 4.1. The proof follows by applying theorem 3.1 to the partition $P_1 = \{1, \ldots, K-1\}$, $P_2 = \{K\}$, for which $c_{P_1} = 1 - p_K$, $c_{P_2} = p_K$ and ${}^qD_{P_2} = 1$.

Remark 4.1. It is to be noted that even though equations (15) and (16) show the explicit relationship between $^qD_K$ and $p_K$ for all $p_K \in (0, 1)$, the partial diversity $^qD_{K-1}$ is still completely independent of $p_K$. This is because, while computing the marginal probabilities $p_r / (1 - p_K)$ of the first $K-1$ types, the renormalisation cancels any dependence on $p_K$.

Equations (15) and (16) give us a direct relationship between the probability of the $K$-th type and the total diversity. For the specific choice of $K = 6$ and $^qD_5 = 5$, we plot the function $^qD_K(p_K)$ for various choices of $q$. We note that the choice of $K$ and $^qD_{K-1}$ has no bearing on the shape of the functional relationships that we are discussing in this section.
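Equations (15) and (16) are easy to evaluate numerically. The sketch below (the function name `d_total` is our own) reproduces the $K = 6$, $^qD_5 = 5$ setting and locates the maximum by a simple grid scan:

```python
def d_total(d_prev, pK, q):
    """Total diversity of K types from the diversity d_prev of the first
    K-1 types and the probability pK of the free K-th type
    (cf. equations (15) and (16))."""
    if pK == 0:
        return float(d_prev)
    if pK == 1:
        return 1.0
    if q == 1:
        return (d_prev / (1 - pK)) ** (1 - pK) * (1 / pK) ** pK
    return ((1 - pK) ** q * d_prev ** (1 - q) + pK ** q) ** (1 / (1 - q))

# With d_prev = 5 (as in the K = 6 example), scan pK over (0, 1):
d_prev = 5.0
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda pK: d_total(d_prev, pK, 1))
print(best, d_total(d_prev, best, 1))  # maximum close to d_prev + 1 = 6
```

With $^1D_5 = 5$ the five known types behave like five equiprobable ones, so the maximum 6 is attained near $p_K = 1/6$, i.e., where all six effective types are equally likely.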
We make the following observations from the graph of $^qD_K(p_K)$:
(1) Maximum: The maximum value of $^qD_K$ is $^qD_{K-1} + 1$, i.e., the $K$-th type is counted as at most 1 at the maximum. We note that this can be proved separately by calculus-based techniques (see theorem A.1 in the appendix).
(2) Start: When $p_K = 0$, the diversity satisfies $^qD_K(0) = {}^qD_{K-1}$, as expected, i.e., in the absence of the $K$-th type, the diversity is simply equal to the diversity of the remaining $K-1$ types. This is also equivalent to the Shannon–Khinchin axiom 2 (Jizba and Korbel 2019).
(3) Point of diminishing return: We note that there exists a probability $0 < \hat{p}_K < 1$ with ${}^qD_K(\hat{p}_K) = {}^qD_{K-1}$, i.e., there exists a probability of the $K$-th type at which the total diversity of the $K$ types is equal to the diversity of the $K-1$ types. So, in a sense, at $p_K = \hat{p}_K$ it is as though the $K$-th type does not add to the total diversity at all. For all probabilities $\hat{p}_K < p_K < 1$, the diversity decreases to a value less than the diversity of the $K-1$ types. Hence, for $p_K > \hat{p}_K$, the addition of the $K$-th type actually starts to decrease the total diversity, which justifies our terminology that $p_K = \hat{p}_K$ is the point of diminishing returns. Another way to state this is to say that the $K$-th type starts to dominate the total diversity for $p_K > \hat{p}_K$. There is no explicit analytical form for $\hat{p}_K$ in general, as it is the solution of an implicit nonlinear equation; however, its existence can be proved using calculus-based techniques (see corollary A.1 in the appendix).
(4) Variation of the point of diminishing return: As $q$ increases, the point of diminishing return $\hat{p}_K$ gets monotonically smaller.
(5) Asymptotic behaviour as $p_K \to 1$: As $p_K \to 1$, the $K$-th type completely dominates and the diversity satisfies ${}^qD_K \to 1$, i.e., it is as though the previous $K-1$ types do not even exist! This is also equivalent to the Shannon–Khinchin axiom 2 (Jizba and Korbel 2019).
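The point of diminishing returns $\hat{p}_K$ can be found numerically. The sketch below (our own code, not from the paper) bisects on the decreasing branch to the right of the maximum, and also illustrates observation (4): $\hat{p}_K$ shrinks as $q$ grows (for $q = 2$ and $^2D_5 = 5$ it happens to be exactly $1/3$):

```python
def d_total(d_prev, pK, q):
    """Total K-type diversity as a function of pK (cf. equations (15), (16))."""
    if q == 1:
        return (d_prev / (1 - pK)) ** (1 - pK) * (1 / pK) ** pK
    return ((1 - pK) ** q * d_prev ** (1 - q) + pK ** q) ** (1 / (1 - q))

def diminishing_return_point(d_prev, q, lo, hi=1 - 1e-9, tol=1e-12):
    """Bisect for the probability where the total diversity falls back to
    d_prev; `lo` must lie to the right of the maximum, where the curve
    is decreasing from d_prev + 1 down to 1."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if d_total(d_prev, mid, q) > d_prev:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# d_prev = 5, as in the K = 6 example; the maximum sits at pK = 1/6.
p_hats = [diminishing_return_point(5.0, q, lo=1 / 6) for q in (1, 2, 3)]
for q, p_hat in zip((1, 2, 3), p_hats):
    assert abs(d_total(5.0, p_hat, q) - 5.0) < 1e-6

# Observation (4): the point of diminishing return decreases with q.
assert p_hats[0] > p_hats[1] > p_hats[2]
assert abs(p_hats[1] - 1 / 3) < 1e-6  # q = 2 solves (1-p)^2 + 5 p^2 = 1
```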

Conclusion
In this paper, relative to the distribution of diversity in complex systems and datasets, we have derived an explicit statistical relationship between the diversity of part of a distribution and the whole. Relative to our findings, the following are our conclusions:
(1) Main result: Theorem 3.1, through equations (11) and (12), provides an explicit functional relationship between the diversity $^qD_{P_i}$ of a partition $P_i$ of a distribution and the total diversity $^qD_K$ of the $K$ types. To our knowledge, this is the first time such a relationship has been derived using this approach.
(2) At the bottom of the hierarchy, if we consider each type as a part itself, then we recover the original formulas given by equations (3) and (4).
(3) Self-similar nature: The relationship between the diversity of the parts and the whole is self-similar in nature, i.e., the same relationship also exists between a part of the distribution and its own parts, as explained in remark 3.2.
(4) Entropy of parts: The relationship between the diversity of the parts and the whole leads to a similar fractal relationship between the entropy of the parts and the whole, as seen in corollary 3.1.
(5) Usefulness: The main result of the paper (theorem 3.1) can be used to explore the relationship between the probability of occurrence of a part of a distribution and the diversity of the whole distribution for a variety of complex systems and datasets. As an example, we have shown how it can be used to explore the dependence of the total diversity $^qD_K$ of $K$ types on the probability of the $K$-th type, finding interesting features along the way, such as the point of diminishing returns. Complexity is all about relationships (or the lack thereof) between parts and whole. Theorem 3.1, corollary 3.1 and corollary 4.1 explicitly relate the diversity (or entropy) of parts of a distribution to the diversity of the entire distribution. This is important because it allows us to directly compute the change in the diversity of the entire distribution if the diversity of the parts changes due to complex internal mechanisms.
(6) Future work: In our future work, we will try to use the main result of this paper to explore the relationship between the case-based entropy of a given distribution and its shape. We believe this will be a useful step towards using case-based entropy as a tool to further explore the regions of a distribution that contribute to the inequality of diversity. For example, in a variety of probability distributions, it could prove useful to pinpoint and quantify the contribution of parts of a distribution to the overall diversity.
Appendix A. Proof of graph properties

Appendix B. Notation
(1) K: The number of types in a distribution.
(2) $^qD_K$: Diversity of the entire distribution, i.e., of all $K$ types.
(3) $P_i$: An ascending disjoint partition of the set of indices $\{1, \ldots, K\}$ such that the parts satisfy $i < j \Rightarrow \max P_i < \max P_j$. In other words, the partition preserves the ordering of the numbers $\{1, \ldots, K\}$. In particular, the member $\{i, i+1, \ldots, j\}$ denotes the types in the distribution between indices $i$ and $j$ and will be denoted by $\{i, j\}$.
(4) $^qD_{P_i}$: Diversity of the part of the distribution corresponding to the indices in $P_i$.