Statistical Divergence and Paths Thereof to Socioeconomic Inequality and to Renewal Processes

This paper establishes a general framework for measuring statistical divergence. Namely, with regard to a pair of random variables that share a common range of values: quantifying the distance of the statistical distribution of one random variable from that of the other. The general framework is then applied to the topics of socioeconomic inequality and renewal processes. The general framework and its applications are shown to yield and to relate to the following: f-divergence, Hellinger divergence, Renyi divergence, and Kullback–Leibler divergence (also known as relative entropy); the Lorenz curve and socioeconomic inequality indices; the Gini index and its generalizations; the divergence of renewal processes from the Poisson process; and the divergence of anomalous relaxation from regular relaxation. Presenting a ‘fresh’ perspective on statistical divergence, this paper offers its readers a simple and transparent construction of statistical-divergence gauges, as well as novel paths that lead from statistical divergence to the aforementioned topics.


Introduction
Measuring distances is of foundational importance in all fields of science and engineering. Arguably, measuring distances emerged with regard to points in planar geometry. Elevating from points in the plane to points in general spaces (e.g., Hilbert, Banach, and metric spaces) facilitates measuring distances between very general objects.
Even in the basic case of planar geometry, there are various ways of measuring distances. To illustrate this, envisage Manhattan (above 14th Street). From an aerial perspective, the distance between two addresses in Manhattan is the Euclidean distance. From a pedestrian perspective, the distance between two addresses in Manhattan is the grid distance, which is attained by walking along avenues and streets (as pedestrians cannot walk through buildings).
Measuring distances is not at all confined to geometry, be it in the plane or in general spaces. Indeed, consider a human society of interest and the distribution of wealth among its members. In a purely egalitarian society, the distribution of wealth is perfectly equal: all the members have exactly the same wealth. Of course, no real human society is purely egalitarian. A key topic in economics and in the social sciences is quantifying socioeconomic inequality [1-3]. Namely, measuring the distance of the human society of interest from the 'benchmark state' of perfect equality. This topic is addressed by important notions including the Lorenz curve [4-7] and socioeconomic inequality indices [1-3,8,9].
This paper establishes a general framework for measuring statistical divergence. The general framework is presented in Section 2, and it has a high 'return on investment'. Indeed, on the one hand, the framework is based on a simple and transparent construction. And, on the other hand, the framework yields potent gauges of statistical divergence. In particular, the framework leads to the aforementioned divergences. With the general framework established, the paper presents two applications of it.
Section 3 applies the general framework to the measurement of socioeconomic inequality. In particular, this application will show how the framework yields the aforementioned notions of the Lorenz curve and socioeconomic inequality indices, as well as the Gini index (perhaps the best known and the most widely applied socioeconomic inequality index [43-48]) and its generalizations.
Section 4 applies the general framework to renewal processes [49-51]. In particular, this application will show how the framework facilitates measuring the divergence of renewal processes from the Poisson process [52-54], and measuring the divergence of anomalous relaxation [55-58] from 'regular' exponential relaxation. In addition, this application will yield further relations to socioeconomic inequality indices, as well as further generalizations of the Gini index.
Each of the Sections 2-4 ends with a short summary. For a quick read of the paper, readers can go over these summaries.
The novelty of this paper is twofold. Firstly, the paper offers its readers a direct and transparently constructed path to statistical divergence. Secondly, building on that path, the paper offers its readers further paths to socioeconomic inequality, to renewal processes, and to anomalous relaxation. The paper illuminates profound linkages, which are not straightforwardly apparent, between the different (and seemingly unrelated) topics it addresses.
The paper is written in a self-contained fashion, and its prerequisites are basic calculus and basic probability. Thus, the paper is suitable for a wide range of audiences: theoreticians and practitioners alike, from diverse fields of science and engineering.
A note about notation. Throughout this paper, IID is an acronym for "independent and identically distributed" (random variables), and E[·] denotes expectation (namely, E[Z] is the mean of a given random variable Z).

General Framework
Consider a random variable X that takes values in a real range R = (r_low, r_up), where r_low is the range's lower bound, r_up is the range's upper bound, and −∞ ≤ r_low < r_up ≤ ∞. Further consider a random variable Y that is independent of X and that takes values in the closed range [r_low, r_up]. Namely, in addition to values in the range R, the random variable Y can also attain the lower-bound value r_low and the upper-bound value r_up.
The goal of this paper is to measure the statistical divergence of the random variable Y from the random variable X. In other words, the goal is to quantify the extent by which the statistical distribution of the random variable Y deviates from the statistical distribution of the random variable X.
Henceforth, the distribution functions of X and Y are denoted, respectively, by A(r) = Pr(X ≤ r) (r ∈ R) and by B(r) = Pr(Y ≤ r) (r ∈ R). The distribution function A(r) is assumed to be increasing, and hence it has an increasing inverse function A^{-1}(u) (0 < u < 1). In addition, the probability that the random variables X and Y coincide is assumed to be zero, Pr(X = Y) = 0; this assumption holds whenever the distribution function A(r) is continuous.
We shall measure the statistical divergence of Y from X via the curve

C(u) = B(A^{-1}(u))    (0 < u < 1).    (1)

It is straightforward to observe that the curve C(u) is non-decreasing, and that its boundary values are lim_{u→0} C(u) = 0 and lim_{u→1} C(u) = 1. These observations imply that the curve C(u) manifests a distribution function over the unit interval. Specifically, C(u) = Pr(U ≤ u), where U is a random variable that takes values in the unit interval [0, 1].
In the case that the distribution functions of X and Y are smooth, their corresponding density functions are denoted, respectively, by a(r) = A′(r) (r ∈ R) and by b(r) = B′(r) (r ∈ R). In turn, differentiating Equation (1) yields

C′(u) = b(A^{-1}(u)) / a(A^{-1}(u))    (0 < u < 1).    (2)

Namely, C′(u) is the density function of the random variable U. (As the distribution function A(r) is assumed to be increasing, the density a(r) is positive, and hence: the denominator appearing on the right-hand side of Equation (2) is positive, and the ratio is well-defined.)
Statistical distributions that are defined over the unit interval have a natural 'benchmark': the uniform distribution, the unique statistical distribution that assigns all possible outcomes the same likelihood of occurrence. The uniform distribution is characterized by the linear distribution function C*(u) = u (0 < u < 1), as well as by the flat density function C*′(u) = 1 (0 < u < 1). In what follows, we set U* to be a random variable that is uniformly distributed over the unit interval, Pr(U* ≤ u) = u.
Equation (1) implies that the random variables X and Y are equal in law, A(r) ≡ B(r), if and only if the random variables U and U* are equal in law, C(u) ≡ u. Consequently, the statistical divergence of Y from X can be measured 'by proxy' as follows: quantifying the deviation of the statistical distribution of the random variable U from the uniform distribution. We shall do so using three methods: area (Section 2.1); moments (Section 2.2); and coincidence (Section 2.4).
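As a concrete illustration of this construction, the following sketch builds the curve C(u) = B(A^{-1}(u)) and checks that it behaves as a distribution function over the unit interval; the exponential laws of X and Y are assumptions made purely for the example.

```python
import math

# A sketch of the curve-based construction, assuming (for this example only)
# X ~ Exponential(rate 1) and Y ~ Exponential(rate 1/2) on R = (0, infinity).

def A(r):        # distribution function of X
    return 1.0 - math.exp(-r)

def A_inv(u):    # increasing inverse of A
    return -math.log(1.0 - u)

def B(r):        # distribution function of Y
    return 1.0 - math.exp(-r / 2.0)

def C(u):        # the curve C(u) = B(A^{-1}(u)); here C(u) = 1 - (1 - u)^(1/2)
    return B(A_inv(u))

# C(u) manifests a distribution function over the unit interval:
# non-decreasing, with C(u) -> 0 as u -> 0 and C(u) -> 1 as u -> 1.
grid = [i / 100.0 for i in range(1, 100)]
values = [C(u) for u in grid]
monotone = all(values[i] <= values[i + 1] for i in range(len(values) - 1))
```

Here C(u) visibly deviates from the diagonal C*(u) = u, reflecting the fact that Y is not equal in law to X.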

Area
The graphs of distribution functions that are defined over the unit interval reside in the unit square, where they 'climb' from the square's bottom-left corner to its top-right corner. Namely, the 2D coordinates of the unit-square's bottom-left corner are (0, 0), and the 2D coordinates of the top-right corner are (1, 1). The distribution function of the uniform distribution, C*(u) = u (0 < u < 1), is the diagonal line of the unit square. This line splits the unit square into two triangles: the triangle below the diagonal line, whose area is 1/2; and the triangle above the diagonal line, whose area is also 1/2. Thus, the difference between the area of the upper triangle and the area of the lower triangle is zero.
Similarly to the diagonal line, the curve C(u) splits the unit square into two sets: the 'lower set', comprising the square's points that are below the curve; and the 'upper set', comprising the square's points that are above the curve. A derivation detailed in Appendix A asserts that the area of the lower set is Pr(Y ≤ X). In turn, as the square's total area is one, the area of the upper set is 1 − Pr(Y ≤ X) = Pr(Y > X). Thus, the difference between the area of the upper set and the area of the lower set is the quantity

∆(Y||X) = Pr(Y > X) − Pr(Y ≤ X).    (3)

With regard to Equation (3), recall that the random variables X and Y are considered to be mutually independent. In terms of the curve C(u), the quantity of Equation (3) admits the form ∆(Y||X) = 2 ∫_0^1 [u − C(u)] du.
We illustrate the fact that a zero quantity ∆(Y||X) does not imply that the random variables X and Y are equal in law. To that end, consider the random variables X and Y to be symmetric: X = −X and Y = −Y, the equalities being in law. The symmetry of the random variables X and Y implies that Pr(Y > X) = Pr(Y ≤ X) (this implication also uses the aforementioned assumption Pr(Y = X) = 0), and hence ∆(Y||X) = 0. However, the symmetry of X and Y does not imply that these random variables are equal in law.
The quantity ∆(Y||X) is a 'first-order measurement' of the statistical divergence of the random variable Y from the random variable X. Indeed, the quantity ∆(Y||X) does not characterize the case of a zero statistical divergence, i.e., the case where the random variables X and Y are equal in law. In the next subsection, we shall elevate from the 'first-order measurement' ∆(Y||X) to 'higher-order measurements'.
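The symmetric example above can be checked by simulation. The sketch below estimates ∆(Y||X) by Monte Carlo; the specific laws (a standard normal X and a uniform Y on (−1, 1)) are assumptions made for the illustration.

```python
import random

# Monte Carlo sketch of Delta(Y||X) = Pr(Y > X) - Pr(Y <= X) for independent
# X and Y. The laws are assumptions of this example: X standard normal, Y
# uniform on (-1, 1). Both are symmetric about zero, so Delta(Y||X) = 0 even
# though X and Y are not equal in law.

random.seed(7)
n = 200_000
hits = 0                         # counts the events {Y > X}
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    y = random.uniform(-1.0, 1.0)
    if y > x:
        hits += 1

p_greater = hits / n             # estimates Pr(Y > X)
delta = p_greater - (1.0 - p_greater)
```

The estimate comes out near zero, illustrating that ∆(Y||X) = 0 does not force X and Y to be equal in law.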

Moments
Consider a random variable ξ that takes values in the unit interval, and whose statistical distribution is governed by the density function ϕ(u) (0 < u < 1). The moments of the random variable ξ are the sequence E[ξ^m] = ∫_0^1 u^m ϕ(u) du (m = 1, 2, 3, ...). The Hausdorff moment problem asserts that the moments of the random variable ξ characterize its statistical distribution [59]. Namely, if another random variable, which also takes values in the unit interval, has the same moments as ξ, then the two random variables are equal in law.
In this subsection we shall apply the Hausdorff moment problem in order to measure the statistical divergence of the random variable Y from the random variable X. In what follows, {X_1, ..., X_m} denote m IID copies of the random variable X; these IID copies are considered to be independent of the random variable Y.
As above, U* denotes a random variable that is uniformly distributed over the unit interval. As the density function of the uniform distribution is flat, the moments of U* are E[(U*)^m] = 1/(m + 1). A derivation detailed in Appendix A asserts that the moments of the random variable U are E[U^m] = Pr(X_1, ..., X_m ≤ Y). Consequently, multiplying the difference of moments E[U^m] − E[(U*)^m] by the factor m + 1 yields the quantity

α_m(Y||X) = (m + 1) · Pr(X_1, ..., X_m ≤ Y) − 1.    (4)

The quantity α_m(Y||X) involves the probability p_m = Pr(X_1, ..., X_m ≤ Y). Being a probability, p_m takes values in the unit interval [0, 1], and hence the quantity α_m(Y||X) takes values in the range [−1, m]. Moreover, the quantity α_m(Y||X) attains its lower bound −1 if and only if p_m = 0, i.e., if and only if the random variable Y equals its lower bound r_low with probability one: α_m(Y||X) = −1 ⇔ Pr(Y = r_low) = 1. And, antithetically, the quantity α_m(Y||X) attains its upper bound m if and only if p_m = 1, i.e., if and only if the random variable Y equals its upper bound r_up with probability one: α_m(Y||X) = m ⇔ Pr(Y = r_up) = 1.
As the random variables U* and U take values in the unit interval, so do the random variables 1 − U* and 1 − U. The random variable 1 − U* is uniformly distributed, and hence its moments are E[(1 − U*)^m] = 1/(m + 1). A derivation detailed in Appendix A asserts that the moments of the random variable 1 − U are E[(1 − U)^m] = Pr(X_1, ..., X_m > Y), the probability that the m IID copies of X are all larger than Y.
Consequently, multiplying the difference of moments E[(1 − U*)^m] − E[(1 − U)^m] by the factor m + 1 yields the quantity

β_m(Y||X) = 1 − (m + 1) · Pr(X_1, ..., X_m > Y).    (5)

The quantity β_m(Y||X) involves the probability q_m = Pr(X_1, ..., X_m > Y). Being a probability, q_m takes values in the unit interval [0, 1], and hence the quantity β_m(Y||X) takes values in the range [−m, 1]. Moreover, the quantity β_m(Y||X) attains its lower bound −m if and only if q_m = 1, i.e., if and only if the random variable Y equals its lower bound r_low with probability one: β_m(Y||X) = −m ⇔ Pr(Y = r_low) = 1. And, antithetically, the quantity β_m(Y||X) attains its upper bound 1 if and only if q_m = 0, i.e., if and only if the random variable Y equals its upper bound r_up with probability one: β_m(Y||X) = 1 ⇔ Pr(Y = r_up) = 1.
The Hausdorff moment problem implies that the random variables U and U* are equal in law if and only if their moments coincide for all orders m. In turn, we obtain that the random variables X and Y are equal in law if and only if the quantities α_m(Y||X) are all zero, as well as if and only if the quantities β_m(Y||X) are all zero:

A(r) ≡ B(r) ⇔ α_m(Y||X) = 0 (m = 1, 2, 3, ...) ⇔ β_m(Y||X) = 0 (m = 1, 2, 3, ...).

Thus, collectively, the quantities α_m(Y||X), as well as the quantities β_m(Y||X), characterize the case of a zero statistical divergence of the random variable Y from the random variable X.
For m = 1, Equations (4) and (5) imply that

α_1(Y||X) = β_1(Y||X) = ∆(Y||X),    (6)

where ∆(Y||X) is the quantity of Equation (3) (this implication uses the aforementioned assumption Pr(Y = X) = 0). As noted in Section 2.1, the quantity ∆(Y||X) is a 'first-order measurement' of the statistical divergence of the random variable Y from the random variable X. The quantities α_m(Y||X) and β_m(Y||X) are 'higher-order measurements' of this statistical divergence.
To conclude this subsection, we underscore the fact that no use of the moments of the random variables X and Y was made here. Indeed, in general, the random variables X and Y may or may not have well-defined moments. This subsection used the (well-defined) moments of the random variables U and 1 − U in order to 'harvest information' regarding the statistical divergence of the random variable Y from the random variable X. The information that was harvested appeared in terms of the probabilities Pr(X_1, ..., X_m ≤ Y) and Pr(X_1, ..., X_m > Y) (rather than in terms of moments of the random variables X and Y).
The quantity α_m(Y||X) of Equation (4) involves the maximal random variable max{X_1, ..., X_m}. Indeed, the probability appearing on the right-hand side of Equation (4) is Pr(max{X_1, ..., X_m} ≤ Y). Set ǎ_m(r) (r ∈ R) to denote the density function of the maximal random variable max{X_1, ..., X_m}. Then, using this density function, a derivation detailed in Appendix A asserts that

α_m(Y||X) = (m + 1) ∫_R [A(r) − B(r)] ǎ_m(r) dr.    (7)

The quantity β_m(Y||X) of Equation (5) involves the minimal random variable min{X_1, ..., X_m}. Indeed, the probability appearing on the right-hand side of Equation (5) is Pr(min{X_1, ..., X_m} > Y). Set â_m(r) (r ∈ R) to denote the density function of the minimal random variable min{X_1, ..., X_m}. Then, using this density function, a derivation detailed in Appendix A asserts that

β_m(Y||X) = (m + 1) ∫_R [A(r) − B(r)] â_m(r) dr.    (8)

For m = 1, the maximal random variable max{X_1, ..., X_m} and the minimal random variable min{X_1, ..., X_m} are both equal, in law, to the random variable X. Consequently, ǎ_1(r) = â_1(r) = a(r), and hence Equations (7), (8) and Equation (6) imply that

∆(Y||X) = 2 ∫_R [A(r) − B(r)] a(r) dr.    (9)

Equations (7)-(9) manifest weighted-average representations for the quantities appearing in them. In these representations, the averages are of the difference between the distribution functions of the random variables X and Y. Moreover, the averaging weights are as follows: the density function of the maximal random variable max{X_1, ..., X_m} in Equation (7); the density function of the minimal random variable min{X_1, ..., X_m} in Equation (8); and the density function of the random variable X in Equation (9).
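The quantities of Equations (4) and (5) are straightforward to estimate by simulation. In the sketch below, X and Y are both taken unit-mean exponential (an assumption made so that the true values are zero):

```python
import random

# Monte Carlo sketch of the moment-based quantities:
#   alpha_m(Y||X) = (m + 1) * Pr(X_1, ..., X_m <= Y) - 1
#   beta_m(Y||X)  = 1 - (m + 1) * Pr(X_1, ..., X_m > Y)
# X and Y are taken unit-mean exponential (an assumption of this example),
# so Y is equal in law to X and both quantities should vanish for every m.

random.seed(11)

def alpha_beta(m, n=200_000):
    all_below = all_above = 0
    for _ in range(n):
        y = random.expovariate(1.0)
        xs = [random.expovariate(1.0) for _ in range(m)]
        if max(xs) <= y:         # all m IID copies of X are at most Y
            all_below += 1
        if min(xs) > y:          # all m IID copies of X exceed Y
            all_above += 1
    alpha = (m + 1) * (all_below / n) - 1.0
    beta = 1.0 - (m + 1) * (all_above / n)
    return alpha, beta

alpha2, beta2 = alpha_beta(2)
```

Since Y is equal in law to X here, both estimates come out near zero, in accordance with the characterization above.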

Coincidence
As above, consider a random variable ξ that takes values in the unit interval, and whose statistical distribution is governed by the density function ϕ(u) (0 < u < 1). Further consider m IID simulations of the random variable ξ (m = 2, 3, 4, ...). The likelihood that the m simulations will all yield the same outcome is the "coincidence likelihood", of order m, of the random variable ξ. In this subsection we shall apply coincidence likelihoods in order to measure the statistical divergence of the random variable Y from the random variable X.
As above, U* denotes a random variable that is uniformly distributed over the unit interval. As the density function of the uniform distribution is flat, the coincidence likelihood of order m of the random variable U* is ∫_0^1 1^m du = 1. In turn, the coincidence likelihood of order m of the random variable U is ∫_0^1 [C′(u)]^m du. Subtracting the former coincidence likelihood from the latter, and applying the change of variables u = A(r), yields the quantity

γ_m(Y||X) = ∫_R b(r)^m a(r)^{1−m} dr − 1.    (10)

With regard to Equation (10), recall that a(r) and b(r) are, respectively, the density functions of the random variables X and Y.
Evidently, the quantity of Equation (10) can be extended to any real value of the parameter m. In what follows, we extend this parameter from the discrete set of values m = 2, 3, 4, ... to the continuous range of values m > 1.
For the continuous range m > 1, a convexity argument that is detailed in Appendix A implies the two following properties. (I) The quantity of Equation (10) is non-negative: γ_m(Y||X) ≥ 0. (II) The quantity of Equation (10) is zero if and only if the random variables X and Y are equal in law: γ_m(Y||X) = 0 ⇔ a(r) ≡ b(r). Hence, with regard to any parameter value in the continuous range m > 1, the following conclusion is attained: the quantity γ_m(Y||X) characterizes the case of a zero statistical divergence of the random variable Y from the random variable X.
The convexity argument that holds for the continuous range m > 1 also holds for the continuous range m < 0. Shifting from the former range (m > 1) to the latter range (m < 0) flips the statistical divergence from "Y with respect to X" to "X with respect to Y". Indeed, straightforward calculations that are based on Equation (10) yield the following pair of 'flipping formulae'. (I) When the parameter is in the continuous range m > 1, then γ_m(Y||X) = γ_{1−m}(X||Y), where the 'flipped' parameter 1 − m is in the continuous range m < 0. (II) When the parameter is in the continuous range m < 0, then γ_m(Y||X) = γ_{1−m}(X||Y), where the 'flipped' parameter 1 − m is in the continuous range m > 1.
Hellinger divergence and Renyi divergence. In terms of the quantity of Equation (10), the Hellinger divergence of the random variable Y from the random variable X admits the formulation 1/(m−1) · γ_m(Y||X); and the Renyi divergence of the random variable Y from the random variable X admits the formulation 1/(m−1) · ln[1 + γ_m(Y||X)]. In both these divergences, the parameter m is positive (m > 0) and different from one (m ≠ 1).
Kullback-Leibler divergence. Setting m = 1 in Equation (10), observe that γ_1(Y||X) = 0. In turn, taking the limit m → 1, while using L'Hopital's rule, yields the following limiting result (see Appendix A for the details). The Hellinger divergence 1/(m−1) · γ_m(Y||X) and the Renyi divergence 1/(m−1) · ln[1 + γ_m(Y||X)] both converge, as m → 1, to the limit

lim_{m→1} 1/(m−1) · γ_m(Y||X) = ∫_R b(r) ln[b(r)/a(r)] dr.    (11)

The quantity appearing in Equation (11) is the Kullback-Leibler divergence of the random variable Y from the random variable X.
f-divergence. The right-hand side of Equation (10) admits the general form

∫_R a(r) φ(b(r)/a(r)) dr,    (12)

where φ(t) (t ≥ 0) is a convex function that satisfies the condition φ(1) = 0. The quantity appearing in Equation (12) is the f-divergence of the random variable Y from the random variable X. Applying the change of variables u = A(r) to Equation (12) implies that the f-divergence admits, in terms of the density function C′(u) of Equation (2), the formulation ∫_0^1 φ(C′(u)) du. Different choices of the convex function φ(t) yield different manifestations of the f-divergence (see, for example, [26]). In particular, the choice φ(t) = t^m − 1, with a real parameter m that takes values either in the range m > 1 or in the range m < 0, yields the quantity of Equation (10). The choice φ(t) = 1/(m−1) · (t^m − 1), with a positive parameter m that takes values either in the range m < 1 or in the range m > 1, yields the Hellinger divergence. And the choice φ(t) = t ln(t) yields the Kullback-Leibler divergence.
The convexity argument that was noted in Section 2.4 (with regard to the quantity of Equation (10)) holds for the f-divergence. Namely, the convexity argument implies the two following properties. (I) The f-divergence is non-negative. (II) The f-divergence is zero if and only if the random variables X and Y are equal in law.
Therefore, via the notion of "coincidence likelihood" (that was described in Section 2.4), the curve C(u) of Equation (1) leads to Equation (10). In turn, Equation (10) leads to the four aforementioned divergences.
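For a concrete pair of laws, the quantity of Equation (10) and the limit of Equation (11) can be checked numerically. The exponential densities below are assumptions of this sketch:

```python
import math

# gamma_m(Y||X) in closed form for exponential densities (an assumption of
# this sketch): a(r) = exp(-r) and b(r) = lam * exp(-lam * r) with lam = 2.
# Then the integral of b^m * a^(1-m) over R gives
#   gamma_m = lam^m / (m * (lam - 1) + 1) - 1.

def gamma(m, lam=2.0):
    return lam ** m / (m * (lam - 1.0) + 1.0) - 1.0

# Kullback-Leibler divergence for the same pair: the integral of b * ln(b/a),
# which works out to ln(lam) + (1 - lam) / lam.
lam = 2.0
kl = math.log(lam) + (1.0 - lam) / lam

# Near m = 1, the Hellinger divergence gamma_m / (m - 1) and the Renyi
# divergence ln(1 + gamma_m) / (m - 1) both approach the Kullback-Leibler
# divergence, in accordance with the limiting result above.
m = 1.0001
hellinger_near_1 = gamma(m) / (m - 1.0)
renyi_near_1 = math.log(1.0 + gamma(m)) / (m - 1.0)
```

Note that gamma(1.0) is exactly zero, as Equation (10) requires, and gamma(m) is positive for m > 1.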

Summary & Implementation
In this section, we have established three different quantities that measure the statistical divergence of the random variable Y from the random variable X: the quantities α_m(Y||X) and β_m(Y||X) of Equations (4) and (5), and the quantity γ_m(Y||X) of Equation (10). The random variable X takes values in the real range R = (r_low, r_up), and the random variable Y takes values in the closed range [r_low, r_up].
The quantities α_m(Y||X) and β_m(Y||X) can be used for any pair of random variables X and Y. In these quantities, {X_1, ..., X_m} are m IID copies of the random variable X, and the IID copies are independent of the random variable Y. These quantities characterize the case of a zero statistical divergence collectively. Namely, the random variable Y is equal in law to the random variable X if and only if α_m(Y||X) = 0 for all m, and β_m(Y||X) = 0 for all m.
The quantity γ_m(Y||X) can be used when the random variable X has a density function a(r) that is positive over the range R, and when the random variable Y has a density function b(r). For any given m, this quantity characterizes the case of a zero statistical divergence: the random variable Y is equal in law to the random variable X if and only if γ_m(Y||X) = 0. The quantity γ_m(Y||X) is intimately related to the Hellinger and the Renyi divergences, and it leads to the Kullback-Leibler divergence and to the general f-divergence.
In general, none of the three quantities are symmetric. Namely, in general, the statistical divergence of the random variable Y from the random variable X is not the same as the statistical divergence of the random variable X from the random variable Y. Key properties of the three quantities are summarized in Table 1.
Table 1. Key properties of the measures established in Section 2: the quantities α_m(Y||X), β_m(Y||X), and γ_m(Y||X). For each quantity, the table specifies the values of the underlying parameter m, the range of the quantity, and the characterization of the case where the random variable Y is equal in law to the random variable X. For the quantities α_m(Y||X) and β_m(Y||X), the table also specifies the characterization of the following extreme cases: the random variable Y is equal, with probability one (w.p. 1), to the lower-bound value r_low; the random variable Y is equal, with probability one (w.p. 1), to the upper-bound value r_up.

Quantity               α_m(Y||X)           β_m(Y||X)           γ_m(Y||X)
Parameter m            m = 1, 2, 3, ...    m = 1, 2, 3, ...    m > 1 (also m < 0)
Range                  [−1, m]             [−m, 1]             [0, ∞)
Y equal in law to X    = 0 for all m       = 0 for all m       = 0 for any given m
Y = r_low w.p. 1       = −1                = −m
Y = r_up w.p. 1        = m                 = 1

The implementation of the three quantities, based on empirical data (be it real-world, experimental, simulated, etc.), is practiced as follows. Firstly, given n samples of the random variable X, order these samples increasingly, x_1 ≤ x_2 ≤ ... ≤ x_n; in addition, set x_0 = r_low and x_{n+1} = r_up. Secondly, given a set of samples of the random variable Y, calculate the following proportions: π_i is the proportion of the samples (of the random variable Y) that are in the range (x_{i−1}, x_i]. Thirdly, calculate the three quantities via the approximation formulae that follow from the piecewise-linear estimate of the curve C(u).
Indeed, based on the empirical data, the estimate of the curve C(u) is the linear interpolation of the unit-square points C(0) = 0, C(i/(n+1)) = π_1 + ... + π_i (i = 1, ..., n), and C(1) = 1. In turn, the slopes of the piecewise-linear estimate of the curve C(u) are (n+1)·π_i over the respective sub-intervals. Consequently, the estimates of the three quantities are given by the above approximation formulae.
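The following sketch illustrates one plausible reading of this empirical procedure, restricted to the first-order quantity ∆(Y||X); the simulated exponential samples are assumptions of the example:

```python
import bisect
import random

# A sketch (not the paper's exact formulae): estimate C(u) at u_i = i/(n+1)
# by the cumulative proportion of Y-samples below the i-th ordered X-sample,
# then read off Delta(Y||X) as the area difference of the two unit-square
# sets. Assumed laws: X ~ Exponential(1) and Y ~ Exponential(1/2).

random.seed(3)
n = 5_000
xs = sorted(random.expovariate(1.0) for _ in range(n))
ys = sorted(random.expovariate(0.5) for _ in range(n))

C_hat = [bisect.bisect_right(ys, x) / n for x in xs]   # estimates C(i/(n+1))
us = [i / (n + 1) for i in range(n + 2)]
pts = [0.0] + C_hat + [1.0]

# trapezoidal area below the piecewise-linear estimate of the curve C(u)
area_below = sum((pts[i] + pts[i + 1]) / 2.0 * (us[i + 1] - us[i])
                 for i in range(n + 1))
delta_hat = 1.0 - 2.0 * area_below      # estimates Pr(Y > X) - Pr(Y <= X)
```

For these two exponential laws the true value is ∆(Y||X) = 1/3, and the estimate converges to it as n grows.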

Socioeconomic Application
Consider a positive random variable W, whose statistical distribution is governed by the density function f(w) (0 < w < ∞). The mean of the random variable W is its first moment, E[W] = ∫_0^∞ w f(w) dw. We assume that the mean is positive and finite, and denote it by µ. This implies that f_$(w) = (1/µ) w f(w) (0 < w < ∞) is a density function. We set W_$ to be a random variable whose statistical distribution is governed by the density function f_$(w).
The random variable W_$ is positive, and it has a socioeconomic interpretation that is described as follows. Envisage a human society that comprises members with positive personal wealth values, and consider W to be the personal wealth of a randomly sampled member of the society. Now, rather than sampling at random a single member of the society, sample at random a single dollar of the society's overall wealth (i.e., the aggregate of the personal wealth values of all the society's members). Then, W_$ is the personal wealth of the society member to whom the randomly-sampled dollar belongs.
Whereas the random variable W has no inherent inclination towards large wealth values, the random variable W_$ has such an inclination. Indeed, when a single member of the society is sampled at random, it is exactly as likely for any given rich individual, and for any given poor individual, to be the randomly-sampled member. However, when a single dollar is sampled at random from the society's overall wealth, it is more likely that this randomly-sampled dollar belongs to a rich member of the society than to a poor member of the society.
This section presents a socioeconomic application of the general framework that was established in Section 2: measuring the statistical divergence of the random variable W_$ from the random variable W. Throughout this section, F(w) = Pr(W ≤ w) (0 < w < ∞) denotes the distribution function of the random variable W. In addition, the density function f(w) is assumed to be positive, and hence the distribution function F(w) is increasing, and it has an inverse function F^{-1}(u) (0 < u < 1) that is also increasing.

Lorenz Curve
In order to use the framework of Section 2, we set X = W and Y = W_$ (the equalities being in law). In turn, the underlying range is the positive half-line R = (0, ∞) and: the distribution function of X is A(r) = F(r); the density function of X is a(r) = f(r); the density function of Y is b(r) = f_$(r) = (1/µ) r f(r). Noting that b(r)/a(r) = r/µ, Equation (2) implies that the derivative of the curve C(u) is

C′(u) = F^{-1}(u)/µ    (0 < u < 1).    (13)

As the inverse function F^{-1}(u) is increasing, so is the derivative C′(u), and hence the curve C(u) is convex. In turn, the convexity of the curve C(u) implies that its graph resides below the diagonal line of the unit square: C(u) < u (0 < u < 1). The socioeconomic interpretation of the random variable W_$ induces a socioeconomic interpretation of the curve C(u). Indeed, with regard to the underlying human society, the aggregate wealth held by the poor 100u% of the society's members constitutes 100C(u)% of the society's overall wealth. Termed the "Lorenz curve" in honor of the American statistician Max Lorenz [4], the curve C(u) is widely applied in economics and in the social sciences to investigate wealth and income distributions, as well as to quantify socioeconomic inequality [5-7].
As noted after Equation (13), the Lorenz curve is bounded from above by the diagonal line of the unit square: C(u) < u (0 < u < 1). In the space of Lorenz curves, the diagonal line characterizes the socioeconomic state of perfect equality: a society in which all the society members have an identical personal wealth value (which is positive).
Evidently, the Lorenz curve is bounded from below by the zero line of the unit square: C(u) > 0 (0 < u < 1). In the space of Lorenz curves, the zero line characterizes the socioeconomic state of perfect inequality: a society in which 0% of the society's members are in possession of 100% of the society's overall wealth. The state of perfect inequality is attainable only when the society's population is infinitely large.
Perhaps the simplest way to envisage the socioeconomic state of perfect inequality is via the following 'Pharaonic example'. Indeed, consider a Pharaonic society comprising n members, in which one single member, the Pharaoh, has personal wealth 1, and all other members have personal wealth 0. Then, the socioeconomic state of perfect inequality is attained in the infinite-population limit n → ∞ of the Pharaonic society.
The closer the Lorenz curve is to the diagonal line of the unit square, the more egalitarian the underlying human society. Antithetically, the closer the Lorenz curve is to the zero line of the unit square, the less egalitarian the underlying human society. Hence, the Lorenz curve provides a geometric approach to quantifying socioeconomic inequality.
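A small sketch of Equation (13): integrating C′(u) = F^{-1}(u)/µ gives the Lorenz curve, which is convex and lies below the diagonal. The choice of wealth uniform on (0, 1) is an assumption of this example, under which C(u) = u^2 in closed form:

```python
# Lorenz-curve sketch: C'(u) = F^{-1}(u) / mu, so C(u) is the integral of
# F^{-1} from 0 to u, divided by mu. For wealth uniform on (0, 1), which is
# an assumption of this example, F^{-1}(u) = u and mu = 1/2, so C(u) = u^2.

def lorenz_uniform(u):
    return u * u

def lorenz_numeric(u, steps=10_000):
    # midpoint-rule integration of F^{-1}(v) = v over (0, u), divided by mu
    mu = 0.5
    h = u / steps
    return sum((i + 0.5) * h * h for i in range(steps)) / mu

# The Lorenz curve is convex and resides below the diagonal: C(u) < u.
below_diagonal = all(lorenz_uniform(u) < u for u in (0.25, 0.5, 0.75))
```

The numerical integration reproduces the closed form, e.g., C(0.6) = 0.36.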

Gini Index
Named in honor of the Italian economist Corrado Gini [43,44], the "Gini index" is arguably the principal gauge of socioeconomic inequality in economics and in the social sciences [45-48]. The scientific applications of the Gini index extend well beyond socioeconomic inequality (for recent such applications see [48] and references therein).
In terms of the Lorenz curve, the Gini index is defined as follows: it is twice the area captured between the Lorenz curve and the unit-square's diagonal line,

G = 2 ∫_0^1 [u − C(u)] du.    (14)

The definition of the Gini index implies that, with regard to the unit square (in which the graph of the Lorenz curve resides), the Gini index is the difference between the area above the Lorenz curve and the area below the Lorenz curve. Hence, Equation (3) implies that the Gini index is

G = ∆(W_$||W) = Pr(W_$ > W) − Pr(W_$ ≤ W).

As described above, the random variables W and W_$ manifest two different wealth-sampling methods. The random variable W_$ is inclined towards sampling large wealth values, and the Gini index quantifies this inclination.

Wealth Maxima & Minima
Having addressed the first-order quantity ∆(W_$||W) = α_1(W_$||W) = β_1(W_$||W) in the previous subsection, we now move on from the first order m = 1, and address the higher-order quantities α_m(Y||X) and β_m(Y||X). To that end, we set X = W and Y = W_$ (the equalities being in law), and let {W_1, ..., W_{m+1}} denote a sample comprising m + 1 IID copies of the random variable W. A derivation detailed in Appendix A asserts that Equation (4) yields the quantity

α_m(W_$||W) = E[max{W_1, ..., W_{m+1}}]/µ − 1.    (15)

Namely, the quantity α_m(W_$||W) is the overshoot of the mean of the maximal random variable max{W_1, ..., W_{m+1}}, measured relative to the mean of the random variable W. A derivation detailed in Appendix A asserts that Equation (5) yields the quantity

β_m(W_$||W) = 1 − E[min{W_1, ..., W_{m+1}}]/µ.    (16)

Namely, the quantity β_m(W_$||W) is the undershoot of the mean of the minimal random variable min{W_1, ..., W_{m+1}}, measured relative to the mean of the random variable W.

Gini Index Revisited
As in the previous subsection, in this subsection {W_1, ..., W_{m+1}} denotes a sample comprising m + 1 IID copies of the random variable W. The sample's maximal wealth value is max{W_1, ..., W_{m+1}}, and the sample's minimal wealth value is min{W_1, ..., W_{m+1}}. In turn, the sample's range is the gap between the sample's maximal and minimal wealth values, max{W_1, ..., W_{m+1}} − min{W_1, ..., W_{m+1}}.
Summing up Equations (15) and (16), and denoting the sum by ρ_m(W_$||W) = α_m(W_$||W) + β_m(W_$||W), yields the following quantity:

ρ_m(W_$||W) = E[max{W_1, ..., W_{m+1}} − min{W_1, ..., W_{m+1}}]/µ.    (17)

Namely, the quantity ρ_m(W_$||W) is the mean of the range of the sample {W_1, ..., W_{m+1}}, measured relative to the mean of the random variable W.
Consider now the order m = 1. With regard to this order, as noted above, the quantities α_1(W_$||W) and β_1(W_$||W) coincide and yield the Gini index ∆(W_$||W). In turn, the Gini index is the arithmetic average of the quantities α_1(W_$||W) and β_1(W_$||W), and hence ∆(W_$||W) = ρ_1(W_$||W)/2.
With regard to the order m = 1, the sample is {W_1, W_2}, and hence the sample's range is max{W_1, W_2} − min{W_1, W_2} = |W_1 − W_2|. Combining this observation with Equation (17) yields the representation

∆(W_$||W) = E[|W_1 − W_2|]/(2µ).    (18)

Namely, the Gini index can be represented as the mean absolute deviation (MAD) of two IID copies of the random variable W, measured relative to twice the mean of the random variable W.
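The MAD representation above is easy to verify by simulation. Exponentially distributed wealth is an assumption of this sketch; its Gini index is known to equal 1/2:

```python
import random

# Monte Carlo sketch of the MAD representation of the Gini index,
#   Gini = E[|W1 - W2|] / (2 * E[W]),
# with W1, W2 IID copies of the wealth variable W. Exponentially distributed
# wealth is an assumption of this example; its Gini index is 1/2.

random.seed(42)
n = 200_000
mu_hat = 0.0     # running estimate of E[W]
mad_hat = 0.0    # running estimate of E[|W1 - W2|]
for _ in range(n):
    w1 = random.expovariate(1.0)
    w2 = random.expovariate(1.0)
    mu_hat += (w1 + w2) / 2.0
    mad_hat += abs(w1 - w2)
mu_hat /= n
mad_hat /= n
gini = mad_hat / (2.0 * mu_hat)
```

The estimate settles near 1/2, in accordance with the representation.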

Wealth Moments
Having addressed the quantities α m (Y||X) and β m (Y||X), we now turn to address the quantity γ m (Y||X).Setting X = W and Y = W $ (the equalities being in law), a derivation detailed in Appendix A asserts that Equation (10) yields the quantity Namely, the quantity γ m (W $ ||W) is the overshoot of the moment of order m of the random variable W, measured relative to the m th power of the mean of the random variable W.
For any m that is larger than one, a Jensen-inequality argument that is detailed in Appendix A affirms that the quantity γ_m(W$||W) is non-negative, γ_m(W$||W) ≥ 0. Moreover, the Jensen-inequality argument asserts that the quantity γ_m(W$||W) is zero if and only if the random variable W is constant with probability one: γ_m(W$||W) = 0 ⇔ Pr(W = µ) = 1. As m is larger than one, note that, due to the moment E[W^m], the quantity γ_m(W$||W) is sensitive to large wealth values.
For m = 2, Equation (19) yields the ratio Var[W]/µ², where Var[W] is the variance of the random variable W. The coefficient of variation (CV) of the random variable W is its 'noise-to-signal' ratio: the ratio of its standard deviation to its mean. Hence the ratio Var[W]/µ² is the squared CV of the random variable W.
Setting X = W$ and Y = W (the equalities being in law), a derivation detailed in Appendix A asserts that Equation (10) yields the quantity γ_m(W||W$) = (E[W^{1−m}] − µ^{1−m})/µ^{1−m} (Equation (20)). Namely, the quantity γ_m(W||W$) is the overshoot of the moment of order 1 − m of the random variable W, measured relative to the (1 − m)-th power of the mean of the random variable W.
For any m that is larger than one, a Jensen-inequality argument that is detailed in Appendix A affirms that the quantity γ_m(W||W$) is non-negative, γ_m(W||W$) ≥ 0. Moreover, the Jensen-inequality argument asserts that the quantity γ_m(W||W$) is zero if and only if the random variable W is constant with probability one: γ_m(W||W$) = 0 ⇔ Pr(W = µ) = 1. As m is larger than one, note that, due to the moment E[W^{1−m}], the quantity γ_m(W||W$) is sensitive to small wealth values.
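As a numerical sanity check of the two moment-based quantities (a sketch; the lognormal wealth distribution is an assumption, and the inline expressions follow the definitions of Equations (19) and (20) with m = 2):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.lognormal(mean=0.0, sigma=0.5, size=200_000)
mu = w.mean()

gamma2_dollar = (w**2).mean() / mu**2 - 1     # gamma_2(W$||W): overshoot of E[W^2] vs mu^2
gamma2_inverse = mu * (1.0 / w).mean() - 1    # gamma_2(W||W$): overshoot of E[W^-1] vs mu^-1
cv_squared = w.var() / mu**2                  # squared coefficient of variation

print(gamma2_dollar, gamma2_inverse, cv_squared)
```

The first quantity coincides with the squared CV, as stated above; both quantities are positive here because the lognormal wealth is non-constant.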

Inequality Indices
Inequality indices are gauges that quantify the socioeconomic inequality of human societies [1][2][3][8,9]. The aforementioned Gini index is perhaps the best known inequality index. A general inequality index is a 'socioeconomic score' that takes values in the range [0, 1] and that meets the three following properties. (I) The inequality index yields its lower-bound score 0 if and only if the society is in the socioeconomic state of perfect equality. (II) If the society is in the socioeconomic state of perfect inequality, then the inequality index yields its upper-bound score 1. (III) The inequality index is invariant with respect to the particular currency via which the personal wealth values of the society members are measured (e.g., Dollar or Euro).
With regard to the random variable W, the first inequality-index property can be formulated as determinism: the inequality index yields its lower-bound score 0 if and only if the random variable W is deterministic, i.e., if and only if it is constant with probability one. In addition, with regard to the random variable W, the third inequality-index property can be formulated as scale invariance: the inequality index is invariant with respect to the change-of-scale W → s·W, where s is an arbitrary positive scale parameter.
The following quantities all satisfy the first and the third inequality-index properties: the quantities α_m(W$||W) and β_m(W$||W) of Equations (15) and (16); the quantity ρ_m(W$||W) of Equation (17); and the quantities γ_m(W$||W) and γ_m(W||W$) of Equations (19) and (20). In fact, transformations of these quantities yield inequality indices, and these inequality indices are specified in Table 2. For a comprehensive study of the inequality indices that correspond to the quantities α_m(W$||W), β_m(W$||W), and ρ_m(W$||W), the readers are referred to [60]. For a comprehensive study of the inequality indices that correspond to the quantities γ_m(W$||W) and γ_m(W||W$), the readers are referred to [60,61].

[Table 2 near here. Its columns are: Quantity, Transformation, Inequality Index.]

Summary & Implementation
This section has presented a socioeconomic application of the general framework that was established in Section 2. The application gave rise to well-known socioeconomic notions including the Lorenz curve, the Gini index and its generalizations, and general inequality indices. Underlying this socioeconomic application is the following setting: a human society comprising members with positive personal wealth values.
Wealth is measured via two different statistical sampling methods. On the one hand, a single member of the society is sampled at random, and W is the wealth of the randomly-sampled member. On the other hand, a single dollar is sampled at random (from the aggregate wealth of all the society members), and W$ is the wealth of the society member to whom the randomly-sampled dollar belongs. Evidently, the random variable W$ is inclined towards sampling the richer members of the society.
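The two sampling methods can be mimicked in a few lines (an illustrative sketch; the lognormal wealth distribution is an assumption): sampling a member uniformly at random versus sampling a dollar, i.e., sampling members with probabilities proportional to their wealth values.

```python
import numpy as np

rng = np.random.default_rng(4)
w = rng.lognormal(sigma=1.0, size=100_000)           # personal wealth values

member = rng.choice(w, size=100_000)                 # sample a member at random: W
dollar = rng.choice(w, size=100_000, p=w / w.sum())  # sample a dollar at random: W$

print(member.mean(), dollar.mean())  # the dollar-sampled mean is markedly larger
```

The dollar-sampled mean estimates the size-biased mean E[W²]/E[W], which exceeds E[W] whenever W is non-constant; this is the precise sense in which W$ is inclined towards the richer members.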
The socioeconomic application focused on measuring the statistical divergence of the random variable W$ from the random variable W. This statistical divergence quantifies the extent to which the rich deviate from the rest of the society members. This statistical divergence was shown to be zero if and only if the human society is in the socioeconomic state of perfect equality: all the society members share a common (positive) personal wealth value.
In terms of the random variables W and W$, the state of perfect equality is characterized as follows: both the random variables W and W$ are equal, with probability one, to a fixed positive wealth value. Thus, from a probabilistic perspective, perfect equality manifests determinism. In turn, the statistical divergence of the random variable W$ from the random variable W quantifies the deviation from the 'deterministic benchmark'.
The statistical divergence of the random variable W$ from the random variable W was measured by four quantities: the quantities α_m(W$||W) and β_m(W$||W), as well as their sum ρ_m(W$||W) = α_m(W$||W) + β_m(W$||W); and the quantity γ_m(W$||W). In addition, the statistical divergence of the random variable W from the random variable W$ was measured by the quantity γ_m(W||W$). As detailed in Table 2 above, these five quantities can be mapped-via one-to-one transformations-to socioeconomic inequality indices of the underlying human society.
The implementation of the five quantities, based on empirical data-be it real-world, experimental, simulated, etc.-is practiced as follows. Firstly, given n samples of the random variable W, order these samples increasingly: w_(1) ≤ w_(2) ≤ ⋯ ≤ w_(n). Secondly, calculate the corresponding wealth proportions: π_i = (w_(1) + ⋯ + w_(i))/(w_(1) + ⋯ + w_(n)) (i = 1, 2, ⋯, n). Thirdly, calculate the five quantities via the corresponding approximation formulae.
Indeed, based on the empirical data, the estimate of the Lorenz curve C(u) is the linear interpolation of the unit-square points C(0) = 0 and C(i/n) = π_i (i = 1, 2, ⋯, n). In turn, the slopes of the piecewise-linear estimate of the Lorenz curve are C′(u) = n(π_i − π_{i−1}) for (i − 1)/n < u < i/n (i = 1, 2, ⋯, n). Consequently, the estimates of the five quantities follow by substituting this piecewise-linear estimate into the formulae that define the quantities.
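A minimal implementation sketch of the piecewise-linear Lorenz estimate (assuming, per Appendix A, the formulation ∆(W$||W) = 1 − 2∫_0^1 C(u)du of the Gini index; the exponential test data are likewise an assumption):

```python
import numpy as np

def lorenz_points(w):
    """Points (i/n, pi_i) of the piecewise-linear Lorenz-curve estimate."""
    w = np.sort(np.asarray(w, dtype=float))
    u = np.linspace(0.0, 1.0, len(w) + 1)
    pi = np.concatenate(([0.0], np.cumsum(w) / w.sum()))
    return u, pi

def gini_from_lorenz(w):
    """Gini index of the estimate, via 1 - 2 * integral of C(u)."""
    u, pi = lorenz_points(w)
    area = ((pi[1:] + pi[:-1]) / 2 * np.diff(u)).sum()  # exact integral of the interpolant
    return 1.0 - 2.0 * area

w = np.random.default_rng(2).exponential(size=50_000)
print(gini_from_lorenz(w))  # close to 1/2 for exponential data
```

For a perfectly equal sample (all wealth values identical) the estimate vanishes, in line with the first inequality-index property.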
Last but not least, we emphasize that the socioeconomic application presented in this section is not at all confined to the underlying socioeconomic setting. Indeed, the socioeconomic application can be used with regard to any non-negative random variable W that has a positive and finite mean [8,9].

Renewal Application
Consider a positive random variable T, whose statistical distribution is governed by the survival function F(t) = Pr(T > t) (0 < t < ∞). The mean of the random variable T is given by the integral of its survival function, E[T] = ∫_0^∞ F(t)dt. We assume that the mean is positive and finite, and (as in Section 3) denote it by µ. This implies that f_res(t) = (1/µ)F(t) (0 < t < ∞) is a density function. We set T_res to be a random variable whose statistical distribution is governed by the density function f_res(t).
The random variable T_res is positive, and it has a renewal interpretation that is described as follows. Consider a renewal process that is generated from the random variable T [49][50][51]. Specifically, the renewal process is a sequence of temporal points 0 = τ_0 < τ_1 < τ_2 < τ_3 < ⋯ that are termed "renewal epochs", and that are constructed as follows: the temporal durations between consecutive renewal epochs are IID copies of the random variable T. Now, standing at a fixed time point t, denote by ∆_t the temporal duration between the time point t and the first renewal epoch after the time point t. A key result of the theory of renewal processes asserts that [51]: the random variable ∆_t converges in law, in the limit t → ∞, to the random variable T_res, which is termed the "residual lifetime" of the random variable T.
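The convergence in law of ∆_t to T_res can be probed by simulation (a sketch; the Weibull choice of inter-renewal durations, the observation time, and the run count are assumptions). The comparison uses the standard renewal-theory identity E[T_res] = E[T²]/(2E[T]).

```python
import numpy as np

rng = np.random.default_rng(3)

def mean_forward_delay(t_obs, n_runs=5_000):
    """Monte Carlo mean of Delta_t: the delay from t_obs to the next renewal epoch."""
    out = np.empty(n_runs)
    for k in range(n_runs):
        tau = 0.0
        while tau <= t_obs:
            tau += rng.weibull(2.0)   # inter-renewal durations: Weibull, epsilon=2, s=1
        out[k] = tau - t_obs
    return out.mean()

mu = 0.88622693                # E[T] = Gamma(1 + 1/2)
est = mean_forward_delay(20.0)
print(est, 1.0 / (2 * mu))     # E[T_res] = E[T^2]/(2 mu), with E[T^2] = Gamma(2) = 1
```

The simulated mean delay at a large observation time matches the residual-lifetime mean, not the inter-renewal mean µ.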
The random variables T and T_res manifest two different observations of the renewal process. To illustrate these different observations, consider the renewal process as manifesting the time epochs at which buses (of a certain bus line) arrive at a given bus station. The random variable T manifests the waiting time of a passenger that reaches the bus station just after a bus has left the station. The random variable T_res manifests the waiting time of a passenger that reaches the bus station at an arbitrary time point.
This section presents a renewal application of the general framework that was established in Section 2: measuring the statistical divergence of the random variable T_res from the random variable T. In this section, the survival function F(t) is assumed to be smooth, and f(t) = −F′(t) (0 < t < ∞) denotes the corresponding density function. Namely, f(t) is the density function of the random variable T.

Hazard Rate
The likelihood that the random variable T is realized at the positive time point t is f(t). In turn, the conditional likelihood that the random variable T is realized at the positive time point t, given the information that T was not realized up to the time point t, is h(t) = f(t)/F(t). The conditional likelihood h(t), as a function of the temporal variable t, is termed the "hazard rate" and the "failure rate" of the random variable T.
The hazard rate h(t) characterizes the statistical distribution of the random variable T. Indeed, in terms of the hazard rate h(t), the survival function of the random variable T admits the representation F(t) = exp[−∫_0^t h(u)du]. The hazard rate h(t) is a key statistical tool that is widely applied in survival analysis [62][63][64] and in reliability engineering [65][66][67].
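The representation F(t) = exp[−∫_0^t h(u)du] is straightforward to verify numerically (a sketch; the Weibull example with ϵ = 2 and s = 1, for which h(t) = 2t, is an assumption):

```python
import numpy as np

# Weibull example (epsilon = 2, s = 1): survival exp(-t^2), density 2t exp(-t^2).
t = np.linspace(0.0, 3.0, 100_000)
surv = np.exp(-t**2)
haz = (2 * t * np.exp(-t**2)) / surv            # hazard rate h(t) = f(t)/F(t) = 2t
cum_haz = np.concatenate(
    ([0.0], np.cumsum((haz[1:] + haz[:-1]) / 2 * np.diff(t)))  # trapezoidal int of h
)
print(np.max(np.abs(np.exp(-cum_haz) - surv)))  # essentially zero
```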
Here, as shall now be shown, the hazard rate h(t) emerges naturally in the context of the statistical divergence of the random variable T res from the random variable T.
In order to use the framework of Section 2, we set X = T and Y = T_res (the equalities being in law). In turn, the underlying range is the positive half-line R = (0, ∞), and the distribution functions of X and Y are those of the random variables T and T_res. The curve C(u) coincides with the diagonal line of the unit square, C(u) = u (0 < u < 1), if and only if the curve's derivative is identically one, C′(u) = 1 (0 < u < 1). In turn, Equation (21) implies that the curve's derivative is identically one if and only if the hazard rate is flat: h(t) = 1/µ (0 < t < ∞). This flat hazard rate is equivalent to the exponential survival function F(t) = exp(−t/µ) (0 < t < ∞). The above argumentation affirms the well-known fact that the residual lifetime T_res (of the random variable T) is equal in law to the random variable T if and only if this random variable is exponentially distributed. Thus, in effect, the statistical divergence of the random variable T_res from the random variable T quantifies how "non-exponential" the statistical distribution of the random variable T is.
When the random variable T is exponentially distributed, the renewal process that it generates is the Poisson process [52][53][54]. Hence, from a renewal perspective, the statistical divergence of the random variable T_res from the random variable T quantifies the deviation of the renewal process that is generated by the random variable T from the 'Poisson benchmark'.

Increasing and Decreasing Hazard Rates
In reliability engineering, two particular classes of random variables are distinguished as significantly important [65]. One is the increasing failure rate (IFR) class, which comprises all positive random variables whose hazard rates are increasing functions. The other is the decreasing failure rate (DFR) class, which comprises all positive random variables whose hazard rates are decreasing functions.
The IFR class is used to model the lifetimes of systems that age with time. Namely, a system is aging if the likelihood that the system will fail grows as the age of the system grows. Aging systems are all around us, and examples of such systems include buildings, infrastructures, machines, cars, ships, airplanes, and even our very own human bodies.
If the random variable T belongs to the IFR class, then its hazard rate h(t) is an increasing function. In this IFR scenario, Equation (21) implies that the derivative C′(u) is decreasing, and hence the curve C(u) is concave. In turn, the concavity of the curve C(u) implies that its graph resides above the diagonal line of the unit square: C(u) > u (0 < u < 1).
The DFR class is used to model the lifetimes of phenomena that anti-age with time. Namely, a phenomenon is anti-aging if the likelihood that the phenomenon will cease diminishes as the age of the phenomenon grows [68]. Anti-aging phenomena are encountered in our culture and in our technologies, e.g., the symphonies of Beethoven, the writings of Shakespeare, the Gregorian calendar, the English alphabet, cutlery, and the wheel. Indeed, the longer we listen to Beethoven and the longer we use cutlery, the greater the likelihood that we shall keep on doing so.
If the random variable T belongs to the DFR class, then its hazard rate h(t) is a decreasing function. In this DFR scenario, Equation (21) implies that the derivative C′(u) is increasing, and hence the curve C(u) is convex. In turn, the convexity of the curve C(u) implies that its graph resides below the diagonal line of the unit square: C(u) < u (0 < u < 1).
As noted in Section 2.1, the curve C(u) splits the unit square into two sets: the square's points that are above the curve, and the square's points that are below the curve. Moreover, Equation (3) asserts that the difference between the area of the upper set and the area of the lower set is the quantity ∆(T_res||T). The quantity ∆(T_res||T) takes values in the range [−1, 1], and for the IFR and DFR classes, it gauges the "deviation from exponentiality". Indeed, if the random variable T belongs to the IFR class, then the quantity ∆(T_res||T) takes values in the range [−1, 0], and it displays the following properties. (I) The quantity ∆(T_res||T) attains its upper bound 0 if and only if the random variable T is exponentially distributed. (II) The smaller the quantity ∆(T_res||T), the more "non-exponential" the statistical distribution of the random variable T is.
Similarly, if the random variable T belongs to the DFR class, then the quantity ∆(T_res||T) takes values in the range [0, 1], and it displays the following properties. (I) The quantity ∆(T_res||T) attains its lower bound 0 if and only if the random variable T is exponentially distributed. (II) The larger the quantity ∆(T_res||T), the more "non-exponential" the statistical distribution of the random variable T is.
In statistical physics, the exponential distribution is the paradigmatic model for 'regular' relaxation. The IFR and DFR classes offer general models for 'anomalous' relaxation [55][56][57][58]. Specifically, the IFR class is a general model for super-exponential relaxation, and the DFR class is a general model for sub-exponential relaxation. In turn, the quantity ∆(T_res||T) quantifies the deviation of anomalous relaxation, super-exponential and sub-exponential alike, from the 'regular-relaxation benchmark'.

Weibull Example
To illustrate the quantity ∆(T_res||T) in the context of the IFR and DFR classes, consider the example of a Weibull-distributed random variable T [69][70][71]. This example is characterized by the survival function F(t) = exp[−(t/s)^ϵ] (0 < t < ∞), where s is a positive scale parameter, and where ϵ is a positive exponent. Equivalently, this example is characterized by the hazard rate h(t) = ct^{ϵ−1} (0 < t < ∞), where c = ϵ/s^ϵ is a positive coefficient.
The Weibull hazard rate exhibits the following behaviors. For exponent values ϵ < 1, the hazard rate is decreasing, and hence the random variable T belongs to the DFR class. At the exponent value ϵ = 1, the hazard rate is flat, and hence the random variable T is exponentially distributed, with a mean that equals the scale parameter, µ = s. For exponent values ϵ > 1, the hazard rate is increasing, and hence the random variable T belongs to the IFR class.
A calculation using a general result to be presented below (Equation (25)) yields, for the Weibull-distributed random variable T, the quantity ∆(T_res||T) = 1 − 2^{(ϵ−1)/ϵ} (Equation (23)). Note that the right-hand side of Equation (23) depends only on the Weibull exponent ϵ (i.e., it does not depend on the scale parameter s). Denoting the right-hand side of Equation (23) by g(ϵ), this function of the Weibull exponent ϵ decreases from the level lim_{ϵ→0} g(ϵ) = 1 to the level lim_{ϵ→∞} g(ϵ) = −1. In addition, this function vanishes at the exponent value one, g(1) = 0. Thus, for the Weibull-distributed random variable T, the quantity ∆(T_res||T) exhibits the following behaviors. In the DFR range (ϵ < 1), the quantity ∆(T_res||T) is positive, and it attains its upper bound 1 in the Weibull limit ϵ → 0. The quantity ∆(T_res||T) is zero if and only if the Weibull exponent ϵ is one, which characterizes the case of an exponentially distributed random variable T. In the IFR range (ϵ > 1), the quantity ∆(T_res||T) is negative, and it attains its lower bound −1 in the Weibull limit ϵ → ∞.
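The Weibull behavior of ∆(T_res||T) can be cross-checked numerically (a sketch; the integral representation ∆(T_res||T) = 1 − (2/µ)∫_0^∞ F(t)²dt used below, as well as the grid and cutoff, are assumptions of this sketch, with the representation following from Equation (3) by a change of variables):

```python
import numpy as np

def delta_res_weibull(eps, t_max=200.0, n=400_000):
    """Delta(T_res||T) for Weibull T, via 1 - (2/mu) * integral of F(t)^2."""
    t = np.linspace(0.0, t_max, n)
    surv = np.exp(-t**eps)
    trap = lambda y: ((y[1:] + y[:-1]) / 2 * np.diff(t)).sum()
    mu = trap(surv)                       # E[T]: integral of the survival function
    return 1.0 - 2.0 * trap(surv**2) / mu

for eps in (0.5, 1.0, 2.0):
    print(eps, delta_res_weibull(eps), 1 - 2 ** ((eps - 1) / eps))
```

The numeric values reproduce the closed form g(ϵ) = 1 − 2^{(ϵ−1)/ϵ}: positive in the DFR range, zero at ϵ = 1, negative in the IFR range.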
As noted in Section 4.2, the exponential distribution is the paradigmatic model for 'regular' relaxation in statistical physics. The paradigmatic model for 'anomalous' relaxation is the Weibull distribution [55][56][57][58]. The Weibull distribution spans sub-exponential anomalous relaxation (the DFR range ϵ < 1), super-exponential anomalous relaxation (the IFR range ϵ > 1), and 'regular' exponential relaxation (the ϵ = 1 boundary, which separates the DFR and the IFR ranges). So, with regard to the Weibull model of anomalous relaxation, the quantity ∆(T_res||T) of Equation (23) quantifies the deviation from the 'regular-relaxation benchmark'.
A derivation detailed in Appendix A asserts that Equation (4) yields the quantity α_m(T_res||T) = (m + 1)·(E[max{T_1, ⋯, T_{m+1}}] − E[max{T_1, ⋯, T_m}])/µ − 1 (Equation (24)), where {T_1, ⋯, T_{m+1}} comprises m + 1 IID copies of the random variable T. The quantity α_m(T_res||T) is based on the difference between the mean of the maximal random variable max{T_1, ⋯, T_{m+1}} and the mean of the maximal random variable max{T_1, ⋯, T_m}. Via these maximal random variables, the quantity α_m(T_res||T) sets its focus on the occurrence of large values of the duration T. As noted in Section 2.2, the quantity α_m(T_res||T) takes values in the range [−1, m].
A derivation detailed in Appendix A asserts that Equation (5) yields the quantity β_m(T_res||T) = 1 − (m + 1)·E[min{T_1, ⋯, T_{m+1}}]/µ (Equation (25)). The quantity β_m(T_res||T) is based on the mean of the minimal random variable min{T_1, ⋯, T_{m+1}}, and it thus sets its focus on the occurrence of small values of the duration T.
As noted in Section 2.2, the quantity β_m(T_res||T) takes values in the range [−m, 1].

Duration-Wealth Linkages
This section addresses the statistical divergence of the random variable T_res from the random variable T, where T_res is a random variable whose statistical distribution is governed by the density function f_res(t) = (1/µ)F(t) (0 < t < ∞). The previous section addressed the statistical divergence of the random variable T$ from the random variable T, where T$ is a random variable whose statistical distribution is governed by the density function f$(t) = (1/µ)t f(t) (0 < t < ∞). In this subsection we show that the quantities α_m(T_res||T) and β_m(T_res||T) of this section are intimately linked to the quantities α_m(T$||T) and β_m(T$||T) of the previous section.
Indeed, observing Equation (24) on the one hand, and Equation (15) on the other hand, it follows that α_m(T_res||T) = (m + 1)·[α_m(T$||T) − α_{m−1}(T$||T)] − 1 (Equation (26)). Moreover, observing Equation (25) on the one hand, and Equation (16) on the other hand, it follows that β_m(T_res||T) = (m + 1)·β_m(T$||T) − m (Equation (27)). In particular, setting m = 1 in Equations (26) and (27) yields ∆(T_res||T) = 2∆(T$||T) − 1 (Equation (28)). Specifically, Equation (28) follows from Equations (26) and (27) by noting that α_0(T$||T) = 0 (indeed, setting m = 0 in Equation (15) yields α_0(T$||T) = 0), and by using Equation (6) (with regard to the divergence of T_res from T, and with regard to the divergence of T$ from T). Recall that the quantity ∆(T$||T) that appears on the right-hand side of Equation (28) is the Gini index of the random variable T.
As described in the opening of this section, the random variable T_res has a renewal interpretation. And, as described in the opening of the previous section, the random variable T$ has a wealth interpretation. In fact, the random variable T$ also has a renewal interpretation, which is described as follows.
Consider a renewal process that is generated from the random variable T (as detailed in the opening of this section). Standing at a fixed positive time point t, denote by C_t the temporal duration of the renewal interval that 'covers' the time point t. Namely, C_t is the temporal duration between the following renewal epochs: the last renewal epoch before the time point t, and the first renewal epoch after the time point t. A key result of the theory of renewal processes asserts that [51]: the random variable C_t converges in law, in the limit t → ∞, to the random variable T$. In this renewal context, the random variable T$ is termed the "total lifetime" of the random variable T.
The random variables T and T$ manifest two different observations of the renewal process. To illustrate these different observations, consider (as in the opening of this section) the renewal process as manifesting the time epochs at which buses (of a certain bus line) arrive at a given bus station. The random variable T manifests the waiting time between consecutive bus arrivals, as observed by a passenger that reaches the bus station just after a bus has left the station. The random variable T$ manifests the waiting time between consecutive bus arrivals, as observed by a passenger that reaches the bus station at an arbitrary time point.
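The bus-station picture is the classic inspection paradox, and a short simulation makes it tangible (a sketch; the Weibull choice of inter-arrival durations and the observation time are assumptions). The covering interval C_t has the limiting size-biased mean E[T$] = E[T²]/E[T], which exceeds E[T].

```python
import numpy as np

rng = np.random.default_rng(7)

def mean_covering_interval(t_obs, n_runs=5_000):
    """Monte Carlo mean of C_t: the length of the renewal interval covering t_obs."""
    out = np.empty(n_runs)
    for k in range(n_runs):
        prev, tau = 0.0, rng.weibull(2.0)   # Weibull inter-arrivals, epsilon=2, s=1
        while tau <= t_obs:
            prev, tau = tau, tau + rng.weibull(2.0)
        out[k] = tau - prev
    return out.mean()

mu = 0.88622693                      # E[T] = Gamma(1 + 1/2)
est = mean_covering_interval(20.0)
print(est, 1.0 / mu)                 # limiting mean E[T$] = E[T^2]/E[T] = 1/mu > mu
```

An arbitrary time point is more likely to land inside a long renewal interval, which is exactly the inclination of T$ towards large durations.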

Inequality Indices Revisited
Equations (26) and (27) established linkages between the quantities α_m(T_res||T) and β_m(T_res||T) of this section, and the quantities α_m(T$||T) and β_m(T$||T) of the previous section. In the previous section, it was shown that transformations of the quantities α_m(T$||T) and β_m(T$||T) are inequality indices of the random variable T. Thus, the following question arises naturally: are there transformations of the quantities α_m(T_res||T) and β_m(T_res||T) that are also inequality indices? This subsection shall answer the question affirmatively.
The following transformation of the quantity α_m(T_res||T) is an inequality index: [1 + α_m(T_res||T)]/(m + 1) = Pr(T_1, ⋯, T_m ≤ T_res) (Equation (29)). Indeed, setting X = T and Y = T_res in Equation (4), it follows that the term appearing in Equation (29) is the probability Pr(X_1, ⋯, X_m ≤ Y), which, of course, takes values in the range [0, 1]. The right-hand side of Equation (29) attains the lower bound 0 if and only if the random variable T is constant with probability one. In addition, the right-hand side of Equation (29) is invariant with respect to changes-of-scale of the random variable T.
The following transformation of the quantity β_m(T_res||T) is an inequality index: [m + β_m(T_res||T)]/(m + 1) = 1 − Pr(T_1, ⋯, T_m > T_res) (Equation (30)). Indeed, setting X = T and Y = T_res in Equation (5), it follows that the term appearing in Equation (30) is the probability Pr(X_1, ⋯, X_m > Y), which, of course, takes values in the range [0, 1]. The right-hand side of Equation (30) attains the lower bound 0 if and only if the random variable T is constant with probability one. Moreover, the right-hand side of Equation (30) is invariant with respect to changes-of-scale of the random variable T. Equation (27) implies that the left-hand side of Equation (30) is equal to the quantity β_m(T$||T). In the previous section, we saw that the quantity β_m(T$||T) is an inequality index, and hence Equation (30) does not yield a 'new' inequality index. On the other hand, Equation (29) does yield a 'new' inequality index, i.e., an inequality index that was not encountered in the previous section.
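The probability Pr(T_1, ⋯, T_m ≤ T_res) can be estimated directly (a sketch; the exponential choice, under which T_res is equal in law to T and the probability equals 1/(m+1), is an assumption, as is the form α_m = (m+1)·Pr(⋯) − 1 used below):

```python
import numpy as np

rng = np.random.default_rng(11)
m, n = 3, 200_000

# Exponential case: T_res is equal in law to T, so Pr(T_1,...,T_m <= T_res) = 1/(m+1),
# and the divergence alpha_m(T_res||T) vanishes.
T = rng.exponential(size=(n, m))
T_res = rng.exponential(size=n)           # residual lifetime: again exponential here
p = (T.max(axis=1) <= T_res).mean()
alpha_m = (m + 1) * p - 1                  # assumed form: alpha_m = (m+1) Pr(...) - 1
print(p, alpha_m)
```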

Summary and Implementation
This section has presented a renewal application of the general framework that was established in Section 2. Underlying the renewal application is the following setting: a renewal process whose inter-renewal durations-i.e., the temporal durations between the process' consecutive renewal epochs-are IID copies of the random variable T.
The waiting time until the next renewal epoch was observed via two different temporal perspectives. On the one hand, an observer was placed right after a renewal epoch; the waiting time of this observer was the random variable T. On the other hand, an observer was placed at an arbitrary time point; the waiting time of this observer was the random variable T_res, the "residual lifetime" of the random variable T.
The renewal application focused on measuring the statistical divergence of the random variable T_res from the random variable T. In effect, this statistical divergence quantifies the extent to which the distribution of the random variable T deviates from the exponential distribution. Indeed, the statistical divergence of T_res from T was shown to be zero if and only if the random variables T and T_res share a common exponential distribution.
The renewal process whose inter-renewal durations are exponentially distributed is the Poisson process. Hence, from a renewal perspective, the statistical divergence of T_res from T quantifies the deviation of renewal processes from the 'Poisson benchmark'. In statistical physics, the exponential distribution is the paradigmatic model of 'regular' relaxation. Hence, from a statistical-physics perspective, the statistical divergence of T_res from T quantifies the deviation of anomalous relaxation from the 'regular-relaxation benchmark'.
The statistical divergence of the random variable T_res from the random variable T was measured using two quantities, α_m(T_res||T) and β_m(T_res||T). As shown above, these two quantities can be mapped-via one-to-one transformations-to inequality indices of the random variable T. The measurement of the statistical divergence of the random variable T_res from the random variable T by the quantity γ_m(T_res||T) is detailed in Appendix A. Also detailed in Appendix A is the measurement of the statistical divergence of the random variable T from the random variable T_res via the quantity γ_m(T||T_res).
The implementation of the aforementioned quantities, based on empirical data-be it real-world, experimental, simulated, etc.-is practiced as follows. Firstly, given n samples of the random variable T, order these samples increasingly: t_(1) ≤ t_(2) ≤ ⋯ ≤ t_(n). Secondly, set π_0 = 0 and calculate the following proportions: π_i = t_(i)/(t_(1) + ⋯ + t_(n)) (i = 1, 2, ⋯, n). Thirdly, calculate the quantities via the corresponding approximation formulae.
Indeed, based on the empirical data, it is shown in Appendix A that the estimate of the curve C(u) is the linear interpolation of the unit-square points C(0) = 0 and C(i/n) (i = 1, 2, ⋯, n). In turn, the slopes of the piecewise-linear estimate of the curve C(u) are C′(u) = n(n − i + 1)(π_i − π_{i−1}) for (i − 1)/n < u < i/n (i = 1, 2, ⋯, n). Consequently, the estimates of the aforementioned quantities follow by substituting this piecewise-linear estimate into the formulae that define the quantities. An alternative way to implement the quantities α_m(T_res||T) and β_m(T_res||T) is the following: use Equations (26) and (27) to represent these quantities in terms of the corresponding quantities of Section 3, and use the approximation formulae of Section 3.7 with regard to the latter quantities.
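The proportions and slopes above combine into a direct estimate of the quantity ∆(T_res||T) via the formulation 1 − 2∫_0^1 C(u)du (assumed here following Appendix A; the exponential test data are likewise an assumption); for exponential data the estimate should be near zero, the Poisson benchmark:

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.sort(rng.exponential(size=50_000))
n = len(t)

pi = np.concatenate(([0.0], t / t.sum()))           # pi_i = t_(i) / (t_(1) + ... + t_(n))
i = np.arange(1, n + 1)
slopes = n * (n - i + 1) * np.diff(pi)              # C'(u) on ((i-1)/n, i/n)
C = np.concatenate(([0.0], np.cumsum(slopes) / n))  # C(i/n) by accumulating the slopes

integral = ((C[1:] + C[:-1]) / 2).sum() / n         # integral of the piecewise-linear C(u)
delta_est = 1.0 - 2.0 * integral
print(delta_est)  # near 0 for exponential data
```

A useful internal check is that the accumulated slopes close the curve, C(1) = 1, which holds for any positive sample.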

Conclusions
This paper addressed the topic of statistical divergence. To that end, a pair of random variables, which take values in a common real range, were considered: a 'benchmark' random variable X, and a random variable Y of interest. The focus was put on gauging the distance of the statistical distribution of Y from the statistical distribution of X.
A general framework of statistical divergence was established in Section 2, and was summarized in Section 2.6. The general framework was constructed in a simple and transparent fashion, and the gauges it yielded included the Hellinger divergence, the Renyi divergence, the Kullback-Leibler divergence, and the f-divergence. Two applications of the general framework were then presented.
The first application was to the topic of socioeconomic inequality. This application was detailed in Section 3, and was summarized in Section 3.7. The fruits that this application yielded included the Lorenz curve, socioeconomic inequality indices, the Gini index, and generalizations of the Gini index.
The second application was to the topic of renewal processes. This application was detailed in Section 4, and was summarized in Section 4.7. The fruits that this application yielded included gauges that quantify the divergence of renewal processes from the Poisson process, gauges that quantify the divergence of anomalous relaxation from regular relaxation, and further generalizations of the Gini index.
Empirical applications of the general framework are beyond the scope of this paper. Each of the aforementioned summary subsections provided 'implementation formulae'. Namely, with regard to given empirical data-be it real-world, experimental, simulated, etc.-the implementation formulae specify how to calculate the gauges that were established and presented here.
Theoretically, this paper offers its readers a transparent and rather general path to statistical divergence, paths thereof to further topics, and deep linkages between the different (and seemingly unrelated) topics addressed. Practically, this paper offers its readers potent gauges of statistical divergence, and explicit formulae that specify how to implement the gauges. Theoretically and practically alike, this paper provides a wide and multi-disciplinary perspective on statistical divergence.
The quantity appearing in the bottom line of Equation (A25) is the Kullback-Leibler divergence of the random variable Y (whose density function is b(r)) from the random variable X (whose density function is a(r)).

Equation (3) admits the formulation ∆(Y||X) = 1 − 2∫_0^1 C(u)du. The quantity ∆(Y||X) takes values in the range [−1, 1]. The quantity ∆(Y||X) attains its lower bound if and only if the random variable Y equals its lower bound with probability one: ∆(Y||X) = −1 ⇔ Pr(Y = r_low) = 1. Antithetically, the quantity ∆(Y||X) attains its upper bound if and only if the random variable Y equals its upper bound with probability one: ∆(Y||X) = 1 ⇔ Pr(Y = r_up) = 1. The quantity ∆(Y||X) is zero if-but not only if-the random variables X and Y are equal in law: A(r) ≡ B(r) ⇒ ∆(Y||X) = 0 (this implication uses the aforementioned assumption Pr(Y = X) = 0).

The Gini index admits the formulation ∆(W$||W) = 2∫_0^1 [u − C(u)]du. The Gini index displays the three following properties. (I) It takes values in the range [0, 1]. (II) It attains its lower bound 0 if and only if the underlying human society is in the socioeconomic state of perfect equality. (III) It attains its upper bound 1 if and only if the underlying human society is in the socioeconomic state of perfect inequality.
The quantity α_m(W$||W), via the maximal random variable max{W_1, ⋯, W_{m+1}}, sets its focus on the occurrence of large wealth values. It is shown in Appendix A that the quantity α_m(W$||W) takes values in the range [0, m].
The quantity β_m(W$||W), via the minimal random variable min{W_1, ⋯, W_{m+1}}, sets its focus on the occurrence of small wealth values. It is shown in Appendix A that the quantity β_m(W$||W) takes values in the range [0, 1].
A derivation detailed in Appendix A asserts that the moments of the random variable U are E[U^m] = Pr(X_1, ⋯, X_m ≤ Y); namely, the moment E[U^m] is the probability that the m IID copies of X are all no larger than Y.
As the random variable U_* is uniformly distributed on (0, 1), the moments of the random variable U_* are E[U_*^m] = 1/(m + 1). As the quantity α_m(W$||W) takes values in the range [0, m], and as the quantity β_m(W$||W) takes values in the range [0, 1], the quantity ρ_m(W$||W) takes values in the range [0, m + 1].

Table 2.
Inequality indices that are obtained via transformations of the quantities presented and discussed in Section 3. For each quantity, the table specifies the transformation and the resulting inequality index. The values of the underlying parameter m are as in Table 1: m = 1, 2, 3, ⋯ for the quantities α_m(W$||W) and β_m(W$||W), as well as for the quantity ρ_m(W$||W); and m > 1 for the quantities γ_m(W$||W) and γ_m(W||W$).