Elsevier

Physics Reports

Volume 649, 5 September 2016, Pages 1-29

Harnessing inequality

https://doi.org/10.1016/j.physrep.2016.07.005

Abstract

Living in the era of “big-data” information, we are ubiquitously inundated by overabundances of sizes—non-negative numerical values representing count, score, length, area, volume, duration, mass, energy, etc. Datasets of sizes display numerous types of statistical variability that are commonly quantified either by the standard deviation, or by the Boltzmann–Gibbs–Shannon entropy. The standard deviation measures the sizes’ Euclidean divergence from their mean, the Boltzmann–Gibbs–Shannon entropy measures the sizes’ informational divergence from the benchmark of pure determinism, and both these gauges are one-dimensional. In this paper we overview a methodology that harnesses inequality in order to quantify statistical variability. The methodology follows a socioeconomic approach of measuring the sizes’ inequality–their divergence from the benchmark of pure egalitarianism–and yields frameworks that gauge statistical variability in a multi-dimensional fashion. The aim of this overview is to serve both researchers and practitioners as a crash-introduction to the “harnessing inequality” methodology, and as a crash-manual to the implementation of this methodology.

Introduction

The inequality between the rich and the poor is a matter of most significant interest and importance, public and scientific alike [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30]. To attend to this matter in the context of a given human society it is essential, at first, to quantitatively measure the society’s socioeconomic inequality. Indeed, as is often said: “If you cannot measure it, you cannot improve it”. So, how can we gauge socioeconomic inequality?

Common wisdom would suggest applying the mainstream statistical approach: collect the wealth data of the society members, and then calculate the corresponding mean and standard deviation. The calculated standard deviation is a quantitative measure of the statistical variability of the society’s wealth distribution: the smaller the standard deviation–the smaller the fluctuations about the mean, and hence the more egalitarian the society; the larger the standard deviation–the larger the fluctuations about the mean, and hence the greater the gap between the rich and the poor.

This mainstream statistical approach is obviously correct. However, this approach is quite rudimentary, and it fails to adequately capture the inequality between the rich and the poor. Indeed, economists and social scientists do not apply this approach. Rather, to quantify socioeconomic inequality they use gauges called inequality indices, the most widely applied of which is the well-known Gini index [31], [32], [33], [34], [35].

Wealth is a particular example of size, and in the current era of “big data” information we are ubiquitously inundated by overabundances of sizes: non-negative numerical values such as count, score, length, area, volume, mass, energy, duration, etc. Quantifying the statistical variability, i.e. the intrinsic randomness, of datasets of sizes is of prime importance. For example, consider the criticality of correctly tracking and analyzing the statistical variability of climate-change measurements such as temperatures, precipitation, levels of greenhouse gases, etc.

The standard deviation is the most common gauge of statistical variability in the context of real-valued quantities. Yet another common gauge of statistical variability that is applicable in the context of real-valued quantities is the Boltzmann–Gibbs–Shannon entropy, which stems from statistical physics and from information theory [36], [37], [38], [39], [40]. In effect, in the context of sizes, inequality indices can also be considered as legitimate gauges of statistical variability. Indeed, the mathematical formula for the calculation of an inequality index will do the very same ‘job’ when its inputs represent socioeconomic wealth data as when they represent arbitrary size data.

When applying an inequality index to a general dataset of sizes the large sizes are deemed to be ‘the rich’, the small sizes are deemed to be ‘the poor’, and the dataset’s statistical variability is quantified in terms of its inherent ‘socioeconomic inequality’. To date, this socioeconomic approach to measuring the statistical variability of arbitrary sizes uses almost exclusively the Gini index. Quite recent examples of Gini-index applications outside economics and the social sciences include: bacterial chemotaxis [41], interface friction [42], crowd science [43], RNA regulatory mechanisms [44], stem cell differentiation [45], clean energy [46], cancer mutations [47], maternal and pediatric health and disease [48], cell transplants [49], and cosmological lensing [50].

The standard deviation, the Boltzmann–Gibbs–Shannon entropy, and the Gini index are all one-dimensional gauges of statistical variability. In the era of “big data” information one-dimensional gauges bear the risk of providing low resolution and missing fine details. Consequently, huge datasets call for a multi-dimensional quantification of statistical variability, and in the context of sizes the socioeconomic approach well answers this call. Indeed, the socioeconomic approach yields a plethora of different inequality indices, hence providing the desired multi-dimensionality.

In recent years physicists have leaped far beyond their ‘core’ domain of investigating physical phenomena. Driven by the study of complex systems and networks physicists entered the exploration of economic and social phenomena, thus establishing the ‘exotic’ fields of econophysics and sociophysics [7], [8], [17], [18], [51], [52], [53], [54], [55], [56]. While ‘classic’ physics coupled together strong predictive theory and carefully designed experiments that were executed with controlled accuracy, ‘exotic’ fields of physics often operate in domains where such theory-experimentation coupling is unfeasible. Instead, scientific exploration is advanced by harvesting and thereafter studying “big data” information—in which case a multi-dimensional analysis of the inherent statistical variability is indeed of the essence.

Remarkably, the arrow leading from ‘classic’ physics to ‘exotic’ econophysics and sociophysics now reverses. In the original direction statistical-physics methods facilitated the modeling of complex economic and social phenomena. In the reverse direction socioeconomic-inequality methods facilitate the multi-dimensional statistical-variability analysis of huge datasets of sizes—be they physical, environmental, biological, or of any other source.

This paper has two main goals. On the one hand, following the widespread contemporary interest in socioeconomic inequality, the first goal is to present a panoramic up-to-date overview of the recent advances in the quantitative measurement of inequality. On the other hand, following the rise of “big data” information, a grander goal is to present a “harnessing inequality” methodology: a multi-dimensional socioeconomic approach to gauge statistical variability in the context of sizes at large.

This paper is a crash-introduction to the “harnessing inequality” methodology, and a crash-manual to its implementation. The reading of this paper requires no prior knowledge beyond basic calculus and statistics, and the paper’s organization is as follows:

After recalling the standard deviation and entropy (Section 2), we introduce the notions of Lorenz curves (Section 3), Lorenz sets (Section 4), and inequality indices (Section 5). In the context of these notions we further discuss finiteness and infiniteness (Section 6). We then present an assortment of quantitative measures of inequality: the Gini and Amato indices (Section 7); the Pietra index (Section 8); the vertical-diameter, horizontal-diameter, and perpendicular-diameter indices (Section 9); the vertical and horizontal hill curves (Section 10); and disparity-based inequality indices (Section 11). With regard to the various inequality indices presented we address their universal ordering (Section 12), their underlying statistical bedrock (Section 13), and their underlying geometric bedrock (Section 14). Sections 2 to 14 overview a ‘sociogeometric’ perspective of inequality. We supplement the ‘sociogeometric’ perspective by an alternative entropy-based perspective of inequality (Section 15), and demonstrate the effectiveness of the entropy-based perspective in the context of partitioned datasets of sizes (Section 16). We conclude with a summary (Section 17), and with a ‘take-home message’ (Section 18).

A general note about notation: $x=\varphi^{-1}(y)$ denotes the inverse function of a given monotone function $y=\varphi(x)$, with real-valued inputs and outputs.

Section snippets

Foundation

Throughout the paper we consider as given a dataset of non-negative sizes $S=\{s_1,s_2,\ldots,s_n\}$. We term the integer $n$ the dimension of the dataset $S$, and label the dataset’s sizes by the index $i=1,2,\ldots,n$. In what follows we assume that the sum of the dataset’s sizes is positive, $s_1+\cdots+s_n>0$, thus excluding the trivial scenario in which all the sizes are identically zero: $s_1=s_2=\cdots=s_n=0$.
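As a minimal sketch of the setting just described, the following Python snippet checks the two assumptions placed on the dataset (non-negative sizes, positive sum) and computes the mean and standard deviation that serve as the mainstream baseline. The function names and the example values are hypothetical, and the population form of the standard deviation (division by n) is an assumption, since the excerpt does not state which form the paper uses.

```python
import math

def validate_sizes(sizes):
    """Check the assumptions on the dataset: non-negative sizes with a positive sum."""
    if len(sizes) == 0:
        raise ValueError("dataset must contain at least one size")
    if any(s < 0 for s in sizes):
        raise ValueError("sizes must be non-negative")
    if sum(sizes) <= 0:
        raise ValueError("sizes must not all be zero")
    return list(sizes)

def mean_and_std(sizes):
    """Return the mean and the population standard deviation (assumed form)."""
    s = validate_sizes(sizes)
    n = len(s)
    mu = sum(s) / n
    variance = sum((x - mu) ** 2 for x in s) / n
    return mu, math.sqrt(variance)

# Hypothetical dataset of dimension n = 4.
mu, sigma = mean_and_std([1.0, 2.0, 3.0, 4.0])
print(mu, sigma)
```

The all-zero dataset is rejected explicitly because it is the trivial scenario excluded above.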

As noted in the introduction, the examples of such datasets are numerous. To illustrate the generality of the dataset S

Lorenz curves

Lorenz curves constitute the foundation underlying the sociogeometric gauging of the statistical variability of the dataset of sizes $S$ [63], [64], [65], [66], [67], and in this section we describe these curves. Specifically, in what follows we introduce a pair of Lorenz curves, $y=L(x)$ and $y=\bar{L}(x)$ ($0\le x,y\le 1$), that encode the dataset of sizes $S$.
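The excerpt does not reproduce the paper's definitions of the two curves, so the sketch below rests on a labeled assumption: that $L(x)$ gives the share of the total held by the smallest fraction $x$ of the sizes, and $\bar{L}(x)$ the share held by the largest fraction $x$, a convention common in the Lorenz-curve literature. Under that assumption, the vertices of the two piecewise-linear curves can be computed as follows (the function name `lorenz_points` is hypothetical).

```python
def lorenz_points(sizes):
    """Vertices of the lower and upper Lorenz curves (assumed convention).

    Returns (xs, lower, upper): lower[k] is the share of the total held by the
    k smallest sizes, upper[k] the share held by the k largest sizes.
    """
    n = len(sizes)
    total = sum(sizes)
    ascending = sorted(sizes)
    xs = [k / n for k in range(n + 1)]
    lower = [0.0]
    for s in ascending:            # accumulate from the smallest size upward
        lower.append(lower[-1] + s / total)
    upper = [0.0]
    for s in reversed(ascending):  # accumulate from the largest size downward
        upper.append(upper[-1] + s / total)
    return xs, lower, upper

xs, lower, upper = lorenz_points([1, 1, 2, 4])
# lower == [0.0, 0.125, 0.25, 0.5, 1.0]
# upper == [0.0, 0.5, 0.75, 0.875, 1.0]
```

Both curves start at (0, 0) and end at (1, 1); for a dataset of identical sizes they coincide with the line y = x.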

For the sake of illustration we henceforth consider the elements of the dataset of sizes S to represent the wealth values of the members of a given

Lorenz sets

With the notion of Lorenz curves at hand, we are now in position to present the notion of Lorenz sets. In this section we first describe the notion of socioeconomic extremes, and then introduce the Lorenz sets.

Quantifying inequality

In the previous section we introduced the Lorenz set L, a geometric object that quantifies the distribution of wealth in the human society under consideration. Evidently, the closer the Lorenz set L is to the ‘lower bound’ backbone B–the more egalitarian the society; and the closer the Lorenz set L is to the ‘upper bound’ unit square U–the greater the gap between the rich and the poor. Consequently, geometric measures can be used to quantify the divergence of the Lorenz set L from the backbone B

Finiteness and infiniteness

As established in Section 4.2, the Lorenz set L always resides in between two geometric ‘bounds’: the ‘lower bound’ backbone B that manifests the socioeconomic extreme of pure communism, and the ‘upper bound’ unit square U that manifests the socioeconomic extreme of absolute monarchy. There is a subtle distinction between these two ‘bounds’. On the one hand, the backbone B characterizes pure communism for any dimension, n, of the dataset of sizes S. On the other hand, the unit square U

Area and circumference

In this section we review two inequality indices. The first is the popular Gini index, which is based on the area of Lorenz sets. The second is the Amato index, which is based on the circumference of Lorenz sets.
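To make the area-based reading of the Gini index concrete, here is a hedged sketch. It assumes that the Lorenz set is the region enclosed between the lower and upper Lorenz curves (the former accumulating from the smallest sizes, the latter from the largest) and that the index equals that area; neither normalization is confirmed by this excerpt. The second function is the textbook mean-difference form of the Gini index, included only as a cross-check. Both function names are hypothetical.

```python
def gini_area(sizes):
    """Area between the upper and lower piecewise-linear Lorenz curves."""
    n = len(sizes)
    total = sum(sizes)
    ascending = sorted(sizes)
    lower, upper = [0.0], [0.0]
    for k in range(n):
        lower.append(lower[-1] + ascending[k] / total)
        upper.append(upper[-1] + ascending[n - 1 - k] / total)
    # Trapezoid rule on a uniform grid of width 1/n; exact for piecewise-linear curves.
    gaps = [u - l for u, l in zip(upper, lower)]
    return sum((gaps[k] + gaps[k + 1]) / 2 for k in range(n)) / n

def gini_mean_difference(sizes):
    """Textbook form: mean absolute difference divided by twice the mean."""
    n = len(sizes)
    mu = sum(sizes) / n
    mean_abs_diff = sum(abs(a - b) for a in sizes for b in sizes) / (n * n)
    return mean_abs_diff / (2 * mu)

data = [1, 1, 2, 4]                # hypothetical dataset
print(gini_area(data))             # 0.3125
print(gini_mean_difference(data))  # 0.3125
```

The two routes agree on this example, which is what one expects when the upper curve is the mirror image of the lower one. The circumference-based Amato index is not sketched here because its normalization is not given in the excerpt.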

Maximal distances

The convex shape of the Lorenz set L implies that the maximal distances of its Lorenz-curve boundaries from the backbone are monotone geometric measures of this set. Specifically, in this section we consider the six following geometric measures [68]:

  • The maximal vertical distances, $M(L)=\mathrm{MVD}(L)$ and $M(L)=\overline{\mathrm{MVD}}(L)$, between the line of perfect equality $y=x$ and the Lorenz curves $y=L(x)$ and $y=\bar{L}(x)$, respectively.

  • The maximal horizontal distances, $M(L)=\mathrm{MHD}(L)$ and $M(L)=\overline{\mathrm{MHD}}(L)$, between the line of perfect

Maximal widths

The convex shape of the Lorenz set L implies that its maximal widths, the maximal distances between its Lorenz-curve boundaries, are monotone geometric measures of this set. Specifically, in this section we consider the three following geometric measures [68]:

  • The maximal vertical width M(L)=MVW(L)—the maximal vertical distance between the Lorenz curves y=L(x) and y=L̄(x).

  • The maximal horizontal width M(L)=MHW(L)—the maximal horizontal distance between the Lorenz curves y=L(x) and y=L̄(x).

  • The

Hill curves

In Section 9 we introduced the vertical and horizontal maximal widths of Lorenz sets, and established them as inequality indices. In fact, any vertical and horizontal width of the Lorenz sets, rather than only the maximal vertical and horizontal widths, is an inequality index. The deficiency of these inequality indices, though, is that they fail to uniquely characterize the socioeconomic extreme of absolute monarchy. In this section we rectify the deficiency of the vertical-width and the

Rich–poor disparity

The Lorenz curves quantify the distribution of wealth within given societies. Often, beyond the overall distribution of wealth, interest is focused on the disparity between the rich and the poor. The vertical-width and horizontal-width hill curves, presented in the previous section, provided two socioeconomic quantifications of the rich–poor gap. In this section we present a framework–analogous to the Lorenz-based framework–for quantifying the rich–poor disparity. In what follows we introduce

Universal ordering

So far we presented quite a few inequality indices. Do these inequality indices obey certain relationships? Well, excluding the Amato index $I_{\mathrm{Circ}}(L)$, the answer is affirmative. Indeed, the following ordering of the inequality indices holds universally [88]: $I_{\mathrm{Pdiam}}(L)\le I_{\mathrm{Bdist}}(L)\le I_{\mathrm{Area}}(L)\le I_{\mathrm{Vdiam}}(L),\,I_{\mathrm{Hdiam}}(L)$. Interestingly, the popular Gini index $I_{\mathrm{Area}}(L)$ and the socioeconomically meaningful Pietra index $I_{\mathrm{Bdist}}(L)$ are bounded, from below and from above, by the diameter indices. We note that, in

Statistical bedrock

To gain insights into the statistical meanings of the various inequality indices addressed above, we now turn to examine their stochastic representations. To that end let $E[\cdot]$ denote the operation of mathematical expectation, $S$ denote a randomly sampled element from the dataset of sizes $S$, and $Q(u)$ ($0\le u\le 1$) denote the quantile function of the random variable $S$.
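The notation above can be illustrated with a short sketch. It assumes that the random element is drawn uniformly from the dataset and that the quantile function is the left-continuous generalized inverse of the empirical distribution function; this is one common convention, and the paper's own definition is not shown in this excerpt. The names `expectation` and `quantile` are hypothetical.

```python
import math

def expectation(sizes):
    """E[S] for an element sampled uniformly from the dataset (assumption)."""
    return sum(sizes) / len(sizes)

def quantile(sizes, u):
    """Empirical quantile Q(u): smallest size whose cumulative frequency reaches u."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    ascending = sorted(sizes)
    n = len(ascending)
    k = max(1, math.ceil(u * n))   # rank of the quantile, clamped so that Q(0) is the minimum
    return ascending[k - 1]

data = [4, 1, 2, 1]                # hypothetical dataset
n = len(data)
# Q is a step function with n steps of width 1/n, so averaging it at the
# step midpoints reproduces the integral of Q over [0, 1], i.e. the mean.
midpoint_average = sum(quantile(data, (k + 0.5) / n) for k in range(n)) / n
print(expectation(data), midpoint_average)
```

The midpoint check reflects the general identity that the expectation of a random variable equals the integral of its quantile function over the unit interval.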

In terms of these notations Eq. (2) implies that the

Geometric bedrock

Our starting point, both in the introduction and in Section 2, was the mainstream statistics tool: the standard deviation σ. Thereafter, following a sociogeometric approach, we arrived at a whole set of inequality indices. Based on the stochastic representations of Section 13.1, and excluding the stand-alone Amato index, in this section we discuss the profoundly different ‘bedrock geometries’ underlying the standard deviation and the inequality indices. We begin with the Manhattan and

Inequality, entropy and perplexity

As noted throughout the manuscript, the sociogeometric frameworks of Lorenz sets and disparity sets are invariant with respect to linear transformations of the underlying dataset of sizes $S$. Thus, we can always consider the dataset’s sizes to be ‘normalized’, i.e. sum up to one: $s_1+s_2+\cdots+s_n=1$. This implies that $s=(s_1,s_2,\ldots,s_n)$ is an $n$-dimensional probability vector. Consequently, the
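The normalization step described above can be sketched in a few lines. The snippet rescales a dataset so that its sizes sum to one and then evaluates the Shannon entropy of the resulting probability vector together with its perplexity, taken here as the exponential of the entropy, which is the standard definition. How the paper turns these quantities into inequality gauges is not shown in this excerpt, so the code stops at the raw quantities; all function names are hypothetical.

```python
import math

def normalize(sizes):
    """Rescale the sizes so that they sum to one (a probability vector)."""
    total = sum(sizes)
    return [s / total for s in sizes]

def shannon_entropy(p):
    """Shannon entropy in nats; zero entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

def perplexity(p):
    """Perplexity as the exponential of the Shannon entropy (standard definition)."""
    return math.exp(shannon_entropy(p))

equal = normalize([2, 2, 2, 2])    # all sizes identical
single = normalize([0, 0, 5])      # one size holds everything
print(perplexity(equal), perplexity(single))
```

For n identical sizes the perplexity is n up to floating-point rounding, and when a single size holds the entire total it is 1, so the perplexity can be read as an effective number of equally weighted sizes.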

Partitioning

In this section we shall address the topic of partitioning a dataset into a collection of disjoint subsets: $S=S_1\cup\cdots\cup S_c$, where $S$ is the dataset, and where $\{S_1,\ldots,S_c\}$ is the collection of its disjoint subsets. For example, the dataset $S$ compiles the wealths of all USA citizens, the partitioning is with regard to the different States, and the subset $S_i$ compiles the wealths of the citizens of State $i$ ($i=1,\ldots,c$, with $c=50$). The question we shall explore in this section is the following: can the

Summary

In this paper we presented a “harnessing inequality” methodology for measuring the statistical variability, i.e. the intrinsic randomness, of general datasets of sizes. The mainstream statistical approach measures the statistical variability of a given dataset S via its standard deviation σ—a gauge quantifying the Euclidean divergence of the dataset’s sizes from their mean. Alternatively, statistical physics and information theory measure the statistical variability of the given dataset S via

Take-home message

We conclude the paper with the following ‘take-home message’. When studying a dataset of sizes, no matter what discipline or field of science the dataset came from, and no matter what the sizes represent, do not restrict yourself to quantifying the dataset’s statistical variability by mainstream one-dimensional gauges such as the standard deviation or the Boltzmann–Gibbs–Shannon entropy. Rather, harness inequality to gain a fine and detailed view of the dataset’s statistical variability:

References (115)

  • C.E. Shannon et al.

    The Mathematical Theory of Communication

    (1971)
  • J.L. Gastwirth

    Econometrica

    (1971)
  • C. Gini

    Econ. J.

    (1921)
  • E.M. Hoover

    Rev. Econ. Stat.

    (1936)
  • B.V. Frosini

    Empir. Econ.

    (2012)
  • J.M. Sarabia et al.

    Physica A

    (2014)
  • A. Ghosh et al.

    Physica A

    (2014)
  • I. Eliazar

    Physica A

    (2016)
  • I. Eliazar et al.

    Physica A

    (2014)
  • I. Eliazar

    Physica A

    (2015)
  • S. Gorard

    Brit. J. Educ. Stud.

    (2005)
  • S. Yitzhaki et al.

    METRON

    (2013)
  • H. Theil

    Economics and Information Theory

    (1967)
  • C.D. Manning et al.

    Foundations of Statistical Natural Language Processing

    (1999)
  • K. Marx et al.

    The Communist Manifesto

    (2014)
  • K. Marx

    Das Kapital

    (1867)
    K. Marx

    Capital: A Critique of Political Economy, Vol. 1–3

    (1992–1993)
  • V. Pareto

    Cours d’économie politique

    (1896)
    V. Pareto

    Manual of Political Economy

    (2014)
  • J. Rawls

    A Theory of Justice

    (1971)
  • J.Y. Duclos et al.

    Poverty and Equity: Measurement, Policy and Estimation with DAD

    (2006)
  • B. Milanovic

    Worlds Apart: Measuring International and Global Inequality

    (2007)
  • C. Freeland

    Plutocrats: The Rise of the New Global Super-rich and the Fall of Everyone Else

    (2012)
  • B. Milanovic

    The Haves and the Have-nots: A Brief and Idiosyncratic History of Global Inequality

    (2012)
  • J. Sutter, Help wanted: must-reads on income inequality and the rich-poor gap, CNN, Aug. 19, 2013:...
  • J.E. Stiglitz

    The Price of Inequality: How Today’s Divided Society Endangers Our Future

    (2013)
  • A. Deaton

    The Great Escape: Health, Wealth, and the Origins of Inequality

    (2013)
  • B.K. Chakrabarti et al.

    Econophysics of Income and Wealth Distributions

    (2013)
  • P. Sen et al.

    Sociophysics: An Introduction

    (2014)
  • M. Buchanan

    Nat. Phys.

    (2014)
  • The science of inequality, a special issue of Science, 23 May...
  • A. Chatterjee
  • A.B. Atkinson

    Inequality: What Can Be Done?

    (2015)
  • F. Bourguignon

    The Globalization of Inequality

    (2015)
  • J.E. Stiglitz

    The Great Divide: Unequal Societies and What We Can Do About Them

    (2015)
  • T. Piketty

    The Economics of Inequality

    (2015)
  • C. Boix

    Political Order and Inequality

    (2015)
  • L. Leopold

    Runaway Inequality

    (2015)
  • B. Milanovic

    Global Inequality

    (2016)
  • P.H. Lindert

    Unequal Gains

    (2016)
  • P.B. Coulter

    Measuring Inequality: A Methodological Handbook

    (1989)
  • L. Hao et al.

    Assessing Inequality

    (2010)
  • F. Cowell

    Measuring Inequality

    (2011)
  • S. Yitzhaki et al.

    The Gini Methodology

    (2012)
  • I. Eliazar et al.

    Physica A

    (2012)
  • E.T. Jaynes

    Phys. Rev.

    (1957)