Universal patterns of inequality

Probability distributions of money, income, and energy consumption per capita are studied for ensembles of economic agents. The principle of entropy maximization for partitioning of a limited resource gives exponential distributions for the investigated variables. A non-equilibrium difference of money temperatures between different systems generates net fluxes of money and population. To describe income distribution, a stochastic process with additive and multiplicative components is introduced. The resultant distribution interpolates between exponential at the low end and power law at the high end, in agreement with the empirical data for the USA. We show that the increase of income inequality in the USA originates primarily from the increase of the income fraction going to the upper tail, which now exceeds 20% of the total income. Analyzing the data from the World Resources Institute, we find that the distribution of energy consumption per capita around the world can be approximately described by the exponential function. Comparing the data for 1990, 2000, and 2005, we discuss the effect of globalization on the inequality of energy consumption.


1. Introduction
Two types of approaches are used in science to describe the natural world around us. One approach is suitable for systems with a small number of degrees of freedom, such as a harmonic oscillator, a pair of gravitating bodies, and a hydrogen atom. In this case, the goal is to formulate and solve dynamical equations of motion of the system, be it within Newtonian, relativistic, or quantum mechanics. This approach is widely used beyond physics to study dynamical systems in chemistry, biology, economics, etc. In the opposite limiting case, we deal with systems consisting of a very large number of degrees of freedom. In such cases, a statistical description is employed, and the systems are characterized by probability distributions. In principle, it should be possible to derive the statistical description from microscopic dynamics, but this is rarely feasible in practice. Thus, it is common to use general principles of probability theory to describe statistical systems, rather than to derive their properties from microscopic equations of motion. Statistical systems are common in physics, chemistry, biology, economics, etc.
Any probability distribution can be thought of as representing some sort of "inequality" among the constituent objects of the system, in the sense that the objects have different values of a given variable. Thus, a study of probability distributions is also a study of inequality developing in a system for statistical reasons. To be specific, let us consider an economic system with a large number of interacting agents. In the unrealistic case where all agents have exactly the same values of economic variables, the system can be treated as a single agent called the "representative agent." This approach is common in traditional economics, but, by construction, it precludes a study of inequality among the agents. However, social and economic inequality is ubiquitous in the real world, and its characterization and understanding are very important issues.
In this paper, we apply the well-developed methods of statistical physics to economics and society in order to gain insights into probability distributions and inequality in these systems. We consider three specific cases: the distributions of money, income, and global energy consumption. In all three cases, the common theme is entropy maximization for partitioning of a limited resource among multiple agents. Despite the difference in the nature of the considered variables, we find a common pattern of inequality in these cases. This approach can also be useful for studying other statistical systems beyond the three specific cases considered in this paper.
Applications of these ideas to money and income have been published in the literature before: see review [1]. To introduce these ideas and to make the paper self-contained, we briefly review the applications to money and income in section 2 and section 3. Section 3 also shows the latest available data on income distribution, for 2007, which have not been published before. In section 4, we present a quantitative study of the probability distribution of energy consumption per capita around the world. This is a new kind of study that, to the best of our knowledge, has not appeared before in the literature.

2.1. Entropy maximization for division of a limited resource
Let us consider a general mathematical problem of partitioning (dividing) a limited resource among a large number of agents. The solution of this problem is similar to the derivation of the Boltzmann-Gibbs distribution of energy in physics [2]. To be specific, let us apply it to the probability distribution of money in a closed economic system.
Following [3], let us consider a system consisting of $N$ economic agents. At any moment of time, each agent $i$ has a money balance $m_i$. Agents make pairwise economic transactions with each other. As a result of a transaction, the money $\Delta m$ is transferred from an agent $i$ to an agent $j$, so their money balances change as follows:

$$ m_i \to m_i' = m_i - \Delta m, \qquad m_j \to m_j' = m_j + \Delta m. \tag{1} $$

The total money of the two agents before and after the transaction remains the same,

$$ m_i + m_j = m_i' + m_j', \tag{2} $$

i.e., there is a local conservation law for money. It is implied that the agent $j$ delivers some goods or services to the agent $i$ in exchange for the money payment $\Delta m$. However, we do not keep track of what is delivered and only keep track of money balances. Goods, such as food, can be produced and consumed, so they are not conserved. Rule (1) for the transfer of money is analogous to the transfer of energy from one molecule to another in molecular collisions in a gas, and rule (2) is analogous to conservation of energy in such collisions. It is important to recognize that ordinary economic agents cannot "manufacture" money (even though they can produce and consume goods). The agents can only receive money from and give it to other economic agents. In a closed system, the local conservation law (2) implies the global conservation law for the total money $M = \sum_i m_i$ in the system. In the real economy, $M$ may change due to the issuance of money by the central government or central bank, but we will not consider these processes here. Another possible complication is debt, which may be considered as negative money. Here we consider a model where debt is not permitted, so all money balances are non-negative, $m_i \ge 0$.
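To make rules (1) and (2) concrete, here is a minimal Python sketch of a closed economy with random pairwise transactions. It assumes one of the simplest possible rules, a fixed payment $\Delta m = 1$ between randomly chosen agents that is skipped if the payer would go below zero, and is not necessarily the exact setup of [3]; the function name and all parameter values are hypothetical choices for illustration.

```python
import random
import statistics

def simulate_money_exchange(n_agents=1000, m0=10, dm=1, n_transactions=1_000_000, seed=1):
    """Closed economy with random pairwise transfers of a fixed amount dm.
    A transfer is skipped if the payer would end up with negative money (no debt)."""
    rng = random.Random(seed)
    money = [m0] * n_agents                 # equal initial balances, M = n_agents * m0
    for _ in range(n_transactions):
        i = rng.randrange(n_agents)         # payer
        j = rng.randrange(n_agents)         # payee
        if i == j or money[i] < dm:
            continue                        # no self-payment; balances stay non-negative
        money[i] -= dm                      # rule (1): m_i -> m_i - dm
        money[j] += dm                      #           m_j -> m_j + dm, so (2) holds
    return money

money = simulate_money_exchange()
print("total money M =", sum(money))                  # unchanged by every transaction
print("T = <m> =", statistics.mean(money))           # average money per agent
print("std of balances =", statistics.pstdev(money))  # comparable to <m> after relaxation
```

Because every accepted transaction conserves the money of the pair, the total stays at $N m_0$ throughout, while the spread of balances grows from zero until the standard deviation becomes comparable to the mean, a hallmark of an exponential distribution.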
After many transactions between different agents, we expect that a stationary probability distribution of money would develop in the system. It can be characterized as follows. Let us divide the money axis $m$ into intervals (bins) of a small width $m^*$ and label them with an integer variable $k$. Let $N_k$ be the number of agents with money balances between $m_k$ and $m_k + m^*$. Then, the probability to have a money balance in this interval is $P(m_k) = N_k/N$. We would like to find the stationary probability distribution of money $P(m)$, which is achieved in statistical equilibrium.
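The binning procedure can be sketched in a few lines of Python. The helper name and the sample balances are hypothetical; bins are taken as $[m_k, m_k + m^*)$ with $m_k = k\,m^*$, which is one natural reading of the definition above.

```python
from collections import Counter

def bin_probabilities(balances, m_star):
    """Return {m_k: P(m_k)}, where bin k covers [m_k, m_k + m_star) with m_k = k * m_star,
    and P(m_k) = N_k / N. Assumes non-negative balances, as in the model described above."""
    n = len(balances)
    occupation = Counter(int(m // m_star) for m in balances)   # N_k for each occupied bin k
    return {k * m_star: n_k / n for k, n_k in sorted(occupation.items())}

balances = [0.5, 1.2, 1.9, 3.0, 7.4]            # hypothetical balances of N = 5 agents
print(bin_probabilities(balances, m_star=2.0))  # {0.0: 0.6, 2.0: 0.2, 6.0: 0.2}
```

Empty bins are simply absent from the result, so $\sum_k P(m_k) = 1$ over the occupied bins.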
Because the total money $M$ in the system is conserved, the problem reduces to partitioning (division) of the limited resource $M$ among $N$ agents. One possibility is an equal division, where each agent gets the same share $M/N$ of the total money. However, such an equal partition would be extremely improbable. It is more reasonable to obtain the probability distribution of money from the principle of entropy maximization. Let us consider a certain set of occupation numbers $N_k$ of the money bins $m_k$. The multiplicity $\Omega$ is the number of different realizations of this configuration, i.e., the number of different placements of the agents into the bins preserving the same set of occupation numbers $N_k$. It is given by the combinatorial formula in terms of the factorials (notice that human agents, unlike particles in quantum physics, are distinguishable)

$$ \Omega = \frac{N!}{N_1!\,N_2!\,N_3!\cdots}. \tag{3} $$

The logarithm of multiplicity is called the entropy $S = \ln\Omega$. In the limit of large numbers, we can use the Stirling approximation for the factorials, $\ln n! \approx n\ln n - n$, which gives

$$ S = \ln N! - \sum_k \ln N_k! \approx N\ln N - \sum_k N_k \ln N_k. \tag{4} $$

In statistical equilibrium, the entropy $S$ is maximized with respect to the numbers $N_k$ under the constraints that the total number of agents $N = \sum_k N_k$ and the total money $M = \sum_k m_k N_k$ are fixed. To solve this problem, we introduce the Lagrange multipliers $\alpha$ and $\beta$ and construct the modified entropy

$$ \tilde S = S + \alpha\Big(\sum_k N_k - N\Big) - \beta\Big(\sum_k m_k N_k - M\Big). \tag{5} $$

Maximization of entropy is achieved by setting the derivatives $\partial\tilde S/\partial N_k$ to zero for each $N_k$. Substituting (4) into (5) and taking the derivatives (notice that $N = \sum_k N_k$ in (4) should also be differentiated with respect to $N_k$, so that $\partial S/\partial N_k = \ln(N/N_k)$), we find that the equilibrium probability distribution of money $P(m)$ is an exponential function of $m$:

$$ P(m_k) = \frac{N_k}{N} = e^{-(m_k-\mu)/T}. \tag{6} $$

Here the parameters $T = 1/\beta$ and $\mu = \alpha/\beta = \alpha T$ are the analogs of temperature and chemical potential for money. Their values are determined by the constraints. Replacing the sums over bins by integrals, $\sum_k \to \int_0^\infty dm/m^*$, we obtain

$$ \sum_k P(m_k) = 1 \;\Rightarrow\; e^{\mu/T} = \frac{m^*}{T}, \quad\text{i.e.}\quad \mu = -T\ln\frac{T}{m^*}, \tag{7} $$

$$ \sum_k m_k P(m_k) = \frac{M}{N} \;\Rightarrow\; T = \frac{M}{N} = \langle m\rangle. \tag{8} $$

We see that the money temperature $T = \langle m\rangle$ (8) is nothing but the average amount of money per agent. The chemical potential $\mu$ (7) is a decreasing function of $T$ (for $T \gg m^*$). Equation (6) shows quite generally that division of a conserved limited resource using the principle of entropy maximization results in the exponential probability distribution of this resource among the agents. In physics, the "limited resource" is the energy $E$ divided among $N$ molecules of a gas, and the result is the Boltzmann-Gibbs distribution of energy [2]. The exponential distribution of money (6) was proposed in [3], albeit without explicit discussion of the chemical potential, as well as in [4]. Various models for kinetic exchange of money are reviewed in [1] and in the popular article [5]. The applicability of the underlying assumptions of money conservation and random exchange of money is discussed in [1] and [6]. The analogy between energy and money is mentioned in some physics textbooks [7], but not developed in detail.
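A quick numerical illustration of why entropy maximization favors the exponential distribution: the sketch below evaluates $\ln\Omega$ from (3) exactly (via `math.lgamma`, without the Stirling approximation) for three hypothetical configurations with the same $N$ and the same average money $T = M/N$. The bin width $m^* = 1$ and the "uniform" comparison configuration are arbitrary assumptions chosen for illustration; comparing three candidates is not a proof of maximality.

```python
import math

def log_multiplicity(occupations):
    """ln Omega = ln N! - sum_k ln N_k!, eq. (3), evaluated exactly with lgamma(n + 1) = ln n!
    (lgamma also accepts the non-integer N_k used below)."""
    occupations = list(occupations)
    n_total = sum(occupations)
    return math.lgamma(n_total + 1) - sum(math.lgamma(n_k + 1) for n_k in occupations)

N, T = 10_000, 10            # hypothetical: 10,000 agents, M/N = 10, bin width m* = 1
q = T / (T + 1)              # geometric (discrete exponential) law (1 - q) q^k has mean q/(1 - q) = T

configs = {                  # {m_k: N_k}, all with the same N and the same M/N
    "equal division": {10: N},
    "uniform": {k: N / 21 for k in range(21)},
    "exponential": {k: N * (1 - q) * q**k for k in range(400)},
}
for name, occ in configs.items():
    mean = sum(m_k * n_k for m_k, n_k in occ.items()) / sum(occ.values())
    print(f"{name:15s} <m> = {mean:5.2f}   ln Omega = {log_multiplicity(occ.values()):9.1f}")
```

The equal division has $\ln\Omega = 0$ (a single realization), while the exponential configuration has the largest multiplicity of the three, consistent with (6).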

2.2. Flow of money and people between two countries with different temperatures
To illustrate some consequences of the statistical mechanics of money, let us consider two systems with different money temperatures $T_1 > T_2$. These can be two countries with different average amounts of money per capita: the "rich" country with $T_1$ and the "poor" one with $T_2$. Suppose a limited flow of money and agents is permitted between the two systems. Given that the variation $\delta\tilde S$ vanishes due to maximization under constraints, we conclude from (5) that the entropy of each system changes as

$$ \delta S = \beta\,\delta M - \alpha\,\delta N = \frac{\delta M - \mu\,\delta N}{T}. \tag{9} $$

If $\delta M$ and $\delta N$ denote the flow of money and agents from system 1 to system 2, then the change of the total entropy of the two systems is

$$ \delta S = \delta S_1 + \delta S_2 = \left(\frac{1}{T_2} - \frac{1}{T_1}\right)\delta M + \left(\frac{\mu_1}{T_1} - \frac{\mu_2}{T_2}\right)\delta N. \tag{10} $$

According to the second law of thermodynamics, the total entropy should be increasing, so $\delta S \ge 0$. Then, the first term in (10) shows that money should be flowing from the high-temperature system (rich country) to the low-temperature system (poor country). This is called the trade deficit: a systematic net flow of money from one country to another, which is best exemplified by the trade between the USA and China. The second term in (10) shows that the agents would be flowing from the system with higher $\mu/T$ to the system with lower $\mu/T$, i.e., from high to low chemical potential. Because $\mu$ (7) decreases with $T$, this corresponds to immigration from a poor to a rich country. Both trade deficit and immigration are widespread global phenomena. The direction of these processes can also be understood from (8). The two systems are trying to equilibrate their money temperatures $T = M/N$, which can be achieved either by changing the numerators due to money flow or the denominators due to people flow.
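The signs of the two terms in (10) can be checked numerically. This sketch assumes the equilibrium entropy of a system with the exponential distribution, $S = N\ln(T/m^*) + N$ with $T = M/N$ (the continuum limit of (4) evaluated with (6) to (8)), which satisfies $\partial S/\partial M = 1/T$ and $\partial S/\partial N = -\mu/T$. The country sizes, money totals, and $m^* = 1$ are hypothetical.

```python
import math

def entropy(n_agents, money, m_star=1.0):
    """Equilibrium entropy S = N ln(T/m*) + N with T = M/N (exponential money distribution).
    It satisfies dS/dM = 1/T and dS/dN = -mu/T, with mu = -T ln(T/m*) from (7)."""
    temperature = money / n_agents
    return n_agents * math.log(temperature / m_star) + n_agents

def total_entropy_change(N1, M1, N2, M2, dM=0.0, dN=0.0):
    """Finite change of S1 + S2 when money dM and agents dN move from system 1 to system 2."""
    before = entropy(N1, M1) + entropy(N2, M2)
    after = entropy(N1 - dN, M1 - dM) + entropy(N2 + dN, M2 + dM)
    return after - before

# Hypothetical "rich" country 1 (T1 = 50) and "poor" country 2 (T2 = 10), with m* = 1.
N1, M1, N2, M2 = 1000, 50_000, 1000, 10_000
T1, T2 = M1 / N1, M2 / N2
mu1, mu2 = -T1 * math.log(T1), -T2 * math.log(T2)

print(total_entropy_change(N1, M1, N2, M2, dM=1))   # money from rich to poor: positive
print(total_entropy_change(N1, M1, N2, M2, dN=1))   # people from rich to poor: negative
print(1 / T2 - 1 / T1, mu1 / T1 - mu2 / T2)         # coefficients of dM and dN in (10)
```

The finite differences agree with the coefficients of (10) up to second-order corrections of order $1/N$, so entropy grows when money moves from the rich to the poor country and when people move from the poor to the rich one.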

2.3. Thermodynamics of money and wealth
Thermal physics has two counterparts: statistical mechanics and thermodynamics. Statistical mechanics of money was outlined in section 2.1. Is it possible to construct an analog of thermodynamics for money? Many attempts were made in the literature, but none was completely successful: see reviews [5] and [8].
One of the important concepts in thermodynamics is the distinction between heat and work. In statistical physics, this distinction can be microscopically interpreted as follows [2]. The internal energy of the system is $U = \sum_k \varepsilon_k N_k$, where $\varepsilon_k$ is an energy level, and $N_k$ is the occupation number of this level. Suppose the energy levels $\varepsilon_k(\lambda)$ depend on some external parameters $\lambda$, such as the volume of a box in quantum mechanics, an external magnetic field acting on spins, etc. Then, the variation of $U$ contains two terms: $\delta U = \sum_k (\delta\varepsilon_k) N_k + \sum_k \varepsilon_k\,(\delta N_k) = \delta W + \delta Q$. The first term has mechanical origin and comes from the variation $\delta\varepsilon_k = (\partial\varepsilon_k/\partial\lambda)\,\delta\lambda$ of the energy levels due to changes of the external parameters $\lambda$. This term is interpreted as the work $\delta W$ done on the system externally. The second term has statistical origin and comes from the changes $\delta N_k$ in the occupation numbers of the energy levels. This term is interpreted as the heat $\delta Q$.
An analog of this construction does not seem to exist for money $M = \sum_k m_k N_k$. A variation $\delta M = \sum_k m_k\,(\delta N_k)$ is possible due to changes in the occupation numbers, but there is no analog of the variation $\delta m_k$ of the "money levels" due to changes in some external parameters. Thus, we can only define the heat term, but not the work term in the money variation. Indeed, (9), rewritten as $\delta M = T\,\delta S + \mu\,\delta N$, is the analog of the first law of thermodynamics for money, but there is no term corresponding to work in this equation.
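The contrast between energy and money can be made concrete with a short Python sketch. The energy levels $\varepsilon_k = k^2/L^2$ of a particle in a one-dimensional box of size $L$ (in units where $h^2/8m = 1$), the occupation numbers, and the money levels are hypothetical inputs chosen only for illustration. For finite changes, $\Delta U$ differs from $\delta W + \delta Q$ by the second-order cross term $\sum_k \delta\varepsilon_k\,\delta N_k$.

```python
def decompose_variation(levels, new_levels, occupations, new_occupations):
    """Return (Delta U, dW, dQ) for U = sum_k e_k N_k, with
    dW = sum_k (delta e_k) N_k (levels shift) and dQ = sum_k e_k (delta N_k) (occupations change).
    For finite changes, Delta U - dW - dQ = sum_k (delta e_k)(delta N_k), a second-order term."""
    u_before = sum(e * n for e, n in zip(levels, occupations))
    u_after = sum(e * n for e, n in zip(new_levels, new_occupations))
    work = sum((e1 - e0) * n for e0, e1, n in zip(levels, new_levels, occupations))
    heat = sum(e * (n1 - n0) for e, n0, n1 in zip(levels, occupations, new_occupations))
    return u_after - u_before, work, heat

def box_levels(L, k_max=3):
    """Hypothetical levels e_k = k^2 / L^2 of a particle in a 1D box of size L (units h^2/8m = 1)."""
    return [k**2 / L**2 for k in range(1, k_max + 1)]

occ_before, occ_after = [50, 30, 20], [51, 29, 20]   # hypothetical occupation numbers N_k

# Energy: the levels depend on the external parameter L, so both terms appear.
dU, dW, dQ = decompose_variation(box_levels(1.0), box_levels(1.001), occ_before, occ_after)
print(f"energy: dU = {dU:.4f}, work = {dW:.4f}, heat = {dQ:.4f}")

# Money: the levels m_k = k m* (here m* = 1) depend on no external parameter, so work is zero.
money_levels = [0.0, 1.0, 2.0]
dM, dW_money, dQ_money = decompose_variation(money_levels, money_levels, occ_before, occ_after)
print(f"money:  dM = {dM:.4f}, work = {dW_money:.4f}, heat = {dQ_money:.4f}")
```

For money, any change $\delta M$ is entirely of the "heat" type, because the money levels themselves never move.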
Nevertheless, statistical mechanics of money can be extended to a form somewhat resembling conventional thermodynamics, if we take into account the material property of the agents. Let us define the wealth $w_i$ of an agent $i$ as a sum of two terms. One term represents the money balance $m_i$, and the other term the material property, such as a house, a car, stocks, etc. For simplicity, let us consider only one type of property, so that the agent has $v_i$ physical units of this property. In order to determine the monetary value of this property, we need to know the price $P$ per unit. Then, the wealth of the agent is $w_i = m_i + P v_i$. Correspondingly, the total wealth $W$ in the system is (from now on, we use the letter $W$ to denote wealth, not work)

$$ W = \sum_i w_i = M + P V, \tag{11} $$

where $V = \sum_i v_i$ is the total "volume" of the property in the system. If money $M$ is analogous to the internal energy $U$ in statistical physics, then wealth $W$ is analogous to the enthalpy $H$. The wealth $W$ includes not only the money $M$, but also the money equivalent necessary to acquire the volume $V$ of property at the price $P$ per unit. Let us consider the differential of wealth

$$ dW = dM + P\,dV + V\,dP = V\,dP. \tag{12} $$

Here the first two terms cancel out, and only the last term remains. Indeed, when the volume $dV > 0$ of property is acquired, the money $dM = -P\,dV < 0$ is paid for the property, i.e., money is exchanged for property. Equation (12) is also valid at the level of individual agents, $dw_i = v_i\,dP$. These equations show that wealth changes only when the price $P$ changes.
To advance the analogy with thermodynamics, let us consider a closed cycle in the $(V, P)$ plane illustrated in figure 1. This cycle can be interpreted as a model of stock market speculation, in which case $V$ is the volume of stock held by a speculator. Starting from the lower left corner, the speculator purchases the stock at the low price $P_2$ and increases the owned volume from $V_1$ to $V_2$. Then, the price increases from $P_2$ to $P_1$. At this point, the speculator sells the stock at the high price $P_1$, reducing the owned volume from $V_2$ to $V_1$. Then, the price of the stock drops to the level $P_2$, and the cycle can be repeated. From (12), we find that the wealth change of the speculator is $\Delta W = \oint V\,dP$, which is the area $(P_1 - P_2)(V_2 - V_1)$ enclosed by the cycle in figure 1. From (11), we also find that $\Delta W = \Delta M$, because $P$ and $V$ return to their initial values at the end of the cycle. Thus, the monetary profit $\Delta M$ is given by the area enclosed by the cycle. This money is extracted by the speculator from the other players in the market, so the conservation law of money is not violated. In an ideal economic equilibrium, there should be no price changes allowing one to make systematic profits, which is known as the "no-arbitrage theorem". However, in the real market, significant rises and falls of stock prices do happen, especially during speculative bubbles.

Figure 1. A closed cycle of speculation or trading. $V$ and $P$ represent the volume and price of goods.
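The bookkeeping of the cycle in figure 1 can be traced step by step in a small Python sketch. The function name and the numerical values of $V_1, V_2, P_1, P_2$ and the initial money are hypothetical. It checks that $\Delta M$, $\Delta W$ from (11), and the contour integral $\oint V\,dP$ all equal the enclosed area $(P_1 - P_2)(V_2 - V_1)$.

```python
def speculation_cycle(V1, V2, P1, P2, money0):
    """Follow the money M and holdings V of a speculator around the rectangular cycle of figure 1.
    Returns (Delta M, Delta W, contour integral of V dP)."""
    money, volume, price = money0, V1, P2
    wealth_start = money + price * volume      # eq. (11) for a single agent

    # Leg 1: buy V2 - V1 units at the low price P2 (dP = 0, so wealth is unchanged).
    money -= P2 * (V2 - V1)
    volume = V2
    # Leg 2: the price rises P2 -> P1 at fixed volume V2; V dP contributes V2 (P1 - P2).
    contour = V2 * (P1 - P2)
    price = P1
    # Leg 3: sell V2 - V1 units at the high price P1.
    money += P1 * (V2 - V1)
    volume = V1
    # Leg 4: the price falls P1 -> P2 at fixed volume V1; V dP contributes V1 (P2 - P1).
    contour += V1 * (P2 - P1)
    price = P2

    wealth_end = money + price * volume
    return money - money0, wealth_end - wealth_start, contour

print(speculation_cycle(V1=10, V2=30, P1=5.0, P2=2.0, money0=100.0))  # (60.0, 60.0, 60.0)
```

All three quantities equal $(5 - 2)(30 - 10) = 60$ in these units: the profit is earned only on the legs where the price changes, as (12) implies.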
The cycle in figure 1 also illustrates the trade between China and the USA. Suppose a trade company pays money $M_2$ to buy the volume $V_2 - V_1$ of the products manufactured in China at the low price $P_2$. After shipping across the Pacific Ocean, the products are sold in the USA at the high price $P_1$, and the company receives money $M_1$. Empty ships return to China, and the cycle repeats. As shown in [3], the price level $P$ is generally proportional to the money temperature $T$. Thus, the profit rate in this cycle is

$$ \frac{M_1 - M_2}{M_1} = \frac{P_1 - P_2}{P_1} = \frac{T_1 - T_2}{T_1}. \tag{13} $$

By analogy with physics, one can prove that (13) gives the highest possible profit rate for the given temperatures $T_1$ and $T_2$. Indeed, from (9) with $\delta N = 0$, we find that the entropy of the USA (system 1) decreases by $\Delta S_1 = M_1/T_1$ when the money $M_1$ is paid out, whereas the entropy of China (system 2) increases by $\Delta S_2 = M_2/T_2$ when the money $M_2$ is received. Under the most ideal circumstances, the total entropy of the whole system remains constant, so $\Delta S_1 = \Delta S_2$. Then, $M_1/M_2 = T_1/T_2$, and (13) follows. If the total entropy increases, as is generally the case, then $M_2/T_2 > M_1/T_1$, and the profit rate is lower than (13). Here we assumed that the profit money $M_1 - M_2$ has low (ideally zero) entropy, because this money is concentrated in the hands of just one agent or trading company and is not dispersed among many agents of the systems.
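The Carnot-like bound can be checked with a few lines of Python, using $\Delta S = \Delta M/T$ from (9) with $\delta N = 0$. The temperatures and the amount $M_2$ are hypothetical; the sketch shows that the reversible choice $M_1 = M_2 T_1/T_2$ gives the profit rate $1 - T_2/T_1$ with zero total entropy change, and that any larger $M_1$ would require the total entropy to decrease.

```python
def trade_cycle(M2, T1, T2, M1):
    """Profit rate (M1 - M2)/M1 and total entropy change M2/T2 - M1/T1 of one trade cycle,
    using dS = dM/T from (9) with dN = 0 (system 1 pays M1, system 2 receives M2)."""
    profit_rate = (M1 - M2) / M1
    entropy_change = M2 / T2 - M1 / T1
    return profit_rate, entropy_change

T1, T2 = 50.0, 10.0        # hypothetical money temperatures of the USA (1) and China (2)
M2 = 1_000.0               # hypothetical money paid for the goods in China
M1_ideal = M2 * T1 / T2    # reversible cycle: Delta S1 = Delta S2

print(trade_cycle(M2, T1, T2, M1_ideal))        # profit rate 1 - T2/T1, zero entropy change
print(trade_cycle(M2, T1, T2, 1.2 * M1_ideal))  # a higher profit would need Delta S < 0
```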
Thermal machines have cycles analogous to figure 1, and equation (13) is similar to the Carnot formula for the highest possible efficiency [2,7]. The China-USA trade cycle resembles an internal combustion engine, where the purchase of goods from China mimics fuel intake, and the sale of goods in the USA mimics expulsion of exhaust. The net result is that goods are manufactured in China and consumed in the USA. The analogy between trade cycles and thermal machines was highlighted by Mimkes [9,10]. Although somewhat similar to [10], our presentation emphasizes the conceptual distinction between money and wealth and explicitly connects statistical mechanics and thermodynamics.
Empirical data on the international trade network between different countries were analyzed in several papers. Serrano et al. [11] analyzed trade imbalances, defined as the difference between exports and imports from one country to another. The paper classified countries as net consumers and net producers of goods. The typical examples are the USA and China, respectively, as illustrated in Figure 2 of [11] for 2000, in qualitative agreement with our discussion above. In contrast, Bhattacharya et al. [12] studied trade volumes, defined as the sum of exports and imports from one country to another. The paper found that the trade volume $s$ of a country is proportional to the gross domestic product (GDP) of the country: $s \propto (\mathrm{GDP})^{\gamma}$ with the exponent $\gamma \approx 1$. This means that the trade volume and GDP are extensive variables in the language of thermodynamics, so the biggest volumes of trade are between the countries with the biggest GDPs. In thermal equilibrium, money flows between two countries in both directions as payment for traded goods, but the money fluxes in the opposite directions are equal, so there is trade volume but no trade imbalance. Trade imbalance may develop when the two systems have different values of intensive parameters, such as the money temperature. Then, the direction of net money flow is determined by the sign of the temperature difference.
Of course, there may be other reasons and mechanisms for trade imbalance besides the temperature difference. Normally, the flow of money from the high- to the low-temperature system should reduce the temperature difference and eventually bring the systems to equilibrium. Indeed, in global trade, many formerly low-temperature countries have increased their temperatures as a result of such trade. However, the situation with China is special, because the Chinese government redirects the flow of dollars back to the USA by buying treasury bills from the US government. As a result, the temperature difference remains approximately constant and does not show signs of equilibration. The net result is that China supplies vast amounts of products to the USA in exchange for debt obligations from the US government. The long-term global consequences of this process remain to be seen.

2.4. The circuit of money and the circuit of goods
Section 2.3 illustrates that there are two circuits in a well-developed market economy [10]. One is the circuit of money, which consists of money payments between the agents for goods and services. As argued in section 2.1, money is conserved in these transactions and, thus, can be modeled as the flow of a liquid, e.g., blood in the vascular system. (A hydraulic device, the MONIAC, was actually used by William Phillips, the inventor of the famous Phillips curve, to illustrate money flow in the economy [5].) The second circuit is the flow of goods and services between the agents. This circuit involves manufacturing, distribution, and consumption. The goods and services are inherently not conserved. They represent the material (physical) side of the economy and, arguably, are the ultimate goal for the well-being of a society. In contrast, money represents the informational, virtual side of the economy, because money cannot be physically consumed. Nevertheless, money does play a very important role in the economy by enabling its efficient functioning and by guiding resource allocation in a society. (Here we consider the modern fiat money, declared to be money by the central bank or government. We do not touch upon the origin of money in early history as a special kind of goods.) The two circuits interact with each other when goods and services are traded (exchanged) for money. However, money cannot be physically transformed into goods and vice versa. To illustrate this point, we draw an analogy with fermions and bosons in physics. While the "circuits" of fermions and bosons interact and transfer energy between each other, it is not possible to convert a fermion into a boson and vice versa.
The important consequence of this consideration is that an increase of material production in the circuit of goods and services does not have any direct effect on the amount of money in the monetary circuit. The amount of money in the system depends primarily on the monetary policy of the central bank or government, which have a monopoly on issuing money. Technological progress in material production does not produce any automatic increase of money in the system. Thus, the expectation of continuous monetary growth, where the agents would be getting more and more money as a result of technological progress, is false. It is not possible for all businesses to operate with a profit on average, i.e., to have a greater total amount of money at the end of a cycle than at the beginning. The agents can get more money on average only if the government decides to print money, i.e., to increase the money temperature $T = M/N$. (For a discussion of the issues related to debt, see the review [1].) Thus, monetary growth of the economy is directly related to deficit spending by the central government. On the other hand, it is entirely possible to have technological progress and an increase in the physical standards of living without monetary growth. The monetary and physical circuits of the economy interact with each other, but they are separate circuits. Unfortunately, this distinction is often blurred in the econophysics and economics literature [10], as well as in public perception.
3. Two-class structure of income distribution

3.1. Introduction
Although the exponential probability distribution of money (6) was proposed 10 years ago [3,4], no direct statistical data on money distribution are available to verify this conjecture. Normally, people do not report their money balances to statistical agencies. Given that most people keep their money in banks, the distribution of balances on bank accounts could give a reasonable approximation of the probability distribution of money. However, these data are privately held by banks and are not publicly available.
On the other hand, a lot of statistical data are available on income distribution, because people report income to government tax agencies. To some extent, income distribution can also be viewed as a problem of partitioning a limited resource, in this case the total annual budget. Following section 2.1, we expect to find the exponential distribution for income. Drăgulescu and Yakovenko [13] studied the data on income distribution in the USA from the Internal Revenue Service (IRS) and from the US Census Bureau. They found that income distribution is indeed exponential for incomes below 120 k$ per year. However, in subsequent papers [14,15], they also found that the upper tail of income distribution follows a power law, as was first pointed out by Pareto [16]. So, the data analysis of income distribution in the USA reveals the coexistence of two social classes. The lower class (about 97% of the population) is characterized by the exponential Boltzmann-Gibbs distribution, and the upper class (the top 3% of the population) has the power-law Pareto distribution. Time evolution of the income classes in 1983-2001 was studied by Silva and Yakovenko [17]. They found that the exponential distribution in the lower class is very stable in time, whereas the power-law distribution of the upper class is highly dynamical and volatile. They concluded that the lower class is in thermal equilibrium, whereas the upper class is out of equilibrium.
Many other papers investigated income distributions in different countries: see the review [1] for references. The coexistence of two classes appears to be a universal feature of income distribution. In this section, we present a unified description of the two classes within a single mathematical model.

3.2. Income dynamics as a combination of additive and multiplicative stochastic processes
The two-class structure of income distribution can be rationalized on the basis of a kinetic approach. Suppose the income $r$ of an agent behaves like a stochastic variable. Let $P(r, t)$ denote the probability distribution of $r$ at time $t$. Let us consider a diffusion model, where the income $r$ changes by $\Delta r$ over a time period $\Delta t$. Then, the temporal evolution of $P(r, t)$ is described by the Fokker-Planck equation [18]

$$ \frac{\partial P}{\partial t} = \frac{\partial}{\partial r}\left[A(r)\,P + \frac{\partial}{\partial r}\big(B(r)\,P\big)\right]. \tag{14} $$

The coefficients $A(r)$ and $B(r)$ are the drift and the diffusion terms, which are determined by the first and second moments of the income changes $\Delta r$ per unit time

$$ A(r) = -\frac{\langle \Delta r\rangle}{\Delta t}, \qquad B(r) = \frac{\langle (\Delta r)^2\rangle}{2\,\Delta t}. \tag{15} $$

The stationary solution $P_s(r)$ of (14) satisfies $\partial_t P_s = 0$; thus, assuming zero probability flux, we obtain

$$ \frac{d}{dr}\big(B\,P_s\big) + A\,P_s = 0. \tag{16} $$

The general solution of (16) is

$$ P_s(r) = \frac{c}{B(r)}\,\exp\left(-\int^{r} \frac{A(r')}{B(r')}\,dr'\right), \tag{17} $$

where $c$ is a normalization factor, such that $\int_0^\infty P_s(r)\,dr = 1$.

In the lower class, the income comes from wages and salaries, so it is reasonable to assume that income changes are independent of income itself, i.e., $\Delta r$ is independent of $r$. This process is called the additive diffusion [17]. In this case, the coefficients in (14) are some constants $A_0$ and $B_0$. Then (17) gives the exponential distribution

$$ P_s(r) = c\,e^{-r/T}, \qquad T = B_0/A_0. \tag{18} $$

On the other hand, the upper-class income comes from bonuses, investments, and capital gains, which are calculated in percentages. Therefore, for the upper class, it is reasonable to expect that $\Delta r \propto r$, i.e., income changes are proportional to income itself. This is known as the proportionality principle of Gibrat [19], and the process is called the multiplicative diffusion [17]. In this case, $A = ar$ and $B = br^2$, and (17) gives a power-law distribution

$$ P_s(r) = \frac{c}{r^{\alpha+1}}, \qquad \alpha = 1 + a/b. \tag{19} $$

The multiplicative hypothesis for the upper class income was quantitatively verified in [20] for Japan, where tax identification data are officially published for the top taxpayers.
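A minimal numerical sketch of the general solution (17): given the drift and diffusion coefficients as Python functions, it integrates $A/B$ with the trapezoidal rule and reproduces the two limiting cases (18) and (19). The coefficient values and grids are hypothetical, chosen only to make the check easy to read.

```python
import math

def stationary_density(A, B, r_grid, c=1.0):
    """Unnormalized P_s(r) = c / B(r) * exp(-int_{r_grid[0]}^r A/B dr'), eq. (17),
    with the integral evaluated by the trapezoidal rule on the given grid."""
    integral = 0.0
    values = [c / B(r_grid[0])]
    for r_prev, r_next in zip(r_grid, r_grid[1:]):
        h = r_next - r_prev
        integral += 0.5 * h * (A(r_prev) / B(r_prev) + A(r_next) / B(r_next))
        values.append(c / B(r_next) * math.exp(-integral))
    return values

# Additive diffusion: constant A0, B0 gives an exponential with T = B0 / A0, eq. (18).
A0, B0 = 2.0, 10.0                          # hypothetical coefficients, T = 5
grid = [0.01 * i for i in range(2001)]      # r from 0 to 20
P_add = stationary_density(lambda r: A0, lambda r: B0, grid)
print(P_add[1000] / P_add[0], math.exp(-10.0 / (B0 / A0)))   # r = 10: both e^{-2}

# Multiplicative diffusion: A = a r, B = b r^2 gives a power law with alpha = 1 + a/b, eq. (19).
a, b = 1.0, 0.5                             # hypothetical coefficients
alpha = 1 + a / b                           # alpha = 3
grid_m = [1.0 + 0.01 * i for i in range(901)]   # r from 1 to 10 (avoids r = 0, where B = 0)
P_mul = stationary_density(lambda r: a * r, lambda r: b * r * r, grid_m)
print(P_mul[100] / P_mul[0], 2.0 ** -(alpha + 1))            # r = 2 vs r = 1: both 2^{-(alpha+1)}
```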
The additive and multiplicative processes may coexist. For example, an employee may receive a cost-of-living raise calculated in percentages (the multiplicative process) and a merit raise calculated in dollars (the additive process). Assuming that these processes are uncorrelated, we find that $A = A_0 + ar$ and $B = B_0 + br^2 = b(r_0^2 + r^2)$, where $r_0^2 = B_0/b$. Substituting these expressions into (17), we find

$$ P_s(r) = c\,\frac{e^{-(r_0/T)\arctan(r/r_0)}}{\left[1 + (r/r_0)^2\right]^{(\alpha+1)/2}}. \tag{20} $$

The distribution (20) interpolates between the exponential law for low $r$ and the power law for high $r$, because either the additive or the multiplicative process dominates in the corresponding limit. A crossover between the two regimes takes place at $r \sim r_0$, where the additive and multiplicative contributions to $B$ are equal. The distribution (20) has three parameters: the temperature $T = B_0/A_0$, the Pareto exponent $\alpha = 1 + a/b$, and the crossover income $r_0$. It is a minimal model that captures the salient features of the two-class income distribution. A formula similar to (20) was also derived by Fiaschi and Marsili [21] for a microscopic economic model, which is effectively described by (14).
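The sketch below evaluates (20) up to normalization and checks numerically the two limits stated above: near $r = 0$ the logarithmic slope $d\ln P_s/dr$ is close to $-1/T$, and far above $r_0$ the log-log slope approaches $-(\alpha+1)$. The parameter values are hypothetical, with $T$ setting the income unit.

```python
import math

def p_two_class(r, T, alpha, r0):
    """Unnormalized density (20): exponential for r << r0, power law r^-(alpha+1) for r >> r0."""
    x = r / r0
    return math.exp(-(r0 / T) * math.atan(x)) / (1.0 + x * x) ** ((alpha + 1) / 2)

T, alpha, r0 = 1.0, 2.0, 10.0    # hypothetical parameters, incomes in units of T

# Low end: the local slope d ln P / dr should be close to -1/T.
h = 1e-4
low_slope = (math.log(p_two_class(0.5 + h, T, alpha, r0))
             - math.log(p_two_class(0.5, T, alpha, r0))) / h

# High end: the log-log slope d ln P / d ln r should approach -(alpha + 1).
r1, r2 = 1e4, 2e4
high_slope = (math.log(p_two_class(r2, T, alpha, r0))
              - math.log(p_two_class(r1, T, alpha, r0))) / math.log(r2 / r1)
print(low_slope, high_slope)     # roughly -1/T and -(alpha + 1)
```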

3.3. Comparison with the personal income data from IRS
In this section, we compare (20) with the personal income data from IRS. For this purpose, it is convenient to use the cumulative distribution function (CDF), which is the integral $C(r) = \int_r^\infty P(r')\,dr'$ of the probability density. For the probability density (20), $C(r)$ is not available in analytical form; therefore, it has to be calculated by integrating $P_s(r)$ numerically. We use the theoretical CDF $C_t(r)$ to fit the empirical CDF $C_e(r)$ calculated from the IRS data.

Table 1. Parameters of the distribution (20), obtained by fitting the annual income data from IRS. $r^*$ is the income separating the upper and lower classes. $f$ is the fraction of income going to the upper class, given by (21). $G$ is the Gini coefficient.
Determining the best values of the three fitting parameters in the theoretical CDF is a computationally challenging task. Thus, we do it step by step. For each year, we first determine the values of $T$ and $\alpha$ by fitting the low-income part of $C_e(r)$ with an exponential function and the high-income part with a power law. Then, keeping these two parameters fixed, we determine the best value of $r_0$ by minimizing the mean-square deviation $\sum_n \ln^2[C_t(r_n)/C_e(r_n)]$ between the theoretical and empirical functions, where the sum is taken over all income levels $r_n$ for which empirical data are available. Table 1 shows the values of the fit parameters obtained for different years. The data points for the empirical CDF and their fits with the theoretical CDF are shown in figure 2 in the log-log scale versus the normalized annual income $r/T$. For clarity, the curves are shifted vertically for successive years. Clearly, the theoretical curves agree well with the empirical data, so the minimal model (20) indeed captures the salient features of income distribution in the USA.
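A minimal sketch of the second fitting step (determining $r_0$ with $T$ and $\alpha$ held fixed) on synthetic data. Since the IRS data are not reproduced here, the "empirical" CDF is generated from (20) itself with a hypothetical $r_0 = 8$ and a small deterministic perturbation. The logarithmic integration grid and the simple grid search over candidate $r_0$ values are assumptions standing in for whatever numerical procedure was actually used.

```python
import math

def p_two_class(r, T, alpha, r0):
    """Unnormalized density (20)."""
    x = r / r0
    return math.exp(-(r0 / T) * math.atan(x)) / (1.0 + x * x) ** ((alpha + 1) / 2)

def log_grid(r_min=1e-6, r_max=1e6, n=3001):
    step = math.log(r_max / r_min) / (n - 1)
    return [r_min * math.exp(step * i) for i in range(n)]

def theoretical_cdf(grid, T, alpha, r0):
    """C_t(r) = int_r^inf P dr' / int_0^inf P dr', trapezoidal rule in u = ln r (dr = r du);
    contributions below grid[0] and above grid[-1] are neglected."""
    f = [p_two_class(r, T, alpha, r0) * r for r in grid]
    du = math.log(grid[1] / grid[0])
    tail = [0.0] * len(grid)
    for i in range(len(grid) - 2, -1, -1):          # accumulate from the top end downward
        tail[i] = tail[i + 1] + 0.5 * du * (f[i] + f[i + 1])
    return [t / tail[0] for t in tail]

def fit_r0(grid, data_idx, C_emp, T, alpha, candidates):
    """Grid search for r0 minimizing sum_n ln^2[C_t(r_n)/C_e(r_n)], with T and alpha fixed."""
    def cost(r0):
        C_t = theoretical_cdf(grid, T, alpha, r0)
        return sum(math.log(C_t[i] / c_e) ** 2 for i, c_e in zip(data_idx, C_emp))
    return min(candidates, key=cost)

grid = log_grid()
T, alpha, true_r0 = 1.0, 2.0, 8.0           # hypothetical "true" parameters (income in units of T)
data_idx = [i for i, r in enumerate(grid) if 0.1 <= r <= 1000][::50]   # hypothetical income levels
C_true = theoretical_cdf(grid, T, alpha, true_r0)
# Synthetic "empirical" CDF: the model with a deterministic +-2% wiggle standing in for noise.
C_emp = [C_true[i] * (1 + 0.02 * math.sin(7 * k)) for k, i in enumerate(data_idx)]

candidates = [0.5 * k for k in range(4, 41)]   # r0 = 2.0, 2.5, ..., 20.0
print(fit_r0(grid, data_idx, C_emp, T, alpha, candidates))
```

Minimizing the squared logarithm of the ratio, rather than the plain difference, gives comparable weight to the low-income region, where $C$ is of order one, and to the tail, where $C$ is orders of magnitude smaller.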
In previous papers [15,17], fits of the income distribution data were made only to the exponential (18) and power-law (19) functions. The income $r^*$, where the two fits intersect, can be considered as a boundary between the two classes. The values of $r^*$ are shown in table 1. We observe that the boundary $r^*$ between the upper and lower classes is approximately 3.5 times greater than the temperature $T$. Given that the CDF of the lower class is exponential, we find that the upper class population is approximately $\exp(-r^*/T) = \exp(-3.5) \approx 3\%$, which indeed agrees with our observations.
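The 3% estimate is a one-line calculation, assuming that the lower-class CDF is exactly the exponential (18) normalized over the whole population:

$$ C(r^*) = \int_{r^*}^{\infty} \frac{e^{-r/T}}{T}\,dr = e^{-r^*/T}, \qquad r^* \approx 3.5\,T \;\Rightarrow\; C(r^*) \approx e^{-3.5} \approx 0.030. $$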

3.4. The fraction of income in the upper tail and speculative bubbles
Let us examine the power-law tail in more detail. Although the tail contains a small fraction of the population, it accounts for a significant fraction $f$ of the total income in the system. The upper-tail income fraction can be calculated as

$$ f = \frac{R - N_e T}{R} \approx \frac{\langle r\rangle - T}{\langle r\rangle} = 1 - \frac{T}{\langle r\rangle}. \tag{21} $$

Here $R$ is the total income, $N$ is the total number of people, and $\langle r\rangle = R/N$ is the average income for the whole system. In addition, $N_e$ is the number of people in the exponential part of the distribution, and $T$ is the average income of these people. Since the fraction of people in the upper tail is very small, we use the approximation $N_e \approx N$ in deriving the formula (21). [...] large jumps. In contrast, the fraction $f$ going to the upper class shows large variations and now exceeds 20% of the total income in the system. The maxima of $f$ are achieved at the peaks of speculative bubbles, first at the end of the ".com" bubble in 2000 and then at the end of the subprime mortgage bubble in 2007. After the bubbles collapse, the fraction $f$ drops precipitously. We conclude that the upper tail is highly dynamical and out of equilibrium. The tail swells considerably during the bubbles, whereas the effect of the bubbles on the lower class is only moderate. As a result, income inequality increases during bubbles and decreases when the bubbles collapse.
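Formula (21) amounts to simple arithmetic; the sketch below compares the exact form with the approximation $N_e \approx N$. All numbers are hypothetical and are not taken from table 1.

```python
def upper_tail_fraction(total_income, T, n_exponential):
    """Eq. (21): f = (R - N_e T) / R, the share of the total income R not accounted for
    by the N_e people in the exponential part, whose average income is T."""
    return (total_income - n_exponential * T) / total_income

# Hypothetical numbers: N = 1000 people with average income <r> = 62.5,
# N_e = 970 of them in the exponential part with average income T = 50 (same units).
N, mean_income, T, N_e = 1000, 62.5, 50.0, 970
R = N * mean_income

f_exact = upper_tail_fraction(R, T, N_e)   # uses the actual N_e
f_approx = upper_tail_fraction(R, T, N)    # approximation N_e ~ N, i.e. f = 1 - T/<r>
print(f"f (exact N_e) = {f_exact:.3f},  f (N_e ~ N) = {f_approx:.3f}")
```

Since $N_e \le N$, the approximate form gives a slightly lower value of $f$ than the exact one.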
In view of the argument about conservation of money presented in section 2.1, what is the source of money for the enormous increase of the upper-tail income during speculative bubbles? The stock market bubble of the late 1990s was actually predicted in the book [23], published in 1993. The prediction was based on population data showing that the demographic wave of aging baby boomers would be investing their retirement money massively in the stock market in the second half of the 1990s, which indeed happened. Stock prices rose as millions of boomers paid for the stocks of ".com" companies. When the demographic wave reached its peak around 2000 and the influx of money into the stock market started to saturate (at its highest level), the market crashed precipitously, and the population was left with worthless stocks. One can see an analogy with the cycle in figure 1. The net result of this bubble was a transfer of money from the lower to the upper class under the cover of "retirement investment".
As is now clear, the second bubble, in 2003-2007, was based on the enormous growth of debt due to the proliferation of subprime mortgages. As discussed in [3,1], debt can be considered negative money, because debt liabilities count with a negative sign toward the net worth of an individual. The conservation law (2) is still valid, but money balances $m_i$ can take negative values. So the first moment (the "center of mass") of the money distribution, $\langle m \rangle = M/N$, remains constant. However, now some agents can become super-rich, with very high positive money balances, at the expense of other agents plunging deeply into debt, with negative money balances. Thus, relaxing the boundary condition $m \ge 0$ undermines the stability of the Boltzmann-Gibbs distribution (6). This is what happened during the subprime mortgage bubble. The money flowing to the upper tail was coming from the growth of the total debt in the system. Eventually, the bubble collapsed when the debt reached a critical level. Now the government bailout effectively represents a transfer of debt from economic agents to the government. The overall result is that the income growth of the upper class in 2003-2007 came from the bailout money that the government is now printing. As emphasized in sections 2.1 and 2.4, the government and central bank are the ultimate sources of new money because of the government monopoly on fiat money.
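The claim that relaxing $m \ge 0$ destabilizes the Boltzmann-Gibbs distribution can be illustrated with a toy random-exchange simulation in the spirit of section 2.1. This is a sketch under stated assumptions. A fixed payment `dm` passes between randomly chosen agents. The function `simulate_exchange`, its parameters, and the debt limit of 30 are hypothetical choices. Total money is conserved in both runs, but allowing debt spreads the balances much more widely around the same mean.

```python
import random
import statistics


def simulate_exchange(n_agents, m_avg, debt_limit, steps, dm=1, seed=0):
    """Toy pairwise exchange that conserves the total money n_agents * m_avg.

    At each step a random payer i gives dm to a random payee j, unless the
    payment would take i's balance below -debt_limit (debt_limit = 0 means
    balances must stay non-negative)."""
    rng = random.Random(seed)
    m = [m_avg] * n_agents
    for _ in range(steps):
        i, j = rng.randrange(n_agents), rng.randrange(n_agents)
        if i != j and m[i] - dm >= -debt_limit:
            m[i] -= dm
            m[j] += dm
    return m


if __name__ == "__main__":
    for limit in (0, 30):
        m = simulate_exchange(200, 10, limit, 400_000)
        print(f"debt limit {limit}: mean={statistics.mean(m)}, "
              f"spread={statistics.pstdev(m):.1f}, min={min(m)}, max={max(m)}")
```

In both runs the mean balance stays at 10. With debt allowed, the richest agents end up far richer, and the poorest agents sit below zero.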
The discussion and the data presented in this section indicate that, by combining demographic data with the principle of money conservation, it may be possible to predict, to some degree, the macroeconomic behavior of the economy. In fact, the book [23] predicted in 1993 that "the next great depression will be from 2008 to 2023" (page 16). This is a stunning prediction 15 years in advance of the actual event. For an update, see the follow-up book [24].

The power-law exponent of the upper tail
Another parameter of the upper tail is the power-law exponent $\alpha$ in (19). Table 1 and panel (b) of figure 3 show the historical evolution of $\alpha$ from 1983 to 2007. We observe that $\alpha$ decreased from about 2 to about 1.3. A decrease of $\alpha$ means that the power-law tail is getting "fatter", i.e., the inequality of the income distribution increases. The system appears to be approaching dangerously close to the critical value $\alpha = 1$, where the total income in the tail, $\int_{r^*}^{\infty} r P(r)\,dr$, would formally diverge [25]. On top of the gradual decrease, $\alpha$ dipped down and back up sharply around 1987 and 2000. The dips in $\alpha$ represent sharp increases of income inequality due to the bubbles, followed by the crashes of the bubbles in 1987 and 2000 and the subsequent contractions of the upper tail. Thus, the behavior of the tail exponent $\alpha$ is qualitatively consistent with the behavior of the tail fraction $f$ discussed in section 3.4. A similar behavior was found for Japan [20], where $\alpha$ jumped sharply from 1.8 to 2.1 between 1991 and 1992 due to the crash of the Japanese market bubble.
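The divergence at $\alpha = 1$ can be checked directly. Assume a pure power law $P(r) = c\,r^{-(\alpha+1)}$ above $r^*$, as in (19), where the constant $c$ is an illustrative normalization. Then:

```latex
\int_{r^*}^{\infty} r\,P(r)\,dr
  = c\int_{r^*}^{\infty} r^{-\alpha}\,dr
  = \frac{c\,(r^*)^{1-\alpha}}{\alpha - 1}, \qquad \alpha > 1,
```

This grows without bound as $\alpha \to 1$ from above. For $\alpha \le 1$ the integral is infinite.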
During bubbles, the sharp decrease of α is clearly a dynamical process, which cannot be described adequately by stationary equations. On the other hand, between bubbles, in periods that economists may call "recession" or "depression", the market is quiet, and it may be possible to describe it using a stationary approach. Even during these times, the power-law tail does not disappear, but the exponent α takes a relatively high value. From panel (b) of figure 3, it appears that the upper limit for α is about 2. This limiting value is supported by other observations in the literature. Analysis of Japanese data [20] shows that α varies in the range from 1.8 to 2.2. Drăgulescu and Yakovenko [14] found α = 1.9 for the wealth distribution in the UK for 1996. Thus, we conjecture that α = 2 is a special value of the power-law exponent corresponding to a quiet, stationary market.
In order to understand what is special about $\alpha = 2$, let us examine the moments of the income change $\Delta r$. The first moment, $\langle \Delta r \rangle$, is always negative. This condition ensures that $A > 0$ in (15), so that (16) has a stationary solution. The condition $\langle \Delta r \rangle < 0$ indicates that, on average, everybody's income decreases due to the drift term, yet the whole income distribution remains stationary because of the diffusion term. In stochastic calculus, the first moment $\langle \Delta r \rangle$ and the second moment $\langle (\Delta r)^2 \rangle$ are of the same order in $\Delta t$, so they must be treated on an equal footing. Thus, instead of considering the changes in $r$, let us discuss how $r^2$ changes in time. Using (15), we find

$\langle \Delta(r^2) \rangle = \langle (r + \Delta r)^2 - r^2 \rangle = 2r\langle \Delta r \rangle + \langle (\Delta r)^2 \rangle = 2(-rA + B)\,\Delta t$.   (22)

For the additive stochastic process (18), we find from (22) that $\langle \Delta(r^2) \rangle > 0$ for $r < T$ and $\langle \Delta(r^2) \rangle < 0$ for $r > T$. These conditions indicate a stabilizing tendency of the income squares to move toward the average income $T$. Now, let us apply (22) to the multiplicative process (19). In this case, we find

$\langle \Delta(r^2) \rangle = 2(-a + b)\, r^2\, \Delta t$.   (23)
For $a = b$, (23) gives $\langle \Delta(r^2) \rangle = 0$ for all $r$. This condition can be taken as a criterion for the inherently stationary state of a power-law tail, because $r^2$ does not change (on average) for any $r$, in a scale-free manner. From (19), we observe that the condition $a = b$ corresponds to the value $\alpha = 1 + a/b = 2$, which is indeed the upper value of the power-law exponent observed for stationary, quiet markets:

$a = b, \qquad \alpha = 2$.   (24)

On the other hand, for $a < b$, we find $\langle \Delta(r^2) \rangle > 0$ and $\alpha < 2$. In this case, the income square increases on average, which correlates with the upper-tail expansion during boom times. Notice that the value $\alpha = 2$ in (24) is different from the value $\alpha = 1$ found for the models of random saving propensity and earthquakes in [5,6,26].
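The moment relations (22) and (23) can be checked with a quick Monte Carlo experiment. This sketch rests on several assumptions. It takes (15) to correspond to the Itô process $dr = -A\,dt + \sqrt{2B}\,dW$, with $A = -\langle\Delta r\rangle/\Delta t$ and $B = \langle(\Delta r)^2\rangle/(2\Delta t)$. It treats the additive process (18) as having constant $A$ and $B$ with $T = B/A$, and the multiplicative process (19) as having $A = a r$ and $B = b r^2$. The function names, step size, and sample sizes are hypothetical choices.

```python
import math
import random


def mean_r2_drift(r, A, B, dt=0.01, n=200_000, seed=0):
    """Monte Carlo estimate of <Δ(r²)>/Δt for one Euler-Maruyama step
    Δr = -A Δt + sqrt(2 B Δt) ξ with ξ ~ N(0, 1), all samples starting at r.

    Equation (22) predicts 2(-rA + B); the discrete step adds a small A²Δt bias.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * B * dt)
    total = 0.0
    for _ in range(n):
        dr = -A * dt + sigma * rng.gauss(0.0, 1.0)
        total += 2.0 * r * dr + dr * dr  # (r + Δr)² - r²
    return total / (n * dt)


def multiplicative(a, b, r):
    """Drift and diffusion coefficients of the multiplicative process: A = a r, B = b r²."""
    return a * r, b * r * r
```

For $a = b$ the estimate fluctuates around zero for any starting $r$, as (23) requires. For $a < b$ it is positive. With constant $A = B = 1$ (additive case, $T = 1$), it is positive below $r = T$ and negative above it.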

Lorenz plot and Gini coefficient for income inequality
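The standard way of representing income distribution in the economic literature is the Lorenz plot [27]. It is defined parametrically in terms of the two coordinates $x(r)$ and $y(r)$:

$x(r) = \int_0^r P(r')\,dr', \qquad y(r) = \dfrac{\int_0^r r' P(r')\,dr'}{\int_0^\infty r' P(r')\,dr'}$.   (25)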
Here $x(r)$ is the fraction of the population with incomes below $r$, and $y(r)$ is the total income of this population as a fraction of the total income in the system. When $r$ changes from 0 to ∞, the variables $x$ and $y$ change from 0 to 1, producing the Lorenz plot in the $(x, y)$ plane. The advantage of the Lorenz plot is that it emphasizes the data where most of the population is. In contrast, log-linear and log-log plots, such as figure 2, emphasize the upper tail, which corresponds to a small fraction of the population and where the data points are sparse. Another advantage of the Lorenz plot is that all available data are represented within a finite area of the $(x, y)$ plane, whereas in other plots the upper end of the data at $r \to \infty$ is inevitably truncated. For the exponential distribution $P(r) = \exp(-r/T)/T$, it was shown in [13] that the Lorenz curve is given by the formula $y = x + (1 - x)\ln(1 - x)$. Notice that this formula is independent of $T$. However, when the fat upper tail is present, this formula is modified as follows [15,17]:

$y = (1 - f)\,[x + (1 - x)\ln(1 - x)] + f\,\Theta(x - 1)$.   (26)

Here $\Theta(x - 1)$ is the step function, equal to 0 for $x < 1$ and 1 for $x = 1$. The jump at $x = 1$ arises because the fraction of the population in the upper tail is very small, while its share $f$ of the total income is substantial.
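For the exponential distribution $P(r) = e^{-r/T}/T$, both integrals in (25) can be evaluated in closed form. The result shows directly why the Lorenz curve does not depend on $T$:

```latex
\begin{aligned}
x(r) &= 1 - e^{-r/T}, \qquad
y(r) = \frac{\int_0^r r' e^{-r'/T}\,dr'}{\int_0^\infty r' e^{-r'/T}\,dr'}
      = 1 - \Bigl(1 + \frac{r}{T}\Bigr)e^{-r/T},\\
y &= 1 - \bigl(1 - \ln(1-x)\bigr)(1-x) = x + (1-x)\ln(1-x),
\end{aligned}
```

The last line uses the substitutions $e^{-r/T} = 1 - x$ and $r/T = -\ln(1-x)$.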
The data points in figure 4 show the Lorenz plots calculated from the IRS data for 1996 and 2007. The solid lines in figure 4 are the theoretical Lorenz curves (26) with the values of $f$ obtained from (21). The theoretical curves agree well with the data. The distance between the diagonal line and the Lorenz curve characterizes income inequality. We observe in figure 4 that income inequality increased between these two years. This increase came exclusively from the growth of the upper tail, which pushed down the Lorenz curve of the exponential income distribution in the lower class.
The standard way of characterizing inequality in the economic literature [27] is the Gini coefficient $0 \le G \le 1$, defined as twice the area between the diagonal line and the Lorenz curve. It was shown that $G = 1/2$ for the exponential distribution [13]. When the fraction $f$ going to the upper class on top of the exponential distribution is taken into account [17], this becomes

$G = \dfrac{1 + f}{2}$.   (27)

The values of $G$ deduced from the IRS data are given in table 1 and shown in panel (a) of figure 3 by the connected line, along with (27), shown by open circles. The increase of $G$ indicates that income inequality has been rising since 1983. The agreement between the empirical values of $G$ and formula (27) in figure 3 demonstrates that the increase in income inequality since the late 1990s comes from the growth of the upper tail relative to the lower class.
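Formula (27), $G = (1+f)/2$, follows from integrating the Lorenz curve $y = (1-f)[x + (1-x)\ln(1-x)] + f\,\Theta(x-1)$. The exponential part contributes an area $(1-f)/4$ under the curve, and the step at $x = 1$ has zero width, so $G = 1 - 2(1-f)/4$. This minimal sketch checks the result numerically with the trapezoid rule. The helper names are hypothetical.

```python
import math


def lorenz_with_tail(x, f):
    """Lorenz curve (26) on [0, 1]: (1 - f)[x + (1 - x) ln(1 - x)] + f Θ(x - 1).

    The step at x = 1 has zero width, so for computing areas we return the
    left limit 1 - f at x = 1."""
    if x >= 1.0:
        return 1.0 - f
    return (1.0 - f) * (x + (1.0 - x) * math.log1p(-x))


def gini_from_curve(curve, n=100_000):
    """G = 1 - 2 * (area under the Lorenz curve), trapezoid rule on [0, 1]."""
    h = 1.0 / n
    area = 0.5 * (curve(0.0) + curve(1.0))
    for k in range(1, n):
        area += curve(k * h)
    return 1.0 - 2.0 * area * h
```

For instance, `gini_from_curve(lambda x: lorenz_with_tail(x, 0.2))` comes out very close to 0.6, as (27) predicts for $f = 0.2$.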

Introduction
In the preceding sections, we studied monetary aspects of the economy and discussed probability distributions of money and income. We found that significant inequality of money and income distributions can develop for statistical reasons. Now we would like to discuss physical aspects of the economy. Since the beginning of the industrial revolution several centuries ago, rapid technological development of society has been based on the consumption of fossil fuels, such as coal, oil, and gas, accumulated in the Earth over hundreds of millions of years. The whole discipline of thermodynamics was developed in physics to deal with this exploitation. Now it is becoming exceedingly clear that these resources will be exhausted in the not-too-distant future. Moreover, consumption of fossil fuels releases CO₂ into the atmosphere and affects the global climate. These pressing global problems pose great technological and social challenges. As shown below, energy consumption per capita varies significantly around the world. This heterogeneity is a challenge and a complication for reaching a global consensus on how to deal with the energy problems. Thus, it is important to understand and quantitatively characterize the global inequality of energy consumption. In this section, we present such a study using the approach developed in the preceding sections of the paper.

Energy consumption distribution as division of a limited resource
Let us consider an ensemble of economic agents and characterize each agent $i$ by the energy consumption $\varepsilon_i$ per unit time. Note that here $\varepsilon_i$ denotes not energy but power, which is measured in kilowatts (kW). Similarly to section 2.1, we can discuss the probability distribution of energy consumption in the system and introduce the probability density $P(\varepsilon)$, such that $P(\varepsilon)\,d\varepsilon$ gives the probability to have energy consumption in the interval from $\varepsilon$ to $\varepsilon + d\varepsilon$. Energy production, based on extraction of fossil fuel from the Earth, is physically limited. So energy production per unit time is a limited resource, which is divided for consumption among the global population. As argued in section 2.1, it would be very improbable to divide this resource equally. More likely, this resource would be divided according to the entropy maximization principle. Following the same procedure as in section 2.1, with money $m$ replaced by energy consumption $\varepsilon$, we conclude that the probability distribution of $\varepsilon$ should follow the exponential law analogous to (6):

$P(\varepsilon) = \dfrac{e^{-\varepsilon/T}}{T}$.   (28)

Here the "temperature" $T$ is the average energy consumption per capita.

Now we would like to compare the theoretical conjecture (28) with the empirical data for energy consumption around the world. For this purpose, it is convenient to introduce the cumulative distribution function

$C(\varepsilon) = \int_\varepsilon^\infty P(\varepsilon')\,d\varepsilon'$.   (29)

Operationally, $C(\varepsilon)$ is the number of agents with energy consumption above $\varepsilon$ divided by the total number of agents in the system. If $P(\varepsilon)$ is an exponential function, then $C(\varepsilon)$ is also exponential. For (28), $C(\varepsilon) = e^{-\varepsilon/T}$.
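The entropy argument can be illustrated numerically. This sketch assumes that maximizing entropy amounts to treating every way of splitting a fixed total as equally likely (the microcanonical picture). Cutting the interval $[0, E]$ at $N - 1$ uniformly random points produces exactly such a uniform split. For large $N$ the share of a single agent then follows the exponential law (28), so $C(\varepsilon) \approx e^{-\varepsilon/T}$. The helpers `random_partition` and `tail_fraction` are hypothetical, and so is the total, chosen to give $T = 2.2$ per agent.

```python
import math
import random


def random_partition(total, n_agents, seed=0):
    """Split `total` among n_agents with every split equally likely.

    The gaps between n_agents - 1 sorted uniform cut points on [0, total]
    are uniformly distributed over all partitions of `total`."""
    rng = random.Random(seed)
    cuts = sorted(rng.uniform(0.0, total) for _ in range(n_agents - 1))
    edges = [0.0] + cuts + [total]
    return [b - a for a, b in zip(edges, edges[1:])]


def tail_fraction(shares, eps):
    """Empirical C(eps): fraction of agents whose share exceeds eps."""
    return sum(s > eps for s in shares) / len(shares)


if __name__ == "__main__":
    shares = random_partition(220_000.0, 100_000)  # hypothetical: T = 2.2 per agent
    T = sum(shares) / len(shares)
    for k in (0.5, 1, 2, 3):
        print(f"eps = {k}T: C = {tail_fraction(shares, k * T):.4f}, "
              f"exp(-eps/T) = {math.exp(-k):.4f}")
```

Equal division would put every agent at exactly $T$. The random split instead leaves about $e^{-1} \approx 37\%$ of the agents above the average.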

Empirical data analysis
We downloaded empirical data from the World Resources Institute (WRI) website [28]. The data on energy consumption are listed under the topic "Energy and Resources". We downloaded the variable "Total energy consumption" [29], which contains the annual energy consumption of various countries for the years 1990, 2000, and 2005 (only these years are available). Population data are listed under the topic "Population, Health and Human Well-being". We downloaded the variable "Total population, both sexes" [30], which contains the total population of various countries for the same years. From these two data files, we selected the countries for which both energy and population data are available. Then we constructed the cumulative probability distribution for $\varepsilon$. First, we sorted the countries in ascending order of their energy consumption per capita $\varepsilon_n$, so that $n = 1$ corresponds to the country with the lowest consumption and $n = L$ to the highest, where $L$ is the total number of countries. We denote the population of country $n$ as $N_n$. Then the cumulative probability for a given $\varepsilon_n$ is

$C_e(\varepsilon_n) = \dfrac{\sum_{k=n}^{L} N_k}{\sum_{k=1}^{L} N_k}$.

Effectively, this construction assigns the same energy consumption $\varepsilon_n$ to all $N_n$ residents of country $n$. Of course, this is a very crude approximation, but it is the best we can do in the absence of more detailed data. The empirically constructed function $C_e(\varepsilon_n)$ is shown in figure 5 in different colors for the years 1990, 2000, and 2005. Table 2 and figure 5 illustrate the great variation and inequality of energy consumption per capita around the world. Let us focus on the data for 2005. In the USA, $\varepsilon$ is about 5 times the global average; in China, $\varepsilon$ is close to the global average; and in India, $\varepsilon$ is about 1/4 of the global average. By construction, $C_e(\varepsilon_n)$ exhibits discontinuities at each $\varepsilon_n$ because of the approximation used in our procedure.
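The construction of $C_e(\varepsilon_n)$ can be sketched as follows. The sketch assumes per-country records of (name, per-capita consumption in kW, population). The record layout, the helper names, and the three-country data are hypothetical and are not WRI values. After sorting, the cumulative share for country $n$ sums the populations of countries $n$ through $L$, so the curve starts at 1 for the lowest consumer. The global temperature $T$ is total consumption divided by total population.

```python
import math


def empirical_cdf(countries):
    """Empirical C_e(eps_n) from per-country data.

    countries: list of (name, eps_kW_per_capita, population).
    Countries are sorted by eps; C_e(eps_n) is the share of the total population
    living in countries n, n+1, ..., L (every resident of country n is assigned eps_n).
    """
    rows = sorted(countries, key=lambda c: c[1])
    total = sum(pop for _, _, pop in rows)
    result, at_or_above = [], total
    for name, eps, pop in rows:
        result.append((name, eps, at_or_above / total))
        at_or_above -= pop
    return result


def global_temperature(countries):
    """T = total energy consumption / total population, in kW per capita."""
    energy = sum(eps * pop for _, eps, pop in countries)
    people = sum(pop for _, _, pop in countries)
    return energy / people


if __name__ == "__main__":
    # Hypothetical three-country example (not WRI data).
    data = [("C", 9.0, 20), ("A", 1.0, 50), ("B", 3.0, 30)]
    T = global_temperature(data)
    for name, eps, c_e in empirical_cdf(data):
        print(f"{name}: eps={eps} kW, C_e={c_e:.2f}, C_t={math.exp(-eps / T):.2f}")
```

The last column evaluates $C_t(\varepsilon) = e^{-\varepsilon/T}$ with $T$ fixed by the global average, which mirrors the no-fit comparison described below.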
Given the relatively small number of data points ($L = 135$) and the discontinuities of the plot, it is not practical to do a quantitative fit of the data. Nevertheless, the empirically constructed function $C_e(\varepsilon)$ can be compared with the theoretical function $C_t(\varepsilon) = \exp(-\varepsilon/T)$, which is shown by the solid line in figure 5. Here the temperature $T = 2.2$ kW is the average global energy consumption per capita, obtained by dividing the total energy consumption of all countries by their total population. This value is indicated by the arrow in figure 5. (For comparison, the physiological energy consumption at rest of a woman weighing 53 kg is 63 W [31].) The exponential function does not fit the data perfectly, but it captures the main features reasonably well, given the crudeness of the data. The agreement is remarkable, given that the solid line is not a fit but a plot of a function whose single parameter $T$ is fixed by the global average.
In order to make an additional visual comparison between the theory and the data, the functions $C_e(\varepsilon_n)$ and $C_t(\varepsilon)$ are plotted in figure 6 on a log-linear scale and in figure 7 on a log-log scale. In figure 6, we see that the empirical data points oscillate around the theoretical exponential function, shown by the straight line. The data jumps at high $\varepsilon$ are artificially magnified by the logarithmic scale. Figure 7 demonstrates that the empirical data points do not fall on a straight line on the log-log scale, so the energy consumption per capita is not described by a power law. Indeed, energy production and consumption are physically limited and have the characteristic average scale $T$, so a scale-free power-law distribution would not be expected here.
We have also constructed the plots for CO₂ emission per capita using the data from WRI [28]. They look essentially the same as the plots for energy consumption per capita, in agreement with findings by other authors [32], because most of the energy in the world is currently generated from fossil fuels. A smoother visualization can be achieved with the Lorenz plot for energy consumption per capita. As in (25), the empirical Lorenz curve is constructed parametrically:

$x(\varepsilon_n) = \dfrac{\sum_{k=1}^{n} N_k}{\sum_{k=1}^{L} N_k}, \qquad y(\varepsilon_n) = \dfrac{\sum_{k=1}^{n} \varepsilon_k N_k}{\sum_{k=1}^{L} \varepsilon_k N_k}$.

The effect of globalization on the inequality of energy consumption
The horizontal coordinate $x(\varepsilon_n)$ gives the fraction of the global population with energy consumption per capita up to $\varepsilon_n$. The vertical coordinate $y(\varepsilon_n)$ gives the total energy consumption of this population as a fraction of the global consumption. When $n$ runs from 1 to $L$, we obtain a set of points in the $(x, y)$ plane representing the Lorenz plot. The empirically constructed Lorenz plots for 1990, 2000, and 2005 are shown in figure 8 in different colors. By construction, the Lorenz plots are continuous, without jumps, although the slope (the derivative) of the $y(x)$ curve is discontinuous. Another advantage of the Lorenz plot is that it emphasizes the data where most of the population is, e.g., the range between 5% and 95% of the population sorted by energy consumption per capita.
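This is a minimal sketch of the empirical Lorenz construction and of the Gini coefficient, computed as twice the area between the diagonal and the piecewise-linear Lorenz curve. The input pairs (per-capita consumption, population) and the function names are hypothetical. Joining the points with straight lines gives the continuous curve with a discontinuous slope described above.

```python
def lorenz_points(countries):
    """Empirical Lorenz curve from (eps_per_capita, population) pairs.

    Sorts by eps ascending and returns [(x_0, y_0), ..., (x_L, y_L)] starting at
    (0, 0): x_n is the cumulative population share and y_n the cumulative share
    of total energy consumption."""
    rows = sorted(countries)
    pop_total = sum(pop for _, pop in rows)
    energy_total = sum(eps * pop for eps, pop in rows)
    points, cum_pop, cum_energy = [(0.0, 0.0)], 0, 0.0
    for eps, pop in rows:
        cum_pop += pop
        cum_energy += eps * pop
        points.append((cum_pop / pop_total, cum_energy / energy_total))
    return points


def gini(points):
    """Gini coefficient: 1 - 2 * (area under the piecewise-linear Lorenz curve)."""
    area = sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return 1.0 - 2.0 * area
```

Take hypothetical countries with per-capita consumption of 1, 3, and 9 kW and populations of 50, 30, and 20. For these, `gini(lorenz_points([(1.0, 50), (3.0, 30), (9.0, 20)]))` gives about 0.456. Equal per-capita consumption everywhere gives $G = 0$.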
The black solid line shows the theoretical Lorenz curve $y = x + (1 - x)\ln(1 - x)$ for the exponential distribution. We observe that, to a first approximation, the theoretical curve captures the data reasonably well, especially given that the curve has no fitting parameters at all. A closer examination reveals a systematic historical evolution of the empirical curves. From 1990 to 2005, the data points moved closer to the diagonal, which indicates that global inequality of energy consumption decreased. This is confirmed by the decrease of the calculated Gini coefficient $G$, which is listed in figure 8.
On the Lorenz plot for 1990, we notice a kink or a knee indicated by the arrow, where the slope of the curve changes appreciably. This point represents the boundary between developed and developing countries. Indeed, below this point we find Mexico, Brazil, China, and India, whereas above this point we find Britain, France, Australia, Russia, and USA. The conclusion is that the difference between developed and developing countries lies in the degree of energy consumption and utilization. This criterion provides a physical measure for such a distinction, as opposed to more ephemeral monetary measures, such as dollar income per capita.
Comparing the Lorenz plots for 2000 and 2005 with the plot for 1990, we observe that the kink is progressively smoothed out. This means that the gap in energy consumption per capita between developed and developing countries is shrinking. We attribute this result to the rapid globalization of the world economy in the last 20 years. Nevertheless, the distribution of energy consumption per capita around the world still remains highly unequal. We observe in figure 8 that the Lorenz plot has moved closer to the solid curve representing the exponential distribution. Based on the general arguments about partitioning of a limited resource, we expect that the result of a well-mixed globalized world economy would not be equal energy consumption but the exponential distribution. Thus, it is unlikely that the inequality of energy consumption will be eliminated in the foreseeable future.
It is generally known that energy consumption per capita and GDP per capita are positively correlated, and that energy consumption is the physical basis for economic prosperity [32]. Brown et al. [33] found a power-law relation ε ∝ (GDP/capita)^0.76 between these two variables by analyzing the data for different countries around the world (see figure 3A in [33]). The last three columns of Table 2 show the data for GDP per capita [34]. Although this variable is generally correlated with the energy consumption per capita, the monetary and physical measures are not always well aligned. The sustainable economics movement [35] has criticized the use of GDP as a measure of economic prosperity.
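A power-law relation such as ε ∝ (GDP/capita)^0.76 is typically estimated as the slope of a straight-line fit on log-log axes. This sketch uses ordinary least squares on the logarithms. It is a generic illustration and not necessarily the procedure used in [33]. The function `loglog_slope` and the synthetic numbers are hypothetical.

```python
import math


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of ln(y) against ln(x).

    If y ∝ x**beta exactly, the returned slope equals beta (up to rounding)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx
```

On synthetic data generated as `eps = 0.01 * gdp ** 0.76`, the function returns 0.76. With real country data, the scatter around the line is exactly where the monetary and physical measures disagree.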

Conclusions
In this paper, we study probability distributions of money, income, and energy consumption per capita for ensembles of economic agents. Following the principle of entropy maximization for partitioning of a limited resource among many agents, we find exponential distributions for the investigated variables. Using an analogy with thermodynamics, we discuss trade deficits and immigration between two countries with different money temperatures. Considering a cycle similar to that of a thermal engine, we discuss how a monetary profit can be extracted in the presence of non-equilibrium due to a temperature difference.
Then we study a Fokker-Planck equation for income diffusion with additive and multiplicative components. The resulting probability distribution of income interpolates between the exponential function (Boltzmann-Gibbs) at the low end and the power law (Pareto) at the high end. This function agrees well with the empirical income distribution data for the USA obtained from the Internal Revenue Service. While the exponential distribution in the lower class remains stable in time, the income fraction f going to the upper tail expands dramatically during speculative bubbles and shrinks when the bubbles burst. Overall, income inequality in the USA increased significantly from 1983 to 2007, so that f now exceeds 20% of the total income in the system. We also discuss why the Pareto exponent tends to a value of about α = 2 in the steady state, in the absence of bubbles.
Finally, we analyze the probability distribution of energy consumption per capita around the world using the data from the World Resources Institute. We find that the distribution is reasonably described by the exponential function with the average global consumption as the effective temperature. A closer examination finds a gap in energy consumption between developed and developing countries, which tends to shrink as time progresses. We attribute this effect to globalization of the world economy. The inequality of energy consumption decreased from 1990 to 2005, while the corresponding Lorenz plot moved closer to the exponential distribution.
In conclusion, we observe that statistical problems of different nature have a common mathematical description and exhibit similar, universal patterns of inequality. Thus, the statistical approach gives insight into the persistent and ubiquitous nature of inequality in the world around us. The approach presented here can also be applied to other statistical problems.