Modeling income distribution: An econophysics approach

This study aims to develop appropriate models for income distribution in Iran over the 2006–2018 period using the econophysics approach. For this purpose, three distributions, namely the Pareto, Lognormal, and generalized Gibbs-Boltzmann distributions, are analyzed with the data extracted from the target household income expansion plan of the statistical centers in Iran. The results indicate that the income distribution in Iran does not follow the Pareto and Lognormal distributions in most of the study years but follows the generalized Gibbs-Boltzmann distribution function in all study years. The generalized Gibbs-Boltzmann distribution also fits the actual income data better than both the Pareto and Lognormal distributions and can clearly explain the income distribution in Iran.


Introduction
An income distribution shows how the national income of a country is shared among its people. It provides an insight into the degree of inequality in the incomes of individuals within a country. [...] and high-income social classes to study the income distribution of EU members. The Fokker-Planck equation was utilized in this study to describe the income levels of different social classes through the statistical econophysics approach, in which the Gibbs-Boltzmann distribution, Pareto distribution, and Zipf distribution yielded the best outputs for the low-income, middle-income, and high-income social classes, respectively. Also, Wada and Scarfone [11] found that the relations are the basis and necessary conditions of physical behavior and showed some uses of the Kaniadakis distribution, with results consistent with our use of the distribution.
The income index and sustainability are interconnected in multiple ways. Income is a crucial factor that influences the environmental impact of economic activities and determines the ability of individuals, households, and businesses to adopt sustainable practices [12,13]. On the one hand, higher income levels can lead to increased consumption and waste generation, with a negative impact on the environment [14,15]. On the other hand, higher income levels also provide individuals and businesses with the resources needed to invest in sustainable technologies and practices, thereby reducing their environmental footprint [16,17]. Moreover, income inequality can directly affect sustainability efforts by creating social tensions, hindering access to education and healthcare, and limiting opportunities for economic growth and development, which in turn can impede the adoption of sustainable practices. Therefore, ensuring equitable access to income and promoting sustainable practices can work together towards achieving long-term environmental, social, and economic sustainability [18,19].
This study aims to develop a distribution function using the econophysics approach, which can show the income distribution of different classes of society.

Methods
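If $X \sim \mathrm{Gamma}(\alpha, 1)$, the probability density function (pdf) of the gamma distribution is [20]:

$$f(x) = \frac{x^{\alpha-1} e^{-x}}{\Gamma(\alpha)}, \quad x > 0. \tag{1}$$

The normalization constant for the new distribution function, obtained by the change of variable $x \to (x - x_0)/T$, is:

$$C = \frac{1}{T^{\alpha}\,\Gamma(\alpha)}.$$

Therefore, the pdf of the general Gibbs-Boltzmann distribution is defined as below:

$$f(x; \alpha, T, x_0) = \frac{(x - x_0)^{\alpha-1}}{T^{\alpha}\,\Gamma(\alpha)} \exp\left(-\frac{x - x_0}{T}\right), \quad x > x_0, \tag{2}$$

where $x_0$ and $T$ respectively are the lowest household income and average income, the latter analogous to temperature in the Boltzmann-Gibbs distribution [7,10,21], and $\alpha$ is the shape parameter. This density function covers the low-, medium-, and high-income social classes. For $\alpha = 1$ it reduces to the exponential Gibbs-Boltzmann law.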
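As a minimal sketch, the density can be evaluated and checked numerically. The code assumes that the generalized Gibbs-Boltzmann pdf is a gamma density shifted by the lowest income `x0` and scaled by the temperature `T`; the function names `ggb_pdf` and `integrate` and all parameter values are hypothetical and serve only as an illustration.

```python
import math


def ggb_pdf(x, alpha, T, x0):
    """Shifted, scaled gamma density (assumed form of the generalized
    Gibbs-Boltzmann pdf): shape alpha, temperature T, lowest income x0."""
    if x <= x0:
        return 0.0
    z = (x - x0) / T
    # Work in log space so that Gamma(alpha) cannot overflow for large alpha.
    log_f = (alpha - 1.0) * math.log(z) - z - math.log(T) - math.lgamma(alpha)
    return math.exp(log_f)


def integrate(f, a, b, n=20000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Hypothetical parameter values, chosen only for illustration.
alpha, T, x0 = 2.0, 10.0, 5.0
upper = x0 + 60.0 * T  # the tail beyond this point is negligible

total = integrate(lambda x: ggb_pdf(x, alpha, T, x0), x0, upper)
mean = integrate(lambda x: x * ggb_pdf(x, alpha, T, x0), x0, upper)
print("integral of pdf:", total)
print("mean income:", mean, "vs x0 + alpha*T =", x0 + alpha * T)
```

Under this assumed form the density integrates to one and the mean income equals `x0 + alpha * T`, so `T` coincides with the average income above `x0` only when `alpha = 1`.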
The Gibbs-Boltzmann distribution is one of the most widely used functions in statistical mechanics and physics. In recent years, it has also been applied to income distribution. The Gibbs-Boltzmann law with exponential distribution is defined for the medium- and high-income social classes in [10,22,23].
We use two further statistical distributions in this paper for comparison. Both are commonly utilized in income distribution studies to compare distribution functions [21]. The pdf of the Lognormal distribution function is defined below with parameters $\mu$ and $\sigma$:

$$f(x; \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0.$$

Having heavy tails, the Pareto distribution is a probability distribution that describes many physical, economic, and social phenomena. The Pareto law is valid only for the high-income social class [1]. This distribution is defined as below:

$$f(x; x_m, \alpha) = \frac{\alpha\, x_m^{\alpha}}{x^{\alpha+1}}, \quad x \ge x_m.$$

Here $x_m$, the scale parameter, is positive, and $\alpha$ (the Pareto index of inequality), the shape parameter, is also positive.
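The difference between the two comparison distributions is easiest to see in their tails. The following sketch implements the standard Lognormal and Pareto densities and their tail probabilities $P(X > x)$; the symbol `x_m` for the Pareto scale parameter and the parameter values are assumptions made for illustration.

```python
import math


def lognormal_pdf(x, mu, sigma):
    """Lognormal density with log-mean mu and log-standard deviation sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def pareto_pdf(x, x_m, alpha):
    """Pareto density with scale x_m > 0 and shape (Pareto index) alpha > 0."""
    if x < x_m:
        return 0.0
    return alpha * x_m ** alpha / x ** (alpha + 1.0)


def lognormal_tail(x, mu, sigma):
    """P(X > x) for the Lognormal law, via the complementary error function."""
    if x <= 0:
        return 1.0
    return 0.5 * math.erfc((math.log(x) - mu) / (sigma * math.sqrt(2.0)))


def pareto_tail(x, x_m, alpha):
    """P(X > x) for the Pareto law: a power law in x."""
    return 1.0 if x < x_m else (x_m / x) ** alpha


# Hypothetical parameters: the Pareto tail decays polynomially, so it
# eventually dominates the Lognormal tail at high incomes.
for x in (10.0, 100.0, 1000.0):
    print(x, pareto_tail(x, 1.0, 1.5), lognormal_tail(x, 0.0, 1.0))
```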
In statistical applications, the maximum likelihood estimation (MLE) method is a powerful technique used to estimate the parameters of a specific probability distribution function based on observed data. Its main objective is to find the values of the parameters that maximize the likelihood function, which represents the probability of observing the given data for different values of the parameters. MLE is widely used in various fields, such as economics, biology, and engineering.
The strength of MLE lies in its ability to produce consistent and asymptotically efficient estimates of the parameters of a model under standard regularity conditions. It is also useful for comparing different models and selecting the one that best describes the data. However, it assumes that the data are independent and identically distributed, which may not always be true in practice.
In parametric distribution functions, the observed data are supposed to be generated by a distribution function depending on a few unknown parameters [24][25][26]. MLE is the method used here to estimate those parameters.
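For the Lognormal and Pareto distributions, the maximum likelihood estimates have well-known closed forms, which the sketch below applies to simulated data. The helper names `fit_lognormal` and `fit_pareto`, the sample sizes, and the true parameter values are hypothetical; the paper's own data are not used.

```python
import math
import random


def fit_lognormal(data):
    """Closed-form MLE: mu is the mean of ln(x), sigma the (1/n) std of ln(x)."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma


def fit_pareto(data):
    """Closed-form MLE: x_m is the sample minimum, alpha = n / sum ln(x / x_m)."""
    x_m = min(data)
    alpha = len(data) / sum(math.log(x / x_m) for x in data)
    return x_m, alpha


rng = random.Random(0)  # fixed seed so the illustration is reproducible

# Simulated samples with known (hypothetical) parameters.
ln_sample = [rng.lognormvariate(1.0, 0.5) for _ in range(20000)]
pa_sample = [2.0 * rng.paretovariate(3.0) for _ in range(20000)]  # x_m = 2

mu_hat, sigma_hat = fit_lognormal(ln_sample)
xm_hat, alpha_hat = fit_pareto(pa_sample)
print("Lognormal MLE:", mu_hat, sigma_hat)
print("Pareto MLE:", xm_hat, alpha_hat)
```

With samples of this size the estimates should land close to the values used in the simulation, which illustrates the consistency of the estimator.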
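For the generalized Gibbs-Boltzmann distribution with shape parameter $\alpha$, temperature $T$, and lowest income $x_0$, the log-likelihood of a sample $x_1, \dots, x_n$ is

$$\ell(\alpha, T, x_0) = (\alpha - 1)\sum_{i=1}^{n} \ln(x_i - x_0) - \frac{1}{T}\sum_{i=1}^{n}(x_i - x_0) - n\alpha \ln T - n \ln\Gamma(\alpha).$$

Therefore, the following equations are employed to obtain $\hat{\alpha}$, $\hat{T}$, and $\hat{x}_0$:

$$\frac{\partial \ell}{\partial \alpha} = \sum_{i=1}^{n} \ln(x_i - x_0) - n \ln T - n\,\psi(\alpha) = 0,$$

$$\frac{\partial \ell}{\partial T} = \frac{1}{T^2}\sum_{i=1}^{n}(x_i - x_0) - \frac{n\alpha}{T} = 0,$$

and

$$\frac{\partial \ell}{\partial x_0} = -(\alpha - 1)\sum_{i=1}^{n}\frac{1}{x_i - x_0} + \frac{n}{T} = 0,$$

where $\psi$ denotes the digamma function.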
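The estimation step can be sketched numerically under two loudly stated assumptions: the generalized Gibbs-Boltzmann density is taken to be a gamma density shifted by `x0` and scaled by `T`, and `x0` is treated as known rather than estimated. With `x0` fixed, the likelihood equation for `T` gives `T = mean(x - x0) / alpha`, and substituting it into the equation for `alpha` leaves `ln(alpha) - psi(alpha) = ln(mean(y)) - mean(ln(y))` with `y = x - x0`, which is solved here by bisection. The names `digamma` and `fit_ggb` are hypothetical helpers, not the authors' implementation.

```python
import math
import random


def digamma(x):
    """Digamma function psi(x) for x > 0: recurrence plus asymptotic series."""
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series


def fit_ggb(data, x0):
    """MLE of (alpha, T) for the assumed shifted-gamma form, with x0 known."""
    y = [x - x0 for x in data]
    if min(y) <= 0.0:
        raise ValueError("x0 must lie strictly below every observation")
    n = len(y)
    mean_y = sum(y) / n
    s = math.log(mean_y) - sum(math.log(v) for v in y) / n  # s > 0 by Jensen
    lo, hi = 1e-6, 1e6  # bracket for alpha; ln(a) - psi(a) decreases in a
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.log(mid) - digamma(mid) > s:
            lo = mid  # left side too large, so alpha must increase
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    return alpha, mean_y / alpha


rng = random.Random(1)
true_alpha, true_T, x0 = 2.5, 10.0, 5.0  # hypothetical values
sample = [x0 + rng.gammavariate(true_alpha, true_T) for _ in range(20000)]
alpha_hat, T_hat = fit_ggb(sample, x0)
print("alpha_hat:", alpha_hat, "T_hat:", T_hat)
```

Estimating `x0` jointly is harder, because the likelihood of a shifted gamma law can become unbounded as `x0` approaches the sample minimum when `alpha < 1`; fixing `x0` sidesteps that issue in this illustration.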

Results
A widespread problem facing many societies is the unequal distribution of income and wealth, which is commonly referred to as the class gap. Inequality and the class gap have significant impacts on all aspects of individual and social life [27]. As discussed earlier, income distribution is a critical topic in economics, and economists have long debated which distribution function provides the best approximation for the empirical distribution of income in a country. This section analyzes various distribution functions in economics, starting with the Pareto and Lognormal distributions. We then adopt the econophysics approach to study the Gibbs-Boltzmann and generalized Gibbs-Boltzmann distributions. To evaluate the goodness of fit of these distributions, we conduct a chi-square goodness of fit test, which is briefly explained below.
Chi-square goodness of fit test: The chi-square test is a goodness of fit test that determines whether a set of statistical data follows a specific probability distribution, that is, how well the statistical model fits a set of observations [28,29].
To determine the goodness of fit of statistical data for a probability distribution, the observed frequency of each group or class is compared with the expected theoretical frequency obtained from the probability distribution. The chi-square test statistic is written as follows:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},$$

with $k - p - 1$ degrees of freedom, where $k$ is the number of classes or groups, $p$ is the number of estimated parameters, and $O_i$ and $E_i$ are respectively the observed and expected frequencies of class $i$. The null hypothesis of the chi-square test is defined as follows:

$H_0$: The statistical data follow the specified probability distribution.

If the calculated test statistic exceeds the critical value obtained from the chi-square table, then $H_0$ is rejected.
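The test procedure can be sketched as follows. Expected frequencies are obtained from a fitted cumulative distribution function over income classes, and the statistic is compared with a critical value supplied from the chi-square table. The class edges, observed counts, and the exponential cdf used here are hypothetical illustrations, not the paper's data.

```python
import math


def expected_counts(cdf, edges, n):
    """Expected class frequencies n * (F(b) - F(a)) for consecutive edges."""
    return [n * (cdf(b) - cdf(a)) for a, b in zip(edges[:-1], edges[1:])]


def chi_square_gof(observed, expected, n_params, critical_value):
    """Return (statistic, degrees of freedom, whether H0 is rejected)."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - n_params - 1
    return stat, dof, stat > critical_value


# Hypothetical example: exponential (Gibbs-Boltzmann) law with T = 20,
# four income classes, and 100 households.
T = 20.0
cdf = lambda x: 1.0 - math.exp(-x / T)
edges = [0.0, 10.0, 20.0, 40.0, math.inf]
observed = [42, 22, 21, 15]

expected = expected_counts(cdf, edges, n=sum(observed))
# One estimated parameter (T) gives 4 - 1 - 1 = 2 degrees of freedom;
# 5.991 is the 5% critical value of the chi-square table for 2 of them.
stat, dof, reject = chi_square_gof(observed, expected, n_params=1,
                                   critical_value=5.991)
print("chi-square:", stat, "dof:", dof, "reject H0:", reject)
```

In practice, classes with very small expected frequencies are usually merged before the statistic is computed, since the chi-square approximation is unreliable for them.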
According to the chi-square test results, $H_0$ was rejected for the Pareto distribution in all the study years. In other words, the income distribution of the whole country did not follow the Pareto distribution in any of the study years. For the Lognormal distribution, $H_0$ was rejected in all years except 2007 and 2011. Hence, the income distribution throughout Iran does not obey the Lognormal distribution in most study years. It can then be concluded that the Pareto and Lognormal distributions are not good enough to show the income distribution of the whole country. Therefore, a more suitable distribution function is needed to express the income distribution of the whole country. For this purpose, the generalized Gibbs-Boltzmann distribution is analyzed in the next step.
According to Table 2, $H_0$ was not rejected for the generalized Gibbs-Boltzmann distribution in any of the study years. Therefore, it is concluded that income distribution across the country follows the generalized Gibbs-Boltzmann distribution, which can adequately explain the income distribution across the country. The generalized Gibbs-Boltzmann distribution is a very good fit to the actual data distribution, and it fits the actual income data better than the Pareto and the Lognormal distributions. These findings are consistent with the test results reported in this section.
Iran's income distribution was analyzed in this section. The chi-square goodness of fit test was employed to examine the goodness of fit of the studied distributions. The results show that the income distribution in Iran did not follow the Pareto and Lognormal distributions in most of the study years but followed the generalized Gibbs-Boltzmann distribution in all years. According to the results, the generalized Gibbs-Boltzmann distribution also fits the actual distribution of data very well, is able to properly explain the distribution of income in Iran, and fits the actual income data better than both the Pareto and Lognormal distributions.

Conclusions
This study analyzed the Pareto and Lognormal distributions, which are among the most well-known income distribution functions [30][31][32]. The distribution parameters were estimated by MLE, and the chi-square test was employed to evaluate the goodness of fit.
The results indicate that the income distribution across Iran followed neither the Pareto nor the Lognormal distribution in most of the study years (2006–2018). Therefore, it can be concluded that the Pareto and the Lognormal distributions are not good enough to explain the income distribution in Iran. For this reason, the study turned to the known distributions of econophysics.
The Gibbs-Boltzmann distribution function has been widely used in statistical mechanics and physics and has recently been applied to analyze income distribution. However, it had never been used before in Iran to model income distribution. Therefore, this study utilized the generalized Gibbs-Boltzmann distribution (2) to analyze income distribution in Iran. Based on the estimation of the generalized Gibbs-Boltzmann distribution parameters and the chi-square test, the income distribution in Iran was found to follow the generalized Gibbs-Boltzmann distribution. Moreover, the generalized Gibbs-Boltzmann distribution provides a better fit for the actual income data in Iran than the Pareto and Lognormal distributions. From a practical perspective, understanding the state of income distribution in society and having accurate information about the positions of individuals in different income groups can help governments and policymakers take the necessary actions to reduce social class gaps through new policies and mechanisms. Applying the Gini and Herfindahl-Hirschman indices to these data is an interesting direction for future research.