Decomposition Analysis of Earnings Inequality in Rural India: 2004-2012

We analyze the changes in earnings of paid workers (wage earners) in rural India from 2004/05 to 2011/12. Real earnings increased at all percentiles, and the percentage increase was larger at the lower end. Consequently, earnings inequality declined. Recentered Influence Function decompositions show that throughout the earnings distribution, except at the very top, both changes in 'worker characteristics' and in 'returns to these characteristics' increased earnings, with the latter having played a bigger role. Decompositions of inequality measures reveal that although the change in characteristics had an inequality increasing effect, chiefly attributable to increased education levels, inequality declined because workers at lower quantiles experienced greater improvements in returns to their characteristics than those at the top.


Introduction
In their discussion of India's economic growth, Kotwal et al (2011) point to the existence of two Indias: "One of educated managers and engineers who have been able to take advantage of the opportunities made available through globalization and the other-a huge mass of undereducated people who are making a living in low productivity jobs in the informal sector-the largest of which is still agriculture." This paper is about the second India that mainly resides in its rural parts. Agriculture, the mainstay of the rural economy, continues to employ the largest share of the Indian workforce, but its contribution to gross value added is much smaller. In 2011, the employment shares of agriculture, industry, and services were 49, 24 and 27 percent respectively, whereas their shares in Gross Value Added were 19, 33, and 48 percent respectively (GOI 2015). In addition, between 2004/05 and 2011/12, real Gross Domestic Product in these sectors grew at 4.2, 8.5 and 9.6 percent per annum, respectively, making agriculture the slowest growing sector of the economy (authors' calculations based on RBI 2015). Given these figures, the concern about whether high overall GDP growth has benefitted those at the bottom, and to what extent they have benefitted compared to those at the top, is even more pertinent for rural India. We therefore focus on rural India and examine how real earnings of paid workers (wage earners) evolved over the seven-year period between 2004/05 and 2011/12.
Several studies have documented that along with the high growth rates of GDP that have characterized the Indian economy since the 1980s, there has been an increase in inequality. 1 However, most of these studies have either focused on consumption expenditure (Sen and Himanshu 2004; Cain et al 2010; Motiram and Vakulabharanam 2013; Jayaraj and Subramanian 2015; Datt et al 2016), 2 or on earnings of paid workers in urban India (Kijima 2006; Azam 2012). Two notable exceptions are Hnatkovska and Lahiri (2013) and Jacoby and Dasgupta (2015). Hnatkovska and Lahiri (2013) focus on wage comparisons between rural and urban areas between 1983 and 2010. They find that urban agglomeration led to a massive increase in urban labor supply that in turn reduced the rural-urban wage gap. Unlike Hnatkovska and Lahiri (2013), we focus exclusively on rural India to provide a more detailed picture of the changes within this sector. Jacoby and Dasgupta (2015) adopt the Supply-Demand-Institutions (SDI) framework pioneered by Katz and Murphy (1992), and Bound and Johnson (1992), to decompose wage changes between 1993 and 2011 in both rural and urban India. We use a very different approach, namely, the Recentered Influence Function (RIF) Decomposition developed by Firpo, Fortin, and Lemieux (2009) to study earnings evolution in rural India.
1 A notable exception is Dutta (2005). For the period 1983-99, at the all-India level, she finds an increase in wage rate inequality among regular salaried workers, but a decrease among casual labor.
2 There are some advantages in looking at consumption expenditure instead of earnings (Goldberg and Pavcnik 2007). The former is a better measure of lifetime wellbeing and suffers from fewer reporting errors. In spite of this, we feel that it is important to juxtapose the two to get a complete picture. This is especially important as the two measures may exhibit different trends. Krueger and Perri (2006) document this for the US, and then develop a model to show how income inequality can affect consumption inequality.
3 Jacoby and Dasgupta (2015) decompose the change in an indirect measure of wage inequality, namely, the relative wages of educated and uneducated workers, into changes in employment shares of different demographic groups and changes in the industrial composition. In this paper, we focus on direct measures of inequality such as the Gini and the 90/10 percentile ratio, and decompose changes in these measures into changes in worker characteristics and changes in returns to these characteristics.
Our finding that the change in returns to characteristics is driving the decline in earnings inequality in rural India is a novel one. Moreover, we document changes not just at the mean but also at various quantiles. It is important to do so because several studies have found that earnings inequality is mainly concentrated at the upper end. For India, Azam (2012) and Kijima (2006) find this for urban wage earners, and Banerjee and Piketty (2005) find it for income tax payers. We use unconditional quantile regressions to account for the effects of workers' characteristics at different quantiles and thereby make inferences about their effects on earnings inequality. Finally, we use the RIF Decompositions to divide the overall change in earnings inequality into a composition effect (the component due to changes in the distribution of worker characteristics) and a structure effect (the component due to changes in returns to these characteristics).
We find that during the period from 2004/05 to 2011/12, real earnings among paid workers increased at all percentiles and the percentage increase was greater at lower percentiles. Consequently, earnings inequality declined in rural India. The RIF decompositions reveal that throughout the earnings distribution, except at the very top, both the composition effect and the structure effect increased earnings, with changes in the latter having played a bigger role. Decompositions of inequality measures reveal that in spite of the composition effect having had an inequality-increasing role, inequality fell because workers at lower quantiles experienced greater improvements in returns to their characteristics than those at the top. Higher levels of education among workers pushed earnings inequality up. At the same time, lower returns to higher education reduced inequality.
The rest of the paper is organized as follows. Section 2 discusses the methodology used to analyze the change in earnings. Section 3 describes the data and the analysis sample. Section 4 presents the results, and section 5 concludes.

Methodology
We briefly explain the RIF regression for unconditional quantiles, followed by the RIF decomposition technique. For a detailed exposition of this and other decomposition techniques, see Fortin et al. (2011).

Unconditional Quantile Regressions
Unconditional quantile regressions (UQR, Firpo et al. 2009) help us examine the marginal effects of covariates on the unconditional quantiles of an outcome variable. UQRs differ from the traditional quantile regressions (Koenker and Bassett 1978) in that the latter examine the marginal effects on the conditional quantiles. For instance, if we observe that the conditional quantile regression coefficients for college education increase as we move from the first to the ninth decile, we can say that having more people with a college education would increase earnings dispersion within a group of individuals having the same vector of covariate values. However, in order to claim that college education increases overall earnings dispersion (among all individuals irrespective of their covariates), we need to rely on unconditional quantile regressions. To understand UQRs we begin with the concept of an Influence Function (IF).
The IF of any distributional statistic represents the influence of an observation on that statistic.
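Specifically, let $Y$ denote earnings, and let $q_\tau$ denote the $\tau$th quantile of the unconditional earnings distribution. Then,

$$\mathrm{IF}(Y; q_\tau) = \frac{\tau - \mathbb{1}\{Y \le q_\tau\}}{f_Y(q_\tau)} \qquad \{1\}$$

where $\mathbb{1}\{\cdot\}$ is an indicator function, and $f_Y$ is the density of the marginal distribution of earnings. The RIF is obtained by adding back the statistic to the IF. Thus, the RIF for the $\tau$th quantile is given by:

$$\mathrm{RIF}(Y; q_\tau) = q_\tau + \frac{\tau - \mathbb{1}\{Y \le q_\tau\}}{f_Y(q_\tau)} \qquad \{2\}$$

Note that the expected value of the RIF is $q_\tau$ itself. The conditional expectation of the RIF modelled as a function of certain explanatory variables, $X$, gives us the UQR or RIF regression model:

$$E[\mathrm{RIF}(Y; q_\tau) \mid X] = m_\tau(X) \qquad \{3\}$$

In its simplest form,

$$E[\mathrm{RIF}(Y; q_\tau) \mid X] = X'\beta_\tau \qquad \{4\}$$

where $\beta_\tau$ represents the marginal effect of $X$ on the $\tau$th quantile. $\beta_\tau$ can be estimated by Ordinary Least Squares (OLS) wherein the dependent variable is replaced by the estimated RIF. The RIF is estimated by plugging the sample quantile, $\hat{q}_\tau$, and the empirical density, $\hat{f}_Y(\hat{q}_\tau)$, the latter estimated using kernel methods, in equation {2}.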
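The estimation steps described above can be sketched in a few lines of Python. This is an illustrative sketch on synthetic data, not the code used in this paper: the Gaussian kernel with a rule-of-thumb bandwidth, the function names, and the `college` dummy are all assumptions made for the example, and survey weights are ignored.

```python
import numpy as np

def kernel_density_at(y, point):
    """Gaussian kernel density estimate of y at one point (rule-of-thumb bandwidth)."""
    y = np.asarray(y, dtype=float)
    q75, q25 = np.percentile(y, [75, 25])
    spread = min(y.std(ddof=1), (q75 - q25) / 1.349)
    h = 0.9 * spread * y.size ** (-0.2)
    u = (point - y) / h
    return np.exp(-0.5 * u ** 2).sum() / (y.size * h * np.sqrt(2 * np.pi))

def rif_quantile(y, tau):
    """RIF of each observation for the tau-th quantile: q + (tau - 1{y <= q}) / f(q)."""
    y = np.asarray(y, dtype=float)
    q = np.quantile(y, tau)          # sample quantile
    f = kernel_density_at(y, q)      # estimated density at the quantile
    return q + (tau - (y <= q)) / f

def rif_regression(y, X, tau):
    """OLS of the estimated RIF on a constant and X; returns [intercept, slopes...]."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, rif_quantile(y, tau), rcond=None)
    return beta

# Synthetic log earnings: a hypothetical 'college' dummy shifts earnings up.
rng = np.random.default_rng(0)
college = rng.binomial(1, 0.3, size=5000)
log_earnings = 5.0 + 0.5 * college + rng.normal(0.0, 1.0, size=5000)

# One RIF regression per quantile of interest.
betas = {tau: rif_regression(log_earnings, college, tau) for tau in (0.1, 0.5, 0.9)}
```

Comparing the slope in `betas` across the three quantiles is the kind of exercise that the decile-by-decile coefficient plots summarize. The sample mean of the estimated RIF reproduces the sample quantile, which offers a quick sanity check on any implementation.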

RIF Decomposition
The RIF decomposition divides the overall change in any distributional statistic into a structure effect (due to the changes in returns to characteristics/covariates), and a composition effect (due to the changes in the distribution of covariates). Compared to other decomposition methods such as the Machado-Mata (Machado and Mata 2005), the RIF decomposition has the added advantage of further dividing the structure and composition effects into the contribution of each covariate. In this way, it is closest in spirit to the decomposition method proposed by Blinder (1973) and Oaxaca (1973).
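In the case of quantiles, the RIF Decomposition is carried out using the estimated UQR/RIF regression coefficients explained in section 2.1. The RIF regression coefficients for each year ($T$) are given by:

$$\hat{\beta}_{T,\tau} = \Big(\sum_{i \in T} X_i X_i'\Big)^{-1} \sum_{i \in T} X_i \, \widehat{\mathrm{RIF}}(Y_{Ti}; \hat{q}_{T\tau}), \qquad T = 0, 1 \qquad \{5\}$$

where $T = 0$ denotes 2004/05 and $T = 1$ denotes 2011/12. The aggregate decomposition for any unconditional quantile is given by:

$$\hat{\Delta}^{\tau} = \bar{X}_1'(\hat{\beta}_{1,\tau} - \hat{\beta}_{0,\tau}) + (\bar{X}_1 - \bar{X}_0)'\hat{\beta}_{0,\tau} \qquad \{6\}$$

where the first term is the structure effect and the second term is the composition effect. To examine the contribution of each covariate, the two terms in {6} can be further written as:

$$\bar{X}_1'(\hat{\beta}_{1,\tau} - \hat{\beta}_{0,\tau}) = \sum_{k} \bar{X}_{1k}(\hat{\beta}_{1,\tau,k} - \hat{\beta}_{0,\tau,k}), \qquad (\bar{X}_1 - \bar{X}_0)'\hat{\beta}_{0,\tau} = \sum_{k} (\bar{X}_{1k} - \bar{X}_{0k})\hat{\beta}_{0,\tau,k} \qquad \{7\}$$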
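A minimal sketch of the aggregate and detailed decomposition follows. It assumes the common convention in Fortin et al. (2011) in which the structure effect is evaluated at end-year mean characteristics and the composition effect at base-year coefficients; the function names and the synthetic data are hypothetical. For brevity the example passes the outcome itself as the RIF, which is the RIF of the mean, so it reduces to a Blinder-Oaxaca decomposition. For a quantile, the estimated quantile RIF would be passed instead.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def rif_decomposition(X0, rif0, X1, rif1):
    """Decompose mean(rif1) - mean(rif0); X0 and X1 must include a constant column."""
    b0, b1 = ols(X0, rif0), ols(X1, rif1)
    xbar0, xbar1 = X0.mean(axis=0), X1.mean(axis=0)
    structure_k = xbar1 * (b1 - b0)        # change in returns, per covariate
    composition_k = (xbar1 - xbar0) * b0   # change in characteristics, per covariate
    return {
        "total": rif1.mean() - rif0.mean(),
        "structure": structure_k.sum(),
        "composition": composition_k.sum(),
        "structure_by_covariate": structure_k,
        "composition_by_covariate": composition_k,
    }

# Synthetic example: the educated share rises while the return to education falls.
rng = np.random.default_rng(1)
n = 4000
educ0 = rng.binomial(1, 0.2, n)   # base year
educ1 = rng.binomial(1, 0.4, n)   # end year
X0 = np.column_stack([np.ones(n), educ0])
X1 = np.column_stack([np.ones(n), educ1])
y0 = 5.0 + 0.8 * educ0 + rng.normal(0, 0.5, n)
y1 = 5.6 + 0.5 * educ1 + rng.normal(0, 0.5, n)

result = rif_decomposition(X0, y0, X1, y1)
```

Because each regression includes a constant, the structure and composition effects add up exactly to the total change in the statistic.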
The detailed decomposition of the structure effect has a limitation when categorical variables are included as covariates. The choice of the omitted or reference group (for caste, education, industry, occupation or state of residence in our analysis) can influence the contribution of each covariate to the structure effect.
Since the choice of the reference categories is arbitrary, results of the detailed decomposition can vary.
Existing solutions to the omitted category problem come at the cost of interpretability (see Fortin et al. 2011). We have maintained one set of categorical variables throughout the paper. All our interpretations are based on this choice.
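The omitted category problem can be illustrated with a small synthetic example. The sketch below is hypothetical and not drawn from the survey data: it uses a single three-level categorical covariate, treats the outcome itself as the RIF (the case of the mean), and evaluates the structure effect at end-year means, which is an assumption about the weighting.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def dummies(cat, levels, omit):
    """Constant plus one dummy per level, leaving out the reference level."""
    cols = [np.ones(len(cat))]
    cols += [(cat == lev).astype(float) for lev in levels if lev != omit]
    return np.column_stack(cols)

def structure_effect(cat0, y0, cat1, y1, levels, omit):
    """Structure effect split into the intercept and the categorical covariate."""
    X0, X1 = dummies(cat0, levels, omit), dummies(cat1, levels, omit)
    terms = X1.mean(axis=0) * (ols(X1, y1) - ols(X0, y0))
    return {"intercept": terms[0], "categorical": terms[1:].sum(), "total": terms.sum()}

rng = np.random.default_rng(0)
levels = np.array([0, 1, 2])
means0 = np.array([1.0, 1.5, 2.0])   # group means in the base year
means1 = np.array([1.6, 1.8, 2.1])   # group means in the end year
cat0 = rng.choice(levels, size=3000, p=[0.5, 0.3, 0.2])
cat1 = rng.choice(levels, size=3000, p=[0.4, 0.35, 0.25])
y0 = means0[cat0] + rng.normal(0, 0.3, 3000)
y1 = means1[cat1] + rng.normal(0, 0.3, 3000)

ref_low = structure_effect(cat0, y0, cat1, y1, levels, omit=0)
ref_high = structure_effect(cat0, y0, cat1, y1, levels, omit=2)
```

In this example the total structure effect is identical under both reference groups, but the part attributed to the categorical covariate changes sign, with the intercept absorbing the difference. This is why the detailed structure effects should be read relative to the chosen base categories.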
Though the above discussion on RIF decomposition focused on quantiles, it is also applicable to any other distributional statistic. We present the RIF decomposition for quantiles as well as selected inequality measures including the Gini.

Data
We use two rounds of the nationally representative Employment Unemployment Survey (

Results
We present below our findings related to the evolution of the earnings distribution in rural India between 2004/05 and 2011/12. The earnings density for each year is skewed to the right, implying that median earnings were below the mean. Over the seven-year period the earnings density shifted to the right and became more peaked. The increase in earnings was, in absolute terms (i.e. measured in rupees), greater for higher percentiles. For instance, real weekly earnings increased by 99 rupees at the first decile, 194 rupees at the median, and 307 rupees at the ninth decile. However, as seen in Figure 3, the percentage increase in earnings was greater at the lower end of the distribution. 10 For instance, earnings increased by 91 percent at the first decile, 74 percent at the median, and by 44 percent at the ninth decile. Thus, earnings inequality, defined in relative rather than absolute terms, declined over the seven-year period.
10 Using consumption expenditure data (also collected by the NSSO), for the period between 2004/05 and 2009/10, Jayaraj and Subramanian (2015) find a similar pattern of an increase in real consumption expenditures at all deciles for rural India, with the highest growth occurring at the third and fourth deciles.

Changes in Earnings Inequality
Table 1 supplements Figures 2, 3 and 4 and shows how various summary measures of inequality changed over time. The ratio of (raw) earnings at the twenty-fifth percentile to those at the tenth percentile was steady at about 1.52. In the middle of the distribution, there was some decrease in inequality as measured by the ratio of the sixtieth to the fortieth percentile. In contrast, the ratio of the ninetieth to the seventy-fifth percentile fell very sharply from 1.72 to 1.53. Thus, it is clear that the decrease in inequality came mainly from changes at the top and middle of the distribution rather than from the bottom. The decrease in inequality is also reflected in the variance of log earnings and in the Gini coefficients. The Gini of real weekly earnings fell from 0.462 to 0.396. 11 This is in sharp contrast to the picture in urban India where earnings inequality remained virtually unchanged over the period: The Gini of real weekly earnings in urban India was 0.506 in 2004/05 and 0.499 in 2011/12. Jayaraj and Subramanian (2015) use consumption expenditure data (also from the NSSO) and find that between 2004/05 and 2009/10, the Gini declined from 0.305 to 0.299 in rural India. For urban India, it increased from 0.376 to 0.393. It is noteworthy that while the direction of change in rural inequality that they find using consumption expenditure is the same as what we find using earnings, this is not the case for urban inequality. This makes a strong case for studying both consumption and earnings inequality.
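The decile figures quoted at the start of the Results section (increases of 99, 194 and 307 rupees, or 91, 74 and 44 percent) can be combined in a back-of-the-envelope calculation to see why larger absolute gains at the top are compatible with falling relative inequality. The levels computed below are implied by the rounded figures in the text and are an approximation made for illustration; they are not values taken from Table 1.

```python
def implied_levels(abs_increase, pct_increase):
    """Back out start and end levels from an absolute and a percentage increase."""
    start = abs_increase / (pct_increase / 100.0)
    return start, start + abs_increase

# (rupee increase, percent increase) at each percentile, as quoted in the text.
reported = {"p10": (99, 91), "p50": (194, 74), "p90": (307, 44)}
levels = {name: implied_levels(*vals) for name, vals in reported.items()}

# Implied ratio of the ninth to the first decile at the start and end of the period.
ratio_9010_start = levels["p90"][0] / levels["p10"][0]
ratio_9010_end = levels["p90"][1] / levels["p10"][1]
```

Under these rounded inputs the implied ninth-to-first decile ratio falls from roughly 6.4 to roughly 4.8, even though the rupee gain at the ninth decile is about three times the gain at the first decile.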

Wage Rates or Days Worked: Decomposition of the Variance in Log Earnings
So far our analysis has been about weekly earnings. The EUS also collects data on the number of half-days worked. Because weekly earnings ($Y$) are the product of the daily wage rate ($W$) and the number of days worked ($D$), the variance of log earnings can be written as:

$$\underbrace{\mathrm{Var}(\ln Y)}_{(1)} = \underbrace{\mathrm{Var}(\ln W)}_{(2)} + \underbrace{\mathrm{Var}(\ln D)}_{(3)} + \underbrace{2\,\mathrm{Cov}(\ln W, \ln D)}_{(4)}$$

The decomposition tells us how much of the earnings inequality (1) is accounted for by inequality of wage rates (2), inequality of workdays (3), and the co-movement of wage rates and workdays (4). We implement this decomposition for both years, and then calculate the difference between corresponding terms. 12 The results are shown in Table 2. In both years, the covariance between wage rates and days worked was positive, implying that highly paid workers worked more days. Also, earnings inequality was largely on account of inequality of wage rates rather than inequality of days worked or highly paid workers working more days: Over 70 percent of the earnings inequality was due to inequality of wage rates. 13 As mentioned earlier in section 4.1.1, earnings inequality declined over the seven-year period as seen in the decrease in the variance of log earnings. The last row of Table 2 presents the decomposition of the decline in earnings inequality. About 50 percent of this decline was due to a decline in inequality of wage rates. The rest was due to a decrease in inequality of days worked (about 30 percent), and a weaker tendency for highly paid workers to work more days (about 20 percent).
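The variance decomposition used in this subsection rests on the identity that, when weekly earnings equal the daily wage rate times days worked, the variance of log earnings is the sum of the variance of log wage rates, the variance of log days, and twice their covariance. A minimal sketch on synthetic data follows; the data-generating process and function names are hypothetical, and survey weights are ignored.

```python
import numpy as np

def decompose_log_earnings_variance(wage, days):
    """Split Var(log earnings) into wage, days and co-movement terms."""
    log_w, log_d = np.log(wage), np.log(days)
    return {
        "earnings": np.var(log_w + log_d),                        # term (1)
        "wage": np.var(log_w),                                    # term (2)
        "days": np.var(log_d),                                    # term (3)
        "comovement": 2 * np.cov(log_w, log_d, bias=True)[0, 1],  # term (4)
    }

def synthetic_year(rng, n, wage_sd):
    """Hypothetical workers: days in half-day steps, wages rising mildly with days."""
    days = rng.integers(1, 15, size=n) / 2.0   # 0.5 to 7 days per week
    wage = np.exp(4.0 + 0.05 * days + rng.normal(0.0, wage_sd, size=n))
    return wage, days

rng = np.random.default_rng(2)
year0 = decompose_log_earnings_variance(*synthetic_year(rng, 5000, 0.6))
year1 = decompose_log_earnings_variance(*synthetic_year(rng, 5000, 0.5))

# Change in each term between the two years; the identity also holds for changes.
change = {key: year1[key] - year0[key] for key in year0}
```

Dividing each entry of `change` by `change["earnings"]` gives shares of the kind reported in the last row of Table 2.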

Unconditional Quantile Regression Results
Before moving to the regression results, we present some descriptive statistics in Table 3 for paid workers in rural India. Mean (log) weekly earnings increased over the period. The average age also increased by about 1.7 years, perhaps an indication of later entry into the labor market as more people acquire higher education. There was also an increase in the share of males, married workers and Muslims. The proportion of those belonging to ST (Scheduled Tribes) and SC (Scheduled Castes) declined. 14 Education levels rose significantly: The share of illiterates decreased by around 11 percentage points, while the share of each schooling level, including college education, increased.
We classify industries into seven categories: agriculture, manufacturing (including mining), construction, utilities, wholesale and retail trade, public administration (including defense) and other services (including education, health, real estate and finance). Over the period, the major changes in the industrial distribution were in agriculture, which saw a 12 percentage point decrease, and construction, which saw a roughly equivalent increase. 15
It is important to note that these predictions are based on the assumption that the wage structure, i.e. the returns to observed worker characteristics, remains intact as the distribution of characteristics changes. In effect, this amounts to assuming away the presence of general equilibrium effects, a standard assumption made in this literature.
The first row of plots in Figure 5 shows that the coefficients for being male were positive and significant, implying the presence of a gender earnings gap. The UQR male coefficients were decreasing across deciles: In 2011/12, the male coefficient value was 0.69 at the first decile, 0.44 at the median, and 0.40 at the ninth decile. This is termed the 'sticky floor' effect and shows that while men earned more than women throughout the distribution, the penalty for being female was more pronounced at the bottom of the distribution. 17 The decreasing UQR coefficients also mean that having a greater proportion of men would reduce earnings inequality among wage earners. This was unambiguously true for 2004/05 as the coefficients decline monotonically across deciles, and it was true for the lower part of the 2011/12 distribution.
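The link between declining UQR coefficients and lower inequality can be made concrete with a small calculation using the 2011/12 male coefficients quoted above. The sketch assumes that the dependent variable is log earnings, that the wage structure stays fixed, and that a hypothetical one percentage point rise in the male share is small enough for the marginal-effect interpretation to apply.

```python
# 2011/12 UQR coefficients on the male dummy, as quoted in the text.
male_coef_2011 = {"p10": 0.69, "p50": 0.44, "p90": 0.40}

# Hypothetical small increase in the share of male workers (one percentage point).
delta_share = 0.01

# Approximate change in each unconditional log quantile.
effect = {name: coef * delta_share for name, coef in male_coef_2011.items()}

# Implied change in the 90-10 gap in log earnings (negative means less inequality).
change_9010_gap = effect["p90"] - effect["p10"]
```

Because the first decile rises by more than the ninth, the implied 90-10 log gap narrows by about 0.003 under these assumptions.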
The second through fourth rows of plots in Figure 5 show the presence of caste earnings gaps, though we do not see such gaps in all parts of the distribution. In 2011/12, the caste earnings gaps were overwhelmingly due to occupational and industrial segregation by caste.
The fifth row of Figure 5 indicates that returns to being married moved from being insignificant at lower deciles to being positive at upper ones. Thus, if the proportion of married individuals were to increase, earnings inequality among wage earners would increase. Except at the ninth decile in 2004/05, there was no penalty for being Muslim in either year.

RIF Decomposition Results
Next we turn to RIF decompositions to understand the factors behind the changes in the real earnings distribution. We first present the aggregate decomposition followed by the detailed decompositions of the composition and structure effects. The total change in earnings across quantiles again shows that the lower quantiles experienced a larger percentage increase in earnings than the higher quantiles.

Aggregate Decomposition of Change in Earnings
The total difference is decomposed into the structure (dashed) and the composition effects (dotted). Both components made significant contributions to the overall increase in earnings over the seven-year period. The only exception to this is at the nineteenth vigintile (95th percentile), where the structure effect is not significant. Thus, the contribution of the structure effect to the overall increase in earnings was positive and much larger than the composition effect at all but the top vigintile. 21 An important conclusion from the decomposition is that most of the decline in inequality occurred because the returns to characteristics improved a lot more at lower percentiles. In fact, it is clear that while changing characteristics did lead to an improvement in real earnings throughout the distribution, they had an inequality-increasing effect: The composition effect increased sharply after the eighth decile, implying that had 'returns to characteristics' been held constant over the period, earnings inequality would have risen.
In summary, the aggregate decomposition of all inequality measures reveals that the decline in inequality came exclusively from the structure effect, but the detailed decompositions that follow present a more nuanced picture.
21 We also implemented the aggregate decomposition using Melly's refinement (Melly 2006) of the Machado-Mata Decomposition (Machado and Mata 2005) and found similar results.

Detailed Decomposition of the Composition Effect
The second panel of Table 4 and Figure 8 present the detailed decomposition of the composition effect to ascertain which sets of covariates were important in driving the total composition effect. The inequality-increasing effect was mainly driven by changes in the distribution of education, and to a lesser extent of experience and occupation. On the other hand, the change in the industrial distribution had a significant inequality-decreasing effect, operative at the top of the distribution. Further decomposing the industry category into its constituents (not shown here) points to a large contribution from the shift into construction. The large shift from agriculture to construction noted earlier decreased earnings inequality. The greater proportion of male workers also contributed to the decline in inequality. Changes in the distribution of state of residence, marital status, caste and religion did not have a major effect on the change in inequality.

Detailed Decomposition of the Structure Effect
The bottom panel of Table 4 presents the decomposition of the structure effect. The Gini decomposition reveals that education, occupation and the residual (the unexplained portion of the structure effect) were significant in reducing earnings inequality, while none of the other components were statistically significant.
As noted earlier (Table 4), the decline in inequality came disproportionately from the structure effect and especially from changes at the top of the distribution. The decomposition of the structure effect component of the 90-50 measure shows that a large part of the change (-0.228) can be explained by changing returns to education (-0.108) and to occupation (-0.096). As noted in Figure 6, the returns to education (with illiterates as the base category) actually declined at the higher end of the wage distribution, whereas returns did not change significantly in the middle. The same is true for the return to higher occupations (with laborers and unskilled workers as the base category).
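The relative importance of education and occupation in the 90-50 structure effect follows from simple arithmetic on the three figures quoted above. The sketch below uses only those numbers; the variable names are illustrative.

```python
# Structure effect on the 90-50 measure and two of its components, as quoted in the text.
structure_9050 = -0.228
education = -0.108
occupation = -0.096

# Share of the structure effect accounted for by returns to education and occupation.
share_explained = (education + occupation) / structure_9050

# What is left for all other covariates and the residual combined.
remaining = structure_9050 - education - occupation
```

Together the two components account for close to nine-tenths of the 90-50 structure effect, leaving about -0.024 for everything else.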

Conclusions
Using nationally representative data from the Employment Unemployment Survey, we examine the changes in real weekly earnings from paid work in rural India from 2004/05 to 2011/12. For wage earners, who constituted about a quarter of the rural working age population, we find that real earnings increased at all percentiles. Using consumption expenditure data that span the entire population, other studies 22 have also documented an improvement in all parts of the distribution. Taken together, there is clear evidence that economic growth in the post-reform period (after the early 1990s) has been accompanied by a reduction in poverty. 23 At the same time, according to official estimates, in 2011/12, 25.7 percent of the rural population was below the poverty line. This figure represents about 216.7 million poor persons, a large number of people living below a minimum acceptable standard. 24 Our analysis also reveals that earnings inequality in rural India decreased over the seven-year period, and about half of the decline can be accounted for by the decline in daily wage inequality. However, while the rural Gini fell over this period, it remained virtually unchanged in urban India. This shows that the dynamics of earnings are different for the two sectors. This could be because the underlying structural characteristics are different: for example, while agriculture is the largest employer in rural India, in urban India it is services. It could also be the result of different redistributive policies followed in the two sectors. These aspects need to be recognized when designing future policies to tackle inequality in the two regions.
Aggregate decompositions of the change in inequality measures reveal that the change in returns to worker characteristics was mainly responsible for the decrease in earnings inequality. Further detailed decompositions reveal that higher levels of education in the population contributed to an increase in earnings inequality, while lower returns to higher education contributed to a decrease. Rural India also experienced a construction boom during this period, which contributed to the decrease in earnings inequality.
Some studies ( One cannot be certain that this trend of rising casual wages and declining earnings inequality will continue into the future. Regardless of the underlying causes of the recent decline in earnings inequality in rural India, volatility in global crop prices and the drought conditions currently experienced by large parts of the country because of two consecutive weak monsoons are important reminders that policies designed to foster employment opportunities and wage growth of unskilled workers outside of agriculture are crucial for improving the economic well-being of the second part of India.