A New Method for Measuring Tail Exponents of Firm Size Distributions

The authors propose a new method for estimating the power-law exponents of firm size variables. Their focus is on how to empirically identify a range in which a firm size variable follows a power-law distribution. On the one hand, as is well known, a firm size variable follows a power-law distribution only beyond some threshold. On the other hand, in almost all empirical exercises, the right end part of a distribution deviates from a power-law due to finite size effects. The authors modify the method proposed by Malevergne et al. (2011) so that they can identify both the lower and the upper thresholds and then estimate the power-law exponent using only observations in the range defined by the two thresholds. They apply this new method to various firm size variables, including annual sales, the number of workers, and tangible fixed assets for firms in more than thirty countries.


Introduction
Power-law distributions are frequently observed in social phenomena (e.g., Pareto (1897); Newman (2005); Clauset et al. (2009)). One of the most famous examples in Economics is the fact that personal income follows a power-law, which was first found by Pareto (1897) about a century ago, and is thus referred to as a Pareto distribution. Specifically, the probability that personal income x is above x_0 is given by

P(X > x) = (x / x_0)^(−µ)  for  x ≥ x_0,   (1)

where µ is referred to as a Pareto exponent or a power-law exponent.
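As a concrete illustration, equation (1) can be evaluated and sampled in a few lines. The sketch below is ours, not the paper's; the function names are hypothetical.

```python
import random

def pareto_ccdf(x, x0, mu):
    """P(X > x) = (x / x0)^(-mu) for x >= x0, as in equation (1)."""
    if x < x0:
        return 1.0
    return (x / x0) ** (-mu)

def pareto_sample(n, x0, mu, seed=0):
    """Inverse-transform sampling: if U ~ Uniform(0, 1], then x0 * U^(-1/mu)
    has the tail probability in equation (1)."""
    rng = random.Random(seed)
    return [x0 * (1.0 - rng.random()) ** (-1.0 / mu) for _ in range(n)]

print(pareto_ccdf(2.0, 1.0, 1.0))  # 0.5: with mu = 1, doubling x halves the tail
```

The inverse-transform step works because solving P(X > x) = u for x gives x = x0 * u^(-1/mu).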
As for the variables related to firm behavior, it is well known that several variables follow a power-law, including firm sales for a particular period (e.g., annual sales), the number of workers employed by a firm, and the amount of fixed assets, like machinery equipment, held by a firm. The fact that these firm size variables follow power-law distributions implies that their behavior at the aggregate level is dominated by a very small number of firms that are extremely large in size.
The purpose of this paper is to propose a new method for estimating the power-law exponent of a distribution. Our special focus is on how to empirically determine a range in which a variable follows a power-law distribution. On the one hand, as shown in equation (1), a variable follows a power-law distribution only when it exceeds some threshold, for example, x_0 in (1); the variable deviates from a power-law below that threshold. Thus we need to empirically specify where such a threshold lies. On the other hand, in almost all empirical exercises, the right end part of a distribution deviates from a power-law due to the limited number of observations. It is often the case that the right end part of a distribution exhibits a much quicker decay than implied by a power-law due to such a finite size effect. We need to eliminate that part of a distribution before estimating a power-law exponent. Our strategy is to empirically specify the range of a variable, defined by a lower threshold x_0 and an upper threshold x_1, and then estimate a power-law exponent using only observations in that range.
Our method is based on the one proposed by Malevergne et al. (2011). 1 They propose to test the null hypothesis that, beyond some threshold, the upper tail of a distribution is characterized by a power-law distribution against the alternative that the upper tail follows a lognormal beyond the same threshold. 2 It is important to note that their intention was to detect a lower threshold x_0 by conducting this test, and that they did not pay any particular attention to the presence of an upper threshold x_1. However, as we will show later, in applying this method to firm size variables, one often encounters a situation in which the threshold detected by this method is not x_0 but x_1. Needless to say, this failure leads to an imprecise estimate of the power-law exponent.
In our method, we first apply the test by Malevergne et al. (2011) to detect an upper threshold, x_1. We then repeat the test, but we "thin out" observations before conducting the second round test. Specifically, we discard observations above x_1, which is detected by the first round test, and similarly we thin out observations below x_1. Then we apply the test to the thinned-out set of observations to detect x_0.
The rest of this paper is organized as follows. In Section 1, we provide a detailed explanation of our new method. In Section 2, we apply the new method to firm size variables, including annual sales, the number of workers, and tangible fixed assets for firms in more than thirty countries. Section 3 concludes the paper.

Methodology
Let us start by showing the empirical distributions for tangible fixed assets, denoted by K, the number of workers, L, and annual sales, Y. The cumulative distributions of these three variables for Japanese firms are shown in Figure 1, with horizontal and vertical axes in logarithms. We see that the dots are on a straight line in each of the three panels, indicating that each of the distributions is a power-law. However, the dots deviate from a straight line when the firm size variables take very small or very large values. In other words, K, L, and Y follow power-law distributions only within some range; that is,

P(K > x) ∝ x^(−µ_K) for K_0 ≤ x ≤ K_1,   P(L > x) ∝ x^(−µ_L) for L_0 ≤ x ≤ L_1,   P(Y > x) ∝ x^(−µ_Y) for Y_0 ≤ x ≤ Y_1.

The main issue of this paper is how to estimate the range in which the dots are on a straight line; namely, the bounds (K_0, K_1), (L_0, L_1), and (Y_0, Y_1).

Our method is based on Malevergne et al. (2011), who propose a method to identify the boundary between a power-law and a lognormal. Consider a case described by equation (1). For each value of x, they test the null hypothesis that x follows a power-law distribution beyond that value against the alternative that x follows a lognormal distribution beyond the same value. They start this test at the maximum value of x, and repeat it for the second largest value, the third largest value, and so on, until the null is finally rejected.
Note that their test is equivalent to testing the null that the upper tail of the log of x follows an exponential distribution against the alternative that the log of x follows a truncated normal distribution. For this transformed test, del Castillo and Puig (1999) have shown that the clipped empirical coefficient of variation provides the uniformly most powerful unbiased test.
Specifically, let us consider a random variable z which follows a truncated normal distribution, with truncation occurring at z = A. The probability density function P(z) is given by

P(z; α, β) = exp(−α(z − A) − β(z − A)²) / NC(α, β),  z ≥ A,   (5)

where NC(α, β) represents a scaling value, defined by

NC(α, β) = √(π/β) exp(α²/(4β)) [1 − Φ(α/√(2β))],   (6)

and where Φ(·) is the CDF of a standard normal distribution. Note that it can be shown by using an asymptotic expansion of (6) that P(z; α, β) reduces to the exponential density α exp(−α(z − A)) as β tends to zero. Suppose there are n observations for z (namely, z_1, z_2, ..., z_n). The log likelihood is given by

ln ℓ(θ) = −Σ_{i=1}^{n} [α(z_i − A) + β(z_i − A)²] − n ln NC(α, β),   (7)

and the maximum likelihood estimate for θ = (α, β) is characterized by

[1 + γ h(γ) − h(γ)²] / [h(γ) − γ]² = c²,   (8)

where γ and h(γ) are defined by

γ = α / √(2β),   h(γ) = φ(γ) / [1 − Φ(γ)],   (9)

with φ(·) the density of a standard normal distribution, and c² is the square of the coefficient of variation for (z_i − A), which is defined by

c² = ⟨(z − A)²⟩ / ⟨z − A⟩² − 1,

where ⟨·⟩ represents the sample mean. For a given value of A, one can calculate c² from the data, and then obtain a maximum likelihood estimate of γ from (8), which is denoted by γ̂. Note that the expression on the left-hand side of (8) is monotonically increasing with respect to γ, so that one can obtain a solution just by applying a simple method like the Newton-Raphson method.
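The moment condition (8) can be solved numerically in a few lines. The sketch below is our own reconstruction (it uses bisection rather than Newton-Raphson, for robustness); the left-hand side of (8) is the squared coefficient of variation of a standard normal truncated at γ.

```python
import math

def mills_hazard(g):
    """h(g) = phi(g) / (1 - Phi(g)), the hazard of the standard normal,
    with the tail computed via erfc to limit underflow."""
    phi = math.exp(-0.5 * g * g) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(g / math.sqrt(2.0))
    return phi / tail

def c2_of_gamma(g):
    """Squared coefficient of variation of a normal truncated at g; our
    reconstruction of the left-hand side of equation (8)."""
    h = mills_hazard(g)
    return (1.0 + g * h - h * h) / (h - g) ** 2

def solve_gamma(c2, lo=-8.0, hi=8.0, tol=1e-10):
    """Invert c2_of_gamma by bisection, valid for 0 < c2 < 1, where the
    function increases monotonically in gamma."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if c2_of_gamma(mid) < c2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

g_hat = solve_gamma(0.8)
print(g_hat, c2_of_gamma(g_hat))
```

Bisection is a deliberate design choice here: it needs no derivative of (8) and cannot overshoot, at the cost of a few dozen extra iterations.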
If z follows an exponential distribution rather than a truncated normal distribution, β in equation (5) is equal to zero, and the log likelihood is given by

ln ℓ(α, 0) = n ln α − α Σ_{i=1}^{n} (z_i − A).

The maximum likelihood estimate for θ is then θ̃ = (α̃, 0) = (1/⟨z − A⟩, 0). The null hypothesis that z is exponentially distributed can be tested against the alternative that z follows a truncated normal distribution by conducting a likelihood ratio test, in which the likelihood ratio statistic is given by

W = 2 [ln ℓ(α̂, β̂) − ln ℓ(α̃, 0)].

The random variable z is more likely to follow a truncated normal distribution if the value of W is well above zero, and it is more likely to be exponentially distributed if W is close to zero. Specifically, it is known that the asymptotic distribution of W around W = 0 is a 50-50 mixture of a χ² distribution with one degree of freedom and a constant zero (see Self and Liang (1987) and Geyer (1994)). The boundary case arises when the empirical coefficient of variation c is greater than unity, in which case the constrained maximum sits at β̂ = 0 and W(γ̂) = 0; when c is less than unity, W(γ̂) takes a strictly positive value determined by γ̂. del Castillo and Puig (1999) adopt a more precise approximation W* to W, given in (14) in terms of a function L(·) defined in (15). In sum, the procedure proposed by del Castillo and Puig (1999) and Malevergne et al. (2011) is as follows.
1. Pick up the largest n observations and take their logs. Set the threshold A equal to the log of the n-th largest observation.
2. Compute c² for the log observations in excess of A, and obtain γ̂ by solving (8).
3. Compute W* and the p-value associated with it by inserting the value of γ̂ into (14).
4. Repeat this procedure for n = 1, 2, 3, ... until the p-value associated with W* becomes small enough to reject the null hypothesis.
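The p-value in step 3 follows from the 50-50 mixture asymptotics described above, using only the complementary error function, since P(χ²₁ > w) = erfc(√(w/2)). A minimal sketch (the function name is ours):

```python
import math

def mixture_pvalue(W):
    """p-value of W under a 50-50 mixture of a point mass at zero and a
    chi-squared distribution with one degree of freedom:
    P(W_asym > w) = 0.5 * P(chi2_1 > w) for w > 0."""
    if W <= 0.0:
        return 1.0
    # P(chi2_1 > w) = P(Z^2 > w) = erfc(sqrt(w / 2)) for standard normal Z
    return 0.5 * math.erfc(math.sqrt(0.5 * W))

print(mixture_pvalue(0.0))    # 1.0: the boundary case is never rejected
print(mixture_pvalue(2.706))  # about 0.05: half the usual chi2_1 tail of 0.10
```

The factor 0.5 is what distinguishes this boundary test from a textbook likelihood ratio test: half the probability mass of the null distribution sits exactly at zero.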
Let us show how the method proposed by Malevergne et al. (2011) works by applying it to the distribution of the number of workers employed by Japanese firms in 2004. The black dots in Figure 2 represent the empirical CDF produced using actual observations. There are two vertical lines in the figure; the dashed line represents the threshold identified by the procedure proposed by Malevergne et al. (2011), which corresponds to the 17th largest observation, with a value (i.e., a number of workers) of 84,899. Figure 3a shows the p-value for each rank in this test. If their method worked well, this result would indicate that the number of workers follows a power-law beyond this threshold but a lognormal below it. However, as one can clearly see from the figure, the black dots are on a straight line even below this threshold, implying that their method fails to detect the correct threshold. This failure happens because the right end part of the distribution decays more quickly than the rest of the distribution due to the limited number of observations. The possibility of such a finite size effect is not seriously considered in Malevergne et al. (2011). It is important to note that this particular case is not an exception; in fact, we encounter similar failures quite often in estimating the power-law exponents of firm size distributions.
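The rank-by-rank scan just described can be mimicked with a simplified statistic. Rather than the full UMPU test, the sketch below (our own illustration, not the authors' code) merely tracks the squared coefficient of variation of the log-excesses at each candidate threshold: c² near 1 is consistent with an exponential (i.e., power-law) tail, while c² well below 1 points to a truncated normal (i.e., lognormal) tail.

```python
import math

def c2(values):
    """Squared coefficient of variation: <v^2>/<v>^2 - 1."""
    m1 = sum(values) / len(values)
    m2 = sum(v * v for v in values) / len(values)
    return m2 / (m1 * m1) - 1.0

def scan_thresholds(data, max_rank):
    """For each rank n, set the threshold A at the log of the n-th largest
    observation and report c^2 of the log-excesses above it.  A simplified
    stand-in for the full UMPU scan."""
    xs = sorted(data, reverse=True)
    out = []
    for n in range(10, max_rank + 1):
        a = math.log(xs[n - 1])                       # threshold A
        excess = [math.log(x) - a for x in xs[:n - 1]]
        out.append((n, c2(excess)))
    return out

# Exact Pareto rank-size data with exponent 1: c^2 drifts toward 1 as the
# rank grows, as expected for a power-law tail.
data = [5000.0 / r for r in range(1, 5001)]
for n, c in scan_thresholds(data, max_rank=200)[::50]:
    print(n, round(c, 3))
```

On lognormal data the same scan would show c² dropping well below 1 as the threshold moves into the body of the distribution.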
To cope with this problem, we propose to modify their procedure in the following way. Basically, what we do is "thin out" the observations so as to minimize the extent to which one suffers from the finite size effect. Specifically, after detecting the 17th largest observation as a (wrong) threshold, we discard the 16 observations above it. We also discard the 18th to 33rd largest observations, the 35th to 50th largest observations, and so on. By repeating this procedure, we end up with a thinned-out set of observations consisting of the 17th largest observation, the 34th largest observation, the 51st largest observation, and so on. These thinned-out observations are indicated by grey circles in Figure 2. We then apply the method by Malevergne et al. (2011) again, but this time to the thinned-out set of observations rather than to the original set. This second round test identifies a new threshold, which is represented by the vertical solid line in Figure 2. It corresponds to the 24701st largest among the original set of observations and the 1453rd largest among the thinned-out set. Figure 3b shows the p-value for each rank in this second round test. The number of workers corresponding to this second threshold is 60, which is substantially lower than the number corresponding to the first threshold. We see from the figure that the dots, both black and grey, are on a straight line in the range indicated by the two vertical lines.

[Figure 2: Black dots represent the original set of observations, while grey dots represent the thinned-out set. The two vertical lines indicate the upper and lower thresholds estimated by the method described in the text; the power-law exponent is estimated using only observations within the range defined by these two thresholds.]

[Figure 3a: The p-value for each rank obtained from the first round test. The vertical line marks the rank whose p-value falls below the 5% threshold for the first time, corresponding to the 17th largest observation.]

[Figure 3b: The p-value for each rank obtained from the second round test. The vertical line marks the rank whose p-value is below the 5% threshold while the p-values for all lower ranks are above it, corresponding to the 1453rd largest observation.]
To see how our method works, consider a size-rank equation of the form

ln r = const − µ ln s,   (16)

where s represents a firm size, r is the rank associated with it, and µ is a power-law exponent. We assume that this size-rank equation holds for r ∈ [r_0, r_1]. We know the value of r_0 from the first round test (r_0 is 17 in the above example). Let s_0 represent the size associated with the rank r_0. The constant term in equation (16) is then equal to ln r_0 + µ ln s_0. Therefore, equation (16) implies that

ln (r / r_0) = −µ ln (s / s_0)   (17)

holds for r = r_0, 2r_0, 3r_0, 4r_0, ... as long as r is smaller than r_1. Thus we can estimate the power-law exponent µ using a thinned-out set of observations with ranks {r_0, 2r_0, 3r_0, 4r_0, ...}. Note that discarding only the observations with ranks higher than r_0 does not work: in that case, the rank in the new set of observations is r − r_0 rather than r/r_0 as in equation (17), and the log of r − r_0 does not depend linearly on the log of s. The procedure we propose is summarized as follows.
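The logic of equation (17) can be checked on exact rank-size data. The snippet below is a sketch with a hypothetical exponent µ = 1.5: within the thinned-out set, the j-th largest element has original rank j·r_0, so regressing ln(new rank) on ln(size) recovers µ exactly.

```python
import math

# Exact rank-size data obeying ln r = const - mu * ln s, with mu = 1.5.
mu, r0 = 1.5, 17
sizes = {r: (1000.0 / r) ** (1.0 / mu) for r in range(1, 20000)}

# Thinned-out set: ranks r0, 2*r0, 3*r0, ...  The new rank of each kept
# observation is r // r0 = r / r0, exactly as in equation (17).
thinned = [(r // r0, sizes[r]) for r in range(r0, 20000, r0)]

def slope(pairs):
    """OLS slope of ln(new rank) on ln(size)."""
    xs = [math.log(s) for _, s in pairs]
    ys = [math.log(j) for j, _ in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

print(-slope(thinned))  # recovers mu = 1.5 on this exact data
```

Shifting ranks instead (using r − r_0) would bend the smallest kept ranks off the log-linear line, which is exactly the failure mode the text warns about.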
1. Apply the method proposed by Malevergne et al. (2011) to the original observations to detect an observation (we refer to this as the k-th largest observation) above which the CDF is steeper than elsewhere due to the finite size effect.
2. Create a new (thinned-out) set of observations, consisting of the k-th largest observation, the 2k-th largest observation, the 3k-th largest observation, and so on.
3. Apply the method proposed by Malevergne et al. (2011) to the thinned-out set of observations to detect a new threshold (we refer to this as the K-th largest observation).
4. Estimate the slope of the straight line within the range defined by the value associated with the k-th largest observation and the value associated with the K-th largest observation.
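Step 2 above is just a strided slice over the observations sorted in descending order; a minimal sketch (the helper name is ours):

```python
def thin_out(sorted_desc, k):
    """Step 2 of the procedure: from observations sorted in descending
    order, keep the k-th, 2k-th, 3k-th, ... largest."""
    return sorted_desc[k - 1::k]

data = list(range(100, 0, -1))       # toy data: 100 is the largest value
print(thin_out(data, 17))            # [84, 67, 50, 33, 16]
```

With k = 17, the 17th largest of the values 100, 99, ..., 1 is 84, the 34th largest is 67, and so on, matching the worked example in the text.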

Empirical Results
In this section we apply the new method to firm size variables, including annual sales, the number of workers, and tangible fixed assets for firms in more than thirty countries. 3 The data come from ORBIS, provided by Bureau van Dijk, which contains B/S and P/L information for more than 60 million firms all over the world. The sample covers the period from 1999 until 2009. 4

Figure 4 shows the CDFs for tangible fixed assets, the number of workers, and annual sales for Japanese firms in 2007. As emphasized in the previous sections, the dots are not always on a straight line; namely, there is a range in which the dots are on a straight line, but they deviate from the straight line below the lower bound of the range, and they also deviate from it beyond the upper bound. Our estimation result indicates that, for tangible fixed assets, the lower bound of the range, K_0, is 3,134 thousand USD, and the upper bound, K_1, is 4,335,478 thousand USD. The range is shown by two vertical lines, and we see that the dots are on a straight line inside the range but deviate from it outside the range, indicating that our estimation procedure works well in identifying upper and lower bounds. We confirm the same results for the number of workers as well as for annual sales. Figure 5 and Figure 6 show the results for French firms and those for Chinese firms, indicating again that our estimation procedure works well in identifying the upper and lower bounds of a range.

After identifying the upper and lower bounds of a range, we estimate the slope of the CDF by applying an OLS regression. The results for Japan are presented in Table 1, which shows the power-law exponents for tangible fixed assets, the number of workers, and annual sales, denoted by µ_K, µ_L, and µ_Y, respectively.

[Table 1: The estimates of power-law exponents for tangible fixed assets (µ_K), the number of workers (µ_L), and annual sales (µ_Y) for Japanese firms.]
For example, the power-law exponent for tangible fixed assets in 2005 is 0.8025 and its standard error is 0.0027, suggesting a high precision of the estimate. We also see that each of the three exponents is fairly stable over time.
One of the interesting findings from the table is that µ_K tends to be the smallest among the three, while µ_L tends to be the largest. Put differently, there exists a relationship between µ_K, µ_L, and µ_Y such that

µ_K < µ_Y < µ_L.   (18)

We conduct the same exercise for the other countries, and the results are reported in Table 2. It shows that the estimates of power-law exponents differ across countries, but there is still a tendency that µ_K < µ_Y < µ_L in each country.

Why does equation (18) hold? One way to address this question is to start from a Cobb-Douglas production function of the form

Y = A K^α L^β,   (19)

where α and β are positive (but less than unity) parameters. 5 This equation simply says that the amount of output produced by a firm is determined by the amounts of inputs, i.e., labor and capital inputs, employed by the firm, as well as the level of productivity of the firm, which is denoted by A in equation (19). Given that Y, K, and L all follow power-law distributions, equation (19) implies

µ_Y = min(µ_K / α, µ_L / β)   (20)

if K and L are independent. 6 A simple comparison of (20) and (18) suggests where (18) comes from. Suppose that the sum of α and β equals unity, as is often assumed in the Economics literature. The values of µ_K, µ_L, and µ_Y for 2005 in Japan are 0.8025, 0.9923, and 0.9210, respectively. These empirical estimates of the power-law exponents are consistent with (20) if α = 0.87 and β = 0.13. 7 Note that this calculation is nothing more than an illustration, since the assumptions adopted above may not necessarily be satisfied in the actual data; namely, K and L may not necessarily be independent, and the sum of α and β may not necessarily equal unity. However, this calculation still suggests a way to reconcile the different empirical estimates of the power-law exponents for tangible fixed assets, the number of workers, and annual sales. See Mizuno et al. (2011) for more empirical results and discussion along this line of research.
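The back-of-the-envelope consistency check described above can be reproduced directly. The snippet below uses the 2005 Japanese estimates quoted in the text together with the hypothetical parameter values α = 0.87 and β = 0.13 (so that α + β = 1).

```python
# Hypothetical check of equation (20): mu_Y = min(mu_K / alpha, mu_L / beta),
# using the 2005 Japanese estimates quoted in the text.
mu_K, mu_L, mu_Y = 0.8025, 0.9923, 0.9210
alpha, beta = 0.87, 0.13

implied_mu_Y = min(mu_K / alpha, mu_L / beta)
print(round(mu_K / alpha, 3), round(mu_L / beta, 3))  # the two candidate exponents
print(round(implied_mu_Y, 3), mu_Y)                   # implied vs estimated mu_Y
```

The minimum is attained by the capital term, so under these assumptions the sales exponent is pinned down by µ_K/α alone.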

Conclusion
We have proposed a new method for estimating the power-law exponent of a firm size variable, such as annual sales. Our focus is on how to empirically identify a range in which a firm size variable follows a power-law distribution. It is well known that a firm size variable follows a power-law distribution only beyond some threshold. On the other hand, in almost all empirical exercises, the right end part of a distribution deviates from a power-law due to finite size effects. We modify the method proposed by Malevergne et al. (2011) so that we can identify both the lower and the upper thresholds, and then estimate the power-law exponent using observations only in the range defined by the two thresholds. 8

Malevergne et al. (2011) propose to test the null hypothesis that, beyond some threshold, the upper tail of a distribution is characterized by a power-law distribution against the alternative that the upper tail follows a lognormal beyond the same threshold. It is important to note that their intention was to detect a lower threshold by conducting this test, and that no attention was paid to the presence of an upper threshold. In our method, we first apply the test by Malevergne et al. (2011) to detect an upper threshold. We then repeat the test, but we "thin out" observations before conducting the second round test. Specifically, we discard observations above the upper threshold detected by the first round test, and similarly we thin out observations below it. Then we apply the test to the thinned-out set of observations to detect a lower threshold.

6 Jessen and Mikosch (2006) provide a compact summary of various properties of power-law distributions. One of them indicates that K^α follows a power-law and its exponent is µ_K/α; similarly, the power-law exponent for L^β is µ_L/β. We also know from Jessen and Mikosch (2006) that the product of two power-law variables is again a power-law, and its exponent is equal to the smaller of the two exponents associated with the two variables. We obtain equation (20) by combining these properties. See Mizuno et al. (2011) for further discussion and empirical evidence on this property.

7 Given these parameter values, µ_K/α = 0.921 and µ_L/β = 7.633, so that min(µ_K/α, µ_L/β) coincides with µ_Y.
We have applied this new method to various firm size variables, including annual sales, the number of workers, and tangible fixed assets for firms in more than thirty countries. First, we find that our new method works well in identifying upper and lower thresholds. Second, we find that there exists a robust tendency in each country that the exponent for tangible fixed capital is the lowest, the exponent for annual sales is the second lowest, and the exponent for the number of workers is the largest. We provide a tentative argument based on a Cobb-Douglas production function to explain the observed differences in the three power-law exponents.

Details on Numerical Calculation
In this appendix we provide more details about how to numerically solve equation (9) and the other related equations. Error functions built into programming languages sometimes fail to solve these equations due to underflow. To illustrate this, consider a function of the form

f(x) = φ(x) / [1 − Φ(x)] = √(2/π) exp(−x²/2) / erfc(x/√2),   (21)

whose second expression can be expanded asymptotically for large x using 1 − Φ(x) ≈ (φ(x)/x)(1 − x⁻² + 3x⁻⁴ − 15x⁻⁶ + ...). Note that the function f(·) is basically the same function as h(·) in (9). Figure 7a compares the result obtained from a built-in error function with the result obtained from the asymptotic expansion (up to x to the 25th power). We see that the built-in error function returns a precise outcome up to x = 6, but fails to do so for greater values due to underflow. To fix this problem, we use a built-in error function up to x = 4, but use the equation obtained from the asymptotic expansion for x > 4.
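The underflow problem can be reproduced with the normal tail probability 1 − Φ(x), which underlies f(x). The sketch below is our own illustration, not the paper's code: it contrasts an erf-based implementation, which loses all precision once erf rounds to 1.0 in double precision, with an erfc-based one and with the asymptotic series.

```python
import math

def tail_via_erf(x):
    """1 - Phi(x) computed as 0.5 * (1 - erf(x / sqrt(2))): once erf rounds
    to exactly 1.0 in double precision, the result collapses to 0.0."""
    return 0.5 * (1.0 - math.erf(x / math.sqrt(2.0)))

def tail_via_erfc(x):
    """The same quantity via erfc, which keeps full relative accuracy much
    further into the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def tail_asymptotic(x, terms=12):
    """Asymptotic series 1 - Phi(x) ~ (phi(x)/x)(1 - 1/x^2 + 3/x^4 - ...),
    usable for large x where the direct computations degrade."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    s, term = 0.0, 1.0 / x
    for k in range(terms):
        s += term
        term *= -(2 * k + 1) / (x * x)
    return phi * s

print(tail_via_erf(10.0))   # 0.0: erf(10 / sqrt(2)) has rounded to exactly 1
print(tail_via_erfc(10.0))  # ~7.6e-24, the correct order of magnitude
```

The asymptotic series agrees with the erfc-based value to many digits at x = 10, which is why switching between implementations at a moderate cutoff, as the appendix does, is viable.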
Turning to the function L(·) in equation (15), we compare in Figure 7b the result obtained from a built-in error function with the result obtained using the asymptotic expansion up to x to the 25th power. Again we see that the built-in error function fails to return a precise outcome for x greater than 4. More importantly, there is a discontinuous jump around x = 4, which cannot be completely eliminated even if we increase the order of the expansion. To fix this, we use the built-in function for x < 4, use the equation obtained from the asymptotic expansion for x > 5, and adopt a linear interpolation between the two. We also set the approximate value of L(x) for x > 15 to zero, since it can be shown analytically that L(x) → −0 as x → ∞.
Finally, a similar problem occurs for L(x)²/W(x) in equation (14). We use the built-in function for x < 4, use the equation obtained from the asymptotic expansion for x > 6, and adopt a linear interpolation between the two. We also set the approximate value of L(x)²/W(x) to 64/9 for x > 10, since it can be shown analytically that L(x)²/W(x) → 64/9 as x → ∞.
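The cross-fade between the built-in and asymptotic implementations can be written generically. The helper below is a sketch (names and default cutoffs are ours; the appendix uses cutoffs 4 and 5 for L(x) and 4 and 6 for L(x)²/W(x)):

```python
def blended(f_builtin, f_asym, x, lo=4.0, hi=5.0):
    """Evaluate via the built-in implementation below lo, via the asymptotic
    expansion above hi, and linearly interpolate between the two on [lo, hi],
    mirroring the scheme described in this appendix."""
    if x <= lo:
        return f_builtin(x)
    if x >= hi:
        return f_asym(x)
    w = (x - lo) / (hi - lo)
    return (1.0 - w) * f_builtin(x) + w * f_asym(x)

# Toy check with two constant stand-ins for the two implementations:
print(blended(lambda t: 0.0, lambda t: 1.0, 4.5))  # 0.5, midway through the fade
```

The linear fade removes the discontinuous jump at the switch point, at the cost of a small controlled bias inside the blending window.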
[Figure 7a: The black line represents f(x) in (21) computed using a built-in error function; the gray line represents the same function computed using the asymptotic expansion (up to x to the 25th power).]

[Figure 7b: The black line represents L(x) in (15) computed using a built-in error function; the gray line represents the same function computed using the asymptotic expansion (up to x to the 25th power).]