First digit law from Laplace transform

The digits 1 through 9 occur unevenly as the leftmost nonzero digit of numbers from real-world sources, following an empirical law known as Benford's law or the first digit law. It remains obscure why a variety of data sets generated from quite different dynamics obey this particular law. We study Benford's law by applying the Laplace transform, and find that the logarithmic Laplace spectrum of the digital indicator function can be approximately taken as a constant. This constant, being exactly the Benford term, explains the prevalence of Benford's law. The slight variation around the Benford term leads to deviations from Benford's law for distributions which oscillate violently in the inverse Laplace space. We prove that the whole family of completely monotonic distributions satisfies Benford's law within a small bound. Our study suggests that Benford's law originates from the way that we write numbers, and thus should be regarded as basic mathematical knowledge.


Introduction
There is an empirical law concerning the occurrence of first digits in real-world data: the leading digits of naturally occurring numbers are not uniformly distributed, as might be expected, but favor small values. More precisely, the probability that a number begins with digit $d$ is
$$P_d = \log_{10}\left(1 + \frac{1}{d}\right), \quad d = 1, 2, \ldots, 9, \tag{1}$$
as shown in Fig. 1. This is known as Benford's law, also called the first digit law or the significant digit law. It was first noticed by Newcomb [1] in 1881 and rediscovered independently by Benford [2] in 1938. Empirically, the areas of lakes, the lengths of rivers, the numbers on the front page of a newspaper [2], physical constants [3], stock market indices [4], file sizes in a personal computer [5], survival distributions [6], etc., all conform to this peculiar law well. Thanks to the powerful data analysis tools provided by computer science, Benford's law has been verified for a vast number of examples in various domains, such as economics [7,8], social science [6], environmental science [9], biology [10], geology [11], astronomy [12], statistical physics [13,14], nuclear physics [15,16,17], particle physics [18], and some dynamical systems [19,20]. There have also been many explorations of applications of the law in various fields, e.g., in improving the description of precipitation regime shifts [21]. Some applications focus on screening data and judging their plausibility, such as detecting fraud in taxation and accounting [22,23,24,25], fabrication in clinical trials [26], the authenticity of pollutant concentrations in ambient air [9], electoral fraud or voting anomalies [5,27], and falsified data in scientific experiments [28].
Figure 1: Benford's law of the first digit distribution, from which we see that the probability of finding numbers with leading digit 1 is more than 6 times larger than that with 9.
Moreover, the first digit law is applied in computer science for speeding up calculation [29], minimizing expected storage space [30,31], analyzing the behavior of floating-point arithmetic algorithms [31], and also for various studies in the image domain [32,33].
Theoretically, several elegant properties of Benford's law have been revealed. In mathematics, Benford's law is the only digit law that is scale-invariant [34,35], which means that the law does not depend on any particular choice of units. The law is also base-invariant [36,37,38], which means that it is independent of the base $b$. Data that fit the law in the decimal system ($b = 10$) also fit, in the octal system ($b = 8$), the hexadecimal system ($b = 16$), or any other base system, the general Benford's law
$$P_d = \log_b\left(1 + \frac{1}{d}\right), \quad d = 1, 2, \ldots, b-1. \tag{2}$$
The law is also found to be power-invariant [18], i.e., raising the numbers in the data set to any nonzero power does not change the first digit distribution.
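As a quick numerical illustration of the general Benford's law above, the following minimal Python sketch tabulates $\log_b(1 + 1/d)$ for an arbitrary base. The function name `benford_probs` is hypothetical and introduced here only for illustration.

```python
import math

def benford_probs(base=10):
    """Return {d: log_base(1 + 1/d)} for the leading digits d = 1 .. base-1."""
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}

if __name__ == "__main__":
    for base in (8, 10, 16):
        probs = benford_probs(base)
        # The probabilities telescope to log_base(base) = 1.
        print(base, round(sum(probs.values()), 12),
              [round(p, 4) for p in probs.values()])
```

In every base the probabilities sum to one, and in base 10 the ratio $P_1/P_9$ exceeds 6, in line with the caption of Fig. 1.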
There have been many studies on Benford's law with numerous breakthroughs. For example, Hill provided a measure-theoretical proof that Benford's law is equivalent to the scale-invariant property and that random samples taken from randomly selected distributions converge to Benford's law [37,38]. Pietronero et al. explained why some data sets, arising from dynamics governed by multiplicative fluctuations, naturally show scale-invariant properties and thus conform to Benford's law [39]. Gottwald and Nicol showed that deterministic, periodically or quasiperiodically forced multiplicative processes, and even affine processes, also tend to Benford's law [40]. Engel and Leuenberger focused on exponential distributions and illustrated that they approximately obey Benford's law within a bound of 0.03 [41]. Smith applied digital signal processing to distributions on the logarithmic scale and their frequency domain, revealing that the first digit law holds for distributions with no components at nonzero integer frequencies [42]. Fewster asserted that any distribution might tend to Benford's law if it spans several orders of magnitude and is reasonably smooth [43].
However, there are still various data sets that violate Benford's law, e.g., telephone numbers, birthday data, and accounts with a fixed minimum or maximum. It remains unclear whether Benford's law is merely a result of our way of writing numbers. If the answer is yes, why do not all number sets obey this law? If the answer is no, why is this law so common that it is a good approximation for most data sets? The situation is also reflected by some puzzles about Benford's law in the literature. For example, Tao states that no one can really prove or derive this law because Benford's law, being an empirically observed phenomenon rather than an abstract mathematical fact, cannot be "proved" the same way a mathematical theorem can be proved [44]. Aldous and Phan also suggested that, unless the assumptions of Benford's law are checked for the data sets under study, this logically correct mathematical theorem is not relevant to the real world [45].
Consequently, most studies of Benford's law in the literature are case studies, restricted to a specific probability density distribution or a group of them. In this work, we provide a general derivation of Benford's law by applying the Laplace transform, which is an important tool among the mathematical methods of physics [46]. From our derivation, we can safely assert that the deviation from Benford's law is always less than a small proportion of the $L^1$-norm of the logarithmic inverse Laplace transform of the probability density function. This bound is universal. Since the $L^1$-norm of the logarithmic inverse Laplace transform is usually small but not zero, Benford's law is commonly well obeyed but not strictly obeyed. We introduce a guideline to judge how well a specific distribution obeys Benford's law. In this method, the degree of deviation from the law is associated with the oscillatory behavior of the probability density function in the inverse Laplace space. We find that the whole family of completely monotonic distributions fulfills Benford's law within a small bound. We also carry out some numerical estimations of the error term, and present several examples which verify our method. We agree with Goudsmit and Furry [47] and show with our own method that the appearance of the first digit law is a logical consequence of the digital system, and is not due to some unknown mechanism of nature.
Our work is organized as follows. In Sec. 2 we introduce the digital indicator functions for the given digital system and put forward an intuitive explanation of Benford's law by revealing the heterogeneity of such functions among different first digits. In Sec. 3 we apply the Laplace transform to the digital indicator functions to reveal their elegant properties. From these, in Sec. 4 we provide a general derivation of a strict version of Benford's law and prove that the strict Benford's law is composed of a Benford term and an error term. In Sec. 5 we study the error term by applying our general result to four categories of number sets, which obey Benford's law to varying degrees. In particular, we prove that completely monotonic distributions satisfy Benford's law well. Numerical studies are also provided to verify our method. Sec. 6 is reserved for conclusions.

The intuition
Let $F(x)$ be an arbitrary normalized probability density function (PDF) defined on the positive real numbers $\mathbb{R}^+$ (here we use the capital letter $F$ instead of the lowercase one, following the conventions for the Laplace transform introduced in Sec. 4). It does not matter if negative data are allowed, for we can instead use the PDFs of their absolute values.
In the decimal system, the probability $P_d$ of finding a number with first digit $d$ is the sum, over all integers $n$, of the probabilities that the number lies within the interval $[d\cdot 10^n, (d+1)\cdot 10^n)$. Therefore $P_d$ can be expressed as
$$P_d = \sum_{n=-\infty}^{\infty} \int_{d\cdot 10^n}^{(d+1)\cdot 10^n} F(x)\,dx, \tag{3}$$
which can also be rewritten as
$$P_d = \int_0^{\infty} F(x)\, g_d(x)\,dx, \tag{4}$$
where $g_d(x)$ is the digital indicator function (DIF), indicating numbers with first digit $d$ in the decimal system (here the lowercase letter is used, also following the conventions of the Laplace transform). Using the Heaviside step function, denoted $\eta(x)$, we can write $g_d(x)$ as
$$g_d(x) = \sum_{n=-\infty}^{\infty} \left[\eta(x - d\cdot 10^n) - \eta(x - (d+1)\cdot 10^n)\right]. \tag{5}$$
Different first digits define different functions $g_d(x)$, which therefore behave differently in the digital system. For a better illustration, we plot $g_1(x)$ and $g_2(x)$ on the interval $[1, 30)$, as shown in Fig. 2. We notice that $g_2(x)$ is neither a translation nor a dilation of $g_1(x)$, and that the gap between the shaded areas in $g_2(x)$ is wider than that in $g_1(x)$. This fact intuitively explains the inequality among the 9 digits, where smaller leading digits are more likely to appear. Furthermore, if drawn on the logarithmic scale, $g_d(x)$ becomes a periodic function with a mean value of $\log_{10}(1 + \frac{1}{d})$. This gives us the intuition why $g_d(x)$ has a strong connection with Benford's law. In the following sections, through strict mathematical derivations, we verify our intuition and show that the Benford term comes exactly from $g_d(x)$.
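The two statements of this section can be checked numerically. The following Python sketch is an illustration under stated assumptions, not part of the derivation: it implements the DIF $g_d(x)$, evaluates $P_d$ as the sum over decades of the probability mass in $[d\cdot 10^n, (d+1)\cdot 10^n)$ for a distribution given by its cumulative distribution function, and estimates the mean of $g_d$ on the logarithmic scale. The helper names and the unit-rate exponential example are illustrative choices.

```python
import math

def first_digit(x):
    """Leading nonzero decimal digit of a positive float x."""
    # Scientific notation with 17 fractional digits keeps the leading digit
    # from being rounded upward (1.9999999999999998 must stay a "1").
    return int(f"{x:.17e}"[0])

def g(d, x):
    """Digital indicator function g_d(x): 1 if x has first digit d, else 0."""
    return 1 if first_digit(x) == d else 0

def first_digit_prob(cdf, d, n_min=-40, n_max=40):
    """P_d as the sum over decades of the mass in [d*10^n, (d+1)*10^n)."""
    return sum(cdf((d + 1) * 10.0 ** n) - cdf(d * 10.0 ** n)
               for n in range(n_min, n_max + 1))

def log_scale_mean(d, samples=20000):
    """Mean of g_d(10^u) over one period u in [0, 1), by midpoint sampling."""
    return sum(g(d, 10.0 ** ((k + 0.5) / samples))
               for k in range(samples)) / samples

def exponential_cdf(x):
    """CDF of the unit-rate exponential distribution (illustrative choice)."""
    return -math.expm1(-x)

if __name__ == "__main__":
    for d in range(1, 10):
        print(d,
              round(first_digit_prob(exponential_cdf, d), 4),
              round(log_scale_mean(d), 4),
              round(math.log10(1 + 1 / d), 4))
```

The middle column of the output reproduces the Benford term up to the sampling resolution, while the first column shows how a concrete distribution sits close to, but not exactly on, that term.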

The Laplace transform of the digital indicator function
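In this section, we study the Laplace transform of the digital indicator function (DIF) and show that the transformed DIF is also a log-periodic function, of the kind that frequently appears in various systems [48], and that it exhibits some elegant properties that point to Benford's law. For general cases, we define the DIF under base $b$ as $g_{b,d,l}(x)$, whose value is 1 for numbers within the interval $[d\cdot b^n, (d+l)\cdot b^n)$ for some integer $n$ and 0 otherwise, i.e.,
$$g_{b,d,l}(x) = \sum_{n=-\infty}^{\infty} \left[\eta(x - d\cdot b^n) - \eta(x - (d+l)\cdot b^n)\right],$$
where $\eta$ is the Heaviside step function. The Laplace transform of this general DIF is defined as
$$G_{b,d,l}(t) = \int_0^{\infty} g_{b,d,l}(x)\, e^{-tx}\,dx = \frac{1}{t}\sum_{n=-\infty}^{\infty}\left(e^{-t d b^n} - e^{-t (d+l) b^n}\right). \tag{8}$$
We turn to the logarithmic scale again and further define
$$H_{b,d,l}(s) = e^{s}\, G_{b,d,l}(e^{s}) = \sum_{n=-\infty}^{\infty}\left(e^{-d b^n e^{s}} - e^{-(d+l) b^n e^{s}}\right). \tag{9}$$
The properties of $H_{b,d,l}(s)$ are given as follows:
1. $H_{b,d,l}(s)$ is periodic with period $\ln b$, i.e., $H_{b,d,l}(s + \ln b) = H_{b,d,l}(s)$.
2. The mean value of $H_{b,d,l}(s)$ over one period is $\log_b(1 + \frac{l}{d})$.
The first property is obvious by expanding $H_{b,d,l}(s)$ from Eqs. 8 and 9, i.e.,
$$H_{b,d,l}(s + \ln b) = \sum_{n=-\infty}^{\infty}\left(e^{-d b^{n+1} e^{s}} - e^{-(d+l) b^{n+1} e^{s}}\right) = H_{b,d,l}(s).$$
For the second property, we have the mean value of $H_{b,d,l}(s)$ over one period,
$$\frac{1}{\ln b}\int_0^{\ln b} H_{b,d,l}(s)\,ds = \frac{1}{\ln b}\int_{-\infty}^{\infty}\left(e^{-d e^{s}} - e^{-(d+l) e^{s}}\right)ds = \frac{1}{\ln b}\int_0^{\infty}\frac{e^{-dt} - e^{-(d+l)t}}{t}\,dt = \log_b\left(1 + \frac{l}{d}\right).$$
With these two properties, it is straightforward to rewrite $H_{b,d,l}(s)$ as
$$H_{b,d,l}(s) = \log_b\left(1 + \frac{l}{d}\right) + \Delta_{b,d,l}(s), \tag{12}$$
where $\Delta_{b,d,l}(s)$ represents the periodic fluctuation of $H_{b,d,l}(s)$ around its mean value. Note that $H_{b,d,l}(s)$ is the logarithmic Laplace spectrum of the DIF. Therefore, it is independent of any particular distribution of number sets. The first term $\log_b(1 + \frac{l}{d})$ is here called the Benford term, and we will show in Sec. 4 that it is the origin of the classical Benford's law, while $\Delta_{b,d,l}(s)$ is responsible for the possible deviation from the law. We will show in Sec. 5 that this deviation is small for a large family of distributions.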
It is worth noting that the Benford term is derived merely from the DIF of a certain digital system without assuming the exact form of the PDF. Therefore, we assert that the origin of Benford's law comes from the way that the digital system is constructed, instead of the way that some specific number set is formed.
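The two properties of the logarithmic Laplace spectrum can be checked numerically. The Python sketch below assumes the explicit series $H_{b,d,l}(s) = \sum_n \left(e^{-d b^n e^s} - e^{-(d+l) b^n e^s}\right)$, truncated to a finite number of terms. The function names and the truncation length are illustrative assumptions.

```python
import math

def H(s, b=10, d=1, l=1, n_terms=60):
    """Truncated series for the logarithmic Laplace spectrum of the DIF."""
    t = math.exp(s)
    total = 0.0
    for n in range(-n_terms, n_terms + 1):
        a = float(b) ** n * t
        # exp(-d*a) - exp(-(d+l)*a), written with expm1 so that the tiny
        # terms at very negative n are not lost to cancellation.
        total += math.expm1(-d * a) - math.expm1(-(d + l) * a)
    return total

def benford_term(b=10, d=1, l=1):
    """The Benford term log_b(1 + l/d)."""
    return math.log(1 + l / d, b)

def period_mean(b=10, d=1, l=1, samples=2000):
    """Mean of H over one period [0, ln b), by the midpoint rule."""
    period = math.log(b)
    return sum(H((k + 0.5) * period / samples, b, d, l)
               for k in range(samples)) / samples

if __name__ == "__main__":
    print(H(0.3), H(0.3 + math.log(10)))   # periodicity in s
    print(period_mean(), benford_term())   # mean equals the Benford term
```

Because $H$ is smooth and periodic, the midpoint rule converges very quickly, so the printed mean agrees with the Benford term to near machine precision.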

The derivation of the general digit law
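We have seen that the logarithmic Laplace spectrum of the digital indicator function fluctuates around the Benford term. Another reason why we choose the Laplace transform is that the inverse Laplace transform can serve as a method to judge how well a specific PDF obeys the law, as well as to derive the general digit law. For an arbitrary PDF $F(x)$, we assume that it has an inverse Laplace transform $f(t)$ which belongs to $L^1(\mathbb{R}^+)$, satisfying
$$F(x) = \int_0^{\infty} f(t)\, e^{-tx}\,dt. \tag{13}$$
The probability that a number drawn from a data set with PDF $F(x)$ lies in $[d\cdot b^n, (d+l)\cdot b^n)$ for some integer $n$ is
$$P_{b,d,l} = \int_0^{\infty} F(x)\, g_{b,d,l}(x)\,dx. \tag{14}$$
We turn to the logarithmic scale again and define
$$\tilde f(s) = f(e^{s}), \tag{15}$$
then $\tilde f(s)$ also satisfies the normalization condition, i.e.,
$$\int_{-\infty}^{\infty} \tilde f(s)\,ds = \int_0^{\infty} \frac{f(t)}{t}\,dt = \int_0^{\infty} F(x)\,dx = 1. \tag{16}$$
According to the property of the Laplace transform, Eq. 14 can be rewritten in the inverse Laplace space of the PDF as
$$P_{b,d,l} = \int_{-\infty}^{\infty} \tilde f(s)\, H_{b,d,l}(s)\,ds. \tag{17}$$
Combining the expression of $H_{b,d,l}(s)$ in Eq. 12 and the normalization condition of $\tilde f(s)$ in Eq. 16, we derive the strict form of Benford's law, which is composed of a Benford term and an error term, as follows,
$$P_{b,d,l} = \log_b\left(1 + \frac{l}{d}\right) + \int_{-\infty}^{\infty} \tilde f(s)\, \Delta_{b,d,l}(s)\,ds. \tag{18}$$
Since $\Delta_{b,d,l}(s)$ fluctuates only slightly, the error term is small in most circumstances, as we illustrate further in Sec. 5. If we ignore the error term in Eq. 18, the strict Benford's law turns into the general digit law, i.e.,
$$P_{b,d,l} = \log_b\left(1 + \frac{l}{d}\right). \tag{19}$$
Many variations of the classical Benford's law can be seen as corollaries of the general digit law. For example, the base $b$ can be set to 100 to derive the second significant digit law given by Newcomb [1]. A number $(x)_{10}$ in the decimal system can be equally treated as a number $(x)_{100}$ in the base-100 system, so that the second digit of $(x)_{10}$ being $d$ is equivalent to either the first "digit" of $(x)_{100}$ belonging to the set $S_d = \{10 + d, 20 + d, \cdots, 90 + d\}$, or the first "digit" of $(10x)_{100}$ belonging to the same set $S_d$. These two events are mutually exclusive, and by the general digit law each has probability $\sum_{k=1}^{9} \log_{100}\left(1 + \frac{1}{10k + d}\right)$. Therefore, we have
$$P\left(\text{2nd digit of } (x)_{10} = d\right) = 2\sum_{k=1}^{9} \log_{100}\left(1 + \frac{1}{10k + d}\right) = \sum_{k=1}^{9} \log_{10}\left(1 + \frac{1}{10k + d}\right).$$
Similar reasoning can also be applied to the $i$th-significant digit law of Hill [38]: letting $D_i$ ($D_1, D_2, \ldots$) denote the $i$th-significant digit (with base 10) of a number (e.g., $D_1(0.0314) = 3$, $D_2(0.0314) = 1$, $D_3(0.0314) = 4$), then for all positive integers $k$, all $d_1 \in \{1, 2, \cdots, 9\}$ and all $d_j \in \{0, 1, \cdots, 9\}$, $j = 2, \cdots, k$, one has
$$P(D_1 = d_1, \ldots, D_k = d_k) = \log_{10}\left[1 + \left(\sum_{j=1}^{k} d_j \cdot 10^{k-j}\right)^{-1}\right].$$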
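The base-100 argument for the second significant digit can be checked with a few lines of Python. This is a minimal sketch: it evaluates the second-digit probabilities both through the base-100 route, $2\sum_k \log_{100}(1 + 1/(10k+d))$, and through the equivalent base-10 sum. The function names are hypothetical.

```python
import math

def second_digit_prob_base100(d):
    """P(2nd decimal digit = d) via first 'digits' 10k+d in base 100."""
    return 2 * sum(math.log(1 + 1 / (10 * k + d), 100) for k in range(1, 10))

def second_digit_prob(d):
    """Equivalent base-10 form: sum over the nine possible first digits k."""
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

if __name__ == "__main__":
    for d in range(10):
        print(d, round(second_digit_prob(d), 4),
              round(second_digit_prob_base100(d), 4))
```

The second digit is distributed far more evenly than the first: the probabilities decrease monotonically from $d = 0$ to $d = 9$ but stay close to 0.1.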

The error term
In this section, we introduce a method to judge how well a certain PDF obeys the classical Benford's law by analyzing the total error term in Eq. 18, which is the interrelation of $\Delta_{b,d,l}(s)$ and $\tilde f(s)$, i.e.,
$$\Delta_{\mathrm{total},b,d,l} = \int_{-\infty}^{\infty} \tilde f(s)\, \Delta_{b,d,l}(s)\,ds. \tag{22}$$
We know from Sec. 3 that $\Delta_{b,d,l}(s)$ is a $\ln b$-periodic function with a mean value of 0. For instance, a graph of $\Delta_{10,1,1}(s)$ is shown in Fig. 3. The amplitude of this periodic function is small compared with the Benford term, e.g., the amplitude of $\Delta_{10,1,1}(s)$ is less than 0.03 while the Benford term is 0.30. Therefore, intuitively speaking, if the logarithmic inverse Laplace transform $\tilde f(s)$ is smooth enough and changes slowly, its interrelation with $\Delta_{b,d,l}(s)$ tends to be averaged out, and the total error tends to be small. On the other hand, if $\tilde f(s)$ oscillates violently, the interrelation is highly sensitive to the exact form of $\tilde f(s)$, and the total error is likely to be large. More rigorously, we can classify real-world number sets into the following four categories, with different degrees of deviation from the classical Benford's law.
1. The error term is identically zero for scale-invariant distributions.
2. The error term is bounded by a proportion of the $L^1$-norm of $\tilde f(s)$ (denoted $\|\tilde f\|_1$), i.e.,
$$\left|\Delta_{\mathrm{total},b,d,l}\right| \le \|\tilde f\|_1 \cdot \max_s \left|\Delta_{b,d,l}(s)\right|. \tag{23}$$
When $\tilde f(s)$ oscillates mildly between positive and negative values, $\|\tilde f\|_1$ is close to 1, and thus the error term is small.
3. Specifically, if $\tilde f(s) \ge 0$ holds for all $s \in \mathbb{R}$, $\|\tilde f\|_1$ reaches its minimum value 1, so the bound is the tightest, i.e.,
$$\left|\Delta_{\mathrm{total},b,d,l}\right| \le \max_s \left|\Delta_{b,d,l}(s)\right|. \tag{24}$$
Such distributions are called completely monotonic distributions.
4. When $\tilde f(s)$ oscillates dramatically, its $L^1$-norm becomes large, so the error term becomes uncertain and the classical Benford's law is generally violated.
We prove and explain the above assertions one by one in the following sections. Some numerical examples are provided for better illustration.

Scale-invariant distribution
We first turn to scale-invariant distributions, which were initially discussed by Hill from the perspective of probability measure theory [37]. Pietronero et al. [39] have shown that scale-invariant distributions arise naturally from any multiplicative stochastic process, such as the dynamics of stock prices. With the Laplace transform, we can also show that scale invariance leads to Benford's law.
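Following the definition of Hill [37], a scale-invariant probability measure $P$ is a measure defined on the following $\sigma$-algebra
$$\mathcal{M} = \left\{ S \subset \mathbb{R}^+ : S = \bigcup_{n=-\infty}^{\infty} B \times b^n \text{ for some Borel set } B \subset [1, b) \right\},$$
satisfying $P(S) = P(\lambda S)$ for all $\lambda > 0$ and $S \in \mathcal{M}$. Hill proved that such a scale-invariant measure strictly satisfies Benford's law. We can easily prove this result again in the language of PDFs, if we set $S$ to be $\bigcup_{n=-\infty}^{\infty} [d, d+l) \times b^n$ and notice that the scale-invariance property implies the following statement, i.e., for all $\epsilon \in \mathbb{R}$,
$$\int_{-\infty}^{\infty} \tilde f(s + \epsilon)\, \Delta_{b,d,l}(s)\,ds = C$$
holds, where $C$ is independent of $\epsilon$. According to the periodic and zero-mean properties of $\Delta_{b,d,l}(s)$, we have
$$\int_0^{\ln b} d\epsilon \int_{-\infty}^{\infty} \tilde f(s + \epsilon)\, \Delta_{b,d,l}(s)\,ds = \int_{-\infty}^{\infty} \tilde f(s') \left[\int_0^{\ln b} \Delta_{b,d,l}(s' - \epsilon)\,d\epsilon\right] ds' = 0.$$
Thus, we get $\int_0^{\ln b} C\,d\epsilon = 0$, i.e., $C = 0$. We notice that $C$ is also the error term in Eq. 22. Therefore, such scale-invariant distributions conform strictly to the classical Benford's law.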
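A strictly scale-invariant density proportional to $1/x$ cannot be normalized on all of $\mathbb{R}^+$. As an illustrative stand-in, the Python sketch below assumes a density proportional to $1/x$ truncated to a whole number of decades $[1, 10^k)$, for which the first digit probabilities reproduce the Benford term up to floating-point error. The function names and the choice $k = 3$ are assumptions made only for this example.

```python
import math

def truncated_reciprocal_cdf(k):
    """CDF of the density 1 / (x * ln(10^k)) supported on [1, 10^k)."""
    upper = 10.0 ** k
    norm = k * math.log(10.0)
    def cdf(x):
        if x <= 1.0:
            return 0.0
        if x >= upper:
            return 1.0
        return math.log(x) / norm
    return cdf

def first_digit_prob(cdf, d, n_min=-5, n_max=10):
    """Sum over decades of the mass in [d*10^n, (d+1)*10^n)."""
    return sum(cdf((d + 1) * 10.0 ** n) - cdf(d * 10.0 ** n)
               for n in range(n_min, n_max + 1))

if __name__ == "__main__":
    cdf = truncated_reciprocal_cdf(3)
    for d in range(1, 10):
        print(d, first_digit_prob(cdf, d), math.log10(1 + 1 / d))
```

Each decade contributes the same mass $\ln(1 + 1/d)/(k \ln 10)$, so the $k$ decades together give exactly $\log_{10}(1 + 1/d)$.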

Small-$\|\tilde f\|_1$ distribution
Although scale invariance is common in nature, not all natural data sets are scale invariant. Even for data sets which are not scale invariant, Benford's law can still be a good approximation in most cases. This is because the error term in Eq. 22 is well bounded by the $L^1$-norm of $\tilde f$, as shown in Eq. 23.
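The proof follows from the fact that the $L^1$-norm of a function bounds its integral against any bounded function. According to Eq. 22, we have
$$\left|\Delta_{\mathrm{total},b,d,l}\right| = \left|\int_{-\infty}^{\infty} \tilde f(s)\, \Delta_{b,d,l}(s)\,ds\right| \le \int_{-\infty}^{\infty} \left|\tilde f(s)\right| \left|\Delta_{b,d,l}(s)\right| ds \le \|\tilde f\|_1 \cdot \max_s \left|\Delta_{b,d,l}(s)\right|,$$
where $\|\tilde f\|_1 = \int_{-\infty}^{\infty} |\tilde f(s)|\,ds$ is the $L^1$-norm of $\tilde f(s)$. In the decimal system, we numerically calculate the Benford terms (denoted $P^{B}_{10,d,1}$) and the maximum magnitude of $\Delta_{10,d,1}$ (denoted $\Delta^{\max}_{10,d,1}$) in Table 1. The relative errors $\delta^{\max}_{10,d,1} = \Delta^{\max}_{10,d,1} / P^{B}_{10,d,1}$ are also listed.
[Table 1: Benford terms $P^{B}_{10,d,1}$, maximum fluctuations $\Delta^{\max}_{10,d,1}$ and relative errors $\delta^{\max}_{10,d,1}$ for $d = 1, \ldots, 9$. Only a fragment of the relative-error row survives: 11.0, 11.3, 11.4, 11.5, 11.5, 11.6, 11.6, 11.7.]
From Table 1, we notice that when $\|\tilde f\|_1$ is small, Benford's law holds well, with a maximum relative error of $12\,\|\tilde f\|_1\,\%$ for all digits. The exponential distribution is a good example of small-$\|\tilde f\|_1$ distributions.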
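The quantities entering the bound can be estimated numerically. The Python sketch below is an illustration under stated assumptions, not the authors' computation. It assumes the series $H_{10,d,1}(s) = \sum_n \left(e^{-d\,10^n e^s} - e^{-(d+1)\,10^n e^s}\right)$, subtracts the Benford term to obtain $\Delta_{10,d,1}(s)$, and scans one period for the maximum magnitude and the corresponding relative error. The grid size and the series truncation are illustrative choices.

```python
import math

def H(s, b=10, d=1, l=1, n_terms=40):
    """Truncated series for the logarithmic Laplace spectrum of the DIF."""
    t = math.exp(s)
    total = 0.0
    for n in range(-n_terms, n_terms + 1):
        a = float(b) ** n * t
        total += math.expm1(-d * a) - math.expm1(-(d + l) * a)
    return total

def delta_max(d, b=10, l=1, samples=1000):
    """Maximum of |H - Benford term| on a uniform grid over one period."""
    benford = math.log(1 + l / d, b)
    period = math.log(b)
    return max(abs(H(k * period / samples, b, d, l) - benford)
               for k in range(samples))

if __name__ == "__main__":
    print("d  Benford term  max|Delta|  relative error")
    for d in range(1, 10):
        benford = math.log10(1 + 1 / d)
        dmax = delta_max(d)
        print(d, round(benford, 4), round(dmax, 4), round(dmax / benford, 4))
```

Multiplying the last column by $\|\tilde f\|_1$ gives the relative error bound for a given distribution.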
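The log-normal distribution with a large variance is another example. The PDF is
$$F(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right], \quad x > 0.$$
As long as $\sigma$ is not too small relative to the base $b$, $\tilde f(s)$ oscillates mildly between positive and negative values, so $\|\tilde f\|_1$ is also considerably small. For example, when $\mu = 5$ and $\sigma = 1$, from Eq. 23 we have
$$\Delta_{\mathrm{total},10,1,1} \in \left(\|\tilde f\|_1 \min\{\Delta_{10,1,1}\},\ \|\tilde f\|_1 \max\{\Delta_{10,1,1}\}\right) \subset (-0.048, 0.047),$$
which is also acceptable compared to the Benford term 0.301.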

Completely monotonic distribution
The family of completely monotonic (c.m.) distributions is a special case of small-$\|\tilde f\|_1$ distributions. Completely monotonic distributions are probability distributions with c.m. PDFs. This is equivalent to saying that $\tilde f(s)$ is nonnegative for all $s \in \mathbb{R}$. In this case, $\|\tilde f\|_1$ reaches its minimum value 1 due to the normalization condition in Eq. 16, and thus we get the tightest bound in Eq. 24. In fact, the exponential distributions and the scale-invariant distributions that we have discussed above are both c.m., but the family of c.m. functions is much richer. Miller and Samko [49] surveyed a series of useful properties of completely monotonic functions. Herein we summarize these properties again for the convenience of the reader.
4. If $F(x)$ is c.m., then $e^{F(x)}$ is also c.m.
From properties 4, 5, 6, 7, we can generate a large family of c.m. functions from the elementary c.m. functions in Property 3. For such a c.m. function $F(x)$ to be a valid PDF, it should also satisfy the normalization condition in Eq. 16. When
$$\int_0^{\infty} F(x)\,dx < \infty, \tag{35}$$
the normalization condition can be guaranteed by introducing a normalization factor. Several examples of c.m. PDFs are listed in Eq. 36, where the parameters are chosen so that the integral in Eq. 35 converges and the normalization factors are omitted. Distributions generated from these PDFs all satisfy Benford's law within the very small bound of Eq. 24. Some non-c.m. distributions can be converted to c.m. distributions through a non-linear transformation of the data. The literature has shown that non-linear transformations of some data sets yield more robust results when Benford's law is used to detect fraud [50]. To explain this, suppose $x$ is a random variable with PDF $F(x)$. If we transform $x$ into $y = \tau(x)$, then the PDF of $y$ becomes
$$F_Y(y) = \sum_{k=1}^{n(y)} F\left(\tau_k^{-1}(y)\right) \left|\frac{d\tau_k^{-1}(y)}{dy}\right|,$$
where $n(y)$ is the number of solutions in $x$ of the equation $\tau(x) = y$ and $\tau_k^{-1}(y)$ is the $k$th solution. One example of this case is the normal distribution with the PDF
$$F(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}.$$
Through the transformation $y = (x - \mu)^2$, the PDF becomes
$$F(y) = \frac{1}{\sqrt{2\pi\sigma^2 y}}\, e^{-\frac{y}{2\sigma^2}}, \quad y > 0. \tag{39}$$
Eq. 39 is c.m. Therefore, the transformed data set of $y$ fulfills Benford's law with an error bound of 0.03, the same as that of exponential distributions.
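The bound for the transformed normal distribution can be checked directly. The Python sketch below is an illustration, not the authors' computation. It uses the closed-form cumulative distribution function of $y = (x - \mu)^2$, namely $\mathrm{erf}\left(\sqrt{y/(2\sigma^2)}\right)$, and sums the probability mass over decades. The function names and the tested values of $\sigma$ are illustrative assumptions.

```python
import math

def squared_deviation_cdf(sigma):
    """CDF of y = (x - mu)^2 for x ~ Normal(mu, sigma^2); independent of mu."""
    def cdf(y):
        return math.erf(math.sqrt(y / (2.0 * sigma ** 2))) if y > 0 else 0.0
    return cdf

def first_digit_prob(cdf, d, n_min=-40, n_max=40):
    """Sum over decades of the mass in [d*10^n, (d+1)*10^n)."""
    return sum(cdf((d + 1) * 10.0 ** n) - cdf(d * 10.0 ** n)
               for n in range(n_min, n_max + 1))

def max_deviation(sigma):
    """Largest |P_d - log10(1 + 1/d)| over the nine leading digits."""
    cdf = squared_deviation_cdf(sigma)
    return max(abs(first_digit_prob(cdf, d) - math.log10(1 + 1 / d))
               for d in range(1, 10))

if __name__ == "__main__":
    for sigma in (0.5, 1.0, 3.0, 17.0):
        print(sigma, max_deviation(sigma))
```

For every $\sigma$ tried here the largest deviation stays below 0.03, as expected for a c.m. PDF.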

Violently-oscillating-$\tilde f(s)$ distribution
The only case in which Benford's law loses its power is when $\tilde f(s)$ oscillates violently between positive and negative values. The fast oscillation of $\tilde f(s)$ causes the small term $\Delta_{b,d,l}$ to be counted and accumulated again and again. Hence, $\|\tilde f\|_1$ becomes large, and $\tilde f(s)$ is highly sensitive to tiny perturbations of $F(x)$, reflecting the instability of the inverse Laplace transform [51]. Therefore, the total error is also highly sensitive to the exact form of $F(x)$, and Benford's law is generally violated in this case.
Examples of such violently-oscillating-$\tilde f(s)$ distributions are log-normal distributions with small $\sigma$ and uniform distributions. We have shown that the log-normal distribution with parameters $\mu = \ln 5$ and $\sigma = 1.0$ approximately conforms to the classical Benford's law. However, when $\sigma$ becomes smaller, the distribution concentrates on some specific first digits, as shown in Fig. 4 for $\sigma = 1.0, 0.5, 0.3$. In the logarithmic inverse Laplace space, we numerically calculate $\tilde f(s)$ by the Stehfest method [52,53,54] and plot the results together with $\Delta_{10,1,1}(s)$ in Fig. 5. We notice that $\tilde f(s)$ displays stronger oscillatory behavior as $\sigma$ decreases. Thus the interrelation between $\Delta_{10,1,1}(s)$ and $\tilde f(s)$ is highly sensitive to the exact form of $\tilde f(s)$, or to tiny perturbations of $F(x)$. In fact, we can numerically calculate $P_{10,1,1}$ directly from Eq. 4 for $0 < \sigma \le 1.5$, and the results are shown in Fig. 6. This verifies our prediction. When $\sigma$ becomes smaller, $\tilde f(s)$ oscillates more strongly and $P_{10,1,1}$ deviates further from the Benford term. For the uniform distribution on the interval $[1, a]$ ($a > 1$), $\tilde f(s)$ oscillates even more strongly. The PDF is given by
$$F(x) = \begin{cases} \dfrac{1}{a - 1}, & 1 \le x \le a, \\ 0, & \text{otherwise}. \end{cases}$$
We use an analytic function to approximate $F(x)$, and calculate the numerical values of the inverse Laplace transform for $a = 10$, 20 and 30, as shown in Fig. 7. Since $\tilde f(s)$ is extremely unstable for such distributions, we can expect the total error to be unstable as well, and generally large. Also for verification, we plot the numerical values of $P_{10,1,1}$ for $1 < a \le 50$ in Fig. 8. In this case $P_{10,1,1}$ depends greatly on the endpoints of the PDF: it occasionally coincides with the Benford term but generally violates the classical Benford's law. This is again expected.
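The direct calculation of $P_{10,1,1}$ can be sketched in a few lines of Python. The sketch is an illustration under stated assumptions, not the authors' code: it sums the probability mass in $[10^n, 2\cdot 10^n)$ over decades using closed-form cumulative distribution functions, with the log-normal parameter $\mu = \ln 5$ taken from the text. The function names are hypothetical.

```python
import math

def first_digit_prob(cdf, d=1, n_min=-40, n_max=40):
    """Sum over decades of the mass in [d*10^n, (d+1)*10^n)."""
    return sum(cdf((d + 1) * 10.0 ** n) - cdf(d * 10.0 ** n)
               for n in range(n_min, n_max + 1))

def lognormal_cdf(mu, sigma):
    """CDF of the log-normal distribution with log-mean mu and log-std sigma."""
    def cdf(x):
        z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
        return 0.5 * (1.0 + math.erf(z))
    return cdf

def uniform_cdf(a):
    """CDF of the uniform distribution on [1, a], a > 1."""
    def cdf(x):
        return min(max((x - 1.0) / (a - 1.0), 0.0), 1.0)
    return cdf

if __name__ == "__main__":
    benford = math.log10(2.0)
    for sigma in (1.0, 0.5, 0.3):
        p1 = first_digit_prob(lognormal_cdf(math.log(5.0), sigma))
        print("log-normal sigma =", sigma, "P1 =", round(p1, 4),
              "deviation =", round(p1 - benford, 4))
    for a in (10.0, 20.0, 30.0):
        p1 = first_digit_prob(uniform_cdf(a))
        print("uniform a =", a, "P1 =", round(p1, 4),
              "deviation =", round(p1 - benford, 4))
```

For the uniform distribution the result depends strongly on the endpoint: $a = 10$ gives $P_{10,1,1} = 1/9$, while $a = 20$ gives $11/19$.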
Finally, we want to rectify two typical misunderstandings about Benford's law, i.e.,
1. random data sets without human manipulation are supposed to fulfill Benford's law;
2. smooth distributions that span many orders of magnitude should satisfy Benford's law.
Unfortunately, neither of these two assertions is correct.
As we have shown, the first statement is only approximately true for random variables generated from PDFs with small $\|\tilde f\|_1$. Even natural random data sets can violate Benford's law if their PDFs oscillate violently in the inverse Laplace space. For example, many natural number sets are distributed normally or log-normally near their mean values with very small variances, such as the heights of all trees on the earth. Such data sets, although natural, do not obey Benford's law.
As for the second statement, whether or not a distribution satisfies Benford's law is determined by the shape, instead of the scale, of the PDF. Therefore, even an extremely flat PDF which spans many orders of magnitude may still violate Benford's law. To understand this, we note that one can change the scale of any PDF $F_1(x)$ by multiplying the original data by an arbitrary number $a$. The PDF of the new data set is
$$F_2(x) = \frac{1}{a} F_1\left(\frac{x}{a}\right). \tag{41}$$
When $a$ becomes larger, $F_2(x)$ is flattened out, and it can span as many orders of magnitude as we desire. However, the logarithmic inverse Laplace transforms of $F_1(x)$ and $F_2(x)$ differ only by a horizontal shift, i.e.,
$$\tilde f_2(s) = \tilde f_1(s + \ln a).$$
Such a horizontal shift does not change the unstable nature of the error term in Eq. 22.
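That rescaling only shifts the error, rather than reducing it, can be seen in a simple example. The Python sketch below assumes, purely for illustration, data $aX$ with $X$ drawn from the unit-rate exponential distribution, so that the PDF is $\frac{1}{a} e^{-x/a}$. The function names and grid are hypothetical choices.

```python
import math

def p1_scaled_exponential(a, n_min=-60, n_max=60):
    """P(first digit = 1) for a*X with X ~ Exp(1), summed over decades."""
    total = 0.0
    for n in range(n_min, n_max + 1):
        lo = 10.0 ** n / a
        # exp(-lo) - exp(-2*lo): mass of a*X in [10^n, 2*10^n)
        total += math.expm1(-lo) - math.expm1(-2.0 * lo)
    return total

def max_deviation_over_decade(samples=200):
    """Largest |P1 - log10(2)| as the scale a sweeps one decade [1, 10)."""
    benford = math.log10(2.0)
    return max(abs(p1_scaled_exponential(10.0 ** (k / samples)) - benford)
               for k in range(samples))

if __name__ == "__main__":
    for a in (3.7, 37.0, 3.7e6):
        print(a, p1_scaled_exponential(a))
    print(max_deviation_over_decade())
```

The deviation is periodic in $\log_{10} a$: stretching the distribution over more orders of magnitude returns exactly the same error, and the maximum over a decade stays just under 0.03.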
If we desire to reduce the error term, we need to flatten out $\tilde f_1(s)$ directly, e.g., into $\tilde f_2(s) = \frac{1}{\alpha} \tilde f_1\left(\frac{s}{\alpha}\right)$. When $\alpha$ becomes larger, the total error of Benford's law in Eq. 22, as the interrelation between a periodic function and an extremely flat $\tilde f_2(s)$, tends to vanish. In this case, the shape of the PDF $F_1(x)$ has been changed. In fact, when $\alpha$ is large enough and $f_1(1) \neq 0$ (this can be guaranteed up to a scaling factor in Eq. 41), $F_2(x)$ approaches the scale-invariant distribution, i.e.,
$$F_2(x) \approx \frac{f_1(1)}{\alpha x} \propto \frac{1}{x}.$$
Such an operation restores the power of Benford's law because the shape of the original PDF, not only the scale, has been changed.

Summary
The first digit law has revealed an astonishing regularity of natural number sets. We introduce a method based on the Laplace transform to study the law in depth. Our method can explain the long-standing puzzle about Benford's law, i.e., whether or not Benford's law is merely a result of the way of writing numbers. Our answer is yes, in the sense that the Benford term can be derived independently of any specific probability distribution. This does not conflict with the fact that when the $L^1$-norm of the logarithmic inverse Laplace transform of the PDF is large, Benford's law is generally violated.
Besides, the method sets a bound on the error term, allowing us to predict the validity of Benford's law from the logarithmic inverse Laplace transform of an arbitrary PDF. Real-world distributions can be categorized into four types, corresponding to their oscillatory behavior in the inverse Laplace space. A milder oscillation guarantees higher conformity to the law, and vice versa. In particular, the whole family of completely monotonic distributions obeys Benford's law within a small bound. Numerical examples are shown to verify our method. It is no longer surprising that Benford's law is so successful in various domains of human knowledge. Such a law should receive attention as basic mathematical knowledge, with great potential for wide application.