Multifractal methodology

Various methods have been developed independently to study the multifractality of measures in many different contexts. Although they all convey the same intuitive idea of giving a"dimension"to sets where a quantity scales similarly within a space, they are not necessarily equivalent on a more rigorous level. This review article aims at unifying the multifractal methodology by presenting the multifractal theoretical framework and principal practical methods, namely the moment method, the histogram method, multifractal detrended fluctuation analysis (MDFA) and modulus maxima wavelet transform (MMWT), with a comparative and interpretative eye.

From a mathematical point of view, the idea of applying fractal theory to measures was hinted by Mandelbrot as soon as 1982 [26], and was theorized more in depth later, notably by Evertsz and Mandelbrot [27], Brown, Michon and Peyriere [28], Olsen [29], Riedi [30], and Pesin [31] in the 1990s, and by Falconer [32] in the 2000s. Based on this theoretical framework, four principal multifractal methodologies have been established to solve practical problems.
The moment method was the first method to be introduced in the mid eighties [1,33,34]. It can still be considered the reference method in the field because of the simplicity of its implementation, adaptability to many types of data, as well as the existence of many variants to enhance its accuracy or computational efficiency. The histogram method [3,27,35] on the other hand improves greatly the run time over the moment method and is less reliant on error generating techniques. However, it only works for data offering a wide variety of scaling ranges. Multifractal detrended fluctuation analysis (MDFA) [36,37] is a generalization of detrended fluctuation analysis (DFA), which was originally created to detect long-range monofractal correlations in DNA nucleotide sequences [10,11]. It is used to remove artifacts created by nonstationarities in one-dimensional time series and uses the core idea of the moment method as its mechanic. The simplicity of its implementation allows to extend it to higher dimensions [15]. Wavelet transform modulus maxima (WTMM) is another method originally invented for time series [38,39]. Better suited for a gen-This diversity of practical methods as well as the variety of domains they can be applied to enlighten the depth of the multifractal formalism. It is also one of its drawbacks, since most methodologies have been developed independently so that multifractality lacks the unity present in some older fields. This article aims at reducing this drawback by presenting the main methodologies in a common intuitive and comparative framework. Its intent is to help making an informed choice of a multifractal methodology for someone willing to study real datasets.
The paper is organized as follows. Section II proposes a unified intuitive approach of the core concepts behind monofractals and multifractals. The general multifractal framework can be grasped without prior knowledge of the Hausdorff measure and the box-counting dimension, although to fully understand the details these definitions are essential. In section III, the main mathematical methodologies as well as the four practical multifractal methodologies mentioned above are detailed, compared and applied to binomial cascades. Each methodology is explained from the ground up and can be understood on its own, keeping in mind that the concepts explained in section II help to understand how they relate to one another. The main elements of interpretation as well as the limits of multifractal analysis are discussed in section IV.

II. FROM MONOFRACTALS TO MULTIFRACTALS
The purpose of this section is to give a brief heuristic approach of monofractals, simply referred to as fractals, and of multifractals. All subsequent practical definitions and methodologies, as unrelated as they may appear at first glance, are only different interpretations of the core concepts explained here.

A. Monofractals: Characterizing space
The use of the word fractal in various overlapping yet different contexts makes it quite confusing for someone new to fractality. The root of all its meanings was planted by Benoit Mandelbrot who coined it from the Latin word "fractus" which means "broken", as in "too irregular to fit into classical geometry" [26]. Over the years, some have restricted its use to sets which present self-similarity or to subsets whose dimension is intuitively a fraction of the integer dimension of the set they are embedded in. The reason why most of the focus has turned towards the former type of sets is that self-similarity makes most subtly different definitions equivalent and provides effective computational tricks for practical uses.
Informally speaking, given an irregular subset D of a space A whose properties are well known, the fractal analyst is usually interested in either quantifying how much of the set A is filled by the subset, or measuring the complexity of D through the scale invariance of its details. Both goals are achieved simultaneously by choosing a well adapted definition of fractal dimension and a method to compute it. In most practical situations, A will be in fact R n and the chosen dimension will be the box-counting dimension. Meanwhile, the mathematician may be more interested in the Hausdorff dimension, the canonical measure of local size. Other, more rarely seen, definitions include the correlation dimension for sets of random points [40] and the packing dimension, a dual to the Hausdorff dimension [41].
For most rigorously self-similar subsets encountered, Hausdorff and box-counting dimensions are in fact the same thing. Finding how much of the set A is filled by the fractal subset D is the same as finding by how much one needs to grow a sub-element of the figure to find the whole figure again. This is done through the relation dim H = − log(number of copies) log(scaling factor) , where dim H is the Hausdorff dimension. Because of this, self-similarity is often treated as a synonym of fractality in the literature. By extension, the word "fractal" is also used for phenomena described by self-similar functions, i.e. functions f : D ⊂ R n → R for which there exists an α such that and in particular power-laws [42], which are, in a sense, representations of scale invariance within the space D.
On the other hand, any dense subset made of a countable number of points is of dimension 0 for the Hausdorff dimension and of dimension equal to its closure for the box-counting dimension. For example Q ∩ [0, 1] in R is of dimension 0 for the Hausdorff dimension and of dimension 1 for the box-counting dimension. Those definitions are therefore far from equivalent in all generality. A detailed discussion on what elements are desirable to define a suitable fractal dimension can be found in chapter 3 of [32].
A simple example is given in Fig. 1. The middle third Cantor set is created from an initial segment of length 1, from which two sub-segments of length one third are extracted. This process is then repeated for each new segment, and so on. Since the resulting set is self-similar with two new copies of itself each scaled at a ratio of 1/3, one would get from equation (1) a Hausdorff dimension of log(2)/ log (3). Calculating directly the Hausdorff dimension without using equation (1) is more involving than one would expect even for such a simple set, hence the motivation to restrict the notion of fractals to self-similar sets. The value log(2)/ log(3) represents how much of the initial segment is still present in the Cantor set after an infinite number of iterations of the generating process.

B. Multifractals: A theory of measures
While monofractals are mostly concerned with spaces, multifractals deal with measures. Even if the idea behind multifractals is also to study the complexity and reveal the scaling properties of a mathematical object, those two concepts are distinct. Indeed, a measure can be a multifractal despite its support not being a monofractal [43]. Let us consider a subset D ⊂ R n on which are defined: • a "fractal" measurement method M; • a finite measure µ which we want to study.
Here, M can be any method providing a way to compute a monofractal dimension, such as those quoted in the previous section, as deemed appropriate for the nature of the space D.
A multifractal measure µ on D is characterized by a distribution such that around any x ∈ D, the measure in a ball of radius r around x scales with r, i.e. is proportional to r α for some α, provided r is small enough, and such that the sets formed by all points around which the scaling exponent is the same are monofractals for M. The fractal dimension of the set corresponding to the local exponent α is usually denoted f (α).
Most methods M are based on defining a self-similar local measurement M r , such as the number of boxes of radius r necessary to cover the set for the box-counting dimension or the quantity H s r in the definition of the sdimensional Hausdorff measure (see [32]). In that case, the multifractality of µ is equivalently characterized by a distribution such that the two following fundamental scaling relations hold for r small enough: 1. µ r (x) ∼ r αx for an α x around any x ∈ D, where µ r (x) is the measure in a ball of radius r around x; The multifractal spectrum is the curve f (α) against α. It gives, roughly speaking, the "fractal dimension" f (α) of sets where the measure scales locally with the same exponent α. Multifractal analysis should be understood as a method to characterize and compare measures defined on D when they present enough scaling properties to alleviate the intrinsic complexity of (D, µ).
An example is given in Fig. 2. The middle third Cantor set is made multifractal by weighting every right subinterval twice as much as every left sub-interval, the total weight being normalized to 1 at each step. The first three steps of this process are illustrated in the top figure. Denote by r k the size of the new sub-intervals at step k, and fix r 0 = 1. Then, at step k = 3, height sub-intervals are obtained, each of size r 3 = (1/3) 3 and carrying a weight that can be expressed as r α 3 for some α. At this macroscopic state, a broad M can be defined such that M r k (α) denotes the number of sub-intervals scaling with r k for an exponent α. This number can be in turn expressed as r Of course, at such a low level of iteration, M does not make much sense. But as k grows to infinity, the spectrum resulting from this M converges to the actual spectrum one would obtain for the Hausdorff measure and the proper totally disconnected weighted middle third Cantor set. The first 500 iterations are illustrated in Fig. 2. It can be noted that the multifractality comes from the measure created by the weights, not from the physical support itself which is only the monofractal Cantor set presented in the previous section. In particular, the dimension of the support, here log(2)/ log(3), can be found at the peek of the spectrum.

III. MULTIFRACTALS IN THE FIELD
In this section, different ways of defining the "multifractal spectrum" are given, along with methods to compute it wherever possible. Those are all equivalent definitions in the sense that they convey the same intuitive idea of providing a fractal dimension to iso-scaling sets in data. As such, it is expected that the spectra resulting from each of them will share the same symmetries. They are not equivalent however on a more precise level, and the resulting spectra may differ in width or lead to an overshooting of the f (α) value. The first formal definition should be considered as canonical and subsequent definitions may be considered as approximations of it.
Methods are sorted in three groups depending on the type of data best suited to support the studied measure: mathematical abstract sets, two-dimensional data such as maps and images, and one-dimensional time series. Nonetheless, all these methods can be extended quite easily to any subset of R n . Methods from the two latter groups are tested against two samples obtained from binomial cascades. The first sample corresponds to a theoretical binomial cascade of parameter p = 0.6, that is the measure one would obtain after averaging over an infinite number of random walks through the cascade, while the second one corresponds to one particular realization of a random walk through the cascade.

A. Abstract sets -Formal definitions
Falconer distinguishes two variants of spectra of particular interest for mathematicians [32]: The singularity spectrum, which is the most canonical definition and encompasses the universality sought in mathematics, and the coarse spectrum, which is more adequate for practical purposes.
Consider a topological space D and a finite measure µ on D. The local scaling exponent α x of µ at x ∈ D is given by the Hölder dimension dim loc , defined by where B(x, r) is the ball of center x and radius r for the topology of D. The singularity spectrum is then defined by the function where dim H is the Hausdorff dimension. Note that the Hausdorff dimension is chosen for M instead of box-counting since {x ∈ D, dim loc µ(x) = α} is often dense in the support of µ, in which case boxcounting would give a constant spectrum equal to the dimension of the support of µ.
Let us now consider an r-mesh grid covering D and count the number of cells for which µ is roughly r α . Define, where # stands for "the number of". Provided the limits do exist, the coarse spectrum is defined by the function where log + (·) stands for max(log(·), 0). When f C does exist, then for all α, and the equality holds true for self-similar measures (Proposition 17.9 of [32]). When f C does not exist, one can define the lower and upper spectra by In that case, according to lemma 17.3 of [32], An example is given in Fig. 3 for a binomial cascade of parameter p = 0.6. The binomial cascade is a simpler version of the multifractal middle-third Cantor set introduced in section II B. Here, an original interval of size 1 is divided into two sub-intervals of length 1/2 carrying a probability 0.6 for the left one and 0.4 for the right one. This process is then iterated on each resulting sub-intervals, and so on. The resulting singularity spectrum f H in the bottom figure is computed using the same trick as in section II B and is identical to the coarse spectrum f C = f C (α) =f C (α), since the measure is selfsimilar. Here, the fractal dimension of the support is 1 since the iterative process does not create "holes" in the initial segment.

B. Spatial data -Moment and histogram methods
Moment and histogram methods are the elementary components of practical multifractal spatial data and image analysis. Both methods rely on counting the measure at different levels of aggregation and include a multitude of variants depending on the aggregation method chosen. The most basic way to aggregate consists in applying square grids of increasing resolutions to the data. Instead of grids, one can use ball neighborhoods of increasing radius as they can be easily calculated for twodimensional geographic data by GIS software, or gliding boxes to increase the number of data points. Grids can also be based on any regular unit shape, such as diamonds or equilateral triangles, instead of squares, to enhance computational complexity or suitability with the data. Other variants exist to counter the reliance on error generating techniques such as linear fits or Legendre transforms. The core mechanics of both methods will be detailed and reference to the literature will be given for their variants.
Let us consider a mesh grid of unit r covering a domain D, and a phenomenon occurring N times in D.
is the probability that an instance of the phenomenon occurs in the i th box. To interpret the two scaling rules from section II B, one simply needs , and ρ is a density function used to take into account the dimension of D.
To effectively compute f , the trick generally used is the moment method [1,33,34]. By raising p i to its moment p q i for different q, one can force only one value of alpha for each q to make a significant contribution to the total value of the measure. Consider then, for r small enough, the value of Z(q) is almost entirely given by the α such that is minimal. Let us call α(q) this value of α. It is easy to show by a Legendre transform that the minimality condition yields and so that computing τ (q) from Z(q) for each q between −∞ and +∞ is enough to obtain the full spectrum. It is not possible in practice to use an infinite range of values for q, nor is it desirable since the method becomes less and less accurate for extreme values of q. To select an appropriate range of q, one should set a threshold for the error generated by linear fitting and dismiss all the values of q for which the threshold is exceeded. It is also necessary to select q in order to ensure that f (α) > 0 and that the generalized dimension D q , defined by D q := τ (q)/(q − 1), remains lower than the dimension of the physical support of the phenomenon. The reason for that last constraint will become clear when the physical meaning of D q will be explained in section IV.
In practice, τ (q) is found directly as the slope in a loglog plot of i µ q i (r) versus r obtained for different grid sizes r, where µ i (r) is the total measure of cell i of size r. Since this slope is independent of the normalization of the measure µ, µ does not need to be weighted as a probability measure. Explicitly, τ (q) is found as the limit In accordance with the idea that practical methods tend to create overshooting, one finds that where f M is the spectrum resulting from the moment method (corollary from Proposition 17.2 of [32]). Both finding τ (q) through linear fitting and applying numerical Legendre transforms have a cost on the accuracy of the results. The possibility of averaging over several samples can be extremely beneficial. There are two ways of doing this: averaging over a range of {f j (α i )} j computed independently for different samples j, or averaging first over a range of {N j (α i )} j and then deducing the corresponding f (α i ) from the relation N (α) ∼ ρ(α)dαr −f (α) on the averaged values. The first solution guarantees to obtain a "classic" positive spectrum, but it can be unreliable if the fluctuations between the f j (α i ) are too important. The second solution is more reliable, but may create an artificial negative part in the spectrum if N (α i ) falls below 1 for some α i as a result of the averaging process.
Chhabra and Sreenivasan argue in [44] that this artificial negative part can still be of relevance when a strong underlying probabilistic process is suspected either as a cause of the phenomenon or as a result of the experimental methodology since it could describe the rarely occurring events. Unfortunately, since α → N (α) decreases exponentially compared to α → f (α) in the negative part, one would need an exponentially increasing number of samples as the resolution gets smaller to maintain accuracy while supersampling. Paradoxically, for a constant number of samples, a better resolution would mean a less accurate result.
A multiplier method is presented in [44] to tackle this problem. The self-similarity of the measure implies the existence of an underlying scale-invariant multiplier distribution such that the α i at resolution r k are only the result of k composition of said multipliers. If there is no correlation in the underlying probabilistic process from resolution r k−1 to resolution r k for some k, then one can deduce the multipliers and hence α. In particular, if all levels of resolution are uncorrelated, one can choose k = 1, otherwise, one should choose the smallest k for which a level of resolution is uncorrelated to the previous one.
Denote r 0 the minimal resolution and r k the resolution chosen as described above. Then, define and for each sample j and box i, Then, according to [44], τ (q) and α(q) are given by .
where N (r) is the number of non zero values of M ij and d is the dimension of the physical support D.
Chhabra and Sreenivasan have shown that for a binary cascade and the dissipation field of fully developped turbulance in the atmospheric surface layer, using the multiplier method allowed to expand the negative part of the spectrum and make it converge to the theoretical result more rapidly than the supersampling method, with also a gain in computational complexity.
Another way to expand the set of sample points consists in using one grid and aggregate with a gliding box for different radii of said gliding box instead of using different grid sizes [13,45]. In that case, the corrected formula reads where N (r) is the number of gliding boxes of size r with non zero measure, µ q i (r) is the measure inside the i th gliding box, and d is the dimension of the physical support D.
Since gliding boxes need not be mutually exclusive, contrary to squares from a mesh grid, the number of values contributing to the analysis remains that of the smallest resolution at all scales. The trade-off is that only boxes which are completely bounded in D should be included, so that only the "inner portion" of the data can be analyzed, or the object of study needs to be surrounded by a large neighborhood of known values (see Fig. 4). Using gliding boxes allows a higher raw number of sample points at the cost of restricting the range of study. It is of course possible to join gliding boxes and the multiplier method by adapting the definition of µ ij and N (r) in equations (15) and (16).
In [2], Chhabra et al. propose a recipe to avoid the Legendre transform of τ (q) when the measure arises from Then, the Legendre transform can be directly integrated in the calculation of f and α through the formulas Note that here α(q) is the average value of α at resolution q. Unfortunately, this recipe does not remove the need for linear fitting when calculating the limits, which is usually the main cause of error. It was applied to twoscaled cantor measures in [46] with good results when the boxes size progression matched the sub-intervals size progression, and "satisfactory" results otherwise despite the errors created by linear fitting. It was also found in [2] that the result of this direct computation were in good agreement with those obtained from Legendre transforming τ (q) for fully developed turbulence. Another direct approach is the histogram method, see for example [3,35]. The idea consists in finding the cells with extremal values of total measure for different grid resolutions, and dividing the distance between those values into regular intervals to exploit the fact that exactly one value of α and f (α) will correspond to an extremity of one of the new sub-intervals.
Let us call µ k i the total measure of cell i of a grid of unit r k , and N (X) the number of boxes presenting feature X.
Step by step, the method breaks down as follows.
1. Compute X k i := log(µ k i ) for each cell i of different grids of unit r k ; 2. Divide [X k min , X k max ] regularly in n smaller intervals for each k, where X k min := min{X k i } and X k max := max{X k i }; 3. Deduce one value of α and f (α) from the slopes of X k and N (X k ) versus log(r k ) for each sub-interval; 4. Repeat for different grid positions to get a better estimate.
In step 3, for 1 ≤ j ≤ n, the value α j is given by the slope of X j,k versus log(r k ) where X j,k is one of the extremities of the j th sub-interval for grid resolution r k . According to [3], the correct normalization of the total measure leads to an expression of f (α) as the slope of log(N (X j,k )∆X 1/2 ) versus log(r k ), where N (X j,k ) is the number of boxes of size r k containing an X falling in the interval of size ∆X around X j,k .
It is indeed a problem to find the correct normalizations of α and f (α) because the relations p i ∼ r αi and N (α i ) ∼ ρ(α i )dα i r −f (αi) depend on prefactors that are unknown a priori. Such a problem is absent in previous methods since those factors are canceled while taking the limit for r → 0, which is not done here.
Meneveau and Sreenivasan have applied the histogram method in [3] for a binomial measure, a period doubling attractor for a specific logistic map and the dissipation field of turbulent kinetic energy in turbulence flows. They found good agreement with the results obtained from the moment method for the first two cases but it was evidenced that errors are generated by the histogram method for measures with small scaling ranges such as the third case. In fact it was evaluated that the exponent found by this method is only accurate up to order log(L/r) −2 , where L represents a characteristic value intrinsic to the problem (a translation of the unknown prefactor). It was recommended to use this method only for measures such that the largest obtainable scale is at least 10 3 times bigger than the smallest measurable scale.
The spectra resulting from moment and histogram methods applied to binomial cascades are given in Fig. 5-7. In Fig. 5, the plots on the top are obtained for the theoretical binomial cascade of probability p = 0.6 introduced in section III A, that is the result of averaging an infinite number of random walks through the iterative process defining the cascade. The plots on the bottom are obtained for one particular realization of a random walk through the cascade. The iterative process is stopped after the 20 th step for which over a million sub-intervals have already been created. This is done to ensure a reasonable run time and use of memory, and because it is comparable to the size of many encountered data types such as pixels of an HD image or data collected from a city-size human settlement. Unfortunately, it also means that the studied cascades are not equivalent to the one used in Fig. 3, for which the iterative process is repeated an infinite number of times. As such the resulting spectra are expected to be similar, but not necessarily identical to the theoretical curve of Fig. 3 depending on the sensitivity of the chosen multifractal method. The chosen range of q for the moment method goes from −20 to 20.
The results of the standard moment method and its variants, the gliding box and the multiplier methods, are illustrated in Fig. 5. In the theoretical case on the top, the range of α is a lot narrower than what is expected based on the reference curve (Fig. 3). The variants help improve the situation, but not by much. This problem is due to the fact that the iterative process was stopped too soon. The top figure of Fig. 6 evidences that stopping the iterative process at step 25 results in a wider spectrum (circles) than stopping it at step 10 (triangles). Aside from this problem, the resulting spectrum is similar to the reference one and a simple rescaling of the range of α is enough to make both spectra harmonious.
For the particular realization of the cascade on the bottom of Fig. 5, the accuracy of the standard and gliding box moment methods deteriorates rapidly for negative q. The multiplier method gives the best results overall. To keep the curve above zero, the range of q had to be restricted to q ≥ −4 for the first two variants and to 17.5 ≥ q ≥ −10 for the multiplier variant.
Results of the histogram method applied to the binomial cascades can be found in Fig. 7. Unfortunately, for such a small range of scaling, the histogram method is not well adapted and the resulting spectra are not smooth. The error generated by the method makes results difficult to interpret in this case. It has however the advantage of being faster than the moment method and gives a range of α closer to the reference in this particular case.

C. Time series -MDFA & WTMM
Some methods have been independently developed for the specific purpose of studying time series, an important object in physics. They are therefore particularly well suited for one-dimensional data, but can be extended to any dimension at the expense of computational complexity. For our purpose, time series will be defined as a one dimensional array of discrete values representing observations taken at regular intervals.
Multifractal Detrended Fluctuation Analysis (MDFA) is thoroughly described in [36,37]. In the basic approach, time series are first sub-divided into smaller segments on which is subtracted a least-squares best-fit polynomial of a chosen order to remove the artifacts created by nonstationarities in the time series. A method similar to Here, h(q) is the hurst exponent, which relates to the classical τ (q) through the relation τ (q) = qh(q) − D f , where D f is the fractal dimension of the physical support of f . The second step is not compulsory but is helpful in the sense that it allows the use of simple polynomials of the form a n i n + · · · + a 0 with i ∈ N to detrend in step three. It should be noted that the expression of F q given above is not well defined for q = 0. It is indeed necessary to set According to [36], MDFA works only for positive h and becomes inaccurate for h close to 0. A solution consists in integrating by considering the sum F (·, s) instead of F . Following the same steps, one would obtain h(q)+1 instead of h(q).
The use of τ (q) as an intermediary step is given to link the method to previous techniques (see equation (10) and (11)), but one can compute directly α and f (α) using the expressions An original application of MDFA, presented in [36], is to distinguish the underlying cause of multifractality between long-range correlations and a broad probability density function. Indeed, if one shuffles the time series, all correlations are destroyed. Hence, when applying MDFA to the shuffled time series, if the resulting spectrum shifts towards monofractality, that is if the shuffled h is constant, then multifractality is probably due to long range correlations. Obviously, multifractality can have several causes, but an important influence of correlations should be noticeable in the alteration of h.
MDFA can be extended to 2 or more dimensions using multivariate polynomials, as proposed in [15]. The 3 dimensions extension is particularly useful to study the evolution of a two-dimensional spatial pattern simultaneously in space and time. Unfortunately, the necessity to choose a common array of s for all directions at the same time, and therefore constraining the precision and accuracy of the method to the direction along which the data is the most scarce or irregular, as well as the rapidly growing computational complexity are significant limiting factors.
Wavelet Transform Modulus Maxima (WTMM) replaces square intervals (and square grids for higher dimensions) by highly customizable wavelets. If those are chosen orthogonal to low order polynomials, a natural detrending happens. The "modulus maxima" part of the name refers to an observation that analyzing the data along maxima lines is enough to bring out the underlying multifractal structure. With a judicious choice of wavelets, one can therefore integrate the advantages of MDFA and extend it efficiently to higher dimensions while maintaining adequate computational complexity. The wavelet transform idea is introduced in [38] and a more extensive study of WTMM can be found in [39].
Consider a function f : R → R representing either a continuous signal or the interpolation of a time series and a wavelet ψ orthogonal to low-order polynomials. The wavelet is a real valued function, preferably of zero mean to ensure the method is invertible. WTMM is divided in the following steps.

Operate the wavelet transform by defining for any
x 0 : 2. Sum along the local maxima lines L(r) at scale r: 3. Find the scaling relation Z q (r) ∼ r τ (q) .
The set of maxima lines L(r) is defined as follows. Consider the set of extrema L(r) defined by Then, the set {(x, r), x ∈ L(r)} is formed of connected curves called maxima lines. The set L(r) is then obtained as the set of all maxima lines defined for all r ≤ r. Explicitly, Analyzing wavelets can be obtained from several ways. A classical one is to use the successive derivatives of the Gaussian function exp(−x 2 /2). Indeed, the derivative of order n is orthogonal to polynomials of order up to n and of zero mean if n is greater than 1. See Fig. 8 for a representation of these wavelets for order 0 to 5.  Another possible way is to process convolutions of the unit box over Dirac type distributions. On Fig. 9, three successive convolutions of three variants of Dirac distributions are represented. The plot Dij is obtained from the Dirac distribution Diraci by applying j number of convolution. Note that only the last two Dirac distributions produce zero mean wavelets and that the unit box has been centered on 0 for aesthetic preferences.
WTMM can be easily extended to n dimensions by considering the wavelets formed by the partial derivatives ψ = (ψ 1 , · · · , ψ n ) of a function φ such as the Gaussian function exp(− |X| 2 /2), where X = (x 1 , · · · , x n ). The wavelet transform is then replaced by the higher dimensional version More details on the two-dimensional case and examples are provided in [39]. In Fig. 10, MDFA is applied to the theoretical binomial cascade and to the random realization of it already used in section III B. When the values of s are chosen as powers of 2 in the theoretical case, the N s intervals are only translated copies of themselves, resulting in a completely flat spectrum (on the top). MDFA is particularly well suited for data such as the random realization (on the bottom) and gives the closest results to what is expected from the mathematical study (see Fig. 3), with only a slight offset to the left of the range of α which is explained by the fact that the iteration process generating the cascade was stopped at a relatively low level of iterations.

IV. INTERPRETATION AND LIMITS
The main elements of interpretation are presented in IV A and links are established with some classical measures of heterogeneity such as Shannon's entropy. The type of information one can expect to gain from multifractal analysis is illustrated with the binomial cascade and results from a study on the multifractality of London's street network [20]. In IV B, limits of multifractal analysis are discussed, in particular the lack of consistency between methods.

A. Link to usual measures and interpretation
Recall equation (8) and introduce the approximation that only the exponent τ (q) = α(q)q − f (α(q)) is making a significant contribution to the value of Z(q).
Then, one defines the generalized dimension as the family {D q } q , where Three values of the generalized dimension are of particular interest. D 0 is the usual box-counting dimension of D and therefore gives information on how much the data fills its physical support. D 1 , referred to as the information dimension, relates to Shannon's entropy and captures how even the data density is, with higher values of D 1 meaning a more uniform density. D 2 is the probability of pairs of independent events occurring in the same box and measures how scattered the data is, with increasing compactness for increasing values of D 2 . It is similar to the correlation dimension quoted in II A, see [40]. The full plot of D q versus q is representative of the strength of the multifractality of µ. The more constant the plot is, the weaker the multifractality is. See Fig. 11. It is also a good way to select the range of q to study. Indeed, equation (30) may lead to obtaining D q values greater than d the dimension of D, in particular for negative values of q. As such a D q would loose its physical meaning, one may want to restrict the range of q to ensure D q ≤ d.
Another expression to obtain D q for q = 1 directly from f (α) and α is It is mainly useful for the histogram method since τ (q) is never calculated when applying it. The limit of the generalized dimension is that it only gives global measurements of the whole data. In contrast, the multifractal spectrum gives one dimension for each set where the data scales similarly. In a sense, the variable q selects different resolutions, with higher values of q selecting a local scaling α(q) of lower order. The variable f (α(q)) then gives the local fractal dimension at resolution q.
It is easy to see that the spectrum's peak is achieved for q = 0, where f (α 0 ) = D 0 is the fractal dimension of the physical support. Of particular relevance are therefore the asymmetries between the left and the right part of the spectrum. The spread of α indicates the variety of scaling present in the sample while the value of f (α) indicates the strength of the contribution of each α.
For the binomial cascade, the range of α(q) is symmet-ric relatively to q = 0 and the spectrum is quite "round", denoting a well balanced repartition of each scaling and its contribution. A more interesting example is given by the evolution of London's street network. Fig. 12 presents a clear picture of the structural differences that the London's street network has experimented in the last 200 years. The strong differences in value and shape between the first and last years is an evidence of how this street network evolved to lose its multifractal nature, and has become more homogeneous in terms of intersection density across the whole city.
Firstly, the singularity exponent α(q) accounts for the balance between areas with more/less street intersections (Fig. 12, top). In 1786, the α(q) values for positive q are relatively low compared with the (q) values for negative q, a situation that confirms that the number of areas with major intersection densities are not that common. As we move forward in time, these differences are less and less evident, until 2010, where basically the same density can be found across the whole network.
The multifractal spectrum curves (Fig. 12, bottom) represent the distribution of intersection densities between the different regions. The left part of each curve (related with positive q, i.e., with the denser areas) became less and less wide as we move forward in time, while the right section remains stable for all nine networks. This is a clear indicator of how most of the network is evolving to become more similar through its different areas.
One last property of the curve f (α) versus α can be used to test the correctness of the result: since µ is finite, equation (12) implies that τ (1) = 0 and, consequently, that f (α(1)) = α(1). Moreover df (α)/dα = q = 1, implying that the spectrum lies below the diagonal and touches it exactly in the point corresponding to q = 1. See Fig. 13 for the middle third Cantor set.

B. Limits
Using a multifractal study written by someone else can be a perilous initiative. The definition of what is called here M is often left implicit and some authors restrict the analysis to either the multifractal spectra or the generalized dimension, or even to the three values D 0 , D 1 and D 2 only. Furthermore, most methods are equivalent for rigorously abstract self-similar sets and measures, but practical estimations often create an overshooting that is hard to measure. It is unfortunately a necessity to tailor the experimental protocol to the type and amount of data studied. One should therefore be extremely careful when comparing analyses made under different circumstances.
Results of isolated multifractal studies should be seen as giving "trends" or describing an evolution. In [20] for example, the loss of multifractality in London's street pattern through time is striking. The study provides rather precise information on which part of the spectrum is collapsing and from what point in time it becomes FIG. 12. London's street network multifractality (taken from [20]). On the top, the curve α versus q shows the variety of zones with low probabilities of intersection not evolving much over the years (left part of the curve), while disappearing in zones corresponding to higher probabilities (right part of the curve). On the bottom, the spectrum indicates an evolution that favours more and more areas with an increasingly lower density of intersections while presenting less and less variety.
FIG. 13. Tangency with the diagonal. On the spectrum obtained for the middle third Cantor set, the diagonal f (α(q)) = α(q) is tangent to the spectrum on the point where q = 1. noticeable, only because it is consistently applied over sufficiently similar datasets of London's street network.
Another main issue when computing the spectra is controlling the error. To the theoretical error inherent to the chosen method one has to add the practical concessions for computational efficiency, and the measurement error that inevitably arises from the large amount of data required. The most common sources of error are the edge effects and the linear fit that is generally used to deduce τ (q) from the slopes of Z(q) against r in log-log plots. The value τ (q) is quite sensitive and its computation often has to be automated and operated over only few values of r.
Choosing an adequate range of q is critical for moment based methods (including MDFA). In theory, q should go from −∞ to +∞, which is not possible in practice. For negative values of q, one may find values of D q greater than the dimension of the support of the measure, indicating that those values of q should be discarded. Furthermore, values for which the linear fitting of Z(q) versus r is not obtained with a sufficiently good level of confidence should also be discarded. For example, Lee and Stanley found in [47] that for diffusion-limited aggregation, a phase transition occurs. Below some critical value of q, the function Z(q) does not scale as a power law and forcing a linear fit for these values would create important discrepancies.
As evidenced through the example of the binomial cascades, the smallest resolution allowed by the studied dataset may play an important role in the results' accuracy. Multifractality aims at studying measures at an infinitely small scale and a scale as small as 10 −6 might not be small enough to be considered "infinitely" small. In particular, when comparing two similar objects for which the measure cannot be evaluated with the same level of precision, one should use the largest minimal resolution of the two as the starting one for both.
It should be noted that predicting beforehand the shape of the spectrum is often difficult. There exist exactly self-similar nonrandom measures for which half of the spectrum is not defined [48], proving that unusual shapes may be normal. Another remark is that if µ 1 and µ 2 are finite measures on R n with disjoint supports, then it can be proven that where f µ H is the fine spectrum of measure µ as defined in III A. In particular, f H (·) does not need to be concave.

V. CONCLUSIONS
Four methods were presented in the multifractal context. They are the moment method, the histogram method, MDFA and WTMM. Each is particularly well suited for a particular type of data: spatial data for the moment method and also for the histogram method provided the range of scaling is large enough, onedimensional time series for MDFA and WTMM, with a convenient possibility to extend to higher dimensions for the last one.
Through the study of binomial cascades, it was found that the MDFA method gave the best results in addition to the possibility to detect the source of multifractality by applying the shuffling procedure. The moment method proved both simple and reliable with many variants giving it great adaptability to different contexts.
It was finally evidenced that multifractality can provide useful information about both the local and the global complexity and inhomogeneity of a phenomenon. In addition, it can be an effective comparative tool for measures and spaces that are too complex for classical geometry provided the methodology used is consistent.
It was emphasized that one should focus on trends and evolutive aspects in a phenomenon rather than expect to obtain a foolproof numerical result.

APPENDIX: R CODES
The codes implemented in R for the first three methods described in this paper will be available at the following link.