Toward a deeper understanding of a basic cascade

Towards the end of the last century, B. Mandelbrot saw the importance, revealed the beauty, and robustly promoted (multi-)fractals. Multiplicative cascades are closely related and provide simple models for the study of turbulence and chaos. For pedagogical reasons, but also due to technical difficulties, continuous stochastic models have been favoured over discrete cascades. Particularly important are the $\alpha$ and the $p$ model. It is the aim of this contribution to introduce original concepts that shed new light on a variant of the latter paradigmatic cascade and allow key features to be derived in a rather elementary fashion. To this end, we introduce and study a discrete version of the $p$ model which is based on a new kind of sampling. Technical machinery can be kept simple, therefore proofs are straightforward and formulas are explicit. It is hoped that the proposed line of investigation may enhance understanding and simplify received multifractal analyses.


Introduction
Early examples of fractals were provided by, among others, the mathematicians Weierstraß, Cantor and Peano. Later, upon studying dynamic systems, chaos and turbulence, physicists found similar patterns. Pioneering work dates back to the first half of the 20th century, in particular to Richardson (1922) and Kolmogorov (1941), and Mandelbrot (1982, 1997, 1999) obtained a first synthesis when he established a strong link between fractal geometry and its applications in the sciences (physics and economics in particular), which has since been extended to "multifractal methodology" (Salat et al. 2017), general "critical phenomena" (Sornette 2007), and asymptotic theory (Kendal and Jørgensen 2011).
Moving from the objects involved to the processes generating them, cascades have come into focus (Schertzer and Lovejoy 2011) only recently. On the one hand, they are quite common. On the other hand, they are also a "key idea" conceptually (Lovejoy (2019), p. 76). That is, although cascades are rather primitive (iterate a basic building block), they can easily be adapted to observable phenomena: the basic building block (a type of fork-like structure) can be chosen appropriately, the propagation mechanism may be deterministic or stochastic, scales (and scale invariance) are closely related, and the approach works in spaces of (almost) any dimension.
In more detail, the basic element to be considered is a bifurcation which distributes some commodity m (mass, energy, ...) in a parent vertex among two descendants: If mass is neither added nor withdrawn (i.e., conserved, m_0 + m_1 = m), such a step is called microcanonical. Typically, there is a preference toward one of the siblings.
The standard stochastic model for a bifurcation (the division of mass among two descendants) is the Bernoulli distribution B(p), with m = 1, m_0 = 1 − p and m_1 = p. In the following, we are going to study the more general 'double random' model that selects population V_1 ∼ H_1 with probability p and population V_0 ∼ H_0 with probability 1 − p. Of course, if V_i ≡ i (i = 0, 1), we just have a standard Bernoulli with realizations 0 and 1, respectively.
However, in general, V_0 and V_1 are random variables with probability distributions H_0 and H_1, respectively. Thus the bifurcation turns out to be canonical in the sense that mass is conserved in expectation: If, w.l.o.g., EV_i = i, then pEV_1 + (1 − p)EV_0 = p.
The second constitutive idea of a cascade is an iteration of the splitting procedure.
That is, every leaf of a given structure (a tree, if there is a single root) is replaced by another bifurcation. Although we are used to thinking of the Binomial B(n, p) as the most natural continuation of the Bernoulli B(p), note that there is a fundamental difference between a cascade and a binomial structure (see the next figure).
Illustration 1: Cascade (local splitting) vs. binomial structure. In a cascade, there are only bifurcations. Therefore, a binary tree evolves whose number of leaves doubles with every iteration and thus grows exponentially fast (2^n → 2^{n+1}, n ≥ 0).

A binomial structure, however, grows slowly (n → n + 1, n ≥ 1), since right after the splits, descendants merge. Actually, given n iterations, the characteristic feature of the Binomial is to count the number of ways C(n, k) (the binomial coefficient) that lead to some leaf k (0 ≤ k ≤ n). From a dynamic system perspective, paths split and merge in Pascal's triangle, leading to an accumulation of mass toward the centre. However, since paths only split in a cascade, the latter are excellent models for divergent phenomena, such as chaos and turbulence (with vortices, eddies or boxes multiplying, but not fusing). Moreover, given a starting point with all mass concentrated there, repeated local bifurcations (all governed by the same mechanism) evoke self-similar structures that can often be extended to a reasonable (multi-)fractal limit (Shynkarenko 2019).
Pascal's arithmetic triangle corresponds to the Binomial B(n, p). Analogously, the above 'geometric triangle' (with f = 2/1 = p/(1 − p), and thus p = 2/3) corresponds to a new probability distribution that will be named the Weaver's distribution W(n, p).
More explicitly, Y ∼ W(3, 2/3) and W(3, p), respectively, obtain the values y_k with probabilities p_k. The choice of the realizations y_k = k/(2^n − 1), or the unstandardized values k = 0, 1, ..., 2^n − 1, is a very natural one, and simplifies a thorough analysis considerably.
Actually, the received p model (see Mandelbrot (1989), p. 19, Mandelbrot (1974), p. 329, and de Wijs (1951, 1953)) is a continuous version of the Weaver. That is, for n = 0 one may start with the continuous Uniform distribution on the unit interval. Next, the proportion 1 − p is uniformly distributed on the interval (0, 1/2), and the proportion p is uniformly distributed on the interval (1/2, 1). In the same vein, one splits the masses further (locally), and obtains the following cascade (illustration 2).
Illustration 2: Cascade of the p model . . .
Curiously enough, Mandelbrot (1999), p. 87, says that the p model appeared "in an esoteric corner of mining engineering science." However, if one thinks about it, ores are the result of an enrichment process, and a straightforward model for such a process is a sequence of binary decisions, i.e., a cascade that is biased in favour of some mineral.
Although a density with many jumps is more difficult to handle than a suitably defined distribution with finite support, it is well known that the limit distribution function of the p model, with the exception of p = 1/2, has no density (Salem 1943). Instead, one encounters a multifractal structure asymptotically, involving a certain amount of polarization, depending on p. This corresponds to veins of gold or a mineral deposit vs. dead rock, say, in a mining application. For a graphic example see Hill (1999).
Since the basic building block used in the p model is a binary bifurcation (each point bequeaths its mass to two descendants with proportions p and 1 − p, respectively), the corresponding cascade should be named after Bernoulli. Unfortunately, the terms 'binomial cascade' (and 'binomial measure') have caught on in the literature, since a process that is based on the Bernoulli readily yields a binomial distribution. However, the Weaver W(n, p) and its limit W(p) are as basic as the Binomial and its siblings.
We are going to demonstrate that the crucial difference between the Binomial and the Weaver can be reduced to a slightly more sophisticated way of sampling that 'augments' Pascal's triangle to the 'geometric triangle' above. The latter multiplicative pattern is equivalent to local Bernoulli bifurcations and brings out the fractal nature of zero-one decisions. In algorithmic terms:

0) There are two populations H_0, H_1; fix n ≥ 1, and let j = 1.
1) Choose one of the populations at random, C ∼ B(p).
2) Draw an iid sample of size 2^{j−1} from the selected population H_C.
3) Let j = j + 1.
4) If j ≤ n, proceed to step 1).

Vividly, step 2 (named exponential sampling hereafter) acts as a "repulsive force" that is able to prevent the two distinct populations from merging. Essentially, we are going to study the sum of the sample X_1; X_2, X_3; X_4, X_5, X_6, X_7; ... thus constructed.²

The rest of this article is organized as follows: In the next section we give a rather different, theoretical motivation for the above model. Section 3 defines exponential sampling, and derives the Weaver's distribution and its properties in a systematic way. It turns out that the corresponding deterministic cascade forms 'threads' that interweave in a particular way. Moments are given in section 4, the limit distribution is discussed in section 5, and section 6 is devoted to populations with finite variances. In particular, the variance can be decomposed into several components. Finally, in section 7, we compare several kinds of asymptotic behaviour.
² Of course, if the populations are trivial, in particular if H_c equals ε_c in distribution (point mass at c, c ∈ {0, 1}), then the sample just consists of n blocks of zeros and ones, the length of each block being 2^{j−1}.
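The sampling scheme of steps 0) to 4) can be sketched in a few lines of code (the function name and the representation of the populations as sampler callbacks are illustrative, not part of the model):

```python
import random

def exponential_sample(n, p, h0, h1, rng=random):
    """Draw one exponential sample of total size 2**n - 1.

    For j = 1, ..., n: select population H_1 with probability p (step 1),
    then draw an iid block of size 2**(j-1) from the selected
    population (step 2).
    """
    sample, choices = [], []
    for j in range(1, n + 1):
        c = 1 if rng.random() < p else 0          # C ~ B(p)
        draw = h1 if c == 1 else h0
        sample.extend(draw() for _ in range(2 ** (j - 1)))
        choices.append(c)
    return sample, choices

# Trivial populations (point masses at 0 and 1, cf. the footnote): the
# sample consists of n blocks of zeros and ones, block j of length 2**(j-1).
sample, choices = exponential_sample(4, 2 / 3, lambda: 0.0, lambda: 1.0)
```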

Theoretical motivation
Given an iid sequence X_1, X_2, ... of random variables, the basis of traditional (Frequentist) statistics is the central limit theorem (CLT) and some law of large numbers (LLN), i.e., the convergence of X̄_n = S_n/n = Σ_{i=1}^n X_i/n towards a single number. However, in calculus, convergence of a sequence x_1, x_2, ... is a strong assumption, and, typically, not even the (much weaker) Cesàro limit lim_{n→∞} x̄_n = lim_{n→∞} (Σ x_i/n) exists. In dynamic system theory, also, convergence towards a point is a rare exception.
In probability theory, the iid model represents a single population and a large, potentially infinite sample from this population. To avoid convergence, it is thus straightforward to consider two populations (distributions H_0 and H_1), and a sample that fluctuates between them. In other words, if one switches between the populations skilfully, X̄_n should not converge. In the jargon of dynamic system theory, the (unique) limit point may be replaced by a (more complicated) attractor.
However, a constant switching rate won't do: If j observations from H_0 are followed by j observations from H_1, and so forth, the arithmetic mean of this sequence will converge, since the 'influence' of another j observations on X̄_n becomes insignificant with increasing n. Yet if 2^j (j ≥ 0) observations from H_0 are followed by 2^{j+1} observations from H_1, etc., one obtains the desired effect. (On a logarithmic scale, taking ld = log_2, the ratio ld(2^{j+1}/2^j) = j + 1 − j = 1 is a constant. Thus, there, one switches at a constant rate, '1' indicating that H_0 alternates with H_1.) Since 2^0 + 2^2 + 2^4 + ... observations are from H_0, and 2^1 + 2^3 + 2^5 + ... observations are from H_1, given a sample of size 2^n − 1, considerably more than one half of these observations come from H_0 (H_1) if n is an odd (even) number. Thus the arithmetic mean cannot 'settle' at some point.
Altogether, we obtain a stochastic process that is inhomogeneous in a particular way.
Its paths depend on the concrete distributions of H_0 and H_1, and on the way switching is done. The aim of this article is to explore straightforward consequences of this basic setting.
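The effect of exponentially growing blocks can be made concrete with trivial populations (point masses at 0 and 1, a deliberately simple assumption): the running mean keeps oscillating between neighbourhoods of 1/3 and 2/3 instead of converging.

```python
def running_mean_after_blocks(n_blocks):
    """Mean of a sequence built from blocks of sizes 2**0, 2**1, ...,
    where even-numbered blocks come from H_0 (value 0) and odd-numbered
    blocks from H_1 (value 1)."""
    total, count = 0, 0
    for j in range(n_blocks):
        size = 2 ** j
        total += size * (j % 2)     # H_0 contributes 0, H_1 contributes 1
        count += size
    return total / count

# The sequence of means alternates between ~1/3 and ~2/3 and never settles.
means = [running_mean_after_blocks(n) for n in range(1, 21)]
```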

The weaver's distribution
In order to keep things finite, suppose for the rest of this contribution that first moments exist, such that without real loss of generality³ µ(H_0) = 0 and µ(H_1) = 1 are the expected values of the two populations (distributions) involved.
A particularly simple way to alternate between H_0 and H_1 is to take the next batch of 2^j observations (j = 0, 1, ...) from population H_0 with probability 1 − p, and from population H_1 with probability p. To avoid trivialities, we assume 0 < p < 1 throughout this contribution. Thus, one creates a hierarchical random system (a particular random probability measure) composed of a choice mechanism which selects the population in charge, and a realization mechanism which provides observations from the population selected.

Definition 1. (Exponential sampling)
Given two parent distributions H_0 and H_1, and 0 < p < 1, define exponential sampling as follows: A sample of size 2^n − 1, i.e., X_1; X_2, X_3; X_4, X_5, X_6, X_7; ...; X_{2^{n−1}}, ..., X_{2^n−1}, consists of n sub-samples, where sub-sample j runs from X_{2^{j−1}} to X_{2^j−1} (j = 1, ..., n). Further, let C_1, ..., C_n be iid selections with C_j ∼ B(p). Then sub-sample j consists of 2^{j−1} iid random variables which are distributed according to H_{C_j}.

With probability p, the first observation comes from H_1, and with probability 1 − p, the first observation comes from H_0. Thus, conditional on this choice, the expected value observed is either µ(H_1) = 1 or µ(H_0) = 0, and the unconditional mean is p. With probability p, the second and third observations both come from H_1, and with probability 1 − p, these observations both come from H_0. Thus, after two choices, the unconditional mean does not change.

For B_n = (B_{n−1}, ..., B_0), the vector of selections (with B_{j−1} = C_j), and Y_n = E(X̄_n | B_n), some elementary properties of these processes are:

(i) With probability one, Y_n assumes the values y_k = y_{k,n} = k/(2^n − 1) for k = 0, 1, ..., 2^n − 1, and the difference between adjacent realizations of Y_n is the constant 1/(2^n − 1).

(ii) b_n = (b_{n−1}, ..., b_0) is a binary vector of length n. Thus b_{j−1} may be interpreted as the j-th digit in the binary representation of a natural number k ∈ {0, ..., 2^n − 1}, i.e., k = Σ_{j=0}^{n−1} b_j 2^j, and the probability p_k at the point y_k is given by p_k = p^{#1} (1 − p)^{#0}, where #1 and #0 denote the number of ones and zeros in b_n, respectively. In particular, every p_k can be written in the form p_k = p^l (1 − p)^{n−l} with some l ∈ {0, ..., n}.
(iii) More explicitly, the distributions of B_n, E(S_n | B_n), and Y_n are determined by (i) and (ii).

(iv) There is the representation

E(S_n | B_n) = Σ_{j=1}^n B_{j−1} 2^{j−1}.   (1)

Proof: (iv) Note that X_i is neither Bernoulli nor integer-valued, and thus S_n can be any real number. However, conditional on B_n = b_n, with probability one, the expected sum E(S_n | b_n) is a natural number, since the typical contribution of an X_i ∼ H_c to the sum S_n is either one if c = 1 or zero if c = 0. Since the j-th sub-sample consists of 2^{j−1} random variables having the same distribution, the right-hand side of equation (1) follows. The left-hand side is due to definition.
Thus the random variable E(S_n | B_n) = Σ_{j=1}^n B_{j−1} 2^{j−1} takes values in {0, ..., 2^n − 1}, which implies (i). (ii) is due to construction. This or the binomial theorem yields Σ_k p_k = Σ_{l=0}^n C(n, l) p^l (1 − p)^{n−l} = 1. ♦

It is crucial that the populations differ in expectation (µ(H_0) ≠ µ(H_1)), i.e., that X̄_n may fluctuate between two 'centres of gravity'. The assumption µ(H_c) = c (c = 0, 1) simplifies the formal treatment considerably, since it leads to equation (1), which states that c = b_{j−1} ∈ {0, 1} may be interpreted as the index of the selected population H_c, and as the expectation EX_c where X_c ∼ H_c. Otherwise, a linear transformation would have to map µ(H_0) to zero and µ(H_1) to one, which would add an additional, yet unnecessary, layer of complexity.
We say that Y_n has a weaver's distribution, Y_n ∼ W(n, p), with parameters n and p.
Since powers of two play a major role, 'binary distribution' would also be a suitable choice -much in line with 'Bernoulli' and 'binomial' distributions, which are closely related.
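A small numerical check of the properties above (the helper name `weaver_pmf` is illustrative): the probabilities p_k = p^{#1}(1 − p)^{#0} at the points y_k = k/(2^n − 1) sum to one, and the mean of W(n, p) equals p.

```python
def weaver_pmf(n, p):
    """Probabilities p_k of W(n, p): p_k = p**#1 * (1 - p)**#0, where #1
    (#0) is the number of ones (zeros) in the n-bit representation of k."""
    return [p ** bin(k).count("1") * (1 - p) ** (n - bin(k).count("1"))
            for k in range(2 ** n)]

n, p = 5, 2 / 3
probs = weaver_pmf(n, p)
ys = [k / (2 ** n - 1) for k in range(2 ** n)]   # realizations y_k
mean = sum(pk * yk for pk, yk in zip(probs, ys))
```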
Theorem 3. (The 'geometric triangle') Given the assumptions and the notation of the last theorem, let b_n = s_{ij} be a vector with exactly i ones and j zeros, such that i + j = n. Moreover, set f = p/(1 − p).
(i) The probabilities of the concatenated vectors (s_{ij}, 1) and (1, s_{ij}) coincide, and so do those of (s_{ij}, 0) and (0, s_{ij}). In particular, p_{2l+1}/p_{2l} = p/(1 − p) = f for any two adjacent realizations y_{2l}, y_{2l+1}, where l = 0, 1, ..., 2^{n−1} − 1.

(ii) p_0 = (1 − p)^n is the probability that only H_0 is chosen, and p_k = p_0 · f^{#1} for k = 0, ..., 2^n − 1, where, again, #1 is the number of ones in the binary representation of k.

(iii) This means that the vector of probabilities p_n = (p_0, p_1, ..., p_{2^n−1}) can be written as p_n = p_0 · f_n. More explicitly, the vector f_n has dimension 2^n and obeys the recursive relation f_0 = 1 and f_{n+1} = (f_n, f · f_n) for n = 0, 1, 2, ... Thus its components can be calculated with the help of the following scheme, which may be interpreted as a geometric version of Pascal's triangle:⁴

n = 0: 1
n = 1: 1 | f
n = 2: 1 f | f f²
n = 3: 1 f f f² | f f² f² f³

Every row has 2^n entries. Note that the left and the right of every | are 'separated' by the factor f.

(iv) One may construct successive rows of (iii) in a rather elementary way: Start with a single 1 in the very first row. Then, fork every entry of row n into two, by multiplying each entry with 1 and f upon moving left and right, respectively. It is quite remarkable that this local (cascade) view is equivalent to the global (weaving) view taken in the definition.⁵

(v) Applying the logarithm base f to every entry of the geometric triangle yields the exponents. Their row sums obey s_0 = 0 and s_{n+1} = 2s_n + 2^n for n = 0, 1, ..., that is, one obtains the sequence 0, 1, 4, 12, 32, 80, 192, 448, 1024, 2304, ...

Proof: (i) is obvious, since the positions of the numbers 0 and 1 are irrelevant for the probabilities in question. In particular, for k = 0, 2, ..., 2^n − 2, the binary representations of k and k + 1 differ in exactly one position.
(ii) Using Theorem 2 (ii), one obtains immediately p_k = p^{#1} (1 − p)^{#0} = (1 − p)^n · f^{#1} = p_0 · f^{#1}.

(iii) is a consequence of self-similarity. Since the binary representations of 0 and 2^{n−1}, and of 1 and 2^{n−1} + 1, etc., differ only by a single one, the second half of the row equals f times the first half. Since, again by (ii), also p_n = p_0 · f_n, the desired result follows.
One may also prove (iii) by induction on n: First, p_1 = f p_0, and thus (p_0, p_1) = (p_0, f p_0) = p_0 (1, f). Second, the binary representation of any k ∈ {0, ..., 2^{n+1} − 1} is either (0, b_n) or (1, b_n). Since in the first case, the number of ones does not change, and in the second case, the number of ones increases by one, we obtain on the one hand (to the left) that f_n is reproduced as the first half of f_{n+1}. (Upon moving from n to n + 1, the exponent of f does not change.) On the other hand (to the right), the additional factor f means that the second half of f_{n+1} has to be f · f_n.
(iv) The proof is by induction on n. For n = 0 there is nothing to prove, and the equivalence is obvious for n = 1. By the inductive assumption, the vector occurring on line n, having length 2^n, has the form w_n = p_0 · f_n. Local splits (see the definition given in the statement of the theorem) produce a vector w_{n+1} of length 2^{n+1}. Since, locally, a step to the left reproduces the numbers, and a step to the right multiplies any two entries on tier n with the same factor f, we also have, because of the inductive assumption, w_{2^n+k}/w_k = f for k = 1, ..., 2^n. Therefore w_{n+1} = (w_n, f · w_n), as claimed.

(v) Straightforward induction on n yields the recursive formula. ♦

Note that the multiplicative triangle lies at the heart of the observation that "the best known multifractal constructions use multiplicative operations" (Mandelbrot (1999), p. 32).
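The recursion f_{n+1} = (f_n, f · f_n) is easy to implement, and it reproduces both the closed form f^{#1(k)} and the exponent row sums s_n = n · 2^{n−1} (the function name is, again, illustrative):

```python
def geometric_triangle_row(n, f):
    """Row n of the geometric triangle: f_0 = (1,), f_{n+1} = (f_n, f*f_n).
    Entry k equals f**#1(k), where #1(k) is the binary digit sum of k."""
    row = [1.0]
    for _ in range(n):
        row = row + [f * x for x in row]   # left half copied, right half * f
    return row

p = 2 / 3
f = p / (1 - p)                            # f = 2 for p = 2/3
n = 4
row = geometric_triangle_row(n, f)
probs = [(1 - p) ** n * x for x in row]    # p_n = p_0 * f_n, p_0 = (1-p)**n
```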
Theorem 4. Given the assumptions and the notation of Theorem 2, one obtains:

(i) The probabilities corresponding to row n can be constructed by the following simple scheme: . . .
(iii) Symmetry: Suppose Y ∼ W(n, p); then 1 − Y ∼ W(n, 1 − p).

(iv) Distribution function F of W(n, p): For all n ≥ 0 and k = 0, ..., 2^n define v_{k,n} = k/2^n. For every fixed n, the mass left and right of v_{k,n} (0 ≤ k ≤ 2^n) remains the same for every m ≥ n, and so is the value of the distribution function at v_{k,n}.

(v) The total mass p_k in every interval [v_{k,n}, v_{k+1,n}] (k = 0, ..., 2^n − 1) remains the same for all m ≥ n. For m = n it is located at the point y_k = y_{k,n} = k/(2^n − 1).
In the interest of consistency, let y_{0,0} = p and p_0 = 1 if n = 0.
Thus W(n, p) may be interpreted as a discretisation of the density in the corresponding classical p model.
(vi) Distribution of the jumps (stick heights): F_n has 2^n points of discontinuity. If p = 1/2 there is a constant jump height h = 1/2^n. Otherwise, there are n + 1 different jump sizes, given by h_j = p^j (1 − p)^{n−j} for j = 0, ..., n, their frequencies following a binomial pattern. That is, there is C(n, 0) = 1 jump of size h_0 = (1 − p)^n, there are C(n, 1) = n jumps of size h_1 = (1 − p)^{n−1} p, etc.

Proof: (i) For n = 1, 2, ..., we have p_0 = p_0(n) = (1 − p)^n for the leftmost probability (only H_0 is selected). Applying the geometric triangle yields the result.
(ii) We have p > 1/2 ⇒ f > 1. Thus the mass in y_1 exceeds the mass in y_0 = 0 by the factor f, and the result follows straightforwardly.
(iii) Exchanging the roles of zeros and ones, and replacing p by 1 − p, yields the same distribution. In other words: the reflection of W(n, p) across the axis of symmetry x = 1/2 is W(n, 1 − p).

(iv) follows immediately from the geometric triangle. Geometrically speaking, the unit interval on the horizontal axis is successively halved. At the same time, the unit interval on the vertical axis is successively divided according to the ratio f. Thus, for finite n ≥ 1, one obtains a step function with 2^n jumps.
(v) holds because of the local interpretation of the geometric triangle: Each split can be interpreted as distributing the mass p_k in y_k to the points y_{2k,n+1} and y_{2k+1,n+1} in that same interval. Graphically, the stick of height p_k in y_{k,n} is broken into two sticks of heights (1 − p)p_k and p · p_k, located in y_{2k,n+1} and y_{2k+1,n+1}, respectively.
(vi) is due to construction. ♦

Note that there are two kinds of scale: the first could be named 'discrete time,' i.e., the total number of observations 2^n; the second would be 'logarithmic time,' that is, the number of selections, ld(2^n) = n.
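The local (cascade) construction and statement (vi) can be verified directly: building the probabilities by repeated splitting into proportions 1 − p and p yields n + 1 jump sizes h_j = p^j (1 − p)^{n−j} with binomial frequencies C(n, j).

```python
from math import comb, isclose

def cascade_probs(n, p):
    """Weaver's probabilities via local splitting: every stick of mass m
    forks into (1 - p) * m (left) and p * m (right)."""
    probs = [1.0]
    for _ in range(n):
        probs = [m * q for m in probs for q in (1 - p, p)]
    return probs

n, p = 6, 2 / 3
probs = cascade_probs(n, p)
expected = [p ** j * (1 - p) ** (n - j) for j in range(n + 1)]
counts = [0] * (n + 1)
for m in probs:                     # classify each stick by its jump size
    j = min(range(n + 1), key=lambda i: abs(m - expected[i]))
    counts[j] += 1
```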
It is also instructive to compare the above structure to the received α model (see, in particular, Lovejoy and Schertzer (2013), pp. 65–70, with the number of descendants being λ_0). If λ_0 = 2, both models are built on a binary cascade, i.e., after n steps, there are 2^n leaves. Like the p model, the α model distributes the mass assigned to some interval I_k = (k/2^n, (k + 1)/2^n), k = 0, ..., 2^n − 1, uniformly on that interval. Moreover, the initial mass in some parent vertex is split according to a binary random variable B that assumes the value γ+ > 0 (a 'boost') with probability p = 2^{−c} and the value γ− < 0 (a 'decrease') with probability 1 − p = 1 − 2^{−c}, where c is a positive parameter.
In particular, if the mass at the root is one, the illustrations ibid., p. 68, show that this model produces a 'randomized p density', i.e., a function that is equal to a constant c_k > 0 on every interval (k/2^n, (k + 1)/2^n), k = 0, ..., 2^n − 1. However, the values c_k are random, since n steps amount to exactly Σ_{j=1}^n 2^{j−1} = 2^n − 1 binary decisions, and each decision means to split the mass m found at some parent vertex in the way just described, i.e., with probability p a descendant inherits mass m · 2^{γ+}, with probability 1 − p it receives mass m · 2^{γ−}. Thus one obtains a particular realization of a sequence/cascade of randomly occurring boosts and decreases.
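For comparison, a minimal simulation of such a randomized density (a sketch under assumptions: the parameter values are arbitrary, and γ− is fixed by the canonical constraint p · 2^{γ+} + (1 − p) · 2^{γ−} = 1, i.e., conservation in expectation):

```python
import math
import random

def alpha_model_density(n, c=1.0, gamma_plus=0.8, rng=None):
    """One realization of the alpha model after n steps: a step function,
    constant on each of the 2**n intervals (k/2**n, (k+1)/2**n).

    p = 2**(-c) is the probability of a boost 2**gamma_plus; gamma_minus
    is chosen so that p * 2**gamma_plus + (1 - p) * 2**gamma_minus = 1
    (canonical conservation, an assumption made here)."""
    rng = rng or random.Random(0)
    p = 2.0 ** (-c)
    gamma_minus = math.log2((1.0 - p * 2.0 ** gamma_plus) / (1.0 - p))
    densities = [1.0]
    for _ in range(n):
        nxt = []
        for d in densities:
            for _ in range(2):      # each descendant is boosted independently
                g = gamma_plus if rng.random() < p else gamma_minus
                nxt.append(d * 2.0 ** g)
        densities = nxt
    return densities

dens = alpha_model_density(6)
total_mass = sum(dens) / 2 ** 6     # fluctuates around 1
```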
Weaving is different: First, we are interested in all possible realizations. Therefore, after n steps, the total mass of 1 has been distributed among the 2^n leaves y_{k,n} of a binary tree. So, secondly, there is no density but a discrete distribution. Third, and perhaps most importantly, there have been n 'major' decisions that chose a parent distribution (H_0 or H_1) for each of the stages j = 1, ..., n. Based on the latter distributions, there have also been 2^n − 1 'minor' decisions. That is, given j (and thus either H_0 or H_1), one has obtained a group of 2^{j−1} independent realizations x_{2^{j−1}}, ..., x_{2^j−1} from the parent distribution in charge. In other words, the cascade corresponding to weaving is hierarchical: n binary decisions select the composition of the sample X_1, ..., X_{2^n−1} (exactly one out of 2^n possible compositions, since j = 1, ..., n). Given this, in each group j, one considers the sum T_j = Σ_{i=2^{j−1}}^{2^j−1} X_i and its realization t_j = Σ_{i=2^{j−1}}^{2^j−1} x_i.
A different interpretation would be that the basic building block of an (unstandardized) Bernoulli cascade is the recursion S_j = S_{j−1} + T_j, where j = 1, 2, ... and S_0 = 0. Notice that the crucial feature of this building block and the corresponding cascade is a particular kind of summation.
Theorem 6. (Variance of Y_n) With the assumptions of Theorem 2, σ²(Y_n) = p(1 − p)(2^n + 1)/(3(2^n − 1)).

Proof: Equation (1) implies σ²(Y_n) = p(1 − p) Σ_{j=1}^n 4^{j−1}/(2^n − 1)² = p(1 − p)(4^n − 1)/(3(2^n − 1)²) = p(1 − p)(2^n + 1)/(3(2^n − 1)). ♦

After the first step, the distribution of the conditional expected values is B(p). For any random variable X with values in the unit interval and EX = p, this distribution has maximum variance p(1 − p). Upon weaving, probability mass is successively concentrated within the unit interval, and thus variance decreases. On the other hand, every bifurcation may increase the variance term.
Both effects combined result in a (net) monotone decrease of variance toward a certain point. Moreover, there is a limit variance σ² = c·p(1 − p) with 0 < c < 1.
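Numerically, the variance of Y_n can be computed exactly from the pmf; it decreases monotonically from p(1 − p) at n = 1 towards p(1 − p)/3, i.e., c = 1/3 (the helper name is illustrative):

```python
def weaver_variance(n, p):
    """Exact variance of Y_n ~ W(n, p), from the probabilities
    p_k = p**#1 * (1 - p)**#0 at the points y_k = k/(2**n - 1)."""
    N = 2 ** n
    probs = [p ** bin(k).count("1") * (1 - p) ** (n - bin(k).count("1"))
             for k in range(N)]
    ys = [k / (N - 1) for k in range(N)]
    m = sum(pk * yk for pk, yk in zip(probs, ys))
    return sum(pk * (yk - m) ** 2 for pk, yk in zip(probs, ys))

p = 2 / 3
variances = [weaver_variance(n, p) for n in range(1, 13)]
```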
Corollary 7. EY_n² exists, and so do all higher moments EY_n^j for j ≥ 1.
Proof: For fixed n, all realizations y_k are in the unit interval. Thus EY_n^j ≤ 1 for all j ≥ 1. ♦

Theorem 8. Denote by Z_n = B_{k,n} • Y_n the random variable defined by replacing each realization y_{k,n} of Y_n by a Bernoulli variable B_{k,n} ∼ B(y_{k,n}). Then Z_n ∼ B(p).

For fixed n, one might think of the collection of all B_{k,n} (k = 0, ..., 2^n − 1) as a family of 'dual distributions' to W(n, p), mapping that distribution to the set {0, 1} without changing the centre of gravity. Since the point y_{k,n} and its mass are distributed to two values, 'splitting' might be an appropriate term for this operation. However, since the crucial idea is that all y_{k,n} map to the set {0, 1}, 'merging' is even more appropriate.
A different interpretation would be that upon constructing the Binomial, splitting and merging occur together (cannot be separated) when moving from n to n + 1. The Bernoulli cascade also starts with unit mass at point p. However, after n bifurcations, it creates a W(n, p) that distributes mass among 2^n points. Hereafter, 'all the merging' occurs in a single step, i.e., merging is postponed until the end.
Due to Theorems 6 and 8, one may decompose the total variance among weaving and merging: p(1 − p) = σ²(Z_n) = σ²(Y_n) + E(Y_n(1 − Y_n)). In the last expression, we interpret Z_n as a particular mixture of H_0 and H_1, whose total variance may be decomposed into the "variance of the conditional expected values" (the first term, i.e., the variability in the y_{k,n}) and the "expected conditional variance" (the second term, i.e., the mean variance of the B_{k,n}).
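This decomposition is the law of total variance for Z_n: since Z_n | Y_n = y ∼ B(y), one has p(1 − p) = σ²(Y_n) + E(Y_n(1 − Y_n)), which a short computation confirms:

```python
def decompose_variance(n, p):
    """Split the total variance p(1-p) of Z_n ~ B(p) into the variance of
    the conditional means (weaving) and the expected conditional variance
    of the dual Bernoullis B_{k,n} (merging)."""
    N = 2 ** n
    probs = [p ** bin(k).count("1") * (1 - p) ** (n - bin(k).count("1"))
             for k in range(N)]
    ys = [k / (N - 1) for k in range(N)]
    m = sum(pk * yk for pk, yk in zip(probs, ys))
    between = sum(pk * (yk - m) ** 2 for pk, yk in zip(probs, ys))
    within = sum(pk * yk * (1 - yk) for pk, yk in zip(probs, ys))
    return between, within

p = 2 / 3
between, within = decompose_variance(6, p)
```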
Some concrete values may be helpful. The distribution function has a kink at the point 1/2 and cannot be differentiated there. The same holds for all v_{k,n}. Since the set of these points lies dense in the unit interval, there should be no density.
Formally, consider the interval [v_{k,n}, v_{k+1,n}] about y_k = y_{k,n}. For fixed n, this interval has length v_{k+1,n} − v_{k,n} = 1/2^n. By Theorem 3 (ii), the density in the neighbourhood of y_k is given by

g_{k,n} = 2^n p_k = 2^n p^{#1} (1 − p)^{#0},   (3)

where #0 and #1 are the number of zeros and ones in the binary representation of k, respectively. If p = 1/2, g_{k,n} = 1, and thus W(1/2) is the uniform distribution on [0, 1].
In general, compare equation (3) and the classical De Moivre–Laplace theorem. In the latter case, one considers C(n, k) p^k (1 − p)^{n−k}, which approaches a limit b ∈ (0, ∞), since the convergence of p^k (1 − p)^{n−k} toward zero is counterbalanced by a sequence that goes to infinity at the same speed, i.e., an appropriate binomial coefficient (also depending on n and k).
Here, every iteration (n → n + 1) doubles the number of values y_k, and thus the first factor is 2^n instead of C(n, k). Moreover, due to Theorem 4, every y_{k,n} is the starting point of a cascade, i.e., a sequence of local bifurcations in the corresponding interval [v_{k,n}; v_{k+1,n}]. After one iteration, the probabilities at y_{2k,n+1} and y_{2k+1,n+1}, i.e., (1 − p)p_k and p · p_k, respectively, differ by the factor f. After l iterations, the probabilities at the leftmost value y_{2^l k, n+l} and the rightmost value y_{2^l k + (2^l − 1), n+l} differ by f^l. If, w.l.o.g., mass is systematically shifted to the right (p > 1/2), we have f > 1, and thus the ratio of these probabilities soon exceeds any bound. Even more so, 2^l (1 − p)^l p_k → 0 and 2^l p^l p_k → ∞ in every interval [v_{k,n}; v_{k+1,n}] if l → ∞. Thus, there cannot be a limit density. ♦

Note that the 'roughness' of the density (measured by f^l) grows at the same rate as the number of intervals. Thus ln(f^n)/ln(2^n) = ln f/ln 2 is a constant, the fractal dimension.
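The divergence is easy to observe numerically: the step-function 'density' g_k = 2^n p_k has minimum (2(1 − p))^n → 0 and maximum (2p)^n → ∞ for p > 1/2.

```python
def density_extremes(n, p):
    """Min and max of the step-function density g_k = 2**n * p_k of W(n, p)
    on the 2**n intervals of length 2**-n."""
    N = 2 ** n
    g = [N * p ** bin(k).count("1") * (1 - p) ** (n - bin(k).count("1"))
         for k in range(N)]
    return min(g), max(g)

p = 2 / 3
extremes = [density_extremes(n, p) for n in (2, 6, 10, 14)]
mins, maxs = zip(*extremes)
```

For p = 2/3, f = 2, so the 'roughness' ratio ln f / ln 2 equals 1.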
Proof: Mandelbrot's 'binomial measure' is the limit of the p model, splitting the mass (locally) according to the geometric triangle. Thus, the p model's Bernoulli cascade and weaving (see Theorem 4 (vi)) assign the same mass to every interval [v_{k,n}; v_{k+1,n}].
Since these intervals shrink to zero, the limit distributions have to coincide. ♦

Of course, the last theorem and Salem (1943) also imply that W(p) has no density if p ≠ 1/2.
The last theorem could be an example of a more general 'sandwich principle'. That is, W(p) is defined on a countable set and is the limit of a Bernoulli cascade. The cascade either starts with unit mass at the point p and distributes that mass to an increasing number of points (this article); or the cascade starts with the Uniform on the unit interval, and distributes that mass to an increasing number of (shrinking) intervals (the received p model). If both processes have the same 'inheritance' preference f = p/(1 − p), they determine the same multifractal in the limit.

The complete process
So far, we have mainly considered the distribution of the (conditional) expected values, Y_n = E(X̄_n | B_n), or, equivalently, the case of two one-point distributions located in µ(H_0) and µ(H_1), respectively. Looking at X̄_n, however, there is not just variance between the populations H_0 and H_1, but also within each of these populations, σ²(H_0) = σ_0² and σ²(H_1) = σ_1², say, contributing to the total variance.
For the unconditional process we obtain the result stated in Theorem 12 below. Of course, if the populations H_0 and H_1 are not too complicated, it is possible to study the process X̄_n in much more detail. For various extensions see the last section of Saint-Mont (2019).
⁴ ..., p. 471, favours the right-hand side with the factor f = 2 : 1; thus their figure at the top of that page can be translated into the scheme above.

Similar to the binomial distribution, every path, being defined by a sequence of independent binary decisions (b_{n−1}, ..., b_0), splits upon moving from n to n + 1. However, unlike the Binomial, the paths do not merge. Rather, like threads, they interweave (see the next figure). The next illustration demonstrates that, in a sense, the difference between weaving and splitting is minor: Given a binary string, weaving adds the next cipher to the left (a prefix), whereas splitting adds the next cipher to the right (a suffix).

Illustration 3: Global weaving (left) and local splitting (a cascade, right)

After n steps (selections, choices), one thus obtains an interesting distribution:

Theorem 2. (The weaver's distribution) Given the situation described in Definition 1, suppose the first moments are µ(H_0) = 0 and µ(H_1) = 1, respectively. After n selections, the chosen populations are recorded in a vector b_n = (b_{n−1}, ..., b_0). Let #1 be the number of ones in b_n. With probability 1 − p, the next selection leads to (0, b_n), and with probability p this selection results in (1, b_n).

Local interpretation [Bernoulli cascade]: Start with mass 1 in the very first (the zeroth) row. Then, fork every probability of row n into two, by multiplying each entry with 1 − p (on the left) and p (on the right), respectively.
Proof: Since each y_{k,n} is mapped to 1 w.p. y_{k,n} and to 0 w.p. 1 − y_{k,n}, Z_n = B_{k,n} • Y_n assumes the values zero and one. Moreover, since EB_{k,n} = y_{k,n}, the location of Y_n's distribution is the same as that of Z_n. Thus Z_n ∼ B(p) and the variance follows. ♦

Theorem 12. (Expected value and variance) With the assumptions of Theorem 2, E X̄_n = p and σ²(X̄_n) = σ²(Y_n) + (p σ_1² + (1 − p) σ_0²)/(2^n − 1).

Merging W(n, p) the way we did in Theorem 8 asymptotically leads to Y ∼ W(p) and a countable family of dual distributions B_k. Since σ²(Y) = p(1 − p)/3, one third of the total asymptotic variance between the populations is due to weaving, and the rest stems from forcing all threads to terminate in 0 or 1.
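A Monte Carlo sketch of the complete process (the normal populations are an illustrative assumption; the comparison value adds to σ²(Y_n) the within-population term (pσ_1² + (1 − p)σ_0²)/(2^n − 1), which follows from conditioning on B_n):

```python
import random

def xbar(n, p, h0, h1, rng):
    """One realization of the mean of an exponential sample of size 2**n - 1."""
    total = 0.0
    for j in range(1, n + 1):
        draw = h1 if rng.random() < p else h0
        total += sum(draw() for _ in range(2 ** (j - 1)))
    return total / (2 ** n - 1)

rng = random.Random(42)
n, p, s0, s1 = 6, 2 / 3, 0.5, 0.25
h0 = lambda: rng.gauss(0.0, s0)        # mu(H_0) = 0
h1 = lambda: rng.gauss(1.0, s1)        # mu(H_1) = 1
reps = 20000
means = [xbar(n, p, h0, h1, rng) for _ in range(reps)]
mc_mean = sum(means) / reps
mc_var = sum((x - mc_mean) ** 2 for x in means) / reps
# sigma^2(Y_n) = p(1-p)(2^n+1)/(3(2^n-1)), from Var(sum B_{j-1} 2^{j-1})
theory = (p * (1 - p) * (2 ** n + 1) / (3 * (2 ** n - 1))
          + (p * s1 ** 2 + (1 - p) * s0 ** 2) / (2 ** n - 1))
```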