The infinite extendibility problem for exchangeable real-valued random vectors

We survey known solutions to the infinite extendibility problem for (necessarily exchangeable) probability laws on $\mathbb{R}^d$, which is: Can a given random vector $\vec{X} = (X_1,\ldots,X_d)$ be represented in distribution as the first $d$ members of an infinite exchangeable sequence of random variables? By De Finetti's seminal theorem, this is the case if and only if $\vec{X}$ has a stochastic representation that is "conditionally iid". Of particular interest are cases in which the original motivation behind the model $\vec{X}$ is not one of conditional independence. After an introduction and some general theory, the survey covers the traditional cases when $\vec{X}$ takes values in $\{0,1\}^d$, has a spherical law, a law with $\ell_1$-norm symmetric survival function, or a law with $\ell_{\infty}$-norm symmetric density. The solutions in all these cases constitute analytical characterizations of mixtures of iid sequences drawn from popular, one-parametric probability laws on $\mathbb{R}$, like the Bernoulli, the normal, the exponential, or the uniform distribution. The survey further covers the less traditional cases when $\vec{X}$ has a Marshall-Olkin distribution, a multivariate wide-sense geometric distribution, a multivariate extreme-value distribution, or is defined as a certain exogenous shock model, including the special case when its components are samples from a Dirichlet prior. The solutions in these cases correspond to iid sequences drawn from random distribution functions defined in terms of popular families of non-decreasing stochastic processes, like a L\'evy subordinator, a random walk, a process that is strongly infinitely divisible with respect to time, or an additive process. The survey finishes with a list of potentially interesting open problems.


Contents
Summary of main results surveyed. Whereas most of the appearing notation is introduced in the main body of the article, here $\mathcal{L}[X]$ denotes the Laplace transform of a random variable $X > 0$, and $\langle x, y \rangle := \sum_{k=1}^d x_k y_k$.
Some general definitions regarding probability spaces: All random objects to be introduced are formally defined on some probability space $(\Omega, \mathcal{F}, P)$ with $\sigma$-algebra $\mathcal{F}$ and probability measure $P$, and the expected value of a random variable $X$ is denoted by $E[X]$. As usual, the argument $\omega \in \Omega$ of some random variable $X : \Omega \rightarrow \mathbb{R}$ will always be omitted. The symbol $\stackrel{d}{=}$ denotes equality in distribution and the symbol $\sim$ means "is distributed according to". We recall that for random vectors $(X_1, \ldots, X_d) \stackrel{d}{=} (Y_1, \ldots, Y_d)$ means $E[g(X_1, \ldots, X_d)] = E[g(Y_1, \ldots, Y_d)]$ for all bounded, continuous functions $g : \mathbb{R}^d \rightarrow \mathbb{R}$, where the expectations $E$ are taken on the respective probability spaces of $(X_1, \ldots, X_d)$ and $(Y_1, \ldots, Y_d)$, which might be different. Equality in law for two stochastic processes $X = \{X_t\}$ and $Y = \{Y_t\}$ means that $(X_{t_1}, \ldots, X_{t_d}) \stackrel{d}{=} (Y_{t_1}, \ldots, Y_{t_d})$ for arbitrary $d \in \mathbb{N}$ and $t_1, t_2, \ldots, t_d$. Throughout, the abbreviation iid stands for independent and identically distributed. The index $t$ of a real-valued random variable $f_t$ that belongs to some stochastic process $f = \{f_t\}_{t \in T}$ is purposely written as a sub-index, in order to distinguish it from the value $f(t)$ of some (non-random) function $f : T \rightarrow \mathbb{R}$. If $F$ is the distribution function of some random variable taking values in $\mathbb{R}$, we denote by $F^{-1}(y) := \inf\{x \in \mathbb{R} : F(x) \geq y\}$, $y \in [0,1]$, its generalized inverse, see [22] for background. Any distribution function $C$ of a random vector $U = (U_1, \ldots, U_d)$ whose components $U_k$ are uniformly distributed on $[0,1]$ is called a copula, see [66] for a textbook treatment. We further recall that an arbitrary survival function $\bar{F}$ of some $d$-variate random vector $X$ can always be written as $\bar{F}(x) = \hat{C}\big(P(X_1 > x_1), \ldots, P(X_d > x_d)\big)$, where $\hat{C}$ is a copula, called a survival copula for $\bar{F}$, and it is uniquely determined in case the random variables $X_1, \ldots, X_d$ have continuous distribution functions.
This is the survival analogue of the so-called Theorem of Sklar, due to [89]. The Theorem of Sklar itself states that the distribution function $F$ of $X$ can be written as $F(x) = C\big(P(X_1 \leq x_1), \ldots, P(X_d \leq x_d)\big)$ for a copula $C$, called a copula for $F$. The relationship between a copula $C$ and its survival copula $\hat{C}$ is that if $(U_1, \ldots, U_d) \sim C$ then $(1 - U_1, \ldots, 1 - U_d) \sim \hat{C}$.
A notation of specific interest in the present survey: We denote by $\mathcal{H}$ the set of all distribution functions of real-valued random variables, and by $\mathcal{H}^+$ the subset containing all elements $F$ such that $x < 0$ implies $F(x) = 0$, i.e. distribution functions of non-negative random variables. Elements $F \in \mathcal{H}$ are right-continuous, and we denote by $F(x-) := \lim_{t \uparrow x} F(t)$ their left-continuous versions. If $\mathbb{X}$ is some Hausdorff space, $M^1_+(\mathbb{X})$ denotes the set of all (Radon) probability measures on $\mathbb{X}$. Recall that $\mathcal{H}$ is metrizable (hence in particular Hausdorff) when topologized with convergence in distribution of the associated random variables, see [88]. Thus, (Radon) probability measures are well defined and we denote by $M^1_+(\mathcal{H})$ the set of all probability measures on $\mathcal{H}$. We also conveniently write $X \in M^1_+(\mathbb{X})$ if $X$ is a random variable with a probability law in $M^1_+(\mathbb{X})$. For instance, $H \in M^1_+(\mathcal{H})$ means that $H$ is a stochastic process such that each potential realization is a distribution function.
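The generalized inverse $F^{-1}(y) = \inf\{x \in \mathbb{R} : F(x) \geq y\}$ is easy to implement for step distribution functions. The following minimal Python sketch (ours, with a toy Bernoulli example) illustrates how the infimum is evaluated, including the case where it is attained at a jump:

```python
import bisect

def generalized_inverse(xs, Fs, y):
    """F^{-1}(y) = inf{x : F(x) >= y} for a step distribution function
    with sorted jump locations xs and cumulative values Fs[i] = F(xs[i])."""
    i = bisect.bisect_left(Fs, y)  # first index with Fs[i] >= y
    return xs[i]

# Toy example: Bernoulli(0.3), i.e. F(0) = 0.7 and F(1) = 1.0
xs, Fs = [0, 1], [0.7, 1.0]
print(generalized_inverse(xs, Fs, 0.5))   # -> 0, since F(0) = 0.7 >= 0.5
print(generalized_inverse(xs, Fs, 0.7))   # -> 0, the infimum is attained
print(generalized_inverse(xs, Fs, 0.95))  # -> 1
```

Feeding a uniform random number on $[0,1]$ into this function produces a sample with distribution function $F$, which is exactly the inverse-transform idea used in the canonical construction later on.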

Motivation and mathematical preliminaries
Throughout, we denote by $X = (X_1, \ldots, X_d)$ a random vector taking values in $\mathbb{R}^d$. Since we are only interested in the probability distribution of $X$, we identify $X$ with its probability law in the sense that we often say $X$ has some property if and only if its probability distribution has this property. Given a family of probability distributions $\mathcal{M} \subset M^1_+(\mathbb{R}^d)$, we are interested in a solution to the following generic problem:

Problem 1.1 (Motivating problem) Find a convenient probabilistic description for the subfamily $\mathcal{M}^* \subset \mathcal{M}$ of those probability distributions for which there exists a stochastic representation on some probability space $(\Omega, \mathcal{F}, P)$ in which the components are iid conditioned on some $\sigma$-algebra $\mathcal{H} \subset \mathcal{F}$.

Remark 1.2 (Nomenclature)
If M = M 1 + (R d ) denotes the family of all probability laws on R d , we say that an element in M * is conditionally iid. Similarly, we say that a random vector X ∈ M 1 + (R d ) is conditionally iid if its probability law is actually in the subset M * . It is important to be aware that a random vector that is conditionally iid according to this definition does have a stochastic representation as described in Problem 1.1, but it might also have another stochastic representation in which the components are not iid conditioned on some σ-algebra. In fact, typical cases of interest are such that M is defined in terms of a stochastic model or probabilistic property which is a priori unrelated to the concept of conditional independence. In the literature, elements of M * are not always called conditionally iid, but other names have been given. For instance, [86] calls them positive dependent by mixture (PDM), [90,36,53] call them infinitely extendible, and [66, Definition 1.10, p. 43] call them simply extendible. The nomenclature "(infinite) extendibility" refers to the fact that conditionally iid random vectors can always be thought of as finite margins of infinite conditionally iid sequences, as will be explained below. The nomenclature "PDM" becomes intuitive from Lemmata 1.8, 1.9 and 1.10 below, but is rather unusual.
The investigation of conditionally iid random vectors is closely related to the concept of exchangeability. Recall that the probability distribution of a random vector X is called exchangeable if it is invariant under an arbitrary permutation of the components of X. The following observation is immediate but important.

Lemma 1.3 (Exchangeability)
If X is conditionally iid, it is also exchangeable.

Proof
Consider a probability space $(\Omega, \mathcal{F}, P)$ such that conditioned on some $\sigma$-algebra $\mathcal{H} \subset \mathcal{F}$ the components $X_1, \ldots, X_d$ are iid. If $\pi$ is an arbitrary permutation of $\{1, \ldots, d\}$ we observe that
$$P(X_{\pi(1)} \leq x_1, \ldots, X_{\pi(d)} \leq x_d) = E\Big[\prod_{k=1}^d P(X_{\pi(k)} \leq x_k \mid \mathcal{H})\Big] \stackrel{(*)}{=} E\Big[\prod_{k=1}^d P(X_k \leq x_k \mid \mathcal{H})\Big] = P(X_1 \leq x_1, \ldots, X_d \leq x_d),$$
where $(*)$ follows from the iid property. This implies the claim.
Exchangeability is a property which is convenient to investigate by means of Analysis, whereas the notion "conditionally iid", in which we are interested, is a priori purely probabilistic and more difficult to investigate. Unfortunately, exchangeability is a necessary but not a sufficient condition for the solution of our problem. For instance, the bivariate normal distribution is exchangeable if and only if the two means and the two variances are identical, which allows for negative correlation coefficients. However, Example 1.6 and Lemma 1.8 below show that conditionally iid random vectors necessarily have non-negative correlation coefficients. One can show in general that the correlation coefficient (if existent) between two components of an exchangeable random vector on $\mathbb{R}^d$ is bounded from below by $-1/(d-1)$, see, e.g., [1, p. 7]. As the dimension $d$ tends to infinity, this lower bound becomes zero. Even better, the difference between exchangeability and a conditionally iid structure vanishes completely as the dimension $d$ tends to infinity, which is the content of De Finetti's Theorem.

Theorem 1.4 (De Finetti's Theorem)
Let {X k } k∈N be an infinite sequence of random variables on some probability space (Ω, F, P). The sequence {X k } k∈N is exchangeable, meaning that each finite subvector is exchangeable, if and only if it is iid conditioned on some σ-field H ⊂ F. In this case, H equals almost surely the tail-σ-field of {X k } k∈N , which is given by ∩ n≥1 σ(X n , X n+1 , . . .).

Proof
Originally due to [18]. We refer to [1] for a proof based on the reversed martingale convergence theorem, which is briefly sketched in the following. Of course, we only need to verify that exchangeability implies conditionally iid, as the converse is obvious. For the sake of a more convenient notation we assume the infinite sequence $\{X_k\}_{k \in \mathbb{N}_0}$ is indexed by $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$, and we define $\sigma$-algebras $\mathcal{F}_{-n} := \sigma(X_n, X_{n+1}, \ldots)$ for $n \in \mathbb{N}_0$. The tail $\sigma$-field of the sequence is $\mathcal{H} := \cap_{n \leq 0} \mathcal{F}_n$. In order to establish the claim, three auxiliary observations are helpful, with an arbitrary bounded, measurable function $g$ fixed:
(i) Exchangeability implies $(X_0, X_1, \ldots) \stackrel{d}{=} (X_0, X_{n+1}, \ldots)$ for arbitrary $n \in \mathbb{N}$. This implies $E[g(X_0) \mid \mathcal{F}_{-1}] \stackrel{d}{=} E[g(X_0) \mid \mathcal{F}_{-(n+1)}]$.
(ii) The sequence $Y_n := E[g(X_0) \mid \mathcal{F}_n]$, $n \leq 0$, is easily checked to be a reversed martingale. The reversed martingale convergence theorem implies that $Y_n$ converges almost surely and in $L^1$ to $E[g(X_0) \mid \mathcal{H}]$ as $n \to -\infty$. See [21, p. 264 ff] for background on reversed martingales (convergence).
(iii) Letting $n \to \infty$ in (i), we observe from (ii) that $Y_{-1} \stackrel{d}{=} E[g(X_0) \mid \mathcal{H}]$. We can further replace this equality in law by an almost sure equality, since $\mathcal{H} \subset \mathcal{F}_{-1}$ and the second moments of $Y_{-1}$ and $E[g(X_0) \mid \mathcal{H}]$ coincide. Thus, the sequence $\{Y_{-n}\}_{n \in \mathbb{N}}$ is almost surely a constant sequence.
With these auxiliary observations we may now finish the argument. On the one hand, exchangeability implies $(X_0, X_{n+1}, \ldots) \stackrel{d}{=} (X_n, X_{n+1}, \ldots)$, which gives the almost sure equality $E[g(X_0) \mid \mathcal{F}_{-(n+1)}] = E[g(X_n) \mid \mathcal{F}_{-(n+1)}]$. Taking $E[\,\cdot \mid \mathcal{H}]$ on both sides of this equation implies with the tower property of conditional expectation that $E[g(X_0) \mid \mathcal{H}] = E[g(X_n) \mid \mathcal{H}]$. Since $g$ was arbitrary, $X_0$ and $X_n$ are identically distributed conditioned on $\mathcal{H}$, and since $n$ was arbitrary all members of the sequence are identically distributed conditioned on $\mathcal{H}$. To verify conditional independence, let $g_1, g_2$ be two bounded, measurable functions. For $n \geq 1$ arbitrary, we compute with (iii) and the tower property that
$$E[g_1(X_0)\, g_2(X_n) \mid \mathcal{H}] = E[g_1(X_0) \mid \mathcal{H}]\, E[g_2(X_n) \mid \mathcal{H}].$$
Precisely the same tower property argument inductively also implies
$$E\Big[\prod_{j=1}^k g_j(X_{i_j}) \,\Big|\, \mathcal{H}\Big] = \prod_{j=1}^k E\big[g_j(X_{i_j}) \mid \mathcal{H}\big]$$
for arbitrary $0 \leq i_1 < \ldots < i_k$ and bounded measurable functions $g_1, \ldots, g_k$. Thus, the random variables $X_0, X_1, \ldots$ are independent conditioned on $\mathcal{H}$.
As will be explained in more detail in paragraph 1.3 below, we may always think of a conditionally iid random vector $X = (X_1, \ldots, X_d)$ as being defined via $X_k := f(U_k, H)$, $k = 1, \ldots, d$, with some measurable "functional" $f$, an iid sequence of random objects $U_1, \ldots, U_d$, and some independent random object $H$. The object $H$, sometimes called a latent (dependence-inducing) factor, then induces the dependence between the components, which are iid conditioned on $\mathcal{H} = \sigma(H)$. Obviously, in this construction it is possible to let $d$ tend to infinity. Thus, we may without loss of generality think of a conditionally iid random vector $X$ as the first $d$ members of an infinite sequence $\{X_k\}_{k \in \mathbb{N}}$ on $(\Omega, \mathcal{F}, P)$ such that conditioned on some $\sigma$-algebra $\mathcal{H} \subset \mathcal{F}$ the sequence $\{X_k\}_{k \in \mathbb{N}}$ is iid. De Finetti's Theorem thus allows us to view conditionally iid random vectors $X = (X_1, \ldots, X_d)$ as the first $d$ members of an infinite exchangeable sequence $\{X_k\}_{k \in \mathbb{N}}$. More precisely, a probability law $\gamma \in M^1_+(\mathbb{R}^d)$ is conditionally iid if and only if there exists an infinite exchangeable sequence $\{X_k\}_{k \in \mathbb{N}}$ on some (possibly different) probability space such that $X = (X_1, \ldots, X_d) \sim \gamma$. This viewpoint allows us to refine Problem 1.1 as follows.

Problem 1.5 (Motivating problem refined)
Let M denote all probability laws on R d which have some property (P). Further assume that (P) is a property that makes sense in any dimension. Find a convenient probabilistic description for the subfamily M * * ⊂ M of those probability distributions for which there exists an infinite exchangeable sequence {X k } k∈N such that the law of each finite margin (X 1 , . . . , X n ) has property (P), n ∈ N arbitrary.
In the situation of Problem 1.5 we have M * * ⊂ M * , and the inclusion can be proper in general, although this is unusual in cases of interest. A typical example for (P) is the property of "being a multivariate normal distribution (in some dimension)". For a given d-variate multivariate normal law it is a priori unclear whether there exists an infinite exchangeable sequence with d-margins being equal to the given multivariate normal law and such that all n-margins are multivariate normal as well for n > d. This is indeed the case, i.e. M * * = M * in this particular situation, as can be inferred from Example 1.6 below. The typical questions in the theory deal with properties (P) that are dimension-independent, so most results presented are actually solutions to Problem 1.5 rather than to Problem 1.1, see also paragraph 7.5 below for a further discussion related to this subtlety.
Which topics are covered in the present survey?
The present article surveys known answers to Problems 1.1 and 1.5 for families of multivariate probability distributions $\mathcal{M}$ that are well known in the statistical literature and/or have proven useful as mathematical models for specific applications. While several traditional results of the theory date back to the last century, some significant achievements have been accomplished only within the last decade, so the present author feels that this is a good time to recap what has been achieved, hence to write this overview article. One goal of the present survey is to collect the numerous results under one common umbrella in a reader-friendly summary, to make them accessible for a broader audience of applied and theoretical probabilists, and to inspire others to join this interesting strand of research in the future. Proofs, or at least proof sketches, are presented for most results in order to (a) demonstrate how solutions to Problems 1.1 and 1.5 often unravel surprising links between seemingly different fields of mathematics/probability theory, and (b) render this document a useful basis for lecture notes in an advanced course on multivariate statistics or probability theory.
Which topics are not covered in the present survey?
The scope of the former literature on the topic is often wider; in particular, the references [1,50,2] are very popular surveys with wider scope. On the one hand, many references on the topic discuss conditionally iid models under the umbrella of exchangeability, which, as mentioned above, is a weaker notion for finite random vectors. The characterization of the (finitely) exchangeable subfamily of $\mathcal{M}$ is often easier than the characterization of the (in general) smaller set $\mathcal{M}^*$, and is typically an important first step towards a solution to Problem 1.1. However, the second (typically harder) step from (finite) exchangeability to conditionally iid is usually the more important and more interesting one from both a theoretical and a practical perspective. The algebraic structure of a general theory of (finite) exchangeability is naturally of a different, often more combinatorial character, whereas "conditionally iid" by virtue of De Finetti's Theorem is naturally the concept of an infinite limit (of exchangeability), so that techniques from Analysis enter the scene. Thus, we feel it is useful to provide an account with a narrower scope on conditionally iid, even though for some of the presented examples we are well aware that an interesting (finite) exchangeable theory is also viable. On the other hand, many references consider the case when the components of $X$ take values in more general spaces than $\mathbb{R}$, for instance in $\mathbb{R}^n$ (i.e. matrices instead of vectors) or even function spaces. In particular, De Finetti's Theorem 1.4 can be generalized in this regard; seminal references are [27,80]. Research in this direction is by nature more abstract and thus maybe less accessible for a broader audience, or for more practically oriented readers.
One goal of the present survey is to provide an account that is not exclusively geared towards theorists but also towards practitioners, and in particular to point out relationships to classical statistical probability laws on $\mathbb{R}^d$. We believe that a limitation of this survey's scope to the real-valued case is still rich enough to provide a solid basis for an interesting and accessible theory. In fact, we seek to demonstrate that Problems 1.1 and 1.5 have been solved satisfactorily in quite a number of highly interesting cases, and the solutions contain interesting links to different probabilistic topics. Of course, it might be worthwhile to ponder generalizations of some of the presented results to more abstract settings in the future (unless already done), but these purposely lie outside the present survey.
Why are Problems 1.1 and 1.5 interesting at all? Broadly speaking, because of two reasons: (a) conditionally iid models are convenient for applications, and (b) solutions to the extendibility problem sometimes rely on compelling relationships between a priori different theories.
(a) Roughly speaking, conditionally iid models allow one to model (strong and weak) dependence between random variables in a way that features many desirable properties which are tailor-made for applications, in particular when the dimension $d$ is large. Firstly, a conditionally iid random vector is "dimension-free" in the sense that components can be added to or removed from $X$ without altering the basic structure of the model, which simply follows from the fact that an iid sequence remains an iid sequence after addition or removal of certain members. This may be very important in applications that require a regular change of dimension, e.g. the readjustment of a large credit portfolio in a bank, when old credits leave and new credits enter the portfolio frequently. Secondly, if $X$ has a distribution from a parametric family, the parameters of this family are typically determined by the parameters of the underlying latent factor $H$ (whichever object it may be), irrespective of the dimension $d$. Consequently, the number of parameters usually does not grow significantly with the dimension $d$ and may be controlled at one's personal taste. This is an enormous advantage for model design in practice, in particular since the huge degree of freedom/huge number of parameters in a high-dimensional dependence model is often more bane than boon. Thirdly, fundamental statistical theorems relying on the iid assumption, like the law of large numbers, may still be applied in a conditionally iid setting, making such models very tractable. Last but not least, in dependence modeling a "factor-model way of thinking" is very intuitive, e.g. it is well established in the multivariate normal case (think of principal component analysis etc.). On a high level, if one wishes to design a multi-factor dependence model within a certain family of distributions $\mathcal{M}$, an important first step is to determine the one-factor subfamily $\mathcal{M}^*$.
Having found a conditionally iid stochastic representation of M * , the design of multi-factor models is sometimes obvious from there, see also paragraph 7.3.
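The third point above, that the law of large numbers applies conditionally, can be made concrete with a small simulation. In the following sketch (ours, not from the survey) we use the one-factor Gaussian model of Example 1.6 with $\mu = 0$, $\sigma = 1$: the sample mean of the components converges to the random limit $E[X_1 \mid \mathcal{H}] = \sqrt{\rho}\, M$ rather than to the unconditional mean $0$:

```python
import random, math

# Conditional law of large numbers in the one-factor Gaussian model:
# X_k = sqrt(rho) * M + sqrt(1 - rho) * M_k with a latent factor M.
random.seed(7)
rho = 0.5
M = random.gauss(0.0, 1.0)  # the latent factor, drawn once

n = 200_000
s = 0.0
for _ in range(n):
    s += math.sqrt(rho) * M + math.sqrt(1 - rho) * random.gauss(0.0, 1.0)
running_mean = s / n

# The sample mean is close to the *random* limit sqrt(rho) * M,
# which in general differs from the unconditional mean 0.
print(round(running_mean, 3), round(math.sqrt(rho) * M, 3))
```

The random limit reveals the latent factor: averaging over many components estimates $E[X_1 \mid \mathcal{H}]$, not $E[X_1]$.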
(b) The solution to Problems 1.1 and 1.5 is often mathematically challenging and compelling. It naturally provides an interesting connection between the "static" world of random vectors and the "dynamic" world of (one-dimensional) stochastic processes. The latter enter the scene because the latent factor being responsible for the dependence in a conditionally iid model for X may canonically be viewed as a non-decreasing stochastic process (a random distribution function), which is explained in Section 1.3 below. In particular, for some classical families M of multivariate laws from the statistical literature the family M * * in Problem 1.5 is conveniently described in terms of a well-studied family of stochastic processes like Lévy subordinators, Sato subordinators, or processes which are infinitely divisible with respect to time. Moreover, in order to formally establish the aforementioned link between these two seemingly different fields of research the required mathematical techniques involve classical theories from Analysis like Laplace transforms, Bernstein functions, and moment problems.
Before we start, can we please study a first simple example? It is educational to end this motivating paragraph by demonstrating the motivating problem with a simple example that all readers are familiar with. Denoting by $\mathcal{N}(\mu, \Sigma)$ the multivariate normal law with mean vector $\mu \in \mathbb{R}^d$ and covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$, Example 1.6 provides the solution for Problem 1.1 (and 1.5) in the case when $\mathcal{M}$ consists of all multivariate normal laws.

Example 1.6 (The multivariate normal law) We want to solve Problems 1.1 and 1.5 for the family $\mathcal{M}$ which comprises the probability laws of random vectors $X = (X_1, \ldots, X_d)$ satisfying the property (P) of "having a multivariate normal distribution". We claim that $\mathcal{M}^*$ equals $\mathcal{M}^{**}$ and is given by
$$\mathcal{M}^* = \Big\{ \mathcal{N}\big(\mu\,\vec{1},\, \sigma^2\,[(1-\rho)\,I_d + \rho\,\vec{1}\,\vec{1}^\top]\big) \,:\, \mu \in \mathbb{R},\ \sigma \geq 0,\ \rho \in [0,1] \Big\},$$
with $\vec{1} = (1, \ldots, 1)$ and $I_d$ the identity matrix, i.e. by the normal laws with identical means, identical variances, and identical non-negative pairwise correlations.

Proof Consider $X = (X_1, \ldots, X_d) \sim \mathcal{N}(\mu, \Sigma)$ on some probability space $(\Omega, \mathcal{F}, P)$ for $\mu = (\mu_1, \ldots, \mu_d) \in \mathbb{R}^d$ and $\Sigma = (\Sigma_{i,j}) \in \mathbb{R}^{d \times d}$ a positive definite matrix. If we assume that the law of $X$ is in $\mathcal{M}^*$, it follows that there is a sub-$\sigma$-algebra $\mathcal{H} \subset \mathcal{F}$ such that the components $X_1, \ldots, X_d$ are iid conditioned on $\mathcal{H}$. Consequently,
$$\mu_k = E[X_k] = E\big[E[X_1 \mid \mathcal{H}]\big] = \mu_1, \quad (1)$$
irrespective of $k = 1, \ldots, d$. The analogous reasoning also holds for the second moment of $X_k$, which implies $\Sigma_{k,k} = \Sigma_{1,1}$ for all $k$. Moreover, for arbitrary components $i \neq j$,
$$\Sigma_{i,j} = E[X_i X_j] - E[X_i]\,E[X_j] = E\big[E[X_1 \mid \mathcal{H}]^2\big] - E\big[E[X_1 \mid \mathcal{H}]\big]^2 \geq 0, \quad (2)$$
where we used the conditionally iid structure and Jensen's inequality. This finally implies that all off-diagonal elements of $\Sigma$ are identical and non-negative.
Conversely, let $\mu \in \mathbb{R}$, $\sigma > 0$, and $\rho \in [0,1]$. Consider a probability space on which $d+1$ iid standard normally distributed random variables $M, M_1, \ldots, M_d$ are defined. We define
$$X_k := \mu + \sigma\big(\sqrt{\rho}\, M + \sqrt{1-\rho}\, M_k\big), \quad k = 1, \ldots, d. \quad (3)$$
It is readily observed that $X = (X_1, \ldots, X_d)$ has a multivariate normal law with pairwise correlation coefficients all being equal to $\rho$, and all components having mean $\mu$ and variance $\sigma^2$. Notice in particular that the non-negativity of $\rho$ is important in the construction (3) because the square root is not well-defined otherwise. The components of $X$ are obviously conditionally iid given the $\sigma$-algebra $\mathcal{H}$ generated by $M$. Hence the law of $X$ is in $\mathcal{M}^*$. Obviously, the stochastic construction (3) works along an infinite sequence $M_1, M_2, \ldots$ as well, so the law of $X$ is also in $\mathcal{M}^{**}$.
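As a quick numerical sanity check of construction (3), the following Python sketch (illustrative; the parameter values are our own choices) simulates the one-factor model and verifies empirically that the components have the prescribed mean and pairwise correlation:

```python
import random, math

def sample_X(d, mu, sigma, rho, rng):
    """One draw from construction (3): common factor M plus idiosyncratic
    factors M_1, ..., M_d, all iid standard normal."""
    M = rng.gauss(0.0, 1.0)
    return [mu + sigma * (math.sqrt(rho) * M + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0))
            for _ in range(d)]

rng = random.Random(1)
n, mu, sigma, rho = 100_000, 1.0, 2.0, 0.4
xs = [sample_X(2, mu, sigma, rho, rng) for _ in range(n)]
m1 = sum(x[0] for x in xs) / n
m2 = sum(x[1] for x in xs) / n
cov = sum((x[0] - m1) * (x[1] - m2) for x in xs) / n
v1 = sum((x[0] - m1) ** 2 for x in xs) / n
v2 = sum((x[1] - m2) ** 2 for x in xs) / n
corr = cov / math.sqrt(v1 * v2)
print(round(m1, 2), round(corr, 2))  # empirically near mu = 1.0 and rho = 0.4
```

Replacing `rho` by a negative number would require the square root of a negative value, mirroring the observation in the proof that only $\rho \in [0,1]$ admits this one-factor representation.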

Canonical probability spaces
We have mentioned earlier that a conditionally iid random vector $X$ is usually constructed as $X_k := f(U_k, H)$, $k = 1, \ldots, d$, from an iid sequence $U_1, \ldots, U_d$, some independent stochastic object $H$, and some functional $f$. Clearly, this general model is inconvenient because neither the law of $U_1$, nor the nature of the stochastic object $H$ or the functional $f$ are given explicitly. However, there is a canonical choice for all three entities, which we are going to consider in the sequel. By definition, conditionally iid means that conditioned on the object $H$ the random variables $X_1, \ldots, X_d$ are iid, distributed according to a univariate distribution function $F$, which may depend on $H$. A univariate distribution function $F$ is nothing but a non-decreasing, right-continuous function $F : \mathbb{R} \rightarrow [0,1]$ with $\lim_{t \to -\infty} F(t) = 0$ and $\lim_{t \to \infty} F(t) = 1$, see [11, Theorem 12.4, p. 176]. Without loss of generality we may assume that the random object $H = \{H_t\}_{t \in \mathbb{R}}$ already is the conditional distribution function itself, i.e. is a random variable in the space of distribution functions, or, in other words, a non-decreasing, right-continuous stochastic process with $\lim_{t \to -\infty} H_t = 0$ and $\lim_{t \to \infty} H_t = 1$. In other words, $H \in M^1_+(\mathcal{H})$. In this case, a canonical choice for the law of $U_1$ is the uniform distribution on $[0,1]$ and the functional $f$ may be chosen as the generalized inverse, i.e.
$$X_k := H^{-1}(U_k) = \inf\{t \in \mathbb{R} : H_t \geq U_k\}, \quad k = 1, \ldots, d. \quad (4)$$
Indeed, one verifies that $X_1, \ldots, X_d$ are iid conditioned on $\mathcal{H} := \sigma\big(\{H_t\}_{t \in \mathbb{R}}\big)$, with common univariate distribution function $H$, since
$$P(X_1 \leq t_1, \ldots, X_d \leq t_d \mid \mathcal{H}) = \prod_{k=1}^d P(U_k \leq H_{t_k} \mid \mathcal{H}) = \prod_{k=1}^d H_{t_k}$$
for all $t_1, \ldots, t_d \in \mathbb{R}$. Every random vector which is conditionally iid can be constructed like this, i.e. there is a one-to-one relation between such models and random variables in the space of (one-dimensional) distribution functions. For each given $H = \{H_t\}_{t \in \mathbb{R}} \in M^1_+(\mathcal{H})$, and a given dimension $d \in \mathbb{N}$, the canonical construction (4) induces a multivariate probability distribution on $\mathbb{R}^d$, which we denote by $\Theta_d(H)$.
Consistently, for $\mathcal{M} \subset M^1_+(\mathbb{R}^d)$ we denote by $\Theta_d^{-1}(\mathcal{M}^*)$ the subset of $M^1_+(\mathcal{H})$ which consists of all stochastic processes $\{H_t\}_{t \in \mathbb{R}}$ such that $X$ of the canonical construction (4) has a law in $\mathcal{M}$, hence in $\mathcal{M}^*$. Hence, there is a one-to-one correspondence between the sets $\mathcal{M}^*$ and $\Theta_d^{-1}(\mathcal{M}^*)$ induced by the stochastic model (4), i.e. $\Theta_d$ is a bijection between these two sets. From this equivalent viewpoint, our motivating Problems 1.1 and 1.5 become the problem of determining $\Theta_d^{-1}(\mathcal{M}^*)$. Admittedly, this reformulation in terms of the stochastic process $H$ might appear quite artificial at this point, but we will see later that in some cases the correspondence between $\mathcal{M}^*$ and $\Theta_d^{-1}(\mathcal{M}^*)$ can be described very conveniently. On a high level, the problem of determining the intersection of the given family $\mathcal{M}$ of distributions with the family of conditionally iid distributions may also be re-phrased as the problem of finding an increasing stochastic process whose stochastic nature induces the given multivariate distribution when inserted into a canonical stochastic model.
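A minimal simulation of the canonical construction (4) may look as follows. The specific choice of the random distribution function $H$, namely the cdf of $\mathcal{N}(M,1)$ with a latent standard normal location $M$, is ours for illustration, since then $H^{-1}$ is available in closed form:

```python
import random
from statistics import NormalDist

# Canonical construction (4) with an illustrative random distribution
# function H: the cdf of N(M, 1) for a latent standard normal M.
rng = random.Random(42)

def draw_X(d):
    M = rng.gauss(0.0, 1.0)                       # latent factor determining H
    H_inv = NormalDist(mu=M, sigma=1.0).inv_cdf   # generalized inverse H^{-1}
    return [H_inv(rng.random()) for _ in range(d)]  # X_k = H^{-1}(U_k)

n = 50_000
sample = [draw_X(2) for _ in range(n)]
mean0 = sum(x[0] for x in sample) / n
mean1 = sum(x[1] for x in sample) / n
cov = sum((x[0] - mean0) * (x[1] - mean1) for x in sample) / n
print(round(mean0, 2), round(cov, 2))  # mean near 0, covariance near Var(M) = 1
```

Here $X_k = M + Z_k$ with iid standard normal $Z_k$, so the positive covariance between components is exactly the variance of the latent factor, in line with Lemma 1.8 below.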

Laws with positive components
If the given family $\mathcal{M}$ consists only of probability laws on $[0, \infty)^d$, it is convenient to slightly reformulate the stochastic model (4). Clearly, if we have non-negative components, necessarily $H_t = 0$ for all $t < 0$ almost surely. Therefore, without loss of generality we may assume that $H = \{H_t\}_{t \geq 0}$ is indexed by $t \in [0, \infty)$. Moreover, applying the substitution $z = -\log(1-u)$ it trivially holds true that
$$\inf\{t \geq 0 : H_t \geq u\} = \inf\{t \geq 0 : -\log(1 - H_t) \geq -\log(1-u)\}.$$
One may therefore rewrite the canonical construction (4) as
$$X_k := \inf\{t \geq 0 : Z_t \geq \epsilon_k\}, \quad k = 1, \ldots, d, \quad (5)$$
where the $\epsilon_k := -\log(1 - U_k)$, $k = 1, \ldots, d$, are now iid exponential random variables with unit mean, and $Z = \{Z_t\}_{t \geq 0}$ is now no longer a distribution function, but instead a non-decreasing, right-continuous process with $Z_0 \geq 0$ and $\lim_{t \to \infty} Z_t = \infty$, related to $H$ via the substitution $Z_t = -\log(1 - H_t)$. This canonical probability space is visualized in Figure 1.
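Construction (5) is easy to simulate for simple choices of $Z$. In the following sketch (ours, not from the survey) we take $Z_t = \Lambda t$ for a latent random rate $\Lambda > 0$, here chosen unit exponential; then $Z^{-1}(\epsilon) = \epsilon/\Lambda$, i.e. conditioned on $\Lambda$ the components are iid exponential with rate $\Lambda$:

```python
import random

# Construction (5) with the illustrative choice Z_t = Lambda * t:
# X_k = inf{t : Lambda * t >= eps_k} = eps_k / Lambda.
rng = random.Random(3)

def draw_X(d):
    lam = rng.expovariate(1.0)                              # latent rate Lambda
    return [rng.expovariate(1.0) / lam for _ in range(d)]   # X_k = eps_k / Lambda

sample = [draw_X(3) for _ in range(10_000)]
assert all(min(x) > 0 for x in sample)        # components are non-negative
assert all(len(set(x)) == 3 for x in sample)  # no ties: Z has continuous paths

vals = [v for x in sample for v in x]
frac = sum(v > 1.0 for v in vals) / len(vals)
print(round(frac, 2))  # near P(X_1 > 1) = E[exp(-Lambda)] = 1/2
```

The marginal survival function is $P(X_1 > x) = E[e^{-\Lambda x}] = 1/(1+x)$, i.e. a heavy-tailed mixture of exponentials, which illustrates how the law of the latent process shapes the resulting multivariate distribution.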

General properties of conditionally iid models
In this section we briefly collect some general properties of conditionally iid models. To this end, throughout this section we assume that $\mathcal{M} = M^1_+(\mathbb{R}^d)$ denotes the family of all $d$-dimensional probability laws on $\mathbb{R}^d$ and we collect general properties of $\mathcal{M}^*$.

Fig. 1 Illustration of one simulation of the canonical construction (5) in dimension $d = 4$. One observes that the process $Z = \{Z_t\}_{t \geq 0}$ in this particular illustration has jumps, and therefore there is a positive probability that two components take the identical value. This does not happen if $Z$ is a continuous process, see also Lemma 1.12 below.

Positive dependence
If the law of $X$ is in $\mathcal{M}^*$, the covariance matrix of $X$ (provided it exists) cannot have negative entries.

Lemma 1.8 (Non-negative correlations)
If the law of X is in M * and the covariance matrix of X exists, then all its entries are non-negative.

Proof
This follows from precisely the same computations that have been carried out already in (1) and (2) for the particular example of the multivariate normal distribution.
Correlation coefficients are sometimes inappropriate dependence measurements outside the Gaussian paradigm. For instance, their existence depends on the existence of second moments, or we might have a correlation coefficient that is strictly less than one despite the fact that one component of the random vector is a monotone function of the other, since correlation coefficients depend on the marginal distributions as well. For these reasons, several alternative dependence measurements have been developed. One popular among them is the concordance measurement Kendall's Tau.
In words, concordance means that one of the two points lies north-east of the other, while discordance means that one of the two lies north-west of the other. Kendall's Tau for a bivariate random vector X is defined as the difference between the probability of concordance and the probability of discordance for two independent copies of X. If X is conditionally iid, Kendall's Tau is necessarily non-negative.
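The definition of Kendall's Tau just given translates directly into a brute-force sample estimator. The following Python sketch (illustrative; the one-factor Gaussian model is taken from Example 1.6, the parameter choice is ours) computes it for a conditionally iid sample:

```python
import random, math

def kendalls_tau(pairs):
    """Brute-force O(n^2) estimator: (#concordant - #discordant) / #pairs."""
    n, s = len(pairs), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            s += (prod > 0) - (prod < 0)  # ties count as neither
    return 2.0 * s / (n * (n - 1))

# Conditionally iid toy model: one-factor Gaussian with rho = 0.6.
rng = random.Random(5)

def draw():
    M = rng.gauss(0.0, 1.0)
    return tuple(math.sqrt(0.6) * M + math.sqrt(0.4) * rng.gauss(0.0, 1.0)
                 for _ in range(2))

tau = kendalls_tau([draw() for _ in range(800)])
print(round(tau, 2))  # non-negative, near (2/pi) * arcsin(0.6) ~ 0.41
```

The estimate is clearly positive, which is an instance of the non-negativity asserted in Lemma 1.9 below.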

Lemma 1.9 (Non-negative Kendall's Tau)
If the law of X = (X 1 , X 2 ) is in M * , then Kendall's Tau is necessarily non-negative.

Proof
Let $X^{(1)}$ and $X^{(2)}$ be two independent copies of $X$, both defined on some common probability space. By assumption we find a $\sigma$-algebra $\mathcal{H}$ such that conditioned on $\mathcal{H}$ all four random variables $X_1^{(1)}, X_2^{(1)}, X_1^{(2)}, X_2^{(2)}$ are independent with respective distribution functions $H^{(1)}$ (for $X_1^{(1)}, X_2^{(1)}$) and $H^{(2)}$ (for $X_1^{(2)}, X_2^{(2)}$). Notice that $H^{(1)}$ and $H^{(2)}$ are iid. We compute
$$P\big(X_1^{(1)} > X_1^{(2)},\, X_2^{(1)} > X_2^{(2)} \,\big|\, \mathcal{H}\big) = \Big( \int H^{(2)}(x-) \, \mathrm{d}H^{(1)}(x) \Big)^2$$
and analogously for the remaining concordance and discordance probabilities. Writing $p := \int H^{(2)}(x-)\,\mathrm{d}H^{(1)}(x)$ and $q := \int H^{(1)}(x-)\,\mathrm{d}H^{(2)}(x)$, the conditional probability of concordance equals $p^2 + q^2$ and the conditional probability of discordance equals $2pq$, so that Kendall's Tau equals $E[(p-q)^2] \geq 0$.

The next lemma is less intuitive at first glimpse, but like Lemmata 1.8 and 1.9 it qualitatively states that laws in $\mathcal{M}^*$ exhibit some sort of "positive" dependence. In order to understand it, it is useful to recall the notion of majorization, see [71] for a textbook account of the topic. A vector $a = (a_1, \ldots, a_d)$ is said to majorize a vector $b = (b_1, \ldots, b_d)$ if the sum of the $n$ largest entries of $a$ dominates the sum of the $n$ largest entries of $b$ for each $n = 1, \ldots, d-1$, while the sums of all $d$ entries of $a$ and $b$ coincide. Intuitively, the entries of $b$ are "closer to each other" than the entries of $a$, even though the sum of all entries is identical for both vectors. For instance, the vector $(1, 0, \ldots, 0)$ majorizes the vector $(1/2, 1/2, 0, \ldots, 0)$, which majorizes $(1/3, 1/3, 1/3, 0, \ldots, 0)$, and so on.

Lemma 1.10 (A link to majorization)
Consider $X$ with law in $\mathcal{M}^*$. Further, let $Y = (Y_1, \ldots, Y_d)$ be a random vector with components that are iid and satisfy $Y_1 \stackrel{d}{=} X_1$. We denote $F_Z(x) := P(Z \leq x)$ for a real-valued random variable $Z$ and $x \in \mathbb{R}$, and write $X_{[1]} \leq \ldots \leq X_{[d]}$ for the order statistics of $X$ (analogously for $Y$).
(a) The vector $\big(E[F_{X_1}(X_{[1]})], \ldots, E[F_{X_1}(X_{[d]})]\big)$ is majorized by the vector $\big(E[F_{Y_1}(Y_{[1]})], \ldots, E[F_{Y_1}(Y_{[d]})]\big)$.
(b) For every monotone function $g$ such that the expectations exist, the vector $\big(E[g(X_{[1]})], \ldots, E[g(X_{[d]})]\big)$ is majorized by the vector $\big(E[g(Y_{[1]})], \ldots, E[g(Y_{[d]})]\big)$.

Proof (Sketch) First, it is not difficult to verify that the expected sum of the $n$ largest order statistics of an iid sample, viewed as a function of the underlying distribution, is concave for arbitrary $1 \leq n \leq d$. Second, concavity implies, by first conditioning on the latent distribution function and then applying Jensen's inequality, that the expected sum of the $n$ largest order statistics of $X$ is dominated by the corresponding quantity for the iid vector $Y$. Making use of the relation $\sum_{k=1}^d E[g(X_{[k]})] = d\, E[g(X_1)] = d\, E[g(Y_1)] = \sum_{k=1}^d E[g(Y_{[k]})]$, this establishes the claim for non-decreasing $g$. For the general case, one simply has to observe that the law of $\big(g(X_1), \ldots, g(X_d)\big)$ is also in $\mathcal{M}^*$, and due to monotonicity of $g$ we have either $g(X_{[1]}) \leq \ldots \leq g(X_{[d]})$ in the non-decreasing case or $g(X_{[d]}) \leq \ldots \leq g(X_{[1]})$ in the non-increasing case.
Intuitively, statement (b) in case g(x) = x states that the expected values of the order statistics E[X_[k]], k = 1, ..., d, are closer to each other than the respective values would be if the components of X were iid (and not only conditionally iid). The components of a conditionally iid random vector are thus less spread out than the components of a random vector with iid components. Thus, Lemmata 1.8, 1.9 and 1.10 show that dependence models built from a conditionally iid setup can only capture the situation of components being "more clustered" than under independence, which is loosely interpreted as "positive dependence". Generally speaking, negative dependence concepts are more complicated than positive dependence concepts in dimensions d ≥ 3; the interested reader is referred to [77] for a nice overview and references dealing with such concepts.
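Statement (b) with g(x) = x can be illustrated with a small simulation. The following Python sketch (an illustration under distributions of our own choosing, not part of the surveyed results) compares the expected order statistics of a conditionally iid vector with those of an iid vector sharing the same marginal law:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 200_000

# Conditionally iid: given M ~ N(0,1), the components are iid N(M, 1),
# so each component has marginal law N(0, 2).
M = rng.normal(size=(n, 1))
X = M + rng.normal(size=(n, d))

# Genuinely iid with the same N(0, 2) marginal law.
Y = rng.normal(scale=np.sqrt(2.0), size=(n, d))

EX = np.sort(X, axis=1).mean(axis=0)  # estimates of E[X_[1]], ..., E[X_[d]]
EY = np.sort(Y, axis=1).mean(axis=0)

# The conditionally iid order statistics are less spread out.
print("spread cond. iid:", EX.max() - EX.min())
print("spread iid      :", EY.max() - EY.min())
```

In line with the majorization statement, both vectors of expected order statistics sum to (approximately) zero, but the extreme expected order statistics of the iid sample are farther apart.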
Whereas Lemmata 1.8, 1.9 and 1.10 provide three particular quantifications of positive dependence for a conditionally iid probability law, many other concepts of positive dependence can be found in the literature; a textbook account on the topic is [75]. [83, Theorem 4] claims that if X = (X_1, ..., X_d) is conditionally iid and x -> P(X_1 ≤ x) is continuous, then X is positively lower orthant dependent. This claim fails in general, however: it is not difficult to construct a conditionally iid random vector X whose distribution function is a copula, i.e. has standard uniform one-dimensional marginals, but which is not positively lower orthant dependent. Notice that Kendall's Tau for such a counterexample X is exactly equal to zero, and also the correlation coefficient between the components of X equals zero.

Further properties
Even though it is obvious, we find it educational to point out explicitly that path continuity of H corresponds to the absence of a singular component in the law of X.

Lemma 1.12 (Path continuity of H)
Let H ∈ M 1 + (H) and consider the random vector X = (X 1 , . . . , X d ) constructed in Equation (4) for arbitrary d ≥ 2. Then P(X 1 = X 2 ) = 0 if and only if the paths of H are almost surely continuous.

Proof
Conditioned on the σ-algebra H generated by H, the random variables X 1 , X 2 are iid with distribution function H. Since two iid random variables take exactly the same value with positive probability if and only if their common distribution function has at least one jump, the claim follows.
The following result is shown in [86,Proposition 4.2], but we present a slightly different proof.

Lemma 1.13 (Closure under convergence in distribution)
If X (n) are conditionally iid and converge in distribution to X, then the law of X is also conditionally iid.

Proof
Since we only deal with a statement in distribution, we are free to assume that each X^(n) is represented as in (4) with some random distribution function H^(n), viewed as a random element of the set of distribution functions of random variables taking values in [-∞, ∞]. This set is compact by Helly's Selection Theorem and Hausdorff when equipped with the topology of pointwise convergence at all continuity points of the limit, see [88]. Thus, the Radon probability measures on this set form a compact set by [3, Corollary II.4.2, p. 104]. This implies that we find a convergent subsequence {n_k}_{k∈N} ⊂ N such that H^(n_k) converges in distribution to some limiting stochastic process H, which itself takes values in the set of distribution functions of random variables taking values in [-∞, ∞]. It is now not difficult to see, using bounded convergence in the last step, that the law of X can be constructed canonically as in (4); hence X is conditionally iid. Finally, since X is assumed to take values in R^d, necessarily H is almost surely the distribution function of a random variable taking values in R (instead of [-∞, ∞]).
Recall that a random vector (X_1, ..., X_d) is called radially symmetric if there exists a point μ ∈ R^d such that X - μ and μ - X are equal in distribution. If (X_1, ..., X_d) is constructed as in Equation (4), then radial symmetry can be translated into a symmetry property of the random distribution function H, which is the content of the following lemma.

Proof
On the one hand, we obtain an expression for the law of X from the construction (4); on the other hand, radial symmetry yields a second expression, from which the claimed equivalence can be deduced easily. Notice that the conditionally iid structure implies that d can be chosen arbitrarily large, and the law of H is determined uniquely by the law of an infinite exchangeable sequence {X_k}_{k∈N} constructed as in (4) with d -> ∞.

Example 1.15 (The multivariate normal law, again)
The most prominent radially symmetric distribution is the multivariate normal law. Recalling Example 1.6, it follows from (3) that N(μ, Σ)*, the conditionally iid normal laws, are induced by the random distribution function H given by

H_t = Φ( (t - μ - σ √ρ M) / (σ √(1 - ρ)) ), t ∈ R,

for some μ ∈ R, σ > 0, and ρ ∈ [0, 1], and a random variable M ∼ Φ = distribution function of a standard normal law. The reader may check that this random distribution function H satisfies the property of Lemma 1.14.
An immediate but quite useful property of a conditionally iid model is the following corollary to the classical Glivenko-Cantelli Theorem.

Lemma 1.16 (Conditional Glivenko-Cantelli)
Let {X_k}_{k∈N} be an infinite exchangeable sequence defined by the canonical construction (4) from an infinite iid sequence {U_k}_{k∈N} and an independent random distribution function H ∈ M 1 + (H). It holds almost surely and uniformly in t ∈ R that

lim_{d -> ∞} (1/d) Σ_{k=1}^{d} 1_{ {X_k ≤ t} } = H_t.

Proof
Follows immediately from the classical Glivenko-Cantelli Theorem, which is applied conditionally on the σ-algebra generated by H: given H, the random variables X_k are iid with distribution function H, so the empirical distribution functions converge uniformly to H almost surely.

The stochastic nature of the process {H_t}_{t∈R} clearly determines the law of X. Conversely, Lemma 1.16 tells us that the law of the d-dimensional vector X does not determine the law of the underlying latent factor {H_t}_{t∈R} in general, but accomplishes this in the limit as d -> ∞. Given some infinite exchangeable sequence of random variables {X_k}_{k∈N}, it shows how we can recover its latent random distribution function H.
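The recovery of H from a long exchangeable sample can be illustrated numerically. The following Python sketch (with an arbitrarily chosen mixing law, purely for illustration) draws one realization of a latent parameter M, simulates a conditionally iid sample, and compares the empirical distribution function with the realized H:

```python
import numpy as np

rng = np.random.default_rng(1)

# One draw of the latent factor: here H_t = 1 - exp(-M t), the df of Exp(M).
M = rng.uniform(0.5, 2.0)
X = rng.exponential(scale=1.0 / M, size=100_000)  # conditionally iid sample given M

t = np.linspace(0.1, 3.0, 50)
ecdf = (X[None, :] <= t[:, None]).mean(axis=1)    # empirical distribution function
H = 1.0 - np.exp(-M * t)                          # the realized latent H_t

print("uniform error:", np.max(np.abs(ecdf - H)))  # small, by Glivenko-Cantelli
```

The uniform error decreases at the usual Glivenko-Cantelli rate as the sample size grows; the law of M itself only becomes visible when the experiment is repeated with fresh draws of M.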
For the sake of completeness, the following remark gives two equivalent conditions for exchangeability of an infinite sequence of random variables.

Remark 1.17 (Conditions equivalent to infinite exchangeability)
A result due to [81] states that an infinite sequence {X_k}_{k∈N} of random variables is exchangeable (or, equivalently, conditionally iid by De Finetti's Theorem) if and only if the law of the infinite sequence {X_{n_k}}_{k∈N} is invariant with respect to the choice of (increasing) subsequence {n_k}_{k∈N} ⊂ N. Another condition equivalent to exchangeability is that {X_{τ+k}}_{k∈N} is equal in law to {X_k}_{k∈N} for an arbitrary finite stopping time τ with respect to the filtration F_n := σ(X_1, ..., X_n), n ∈ N, see [46].

A general (abstract) solution to Problem 1.1
[51] solve Problem 1.1 on an abstract level for the whole family M = M 1 + (R^d) of all probability laws on R^d. Their result is formulated in the next theorem in our notation.
where the outer supremum is taken over all bounded, measurable functions g : R d → R, and the inner supremum in the denominator is taken over all random vectors Y = (Y 1 , . . . , Y d ) with iid components.

Proof
The proof of sufficiency is the difficult part, relying on functional analytic methods, and we refer the interested reader to [51, Theorem 5.1], but provide some intuition below. Necessity of the condition in Theorem 1.18 is the easy part, as will briefly be explained.
Without loss of generality we may assume that X is represented by (4) with some stochastic process H ∈ M 1 + (H) and an independent sequence of iid variates U_1, ..., U_d uniformly distributed on [0, 1]. For arbitrary bounded and measurable g, a short computation conditioning on H then verifies the claimed inequality. Regarding the intuition behind the sufficiency of the condition in Theorem 1.18, we provide one demonstrating example. With X standard normal, we have already seen in Example 1.6 that the random vector X = (X, -X) is not conditionally iid, since it is bivariate normal with negative correlation coefficient. So how does this random vector violate the condition? Considering a suitable bounded measurable function g(x_1, x_2) and an arbitrary vector Y = (Y_1, Y_2) with iid components, we observe that the supremum over all such Y is bounded from above by 1/4, whereas the corresponding expectation for X = (X, -X) is at least twice as large. Consequently, the supremum over all g in the condition of Theorem 1.18 is at least two, hence larger than one. The intuition behind this counterexample is that we have found one particular bounded measurable g that addresses a distributional property of X that sets it apart from any iid sequence. Indeed, the proof of [51] relies on the Hahn-Banach Theorem and thus on a separation argument, since the set of conditionally iid laws can intuitively be viewed as a closed convex subset of M 1 + (R^d) with extremal boundary comprising the laws with iid components.
On the one hand, Theorem 1.18 is clearly a milestone with regard to the present survey, as it solves Problem 1.1 in the general case. On the other hand, the derived condition is difficult to apply in particular cases of Problem 1.1, when the family M is some (semi-)parametric family of interest - simply because the involved suprema are hard to evaluate. On a high level, Theorem 1.18 solves Problem 1.1 but not the refined Problem 1.5, which depends on an additional dimension-independent property (P). However, the most compelling results of the theory deal precisely with certain dimension-independent properties (P) of interest, see the upcoming sections as well as paragraph 7.5 for a further discussion. This is because, concerning both practical applications and amenable algebra, the additional structure provided by some property (P), and the search for structure-preserving extensions, is in many cases a natural and interesting problem, whose algebraic treatment is in general highly case-specific.

Binary sequences
We study probability laws on {0, 1}^d, i.e. on the set of finite binary sequences. We start with a short digression on the little moment problem, because it occupies a commanding role, not only in this section but also in Section 4 below. For a further discussion of the relation between the little moment problem and De Finetti's Theorem, the interested reader is referred to [16].

Hausdorff's moment problem
The (reversed) difference operator ∇ for sequences (b_0, ..., b_d) of real numbers is defined as ∇ b_k := b_k - b_{k+1}, ∇^2 b_k := ∇(∇ b_k) = b_k - 2 b_{k+1} + b_{k+2}, and so on. A sequence (b_0, ..., b_d) is called d-monotone if ∇^j b_k ≥ 0 for all j, k ≥ 0 with j + k ≤ d, and an infinite sequence {b_k}_{k∈N_0} is called completely monotone if ∇^j b_k ≥ 0 for all j, k ∈ N_0. If we think of b_k as the value of a function evaluated at k, then (-1)^j ∇^j b_k is something like the j-th derivative at k. With this interpretation in mind, d-monotonicity means that the higher-order derivatives alternate in sign, i.e. the first derivative is non-positive, the second derivative is non-negative, the third derivative is non-positive, and so on. For instance, a 2-monotone sequence is non-increasing (b_k ≥ b_{k+1}) and "convex" (b_{k+1} is smaller than or equal to the arithmetic mean of its neighbors b_k and b_{k+2}). The set of all d-monotone sequences starting with b_0 = 1 will be denoted by M_d in the sequel. Similarly, M_∞ denotes the set of completely monotone sequences starting with b_0 = 1.
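The difference operator and the notion of d-monotonicity are straightforward to implement. The following Python sketch (helper names are our own) checks d-monotonicity by iterated differencing:

```python
import numpy as np

def diff_op(b):
    """Reversed difference operator: (diff_op(b))_k = b_k - b_{k+1}."""
    b = np.asarray(b, dtype=float)
    return b[:-1] - b[1:]

def is_d_monotone(b, tol=1e-12):
    """Check that all iterated differences of (b_0, ..., b_d) are non-negative."""
    b = np.asarray(b, dtype=float)
    while len(b) > 1:
        if np.any(b < -tol):
            return False
        b = diff_op(b)
    return bool(np.all(b >= -tol))

# The moment sequence b_k = 1/(k+1) of M ~ Uniform(0,1) is completely monotone,
# hence d-monotone for every d.
print(is_d_monotone([1 / (k + 1) for k in range(6)]))   # True
# A non-increasing but non-convex sequence fails already at 2-monotonicity.
print(is_d_monotone([1.0, 0.9, 0.5]))                   # False
```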
Finite sequences in M_d arise quite naturally in the context of discrete exchangeable probability laws, as will briefly be explained. Consider an exchangeable probability distribution on the power set (including the empty set) of {1, ..., d}, which has cardinality 2^d. By exchangeability, the probability of some subset I ⊂ {1, ..., d} only depends on the cardinality |I| of I, and there are only d + 1 possible cardinalities. Denote the probability of a subset with cardinality k by p_k, k = 0, ..., d. Then p_0, ..., p_d are non-negative numbers satisfying Σ_{k=0}^{d} C(d, k) p_k = 1, with C(d, k) denoting the binomial coefficient. Defining the sequence

b_k := Σ_{i=0}^{d-k} C(d-k, i) p_{k+i}, k = 0, ..., d, (7)

one obtains an element of M_d, since each ∇^j b_k is a non-negative linear combination of p_0, ..., p_d. The so-called Hausdorff moment problem (also known as little moment problem) states that the sequences in M_∞ stand in one-to-one correspondence with the moment sequences of random variables taking values on the unit interval [0, 1]. Concretely, {b_k}_{k∈N_0} ∈ M_∞ if and only if b_k = E[M^k], k ∈ N_0, for some random variable M taking values in [0, 1], and the sequence uniquely determines the probability law of M. This result is originally due to [40, 41]. See also [29, p. 225] for a proof. Uniqueness of the probability law of M relies heavily on the boundedness of the interval [0, 1] and is due to the fact that polynomials are dense in the space of continuous functions on a bounded interval (Stone-Weierstrass).
It is important to observe that not every d-monotone sequence can be extended to a completely monotone sequence. Given a d-monotone sequence (b_0, ..., b_d), to check whether there exists an extension b_{d+1}, b_{d+2}, ... to an infinite completely monotone sequence {b_k}_{k∈N_0} is a purely analytical, highly non-trivial problem, and luckily already solved. This problem is known as the truncated Hausdorff moment problem. Its solution, due to [47], states that (b_0, ..., b_d) with b_0 = 1 can be extended to an element of M_∞ if and only if the Hankel determinants Ĥ_1, Ȟ_1, ..., Ĥ_{d-1}, Ȟ_{d-1} are all non-negative; here Ĥ_ℓ and Ȟ_ℓ are determinants of Hankel matrices built from the sequence (b_0, ..., b_d), defined for all ℓ ∈ N_0 with 2ℓ ≤ d, respectively 2ℓ + 1 ≤ d. To provide an example, the sequence (1, 1/2, ε) is 2-monotone for all ε ∈ [0, 1/2], but can only be extended to a completely monotone sequence if ε ∈ [1/4, 1/2].
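For the example just given, the relevant condition can be checked numerically. The following Python sketch (a simplified illustration: it only evaluates the single Hankel determinant that matters for d = 2, not the full family of determinants) recovers the extendibility range ε ∈ [1/4, 1/2]:

```python
import numpy as np

def hankel_det(b):
    """Determinant of the Hankel matrix (b_{i+j})_{i,j=0..m} built from b_0, ..., b_{2m}."""
    m = (len(b) - 1) // 2
    H = np.array([[b[i + j] for j in range(m + 1)] for i in range(m + 1)])
    return float(np.linalg.det(H))

# For (1, 1/2, eps) the determinant equals eps - 1/4; together with
# 2-monotonicity (eps <= 1/2) this reproduces the range [1/4, 1/2].
for eps in (0.1, 0.3, 0.5):
    print(eps, hankel_det([1.0, 0.5, eps]))
```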

Extendibility of exchangeable binary sequences
Actually, before Bruno de Finetti published his seminal Theorem 1.4 in 1937, he first published in [17] the same result for the simpler case of binary sequences. In fact, he showed that there is a one-to-one correspondence between exchangeable probability laws on infinite binary sequences and the set M 1 + ([0, 1]) of probability laws on [0, 1]. We start with a random vector X = (X_1, ..., X_d) taking values in {0, 1}^d. We know from Lemma 1.3 that X needs to be exchangeable in order to possibly be conditionally iid, so we concentrate on the exchangeable case. Let 1_m and 0_m denote m-dimensional row vectors with all entries equal to one and zero, respectively, and define p_k := P(X = (1_k, 0_{d-k})), k = 0, ..., d. By exchangeability, P(X = x) = p_{||x||_1} for arbitrary x ∈ {0, 1}^d. Consequently, the probability law of X is fully determined by p_0, ..., p_d.

Theorem 2.2 (Extendibility of exchangeable binary sequences)
Let X be an exchangeable random vector taking values in {0, 1}^d, with p_0, ..., p_d and the sequence (b_0, ..., b_d) defined as in the preceding paragraphs. The following statements are equivalent:
(a) X is conditionally iid.
(b) There is a random variable M taking values in [0, 1] such that p_k = E[M^k (1 - M)^{d-k}], k = 0, ..., d.
(c) The Hankel determinants Ĥ_ℓ, Ȟ_ℓ associated with (b_0, ..., b_d) are all non-negative, for all ℓ ∈ N_0 with 2ℓ ≤ d, respectively 2ℓ + 1 ≤ d.
If one (hence all) of these conditions is satisfied, and U = (U_1, ..., U_d) is an iid sequence of random variables that are uniformly distributed on [0, 1], independent of M in part (b), then X is equal in distribution to (1_{ {U_1 ≤ M} }, ..., 1_{ {U_d ≤ M} }).

Proof
The equivalence of (c) and (b) relies on the truncated Hausdorff moment problem and the identities linking p_0, ..., p_d with the d-monotone sequence (b_0, ..., b_d). Regarding the implication from (a) to (b), assume X is conditionally iid with latent random distribution function H. We define M := 1 - H_{1/2} and observe that, conditioned on M, the random variables X_k are iid Bernoulli with success probability M. This implies the claim.
In words, the canonical stochastic model for conditionally iid X with values in {0, 1}^d is a sequence of d independent coin tosses with success probability M, which is identical for all coin tosses but simulated only once, before the first coin toss. We end this section with two examples of particular interest.
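This coin-tossing interpretation is easy to verify by simulation. In the following Python sketch (the choice M ~ Uniform(0,1) is ours, purely for illustration) every pattern probability p_j matches the moment formula E[M^j (1-M)^{d-j}] from Theorem 2.2(b), which for uniform M evaluates to the Beta function value B(j+1, d-j+1) = 1/((d+1) C(d,j)):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(2)
d, n = 4, 500_000

M = rng.uniform(size=(n, 1))                      # success probability, drawn once per row
X = (rng.uniform(size=(n, d)) < M).astype(int)    # d coin tosses with common probability M

k = X.sum(axis=1)
for j in range(d + 1):
    p_j = (k == j).mean() / comb(d, j)            # probability of one fixed pattern with j ones
    exact = 1.0 / ((d + 1) * comb(d, j))          # E[M^j (1-M)^(d-j)] for uniform M
    print(j, round(float(p_j), 4), round(exact, 4))
```

In particular, for uniform M every total count ||X||_1 = j is equally likely, with probability 1/(d+1).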

Example 2.3 (Pólya's urn)
Let r ∈ N and b ∈ N denote the numbers of red and blue balls in an urn. Define a random vector X ∈ {0, 1} d as follows: (i) Set k := 1.
(ii) Draw a ball at random from the urn.
(iii) Set X k := 1 if the ball is red, and X k := 0 otherwise.
(iv) Put the ball back into the urn together with 1 additional ball of the same color.
(v) If k < d, increase k by one and return to step (ii); otherwise stop.
It is not difficult to observe that X is exchangeable, since P(X = x) depends on x only through ||x||_1. Like in Theorem 2.2 we denote by p_k the probability P(X = x) if ||x||_1 = k, k = 0, ..., d. Using induction over k = d, d-1, ..., 0 and knowledge about the moments of the Beta-distribution, we observe that

p_k = E[M^k (1 - M)^{d-k}], k = 0, ..., d,

where M is a random variable with Beta-distribution whose density is given by f_M(m) = m^{r-1} (1 - m)^{b-1} / B(r, b), m ∈ (0, 1), with B(., .) denoting the Beta function. Thus, the probability law of X has a conditionally iid representation like in Theorem 2.2. This is one of the traditional examples in which the conditionally iid structure is a priori not easy to guess from the original motivation of X - in this case a simple urn replacement model.
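The Beta mixture representation of Pólya's urn can be confirmed numerically. The following Python sketch (parameter choices are ours) simulates the urn directly and compares the law of the number of red balls drawn with the Beta-binomial law implied by the mixture representation:

```python
import numpy as np
from math import comb, gamma

def beta_fn(a, c):
    """Beta function B(a, c) via the Gamma function."""
    return gamma(a) * gamma(c) / gamma(a + c)

rng = np.random.default_rng(3)
r, b, d, n = 2, 1, 5, 200_000   # 2 red and 1 blue ball initially, d draws, n runs

# Vectorized simulation of n independent runs of the urn.
red = np.full(n, float(r))
tot = float(r + b)
count = np.zeros(n, dtype=int)
for _ in range(d):
    draw = rng.uniform(size=n) < red / tot   # was a red ball drawn?
    count += draw
    red += draw                              # returned with one extra ball of the same color
    tot += 1.0

# Beta-binomial probabilities implied by M ~ Beta(r, b).
for j in range(d + 1):
    exact = comb(d, j) * beta_fn(j + r, d - j + b) / beta_fn(r, b)
    print(j, round(float((count == j).mean()), 4), round(exact, 4))
```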

Example 2.4 (Ferromagnetic Curie-Weiss Ising model)
Motivated by several models in statistical mechanics, [53] study random vectors which admit a density with respect to the law of a vector with iid components, the density being the exponential of a quadratic form. Concretely, they consider the situation where Y = (Y_1, ..., Y_d) is a vector with iid components, Y_1 satisfies a suitable moment condition, and the law of X is defined via such a density with respect to the law of Y. Of particular interest are cases in which Y_1 takes only finitely many different values. Especially if Y_1 ∈ {0, 1}, the vector X is a binary sequence like in the present section.
A prominent model motivating the investigation of [53] is the so-called Curie-Weiss Ising model. In probabilistic terms, this model is a probability law on {-1, 1}^d with two parameters J, h ∈ R, and the components of a random vector Z with this probability law model the so-called spins at d different sites. These spins can either have the value -1 or 1. We denote for n ∈ {-1, 1}^d by N(n) the number of 1's in n, so that d - N(n) equals the number of -1's. For n ∈ {-1, 1}^d the probability P(Z = n) is defined proportionally to an exponential weight depending on n only through N(n), which yields an exchangeable probability law on {-1, 1}^d. The exponent of the numerator is called the Hamilton operator of the model. The parameter h determines the external magnetic field and the parameter J denotes a coupling constant. If J ≥ 0 the model is called ferromagnetic, and for J < 0 it is called antiferromagnetic. The ferromagnetic case arises as a special case of (10), in which the law of X ∈ {-1, 1}^d is precisely given by the Curie-Weiss Ising model in (12) with J ≥ 0. Notice that for the antiferromagnetic case J < 0 this construction is impossible.
[53, Theorem 1.2] shows that X as defined in (10) is conditionally iid. More concretely, conditioned on a random variable M with a suitable density, the components of X are iid with a common distribution determined by M, as can easily be checked. In particular, this shows that the aforementioned ferromagnetic Curie-Weiss Ising model is conditionally iid, a result originally due to [76].

Classical results for static factor models
Besides the seminal de Finetti's Theorem 1.4, the most popular results in the theory of conditionally iid models concern latent factor processes H of a very special form, to be discussed in the present section. To this end, we consider a popular one-parametric family of one-dimensional distribution functions x -> F_m(x) on the real line and put a prior distribution on the parameter m ∈ R, i.e. we consider the random distribution function

H_t = F_M(t), t ∈ R,

where M is some random variable taking values in the set of admissible values for the parameter m. For some prominent families, for example the zero-mean normal law or the exponential law, the resulting distribution of the random vector X belongs to a prominent multivariate family of distributions M, and in fact defines the subset M* ⊂ M. Of particular interest is the case when the subset M* of M admits a convenient analytical description within the framework of the analytical description of the larger family M.

By construction, in this method of generating conditionally iid laws the dependence-inducing latent factor process H is fully determined by a single random parameter M, so that it appears unnatural to formulate the model in terms of a "stochastic process" H at all. Since we investigate situations in later sections in which the process formulation is more natural, we purposely do so anyway in order to present all results of the present article under one common umbrella. The "single-parameter construction" just described can then be classified as some kind of "static" process within the realm of all possible processes in M 1 + (H). More rigorously, let {H_t}_{t≥0} be the stochastic process from the canonical stochastic representation (4) of some multivariate law in M* ⊂ M. Equivalently, we view this probability law as a d-dimensional marginal law of some infinite exchangeable sequence of random variables {X_k}_{k∈N}, and define {H_t}_{t≥0} according to Lemma 1.16 as the uniform limit of empirical distribution functions. The static case corresponds to a trivial information structure for the filtration {F_t} generated by H: F_t = {∅, Ω} for t ≤ T ("zero information before T") and F_t = H for t > T ("total information after T"). The present section reviews well-known families of distributions M for which the set M* consists only of static laws.
As already mentioned, this situation typically occurs when the random distribution function H ∈ M 1 + (H) is itself given by H t = F M (t), for a popular family F m of one-dimensional distribution functions and a single random variable M representing a random parameter pick.
Example 3.1 (The multivariate normal law revisited)
It follows from Examples 1.6 and 1.15 that N(μ, Σ)*, the conditionally iid normal laws, are static. The random distribution function H as given by (6) is obviously of this static form.

Example 3.2 (Binary sequences revisited)
If one (hence all) of the conditions of Theorem 2.2 is satisfied, the law of the binary sequence X ∈ {0, 1}^d is static. Using the notation of Theorem 2.2, the random distribution function is given by H_t = (1 - M) 1_{ {t ≥ 0} } + M 1_{ {t ≥ 1} }, the distribution function of the Bernoulli distribution with success probability M.

In the remaining section we treat the mixture of iid zero-mean normals in paragraph 3.1 and the mixture of iid exponentials in paragraph 3.2, since these are the best-studied cases of the theory with nice analytical characterizations. The interested reader is also referred to [19, 78], who additionally study mixtures of iid geometric variables, iid Poisson variables, and iid uniform variables. Mixtures of uniform random variables are discussed in more detail in Section 3.3 below.

Spherical laws (aka ℓ_2-norm symmetric laws)
A random vector X ∈ R^d is called spherical if its probability distribution remains invariant under orthogonal transformations, such as rotations or reflections, i.e. X is equal in distribution to X O for an arbitrary orthogonal matrix O ∈ R^{d×d}. A spherical random vector X has a canonical stochastic representation

X = R S in distribution, (13)

where R is a non-negative random variable and the random vector S is independent of R and uniformly distributed on the Euclidean unit sphere {x ∈ R^d : ||x||_2 = 1}, see [28, Chapter 2]. Hence, realizations of spherical laws must be thought of as being the result of a two-step simulation algorithm: first draw one completely random point on the unit d-sphere, and then scale this point according to some one-dimensional probability distribution on the positive half-axis. In analytical terms, spherical laws are most conveniently treated via their (multivariate) characteristic functions. In particular, it is not difficult to see that X has a spherical law if and only if there exists a real-valued function ϕ : [0, ∞) -> R in one variable, called the characteristic generator, such that E[exp(i <u, X>)] = ϕ(||u||_2^2) for all u ∈ R^d.

Theorem 3.3 (Schoenberg's Theorem)
Let X ∈ R^d be spherical and not identical to a vector of zeros. The following statements are equivalent:
(a) The law of X is in M*.
(b) There are iid standard normal random variables Y_1, ..., Y_d and an independent positive random variable M ∈ (0, ∞) such that X is equal in distribution to M (Y_1, ..., Y_d).
(c) In the representation (13), R is equal in distribution to M √(χ²_d), where χ²_d has a chi-squared distribution with d degrees of freedom, independent of M.
In other words, this means that X has a stochastic representation as in (4) with H_t = Φ(t/M), and the characteristic generator ϕ(x) = E[exp(-x M²/2)] is the Laplace transform of the positive random variable M²/2.

Proof
The theorem is named after [85]; see also [49] or [1, p. 22] for further references. An alternative proof is given in [19]. Statement (c) is only included in order to highlight how the random radius R must be chosen in the canonical representation (13) such that the law of X is in M*, see also Remark 3.4 below; the interested reader can find a proof of the equivalence with (c) in the given references. (i) As a first step we show Maxwell's Theorem, i.e. if X_1, ..., X_d are independent and (X_1, ..., X_d) is spherically symmetric, then all components X_k are actually iid sharing a normal distribution with mean zero. Since (X_1, ..., X_d) is spherically symmetric, its characteristic function can be written as ϕ(||u||_2^2) for some function ϕ in one variable, see, e.g., [66, Lemma 4.1, p. 161]. Denoting the characteristic function of X_k by f_k, k = 1, ..., d, independence of the components implies that ϕ(||u||_2^2) = f_1(u_1) ⋯ f_d(u_d). Taking the derivative w.r.t. u_k (the characteristic functions in question are differentiable) and dividing by ϕ(||u||_2^2) on both sides of the last equation implies for arbitrary k = 1, ..., d that

f_k'(u_k) / f_k(u_k) = 2 u_k ϕ'(||u||_2^2) / ϕ(||u||_2^2). (14)
Let u, y ∈ R be arbitrary. Plugging u = (u, ..., u) into (14) shows that f_k'(u)/(2 u f_k(u)) does not depend on k. Plugging some u which has u as its k-th and y as its j-th component into (14), we observe that f_k'(u)/(2 u f_k(u)) = f_j'(y)/(2 y f_j(y)). Since u, y were arbitrary, the functions u -> f_k'(u)/(2 u f_k(u)) are therefore shown to equal some constant c independent of k. Since f_k(0) = 1, solving the resulting ordinary differential equation f_k'(u) = 2 c u f_k(u) implies that f_k(u) = exp(c u²). Left to show is now only that c ≤ 0, because this implies that f_k equals the characteristic function of a zero-mean normal. Since f_k is a characteristic function and as such bounded in absolute value by one, this is only possible for c ≤ 0. The case c = 0 is ruled out by the assumption that X is not identical to a vector of zeros.
(ii) If the law of X lies in M*, we can without loss of generality assume that X equals the first d members of an infinite exchangeable sequence {X_k}_{k∈N}. Conditioned on the tail σ-field H := ∩_{n≥1} σ(X_n, X_{n+1}, ...), the random variables X_1, ..., X_d are iid according to De Finetti's Theorem 1.4. We observe for an arbitrary orthogonal matrix O that X O is equal in distribution to X, since X is spherical. Since H does not depend on X (but only on the tail of the infinite sequence), this implies that the conditional distributions of X and X O given H are identical. As O was arbitrary, X conditioned on H is spherical. Maxwell's Theorem now implies that X conditioned on H is an iid sequence of zero-mean normals. Thus, only the standard deviation may still be an H-measurable random variable, which we denote by M.
If (P) in Problem 1.5 is the property of "having a spherical law (in some dimension)", then Schoenberg's Theorem 3.3 also implies that M* = M**, which follows trivially from the equivalence of (a) and (b), since the stochastic construction in (b) clearly works for arbitrary n > d as well. Furthermore, it is observed that the random distribution function H_t = Φ(t/M) in part (b) satisfies the condition in Lemma 1.14 with μ = 0, so conditionally iid spherical laws are radially symmetric. In fact, (arbitrary) spherical laws are always radially symmetric, since -X = X (-I_d) with the orthogonal matrix -I_d.

Remark 3.4 (The normal law and the uniform law on the sphere)
If Y = (Y_1, ..., Y_d) is a vector of iid standard normals, then Y/||Y||_2 is uniformly distributed on the Euclidean unit sphere, which shows how to generate realizations of the uniform law on the Euclidean unit sphere from a list of iid standard normals.
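Schoenberg's characterization can be probed numerically: for X = M (Y_1, ..., Y_d) the characteristic function depends on u only through ||u||_2. The following Python sketch (the mixing law M ~ Exp(1) is an arbitrary illustrative choice) estimates the characteristic function at two points with equal norm:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 3, 400_000

M = rng.exponential(size=(n, 1))          # positive mixing variable (illustrative choice)
X = M * rng.normal(size=(n, d))           # conditionally iid spherical vector

def cf(u):
    """Monte Carlo estimate of E[exp(i<u, X>)]; real-valued by symmetry of the law."""
    return float(np.cos(X @ u).mean())

u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.6, 0.8, 0.0])            # same Euclidean norm as u1
print(cf(u1), cf(u2))                     # approximately equal
```

Evaluating at a point with a different norm, e.g. (2, 0, 0), yields a visibly different value, confirming that the law is spherical rather than merely exchangeable.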

Remark 3.5 (Elliptical laws)
Spherical laws are always exchangeable, which is easy to see. A popular method to enrich the family of spherical laws to a larger family beyond the exchangeable paradigm is linear transformation. To wit, for X ∈ R^k spherical with characteristic generator ϕ, A ∈ R^{k×d} some matrix with Σ := A^T A ∈ R^{d×d} and rank of Σ equal to k ≤ d, and b = (b_1, ..., b_d) some real-valued row vector, the random vector

Z := X A + b (16)

is said to have an elliptical law with parameters (ϕ, Σ, b). This generalization from spherical laws to elliptical laws is especially well-behaved from an analytical viewpoint, since the apparatus of linear algebra gets along perfectly well with the definition of spherical laws. The most prominent elliptical law is the multivariate normal distribution, which is obtained in the special case when ϕ(x) = exp(-x/2) is the Laplace transform of the constant 1/2. The case when E[||X||_2^2] < ∞ is of most prominent importance, since the random vector Z then has an existing covariance matrix given by E[||X||_2^2] Σ / k. Since the normal distribution special case occupies a commanding role when deciding whether or not a spherical law is conditionally iid according to Theorem 3.3(b), and since we have also solved our motivating Problem 1.1 for the multivariate normal law in Example 1.6, it is not difficult to decide when an elliptical law is conditionally iid as well. To wit, in the most important case when E[||X||_2^2] < ∞, the random vector Z in (16) has a stochastic representation that is conditionally iid if and only if b_1 = ... = b_d and Z is equal in distribution to R Y + b, with R some positive random variable with finite second moment and Y = (Y_1, ..., Y_d) multivariate normal with zero mean vector and covariance matrix as in Example 1.6, i.e. with identical diagonal elements σ² > 0 and identical off-diagonal elements ρ σ² ≥ 0.

ℓ_1-norm symmetric laws
According to [73], a random vector X ∈ [0, ∞)^d is called ℓ_1-norm symmetric if it has a stochastic representation

X = R S in distribution, (17)

where R is a non-negative random variable and the random vector S is independent of R and uniformly distributed on the unit simplex {x ∈ [0, ∞)^d : ||x||_1 = 1}. Comparing this representation to (13), the only difference is that S is now uniformly distributed on the unit sphere with respect to the ℓ_1-norm, rather than on the unit sphere with respect to the Euclidean norm. Consequently, quite similar to spherical laws, realizations of ℓ_1-norm symmetric distributions must be thought of as being the result of the following two-step simulation algorithm: first draw one completely random point on the d-dimensional unit simplex, and then scale this point according to some one-dimensional probability distribution on the positive half-axis. Remark 3.4 points out an important relationship between the (univariate) standard normal distribution and the uniform law on the Euclidean unit sphere (w.r.t. the Euclidean norm ||.||_2). It is not difficult to observe that the (univariate) standard exponential law plays the analogous role for the uniform law on the unit simplex (w.r.t. the ℓ_1-norm ||.||_1). More precisely, if the components of E = (E_1, ..., E_d) are iid exponentially distributed with unit mean, then E/||E||_1 is uniformly distributed on the unit simplex, cf. [66, Lemma 2.2(2), p. 77] or [28, Theorem 5.2(2), p. 115]. An arbitrary ℓ_1-norm symmetric random vector X is thus represented as X = R E/||E||_1 in distribution, with independent R and E. With the analogy to the spherical case in mind, heuristic reasoning suggests that X is extendible if and only if R is chosen such that it "cancels" out the denominator of S in distribution. Since ||E||_1 has a unit-scale Erlang distribution with parameter d, this would imply that R should be chosen as R = Z/M for some positive random variable M and an independent random variable Z with unit-scale Erlang distribution with parameter d.
This is precisely the case, as Theorem 3.6 below shows.
Generally speaking, it follows from the canonical stochastic representation (17) that the marginal survival function of each component equals

P(X_k > x) = E[(1 - x/R)_+^{d-1}] =: ϕ_{d,R}(x), x ≥ 0,

where the computation uses knowledge about the Laplace transform of the Erlang-distributed random variable Σ_{i≠k} E_i. This means that the marginal survival functions of the components X_k are given by the so-called Williamson d-transform ϕ_{d,R} of R. It has been studied in [92], who shows in particular that the law of R is uniquely determined by ϕ_{d,R}. A similar computation as above shows that the joint survival function of X is given by P(X_1 > x_1, ..., X_d > x_d) = ϕ_{d,R}(||x||_1), x ∈ [0, ∞)^d.

Theorem 3.6 (Extendibility of ℓ_1-norm symmetric laws)
Let X be ℓ_1-norm symmetric as in (17) with P(X = 0_d) = 0. The following statements are equivalent:
(a) The law of X is in M*.
(b) ϕ_{d,R} = ϕ for a function ϕ(x) = E[exp(-x M)] that is the Laplace transform of some positive random variable M.

Theorem 3.6 solves Problem 1.5 for the property (P) of "having an ℓ_1-norm symmetric law (in some dimension)". In this case, for arbitrary d ∈ N we have

X = R S = (Z/M) S = (E_1, ..., E_d)/M in distribution,

where X is as in (a), M is as in (b), S is uniformly distributed on the unit simplex, E = (E_1, ..., E_d) is a vector of iid unit exponentials, and Z is a unit-scale Erlang-distributed variate with parameter d, all mutually independent. In other words, X has a stochastic representation as in (5) with Z_t := M t, in particular is conditionally iid.
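The heuristic "cancellation" of the simplex denominator can be verified by simulation. The following Python sketch (the mixing law for M is an arbitrary illustrative choice) compares the construction R S with R = Z/M against the conditionally iid vector of exponentials with random rate M:

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 3, 300_000

M = rng.uniform(1.0, 2.0, size=n)                 # latent rate (illustrative mixing law)

# Construction 1: conditionally iid, X_k = E_k / M with iid unit exponentials E_k.
X1 = rng.exponential(size=(n, d)) / M[:, None]

# Construction 2: R * S with S uniform on the unit simplex and R = Z/M,
# Z an independent unit-scale Erlang(d) variate.
E = rng.exponential(size=(n, d))
S = E / E.sum(axis=1, keepdims=True)
Z = rng.gamma(shape=d, scale=1.0, size=n)
X2 = (Z / M)[:, None] * S

# Both constructions yield the same law; compare joint survival probabilities.
for t in (0.2, 0.5, 1.0):
    print(t, float((X1 > t).all(axis=1).mean()), float((X2 > t).all(axis=1).mean()))
```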

Proof
The implication (b) ⇒ (a) works precisely along the stochastic model claimed, and is readily observed. The implication (a) ⇒ (b) is known as Kimberling's Theorem, see [48].
We provide a proof sketch in the sequel. From d = 1 we observe that ϕ is the survival function of some positive random variable. Consequently, due to Bernstein's Theorem, it is sufficient to prove that ϕ is completely monotone, meaning that (-1)^d ϕ^(d) ≥ 0 for all d ∈ N_0. To this end, we consider the infinite sequence of random variables {U_k}_{k∈N} with U_k := ϕ(X_k), k ∈ N, and, with α := ϕ(x/d) and β := ϕ(x/d - h) > α, define certain events depending on α and β. A lengthy but straightforward computation, with one application of the inclusion-exclusion principle, then establishes the required sign of the d-th order difference quotient, which implies the claim.

The Williamson d-transform of a constant R ≡ r is given by ϕ_{d,r}(x) = (1 - x/r)_+^{d-1}, with a constant r > 0. In fact, [92] shows that the set of Williamson d-transforms is a simplex with extremal boundary given by {ϕ_{d,r}}_{r>0}, which is just another way of saying that the function ϕ_{d,R} determines the probability law of the positive random variable R uniquely. Similarly, Laplace transforms form a simplex with extremal boundary given by the functions x -> exp(-m x) for m > 0, which is just another way of saying that the function ϕ(x) = E[exp(-x M)] determines the law of the positive random variable M uniquely. Typical parametric examples for Laplace transforms in the context of ℓ_1-norm symmetric distributions are ϕ(x) = (1 + x)^{-θ} with θ > 0, corresponding to a Gamma distribution of M, or ϕ(x) = exp(-x^θ) with θ ∈ (0, 1), corresponding to a stable distribution of M.
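The role of the Williamson d-transform as marginal survival function can be checked numerically. The following Python sketch (the law of R is an arbitrary illustrative choice) uses the representation ϕ_{d,R}(x) = E[(1 - x/R)_+^{d-1}], the form commonly used for the Williamson d-transform:

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 4, 400_000

R = rng.uniform(0.5, 3.0, size=n)                       # radial variable (illustrative law)
E = rng.exponential(size=(n, d))
X = R[:, None] * E / E.sum(axis=1, keepdims=True)       # l1-norm symmetric vector R * S

# Compare the empirical marginal survival function with the Williamson d-transform.
for x in (0.3, 0.8, 1.5):
    williamson = float(np.mean(np.clip(1.0 - x / R, 0.0, None) ** (d - 1)))
    print(x, round(float((X[:, 0] > x).mean()), 4), round(williamson, 4))
```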

Remark 3.9 (Extension to Liouville distributions)
Analyzing the analogy between spherical laws (aka $\ell_2$-norm symmetric laws) and $\ell_1$-norm symmetric laws, there is one common mathematical fact on which the analytical treatment of both families relies. To wit, for both families the uniform distribution on the respective $d$-dimensional unit sphere can be represented as a normalized vector of iid random variables. In the spherical case the normalized vector $Y/\|Y\|_2$ of $d$ iid standard normals $Y = (Y_1, \ldots, Y_d)$ is uniform on the $\|\cdot\|_2$-sphere, whereas in the $\ell_1$-norm symmetric case the normalized vector $E/\|E\|_1$ of $d$ iid standard exponentials $E = (E_1, \ldots, E_d)$ is uniform on the $\|\cdot\|_1$-sphere. Furthermore, in both cases the normalization can be "canceled out" in distribution, where $Z_d \stackrel{d}{=} \|Y\|_2$ is independent of $Y$ and has a $\chi$-law with $d$ degrees of freedom, and $R_d \stackrel{d}{=} \|E\|_1$ is independent of $E$ and has an Erlang distribution with parameter $d$. The so-called Lukacs Theorem, due to [55], states that the exponential distribution of the $E_k$ in the last distributional equality can be generalized to a Gamma distribution (but no other law on $(0, \infty)$ is possible). More precisely, if $G = (G_1, \ldots, G_d)$ are independent random variables with Gamma distributions sharing the same scale parameter, then $\|G\|_1$ is independent of $G/\|G\|_1$. The random vector $S := G/\|G\|_1$ on the unit simplex is not uniformly distributed unless the $G_k$ happen to be iid exponential. In general, the law of $S$ is called the Dirichlet distribution, parameterized by the $d$ values $\alpha = (\alpha_1, \ldots, \alpha_d)$ entering the Gamma densities of the $G_k$. Notice that the scale parameter of this Gamma distribution is without loss of generality set to one, since it has no influence on the law of $S$. A $d$-parametric generalization of $\ell_1$-norm symmetric laws is obtained by replacing the uniform law of $S$ on the unit simplex (which is obtained for $\alpha_1 = \ldots = \alpha_d = 1$) with a Dirichlet distribution (with arbitrary $\alpha_k > 0$).
One says that the random vector $X = R\,S$, with $R$ some positive random variable and $S$ an independent Dirichlet-distributed random vector on the unit simplex, follows a Liouville distribution. It is precisely property (18) that makes the generalization to Liouville distributions still analytically quite convenient to work with, see [74] for a detailed study. Analogous to the $\ell_1$-norm symmetric case, the components of $X$ are conditionally iid if $\alpha_1 = \ldots = \alpha_d$ and $R$ satisfies $R \stackrel{d}{=} Z/M$ with $Z \stackrel{d}{=} \|G\|_1$ and $M$ some independent positive random variable.
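The Gamma-normalization construction of the Dirichlet distribution, together with the independence property of Lukacs' Theorem, can be illustrated by simulation (the parameter values below are ours):

```python
import random

def dirichlet_via_gamma(alphas, rng):
    # S = G/||G||_1 with independent G_k ~ Gamma(alpha_k, 1) is Dirichlet(alphas)
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [gk / total for gk in g], total

rng = random.Random(7)
alphas = [0.5, 1.0, 2.0]
draws = [dirichlet_via_gamma(alphas, rng) for _ in range(100_000)]

# Lukacs: ||G||_1 and S = G/||G||_1 are independent, so the sample
# correlation between S_1 and ||G||_1 should vanish up to noise.
s1 = [s[0] for s, _ in draws]
tot = [t for _, t in draws]
n = len(draws)
m_s, m_t = sum(s1) / n, sum(tot) / n
cov = sum((a - m_s) * (b - m_t) for a, b in zip(s1, tot)) / n
corr = cov / ((sum((a - m_s) ** 2 for a in s1) / n) ** 0.5
              * (sum((b - m_t) ** 2 for b in tot) / n) ** 0.5)
```

The first Dirichlet component has mean $\alpha_1/(\alpha_1 + \ldots + \alpha_d) = 0.5/3.5$ here, which the simulation reproduces.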

$\ell_\infty$-norm symmetric laws
[36, Theorem 2] studies random vectors $X = (X_1, \ldots, X_d)$ which are absolutely continuous with density of the form $f_X(x) = g_d(\|x\|_\infty)$, $x \in (0, \infty)^d$, for some measurable function $g_d$. Since $f_X$ is invariant with respect to permutations of the components of $x$, the random vector $X$ is exchangeable. But whether or not it is conditionally iid depends on the choice of $g_d$. First of all, since $f_X$ is a probability density, the condition $\int_0^\infty d\,y^{d-1}\,g_d(y)\,\mathrm{d}y = 1$ in (21) constitutes a necessary and sufficient integrability condition on $g_d$ such that $f_X$ defines a proper probability density. Furthermore, lower-dimensional margins of $X$ have a density of the same structural form, and the function $g_{d-1}$ is easily checked to satisfy (21) with $d$ replaced by $d-1$; it is further not difficult to express $g_d$ in terms of $g_{d-1}$. If $\mathcal{M}$ denotes the family of all laws with density of the form (20), i.e. with a function $g_d$ satisfying (21), the following result provides necessary and sufficient conditions on $g_d$ to define a law in $\mathcal{M}^*$.
By non-increasingness, we may without loss of generality assume that $g_d$ is right-continuous (otherwise, change to its right-continuous version, which does not change the density $f_X$ essentially). Applying integration by parts, (24) and (21) imply that $x \mapsto \int_0^x y^d \,\mathrm{d}(-g_d)(y)$ defines the distribution function of a positive random variable $M$. Now let $U$ as claimed be independent of $M$. Conditioned on $M$, the density of $M\,U$ is known explicitly, and integrating out $M$, the density of $M\,U$ is found to coincide with $f_X$, which shows (c). The hardest part is (a) $\Rightarrow$ (b). Fix $\epsilon > 0$ arbitrary. Due to measurability of $g_d$, Lusin's Theorem guarantees continuity of $g_d$ on a set $C$ whose complement has Lebesgue measure less than $\epsilon$. Without loss of generality we may assume that all points $t$ in $C$ are density points with respect to Lebesgue measure $\lambda$. Let $\{X_k\}_{k \in \mathbb{N}}$ be an infinite exchangeable sequence whose $d$-margins have the density $f_X$. Fix $t \geq s$ arbitrary, and define a sequence of square-integrable random variables $\{\xi_k\}_{k \in \mathbb{N}}$. If we divide by $d^2$ and let $d \to \infty$, it follows that $E[\xi_1\,\xi_2] \geq 0$. Denoting by $g_2$ the marginal density of $(X_1, X_2)$, we observe, for certain values $s - \delta \leq \eta_s \leq s + \delta$ and $t - \delta \leq \eta_t, \tilde{\eta}_t \leq t + \delta$ obtained from the mean value theorem for Lebesgue integration, an inequality between the respective integrals. As $\delta \downarrow 0$, we thus observe that $g_2(s) \geq g_2(t)$, i.e. $g_2$ is non-increasing. Making use of (23) and integrating by parts, we observe that $g_3$ is non-increasing as well. Inductively, the same argument implies that $g_4, \ldots, g_d$ are all non-increasing.
From the equivalence of (a) and (c) in Theorem 3.10 we easily observe that $\mathcal{M}^* = \mathcal{M}^{**}$ when considering the property (P) of "having a density of the form (20) (in some dimension $d \in \mathbb{N}$)" in Problem 1.5. Notice furthermore that the law of $M\,U$ is static in the sense defined in the beginning of this section, with $X_k := M\,U_k$ as defined in part (c) of Theorem 3.10.
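Part (c) of Theorem 3.10 suggests a trivial sampling scheme for this family: draw $M$, then multiply by iid uniforms. The following sketch uses an assumed mixing law $M \sim \mathcal{U}(0, 2)$ (our choice, for illustration), for which the marginal distribution function $P(X_1 \leq x) = E[\min\{1, x/M\}] = x\,(1 + \log(2/x))/2$ can be computed by hand:

```python
import math, random

def sample_linf_symmetric(d, rng):
    # Conditionally iid model of Theorem 3.10(c): X_k = M * U_k with U_k iid
    # uniform on [0,1] and M an independent positive mixing variable.
    # Illustrative assumption: M uniform on (0, 2).
    m = rng.uniform(0.0, 2.0)
    return [m * rng.random() for _ in range(d)]

rng = random.Random(3)
xs = [sample_linf_symmetric(5, rng) for _ in range(200_000)]
x0 = 1.0
empirical = sum(1 for x in xs if x[0] <= x0) / len(xs)
closed_form = x0 * (1 + math.log(2 / x0)) / 2   # = (1 + log 2)/2 at x0 = 1
```

The simulated marginal probability matches the hand computation up to Monte Carlo error.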

Remark 3.11 (Common umbrella of $\ell_p$-norm symmetry results)
Theorem 3.10 on $\ell_\infty$-norm symmetric densities is very similar in nature to Schoenberg's Theorem 3.3 on $\ell_2$-norm symmetric characteristic functions and Theorem 3.6 on $\ell_1$-norm symmetric survival functions, which makes it a beautiful result with regard to the present survey. The reference [78] considers all three cases under one common umbrella, and even manages to generalize them in some meaningful sense to the case of an arbitrary $\ell_p$-norm, with $p \in (0, \infty]$. More precisely, it is shown that an infinite exchangeable sequence $\{X_k\}_{k \in \mathbb{N}}$ of the form $X_k := M\,Y_k$, $k \in \mathbb{N}$, with $M > 0$ and an independent iid sequence $\{Y_k\}_{k \in \mathbb{N}}$ of positive random variables is $\ell_p$-norm symmetric in some meaningful sense if and only if the random variables $Y_k$ have density $f_p$ given by $f_p(x) = \frac{p^{1-1/p}}{\Gamma(1/p)}\,\exp(-x^p/p)$, $x > 0$, for $p < \infty$. Notice that $f_1$, $f_2$, and $f_\infty$ are the densities of the unit exponential law, the absolute value of a standard normal law, and the uniform law on $[0, 1]$, respectively. This parametric family in the parameter $p$ is further investigated, and can for instance be characterized by the fact that $f_p$ for $p < \infty$ has maximal entropy among all densities on $(0, \infty)$ with $p$-th moment equal to one, while $f_\infty$ has maximal entropy among all densities with support $(0, 1)$, which is [78, Theorem 3.5].
An analogous result to Theorem 3.10 on mixtures of the form $M\,U$, when the components of $U$ are iid uniform on $[-1, 1]$, is also presented in [36]. The resulting densities depend on the two arguments $x_{[1]}$ and $x_{[d]}$.

Remark 3.12 (Relation to non-homogeneous pure birth processes)
[87] provide an interesting interpretation of $\ell_\infty$-norm symmetric densities, which is briefly explained. Every non-negative function $g_d$ satisfying (21) is of the form (25) for some non-negative function $r_d$ satisfying $\int_0^\infty r_d(x)\,\mathrm{d}x = \infty$ and some normalizing constant $c_d > 0$, as can readily be checked. From such a function $r_d$ we iteratively define functions $r_{d-1}, \ldots, r_1$ by solving the equations (26), where $R_k(x) := \int_0^x r_k(u)\,\mathrm{d}u$ for $k = 1, \ldots, d$. Notice that $r_k$ is related to the right-hand side of (26) in exactly the same way as $r_d$ is related to $g_d$, so the solution (25) shows what the $r_k$ look like in terms of $r_{k+1}$. We define independent positive random variables $E_1, \ldots, E_d$ with survival functions $P(E_k > x) = \exp(-R_k(x))$, $k = 1, \ldots, d$, $x \geq 0$. Independently, let $\Pi$ be a random permutation of $\{1, \ldots, d\}$ with $P(\Pi = \pi) = 1/d!$ for each permutation $\pi$ of $\{1, \ldots, d\}$, i.e. $\Pi$ is uniformly distributed on the set of all $d!$ permutations. We consider the increasing sequence of random variables $T_1 < T_2 < \ldots < T_d$ defined by $T_k := E_1 + \ldots + E_k$. Then the (obviously exchangeable) random vector $X = (X_1, \ldots, X_d) := (T_{\Pi(1)}, \ldots, T_{\Pi(d)})$ has density (20). If $E_1, E_2, \ldots$ is an arbitrary sequence of independent, absolutely continuous, positive random variables, the associated counting process is called a non-homogeneous pure birth process with intensity rate functions $r_k(x) := -\frac{\partial}{\partial x} \log\{P(E_k > x)\}$, $k \geq 1$. A random permutation of the first $d$ jump times $T_k := E_1 + \ldots + E_k$, $k = 1, \ldots, d$, of a pure birth process $N$ thus has an $\ell_\infty$-norm symmetric density if the intensities $r_1, \ldots, r_{d-1}$ can be retrieved recursively from $r_d$ via (26). The case of arbitrary intensities $r_1, \ldots, r_d$ hence provides a natural generalization of the family of $\ell_\infty$-norm symmetric densities. It appears to be an interesting open problem to determine necessary and sufficient conditions on $r_1, \ldots, r_d$ such that the respective exchangeable density is conditionally iid, see also paragraph 7.1 below.

Example 3.13 (Pareto mixture of uniforms)
Let $M$ in Theorem 3.10 have survival function $P(M > x) = \min\{1, x^{-\alpha}\}$ for some $\alpha > 0$. The associated function $g_d$ generating the $\ell_\infty$-norm symmetric density, the one-dimensional distribution function $G(x) := P(X_k \leq x)$ of the components $X_k$ of $X$, and the respective inverse $G^{-1}$ can all be computed in closed form. This induces a one-parametric bivariate copula family. Scatter plots from this copula for different values of $\alpha$ are depicted in Figure 2, visualizing the dependence structure behind pairs of $X$. The dependence decreases with $\alpha$, and the limiting cases $\alpha = 0$ and $\alpha = \infty$ correspond to perfect positive association and independence, respectively. One furthermore observes that the dependence is highly asymmetric, i.e. large values of $G(X_1)$, $G(X_2)$ are more likely to be jointly close to each other than small values, which behave as under independence. This effect can be quantified in terms of the so-called upper- and lower-tail dependence coefficients.

Fig. 2 Left: 5000 samples of $(G(X_1), G(X_2))$ for $\alpha = 0.1$ in Example 3.13. Right: 5000 samples of $(G(X_1), G(X_2))$ for $\alpha = 1$ in Example 3.13.
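The qualitative behaviour just described, strong clustering of jointly large values versus near-independence of jointly small values, can be reproduced with a few lines of simulation. The sampling scheme and the rank transform (approximating $(G(X_1), G(X_2))$ without the explicit formula for $G$) are ours:

```python
import random

def sample_pair(alpha, rng):
    # M Pareto with P(M > x) = x^(-alpha) on [1, inf); X_k = M * U_k
    m = rng.random() ** (-1.0 / alpha)
    return m * rng.random(), m * rng.random()

def ranks(values):
    # empirical probability transform: value -> normalized rank in (0, 1)
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for pos, i in enumerate(order):
        r[i] = (pos + 1) / (len(values) + 1)
    return r

rng = random.Random(11)
n = 100_000
pairs = [sample_pair(1.0, rng) for _ in range(n)]
u = ranks([p[0] for p in pairs])
v = ranks([p[1] for p in pairs])
upper = sum(1 for a, b in zip(u, v) if a > 0.95 and b > 0.95) / n
lower = sum(1 for a, b in zip(u, v) if a < 0.05 and b < 0.05) / n
```

Under independence both corner probabilities would be about $0.05^2 = 0.0025$; in the simulation the upper corner is visibly inflated relative to the lower one, matching the tail-dependence asymmetry.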

The multivariate lack-of-memory property
A random vector $X = (X_1, \ldots, X_d)$ with non-negative components is said to satisfy the (multivariate) lack-of-memory property if for arbitrary $1 \leq i_1 < \ldots < i_n \leq d$ we have
$P(X_{i_1} > t_1 + t, \ldots, X_{i_n} > t_n + t) = P(X_{i_1} > t, \ldots, X_{i_n} > t)\,P(X_{i_1} > t_1, \ldots, X_{i_n} > t_n)$,
with $t, t_1, \ldots, t_n$ either in $(0, \infty)$ (continuous support case) or in $\mathbb{N}_0$ (discrete support case). The lack-of-memory property is very intuitive when the $k$-th component $X_k$ of $X$ is interpreted as the future time point at which the $k$-th component in a system of $d$ components fails. In words, it means that conditioned on the survival of an arbitrary sub-system $(i_1, \ldots, i_n)$ until time $t$, the residual lifetimes of the components $i_1, \ldots, i_n$ are identical in distribution to the lifetimes at inception of the system. Needless to say, such an intuitive property occupies a commanding role in reliability theory, see [6] for a textbook treatment, but it is also important in other contexts such as financial risk management, e.g., [25, 54]. An alternative way to formulate the multivariate lack-of-memory property, due to [12], is the following.
From a theoretical point of view, studying the (multivariate) lack-of-memory property is also natural as it generalizes very popular one-dimensional probability distributions to the multivariate case. Indeed, if $d = 1$ we abbreviate $X := X_1$ and recall the following classical characterizations.

Lemma 4.1 (Characterization of lack-of-memory for d = 1)
(E) If the support of $X$ equals $[0, \infty)$, then $X$ satisfies the lack-of-memory property if and only if $X$ has an exponential distribution, that is $P(X > t) = \exp(-\lambda\,t)$ for some $\lambda > 0$.
(G) If the support of $X$ equals $\mathbb{N}$, then $X$ satisfies the lack-of-memory property if and only if $X$ has a geometric distribution, that is $P(X > n) = (1 - p)^n$ for some $p \in (0, 1)$.
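Both characterizations rest on the functional equation $P(X > s + t) = P(X > s)\,P(X > t)$, which the exponential and geometric survival functions satisfy identically; a quick numerical sanity check (parameter values are ours):

```python
import math

def surv_exp(lam, t):
    return math.exp(-lam * t)      # P(X > t) for the exponential law

def surv_geo(p, n):
    return (1.0 - p) ** n          # P(X > n) for the geometric law

lam, p = 0.7, 0.3
exp_gap = max(abs(surv_exp(lam, s + t) - surv_exp(lam, s) * surv_exp(lam, t))
              for s, t in [(1.0, 2.0), (0.5, 3.1), (4.0, 4.0)])
geo_gap = max(abs(surv_geo(p, m + n) - surv_geo(p, m) * surv_geo(p, n))
              for m, n in [(1, 2), (3, 5), (10, 1)])
```

Both gaps vanish up to floating-point rounding, reflecting the exact lack-of-memory identities.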

Marshall-Olkin and multivariate geometric distributions
The well-known characterizations of univariate lack-of-memory in Lemma 4.1 have been lifted to the multivariate case in [70] and [4, 59], respectively, as is briefly recalled in the sequel.
First of all, we introduce the multivariate exponential models of [70] and [4]. To this end, we denote by $\mathcal{E}(\lambda)$ the univariate exponential law with rate $\lambda > 0$, and by $\mathcal{G}(p)$ the univariate geometric distribution with parameter $p \in (0, 1)$, i.e. with survival function $P(X > n) = (1-p)^n$. In order to include boundary cases, we denote by $\mathcal{E}(0)$, $\mathcal{G}(0)$ the probability law of a "random" variable that is identically equal to infinity, and by $\mathcal{G}(1)$ the probability law of a "random" variable that is identically equal to one.
(E) For each non-empty $I \subset \{1, \ldots, d\}$ let $\lambda_I \geq 0$ with $\sum_{I \ni k} \lambda_I > 0$ for each $k = 1, \ldots, d$. With independent random variables $E_I \sim \mathcal{E}(\lambda_I)$ we define the random vector $X$ by $X_k := \min\{E_I : k \in I\}$, $k = 1, \ldots, d$. Then $X$ satisfies the multivariate lack-of-memory property, which is easy to see by noticing that the survival function of $X$ equals $\bar F(x) = \exp(-\sum_{I} \lambda_I \max_{k \in I} x_k)$.
(G) For each (possibly empty) $I \subset \{1, \ldots, d\}$ let $p_I \in [0, 1]$ with $\sum_{I : k \notin I} p_I < 1$ for each $k = 1, \ldots, d$ and $\sum_I p_I = 1$. The probabilities $p_I$ define a probability law on the power set of $\{1, \ldots, d\}$. Let $S_1, S_2, \ldots$ be an iid sequence drawn from this law and denote by $G_I$ the smallest $n \in \mathbb{N}$ such that $S_n = I$. Notice that $G_I \sim \mathcal{G}(p_I)$. We define the random vector $X$ with values in $\mathbb{N}^d$ by $X_k := \min\{G_I : k \in I\}$, $k = 1, \ldots, d$.
Then $X$ satisfies the multivariate lack-of-memory property as well, and its survival function can be computed in closed form. The probability distribution in part (E) of Example 4.2 is called the Marshall-Olkin distribution, named after [70]. The probability distribution in part (G) of Example 4.2 is called the wide-sense geometric distribution; the stochastic model has been introduced in [4], and the presented form of the survival function is computed in [59]. The following lemma shows that the multivariate stochastic models in Example 4.2 are precisely the multivariate analogues of Lemma 4.1.
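The exogenous shock construction of part (E) is straightforward to simulate. The following sketch uses assumed toy parameters for $d = 2$ (two individual shocks and one common shock); the marginal rate of $X_1$ is then $\lambda_{\{1\}} + \lambda_{\{1,2\}}$ and the rate of $\min\{X_1, X_2\}$ is the sum of all three rates:

```python
import random

def sample_marshall_olkin(lambda_by_set, d, rng):
    # Shock model (E): independent E_I ~ Exp(lambda_I) per subset I with
    # lambda_I > 0, and X_k = min{E_I : k in I}.
    shocks = {I: rng.expovariate(lam) for I, lam in lambda_by_set.items() if lam > 0}
    return [min(t for I, t in shocks.items() if k in I) for k in range(d)]

lambda_by_set = {(0,): 1.0, (1,): 1.0, (0, 1): 0.5}   # assumed toy parameters
rng = random.Random(5)
samples = [sample_marshall_olkin(lambda_by_set, 2, rng) for _ in range(100_000)]
mean_x1 = sum(x[0] for x in samples) / len(samples)    # expect 1/(1.0 + 0.5)
mean_min = sum(min(x) for x in samples) / len(samples) # expect 1/(1.0 + 1.0 + 0.5)
```

Both simulated means agree with the exponential rates implied by the shock model up to Monte Carlo error.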

Lemma 4.3 (Characterization of lack-of-memory for d ≥ 1)
(E) The $d$-variate Marshall-Olkin distribution is the only probability law with support $[0, \infty)^d$ satisfying the lack-of-memory property.
(G) The $d$-variate wide-sense geometric distribution is the only probability law with support $\mathbb{N}^d$ satisfying the lack-of-memory property.

Proof
Part (E) is due to the original reference [70], while part (G) is shown in [59].

Example 4.4 (Narrow-sense geometric law)
If $Y$ has a Marshall-Olkin distribution and we define $X := (\lceil Y_1 \rceil, \ldots, \lceil Y_d \rceil)$, then $X$ is said to have a narrow-sense geometric distribution. As the nomenclature suggests, the narrow-sense geometric family is a subfamily of the wide-sense geometric family in dimensions $d \geq 2$ (and identical for $d = 1$), which is very easy to see from the characterizing lack-of-memory property of the Marshall-Olkin law. Not every wide-sense geometric law can be constructed like this, i.e. the narrow-sense family defines a proper subset of the wide-sense family. This indicates that for $d \geq 2$ the structure of the discrete lack-of-memory property is more delicate than the structure of its continuous counterpart. For example, while two components of a random vector with Marshall-Olkin distribution or narrow-sense geometric distribution cannot be negatively correlated, two components of a random vector with wide-sense geometric distribution can be, see [59] for details.

Infinite divisibility and Lévy subordinators
The concept of infinite divisibility is of fundamental importance in the present section, but also in Sections 5 and 6 below. Thus, we briefly recall the required background in the present paragraph; for an elaborate textbook treatment we refer to [82]. The concept of a Lévy subordinator plays an essential role when studying the conditionally iid subfamily of the Marshall-Olkin distribution, a result first discovered in [64]. Recall that a càdlàg stochastic process $Z = \{Z_t\}_{t \geq 0}$ with $Z_0 = 0$ is called a Lévy process if it has stationary and independent increments. Hence, Lévy processes are the continuous-time equivalents of discrete-time random walks. A non-decreasing Lévy process is called a Lévy subordinator. However, there is one fundamental difference between a random walk and a Lévy process: the probability law of the increments in a random walk is arbitrary on $\mathbb{R}$, whereas the law of the increments of a Lévy process needs to satisfy a certain compatibility condition with respect to time, as increments of arbitrarily large time span can be considered. Concretely, it is immediate from the definition of a Lévy process that the probability law of $Z_1$ is infinitely divisible. Recall that a random variable $X$ is called infinitely divisible if for each $n \in \mathbb{N}$ there exist iid random variables $X_{n,1}, \ldots, X_{n,n}$ such that $X \stackrel{d}{=} X_{n,1} + \ldots + X_{n,n}$. Furthermore, if $X$ has an infinitely divisible probability law, there exists a Lévy process $Z = \{Z_t\}_{t \geq 0}$, which is uniquely determined in law, such that $Z_1 \stackrel{d}{=} X$. As a consequence, a Lévy subordinator $Z$ is uniquely determined in distribution by the law of $Z_1$, or analytically by the function $\Psi(x) := -\log(E[\exp(-x\,Z_1)])$, $x \geq 0$. One calls $\Psi$ the Laplace exponent of the infinitely divisible random variable $Z_1$ (or of the Lévy subordinator $Z$). The function $\Psi$ is a so-called Bernstein function, which means that it is infinitely often differentiable on $(0, \infty)$ and its derivative $\Psi'$ is completely monotone, i.e. $(-1)^{k+1}\,\Psi^{(k)} \geq 0$ for all $k \geq 1$, see [8, 84] for textbook treatments on the topic. The value $\Psi(0)$ is by definition equal to zero, but $\Psi$ might have a jump at zero, meaning that $\inf_{x > 0} \Psi(x) > 0$ is possible. Intuitively, this is the case if and only if $P(Z_t = \infty) > 0$ for $t > 0$, and in this case one sometimes also speaks of a killed Lévy subordinator.
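For a compound Poisson subordinator with rate $\lambda$ and exponential jumps of parameter $\theta$ (an assumed toy example, not taken from the surveyed text), the Laplace exponent is $\Psi(x) = \lambda\,x/(x + \theta)$, a Bernstein function, and the defining identity $E[\exp(-x\,Z_t)] = \exp(-t\,\Psi(x))$ can be checked by simulation:

```python
import math, random

def sample_poisson(mu, rng):
    # Knuth-style Poisson sampler, adequate for small mu
    threshold, k, prod = math.exp(-mu), 0, rng.random()
    while prod > threshold:
        prod *= rng.random()
        k += 1
    return k

def sample_subordinator(t, lam, theta, rng):
    # compound Poisson subordinator: Poisson(lam*t) many Exp(theta) jumps
    return sum(rng.expovariate(theta) for _ in range(sample_poisson(lam * t, rng)))

def laplace_exponent(x, lam, theta):
    # Psi(x) = lam * (1 - E[exp(-x J)]) with J ~ Exp(theta)
    return lam * x / (x + theta)

rng = random.Random(9)
t, lam, theta, x, n = 1.0, 2.0, 1.0, 1.0, 100_000
emp = sum(math.exp(-x * sample_subordinator(t, lam, theta, rng)) for _ in range(n)) / n
theo = math.exp(-t * laplace_exponent(x, lam, theta))
```

With the chosen parameters $\Psi(1) = 1$, so the Laplace transform of $Z_1$ equals $e^{-1}$, which the simulation reproduces.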

Analytical characterization of exchangeability and conditionally iid
By Lemma 1.3 a random vector $X$ with either Marshall-Olkin distribution or wide-sense geometric distribution can only be conditionally iid if it is exchangeable. An elementary computation shows that the Marshall-Olkin distribution (resp. wide-sense geometric distribution) is exchangeable if and only if its parameters $\lambda_I$ (resp. $p_I$) depend on the indexing subsets $I$ only through their cardinality $|I|$. In this exchangeable case, we denote these parameters by $\lambda_1, \ldots, \lambda_d$ (resp. $p_0, p_1, \ldots, p_d$), with subindices denoting the possible cardinalities, i.e. $\lambda_k := \lambda_{\{1,\ldots,k\}}$ and $p_k := p_{\{1,\ldots,k\}}$, and combinatorial computations show that the survival function $\bar F$ of $X$ takes a convenient algebraic form for either $x_1, \ldots, x_d \in [0, \infty)$ with $x_0 := 0$ (in the Marshall-Olkin case) or $n_1, \ldots, n_d \in \mathbb{N}_0$ with $n_0 := 0$ (in the wide-sense geometric case). While the parameters $\lambda_k$ (resp. $p_k$) are intuitive, since they allow for the probabilistic interpretations according to Example 4.2, a re-parameterization in terms of new parameters $b_k$ is more convenient with regard to answering the question: when is $X$ conditionally iid? The main result in this regard is stated in Theorem 4.6 below, which requires the notions of $d$-monotone and log-$d$-monotone sequences. The concept of $d$-monotonicity as well as the notations $\mathcal{M}_d$ and $\mathcal{M}_\infty$ have already been introduced in paragraph 2.1; the related concept of log-$d$-monotonicity is introduced in the following definition.
The notion of a log-$d$-monotone sequence is less intuitive than that of a $d$-monotone sequence. First notice that, in contrast to the definition of a $d$-monotone sequence in paragraph 2.1, $\log(b_d) \geq 0$ need not hold for a log-$d$-monotone sequence, which is explained by the following useful relationship between $(d-1)$-monotonicity and log-$d$-monotonicity. It helps to transform statements involving log-$d$-monotonicity into statements involving only the simpler notion of $(d-1)$-monotonicity. The set of all log-$d$-monotone sequences starting with $b_0 = 1$ will be denoted by $\mathcal{LM}_d$ in the sequel. Similarly, $\mathcal{LM}_\infty$ denotes the set of completely log-monotone sequences starting with $b_0 = 1$. [59, Proposition 4.4] shows that $\{b_k\}_{k \in \mathbb{N}} \in \mathcal{LM}_\infty$ if and only if $\{b_k^t\}_{k \in \mathbb{N}} \in \mathcal{M}_\infty$ for arbitrary $t > 0$. In particular, $\mathcal{LM}_\infty \subset \mathcal{M}_\infty$. Theorem 4.6 below provides a second result, besides Theorem 2.2, showing that whether or not a (log-) $d$-monotone sequence can be extended to a completely (log-) monotone sequence plays an important role in the context of the present survey.
In order to better understand the following theorem it is helpful to know that the Laplace exponent $\Psi$ of a Lévy subordinator $Z$ is already completely determined by its values on $\mathbb{N}$, i.e. by the sequence $\{\Psi(k)\}_{k \in \mathbb{N}_0}$. Furthermore, the sequence $\{\exp(-\Psi(k))\}_{k \in \mathbb{N}_0}$ equals the moment sequence of the random variable $\exp(-Z_1)$, so it lies in $\mathcal{M}_\infty$ by the little moment problem, see paragraph 2.1. Since for arbitrary $t > 0$ even the sequence $\{\exp(-t\,\Psi(k))\}_{k \in \mathbb{N}_0}$ lies in $\mathcal{M}_\infty$, as the moment sequence of $\exp(-Z_t)$, the sequence $\{\exp(-\Psi(k))\}_{k \in \mathbb{N}_0}$ even lies in the smaller set $\mathcal{LM}_\infty$ of completely log-monotone sequences. The subset $\mathcal{LM}_\infty \subsetneq \mathcal{M}_\infty$ corresponds precisely to the infinitely divisible laws on $[0, \infty]$, which is the discrete analogue of the well-known statement that $\exp(-t\,\Psi)$ is a completely monotone function for arbitrary $t > 0$ if and only if $\Psi$ is a Bernstein function. With this information and the information of paragraph 4.2 as background, the following theorem is now quite intuitive. Theorem 4.6 solves Problem 1.5 for the property (P) of "satisfying the multivariate lack-of-memory property".
(E) There exists a Lévy subordinator $Z$ such that $X$ has the same distribution as the vector defined in (5).
(G) There exists an iid sequence $\{Y_k\}_{k \in \mathbb{N}}$ of non-negative random variables, related to the parameters via (27), such that $X$ has the same distribution as the vector defined in (5) when $Z$ is replaced by the associated random walk.

Proof
Part (E) is due to [63, 64], while part (G) is due to [59].
First, we observe that once the correspondence between $\mathcal{M}_d$ and the wide-sense geometric law is established, the correspondence between $\mathcal{LM}_d$ and the narrow-sense geometric law (or, algebraically equivalently, its continuous counterpart, the Marshall-Olkin law) follows from (30) together with (28) and (29). This is because the $\lambda_j$ in (28) are arbitrary non-negative numbers, and the $p_i$ in (29) are also arbitrary non-negative up to scaling (i.e. with an additional scale factor $c > 0$ we have that $c\,(p_0, \ldots, p_{d-1})$ and $(\lambda_1, \ldots, \lambda_d)$ both run through all of $[0, \infty)^d \setminus \{(0, \ldots, 0)\}$, noticing that $p_d$ is determined by $p_0, \ldots, p_{d-1}$). Concretely, by the correspondence between $\mathcal{M}_d$ and the wide-sense geometric law, we obtain a correspondence between $\mathcal{M}_d$ and $[0, \infty)^d \setminus \{(0, \ldots, 0)\}$ up to scaling in (29). In particular, the property of being $d$-monotone is not affected by $c$. Replacing the $\lambda_j$ in (28) by $c\,p_{j-1}$ and making use of (30), we then end up with the correspondence between $\mathcal{LM}_d$ and the Marshall-Olkin law. Establishing the correspondence between $\mathcal{M}_d$ and the wide-sense geometric law is really only a tedious algebraic computation, see [59] for details. Essentially, $d$-monotonicity enters the scene for precisely the same reason as in paragraph 2.1.
Regarding the conditionally iid subfamily, the crucial insight is that $\mathcal{M}_\infty$ stands in one-to-one relation with the set of probability measures on $[0, \infty]$ via (32), which is exactly the well-known statement of the little moment problem, only formulated for the compact interval $[0, \infty]$ instead of the more usual interval $[0, 1]$ via the transformation $-\log$. That the (discrete) random walk construction in part (G) can only be "made continuous" in case $Y_1$ is infinitely divisible is very intuitive, and the Lévy subordinator in part (E) is simply the continuous analogue of the discrete random walk in that case.
Since the narrow-sense geometric law of Example 4.4 is a special case of the wide-sense geometric law, it follows that $\mathcal{LM}_d \subseteq \mathcal{M}_d$, which in fact is not an obvious statement. Furthermore, $X$ in part (G) of Theorem 4.6 happens to be narrow-sense geometric if and only if the random variable $Y_1$ is infinitely divisible. In fact, the elements of $\mathcal{LM}_\infty$ stand in one-to-one correspondence with the family of infinitely divisible laws on $[0, \infty]$ via (31), whereas the elements of the larger set $\mathcal{M}_\infty$ stand in one-to-one correspondence with the family of arbitrary probability laws on $[0, \infty]$ via (32), which is just a slight re-formulation of the little moment problem.

Remark 4.7 (Analytical criterion for conditionally iid)
Given an exchangeable random vector $X$ with lack-of-memory property and parameters $(b_0, \ldots, b_d)$, Theorem 4.6 implies that $X$ has a stochastic representation that is conditionally iid if $(b_0, \ldots, b_d)$ can be extended to a completely (log-) monotone sequence. Using (30), an element $(b_0, \ldots, b_d) \in \mathcal{LM}_d$ is extendible to an element in $\mathcal{LM}_\infty$ if and only if the $(d-1)$-monotone sequence $(-\log(b_1/b_0), \ldots, -\log(b_d/b_{d-1}))$ is extendible to a completely monotone sequence. Thus, we can concentrate on the completely monotone case. Deciding whether a $d$-monotone sequence can be extended to a completely monotone sequence is the truncated Hausdorff moment problem again, see paragraph 2.1. This means that an effective analytical criterion for extendibility is known.
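The finite-difference checks behind these monotonicity notions are easy to implement. The sketch below (helper names are ours, and it only inspects differences up to order $d - 1$ on the available range, in the spirit of paragraph 2.1) accepts the moment sequence $b_k = 1/(k+1)$ of the uniform law but rejects a sequence that is 2- but not 3-monotone:

```python
def diffs(seq):
    # forward difference operator: (Delta b)_k = b_{k+1} - b_k
    return [b - a for a, b in zip(seq, seq[1:])]

def is_d_monotone(seq, d):
    # checks (-1)^j (Delta^j b)_k >= 0 for j = 0, ..., d-1
    cur, sign = list(seq), 1
    for _ in range(d):
        if any(sign * x < -1e-12 for x in cur):
            return False
        cur, sign = diffs(cur), -sign
    return True

b = [1.0 / (k + 1) for k in range(12)]   # moments of U(0,1): completely monotone
c = [1.0, 0.9, 0.5, 0.4]                 # non-increasing, but not 3-monotone
```

Since `b` is the moment sequence of a $[0, 1]$-valued random variable, it passes the check for every order available; `c` fails at order three because its second differences change sign.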
The following example demonstrates how a parameter sequence $\{b_k\}_{k \in \mathbb{N}_0}$ for some wide-sense geometric law is conveniently defined via the link to the little moment problem, setting $b_k := E[X^k]$, $k \in \mathbb{N}_0$, where $X$ is some arbitrary random variable taking values in $[0, 1]$.

Example 4.8 (A two-parametric family based on the Beta distribution)
Consider a random variable $X$ with the Beta density $f(x) = x^{p-1}(1-x)^{q-1}/B(p, q)$, $x \in (0, 1)$, with parameters $p, q > 0$. The moment sequence is known to be $b_k = E[X^k] = \prod_{j=0}^{k-1} \frac{p+j}{p+q+j}$, $k \in \mathbb{N}_0$, so that a two-parametric family of $d$-variate wide-sense geometric survival functions (for arbitrary $d \geq 1$) is obtained. The associated probability distribution of $Y_1$ in Theorem 4.6(G) is given by $Y_1 \stackrel{d}{=} -\log(X)$, i.e. the logarithm of the reciprocal of the Beta distribution in concern. Similarly, making use of (30), a two-parametric family of $d$-variate Marshall-Olkin survival functions (for arbitrary $d \geq 1$) is obtained as well. In the special case when $q = 2$, the Lévy subordinator in Theorem 4.6(E) is of compound Poisson type with intensity $p + 1$ and jumps that are exponentially distributed with parameter $p$.
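The moment sequence of this example can be generated with a short helper and cross-checked against a Monte Carlo estimate (variable names are ours):

```python
import random

def beta_moments(p, q, kmax):
    # b_k = E[X^k] = prod_{j=0}^{k-1} (p + j)/(p + q + j) for X ~ Beta(p, q)
    b, cur = [1.0], 1.0
    for j in range(kmax):
        cur *= (p + j) / (p + q + j)
        b.append(cur)
    return b

p, q, n = 2.0, 3.0, 200_000
b = beta_moments(p, q, 6)
rng = random.Random(2)
mc = [0.0] * 7
for _ in range(n):
    x, xp = rng.betavariate(p, q), 1.0
    for k in range(7):
        mc[k] += xp       # accumulate x^k
        xp *= x
mc = [m / n for m in mc]  # empirical moments E[X^k], k = 0, ..., 6
```

For $p = 2$, $q = 3$ the first moment is $b_1 = 2/5$, and the simulated moments agree with the product formula up to Monte Carlo error.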

Max-/min-stable laws and extreme-value copulas
Throughout this paragraph, for the sake of a more compact notation, we implicitly make excessive use of componentwise vector notation. We denote by $F$ (resp. $\bar F$) the $d$-variate distribution function (resp. survival function) of some $d$-dimensional random vector $Y = (Y_1, \ldots, Y_d)$ (resp. $X = (X_1, \ldots, X_d)$).
(a) (The probability law of) $Y$ is said to be max-stable if for arbitrary $t > 0$ there are vectors $\alpha(t) > 0$ and $\beta(t) \in \mathbb{R}^d$ such that $F(x)^t = F(\alpha(t)\,x + \beta(t))$ for all $x$. In this case, we also say that $F$ is max-stable. In words, $F^t$ is again a distribution function and equals $F$ modulo a componentwise linear transformation of its arguments.
(b) (The probability law of) $X$ is said to be min-stable if for arbitrary $t > 0$ there are vectors $\alpha(t) > 0$ and $\beta(t) \in \mathbb{R}^d$ such that $\bar F(x)^t = \bar F(\alpha(t)\,x + \beta(t))$ for all $x$. In this case, we also say that $\bar F$ is min-stable. In words, $\bar F^t$ is again a survival function and equals $\bar F$ modulo a componentwise linear transformation of its arguments.
If $Y$ is max-stable and $Y^{(i)}$ are independent copies of $Y$, then for arbitrary $n \in \mathbb{N}$ the vector of component-wise maxima of $Y^{(1)}, \ldots, Y^{(n)}$ equals $Y$ in distribution after a componentwise linear re-scaling; similarly, if $X$ is min-stable, the analogous statement holds for the component-wise minima. In words, the component-wise re-scaled maxima of iid copies of $Y$ (resp. minima of iid copies of $X$) have the same distribution as $Y$ (resp. $X$).
Max- and min-stability play a central role in multivariate extreme-value theory, as will briefly be explained. If $V^{(i)}$ are independent copies of some random vector $V = (V_1, \ldots, V_d)$, one is interested in the probability law of the vectors of component-wise maxima. If one can find sequences $\alpha_1(n), \ldots, \alpha_d(n) > 0$ and $\beta_1(n), \ldots, \beta_d(n) \in \mathbb{R}$ such that the re-scaled vector of maxima converges in distribution to some $Y = (Y_1, \ldots, Y_d)$, then one says that $Y$ has a multivariate extreme-value distribution. A classical result in multivariate extreme-value theory states that $Y$ has a multivariate extreme-value distribution if and only if $Y$ is max-stable, see, e.g., [45, pp. 172-174].
Since $Y$ is max-stable if and only if $-Y$ is min-stable (obviously), max- and min-stability can be studied jointly by focusing on one of the two concepts. Classical extreme-value theory textbooks typically focus on max-stability and further subdivide the study of the probability law of a max-stable $Y$ into two sub-studies: (i) By the Fisher-Tippett-Gnedenko Theorem, the univariate distribution function $F_k$ of each component $Y_k$ necessarily belongs to either the Gumbel, the Fréchet or the Weibull family, see [7, Chapter 2, p. 45 ff] for background.
(ii) Having understood the univariate marginal distribution functions $F_1, \ldots, F_d$ according to (i), the distribution function $F$ of $Y$ necessarily takes the form $F = C(F_1, \ldots, F_d)$ for a copula $C : [0,1]^d \to [0,1]$ with the characterizing property that $C(u)^t = C(u_1^t, \ldots, u_d^t)$ for each $t > 0$, a so-called extreme-value copula. In order to focus on a deeper understanding of extreme-value copulas it is convenient to normalize the margins $F_1, \ldots, F_d$. In classical extreme-value theory, it is standard to normalize to standardized Fréchet distributions, i.e. $F_k(x) = \exp(-\lambda_k/x)\,1_{\{x > 0\}}$ for some $\lambda_k > 0$. Furthermore, we observe that $X := (1/Y_1, \ldots, 1/Y_d)$ is well-defined, $X_k$ is exponential with rate $\lambda_k$, and $X$ is min-stable (since $x \mapsto 1/x$ is strictly decreasing, so max-stability of $Y$ is flipped to min-stability of $X$). The vector $X$ is thus called min-stable multivariate exponential; its survival function involves the extreme-value copula $C$ and satisfies the min-stability property (33). The analytical property (33) characterizes the concept of min-stable multivariate exponentiality on the level of survival functions, and serves as a convenient starting point to study the conditionally iid subfamily (of extreme-value copulas, resp. min-stable multivariate exponential distributions). For a given extreme-value copula $C$ it further turns out to be convenient to consider its so-called stable tail dependence function $\ell$, which satisfies $\ell(t\,x) = t\,\ell(x)$. Clearly, $\ell$ determines $C$ and $C$ determines $\ell$, so that investigating $\ell$ instead of $C$ is just a matter of convenience. Wrapping up, a min-stable multivariate exponential distribution is fully determined by the rates $(\lambda_1, \ldots, \lambda_d)$ specifying the one-dimensional exponential margins, and by a stable tail dependence function $\ell$ which stands in a one-to-one relationship with the associated extreme-value copula $C$.
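As a concrete standard example (not taken from the surveyed text), the Gumbel/logistic family has stable tail dependence function $\ell(x) = (\sum_i x_i^{1/\theta})^{\theta}$ with $\theta \in (0, 1]$; the homogeneity $\ell(t\,x) = t\,\ell(x)$ and the extreme-value property $C(u)^t = C(u_1^t, \ldots, u_d^t)$ can be verified numerically:

```python
import math

def stdf_gumbel(x, theta):
    # Gumbel/logistic stable tail dependence function
    return sum(xi ** (1.0 / theta) for xi in x) ** theta

def ev_copula(u, theta):
    # extreme-value copula: C(u) = exp(-l(-log u_1, ..., -log u_d))
    return math.exp(-stdf_gumbel([-math.log(ui) for ui in u], theta))

theta, t = 0.5, 2.5
homog_lhs = stdf_gumbel([t * 0.2, t * 1.3], theta)
homog_rhs = t * stdf_gumbel([0.2, 1.3], theta)
u = (0.3, 0.7, 0.9)
ev_lhs = ev_copula(u, theta) ** t
ev_rhs = ev_copula(tuple(ui ** t for ui in u), theta)
```

The extreme-value property follows directly from the homogeneity of $\ell$, and both identities hold up to floating-point rounding.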

Analytical characterization of conditionally iid
In the sequel, we are interested in the question: when is a min-stable multivariate exponential vector X, i.e. one whose survival function satisfies (33), conditionally iid?
We start with two important examples.

Example 5.2 (Independent exponentials)
If the components $X_1, \ldots, X_d$ of $X$ are iid, then we only need to consider the law of $X_1$. By definition, $X_1$ must have an exponential law, so there is some $\lambda > 0$ such that $\bar F(x_1, \ldots, x_d) = \exp(-\lambda\,(x_1 + \ldots + x_d))$, and for arbitrary $t > 0$ we have $\bar F(t\,x) = \bar F(x)^t$. Consequently, $X$ is min-stable multivariate exponential. The associated stable tail dependence function is $\ell(x) = x_1 + \ldots + x_d$.
For arbitrary $c \geq 0$ we introduce the notation $H_{+,c} \subset H_+$ for distribution functions of non-negative random variables with mean equal to $c$. With an iid sequence of unit exponentials $\eta_1, \eta_2, \ldots$ and $G \in H_+$ we consider an associated non-decreasing stochastic process $Z$. It is not difficult to see that $H := 1 - \exp(-Z) \in M_+^1(H_+)$. Consequently, we may define a conditionally iid random vector $X$ via the canonical stochastic model (5) from this process $H$: conditioned on $H$, the components of $X$ are iid with distribution function $H$. It turns out that $X$ is min-stable multivariate exponential. To see this, we recall that the increasing sequence $\{\eta_1 + \ldots + \eta_n\}_{n \geq 1}$ equals the enumeration of the points of a Poisson random measure on $[0, \infty)$ with intensity measure equal to the Lebesgue measure. With the help of [79, Proposition 3.6], this allows to compute the survival function $\bar F$ of $X$ in closed form. Introducing the function $\ell_G$ accordingly, we observe by substitution that $\ell_G(t\,x) = t\,\ell_G(x)$ for arbitrary $t > 0$. This implies $\bar F(t\,x) = \bar F(x)^t$, so $X$ is min-stable multivariate exponential. The function $\ell_G$ is the stable tail dependence function of $X$, and the constant $M_G$ equals the exponential rate of the exponential random variables $X_1, \ldots, X_d$.
The main theorem in this section states that Examples 5.2 and 5.3 are general enough to understand the structure of the set of all infinite exchangeable sequences $\{X_k\}_{k\in\mathbb{N}}$ whose finite-dimensional margins are both min-stable multivariate exponential and conditionally iid. Concretely, Theorem 5.4 solves Problem 1.5 for the property (P) of "having a min-stable multivariate exponential distribution (in some dimension)". In analytical terms, it states that the stable tail dependence function associated with the extreme-value copula of a conditionally iid min-stable multivariate exponential random vector is a convex mixture of stable tail dependence functions having the structural form as presented in Examples 5.2 and 5.3.
Theorem 5.4 (Which min-stable laws are conditionally iid?) Let $\{X_k\}_{k\in\mathbb{N}}$ be an infinite exchangeable sequence of positive random variables such that $X = (X_1,\ldots,X_d)$ is min-stable multivariate exponential for all $d \in \mathbb{N}$. Assume that $\{X_k\}_{k\in\mathbb{N}}$ is not iid, i.e. not given as in Example 5.2. Then there exists a unique triplet $(b, c, \gamma)$ of two constants $b \geq 0$, $c > 0$ and a probability measure $\gamma$ on $\mathcal{H}_{+,1}$, such that $X_k$ is exponential with rate $b+c$ for each $k \in \mathbb{N}$ and the stable tail dependence function of X equals $\ell(x) = b\,(x_1+\ldots+x_d) + c \int_{\mathcal{H}_{+,1}} \ell_G(x)\,\gamma(\mathrm{d}G)$. In probabilistic terms, the random distribution function H, defined as the limit of empirical distribution functions of the $\{X_k\}_{k\in\mathbb{N}}$ as in Lemma 1.16, is given by $H = 1-\exp(-Z)$ for a strong IDT process Z built from the drift b, the constant c, and an iid sequence $G^{(k)}$ drawn from the probability measure $\gamma$, independent of the iid unit exponentials $\eta_1, \eta_2, \ldots$.

Proof
A proof consists of three steps, which have been accomplished in the three references [67,52,57], respectively, and which are sketched in the sequel.
(i) [67] shows that such a sequence is conditionally iid if and only if the process Z in the canonical construction (5) can be chosen to satisfy

$\{Z_t\}_{t \geq 0} \overset{d}{=} \{Z^{(1)}_{t/n} + \ldots + Z^{(n)}_{t/n}\}_{t \geq 0}$ for each $n \in \mathbb{N}$,    (35)

where $Z^{(i)}$ are independent copies of Z. Conversely, it is shown that if Z is non-decreasing and satisfies (35), then $1-\exp(-Z)$ is an element of $\Theta_d^{-1}(M^{**})$, when $M^{**}$ is as in Problem 1.5 and (P) is the property of "having a d-variate min-stable multivariate exponential distribution".
(ii) [52] show that a non-negative stochastic process Z satisfying (35) admits a series representation of the form $Z_t = b\,t + \sum_{n \geq 1} f^{(n)}_{t/(\eta_1+\ldots+\eta_n)}$, $t \geq 0$, where $f^{(n)}$ are iid copies of some càdlàg stochastic process f with $f_0 = 0$ satisfying some integrability condition, $\eta_1, \eta_2, \ldots$ are iid unit exponentials, and $b \in \mathbb{R}$.
(iii) [57] proves that $b \geq 0$ and that f from the series representation in (ii) is necessarily non-decreasing almost surely. Furthermore, the integrability condition on f can be re-phrased to say that $t \mapsto \bar G_t := \exp(-\lim_{s \downarrow t} f_{1/s})$ defines almost surely the distribution function of some random variable with finite mean $M_{\bar G} = \int_0^{\infty} 1-\bar G_t\,\mathrm{d}t > 0$. Finally, the distribution function $t \mapsto G_t := \bar G_{M_{\bar G}\,t}$ has unit mean, and the claimed representation for $\ell$ is obtained when $c := E[M_{\bar G}]$ and $\gamma$ is defined as the probability law of G after an appropriate measure change. That $(b, c, \gamma)$ is unique follows from the normalization to unit mean of G (for each single realization).
Stochastic processes with property (35) are said to be strongly infinitely divisible with respect to time (strong IDT). Particular examples of strong IDT processes have been studied in [69,24,39], with an emphasis on the associated multivariate min-stable laws also in [67,9,56,68].
Every Lévy process is strong IDT, but the converse need not hold. For instance, if $Z = \{Z_t\}_{t \geq 0}$ is a non-trivial Lévy subordinator and $a > b > 0$, then the stochastic process $\{Z_{a\,t} + Z_{b\,t}\}_{t \geq 0}$ is strong IDT, but not a Lévy subordinator. In the case when Z is a Lévy subordinator, the probability law $\gamma$ in Theorem 5.4 can be specified explicitly in terms of the Lévy characteristics of Z.

Example 5.5 (The logistic model)
The stable tail dependence function $\ell(x) = (x_1^{\theta} + \ldots + x_d^{\theta})^{1/\theta}$ with parameter $\theta \geq 1$ defines the so-called logistic model. It is particularly convenient to be looked at from the perspective of conditionally iid models, since the associated strong IDT process Z takes a very simple form, to wit $Z_t = S\,t^{\theta}$, where S is a positive stable random variable with Laplace transform $E[\exp(-x\,S)] = \exp(-x^{1/\theta})$. In particular, the resulting extreme-value copula, named Gumbel copula after [37,38], is also an Archimedean copula, see Remark 3.8. In fact, it is the only copula that is both Archimedean and of extreme-value kind, a result first discovered in [35].
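This representation can be backtested by Monte Carlo. The sketch below assumes the form $Z_t = S\,t^{\theta}$ for the logistic model, samples the positive stable variable S with the Chambers-Mallows-Stuck/Kanter method, and compares the empirical joint survival probability with the closed-form Gumbel expression $\exp(-(x_1^{\theta}+x_2^{\theta})^{1/\theta})$ (the parameter $\theta = 2$ and the evaluation point are arbitrary choices):

```python
import math, random

random.seed(7)
theta = 2.0          # logistic/Gumbel parameter, theta >= 1
alpha = 1.0 / theta  # index of the positive stable law, alpha in (0, 1]

def positive_stable():
    # Chambers-Mallows-Stuck / Kanter sampler: E[exp(-x*S)] = exp(-x**alpha)
    U = random.uniform(0.0, math.pi)
    E = random.expovariate(1.0)
    return (math.sin(alpha * U) / math.sin(U) ** (1.0 / alpha)) * \
           (math.sin((1.0 - alpha) * U) / E) ** ((1.0 - alpha) / alpha)

def sample_X(d):
    # conditionally iid: given S, each X_k has survival function exp(-S * x**theta)
    S = positive_stable()
    return [(random.expovariate(1.0) / S) ** (1.0 / theta) for _ in range(d)]

x1, x2 = 0.4, 0.6
N = 200000
hits = 0
for _ in range(N):
    X = sample_X(2)
    hits += X[0] > x1 and X[1] > x2
exact = math.exp(-((x1 ** theta + x2 ** theta) ** (1.0 / theta)))
assert abs(hits / N - exact) < 0.01
```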
A related example is obtained if we choose the Weibull distribution function $G(x) = 1-\exp(-\{\Gamma(\theta+1)\,x\}^{1/\theta})$, which implies the so-called negative logistic model. The associated extreme-value copula is named Galambos copula after [32]. There exist many analogies between logistic and negative logistic models; the interested reader is referred to [33] for background. In particular, the Galambos copula is the most popular representative of the family of so-called reciprocal Archimedean copulas as introduced in [34], see also paragraph 7.1 below.

Example 5.6 (A rich parametric family)
For $G \in \mathcal{H}_{+,1}$ the function $\Psi_G(z) := \int_0^{\infty} 1-G(t)^z\,\mathrm{d}t$ defines a Bernstein function with $\Psi_G(1) = 1$, see [56, Lemma 3]. This implies for $z \in (0,\infty)$ that $G_z \in \mathcal{H}_{+,1}$, where $G_z(x) := G\big(x\,\Psi_G(z)\big)^z$. Consequently, if A is a positive random variable, we may define $\gamma$ as the law of $G_A \in M_+^1(\mathcal{H}_{+,1})$. The associated stable tail dependence function equals $\ell(x) := E[\ell_{G_A}(x)]$. Many parametric models from the literature are subsumed by this construction. In particular, Example 5.3 corresponds to the case $A \equiv 1$, and if $G(x) = \exp(-1) + (1-\exp(-1))\,1_{\{1-\exp(-1) \geq 1/x\}}$ we observe that $G_A$ equals the random distribution function (36) corresponding to the Marshall-Olkin subfamily. See [68] for a detailed investigation and applications of this parametric family.
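The claim that $\Psi_G(z) = \int_0^{\infty} 1-G(t)^z\,\mathrm{d}t$ is a Bernstein function with $\Psi_G(1) = 1$ can be probed numerically; the choice of G below (the unit-mean exponential distribution function, an arbitrary element of $\mathcal{H}_{+,1}$) and the checked grid are assumptions of this sketch:

```python
import math

def Psi_G(z, n=100000, s_max=40.0):
    # midpoint-rule approximation of Psi_G(z) = integral of 1 - G(t)**z, G(t) = 1 - exp(-t)
    h = s_max / n
    return sum((1.0 - (1.0 - math.exp(-(i + 0.5) * h)) ** z) * h for i in range(n))

assert abs(Psi_G(1.0) - 1.0) < 1e-3  # normalization Psi_G(1) = 1
vals = [Psi_G(z) for z in (0.5, 1.0, 2.0, 4.0, 8.0)]
assert all(a < b for a, b in zip(vals, vals[1:]))  # Psi_G is increasing
# concavity (a necessary Bernstein property): equally spaced increments decrease
assert Psi_G(3.0) - Psi_G(2.0) < Psi_G(2.0) - Psi_G(1.0)
```

For this particular G one has $\Psi_G(z)$ equal to the generalized harmonic number $H_z$, e.g. $\Psi_G(2) = 3/2$.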

Remark 5.7 (Extension to laws with exponential minima)
We have seen that the Marshall-Olkin distribution is a subfamily of min-stable multivariate exponential laws. The seminal reference [23] treats both families as multivariate extensions of the univariate exponential law and in the process introduces the even larger family of laws with exponential minima. A random vector X is said to have exponential minima if $\min\{X_{i_1},\ldots,X_{i_k}\}$ has a univariate exponential law for arbitrary $1 \leq i_1 < \ldots < i_k \leq d$. Obviously, a min-stable multivariate exponential law has exponential minima, but the converse need not hold in general. It is shown in [67] that if $Z = \{Z_t\}_{t \geq 0}$ is a right-continuous, non-decreasing process such that $E[\exp(-x\,Z_t)] = \exp(-t\,\Psi(x))$ for some Bernstein function $\Psi$, then X as defined in (5) has exponential minima. The process Z is said to be weakly infinitely divisible with respect to time (weak IDT), and, as the nomenclature suggests, every strong IDT process is also weak IDT. However, there exist weak IDT processes which are not strong IDT. Notice in particular that a Lévy subordinator is uniquely determined in law by the law of $Z_1$ (or equivalently the Bernstein function $\Psi$), but neither strong nor weak IDT processes are determined in law by the law of $Z_1$. If one takes two independent, but different, strong IDT processes $Z^{(1)}, Z^{(2)}$ with $Z^{(1)}_1 \overset{d}{=} Z^{(2)}_1$ and defines Z as $Z^{(1)}$ or $Z^{(2)}$ according to the outcome of an independent fair coin toss, then Z is weak IDT, but not strong IDT. On the level of X this means that the mixture of two min-stable multivariate exponential random vectors always has exponential minima, but need not be min-stable anymore.

Remark 5.8 (Archimax copulas)
The study of min-stable multivariate exponentials is analogous to the study of extreme-value copulas. From this perspective, Theorem 5.4 gives us a canonical stochastic model for all conditionally iid extreme-value copulas. Another family of copulas for which we understand the conditionally iid subfamily pretty well is that of Archimedean copulas, related to $\ell_1$-norm symmetric distributions and mentioned in Remark 3.8. The family of so-called Archimax copulas is a superclass of both extreme-value and Archimedean copulas. It has been studied in [13,14] with the intention to create a rich copula family that comprises well-known subfamilies. An extreme-value copula C is conveniently described in terms of its stable tail dependence function $\ell$. Recall that Theorem 5.4 is formulated in terms of the stable tail dependence function and gives an analytical criterion for C to be conditionally iid. An Archimax copula C is a multivariate distribution function of the functional form $C(u_1,\ldots,u_d) = \varphi\big(\ell(\varphi^{-1}(u_1),\ldots,\varphi^{-1}(u_d))\big)$, where $\ell$ is a stable tail dependence function and $\varphi$ an Archimedean generator. It is recognized that a law with survival function of the functional form $\bar F(x) = \varphi\big(\ell(x_1,\ldots,x_d)\big)$ is conditionally iid, i.e. lies in $M^{**}$ when M denotes the family of all probability laws with the property (P) of "having a survival function of this functional form (in some dimension d)", provided that the function $\varphi$ equals the Laplace transform of a positive random variable and $\ell$ is given in terms of a triplet $(b, c, \gamma)$ such as in Theorem 5.4, associated with the strong IDT process Z, and $b + c = 1$.

Exogenous shock models
The present section studies a family M of multivariate distribution functions that have a stochastic representation according to the following exogenous shock model: We consider some system consisting of d components and interpret the k-th component of our random vector $X = (X_1,\ldots,X_d)$ with law in M as the lifetime of the k-th component in our system. A component lives until it is affected by an exogenous shock, and the arrival times of these exogenous shocks are modeled stochastically. For each non-empty subset $I \subset \{1,\ldots,d\}$ of components, we denote by $E_I$ a non-negative random variable. We assume that all $E_I$ are independent and interpret $E_I$ as the arrival time of an exogenous shock affecting all components of our random vector which are indexed by I. This means that we define

$X_k := \min\{E_I \,:\, I \subset \{1,\ldots,d\},\ k \in I\},\quad k = 1,\ldots,d.$    (38)

Such exogenous shock models are popular in reliability theory, insurance risk, and portfolio credit risk. Recall from Example 4.2(E) that this model is a generalization of the Marshall-Olkin distribution, which arises as a special case if all the $E_I$ are exponentially distributed, see also Example 6.2 below.
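A direct simulation of (38) for d = 3 illustrates the construction; the exchangeable exponential shock rates below are arbitrary choices, and the check exploits the elementary fact that $X_1$ is then exponential with rate equal to the sum of the rates of all shocks affecting component 1:

```python
import math, random
from itertools import combinations

random.seed(1)
d = 3
rates = {1: 0.5, 2: 0.3, 3: 0.2}  # shock rate depending only on |I| (exchangeability)
subsets = [I for m in range(1, d + 1) for I in combinations(range(d), m)]

def sample_X():
    # one exponential shock E_I per non-empty subset I; component k dies at the
    # first shock affecting it: X_k = min{E_I : k in I}
    E = {I: random.expovariate(rates[len(I)]) for I in subsets}
    return [min(E[I] for I in subsets if k in I) for k in range(d)]

rate1 = sum(rates[len(I)] for I in subsets if 0 in I)  # = 0.5 + 2*0.3 + 0.2 = 1.3
t = 0.8
N = 100000
p = sum(sample_X()[0] > t for _ in range(N)) / N
assert abs(p - math.exp(-rate1 * t)) < 0.01
```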

Exchangeability and the extendibility problem
We are interested in a solution of Problem 1.5 for the property (P) of "having an exogenous shock model representation (38)". By Lemma 1.3 exchangeability is a necessary requirement on X, and we observe immediately from (38) that this implies that the distribution function of $E_I$ is allowed to depend on the subset I only through its cardinality |I|. Some simple algebraic manipulations, see the proof of Theorem 6.1 below, reveal that the survival function of X must then be given as a product of its arguments after being ordered and idiosyncratically distorted. Thus, already from a purely algebraic viewpoint it is a quite compelling problem to determine necessary and sufficient conditions on the distortions to obtain a proper survival function. In other words, already the characterization of the exchangeable subfamily in analytical terms is an interesting problem; the interested reader is referred to [60] for its solution.
The conditionally iid subfamily M * * is also investigated in [60]. One major finding is that when the increments of the factor process Z in the canonical construction (5) are independent, then one ends up with an exogenous shock model. Recall that a càdlàg stochastic process Z = {Z t } t≥0 with independent increments is called additive, see [82] for a textbook treatment. For our purpose, it is sufficient to be aware that the probability law of a non-decreasing additive process Z = {Z t } t≥0 with Z 0 = 0 can be described uniquely in terms of a family {Ψ t } t≥0 of Bernstein functions defined by Ψ t (x) := − log(E[exp(−x Z t )]), x ≥ 0, i.e. Ψ t equals the Laplace exponent of the infinitely divisible random variable Z t . The independent increment property implies for 0 ≤ s ≤ t that Ψ t − Ψ s is also a Bernstein function and equals the Laplace exponent of the infinitely divisible random variable Z t −Z s . The easiest example for a non-decreasing additive process is a Lévy subordinator, in which case Ψ t = t Ψ 1 , i.e. the probability law is described completely in terms of just one Bernstein function Ψ 1 (due to the defining property that the increments are not only independent but also identically distributed). Two further compelling examples of (non-Lévy) additive processes are presented in subsequent paragraphs.
Theorem 6.1 (Additive subordinators and exogenous shock models) Let M denote the family of probability laws with the property (P) of "having a stochastic representation as in (38)". A random vector X has law in M and is exchangeable if and only if it admits a survival copula of the functional form

$\hat C(u_1,\ldots,u_d) = \prod_{k=1}^{d} g_k\big(u_{[k]}\big),$    (39)

where $u_{[1]} \leq \ldots \leq u_{[d]}$ denotes the ordered list of $u_1,\ldots,u_d$ and $g_1(x) = x, g_2,\ldots,g_d$ are suitable univariate distortion functions.

Proof
A proof sketch works as follows, see [60] for details. The survival function of the random vector X defined by (38) can be written in terms of the one-dimensional survival functions $\bar H_I$ of the $E_I$ as

$\bar F(x_1,\ldots,x_d) = \prod_{\emptyset \neq I \subset \{1,\ldots,d\}} \bar H_I\big(\max_{k \in I} x_k\big).$    (40)

Exchangeability of X implies that the probability law of $E_I$ depends on I only via its cardinality $|I| \in \{1,\ldots,d\}$. If we denote the survival function of $E_I$ with $|I| = m$ by $\bar H_m$, we observe that $\bar F(x_1,\ldots,x_d) = \prod_{k=1}^{d} \prod_{m=1}^{k} \bar H_m\big(x_{[k]}\big)^{\binom{k-1}{m-1}}$, since $x_{[k]}$ is the maximum of precisely $\binom{k-1}{m-1}$ subsets of cardinality m. Noting for $x = (x, 0, \ldots, 0)$ that $x_{[d]} = x$ and $x_{[1]} = \ldots = x_{[d-1]} = 0$, we observe that the one-dimensional margins are $\bar F_1(x) = \prod_{m=1}^{d} \bar H_m(x)^{\binom{d-1}{m-1}}$. That (40) can be written as $\hat C\big(P(X_1 > x_1),\ldots,P(X_d > x_d)\big)$ with $\hat C$ as in (39) follows by a tedious yet straightforward computation, with the $g_k$ defined in terms of the $\bar H_m$ and of $\bar F_1^{-1}$, the generalized inverse of the non-increasing function $\bar F_1$, which is defined analogously to the generalized inverse of a distribution function as $\bar F_1^{-1}(u) := \inf\{x \geq 0 \,:\, \bar F_1(x) \leq u\}$. Now assume Z is an additive process with associated family of Bernstein functions $\{\Psi_t\}_{t \geq 0}$. The survival copula of the random vector X of Equation (5) can be computed in closed form using the independent increment property of Z, and it is easily shown to be of the structural form (39).

There are some interesting subfamilies of exogenous shock models that are worth mentioning with respect to their conditionally iid substructure. The first of them is a well-known friend from previous sections, re-visited once again in the following example.

Example 6.2 (The Marshall-Olkin law revisited)
If all random variables E I in the exogenous shock model construction (38) are exponentially distributed, we are in the special situation of Example 4.2(E). Indeed, it has already been shown in the original reference [70] that every Marshall-Olkin distribution can be constructed like this. Hence, we already know from Theorem 4.6(E) that an exogenous shock model with exponential arrival times is obtained via the canonical conditionally iid model (5) if the associated stochastic process H = {H t } t≥0 ∈ M 1 + (H + ) is such that Z t := − log(1 − H t ), t ≥ 0, defines a Lévy subordinator, which is a special additive subordinator.

Example 6.3 (A simple global shock model)
A special case of copulas of the form (39) is considered in [20], namely $g_2 = \ldots = g_d$, which we briefly put in context with the additive process construction. To this end, let $g_2$ be a strictly increasing and continuous distribution function of some random variable taking values in [0,1], assuming that $x \mapsto g_2(x)/x$ is non-increasing on (0,1]. The function $F_M(x) := x/g_2(x)$ then is a distribution function on [0,1], and we let M be a random variable with this distribution function. Independently, let $W_1, W_2, \ldots$ be an iid sequence drawn from $g_2$. We consider the infinite exchangeable sequence $\{X_k\}_{k\in\mathbb{N}}$ in which all components are affected by one global shock, constructed from M, and component k in addition by the idiosyncratic shock $W_k$. By definition, each finite d-margin has an exogenous shock model representation, and the survival copula $\hat C$ is easily seen to be of the form (39) with $g_2 = \ldots = g_d$, for arbitrary $d \geq 2$. The conditional distribution function is static in the sense of Section 3, and given by $H = 1-\exp(-Z)$, where Z grows deterministically until it is killed, i.e. jumps to $+\infty$, at the arrival of the global shock. The random variable $E_{\{1,2,\ldots\}}$ is unit exponential, and Z is additive with an associated family of Bernstein functions $\{\Psi_t\}_{t \geq 0}$. Notice that for each fixed $t > 0$ this corresponds to an infinitely divisible distribution of $Z_t$ that is concentrated on a two-point set of the form $\{z_t, +\infty\}$ with a deterministic value $z_t$. The case $g_2(x) = x^{\alpha}$ with $\alpha \in [0,1]$ implies that Z is a killed Lévy subordinator that grows linearly before it jumps to infinity, X has a Marshall-Olkin law, and the $E_I$ are exponential. In the general case, Z need not grow linearly before it gets killed.
Two further examples are studied in greater detail in the following two paragraphs, since they give rise to nice characterization results.

The Dirichlet prior and radial symmetry
In the two landmark papers [30,31], T.S. Ferguson introduces the so-called Dirichlet prior and shows that it can be constructed by means of an additive process. More precisely, let $c > 0$ be a model parameter and let $G \in \mathcal{H}_+$ be continuous and strictly increasing. Consider a non-decreasing additive process $Z = \{Z_t\}_{t \in [G^{-1}(0), G^{-1}(1)]}$ whose probability law is determined by a family of Bernstein functions $\{\Psi_t\}$ explicitly specified in terms of c and G; the random distribution function $H := 1-\exp(-Z)$ is then called Dirichlet prior with parameters (c, G), denoted DP(c, G) in the sequel. The probability distribution of $(X_1,\ldots,X_d)$ in (5), when H = DP(c, G) for some G with support $[G^{-1}(0), G^{-1}(1)] = [0, \infty]$, has one-dimensional margins G and a copula $\hat C_c$ depending only on c. It is insightful to remark that for $c \downarrow 0$ the copula $\hat C_c$ converges to the so-called upper Fréchet-Hoeffding copula $\hat C_0(u) = u_{[1]}$, and for $c \uparrow \infty$ to the copula $\hat C_{\infty}(u) = \prod_{k=1}^{d} u_k$ associated with independence. The intuition of the Dirichlet prior model is that all components of X have distribution function G, but one is uncertain whether G is really the correct distribution function. So the parameter c models an uncertainty about G in the sense that the process H must be viewed as a "distortion" of G. For $c \uparrow \infty$ we obtain H = G, while for $c \downarrow 0$ the process H is maximally chaotic (in some sense) and does not resemble G at all.
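The role of c as an uncertainty parameter can be illustrated by sampling $X_1,\ldots,X_d$ from a DP(c, G) prior via the Blackwell-MacQueen Pólya urn, a standard equivalent of Ferguson's construction (the base measure and the thresholds below are arbitrary choices of this sketch):

```python
import random

random.seed(0)

def dp_sample(d, c, G_sampler):
    # Blackwell-MacQueen Polya urn: X_{n+1} is a fresh draw from the base measure G
    # with probability c/(c+n), otherwise a uniformly chosen earlier observation
    X = [G_sampler()]
    for n in range(1, d):
        if random.random() < c / (c + n):
            X.append(G_sampler())
        else:
            X.append(random.choice(X))
    return X

uniform_base = random.random  # base measure G = uniform distribution on [0, 1]
ties_heavy = dp_sample(500, 0.1, uniform_base)      # c small: H far from G, many ties
ties_light = dp_sample(500, 10000.0, uniform_base)  # c large: H close to G, few ties
assert len(set(ties_heavy)) < 50
assert len(set(ties_light)) > 450
```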
Interestingly, if the probability law dG is symmetric about its median $\mu := G^{-1}(0.5)$, then the random vector $(X_1,\ldots,X_d)$ is radially symmetric, which can be verified using Lemma 1.14. One can furthermore show that there exists no other conditionally iid exogenous shock model satisfying this property, see the following lemma. To this end, recall that a copula C is called radially symmetric if $C = \hat C$, i.e. it equals its own survival copula, which means that $U = (U_1,\ldots,U_d) \overset{d}{=} (1-U_1,\ldots,1-U_d)$ for a random vector U with distribution function C.

Lemma 6.4 (Radial symmetry characterizes the Dirichlet prior) Within the family of conditionally iid exogenous shock models (38), the copula of X is radially symmetric if and only if the associated random distribution function H in (5) equals DP(c, G) for some $c > 0$ and some continuous, strictly increasing G whose probability law dG is symmetric about its median.

Proof
This is [61, Theorem 3.5]. In order to prove necessity, the principle of inclusion and exclusion can be used to express the survival copula of $\hat C$ as an alternating sum of lower-dimensional margins of $\hat C$. By radial symmetry, this expression equals $\hat C$, and on both sides of the equation one may now take the derivatives with respect to all d arguments. A lengthy but straightforward computation then shows that the $g_k$ must all be linear, which implies the claim. Sufficiency is proved using the Dirichlet prior construction. The defining properties of the Dirichlet prior imply that the assumptions of Lemma 1.14 are satisfied, which implies the claim.

The Sato-frailty model and self-decomposability
A real-valued random variable X is called self-decomposable if for arbitrary $c \in (0,1)$ there exists an independent random variable Y such that $X \overset{d}{=} c\,X + Y$. It can be shown that a self-decomposable X is infinitely divisible, so self-decomposable laws are special cases of infinitely divisible laws. In particular, if X takes values in $(0,\infty)$ and is infinitely divisible with Laplace exponent given by the Bernstein function $\Psi$, then X is self-decomposable if and only if the function $x \mapsto x\,\Psi'(x)$ is again a Bernstein function, see [91, Theorem 2.6, p. 227]. Now let $\Psi$ be the Bernstein function associated with a self-decomposable law on $(0,\infty)$, and consider the family of Bernstein functions defined by $\Psi_t(x) := \Psi(x\,t)$, $t \geq 0$. One can show that there exists an additive process $Z = \{Z_t\}_{t \geq 0}$ with $E[\exp(-x\,Z_t)] = \exp(-\Psi(x\,t))$, $x, t \geq 0$, called a Sato subordinator. If we use this process in (5), the conditionally iid random vector X obtained by this construction has survival function given by

$\bar F(x) = \exp\Big(-\sum_{k=1}^{d} \big[\Psi\big((d-k+1)\,x_{[k]}\big) - \Psi\big((d-k+1)\,x_{[k-1]}\big)\big]\Big),$    (42)

where $x_{[1]} \leq \ldots \leq x_{[d]}$ denotes the ordered list of $x_1,\ldots,x_d$ and $x_{[0]} := 0$. The following lemma characterizes self-decomposability analytically in terms of multivariate probability laws given by (42).

Lemma 6.5 (Self-decomposability and Sato-frailty models) A Bernstein function $\Psi$ is associated with a self-decomposable probability law on $(0,\infty)$ if and only if (42) defines a d-variate survival function for each $d \in \mathbb{N}$.

Proof
Sufficiency is an instance of the general Theorem 6.1, as demonstrated above. Necessity, i.e. that self-decomposability can actually be characterized in terms of the multivariate survival functions (42), is shown in [62] and relies on some purely analytical, technical computations.
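The analytical criterion above (that $x \mapsto x\,\Psi'(x)$ must again be a Bernstein function) can be probed numerically for the Gamma Bernstein function of the following example; the grid points, step sizes, and checked orders below are arbitrary choices of this sketch:

```python
import math

alpha = 0.7
Psi = lambda x: alpha * math.log(1.0 + x)  # Bernstein function of a Gamma law

h = 1e-3
dPsi = lambda x: (Psi(x + h) - Psi(x - h)) / (2.0 * h)  # numerical derivative
g = lambda x: x * dPsi(x)  # self-decomposability criterion: g must be Bernstein

def fwd_diff(f, x, n, step):
    # n-th forward difference; its sign matches the n-th derivative for small step
    return sum((-1) ** (n - k) * math.comb(n, k) * f(x + k * step) for k in range(n + 1))

for x in (0.1, 0.5, 1.0, 3.0):
    assert g(x) >= 0.0
    for n in (1, 2, 3):
        # Bernstein property: g >= 0 and (-1)**(n-1) * g^(n) >= 0
        assert (-1) ** (n - 1) * fwd_diff(g, x, n, 0.05) > 0.0
```

Here $g(x) = \alpha x/(1+x)$ in closed form, which is indeed a Bernstein function.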
Example 6.6 (A one-parametric, multivariate Pareto distribution) Let $\Psi(x) = \alpha \log(1+x)$ be the Bernstein function associated with a Gamma distribution with parameter $\alpha > 0$. The Gamma distribution is self-decomposable, and the survival function (42) takes the explicit, one-parametric form of a multivariate Pareto distribution with margins $\bar F_1(x) = (1+x)^{-\alpha}$.

Related open problems

Extendibility problem for further families
The present article surveys solutions to Problems 1.1 and 1.5 for several families M of interest. One goal of the survey is to encourage others to solve the problem also for other families. We provide examples that we find compelling: (i) The family of min-stable laws in Section 5 can be generalized to min-infinitely divisible laws. Generalizing (33), a multivariate survival function $\bar F$ is called min-infinitely divisible if for each $t > 0$ there is a survival function $\bar F_t$ such that $\bar F(x)^t = \bar F_t(x)$. Like in the case of min-stable laws, the concept of min-infinite divisibility is equivalent to the concept of max-infinite divisibility, on which [79] provides a textbook treatment. It is pretty obvious that non-decreasing infinitely divisible processes occupy a commanding role with regard to the conditionally iid subfamily, but to work out a convenient analytical treatment of these in relation with the associated min-infinitely divisible laws appears to be a promising direction for further research. Notice that the family of reciprocal Archimedean copulas, introduced in [34], is one particular special case of max-infinitely divisible distribution functions, and in this special case the conditionally iid subfamily is determined similarly as in the case of Archimedean copulas, see [34, Section 7]. This might serve as a good motivating example for the aforementioned generalization.
(ii) Theorem 3.10 studies d-variate densities of the form $g\big(x_{[d]}\big)$, and [36] also considers a generalization to densities of the form $g\big(x_{[1]}, x_{[d]}\big)$, depending on $x_{[1]}$ and $x_{[d]}$. From a purely algebraic viewpoint it is tempting to investigate whether exchangeable densities of the structural form $\prod_{k=1}^{d} g_k\big(x_{[k]}\big)$ allow for a nice theory as well. When are these conditionally iid? This generalization of the $\ell_{\infty}$-norm symmetric case is motivated by a relation to non-homogeneous pure birth processes, as already explained in Remark 3.12. Such processes are of interest in reliability theory, as explained in [87].
(iii) On page 45 it was mentioned that the Marshall-Olkin distribution is characterized by the property that for all subsets of components the respective "survival indicator process" is a continuous-time Markov chain. This property may naturally be weakened to the situation when only the survival indicator process $Z_t := (1_{\{X_1 > t\}}, \ldots, 1_{\{X_d > t\}})$ of all components is a continuous-time Markov chain. On the level of multivariate distributions, one generalizes the Marshall-Olkin distribution to a more general family of multivariate laws that has been shown to be interesting in mathematical finance in [42]. Furthermore, it is a subfamily of the even larger family of so-called multivariate phase-type distributions, see [5]. Which members of these families of distributions are conditionally iid? Presumably, this research direction requires generalizing the Lévy subordinator in the Marshall-Olkin case to more general non-decreasing Markov processes.

Testing for conditional independence
If a specific d-variate law in some family M is given, do we have a practically useful, analytical criterion to decide whether or not this law is in $M^*$, resp. $M^{**}$? According to Theorem 1.18, in general this requires checking whether a supremum over bounded measurable functions is bounded from above, which in practice is rather inconvenient, at least at first glimpse. For certain families, however, there is hope to find more useful criteria. For instance, for Marshall-Olkin distributions the link to the truncated moment problem in Remark 4.7 is helpful in this regard, like it is for binary sequences. For Archimedean copulas (resp. $\ell_1$-norm symmetric survival functions) this boils down to checking whether a d-monotone function is actually completely monotone, i.e. a Laplace transform. However, it is an open problem for the family of extreme-value copulas. Of course, Theorem 5.4 tells us which stable tail dependence functions correspond to conditionally iid laws. But given some specific stable tail dependence function, how can we tell effectively whether or not this given function has the desired form? Even for dimension d = 2, in which case the problem is presumably easier due to the fact that the 2-dimensional unit simplex is one-dimensional, this problem is non-trivial and open. Given we find such an effective analytical criterion for some family M, is it even possible to build a useful statistical test based on it, i.e. can we test the hypothesis that the law is conditionally iid?

Combination of one-factor models to multi-factor models
This is probably the most obvious application of the presented concepts. The idea works as follows. According to our notation, the dependence-inducing latent factor in a conditionally iid model is H. Depending on the stochastic properties of $H \in M_+^1(\mathcal{H})$, it may be possible to construct H from a pair $(H^{(1)}, H^{(2)}) \in M_+^1(\mathcal{H}) \times M_+^1(\mathcal{H})$ of two independent processes of the same structural form, say $H = f(H^{(1)}, H^{(2)})$. For example, if $H^{(1)}$ and $H^{(2)}$ are two strong IDT processes, see Section 5, then so is their sum $H = H^{(1)} + H^{(2)}$. In this situation, we may define dependent processes $H^{(1,1)}, \ldots, H^{(1,J)}$ from J+1 independent processes $H^{(0)}, \ldots, H^{(J)}$ as $H^{(1,j)} = f(H^{(0)}, H^{(j)})$. The conditionally iid vectors $X^{(1)}, \ldots, X^{(J)}$ defined via (4) from $H^{(1,1)}, \ldots, H^{(1,J)}$ are then dependent, so that the combined vector $X = (X^{(1)}, \ldots, X^{(J)})$ has a hierarchical dependence structure. Such structures break out of the (sometimes undesired and limited) exchangeable cosmos and have the appealing property that the lowest-level groups are conditionally iid, so the whole structure can be sized up, i.e. is dimension-free to some degree. Of particular interest is the situation when the random vector $(X_1^{(1)}, \ldots, X_1^{(J)})$ composed of one component from each of the J different groups is conditionally iid and its latent factor process equals $H^{(0)}$ in distribution. In this particular situation, an understanding of the whole dependence structure of the hierarchical model X is retrieved from an understanding of the conditionally iid sub-models based on the $H^{(j)}$. In other words, the conditionally iid model can be nested to construct highly tractable, non-exchangeable, multi-factor dependence models from simple building blocks. For instance, hierarchical elliptical laws, Archimedean copulas (see also the many references in Remark 3.8), and min-stable laws can be constructed based on the presented one-factor building blocks, see [65] for an overview.
For these and other families, the design, estimation, and efficient simulation of such hierarchical structures is an active area of research or even an unsolved problem.
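A minimal sketch of such a multi-factor construction with additively combined gamma frailties (all rates, sample sizes, and the crude dependence measure below are arbitrary choices): within-group pairs share the factors $A_0$ and $A_j$, while cross-group pairs share only $A_0$, so within-group dependence is stronger.

```python
import random

random.seed(3)

def sample_groups(J, n, a0=1.0, aj=1.0):
    # group j is conditionally iid exponential given its combined frailty A0 + Aj
    A0 = random.gammavariate(a0, 1.0)
    groups = []
    for _ in range(J):
        Aj = random.gammavariate(aj, 1.0)
        groups.append([random.expovariate(A0 + Aj) for _ in range(n)])
    return groups

def joint_exceed(pairs, t=0.5):
    # crude dependence measure: joint exceedance probability at level t
    return sum(a > t and b > t for a, b in pairs) / len(pairs)

N = 50000
within_pairs, across_pairs = [], []
for _ in range(N):
    g = sample_groups(2, 2)
    within_pairs.append((g[0][0], g[0][1]))  # two lifetimes from group 1
    across_pairs.append((g[0][0], g[1][0]))  # one lifetime from each group
within, across = joint_exceed(within_pairs), joint_exceed(across_pairs)
assert within > across  # hierarchical structure: stronger dependence inside groups
```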

Parameter estimation with uncertainty
The classical statistical parameter estimation problem is to estimate the (true) parameter m of a one-parametric distribution function $F_m$ from iid observations $X_1,\ldots,X_d \sim F_m$. A parameter estimate is then a function $\hat m = \hat m(X_1,\ldots,X_d)$ of the observations into the set of admissible parameters. This classical problem relies on the hypothesis that there is a "true" parameter m, from which the observations are drawn. But what if we are uncertain whether or not the observations are actually drawn from some $F_m$? The Dirichlet prior has been introduced in [30,31] with the motivation to model uncertainty about the hypothesis that observations are drawn from some $F_m$. Instead, it is assumed that they are drawn from $DP(c, F_m)$ with an uncertainty parameter $c > 0$. On a high level, this amounts to observing one sample $X = (X_1,\ldots,X_d)$, with large d, from a parametric conditionally iid model. Optimal estimates for m based on the observations can then be derived due to the convenient Dirichlet prior setting, see [30,31] for details. But this question can clearly also be posed for other conditionally iid models. Let us provide a second motivation that appears to be natural: let $X_1,\ldots,X_d$ be observed time points of company bankruptcy filings within the last 10 years. An iid assumption for $X_1,\ldots,X_d$ is well known to be inappropriate. Instead, a popular model for such time points is a Marshall-Olkin distribution, see [25]. If we assume in addition, for mathematical convenience, that $X = (X_1,\ldots,X_d)$ is conditionally iid, we know from Theorem 4.6(E) and Lemma 1.16 that the empirical distribution function of $X_1,\ldots,X_d$ is approximately equal to $1-\exp(-Z)$ for a Lévy subordinator Z. Depending on a specific parametric model for Z, it is well possible that we can estimate the parameters based on the observed empirical distribution function.
For example, if Z is a compound Poisson process with constant jump size m, then huge (small) jumps in the empirical distribution function apparently indicate a large (small) value of m. Such parameter estimation problems based on one (large) sample X = (X 1 , . . . , X d ) from a conditionally iid model appear to be very model-specific and thus possibly interesting, and the two motivating examples above indicate that one might find natural motivations for those.
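For the compound Poisson case with constant jump size m, this estimation idea can be made concrete: with $Z_t = m\,N_t$ the random distribution function $H = 1-\exp(-Z)$ puts mass $(1-e^{-m})\,e^{-m(j-1)}$ on the j-th Poisson jump time, so the fraction of observations sitting at the first jump time estimates $1-e^{-m}$. A Monte Carlo sketch (the sample size and the value of m are arbitrary choices):

```python
import math, random

random.seed(42)
m_true = 1.0   # constant jump size of the compound Poisson subordinator
d = 200000     # one large conditionally iid sample X_1, ..., X_d

# H is concentrated on the Poisson jump times T_1 < T_2 < ...; inverse transform
# sampling yields the index J_i of the jump time at which observation X_i sits:
# J_i = ceil(E_i / m) with E_i unit exponential
J = [math.ceil(random.expovariate(1.0) / m_true) for _ in range(d)]

p1 = sum(j == 1 for j in J) / d   # empirical mass at the first jump time
m_hat = -math.log(1.0 - p1)       # plug-in estimate of the jump size
assert abs(m_hat - m_true) < 0.02
```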

Quantification of diversity of possible extensions
All of the presented theorems solve Problem 1.5, but only in some cases (to wit, Example 1.6, Schoenberg's Theorem 3.3, and Theorem 3.10) the solution set $M^{**}$ is shown to coincide with the in general larger solution set $M^*$ in Problem 1.1. Is it possible at all to find a non-trivial example for a property (P) such that $M^{**} \neq M^*$?
Can one show that $M^* = M^{**}$ in the other presented solutions of Problem 1.5? To provide one concrete example, from Theorem 4.6(G) we know that $(b_0, b_1, b_2) \in M_2$ determines a three-dimensional, exchangeable wide-sense geometric law. However, this exchangeable probability distribution is only in $M^{**}$ if there exist $b_3, b_4, \ldots$ such that $\{b_k\}_{k\in\mathbb{N}_0} \in M_{\infty}$. Could it be that the last extension property fails, but the three-dimensional, exchangeable wide-sense geometric law associated with $(b_0, b_1, b_2)$ is still conditionally iid? If so, then necessarily there is some $n > 3$ and an infinite exchangeable sequence $\{X_k\}_{k\in\mathbb{N}}$ such that $(X_1, X_2, X_3)$ has the given wide-sense geometric law but $(X_1,\ldots,X_n)$ is not wide-sense geometric.
A related question concerns only elements in $M^{**}$. There might be two different infinite exchangeable sequences $\{X_k\}_{k\in\mathbb{N}}$ and $\{\tilde X_k\}_{k\in\mathbb{N}}$ such that $(X_1,\ldots,X_d) \overset{d}{=} (\tilde X_1,\ldots,\tilde X_d)$ for some $d \in \mathbb{N}$. To provide an example, related to Theorems 2.2 and 4.6, the vector $(1, b_1)$ with $b_1 \in [0,1]$ can always be extended to a sequence $\{b_k\}_{k\in\mathbb{N}}$ that is completely monotone, for example by setting $b_k = b_1^k$. In case of Theorem 4.6(G), all the different possible extensions imply different exchangeable sequences $\{X_k\}_{k\in\mathbb{N}}$ whose 2-margins follow the associated wide-sense geometric law with parameters $(1, b_1)$. But all these extensions have in common that arbitrary d-margins are always wide-sense geometric. It is unclear whether one can find $\{X_k\}_{k\in\mathbb{N}}$ with the same 2-margins, but with structurally different d-margins for $d > 2$. If structurally different extensions are possible, can one quantify how different such extensions are allowed to be? Further related questions are: Is the "$\subset$" in (37) and in Theorem 6.1 actually a "$=$"? Notice that the proof ideas in [19,78], who study such issues in the case of some static laws, might help to approach such questions.
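The complete monotonicity of the concrete extension $b_k = b_1^k$ is elementary to check: a completely monotone sequence satisfies $(-1)^j \Delta^j b \geq 0$ for every order j, and here $\Delta^j b_k = b_1^k\,(b_1-1)^j$. A small deterministic verification ($b_1 = 0.4$ and the truncation length are arbitrary choices):

```python
b1 = 0.4
b = [b1 ** k for k in range(12)]  # candidate extension of (1, b1): b_k = b1**k

def diff(seq):
    # forward difference operator: (Delta b)_k = b_{k+1} - b_k
    return [y - x for x, y in zip(seq, seq[1:])]

seq = b
for j in range(1, 8):
    seq = diff(seq)
    # complete monotonicity: (-1)**j * (Delta^j b)_k >= 0 for all k
    assert all((-1) ** j * v >= 0.0 for v in seq)
```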

Characterization of stochastic objects via multivariate probability laws
As a general rule, for an infinite exchangeable sequence $\{X_k\}_{k\in\mathbb{N}}$ defined via (4), the probability law of the random distribution function H is uniquely determined by its mixed moments $E[H_{t_1} \cdots H_{t_d}] = P(X_1 \leq t_1, \ldots, X_d \leq t_d)$, $d \in \mathbb{N}$, $t_1,\ldots,t_d \in \mathbb{R}$.
This often implies interesting analytical characterizations of the stochastic object H in terms of the multivariate distribution functions $t \mapsto P(X_1 \leq t_1,\ldots,X_d \leq t_d)$. In particular, if H is of the form $H = 1-\exp(-Z)$ like in (5), then the mixed moments above become $E\big[\exp(-\sum_{k=1}^{d} Z_{t_k})\big] = P(X_1 > t_1, \ldots, X_d > t_d)$, $d \in \mathbb{N}$, $t_1,\ldots,t_d \geq 0$, that is, the survival functions $t \mapsto P(X_1 > t_1,\ldots,X_d > t_d)$ stand in one-to-one relation with the Laplace transforms of the finite-dimensional margins of the non-decreasing process Z. This general relationship explains the close connection between conditionally iid probability laws and moment problems / Laplace transforms encountered several times in this survey. For instance, Theorem 3.6 shows that $\varphi$ is a Laplace transform if and only if $x \mapsto \varphi(\|x\|_1)$ is a survival function for all $d \geq 1$, Theorem 4.6 characterizes Lévy subordinators in terms of multivariate survival functions, and Lemma 6.5 characterizes self-decomposable Bernstein functions via multivariate survival functions. Can further characterizations be found? Is there a compelling application for such characterizations in terms of multivariate survival functions?
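The moment identity can be illustrated with a toy random distribution function $H_t = t^A$ on [0,1] with A uniform on $\{1, 2\}$ (an arbitrary choice of this sketch): then $E[H_{t_1} H_{t_2}] = \tfrac{1}{2}\big(t_1 t_2 + (t_1 t_2)^2\big)$, which a Monte Carlo draw of conditionally iid pairs reproduces:

```python
import random

random.seed(5)
t1, t2 = 0.6, 0.8
# E[H_t1 * H_t2] for the toy prior H_t = t**A with A uniform on {1, 2}
exact = 0.5 * (t1 * t2 + (t1 * t2) ** 2)

N = 200000
hits = 0
for _ in range(N):
    A = random.choice((1, 2))
    # given A, draw X_1, X_2 iid with distribution function t**A via inverse transform
    X1 = random.random() ** (1.0 / A)
    X2 = random.random() ** (1.0 / A)
    hits += X1 <= t1 and X2 <= t2
assert abs(hits / N - exact) < 0.01  # P(X_1 <= t1, X_2 <= t2) matches the mixed moment
```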