When does the minimum of a sample of an exponential family belong to an exponential family?

It is well known that if $X_1, \dots, X_n$ are i.i.d. r.v.'s taken from either the exponential distribution or the geometric one, then the distribution of $\min(X_1, \dots, X_n)$ is, with a change of parameter, also exponential or geometric, respectively. In this note we prove the following result. Let $F$ be a natural exponential family (NEF) on $\mathbb{R}$ generated by an arbitrary positive Radon measure $\mu$ (not necessarily confined to the Lebesgue or counting measures on $\mathbb{R}$). Consider $n$ i.i.d. r.v.'s $X_1, \dots, X_n$, $n \ge 2$, taken from $F$ and let $Y = \min(X_1, \dots, X_n)$. We prove that the family $G$ of distributions induced by $Y$ constitutes an NEF if and only if, up to an affine transformation, $F$ is the family of either the exponential distributions or the geometric distributions. The proof of such a result is rather intricate and probabilistic in nature.


Introduction
Both the geometric distribution supported on $\mathbb{N}_0 = \{0, 1, 2, \dots\}$ and the exponential distribution supported on $[0, \infty)$ possess similar properties. We outline only some of them:
• Like its continuous analogue (the exponential distribution), the geometric distribution is memoryless.
• If a r.v. $X$ has an exponential distribution with mean $1/\lambda$ then $\lfloor X \rfloor$, where $\lfloor x \rfloor$ denotes the floor of a real number $x$, is geometrically distributed with parameter $p = 1 - e^{-\lambda}$.
• If $X_1, \dots, X_n$ are i.i.d. r.v.'s taken from either the exponential distribution or the geometric one, then the distribution of $\min(X_1, \dots, X_n)$ is, with a change of parameter, also exponential or geometric, respectively.
• Both families of distributions belong to the class of natural exponential families (NEF's).
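The second and third properties in the list can be checked numerically. The sketch below is only an illustration, not part of the argument of the note: the function names are hypothetical, the geometric law is parametrized by its success probability $p$ on $\{0, 1, 2, \dots\}$, and the exponential law by its rate $\lambda$.

```python
import math


def geometric_pmf(k: int, p: float) -> float:
    """P(G = k) for a geometric law on {0, 1, 2, ...} with success probability p."""
    return (1.0 - p) ** k * p


def floor_exponential_pmf(k: int, lam: float) -> float:
    """P(floor(X) = k) = P(k <= X < k + 1) for X exponential with rate lam."""
    return math.exp(-lam * k) - math.exp(-lam * (k + 1))


def geometric_survival(k: int, p: float) -> float:
    """P(G >= k) for the same geometric law."""
    return (1.0 - p) ** k


def min_survival(k: int, p: float, n: int) -> float:
    """P(min(G_1, ..., G_n) >= k) for i.i.d. copies: all n variables must be >= k."""
    return geometric_survival(k, p) ** n
```

With $p = 1 - e^{-\lambda}$ the two probability functions agree term by term, and the survival function of the minimum is that of a geometric law with success probability $1 - (1 - p)^n$, which is the change of parameter mentioned in the third property.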
Indeed, the present note incorporates the last two properties in the following sense.
Let $F$ be an NEF on $\mathbb{R}$ generated by an arbitrary positive Radon measure $\mu$ (not necessarily confined to the Lebesgue or counting measures on $\mathbb{R}$). Consider $n$ i.i.d. r.v.'s $X_1, \dots, X_n$, $n \ge 2$, taken from $F$ and let $Y = \min(X_1, \dots, X_n)$. Then we prove that the family $G$ of distributions induced by $Y$ constitutes an NEF if and only if, up to an affine transformation, $F$ is the family of either the exponential distributions or the geometric distributions.
A similar, but more restrictive, problem was treated by Bar-Lev and Bshouty (2008), who considered the case where $\mu$ has the form $\mu(dx) = h(x)\,dx$.
Under some restrictive conditions on $h$ (such as differentiability) they showed that the family of distributions induced by $Y$ is an NEF if and only if the distribution of the $X_i$'s is an exponential one (up to an affine map $x \mapsto ax + b$). In their concluding remarks, Bar-Lev and Bshouty (2008) indicated the mathematical difficulties arising in proving that, when $\mu$ is a counting measure on $\mathbb{N}_0$, the family $G$ is an NEF if and only if $F$ is the family of geometric distributions. It should be noted, however, that for the restricted case $\mu(dx) = h(x)\,dx$, Bar-Lev and Bshouty (2008) treated the question of when $G_r$, the family of distributions induced by the $r$-th order statistic $X_{(r)}$ (out of $X_1, \dots, X_n$), is an NEF. They showed that necessarily $r = 1$, in which case the NEF $F$ must be that of the exponential distributions.
As already indicated, we consider here the case $r = 1$ and prove in Theorem 3.1 a more general result for an arbitrary measure $\mu$ (which includes the Lebesgue measure and counting measure as special cases).
In Section 2 we introduce some required preliminaries on NEF's. In Section 3 we present and prove our main result. The style of the result and the methods of the proof are close to the celebrated Balkema-de Haan-Pickands theorem on extreme values (see [1] and [5]).

Some preliminaries on NEF's
For proving our main result we shall need the definition of an NEF (for a detailed description of NEF's on R see Letac and Mora, 1990).
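Let $\mu$ be a positive non-Dirac Radon measure on $\mathbb{R}$. The Laplace transform of $\mu$ is
$$L_\mu(\theta) = \int_{\mathbb{R}} e^{\theta x}\,\mu(dx) \le \infty.$$
Let $\Theta(\mu)$ be the interior of the set $\{\theta \in \mathbb{R} : L_\mu(\theta) < \infty\}$ and denote $k_\mu(\theta) = \log L_\mu(\theta)$, $\theta \in \Theta(\mu)$. Also, let $M(\mathbb{R})$ denote the set of positive measures $\mu$ on $\mathbb{R}$ not concentrated on one point such that $\Theta(\mu) \neq \emptyset$. Then, the family of probabilities
$$F = F(\mu) = \{P(\theta, \mu)(dx) = e^{\theta x - k_\mu(\theta)}\,\mu(dx) : \theta \in \Theta(\mu)\}$$
is called the NEF generated by $\mu$.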
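The two special cases of the geometric and exponential families have the following NEF features:
• Geometric: $\mu = \sum_{k=0}^{\infty} \delta_k$, where $\delta_k$ is the Dirac mass on $k$. In this case $\Theta(\mu) = (-\infty, 0)$, $L_\mu(\theta) = 1/(1 - q)$ and $P(\theta, \mu) = \sum_{k=0}^{\infty} (1 - q)\,q^k\,\delta_k$, where $q = e^{\theta} < 1$. Let $X_1, \dots, X_n$ be i.i.d. r.v.'s with common geometric distribution with parameter $q$; then the distribution of $Y = \min(X_1, \dots, X_n)$ is geometric with parameter $q^n$, or, in its NEF form, $P(n\theta, \mu)$, that is, $\theta \mapsto n\theta$.
• Exponential: $\mu(dx) = 1_{(0,\infty)}(x)\,dx$. In this case $\Theta(\mu) = (-\infty, 0)$, $L_\mu(\theta) = -1/\theta$ and $P(\theta, \mu)(dx) = -\theta\,e^{\theta x}\,1_{(0,\infty)}(x)\,dx$. If $X_1, \dots, X_n$ are i.i.d. r.v.'s with this common exponential distribution, then $Y = \min(X_1, \dots, X_n)$ has distribution $P(n\theta, \mu)$, that is, again $\theta \mapsto n\theta$.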
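The geometric case can be verified numerically in its NEF form. The following is a minimal sketch, assuming that $\mu$ is the counting measure on $\{0, 1, 2, \dots\}$, so that $k_\mu(\theta) = -\log(1 - e^{\theta})$ for $\theta < 0$; the function names and the truncation length `terms` are illustrative choices, not taken from the note.

```python
import math


def cumulant(theta: float) -> float:
    """k_mu(theta) = log L_mu(theta) for mu = counting measure on {0, 1, 2, ...}."""
    if theta >= 0:
        raise ValueError("theta must be negative for the Laplace transform to be finite")
    return -math.log1p(-math.exp(theta))


def nef_pmf(k: int, theta: float) -> float:
    """Mass of P(theta, mu) at k, namely exp(theta * k - k_mu(theta))."""
    return math.exp(theta * k - cumulant(theta))


def survival(k: int, theta: float, terms: int = 500) -> float:
    """P(X >= k) under P(theta, mu), computed by a truncated sum of the masses."""
    return sum(nef_pmf(j, theta) for j in range(k, k + terms))


def min_pmf(k: int, theta: float, n: int) -> float:
    """P(min(X_1, ..., X_n) = k) = P(X >= k)^n - P(X >= k + 1)^n for i.i.d. X_i."""
    return survival(k, theta) ** n - survival(k + 1, theta) ** n
```

For a fixed $\theta < 0$ and sample size $n$, `min_pmf(k, theta, n)` should agree with `nef_pmf(k, n * theta)`, which is the reparametrization $\theta \mapsto n\theta$ within the same NEF.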

The main result
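Theorem 3.1. Let $\mu \in M(\mathbb{R})$ and let $n \ge 2$ be an integer. Let $X_1, \dots, X_n$ be i.i.d. r.v.'s with common distribution $P(\theta, \mu)$ and denote by $Q_\theta$ the distribution of $Y = \min(X_1, \dots, X_n)$.
Then there exist a measure $\nu \in M(\mathbb{R})$, an NEF $F(\nu)$ and a differentiable mapping $\theta \mapsto \alpha(\theta)$ from $\Theta(\mu)$ to $\Theta(\nu)$ such that $Q_\theta = P(\alpha(\theta), \nu)$ for all $\theta \in \Theta(\mu)$ if and only if $F(\mu)$ is a positive affine transformation of either the NEF of geometric distributions or the NEF of exponential distributions.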
Proof. The statement ⇐ is simple, as can be seen from the remarks at the end of Section 2. Indeed, with the choices of $\mu$ made there, we have for both the geometric and exponential cases that $\mu = \nu$ and $\alpha(\theta) = n\theta$. We prove the statement ⇒ in six steps. In the first step we derive the functional equation (3.3), which provides a necessary condition for $Q_\theta \sim Y = \min(X_1, \dots, X_n)$ to belong to some NEF $F(\nu)$. The second step proves that the support of $\mu$ is bounded on the left, while the third step shows that this support is unbounded on the right. The fourth step further analyzes the functional equation (3.3) and provides an important equation (3.7) associated with the measure $\mu$. More specifically, the problem is then reduced to the case where the support interval (i.e., the convex hull of the support) of $\mu$ is exactly $[0, \infty)$. If we denote by $\mu_x$ the translate of $\mu(dt)$ by $t \mapsto t - x$, truncated at zero, the equality (3.7) is $k'_{\mu_x} = k'_{\mu}$ for $\mu$-almost all $x$. This equality reduces the characterization problem to the question of whether $\mu$ possesses at least one atom or not. If $\mu$ has at least one atom, then the fifth step proves that $\mu$ generates the geometric NEF. Otherwise, the sixth step shows that $\mu$ generates the exponential NEF. These six steps conclude the proof.
First step. This step is devoted to the setting of the functional equation (3.3) below.
For simplicity, we write $k = k_\mu$, $\Theta = \Theta(\mu)$ and so on. In the sequel we write […]

If the law of $Y$ belongs to an NEF $F(\nu)$ then, for $\theta \in \Theta$ and real $y$, the number $P(Y \ge y)$ can be represented in two different ways, by which one gets the following equality:
$$\left(\int_{[y,\infty)} e^{\theta t - k(\theta)}\,\mu(dt)\right)^{n} = \int_{[y,\infty)} e^{\alpha(\theta) t - k_\nu(\alpha(\theta))}\,\nu(dt).$$
ECP 21 (2016), paper 6.
Hence the following equality, between two probability measures, holds: […] This proves that the measures $\nu$ and $\mu$ are equivalent and we can introduce the Radon-Nikodym derivative $g(y) = \frac{d\nu}{d\mu}(y)$. Hence the following equality holds $\mu$-almost everywhere: […] By denoting $g_n(y) = (g(y)/n)^{1/(n-1)}$ and […], and raising to the power $1/(n-1)$, we get the following equality, which holds $\mu$-almost everywhere: […]

Assume, without loss of generality, that $\mu$ and $\nu$ are probability measures. Then, the Hölder inequality, applied to the pair of functions $(g, 1)$ and to $(p, q) = (n - 1,$ […]

Now, by differentiating both sides of the latter equality with respect to $\theta$, we obtain […] Since the latter equality holds for all $x$, it follows that for each fixed $\theta \in \Theta$, […] which holds $\mu(dx)$-almost everywhere. The equality (3.3) holds in particular for any element $x$ of the support $S$ of the measure $\mu$. To prove this statement, we denote by $H(x)$ the left hand side of (3.3). Then locally, $H$ has bounded variation (i.e., it is the difference of two non-increasing functions) and its discontinuity points are the atoms of $\mu$. Therefore $H(x) = 0$ if $x$ is an atom of $\mu$. If $x \in S$ is not an atom of $\mu$ then there exists a sequence $(x_k)$ such that $H(x_k) = 0$ for all $k$ and such that $x_k \to x$. Since $H$ is continuous at $x$ it follows that $H(x) = 0$ for all $x \in S$.

Second step. We prove that the support of $\mu$ is bounded on the left. If not, the equality (3.3) holds for some fixed $\theta \in \Theta$ and for some sequence $(x_k)$ such that $\lim_{k \to \infty} x_k = -\infty$. This implies that $A(\theta) = 0$ and $B(\theta) = k'(\theta)$. But then clearly the equality $\int_{x_k^-}^{\infty} e^{\theta t}(t - k'(\theta))\,\mu(dt) = 0$ cannot hold for all $k$.
Indeed, if $k_0$ is such that $x_{k_0} \le k'(\theta)$ then such an equality would imply that for any $k > k_0$ […] while the right hand side is negative for $k$ large enough.
Third step. This step proves that the support of $\mu$ is unbounded on the right. It relies on the following lemma, which has its own interest with its characterisation of the distribution $B(1, a)$ up to a dilation by $b$:

Lemma 1. Let $P$ be a non-Dirac probability on $[0, \infty)$ and $K > 0$ such that for $P$-almost all $x$ we have
$$\int_{0^-}^{x^+} t\,P(dt) = K x \int_{0^-}^{x^+} P(dt). \qquad (3.4)$$
[…]

Proof. If $K > 1$ then for at least one $x > 0$ we have […] which is a contradiction. If $K = 1$ then $0 = \int_{0^-}^{x^+} (t - x)\,P(dt)$ for $P$-almost all $x$. This implies that $t - x = 0$ for $P(dt)P(dx)$-almost all $(t, x)$, which is possible only if $P$ is a Dirac measure, a contradiction. Hence $K < 1$. The probability measure $P$ has no atom at $t_0 > 0$, since (3.4) implies $t_0 P(\{t_0\}) = K t_0 P(\{t_0\})$, which contradicts $K < 1$. Similarly, $P$ has no atom at zero. If not, since for at least one $x > 0$ one has […] we get a contradiction.
The support $S$ of $P$ contains 0. If not, there exists $b$ in $S$ such that $P([0, b)) = 0$. Since $P$ is not Dirac there exists a sequence $x_n \downarrow b$ such that […] Now consider the conditional probability $P_n$, which is $P(dt)$ conditioned on $b < t < x_n$. Then $P_n$ converges weakly to $\delta_b$ (the simplest way to prove this is to use the distribution function of $P_n$). Since $\int_b^{x_n} t\,P_n(dt) = K x_n$, by passing to the limit we get $b = Kb$, that is, the contradiction $K = 1$.
The support $S$ of $P$ is an interval containing zero. If not, and since $0 \in S$, there exist $0 < x_1 < x_2$ such that $P((x_1, x_2)) = 0$, $x_1, x_2 \in S$ and $\int_0^{x_1} P(dt) > 0$. Hence, from (3.4), we get the following contradiction […] where the equalities (a) and (c) stem from (3.4) and the fact that $x_1$ and $x_2$ are in $S$. The equalities (b) and (d) come from the fact that $P((x_1, x_2)) = 0$.

Now, since $P$ has no atoms, the function […] is continuous. Furthermore, $f$ is zero $P$-almost everywhere. This implies that $f$ is zero on the support $S$ of $P$. If not, there exists $x_0 \in S$ such that $|f(x_0)| > 0$ and an open interval […] Differentiating this equality (in the Stieltjes sense) we get (on $S_0$) that […], with $a = K/(1 - K)$. This shows that $P(dx) = g(x)\,dx$ is absolutely continuous. In fact, from $x g(x) = a \int_0^x g(t)\,dt$, it follows that the function $g$ is continuous and even differentiable on $S_0$. This leads to the differential equation $g'(x)/g(x) = (a - 1)/x$ on $S_0$ and $g(x) = C x^{a-1}$, where $C > 0$. If $S$ is unbounded then $g$ cannot be a probability density. Therefore $S = [0, b]$ is bounded and the lemma is proved.
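The conclusion of the proof of Lemma 1 can be checked numerically. The sketch below rests on an assumption: it reads (3.4) as $\int_0^x t\,P(dt) = K x \int_0^x P(dt)$, which is what the surrounding proof suggests, and it takes the normalized density $g(x) = a x^{a-1} b^{-a}$ on $[0, b]$. The function names and the midpoint quadrature are illustrative choices, not taken from the note.

```python
def power_density(x: float, a: float, b: float) -> float:
    """Density g(x) = a * x**(a - 1) / b**a on [0, b] (it integrates to 1 on [0, b])."""
    return a * x ** (a - 1) / b ** a


def midpoint_integral(f, lo: float, hi: float, steps: int = 20000) -> float:
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


def lemma_ratio(x: float, a: float, b: float) -> float:
    """Return (int_0^x t g(t) dt) / (x * int_0^x g(t) dt), the candidate constant K."""
    numerator = midpoint_integral(lambda t: t * power_density(t, a, b), 0.0, x)
    denominator = x * midpoint_integral(lambda t: power_density(t, a, b), 0.0, x)
    return numerator / denominator
```

Under this reading the ratio does not depend on $x$ and equals $a/(a+1)$, which is the same relation as $a = K/(1 - K)$.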
We now prove the claim of Step 3 that the support of $\mu$ is unbounded on the right. If not, from Step 2, we may assume without loss of generality that the support interval of $\mu$ is exactly $[0, b]$ […] (3.6)

Fix $\theta$, consider the change of variable $t \mapsto b - t$ and apply Lemma 1 to the image $P(dt)$ of the probability $e^{\theta t - k(\theta)}\mu(dt)$ and to $A = k'(\theta)/b$. Then, it follows that the $a$ of Lemma 1 is $a(\theta) = k'(\theta)/(b - k'(\theta))$. Since the support interval of $P$ is also $[0, b]$ we can claim that […] an equality which cannot hold for all $\theta$. One may realize this as follows. Since […] where $c(\theta) = k(\theta) + \log a(\theta)$, we have, by differentiating with respect to $\theta$, that for all $(\theta, t) \in \Theta \times (0, b)$, […] Then, differentiating with respect to $t$, we get $b - t = a'(\theta)$, which is clearly impossible.

Fourth step. […] which shows that $B = k'$. By the definition of $B$ in (3.1), this implies that $k(\theta) - k_\nu(\alpha(\theta))$ is a constant. We denote by $\mu_x(du)$ the image of the measure $\mu$ by the map $t \mapsto u = t - x$ multiplied by the function $1_{[0,\infty)}(u)$. The equality (3.3) can then be reformulated as
$$k'_{\mu_x} = k'_{\mu} \qquad (3.7)$$
for $\mu(dx)$-almost every $x$. We now analyze (3.7) according to whether $\mu$ has at least one atom (Fifth Step), an assumption that will lead to the geometric NEF, or not (Sixth Step), which will lead to the exponential NEF.
Fifth step. Assume that $\mu$ has an atom $x_0$. We prove that there exist a countable additive subgroup $G$ of $\mathbb{R}$ and a real character $\chi$ of $G$ such that […] where $\mu(x)$ denotes the mass of the atom $x$.
This assumption implies that (3.7) is true for $x = x_0$ and thus that $\mu$ has an atom at 0 (and so do all the measures $\mu_x$ for which (3.7) is true). This implies that $\mu$ is purely atomic. Denote by $S$ the set of atoms of $\mu$. From (3.7) we infer that for all $x \in S$ we have […] Calculating the mass of this measure on $s \in S$ we get […] For $x \in S$, denote $\chi(x) = \log \mu(x) - \log \mu(0)$ and for $x \in -S$ denote $\chi(x) = -\chi(-x)$. Then the latter equality implies […], that is, $\chi$ is a real character of $G$.

We now prove that $G$ is $a\mathbb{Z}$ for some $a > 0$. If not, then $G$ is dense in $\mathbb{R}$. Then, either any pair $(x, x')$ of $G \setminus \{0\}$ is such that $x/x'$ is rational, or there exists a pair such that $x/x'$ is irrational. Without loss of generality, we may assume in both cases that $1 \in G$. In the first case (where $x/x'$ is rational) there exist arbitrarily small rational numbers $x \in G$ such that $\chi(x) = x\chi(1)$. Thus, for $A > 0$, the family $\{e^{\chi(x)} : x \in G \cap [0, A]\}$ cannot be summable and $\mu$ is not a Radon measure. Similarly, for the second case ($x/x'$ is irrational), $G$ contains a subgroup $\mathbb{Z}(\alpha)$ for some irrational number $\alpha$ (where $\mathbb{Z}(\alpha)$ is the set of $a + b\alpha$ with $a, b$ in $\mathbb{Z}$). By denoting $p_1 = e^{\chi(1)}$ and $p_2 = e^{\chi(\alpha)}$ we obtain that $p_1^a p_2^b = e^{\chi(a + b\alpha)}$. We now need to prove that […] This can be accomplished by a tedious discussion and analysis of the nine cases $0 < p_1 < 1$, $p_1 = 1$ and $p_1 > 1$ combined with $0 < p_2 < 1$, $p_2 = 1$ and $p_2 > 1$ (we omit details for brevity). This, however, would finally show that $\mu$ cannot be a Radon measure.
Thus we conclude the case where $\mu$ has at least one atom by stating that for this case there exist $a > 0$ and numbers $p = e^{\chi(a)} > 0$ and $q = \mu(0)$ such that
$$\mu(dt) = \sum_{n=0}^{\infty} q\,p^n\,\delta_{na}(dt).$$
This is equivalent to saying that $F(\mu)$ is the image of the geometric distributions by the dilation $n \mapsto an$.
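The conclusion of the fifth step can be illustrated numerically. The sketch below assumes the atomic measure with mass $q p^n$ at the point $na$, $n = 0, 1, 2, \dots$, as obtained at the end of the fifth step; the function names, the parameter name `step` (standing for $a$) and the truncation length `terms` are hypothetical. It checks two facts: translating by an atom $x = ja$ and truncating at zero multiplies the measure by $p^j$, and the NEF generated by the measure is geometric on the lattice $a\mathbb{N}_0$ with ratio $r = p e^{\theta a}$.

```python
import math


def atomic_mass(n: int, q: float, p: float) -> float:
    """Mass of mu at the atom n * a, namely q * p**n."""
    return q * p ** n


def shifted_mass(n: int, j: int, q: float, p: float) -> float:
    """Mass at n * a of mu_x for x = j * a: translate by -x, keep the part on [0, inf)."""
    return atomic_mass(n + j, q, p)


def nef_mass(n: int, theta: float, step: float, q: float, p: float,
             terms: int = 2000) -> float:
    """Mass of P(theta, mu) at n * step, with L_mu(theta) computed by a truncated sum."""
    r = p * math.exp(theta * step)
    if not 0.0 < r < 1.0:
        raise ValueError("theta is outside the domain where L_mu(theta) is finite")
    laplace = sum(q * r ** m for m in range(terms))
    return q * r ** n / laplace
```

The first check corresponds to $\mu_x = e^{\chi(x)}\mu$ with $\chi(ja) = j \log p$, and the second to the statement that $F(\mu)$ is the image of the geometric family by the dilation $n \mapsto an$.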
Sixth step. We assume that $\mu$ has no atoms. Denote by $X \subset [0, \infty)$ the set of $x$ such that (3.7) holds. We prove that the closure $\overline{X}$ of $X$ is the support $S$ of $\mu$. To see that $S \subset \overline{X}$, we choose $x_0 \in S$. If there is no sequence $(x_n)$ of $X$ converging to $x_0$, this would imply the existence of $\varepsilon > 0$ such that $\mu([x_0 - \varepsilon, x_0 + \varepsilon]) = 0$ and thus contradict the fact that $x_0 \in S$. To see that $X \subset S$ (and hence $\overline{X} \subset S$, since $S$ is closed) we choose $x_0 \in X$. If $x_0 \notin S$ then this would imply the existence of $\varepsilon > 0$ such that $\mu([x_0 - \varepsilon, x_0 + \varepsilon]) = 0$. Since $0 \in S$, the measure $\mu_{x_0}$ cannot be equivalent to $\mu$. Thus, the statement that $S = \overline{X}$ is proved.

Now, the fact that $\mu$ has no atoms implies that $x \mapsto \mu_x$ is a continuous function on $\mathbb{R}$ for the vague topology of Radon measures. The equality (3.7) is thus equivalent to the existence of a function $\chi$ on $X$ such that
$$\mu_x(dt) = e^{\chi(x)}\mu(dt), \qquad (3.9)$$
and the preceding remark implies that $\chi$ is a continuous function on $X$ that extends to a continuous function on $\overline{X}$. Thus (3.7) and (3.8) hold on $S$. Now we observe that (3.7) implies that for all $x \in S$ we have $S = (S - x) \cap [0, \infty)$. Thus $G = S \cup (-S)$ is an additive subgroup of $\mathbb{R}$. Since $G$ is closed, either $G = \{0\}$, or there exists $a > 0$ such that $G = a\mathbb{Z}$, or $G = \mathbb{R}$. The first two cases can be excluded since $\mu$ has no atoms, and thus we get $S = [0, \infty)$.

We now show that $\chi(x + s) = \chi(x) + \chi(s)$ for all $x \ge 0$ and $s \ge 0$. For this we observe that (3.7) implies that for all $x \ge 0$ the measure $\mu_x$ generates the NEF $F(\mu)$. Thus $\mu_x$ must share with $\mu$ the property (3.7), and for $s \ge 0$ we therefore have $\mu_{x+s}(dt) = e^{\chi(x)}\mu_s(dt) = e^{\chi(x) + \chi(s)}\mu(dt)$.
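As $\chi$ is continuous, it is simple to see that there exists $b \in \mathbb{R}$ such that $\chi(x) = bx$. One can consult Bingham, Teugels and Goldie for reference to this Cauchy functional equation.
Writing $\tilde{\mu}(dt) = e^{-bt}\mu(dt)$, this implies that for all intervals $I \subset [0, \infty)$ and all $x \ge 0$, we have $\tilde{\mu}(x + I) = \tilde{\mu}(I)$. Thus $\tilde{\mu}$ is proportional to the restriction of the Lebesgue measure to $[0, \infty)$, so that $\mu$ generates the NEF of exponential distributions, and the theorem is proved.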