CONDITIONAL MOMENT REPRESENTATIONS FOR DEPENDENT RANDOM VARIABLES

Abstract: The question considered in this paper is which sequences of p-integrable random variables can be represented as conditional expectations of a fixed random variable with respect to a given sequence of σ-fields. For finite families of σ-fields, an explicit inequality equivalent to solvability is stated; sufficient conditions are given for finite and infinite families of σ-fields, and explicit expansions are presented.


Introduction
We analyze which sequences of random variables {X_j} can be represented as conditional expectations

E(Z|F_j) = X_j, j = 1, 2, . . .   (1)

of a p-integrable random variable Z with respect to a given sequence (F_j) of σ-fields. Martingale theory answers this question for increasing families of σ-fields (F_j). We are interested in other cases, which include σ-fields generated by independent, or Markov dependent (see [3]), random variables. In particular, given a random sequence (ξ_j) and p-integrable random variables X_j = f_j(ξ_j), we analyze when there exists Z ∈ L_p such that

E(Z|ξ_j) = X_j, j = 1, 2, . . .   (2)

This is motivated by our previous results for independent random variables and by the alternating conditional expectations (ACE) algorithm of Breiman & Friedman [4]. In [4] the authors are interested in the L_2-best additive prediction Z of a random variable Y based on a finite number of predictor variables ξ_1, . . . , ξ_d. The solution (ACE) is based on the fact that the best additive predictor Z = φ_1(ξ_1) + . . . + φ_d(ξ_d) satisfies the conditional moment constraints (2). Relation (1) defines an inverse problem and shares many characteristics of other inverse problems, cf. Groetsch [9]. Accordingly, our methods partially rely on (non-constructive) functional analysis. We give sufficient conditions for the solvability of (1) in terms of maximal correlations. We also show that (2) has a solution for finite d < ∞ if the joint density of ξ_1, . . . , ξ_d with respect to the product of marginals is bounded away from zero and EX_i = EX_j.
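To make the alternating-conditional-expectations idea concrete, here is a minimal numerical sketch (ours, not the algorithm of [4] verbatim; the toy model and helper names are hypothetical). For two dependent discrete variables it alternately enforces each constraint E(Z|ξ_j) = X_j by adjusting the ξ_j-measurable summand of an additive Z; each update enforces one constraint exactly while perturbing the other, and the iteration converges geometrically when the maximal correlation of the pair is less than one.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical toy model: xi2 depends on xi1, both with values in {0, 1, 2}.
    n = 50_000
    xi1 = rng.integers(0, 3, size=n)
    xi2 = (xi1 + rng.integers(0, 2, size=n)) % 3  # dependent; maximal correlation < 1

    def cond_mean(values, labels, k=3):
        """Empirical E(values | labels = i), i = 0..k-1, mapped back onto the sample."""
        means = np.array([values[labels == i].mean() for i in range(k)])
        return means[labels]

    # Centered targets X_j = f_j(xi_j); equal (zero) means keep them consistent.
    X1 = (xi1 == 0).astype(float); X1 -= X1.mean()
    X2 = (xi2 == 0).astype(float); X2 -= X2.mean()

    phi1 = np.zeros(n)  # the sigma(xi1)-measurable summand of Z
    phi2 = np.zeros(n)  # the sigma(xi2)-measurable summand of Z
    for _ in range(200):
        phi1 += X1 - cond_mean(phi1 + phi2, xi1)  # now E(Z | xi1) = X1 exactly
        phi2 += X2 - cond_mean(phi1 + phi2, xi2)  # now E(Z | xi2) = X2 exactly

    Z = phi1 + phi2  # an additive solution, up to sampling error
    print(np.abs(cond_mean(Z, xi1) - X1).max())  # ~ 0
    print(np.abs(cond_mean(Z, xi2) - X2).max())  # ~ 0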
We are interested in both finite and infinite sequences, extending our previous results in [5, 6]. In this paper we concentrate on the p-integrable case with 1 < p < ∞. The extreme cases p = 1 and p = ∞ seem to require different assumptions. For infinite sequences of independent random variables all three cases 1 < p < ∞, p = 1, and p = ∞ are completely solved in [6]. For finite sequences of dependent σ-fields, Kellerer [10] and Strassen [16] can be quoted in connection with the conditional expectations problem in the bounded case (p = ∞). For pairs of σ-fields the case 1 < p < ∞ is solved in [5].

Notation and results
For 2 ≤ d ≤ ∞, let {F_j}_{1≤j≤d} be a given family of σ-fields. By L_p^0(F) we denote the Banach space of all p-integrable F-measurable centered random variables, 1 ≤ p ≤ ∞. By E_j we denote the conditional expectation with respect to F_j. For d < ∞, by ⊕_{j=1}^d L_p(F_j) we denote the set of all sums X_1 + . . . + X_d with X_j ∈ L_p(F_j). We shall analyze the following problems.
• For all consistent X_j ∈ L_p, find Z ∈ L_p satisfying (1) and such that E|Z|^p is minimal.   (3)
• For all consistent X_j ∈ L_p, find an additive Z ∈ L_p satisfying (1); additive means that

Z = Σ_{j=1}^d Z_j with Z_j ∈ L_p(F_j)   (4)

(for d = ∞ the series in (4) is assumed to converge absolutely in L_p). The above statements do not spell out the consistency conditions, which will be made explicit in the theorems.
Remark 2.1 If (1) can be solved, then there exists a minimal solution Z. This can be easily recovered from the Komlós law of large numbers [11].

Maximal correlations
Maximal correlation coefficients play a prominent role below; for another use see also [4, Section 5]. A maximal correlation coefficient ρ̃(F, G) is defined in [5]. Notice that ρ̃(F, G) = 0 for independent F, G, but also for increasing σ-fields F ⊂ G. If the intersection F ∩ G is trivial, ρ̃ coincides with the usual maximal correlation coefficient, defined in general by

ρ(F, G) = sup{ corr(X, Y) : X ∈ L_2(F), Y ∈ L_2(G) }.

Given d ≤ ∞, σ-fields {F_j}_{j≤d}, and a finite subset T ⊂ I := {1, 2, . . . , d}, put F_T := σ(F_j : j ∈ T). Define the pairwise maximal correlation r := sup_{i≠j} ρ̃(F_i, F_j) and the global maximal correlation R := sup_T ρ̃(F_T, F_{I\T}). For p = 2 a version R̃ of R based on additive random variables will also play a role. Clearly, r ≤ R̃ ≤ R. All three coefficients coincide in the two σ-fields case d = 2. One can easily see that R̃ = 0 and R = 1 can happen already for d = 3 (e.g., for independent symmetric ±1 variables ξ_1, ξ_2 and ξ_3 = ξ_1 ξ_2, which are pairwise independent while ξ_3 is a function of (ξ_1, ξ_2)).
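For finitely-valued random variables the usual maximal correlation can be computed exactly: it equals the second-largest singular value of the matrix Q[i, j] = p[i, j]/√(p[i, ·] p[·, j]) built from the joint probability mass function (a classical spectral fact; the sketch below and its helper name are ours, not from the paper).

    import numpy as np

    def maximal_correlation(joint):
        """Maximal correlation of a discrete pair from its joint pmf matrix.

        rho equals the second-largest singular value of
        Q[i, j] = p[i, j] / sqrt(p[i, .] * p[., j]); the largest singular
        value is always 1 and corresponds to the constant functions.
        """
        p = np.asarray(joint, dtype=float)
        p = p / p.sum()
        row, col = p.sum(axis=1), p.sum(axis=0)
        q = p / np.sqrt(np.outer(row, col))
        return np.linalg.svd(q, compute_uv=False)[1]

    print(maximal_correlation(np.outer([0.5, 0.5], [0.3, 0.7])))  # independent pair: 0.0
    print(maximal_correlation(np.diag([0.5, 0.5])))               # functional dependence: 1.0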

Main results
In Section 2.4 we present a complete solution of (1) for the two σ-fields case. For general families of σ-fields, there seems to be little hope of getting existence and uniqueness results as precise as those for d = 2. As Logan & Shepp [12] point out, complications arise even in relatively simple situations. One source of difficulties is the possibility of linear dependence between vectors in L_p^0(F_j). Suitable assumptions on maximal correlation coefficients exclude this possibility.
The following result extends [6, Corollary 1] to infinite sequences of dependent σ-fields.
Theorem 2.1 (i) Fix 1 < p < ∞ and suppose R < 1. Then equation (1) is solvable for Z for all X_j ∈ L_p^0(F_j) with E(Σ_j X_j^2)^{p/2} < ∞, and the solution is unique.
(ii) If R̃ < 1, then for all X_j ∈ L_2^0(F_j) such that Σ_j EX_j^2 < ∞ there exists a unique additive solution Z to (1), and it satisfies a sharp moment estimate.

If one is not interested in sharp moment estimates for Z and only finite families d < ∞ are of interest, then one can iterate Theorem 2.11 for a pair of σ-fields, relaxing the assumption that R < 1. By Lemma 3.2, iterating Theorem 2.11 yields the following.
Corollary 2.2 If ρ̃(σ(F_1, . . . , F_{j−1}), F_j) < 1 for all 2 ≤ j ≤ d < ∞ and 1 < p < ∞, then equation (1) has an additive solution Z for all X_j ∈ L_p^0(F_j).
The following criterion for solvability of the additive version of (1) uses the pairwise maximal correlation r and gives an explicit alternative to ACE. For d = 2 the assumptions are close to [5], except that we assume p = 2 and (implicitly) linear independence.

Theorem 2.3 If r < 1/(d − 1), then for all X_j ∈ L_2^0(F_j) there exists a unique additive Z such that (1) and (4) hold. Moreover, the solution is given by the explicit series expansion (7) (with the convention Σ_{j∈∅} X_j = 0).

Results in [4, Proposition 5.2] can be recovered from maximal correlation methods as follows. For finite families of σ-fields, Lemma 3.2 states inequality (12), which is equivalent to the solvability of (1). This inequality is verified in Lemma 3.5 under assumptions motivated by Breiman & Friedman [4].
Corollary 2.4 If the spaces L_2^0(F_1), . . . , L_2^0(F_d) are linearly independent and for all 1 ≤ j, k ≤ d, k ≠ j, the operators E_k : L_2^0(F_j) → L_2^0(F_k) are compact, then for all square-integrable X_1, . . . , X_d with equal means there exists a unique additive solution Z to (1).
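When the ξ_j take finitely many values, the additive problem can also be attacked head-on: the constraints E(φ_1(ξ_1) + . . . + φ_d(ξ_d) | ξ_j = v) = X_j(v) form a finite linear system in the tables φ_1, . . . , φ_d. The sketch below (our illustration; the toy model and all names are hypothetical) assembles this system from an empirical sample and solves it by least squares; the targets are generated from a known additive Z_0, so the system is consistent by construction.

    import numpy as np

    rng = np.random.default_rng(1)
    d, k, n = 3, 4, 100_000

    # Hypothetical dependent discrete sample: d coordinates with values in {0, ..., k-1}.
    base = rng.integers(0, k, size=n)
    xi = np.stack([(base + rng.integers(0, 2, size=n)) % k for _ in range(d)])

    # Consistent targets: X_j = E(Z0 | xi_j) for a known additive Z0.
    psi = rng.normal(size=(d, k))
    psi -= psi.mean(axis=1, keepdims=True)
    Z0 = sum(psi[i][xi[i]] for i in range(d))
    X = np.array([[Z0[xi[j] == v].mean() for v in range(k)] for j in range(d)])

    # Row (j, v) of the system:  sum_{i, w} P(xi_i = w | xi_j = v) phi_i(w) = X[j, v].
    A = np.zeros((d * k, d * k))
    for j in range(d):
        for v in range(k):
            sel = xi[j] == v
            for i in range(d):
                for w in range(k):
                    A[j * k + v, i * k + w] = np.mean(xi[i][sel] == w)
    theta, *_ = np.linalg.lstsq(A, X.ravel(), rcond=None)
    phi = theta.reshape(d, k)

    Z = sum(phi[i][xi[i]] for i in range(d))
    resid = max(abs(Z[xi[j] == v].mean() - X[j, v]) for j in range(d) for v in range(k))
    print(resid)  # ~ 0: the additive Z reproduces all conditional moments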

Conditioning with respect to random variables
We now state sufficient conditions for the solvability of (1) in terms of joint distributions, for finite families d < ∞ of σ-fields generated by random variables, F_j = σ(ξ_j). We begin with the density criterion, which gives an explicit estimate for R and was motivated by [15]. By Lemma 3.2, it implies that (1) has a unique additive solution Z for all 1 < p < ∞. Although it applies both to discrete and continuous distributions (typically, the density in the statement is with respect to the product of marginals), it is clear that the result is far from optimal.

Theorem 2.5 Suppose there is a product probability measure µ = µ_1 ⊗ . . . ⊗ µ_d such that the distribution of ξ_1, . . . , ξ_d on ℝ^d is absolutely continuous with respect to µ and its density f is bounded away from zero and infinity, 0 < c ≤ f ≤ C < ∞. Then R < 1.

Next we give sufficient conditions in terms of bivariate densities only. The result follows from [14, page 106, Exercise 15] and Corollary 2.4 and is stated for completeness only.

Proposition 2.6 ([4]) Suppose d < ∞ and for every pair i ≠ j the density f_{i,j} of the distribution of (ξ_i, ξ_j) with respect to the product measure µ_{i,j} = µ_i ⊗ µ_j of the marginals exists and is square integrable. If the vector spaces L_2^0(F_j) are linearly independent, then R̃ < 1. In particular, (1) has a unique additive solution Z for all square-integrable X_1, . . . , X_d with equal means.
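For a discrete pair one can see the mechanism behind Proposition 2.6 directly (our sketch, with hypothetical numbers): the squared singular values of the matrix Q[i, j] = p[i, j]/√(p[i, ·] p[·, j]) sum to ∫ f_{i,j}^2 dµ_{i,j}, so a square-integrable bivariate density makes the conditional expectation operator Hilbert–Schmidt, hence compact, and bounds the maximal correlation by √(∫ f_{i,j}^2 dµ_{i,j} − 1).

    import numpy as np

    rng = np.random.default_rng(4)
    p = rng.uniform(0.5, 1.5, size=(3, 3))
    p /= p.sum()                                  # a generic joint pmf
    row, col = p.sum(axis=1), p.sum(axis=0)

    f = p / np.outer(row, col)                    # bivariate density w.r.t. the marginals
    hs_sq = np.sum(f**2 * np.outer(row, col))     # integral of f^2 d(mu_i x mu_j)

    s = np.linalg.svd(p / np.sqrt(np.outer(row, col)), compute_uv=False)
    print(np.sum(s**2), hs_sq)                    # equal: sum of s_k^2 = ||f||^2
    print(s[1], np.sqrt(hs_sq - 1))               # maximal correlation <= sqrt(||f||^2 - 1)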
In general, linear independence is difficult to verify (vide [12], where it fails). The following consequence of Proposition 2.6 gives a relevant "density criterion".
In relation to Theorem 2.5, one should note that the lower bound on the density is the more relevant one. (On the other hand, in Theorem 2.5 we use the density with respect to an arbitrary product measure rather than the product of marginals.) Under this criterion, (12) holds for all 1 < q < ∞. In particular, for 1 < p < ∞ and X_j ∈ L_p(F_j) such that EX_i = EX_j there exists an additive solution to (1).

Results for two σ-fields
This case is essentially settled. Most of the results have occurred in various guises in the literature. They are collected below for completeness, and to point out what to aim for in the more general case.
The following shows that for d = 2 there is at most one solution of (1) and (4). (Clearly, there is no Z if X_1, X_2 are not consistent, e.g., if EX_1 ≠ EX_2.) Since best additive approximations satisfy (1), uniqueness allows one to consider the inverse problem (1) instead. This is well known, cf. [8].
Proposition 2.9 Fix 1 < p < ∞. For d = 2, an additive solution of (1) is unique whenever it exists.

Corollary 2.10 If the best additive L_2-approximation of a square-integrable Y by elements of L_2(F_1) ⊕ L_2(F_2) exists, then it is given by the solution to (1) with X_j = E(Y|F_j).
The following result points out the role of maximal correlation and comes from [5].

Theorem 2.11 Fix 1 < p < ∞. The following conditions are equivalent:
1. There exists a minimal solution to (1) for all consistent X_1, X_2 in L_p(F_1), L_p(F_2) respectively;
2. There exists an additive solution to (1) for all consistent X_1, X_2 in L_p(F_1), L_p(F_2) respectively;
3. ρ̃ < 1.
The consistency condition is E(X_1|F_1 ∩ F_2) = E(X_2|F_1 ∩ F_2). Furthermore, if E(Z|F_1 ∩ F_2) = 0, the minimum norm in (3) is bounded by an explicit constant depending only on ρ̃, and the bound is sharp. The solution Z is given by a series expansion (9) which converges in L_p.

Remark 2.2 Formula (9) resembles the expansion for the orthogonal projection of L_2 onto the closure of L_2(F_1) ⊕ L_2(F_2) (see [1]).
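To convey the flavour of such an expansion, here is one reconstruction for the case d = 2 (ours, and not necessarily identical to (9)), assuming E(X_i|F_1 ∩ F_2) = 0 and ρ̃ < 1. Set

    Z = Σ_{k≥0} (U_k + V_k),   U_0 = X_1,   V_0 = X_2 − E_2 X_1,
    U_{k+1} = −E_1 V_k,   V_{k+1} = −E_2 U_{k+1}.

Then E_1 V_k + U_{k+1} = 0 and E_2 U_{k+1} + V_{k+1} = 0, so the sums telescope and E_1(Z) = X_1, E_2(Z) = X_2, while each correction shrinks in norm by a factor of at most ρ̃ (for p = 2), so the series converges geometrically. For linear regressions with corr(X_1, X_2) = ρ the series sums to Z = (X_1 + X_2)/(1 + ρ), in agreement with Example 4.1 below.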

Proofs
The following uniqueness result is proved in [4] for the square-integrable case p = 2 (the new part is 1 < p < 2).
Lemma 3.1 (i) If the spaces L_p^0(F_j) are linearly independent, p ≥ 2, and d < ∞, then for every {X_j} in L_p(F_j) there is at most one solution of (1) in the additive class (4). (ii) If inequality (12) holds with q = 2, then for every {X_j} in L_p(F_j), p ≥ 2, there is at most one solution of (1) in the additive class (4). (iii) If inequality (12) holds for q = p and for the conjugate exponent q = p/(p − 1), then for every {X_j} in L_p(F_j) there is at most one solution of (1) in the additive class (4).
Proof of Lemma 3.1. The case p = 2 goes as follows. Suppose Z = Z_1 + Z_2 + . . . has E_j(Z) = 0 for all j. Then EZ^2 = Σ_j E(Z Z_j) = Σ_j E(Z_j E_j(Z)) = 0. This implies that Z_j = 0 for all j, either by linear independence or by (12).
The second part uses the existence part of the proof of Theorem 2.1. Take Z = Σ_j Z_j (an L_p-convergent series) such that E_j(Z) = 0. Then by (11), ‖Z‖_p is attained in duality against sequences (X_j) with E(Σ_j X_j^2)^{q/2} = 1, 1/p + 1/q = 1 and X_j ∈ L_q^0(F_j). The latter holds because the conjugate space to L_q^0(ℓ_2(F_j)) is L_p^0(ℓ_2(F_j)). The existence part of the proof of Theorem 2.1 implies that there is Z̃ ∈ L_q such that E_j(Z̃) = X_j and Z̃ = Σ_j Z̃_j with Z̃_j ∈ L_q^0(F_j). Therefore E(Z Z̃) = Σ_j E(Z̃_j E_j(Z)) = 0. This shows E|Z|^p = 0, and by the left-hand side of (11) we have Z_j = 0 a.s. for all j. □

Proof of Proposition 2.9. Clearly Z̄ = E{Z|F_1 ∩ F_2} is uniquely determined, and without losing generality we may assume Z̄ = 0. Suppose that Z = Z_1 + Z_2 satisfies E_j(Z) = 0. Then Z_1 = −E_1(Z_2) and Z_2 = −E_2(Z_1). Using this iteratively we get Z_1 = (E_1 E_2)^n (Z_1), and by the "alternierende Verfahren" (see [13]) the right-hand side converges to E(Z_1|F_1 ∩ F_2) = 0. By symmetry, Z_2 = 0 and the proof of uniqueness follows.

□
Proof of Corollary 2.10. Without loss of generality we may assume that the variables involved are centered. Since the same analysis applies to E_2, the optimal Z has to satisfy (1). By Proposition 2.9, there is only one such Z, so this one has to be the optimal one. □

Proof of Theorem 2.11. Let L_p^0 denote the null space of the linear operator E(·|F_1 ∩ F_2).

If (1) is solvable in the additive class (4), then there exists also a minimal solution (3). The following shows that for finite families of σ-fields the solvability of both problems is actually equivalent, at least when the constraints EX_i = EX_j are the only ones to be used.

Lemma 3.2 Fix 1 < p < ∞ and suppose d < ∞. The following conditions are equivalent:
(i) Equation (1) has an additive (4) solution Z for all X_j ∈ L_p^0(F_j);
(ii) Equation (1) has a minimal (3) solution Z for all X_j ∈ L_p^0(F_j);
(iii) There exists δ = δ(q) > 0 such that

‖X_1 + . . . + X_d‖_q ≥ δ (Σ_{j=1}^d ‖X_j‖_q^q)^{1/q}   (12)

for all X_j ∈ L_q^0(F_j), where 1/p + 1/q = 1.
Moreover, if inequality (12) holds, then there exists an additive solution Z to (1) with ‖Z‖_p ≤ δ^{−1} (Σ_j ‖X_j‖_p^p)^{1/p}.

Remark 3.1 If in addition the spaces L_q^0(F_j) are linearly independent, then the following equivalent condition can be added: ⊕_{j=1}^d L_q^0(F_j) is a closed subspace of L_q(F_I).

Proof of Lemma 3.2. (iii)⇒(i) Consider the bounded linear operator T : L_p → ℓ_p(L_p^0(F_j)) defined by Z ↦ (E(Z|F_j) : j = 1, . . . , d). The conjugate operator T* : ℓ_q → L_q is given by (X_j) ↦ Σ_{j=1}^d X_j. The coercivity criterion for T being onto, see [14, Theorem 4.15], is that T* is bounded from below, which is (12). Therefore (i) follows.
The left inverse has ℓ_p → L_p operator norm ‖T^{−1}‖ ≤ 1/δ, which gives the claimed estimate of the norm of Z.
(i)⇒(ii) If there is an additive solution, then the X_j are consistent, and Remark 2.1 implies that there is a solution with minimal L_p-norm.
If ⊕_{j=1}^d L_q^0(F_j) is a closed subspace of L_q(F_I), then (12) holds. Indeed, by linear independence, the linear operator X_1 + . . . + X_d ↦ (X_1, . . . , X_d) is well defined on it, and by the closed graph theorem it is bounded, which gives (12). The right-hand side is stated as [7, (7)].
By the Khinchin inequality this implies (11). (Note that a more careful analysis gives explicit estimates for the constants involved.) For q = 2 the above is replaced by the inequality stated in [2, Lemma 1].
Existence of the solution now follows from functional analysis. Consider the bounded linear (cf. (11)) operator T : L_p^0 → L_p^0(ℓ_2(F_j)) defined by Z ↦ (E(Z|F_j) : j = 1, 2, . . .). The conjugate operator T* : L_q^0(ℓ_2) → L_q^0 is given by (X_j) ↦ Σ_{j=1}^∞ X_j. The coercivity criterion for T being onto, see [14, Theorem 4.15], follows from (11). Therefore a solution to (1) exists, and the minimal solution exists by Remark 2.1.
This shows that for all d < ∞ there is a solution to (1), and hence a minimal solution exists. Therefore, by Lemma 3.2 there exists an additive solution (4).
To verify that the series (7) converges, notice that, for j ≠ i_k, each application of E_j contracts the norm by a factor of at most r. Clearly, (4) holds true. We now check that Z defined by (7) satisfies (1). To this end, without loss of generality we assume EX_j = 0 and verify (1) for j = 1 only. Splitting the sum (7) into two series, we find that the 0-th term of the first series is X_1 and that the k-th term of the first series cancels the (k − 1)-st term of the second series. Therefore E_1(Z) = X_1.
To prove the uniqueness, it suffices to notice that r < 1/(d − 1) implies linear independence. Alternatively, suppose that both Z = Z_1 + . . . + Z_d and Z′ = Z′_1 + . . . + Z′_d have the same conditional moments. Then the contraction estimates above imply that the sum Σ_j (Z_j − Z′_j) vanishes, proving uniqueness.

□
Since the above analysis can also be carried through for E(U + V)^2, we get the following.
Example 4.1 Let d < ∞. Suppose X_1, . . . , X_d are square-integrable, centered, and have linear regressions, i.e., there are constants a_{i,j} such that E(X_i|X_j) = a_{i,j} X_j for all i, j (for example, this holds true for (X_1, . . . , X_d) with elliptically contoured distributions, or when all X_j are two-valued). Let C = [C_{i,j}] be their covariance matrix. Clearly, if either R < 1 or r < 1/(d − 1), then C is non-degenerate. Explicit solutions illustrating Theorems 2.11, 2.1, and 2.3 are then possible. It is easy to check that Z = Σ_{j=1}^d θ_j X_j, where the vector θ = (θ_1, . . . , θ_d) solves C θ = (C_{1,1}, . . . , C_{d,d})^T, satisfies E(Z|X_j) = X_j for all j, and Z has the additive form. In the special case when corr(X_i, X_j) = ρ does not depend on i ≠ j and the X_j are standardized, θ_j = 1/(1 + (d − 1)ρ) for every j. It is easy to see directly that 1 + (d − 1)ρ > 0 provided R < 1; therefore Z is well defined. In particular, for d = 2 we have Z = (X_1 + X_2)/(1 + ρ), which points out the sharpness of the estimates for EZ^2 in Theorem 2.11 when ρ < 0.
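Example 4.1 is easy to check numerically (our sketch; the Gaussian test case and all names are ours, and Gaussian vectors have linear regressions, so the example applies). Since E(Z|X_j) = (Cov(Z, X_j)/C_{j,j}) X_j here, the constraint E(Z|X_j) = X_j amounts to Cθ = (C_{1,1}, . . . , C_{d,d})^T.

    import numpy as np

    rng = np.random.default_rng(2)
    d, rho = 4, 0.3

    # Equicorrelated covariance matrix with unit variances.
    C = (1 - rho) * np.eye(d) + rho * np.ones((d, d))

    # Coefficients of the additive solution Z = sum_j theta_j X_j:
    # E(Z | X_j) = X_j requires Cov(Z, X_j) = C[j, j], i.e. C @ theta = diag(C).
    theta = np.linalg.solve(C, np.diag(C))
    print(theta)  # each entry equals 1 / (1 + (d - 1) * rho)

    # Monte Carlo check with a Gaussian vector (linear regressions hold).
    X = rng.multivariate_normal(np.zeros(d), C, size=500_000)
    Z = X @ theta
    for j in range(d):
        print(np.cov(Z, X[:, j])[0, 1] / C[j, j])  # ~ 1.0, i.e. E(Z | X_j) = X_j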