Gaussian Covariance Faithful Markov Trees

A covariance graph is an undirected graph associated with the probability distribution of a given random vector, where each vertex represents a component of the random vector and where the absence of an edge between any pair of variables implies marginal independence between these two variables. Covariance graph models have recently received much attention in the literature and constitute a sub-family of graphical models. Though they are conceptually simple to understand, they are considerably more difficult to analyze. Under suitable assumptions on the probability distribution, covariance graph models can also be used to represent more complex conditional independence relationships between subsets of variables. When the covariance graph captures or reflects all the conditional independence statements present in the probability distribution, the latter is said to be faithful to its covariance graph; no such guarantee, however, holds a priori. Despite the increasingly widespread use of these two types of graphical models, to date no deep probabilistic analysis of this class of models, in terms of the faithfulness assumption, is available. Such an analysis is crucial in understanding the ability of the graph, a discrete object, to fully capture the salient features of the probability distribution it aims to describe. In this paper we demonstrate that multivariate Gaussian distributions that have trees as covariance graphs are necessarily faithful. The method of proof is entirely new and in the process yields a technique that is novel to the field of graphical models.


Introduction
Markov random fields or graphical models are widely used to represent conditional independences in a given multivariate probability distribution (see Kunsch et al. (1995), Ji & Seymour (1996), Spitzer (1975), Kindermann & Snell (1980), Lauritzen (1996), to name just a few). Many different types of Markov random fields or graphical models have been studied in the literature. For example, directed acyclic graphs or DAGs are commonly referred to as "Bayesian networks" (see Pearl (1988)). When the graph is undirected and is constructed using marginal independence relationships between pairs of random variables in a given random vector, these graphical models are called "covariance graph" models (see Cox & Wermuth (1993), Cox & Wermuth (1996), Kauermann (1996), Malouche & Rajaratnam (2009) and Khare & Rajaratnam (2009)). Covariance graph models are commonly represented by graphs with exclusively bi-directed or dashed edges (see Kauermann (1996)). This representation is used in order to distinguish them from the traditional and widely used concentration graph models. Concentration graphs encode conditional independence between pairs of variables given the remaining ones. Formally, consider a random vector X = (X_v, v ∈ V)′ with probability distribution P, where V is a finite set indexing the random variables in X. The concentration graph associated with P is an undirected graph G = (V, E) where

• V is the set of vertices.
• Each vertex represents one variable in X.
• E is the set of edges (between the vertices in V) constructed using the pairwise rule: for every pair (u, v) ∈ V × V with u ≠ v,

(u, v) ∉ E ⇔ X_u ⊥⊥ X_v | X_{V\{u,v}},   (1)

where X_{V\{u,v}} := (X_w, w ≠ u and w ≠ v)′.
Note that (u, v) ∉ E means that the vertices u and v are not adjacent in G.
An undirected graph G_0 = (V, E_0) is called the covariance graph associated with the probability distribution P if the set of edges E_0 is constructed as follows: for every pair (u, v) ∈ V × V with u ≠ v,

(u, v) ∉ E_0 ⇔ X_u ⊥⊥ X_v.   (2)

The subscript zero is invoked for covariance graphs (i.e., G_0 vs G) as the definition of covariance graphs does not involve conditional independences. Both concentration and covariance graphs are not only used to encode pairwise relationships between pairs of variables in the random vector X; as we will see below, these graphs can also be used to encode conditional independences that exist between subsets of variables of X. First we introduce some definitions. The multivariate distribution P is said to satisfy the "intersection property" if for any subsets A, B, C and D of V which are pairwise disjoint,

X_A ⊥⊥ X_B | X_{C∪D} and X_A ⊥⊥ X_C | X_{B∪D} ⇒ X_A ⊥⊥ X_{B∪C} | X_D.   (3)

We will call the intersection property (see Lauritzen (1996)) in (3) above the concentration intersection property in this paper, in order to differentiate it from another property that is satisfied by P when studying covariance graph models.
Let P satisfy the concentration intersection property. Then for any triplet (A, B, S) of subsets of V pairwise disjoint, if S separates A and B in the concentration graph G associated with P, then X_A ⊥⊥ X_B | X_S. This latter property is called the concentration global Markov property and is formally defined as

S separates A and B in G ⇒ X_A ⊥⊥ X_B | X_S.   (4)

Kauermann (1996) and Banerjee & Richardson (2003) show that if P satisfies the following property: for any triplet (A, B, C) of subsets of V pairwise disjoint,

X_A ⊥⊥ X_B | X_C and X_A ⊥⊥ X_C | X_B ⇒ X_A ⊥⊥ X_{B∪C},   (5)

then for any triplet (A, B, S) of subsets of V pairwise disjoint, if V \ (A ∪ B ∪ S) separates A and B in the covariance graph G_0 associated with P then X_A ⊥⊥ X_B | X_S. This latter property is called the covariance global Markov property and can be written formally as follows:

V \ (A ∪ B ∪ S) separates A and B in G_0 ⇒ X_A ⊥⊥ X_B | X_S.   (6)

In parallel to the concentration graph case, property (5) will be called the covariance intersection property.
Even if P satisfies both intersection properties, the covariance and concentration graphs may not be able to capture or reflect all the conditional independences present in the distribution, i.e., there may exist one or more conditional independences present in the probability distribution that do not correspond to any separation statement in either G or G_0. Equivalently, a lack of a separation statement in the graph does not necessarily imply a corresponding conditional dependence. In the contrary case, when no conditional independences exist in P other than the ones encoded by the graph, we classify P as a probability distribution faithful to its graphical model. More precisely, we say that P is concentration faithful to its concentration graph G if for any triplet (A, B, S) of subsets of V pairwise disjoint, the following statement holds:

X_A ⊥⊥ X_B | X_S ⇔ S separates A and B in G.   (7)

Similarly, P is said to be covariance faithful to its covariance graph G_0 if for any triplet (A, B, S) of subsets of V pairwise disjoint, the following statement holds:

X_A ⊥⊥ X_B | X_S ⇔ V \ (A ∪ B ∪ S) separates A and B in G_0.   (8)

A natural question of both theoretical and applied interest in probability theory is to understand the implications of the faithfulness assumption. This assumption is fundamental since it yields a bijection between the probability distribution P and the graph in terms of the independences that are present in the distribution. In this paper we show that multivariate Gaussian distributions whose covariance graphs are trees are necessarily covariance faithful, i.e., these probability distributions satisfy property (8); in other words, the associated covariance graph G_0 is fully able to capture all the conditional independences present in the multivariate distribution P. This result can be considered as a dual of a previous probabilistic result proved by Becker et al.
(2005) for concentration graphs, which demonstrates that Gaussian distributions having concentration trees (i.e., whose concentration graph is a tree) are necessarily concentration faithful to their concentration graphs (implying that property (7) is satisfied). That result was proved by showing that Gaussian distributions satisfy an additional intersection property. The approach in the proof of the main result of this paper is vastly different from the one used for concentration graphs by Becker et al. (2005). The outline of this paper is as follows. Section 2 presents graph theory preliminaries. Section 3 gives a brief overview of covariance and concentration graphs associated with multivariate Gaussian distributions. Furthermore, an easier way to encode conditional independence using covariance graphs is given in Section 3. The proof of the main result of this paper is given in Section 4. Section 5 concludes by summarizing the results in the paper and the implications thereof.

Graph theory preliminaries
This section introduces notation and terminology that is required in subsequent sections. An undirected graph G = (V, E) consists of two sets V and E, with V representing the set of vertices, and E ⊆ (V × V) \ {(u, u), u ∈ V} the set of edges, satisfying the symmetry condition (u, v) ∈ E ⇔ (v, u) ∈ E. When (u, v) ∈ E we say that u and v are adjacent in G.
Definition 1 A path connecting two distinct vertices u and v in G is a sequence of distinct vertices (u_0, u_1, ..., u_n) where u_0 = u, u_n = v, and (u_i, u_{i+1}) ∈ E for every i = 0, ..., n−1. Such a path will be denoted p = p(u, v, G) and we say that p(u, v, G) connects u and v, or alternatively that u and v are connected by p(u, v, G). Its length, denoted by |p(u, v, G)|, is defined as the number of edges connecting the vertices of p; so in this case |p(u, v, G)| = n. We also denote by P(u, v, G) the set of paths between u and v.
Trees are a particular class of graphs that are studied in this paper. This class of graphs is formally defined below.

Definition 2 A graph G = (V, E) is said to be connected if each pair of distinct vertices can be connected by at least one path. A tree is a connected graph in which each pair of distinct vertices is connected by exactly one path.

Definition 3 For U ⊆ V, the subgraph of G induced by U is the graph G_U = (U, E_U) with E_U = E ∩ (U × U). A connected component of G is a maximal subset U of V such that each pair of vertices can be connected by at least one path in G_U.
We now state a Lemma needed in the proof of the main result of this paper.
Lemma 1 Let G = (V, E) be an undirected graph. If G is a tree, then any subgraph of G induced by a subset of V is a union of connected components, each of which is a tree (or what we shall refer to as a "union of tree connected components").
Proof. Consider U ⊂ V, the induced graph G_U and a pair of vertices (u, v) ∈ U × U. Let us assume to the contrary that u and v are connected by two distinct paths p_1 and p_2 in G_U. As the set of edges E_U of the graph G_U is included in the set of edges E of G, i.e., E_U = E ∩ (U × U) ⊆ E, the paths p_1 and p_2 are also paths in G. Hence u and v are vertices in G which are connected by two distinct paths, contradicting the fact that G is a tree. Thus any pair of vertices in G_U is connected by at most one path, and hence G_U is a union of connected components, each of which is a tree (a "union of tree connected components").
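Lemma 1 can also be checked mechanically. The short sketch below (in Python, with hypothetical helper names; not part of the paper) deletes a vertex from a small tree and verifies that every surviving connected component is itself a tree, using the elementary characterization that a connected component with m vertices is a tree iff it has exactly m − 1 edges.

```python
# Sketch: induced subgraphs of a tree are unions of tree connected components.

def connected_components(vertices, edges):
    """Group vertices into connected components (simple union-find)."""
    parent = {v: v for v in vertices}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v
    for u, v in edges:
        parent[find(u)] = find(v)
    comps = {}
    for v in vertices:
        comps.setdefault(find(v), set()).add(v)
    return list(comps.values())

def induced_subgraph(U, edges):
    """Edges of the subgraph induced by the vertex subset U."""
    return [(u, v) for (u, v) in edges if u in U and v in U]

def is_forest(vertices, edges):
    """Each component with m vertices must carry exactly m - 1 edges."""
    return all(
        len([e for e in edges if e[0] in c and e[1] in c]) == len(c) - 1
        for c in connected_components(vertices, edges)
    )

tree_edges = [(1, 2), (2, 3), (3, 4), (3, 5), (5, 6)]  # a tree on 6 vertices
U = {1, 2, 4, 5, 6}                                    # remove vertex 3
sub = induced_subgraph(U, tree_edges)                  # edges (1,2) and (5,6) survive
assert is_forest(U, sub)        # a union of tree connected components, as claimed
```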
Definition 4 For a connected graph, a separator is a subset S of V for which there exists a pair of non-adjacent vertices u and v with u, v ∉ S such that every path connecting u and v intersects S. If S is a separator then it is easily verified that every S′ ⊇ S such that S′ ⊆ V \ {u, v} is also a separator. We are thus led to the notion of a minimal separator.

Definition 5
The separator S is defined to be a minimal separator between two non-adjacent vertices u and v if for any w ∈ S, the subset S \ {w} is no longer a separator of u and v.
Note that in the case where G contains more than one connected component, if u and v belong to different connected components then the empty set is the only minimal separator of u and v. Finally, let A, B and S be pairwise disjoint subsets of V. We say that S separates A and B if for any pair of vertices (u, v) ∈ A × B, any path connecting u and v intersects S. In the case where A and B belong to different connected components of G, the subset S can be empty because the set of paths between any pair of vertices (u, v) ∈ A × B is empty.
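Separation as defined above is a purely graph-theoretic notion and can be decided with a breadth-first search in the graph with S removed. The following is a minimal sketch (hypothetical function and variable names; Python is used throughout for illustration):

```python
# Sketch: S separates A and B iff no path from A to B avoids S.
from collections import deque

def separates(S, A, B, vertices, edges):
    """Return True iff every path from A to B intersects S (BFS in V \\ S)."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    seen = set(A) - set(S)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        if u in B:
            return False          # reached B without crossing S
        for w in adj[u]:
            if w not in S and w not in seen:
                seen.add(w); queue.append(w)
    return True

# Path graph 1 - 2 - 3 - 4 - 5
V = {1, 2, 3, 4, 5}
E = [(1, 2), (2, 3), (3, 4), (4, 5)]
assert separates({3}, {1, 2}, {4, 5}, V, E)      # {3} blocks every path
assert not separates(set(), {1}, {4, 5}, V, E)   # the empty set does not
```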

Gaussian Concentration and Covariance Graphs
In this section we present a brief overview of concentration and covariance graphs in the case when the probability distribution P is multivariate Gaussian. Such graphical models are commonly referred to as Gaussian covariance or Gaussian concentration graph models.

Gaussian concentration graph models
Consider a probability space (Ω, F, P) and let X : Ω → R^{|V|} be a random vector, where X = (X_v, v ∈ V)′ and P represents the measure induced by X. If X follows a Gaussian distribution then it has the following density function with respect to Lebesgue measure:

f(x) = (2π)^{−|V|/2} |Σ|^{−1/2} exp( −(1/2) (x − µ)′ Σ^{−1} (x − µ) ),

where µ ∈ R^{|V|} is the mean vector and Σ = (σ_uv) ∈ P^+ is the covariance matrix, with P^+ denoting the cone of symmetric positive definite matrices. Without loss of generality we will assume that µ = 0. As any Gaussian distribution with µ = 0 is completely determined by its covariance matrix Σ, this set of multivariate Gaussian distributions can therefore be identified with the set of symmetric positive definite matrices. Gaussian distributions can also be parameterized by the inverse of the covariance matrix Σ, denoted by K = Σ^{−1} = (k_uv).
The matrix K is called the precision or concentration matrix. It is well known (see Lauritzen (1996)) that for any pair of variables (X_u, X_v),

X_u ⊥⊥ X_v | X_{V\{u,v}} ⇔ k_uv = 0.

Hence the concentration graph G = (V, E) can be constructed simply using the precision matrix K and the following rule:

(u, v) ∈ E ⇔ k_uv ≠ 0.

Furthermore, it can be easily deduced from a classical result of Hammersley & Clifford (1971), reproved in Lauritzen (1996), that any multivariate random vector with a positive density necessarily satisfies the concentration intersection property (3). Hence for Gaussian concentration graph models the pairwise Markov property in (1) is equivalent to the concentration global Markov property in (4).
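The edge rule above can be illustrated numerically. In the sketch below (assumed, illustrative numbers, not from the paper) a tridiagonal precision matrix K yields a path-shaped concentration graph; note that the zero pattern of K, not of Σ = K^{−1}, determines this graph.

```python
import numpy as np

# Sketch: read the concentration graph off the zeros of the precision matrix K.
n, tol = 4, 1e-12
K = np.eye(n) + np.diag([0.3] * (n - 1), 1) + np.diag([0.3] * (n - 1), -1)
assert np.all(np.linalg.eigvalsh(K) > 0)          # a valid precision matrix

# Edge rule: (u, v) in E  <=>  k_uv != 0
E = {(u, v) for u in range(n) for v in range(u + 1, n) if abs(K[u, v]) > tol}
assert E == {(0, 1), (1, 2), (2, 3)}              # concentration graph is a path

# k_02 = 0 encodes X_0 indep X_2 given the rest, even though the covariance
# matrix Sigma = K^{-1} is dense (all variables are marginally dependent).
Sigma = np.linalg.inv(K)
assert abs(Sigma[0, 2]) > tol
```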

Gaussian covariance graph models
As seen earlier in (2), covariance graphs are constructed using pairwise marginal independence relationships. It is also well known that for multivariate Gaussian distributions:

X_u ⊥⊥ X_v ⇔ σ_uv = 0.

Hence in the Gaussian case the covariance graph G_0 = (V, E_0) can be constructed using the following rule:

(u, v) ∈ E_0 ⇔ σ_uv ≠ 0.

It is also easily seen that Gaussian distributions satisfy the covariance intersection property defined in (5). Hence Gaussian covariance graphs can also encode conditional independences according to rule (6): for any triplet (A, B, S) of pairwise disjoint subsets of V, if V \ (A ∪ B ∪ S) separates A and B in G_0 then X_A ⊥⊥ X_B | X_S. We now show (see Proposition 2 below) that there is a simple way to read conditional independence statements from the covariance graph. This result holds true for any probability distribution that satisfies the covariance intersection property given in (5).
Proposition 2 Let X_V = (X_v, v ∈ V)′ be a random vector with probability distribution P satisfying the covariance intersection property in (5) and let G_0 = (V, E_0) be the covariance graph associated with P. Then the following statements are equivalent:

i. for any pairwise disjoint subsets A, B and S of V: if V \ (A ∪ B ∪ S) separates A and B in G_0 then X_A ⊥⊥ X_B | X_S;

ii. for any pairwise disjoint subsets A, B and S of V: if S separates A and B in G_0 then X_A ⊥⊥ X_B | X_{V\(A∪B∪S)}.

Proof. Let us first assume that (i) is satisfied and let us prove (ii). Let A, B and S be three pairwise disjoint subsets of V such that S separates A and B in G_0. Note that we can write S as follows: S = V \ (A ∪ B ∪ S′), where S′ := V \ (A ∪ B ∪ S). By hypothesis the set V \ (A ∪ B ∪ S′) = S separates A and B in G_0, so applying (i) to the triplet (A, B, S′) yields X_A ⊥⊥ X_B | X_{S′}, i.e., X_A ⊥⊥ X_B | X_{V\(A∪B∪S)}. Assume now that property (ii) is satisfied and let A, B and S be three pairwise disjoint subsets of V such that V \ (A ∪ B ∪ S) separates A and B in G_0. Applying (ii) with the separating set S′ := V \ (A ∪ B ∪ S) gives X_A ⊥⊥ X_B | X_{V\(A∪B∪S′)}, and since V \ (A ∪ B ∪ S′) = S this is precisely X_A ⊥⊥ X_B | X_S.

Proposition 2 can be used to formulate an equivalent definition of the covariance faithfulness property.
Definition 6 Let X_V = (X_v, v ∈ V)′ be a random vector with probability distribution P satisfying the covariance intersection property in (5) and let G_0 = (V, E_0) be the covariance graph associated with P. We say that P is covariance faithful to G_0 if for any pairwise disjoint subsets A, B and S of V the following condition is satisfied:

X_A ⊥⊥ X_B | X_{V\(A∪B∪S)} ⇔ S separates A and B in G_0.

The above reformulation of the covariance faithfulness property is an important ingredient in the proofs in the next section.
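As a numerical illustration of the constructions in this section (assumed, illustrative numbers): when the zeros sit in the covariance matrix rather than the precision matrix, the covariance graph is sparse while the concentration graph is dense, the mirror image of the previous subsection.

```python
import numpy as np

# Sketch: read the covariance graph off the zeros of Sigma; here Sigma is
# tridiagonal, so the covariance graph is a path (a tree).
n, tol = 4, 1e-12
Sigma = np.eye(n) + np.diag([0.3] * (n - 1), 1) + np.diag([0.3] * (n - 1), -1)
assert np.all(np.linalg.eigvalsh(Sigma) > 0)      # a valid covariance matrix

# Edge rule: (u, v) in E_0  <=>  sigma_uv != 0
E0 = {(u, v) for u in range(n) for v in range(u + 1, n) if abs(Sigma[u, v]) > tol}
assert E0 == {(0, 1), (1, 2), (2, 3)}             # covariance graph is a path

# sigma_03 = 0 encodes the *marginal* independence of X_0 and X_3, yet the
# precision matrix K = Sigma^{-1} is dense: no conditional independences
# given all remaining variables.
K = np.linalg.inv(Sigma)
assert all(abs(K[u, v]) > tol for u in range(n) for v in range(n))
```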

Gaussian Covariance faithful trees
We now proceed to study the faithfulness assumption in the context of multivariate Gaussian distributions whose associated covariance graphs are trees.
The main result of this paper, presented in Theorem 3, proves that multivariate Gaussian probability distributions having tree covariance graphs are necessarily faithful to their covariance graphs. The analogous result for concentration graphs was demonstrated by Becker et al. (2005), where the authors proved that Gaussian distributions having tree concentration graphs are necessarily faithful to these graphs. We now formally state Theorem 3. The proof follows shortly, after a series of lemmas/theorems and an illustrative example.
Theorem 3 Let X_V = (X_v, v ∈ V)′ be a random vector with Gaussian distribution P = N_{|V|}(µ, Σ). Let G_0 = (V, E_0) be the covariance graph associated with P. If G_0 is a tree, or more generally a union of connected components each of which is a tree (a union of "tree connected components"), then P is covariance faithful to G_0.
The proof of Theorem 3 requires, among others, a result proved by Jones & West (2005). This result gives a method that can be used to compute the covariance matrix Σ from the precision matrix K using the paths in the concentration graph G. The result can also be easily extended to show that the precision matrix K can be computed from the covariance matrix Σ using the paths in the covariance graph G_0. We now state the result of Jones & West (2005).
Theorem 4 (Jones & West (2005)) Let X_V = (X_v, v ∈ V)′ be a random vector with Gaussian distribution P = N_{|V|}(µ, Σ), where Σ and K = Σ^{−1} are positive definite matrices. Let G = (V, E) and G_0 = (V, E_0) denote respectively the concentration and covariance graphs associated with the probability distribution of X_V.
For all u, v ∈ V with u ≠ v,

σ_uv = (1/|K|) Σ_{p ∈ P(u,v,G)} (−1)^{n} k_{u_0 u_1} k_{u_1 u_2} ··· k_{u_{n−1} u_n} |K_{\p}|,

and, dually,

k_uv = (1/|Σ|) Σ_{p ∈ P(u,v,G_0)} (−1)^{n} σ_{u_0 u_1} σ_{u_1 u_2} ··· σ_{u_{n−1} u_n} |Σ_{\p}|,

where, if p = (u_0, ..., u_n), K_{\p} and Σ_{\p} denote respectively K and Σ with rows and columns corresponding to variables in path p omitted. The determinant of a zero-dimensional matrix is defined to be 1.
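The Jones & West path-sum identity can be verified numerically. The sketch below (illustrative matrix and hypothetical function names, not from the paper) enumerates the simple paths of the concentration graph of an assumed 4 × 4 precision matrix and recovers every off-diagonal entry of Σ = K^{−1} from the signed path sums; the sign attached to a path with n edges is (−1)^n.

```python
import numpy as np

def simple_paths(adj, u, v):
    """All simple paths from u to v, by depth-first search."""
    stack = [(u, (u,))]
    while stack:
        node, path = stack.pop()
        if node == v:
            yield path
            continue
        for w in adj[node]:
            if w not in path:
                stack.append((w, path + (w,)))

def sigma_from_paths(K, u, v):
    """Jones & West: sigma_uv as a signed sum over paths of the
    concentration graph, weighted by principal minors of K."""
    n = K.shape[0]
    adj = {i: [j for j in range(n) if j != i and abs(K[i, j]) > 1e-12]
           for i in range(n)}
    detK, total = np.linalg.det(K), 0.0
    for p in simple_paths(adj, u, v):
        rest = [i for i in range(n) if i not in p]
        minor = np.linalg.det(K[np.ix_(rest, rest)]) if rest else 1.0
        weight = np.prod([K[p[i], p[i + 1]] for i in range(len(p) - 1)])
        total += (-1) ** (len(p) - 1) * weight * minor   # sign = (-1)^{#edges}
    return total / detK

# An assumed positive definite K whose concentration graph is a 4-cycle.
K = np.array([[2.0, 0.5, 0.0, 0.3],
              [0.5, 2.0, 0.4, 0.0],
              [0.0, 0.4, 2.0, 0.6],
              [0.3, 0.0, 0.6, 2.0]])
assert np.all(np.linalg.eigvalsh(K) > 0)
Sigma = np.linalg.inv(K)
for u in range(4):
    for v in range(u + 1, 4):
        assert abs(sigma_from_paths(K, u, v) - Sigma[u, v]) < 1e-10
```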
The proof of our main theorem (Theorem 3) also requires the results proved in the lemma below.

Lemma 5 Let P be a multivariate Gaussian distribution and let G_0 = (V, E_0) and G = (V, E) denote respectively the covariance and concentration graphs associated with P. Then

i. G and G_0 have the same connected components;

ii. if a given connected component in G_0 is a tree then the corresponding connected component in G is complete, and vice-versa.
Proof. The fact that G_0 and G have the same connected components can be deduced from the matrix structure of the covariance and the precision matrix. The connected components of G_0 correspond to diagonal blocks in Σ. Since K = Σ^{−1}, by the properties of inverting partitioned matrices K also has the same diagonal blocks as Σ in terms of the variables that constitute these blocks. These blocks correspond to distinct connected components in G and G_0. Hence both graphs have the same connected components.
Let us assume now that the covariance graph G_0 is a tree; hence it is a connected graph with only one connected component. We shall prove that the concentration graph G is complete by using Theorem 4 of Jones & West (2005) to compute an arbitrary coefficient k_uv (u ≠ v). Since G_0 is a tree, there exists exactly one path between any two vertices u and v. We shall denote this path as p = (u_0 = u, ..., u_n = v). Then by Theorem 4,

k_uv = (1/|Σ|) (−1)^{n} σ_{u_0 u_1} σ_{u_1 u_2} ··· σ_{u_{n−1} u_n} |Σ_{\p}|.   (11)

First note that the determinants in (11) are all positive, since principal minors of positive definite matrices are positive. Second, since we are considering a path in G_0, σ_{u_{i−1} u_i} ≠ 0 for all i = 1, ..., n. Using these two facts we deduce from (11) that k_uv ≠ 0 for every pair of distinct vertices u and v. Hence every such pair is adjacent in G, and the concentration graph G is therefore complete. The proof that G_0 is complete when G is assumed to be a tree follows similarly.
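Part (ii) of the lemma can be illustrated with a quick numerical check (assumed numbers): a star-shaped covariance graph, which is a tree, forces every off-diagonal entry of K = Σ^{−1} to be nonzero, i.e., the concentration graph is complete.

```python
import numpy as np

# Sketch: tree (star) covariance graph  =>  complete concentration graph.
n, tol = 5, 1e-12
Sigma = np.eye(n)
for leaf in range(1, n):                  # hub 0 joined to leaves 1..4
    Sigma[0, leaf] = Sigma[leaf, 0] = 0.3
assert np.all(np.linalg.eigvalsh(Sigma) > 0)

K = np.linalg.inv(Sigma)
# Every pair of distinct vertices is adjacent in the concentration graph:
assert all(abs(K[u, v]) > tol for u in range(n) for v in range(n) if u != v)
```

For a pair of leaves the unique connecting path in the star has two edges, so by (11) the corresponding k_uv is a product of nonzero terms with a positive minor, consistent with the dense inverse observed here.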
Remark. We further note that Theorem 4 is also directly useful in deducing the completeness of the concentration graph from the covariance graph in other settings. As a concrete example, consider the case when G_0 is a cycle with an even number of edges, i.e., |V| = 2k, and assume that all the coefficients in the covariance matrix Σ of X_V are positive. A given pair of distinct vertices (u, v) in G_0 is then connected by exactly two paths, p_1 and p_2, whose lengths satisfy |p_1| + |p_2| = 2k and hence have the same parity. Using Theorem 4, it is easily deduced that

k_uv = (1/|Σ|) [ (−1)^{|p_1|} σ_{p_1} |Σ_{\p_1}| + (−1)^{|p_2|} σ_{p_2} |Σ_{\p_2}| ],

where σ_{p_i} denotes the product of the coefficients of Σ along the path p_i. Here σ_{p_1} and σ_{p_2} are different from zero as they are both products of positive coefficients, and the two terms carry the same sign since |p_1| and |p_2| have the same parity. Hence k_uv ≠ 0, so u and v are adjacent in the concentration graph G; thus G is necessarily complete. We now give an example illustrating the main result in this paper (Theorem 3).
Example 1 Consider a Gaussian random vector X = (X_1, ..., X_8)′ with covariance matrix Σ and its associated covariance graph as given in Figure 1. Consider the sets A = {1, 2}, B = {5} and S = {4, 6}. Note that S does not separate A and B in G_0, since there is a path from A to B that does not intersect S. In this case we cannot use the covariance global Markov property to claim that X_A is not independent of X_B given X_{V\(A∪B∪S)}. This is because the covariance global Markov property only allows us to read off conditional independences when a separation is present in the graph; it is not an "if and only if" property, in the sense that the lack of a separation in the graph does not necessarily imply the lack of the corresponding conditional independence. We shall show, however, that in this example X_A is indeed not independent of X_B given X_{V\(A∪B∪S)}. In other words, we shall show that the graph has the ability to capture this conditional dependence present in the probability distribution P.
Let us now examine the relationship between X_2 and X_5 given X_{{3,7,8}}. Note that in this example V \ (A ∪ B ∪ S) = {3, 8, 7}, 2 ∈ A and 5 ∈ B. Note also that the covariance graph associated with the probability distribution of the random vector (X_2, X_5, X_{{3,8,7}})′ is the subgraph represented in Figure 2 and can be obtained directly as the subgraph of G_0 induced by the subset {2, 5, 3, 7, 8}. Since 2 and 5 are connected by exactly one path in (G_0)_{{2,5,3,7,8}}, namely p = (2, 3, 5), the coefficient k_{25|387}, i.e., the coefficient between 2 and 5 in the inverse of the covariance matrix of (X_2, X_5, X_{{3,8,7}})′, can be computed using Theorem 4 as follows:

k_{25|387} = (1/|Σ({2, 5, 3, 8, 7})|) (−1)^{2} σ_{23} σ_{35} |Σ({7, 8})|,   (12)

where Σ({7, 8}) and Σ({2, 5, 3, 8, 7}) are respectively the covariance matrices of the Gaussian random vectors (X_7, X_8)′ and (X_2, X_5, X_{{3,8,7}})′. Hence k_{25|387} ≠ 0, since the right hand side of the equation in (12) is different from zero (σ_{23} ≠ 0, σ_{35} ≠ 0 and the determinants are positive). Hence X_2 is not independent of X_5 given X_{{3,8,7}}. Now recall that for any Gaussian random vector,

X_A ⊥⊥ X_B | X_C ⇔ X_u ⊥⊥ X_v | X_C for every pair (u, v) ∈ A × B,   (13)

where A, B and C are pairwise disjoint subsets of V. The contrapositive of (13) yields that if some pair (u, v) ∈ A × B satisfies X_u not independent of X_v given X_C, then X_A is not independent of X_B given X_C. Hence, by (13), X_{{1,2}} is not independent of X_5 given X_{{3,7,8}}: the lack of separation of A = {1, 2} and B = {5} by S = {4, 6} in G_0 is matched by a corresponding conditional dependence in P. We now proceed to the proof of Theorem 3.

Proof of Theorem 3.
Without loss of generality we assume that G_0 is a connected tree. Let us assume to the contrary that P is not covariance faithful to G_0. Then there exists a triplet (A, B, S) of pairwise disjoint subsets of V such that X_A ⊥⊥ X_B | X_{V\(A∪B∪S)}, but S does not separate A and B in G_0. As S does not separate A and B and since G_0 is a connected tree, there exists a pair of vertices (u, v) ∈ A × B such that the single path p connecting u and v in G_0 does not intersect S, i.e., S ∩ p = ∅. Hence p ⊆ V \ S = (A ∪ B) ∪ (V \ (A ∪ B ∪ S)). Thus two cases are possible with regards to where the path p can lie: either p ⊆ A ∪ B, or p ∩ (V \ (A ∪ B ∪ S)) ≠ ∅. Let us examine both cases separately.
• Case 1 : p ⊆ A ∪ B. In this case the entire path between u and v lies in A ∪ B, and hence we can find a pair of vertices (u′, v′) ∈ A × B belonging to p such that u′ and v′ are adjacent in G_0 (two consecutive vertices of p, one in A and the other in B). Recall that since G_0 is a tree, any graph induced from G_0 by a subset of V is a union of tree connected components (see Lemma 1). Hence the subgraph (G_0)_W, where W := {u′, v′} ∪ (V \ (A ∪ B ∪ S)), is a union of tree connected components. As u′ and v′ are adjacent in G_0, they are also adjacent in (G_0)_W and belong to the same connected component of (G_0)_W. Hence the only path between u′ and v′ in (G_0)_W is precisely the edge (u′, v′). Using Theorem 4 to compute the coefficient k_{u′v′|V\(A∪B∪S)}, i.e., the (u′, v′)th coefficient in the inverse of the covariance matrix of the random vector X_W, we obtain

k_{u′v′|V\(A∪B∪S)} = (1/|Σ(W)|) (−1)^{1} σ_{u′v′} |Σ(W \ {u′, v′})|,   (14)

where Σ(W) denotes the covariance matrix of X_W, and Σ(W \ {u′, v′}) denotes the matrix Σ(W) with the rows and the columns corresponding to variables X_{u′} and X_{v′} omitted. Since σ_{u′v′} ≠ 0 and the determinants are positive, we can therefore deduce from (14) that k_{u′v′|V\(A∪B∪S)} ≠ 0. Recall that at the start of the proof we assumed to the contrary that

X_A ⊥⊥ X_B | X_{V\(A∪B∪S)}.   (15)

Note however that we have established that X_{u′} is not independent of X_{v′} given X_{V\(A∪B∪S)}. Hence, since u′ ∈ A and v′ ∈ B, we obtain a contradiction to (15).
• Case 2 : p ∩ (V \ (A ∪ B ∪ S)) ≠ ∅. In this case there exists a pair of vertices (u′, v′) ∈ A × B with u′, v′ ∈ p, such that u′ and v′ are connected by exactly one path p′ ⊆ p in the induced graph (G_0)_W, where W := {u′, v′} ∪ (V \ (A ∪ B ∪ S)). Let us now use Theorem 4 to compute the coefficient k_{u′v′|V\(A∪B∪S)}, i.e., the (u′, v′)-coefficient in the inverse of the covariance matrix of the random vector X_W. Writing p′ = (u′_0 = u′, ..., u′_m = v′),

k_{u′v′|V\(A∪B∪S)} = (1/|Σ(W)|) (−1)^{m} σ_{u′_0 u′_1} ··· σ_{u′_{m−1} u′_m} |Σ(W \ p′)|,   (16)

where Σ(W) denotes the covariance matrix of X_W and Σ(W \ p′) denotes Σ(W) with the rows and the columns corresponding to variables in the path p′ omitted. One can therefore easily deduce from (16) that k_{u′v′|V\(A∪B∪S)} ≠ 0, and hence that X_{u′} is not independent of X_{v′} given X_{V\(A∪B∪S)}. Hence once more, since u′ ∈ A and v′ ∈ B, we obtain a contradiction to (15).
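Theorem 3 can also be stress-tested numerically. The sketch below (assumed numbers and hypothetical helper names) takes a path-shaped covariance graph on four vertices and checks, for every pair (u, v) and every conditioning set S, that the conditional covariance of X_u and X_v given X_S vanishes exactly when V \ ({u, v} ∪ S) separates u and v in G_0, which is the faithfulness statement (8) specialized to singletons.

```python
import numpy as np
from collections import deque
from itertools import combinations

def separates(S, a, b, adjacency):
    """BFS in the graph with S removed: True iff every a-b path meets S."""
    seen, queue = {a}, deque([a])
    while queue:
        x = queue.popleft()
        if x == b:
            return False
        for w in adjacency[x]:
            if w not in S and w not in seen:
                seen.add(w); queue.append(w)
    return True

n, tol = 4, 1e-10
edges = [(0, 1), (1, 2), (2, 3)]          # covariance graph G_0: a path (a tree)
adjacency = {i: set() for i in range(n)}
for x, y in edges:
    adjacency[x].add(y); adjacency[y].add(x)
Sigma = np.eye(n) + 0.3 * (np.eye(n, k=1) + np.eye(n, k=-1))
assert np.all(np.linalg.eigvalsh(Sigma) > 0)

for u, v in combinations(range(n), 2):
    others = [w for w in range(n) if w not in (u, v)]
    for r in range(len(others) + 1):
        for S in map(list, combinations(others, r)):
            if S:  # conditional covariance: sigma_uv - sigma_uS sigma_SS^{-1} sigma_Sv
                cond = Sigma[u, v] - Sigma[np.ix_([u], S)] @ np.linalg.inv(
                    Sigma[np.ix_(S, S)]) @ Sigma[np.ix_(S, [v])]
                ci = abs(cond.item()) < tol
            else:
                ci = abs(Sigma[u, v]) < tol    # marginal independence
            sep = set(range(n)) - {u, v} - set(S)
            # Faithfulness: independence holds iff the complement separates.
            assert ci == separates(sep, u, v, adjacency)
```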
Remark. The dual result of the theorem above, for the case of concentration trees, was proved by Becker et al. (2005). We note however that the argument used in the proof of Theorem 3 cannot be used to prove faithfulness of Gaussian distributions that have trees as concentration graphs. The reason for this is as follows. In our proof we employed the fact that the subgraph (G_0)_{{u,v}∪S} of G_0 induced by a subset {u, v} ∪ S ⊆ V is also the covariance graph associated with the Gaussian sub-random vector of X_V denoted by X_{{u,v}∪S} = (X_w, w ∈ {u, v} ∪ S)′. Hence it was possible to compute the coefficient k_{uv|S}, which quantifies the conditional (in)dependence between u and v given S, in terms of the paths in (G_0)_{{u,v}∪S} and the coefficients of the covariance matrix of X_{{u,v}∪S}. On the contrary, in the case of concentration graphs the subgraph G_{{u,v}∪S} of the concentration graph G induced by {u, v} ∪ S is not in general the concentration graph of the random vector X_{{u,v}∪S}. Hence our approach is not directly applicable in the concentration graph setting.

Conclusion
Faithfulness of a probability distribution to a graph is a crucial assumption that is often made in the probabilistic treatment of graphical models. This assumption describes the ability of a graph to reflect or encode the multivariate dependencies that are present in a joint probability distribution. Much of the methodology in this area does not undertake a detailed analysis of the faithfulness assumption, as such an endeavor requires a more careful and rigorous probabilistic study of the joint distribution at hand. In this note we considered the class of multivariate Gaussian distributions that are Markov with respect to covariance graphs and proved that Gaussian distributions which have trees as their covariance graphs are necessarily faithful. The method of proof employed in this paper is novel in the sense that it is self-contained and yields a completely new approach to demonstrating faithfulness, as compared to the methods traditionally used in the literature. Moreover, it is also vastly different in nature from the proof of the analogous result for concentration graph models. Hence the approach used in this paper promises to have further implications and to yield other insights. Future research in this area will explore whether the techniques used in this paper can be modified to prove or disprove faithfulness for other classes of graphs.

Figure 2: the covariance graph (G_0)_{{2,5,3,8,7}}.

Footnote: if p ∩ (V \ (A ∪ B ∪ S)) is empty then p has to lie entirely in A ∪ B, because by assumption p does not intersect S. The case when p lies in A ∪ B is covered in Case 1, and hence in Case 2 it is assumed that p intersects V \ (A ∪ B ∪ S).