The Letac-Massam conjecture and existence of high dimensional Bayes estimators for Graphical Models

In recent years, a variety of useful extensions of the Wishart distribution have been proposed in the literature for the purposes of studying Markov random fields/graphical models. In particular, generalizations of the Wishart, referred to as Type I and Type II Wishart distributions, have been introduced by Letac and Massam (\emph{Annals of Statistics} 2006) and play important roles in both frequentist and Bayesian inference for Gaussian graphical models. These distributions have been especially useful in high-dimensional settings due to the flexibility offered by their multiple shape parameters. In this paper we resolve a long-standing conjecture of Letac and Massam (LM) concerning the domains of the multi-parameters of graphical Wishart type distributions. This conjecture, posed in \emph{Annals of Statistics}, also relates fundamentally to the existence of Bayes estimators corresponding to these high dimensional priors. To achieve our goal, we first develop novel theory in the context of probabilistic analysis of graphical models. Using these tools, and a recently introduced class of Wishart distributions for directed acyclic graph (DAG) models, we proceed to give counterexamples to the LM conjecture, thus completely resolving the problem. Our analysis also gives useful insights on graphical Wishart distributions, with implications for Bayesian inference for such models.


Introduction
Inference for graphical models is a topic of contemporary interest, and in this regard various tools for inference have been proposed in the statistics literature, including sufficient and/or necessary conditions for the existence of high dimensional estimators. One important contribution in the area is the introduction of the families of Type I and Type II graphical Wishart distributions by Letac and Massam (LM) [14]. These families have the distinct advantage of being standard conjugate for Gaussian graphical models, have attractive hyper Markov properties, and have multiple shape parameters. This is in contrast with the classical Wishart distribution, which has just one shape parameter that is restricted to the one dimensional Gindikin set; see (1.1). These multi-parameter graphical Wishart distributions are therefore useful for flexible high dimensional inference [15], and have also been used as flat conjugate priors for objective Bayesian inference. Since the domain of integrability of these high dimensional priors is not fully identified, it is not clear when these distributions yield proper priors. The LM conjecture aims to address this question formally. The LM conjecture is also critical for understanding when these priors lead to well-defined Bayes estimators, since this is not always guaranteed in high dimensional, sample starved settings. In this sense, resolving the LM conjecture can be viewed as a Bayesian analogue of the frequentist problem of identifying sufficient and necessary conditions for the existence of the maximum likelihood estimator for Gaussian graphical models.
The primary goal of this paper is therefore to resolve a conjecture of Letac and Massam (henceforth the LM conjecture) which concerns identifying the parameter sets for the families of the so-called Type I and Type II Wishart distributions. A definitive solution to the LM conjecture has remained elusive to the graphical models community ever since it was formally posed by Letac and Massam about ten years ago. The conjecture also has deep and profound connections to Gindikin's result [7,8] on the region of integrability of the p-variate Gamma function. This domain is referred to as the Gindikin set and is given as follows: Though the main goal of this paper is to resolve the LM conjecture, we note that understanding the domain of integrability of these graphical Wishart distributions is important for two other reasons beyond Bayesian inference and model selection: 1) These two classes of distributions also serve as statistical models in their own right for matrix-variate distributions defined on sparse subsets of the cone, and 2) The integrals of these graphical Wishart densities are extensions of the gamma and multivariate gamma functions on sparse manifolds. Thus understanding the domains of integrability of these graphical Wishart distributions is of independent mathematical interest that is closely linked to generalizations of the Gindikin set.
In what follows we shall employ the notation introduced in the work of Letac and Massam [14]. The Type I Wishart and Type II Wishart are defined, respectively, on the cones $Q_G$ and $P_G$ associated with a decomposable graph $G$, i.e., an undirected graph that has no induced cycle of length greater than or equal to four. These cones naturally arise as the sets of covariance and inverse-covariance parameters for a Gaussian undirected graph model over $G$, i.e., the family of multivariate Gaussian distributions that obey the pairwise or global Markov property with respect to $G$ [13]. It is well known that if the vertices of $G$ are labeled $1, 2, \ldots, p$, then a $p$-variate Gaussian distribution $N_p(0, \Sigma)$ obeys the global Markov property with respect to $G$ if $(\Sigma^{-1})_{ij} = 0$ whenever there exists no edge between $i$ and $j$. This property gives a simple characterization of the associated inverse-covariance matrices, i.e., the elements of the cone $P_G$. The cone $Q_G$ is the dual cone of $P_G$, and its elements are incomplete covariance matrices in which only the entries along the edges of $G$ are specified and the rest of the entries are unspecified. However, the specified entries are also the only functionally independent entries of the covariance matrix parameter, and they uniquely determine the unspecified entries. In particular, the space of covariance matrices for the Gaussian inverse-covariance graph model over $G$ can be identified with the cone $Q_G$. When $G$ is complete, i.e., in the full model, Type I and Type II Wishart distributions are identical to the classical Wishart distribution. Moreover, by restricting the multi-parameters to a specific one dimensional space, these distributions reduce, respectively, to the hyper Wishart distribution introduced by Dawid and Lauritzen [6] and the G-Wishart defined by Roverato in [16] (see [14] for more details).
Although having multiple shape parameters allows the Type I Wishart and Type II Wishart distributions to be more flexible as prior distributions, there is a trade-off: the sets of multi-parameters are not completely identified.
In an attempt to identify the set of multi-parameters of the Type I Wisharts, denoted by A, and that of the Type II Wisharts, denoted by B, in [14, Section 3.3] Letac and Massam first consider the case when G is homogeneous, i.e., G is decomposable and has no induced paths of length greater than or equal to 4.
When G is homogeneous, Letac and Massam are able to completely identify A and B and, furthermore, give algebraic expressions for the elements of both sets. If G is non-homogeneous, however, in [14, Section 3.4] the authors are only able to partially identify the sets A and B. More specifically, for each perfect order P of the cliques of G, they identify a subset $A_P$ of A and a subset $B_P$ of B. The authors then conjecture that A and B are the unions of $A_P$ and $B_P$, respectively, over all perfect orders of the cliques of G. They demonstrate that the conjecture holds when G is the 4-path, the simplest non-homogeneous decomposable graph. They note that a similar calculation for the 5-path appears insurmountable.
On a different route, but motivated by the recent work of Letac and Massam [14] and Rajaratnam et al. [15] for concentration graph models, and Khare and Rajaratnam [11] for covariance graph models, the authors of this paper undertook a parallel analysis in [3] for directed acyclic graph models, abbreviated DAG models, or Bayesian networks. In [3], we introduce a new class of multi-parameter Wishart type distributions, useful for Bayesian inference for Gaussian DAG models. One of its advantages is that the framework in [3] applies to all directed acyclic graph models and not just the narrower class of perfect DAGs. Furthermore, the normalizing constant for these DAG Wisharts is available in closed form for all DAGs. It is also a well-known fact that the family of inverse-covariance graph models over a decomposable graph G is Markov equivalent to the family of DAG models over a perfect DAG version of G. As we shall demonstrate later, this, in particular, implies that both the Type II Wisharts of Letac-Massam in [14] and the DAG Wisharts in [3] are defined on the same cone $P_G$. Therefore, a relevant question is how the functional form and the multi-parameter set of the Type II Wishart density compare with those of the DAG Wishart.
A similar comparison arises between the Type I Wisharts of Letac-Massam and the Riesz distributions for decomposable graphs introduced by Andersson and Klein in [2]. Such comparisons shed light on the LM conjecture since the domains of integrability of the DAG Wisharts are fully specified in [3]. In this paper we develop tools which allow a careful comparison of these two types of Wisharts on $P_G$ and $Q_G$, leading to counterexamples, which in turn can be used to conclude that the LM conjecture does not hold in general.
The primary key to resolving the part of the LM conjecture that concerns the Type II Wishart is Theorem 5.1 of this paper. In this theorem we show that for any non-homogeneous decomposable graph G there exists a perfect order P and a perfect DAG version of G, associated with this order, such that the Type II Wishart distribution on $B_P$ is a special case of the DAG Wishart distribution. Using this observation, and depending on the perfect DAG version of the underlying graph G, we derive a condition in Proposition 5.1 which, when satisfied, can lead to counterexamples to the LM conjecture. We then present two graphs (with their respective perfect DAG versions) for which the condition in Proposition 5.1 is satisfied, leading to counterexamples to the LM conjecture. The counterexample to the other part of the LM conjecture, concerning the Type I Wishart distribution, is given after Proposition 8.1, where we prove that the same condition as that in Proposition 5.1 can lead to resolving the LM conjecture. In addition to disproving the LM conjecture, we also prove that, not only for non-homogeneous decomposable graphs but also for homogeneous graphs, the family of Type II Wisharts is a subclass of the family of DAG Wisharts.
The organization of the paper is as follows. In §2 we recall some fundamental notation and concepts in graphical models and, in particular, for Gaussian undirected graphical models. In §3 we provide the reader with the definitions of Type I and Type II Wishart distributions and formally state the Letac-Massam conjecture. In §4.1 and §4.2 we give a short introduction to Gaussian DAG models and the families of DAG Wisharts. The main results of the paper are presented in the ensuing four sections. In §5 and §6, we develop tools which enable a detailed comparison between Type II Wisharts on the one hand and, on the other hand, DAG Wisharts for the corresponding DAG versions of the associated undirected graphs. Moreover, tools are developed for comparisons of both decomposable and homogeneous Type II Wisharts to their DAG Wishart counterparts. Using the tools developed in §5 and §6, we formally resolve the Letac-Massam conjecture in §7 and §8 by providing counterexamples.

Preliminaries
We now introduce some preliminaries on graph theory and graphical models. This section closely follows the notation and exposition given in [3] and [4].

Graph theoretic notation and terminology
A graph G is a pair of objects (V, E), where V and E are two disjoint finite sets representing, respectively, the vertices and the edges of G. An edge e ∈ E is said to be undirected if e is an unordered pair {v, v ′ }, or directed if e is an ordered pair (v, v ′ ) for some v, v ′ ∈ V. Now a graph is said to be undirected if all its edges are undirected, and directed if all its edges are directed.
The set of parents of v is denoted by pa(v), and the set of children of v is denoted by ch(v). The family of v is fa(v) := pa(v) ∪ {v}. The set of all neighbors of v is denoted by ne(v). In general two distinct vertices are said to be adjacent, denoted by v ∼ v′, if there exists either a directed or an undirected edge between them. A loop in G is an ordered pair (v, v), or an unordered pair {v, v}, in E. For ease of notation, in this paper we shall always assume that the edge set of each graph contains all the loops; however, we draw the graph without the loops.
A path is said to be directed if at least one of the edges is directed. We say v leads to v ′ , denoted by v → v ′ , if there is a directed path from v to v ′ . A graph G = (V, E) is said to be connected if for any pair of distinct vertices v, v ′ ∈ V there exists a path between them. An n-cycle in G is a path of length n with the additional requirement that the end points are identical. A directed n-cycle is defined accordingly. A graph is acyclic if it does not have any cycles. An acyclic directed graph, denoted by DAG (or ADG), is a directed graph with no cycles of length greater than 1.
Notation. Henceforth in this paper, we denote an undirected graph by G = (V, E) and a DAG by D = (V, F). Also, unless otherwise stated, we always assume that the vertex set is V = {1, 2, . . . , p}.
The undirected version of a DAG D = (V, F), denoted by $D^u = (V, F^u)$, is the undirected graph obtained by replacing all the directed edges of D by undirected ones. An immorality in D is an induced subgraph of the form v → v′ ← v′′. Moralizing an immorality entails adding an undirected edge between the pair of parents that have the same children. Then the moral graph of D, denoted by $D^m = (V, F^m)$, is the undirected graph obtained by first moralizing each immorality of D and then making the undirected version of the resulting graph. Naturally there are DAGs which have no immoralities, and this leads to the following definition.
Definition. A DAG D is said to be perfect if it has no immoralities.
Given a directed acyclic graph (DAG), the set of ancestors of a vertex v, denoted by an(v), is the set of those vertices v′′ such that v′′ → v. Similarly, the set of descendants of a vertex v, denoted by de(v), is the set of those vertices v′ such that v → v′. The set of non-descendants of v is nd(v) = V \ (de(v) ∪ {v}). A set A ⊆ V is said to be ancestral when A contains the parents of its members. The smallest ancestral set containing a set B ⊆ V is denoted by An(B).
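The notions of immorality, moral graph and perfect DAG can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: a DAG is encoded as a set of ordered pairs (u, w) meaning u → w, and the function names are hypothetical, not taken from [3] or [14].

```python
from itertools import combinations

def parents(edges, v):
    """Parents of v in a DAG given as a set of pairs (u, w) meaning u -> w."""
    return {u for (u, w) in edges if w == v}

def immoralities(vertices, edges):
    """All triples (a, v, b) with a -> v <- b where a and b are non-adjacent."""
    adjacent = {frozenset(e) for e in edges}
    found = []
    for v in vertices:
        for a, b in combinations(sorted(parents(edges, v)), 2):
            if frozenset((a, b)) not in adjacent:
                found.append((a, v, b))
    return found

def moral_graph(vertices, edges):
    """Undirected edge set of the moral graph: marry parents, then drop directions."""
    undirected = {frozenset(e) for e in edges}
    for a, _, b in immoralities(vertices, edges):
        undirected.add(frozenset((a, b)))
    return undirected

def is_perfect(vertices, edges):
    """A DAG is perfect when it has no immoralities."""
    return not immoralities(vertices, edges)

# A directed version of the 4-path, 4 -> 3 -> 2 -> 1, has no immoralities.
chain = {(4, 3), (3, 2), (2, 1)}
# The collider 2 -> 1 <- 3 is an immorality because 2 and 3 are non-adjacent.
collider = {(2, 1), (3, 1)}
```

For the chain, `is_perfect` returns `True` and the moral graph coincides with the undirected version; for the collider, moralization adds the edge {2, 3}.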

Decomposable and homogeneous graphs
Let G be a decomposable graph. The reader is referred to Lauritzen [13] for all the common notions of decomposable graphs that we will use here. One such important notion is that of a perfect order of the cliques. Every decomposable graph admits a perfect order of its cliques. Let $(C_1, \ldots, C_r)$ be one such perfect order of the cliques of the graph G. The history for the graph is given by $H_1 = C_1$ and
$$H_j = C_1 \cup C_2 \cup \cdots \cup C_j, \quad j = 2, \ldots, r.$$
The separators of the graph are given by
$$S_j = H_{j-1} \cap C_j, \quad j = 2, \ldots, r.$$
The residuals are defined as follows:
$$R_j = C_j \setminus H_{j-1}, \quad j = 2, \ldots, r.$$
Generally, we will denote by $\mathcal{C}_G$ the set of cliques of a graph and by $\mathcal{S}_G$ its set of separators. Let $r' \le r - 1$ denote the number of distinct separators and let $\nu(S)$ denote the multiplicity of S, i.e., the number of j such that $S_j = S$. Decomposable (undirected) graphs and (directed) perfect graphs have a deep connection. If G is decomposable, then there exists a perfect DAG version of G, i.e., a perfect DAG D such that $D^u = G$. On the other hand, the undirected version of a perfect DAG is necessarily decomposable [9,13].
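The following sketch computes histories, separators and residuals for an ordered list of cliques and checks the running intersection property. It assumes the standard definitions from Lauritzen [13], namely H_j = C_1 ∪ ... ∪ C_j, S_j = H_{j-1} ∩ C_j and R_j = C_j \ H_{j-1}; the function names are illustrative only.

```python
def clique_sequence_parts(cliques):
    """Histories H_j, separators S_j (j >= 2) and residuals R_j (j >= 2)."""
    histories, separators, residuals = [], [], []
    seen = set()
    for j, clique in enumerate(cliques):
        clique = set(clique)
        if j > 0:
            separators.append(seen & clique)  # S_j = H_{j-1} ∩ C_j
            residuals.append(clique - seen)   # R_j = C_j \ H_{j-1}
        seen = seen | clique
        histories.append(set(seen))           # H_j = C_1 ∪ ... ∪ C_j
    return histories, separators, residuals

def is_perfect_order(cliques):
    """Running intersection: every S_j is contained in some earlier clique."""
    _, separators, _ = clique_sequence_parts(cliques)
    return all(
        any(sep <= set(cliques[i]) for i in range(j + 1))
        for j, sep in enumerate(separators)
    )

# Cliques of the 4-path 1 - 2 - 3 - 4.
C1, C2, C3 = {1, 2}, {2, 3}, {3, 4}
histories, separators, residuals = clique_sequence_parts([C1, C2, C3])
```

With these cliques, the orders (C1, C2, C3) and (C2, C1, C3) pass the check, whereas (C1, C3, C2) fails because its last separator {2, 3} is not contained in an earlier clique.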
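A decomposable graph G is said to be homogeneous if for any two adjacent vertices i, j we have
$$\{i\} \cup \mathrm{ne}(i) \subseteq \{j\} \cup \mathrm{ne}(j) \quad \text{or} \quad \{j\} \cup \mathrm{ne}(j) \subseteq \{i\} \cup \mathrm{ne}(i).$$
The reader is referred to Letac and Massam [14] for all the common notions of homogeneous graphs.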
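As a quick check of homogeneity on small graphs, the sketch below tests the usual nesting condition on closed neighborhoods: for adjacent i and j, one of {i} ∪ ne(i) and {j} ∪ ne(j) must contain the other. This reading of the definition, the encoding of the graph as an edge list, and the function names are assumptions of the sketch; decomposability is taken for granted and not verified.

```python
def closed_neighborhoods(vertices, edges):
    """Map each vertex v to {v} ∪ ne(v) for an undirected edge list."""
    nbrs = {v: {v} for v in vertices}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return nbrs

def has_nested_neighborhoods(vertices, edges):
    """True when closed neighborhoods of adjacent vertices are always nested."""
    n = closed_neighborhoods(vertices, edges)
    return all(n[a] <= n[b] or n[b] <= n[a] for a, b in edges)

four_path = [(1, 2), (2, 3), (3, 4)]  # 1 - 2 - 3 - 4
star = [(1, 2), (1, 3), (1, 4)]       # center 1 with three leaves
```

On the 4-path the edge {2, 3} violates the nesting condition, in agreement with the statement that the 4-path is non-homogeneous, while the star satisfies it.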

Undirected Gaussian Graphical Models
Let $G = (V, E)$ be an undirected graph with $V = \{1, \ldots, p\}$ and let $X = (X_1, \ldots, X_p)^\top$ be a random vector in $\mathbb{R}^p$ such that $X \sim N_p(0, \Sigma)$, i.e., X has a p-variate Gaussian distribution with mean zero and covariance Σ. The covariance matrix Σ is assumed to be positive definite (written as Σ ≻ 0) with inverse-covariance matrix (also called the precision or concentration matrix) $\Omega := \Sigma^{-1}$. Now for any two vertices i, j ∈ V,
$$X_i \perp X_j \mid X_{V \setminus \{i, j\}} \iff \Omega_{ij} = 0.$$
A simple proof of this well-known fact can be found in [13, Section 5.1]. In particular, the distribution $N_p(0, \Sigma)$ is a Markov random field over G if and only if
$$\Omega_{ij} = 0 \quad \text{whenever } \{i, j\} \notin E. \qquad (2.2)$$
Now let N(G) denote the family of all p-variate Gaussian distributions $N_p(0, \Sigma)$ that are Markov random fields over G. Note that Equation (2.2) provides an easy description of the elements of N(G) in terms of the pattern of zeros in the associated inverse-covariance matrices. Subsequently, N(G) is said to be the Gaussian concentration graph model over G. The set of covariance matrices
$$PD_G := \{\Sigma \succ 0 : (\Sigma^{-1})_{ij} = 0 \text{ whenever } \{i, j\} \notin E\}$$
is the standard parameter set for N(G). In light of Equation (2.2), the distributions in the exponential family N(G) can be parametrized by the canonical parameter $\Omega = \Sigma^{-1}$, which lives in the space of inverse-covariance matrices defined as follows:
$$P_G := \{\Omega \succ 0 : \Omega_{ij} = 0 \text{ whenever } \{i, j\} \notin E\}.$$
Let $S_p$ denote the set of p × p symmetric matrices, and let $I_G$ denote the space of symmetric matrices in which only the entries indexed by the edges of G are specified. We call each element in $I_G$ a G-incomplete matrix. One can easily check that $I_G$ is a real linear space. A G-incomplete matrix is said to be partial positive definite if each of its fully specified principal submatrices is positive definite. Note that for any decomposable graph G, the set of partial positive definite matrices over G, denoted by $Q_G$, is the dual cone of the (open convex) cone $P_G$ [14]. When G is decomposable, Grone et al. [10] prove that each $\Gamma \in Q_G$ can be completed to a unique positive definite matrix $\Sigma = \Sigma(\Gamma) \in PD_G$. This means that Σ is the only element in $PD_G$ with the property that $\Sigma_{ij} = \Gamma_{ij}$ for each {i, j} ∈ E. If $\Sigma^E$ denotes an element of $Q_G$ with the unique positive definite completion Σ in $PD_G$, then Grone et al. [10] explicitly provide a bijective mapping $\Sigma^E \mapsto \Sigma : Q_G \to PD_G$.
If we compose this mapping with the inverse mapping $\Sigma \mapsto \Sigma^{-1} : PD_G \to P_G$, then we obtain the bijective mapping $\Sigma^E \mapsto \Sigma^{-1} : Q_G \to P_G$. The corresponding inverse mapping is given as $\Omega \mapsto (\Omega^{-1})^E : P_G \to Q_G$. We shall frequently invoke these mappings in subsequent sections.
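To illustrate the completion map in the simplest non-trivial case, consider the path 1 - 2 - 3, where only the entry (1, 3) is unspecified. The sketch below fills it in with the single-separator formula Σ_13 = Σ_12 Σ_22^{-1} Σ_23 and checks that the inverse has a zero in position (1, 3). This covers only the three-vertex path; the general construction of Grone et al. [10] is not reproduced, and the numerical values and function names are illustrative assumptions.

```python
from fractions import Fraction

def complete_path3(s11, s12, s22, s23, s33):
    """Complete a partial matrix on the path 1 - 2 - 3 by filling entry (1, 3)."""
    s13 = Fraction(s12) * Fraction(s23) / Fraction(s22)
    return [
        [Fraction(s11), Fraction(s12), s13],
        [Fraction(s12), Fraction(s22), Fraction(s23)],
        [s13, Fraction(s23), Fraction(s33)],
    ]

def minor(m, i, j):
    """Determinant of the 2 x 2 matrix left after deleting row i and column j."""
    sub = [[x for l, x in enumerate(row) if l != j]
           for k, row in enumerate(m) if k != i]
    return sub[0][0] * sub[1][1] - sub[0][1] * sub[1][0]

def det3(m):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    return sum((-1) ** j * m[0][j] * minor(m, 0, j) for j in range(3))

def inverse3(m):
    """Inverse of a 3 x 3 matrix via the adjugate formula."""
    d = det3(m)
    return [[(-1) ** (i + j) * minor(m, j, i) / d for j in range(3)]
            for i in range(3)]

sigma = complete_path3(2, 1, 2, 1, 2)  # specified entries of the partial matrix
omega = inverse3(sigma)                # corresponding element of P_G
```

Exact rational arithmetic is used so that the zero in the inverse is exact rather than a floating point approximation.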

The Letac-Massam Wishart type distributions for decomposable graphs
Henceforth in this paper, we assume that G = (V, E) is a decomposable graph and the vertices are labeled 1, 2, . . . , p. The primary goal of this section is to provide the reader with an overview of the families of Wishart-Type I and Wishart-Type II distributions introduced in [14]. At the end of this section we shall formally state the LM conjecture concerning the domains of the multi-parameters for these distributions.

Markov ratios and corresponding measures on Q G and P G
Let $C_1, \ldots, C_r$ be a perfect order of the cliques of G and let $(S_2, \ldots, S_r)$ be the corresponding sequence of separators, with possible repetitions. For each $\alpha \in \mathbb{R}^r$, $\beta \in \mathbb{R}^{r-1}$ and $\Sigma^E \in Q_G$, the Markov ratio $H_G(\alpha, \beta, \Sigma^E)$ is defined as follows: Let $c := (c_1, \ldots, c_r)$ and $s := (s_2, \ldots, s_r)$, where $c_i := |C_i|$ and $s_i := |S_i|$, respectively. Moreover, let $d\Sigma^E$ denote Lebesgue measure on $Q_G$ (more precisely, $d\Sigma^E$ is the standard Lebesgue measure on $I_G$ restricted to the open set $Q_G$). Then where $d\Omega$ is Lebesgue measure on $P_G$ [14].

Type I & II Wishart distributions
We now introduce the Type I and Type II Wishart distributions from [14]. The Type I Wishart is a distribution defined on the cone Q G . The non-normalized density of this distribution is given by where (α, β) ∈ R r × R r−1 denotes the multi-shape parameter and U E ∈ Q G is the scale parameter. The normalized version of ω Q G , denoted by W Q G , is defined for pairs of (α, β) such that for every The Type II Wishart is a distribution on the cone P G with the non-normalized density Similarly, the normalized version of ω P G , denoted by W P G , is defined for pairs of (α, β) such that for every The space of multi-shape parameter for the family of Type I Wisharts, i.e., the set of pairs (α, β) that satisfy both conditions (A1) and (A2), is denoted by A. Likewise, the space of multi-shape parameter for the family of Type II Wisharts, i.e., the set of pairs (α, β) that satisfy conditions (B1) and (B2), is denoted by B.

The LM conjecture for identifying A and B
After defining Type I & II Wishart distributions, an important goal of Letac & Massam in [14] is to identify A and B, the associated spaces of multi-shape parameters. When the underlying graph G is homogeneous both A and B are completely identified in [14], but when G is no longer homogeneous, these spaces are only partially identified. More precisely, Letac & Massam [14] identify a subset of A and a subset of B as follows.
Let $P = (C_1, \ldots, C_r)$ be a given perfect order of the cliques of G and $(S_2, \ldots, S_r)$ the corresponding sequence of separators. For each separator $S \in \mathcal{S}_G$ let $J(P, S) := \{j : S_j = S\}$. A set associated with P and A, denoted by $A_P$, is the set of $(\alpha, \beta) \in \mathbb{R}^r \times \mathbb{R}^{r-1}$ such that: Similarly, a set associated with P and B, denoted by $B_P$, is the set of (α, β) such that: Letac and Massam [14] prove that if G is a non-complete decomposable graph, then $A_P \subseteq A$ and $B_P \subseteq B$. Therefore, $\bigcup_P A_P \subseteq A$ and $\bigcup_P B_P \subseteq B$, where the subscript P runs through all perfect orders of the cliques of G. When G is homogeneous, Letac and Massam [14] establish that $\bigcup_P A_P \subsetneq A$ and $\bigcup_P B_P \subsetneq B$, but in the case of an arbitrary non-homogeneous decomposable graph they conjecture that equalities hold. We now proceed to formally state the Letac-Massam conjecture.

Theorems 3.3 & 3.4 in
The Letac-Massam (LM) Conjecture. Let G be a non-homogeneous decomposable graph and let Ord(G) denote the set of the perfect orders of the cliques of G. Then
$$A = \bigcup_{P \in \mathrm{Ord}(G)} A_P \quad \text{and} \quad B = \bigcup_{P \in \mathrm{Ord}(G)} B_P.$$
Remark 3.1. Note that for each perfect order P = (C 1 , . . . , C r ) of the cliques of a decomposable graph G the sets A P and B P , as manifolds, are of dimension r + 1. Therefore, the LM conjecture asserts that A and B are also of dimension r + 1.

The DAG Wishart distributions for directed Markov random fields
One of the main goals of this paper is to study the LM conjecture and formally demonstrate that it does not hold in general. Our goal is slightly broader, as we are also interested in understanding when exactly the LM conjecture does not hold. In particular, we aim to identify graph characteristics which lead to a violation of the LM conjecture. Our approach is to develop tools which will allow us to compare the Type I & II Wishart distributions, respectively, with the generalized versions of Riesz distributions introduced by Andersson and Klein [2] for perfect DAGs, and the DAG Wishart distributions introduced by the present authors in [3]. We demonstrate that relating the LM conjecture to the class of DAGs (and not just undirected graphical models) can provide valuable insights, since we are able to completely characterize the domain of integrability of the Wisharts associated with DAG models. We begin with a compact review of the DAG Wisharts given in [3].

Gaussian DAG models
Inference for Gaussian DAG models provides the main motivation for developing the DAG Wisharts in [3]. We give a brief introduction here.
The relation D clearly defines a partial order on V. Since every partial order can be extended to a linear order [17], without loss of generality we can assume that the vertices in V are labeled 1, 2, . . . , p, and that for each i, j ∈ V, if i → j, then i > j. This order corresponds to the parent order of the vertices of the DAG. Now let the random vector $X = (X_1, \ldots, X_p)^\top \in \mathbb{R}^p$ be a directed Markov random field (or DAG) over D. Thus X obeys the directed local Markov property with respect to If, in addition, $X \sim N_p(0, \Sigma)$, then a simple observation in [1] shows that the directed local Markov property in Equation (4.1) is satisfied if and only if Σ ≻ 0 and We define the Gaussian DAG model, denoted by N(D), to be the family of all centered Gaussian distributions $N_p(0, \Sigma)$ which are directed Markov random fields over D. It is easily seen that the distributions in N(D) can be parameterized by the space of covariance matrices These distributions can also be parametrized by the space of inverse-covariance matrices $P_D$ := Ω : Other important parameterizations of the distributions in N(D) are available in terms of the modified Cholesky decompositions of the inverse-covariance matrices, i.e., $\Sigma^{-1} = L \Lambda L^\top$ such that L is a lower triangular matrix with all diagonal entries equal to 1, and Λ is a diagonal matrix. Note that for two distinct vertices i, j, if i is not a parent of j, then $L_{ij} = 0$. We refer the reader to [1,3,4] for more details.

The DAG Wishart distribution for perfect DAGs
Let D be a perfect DAG. First note that a random vector X in R p is a DAG over D if and only if it is an undirected graphical model over D u , the undirected version of D (which is also necessarily decomposable) [18]. This implies that P D and PD D are, respectively, identical to P D u and PD D u (see §2.3 for definitions).
In particular, P D is an open convex cone. The DAG Wishart distribution π P D , as we shall define here is a distribution on P D [3]. We first define, π P D , the non-normalized version of π P D as follows: where the multi-shape parameter η lives in R p , U ≻ 0, D j j : and pa j = |pa ( j) |. From Theorem 4.1 in [3] the domain of integrability of the DAG Wishart distribution can be fully characterized: Moreover, if η j > pa j + 2 ∀ j ∈ V, then the normalizing constant is given by where j := pa( j) ∪ { j}.
Let $\Omega \in P_D$ and let $\Omega = L \Lambda L^\top$ be the modified Cholesky decomposition of Ω. Then L is a lower triangular matrix with all diagonal entries equal to one and $L_{ij} = 0$ for all $(i, j) \notin F$, and Λ is a diagonal matrix such that $\Lambda_{jj} = (\Sigma_{jj|\prec j \succ})^{-1} = D_{jj}^{-1}$ [18].
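As a small numerical illustration of the modified Cholesky decomposition Ω = LΛL^⊤, the sketch below computes the unit lower triangular factor L and the diagonal of Λ by the standard LDL^⊤ recursion in exact arithmetic. The example matrix (a tridiagonal precision matrix compatible with the chain 3 → 2 → 1 under the parent ordering) and the function names are illustrative assumptions, not taken from [3].

```python
from fractions import Fraction

def modified_cholesky(omega):
    """Return (L, lam) with omega = L diag(lam) L^T and L unit lower triangular."""
    p = len(omega)
    L = [[Fraction(int(i == j)) for j in range(p)] for i in range(p)]
    lam = [Fraction(0)] * p
    for j in range(p):
        # Diagonal entry: remove the contribution of earlier columns.
        lam[j] = Fraction(omega[j][j]) - sum(L[j][k] ** 2 * lam[k] for k in range(j))
        for i in range(j + 1, p):
            partial = sum(L[i][k] * L[j][k] * lam[k] for k in range(j))
            L[i][j] = (Fraction(omega[i][j]) - partial) / lam[j]
    return L, lam

def rebuild(L, lam):
    """Multiply the factors back together: L diag(lam) L^T."""
    p = len(lam)
    return [[sum(L[i][k] * lam[k] * L[j][k] for k in range(p)) for j in range(p)]
            for i in range(p)]

omega = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
L, lam = modified_cholesky(omega)
D = [1 / x for x in lam]  # D_jj = 1 / Lambda_jj, the conditional variances
```

In this example L has a zero in position (3, 1), matching the fact that vertex 3 is not a parent of vertex 1, and the product of the entries of `D` equals the determinant of the covariance matrix Ω^{-1}.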

Comparing decomposable Type II Wisharts with perfect DAG Wisharts
We now proceed to compare Type II Wisharts for decomposable graphs with DAG Wisharts for perfect DAGs. We had noted earlier that the class of decomposable graphs is Markov equivalent to the class of perfect DAGs. In particular, every probability distribution that obeys the global Markov property with respect to a decomposable graph also obeys the local directed Markov property with respect to a perfect DAG version and vice versa. In the Gaussian setting this means that if D is a perfect DAG version of G, then N(G) and N(D) define the same family of p-variate Gaussian distributions. Consequently, the family of Type II Wisharts and the family of DAG Wisharts are both defined on $P_G = P_D$, the space of inverse-covariance matrices. Therefore, a relevant question is how the functional form of the Type II Wishart density compares with that of the DAG Wishart. First, to facilitate comparison, we re-parameterize the DAG Wishart $\pi_{P_D}$ as follows. For each j = 1, . . . , p let the expressions of the form $-\frac{1}{2}\eta_j + pa_j + 2$ in Equation (4.4) be replaced by $\gamma_j$, and let U be replaced by 2U. Let $\gamma := (\gamma_1, \ldots, \gamma_p)$. Under this parametrization, with a slight abuse of notation, we write where the normalizing constant $z_D(\gamma)$ exists if and only if $\gamma_j < pa_j/2 + 1$ for each j = 1, . . . , p and (5.1) Note also that for each perfect directed version D of G the exponential term $\exp\{-\mathrm{tr}(\Omega^{-1} U)\}$ is a common term in both the Type II Letac-Massam Wishart $W_{P_G}$ and the DAG Wishart $\pi_{P_D}$. Moreover, the DAG Wishart $\pi_{P_D}$ has additional terms only of the form $D_{jj}^{\gamma_j}$. Before comparing $W_{P_G}$ and $\pi_{P_D}$ more generally, we first illustrate the comparison with an example.
Example 5.1. Let G be the 4-path given in Figure 1(a). Note that G is a non-homogeneous decomposable graph. It is clear that the DAG given in Figure 1 To compare the corresponding Letac-Massam Type II Wishart and the DAG Wishart for this graph, we rewrite the Markov ratio present in the density of $W_{P_G}$ as follows.
As shown in section 3.4 of [14], one can check that B = B P 1 ∪ B P 2 where P 1 = (C 1 , C 2 , C 3 ) and P 2 = (C 2 , C 1 , C 3 ) are the perfect orders of the cliques of G and Note that unless α 3 − β 3 + 1 2 = 0 and α 1 − β 2 + 1 2 = 0 ( i.e., (α, β) is restricted to the intersection of B P 1 and B P 2 ) the expression in Equation (5.2) contains some terms different from the product of D j j to some powers. Since the DAG Wishart has polynomial terms only of the form D γ j j j , it is clear that for this directed version of G, W P G and π P D are not directly comparable. Note that we did not need to account for two additional perfect orders P ′ 1 = (C 3 , C 2 , C 1 ) and P ′ 2 = (C 2 , C 3 , C 1 ) since it has been shown in [14] that B P ′ 1 = B P 1 and B P ′ 2 = B P 2 . Now consider the comparison with the perfect DAG version given in Figure 1 Therefore, for this directed version of G, the family of Type II Wisharts, restricted to B P 1 , is a subfamily of the family of DAG Wisharts.
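Returning to the re-parameterization introduced at the start of this section, the following sketch checks numerically that the two stated integrability conditions agree. It assumes the reading γ_j = −η_j/2 + pa_j + 2 of the re-parameterization, which is the reading under which the condition η_j > pa_j + 2 translates exactly into γ_j < pa_j/2 + 1; the function names and the example parent counts (those of a chain on four vertices) are illustrative assumptions.

```python
from fractions import Fraction

def gamma_from_eta(eta, pa_sizes):
    """Assumed re-parameterization: gamma_j = -eta_j / 2 + pa_j + 2."""
    return [-Fraction(e) / 2 + pa + 2 for e, pa in zip(eta, pa_sizes)]

def integrable_eta(eta, pa_sizes):
    """Integrability condition in the original parameter: eta_j > pa_j + 2."""
    return all(e > pa + 2 for e, pa in zip(eta, pa_sizes))

def integrable_gamma(gamma, pa_sizes):
    """Integrability condition after re-parameterization: gamma_j < pa_j / 2 + 1."""
    return all(g < Fraction(pa, 2) + 1 for g, pa in zip(gamma, pa_sizes))

# Parent counts for the chain 4 -> 3 -> 2 -> 1: vertices 1, 2, 3 have one parent each.
pa_sizes = [1, 1, 1, 0]
eta_inside = [4, 4, 4, 3]    # satisfies eta_j > pa_j + 2 for every j
eta_boundary = [3, 4, 4, 3]  # eta_1 = pa_1 + 2 lies on the boundary
```

Both conditions accept `eta_inside` and both reject `eta_boundary`, as expected from the algebra: γ_j < pa_j/2 + 1 is equivalent to −η_j/2 < −pa_j/2 − 1, i.e. η_j > pa_j + 2.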
In Example 5.1 we illustrated the fact that although $W_{P_G}$ is not necessarily comparable with $\pi_{P_D}$ for an arbitrary perfect DAG version D of G, it is comparable for some particular DAGs. We will show next that this conclusion can be generalized to any decomposable graph. To this end, we proceed with a few useful lemmas. In particular, we introduce tools that will allow us to relate a decomposable graph, together with a given perfect ordering of its cliques, to perfect DAGs and vice versa. These tools turn out to be critical ingredients in comparing the undirected Letac-Massam Wisharts to the directed DAG Wisharts. Before we proceed with the next lemma we introduce some convenient notation and a definition.

b) Conversely, suppose D is a perfect DAG version of G.
Then there exists a perfect order P, of the cliques of G, such that D is induced by P.
Proof. a) Suppose, to the contrary, that D is not perfect. Let j be the smallest integer such that D H j , the induced DAG on H j , is not perfect. It is clear that 1 This, in particular, implies that there are two distinct cliques C j 1 and C j 2 , with subscript j 1 , j 2 ≤ j, such that they contain v, v ′ and v ′ , v ′′ , respectively. Since j 1 and j 2 are distinct we may assume that j 1 < j. But since H j−1 is ancestral and v ′′ is a parent of v ′ ∈ H j−1 we must have v ′′ ∈ H j−1 . This contradicts the fact that the induced DAG on H j−1 is perfect. Now we show that in particular there exists a DAG D induced by P such that S 2 is ancestral in D. First consider the case where there are only two cliques. We start with relabeling the vertices in S 2 , H 1 \ S 2 and R 2 , respectively, in a decreasing order. If D is the DAG version of G induced by this order, then S 2 and H 1 are ancestral in D. Now suppose that such a DAG version exists for any decomposable graph with number of cliques less than r ≥ 3. By the mathematical induction there exists a DAG version D ′ of G H r−1 such that S 2 , H 1 , . . . , H r−2 are ancestral in D ′ . Without loss of generality, we can assume that the vertices in D ′ are labeled from p, . . . , p − |R r |. Let us label the vertices in R r from 1, . . . , |R r | and let D be the DAG version of G induced by this order. One can easily check that D has the desired properties. b) Suppose that by mathematical induction the lemma holds for any DAG with fewer than p vertices. Now we assume that G is a decomposable graph with p vertices. Clearly, we can assume that p ≥ 2. Let G ′ be the induced graph on V \ {1}. By our induction hypothesis, there is a perfect order P ′ = (C 1 , . . . , C k ) of G ′ such that D ′ , the induced DAG on V \ {1}, is induced by P ′ . Note that C := 1 = is a clique of G. Consider two possible cases: a) There is an i such that C i is not a clique in G. Then C = C i ∪ {1}. 
This implies that for each j ≠ i, $C_j$ remains a clique in G. Let us replace $C_i$ with C and define P = (C 1 , . . . , C i , . . . , C k ).
One can easily check that P is a perfect order of G, and D is induced by P.
b) For every i = 1, . . . , k, C i is a clique in G. Let i 1 := max{i : C i → C} and i 2 := min{i : C → C i }.
We use the convention that $\max \emptyset = -\infty$ and $\min \emptyset = +\infty$. First suppose $i_1, i_2$ are both finite. Thus we have $C_{i_1} \to C \to C_{i_2}$ and therefore $i_1 < i_2$, because by our induction hypothesis the histories of $P'$ are ancestral in $D'$. One can check that $P = (C_1, \ldots, C_{i_1}, C, \ldots, C_k)$ is then a perfect order of $G$ and $D$ is induced by $P$. If $i_1 = -\infty$ or $i_2 = +\infty$, then by placing the clique $C$ at the beginning or at the end of $(C_1, \ldots, C_k)$, respectively, we obtain a perfect order $P$, and in either case $D$ is induced by such $P$.

i) For each $i, j \in V$, if $i \in \mathrm{pa}(j)$, then $L_{ij} = -\beta_{ji}$, where $\beta_{ji}$ is the partial regression coefficient of $X_i$ in the linear regression of $X_j$ on $X_{\prec j \succ}$, and $D_{jj} = \Sigma_{jj \mid \prec j \succ}$.

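ii) If $A$ is an ancestral subset of $V$, then $(\Sigma_A)^{-1} = L_A (D_A)^{-1} L_A^{\top}$ is the modified Cholesky decomposition of $(\Sigma_A)^{-1}$.
In particular, $\det(\Sigma_A) = \prod_{j \in A} D_{jj}$ (also see [12] for a related result).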
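The determinant identity in part ii) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the four-vertex DAG `parents`, the entries of `L`, and the diagonal `D_diag` are hypothetical choices (not taken from the paper), and the precision matrix is assumed to be built as $L D^{-1} L^{\top}$ with $L$ unit lower triangular and $L_{ij} \neq 0$ only when $i \in \mathrm{pa}(j)$. Exact rational arithmetic is used so that equalities hold without rounding.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Exact Gauss-Jordan inverse of a nonsingular matrix of Fractions."""
    n = len(a)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def det(a):
    """Exact determinant by Gaussian elimination."""
    m = [list(row) for row in a]
    n, d = len(m), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if m[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return d


# Hypothetical DAG on {1, 2, 3, 4}: parents carry larger labels than children.
parents = {1: [2], 2: [3, 4], 3: [4], 4: []}
p = 4
D_diag = [Fraction(1, 2), Fraction(2), Fraction(3), Fraction(5)]  # assumed D_jj

# Unit lower-triangular L with L[i][j] nonzero only if i is a parent of j.
L = [[Fraction(int(i == j)) for j in range(p)] for i in range(p)]
for child, pa in parents.items():
    for parent in pa:
        L[parent - 1][child - 1] = Fraction(parent + child, 7)  # arbitrary nonzero value

D_inv = [[1 / D_diag[i] if i == j else Fraction(0) for j in range(p)] for i in range(p)]
omega = matmul(matmul(L, D_inv), transpose(L))  # precision matrix L D^{-1} L^T
sigma = inverse(omega)                          # covariance matrix


def is_ancestral(subset):
    """A set is ancestral if it contains the parents of each of its members."""
    return all(set(parents[v]) <= set(subset) for v in subset)


def det_sigma(subset):
    idx = sorted(v - 1 for v in subset)
    return det([[sigma[i][j] for j in idx] for i in idx])


def prod_D(subset):
    out = Fraction(1)
    for v in subset:
        out *= D_diag[v - 1]
    return out


for A in ({4}, {3, 4}, {2, 3, 4}, {1, 2, 3, 4}):
    print(sorted(A), is_ancestral(A), det_sigma(A) == prod_D(A))
```

For the non-ancestral set $\{1\}$ the identity is not expected to hold: in this toy example $\Sigma_{11}$ exceeds $D_{11}$, because the variance of $X_1$ also includes the contribution of its parent.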
ii) Since $A$ is ancestral in $D$, by using Equation (4.3), one can easily show that $X_A \sim N_{|A|}(0, \Sigma_A) \in \mathcal{N}(D_A)$. Now let $(\Sigma_A)^{-1} = K F^{-1} K^{\top}$ be the modified Cholesky decomposition of $(\Sigma_A)^{-1}$. Part i) and the fact that $A$ is ancestral imply that $K_{ij} = -\beta_{ji} = L_{ij}$ whenever $i \in \mathrm{pa}(j)$, and $F_{jj} = \Sigma_{jj \mid \prec j \succ} = D_{jj}$. This implies that $K = L_A$ and $F = D_A$.

Theorem 5.1. For every perfect order $P = (C_1, \ldots, C_r)$ of the cliques of a decomposable graph $G$ there exists a perfect DAG version $D$ of $G$ such that the family of distributions $W_{P_G}$, restricted to $B_P$, is a subfamily of the family of distributions $\pi_{P_D}$.
Proof. Let $D$ be a DAG version of $G$ induced by $P$. First note that by Lemma 5.1 $D$ is perfect. Therefore $P_D$ is identical to $P_G$ and $\pi_{P_D}$ is indeed a distribution on $P_G$. In comparing $W_{P_G}$ with $\pi_{P_D}$ it suffices to show that the Markov ratio that appears in the density of $W_{P_G}$ can be written as a product of powers of the $D_{jj}$. We proceed to rewrite the corresponding Markov ratio as follows: Let $K_j := H_j \setminus C_j$ for each $j = 2, \ldots, r$. Consider the following block-partitioning of $\Sigma_{H_j}$.
Now for each $j = 2, \ldots, r$ we have $\Sigma_{H_j} \in \mathrm{PD}_{G_{H_j}}$, and $S_j$ separates $R_j$ from $K_j$. By Lemma 5.5 [13] we have … By rewriting this and using Lemma 5.2 we obtain … Similarly, Lemma 5.2 implies that $\det \Sigma_{S_2} = \prod_{\ell \in S_2} D_{\ell\ell}$ and $\det \Sigma_{R_1 \mid S_2} = \prod_{\ell \in R_1} D_{\ell\ell}$. Now note that if $(\alpha, \beta) \in B_P$ and $S \ne S_2$, then $\sum_{j \in J(P,S)} \bigl(\alpha_j + \tfrac{c_j - |S|}{2}\bigr) - \nu(S)\,\beta(S) = 0$. Thus when the shape parameters are restricted to $B_P$ the Markov ratio above is only a product of powers of the $D_{jj}$.

The next proposition is essential for our purposes as it gives us the recipe that we need to construct counterexamples to the LM conjecture (II). Note that if the LM conjecture (II) is true, then the dimension of the set $B$, as a manifold, is $r + 1$. First we introduce a new notation as follows. … in Equation (5.3) can be written as a product of powers of the $D_{jj}$. This in turn implies that for each $S \in \mathcal{S}^{D}_{G}$, restricting $(\alpha, \beta)$ to the equation is not necessary and, consequently, the dimension of the corresponding set, as a manifold, is at least $r + r_D$.
Remark 5.2. As we mentioned earlier, the LM conjecture (II) implies that for any decomposable graph $G$ the dimension of $B$ is $r + 1$. In light of Proposition 5.1, the LM conjecture (II) therefore suggests that $r_D \le 1$ for any DAG version $D$ of $G$. Hence the LM conjecture (II) can be shown to be false if we can construct a decomposable graph $G$ and a DAG version $D$ such that $r_D > 1$. We shall further exploit this line of reasoning in §7.
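The criterion in Remark 5.2 is easy to explore by machine. The following is a minimal sketch, assuming that $r_D$ counts the distinct separators of $G$ that are ancestral in $D$ (this reading is suggested by the examples in §7, since Definition 5.2 is not reproduced here). The function names and the toy path graph are hypothetical and serve only as an illustration.

```python
def is_ancestral(vertex_set, parents):
    """True if every parent of a vertex in the set also lies in the set."""
    return all(set(parents[v]) <= set(vertex_set) for v in vertex_set)


def count_ancestral_separators(separators, parents):
    """Assumed reading of r_D: number of distinct separators that are ancestral."""
    distinct = {frozenset(s) for s in separators}
    return sum(1 for s in distinct if is_ancestral(s, parents))


# Toy example (hypothetical): the path 2 - 4 - 3 - 1 directed as 4 -> 2, 4 -> 3, 3 -> 1.
# Its cliques are {2, 4}, {3, 4}, {1, 3}, so r = 3 and the separators are {4} and {3}.
parents = {1: [3], 2: [4], 3: [4], 4: []}
separators = [{4}, {3}]
r = 3

r_D = count_ancestral_separators(separators, parents)
print("r_D =", r_D)
print("r + r_D =", r + r_D, "versus r + 1 =", r + 1)
```

In this toy path only the separator $\{4\}$ is ancestral, so $r_D = 1$ and the lower bound $r + r_D$ coincides with $r + 1$; such a DAG version therefore gives no evidence against the conjecture. A counterexample requires a graph and DAG version with at least two ancestral separators.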

Comparing Homogeneous Type II Wisharts with Perfect Transitive DAG Wisharts
Henceforth in this section, let H = (V, H) denote a homogeneous graph. In this section we show that for any homogeneous graph there is a DAG version $D$ such that $W_{P_G}$ is a special case of $\pi_{P_D}$ on the whole parameter set $B$ (note that, as we discussed in §3.3, when a graph is homogeneous the parameter set $B$ is completely identified). Note also that Equation (2.1) defining homogeneous graphs naturally defines a partial order on the vertex set $V$ of $H$ as follows: … See [14] for more details on rooted Hasse trees and on this partial order. We let the linear order $\ge_H$ (or simply $\ge$ when there is no danger of confusion) be a linear extension of this partial order, and let $D$ be the DAG version of $H$ induced by $\ge_H$. One can easily check that $D$ is perfect and transitive, i.e., $u \to v$ and $v \to w$ imply $u \to w$. The above shows that any homogeneous graph has a perfect transitive DAG version. Now for a homogeneous graph we prove the following generalized form of Theorem 5.1.

Proposition 6.1. Let H be a homogeneous graph and D a perfect transitive DAG version. Then the family of Type II Wisharts for H is a subfamily of the DAG Wisharts for D.
Proof. First we claim that all the cliques, and consequently all the separators, of $H$ are ancestral in $D$. To see this, let $C$ be a clique of $H$ and suppose $u \to v$ for some $v \in C$. Let $w$ be any other vertex in $C$. Then either $v \to w$ or $v \leftarrow w$. In the first case $u \to w$ because $D$ is transitive; in the second case $u$ and $w$ are both parents of $v$ and hence adjacent because $D$ is perfect. In either case $u$ and $w$ are adjacent. Thus $u \in C$. This proves that $C$ is ancestral in $D$. Now if $S$ is a separator of $H$, then the fact that $S = C \cap C'$ for some cliques $C, C'$ of $H$ implies that $S$ is ancestral (otherwise one of these cliques would fail to be ancestral, a contradiction). Therefore by Lemma 5.2 we obtain … which shows that this Markov ratio is a product of powers of the $D_{jj}$. Therefore the family of Type II Wisharts for $H$ is a subfamily of the DAG Wisharts for $D$.
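The structural claim in the proof above (cliques are ancestral in a perfect transitive DAG version) can be illustrated on the 3-path. The sketch below makes two assumptions that are not confirmed by the text available here: that the DAG of Figure 2(b) is $3 \to 1$, $3 \to 2$ (as the labeling in Example 6.1 suggests), and that the DAG of Figure 2(c) is $3 \to 2 \to 1$. All function names are hypothetical.

```python
from itertools import combinations


def adjacent(u, v, parents):
    return u in parents[v] or v in parents[u]


def is_perfect(parents):
    """No immoralities: any two parents of a common child are adjacent."""
    return all(adjacent(u, w, parents)
               for v in parents
               for u, w in combinations(parents[v], 2))


def is_transitive(parents):
    """u -> v and v -> w must imply u -> w."""
    return all(u in parents[w]
               for w in parents
               for v in parents[w]
               for u in parents[v])


def is_ancestral(vertex_set, parents):
    """True if every parent of a vertex in the set also lies in the set."""
    return all(set(parents[v]) <= set(vertex_set) for v in vertex_set)


# Assumed DAG of Figure 2(b): 3 -> 1 and 3 -> 2, with cliques {1, 3} and {2, 3}.
dag_b = {1: [3], 2: [3], 3: []}
cliques_b = [{1, 3}, {2, 3}]

# Assumed DAG of Figure 2(c): 3 -> 2 -> 1, with cliques {2, 3} and {1, 2}.
dag_c = {1: [2], 2: [3], 3: []}
cliques_c = [{2, 3}, {1, 2}]

for name, dag, cliques in (("2(b)", dag_b, cliques_b), ("2(c)", dag_c, cliques_c)):
    print(name,
          "perfect:", is_perfect(dag),
          "transitive:", is_transitive(dag),
          "cliques ancestral:", [is_ancestral(c, dag) for c in cliques])
```

Under these assumptions both DAGs are perfect, but only the first is transitive, and only in the first are all cliques ancestral: in the second, the clique $\{1, 2\}$ misses the parent $3$ of vertex $2$.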
In the following two examples we compare $W_{P_G}$ and $\pi_{P_D}$ in more detail. More specifically, we shall explain how the space of shape parameters is identified for each family of distributions. The space of shape parameters for $W_{P_H}$ is identified by Theorem 3.2 in [14]. For this purpose, we follow the notation introduced in [ … where $n_t$ is the number of elements in $t$. By Theorem 3.2 in [14], … We now proceed to compare the space of shape parameters of $W_{P_G}$ and $\pi_{P_D}$ in two concrete examples.

Example 6.1. Let $H$ be the 3-path given in Figure 2(a) and $D$ the DAG version given in Figure 2(b). It is clear that $H$ is a homogeneous graph and $D$ is a perfect transitive DAG version. First we show that the densities $W_{P_H}$ and $\pi_{P_D}$ have the same functional form. Using the labeling in $D$, the cliques of $H$ are $C_1 = {\preceq}1{\succeq}$ and $C_2 = {\preceq}2{\succeq}$. The only separator is $S_2 = {\prec}1{\succ}$. Thus $c_1 = 2$, $c_2 = 2$, $s_2 = 1$. Replacing these in the corresponding Markov ratio that appears in the $W_{P_G}$ distribution we obtain … Therefore, $W_{P_H}(\alpha, \beta, U^{H})$ and $\pi_{P_D}(\gamma, U)$ have exactly the same functional form when $\gamma_1 = \alpha_1 + 3/2$, $\gamma_2 = \alpha_2 + 3/2$ and $\gamma_3 = \alpha_1 + \alpha_2 - \beta_2 + 2$. To identify the space of shape parameters for $W_{P_H}$ we proceed to compute $\rho_{[1]} = \alpha_1$, $\rho_{[2]} = \alpha_2$, $\rho_{[3]} = \alpha_1 + \alpha_2 - \beta_2$, $n_{[1]} = 1$, $n_{[2]} = 1$, $n_{[3]} = 1$. Now by using Theorem 3.2 in [14] we obtain $B = \{(\alpha_1, \alpha_2, \beta_2) : \alpha_1 < 0,\ \alpha_2 < 0,\ \beta_2 - \alpha_2 - \alpha_1 < 3/2\}$. The space of shape parameters $(\gamma_1, \gamma_2, \gamma_3)$ for $\pi_{P_D}$ is easily determined by the inequalities $\gamma_j < pa_j/2 + 1$, which yield the same inequalities $\alpha_1 < 0$, $\alpha_2 < 0$ and $\beta_2 - \alpha_2 - \alpha_1 < 3/2$. This shows that, up to re-parametrization, $W_{P_H}(\alpha, \beta, U^{H})$ and $\pi_{P_D}(\gamma, U)$ are the same distributions. In the next example we shall show that the family of DAG Wisharts $\pi_{P_D}(\gamma, U)$ strictly contains the family of Type II Wisharts $W_{P_H}(\alpha, \beta, U^{H})$.

Figure 3: A homogeneous graph, its Hasse diagram, and a DAG version.
Alternatively, with much less computation, from the inequalities γ j < pa j /2 + 1 we can identify B.
The examples above demonstrate that for a homogeneous graph $H$, the Type II Wishart is a special case of the DAG Wishart for a perfect transitive DAG version of $H$. Furthermore, identifying the space of shape parameters $B$ under the DAG Wishart family is computationally less expensive.
Remark 6.1. Proposition 6.1 for homogeneous graphs is stronger than Theorem 5.1 for decomposable graphs. The reason is that the latter guarantees that the family of $W_{P_G}$ distributions is a subfamily of the DAG Wisharts $\pi_{P_D}$ only when $(\alpha, \beta)$ is restricted to $B_P \subset B$. Proposition 6.1, however, guarantees that the family of $W_{P_G}$ is a subfamily of the DAG Wisharts $\pi_{P_D}$ on the whole parameter set $B$. Also note that Proposition 6.1 is not implied by Theorem 5.1. To see this, consider the DAG $D$ given by Figure 2(c). This DAG is a perfect, but non-transitive, DAG version of the homogeneous graph $A_3$ given by Figure 2(a) and is in fact induced by the perfect order $P = (C_1 := \{2, 3\}, C_2 := \{1, 2\})$ of the cliques of $A_3$. It is easy to check that $B_P \subsetneq B$. Therefore using Theorem 5.1 for this homogeneous graph does not imply that the family of $W_{P_G}$ is a subfamily of the DAG Wisharts $\pi_{P_D}$ on the whole parameter set $B$.

Counterexamples to part II of the LM conjecture
We now return to the LM conjecture (II) in this section. Using the tools we have developed thus far, we obtain counterexamples showing that Part (II) of the LM conjecture fails. In particular, we show that there exist decomposable graphs $G$ for which the space of shape parameters $B$ of the Type II Wisharts strictly contains $\bigcup_{P \in \mathrm{Ord}(G)} B_P$. Recall that a pair $(\alpha, \beta) \in \mathbb{R}^{r} \times \mathbb{R}^{r-1}$ belongs to $B$ if and only if it satisfies both Equation (B1) and Equation (B2). As we hinted in Remark 5.2, if there exists a decomposable graph $G$, necessarily non-homogeneous, that has a perfect order $P$ such that the DAG version $D$ induced by $P$ yields $r_D \ge 2$ (recall Definition 5.2), then Equation (B1) is satisfied on a set that strictly contains $\bigcup_{P \in \mathrm{Ord}(G)} B_P$. In the following examples we show that in such a situation Equation (B2) can be satisfied on the same set as well.

It is easy to check that the DAG $D$ given in Figure 4(b) is a directed version of $G$ induced by $P$. Using the labeling in $D$ we have … Note that for the DAG version $D$ given by Figure 7.1, $r_D = 2$ since $S_3$ and $S_2$ are both ancestral in $D$. Thus by Proposition 5.1 Equation (B1) is satisfied on a set of $(\alpha, \beta)$ of dimension greater than or equal to $r + r_D = 6$. This is strictly greater than 5, the dimension of $\bigcup_{P \in \mathrm{Ord}(G)} B_P$. Hence it is clear that the LM conjecture (II) fails for this example. Nonetheless, for this specific example, we provide the reader with a self-contained proof. To begin with, we rewrite the Markov ratio term that appears in $W_{P_G}$ as follows: … Let $\gamma_j$ be the exponent of $D_{jj}$ in Equation (7.1).
Then $\int_{P_G} \omega_{P_G}(\alpha, \beta, U^{E}, d\Omega) < \infty$ for every $(\alpha, \beta) \in \mathbb{R}^{4} \times \mathbb{R}^{3}$ such that … Now we show that for each $(\alpha, \beta)$ that satisfies Equation (7.2), not only is Equation (B1) satisfied, but Equation (B2) is satisfied as well, i.e., … By Equation (7.1) and Equation (5.1) we have … Now, similarly to our computation in Equation (7.1), we can show … Therefore, we have … This completes our first counterexample.
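Example 7.2. Let $G$ be the non-homogeneous graph given in Figure 5(a) and let $D$ be the DAG given in Figure 5(b). One can readily check that $D$ is a DAG version of $G$ induced by $P = ({\preceq}4{\succeq}, {\preceq}3{\succeq}, {\preceq}6{\succeq}, {\preceq}2{\succeq}, {\preceq}1{\succeq})$.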
Since there are 5 cliques and 4 separators, $r = 5$ and $B \subset \mathbb{R}^{9}$. Therefore, the LM conjecture (II) here implies that the dimension of $B$ is 6. Now $r_D = 2$, as the separators $S_2 = {\prec}3{\succ}$ and $S_3 = {\prec}7{\succ}$ are ancestral (the two other separators $S_4 = {\prec}5{\succ}$ and $S_5 = {\prec}1{\succ}$ are not ancestral). Therefore, by … where the exponents $\gamma_i$ and $\eta$ are some linear functions of $\alpha$ and $\beta$. Now using Equation ( … $\eta = \eta(\alpha, \beta) = 0$. This shows that the dimension of $B$ is at least 8. On the other hand, we can see that for any DAG version of $G$ two of the separators will always be non-ancestral. Therefore, although Proposition 5.1 identifies a subset of $B$ that is strictly larger than the subset identified by the LM conjecture (II), it does not identify the whole set $B$, since for any DAG version of $G$ Proposition 5.1 identifies a subset of dimension at most 7.

Counterexamples to part I of the LM conjecture
In this section we once more use the theory developed in the previous sections to produce counterexamples to Part (I) of the LM conjecture. First we establish the following lemma.
Proof. We proceed by induction on the number of vertices. Suppose the claim holds for any decomposable graph with fewer than $p$ vertices; we prove the lemma for $|V| = p$. The equality trivially holds when $p = 1$. Let us therefore assume that $p > 1$. As before, let $r$ be the number of cliques and consider the following cases.
1) Suppose that r = 1, i.e., G is complete. We write where the last equality uses the induction hypothesis for the induced graph G ≺1≻ .
2) Suppose that $r \ge 2$. Note that by Lemma 5.1 we can assume that $D$ is induced by some perfect order $P = (C_1, \ldots, C_r)$. In particular, there exists a vertex in $R_r$ that has no child. Thus, without loss of generality, assume that the vertices in $D$ are labeled such that $1 \in R_r$. We now consider two cases: a) If the residual $R_r = \{1\}$, then $(C_1, \ldots, C_{r-1})$ and $D_{V \setminus \{1\}}$ are, respectively, a perfect order of the cliques and a perfect DAG version of $G_{V \setminus \{1\}}$. Moreover, $S_r = {\prec}1{\succ}$. Now it follows that … b) If the residual $R_r$ has more than one element, then $(C_1, \ldots, C_{r-1}, C_r \setminus \{1\})$ is a perfect order of the cliques of $G_{V \setminus \{1\}}$ with associated separators $S_2, \ldots, S_r$. Using the induction hypothesis we obtain …

Remark 8.1. The Markov ratio on the right-hand side of Equation (8.1) is the square root of the Jacobian of the inverse mapping $\Sigma^{E} \mapsto \Sigma^{-1} : Q_G \to P_G$. It can be shown directly that the left-hand side of Equation (8.1) is also the square root of the Jacobian of the inverse mapping $Q_D \to P_D$ (see [3]). We illustrate the result of Lemma 8.1 for the decomposable graph $G$ and its perfect DAG version $D$ given in Figure 4(a) and Figure 4(b), respectively.
Next we use Lemma 8.1 to prove an analog of Proposition 5.1 for the Letac-Massam Type I Wisharts.
where λ j = λ j (α, β) is an affine combination of the components of α and β. The expression in Equation (8.3) is the non-normalized version of the density of the generalized Riesz distribution on Q G , defined in [2], and is integrable, by Equation (18) in [2], if and only if λ j > pa j /2 for each j = 1, . . . , p.
An important observation in Proposition 8.1 is that if G is chosen such that r D > 1, then Part (I) of the LM conjecture may fail. In fact we can show that the same decomposable graphs given in Figure 4(a) and Figure 5(a) provide two counterexamples to Part (I) of the LM conjecture. Since the calculations are very similar, we provide details only for the second graph.
Consequently, the dimension of A is greater than or equal to 8. Thus the LM conjecture (I) fails.

Closing remarks
In this paper we develop tools to carefully compare the Type II Wishart distributions introduced by Letac and Massam in [14] for decomposable graphs with the DAG Wisharts introduced by the authors in [3]. The comparison is made when the DAG Wisharts are restricted to the class of perfect DAGs, that is, the DAGs that are Markov equivalent to decomposable graphs. Through this comparison we establish that, in general, the family of Type II Wisharts is a subfamily of the DAG Wisharts when the multi-parameters are restricted to a well-identified set $B_P$. In the case of homogeneous graphs we show that this restriction is not needed. In light of this result we are led to a condition on the structure of the graphs that yields counterexamples to the second part of the LM conjecture. By taking a similar approach we are also able to reject the first part of the LM conjecture and therefore completely resolve it.