Identifiability of directed Gaussian graphical models with one latent source

We study parameter identifiability of directed Gaussian graphical models with one latent variable. In the scenario we consider, the latent variable is a confounder that forms a source node of the graph and is a parent to all other nodes, which correspond to the observed variables. We give a graphical condition that is sufficient for the Jacobian matrix of the parametrization map to be full rank, which entails that the parametrization is generically finite-to-one, a fact that is sometimes also referred to as local identifiability. We also derive a graphical condition that is necessary for such identifiability. Finally, we give a condition under which generic parameter identifiability can be determined from identifiability of a model associated with a subgraph. The power of these criteria is assessed via an exhaustive algebraic computational study on models with 4, 5, and 6 observable variables.


Introduction
In this paper we study parameter identifiability in directed Gaussian graphical models with a latent variable. Our work belongs to a line of research in which the graphical representation of causally interpretable latent variable models is used to give tractable criteria for deciding whether parameters can be uniquely recovered from the joint distribution of the observed variables (Pearl, 2009). Examples of prior work in this context are Chen et al. (2014), Drton et al. (2011), Foygel et al. (2012), Grzebyk et al. (2004), Kuroki and Miyakawa (2004), Kuroki and Pearl (2014), Stanghellini and Wermuth (2005), Tian (2005), and Tian (2009).
In the setup we consider, a single latent variable appears as a source node in the directed graph defining the Gaussian model. The resulting models can be described as follows. Let X_1, ..., X_m be observable variables, let L be a hidden variable, and suppose the variables are related by the linear equations

$$ X_v = \sum_{w \neq v} \lambda_{wv} X_w + \delta_v L + \varepsilon_v, \qquad v = 1, \dots, m, $$

where λ_{wv}, δ_v are real coefficients quantifying linear relationships, and the ε_v are independent mean-zero Gaussian noise terms with variances ω_v > 0. The latent variable L is assumed to be standard normal and independent of the noise terms ε_v. Letting X = (X_1, ..., X_m)^T, ε = (ε_1, ..., ε_m)^T and δ = (δ_1, ..., δ_m)^T, we may present the model in the vectorized form

$$ X = \Lambda^T X + \delta L + \varepsilon, \tag{1.1} $$

where Λ is the matrix (λ_{wv}) with λ_{vv} = 0 for all v = 1, ..., m. We are then interested in specific models, in which for certain pairs of nodes w ≠ v the coefficient λ_{wv} is constrained to zero. In particular, we are interested in recursive models, that is, models in which the matrix Λ can be brought into strictly upper triangular form by permuting the indices of the variables (and thus the rows and columns of Λ). This implies that I_m − Λ is invertible, where I_m is the m × m identity matrix. It follows that the observable random vector X has an m-variate normal distribution N_m(0, Σ) with covariance matrix

$$ \Sigma = (I_m - \Lambda^T)^{-1} (\Omega + \delta\delta^T) (I_m - \Lambda)^{-1}, \tag{1.2} $$

where Ω is the diagonal matrix with Ω_{vv} = ω_v. For additional background on graphical models we refer the reader to Lauritzen (1996) and Pearl (2009). We note that the models we consider also belong to the class of linear structural equation models (Bollen, 1989).
A Gaussian latent variable model postulating recursive zero structure in the matrix Λ from (1.1) can be thought of as associated with a graph G = (V, E) whose vertex set V = {1, . . . , m} is the index set for the observable variables X 1 , . . . , X m . For two distinct nodes w, v ∈ V , the edge set E includes the directed edge (w, v), denoted as w → v if and only if the model includes λ wv as a free parameter. When the model is recursive, the directed graph G is acyclic and following common terminology we refer to G as a DAG (for directed acyclic graph). In this paper, we will then always assume that the nodes are labeled in topological order, that is, we have V = {1, . . . , m} and w → v ∈ E only if w < v.
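To a DAG G = (V, E) we associate the following model.

Definition 1.1. Let R^E denote the set of real m × m matrices Λ = (λ_{wv}) with λ_{wv} = 0 whenever w → v ∉ E, and let diag^+_m denote the set of m × m diagonal matrices with positive diagonal entries. The model N*(G) is the set of all m-variate normal distributions N_m(0, Σ) whose covariance matrix Σ lies in the image of the covariance parametrization

$$ \varphi_G : (\Lambda, \Omega, \delta) \mapsto (I_m - \Lambda^T)^{-1} (\Omega + \delta\delta^T) (I_m - \Lambda)^{-1}, \tag{1.3} $$

defined on the set Θ := R^E × diag^+_m × R^m, which we may also view as an open subset of R^{2m+|E|}, where |E| is the cardinality of the directed edge set E. Clearly, the image of φ_G is in PD_m, the cone of positive definite m × m matrices. Note that since G is acyclic, Λ is nilpotent, so that (I_m − Λ)^{-1} = I_m + Λ + Λ^2 + ⋯ + Λ^{m−1}; thus the covariance parametrization φ_G is a polynomial map.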
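To make the parametrization (1.3) concrete, here is a minimal NumPy sketch. The helper names `make_lambda` and `phi_G`, the 0-based node labels (node v of the text is index v − 1), and the random parameter values are illustrative assumptions, not part of the text. It evaluates φ_G at a random point of Θ for a small DAG and checks numerically that the finite series I_m + Λ + ⋯ + Λ^{m−1} equals (I_m − Λ)^{-1} and that the resulting Σ is positive definite.

```python
import numpy as np

def make_lambda(m, edges, weights):
    """Return the m x m matrix Lambda with entry [w, v] = weight for each edge w -> v.

    Nodes are 0-based; edges must respect the topological order, i.e. w < v."""
    lam = np.zeros((m, m))
    for (w, v), x in zip(edges, weights):
        assert w < v, "nodes must be labeled in topological order"
        lam[w, v] = x
    return lam

def phi_G(lam, omega, delta):
    """Covariance map (1.3): (I - Lam^T)^{-1} (Omega + delta delta^T) (I - Lam)^{-1}."""
    m = lam.shape[0]
    B = np.linalg.inv(np.eye(m) - lam)                 # (I - Lam)^{-1}
    return B.T @ (np.diag(omega) + np.outer(delta, delta)) @ B

rng = np.random.default_rng(0)
m, edges = 4, [(0, 1), (1, 3)]
lam = make_lambda(m, edges, rng.normal(size=len(edges)))
omega = rng.uniform(0.5, 2.0, size=m)                 # diagonal of Omega, all > 0
delta = rng.normal(size=m)
sigma = phi_G(lam, omega, delta)

# Lambda is strictly upper triangular, hence nilpotent: the series stops at Lam^{m-1}.
series = sum(np.linalg.matrix_power(lam, k) for k in range(m))
print(np.allclose(series, np.linalg.inv(np.eye(m) - lam)))  # True
print(np.all(np.linalg.eigvalsh(sigma) > 0))                 # True: sigma lies in PD_m
```

Because `phi_G` depends on δ only through δδ^T, the same code also illustrates the sign symmetry φ_G(Λ, Ω, δ) = φ_G(Λ, Ω, −δ) discussed next.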
In this paper we will derive graphical conditions on G that are sufficient or necessary for identifiability of the model N*(G). We begin by clarifying precisely what we mean by identifiability. The most stringent notion, namely that of global identifiability, requires φ_G to be injective on all of Θ. While this notion is important (Drton et al., 2011), it is too stringent for the setting we consider here. Indeed, for any triple (Λ, Ω, δ) ∈ Θ, we have φ_G(Λ, Ω, δ) = φ_G(Λ, Ω, −δ), which implies that the fiber {(Λ′, Ω′, δ′) ∈ Θ : φ_G(Λ, Ω, δ) = φ_G(Λ′, Ω′, δ′)} always has cardinality at least 2. We may account for this symmetry by requiring φ_G to be 2-to-1 on all of Θ, but this is not enough, as there are always some fibers that are infinite. For instance, it is easy to show that the fiber in the above display is infinite when δ = 0. As such, it is natural to consider notions of generic identifiability. Specifically, our contributions will pertain to the notion of generic finite identifiability, as defined below, which only requires finite identification of parameters away from a fixed null set in Θ; here a null set is a set of Lebesgue measure zero. This notion is also referred to as local identifiability in other related work such as Anderson and Rubin (1956).
Null sets appearing in our work are algebraic sets, where an algebraic set A ⊂ R^n is the set of common zeros of a collection F of multivariate polynomials, i.e.,

$$ A = \{ x \in \mathbb{R}^n : f(x) = 0 \text{ for all } f \in F \}, \qquad F \subset \mathbb{R}[x_1, \dots, x_n], $$

where R[x_1, ..., x_n] is the ring of polynomials in n variables with coefficients in R. Note that A is a closed set in the usual Euclidean topology. If all polynomials f ∈ F are the zero polynomial, then A = R^n. Otherwise, A is a proper subset, A ⊊ R^n, and its dimension is then less than n. In particular, a proper algebraic subset of R^n has measure zero.
Definition 1.2. Let S be an open subset of R^n, and let f be a map defined on S. Then f is said to be generically finite-to-one if there exists a proper algebraic set S̃ ⊊ R^n such that the fiber of s, i.e., the set {s′ ∈ S : f(s′) = f(s)}, is finite for all s ∈ S \ S̃. Otherwise, f is said to be generically infinite-to-one.

Definition 1.3. The model N*(G) of a given DAG G = (V, E) is said to be generically finitely identifiable if its parametrization φ_G defined on Θ is generically finite-to-one. For short, we also say that the DAG G is generically finitely identifiable.
Hereafter, for any map f defined on an open domain S ⊂ R^n, we will use

$$ F_f(s) := \{ s' \in S : f(s') = f(s) \} \tag{1.4} $$

to denote the fiber of a point s ∈ S. If T is a subset of S, we will use f|_T to denote the restriction of f to T, in which case for any t ∈ T we have the fiber F_{f|_T}(t) = {t′ ∈ T : f(t′) = f(t)} = F_f(t) ∩ T. The term "generic point" will refer to any point in the domain S that lies outside a fixed proper algebraic subset S̃, and a property is said to hold generically if it holds everywhere on S \ S̃. The following well-known lemma is a main tool in this paper; its proof is included in Appendix A for completeness. It gives as an immediate corollary a trivial necessary condition for generic finite identifiability.

Lemma 1.1. Let f be a polynomial map defined on an open set S ⊂ R^n. Then the following are equivalent: (i) f is generically finite-to-one. (ii) There exists a proper algebraic subset S̃ ⊂ R^n such that the fibers of the restricted map f|_{S \ S̃} are all finite. (iii) The Jacobian matrix of f is generically of full column rank.
Corollary 1.2. Given a DAG G = (V, E), a necessary condition for generic finite identifiability of its associated model N*(G) is that $\binom{m+1}{2} \ge |E| + 2m$.

Proof of Corollary 1.2. The Jacobian matrix of φ_G is of size $\binom{m+1}{2} \times (|E| + 2m)$, and it is necessary that $\binom{m+1}{2} \ge |E| + 2m$ for it to have full column rank.
Property (ii) in Lemma 1.1 is seemingly weaker than (i); it is useful in proving our results in Section 5. In light of Corollary 1.2, for the rest of this paper we will restrict our attention to DAGs G = (V, E) with $\binom{m+1}{2} - 2m \ge |E|$, in which case m must be at least 3.
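The edge budget from Corollary 1.2 is easy to tabulate. The short Python sketch below (the name `max_edges` is ours) computes the largest admissible number of edges, $\binom{m+1}{2} - 2m = m(m-3)/2$; a negative value means that no DAG on m nodes passes the test, which is why m ≥ 3.

```python
from math import comb

def max_edges(m):
    """Largest |E| compatible with Corollary 1.2: binom(m+1, 2) - 2m = m(m-3)/2."""
    return comb(m + 1, 2) - 2 * m

for m in range(2, 7):
    print(m, max_edges(m))
# 2 -1
# 3 0
# 4 2
# 5 5
# 6 9
```

For m = 3 only the empty DAG remains, while for m = 4, 5, 6 at most 2, 5 and 9 edges are allowed.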
One of our contributions is a sufficient graphical condition stated in Theorem 1.3 below. For distinct v, w ∈ V, we will use v − w or w − v to denote the edge (v, w) = (w, v) of an undirected graph on V. With slight abuse of notation, we may also use v − w or w − v to denote an edge v → w ∈ E when the directionality of edges in a DAG G = (V, E) is to be ignored. For any directed or undirected graph G = (V, E), the complement of G, denoted as G^c = (V, E^c), is the undirected graph on V with the edge set E^c = {v − w : (v, w) ∉ E and (w, v) ∉ E}.

Theorem 1.3 (Sufficient condition for generic finite identifiability). The model N*(G) given by a DAG G = (V, E) is generically finitely identifiable if every connected component of G^c contains an odd cycle.

Figure 1.1 shows a DAG G that satisfies the sufficient condition in Theorem 1.3; its undirected complement G^c is shown on the right of the figure. We will revisit this example in Section 4, where we report on algebraic computations showing that for this graph G the fibers of φ_G are generically of size 2 or 4.
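The condition in Theorem 1.3 is purely combinatorial: a connected graph contains an odd cycle exactly when it is not bipartite. The Python sketch below checks the condition by 2-coloring each component of G^c. Nodes are 0-based, the DAG is given as a list of pairs (w, v) for edges w → v, and the function names are hypothetical helpers rather than code from the paper.

```python
from itertools import combinations

def complement_edges(m, edges):
    """Undirected edges v - w of G^c: pairs joined by no edge of G in either direction."""
    skeleton = {frozenset(e) for e in edges}
    return [(v, w) for v, w in combinations(range(m), 2)
            if frozenset((v, w)) not in skeleton]

def bipartite_component_count(m, und_edges):
    """Number of connected components (isolated nodes included) with no odd cycle,
    found by 2-coloring each component with a depth-first traversal."""
    adj = {v: [] for v in range(m)}
    for v, w in und_edges:
        adj[v].append(w)
        adj[w].append(v)
    color, count = {}, 0
    for start in range(m):
        if start in color:
            continue
        color[start], stack, bipartite = 0, [start], True
        while stack:
            u = stack.pop()
            for x in adj[u]:
                if x not in color:
                    color[x] = 1 - color[u]
                    stack.append(x)
                elif color[x] == color[u]:
                    bipartite = False          # an odd cycle was found
        count += bipartite
    return count

def satisfies_theorem_1_3(m, edges):
    """Sufficient condition of Theorem 1.3: every component of G^c has an odd cycle."""
    return bipartite_component_count(m, complement_edges(m, edges)) == 0

print(satisfies_theorem_1_3(4, [(0, 1)]))          # True: G^c contains the triangle 0-2-3
print(satisfies_theorem_1_3(4, [(0, 1), (2, 3)]))  # False: G^c is the 4-cycle 0-2-1-3
```

A failed check says nothing by itself: Theorem 1.3 is only sufficient, so such graphs have to be examined with the necessary condition or algebraic methods.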
Our approach to proving Theorem 1.3 also yields a necessary condition for generic finite identifiability. This condition can be stated in terms of two undirected graphs on the node set V, denoted G_{|L,cov} = (V, E_{|L,cov}) and G_con = (V, E_con), where E_{|L,cov} captures the dependence of variable pairs after conditioning on the latent variable L, and E_con captures the dependence of variable pairs after conditioning on all other variables (including L). From (1.1) it can be seen that Σ_{|L} := (I_m − Λ^T)^{-1} Ω (I_m − Λ)^{-1} is the covariance matrix of X conditional on L; hence v − w ∈ E_{|L,cov} if and only if (Σ_{|L})_{vw} ≢ 0, and analogously v − w ∈ E_con if and only if (Σ_{|L}^{-1})_{vw} ≢ 0, where ≢ 0 means "not identically zero as a function of the parameters". It is well known that these two undirected graphs can be obtained by applying the d-separation criterion to the DAG extended by the latent node L; see Drton et al. (2009, p. 73) for example.

Figure 1.2: A graph G (left), G_con (middle) and (G_con)^c (right). Since |E_con| − |E| = 1 < 2 = d_con, the necessary condition in Theorem 1.4 does not hold.
Theorem 1.4 (Necessary condition for generic finite identifiability). Given a DAG G = (V, E), for the model N * (G) to be generically finitely identifiable, it is necessary that the following two conditions both hold: (i) |E con | − |E| ≥ d con , where d con is the number of connected components in the graph (G con ) c that do not contain any odd cycle; (ii) |E |L,cov | − |E| ≥ d cov , where d cov is the number of connected components in the graph (G |L,cov ) c that do not contain any odd cycle.
Figure 1.2 gives an example of a DAG that fails to satisfy our necessary condition, specifically, condition (i).
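Theorem 1.4 can likewise be checked mechanically once E_con and E_{|L,cov} are known. The self-contained sketch below assumes two standard facts about Gaussian DAG models that the text does not spell out: for generic parameters, the support of Σ_{|L}^{-1} is the moral graph of G (adjacent pairs and pairs with a common child), and the support of Σ_{|L} consists of the pairs with a common ancestor, where each node counts as its own ancestor. Node labels are 0-based and all names are hypothetical.

```python
from itertools import combinations

def bipartite_component_count(m, und_edges):
    """Components (isolated nodes included) without an odd cycle, via 2-coloring."""
    adj = {v: [] for v in range(m)}
    for v, w in und_edges:
        adj[v].append(w)
        adj[w].append(v)
    color, count = {}, 0
    for start in range(m):
        if start in color:
            continue
        color[start], stack, bipartite = 0, [start], True
        while stack:
            u = stack.pop()
            for x in adj[u]:
                if x not in color:
                    color[x] = 1 - color[u]
                    stack.append(x)
                elif color[x] == color[u]:
                    bipartite = False
        count += bipartite
    return count

def theorem_1_4_conditions(m, edges):
    """Return (condition (i), condition (ii)) of Theorem 1.4 for a DAG on nodes 0..m-1."""
    pairs = list(combinations(range(m), 2))
    skeleton = {frozenset(e) for e in edges}
    children = {v: {c for w, c in edges if w == v} for v in range(m)}
    anc = {}
    for v in range(m):                       # topological order: w < v for each w -> v
        anc[v] = {v}.union(*(anc[w] for w, c in edges if c == v))
    # E_con: moral graph; E_{|L,cov}: pairs with a common ancestor (a trek).
    e_con = [p for p in pairs
             if frozenset(p) in skeleton or children[p[0]] & children[p[1]]]
    e_cov = [p for p in pairs if anc[p[0]] & anc[p[1]]]
    d_con = bipartite_component_count(m, [p for p in pairs if p not in e_con])
    d_cov = bipartite_component_count(m, [p for p in pairs if p not in e_cov])
    return (len(e_con) - len(edges) >= d_con, len(e_cov) - len(edges) >= d_cov)

print(theorem_1_4_conditions(4, [(0, 2), (1, 2)]))  # (True, True)
print(theorem_1_4_conditions(4, [(0, 1), (2, 3)]))  # (False, False)
```

In the second example (G^c is a 4-cycle), condition (i) fails with |E_con| − |E| = 0 < 1 = d_con, so by Theorem 1.4 that model is not generically finitely identifiable.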
In addition to the closely related work of Stanghellini (1997) and Vicard (2000), identifiability of directed Gaussian models with one latent variable has been studied by Stanghellini and Wermuth (2005). The models we treat here are special cases with the latent node being a common parent of all the observable nodes. As we review in more detail in Section 2, we can readily adapt the sufficient graphical criteria given in Stanghellini and Wermuth (2005) for certifying that the model N*(G) of a given DAG G is generically finitely identifiable with respect to Definition 1.3. Our own sufficient condition stated in Theorem 1.3 is stronger, in the sense that every DAG G satisfying the sufficient conditions in Stanghellini and Wermuth (2005) necessarily satisfies the condition in Theorem 1.3. However, when it applies, the result of Stanghellini and Wermuth (2005) yields a stronger conclusion than our generic finiteness result. Indeed, as we also emphasize in the discussion in Section 6, their conditions imply that the parametrization is generically 2-to-1.
We will prove the above stated Theorems 1.3 and 1.4 in Section 3. Since the parametrization map in (1.3) is polynomial, the generic finite identifiability of a given model is decidable by algebraic techniques that involve Gröbner basis computations. In Section 4, we will study the applicability of our graphical criteria via such algebraic computations for all models N * (G) of DAGs G with m = 4, 5, 6 nodes. Section 5 will give results on situations where we can determine generic finite identifiability of a model N * (G) based on knowledge about the generic finite identifiability of a model N * (G ′ ), where G ′ is an induced subgraph of G.
Before ending this introduction, however, we comment on the role that Markov equivalence plays in our problem. Recall that two DAGs defined on the same set of nodes are Markov equivalent if they have the same d-separation relations. The following theorem, which will be proved in Appendix A, says that generic finite identifiability is a property of Markov equivalence classes of DAGs.
Theorem 1.5. Suppose G_1 = (V, E_1) and G_2 = (V, E_2) are two Markov equivalent DAGs on the same set of nodes V. Then the model N*(G_1) is generically finitely identifiable if and only if the same is true for N*(G_2).

Prior work
Stanghellini and Wermuth (2005) give sufficient graphical conditions for identifiability of directed Gaussian graphical models with one latent variable that can be any node in the DAG. We revisit their result in the context of the models from Definition 1.1 and formulate it in terms of generic finite identifiability. (As was mentioned in the Introduction, their result in fact yields the stronger conclusion of a generically 2-to-1 parametrization.) We begin by stating a well-known fact about DAG models without latent variables.

Lemma 2.1. Let G = (V, E) be a DAG, let Λ ∈ R^E and Ω ∈ diag^+_m, and let Σ = (I_m − Λ^T)^{-1} Ω (I_m − Λ)^{-1}. For each v ∈ V, let pa(v) = {w ∈ V : w → v ∈ E} be the parent set of the node v. Then one can show, by induction on m and considering a topological ordering of V, that

$$ (\lambda_{wv})_{w \in \mathrm{pa}(v)} = \Sigma_{\mathrm{pa}(v),\mathrm{pa}(v)}^{-1} \Sigma_{\mathrm{pa}(v),v}, \qquad \omega_v = \Sigma_{vv} - \Sigma_{v,\mathrm{pa}(v)} \Sigma_{\mathrm{pa}(v),\mathrm{pa}(v)}^{-1} \Sigma_{\mathrm{pa}(v),v}; $$

compare, for instance, Richardson and Spirtes (2002, §8).
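Lemma 2.1 amounts to a least-squares regression of each node on its parents. The NumPy sketch below, with 0-based labels and a hypothetical helper `recover_lambda_omega`, builds the covariance matrix of a DAG model without latent variable from random parameters and recovers (Λ, Ω) exactly from it.

```python
import numpy as np

def recover_lambda_omega(sigma, edges):
    """Lemma 2.1: regress each node on its parents to recover (Lambda, Omega)
    from Sigma = (I - Lam^T)^{-1} Omega (I - Lam)^{-1}. Nodes are 0-based."""
    m = sigma.shape[0]
    lam, omega = np.zeros((m, m)), np.zeros(m)
    for v in range(m):
        pa = sorted(w for w, c in edges if c == v)
        if pa:
            coef = np.linalg.solve(sigma[np.ix_(pa, pa)], sigma[pa, v])
            lam[pa, v] = coef
            omega[v] = sigma[v, v] - sigma[v, pa] @ coef
        else:
            omega[v] = sigma[v, v]              # a source node: variance is omega_v
    return lam, omega

rng = np.random.default_rng(1)
m, edges = 4, [(0, 1), (0, 2), (1, 3), (2, 3)]
lam_true = np.zeros((m, m))
for w, v in edges:
    lam_true[w, v] = rng.normal()
omega_true = rng.uniform(0.5, 2.0, size=m)
B = np.linalg.inv(np.eye(m) - lam_true)                 # (I - Lam)^{-1}
sigma_dag = B.T @ np.diag(omega_true) @ B                # DAG covariance, no latent variable

lam_hat, omega_hat = recover_lambda_omega(sigma_dag, edges)
print(np.allclose(lam_hat, lam_true), np.allclose(omega_hat, omega_true))  # True True
```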
Let the random vector X and the latent variable L have their joint distribution specified via the equation system from (1.1). Write Σ_{|L} for the conditional covariance matrix of X given L. Then it holds that

$$ \Sigma_{|L} = (I_m - \Lambda^T)^{-1} \Omega (I_m - \Lambda)^{-1}. \tag{2.1} $$

Hence, by Lemma 2.1, when knowing Σ_{|L} we can uniquely solve for the pair (Λ, Ω), which are rational functions of Σ_{|L}. Writing Σ for the (unconditional) covariance matrix of X, we have from (1.2) that

$$ \Sigma = \Sigma_{|L} + (I_m - \Lambda^T)^{-1} \delta \delta^T (I_m - \Lambda)^{-1}. $$

Consequently, (Λ, Ω) can be recovered uniquely from Σ and (I_m − Λ^T)^{-1} δ. The results of Stanghellini and Wermuth (2005) then address identification of the vector (I_m − Λ^T)^{-1} δ, which holds the covariances between each coordinate of X and the latent variable L. We obtain the following observation.
Proposition 2.2. The model N*(G) of a DAG G = (V, E) is generically finitely identifiable if (i) every connected component of (G_{|L,cov})^c = (V, E^c_{|L,cov}) has an odd cycle, or (ii) every connected component of (G_con)^c = (V, E^c_con) has an odd cycle.

Proof. Theorem 1 in Stanghellini and Wermuth (2005) gives (i) or (ii) as a sufficient condition for identifying, up to sign, the m-vector (I_m − Λ^T)^{-1} δ when Σ = φ_G(Λ, Ω, δ) for a generic point (Λ, Ω, δ) in Θ. In this case, we can uniquely recover the conditional covariance matrix Σ_{|L} = Σ − (I_m − Λ^T)^{-1} δδ^T (I_m − Λ)^{-1}, then the pair (Λ, Ω) by (2.1) and Lemma 2.1, and finally δ up to sign. Hence the fibers of φ_G are generically finite.

We conclude this review of prior work by pointing out that any model N*(G) that can be determined to be generically finitely identifiable using Proposition 2.2 can also be found to have this property using our new Theorem 1.3.
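The recovery step in the proof of Proposition 2.2 can be traced numerically. In the sketch below we simply hand the true covariance vector c = (I_m − Λ^T)^{-1} δ, or its negative, to a hypothetical routine `recover_from_c`; we do not implement the identification of c itself, which is the content of Theorem 1 in Stanghellini and Wermuth (2005). Labels are 0-based and the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
m, edges = 4, [(0, 1), (1, 2)]
lam = np.zeros((m, m))
for w, v in edges:
    lam[w, v] = rng.normal()
omega, delta = rng.uniform(0.5, 2.0, size=m), rng.normal(size=m)
B = np.linalg.inv(np.eye(m) - lam)                       # (I - Lam)^{-1}
sigma = B.T @ (np.diag(omega) + np.outer(delta, delta)) @ B

def recover_from_c(sigma, c, edges):
    """Given Sigma and c = (I - Lam^T)^{-1} delta (possibly with its sign flipped),
    return (Lam, omega, delta) following the proof of Proposition 2.2."""
    m = sigma.shape[0]
    sigma_L = sigma - np.outer(c, c)                     # conditional covariance given L
    lam_hat, omega_hat = np.zeros((m, m)), np.zeros(m)
    for v in range(m):                                   # Lemma 2.1: regress on parents
        pa = sorted(w for w, u in edges if u == v)
        if pa:
            coef = np.linalg.solve(sigma_L[np.ix_(pa, pa)], sigma_L[pa, v])
            lam_hat[pa, v] = coef
            omega_hat[v] = sigma_L[v, v] - sigma_L[v, pa] @ coef
        else:
            omega_hat[v] = sigma_L[v, v]
    delta_hat = (np.eye(m) - lam_hat.T) @ c              # delta = (I - Lam^T) c
    return lam_hat, omega_hat, delta_hat

c = B.T @ delta                                          # Cov(X, L) = (I - Lam^T)^{-1} delta
for sign in (1, -1):
    lam_hat, omega_hat, delta_hat = recover_from_c(sigma, sign * c, edges)
    print(np.allclose(lam_hat, lam), np.allclose(omega_hat, omega),
          np.allclose(delta_hat, sign * delta))          # True True True (twice)
```

Flipping the sign of c leaves (Λ, Ω) unchanged and flips δ, which is exactly the 2-point fiber behind the generically 2-to-1 conclusion.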

Proposition 2.3. A DAG G = (V, E) satisfying either one of the conditions in Proposition 2.2 necessarily satisfies the condition in Theorem 1.3.
Proof. Let G_{|L,cov} = (V, E_{|L,cov}) and G_con = (V, E_con). An edge v → w ∈ E also appears as an undirected edge v − w in both E_{|L,cov} and E_con. Hence, when ignoring the directionality of its edges, G is a subgraph of both G_{|L,cov} and G_con, and thus G^c is a supergraph of both (G_{|L,cov})^c and (G_con)^c on the same node set. As such, if every connected component of (G_{|L,cov})^c, or of (G_con)^c, contains an odd cycle, the same is true of G^c.

Criteria based on the Jacobian of parametrization maps
In this section, we prove Theorems 1.3 and 1.4. Let G = (V, E) be a fixed DAG with m = |V| nodes, let Θ := R^E × diag^+_m × R^m, and let φ_G : Θ → PD_m be the parametrization (1.3) of the covariance matrix of the distributions in model N*(G). We begin by introducing other mappings that are generically finite-to-one if and only if φ_G is generically finite-to-one.
First, it will be helpful to study the map defined on Θ. Second, focusing on concentration instead of covariance matrices, we will also consider the maps Lemma 3.1. The parametrization φ G is generically finite-to-one if and only if any one of the mapsφ G , ϕ G andφ G is generically finite-to-one.
is a diffeomorphism that maps Θ to itself. By the chain rule, the Jacobian of φ G at (Λ, Ω, δ) is the product of the Jacobian ofφ G at g(Λ, Ω, δ) and the Jacobian of g at (Λ, Ω, δ). Now the latter matrix is invertible on all of Θ since g is a diffeomorphism. It follows that there exists a point in Θ at which the Jacobian of φ G has full column rank if and only if the same is true forφ G . For the Jacobian of a polynomial map such as φ G andφ G , full column rank at a single point implies generically full column rank; use the subdeterminants that characterize a drop in rank to define a proper algebraic subset of exceptions, see also Geiger et al. (2001, Lemma 9). The claim about φ G andφ G follows from Lemma 1.1.
In order to complete the proof of the lemma it suffices to show that φ G is generically finite-to-one if and only if the same holds for ϕ G . Define another diffeomorphism from Θ to itself as Writing inv for matrix inversion, we then have that Rao (1973, p. 33). Using (3.4), the equivalence of being generically finite-to-one for φ G and ϕ G may be argued similarly as for the maps considered earlier.
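Lemma 1.1 and the rank argument above suggest a simple numerical heuristic: compute the Jacobian of φ_G at one randomly drawn parameter and test whether it has full column rank 2m + |E|. A random point avoids any fixed proper algebraic subset with probability one, but a floating-point rank decision is not a proof; an exact computation over the rationals would be needed for a certificate. The NumPy sketch below differentiates (1.3) by hand, working with φ_G itself rather than the auxiliary maps of Lemma 3.1; the helper names and the tolerance `tol` are our assumptions.

```python
import numpy as np

def jacobian_phi(m, edges, lam, omega, delta):
    """Jacobian of phi_G at (Lam, Omega, delta). Rows: entries (i, j) with i <= j.
    Columns: lambda_e for e in edges, then omega_1..omega_m, then delta_1..delta_m."""
    B = np.linalg.inv(np.eye(m) - lam)                  # (I - Lam)^{-1}
    M = np.diag(omega) + np.outer(delta, delta)         # Omega + delta delta^T
    rows, cols = np.triu_indices(m)
    I = np.eye(m)
    derivs = []
    for w, v in edges:                                  # d/d lambda_{wv}
        dB = B @ np.outer(I[w], I[v]) @ B               # d(I - Lam)^{-1} = B dLam B
        derivs.append(dB.T @ M @ B + B.T @ M @ dB)
    for k in range(m):                                  # d/d omega_k
        derivs.append(B.T @ np.outer(I[k], I[k]) @ B)
    for k in range(m):                                  # d/d delta_k
        derivs.append(B.T @ (np.outer(I[k], delta) + np.outer(delta, I[k])) @ B)
    return np.column_stack([D[rows, cols] for D in derivs])

def jacobian_rank_at_random_point(m, edges, seed=0, tol=1e-8):
    """Numerical rank of the Jacobian at one random point of Theta, and 2m + |E|."""
    rng = np.random.default_rng(seed)
    lam = np.zeros((m, m))
    for w, v in edges:
        lam[w, v] = rng.normal()
    omega = rng.uniform(0.5, 2.0, size=m)
    delta = rng.normal(size=m)
    J = jacobian_phi(m, edges, lam, omega, delta)
    return int(np.linalg.matrix_rank(J, tol=tol)), J.shape[1]

print(jacobian_rank_at_random_point(4, [(0, 1)]))          # (9, 9) expected: full column rank
print(jacobian_rank_at_random_point(4, [(0, 1), (2, 3)]))  # rank at most 9 < 10 = 2m + |E|
```

The first DAG satisfies Theorem 1.3; the second violates condition (i) of Theorem 1.4, so its Jacobian is rank deficient everywhere.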
Let J(φ G ) be the Jacobian matrix of the mapφ G from (3.3). It will be examined to prove Theorem 1.3. In light of Lemmas 1.1 and 3.1, we will show that if G satisfies the condition in Theorem 1.3, then J(φ G ) is generically of full column rank, implying that φ G is generically finite-to-one. Our arguments will make use of the following lemma that rests on observations made in Vicard (2000).
Lemma 3.2. Let G = (V, E) be an undirected graph with V = {1, ..., m}, and define the polynomial map

$$ f_G : \mathbb{R}^m \to \mathbb{R}^E, \qquad x \mapsto (x_v x_w)_{v - w \in E}. $$

Then the Jacobian of f_G has generic rank m − d, where m = |V| is the number of nodes and d is the number of connected components of G that do not contain an odd cycle.

Proof. For simpler notation, let f := f_G. Let J_f be the Jacobian matrix of the polynomial map f, and let ker(J_f) be its kernel. By the rank theorem (Rudin, 1976, p. 229), the dimension of ker(J_f) is generically equal to the dimension of the fiber F_f; recall (1.4). Since rank(J_f) = m − dim(ker(J_f)), it suffices to show that F_f has generic dimension d.
Since the claim is about a generic property, we may restrict the domain of f to the open set X := (R \ {0})^m. This assumption is made so that Lemma 1 in Vicard (2000) is applicable later without difficulty. Now, fix a point y ∈ f(X) ⊂ R^E. The elements of the fiber F_f(y) are the vectors x ∈ R^m, or equivalently, x ∈ X, that are solutions to the system of equations

$$ x_v x_w = y_{vw}, \qquad v - w \in E. \tag{3.5} $$

Let G_1 = (V_1, E_1), ..., G_k = (V_k, E_k) be the connected components of G, so that V_1, ..., V_k form a partition of V and E_1, ..., E_k partition E. Let k′ ≤ k be the number of connected components containing at least two nodes. Without loss of generality, assume G_{k′+1}, ..., G_k are all the connected components with only a single node. Then the equations listed in (3.5) can be arranged to form k′ disjoint subsystems indexed by i = 1, ..., k′. The i-th subsystem has the form

$$ x_v x_w = y_{vw}, \qquad v - w \in E_i. \tag{3.6} $$

By Lemma 1 in Vicard (2000) and also the relevant discussion in the proof of Theorem 1 in the same paper, the solution set to (3.6) either contains two points or can be parametrized by a single free variable in R. The former case arises if and only if G_i contains an odd cycle. It follows that the dimension of the solution set of (3.6) is zero when G_i contains an odd cycle, and it has dimension one if G_i does not contain an odd cycle. In addition, each singleton component G_i, i = k′+1, ..., k, provides one additional dimension to the fiber F_f(y), since the corresponding variables in x are not restricted by any equations. We conclude that the dimension of F_f(y) equals the number of connected components G_i that do not contain an odd cycle.
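Lemma 3.2 is easy to probe numerically. The sketch below evaluates the Jacobian of x ↦ (x_v x_w)_{v−w∈E} at a random point for three small graphs on m = 4 nodes and compares its rank with m − d; it also exhibits the one-parameter family of solutions on a bipartite component that drives the proof. NumPy, 0-based labels and the helper names are our assumptions.

```python
import numpy as np

def f_G(und_edges, x):
    """The map of Lemma 3.2: x -> (x_v * x_w) for each edge v - w."""
    return np.array([x[v] * x[w] for v, w in und_edges])

def jacobian_f(m, und_edges, x):
    """Row for edge v - w has x_w in column v and x_v in column w."""
    J = np.zeros((len(und_edges), m))
    for r, (v, w) in enumerate(und_edges):
        J[r, v], J[r, w] = x[w], x[v]
    return J

rng = np.random.default_rng(3)
m = 4
x = rng.normal(size=m)
graphs = {                                  # d = components without an odd cycle
    "triangle + isolated node": [(0, 1), (1, 2), (0, 2)],       # d = 1
    "4-cycle": [(0, 1), (1, 2), (2, 3), (0, 3)],                 # d = 1
    "two disjoint edges": [(0, 1), (2, 3)],                      # d = 2
}
for name, es in graphs.items():
    print(name, np.linalg.matrix_rank(jacobian_f(m, es, x)))    # ranks 3, 3, 2 = m - d

# On a bipartite component the fiber is one-dimensional: scale one side by t, the other by 1/t.
t = 3.0
x_scaled = x * np.array([t, 1 / t, t, 1 / t])
print(np.allclose(f_G(graphs["4-cycle"], x_scaled), f_G(graphs["4-cycle"], x)))  # True
```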
We return to the object of study, namely, the map from (3.3), which sends the (2m + |E|)-dimensional set Θ = R^E × diag^+_m × R^m to the $\binom{m+1}{2}$-dimensional space of symmetric m × m matrices. Its Jacobian J(φ G) is of size $\binom{m+1}{2} \times (2m + |E|)$, and we index its rows by pairs (v, w) with 1 ≤ v ≤ w ≤ m, recalling that in Section 1 we assumed the vertex set V = {1, ..., m} to be topologically ordered. We now describe a particular way of arranging the rows and columns of J(φ G).
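Define the set of "non-edges" as N := {v → w : 1 ≤ v < w ≤ m, v → w ∉ E}, and let D := {(v, v) : v ∈ V} index the diagonal entries, so that D, E and N together index all entries in the upper triangular half of an m × m symmetric matrix. The rows of J(φ G) are now arranged in the order D, E and N. The columns of J(φ G) are indexed such that partial derivatives with respect to the free input variables in the triple (Λ, Ψ, γ) appear from left to right, in the order Ψ, Λ and γ. In other words, writing J := J(φ G), we partition J into 9 blocks as follows:

$$ J = \begin{pmatrix} [J]_{D,\Psi} & [J]_{D,\Lambda} & [J]_{D,\gamma} \\ [J]_{E,\Psi} & [J]_{E,\Lambda} & [J]_{E,\gamma} \\ [J]_{N,\Psi} & [J]_{N,\Lambda} & [J]_{N,\gamma} \end{pmatrix}. $$

The following lemma is obtained by inspection of the partial derivatives of the map from (3.3). Its proof appears in Appendix A.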
Lemma 3.3. The Jacobian matrix J(φ G ) is generically of full column rank provided that the submatrix [J(φ G )] N,γ is so.
We now give the proof of Theorem 1.3.
Proof of Theorem 1.3. By Lemmas 1.1 and 3.3, it suffices to show that [J(φ G)]_{N,γ} is generically of full column rank. For each v → w ∈ N, the corresponding coordinate of the map from (3.3) is given by (3.8). Note that only the rightmost term in (3.8) contributes to the partial derivatives with respect to γ = (γ_v)_{v∈{1,...,m}}.
Ignoring the directionality of non-edges in N, define the undirected graph H = (V, N), to which we associate a map f_H as in Lemma 3.2. Then [J(φ G)]_{N,γ} = −J_{f_H}(γ). But J_{f_H} has generically full column rank by Lemma 3.2, because H is equal to the complementary graph G^c, all of whose connected components are assumed to contain an odd cycle; hence d = 0, and the generic rank of J_{f_H} is m, the number of its columns.
We remark that Theorem 1.3 can also be proven by studying the Jacobian of the mapφ G from (3.1). We chose to work withφ G above since this allowed us to avoid consideration of the inverse of the matrix I m − Λ. For Theorem 1.4, however, we consider bothφ G andφ G .
Proof of Theorem 1.4. We first prove the necessity of condition (i) by showing that if |E_con| − |E| < d_con, then the Jacobian matrix J(φ G) has rank less than 2m + |E| at every point of Θ. Hence it cannot be of full column rank, which by Lemma 1.1 implies the failure of generic finite identifiability.
As in the proof of Theorem 1.3, we consider the set of non-edges N, which we now partition as N = N_1 ∪ N_2, where N_1 = {v → w ∈ N : v − w ∈ E_con} and N_2 = N \ N_1. Since every edge of G also belongs to E_con, we have |N_1| = |E_con| − |E|. Accordingly, writing J := J(φ G), we can partition the submatrix [J]_{N,{Ψ,Λ,γ}} into two blocks of rows indexed by N_1 and N_2 as

$$ [J]_{N,\{\Psi,\Lambda,\gamma\}} = \begin{pmatrix} [J]_{N_1,\{\Psi,\Lambda\}} & [J]_{N_1,\gamma} \\ [J]_{N_2,\{\Psi,\Lambda\}} & [J]_{N_2,\gamma} \end{pmatrix} = \begin{pmatrix} [J]_{N_1,\{\Psi,\Lambda\}} & [J]_{N_1,\gamma} \\ 0 & [J]_{N_2,\gamma} \end{pmatrix}. $$

To see that the submatrix [J]_{N_2,{Ψ,Λ}} = 0, observe first that an entry of (I − Λ)Ψ(I − Λ^T) is the zero polynomial if and only if the same is true for Σ_{|L}^{-1}, where Σ_{|L} is the matrix from (2.1). Second, by definition of E_con and N_2, if (v, w) ∈ N_2 then (Σ_{|L}^{-1})_{vw} ≡ 0. Next, observe that to prove the necessity of condition (i) it suffices to show that the rank of [J]_{N_2,γ} cannot be larger than m − d_con. Indeed, if this is true, then, because every row indexed by N_2 has the form (0, [J]_{N_2,γ}), there exists a subset N′_2 ⊆ N_2 with |N′_2| ≤ m − d_con such that the submatrix [J]_{{D,E,N_1,N′_2},{Ψ,Λ,γ}} has the same rank as the original Jacobian matrix J. However, this submatrix has at most m + |E| + (|E_con| − |E|) + (m − d_con) = 2m + |E_con| − d_con rows, and thus its rank is less than 2m + |E| because, when condition (i) fails, we have |E_con| − |E| < d_con. As a result, J cannot be of full column rank.
It now remains to show that [J(φ G )] N2,γ has rank at most m − d con . Observe that the undirected graph (V, N 2 ) is equal to the complementary graph (G con ) c . Moreover, [J(φ G )] N2,γ is equal to the negative Jacobian of the map f (Gcon) c that we get by applying the construction from Lemma 3.2 to (G con ) c ; recall the proof of Theorem 1.3. Applying Lemma 3.2, we find that [J(φ G )] N2,γ has generic rank m − d con , which is also the maximal rank that [J(φ G )] N2,γ may have.

Algebraic computations and examples
As explained in Drton (2006, §3) and Garcia-Puente et al. (2010), identifiability properties of a model such as N*(G) can be decided using Gröbner basis techniques from computational algebraic geometry (Cox et al., 2007). While these techniques are tractable only for small to moderate size problems, we were able to perform an exhaustive algebraic study of all DAGs G = (V, E) with m ≤ 6 nodes. Beyond a mere decision on whether the parametrization map φ_G is generically finite-to-one, the algebraic methods also provide information about the generic cardinality of the fibers of φ_G as a map defined on complex space.
Definition 4.1. For a DAG G = (V, E), let φ C G be the map obtained by extending φ G to the complex domain C 2m+|E| . If the (complex) fibers of φ C G are generically of cardinality k, then we say that φ C G is generically k-to-one. The language of Definition 4.1 allows us to give a refined classification of DAGs G in terms of the identifiability properties of the parametrization of model N * (G). Indeed, N * (G) is generically finitely identifiable if and only if φ C G is generically k-to-one for some k < ∞.
Remark. The generic size of the fibers of φ^C_G equals the generic size of the fibers of the complex extensions of the three maps from Lemma 3.1. The mapφ G has low degree coordinates and tends to be the easiest to work with in algebraic computation. Another approach that can be useful is to adapt the algorithm described in Section 8 of the supplementary material for Foygel et al. (2012). To do this, note that for Λ ∈ C^E there exist complex choices of Ω and δ such that φ_G(Λ, Ω, δ) = Σ if and only if (I − Λ^T)Σ(I − Λ) is the sum of a diagonal matrix, namely Ω, and a symmetric matrix of rank 1, namely δδ^T. Whether a matrix is of the latter type can be tested using tetrads, that is, 2 × 2 subdeterminants involving only off-diagonal entries of the matrix; see also (5.4) below. The tetrads of a matrix form a Gröbner basis (de Loera et al., 1995; Drton et al., 2007).

Table 1 lists the counts of DAGs G = (V, E) with 4 ≤ m ≤ 6 nodes that have φ^C_G generically k-to-one, for all possible values of k. The table also gives the counts of DAGs satisfying the conditions in Theorems 1.3 and 1.4 as well as Proposition 2.2. DAGs with $\binom{m+1}{2} - 2m < |E|$, which trivially give generically ∞-to-one maps φ^C_G in view of Corollary 1.2, are excluded. We emphasize that the counts are with respect to unlabeled DAGs, that is, all DAGs that are isomorphic with respect to relabeling of nodes are counted as one unlabeled graph.
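The tetrad test in the Remark can be sketched numerically. Below, for a DAG with edges 1 → 2 and 2 → 4 on m = 5 nodes (0-based in the code), the matrix (I − Λ^T)Σ(I − Λ) has vanishing tetrads when Λ is the true coefficient matrix and not when one coefficient is perturbed. This is only a floating-point illustration under our own choices of names and tolerances, not the Gröbner basis computation used for Table 1.

```python
import numpy as np
from itertools import combinations

def tetrads(U):
    """Two tetrads per quadruple i < j < k < l (the third is minus their sum)."""
    vals = []
    for i, j, k, l in combinations(range(U.shape[0]), 4):
        vals.append(U[i, j] * U[k, l] - U[i, k] * U[j, l])
        vals.append(U[i, k] * U[j, l] - U[i, l] * U[j, k])
    return np.array(vals)

rng = np.random.default_rng(4)
m = 5
lam = np.zeros((m, m))
lam[0, 1], lam[1, 3] = rng.normal(size=2)            # edges 1 -> 2 and 2 -> 4 (1-based)
omega, delta = rng.uniform(0.5, 2.0, size=m), rng.normal(size=m)
B = np.linalg.inv(np.eye(m) - lam)
sigma = B.T @ (np.diag(omega) + np.outer(delta, delta)) @ B

def max_tetrad(lam_candidate):
    """Largest |tetrad| of (I - Lam^T) Sigma (I - Lam) for a candidate Lambda."""
    A = np.eye(m) - lam_candidate
    return np.max(np.abs(tetrads(A.T @ sigma @ A)))

print(max_tetrad(lam) < 1e-10)                      # True: the matrix is Omega + delta delta^T
wrong = lam.copy()
wrong[0, 1] += 0.5
print(max_tetrad(wrong) > 1e-6)                      # True for this draw: not of that form
```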
In the considered settings, the condition in Theorem 1.3 is very successful in certifying DAGs with a generically finitely identifiable model. For instance, when m = 6, it correctly identifies 2957 out of 3344 such graphs. The previously known sufficient condition of Stanghellini and Wermuth (2005) identifies 985 of them. Our necessary condition in Theorem 1.4 is also useful in assessing graphs that give generically infinite-to-one models. For instance, when m = 6, we find that 361 of 552 such graphs violate the condition; recall the example from Figure 1.2. While, by Proposition 2.3, our sufficient condition in Theorem 1.3 is stronger than that in Proposition 2.2 for generic finite identifiability, the latter condition, due to Stanghellini and Wermuth (2005), in fact implies that φ^C_G is generically 2-to-one. For m = 5, there are 6 DAGs that satisfy the condition in Theorem 1.3 but give generically 4-to-one maps φ^C_G. The graph from Figure 1.1 is an example. We note that for this graph G the fibers of φ^C_G intersect the statistically relevant set Θ in either 2 or 4 points, and both possibilities do occur.

Table 1. Total # of DAGs: 6 (m = 4), 115 (m = 5), 3896 (m = 6).

Subgraph extension
This section concerns results on how we can extend knowledge about identifiability of an induced subgraph to that of the original DAG. We recall standard terminology in graphical modeling. For a given DAG G = (V, E), we write pa(v) = {w : w → v ∈ E} for the parent set of the node v, and ch(v) = {w : v → w ∈ E} for the child set of v. If for some node s ∈ V there does not exist a node s′ ∈ V with s → s′ ∈ E, then s is a sink node. If there is no other node s′ ∈ V with s′ → s ∈ E, then s is a source node. The following theorem is the main result of this section. Recall that in Table 1 there are 3344 − 2957 = 387 DAGs with m = 6 nodes that are generically finitely identifiable but do not satisfy our sufficient condition from Theorem 1.3. Theorem 5.1 provides a way to certify identifiability of models falling within this "gap", provided that we know which DAGs on m = 5 nodes are generically finitely identifiable. For instance, from our algebraic computations we know that there are 95 − 88 = 7 DAGs on m = 5 nodes that are generically finitely identifiable but cannot be proven to be so by Theorem 1.3. Of the 387 aforementioned DAGs on 6 nodes, 194 can be proven to be generically finitely identifiable by using the knowledge about these 7 graphs on m = 5 nodes and applying Theorem 5.1. We remark that if a DAG satisfies the condition in Theorem 1.3, then the supergraph obtained by adding a sink (source) node that does not have every other node as its parent (child) also satisfies the condition in Theorem 1.3. Hence, given the current state of the art, Theorem 5.1 is useful primarily as a tool to reduce the identifiability problem to smaller subgraphs that may then be tackled by algebraic methods.
Definition 5.1. A symmetric matrix Υ ∈ R m×m of size m ≥ 3 is a Spearman matrix if Υ = Ω + δδ T for a diagonal matrix Ω with positive diagonal and a vector δ with no zero elements.
Any Spearman matrix Υ is positive definite, and it is not difficult to show that if Υ = Ω + δδ^T is Spearman with m ≥ 3, then the two summands Ω and δδ^T are uniquely determined as rational functions of Υ. Moreover, δδ^T determines δ up to sign change. For these facts see, for instance, Theorem 5.5 in Anderson and Rubin (1956). We term Ω the diagonal component of Υ, and δδ^T the rank-1 component. The following theorem gives an implicit characterization of Spearman matrices of size m ≥ 4.
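Theorem 5.2. A positive definite symmetric matrix Υ = (υ_ij) ∈ R^{m×m} of size m ≥ 4 is a Spearman matrix if and only if, after sign changes of rows and corresponding columns, all its elements are positive and such that

$$ \upsilon_{ij}\upsilon_{kl} - \upsilon_{ik}\upsilon_{jl} = \upsilon_{ik}\upsilon_{jl} - \upsilon_{il}\upsilon_{jk} = \upsilon_{il}\upsilon_{jk} - \upsilon_{ij}\upsilon_{kl} = 0 \quad \text{for all } 1 \le i < j < k < l \le m, \tag{5.2} $$

$$ \upsilon_{ii}\,\upsilon_{jk} > \upsilon_{ij}\,\upsilon_{ik} \quad \text{for all pairwise distinct } i, j, k. \tag{5.3} $$

This is essentially the same as Theorem 1 in Bekker and de Leeuw (1987), to which the reader is referred for a proof. Unlike Bekker and de Leeuw (1987), we have a strict inequality in (5.3) since in Definition 5.1 we require the diagonal component of a Spearman matrix to be strictly positive.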
The three polynomial expressions in (5.2) are the 2 × 2 off-diagonal minors of the matrix Υ, which are also known as tetrads in the literature. We call the quadruple i < j < k < l the indices of the tetrad they define. Note that

$$ (\upsilon_{ij}\upsilon_{kl} - \upsilon_{ik}\upsilon_{jl}) + (\upsilon_{ik}\upsilon_{jl} - \upsilon_{il}\upsilon_{jk}) + (\upsilon_{il}\upsilon_{jk} - \upsilon_{ij}\upsilon_{kl}) = 0, $$

so that the three tetrads in (5.2) are algebraically dependent. In general, a symmetric m × m matrix Υ has $2\binom{m}{4}$ algebraically independent tetrads, and we write TETRADS(Υ) to denote a column vector comprising a choice of $2\binom{m}{4}$ algebraically independent tetrads.
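Two facts used around Theorem 5.2 can be checked numerically: the three tetrads of any quadruple sum to zero, and for a Spearman matrix the rank-1 and diagonal components are rational functions of the entries. The second fact uses δ_a^2 = υ_ab υ_ac / υ_bc for pairwise distinct a, b, c, which follows directly from υ_ab = δ_a δ_b for a ≠ b. The NumPy sketch below takes δ positive so that no sign changes are needed; the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(5)
m = 5
S = rng.normal(size=(m, m))
U = S + S.T                                              # an arbitrary symmetric matrix
i, j, k, l = 0, 1, 2, 3
t1 = U[i, j] * U[k, l] - U[i, k] * U[j, l]
t2 = U[i, k] * U[j, l] - U[i, l] * U[j, k]
t3 = U[i, l] * U[j, k] - U[i, j] * U[k, l]
print(np.isclose(t1 + t2 + t3, 0.0))                     # True: the three tetrads are dependent

# A Spearman matrix with positive delta, so no sign changes are needed.
omega = rng.uniform(0.5, 2.0, size=m)
delta = rng.uniform(0.5, 1.5, size=m)
ups = np.diag(omega) + np.outer(delta, delta)

# delta_a^2 = u_ab u_ac / u_bc with b = a+1, c = a+2 (mod m), pairwise distinct for m = 5.
d2 = np.array([ups[a, (a + 1) % m] * ups[a, (a + 2) % m] / ups[(a + 1) % m, (a + 2) % m]
               for a in range(m)])
print(np.allclose(d2, delta ** 2))                       # True: rank-1 component recovered
print(np.allclose(np.diag(ups) - d2, omega))             # True: diagonal component recovered
print(np.all(np.diag(ups) > d2))                         # True: strict inequality, omega > 0
```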
For each triple (Λ, Ω, δ) ∈ Θ \ Ξ that solves (5.1), the matrix Λ must solve the system of tetrad equations (5.4). Together with the uniqueness of the diagonal and rank-1 components of a Spearman matrix, this shows that if only finitely many Λ's solve the system (5.4), then the model N*(G) is generically finitely identifiable. Our proof of Theorem 5.1(i) follows this approach. Alternatively, based on Lemma 3.1, we can also prove generic finite identifiability by considering the map ϕ_G from (3.2). We then need to show that there exists a proper algebraic subset Ξ ⊂ R^{2m+|E|} such that |F_{ϕ_G|Θ\Ξ}(θ_0)| < ∞ for all θ_0 = (Λ_0, Ψ_0, γ_0) ∈ Θ \ Ξ, or equivalently, that the system (5.5) has finitely many solutions for (Λ, Ψ, γ) in Θ \ Ξ. Again we assume that Ξ is defined to avoid issues due to zeros, that is, every triple (Λ, Ψ, γ) ∈ Θ \ Ξ has γ_i ≠ 0 for all i = 1, ..., m. We introduce the term coSpearman matrix to describe the matrix on the right-hand side of (5.5).
Definition 5.2. A symmetric matrix $\Upsilon \in \mathbb{R}^{m\times m}$ of size $m \geq 3$ is a coSpearman matrix if $\Upsilon = \Psi - \gamma\gamma^T$ for a diagonal matrix $\Psi$ with positive diagonal and a vector $\gamma$ with no zero elements.
Again, the diagonal component $\Psi$ and the rank-1 component $\gamma\gamma^T$ are uniquely determined by $\Upsilon$; compare Stanghellini (1997, p. 243). The following theorem is analogous to Theorem 5.2.

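Theorem 5.3. A symmetric matrix $\Upsilon = (\upsilon_{ij}) \in \mathbb{R}^{m\times m}$ of size $m \geq 4$ is a coSpearman matrix if and only if, after sign changes of rows and corresponding columns, all its non-diagonal elements are negative and such that
$$\upsilon_{ij}\upsilon_{kl} - \upsilon_{ik}\upsilon_{jl} = \upsilon_{ik}\upsilon_{jl} - \upsilon_{il}\upsilon_{jk} = \upsilon_{il}\upsilon_{jk} - \upsilon_{ij}\upsilon_{kl} = 0 \quad \text{for all } 1 \leq i < j < k < l \leq m, \tag{5.6}$$
$$\upsilon_{ii}\upsilon_{jk} - \upsilon_{ij}\upsilon_{ik} < 0 \quad \text{for all pairwise distinct } i, j, k. \tag{5.7}$$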
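One way to see why coSpearman matrices arise, offered here as background rather than as the definition of $\phi_G$ in (3.2), is that the inverse of a Spearman matrix is coSpearman. By the Sherman-Morrison formula, $(\Omega + \delta\delta^T)^{-1} = \Psi - \gamma\gamma^T$ with $\Psi = \Omega^{-1}$ and $\gamma = \Omega^{-1}\delta / \sqrt{1 + \delta^T\Omega^{-1}\delta}$. A small numerical check, with the hypothetical helper name `cospearman_from_spearman`:

```python
import numpy as np


def cospearman_from_spearman(omega, delta):
    """Return (Psi, gamma) with (Omega + delta delta^T)^{-1} = Psi - gamma gamma^T.

    Sherman-Morrison: (Omega + d d^T)^{-1}
        = Omega^{-1} - Omega^{-1} d d^T Omega^{-1} / (1 + d^T Omega^{-1} d).
    """
    omega = np.asarray(omega, dtype=float)
    delta = np.asarray(delta, dtype=float)
    c = 1.0 + np.sum(delta**2 / omega)          # 1 + delta^T Omega^{-1} delta > 1
    gamma = (delta / omega) / np.sqrt(c)        # nonzero wherever delta is nonzero
    return np.diag(1.0 / omega), gamma


omega = np.array([1.0, 2.0, 0.5, 1.5])
delta = np.array([1.0, -2.0, 0.5, 3.0])
Psi, gamma = cospearman_from_spearman(omega, delta)
Upsilon = np.linalg.inv(np.diag(omega) + np.outer(delta, delta))
print(np.allclose(Upsilon, Psi - np.outer(gamma, gamma)))  # True
```

Since the off-diagonal entries of $\Psi - \gamma\gamma^T$ are $-\gamma_i\gamma_j$, flipping the signs of rows and columns according to $\mathrm{sign}(\gamma)$ makes all of them negative, as required in the theorem.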
Using the tetrad characterizations (5.6) and the uniqueness of the diagonal and rank-1 components, one can now demonstrate that the restricted map $\phi_G|_{\Theta\setminus\Xi}$ has finite fibers by showing that the corresponding system (5.8) of tetrad equations admits only finitely many solutions for $\Lambda$ when $\theta_0 \in \Theta \setminus \Xi$. The finiteness of the set of solutions in $\Lambda$ of the system (5.4), or of (5.8), is a sufficient condition for the generic finite identifiability of N*(G). It is, however, not obvious that these two systems necessarily have finitely many solutions when N*(G) is generically finitely identifiable. The following lemma states that such a converse does hold for two types of DAGs whose generic finite identifiability can be easily checked by Theorem 1.3. Recall that the notation "$\subsetneq$" means "is a proper subset of".
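For a single unknown coefficient, the finiteness of the solution set can be seen directly. The following sketch uses SymPy with exact rational arithmetic on a hypothetical example: four observed nodes, the single edge $1 \to 2$ with true coefficient $\lambda_{12} = 1/2$, and arbitrary values of $\Omega_0$ and $\delta_0$. It assumes the covariance parametrization $\Sigma_0 = (I_m - \Lambda_0)^{-T}(\Omega_0 + \delta_0\delta_0^T)(I_m - \Lambda_0)^{-1}$ and forms the tetrads of $(I_m - \Lambda)^T\Sigma_0(I_m - \Lambda)$, which are polynomials in the unknown $\lambda_{12}$.

```python
import sympy as sp

m = 4
lam = sp.Symbol("lam_12")            # unknown coefficient of the edge 1 -> 2
lam0 = sp.Rational(1, 2)             # its "true" value in this hypothetical example
omega0 = [1, 2, 3, 4]
delta0 = sp.Matrix([1, 2, 3, 4])


def i_minus_lambda(value):
    """I_m - Lambda for the graph whose only edge is 1 -> 2 (0-based entry [0, 1])."""
    B = sp.eye(m)
    B[0, 1] = -value
    return B


# Sigma_0 = (I - Lam0)^{-T} (Omega_0 + delta_0 delta_0^T) (I - Lam0)^{-1}, computed exactly.
A0 = i_minus_lambda(lam0).inv()
Sigma0 = A0.T * (sp.diag(*omega0) + delta0 * delta0.T) * A0

# S(lam) = (I - Lam)^T Sigma_0 (I - Lam); its tetrads are polynomials in lam.
B = i_minus_lambda(lam)
S = B.T * Sigma0 * B
t1 = sp.expand(S[0, 1] * S[2, 3] - S[0, 2] * S[1, 3])
t2 = sp.expand(S[0, 2] * S[1, 3] - S[0, 3] * S[1, 2])

solutions = sp.solve(t1, lam)
print(solutions)                                # [1/2]
print([t2.subs(lam, s) for s in solutions])     # [0]
```

In this example the first tetrad reduces to $6 - 12\lambda_{12}$, so $\lambda_{12} = 1/2$ is its only root, while the second tetrad does not depend on $\lambda_{12}$ and vanishes identically.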
The proof of Lemma 5.4 is deferred to Appendix A.
Proof of Theorem 5.1. We will first prove (i), which uses Lemma 5.4(i). The proof of (ii) will follow from similar reasoning using Lemma 5.4(ii). Without loss of generality, assume that the sink node is $s = m$, by giving the nodes a new topological order if necessary. Define two DAGs as follows. First, let $G_1 = (V_1, E_1)$ be the subgraph of $G$ induced by the set $V_1 = V \setminus \{m\} = [m-1]$, where we adopt the shorthand $[k] := \{1, \ldots, k\}$, $k \in \mathbb{N}$. Second, let $G_2 = (V, E \setminus E_1)$ be the graph on $V$ obtained from $G$ by removing all edges that do not have the sink node $m$ as their head. As before, let $\Theta := \mathbb{R}^E \times \mathrm{diag}^+_m \times \mathbb{R}^m$. We will construct a proper algebraic subset $\Xi$ such that for any $\theta \in \Theta \setminus \Xi$, the fiber $F_{\varphi_G|_{\Theta\setminus\Xi}}(\theta)$ is finite. Then Lemma 1.1(ii) applies and yields the assertion of Theorem 5.1(i).
The proof of (ii) is analogous, and we only give a sketch. Instead of considering $\varphi_G$, we turn to $\phi_G$, which also has domain $\Theta$. Without loss of generality, we let the source node be $s = 1$. We then define $G_1 = (V_1, E_1)$ to be the subgraph of $G$ that is induced by $V_1 = \{2, \ldots, m\}$, and we let $G_2 = (V, E \setminus E_1)$. We consider the parametrization $\phi_{G_1}$ with domain $\Theta_1 = \mathbb{R}^{E_1} \times \mathrm{diag}^+_{m-1} \times \mathbb{R}^{m-1}$. By assumption, N*(G_1) is generically finitely identifiable, so by Lemma 1.1(ii) there exists a proper algebraic subset $\Xi'_1$ such that $\phi_{G_1}|_{\Theta_1\setminus\Xi'_1}$ has finite fibers. On the other hand, for any $(\Lambda, \Psi, \gamma) \in \Theta$, write $\lambda_{E_1} := (\lambda_{vw})^T_{(v,w)\in E_1}$ and $\lambda_{1,\mathrm{ch}(1)} := (\lambda_{1v})^T_{v\in\mathrm{ch}(1)}$. Then the tetrad equations with one index equal to $s = 1$ yield an equation system, and part (ii) of Lemma 5.4 can be applied to show that the matrix $C(\lambda_{E_1}, \phi_G(\Lambda, \Psi, \gamma))$ appearing in it is of full rank outside some proper algebraic subset $\Xi_2$. We may then define a set $\Xi$ as in the proof of part (i) and use arguments similar to the ones above to prove part (ii) of our theorem.
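The two graph constructions used in this proof can be sketched as follows. The helper `split_at` is hypothetical; it takes a node that is a sink (as in part (i)) or a source (as in part (ii)) and returns $G_1$, the subgraph induced by the remaining nodes, and $G_2$, which keeps all nodes but only the edges incident to that node.

```python
def split_at(node, nodes, edges):
    """Return (G1, G2) for a sink or source `node`, as in the proof of Theorem 5.1.

    G1 is the subgraph induced by all other nodes; G2 keeps every node but only
    the edges incident to `node` (all pointing into a sink, or out of a source).
    """
    nodes, edges = set(nodes), set(edges)
    is_sink = all(v != node for (v, w) in edges)
    is_source = all(w != node for (v, w) in edges)
    if not (is_sink or is_source):
        raise ValueError("node must be a sink or a source")
    v1 = nodes - {node}
    e1 = {(v, w) for (v, w) in edges if v in v1 and w in v1}
    return (v1, e1), (nodes, edges - e1)


nodes = {1, 2, 3, 4}
edges = {(1, 2), (1, 3), (2, 4), (3, 4)}
print(split_at(4, nodes, edges))  # part (i): sink m = 4
print(split_at(1, nodes, edges))  # part (ii): source s = 1
```

Together, $E_1$ and the edge set of $G_2$ partition $E$, which is what allows the parameters of $G_1$ and the coefficients on the edges at the split node to be treated separately.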

Discussion
In this paper we studied identifiability of directed Gaussian graphical models with one latent variable that is a common cause of all observed variables. To our knowledge, the best criteria to decide on identifiability of such models are those given by Stanghellini and Wermuth (2005), who consider a more general setup of Gaussian graphical models with one latent variable. Their results provide a sufficient condition for the strictest notion of identifiability that is meaningful in this context, namely, whether the parametrization map is generically 2-to-one. Recall that the coefficients associated with the edges pointing from the latent variable to the observables can only be recovered up to a common sign change, so a generically one-to-one parametrization is not possible.
In our work, we take a different approach and study the Jacobian matrix of the parametrization, which leads to graphical criteria to check whether the parametrization is generically finite-to-one. Our sufficient condition, which is stated as Theorem 1.3, covers all graphs that can be shown to have a 2-to-one parametrization by the conditions of Stanghellini and Wermuth (2005). Moreover, it covers far more graphs, as shown by the computational experiments in Section 4. Our Theorem 1.4 describes a complementary necessary condition.
By studying tetrad equations, we also give a criterion that allows one to deduce identifiability of certain graphs from identifiability of subgraphs (Theorem 5.1). This result is stated for generic finite identifiability, but, as is clear from the proof, the same argument also shows that the parametrization of a graph is generically 2-to-one provided the involved subgraph has a generically 2-to-one parametrization.
The extension result from Theorem 5.1 can be used in conjunction with the results of the algebraic computations in Section 4. These computations solve the identifiability problem for graphs with up to 6 observed nodes. In particular, we confirm that the sufficient conditions of Stanghellini and Wermuth (2005) are not necessary for the parametrization map to be generically 2-to-one, and we provide examples of graphs whose parametrization is generically finite-to-one but not generically 2-to-one.
As mentioned above, we studied models with one latent source 0 that is connected to all nodes that represent observed variables. However, the graphical criteria in Theorems 1.3 and 1.4 can be readily extended to models with some of these factor loading edges missing. In the previously used notation, we describe such models as follows. Let $G = (V, E)$ be a DAG with vertex set of size $m = |V|$; these vertices index the observed variables. Let $V' \subset V$ be the set of nodes representing observed variables that do not directly depend on the latent variable. Then only the edges $0 \to v$ with $v \in V \setminus V'$ are added when forming the extended DAG from $G$. The parametrization of the Gaussian graphical model determined by $G$ and $V'$ is the restriction of $\varphi_G$ from (1.3) to the domain
$$\Theta(V') := \{(\Lambda, \Omega, \delta) \in \mathbb{R}^E \times \mathrm{diag}^+_m \times \mathbb{R}^m : \delta_v = 0 \text{ for all } v \in V'\}.$$
When the parametrization maps considered in Lemma 3.1 are restricted to the same domain, the assertion of that lemma still holds. The corresponding identifiability results, which are in the spirit of Corollary 1 in Grzebyk et al. (2004), are stated below. A brief outline of their proofs is given in Appendix A.
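A minimal sketch of the restricted parametrization, assuming the covariance form $\Sigma = (I_m - \Lambda)^{-T}(\Omega + \delta\delta^T)(I_m - \Lambda)^{-1}$ and 0-based node indices; the helper name `restricted_covariance` is hypothetical. Rather than shrinking the parameter vector, it overwrites the loadings indexed by $V'$ with zeros, which evaluates $\varphi_G$ at a point of $\Theta(V')$.

```python
import numpy as np


def restricted_covariance(Lam, omega, delta, v_prime):
    """Evaluate phi_G on Theta(V'): the loadings delta_v are set to 0 for v in V'.

    Assumes Sigma = (I - Lam)^{-T} (Omega + delta delta^T) (I - Lam)^{-1}; indices are 0-based.
    """
    delta = np.array(delta, dtype=float)   # copy, so the caller's vector is untouched
    for v in v_prime:
        delta[v] = 0.0
    m = len(omega)
    A = np.linalg.inv(np.eye(m) - np.asarray(Lam, dtype=float))
    return A.T @ (np.diag(omega) + np.outer(delta, delta)) @ A


S = restricted_covariance(np.zeros((3, 3)), [1.0, 2.0, 3.0], [1.0, 1.0, 1.0], {2})
print(S)  # node 2 (0-based) is uncorrelated with the others in this edgeless example
```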
Theorem 6.1 (Sufficient condition). Let $G = (V, E)$ be a DAG, and let $V' \subset V$. If every connected component of $(G^c)_{V\setminus V'}$, the subgraph of $G^c$ induced by $V \setminus V'$, contains an odd cycle, then the parametrization map $\varphi_G$ is generically finite-to-one when restricted to the domain $\Theta(V')$.
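The condition of Theorem 6.1 is easy to check mechanically, because a connected graph contains an odd cycle exactly when it is not bipartite. The sketch below assumes, as in Theorem 1.3, that $G^c$ denotes the complement of the undirected skeleton of $G$; the function name `satisfies_theorem_6_1` is hypothetical. It builds the complement on $V \setminus V'$ and 2-colors each component by breadth-first search.

```python
from collections import deque
from itertools import combinations


def satisfies_theorem_6_1(nodes, edges, v_prime):
    """True if every connected component of G^c restricted to V minus V' has an odd cycle.

    Assumption: G^c is the complement of the undirected skeleton of G.
    """
    keep = sorted(set(nodes) - set(v_prime))
    skeleton = {frozenset(e) for e in edges}
    adj = {v: [] for v in keep}
    for v, w in combinations(keep, 2):
        if frozenset((v, w)) not in skeleton:
            adj[v].append(w)
            adj[w].append(v)
    color = {}
    for start in keep:
        if start in color:
            continue
        # Breadth-first 2-coloring of the component containing `start`.
        color[start] = 0
        queue = deque([start])
        bipartite = True
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in color:
                    color[w] = 1 - color[v]
                    queue.append(w)
                elif color[w] == color[v]:
                    bipartite = False  # an edge inside one color class closes an odd cycle
        if bipartite:
            return False  # this component has no odd cycle
    return True


print(satisfies_theorem_6_1([1, 2, 3, 4], [(1, 2)], []))                  # True
print(satisfies_theorem_6_1([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)], []))  # False
```

With $V' = \emptyset$ this is the check of Theorem 1.3; in the second example the complement of the path $1 \to 2 \to 3 \to 4$ is itself a path, so it has no odd cycle.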
The necessary condition given next makes reference to the graphs $G_{\mathrm{con}}$ and $G_{|L,\mathrm{cov}}$ that were defined in the introduction.
Theorem 6.2 (Necessary condition). Let $G = (V, E)$ be a DAG, and let $V' \subset V$. In order for the restriction of $\varphi_G$ to the domain $\Theta(V')$ to be generically finite-to-one, it is necessary that the following two conditions both hold:
(i) Let $\tilde G^c_{\mathrm{con}} = (V \setminus V', \tilde E_{\mathrm{con}})$ be the subgraph of $G^c_{\mathrm{con}}$ induced by $V \setminus V'$. If $d_{\mathrm{con}}$ is the number of connected components in the graph $\tilde G^c_{\mathrm{con}}$ that do not contain any odd cycle, then $|\tilde E_{\mathrm{con}}| - |E| \geq d_{\mathrm{con}}$.
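(ii) Let $\tilde G^c_{|L,\mathrm{cov}} = (V \setminus V', \tilde E_{|L,\mathrm{cov}})$ be the subgraph of $G^c_{|L,\mathrm{cov}}$ induced by $V \setminus V'$. If $d_{\mathrm{cov}}$ is the number of connected components in the graph $\tilde G^c_{|L,\mathrm{cov}}$ that do not contain any odd cycle, then $|\tilde E_{|L,\mathrm{cov}}| - |E| \geq d_{\mathrm{cov}}$.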
While Theorems 6.1 and 6.2 may be useful in some contexts, models in which latent variables are parents to only some of the observables deserve a more in-depth treatment in future work. In particular, it would be natural to seek ways to combine the results of Stanghellini and Wermuth (2005) and the present paper with the work of Foygel et al. (2012) and Drton and Weihs (2015).

Appendix A. Proofs
Proof of Lemma 1.1. We may assume $d \geq n$, since otherwise the $d \times n$ matrix $J_f$ is never of full column rank. The implication (i) ⇒ (ii) is obvious.
To show (ii) ⇒ (iii), suppose for contradiction that $J_f$ is not generically of full rank. Since $f$ is polynomial, we then know that $\mathrm{Rank}(J_f) = r < n$ generically, that is, outside a proper algebraic subset $S' \subset \mathbb{R}^n$ the rank is constant and equal to $r$. Let $\bar S$ denote the proper algebraic subset outside of which (ii) guarantees finite fibers. By the rank theorem (Rudin, 1976, p. 229), for every point $s \in S \setminus (S' \cup \bar S)$, we can choose an open ball $B(s)$ that contains $s$, is a subset of $S \setminus (S' \cup \bar S)$, and for which the restricted map $f|_{B(s)}$ has fibers of dimension $n - r > 0$, contradicting (ii).
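It remains to show (iii) ⇒ (i). We observe that since $f$ is a polynomial, we can assume $S = \mathbb{R}^n$. We then show that the set of points with an infinite fiber, denoted
$$\mathcal{F}_f := \{s \in \mathbb{R}^n : |F_f(s)| = \infty\}, \quad \text{where } F_f(s) = f^{-1}(f(s)),$$
is contained in a proper algebraic subset of $\mathbb{R}^n$. It suffices to treat the case $n = d$: since $J_f$ has generically full column rank, some $n \times n$ minor of $J_f$ is a nonzero polynomial, so after permuting the $d$ component functions of $f$ we may assume that $\pi \circ f : \mathbb{R}^n \to \mathbb{R}^n$ has a generically full rank Jacobian matrix, where $\pi$ is the projection onto the first $n$ coordinates. Then $\mathcal{F}_f \subseteq \mathcal{F}_{\pi\circ f}$, because every fiber of $f$ is contained in the corresponding fiber of $\pi \circ f$. Now, assume $d = n$, and let $C = \{s \in \mathbb{R}^n : \det J_f(s) = 0\}$ be the set of critical points of $f$, where $J_f$ is the Jacobian matrix of $f$. Note that by assumption $C$ is a proper algebraic subset of $\mathbb{R}^n$.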
Claim. If $y \in \mathbb{R}^n$ is a point such that $|F_f(y)| = \infty$, then $F_f(y) \cap C \neq \emptyset$.
Proof of the Claim. If an algebraic set like $F_f(y)$ is infinite, then it has dimension $k > 0$. By semialgebraic stratification (Basu et al., 2006), one can see that there exist an open set $U \subset \mathbb{R}^k$ and a differentiable map $g : U \to F_f(y)$ such that the Jacobian of $g$ has full rank $k$ on $U$. If $F_f(y) \cap C = \emptyset$, then $J_f(g(u))$ is invertible for every $u \in U$, and the chain rule yields that the composition $f \circ g : U \to \{f(y)\}$ has Jacobian $J_f(g(u))\,J_g(u)$ of rank $k > 0$. This, however, is a contradiction because $f \circ g$ is a constant function. Hence, $F_f(y) \cap C \neq \emptyset$.

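The claim implies that the set of points with an infinite fiber satisfies
$$\{s \in \mathbb{R}^n : |F_f(s)| = \infty\} \subseteq f^{-1}\big(f(C)\big) \subseteq f^{-1}\big(\overline{f(C)}\big),$$
where $\overline{f(C)}$ denotes the Zariski closure of $f(C)$, and the set on the right is algebraic given that $f$ is a polynomial. To finish the proof we only need to show that $f^{-1}(\overline{f(C)})$ has dimension less than $n$, which is equivalent to $f^{-1}(\overline{f(C)}) \neq \mathbb{R}^n$. By Sard's theorem (Basu et al., 2006, p. 192), $f(C)$, and thus also $\overline{f(C)}$, has dimension less than $n$. If $f^{-1}(\overline{f(C)}) = \mathbb{R}^n$, then $f$ would map the nonempty open set $\mathbb{R}^n \setminus C$ into the lower-dimensional set $\overline{f(C)}$, contradicting the inverse function theorem, which says that the restricted map $f|_{\mathbb{R}^n\setminus C}$ is a local diffeomorphism.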
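Lemma 1.1 turns generic finiteness into a rank computation, and because the entries of the Jacobian are polynomials, full column rank at a single point already implies full column rank generically. The following SymPy sketch (the helper `jacobian_rank_at` is hypothetical) assumes the covariance parametrization $\varphi_G(\Lambda, \Omega, \delta) = (I_m - \Lambda)^{-T}(\Omega + \delta\delta^T)(I_m - \Lambda)^{-1}$ and evaluates, at one exact rational point, the Jacobian of the upper triangle of $\Sigma$ with respect to all $2m + |E|$ parameters. The graph (four nodes, single edge $1 \to 2$) and the point are arbitrary choices.

```python
import sympy as sp


def jacobian_rank_at(m, edges, point):
    """Shape and rank of the Jacobian of the upper triangle of Sigma at an exact point.

    Assumes Sigma = (I - Lam)^{-T} (Omega + delta delta^T) (I - Lam)^{-1},
    with lambda_vw stored at Lam[v-1, w-1] for each edge v -> w (1-based node labels).
    """
    lam = [sp.Symbol(f"lam_{v}{w}") for (v, w) in edges]
    om = [sp.Symbol(f"omega_{i}") for i in range(1, m + 1)]
    de = [sp.Symbol(f"delta_{i}") for i in range(1, m + 1)]
    Lam = sp.zeros(m, m)
    for (v, w), s in zip(edges, lam):
        Lam[v - 1, w - 1] = s
    A = (sp.eye(m) - Lam).inv()
    d = sp.Matrix(de)
    Sigma = A.T * (sp.diag(*om) + d * d.T) * A
    # Coordinates of the image: the m(m+1)/2 entries sigma_ij with i <= j.
    entries = sp.Matrix([Sigma[i, j] for i in range(m) for j in range(i, m)])
    J = entries.jacobian(lam + om + de)
    return J.shape, J.subs(point).rank()


point = {sp.Symbol("lam_12"): sp.Rational(1, 2)}
point.update({sp.Symbol(f"omega_{i}"): i for i in range(1, 5)})
point.update({sp.Symbol(f"delta_{i}"): i for i in range(1, 5)})
shape, rank = jacobian_rank_at(4, [(1, 2)], point)
print(shape, rank)  # (10, 9) 9
```

The rank equals the number of parameters, $2m + |E| = 9$, so by Lemma 1.1 the parametrization for this graph is generically finite-to-one.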
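Proof of Theorem 1.5. Let $m = |V|$. For $i = 1, 2$, let $\bar G_i = (\bar V, \bar E_i)$ be the extended DAG of $G_i$, i.e., $\bar V = \{0, 1, \ldots, m\}$ and $\bar E_i = E_i \cup \{0 \to v : v \in \{1, \ldots, m\}\}$. By the well-known characterization that two DAGs are Markov equivalent if and only if they have the same skeleton and v-structures (Pearl, 2009), it is easy to see that $\bar G_1$ and $\bar G_2$ are also Markov equivalent: the added edges are the same in both graphs, and they create no new v-structures because node 0 is adjacent to every other node. For $\theta_i = (\Lambda, \Omega, \delta) \in \Theta_i := \mathbb{R}^{E_i} \times \mathrm{diag}^+_m \times \mathbb{R}^m$, define
$$\Phi_{\bar G_i}(\theta_i) = (I_{m+1} - \bar\Lambda)^{-T}\, \bar\Omega\, (I_{m+1} - \bar\Lambda)^{-1},$$
where $\bar\Lambda$ is the $(m+1) \times (m+1)$ matrix with $\bar\Lambda_{0v} = \delta_v$ and $\bar\Lambda_{wv} = \lambda_{wv}$ for $v, w \in \{1, \ldots, m\}$ and all other entries zero, and $\bar\Omega$ is a diagonal matrix with $\bar\Omega_{00} = 1$ and $\bar\Omega_{vv} = \Omega_{vv}$ for $v = 1, \ldots, m$. Then the image $\Phi_{\bar G_i}(\Theta_i)$ is the set of all covariance matrices of $(m+1)$-variate Gaussian distributions that obey the global Markov property of $\bar G_i$ and have the variance of node 0, which represents the latent variable $L$, equal to 1. Consider the projection $\pi(\Sigma) = \Sigma_{\{1,\ldots,m\},\{1,\ldots,m\}}$, where $\Sigma$ has its rows and columns indexed by $\{0, \ldots, m\}$. Then the parametrization map for the latent variable model N*(G_i) equals
$$\varphi_{G_i} = \pi \circ \Phi_{\bar G_i}. \tag{A.1}$$
Since $\bar G_1$ and $\bar G_2$ are Markov equivalent, $\Phi_{\bar G_1}(\Theta_1) = \Phi_{\bar G_2}(\Theta_2)$. By Lemma 2.1, each map $\Phi_{\bar G_i}$ is injective on $\Theta_i$ with rational inverse defined on the common image $\Phi_{\bar G_1}(\Theta_1) = \Phi_{\bar G_2}(\Theta_2)$. From (A.1), we obtain that
$$\varphi_{G_1} = \varphi_{G_2} \circ \big(\Phi_{\bar G_2}^{-1} \circ \Phi_{\bar G_1}\big).$$
Since $\Phi_{\bar G_2}^{-1} \circ \Phi_{\bar G_1} : \Theta_1 \to \Theta_2$ is a diffeomorphism, the chain rule implies that the Jacobian of $\varphi_{G_1}$ can be of full column rank if and only if the same is true for $\varphi_{G_2}$. Since the maps $\varphi_{G_i}$ are polynomial, the two Jacobians either both have generically full rank or are both everywhere rank deficient. By Lemma 1.1, $\varphi_{G_1}$ is generically finite-to-one if and only if $\varphi_{G_2}$ is so.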
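The graphical step of this proof, namely that extending two Markov equivalent DAGs by the latent source 0 preserves Markov equivalence, can be checked on small examples with the skeleton-and-v-structure characterization. The helpers below are hypothetical illustrations, with nodes given as integers and the latent source labeled 0.

```python
from itertools import combinations


def skeleton_and_vstructures(nodes, edges):
    """Skeleton (unordered adjacent pairs) and v-structures a -> v <- b with a, b non-adjacent."""
    skeleton = {frozenset(e) for e in edges}
    vstructs = set()
    for v in nodes:
        parents = sorted(u for (u, w) in edges if w == v)
        for a, b in combinations(parents, 2):
            if frozenset((a, b)) not in skeleton:
                vstructs.add((a, v, b))
    return skeleton, vstructs


def markov_equivalent(nodes, edges1, edges2):
    # Characterization: same skeleton and same v-structures.
    return skeleton_and_vstructures(nodes, edges1) == skeleton_and_vstructures(nodes, edges2)


def extend(nodes, edges):
    """Extended DAG: add the latent source 0 with an edge 0 -> v for every observed node v."""
    return [0] + list(nodes), list(edges) + [(0, v) for v in nodes]


nodes = [1, 2, 3]
g1, g2 = [(1, 2), (2, 3)], [(3, 2), (2, 1)]
n_ext, e1 = extend(nodes, g1)
_, e2 = extend(nodes, g2)
print(markov_equivalent(nodes, g1, g2), markov_equivalent(n_ext, e1, e2))  # True True
```

A v-structure such as $1 \to 2 \leftarrow 3$ survives the extension unchanged, so non-equivalent graphs also stay non-equivalent.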
Since the concerned matrix has polynomial entries, we need to show that the determinant of $[J(\varphi_G)]_{\{D,E,N'\},\{\Psi,\Lambda,\gamma\}}$ is a nonzero polynomial. To this end, it is sufficient to show that the determinant is a nonzero polynomial in the entries of $(\Lambda, \gamma)$ when we specialize $\psi_1 = \cdots = \psi_m = 1$. Noting that $|\Psi| + |\Lambda| + |\gamma| = |D| + |E| + |N'|$, let $P$ denote the set of all permutation functions mapping from the set $D \cup E \cup N'$ to the set of free variables in $\Lambda$, $\Psi$ and $\gamma$. Choose any ordering of the elements of domain and codomain so as to have a well-defined sign for the permutations. Then, writing $J$ for the submatrix above, Leibniz's formula gives
$$\det J = \sum_{\sigma \in P} \mathrm{sgn}(\sigma) \prod_{e \in D\cup E\cup N'} J_{e,\sigma(e)}.$$
Let $\bar P$ be the subset of all permutations $\sigma \in P$ with $\sigma((v,v)) = \psi_v$ for all $(v,v) \in D$ and $\sigma((v,w)) = \lambda_{vw}$ for all $(v,w) \in E$. Splitting the sum into the permutations in $\bar P$ and those in $P \setminus \bar P$, we obtain (A.12), whose first term is, up to a sign fixed by the chosen orderings, $\det [J(\varphi_G)]_{N',\gamma}$, and whose second term collects the summands with $\sigma \in P \setminus \bar P$; the equality in (A.12) follows from (A.2), (A.6) and the fact that $\psi_1 = \cdots = \psi_m = 1$. We also deduce from (A.2)–(A.11) that every summand in the second term of (A.12) is either zero or a polynomial term involving free variables of $\Lambda$. In contrast, $\det [J(\varphi_G)]_{N',\gamma}$ is a nonzero polynomial in the free variables of $\gamma$ alone and thus cannot be canceled by the second term in (A.12).
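Proof of Lemma 5.4. We first prove (i). Since N*(G) is generically finitely identifiable by Theorem 1.3, there exists a proper algebraic subset $\Xi'$ such that $|F_{\varphi_G}(\theta)| < \infty$ for all $\theta \in \Theta \setminus \Xi'$. Define $\Xi$ to be the union of $\Xi'$ and the set of triples $(\Lambda, \Omega, \delta) \in \mathbb{R}^{2m+|E|}$ with at least one coordinate $\delta_i = 0$. Let $\Sigma_0 = \varphi_G(\Lambda_0, \Omega_0, \delta_0)$ and
$$S = (s_{ij}) := (I_m - \Lambda^T)\, \Sigma_0\, (I_m - \Lambda). \tag{A.13}$$
Then for $1 \leq i < j \leq m$,
$$s_{ij} = \sum_{k,l=1}^{m} (I_m - \Lambda)_{ki} (\Sigma_0)_{kl} (I_m - \Lambda)_{lj} = (\Sigma_0)_{ij} - \sum_{k \in \mathrm{pa}(i)} \lambda_{ki} (\Sigma_0)_{kj} - \sum_{l \in \mathrm{pa}(j)} \lambda_{lj} (\Sigma_0)_{il} + \sum_{k \in \mathrm{pa}(i)} \sum_{l \in \mathrm{pa}(j)} \lambda_{ki} \lambda_{lj} (\Sigma_0)_{kl},$$
where the last equality follows from the fact that $\lambda_{ij}$ is nonzero only when $(i,j) \in E$. Hence, for any four indices $1 \leq i < j < k < l \leq m$, the tetrads of $S$ give rise to the equation system (A.14). To finish the proof, we now need to show that (A.14) is uniquely solvable in $\lambda_{\mathrm{pa}(m),m}$. We will aim to contradict $|F_{\varphi_G}(\theta_0)| < \infty$ if (A.14) does not have a unique solution. Note that the solution set is an affine subspace $L \subset \mathbb{R}^{|E|}$. For a contradiction, suppose that $L$ is of positive dimension. Upon substituting $\Lambda = \Lambda_0$ into (A.13), we obtain $S_0 = (s^0_{ij}) = (I_m - \Lambda_0^T)\Sigma_0(I_m - \Lambda_0)$, and in consideration of (5.3) in Theorem 5.2, it must be true that $s^0_{ii}s^0_{jk} - s^0_{ik}s^0_{ji} > 0$ for all pairwise distinct $i, j, k$. By continuity, we may then pick an open ball $B(\Lambda_0)$ such that for all solutions $\Lambda \in L \cap B(\Lambda_0)$, the matrix $S = (s_{ij})$ defined by (A.13) satisfies $s_{ii}s_{jk} - s_{ik}s_{ji} > 0$ for all pairwise distinct $i, j, k$.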
The proof of (ii) is analogous, starting from $\Upsilon_0 = \phi_G(\Lambda_0, \Psi_0, \gamma_0)$ in place of $\Sigma_0$.

Proof of Theorems 6.1 and 6.2. For Theorem 6.1, one can partition the Jacobian matrix of the restricted parametrization as in (3.7), only with $\gamma$ replaced by $\gamma_{V\setminus V'} = \{\gamma_v : v \in V \setminus V'\}$. In analogy with Lemma 3.3, it can be shown that this Jacobian matrix has full column rank if its submatrix with rows indexed by $N$ and columns indexed by $\gamma_{V\setminus V'}$ does. The reasoning is then analogous to that in the proof of Theorem 1.3, the main step being the application of Lemma 3.2, where the graph defining the considered map becomes $(G^c)_{V\setminus V'}$. The proof of Theorem 6.2 is analogous to the proof of Theorem 1.4. The only change is to replace $G^c_{\mathrm{con}}$, $G^c_{|L,\mathrm{cov}}$, $\gamma$ and $\delta$ by the subgraphs of $G^c_{\mathrm{con}}$ and $G^c_{|L,\mathrm{cov}}$ induced by $V \setminus V'$, $\gamma_{V\setminus V'}$ and $\delta_{V\setminus V'}$, respectively.