Triangular systems for symmetric binary variables

We introduce and study distributions of sets of binary vari- ables that are symmetric, that is each has equally probable levels. The joint distribution of these special types of binary variables, if generated by a recursiveprocess of linear main e!ects is essentially par ametrized in terms of marginal correlations. This contrasts with the log-linear formulation of joint probabilities in which parameters measure conditional associations given all remaining variables. The new formulation permits useful compar- isons of di!erent types of graphical Markov models and leads to a close approximation of Gaussian orthant probabilities.


Introduction
The explicit study of dependences and independences among a set of random variables is easiest for the multivariate Gaussian distribution since there are only linear relations and no interactive effects. Its key features include preservation of functional form under marginalization and conditioning and the use of triangular systems in only main effects to model recursive data generating processes. The latter in particular are strongly linked to a matrix formulation, itself in turn connected with the representation of independences by graphs; see for instance Marchetti and Wermuth (2009).
In the present paper, we give similar results for binary variables that are symmetric. Such binary variables may but need not result by median-dichotomizing continuous variables. We denote the two equally likely levels by −1 and 1, so that all variables are standardized to have marginally zero mean and unit variance. Then we give linear representations of probabilities involving parameters of which some can be interpreted as covariances and equivalently as correlations and others as linear regression coefficients. A strong parallel with Gaussian theory is established and used to compare different types of graphical Markov models, including seemingly unrelated regression in binary variables.
Two related approaches in logarithms of probabilities are studied briefly. For three variables, but not for more, the linear in probability representation is equivalent to a joint log linear model and to a logit triangular version. It is shown also that when all pairs of variables are equally correlated, the linear representation is remarkably close to that formed by median dichotomy of a latent multivariate Gaussian distribution.
Whenever measurement tools are questionnaires, median-dichotomized variables are widely used. Especially for visual analogue scales, the resulting symmetric binary variables often lead to appropriate data summaries. Symmetric binary variables may also result by design, for instance, when in medical studies controls are matched to an equal number of cases. But, since linear triangular systems of main effects for symmetric binary variables appear to have not been studied before, they are so far also absent in applications of graphical Markov models, see for instance Green, Hjort and Richardson (2003), and of discrete variable models; see e.g. Bergsma, Croon and Hagenaars (2009).

Definition and background
We consider p binary variables A s , s = 1, . . . , p. Variable A s has two equally probable levels a s , coded as 1 for success and −1 for failure. For up to four variables, we sometimes denote the variables by A = A 1 , B = A 2 , C = A 3 , D = A 4 and the levels, respectively, by i, j, k, l.
Joint and conditional probabilities are written in a condensed form, for instance for variables A, B, C π ijk = Pr(A = i, B = j, C = k), π i|jk = π ijk / i π ijk .
This form of the η's as linear regression coefficients in a conventional sense generalizes directly to p variables and stems from the close connection for binary variables between probabilities and expectations. For example, on multiplying the first equation in (2.1) by i, using i 2 = 1, and summing over i E(A 1 | A 2 = j, A 3 = k, A 4 = l) = η 12 j + η 13 k + η 14 l.
Joint distributions generated via linear triangular systems of main effects for symmetric binary variables relate for p ≥ 4 to the Bahadur expansion (1961) of general densities as given by Streitberg (1990), who proved existence and uniqueness without reference to any stepwise generating processes.
Here, we see that these parameters relate also directly to the triangular decomposition of the concentration matrix, which is for variables standardized to have zero means and unit variances, the inverse of the correlation matrix P. This triangular decomposition is denoted here by (H, ∆ −1 ) and gives where H is a unit upper-triangular matrix, which has ones along the diagonal and zero entries below the diagonal, and ∆ is a diagonal matrix with linear regression variances along the diagonal. The decomposition exists for every positive definite P and is unique for a given complete ordering (1, . . . , p).
For p > 2, the δ ss are not the conditional variances of the binary variables, var(A s |A r(s) ), but their expected values with respect to A r(s) . For p symmetric binary variables, the triangular system in linear main effect parameters defines the joint distribution as a product of the univariate conditional probabilities, where π Ap 1 = 1 2 and π As|A r(s) with (η s,s+1 , . . . , η sp ) = −H s,r(s) as in (2.6) and E (A s |A r(s) ) = a s+1 η s,s+1 + · · · + a p η sp var(A s |A r(s) ) = 1 − (a s+1 η s,s+1 + · · · + a p η sp ) 2 .
The main effects, η st for t > s, and the conditional expectations are fully determined by the marginal correlations via the triangular decomposition of P −1 .
The triangular decomposition of a concentration matrix may be obtained as a byproduct when applying the operator of partial inversion (Wermuth, Wiedenbeck and Cox, 2006) repeatedly to the covariance matrix or, for standardized variables, to the correlation matrix. If (a, b) denotes any ordered split of {1, 2, . . .p}, and e.g. P ab = (P) a,b , then where ∼ denotes entries which are symmetric up to the sign. Linear triangular systems for continuous responses have been studied in econometrics under the name of linear recursive equations with uncorrelated residuals; see Goldberger (1964) and in genetics as path analysis models; see Wright (1934). In that context, the regression coefficients have been called leastsquares regression coefficients in the population; see Cramèr (1946), p. 302. To cover path analysis and similar models for general types of densities, the name triangular system was introduced; see Wermuth and Cox (2004).
For general types of binary variables, the models most closely analogous to path analysis are triangular systems of logit regressions; see Goodman (1973) and here Section 3. The vanishing of a logit regression coefficient indicates always conditional independence given the remaining directly explanatory variables; see also Fienberg (2007). It is in the special case of symmetric binary variables considered here, that the vanishing of a main effect in probabilities in (2.7) coincides with the vanishing of a corresponding logit regression coefficient.
The symmetry of the binary variables leads to missing odd-order terms in the effect expansion of π A1,...,Ap . The presence of only even-order moments is also a feature of joint Gaussian distributions so that Gaussian probabilities, such as all variables being jointly positive, are well approximated by the corresponding probabilities for the binary variables; see Section 4.
By the relation of the linear main effects to the triangular decomposition of the correlation matrix, independence constraints on the joint probability distribution generated by a linear triangular system (2.7) in symmetric binary variables have similar implications as in a joint Gaussian distributions. For instance for three disjoint subsets a, b, c of {1, . . . , p}, the conditional independence A a ⊥ ⊥ A b |A c , often written compactly as a ⊥ ⊥ b|c, imposes the following constraint on the correlation matrix. The submatrix P ab is replaced by P * ab , where all other entries are left unchanged. This is a unique modification of P which maximizes the determinant; see Dempster (1972). The effect of this modification of a correlation matrix is that P * ab|c = 0. For p variables having a multivariate Gaussian distribution, this is known to be equivalent to a ⊥ ⊥ b|c. For joint distributions of four or more symmetric binary variables generated as in (2.7), it is only a necessary condition for a ⊥ ⊥ b|c; see Appendix C. In the linear triangular systems (2.7), mutual conditional independence is captured by zero parameters of this stepwise generating process while marginal mutual independence is reflected in sets of zero parameters of the joint distribution. For p = 4, for instance A 1 ⊥ ⊥ A 2 ⊥ ⊥ A 3 |A 4 holds if 0 = η 12 = η 13 = η 23 and A 2 ⊥ ⊥ A 3 ⊥ ⊥ A 4 holds if 0 = ρ 23 = ρ 24 = ρ 34 . Likelihood ratio tests apply just as for general categorical variables, but maximum-likelihood estimation is often simpler than for general binary variables. These features makes the family attractive for illustrating basic distinctions and similarities between various types of graphical Markov models, as studied for discrete variables by Wermuth (1976), Darroch, Lauritzen and Speed (1980), Wermuth and Cox (2004), Drton (2009), Marchetti and Lupparelli (2009); see Section 5. Streitberg (1999) stresses that estimation in constrained Bahadur expansions is undeveloped and that a truncated Bahadur expansion may be negative. The closed form (2.4) of the four-factor interaction term makes the first statement transparent for joint distributions generated by (2.7).

Some basic properties
A linear four-factor effect can be zero if and only if some of the marginal or conditional linear regression coefficients are zero as well. For instance, when η 12 = 0 in (2.1), and all other parameters in this process generating the joint distribution are nonzero, then the linear four-factor effect does not vanish unless 0 = ρ 23 = ρ 24 or η 14 ρ 23 = −η 13 ρ 24 .
Marginal independences may be implied by independence constraints on (2.7), or result by parametric cancellations, which are very special parametric constellations, for instance 0 = ρ 23 if η 23 = −η 24 ρ 34 ; see Appendix A. For examples of parametric cancellations in joint Gaussian distributions see Wermuth and Cox (1998).
That is, the probabilities of any given pattern of level combinations and that of the reversed pattern are the same. For p ≥ 4, every marginal trivariate distribution of (2.7) is of the same form as the joint probability π BCD jkl in (2.1), that is π BCD jkl = 1 8 (1 + ρ 23 jk + ρ 24 jl + ρ 34 kl).
Sets of marginal correlations that give valid trivariate distributions (2.9) satisfy so that they belong to the simplex of Figure 1a), defined by the convex hull of the points Within the curved boundaries of Figure 1b) are sets of correlations constrained only by the positive definiteness of the 3 × 3 correlation matrix. The maximal Euclidean distance of these boundaries to the simplex is 1/ √ 12. Figure 1c) shows the surface obtained by avoiding parametric cancellation for pair (C, D) in the distribution of B, C, D generated by (2.1) with nonvanishing η 23 , η 24 , ρ 34 . That is, when ρ 34 = ρ 23 ρ 24 were to be explicitly excluded.
With only linear two-factor effects in (2.9), there is in this special trivariate family of symmetric binary variables no additive interaction as defined for threedimensional contingency tables by Lazarsfeld (1961), or Lancaster (1969; see also Streitberg (1990).
Therefore, the conditional odds ratios of success to failure, abbreviated here by odr, coincide at the two levels of the third variable, where Equal odds-ratios at both levels of A 4 are equivalent to a zero three-factor effect in log π A2A3A4 jkl ; the 2 3 table is said to be without a multiplicative interaction. If these two odds-ratios are in addition equal to one then 2 ⊥ ⊥ 3|4 Multiplicative interaction was defined for three-dimensional contingency tables by Bartlett (1935) and extended later for measuring associations in general log-linear models; see for instance Fienberg (2007).
For general trivariate log-linear models, expected counts have been defined as m ijk = nπ ijk , where n = n ijk is the total number of observed counts. For such log-linear models without a multiplicative interaction, maximum-likelihood estimates are obtained by equating the three marginal two-way tables of m ijk to the three marginal two-way tables of n ijk ; see Birch (1963). An iterative algorithm is needed to solve the equations. One such algorithm is iterative proportional fitting, for which Darroch and Ratcliff (1972) proved convergence.
By contrast, for symmetric binary variables, closed form estimates are available. The set of minimal sufficient statistics for model (2.9) consists of observed sums of counts, sums for two successes and two failures, and sums for the mixture of success and failure n AsAt These lead to closed form maximum-likelihood estimates of ρ st , in terms of differences of the cross sumŝ and to closed form estimates of the marginal odds-ratios by the one-to-one relations For p > 3, the method of moments yields with (2.10) and (2.5) all elements of an unconstrained correlation matrix of the binary variables. In this paper we shall not discuss inference for models with independence constraints. Instead, we describe and compare examples of models in different types of model classes.
Equal odds-ratios for any variable pair, at the fixed level combinations of all remaining variables, is equivalent for this variable pair having zero parameters of all orders higher than two in the log-linear model for A 1 , . . . , A p . Log-linear models with this property satisfied for all variable pairs have been studied and named binary quadratic exponential distributions by Cox and Wermuth (1994).
Such joint distributions are in general determined by sets of full two-way marginal tables and, for the subclass of symmetric binary variables considered here, by the sets of cross sums defining also the correlation coefficients; see (2.5). Nevertheless for p > 3, these distributions are in general different from those generated by a linear triangular system (2.7) as well as from those generated by a logit triangular system; see the next section.

A logit triangular system for symmetric binary variables
A logit is the logarithm of the odds, for instance, = log ods A2|kl .

Wermuth, Marchetti and Cox/Triangular systems for symmetric binary variables 940
The logit triangular system of main effects in four symmetric binary variables is (1). After exponentiating, each given logit equation can be solved for the corresponding probabilities of success and of failure.
For symmetric binary variables, the trivariate marginal distributions coincide for the logit triangular system and for the linear triangular system of only main effects, since they have no additive and no multiplicative interaction; see the discussion of (2.9). For p > 3, the joint distributions generated by (3.1) will typically contain higher than two-factor effects of even order which differ from those generated by (2.1).
It can however be shown that for up to four variables, families of joint distributions can be obtained with (3.1) which are quadratic exponential and lead to an equal correlation matrix P. The following vector of probabilities, gives an example for ρ = 0.833 and p = 4, (π A1A2A3A4 ) T = (81 3 3 1 3 1 1 3 3 1 1 3 1 3 3 81)/192. These distributions introduce six non-vanishing two-factor effects and one four-factor effect in the linear expansion of π A1...A4 . For instance, the fourfactor terms obtained for π A1A2A3A4 ijkl from the linear triangular system, (2.1) as given in (4.3) below, and from the logit triangular system (3.1) are for equal correlation ρ = 0.555, 0.833, 0.963, respectively, 0.44, 0.78, 0.95. and 0.41, 0.75, 0.94. Thus, the joint quadrivariate distributions differ, but it would take large numbers of observations to discriminate between them in any given set of data.
In this paper, we explore properties of the linear triangular system (2.7). In the next section, we give its relation to a standardized joint Gaussian distribution having equal correlations.

Orthant probabilities
Attempts to approximate the Gaussian multivariate integral have a long tradition; see for instance Schläfli (1858), Sheppard (1898), Moran (1956), McFadden (1955, Cheng (1969). For a more recent overview see Johnson, Kotz and Balakrishnan (1995). An orthant probability is the probability that all variables are simultaneously positive. Sheppard (1898) derived the bivariate orthant probability in closed form, Mc-Fadden (1955) extended the result to trivariate distributions and Cheng (1969) used the di-logarithm for the quadrivariate distribution. From (2.7), bivariate and trivariate probability distributions for symmetric binary variables having equal correlation ρ are π As ,At as,at = 1 4 (1 + ρ a s a t ), and from Sheppard's result, the correlation coefficient ρ * in an underlying Gaussian distribution satisfies ρ * = sin( 1 2 π ρ), where π is Archimedes' constant. In this context, each symmetric binary variable results by dichotomizing a corresponding standardized Gaussian variable at its median.
For equally correlated binary variables, the effect parameters η in (2.7) reduce to where d denotes the number of explanatory variables, i.e. is the dimension of r(s). For p = 4, 6, 8, the even order interaction terms in the joint probabilities generated by the linear triangular system of only main effects are, respectively, where indices of the η's do not overlap. Table 1 compares the binary orthant probabilities for a few selected values of odds being in favor of success with orthant probabilities of standardized and equally correlated Gaussian variables, as obtained with Matlab. Displayed are on top, the selected value of the odds, ods A as |1 , the bivariate orthant probability, π 11 , the correlations of the binary variables, ρ, and the corresponding correlation ρ * of the Gaussian variables. For each p, there are two rows of orthant probabilities, the first row contains values given by Matlab for the Gaussian distribution and the second row contains values given by (4.4) for the linear triangular system. For p = 4, the former coincide with Cheng's evaluations in terms of di-logarithms up to the 6th decimal place. As it turned out, the two types of orthant probabilities agree at least up to the third decimal place, also for all other values of ρ > 0 not shown in the Table 1.
The p'th order interaction introduced for even p, and denoted by int p , may be expressed directly in terms of ρ as (1 + 2h)/(1 + 2hρ).

Types of graphical Markov models
We now discuss distributions for symmetric binary variables generated by triangular systems (2.7) with independence constraints. The resulting graphs consist of nodes and of edges that couple pairs of nodes. Edges may be arrows, dashed lines or full lines. A path is a sequence of nodes for which the consecutive nodes are distinct and are coupled by an edge. Every path of three or more nodes has some inner nodes. These are nodes of the path that differ from the path endpoints. A path of at least two arrows is direction-preserving if every inner node has an incoming and and outgoing arrow. A directed graph is acyclic if it is impossible to return to the same node following any direction-preserving path.
We assume in this section that edges present in the graph correspond to positive correlations and that a positive probability is associated with each level combination.

Markov chains, covariance and concentration graph models
A Markov chain is a special type of a triangular system represented by a directed graph consisting of a single direction-preserving path which is a special case of a directed acyclic graph. For four variables, such a path is 1≺ 2≺ 3≺ 4.
In the linear triangular system (2.1), the constraints are then 0 = η 13 = η 14 = η 24 , or, equivalently, 1 ⊥ ⊥ 34|2 and 2 ⊥ ⊥ 4|3. Every quadrivariate joint density with the above Markov chain structure factorizes. The factorization is, written in a condensed notation in terms of the nodes in the graph, This chain is represented in the correlation matrix of the symmetric binary variables by a particular pattern of correlations, which is the same pattern as for the corresponding Gaussian Markov chain. The diagonal matrix ∆ is defined by δ ss = 1 − ρ 2 s,s+1 for s = 1, . . . .
where the dots indicate symmetric entries. Thus, the correlations corresponding to edges present in the generating directed graph match the simple correlations in P; the three other correlations for pairs (1,3), (1,4), (2,4) are products of correlations along the path and therefore called sometimes merely induced or spurious correlations. The zero constraints on the regression coefficients turn here into more complex constraints on the correlation matrix. The Markov chain for (A 1 , A 2 , A 3 , A 4 ) has as conditioning set r(s) for variable A s , s = 1, 2, 3, while the correlation matrix contains the bivariate association for each variable pair. For each missing arrow, the bivariate association results by ignoring some variables in the conditioning sets defined implicitly by the generating Markov chain. Therefore, the more complex structure in P is sometimes said to represent a case of underconditioning. Each of the induced correlations may be computed, for population values as well as for maximum-likelihood estimates, by using Wright's rules for tracing paths whenever the directed graph is decomposable, i.e. contains no sink-oriented V-configuration; Wermuth (1980). Such a V-configuration is a subgraph of three nodes, •, having two arrows as edges and both arrows point to the common neighbor: • ≻ • ≺ •; see also Section 5.6.
In a covariance graph, that is an undirected graph with edges shown by dashed lines, each edge corresponds to a marginal, pairwise association. Thus, for instance a missing edge for pair (1,3) means 1 ⊥ ⊥ 3. For symmetric binary variables generated by (2.1), a covariance graph model is specified by zero constraints on marginal correlations. In general, some additional latent variables are needed to generate a covariance chain model recursively with univariate conditional distributions having exclusively zero parameter constraints; see Pearl and Wermuth (1994), Holland and Rosenbaum (1986).
For the above Markov chain, none of the marginal pairwise associations vanishes, so that the covariance graph induced by the generating directed graph for the given four variables is complete, i.e. it has no missing edges. In such cases, it cannot be recognized from the two graphs alone that the two model specifications captured by the matrices (H, ∆ −1 ) and P are Markov equivalent, i.e. that they define the same independence structure.
By contrast, in a concentration graph, that is an undirected graph with edges shown by full lines, each edge corresponds to a conditional association given all of the remaining variables. Thus, for instance a missing edge for pair (1,3) in a concentration graph of four nodes means 1 ⊥ ⊥ 3|24. For discrete variables, these conditional associations and independences are reflected in the parameters of log-linear models for the joint probabilities, that is in the effect expansion of log π A1...Ap ; see Appendix B. There is conditional independence of a variable pair if the log-linear two-factor effect and all higher-order effects of this pair vanish and this happens if and only if the conditional odds-ratios of this pair equal one at all level combinations of the remaining variables. For a Markov chain, the induced concentration graph has the same edge set as the Markov chain graph; see the following section.
For the symmetric binary variables discussed here, zero partial correlation coefficients given all remaining variables are necessary for the corresponding conditional independence statement to hold. But starting with four variables, zero partial correlations are no longer sufficient, see Appendix C. Partial correlations given all remaining may be computed from P −1 with elements ρ st , see for instance Cox and Wermuth (1996), p. 69, as ρ st|{1,...,p}\{st} = −ρ st / ρ ss ρ tt .
Conversely, given the defining independences of this concentration graph, all independence statements implied by the Markov chain graph of Section 5.1 can be derived, so that the two graphs are Markov equivalent. For the symmetric binary variables, each edge present in the above concentration graph corresponds to a two-factor term in the effect expansion of log π A1A2A3A4 , where π A1A2A3A4 ijkl = 1 16 {(1 + ρ 12 ij)(1 + ρ 23 jk)(1 + ρ 34 kl)}. Then log f 1234 is the sum of terms involving at most two variables. Therefore, the joint log-linear model has no higher-order than two-factor effects and no two-factor effects for all pairs of missing edges in the generating directed graph; see also (2.11).
In general, the characterizing feature for Markov equivalence of a directed acyclic graph and a concentration graph is that the generating directed graph is decomposable; see Frydenberg (1990). Thus, from a decomposable directed graph the corresponding induced concentration graph is obtained by replacing each arrow by a full line. In such a case, the concentration graph induced by the generating directed acyclic graph has an unchanged edge set and the two graphs are then said to have the same skeleton.

A covariance chain
A covariance chain for the ordered pairs (1, 2), (2, 3), (3, 4) has covariance graph where each edge present corresponds to the bivariate association of the corresponding variable pair. This is another undirected graph with the same skeleton as those discussed in the previous two graphs, but it represents a quite different independence structure. For the linear triangular system (2.1), the correlation matrix P to this graph and the matrix H of the triangular decomposition of P −1 are Here, the dependences that do not correspond to edges present in the covariance graph, are products of the coefficients corresponding to the non-vanishing marginal correlations. Thus, simple zero constraints on the marginal correlations in P, correspond to more complex nonlinear constraints in H. Therefore, the more complex structure in H which may be regarded as a case of overconditioning. The directed graph in the induced by the covariance chain is complete, since none of the induced conditional dependences of A s on A r(s) vanishes. Therefore, there is also no simplifying factorization of the joint density and there is no direct representation of the structure in the log-linear formulation of π A1A2A3A4 ijkl that is in the effect expansion of log π A1A2A3A4 . The two model specifications captured by the matrices (H, ∆ −1 ) and P are again Markov equivalent but this is not reflected in the corresponding two graphs alone, that is in the complete directed graph corresponding to H, and in the covariance graph corresponding to P, which is here a chain of dashed lines.
For a joint Gaussian distribution, maximum-likelihood estimation for a chain of covariances and for a chain of concentrations are quite different, even though both are defined without reference to an order of the variables; see Wermuth, Cox and Marchetti (2006) or Cox (2006), pp. 120-124. For binary variables, estimation under a covariance graph model maybe is less complex than in the Gaussian case; see Drton (2009). This holds also for symmetric binary variables when the multivariate regression is equivalent to a covariance chain; see the next section.

Seemingly unrelated regression and multivariate regression chains
A seemingly unrelated regression model, Zellner (1962), that is Markov equivalent to the covariance chain in the previous section has the graph 1 ≻2 3≺ 4, where variables A 1 , A 4 are independent explanatory variables for the joint responses A 2 , A 3 . With a set of arrows pointing to a set of nodes that may be connected by undirected edges, one speaks sometimes of a joint response graph. The graph implies in particular that A 1 , A 4 are both generated before A 2 , A 3 . We therefore reordered the correlation matrix accordingly, i.e. as (2, 3, 1, 4), to give P ′ . This matrix and the changes that the result after partial inversion with respect to the explanatory variables are By using the more explicit Yule-Cochran notation, where for instance β 1|2.3 denotes the coefficient of variable A 2 in linear regression of variable A 1 on A 2 , A 3 , we obtain from equations (A.1), (A.2) and (A.4) in Appendix A that the nonzero two-factor terms in π A2A3|A1A4 reduce to β 2|1.3 = ρ 12 , β 3|4.1 = ρ 34 , η 23 +η 24 ρ 13 + η 21 ρ 34 = ρ 23 and η 21 η 34 + η 24 η 12 = ρ 12 ρ 34 to give π A2A3|A1A4 jk|il = 1 4 (1 + ρ 12 ij + ρ 23 jk + ρ 34 jl + ρ 12 ρ 34 ijkl).
The next, more complex seemingly unrelated regression model has associated explanatory variables and a joint response graph reflecting 1 ⊥ ⊥ 4|3 and 2 ⊥ ⊥ 3|4 1 2 3 4 In the model for this graph, the conditional distribution of the responses (1, 2) given (3, 4) has in general the same form as for independent explanatory variables, the case described first in this subsection.
Whenever a conditional distribution is complemented by a marginal one in such a way that its parameters do not depend on those of the margin, then the joint density gives independently varying factors, say f 12|34 , f 34 , where and for maximum-likelihood estimation, each of the two parts of the likelihood can then be maximized separately. This is typically achieved, whenever the only types of constraint are independence statements captured by missing edges in graphs. By contrast, this is for instance not possible, when a parametric cancellation, such as illustrated in Figure 1, were to be explicitly excluded or when regression coefficients in different equations had to take on equal values.
There is only a small step from this last example to the formulation of multivariate regressions chains and the corresponding graphs; see Wermuth and Cox (2004). Let for example {1, . . . , p} = (a, b, c, d) be an ordered partition of {1, . . . , p} and suppose that There results a factorisation corresponding to the joint or single responses within the chain components a, b, c, d. Each response has as potential explanatory variables all variables in the past but not those within the same chain component nor any variable in the future. Among several responses on equal standing, i.e. variables within the same chain component, the pairwise associations are modelled given the past.
The corresponding chain graph has within each chain component a covariance graph of dashed lines and represents between chain components regressions given the past by arrows. For general discrete variables, Marchetti and Lupparelli (2009) show how such a multivariate regression chain is formulated as a multivariate logistic model of Glonek and McCullagh (1995). They also prove equivalence of its independence structure to the one obtained with a more complex formulation that depends only on the graph; see for instance Drton (2009 For mutually independent discrete variables given a latent variable, it is known that no constraints are implied on an observed contingency table provided that the latent variable has a sufficiently large number of levels; see Holland and Rosenbaum (1986). However, if all variables including the latent variable are binary then the correlation matrix of the observed variables satisfies the tetrad conditions, just like in the Gaussian case; see e.g. equation (21) of Wermuth (1976).
But, unless the joint probabilities even of symmetric binary variables may be generated via a triangular system of only main effects (2.7), a partial correlation coefficient of two binary variables given a third may be zero even if the three variables are strongly associated; see Wermuth (1998) for an example from psychological research.
We take here four symmetric binary variables to satisfy mutual conditional independence given A 5 and to have positive correlations with A 5 . The decomposable directed generating graph is This holds since every marginal correlation of the first four variables is induced by their correlations with A 5 to satisfy ρ st|5 = 0 , that is 5.6. Mutual independence of A 2 , A 3 , A 4 , A 5 , common response A 1 The final example treats mutual marginal independence of four variables. With the four variables having a common response, the non-decomposable generating graph is 1 2 3 4 5 The joint distribution in symmetric binary variables, generated by (2.7), is in this case defined by an identity matrix for H r(1),r(1) , δ ss = 1 for s = 2, . . . , 5, If data to this model were analyzed by a log-linear model that is by an effect expansion of log π A1...A5 , then one would see all variables being associated. The reason is that for this model, the induced partial correlation for a pair A s A t of the explanatory variables given all remaining variables is a multiple of ρ s1 ρ t1 . Ignoring the information about which variables are responses and which independent explanatory variables by treating all variables on equal standing, is another instance of over-conditioning, that is here of inducing associations by enlarging the conditioning sets, r(s) of A s , for s = 2, 3, 4, 5 given in the generating process, to include the common response A 1 .
Thus, as we have seen, independences may lead to simple zero constraints in one formulation but may appear as more complex constraints in parameter equivalent but different model formulations. It is therefore in general rarely useful to restrict model fitting and data analysis to one particular class of graphical Markov models unless there is strong prior knowledge. Also, Markov equivalent graphs may be most helpful in suggesting possible alternative interpretations of a given model, but Markov equivalence of two models need not be reflected in the corresponding graphs. Thus by using only one class graphs to learn about models, one may overlook some important Markov equivalent models.
The outstanding features of the described linear triangular systems are the simplicity of the parametrization of different types of independence models for few variables, the close approximation of the orthant probabilities of equally correlated Gaussian variables by the corresponding binary orthant probabilities and the complete reversal of odds for success in two-way margins.
Because of the symmetry in the joint probability distributions it may be difficult to distinguish for a given small set of data among different types of independence structure. Moreover the permissible range of association parameters reduces when the number of variables in the generating process is increasing. It remains to be seen for which substantive questions these special binary distributions may become attractive. Possibly, complete reversals in odds for success of some features or events can be expected when in medical case-control studies, particular genes are present for cases and absent for controls. = 1 8 {1 + (η 23 + η 24 ρ 34 )jk + (η 24 + η 23 ρ 34 )jl + ρ 34 kl}. and for instance, from a matrix multiplied by its inverse giving the identity matrix, With η 23 + η 23 η 34 = ρ 23 obtained similarly, equation (2.9) follows. The above argument proves also Cochran's recursion relation for regression coefficients, Cochran (1938). With the coefficients written in the more explicit Yule-Cochran notation, where for instance β 2|3.4 denotes the coefficient of variable X 3 in a linear least-squares regression of variable X 2 on X 3 , X 4 , one obtains here applied to regression coefficients of binary variables, standardized to have mean zero and unit variance. See Cox and Wermuth (2003), Ma, Xie and Geng (2006), Cox (2007) .  to direct expressions of a marginal correlation in terms weighted sums of η's and ρ's. Products of parameters without overlapping indices are not related in this way and induce higher-order interactions, such as given in (4.2) for the equal correlation case. In general, for instance the six-factor interaction contains terms such as β 1|2.3456 β 3|4.56 β 5|6 , β 1|3.2456 β 2|4.56 β 5|6 .
Such expressions look complex, but they are readily computed via the effect parameter expansion of the joint probabilities that are given by the linear triangular system.

Appendix B: Effect parameter expansions
Effect parameter expansions for binary variables are given by Yates' algorithm (Yates, 1937), which has been extended to general discrete variables by Good (1958) using contrast matrices. Every contrast matrix is the inverse of a corresponding design matrix, D p , that can for a 2 p table be obtained in terms of a 2 × 2 matrix D 1 via Kronecker products, where here for effect coding Thus, since the inverse of a matrix of Kronecker products is the Kronecker product of the inverses C p = D −1 p = 2 −p D p . When writing the 2 p probabilities in a column vector π A1,...,Ap such that the levels of the first variable change fastest, then the levels of the second variable next and the levels of the p ′ th variable last, then the effect parameters that result with C p π A1,...,Ap are in a lexicographical order in which the overall effect λ − is followed by the main effect λ A1 1 of A 1 , then (λ A2 1 , λ A1A2

111
) follows the first four terms and so on, adding at the next step, the additional variable in the superscripts and a level 1 in the subscripts of each of the previous terms.
Thus, for a vector of joint probabilities with structure given by (2.9), the linear effect parameter expansion is (C 3 π A1A2A3 ijk ) T = 1 8 (1 0 0 ρ 12 0 ρ 13 ρ 23 0) and for the three correlations being nonzero, precisely the entries in position 2,3,5 and 8 are also zero in (C 3 log π A1A2A3 ijk ) T , the log-linear effect expansion. For any constant c, the linear effect terms of c π A1A2A3 ijk are multiplied by c, while the effect parameters of log c π A1A2A3 ijk remain unchanged except that log c is added to the overall effect. Thus, for comparisons of linear and log-linear parameters it is sometimes convenient to divide the probability vector by one Wermuth, Marchetti and Cox/Triangular systems for symmetric binary variables 952 of its probabilities to obtain ones in some entries. For instance, for model (2.9) with equal correlations, we have with α = π 111 /π −111 and β = log α (π A1A2A3 ) T /π −111 = (α 1 1 1 1 1 1 α), log{(π A1A2A3 ) T /π −111 } = (β 0 0 0 0 0 0 β).

Appendix C: A Gaussian and a symmetric binary correlation matrix
Here, an example of a correlation matrix is provided, in which precisely one independence, 1 ⊥ ⊥ 2|34, holds whenever it represents the correlation structure of a joint Gaussian distribution since ρ 12|34 = 0 so that ρ * 12 = 0.6173 is given by (2.8) with a = {1}, b = {2}, c = {34}. But as we shall see, when the same matrix represents the correlation structure of a linear triangular system of symmetric binary variables, it depends on the order in which the variables are generated, whether 1 ⊥ ⊥ 2|34 holds or not. The following display on the left, shows in the order (1,2,3,4), marginal correlations in the lower half, and partial correlations given the two remaining variables in the upper half; on the right in the upper half, the upper part of the matrix H for generating the joint density the order (1,2,3,4) and in the lower half, the upper part of the matrix H ′ for the order (4,3,2,1) but rearranged so that indices s, t correspond to t, s in the upper half: The two joint distributions of the binary variables differ substantially for the two orders. For (1,2,3,4), the conditional probability vectors π A3|A4 are generated first, then π A2|A3A4 and last π A1|A2A3A4 while for (4,3,2,1), π A2|A1 comes first, then π A3|A2A1 and π A4|A3A2A1 last. With the following index vector, the joint probabilities obtained for (4,3,2,1) are arranged to imply the correlation matrix as it has been used for (1,2,3,4), i.e. the same values in the same order (1,9,5,13,3,11,7,15,2,10,6,14,4,12,8,16).
We get as the first case for the variable order (1,2,3,4) 1000πijk1 levels jk1 of A2, A3, A4 levels of A1 111 −111 1 − 11 −1 − 11 i = 1 373.54 33.96 21.12 11.38 i = −1 38.96 3.54 11.38 6.12 odr(A1A2|A3 = k, A4 = 1) 1 1 and as the second case for the variable order (4,3,2,1), with the probabilities rearranged to be directly comparable to those obtained for (1,2,3,4), Because of the symmetry of the joint distributions, it is enough to list i, j, k at level 1 of A 4 . Since in the second case the conditional odds-ratios odr(A 1 A 2 |A 3 = k, A 4 = 1) are quite different, there is a substantial four-factor effect in the loglinear expansion. It is only in the first case that 1 ⊥ ⊥ 2|34 holds. In both cases, non-vanishing four-factor terms result in the linear effect expansion of the probabilities. This illustrates in particular that there may be a conditional independence in the presence of a linear four-factor effect. The second case shows with ρ 12|34 = 0, that there may be a linear independence without a corresponding conditional independence statement being satisfied.
Thus, in spite of the strong similarities to standardized, equally correlated Gaussian distributions, some important properties of Gaussian distributions do not carry over to the described joint distributions of symmetric binary variables.