Graphical Independence Networks with the gRain Package for R

In this paper we present the R package gRain for propagation in graphical independence networks (for which Bayesian networks is a special instance). The paper includes a description of the theory behind the computations. The main part of the paper is an illustration of how to use the package. The paper also illustrates how to turn a graphical model and data into an independence network.


Introduction
The gRain package (Højsgaard 2012a) is an R package, (R Development Core Team 2011) for probability propagation in gRaphical i ndependence networks which are also denoted probabilistic networks or Bayesian networks.gRain is available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=gRain.
A Bayesian network is often taken to be a graphical model based on a directed acyclic graph (DAG).The DAG, however, is only used to give a simple way of specifying a multivariate distribution by combining (conditional) univariate distributions.The key to efficient probability propagation is to exploit conditional independencies in an undirected graph which is derived from the DAG.Hence, methods for building undirected graphical models can also be used for constructing probabilistic networks.Put differently, if we infer an undirected graphical model from data then we can convert the model to a probabilistic network and use this network for computation of conditional probabilities.As a concrete example, Tamayo et al. (2011) used gRain for probability propagation in model for clinical prediction in brain cancer.Throughout grain: Graphical Independence Networks in R the paper we shall use the term network for a graphical independence network.
The networks available in gRain are restricted to consisting of discrete variables, each with a finite state space.The propagation algorithm currently available in gRain is that of Lauritzen and Spiegelhalter (1988) which we denote the LS algorithm.For brevity we refer to the paper Lauritzen and Spiegelhalter (1988) as L&S.
To our knowledge there are two other R packages for probability propagation.One is the GRAPPA suite of functions (Green 2009) and the other is RHugin (Konis 2011).Neither of these packages are on CRAN.
The paper is organized as follows: Section 2 briefly reviews Bayesian networks through the chest clinic example of L&Sand the section also provides a very short presentation of some of the functionality of gRain.Section 3 describes the concepts and computational steps of L&Sand Section 4 then describes the gRain implementation of these steps.Section 5 describes methods for building networks from data and Section 6 describes additional features of gRain which includes methods for prediction, simulation and building networks from repeated patterns.Finally, Section 7 gives an overview of the software structure while Section 8 contains a discussion.

An introductory example: The chest clinic
This section reviews the chest clinic example of L&S (illustrated in Figure 1) and shows one way of specifying the network in gRain.Details of the steps will be given in Section 4 (additional ways of specifying a network are described in Section 5).L&S motivate the chest clinic example by the following narrative: "Shortness-of-breath (dyspnoea) may be due to tuberculosis, lung cancer or bronchitis, or none of them, or more than one of them.A recent visit to Asia increases the chances of tuberculosis, while smoking is known to be a risk factor for both lung cancer and bronchitis.The results of a single chest X-ray do not discriminate between lung cancer and tuberculosis, as neither does the presence or absence of dyspnoea."

Specifying a network through conditional probability tables
Let X = X V = (X v ; v ∈ V ) be a discrete random vector where V is a set of labels of the variables.The levels of X v are generically denoted by x V so the levels of X are denoted x = x V = (x v , v ∈ V ).Each variable X v has a finite number of levels.
In this section we outline one way of building a network.The starting point is a multivariate probability distribution constructed by combining univariate (conditional) distributions using the structure of a directed acyclic graph (hereafter denoted a DAG) with vertices V .The parents of a vertex v is denoted pa(v).A joint distribution can be given as where p(x v |x pa(v) ) is a function defined on X v ×X pa(v) .This function is non-negative and satisfies that xv p( becomes the conditional distribution of X v given X pa(v) .In practice p(x v |x pa(v) ) is specified as a table called a conditional probability table (hereafter denoted a CPT).Thus, a Bayesian network can be regarded as a complex stochastic model built up by putting together simple components (conditional probability distributions).
For the chest clinic example we write the variables as A = Asia, S = smoker, T = tuberculosis, L = lung cancer, B = bronchitis, D = dyspnoea, X = X-ray and E = either tuberculosis or lung cancer.Notice that we are using X with two different meanings here.Notice also that E is a logical variable which is true if either T or L are true and false otherwise.
Following (1), the DAG in Figure 1 dictates a factorization of the joint probability function p(x V ) as

Queries to networks
Suppose we are given a finding that a set of variables E ⊂ V have a specific value x * E .With this finding we are often interested in the conditional distribution p( Interest might also be in calculating the probability of a specific event, e.g., the probability of seeing a specific finding, i.e., p(x * E ) = p(X E = x * E ).In the chest clinic example, a finding x * E could be a person who has recently visited Asia and suffers from dyspnoea, i.e., x A = yes and x D = yes.Notice that E is used with two different meanings in this example.Interest might be in p

A one-minute version of gRain
A simple way of specifying the network for the chest clinic example is as follows (the steps presented below will be explained in detail in Section 4).

Local computations
This section describes the algorithm for making efficient computations with networks while the sections to follow will relate the facilities in gRain to the theory.
Returning to the chest clinic example, suppose interest is in p(x L |x D = "yes") = p(x L , x D = "yes")/p(x D = "yes"), i.e., the probability of lung cancer given dyspnoea.A brute force approach is to (i) carry out the multiplications in (2) (which would yield a 2 8 dimensional table) and then (ii) marginalize the table onto the 2 2 table defined by L and D. Such an approach is intractable for larger networks.
The key to making efficient computations with networks is (a) to utilize the modular structure of the model so that computations can be made locally and (b) to use the graph for identifying this modular structure.

Conditional independence and graphs
Before describing the local computations in detail, we review the notions of conditional independence and dependency graphs; see Lauritzen (1996) for further details.Let X = (X v ) v∈V be a random vector with a joint density.Let A, B and C be subsets of V and let X A = (X v ) v∈A and similarly for X B and X C .Then X A and X B are conditionally independent given that is, as a product of two non-negative functions g() and h(), where g() does not depend on x B and h() does not depend on x A .This is known as the factorization criterion.
Let G = (V, E) be an undirected graph with cliques C 1 , . . .C k .If p(x V ) can be factorized as for some functions ψ C 1 () . . .ψ C T () where ψ C j () depends on x only through x C j then we say that p() factorizes according to G. If all the densities under a given model factorize according to G, then G encodes the conditional independence structure of the model, through the following result (the global Markov property): whenever sets A and B are separated by a set C in G, then A ⊥ ⊥ B | C under the model.

The moral graph
We may think of each of the conditional probability tables of the form p(x v |x pa(v) ) in ( 2) as a non-negative function of the variables it involves.We can make this explicit by writing ) or even shorter as ψ(x v , x pa(v) ).Following L&S we call these functions evidence potentials or simply potentials and we may hence write (2) as ) and so forth.Absorbing lower order terms into higher order terms allows us to write (2) as where, for example, The dependence graph of a Bayesian network is an undirected graph with vertices V , and this graph is derived from these potentials.For example, the presence of the term p(x D |x E , x B ) implies that there must be edges between all pairs in {D, E, B}.The dependence graph can algorithmically be formed from the DAG by moralization: The moral graph of a DAG is obtained by first joining all parents of each node by an edge and then dropping directions on the arrows.The moral graph for the chest clinic example is shown in Figure 2, left.The edges added are between tub and lung and between either and bronc.The global Markov property allows conditional independence restrictions implied by (4) to be read off the moral graph.
Next we illustrate an approach to efficient computations.To find e.g., p(x L , x D ) we need to sum over all other variables.We are interested in making these summations locally so that we do not create new potentials defined over large sets of variables because this is prohibitive for large networks.
For example if we start by summing over x T then it follows from (4) that we will create a potential which depends on ( Summing over x A afterwards then amounts to finding On the other hand, if we start by summing over x A then this means summing in a single potential, namely ψ * (x T ) = x A ψ(x T , x A ).This new potential can then be absorbed into ). Thereby no new potential has been created; only an existing potential has been modified.Summing over x T afterwards then amounts to finding ψ which again is a sum involving only a single potential.We can also employ such local computations when summing over x X and it then follows that Next we will have to sum over either x S , x B or x E .Summing over x S gives (5) This will produce a function of (x L , x B ), namely ψ(x L , x B ) = x S ψ(x L , x S )ψ(x B , x S ).Since L and B are not neighbours in the moral graph there is no potential into which ψ(x L , x B ) can be absorbed.(Summing over x B or x E would also create functions of variables which are not neighbours in the moral graph).

Triangulation
There is a simple condition under which the summations (when carried out in a suitable order) can be made so that no new functions are created.The condition is that the moral graph is triangulated.A graph is triangulated if and only if the graph contains no cycles of length ≥ 4 without a chord.If a graph is not triangulated it can be made so by adding edges, so called fill-ins.A triangulation of the moral graph for the chest clinic example is shown in Figure 2, right (an edge between bronc and lung has been added).Moreover, when the graph is triangulated it is possible to give an ordering in which all summations can be made such that no new functions are created.We shall return to describing the ordering later.
Clearly, p(x V ) has interactions only among neighbours in the triangulated graph and we may write 1 and all other potentials are as defined in connection with (4).The representation in ( 6) is called a clique potential representation.Using (6), the analogue of (5) becomes to set ψ ) so in this case there is now an existing potential ψ(x E , x B , x L ) into which x S ψ(x L , x B , x S ) can be incorporated.We then obtain

Different representations of the joint distribution
The key to efficient calculations is to transform the clique potential representation (such as the representation in ( 6)) into a clique marginal representation which is described below.
To do so, the following graph theoretical concepts are needed (where we refer to L&S for additional details and for references): Let bd(v i ) denote the neighbours of the vertex v i in an undirected graph.A set of variables U in an undirected graph is complete if there are edges between all pairs of variables in U .A numbering v 1 , . . ., v k of the vertices is perfect if and only if for all i the sets {v 1 , . . ., v i−1 } ∩ bd(v i ) are complete.It can be shown that a graph is triangulated if the vertices can be given a perfect ordering.Triangulatedness can be checked using the maximum cardinality search (abbreviated MCS) algorithm which works as follows: Give number 1 to any node and proceed iteratively as follows: The next node to number is a node with a maximum of previously numbered neighbours.If the graph is triangulated then the numbering obtained this way will be perfect.If the graph is not triangulated it can be made so by adding additional edges, so called fill-ins.Finding an optimal triangulation of a given graph is NP-complete.Yet, various good heuristics exist.For graph triangulation we have in gRain used the minimum clique weight heuristic method suggested by Kjaerulff (1990).
An ordering C 1 , . . ., C T of the cliques of an undirected graph has the Running Intersection Property if S j = (C 1 ∪ . . .C j−1 ) ∩ C j is contained in one (but possibly several) of the cliques C 1 , . . ., C j−1 .An ordering of the cliques that satisfies the Running Intersection Property is also called a RIP ordering.We pick one of the cliques, say C k of C 1 , . . .C j−1 and call this the parent clique of C j while C j is called a child of C k .We call S j the separator and R j = C j \ S j the residual, where S 1 = ∅.It can be shown that the cliques of a graph admit a RIP ordering if and only if the graph is triangulated.A RIP ordering can be found, for example, by ordering the cliques according to the highest number assigned to the vertices in each clique by the maximum cardinality search algorithm.
Table 1 shows the RIP ordering of the cliques of the triangulated moral graph for the chest clinic example as obtained in gRain.
The RIP ordering of the cliques can be represented as the DAG in Figure 3 which we call a rooted junction tree with C 1 as the root.
In a general setting we may (following the steps illustrated above) obtain a clique potential representation of p(x V ) given in (1) as where  1.
Notice that (1) can be seen as a way of specifying a complex joint distribution through simpler terms, a collection of CPTs.However, these CPTs are only used for establishing the clique potential representation (8).Once this representation is established, the CPTs are of no further use.It should also be noticed that the clique potentials are not uniquely defined (and in the following we shall make use of alternative sets of clique potentials).
The LS algorithm works by turning the clique potential representation (8) into a representation in which each potential ψ C j (x C j ) is replaced by the marginal distribution p(x C j ).This representation is called a clique marginal representation.This is done by working twice through the set of cliques and passing messages between neighbouring cliques: first from the last clique in the RIP ordering towards the first, i.e., inwards in the junction tree, and subsequently passing messages in the other direction.
In detail, we proceed as follows.Start with the last clique C T in the RIP ordering where C T = S T ∪ R T , S T ∩ R T = ∅ (for the chest clinic example, this is C 6 in Figure 3).The factorization (8) together with the factorization criterion (3 Then from the expression above we have and hence The RIP ordering ensures that S T is contained in the neighbour of C T in the junction tree (one of the cliques C 1 , . . ., C T −1 ), say C k .(In Figure 3, We can think of the clique C T passing the message ψ S T to its neighbour C k , and then changing its own potential to ψ C T ← ψ C T /ψ S T .Similarly, C k absorbes the message.After this we now have We can then apply the same scheme to the part of the junction tree which has not yet been traversed.Continuing in this way until we reach the root of the junction tree yields . The resulting expression ( 9) is called a set chain representation.Note that the root potential now yields the joint marginal distribution of its nodes.(In Section 5 we establish the representation (9) directly from a triangulated graph and a dataset).
Next we the other way in the set chain representation (and outwards in the rooted junction tree): Suppose C 1 is the parent of C 2 in the rooted junction tree.Then we have that p(x S 2 ) = x C 1 \S 2 p(x C 1 ) and so Thus when the clique C 2 has absorbed its message by the operation its potential is equal to the marginal distribution of its nodes.Proceeding this way until we reach the leaves of the junction tree yields the clique marginal representation Based on the clique marginal representation, marginal probabilities of each node can be found by summing out over the other variables.

Absorbing evidence and answering queries
Consider absorbing the evidence x * E = (x * v , v ∈ E), i.e., that nodes in E are known to have a specific value.We note that p(x and hence evidence can be absorbed into the model by modifying the terms ψ C j in the clique potential representation (8) as follows: for every v ∈ E, we pick an arbitrary clique C j with v ∈ C j .Entries in ψ C j which are locally inconsistent with the evidence, i.e., entries x C j for which x v = x * v , are set to zero and all other entries are unchanged.Evidence can be entered before or after propagation without changing final results.
To answer queries, we carry out the propagation steps above leading to a clique marginal representation where the potentials become ψ C j (x C j ) = p(x C j |x * E ).In this process we note that processing of the root potential to find p(x Hence the probability of the evidence comes at no extra computational cost.

Suppose we want the distribution p(x
If there is a clique C j such that U ⊂ C j then the distribution is simple to find by summing p(x C j ) over the variables in C j \ U .If no such clique exists we can obtain p(x U | x * E ) by calculating p(x * U , x * E ) for all possible configurations x * U of U and then normalize the result: this can be computationally demanding if U has a large state space.

Local computations in gRain
This section illustrates how the local computation steps described in Section 3 are made in gRain.The two main concepts are compilation and propagation; a terminology which has been adopted from the HUGIN software (HUGIN Expert A/S 2011).
In general terms, compilation refers to a transformation of specification of a joint probability distribution into a clique potential representation of the form (8) while propagation means transforming a clique potential representation of the form (8) into the clique marginal representation (10).

Specification of a network
Consider the following code fragment (which is also used in Section 2. The compileCPT() function performs the following steps: (i) Checks that specifications of the CPTs do not specify cycles.(ii) Create the corresponding DAG.(iii) Checks that the dimensions of the CPTs are consistent.(For example the specification of t.a gives a table with four entries and the variable tub is specified to be binary.Hence it is checked that the variable asia is also binary).
A network is specified using the generic grain() function.There is a grain() method for different types of model specification.The grain() method applied to plist (as defined above) sets up an internal structure which is subsequently used in the computations: R> gin1 <-grain(plist) R> summary(gin1) Independence network: Compiled: FALSE Propagated: FALSE Nodes : chr [1:8] "asia" "tub" "smoke" "lung" "bronc" ...

Compilation
The compile() method performs the following steps: (i) Creates the moral graph.(ii) Detects that the moral graph is not triangulated and therefore creates a triangulated graph by making fill-ins.(iii) Establishes a potential representation by absorbing each CPT into an appropriate clique potential; i.e., establish the representation in (8.(iv) Creates a junction tree of the cliques.

Number of cliques:
6 Maximal clique size: 3 Maximal state space in cliques: 8 A RIP ordering of the cliques of the triangulated undirected graph is contained in the slot rip in the network object; see Table 1.

Propagation
Propagation means transforming a clique potential representation of the form (8) into the clique marginal representation (10): Notice that the compile() and propagate() functions were not called explicitly in the example in Section 2.3.The reason for this is given in Section 4.3.

Setting findings
Suppose we want to enter the finding that a person has recently been to Asia and suffers from dyspnoea.This can be done in one of two ways: R> gin1c.find<-setFinding(gin1c, nodes = c("asia", "dysp"), + states = c("yes", "yes")) R> gin1c.find<-setFinding(gin1c, + flist = list(c("asia", "yes"), c("dysp", "yes"))) These findings are entered into the clique potential representation (8) as described in Section 3.5.If the network is not already compiled then no clique potential representation exists for the network.In this case, setFinding() will force a compilation to be made.Default is that the network is also propagated when setFinding() is called and therefore the compile() and propagate() functions were not called in Section 2.3.Notice that findings can be entered incrementally by calling setFinding() repeatedly.If doing so, it is advantageous (in terms of computing time) to set propagate = FALSE in setFinding() and then only call the propagate() function at the end: R> gin1c.find<-setFinding(gin1c, nodes = "asia", states = "yes", + propagate = FALSE) R> gin1c.find<-setFinding(gin1c.find,nodes = "dysp", states = "yes", + propagate = FALSE) R> gin1c.find<-propagate(gin1c.find) The finding itself is displayed with: R> getFinding(gin1c.find) The probability of observing the finding is obtained with pFinding().Findings can be retracted (removed from the network) with R> gin1c3 <-retractFinding(gin1c.find, nodes = "asia") R> getFinding(gin1c3) Finding: variable state [1,] dysp yes Pr(Finding)= 0.004501375 Omitting the nodes argument when calling retractFinding() implies that all findings are retracted, i.e., that the network is reset to its original status.

Answering queries
As illustrated in Section 2.3, queries can be made to a network using the querygrain() function.The result is by default an array (or a list of array(s)).Setting result = "data.frame"causes the result to be returned as a dataframe (or a list of dataframes).Calling querygrain() on a network which is not propagated will force a propagation to be made.
For example, consider the network with the findings inserted above: R> querygrain(gin1c.find,nodes = c("lung", "bronc"), type = "marginal", + result = "data.frame")In the first instance where type = "marginal" we get p(x L |finding) and p(x B |finding); notice that type = "marginal" is the default.Next, setting type = "joint" gives p(x L , x B |finding).Finally, setting type = "conditional" gives p(x L |x B , finding), i.e., the distribution of the first variable in nodes given the remaining variables listed.
grain: Graphical Independence Networks in R Omitting nodes when calling querygrain() implies that all nodes are considered.Hence omitting nodes and setting type = "joint" means that the full table will be calculated.

Building networks from data
The CPTs used in the previous sections have all been specified directly.However, it is clearly possible to extract such tables from data.Below we illustrate two approaches for doing so.

From a directed acyclic graph
This section describes building a network from a given DAG and data.As an example we consider the coronary artery disease data in the dataframe cad1 (see Højsgaard and Thiesson 1995 for a more detailled discussion of the data).Put briefly the coronary artery disease CAD (C) is the response variable.Occurrence of CAD is believed to depend on smoking habits (Smoker (S)), hereditary predispositions (Inherit (I)) and hypercholesterolamia (Hyperchol (H)).
Manifestations of the disease includes angina pectoris (AngPec (A)) and recordings of other heart failures (Heartfail (F)).Furthermore, the ECG reading of Q-wave (QWave (Q)) is a manifestation of CAD.We postulate the simple DAG structure in Figure 4: R> cad.dag <-dag(~CAD:Smoker:Inherit:Hyperchol + AngPec:CAD + + Heartfail:CAD + QWave:CAD) Given a DAG and the data, it is possible to extract CPTs and create a network from these as follows:: The CPTs extracted are relative frequencies of each node given its parents.To avoid zeros in the CPTs one can choose to add a small number, e.g., smooth = 0.1 to all values, corresponding to a Bayesian estimate based on prior Dirichlet distributions for the CPT entries: R> data("cad1") R> cad.cpt <-extractCPT(cad1, cad.dag, smooth = 0.1) R> cad.gin <-grain(compileCPT(cad.cpt)) These steps are wrapped into a single function which takes a DAG and a table as arguments: R> grain(cad.dag,data = cad1, smooth = 0.1) Independence network: Compiled: FALSE Propagated: FALSE Nodes: chr [1:7] "CAD" "Smoker" "Inherit" "Hyperchol" ...

From a triangulated undirected graph
In this section we describe how to build a network from an triangulated undirected graph.
The situation we have in mind is where log-linear model for a contingency table has been found, for example using a stepwise model selection procedure.
These steps are wrapped into a single function which takes an undirected (triangulated) graph and data as arguments: R> grain(cad.ug,data = cad1) 6.Additional features of gRain

Fast computation of a joint distribution
Suppose interest is in fast computation of a joint distribution of a set of variables U as discussed in Section 3.5.In this case one can force U to be in the same clique of the triangulated moralized DAG as: R> gin1c.alt<-compile(gin1, root = c("lung", "bronc", "tub"), + propagate = TRUE) R> summary(gin1c.alt)Independence network: Compiled: TRUE Propagated: TRUE Nodes : chr [1:8] "asia" "tub" "smoke" "lung" "bronc" ... Number of cliques: 5 Maximal clique size: 4 Maximal state space in cliques: 16 Hence this network has a larger clique size than the original network in Section 4.2.The triangulated graph resulting from this is shown in Figure 6.
The computing time for the joint distribution of lung, bronc and tub is much smaller when compared with the original network: R> system.time({+ for (i in 1:200) + querygrain(gin1c.alt,nodes = c("lung", "bronc", "tub"), type = "joint") + }) This makes computation of their joint distribution faster but at the expense of a larger clique size.

Simulation
It is possible to simulate data from a network (with or without findings) using the method of Dawid (1992), see also Cowell et al. (1999, p. 99).If a network is not propagated when simulate() is applied, simulate() will force this to happen automatically.The algorithm works as follows: Consider again Figure 3. First data are simulated for the variables in C 1 using p(x C 1 ).Then data for R 2 = C 2 \ S 2 are generated from p(x R 2 |x S 2 ) using the previously simulated data for S 2 = C 2 ∩ C 1 and so on outwards in the junction tree.
The output should be read carefully: Conditional on the first observation in mydata, the most probable value of lung is "yes" and the same is the case for bronc.This is not in general the same as saying that the jointly most likely configuration of the two variables lung and bronc is "yes".

Repeated patterns
The gRain package provides a simple mechanism for producing repeated patterns.To illustrate this we define a homogeneous hidden Markov model where the x t s are unobserved and the y t s are observed.
First we define templates for transition and emission densities for each time point: R> yn <-c("yes", "no") R> x.This causes two types of problems.First, in HUGIN it is allowed to have e.g., spaces and special characters (e.g., "?") in variable labels.This is not permitted in gRain.If such a name is found by loadHuginNet(), the name is converted as follows: Special characters are removed, the first letter after a space is capitalized and then spaces are removed.Hence the label "visit to Asia?" in a net file will be converted to "visitToAsia".Then same convention applies to states of the variables.Secondly, because node labels in the net file are used as node names in gRain we may end up with two nodes having the same name which is obviously not permitted.To resolve this issue gRain will in such cases force the node names in gRain to be the node names rather than the node labels from the net file.For example, if nodes A and B in a net file both have label foo, then the nodes in gRain will be denoted A and B. It is noted that in itself this approach is not entirely foolproof: If there is a node C with label A, then we have just moved the problem.Therefore the scheme above is applied recursively until all ambiguities are resolved.

Overview of software structure
In this section we briefly summarizes the software structure.
A network can be specified in different ways: (a) From a list of CPTs, (Section 2.3) (b) From a DAG and a dataset (Section 5.1) and (c) From a triangulated undirected graph and a dataset (Section 5.2).The grain() function is used in this connection.
Findings are entered into a network using setFinding() and queries to a network are made using querygrain() (see Sections 2.3 and 4.3).Findings can only be entered to a compiled network and queries can only be asked to a compiled and propagated network.However, compilation and propagation (using compile() and propagate()) is forced (when necessary) by the setFinding() and querygrain() functions).
Findings can be entered sequentially and findings can be retracted (Section 4.3).Entering findings sequentially is one example where it may be advantageous to defer propagation until all findings have been entered.
When compiling a network it is possible to force a set of variables to be in the same clique of the underlying undirected graph.Section 6.1 shows an example where this is used.

Discussion and perspectives
This paper describes a propagation algorithm for graphical independence networks (Bayesian networks); how to set findings and how to query such networks.The paper also describes how to establish networks from data and a statistical model.
The critical point in terms of storage is the size of the clique potentials which is determined by the size of the cliques in the underlying triangulated graph.The critical computational point in connection with computing time are the operations on clique potentials: multiplication, division and marginalization.When the cliques become large and/or the variables have many levels then gRain becomes slow.On the other hand, if the underlying graph is sparse (for example a tree) then gRain is quite fast.Thereby it is feasible to use gRain in e.g., bioinformatic applications where the underlying graph often is sparse.
An alternative to gRain is the RHugin package, Konis (2011).RHugin provides an interface to the HUGIN Application Programmers Interface (HUGIN Expert A/S 2011) which makes RHugin very efficient.RHugin works with the limited version of HUGIN which is freely available.In terms of speed and computational efficiency, RHugin is much more efficient than gRain.
There exist faster algorithms than the one implemented in gRain, for example the algorithm described by Jensen et al. (1990).Their algorithm is presumably faster than the LS algorithm but also more expensive in terms of memory requirements than the LS algorithm.Their algorithm may become available in gRain in the future.

Figure 2 :
Figure 2: Left: Moral graph obtained by adding edges between common parents of each node and then dropping directions.Right: Triangulated moralized DAG.

Figure 4 :
Figure 4: A DAG model for coronary artery disease data.

Figure 6 :
Figure 6: Triangulation when lung, bronc and tub are forced to be contained in one clique.This makes computation of their joint distribution faster but at the expense of a larger clique size.

Figure 7 :
Figure 7: Hidden Markov model generated with repeated patterns.

Table 1 :
RIP ordering of the cliques of the triangulated moral graph for the chest clinique example.For example, clique number 4 has clique number 3 as its parent.
C 1 , C 2 , . . ., C T are the cliques of the triangulated moral graph.grain: Graphical Independence Networks in R Figure 3: Rooted junction tree for the chest clinic example from L&S.The numbers refer to the cliques given in Table and is shown in Figure5.To turn this model into a network, two observations are made: First, the compilation step in Section 4.2 only served to get from a list of CPTs to the grain: Graphical Independence Networks in R ) of the cliques of a triangulated undirected graph.After the clique potential representation is obtained, the CPTs are of no further use.Secondly, the set chain representation (9) is one such clique potential representation and obtaining this set chain representation is straight forward given data and a triangulated graph: Given the RIP ordering of the cliques we extract the tables for a clique C k and a separator S k , normalize these to sum to one to give p(x C k ) and p(x S k ) and set ψ C k (x C k ) = p(x C k )/p(x S k ).These potentials are then turned into a network.
(Csardi and Nepusz 2006)thods for plotting networks: (a) The plot() methods are based on the Rgraphviz packageGentry et al. (2012)which requires that the Graphviz program, (AT&T Research 2009)is installed.All plots in this paper are created using these plot() methods.(b)Theiplot()methodsarebased on the igraph package(Csardi and Nepusz 2006)and igraph does not require external programs to be installed.6.6.Working with HUGIN net filesThe HUGIN program (HUGIN Expert A/S 2011) is a commercial program for Bayesian networks.A limited version of HUGIN is freely available.A network made in HUGIN can be saved in a specific format known as a net file (which is a text file).Such a net file can be read into R as a network using the loadHuginNet() function and a network in R can be saved in the net format with the saveHuginNet() function.HUGIN distinguishes between node names and node labels.Node names have to be unique; node labels need not be so.When creating a network in HUGIN node names are generated automatically as C1, C2 etc.The user can choose to give more informative labels or to give informative names.Typically one would do the former.Therefore loadHuginNet() uses node labels (if given) from the netfile and otherwise node names.