Hierarchical Graphical Bayesian Models in Psychology

The improvement of graphical methods in psychological research can promote their use and a better comprehension of their expressive power. The application of hierarchical Bayesian graphical models has recently become more frequent in psychological research. The aim of this contribution is to introduce suggestions for the improvement of hierarchical Bayesian graphical models in psychology. This novel set of suggestions stems from the description and comparison between two main approaches concerned with the use of plate notation and distribution pictograms. It is concluded that the combination of relevant aspects of both models might improve the use of powerful hierarchical Bayesian graphical models in psychology.


Introduction
Graphical models are powerful formal tools to represent dependencies and independence among variables, to make probabilistic inferences of conditional probabilities, and to estimate the values of parameters (Jordan 2004, Koller, Friedman, Getoor & Taskar 2007, Pearl 2009). This article focuses on the use of graphical models as tools to specify or illustrate hierarchical Bayesian models in psychology. More specifically, we concentrate on the visual features and the pieces of information represented in graphical hierarchical Bayesian models in psychology. Issues related to the formalism of the graphical model representation, algorithms to infer conditional probabilities, and machine learning procedures to estimate parameters are beyond the scope of this article (for an exhaustive treatment of these issues see Koller & Friedman 2009). Research into the role of graphs in the understanding of statistical information has shown that diagrams facilitate different kinds of reasoning (Bauer & Johnson-Laird 1993, Stenning & Oberlander 1995). It has also been emphasised that, in order to make research interesting for readers, graphs that represent complex information should be kept as simple as possible (Gray & Wegner 2013).
This article discusses two types of graphical models that have been used in psychology to represent hierarchical Bayesian models: graphical models with plate notation (Buntine 1994, Gilks, Thomas & Spiegelhalter 1994, Lee 2008) and graphical models with distribution pictograms (Kruschke 2010a, Kruschke 2010b). Moreover, it introduces a new type of graphical model that aims to combine the positive aspects of those two alternatives while keeping the representation as simple as possible. In line with Gray and Wegner's proposal, we aimed to generate an attractive graphical representation, for example, by adding 3D shapes to the graph. As indicated by a reviewer and by Wickens, Merwin & Lin (1994), 3D graphs sometimes improve and sometimes impair the understanding of statistical information. Therefore, we were very cautious when using 3D shapes. The article continues as follows. We first very briefly describe hierarchical Bayesian models, and then we discuss the use of graphical models to represent such models. After that we present the rationale of graphical models with plate notation, and we describe the introduction of hierarchical Bayesian graphical models in psychology. We then explain and discuss the use of graphical models with plate notation and graphical models with distribution pictograms. Lastly, we present the new type of representation, and we discuss its advantages and disadvantages.

Hierarchical Bayesian Models
Bayesian analysis was introduced to psychology by Edwards, Lindman & Savage (1963). In this article we concentrate on the hierarchical aspect of hierarchical Bayesian models (for an introduction to Bayesian statistics see, for example, Bolstad, 2007). Good (1980) indicated that there are three types of hierarchies that can be represented with hierarchical Bayesian models. One type is hierarchies of physical probabilities. For example, in a population of animals, there is a probability that an animal belongs to a category of animal (e.g., mammal), a probability that it belongs to a species (e.g., dog), a probability that it belongs to an age category (e.g., 1 year of age), and so forth.
The second type of hierarchy arises from the fact that subjective probabilities about events cannot be sharp (e.g., the probability of an article being accepted being 0.65474). One way of dealing with this is to express the confidence in this subjective probability, and to represent it as a probability distribution of a higher type. The third type of hierarchy is a combination of the first two types. Lee (2008) takes a pragmatic approach, and considers that a Bayesian model is hierarchical when it is more complex than a Bayesian model with a set of parameters θ generating a set of data d through a likelihood function f(.). Similarly, Kruschke's (2010a) chapter on hierarchical Bayesian models includes hyper-parameters, which are parameters that do not generate data; instead, they affect other parameters.

Graphical Model Representation of Hierarchical Bayesian Models
Jordan (2004) indicated that a graphical model is a family of probability distributions represented in a directed or undirected graph. Graphs contain nodes, which represent random variables, connected by edges, which can be either directed (i.e., arrows) or undirected (i.e., lines). The most popular graphical models are Markov networks and Bayesian networks (or "Bayes nets"). In Markov networks all the edges are undirected, whereas in Bayesian networks all the edges are directed and the graph is acyclic. A graph is cyclic if it contains a cyclic path. A path is a sequence of nodes in which each node, except the first node in the sequence, receives a directed edge from the preceding node in the sequence (e.g., o → o → o → o). When the last node in the sequence is the same as the first node, the path is cyclic; otherwise it is acyclic. In this article we focus on directed acyclic graphs (DAGs). Although graphical models can be used by both frequentist and Bayesian approaches, they are very powerful tools to formulate complex models of joint probability distributions; hence, they have been very popular within the Bayesian approach (Jordan 2004). We present more details of graphical models in the following sections.
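The definition of a cyclic path given above can be checked mechanically. The following Python sketch is only an illustration and is not part of any of the cited approaches: it assumes that a directed graph is stored as a dictionary mapping each node to the list of nodes that receive an edge from it, and the function name is_acyclic is hypothetical.

```python
from typing import Dict, List


def is_acyclic(graph: Dict[str, List[str]]) -> bool:
    """Return True if the directed graph contains no cyclic path (i.e., it is a DAG)."""
    unvisited, in_progress, done = 0, 1, 2
    # Collect every node, including nodes that only appear as children.
    nodes = set(graph) | {child for children in graph.values() for child in children}
    state = {node: unvisited for node in nodes}

    def visit(node: str) -> bool:
        state[node] = in_progress
        for child in graph.get(node, []):
            if state[child] == in_progress:
                # The path has returned to a node it started from: a cyclic path.
                return False
            if state[child] == unvisited and not visit(child):
                return False
        state[node] = done
        return True

    return all(visit(node) for node in nodes if state[node] == unvisited)


# A chain o -> o -> o -> o is acyclic; closing the chain makes it cyclic.
chain = {"a": ["b"], "b": ["c"], "c": ["d"]}
loop = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(is_acyclic(chain), is_acyclic(loop))
```

A depth-first search is used here because a cyclic path exists exactly when the search reaches a node that is still on the path currently being explored.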

Plate Notation in Graphical Model Representations of Hierarchical Bayesian Models
The plate notation was introduced by Buntine (1994) in order to solve the problem that graphical models did not represent the fact that the learning algorithms to estimate parameters had to do so over a repeated set of measured variables. Figure 1 is an adaptation of Buntine's Figure 13, which shows how repetition of variables was represented without plates (panel a) and the same model represented with plates (panel b). The graphical model represents a biased coin toss experiment in which the probability of heads is θ, and the coin is tossed N times. Each time the coin is tossed the experimenter records whether it was a head or not. The hierarchical aspect of the model is made apparent by the introduction of a prior distribution in the graph, that is, a Beta distribution with parameters a = 1.5 and b = 1.5, from which the parameter θ is generated.
Figure 1: Graphical models adapted from Buntine (1994), without plates (panel a) and with plates (panel b). Instead of representing a number of nodes of the variable heads as in panel a, the plate representation surrounds one heads node by a plate, indicating that the model structure is repeated N times. Shaded nodes indicate observed variables and unshaded nodes indicate unobserved variables. The arrows show the dependencies between the variables.
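To make the generative reading of the coin toss model concrete, the following minimal Python sketch draws θ from the Beta(1.5, 1.5) prior and then repeats the heads variable N times, which is what the plate stands for. The function names and the seed are illustrative assumptions; the posterior update shown is the standard conjugate Beta result, which is not discussed by Buntine (1994) in the passage described here.

```python
import random
from typing import List, Tuple


def simulate_coin_model(n_tosses: int, a: float = 1.5, b: float = 1.5,
                        seed: int = 0) -> Tuple[float, List[int]]:
    """Draw theta from the Beta(a, b) prior, then generate n_tosses heads values."""
    rng = random.Random(seed)
    theta = rng.betavariate(a, b)  # top level of the hierarchy: the prior on theta
    # The plate: the same heads node is repeated n_tosses times, all sharing theta.
    heads = [1 if rng.random() < theta else 0 for _ in range(n_tosses)]
    return theta, heads


def posterior_beta(heads: List[int], a: float = 1.5, b: float = 1.5) -> Tuple[float, float]:
    """Standard conjugate update: Beta(a + number of heads, b + number of tails)."""
    k = sum(heads)
    return a + k, b + len(heads) - k


theta, heads = simulate_coin_model(n_tosses=20)
print(theta, heads, posterior_beta(heads))
```

The loop over tosses plays the role of the plate in panel b, while writing out one line per toss would correspond to the representation without plates in panel a.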
There is nothing wrong with the model without plates, but when the number of variables and parameters increases the graphical model becomes difficult to represent in a limited space. The plate indicates that the variables within that plate are repeated the number of times indicated in the plate (in the model presented in Figure 1b, N times). This version of plate notation includes other important features: shaded nodes indicate observed data or variables with known values, whereas unshaded nodes are unobserved variables. In other graphical models in the article, Buntine (1994) used a double border to represent deterministic variables (the same notation was adopted by Lee (2008); see Figure 5.1, variables h_i and f_i). A note of interest is that Buntine included in the graph the Beta distribution from which the parameter θ is generated, but not the Bernoulli distribution by which each value in heads is obtained. Figure 2 presents an adaptation of Gilks et al.'s (1994) graphical model with plates in their presentation of the BUGS (Bayesian inference Using Gibbs Sampling) language and sampler. (For historical reasons, it is probably important to note that the term "plate" was not used in that article.) The model represents two research studies with the purpose of estimating disease risk associated with exposure to chemical agents: in the "disease study" the variables jobtype and disease were collected, and in the "exposure study" the variables jobtype and exposuretoagents were collected in another set of individuals.
Figure 2: Graphical model adapted from Gilks et al. (1994). Circle nodes denote unobserved stochastic variables, square nodes with a single border represent observed stochastic variables (i.e., data), squares with double border denote fixed quantities in prior distributions, and triangles denote deterministic variables. The arrows represent the dependencies between variables. In this model all the arrows represent stochastic dependencies.
In the model for the disease study, D_i is the disease status of the ith individual and θ_i is his/her probability of disease, which depends on E_ik, the unobserved exposure status (0 = unexposed; 1 = exposed) of the ith individual to the kth chemical agent; nA = 4 is the number of chemical agents. In the model for the exposure study, j = 1 to nJ, where nJ = 2 is the number of job types, π_jk is an exposure probability, n_j is the number of individuals in the exposure study who were in job type j, and m_jk is the number of those individuals in job type j who were exposed to chemical agent k. The connection between the disease study and the exposure study is made through j(i), the job type of individual i in the "disease study". Priors are specified for the model parameters, with logit(π_jk) = φ_jk. The graphical representation of this Bayesian hierarchical model in Figure 2 contains the plates i, j, k. The variables within each plate are replicated for each unit indicated in the plate. For example, D_i indicates that there is a value of disease for each individual i, and n_j that there is a value for the variable n for each job type j (e.g., 10 individuals are drivers, 25 individuals are teachers, etc.). The number of replications of variables that are contained in two plates (e.g., i and k) is the product of the number of units indicated in each plate (e.g., number of i units × number of k units). The replication aspect of this graphical model is identical to that of Buntine (1994).
However, there are a number of differences in the graphical representations. In Gilks et al. (1994) there are no shaded nodes, and the difference between observed and unobserved variables is given by the shape of the nodes. Circles denote unobserved stochastic variables, square nodes with a single border denote observed stochastic variables (i.e., data), squares with double border denote fixed quantities in prior distributions, and triangles denote deterministic variables. Another difference is that in Gilks et al. (1994) the names of the distributions are not included; instead of the total number of units per plate, the plates show the unit index (i.e., i, j, k) and the name of the units (i.e., individuals, jobs, agents), and the indices of the variables are located within the nodes. Lunn, Thomas, Best & Spiegelhalter (2000) presented WinBUGS, which uses a graphical interface called DoodleBUGS. The graphical representation code in DoodleBUGS differs from the previous two versions. Figure 3 shows the simple example presented by Lunn et al. (2000), in which the graphical model represents a linear regression model. In this model, y_i are the observations measured at the experimental design points x_i, i = 1, ..., N; τ is the inverse of the residual variance; and the mean of y_i is a linear function of x_i, where α is the intercept parameter and β is the slope parameter, both of which are unknown.
In this representation rectangular nodes denote known constants, and round nodes denote deterministic relationships or stochastic quantities. Stochastic dependence is represented by single-edged arrows and deterministic dependence is denoted by double-edged arrows. The plate includes both the index of the units (i.e., i) and the range of the repetition, from unit i = 1 to unit i = N.
Summing up, a number of visual components of the graphical models have been used to represent hierarchical Bayesian models: shape of nodes, shade of nodes, type of arrows, indication of repetition, indices and inclusion of the distribution name. In the next section we describe the most popular plate notation that has been used to represent hierarchical Bayesian models in psychology and a new type of graphical model, which does not contain plates.
Figure 3: Graphical model adapted from Lunn et al. (2000). Rectangular nodes denote known constants, round nodes denote deterministic relationships or stochastic quantities. Stochastic dependence is represented by single-edged arrows and deterministic dependence is denoted by double-edged arrows.

Lee's (2008) Hierarchical Bayesian Graphical Models with Plate Notation
Lee (2011) indicated that Bayesian statistics have been used in psychology in three ways. The first use of Bayesian statistics is to conduct data analysis. Edwards et al. (1963) introduced Bayesian analysis to psychology, and more recently a number of researchers (Dienes 2011, Kruschke 2010a, Kruschke 2010b, Lee & Wagenmakers 2005, Wagenmakers 2007) have advocated that Bayesian statistics should replace the Null Hypothesis Statistical Testing (NHST) paradigm, which is still the most popular paradigm in psychology. Another use of a Bayesian approach in psychology is to produce psychological models of how humans make inferences (see Griffiths, Kemp & Tenenbaum, 2008 for a detailed treatment of this approach). The third use, the one we focus on in this article, is to relate psychological models to data. To our knowledge Lee (2008) presented the first graphical model with plates to represent hierarchical Bayesian models in psychology. Lee "translated" three influential cognitive models in psychology into hierarchical Bayesian models: the multidimensional scaling (MDS) representation of stimulus similarity (Shepard, 1962, 1980), the generalized context model (GCM) account of category learning (Nosofsky, 1984, 1986), and a signal detection theory (SDT) account of reasoning (Heit & Rotello 2005; see Figure 4).
The plate represents repetitions over N participants in a reasoning experiment (see Heit & Rotello 2005 for details of the experiment). Lee uses shaded nodes for observed variables and nodes without shading for unobserved variables. Continuous variables are represented by circles and discrete variables by squares; and stochastic and deterministic unobserved variables are denoted by a single border and a double border, respectively. In the model, d_i and c_i denote the ith individual's discriminability and bias, respectively. They are generated from normal distributions with mean parameters m_d and m_c, respectively, and precision (i.e., 1/variance) parameters τ_d and τ_c; that is, d_i ∼ Normal(m_d, τ_d) and c_i ∼ Normal(m_c, τ_c). The priors of the mean and precision parameters are not shown in the graphical model; they are standard near non-informative priors. The hit rate h_i and the false alarm rate f_i are deterministic variables calculated from d_i and c_i through Φ(.), the standard cumulative Gaussian function. The counts of hits and false alarms come from Binomial distributions with probabilities h_i and f_i, respectively, over t_s and t_n trials, where t_s and t_n are the numbers of signal and noise trials presented in the experiment. As in Buntine (1994), this graphical model uses shading to differentiate observed from unobserved variables, and a double border to represent deterministic variables. A new aspect of this representation is that shape is used to differentiate continuous from discrete variables. Like Lunn et al. (2000), Lee (2008) indicated the unit index and the total number of repetitions, but he did not follow the former in using double-edged arrows to denote a deterministic dependency between variables. The distributions are not represented in the graphical model.
Figure 4: Graphical model with plate notation adapted from Lee (2008, Figure 10). Squares represent discrete variables, circles continuous variables, shaded nodes denote observed variables and unshaded nodes represent unobserved variables. The plate represents repetition of the model structure over units i = 1 to N. The arrows represent dependencies between variables.
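The structure of this signal detection model can be sketched as a generative program. The Python code below is a hedged illustration rather than Lee's (2008) implementation: the formulas h = Φ(d/2 − c) and f = Φ(−d/2 − c) are the usual equal-variance SDT parameterisation and are assumed here, and the function names, the seed and the parameter values in the usage line are hypothetical.

```python
import math
import random
from typing import List, Tuple


def phi(x: float) -> float:
    """Standard cumulative Gaussian function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rates(d: float, c: float) -> Tuple[float, float]:
    """Deterministic nodes: hit and false alarm rates.

    Assumption: the usual equal-variance SDT parameterisation.
    """
    return phi(d / 2.0 - c), phi(-d / 2.0 - c)


def simulate_participants(n: int, m_d: float, tau_d: float, m_c: float, tau_c: float,
                          t_s: int, t_n: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Generate (hit count, false alarm count) for each of n participants."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):  # the plate: repetition over participants i = 1 to N
        # Stochastic unobserved nodes; precision tau is converted to a standard deviation.
        d = rng.gauss(m_d, 1.0 / math.sqrt(tau_d))
        c = rng.gauss(m_c, 1.0 / math.sqrt(tau_c))
        h, f = rates(d, c)
        # Observed discrete nodes: Binomial counts over signal and noise trials.
        k_h = sum(rng.random() < h for _ in range(t_s))
        k_f = sum(rng.random() < f for _ in range(t_n))
        counts.append((k_h, k_f))
    return counts


print(simulate_participants(n=3, m_d=1.0, tau_d=4.0, m_c=0.0, tau_c=4.0, t_s=10, t_n=10))
```

In the code, the distinction between the stochastic draws (d, c and the counts) and the deterministic function rates mirrors the distinction that the graphical model expresses with single and double borders.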
After Lee's (2008) article the plate notation gained popularity within mathematical psychology, and it has been subsequently used in a number of studies.

Kruschke's (2010) Hierarchical Bayesian Graphical Models with Distribution Pictograms
Kruschke (2010a, 2010b) introduced a new graphical representation of hierarchical Bayesian models in his book "Doing Bayesian Data Analysis: A tutorial with R and BUGS". He explained the purpose of introducing his graphical representation in a post of October 2013 in his blog (doingbayesiandataanalysis.blogspot.com). He stated that directed acyclic graphs are incomplete and/or confusing for him, and that designing graphical models with pictograms of the distributions helps him to explain, invent and program models. As shown in Figure 5, this graphical model does not use plates, the variables are represented not by nodes but by letters, and pictograms with prototypical distribution shapes are used. Moreover, the arrows are accompanied by a ∼ sign indicating a stochastic dependency, or by a = sign denoting a deterministic dependency. The repetition is represented by three dots and the index (e.g., . . . i), and when variables repeat over two different sets of units this is indicated with the conditional notation (e.g., j | i).
Moreover, unlike in the graphs with plate notation, the equations to calculate the deterministic variables are included in the model. The graphical model in Figure 5 was presented by Kruschke in his blog with the purpose of comparing his novel graphical model with the more established graphical models with plate notation (it differs from the models in his book, in which only ". . ." without the index is used to represent repetition). It represents a model described by Spiegelhalter, Thomas, Best & Gilks (1996), in which y_j|i denotes the weight of rat i on day j after birth. This variable is generated by a normal distribution in which w_j|i is the mean and λ is the precision. w_j|i is deterministically calculated from φ_i, the intercept, and ξ_i, the slope. Both come from normal distributions with means κ and ζ and precisions δ and γ, respectively; that is, φ_i ∼ Normal(κ, δ) and ξ_i ∼ Normal(ζ, γ). The mean parameters come from a normal distribution with parameters M and H, and the precisions come from a gamma distribution with parameters K and I. Bååth provided free software to draw the graphical diagrams with the distribution pictograms. Schneider (2013) presented a graphical model using Bååth's pictograms. This representation differs from that of Kruschke in that boxes were added around the distributions, double-edged arrows were used to denote deterministic relations (note that instead of the single-edged arrow used in Figure 6, Schneider used a curly arrow), and the deterministic equation is surrounded by a brace.
Summing up, Kruschke's (2010a, 2010b) graphical model puts emphasis on the type of distribution. The repetition is not represented by plates; rather, three dots and the indices of the variables are presented to denote the repetition. Another important aspect of this graphical model is that the parameters that belong to the same distribution are presented contiguously in space. (Note that this is even more emphasised in Schneider's representation, in which the parameters that form a distribution are surrounded by a box.) As indicated by Kruschke, the inclusion of distribution pictograms plays a heuristic role for the invention, explanation and programming of models. A possible problem with this approach is that sometimes a deterministic variable comes from a long relation between other variables, which could use a lot of space and make the graph difficult to understand.

Discussion
Both the graphical models with plates and the graphical models with distribution pictograms represent the dependencies between variables by using arrows. The former emphasise the independent repetition of the variables, and the latter emphasise the probability distributions. As mentioned earlier, Buntine's (1994) graphical models contained both plates and the names (instead of the pictograms) of the distributions. However, Buntine only presented graphical models with one or two plates, and the larger the number of plates, the smaller the space available to include information about the distributions.
Moreover, complex models are also difficult to represent with graphical models with plates. For example, Figure 6.3 shows a hierarchical Bayesian model presented by Lee (2008). The complexity of the model led the author to use three i plates and two j plates. Although this solution is flawless from the formal point of view, it reduces the heuristic value of the graphical models with plates. Given that there is no differentiation between plates (i.e., they all have the same format), the only graphical difference between, say, the plate i and the plate j is that they are different plates. Therefore, adding more plates to denote repetition over the same set of units might be confusing for some researchers. One obvious solution to this problem is to colour code or shape code the plates, that is, to use a different colour (or shape) for each type of plate. Based on this summary we aimed to develop a type of graphical model that incorporates both the distribution pictograms and the plate notation, that solves the problems identified in the discussion, and that has heuristic value to help researchers invent, explain and program hierarchical Bayesian models.

Hierarchical Bayesian Graphical Models with Distribution Pictograms and Mini-Plates
The first ingredient of our proposal is the replacement of plates by colour-coded mini-plates. (Note that in the paper version of this article we use shadings; please refer to the link provided below to see a colour version of the graphical models.) As explained above, the purpose of colour-coding is to graphically differentiate between different types of plates. While we were developing the idea of using colour to differentiate between plates, we realised that if the colour indicates the set of units over which the repetition occurs, then the plates might not be necessary. Thus, we originally developed graphical models without plates in which we colour coded the nodes. However, because colour (or shading) is already used to code for variable type (i.e., observed vs. unobserved), this implementation was unsatisfactory.
That led us to develop the idea of colour-coded mini-plates, and to use them to surround each node. However, this was also unsatisfactory because the mini-plates with the nodes occupied too much space. In parallel we were also considering the use of 3D representations in order to make the graphical models more attractive. When we were trying different ways of using 3D nodes, we found that in some cases a combination of 3D shapes occupies less space than 2D shapes because the former allow more flexibility to locate the nodes in space. Thus, we realised that using 3D nodes was also a good way to save space, and then we came up with the idea of using 3D rotated mini-plates under the nodes, instead of surrounding the nodes.
Unfortunately, in order to trigger the perception of 3D nodes it is necessary to add some shading to the nodes, which, as indicated by a reviewer, was very distracting. Therefore, we used 3D mini-plates and 2D nodes. Note that in the link provided below we used 3D nodes with colour because the shading in some colours is not distracting. Figure 7 shows the mini-plate version of the bottom right part of the graphical model presented in Figure 4. (Note that this is a black-grey-white version of the graphical model. A colour version of Figure 7, as well as of Figure 8 and Figure 11, can be accessed at this link: http://dx.doi.org/10.6084/m9.figshare.1020148. Moreover, a Microsoft PowerPoint template to create the figures can be accessed at this link: http://dx.doi.org/10.6084/m9.figshare.1020020.) As in Figure 4, the nodes of the variables k^f_i and t_n are shaded because they are observed variables, and they have the shape of a square to indicate that they are discrete variables. Likewise, the f_i node is unshaded to denote that it is an unobserved variable, and it is a circle to indicate that it is a continuous variable. As in Figures 3 and 6, we adopted the double-edged arrow to indicate a deterministic dependency. Thus, unlike in Figure 4, the f_i node does not have a double border. The grey mini-plates under the f_i and k^f_i nodes indicate that they repeat over i = 1 to N units. Mini-plates play exactly the same role as plates; that is, they indicate that the variables within them are repeated over the number of units indicated in the plates or mini-plates. Having mini-plates gives the modeller flexibility as to where to position the nodes in space and, as a consequence, makes it easier to incorporate Kruschke's distribution pictograms. Figure 8 shows the mini-plate version of the bottom part of the graphical model presented in Figure 5.
As in Figure 5, the mini-plate version incorporates the pictogram of the normal distribution with the parameters represented with unshaded circles. Note that two plates are located under y_ij: the grey i plate and the diagonally striped j plate. Likewise, the normal distribution and ω_ij are on top of an i plate and a j plate, indicating that the process that generates y_ij repeats over the i and j units. However, the λ node is not on top of any plate because the generating process uses only one value of λ.
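The information carried by mini-plates can be summarised as a small data structure: each node records the set of mini-plates it sits on, and the number of replications is the product of the sizes of those plates. The Python sketch below is only an illustration of this bookkeeping; the class Node, the function replications and the plate sizes are hypothetical and are not part of the proposal or of any drawing software.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Node:
    name: str
    observed: bool                        # shaded (observed) vs. unshaded (unobserved)
    plates: FrozenSet[str] = frozenset()  # mini-plates located under the node


def replications(node: Node, plate_sizes: Dict[str, int]) -> int:
    """Number of copies of a node: the product of the sizes of its mini-plates."""
    # prod of an empty sequence is 1, so a node on no plate occurs exactly once.
    return prod(plate_sizes[p] for p in node.plates)


# Hypothetical sizes for the i and j units, chosen only for illustration.
plate_sizes = {"i": 30, "j": 5}
y = Node("y", observed=True, plates=frozenset({"i", "j"}))
lam = Node("lambda", observed=False)

print(replications(y, plate_sizes), replications(lam, plate_sizes))
```

Because each node carries its own plate membership, nodes that share a set of units need not be drawn next to each other, which is the flexibility that mini-plates are meant to provide.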

Comparison Between Three Graphical Model Representations of a Hierarchical Bayesian Model
Having presented the mini-plate graphical model with distribution pictograms, we now discuss whether this type of graphical representation is capable of representing complex models, and how it compares with the other two types of graphical models. For this purpose we present here a complex hierarchical Bayesian model with plate notation developed by Lee (2008, pages 5 to 8), based on Nosofsky's (1986) generalised context model (GCM) of category learning.
This model aims to explain how people learn to categorise unknown stimuli into two categories in experiments in which the researcher uses different category structures (see more details in Lee, 2008). In those experiments the researcher assigns one fourth of the total number of stimuli to one category (i.e., category A), one fourth to the other category (i.e., category B), and one half is not assigned to any category. The model utilises the multidimensional scaling (MDS) representation of stimulus similarity developed by Shepard (1962). Stimuli are represented as points (p) in a D-dimensional space. In Figure 9, p_ix denotes a coordinate value of stimulus i in dimension x. The surrounding plates indicate that, in this example, there are N stimuli and x = 2 dimensions. Lee (2008) assigned a prior probability to p_ix (not shown in the graph). d²_ij denotes the squared psychological distance between all the possible pairs of N stimuli, where i denotes the first stimulus and j the second stimulus of the pair. The node is surrounded by the i and j plates to denote the repetition over all the possible pairs of stimuli. The squared psychological distance between stimuli is determined by the coordinates p_ix and by w, the relative attention paid to the first dimension over the second dimension, which has the following prior distribution (not shown in the graph): w ∼ Uniform(0, 1).
Figure 9: Adaptation of the graphical model with plates presented by Lee (2008, page 6, Figure 5). The notation is the same as in Figure 4.
The similarity between each pair of stimuli, s_ij, is determined by the squared psychological distance d²_ij and by c, a generalisation gradient parameter, whose prior (not shown in the graph) depends on a constant ε = .001.
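The two deterministic steps described above, from coordinates to distances and from distances to similarities, can be sketched in Python. This is an illustration under explicit assumptions, not a transcription of Lee's (2008) equations: the attention-weighted sum of squared coordinate differences and the exponential decay of similarity with distance are the forms commonly used in generalised context models, and the function names are hypothetical.

```python
import math
from typing import Sequence


def squared_distance(p_i: Sequence[float], p_j: Sequence[float], w: float) -> float:
    """Attention-weighted squared distance between two stimuli in two dimensions.

    Assumption: w weights the first dimension and (1 - w) the second.
    """
    return w * (p_i[0] - p_j[0]) ** 2 + (1.0 - w) * (p_i[1] - p_j[1]) ** 2


def similarity(p_i: Sequence[float], p_j: Sequence[float], w: float, c: float) -> float:
    """Similarity as a decreasing function of distance.

    Assumption: exponential decay with generalisation gradient c.
    """
    return math.exp(-c * math.sqrt(squared_distance(p_i, p_j, w)))


# Two stimuli in a two-dimensional space, with equal attention to both dimensions.
print(squared_distance((0.0, 0.0), (1.0, 1.0), w=0.5))
print(similarity((0.0, 0.0), (1.0, 1.0), w=0.5, c=2.0))
```

Both functions take a pair of stimuli as input, which corresponds to the nodes d²_ij and s_ij being surrounded by both the i and the j plates.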
The number of times the ith stimulus is chosen as a member of category A out of t_i trials is denoted by k_i, which follows a binomial distribution, k_i ∼ Binomial(r_i, t_i), where r_i is the probability of stimulus i being chosen as a member of category A. This probability is determined by the similarities between stimuli (s_ij), a response bias b, how the stimuli were assigned to the categories by the researcher (i.e., indicator variables a_j, z_j), and a third indicator variable x_i. The prior distribution of the response bias b (not shown in the graph) is: b ∼ Uniform(0, 1).
a_j denotes the known assignment of the jth presented stimulus, which ranges over the N/2 such stimuli, and z_j indicates the latent assignment of the jth unassigned stimulus, ranging over the N/2 such stimuli (see Lee, 2008, for a more detailed explanation of stimuli assignment structures); z_j is given a prior. The indicator x_i is incorporated in the graph in order to compare a model that uses the latent stimulus assignment z_j with a simpler model without this latent variable.
x_i indicates for each stimulus i whether the response probability r_i uses a_j and z_j (i.e., x_i = 1) or only a_j (i.e., x_i = 0). It is assumed that all the indicators x_i support either model following a fixed underlying rate of use (i.e., θ), which is given a prior. The posterior rate of use provides information on how well each model accounts for the categorisations of the participants (for a more detailed explanation see Lee, 2008, page 8). Finally, the probability r_i that the ith stimulus is classified as a member of category A is given by one expression if x_i is 0 and by another if x_i is 1.
Figure 9 shows Lee's (2008) graphical model with plates, Figure 10 presents our best attempt to represent the same model with the graphical model with distribution pictograms proposed by Kruschke (2010a), and Figure 11 depicts the graphical model with mini-plates and distribution pictograms, as proposed in this article. Regarding the amount of information presented, Kruschke's graphical model presents more information (equations, distributions, dependencies and repetitions) than the mini-plate graphical model (distributions, dependencies and repetitions) and the plate graphical model (dependencies and repetitions). However, it seems to us that the inclusion of equations in Kruschke's graphical model undermines the heuristic value of the model by cluttering the space with too much information. Possibly, Kruschke-style graphical models should include equations only when the models are simple, and exclude them when the models contain more than one or two equations and/or when the equations are long.
The mini-plate graphical representation seems to strike a balance between amount of information and use of space. By using the concept of plates in a flexible way (i.e., mini-plates), combining them with Kruschke's distribution pictograms, and using 3D shapes, it has a strong visual appeal. However, this comparison is not completely fair to the other graphical representations. As we mentioned above, Kruschke's graphical models could be improved by not including equations in complex models, and we may not have produced the best possible version of that representation. Moreover, there are a variety of plate graphical models that we are not displaying in this article. For example, van Ravenzwaaij et al. (2014) presented a plate graphical model following Lee's (2008) notation, but on the right-hand side of the figure they added both the distributions and the equations in text format. Arguably, this might be a better representation than the mini-plate graphical models because it combines graphical aspects with text, rather than being too graphical.
Figure 11: Graphical model with mini-plates and distribution pictograms representing the model described by Lee (2008, pages 5 to 8). The notation is the same as in Figure 8.

Conclusion
We discussed two types of hierarchical Bayesian graphical models used in psychology: Lee's (2008) plate graphical models and Kruschke's (2010b) graphical models with distribution pictograms. We proposed a third type, graphical models with colour-coded mini-plates, as an attempt to combine the positive aspects of the other two types of graphical models, and we presented a preliminary intuitive analysis of the informative value of this new graphical representation. We believe our proposal provides an important contribution to the field, but our intuitive evaluation should be confirmed by empirical evidence. Statistical cognition is a flourishing area of research in psychology (e.g., Beyth-Marom, Fidler & Cumming 2008) that investigates how people understand statistical information. Our proposal could be followed up by research that aims to elucidate which type of graphical model leads to a better understanding of hierarchical Bayesian models.