Bayesian Centroid Estimation for Motif Discovery

Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.


Introduction
In motif discovery we are given a set of sequences that share a common motif and aim to identify the motif profile-the frequency of symbols for each position in the pattern-and the positions in each sequence where the motifs are. It is assumed that the motifs have significantly different profiles from sequence background. This problem has gained attention and relevance in the past 25 years mainly due to biological applications; a classical example is regulation of gene expression by transcription factors that bind to specific motifs in genomic promoter regions [1][2][3]. For this reason, we refer to the positions where the motifs are realized in the sequences as ''binding sites''.
Due to its importance, hundreds of procedures have been proposed for motif discovery [4,5]. While some approaches seek to characterize motifs and their binding sites using dictionary methods that capture over-representation of words as evidence [6,7], it is common to represent motif compositions by a position weight matrix (PWM) [18] and specify a parametric model where sequences are generated conditionally on motif and background compositions and binding sites. Binding sites can then be regarded as missing data; parameters for the compositions can be estimated using expectation-maximization (EM) [9] in a frequentist setup [10,11], or assigned a prior distribution in a Bayesian setup [12][13][14]. Other computational approaches are based on evolutionary algorithms and population clustering [15][16][17].
Even when exploiting prior information in both compositions and binding site configurations in a Bayesian setup, motif discovery is still considered a hard problem since motifs are usually short relative to sequence length and have a composition that might be hard to distinguish from background (see, for instance, [4].) It is then imperative to rely on more refined, informative estimation methods that better glean information from the posterior distribution of binding site configurations. Discrete inferential methods with this goal have recently been proposed, including the median probability model of Barbieri and Berger [8] and the centroid estimator [19,20].
Estimators based on centroid inference, in particular, have been more successful than the ubiquitous maximum a posteriori (MAP) estimator when applied to motif discovery in models that account for sequence conservation [21,22]. These estimators, however, were defined from a thresholded loss function that mostly compares binding sites across sequences instead of more finely comparing sequences position-wise for binding site overlaps, as in the traditional centroid estimator (details in Methods.) Moreover, these results rely on sampled binding site configurations to derive the centroid and thus do not offer a characterization of the estimator. Centroid estimators were also shown to yield more compact centered credibility sets then MAP estimators when applied to sequence alignment [23].
In this paper, we propose a novel centroid estimator that arises from a more refined and arguably more natural loss function and that can, in contrast to previous approaches, be fully characterized as a function of marginal posteriors in the space of binding site configurations. In this sense, we argue that the proposed estimator is a better representative of the posterior space of configurations. In addition, as a by-product of its derivation, we obtain informative summaries of the distribution of posterior mass. To this end, we adopt a Bayesian model for motif discovery on multiple sequences with multiple possible binding sites that is an extended version of the classic model from Liu, Neuwald, and Lawrence [14]. The motivation for this extended model is twofold: to obtain a feasible computational method while still retaining a realistic interpretation and to allow us to focus the discussion on the proposed estimator.

Methods
We present our approach starting from a simple model and building up to the most general setup in the next sections.
One Sequence, One Binding Site Suppose we observe a sequence R, DRD ¼ : n, and wish to infer the location of the only binding site Y , Y [f1, . . . ,n{Lz1g. Following the Bayesian model from [14], we assume that there is only one motif of fixed length L and that sequences are generated conditionally independently according to a product multinomial model given binding site positions and motif and background compositions. Thus, for an alphabet S, we define h 0~( h 0,s ) s[S as background probabilities of generating each letter in S and, for each position i~1, . . . ,L in the motif, h i~( h i,s ) s[S as the probabilities of generating each letter at the i-th position in the motif. To simplify the notation we denote H~(h 0 ,h 1 , . . . ,h L ). The likelihood is then: where j[BG means position j in background. Setting a non-informative prior on Y , P(Y )~(n{Lz1) {1 , we have the posterior: One traditional estimator is the maximum a posteriori (MAP) estimator,Ŷ but we argue for an estimator that accounts for differences in positions when comparing binding site configurations. Using Bayesian decision theory [24] we look for an estimator that minimizes, on average, a more refined loss function H: We adopt a generalized Hamming loss H, returns the position in the motif. Loss function H compares configurations position-wise according to h, which in turn compares states. If we define m(i) ¼ : which yields a loss H that accounts for overlap in binding sites. Such metric is commonly adopted to measure binding site level accuracy, as in the performance coefficients in [4,5,25]. EstimatorŶ Y C is a generalized centroid estimator; for instance, if h is a common zero-one loss, h(i,j)~I(i=j), H corresponds to Hamming loss, and thusŶ Y C is the regular centroid estimator [19,20]. As Carvalho and Lawrence [26] argue, centroid estimators more effectively represent the space since they are closer to posterior means; in contrast, it can be shown thatŶ Y M arises from a zero-one loss function which yields the posterior mode [26].
Let us now derive more specific expressions for H andŶ Y C . We first notice that if DỸ Y {Y D §L then the binding sites do not overlap and so H(Ỹ Y ,Y )~2 P L j~1 h(j,0) ¼ : H Ã , the null overlap distance between two configurations. Alternatively, when DỸ Y {Y DvL then since the common backgrounds inỸ Y and Y do not affect H(Ỹ Y ,Y ), the first two terms above account for the left and right ''tails'' where binding sites in one sequence are matched with background in the other sequence, and the last term accounts for the overlap in binding sites. We also note that Instead of a loss function we can also define our estimator in terms of a gain function G(Ỹ Y ,Y ) ¼ : Noting that G, like H, is also a function of DỸ Y {Y D, we obtain the following characterization: as required.
When contrasted toŶ Y M we can see the effect of having a higher resolution loss function:Ŷ Y C gathers probability support from nearby, relative to H, binding site configurations instead of just picking the most likely configuration. More specifically, for h that corresponds to the overlap in binding sites, we have H Ã~2 L, a ''step pyramid'' convolution filter that weights farther contributions less heavily. From now on we will be adopting this loss/gain function.
Other choices of G could be used, but they do not necessarily correspond to a generalized Hamming loss, and thus not to a centroid estimator (as defined here) either. The centroid estimator in [21,22], for instance, adopts the thresholded gain close to a ''half pla-teau'' filter, where e is an infinitesimal meant to break ties. Besides offering less resolution when comparing binding site configurations-binding sites are only compared for a ''significant'' overlap-this gain function does not result from a generalized Hamming loss since positional information is needed to assess if an overlap is significant or not. Finally, to get some insight into the new estimator, check the first example in the Results section.

One Sequence, Multiple Binding Sites
We now allow for multiple binding sites by defining Y~fY k g as the collection of non-overlapping binding sites Y k . The likelihood is similar, but accounts for the multiple binding sites: Given the ''entropic'' effect of possibly having many binding sites, we need to adopt a better prior for Y that takes into account the number of possible configurations for the binding sites. So, instead of naively electing P(Y )!1, we explore a hierarchical structure: if c(Y )~DY D, the number of binding sites in Y , we note that then first set P(Y D c(Y ))!1-binding site configurations are equally likely given the number of binding sites-and next define P(c(Y )).
In what follows we settle on a prior distribution for c(Y ) that is based on a Markov chain with two states, background and motif, where the probability of transitioning to background, either from background or motif, and of starting at background is p; this approach results in since there needs to be c(Y ) transitions to the motif state. This prior structure offers a good degree of flexibility through p: we can further set a hyperprior distribution on p, or specify it directly based on the expected number b of binding sites in the sequence; if n is large compared to b, as usual, then p should be close to one, c(Y ) is approximately Poisson with mean n(1{p) and thus p ¼ : 1{b=n becomes a good candidate.
Going back to our inferential goal we note that, in contrast to the one binding site case from last section, posterior inference is more difficult since comparing configurations with different number of binding sites is not amenable to a systematic approach. Our first approximation is to consider local estimators for each group of configurations with a fixed number of binding sites and then appeal to a triangle inequality: where Y is a configuration with c binding sites,Ŷ Y c is the constrained estimator for all configurations with c binding sites, andŶ Y is the (overall) centroid estimator. Let C ¼ : tn=Ls be the maximum number of binding sites in R, and recall that for the centroid estimator we wish to findỸ Y that minimizes Using the triangle inequality for each group we then have whereỸ Y c is an arbitrary point in fY : c(Y )~cg. Our task is now to find an estimator-let us still call it centroid-that minimizes the right-hand bound in Equation 4 above. This goal suggests a two-step strategy: 1. For each number of binding sites, c~1, . . . ,C, find the local centroidsŶ We note that this strategy does not guarantee that the bound is minimized; the main goal here is computational convenience. Let us tackle each step of this heuristic next. To this end we need P(c(Y ) DR,H) and marginal posteriors P(Y k Dc(Y ),R,H); obtaining these posterior probabilities is a standard procedure, but we provide details on how they can be computed in File S1 for completeness.

Local Centroids
Even when the number of binding sites is fixed, minimizing the conditional posterior expectation of H(Ỹ Y ,Y ) can be challenging: we would still have to consider for each candidate configurationỸ Y the posterior probability of configurations with all binding sites to the left of the first binding site inỸ Y , in-between binding sites inỸ Y , and so on. We adopt another approximation and decide to minimize a paired Hamming loss H A where binding site positions are matched according to their order: is Hamming loss when comparing sequences with only one binding site atỸ Y k and Y k , respectively, that is, The next result adapts Theorem 1 to yield the paired local centroids.
Lemma 2 If P k ( : Dc(Y )~c,R,H) is the marginal conditional posterior on Y k then the paired local centroids are Proof. In the same spirit of Theorem 1, we use the conditional estimator in Equation 5 with the paired loss H A : and the result follows.
We can spot in Lemma 2 the familiar convolutions, but now with the marginal posteriors P(Y k D c(Y ),R,H) and in a more restricted range. We have a nice characterization, but we still have to optimize a sum to obtain the local centroids; to this end we explore the same recursive structure that allows us to compute forward and backward sums (see File S1 for details.) Let us define This important observation allows us to obtainŶ Y c using the dynamic programming approach listed in Algorithm 1, as Theorem 3 formalizes. Algorithm 1 FindŶ Y c using dynamic programming. Construct partial maxima and backtrack pointers: Step 2. For k~2, . . . ,c andỸ Y k~( k{1)Lz1, . . . , n{(c{kz1)Lz1 do: set backtrack pointers and set partial maximum sum m k as Reconstruct centroidŶ Y c using backtrack pointers: Step 3. Set last binding site position: Step 4. For k~c, . . . ,2 do: recover the remainder ofŶ Y c by settingŶ Y c,k{1~Ak{1 (Ŷ Y c,k ).
Theorem 3 Algorithm 1 correctly identifies the paired local centroidŝ The key device in Algorithm 1 is to exploit the recursion in Equation 7 to define m 1 (Ỹ Y 1 )~f (Ỹ Y 1 ) and for kw1, to store partial sum maxima. Now it follows that and so Step 3 must be correct. The correctness of Step 4 relies on the right specification of m in Steps 1 and 2; but these steps are a straightforward application of Equation 7 using the definition of m 1 and a formulation of Equation 8 based on the backtrack pointers A, and so the algorithm is correct. We note that the paired local centroids minimize an expected posterior upper bound H A on the loss H, and so the actual local centroid might not be attained. We expect, however, that for common cases in which the motif coverage c(Y )L is much smaller than n that the bound is tight since H A approximates H well and thus the two local centroids often coincide.

Global Centroid
While the local centroids already convey information about the distribution of posterior mass in the space of binding site configurations, the end goal of the analysis is a point estimate that is, in itself, a good representative of the space. Following the strategy we outlined in the beginning of this section, we can further summarize the information in the local centroids by identifying a configurationŶ Y that minimizes the expected conditional Hamming loss, as in Equation 6. This approach, however, entails the same difficulties as defining the centroid based on all points in the space, and it is thus not treatable by a systematic approach-we are now just restricting the configurations to the local centroids.
The global centroid can be defined by direct enumeration of all possible configurations while keeping the minimizer of the expected conditional posterior loss, but this ''brute-force'' approach considers an exponential number of solutions. A simple heuristic is to restrict the global centroid to be one of the local centroids,Ŷ Y~arg miñ Another alternative is to just take as global centroid the local

Constrained Global Centroid
A constrained, on the number of binding sites, global centroid might be more computationally feasible since we are restricting the space of available configurations. For instance, consider the 1global centroid,Ŷ As when defining local centroids, we can approximateŶ Y o using a paired loss, and since we have that It is important to note that while the restriction of one binding site might seem artificial, the derivation ofŶ Y o is helpful in recognizing sequence regions that are likely to host binding sites. In fact, since P c captures the posterior probability of having a binding site starting at each position, and considering the overlap gain G, the convolution of G and P c highlights positions that have higher posterior probability of being covered by a binding site.

Multiple Sequences, Multiple Binding Sites per Sequence, Random Motif
We are now ready to address our model in broader generality: the dataset now comprises m sequences, R~fR i g m i~1 , and thus binding site configurations are also indexed by sequence, Y~fY i g m i~1 . As before, we have that Y is independent of motif parameters H, but we further assume that sequences and configurations are conditionally independent given H: Given H we would be able to apply the methods discussed this far to each sequence separately: compute forward and backward sums to obtain marginal posterior probabilities for each Y i and then find local centroids and the i-th global centroid. We will, however, assume that H is random, say, independently with the usual conjugacy [14], and we thus wish to also conduct inference on background and motif compositions. This assumption, albeit more realistic, complicates matters, since the marginal unconditioned posterior distributions of Y and H are not readily available; we are now required to estimate them before obtaining centroid estimates.
To obtain the centroids we follow the procedure described in the last section, but adopting Monte Carlo estimates of the marginal posterior distributions, for i~1, . . . ,m, where T is the number of samples. In File S1 we present a Gibbs sampler [27,28] that draws Y i for each sequence given H and then samples H conditional on the binding site configurations Y , similar to the approach in [14].

Illustrative Examples
Example 1: One Sequence, One Binding Site. Consider the following sequence of length n~200 from the nucleotide alphabet S~fA, C, G, Tg, 10 20 30 40 50 | | | | | GCCACTTTCGGGCCCGTGTCTAACGCACCACGGGC-TACGTGACGGTGTGG CTCTATACTGACGACGTGAAC-CAAGCTTTACTGAAGGACTTGCTGTTCCC CGACCCA-TTTCCTGCCAGAACCTCTGACCAGTGTCTAGGGCTAT-CGCCCG TGATGTCTCATGGCGACGCGCGAGGCGGT-TGCTCGCCTCACTCCGTTCTG and a motif of length L~6 with parameters H given by Table 1. Figure 1 shows the conditional marginal posterior P(Y DR,H) and the convolution G Ã P( : D R,H) used to obtain the centroid Y Y C~3 6, binding at the subsequence TACGTG, close to the consensual motif. Note that since H is very informative the posterior profile has clear peaks and in this caseŶ Y c~Ŷ Y M , the two estimators coincide.
Example 2: One Sequence, Multiple Binding Sites. We revisit the same sequence from Example 1, but now allow for at most C~tn=Ls~33 binding sites, and adopt the prior given in Equation 3 with b~3 and thus p~1{b=n~0:985. Using Algorithm 1 in File S1 we are able to compute the conditional marginal posteriors P(c(Y )D R,H) and P(Y k D c(Y ),R,H) for k~1, . . . ,c(Y ). These posterior distributions yield the local centroids-according to Algorithm 1-and the global centroid from Equation 9. In Table 2 we list the marginal posterior P(c(Y )~c DR,H) up to the smallest c such that P(c(Y )ƒc DR,H)w0:95, along with the local centroids; the global centroidŶ Y C is highlighted. Interestingly, the global centroid coincides with the local centroid from the modal number of binding sites.
In Figure 2 we display the posterior probabilities of binding site coverage P c from Equation 10, along with the convolutions that are needed to define the 1-global centroidŶ Y o~3 6. As can be seen, position 36 has a lot of support, being present in all the local centroids listed in Table 2; in fact, the probability of a binding site starting at position 36 is greater than 50%.
While P c can provide us guidance for which positions are likely to start a binding site, using P c to define local centroids can be misleading. For instance, we could expect that the local centroid with three binding sites-the modal number of binding siteswould be, following a decreasing order on P c , 36, 63, and 147. However, if we examine the marginal posteriors P(Y k Dc(Y )~3,R,H) in Figure 3 we realize that position 13 is favored over position 63 because, if F k ¼ :  We assume a non-informative prior on H by setting a j,s~1 for s[S and j~0, . . . ,L; the prior on each sequence Y i is the same prior from Example 2 with p~0:985. Algorithm 2 in File S1 is run for 10,000 iterations to guarantee convergence (diagnostics not shown.) The marginal posterior distribution of H can be assessed in Figure 4. Since most positions in the sequences are background sequences h 0 has very small posterior variances. Also note that the canonical palindromic E-box motif, with consensus CACGTG, is  Background is assumed to be CG-rich, while the motif represents a canonical palindromic E-box, CACGTG [34]. doi:10.1371/journal.pone.0080511.t001   recovered. The procedure is now similar to what we presented in Example 2; the main difference is that the marginal posterior distributions are estimated from Markov chain Monte Carlo (MCMC) samples obtained as shown in File S1. Table 3 lists the estimated marginal posterior distribution of the number of binding sites, the local and global centroids. The global centroid does not coincide with the local centroid for the modal number of binding sites. Moreover, the local centroids here are different from the (conditional) local centroids in the last example, most likely due to the randomness of H being taken into account. Figure 5 displays the estimated P c , G Ã P c , and the centroids. We see that compared to the second example some posterior mass has shifted to positions 29 and to the group of positions 166, 167, and 168. Here we clearly see the advantage of a centroid estimator: G Ã P c , and later G Ã P k ( : D R), gathers evidence of motif binding from nearby positions, yielding a better summary-  The selection of position 167 in the second local centroidŶ Y 2 might seem puzzling since the peaks at positions 36, 63, and 147 hold higher coverage probabilities. CheckingP P(Y k DR) in Figure 6 helps dismiss any doubts: most of the support for these positions come from configurations with higher number of binding sites, as evidenced by the respective local centroids, but these configurations hold relatively low posterior mass. When c(Y )~2, the prior on Y 2,2 assigns more posterior probability to higher positions, close to the end of the sequence, simply because there are more configurations for Y 2,2 on these positions. It is also important to notice that while none of the positions in the cluster 166-168 has higher marginal posterior mass than positions 63 and 147, the convolution G ÃP P 2 ( : DR) is maximized at position 167, that is, the cluster when taken together has more support from the data, as weighted by G.

Case Study
We end this section with an example from the real-world dataset in [5], sequence set yst02r. The dataset contains m~4 sequences each with n~500 letters. We set L~16 and adopt a noninformative prior on H, as in the previous example, and the prior on each Y i , for the i-th sequence, from Equation 3 with b~3 per thousand positions, so p~1{3=1000~0:997. As in the previous example, 10,000 iterations suffice to reach convergence.
Let us focus on the second sequence. Figure 7 pictures the binding site coverage probabilities, along with the local centroids. The global centroidŶ Y C~f 85,105,169g contains three binding sites, and it is also the local centroid for the modal number of binding sites, withP P(c(Y )~3 D R)~0:32. Since most of the posterior mass in concentrated in configurations with c(Y )~3, the posterior profilesP P(Y k D c(Y )~3,R) are similar to P c and are thus omitted.
From the MCMC samples we can produce the MAP estimatê Y Y M~f 86,105,174g as the configuration with highest frequency among the samples:P P(Ŷ Y M D R)~0:032. In fact, we can estimate the posterior probability of each sampled binding site configuration and then, using classic multidimensional scaling [29], visualize the estimated posterior distribution in Figure 8. It is interesting to note that the null configuration-that is, without binding sites-is also very likely with posterior probability 0:024. In contrast, the global centroid has very small posterior probability, close to 0:001;  it sits, however, closer to configurations with high posterior mass, including the local centroids with one, two, and four binding sites.
To better assess how the centroid estimator is closer to a mean than a mode estimator, we plot the estimated posterior distribution of the generalized loss function H centered at bothŶ Y C andŶ Y M in Figure 9.   concentrate posterior mass: one with configurations Y such that 25ƒH(Ŷ Y C ,Y )ƒ40 and another with configurations further away, satisfying 40ƒH(Ŷ Y C ,Y )ƒ50.

Discussion
In this paper we have presented a Bayesian approach, similar to the Gibbs motif sampler in [12,14], that jointly models motif and background compositions and binding site locations in a set of  sequences. More importantly, we discuss and formalize an inferential procedure based on the centroid estimator proposed by Carvalho and Lawrence [20]. As in any Bayesian analysis, we wish to evaluate features of interest in a model based on their posterior distribution; however, if we are required to pick a representative configuration, a point in the parameter space, then a principled approach is to elect a loss function and conduct formal statistical decision analysis. In this sense, by exploring a more refined loss function that depends on position-wise comparisons between sequence states-background or motif positions-we are able to identify a better representative of the posterior space of binding site configurations. Perhaps more importantly, this loss function is meaningful to investigators since is commonly adopted as a metric to measure binding site level accuracy [4,5,25], and so the centroid estimator should be preferred over MAP estimation in principle. Moreover, as pointed out in [20], the centroid estimator better accounts for the distribution of posterior mass; it is more similar to a median than to a mode, and can thus offer better predictive resolution than the MAP estimator [18]. When applied to motif discovery, the centroid estimator captures information in the vicinity of binding site positions through a convolution in marginal posterior distributions of binding sites.
Given the combinatorial number of possible configurations in the parameter space it is not straightforward to identify the centroid estimate through enumeration or even a systematic approach. Yet, we devise an approximative scheme that efficiently optimizes an upper bound on the posterior expected loss and thus provides a related centroid. Despite its heuristic nature, the proposed method has another advantage besides computational convenience: it allows for an informative depiction of the posterior distribution on binding site configurations. First, when defining the local centroids, we are able to assess the contributions from each binding site through their marginal posterior distributions conditional on the number of binding sites, and, in particular, through the convolution of these marginal profiles with the gain filter; secondly, when finding the global centroid we explore the marginal posterior distribution on the number of binding sites. Moreover, other representations might be helpful in understanding the distribution of posterior mass, as in the use of P c (in Equation 10) to pinpoint the 1-global centroid and measure the overall support of the configurations to a binding site at some specific position in the sequence. These comments are in the spirit of an estimator being also a communicator of the posterior space and the particular choice of prior distribution (see, e.g., Section 4.10 in [24].) It is important to note that even when the model is accurate, poor inference might fail in recovering relevant features of the space. In Example 2, the MAP estimate is the null configuration, while the centroid indicates three binding sites that represent a group of configurations that jointly pool significant posterior mass. It is also common that the posterior distribution is too complex to be reasonably captured by a single representative; in this case the expected posterior loss could also be used to partition the space and further define additional representatives as conditional estimates on each subspace. This is a direction of work that warrants interest and that we intend to follow next.
Product multinomial and product Dirichlet models are justified as a good working, first approximation based on position independence. There are many extensions to this model that consider DNA strand complementarity [30], a more informative Markov structure for the background composition [31], and an explicit representation of the number of binding sites per sequence [32]. While we adopted a simple hierarchical model to guide the discussion, the proposed methodology is actually broader and the centroid estimators can be obtained from any Bayesian procedure that reports marginal posterior probabilities P(c(Y ) DR) and P(Y k Dc(Y ),R), k~1, . . . ,c(Y ), for sequence R and binding site configuration Y .
Further improvements can be obtained by specifying a more complex model that accounts, for example, for higher order Markov chains with more states for the background, as in [30,31], phylogenetic profiles [22], structural information [33], a variable motif length, or dependency among motif positions. As pointed out by Hu, Li, and Kihara [4], motif discovery using sequence only is well known for low signal-to-noise ratio; future extensions would also incorporate other data sources, such as gene expression or ChIP-Seq data, to increase the signal-to-noise ratio. In addition, for future work we intend to extend the model to account for multiple motifs, either from multiple TFs or from a single TF with alternative motifs. While the problem then becomes computationally more challenging, we expect that recursions and estimators similar to the ones discussed here will follow from extra bookkeeping on which motifs are bound at each binding site.

Supporting Information
File S1 Derivation of conditional and marginal posterior probabilities for Y and c(Y ) and Gibbs sampler for the posterior joint on Y and H. Derivation of conditional posterior probabilities P(c(Y ) DR,H) and marginal posterior probabilities P(Y k D c(Y ),R,H), along with a routine to compute them in Algorithm 1. A Gibbs sampler to iteratively sample H D Y ,R and Y DH,R is given in Algorithm 2. (PDF)