conting : An R Package for Bayesian Analysis of Complete and Incomplete Contingency Tables

The aim of this paper is to demonstrate the R package conting for the Bayesian analysis of complete and incomplete contingency tables using hierarchical log-linear models. This package allows a user to identify interactions between categorical factors (via complete contingency tables) and to estimate closed population sizes using capture-recapture studies (via incomplete contingency tables). The models are ﬁtted using Markov chain Monte Carlo methods. In particular, implementations of the Metropolis-Hastings and reversible jump algorithms appropriate for log-linear models are employed. The conting package is demonstrated on four real examples.


Introduction
Contingency tables (see, e.g., Agresti 2007) are formed when a population is cross-classified according to a series of categories (or factors).Each cell count of the contingency table gives the number of units observed under a particular cross-classification.The purpose of forming a contingency table is to identify the dependence structure between the factors, i.e., to identify associations (or interactions) between the factors, using statistical models.Additionally, incomplete contingency tables formed from capture-recapture studies can be used to estimate closed populations (Fienberg 1972).Here some of the factors correspond to sources which have or have not observed members of a target population.Cell counts corresponding to not being observed by any of the sources are missing (or unknown).However they can be estimated by fitting a statistical model to the observed cell counts and predicting the missing cell counts.Note that our definition of an incomplete contingency table differs from that of, e.g., Mantel (1970) who defines incomplete contingency tables to be those with structural zeros.We can also distinguish our concept of incomplete contingency tables from those where For example, consider the alcohol, obesity and hypertension (AOH) data given by Knuiman and Speed (1988) and included as an example dataset in conting.Here, 491 people from Western Australia are cross-classified according to the following F = 3 factors: alcohol intake (l 1 = 4 levels); obesity (l 2 = 3 levels) and hypertension (l 3 = 2 levels).Therefore, there are n = l 1 × l 2 × l 3 = 24 cells in the corresponding contingency table and the total population size is N = 491.See Section 4.1 for the associated Bayesian analysis of this table using conting, which identifies interactions between the factors.
For a log-linear model it is assumed that independently, for i ∈ S, where log µ i is written as a linear combination of the intercept, main effect and interaction parameters (see, e.g., Overstall and King 2014).However for identifiability, these parameters are constrained using, for example, sum-to-zero or cornerpoint constraints.In this paper and in conting, we use sum-to-zero constraints.The log of the expectation, µ i , of y i can then be written as where η i is referred to as the linear predictor, φ ∈ R is the unknown intercept parameter, θ ∈ R p is the p × 1 vector of unknown regression parameters and x i is the p × 1 design vector that identifies which elements of θ are applicable to cell i ∈ S. Let β = φ, θ be the (p + 1) × 1 vector of log-linear parameters.We can write (2) in matrix form as where µ and η are n × 1 vectors with elements given by µ i and η i , 1 n is an n × 1 vector of ones, X is the n × p design matrix with rows given by x i and log(•) is applied element-wise.
The specification given by (2), and equivalently by (3), assumes that we know which interactions are present in the model.This will not be the case in practice, so we introduce an unknown model indicator, m, which denotes a combination of interactions.We now write the log-linear model as log where θ m is the p m × 1 vector of regression parameters for model m and x m,i is the p m × 1 design vector that identifies which elements of θ m are applicable to cell i ∈ S for model m.
Let β m = φ, θ m denote the vector of log-linear parameters for model m.In matrix form this model is where X m is the n × p m matrix with rows given by x m,i .
Let M be the set of all models we wish to consider.In this paper and in conting, M consists of hierarchical log-linear models (e.g., Dellaportas and Forster 1999).Note that hierarchical models include, as a subset, the classes of graphical and decomposable models.Hierarchical models adhere to the principle of marginality (e.g., Fox 2002, p. 135), i.e., we cannot have a higher-order interaction unless all the constituent lower-order interactions are included in the model.Typically, see, e.g., Dellaportas and Forster (1999), the simplest (or minimal) model we wish to consider is the so-called independence model, i.e., the model with main effects for all factors but no interactions.We also now define the concept of the maximal model, i.e., the most complex model we wish to consider.This is usually accomplished by specifying the highest-order interaction terms that we will consider.The saturated model, for a complete contingency table, is the one that contains the F -way interaction between all factors.
To complete the model specification under the Bayesian approach, we specify a joint prior distribution for φ, θ m and m which is denoted by π(φ, θ m , m).We decompose this distribution as follows where π(φ, θ m |m) is the joint prior distribution of φ and θ m conditional on model m ∈ M and π(m) is the prior model probability of model m ∈ M, such that m∈M π(m) = 1.
In this paper, and in conting, we assume a position of having weak prior information and wish to specify prior distributions that reflect this position.However due to Lindley's paradox (see, e.g., O'Hagan and Forster 2004, pp. 77-79) care must be taken when specifying prior distributions that reflect weak prior information since the posterior model probabilities will be sensitive to the scale of the prior variance.
One approach from the literature is to use a "default" prior distribution (Kass and Wasserman 1996).For examples of such prior distributions which are directly applicable to loglinear models, see Dellaportas andForster (1999), Ntzoufras, Dellaportas, andForster (2003), Sabanés-Bové and Held (2011) and Overstall and King (2014).We use the generalized hyper-g prior proposed by Sabanés-Bové and Held (2011) where for a log-linear model, we decompose the joint prior distribution of φ and θ m as with π(φ) ∝ 1 and , with σ 2 > 0 an unknown hyperparameter with a hyper-prior distribution given by σ 2 ∼ IG a 2 , b 2 , where a and b are specified hyperparameters.The Sabanés-Bové and Held prior distribution is a generalization to GLMs of the Zellner g-prior (Zellner 1986) for linear models.It can be interpreted as being the posterior distribution from a locally uniform prior and an imaginary sample where 1/σ 2 indicates the size of this "prior sample" (Dellaportas and Forster 1999).If σ 2 = 1, then the Sabanés-Bové and Held prior distribution provides the same information as a prior sample of one observation and the prior reduces to the unit information prior (Ntzoufras et al. 2003).
Both the Sabanés-Bové and Held and unit information priors are implemented in conting.The user can specify the value of the hyperparameters a and b, under the Sabanés-Bové and Held prior.They have default values of a = b = 10 −3 and we use these default values whenever we employ the Sabanés-Bové and Held prior for the examples in Sections 4.3 and 4.4.
For the prior model probabilities we assume a uniform prior over the model space, i.e., π(m) = |M| −1 where |M| denotes the number of models in M.
Note that, as an alternative to the model specification given in (1), we can assume that y|N, ρ ∼ Multinomial (N, ρ) , where the elements of ρ are given by Under the multinomial specification, the intercept, φ, is unnecessary since it will cancel in the numerator and denominator of (6).For complete contingency tables, Forster (2010) shows that the joint posterior distribution of θ m and m are identical under both the Poisson and multinomial models, if the joint prior distribution for φ and θ m is specified as π(φ, θ m |m) = π(θ m |m), i.e., as is true for the Sabanés-Bové and Held and unit information priors.
Overstall, King, Bird, Hutchinson, and Hay (2014) extend this result to incomplete contingency tables and show that the joint posterior distribution of θ m , m and the missing cell counts are identical under the Poisson and multinomial models if we adopt the same prior structure as above for φ and θ m , and the prior distribution for the unknown total population size, N , under the multinomial model is of the form π(N ) ∝ N −1 , i.e., the Jeffreys prior (Madigan and York 1997).
It follows that the MCMC methods detailed in Section 3 allow us to evaluate the posterior distribution under either the Poisson or multinomial model formulations.
Under the Sabanés-Bové and Held, and unit information prior distributions described above, the posterior distribution is analytically intractable necessitating the use of the MCMC methods, detailed in Section 3, to estimate, for example, the posterior model probabilities.Under the subset of decomposable models and the multinomial model formulation, Dawid and Lauritzen (1993) show that by using a hyper-Dirichlet prior distribution the posterior model probabilities are available in closed form.Madigan and York (1997) used this result to estimate closed population sizes from incomplete contingency tables.However, as noted by Dellaportas and Forster (1999), we should not restrict ourselves to the less flexible class of decomposable models for solely computational reasons.Hence we consider the much richer class of hierarchical models and use the prior distributions described above.

Posterior distributions
In the case of complete contingency tables we evaluate the joint posterior distribution of φ, θ m , σ 2 (if unknown) and m given by where π(y|φ, θ m , m) is the likelihood function under model m ∈ M.This distribution is analytically intractable so we generate a sample from it using MCMC methods.In particular we use the reversible jump algorithm (Green 1995).We briefly describe a particular implementation of this algorithm, suitable for log-linear models, in Section 3.2.
For incomplete contingency tables, let y = y (O) , y (U ) where y (O) and y (U ) are the observed and unobserved cell counts, respectively.Furthermore, let O and U denote the sets of observed and unobserved cells, respectively, so that O ∩ U = ∅ and O ∪ U = S.We evaluate the joint posterior distribution of y (U ) , φ, θ m , σ 2 , and m, i.e., where π(y (O) , y (U ) |φ, θ m , m) = π(y|φ, θ m , m) is the complete-data likelihood function.We generate a sample from this posterior distribution using a data-augmentation MCMC algorithm (King and Brooks 2001).This algorithm is briefly described in Section 3.2.
In some cases, for incomplete contingency tables, one of the sources may observe individuals who are not members of the target population.An example of this is from the capturerecapture studies that have been used to estimate the number of people who inject drugs (PWID) in Scotland in the years 2003, 2006and 2009(see, King, Bird, Brooks, Hutchinson, and Hay 2005;King, Bird, Hutchinson, and Hay 2009;King, Bird, Overstall, Hay, and Hutchinson 2013, respectively).Here there are four sources who observe PWID.One of the sources is the Hepatitis C virus (HCV) database.This database does not actually observe PWID, but instead people who are newly diagnosed with the HCV and have injecting drug use as a historical risk factor.Assuming that these people are PWID results in over-estimation of the total population size of PWID in Scotland (see King et al. 2009 andOverstall et al. 2014).A modeling approach was adopted by Overstall et al. (2014) whereby people observed by the HCV database and another source are regarded as PWID, however the true number of PWID observed by just the HCV database is unknown but bounded from above by the count in this cell, i.e., the cell count is left censored.Let y (O) , y (U ) and y (C) be the observed, unobserved and censored cell counts.Let z (C) be the observed cell counts of the censored cells, i.e., the upper bound on the true cell counts, y (C) , for the censored cells.Furthermore, let C denote the set of all censored cells, so that all pairwise intersections of O, U and C are the empty set and O ∪ U ∪ C = S.We evaluate the joint posterior distribution of y (U ) , y (C) , φ, θ m , σ 2 and m, given by where π(y (O) , y (U ) , y (C) |φ, θ m , m) = π(y|φ, θ m , m) is the complete-data likelihood function and π(z (C) |y (C) ) gives the distribution of z (C) conditional on y (C) .Overstall et al.
, uninformative censoring.The data-augmentation algorithm used to generate a sample from the posterior distribution given by ( 9) is described in Section 3.2.
The posterior distributions given by ( 7), ( 8) and ( 9) are all joint distributions of the model parameters, missing cell counts (in the case of ( 8) and ( 9

Computation
In this section we describe the computational algorithms used to generate an MCMC sample from the posterior distributions given by ( 7), ( 8) and ( 9) in Section 3.1, i.e., for complete contingency tables, and for incomplete contingency tables (without and with censoring).We begin by describing the most general data-augmentation algorithm which can be used to generate an MCMC sample from the posterior distribution given by ( 9), i.e., the joint posterior distribution of φ, θ m , σ 2 , m, y (U ) and y (C) .The algorithms for generating an MCMC sample from the posterior distributions given by ( 7) and ( 8) are special cases of the general algorithm.
The general data-augmentation algorithm cycles through the following steps.
2. Given the current values of φ, θ m , σ 2 , m and y (U ) , generate new values of the censored cell counts, y (C) , from the full conditional distribution which is denoted by 3. Given the current values of y (C) and y (U ) , generate new values of the model parameters and model indicator, φ, θ m , σ 2 and m, from the full conditional distribution which is denoted by π(φ, θ m , σ 2 , m|y (C) , y (U ) , y (O) , z (C) ).
We will shortly describe how we can generate values from the full conditional distributions given in Steps 1, 2 and 3.The above algorithm is implemented in conting by the functions bict() and bictu(), which act as wrapper functions for bict.fit()which is the workhorse.
For incomplete contingency tables, with no censored cell counts, we can skip Step 2. This algorithm is also implemented by bict() and bictu().For complete contingency tables, we can skip Steps 1 and 2. This algorithm is implemented in conting by the functions bcct() and bcctu(), which again act as wrapper functions for bcct.fit().
We now briefly describe how the simulation in each of the above three steps can be achieved.We do not describe the methods in detail but outline the algorithms involved and point the appropriate references out to the reader.

The simulation in
Step 1 is trivial.The full conditional distribution can be written as and each element of y (U ) can be generated as follows for log µ i = φ + x m,i θ m and i ∈ U.

Similarly in
Step 2, the full conditional distribution can be written as and each element of y (C) can be generated as follows , where z i is the element of z (C) corresponding to y i and Trunc-Poisson(µ, z) is the truncated Poisson(µ) distribution bounded from above by z.
In Step 3, the univariate full conditional distribution of σ 2 is The full conditional distribution of φ, θ m and m can be written as and can be generated from using the reversible jump algorithm.The reversible jump algorithm works as follows.Suppose the current model is m with current log-linear parameters β m = (φ, θ m ) .We propose a move to model k with probability π m,k .Innovation variables (Green 2003), which are denoted by u m , are generated from some proposal distribution and then a mapping function is applied to β m and u m to produce the proposed log-linear parameters β k = (φ, θ k ) .We accept this proposed move with an associated acceptance probability.
The key is the joint specification of the mapping function and the proposal distribution with different implementations resulting from different specifications.We use the specification for GLMs proposed by Forster, Gill, and Overstall (2012).For details on this specification for log-linear models, see Overstall et al. (2014).This method is implemented in conting by the function RJ_update().
Note that π m,m , the probability of proposing a move to the current model (called a null move), can be positive.In this case, we generate new values for the log-linear parameters, β m , conditional on the model, m.We use the Metropolis-Hastings algorithm, in particular, the iterated weighted least squares implementation for GLMs proposed by Gamerman (1997).
For details on this method applied to log-linear models, see Overstall et al. (2014).This method is implemented in conting by the function iwls_mh().
The values of π m,k are specified such that only local moves are proposed, i.e., π m,k are only non-zero for models k that can be derived from m by adding or dropping a single interaction term (both subject to the principle of marginality).The moves of adding or dropping a term are referred to as birth or death moves, respectively, by Forster et al. (2012).Suppose our current model is m, and we can propose a birth or death move to T m models, then we set In conting, the user may specify the probability, π m,m , of the null move as an optional argument to the functions bcct() and bict(), with default value 0.5.

Assessing model adequacy
In this section we briefly review the method of assessing model adequacy, under the Bayesian approach, which is implemented in conting.This method uses predictions from the model of the observed cell counts.The idea is to compare these predicted cell counts to the observed cell counts.If they are inconsistent, then we can conclude that the model is inadequate.For more details on other approaches to assessing model adequacy, under the Bayesian approach, see Gelman et al. (2004, Chapter 6).
The comparison of predicted and observed cell counts can be made using the Bayesian (or posterior predictive) p value.In general, let y rep and ψ denote the predicted cell counts and the vector of all of the model parameters.Define T (Y, ψ) to be a discrepancy statistic that can depend on both the cell counts, Y, and the parameters, ψ.The Bayesian p value is defined as where the probability is with respect to the joint posterior distribution of y rep and ψ.If the model is inadequate, the distribution of T (y rep , ψ) will be inconsistent with the distribution of T (y, ψ), therefore the Bayesian p value will be close to zero or one.
The Bayesian p value differs from the classical p value in that for a classical p value the discrepancy statistic only depends on the cell counts.The probability that defines the classical p value is with respect to y rep , conditional on a fixed value for ψ (either from the null hypothesis or an estimated value).Under the Bayesian approach, the discrepancy statistic can depend on ψ, over which we integrate out uncertainty with respect to the posterior distribution.
We can easily estimate the Bayesian p value using the MCMC sample generated from the posterior distribution.In the case of log-linear models applied to contingency tables, ψ = (φ, θ m , m), and let φ (j) , θ m (j) and m (j) be the jth value of φ, θ m and m from the MCMC chain.We generate a predicted response vector, y (j) , using , where log µ Therefore for an MCMC sample of size M , we will have M sampled values, y (1) , . . ., y (M ) from the posterior predictive distribution of y rep .Let and we estimate the Bayesian p value to be the proportion of T (j) O , for j = 1, . . ., M .Examples of discrepancy statistics which are appropriate for contingency table data are the χ 2 , deviance and Freeman-Tukey statistics, given by Each of these statistics can be used within conting to assess model adequacy using the bayespval() function (see Section 3.4).This function estimates the Bayesian p value also and allows the user access to the sampled values T O , for j = 1, . . ., M .

Functions of conting
Figure 1 shows all of the functions of conting that the user will typically call.The main two functions are bcct() and bict() that implement the MCMC algorithm in Section 3.2, for complete and incomplete contingency tables, respectively.Mandatory arguments to bcct() and bict() will be the form of the maximal model (in terms of a 'formula' object) and the number of MCMC iterations to perform in the first instance.The data are introduced to bcct() and bict() using the data argument, where the data must be either a 'data.frame'or 'table' object.Otherwise, the variables in the maximal model are taken from the environment from which bcct() or bict() is called.Once the data-augmentation algorithm has finished the requested number of iterations, then additional iterations can be requested using bcctu() and bictu(), for bcct() and bict(), respectively.
The functions bcct() and bcctu() will return an object of class 'bcct' which is a list containing all of the MCMC output.Correspondingly, bict() and bictu() will return an object of class 'bict'.The S3 generics print and summary can be applied to 'bcct' and 'bict' objects producing a very simple summary (in the case of print) and a more detailed summary (in the case of summary).There are also seven specific functions that can be used for 'bcct' and 'bict' objects and these are summarized in Table 1.The functions shown in Table 1 (except for inter_probs(), accept_rate() and sub_model()) are used to construct the output given by the summary() function applied to 'bcct' and 'bict' objects.

Examples
In this section we use four examples to demonstrate conting.The first two examples involve complete contingency tables and the latter two involve incomplete contingency tables.The data for the four examples are included in conting.In each example, we have specified the seed using the set.seed()function, so that all of the examples are fully reproducible.

Alcohol, obesity and hypertension
In Section 2, we introduced the AOH data as an example of a complete contingency table.We now conduct a Bayesian analysis of this table using the bcct() and bcctu() functions.
The mandatory arguments to bcct() are the form of the maximal model and the number of MCMC iterations.We specify that the maximal model be the saturated model which includes the three-way interaction.We initially request 1000 MCMC iterations.Optional arguments to bcct() involve specifying the starting values of the MCMC algorithm, the prior (Sabanés-Bové and Held, or unit information), the value of π m,m and details for saving the MCMC output to external files.Additionally, we can also specify the hyperparameters for the Sabanés-Bové and Held prior (a and b from Section 2) and request a progress bar to monitor the iterations.We assume the unit information prior distribution but allow the remaining arguments to take their default values so that the algorithm starts from the posterior mode of the maximal model, π m,m = 0.5, the output is not saved to external files, and no progress bar is displayed.Once the initial 1000 MCMC iterations are complete we request a further 9000 iterations (making a total of 10000) using the bcctu() function.We then print out the resulting 'bcct' object which gives a very simple summary which essentially informs the user of the analysis performed.The object aoh_ex is a list which includes BETA, a matrix containing the MCMC samples for the log-linear parameters.We assess MCMC convergence informally (O'Hagan and Forster  The resultant plot is shown in Figure 2. We see that convergence occurs very quickly.For a detailed summary of the analysis performed, including posterior summary statistics, use the generic S3 method summary().This function has optional arguments which control how the MCMC output is used.These are the number of burn-in iterations (n.burnin), the amount of thinning (thin), the cutoff of the posterior probability for presenting results on the log-linear parameters (cutoff), the discrepancy statistic (statistic), values to control which posterior model probabilities to present (best and scale), and the target probability (prob.level)for the highest posterior density intervals (HPDIs).We use a conservative burn-in phase of 5000 iterations, present posterior summary statistics only for log-linear parameters with probability greater than 0.05 and present the posterior model probabilities of the four models with the highest values for these probabilities.We use the χ 2 discrepancy statistic and no thinning which are the default values of the arguments statistic and thin.
R> aoh_ex_summ <-summary(aoh_ex, n.burnin = 5000, cutoff = 0.05, best = 4) R> aoh_ex_summ Posterior summary statistics of log-linear parameters: post_prob post_mean post_var lower_lim upper_lim First, under the χ 2 discrepancy statistic, the Bayesian p value of 0.22 does not indicate an inadequate model meaning that predictions of the cell counts are consistent with the observed cell counts.The posterior modal model is the independence model and the model with the second largest posterior model probability only contains the hypertension and obesity interaction (hyp:obe).These two models are estimated to account for approximately 98% of the posterior probability.This shows there are no strong interactions between the factors although there is some evidence for interaction between hypertension and obesity.From inspecting the posterior means of the log-linear parameters associated with the hypertension and obesity interaction, it indicates that as obesity level moves from low to high, the probability of hypertension increases.Now suppose we were interested in inference solely based on the model with the second highest posterior model probability, i.e., the model with formula = y ~alc + hyp + obe + hyp:obe.We can produce posterior summary statistics similar to those given by summary(), but which are conditional on a model of interest.This is accomplished using the sub_model() function.There are two ways to pass the model of interest to sub_model(); either by providing a formula, or by the ranking of the model with respect to its posterior model probability (order).We demonstrate with the latter, i.e., set order = 2.
R> sub_model(object = aoh_ex, order = 2, n.burnin = 5000) Note that if the MCMC algorithm has not visited the specified model (given by formula or order) then sub_model() will return an informative error.

Risk factors for coronary heart disease
In this section, we consider the 2 6 complete contingency table given by Edwards and Havránek (1985) and used by, for example, Dellaportas and Forster (1999)   Under the deviance discrepancy statistic, the Bayesian p value of 0.15 does not suggest an inadequate model.The models with the four highest posterior model probabilities are the same as identified by Dellaportas and Forster (1999) and Forster et al. (2012).With regards to the log-linear parameters we used the default value for cutoff so we are only presented with results for parameters with posterior probability greater than 0.75.This suggests there are strong interactions between smoking and strenuous physical work (A:C), systolic blood pressure (A:D) and the ratio of α and β lipoproteins (A:E), as well as for the interactions between strenuous mental and physical work (B:C) and between systolic blood pressure and the ratio of α and β lipoproteins (D:E).

Spina Bifida
In this section we consider a 2 3 × 3 incomplete contingency table given by Madigan and York (1997).Between 1969 and1974, in upstate New York, 621 people born with Spina Bifida are observed by three sources: birth certificates (S1), death certificates (S2), and medical rehabilitation records (S3).Additionally, these people are cross-classified according to ethnicity (eth) which has three levels: Caucasian, Afro-American and Other.These data can be used to estimate the total population size of people born with Spina Bifida by estimating the missing cell counts of people not observed by any of the three sources.
Below we print out the data.It can be seen that the three cell counts corresponding to people not observed by any of the three sources (one for each classification of ethnicity) are NA.This tells bict() that these cell counts are missing.The posterior probabilities of the log-linear parameters are consistent with those found by Overstall et al. (2014).Finally we derive an MCMC sample from the posterior distribution of the total population size and find its posterior mean and 95% HPDI using the tot_pop() function.

R> data(
R> scot_ex_tot <-total_pop(scot_ex, n.burnin = 200000, thin = 5) R> scot_ex_tot Posterior mean of total population size = 22856.2495 % highest posterior density interval for total population size = ( 16427 27097 ) The posterior mean of the total population size of PWID in Scotland in 2006 is 22900 (to nearest 100).This value is consistent with that found by Overstall et al. (2014).Applying the S3 generic plot() to an object of class 'totpop' will produce a histogram of the MCMC sample from the posterior distribution of the total population size.

R> plot(scot_ex_tot)
The resultant histogram is shown in Figure 3.Note that the bimodal nature of the posterior distribution of the total population size was also found by Overstall et al. (2014).This bimodality is caused by the presence, or absence, of the interaction between the social inquiry and Scottish Drug Misuse Database sources (S1:S3).The posterior mean for this interaction is positive so that the upper mode corresponds to the presence of the interaction, and the lower mode to absence.

Concluding remarks
This paper demonstrates the use of the R package conting, for the Bayesian analysis of complete and incomplete contingency tables.The conting package allows a user to identify interactions between factors and to estimate closed populations using incomplete contingency tables from capture-recapture studies.The capabilities of conting are demonstrated via four examples.
A novice user need only supply the data, the form of the maximal model (in the usual R way of using a formula), and the number of MCMC iterations.The S3 method for summary() will then provide all of the relevant information.
However a more experienced user can also access the sampled values of the model parameters and missing cell counts (for an incomplete table).For example, from Section 4. unobserved and censored cell counts.Furthermore, if the argument save is not NULL, then bcct() and bict() will save the MCMC output to external files; see Section 4.2.These files can be read into R and manipulated however the user wishes.Possible applications of this output include more extensive assessments of MCMC convergence and model adequacy, and the calculation of posterior summaries relevant to a specific research question.
Future work will involve updating conting to fit Bayesian models to contingency tables involving structural zeros, misclassified cell counts and ordinal factor levels.
) for incomplete contingency tables) and the model indicator.The model defined by such a joint distribution (i.e., including the model indicator) is known as the encompassing model (O'Hagan and Forster 2004, p. 164).

Figure 2 :
Figure 2: Trace plot for the intercept parameter, φ, in the alcohol, obesity and hypertension example.

Figure 3 :
Figure 3: Histogram of the posterior distribution of the total population size of PWID in Scotland in 2006.

Table 1 :
The seven specific functions that can be used for 'bcct' and 'bict' objects.
and Forster et al. (2012)to demonstrate statistical methodology for complete contingency tables.Here 1841 men have been cross-classified by six risk factors (each with two levels) for coronary heart disease.The factors are: A, smoking; B, strenuous mental work; C, strenuous physical work; D, systolic blood pressure; E, ratio of α and β lipoproteins; F, family anamnesis of coronary heart disease.We set the maximal model to be the saturated model including the six-way interaction.Starting from the model including all two-way interactions, we request 50000 iterations of the MCMC algorithm, under the unit information prior.We save the MCMC output to external files every 1000 iterations.
We begin by considering the independence model only, i.e., we do not consider model uncertainty.Single model inference can be achieved using bict() (and bcct() for complete contingency tables) by specifying the model of interest as the maximal model and then setting π m,m = 1 (i.e., null.move.prob= 1).Note that single model inference can also be achieved, having already fitted the encompassing model (Section 3), by using the sub_model() function as demonstrated in Section 4.1.We use 25000 iterations under the default Sabanés-Bové and Held prior.When the MCMC algorithm has finished, we summarize the analysis after discarding the first 5000 iterations as burn-in and use the default χ 2 discrepancy statistic.