DPpackage: Bayesian Semi- and Nonparametric Modeling in R

Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian nonparametric and semiparametric models in R, DPpackage. Currently, DPpackage includes models for marginal and conditional density estimation, receiver operating characteristic curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison and for eliciting the precision parameter of the Dirichlet process prior, and a general purpose Metropolis sampling algorithm. To maximize computational efficiency, the actual sampling for each model is carried out using compiled C, C++ or Fortran code.


Introduction
In many practical situations, a parametric model cannot be expected to properly describe the chance mechanism generating an observed dataset. Unrealistic features of some common models (e.g., the thin tails of the normal distribution when compared to the distribution of the observed data) can lead to unsatisfactory inferences. Constraining the analysis to a specific parametric form may limit the scope and type of inferences that can be drawn from such models. In these situations, we would like to relax parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of a parametric statistical model. In the Bayesian context such flexible inference is typically achieved by placing a prior distribution on infinite-dimensional spaces, such as the space of all probability distributions for a random variable of interest. These models are usually referred to as Bayesian semiparametric (BSP) or nonparametric (BNP) models depending on whether the problem can be specified in such a way that the infinite-dimensional parameter θ can be written as θ = (θ_1, θ_2), where θ_1 is a finite-dimensional parameter and θ_2 is an infinite-dimensional parameter, or not (see, e.g., Dey, Müller, and Sinha 1998; Walker, Damien, Laud, and Smith 1999; Ghosh and Ramamoorthi 2003; Hanson, Branscum, and Johnson 2005; Hjort, Holmes, Müller, and Walker 2010).
BNP is a relatively young research area in statistics. First advances were made in the sixties and seventies, and were primarily mathematical formulations. It was only in the early nineties, with the advent of sampling based methods, in particular Markov chain Monte Carlo (MCMC) methods, that substantial progress was made. Posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. The introduction of MCMC methods in the area began with the work of Escobar (1994) for Dirichlet process mixtures. A number of themes are still undergoing development, including issues in theory, methodology and applications. We refer to Walker et al. (1999), Hanson et al. (2005) and Hjort et al. (2010) for recent overviews.
While BNP and BSP models are extremely powerful and have a wide range of applicability, they are not as widely used as one might expect. One reason for this has been the gap between the type of software that many users would like to have for fitting models and the software that is currently available. The most general programs currently available for Bayesian inference are BUGS (see, e.g., Gilks, Thomas, and Spiegelhalter 1994) and OpenBUGS (Thomas, O'Hara, Ligges, and Sturtz 2006). BUGS can be accessed from the publicly available R program (R Development Core Team 2011), using the R2WinBUGS package (Sturtz, Ligges, and Gelman 2005). OpenBUGS can run on Windows and Linux, as well as from inside R. In addition, various R packages exist that directly fit particular Bayesian models. We refer to Appendix C in Carlin and Louis (2008) for an extensive list of software for Bayesian modeling. Although the number of fully Bayesian programs continues to burgeon, with many available at little or no cost, they generally do not include semiparametric models. An exception to this rule is the R package bayesm (Rossi, Allenby, and McCulloch 2005; Rossi and McCulloch 2011), which includes functions for some models based on Dirichlet process priors (Ferguson 1973). The range of different Bayesian semiparametric models is huge. It is practically impossible to build flexible and efficient software for the generality of such models.
In this paper we present an up-to-date introduction to a publicly available R (R Development Core Team 2011) package designed to help bridge the previously mentioned gap, the DPpackage, originally presented in Jara (2007). Although the name of the package is due to the most widely used prior on the space of probability distributions, the Dirichlet process (DP; Ferguson 1973), the package includes many other priors on function spaces. Currently, DPpackage includes models considering DP (Ferguson 1973), mixtures of DP (MDP; Antoniak 1974), DP mixtures (DPM; Lo 1984; Escobar and West 1995), linear dependent DP (LDDP; De Iorio, Müller, Rosner, and MacEachern 2004; De Iorio, Johnson, Müller, and Rosner 2009), linear dependent Poisson-Dirichlet processes (LDPD; Jara, Lesaffre, De Iorio, and Quintana 2010), weight dependent DP (WDDP; Müller, Erkanli, and West 1996), hierarchical mixtures of DPM of normals (HDPM; Müller, Quintana, and Rosner 2004), centrally standardized DP (CSDP; Newton, Czado, and Chappell 1996), Polya trees (PT; Ferguson 1974; Mauldin, Sudderth, and Williams 1992; Lavine 1992, 1994), mixtures of Polya trees (MPT; Lavine 1992; Hanson and Johnson 2002; Hanson 2006; Christensen, Hanson, and Jara 2008), mixtures of triangular distributions (Perron and Mengersen 2001), random Bernstein polynomials (Petrone 1999a,b; Petrone and Wasserman 2002) and dependent Bernstein polynomials (Barrientos, Jara, and Quintana 2011). The package also includes models considering penalized B-splines (Lang and Brezger 2004) and a general purpose function implementing an independence chain Metropolis-Hastings algorithm with a proposal density generated using PT ideas (Hanson, Monteiro, and Jara 2011). The package is available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=DPpackage.
The article is organized as follows. Section 2 reviews the general syntax and design philosophy. Although the material in this section was presented in Jara (2007), its inclusion here is necessary in order to make the paper self-contained. In Sections 3, 4 and 5 the main features and usages of DPpackage are illustrated by means of simulated and real life data analyses. We conclude with additional comments and discussion in Section 6.

Design philosophy and general syntax
The design philosophy behind DPpackage is quite different from that of a general-purpose language. The most important design goal has been the implementation of model-specific MCMC algorithms. A direct benefit of this approach is that the sampling algorithms can be made dramatically more efficient than in a general purpose function based on black-box algorithms.
Fitting a model in DPpackage begins with a call to an R function, for instance, DPmodel, or PTmodel. Here "model" denotes a descriptive name for the model being fitted. Typically, the model function will take a number of arguments that control the specific MCMC sampling strategy adopted. In addition, the model(s) formula(s), data, and prior parameters are passed to the model function as arguments. The common arguments in every model function are listed next.
(i) prior: An object list which includes the values of the prior hyper-parameters.
(ii) mcmc: An object list which must include the integers nburn giving the number of burnin scans, nskip giving the thinning interval, nsave giving the total number of scans to be saved, and ndisplay giving the number of saved scans to be displayed on screen, that is, the function reports on the screen when every ndisplay iterations have been carried out and returns the process runtime in seconds. For some specific models, one or more tuning parameters for Metropolis steps may be needed and must be included in this list. The names of these tuning parameters are explained in each specific model description in the associated help files.
(iii) state: An object list giving the current value of the parameters, when the analysis is the continuation of a previous analysis, or giving the starting values for a new Markov chain, which is useful to run multiple chains starting from different points.
(iv) status: A logical variable indicating whether it is a new run (TRUE) or the continuation of a previous analysis (FALSE). In the latter case, the current value of the parameters must be specified in the object state.
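As a sketch of how these common arguments fit together, a typical call looks as follows. The function name DPmodel is the abstract placeholder used throughout this section, and the contents of the prior list are model-specific; consult the corresponding help file for the exact hyper-parameter names.

```r
## Illustrative skeleton of a DPpackage call (DPmodel is a placeholder,
## not an actual function; prior contents depend on the chosen model).
mcmc <- list(nburn = 5000,     # number of burn-in scans
             nskip = 20,       # thinning interval
             nsave = 5000,     # number of scans to be saved
             ndisplay = 1000)  # progress report frequency
prior <- list()                # model-specific hyper-parameters go here

## New run:
fit <- DPmodel(formula, prior = prior, mcmc = mcmc,
               state = NULL, status = TRUE)

## Continuation of the same chain, restarting from the stored state:
fit2 <- DPmodel(formula, prior = prior, mcmc = mcmc,
                state = fit$state, status = FALSE)
```

The second call illustrates the state/status mechanism described above: passing status = FALSE together with a previously returned state continues the Markov chain rather than starting a new one.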
Inside the R model function the inputs are organized in a more usable form, the MCMC sampling is performed by calling a shared library written in a compiled language, and the posterior sample is summarized, labeled, assigned into an output list, and returned. The output list includes:
(i) state: An object list containing the current value of the parameters.
(ii) save.state: An object list containing the MCMC samples for the parameters. This list contains two matrices randsave and thetasave, which contain the MCMC samples of the variables with random distribution (errors, random effects, etc.) and the parametric part of the model, respectively.
In order to exemplify the extraction of the output elements, consider the abstract model fit:

R> fit <- DPmodel(..., prior, mcmc, state, status, ...)

The lists can be extracted using the following code:

R> fit$state
R> fit$save.state$randsave
R> fit$save.state$thetasave

Based on these output objects, it is possible to use, for instance, the boa (Smith 2007) or the coda (Plummer, Best, Cowles, and Vines 2006) R packages to perform convergence diagnostics. For illustration, we consider the coda package here. It requires a matrix of posterior draws for relevant parameters to be saved as an mcmc object. Assume that we have obtained fit1, fit2, and fit3 by independently running a model function three times, specifying different starting values each time. To compute the Gelman-Rubin convergence diagnostic statistic for the first parameter stored in the thetasave object, commands along the following lines may be used:

R> library("coda")
R> chain1 <- mcmc(fit1$save.state$thetasave[, 1])
R> chain2 <- mcmc(fit2$save.state$thetasave[, 1])
R> chain3 <- mcmc(fit3$save.state$thetasave[, 1])
R> coda.obj <- mcmc.list(chain1, chain2, chain3)
R> gelman.diag(coda.obj)

The mcmc.list call collects the results as an mcmc.list object and gelman.diag computes the Gelman-Rubin statistic from these three chains.
Generic R functions such as print, plot, summary and anova have methods to display the results of a DPpackage model fit. The function print displays the posterior means of the parameters in the model, and summary displays posterior summary statistics (mean, median, standard deviation, naive standard errors, and credibility intervals). By default, the function summary computes the 95% highest posterior density (HPD) intervals using the Monte Carlo method proposed by Chen and Shao (1999). The user can instead display the order statistic estimator of the 95% credible interval via an optional argument to summary (see the help files). The plot function displays the trace plots and a kernel-based estimate of the posterior distribution for the parameters of the model. Similarly to summary, the plot function displays the 95% HPD regions in the density plot and the posterior mean; the same plot with the 95% credible region can be obtained via the corresponding optional argument. The anova function computes simultaneous credible regions for a vector of parameters from the MCMC sample using the method described by Besag, Green, Higdon, and Mengersen (1995). The output of the anova function is an anova-like table containing the pseudo-contour probabilities for each of the factors included in the linear part of the model.
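In code, the post-processing of a fitted model thus reduces to calls of the generic functions, where fit denotes any DPpackage model fit:

```r
## Generic post-processing of a DPpackage fit (illustrative; "fit" is
## assumed to be the result of any DPpackage model function).
fit           # print method: posterior means of the model parameters
summary(fit)  # posterior summaries, 95% HPD intervals by default
plot(fit)     # trace plots and kernel density estimates per parameter
anova(fit)    # pseudo-contour probabilities (Besag et al. 1995)
```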
The following sections will show how model-fitting functions in DPpackage are implemented in the context of density regression, survival data and education data. A complete description of the currently available functions is given in the supplementary material file.

Bayesian density regression
Density estimation is perhaps the most traditional inference problem addressed by BNP inference. DPpackage provides several models to implement density estimation, including DPM, PT, Bernstein polynomials and more. Details are described in the on-line supplementary material and in the package documentation. We describe in more detail an extension of the basic density estimation problem to density regression. Density regression is the fully nonparametric version of traditional regression problems for data {(x_i, y_i)}_{i=1}^n, where x_i ∈ X ⊂ R^p is a set of predictors and y_i ∈ R is the response variable. Rather than assuming a functional form for the mean function and/or a common error distribution, the problem is cast as inference for a family of conditional distributions, y_i | x_i ∼ G_{x_i}. The current version of DPpackage considers several BNP models for related random probability distributions based on particular implementations of the dependent DP (DDP) proposed by MacEachern (1999, 2000), a natural generalization of the approach discussed by Müller et al. (1996) for nonparametric regression to the context of conditional density estimation, and the hierarchical mixture of DPM models (HDPM) proposed by Müller et al. (2004). In this section we show how to perform conditional density estimation using BNP models for related probability distributions, also referred to as Bayesian density regression, using the DPcdensity and LDDPdensity functions.
The linear dependent DP model

MacEachern (1999, 2000) proposed the DDP as an approach to define a prior model for an uncountable set of random measures indexed by a single continuous covariate, say x, {G_x : x ∈ X ⊂ R}. The key idea behind the DDP is to create an uncountable set of DPs (Ferguson 1973) and to introduce dependence by modifying Sethuraman's (1994) stick-breaking representation of each element in the set. If G follows a DP prior with precision parameter α and base measure G_0, denoted by G ∼ DP(αG_0), then the stick-breaking representation of G is

G(B) = Σ_{l=1}^∞ ω_l δ_{θ_l}(B),    (1)

where B is a measurable set, δ_a(·) is the Dirac measure at a, θ_l | G_0 iid∼ G_0, and ω_l = V_l Π_{j<l}(1 − V_j), with V_l | α iid∼ Beta(1, α). MacEachern (1999, 2000) generalizes (1) by assuming the point masses θ_l(x), l = 1, 2, . . ., to be dependent across different levels of x, but independent across l. A particularly simple specification takes the atoms to be linear functions of the covariates, θ_l(x) = x'β_l, where x is a p-dimensional design vector. An advantage of this model for related random probability measures, referred to as the linear DDP (LDDP), is that it can be represented as a DPM of linear (in the coefficients) regression models when the model is convolved with a normal kernel. This approach is implemented in the LDDPdensity function, where for the regression data (y_i, x_i), i = 1, . . . , n, the mixture model

y_i | G ind∼ ∫ N(y_i | x_i'β, σ²) dG(β, σ²),  G | α, G_0 ∼ DP(αG_0),

is considered. The LDDP model specification is completed with conditionally conjugate hyper-priors on the precision parameter α and on the parameters of the baseline measure G_0; the exact parameterization is given in the LDDPdensity help file.
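The stick-breaking representation can be simulated directly. The following self-contained sketch (plain R, not part of DPpackage) draws a truncated approximation to G ∼ DP(αG_0) with a standard normal baseline measure:

```r
## Truncated stick-breaking draw from DP(alpha * G0).
## L: truncation level; rG0: sampler for the baseline measure G0.
rdp <- function(L, alpha, rG0) {
  v <- rbeta(L, 1, alpha)                # stick-breaking fractions V_l
  v[L] <- 1                              # truncate so the weights sum to one
  w <- v * cumprod(c(1, 1 - v[-L]))      # w_l = V_l * prod_{j<l} (1 - V_j)
  list(w = w, theta = rG0(L))            # weights and atoms theta_l ~ G0
}

set.seed(1)
draw <- rdp(L = 500, alpha = 1, rG0 = function(n) rnorm(n))
sum(draw$w)  # the truncated weights sum to 1
```

Larger values of the precision parameter α spread the mass over more atoms, while small α concentrates it on a few atoms; this is the behavior controlled by the precision parameter elicitation functions mentioned in the abstract.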

The weight dependent DP model
The LDDP of the previous section, convolved with a normal kernel, defines a mixture model where the weights are independent of the predictors z,

f_z(y) = Σ_{l=1}^∞ ω_l N(y | β_{0l} + z'β_l, σ_l²),

where the weights ω_l follow a stick-breaking construction and (β_{0l}, β_l, σ_l²) iid∼ G_0. Motivated by regression problems with continuous predictors, different extensions have been proposed by making the weights dependent on covariates (see, e.g., Griffin and Steel 2006; Duan, Guindani, and Gelfand 2007; Dunson, Pillai, and Park 2007a; Dunson and Park 2008), such that

f_z(y) = Σ_{l=1}^∞ ω_l(z) N(y | β_{0l} + z'β_l, σ_l²).    (2)

An earlier approach that is related to the latter references and that also induces a weight-dependent DP model, as in expression (2), was discussed by Müller et al. (1996). These authors fitted a "standard" DPM of multivariate Gaussian distributions to the complete data d_i = (y_i, z_i'), i = 1, . . . , n, and looked at the induced conditional distributions. Although Müller et al. (1996) focused on the mean function only, m(z) = E(y | z), their method can be easily extended to provide inferences for the conditional density at covariate level z, that is, a "density regression" model in the spirit of Dunson et al. (2007a). The extension of the approach of Müller et al. (1996) for related probability measures is implemented in the DPcdensity function, where the model is

d_i | G iid∼ ∫ N(d_i | µ, Σ) dG(µ, Σ),  G | α, G_0 ∼ DP(αG_0),

with conditionally conjugate hyper-priors completing the model specification (see the DPcdensity help file for the exact parameterization). This model induces a weight-dependent mixture model, as in expression (2), whose components arise from the standard partition of the vectors of means and (co)variance matrices,

µ_l = (µ_l^y, µ_l^z) and Σ_l = (σ_l^{yy}, Σ_l^{yz}; Σ_l^{zy}, Σ_l^{zz}),

respectively: the weights are ω_l(z) ∝ ω_l N(z | µ_l^z, Σ_l^{zz}), where the ω_l follow a DP stick-breaking construction, and the component kernels are the induced conditional normal densities N(y | µ_l^y + Σ_l^{yz}(Σ_l^{zz})^{-1}(z − µ_l^z), σ_l^{yy} − Σ_l^{yz}(Σ_l^{zz})^{-1}Σ_l^{zy}).
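The mechanics of the induced conditional density can be illustrated with a fixed finite mixture of bivariate normals, using the standard normal conditioning formulas. This is a plain-R toy version of the computation for a single continuous covariate, not the DPcdensity implementation itself:

```r
## Conditional density f(y | z) induced by a finite mixture of bivariate
## normals for d = (y, z): covariate-dependent weights times conditional
## normal kernels. w: mixture weights; mu, S: lists of 2-vectors/2x2 matrices.
cond_dens <- function(y, z, w, mu, S) {
  num <- den <- 0
  for (l in seq_along(w)) {
    muy <- mu[[l]][1]; muz <- mu[[l]][2]
    syy <- S[[l]][1, 1]; syz <- S[[l]][1, 2]; szz <- S[[l]][2, 2]
    wz  <- w[l] * dnorm(z, muz, sqrt(szz))   # covariate-dependent weight
    m   <- muy + syz / szz * (z - muz)       # conditional mean
    v   <- syy - syz^2 / szz                 # conditional variance
    num <- num + wz * dnorm(y, m, sqrt(v))
    den <- den + wz
  }
  num / den
}

## With a single standard bivariate normal component, f(y | z) = N(y | 0, 1):
cond_dens(0.3, 1.2, w = 1, mu = list(c(0, 0)), S = list(diag(2)))
```

With more components, the weights w_l(z) up- or down-weight each kernel according to how plausible z is under the component's covariate margin, which is exactly the mechanism of expression (2).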
The DPcdensity function fits a marginalized version of the model, where the random probability measure G is integrated out. Full inference on the conditional density at covariate level z is obtained by using the ε-DP approximation proposed by Muliere and Tardella (1998), with ε = 0.01.
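A typical call then passes the responses, the covariates, and a grid of covariate values at which the conditional densities are evaluated. The sketch below assumes data vectors y and x and previously defined prior and mcmc lists; argument names follow the package help files, and the prior hyper-parameters are model-specific:

```r
## Illustrative DPcdensity call (prior and mcmc lists assumed to be
## defined as in Section 2; see the DPcdensity help file for details).
fitWDDP <- DPcdensity(y = y, x = x,
                      xpred = seq(min(x), max(x), length = 20),
                      ngrid = 100,
                      prior = prior, mcmc = mcmc,
                      state = NULL, status = TRUE)
```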

Mixed-effects models with nonparametric random effects
Standard implementations of generalized linear mixed models (GLMM) typically assume independent and identically distributed random effects from a parametric distribution. Different BNP strategies have been proposed to relax the parametric assumption, including DP, DPM of normals, and PT (see, Jara, Hanson, and Lesaffre 2009, for a review and methods based on PT), which are available in the current version of DPpackage. In this section we show how to fit a semiparametric GLMM using the PTglmm function. The example further illustrates that a proportional-hazards model with nonparametric frailties can be fitted using any of the BNP-GLMM functions.

A semiparametric GLMM
Assume that for each of m experimental units the regression data (Y_ij, x_ij, z_ij), 1 ≤ i ≤ m, 1 ≤ j ≤ n_i, are recorded, where Y_ij is a response variable, and x_ij ∈ R^p and z_ij ∈ R^q are p- and q-dimensional design vectors, respectively. The observations are assumed to be conditionally independent with exponential family distribution. The means µ_ij = E(Y_ij | ϑ_ij, τ) and variances σ²_ij = Var(Y_ij | ϑ_ij, τ) are related to the canonical parameter ϑ_ij and dispersion parameter τ via µ_ij = b'(ϑ_ij) and σ²_ij = τ b''(ϑ_ij), respectively. The means µ_ij are related to the p-dimensional and q-dimensional "fixed" effects vectors β_F and β_R, respectively, and the q-dimensional "random" effects vector b_i via the link relation

h(µ_ij) = η_ij = x_ij'β_F + z_ij'β_R + z_ij'b_i,    (3)

where h(·) is a known monotonic differentiable link function and η_ij is called the linear predictor. Due to software limitations, the analyses are often restricted to the setting in which the random effects follow a multivariate normal distribution, b_1, . . . , b_m | Σ iid∼ N_q(0, Σ). BNP extensions incorporate a probability model for the random effects distribution in order to better represent the distributional uncertainty and to avoid the effects of the mis-specification of an arbitrary parametric random effects distribution. Under these approaches, the parametric assumption is relaxed by considering

b_1, . . . , b_m | G iid∼ G,  G ∼ H,

where H is one of the previously mentioned probability models for probability distributions (e.g., DP, DPM, PT). Even though any of the BNP models can be considered with this aim, its implementation is not direct and it is necessary to discuss some important issues regarding the specification of the model. Specifically, it is important to stress that under the parameterization given by expression (3), β_R represents the mean of the random effects, and b_i represents the subject-specific deviation from the mean.
It follows that fixing the mean of the normal prior distribution for the random effects b at zero in the parametric context corresponds to an identification restriction for the model parameters (see, e.g., Newton 1994; San Martín, Jara, Rolin, and Mouchart 2011). Equivalently, the random probability measure must be appropriately restricted in a semiparametric GLMM specification. In our settings, the location of G is "confounded" with the parameters β R . Although such identification issues present no difficulties for a Bayesian analysis in the sense that a prior is transformed into a posterior using the sampling model and the probability calculus, if the interest focuses on a "confounded" parameter, then such formal assurances have little practical value. Furthermore, as more data become available, the posterior mass will not concentrate on a point in the model, making asymptotic analysis difficult. As pointed out by Newton (1994), from a computational point of view, identification problems imply ridges in the posterior distribution and MCMC methods can be difficult to implement in these situations.
Following Jara et al. (2009), we consider the re-parameterization

η_ij = x_ij'β + z_ij'θ_i,

where β = β_F and θ_i = β_R + b_i, and we center the nonparametric prior for G at an N_q(µ, Σ) distribution. Notice that samples under the original parameterization can be obtained in a straightforward manner from the MCMC samples, as discussed by Jara et al. (2009). When a DP or DPM prior is used to model the random effects distribution, Dunson, Yang, and Baird (2007b) and Li, Müller, and Lin (2007) proposed alternative strategies to avoid the identifiability problem described above, but these approaches are not implemented in the current version of DPpackage.
The PTglmm function considers the PT prior for G described in Jara et al. (2009),

θ_1, . . . , θ_m | G iid∼ G,  G | α, µ, Σ, O ∼ PT_M(Π^{µ,Σ,O}, A_α),

where M is the maximum level of the partition to be updated, Π^{µ,Σ,O} = {π_j}_{j≥0} is a set of partitions of R^q, indexed by the centering mean µ, centering covariance matrix Σ and the matrix O, and A_α is a family of non-negative vectors controlling the variability of the process, indexed by α > 0. Here O is a q × q orthogonal matrix defining the "direction" of the partition sets, and the PT prior is centered around the N_q(µ, Σ) distribution.
The models in PTglmm are completed by assuming independent prior distributions of standard form: a gamma (Γ) prior for the precision parameter α, normal priors for the fixed effects β and the centering mean µ, an inverted Wishart (IW) prior for the centering covariance matrix Σ, and the Haar measure for the orthogonal matrix O. Notice that the inverted Wishart prior is parameterized such that E(Σ) = T^{-1}/(ν_0 − q − 1). Notice also that the Haar measure induces a uniform prior on the space of orthogonal matrices.

A proportional hazards model with nonparametric frailties
In this section we show that DPpackage functions for fitting GLMMs can be used to fit the Cox proportional hazards model (Cox 1972) with nonparametric frailties. Consider right-censored survival data where failure times are repeatedly observed within a group or subject. Let i = 1, . . . , n denote the strata over which repeated times-to-event are recorded, and j = 1, . . . , n_i denote the repeated observations within stratum i. The data are denoted by {(w_ij, t_ij, δ_ij) : i = 1, . . . , n, j = 1, . . . , n_i}, where t_ij is the recorded event time, δ_ij = 1 if t_ij is an observed failure time and δ_ij = 0 if the failure time is right censored at t_ij, and w_ij is a p-dimensional vector of covariates.
The baseline hazard function λ_0(t) corresponds to an individual with covariates w = 0 and survival time T_0. Given that the baseline individual has survived up to t, T_0 ≥ t, the baseline hazard measures how the probability of expiring in the next instant is changing. In terms of the baseline survival function S_0(t) = P(T_0 > t) and density f_0(t), this is given by

λ_0(t) = f_0(t)/S_0(t).

The conditional proportional hazards assumption stipulates that the hazard for observation j in stratum i is

λ_ij(t | w_ij, θ_i) = λ_0(t) exp(w_ij'β + θ_i),

where θ = (θ_1, . . . , θ_n)' are random effects, termed frailties in the survival literature. Often the frailties θ_i, or exponentiated frailties e^{θ_i}, are assumed to be iid from some parametric distribution such as N(0, σ²), gamma, positive stable, etc. We consider a PT prior for the frailty distribution below.
The specification is conditional because proportionality only holds for survival times within a given group i, not across groups, unless the distribution of θ_i is positive stable (see, e.g., Qiou, Ravishanker, and Dey 1999). Precisely, for individuals j_1 and j_2 within group i,

λ_{ij_1}(t)/λ_{ij_2}(t) = exp{(w_{ij_1} − w_{ij_2})'β}.

Often the baseline hazard is assumed to be piecewise constant on a partition of R^+ comprised of K intervals, yielding the piecewise exponential model. References are too numerous to list; see, for instance, Walker and Mallick (1997), Aslanidou, Dey, and Sinha (1998), and Qiou et al. (1999). Assume that

λ_0(t) = λ_k for t ∈ (a_{k−1}, a_k], k = 1, . . . , K,

where a_0 = 0 and a_K = ∞, although in practice a_K = max{t_ij} is sufficient. The prior hazard is specified by cutpoints {a_k}_{k=0}^K and hazard values λ = (λ_1, . . . , λ_K)'. If the prior on λ is taken to be independent gamma distributions, the model can approximate the gamma process on a fine mesh (Kalbfleisch 1978). Regardless, the resulting model implies a Poisson likelihood for "data" y_ijk, taking values y_ijk = 0 when t_ij ∉ (a_{k−1}, a_k] or δ_ij = 0, and y_ijk = 1 when t_ij ∈ (a_{k−1}, a_k] and δ_ij = 1, for k = 1, . . . , K(t_ij), where K(t) = min{k : a_k ≥ t}, the index of the interval containing t. The likelihood for (β, λ, γ) is given by

L(β, λ, γ) = Π_{i=1}^n Π_{j=1}^{n_i} Π_{k=1}^{K(t_ij)} p(y_ijk | µ_ijk),

where p(y | µ) is the probability mass function for a Poisson(µ) random variable, µ_ijk = exp{log(λ_k) + w_ij'β + γ_i}∆_ijk, and ∆_ijk = min{a_k, t_ij} − a_{k−1}. Thus, the Cox model assuming a piecewise constant baseline hazard can be fitted in any software which allows for Poisson regression. Note that if covariates are time dependent as well, and change only at values included in {a_k}_{k=0}^K, the likelihood is trivially extended by using w_ijk above, for k = 1, . . . , K(t_ij), rather than w_ij.
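The Poisson-likelihood construction can be made concrete with a small helper that expands a single record (t_ij, δ_ij) into the interval-specific "data" y_ijk and exposures ∆_ijk. This is a plain-R sketch of the standard piecewise exponential data augmentation, independent of DPpackage:

```r
## Expand one right-censored record into Poisson "data" for the piecewise
## exponential model: one row per interval the subject is exposed in.
expand_pw <- function(t, delta, a) {
  ## a = c(a_0 = 0, a_1, ..., a_K): cutpoints of the baseline hazard
  Kt <- sum(a < t)                       # index of the interval containing t
  k  <- seq_len(Kt)
  Delta <- pmin(a[k + 1], t) - a[k]      # exposure Delta_ijk in interval k
  y <- as.integer(k == Kt & delta == 1)  # event indicator y_ijk
  data.frame(k = k, Delta = Delta, y = y)
}

## Example: cutpoints 0 < 2 < 5 < 10, an observed failure at t = 3.5.
## The exposures sum to t and a single event is recorded in interval 2.
expand_pw(t = 3.5, delta = 1, a = c(0, 2, 5, 10))
```

Stacking such rows over all (i, j) and fitting a Poisson regression with offset log(∆_ijk), interval indicators for log(λ_k), covariates w_ij, and frailties γ_i reproduces the likelihood above, which is exactly how the GLMM functions are used in the next subsection.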

Kidney patient data
We consider data on n = 38 kidney patients discussed by McGilchrist and Aisbett (1991). Each of the patients provides n_i = 2 infection times, some of which are right censored. McGilchrist and Aisbett (1991) found that only gender was significant, and so we follow Aslanidou et al. (1998), Walker and Mallick (1997), Qiou et al. (1999), and Hemming and Shaw (2005) in considering only this covariate in what follows. We fitted the semiparametric proportional hazards regression model using a nonparametric prior for the frailty distribution. The commands used to prepare the data to fit the model are given in the supplementary material. The original dataset, d, is a 38 by 6 matrix which, for each row (from left to right), contains the subject indicator, t_i1, δ_i1, t_i2, δ_i2, and the gender indicator. Ten intervals were considered, with cutpoints {a_1, . . . , a_10} taken from the empirical distribution of the data.
R> beta <- fit0$coefficients$fixed
R> b <- as.vector(fit0$coefficients$random$id)
R> mu <- rep(0, 1)
R> sigma <- getVarCov(fit0)[1, 1]
R> state <- list(alpha = 1, beta = beta, b = b, mu = mu, sigma = sigma)

A single Markov chain cycle of length 105,000 was completed. The full chain was sub-sampled every 20 steps after a burn-in period of 5,000 samples, to give a reduced chain of length 5,000. In the code below, tune3 = 1.5 corresponds to the standard deviation of the log-normal candidate generating distribution used in the Metropolis-Hastings step for the precision parameter of the PT.
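The MCMC specification matching the run described above can be written using the common argument names of Section 2. The ndisplay value is illustrative; nburn, nskip, nsave and tune3 are as stated in the text:

```r
## MCMC settings for the kidney data run (ndisplay is illustrative).
mcmc <- list(nburn = 5000,    # burn-in scans
             nskip = 20,      # thinning interval
             nsave = 5000,    # saved scans
             ndisplay = 500,  # progress report frequency
             tune3 = 1.5)     # sd of the log-normal proposal for the
                              # PT precision parameter
```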
R> fitPT <- PTglmm(fixed = y ~ gender + loghazard,
+    offset = log(off), random = ~ 1 | id, family = poisson(log),
+    prior = prior, mcmc = mcmc, state = state, status = FALSE)
R> summary(fitPT)
R> predPT <- PTrandom(fitPT, predictive = TRUE, gridl = c(-2.3, 2.3))
R> plot(predPT)

The abridged output is given below. The output lists the estimated effect for gender, β_1 = −1.13, followed by K = 10 estimated log-hazard values. Notice that the intercept term in the posterior information for the "fixed" effects (regression coefficients in the output) corresponds to the mean of the frailty distribution G. The posterior median estimate of the centering variance was σ² = 0.35, close to the posterior median of the frailty variance (0.33). Further, the posterior median (95% credible interval) for α was 0.75 (0.04; 3.77). The trace plots of the parameters (not shown) indicate a good mixing of the chain. The acceptance rates for the Metropolis-Hastings steps associated with the regression coefficients, frailties, centering variance and precision parameter were 36%, 61%, 43% and 46%, respectively. Notice that the 0 values for the acceptance rates in the output correspond to the centering mean, which is not sampled, and to the decomposition of the centering covariance matrix; the latter is only sampled for dimensions greater than or equal to 2. Walker and Mallick (1997) analyzed these data with a piecewise exponential model and frailties following a PT with fixed centering variance, PT_8(Π_{100}, A_{0.1}), and find β_1 = −1.0. McGilchrist and Aisbett (1991) obtain β_1 = −1.8, but with other nonsignificant covariates included. Aslanidou et al. (1998) obtained comparable results. The estimated frailty density here is unimodal, in contrast to the analysis presented in Walker and Mallick (1997), which showed two well defined density modes corresponding to men and women. We were unable to duplicate this result across several sets of hyper-prior values, including the consideration of PT_8(Π_{100}, A_{0.1}). In retrospect, this is not surprising.
Two well separated modes would typically indicate an omitted covariate, although gender was included as a risk factor in the model. Finally, Figure 4 shows the posterior median and 95% credible interval for survival curves for males and females, taking the individual-level heterogeneity modeled through the frailty distribution into account.

Dependent random effects distributions
Although different BNP strategies have been proposed to relax the parametric assumption on the random effects distribution in mixed models, the assumption of exchangeable random effects has rarely been tested or relaxed, possibly obscuring important features of the among-subjects variation. In this section we show how to fit a dependent GLMM in the context of educational data using the LDDPrasch function.

The dependent semiparametric Rasch model
The statistical analyses of educational measurement data are usually based on Rasch-type models (see, e.g., De Boeck and Wilson 2004). Rasch-type models (Rasch 1960) can be viewed as a particular case of GLMM (see, e.g., Doran, Bates, Bliese, and Dowling 2007; De Boeck, Bakker, Zwitser, Nivard, Hofman, Tuerlinckx, and Partchev 2011), where the linear predictor, η_ij, is modeled as a linear function of two parameters, η_ij = θ_i − β_j, where θ_i ∈ R corresponds to the ability of subject i, i = 1, . . . , n, and β_j ∈ R corresponds to the difficulty of probe/item j, j = 1, . . . , m. The difficulty and ability parameters are interpreted as fixed and random effects, respectively. The structural assumptions underlying the model specification imply an important property of the model, namely, that the vectors of individual response patterns, Y_i = (Y_i1, . . . , Y_im)', and the total number of correct responses (total score), form an iid process conditionally on the difficulty parameters, β = (β_1, . . . , β_m)', and the abilities distribution G (see, e.g., San Martín et al. 2011). The common distribution of Y_i, for all i ∈ N, is given by

P(Y_i = y | β, G) = ∫ Π_{j=1}^m F(θ − β_j)^{y_j} {1 − F(θ − β_j)}^{1−y_j} dG(θ),    (4)

where F is the standard logistic cumulative distribution function. In many practical applications the exchangeability assumption is not appropriate, and a partially exchangeable model that emphasizes the explanatory use of the model may be more appropriate. De Boeck and Wilson (2004) discussed a wide variety of explanatory Rasch-type models that incorporate predictor information in the model. When the predictors represent characteristics of the individuals, the ability parameters have been modeled using a location model,

θ_i = x_i'γ + u_i,    (5)

where γ are "fixed" regression coefficients and the u_i represent "residual" random effects. Therefore, the model becomes a more general GLMM.
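The building blocks of the Rasch likelihood are simple to write down. The following plain-R sketch (not LDDPrasch itself) evaluates the item response probability and a Monte Carlo approximation of the marginal probability of a response pattern under an abilities distribution G, here represented by a sampler rG:

```r
## Rasch item response probability: P(Y_ij = 1 | theta_i, beta_j).
p_rasch <- function(theta, beta) plogis(theta - beta)

## Monte Carlo approximation of the marginal probability of a response
## pattern y under abilities theta ~ G (rG: sampler for G).
marg_pattern <- function(y, beta, rG, nsim = 1e4) {
  theta <- rG(nsim)                  # draws from the abilities distribution
  p <- outer(theta, beta, p_rasch)   # nsim x m matrix of probabilities
  lik <- apply(p^rep(y, each = nsim) * (1 - p)^rep(1 - y, each = nsim),
               1, prod)              # likelihood of the pattern per draw
  mean(lik)                          # average over G
}

set.seed(1)
marg_pattern(y = c(1, 0, 1), beta = c(0, 0.5, -0.5), rG = rnorm)
```

Replacing the standard normal sampler rG by a covariate-dependent mixture of normals is precisely what the LDDP prior of the next subsection does, while also treating that mixture as unknown.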
Although a semiparametric specification of the model given by expressions (4) and (5), obtained by considering a BNP model for G, provides flexibility in capturing different distributional shapes and, at the same time, allows for non-exchangeable abilities, it implies that the vector of covariates x_i acts only by modifying the location of the distribution of the subject-specific abilities. However, a more thorough evaluation of the effect of the predictors should account for potential changes in characteristics of the abilities distribution other than the location. For instance, it is useful to examine potential changes in the symmetry and multimodality of the abilities distribution.
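The restriction imposed by the location model can be seen in a small simulation: for any fixed residual distribution G (here an arbitrary shifted exponential, chosen purely for illustration), two covariate values yield abilities distributions that differ by a location shift only, so variance, skewness and modality are unchanged.

```r
# Under the location model (4)-(5), covariates enter only through x_i' gamma:
# the abilities distributions for two covariate values are pure shifts of
# each other (hypothetical skewed G below).
set.seed(1)
u <- rexp(1e5) - 1             # a skewed residual distribution G
theta_public <- 0.0 + u        # x_i' gamma = 0.0
theta_private <- 0.7 + u       # x_i' gamma = 0.7: a pure location shift
# Variances (and all other shape features) are identical:
var(theta_public) - var(theta_private)  # numerically zero
```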
The LDDPrasch function considers the LDDP mixture of normals prior for the abilities in a Rasch model, as described in Fariña, Quintana, San Martín, and Jara (2011). The model is given by

Y_ij | θ_i, β_j ~ ind. Bernoulli(F(θ_i − β_j)), i = 1, . . . , n, j = 1, . . . , m,

θ_i | G ~ iid ∫ N(x_i⊤b, σ²) dG(b, σ²),

G | α, G_0 ~ DP(αG_0),

where F is the standard logistic cumulative distribution function and the baseline distribution is G_0 = N(b | μ_b, S_b) × Γ(σ⁻² | τ_1/2, τ_2/2). Notice that the first difficulty parameter is fixed at 0 (i.e., β_1 = 0) in order to avoid identification problems (see San Martín et al. 2011). The LDDP model specification is completed with the following hyper-priors:

τ_2 | τ_s1, τ_s2 ~ Γ(τ_s1/2, τ_s2/2),

μ_b | μ_0, S_0 ~ N(μ_0, S_0),

S_b | ν, Ψ ~ IW(ν, Ψ).
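Some intuition for the LDDP mixture of normals prior can be obtained from a truncated stick-breaking simulation. The sketch below is purely illustrative and does not reproduce the sampler implemented in LDDPrasch; the truncation level, atom distributions and covariate vector are arbitrary.

```r
# Truncated stick-breaking draw from an LDDP mixture of normals prior:
# the abilities density at covariates x is a countable mixture whose atoms
# are regression coefficients and variances drawn from a baseline G0.
set.seed(2)
alpha <- 1; N <- 50                           # DP precision; truncation level
v <- rbeta(N, 1, alpha)
w <- v * cumprod(c(1, 1 - v[-N]))             # stick-breaking weights
b <- matrix(rnorm(2 * N), nrow = N)           # atom regression coefficients
sigma2 <- 1 / rgamma(N, shape = 3, rate = 1)  # atom variances
x <- c(1, 0.5)                                # a covariate vector
f <- function(theta)                          # abilities density at theta
  sum(w * dnorm(theta, mean = as.vector(b %*% x), sd = sqrt(sigma2)))
f(0)                                          # a positive density value
```

Because the mixture weights and atoms are shared across covariate values while the component means depend on x, the abilities density can change in shape, not just location, as the covariates change.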

Chilean educational data
We consider data from the Chilean system for educational quality measurement (sistema de medición de la calidad de la educación, SIMCE). The Chilean education system is subject to several regular performance evaluations at the school, teacher and student level. In the latter case, SIMCE has developed mandatory census-type tests to regularly assess the educational progress at three stages: 4th and 8th grades in primary school (9 and 13 year old children, respectively), and 2nd grade in secondary school (16 year old children). The SIMCE instruments are designed to assess the achievement of fundamental goals and minimal contents of the curricular frame in different areas of knowledge, currently Spanish, mathematics and science. Here we focus on data from the math test applied in 2004 to 8th grade examinees in primary school. The test consists of 45 multiple-choice questions with four alternatives each. The response Y_ij ∈ {0, 1} is a binary variable indicating whether individual i answered item j correctly.
The main purpose of collecting these data is to monitor standards and progress of educational systems, focusing on characterizing the population (and its evolution) rather than individual examinees. It is of particular interest to understand the way in which some factors at the individual and/or school level could explain systematic differences in the performance of students, in order to establish policies to improve the education system. For instance, a significant characteristic of the Chilean elementary and secondary education system is the variety of school types. These are grouped as: Public I, financed by the state and administered by county governments; Public II, financed by the state and administered by county corporations; Private I, financed by the state and administered by the private sector; and Private II, fee-paying schools that operate solely on payments from parents and are administered by the private sector.
In order to evaluate the effect of the type of school and gender on student performance we consider the LDDPrasch function. For illustration purposes, we consider a subset of 500 children. We refer to Fariña et al. (2011) for a full analysis of the complete data. In this analysis, x_i includes an intercept term, three dummy variables for the type of school, and the gender indicator. The LDDP Rasch model was fitted assuming β_{2:45} ~ N_44(0, 10³ I_44), α = 1, μ_0 = 0_5, S_0 = 100 I_5, τ_1 = 6.01, τ_s1 = 6.01, τ_s2 = 2.01, ν = 8, and Ψ = I_5.
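A prior list consistent with these values can be constructed as follows; the element names are assumptions based on common DPpackage conventions and should be verified against the LDDPrasch documentation before use.

```r
# Prior specification matching the stated hyperparameter values.
# NOTE: the element names (beta0, Sbeta0, psiinv, ...) are assumed from
# DPpackage conventions -- check help("LDDPrasch") for the exact names.
prior <- list(alpha = 1,                           # DP precision parameter
              beta0 = rep(0, 44),                  # prior mean of beta_{2:45}
              Sbeta0 = diag(1000, 44),             # prior covariance 10^3 I_44
              mu0 = rep(0, 5), S0 = diag(100, 5),  # hyper-prior on mu_b
              tau1 = 6.01, taus1 = 6.01, taus2 = 2.01,
              nu = 8, psiinv = diag(1, 5))         # hyper-prior on S_b
```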
The MCMC specification was as follows:

R> mcmc <- list(nburn = 5000, nskip = 3, ndisplay = 1000, nsave = 5000)

For each gender and type of school the density of the abilities distribution was evaluated on a grid of 100 equally spaced points in the range (−3, 8). The following code was used to create the design matrix for prediction, where the first column corresponds to the intercept, the next three columns correspond to the dummy variables for the type of school (types), and the last column corresponds to the gender indicator (1 = girl).
R> zpred <- matrix(c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0,
+    1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1,
+    1, 0, 0, 1, 1), nrow = 8, ncol = 5, byrow = TRUE)

Based on the previous specification, the LDDPrasch function was used to fit the model, using the following code:

R> fitLDDP <- LDDPrasch(formula = y ~ types + gender, prior = prior,
+    mcmc = mcmc, state = NULL, status = TRUE, zpred = zpred,
+    grid = seq(-3, 8, len = 100), compute.band = TRUE)

Different shapes in the resulting posterior densities were observed. Figure 5 displays the posterior mean and pointwise 95% HPD intervals for the random effects distribution for different combinations of the predictors. The density estimates show a clear departure from the commonly assumed normality of the random effects distribution. We found no important differences in the behavior of boys and girls. Children in Public I and II schools showed similar right-skewed random effects distributions. The estimated abilities distributions for children in private schools were shifted to the right in comparison with the distributions observed for children from public schools. This shift was more pronounced for children in fee-paying schools administered by the private sector (Private II) than for those in schools financed by the state and administered by the private sector (Private I). A bimodal random effects distribution was observed in the abilities distributions from private schools.
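The bands in Figure 5 are HPD intervals computed by the package when compute.band = TRUE. As a rough sketch of how pointwise bands arise from MCMC output, simple quantile bands (a cruder alternative to HPD intervals) can be computed from an nsave × ngrid matrix of sampled density values; the matrix below is mock data, not the SIMCE fit.

```r
# Pointwise bands from an nsave x ngrid matrix of sampled density values
# (mock positive samples standing in for saved MCMC output).
set.seed(3)
dens.samples <- matrix(rgamma(100 * 50, shape = 2), nrow = 100, ncol = 50)
post.mean <- colMeans(dens.samples)                        # posterior mean curve
band <- apply(dens.samples, 2, quantile, probs = c(0.025, 0.975))
# band[1, ] and band[2, ] hold the lower and upper limits at each grid point
```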

Concluding remarks
Because the main obstacle for the practical use of BSP and BNP methods has been the lack of estimation tools, we presented an R package for fitting some frequently used models. Until the release of DPpackage, the two options for researchers who wished to fit a BSP or BNP model were to write their own code or to rely heavily on particular parametric approximations to some specific processes using the BUGS code given in Peter Congdon's books (see, e.g., Congdon 2001). DPpackage is geared primarily towards users who are not willing to bear the costs associated with either of these options.
Chambers (2000) conceptualized statistical software as a set of tools to organize, analyze and visualize data. Data organization and visualization of results are based on R capabilities. Chambers (2000) also proposed requirements and guidelines for developing and assessing statistical software. These requirements can be discussed with respect to DPpackage.
1. Easy specification of simple tasks: The documentation contains examples, and similar problems can be analyzed by moderate modifications of the model description files. The examples have been chosen so that they demonstrate the functionality of DPpackage with well-known datasets.
2. Gradual refinement of the tasks: The user can enhance a nonparametric model by adding covariates, and by fixing part of the baseline distributions and the precision parameters.
3. Arbitrarily extensive programming: DPpackage has a programming environment for implementing sophisticated proposal distributions, if the default proposals are not sufficient. Each source file contains a detailed specification of the corresponding function which is useful for potential user/developers who wish to modify the source files. In addition, we are currently working on a DPpackage developer's manual that will help users who would like to add new functionality to the package.
4. Implementing high-quality computations: The actual sampling for each model is carried out using compiled C, C++ or Fortran code. Because the source code in a compiled language is available, new procedures can be added and existing ones modified to improve performance and flexibility.

5. Embedding the results of items 2-4 as new simple tools: DPpackage has the capability of continuing a Markov chain from the last value of the parameters of a previous analysis. As the MCMC samples are saved in matrix objects, both parts of the Markov chain can be easily merged.
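With stand-in matrices in place of actual DPpackage output, the merging step amounts to stacking the saved sample matrices:

```r
# Merging MCMC output from an initial run and a continuation run.
# DPpackage saves sampled parameters in matrices with one row per saved
# scan, so the two pieces of the chain can be stacked with rbind().
chain1 <- matrix(rnorm(5000 * 3), nrow = 5000)  # stand-in for the first run
chain2 <- matrix(rnorm(5000 * 3), nrow = 5000)  # stand-in for the continuation
merged <- rbind(chain1, chain2)
nrow(merged)  # -> 10000
```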
Many improvements to the current version of the package can be made. For example, all DPpackage modeling functions compute conditional predictive ordinates for model comparison, but only some of them compute the effective number of parameters pD and the DIC. These and other model comparison criteria will be included for all functions in future versions of DPpackage.
The implementation of more models, the development of general-purpose sampling functions, real-time visualization of simulation progress, and the ability to handle large datasets through the use of sparse matrix techniques (George and Liu 1981) are topics for future improvements.