Dependent Species Sampling Models for Spatial Density Estimation

We consider a novel Bayesian nonparametric model for density estimation with an underlying spatial structure. The model is built on a class of species sampling models, which are discrete random probability measures that can be represented as a mixture of random support points and random weights. Specifically, we construct a collection of spatially dependent species sampling models and propose a mixture model based on this collection. The key idea is the introduction of spatial dependence by modeling the weights through a conditional autoregressive model. We present an extensive simulation study to compare the performance of the proposed model with competitors. The proposed model compares favorably to these alternatives. We apply the method to the estimation of summer precipitation density functions using Climate Prediction Center Merged Analysis of Precipitation data over East Asia.


Introduction
Inference on a collection of probability distributions that are related but not identical is a fundamental problem in statistics. Some of the simplest versions of this problem are normal linear regression models, where the distributions are assumed to be normal, with means assumed to be linear functions of covariates. Such models are, however, inadequate to describe more complicated scenarios, such as those arising in inference for climate and meteorological data when observations are indexed by location and estimation of the climate variable distributions at each location is of interest (see, e.g. Gelfand et al., 2005). As a motivating application, we consider data on a 33-year period of summer precipitation over East Asia. One traditional use of such data is the prediction of summer precipitation in the form of probabilities for above average, average, and below average precipitation (Barnston et al., 2003;Tippett et al., 2007). A crucial element in such predictions is the adequate modeling of the distribution of climate variables. Modeling of climate variables, however, is not straightforward, as their distributions are poorly approximated by commonly used parametric families, and might vary by covariates such as geographical coordinates. In summary, the problem can be described as spatially varying density estimation.
Density estimation is one of the traditional applications of nonparametric Bayesian methods. See, for example, Müller and Mitra (2013) for a review. Related nonparametric Bayesian models include, in particular, prior models for families of distributions that are indexed by a given set of covariates, {G x : x ∈ X }. Such models are known as dependent random probability measures and have been a focal point of recent research efforts. The study of such models started with the seminal work by MacEachern (1999MacEachern ( , 2000, who proposed a covariate-dependent version of the Dirichlet process (DP) by Ferguson (1973).
The key idea of the dependent DP construction is based on the stick-breaking representation by Sethuraman (1994). A random probability measure G with DP prior can be represented as where Here, B is a Borel σ-field on the space where G 0 is supported. In most applications, this space is Euclidean. In summary, the DP prior model is indexed by two parameters, M and G 0 . The parameter M > 0 is known as the total mass parameter, and G 0 is called the centering distribution. We write G ∼ DP(M, G 0 ).
MacEachern's dependent DP (DDP) considers a family of random probability measures G x (B) = h w x,h δ θ x,h (B), where the support points θ x,h and weights w x,h are modeled as stochastic processes indexed by covariates x. DDPs have interesting general properties, such as continuity and large support under suitable conditions (Barrientos et al., 2012). The construction of the DDP is such that marginally each G x follows a DP distribution for every x ∈ X . A commonly used variation of the DDP model is obtained when dependence is introduced through support points only, keeping the weights w x,h = w h common across x. This is usually referred to as common or single weights DDP and has been used by many authors, including De Iorio et al. (2004), Gelfand et al. (2005), Rodríguez and ter Horst (2008), and Jara et al. (2010) to name just a few. One of the advantages of models based on common weights DDPs is that posterior simulation algorithms for models based on regular DP priors can usually be adapted with few modifications.
Although popular, common weight DDPs have been argued to have some limitations. See, e.g. the discussion in Griffin and Steel (2006), Duan et al. (2007) and Rodríguez and Dunson (2011). This has led some researchers to consider common or single atoms DDPs where dependence is introduced in the weights only. Griffin and Steel (2006) proposed the order-based DP in which one set of weights and support points are shared by all random measures G x , but with different random order. Chung and Dunson (2011) defined the local DP. The local DP utilizes one set of weights and support points as in the order-based DP, but each probability measure G x is defined by a random subset of the weights and support points. Dunson and Park (2008) proposed the kernel stickbreaking process in which [0, 1]-valued random variables that define a stick-breaking process are multiplied by a kernel factor to reflect location effects. Reich and Fuentes (2007) and Fuentes and Reich (2013) used the kernel stickbreaking process in modeling wind speed to define spatially varying random distributions, with uniform and elliptical kernels used as location-specific multiplicative factors. Li et al. (2015) used a similar idea in the boundary detection problem where the kernel factor is replaced by logit transformed random variables from a latent autoregressive model. Rodríguez and Dunson (2011) defined the probit stick-breaking process in which the Beta random variables of the stick-breaking process are replaced by probittransformed Gaussian processes. Alternatively, Ren et al. (2011) and Ding et al. (2012) used Beta random variables defined by a logit transformation of covariates. These approaches allow for more flexible variations of random probability measures than the common weights DDP.
All these approaches include the stick-breaking construction, and the type of dependence structure introduced is counterintuitive and foreign to the spatial dependence literature, with the exception of Rodríguez and Dunson (2011) and Li et al. (2015). Another critical limitation of DP-or DDP-based models is that the weights in the stick-breaking construction are stochastically ordered in an exponentially descending way. This implies a particular clustering structure that favors a small group of large clusters, which is unnatural in many applications. To deal with these limitations, we propose a class of dependent random probability measures whose definition is based on two main ingredients: species sampling models (SSM) and conditional autoregressive (CAR) models. In particular, we do not use the stick-breaking process to define the weights of random probability measures and define the weights directly via normalization.
The first component of the proposed approach is the use of general SSM's (Pitman, 1996). A SSM defines a random discrete probability measure with random weights and support points that are independent of each other. It is the de Finetti measure of a species sampling sequence, that is, an exchangeable sequence of random variables for which the predictive distribution of the next observation is a function of the number of ties and frequencies among the earlier observations. Species sampling sequences are extensions of the Pólya urn sequence (Blackwell and MacQueen, 1973), that is, an exchangeable sequence of random variables whose de Finetti measure is the Dirichlet process. SSMs are random probability measures that involves a probabilistic structure to define a random probability measure. Many well known examples follow this structure.
The second element of the proposed approach is the use of CAR models, which were introduced by Besag (1974) and have been widely used for modeling data with spatial dependencies. For example, Geman and Geman (1984) and Besag et al. (1991) applied CAR models to image data, while Nieto- Barajas (2008) and Lee (2011) used these models to investigate medical records. The main advantages of CAR models are the easy implementation of posterior simulation algorithms and the natural extension to the modeling of spatio-temporal data. We will argue that the combination of these two features provides a rich basis for flexible modeling of data with a spatial structure.
The rest of this paper is organized as follows. In Section 2 we review SSMs and CAR models. In Section 3 we introduce the proposed CAR SSMs. Computational methods to implement posterior simulation under the CAR SSMs are described in Section 4.1. In Section 5 we present a simulation study and the application to the motivating climate variables data for predicting rainfall in the Korean peninsula. Some final comments and conclusions are presented in Section 6.

Species Sampling Models (SSM)
A random probability measure G is called a SSM (Pitman, 1996), if it can be represented as for a sequence of positive random variables (P h ) and a nonnegative random variable R with R = 1 − ∞ h=1 P h with probability 1, and where (θ h ) is a sequence of random samples from a diffuse probability measure G 0 and independent from (P h ). We can immediately see that (1) is a special case of (2). A SSM for which P (R = 0) = 1 is termed proper. The practical use of SSM as nonparametric priors is restricted to proper SSM's. SSMs admit many popular classes of random probability measures as particular cases. These include the DP, the two-parameter DP or Pitman-Yor process (Pitman and Yor, 1997), the stick-breaking priors (Ishwaran and James, 2001) and the normalizedinverse Gaussian processes (Lijoi et al., 2005). More properties of SSMs can be found in Pitman (1996), Ishwaran and James (2003), Navarrete et al. (2008), James (2008), James et al. (2009), Jang et al. (2010, and Lee et al. (2013).
Recently, Bassetti et al. (2010) and Airoldi et al. (2014) proposed a generalization of species sampling sequences designed for nonexchangeable data which has modeling potential for time series and spatial statistics. Lee et al. (2013) considered a family of SSMs whose weights are normalized positive random variables where (w h ) ∞ h=1 is a sequence of positive random variables and (θ h ) are independent and identically distributed (i.i.d.) random support points distributed according to G 0 , and independent of (w h ). A sufficient condition for (Lee et al., 2013).
The class of SSMs is a large class of discrete random probability measures subject only to the constraint of having i.i.d. atoms that are independent of the weights. In the following discussion, we consider a specific class of SSMs based on sequences of log-normal random variables: for h = 1, 2, . . . , where a, b, β, τ 2 are positive constants. It is easy to verify that ∞ l=1 E(w l ) < ∞, and thus ∞ l=1 w l < ∞ a.s. The specification (4) is chosen to define models with relevant differences compared to the Dirichlet process prior. The proposed form for the weights (w h ) allows for a number of moderately large weights of (a priori) comparable size. This is in contrast to the DP prior which assumes a priori stochastically ordered and geometrically decreasing sizes for the weights. This will be important in the context of the spatial dependence across related measures which we will introduce later. With multiple dominant weights it is easier to model the desired heterogeneity across locations.

Gaussian Conditional Autoregressive (CAR) Models
The CAR model is one of the most popular models for a large number of dependent random variables. Under a CAR model, the joint distribution of a set of random variables is specified via the full conditional distributions. However, not every set of full conditional distributions leads to a legitimate joint distribution. See further details in Besag (1974), Cressie (1993), Kaiser and Cressie (2000), Cressie and Wikle (2011) and references therein.
We focus on Gaussian CAR models. Let D = {s i , i = 1, . . . , n} be a set of spatial locations and let u = (u 1 , . . . , u n ) T be a collection of random variables with u i corresponding to location s i for i = 1, 2, . . . , n. We define where μ i , ξ ij ∈ R and τ 2 i > 0 for all i, j = 1, . . . n and let Λ = (λ ij ) with A key result for (5) to define a proper model is the following. If Λ is symmetric and positive definite, then the joint distribution of y is N (μ, Λ −1 ). See Rue and Held (2005) for further details, or Besag (1974), Cressie (1993), and Banerjee et al. (2004). Specific choices for ξ ij will be discussed below, when we combine (3) through (5) to construct the desired spatially dependent SSMs.

Spatial CAR SSMs
We combine the definitions of SSM's and CAR models to construct a class of spatially ∓ dependent SSMs {G s : s ∈ D} on a set of locations D = {s 1 , s 2 , . . . , s n }. The domain D can be a set of grid points defined on a spatial region, such as the area including South Korea for our motivating application, or a set of time points, etc. We define a probability measure G s at each location s ∈ D as where The dependence of the random probability measures G s across s is introduced through the dependence of (w s,h , s ∈ D). We use the CAR model to define the joint distributions of (w s,h , s ∈ D). We also note that (6) highlights the discrete nature of the model for G s . This implies that a sample from any G s will have ties with positive probability, giving thus rise to a particular clustering structure whose distribution is controlled by the distributional assumptions on the set of weights (P s,h ) for s ∈ D, in turn determined by the (w s,h ) quantities.
We combine (4) and (5) to define w si,h = e u i,h , with the joint distribution of (u i,h ) determined by the CAR model where τ 2 > 0, and A key element of model (7) is the specification of the autoregressive coefficients (ξ i ). We consider here two alternative ways of defining (ξ i ), which differ in the extent and strength of the induced dependence structure. We will later compare the relative merits of both specifications.

Mercer CAR model:
. . , m n,h ) and u h = (u 1,h , . . . , u n,h ). We specifically use the Gaussian kernel where ρ is positive. We call the spatial SSM defined by a Mercer kernel a Mercer CAR SSM, and denote it as (G s , s ∈ D) ∼ MCS(G 0 , a, b, τ 2 , ρ). The model does not include the independent model with u h having diagonal covariance matrix. But it can be arbitrarily close to the independent model.

Clayton-Kaldor CAR model (Clayton and Kaldor, 1987; Sun et al., 1999):
Let N (s i ) be the set of neighbors of the i-th location s i . Define (Sun et al., 1999). This guarantees that Λ = I − ρC is positive definite and u h ∼ N (m h , τ 2 Λ −1 ). We specifically consider the neighborhood of s i defined by for some fixed positive constant B. We refer to this spatial SSM as the CK CAR SSM and denote it as (G s , s ∈ D) ∼ CKCS(G 0 , a, b, τ 2 , ρ, B). The parameter ρ covers a spectrum of dependence among u i,h 's. For example, the independence model can be specified by setting ρ = 0.
The main difference between the two CAR SSM models is the nature of the precision matrix that is implied by the two models, which in turn results in different partial correlation structures. The precision matrix of CK CAR is sparse and thus its partial correlation structure is local, i.e. only neighboring sites have nonzero partial correlation. On the other hand, the precision matrix of Mercer CAR model is dense and its partial correlation structure is global, i.e., any two sites have nonzero partial correlation. Although the two models have little difference in fitted mean models, the difference in partial correlation structure can have actual impacts in other data analysis. As ρ −→ ∞, the Mercer CAR can be arbitrarily close to the independent model with independent random probability measures at different locations. But in actual data analysis choice of hyperparameter ρ can be delicate and sometimes the posterior includes long range dependencies. See Figure 10. Thus, if the densities are expected to possibly change abruptly, the Mercer CAR needs to be used with caution. Also, the sparsity of the precision matrix of CK CAR may be used to speed up the computation for large data set.
In a recent paper, Li et al. (2015) considered the use of CAR models to define a spatial stick-breaking prior. In the CAR SSM the weights of random probability measures are directly defined via normalization of CAR models, while Li et al. (2015) modeled the weights with a stick-breaking process defined by logit transformations of CAR models.
Nonparametric prior models, by definition, are designed to be flexible and to capture departures from simpler parametric constructions. A direct approach to verify this property consists of characterizing the weak support of the prior model. It is common in the Bayesian nonparametric literature to prove that the prior model has full weak support in order to emphasize its flexibility. The prior is said to have full weak support if the support of the prior under the weak topology is the same as the parameter space. This approach has been used for example to study prior models for unknown densities (see, e.g. Wu and Ghosal, 2008) and prior models for families of unknown distributions indexed by covariates (see, e.g. Barrientos et al., 2012). We refer to Ghosh and Ramamoorthi (2003) for a discussion on the use of the weak metric in Bayesian nonparametrics. The following proposition shows that the CAR SSMs have full weak support. The proof is given in Appendix A of the Supplementary Material (Jo et al., 2016). Proposition 1. If G 0 has full support on R, then the Mercer CAR SSM, MCS(G 0 , a, b, τ 2 , ρ), and the CK CAR SSM, CKCS(G 0 , a, b, τ 2 , ρ, B), have full weak supports.

Mixtures of CAR SSM
In this section, we discuss how the CAR SSMs can be used to model spatially located density functions simultaneously. Consider a grid D = {s i : i = 1, 2, . . . , n}, and a set of corresponding observations at each point, Y = {(y si,j , j = 1, . . . , N i ) : s i ∈ D}. In many applications, such as in the motivating example, the responses Y are continuous variables. The discrete nature of the MCS and the CKCS model would be inappropriate for such data.
A common extension of discrete random probability measures to accommodate continuous data is the use of an additional convolution with a continuous kernel (see, e.g. Lo, 1984). Specifically, we propose a probability model for data Y with spatial structure D which can be described as a mixture with respect to a CAR SSMs. We use either the Mercer or the CK constructions described above. The model is stated as follows.
where f (y|θ) is a kernel density with parameter θ, and a, b, τ 2 , ρ, B > 0. While this kernel may be chosen to be any distribution supported on the real numbers, for convenience we often specify it as the normal density f (·|θ) ≡ N (·; μ, σ 2 ) with θ = (μ, σ 2 ), in which case the conjugate (Normal Inverse Gamma) base distribution G 0 is chosen as where μ 0 ∈ R and α, ν, ψ > 0, and Ga(a, b) refers to a gamma distribution with mean a/b and variance a/b 2 . The hyperparameter ψ plays the role of a bandwidth in the kernel density estimation and controls the smoothness of the density estimates. It can affect posterior sensitivity. We fix hyper parameters μ 0 , α, and ν, but place a hyperprior on ψ: where ν 0 is a positive constant.
To facilitate posterior sampling, we use the following equivalent hierarchical model, which is obtained by introducing latent variables θ ij and replacing the integral in (9) with a hierarchical prior on θ ij : where (w si,h , s i ∈ D) are defined by either Mercer or CK CAR models, and the hyperprior for ψ is given in (11). We refer to (12) as either the MCS mixture or CKCS mixture model.

Posterior Computation
We introduce a posterior sampling algorithm for the mixtures of CAR SSM (12). For the posterior computation, the SSMs are truncated to the first K terms in the infinite sum. Similar truncations are often used in infinite mixture models similar to the ones considered here. See, e.g. Ishwaran and James (2001) and Rodríguez and Dunson (2011).
The proposed algorithm is based on the blocked Gibbs sampling algorithm (Ishwaran and James, 2001) and a data augmentation method for the multinomial logit model (Scott, 2011) which is an extension of the algorithm of Albert and Chib (1993) for the probit model. To be specific we use again the normal kernel.
We introduce an additional set of latent variables to help identifying cluster alloca-  (12) can be equivalently written as where p(u h | η) is the density of u h under the Mercer or CK CAR models with hyperparameter η = (a, b, τ 2 , ρ) and ψ is the parameter for G 0 with hyperprior π(ψ). For the normal kernel f , π(ψ) is given by (11). The details of the full conditionals and MCMC steps are given in Appendix B of the Supplementary Material (Jo et al., 2016).
Finally, we comment on computation times. Using the described posterior MCMC simulation strategy we find that inference for problems with up to some 100s sites can be implemented with reasonable computing times. See, for example, the computing times reported in Table 2. For the data analysis reported later, in Section 5.2., we could finish the computation of the proposed CAR SSMs within a few hours for inference with 6000 observations over more than 200 sites. In summary, computational effort for the proposed CAR SSMs compares favorably against available methods, but there remains a limiting factor.

Hyperparameters
Inference appears to be robust with respect to most of the hyperparameters. We therefore recommend to fix hyperparameters except ψ. Below we give guidelines for specific choices that we find to work well. Let K be the number of mixture components to be used in the MCMC sampler. The value of K needs to be large enough for the model to be flexible to fit the data, but not too large for it to slow down the MCMC sampler. Thus, small K which provides enough flexibility of the model is ideal. We choose where N i is the number of the observations at site i. From our experience, this appears to work for most data sets. If the computation time is scarce, smaller K with C = 1 3 or 1 4 can be used. The values of a, b and τ are chosen to allow for a sufficient number of non-negligible weights, but at the same time to represent a preference for a small number of effective components. If m ih decreases too rapidly with h, the number of effective mixture components gets too small and the mixture model may not be flexible enough. We use log m i,K/2 = 1/2 and log m i,3K/4 = 1/4, and P r(e u i,K/2 ∈ [ 1 4 , 3 4 ]) ≥ 1 2 which renders τ = 0.6. The parameter ψ in the base measure plays a similar role as the bandwidth in the kernel density estimation. We put a prior on ψ, where ψ 0 = 0.7σN −1/5 , ν 0 = 2,σ = 1 n n i=1σ i andσ i is the standard deviation of observations at site s i . The proposed value for ψ 0 is motivated by the rule of thumb for a kernel density estimation bandwidth parameter, suggested by Silverman (1986) as 1.06σN −1/5 . We choose i , α= 0.7 2 n −2/5 and ν = 2, which are chosen so that the mean and variance of N (μ 0 , σ 2 /α) in the base measure match the mean and variance of observations.
The partial correlation of the u i,h variables is controlled by ρ and B. In the Mercer CAR model the partial correlation between u i,h and u j,h is −e −ρd 2 , with d = ||s i − s j ||. For example, if ρ ≈ 0, the partial correlation between two sites are close to 1 and the model may fail to cope with abrupt changes of the densities. Thus, one needs to be careful in choosing the value of ρ. We suggest the following steps to choose ρ. We first determine the fixed distance d 0 at which the partial correlation is 0.5, i.e., e −ρd 2 0 = 0.5. Then, solve the equation to get the value of ρ = (log 2)/d 2 0 . In the above steps, 0.5 can be replaced by a different value. For the Clayton-Kaldor model we choose ρ and B similarly.

Simulation Studies
We carried out two simulation studies to compare the proposed CAR SSMs versus (i) a spatially independent SSM, (ii) the spatially dependent Dirichlet process mixture model (sDP) proposed by Gelfand et al. (2005), (iii) the generalized spatial Dirichlet process model (gsDP) of Duan et al. (2007), and (iv) the spatial dependent probit stick-breaking process model (sPSBP) of Rodríguez and Dunson (2011). In the first simulation study, we carry out repeat experimentation with 100 independent data sets for 13 locations over a univariate grid, while the second simulation study considers a large scale data set over a bivariate grid with 10 × 10 = 100 grid points.
The sDP, gsDP and sPSBP are implemented using JAGS (Plummer, 2003), which can be freely downloaded from http://mcmc-jags.sourceforge.net. For speed, we coded posterior sampling algorithms for the independent SSMs and the proposed CAR SSMs in Fortran functions that we called from R. (We are developing a user-friendly R package to implement our models; a preliminary version is available from http://sites. google.com/site/joseongil/software/). For example, for a two-dimensional large scale data with 100 locations, posterior simulations of independent SSM and the proposed CAR SSMs take about 2 hours and 5 hours to run 30,000 iterations (10,000 burn-in, 20,000 iterations after burn-in period), respectively. By contrast, the sPSBP takes approximately 11 hours and the sDP and gsDP take approximately 10 hours and 72 hours in JAGS, respectively. This may be due to the fact that the algorithms of the sDP, gsDP and sPSBP are implemented through JAGS.
We evaluate the performance of the density estimators by integrated absolute error (IAE) where f s (y) is the density under the simulation truth at location s andf s (y) is the corresponding density estimate. The integral was approximated numerically using Simpson's 3/8 rule over a grid of 200 equally-spaced points in the interval [−6, 6]. We also report the deviance information criterion (DIC), introduced by Spiegelhalter et al. (2002) and the log-pseudo marginal likelihood (LPML) defined as the sum of log conditional predictive ordinates (CPO) (Geisser and Eddy, 1979). Celeux et al. (2006) discussed 8 versions of DICs for models with missing variables. When there are many latent variables in the models, due to poor performance of the estimators, the effective number of parameters p D can be negative. We compute DIC as which is one of two recommended versions by Celeux et al. (2006). Although Celeux et al. (2006) cautioned against this version because its p D can be negative in some models, we use it because our focus of estimation is f (y|θ) and in our examples p D was never negative.
Both, the conditional density of the CPO statistic and DIC, are evaluated for Higher LPML values and lower DIC values are indicative of better models.

One-Dimensional Data
In the first simulation study, we compare the proposed CAR SSMs with inference under a spatially independent SSM and inference under the sDP. The hypothetical data are generated from three simulation truths consisting of mixtures of two normal distributions, a modified version of the simulation model used by Rodríguez and ter Horst (2008). The first two models only have spatially varying weights or atoms, while the third model has both, spatially varying locations and weights. Also, in all models we fix the scale parameters.
The density estimates under both mixture of CAR SSMs perform favorably compared to estimates under the competing models. The results from these examples suggest that the proposed methods utilize information across space and estimate spatially varying densities effectively. The proposed models also provide a better fit to the data. The two proposed CAR SSMs perform comparably with each other. However, the sDP performs less favorably for these data. The model is not flexible enough to identify the two modes of both densities. This may be due to the fact that the dependency of the sDP is modeled through atoms only and not through weights. To have a closer look at the density estimates, we present in Figure 2 density estimates for one simulated data set. For comparison the densities under the respective simulation truth are overlayed in the same plot. Generally speaking, the proposed estimates are closer to the simulation truth than under the competing models. The independent SSM overfits the data by following the empirical distribution (histograms) too closely. For some locations the density estimate has three modes, while each of the proposed CAR SSM density estimates correctly identify two modes.
The data set consists of 2,000 observations for all 100 locations, i.e., 20 observations for each location. In this study, we compare inference under the CAR SSMs with inference under the sDP, the gsDP, and the sPSBP models. Table 2 shows LPML, DIC and run times. Figure 3 displays box plots of IAE values, obtained from data generated under the simulation truth (17). In Table 2, the CK CAR SSM appears to be the best by both criteria, while the Mercer CAR SSM is slightly worse than gsDP. However, the proposed CAR SSMs have the smaller IAE values (Figure 3 (17) and computation times. The computation times reported here are not for comparison of computation times of the models, because Fortran is considerably faster than rjags.
for most locations. Figure 4 shows density estimates for one of the simulated data sets. Similar to the first simulation study, the proposed estimates are closer to the simulation truth than under the competing models.  (17). The boxplots show results under the following models: CKCS(G 0 , a, b, τ 2 , ρ, B) (white), MCS(G 0 , a, b, τ 2 , ρ) (gray), independent SSM (red), sDP (blue), gsDP (skyblue), and sPSBP (brown). Figure 5 shows the number of dominant weights, which is the smallest number of mixture components with the sum of weights larger than 0.9,for the Mercer CAR SSM (black), the CK CAR SSM (red), and gsDP (green) at each location. The proposed CAR SSMs use more dominant weights than gsDP at most locations. In particular, the CK CAR SSM uses more dominant weights than gsDP at all locations. The gsDP captures the spatial structure through both the weights and atoms, while CAR SSMs fits the spatial dependencies with spatially varying weights only. A larger number of nonnegligible weights provides more flexibility to model the desired heterogeneity across locations.
We also examine the prediction performance of the proposed CAR SSMs for 13 unobserved new sites. Figure 6 displays the posterior predictive densities for five selected locations and box plot of IAE values for 13 new sites. Predictive inference under the proposed models is quite accurate, even with the small sample sizes. The proposed methods borrow information across space effectively.

Precipitation over East Asia
We use the proposed CAR SSM models to analyze summer precipitation data collected for 216 sites over East Asia over 33 years, from 1979 to 2011. Figure 7 displays the 216 observation sites over East Asia. For each grid point, we have only 33 observations to construct probability density estimates, which is not enough for precise estimates. However, we expect the nearby densities are similar to each other and therefore expect better estimates if neighborhood information is utilized, as it is implemented in the CAR SSM models. Good inference for summer precipitation is of critical importance in South Korea. The country frequently suffers floods in summer that can cause large monetary loss as well as human casualties. For this reason, at the beginning of each season the Korean meteorology administration (KMA) publishes precipitation estimates as probabilities for three categories (below, normal and above) for subregions of Korea. The category "normal" means that in the coming season the precipitation will be in the middle 1/3 probability range of the past precipitations. The validity of these forecast probabilities hinges on good density estimates that make efficient use of the available data while honestly reporting relevant uncertainties.
In the following analysis we show how the proposed CAR SSM models can provide such density estimates (still without a temporal component for prediction). We carry out inference under the proposed models to obtain the desired density estimates using recorded summer (i.e. June, July and August) precipitation for grid points over East . We use the Climate Prediction Center Merged Analysis of Precipitation (CMAP) data set. See Figure 7 for the grid points.
As before we fix the hyperparameters of the CAR SSMs. We use B = 13 so that the neighborhood of each grid point consists of 8 points around it; and K = 16 (K = n/2) for the discrete probability measures. Inference is based on 20,000 posterior samples from Markov chain Monte Carlo (MCMC) posterior simulation after a burn-in period of 10,000 samples. Table 3 shows LPML and DIC values for fits under both CAR SSMs, MCS(G 0 , a, b, τ 2 , ρ) and CKCS (G 0 , a, b, τ 2 , ρ, B), and under sDP, and sPSBP. The Mercer CAR SSM is favored by both criteria. Both CAR SSMs provide better fits than the competitors. The results suggest that the proposed methods utilize information across space and estimate spatially varying densities effectively. Figure 8 shows the posterior mean estimates under the proposed CAR SSMs, sDP and sPBSP. Figure 9 shows the estimated 33.33 % and 66.66 % percentiles of the precipitation density functions, based on the density estimates, on each grid. Figure 10 shows a predictive density estimate for a single site whose observations were excluded from the analyses. That is, no observations at this site were used for the fit. The predictive density estimates show the difference of Mercer CARSSM and CK CARSSM. The smaller mode in the density estimate of the Mercer CARSSM appears to be an effect from distant observations; while the density estimate of the CK CARSSM seems to have limited effect from distant observations. We investigate sensitivity with respect to some prior specifications. We first repeated the analysis using (a, b) ∈ {(1, 10), (0.5, 20)} and τ ∈ {0.1, 0.5, 1, 4} for both CAR SSMs. The values were chosen to correspond to prior mean weights shown in Figure 11. In all cases, the remaining parameters were fixed as in the simulation study. Figure 12    We also assessed sensitivity of posterior inference with respect to changing the type of kernel in the Mercer CAR SSM from Gaussian, to exponential, Cauchy, or Matérn (Whittle, 1954) kernels and varying the value of the B parameter that specifies the neighborhood size, from B = 6.25 to B = 31.25. The grid for B corresponds to neighborhood sizes from 4 to 12. We find little change in the reported density estimates. Summaries are shown in Figure 13. We conclude that posterior inference is robust with respect to the choice of the kernel and B.

Conclusion
In this paper we proposed dependent species sampling models as an alternative class of nonparametric priors for the estimation of spatially varying densities. The proposed CAR SSMs introduce the desired spatial dependence in the random weights of discrete probability measures. We use CAR models to construct such spatially dependent weights. Since the dependence is modeled through weights, the CAR SSMs are flexible to accommodate spatial variations of densities. This is confirmed by the simulation stud- ies and the real data example. An appealing feature of the proposed models is that the modeling of dependence remains intuitive, as the CAR models are a well-understood class of statistical models. The posterior sampling algorithm we described is greatly simplified after introducing the data augmentation via auxiliary latent variables. CAR SSMs can be applied in any problem where a group of distributions indexed by locations or time need to be estimated simultaneously. One advantage of the posterior sampling algorithm for CAR SSMs is that the two components of the MCMC iterations, (i) cluster membership imputation of each observation and (ii) conditional sampling of the mixture weights P s,h , present strict similarities with extensively studied computational procedures. The first component is present in a large portion of algorithms dedicated to Dirichlet mixture models, and the second component is nearly identical to widely used MCMC strategies for posterior sampling under logit binary regression models. In different words, procedures designed to solve these two important sampling problems can be plugged in and structure the MCMC for CAR SSMs. Additionally, the computational advantages associated to the conditional independence structure of CK CAR models can be inherited in the posterior sampling of the unnormalized weights w s,h . In particular, the conditional independence structure in the models can be capitalized in  In this paper we included an application to inference for precipitation density regression. But CAR SSMs can be applied in many other problems, including other climate variables such as temperature, ozone or CO 2 , density estimation for housing prices, stock market prices etc.
In future work we plan to generalize the precipitation density regression with the inclusion of temporal components. An important remaining hurdle is computation time. New computational methods such as parallel MCMC or variational methods are promising directions. Other important generalizations are the inclusion of discrete and contin-  uous components in the distributions as well as a regression on covariates. The latter can easily be achieved by introducing a regression in the sampling model. Suppose response y si,j has associated covariate x si,j . The response y si,j could then be modeled as y si,j = x si,j β + si,j , where β is a regression coefficient and si,j follows the mixtures of CAR SSMs. When the covariate is discrete, assuming only a few values, the mixtures of CAR SSMs could be applied with covariates being accommodated in the model in the same way as spatial locations. For example, see, e.g., Gelfand et al. (2005) and Duan et al. (2007).