Concentration rate and consistency of the posterior under monotonicity constraints

In this paper, we consider the well-known problem of estimating a density function under qualitative assumptions. More precisely, we estimate monotone non-increasing densities in a Bayesian setting and derive concentration rates for the posterior distribution under a Dirichlet process prior and a finite mixture prior. We prove that the posterior distribution based on both priors concentrates at the rate $(n/\log(n))^{-1/3}$, which is the minimax rate of estimation up to a $\log(n)$ factor. We also study the behaviour of the posterior for the pointwise loss at any fixed point of the support of the density and for the sup norm. We prove that the posterior is consistent for both losses.


Introduction
The nonparametric problem of estimating monotone curves, and monotone densities in particular, has been well studied in the literature from both a theoretical and an applied perspective. Shape-constrained estimation is fairly popular in the nonparametric literature and widely used in practice (see Robertson et al., 1988, for instance). Monotone densities appear in a wide variety of applications such as survival analysis, where it is natural to assume that the uncensored survival time has a monotone non-increasing density. In these problems, estimating the survival function is equivalent to estimating the survival time density, say $f$, and the pointwise value $f(0)$. It is thus interesting to have a better understanding of the behaviour of the estimation procedures in this case. An interesting property of monotone non-increasing densities on $\mathbb{R}^+$ is that they have a mixture representation, pointed out by Williamson (1956),
$$f(x) = \int_0^\infty \frac{I_{[0,\theta]}(x)}{\theta} \, dP(\theta), \qquad (1)$$
where $P$ is a probability distribution on $\mathbb{R}^+$ called the mixing distribution. In order to emphasize the dependence on $P$, we will denote by $f_P$ the functions admitting representation (1). This representation allows for inference based on the likelihood. Grenander (1956) derived the nonparametric maximum likelihood estimator of a monotone density and Prakasa Rao (1970) studied the behaviour of the Grenander estimator at a fixed point. Groeneboom (1985) and, more recently, Balabdaoui and Wellner (2007) studied very precisely the asymptotic properties of the nonparametric maximum likelihood estimator. It is proved to be consistent and to converge at the minimax rate $n^{-1/3}$ when the support of the distribution is compact. Durot et al. (2011) obtain refined asymptotic results for the supremum norm.
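To make the mixture representation concrete, here is a minimal Python sketch (not part of the original paper). It evaluates $f_P$ for a discrete mixing distribution and simulates from $f_P$ by first drawing $\theta \sim P$ and then $X \mid \theta$ uniform on $[0, \theta]$. The function names and the two-atom example are hypothetical and chosen only for illustration.

```python
import random


def density_discrete_mixture(x, atoms, weights):
    """Evaluate f_P(x) = sum_j p_j * 1{0 <= x <= theta_j} / theta_j
    for a discrete mixing distribution P = sum_j p_j * delta_{theta_j}."""
    return sum(p / theta for theta, p in zip(atoms, weights) if 0.0 <= x <= theta)


def sample_monotone(mixing_sampler, n, rng):
    """Draw n points from f_P: theta ~ P, then X | theta ~ Uniform(0, theta).
    `mixing_sampler(rng)` is a user-supplied (hypothetical) sampler for P."""
    return [rng.uniform(0.0, mixing_sampler(rng)) for _ in range(n)]


# Illustrative example: P puts mass 1/2 on each of theta = 0.5 and theta = 1.
rng = random.Random(0)
xs = sample_monotone(lambda r: r.choice([0.5, 1.0]), 2000, rng)
```

With this choice of $P$, the density equals $1.5$ on $[0, 0.5]$ and $0.5$ on $(0.5, 1]$, which is non-increasing as expected.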
From a Bayesian point of view, frequentist results such as consistency or concentration rates are also essential. The Bayesian approach to nonparametric problems requires the construction of priors on an infinite-dimensional space, which cannot be done in a completely subjective way. Thus, as argued in Diaconis and Freedman (1986), strong consistency of the posterior distribution is a key issue in nonparametric Bayesian statistics. Studying concentration rates of the posterior provides more refined results that lead to a better understanding of the behaviour of the posterior. It also allows the comparison between Bayesian and frequentist procedures.
The mixture representation of monotone densities leads naturally to a mixture-type prior on the set of monotone non-increasing densities with support on $[0, L]$ or $\mathbb{R}^+$. For example, Ferguson (1983) and Lo (1984) introduced the Dirichlet process (DP) prior, and Brunner and Lo (1989) considered the special case of unimodal densities with a prior based on a Dirichlet process mixture. The problem of deriving concentration rates for mixture models has received considerable interest in the past decade. Wu and Ghosal (2008) studied properties of general mixture models, Ghosal and van der Vaart (2001) studied the well-known problem of Gaussian mixtures, Rousseau (2010) derived concentration rates for mixtures of betas, and Kruijer et al. (2009) proved good adaptive properties of mixtures of Gaussians. Extensions to the multivariate case have recently been introduced (e.g. Shen and Ghosal (2011); Tokdar (2011)).
Under monotonicity constraints, we derive an upper bound for the posterior concentration rate with respect to some distance $d(\cdot, \cdot)$, that is, a positive sequence $(\epsilon_n)_n$ that goes to 0 when $n$ goes to infinity such that
$$\mathbb{E}_{P_0}\left[\Pi\left(f : d(f, f_0) \ge \epsilon_n \mid X^n\right)\right] \to 0,$$
where the expectation is taken under the true distribution $P_0$ of the data $X^n$ and where $f_0$ is the density of $P_0$ with respect to the Lebesgue measure. Following Khazaei et al. (2010), we study two families of nonparametric priors on the class of monotone non-increasing densities. Interestingly, in our setting the so-called Kullback-Leibler property, that is, the fact that the prior puts enough mass on Kullback-Leibler neighbourhoods of the true density, is not satisfied. Thus the approach based on the seminal paper of Ghosal et al. (2000) cannot be applied. We therefore use a modified version of their results and obtain for the two families of priors a concentration rate of order $(n/\log(n))^{-1/3}$, which is the minimax estimation rate up to a $\log(n)$ factor. We extend these results to densities with support on $\mathbb{R}^+$ and prove that, under some conditions on the tail of the distribution, the posterior still concentrates at an almost optimal rate. To the author's knowledge, no concentration rates have been derived for monotone densities on $\mathbb{R}^+$. Interestingly, the nonparametric maximum likelihood estimator of $f_P(x)$ is not consistent for $x = 0$ (see Sun and Woodroofe (1996) and Balabdaoui and Wellner (2007) for instance). However, we prove that the posterior distribution is still consistent at this point. In fact, we prove the pointwise consistency of the posterior for all $x$ in $[0, L]$ with $L \le \infty$. We then derive a consistent Bayesian estimator of the density at any fixed point of the support. This is particularly interesting, as the pointwise loss is usually difficult to study in a Bayesian framework, Bayesian approaches being well suited to losses related to the Kullback-Leibler divergence.
We also study the behaviour of the posterior distribution for the sup norm. This problem has been addressed recently in the frequentist literature by Durot et al. (2011). They derive refined asymptotic results on the sup norm of the difference between a Grenander-type estimator and the true density on sub-intervals of the form $[\epsilon, L - \epsilon]$ where $\epsilon > 0$, avoiding the problems at the boundaries. Here, we prove that the posterior distribution is consistent in sup norm on the whole support of $f_0$. We also derive concentration rates for the posterior of the density taken at a fixed point and for the sup norm on subsets of $[0, L]$. Surprisingly, the concentration rates do not deteriorate when considering such losses. It seems that monotone non-increasing densities have the same behaviour as 1-Hölderian densities in terms of concentration rates. However, Giné and Nickl (2012) obtained sub-optimal concentration rates in sup norm under a Hölder condition on the true density. Moreover, Arbel et al. (2012) proved that a specific prior in the white noise model which leads to minimax posterior concentration rates is sub-optimal under the pointwise loss and thus under the sup norm. Therefore, in this respect, shape constraints such as monotonicity imply simpler behaviour, since we obtain the minimax concentration rate also under the pointwise loss and the sup norm.
We now introduce some notation which will be needed throughout the paper.
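Notations. For $0 < L \le \infty$, let $\mathcal{F}_L$ denote the set of monotone non-increasing densities on $[0, L]$. Let $KL(p_1, p_2)$ be the Kullback-Leibler divergence between the densities $p_1$ and $p_2$ with respect to some measure $\lambda$,
$$KL(p_1, p_2) = \int \log\left(\frac{p_1}{p_2}\right) p_1 \, d\lambda.$$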
We also define the Hellinger distance $h(p_1, p_2)$ between $p_1$ and $p_2$ as
$$h(p_1, p_2) = \left( \int \left(\sqrt{p_1} - \sqrt{p_2}\right)^2 d\lambda \right)^{1/2}.$$
Finally, we will say that $a_n \lesssim b_n$ if there exists a positive constant $C$ such that $a_n \le C b_n$.

Construction of a prior distribution on $\mathcal{F}_L$
Using the mixture representation (1) of monotone non-increasing densities, we construct nonparametric priors on the set $\mathcal{F}_L$ by considering a prior on the mixing distribution $P$. Let $\mathcal{P}$ be the set of probability measures on $[0, L]$. We thus fall in the well-known setup of nonparametric mixture prior models. We consider two types of prior on the set $\mathcal{P}$.
Type 1 : Dirichlet Process prior P ∼ DP (A, α) where A is a positive constant and α a probability distribution on [0, L].
Type 2 : Finite mixture $P = \sum_{j=1}^{K} p_j \delta_{x_j}$ with $K$ a non-zero integer and $\delta_x$ the Dirac mass at $x$. We choose a prior distribution $Q$ on $K$ and, given $K$, define distributions $\pi_{x,K}$ on $(x_1, \dots, x_K) \in [0, L]^K$ and $\pi_{p,K}$ on $(p_1, \dots, p_K) \in S_K$.
For $X^n = (X_1, \dots, X_n)$, a sample of $n$ independent and identically distributed random variables with common density $f$ in $\mathcal{F}_L$ with respect to the Lebesgue measure, we denote by $\Pi(\cdot \mid X^n)$ the posterior probability measure associated with the prior $\Pi$.
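As an illustration of the two prior families, the following Python sketch draws a mixing distribution $P$ under each type and evaluates the induced density $f_P$. It is only a sketch under stated assumptions: the Dirichlet process draw uses a truncated stick-breaking construction (the truncation level is our choice), the weights of the Type 2 prior are taken to be Dirichlet distributed, and the base measure in the usage lines is uniform on $(0, 1]$. None of these specific choices is prescribed by the text, and all function names are hypothetical.

```python
import random


def draw_dp_truncated(A, base_sampler, n_atoms, rng):
    """Truncated stick-breaking draw of P ~ DP(A, alpha).
    Returns (atoms, weights); the leftover stick mass goes to the last atom."""
    atoms, weights, remaining = [], [], 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, A)        # stick-breaking proportion
        atoms.append(base_sampler(rng))    # atom drawn from the base measure alpha
        weights.append(remaining * v)
        remaining *= 1.0 - v
    atoms.append(base_sampler(rng))
    weights.append(remaining)              # truncation: assign the remaining mass
    return atoms, weights


def draw_finite_mixture(K, base_sampler, dirichlet_params, rng):
    """Type 2 draw given K: ordered atoms from alpha, weights from a Dirichlet
    distribution (assumed here), simulated by normalising Gamma variables."""
    atoms = sorted(base_sampler(rng) for _ in range(K))
    gammas = [rng.gammavariate(a, 1.0) for a in dirichlet_params]
    total = sum(gammas)
    return atoms, [g / total for g in gammas]


def f_P(x, atoms, weights):
    """Density of the mixture of uniforms with mixing distribution (atoms, weights)."""
    return sum(p / theta for theta, p in zip(atoms, weights) if 0.0 <= x <= theta)


# Usage with an assumed base measure: alpha uniform on (0, 1].
rng = random.Random(1)
base = lambda r: 1.0 - r.random()
dp_atoms, dp_weights = draw_dp_truncated(2.0, base, 50, rng)
fm_atoms, fm_weights = draw_finite_mixture(5, base, [1.0] * 5, rng)
```

Any draw obtained this way is a finite mixture of uniforms and hence a non-increasing step density on $[0, 1]$.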
The paper is organised as follows: the main results are given in Section 2, where conditions on the priors are discussed. The proofs are presented in Section 3.

Main results
Concentration rates of posterior distributions have been well studied in the literature, and some general results link the rate to the prior (see Ghosal et al. (2000)). However, in our setting, the Kullback-Leibler property is not satisfied and thus the standard theorems do not hold. We therefore use a modified version of the results of Ghosal et al. (2000), considering truncated versions of the density $f$. This idea has been considered in Khazaei et al. (2010) in a similar setting. The following theorem gives general conditions on priors to achieve a posterior convergence rate $(n/\log(n))^{-1/3}$ for densities in $\mathcal{F}_L$ with $0 < L < \infty$.
Theorem 1. Let $X^n = (X_1, \dots, X_n)$ be an independent and identically distributed sample with common density $f_0$ such that $f_0 \in \mathcal{F}_L$ with $0 < L < \infty$. Let $\alpha$ be a probability density with respect to the Lebesgue measure, with support included in $\mathbb{R}^+$, such that $\alpha > 0$ on $[0, L]$ and such that the condition on $\alpha$ in (2) holds for $\theta$ sufficiently small and $t > 1$. Define also $Q$ a probability distribution on $\mathbb{N}^*$ and $\pi_{p,K}$ a probability distribution on $S_K$ satisfying the remaining conditions in (2) for some positive constants $C_1, C_2, a_1, \dots, a_K, c$, and finally let $(x_i)_{i=1}^{K}$ be the order statistics of $K$ independent and identically distributed random variables from $\alpha$. If $d(\cdot, \cdot)$ is either the $L_1$ or the Hellinger distance, then for $\Pi$ a Type 1 or Type 2 prior, there exists a positive constant $C$ such that
$$\mathbb{E}_{P_0}\left[\Pi\left(f : d(f, f_0) \ge C (n/\log(n))^{-1/3} \mid X^n\right)\right] \to 0$$
when $n$ goes to infinity, where $C$ depends on $f_0$ only through $L$ and an upper bound on $f_0(0)$.
The proof of Theorem 1 is given in Section 3. Conditions (2) are roughly the same as in Khazaei et al. (2010). Theorem 1 is thus an extension of their results to concentration rates. Under an additional condition on the tail of the true density, namely exponential tails, we get the posterior concentration rate for densities with support on $\mathbb{R}^+$.
Theorem 2. Let $X^n = (X_1, \dots, X_n)$ be an independent and identically distributed sample with common density $f_0$ such that $f_0 \in \mathcal{F}_\infty$ and $f_0(x) \le e^{-\beta x^\tau}$ for $\beta$ and $\tau$ some positive constants and $x$ large enough. Let $\Pi$ be as in Theorem 1. Then for some positive constant $C$ we have
$$\mathbb{E}_{P_0}\left[\Pi\left(f : d(f, f_0) \ge C (n/\log(n))^{-1/3} \log(n)^{1/\tau} \mid X^n\right)\right] \to 0$$
for $d(\cdot, \cdot)$ either the $L_1$ or the Hellinger metric when $n$ goes to infinity.
Note that considering monotone non-increasing densities on $\mathbb{R}^+$ deteriorates the upper bound on the posterior concentration rate by a factor $\log(n)^{1/\tau}$. It is not clear whether this bound could be sharpened. For instance, in the frequentist literature, Reynaud-Bouret et al. (2011) observe a slower convergence rate when considering infinite support for densities without any other conditions. In a Bayesian setting, a similar log term appears in Kruijer et al. (2009) when considering densities with non-compact support. However, this deterioration of the concentration rate does not have a great influence on the asymptotic behaviour of the posterior. Note also that the tail conditions are mild, as $\tau$ can be taken as small as needed, and thus the considered densities can have almost polynomial tails.
The above results on the posterior concentration rate in terms of the $L_1$ or Hellinger metric are new to our knowledge but not surprising. The specificity of these results lies in the fact that the usual approach, based on Kullback-Leibler neighbourhoods to bound the marginal density from below, cannot be used here, as explained in Section 1.
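The failure of the Kullback-Leibler property can be seen on a toy example. The Python sketch below (an illustration, not the authors' construction) computes $KL(f_0, f_P)$ exactly for two mixtures of uniforms, which are step functions. When the support of $f_P$ is strictly smaller than that of $f_0$, the divergence is infinite; truncating $f_0$ to $[0, \theta]$ and renormalising gives a finite value. The truncate-and-renormalise step is our reading of the "truncated versions" mentioned in the text, and the function names and example distributions are hypothetical.

```python
import math


def step_density(x, atoms, weights):
    """Mixture-of-uniforms density f_P(x) for P = sum_j p_j * delta_{theta_j}."""
    return sum(p / theta for theta, p in zip(atoms, weights) if 0.0 <= x <= theta)


def kl_step(atoms0, w0, atoms1, w1, upper=None):
    """KL(f0, f1) for two mixtures of uniforms, computed piece by piece.
    If `upper` is given, f0 is first truncated to [0, upper] and renormalised."""
    top = max(atoms0) if upper is None else min(upper, max(atoms0))
    inner = [t for t in list(atoms0) + list(atoms1) if t < top]
    cuts = sorted({0.0, top, *inner})
    pieces = []
    for a, b in zip(cuts, cuts[1:]):
        mid = 0.5 * (a + b)  # both densities are constant on (a, b)
        pieces.append((b - a, step_density(mid, atoms0, w0), step_density(mid, atoms1, w1)))
    mass = sum(length * p for length, p, _ in pieces)  # mass of f0 on [0, top]
    kl = 0.0
    for length, p, q in pieces:
        p /= mass
        if p == 0.0:
            continue
        if q == 0.0:
            return math.inf  # f0 puts mass where f1 vanishes
        kl += length * p * math.log(p / q)
    return kl


# f0 uniform on [0, 1]; f1 a mixture of uniforms supported on [0, 0.9] only.
kl_full = kl_step([1.0], [1.0], [0.5, 0.9], [0.3, 0.7])
kl_truncated = kl_step([1.0], [1.0], [0.5, 0.9], [0.3, 0.7], upper=0.9)
```

Here `kl_full` is infinite because $f_0$ charges $(0.9, 1]$ while $f_1$ does not, whereas `kl_truncated` is finite.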
The following results consider the pointwise loss function, for which no results exist in the Bayesian nonparametric literature apart from the paper of Giné and Nickl (2010). In that paper, the authors propose to use a global test based on a frequentist estimator instead of the finitely many local tests considered in Ghosal et al. (2000), which lead to the entropy condition (2.2) in their Theorem 2.1. When applied to Hölderian classes of densities and under the sup norm loss, this approach induces a suboptimal concentration rate. Interestingly, under shape constraints, as in our case, this leads to the minimax rate of concentration (up to a $\log(n)$ term). We first state a consistency result over $[0, L]$.
We thus have pointwise consistency of the posterior distribution of $f(x)$ for every $x$ in the support of $f_0$, including the boundary. Note that the maximum likelihood estimator is not consistent at the boundaries of the support, as pointed out in Sun and Woodroofe (1996) for instance. In particular, it is not consistent at 0 and, when $L < \infty$, it is not consistent at $L$. It is known that integrating out the parameter, as done in Bayesian approaches, induces a penalisation. This is particularly useful in testing or model choice problems but can also be effective in estimation problems, see for instance Rousseau and Mengersen (2011). The problem of estimating $f_0(0)$ under monotonicity constraints is another example of the effectiveness of the penalisation induced by integration over the parameters. However, contrary to Rousseau and Mengersen (2011), we have not clearly identified how the penalisation acts, and only observe that it leads to a consistent posterior distribution and a consistent estimator. Some refined results on the asymptotic distribution of a monotone non-increasing density estimator at a fixed point can be found in Prakasa Rao (1970) and, more recently, Balabdaoui and Wellner (2007); however, none of these results holds for points in the whole support of $f_0$.
The following Theorem gives an upper bound for the concentration rate on the posterior distribution under the pointwise loss.
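Theorem 4. Let $f_0$ and $\Pi$ be as in Theorem 3 and let $x$ be in $(0, L)$, then for $C$ a positive constant
$$\mathbb{E}_{P_0}\left[\Pi\left(|f(x) - f_0(x)| \ge C (n/\log(n))^{-1/3} \mid X^n\right)\right] \to 0$$
when $n$ goes to infinity.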
We derive from Theorem 3 the consistency of the posterior distribution for the sup norm. This is particularly useful when considering confidence bands, as pointed out in Giné and Nickl (2010). Under assumptions similar to those of Durot et al. (2011), we get the consistency of the posterior distribution for the sup norm. Note that, contrary to Durot et al. (2011), we do not restrict to sub-intervals of the support of the density. This is mainly due to the fact that the Bayesian approaches are consistent at the boundaries of the support of $f_0$.
Let also the prior $\Pi$ be as in Theorem 1. Then the posterior distribution is consistent for the sup norm over the whole interval $[0, L]$.
Similarly to before, we study the concentration rate of the posterior distribution for the sup norm loss. Durot et al. (2011) prove that the nonparametric maximum likelihood estimator achieves the optimal rate when restricting the norm to a compact set strictly included in the support of $f_0$. This is mainly due to the behaviour of the nonparametric maximum likelihood estimator near the boundaries, as pointed out in Kulikov and Lopuhaä (2006). Although we could obtain consistency of the posterior distribution in sup norm over $[0, L]$, contrary to what happens with the MLE, in the following theorem we only get a posterior concentration rate for the sup norm on sub-intervals of $[0, L]$. It is not clear to us that the rate $(n/\log(n))^{-1/3}$ can still be attained for the sup norm over the whole interval $[0, L]$.
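Let also the prior $\Pi$ be as in Theorem 1. Let $0 < a < b < L$ and $C > 0$ be some fixed constants, then the posterior distribution concentrates at the rate $C (n/\log(n))^{-1/3}$ for the sup norm over $[a, b]$.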

Proofs
In this section we prove Theorems 1 to 6 given in Section 2. The proofs of Theorems 1 and 2 are based on a piecewise constant approximation of $f_0$ in the sense of the Kullback-Leibler divergence, following Khazaei et al. (2010). Note that our approximation differs from theirs, which was adapted to the $L_1$ distance but not to the Kullback-Leibler divergence. Our approach is similar to the construction proposed in the proof of Theorem 2.7.5 of van der Vaart and Wellner (1996).

Proof of Theorems 1 and 2
We first give a series of lemmas used to prove Theorem 1. The proofs of these lemmas are postponed to Appendix A. We focus on densities in $\mathcal{F}_L$ with $0 < L < \infty$ and then extend the results to the case $L = \infty$ with exponential tails. The piecewise constant approximation of $f_0$ is based on a sequential subdivision of the interval $[0, L]$, with more refined subdivisions where $f_0$ is less regular. We then identify a finite piecewise constant density with a mixture of uniforms. The following lemma gives the form of a finite probability distribution $\tilde{P}_0$ such that $f_{\tilde{P}_0}$ is in the Kullback-Leibler neighbourhood of $f_0$.
where f P is defined as in (1).
The proof of this Lemma is postponed to Appendix A. The proof of Theorem 1 is adapted from Theorem 2.1 of Ghosal et al. (2000). It consists in obtaining a lower bound on the prior mass of Kullback-Leibler neighbourhoods of any density in $\mathcal{F}_L$. An interesting feature of mixture distributions whose kernels have varying support is that the prior mass of the sets $\{f, KL(f_0, f) = +\infty\}$ is 1 for most $f_0 \in \mathcal{F}_L$. Hence we cannot directly apply the result of Ghosal et al. (2000). We thus extend the approach used in Khazaei et al. (2010) to the concentration rate framework and get results similar to those presented in Ghosal et al. (2000). Let $f_0$ be in $\mathcal{F}_L$, and define where
Lemma 8. Let $\Pi$ be either a Type 1 or Type 2 prior on $\mathcal{F}_L$ satisfying (2) and let $S_n(\epsilon_n, \theta_n)$ be a set as in (11), then Π(S n (ǫ n , θ n ) exp C 1 ǫ −1 n 1 + | log(ǫ n /n)| log
This lemma is proved in Appendix A. The $\epsilon$-metric entropy of the set of bounded monotone non-increasing densities has been shown to be less than $\epsilon^{-1}$, up to a constant (see Groeneboom (1986) or van der Vaart and Wellner (1996) for instance). As the prior puts mass on $\mathcal{F}_L$, on which $f(0)$ is not uniformly bounded, we consider an increasing sequence of sieves
$$\mathcal{F}_n = \{f \in \mathcal{F}_L, \ f(0) \le M_n\}, \qquad (14)$$
where $M_n = \exp\{c n^{1/3} \log(n)^{2/3} (t + 1)^{-1}\}$ with $t$ as in Theorem 1. The following lemma shows that $\mathcal{F}_n$ covers most of the support of $\Pi$ as $n$ increases.
Lemma 9. Let $\mathcal{F}_n$ be defined by (14) and $\Pi$ be either a Type 1 or Type 2 prior satisfying (2), then $\Pi(\mathcal{F}_n^c) \lesssim e^{-c n^{1/3} \log(n)^{2/3}}$.
Here again, the proof is postponed to Appendix A. We now get an upper bound for the $\epsilon$-metric entropy of the set $\mathcal{F}_n$. Recall that in Groeneboom (1985) it is proved that the $L_1$ metric entropy of monotone non-increasing densities on $[0, 1]$ bounded by $M$ can be bounded from above by $C_0 \log(M) \epsilon_n^{-1}$. We thus cannot use the bound $M_n$ in the definition of the set $\mathcal{F}_n$ in (14), as it would give a suboptimal control of the entropy when constructing tests in a similar way as in Ghosal et al. (2000). However, using a modified version of their results presented in Rivoirard and Rousseau (2009), we only have to bound the $\epsilon$-metric entropy of the sets $\mathcal{F}_{n,j} = \{f \in \mathcal{F}_n, \ j\epsilon_n \le d(f, f_0) \le (j + 1)\epsilon_n\}$ for $j \in \mathbb{N}^*$. We can easily adapt the results of Groeneboom (1985) to densities on any interval $[a, b]$ and get the following lemma.
Lemma 10. Let $\tilde{\mathcal{F}}$ be the set of monotone non-increasing functions on $[a, b]$ such that for all $f$ in $\tilde{\mathcal{F}}$, $\int_a^b f \le M_2$ and $f \le M$, then
The proof of this Lemma is straightforward given the results of Groeneboom (1985) and is thus omitted. Let $x_{n,j} \in [0, L]$ be such that $\epsilon_n/2 \le x_{n,j} \le \epsilon_n$. We denote, for all $f$ in $\mathcal{F}_{n,j}$, $f_{1,j} = f I_{[0, x_{n,j})}$ and $f_{2,j} = f I_{[x_{n,j}, L]}$. For all $f$ in $\mathcal{F}_{n,j}$ we have $x_{n,j} f(x_{n,j}) \le x_{n,j} f_0(0) + (j + 1)\epsilon_n$, which in turn gives $f(x_{n,j}) \le f_0(0) + 2(j + 1)$. Recall that for all $f \in \mathcal{F}_n$ we have $f(0) \le M_n$. Using Lemma 10, we construct an $\epsilon_n/2$-net for the set $\mathcal{F}^1_{n,j} = \{f_{1,j}, f \in \mathcal{F}_{n,j}\}$ with $N_1$ points, and $\log(N_1) \lesssim \epsilon_n^{-1} \log(M_n + 1) \epsilon_n (j + 2)$, and thus deduce
$$\log(N_1) \le C' n \epsilon_n^2 j^2. \qquad (15)$$
Similarly, given that $f(x_{n,j}) \le M + 2(j + 1)$, we get an $\epsilon_n/2$-net for the set $\mathcal{F}^2_{n,j} = \{f_{2,j}, f \in \mathcal{F}_{n,j}\}$ with $N_2$ points and This provides an $\epsilon_n$-net for $\mathcal{F}_{n,j}$ with less than $N_1 \times N_2$ points. Given (15) and (16), the $L_1$ metric entropy of the sets $\mathcal{F}_{n,j}$ satisfies $\log N(\mathcal{F}_{n,j}, \epsilon_n, L_1) \lesssim n \epsilon_n^2 j^2$.
Here we only control the entropy of slices of the sieves, and thus cannot directly apply the results of Ghosal et al. (2000). A modified version of this theorem proposed in Rivoirard and Rousseau (2009) (Theorem 4.1) can be used in that case. Nonetheless, this result requires the Kullback-Leibler property, which is not satisfied here. Given Theorem 2.1 of Khazaei et al. (2010), we get that the same result holds in the convergence rate setting by replacing $\epsilon$ by $\epsilon_n^2$ in their proof. We thus derive a modified version of Theorem 4.1 of Rivoirard and Rousseau (2009), given in Appendix B, which is valid without the Kullback-Leibler property. Thus Theorem 1 is proved.
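As a check on the orders of magnitude used in this proof, the following worked computation (added for the reader, using only $\epsilon_n = (n/\log(n))^{-1/3}$ and the definition of $M_n$) shows that the exponent in Lemma 9 is exactly $c\,n\epsilon_n^2$ and that $\log M_n$ is of the same order. The last line reads the intermediate bound on $\log(N_1)$ as $\epsilon_n^{-1} \log(M_n + 1)\, \epsilon_n (j + 2)$, which is our interpretation of the display above.

```latex
\begin{align*}
n\epsilon_n^2 &= n \left(\frac{\log n}{n}\right)^{2/3} = n^{1/3} (\log n)^{2/3}, \\
\log M_n &= \frac{c}{t+1}\, n^{1/3} (\log n)^{2/3} = \frac{c}{t+1}\, n\epsilon_n^2, \\
\epsilon_n^{-1} \log(M_n + 1)\, \epsilon_n (j+2)
  &= (j+2) \log(M_n + 1)
  \le 3j \left( \frac{c}{t+1}\, n\epsilon_n^2 + \log 2 \right)
  \lesssim n\epsilon_n^2\, j^2, \qquad j \ge 1.
\end{align*}
```

The inequality uses $j + 2 \le 3j$ for $j \ge 1$ and $\log(M_n + 1) \le \log(2 M_n)$, valid since $M_n \ge 1$.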
Extension to $\mathbb{R}^+$
Given that $f_0(x) \lesssim e^{-\beta x^\tau}$ when $x$ goes to infinity, if $\theta_n$ is such that $\theta_n = \inf\{x, \ 1 - F_0(x) < \epsilon_n/(2n)\}$ then $\theta_n \lesssim (\log(n))^{1/\tau}$. Similarly to before, we can approximate a restriction of $f$ to $[0, \theta_n]$ by a piecewise constant function $g$ as in Lemma 7. Setting $g(x) = 0$ if $x > \theta_n$, we get an approximation of $f$ and Lemma 7 still holds. Similarly, Lemma 8 still holds under the exponential tail assumption. We now get an upper bound for the $\epsilon$-metric entropy of $\mathcal{F}_{n,j}$. Similarly to before, we split $\mathcal{F}_{n,j}$ into two parts. The construction of an $\epsilon_n/2$-net for $\mathcal{F}^1_{n,j}$ does not change and therefore (15) holds. Finally, let $\tilde{\mathcal{F}}^2_{n,j} = \{f \in \mathcal{F}^2_{n,j}, \ \forall x > \theta_n, \ f(x) = 0\}$. Given Lemma 10, we get for $c_1 > 0$ large enough an $\epsilon_n/(2c_1(j + 1))$-net for $\tilde{\mathcal{F}}^2_{n,j}$ by considering $f^\star$, the restriction of $f$ to $[x_{n,j}, \theta_n]$. We have $d(f, f^\star) \le c_2 (j + 1)\epsilon_n$ where $d(\cdot, \cdot)$ is either the $L_1$ or the Hellinger distance. Hence, for $c_1 > c_2$, we obtain an $\epsilon_n/2$-net for $\mathcal{F}^2_{n,j}$ with at most $e^{c_3 n \epsilon_n^2 j^2}$ points, and thus $\log N(\mathcal{F}^2_{n,j}, \epsilon_n, d) \le C'' n \epsilon_n^2 j^2$. We conclude using the same arguments as in the preceding section, and thus Theorem 2 is proved.
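The order of $\theta_n$ can be made plausible by a short computation. The sketch below is not taken from the source: it reads $\theta_n$ as the first point at which the tail mass $1 - F_0$ falls below $\epsilon_n/(2n)$, assumes $f_0(x) \le e^{-\beta x^\tau}$ for $x \ge x_0$ and $\epsilon_n \ge n^{-1/3}$, and uses a generic constant $C_\tau$ and a trial point $x_n$ that we introduce for illustration.

```latex
\begin{align*}
1 - F_0(x) &\le \int_x^\infty e^{-\beta s^\tau}\, ds
  = \frac{1}{\tau \beta^{1/\tau}} \int_{\beta x^\tau}^\infty u^{1/\tau - 1} e^{-u}\, du
  \le C_\tau (1 + x)\, e^{-\beta x^\tau}, \qquad x \ge x_0, \\
x_n &:= \left( \frac{3 \log n}{\beta} \right)^{1/\tau}
  \;\Longrightarrow\;
  1 - F_0(x_n) \le C_\tau (1 + x_n)\, n^{-3}
  < \frac{n^{-4/3}}{2} \le \frac{\epsilon_n}{2n}
  \quad \text{for $n$ large enough}, \\
\theta_n &\le x_n = \left( \frac{3}{\beta} \right)^{1/\tau} (\log n)^{1/\tau}.
\end{align*}
```

The first line uses the substitution $u = \beta s^\tau$ and the standard tail behaviour of the incomplete gamma integral; the second holds because $(1 + x_n) n^{-3}$ is of smaller order than $n^{-4/3}$.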

Proof of Theorems 3 and 4
Define $f(0)$ as $\lim_{x \searrow 0} f(x) = f(0^+)$. Since $f$ is monotone non-increasing, this limit exists and is uniquely defined. It is well known that the maximum likelihood estimator is not a consistent estimator of $f_0(0^+)$ (see Balabdaoui et al. (2011) for instance). In this section, we prove that the Bayesian approaches we consider do not suffer from this and that the posterior distribution of the density at any fixed point of the support of $f_0$ is consistent. Moreover, we prove that the Bayesian approach yields a consistent estimator, namely the posterior median. For the sake of the presentation, we only consider the case $L = 1$. The same results hold for any $L$ such that $0 < L \le \infty$. We first prove consistency of the posterior distribution for $f(x)$ for $x \in [0, 1]$ and thus prove the first part of Theorem 3.
We must therefore prove that if (1) Note that Theorem 2.1 implies that so that we can substitute $A_\epsilon$ with $A_\epsilon \cap \{|f_P - f_0|_1 \le \epsilon_n\}$, which we still denote $A_\epsilon$. Let $c_n = 5(\log(n))^{1/3}/\epsilon$. We have Let $\hat{f}_n$ be the maximum likelihood estimator of $f_0$ if $0 < x < 1$. Let $\hat{f}_n$ be such that $\hat{f}$ We define the test functions $\phi^x_n = I_{|\hat{f}_n(x) - f_0(x)| > c_n n^{-1/3}}$. It is known from Prakasa Rao (1970) and Kulikov and Lopuhaä (2006) We now obtain an exponentially small bound for the type II error of the test Following Kulikov and Lopuhaä (2006), we consider the inverse process $U_n$ defined as $U_n(a) = \operatorname{argmax}_{t > 0} (F_n(t) - at)$, where $F_n$ is the empirical cumulative distribution function, with $a = \epsilon/2 + f(x)$. As $|f_P(x) - f_0(x)| > \epsilon$. We denote We thus have for $n$ large enough which leads to We now recover the direct process and obtain Note that, for all $t_n \le x_n$, Since $f$ is monotone non-increasing, we thus deduce that, for $t_n \le x_n$ P f sup and, taking $t_n = x_n/2$, we get We now have to study the empirical process $\sqrt{n}(F_n(t) - F(t))$ to control $N_n$.
These processes have been widely studied; van der Vaart and Wellner (1996) obtained large deviation inequalities for them. In order to use the results of van der Vaart and Wellner (1996), we have to control the metric entropy of the set $\mathcal{Q} = \{I_{[0,t]} ; t \in [0, 1]\}$. The following lemma gives an upper bound for $N_{[\,]}(\epsilon, \mathcal{Q}, L_2)$. The proof is postponed to Appendix A.
Now using Theorem 2.14.9 of van der Vaart and Wellner (1996), we have n /32 e −nx 2 n /33 and thus have Similarly to the proof of Theorem 1, following Khazaei et al. (2010), we get an exponentially small lower bound for $D_n$. More precisely, we get that $D_n \ge 2e^{-(c+2) n \epsilon_n^2}$ with probability that goes to 1. Note that Given the preceding results, we have And with equation (21), given that for $n$ large enough $x_n \ge c_n n^{-1/3}$, we get that which ends the proof of the first part of Theorem 3. Note that for $x$ in the interior of the support of $f_0$ the same proof holds if we replace $\epsilon$ by $\epsilon_n = (n/\log(n))^{-1/3}$. We thus get the proof of Theorem 4.
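The test functions used in this proof are built from the nonparametric maximum likelihood (Grenander) estimator. As an illustration only, the following Python sketch computes that estimator as the left derivative of the least concave majorant of the empirical distribution function, and evaluates the indicator $\phi^x_n$ with the threshold $c_n n^{-1/3}$, $c_n = 5(\log(n))^{1/3}/\epsilon$, as defined above. The function names are hypothetical, observations are assumed to be strictly positive, and the simulated data in the usage lines are our own example.

```python
import math
import random


def _cross(o, a, b):
    """Cross product of o->a and o->b; it is >= 0 when a lies on or below the chord o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def grenander(sample):
    """Return (knots, slopes): the estimator equals slopes[k] on (knots[k], knots[k+1]].
    Built from the upper convex hull of the points (0, 0), (x_(i), i/n)."""
    xs = sorted(sample)
    n = len(xs)
    points = [(0.0, 0.0)] + [(x, (i + 1) / n) for i, x in enumerate(xs)]
    hull = []
    for p in points:
        # Remove points that would break concavity of the majorant.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    knots = [h[0] for h in hull]
    slopes = [(b[1] - a[1]) / (b[0] - a[0]) for a, b in zip(hull, hull[1:])]
    return knots, slopes


def grenander_at(x, knots, slopes):
    """Evaluate the estimator at x (value at 0 taken as the right limit, 0 beyond the data)."""
    for k, s in enumerate(slopes):
        if knots[k] < x <= knots[k + 1]:
            return s
    return slopes[0] if x == 0.0 else 0.0


def phi_n(sample, x, f0_x, eps):
    """Indicator that |f_hat_n(x) - f_0(x)| exceeds c_n * n^(-1/3)."""
    n = len(sample)
    c_n = 5.0 * math.log(n) ** (1.0 / 3.0) / eps
    knots, slopes = grenander(sample)
    return abs(grenander_at(x, knots, slopes) - f0_x) > c_n * n ** (-1.0 / 3.0)


# Simulated data from the step density equal to 1.5 on [0, 0.5] and 0.5 on (0.5, 1].
rng = random.Random(2)
data = [rng.uniform(0.0, rng.choice([0.5, 1.0])) for _ in range(500)]
knots, slopes = grenander(data)
```

The returned slopes are non-increasing and the resulting step function integrates to one, as expected of a monotone density estimate.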

Consistency of a Bayesian estimator
We consider in this section $\hat{f}^\pi_n(t)$, the Bayesian estimator associated with the absolute error loss, defined as the median of the posterior distribution. Consistency of the posterior mean, which is the most common Bayesian estimator, is not proved here but could nevertheless be an interesting result.
We first define $\hat{f}^\pi_n(t)$ such that In order to get consistency in probability we note that if $\hat{f}^\pi$ We deduce, with Markov inequality and Theorem 1 and similarly → 0 which gives the consistency in probability of $\hat{f}^\pi_n(t)$.
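In practice the posterior median at a point $t$ would be approximated from posterior draws of the mixing distribution. The Python sketch below assumes such draws are available as pairs (atoms, weights), for instance from an MCMC sampler that is not described in this paper. The function name and the three draws in the usage lines are hypothetical.

```python
import statistics


def posterior_median_at(t, posterior_draws):
    """Pointwise posterior median of f_P(t).
    `posterior_draws` is a list of (atoms, weights) pairs, each describing a
    discrete mixing distribution P drawn from the posterior (assumed given)."""
    values = [
        sum(p / theta for theta, p in zip(atoms, weights) if 0.0 <= t <= theta)
        for atoms, weights in posterior_draws
    ]
    return statistics.median(values)


# Three illustrative draws; f_P(0.2) equals 1.5, 1.0 and 1.6 respectively.
draws = [([0.5, 1.0], [0.5, 0.5]), ([1.0], [1.0]), ([0.25, 1.0], [0.2, 0.8])]
estimate = posterior_median_at(0.2, draws)
```

The median is taken separately at each point $t$, which matches the absolute error loss considered in this section.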

Proof of Theorems 5 and 6
In this section we prove that the posterior distribution is consistent in sup norm. We prove that, if the posterior distribution is consistent at the points of a sufficiently refined partition of [0, L] then it is consistent for the sup norm.
Here again, we will only consider the case $L = 1$ without loss of generality. We first denote Note that given Theorem 3, we have that Given that $f$ is monotone non-increasing, and given the hypotheses on $f_0$ we have and for the same reasons and thus, taking the supremum over $x$, we get We then deduce which gives the consistency of the posterior distribution in sup norm.
We now prove that we get the optimal concentration rate when considering the sup norm over any fixed sub-interval of $[0, 1]$. To do so, we use the same approach as before, replacing $\epsilon$ by $\epsilon_n = (n/\log(n))^{-1/3}$. We thus have an increasing number of bins $p = p_n$ in the partition $(x_i)_i$. Note that $p_n \approx \epsilon_n^{-1}$.
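The passage from pointwise control on a grid to sup norm control rests on a sandwich bound that monotonicity provides. The following is a sketch of such a bound for a generic partition $0 = x_0 < x_1 < \dots < x_p = 1$ and two non-increasing functions $f$ and $f_0$; the notation is ours and may differ from the authors' exact displays.

```latex
\begin{align*}
&\text{For } x \in [x_i, x_{i+1}]: \qquad
f(x_{i+1}) - f_0(x_i) \;\le\; f(x) - f_0(x) \;\le\; f(x_i) - f_0(x_{i+1}), \\
&\text{hence} \qquad
|f(x) - f_0(x)| \;\le\; \max_{k \in \{i,\, i+1\}} |f(x_k) - f_0(x_k)|
  \;+\; \bigl( f_0(x_i) - f_0(x_{i+1}) \bigr), \\
&\text{and} \qquad
\sup_{x \in [0,1]} |f(x) - f_0(x)| \;\le\; \max_{0 \le k \le p} |f(x_k) - f_0(x_k)|
  \;+\; \max_{0 \le i < p} \bigl( f_0(x_i) - f_0(x_{i+1}) \bigr).
\end{align*}
```

The first term is controlled by pointwise results at finitely many points, and the second term is small when the partition is fine and $f_0$ is sufficiently regular.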
We first control the rate of convergence of the type I error of a single test $\phi^{x_i}_n$ defined as before. Prakasa Rao (1970) found the asymptotic behaviour of the nonparametric maximum likelihood estimator at a fixed point $t \in (0, L)$ and proved that with $C$ a positive constant and $Z = \operatorname{argmax}\{W(u) - u^2, \ u \in \mathbb{R}\}$ with $W$ a standard Wiener process on $\mathbb{R}$ originating from 0. Groeneboom (1989) studied the process $Z$ very precisely. Given point (iii) of Corollary 3.4 of Groeneboom (1989) and equation (24), we get that for $t \in (0, L)$ We thus deduce that Similarly to (18) The proof of Theorem 2.1 of Khazaei et al. (2010) gives us $\Pi(D_n > 2e^{-(c+2) n \epsilon_n^2}) \le C/(n \epsilon_n^2)$ and similarly to (22) we get for all $x \in (0, 1)$ The two last terms of (25) are exponentially small. We thus deduce that Similarly to before, we prove Theorem 6.

Discussion
In this paper, we obtain an upper bound for the concentration rate of the posterior distribution under monotonicity constraints. This is of interest since, in this model, the standard approach based on the seminal paper of Ghosal et al. (2000) cannot be applied, given that the Kullback-Leibler property is not satisfied. We prove that the concentration rate of the posterior is (up to a $\log(n)$ factor) the minimax estimation rate $(n/\log(n))^{-1/3}$ for standard losses such as $L_1$ or Hellinger.
More interestingly, we prove that the posterior distribution is consistent for the pointwise loss at any point of the support and for the sup norm loss. We also derive almost optimal concentration rates of the posterior distribution for these losses. Giné and Nickl (2012) derived results on posterior concentration for the sup norm. They prove that the posterior concentration rate in this case is suboptimal under Hölder constraints on the true density. Note that, similarly to Giné and Nickl (2012), our results on the sup norm are proved using tests constructed from a frequentist estimator $\hat{f}_n$ of the true density. Pointwise loss corresponds to estimating an unsmooth linear functional (see Arbel et al. (2012) or Li and Zhao (2002)), and the recent literature on concentration rates in semiparametric models, in particular for functionals of curves, shows that this is a difficult problem (see Rivoirard and Rousseau (2009) or Kleijn and Knapik (2012)). Interestingly, in the context of shape-constrained densities, it turns out to be not so difficult.
Rousseau, J. and Mengersen, K. (2011). Asymptotic behaviour of the posterior distribution in overfitted mixture models. J. R. Stat. Soc. Ser. B Stat. Methodol., 73(5):689-710.
Shen, W. and Ghosal, S. (2011). Adaptive Bayesian multivariate density estimation with Dirichlet mixtures. ArXiv e-prints.
Sun, J. and Woodroofe, M. (1996). Adaptive smoothing for a penalized NPMLE of a non-increasing density. J. Statist. Plann. Inference, 52(2)
Proof. For a fixed $\epsilon$, let $f$ be in $\mathcal{F}_L$. Consider $\mathcal{P}_0$ the coarsest partition : at the $i$th step, let $\mathcal{P}_i$ be the partition we split the interval $[x_{j-1}, x_j]$ into two subsets of equal length. We then get a new partition $\mathcal{P}_{i+1}$. We continue the partitioning until the first $k$ such that $\varepsilon_k^2 \le \epsilon^3$. At each step $i$, let $n_i$ be the number of intervals in $\mathcal{P}_i$, $s_i$ the number of intervals in $\mathcal{P}_i$ that have been divided to obtain $\mathcal{P}_{i+1}$, and $c = 1/\sqrt{2}$. Thus, it is clear that using Hölder inequality.
We then deduce that Now, for $f \in \mathcal{F}_L$, we prove that there exists a stepwise density with less than $K_0 M^{2/3} L^{1/3} \frac{1}{\epsilon}$ pieces such that In order to simplify notations, we define . We consider the partition constructed above associated with $f^{1/2}$, which is also a monotone non-increasing function that satisfies $f^{1/2}(0) \le M^{1/2}$ (instead of $M$). We denote by $g$ the function defined as $g(x) =$ We then define $h = g^2 / \int g^2$ and and get an equivalent of $\int g^2$.
Let $P$ be a probability distribution defined by thus $f_P = h$ and, given the previous result, Lemma 7 is proved.
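The identification of a non-increasing step density with a mixture of uniforms can be made explicit. The Python sketch below (an illustration in our own notation, not the authors' display) takes a step density $h$ with value $h_j$ on $(x_{j-1}, x_j]$, $x_0 = 0$, and returns the weights $p_j = (h_j - h_{j+1}) x_j$ with $h_{K+1} = 0$, so that $P = \sum_j p_j \delta_{x_j}$ satisfies $f_P = h$. The function names and the numerical example are hypothetical.

```python
def step_to_mixing_weights(breaks, heights):
    """Weights p_j = (h_j - h_{j+1}) * x_j of the mixing distribution with atoms
    at the breakpoints, for a non-increasing step density with value heights[j]
    on (breaks[j-1], breaks[j]] (and breaks[-1] read as 0 for the first piece)."""
    next_heights = list(heights[1:]) + [0.0]
    return [(h - h_next) * x for h, h_next, x in zip(heights, next_heights, breaks)]


def mixture_density(x, atoms, weights):
    """Mixture-of-uniforms density f_P(x) for P = sum_j p_j * delta_{x_j}."""
    return sum(p / theta for theta, p in zip(atoms, weights) if 0.0 <= x <= theta)


# Step density equal to 2.5 on (0, 0.2], 1.25 on (0.2, 0.4], 0.625 on (0.4, 0.8].
breaks = [0.2, 0.4, 0.8]
heights = [2.5, 1.25, 0.625]
weights = step_to_mixing_weights(breaks, heights)
```

The weights are non-negative because $h$ is non-increasing, and they sum to $\int h = 1$ by summation by parts, so $P$ is indeed a probability distribution.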