On a nonparametric resampling scheme for Markov random ﬁelds

: We study an extension to general Markov random ﬁelds of the resampling scheme given in Bickel and Levina (2006) [4] for texture synthesis with stationary Markov mesh models. The procedure generates bootstrap replicates of a sample using kernel regression and the principle of Gibbs sampling. Consistency of the bootstrap distribution is investigated under the Dobrushin contraction condition. Some simulation examples are given, in particular for the texture synthesis, for which the multiscale algo- rithm of Paget and Longstaﬀ (1998) [27] is revisited.


Introduction
The aim of this paper is to study an extension of a resampling scheme investigated by Bickel & Levina [4] for a particular class of stationary Markov random fields. This resampling method is in turn an adaption to random fields of the local bootstrap for Markov processes introduced by Paparoditis & Politis [28] for bootstrapping some time series. The basic principle of this bootstrap procedure can be formulated as follows: given a sample of a process, generate a replicate of this sample whose characteristics are similar to the original one. Generally, a stationary Markov model is defined by an invariant measure with respect to some probability kernels. Under some conditions, it is possible to simulate the model, using these kernels and a convenient recursive algorithm. Then, in order to reproduce the behaviour of the original time series or random field, a natural idea is to use nonparametric estimators of those kernels and to simulate a replicate of the sample, using the same algorithm. The term "bootstrap" comes from the fact that the nonparametric estimators of the conditional distributions used in [28] and [4] are based on a weighted empirical measure of the sample. This empirical measure is constructed using the idea of the kernel regression estimation. For example, in the case of a p−Markovian time series, the purpose is to estimate the regression function E (½ Xt≤x |X t−1 , . . . , X t−p ).
In [28], this method is shown to be applicable to a large class of Markovian time series in order to estimate some statistics of the original process. A variant of this method was investigated by Monbet & Marteau [25] for multivariate processes.
Bickel & Levina [4] studied an extension of this method to a class of Markov random fields in the texture synthesis framework. The goal of texture synthesis can be stated as follows: given a texture sample, synthesize a new texture that appears similar to a human observer. Texture mapping or image compression are frequent applications for such algorithms. The stochastic nature of texture variations makes it a particularly natural area for applying statistical methods but usually the consistency of those algorithms are not investigated. In [4], the motivation of the authors was to give a theoretical justification to the popular algorithm of Efros and Leung [13] used for texture synthesis. But a crucial assumption made on the random field is that its simulation can be done recursively, starting from a seed and ordering the pixels as for the simulation of a time series. In particular, if (X s ) s∈Z 2 denotes a random field, the rectangular lattice R (the sites of the pixels) of Z 2 is ordered setting R = {s 1 , . . . , s k } and it is assumed that for i ∈ {1, . . . , k}, the conditional distribution of X si given the "past" X si−1 , . . . , X s1 depends only on a limited number of variables. The consequence is that a texture can be simulated quickly with a single sweep. A typical example of such random fields were introduced by Pickard [29]. Although these models do not have a physical interpretation for a spatial phenomenon, they remain numerically efficient when the conditional kernels used for the simulation depend on more and more variables of the "past". In fact, these models have been introduced in the context of image analysis in order to approximate the general Markov-Gibbs model which is more difficult to handle: a recursive computation of joint distributions is generally impossible for the general model and Markov chains methods such as the Gibbs sampler or the Metropolis algorithm must be used for the simulation. However, this restrictive class of random fields approximates generally very well the textures for the statistics distinguishable by a human observer when large data sets are available (see in particular the simulation results given in [4]).
In this paper we will not make the same assumption as in [4] and we will study an extension of the results given in [4] for a general Markov-Gibbs random field satisfying some regularity conditions. We will consider a classical stochastic relaxation algorithm: the Gibbs sampler. Gibbs sampling is a Markov chain method used to simulate a sample of a random field, using its local conditional distribution (LCD), that is for s ∈ Z 2 the distribution of X s given (X t ) t =s (which only depends on a finite number of variables if X is a Markov random field). Then a natural idea is to build a nonparametric estimator of the LCD, using the same idea as Bickel & Levina [4] and to simulate a replicate of the random field. However, in general, it is not possible to define a joint distribution associated to the nonparametric estimator of the LCD. This point makes the problem difficult to study because our algorithm will simulate a strange non-Markovian distribution whose shape depends on the estimator but also on the size of the desired replicate. Nevertheless we give some conditions under which our method has consistency properties (i.e convergence to the distribution of the original random field).
We believe that studying such an extension is interesting for several reasons. First, our approach enlightens the good behaviour of another algorithm for texture synthesis introduced by Paget & Longstaff [27] and the resampling scheme studied is in the spirit of their work, using the nonparametric regression estimation introduced in [28] and extended in [4] to the random fields case. Secondly, although Gibbs sampling leads to long simulation times, this stochastic relaxation method applies to general Markov random fields, taking into account all the directions in the restitution of their local interactions. Motivated by other statistical applications, we believe that bootstrapping the "true" model instead of an approximation is more natural. Of course, the simulation of an image using a single sweep is numerically a great advantage and the speed of the algorithm is a major problem for the applications of texture synthesis. But except simulation time, the general statistical model should give better results, provided of course finding a suitable algorithm. However we do not claim that the approach studied in this paper is a satisfying answer, especially for the texture synthesis problem because the general Gibbs model is very difficult to handle, as shown by the impressive multiscale algorithm of Paget and Longstaff for texture synthesis (see Section 5), necessary to speed up the algorithm and to avoid problems related to local minima of the energy. Moreover the texture synthesis problem has been extensively studied over the last decade and several complex algorithms give now better results for this particular problem (see in particular [22,12] or [21]). But to our knowledge, a consistency result for bootstrapping the distribution of a general Markov field has not been investigated yet.
Finally, it could be interesting to consider other statistical applications. As pointed in [4], this kind of resampling scheme could also be used to approximate the distribution of some linear statistics, although its principle differs from some classical methods (using blocks) studied for stationary time series (see Künsch [20]) or random fields (see Politis & Romano [30]). This approach is used in [28] for time series. Moreover in [25], the simulation of missing data is investigated for time series. The so-called local grid bootstrap of [25] was successfully applied to wind data (see [2]) and it seems of interest to extend this approach to spatiotemporal phenomena. In the spatial context, one could use the observed sites for the estimation of the LCD and in a second step the Gibbs sampling for the simulation of the missing data. In this kind of application, ordering the different sites for the simulation seems not very natural. One can also argue that all the statistics are not distinguishable by a human observer and that the (visual) good results obtained in [4] for the texture synthesis should be extended to more natural random fields.
The paper is organized as follows. In Section 2, we recall the results of Bickel and Levina about their nonparametric resampling for a particular class of Markov random fields. In Section 3, we extend the resampling principle of Section 2 to general Markov random fields and some consistency results are proved for our extension. The example of a nearest-neighbor Markov random field with a pairwise potential is detailed in Section 4. Section 5 is devoted to the texture synthesis problem: we recall the multiscale algorithm used by Paget and Longstaff, which has an independent interest and we give several simulation examples. Proofs of our investigations are postponed to the last section of the paper.

Principle
We first recall the Markov Mesh Models (MMM in the sequel) algorithm introduced by Bickel and Levina [4] in the context of the texture synthesis problem. This algorithm is different from the original algorithm of Efros and Leung [13] by the order in which pixels are filled in the synthesized texture (raster instead of spiral), and the weights with which the pixels are resampled.
In the sequel we consider {X t , t ∈ N * × N * } a real-valued random field and a positive integer o ∈ N * . We will use the following notations: The set U (2.1) Now, the MMM resampling algorithm of Bickel and Levina [4] can be presented. First assume that a MMM X is observed on rectangle Then consider a family of kernels (W (ℓ) ) ℓ∈N * that are Borelian functions W (ℓ) : R ℓ → [0, ∞) satisfying some general smoothness assumptions (see Assumption (A4) in the Appendix). Moreover, for a resampling width b > 0 and all ℓ ∈ N * , define In the sequel, for simplicity, we will omit the exponent o and ℓ for respectively U On a nonparametric resampling scheme for Markov random fields

1507
The MMM resampling algorithm The aim of the algorithm is to generate a texture There are 3 main steps in this algorithm.

Select a starting value for {X
Typically the starting value will be a (o + 1) × (o + 1) square random chosen from the observed field (X t ) t∈IT . 2. Suppose that there exists (u, v) ∈ R T such that X * t has been generated for t ∈ {1, . . . , u − 1} × {1, . . . , v T } ∪ {u} × {1, . . . , v − 1}, that is, u − 1 rows are filled in completely, and the row u is filled up for the column v. To generate the next value X * t = X * (u,v) , let N t be a discrete random variable with probability distribution is a normalizing constant. Note that the set of all possible s is such that the conditioning neighborhood of s fits within the observed texture field. 3. Generate N t and set X * t = X * (u,v) = X Nt . In Figure 1, we show two steps in the progress of the MMM algorithm, just after the choice of the seed (step 1 above) and when the neighborhood of the pixel is full (see the center of the picture in Figure 1). In Figure 2, we illustrate the difference with the original algorithm of Efros and Leung. Here, the seed is located at the center of the new texture and the synthesis uses a spiral ordering. At each step, a pixel X * t is chosen conditionally to some pixels previously simulated and located in a square window centered at t. Another difference with the MMM algorithm is the choice of uniform weights for the synthesis (see [4] for details). The MMM algorithm is formulated for a particular class of random fields, the Markov Mesh Models (also known as Pickard random fields [29]) which were introduced by Abend, Harley and Kanal [1]. These models have been developed for image applications and can be efficiently simulated. The resampling scheme described above is an adaption to random fields of a method proposed for bootstrapping Markovian time series ( [31,28]).

Consistency results
To state the consistency of the above algorithm, we need the following notations: • for y ∈ R ℓ with ℓ ∈ N * , y is the usual Euclidean norm.
Moreover we define the following subsets of N * × N * : To show the consistency of their algorithm, Bickel and Levina prove a general lemma about the estimation of the local conditional distribution function F X|Y (x|y) = P(X t ≤ x|Y t = y) (see Theorem 2 in [4]). More precisely they establish the convergence of the following sample cumulative conditional distribution function, defined by for (x, y) ∈ S×S A and T ∈ N * ×N * such that I T = ∅. Here Z T = s∈IT W bT (y− Y s ). Note that F T is a classical nonparametric regression estimator of the conditional expectation E (½ Xt≤x |Y t = y). We recall the following theorem whose assumptions are reported to the Appendix.
Theorem 1 (Theorem 2 [4]). If X is a MMM satisfying assumptions (A1)-(A4) (see section 6.1), then for all A ∈ Z 2 such that |A| < ∞, Note. Theorem 1 will also be used for our extension. Moreover an almost sure convergence rate for this approximation will be given (see Theorem 4).
Using Theorem 1, Bickel & Levina show the consistency of their MMM algorithm and also of the original spiral resampling algorithm of Efros and Leung, using appropriated conditional independence properties (we refer to [4] for details). This resampling scheme uses the kernel regression estimation and requires some regularity assumptions (see Assumptions (A1-4) in section 6.1). As it is pointed in [4], these assumptions are perfectly plausible for most real textures: the mixing property is natural for stochastic textures, the compactness assumption is always satisfied since the number of gray levels is finite, and this number is sufficiently high in most of real textures to make the smoothness assumptions plausible.
However, the random fields used for the two algorithms described above must satisfy suitable conditional independence properties. For example, when o = 1, the rows of the MMM form a Markov chain. This is also the case for the Efros & Leung algorithm for which the last simulated row on a square lattice depends (conditionally to the previous rows) only on the last row: then it is easily seen that the rows of a such stationary process form a Markov chain. Moreover if o > 1, the rows on a square lattice evolve as a multivariate o−markovian time series.
For a general Markov-Gibbs model, such properties do not hold in general and it is not possible to simulate a sample using a single sweep. For example, consider the simple example of the four nearest neighbor AR model with Gaussian white noise with 0 < |α| < 1 4 . Then it is shown in Guyon [18] (see p. 13, Theorem 1.3.2 and its proof) that this Gaussian Markov random field does not admit a representation of the form where e t is independent of (X t−j ) j∈J , J is a half space of Z 2 and K is a finite subset of J. However, such an "unilateral" representation is necessary in order to apply the Markov property at each step of the MMM algorithm or the Efros & Leung algorithm. For the simple class of linear gaussian random fields, the unilateral type representation writes generally as an infinite linear expansion (see [18], Example 1.3.3). Then for a general Markov random field, it would be necessary to take in account of all the pixels previously simulated in order to apply the MMM or the spiral algorithm. Of course, applications of some Markov properties using a large neighborhood size o, provide an approximation of the model (see [3]). A class of Markov random fields satisfying these unilateral type Markov assumptions were introduced in [1] and [29]. These models have been developed for image applications and they remain numerically efficient tools which approximate very well the general model, as shown by the simulations given in [4]. The stationary models used in [4] satisfy the usual Markov property even for the sites located on the boundary of a square lattice. Such processes were characterized in Champagnat & al. [5] and as pointed by the authors, the class of such models is restrictive compared to general MRF's. Having in mind other statistical applications than the texture synthesis problem, we believe that a resampling scheme that can apply to a general Markov random field is of interest. For example, the simulation of missing data in meteorology, using an ordering for the different sites, seems not very appropriated. This is why we investigate in the next section, an extension of the algorithms described above for the general model using Gibbs sampling.

Extension of the resampling algorithm for a general Markov-Gibbs model
The goal of this section is to study convergence properties of a resampling scheme that can apply to more natural random fields than MMM. In particular, the model is not supposed to satisfy assumption (2.1). We will use the nonparametric estimators (2.2) of the local conditional distributions of the random field. Basically, our extension of the algorithm given in [4] can be formulated as follows:

Choose some values for the pixels {X
Choose randomly a site j of R T , extract the neighborhood X * j+No and choose a new pixel X s with a probability given by the conditional distribution F T (constructed with A = N o ).

Repeat point 2.
Here N o denotes a full square centered at 0. The algorithm just formulated above has similarities with the Gibbs sampler with a random visiting scheme used for the simulation of a Markov random field. However, as it will be explained in this section, this algorithm is not exactly a Gibbs sampler but just a Markov chain approach for the simulation of a bootstrap sample (theoretically, the step 2 must be repeated infinitely often). Note that at each step we use a conditional distribution with respect to a full neighborhood of a given site. Contrarily to the MMM algorithm, there is no initialization with a seed.

An approximated Gibbs sampler for Markov random fields
In this paper we consider a Markov-Gibbs random field X over Z 2 with a continuous compact state space S ⊂ R and we denote by µ its distribution. More precisely, denoting by λ S the Lebesgue measure on S and we assume that the probability measure µ on S Z 2 satisfies the following definition.

Definition 2.
A probability measure µ on S Z 2 is said to be the distribution of a stationary Markov random field X with neighborhood system N o if there exists a family of applications (called a Gibbs potential) Φ = {φ A : S A → R, A ∈ S} indexed by S, the set of all finite subsets of Z 2 such that and for V ∈ S, the conditional distributions of µ are given by: We refer to Guyon [18] for an introduction to Markov random fields. Using this formulation, it is easily seen that the following Markov properties are satisfied: The set t+N o is called the set of neighbors of the site t. Moreover P X t |X Z 2 \{t} is referred to the local conditional distribution (LCD). It is possible to show that any stationary random field satisfying (3.3) with a positive conditional density over S can be defined using a Gibbs potential Φ as in Definition 2. This characterization is given by the Hammersley-Clifford theorem (see [18]) and has been studied by several authors (see for example [3]). In general, a Markov random field satisfies the condition In contrast, the more restrictive class of MMM assumes additional conditional independence properties (see property (2.1)). A difficulty which arises with Gibbs random fields is the phase transition problem. For a given potential Φ, several random fields distributions can satisfy (3.1) and natural candidates are the cluster points of the probability kernels given by the right hand side of (3.1), when V ր Z 2 . In the sequel, we denote by G(Φ) the set of probability measures whose conditional distributions are given by (3.1). We refer to Georgii [17] for an introduction to Gibbs random fields and the phase transition phenomenon.
In order to extend the consistency proof of Bickel and Levina to general Markov random fields, we use the one point conditional distributions (2.2) with A = N o . However, in this case, the one point distribution (2.2) cannot be in general the LCD of a random field. Proposition 1. Suppose that |I T | ≥ 2. Then when W is the Gaussian kernel, the one point conditional law F T cannot be the LCD of a random field.
In particular, no Gibbs potential can be associated to F T when W is the Gaussian kernel. It is well known that for a given one point conditional distribution, several constraints are required to be the conditional distribution of a random field (see [18] p. 98). Those constraints are satisfied if the form of the one point conditional distribution is given by a Gibbs potential, as in (3.1). Thus, we could avoid this compatibility issue using a nonparametric estimator of the Gibbs potential. Unfortunately, this kind of estimation is not very well investigated in its full generality and its study is beyond the scope of this paper.
Nevertheless, we will use the conditional distributions F T and the principle of the Gibbs sampler for the simulation of an approximated realization of the random field X. We refer to [16] for the use of the Gibbs sampler (see also [18] Then, given a sequence z ∈ S Z 2 , the Gibbs sampler gives an approximated simulation of a realization of the distribution µ RT (·|z) = P(X RT |X ∂RT = z ∂R T ). The principle is to simulate a S RT −valued Markov chain with invariant distribution µ RT (·|z). When G (Φ) = {µ}, the compactness assumption ensures that weakly, whatever the value of the boundary conditions z ∂RT . The transition P (z) RT of this Markov chain is constructed using a visiting scheme for the sites of R T : at each time a site is chosen, we replace the value at this site by a realization of the conditional distribution F X|Y . A requirement is that each site has to be visited infinitely often. Here are two well known examples of transitions for the Gibbs sampler: • The periodic visiting scheme. Here we denote by ≺ the lexicographic order relation on R T . If x, x ∈ S RT and s ∈ R T , we define the vectors xx(s) ∈ S No such that xx(s) j = x s+j if s + j ≺ s and xx(s) = x s+j otherwise, completed with the boundary conditions x s+j = z s+j or x s+j = z s+j if the site (s, j) ∈ R T × N o is such that s + j / ∈ R T . Then we define the following transition on S RT : (3.4) • The random visiting scheme with uniform law uses the transition: From now on, we fix an arbitrary element z ∈ S Z 2 for the boundary conditions of the Gibbs sampler. Then we just write P RT instead of P (z) RT and the conditional distribution µ RT (·|z) is simply denoted by µ RT .
Suppose that we use the conditional distributions F T and the Gibbs sampler to synthesize a new texture on the rectangle R T . We suppose here that assumptions of Theorem 1 hold. Though the conditional distributions F T are not compatible with a Markov random field, one can use those distributions to simulate a Markov chain. The transition of this Markov chain can be constructed by replacing F X|Y by F T in (3.4) or (3.5) and we define Then if the marginal distribution of the Markov chain with transition P * T,RT converges a.s to an invariant probability µ * T,RT on I RT we hope that µ * T,RT is not too far from the unique invariant distribution µ RT associated to the transition P RT (the unicity holds if the transition P RT is positive, see [24]).
Note. Comparated to the method studied in [4], our extension uses the same nonparametric estimation but requires several sweeps on R T for the simulation of one bootstrap sample and then a high computational cost. Numerically, a simulation procedure using a single sweep is of course a great advantage, especially for the texture synthesis problem. However, without additional assumptions of conditional independence, it is not possible to simulate a sample using a single sweep.

Asymptotic when R T = R
Here we consider the case R T = R for all T (i.e we consider a fixed rectangle R for the simulation and an increasing size for the original sample). We are going to study the convergence of the sequence (µ * T,R ) T , defined in (3.8), to the distribution µ R . Usually, the size of the bootstrap sample must increase to infinity with the size of the original sample. This is the case for statistical applications (see [28] for applications to the distribution of some linear statistics) but also for the texture synthesis problem, where the bootstrap sample is generally larger than the original one. Then the asymptotic investigated here is not the most natural, but a consistency result can be stated using the assumptions of Theorem 1 only. We also point out that the family of distributions (µ * T,R ) R is not ensured to be coherent, in the sense that which cannot be the conditional distribution of a measure defined on a product (see Proposition 1 and its proof). This problem suggests that the shape of the bootstrap distributions depends not only on the size of the sample T , but also on the size of R. It also emphasizes that a classical nonparametric estimation of the LCD is not enough to estimate the full Markov model: this point is an important difference with respect to the markovian time series studied in [28].
The following theorem states a result of continuity of the invariant measure µ * T,R with respect to the transition P * T,R . Since the state space of the process X is compact, Theorem 1 and a tightness argument lead to the following result Theorem 2. For T ∈ N 2 such that I T = ∅, let P * T,R be the transition defined in (3.4) or (3.5) replacing F X|Y by F T . Assume that the assumptions of Theorem 1 hold. Then P * T,R has a unique invariant distribution µ * T,R and for all initial value x (0) ∈ I R T , the approximated Gibbs sampler converges to µ * T,R in distribution. Moreover we have almost surely:

Notes
1. It is easily seen that the regularity assumption (A3) of Theorem 1 is satisfied when the potential Φ is a family of continuously differentiable applications. 2. Theorem 2 shows the consistency of the bootstrap distribution only for the simulation on a fixed rectangle R. Of course, it is more interesting to investigate the case of increasing bootstrap samples X * RT with R T ր Z 2 . The difficulty for such an asymptotic is that no information seems available for the bootstrap distribution µ * T,RT except its invariance with respect to the Markov kernel P * T,RT . In particular, the conditional distributions of µ *

T,RT
are unknown and it is difficult to derive properties of some marginal distributions. However, Theorem 2 suggests that such an asymptotic should be possible if the growth of the sequence (R T ) T is moderated and G(Φ) = {µ}.
In the next section, we show that the sequence (R T ) T can be chosen arbitrarily under a strong assumption on the LCD.

Asymptotic when R T ր Z 2
The case R T ր Z 2 is the most natural for the bootstrap and its applications (e.g texture synthesis). We only consider the case of the Gibbs sampler with a random visiting scheme (3.5). Our aim here is to study the convergence of the sequence (µ * T,RT ) T to µ, the distribution of the random field X. The difficulty is that the transition P * T,RT (x, ·) defined in (3.5) converges weakly to the Dirac measure δ x and this limit is not informative to study the cluster points of the sequence (µ * T,RT ) T . On the other hand, the single site conditional distributions associated to µ * T,RT are not F T in general and are just defined through the implicit equation µ * T,RT = P * T,RT µ * T,RT ; a tractable expression of the bootstrap distributions seems not available. Moreover, the sequence (µ RT ) T can have several cluster points if the random field exhibits a phase transition, i.e |G(Φ)| > 1. A sufficient condition ensuring the unicity of the Gibbs measure is the Dobrushin contraction condition (see [8]). A survey of the Dobrushin contraction technique and its application is given in [15]. As we will see, this kind of condition is also sufficient to give a positive answer for the convergence of the sequence (µ * T,RT ) T . Let us first recall the definition of the contraction coefficients in our setting. If | · | denotes the usual distance on S, we recall that the Wasserstein metric of order 1 is defined for two distributions G and H on S by where the supremum is taken over all Lipschitz functions f on S with: Another expression is given by Then it can be shown (see [15]) that under the condition Finally, we denote by d R the distance between probability measures on S R defined by Then we have the following result.
Theorem 3. Assume (3.10). Then we have almost surely Moreover, for an arbitrary sequence of rectangles R T ր Z 2 , we have almost surely lim Notes 1. Condition (3.10) is a restrictive assumption because it assumes that the random field X has very weak interactions. Under this condition and using Theorem 4, a rate of convergence for the invariant measure of the approximated Gibbs sampler is available. We did not prove the convergence of µ * T,RT to the distribution µ of the sample assuming only that there is no phase transition. As mentioned before, the main difficulty is due to the lack of available local properties of the measure µ * T,RT , e.g local conditional distributions. This problem could be avoided using a nonparametric estimation of the Gibbs potential (although the corresponding simulation scheme will probably be no more linked to a resampling method). Indeed, in this case, an estimation of the local conditional distribution would be compatible with the existence of joint distributions, which is a failure in our context as shown in Proposition 1. Nonparametric estimators of the Gibbs potential seem to have been investigated only for particular cases. One can mention the work of Dachian [6] who considers {0, 1}−valued random fields or [7] for the nonparametric estimation of a pairwise-interaction Gibbs point process. On the other hand, an estimation of each function Φ C where C is a clique of N o and Φ C is defined in (3.1) could lead to a much more complicated procedure for the estimation of the local conditional distribution than the method studied here which is a simple extension of that of Bickel and Levina, using nonparametric regression estimation. 2. We mention that the more classical Dobrushin condition for unicity (i.e G(Φ) = {µ}) is formulated using the total variation distance. In this case the condition (3.10) becomes But contrarily to the Wasserstein metric, the total variation distance is not compatible with the convergence stated in Theorem 1 for the LCD. 3. In this paper, we study the Gibbs sampler with random scanning using the uniform distribution. We mention that one can also use different distributions {c j,T : j ∈ R T } for the visiting scheme, for which lim T →∞ c T,j = c j > 0 with j∈Z 2 c j = 1. In this case, the cluster points of the sequence (µ * T,RT ) T can be shown to be invariant measures for an infinite volume Gibbs sampling. Additional developments are required in order to study the unicity of these cluster points; this problem will be investigated in a subsequent paper.

A convergence rate in Theorem 1
In this section, we establish an almost sure convergence rate for the estimators (2.2), which is a result of independent interest. Here we consider a random field X and a finite subset A of Z 2 \ {0}. A convergence rate for the Theorem 2 of [4] can be established. In fact, as mentioned previously, the proof of this theorem does not use the MMM property (2.1) and applies to more general random fields. Following Bickel and Levina's proof, we obtain the following result.

Notes
• The convergence rate in Theorem 4 is depending on a power law of [T ] and the maximal exponent of convergence rate that we can obtain is 1 2(1+v) (when τ → ∞ for the mixing assumption). Note that if o = 0 (corresponding to an independent random field) then v = 0 and the convergence rate is arbitrary close to [T ] 1/2 . • From this result, one can also obtain a convergence rate for joint distributions (see in particular Theorem 6.1). Such a rate could be interesting to obtain additional properties of the bootstrap distribution.

The choice of parameters
From the above results, we have a theoretical method to resample a Markov random field. It is enough to use the sample {(X s , Y s )/s ∈ I T } and to run an approximated Gibbs sampler with the conditional distribution F T evaluated with the help of a kernel W . But from a practical point of view, the choice of two parameters need to be discussed: the neighborhood size o in the Markov property and the value of the bandwidth b T . In this paper, we focus on the bandwidth parameter b T assuming that the neighborhood size o is known. The quality of the bootstrap approximations depends heavily on how close the function F T (·|y) is to the LCD F X|Y (·|y). Usually, for the nonparametric kernel regression, a value of b T which is too small will give predictions with too high variance while a value of b T which is too large will lead to an oversmooth function with high bias. In [28], the authors propose to equilibrate the bias and the variance, fitting a Gaussian AR process to the data. If X denotes a p-Markovian time series, they assume that X satisfies the following equation where ξ is a Gaussian white noise (ξ 0 ∼ N (0, σ 2 )). In this case, assuming that the kernel W satisfies W (y) = Π p i=1 W (y i ) (e.g the Gaussian kernel), the number b T = b T (y) which minimizes the quantity (Bias (F T (x|y))) 2 + Var (F T (x|y)) dx, is given by (see [28]) Moreover W 1 = W 2 (y)dy and W 2 = y 2 1 W (y 1 )dy 1 < ∞. Note that an estimatorb AR (y) of b AR (y) can be derived substituting the unknown quantities in (3.12) by sample estimates or fitting the linear model to the data. The advantage of such an approximation is that the values of b AR depend on the particular state y of the process. In fact this method can also be adapted to the case of random fields, assuming in our case and using a straightforward modification of (3.12).
One other approach is to use cross validation in the spirit of Härdle and Vieu [19]. For example, one can compute the kernel estimator using a part of the sample and then a prediction error using the second part of the sample. In [19], the case of the leave-out technique is studied for a two-dimensional time series.
We propose to combine the two approaches mentioned above. More precisely, for a particular state y, we define a bandwidth of the form b T (y) = C T × b AR (y). (3.13) Then we first compute an estimatorb AR as in [28] using the formula (3.12) and after the constant C T is estimated using cross-validation for the kernel estimation of the mean E (X t |Y t ), separating the sample into two parts of equal size.
In this case, the scale parameter C T could correct the misspecification made by fitting the Gaussian model to the data. Of course, one can modify this approach using cross-validation for the kernel estimation of E (½ Xt≤x |Y t ) and minimizing an integrated error over x. One can also use the leave-out method for the second step or just one step with cross validation only. However, more complicated methods will require more programming effort. In the next section, we use the two-step method described previously for the choice of the bandwidth parameter. We also mention that these methods also apply to the MMM algorithm, with appropriated modifications.

The example of a pairwise-interaction Gibbs potential
In this section we consider the case of a Markov random field whose state space is S = [a, b], the reference measure is the Lebesgue measure and the (translation invariant) Gibbs potential is given for A ⊂ N 1 by otherwise. (4.1) A random field X associated with the previous Gibbs potential is a nearest neighbor Markov random field (i.e o = 1), whose one point conditional densities are given for i ∈ Z d by It is a particular example of a continuous spin Ising model. The following proposition makes condition (3.10) more precise in this context: Proposition 2. If j∈N1 |β j | < 12 (b−a) 2 , then condition (3.10) holds.

Notes
• One can also extend this result for a neighborhood size o ≥ 1.
• The more classical unicity condition of Dobrushin which is expressed in term of total variation distance (see 3.11) is less easy to precise. One can consider the stricter condition given by Simon [32] which writes For the potential defined in (4.1), when a = −b < 0, we obtain the condition j∈No |β j | < 1 b 2 which is more restrictive than that of Proposition 2. • For a potential given by (4.1), it is straightforward to verify the regularity assumptions (A2) and (A3). However the assumption (A1) is less clear. It is well known that mixing conditions with exponential decays hold under the usual Dobrushin unicity condition (see [17]).When for example b = −a, assumption (A1) is satisfied if j∈No |β j | < 1 b 2 (which implies the condition (3.11) by the remark above). Without condition (3.11), it is not easy to prove mixing properties of the random field. This problem occurs in particular if we assume only the weaker contraction condition given in Proposition 2. Nevertheless, the contraction condition (3.10) implies covariance inequalities between Lipschitzian functionals of some marginals of the unique Gibbs measure (see [15]). In this case, one can verify the Lipshitzian type mixing conditions introduced by Doukhan & Louichi [10] and an analogue of inequality (1) for the moment of partial sums is available (see [11]). Thus, with this slight modification of assumption (A1), Theorem 1 still holds. Details are omitted.
The problem of phase transition is illustrated in figure 3 when all the β j 's equal to 0.5, β = 0 and S = [−1, 1]. In this case, a simulation of a sample of size 200 × 200 does not give the same distribution when the boundary conditions satisfy z ≡ −1 or z ≡ 1, as it is usually the case in statistical mechanics when the inverse of the so-called temperature (here 0.5) is sufficiently high to ensure that |G(Φ)| > 1. Figure (4) shows a sample and its replicate in two cases. For the first case, we set β = 0 and the β j 's are given by the following array Observe that the condition given in Proposition 2 does not hold, but we choose the coefficients β j large enough in order to make the "attraction-repulsion" phenomenon in the Gibbs potential sufficient for a visual analysis of the interactions and of the quality of the resampling. Indeed, condition (3.10) is related to weak interactions for the random field and local characteristics of the sample are less distinguishable. In figure 3, in order to avoid a possible dependency of the simulation with respect to the boundary conditions, we use the same boundary for the sample and the bootstrap replicate. Of course, one could also estimate the distribution of some linear statistics as in [28] and a comparison with the MMM algorithm studied in [4] or the block method [30] would be of interest. This can be done for model (4.1), under the condition of Proposition 2 which ensures the consistency of the bootstrap distributions. Those aspects will not be considered in the present paper: a statistical comparison between the MMM and the method of Gibbs sampling requires the simulation of several bootstrap samples for several original samples and then a high computational coast. This kind of comparison could be investigated more seriously in a subsequent paper.

Paget and Longstaff method
For the texture synthesis problem, Paget and Longstaff [27] investigate a nonparametric method also based on the Gibbs sampler, using kernel density estimation, but the consistency of their algorithm was not considered. Their nonparametric estimator of the LCD is not directly linked to the regression estimation and resampling but all the results formulated in section 3 have their analogue. However, we choose to formulate our results in a resampling context and in the spirit of the existing literature ( [4,28]). For highly structured textures, the convergence of the Gibbs sampler can lead to some numerical difficulties, especially if the neighborhood size o must be very large to capture the global characteristics of the texture and if the Gibbs potential of the underlying stochastic process has a large number of local minima. In this case, a huge number of iterations could be necessary in order to reach the equilibrium.
Then Paget and Longstaff proposed a multiscale algorithm, using a multigrid representation of an image, as shown in Figure 5 which is taken from [27]. If The set of sites S l at level l represents a decimation of the previous set of sites S l−1 at the lower grid level l − 1. The neighborhood system is redefined for each grid level l > 0: N l t = {s ∈ S l / t − s ∞ ≤ o}. For a level grid l, the Gibbs sampling is not applied to the sites s ∈ S l+1 . In fact, a multiscale representation of a Markov random field is used (see [23] for a precise definition). To better incorporate the multiscale relaxation described above, Paget and Longstaff have also introduced a pixel temperature function in order to determine when to stop the Gibbs sampler at one level and start it at the next level. Let l be a grid level. A pixel temperature is incorporated in equation (2.2) by modifying the form of the difference In fact, at the beginning of the stochastic relaxation at a level l, they define for a site j ∈ S l of the output texture the pixel temperature c j as follows: c j = 0 if j ∈ S l+1 and c j = 1 otherwise. The difference d is replaced by d ′ such that: When a pixel x t has been relaxed in the stochastic relaxation process, Paget and Longstaff set: where ξ < 0 is fixed by the user.
Here, the idea is to provide a total confidence to pixels coming from the preceding resolution and to progressively increase the confidence level of a pixel synthesized in the present resolution. When c j = 0 ∀j ∈ S l , the stochastic relaxation process is considered to have reached an equilibrium state, indicating that the image can be propagated to the next lower grid level. This notion of temperature is related to the global temperature used in stochastic annealing (see [16]). Although we have incorporated this pixel temperature function for texture synthesis, we will not study in this paper statistical properties of such an approximation.

Texture synthesis examples
In order to illustrate the behaviour of the resampling scheme for the texture synthesis problem, we use the multiscale algorithm with the pixel temperature function described above in order to simulate a new texture from a sample. The Gaussian kernel is chosen for the simulations.
• The neighborhood size is set at o = 3 or o = 4. • As in [26], we set ξ = −1. Moreover 4 or 5 grid levels are used for the synthesis. Figure 6 shows a step of the synthesis at the highest resolution. The Gibbs sampler runs in the raster ordering (the periodic visiting scheme (3.4)) and    Figure 6 shows the first sweep. One can see that the lower resolutions give the shape of the texture. Moreover the pixel temperature function helps for a good initialization of the sampler. Simulation results are also given in Figure 8 and Figure 7 and the visual quality is comparable to the simulations given in [4]. Note that small discontinuities appear for the bootstrap sample in Figure 7. Those problems could be explained by the presence of several local maxima for the true probability distribution of the sample but also by additional local maxima due to the nonparametric estimation. A comparison between the different algorithms in the texture synthesis setting is not the objective of the present paper. Generally, the MMM and the multiscale algorithm give good results and the computer graphics community has explored several variants and complex computational tools very well adapted for this difficult problem over the last decade (see in particular [22,12] or [21]), in particular for improving the spiral algorithm of Efros & Leung. Instead of a visual appreciation of the simulations, we believe more informative a comparison of some statistical properties of the different resampling schemes used in their simplest form, in order to bootstrap the distribution of a simple model (for example model (4.1)).

Assumptions of Theorem 1
(A1) The random field X is strictly stationary and α-mixing: for k, c ∈ N * , we assume that the strong mixing coefficients where the suppremum is taken over all the subsets E, for a given ǫ > 0 and a real number τ > 2 such that for c = 2 ⌈τ /2⌉. (A2) The distribution of X t has a compact support S ⊂ R. (A3) F X,Y = P (X t ≤ ·, Y t ≤ ·), F X|Y and F Y = P (Y t ≤ ·) have bounded continuous strictly positive densities (denoted f X,Y , f X|Y and f Y respectively) with respect to Lebesgue measure. Moreover, there exists L > 0 such that for any y, y ′ ∈ S A , any x ∈ S, (A4) The family of kernels (W (ℓ) ) ℓ∈N * is such that W (ℓ) : R ℓ → (0, ∞) are bounded, symmetric and first-order Lipschitz continuous functions such that for all ℓ ∈ N * , Moreover, the width of W

Proof of Theorem 4
We follow the proof of theorem 2 of Bickel and Levina in order to compute a convergence rate. We first recall the following lemma which proof can be found in [9]. Lemma 1. (Moment inequality). Let F t be a real-valued random field indexed by I ⊂ Z d satisfying conditions (A1). If EF t = 0, F t ∈ L τ then there exists a constant C depending only on τ and mixing coefficients of F t such that .
It is easy to see that if sup t F t ∞ ≤ M , then we obtain: For (x, y) ∈ S × S A , we set: We have: Following the proof of lemma A2 in [4], we prove the following result for 0 < γ < τ −2 2(v+1)(τ +v+2) .
Proof of Lemma 2 In this proof, we will denote by C > 0 a generic constant which does not depend on T . Let δ > 0 such that b T = O [T ] −δ , then the proof of lemma A2 in [4] leads to sup Then we need to bound sup (x,y)∈S×S A |r T (x, y) − Er T (x, y)|.
As in [4], we define and we need to bound sup (x,y)∈S×S A 1 [T ] t∈IT Z t,T (x, y) .
As S × S A is compact, we can cover S × S A with N T cubes I i,T with centers (x i , y i ) and sides L T for the supremum norm. Without loss of generality, we suppose x 1 ≤ · · · ≤ x NT and we set x 0 = x 1 − L T and x NT = x NT + L T . Then • First let us deal with term II. Using assumption (A4) for the kernel, we have for t ∈ I T and x ∈ (x i−1 , x i ]: x i+1 ). Note that assumption (A2) about the existence of densities allows to derive the bound: We have: , we obtain using (A1) and lemma 1

L. Truquet
By the choice of γ, we have β(v + 1) + (γ + vδ − 1)τ + τ 2 < −1 and we deduce from the Borel Cantelli lemma that Now using the previous inequalities, we deduce that: • Now we turn on the term I. For a real number γ < τ −2−2vδτ −2β(v+1) 2τ , we have using (A1) and lemma 1: By the choice of γ, we have and by the Borel Cantelli lemma, we have Now we choose the number β such that: This leads to β = τ −2+2τ δ 2(v+τ +1) and to the following rate: Finally, we choose δ for an equilibrium with the bound (6.3), solving the equation: This leads to: which gives the rate given by the Lemma 2.

Proof of Proposition 1
Suppose that Q is a probability measure on E = I Z 2 T whose one point conditional distributions are given by F T . For h ∈ Z 2 , we denote by p h : z → z h the projection defined on E. We consider k ∈ (N * ) 2 and i ∈ N o such that {k, k+i} ⊂ I T . We have: On the other hand, we have also: Using that W (u) = (2π) −v/2 exp(− u 2 /2) as well as the form of the distributions F T , the equality between the two expressions of Q(p0=pi=X i+k /p h =X k ,k / writes: Then we derive: This is a contradiction with the density assumption for the conditional law X k /X s , s = k.

Proof of Theorem 2
In the theory of Markov processes, conditions ensuring the continuity of the invariant distribution with respect to the generator can be found in [14] (see theorem 9.10) and the following theorem is a particular case for discrete time Markov chains with a compact state space. For completeness of this work we give a proof of the following theorem.
Theorem 5. Suppose that for T ∈ N 2 such that I T = ∅, Q T is a transition on I R T ⊂ S R . Assume that there exists a transition P on S R satisfying for a given continuous and bounded real valued function g on S R : We suppose that Q T (resp. P ) admits an unique invariant distribution ν T (resp. µ). Then we have almost surely: Proof of Theorem 5 By definition of ν T , we have the following equality: where B(S R ) denote the Borel σ−algebra on S R . Since S R is a compact metric space, the tightness of the sequence (ν T ) T implies the existence of a cluster point denoted by ν. We are going to show that ν = µ. Then by uniqueness of the cluster point, we will conclude: Almost surely : lim To show that ν = µ, it is sufficient to prove that ν is an invariant probability of the transition P on S R . Then we need to show that ν(A) = P (x, A)ν(dx), ∀A ∈ B(S R ). Suppose that (T n ) n∈N is sequence in N * × N * such that lim n→∞ ν Tn = ν. Then if g is a continuous and bounded real valued function on S R , we have: By assumption, the function x → g(y)P (x, dy) is still bounded and continuous and the weak convergence of the sequence (ν Tn ) n implies that B → 0 (n → ∞).
More, A → 0 (n → ∞) follows directly from the assumption 1. Then, by the uniqueness of the limit of the sequence (ν Tn ) n , we conclude that ν(dy) = P (x, dy)ν(dx) and the conclusion of the theorem follows. Theorems 5 and 4 can now be applied to prove the convergence of the invariant distributions of the approximated Gibbs sampler for the two visiting schemes (3.4) and (3.5).
Proof of Theorem 2 First, from the positivity assumption on the kernel W in Theorem 2.2, P * T,R is a.s a positive transition on the finite state space I R T . Then this Markov chain has a unique invariant distribution µ * T,R and almost surely, we have for a given starting value x (0) ∈ I R T : lim n→∞ P * T,R n x (0) , dx = µ * T,R (dx), in distribution, and the approximated Gibbs sampler converges to µ * T,R . Now we are going to show that transitions P * T,R , P R satisfy the assumptions of Theorem 5. The positivity of the transition P R has been already discussed for both cases. As µ = P (X R |X ∂R = z ∂R ) is an invariant distribution for the transition P R , we know from theoretical results about Markov chains that µ is the unique invariant distribution for P R (see [24]). Assumption 2 in Theorem 5 holds from assumption (A2) on f X|Y and the Lebesgue theorem. So we only need to verify assumption 1 in Theorem 5. To this end, we first prove that for a given continuous and bounded real valued function h on S R × S R , we have ∀s ∈ R: Indeed if ǫ > 0, the uniform continuity of h on S R × S R entails the existence of a number δ > 0 such that Now choose a subdivision a 1 , . . . , a k of S such that for i ∈ {1, . . . , k − 1} either (a i , a i+1 ) ∩ S is an empty set, either |a i − a i+1 | < δ. Then for s ∈ R, we define the function h k,s on S R × S R by where for ℓ ∈ {1, . . . , k − 1}, κ (ℓ) is the element of S R such that κ (ℓ) s = a ℓ and κ (ℓ) t = κ t if t ∈ R \ {s}. Then we have h − h k,s ∞ < ǫ and we deduce if T is large enough, using Theorem 1. Then the limit (6.6) is proved. Now, we use (6.6) to verify assumption 1 in Theorem 5.
• In the case of the periodic visiting scheme (3.4), if u ∈ R is such that u ≻ s, ∀s ∈ R \ {u}, we have: g( x)P * T,R (x, d x) − g( x)P R (x, d x) ≤ g( x)F X|Y (d x u | xx(u)) × ⊗ s∈R\{u} F T (d x s | xx(s)) − ⊗ s∈R\{u} F X|Y (d x s | xx(s)) + sup xs∈S,s =u g( x) F T (d x u | xx(u)) − F X|Y (d x u | xx(u)) . (6.7) Then if we iterate the bound (6.7), using a non increasing enumeration of the sites of R, we deduce the existence of a family {h s /s ∈ R} of continuous and bounded functions on S R × S R such that: (6.8) Note that the continuity of h s follows from assumption (A2) and the Lebesgue theorem. Then applying (6.6) to the bound (6.8), we conclude that assumption 1 in Theorem 5 holds.
• In the case of the random visiting scheme (3.5), we have for u ∈ I R T : g(w)P * T,R (u, dw) − g(w)P R (u, dw) ≤ 1 |R| s∈R g(w s u R\{s} ) F T (dw s |u s+No ) − F X|Y (dw s |u s+No ) .
Proof of Theorem 6.1 For the proof we reformulate the contraction method used by Dobrushin [8] (see also [15] (p 116-123)) in order to obtain a contraction for the transition P R . From the two expressions of the Wasserstein metric d W , we have the following inequality | gdG − gdH| ≤ L(g) S |G(w) − H(w)|dw, (6.10) which holds for a Lipschitz function g : S → R and for two distribution functions H,G with support S (L(g) denotes the Lipschitz constant of g). We use the notation u − j = (u i ) i∈Z 2 \{j} . For f ∈ L R , we have |F T (x|y) − F X|Y (x|y)|dx, using (6.10). Moreover we have Using (6.10) and setting L j = 0 if j / ∈ N o , we have This yields to the following contraction inequality with α = j∈No L j < 1, from (3.10). The first part of the theorem easily follows. Since the bound obtained for d R µ * T,R , µ R does not depend on R, the second part of theorem follows from the first part, the converge of µ RT to µ and the fact that Lipschitzian functions form a convergence-determinating class.
Proof of Proposition 2 The result will easily follow from the following inequality which states that for every (α, α) ∈ R 2 : b a e αz − e αa e αb − e αa − e αz − e αa e αb − e αa dz ≤ (b − a) 2 12 |α − α| . (6.11) Assume that α < α. First, we observe that the left hand side of (6.11) can be written Using the concavity of the function y → y z on R + , it is easily seen that the absolute values can be suppressed. Then the left hand side of (6.11) writes