Automatic tempered posterior distributions for Bayesian inversion problems

We propose a novel adaptive importance sampling scheme for Bayesian inversion problems, where the inference of the variables of interest and of the power of the data noise is split. More specifically, we consider a Bayesian analysis for the variables of interest (i.e., the parameters of the model to invert), whereas we employ a maximum likelihood approach for the estimation of the noise power. The whole technique is implemented by means of an iterative procedure that alternates sampling and optimization steps. Moreover, the noise power is also used as a tempering parameter for the posterior distribution of the variables of interest. Therefore, a sequence of tempered posterior densities is generated, where the tempering parameter is automatically selected according to the current estimate of the noise power. A complete Bayesian study over the model parameters and the scale parameter can also be performed. Numerical experiments show the benefits of the proposed approach.


Introduction
The estimation of unknown parameters from noisy observations is an essential problem in signal processing, statistics and machine learning [1,2,3]. Within the Bayesian signal processing framework, these problems are addressed by constructing posterior probability distributions of the unknowns. Given the posterior, one often wants to make inference about the unknowns, e.g., finding the parameter values that maximize the posterior, or those that minimize some cost function given the uncertainty of the parameters. Unfortunately, obtaining closed-form solutions, usually expressed as integrals of the posterior, is infeasible in most practical applications. Hence, approximate computational techniques (such as importance sampling and Markov chain Monte Carlo (MCMC) algorithms) are often required [4,5,6].
The so-called tempering of the posterior is a well-known procedure for improving the performance of Monte Carlo (MC) algorithms [7,8,9,10]. The tempering is obtained by modulating an artificial scale parameter or by sequentially including new data. The performance improves for several reasons: better mixing, discovery of modes, a more thorough exploration of the inference space, etc. In the first iterations of the MC scheme, a posterior density with a larger scale is considered. The artificial scale parameter (often called "temperature") is reduced over the iterations, until the true posterior distribution is considered. However, the user must choose a temperature schedule, i.e., a decreasing rule for the scale parameter, which is usually set in a heuristic way. In the literature, the tempering procedure has gained particular attention for the estimation of the marginal likelihood (a.k.a. Bayesian model evidence) [9,11,12].
Furthermore, the joint inference of the parameters (denoted as θ) of an observation model f(θ) and of the scale parameters of the likelihood function (which, in the scalar case, are usually denoted as σ) can be a hard task. Indeed, "wrong choices" of σ values can easily jeopardize the sampling of θ. In this work, we introduce a procedure to tackle this problem.
More specifically, in this work, we design an adaptive importance sampling (AIS) scheme [13] for Bayesian inversion problems, where an automatic tempering procedure is implemented. We consider that the vector of observations y is obtained by a nonlinear transformation f(θ) of the variables of interest θ, perturbed by additive Gaussian noise with unknown power σ². The nonlinear mapping f(θ) usually represents a complex physical model or a computer code. The resulting posterior densities are usually highly multimodal and complex distributions. Furthermore, the inference task in the joint space [θ, σ] is particularly challenging. We propose a split strategy to tackle this problem: an optimization approach over σ and a sampling scheme for θ. More specifically, we design an iterative procedure where these two tasks are alternated. Additionally, the current maximum likelihood (ML) estimate of the noise power, σ²_ML, is employed as a tempering parameter, starting from high values and then "cooling down" according to the ML estimates at each iteration. Therefore, the proposed scheme deals with a sequence of tempered posteriors driven by the current estimate σ²_ML. It is important to observe that, given a fixed vector θ, the ML estimate σ²_ML can be obtained analytically. Furthermore, a complete Bayesian analysis regarding the joint posterior of θ and σ is also possible (as discussed in Section 5). This is obtained by implementing a proper re-weighting of the samples generated by the proposed algorithm, called Automatic Tempering AIS (ATAIS), without any additional evaluations of the observation model. An approximation of the marginal posterior of σ is provided as well. The advantages of the proposed scheme are shown in two numerical experiments, one of them considering a complex astronomical model.

Problem Statement
Let us denote the observed measurements as y = [y_1, ..., y_K] ∈ R^K, and the variable of interest that we desire to infer as θ = [θ_1, ..., θ_M] ∈ Θ ⊆ R^M. Furthermore, let us consider the observation model

y = f(θ) + v,   v ∼ N(0, σ² I_K),

where f : Θ → R^K is a nonlinear mapping, v is a Gaussian perturbation noise with σ > 0, and I_K denotes the K × K identity matrix. The noise variance σ² is unknown, in general. The mapping f(θ) could be analytically unknown: the only assumption is that we are able to evaluate it pointwise. The likelihood function is

ℓ(y|θ, σ) = (2πσ²)^(−K/2) exp( −||y − f(θ)||² / (2σ²) ).

Note that we have two types of variables of interest: the vector θ contains the parameters of the nonlinear mapping f(θ), whereas σ is a scale parameter of the likelihood function. Given the vector of measurements y, we desire to make inferences regarding the hidden parameters θ and the noise power σ², obtaining at least some point estimators θ̂ and σ̂². We are also interested in performing uncertainty and correlation analysis among the components of θ. Furthermore, we aim to perform model selection, i.e., to compare, select or properly average different models.
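For concreteness, the Gaussian likelihood above can be evaluated in the log domain (numerically safer); a minimal Python sketch, where `f_theta` stands for a precomputed evaluation f(θ) of the (possibly black-box) forward model:

```python
import numpy as np

def log_likelihood(y, f_theta, sigma):
    """Gaussian log-likelihood for y = f(theta) + noise, noise variance sigma^2."""
    K = y.size
    err = np.sum((y - f_theta) ** 2)  # ||y - f(theta)||^2
    return -0.5 * K * np.log(2 * np.pi * sigma**2) - err / (2 * sigma**2)
```

Working with the log-likelihood avoids underflow when K is large or σ is small.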
Bayesian inference in the complete space. We consider prior densities g_θ(θ) and g_σ(σ) over the unknowns. Hence, the complete posterior density is

p(θ, σ|y) = ℓ(y|θ, σ) g_θ(θ) g_σ(σ) / Z.    (6)

The marginal likelihood Z = p(y) is

Z = ∫_Θ ∫_0^∞ ℓ(y|θ, σ) g_θ(θ) g_σ(σ) dσ dθ.    (7)

This quantity is often needed for model selection. Since Z is generally unknown, we can usually only evaluate pointwise the unnormalized posterior π(θ, σ|y) = ℓ(y|θ, σ) g_θ(θ) g_σ(σ) (i.e., p(θ, σ|y) ∝ π(θ, σ|y)). More generally, the computation of integrals of the form

I = ∫_Θ ∫_0^∞ h(θ, σ) p(θ, σ|y) dσ dθ,    (8)

where h(·) : Θ × R^+ → R is an integrable function, is usually required. We consider a Monte Carlo quadrature approach for approximating the integral above and, more generally, for providing a particle approximation of the joint posterior p(θ, σ|y).
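The Monte Carlo quadrature mentioned above can be illustrated with self-normalized importance sampling, which only needs the unnormalized posterior; a generic sketch (function and argument names are hypothetical placeholders):

```python
import numpy as np

def snis_estimate(h_vals, log_target, log_proposal):
    """Self-normalized IS estimate of I = E_p[h], given samples drawn from a
    proposal q: weights w ∝ pi(sample)/q(sample), normalized to sum to one."""
    log_w = np.asarray(log_target) - np.asarray(log_proposal)
    log_w -= log_w.max()          # stabilize before exponentiating
    w = np.exp(log_w)
    w /= w.sum()
    return np.sum(w * np.asarray(h_vals))
```

When the target and proposal coincide, the weights are uniform and the estimator reduces to the plain sample mean of h.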
Main observation. Generally, drawing random samples from the complicated posterior in Eq. (6) and efficiently computing integrals as in Eqs. (7)-(8) is a hard task. Moreover, this task often becomes more difficult when we try to perform a joint inference involving both scale parameters of the likelihood, i.e., σ, and parameters of the nonlinearity, i.e., θ. Indeed, "wrong choices" of σ values can easily jeopardize the sampling of θ. In the next section, we describe the strategy that we propose to tackle this problem. First, we need to recall some additional definitions.
Conditional and marginal posteriors. In order to design efficient computational schemes, it is often useful to consider the conditional posteriors, for instance,

p(θ|y, σ) = ℓ(y|θ, σ) g_θ(θ) / Z(σ),    (9)

where σ is fixed. In the next section, we will see that the idea underlying the proposed scheme is to split the space [θ, σ], restricting the sampling problem only to θ and considering an optimization problem with respect to σ. The conditional marginal likelihood is obtained by integrating out one of the two variables, i.e.,

Z(σ) = p(y|σ) = ∫_Θ ℓ(y|θ, σ) g_θ(θ) dθ.

The integral above cannot be computed analytically, in general. We can also consider marginal posteriors; for instance, the marginal posterior of σ is

p(σ|y) = Z(σ) g_σ(σ) / Z.

Note that the joint posterior in Eq. (6) can also be written as

p(θ, σ|y) = p(θ|y, σ) p(σ|y).    (12)
Underlying idea. The underlying idea of this work is to divide the inference study into two parts. In the first part (Sections 3 and 4), we focus on the study of the conditional posterior p(θ|y, σ) given a fixed σ. Then, in the second part (Section 5), we also estimate the marginal posterior p(σ|y). Finally, using Eq. (12), we can obtain a final approximation of the complete posterior p(θ, σ|y). Estimates of Z(σ) and Z are also obtained.

Key observations and proposed approach
In the first part of this work, we assume a proper (or improper) uniform prior over θ, i.e., g_θ(θ) ∝ 1 in Θ.
The possible use of a general choice of g_θ(θ) is discussed in Section 4.1. Let θ_MAP denote the MAP estimator of p(θ|y, σ). Generally, θ_MAP should be a function of σ, i.e., θ_MAP = θ_MAP(σ). However, due to the choice of likelihood function (and the uniform prior) considered in this paper, we have that

θ_MAP = arg max_θ p(θ|y, σ) = arg min_θ ||y − f(θ)||²,

which does not depend on σ, i.e., θ_MAP maximizes the conditional posterior p(θ|y, σ) for any σ. See App. A for further details. Furthermore, the variance of the conditional posterior p(θ|y, σ) grows as σ increases. In this sense, with larger σ, the density p(θ|y, σ) is "wider", hence it is easier for Monte Carlo methods to explore the space (namely, we have a tempering effect). Based on these considerations, we can run Monte Carlo schemes (specifically IS algorithms) on p(θ|y, σ_0) with a large value σ_0 in order to estimate θ_MAP more efficiently. Furthermore, apart from estimating θ_MAP, we are also interested in studying the conditional posterior p(θ|y, σ_ML), where

σ_ML = arg max_σ ℓ(y|θ_MAP, σ).

The value σ_ML can be obtained in closed form (see App. A). In fact, for any fixed θ, the likelihood ℓ(y|θ, σ), seen as a function of σ², has the form of an inverse Gamma density, with a unique mode at (1/K)||y − f(θ)||². Hence, finally we have

σ²_ML = (1/K) ||y − f(θ_MAP)||².

This can serve as a point estimator of the noise power in the system, and also as a possible value at which to stop the tempering of the conditional posterior, as we show in the following section.
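The closed-form ML expression above can be coded directly; a small sketch, where `f_theta` is again a precomputed forward-model evaluation:

```python
import numpy as np

def sigma_ml(y, f_theta):
    """Closed-form ML estimate of the noise std given a fixed theta:
    sigma_ML^2 = ||y - f(theta)||^2 / K."""
    K = y.size
    return np.sqrt(np.sum((y - f_theta) ** 2) / K)
```

Note that this estimator only requires the residual norm, which is already computed when evaluating the likelihood.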

Suggested iterative approach
Consider starting with a large value σ^(0)_ML = σ_0, which can be viewed as a coarse approximation of σ_ML. Let θ̂^(0)_MAP denote an estimator of θ_MAP obtained by working w.r.t. p(θ|y, σ^(0)_ML). We use this current estimate to obtain the next value of σ, i.e.,

σ^(1)_ML = sqrt( (1/K) ||y − f(θ̂^(0)_MAP)||² ),

which will generally be a better estimator of σ_ML than σ^(0)_ML. We can iterate this procedure, for t = 1, ..., T:

1. Estimate θ̂^(t)_MAP by Monte Carlo (e.g., by an IS scheme) working with respect to p(θ|y, σ^(t−1)_ML).
2. Compute σ^(t)_ML = sqrt( (1/K) ||y − f(θ̂^(t)_MAP)||² ).

With this iterative scheme, we have that σ^(T)_ML → σ_ML as T grows, hence we eventually perform IS with respect to the density of interest p(θ|y, σ_ML). Furthermore, a non-increasing sequence of values σ^(t)_ML is produced, which facilitates the estimation of θ_MAP and ensures that the IS estimation of p(θ|y, σ_ML) is performed efficiently, by using the set of intermediate, tempered (i.e., wider) distributions p(θ|y, σ^(t)_ML) for t = 0, 1, ..., T. Finally, a particle approximation of p(θ|y, σ^(T)_ML) is obtained, i.e., a set of weighted samples with normalized importance weights.
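The alternation described above can be sketched in a few lines of Python. This is a simplified illustration (the IS weighting and covariance adaptation of the full ATAIS algorithm are omitted, and all names are hypothetical), assuming a uniform prior, so that maximizing the tempered posterior amounts to minimizing the error ||y − f(θ)||²:

```python
import numpy as np

def alternate_map_sigma(y, f, mu0, cov0, sigma0, N=500, T=20, rng=None):
    """Alternate: (1) search for theta_MAP with samples from a Gaussian
    proposal; (2) closed-form ML update of sigma. Simplified sketch."""
    rng = np.random.default_rng(rng)
    K = y.size
    mu, cov, sigma = mu0.copy(), cov0.copy(), sigma0
    theta_map, best_err = mu0.copy(), np.inf
    for _ in range(T):
        thetas = rng.multivariate_normal(mu, cov, size=N)
        errs = np.array([np.sum((y - f(th)) ** 2) for th in thetas])
        n_best = np.argmin(errs)  # uniform prior: max posterior = min error
        if errs[n_best] < best_err:
            best_err, theta_map = errs[n_best], thetas[n_best]
            mu = theta_map.copy()  # move proposal to the current best point
        sigma = min(sigma, np.sqrt(best_err / K))  # non-increasing sequence
    return theta_map, sigma
```

The `min` in the σ update enforces the non-increasing tempering sequence discussed above.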

Automatic Tempering Adaptive Importance Sampling (ATAIS)
In this section, we describe an adaptive importance sampler with an automatic tempering approach, which follows the procedure given above. At each iteration t of the algorithm, we have an ML approximation of σ, i.e., σ^(t−1)_ML. Considering Eq. (9), we define the unnormalized tempered conditional posterior at the t-th iteration as

π_t(θ) = ℓ(y|θ, σ^(t−1)_ML),

where we assume g_θ(θ) ∝ 1 in Θ. For other generic choices of g_θ(θ), see the discussion in Section 4.1. At each iteration, we consider p(θ|y, σ^(t−1)_ML) ∝ π_t(θ) as the target distribution. The dependence on the iteration t is due to the fact that σ^(t)_ML varies with t. The ATAIS algorithm is outlined in Table 1. The resulting scheme is an adaptive IS algorithm which combines sampling schemes and stochastic optimization. It is important to remark that, if σ^(0)_ML is bigger than the true ML value, we generate a non-increasing sequence σ^(0)_ML ≥ σ^(1)_ML ≥ ... ≥ σ^(T)_ML. Note that this is true since we have assumed a uniform prior g_θ(θ).
IS steps. A set of N samples {θ^(n)_t}_{n=1}^N is drawn from a (normalized) proposal density q(θ|µ_t, Σ_t) with mean µ_t and covariance matrix Σ_t. An importance weight

w^(n)_t = π_t(θ^(n)_t) / q(θ^(n)_t|µ_t, Σ_t)

is assigned to each sample.
Proposal adaptation. A particle estimate of the conditional MAP estimator of θ is given by θ̂_t = arg max_{θ^(n)_t} π_t(θ^(n)_t). The value of the current MAP approximation, π_t(θ̂_t), is then compared with the value of the global MAP estimator obtained so far, denoted as π_MAP. If π_t(θ̂_t) ≥ π_MAP, all the global MAP estimators are updated and the proposal pdf is moved to θ̂_t, i.e., we set θ̂^(t)_MAP = θ̂_t and π_MAP = π_t(θ̂_t). Otherwise, we keep the previous values, θ̂^(t)_MAP = θ̂^(t−1)_MAP and π_MAP, and the proposal mean is left unchanged. The covariance matrix Σ_t is adapted by considering the empirical covariance of the weighted samples. Note that we set µ_t = θ̂^(t)_MAP instead of using the empirical mean of the samples (as in other classical AIS schemes). This is because we have noticed that this choice provides better and more robust results, especially as the dimension of the problem grows.
Automatic tempering. As shown in the previous section, the current ML estimate of σ can be obtained analytically as

σ̂_t = sqrt( (1/K) ||y − r_t||² ),   where r_t = f(θ̂_t).

If the current ML estimate σ̂_t is smaller than the current global one, σ^(t−1)_ML, we set σ^(t)_ML = σ̂_t; otherwise, we keep σ^(t)_ML = σ^(t−1)_ML. Actually, with a uniform prior g_θ(θ), every time we update θ̂^(t)_MAP we also update σ^(t)_ML (as noted above).
ATAIS outputs. After T iterations, a final correction of the weights is needed in order to obtain a particle approximation of the measure of the final conditional posterior p(θ|y, σ^(T)_ML) ∝ π_{T+1}(θ), i.e.,

w̄^(n)_t = π_{T+1}(θ^(n)_t) / q(θ^(n)_t|µ_t, Σ_t).    (17)

Thus, the algorithm returns the final estimators θ̂^(T)_MAP, σ^(T)_ML, and all the weighted samples {θ^(n)_t, w̄^(n)_t}, for all n = 1, ..., N and t = 1, ..., T. Other outputs can be obtained with a post-processing of the weighted samples, as shown below. Note that Eq. (17) does not require any additional evaluation of the model, since the error values e^(n)_t = ||y − f(θ^(n)_t)||² are stored during the iterations. Moreover, we can also use e^(n)_t and {θ^(n)_t} to build a particle approximation of any other conditional posterior p(θ|y, σ). This allows the study of the marginal posterior p(σ|y) and provides the complete Bayesian inference, as we show in the next section.
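Since π_t and π_{T+1} differ only through the value of σ in the Gaussian likelihood, the correction in Eq. (17) can be carried out entirely in the log domain from the stored errors; a sketch under that assumption (array names are hypothetical):

```python
import numpy as np

def reweight_to_final(errs, log_w, sigmas_used, sigma_final, K):
    """Correct stored IS log-weights from the tempered targets pi_t (built
    with noise std sigmas_used[t]) to the final target built with sigma_final.
    Only the stored errors e_t^(n) = ||y - f(theta_t^(n))||^2 are needed;
    no new evaluations of the forward model are performed."""
    errs = np.asarray(errs, dtype=float)    # shape (T, N)
    log_w = np.asarray(log_w, dtype=float)  # shape (T, N)
    s_t = np.asarray(sigmas_used, dtype=float)[:, None]
    # log pi_{T+1}(theta) - log pi_t(theta) for a Gaussian likelihood:
    delta = (K * np.log(s_t / sigma_final)
             + errs * (1.0 / (2.0 * s_t**2) - 1.0 / (2.0 * sigma_final**2)))
    return log_w + delta
```

When the tempered and final σ coincide, the correction vanishes, as expected.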

For a generic prior g_θ(θ)
The ATAIS algorithm is based on the fact that θ_MAP does not depend on σ. This allows us to progressively estimate it by targeting the sequence of tempered posteriors π_t(θ) ∝ p(θ|y, σ^(t)_ML), which all share the same MAP. However, in the case that g_θ(θ) is not uniform, we generally have one θ_MAP(σ) for each p(θ|y, σ), and the sequence θ_MAP(σ^(t)_ML) may not approach θ_MAP(σ_ML). If the data are informative and the prior g_θ(θ) is chosen to be vague with respect to the likelihood, the position of θ_MAP(σ) is not very sensitive to the value of σ. Namely, we have θ_MAP(σ^(1)_ML) ≈ θ_MAP(σ^(2)_ML) ≈ ··· ≈ θ_MAP(σ_ML), and hence our algorithm can be applied in this context. When the data are not informative, we should use an even more vague prior (i.e., wider than the likelihood function) in order to maintain the usefulness of the algorithm.

Table 1: The ATAIS algorithm.
Initialization: choose N, T, µ_1, Σ_1, a large starting value σ^(0)_ML, and set π_MAP = 0.
For t = 1, ..., T:
(a) Sampling:
 i. Draw N samples θ^(n)_t ∼ q(θ|µ_t, Σ_t), storing the errors e^(n)_t = ||y − f(θ^(n)_t)||².
 ii. Assign to each sample the weight w^(n)_t = π_t(θ^(n)_t)/q(θ^(n)_t|µ_t, Σ_t).
(b) Current maximum estimations:
 i. Obtain θ̂_t = arg max_{θ^(n)_t} π_t(θ^(n)_t), and compute π_t(θ̂_t).
 ii. If π_t(θ̂_t) ≥ π_MAP, set θ̂^(t)_MAP = θ̂_t and π_MAP = π_t(θ̂_t); otherwise, keep θ̂^(t)_MAP = θ̂^(t−1)_MAP and the value of π_MAP.
(c) Automatic tempering: compute σ̂_t = sqrt((1/K)||y − f(θ̂_t)||²) and set σ^(t)_ML = max{ min{σ̂_t, σ^(t−1)_ML}, ε }, where ε > 0 is a small scalar value.
(d) Adaptation: set µ_{t+1} = θ̂^(t)_MAP and adapt Σ_{t+1} using the empirical covariance of the weighted samples.
Output: Return the final estimators θ̂^(T)_MAP, σ^(T)_ML, and all the weighted samples {θ^(n)_t, w^(n)_t}, for all t and n, with the corrected weights of Eq. (17).

Complete Bayesian inference with ATAIS
Let us assume we have a proper prior g_θ(θ), and let us introduce another proper prior g_σ(σ) for σ. The outputs of the ATAIS algorithm can serve to approximate the normalizing constant of the joint posterior p(θ, σ|y) ∝ ℓ(y|θ, σ) g_θ(θ) g_σ(σ), i.e., the so-called marginal likelihood or Bayesian model evidence, given by

Z = ∫_0^∞ Z(σ) g_σ(σ) dσ,    (22)

where we have denoted Z(σ) = ∫_Θ ℓ(y|θ, σ) g_θ(θ) dθ, usually called the conditional marginal likelihood. The quantity Z is useful for model selection purposes. Furthermore, a complete Bayesian study of the joint posterior p(θ, σ|y) can be provided as well.
Approximation of Z(σ) = p(y|σ). After the T iterations of ATAIS, we can also approximate the conditional marginal likelihood Z(σ) = p(y|σ) without additional evaluations of the target function. Indeed, saving the error value of each particle, obtained during ATAIS for the computation of the likelihood function, e^(n)_t = ||y − f(θ^(n)_t)||², we can compute the IS weights, for a generic value of σ,

ρ^(n)_t(σ) = ℓ(y|θ^(n)_t, σ) g_θ(θ^(n)_t) / q(θ^(n)_t|µ_t, Σ_t).

Thus, the IS estimator of the conditional marginal likelihood Z(σ) is given by the arithmetic mean of the weights ρ^(n)_t(σ),

Ẑ(σ) = (1/(NT)) Σ_{t=1}^T Σ_{n=1}^N ρ^(n)_t(σ).    (23)

Approximation of Z. Drawing σ^(r) ∼ g_σ(σ), for r = 1, ..., R (or considering a deterministic grid, e.g., as in a Riemann quadrature), we can approximate the global marginal likelihood Z by applying a simple Monte Carlo approximation of the integral in Eq. (22),

Ẑ = (1/R) Σ_{r=1}^R Ẑ(σ^(r)).

Approximation of p(σ|y). An approximation of the marginal posterior

p(σ|y) = p(y|σ) g_σ(σ) / p(y)

can also be obtained as p̂(σ|y) ∝ Ẑ(σ) g_σ(σ), which can be used to approximate, e.g., the MAP of p(σ|y) by σ̂_MAP-marg ≈ arg max_σ Ẑ(σ) g_σ(σ). Other moments of p(σ|y) can be computed by a deterministic quadrature (since the problem is now one-dimensional) or by applying noisy Monte Carlo approaches.
Complete Bayesian analysis. We can approximate the integral of interest in Eq. (8) by combining the particles {θ^(n)_t} with samples σ^(j), where the σ^(j) are generated by applying a noisy MCMC method with invariant density p(σ|y) ∝ Z(σ) g_σ(σ). Note that the samples θ^(n)_t do not depend on the index j (they do not change), since we are recycling the particles generated by ATAIS and reusing the evaluations e^(n)_t = ||y − f(θ^(n)_t)||².
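Under the uniform-prior setting used by ATAIS (where g_θ contributes only a constant factor over a bounded set, omitted here), these estimators can be sketched by recycling the stored errors and proposal log-densities; argument names are hypothetical:

```python
import numpy as np

def cond_evidence(sigma, errs, log_q, K):
    """IS estimate of Z(sigma) = p(y|sigma), recycling stored errors
    e = ||y - f(theta)||^2 and proposal log-densities log q(theta)."""
    errs, log_q = np.ravel(errs), np.ravel(log_q)
    log_lik = -0.5 * K * np.log(2 * np.pi * sigma**2) - errs / (2 * sigma**2)
    log_rho = log_lik - log_q             # log of the IS weights rho(sigma)
    m = log_rho.max()
    return np.exp(m) * np.mean(np.exp(log_rho - m))  # stabilized mean

def marginal_posterior_sigma(sig_grid, errs, log_q, K, prior):
    """Unnormalized p(sigma|y) ~ Z(sigma) g(sigma), evaluated on a grid."""
    Zs = np.array([cond_evidence(s, errs, log_q, K) for s in sig_grid])
    return Zs * prior(sig_grid)
```

Evaluating `cond_evidence` on a grid of σ values yields the whole curve Ẑ(σ) at negligible cost, since no new model evaluations are required.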

Simulations
We test the proposed scheme in two numerical examples. The first numerical experiment is a simple bidimensional example (which is easy to reproduce). The second experiment considers a real-world application, i.e., radial velocity models of exoplanetary systems, which are often employed in astronomy (with inference problems of dimension 6 and 11).
Considering also a uniform prior over σ in (0, 20], we obtain a bidimensional joint posterior over [θ, σ], which is depicted in Fig. 2(a). In this bidimensional example, it is possible to obtain the ground-truths using a costly thin grid. We show the ground-truth values for the different pdfs in Table 2. Moreover, the true value of the complete evidence is Z = p(y) = 1.5983 · 10^−9. Since the prior over σ is uniform, the maximum likelihood estimate of σ is σ_ML = σ_MAP-joint = 3.23. The two marginal posteriors are shown in Figures 2(b)-(c). We apply ATAIS with the goal of estimating the expected value and the variance of the posterior density with respect to θ. We consider a Gaussian proposal q(θ|µ_t, λ_t) with µ_0 = 10 and a starting variance λ_0 = 4. Note that µ_0 is located in a region that does not contain modes. We also start with σ^(0)_ML = 20 and π_MAP = 0 (initial conditions). The mean square error (MSE) of ATAIS, averaged over 500 runs, in the estimation of different moments and modes, as a function of N (and with T = 10), is given in Table 3. The ML estimate σ^(t)_ML as a function of the iteration t (with N = 5), for different runs, is given in Figure 4. The approximations given by Eqs. (30) and (23) are shown in Figure 3, obtained using a sampling importance resampling procedure. For more details, see [14] and [15, Chapter 24].

Radial velocity curves of exoplanets and binary systems
In this example, we consider an application with an astronomical model. In recent years, the problem of revealing objects orbiting other stars has acquired large attention. Different techniques have been proposed to discover exo-objects but, nowadays, the radial velocity technique is still the most widely used [16,17,18,19]. The problem consists in fitting a model (the so-called radial velocity curve) to data acquired at different moments spanning long time periods (up to years). The model is highly nonlinear and costly in terms of computation time (especially for certain sets of parameters).
Obtaining a value to compare with a single observation involves numerically integrating a differential equation in time, or an iterative procedure for solving a nonlinear equation. Typically, the iteration is performed until a threshold is reached or 10^6 iterations are performed. The problem of radial velocity curve fitting also appears in several related applications.
Observation model - likelihood. When analysing radial velocity data of an exoplanetary system, it is commonly accepted that the wobbling of the star around the centre of mass is caused by the sum of the gravitational force of each planet independently, and that the planets do not interact with each other. Each planet follows a Keplerian orbit, and the radial velocity of the host star is given by

y_{r,t} = V_0 + Σ_{i=1}^S A_i [cos(u_{i,t} + ω_i) + e_i cos(ω_i)] + ξ_t,    (31)

with t = 1, ..., T and r = 1, ..., R. In this equation, A_i is the amplitude of the curve, ω_i is the argument of perigee and e_i is the eccentricity of the orbit of the i-th planet. The parameter V_0 represents the mean velocity and is common to all the planets. The number of objects in the system is S, which is considered known in this experiment (for the sake of simplicity). Both y_{r,t} and u_{i,t} depend on the time t, and ξ_t is a Gaussian noise perturbation with variance σ². The likelihood function is defined by Eq. (31) and some indicator variables described below. The angle u_{i,t} is the true anomaly of planet i, and it can be determined from the eccentric anomaly E_{i,t} through the relation

tan(u_{i,t}/2) = sqrt((1 + e_i)/(1 − e_i)) tan(E_{i,t}/2).

This equation has an analytical solution. As a result, the true anomaly u_{i,t} can be determined from the mean anomaly M_{i,t}. However, this requires a nonlinear term that needs to be determined by iterating. First, we define the mean anomaly M_{i,t} as

M_{i,t} = (2π/P_i)(t − τ_i),

where τ_i is the time of the periastron passage of planet i and P_i is the period of its orbit. Then, through Kepler's equation,

M_{i,t} = E_{i,t} − e_i sin(E_{i,t}),    (34)

where E_{i,t} is the eccentric anomaly. Equation (34) has no analytic solution and must be solved by an iterative procedure. A Newton-Raphson method is typically used to find the roots of this equation [20]. For certain sets of parameters, this iterative procedure can be particularly slow.
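The Newton-Raphson step for Kepler's equation (Eq. (34)) and the analytic true-anomaly relation can be sketched as follows; the tolerance, iteration cap and starting guess are illustrative choices, not the ones used in the paper:

```python
import numpy as np

def solve_kepler(M, e, tol=1e-12, max_iter=100):
    """Solve Kepler's equation M = E - e*sin(E) for the eccentric anomaly E
    via Newton-Raphson iterations."""
    E = M if e < 0.8 else np.pi  # common starting guess
    for _ in range(max_iter):
        dE = (E - e * np.sin(E) - M) / (1.0 - e * np.cos(E))
        E -= dE
        if abs(dE) < tol:
            break
    return E

def true_anomaly(E, e):
    """True anomaly u from the eccentric anomaly E (analytic relation)."""
    return 2.0 * np.arctan2(np.sqrt(1 + e) * np.sin(E / 2),
                            np.sqrt(1 - e) * np.cos(E / 2))
```

Convergence of Newton's method slows down for highly eccentric orbits (e close to 1), which is consistent with the observation above that the procedure can be slow for certain parameter sets.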
The variable of interest θ is then the vector

θ = [V_0, A_1, ω_1, e_1, P_1, τ_1, ..., A_S, ω_S, e_S, P_S, τ_S].

Then, for a single object (e.g., a planet or a natural satellite), the dimension of θ is M = 5 + 1 = 6; with two objects, the dimension of θ is M = 11, etc. This example consists of a synthetic radial velocity curve of a planetary system with one planet or two planets (i.e., S = 1 or S = 2). More specifically, we generate simulated data with a model with two planets. The orbital parameters of the planets are listed in Table 4, where P is the period of the orbit, A is the amplitude of the curve, e is the eccentricity of the orbit, ω is the argument of perigee and τ is the last periastron passage. A mean velocity V_0 = 5 m s^−1 is assumed. A Gaussian noise perturbation is added with a standard deviation σ = 3 m s^−1. To simulate observations, a total of K = 120 data points are selected from three random time periods (with two planets in the system). Note that the amplitude of the radial velocity curve of the second planet is close to the noise level. We run ATAIS and a standard AIS scheme with the model with one planet and with the model with two planets. The purpose of this simulation is to check the ability of the method to detect the two planets (by approximating the model evidence).
We apply ATAIS and a standard AIS scheme [13] over the space [θ, σ] for approximating the model evidence Z = p(y) (marginal likelihood) of both models (one planet or two planets) given the data (generated considering two planets). Uniform priors are considered for each parameter: [−20, 20], e ∈ [0, 1], ω ∈ [0, 2π], and τ ∈ [0, 50] (moreover, σ ∈ [0, 30] for the standard AIS scheme). The ATAIS algorithm and the standard AIS scheme have been run with N = 10^6 samples and T = 50 iterations, for both the model with one planet and the model with two planets. In both cases, we consider the same Gaussian proposal with a starting standard deviation of 5 for each component (note that the standard AIS scheme works in a higher-dimensional space due to the inference over σ). To decide which model is more probable, the model evidence Z of each model is estimated. More specifically, we approximate the evidence of the one-planet model, Z_1 = p_1(y), and of the two-planet model, Z_2 = p_2(y), with the ATAIS algorithm and the standard AIS scheme. When Ẑ_1 > Ẑ_2, we select the first model; otherwise, if Ẑ_1 < Ẑ_2, we select the second one. The true model is the two-planet model, since the simulated data were generated from it. After 500 independent runs, the percentage of correct detections of the true model is ≈ 98% for ATAIS, whereas it is only ≈ 56% for the standard AIS scheme. This is due to the difficulty of making inference jointly over [θ, σ]. With ATAIS, the estimated ratio of evidences is Ẑ_1/Ẑ_2 ≈ 0.15. Therefore, for ATAIS, the model with two planets is clearly more probable than the model with one planet.
The fitted curves, corresponding to the vector of parameters θ̂_MAP obtained with ATAIS, are shown in Fig. 5. From the figure, it is not clear which model fits the simulated observations (blue points) better, although the model with two planets seems to fit the observations in the time period from 200 to 300 days better. The values of θ̂_MAP, obtained in one specific run of ATAIS, are given in Table 5. We notice that ω and τ are highly correlated, and more iterations may be needed to obtain the actual global maximum, but the remaining parameters in θ̂_MAP are similar to the simulated values. In addition, the amplitude of the curve of the second planet is close to the intensity of the noise, which makes it difficult to derive the best fit for that planet. Summarizing, our results show that the method is able to discriminate between a model with one planet (an inference problem of dimension 6) and a model with two planets (an inference problem of dimension 11), for this particular simulation. Finally, the evolution of the automatic tempering parameter σ^(t)_ML is shown in Fig. 6. The dashed line is the evolution of σ^(t)_ML for the single-planet model, whereas the continuous line is the evolution of σ^(t)_ML for the model with two planets. In this second model, the tempering parameter reaches a smaller value, as expected.
Figure 6: Evolution of the tempering parameter σ^(t)_ML. We choose σ^(0)_ML = 50 as the starting value (the figure shows from t = 1 onwards), which is an arbitrarily high value chosen to help the exploration in the first iteration. However, after the first iteration, the algorithm is able to obtain reasonable values of σ^(1)_ML. The dashed line is the evolution for the model with one planet; the continuous line is the evolution for the two-planet model.

Conclusions
We have proposed a novel AIS scheme for Bayesian inversion problems, where an automatic tempering procedure is implemented (called ATAIS). The inference of the variables of interest θ and of the noise power σ² is split: a sampling strategy is considered for θ, and an optimization approach is employed for σ². Thus, ATAIS performs an iterative procedure, alternating sampling and optimization steps. Therefore, the proposed scheme deals with a sequence of tempered posteriors according to the current estimate of the noise power. We have also discussed the possibility of approximating the marginal posterior of σ without additional evaluations of the complex model. Furthermore, the complete Bayesian analysis of the joint posterior is possible, as discussed in Section 5, again without any additional evaluations of the likelihood function. Several simulations are provided, and the application to a sophisticated astronomical model has been considered, where the number of planets in the system is detected by the analysis of the marginal likelihood. The results show the benefits of the proposed scheme. For instance, in the astronomical example, the percentage of correct detections of the true model obtained by ATAIS is ≈ 98%, whereas with the standard AIS scheme it is only ≈ 56%. As future research, we plan to extend the ATAIS scheme in order to deal with observation models with correlated noise perturbations (for instance, using a Gaussian process). Moreover, the use of parallel AIS schemes (or MCMC algorithms) will also be considered. A combination of parallel MCMC chains and AIS schemes can be found in the so-called layered AIS method and other similar approaches [21,22]. This idea seems particularly interesting for improving the inference with radial velocity models.
Appendix A
Taking the logarithm of the likelihood,

log ℓ(y|θ, σ) = −(K/2) log(2πσ²) − (1/(2σ²)) ||y − f(θ)||²,

we can write the gradient with respect to σ and set it equal to zero,

∂/∂σ log ℓ(y|θ, σ) = −K/σ + (1/σ³) ||y − f(θ)||² = 0.

We have obtained that the ML solution is defined by the system of equations

θ_ML = arg min_θ ||y − f(θ)||²,   σ²_ML = (1/K) ||y − f(θ_ML)||²,

where, under a uniform prior g_θ(θ), θ_ML coincides with θ_MAP, so that σ²_ML does not depend on σ and can be computed in closed form for any fixed θ.