Particle rejuvenation of Rao-Blackwellized Sequential Monte Carlo smoothers for Conditionally Linear and Gaussian models

This paper focuses on Sequential Monte Carlo approximations of smoothing distributions in conditionally linear and Gaussian state spaces. To reduce Monte Carlo variance of smoothers, it is typical in these models to use Rao-Blackwellization: particle approximation is used to sample sequences of hidden regimes while the Gaussian states are explicitly integrated conditional on the sequence of regimes and observations, using variants of the Kalman filter / smoother. The first successful attempt to use Rao-Blackwellization for smoothing extends the Bryson-Frazier smoother for Gaussian linear state space models using the generalized two-filter formula together with Kalman filters / smoothers. More recently, a forward backward decomposition of smoothing distributions mimicking the Rauch-Tung-Striebel smoother for the regimes combined with backward Kalman updates has been introduced. This paper investigates the benefit of introducing additional rejuvenation steps in all these algorithms to sample at each time instant new regimes conditional on the forward and backward particles. This defines particle based approximations of the smoothing distributions whose support is not restricted to the set of particles sampled in the forward or backward filter. These procedures are applied to commodity markets which are described using a two factor model based on the spot price and a convenience yield for crude oil data.


Introduction
State space models are bivariate stochastic processes {(Y i , Z i )} i≥1 where the state sequence (Z i ) i≥1 is a Markov chain which is only partially observed through the sequence (Y i ) i≥1 . Conditionally on the state sequence (Z i ) i≥1 the observations are independent and for all ℓ ≥ 1 the conditional distribution of Y ℓ given (Z i ) i≥1 depends on Z ℓ only. These models are used in a large variety of disciplines such as financial econometrics, biology, signal processing, see [DM13] and the references therein. In general state space models, bayesian filtering and smoothing problems, i.e. the computation of the posterior distributions of a sequence of states (Z i , . . . , Z p ) for 1 ≤ i ≤ p ≤ ℓ given observations (Y 1 , . . . , Y ℓ ), are challenging tasks. Filtering refers to the estimation of the distributions of the hidden state Z i given the observations (Y 1 , . . . , Y i ) up to time i, while fixed-interval smoothing stands for the estimation of the distribution of sequence of states (Z i , . . . , Z p ) given observations (Y 1 , . . . , Y ℓ ) with 1 ≤ i ≤ p < ℓ.
When the state and observation models are linear and Gaussian, filtering can be solved explicitly using the Kalman filter [Kal60]. Exact solutions of the fixed-horizon smoothing problem can be obtained using either the Rauch-Tung-Striebel smoother [RST65] or the Bryson-Frazier two-filter smoother [BF63]. This paper focuses on Conditionally Linear and Gaussian Models (CLGM) given for i ≥ 2 by: where: -(ε i ) i≥2 is a sequence of independent and identically distributed (i.i.d.) m-dimensional Gaussian vectors with zero mean and identity covariance.
-(a i ) i≥1 is a homogeneous Markov chain taking values in a finite space {1, . . . , J}, called regimes, with initial distribution π and transition matrix Q.
-Z 1 is a m-dimensional Gaussian random variable with mean µ 1 and variance Σ 1 independent of (ε i ) i≥2 .
Let n be the number of observations. At each time step 1 ≤ i ≤ n, the observation Y i is given by: where: -(η i ) i≥1 is a i.i.d. sequence of p-dimensional Gaussian vectors, independent of (ε i ) i≥2 and Z 1 .
CLGM play an important role in many applications; see [Sar13] and the references therein for an up-to-date account. A crucial feature of these models is that, conditional on the regime sequence (a 1 , . . . , a n ), both the state equation and the observation equation are linear and Gaussian, which implies that conditional on the sequence of regimes and on the observations, the filtering and the smoothing distributions of the continuous states (Z 1 , . . . , Z n ) can be computed explicitly.
To exploit this specific structure, it has been suggested in the pioneering works of [CL00, DGA00] to solve the filtering problem by combining Sequential Monte Carlo (SMC) methods to sample the regimes with the Kalman filter to compute the conditional distribution of the states sequence (Z i ) 1≤i≤n conditional on the regimes and on the observations. This is a specific instance of Rao-Blackwellized Monte Carlo filters, often referred to as the Mixture Kalman Filter. Improvements of these early filtering techniques have been introduced in [DGK01,SGN05].
The use of Rao-Blackwellization to solve the smoothing problem has been proved to be more challenging and has received satisfactory solutions only recently. The first forward-backward smoother proposed in the literature [FGDW02] was not fully Rao-Blackwellized as it required to sample the hidden linear states in the backward pass. An alternative approach, based on the so-called structural approximation of the model suggested in an early paper by [Kim94], was proposed by [Bar06] to avoid to sample a continuous state in the backward pass. This approximation is rather ad-hoc and the resulting smoother is not consistent when the number of particles goes to infinity. The inaccuracy introduced by the approximation might be difficult to control.
The first fully Rao-Blackwellized SMC smoother which should lead to consistent approximations when the number of particles grows to infinity was proposed by [BDM10] and extends the Bryson-Frazier smoother for Gaussian linear state space models using the generalized two-filter formula with Rao-Blackwellization steps for the forward and the backward filters. This two-filter approach combines a forward filter with a backward information filter which are approximated numerically using SMC for the regime sequence and Kalman filtering techniques for the hidden linear states.
More recently, [SBG12, LBGS13, LBS + 16] introduced a Rao-Blackwellized smoother based on the forward-backward decomposition of the FFBS algorithm with Rao-Blackwellization steps both in the forward and backward time directions. The update of the smoothing distribution of the regime given the observations shares some striking similarities with the Rauch-Tung-Striebel smoothing procedure, which is at the heart of the FFBS procedure. The Rao-Blackwellization requires to update backward in time the smoothing distribution of the states given the regimes and the observations, which is achieved by using anà la Kalman backward update.
In this paper, we propose to improve the performance of the algorithms introduced in [BDM10] and in [SBG12, LBGS13, LBS + 16] by using additional Rao-Blackwellization steps which allows to sample new particles in the backward pass. This approach may be seen as an extension of the ideas of [FC03] for Rao-Blackwellized smoothers. In [BDM10], for all 1 ≤ i ≤ n, the sampled forward and backward sequences are merged to approximate the posterior distribution of (a i , z i ). This provides an approximation whose support is restricted to the particles produced at time i by the backward particle filter. As noted in [FWT10, Secion 2.6], these two-filter smoothers are prone to suffer from degeneracy issues when the algorithm associates forward particles at time i − 1 with backward particles at time i. We propose to approximate the marginal smoothing distribution of (a i , z i ) by merging the sampled forward and backward trajectories at times i − 1 and i + 1 and integrating out all possible paths between time i − 1 and time i and between time i and time i + 1 instead of sampling random variables. Similarly, in the backward pass of [SBG12, LBGS13, LBS + 16], a regimeã i is sampled at time 1 ≤ i ≤ n − 1 using the particles produced by the forward filter at time i. In this case, particle rejuvenation may be introduced by using the forward weighted samples at time i − 1 and extending these trajectories at time i with a Kalman filter for all possible values of the regime. Then,ã i is sampled in {1, . . . , J} using an appropriately adapted weight.
The paper is organized as follows. The algorithms introduced in [BDM10] and in [SBG12, LBGS13, LBS + 16] as long as the proposed rejuvenation associated with each method are presented in Section 2. The performance of all these methods is illustrated in Section 3 with simulated data. In Section 4, an application to commodity markets is presented; the performance of our procedure is illustrated with crude oil data. A detailed derivation of the algorithms is provided in the Appendix. probability density of the conditional distribution of Y i given (a i , Z i ): where All the algorithms considered in this paper are based on forward-backward or two-filter decompositions of the smoothing distributions and share the same forward filter presented in Section 2.1.

Forward filter
The SMC approximation p N (a 1:i , z i |y 1:i ) of p(a 1:i , z i |y 1:i ) may be obtained using a standard Rao-Blackwellized algorithm. The procedure produces a sequence of trajectories (a k 1:i ) 1≤k≤N associated with normalized importance weights (ω k i ) 1≤k≤N ( N k=1 ω k i = 1) used to define the following approximation of p(a 1:i , z i |y 1:i ): where δ is the Dirac delta function. In this equation, the conditional distribution of the hidden state z i given the observations y 1:i and a trajectory a k 1:i is a Gaussian distribution whose mean µ k i and variance P k i may be obtained by using the Kalman filter update. Initialization At time i = 1, write, for all 1 ≤ j ≤ J, (a k 1 ) 1≤k≤N are sampled independently in {1, . . . , J} with probabilities proportional to Then, µ k 1 and P k 1 are computed using a Kalman filter: where for all positive integer p, I p is the p× p identity matrix. Each particle particle a k 1 is associated with the importance weight ω k 1 = 1/N .

Iterations
Several procedures may be used to extend the trajectories (a k 1:i−1 ) 1≤k≤N at time i. For all sampled trajectories (a k 1:i−1 ) 1≤k≤N and all 1 ≤ j ≤ J, [CL00] used the incremental weights: where: In [CL00], for all 1 ≤ k ≤ N , an ancestral path is chosen with probabilities proportional to (ω k i−1 ) 1≤k≤N . Then, the new regime a k i is sampled in {1, . . . , J} with probabilities proportional to (γ j,k i ) 1≤j≤J . A drawback of this method is that only ancestral paths that have been selected using the importance weights (ω k i−1 ) 1≤k≤N are extended at time i. Following [BGM08], this may be improved by considering all the offsprings of all ancestral trajectories (a k 1:i−1 ) 1≤k≤N . Each ancestral path has J offsprings at time i, it is thus necessary to choose a given number of trajectories at time i (for instance N ) among the N J possible paths. To obtain the weight associated with each offspring write the following approximation of p(a 1:i |y 1:i ) based on the weighted samples at time i − 1: Therefore, each ancestral trajectory of the form (a k 1: Several random selection schemes have been proposed to discard some of the possible offsprings to maintain an average number of N particles at each time step. Following [BGM08], we might choose between the Kullback-Leibler Optimal Selection (KL-OS) or the Chi-Squared Optimal Selection (CS-OS) to associate a new weight to each of the N J trajectories. If the new weight is 0, then the corresponding particle can be removed.

KL-OS: λ is chosen as the solution of :
Then, in both cases, all particles such thatΩ j,k i = 0 are discarded and for all the other trajectories defined as an ancestral path (a k 1:i−1 ) extended by a k i = j, the new corresponding weight ω in (5) is given by the normalized weightΩ j,k i . In the numerical sections of this paper, the Kullback-Leibler Optimal Selection (KL-OS) scheme is used. p(a 1:n |y 1:n ) = p(a 1:i |a i+1:n , y 1:n )p(a i+1:n |y 1:n ) .

FFBS based algorithms
This decomposition is similar to the Rauch-Tung-Striebel decomposition of the filtering distribution. The first factor on the right hand side of the previous equation is nevertheless more difficult to handle because it itself relies on all the observations. As noted by [SBG12], this term can be computed recursively by considering the following decomposition: p(a 1:i |a i+1:n , y 1:n ) ∝ p(y i+1:n , a i+1:n |a 1:i , y 1:i )p(a 1:i |y 1:i ) .
The second factor in the last equation may be approximated using the ancestral trajectories (a k 1:i ) 1≤k≤N and the associated importance weights (ω k i ) 1≤k≤N produced by the forward filter. Therefore, p(a 1:i |a i+1:n , y 1:n ) may be approximated by: Then, a trajectoryã 1:n approximatively distributed according to p(a 1:n |y 1:n ) may be sampled following these steps: This algorithm requires to compute the quantity p(y i+1:n , a i+1:n |a k 1:i , y 1:i ). This predictive quantity is available analytically using Kalman filtering techniques. However, this has to be done for each trajectory (a k 1:i ) 1≤k≤N , which leads to an algorithm with a prohibitive computational complexity. [LBS + 16] proposed a procedure computationally less intensive by conditioning with respect to z i and then marginalizing with respect to this variable: This is similar to the two-filter decomposition of the smoothing distribution, see Section 2.3. By where the proportionality is with respect to (a i , z i ) and where the proportionality is with respect to z i . These quantities may be computed recursively backward in time with: As p(y i:n , a i+1:n |z i , a i ) = p(y i |z i , a i )p(y i+1:n , a i+1:n |z i , a i ), Then, by (9), where the proportionality is with respect to a k 1:i and are independent copies ofã 1:n , the SMC approximation of [LBS + 16] of the joint smoothing distribution of the regime is: p Lbscg N (a 1:n |Y 1:n ) = 1 NÑ k=1 δãk 1:n (a 1:n ) .

Particle rejuvenation of FFBS algorithms
The crucial step of the FFBS algorithm is the decomposition (8) which allows to extend a backward trajectoryã i+1:n by choosing a particle in the set (a k i ) 1≤k≤N produced by the forward filter (and discarding the states a k 1:i−1 ). An improved version of this FFBS algorithm which is not constrained to sample states in the support (a k i ) 1≤k≤N may be defined for all 2 ≤ i ≤ n − 1 by writing: p(a 1:i |a i+1:n , y 1:n ) ∝ p(y i+1:n , a i+1:n |a 1:i , y 1:i )p(a 1:i |y 1:i ) , Replacing p(a 1:i−1 , z i−1 |y 1:i−1 ) in the integral by the particle approximation obtained during the forward pass and using Kalman filtering techniques for each trajectory (a k 1:i−1 ) 1≤k≤N and each a i ∈ {1, . . . , J} yields: On the other hand, for all 1 ≤ k ≤ N , p(y i+1:n , a i+1:n |a k 1:i−1 , a i , y 1:i ) is computed as in (10) with all possible values a i ∈ {1, . . . , J} and not only the regime of the filtering pass (a k i ) 1≤k≤N . This means that a Kalman filter must be used for each trajectory a k 1:i−1 which may be extended by a i ∈ {1, . . . , J}. Denote by µ k i|i−1 (a i ) and P k i|i−1 (a i ) the mean and covariance matrix of the law of z i given (a k 1:i−1 , a i ) obtained as in (6) and (7). Then, where the proportionality is with respect to (a k 1:i−1 , a i ) and The distribution p(a 1:i |a i+1:n , y 1:n ) is then approximated by : p N (a 1:i |a i+1:n , y 1:n ) where ( ϑ k i−1 ) 1≤k≤N are adjustment multipliers and q(a i |a k 1:i−1 , a i+1:n , y 1:n ) is a proposal kernel chosen by the user. This means that an ancestral path a ⋆ 1:i−1 is sampled in (a k 1:i−1 ) 1≤k≤N with weights ( ϑ k i−1 ) 1≤k≤N and a ⋆ i is sampled from q(·|a ⋆ 1:i−1 , a i+1:n , y 1:n ). Then, the proposed sequence a ⋆ 1:i is accepted or rejected using the usual Metropolis-Hastings acceptance ratio. The choice of MCMC rejuvenation has interesting practical consequences as the computation of the acceptance ratio only requires to compute the posterior probability (11) for the proposed sequence a ⋆ 1:i while our technique is based on the computation of (11) for all combinations of sequences (a k 1:i ) 1≤k≤N and states a i ∈ {1, . . . , J}. Sampling from (12) is computationally more intensive, especially when N is large, but our method is based on a direct approximation of p(a 1:i |a i+1:n , y 1:n ) based on (a k 1:i−1 ) 1≤k≤N and a i+1:n instead of approximate MCMC draws.

Rao-Blackwellized Two-filter Smoother of [BDM10]
Contrary to the previous methods, two-filter based smoothers are designed to compute approximations of marginal smoothing distributions (usually the posterior distribution of one or two consecutive regimes given all the observations). [BDM10] introduced the following decomposition of the smoothing distributions for all 2 ≤ i ≤ n: The first term on the right hand side may be approximated using the forward filter by noting that: In the forward pass described in Section 2.1, a set of possible sequences of regimes a k 1:i−1 associated with importance weights ω k i−1 , 1 ≤ k ≤ N are sampled to approximate p(a i−1 , z i−1 |y 1:i−1 ). This provides a normalized approximation p N (a i , z i |y 1:i−1 ) of p(a i , z i |y 1:i−1 ). Define Then, As the function (a i , z i ) → p(y i:n |a i , z i ) is not a probability density function, approximating the second term of (13) using SMC samples is not straightforward. The backward filter uses artificial densities to introduce a surrogate target density function which may be approximated recursively using SMC methods. Then, the forward and backward weighted samples are combined using (13) to approximate p(a i , z i |y 1:n ). Following [BDM10], for any probability densities (γ i ) 1≤i≤n , define the following joint probability densities: p n (a n , z n , y n ) := γ n (a n , z n )g(a n , z n ; y n ) ,p n (y n ) := J an=1 γ n (a n , z n )g(a n , z n ; y n )dz n , and, for all 1 ≤ i ≤ n − 1, p i (a i:n , z i:n , y i:n ) := γ i (a i , z i )p(y i:n |a i:n , z i:n )p(a i+1:n , z i+1:n |a i , z i ) , p i (y i:n ) := J ai:n=1 γ i (a i , z i )p(y i:n |a i:n , z i:n )p(a i+1:n , z i+1:n |a i , z i )dz i:n .
Note that this choice differs slightly from [BDM10] where it is advocated to set γ i as the product of two independent densities γ a i (a i ) and γ z i (z i ). As the accuracy of the algorithm relies heavily on a proper tuning of this artificial density, a more general choice of γ i is considered in this paper. By Lemma 1, these probability densities may be used to approximate the quantities p(y i:n |a i , z i ), 1 ≤ i ≤ n, in (13).
i (a i:n |y i:n )p(y i:n |a i:n , z i ) γ i (a i , z ′ )p(y i:n |a i:n , z ′ )dz ′ .
Proof. The proof is postponed to Appendix A.
By definition ofp i for all 1 ≤ i ≤ n, p i (a i:n , z i |y i:n ) ∝ γ i (a i , z i ) p(y i:n |a i:n , z i:n )p(a i+1:n , z i+1:n |a i , z i )dz i+1:n , This yields:p i (a i:n |y i:n ) ∝ n−1 u=i Q(a u , a u+1 ) γ i (a i , z i )p(y i:n |z i , a i:n )dz i .
A set of weighted trajectories (ã ℓ i:n ) 1≤ℓ≤N with importance weights (ω ℓ i ) 1≤ℓ≤N , 1 ≤ i ≤ n, may then be sampled recursively backward in time to produce a SMC approximation ofp(a i:n |y i:n ) as follows.
-For all 1 ≤ i ≤ n − 1, resample the set (ã ℓ i+1:n ) 1≤j≤N using the normalized weights (ω ℓ i+1 ) 1≤j≤N . Then, for 1 ≤ ℓ ≤ N , sampleã j i ∼q i (ã ℓ i+1:n , ·) and set: To obtain uniformly weighted samples at each time step, in the numerical experiments we use: By (15) and (16), i (a i:n |y i:n )p(y i:n |a i:n , z i ) γ i (a i , z ′ )p(y i:n |a i:n , z ′ )dz ′ , which suggests the following particle approximation p N (y i:n |a i , z i ) of p(y i:n |a i , z i ): p N (y i:n |a i , z i ) =p i (y i:n ) N ℓ=1ω ℓ i p(y i:n |ã ℓ i:n , z i ) γ i (ã ℓ i , z ′ )p(y i:n |ã ℓ i:n , z ′ )dz ′ δãℓ i (a i ) .
The conditional likelihood of the observations given the sequence of states p(y i:n |a i:n , z i ) can be computed explicitly using a Gaussian backward smoother; these computations are summarized in Lemma 2. In the numerical experiments, γ i (a i , z i ) is set as a mixture of Gaussian distributions.
Note that for such a choice, the integral γ i (a i , z ′ )p(y i:n |a i:n , z ′ )dz ′ may be computed explicitly, see Lemma 3. Then, combining (17) and (14) with (13) provides an approximation of p(a i , z i |y 1:n ) by merging the forward particles (a k i−1 ) 1≤k≤N with the backward particles (ã k i+1 ) 1≤k≤N , the support of this SMC approximation of p(a i , z i |y 1:n ) being (ã k i+1 ) 1≤k≤N . As noted in [FWT10, Secion 2.6], two-filter smoothers are prone to suffer from degeneracy issues when the algorithm associates forward particles at time i − 1 with backward particles at time i. The authors illustrate this issue in the case where the hidden state is an AR(2) process. To overcome the weakness of such standard two-filter approaches the particle rejuvenation proposed in Section 2.3.2 follows the idea introduced in [FWT10] where new particles at time i are sampled conditional on (a k 1:i−1 ) 1≤k≤N and on (ã k i+1:n ) 1≤k≤N and appropriately weighted. This allows to produce new particles at time i and to obtain a SMC approximation of p(a i , z i |y 1:n ) whose support is not restricted to (ã k i+1 ) 1≤k≤N . Section 2.3.2 exploits this idea in the specific case of linear and Gaussian models where explicit computations allows to produce an approximation using (a k 1:i−1 ) 1≤k≤N and (ã k i+1:n ) 1≤k≤N with support {1, . . . , J} and without any additional sampling steps.

Particle rejuvenation of two-filter based algorithms
For 2 ≤ i ≤ n − 1, particle rejuvenation relies on the explicit marginalization: p(a i , z i |y 1:n ) = ai−1 ai+1 zi−1 zi+1 where ψ n i (a i−1:i+1 , z i−1:i+1 ) is the smoothing distribution of the hidden regimes and states between time indices i − 1 and i + 1. Note that the EM algorithm requires the approximation of p(a i−1 , z i−1 , a i−1 , z i−1 |y 1:n ) in the E-step, this may be obtained following the same steps by marginalizing explicitly the linear states at time i − 2 and i + 1. Intermediate computations follow the same steps as for the approximation of p(a i , z i |y 1:n ). First, ψ n i may be decomposed as follows: where the proportionality is with respect to (a i−1:i+1 , z i−1:i+1 ). Then, by (18), the smoothing distribution p(a i , z i |y 1:n ) may be written as where m and g are defined in (3) and (4) and The backward pass described in Section 2.3.1 produces a sequence of statesã ℓ i+1:n associated with importance weightsω ℓ i+1 , 1 ≤ ℓ ≤ N which are used to approximate p(y i+1:n |a i+1 , z i+1 ). Plugging this approximation into (20) provides an approximation t N i (a i , z i , y i+1:n ) of t i (a i , z i , y i+1:n ) integrating over all possible choices (a i+1 , z i+1 ). These steps are then combined to form a non normalized SMC approximation of p(a i , z i |y 1:n ) using (19). The normalization of the SMC approximation of p(a i , z i |y 1:n ) is obtained by integrating over the states a i , z i , when p(a i , z i |y 1:i−1 ) and t i (a i , z i , y i+1:n ) are replaced by p N (a i , z i |y 1:i−1 ) and t N i (a i , z i , y i+1:n ) in (19). Our procedure allows to construct sequence of regimes with non-degenerated importance weights in the combination step. This procedure improves significantly [BDM10] where no marginalization of p(a i , z i |y 1:n ) over the states at times i − 1 and i + 1 is performed and where the proposed forward and backward paths are directly merged. This method often leads to importance weights which are close to be numerically degenerated. By Lemma 2, the SMC approximation p N (y i:n |a i , z i ) of p(y i:n |a i , z i ) is then given by: where (P ℓ i ) −1 :=P −1 i (ã ℓ i:n ),ν ℓ i :=ν i (ã ℓ i:n ) andc ℓ i :=c ℓ i (ã ℓ i:n ) are defined in Lemma 2. Define Then, by (20), the SMC approximation t N i (a i , z i , y i+1:n ) of t i (a i , z i , y i+1:n ) is given by: where ) . In the numerical experiments, γ i (a i , z i ) is set as a mixture of Gaussian distributions. As explained in Section 2.3.1, the integral zi+1 γ i+1 (ã ℓ i+1 , z)p(y i+1:n |ã ℓ i+1:n , z)dz may be computed explicitly, see Lemma 3.

Simulated data
This section highlights the improvements brought by the additional Rao-Blackwellization steps for the two-filter and the FFBS approximations of the marginal smoothing distributions in the case where the number of states is J = 2. The transition matrix Q is such that the probability of switching from one regime to the other is small, as expected for the WTI crude oil data, see Section 4. First, the algorithms are applied to a simple one-dimensional model with: The original FFBS algorithm of [LBS + 16] and the FFBS algorithm with rejuvenation proposed in this paper are used with N =Ñ = 25. For comparable computational costs, the two-filter method of [BDM10] and the method with rejuvenation are run with N = 100. The artificial distributions are chosen as γ i (a i , z i ) = p N (a i , z i |y 1:i−1 ) where p N (a i , z i |y 1:i−1 ) is defined by (14). All these algorithms are compared to the estimation obtained with the proposed FFBS algorithm with rejuvenation and 5000 particles considered as a benchmark value. Figure 1 displays the mean estimation error over 100 independent Monte Carlo runs. The estimation error is defined as the absolute difference between the benchmark value and the estimations given by all algorithms. In addition, Figure 2 displays the empirical variance of the estimation for each algorithm. Figure 1 and Figure 2 illustrate that in both cases the additional rejuvenation step improves the accuracy and the variability of SMC smoothers. In addition, even with a sharp choice for the artificial distributions γ i , 1 ≤ i ≤ n, FFBS based methods outperform two-filter based smoothers for this model. 4 Application to CME crude oil (WTI)

Model
Modeling commodity prices is a crucial step to valuate contingent claims related to energy markets and to optimize storage or extraction strategies. In [GE90,Sch97], the authors proposed a model where the spot price of a commodity(S t , t ≥ 0) depends on a second factor (δ t , t ≥ 0), referred to as the instantaneous convenience yield. This factor plays the role of dividends in equity markets and models the benefit of holding the physical commodity or the storage and maintenance costs required to keep the commodity. In this model, this convenience yield is described as an Ornstein-Uhlenbeck process:  where the parameter (r, σ, κ, α, η, ρ) are constant and ((W 1 t , W 2 t ), t ≥ 0) are standard Brownian motions. This model appears to be too restrictive as energy markets are not likely to revert to a single equilibrium value. This assumption is relaxed using Markov switching models to allow several possible regimes for the spot price and the convenience yield. Following [Alm16], the spot price and convenience yield are described in this paper as: where (with H ai−1 := H ai−1 H ′ ai−1 and τ = t i − t i−1 ): The observations are Wednesday future contracts of the West Texas Intermediate crude oil (WTI) traded in the Chicago Mercantile Exchange (CME) from 11 January 1995 to 13 November 2013. The contracts are numbered F 1 , F 2 , . . . , F 36 where F 1 (or front month) is the earliest delivery future contract, F 2 is the second earliest delivery future contract and so on. Among these 36 contracts, the four future contracts: F 1 , F 4 , F 6 , F 13 are used since their trading volumes and their impacts on the Term Structures are the most important (F 1 is the most liquid contract, F 13 characterizes the gap between prices over a one year period, F 4 and F 6 are intermediate future contracts that are mostly traded). As in [Alm16], we consider that each future contract has a fixed time to maturity: Therefore, the observations of the logfuture prices are given, for all 1 ≤ i ≤ n, by: where η i is a standard multivariate Gaussian random variable and: The model depends of the parameters: The aim of this section is to estimate θ and the posterior probabilities P(a k = j|Y 1:n ), 1 ≤ k ≤ n, 1 ≤ j ≤ J. Given the observations Y 1:n , the EM algorithm introduced in [DLR77] maximizes the incomplete data log-likelihood θ → ℓ n θ defined by where the complete data likelihood p θ is given by p θ (a 1:n , z 1:n , Y 1:n ) := π(a 1 )φ µ1,Σ1 (z 1 )g θ (a 1 , z 1 ; Denote by E θ [·|Y 1:n ] the conditional expectation given Y 1:n when the parameter value is set to θ. The EM algorithm iteratively builds a sequence {θ p } p≥0 of parameter estimates following the two steps: 1. E-step: compute θ → Q(θ, θ p ) := E θp [log p θ (a 1:n , Z 1:n , Y 1:n )|Y 1:n ] ; 2. M-step: choose θ p+1 as a maximizer of θ → Q(θ, θ p ).
All the conditional expectations involved in Q(θ, θ p ) are approximated using our two-filter algorithm with rejuvenation to define the SMC approximation θ → Q N (θ, θ p ) of θ → Q(θ, θ p ). As the function θ → Q N (θ, θ p ) cannot be maximized analytically, the M-step is performed numerically using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) introduced in [HO01]. This derivative-free optimization procedure is known to perform well in complex multimodal optimization settings, see e.g. [HK04].
The CMA-ES algorithm is used with an initial standard deviation for the parameters σ cmaes = 0.005, a number of selected search points µ cmaes = 20 and a population size λ cmaes = 100. The algorithm is stopped after 10000 iterations.
In Gibson-Schwartz model [GE90], a stronger backwardation effect implies a greater value for α for the same values of the other parameters. For the CME WTI Crude Oil, backwardation effect is more frequent than contango effect so that α 1 should be greater than α 2 . Therefore, this condition is imposed for all simulations in the CMA-ES algorithm. The results after 2500 iterations of the EM algorithm are given in Table 2. The estimated values and standard deviations are obtained with 50 independent runs of the algorithm. As expected, we obtain σ 1 ≥ σ 2 , α 1 ≥ α 2 , η 1 ≥ η 2  and ρ 1 ≥ ρ 2 at convergence of the EM algorithm. Moreover, Q(1, 1) > Q(2, 2) corresponds to the prediction that we did from the data description. The fact that σ 1 ≥ σ 2 and α 1 ≥ α 2 indicates the first regime (backwardation) characterized by a higher value in both volatility and equilibrium level of convenience yield, and the second regime (contango) characterized by a lower value in both volatility and equilibrium convenience yield level. This in accordance with the theory of storage that the volatility of the commodity spot price is high when the inventory is low, and the convenience yield is all the higher as inventory is low. Figure 3 compares the evolution of future 1M (the nearest contracts) to the term structure observed from CME WTI crude oil, defined as the difference of future 13M and future 1M (to avoid seasonality). The figure shows that it is not necessary to have an inverse relationship between the price of the nearest contract and the term structure. But when a significant drop in the price of the nearest contract occurs, the term structure increases (i.e. in contango).
The correlation between the spot price and the convenience yield is positive and high in both two regimes. This is an accordance to what as been observed in most commodity market, see [GE90]. The slope of future curve decreases in function of maturity. Figure 4 and 5 display the the estimated posterior probabilities of the regimes and the observed future slope. When the future curve is in backwardation (resp. contango), the model is expected to be in the first regime (resp. second regime), except for the period where the slope of the future curve is too small and in the period from December 2008 to April 2009 (beginning of the crisis).

Conclusions
This paper presents Rao-Blackwellized Sequential Monte Carlo methods to approximate smoothing distributions in conditionally linear and Gaussian state spaces in a common unifying framework. It also provides different techniques that could be used in the forward filtering pass to improve significantly the usual mixture Kalman filter. The filtering distributions are approximated at each time step by considering all possible offsprings of all ancestral trajectories before discarding degenerated paths instead of resampling the ancestral paths before propagating them at the next time step. The paper investigates the benefit of additional Rao-Blackwellization steps to sample new regimes at each time step conditional on the forward and backward particles. This rejuvenation step uses explicit integration of the hidden linear states before merging the forward and backward filters for two-filter based algorithms or before sampling new states backward in time for FFBS based methods. The paper displays some Monte Carlo experiments with simulated data to illustrate that this additional rejuvenation step improves the performance of the smoothing algorithms with no substantial additional computational costs. They are also applied to commodity markets using WTI crude oil data.