1 Introduction

Crude oil is without doubt one of the most important commodities, strongly tied to industrial and economic growth. Trading in crude oil is mainly organized via futures contracts but option contracts play an increasingly popular role in the daily trade, mainly for risk management purposes. Nowadays, there are two price benchmarks for crude oil, the Brent and the WTI (West Texas Intermediate). In this paper, we will focus on the WTI, which is also known as Texas light sweet (relatively low-density and low sulphur content). WTI is mainly traded in North America, and its futures contracts are mainly traded on the New York Mercantile Exchange (linked to the CME group). On a typical trading day about 1 million contracts are traded at the CME, the majority being futures and 10% of contracts being options. All futures on WTI traded on the CME have physical settlement, which means that these futures are tied to actual delivery when they mature, and the place of settlement is Cushing, Oklahoma.

The pricing of futures and options on commodities in a modern theoretical framework has been considered at least since Black (1976). Ramaswamy and Sundaresan (1985) extended Black’s ideas to value American style call options on futures contracts. Further, Shastri and Tandon (1986) suggested a way to price both American type call and put options on the Standard and Poor 500 index and the West German Mark futures contracts. Schwartz (1997) and Hilliard and Reis (1998) shifted the attention to the incorporation of a stochastic convenience yield into the modeling. The convenience yield takes account of certain costs and benefits that apply to holding the asset, and as Schwartz (1997) demonstrated, allowing this model parameter to follow an Ornstein–Uhlenbeck process provides the model with enough flexibility to produce a reasonable large class of shapes for the forward-curves. The Schwartz (1997) model is typically calibrated to futures prices by using the Kalman-Filter approach, taking account of the fact that the convenience yield and possibly the spot-price of the commodity are not directly observable.Footnote 1\(^{,}\)Footnote 2 Hilliard and Reis (1998) presented a formula that prices a European call option on a commodity in the Schwartz (1997) framework.

Our paper expands on Schwartz (1997) and Hilliard and Reis (1998) as such as that we estimate the implied spot price and convenience yield of the underlying commodity from options rather than futures. More specifically, we use the extended Kalman-Filter and prices of European call options on WTI crude oil futures to estimate the Schwartz (1997) model. The motivation for this lies in the fact that option prices carry far more information on the volatility structure of the underlying asset than futures do. The implied volatility smile as well as the existence and strong use among practitioners of local and stochastic volatility models provide plenty of evidence for this statement.

The remainder of the paper is organized as follows. The mathematical framework will be presented in the second section, while the econometric methodology used to estimate the model will follow in the third section. In the fourth section, the data and general assumptions for this paper will be described in detail. Empirical results and conclusions will be discussed in the last section.

2 Mathematical model

Let us recall from the Schwartz (1997) two-factor model that the dynamics of the commodity spot price and convenience yield are given by the following stochastic differential equations:

$$\begin{aligned} dS/S= & {} (\mu -\delta ) dt+\sigma _{1} dZ_{1} \\ d\delta= & {} \kappa (\alpha -\delta )dt+\sigma _{2}dZ_{2}. \end{aligned}$$

With positive correlation \(\rho = dZ_{1}dZ_{2}\), this model generates a mean reversion effect in spot and convenience yield. This is typically present in commodity prices and explained in Schwartz (1997). For pricing of commodity contracts, the dynamics under the pricing measure is relevant. Following Schwartz (1997), we have

$$\begin{aligned} dS/S= & {} (r-\delta )dt+\sigma _{1} d\hat{Z}_{1} \\ d\delta= & {} (\kappa (\alpha -\delta )-\lambda )dt+ \sigma _{2}d\hat{Z}_{2}, \end{aligned}$$

where \(d\hat{Z}_{1}\) and \(d\hat{Z}_{2}\) are Brownian motions under the pricing measure and \(\lambda \) represents the market price of convenience yield risk and \(\rho = d\hat{Z}_{1}d\hat{Z}_{2}\).Footnote 3

Using a scaled market price of convenience yield risk \(\lambda = \tilde{\lambda }\sigma _{2} \) Hilliard and Reis (1998) compute the price of a futures with maturity T at time t asFootnote 4

$$\begin{aligned} F(S_{t},\delta _{t},t,T)=S_{t} A(\tau )e^{-H(\tau ) \delta _{t}}\frac{1}{P(t,T)} \end{aligned}$$

with

$$\begin{aligned} A(\tau )=\exp \left[ \frac{(H(\tau )-\tau )(\kappa ^{2} \alpha -\kappa \tilde{\lambda }\sigma _{2}-\sigma _{2}^2/2+\rho \sigma _{1}\sigma _{2}\kappa )}{\kappa ^{2}}-\frac{\sigma _{2}^2 H^{2}(\tau )}{4\kappa }\right] \end{aligned}$$

and

$$\begin{aligned} H(\tau )=\frac{1-e^{-\kappa \tau }}{\kappa }, \end{aligned}$$

where \(S_{t} \) is the current level of the spot price; \( \delta _{t}\) is the current level of the convenience yield; \( \tau =T-t \) is the length of time to maturity; and P(tT) is the price at time t of a zero-coupon bond with maturity at time T.

We have that

$$\begin{aligned} F_{S} S=F \end{aligned}$$

and

$$\begin{aligned} F_{\delta }=F H(\tau ). \end{aligned}$$

Based on Itô’s Lemma, the futures price follows the following dynamics under the risk-neutral measure:

$$\begin{aligned} dF= & {} \left[ F_{t}+\frac{1}{2}F_{SS}\sigma _{1}^{2}S^{2}+\frac{1}{2}F_{\delta \delta }\sigma _{2}^{2}+F_{S\delta }\rho S\sigma _{1}\sigma _{2}+F_{S}(r-\delta )S+F_{\delta }(\kappa (\alpha -\delta )-\lambda )\right] dt \\&+F_{S}\sigma _{1}Sd\hat{Z}_{1}+F_{\delta }\sigma _{2}d\hat{Z}_{2}, \end{aligned}$$

Setting the drift to zero one obtains the pricing PDE for the futures. Substituting \( F_{S} \) and \( F_{\delta } \), the futures’ price change can be rewritten as follows:

$$\begin{aligned} dF=F\sigma _{1}d\hat{Z}_{1}-FH(\tau )\sigma _{2}d\hat{Z}_{2}. \end{aligned}$$

Further, if we define a new Brownian motion \( Z_{F} \) and volatility \( \sigma _{F} \) through

$$\begin{aligned} \sigma _{F}d\hat{Z}_{F}\equiv {\sigma _{1}d\hat{Z}_{1}-H(\tau )\sigma _{2}d\hat{Z}_{2}}, \end{aligned}$$

then, it is not hard to obtain the following:

$$\begin{aligned} dF/F=\sigma _{F}d\hat{Z}_{F}, \end{aligned}$$

whereFootnote 5

$$\begin{aligned} \sigma _{F}^{2}(\sigma _{1},\sigma _{2},\rho ,\kappa ,\tau )=\sigma _{1}^{2}+\sigma _{2}^{2}H(\tau )^2-2\rho \sigma _{1}\sigma _{2}H(\tau ). \end{aligned}$$

This representation enabled Hilliard and Reis (1998) to provide an explicit expression for the price of a European call option in Schwartz’ (1997) model:

$$\begin{aligned} C(t,T_{1},T)=P(t,T_{1})[F(t,T)N(d_{1})-KN(d_{2})], \end{aligned}$$

where \( d_{1}=\frac{\ln (F(t,T)/K)+0.5\upsilon ^{2}}{\upsilon } \), \(d_{2}=d_{1}-\upsilon \). Here t is the current time; \( T_{1} \) is the time of maturity of the call option on the futures contract; T is the time of maturity of the underlying futures contract, with \(T_{1}\le T \); F(tT) is the futures price at time t for maturity at time T; \( C(t,T_{1},T)\) is the price of a call with maturity \(T_{1}\) on the underlying futures contract with the maturity T at time t; \(P(t,T_{1})\) is the price of a zero-coupon bond with maturity \(T_{1}\) and K is the strike price of the European call option. Further \(\upsilon ^{2} \) can be computed as:Footnote 6

$$\begin{aligned} \upsilon ^{2}= & {} \sigma _{1}^{2}(T_{1}-t)-\frac{2\sigma _{1}\sigma _{2}\rho }{\kappa }\left[ (T_{1}-t)-\frac{e^{-\kappa (T-T_{1})}-e^{-\kappa (T-t)}}{\kappa }\right] \\&+\frac{\sigma _{2}^{2}}{\kappa }\left[ (T_{1}-t)-\frac{2}{\kappa }\left( e^{-\kappa (T-T_{1})}-e^{-\kappa (T-t)}\right) +\frac{1}{2\kappa } \left( e^{-2\kappa (T-T_{1})}-e^{-2\kappa (T-t)}\right) \right] . \end{aligned}$$

For our empirical analysis it is convenient to express Hilliard and Reis’ (1998) option pricing formula in the notation of Schwartz, as we will consider simultaneously options with different strikes and maturities at the same time. This reflects on choosing current time \(t=0\) at each instance and T respectively \(T_1\) becoming the time to maturity. By doing this, we eliminate t from the notation and obtain the following expression for the price of the European call option above

$$\begin{aligned} C(S,\delta ,T_{1},T)=P(t,T_{1})[F(S,\delta ,T)N(d_{1})-KN(d_{2})], \end{aligned}$$

where \( d_{1}=\frac{\ln (F(S,\delta ,T)/K)+0.5\upsilon ^{2}}{\upsilon }\) and \(d_{2}=d_{1}-\upsilon \). Following in the spirit of Schwartz (1997), we will in the following assume that option prices are directly observable, while the state variables S and \(\delta \) are hidden.Footnote 7 To get hold of these hidden state variables we will employ filtering theory.

3 The extended Kalman Filter algorithm

The classical Kalman Filter algorithm requires a linear setup and is hence not applicable to our options based framework. Instead, we will employ the extended Kalman Filter. The principles of the Kalman Filter algorithm and the extended Kalman Filter algorithm are the same, except that the latter involves linearization of some key functionals and approximations.

Before discussing the extended Kalman Filter, in order to introduce some notations, the basic principles behind filtering technology are explained in the following: Using Bayes theory, filters can use the information about current observations to predict the values of unobservable variables at the current time point, and then update the information and forecast the situation at the next time point (compare Pasricha 2006).

To be more specific, the process of filtering can be described as follows: In a state space model, we consider two parts, the state variable \( x_{k} \) for \( k=1,2,\ldots ,K \) and the observations \( z_{k} \) for \(k=1,2,\ldots ,K \), where K is the count of the observations of the time variable. Both are possibly multi-dimensional. Normally, progression is determined by the state dynamics, \(x_{k}=f_{k}(x_{k-1},v_{k}) \), where \( x_{k} \) and \( x_{k-1} \) are the respective states at time points k and \(k-1\) and \(v_k\) is the noise effecting the dynamics. Then \( x_{k} \) is following a first-order Markov process and \(x_{k}\mid {x_{k-1}}\sim {p_{x_{k}\mid {x_{k-1}}}(x_{k}\mid {x_{k-1}})}\). The relationship between state variable(s) and the observations can be described as \( z_{k}=h_{k}(x_{k},n_{k}) \), where \( x_{k} \) is the state variable(s) and \( n_{k} \) is the measurement noise at time k.

We denote with \( z_{1:{k}} \) the history of observations from the start of the time series until time k. We assume that the observations are conditionally independently given \( x_{k} \). For \( k\ge {1} \) the expression \( p(x_{k} \mid {x_{k-1}}) \) presents the state transition probability. Essentially, the recursive filter consist of two steps: The first step is referred to as the prediction step, while the second step is called the update step. Schematically this comes down to

$$\begin{aligned} p(x_{k-1}\mid {z_{1:{k-1}}})\rightarrow & {} {p(x_{k}\mid {z_{1:{k-1}}})} \quad \text{ prediction } \text{ step } \\ p(x_{k}\mid {z_{1:{k-1}}}),z_{k}\rightarrow & {} {p(x_{k}\mid {z_{1:{k}}})} \quad \text{ update } \text{ step. } \end{aligned}$$

As for the prediction step, assume the probability density function \( p(x_{k-1}\mid {z_{1:{k-1}}}) \) is available at time point \(k-1\), using the Chapman–Kolmogorov equation, the prior probability of the state at time k can be expressed as:

$$\begin{aligned} p(x_{k}\mid {z_{1:{k-1}}})=\int { p(x_{k}\mid {x_{k-1}})p(x_{k-1}\mid {z_{1:{k-1}}})dx_{k-1}}. \end{aligned}$$

For the update step, the posterior probability density function is

$$\begin{aligned} p(x_{k}\mid {z_{1:{k}}})=\frac{p(z_{k}\mid {x_{k})p(x_{k}\mid {z_{1:{k-1}}})}}{p(z_{k}\mid {z_{1:{k-1}}})}, \end{aligned}$$

where

$$\begin{aligned} p(z_{k}\mid {z_{1:{k-1}}})=\int {p(z_{k}\mid {x_{k})p(x_{k}\mid {z_{1:{k-1}}})}dx_{k}}. \end{aligned}$$

The classical Kalman filter requires the state transition function as well as the measurement function to be linear and the noise terms to be normal distributed. Only in this case, prediction and update steps reflect conditional expectations and variances only. In the case of the extended Kalman filter, a local linearization of the aforementioned equations is carried out and the relevant conditional densities \( p(x_{k-1}\mid {z_{1:{k-1}}})\), \(p(x_{k}\mid {z_{1:{k-1}}}) \) and \( p(x_{k}\mid {z_{1:{k}}})\) are approximated by normal densities as follows:

$$\begin{aligned}&p(x_{k-1}\mid {z_{1:{k-1}}})\approx {N(x_{k-1};m_{k-1\mid {k-1}},P_{k-1\mid {k-1}})}, \\&\quad p(x_{k}\mid {z_{1:{k-1}}})\approx {N(x_{k};m_{k\mid {k-1}},P_{k\mid {k-1}})}, \end{aligned}$$

where \(m_{k-1\mid {k-1}}\) respectively \(m_{k\mid {k-1}}\) and \(P_{k-1\mid {k-1}}\) respectively \(P_{k\mid {k-1}}\) reflect expectations and variances of the relevant conditional distributions in approximation. We have

$$\begin{aligned}&p(x_{k}\mid {z_{1:{k}}})\approx {N(x_{k};m_{k\mid {k}},P_{k\mid {k}})} \\&\quad m_{k\mid {k-1}}=f_{k}(m_{k-1\mid {k-1}}) \\&\quad P_{k\mid {k-1}}=Q_{k-1}+\widehat{F_{k}}P_{k-1\mid {k-1}}\widehat{F_{k}}^{T} \\&\quad m_{k\mid {k}}=m_{k\mid {k-1}}+K_{k}(z_{k}-h_{k}(m_{k\mid {k-1}})), \end{aligned}$$

as well as

$$\begin{aligned} P_{k\mid {k}}=P_{k\mid {k-1}}-K_{k}\widehat{H_{k}}P_{k\mid {k-1}}=(I-K_{k}\widehat{H_{k}})P_{k\mid {k-1}}, \end{aligned}$$

where

$$\begin{aligned} K_{k}=P_{k\mid {k-1}}\widehat{H_{k}}^{T} \left( \widehat{H_{k}}P_{k\mid {k-1}}\widehat{H_{k}}^{T}+R_{k}\right) ^{-1} \end{aligned}$$

is known as gain, and

$$\begin{aligned} \widehat{F_{k}}=\frac{df_{k}(x)}{dx}\mid _{{x=m_{k-1\mid {k-1}}}}, \qquad \widehat{H_{k}}=\frac{dh_{k}(x)}{dx}\mid _{{x=m_{k-1\mid {k-1}}}} \end{aligned}$$

are the relevant Jacobian matrices. This process is known as the extended Kalman filter algorithm.

The likelihood function and the maximum likelihood value of the Kalman Filter is seen as a reasonable approximations of the likelihood function and the maximum likelihood value of the extended Kalman Filter. According to Harvey (1989), the joint density can be written as: \( L(z;\Psi )=\prod _{k=1}^{K}p(z_{k}) \), where \( p(z_{k})\) is the (joint) probability density function of the k-th set of observations, and \(\Psi \) is the set of unknown parameters, if the K sets of observations \(z_{1},\ldots ,z_{K} \) are independently and identically distributed. However, the sets of observations are generally not independent; therefore, the aforementioned \(L(z;\Psi )\) cannot be applied. In this case \(L(z;\Psi )=\prod _{k=1}^{K}p(z_{k}\mid {Z_{k-1}}) \), where the capital \(Z_{k-1} \) is denoted as \( Z_{k-1}=\{z_{k-1},z_{k-2},\ldots ,z_{1}\} \). The distribution of \(z_{k} \) conditional on \( Z_{k-1} \) is itself normal, if the initial state vector and the disturbances have multivariate normal distributions. Since the expectation of the \(z_{k}\) at time point \(k-1 \) is based only on the information at \(k-1\), the likelihood function can be written as

$$\begin{aligned} \log {L}=-\frac{NK}{2}\log {2\pi }-\frac{1}{2}\sum \nolimits _{k=1}^{K}{\log {\mid {D_{k}\mid }}-\frac{1}{2}}\log {\sum \nolimits _{k=1}^{K}v_{k}^{'}D_{k}^{-1}u_{k}} \end{aligned}$$

where \(u_{k}=z_{k}-z_{k\mid {k-1}} \) and \(D_{k}=H_{k}P_{k\mid {k-1}}H_{k}^{'}+R \).

4 Data and assumptions

All data used in our analysis have been collected from Bloomberg. More specifically, we investigate the prices of 15 European call options on WTI crude oil traded at the Chicago Mercantile Exchange (CME), which are monitored daily over one year from 1st March 2013 to 28th Feb 2014. Three different strike prices have been chosen: 102.5 dollars per barrel, 100 dollars per barrel and 97.5 dollars per barrel. In addition to this, each strike price corresponds to five calls differing in their time to maturity. The five options contracts maturities are one month, two months, five months, eight months and one year, respectively. Specifically, the calls used on the WTI mature in April 2014, May 2014, Aug 2014, Nov 2014, and March 2015, respectively. Prices for the corresponding futures are collected at the same time. The risk free rate has been obtained from treasury notesFootnote 8 maturing in the same months as the options, i.e. April 2014, May 2014, Aug 2014, Nov 2014, and March 2015, respectively.Footnote 9

5 Empirical results and conclusion

Estimated parameters from the various European calls are shown in Table 1. Overall, the various estimated parameters from the three options are rather similar to each other. To be more specific, \(\kappa \)’s are around 0.8; \(\alpha \)’s and \( \lambda \)’s are close to 0.05; \(\sigma _{1}\) and \( \sigma _{2}\) are around 0.45 and 0.5, respectively; and \(\rho \)’s are around 0.75. A slight upward trend for \(\sigma _1\) with increasing the strike can be observed, reflecting what is generally referred to as a volatility skew in the context of the Black–Scholes model. In contrast, the parameters estimated from the futures contracts are notably different, showing a significantly larger \(\sigma _1\) and smaller \(\sigma _2\), and also markedly different correlation as well as mean reversion level.Footnote 10

Table 1 The estimated parameters

Figure 1 exhibits the estimated spot price of WTI crude oil determined by the extended Kalman Filter applied to options with strike prices of 97.5, 100 and 102.5 dollars per barrel, and in comparison the estimated spot price of WTI crude oil by the classical Kalman Filter applied to futures as in Schwartz (1997). It is easy to see that the estimated spot prices for all three different strike prices of WTI crude oil fluctuate between 80 and 120 dollars per barrel in the test period, and all three spot prices estimated from options are extremely similar. The estimated spot from futures prices (yellow line) differentiates itself from that group by showing a less extreme behavior, i.e. above the other estimates, if low and below the other estimates if high.

Fig. 1
figure 1

Estimated spot price from calls and futures contracts. (Color figure online)

Figure 2 shows the estimated convenience yields for WTI crude oil obtained from options through the extended Kalman Filter for strike prices of 97.5, 100 and 102.5 dollars per barrel, as well as the estimated convenience yield for WTI crude oil obtained from futures through the classical Kalman Filter. The convenience yields corresponding to the three different strikes are all relatively closely aligned, fluctuating between 0 and about 0.3. Convenience yields corresponding to lower strikes appear to be slightly elevated. Prominently, at the beginning of the sample period the estimated convenience yields based on strikes \(K=100\) and \(K=102.5\) drop to about 0, while this peg does not appear in the call with strike \(K=97.5\). As for the estimated spot, the convenience yield obtained from the associated futures prices differs markedly, showing less extreme movements and less fluctuation.

Fig. 2
figure 2

Estimated convenience yield from calls and futures contracts

Fig. 3
figure 3

The term structures of the European calls at 50th testing day

Fig. 4
figure 4

The term structures of the European calls at 100th testing day

Figures 3, 4, and 5 show the term structures of the European calls, i.e. prices for increasing maturities, at different dates during the sample period. In all three pictures, the horizontal axis represents the number of months to maturity and the vertical axis represents the prices of the calls. As shown in the figures, the dotted lines represent the model-estimated prices corresponding to different calls based on different strike prices, where model calibration has been undertaken via the extended Kalman filter and option data as described earlier. The solid lines represent the observed prices corresponding to the different calls based on different strike prices. It is not hard to observe that the model-estimated prices are close to the observed prices of the calls in all three cases, whatever the strike prices are, the fit is excellent. However, in addition to these, the lines with plus signs lying far above the observed option prices represent the model prices obtained from the Schwartz (1997) model, calibrated to observed futures prices via the classical Kalman filter. As can be seen very clearly, this calibration performs poorly, providing clear evidence, that if the main objective is to price options in the Schwartz (1997) model, then the approach presented in our paper should be followed, rather than the standard Schwartz (1997) approach.

Fig. 5
figure 5

The term structures of the European calls at 200th testing day

Figures 6, 7 and 8 show model and observed prices over the sample period for nine calls maturing in April 2014, August 2014 and March 2015 with strike prices \(K=97.5\), \(K=100\) and \(K=102.5\). As can be seen, fit is excellent, except towards the end of the sample period for the option maturing in March 2015.

Fig. 6
figure 6

The observed and model prices of European calls with the strike price of K \(=\) 97.5

Fig. 7
figure 7

The observed and model prices of European calls with the strike price of K \(=\) 100

Fig. 8
figure 8

The observed and model prices of European calls with the strike price of K \(=\) 102.5

Figure 9 shows that all four parameter sets, i.e. those from three options with strikes \(K=97.5\), \(K=100\), and \(K=102.5\) as well as from the underlying futures contracts, produce a very good fit for pricing futures contracts.

Fig. 9
figure 9

The observed futures prices and the model-estimated futures prices based on the different strike prices of European style calls

6 Conclusions

Using the extended Kalman filter, we have estimated the Schwartz (1997) two-factor model by means of historical prices for options with differing maturities and strikes. We found that some parameter estimates significantly differ from those obtained via the classical Schwartz (1997) approach, which involves using the classical Kalman filter and futures prices. We then demonstrated, that while the parameter sets obtained from the options are also able to provide a good fit when pricing futures, the opposite does not hold. Using the parameters obtained via the classical Schwartz (1997) approach to price options produces a significantly poorer fit than our approach. Hence, when the objective is to price both futures and options simultaneously within the Schwartz (1997) framework, our recommendation is to fit the model to options rather than futures by using the extended Kalman filter.