Modeling and Forecasting Average Temperature for Weather Derivative Pricing

The main purpose of this paper is to present a feasible model for the daily average temperature in the area of Zhengzhou and apply it to weather derivatives pricing. We start by exploring the background of the weather derivatives market and then use 62 years of daily historical data to fit a mean-reverting Ornstein-Uhlenbeck process that describes the evolution of the temperature. Finally, Monte Carlo simulations are used to price a heating degree day (HDD) call option for this city, and the slow convergence of the price of the HDD call is observed over 100,000 simulations. The methods of this research provide a framework for modeling temperature and pricing weather derivatives in other similar places in China.


Introduction
A weather derivative is a new risk management tool that can be widely used in the financial market to hedge against the effects of bad weather and to control weather risks. The first weather derivative was introduced in the US in 1997, and the size of the market now exceeds 8 billion dollars. Weather derivatives differ from traditional financial derivatives in that their underlying, such as temperature, humidity, or precipitation, cannot be traded in the market, so ordinary pricing models such as the Black-Scholes formula are not applicable to them. As a consequence, the valuation of weather derivatives is generally carried out in a lot of paperwork. Weather derivatives can be based on any meteorological index, such as temperature, rain, wind, and snow, but the most commonly used weather variable is temperature.
A classical approach to deriving the price of a temperature derivative is to perform simulations based on historical data, known as HBA (History Burn Analysis). HBA is very easy to calculate, since there is no need to fit the distribution of the temperature or to solve stochastic differential equations. However, this method can only price the contract roughly, since its underlying assumption of a stationary temperature series is usually not correct. The temperature time series is evidently not stationary but contains seasonality, jumps, and trends [1][2][3][4], so this method is bound to be biased and inaccurate.
Early studies tried to model temperature indices such as heating degree days (HDDs) and cooling degree days (CDDs) directly. Geman and Leonardi [5] analyzed the statistical properties of both HDD and AccHDD indices. They concluded that modeling the HDDs directly is not appropriate. Wimmer [6] tested the validity of index modeling in forecasting the temperature index and pricing temperature futures. The results show that a model without a linear trend is better at forecasting the price of temperature futures.
Recently, more studies have focused on simulating the dynamics of temperature directly. The estimated models can be used to derive the corresponding indices and to price temperature derivatives. Daily models are more accurate than the HBA [7], since calculating a temperature index such as HDDs directly may lose a lot of information in both common and extreme events. Because the observations of temperature are discrete, a discrete process can be used directly. In [8], a mean-reverting discrete process and a general AR(p) were compared, and the author concluded that the distribution of the residuals is not constant through the year. Cao and Wei [9] adapt the framework proposed by Lucas [10] to derive a valuation framework for temperature derivatives and to study the market price of risk. Campbell and Diebold [11] expand the model proposed by Cao and Wei [12]. They use a low-order Fourier series with autoregressive lags to model the seasonal mean temperature. In [13], various models were compared in modeling and forecasting DAT (daily average temperature). These models were modified versions of the model originally proposed by Campbell and Diebold [14], and the results indicate that the modified models outperform the original model.
Since temperature exhibits strong and clear seasonality, the mean-reverting O-U process is proposed in most of the weather derivatives literature. Dischel [15] argued that the classical Black-Scholes-Merton pricing approach cannot be directly applied in weather derivatives pricing. He was the first to propose a continuous stochastic model for temperature forecasting. Dornier and Queruel [16] use a more general ARMA model than the AR(1) model proposed by [17], where Benth and Saltyte-Benth fitted Norwegian data by modeling the DAT variations with a mean-reverting O-U process where the noise process is modeled by a generalized hyperbolic Levy process. Instead of the FBM used in their previous work, they expanded the work of Dornier and Queruel. In order to rectify the rejection of the normality hypothesis, Zapranis and Alexandridis [18] replaced the simple AR(1) model with more complex ones. Their results for the DAT in Paris indicate that, as the model gets more complex, the noise part draws away from the normal distribution. Swishchuk and Cui [19] applied two daily average temperature models to data from Canadian cities and derived their derivative pricing applications. Göncü [20] proposes a seasonal volatility model that estimates daily average temperatures of Beijing, Shanghai, and Shenzhen using the mean-reverting Ornstein-Uhlenbeck process and then derives analytical approximation formulas for the sensitivities of these contracts. The results verify the convergence of the Monte Carlo and approximation estimators.
To our knowledge, little literature has been developed for the temperature model of Henan province, which is the most important state granary in China and is also a major financial market of China. In order to ensure food security, it is necessary to transfer the weather risk of Henan province into financial markets through developing weather derivatives. So the main objective of our research is to take the first step to construct weather models for developing weather derivatives in Henan province. We construct a temperature model for the first time for Zhengzhou (see Figure 1), which is the capital of Henan province.
The remainder of the paper is organized as follows. We introduce the basic concepts of weather derivative in Section 2. Section 3 proposes a seasonal mean and volatility model that describes the daily average temperature behavior using the mean-reverting Ornstein-Uhlenbeck process. The unknown parameters of the process will be estimated with 62 years of daily average temperature from Zhengzhou (57083) meteorological station. Then we discuss the pricing model of HDD option based on the temperature forecasting model and price the HDD option with Monte Carlo simulation in

Basic Concepts of Weather Derivative
Weather derivatives are usually structured as swaps, futures, and options based on different underlying weather indexes. Some commonly used indices are HDD, CDD, precipitation, wind, and snowfall. For the sake of clarity, we focus our analysis on derivative products whose underlying is the level of cumulated daily temperatures over a given period, the HDD and CDD.
The daily average temperature is calculated by averaging each day's maximum and minimum temperature, measured from midnight to midnight. Given a weather station, let $T_i^{\max}$ and $T_i^{\min}$ denote, respectively, the maximum and minimum temperatures measured on day $i$. We define the mean temperature of day $i$ by
$$T_i = \frac{T_i^{\max} + T_i^{\min}}{2}.$$
For a given site, the degree days are the difference of the daily average temperature from the base temperature (in general 65 degrees Fahrenheit or 18 degrees Celsius). An HDD is the number of degrees by which the day's average temperature is below the base temperature, while a CDD is the number of degrees by which the day's average temperature is above the base temperature. Cooling degree days and heating degree days are never negative. There cannot be both heating and cooling degree days in a single day, given that the daily average temperature can only be either above or below 65 degrees. Thus, if the average daily temperature is less than 65 degrees, HDD will accumulate for the period, and if the average daily temperature is greater than 65 degrees, CDD will accumulate. Simply put, HDD and CDD are calculated as follows:
$$\mathrm{HDD}_i = \max(T_{\text{base}} - T_i, 0), \qquad \mathrm{CDD}_i = \max(T_i - T_{\text{base}}, 0),$$
where $T_{\text{base}}$ is the base temperature.
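The daily degree-day definitions above can be illustrated with a minimal Python sketch. The function names and the default base of 18 degrees Celsius are choices made for this illustration only.

```python
def daily_average(t_max, t_min):
    """Daily average temperature: midpoint of the day's maximum and minimum."""
    return (t_max + t_min) / 2.0


def hdd(t_avg, t_base=18.0):
    """Heating degree days of one day: degrees below the base, never negative."""
    return max(t_base - t_avg, 0.0)


def cdd(t_avg, t_base=18.0):
    """Cooling degree days of one day: degrees above the base, never negative."""
    return max(t_avg - t_base, 0.0)


# A cold day (max 10, min 2) yields heating degree days only.
cold_day = daily_average(10.0, 2.0)
print(hdd(cold_day), cdd(cold_day))
```

At most one of the two quantities is nonzero on any given day, which matches the statement that heating and cooling degree days cannot occur together.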

Advances in Meteorology 3
The HDD and CDD indexes are the accumulated daily HDDs and CDDs, respectively, over a period of $n$ days:
$$H_n = \sum_{i=1}^{n} \mathrm{HDD}_i, \qquad C_n = \sum_{i=1}^{n} \mathrm{CDD}_i.$$
The CME's (Chicago Mercantile Exchange) HDD and CDD futures and options contracts are based on indexes of HDD and CDD. These indexes are accumulations of daily HDDs and CDDs over a calendar month or an entire season. There are two types of options, calls and puts. A call option allows an investor to protect himself against high index levels, and a put option allows a company to hedge against low index levels. The buyer of an HDD call pays the seller a premium at the beginning of the contract. In return, if the number of HDDs for the contract period is greater than the predetermined strike level, the buyer will receive a payout. The amount of the payout is determined by the strike level and the tick amount (the monetary value of each HDD exceeding the strike level of the option).
There are seven basic elements in a contract: (i) the option type (call or put); (ii) the underlying variable, HDD or CDD; (iii) the contract period; (iv) the meteorological station from which the temperature data are recorded; (v) the strike level; (vi) the tick size, the dollar amount attached to each HDD or CDD; (vii) the maximum payout.
For example, we evaluate the payout of an HDD call option. Let $K$ denote the strike level and $\alpha$ the tick amount. Then the payout of an HDD call over a contract period of $n$ days is
$$\alpha \max(H_n - K, 0), \qquad H_n = \sum_{i=1}^{n} \max(T_{\text{ref}} - T_i, 0),$$
where $T_{\text{ref}}$ is the reference temperature (in general between 18 °C and 20 °C for HDD and between 26 °C and 28 °C for CDD).
In order to avoid excessive payouts on the contract due to extreme weather, the options often come with a maximum payoff. Suppose $M$ is the maximum payoff. Then, with strike level $K$, tick amount $\alpha$, and HDD index $H_n$, the payout of the HDD call is
$$\min\bigl(\alpha \max(H_n - K, 0),\, M\bigr).$$
For simplicity, though, we will not consider the maximum payoff but focus on the uncapped payoff of the calls and puts.
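As a small worked example of the contract mechanics, the following Python sketch accumulates an HDD index and evaluates the call payout with an optional cap. The function names and the sample numbers are hypothetical and are not taken from the contract in this paper.

```python
def hdd_index(daily_avg_temps, t_ref=18.0):
    """Accumulated HDDs over the contract period."""
    return sum(max(t_ref - t, 0.0) for t in daily_avg_temps)


def hdd_call_payout(index, strike, tick, cap=None):
    """Payout of an HDD call: tick * max(index - strike, 0), optionally capped."""
    payout = tick * max(index - strike, 0.0)
    return payout if cap is None else min(payout, cap)


# Three days at 10, 20 and 15 degrees Celsius give 8 + 0 + 3 = 11 HDDs.
index = hdd_index([10.0, 20.0, 15.0])
print(index, hdd_call_payout(index, strike=5.0, tick=100.0))
print(hdd_call_payout(index, strike=5.0, tick=100.0, cap=500.0))
```

Passing `cap=None` reproduces the uncapped payoff that the rest of the paper works with.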

Modeling Temperature
As mentioned before, temperature derivatives are the most commonly traded weather derivatives on the market, so their modeling and pricing draws our attention. Some pricing models focus on the HDD or CDD index directly, while others attempt to model temperature directly. We choose to model temperature directly, since modeling HDD or CDD directly may lose a lot of information. Our basic pricing framework is as follows: (1) historical daily average temperature data collection; (2) making necessary corrections; (3) creation of a temperature forecasting model; (4) calculation of the price of the derivative using the Monte Carlo method.

Data Collection.
Zhengzhou is the capital city of Henan province and is located in its north central part. It is an important transportation hub and agricultural city of central China. Our dataset consists of daily temperatures from the Zhengzhou weather station (WMO ID: 57083, latitude: 34°43′N, longitude: 113°39′E, and elevation: 111 meters). We collect the data from the China meteorological data sharing service system over the period 1951-2013. In order to eliminate the influence of leap years, we simply drop the extra day of each leap year. Therefore, there are 365 daily average temperatures for every year. Correspondingly, we obtain a dataset of 63 years with 22995 observations and no missing data. Before proceeding to detailed modeling and forecasting results, it is useful to get an overall feel for the daily average temperature data. Figure 2 shows the daily average temperature of Zhengzhou from 1951 to 2013. The plot and table reveal strong and unsurprising seasonality in average temperature. The daily average temperature moves repeatedly and regularly through periods of high temperature (summer) and low temperature (winter).
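The leap-year correction can be sketched in a few lines of Python. The sketch assumes that the dropped day is 29 February, which the text does not state explicitly, and the function name is hypothetical.

```python
from datetime import date, timedelta


def drop_leap_days(start, values):
    """Pair values with consecutive calendar days from `start`, dropping 29 Feb.

    Assumption: the "additional day" of a leap year is taken to be 29 February.
    """
    kept = []
    for offset, value in enumerate(values):
        day = start + timedelta(days=offset)
        if not (day.month == 2 and day.day == 29):
            kept.append((day, value))
    return kept


# Placeholder series with one value per calendar day from 1951 through 2013.
n_days = (date(2014, 1, 1) - date(1951, 1, 1)).days
series = drop_leap_days(date(1951, 1, 1), [0.0] * n_days)
print(n_days, len(series))
```

Under this assumption the 1951-2013 calendar loses 16 leap days, which leaves the 63 × 365 = 22995 observations reported above.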
Furthermore, the probability distribution is described with a histogram in Figure 3, and Table 1 presents some descriptive statistics for the daily average temperatures. We can clearly see the basic temperature conditions of Zhengzhou. Here, we note that, unless stated otherwise, the temperature used in this paper is "temperature * 10".
In this work, we assume that the temperature data of 2013 are still unknown to us. We use the first 62 years of data (1951-2012) to construct the temperature forecasting model and use the real temperature data of 2013 to verify and validate the model.

Temperature Forecasting Model.
Considering the seasonal variation and long-term trends of temperature, we use the mean-reverting Ornstein-Uhlenbeck process to model the temperature behavior. In the long run, the daily average temperature tends to revert to a mean temperature. In order to test the stochastic fluctuation of the daily temperature, we use a Q-Q plot to examine the normality of the temperature differences. If the distribution of the difference sequence is normal, the Q-Q plot should be a straight line. Figure 4 shows that the temperature differences are approximately normal, so we model the fluctuation of the daily temperature as a Brownian motion.
If we denote by $T_t$ the average temperature at date $t$, the dynamics of the process are described by the stochastic differential equation
$$dT_t = a(\theta_t - T_t)\,dt + \sigma_t\,dW_t,$$
where $\theta_t$ is the long-term mean of the process, $a$ is the rate of mean reversion, $\sigma_t$ is the volatility of the fluctuations, and $W_t$ is a Brownian motion.
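In order to simulate trajectories of this process, we need to discretize this equation. With a time step of one day, Equation (1) can be written as
$$T_{t+1} = T_t + a(\theta_t - T_t) + \sigma_t \varepsilon_t,$$
where $\varepsilon_t \sim N(0, 1)$.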
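To make the discretization concrete, here is a minimal Python sketch of one simulated trajectory. It assumes the daily Euler form $T_{t+1} = T_t + a(\theta_t - T_t) + \sigma_t \varepsilon_t$; the seasonal mean, the constant volatility, and the value of `a` used below are illustrative placeholders, not the estimates of this paper.

```python
import math
import random


def simulate_path(t0, theta, sigma, a, rng):
    """Euler path of the mean-reverting process with a one-day time step.

    theta[t] and sigma[t] are the mean level and volatility on day t
    (assumed inputs); the returned list has len(theta) + 1 entries.
    """
    path = [t0]
    for mean_t, vol_t in zip(theta, sigma):
        eps = rng.gauss(0.0, 1.0)  # standard normal shock
        path.append(path[-1] + a * (mean_t - path[-1]) + vol_t * eps)
    return path


# Illustrative inputs only: a sinusoidal mean and a constant volatility.
omega = 2.0 * math.pi / 365.0
theta = [15.0 + 12.0 * math.sin(omega * t - 2.0) for t in range(365)]
sigma = [2.0] * 365
path = simulate_path(theta[0], theta, sigma, a=0.3, rng=random.Random(42))
print(len(path))
```

Setting the volatility to zero reduces the recursion to a deterministic pull toward the mean, which is a convenient way to check an implementation.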

Estimation of Mean Temperature.
The temperature movement has strong seasonality. Thus, we model this seasonal dependence with a sine function of the form $\sin(\omega t + \varphi)$, where $t$ denotes the time measured in days. Since the period of the oscillation is one year, we have $\omega = 2\pi/365$. Besides, owing to global warming and the urban heating effect, a positive trend should be added to the model as another component of the mean temperature. The mean temperature at time $t$ has the following form:
$$\theta_t = A + Bt + C\sin(\omega t + \varphi).$$
We can rewrite $\theta_t$ under the following form:
$$\theta_t = A + Bt + C\cos\varphi\,\sin(\omega t) + C\sin\varphi\,\cos(\omega t).$$
Then, it is possible to estimate the parameters by operating some changes of variables and renaming the constants. We can then write
$$\theta_t = \beta_1 + \beta_2 t + \beta_3\sin(\omega t) + \beta_4\cos(\omega t).$$
By applying the method of ordinary least squares to the observations of the daily temperatures, we can estimate the parameters $\beta_1, \ldots, \beta_4$. According to formula (4), we simulate the daily average temperature of Zhengzhou from January 1 to December 31 of 2013. Figure 5 shows that the simulated and actual values agree very well. In the long run, the fluctuation of the daily average temperature tends to revert to a mean temperature.

Estimation of the Volatility.
Benth and Saltyte-Benth [17] apply a Fourier expansion to $\sigma^2(t)$, which is assumed to be a deterministic function for simplicity, although a stochastic one is more reasonable. The expansion is as follows:
$$\sigma^2(t) = c + \sum_{i=1}^{I} c_i \sin(i\omega t) + \sum_{j=1}^{J} d_j \cos(j\omega t), \qquad \omega = \frac{2\pi}{365}.$$
Note that we assume $\sigma^2(t)$ to be a periodic function such that $\sigma^2(t + k \cdot 365) = \sigma^2(t)$ for $t = 1, \ldots, 365$ and $k = 1, 2, 3, \ldots$. In the approach developed by Bhowan [21], by contrast, the volatility is regarded as a stochastic process. A stochastic volatility may be more reasonable, but we prefer the simpler approach if a deterministic $\sigma^2(t)$ is adequate for temperature forecasting in Zhengzhou. Besides stochastic volatility, we think that a more complex volatility that evolves over the years may deserve further study.
We divide the data of the past 62 years (1951-2012) into 365 groups, one for each day of the year. Then we calculate the variance of data in each group. So we get 365 variances of data. Through a nonlinear regression, the estimation of the parameters is obtained in Table 2.
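Both regression steps, the trend-plus-sinusoid mean and the truncated Fourier series for the daily variance, can be sketched as least-squares fits on a small design matrix. The dependency-free Python sketch below is an illustration under assumptions: the coefficient ordering, the function names, and the use of linear least squares for the variance (the text mentions a nonlinear regression) are choices made here, and the sample data are synthetic.

```python
import math

OMEGA = 2.0 * math.pi / 365.0


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def least_squares(rows, targets):
    """Ordinary least squares through the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def fit_seasonal_mean(temps):
    """Fit b1 + b2*t + b3*sin(wt) + b4*cos(wt) with t = 1, 2, ..., len(temps)."""
    rows = [[1.0, float(t), math.sin(OMEGA * t), math.cos(OMEGA * t)]
            for t in range(1, len(temps) + 1)]
    return least_squares(rows, temps)


def daily_variances(temps):
    """Sample variance of each day-of-year group (365 groups, leap days removed)."""
    out = []
    for d in range(365):
        group = temps[d::365]
        mean = sum(group) / len(group)
        out.append(sum((x - mean) ** 2 for x in group) / (len(group) - 1))
    return out


def fit_variance_fourier(variances, order=1):
    """Fit c + sum_k [c_k sin(k w d) + d_k cos(k w d)] to 365 daily variances."""
    rows = []
    for d in range(1, 366):
        row = [1.0]
        for k in range(1, order + 1):
            row += [math.sin(k * OMEGA * d), math.cos(k * OMEGA * d)]
        rows.append(row)
    return least_squares(rows, variances)


# Synthetic two-year series with known coefficients, used only as a check.
demo = [150.0 + 0.001 * t + 120.0 * math.sin(OMEGA * t) - 30.0 * math.cos(OMEGA * t)
        for t in range(1, 731)]
print([round(b, 4) for b in fit_seasonal_mean(demo)])
```

The amplitude and phase of the original sine form follow from the fitted coefficients through $C = \sqrt{\beta_3^2 + \beta_4^2}$ and $\tan\varphi = \beta_4 / \beta_3$.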
As shown by Table 2, the high-frequency part of (7) is insignificant. The variation of $\sigma^2(t)$ is mainly dominated by the corresponding low-frequency part; that is, $\sigma^2(t)$ changes slowly with time. Based on the deterministic $\sigma^2(t)$, we may also further treat $\sigma^2(t)$ as a random variable, as in [22]. We omit the random $\sigma^2(t)$ in this work for simplicity, since the deterministic $\sigma^2(t)$ works well for the temperature model of Zhengzhou.

Estimation of the Mean Reverting Speed.
Following the derivations of [21,22], we present the estimation of the mean reverting speed.
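Given the daily observations $T_0, T_1, \ldots, T_n$, an unbiased estimator of $a$ is the zero of the martingale estimating function given by
$$G_n(a) = \sum_{i=1}^{n} \frac{\dot{b}(T_{i-1}; a)}{\sigma_{i-1}^2}\,\bigl(T_i - \mathbb{E}[T_i \mid T_{i-1}]\bigr),$$
where $b(T_t; a)$ denotes the drift term of the temperature process and $\dot{b}$ its derivative with respect to $a$. Solving $G_n(\hat{a}_n) = 0$ for $\hat{a}_n$ gives the estimator (18).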
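As a hedged illustration of how such an estimating equation is solved in practice, the Python sketch below uses the daily Euler approximation of the conditional mean, $\mathbb{E}[T_i \mid T_{i-1}] \approx T_{i-1} + a(\theta_{i-1} - T_{i-1})$, with $\theta$ denoting the mean temperature. This approximation is an assumption of the sketch, so the resulting closed form need not coincide with the estimator of [21,22]. The function name and the test series are hypothetical.

```python
def estimate_mean_reversion(temps, theta, sigma2):
    """Zero of the estimating function under the Euler conditional mean.

    With weights Y = (theta - T) / sigma^2 evaluated on the previous day,
    the equation sum Y * (T_i - T_{i-1} - a * (theta - T_{i-1})) = 0
    is linear in a and can be solved directly.
    """
    num = 0.0
    den = 0.0
    for i in range(1, len(temps)):
        gap = theta[i - 1] - temps[i - 1]
        weight = gap / sigma2[i - 1]
        num += weight * (temps[i] - temps[i - 1])
        den += weight * gap
    return num / den


# Noise-free series generated with a = 0.3; the estimate should recover it.
theta = [10.0 + (i % 7) for i in range(50)]
sigma2 = [1.0 + (i % 3) for i in range(50)]
temps = [0.0]
for i in range(1, 50):
    temps.append(temps[-1] + 0.3 * (theta[i - 1] - temps[-1]))
print(estimate_mean_reversion(temps, theta, sigma2))
```

Days with a large gap between the temperature and its mean, relative to that day's variance, carry the most weight in this estimate.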

Construction of the Temperature Model.
Based on the 22630 historical observations from 1951 to 2012, we obtain the unbiased estimate of the mean reverting speed $\hat{a} = 0.3198$. According to (7), we have
$$T_{t+1} = T_t + a(\theta_t - T_t) + \sigma_t \varepsilon_t, \tag{20}$$
where $\varepsilon_t \sim N(0, 1)$, $\theta_t$ is the mean temperature, and $\sigma_t$ is the volatility. We substitute the estimated $\theta_t$, $\sigma_t$, and $a$ into formula (20) and then run a Monte Carlo simulation. Figure 6 shows one of the comparisons between simulated and actual values. We can roughly see that the simulation agrees well with the real data of 2013 in three aspects: the mean, the volatility, and the mean reverting speed, which verifies the validity of the model we created.

Validation of the Model.
We validate the model using five performance measurements. The first four are scalar indices that assess a single simulation, while the last one measures the performance of 100,000 simulations. The first four measurements are mean bias error (MBE), root mean squared error (RMSE), unbiased absolute percentage error (UAPE), and mean proportionate error (MPE), respectively. They are defined in terms of the actual value of the daily temperature, $T_{\text{obs}}$, the estimated value of the daily temperature, $T_{\text{pred}}$, and the number of observations. MBE provides information on long-term bias, while RMSE indicates the difference between the daily temperatures generated by the stochastic process and the daily temperatures actually observed. UAPE is used to assess forecast error and to compare models on the basis of the scale of the error. The use of MPE is limited to examining a model's propensity to over- or under-forecast the actual value: a positive MPE indicates that the model under-forecasts, whereas a negative MPE indicates that it over-forecasts. Table 3 shows the first four summary measurements of the performance of our model in modeling the seasonal variance. They are MBE = 0.4301, RMSE = 5.1763, UAPE = 1.8839, and MPE = 0.2808. Although these indices are small but not very small, we still consider the proposed model a basically good approach to modeling the seasonal variance, since this is only a stochastic simulation. In fact, it is more important that the model is statistically significant for weather derivative pricing. We compare the predicted values obtained by our stochastic model with 100,000 simulations to the actual observations for Zhengzhou over the year 2013 (see Figure 7). The last measurement is the relative error, used as a standard statistical error [22] to evaluate the statistical performance of the model.
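Three of these scalar measures can be sketched in Python as follows. The sign conventions are assumptions of this sketch: MBE is taken as predicted minus observed, and MPE as (observed minus predicted) divided by observed, which is the convention under which a positive MPE means under-forecasting. UAPE is omitted because its definition is not recoverable from the text.

```python
import math


def mbe(obs, pred):
    """Mean bias error, assumed here as the mean of (predicted - observed)."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean squared error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mpe(obs, pred):
    """Mean proportionate error, assumed as the mean of (observed - predicted) / observed.

    Observations equal to zero must be excluded beforehand.
    """
    return sum((o - p) / o for o, p in zip(obs, pred)) / len(obs)


observed = [10.0, 20.0]
predicted = [8.0, 22.0]
print(mbe(observed, predicted), rmse(observed, predicted), mpe(observed, predicted))
```

In this toy case the two errors cancel in the bias but not in the RMSE, which is why the measures are reported together.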
It is given by the following:
$$ER = \frac{T_{\text{pred}} - T_{\text{obs}}}{T_{\text{obs}}},$$
where $T_{\text{pred}}$ and $T_{\text{obs}}$ refer, respectively, to the predicted and the observed values. Here, as in [22], we consider the ER only over monthly periods to measure the statistical performance of the model more clearly. In Figure 8, the monthly evolution of the normalized relative error shows that this error lies in the interval [−0.1, +0.1], with one jump from December to February, which is much smaller than the interval [−1, 1] in [22]. The jump corresponds to the transition between the cold and warm periods, where both the small $T_{\text{obs}}$ around 0 °C and the season transition lead to relatively large errors. However, the errors, including those of the transition period, which still lie in [−0.6, 0.1], are much smaller than the [−1.5, +2.5] reported for simulating the temperature of Casablanca-Anfa in 2004 in [22].
In terms of the standardized relative error, our model forecasts the temperature correctly with high statistical significance. We can also conclude that the simulation agrees well with the real data of 2013 in three aspects: the mean, the volatility, and the mean reverting speed, which verifies the validity of the model.

Option Pricing with Monte Carlo Simulation
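As for HDD call options, the price of the derivative at time $t_1$ is
$$C(t_1) = e^{-r(t_2 - t_1)}\,\mathbb{E}\bigl[N \max(H_n - K, 0)\bigr],$$
where $r$ is the discount interest rate, $t_2$ is the maturity date of the contract, $K$ is the strike level, $N$ is the principal nominal, and $H_n$ is the HDD index accumulated over the contract period.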
We use the Monte Carlo method to generate a set of paths and for each of these paths we calculate the payoff. Then, the average of the payoffs from all the generated paths will represent the expected price of the derivative.
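Then we get the option pricing formula as follows:
$$C(t_1) \approx e^{-r(t_2 - t_1)}\,\frac{1}{m}\sum_{j=1}^{m} N \max\bigl(H_n^{(j)} - K, 0\bigr),$$
where $m$ is the number of simulations, $H_n^{(j)}$ is the HDD index of the $j$th simulated path, $r$ is the discount interest rate, $t_2$ is the maturity date, $K$ is the strike level, and $N$ is the principal nominal. Suppose that there is an HDD call option, defined as in Table 4.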
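The Monte Carlo procedure can be sketched in Python as follows. The sketch assumes the daily Euler recursion for the temperature and works in degrees Celsius with a base of 18, whereas the paper stores temperatures multiplied by 10. All parameter values in the example are illustrative placeholders and are not the contract terms of Table 4.

```python
import math
import random


def price_hdd_call(theta, sigma, a, t0, strike, tick, rate, years_to_maturity,
                   n_sims, t_base=18.0, seed=0):
    """Running Monte Carlo estimates of the discounted expected HDD call payoff.

    Each path follows T[t+1] = T[t] + a*(theta[t] - T[t]) + sigma[t]*eps
    (an assumed Euler scheme). Entry j-1 of the returned list is the price
    estimate based on the first j paths.
    """
    rng = random.Random(seed)
    discount = math.exp(-rate * years_to_maturity)
    total = 0.0
    running = []
    for j in range(1, n_sims + 1):
        temp = t0
        index = 0.0
        for mean_t, vol_t in zip(theta, sigma):
            temp += a * (mean_t - temp) + vol_t * rng.gauss(0.0, 1.0)
            index += max(t_base - temp, 0.0)  # accumulate daily HDDs
        total += tick * max(index - strike, 0.0)
        running.append(discount * total / j)
    return running


# Illustrative 31-day winter contract with placeholder parameters.
estimates = price_hdd_call(theta=[1.0] * 31, sigma=[3.0] * 31, a=0.3, t0=1.0,
                           strike=500.0, tick=20.0, rate=0.05,
                           years_to_maturity=31 / 365, n_sims=2000)
print(estimates[99], estimates[-1])
```

Returning the running estimates rather than a single number makes it easy to plot the estimate against the number of paths and inspect how slowly it settles.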
We run the Monte Carlo simulation 100 times, and the calculated HDD value for January 2013 in Zhengzhou is 500, while the actual HDD value is 498. The prediction error of our model is 0.3%, so the mean-reverting Ornstein-Uhlenbeck model we created is applicable. We then calculate that the price of the HDD call option at time $t_1$ is 5851.750. With 100,000 simulations, Figure 9 shows that the price of the HDD call converges slowly.

Conclusions
China's climate varies drastically because the country is large, and weather events affect the profits of many of its industries. As a new risk management instrument, the weather derivative market is developing rapidly, so the introduction of weather derivatives can bring significant benefits to various weather-affected industries in China. As weather is not a tradable asset, it is a relatively complicated task to build a "fair" pricing model for weather derivatives. For a new market, especially in China, it is essential that a transparent and reliable approach to pricing is put in place so that the participants can feel secure in their transactions. In this study, a unified approach to pricing is established for the weather derivative marketplace, and we mainly worked on a stochastic model that describes the behavior of the temperature chosen as the underlying of the HDD call weather derivative. Taking Zhengzhou as an example, we establish a stochastic model from 62 years of historical temperatures and use the Monte Carlo method for the HDD call pricing. The model describing the stochastic behavior of the temperature is validated on the basis of temperatures observed during the year 2013. As shown by the monthly relative errors, the proposed mean-reverting Ornstein-Uhlenbeck model can reasonably describe the daily temperature data of Zhengzhou in terms of the mean, the volatility, and the mean reverting speed. Therefore, the mean-reverting Ornstein-Uhlenbeck model, fitted to 62 years of temperature data in Zhengzhou, enabled us to simulate a temperature weather derivative contract.
Here, we should mention that the physical process that forms temperature is extremely complex, especially for the complex regional climates of China. The study of the technical design of temperature options is not yet sufficient and still deserves more in-depth theoretical exploration and practice. More weather models should be developed for each geographic region for the weather derivative market in China. In addition, it is also necessary to develop more precise temperature models for weather derivatives pricing. Many other mathematical tools, such as wavelet functions, B-spline functions, and polynomial functions, can be used for a more accurate modeling of temperature data. It would be interesting to treat carefully the jumps of temperature due to extreme phenomena by integrating a term that describes their behavior into the stochastic model. This would make it possible to improve the performance of the daily scale modeling. Finally, we emphasize that it is also important to construct accurate weather models for other weather factors, such as rain, snow, or fog, since the risks implied by these factors also deeply affect the economic development of China.