Estimation of Smooth Functions via Convex Programs

Abstract One of the numerically preferred methods for fitting a function to noisy data when the underlying function is known to be smooth is to minimize the roughness of the fit while placing a limit on the sum of squared errors. We show that the fit can be formulated as a solution to a convex program. Since convex programs can be solved by various methods with guaranteed convergence, our formulation enables one to use these methods to compute the fit numerically. Numerical results show that our formulation is successfully applied to the problem of sensitivity estimation of option prices as functions of the underlying stock price.


Introduction
We consider the problem of fitting a function g : [a, b) → R to noisy data when the underlying function f * : [a, b) → R is assumed to satisfy a certain smoothness condition. We presume that we observe m noisy measurements Y i1 , . . . , Y im of f * at each x i ∈ [a, b) for 1 ≤ i ≤ n, and that for 1 ≤ i ≤ n and 1 ≤ j ≤ m, where ϵ i1 , . . . , ϵ im are independent and identically distributed (iid) with a mean of 0 and a variance of σ 2 i < ∞. We are particularly interested in the case where the underlying function f * is known to be k times differentiable (for k ≥ 2) and its kth derivative is square integrable. This situation naturally arises when the (k − 2)th derivative of f * is observed to be smooth, and hence, one tries to find the fit g by minimizing the roughness of the (k − 2)th derivative of the fit g. Since the "roughness" of a function f : [a, b) → R is measured by ∫ b a { f (2) (x) } 2 dx, the roughness of the (k − 2)th derivative of g is measured by } 2 dx, one needs to ensure that the fitted function is close enough to the estimated values of f * . This can be done by imposing a limit on the sum of squared distances between the fitted function and the estimates of f * as follows: This leads to the following formulation for computing the fit: Formulation (1) was introduced by Schoenberg (1964) and was studied extensively in the numerical analysis community by a number of authors including Reinsch (1967), Reinsch (1971), Wahba (1975), and Gander (1980). Formulation (1) is often contrasted, in the statistics literature, with the following formulation: www.ccsenet.org/ijsp International Journal of Statistics and Probability Vol. 5, No. 4;2016 over g ∈ D k for a sequence of nonnegative real numbers (λ n : n ≥ 1). In Formulation (2), the term measures the roughness of the fit g and the term ) 2 /n measures the sum of squared errors. The parameter λ n controls the trade-off between the roughness and the goodness-of-fit.
One of the advantages of Formulation (1) over Formulation (2) is that a good estimate of S is easily provided in Formulation (1), while the performance of Formulation (2) is highly sensitive to the choice of (λ n : n ≥ 1) and selecting (λ n : n ≥ 1) is not straightforward. For this reason, Formulation (1) is preferred in the numerical analysis community. A good estimate of S in Formulation (1) can be obtained using the laws of large numbers as follows. By the weak law of large numbers, ∑ m j=1 ϵ i j / √ m converges in distribution to N(0, σ 2 i ) as m increases to infinity, where N(µ, σ 2 ) denotes a normal random variable with a mean of µ and a variance of σ 2 . Hence, ϵ 2 i , where ϵ i = (1/m) ∑ m j=1 ϵ i j , can be approximated by (N(0, σ 2 i )) 2 /m for m sufficiently large. By applying the strong law of large numbers in n, ∑ n i=1 (N(0, σ 2 i )) 2 /n can be approximated as ∑ n i=1 σ 2 i /n for n sufficiently large. Therefore, the following approximation is possible for n and m sufficiently large. The symbol ≈ is used to informally express "approximate equality." In practice, σ 2 i is estimated by the sample variance . Despite its practical importance, there exist few numerical procedures that compute the solution of Formulation (1) with guaranteed convergence. Traditionally, the solution of (1) is computed as follows. For each S > 0, there is a unique λ n = λ n (S ) such that the solution of (2) for this λ n is the solution of (1). Furthermore, the solution of (2) for this λ n , ) 2 /n = S . Since (2) can be solved by solving a set of linear systems (pages 410-412 of Györfi et al., 2002), λ n (S ) can be computed iteratively by means of the Newton procedure starting from an initial guess of λ n (S ). This procedure does not guarantee global convergence to the solution of (1); see Reinsch (1971).
In this paper, we show that (1) can be reformulated as a convex program (Proposition 1). Convex programs can be solved using various methods that guarantee global convergence to the solution; see the Lagrangian method on page 217 of Zangwill (1969) for an example of methods that solve convex programs. Our formulation thus enables one to compute the solution of (1) with guaranteed convergence by using those methods and powerful software packages that are designed to solve convex programs.
This paper is organized as follows. Section 2 describes the proposed formulation in detail. Numerical results in Section 3 illustrate that our formulation is successfully applied to the problem of sensitivity estimation of option prices as functions of the underlying stock price. Concluding remarks are included in Section 4.

Proposed Formulation
In this section, we describe how Formulation (1) can be reformulated as a convex program. We first present some preliminary results.
A spline function with degree r > 1 with knots We denote the set of spline functions with degree r by S r ([a, b)). Even though S r ([a, b)) seems to be infinite dimensional, it turns out to be finite dimensional with the dimension equal to r + n + 1. We describe one of the bases for S r ([a, b)), which is the set of B-splines; the B-splines are preferred in numerical studies since they have bounded supports, and hence, produce well-conditioned numerical settings. We introduce additional knots x −r , . . . , x 0 , x n+1 , . . . , x n+r+1 so that The B-spline B i,r of degree r is defined recursively by for i = −r, . . . , n + r and x ∈ R and International Journal of Statistics and Probability Vol. 5, No. 4;2016 for i = −r, . . . , n + r − l, l = 1, . . . , r, and x ∈ R. By Theorem 14.1 on page 262 of Györfi et al. (2002), We are ready to present the main result of this paper.

dx can be computed by evaluating B (k)
i,2k−1 using the recursion in (8) and by numerically evaluating the integration. Second, one can use the closed form formula for the integration of the product of the kth derivatives of the B-splines given by Equation (7) on page 1026 of Vermeulen et al. (1992) and the closed form formulas for the B-splines (Equation (1.20) on page 8 of Dierckx, 1993) to directly compute the kth derivatives and the integration of their products.

A Motivation from Finance
This paper is motivated by the need to estimate the price of a stock option and its derivatives as functions of the underlying stock price. The first and second derivatives of an option price play an important role when financial institutions manage a portfolio of stocks and stock options in an attempt to hedge the risks associated with the portfolio. For example, consider a call option that gives the holder of the option the right to buy the underlying stock by a certain date for a certain price. Since the price of such a call option depends on the underlying stock price, the option price can be denoted by f * (x), where x is the underlying stock price per share. The delta (∆) of the option is defined by the first derivative d f * /dx of the option price with respect to the underlying stock price. It is well known that a portfolio consisting of a short position of the call option and a long position of ∆ shares of the underlying stock is expected to grow at a risk-free interest rate. Since the value of delta changes as the underlying stock price changes over time, the number of shares of the underlying stock in the portfolio must be changed periodically to stay in the risk-free position. This step is called rebalancing. When rebalancing a portfolio, the gamma (Γ) of the option, which is the second derivative d 2 f * /dx 2 of the option price with respect to the underlying stock price, is used since the value of gamma tells us how much delta changes, and hence, how many shares of the underlying stock should be sold or bought in order to stay in a delta neutral position.
Recently, financial institutions have been issuing stock options with much more complex payoff structures than that of a call option. For such options, the option price cannot be expressed in a closed-form formula, and hence, one needs to use simulation to estimate the option price. Simulation of the option price consumes a significant amount of time. In order to facilitate quick decisions, traders in financial institutions conduct simulations before they actually need to rebalance a portfolio. Since the underlying stock price in the future cannot be predicted accurately, the traders conduct simulations for all possible underlying stock prices on the day when rebalancing takes place. The question thus boils down to how to estimate the option price f * and its first and second derivatives, d f * /dx and d 2 f * /dx 2 , over a range [a, b) of possible underlying stock prices using simulation.
One simple strategy for estimating f * , d f * /dx and d 2 f * /dx 2 for x ∈ [a, b) is to choose various possible values for the stock price, say x 1 , . . . , x n , from [a, b), estimate the option price at each x i for 1 ≤ i ≤ n using simulation, use finite differences of the estimated option prices to estimate delta, and use finite differences of the estimated delta values to estimate gamma. A serious drawback of this approach is that the estimated delta and gamma values often lead to noisy curves as functions of the underlying stock price. It is especially frustrating for traders to see gamma values fluctuating around zero because positive gamma values suggest purchasing additional shares of the underlying stock, while negative gamma values suggest selling some of the shares. Figure 1 shows an example of the estimated gamma values plotted against the underlying stock price. The stock prices S 1 , S 2 , and S 3 are close to one another, but the graph suggests different strategies because the gamma values are positive, negative, and positive at S 1 , S 2 , and S 3 , respectively. This degree of randomness in the gamma curve is not acceptable in practice.
To overcome this drawback, we propose fitting a curve g : [a, b) → R to the estimated values of f * so that the fitted curve has a smooth second derivative. Since the "roughness" of a function f : } 2 dx, we want to make sure that the fitted function is close to the estimated values of f * by placing a limit on the sum of squared distance between the fitted values and the estimated values. This leads to the following optimization problem: , where Y i j is the jth replication of the estimated value of the option price at x i for 1 ≤ i ≤ n and 1 ≤ j ≤ m. The above formulation is a special case of Formulation (1) with k = 4. Estimate of * S1 S3 S2 Figure 1. The horizontal axis is the underlying stock price, and the vertical axis is gamma.

Applying Our Formulation to Sensitivity Estimation of Option Prices
We consider the case where f * (x) is the expected payoff of a certain equity-linked security (ELS) when the underlying stock price is denoted by x. The payoff function of the ELS has the following structure. Suppose that the ELS is issued at time 0 and matures at time T . We denote the underlying stock price at time t ∈ [0, T ] by S t . There are q days when early redemption is possible. On each of those days d i for 1 ≤ i ≤ q, the ELS expires with a payoff of $r i if S d i /S 0 exceeds some threshold b i . Otherwise, the ELS does not expire until maturity. If there is no early redemption and S t /S 0 does not drop below a limit b until maturity, then the ELS expires with a payoff of $1 at maturity. Otherwise, the ELS expires with a payoff of $S T /S 0 at maturity.
We let a = 90, b = 110, and x i = 90 + (20)(i/n) − (10/n) for 1 ≤ i ≤ n. For each x i , a sample path of a geometric Brownian motion is generated as a trajectory of the stock price between now and maturity, and the corresponding payoff of the ELS is computed. Y i j is the payoff computed this way in the jth replication of the geometric Brownian motion at x i . The parameters used for the experiment are T = 365 days, q = 6, d 1 = 61, d 2 = 122, d 3 = 182, d 4 = 243, d 5 = 304, d 6 = 365, b 1 = 0.9, b 2 = 0.9, b 3 = 0.85, b 4 = 0.85, b 5 = 0.8, b 6 = 0.8, r 1 = 1.05, r 2 = 1.10, r 3 = 1.15, r 4 = 1.20, r 5 = 1.25, r 6 = 1.30, and b = 0.7. The remaining time until maturity is 60 days, the annual volatility is 30%, the annual risk-free interest rate is 5%, and the initial stock price at time 0 is $125.
We set m = 50, so 50 sample paths for the geometric Brownian motion are generated at each x i to compute Y i1 , . . . , Y i50 for 1 ≤ i ≤ n. We compute Y i = ∑ 50 j=1 Y i j /50 for 1 ≤ i ≤ n and use (x 1 , Y 1 ), . . . , (x n , Y n ) to compute the proposed estimator g n by solving (6) with CVX, a package for specifying and solving convex programs (Grant & Boyd, 2014). The constant S in Formulation (6) is replaced with ∑ n i=1 S 2 i , where S 2 i is the sample variance of Y i1 , . . . , Y im for 1 ≤ i ≤ n. To measure the accuracy of the proposed estimator, we compute the following integrated mean square error (IMSE) between the underlying function f * andĝ n : ∑ n i=1 (ĝ n (x i ) − f * (x i )) 2 /n, where f * (x i ) is estimated from the average of 400, 000 iid replications of Y i j at each x i . Table 1 reports the averages and the standard deviation of the IMSE, computed based on 200 iid replications ofĝ n , for a variety of n values. The IMSE decreases as n increases, which displays the convergence ofĝ n to f * as n → ∞.

Concluding Remarks
In this paper, we study a numerically preferred formulation for fitting a smooth function to noisy data. Numerical results illustrate that our formulation successfully computes the fit. They also suggest that the fit converges to the true function as the number of observations in the data set increases to infinity. Future research topics include studies on asymptotic