Abstract
Optimizing risk measures such as Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR) of a general loss distribution is usually difficult, because (1) the loss function might lack structural properties such as convexity or differentiability, since it is often generated via black-box simulation of a stochastic system; and (2) evaluation of risk measures often requires rare-event simulation, which is computationally expensive. In this paper, we study the extension of the recently proposed gradient-based adaptive stochastic search to the optimization of the risk measures VaR and CVaR. Instead of optimizing VaR or CVaR at the target risk level directly, we incorporate an adaptive updating scheme on the risk level: the algorithm is initialized at a small risk level, which is adaptively increased so that the target risk level is reached as the algorithm converges. This enables us to adaptively reduce the number of samples required to estimate the risk measure at each iteration, thereby improving the overall efficiency of the algorithm.
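As a point of reference for the quantities being optimized, the empirical VaR and CVaR of a loss sample can be computed as follows (a minimal sketch, not the paper's algorithm; the function name is ours):

```python
import numpy as np

def empirical_var_cvar(losses, alpha):
    """Estimate VaR and CVaR at level alpha from i.i.d. loss samples.

    VaR_alpha is the empirical alpha-quantile of the loss; CVaR_alpha
    is the average of the losses at or beyond that quantile.
    """
    losses = np.sort(np.asarray(losses, dtype=float))
    n = len(losses)
    idx = int(np.ceil(alpha * n)) - 1   # index of the empirical alpha-quantile
    var = losses[idx]
    cvar = losses[idx:].mean()          # tail average at or beyond VaR
    return var, cvar
```

For a sample of losses 1, ..., 100 and \(\alpha = 0.95\), this returns VaR = 95 and CVaR = 97.5, the average of the six largest losses.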
References
Alexander, S., Coleman, T.F., Li, Y.: Minimizing CVaR and VaR for a portfolio of derivatives. J. Bank. Finance 30(2), 583–605 (2006)
Artzner, P., Delbaen, F., Eber, J.M., Heath, D.: Coherent measures of risk. Math. Finance 9, 203–228 (1999)
Borkar, V.S.: Stochastic Approximation. Cambridge University Press, Cambridge (2008)
Cappé, O., Moulines, E., Rydén, T.: Inference in Hidden Markov Models. Springer, New York (2005)
Dorigo, M., Blum, C.: Ant colony optimization theory: a survey. Theor. Comput. Sci. 344(2), 243–278 (2005)
Gordy, M.B., Juneja, S.: Nested simulation in portfolio risk measurement. Manag. Sci. 56(10), 1833–1848 (2010)
Hu, J., Fu, M.C., Marcus, S.I.: A model reference adaptive search method for global optimization. Oper. Res. 55(3), 549–568 (2007)
Hu, J., Fu, M.C., Marcus, S.I., et al.: A model reference adaptive search method for stochastic global optimization. Commun. Inf. Syst. 8(3), 245–276 (2008)
Kushner, H.: Stochastic approximation: a survey. Wiley Interdiscip. Rev. Comput. Stat. 2(1), 87–96 (2010)
Kushner, H., Yin, G.G.: Stochastic Approximation and Recursive Algorithms and Applications. Springer Science & Business Media, Berlin (2003)
Kushner, H.J., Clark, D.S.: Stochastic Approximation Methods for Constrained and Unconstrained Systems. Springer Science & Business Media, Berlin (2012)
Larranaga, P., Lozano, J.A.: Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Springer Science & Business Media, Berlin (2002)
Molvalioglu, O., Zabinsky, Z.B., Kohn, W.: The interacting-particle algorithm with dynamic heating and cooling. J. Glob. Optim. 43(2–3), 329–356 (2009)
Molvalioglu, O., Zabinsky, Z.B., Kohn, W.: Meta-control of an interacting-particle algorithm for global optimization. Nonlinear Anal. Hybrid Syst. 4(4), 659–671 (2010)
Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–42 (2000)
Rockafellar, R.T., Uryasev, S.: Conditional value-at-risk for general loss distributions. J. Bank. Finance 26(7), 1443–1471 (2002)
Romeijn, H.E., Smith, R.L.: Simulated annealing for constrained global optimization. J. Glob. Optim. 5(2), 101–126 (1994)
Rubinstein, R.Y.: Combinatorial optimization, cross-entropy, ants and rare events. Stoch. Optim. Algorithms Appl. 54, 303–363 (2001)
Ruszczyński, A.: Risk-averse dynamic programming for Markov decision processes. Math. Program. 125(2), 235–261 (2010)
Ruszczyński, A., Shapiro, A.: Optimization of convex risk functions. Math. Oper. Res. 31(3), 433–452 (2006)
Zhou, E., Hu, J.: Gradient-based adaptive stochastic search for non-differentiable optimization. IEEE Trans. Autom. Control 59(7), 1818–1832 (2014)
Zhu, H., Zhou, E.: Risk quantification in stochastic simulation under input uncertainty. e-prints: arXiv:1507.06015 (2016)
Acknowledgements
This work was supported by National Science Foundation under Grants CMMI-1413790 and CAREER CMMI-1453934, and Air Force Office of Scientific Research under Grant YIP FA-9550-14-1-0059.
A Proof of theorems
Proof
Proof of Lemma 4.1. Since \(S_{\theta _k}(\cdot )\) is continuous in both \({C}_{\alpha ^*}\) and \(\gamma _{\theta _k}\), it suffices to show that for all \(x\in \mathcal {X}\)
Let us first show the left part of the above statement. Recall that by Assumption 1.(iii), we have \(M_k\rightarrow \infty \) as \(k\rightarrow \infty \). Then, we only need to show that the one-layer CVaR estimator \(\widehat{C}_{\alpha ^*}(x)\) is strongly consistent. This holds by Lemma A.1 in [22] under Assumption 2.(i); note that Assumption 3.1 in [22] is satisfied by Assumption 2 here.
It remains to establish the right part of (A.1). In view of Assumptions 1.(ii) and 1.(iii), \(N_k\) and \(M_k\) go to infinity simultaneously as \(k\rightarrow \infty \). Therefore, it suffices to show
Note that
i.e., the \((1-\rho )\)-level Value-at-Risk (VaR) of \((-C_{\alpha ^*}(x))\) w.r.t. \(f(x;\theta _k)\). Furthermore,
i.e., the sample \((1-\rho )\)-quantile of \(\{-\widehat{C}_{\alpha ^*}(x_k^i): i=1,\ldots ,N_k\}\). Therefore, \(\widehat{\gamma }_{\theta _k}\) is a nested estimator of \(\gamma _{\theta _k}\), where \(N_k\) outer-layer samples are drawn, and for each outer-layer sample \(M_k\) inner-layer samples are drawn.
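The two-layer sampling scheme described above can be sketched as follows (an illustrative outline only, with hypothetical helpers `sample_x` and `inner_loss`; it mirrors the nested outer/inner structure, not the paper's full algorithm):

```python
import numpy as np

def nested_var_of_cvar(sample_x, inner_loss, alpha, rho, N, M, seed=None):
    """Nested estimator sketch: draw N outer-layer solutions, estimate
    CVaR at level alpha for each using M inner-layer loss samples, then
    take the sample (1 - rho)-quantile of the negated CVaR estimates."""
    rng = np.random.default_rng(seed)
    cvar_hats = np.empty(N)
    for i in range(N):
        x = sample_x(rng)                        # outer-layer draw x_i
        losses = np.sort(inner_loss(x, M, rng))  # M inner-layer loss samples
        idx = int(np.ceil(alpha * M)) - 1
        cvar_hats[i] = losses[idx:].mean()       # one-layer CVaR estimate
    return np.quantile(-cvar_hats, 1.0 - rho)    # sample (1 - rho)-quantile
```

Here `sample_x` plays the role of drawing from \(f(\cdot ;\theta _k)\) and `inner_loss` of simulating the loss at a fixed solution; both are stand-ins for the paper's sampling distribution and black-box simulator.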
Rewrite \(\widehat{C}_{\alpha ^*}(x)\) as
where \(\mathcal {E}_k(x)\) is the standardized error. By Theorem 3.3 in [22], we have
where “\(\overset{\mathcal {D}}{\Rightarrow }\)” denotes convergence in distribution, and \(\mathcal {N}\left( 0, \sigma ^2(x)\right) \) denotes a normal distribution with mean zero and variance \(\sigma ^2(x)\), where the variance parameter \(\sigma ^2(x)\) depends only on x. Combined with (A.3), it follows that the standardized error \(\mathcal {E}_k(x)\) converges to \(\mathcal {N}\left( 0, \sigma ^2(x)\right) \) in distribution. Having established this, the remaining proof is identical to the proof of Theorem 3.2 in [22]; note that Assumption 2 here parallels Assumption 3.2 in [22]. For completeness, we list the main steps as follows.
1. Show that the p.d.f. of \(\widehat{C}_{\alpha ^*}(x)\) and its first-order derivative, which are induced jointly by the sampling distribution \(f(\cdot , \theta _k)\) and the noise \(\xi _x\), converge as \(k\rightarrow \infty \) to the p.d.f. of \(C_{\alpha ^*}(x)\) and its first-order derivative, respectively, where the p.d.f. of \(C_{\alpha ^*}(x)\) is induced by \(f(\cdot , \theta _k)\). [Lemma B.1 in [22]]
2. Show that the VaR of \(\widehat{C}_{\alpha ^*}(x)\) at risk level \(\rho \), denoted by \(VaR_{\rho }(\widehat{C}_{\alpha ^*}(x))\), converges to the VaR of \({C}_{\alpha ^*}(x)\) at risk level \(\rho \) at rate \(O(\frac{1}{M_k})\). This is done by a Taylor expansion analysis of the p.d.f. of \(\widehat{C}_{\alpha ^*}(x)\) and its derivative. [Lemma B.3 in [22]]
3. Show that the difference between the nested risk estimator \(\widehat{VaR}_{\rho }(\widehat{C}_{\alpha ^*}(x))\), i.e., \(\widehat{\gamma }_{\theta _k}\), and \(VaR_{\rho }(\widehat{C}_{\alpha ^*}(x))\) is of order \(O(\frac{1}{N_k} \log N_k)\) uniformly for all \(M_k\). [Lemma B.4 in [22]]
Proof
Proof of Lemma 4.2. With a slight abuse of notation, we also use \(\Vert A\Vert _2\) to denote the spectral norm of a real square matrix A induced by the vector Euclidean norm. In particular, \(\Vert A\Vert _2=\sqrt{\lambda _{max}(A^T A)}\), i.e., \(\Vert A\Vert _2\) is the square root of the largest eigenvalue of the positive-semidefinite matrix \(A^TA\). When the matrix A itself is positive-semidefinite, \(\Vert A\Vert _2\) is simply the largest eigenvalue of A.
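The identity \(\Vert A\Vert _2=\sqrt{\lambda _{max}(A^TA)}\) is easy to check numerically (a quick sanity check, not part of the proof):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])

# Spectral norm by definition: square root of the largest eigenvalue
# of the positive-semidefinite matrix A^T A.
spec_def = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())

# NumPy's matrix 2-norm computes the same quantity.
spec_np = np.linalg.norm(A, 2)
assert np.isclose(spec_def, spec_np)

# For a symmetric positive-semidefinite matrix, the spectral norm
# is simply its largest eigenvalue.
S = A.T @ A
assert np.isclose(np.linalg.norm(S, 2), np.linalg.eigvalsh(S).max())
```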
To facilitate the proof, let us also introduce the following notation:
Here note that \(\widetilde{\mathbb {Y}}_k\) and \(\widehat{\mathbb {Y}}_k\) are vectors because \(\Gamma (\cdot )\) is vector-valued, while \(\widetilde{\mathbb {Z}}_k\) and \(\widehat{\mathbb {Z}}_k\) are scalar-valued.
Since \(C_{\alpha ^*}(x)\) and \(\Gamma (x)\) are both bounded on \(\mathcal {X}\), we immediately have that \(|\widetilde{\mathbb {Z}}_k|\) is bounded away from zero and \(\frac{\Vert \widehat{\mathbb {Y}}_k\Vert _2}{| \widehat{\mathbb {Z}}_k| } \) is bounded for all k. Note that
Therefore,
Recall that \(\widehat{V}_k=\left( \widehat{Var}_{\theta _k}[\Gamma (x)]+\epsilon I\right) \). Thus, \(\widehat{V}_k\) is a positive-definite matrix and its minimum eigenvalue is at least \(\epsilon \). It follows that the maximum eigenvalue of \(\widehat{V}_k^{-1}\) is no greater than \(\epsilon ^{-1}\), i.e., \(\Vert \widehat{V}_k^{-1}\Vert _2\le \epsilon ^{-1}\). Since \(|\widetilde{\mathbb {Z}}_k|\) is bounded away from zero, \(\frac{\Vert \widehat{\mathbb {Y}}_k\Vert _2}{| \widehat{\mathbb {Z}}_k| }\) is bounded, and \(\Gamma (x)\) is bounded on \(\mathcal {X}\), Lemma 4.1 implies that \(\Vert b_k\Vert _2\rightarrow 0\) w.p.1 as \(k\rightarrow \infty \).
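The eigenvalue bound used here, namely that adding \(\epsilon I\) to a positive-semidefinite matrix forces \(\Vert \widehat{V}_k^{-1}\Vert _2\le \epsilon ^{-1}\), can be verified numerically (a sanity check with an arbitrary sample covariance matrix standing in for \(\widehat{Var}_{\theta _k}[\Gamma (x)]\)):

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.1

# A sample covariance matrix is positive-semidefinite; adding eps*I
# makes it positive definite with minimum eigenvalue at least eps.
G = rng.standard_normal((50, 4))
V = np.cov(G, rowvar=False) + eps * np.eye(4)

min_eig = np.linalg.eigvalsh(V).min()
assert min_eig >= eps - 1e-12

# Hence the spectral norm of the inverse is at most 1/eps.
inv_norm = np.linalg.norm(np.linalg.inv(V), 2)
assert inv_norm <= 1.0 / eps + 1e-9
```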
Proof
Proof of Theorem 4.2. Let us first show the following lemma.
Lemma A.1
Suppose Assumptions 1 and 2 hold. Further suppose the risk level sequence \(\{\alpha _k\}\) generated by (3.2) converges to the target risk level \(\alpha ^*\) w.p.1. Then the sequence \(\{\theta _k\}\) generated by (4.8) converges to a limit set of the ODE (4.7) w.p.1.
Proof of Lemma A.1. Similar to the proof of Theorem 4.1, we reformulate the updating scheme (4.8) as a noisy discretization of the constrained ODE (4.7), and show that both the bias and the noise are properly bounded. Specifically, rewrite (4.8) as
where \(G(\theta _k)\) and \(e_k\) are defined as previously, \(\bar{b}_k\overset{\triangle }{=}\widehat{V}_k^{-1}\left( \bar{\mathbb {E}}_{q_k}[\Gamma (x)]- \widetilde{\mathbb {E}}_{q_k}[\Gamma (x)]\right) \), and \(\bar{p}_k\) is the projection error term that takes the current iterate back onto the constraint set \(\widetilde{\Theta }\) with minimum Euclidean norm. In view of Theorem 2 in [9], it suffices to show
To ease the presentation, let us denote
This immediately implies that \(\widetilde{\mathbb {E}}_{q_k}[\Gamma (x)]= \widetilde{\mathbb {E}}^{\alpha ^*}_{q_k}[\Gamma (x)]\). Furthermore,
Following an argument almost identical to the proof of Lemma 4.2, the first term in (A.5) converges to 0 w.p.1 as \(k\rightarrow \infty \). Moreover, since \(S_{\theta _k}(\cdot )\) is a continuous function and \(C_{\alpha _k}(x)\) is continuous in \(\alpha _k\), \(\widetilde{\mathbb {E}}^{\alpha _k}_{q_k}[\Gamma (x)]\) is continuous in \(\alpha _k\). Therefore, the second term in (A.5) also converges to 0 w.p.1 as \(k\rightarrow \infty \), since \(\Vert \widehat{V}_k^{-1}\Vert _2\) is bounded and \(\alpha _k\) converges to \(\alpha ^*\) as \(k\rightarrow \infty \). The proof of Lemma A.1 is now complete.
In view of Lemma A.1, it remains to show that the risk level sequence \(\{\alpha _k\}\) generated by (3.2) converges to the target risk level \(\alpha ^*\) w.p.1. We proceed by contradiction. Since the sequence \(\{\alpha _k\}\) is non-decreasing and bounded above by \(\alpha ^*\), suppose \(\lim _{k\rightarrow \infty }\alpha _k=\bar{\alpha }^*\) with \(\bar{\alpha }^*< \alpha ^*\) w.p.1. Conditioned on this event, Lemma A.1 still holds with the target risk level \(\alpha ^*\) replaced by \(\bar{\alpha }^*\). That is, the algorithm GASS-CVaR-ARL converges, and the gradient sequence \(\{g_k\}\) converges to 0 w.p.1 as \(k\rightarrow \infty \). Since \(g_k\) is bounded (because \(\Gamma (x)\) is bounded), the bounded convergence theorem yields
Furthermore, note that
where
We have shown in the proof of Lemma A.1 that
Since \(\Vert \bar{\mathbb {E}}_{q_k}[\Gamma (x)]- \widetilde{\mathbb {E}}^{\bar{\alpha }^*}_{q_k}[\Gamma (x)]\Vert _2\) is bounded, again by the bounded convergence theorem
Moreover, notice that \(\widetilde{\mathbb {E}}^{\bar{\alpha }^*}_{q_k}[\Gamma (x)]\) is a self-normalized importance sampling estimator of \(\mathbb {E}^{\bar{\alpha }^*}_{q_k}[\Gamma (x)]\). Applying Theorem 9.1.10 (p. 294) in [4], we have
where \(\Gamma _j(x)\) is the \(j\)th element of the vector \(\Gamma (x)\), and the \(c_j\)’s are positive constants that depend on the bounds of the \(\Gamma _j(x)\)’s on \(\mathcal {X}\). Therefore, by the Cauchy–Schwarz inequality we have
That is,
Combining (A.7) and (A.8) with (A.9), we have
In view of (A.6), we have
Since \(\bar{\alpha }^*<\alpha ^*\), the sequence \(\{\Vert \bar{g}_k\Vert _2\}\) generated by (3.2) always stays above a certain positive value w.p.1 (otherwise \(\alpha _k\) would converge to \(\alpha ^*\)), which contradicts (A.10). The proof of Theorem 4.2 is complete. \(\square \)
Cite this article
Zhu, H., Hale, J. & Zhou, E. Simulation optimization of risk measures with adaptive risk levels. J Glob Optim 70, 783–809 (2018). https://doi.org/10.1007/s10898-017-0588-8