Testing for the Stochastic Dominance Efficiency of a Given Portfolio

We propose a new statistical test of the stochastic dominance efficiency of a given portfolio over a class of portfolios. We establish its null and alternative asymptotic properties, and define a method for consistently estimating critical values. We present some numerical evidence that our tests work well in moderate-sized samples.


INTRODUCTION
The portfolio choice problem is the cornerstone of portfolio theory and asset pricing. There are two main approaches to this problem: the mean-variance (MV) approach and the more general stochastic dominance (SD) approach. In the MV approach, strong assumptions are made about the distribution of returns and/or the preferences of the investor. The rules for practical computation and statistical inference are well established; see, e.g., Markowitz (1959) and Gibbons et al. (1989). The SD approach makes much weaker assumptions about the distribution of returns and preferences. However, the practical implementation of SD analysis has proven more difficult because of a number of mathematical and statistical issues. The portfolio problem is especially difficult, because we have to consider infinitely many portfolios, while the standard SD rules rely on pairwise comparison of individual alternatives. Recently, there has been significant progress on the computational and statistical issues, which has advanced the position of the SD method; see Levy (2006) for an overview and bibliography.
We propose a test of whether a given portfolio is efficient with respect to the SD criterion in comparison with a polyhedral set of portfolios formed from a given finite set of assets. Post (2003) and Post and Versijp (2007) have recently proposed tests of the same hypothesis and have provided a method of inference based on the first-order optimality conditions of the investor's expected utility maximization problem. Their statistical approach uses a conservative bounding distribution, which might compromise statistical power, that is, the ability to detect inefficient portfolios in small samples. They have also used a sampling scheme that assumes serially i.i.d. observations and hence does not allow for the GARCH effects often seen in high-frequency returns.
We propose an alternative statistical approach to the problem. Specifically, we suggest the use of a modification of the Kolmogorov-Smirnov test statistic of McFadden (1989) and Klecan et al. (1991). Recently, Linton et al. (2005a) (hereafter LMW) have provided a comprehensive theory of inference for a class of test statistics for the standard pairwise comparison of prospects. We extend their work to the portfolio case. This entails several non-trivial conceptual issues. The null hypothesis in LMW was of stochastic maximality in a finite set (i.e., that there was at least one prospect that weakly stochastically dominated some of the others). The alternative was two-sided, and the number of prospects considered was finite. Because this only involves pairwise comparison, it is not appropriate for the situation where an investor might combine a set of basis assets into a portfolio. In addition, financial theory generally focuses on the concept of efficiency or non-dominance rather than maximality. For example, under the standard capital asset pricing model, the market portfolio is efficient but not maximal; it does not dominate any of the other efficient portfolios on the so-called capital market line.
We consider the null hypothesis that a given portfolio is not dominated by any other feasible portfolio. This requires a substantial modification of the test statistics of LMW because of boundary problems, an issue raised by Kroll and Levy (1980). Specifically, we estimate a contact set and compute the supremum in the test statistic only over the complement of a small enlargement of this set. For this, we need to develop new theory for the behaviour of these estimated sets and derived quantities. Our theory is related to the recent work of Chernozhukov et al. (2007). There is also an issue of computation, because it is necessary to search over a large set of portfolios. When the number of base assets considered is small, grid search can be an effective way to solve the required optimization problem. For larger problems, iterative optimization algorithms such as that of Nelder and Mead (1965) might be better suited. We leave the further development of an efficient algorithm for large-scale applications to a follow-up project. Here, we address the statistical performance of our test statistics. We provide the limiting distribution of our test statistic under the null hypothesis of SD efficiency, and we also give some results on asymptotic power. We propose to use the subsampling method for obtaining the critical values, and we establish that this is consistent under general conditions. We evaluate the performance of our method on simulated data.
For various reasons, we focus on the SD criteria of order two and higher, meaning that risk aversion is assumed throughout this study; Kopa and Post (2009) have provided, in a portfolio context, an extensive analysis of first-order SD (FSD), which allows for risk seeking. First, risk aversion is a standard assumption in financial economics, consistent with common observations such as risk premiums for risky assets, portfolio diversification and the popularity of insurance contracts. There are indications of local risk-seeking behaviour at the individual level (e.g., the popularity of lotteries), but the bulk of the literature on asset pricing and portfolio selection assumes that investors are globally risk averse. Second, FSD generally involves mixed-integer programming techniques, and in large-scale applications the computational burden quickly becomes prohibitive, especially if the statistical inference is based on resampling methods. Third, the FSD criterion is very general and allows for 'exotic' preference structures (e.g., utility functions with inflection points and discontinuous jumps). An empirical test for FSD efficiency will thus have considerable freedom to fit an unrealistic utility function to the data, which will presumably slow down the rate of convergence of the test considerably.

THE NULL HYPOTHESIS
We consider a single-period portfolio decision under risk. Individuals choose portfolios of assets to maximize the expected utility of their portfolio return. Let $X = (X_1, \ldots, X_K)^\top$ be the vector of returns on a set of $K$ assets, and let $Y$ be the return on some benchmark portfolio of $X$. The benchmark portfolio is taken as given and follows from theory or observation (e.g., the aggregate market portfolio held by the representative investor, or the actual portfolio held by a given individual investor). We consider portfolios with return $X^\top\lambda$, where $\lambda = (\lambda_1, \ldots, \lambda_K)^\top \in \Lambda$, $\Lambda = \{\lambda \in \mathbb{R}^K_{+} : e^\top\lambda = 1\}$ and $e = (1, \ldots, 1)^\top$. The approach also applies to a portfolio possibilities set with the shape of a general polytope, allowing for general linear constraints such as short-selling constraints, position limits and restrictions on risk-factor loadings. In this general case, we can enumerate the $V$ vertices of the polytope and replace the $K$ base assets with these vertices. However, restrictions on value-at-risk and tracking error introduce non-linearities and estimation issues that require modifications of our model. Let $\Lambda_0$ be some subset of $\Lambda$ reflecting whatever additional restrictions, if any, are imposed on $\Lambda$. Let $U_1$ denote the class of all von Neumann-Morgenstern type utility functions $u$ such that $u' \ge 0$ (non-satiation). Also, let $U_2$ denote the class of all utility functions in $U_1$ for which $u'' \le 0$ (risk aversion), and let $U_3$ be the set of functions in $U_2$ for which $u''' \ge 0$ (prudence).
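As a concrete illustration of these nested classes (our example, not from the text): the CRRA utility $u(x) = x^{1-\gamma}/(1-\gamma)$ with $\gamma > 0$, $\gamma \neq 1$, satisfies $u'(x) = x^{-\gamma} \ge 0$, $u''(x) = -\gamma x^{-\gamma-1} \le 0$ and $u'''(x) = \gamma(\gamma+1)x^{-\gamma-2} \ge 0$ for $x > 0$, and hence belongs to $U_1$, $U_2$ and $U_3$.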
Likewise, we can define third-order efficiency by replacing $U_2$ with $U_3$ in Definition 2.1. This is the definition of portfolio efficiency used by Post (2003), which generalized the definition of convex SD of Fishburn (1974) and Bawa et al. (1985) to account for diversification.
Let $F_\lambda(\cdot)$ and $F_Y(\cdot)$ be the cumulative distribution functions (CDFs) of $X^\top\lambda$ and $Y$, respectively. For a given integer $s \ge 2$, define the sth-order integrated CDF of $X^\top\lambda$ to be
$$ G^{(s)}_\lambda(x) = \int_{-\infty}^{x} \frac{(x - t)^{s-1}}{(s-1)!}\, dF_\lambda(t), $$
with $G^{(s)}_Y$ defined analogously from $F_Y$. The portfolio $X^\top\lambda$ sth-order stochastically dominates $Y$ if $G^{(s)}_\lambda(x) - G^{(s)}_Y(x) \le 0$ for all $x$, with strict inequality for at least one $x$ in the support $\mathcal{X}$. One way to interpret this condition is that portfolio $X^\top\lambda$ has a more favourable $(s-1)$th-order lower partial moment than portfolio $Y$ for all relevant return levels. For $s \ge 2$, this definition is equivalent to Definition 2.1, but not so for $s = 1$; see Kopa and Post (2009) for a discussion. Thus, our results are only meaningful for $s \ge 2$, although we retain the general definition. For notational simplicity, we sometimes let the dependence on $s$ of the quantities introduced below be implicit (i.e., we write $G^{(s)}_\lambda$ as $G_\lambda$, and so on). We wish to test the following null hypothesis.

H0: $Y$ is sth-order SD efficient according to Definition 2.1, in the sense that there does not exist any portfolio in $\{X^\top\lambda : \lambda \in \Lambda_0\}$ that dominates it, where $\Lambda_0$ is a compact subset of $\Lambda$.
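To make the lower partial moment interpretation explicit, note the standard identity (our display, in the spirit of Davidson and Duclos, 2000):
$$ G^{(s)}_\lambda(x) = \frac{1}{(s-1)!}\,\mathbb{E}\big[(x - X^\top\lambda)_{+}^{\,s-1}\big], $$
so that $G^{(s)}_\lambda(x) \le G^{(s)}_Y(x)$ for all $x$ says precisely that the $(s-1)$th-order lower partial moment of $X^\top\lambda$ is weakly smaller than that of $Y$ at every target return level $x$.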
This hypothesis has previously been tested by Post (2003) and Post and Versijp (2007), among others. In the next section, we discuss the general approach for testing the efficiency hypothesis against the general alternative hypothesis $H_A$ that is the negation of $H_0$ (i.e., the evaluated portfolio is inefficient). In our case, with a convex choice set and SD criteria of order two and higher, inefficiency of $Y$ implies that there exists a portfolio with weights $\lambda_0 \in \Lambda_0$ that dominates $Y$ (a result that does not apply in the case of a discrete choice set and/or the first-order criterion). However, our focus here is on statistical inference regarding the efficiency classification of a given portfolio rather than the (often non-robust) dominance relation between individual portfolios. Also, we do not attempt to characterize the entire set of efficient portfolios, which is generally uncountable, non-convex and non-robust, although such a characterization (presumably by means of mixed-integer linear constraints) can be of interest for a stochastic programming approach to portfolio construction. Similarly, a characterization of all utility functions (presumably piecewise polynomials) that support the efficiency classification of a given portfolio is beyond the scope of this paper.

General strategy
Let $F$ be the joint distribution of $X$. The general approach is to find a functional $d(F) \in \mathbb{R}$ such that $d(F) \le 0$ when $F$ satisfies the null hypothesis, while $d(F) > 0$ when $F$ does not. We then replace $F$ by an estimate $\widehat{F}$, compute the empirical functional $d(\widehat{F})$ and reject for large positive values of $d(\widehat{F})$. To carry out a statistical test, we have to choose the cut-off point $c_\alpha$ to have certain properties; we address this choice later.
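In display form, the plug-in decision rule produced by this strategy is simply (our display):
$$ \text{reject } H_0 \text{ at level } \alpha \iff d(\widehat{F}) > c_\alpha, $$
where $c_\alpha$ is chosen so that the rejection probability under $H_0$ is asymptotically at most $\alpha$.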
Consider the functional
$$ d(F) = \sup_{\lambda \in \Lambda_0} \inf_{x \in \mathcal{X}} \big[ G_Y(x) - G_\lambda(x) \big]. \qquad (2.1) $$
This is essentially a modification of the functional used in LMW to test for stochastic maximality.1 This functional satisfies $d(F) \le 0$ under the null hypothesis. Unfortunately, there are some elements of the alternative for which $d(F) = 0$, and so we cannot obtain a consistent test from this functional. Kroll and Levy (1980) have considered a similar example, where $Y$ is $U[0,1]$ and $X$ is $U[0,2]$, so that $X$ dominates $Y$. They have proved that $\Pr(\min_{1\le i\le n} X_i < \min_{1\le i\le n} Y_i) \to 1/3$ as $n \to \infty$, so that there is asymptotically at least a one-third chance of finding no dominance based on samples of $X$ and $Y$. The null and alternative hypotheses we are testing are quite complex, and to characterize them we introduce some further notation. For each $\lambda$, we define the following three subsets of $\mathcal{X}$:
$$ A^{=}_\lambda = \{x \in \mathcal{X} : G_Y(x) = G_\lambda(x)\}, \quad A^{+}_\lambda = \{x \in \mathcal{X} : G_Y(x) > G_\lambda(x)\}, \quad A^{-}_\lambda = \{x \in \mathcal{X} : G_Y(x) < G_\lambda(x)\}. $$
Taking the infimum over the entire support fails to distinguish between weak and strict inequality. This is not an issue in testing the hypothesis of stochastic maximality, because the reverse comparison will identify that $\inf_{x\in\mathcal{X}}(G_\lambda(x) - G_Y(x)) < 0$. However, it does matter here. Specifically, suppose that $A^{=}_\lambda$ and $A^{+}_\lambda$ are non-empty and $A^{-}_\lambda = \emptyset$ for some $\lambda$. For these $\lambda$, we have $\inf_{x\in\mathcal{X}}(G_Y(x) - G_\lambda(x)) = 0$, even though $X^\top\lambda$ dominates $Y$. If the other $\lambda$ are such that only $A^{=}_\lambda$ and $A^{-}_\lambda$ are non-empty, so that $\inf_{x\in\mathcal{X}}(G_Y(x) - G_\lambda(x)) < 0$ for those values, then we find that $d(F) = 0$.

1. Their null hypothesis was that there exists at least one prospect from a finite set that dominates some of the others. They considered an analogous sup-inf functional $d^*$ in which $\lambda$ and $\mu$ are chosen from a finite set. Under their null hypothesis, $d^* \le 0$, while under their alternative, $d^* > 0$.
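To see where the limit $1/3$ comes from, a short heuristic calculation (ours, not in the text): $\Pr(\min_i Y_i > y) = (1-y)^n \approx e^{-ny}$ and $\Pr(\min_i X_i > x) = (1 - x/2)^n \approx e^{-nx/2}$, so the two minima are approximately exponential with rates $n$ and $n/2$, respectively; for independent exponentials with rates $a$ and $b$, $\Pr(A < B) = a/(a+b)$, giving
$$ \Pr\Big(\min_{1\le i\le n} X_i < \min_{1\le i\le n} Y_i\Big) \approx \frac{n/2}{n/2 + n} = \frac{1}{3}. $$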
We next suggest some modifications of (2.1) that properly characterize the null hypothesis. These modifications involve keeping away from the boundary points.
For each $\varepsilon > 0$, define the $\varepsilon$-enlargement of the set $A^{=}_\lambda$ and its complement in $\mathcal{X}$:
$$ B^{\varepsilon}_\lambda = \Big\{x \in \mathcal{X} : \inf_{x' \in A^{=}_\lambda} |x - x'| \le \varepsilon\Big\}, \qquad B^{\varepsilon c}_\lambda = \mathcal{X} \setminus B^{\varepsilon}_\lambda, $$
and let
$$ d^*(\varepsilon, F) = \sup_{\lambda \in \Lambda_0} \inf_{x \in B^{\varepsilon c}_\lambda} \big[G_Y(x) - G_\lambda(x)\big]. \qquad (2.2) $$
Under the null hypothesis, $d^*(\varepsilon, F) \le 0$ for each $\varepsilon \ge 0$, while under the alternative hypothesis we have $d^*(\varepsilon, F) > 0$ for some $\varepsilon > 0$. The idea is to prevent the inner infimum from ever being zero through equality on some part of $\mathcal{X}$. Now, consider
$$ d^*(F) = \sup_{\varepsilon > 0} d^*(\varepsilon, F). \qquad (2.3) $$
This functional divides the null from the alternative. An alternative approach is based on the idea that, even in cases where $\lim_{\varepsilon\to 0} d^*(\varepsilon, F) = 0$ under the alternative, the convergence in $\varepsilon$ might be slow enough that it is possible to distinguish the null from the alternative in these cases based on the contact rate. That is, we can expect $d^*(\varepsilon, F) \sim \ell(F)\,\varepsilon^{\alpha}$ as $\varepsilon \to 0$ for some $\alpha > 0$ and $\ell(F)$, where $\ell(F) = 0$ under the null hypothesis and $\ell(F) > 0$ under the alternative hypothesis. This higher-order difference is enough to identify the null from the alternative, as we show below.
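A worked example of this construction (ours), using the Kroll-Levy distributions with a single prospect and, purely for illustration, $s = 1$: with $Y \sim U[0,1]$ and $X \sim U[0,2]$ on $\mathcal{X} = [0,2]$, we have $G_Y(x) - G_X(x) = \min(x,1) - x/2$, so $A^{=}_X = \{0, 2\}$, $A^{+}_X = (0,2)$ and $A^{-}_X = \emptyset$. Then $\inf_{x\in\mathcal{X}}(G_Y(x) - G_X(x)) = 0$ even though $X$ dominates $Y$, but for $\varepsilon \in (0,1)$ the enlargement is $B^{\varepsilon}_X = [0,\varepsilon] \cup [2-\varepsilon, 2]$ and
$$ d^*(\varepsilon, F) = \inf_{x \in B^{\varepsilon c}_X} \big[G_Y(x) - G_X(x)\big] = \varepsilon/2 > 0, $$
illustrating both that the modified functional detects the dominance and that $d^*(\varepsilon, F) \sim \ell(F)\varepsilon^{\alpha}$ with $\alpha = 1$ and $\ell(F) = 1/2$.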
In practice, we have to estimate the set $B^{\varepsilon}_\lambda$ from the data, which we do below in a simple way; see Chernozhukov et al. (2007) for a discussion of set-estimation problems.

TEST STATISTICS
We suppose now that we have a time series of observations $X_t$ on the assets, and $Y_t$ on the benchmark, for $t = 1, \ldots, T$. The general approach is to define empirical analogues of (2.3) as our test statistics. Let $k_T = c_0 \cdot (\log T / T)^{1/2}$, where $c_0$ is a positive constant, and let $\varepsilon_T$ denote a sequence of positive constants satisfying Assumption 4.2. Define the empirical counterpart of $G^{(s)}_\lambda(x)$,
$$ \bar{G}_\lambda(x) = \frac{1}{T(s-1)!} \sum_{t=1}^{T} (x - X_t^\top\lambda)^{s-1}\, \mathbf{1}(X_t^\top\lambda \le x), $$
and likewise for $\bar{G}_Y(x)$. Let $\widehat{A}^{=}_\lambda = \{x \in \mathcal{X} : |\bar{G}_Y(x) - \bar{G}_\lambda(x)| \le k_T\}$ denote the estimated contact set, and let $\widehat{B}^{\varepsilon_T}_\lambda$ be its $\varepsilon_T$-enlargement. The test statistic is
$$ d_T = \sqrt{T}\, \sup_{\lambda \in \Lambda_0}\; \inf_{x \in \mathcal{X} \setminus \widehat{B}^{\varepsilon_T}_\lambda} \big[\bar{G}_Y(x) - \bar{G}_\lambda(x)\big]. \qquad (3.5) $$
This is our proposed test statistic; rejection is for large positive values. Note that computing (3.5) requires potentially high-dimensional optimization of an objective function that is discontinuous and neither convex nor concave. We next briefly discuss the computational issues. The supremum over the scalar $x$ in (3.5) is computed by a grid search; the main issue concerns the optimization over $\lambda$, which might be high dimensional. The objective function $Q_T(\lambda, x) = \bar{G}_Y(x) - \bar{G}_\lambda(x)$ can be written as
$$ Q_T(\lambda, x) = \frac{1}{T(s-1)!} \sum_{t=1}^{T} \big[(x - Y_t)^{s-1}\mathbf{1}(Y_t \le x) - (x - X_t^\top\lambda)^{s-1}\mathbf{1}(X_t^\top\lambda \le x)\big]; $$
see Davidson and Duclos (2000). When $s = 1$, $Q_T(\lambda, x)$ is continuous in neither $x$ nor $\lambda$. When $s = 2$, this function is not differentiable or convex in $\lambda \in \mathbb{R}^K$, but it is continuous in $x$. When $s = 3$, the objective function is differentiable in $x$ but not in $\lambda$. Therefore, we cannot use standard derivative-based algorithms, such as Newton-Raphson, to find the optima (in any case, such methods do not work when the solution might be on the boundary of the parameter space). We could replace the empirical CDFs by smoothed empirical CDF estimates in order to impose additional regularity on the optimization problem, so that derivative-based iterative algorithms could be used. There is a well-established literature in econometrics concerning this class of non-smooth optimization estimators; see Pakes and Pollard (1989). Nevertheless, achieving the maximum over $\lambda$ with high accuracy is a difficult computational problem when $K$ is large in the non-smooth case. In principle, it is possible to use one of the many algorithms appropriate for non-smooth optimization, such as the algorithm of Nelder and Mead (1965) or more recent developments; this method does not require any particular structure. For this algorithm to work well in high-dimensional cases, good starting values are necessary. One proposal is to obtain these by grid searching over the MV efficient frontier. The MV efficient set is a natural starting point because, for the normal distribution, the SD efficient set and the MV efficient set coincide. The set of MV efficient portfolios can be computed in terms of the unconditional mean $\mu$ and the covariance matrix $\Sigma$ of the vector $X_t$. For a given target return $\mu_p$, there exists a unique portfolio $\lambda(\mu_p)$ that minimizes the variance $\sigma^2_p$ among portfolios achieving return $\mu_p$. The MV efficient portfolio weights are indexed by the target portfolio return $\mu_p$, specifically $\lambda_p = g + h\mu_p$, where the vectors $g(\mu, \Sigma)$ and $h(\mu, \Sigma)$ are given in Campbell et al. (1997, p. 185). Therefore, we take a grid of values of $\mu_p$, obtain $\lambda_p$ for each grid point, and then compute the test statistic. To impose the condition that there is no short selling, it suffices to search in the range $\mathcal{M} = [\mu_{\min}, \mu_{\max}]$. The optimal value of $\lambda_p$ can then be used as a starting value in a more general optimization algorithm.
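To make the computation concrete, the following Python sketch implements a grid-based approximation of the statistic in (3.5) as reconstructed above, with MV frontier starting values refined by Nelder-Mead. The function names, the grid, the tuning choices $c_0 = 0.3$ and $\varepsilon_T = 2k_T$, and the projection of the Nelder-Mead iterates onto the simplex are our own illustrative choices, not the authors' implementation.

```python
import numpy as np
from math import factorial
from scipy.optimize import minimize

def G_bar(s, r, x_grid):
    # Empirical s-th order integrated CDF on a grid:
    # (1 / (T (s-1)!)) * sum_t (x - r_t)^{s-1} * 1{r_t <= x}
    d = x_grid[:, None] - r[None, :]
    return np.where(d >= 0, d ** (s - 1), 0.0).sum(axis=1) / (len(r) * factorial(s - 1))

def mv_frontier_starts(X, n_pts=20):
    # Starting values on the MV frontier, lambda_p = g + h * mu_p
    # (two-fund separation; see Campbell et al., 1997, p. 185),
    # with mu_p searched over [mu_min, mu_max] as suggested in the text.
    mu = X.mean(axis=0)
    Sigma_inv = np.linalg.inv(np.cov(X, rowvar=False))
    e = np.ones_like(mu)
    A, B, C = e @ Sigma_inv @ e, e @ Sigma_inv @ mu, mu @ Sigma_inv @ mu
    D = A * C - B ** 2
    g = (C * (Sigma_inv @ e) - B * (Sigma_inv @ mu)) / D
    h = (A * (Sigma_inv @ mu) - B * (Sigma_inv @ e)) / D
    return [g + h * m for m in np.linspace(mu.min(), mu.max(), n_pts)]

def d_stat(X, Y, s=2, c0=0.3, eps_T=None, n_grid=200):
    # d_T = sqrt(T) * sup_lambda inf_x Q_T(lambda, x), with the infimum taken
    # outside an eps_T-enlargement of the estimated contact set
    # {x : |G_bar_Y(x) - G_bar_lambda(x)| <= k_T}.
    T, K = X.shape
    k_T = c0 * np.sqrt(np.log(T) / T)
    eps = 2 * k_T if eps_T is None else eps_T       # illustrative tuning choice
    x_grid = np.linspace(min(X.min(), Y.min()), max(X.max(), Y.max()), n_grid)
    G_Y = G_bar(s, Y, x_grid)

    def neg_inner_inf(free):
        lam = np.abs(free)
        lam = lam / (lam.sum() + 1e-12)             # map iterates onto the simplex
        Q = G_Y - G_bar(s, X @ lam, x_grid)         # Q_T(lambda, x) on the grid
        contact = np.abs(Q) <= k_T                  # estimated contact set
        keep = np.ones(n_grid, dtype=bool)
        for j in np.where(contact)[0]:              # drop the eps-enlargement
            keep &= np.abs(x_grid - x_grid[j]) > eps
        return 0.0 if not keep.any() else -Q[keep].min()

    best = -np.inf
    for lam0 in mv_frontier_starts(X):              # Nelder-Mead from MV starts
        res = minimize(neg_inner_inf, lam0, method="Nelder-Mead")
        best = max(best, -res.fun)
    return np.sqrt(T) * best
```

The simplex projection enforces the no-short-selling constraint $\lambda \in \Lambda$; with a general polytope constraint set one would instead optimize over the vertex weights, as described above.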
In an earlier version of this paper (Linton et al., 2005b), we proposed an alternative, computationally attractive method based on linear programming, and in a future paper we will implement a large-scale application based on this idea.

ASYMPTOTIC PROPERTIES
In this section, we give the asymptotic properties of the test statistic under the null and alternative hypotheses. We also present the subsampling method for obtaining critical values and establish that our test is consistent against all alternatives under our conditions.
To discuss the asymptotic null distribution of our test statistic, we need the following assumptions.
Assumption 4.2 requires that the function $G_Y(\cdot) - G_\lambda(\cdot)$ be monotonic on an $\varepsilon_T$-neighbourhood of the boundary $\partial A^{=}_\lambda$ of $A^{=}_\lambda$. It is satisfied when $G_Y(x)$ and $G_\lambda(x)$ have derivatives that are not equal in a local neighbourhood of $\partial A^{=}_\lambda$, by a Taylor expansion (sketched below), while for $x$ far from $A^{=}_\lambda$ the minimum is eventually dominated by $\varepsilon_T$, which can be made arbitrarily small.
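A one-line version of the Taylor argument (our display, under the stated derivative condition): for $x_0 \in \partial A^{=}_\lambda$ and $x$ near $x_0$,
$$ G_Y(x) - G_\lambda(x) = \underbrace{G_Y(x_0) - G_\lambda(x_0)}_{=\,0} + \big[G_Y'(x_0) - G_\lambda'(x_0)\big](x - x_0) + o(|x - x_0|), $$
which is strictly monotonic in $x$ on a small neighbourhood of $x_0$ whenever $G_Y'(x_0) \ne G_\lambda'(x_0)$.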
Define the empirical process in $\lambda$ and $x$ as
$$ \nu_T(\lambda, x) = \sqrt{T}\,\Big\{\big[\bar{G}_Y(x) - \bar{G}_\lambda(x)\big] - \big[G_Y(x) - G_\lambda(x)\big]\Big\}. $$
Let $\nu(\cdot, \cdot)$ be a mean-zero Gaussian process on $\Lambda_0 \times \mathcal{X}$ with covariance function given by the long-run covariance of the summands,
$$ \operatorname{Cov}\big(\nu(\lambda_1, x_1), \nu(\lambda_2, x_2)\big) = \sum_{k=-\infty}^{\infty} \operatorname{Cov}\big(h_{\lambda_1, x_1}(X_0, Y_0),\, h_{\lambda_2, x_2}(X_k, Y_k)\big), \qquad (4.1) $$
where $h_{\lambda, x}(X, Y) = \big[(x - Y)^{s-1}\mathbf{1}(Y \le x) - (x - X^\top\lambda)^{s-1}\mathbf{1}(X^\top\lambda \le x)\big]/(s-1)!$. The limiting null distribution of our test statistic is then given in Theorem 4.1, which shows that the asymptotic null distribution of $d_T$ is non-degenerate when $\Lambda_0^{=} \ne \emptyset$ and depends on the joint distribution function of $(X_t, Y_t)$. The latter implies that the asymptotic critical values for $d_T$ cannot be tabulated once and for all. However, in Section 4.2 we define simulation procedures to estimate them from the data.

Critical values
4.2.1. Subsampling. We propose a subsampling method to obtain consistent critical values. The subsampling method was proposed by Politis and Romano (1994) and works under very general conditions; see, e.g., Politis et al. (1999). Subsampling is useful in our context because our null hypothesis consists of a complicated system of inequalities, which is hard to mimic using the standard bootstrap. Furthermore, the subsampling-based test described below has the advantage of being asymptotically similar on the boundary of the null hypothesis; see below and LMW for details. It is also much more computationally convenient than full resampling.
The subsampling procedure is based on the following steps.

STEP 1. Compute the test statistic $d_T$ using the full sample $\{(X_t, Y_t) : t = 1, \ldots, T\}$.

STEP 2. Generate the $T - b + 1$ subsamples of size $b$, namely $\{(X_t, Y_t), \ldots, (X_{t+b-1}, Y_{t+b-1})\}$ for $t = 1, \ldots, T - b + 1$.

STEP 3. Compute the subsample statistic $d_{b,t}$ on each subsample, exactly as $d_T$ was computed on the full sample.

STEP 4. Approximate the sampling distribution of $d_T$ by the empirical distribution of the subsample statistics $\{d_{b,t}\}$, and take the critical value $s_{T,b}(\alpha)$ to be its $(1-\alpha)$-quantile.

The circular-block version (Kläver, 2005) involves an edge modification in Step 2 that wraps the sample around. The above subsampling procedure can be justified in the following sense. Let $b = b_T$ be a data-dependent sequence satisfying the following.

ASSUMPTION 4.3. $\Pr(l_T \le b_T \le u_T) \to 1$, where $l_T$ and $u_T$ are integers satisfying $1 \le l_T \le u_T \le T$, $l_T \to \infty$ and $u_T / T \to 0$ as $T \to \infty$.
Then, the following theorem shows that our test based on the subsample critical value has the asymptotically correct size.
THEOREM 4.2. Suppose that Assumptions 4.1-4.3 hold. Then, under the null hypothesis, $\limsup_{T\to\infty} \Pr\big(d_T > s_{T, b_T}(\alpha)\big) \le \alpha$, with equality when $\Lambda_0^{=} \ne \emptyset$.

We now compare the subsampling and bootstrap procedures. Under suitable regularity conditions, it is not difficult to show that the asymptotic size of the test based on the bootstrap critical value $h_T(\alpha)$ is $\alpha$ if the least favourable case (when the marginal distributions all coincide) is true. Therefore, in this case, we might prefer the bootstrap to subsampling, because the former uses the full-sample information and hence might be more efficient in finite samples. However, as we have argued in another context (see LMW, Section 6.1), the least favourable case is only a special case of the boundary (i.e., $\Lambda_0^{=} \ne \emptyset$) of the null hypothesis $H_0$, whereas the test statistic $d_T$ has a non-degenerate limit distribution everywhere on the boundary. This implies that the bootstrap-based test is not asymptotically similar on the boundary, which in turn implies that the test is biased; see Lehmann (1959, Chapter 4). By contrast, the subsample-based test is unbiased and asymptotically similar on the boundary, and might be preferred in this sense. In practice, it might be desirable to employ both approaches to see whether the results obtained are robust to the choice of resampling scheme.
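As a concrete illustration of Steps 1-4 and of the critical value $s_{T,b}(\alpha)$ used in Theorem 4.2, here is a minimal Python sketch that reuses the d_stat function from the earlier sketch; the helper name and the fixed block length b are our own illustrative choices.

```python
import numpy as np

def subsample_critical_value(X, Y, b, alpha=0.05, s=2, **kwargs):
    # Moving-block subsampling (Politis and Romano, 1994): compute the
    # statistic on each of the T - b + 1 blocks of length b and take the
    # (1 - alpha)-quantile of the resulting empirical distribution.
    T = len(Y)
    d_b = np.array([d_stat(X[t:t + b], Y[t:t + b], s=s, **kwargs)
                    for t in range(T - b + 1)])
    return np.quantile(d_b, 1 - alpha)

# Decision rule: reject H0 at level alpha if
#   d_stat(X, Y) > subsample_critical_value(X, Y, b)
```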

Asymptotic power
In this section, we discuss the consistency and local power properties of our test.
If the alternative hypothesis is true, the set $\Lambda_0^{+} \cup \Lambda_0^{=}$ is non-empty. When $\Lambda_0^{+}$ is empty, we need the following assumption for the consistency of our test.
Then, we have the following result: under the alternative hypothesis, $\Pr\big(d_T > s_{T, b_T}(\alpha)\big) \to 1$ as $T \to \infty$.
Next, we determine the power of the test $d_T$ against a sequence of contiguous alternatives converging to the boundary ($\Lambda_0^{=} \ne \emptyset$) of the null hypothesis at the rate $1/\sqrt{T}$. That is, consider a sequence of portfolio weights $\lambda_T$ whose integrated CDFs $G_{\lambda_T}$ drift towards $G_Y$. As before, we suppress the superscript $s$ for notational simplicity. Then, we assume that the functionals $G_{\lambda_T}(x)$ and $G_Y(x)$ satisfy the following local alternative hypothesis:
$$ G_Y(x) - G_{\lambda_T}(x) = \frac{\delta_{\lambda, T}(x)}{\sqrt{T}}, $$
where $\delta_{\lambda, T}(\cdot)$ is a real function converging to a limit function $\delta_\lambda(\cdot)$. The asymptotic distribution of $d_T$ under the local alternatives is given in the following theorem.
The result of Theorem 4.4 implies that the asymptotic local power of our test based on the subsample critical value is given by
$$ \lim_{T\to\infty} \Pr\big(d_T > s_{T, b_T}(\alpha)\big) = \Pr\big(L_0 > s(\alpha)\big), \qquad (4.5) $$
where $L_0$ denotes the limit distribution given in Theorem 4.4 and $s(\alpha)$ denotes the upper $\alpha$-quantile of the asymptotic null distribution of $d_T$ given in Theorem 4.1. The asymptotic local power of the test is determined by $\delta_\lambda(\cdot)$, and the test is asymptotically locally unbiased if $\delta_\lambda(x)$ is a non-negative function that is bounded away from zero on (a strict subset of) $\mathcal{X}$ for all $\lambda \in \Lambda_0^{=}$.

CONCLUSIONS
We have proposed a statistical test of the efficiency, in the SD sense, of a given portfolio relative to a polyhedral set of feasible portfolios formed from a discrete set of base assets. We have shown that the test is consistent against all alternatives under our conditions. In an earlier version of this paper (Linton et al., 2005b), we showed that it works reasonably well in small samples in a simple bivariate Monte Carlo simulation based on plausible parameter values. Implementing the test for higher dimensions (a large number of base assets) remains a formidable challenge, and we are pursuing this in a separate paper (in progress).
where the pseudo-metric $\rho$ on $\Lambda_0 \times \mathcal{X}$ is the $L_2$-distance between the corresponding summand functions. The fidi (finite-dimensional) convergence result holds by the Cramér-Wold device and a CLT for bounded random variables (see Hall and Heyde, 1980, Corollary 5.1), because the underlying random sequence $\{(X_t, Y_t) : t \ge 1\}$ is strictly stationary and $\alpha$-mixing with $\sum_{m=1}^{\infty} \alpha(m) < \infty$ by Assumption 4.1. The stochastic equicontinuity condition (A.2) holds by Theorem 2.2 of Andrews and Pollard (1994) with $Q = q$ and $\gamma = 2$. To see this, note that their mixing condition is implied by Assumption 4.1(a). Also, let $\mathcal{F}$ denote the class of summand functions indexed by $(\lambda, x) \in \Lambda_0 \times \mathcal{X}$. Then, $\mathcal{F}$ is a class of uniformly bounded functions that satisfies the $L_2$-continuity condition; that is, for some constants $C_1, C_2 < \infty$, an $L_2$-bound of the form
$$ \mathbb{E}\Big[\sup{}^{*}\,\big|f_{\lambda_1, x_1}(X_t, Y_t) - f_{\lambda, x}(X_t, Y_t)\big|^{2}\Big] \le C_1 r_1 + C_2 r_2 $$
holds. Here, $\sup^{*}$ denotes the supremum taken over $(\lambda_1, x_1) \in \Lambda_0 \times \mathcal{X}$ for which $\|\lambda_1 - \lambda\| \le r_1$, $|x_1 - x| \le r_2$ and $r_1^2 + r_2^2 \le r$; the first inequality in its derivation holds by several applications of the Cauchy-Schwarz inequality and Assumption 4.1(b), and the second inequality holds by Assumption 4.1(c). This implies that the bracketing condition of Andrews and Pollard (1994, p. 121) holds, because the $L_2$-continuity condition implies that the bracketing number satisfies $N(\varepsilon, \mathcal{F}) \le C_3 \cdot (1/\varepsilon)^{K+1}$. This establishes Lemma A.1.

LEMMA A.2. Suppose Assumptions 4.1 and 4.2 hold. Then, we have
Proof: It suffices to show that (A.4) and (A.7) hold for each $\lambda \in \Lambda_0$, where the second equality holds by the fidi convergence result of Lemma A.1. Next, we establish (A.4). Let $x_1^* \in (\widehat{A}^{=}_\lambda)^{\varepsilon_T}$, i.e., $x_1^* = x_1 + \eta_{1T}$ for some $x_1 \in \widehat{A}^{=}_\lambda$ and a fixed sequence $|\eta_{1T}| < \varepsilon_T$. It suffices to show that $\Pr\big(x_1 \in (A^{=}_\lambda)^{\varepsilon_T}\big) \to 1$. Let $C_1 > 1$ be a constant. Then we have, wp $\to 1$, the required bound, where the first inequality holds by the triangle inequality and the second inequality holds using the fidi convergence result as in (A.5) and the fact that $x_1 \in \widehat{A}^{=}_\lambda$. Now, by Assumption 4.2, because $\varepsilon_T > k_T$, we have $\inf_{x' \in A^{=}_\lambda} |x_1 - x'| < \varepsilon_T$, wp $\to 1$, which implies that $\Pr\big(x_1 \in (A^{=}_\lambda)^{\varepsilon_T}\big) \to 1$, as required. Next, consider (A.7). Let $\mathcal{Z} \subset \mathbb{R}$ be a compact set containing zero. Define the stochastic process $l_T(\cdot, \cdot, \cdot)$ on $\Lambda_0 \times \mathcal{X} \times \mathcal{Z}$ by $l_T(\lambda, x, z) = \nu_T(\lambda, x + z)$. Then, using an argument similar to that of Lemma A.1, $l_T(\cdot, \cdot, \cdot)$ is stochastically equicontinuous on $\Lambda_0 \times \mathcal{X} \times \mathcal{Z}$, which in turn implies the required convergence. This completes the proof of Theorem 4.1.

Proof of Theorem 4.2:
The proof is similar to the proof of Theorem 2 of LMW; see also Politis et al. (1999, Theorem 3.5.1).