Evaluating Asset Pricing Models in a Simulated Multifactor Approach

In this paper a methodology to compare the performance of different stochastic discount factor (SDF) models is suggested. The starting point is the estimation of several factor models in which the choice of the fundamental factors comes from different procedures. Then, a Monte Carlo simulation is designed in order to simulate a set of gross returns with the objective of mimicking the temporal dependency and the observed covariance across gross returns. Finally, the artificial returns are used to investigate the performance of the competing asset pricing models through the Hansen & Jagannathan (1997) distance and some goodness-of-fit statistics of the pricing error. An empirical application is provided for the U.S. stock market.


Introduction
In asset pricing theory, one of the major interests of empirical researchers is testing whether a particular asset pricing model is indeed supported by the data. In addition, a formal procedure to compare the performance of competing asset pricing models is also of great importance in empirical applications. In both cases, it is of utmost relevance to establish an objective measure of model misspecification. The most useful measure is the well-known Hansen & Jagannathan (1997) distance (hereafter HJ-distance), which has been used both as a model diagnostic tool and as a formal criterion to compare asset pricing models. This type of comparison has been employed in many recent papers. See, for example, Campbell & Cochrane (2000); Jagannathan & Wang (2002); Dittmar (2002); Jagannathan et al. (1998); Farnsworth et al. (2002); Lettau & Ludvigson (2001a); and Chen & Ludvigson (2009). As argued by Hansen & Richard (1987), observable implications of candidate asset pricing models are conveniently summarized in terms of their implied stochastic discount factors. As a result, some recent studies in the asset pricing literature have focused on proposing an estimator for the SDF and also on comparing competing pricing models in terms of the SDF model. For instance, see Lettau & Ludvigson (2001b), Chen & Ludvigson (2009) and Araujo et al. (2006). A different route to investigate and compare asset pricing models has also been suggested in the literature. The main idea consists of assuming a data generation process (DGP) for a set of asset returns, based on some assumptions about the asset price behavior, and then creating a controlled framework, which is used to evaluate and compare the asset pricing models. For instance, Fernandes & Vieira Filho (2006) study, through Monte Carlo simulations, the performance of different SDF estimates in different environments.
One of the environments considered by the authors is that all asset prices follow a geometric Brownian motion. In this case, one should expect that an SDF proxy based on a geometric Brownian motion assumption would perform better than an asset pricing model that does not assume this hypothesis. On the other hand, a critical issue of this procedure is that the best asset pricing model from these particular environments might not be a good model in the real world. In other words, the best estimator for each controlled framework might not necessarily exhibit the same performance for observed stock market prices of a real economy.
In this paper, we propose a methodology to compare different stochastic discount factor or pricing kernel proxies. Instead of generating the asset returns from a direct ad-hoc assumption about the DGP of returns, we use factor models and related market information from the real economy. The idea is to create a set of gross returns with the objective of mimicking the real world structure as closely as possible. Our starting point is the estimation of linear factor models (in the sense of the Arbitrage Pricing Theory, APT, of Ross, 1976), in which the choice of common factors, which usually correspond to unobserved fundamental influences on returns, comes from different procedures. One example is the well-known set of factors of Fama & French (1993, 1996), who showed that asset returns of the U.S. economy could be explained by factors linked to characteristics of firms. The next step is to create a framework to compare the competing asset pricing models. In this sense, a Monte Carlo simulation is constructed to mimic, as closely as possible, the temporal dependency and the observed covariance across gross returns. Finally, the artificial returns are used to investigate the performance of the competing asset pricing models based on the performance of some statistics. In order to compare asset pricing models, several works use the HJ-distance on real data. Our strategy allows calculating the average and median of the HJ-distance across all realizations of the Monte Carlo experiment, which is shown to be a useful model evaluation tool. The main objective here is not to investigate and/or test distinct factor models to explain actual market returns, but rather to provide a simulated multifactor approach that allows one to properly compare and evaluate different SDF proxies. In this sense, this paper also follows the idea of Farnsworth et al. (2002), who study different SDFs by constructing artificial mutual funds using real stock returns from the CRSP data.
In addition, this controlled framework may be used for other applications that involve the study of asset returns, such as portfolio risk analysis or stress testing exercises. It is worth mentioning that the results presented throughout the paper (regarding the comparison of SDF proxies) are conditional on the factor models adopted to replicate returns. In other words, we implicitly assume that those models are representative of the return series, given that the focus of the paper is on SDF comparison through Monte Carlo simulation and not on factor model investigation.
Nonetheless, one advantage of this methodology is that it does not restrict the analysis to known factors like the three factors of Fama and French, but also allows for purely statistical procedures, such as factor analysis. Moreover, the beta parameters associated with those factors are estimated instead of calibrated. In addition, the covariance structure across returns is conveniently taken into account in order to replicate the observed structure in the simulated setup.
To illustrate our methodology, we present a simple empirical application for the U.S. stock market, in which three SDF estimators are compared: a) the nonparametric estimator of Araujo et al. (2006); b) the Brownian motion pricing model studied in Brandt et al. (2006); and c) the traditional linear CAPM (see Cochrane (2001, p.152-166)). We also estimate the Hansen and Jagannathan SDF of minimum variance that will be used as a benchmark. The common factors used in this exercise are formed by three sets: (i) factors provided by the use of factor analysis; (ii) the well-known three-factor model of Fama & French (1993, 1996); and (iii) an extended version of the previous three-factor model of Fama and French (see Ang et al., 2006), including momentum and short- and long-term reversal factors. In comparing asset pricing models, we use the average and median of the HJ-distances and a goodness-of-fit statistic provided by the pricing error. The results indicate that the SDF of Brandt et al. (2006) seems to be the best model, given that the Brownian motion hypothesis is able to generate SDF dynamics with adequate statistical features, which are closer to the Hansen and Jagannathan SDF. This paper is organized as follows: Section 2 presents some stochastic discount factor proxies; Section 3 discusses the factor models; Section 4 shows the procedures used to replicate the gross returns and the statistical measures to evaluate the performance of the SDF estimators; Section 5 presents the empirical application for the U.S. stock market; and Section 6 concludes.

Stochastic Discount Factor models
A general framework for asset pricing is well described in Harrison & Kreps (1979), Hansen & Richard (1987) and Hansen & Jagannathan (1991), associated with the stochastic discount factor (SDF), which relies on the pricing equation $p_t = E_t(m_{t+1} x_{i,t+1})$, where $E_t(\cdot)$ denotes the conditional expectation given the information available at time $t$, $p_t$ is the asset price, $m_{t+1}$ is the stochastic discount factor, and $x_{i,t+1}$ is the payoff of the $i$-th asset in $t+1$. This pricing equation means that the market value today of an uncertain payoff tomorrow is represented by the payoff multiplied by the discount factor, also taking into account different states of nature by using the underlying probabilities. 1
The stochastic discount factor model provides a general framework for pricing assets. As documented by Cochrane (2001), asset pricing can basically be summarized by two equations:
$$p_t = E_t(m_{t+1} x_{t+1}) \qquad (1)$$
$$m_{t+1} = f(\text{data}, \text{parameters}) \qquad (2)$$
where the model is represented by the function $f(\cdot)$, and the pricing equation (1) can lead to different predictions stated in terms of returns. 2
Hansen & Jagannathan (1991) propose a way to calculate the SDF and provide a lower bound on the variance of a stochastic discount factor. In fact, although the authors do not deal with a direct estimate of the SDF, they show that the mimicked discount factor $M^*_{t+1}$ has a direct relation to the minimal conditional variance portfolio. Moreover, they exploit the fact that it is always possible to project the SDF onto the space of payoffs, which makes it straightforward to express the mimicking portfolio as a function of only observable variables:
$$M^*_{t+1} = \iota_N' \left[ E_t\left( R_{t+1} R_{t+1}' \right) \right]^{-1} R_{t+1} \qquad (3)$$
where $\iota_N$ is an $N \times 1$ vector of ones, and $R_{t+1}$ is an $N \times 1$ vector stacking all asset returns. Equation (3) delivers a nonparametric estimate of the SDF that is solely a function of asset returns. There are different estimates of the SDF derived from other hypotheses, such as the SDF derived from the hypothesis of Brownian motion pricing, the linear stochastic discount factor derived from the CAPM, and the nonparametric SDF of Araujo et al. (2006).
1 According to Cochrane (2001, p.68), unless markets are complete, there are an infinite number of SDFs, but all can be decomposed as $m_{t+1} = m^*_{t+1} + v_{t+1}$, where $E_t(v_{t+1} R^i_{t+1}) = 0$, in which $m^*_{t+1}$ is called the SDF mimicking portfolio.
2 For instance, in the Consumption-based Capital Asset Pricing Model (CCAPM) context, the first-order conditions of the consumption-based model are summarized by the well-known Euler equation $p_t = E_t\left[ \beta \frac{u'(c_{t+1})}{u'(c_t)} x_{t+1} \right]$. The specification of $m_{t+1}$ corresponds to the intertemporal marginal rate of substitution. Hence, $m_{t+1} = f(c, \beta) = \beta \frac{u'(c_{t+1})}{u'(c_t)}$, where $\beta$ is the discount factor for the future, $c_t$ is consumption and $u(\cdot)$ is a given utility function. The pricing equation (1) mainly illustrates the fact that consumers (optimally) equate marginal rates of substitution to prices.
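As an illustration of how a returns-only SDF of this kind can be computed, the following minimal sketch builds the sample analogue of $M^*_t = \iota_N' [E(R_t R_t')]^{-1} R_t$. It assumes that the conditional second moment is replaced by its unconditional sample average; the function name `hj_mimicking_sdf` and the simulated data are hypothetical and serve only as an example.

```python
import numpy as np

def hj_mimicking_sdf(returns):
    """Sample analogue of M*_t = iota' [E(R R')]^{-1} R_t.

    returns: (T, N) array of gross returns.
    Assumption: E_t(R R') is replaced by the unconditional sample average.
    """
    T, N = returns.shape
    second_moment = returns.T @ returns / T                # estimate of E(R R')
    weights = np.linalg.solve(second_moment, np.ones(N))   # [E(R R')]^{-1} iota
    return returns @ weights                               # (T,) SDF series

# Illustrative (made-up) gross returns: 240 periods, 5 assets.
rng = np.random.default_rng(0)
R = 1.01 + 0.05 * rng.standard_normal((240, 5))
m_star = hj_mimicking_sdf(R)

# By construction, the sample mean of M*_t R_{i,t} equals one for every asset.
pricing = (m_star[:, None] * R).mean(axis=0)
```

In this sketch the SDF prices the assets used in its construction exactly in sample, which is why it is a natural benchmark for the other proxies.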

i) Brownian motion pricing model
The price dynamics of a risky asset follows the basic Black & Scholes assumptions. Suppose that a vector of asset prices follows a geometric Brownian motion (GBM). This hypothesis is defined by the following stochastic differential equation:
$$\frac{dP}{P} = (R_f \iota_N + \mu)\,dt + \Sigma^{1/2}\,dB$$
where $\frac{dP}{P} = \left( \frac{dP_1}{P_1}, ..., \frac{dP_N}{P_N} \right)'$, $\mu = (\mu_1, ..., \mu_N)'$, $\Sigma$ is an $N \times N$ positive definite matrix, $P_i$ is the price of asset $i$, $\mu$ is the risk premium vector, $R_f$ is the risk-free rate, and $B$ is a standard Brownian motion of dimension $N$. Using the Itô theorem, it is possible to show that, over a unit time interval:
$$R_{t+1} = \exp\left\{ R_f \iota_N + \mu - \tfrac{1}{2}\,\mathrm{diag}(\Sigma) + \Sigma^{1/2} Z_t \right\}$$
where the exponential is taken element by element and $Z_t$ is a vector of $N$ independent variables with Gaussian distribution. Therefore, the SDF proposed by Brandt et al. (2006) is calculated as
$$M_{t+1} = \exp\left\{ -\left( R_f + \tfrac{1}{2}\,\mu' \Sigma^{-1} \mu \right) - \mu' \Sigma^{-1/2} Z_t \right\}$$
and the estimator of this stochastic discount factor model is a function of the estimates of $\mu$, $R$ and $\Sigma$.
ii) Capital Asset Pricing Model (CAPM)
Using the pricing equation, it is easy to show that the CAPM implies a single-beta representation, which is also equivalent to linear models for the discount factor $m_{t+1} = a + b R_{w,t+1}$, where $R_{w,t+1}$ is a factor relative to the market risk. Therefore, assuming the unconditional CAPM, the SDF is a linear function of market returns. For instance, in the U.S. economy, in order to implement the CAPM, for practical purposes, it is commonly assumed that the return on the value-weighted portfolio of all stocks listed on the NYSE, AMEX, and NASDAQ is a reasonable proxy for the return on the market portfolio of all assets of the U.S. economy.
iii) Araujo et al. (2006)
An estimator for the stochastic discount factor within a panel data context is proposed by Araujo et al. (2006). This estimator assumes that, for every asset $i \in \{1, ..., N\}$, the vector process $\{\ln(M_t R_t)\}$ is covariance stationary with finite first and second moments. In addition, under no arbitrage and some mild additional conditions, they show that a consistent estimator for a positive SDF $M_t$ is given by:
$$\hat{M}_t = \frac{\bar{R}^G_t}{\frac{1}{T} \sum_{j=1}^{T} \bar{R}^G_j \bar{R}^A_j}$$
where $\bar{R}^A_t = \frac{1}{N} \sum_{i=1}^{N} R_{i,t}$ and $\bar{R}^G_t = \prod_{i=1}^{N} R_{i,t}^{1/N}$ are respectively the cross-sectional arithmetic and geometric mean of all gross returns. Therefore, this nonparametric estimator depends exclusively on appropriate means of asset returns that can easily be implemented. 3
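Since this estimator depends only on cross-sectional means, it can be sketched in a few lines. The example below assumes the estimator takes the form $\hat{M}_t = \bar{R}^G_t / (\frac{1}{T}\sum_j \bar{R}^G_j \bar{R}^A_j)$, with $\bar{R}^A_t$ and $\bar{R}^G_t$ the cross-sectional arithmetic and geometric means of gross returns; the function name `araujo_sdf` and the simulated returns are hypothetical.

```python
import numpy as np

def araujo_sdf(returns):
    """Means-based SDF proxy (assumed form, see the lead-in).

    returns: (T, N) array of strictly positive gross returns.
    """
    arith = returns.mean(axis=1)                   # cross-sectional arithmetic mean
    geom = np.exp(np.log(returns).mean(axis=1))    # cross-sectional geometric mean
    return geom / np.mean(geom * arith)            # normalise by the time average

# Illustrative (made-up) lognormal gross returns: 240 periods, 25 assets.
rng = np.random.default_rng(1)
R = np.exp(0.01 + 0.05 * rng.standard_normal((240, 25)))
m_hat = araujo_sdf(R)

# The normalisation makes the proxy price the equally weighted portfolio
# correctly on average: the time mean of m_hat * arith equals one.
avg_priced = np.mean(m_hat * R.mean(axis=1))
```

The proxy is positive whenever all gross returns are positive, because it is a ratio of positive quantities.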

Multifactor Pricing Models
A benchmark for the development of asset pricing models is the work of Sharpe (1964), which proposed the well-known Capital Asset Pricing Model (CAPM). The CAPM approach is based on a single factor to explain different return series and, despite its simplicity, quite often does not exhibit a good fit to real data. In this sense, Ross (1976) proposed a multifactor approach based on "no arbitrage" assumptions to explain return series, resulting in the so-called APT (Arbitrage Pricing Theory) model. Afterward, Fama & French (1993) suggested the 3-factor model, based on market and firm characteristics, with the aim of improving the fit to return data and capturing market anomalies. Based on the "momentum" factor of Jegadeesh & Titman (1993), the 4-factor model was later proposed by Carhart (1997).
More recently, Grinblatt & Titman (2002) divide the factor model literature into three main categories: (i) factors derived from macroeconomic variables (e.g., CAPM, ICAPM, the Intertemporal Capital Asset Pricing Model; see Cochrane (2001, p.166) for further details); (ii) factors based on firm attributes; and (iii) factors based on statistical procedures. In this work, we ground the analysis on categories (ii) and (iii) by using the Fama-French (standard and extended) factors and principal component-based factors to replicate return series. Nonetheless, it is worth mentioning that different approaches could also be employed to generate artificial returns from multifactor models (see Campbell et al. (1997, p.219) for a good survey).
We start by investigating the APT model of Ross (1976) in order to use a multifactor pricing model to reproduce artificial asset returns. Consider a K-factor model from a set of observed gross returns in which $K$ is the total number of common components or factors $X_{t,k}$.
to recover the logarithm of the SDF relies on one of its basic properties: the SDF can be interpreted as a "common feature", in the sense of Engle & Kozicki (1993), of every asset return of the economy. Thus, under mild regularity restrictions (e.g. absence of arbitrage opportunities) on the behavior of asset returns, the authors treat the SDF as a stochastic process and build a consistent estimator for it, which is a simple function of the arithmetic and geometric averages of asset returns alone, and does not depend on any parametric function used to characterize preferences.

Rev. Bras. Finanças (Online), Rio de Janeiro, Vol. 10, No. 4, December 2012
Factor models summarize the systematic variation of the $N$ elements of the vector $R_t$ using a reduced number of $K$ factors. The expected return-beta expression of a factor pricing model is:
$$E(R_i) = \gamma + \sum_{k=1}^{K} \beta_{i,k} \lambda_k$$
where $\lambda_k$ is interpreted as the price of the $k$-th risk factor. Fama and French constructed factors and developed the pricing model that combined these factors to explain the average of stock returns. They showed that some factors can (relatively well) explain the average of stock returns. 4 They showed that, besides the market risk, there are other important related factors, such as size, book-to-market ratio, momentum and leverage, among others, that help explain the average return in the stock market. The authors mentioned that these factors are indeed related to economic fundamentals and that these additional factors might (quite well) help to understand the dynamics of the average return. This evidence has been demonstrated in subsequent works and for different stock markets (see Gaunt (2004) and Griffin (2002) for a good review). 5 The three main factors, described below, are the SMB, HML and RM.
(i) The SMB (Small Minus Big) factor is constructed to measure the size premium. In fact, it is designed to track the additional return that investors have historically received by investing in stocks of companies with relatively small market capitalization. A positive SMB in a given month indicates that small cap stocks have outperformed the large cap stocks in that month. On the other hand, a negative SMB suggests that large caps have outperformed.
(ii) The HML (High Minus Low) factor is constructed to measure the value premium provided to investors for investing in companies with high book-to-market values. A positive HML in a given month suggests that "value stocks" have outperformed "growth stocks" in that month, whereas a negative HML indicates the opposite. 6
(iii) The Market factor $RM = R_M - R_f$ is the market excess return in comparison to the risk-free rate. For example, in the U.S. economy, the RM can be proxied by the value-weighted portfolio of all stocks listed on the New York Stock Exchange (NYSE), the American Stock Exchange (AMEX), and NASDAQ (from CRSP data) minus the one-month Treasury Bill rate.
Considering these three factors, the factor model for expected returns is given by:
$$E(R_i) - R_f = \beta_{im} \left[ E(R_M) - R_f \right] + \beta_{is} E(SMB) + \beta_{ih} E(HML) \qquad (13)$$
where the betas $\beta_{im}$, $\beta_{is}$ and $\beta_{ih}$ are slopes in the multiple regression (13). Hence, one implication of the expected return equation of the three-factor model is that the intercept in the time-series regression (14) is zero for all assets $i$:
$$R_{it} - R_{ft} = \alpha_i + \beta_{im} (R_{Mt} - R_{ft}) + \beta_{is} SMB_t + \beta_{ih} HML_t + \varepsilon_{it} \qquad (14)$$
Using this criterion, Fama & French (1993, 1996) find that the model captures much of the variation in the average return for portfolios formed on size, book-to-market ratio and other price ratios. The Fama and French approach is (in fact) a multifactor model that can be seen as an expected beta 7 representation of linear factor pricing models of the form:
$$E(R_i) = \gamma + \beta_{im} \lambda_m + \beta_{is} \lambda_s + \beta_{ih} \lambda_h$$
By running this cross-sectional regression of average returns on betas, one can estimate the parameters $(\gamma, \lambda_m, \lambda_s, \lambda_h)$. Notice that $\gamma$ is the intercept and $\lambda_m$, $\lambda_s$ and $\lambda_h$ are the slopes in this cross-sectional relation. In addition, $\beta_{im}$, $\beta_{is}$ and $\beta_{ih}$ are the unconditional sensitivities of the $i$-th asset to the factors. 8 Moreover, $\beta_{ij}$, for some $j \in \{m, s, h\}$, can be interpreted as the amount of risk exposure of asset $i$ to factor $j$, and $\lambda_j$ as the price of such risk exposure.
On the other hand, one can use factors different from the Fama-French approach to help explain the cross-sectional variation of asset returns. An interesting example is the statistical technique known as factor analysis, which has been used to estimate factors from a large number of asset returns. Factor analysis explains the covariance relationships among a number of observed variables in terms of a much smaller number of unobserved variables, termed factors, which reduces the dimensionality of the problem. In other words, this approach allows one to identify a small set of orthogonal unobservable factors by summarizing all the information contained in the original dataset (see Tucker & MacCallum (1993) and Johnson & Wichern (1992) for further details). The factor analysis technique applied to the gross return series $R_{it}$ involves the following model:
$$R_{it} = \mu_i + \sum_{j=1}^{J} L_{i,j} F_{j,t} + v_{it}$$
where $J$ is the number of factors adopted, $\mu_i$ is the unconditional mean of the gross returns, $L_{i,j}$ represents the factor loading (i.e., the contribution of each return to the variation of each factor), $F_{j,t}$ is the $j$-th factor and $v_{it}$ is an error term with zero mean and finite variance. Therefore, when principal component analysis is used to estimate, for example, three factors, the first factor $F_{1,t}$ alone accounts for $x$ percent of the total variance, whereas the second ($F_{2,t}$) and the third ($F_{3,t}$) account for $y$ and $z$ percent of the total variance, respectively, with $x > y > z$.
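The ordering of explained variance across factors can be seen in a small sketch. The example below uses principal components of the sample covariance matrix as one possible way to extract three statistical factors; this is an assumption made for illustration, not necessarily the exact factor analysis routine used in the paper, and the function name `pca_factors` and the simulated data are hypothetical.

```python
import numpy as np

def pca_factors(returns, n_factors=3):
    """Extract the leading principal components from a (T, N) return panel."""
    demeaned = returns - returns.mean(axis=0)
    cov = demeaned.T @ demeaned / (len(returns) - 1)   # sample covariance
    eigval, eigvec = np.linalg.eigh(cov)               # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_factors]       # indices of the largest ones
    loadings = eigvec[:, order]                        # (N, J) loading matrix
    factors = demeaned @ loadings                      # (T, J) factor series
    share = eigval[order] / eigval.sum()               # share of total variance
    return factors, loadings, share

# Illustrative (made-up) panel driven by three latent factors of decreasing scale.
rng = np.random.default_rng(2)
latent = rng.standard_normal((280, 3)) * np.array([0.06, 0.03, 0.015])
R = 1.01 + latent @ rng.standard_normal((3, 20)) + 0.01 * rng.standard_normal((280, 20))
factors, loadings, share = pca_factors(R)
```

Here `share` holds the fractions of total variance attributed to the first, second and third components, which are non-increasing by construction, and the extracted factor series are mutually uncorrelated in sample.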

Replicating Return Time Series
Now we construct hypothetical returns using the multifactor pricing models. We first estimate the beta parameters from a linear factor model. Then, we replicate the returns by creating artificial series which mimic the real world ones. Finally, based on the artificial returns created through a Monte Carlo simulation, we evaluate the SDF candidates within this controlled setup.

A Controlled Environment to Simulate Portfolios
Since the objective of this paper is to provide a controlled setup to evaluate SDF estimators, we now present a simple methodology to replicate return series based on factor models. Given that a linear factor model approach is adopted to mimic the real world returns, we now focus on the methodology to replicate a vector of returns from a set of $X_{t,k}$ factors.
The following K-factor model is given by:
$$R_{it} = \alpha_i + \sum_{k=1}^{K} \beta_{i,k} X_{t,k} + \varepsilon_{it} \qquad (17)$$
Following the approach of Ren & Shimotsu (2006), we first estimate the beta parameters and collect the residuals $\varepsilon_{it}$ in order to compute the respective sample covariances (here summarized by the covariance matrix $\Omega$). This covariance matrix is used in the simulation exercise as additional information to the $K$ factors. In this way, the simulated returns can account for both the factors and the model-based residual covariance structure. Second, we run a cross-sectional regression (i.e., a standard one-dimension regression along $i \in \{1, ..., N\}$, which refers to the asset index), which gives the estimates of the risk-free rate $\alpha$ and the factor-mean adjusted risk prices $\eta_k$ based on: (i) $\beta_{i,k}$ estimated in (17); (ii) the sample $X_{t,k}$ for $k = 1, ..., K$; and (iii) the residual of the regression $u_i$. After that, we consider random factors based on normal distributions with mean equal to the sample mean and variance equal to their sample variance. 9 10 Finally, we create artificial return series $R_{it}$ from the estimated model, in which $X_{t,k}$ are the adopted factors, $\tilde{\varepsilon}_{it} = \Omega^{1/2} \epsilon_{it}$ and $\epsilon_{it}$ are drawn from independent standard normal distributions. Notice that, since $E(\epsilon_{it} \epsilon_{it}') = I$, it follows by construction that $E(\tilde{\varepsilon}_{it} \tilde{\varepsilon}_{it}') = \Omega$. Considering the error structure in the multifactor model, the set of assets spans (at least, reasonably) the return space.
The objective here is to make the mean and variance of the simulated returns as close as possible to those implied by the assumed factor models. The next step is to estimate the SDFs based on the artificial return series $R_{it}$ and to evaluate the competing SDF estimators through the Hansen-Jagannathan distance and a goodness-of-fit statistic, which are discussed in the next section. It is worth mentioning that we repeat the previous steps for $n$ replications in order to complete the Monte Carlo simulation. For each replication, we split the set of $N$ generated assets into two groups (with the same number of time series observations within each group). First, we use $\tilde{N} < N$ assets to estimate the SDF candidates (henceforth, this first group of assets is called in-sample). Then, based on the estimated SDF proxies and using the remaining $(N - \tilde{N})$ assets, which form the out-of-sample exercise (based on the approach of Fama & MacBeth (1973)), we compute the goodness-of-fit statistic and the Hansen-Jagannathan distance in order to compare the performance of each SDF candidate. In other words, we want to know how well the proxies perform when new information is considered.
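One replication of this procedure can be sketched as follows. This is a simplified illustration under stated assumptions: artificial returns are generated directly from the time-series regression $R_{it} = \alpha_i + \sum_k \beta_{i,k} X_{t,k} + \varepsilon_{it}$ with estimated coefficients, so the cross-sectional step for the risk prices is omitted, and a Cholesky factor is used as one valid choice of $\Omega^{1/2}$. The function name `simulate_returns` and all data are hypothetical.

```python
import numpy as np

def simulate_returns(returns, factors, rng):
    """One Monte Carlo replication of artificial returns from a linear factor model."""
    T, N = returns.shape
    X = np.column_stack([np.ones(T), factors])            # intercept plus K factors
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)    # OLS: (K+1, N) coefficients
    resid = returns - X @ coef
    omega = resid.T @ resid / T                           # residual covariance Omega
    # Random factors: independent normals with the sample mean and variance.
    sim_factors = rng.normal(factors.mean(axis=0), factors.std(axis=0),
                             size=factors.shape)
    chol = np.linalg.cholesky(omega)                      # chol @ chol.T == omega
    sim_eps = rng.standard_normal((T, N)) @ chol.T        # errors with covariance Omega
    sim_X = np.column_stack([np.ones(T), sim_factors])
    return sim_X @ coef + sim_eps

# Illustrative (made-up) data: 280 periods, 3 factors, 10 assets.
rng = np.random.default_rng(3)
F = 0.04 * rng.standard_normal((280, 3))
R = 1.005 + F @ rng.uniform(0.5, 1.5, (3, 10)) + 0.02 * rng.standard_normal((280, 10))

sim = simulate_returns(R, F, rng)
n_in = 6                                                  # in-sample assets (N-tilde)
in_sample, out_sample = sim[:, :n_in], sim[:, n_in:]      # split along the asset index
```

In a full experiment this replication would be repeated $n$ times, with the SDF proxies estimated on `in_sample` and evaluated on `out_sample` in each round.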

Pricing Error of stochastic discount factor models
i) Hansen and Jagannathan distance
In the asset pricing literature, some measures are suggested to compare competing asset pricing models. The most famous measure is the Hansen and Jagannathan distance, which is employed in this paper to test for model misspecification and compare the performance of different asset pricing models.
The Hansen & Jagannathan (1997) measure is a summary of the mean pricing errors across a group of assets. As shown by Hansen and Jagannathan, the HJ-distance $\delta = \min_{m \in \mathcal{M}} \| y - m \|$, defined in the $L^2$ space, is the distance of the SDF model $y$ to a family of SDFs, $m \in \mathcal{M}$, that correctly price the assets. In another interpretation, Hansen and Jagannathan show that the HJ-distance is the pricing error for the portfolio that is most mispriced by the underlying model. The pricing error can be written as $\alpha_t = E_t(m_{t+1} R_{i,t+1}) - 1$. Notice, in particular, that $\alpha_t$ depends on the considered SDF, and the SDF is not unique (unless markets are complete). Thus, different SDF proxies can produce similar HJ measures. In this sense, even though the investigated SDF models are misspecified, in practical terms, we are interested in those models with the lowest HJ-distance. 11 In the special case of linear factor pricing models, the HJ-distance takes a specific form (see Ren & Shimotsu (2006) for details), in which $X_t$ is a factor vector including a column of 1's.
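For a given SDF proxy $y_t$, the sample HJ-distance can be computed from the well-known quadratic form $\delta = \sqrt{g' G^{-1} g}$, with $g = E(y_t R_t) - \iota_N$ the vector of mean pricing errors and $G = E(R_t R_t')$. The sketch below is a minimal illustration of that formula using sample averages; the function name `hj_distance` and the simulated data are hypothetical.

```python
import numpy as np

def hj_distance(sdf, returns):
    """Sample HJ-distance of an SDF proxy; sdf is (T,), returns is (T, N) gross returns."""
    T, N = returns.shape
    g = (sdf[:, None] * returns).mean(axis=0) - 1.0    # mean pricing errors
    G = returns.T @ returns / T                        # second moment of returns
    quad = g @ np.linalg.solve(G, g)                   # g' G^{-1} g
    return float(np.sqrt(max(quad, 0.0)))              # guard against rounding below zero

# Illustrative (made-up) gross returns: 240 periods, 5 assets.
rng = np.random.default_rng(4)
R = 1.01 + 0.05 * rng.standard_normal((240, 5))

# A proxy that prices all assets exactly in sample has (numerically) zero distance,
# whereas an arbitrary constant discount factor does not.
G = R.T @ R / len(R)
m_exact = R @ np.linalg.solve(G, np.ones(R.shape[1]))
d_exact = hj_distance(m_exact, R)
d_const = hj_distance(np.full(len(R), 0.9), R)
```

A lower value indicates that the proxy is closer to the set of SDFs that price the assets correctly, which is how the measure is used to rank competing models.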
ii) Goodness-of-fit statistic
We also use a pricing error statistic to compare stochastic discount factor models, which is derived from the following equations, as mentioned by Cochrane (2001, p.81).
Note that (theoretically) the pricing error should be null (i.e., $\alpha_t = 0$ for all $t$). However, in practice, due to finite sample data and possible model misspecification, in general we have that $\alpha_t \neq 0$. Nonetheless, the statistical significance of $\alpha_t$ can be used as a model specification test. In the language of excess returns, we investigate the pricing errors through the distance between actual and predicted returns. Let $E(R^e_{i,t}) \equiv E(R_{i,t} - R^f_t)$ express the expected excess returns. For notational simplicity, we denote $E(R^e_{i,t})$ simply by $E(R^e)$. From Cochrane (2001, p.96) we have that $E(mR^e) = 0$. Now, recall that $E(mR^e) = E(m)E(R^e) + cov(m, R^e)$. Thus, it follows that:
$$E(R^e) = -\frac{cov(m, R^e)}{E(m)} \qquad (22)$$
The pricing error based on excess returns (now labelled as $Pr$) can be defined by
$$Pr = \frac{1}{R^f} \left[ E(R^e) - \left( -\frac{cov(m, R^e)}{E(m)} \right) \right] = \frac{1}{R^f} \times (\text{actual mean excess return} - \text{predicted mean excess return}) \qquad (23)$$
where $E(R^e)$ is computed from the actual mean excess return (i.e., estimated through the sample average of $R^e_{i,t}$ along the time dimension) and $-\frac{cov(m, R^e)}{E(m)}$ is the mean excess return predicted by equation (22), in which $cov(m, R^e)$ is estimated via the sample covariance between $m$ and $R^e$. Let $M^s_t$ be the SDF proxy provided by the model $s$ in a family $S$ of asset pricing models. Therefore, based on equation (23), the suggested (finite sample) goodness-of-fit statistic is based on the sum of squared pricing errors $Pr$ (see Cochrane (2001, p.81) for further details). In addition to the previous statistic, the artificial assets are also tested in an out-of-sample setup. In this sense, the set of $(N - \tilde{N})$ assets is used to generate the out-of-sample returns in order to compute the Hansen and Jagannathan distance. That is, we want to know how well the SDF proxies perform when new information is considered.
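The pricing error in excess returns can be illustrated with a short sketch. It computes $Pr_i = \frac{1}{R^f}\left[ E(R^e_i) + \frac{cov(m, R^e_i)}{E(m)} \right]$ from sample moments for each asset and then aggregates with a plain sum of squares; the exact normalization of the statistic used in the paper is not reproduced here, so the aggregation is an assumption, and the function names and data are hypothetical.

```python
import numpy as np

def pricing_errors(sdf, excess, rf_gross):
    """Pr_i = (1 / R^f) * (actual - predicted mean excess return) for each asset.

    sdf: (T,) SDF proxy; excess: (T, N) excess returns; rf_gross: gross risk-free return.
    """
    actual = excess.mean(axis=0)                                   # sample E(R^e)
    m_bar = sdf.mean()                                             # sample E(m)
    cov = ((sdf - m_bar)[:, None] * (excess - actual)).mean(axis=0)
    predicted = -cov / m_bar                                       # predicted mean excess return
    return (actual - predicted) / rf_gross

def sum_squared_errors(sdf, excess, rf_gross):
    """Assumed aggregation: plain sum of squared pricing errors across assets."""
    return float(np.sum(pricing_errors(sdf, excess, rf_gross) ** 2))

# Illustrative (made-up) data: 240 periods, 8 assets.
rng = np.random.default_rng(5)
T, N = 240, 8
rf_gross = 1.003
excess = 0.005 + 0.04 * rng.standard_normal((T, N))
sdf = 0.99 + 0.1 * rng.standard_normal(T)

pr = pricing_errors(sdf, excess, rf_gross)
stat = sum_squared_errors(sdf, excess, rf_gross)
```

With sample moments, each `pr` entry equals the sample mean of $m R^e_i$ divided by $E(m) R^f$, so a proxy satisfying $E(mR^e) = 0$ in sample yields a statistic of zero.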

Empirical Application
In order to illustrate our methodology, we investigate the performance of the three SDF estimators described in Section 2, which are the Brownian motion pricing model, the linear stochastic discount factor of the CAPM and the SDF proposed by Araujo et al. (2006). We also estimate the SDF of Hansen and Jagannathan as a benchmark.

Data
The U.S. portfolios dataset was extracted from the Kenneth R. French website 12 and the asset returns of the Standard & Poor's 500 stock market were obtained from the Yahoo Finance website. The U.S. Treasury Bill return is used as a measure of the risk-free asset. The primitive portfolios used in the competing SDF models are described as follows:
i) 25 portfolios which contain value-weighted returns for the intersections of 5 ME portfolios and 5 BE/ME portfolios. 13
ii) 48 portfolios which contain value-weighted returns for 48 industry portfolios.
iii) 96 portfolios which contain value-weighted returns for the intersections of 10 ME portfolios and 10 BE/ME portfolios.
12 More information about the data can be found at http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html. For other economies, the factors can be constructed as shown in Fama & French (1992).
13 ME is market cap at the end of June. BE/ME is book equity (Returns -Monthly).
In order to construct the set of factors based on Factor Analysis (FA) we use monthly S&P500 stock returns 14 covering the period from February 1987 to July 2010. Moreover, we only consider companies for which data from the S&P500 are available throughout the whole period considered, which reduces the cross-section sample from N = 500 to N = 263. This data reduction comes from the fact that the S&P500 dataset is not balanced (i.e., it is not based on a fixed set of companies throughout time, since the firms that compose the index are revised on a frequent basis 15 ), which makes it difficult to deal with large N and T dimensions, such that N represents a fixed set of companies for a long span of time periods T. Thus, although the aggregate S&P500 index is available well before 1987, from the set of N = 500 firms that compose the index, fewer than 150 (surviving companies) would be available to construct factors in the case of T = 350. Since the larger the set of companies considered, the stronger the motivation to employ the principal component technique, we have decided to only investigate T = 280 in this case.

Factors
We use the three factors constructed by Fama and French, thus the first set of factors is: (i) $X_t = \{R^{ex}_{Mt}; SMB_t; HML_t\}$. Given that the three-factor model of Fama & French (1993, 1996) is not a unanimous approach in the literature, we also investigate an extended version by including momentum ($Mom_t$), short-term reversal ($STRev_t$) and long-term reversal ($LTRev_t$) factors, in order to increase the fit of the factor model to the actual data (see Farnsworth et al., 2002). For example, the momentum factor is defined as the average return on the two high prior return portfolios minus the average return on the two low prior return portfolios, and the short-term and long-term reversal factors are defined as the average return on the two low prior return portfolios minus the average return on the two high prior return portfolios. In addition, the (i) short-term reversal factor, (ii) the momentum factor and (iii) the long-term reversal factor are based on the previous (i) t − 1; (ii) t − 2 to t − 12; and (iii) t − 13 to t − 60 months, respectively. 16 This way, the second set of factors is given by (ii) $X_t = \{R^{ex}_{Mt}; SMB_t; HML_t; Mom_t; STRev_t; LTRev_t\}$. In addition to the Fama and French factors we also construct a new set of factors, based on purely statistical grounds, using assets from the S&P500 stock market.
14 The S&P500 index is based on the stock prices of 500 companies (mostly from the United States) selected by a committee, in order to be representative of the industries in the U.S. economy. Nonetheless, the index nowadays includes a handful of non-U.S. companies (15 as of May 8, 2012). This group includes both formerly U.S. companies that have reincorporated outside the United States and firms that have never been incorporated in the U.S.
15 In order to keep the S&P 500 index reflective of American stocks, the constituent stocks need to be changed from time to time. For instance, it needs to take into account stock liquidity as well as corporate actions such as stock splits, share issuance, dividends and restructuring events such as mergers or spinoffs.
We use the factor analysis technique 17 to construct a set of K factors. In this setup, we study the set of factors generated by factor analysis with K = 3: (iii) $X_t = \{F_{1,t}; F_{2,t}; F_{3,t}\}$. In summary, we study three sets of factors: (i) the Fama and French factors, $X_t = \{R^{ex}_{M,t}; SMB_t; HML_t\}$; (ii) the extended Fama and French setup, $X_t = \{R^{ex}_{M,t}; SMB_t; HML_t; Mom_t; STRev_t; LTRev_t\}$; and (iii) the factor analysis set of factors, $X_t = \{F_{1,t}; F_{2,t}; F_{3,t}\}$.
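The verbal definitions of the momentum and reversal factors given above can be illustrated with a minimal Python sketch. The function names and the `SORT_WINDOWS` dictionary are hypothetical, introduced only for this illustration; the inputs are assumed to be the period returns of the two high and the two low prior return portfolios, which are taken as given rather than constructed here.

```python
from statistics import mean

# Months (lags relative to t) used to sort stocks on prior returns,
# as stated in the text: STRev uses t-1, Mom uses t-2..t-12, LTRev uses t-13..t-60.
SORT_WINDOWS = {"STRev": (1, 1), "Mom": (2, 12), "LTRev": (13, 60)}


def momentum_factor(high_returns, low_returns):
    """Average return on the high prior return portfolios minus
    the average return on the low prior return portfolios."""
    return mean(high_returns) - mean(low_returns)


def reversal_factor(high_returns, low_returns):
    """Average return on the low prior return portfolios minus
    the average return on the high prior return portfolios.
    The same formula serves STRev and LTRev; only the sort window differs."""
    return mean(low_returns) - mean(high_returns)


def sort_window(factor_name):
    """Return the (first_lag, last_lag) window, in months, for a factor."""
    return SORT_WINDOWS[factor_name]
```

Under these assumptions the two definitions differ only in sign for a given pair of portfolio groups; what distinguishes the three factors in practice is the prior-return window on which the portfolios are sorted.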

Results
Based on a given set of observed gross returns, we construct a Monte Carlo simulation in order to replicate the observed returns and then evaluate the competing SDF estimators. We estimate the stochastic discount factors based on the returns generated from the factor models, and repeat this procedure for n = 5,000 replications. Some descriptive statistics of the generated SDFs are presented in the Appendix (Table A.1). Finally, the SDF proxies are evaluated and the simulation results are summarized by a goodness-of-fit statistic and the HJ-distance, both averaged across all replications. We denote the SDF proxies estimated in each replication as models A, B and C, corresponding to Araujo et al. (2006), the Brownian motion setup of Brandt et al. (2006) and the CAPM, respectively. In addition, the stochastic discount factor implied by the Hansen & Jagannathan (1991) setup is estimated as a benchmark, denoted by SDF HJ. In Figure 1, the estimates of the SDF proxies are shown for one replication of the Monte Carlo simulation, with N = 96 and T = 280. A simple graphical investigation reveals that the Brandt et al. (2006) and the Araujo et al. (2006) proxies are, respectively, the most and the least volatile ones, a result confirmed by the descriptive statistics.

Figure 1 shows one replication out of the total of 5,000 replications. We adopt N = 96 (56 in-sample and 40 out-of-sample) assets and three factors obtained from the factor analysis.
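For reference, the HJ-distance used as the evaluation criterion can be written in its usual sample form. The notation below is an assumption made for this sketch and is not taken from the paper: $m_t$ denotes a candidate SDF proxy, $R_t$ the $N \times 1$ vector of gross returns, $\mathbf{1}_N$ a vector of ones, and the superscript $(r)$ indexes Monte Carlo replications.

```latex
\begin{align}
  g_T &= \frac{1}{T}\sum_{t=1}^{T}\left(m_t R_t - \mathbf{1}_N\right), \qquad
  G_T = \frac{1}{T}\sum_{t=1}^{T} R_t R_t', \\
  \delta_T &= \sqrt{\, g_T'\, G_T^{-1}\, g_T \,}, \qquad
  \bar{\delta} = \frac{1}{n}\sum_{r=1}^{n} \delta_T^{(r)} .
\end{align}
```

Here $g_T$ is the vector of average pricing errors, $G_T$ the sample second-moment matrix of returns, and $\bar{\delta}$ the average of the distance over the $n$ replications, so a smaller value indicates a proxy that is closer to the set of admissible SDFs.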
For illustrative purposes, Figures 2 and 3 show, in a "mean versus variance" plot, the real returns and one replication of the simulated returns for the three-factor setup of Fama and French (RM, SMB and HML) and for the six extended factors (RM, SMB, HML, Mom, STRev and LTRev), respectively. These figures indicate that the covariance structure of the artificial returns is preserved.

Rev. Bras. Finanças (Online), Rio de Janeiro, Vol. 10, No. 4, December 2012

Note: A is the SDF of Araujo et al. (2006); B is the SDF over the Brownian motion and C the SDF provided by the CAPM.

Table 1 shows the results for T = 280 and the three sets of factors investigated. Initially considering the three Fama-French factors and N = 25, the mean and median HJ distance, 18 as well as the mean and median of the $p_{1s}$ statistic, indicate model B as the best one, closely followed by model C. Nonetheless, it is worth mentioning that, in this case, the respective standard deviations (although computed across Monte Carlo replications) indicate that all HJ distances might be statistically the same and that the goodness-of-fit statistics $p_{1s}$ might indeed be zero. On the other hand, the HJ distance based on real data selects model A. For N = 48 and N = 96, only the mean and median HJ distance select model C, whereas the goodness-of-fit statistic $p_{1s}$ again suggests model B.

18 The standard error of the HJ-distance is estimated by a Newey & West (1987) HAC procedure, in which the optimal bandwidth (number of lags=5) is given by $m(T) = \mathrm{int}(T^{1/3})$, where $\mathrm{int}(\cdot)$ represents the integer part of the argument and $T$ is the sample size. The kernel used to smooth the sample autocovariance function is a standard modified Bartlett kernel: $w(j, m(T)) = 1 - j/(m(T) + 1)$. See Newey & West (1994) for an extensive discussion about lag selection in covariance matrix estimation, and also Kan & Robotti (2009).
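The bandwidth rule and the modified Bartlett kernel described in the footnote on the HAC standard error can be sketched as follows. This is a generic scalar long-run variance estimator, offered only as an illustration under stated assumptions: the function names are hypothetical, and the authors' exact estimator for the standard error of the HJ-distance is not reproduced here.

```python
def bandwidth(T):
    """Integer part of T**(1/3), computed robustly for integer T."""
    m = int(T ** (1.0 / 3.0))
    # Correct possible floating-point error around perfect cubes.
    while (m + 1) ** 3 <= T:
        m += 1
    while m ** 3 > T:
        m -= 1
    return m


def bartlett_weight(j, m):
    """Modified Bartlett kernel weight: w(j, m) = 1 - j / (m + 1)."""
    return 1.0 - j / (m + 1.0)


def newey_west_variance(x, m):
    """Long-run variance of a scalar series x using m lags:
    gamma_0 + 2 * sum_{j=1..m} w(j, m) * gamma_j,
    where gamma_j is the sample autocovariance at lag j (divided by T)."""
    T = len(x)
    mu = sum(x) / T
    d = [v - mu for v in x]

    def gamma(j):
        return sum(d[t] * d[t - j] for t in range(j, T)) / T

    lrv = gamma(0)
    for j in range(1, m + 1):
        lrv += 2.0 * bartlett_weight(j, m) * gamma(j)
    return lrv
```

The linearly declining weights guarantee a non-negative variance estimate, which is the usual reason for preferring the Bartlett kernel over equally weighted autocovariances.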

[Table panel: T = 280 and the three factors RM, SMB and HML of Fama and French; N = 25 (15 in-sample, 10 out-of-sample). Columns: Mean HJ, Median HJ, S.E. HJ, HJ Real, S.E. HJ, Mean P1s, Median P1s, S.E. P1s.]
Considering the second part of the results, for the six extended Fama-French factors, the mean and median HJ distance, as well as the $p_{1s}$ statistic, select model B, which differs from the result obtained with real data. In the third part of the results, based on three factors generated via factor analysis from the S&P500 dataset, all statistics again indicate model B as the best one, whereas the real data suggest model A.
The results from Tables 2 and 3 point to the same findings and, in general, the model ranking based on artificial returns is B, C, A (from best to worst). It should be highlighted that the Hansen-Jagannathan SDF would be the best of the four SDFs presented; however, it is not considered in the "horse race", since it is designed to generate an adequate HJ distance, and it is presented here only as a benchmark. For further details regarding these findings, see the corresponding figures. Therefore, the nonparametric and fully data-driven model A is often selected only when considering real data, mainly for higher values of N and T, which is a natural result since its performance improves asymptotically as the sample sizes N and T increase. However, this result is not robust to a kind of "reality check", in the sense of White (2000), within a simulated framework produced to mimic asset returns. In this case, model B is selected, since it produces an SDF whose statistical properties, such as volatility, are the closest to those of the HJ SDF (see Table A.1). This feature has a direct influence on the results for the HJ distance. Regarding the $p_{1s}$ statistic, although model B is again the selected one, notice that (overall) the considered models seem to exhibit a pricing error that is statistically zero, if one considers the standard deviation of this statistic across all replications as a proxy for its standard error. Finally, model C is not selected in real data, and is usually not suggested in the simulated returns setup, a result closely linked to the hypotheses embodied in the CAPM framework (e.g., two-period model and log utility function), which reveals that such hypotheses are reflected in neither real nor artificial data. 19
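The way the simulation output is summarized and compared can be sketched in Python. This is an illustration only: the function names are hypothetical, the inputs are assumed to be lists holding one statistic per Monte Carlo replication, and the two-standard-deviation threshold is an assumption of this sketch rather than a rule stated in the paper.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Mean, median and standard deviation of a statistic across replications."""
    return {"mean": mean(values), "median": median(values), "sd": stdev(values)}


def rank_models(hj_by_model):
    """Order model labels from the smallest to the largest mean HJ distance.
    hj_by_model maps a label (e.g. "A", "B", "C") to its per-replication distances."""
    return sorted(hj_by_model, key=lambda label: mean(hj_by_model[label]))


def statistically_zero(values, z=2.0):
    """Informal check: is the mean within z standard deviations of zero?
    The standard deviation across replications is used as a proxy for the
    standard error, as in the text; z = 2.0 is an assumed threshold."""
    s = summarize(values)
    return abs(s["mean"]) <= z * s["sd"]
```

A typical use would pass the per-replication HJ distances of each model to `rank_models` and the per-replication goodness-of-fit statistics to `statistically_zero`.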

Conclusions
In this paper, we propose a methodology to compare different stochastic discount factor models constructed from relevant market information. Based on a multifactor approach, which is grounded on characteristics of the firms in a particular economy, a Monte Carlo simulation strategy is proposed in order to generate a set of artificial returns that is compatible with those factors. One feature of this methodology is that the comparison relies directly on estimated stochastic discount factor time series and their ability to properly price asset returns. One advantage of this approach is that it allows the performance of different models to be investigated not only on a single realization of asset return series (i.e., real data time series), but also within a simulated setup with up to n = 5,000 replications of real data, so that model performance is compared on a much broader dataset. This approach can mimic observed data features (e.g., time-dependence and covariance structures) and thus provide a reality check to evaluate distinct SDFs (see White, 2000).
Therefore, the main contribution of this paper consists of a methodology to compare distinct SDFs in a setup where a multifactor approach is used to summarize a given economic environment, which in turn is used to generate numerical simulations in which SDF proxies are compared through a goodness-of-fit statistic and the Hansen and Jagannathan distance. An empirical application is provided to illustrate our methodology, in which return time series are produced from three sets of factors of the U.S. economy.
The main results based on real data quite often indicate the SDF model of Araujo et al. (2006). Its nonparametric setup and data-driven approach lead to a performance improvement as the sample sizes increase (i.e., the number of considered assets and time periods). Nonetheless, this result is not robust when considering simulated datasets within a Monte Carlo exercise. In this case, the SDF of Brandt et al. (2006) seems to be the best model, given that the Brownian motion hypothesis is able to generate SDF dynamics with adequate statistical features, which are closer to those of the Hansen and Jagannathan SDF. Finally, the CAPM-derived SDF is not often indicated by the comparison exercise, since its restrictive hypotheses, such as the log utility function and the two-period model, seem to be rejected by both real and simulated data.
Future extensions of this paper might also include the investigation of other SDF proxies as well as the adoption of distinct factors. In addition, the analysis of such models in other economies, such as developed or emerging ones, might lead to distinct model recommendations, depending on the adequacy of each model's assumptions with respect to different market features. For example, the empirical exercise could be extended to the Brazilian stock market. However, market-specific features, such as liquidity issues and structural breaks, should be properly taken into account in order not to bias the results regarding the estimated SDFs. For instance, the merger in 2008 of the Brazilian stock market (Bovespa) with the Brazilian Mercantile & Futures Exchange (BMF) 20 resulted in a liquidity break, due to the sudden hike in market liquidity. A formal treatment of this issue (within a factor modeling setup) would require, for instance, the use of dummies and tests for the hypothesis of a time series structural break (e.g., in monthly financial volume). In order to guide this possible route, some papers focused on factor models and Brazilian data are worth mentioning. 21 For instance, Rayes et al. (2012) [...] to investment portfolios with variable weightings, still explains the returns in view of the structural break in the Brazilian stock market in terms of its liquidity. 22

21 Some related papers are the following: (i) Neves & Leal (2003), who investigate the relationship between GDP growth and the effects of size, value and momentum within a FF setup; (ii) Málaga & Securato (2004), who corroborate the statistical significance of the three FF factors regarding return forecasts; (iii) Lucena & Pinto (2005), who revisit the FF model for Brazilian data and adapt it to include parameters from ARCH-GARCH models; (iv) Mussa et al. (2007), who investigate an augmented four-factor model (including momentum), concluding that the three original FF factors are significant for the majority of the considered portfolios; and (v) Mussa et al. (2009), who test the CAPM, 3-factor and 4-factor models, concluding with respect to the FF setup that the HML effect is quite significant for the Brazilian stock market.