The Volatility of Mutual Fund Performance

Previous research has shown that fund performance is reduced by higher expense ratios but improved by more active management. Using data for equity mutual funds from 1991-2012, we show that prior studies have overlooked the fact that a high degree of active management magnifies the extremes of performance. In addition, funds with higher expense ratios and turnover ratios have had greater volatility of performance as well as lower mean performance, a doubly adverse pattern. Thus, mutual funds with more active management, higher expense ratios, and higher turnover ratios are riskier.


Introduction
There is a vast literature on mutual fund performance and most studies relate fund characteristics to risk-adjusted returns. 1 Previous research has shown that mutual fund net performance is reduced by higher expense ratios. Some studies have suggested poorer performances for funds with higher turnover ratios. Several recent papers have presented evidence that more actively managed funds have higher net performance.
Our study shows that these previous results have overlooked important aspects of the impact of mutual fund characteristics on fund performance. While active management may increase mean performance, we show that it also increases the dispersion of performance. As a result, active management does not provide a foolproof guarantee of better results for investors. 2 Further, we find that not only do higher expense ratios or higher portfolio turnover reduce mean performance, but they also increase the dispersion of performance, a doubly adverse effect.
Prior research assumes that a single equation regression model can describe the relationship between performance and fund characteristics. We show that single equation regression models of mutual fund performance have pronounced heteroscedasticity problems. Specifically, mutual funds with more active strategies, high expense ratios or high turnover ratios have markedly higher dispersion of performance compared to mutual funds with low expense ratios or low turnover ratios or relatively passive strategies.
The standard corrections for heteroscedasticity do not adequately capture these dispersion differences.
Our paper begins by first documenting heteroscedasticity in the risk-adjusted performance of US equity mutual funds from 1991 to 2012. Then for comparison with earlier studies, we repeat single-equation regression models of performance on fund characteristics and replicate the results of previous single equation regression studies using the standard adjustments for heteroscedasticity.
Next, we run quantile regressions on the same data. Quantile regression was developed by Koenker and Bassett (1978) and has been applied to financial studies in recent years (see, for example, Baur, Dimpfl, & Jung, 2012; Ma & Pohlman, 2008). Quantile regressions fit a separate linear function for each chosen quantile of the conditional distribution of the dependent variable. Quantile regressions allow for the possibility that the relation between the dependent variable and each independent variable may differ for specific levels of the dependent variable and may differ for each explanatory variable. One of the extremely valuable features of quantile regression is its ability to differentiate those variables that have a changing relationship with the dependent variable from those that have a steady relationship with the dependent variable. Thus, quantile regressions allow a much fuller description of the relationships between fund performance and expense ratios, turnover ratios, active management, fund size, and family size.
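For reference, the standard Koenker-Bassett estimator can be written as below. The notation is assumed here for illustration and is not taken from the paper's Appendix: $y_i$ is fund performance, $x_i$ is the vector of fund characteristics (including a constant), and $\tau \in (0,1)$ is the quantile level.

```latex
\hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i^{\top}\beta\right),
\qquad
\rho_\tau(u) = u\left(\tau - \mathbf{1}\{u < 0\}\right)
```

With $\tau = 0.5$ the check loss $\rho_\tau$ is proportional to the absolute error, so the fit is a median regression. A characteristic whose coefficient in $\hat{\beta}(\tau)$ changes with $\tau$ has a changing relationship with performance; one whose coefficient is stable across $\tau$ has a steady relationship.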
Using quantile regressions, we find that the relationship between fund performance and each of active management, the turnover ratio, and the expense ratio is markedly different for high levels of fund performance compared to low levels of fund performance. That is, the relationship between these three variables and fund performance is heteroscedastic. However, for fund size and for mutual fund family size, the relationship with performance is the same at different levels of performance. That is, there is a homoscedastic relation between performance and fund size and between performance and family size.
Most importantly, empirical results from quantile regression models show that the degree of active management has diametrically opposite impacts on fund performance for high-performing versus low-performing mutual funds. For those mutual funds that perform well, a higher degree of active management results in even better performance. For poorly performing funds, more active management exacerbates the poor performance. Because actual future performance cannot be predicted with complete certainty in specific time intervals, active management strategies entail higher risk. The greater risk of active management has been overlooked in the literature.
There has been considerable controversy about the value of active fund management. Several recent papers have presented evidence that active management style results in higher average performance (Cremers & Petajisto, 2009; Amihud & Goyenko, 2013). However, advocates of the efficient market hypothesis argue that active investment management must be a zero-sum game. That is, any potential over-performance of some actively managed funds must be offset by under-performance of other sectors of the market (Fama & French, 2010). Consistent with the efficient market hypothesis, several studies show that over-performance by some actively managed funds is mostly due to luck, not skill, and there is little or no evidence of persistence in fund over-performance (Barras, Scaillet, & Wermers, 2010; Busse, Goyal, & Wahal, 2010). In addition, a recent study by Frazzini, Friedman, and Pomorski (2016) finds that the positive relation between Active Share, a measure of fund active management, and fund performance documented in Cremers and Petajisto (2009) disappears after controlling for differences in benchmark returns.
Contributing to the debate, our study shows that the superior average performance of actively managed funds is accompanied by higher variability in performance. Thus, investors in actively managed funds do not get a free lunch. Instead, the superior average performance entails higher risks.
Active management not only increases performance variability across different funds, but also accentuates time-series volatility. If a fund has good performance in one period, high active management amplifies its good performance. If its performance is poor in the next period, high active management magnifies the poor performance. The result is greater time-series volatility in performance for funds with active management relative to funds with more passive management. Thus, the higher risk-adjusted return variability of actively managed funds suggests that static, one-period or historical average Carhart 4-factor alphas might not capture all risks. Consequently, investors should examine the volatility in the traditional 'risk-adjusted' returns.
In addition, quantile regressions for high-performing funds show that the impacts of the expense ratio and the turnover ratio are insignificantly different from zero. In contrast, the quantile regressions for poorly performing funds indicate that higher expense ratios and higher turnover ratios have a pronounced negative impact on performance. Since investors do not know in advance which funds will be good performers and which will be poor performers, higher expense ratios and higher turnover ratios increase downside risk without any upside gain. While the lower mean performance of funds with higher expense ratios is well documented, the greater dispersion of performance has been unnoticed.
To demonstrate the power of quantile regressions, Fig. 1 shows an OLS regression and quantile regressions of fund performance (measured by Carhart 4-factor alphas) on fund expense ratios using equity mutual fund data for 2012. Panel A is the scatter plot of fund performance versus the expense ratio with the OLS fitted line. The fitted OLS line is downward sloping, consistent with prior literature findings of lower performance for higher expense ratio funds. However, the scatter plot clearly shows that fund performance is heteroscedastic and that the OLS regression fails to capture the large variance in performance of high expense ratio funds. Panel B presents the quantile regression fits for the conditional 10th, 25th, 50th, 75th and 90th quantiles (or percentiles) of the 4-factor alphas. The slopes of the five fitted lines are quite different from each other and from the OLS fitted line. For the 90th quantile, the fitted line is basically flat, suggesting that the expense ratio has minimal impact upon the risk-adjusted returns of high performing funds. In contrast, the slope of the 10th percentile fitted line is much steeper, indicating a much larger negative impact of the expense ratio on the risk-adjusted returns of poorly performing funds than suggested by the OLS slope.
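The contrast between a single OLS line and a fan of quantile lines can be reproduced in miniature. The sketch below is illustrative only: it uses synthetic data rather than the CRSP sample, and the function names, the sample size, and the noise scale of 0.78 are assumptions chosen so that the population slopes of the 10th, 50th and 90th percentile lines are roughly -2, -1 and 0. The estimator is an exact brute-force search that is practical only for one regressor and small samples; it is not the estimation routine used in the paper.

```python
import random


def check_loss(u, tau):
    """Koenker-Bassett check (pinball) loss for a single residual u."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ols_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def quantile_line(x, y, tau):
    """Exact one-regressor quantile regression by brute force.

    Some minimizer of the total check loss passes through two data
    points, so it is enough to try the line through every pair.
    """
    best = None
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] == x[j]:
                continue  # vertical line, not a valid regression fit
            b = (y[j] - y[i]) / (x[j] - x[i])
            a = y[i] - b * x[i]
            loss = sum(check_loss(yi - a - b * xi, tau) for xi, yi in zip(x, y))
            if best is None or loss < best[0]:
                best = (loss, a, b)
    return best[1], best[2]


def simulate(n=80, seed=0):
    """Synthetic (not CRSP) data: alpha falls with the expense ratio and
    its dispersion widens with the expense ratio."""
    rng = random.Random(seed)
    x = [rng.uniform(0.2, 2.5) for _ in range(n)]  # expense ratio, in %
    y = [-1.0 * xi + 0.78 * xi * rng.gauss(0.0, 1.0) for xi in x]  # alpha, in %
    return x, y


if __name__ == "__main__":
    x, y = simulate()
    print("OLS slope:", round(ols_line(x, y)[1], 2))
    for tau in (0.10, 0.50, 0.90):
        print("tau =", tau, "slope:", round(quantile_line(x, y, tau)[1], 2))
```

On data of this kind the OLS slope summarizes only the average decline, while the spread between the 10th and 90th percentile slopes shows how the dispersion widens with the expense ratio.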
The rest of the paper is organized as follows. Section I describes the data collection and presents the summary statistics. Section II examines the impact of fund characteristics on fund performance volatility and demonstrates significant heterogeneity in fund performance.
Section III analyzes the relation between fund performance and fund characteristics with both conditional-mean based regressions and quantile regressions. Initially, we run single equation panel regressions and Fama-MacBeth regressions with our data to confirm the findings in the existing literature. Our empirical findings are consistent with the prior studies using single equation regressions. Then Section III employs quantile regressions on the same data and finds considerably different results from single-equation conditional-mean regressions. Section IV provides robustness checks of the main results and section V concludes the paper. The Appendix provides a detailed discussion of quantile regression.

Data collection, variable construction, and summary statistics
We collect data from the CRSP mutual fund database. Our sample period covers 1991 to 2012. We focus on U.S. equity funds and exclude index or index-based funds. 3 We also eliminate sector funds and funds with non-traditional investment objectives, such as long/short equity funds and absolute return funds. We use the CRSP fund objective codes to classify funds as growth, income, balanced, large cap, midcap, small cap, and microcap funds.
To minimize the potential for 'incubation' bias (Evans, 2010), we eliminate funds with TNA value less than $5 million. Further, we exclude funds with an insufficient number of monthly returns to estimate 4-factor models (see estimation procedure below). For funds with multiple classes, we aggregate data to the fund level by taking TNA-weighted averages, similar to the approach in French (2008) and Fama and French (2010).
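As an illustration of the share-class aggregation and size screen described above, here is a minimal sketch. The function names and the example fund are hypothetical. Summing TNA across classes and applying the $5 million screen at the fund level are assumptions of this sketch, since the text does not spell out either detail.

```python
def aggregate_share_classes(classes):
    """Collapse one fund's share classes to fund level.

    `classes` is a list of (tna, value) pairs, where `value` is any
    per-class characteristic such as the expense ratio or a monthly return.
    Returns (total_tna, tna_weighted_value). Summing TNA across classes
    is an assumption of this sketch.
    """
    total_tna = sum(tna for tna, _ in classes)
    if total_tna <= 0:
        raise ValueError("total TNA must be positive")
    weighted = sum(tna * value for tna, value in classes) / total_tna
    return total_tna, weighted


def passes_size_screen(total_tna_millions, minimum=5.0):
    """Keep funds with at least $5 million TNA (fund-level screen assumed)."""
    return total_tna_millions >= minimum


if __name__ == "__main__":
    # Hypothetical fund: retail class ($300M, 1.20% expense ratio)
    # and institutional class ($700M, 0.70% expense ratio).
    tna, expense = aggregate_share_classes([(300.0, 1.20), (700.0, 0.70)])
    print(tna, round(expense, 2), passes_size_screen(tna))
```

The same helper can be applied month by month to class-level returns to obtain a fund-level return series.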

A. 4-factor Model Estimation
At the beginning of every calendar year, we estimate the Carhart (1997) 4-factor model for each fund based on the fund's previous 36 monthly returns as follows:

$$R_{i,t} - R_{f,t} = \alpha_i + \beta_{i,RMRF}\,RMRF_t + \beta_{i,SMB}\,SMB_t + \beta_{i,HML}\,HML_t + \beta_{i,MOM}\,MOM_t + \varepsilon_{i,t} \qquad (1)$$

where $R_{i,t} - R_{f,t}$ is the return of fund $i$ in month $t$ in excess of the risk-free rate. RMRF denotes the excess return of the market portfolio over the risk-free rate. SMB is the return difference between small and large capitalization stocks. HML is the return difference between high and low book-to-market stocks, and MOM is the return difference between stocks with high and low past returns. The time series of the 4 monthly factors are downloaded from the website of Kenneth French. 4 Following Carhart (1997), we exclude funds with fewer than 30 valid prior months of return data.
The R-squared of regression model (1), or the percent of mutual fund returns explained by the 4-factor model, is used by Amihud and Goyenko (2013) to measure the degree of fund active management. The idea is simple. Mutual funds with low R-squared are defined as actively managed since their performance deviates significantly from the 4-factor model. While other methodologies have been used to measure active management, the Amihud and Goyenko (2013) approach avoids the need to obtain and analyze mutual fund holdings. Thus, we use the R-squared to measure the degree of active management. 5
The regression intercept, $\hat{\alpha}_i$, measures the average monthly risk-adjusted return over the estimation period. We annualize it by multiplying by 12 and call it the Lagged Alpha because it measures the fund's past risk-adjusted performance during the estimation period.
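Next, we estimate the 12 monthly Fitted Alphas for each fund in a given calendar year as follows:

$$\text{Fitted Alpha}_{i,t} = (R_{i,t} - R_{f,t}) - \left(\hat{\beta}_{i,RMRF}\,RMRF_t + \hat{\beta}_{i,SMB}\,SMB_t + \hat{\beta}_{i,HML}\,HML_t + \hat{\beta}_{i,MOM}\,MOM_t\right) \qquad (2)$$

where $R_{i,t} - R_{f,t}$ is the fund's excess return in month $t$. The 4-factor model loadings are estimated from the regression model in Eq. (1) based on the fund's prior 36 monthly return data. The 12 monthly Fitted Alphas are compounded to arrive at the annualized Fitted Alpha. We repeat the process for each fund for every calendar year with non-missing monthly return data.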
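The alpha definitions in this subsection can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, returns are assumed to be decimals (0.01 for 1%), the fund return is assumed to be in excess of the risk-free rate, and the sample standard deviation is assumed for the volatility measure.

```python
import statistics


def lagged_alpha(monthly_intercept):
    """Annualize the 4-factor regression intercept by multiplying by 12."""
    return 12.0 * monthly_intercept


def monthly_fitted_alpha(excess_return, factor_returns, betas):
    """Fund return minus the return predicted by the estimated loadings.

    `factor_returns` and `betas` are ordered (RMRF, SMB, HML, MOM).
    """
    predicted = sum(b * f for b, f in zip(betas, factor_returns))
    return excess_return - predicted


def annual_fitted_alpha(monthly_alphas):
    """Compound the monthly Fitted Alphas of one calendar year."""
    growth = 1.0
    for a in monthly_alphas:
        growth *= 1.0 + a
    return growth - 1.0


def performance_volatility(monthly_alphas):
    """Standard deviation of monthly alphas within one calendar year
    (sample standard deviation assumed)."""
    return statistics.stdev(monthly_alphas)


if __name__ == "__main__":
    # Hypothetical month: 2% excess fund return, 1% market excess return,
    # zero returns on the other three factors.
    a = monthly_fitted_alpha(0.02, (0.01, 0.0, 0.0, 0.0), (1.0, 0.5, 0.2, 0.1))
    print(a, annual_fitted_alpha([a] * 12), lagged_alpha(0.001))
```

Compounding rather than summing matters little when monthly alphas are small, but it keeps the annual figure consistent with how annual fund returns are built from monthly returns.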
Following Amihud and Goyenko (2013), we eliminate observations with R-squared at the two extreme tails (greater than 99.5% and less than 0.5%). We also truncate the sample at the two extreme tails of the Fitted Alpha and of the turnover ratio (removing observations greater than 99.5% and less than 0.5%). 6 The final sample includes 27,276 fund-year observations and 5084 unique US domestic equity mutual funds.
In addition to the 4-factor Fitted Alpha, we also use benchmark-adjusted returns as an alternative measure of performance. 7 The benchmark return is the value-weighted average return of all mutual funds in the CRSP database with the same CRSP investment objective. 8 The difference between the fund's net-of-fee return and the benchmark return is the benchmark-adjusted return and we call it Benchmark Alpha. 9 Finally we compound the monthly Benchmark Alpha to annual Benchmark Alpha.
To measure the fund risk-adjusted performance volatility, we calculate the standard deviations of the monthly Fitted Alpha and monthly Benchmark Alpha for a given fund in a given calendar year.
Panel B reports the Pearson correlations of the variables. First, note that the expense ratios and turnover ratios have significantly negative correlations with the two measures of performance, the Fitted Alpha and the Benchmark Alpha, consistent with findings in prior research that these two ratios have a negative impact on fund performance. Second, similar to Amihud and Goyenko (2013), the R-squared has a significantly negative correlation with Lagged Alpha, suggesting lower R-squared funds, on average, have better risk-adjusted returns. R-squared is also negatively correlated with Fitted Alpha and Benchmark Alpha, though the correlation between R-squared and Fitted Alpha is not statistically significant. 13 Interestingly, the three fund characteristics are also significantly correlated with the volatility of monthly fund performance. The significantly positive correlations between Expense Ratio, Turnover Ratio and the Standard Deviations of monthly performance suggest that funds with higher expense and turnover ratios have higher performance variability. Similarly, R-squared is significantly negatively correlated with the performance volatility measure, indicating that actively managed funds (funds with lower R-squared) have higher performance variability.
5 In a robustness check reported in Section IV, we use the Active Share of Cremers and Petajisto (2009) as an alternative measure of fund active management. The main results are qualitatively similar.
6 As argued by Amihud and Goyenko (2013), funds with R-squared close to 1 are effectively 'closet index' funds, while extremely low R-squared might be the result of estimation error. Our main results are qualitatively the same if we winsorize or keep the outliers.
7 Part of the reason to use benchmark-adjusted return is to address a potential mechanical relation between R-squared and the volatility of Fitted Alphas. For low R-squared funds, the four risk factors (RMRF, SMB, HML and MOM) explain a smaller portion of the fund's past return variation. As a result, when we use the factor loadings to estimate the Fitted Alphas, the estimation precision is likely to be lower, or the variance of the estimated Fitted Alpha is likely to be higher for the low R-squared funds. Benchmark-adjusted returns do not suffer from the potential problem of estimation errors.
8 Using equal-weighted average returns as benchmark generates essentially the same empirical results.
9 Jain and Wu (2000) use a similar procedure to calculate the benchmark-adjusted returns.
In addition, R-squared has significantly negative correlations with both the expense ratio and the turnover ratio. This suggests that more actively managed funds (i.e., funds with lower R-squared) have higher expense and turnover ratios, signifying that active management is more expensive and requires more trading either to time the market or to take advantage of perceived market inefficiencies. In addition, the expense and turnover ratios are significantly positively correlated, indicating some commonality between funds with high expense ratios and funds with high turnover ratios.
Finally, the fund size, as measured by the log of TNA, is positively correlated with R-squared and negatively correlated with the expense ratio and the turnover ratio, indicating that larger mutual funds are less actively managed and, likely as a result, charge lower expense ratios and trade less. 14 These patterns are consistent with findings in Amihud and Goyenko (2013).

Fund performance volatility
While there is an extensive literature on the determinants of mutual fund performance, no study explicitly focuses on the volatility in fund performance and how fund characteristics, such as the expense ratio and the turnover ratio, affect performance volatility. Various fund characteristics reflect fund investment styles and risk-taking approaches, which in turn can be expected to have an impact on fund performance volatility. Indeed, the results in Table 1 suggest that funds might differ significantly in their performance variability. In this section, we first develop our hypotheses on the relations between fund characteristics and performance volatility. Subsequently we test these hypotheses empirically.

Table 1 notes: [...] Carhart (1997) 4-factor model. R-squared is the 4-factor regression R-squared. The Lagged Alpha is the monthly 4-factor regression intercept multiplied by 12. The Fitted Alpha is the annualized (compounded) monthly fund risk-adjusted return, or the difference between the fund's monthly return and the fund's predicted return estimated by multiplying the monthly factor returns by the estimated factor betas. The Benchmark Alpha is the annualized (compounded) monthly difference between the fund's net return and the value-weighted average return of all funds in the CRSP database with the same investment objective. The sample period covers 1991-2012 with 27,276 fund-year observations and 5084 unique US domestic equity funds of $5 million TNA or greater.
14 The direction of causation is not clear. Funds with low expense ratios and low turnover may attract a large inflow of funds from sophisticated investors.
M. Livingston, et al. Journal of Economics and Business xxx (xxxx) xxx-xxx

A. Hypotheses on Fund Characteristics and Performance Volatility
Two explanations for high expense ratios have been proposed in the literature. While the two hypotheses have opposite predictions on the relation between the expense ratio and fund performance, they have similar implications for the impact of the expense ratio on performance volatility. First, as expense ratios pay for portfolio management services provided by mutual funds, high expense ratios should be justified by superior investment services, namely higher risk-adjusted returns. To achieve higher performance, skillful fund managers deviate from the index and concentrate the portfolio on undervalued stocks, industries, or sectors. For example, Kacperczyk, Sialm, and Zheng (2005) find a significantly positive correlation between the expense ratio and industry concentration of fund holdings. A consequence of stock/sector picking is a lack of diversification and an increase in the idiosyncratic risk, leading to higher variability of risk-adjusted performance.
While this hypothesis implies a positive relation between expense ratio and fund performance, most empirical studies find the opposite: high expense funds underperform their low expense brethren. Several empirical and theoretical studies argue that investors in high expense funds are unsophisticated, have low sensitivity to poor performance, and trust their fund managers (Christoffersen & Musto, 2002; Gil-Bazo & Ruiz-Verdu, 2009; Gennaioli, Shleifer, & Vishny, 2015). As a result, fund managers are able to strategically charge higher expense ratios. In addition, Gil-Bazo and Ruiz-Verdu (2009) find that better fund governance can mitigate the negative relation between expense ratios and fund performance, suggesting high expense funds are weak in governance. Further, due to non-linear compensation structures, there is an incentive for high expense funds to take excessive risk (Brown, Harlow, & Starks, 1996). If risks pay off, the resulting high performance attracts fund inflows and managers are better compensated. If risks backfire, fund outflows will be minimal due to the low sensitivity of fund investors to weak performance. Indeed, Huang, Sialm, and Zhang (2011) find that high expense funds are more likely to engage in risk-shifting activities. Consequently, this hypothesis about high expense ratios also predicts a positive relation between the expense ratio and performance volatility. Based on the two main explanations for high expense ratios, we propose our first hypothesis:
H1. High expense funds have higher volatility in performance than low expense funds.
It is well established in the literature that trading volume is positively related to asset price and return volatility (Kandel & Pearson, 1995; Karpoff, 1987). Two theories have been proposed to explain this pattern: differences-of-opinion and overconfidence. The differences-of-opinion hypothesis posits that investors have substantially different interpretations of public information which lead to higher trading volume and larger asset price/return volatility (Harris & Raviv, 1993; Kandel & Pearson, 1995). The overconfidence hypothesis further argues that the large difference of opinion is likely a result of investors' and traders' overconfidence in the accuracy and precision of their information (Odean, 1998). Several empirical studies find overconfidence leads to higher trading volume, lower risk-adjusted returns and higher return volatility (Barber & Odean, 2000, 2001). Thus, to the extent that high fund turnover ratios are a symptom of fund managers' overconfidence and preference for assets with a high degree of differences-of-opinion, high turnover should be accompanied by more volatile performance, and we propose our second hypothesis:
H2. High turnover funds have higher volatility in performance than low turnover funds.
By definition, actively managed funds pursue investment and trading strategies that deviate from benchmarks (Cremers & Petajisto, 2009). Consequently, actively managed funds are likely to take more idiosyncratic risk. Indeed, following Amihud and Goyenko (2013), we use the OLS R-squared of the Carhart 4-factor regression models to measure active management. As argued by Amihud and Goyenko (2013), one minus R-squared is the 'proportion of the fund's variance that is due to idiosyncratic risk.' Thus, our third hypothesis is as follows:
H3. Actively managed funds have higher volatility in performance.

B. Empirical Examination of Performance Volatility
Using both panel regressions and Fama-MacBeth regressions, we test the three hypotheses on fund performance volatility. Table 2 presents the empirical results. The dependent variables are either the Standard Deviations of Monthly Fitted Alpha or Standard Deviations of Monthly Benchmark Alpha. The test variables include the Expense Ratio, Turnover Ratio, and R-squared. 15 Several control variables are included in the regression. Lagged Log of TNA is the natural log of the year-end TNA for the previous calendar year. Lagged Log of Family TNA is the natural log of the year-end fund family TNA for the previous calendar year. Lagged Volatility is the corresponding performance volatility of the funds in the previous calendar year. In addition, the regression includes fund style dummies. Year dummies are included in the panel regression to capture the year fixed effects. The first (last) two columns report the results from panel regressions (Fama-MacBeth regressions). Given the panel data, there is a concern about potential autocorrelation in residuals. To correct for potential autocorrelation and heteroskedasticity, we use the Newey-West standard errors with five lags (Newey & West, 1987). 16
First, note that the coefficient on Lagged Volatility is positive and highly significant in all regression models, indicating strong persistence in performance volatility. Second, consistent with our hypotheses, fund characteristics are significantly related to performance volatility. The coefficients on Expense Ratio range from 0.11 to 0.17 and are highly significant in both the panel and Fama-MacBeth regressions, suggesting a one-standard-deviation increase in the expense ratio (0.45%) increases the standard deviation of monthly risk-adjusted performance by 0.05% (0.45%*0.11). The coefficients on the Turnover Ratio are also significantly positive in both the panel and Fama-MacBeth regressions, supporting Hypothesis 2 that high turnover funds have higher volatility in risk-adjusted performance. A one-standard-deviation increase in turnover ratio (64%) increases the standard deviation of monthly risk-adjusted performance by about 0.06% (64%*0.001). The coefficient on R-squared is about -1 and highly significant. A one-standard-deviation decrease in R-squared (−0.078), or increase in active management, increases the standard deviation of monthly risk-adjusted performance by about 0.08%. Funds belonging to a large fund family have relatively lower performance volatility although the size of the fund itself does not significantly affect performance variability.
15 Following Amihud and Goyenko (2013), we also take the logistic transformation of the R-squared as an alternative test variable because the R-squared value is heavily skewed toward values between 0.9 and 1, its upper bound. The empirical results are qualitatively the same.
16 The empirical results are robust to different numbers of lags used.
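The economic magnitudes quoted for Table 2 follow from multiplying each one-standard-deviation change by its coefficient. The short sketch below simply reproduces that arithmetic with the figures given in the text. The variable names are illustrative, and the R-squared coefficient is taken as exactly -1 although the text reports it only as "about -1".

```python
# (one-standard-deviation change, regression coefficient) as quoted in the text
inputs = {
    "Expense Ratio": (0.45, 0.11),    # +0.45 percentage points, lowest reported coefficient
    "Turnover Ratio": (64.0, 0.001),  # +64 percentage points
    "R-squared": (-0.078, -1.0),      # a 0.078 decrease; coefficient "about -1"
}


def volatility_effect(sd_change, coefficient):
    """Implied change in the standard deviation of monthly alpha, in %."""
    return sd_change * coefficient


effects = {name: volatility_effect(*pair) for name, pair in inputs.items()}

if __name__ == "__main__":
    for name, effect in effects.items():
        print(f"{name}: {effect:.4f}%")
```

The three effects are of similar size, which is why none of the three characteristics can be singled out as the dominant driver of monthly performance volatility on these figures alone.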
The results in Table 2 suggest that fund characteristics impact monthly performance variability. The intertemporal volatility in monthly performance might be smoothed out over time. Next, we further check the cross-sectional annual performance variability by sorting funds into quintiles based on fund characteristics. First, we sort the sample into quintiles by expense ratios with quintile 1 having the lowest expense ratios and quintile 5 having the highest expense ratios. Fig. 2 is a box plot of the annual Fitted Alpha on expense ratio quintiles. The whiskers extend 1.5 interquartile ranges (IQR) beyond the lower and upper quartiles. 17 The standard deviation and interquartile range of annual Fitted Alphas for the lowest expense ratio quintile are 6.35% and 6.09% respectively. They increase monotonically for funds in higher expense ratio quintiles, reaching 9.01% and 9.40% for the highest expense ratio quintile. The evidence shows high cross-sectional variation in annual fund performance for high expense ratio funds. 18 The conditional distribution of annual Fitted Alphas (conditional on expense ratio) is not constant: the variance of the annual Fitted Alphas is much larger for high expense funds.
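The quintile comparison can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, ties in the sorting variable are broken by input order, and the quartiles use the default (exclusive) method of Python's `statistics.quantiles`, which may differ from the convention behind the box plots.

```python
import statistics


def quintile_labels(values):
    """Assign each observation to quintile 1..5 by rank of `values`."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1
    return labels


def dispersion_by_quintile(sort_var, alpha):
    """Return {quintile: (standard deviation, interquartile range)} of
    `alpha` within each quintile of `sort_var`."""
    labels = quintile_labels(sort_var)
    result = {}
    for q in range(1, 6):
        group = [a for a, lab in zip(alpha, labels) if lab == q]
        q1, _, q3 = statistics.quantiles(group, n=4)
        result[q] = (statistics.stdev(group), q3 - q1)
    return result


if __name__ == "__main__":
    # Hypothetical data: 20 funds sorted by expense ratio, with alphas
    # whose spread widens from one quintile to the next.
    expense = [0.1 * (i + 1) for i in range(20)]
    alpha = [((-1) ** i) * (i // 4 + 1) for i in range(20)]
    for q, (sd, iqr) in dispersion_by_quintile(expense, alpha).items():
        print(q, round(sd, 2), round(iqr, 2))
```

Reporting both the standard deviation and the interquartile range guards against a few extreme alphas driving the comparison, since the interquartile range is insensitive to the tails.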
We make similar box plots of annual Fitted Alpha against quintiles of turnover ratios, R-squared, and fund size in Figs. 3-5. Figs. 3 and 4 indicate a similar pattern of performance heteroscedasticity with respect to fund turnover ratios and R-squared. Funds in the lower turnover ratio (higher R-squared) quintiles have much smaller standard deviations and interquartile ranges of Fitted Alpha, and the standard deviations and interquartile ranges increase (decrease) with the turnover ratio (R-squared) quintiles. In contrast, Fig. 5 shows that fund performance is relatively homoscedastic with respect to fund size. The standard deviations and interquartile ranges of Fitted Alpha do not differ greatly between various quintiles of fund TNA. The findings are consistent with the results in Table 2 that high expense ratio, high turnover ratio and more actively managed funds tend to have more volatile risk-adjusted returns, while fund size does not significantly affect performance variability.
Table 2 notes: This table presents regression analyses of the volatility in monthly fund performance. The first two columns present the panel regression results and the last two columns contain the results from Fama-MacBeth regressions. In each set of regressions, two different dependent variables are used: the standard deviation of monthly Fitted Alpha in a given calendar year and the standard deviation of the monthly Benchmark Alpha in a given calendar year (both in percentage). R-squared is the R-squared of the 4-factor regression of the fund's prior 36 monthly returns. The Expense Ratio is the fund's annual expense ratio in percentage and Turnover Ratio is the annual turnover ratio in percentage. Lagged Log (Fund TNA) is the natural log of the fund's total net asset value at the end of the prior year. Lagged Log (Family TNA) is the natural log of the fund family's total net asset value at the end of prior year. Lagged Volatility is the monthly fund performance volatility in the previous year. The first two columns report the coefficient estimates from panel regressions. The numbers in parentheses are Newey-West t-statistics (Newey & West, 1987).

Fund characteristics and performance
As shown in Section II, fund performance exhibits heteroscedasticity, a major problem for conditional-mean models of fund performance employed by prior studies. Although methodologies have been developed to allow for heteroscedastic errors, traditional one-equation, conditional-mean regression only models the conditional mean of the dependent variable as a function of the independent variables. Consequently, the conditional-mean models cannot reveal the impact of the independent variables on the noncentral locations of the dependent variable when the homoscedasticity or normality assumption is violated. 19 In this section, we use quantile regression to re-examine the relationship between fund characteristics and performance. We first use the conditional-mean panel regressions and Fama-MacBeth regressions to analyze the relationship between fund characteristics and performance. The purpose is to show that our data generate similar empirical results as documented in previous studies which employ conditional-mean regression models. Then, we use quantile regression to analyze the same issue and show the impacts of the fund characteristics on performance differ significantly between high-performing and poor-performing funds.

A. Conditional-mean Regression Analysis
Two measures of fund performance are used as dependent variables: the annual Fitted Alpha and the annual Benchmark Alpha, which were defined in Section I.A. Panel A (panel B) of Table 3 reports the results based on annual Fitted Alpha (Benchmark Alpha). The explanatory variables include fund characteristics and fund style dummies. Year dummies are included in the panel regression to capture the year fixed effects. In addition, previous studies document persistence in fund performance (Brown & Goetzmann, 1995; Gruber, 1996). To control for performance persistence, we include Lagged Alpha as an explanatory variable, which is the annualized intercept of the 4-factor model regression of the fund's previous 36 monthly returns as in Eq. (1).
The first column of Table 3 reports the coefficient estimates from the panel regressions. The coefficients on Expense Ratio are significantly negative, consistent with findings in many prior studies that high expense ratios negatively affect fund risk-adjusted returns (Carhart, 1997; Malkiel, 1995). The coefficients on Turnover Ratios are also significantly negative, indicating that higher fund trading volume reduces fund performance. Previous literature has mixed findings on the impact of Turnover Ratios on fund performance. 20 The coefficient on R-squared is −8.372 (−7.245) and highly significant when the dependent variable is the annual Fitted Alpha (Benchmark Alpha), suggesting that more actively managed funds (or funds with low R-squared) have, on average, better risk-adjusted returns. A one-standard-deviation decrease in R-squared increases the Fitted Alpha, on average, by 0.65% (8.372*0.078). The coefficient on the natural logarithm of the lagged TNA is significantly negative and the coefficient on the natural logarithm of the lagged fund family TNA is significantly positive, consistent with the findings of Chen, Hong, Huang, and Kubik (2004). The coefficients on Lagged Alpha are either insignificant or negative, indicating no persistence in fund performance.
The second column of Table 3 reports the coefficient estimates from the Fama-MacBeth regressions. The coefficient estimates are remarkably similar to those from the panel regressions. In addition, the coefficient on R-squared in the Fitted Alpha regression is −8.10, very close to the −8.21 in Amihud and Goyenko (2013).
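The two-step logic of a Fama-MacBeth regression can be illustrated with a minimal sketch. It assumes a single regressor and simulated data: the panel, the true slope of −8 and the noise level are hypothetical choices made for illustration, not the paper's sample or specification.

```python
import math
import random
import statistics


def ols_slope(x, y):
    """Slope of a one-regressor OLS fit with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def fama_macbeth(panel):
    """panel maps year -> (x values, y values) for that year's cross-section.

    Step 1: estimate a separate cross-sectional slope for every year.
    Step 2: average the yearly slopes; the t-statistic uses the
            time-series standard error of those yearly slopes.
    """
    slopes = [ols_slope(x, y) for x, y in panel.values()]
    mean_slope = statistics.fmean(slopes)
    std_err = statistics.stdev(slopes) / math.sqrt(len(slopes))
    return mean_slope, mean_slope / std_err


# Hypothetical data: 22 yearly cross-sections of 300 funds each, with a
# true slope of -8 on R-squared (illustrative values only).
random.seed(0)
panel = {}
for year in range(1991, 2013):
    r2 = [random.uniform(0.6, 1.0) for _ in range(300)]
    alpha = [-8.0 * r + random.gauss(0.0, 3.0) for r in r2]
    panel[year] = (r2, alpha)

fm_slope, fm_t = fama_macbeth(panel)
print(f"Fama-MacBeth slope: {fm_slope:.2f}, t-statistic: {fm_t:.2f}")
```

Because each year is estimated separately, the averaged slope reflects cross-sectional variation only, which is one reason its agreement with the pooled panel estimate is informative.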
Overall, the empirical results from the OLS-based regression analysis are consistent with findings in existing literature that fund characteristics such as the expense ratio, the turnover ratio, and the degree of active management have significant impacts on fund performance.

Quantile Regression Analysis
While confirming findings in existing literature, the conditional-mean regression approach in the previous section assumes that a) a single equation regression model fully describes the data, b) the dependent variable follows a conditional normal distribution, and c) the dependent variable is homoscedastic. As shown in Section II, fund performance exhibits heteroscedasticity. Furthermore, White's test and the Breusch-Pagan test for heteroscedasticity in the panel regressions both reject the null hypothesis of homoscedasticity at the 1% significance level.
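As an illustration of what such a test measures, the following is a minimal sketch of the studentized (Koenker) variant of the Breusch-Pagan test with a single regressor. The simulated data and the one-regressor setup are assumptions made for brevity; the tests reported above are run on the full panel regressions.

```python
import random
import statistics


def ols_fit(x, y):
    """Return (intercept, slope) of a one-regressor OLS fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """R-squared of the one-regressor OLS fit of y on x."""
    a, b = ols_fit(x, y)
    my = statistics.fmean(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def studentized_bp_lm(x, y):
    """LM statistic = n * R^2 from regressing squared OLS residuals on x."""
    a, b = ols_fit(x, y)
    sq_resid = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, sq_resid)


CHI2_1DF_1PCT = 6.635  # 1% critical value of chi-square with 1 degree of freedom

# Hypothetical data: one series with constant spread, one whose spread grows with x.
random.seed(1)
x = [random.uniform(0.0, 10.0) for _ in range(1000)]
y_homo = [xi + random.gauss(0.0, 5.0) for xi in x]
y_hetero = [xi + random.gauss(0.0, xi) for xi in x]

lm_homo = studentized_bp_lm(x, y_homo)
lm_hetero = studentized_bp_lm(x, y_hetero)
print(f"LM (homoscedastic): {lm_homo:.2f}")
print(f"LM (heteroscedastic): {lm_hetero:.2f}, 1% critical value: {CHI2_1DF_1PCT}")
```

An LM statistic above the critical value rejects homoscedasticity: the squared residuals are systematically related to the regressor.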
In this section, we use quantile regression to re-examine the relationship between fund characteristics and performance. As mentioned earlier, quantile regressions model the impact of the explanatory variable on the full distribution of the dependent variable by fitting a linear function for each conditional quantile of the dependent variable. Thus, quantile regressions provide a more comprehensive analysis of the relation between explanatory and dependent variables, particularly in the presence of heteroscedasticity. 21 Appendix A provides an overview of the methodology.
Columns 3-7 of panel A (panel B), Table 3, report the quantile regression results for the conditional 10th, 25th, 50th, 75th and 90th percentiles of the Fitted Alpha (Benchmark Alpha). The empirical results allow us to examine the impact of each of the explanatory variables on mutual fund performance for low, medium, and high-performing funds. Although Table 3 reports results for five conditional quantiles, they are consistent with the full range of quantiles reported below in Figs. 6 and 7.
The coefficients for the Expense Ratio in both panels A and B of Table 3 are significantly negative for all the conditional percentiles of the dependent variable except the 90th percentile. This pattern shows that the expense ratio has an overall negative impact on fund performance, consistent with the results from the conditional-mean regression models and the findings in the earlier literature. Yet the impact of the Expense Ratio is not uniform at the different conditional percentiles of the dependent variable. Most of the coefficients at the conditional 10th, 25th, 75th and 90th percentiles are significantly different from the 50th percentile coefficient and also different from the coefficient from the panel regression. 22 Furthermore, the magnitude and significance levels of the coefficients decrease at the higher conditional percentiles of fund performance. While the coefficient on the Expense Ratio at the conditional 90th percentile of fund performance is still negative, it is no longer significantly different from zero.
19 Under the ideal condition that the dependent variable is identically and normally distributed, the classical conditional-mean models describe, completely and parsimoniously, the relationship between the independent variables and the distribution of the dependent variable.
20 For example, Carhart (1997) finds a negative impact of turnover on fund performance. Other studies, however, document that funds with higher turnover have better stock selection skills (Grinblatt & Titman, 1993; Chen, Jegadeesh, & Wermers, 2000; Wermers, 2000). Further, Pástor, Stambaugh, and Taylor (2017) find a positive time-series turnover-performance relation.
21 It is important to note that quantile regression is not the same as running individual regressions on subsamples based on unconditional quantiles of the dependent or independent variables. Hallock et al. (2010) provide an excellent detailed discussion of this possible misconception about quantile regression.

Table 3 Annual Fund Performance Analyses. This table presents regression analyses of mutual fund risk-adjusted returns. In panel A, the dependent variable is the Fitted Alpha (in percentage). In panel B, the dependent variable is the Benchmark Alpha (in percentage). The first (second) column reports the results of panel (Fama-MacBeth) regressions. Columns 3-7 report the results of quantile regressions. The Expense Ratio is the fund's annual expense ratio in percentage and the Turnover Ratio is the annual turnover ratio in percentage. R-squared is the R-squared of the 4-factor regression of the fund's prior 36 monthly returns. Lagged Alpha is the 4-factor regression intercept multiplied by 12. Lagged Log (Fund TNA) is the natural log of the fund's total net asset value at the end of the prior year. Lagged Log (Family TNA) is the natural log of the fund family's total net asset value at the end of the prior year. The numbers in parentheses are t-statistics. For the panel regressions, the numbers in parentheses are Newey-West t-statistics (Newey & West, 1987).
These findings suggest that the expense ratio has a much larger negative impact on the performance of poorly performing funds but a smaller negative effect when the fund is doing well. This finding provides further justification for not investing in high expense funds. As found in prior literature, high expense funds have, on average, lower performance than their low expense counterparts. In addition, as we documented in Section II, high expense funds also have much more volatile risk-adjusted returns, increasing the risk for their investors. Finally, the preceding finding indicates that high expense ratios inflict on fund investors a 'wrong-way' risk, that is, they greatly exacerbate the already poor performance suffered by investors.
The coefficients on the Turnover Ratio in Table 3 have a very similar pattern. All the quantile regression coefficients are negative, indicating that turnover generally has a negative impact on performance. These findings are consistent with the results from the conditional-mean regressions. In addition, the magnitude of the coefficients is bigger and the significance level is higher at the lower conditional percentiles of fund performance, but smaller and lower at the higher conditional percentiles. At the 90th percentile, the coefficient is not significantly different from zero.
22 To test the significance of the difference between two coefficients, we compare the 95% confidence intervals of the two coefficients. If the two confidence intervals overlap, then the difference in the coefficients is not significant.
M. Livingston, et al. Journal of Economics and Business xxx (xxxx) xxx-xxx
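The interval-comparison rule described in footnote 22 can be sketched in a few lines. The coefficient values and standard errors below are hypothetical, and the normal critical value of 1.96 is an assumption about how the 95% intervals are formed.

```python
def confidence_interval(coef, std_err, z=1.96):
    """Two-sided confidence interval; z = 1.96 gives roughly 95% coverage."""
    return coef - z * std_err, coef + z * std_err


def intervals_overlap(ci_a, ci_b):
    """True when the two closed intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def significantly_different(coef_a, se_a, coef_b, se_b):
    """Footnote-22 rule: the difference is significant only if the intervals do not overlap."""
    return not intervals_overlap(
        confidence_interval(coef_a, se_a), confidence_interval(coef_b, se_b)
    )


# Hypothetical coefficients (not taken from Table 3).
low_vs_median = significantly_different(-1.50, 0.10, -0.60, 0.15)
high_vs_median = significantly_different(-0.45, 0.20, -0.60, 0.15)
print(low_vs_median, high_vs_median)
```

Requiring non-overlapping intervals is a conservative criterion: two coefficients can differ significantly under a direct test of their difference even when their individual intervals overlap slightly.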
As discussed in Section II, prior studies suggest that high turnover is likely a symptom of fund managers' overconfidence in their information and stock picking skills. If the managers' information turns out to be wrong and/or their investment strategies fail to work, the fund performance will be inferior. The more overconfident the managers are, the more they are likely to trade based on their poor information and failing strategies, leading to a significantly larger negative impact of turnover on already poor performance. On the other hand, if the fund managers' information is accurate and investment strategies are successful, the high trading volume based on the information and strategies helps to offset the generally negative impact of high trading on performance due to commissions, bid-ask spreads and market impact costs, resulting in minor or no drag on net risk-adjusted performance.
In both panels of Table 3, the coefficients on R-squared exhibit a startling pattern. They are significantly positive for the conditional 10th and 25th percentiles, turn significantly negative for the 50th percentile and become increasingly negative for higher percentiles. All the quantile regression coefficients on R-squared are significantly different from the coefficient in the panel regressions. In addition, the coefficients on R-squared for the conditional 10th, 25th, 75th and 90th percentiles are significantly different from the coefficients for the 50th percentile.
The interpretation of the results is as follows. First, the significantly negative coefficient on R-squared for the conditional 50th percentile regression indicates that a decrease in R-squared (or an increase in the degree of active management) increases the expected conditional median value of Fitted Alphas and Benchmark Alphas. This pattern is consistent with the results from the conditional-mean regressions that lower R-squared funds, on average, have better performance. Second, the much larger negative coefficients for the conditional 75th and 90th quantile regressions indicate that a shift to more active management increases the expected 75th and 90th percentile values of the risk-adjusted performance much more than it increases the expected median value, showing that more actively managed funds have higher upside potential. However, the significantly positive coefficients for the conditional 25th and 10th quantile regressions suggest that a shift to more active management decreases the expected 25th and 10th percentile values of fund performance, indicating that an active management style also has significant downside risk. Actively managed funds have enormous variability in performance. In a nutshell, when performance is bad, active funds do very badly. When performance is good, active funds do very well.
A possible economic explanation of the results is as follows. Mutual fund managers who believe that they have special information with the potential to earn higher risk-adjusted returns tend to concentrate more of their portfolio in these special information securities (Kacperczyk et al., 2005). The consequence of this investment strategy is a lower correlation with the performance of broad market indexes, i.e. lower R-squared. While the special information is not perfect, the risk-adjusted returns from exploiting the special information can be superior on average. However, since the information is imperfect, the risk-adjusted returns will be either favorable (if the information proves to be accurate and the fund's risk-adjusted performance is on the conditional 75th or 90th quantiles) or unfavorable (if the information is wrong and the fund's performance is on the conditional 10th or 25th quantiles). The result is greater dispersion of performance. 23
To illustrate how the degree of active management, as measured by R-squared, magnifies fund over- or underperformance, suppose two mutual funds have the same special information. The manager of the first fund allocates 50% of his investment to a passive index portfolio and the other half to the 'information' portfolio. The second fund invests 55% of its money in the 'information' portfolio and the rest in the index portfolio. The second fund is more actively managed, with a lower R-squared. If the special information proves to be accurate, both funds will outperform the passive index fund, but the more active fund will have larger over-performance. In other words, when funds outperform, as do those funds with Fitted Alphas or Benchmark Alphas in the 75th and 90th percentiles, the higher degree of active management further enhances fund performance. In contrast, if the funds' special information is wrong and the investment strategy fails, both funds will underperform but the more active fund will underperform more. The performance will be worse the more closely the fund follows the losing strategy.
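The two-fund illustration can be worked through numerically. The sketch below assumes, purely for illustration, that the 'information' portfolio beats or trails the index by 10 percentage points; only the 50% and 55% weights come from the text.

```python
def excess_over_index(info_weight, info_excess_return):
    """Fund return minus index return when `info_weight` of assets sits in the
    'information' portfolio and the remainder tracks the index."""
    return info_weight * info_excess_return


# Hypothetical payoffs of the 'information' portfolio relative to the index,
# in percentage points (illustrative numbers, not from the paper).
scenarios = {"information accurate": 10.0, "information wrong": -10.0}
funds = {"Fund 1 (50% information)": 0.50, "Fund 2 (55% information)": 0.55}

results = {
    (fund, scenario): excess_over_index(weight, payoff)
    for fund, weight in funds.items()
    for scenario, payoff in scenarios.items()
}
for (fund, scenario), excess in results.items():
    print(f"{fund}, {scenario}: {excess:+.1f} percentage points vs. index")
```

Under these assumed payoffs the more active fund gains 5.5 rather than 5.0 points when the information is accurate and loses 5.5 rather than 5.0 points when it is wrong, so the same bet widens both tails.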
While the coefficients on the Expense Ratio, the Turnover Ratio, and R-squared vary greatly from the top percentiles to the bottom percentiles of the conditional distribution of the dependent variable, the coefficients on the Lagged Log of Fund TNA and the Lagged Log of Family TNA in Table 3 are fairly uniform at the different percentiles of the dependent variable and they are not significantly different from the panel regression coefficients in column 1. The reason for the fairly uniform coefficients on these two variables is that the two dependent variables, Fitted Alpha and Benchmark Alpha, are relatively homogeneously distributed at different fund sizes and at different fund family sizes. Fig. 5 indicates that the standard deviation of the Fitted Alpha does not vary significantly between different quintiles of fund TNA. Consequently, the quantile regression coefficients are not statistically different from those of the conditional-mean regression.
23 Investors could potentially lower the performance volatility of individual funds by forming a portfolio of actively managed funds. To check if such a diversification strategy can result in second degree stochastic dominance by a portfolio of actively managed funds over a portfolio of least actively managed funds, we performed a simulation. At the beginning of each calendar year, we form 400 equal-weighted portfolios of 15 funds. For 200 portfolios, the 15 funds are randomly chosen, without replacement, from the lowest R-squared quintile. Another 200 portfolios contain funds randomly chosen from the highest R-squared quintile. Thus, for the 22-year sample period, we have a total of 8,800 portfolios. A comparison of the CDFs of the two groups of portfolios indicates no second degree stochastic dominance.
Figs. 6 and 7 contain four diagrams plotting the quantile regression coefficients on the Expense Ratio, the Turnover Ratio, R-squared and Lagged Log of Fund TNA for all quantiles of Fitted Alphas along with the 95% confidence intervals. As a comparison, the coefficients from the panel regression in Table 3 are also plotted in the diagrams. The diagrams confirm the patterns of the quantile regression coefficients observed in Table 3. In the left diagram of Fig. 6, the coefficients on the Expense Ratio are always negative, but the magnitude of the coefficients is smaller at the higher percentile regressions. A similar pattern shows up in the right diagram of the coefficients on the Turnover Ratio. In the left diagram of Fig. 7, the coefficients on R-squared are positive for the low quantiles of Fitted Alphas and become negative for the higher quantiles. The 95% confidence interval is quite tight for the coefficients on R-squared.
In contrast, the right diagram of Fig. 7 shows that the coefficients for the Lagged Log of Fund TNA are fairly constant at all conditional percentiles, consistent with the pattern observed in Table 3. In addition, the 95% confidence intervals of the quantile regression coefficients largely overlap with the 95% confidence intervals of the panel regression coefficient.

An Example of Active Fund Management
The empirical findings strongly suggest that actively managed funds have more volatile performance and that the higher performance variability is a direct consequence of pursuing an active management style. The case of Legg Mason's Value Trust Fund provides some anecdotal evidence. The fund was run by legendary investor Bill Miller until 2011 and had outperformed the S&P 500 index 15 years in a row from 1991 to 2005. According to news stories, the fund pursued an active management style (see Bruner & Carr, 2005, for a discussion of Bill Miller's investment strategies). The average R-squared for the Value Trust fund in our sample period is 0.90, putting it into the second lowest R-squared quintile.
The good fortune of Value Trust ran out in 2008. The fund's raw return of −55% far underperformed the S&P 500 index's decline of 37%. We estimate the fund had a 4-factor alpha of −13% for 2008. The fund's underperformance was largely due to its heavy investments in distressed financials, such as Countrywide Financial, Bear Stearns, Freddie Mac and AIG. The R-squared of the fund dipped to 0.856 between 2005 and 2007. While the strategy of heavy investment in distressed financials backfired in 2008, the same strategy had been very successful in the past. During the S&L crisis in the early 1990s, the Value Trust Fund loaded up on American Express, Freddie Mac and other struggling banks, resulting in a weighting of more than 40% in financials (Lauricella, 2005). Consistent with this, we estimate that the R-squared of the Value Trust Fund was 0.874 from 1993 to 1995. When the financial sector rebounded in 1996, the Fund's performance was a spectacular 38% compared to 15% for the S&P 500 Index.

Robustness checks
Cremers and Petajisto (2009) propose an alternative measure of fund active management, Active Share, and demonstrate that funds with the highest (lowest) degree of Active Share significantly outperform (underperform) their benchmarks. Active Share is defined as the degree of deviation of fund portfolio holdings from those of its benchmark. Index funds with portfolio holdings exactly replicating the index have an Active Share of 0 and funds whose holdings do not overlap with their benchmark have an Active Share of 1.
To check the robustness of our main findings to the alternative measure of Active Management, we perform the regression analysis of fund Fitted Alpha with Active Share as an independent variable for a subsample from 1991 to 2009. Data on Active Share are obtained from Petajisto's website. 24 Table 4 reports the regression results. For the panel regression, the coefficient on Active Share is significantly positive, consistent with the finding of Cremers and Petajisto (2009) that funds with a higher degree of Active Share outperform their counterparts with low Active Share. However, the positive impact of Active Share on performance is concentrated in better performing funds in the 75th and 90th percentiles of the performance distribution. For funds with Fitted Alpha in the 10th and 25th percentiles, a higher degree of Active Share has a negative impact on performance. These diametrically opposite impacts of Active Share on fund performance for high-performing versus low-performing mutual funds are consistent with the findings in Table 3, indicating that our main results are robust to the alternative measure of fund Active Management.
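For readers unfamiliar with Active Share, the following sketch computes it as half the sum of absolute differences between fund and benchmark portfolio weights, the formula given by Cremers and Petajisto (2009) but not spelled out in the text. The tickers and weights are hypothetical.

```python
def active_share(fund_weights, benchmark_weights):
    """Half the sum of absolute weight differences across all holdings.

    Both arguments map a security identifier to a portfolio weight, and
    each set of weights is assumed to sum to 1.
    """
    securities = set(fund_weights) | set(benchmark_weights)
    return 0.5 * sum(
        abs(fund_weights.get(s, 0.0) - benchmark_weights.get(s, 0.0))
        for s in securities
    )


# Hypothetical benchmark and three hypothetical funds.
benchmark = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}
index_fund = dict(benchmark)              # replicates the benchmark exactly
tilted_fund = {"AAA": 0.7, "BBB": 0.3}    # overweights AAA, drops CCC
disjoint_fund = {"DDD": 0.6, "EEE": 0.4}  # no overlap with the benchmark

for name, weights in (("index", index_fund), ("tilted", tilted_fund), ("disjoint", disjoint_fund)):
    print(f"{name}: Active Share = {active_share(weights, benchmark):.2f}")
```

The two extreme cases reproduce the bounds stated above: 0 for an exact index replica and 1 for a fund with no overlapping holdings.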
Our panel data spans 13 years with an average of 2098 fund observations per year. The panel and quantile regressions could capture both the cross-sectional and time-series performance volatilities. Further, serial correlation of fund performance over time is a potential concern. To address this concern and isolate cross-sectional performance variability from time-series volatility, we also run the quantile regressions for each sample year. The empirical results are consistent with those from the pooled data. While the coefficients vary from year to year, there is a clear pattern of generally positive coefficients on R-squared at the 10th and 25th percentiles and increasingly negative coefficients on R-squared from the 50th to the 90th percentiles. Patterns similar to those in the pooled sample are also observed for the coefficients on the Expense Ratio and Turnover Ratio from the annual quantile regressions. To conserve space, we do not present these results but they are available upon request.
We analyze annual fund performance in Table 3 rather than monthly performance because annual intervals are a more reasonable investment horizon for average investors. Consequently, we require funds in the sample to have a full year of monthly returns. This procedure might introduce potential survivorship bias due to the elimination of funds in the last several months of their lives. To address this concern, we repeat the quantile regressions with annualized monthly Fitted Alpha (monthly Fitted Alpha multiplied by 12) and annualized monthly Benchmark Alpha as the dependent variables. 25 The results are reported in Table 5.
24 http://petajisto.net/data.html. We are grateful to Dr. Petajisto for making the data publicly available.
For the monthly results, the same pattern is observed for the Expense Ratio and Turnover Ratio as for the annual analysis. The patterns of significantly negative coefficients on R-squared for higher conditional quantiles (75% and 90%) and significantly positive coefficients for lower conditional quantiles (10% and 25%) in Table 5 remain consistent with the results in Table 3. Thus, the main empirical results are not affected significantly by the elimination of the final monthly performances of funds that cease to exist during the sample period.
Interestingly, the coefficients for the conditional 50th percentile are almost identical to those in Table 3, regardless of whether annual or monthly performance is used as the dependent variable. The magnitudes of the coefficients for higher and lower conditional quantiles are much larger in Table 5. This suggests that the quantile regressions on the pooled monthly panel data also capture the significantly higher time-series volatility of monthly fund performance, which is smoothed out in the annual performance measure.
We use two measures of fund performance, Carhart 4-factor Fitted Alpha and benchmark-adjusted return. In computing benchmark-adjusted return, we rely on funds' declared investment objectives. However, there is a concern about benchmark misclassification (Sensoy, 2009). To address the concern, we use the minimum Active Share benchmarks by Cremers and Petajisto (2009) to calculate the benchmark-adjusted return. 26 The empirical results based on the alternative benchmark-adjusted returns are qualitatively similar. To conserve space, we do not present these results but they are available upon request.

Conclusion
We document great heterogeneity in mutual fund performance and show that fund characteristics, such as the expense ratio and the turnover ratio, affect not only the average fund performance but also the variability in performance. This heterogeneity in fund performance has been overlooked in prior research with conditional-mean based single-equation regression models of fund performance. Our study uses quantile regressions to re-evaluate the impact of fund characteristics on performance and provides a much richer and more nuanced description of the impact of fund characteristics on performance. First, while we confirm the finding of prior studies that actively managed funds have superior average performance, we demonstrate that previous research has overlooked the risks of active management. We find that superior average performance is accompanied by higher volatility in performance. The impact of active management on performance is asymmetric at the top and bottom tails of fund performance distributions. For those mutual funds that are performing well, a higher degree of active management results in even better performance. However, for poorly performing funds, more active management exacerbates the poor performance. Because actual fund performance in a specific time interval cannot be predicted with certainty, the greater variability in performance of actively managed funds increases the risk to investors. The practical implication is that actively managed funds are suitable for relatively more risk tolerant investors. This finding adds to a growing literature that casts doubt on the superiority of active fund management (see, for example, Frazzini et al., 2016).
Table 4 Annual Fund Performance Analyses - Active Share. This table presents regression analyses of mutual fund risk-adjusted returns with an alternative measure of Active Management. The dependent variable is the Fitted Alpha (in percentage). Columns 3 to 7 report the results of quantile regressions. The Expense Ratio is the fund's annual expense ratio in percentage and the Turnover Ratio is the annual turnover ratio in percentage. Active Share is defined as the degree of deviation of fund portfolio holdings from those of its benchmark, ranging from 0 for index funds to 1 for funds whose holdings do not overlap with their benchmark. Lagged Log (Fund TNA) is the natural log of the fund's total net asset value at the end of the prior year. Lagged Log (Family TNA) is the natural log of the fund family's total net asset value at the end of the prior year. The numbers in parentheses are t-statistics. For the panel regressions, the numbers in parentheses are Newey-West t-statistics (Newey & West, 1987).
0.17 0.11 n.a n.a n.a n.a n.a
No. of Obs. 10,454 10,454 10,454 10,454 10,454 10,454 10,454
ǂ Indicates the coefficient is significantly different from the corresponding panel regression coefficient.
♯ Indicates the coefficient is significantly different from the coefficient of the conditional 50% quantile regression.
25 We use the annualized monthly Fitted Alpha and Benchmark Alpha because most explanatory variables, such as Expense Ratio and Turnover Ratio, are reported on an annual basis. In addition, using the annualized monthly Fitted Alpha and Benchmark Alpha facilitates comparison with the results in Table 3.
26 Out of 19 benchmarks, the minimum Active Share benchmark is the one whose portfolio overlaps most with that of a particular mutual fund. We obtain the minimum Active Share benchmark for each mutual fund from the website of Dr. Petajisto.
Second, we find that the expense ratio not only reduces net risk-adjusted return on average, but significantly increases performance volatility. Furthermore, its negative impact is more pronounced for poorly performing funds. Third, we observe a similar pattern for the turnover ratio. High expense ratios and turnover ratios reduce fund performance and increase performance volatility, doubly adverse effects. In addition, negative impacts of turnover and expense ratios on performance are accentuated for the worst performing mutual funds. Since investors do not know in advance which funds will be good performers and which will be poor performers in specific time intervals, higher expense ratios and higher turnover ratios increase downside risk without any upside gain.
Table 5. This table presents quantile regression analyses of monthly fund risk-adjusted returns. In panel A, the dependent variable is the annualized monthly Fitted Alpha in percentage (monthly Fitted Alpha multiplied by 12). In panel B, the dependent variable is the annualized monthly Benchmark Alpha in percentage. The Expense Ratio is the fund's annual expense ratio in percentage and the Turnover Ratio is the annual turnover ratio in percentage. R-squared is the R-squared of the 4-factor regression of the fund's prior 36 monthly returns. Lagged Alpha is the 4-factor regression intercept multiplied by 12. Lagged Log(Fund TNA) is the natural log of the fund's total net asset value at the end of the prior month. Lagged Log(Family TNA) is the natural log of the fund family's total net asset value at the end of the prior month. The numbers in parentheses are t-statistics.

Appendix A. Quantile Regression
Quantile regression was developed by Koenker and Bassett (1978) and has been widely used in the social sciences (see Hao & Naiman, 2007, for a detailed discussion). It differs from the classical conditional-mean regression in two respects. First, it does not impose the strong assumptions of homogeneity and normality in the dependent variable, making it an ideal tool when the dependent variable is heteroscedastic and/or highly skewed. Second, rather than modeling the conditional mean of the dependent variable as in classical least squares regression, quantile regression models the different conditional quantiles (percentiles) of the dependent variable as functions of the independent variables. In other words, quantile regression fits multiple linear functions for different conditional quantiles of the dependent variable. This enables an estimation of the impact of the independent variables on the whole distribution, rather than only the conditional mean, of the dependent variable. 27
27 It is important to note that quantile regression is not the same as running individual regressions on subsamples based on unconditional quantiles of the dependent or independent variables. Hallock, Madalozzo, and Reck (2010) provide an excellent detailed discussion of this possible misconception about quantile regression.
In addition, the estimation technique of quantile regression is different from the least squares regression. Instead of minimizing the sum of squared deviations from the mean as in least squares regression, quantile regression uses linear programming to minimize the weighted sum of the absolute value of the deviations from a particular quantile.
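In symbols, the minimization just described can be written as follows. The notation ($y_i$ for the dependent variable, $x_i$ for the vector of independent variables, $\beta$ for the coefficients, $\rho_q$ for the weighting function) is our own shorthand for illustration and does not appear elsewhere in the paper.

```latex
\hat{\beta}(q) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_q\left(y_i - x_i^{\top}\beta\right),
\qquad
\rho_q(u) =
\begin{cases}
q\,|u| & \text{if } u \ge 0 \quad \text{(observation on or above the fitted value)},\\
(1-q)\,|u| & \text{if } u < 0 \quad \text{(observation below the fitted value)}.
\end{cases}
```

Setting $q = 0.5$ weights both sides equally and yields the median (least absolute deviations) regression.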
The basic approach of quantile regression is to fit a linear function for each conditional quantile, q, (or percentile) of the dependent variable so that the absolute values of deviations from the fitted values are weighted by (1 − q) for observations below the qth quantile of the dependent variable and by q for observations above the qth quantile. The linear function for the qth quantile minimizes the sum of the weighted absolute values of the deviations from the fitted value. Thus, to fit a linear function for low (high) percentiles, observations below (above) that percentile are weighted more heavily compared to observations above (below) that percentile. Suppose we consider the conditional 25th percentile. In the fitted function, the weighting is 0.75 for all observations below the conditional 25th percentile and 0.25 for all observations above the conditional 25th percentile of the dependent variable. On the other hand, for the conditional 90th percentile fitted function, observations below the 90th percentile are weighted by 0.10 and observations above the 90th percentile are weighted by 0.90.
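The weighting scheme can be made concrete with a small sketch. It fits a one-regressor quantile regression on a small simulated sample by brute force, relying on the known property that an optimal line can be chosen to pass through two data points; this exhaustive search is an illustrative substitute for the linear programming used in practice, and the data and sample size are hypothetical.

```python
import random
from itertools import combinations


def check_loss(residual, q):
    """Weight q on residuals at or above the fit, (1 - q) on residuals below it."""
    return q * residual if residual >= 0 else (q - 1.0) * residual


def quantile_line(x, y, q):
    """Exact one-regressor quantile regression for a small sample.

    Every line through two data points is tried, and the one with the
    smallest total check loss is returned as (intercept, slope).
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yk - intercept - slope * xk, q) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Hypothetical heteroscedastic sample: the spread of y grows with x.
random.seed(7)
x = [random.uniform(0.0, 10.0) for _ in range(120)]
y = [random.gauss(xi, xi) for xi in x]

_, slope_10 = quantile_line(x, y, 0.10)
_, slope_90 = quantile_line(x, y, 0.90)
print(f"10th percentile slope: {slope_10:.2f}, 90th percentile slope: {slope_90:.2f}")
```

With a spread that widens in x, the fitted 90th percentile line is steeper than the 10th percentile line, which is the kind of divergence a single conditional-mean slope cannot show.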
To demonstrate the advantages of quantile regression and illustrate its differences from the conditional-mean regression, we perform a simulation. Specifically, we create three cases and fit the OLS and quantile regressions for each case. In the first case, the dependent variable is homoscedastic and we will show that OLS and quantile regression generate the same results. In the other two cases, the dependent variable is heteroscedastic. We will demonstrate that OLS cannot distinguish the heteroscedastic cases from the homoscedastic one. Further, we will show the results from quantile regression are more informative and describe fully the relation between the dependent and independent variables.
First, we generate 1000 random numbers, X_i ~ U(0, 10), where i = 1 to 1000. X is the independent variable. Next, we generate three dependent variables, Y, H and L. All three dependent variables follow conditional normal distributions. Y_i is a random observation from a conditional normal distribution with mean X_i and standard deviation 5. Thus, Y_i ~ N(X_i, 5), where i = 1 to 1000. Y is homoscedastic, i.e., the conditional normal distributions have the same standard deviation. Panels A and B in Fig. A1 are the scatter plots of Y against X.
The second dependent variable, H, is heteroscedastic. H_i is a random variable drawn from a conditional normal distribution with mean X_i and standard deviation equal to (2.5 + X_i/2). Thus, H_i ~ N(X_i, 2.5 + X_i/2), where i = 1 to 1000. For high (low) values of the independent variable X, the standard deviation of H is larger (smaller). The standard deviation has a lower bound of 2.5 and an upper bound of 7.5. The scatter plots of H against X in panels C and D in Fig. A1 clearly demonstrate the heteroscedasticity of H.
The third dependent variable, L, is designed with very large heteroscedasticity. L_i is a random variable from a conditional normal distribution with mean X_i and standard deviation equal to X_i. Thus, L_i ~ N(X_i, X_i), where i = 1 to 1000. The standard deviation can be as large as 10 and as small as 0. The triangular shape of the scatter plots of L against X in panels E and F of Fig. A1 reflects the very small (large) standard deviations in L at low (high) values of X.
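The data-generating process for X, Y, H and L can be sketched as follows. This is an illustrative reconstruction from the description above, not the authors' code: the seed, the variable names, and the helper residual_sd are assumptions. The helper compares the dispersion of each dependent variable around its conditional mean at low and high values of X.

```python
import random
import statistics

random.seed(12345)  # arbitrary seed for reproducibility (an assumption)

N = 1000
x = [random.uniform(0, 10) for _ in range(N)]

# random.gauss(mu, sigma) takes a standard deviation, matching N(mean, sd) in the text.
dep_y = [random.gauss(xi, 5.0) for xi in x]           # homoscedastic: sd is always 5
dep_h = [random.gauss(xi, 2.5 + xi / 2) for xi in x]  # sd rises from 2.5 to 7.5
dep_l = [random.gauss(xi, xi) for xi in x]            # sd equals X_i


def residual_sd(x_values, dep_values, lo, hi):
    """Standard deviation of (dep - x) for observations with lo <= x <= hi."""
    deviations = [d - xi for xi, d in zip(x_values, dep_values) if lo <= xi <= hi]
    return statistics.pstdev(deviations)


# Dispersion around the conditional mean at low X (0 to 2) and high X (8 to 10).
dispersion = {
    name: (residual_sd(x, dep, 0, 2), residual_sd(x, dep, 8, 10))
    for name, dep in (("Y", dep_y), ("H", dep_h), ("L", dep_l))
}
for name, (low, high) in dispersion.items():
    print(name, round(low, 2), round(high, 2))
```

Under these assumptions, the two figures should be similar for Y and clearly farther apart for H and L, which is the pattern the scatter plots in Fig. A1 are described as showing.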
Note that, by design, the conditional normal distributions of the three dependent variables all have the value of the independent variable as the mean. Thus, we expect the conditional-mean regression model will result in a coefficient of 1 on the independent variable.
Next, we run separate OLS regressions of the three dependent variables (Y, H, and L) on the independent variable X. As expected, the coefficients on the X variable are 1.0 in all three regressions. Panels A, C, and E in Fig. A1 have the OLS fitted line with 95% confidence intervals and 95% prediction limits. Note that, while the shapes of the three scatter plots differ greatly, the OLS fitted lines, 95% confidence intervals and the 95% prediction limits are basically the same for all three dependent variables. Overall, the OLS indicates the relation between Y and X is identical to the relations between H and X, and between L and X, ignoring the large heteroscedasticity and different impact of the independent variable on the dependent variable at different points of the distribution.
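As a rough illustration of this step, the sketch below regenerates the three simulated series and computes the OLS slope of each on X with the textbook closed-form formula. The seed and all names are assumptions; the estimates from any single simulated sample will be near 1 rather than exactly 1.

```python
import random

random.seed(2024)  # arbitrary seed (an assumption)

N = 1000
x = [random.uniform(0, 10) for _ in range(N)]
series = {
    "Y": [random.gauss(xi, 5.0) for xi in x],
    "H": [random.gauss(xi, 2.5 + xi / 2) for xi in x],
    "L": [random.gauss(xi, xi) for xi in x],
}


def ols_line(x_values, y_values):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x_values)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x_values, y_values))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


ols_slopes = {name: ols_line(x, dep)[1] for name, dep in series.items()}
for name, slope in ols_slopes.items():
    print(name, round(slope, 3))
```

Because all three series share the same conditional mean, the three slopes should be close to each other, which is why the conditional-mean fit alone cannot separate the homoscedastic case from the heteroscedastic ones.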
Next, we fit quantile regressions for each of the dependent variables at the 10th, 25th, 50th, 75th and 90th percentiles. Panel B reports the quantile regression fits for the dependent variable Y. Note that the five fitted lines are parallel. Unreported quantile regression results show that the coefficients on the independent variable are close to 1 at all five percentiles. This suggests that quantile regression generates the same results and conclusion as OLS when the dependent variable is homoscedastic and follows a conditional normal distribution. Panel D reports the quantile regression fits for dependent variable H. While all the fitted lines are upward sloping, those for the higher percentiles of the dependent variable are steeper. The steeper fitted line (and the larger coefficient) for the 90th percentile of the dependent variable indicates that the value of the conditional 90th percentile of H increases much faster than the value of the conditional 10th percentile of H when the independent variable increases. Thus, quantile regression captures the asymmetric impact of the independent variable on the different percentiles of the conditional distribution of the dependent variable.
27 Note that, if the dependent variable is identically and normally distributed, quantile regression will generate the same coefficient estimates at different conditional percentiles of the dependent variable. No additional information will be gained from quantile regression.
M. Livingston, et al. Journal of Economics and Business xxx (xxxx) xxx-xxx
Fig. A1. Each panel contains the scatter plot and regression fit for a simulated dataset. X is the independent variable and X_i ~ U(0, 10), where i = 1 to 1000. Y, H and L are three dependent variables, each following a conditional normal distribution. Y_i ~ N(X_i, 5), H_i ~ N(X_i, 2.5 + X_i/2), L_i ~ N(X_i, X_i), where i = 1 to 1000. Panels A, C and E have the scatter plots of each dependent variable and the OLS fitted line with 95% confidence interval and 95% prediction limits. Panels B, D and F have the scatter plots of each dependent variable and the quantile regression fitted lines for the 10th, 25th, 50th, 75th and 90th percentiles.
Panel F reports the quantile regression fits for dependent variable L. Interestingly, the fitted line for the 10th percentile is downward sloping, an observation confirmed by the negative coefficient for the 10th percentile regression. The downward sloping fitted line for the 10th percentile indicates that the lower tail of the dependent variable actually decreases as the independent variable increases, a pattern that is completely lost by the OLS. Similar to Panel D, the slopes of the other fitted lines are steeper for higher percentiles of the dependent variable.
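The direction of these slopes follows from the simulation design. For a conditionally normal variable, the q-th conditional quantile equals the conditional mean plus z_q times the conditional standard deviation, where z_q is the standard normal quantile. This gives X + 5 z_q for Y (slope 1), X + z_q (2.5 + X/2) for H (slope 1 + z_q/2), and X (1 + z_q) for L (slope 1 + z_q). The sketch below evaluates these implied population slopes. They are derived here from the stated distributions and are not the paper's unreported estimates; the variable names are assumptions.

```python
from statistics import NormalDist

percentiles = [0.10, 0.25, 0.50, 0.75, 0.90]

implied_slopes = {}
for q in percentiles:
    z = NormalDist().inv_cdf(q)  # standard normal quantile z_q
    implied_slopes[q] = {
        "Y": 1.0,          # sd does not depend on X, so every quantile line has slope 1
        "H": 1.0 + z / 2,  # sd = 2.5 + X/2 adds z_q / 2 to the slope
        "L": 1.0 + z,      # sd = X adds z_q to the slope
    }

for q, slopes in implied_slopes.items():
    print(q, {name: round(s, 3) for name, s in slopes.items()})
```

Since z_q is about -1.28 at the 10th percentile, the implied slope for L there is about -0.28, which is consistent with the downward-sloping 10th percentile line described for Panel F.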
The simulation shows that quantile regression provides an adjustment for heteroscedasticity since regression lines are fitted for different levels of the dependent variable. More importantly, quantile regression models can relate the dependent variable and independent variables throughout the whole distribution of the dependent variable, not just at the conditional mean as in OLS.
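For readers who wish to experiment, the following is a minimal sketch of fitting quantile regression lines to a simulated series like L without a statistics package. It is not the estimation routine used in the study. It relies on two standard facts: for a fixed slope, the best intercept is a sample quantile of the partial residuals, and the resulting profiled loss is convex in the slope, so a ternary search suffices. The seed, the function names, and the search range of -10 to 10 for the slope are assumptions.

```python
import math
import random


def check_loss_sum(residuals, q):
    """Sum of absolute residuals weighted by q (above the fit) and 1 - q (below)."""
    return sum(q * r if r >= 0 else (1 - q) * (-r) for r in residuals)


def profile(slope, x, y, q):
    """For a fixed slope, return the best intercept and the resulting loss."""
    partial = sorted(yi - slope * xi for xi, yi in zip(x, y))
    k = min(max(math.ceil(q * len(partial)) - 1, 0), len(partial) - 1)
    intercept = partial[k]  # a q-th sample quantile minimizes the loss over constants
    return intercept, check_loss_sum([p - intercept for p in partial], q)


def fit_quantile_line(x, y, q, lo=-10.0, hi=10.0, steps=60):
    """Ternary search on the slope; returns (intercept, slope)."""
    for _ in range(steps):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if profile(m1, x, y, q)[1] < profile(m2, x, y, q)[1]:
            hi = m2  # the minimum lies in [lo, m2]
        else:
            lo = m1  # the minimum lies in [m1, hi]
    slope = (lo + hi) / 2
    return profile(slope, x, y, q)[0], slope


random.seed(7)  # arbitrary seed (an assumption)
x = [random.uniform(0, 10) for _ in range(1000)]
dep_l = [random.gauss(xi, xi) for xi in x]  # the strongly heteroscedastic case L

quantile_slopes = {q: fit_quantile_line(x, dep_l, q)[1] for q in (0.10, 0.50, 0.90)}
for q, slope in quantile_slopes.items():
    print(q, round(slope, 3))
```

In a sample generated this way, the fitted slopes should rise from the 10th to the 90th percentile, so the fitted lines fan out as X increases, whereas a single conditional-mean line reports only one slope.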