Is Corporate Social Responsibility investing a free lunch? The relationship between ESG, tail risk, and upside potential of stocks before and during the COVID-19 crisis

Did Corporate Social Responsibility investing benefit shareholders during the COVID-19 pandemic crisis? Distinguishing between downside tail risk and upside reward potential of stock returns, we provide evidence from 5,073 stocks listed on stock markets in ten countries. The findings suggest that better ESG ratings are associated with lower downside risk, but also with lower upside return potential. Thus, ESG ratings helped investors to reduce their risk exposure to the market turmoil caused by the pandemic, while maintaining the fundamental trade-off between risk and reward.


Introduction
There is a widespread perception that investors consider stocks with better Environmental, Social and Governance (ESG) ranking to be safer during market turmoil, and they expect them to exhibit a greater potential for future recovery from the crisis.
Research on the 2008-2009 financial crisis reveals that firms with high social capital, as measured by corporate social responsibility (CSR) intensity, were substantially less affected than firms with low social capital (Lins et al., 2017). The COVID-19 pandemic has reminded corporations and equity investors that markets suffer from rare but extreme negative shocks (Goodell, 2020; Kantos et al., 2020; Rizwan et al., 2020). Did CSR investment also pay off in this global financial turbulence?
Early generations of CSR measurement, captured by ESG ratings, were only indirectly connected with firms' fundamentals and were therefore questioned by both investors and researchers (Christensen et al., 2021; Eccles et al., 2012; Kotsantonis et al., 2016; Porter et al., 2019). The new ESG generation, originally developed by Sustainalytics, is explicitly designed to help investors identify and understand financially relevant ESG risks at the security and portfolio level and how they might affect the long-term performance of equity and fixed income investments (Gaussel and Le Saint, 2020). Contrary to traditional ESG approaches, a higher score reflects higher ESG risk exposure.
Although there is support in the literature that investments with lower ESG risks can be considered as safer during strong stock market turmoil, the overall evidence is somewhat ambiguous (for a discussion, see Bruna and Lahouel, 2021). For instance, Broadstock et al. (2021) explore the role of ESG performance in China before and during the pandemic and find that high ESG portfolios generally outperform low ESG portfolios. They also show that good ESG performance mitigates financial risk during the crisis. On the other hand, using a sample of 1750 U.S. firms and two alternative CSR ratings, MSCI ESG Stats and Thomson Reuters Refinitiv data, Bae et al. (2021) find no evidence that CSR affected stock returns during the crash period. However, also exploiting the Refinitiv data, Albuquerque et al. (2020) report that stocks with high ESG ratings are more resilient during a time of crisis and had significantly higher returns, lower return volatilities, and higher trading volumes than other stocks during the first quarter of 2020.
Basing their analysis on Morningstar data, Ferriani and Natoli (2020) find that equity funds with low ESG risk scores experienced positive investment inflows during and after the stock market collapse, while high risk ESG funds suffered sell-offs during the panic phase and afterwards. While all examined funds experienced negative cumulative returns, low risk funds scored significantly better than other funds. Exploiting data from MSCI, Singh (2020) studies the period May 2017-May 2020 and shows that risk averse investors sought shelter in CSR portfolios during the crisis period. Döttling and Kim (2020) apply a difference-in-differences framework using retail fund flow and ESG ratings data from Morningstar, and show that investors' demand for sustainability significantly weakens the economic stress induced by COVID-19. Using Morningstar ESG ratings as well, Pástor and Vorsatz (2020) analyze flows of U.S. active equity mutual funds during the COVID-19 crisis in 2020 and report that investors favored funds with high sustainability ratings, while the performance results are less conclusive. Pavlova and de Boyrie (2021) use Morningstar data to investigate risk-adjusted returns on 62 exchange-traded funds before and during the COVID-19 market crash. They report that higher sustainability ratings did not protect the funds from losses during the 2020 downturn; however, these funds did not perform worse than the rest of the market. There are also a number of studies which provide evidence that companies with high ESG scores enjoy higher risk-adjusted returns (for instance Ashwin Kumar et al., 2016; Sherwood and Pollard, 2018; Verheyden et al., 2016).
The current paper contributes by analyzing the connection between ESG ratings and tail risks. We provide an answer to the question whether stocks with better ESG scores have been more resilient to higher financial market uncertainty. We study both the traditional and the new generation ESG ratings, and utilize a recent approach by Patton et al. (2019) to estimate tail returns as conditional Value-at-Risk (cVaR) and conditional Value-of-Return (cVoR) for a broad sample of 5,047 stocks from global stock markets.
Tail return measures for each stock are then combined with the ESG scores over the sample period January 2018 to October 2020, and correlated random effects regressions are employed to estimate the relationship between ESG and tail returns. We find that stocks with superior scores for both ESG generations have lower tail risks overall, but at the same time also lower upside potential. A main conclusion is therefore that the ESG measures help investors to identify stocks with high risk exposure. The fundamental trade-off between risk and return remains.
The rest of the paper is organized as follows. Section 2 presents the data and empirical methodology. Results are provided in Section 3. Section 4 reports robustness tests. Section 5 concludes.

Data and methodology
Monthly ESG scores for firms from different countries and industries are obtained from Sustainalytics, which provides research and data related to ESG and corporate governance globally. The ESG data cover January 2018 to October 2020 and include a large number of stocks listed in ten countries: United States, Canada, Sweden, Germany, France, United Kingdom, Netherlands, Australia, China, and Japan. [1] We choose stocks from these countries because they represent markets with different CSR engagement, different regions, and different market sizes. [2] While the traditional ESG measure is built on three individual pillars, Env, Soc and Gov, the new measure, the ESG risk rating, distinguishes between overall risk exposure (OES) and overall managed risk (OMS).
We obtain daily adjusted returns from the Thomson Reuters Refinitiv database. The return data extend from January 2006 to October 2020. Using an estimation window from January 2006 to December 2017, we obtain out-of-sample lower and upper tail forecasts at the 1% level until October 2020. We include stocks that have at least 1000 returns during the estimation window. Then, we evaluate the accuracy of the risk models from January 2018 to October 2020 and identify the best performing risk model for each stock. Finally, we investigate the impact of the ESG scores on the tail forecasts during the 2018-2020 period. [3]
We use Value-at-Risk (VaR) and conditional Value-at-Risk (cVaR) as financial risk measures. For an asset, the VaR is defined as the maximum loss given a probability level $\alpha \in (0, 1)$, and the cVaR, also known as expected shortfall, measures the expectation of losses beyond the VaR. Let $Y_t \in \mathbb{R}$ be an asset return at time $t$, with distribution function $F_t$ conditioned on the information set $\mathcal{F}_{t-1}$, such that $Y_t \mid \mathcal{F}_{t-1} \sim F_t$. The $\alpha$-level VaR and cVaR at time $t$ are given as:
$$\mathrm{VaR}_t^{\alpha} = F_t^{-1}(\alpha), \qquad \mathrm{cVaR}_t^{\alpha} = \mathbb{E}\left[Y_t \mid Y_t \le \mathrm{VaR}_t^{\alpha},\ \mathcal{F}_{t-1}\right]. \quad (1)$$
[1] Descriptive statistics for stocks in each country are provided in Table S1 in the online supplementary materials.
[2] Although the global spread of ESG ratings is growing rapidly, coverage is still low in most countries, including OECD countries. Therefore, data availability has been a criterion for selecting countries. In addition, there are a limited number of small economies in Europe with relatively good ESG coverage. In total, our ESG data capture the financial markets of seven of the world's largest economies. Ideally, we would include more countries than the ten chosen in our study, but the low ESG coverage in some countries would make country-level findings less reliable.
[3] We convert the daily out-of-sample lower and upper tail forecasts to the monthly frequency by taking the average value for each month.
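As a minimal illustration of the VaR and cVaR definitions, the sketch below computes both from a sample of returns. It replaces the conditional distribution with the empirical distribution of a hypothetical return sample, which is a simplification and not the forecasting approach used in the paper. The function name and the quantile convention are assumptions made for the example.

```python
import math


def empirical_var_cvar(returns, alpha=0.01):
    """Empirical alpha-level VaR and cVaR (expected shortfall) of a return sample.

    Assumed convention: VaR is the k-th smallest return with
    k = floor(alpha * n), at least 1; cVaR is the mean of all returns
    at or below the VaR.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    ordered = sorted(returns)
    if not ordered:
        raise ValueError("need at least one return")
    # The small constant guards against floating-point rounding in alpha * n.
    k = max(1, math.floor(alpha * len(ordered) + 1e-9))
    var = ordered[k - 1]
    tail = [r for r in ordered if r <= var]
    cvar = sum(tail) / len(tail)
    return var, cvar


# Hypothetical sample: 200 evenly spaced returns from -10.0% to +9.9%.
sample = [i / 1000 for i in range(-100, 100)]
var_5, cvar_5 = empirical_var_cvar(sample, alpha=0.05)
```

By construction the cVaR is never above the VaR, because it averages only the returns in the tail at or below the VaR.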
Using the daily frequency for financial performance enables better out-of-sample back-testing of the risk models. To estimate these risk measures, we apply several risk models, including generalized autoregressive conditional heteroscedasticity (GARCH) and generalized autoregressive score (GAS) models. The latter is applied either to model VaR and cVaR jointly, as suggested in Patton et al. (2019), or to estimate VaR and cVaR from a conditional one-step-ahead distribution for returns, similar to Ardia et al. (2019). We introduce the risk models in Section 1 of the supplementary materials. To describe the potential of upside returns, we use Value-of-Return (VoR$_t^{\alpha}$) and conditional Value-of-Return (cVoR$_t^{\alpha}$) at level $\alpha$.
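To make the idea of a one-step-ahead tail forecast concrete, the following sketch runs a GARCH(1,1) variance recursion with fixed, hypothetical parameters and turns the resulting variance into lower and upper tail measures. The normal innovation distribution, the zero mean, the parameter values and the mirror-image definition of VoR and cVoR are all assumptions for illustration. The paper estimates its models and also considers GAS specifications, which are not shown here.

```python
import math
from statistics import NormalDist


def garch_one_step_variance(returns, omega, a, b):
    """One-step-ahead variance from sigma2[t+1] = omega + a*r[t]**2 + b*sigma2[t].

    The recursion starts from the unconditional variance omega / (1 - a - b),
    so a + b must be below 1. Parameters are taken as given, not estimated.
    """
    if a + b >= 1.0:
        raise ValueError("a + b must be below 1")
    sigma2 = omega / (1.0 - a - b)
    for r in returns:
        sigma2 = omega + a * r * r + b * sigma2
    return sigma2


def normal_tail_forecasts(sigma2, alpha=0.01, mu=0.0):
    """Lower-tail (VaR, cVaR) and upper-tail (VoR, cVoR) measures under normality."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(alpha)              # negative for alpha < 0.5
    sigma = math.sqrt(sigma2)
    tail_mean = std_normal.pdf(z) / alpha      # E[Z | Z >= -z] for a standard normal Z
    return {
        "VaR": mu + sigma * z,
        "cVaR": mu - sigma * tail_mean,
        "VoR": mu - sigma * z,
        "cVoR": mu + sigma * tail_mean,
    }


# Hypothetical daily returns and parameters.
sigma2_next = garch_one_step_variance([0.0], omega=1e-6, a=0.1, b=0.8)
forecast = normal_tail_forecasts(sigma2_next, alpha=0.01)
```

Under a symmetric distribution with zero mean, the upper tail measures are the mirror image of the lower tail measures. Asymmetric models are needed before ESG can relate differently to downside risk and upside potential.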
To test the link between ESG rating and stock tail returns, we use the correlated random effects (CRE) approach (Mundlak, 1978; Wooldridge, 2010; Schunck, 2013; Schunck and Perales, 2017), formulated as
$$\widehat{\mathrm{cVaR}}_{it} = \beta_0 + \mu_i + \beta_w\,\mathrm{ESG}_{it} + \beta_b\,\overline{\mathrm{ESG}}_i + \gamma\,\mathrm{industry}_i + \lambda_t + \varepsilon_{it}, \quad (2)$$
where $\widehat{\mathrm{cVaR}}_{it}$ is the one-step-ahead cVaR (or cVoR) forecast for stock $i$ at time $t$, $\mu_i$ is a stock-specific effect, uncorrelated with the error term $\varepsilon_{it}$, $\beta_w$ and $\beta_b$ are the within and between estimates, respectively, $\overline{\mathrm{ESG}}_i$ denotes the average ESG score for stock $i$, $\mathrm{industry}_i$ collects the time-invariant industry and country variables, and $\lambda_t$ denotes time effects. We apply the same model for the opposite upside tail measure, cVoR.
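The distinction between within and between estimates can be sketched with a single regressor. The example below assumes the deviation-from-mean (hybrid) parameterization, in which the time-varying score enters as its deviation from the stock mean, so that the coefficient on the stock mean is the between effect directly. It uses pooled least squares on a small hypothetical panel and omits the random effects weighting, time effects and industry and country controls. It illustrates the decomposition only and is not the paper's estimator.

```python
from collections import defaultdict


def within_between_slopes(panel):
    """Within and between slopes for observations (stock_id, x, y).

    With one regressor, these equal the pooled least-squares coefficients on
    (x - stock mean of x) and on (stock mean of x), because the deviations
    sum to zero within every stock.
    """
    groups = defaultdict(list)
    for stock, x, y in panel:
        groups[stock].append((x, y))

    n_obs = sum(len(obs) for obs in groups.values())
    grand_x = sum(x for obs in groups.values() for x, _ in obs) / n_obs
    grand_y = sum(y for obs in groups.values() for _, y in obs) / n_obs

    w_num = w_den = b_num = b_den = 0.0
    for obs in groups.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            w_num += (x - mean_x) * (y - mean_y)
            w_den += (x - mean_x) ** 2
        # Each stock mean is weighted by its number of observations.
        b_num += len(obs) * (mean_x - grand_x) * (mean_y - grand_y)
        b_den += len(obs) * (mean_x - grand_x) ** 2
    return w_num / w_den, b_num / b_den


# Hypothetical panel built so that y = 2 * mean(x) + 0.5 * (x - mean(x)).
panel = []
for stock, xs in {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}.items():
    mean_x = sum(xs) / len(xs)
    for x in xs:
        panel.append((stock, x, 2.0 * mean_x + 0.5 * (x - mean_x)))

beta_w, beta_b = within_between_slopes(panel)
```

In this constructed panel the within slope is 0.5 and the between slope is 2, which shows how the two can differ when ratings vary little over time but a lot across stocks.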

Results
Table 1 displays the summary statistics of the variables used. There are more observations of ESG than of the ESG risk rating, as the former starts in January 2018 and the latter in December 2018. However, the new measure has better coverage of its components.
Results of the CRE regressions on the relation between the old ESG measure and downside risk are presented in Table 2. We apply several risk models to forecast VaR, VoR, cVaR and cVoR at each level of $\alpha$, and select the best-performing model, that is, the one with the lowest average loss computed from the Fissler and Ziegel (FZ) joint scoring function suggested in Fissler et al. (2016). [4] We further perform the goodness-of-fit test suggested in Patton et al. (2019).
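Model selection by lowest average joint loss can be sketched as follows. The example assumes the FZ0 member of the Fissler-Ziegel family as given in Patton et al. (2019), $L = -\frac{1}{\alpha e}\mathbf{1}\{y \le v\}(v - y) + \frac{v}{e} + \log(-e) - 1$, where $v$ is the VaR forecast and $e < 0$ the cVaR forecast. Whether the paper uses exactly this member is an assumption, and the model names, returns and forecasts are hypothetical.

```python
import math


def fz0_loss(y, var, cvar, alpha):
    """FZ0 joint loss for one return y and its (VaR, cVaR) forecast; needs cvar < 0."""
    if cvar >= 0:
        raise ValueError("FZ0 loss requires a negative cVaR forecast")
    hit = 1.0 if y <= var else 0.0
    return -hit * (var - y) / (alpha * cvar) + var / cvar + math.log(-cvar) - 1.0


def average_fz0(returns, forecasts, alpha):
    """Mean FZ0 loss over paired returns and (VaR, cVaR) forecasts."""
    losses = [fz0_loss(y, v, e, alpha) for y, (v, e) in zip(returns, forecasts)]
    return sum(losses) / len(losses)


def best_model(returns, forecasts_by_model, alpha):
    """Name of the model with the lowest average FZ0 loss."""
    return min(
        forecasts_by_model,
        key=lambda name: average_fz0(returns, forecasts_by_model[name], alpha),
    )


# Hypothetical out-of-sample returns and constant forecasts from two models.
realized = [0.01, -0.02, 0.005, -0.05]
candidates = {
    "model_a": [(-0.03, -0.04)] * 4,    # cautious tail forecasts
    "model_b": [(-0.001, -0.002)] * 4,  # tail forecasts that are far too optimistic
}
selected = best_model(realized, candidates, alpha=0.05)
```

The optimistic model is penalized heavily on the two days when the return falls below its VaR forecast, so the cautious model attains the lower average loss.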
Columns (1) and (3) of Table 2 report estimates for the pandemic crisis year 2020, while columns (2) and (4) show the estimates for pre-crisis years 2018 and 2019. Furthermore, columns (1) and (2) report the estimates for the ESG score, while columns (3) and (4) present results for the pillars Env, Soc and Gov, separately.
Two main conclusions can be drawn from Table 2. First, the impact of the ESG scores on downside risk becomes more pronounced in the year 2020 during the pandemic crisis. Second, the between estimate is in most cases higher than the within estimate, and between estimates appear to be more statistically significant. This is largely explained by the low variation of the rating for the stocks over time. As many as 98% of companies maintained the same ESG rating during the peak of financial volatility in the spring of 2020. Conventional wisdom states that the between estimate measures the long-term impact, while the within estimate shows the short-term impact of the variable. The coefficient of 0.0247 for ESG (w) implies that an improvement of the ESG score by 1 unit reduces a stock's downside risk by 2.5 basis points (bps), while the ESG (b) coefficient of 0.07 implies that stocks with a 1 unit higher ESG score have 7 bps lower downside risk in the long term.
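The conversion from coefficients to basis points can be checked with a few lines. The sketch assumes that the tail forecasts are measured in percent, which is what the 2.5 bps and 7 bps readings of the coefficients imply. The 10-unit score difference is a hypothetical example.

```python
BPS_PER_PERCENTAGE_POINT = 100.0


def effect_in_bps(coefficient, score_change=1.0):
    """Change in the tail forecast, in basis points, for a given ESG score change.

    Assumes the dependent variable is in percent, so a coefficient of 0.0247
    means 0.0247 percentage points per ESG unit.
    """
    return coefficient * score_change * BPS_PER_PERCENTAGE_POINT


within_bps = effect_in_bps(0.0247)        # roughly 2.5 bps per unit, short term
between_bps = effect_in_bps(0.07)         # roughly 7 bps per unit, long term
ten_unit_gap = effect_in_bps(0.07, 10.0)  # hypothetical 10-unit gap in average score
```

Under the same assumption, two stocks whose average ESG scores differ by 10 units would differ by about 70 bps in forecast downside risk.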
Considering the ESG pillars, for Env the between effect is positive and highly significant during both periods. The between estimate for Gov is positive and significant at the 5% level in column (4) (2018-2019), and weakly significant in column (3). The estimates for Soc are non-significant in both columns.
Table 3 estimates Eq. (2) with the ESG risk rating measure, where the sign is to be interpreted inversely because a low rating indicates low risk. Surprisingly, the results for the aggregate measures are very similar to those in Table 2. In contrast to Table 2, the within estimate is statistically different from zero for the pandemic year 2020 and suggests lower downside risk, while being non-significant for the previous period. The between estimate shows that higher scores for overall risk exposure increase the downside risk in both periods. Considering individual pillars, the between measure for overall managed risk is associated with reduced downside risk in both periods.
Table 4 estimates whether ESG ratings also affect firms' upside reward potential during the turmoil period. Results for the old ESG measure are presented in columns (1) and (2), and for the new measure in columns (3) and (4). Focusing on the between estimates, the results for cVoR at the 0.01 level show that higher ESG is associated with lower upside potential before and during the crisis. For the ESG risk rating, higher scores imply higher upside potential. [5]
[4] See Figures S4-S13 in the online supplementary materials, which provide the p-values for each model across all stocks. In this test, a p-value higher than 10% suggests no evidence against optimality and therefore a good fit for the corresponding risk model. We also compare the risk models using the Diebold-Mariano test. Those results are available upon request.
H. Lööf et al.
Table notes (with references to Tables S2 and S3 in the supplementary materials): Cluster-robust t statistics in parentheses. (w) denotes the within, (b) denotes the between estimate. rho indicates the fraction of variance due to stock random effects. * p < 0.10, ** p < 0.05, *** p < 0.01.
Table notes (with a reference to Table S4 in the supplementary materials): Sample year 2020. Sample split below and above the median of the stock characteristic. Cluster-robust t statistics in parentheses. Random stock effects and fixed time effects included. Industry and country effects included. (w) denotes the within, (b) denotes the between estimate. rho indicates the fraction of variance due to stock random effects. 50 indicates the median values of the split variable in the subsample. * p < 0.10, ** p < 0.05, *** p < 0.01.

Robustness tests
We investigate whether stock characteristics mediate the effects of the ESG ratings. Since these characteristics are not normally distributed, we split the sample using the median. Tables 5 (ESG) and 6 (ESG risk rating) consider sample splits below and above the median values of selected stock characteristics: market capitalization, beta, dividend yield, and P/E ratio.
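A median split of this kind can be sketched as below. The field names and values are hypothetical, and assigning stocks exactly at the median to the lower group is an assumption, since the text does not state how ties are handled.

```python
from statistics import median


def median_split(stocks, characteristic):
    """Split stock records into groups at or below, and above, the sample median."""
    cut = median(s[characteristic] for s in stocks)
    below = [s for s in stocks if s[characteristic] <= cut]
    above = [s for s in stocks if s[characteristic] > cut]
    return cut, below, above


# Hypothetical stock characteristics.
stocks = [
    {"stock": "A", "market_cap": 1.0},
    {"stock": "B", "market_cap": 2.0},
    {"stock": "C", "market_cap": 3.0},
    {"stock": "D", "market_cap": 40.0},
]
cut, small_caps, large_caps = median_split(stocks, "market_cap")
```

Using the median rather than the mean keeps the two groups balanced even when a characteristic is heavily skewed, as market capitalization is in this example.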
The results confirm that the relationship between ESG and downside risk is not mediated by stock characteristics that are omitted from the regression models. Overall, the relationships are more pronounced for stocks with a high P/E ratio and a low dividend yield, which are typically considered higher-risk stocks.
We also test whether including lags of ESG and ESG risk rating would affect the results for downside risk and upside potential. Lagging the CSR variables by one month, we find essentially the same results. The estimations are also robust when the number of country-specific COVID infections is included as a regressor. The COVID cases variable exhibits a strong relationship with the forecasted downside tail risk of stocks.
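Lagging a monthly variable by one month within each stock can be sketched as follows. The record layout and field names are hypothetical. The lag is taken by calendar month, so a gap in a stock's series yields a missing lagged value instead of silently using an older observation.

```python
def lag_within_stock(rows, variable, months=1):
    """Return copies of rows with an added '<variable>_lag' field.

    rows: dicts with keys 'stock', 'year', 'month' and the variable to lag.
    If the earlier calendar month is missing for a stock, the lag is None.
    """
    index = {(r["stock"], r["year"] * 12 + r["month"]): r[variable] for r in rows}
    lagged = []
    for r in rows:
        key = (r["stock"], r["year"] * 12 + r["month"] - months)
        lagged.append({**r, f"{variable}_lag": index.get(key)})
    return lagged


# Hypothetical monthly ESG observations; stock A has no record for February 2020.
rows = [
    {"stock": "A", "year": 2019, "month": 12, "esg": 60.0},
    {"stock": "A", "year": 2020, "month": 1, "esg": 61.0},
    {"stock": "A", "year": 2020, "month": 3, "esg": 62.0},
    {"stock": "B", "year": 2020, "month": 1, "esg": 40.0},
]
with_lags = lag_within_stock(rows, "esg")
```

Only the January 2020 record for stock A receives a lagged value here, because it is the only observation whose preceding calendar month is present for the same stock.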
Finally, one potential concern is whether the reported standard errors are accurate. We report cluster-robust standard errors at the stock level in all tables. These do not account for cross-stock correlations and dependencies, which could be a concern. To analyze whether heteroscedasticity, autocorrelation and cross-sectional correlation could alter the conclusions, we also estimate the models using Driscoll and Kraay's (1998) robust standard errors. Overall, these standard errors are smaller than the cluster-robust standard errors, and statistical inference becomes even stronger.

Conclusions
The main finding of this paper is that stocks with higher ESG ratings have less downside risk, but also less upside potential. These relationships became more pronounced during the COVID-19 crisis compared to the period before. This implies that investors can reduce their risk exposure by investing in companies with superior CSR, but at the same time they reduce the likelihood of obtaining higher upside returns. This conclusion applies to both the old and the new generation of ESG measures.
Overall our results highlight that the fundamental trade-off between risk and return also holds for ESG investing. These results have practical applications for asset allocation and portfolio management. In particular, we provide evidence that ESG investing is more suitable for risk-averse investors. However, we only apply univariate risk models without considering the dependence structure between assets. Future research should (i) consider the impacts of CSR on other properties of asset returns such as risk-adjusted performance and dependence structure, and (ii) incorporate ESG investing in multi-objective portfolio optimization.