Media sentiment and stock returns

Based on 35,344 news articles published in the Financial Times that cover 40 companies that have been included in the Dow Jones Industrial Average, we find that a negative media sentiment in the form of a negative language tone in news articles is a priced factor in five of nine asset-pricing models that aim to explain the cross-section of stock returns. In particular, the sentiment factor is a priced factor in the market model augmented with the sentiment factor in all three samples (the 2005-09 subsample, the 2010-18 subsample, and the 2005-18 full sample) and in the Fama-French three- and five-factor models augmented with the sentiment factor in the 2010-18 subsample. In addition, factor-spanning regressions with the Fama-French five-factor model as the right-hand-side model confirm that the sentiment factor contributes to the model's explanation of the stocks' mean excess returns in the 2005-09 subsample and the 2005-18 full sample.


Introduction
The use of automated text analysis in research to better understand financial markets dates back decades. Surveys of the literature include Das (2014), Kearney and Liu (2014), Li (2010a), Loughran and McDonald (2016), Marty et al. (2020), Mitra and Mitra (2011), Nardo et al. (2016), Tetlock (2014), and Xing et al. (2018). One stream of research uses bag-of-words methods in which the grammar and word sequences in a document are ignored but the words are categorized with the use of dictionaries as positive, negative, etc. The number of words of a certain category in a document (for example, the frequency of negative words in a news article in the financial press) could be informative from, say, an investor perspective.
To give a taste of this growing body of literature, Tetlock (2007) wrote a seminal paper on the content of financial news media and the stock market that examined the influence of a column in the Wall Street Journal (WSJ), "Abreast of the Market," on stock returns. For this purpose, Tetlock (2007) used the Harvard General Inquirer to analyze the pessimism in the language tone in the WSJ column and found that the column contained information that could be used to predict short-term stock returns. Dougal et al. (2012) expanded Tetlock's (2007) research by examining the authorship of the WSJ column and found that journalists with a pessimistic language tone were associated with negative stock returns. Heston and Sinha (2017) used 900,754 news articles tagged with company identifiers from Thomson Reuters to test whether news predicts stock returns and found that daily news predicts stock returns one to two days ahead. Positive news quickly increased stock returns, whereas negative news spurred a delayed reaction. García (2013) analyzed two columns in the New York Times (NYT) published over a century using the frequencies of both positive and negative words in the columns in the text analysis and found that the language tone in the columns was associated with future stock returns, especially during recessions. Hillert et al. (2014) analyzed 2,215,833 news articles from 45 U.S. newspapers and found a stronger tone-enhanced momentum effect in stocks of firms that were particularly covered by the media, where the language tone was measured by the fraction of negative words in the news articles. Chen et al. (2014) analyzed 97,070 articles and 459,679 commentaries written in response to these articles published on Seeking Alpha, a social media platform for investors, and found that the language tone in the articles and the commentaries was associated with future stock returns based on the frequency of negative words in the text analysis. Finally, Huang et al. (2014a) examined 363,952 analyst reports and found that investors reacted more strongly to negative than to positive text, where the opinion in the reports was measured as the difference between the frequencies of positive and negative words.
The cited research inspires our own research because it offers applications of useful methods for examining the language tone in articles in financial news media.¹ However, the aim of the present paper is not to provide the research community with yet another study on news and the predictability of stock returns per se. Instead, we ask the following question in the paper: is a negative media sentiment in the form of a negative language tone in news media a priced factor in asset-pricing models that aim to explain the cross-section of stock returns? For this purpose, we use 35,344 news articles published in the Financial Times (FT) that cover 40 companies that have been included in the Dow Jones Industrial Average (DJIA).
To the best of our knowledge, Fang and Peress (2009) were the first to study the relationship between media coverage (the NYT, USA Today, the WSJ, and the Washington Post) and the cross-section of stock returns using a factor model. After controlling for market, size, value, and momentum factors, they found that stocks with no media coverage earned higher returns than stocks with high media coverage. Specifically, they sorted stocks by media coverage (no, low, and high media coverage) and formed zero-investment portfolios that long stocks with no media coverage and short stocks with high media coverage; their results were driven by the long legs of the portfolios.
Should Fang and Peress (2009) have expected a media premium in stock returns? There are at least two reasons why the answer is negative: no market frictions and well-informed investors. First, a media premium might reflect the mispricing of stocks due to market frictions that prevent arbitrageurs from exploiting the mispricing, which is the impediments-to-trade hypothesis of Fang and Peress (2009). Second, a media premium might reflect compensation for imperfect diversification because investors are not well informed about all companies, which is the investor-recognition hypothesis of Merton (1987). Hence, the findings of Fang and Peress (2009) suggest that there are either stock market frictions or investors who are not well informed, or both.
In this paper, the sentiment factor is not about stocks being covered or not covered by the media. Instead, we take the analysis in Fang and Peress (2009) one step further by concentrating the analysis on only stocks that have been covered by the media. Specifically, when constructing the sentiment factor, we form zero-investment portfolios that long stocks with the most negative news and short stocks with the least negative news. We focus on the negativism in the language tone when constructing the sentiment factor because earlier research has shown that a negative language tone matters more for stock returns than a positive language tone (Huang et al., 2014a).
We adopt the two-stage approach of Fama and French (1993) to estimate three well-known factor models, with and without the sentiment factor, to answer the question of whether a negative media sentiment is a priced factor in asset-pricing models that aim to explain the cross-section of stock returns. For example, adding the sentiment factor to the Fama-French (2015) five-factor model results in the following six-factor model:

\[ R_{i,t} - R_{F,t} = a_i + b_i (R_{M,t} - R_{F,t}) + n_i NMP_t + s_i SMB_t + h_i HML_t + r_i RMW_t + c_i CMA_t + e_{i,t}, \tag{1} \]

where $R_{i,t}$ is the return on stock $i$, $R_{F,t}$ is the risk-free return, $R_{M,t}$ is the return on the market portfolio, $NMP_t$ is the return on the sentiment factor, $SMB_t$ is the return on the size factor, $HML_t$ is the return on the value factor, $RMW_t$ is the return on the profitability factor, $CMA_t$ is the return on the investment factor, $e_{i,t}$ is the error term, the subscript $t$ denotes time, $a_i$ is Jensen's alpha, and $b_i$, $n_i$, $s_i$, $h_i$, $r_i$, and $c_i$ are factor loadings. $SMB_t$ equals the difference between the returns on portfolios of stocks of firms with small and large market capitalization ("small minus big"), $HML_t$ equals the difference between the returns on portfolios of stocks of firms with high and low book-to-market ratio ("high minus low"), $RMW_t$ equals the difference between the returns on portfolios of stocks of firms with robust and weak profitability ("robust minus weak"), and $CMA_t$ equals the difference between the returns on portfolios of stocks of low and high investment firms, respectively referred to as conservative and aggressive firms ("conservative minus aggressive").
The new factor in the six-factor model in (1) is the sentiment factor, $NMP_t$, which equals the difference between the returns on portfolios of stocks with negative and positive media sentiment ("negative minus positive"). In this paper, we analyze the negativism in the language tone in news articles in the FT, and the sentiment factor equals the difference between the returns on portfolios of stocks of firms receiving the most negative news and those with the least negative news.
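The first-stage regression for a single stock can be illustrated with a short Python sketch. This is a minimal example on synthetic data, not the estimation code behind the paper: the function name `first_stage_loadings`, the ordering of the factor columns, and all numbers are assumptions made for the illustration.

```python
import numpy as np

def first_stage_loadings(excess_returns, factors):
    """OLS of one stock's daily excess returns on a constant and factor returns.

    excess_returns: shape (T,), the series R_i - R_F.
    factors: shape (T, K), e.g. columns [R_M - R_F, NMP, SMB, HML, RMW, CMA].
    Returns (alpha, loadings), where loadings has shape (K,).
    """
    y = np.asarray(excess_returns, dtype=float)
    F = np.asarray(factors, dtype=float)
    X = np.column_stack([np.ones(len(y)), F])  # intercept estimates Jensen's alpha
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1:]

# Synthetic illustration (made-up numbers, not the paper's data).
rng = np.random.default_rng(0)
T = 2000
factors = rng.normal(0.0, 0.01, size=(T, 6))
true_loadings = np.array([1.1, 0.3, -0.2, -0.1, 0.2, 0.1])
excess = 0.0001 + factors @ true_loadings + rng.normal(0.0, 0.002, size=T)

alpha, loadings = first_stage_loadings(excess, factors)
print(round(alpha, 4), np.round(loadings, 2))
```

Dropping columns from `factors` gives the smaller models, for example only the market and sentiment columns for the two-factor model.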
Because we are interested in learning whether a negative media sentiment is a priced factor in an asset-pricing model that aims to explain the cross-section of stock returns, the factor loadings from the time-series regressions in (1), with one time-series regression for each stock in the data set, are used as explanatory variables for the stocks' mean excess returns in a single cross-sectional regression (Fama and French, 1993):

\[ \overline{R_i - R_F} = \lambda_0 + \lambda_b \hat{b}_i + \lambda_n \hat{n}_i + \lambda_s \hat{s}_i + \lambda_h \hat{h}_i + \lambda_r \hat{r}_i + \lambda_c \hat{c}_i + u_i, \tag{2} \]

where the bar symbol denotes the variable's mean and the hat symbol denotes the estimated value of the parameter. The mean return is a proxy for the expected return. If a negative media sentiment is a priced factor in the cross-section of stock returns, then $\lambda_n \neq 0$ in (2).
To be more precise, if $\lambda_n > 0$, then a negative media sentiment has, on average, a positive premium in the cross-section of stock returns. The underlying intuition is that stocks associated with negative press are also associated with depressed stock prices and, therefore, are expected to have higher returns when recovering from the period with negative press. If $\lambda_n < 0$, then a negative media sentiment has, on average, a negative premium in the cross-section of stock returns. Here, the underlying intuition is that the negative press tends to linger in investors' minds, affecting their perceptions of the stocks even after the period of negative coverage has ended. As a result, stock returns are not expected to rebound, if at all, as much as when $\lambda_n > 0$.
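The second-stage regression and the reading of the sign of the sentiment premium can be sketched as follows. The example uses synthetic loadings for a two-factor case; the function name `second_stage_premia` and every number are assumptions for the illustration, and the sketch does not test statistical significance.

```python
import numpy as np

def second_stage_premia(mean_excess_returns, loadings):
    """Cross-sectional OLS of mean excess returns on first-stage loadings.

    mean_excess_returns: shape (N,), one mean excess return per stock.
    loadings: shape (N, K), estimated loadings, one row per stock.
    Returns (intercept, premia), where premia has shape (K,).
    """
    y = np.asarray(mean_excess_returns, dtype=float)
    B = np.asarray(loadings, dtype=float)
    X = np.column_stack([np.ones(len(y)), B])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1:]

# Synthetic illustration with 40 stocks and two factors (market, sentiment).
rng = np.random.default_rng(1)
n_stocks = 40
loadings = np.column_stack([
    rng.normal(1.0, 0.3, n_stocks),  # market loadings
    rng.normal(0.0, 0.3, n_stocks),  # sentiment loadings
])
true_premia = np.array([0.0004, -0.0002])  # assumed daily premia
mean_excess = loadings @ true_premia + rng.normal(0.0, 0.00001, n_stocks)

intercept, premia = second_stage_premia(mean_excess, loadings)
lambda_n = premia[1]
print("negative premium" if lambda_n < 0 else "non-negative premium")
```

In this synthetic setup the sentiment premium is negative by construction, which corresponds to the case in which returns do not fully rebound after a period of negative press.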

Data set
We use the FT as our source of news coverage and use articles that were published during the period 2004-18 covering 40 companies that have been included in the DJIA. Table 1 shows the companies in the data set, their ticker symbols, and the periods during which they were included in the DJIA.
Our choice of period means that the Great Recession, which started in the U.S. in 2007 and thereafter spread with devastating effects to the rest of the industrialized world, is covered in the analysis. For this reason, we estimate the asset-pricing models not only using the full sample but also for a subsample that includes the Great Recession and another subsample that excludes this period of economic downturn. Table 2 shows the period, the number of articles, and the number of trading days associated with each sample in the data set, including the initiation sample for the construction of the sentiment factor.
The number of articles per company and year in the FT varies greatly (see Section 3.1 for how it is determined that an article in the FT is associated with a specific company).² For example, articles about Apple number in the 800s to 900s per year during 2011-16. Other companies associated with many articles are Citigroup with at least 400 articles per year during 2006-11, JPMorgan Chase with at least 400 articles per year during 2012-15 and 2018, Goldman Sachs with at least 200 articles per year during 2006-18, and Microsoft with at least 200 articles per year during 2005-16. In fact, 17 of the 40 companies included in the data set are associated with at least 100 articles in the FT in a single year. At the other extreme, there are several companies with no articles in the FT in some years: Altria Group, American International Group, UnitedHealth Group, United Technologies, Visa (although it is associated with more than 100 articles per year during 2012-18), and Walgreens Boots Alliance.

Table 1
Companies in the data set, their ticker symbols, and the periods during which they were included in the DJIA.

Table 2
Period, number of articles, and number of trading days associated with each sample in the data set, including the initiation sample for the construction of the sentiment factor.
Note: The number of articles for the full sample equals the combined number of articles for the two subsamples, not including the initiation sample, and the number of trading days for the full sample equals the combined number of trading days for the two subsamples, not including the initiation sample.

² Descriptive statistics regarding the number of articles per company and year are available on request from the corresponding author.
M. Bask et al.
Because a small number of companies received high coverage in the FT during 2004-18 and other companies were not mentioned at all in some years, we do not believe that an expansion of the data set to include, say, all stocks ever contained in the S&P 500 during the study period would have a huge impact on our findings. Additionally, Fang and Peress (2009) found that more than 25% of NYSE stocks and more than 50% of NASDAQ stocks were not featured in the examined newspapers (the NYT, USA Today, the WSJ, and the Washington Post) in a typical year in their study; they concluded that "overall newspaper coverage is surprisingly low" (p. 2028). That is, an expansion would mostly add companies with no or only a few articles in the FT to the data set. Hence, we do not assert that our findings capture market-wide systematic risk. Our conclusions are specifically applicable to the companies included in our data set. However, these companies are among the largest players in the stock market and receive the most coverage in the financial media.
The daily stock price and index data, the latter being the S&P 500, were retrieved from Yahoo Finance,³ and the daily data on the factors in the Fama-French (1992, 2015) three- and five-factor models were retrieved from Ken French's Data Library.⁴ Stock price and index data have been adjusted for both dividends and splits.

Empirical analyses
The construction of the factor for a negative media sentiment is explained in Section 3.1, and we examine whether a negative media sentiment is a priced factor in asset-pricing models that aim to explain the cross-section of stock returns in Section 3.2.

Factor for a negative media sentiment
The return on the factor for a negative media sentiment equals the difference between the returns on stocks of firms receiving the most negative and the least negative news in the FT, where the stocks included in the long and short legs of the sentiment portfolio (or the sentiment factor) are updated on a yearly basis on the first trading day in July. Hence, the sentiment factor is updated with the same frequency and on the same date as the other factors in the factor models.
An article in the FT is attributed to a company included in the data set, say, American Express (including versions of the company name; e.g., Amex), if (i) the company is mentioned at least twice in the article and (ii) no other company is mentioned more often in the article. If two or more companies are mentioned an equal number of times in the article and more often than other companies are mentioned, the article is attributed to those companies. After all the articles have been attributed to companies, the fractions of negative words in the articles are determined using the Loughran-McDonald Sentiment Word List.⁵ This dictionary is described in Loughran and McDonald (2011).
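The attribution rule and the negative-word fraction described above can be sketched in Python. This is a minimal illustration under stated assumptions: the alias table, the three-word negative list (a toy stand-in, not the Loughran-McDonald dictionary), the tokenization by runs of letters, and the function names are all hypothetical.

```python
import re
from collections import Counter

def attribute_article(text, aliases):
    """Return the companies an article is attributed to (possibly none).

    aliases: dict mapping a company to the name variants counted as mentions.
    Rule: a company needs at least two mentions and no other company may be
    mentioned more often; companies tied for the most mentions share the article.
    """
    lowered = text.lower()
    counts = Counter()
    for company, names in aliases.items():
        counts[company] = sum(
            len(re.findall(r"\b" + re.escape(name.lower()) + r"\b", lowered))
            for name in names
        )
    if not counts:
        return []
    top = max(counts.values())
    if top < 2:
        return []
    return sorted(c for c, k in counts.items() if k == top)

def negative_fraction(text, negative_words):
    """Share of word tokens that appear in the negative word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in negative_words for t in tokens) / len(tokens)

# Hypothetical inputs for illustration only.
aliases = {"American Express": ["American Express", "Amex"], "Visa": ["Visa"]}
toy_negative = {"loss", "decline", "litigation"}
article = ("American Express reported a loss. Amex blamed a decline in "
           "spending, while Visa held steady.")

print(attribute_article(article, aliases))
print(negative_fraction(article, toy_negative))
```

Here the article mentions American Express twice (once by each name variant) and Visa once, so it is attributed to American Express only, and two of its fifteen word tokens are in the toy negative list.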
For each year starting on the first trading day in July and ending on the last trading day in June of the following year, we calculate the average fraction of negative words in the articles for each company that was included in the DJIA on July 1. Thereafter, we sort the companies on the average fraction of negative words during the year and construct a 30%-40%-30% zero-investment portfolio, where the long leg of the portfolio contains the stocks of the 30% of companies with the most negative news, and the short leg contains the stocks of the 30% of companies with the least negative news. The return on the portfolio is calculated on a daily basis as the difference between the returns on the equally weighted long and short legs. News articles, stock prices, and index data between July 1, 2004, and June 30, 2005, initiate the sentiment factor.
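The portfolio sort and the daily long-minus-short return can be illustrated with a small sketch. The tickers, scores, and returns are made up, the function names are hypothetical, and rounding the leg size to the nearest integer is an assumption because the text does not state how non-integer leg sizes are handled.

```python
def form_legs(avg_negative_fraction, share=0.30):
    """Sort companies by average negative-word fraction and return the legs.

    avg_negative_fraction: dict mapping a company to its average fraction of
    negative words over the July-to-June year.
    Returns (long_leg, short_leg): the most negative and least negative groups.
    """
    ranked = sorted(avg_negative_fraction, key=avg_negative_fraction.get, reverse=True)
    k = max(1, round(share * len(ranked)))  # leg size; rounding is an assumption
    return ranked[:k], ranked[-k:]

def factor_return(daily_returns, long_leg, short_leg):
    """Equally weighted long-leg return minus equally weighted short-leg return."""
    long_ret = sum(daily_returns[c] for c in long_leg) / len(long_leg)
    short_ret = sum(daily_returns[c] for c in short_leg) / len(short_leg)
    return long_ret - short_ret

# Ten hypothetical stocks: S9 has the most negative news, S0 the least.
scores = {f"S{i}": 0.010 + 0.001 * i for i in range(10)}
returns_today = {f"S{i}": 0.001 * i for i in range(10)}

long_leg, short_leg = form_legs(scores)
nmp_today = factor_return(returns_today, long_leg, short_leg)
print(long_leg, short_leg, nmp_today)
```

With ten stocks, each leg holds three stocks and the middle four are left out, which mirrors the 30%-40%-30% split; the legs would be re-formed once a year and the return computed every trading day.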
The correlation matrices for the factors in the models are found in Tables 3a-3c. For the first and second subsamples, the sentiment factor has the strongest correlation with the value factor and the weakest correlation with the size factor. For the full sample, the strongest correlation is again with the value factor but the weakest correlation is with the investment factor.
The mean returns on the factors (or the mean returns on the portfolios) are found in Table 4, where several noteworthy observations can be made. First, the mean return on the market factor is the lowest of all six factors (i.e., the market, sentiment, size, value, profitability, and investment factors) in the first subsample, which includes the Great Recession, with a yearly return of -1.70%. Second, in contrast with the first subsample, the market factor has the highest mean return of all factors in the second subsample and the full sample with yearly returns of 12.20% and 7.75%, respectively. Third, the sentiment factor has the second-highest mean return in the second subsample and the full sample with yearly returns of 6.30% and 4.40%, respectively, whereas the mean return in the first subsample is on par with the Fama-French factors with a yearly return of 0.89%. Fourth, the value factor has a negative mean return in all three samples.
That the mean return on the market factor is higher than the mean return on the sentiment factor in two of three samples will shed light on some of the findings in the next section.
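The monthly and yearly factor returns are described as transformations of the daily geometric mean return under the assumption of 21 trading days per month. A minimal sketch of that conversion follows; the function names are hypothetical and the input series is made up.

```python
def daily_geometric_mean(daily_returns):
    """Geometric mean of a sequence of simple daily returns."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(daily_returns)) - 1.0

def to_monthly(daily_mean, days_per_month=21):
    """Compound a daily geometric mean return over one month of trading days."""
    return (1.0 + daily_mean) ** days_per_month - 1.0

def to_yearly(monthly_mean, months_per_year=12):
    """Compound a monthly geometric mean return over twelve months."""
    return (1.0 + monthly_mean) ** months_per_year - 1.0

# Made-up daily factor returns for illustration.
sample = [0.004, -0.002, 0.001, 0.003, -0.001]
g = daily_geometric_mean(sample)
print(g, to_monthly(g), to_yearly(to_monthly(g)))
```

Chaining the two steps is equivalent to compounding the daily geometric mean over 252 trading days.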

Table 3a
Correlation matrix for the factors in the models for the first subsample.

Table 3b
Correlation matrix for the factors in the models for the second subsample.

Is a negative media sentiment a priced factor in asset-pricing models?
We adopt the two-stage approach of Fama and French (1993) to examine whether the sentiment factor is a priced factor in the cross-section of stock returns. In the first stage (Section 3.2.1), a time-series regression is run separately for each stock in the data set. The parameter estimates, or factor loadings, from these regressions are then used in the second stage (Section 3.2.2) as explanatory variables for the stocks' mean excess returns in a cross-sectional regression. The parameter estimates in the latter regression are interpreted as premia for the factors, and we are interested in learning whether the sentiment factor has a positive or negative premium, if any. We also study whether the sentiment factor can be explained by other factors via factor-spanning regressions (Section 3.2.3).

First-stage regressions
In addition to estimating the six-factor model in (1) for each stock using daily data, we also estimate the two-factor model with the market and sentiment factors,

\[ R_{i,t} - R_{F,t} = a_i + b_i (R_{M,t} - R_{F,t}) + n_i NMP_t + e_{i,t}, \tag{4} \]

and the four-factor model with the market, sentiment, size, and value factors,

\[ R_{i,t} - R_{F,t} = a_i + b_i (R_{M,t} - R_{F,t}) + n_i NMP_t + s_i SMB_t + h_i HML_t + e_{i,t}, \tag{5} \]

for the same stocks using daily data. For comparison, we also estimate the factor models in (1) and (4)-(5) without the sentiment factor:

\[ R_{i,t} - R_{F,t} = a_i + b_i (R_{M,t} - R_{F,t}) + e_{i,t}, \tag{6} \]

\[ R_{i,t} - R_{F,t} = a_i + b_i (R_{M,t} - R_{F,t}) + s_i SMB_t + h_i HML_t + e_{i,t}, \tag{7} \]

and

\[ R_{i,t} - R_{F,t} = a_i + b_i (R_{M,t} - R_{F,t}) + s_i SMB_t + h_i HML_t + r_i RMW_t + c_i CMA_t + e_{i,t}. \tag{8} \]

Table 5 shows a summary of the significant results from the first-stage time-series regressions.⁶ First, the market factor is significant in the time-series regressions for at least 39 of 40 stocks in the two subsamples and the full sample (p = 0.001). Second, the sentiment factor is significant for 28-33 stocks in the two subsamples and for 32-34 stocks in the full sample (p = 0.05). Third, the size and value factors are significant for 24-33 and 27-33 stocks, respectively, in the two subsamples and for 33-35 and 29-31 stocks, respectively, in the full sample (p = 0.05). Fourth, the profitability and investment factors are significant for 20-33 and 30-32 stocks, respectively, in the two subsamples and for 28-29 and 32-34 stocks, respectively, in the full sample (p = 0.05). Fifth, Jensen's alpha is significant only for a few stocks in the two subsamples and the full sample (p = 0.05); in a well-specified model, Jensen's alpha should be indistinguishable from zero. To summarize, the sentiment factor is significant in the time-series regressions for as many stocks as the size, value, profitability, and investment factors. Notably, the loading for the size factor in the time-series regressions is, in most cases, negative. This is not surprising since the DJIA only includes companies with large market capitalizations. Moreover, growth stocks are in the majority since the loading for the value factor in the time-series regressions is more often negative than positive. Companies with robust profitability or a conservative investment style are also in the majority since the loadings for the profitability and investment factors in the time-series regressions are more often positive than negative.

Note to Table 4: The monthly return is the daily geometric mean return transformed to the monthly geometric mean return under the assumption that there are 21 trading days per month, and the yearly return is the monthly geometric mean return transformed to the yearly geometric mean return.

Note to Table 5: The one-factor model is in (6), the two-factor model is in (4), the three-factor model is in (7), the four-factor model is in (5), the five-factor model is in (8), and the six-factor model is in (1). a/b/c shows that a specific factor (e.g., the sentiment factor, NMP) or the intercept is significant (at the 0.001 level, the 0.01 level, or the 0.05 level) for $a$ stocks in the 2005-09 subsample, $b$ stocks in the 2010-18 subsample, and $c$ stocks in the 2005-18 full sample.
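Counting the stocks for which a factor loading is significant, as in the summary of the first-stage results, can be sketched as below. The sketch assumes conventional (homoskedastic) OLS standard errors and a two-sided normal critical value of 1.96 for the 0.05 level; the text does not state which standard errors were used, and the function names and data are hypothetical.

```python
import numpy as np

def ols_t_stats(y, X):
    """OLS coefficients and conventional t-statistics for design matrix X."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef, coef / np.sqrt(np.diag(cov))

def count_significant(excess_returns, factors, critical=1.96):
    """For each factor, count stocks whose loading has |t| above `critical`.

    excess_returns: shape (T, N), one column per stock.
    factors: shape (T, K). Returns an integer array of shape (K,).
    """
    T, N = excess_returns.shape
    X = np.column_stack([np.ones(T), factors])
    counts = np.zeros(factors.shape[1], dtype=int)
    for i in range(N):
        _, t = ols_t_stats(excess_returns[:, i], X)
        counts += (np.abs(t[1:]) > critical).astype(int)  # skip the intercept
    return counts

# Synthetic data: every stock loads on factor 0, none loads on factor 1.
rng = np.random.default_rng(2)
T, N = 1500, 5
factors = rng.normal(0.0, 0.01, size=(T, 2))
betas = np.array([[1.0] * N, [0.0] * N])
excess = factors @ betas + rng.normal(0.0, 0.01, size=(T, N))

counts = count_significant(excess, factors)
print(counts)
```

Running the same count per sample and per model would give the kind of a/b/c entries that the summary table reports.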

Second-stage regressions
The factor loadings from the time-series regressions are next used in cross-sectional regressions, where the dependent variable in the regressions is the stocks' mean excess returns. In particular, for each factor model, there is a corresponding cross-sectional regression model. For example, the cross-sectional regression in (2) corresponds to the six-factor model in (1).
In addition to estimating the model in (2), we also run the following regressions:

\[ \overline{R_i - R_F} = \lambda_0 + \lambda_b \hat{b}_i + \lambda_n \hat{n}_i + u_i \tag{9} \]

and

\[ \overline{R_i - R_F} = \lambda_0 + \lambda_b \hat{b}_i + \lambda_n \hat{n}_i + \lambda_s \hat{s}_i + \lambda_h \hat{h}_i + u_i, \tag{10} \]

where $\lambda_n \neq 0$ if a negative media sentiment is a priced factor in the cross-section of stock returns. For the sake of completeness, we also run the following regressions in which the loading for the sentiment factor has been excluded from the models in (2) and (9)-(10):

\[ \overline{R_i - R_F} = \lambda_0 + \lambda_b \hat{b}_i + u_i, \tag{11} \]

\[ \overline{R_i - R_F} = \lambda_0 + \lambda_b \hat{b}_i + \lambda_s \hat{s}_i + \lambda_h \hat{h}_i + u_i, \tag{12} \]

and

\[ \overline{R_i - R_F} = \lambda_0 + \lambda_b \hat{b}_i + \lambda_s \hat{s}_i + \lambda_h \hat{h}_i + \lambda_r \hat{r}_i + \lambda_c \hat{c}_i + u_i. \tag{13} \]

The estimation results from the second-stage cross-sectional regressions for the two subsamples and the full sample are shown in Tables 6-11. First, the sentiment factor is a priced factor in the cross-section of stock returns in five of nine model specifications: the market model augmented with the sentiment factor in the two subsamples (p = 0.001) and the full sample (p = 0.01), and the Fama-French (1992, 2015) three- and five-factor models augmented with the sentiment factor in the second subsample (p = 0.01). Second, the market factor is a priced factor in all but one model specification (p = 0.05), the size and value factors are priced factors in four and five of twelve model specifications, respectively (p = 0.05), and the profitability and investment factors are each priced factors in four of six model specifications (p = 0.05). Finally, the sentiment, size, value, and investment factors have, on average, negative premia, and the market and profitability factors have, on average, positive premia in the cross-section of stock returns.

[...] the Fama-French (2015) five-factor model's explanation of the stocks' mean excess returns in the 2005-09 subsample and the 2005-18 full sample (p = 0.05).

Discussion
Without prior knowledge of the findings in Fang and Peress (2009), one would expect that markets function well enough and that investors are informed enough that there is no premium for investing in stocks with no media coverage (cf. market efficiency), let alone investing in stocks associated with negative news instead of stocks associated with not-so-negative news. However, because Fang and Peress (2009) showed that there is a premium for investing in stocks with no media coverage, we took their analysis one step further by focusing on stocks that actually have been covered by the media and asking whether there is a premium for investing in stocks associated with negative news. Specifically, we studied the language tone in 35,344 news articles published in the FT during a 14½-year period covering 40 companies that have been included in the DJIA. Note that our data set does not allow us to construct a factor for media coverage, as in Fang and Peress (2009), since there are not enough stocks without media coverage in any single year.
We found that a negative language tone in the news articles is a priced factor in five of nine asset-pricing models that aim to explain the cross-section of stock returns. Specifically, the sentiment factor is a priced factor in the market model augmented with the sentiment factor in the 2005-09 subsample (p = 0.001), the 2010-18 subsample (p = 0.001), and the 2005-18 full sample (p = 0.01), and in the Fama-French (1992, 2015) three- and five-factor models augmented with the sentiment factor in the 2010-18 subsample (p = 0.01). Moreover, the parameter estimate for the sentiment factor is negative in those five asset-pricing models. Thus, a negative media sentiment has, on average, a negative premium in the cross-section of stock returns.
Furthermore, factor-spanning regressions with the Fama-French (2015) five-factor model as the right-hand-side model confirm that the sentiment factor contributes to the model's explanation of the stocks' mean excess returns in the 2005-09 subsample and the 2005-18 full sample (p = 0.05). To be more precise, Jensen's alpha is positive in those two factor-spanning regressions, which means that the sentiment factor (or the sentiment portfolio) earns a positive risk-adjusted return. Hence, the mean return on the sentiment factor remains positive after risk adjustment by the Fama-French (2015) five-factor model.
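A factor-spanning regression of this kind regresses the sentiment factor's returns on the five Fama-French factors and inspects the intercept. The sketch below uses synthetic data with conventional OLS standard errors; the function name `spanning_alpha`, the choice of standard errors, and all numbers are assumptions for the illustration.

```python
import numpy as np

def spanning_alpha(factor_returns, rhs_factors):
    """Regress one factor on a set of other factors; return (alpha, t_alpha).

    factor_returns: shape (T,), e.g. daily returns on the sentiment factor.
    rhs_factors: shape (T, K), e.g. the five Fama-French factors.
    """
    y = np.asarray(factor_returns, dtype=float)
    F = np.asarray(rhs_factors, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n), F])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])
    se_alpha = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0])
    return coef[0], coef[0] / se_alpha

# Synthetic sentiment factor with a positive intercept and some value exposure.
rng = np.random.default_rng(3)
T = 3000
five_factors = rng.normal(0.0, 0.01, size=(T, 5))
nmp = 0.0002 + 0.3 * five_factors[:, 2] + rng.normal(0.0, 0.001, size=T)

alpha, t_alpha = spanning_alpha(nmp, five_factors)
print(alpha > 0, t_alpha > 1.96)
```

A significantly positive intercept indicates that the right-hand-side factors do not span the sentiment factor, which is the sense in which the sentiment portfolio earns a positive risk-adjusted return.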
Upon initial review, the outcomes from the cross-sectional regressions appear to conflict with those from the factor-spanning regressions. Specifically, one might wonder how a trading strategy that longs stocks with the most negative news and shorts stocks with the least negative news can be profitable, even after adjusting for risk. This is especially perplexing when the sentiment factor seems to reduce the stocks' mean excess returns in a factor model. However, this apparent inconsistency is resolved when considering that the mean return for the market factor outperforms that of the sentiment factor in two of three samples. Although investing in the sentiment portfolio would have yielded a profit, choosing the market portfolio would have been even more lucrative in two of three samples. Therefore, the positive risk-adjusted return of the sentiment portfolio is not at odds with the sentiment factor carrying a negative premium when the more profitable market portfolio is included in the factor model.
What can we learn from the time-series regressions for individual stocks? First, the sentiment factor was significant for as many stocks as the size, value, profitability, and investment factors, where the factors were significant in the three samples for approximately three-fourths of the stocks (p = 0.05). Second, if the regressions for individual stocks are studied, the loading for the sentiment factor is significantly positive in all nine regressions for Citigroup and JPMorgan Chase, it is significantly positive in six regressions for Goldman Sachs, and it is significantly negative in five and eight regressions for Apple and Microsoft, respectively (p = 0.05). These companies have in common that there were thousands of articles about each of them in the FT during the study period.
It is not surprising that the loading for the sentiment factor in the time-series regressions is positive for Citigroup, Goldman Sachs, and JPMorgan Chase because of the negative press surrounding the financial industry at the onset of and during the Great Recession. The loading for the same factor is also positive for American Express and Bank of America in seven and all nine regressions, respectively (p = 0.05). In other words, investors in those companies were compensated with a higher expected return due to the riskiness of stocks in the financial sector. Therefore, even though there is no support in this paper for the claim that a negative media sentiment is associated with a positive premium, one can easily find individual stocks in which investors have been compensated with a higher expected return because of the negative press surrounding these stocks.

Declaration of Competing Interest
The authors of the paper "Media sentiment and stock returns" declare no competing financial interests or personal relationships that could have influenced the work reported in the paper.

Table 3c
Correlation matrix for the factors in the models for the full sample.

Table 4
Monthly and yearly factor returns.

Table 5
Significant results in the time-series regressions.

Table 6
Estimation results from the second-stage cross-sectional regression, where the factor loading from the market model is the explanatory variable for the stocks' mean excess returns.

Table 7
Estimation results from the second-stage cross-sectional regression, where the factor loadings from the market model augmented with the sentiment factor are the explanatory variables for the stocks' mean excess returns.

Table 8
Estimation results from the second-stage cross-sectional regression, where the factor loadings from the Fama-French (1992) three-factor model are the explanatory variables for the stocks' mean excess returns.

Table 9
Estimation results from the second-stage cross-sectional regression, where the factor loadings from the Fama-French (1992) three-factor model augmented with the sentiment factor are the explanatory variables for the stocks' mean excess returns.