News-driven inflation expectations and information rigidities

We investigate the role played by the media in the expectations formation process of households. Using a novel news-topic-based approach, we show that the news types the media choose to report on, e.g., fiscal policy, health, and politics, are good predictors of households' stated inflation expectations. In turn, in a noisy information model setting, augmented with a simple media channel, we document that the underlying time series properties of relevant news topics explain the time-varying information rigidity among households. As such, we not only provide a novel estimate showing the degree to which information rigidity among households varies across time, but also provide, using a large news corpus and machine learning algorithms, robust and new evidence highlighting the role of the media for understanding inflation expectations and information rigidities. JEL-codes: C11, C53, D83, D84, E13, E31, E37


Introduction
In most democracies the fourth estate, i.e., the news media, plays an important role in society. The media not only has the capacity for advocacy and an implicit ability to frame political and economic issues; it is also the primary source from which most people get information. 1 In macroeconomics, expectations are center stage. But expectations are shaped by information, and information does not travel unaffected through the ether.
Rather, it is digested, filtered, and colored by the media. Surprisingly, however, the potential independent role of the media in the expectation formation process has received relatively little attention in macroeconomics, both in theory and in applied work. 2 In this paper we build on a growing literature providing evidence for a departure from the full information rational expectation (FIRE) assumption towards a theory of information rigidities (Coibion and Gorodnichenko (2012), Dovern et al. (2015), Coibion and Gorodnichenko (2015a), Armantier et al. (2016)), and investigate the potential role played by the media for households' inflation expectations in this setting.
Our analysis is motivated by two particular views giving rise to information rigidities.
First, we take the view that acquiring information can be looked upon as a choice variable (Sims (2003), Gorodnichenko (2008), Woodford (2009), Mackowiak and Wiederholt (2009)), implying that the degree of information rigidity is time-varying. For example, in times of trouble it might be worth devoting more resources towards staying informed than in more regular time periods. Second, as in Nimark and Pitschner (2018), we assume that no agent has the resources to monitor all events that are potentially relevant for her decision, and that agents thereby delegate their information choice to specialized news providers who report only a curated selection of events. That is, the media works as "information intermediaries" between agents and the state of the world. In a general, but abstract, theoretical model, Nimark and Pitschner (2018) show that this delegation is optimal when the information flow is overwhelming, and that media's news selection functions and the distributions of events jointly determine the degree to which knowledge about an event is common among agents. 3

We merge and test these two views within the setup proposed by Coibion and Gorodnichenko (2015a). Starting from a noisy information framework, where agents form and update beliefs about the underlying fundamentals via a signal extraction problem, they show that agents' forecast revisions are a sufficient statistic when testing for information rigidities in forecasting efficiency regressions. In Section 2 we augment this framework by introducing time variation in the underlying parameters, and an explicit, but simple, role for the media. It then follows that information rigidity is a function of the time-varying persistence in media coverage and the noise-to-signal ratio in the signal extraction problem. The mechanics of the model are straightforward.
When an important event happens, media coverage potentially becomes more concentrated and persistent around this event, and perhaps easier to filter (less noisy) for the agents. Accordingly, information rigidity is reduced as agents put more weight on new information relative to their previous forecasts.
This contrasts with (a time-varying version of) the conventional model, where the degree of information rigidity would be determined by the time series properties of inflation itself, but mirrors our assumption that the media works as "information intermediaries" between agents and the state of the world.
Focusing on households' one-year-ahead expectations of inflation, measured by the University of Michigan Surveys of Consumers, we proceed empirically in four successive steps. First, in Section 3, we use a time-varying parameter model to fit households' forecast errors with their forecast revisions, and assess whether information rigidities among households actually show high-frequency variation across time. As seen from the solid black line in Figure 1a, which illustrates our estimate of time-varying information rigidity among households, we provide a confirmatory answer to this question. On average, our estimate of the degree of information rigidity is just above one. Interpreted through the lens of the signal extraction model outlined in Section 2, this implies that households put a weight of roughly 0.4 on new information. In turn, this estimate is in line with the existing literature, but as seen from the figure, it is far from constant across the sample.
Second, in Sections 4.1 and 4.2, we use techniques from the Natural Language Processing (NLP) literature to construct 80 measures of the news topics the media writes about, i.e., the different types of news reporting, and map these high dimensional data to actual inflation expectations using penalized linear regressions of the LASSO type (Tibshirani (1996)). The intuition for why we focus on news topics, and how, is discussed further below, but as illustrated in Figure 1b, which graphs inflation expectations together with the fitted values from the LASSO, the news topics written about in the media have high predictive power for consumers' inflation expectations. While the degree of sparsity is large (from 80 potential news topic candidates only roughly 10 are selected and significant), we later show that the narrative realism of the news-topic-based approach is good.
Additional results strongly indicate that this type of textual data contain information not captured by a large set of conventional economic indicators, including inflation itself, confirming that the media is an important source for information among households.
Third, in Section 4.3, we combine the results obtained above, and investigate whether the evolution of households' information rigidity can be explained by the time-varying time series properties, i.e., the persistence and noise-to-signal ratio, in the statistically selected set of news topics. Running simple linear regressions, but taking aboard all posterior uncertainty, we confirm that it can. The regression fit is dominated by the persistence measure, and when media persistence (noise-to-signal) is high (low), information rigidities tend to be low (high), and vice versa, as theory predicts.
We later show, in a falsification experiment, that this result is highly unlikely to be obtained by chance. In particular, looking at the persistence in news topics that are not selected by the LASSO, i.e., not relevant for households' inflation expectations, we obtain mostly insignificant results. Further analysis, in Section 4.4, also confirms, in line with our hypothesis that households' inflation expectations are news-driven, that a negative correlation is not found between the persistence of inflation itself and information rigidity. 4

As alluded to above, important business cycle events and information rigidities are closely related. In state-dependent models of information rigidity, the degree of information rigidity should be inversely related to business cycle developments, and much lower after a large and visible shock than during normal periods (Mackowiak et al. (2018)). At the same time, in the related delegated information choice view of the world (Nimark and Pitschner (2018)), the distribution of events and media's reporting on those events jointly determine the degree of information rigidity. In Section 4.5 we report estimates from Vector Autoregressive (VAR) models confirming these predictions. We document a strong dynamic interaction between the business cycle and the media-based estimates of information rigidity proposed here. For policy institutions aiming at managing consumers' expectations to stabilize economic fluctuations, e.g., monetary policy (Galí (2008)), this also highlights the role of the media for their communication strategies. 5

Finally, as seen from Figure 1a, the Great Recession period was associated with a high level of information rigidity, and not a low one, as our model would predict. In Section 5 we try to understand why, further documenting the importance of the news media in the process. We show that the significant relationship between households' expectations and news topics withstands out-of-sample evaluation. Still, during the Great Recession period the predictive relationship between news topics and households' expectations breaks down, but strengthens when using news topics to predict actual inflation.

4 In contrast, results presented in Appendix E show that if we instead focus on expectations measured by the Survey of Professional Forecasters, the media does not matter, but the persistence in inflation itself does. This is in line with the intuition that the media matters foremost for households, and less so for professionals.
This suggests that the quality of information was good, but that something happened with how households consumed it. Using newspaper circulation statistics, which showed the largest cyclical fall since the 1940s during the Great Recession period, we find evidence consistent with what we label an information diffusion story: if the quality of information is good, but information diffusion is poor, the quality of information matters less. An alternative interpretation of this result is that households consumed a different type of news during the Great Recession period, e.g., (free) social media, and not that information diffusion was lower per se. Irrespective of interpretation, this is in accordance with our main conclusions about the news media's important role in the expectations formation process.
The contribution, and novelty, of our analysis is threefold. First, we provide direct evidence of time-variation in the degree of information rigidity among households in the U.S. Although results reported in Loungani et al. (2013), Coibion and Gorodnichenko (2015a), and Dovern et al. (2015) point towards low frequency state-dependence in the degree of information rigidity, we are, to the best of our knowledge, the first to provide a quantitative measure of high-frequency changes in the degree of information rigidity among households in the U.S. Second, we are the first to relate measures of households' information rigidity to the time series properties of the news, i.e., the persistence and noise-to-signal ratio. This is important because it puts our analysis well within an established theoretical framework used to test and explain information rigidities (Coibion and Gorodnichenko (2015a)).

5 As Blinder et al. (2008) write: "...if researchers are interested in testing market responses to communication, it may make sense to focus on statements that actually reach market participants, and on the content as conveyed by the media."
However, by analyzing the relationship between information frictions and the media, our analysis also speaks directly to work by Carroll (2003), Doms and Norman (2004), Pfajfar and Santoro (2008), Pfajfar and Santoro (2013), Lamla and Lein (2014), and Dräger and Lamla (2017). The seminal contribution by Carroll (2003) is particularly well known. He shows, in an epidemiological model of inflation expectations, that households update their beliefs towards professionals, assumed to express their views through the media and to be fully informed, more frequently in periods of intense media reporting on inflation. The epidemiological view, however, has later been questioned by Pfajfar and Santoro (2013), who show that available and perceived news stories do not help restrict the forecast gap between professionals and households, but rather widen it.
Finally, we make an important contribution in how we use text as data to better understand the expectations formation process among households. In contrast to the earlier studies cited above, which have almost exclusively derived quantitative media measures by counting terms related to inflation in the news, we work with the assumption that many news items might be relevant for inflation expectations, even without explicitly mentioning terms related to inflation. As such, we hypothesize that when the media writes extensively about topics related to, e.g., politics, even without explicitly mentioning terms related to inflation, this reflects that something is happening in this area that potentially has economy-wide effects, and might therefore also affect inflation expectations. In line with this, we find that news stories about, e.g., fiscal policy, health, and politics, significantly affect households' inflation expectations. Importantly, we also show that the news-topic-based approach adopted here delivers results in accordance with theory, while an approach relying on simple word counts does not.
Technically, the news-topic-based approach is operationalized by estimating a topic model, of the Latent Dirichlet Allocation (Blei et al. (2003)) class, on a large news corpus extracted from the Dow Jones Newswires Archive (DJ). Following Larsen and Thorsrud (2018b), the topic decomposition is then transformed into tone-adjusted time series, measuring how much each topic is written about in the media at any given point in time. 6

In sum, the analysis conducted here provides positive evidence in favor of the state-dependent information rigidity view, as advocated in, e.g., Sims (2003), Mackowiak and Wiederholt (2009), and Coibion and Gorodnichenko (2015a). However, as in Nimark and Pitschner (2018), our analysis emphasizes the role of information providers. As such, this study also speaks to the literature trying to identify the causal effect of the media. While

6 Thorsrud (2018) and Larsen and Thorsrud (2018a) show that a similar topic decomposition of economic news can be used to construct daily business cycle indicators with very good classification and nowcasting properties for GDP growth. See also Hansen and McMahon (2015), Larsen (2017), Hansen et al. (2018) and Dybowski and Adämmer (2018) for related economic applications of the LDA technology.

Information rigidities in theory
There are many theoretical models that predict a departure from the full information part of the FIRE assumption used in textbook economics. For example, Mankiw and Reis (2002) propose a sticky-information model where agents update their information sets infrequently as a result of fixed costs to the acquisition of information, while Sims (2003) and Woodford (2003) have proposed mechanisms categorized as noisy information models. In Sims (2003), for example, the underlying assumption is that people's ability to process information is constrained and that they thereby rationally choose what information to pay attention to (rational inattention).
Starting from a noisy information framework, where agents form and update beliefs about the underlying fundamentals via a signal extraction problem, Coibion and Gorodnichenko (2015a) show that economic agents' forecast revisions are a sufficient statistic to test whether expectations are rational, in the FIRE sense, or exhibit information rigidities consistent with (all) the theories mentioned above. We take the view that acquiring information can be looked upon as a choice variable (Sims (2003), Gorodnichenko (2008), Woodford (2009), Mackowiak and Wiederholt (2009)), implying that the degree of information rigidity is time-varying. We also take the view that media works as "information intermediaries" between agents and the state of the world (Nimark and Pitschner (2018)).
Below we merge and incorporate these two assumptions in the Coibion and Gorodnichenko (2015a) framework.

The role of media
We start by making the assumption that most people do not follow inflation as measured by the statistical agency per se, but get information about inflation through the media.
While this information-object is high dimensional, letting π^N_t denote an aggregated measure of relevant media coverage, the signal agent i receives about inflation at time period t can be written as:

π^N_t(i) = π^N_t + ω_{it},  (1)

where ω_{it} ~ N(0, σ²_{ωt}) is idiosyncratic noise capturing heterogeneity in forecasting "models" across agents, while potential heterogeneity in the signal noise across time is captured by time dependence in σ²_{ωt}. In our framework, the noise term could be thought of as capturing heterogeneity in how different agents weigh and interpret different news sources and items. For example, not all agents (if any) read, and interpret correctly, all articles relevant for forecasting inflation.
We further assume that the media actually fulfills its purpose in informing the public about important developments in society, including inflation. However, exactly how the media does this, e.g., the systematic editorial decisions, resources used, and discussions within the media houses, is not observable to the agents. Thus, the relationship between actual inflation, π_t, and media's coverage of inflation is specified as:

π^N_t = α_t + π_t,  (2)

where α_t is a time-fixed effect, capturing for example potential media biases (Pfajfar and Santoro (2008), Lamla and Lein (2014)). Importantly, as the agents only observe the left-hand side of (2), they do not know that the news they receive, with noise, does not map one-for-one to actual inflation. 7 To introduce dynamics into the model we specify the time series properties of media coverage as a simple autoregressive process of order one:

π^N_t = ρ^N_t π^N_{t-1} + ν^N_t,  (3)

where ν^N_t ~ N(0, σ²_{νt}). Again, we allow for potential time dependencies in the process by letting both ρ^N_t and σ²_{νt} depend on the time index t, where variation in, e.g., ρ^N_t can be due to major economic or political events that become extensively covered by the media.
Together, equations (1) and (3) constitute a conditional linear state space system, and, removing the time-varying parameter specification and replacing the π^N_t terms with π_t, the system becomes identical to the one in Coibion and Gorodnichenko (2015a). Using the standard Kalman filter recursions, the variance of the prediction error, Ψ_t ≡ P_{t|t-1} ≡ E[(π^N_t − π^N_{t|t-1}(i))(π^N_t − π^N_{t|t-1}(i))'], can be written as:

Ψ_t = (ρ^N_t)² Ψ_{t-1} σ²_{ωt} / (Ψ_{t-1} + σ²_{ωt}) + σ²_{νt},  (4)

which is known as the Riccati equation. From this it follows that the Kalman Gain, capturing the weight assigned to new information about π^N_t contained in the prediction error, is given by:

K_t = Ψ_t / (Ψ_t + σ²_{ωt}).  (5)

As seen from (5), this weight depends on the persistence of media coverage, ρ^N_t, and on the amount of noise in the signal, σ²_{ωt}. The forecast for the unobservable state is then given by:

π^N_{t|t}(i) = K_t π^N_t(i) + (1 − K_t) π^N_{t|t-1}(i).  (6)

Averaging equations (1) and (6) across agents, and iterating h periods forward, equation (6) becomes:

π^N_{t+h} − F_t π^N_{t+h} = ((1 − K_t)/K_t)(F_t π^N_{t+h} − F_{t-1} π^N_{t+h}) + ν^N_{t+h,t},  (7)

where ν^N_{t+h,t} = Σ_{j=1}^{h} (ρ^N_t)^{h−j} ν^N_{t+j}, and F_t π^N_{t+h} is the agents' expected future media coverage. We observe neither π^N_{t+h} nor F_t π^N_{t+h}.

7 We show in Appendix C that if agents also form an expectation about the time-varying constant in (2), the time-varying information rigidity will be a function of the time series properties of inflation itself, and not the news, as in our setup (shown below). However, as documented in Section 4.4, using actual inflation gives results at odds with theory, suggesting that such an assumption is questionable. In general, the assumptions behind equations (1) and (2) are also consistent with a substantial literature showing that people are not fully informed about, e.g., their tax credits (Chetty and Saez (2013)), returns to schooling (Jensen (2010), Wiswall and Zafar (2014)), and their marginal price for basic consumption goods such as electricity and water (Carter and Milon (2005)).
However, using (2), we can write equation (7) as:

π_{t+h} − F_t π_{t+h} = c_t + β_t (F_t π_{t+h} − F_{t-1} π_{t+h}) + ν^N_{t+h,t},  (8)

where F_t π_{t+h} is the households' expressed expectation of future inflation, β_t = (1 − K_t)/K_t, and c_t collects the terms involving the time-fixed effect α_t. As in, e.g., Coibion and Gorodnichenko (2015a), equation (8) describes the relationship between ex-post forecast errors and ex-ante mean forecast revisions. Although individuals form their forecasts rationally conditional on their information set, the ex-post mean forecast error across agents is systematically predictable using ex-ante mean forecast revisions due to gradual adjustment of beliefs to new information. A higher value of β_t implies a higher degree of information rigidity. Conversely, if β_t = c_t = 0, we are back in the world of FIRE.
We depart from this earlier literature by introducing an explicit media channel and time variation in (1) and (3). This implies that the degree of information rigidity, β t , is time-varying and depends on the time series properties of media coverage. In particular, because β t is a function of the Kalman Gain (5), information rigidity is decreasing if the persistence of media coverage (ρ N t ) is high, and increasing if the amount of noise in the signal received by households (σ 2 ωt ) is high (relative to σ 2 νt ). In contrast, in the conventional model, without a media channel, agents are assumed to follow inflation directly. Thus, the degree of information rigidity would be determined by properties of inflation itself.
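The comparative statics of the Kalman gain just described can be illustrated numerically. The sketch below (function name and parameter values are our own, not from the paper) iterates the one-dimensional Riccati recursion and returns the implied gains: higher media persistence raises the gain, and hence lowers information rigidity, while more signal noise lowers it.

```python
import numpy as np

def kalman_gain_path(rho, sigma_omega2, sigma_nu2, T=200):
    """Iterate the Riccati recursion for the one-dimensional signal
    extraction problem and return the implied Kalman gains K_t.
    Each parameter may be a scalar or a length-T array (time-varying)."""
    rho = np.broadcast_to(np.asarray(rho, float), (T,))
    s_w = np.broadcast_to(np.asarray(sigma_omega2, float), (T,))
    s_v = np.broadcast_to(np.asarray(sigma_nu2, float), (T,))
    psi = s_v[0]  # initial prediction-error variance
    gains = np.empty(T)
    for t in range(T):
        gains[t] = psi / (psi + s_w[t])  # K_t = Psi_t / (Psi_t + sigma_omega^2)
        # Riccati update for the next prediction-error variance
        psi = rho[t] ** 2 * psi * s_w[t] / (psi + s_w[t]) + s_v[t]
    return gains
```

For constant parameters the gain converges to its steady state, and one can verify that raising the persistence parameter increases the steady-state gain, while raising the noise variance decreases it, as the theory predicts.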

Two testable hypotheses
Together, equations (5) and (8) deliver two testable hypotheses. First, one can estimate (8) to gauge the degree of time variation in the parameters. Cross-sectional results in Coibion and Gorodnichenko (2015a) point towards state dependent low-frequency variation in information rigidity among professional forecasters. However, to the best of our knowledge, nobody has tested the degree to which information rigidity among households (in the U.S.) varies across time. We do so in Section 3.
Second, and conditional on time-variation in β_t, one can use (5) and test if the underlying time-varying persistence and noise-to-signal ratio in media coverage explain the evolution of β_t = (1 − K_t)/K_t. However, to operationalize such a test, one needs a measure of media coverage that is relevant for households' inflation expectations. In Section 4 we first propose a measure of media coverage and evaluate its relevance for inflation expectations.
Then, we test if the underlying time-varying persistence and noise-to-signal ratio in media coverage relevant for inflation expectations can explain the evolution of β t .
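The mapping from the Kalman gain to the rigidity coefficient used in the second hypothesis is a simple transformation; a toy sketch (our own illustration, not the paper's code):

```python
import numpy as np

def implied_rigidity(gains):
    """Map Kalman gains K_t into the information-rigidity coefficient
    beta_t = (1 - K_t) / K_t implied by equations (5) and (8)."""
    gains = np.asarray(gains, float)
    return (1.0 - gains) / gains

# Full information (K = 1) implies beta = 0; a weight of 0.4 on new
# information implies beta = 1.5; K = 0.25 implies beta = 3.
betas = implied_rigidity([1.0, 0.5, 0.4, 0.25])
```

This makes clear why a higher gain (more weight on new information) corresponds to lower measured information rigidity.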

Information rigidities in the data
To bridge our analysis with the earlier literature, we start by estimating a static version of equation (8), and then turn to the more advanced time-varying parameter specification towards the end of the section. In both cases, we use the Michigan Survey of Consumers (MSC) monthly measure of one year ahead CPI inflation as F_t π_{t+h}, and U.S. headline CPI inflation as a measure of realized inflation π_{t+h}. Because the MSC only contains households' forecast of inflation over the course of the next year, revisions to these forecasts will not have perfectly overlapping time periods. Accordingly, the static model we estimate is:

π_{t+12,t+1} − F_t π_{t+12,t+1} = c + β(F_t π_{t+12,t+1} − F_{t-1} π_{t+11,t}) + u_t,  (9)

where π_{t+12,t+1} is actual inflation over the next year, and F_t π_{t+12,t+1} is the households' expectation, at time t, of inflation over the next year. Thus, the left-hand and right-hand side variables in (9) are the forecast errors and revisions, respectively. Both variables are graphed in Figure 8 in Appendix B. Table 1 reports the results of estimating (9) using the IV estimator. 9 The parameter of foremost interest is β. A rejection of the null hypothesis of β = 0 indicates potential information rigidities. As seen from column I, the β estimate is large and significant.
Moreover, the results reported in columns II to VI show that this finding remains robust when including the same type of control variables as used in Coibion and Gorodnichenko (2015a) when analyzing information rigidity among professional forecasters, namely lagged values of inflation, unemployment, oil price growth, and the T-bill rate.

9 Due to the non-overlapping time periods in forecast revisions, OLS estimates of (9) will be biased since the error term consists of the rational expectations forecast error. To avoid this issue, we follow Coibion and Gorodnichenko (2015a), and instead apply an IV estimator using the (log) change in the monthly price of oil as an instrument. Table 6, in Appendix B, reports the first stage regression results when the (log) change in the monthly oil price is used as an instrument for the households' forecast revisions. As seen from the table, the instrument is strong and relevant. In the remaining part of this analysis, we will therefore be using the households' forecast revisions instrumented by the price of oil.

Table 1. Michigan Survey of Consumers inflation forecast errors and revisions. Each column reports the results of the following regression: π_{t+12,t+1} − F_t π_{t+12,t+1} = c + β(F_t π_{t+12,t+1} − F_{t-1} π_{t+11,t}) + δz_{t-1} + u_t. Since the forecast revisions contain non-overlapping forecast horizons, an instrumental variable approach is used to avoid having rational expectation forecast errors in the error term. Newey-West corrected standard errors are reported in parentheses. *, **, and *** indicate that coefficients are statistically significant at the 10, 5, and 1 percent level, respectively. See the text for details about the additional controls z_{t-1} and the first-stage IV estimates. The sample period is 1990:01-2016:12.
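The IV logic can be sketched on synthetic data: when the regressor (the forecast revision) is correlated with the error term, OLS is biased, while a two-stage least squares estimator using an exogenous instrument (oil price changes in the paper) recovers the true coefficient. All variable names and data below are illustrative, not the paper's:

```python
import numpy as np

def iv_2sls(y, x, z):
    """Two-stage least squares for one endogenous regressor x with one
    instrument z; both stages include a constant. Returns (c, beta)."""
    Z = np.column_stack([np.ones_like(z), z])
    # First stage: project the endogenous regressor on the instrument.
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    X_hat = np.column_stack([np.ones_like(x_hat), x_hat])
    # Second stage: regress the outcome on the fitted regressor.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

rng = np.random.default_rng(0)
n = 500
oil = rng.normal(size=n)                                # instrument
u = rng.normal(size=n)                                  # common shock
revision = oil + 0.5 * u + 0.5 * rng.normal(size=n)     # endogenous regressor
error = 1.2 * revision - 0.9 * u + 0.3 * rng.normal(size=n)  # true beta = 1.2

c_hat, beta_hat = iv_2sls(error, revision, oil)
# OLS for comparison: biased towards 0.9 here because of the shared shock u.
ols_c, ols_beta = np.linalg.lstsq(
    np.column_stack([np.ones(n), revision]), error, rcond=None)[0]
```

In this simulated design the 2SLS estimate is close to the true coefficient of 1.2, while OLS is biased downwards by the common shock.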
To further test the robustness of these results, we first control for a total of 10 factors obtained from the monthly FRED-MD database developed by McCracken and Ng (2016). 10 As it is well known that factors like these capture a large bulk of the co-movement among macroeconomic indicators, potential omitted variable biases should be less severe than when only controlling for subjectively chosen single indicators. Next, for the same reason, but to avoid the reliance on factor estimates, we also run a double selection procedure (Belloni et al. (2014)). In short, the double selection algorithm is implemented as follows: First, we regress the treatment (forecast revisions) and the dependent (forecast errors) variables on all the variables in the FRED-MD data set using the LASSO estimator (described in greater detail in Section 4.2). Next, after these two separate penalized regressions, we run an OLS regression on the dependent variable, including the treatment variable and the union of the control variables selected in step one. The final parameter estimates of β from these two additional tests are reported in columns VII and VIII of Table 1. While the point estimates become somewhat smaller when controlling for a larger set of variables, the results are still significant at either the 1 or 5 percent level.

10 The FRED-MD is a much used data set in macroeconomics, and contains roughly 130 monthly economic indicators. The data is briefly described in Appendix A. The 10 factors extracted from the FRED-MD data set are obtained using conventional Principal Components Analysis (PCA), see, e.g., Stock and Watson (1989).
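The double selection steps described above can be sketched on simulated data (a hedged illustration using scikit-learn, not the authors' implementation; the data-generating process is our own):

```python
import numpy as np
from sklearn.linear_model import LassoCV

def double_selection(y, d, X):
    """Belloni et al. (2014) style double selection: lasso the outcome y
    on the controls X, lasso the treatment d on X, then OLS of y on d
    plus the union of the controls selected in either step."""
    sel_y = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)
    sel_d = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)
    keep = np.union1d(sel_y, sel_d)
    W = np.column_stack([np.ones(len(y)), d, X[:, keep]])
    return np.linalg.lstsq(W, y, rcond=None)[0][1]  # treatment coefficient

rng = np.random.default_rng(1)
n, p = 400, 50
X = rng.normal(size=(n, p))
d = X[:, 0] + rng.normal(size=n)                    # treatment depends on X[:, 0]
y = 0.5 * d + 2.0 * X[:, 0] + rng.normal(size=n)    # true effect 0.5
beta = double_selection(y, d, X)
```

Because the confounding control enters both selection steps, it is retained in the final OLS, and the treatment coefficient is estimated consistently despite the many irrelevant controls.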
The estimates in Table 1 strengthen the conclusions drawn in earlier research about the presence of information rigidities. However, the theory models we implicitly build on (Sims (2003), Gorodnichenko (2008), Woodford (2009), Mackowiak and Wiederholt (2009)), and the model in Section 2, imply that the degree of information rigidity varies across time. 11 We turn to this next.

Time-varying information rigidities?
By construction, allowing the parameters in (9) to change through time will deliver a better model fit. On the other hand, controlling for a large set of other relevant variables, as in columns VI to VIII in Table 1, becomes substantially more difficult. Time-varying parameter models are already highly parameterized (one parameter for each time period), and increasing the model size with more explanatory variables makes this challenge even more severe. For this reason, we estimate the time-varying version of (9):

π_{t+12,t+1} − F_t π_{t+12,t+1} = c_t + β_t(F_t π_{t+12,t+1} − F_{t-1} π_{t+11,t}) + u_t,  (10)

using the Latent Threshold Model (LTM) idea of Nakajima and West (2013). Here, dynamic sparsity is enforced on the system through a latent threshold mechanism, which shrinks the parameters towards zero whenever they are not contributing significantly to the model fit. Accordingly, the time-varying parameter model we specify is parsimonious in its size, as we only include forecast revisions as explanatory variables, but also, due to the threshold mechanism, faithful to the null hypothesis of full information, i.e., β_t = 0.
Formally, dynamic sparsity is enforced on the system through the time-varying parameters c_t and β_t. For, e.g., β_t, the LTM structure can be written as:

β_t = β*_t s_t,  s_t = I(|β*_t| ≥ d_β),

where we let β*_t follow a random walk process: β*_t = β*_{t-1} + η_{βt}. As is common for many time-varying parameter models, estimation of (10) is done by drawing from the conditional posterior distribution using MCMC simulations. In the interest of preserving space, details about priors, initialization, and the estimation algorithm are relegated to Appendix G.1.
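The latent threshold mechanism itself is simple to sketch. The full Bayesian estimation with MCMC is not reproduced here; the threshold value and innovation scale below are illustrative choices of our own:

```python
import numpy as np

def latent_threshold_path(beta_star, d):
    """Nakajima-West latent threshold mechanism: the effective coefficient
    equals the latent process beta* only when its magnitude clears the
    threshold d, and is shrunk exactly to zero otherwise."""
    beta_star = np.asarray(beta_star, float)
    return np.where(np.abs(beta_star) >= d, beta_star, 0.0)

rng = np.random.default_rng(2)
T = 300
beta_star = np.cumsum(0.05 * rng.normal(size=T))  # latent random walk beta*_t
beta = latent_threshold_path(beta_star, d=0.3)    # thresholded beta_t
```

This is how the model stays faithful to the full-information null: whenever the latent coefficient wanders close to zero, the effective coefficient is exactly zero rather than merely small.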
The time-varying posterior estimates of β t were plotted already in Figure 1a in Section 1. We clearly see that the degree of information rigidity varies substantially across the sample. During the U.S. recessions in the early 1990s it started out low, but then increased sharply both during and in the years after the recession end. The mid 1990s were associated with a high degree of information rigidity. Then, well before the 2001 recession episode, information rigidity started falling, and remained low until the mid 2000s, before it increased substantially again in the years prior to the Great Recession. Since then it has remained at a relatively high level, albeit with small drops after the Great Recession period and towards the end of the sample. Although not our primary focus, we also observe that the c t parameter is downward trending, see Figure 10a in Appendix B. As (10) is basically a (time-varying) forecast efficiency regression, this suggests a departure not only from full information, but also from rational expectations where forecast errors should be white noise. In light of the theory model in Section 2.1, one interpretation of this parameter is that it captures media biases and that such biases are not constant across time, as also suggested by findings in Souleles (2004).
How large is the degree of information rigidity? The static results reported in Table 1 indicate that the degree of information rigidity is substantial. Interpreted through the lens of the model in Section 2.1, our estimates suggest that agents put a weight of less than 0.35-0.47 on new information, and more than 0.53-0.65 on their previous forecasts, i.e., K = 1/(1 + β). These numbers are well in line with Coibion and Gorodnichenko (2015a), who find similar magnitudes for the Survey of Professional Forecasters forecast of the GDP deflator. Alternatively, in the context of sticky-information models, Coibion and Gorodnichenko (2015a) show that these estimates equivalently imply an updating frequency of every six to eight months. This is close to twice as frequent as in the epidemiological model estimated by Carroll (2003), and more than twice as frequent compared to results presented by Dräger and Lamla (2017) and Doepke et al. (2008) for U.S. and European households, respectively. But, as shown above, the estimates of information rigidity are time-varying, and therefore also sample dependent.
Looking at the time-varying parameter estimates in Figure 1a, we obtain results similar to the static ones on average. However, the weight put on new information, relative to old, varies from basically 1, during the recessionary periods in the early 1990s and 2000s, to less than 0.25 during the Great Recession period. Again, interpreted in the context of sticky-information models, this implies updating frequencies ranging from every month to roughly every 10 months. Compared to Lamla and Sarferaz (2012), who provide evidence of time variation in information rigidity among German households ranging from 2 to 33 months, these estimates are still modest. But, as discussed in Coibion and Gorodnichenko (2015a), the magnitudes of information rigidity we document here have profound macroeconomic effects in theoretical models incorporating information frictions.

Expectations and news
Having established that the degree of information rigidity varies significantly across time, we now turn to our second question: Can the underlying time-varying persistence and noise-to-signal ratio in media coverage explain the evolution of β_t? To address this question we proceed in three steps. First, we introduce quantitative measures of news coverage using a statistical topic model. Next, we construct a mapping between the derived news topics and households' inflation expectations by running penalized linear regressions. Under the assumption that only news topics with predictive power for actual expectations should be relevant for describing the information households care about, the idea is to construct an approximation to the high-dimensional object π_t^N in equation (1). 12 Finally, we test whether the underlying time-varying persistence and noise-to-signal ratio in media coverage, relevant for inflation expectations, can explain the evolution of β_t.

The news
The main raw media data used in this analysis consist of roughly 5 million news articles from the Dow Jones Newswires Archive (DJ), covering the period 1990 to 2016. The database covers a large range of Dow Jones' news services, including content from The Wall Street Journal. All text is business-focused and written in English.
Arguably, most households do not read, e.g., The Wall Street Journal. However, it is very likely that news stories relevant for inflation expectations are covered by this type of source, and that such coverage spills over to news sources that households follow more directly. King et al. (2017), for example, provide a convincing randomized experiment showing that even articles reported in small media outlets affect the nationwide discussion of the articles' specific subjects. Moreover, The Wall Street Journal is one of the largest newspapers in the United States in terms of circulation, and therefore leaves a large footprint in the U.S. media landscape. Ideally, of course, one would want to work with the exact media content people consume (if that were measurable), together with their individually stated inflation expectations. As a second best, we use aggregated inflation expectations, and one important news source.
To make the high-dimensional, and unstructured, textual data applicable for time series analysis, i.e., to explain time-varying information rigidities, we follow Larsen and Thorsrud (2018b) and Thorsrud (2018), and work with the simple assumption that the more intensively a given topic is represented in the media at a given point in time, the more likely it is that this topic represents something of importance for the economy's current and future needs and developments, including inflation expectations. This assumption is operationalized by doing a topic decomposition of the news corpus, i.e., all the text and articles, using a Latent Dirichlet Allocation (LDA) model (Blei et al. (2003)). This model can be looked upon as a factor model applied to text, where each article is treated as a mixture of topics, while each topic is treated as a mixture of words (terms). The LDA model is also one of the most popular clustering algorithms in the NLP literature because of its simplicity, and because it has proven to classify text in much the same manner as humans would do (Chang et al. (2009)). Thus, the topic decomposition transforms something that is large and complex, i.e., the corpus, into something that is relatively small, dense, and interpretable.
As is common in this literature, and prior to estimation, the news corpus is cleaned (Gentzkow et al. (2017)). We remove stop-words, do stemming, and apply term frequency-inverse document frequency (tf-idf) calculations. A more detailed description of these steps is given in Appendix F.1. Likewise, in the interest of preserving space, we describe the technical details related to the LDA and its estimation in Appendix F.2. Here we note that, based on Larsen and Thorsrud (2018b) and Thorsrud (2018), we extract 80 different topics in total, and use the average of the last 10 iterations of the Gibbs simulations, used to estimate the LDA, as our measure of article weights and topics. Using the output from the LDA, the topic decomposition is transformed into time series, measuring how much each topic is written about at any given point in time. We note that, by definition, on a given day, more coverage of one particular news topic leads to less coverage of other topics, i.e., the topic probabilities sum to 1 on each day in the sample. Across time, however, there can be large variation in the topic contributions. Finally, we compute the tone of the news using a simple dictionary-based approach, counting positive relative to negative words in articles relevant for each news topic, and sign-adjust the topic frequencies accordingly. A more detailed description of this latter step is relegated to Appendix F.3. 13

To build intuition, Figure 2 illustrates the output from the above steps for six of the 80 topics. A full list of the estimated topics is given in Table 5, in Appendix A.

[Footnote 13: The results presented in Thorsrud (2018) highlight that tone-adjusted topic frequencies perform much better for nowcasting GDP growth than un-adjusted topic frequencies do. That is, whether the news is positive or negative matters. In Appendix D we show that the same applies in the current setting.]
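The cleaning steps described above can be sketched as follows. The suffix-stripping "stemmer" below is a crude stand-in for a proper stemmer (e.g., Porter's), and the two toy documents are not from the corpus; the paper's actual pipeline (Appendix F.1) is more elaborate.

```python
from sklearn.feature_extraction.text import TfidfVectorizer, ENGLISH_STOP_WORDS

def crude_stem(token):
    # Very rough stemming: strip a few common suffixes. A stand-in only.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def clean(doc):
    # Lowercase, keep alphabetic tokens, drop stop-words, then stem.
    tokens = [t for t in doc.lower().split() if t.isalpha()]
    tokens = [crude_stem(t) for t in tokens if t not in ENGLISH_STOP_WORDS]
    return " ".join(tokens)

docs = ["Prices are rising fast", "Wages rise while prices stabilize"]
cleaned = [clean(d) for d in docs]

# tf-idf down-weights terms that appear in many documents.
tfidf = TfidfVectorizer().fit_transform(cleaned)
print(cleaned)
```

The resulting weighted document-term matrix is what the topic model is then estimated on.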
The LDA produces two outputs: one distribution of topics for each article in the corpus, and one distribution of words for each of the topics. The latter distributions are illustrated using word clouds in Figure 2. A bigger font indicates a higher term probability. As the LDA estimation procedure does not give the topics any name, labels are subjectively assigned to each topic based on the most important terms associated with it. How much each topic is written about at any given point in time, and its tone, is illustrated in the graphs below each word cloud. The graphs should be read as follows: progressively more positive values mean the media writes more about this topic, and that the tone of reporting on this topic is positive. Conversely, progressively more negative values mean the media writes more about this topic, but that the tone of reporting is negative. Across topics, our simple hypothesis is that these fluctuations can tell us something important about which narratives dominate in the public discourse at different points in time.
Starting with Carroll (2003), the conventional method for quantifying media coverage of inflation has been to apply Boolean techniques. That is, simply counting (subjectively defined) terms related to inflation in every news article (or headline), and then constructing time series based on aggregated daily or monthly (normalized) counts. Although we also apply this method, in Section 4.4, our preferred method for quantifying media coverage relevant for inflation expectations relies on the topic-based decomposition. The advantage of this procedure is that articles that are relevant for inflation, but do not use the term inflation explicitly, might be captured by the more general topics. Still, a large number of topics is needed to describe the news corpus, making the mapping between inflation expectations and news a high-dimensional variable selection problem. We turn to this next.

News-driven inflation expectations?
To find the set S of news topics relevant for households' inflation expectations, we run linear predictive regressions of the form:

F_t π_{t+12,t+1} = α + Σ_{n=1}^{M} δ_n X_{n,t−1} + ε_t,    (13)

where F_t π_{t+12,t+1} is the households' expectation, at time t, of inflation over the next year, and M is the number of news topics X_{n,t−1}. Each news topic is lagged one period relative to F_t π_{t+12,t+1} to avoid simultaneity issues and look-ahead biases. Our results are also robust to the inclusion of more than one lag, and we will later augment the right-hand side of (13) with a large set of hard economic indicators.
Among the large set of M predictors, we are interested in those that contribute significantly in predicting F t π t+12,t+1 . However, as the number of explanatory variables M is high relative to the number of periods T in our data sample, the standard ordinary least squares (OLS) estimator is inappropriate. Instead, we use the Least Absolute Shrinkage and Selection Operator (LASSO) method, first proposed by Tibshirani (1996). In contrast to OLS, this method is built for high-dimensional variable selection problems, and shrinks parameter estimates for unimportant variables towards zero. The LASSO thereby encourages simple and sparse models.
Formally, letting y = (F_1 π_{1+12,2}, . . . , F_T π_{T+12,T+1})′ be a T × 1 response variable, and X = [X_0, . . . , X_{T−1}]′ be the T × M matrix of predictors, the LASSO algorithm solves the constrained least squares problem:

δ̂ = argmin_δ { (y − Xδ)′(y − Xδ) + λ Σ_{n=1}^{M} |δ_n| },    (14)

where λ ≥ 0 is a tuning parameter controlling the amount of regularization. If λ = 0, (14) yields the OLS solution. If λ > 0, coefficients will be shrunk towards 0. As is common in the literature, we choose the optimal value of the tuning parameter using 5-fold cross-validation (CV) and minimum mean squared error (MSE) loss. To further avoid over-fitting, we choose the sparsest model within one standard error of the minimum loss, but note that our results are robust to choosing instead the more highly parameterized minimum-MSE solution. Prior to estimation, all variables are standardized to make the penalized regressions invariant to scale. Finally, as LASSO parameter estimates will be pulled towards zero, and thereby be biased, we follow Belloni and Chernozhukov (2013) and run the post-LASSO routine, i.e., OLS on the selected variable set, when reporting the results and making inference.
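The selection procedure (5-fold CV, the one-standard-error rule, and the post-LASSO OLS refit) can be sketched as follows. The data are simulated: X stands in for the standardized news-topic series and y for households' expectations, with two (arbitrarily chosen) truly relevant columns.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV, LinearRegression

rng = np.random.default_rng(0)
T, M = 200, 80                              # periods and candidate news topics
X = rng.standard_normal((T, M))             # stand-in for standardized topic series
y = 1.0 * X[:, 0] - 0.7 * X[:, 1] + rng.standard_normal(T)

cv = LassoCV(cv=5).fit(X, y)                # 5-fold CV over a grid of penalties
mse_mean = cv.mse_path_.mean(axis=1)
mse_se = cv.mse_path_.std(axis=1) / np.sqrt(cv.mse_path_.shape[1])
i_min = mse_mean.argmin()

# One-standard-error rule: the sparsest model (largest penalty) whose CV loss
# lies within one SE of the minimum.
within = mse_mean <= mse_mean[i_min] + mse_se[i_min]
alpha_1se = cv.alphas_[within].max()

selected = np.flatnonzero(Lasso(alpha=alpha_1se).fit(X, y).coef_)
post_lasso = LinearRegression().fit(X[:, selected], y)   # post-LASSO OLS refit
print("selected columns:", selected)
```

The post-LASSO step re-estimates the retained coefficients without the shrinkage bias, which is what the reported coefficients and test statistics are based on.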
The column labeled News in Table 2 summarizes our main results. Among 80 potential news topics, 11 are selected by the LASSO. Of these, 6 are significant at the 10 percent level or lower. As the LASSO regressions are just predictive relationships used for variable selection, we do not spend time on interpreting the sign of the coefficients. The adjusted R² statistic, however, is informative, and as high as 41 percent. Thus, as also seen in Figure 1b, the selected news topics explain a relatively large fraction of the total variation in households' inflation expectations. The relevant set includes topics like Health, Internet, Clients, Aviation, Labor market, The White House, and M&A, while the partial R² statistics suggest that the topics Health and Internet contribute the most to the regression fit.
Of course, many news articles are just reporting on hard economic indicators the households might actively follow. However, the independent relevance of news topics for describing households' inflation expectations is robust to augmenting the news topic regressors in (13) with the 130 variables in the FRED-MD database (described in Section 3), and re-estimating the LASSO. As seen from the column labeled News and Hard in Table 2, while some of the topics selected in the news-only regression become insignificant when controlling for the hard economic indicators, most of them survive (and only one new topic gets selected). And, the adjusted R² increases only from 41 to 61 percent. In other words, the news topics capture aspects of households' inflation expectations that are not captured by hard economic indicators. Interestingly, however, among the hard economic indicators that get selected, we find many variables already focused on in the earlier literature, like production and consumption indicators (Pfajfar and Santoro (2008) and Ehrmann et al. (2017)), volatility measures and spreads (Dräger and Lamla (2017)), and consumer sentiment (Doms and Norman (2004) and Ehrmann et al. (2017)).
The (significant) topics in Table 2 might not have been given names that intuitively link them to households' inflation expectations. However, the narrative realism of the approach becomes evident when we query the news corpus for articles in which each of the significant topics has a particularly high topic weight. This is illustrated in Table 3.

[Table 3 excerpt, Aviation (2013-01-24): "Want a quick 30% discount on your family's trip to Europe or Hawaii? In the crazy airfare world, sometimes buying two tickets is cheaper than one. Pairing two discounted tickets together to create your own connecting itinerary can often be less expensive than flying on one ticket, if you take advantage of airlines' city-specific specials, or create your own route using discount airlines."]

In sum, we find that a relatively large fraction of the variation in households' inflation expectations can be explained by between 5 and 11 news topic time series. Most of the news variables survive when controlling for a large set of hard economic indicators, suggesting that economic news plays an independent role in shaping inflation expectations.
In turn, these news topics provide a plausible narrative for what type of news articles households pay attention to when forming their inflation expectations.

News-driven information rigidities?
It follows from the theory model in Section 2 that the degree of information rigidity among households should be a function of the persistence and noise-to-signal ratio in the signal extraction problem. Using a set S of relevant news topics, derived from the LASSO procedure in the previous section, we now test this relationship by running the following regression:

β_t = γ_0 + γ_1 ρ_t + γ_2 κ_t + u_t.    (15)

Here, β_t is the median time-varying information rigidity, reported in Figure 1a, while ρ_t and κ_t are the persistence and noise-to-signal ratio in the underlying information set.
Based on the regression results in the previous section, we construct quantitative measures of these as follows.
First, for each of the news topics in the selected variable set S, we run simple Autoregressive (AR(p)) models. To introduce time dependencies, we allow both the volatility of the AR(p) innovations and the autoregressive parameters to be time dependent. The parameters follow random walk processes, and we set p = 1 to avoid over-fitting. This model structure, together with the Gibbs simulations used for estimation, is standard in the time series literature (see, e.g., Primiceri (2005)), and described in greater detail in Appendix G.3. For future reference, we let ρ̂_{i,t} and σ̂_{i,t} denote the estimated posterior draws of the time-varying persistence and volatility for news topic i. As higher and more persistent coverage of one type of news leads to less coverage of other news items by definition in the LDA model, time-variation in ρ̂_{i,t} also captures the prediction from Nimark and Pitschner (2018) that news coverage will be homogeneous, for example around major events.
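As a cheap stand-in for this first step: the paper estimates random-walk parameter drift by Gibbs sampling (Appendix G.3), but the qualitative time variation in ρ_{i,t} and σ_{i,t} for a single topic series can be approximated with rolling-window OLS, as sketched below on simulated data.

```python
import numpy as np

def rolling_ar1(x, window=60):
    """Rolling-window AR(1): arrays of slopes (rho) and residual stds (sigma)."""
    rho, sigma = [], []
    for t in range(window, len(x)):
        y = x[t - window + 1 : t + 1]   # left-hand side
        z = x[t - window : t]           # first lag
        b = np.polyfit(z, y, 1)         # [slope, intercept]
        resid = y - np.polyval(b, z)
        rho.append(b[0])
        sigma.append(resid.std(ddof=2))
    return np.array(rho), np.array(sigma)

# Simulated topic series with true AR(1) coefficient 0.7.
rng = np.random.default_rng(1)
x = np.zeros(300)
for t in range(1, 300):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()

rho_t, sigma_t = rolling_ar1(x)
print(round(float(rho_t.mean()), 2))
```

The rolling estimates recover the persistence up to the usual finite-sample downward bias; the Bayesian time-varying-parameter version additionally delivers the posterior draws used for inference in the paper.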
Second, for each of the news topics in the selected variable set S, we construct a measure of the noise in the signal by calculating the standard deviation of the posterior topic estimates. That is, when calculating the importance of each news topic for each day in the sample, posterior uncertainty will introduce variability in the article weight distributions and in the articles selected to tone-adjust the news topic time series. Using the sum of the standard deviations of these distributions, we obtain a measure of the noise, and thereby a topic-level noise-to-signal ratio κ̂_{i,t}.

Finally, aggregating across all the news topics i in the set S, and combining the output from these two steps, we obtain:

ρ̂_t = Σ_{i∈S} w_i ρ̂_{i,t},    κ̂_t = Σ_{i∈S} w_i κ̂_{i,t},    (16)

where w_i is the normalized partial R² statistic for variable i in Table 2. Thus, variables that are more important in terms of explaining the variation in households' inflation expectations are given a larger weight when constructing ρ̂_t and κ̂_t.
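The aggregation in (16) amounts to a weighted average per period; a concrete illustration follows, where every number (partial R² values, topic-level estimates) is made up.

```python
import numpy as np

# Partial R^2 statistics from the LASSO stage, one per topic i in S (made up).
partial_r2 = np.array([0.20, 0.10, 0.05])
w = partial_r2 / partial_r2.sum()              # normalized weights, sum to 1

# Topic-level estimates: rows are topics i in S, columns are periods t (made up).
rho_it = np.array([[0.8, 0.7],
                   [0.6, 0.5],
                   [0.5, 0.4]])
kappa_it = np.array([[0.3, 0.2],
                     [0.4, 0.5],
                     [0.2, 0.3]])

rho_t = w @ rho_it       # weighted persistence per period, as in (16)
kappa_t = w @ kappa_it   # weighted noise-to-signal per period, as in (16)
print(rho_t, kappa_t)
```

In the paper this averaging is applied draw by draw, so that the posterior uncertainty in the topic-level estimates carries through to ρ̂_t and κ̂_t.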
The results presented in Table 2 give some degrees of freedom in choosing the relevant set S. To avoid using news topics potentially just capturing information highly correlated with hard economic indicators, and to avoid relying on test statistics computed in stage two of a variable selection problem, we define the set S to include all the selected topics from the LASSO regression controlling for hard economic indicators. Figure 3 graphs the posterior draws of the estimates in (16) for this set. The average degree of persistence in the news varies significantly, and tends to be especially high around recession periods, as one would expect. In contrast, and perhaps surprisingly, the noise-to-signal ratio is not very high around recessions. Instead, it singles out (in particular) the mid 1990s as a "noisy period". We do not have a good explanation for this pattern, but note that while the method used to construct this variable is intuitive, it is also somewhat sensitive to the representativeness of the raw corpus data. If some time periods contain news extracts from fewer, or different types of, articles, this might contaminate our noise-to-signal measure. 14

Figure 4a reports the results of estimating (15) for each draw of ρ̂_t and κ̂_t, using non-informative natural conjugate priors. Accordingly, the parameter estimates correspond to the OLS solution, but take into account the generated-regressors issue by sampling from the full distribution of ρ̂_t and κ̂_t. Going forward, we label this our Benchmark model. As seen from the figure, the coefficient estimates have the correct sign.

[Footnote 14: The topic extraction itself is less prone to this issue because the topic distributions are based on information from the whole sample. We also note that the estimates in Figure 3, and the results presented below, are very similar if we instead define the set S as only the selected and significant topics from Column I in Table 2.]
A higher persistence and lower noise-to-signal ratio lead to a reduction in information rigidity. However, given that we work in a high-dimensional time-varying parameter setting, the posterior uncertainty is naturally large, especially for the persistence parameter.
The results presented in Table 4 highlight that the patterns documented above are very unlikely to be obtained by chance, at least for the persistence parameter. We show this by running a falsification experiment. First, 100 different sets of news topics, not including those used in constructing the set S above, are constructed. Then, for each of these alternative sets, we calculateρ t andκ t , and redo the estimation of (15). Since the news topics used in this experiment are "irrelevant" for households' inflation expectations (according to the LASSO), we also expect the posterior distributions to be less informative.
And, as illustrated in Table 4, this is indeed the case. In columns II and III in Figure 4a we augment the Benchmark model specification in (15) with additional control variables.

[Figure 4 notes: Columns report posterior distributions of the parameter estimates in (15). The red crosses mark the mean estimate +/- one standard deviation. The mean adjusted R² statistic is reported above the distributions. Figure 4b reports Benchmark estimates from a truncated sample, and when including a Great Recession dummy variable in the regression. In Figure 4c, equation (15) is estimated using the Benchmark news-based model without the noise-to-signal term (I); using the persistence in inflation (II); and using the persistence in the alternative inflation count series (III). Figure 4d reports the same estimates as in Figure 4c after also controlling for the Benchmark news-based persistence measure.]
The results presented in Figure 4a summarize well our main result: Media coverage plays an important role for describing the degree of information rigidity among households.
However, from visual inspection of the co-movement between the estimated information rigidity (Figure 1a) and the media persistence (Figure 3a), we observe that something is different during the Great Recession period. In this time period, both information rigidity and media persistence are high, suggesting a positive correlation, and not a negative one, as implied by theory. This is also one reason why the results in columns II and III in Figure 4a are so strong relative to those in column I. For example, the double selection procedure consistently chooses control variables that experienced particularly large swings during the Great Recession period, i.e., housing starts, spreads, weekly hours, and employment figures (see Figure 11a in Appendix B). The results presented in Figure 4b illustrate more directly the peculiarity of the Great Recession period. That is, estimating the Benchmark model up until 2007, or including a dummy variable for the Great Recession period, yields very similar results to those presented in columns II and III in Figure 4a. We return to the Great Recession discussion in Section 5. Before that, we explore whether alternative inflation-based variables can explain the time-varying degree of information rigidity among households, and then examine the dynamic interaction between information rigidity, the media, and macroeconomic developments in greater detail.

[Table 4 notes: Equation (15) is estimated for 100 randomly selected sets of news topics (not in the set of topics used to generate the Benchmark parameter distributions). The table reports the fraction of draws that have posterior probabilities Pr(γ_1 < 0) ≥ x and Pr(γ_2 > 0) ≥ x, where x refers to a bin in the histogram. The bin associated with the posterior probability for the Benchmark model is marked in gray.]

Inflation and an alternative news measure
First, in the standard theoretical framework developed by Coibion and Gorodnichenko (2015a), there is no role for the media, and it is the underlying time series properties of inflation itself that should determine the degree of information rigidity. To test this no-media alternative we construct a quantitative measure of the persistence in inflation as we did for the news topics, i.e., using a time-varying AR(1) model, and then re-estimate (15) using this persistence measure instead of the news-based one. As we do not have a good measure for noise in the inflation series, the noise-to-signal ratio is not included in the regression. 16 As seen from column II in Figure 4c, the persistence parameter for the inflation-based regression is not significant, and if anything, has the wrong sign. For comparison, using the news-based persistence measure as the only explanatory variable in (15) yields a persistence parameter clearly different from zero, see column I in Figure 4c. This finding also holds after controlling for inflation and the news-based persistence measure in the same regression, see column I in Figure 4d. Accordingly, both results confirm that there is an important independent role for the media in explaining households' expectations formation process. 17

In the literature we speak to, the conventional method used to measure the intensity of media reporting relevant for households' inflation expectations has been to count the number of terms related to inflation in the corpus' articles (headlines) (Carroll (2003), Pfajfar and Santoro (2013), Lamla and Lein (2014)). In our view, and as alluded to in Section 4.1, this method builds on a rather stringent assumption. Many items in the news might be of relevance for households' inflation expectations, even without explicitly mentioning the term inflation. This motivates our news-topic-based approach.
The results presented in the last columns in Figures 4c and 4d illustrate that the news-topic-based approach also provides a better description of the time-varying information rigidity observed among households than the count-based method does. The results are produced as follows. First, we construct an alternative media measure based on counting terms related to inflation in articles using the wild-card search inflation*. This count metric is then summed for each day in the sample, and normalized by the article count for that day. Next, we follow the same procedure as used above to measure media persistence, i.e., estimating a time-varying AR(1) model for the alternative media measure, and then redo the estimation of (15). Figure 12, in Appendix B, reports the alternative inflation count measure together with the time-varying persistence parameter. Re-estimating the LASSO regression in Section 4.2, including the inflation count measure as an additional variable, we observe that it is not selected (results not shown). Further, as seen from column III in Figure 4c and column II in Figure 4d, the persistence for this alternative measure of inflation news does not explain the evolution of households' information rigidity, and if anything, has the wrong sign.
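The count-based measure described above can be sketched as follows; the regex implements the inflation* wild card, and the (date, text) pairs are toy examples standing in for the corpus articles.

```python
import re
from collections import defaultdict

# Wild-card search inflation*: any token starting with "inflation".
pattern = re.compile(r"\binflation\w*", re.IGNORECASE)

# Toy (date, text) pairs standing in for the corpus articles.
articles = [
    ("1990-01-02", "Inflationary pressures are building in the economy"),
    ("1990-01-02", "The airline announced new discount fares"),
    ("1990-01-03", "Fed officials see inflation risks as balanced"),
]

hits, totals = defaultdict(int), defaultdict(int)
for day, text in articles:
    totals[day] += 1
    if pattern.search(text):
        hits[day] += 1

# Daily count of matching articles, normalized by the day's article count.
series = {day: hits[day] / totals[day] for day in totals}
print(series)  # {'1990-01-02': 0.5, '1990-01-03': 1.0}
```

The resulting daily series is then fed into the same time-varying AR(1) machinery as the topic-based measures.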
As an additional test of our research design, we have also applied our methodology to the Survey of Professional Forecasters (SPF) CPI inflation forecasts. A priori, we conjecture that the media plays a much smaller role for professional forecasters than for households, 18 and our results confirm this conjecture. The media-based news topics do not predict SPF expectations. However, in line with the model outlined in Section 2, but without a media channel, the time series properties of inflation itself can (partly) explain the degree of time-varying information rigidity among professional forecasters. In the interest of preserving space, the details of this experiment are relegated to Appendix E.

[Footnote 17: In unreported results we have also tested whether persistence in macroeconomic variables relevant for households' inflation expectations (confer Table 2) can explain households' time-varying information rigidity, finding that they cannot.]

[Footnote 18: That is, professional forecasters have much less need to delegate their information choice to the media. They surely know and follow actual CPI inflation, and they have many more resources than households to help them stay informed about the different (economic) states of the world.]

Information rigidity, business cycles, and the media
The mechanics of the model outlined in Section 2 are straightforward. When an important (business cycle) event happens, media coverage potentially becomes more concentrated and persistent around this event and, if the noise-to-signal ratio is also reduced, information rigidity falls. Accordingly, the degree of information rigidity and business cycle developments are closely related, and documenting this relationship is important because theoretical models with information frictions give different policy implications than models assuming, e.g., FIRE. For example, in state-dependent models of information rigidity (Gorodnichenko (2008), Woodford (2009), Mackowiak and Wiederholt (2009)), the degree of information rigidity should be much lower after a large and visible shock than during normal periods. And, when acquiring information is looked upon as a choice variable, the degree of information rigidity should be inversely related to business cycle developments (Mackowiak et al. (2018)). At the same time, in the related delegated information choice view of the world (Nimark and Pitschner (2018)), the distribution of events and media's reporting on those events jointly determine the degree of information rigidity.
When using aggregate (monthly) business cycle data, these views give at least three testable predictions. First, because predictors of information rigidity and the business cycle are jointly determined, they imply that these variables should Granger cause each other. Second, and for the same reason, they imply that an exogenous shock to information rigidity cannot be separately identified from a large economic event, e.g., a recessionary shock. Or, in other words, these two shocks are the same. Finally, they imply that following a more regular business cycle disturbance, the business cycle and information rigidity should be negatively correlated.
Below we test each of these predictions in a simple Vector Autoregressive (VAR) framework using our news-based information rigidity predictors. This framework allows us to conduct standard Granger causality tests (Granger (1969)) and use impulse response analysis to investigate the dynamic interaction between information rigidity, business cycles, and the media. The business cycle indicator we use is shown in Figure 11b in Appendix B. The VAR is specified with six lags, and includes the business cycle indicator, denoted BC, and the news-based persistence and noise-to-signal ratio series. 19

According to the Granger causality test we find strong support, significant at the one percent level, for the first prediction, namely that Granger causality runs in both directions between the business cycle and the news-based persistence variable. In contrast, the news-based noise-to-signal ratio is not Granger caused by either of the other two variables, and does not itself Granger cause either the business cycle indicator or the persistence measure.

Figure 5a reports the response in the business cycle indicator and the news-based persistence variable following an exogenous shock to the BC indicator, identified using a recursive ordering (Cholesky). 20 In line with the third prediction from above, there is an inverse relationship between the two response paths. On impact, the persistence variable falls slightly, indicating that information rigidity increases. Then, as the business cycle boom cools off, persistence gradually increases, and reaches its peak when the business cycle is at its trough, after roughly 25 months. Thus, according to these estimates, and given the negative relationship between information rigidities and the news-based persistence measure, information rigidities are at their lowest during economic downturns.
The implication from Figure 5a, namely that information rigidities are at their lowest during recessionary periods, is also documented in Loungani et al. (2013), who look at GDP growth forecasts of professional forecasters in 46 countries, and by Coibion and Gorodnichenko (2015a), who analyze inflation expectations among professional forecasters. However, neither of these studies considers the role of the media in this setting.

[Footnote 19: For simplicity, the mean estimates of the news-based persistence and noise-to-signal ratio time series are used in the regression. The VAR lag length is determined from likelihood ratio tests, while parameter uncertainty in the VAR is estimated using a residual bootstrap.]

[Footnote 20: As implied by the Granger causality tests, the noise-to-signal ratio is more or less exogenous, and its impulse responses are more or less insignificant. In the interest of brevity, and for visual clarity, we do not report them.]
A consequence of the second prediction from above, i.e., that an exogenous shock to information rigidity cannot be separately identified from a large economic event, is that an immediate increase in the news-based persistence measure should be associated with an immediate drop in the business cycle indicator. Figure 5b confirms this prediction.
An unexpected increase in the news-based persistence measure, implying a reduction in information rigidity, is associated with a sharp fall in the business cycle, i.e., a recession.
In Figure 5b, and by construction, the drop first occurs with a one-period lag. However, when reversing the order of the variables in the VAR, we show in Figure 13, in Appendix B, that the fall is immediate. Importantly, these responses differ from those obtained after a more regular business cycle disturbance, graphed in Figure 5a, where the relationship between economic activity and the news-based persistence measure is more gradual. We also note that the results reported in Figure 5 are robust: augmenting the VAR with typical recession indicators, like the spread between long- and short-horizon interest rate maturities, the VIX index, or oil prices, or differencing the highly persistent news-based variables, does not qualitatively change our results.
In sum, while it is well known that models with information frictions affect the dynamics of the business cycle and have important policy implications (Mackowiak et al. (2018)), the results documented here are new because they highlight the role played by the media as "information intermediaries" in this setting. For institutions conducting countercyclical policies (partly) by managing expectations (Galí (2008)), this suggests that media should play an important role in their communication strategies, as also emphasized in, e.g., Berger et al. (2011) and Haldane (2017).

The Great Recession period
The results presented in Section 4.3 documented that there is a significant and theory-consistent relationship between households' information rigidity and media persistence and noise. However, as was illustrated in Figure 4b, the Great Recession period, hereafter referred to as the GR period, perturbed this relationship. The question then becomes, why is there a GR "puzzle"? In this final section, we propose two potential explanations, which not only shed further light on the news-driven inflation expectations and information rigidities relationship, but also allow us to conduct an (economically) meaningful out-of-sample evaluation of the predictive relationships documented earlier.
First, one could argue, as for example in Imbs (2010) and Bjørnland et al. (2017), that the GR period was the first truly global recession period in decades, and that inflation behaved differently during this period relative to before, as discussed in, e.g., Ball and Mazumder (2011) and Coibion and Gorodnichenko (2015b). Accordingly, media coverage might simply have been less informative for describing inflation, and inflation expectations, during the GR period relative to before (and after). In this environment, households might rationally have chosen not to follow the media to update their expectations. We call this the bad media quality explanation.
Second, one (of potentially many) alternative explanations is that media coverage was good and relevant, also during the GR period, but that households paid less attention to news altogether. That is, if the quality of information is good, but information diffusion is poor, the quality of information matters less. While information diffusion is unobserved, a reasonable first-order approximation can be obtained by using newspaper circulation statistics, where simple statistics seem to give some support to this view. 21 In particular, it is well known that newspaper circulation numbers have been falling dramatically in the U.S. (and in other countries) since the early 1990s, but the cyclical patterns of newspaper circulation are less well known. However, irrespective of whether one detrends the non-stationary circulation statistics with a second-order polynomial, a Hodrick-Prescott (HP) filter (Hodrick and Prescott (1997)), or simply looks at the yearly difference of the series, one obtains the same qualitative answer: During recessionary periods, circulation numbers often, but not always, drop below trend or fall. But, during the GR period, the negative gap was over three times larger than during any other recession in the U.S. since the 1940s, see Figure 14, in Appendix B. Accordingly, the GR period might have been different because information diffusion from news media was much worse during this time period than in any other recession in recent decades.
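The detrending exercise described above can be sketched as follows. The snippet implements the HP filter directly (solving the penalized least-squares problem for the trend) and applies it to a synthetic circulation series; the series and the yearly smoothing parameter lamb=100 are illustrative assumptions, not the paper's data or settings.

```python
import numpy as np

def hp_filter(y, lamb=100.0):
    """Hodrick-Prescott filter: the trend tau solves (I + lamb * D'D) tau = y,
    where D is the (n-2) x n second-difference operator."""
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)   # rows of the form [..., 1, -2, 1, ...]
    trend = np.linalg.solve(np.eye(n) + lamb * D.T @ D, y)
    return y - trend, trend               # (cyclical gap, trend)

# Illustrative series standing in for yearly circulation numbers (not real data)
rng = np.random.default_rng(0)
years = np.arange(1940, 2018)
circ = 60 - 0.3 * (years - 1990).clip(0) + 3 * np.sin((years - 1940) / 5) \
       + rng.normal(0, 0.5, years.size)
cycle, trend = hp_filter(circ, lamb=100.0)  # negative cycle = below-trend circulation
```

A second-order polynomial fit or a simple first difference can be substituted for `hp_filter` to reproduce the robustness checks mentioned in the text.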

Out-of-sample performance
We start by addressing the bad quality story. To this end we rely on a quasi out-of-sample (OOS) predictive experiment for both households' inflation expectations and actual CPI inflation. We first estimate a LASSO regression, as described in Section 4.2, for each of these outcome variables for the sample period 1990-2000, predict one period forward, and redo this analysis for the remaining part of the sample, using an expanding estimation window. As explanatory variables, we either use all the news topics, or the hard economic indicators from the FRED-MD database. As such, the OOS experiment allows us to track how the prediction error evolves, and thereby pinpoint when potential breaks occur.
21 The newspaper circulation statistics are collected from Pew Research Center and their Newspaper Fact Sheet at http://www.journalism.org/fact-sheet/newspapers/. We refer to their web-pages for more documentation on how the circulation statistics are collected and compiled.
Figure 6. (a) The (recursively estimated) fraction of topics that are still being selected by the LASSO 1 to 60 months after they were initially selected as predictors for households' inflation expectations. (b) The out-of-sample cumulative squared prediction error differences (CSPED) between the news-based-only and hard-economic-only LASSO regressions. Two outcome variables are considered, namely households' inflation expectations and actual CPI inflation. See the text for further details.
Figure 6a shows that the GR period is associated with a large change in the relationship between news topics and inflation expectations. The figure should be read as follows: For each estimation vintage, ending as illustrated on the top x-axis, we calculate the fraction of news topic predictors that are still being selected by the LASSO algorithm 1-60 months later, where the end date for the 60-month duration is illustrated on the lower x-axis.
For example, during the mid 2000s, roughly 40 percent of the initially selected news topics were still useful predictors for households' inflation expectations even 5 years (60 months) after they were initially selected. Thus, the predictive relationship was fairly stable and long lasting. Going into the GR period, this pattern changes markedly. From Figure 6a we observe that less than 10 percent of the selected news topic predictors are still being used after just 20-40 months. After the GR period, however, there seems to be a return towards the same patterns we observe prior to the GR period. Thus, there is clear evidence of in-sample instability during the GR period.
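The expanding-window selection-tracking exercise can be sketched as below. The data here are synthetic stand-ins (sizes, the penalty `alpha`, and the window start are assumptions for illustration); the mechanics of refitting a LASSO on a growing sample and tracking which predictors survive mirror the experiment described in the text.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: T months, K candidate news-topic series, of which
# only the first five truly predict the outcome (all values are made up).
rng = np.random.default_rng(1)
T, K = 200, 30
X = rng.normal(size=(T, K))
coefs = np.zeros(K)
coefs[:5] = 1.0
y = X @ coefs + rng.normal(size=T)

selected, preds = [], []
for t in range(120, T):  # expanding estimation window, one-step-ahead prediction
    scaler = StandardScaler().fit(X[:t])
    lasso = Lasso(alpha=0.1).fit(scaler.transform(X[:t]), y[:t])
    selected.append(set(np.flatnonzero(lasso.coef_)))
    preds.append(float(lasso.predict(scaler.transform(X[t:t + 1]))[0]))

# Fraction of the first vintage's selected topics still selected h steps later
first = selected[0]
still_selected = [len(first & s) / len(first) for s in selected]
```

Plotting `still_selected` against the vintage date would produce a curve analogous to Figure 6a; `preds` feeds the prediction-error comparison in Figure 6b.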
The black line in Figure 6b puts further evidence behind this reasoning. It reports the cumulative squared prediction error difference (CSPED) between the news-based out-of-sample predictions and those based on the model including only the hard economic indicators.
A value above zero indicates that the latter specification is better than the news-based one, and vice versa. As clearly seen in the figure, the news-based out-of-sample predictions were better until 2008, but then deteriorated substantially. However, starting from roughly 2010, we again observe that the news-based model improves relative to the one based on hard economic indicators, confirming that the GR period was different. Indeed, if the evaluation sample had started after the GR period, the news-based model would have been better also in absolute terms.
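The CSPED statistic itself is straightforward to compute; a minimal sketch, with toy error sequences chosen to mimic the pattern described in the text (news model better early, worse mid-sample):

```python
import numpy as np

def csped(news_errors, hard_errors):
    """Cumulative squared prediction error difference. With this ordering,
    values above zero mean the hard-indicator model is ahead (smaller
    cumulative squared errors so far), values below zero favor the news model."""
    news_errors = np.asarray(news_errors, dtype=float)
    hard_errors = np.asarray(hard_errors, dtype=float)
    return np.cumsum(news_errors ** 2 - hard_errors ** 2)

# Toy errors: the news model wins in the first half, loses in the second
diff = csped([0.1, 0.1, 1.0, 1.0], [0.5, 0.5, 0.2, 0.2])
```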
At the same time, it is hard to argue that media coverage became of worse quality during the GR period. In particular, when we do the OOS experiment for actual CPI inflation, and compare the news-based performance to one based on hard economic indicators, we actually observe that the news-based approach is superior, see the gray line in Figure 6b. And, in contrast to the results for household expectations, the news-based approach improves further during the GR period. 22 In sum, and consistent with the bad media quality explanation, Figure 6 documents clear in-sample instability and an out-of-sample deterioration of the news and inflation expectation relationship during the GR period. At the same time, however, our results suggest that the news was highly informative of actual inflation developments during this period. In other words, households could have followed the news to form good updates of their inflation expectations, but do not seem to have done so. This points us towards the second potential explanation mentioned above, where media coverage was good and relevant, also during the GR period, but households paid less attention to news altogether.

An information diffusion story?
We formally address what we label as the information diffusion explanation by estimating an augmented version of equation (15):

β_t = γ_0 + γ_1 ρ_t + γ_2 κ_t + γ_3 circ_t + γ_4 (ρ_t × circ_t) + γ_5 (κ_t × circ_t) + u_t,   (17)

where circ_t is the HP-filtered newspaper circulation variable, used in (17) as a crude approximation for information diffusion. 23 All variables, ρ_t, κ_t and circ_t, are measured as deviations from their means. Accordingly, γ_3 measures the direct effect of higher than usual information diffusion on information rigidity, while the interaction terms capture the idea that the effects of the media persistence and noise-to-signal ratio might be conditional on the degree of information diffusion. Figure 7 summarizes the results. Four findings stand out. First, as seen from the dark-colored density estimates in Figure 7a, and consistent with a large body of past work outside economics consistently finding that those who consume print media are better informed about current events than those who do not (see, e.g., Finnegan and Viswanath (1996)), we find that the independent effect of the circulation variable is negative and significant.
22 Overall, the root mean squared forecasting error (RMSFE) for the news-based model is 1.2, while the LASSO regression entertaining the FRED-MD data obtains a RMSFE score of 1.3. Interestingly, both statistics are much better than 1.9, which is the score obtained when we estimate the much used unobserved-components stochastic volatility inflation forecasting model suggested by Stock and Watson (2007). The finding that the news-topic-based approach outperforms the usage of hard economic indicators when predicting actual CPI inflation is a novel finding in itself. We leave it for future research to explore in greater detail how news data can be used to predict inflation developments.
23 The circulation statistic is recorded on a yearly frequency. We obtain monthly numbers by using shape-preserving piecewise cubic interpolation on the yearly trend-adjusted estimates.
More importantly here, the persistence and noise parameters have a significant negative and positive sign, respectively, and the interaction terms suggest that these patterns are stronger in times of relatively higher newspaper circulation, i.e., better information diffusion, as hypothesized at the beginning of this section. Second, the mean adjusted R2 statistic for (17) is 0.24, which is substantially higher than the 0.04 obtained for the Benchmark model in column I in Figure 4a. Thus, controlling for the circulation statistic improves the model's explanatory power considerably. Third, from the fit of (17), we learn that most of the improvement is due to smaller errors during recessionary periods, and in particular during the GR period. Finally, although there are many ways to calculate the cyclical patterns of a non-stationary variable, the qualitative results reported above are not very sensitive to our choice of using the common HP-filter for detrending the newspaper circulation statistics. In fact, when instead using the simple difference in the newspaper circulation statistics, we obtain the posterior estimates reported in light gray in Figure 7, where the model fit improves further.
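A minimal sketch of an interaction regression of this form, using ordinary least squares on synthetic, demeaned regressors (the paper estimates (17) with Bayesian methods; all coefficient values and series below are made-up assumptions used only to illustrate the design matrix):

```python
import numpy as np

# Synthetic, demeaned stand-ins for the regressors in an equation-(17)-style model
rng = np.random.default_rng(2)
T = 300
rho, kappa, circ = rng.normal(size=(3, T))

# Design matrix: constant, levels, and circulation interactions
X = np.column_stack([np.ones(T), rho, kappa, circ, rho * circ, kappa * circ])
true_gamma = np.array([0.5, -0.3, 0.2, -0.1, -0.2, 0.15])  # invented coefficients
beta_t = X @ true_gamma + rng.normal(0.0, 0.1, T)

gamma_hat, *_ = np.linalg.lstsq(X, beta_t, rcond=None)
resid = beta_t - X @ gamma_hat
r2 = 1.0 - resid.var() / beta_t.var()
```

The interaction columns `rho * circ` and `kappa * circ` are what let the estimated media effects vary with the level of information diffusion.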
Of course, as news consumers today have the option to substitute from print to online news, one could argue that newspaper circulation statistics no longer serve as a good approximation for information diffusion, and that the large cyclical drop during the GR period is due to a substitution towards cheaper (free) online news or social media. As such, an alternative interpretation of the results reported above is that households consumed a different type of news during the GR period, and not that information diffusion was lower per se. Although we cannot rule out this explanation, which is highly interesting in itself, we note that during the early 1990s, and partly during the early 2000s, the substitution effect must have been minor because online news did not exist. Still, the increase in explanatory power for the augmented model, relative to the Benchmark model, is almost as large during the 1990s recession as in the GR period, see Figure 7b, and it is exactly during these recession periods that the circulation statistic falls substantially (cf. Figure 14, in Appendix B). Accordingly, what we label as the information diffusion explanation is consistent across the sample, while the substitution explanation is not.
We conclude from the above analysis that the predictive relationship between news topics and households' inflation expectations withstands out-of-sample testing. Moreover, in terms of information rigidities, we find that one potential explanation for the GR "puzzle" is related to the amount of information households consumed during this period, or, alternatively, the type of media content. Irrespective of interpretation, this is in accordance with the results presented in earlier sections about the news media's important role in the expectations formation process. Going forward, the preceding analysis motivates more work investigating how the relationship between media coverage and macroeconomic expectations might be affected by a changing media landscape.

Conclusion
We investigate the role of media for understanding inflation expectations and information rigidities among U.S. households. Taking the view that acquiring information can be looked upon as a choice variable (Sims (2003), Gorodnichenko (2008), Woodford (2009), Mackowiak and Wiederholt (2009)), and that the media work as "information intermediaries" (Nimark and Pitschner (2018)), we augment the testing framework introduced by Coibion and Gorodnichenko (2015a) with a simple media channel, and find empirical support for the following: First, the degree of information rigidity among households is far from constant, but varies significantly across time (over the business cycle). Second, using a novel news-topic-based approach, we show that the news types the media choose to report on are good predictors of households' stated inflation expectations. Finally, we show that the underlying time series properties of news topics relevant for households' expectations explain the time-varying information rigidity among households. When media persistence (noise-to-signal) is high (low), information rigidities tend to be low (high), and vice versa, as the theory model predicts.
A number of robustness tests document that these results are very unlikely to have been obtained by chance, and that similar findings are not found when using the time series properties of inflation itself, or a measure of media coverage based on counting inflation terms in the news. In line with theory, we also document significant interactions between the business cycle and the part of households' information rigidity that can be explained by the media. For policy institutions aiming at managing consumers' expectations, e.g., monetary policy, this emphasizes the media as an important channel for their communication strategies.
Our study speaks to a growing literature documenting significant information rigidities and a departure from the full information rational expectations hypothesis. Thus far, however, the role of the media has received only limited (formal) attention in this literature. Using a large news corpus and machine learning algorithms we contribute by focusing on media's role within a well-established (theoretical) testing framework. Our results highlight media's importance in such a setting.
Arguably, our analysis is only partial. We do not specify a full model for how potentially profit-maximizing "information intermediaries" interact with utility-maximizing consumers, etc. Given the prevalent role of information and expectations in economics, an interesting avenue for further research would be to incorporate the delegated information choice mechanism, and the role of the media as information providers, in a general equilibrium framework.
Cusum test (Brown et al. (1975)) of parameter stability in equation (9).

Appendix C Endogenous bias
In the model developed in Section 2 we assumed that the relationship between actual inflation, π_t, and media's coverage of inflation was specified as:

π_t^N = α_t + π_t,   (18)

where α_t is a time-fixed effect, capturing for example potential media biases (Pfajfar and Santoro (2008), Lamla and Lein (2014)). Importantly, as agents were assumed to only observe the left-hand side of (18), they do not know that the news they receive, with noise, does not map one-for-one to actual inflation.
Here we consider a model where we assume that agents are aware of the gap between π_t and π_t^N. In this case the signal extraction problem can be formulated as the following:

s_t = a′x_t + ω_t,   x_t = B_t x_{t-1} + ν_t,   (19)

where x_t = (π_t, α_t)′, a = (1, 1)′, B_t = [ρ_t 0; 0 1], and ν_t = (ν_t^π, ν_t^α)′, and x_t is the unobservable state vector predicted by agents, implying that agents predict both π_t and α_t.
As before, the variance of the prediction error can be found from the Riccati equation,

P_t = B_t (P_{t-1} − K_{t-1} a′P_{t-1}) B_t′ + Q_t,   (20)

where Q_t is the covariance matrix of ν_t. The Kalman gain follows as:

K_t = P_t a (a′P_t a + σ_{ω,t}²)^{-1}.   (21)

In this case, and in contrast to under the assumptions in Section 2, the Kalman gain weight used to update the inflation forecast (k_t^π) depends on the persistence of inflation itself, ρ_t, and on the amount of noise in the signal, σ_{ω,t}.
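The steady-state gain for this two-state problem can be computed by iterating the Riccati recursion to a fixed point. The sketch below does this with numpy; all parameter values are illustrative assumptions, not estimates from the paper.

```python
import numpy as np

# Illustrative parameter values (not the paper's estimates)
rho, q_pi, q_alpha, sig2_omega = 0.8, 1.0, 0.3, 0.5

B = np.array([[rho, 0.0], [0.0, 1.0]])  # transition for the state (pi_t, alpha_t)
Q = np.diag([q_pi, q_alpha])            # covariance of (nu_pi, nu_alpha)
a = np.array([[1.0], [1.0]])            # the signal loads one-for-one on both states

P = np.eye(2)                           # prediction-error variance, iterated to a fixed point
for _ in range(1000):
    S = float(a.T @ P @ a) + sig2_omega  # variance of the signal innovation
    K = (P @ a) / S                      # Kalman gain; K[0] is the inflation weight k_pi
    P = B @ (P - K @ (a.T @ P)) @ B.T + Q

k_pi = float(K[0, 0])
beta = (1.0 - k_pi) / k_pi               # implied information-rigidity coefficient
```

Re-running the loop with different `rho` or `sig2_omega` shows directly how the updating weight on inflation, and hence the implied rigidity coefficient, depends on the persistence of inflation and the noise in the signal.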
Following the same procedure as in Section 2 we can show that:

π_{t+h} − F_t π_{t+h} = c_t + β_t (F_t π_{t+h} − F_{t-1} π_{t+h}) + e_t,   (22)

where β_t = (1 − k_t^π)/k_t^π, and e_t = Σ_{j=1}^{h} (ρ_t)^{h−j} ν_{t+j}^π. Accordingly, the β_t coefficient, measuring the degree of information rigidities, depends on the properties of inflation itself, and not the news as in equation (8). Thus, while the expressions in equations (8) and (22) are the same, the assumptions underlying the reduced-form coefficients c_t and β_t differ, and we document in Section 4.4 that using the time series properties of actual inflation to describe the evolution of β_t gives results at odds with theory.

Appendix D Alternative news-topic time series
Results presented in Thorsrud (2018) highlight that tone-adjusted topic frequencies perform much better for nowcasting GDP growth than un-adjusted topic frequencies do. That is, whether the news is positive or negative matters. In this paper we have followed suit, and work with time series of news topics that are tone-adjusted, i.e., topic frequencies are adjusted depending on whether the news is positive or negative (see Appendix F.3 for the technical details). Below we show that tone adjustment of the topic frequencies also adds value in the current setting.
In particular, Table 7 presents results for the same post-LASSO regressions as in Table 2 in Section 4.2. However, in contrast to the results presented there, we have used news topic time series that are not tone-adjusted. In the news-only regressions we observe that many news topics are selected, and that the model fit is actually better than in the comparable column in Table 2. Still, when we control for the hard economic variables in the FRED-MD database, almost all of the news-based variables drop out, and the adjusted R2 remains very similar. This contrasts with the results presented for tone-adjusted news topics in Section 4.2, where controlling for hard economic variables does not significantly change which news topics get selected by the LASSO algorithm. Accordingly, without tone adjustment, the news topic time series do not seem to capture the same independent media component as the tone-adjusted topic time series do. Moreover, if we run the LASSO regression including both tone-adjusted topic frequencies, un-adjusted topic frequencies, and hard economic indicators, the result turns out to be very similar to the one presented in the News and Hard column in Table 2. That is, very few un-adjusted news topic time series are selected.
In sum, these results again highlight the independent role of the media for understanding households' inflation expectation process. The results also speak directly to the claim by Sims (2003) that the tone of economic reporting affects sentiment beyond the economic information contained in the reporting itself. Related empirical work by Doms and Norman (2004), Lein (2014), and Pfajfar and Santoro (2008) also shows how the tone of the news matters. Note here however, and as described in Appendix F.3, that our tone adjustment procedure explicitly uses the output from the topic model. Thus, in our analysis, it is not the case that we look at only the overall tone, or sentiment, of the news.
Table 7. Post-LASSO regression results and unadjusted news topic time series. The dependent variable is the Michigan Survey of Consumers stated inflation expectations over the next year. In the column labeled News, unadjusted news topics are used as the only predictors. In the column labeled News and Hard the set of potential predictors is augmented to also include roughly 130 hard economic variables from McCracken and Ng (2016).
The topic decomposition matters.

Appendix E Survey of Professional Forecasters
Applying our methodology to the Survey of Professional Forecasters (SPF) CPI inflation forecasts serves two purposes. First, although we a priori conjecture that the media should play a much smaller role for professional forecasters than for households, this experiment is a good robustness check of our research design. Second, if it is true that the media play less of a role in describing information rigidity among professionals, it should also be true that the difference between information rigidity among households and professionals should be partly explained by the media (because the media has been shown to matter for households). As such, looking at information rigidity among professional forecasters potentially allows us to perform one additional test regarding media's independent role in the expectations formation process among households.
Table 8. Survey of Professional Forecasters inflation forecast errors and revisions. Each column reports the results of the following regression: π_{t+3,t} − F_t π_{t+3,t} = c + β(F_t π_{t+3,t} − F_{t-1} π_{t+3,t}) + δz_{t-1} + u_t. Newey-West corrected standard errors are reported in parentheses. *, **, and *** indicate that coefficients are statistically significant at the 10, 5, and 1 percent level, respectively. See the text for details about the additional controls z_{t-1}.
Re-doing our analysis using SPF data, we reach four main conclusions. 24 First, when running the static regression in (9) using the SPF data, we obtain (significant) coefficient estimates that are similar to those reported in the earlier literature, see in particular Coibion and Gorodnichenko (2015a). Moreover, this finding complements existing evidence in the literature because we show that it is robust to doing the more restricted factor augmented and double selection procedures described earlier, see Table 8.
Second, when allowing for time-varying information rigidities, as in (10), we confirm that the degree of information rigidity changes through time, also for the SPF data, see Figure 15.
Third, when re-doing the LASSO regression in Section 4.2 using the SPF data, we actually find that none of the news topics are selected as predictors. Thus, our hypothesis regarding media's role in shaping households' inflation expectations, but not the expectations of professional forecasters, is supported by the data.
24 The SPF is a quarterly survey, so the number of observations available for estimation becomes lower than for the household regressions. However, in contrast to the Michigan Survey of Consumers, forecasts for multiple horizons, up to one year ahead, are available for the SPF. For this reason, we do not need to apply an instrumental variable approach, but restrict ourselves to π_{t+3,t} inflation rates. The news topics data are converted from monthly to quarterly frequency by taking the quarterly mean of the monthly variables.
Finally, when testing whether the degree of time-varying persistence in actual inflation can explain the evolution of information rigidity among professional forecasters, we confirm the predictions from the theory model in Section 2 (without a media channel).
As seen from column I in Figure 16a, there is a negative relationship between the persistence parameter for inflation and the degree of information rigidity among professional forecasters. However, compared to the number of monthly observations available for estimation for the households' data, the number of observations available for the quarterly SPF data is much more restrictive. Therefore, the results also become more uncertain, and the posterior distribution has a substantial mass above zero. We conclude from this experiment that our research design, estimating time-varying information rigidities and then linking them to the underlying time series properties of the relevant information sets (news or inflation), seems valid.
Figure 16. Survey of Professional Forecasters and inflation persistence. (a) Inflation-driven information rigidities: the posterior distributions for the γ_1 parameter from equation (15) when β_t is the time-varying information rigidity among the Survey of Professional Forecasters, and ρ_t refers to the persistence in actual CPI inflation. Three different estimation samples are considered. (b) β difference regression: the posterior estimates from equation (23). See the text and the notes to Figure 4 for additional details.
Turning to the second motivation for looking at the SPF data, we run the following regression:

|β_t^H − β_t^P| = γ_0 + γ_1 ρ_t + γ_2 κ_t + u_t,   (23)

where β_t^H and β_t^P are the time-varying information rigidities
among households and professional forecasters, and ρ_t and κ_t are the news-based persistence and noise-to-signal variables defined earlier. 25 As we have shown that the media matters for households, but not for professional forecasters, the news-based persistence and noise measures should be able to explain the difference between the information rigidity among households and professionals. Figure 16b graphs the results, and shows that we reach a positive answer. Although we find that the noise-to-signal parameter is centered around zero, the posterior estimate of the persistence parameter is highly significant.
Moreover, the regression fit is relatively good, with an adjusted R 2 statistic of 0.26.
25 This regression is related to Carroll (2003). However, while similar in spirit, it differs fundamentally. Carroll (2003) assumes that households adjust their expectations toward rational professionals when media intensity is high. Thus, he uses the absolute difference in inflation expectations as dependent variable. In contrast, our results suggest that information rigidities are present also among the professionals, but that they cannot be explained by the media. Thus, we use absolute information rigidity differences as dependent variable to test for the role of the media among households.

Appendix F Feature selection and topic time series
Several standard steps are taken to clean and reduce the raw dataset before estimation (Gentzkow et al. (2017)).
First, a stop-word list is employed. This is a list of common words not expected to have any information relating to the subject of an article. Examples of such words are the, is, are, and this. In total, the stop-word list together with the list of common surnames and given names removed roughly 1800 unique tokens from the corpus. Next, an algorithm known as stemming is run. The objective of this algorithm is to reduce all words to their respective word stems. A word stem is the part of a word that is common to all of its inflections. An example is the word effective whose stem is effect. Finally, a measure called tf-idf, which stands for term frequency -inverse document frequency, is calculated.
This measures how important all the words in the complete corpus are in explaining single articles. The more often a word occurs in an article, the higher the tf-idf score of that word. On the other hand, if the word is common to all articles, meaning the word has a high frequency in the whole corpus, the lower that word's tf-idf score will be. Around 150 000 of the stems with the highest tf-idf score are kept, and used as the final corpus.
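The stop-word removal and tf-idf scoring described above can be sketched in a few lines. The toy corpus and stop-word list below are invented for illustration (the paper's stop-word and name lists remove roughly 1800 tokens, and the final corpus keeps around 150,000 high-scoring stems).

```python
import math
from collections import Counter

# Toy corpus and stop-word list (the paper's corpus and lists are far larger)
docs = [
    "inflation expectations rose as oil prices surged",
    "oil prices fell while the central bank held rates",
    "the election dominated politics and fiscal policy news",
]
stopwords = {"the", "as", "and", "while"}

tokenized = [[w for w in d.split() if w not in stopwords] for d in docs]
n_docs = len(tokenized)

# idf: words appearing in fewer documents get a higher weight
df = Counter(w for doc in tokenized for w in set(doc))
idf = {w: math.log(n_docs / df[w]) for w in df}

# tf-idf: frequent within an article but rare across the corpus -> high score
tfidf = [{w: tf / len(doc) * idf[w] for w, tf in Counter(doc).items()}
         for doc in tokenized]
```

A stemmer (e.g., Porter-style) would be applied to `tokenized` before the tf-idf step; it is omitted here to keep the sketch dependency-free.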

F.2 Topic extraction
The "cleaned", but still unstructured, datasets are decomposed into news topics using a Latent Dirichlet Allocation (LDA) model (Blei et al. (2003)). The LDA model is one of the most popular clustering algorithms in the NLP literature because of its simplicity, and because it has proven to classify text in much the same manner as humans would do (Chang et al. (2009)).
The LDA is an unsupervised topic model that clusters words into topics, which are distributions over words, while at the same time classifying articles as mixtures of topics.
An unsupervised learning algorithm is an algorithm that can discover an underlying structure in the data without being given any labeled samples to learn from. The term "latent" is used because the words, which are the observed data, are intended to communicate a latent structure, namely the subject matter (topics) of the article. The term "Dirichlet" is used because the topic mixture is drawn from a conjugate Dirichlet prior. 26 Different algorithms exist for solving the LDA model. We follow Griffiths and Steyvers (2004), and estimate the model using Gibbs simulations. Technical details and a short description of estimation and prior specifications are described in Appendix G.2. Here we note that we extract K = 80 topics from each of the three cleaned datasets. We subjectively chose K = 80 for two reasons. First, this was the choice showing the best statistical results in Larsen and Thorsrud (2018b) and Thorsrud (2018). Second, we have experimented with estimating both fewer and more topics. It is our experience that with K substantially higher than 80, each topic starts to become highly event specific, i.e., there are signs of over-fitting. Conversely, extracting substantially fewer than 80 topics results in overly general topics. Thus, in sum, our choice of K = 80 is based on a compromise between fitting the corpus well, getting interpretable topics, as well as earlier experience.
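A minimal LDA sketch is shown below. Note the assumptions: the four-article corpus is invented, only K = 2 topics are extracted (versus the paper's K = 80), and scikit-learn fits LDA by variational inference rather than the Gibbs sampler of Griffiths and Steyvers (2004) used in the paper.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Invented toy corpus; the paper's corpus contains hundreds of thousands of articles
docs = [
    "inflation prices wages inflation costs",
    "election vote parliament election campaign",
    "prices oil energy costs inflation",
    "vote campaign parliament policy election",
]

counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

doc_topics = lda.transform(counts)  # each article as a mixture of topics
topic_words = lda.components_       # each topic as weights over words
```

`doc_topics` corresponds to the article-level topic mixtures, and `topic_words` to the word distributions that define each topic, the two outputs the time-series construction in Appendix F.3 builds on.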

F.3 Topic time series
Given knowledge of the topics (and their distributions), the topic decompositions are translated into tone-adjusted time series. To do this, we proceed in three steps. In short, we first collapse all the articles for a particular day into one document, and then compute, using the estimated word distribution for each topic, the topic frequencies for this newly formed document. See the end of Appendix G.2 for the technical details. This yields a set of K daily time series. Then, for each day and topic, we find the article that is best explained by each topic, and from that identify the tone of the topic, i.e., whether the news is positive or negative. This is done using an external word list and simple word counts, similar to Tetlock (2007). The word list used here classifies positive/negative words as defined by the Harvard IV-4 Psychological Dictionary. For each day, the count procedure delivers a statistic containing the normalized difference between positive and negative words associated with a particular article. These statistics are then used to sign-adjust the topic frequencies computed in step one.
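The tone-adjustment step can be sketched as follows. The word lists, articles, and topic frequencies below are all made-up stand-ins (the paper uses the Harvard IV-4 lists and estimated topic frequencies); only the mechanics of the normalized count and the sign adjustment are illustrated.

```python
import numpy as np

# Stand-ins for the Harvard IV-4 positive/negative word lists (illustrative)
positive = {"gain", "growth", "improve"}
negative = {"loss", "crisis", "fall"}

def tone(words):
    """Normalized difference between positive and negative word counts."""
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    return (pos - neg) / len(words) if words else 0.0

# Step 3: sign-adjust one day's topic frequencies using the tone of the
# article best explained by each topic (all numbers are invented).
topic_freq = np.array([0.10, 0.30, 0.05])
best_articles = [["growth", "gain", "in", "output"],
                 ["crisis", "and", "loss", "deepen"],
                 ["fall", "in", "prices"]]
adjusted = topic_freq * np.array([tone(a) for a in best_articles])
```

A topic thus keeps its frequency magnitude but flips sign when its representative article is dominated by negative words.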
Notice from the description above that also the tone adjustment procedure explicitly uses the output from the topic model. Still, the method used for identifying the tone of the news using dictionary based techniques is simple, and could potentially be improved upon with more sophisticated algorithms (Pang et al. (2002)). While leaving such endeavors for future research, Thorsrud (2018) shows that working with topic frequencies without tone adjustment results in a loss of important information.
Appendix G Models

G.1 The Latent Threshold Model
In Section 3.1 of the main paper we describe the Latent Threshold Model (LTM). Here we provide the estimation details. For convenience we first repeat the model, which can be written in general notation as follows:

y_t = x_t′ b_t + u_t,   u_t ∼ N(0, σ_u²),   (24a)
b_t = β_t ⊙ ς_t,   ς_t = I(|β_t| ≥ d),   (24b)
β_{t+1} = Ξ β_t + e_t,   e_t ∼ N(0, Σ_e),   (24c)

where t is the time index, x_t is a (n×1) vector of variables used for prediction, b_t a (n×1) vector of time-varying parameters, and ς_t is a zero-one variable whose value depends on the indicator function I(|β_t| ≥ d). If the ith element in |β_t| is above the ith element in the (n×1) threshold vector d, then the ith element in ς_t equals 1, otherwise it equals 0. Ξ equals the identity matrix in our application, but does not need to do so, see the discussion below. Finally, Σ_e is a diagonal matrix, and the error terms, e_t and u_t, are assumed to be independent. Apart from equation (24b), the system in (24) has a standard state space form.
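The threshold mechanism at the heart of the model can be simulated in a few lines. The sketch below generates data from an LTM with Ξ equal to the identity matrix (random-walk states); all parameter values are illustrative assumptions, not the paper's priors or estimates.

```python
import numpy as np

# Simulate the latent threshold mechanism: random-walk states beta_t are zeroed
# out whenever they sit inside the threshold band, producing dynamic sparsity
# in the effective coefficients b_t (all values are invented for illustration).
rng = np.random.default_rng(3)
T, n = 300, 2
d = np.array([0.4, 0.4])                 # latent thresholds
sigma_u = 0.5                            # observation error std dev
sigma_e = np.array([0.05, 0.05])         # diagonal of Sigma_e (innovation variances)

beta = np.zeros((T, n))
for t in range(1, T):                    # Xi = identity: independent random walks
    beta[t] = beta[t - 1] + rng.normal(0.0, np.sqrt(sigma_e), n)

varsigma = (np.abs(beta) >= d).astype(float)  # I(|beta_t| >= d), element-wise
b = beta * varsigma                           # thresholded coefficients
x = rng.normal(size=(T, n))
y = (x * b).sum(axis=1) + rng.normal(0.0, sigma_u, T)
```

Periods where `varsigma` is zero are exactly those where a predictor is switched off, which is how the LTM lets the relevance of each news-based variable vary over time.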
To simulate from the conditional posterior of β_t and d in (24b), we follow the procedure outlined in Nakajima and West (2013). That is, conditional on all the data and hyper-parameters in the model, x^T = [x_1, ..., x_T], d, Σ_e, and σ_u², we draw the conditional posterior of β_t sequentially for t = 1 : T using a Metropolis-Hastings (MH) sampler. As described in Nakajima and West (2013), the MH proposals come from a non-thresholded version of the model specific to each time t, as follows. Fixing ς_t = 1 yields the proposal distribution N(β_t | m_t, M_t), where

\[
M_t^{-1} = \sigma_u^{-2} x_t x_t' + 2\Sigma_e^{-1}, \qquad
m_t = M_t\left(\sigma_u^{-2} x_t y_t + \Sigma_e^{-1}(\beta_{t-1} + \beta_{t+1})\right),
\]

for t = 2 : T − 1. For t = 1 and t = T, a slight modification is needed. Details can be found in Nakajima and West (2013). The candidate is accepted with probability

\[
\alpha(b_t, b_t^p) = \min\left\{1, \frac{N(y_t \mid x_t' b_t^p, \sigma_u^2)\, N(y_t \mid x_t' \beta_t, \sigma_u^2)}{N(y_t \mid x_t' b_t, \sigma_u^2)\, N(y_t \mid x_t' \beta_t^p, \sigma_u^2)}\right\},
\]

where b_t = β_t ⊙ ς_t is the current state, and b_t^p = β_t^p ⊙ ς_t^p is the candidate. The independent latent thresholds in d can then be sampled conditional on the data and the hyper-parameters. For this, a direct MH algorithm is employed. Let d_{-j} denote all elements of d except d_j. A candidate is drawn from the current conditional prior, d_j^p ∼ U(0, |β_0| + K), where K is described below, and accepted with probability

\[
\alpha(d_j, d_j^p) = \min\left\{1, \prod_{t=1}^{T} \frac{N(y_t \mid x_t' b_t^p, \sigma_u^2)}{N(y_t \mid x_t' b_t, \sigma_u^2)}\right\},
\]

where b_t is the state based on the current thresholds (d_j, d_{-j}), and b_t^p the candidate based on (d_j^p, d_{-j}).
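A rough sketch of the direct MH step for a single threshold d_j, under the setup above. All names are illustrative and this is not the authors' code; the likelihood ratio is computed in logs for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_lik(y, x, b, sigma2_u):
    """Gaussian log-likelihood of y_t given the thresholded states b_t."""
    resid = y - np.sum(x * b, axis=1)
    return -0.5 * np.sum(resid**2) / sigma2_u

def mh_step_threshold(y, x, beta, d, j, K, sigma2_u):
    """One independence-MH draw for threshold d_j (illustrative sketch)."""
    d_prop = d.copy()
    d_prop[j] = rng.uniform(0.0, np.abs(beta[0, j]) + K)  # candidate from the prior
    b_cur = beta * (np.abs(beta) >= d)                    # current thresholded states
    b_prop = beta * (np.abs(beta) >= d_prop)              # candidate thresholded states
    log_alpha = log_lik(y, x, b_prop, sigma2_u) - log_lik(y, x, b_cur, sigma2_u)
    if np.log(rng.uniform()) < min(0.0, log_alpha):
        return d_prop
    return d
```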
Lastly, conditional on the data, the hyper-parameters and the time-varying parameters, we can sample σ 2 u and the diagonal elements in Σ e using the standard inverse Gamma distribution. For each of these elements we use a degrees of freedom prior of 10, and set the prior variance to 0.01.
In essence, the MH steps described above are identical to those described by Nakajima and West (2013). We differ only in the assumptions we make about the processes for the elements in β_t, which follow stationary processes in their application, but independent random walks in ours. In turn, this simplifies inference, since Ξ is an identity matrix. However, this model formulation has consequences for the choice of K. The K parameter, used to draw d_j^p, controls our prior belief concerning the marginal sparsity probability. A neutral prior will support a range of sparsity values in order to allow the data to inform on relevant values. When the β_t follow stationary processes, results in Nakajima and West (2013) suggest that setting K = 3 is a reasonable choice. Because of the nonstationarity assumption adopted here, this recommendation cannot be followed. We set K = 1, which, based on a reference model without latent threshold dynamics, seems to be a reasonable prior. Finally, the prior mean and covariance for the initial states are set to zero and the identity matrix, respectively.

G.2 The Latent Dirichlet Allocation Model
The Latent Dirichlet Allocation (LDA) model is estimated using the algorithm described in Griffiths and Steyvers (2004). Before giving an overview of the procedure, we need to introduce some notation. First, let the corpus consist of M distinct documents, where N = \sum_{m=1}^{M} N_m is the total number of words in all documents, K is the total number of latent topics, and V is the size of the vocabulary. Each document consists of a repeated choice of topics Z_{m,n} and words W_{m,n}. Let t be a term in V, and denote P(t|z = k), the mixture component, one for each topic, by \Phi = \{\phi_k\}_{k=1}^{K}. Finally, let P(z|d = m) define the topic mixture proportion for document m, with one proportion for each document: \Theta = \{\theta_m\}_{m=1}^{M}. The goal of the algorithm is then to approximate the distribution

\[
P(Z \mid W; \alpha, \beta) = \frac{P(W, Z; \alpha, \beta)}{P(W; \alpha, \beta)} \tag{28}
\]

using Gibbs simulations, where α and β are the hyper-parameters controlling the conjugate Dirichlet prior distributions for θ_m and ϕ_k, respectively. A very good explanation of how this method works is found in Heinrich (2009). The description below provides a brief summary only.
With the above definitions, the total probability of the model can be written as

\[
P(W, Z; \alpha, \beta) = \int P(W \mid Z, \Phi) P(\Phi; \beta)\, d\Phi \int P(Z \mid \Theta) P(\Theta; \alpha)\, d\Theta. \tag{30}
\]

In (30), the terms inside the first integral do not include a θ term, and the terms inside the second integral do not include a ϕ term. Accordingly, the two terms can be solved separately. Exploiting the properties of the conjugate Dirichlet distribution it can be shown that

\[
P(W \mid Z; \beta) = \prod_{k=1}^{K} \frac{\Delta(n_k + \beta)}{\Delta(\beta)}, \tag{31}
\]
\[
P(Z; \alpha) = \prod_{m=1}^{M} \frac{\Delta(n_m + \alpha)}{\Delta(\alpha)}, \tag{32}
\]

where \Delta(\cdot) denotes the Dirichlet normalizing constant, n_m = (n_m^{(k)})_{k=1}^{K} and n_k = (n_k^{(t)})_{t=1}^{V} are count vectors, n_m^{(k)} denotes the number of word tokens in the m-th document assigned to the k-th topic, and n_k^{(t)} is the number of times the t-th term in the vocabulary has been assigned to the k-th topic.
Since P(W; α, β) in (28) is invariant to Z, the conditional distribution P(Z|W; α, β) can be derived from P(W, Z; α, β) directly using Gibbs simulation and the conditional probability

\[
P(Z_{(m,n)} \mid Z_{-(m,n)}, W; \alpha, \beta) = \frac{P(Z_{(m,n)}, Z_{-(m,n)}, W; \alpha, \beta)}{P(Z_{-(m,n)}, W; \alpha, \beta)}, \tag{33}
\]

where Z_{(m,n)} denotes the hidden variable of the n-th word token in the m-th document, and Z_{-(m,n)} denotes all Zs but Z_{(m,n)}. Denoting the index of a word token by i = (m, n), and using the expressions in (31) and (32), cancellation of terms (and some extra manipulations exploiting the properties of the gamma function) yields

\[
P(Z_i = k \mid Z_{-i}, W; \alpha, \beta) \propto \frac{n_{k,-i}^{(t)} + \beta}{\sum_{t=1}^{V} n_{k,-i}^{(t)} + V\beta} \left(n_{m,-i}^{(k)} + \alpha\right), \tag{34}
\]

where the counts n_{\cdot,-i}^{(\cdot)} indicate that token i is excluded from the corresponding document or topic. Thus, sampling topic indexes using equation (34) for each word in a document and across documents until convergence allows us to approximate the posterior distribution given by (28). As noted in Heinrich (2009), the procedure itself uses only five larger data structures: the count variables n_m^{(k)} and n_k^{(t)}, which have dimension M × K and K × V, respectively, their row sums n_m and n_k, as well as the state variable z_{m,n} with dimension W.
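The sampling scheme can be illustrated with a minimal collapsed Gibbs sampler. This is a generic textbook implementation of an update of the same form as (34), not the authors' code, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(1)

def collapsed_gibbs(docs, V, K, alpha, beta, iters=50):
    """Minimal collapsed Gibbs sampler for LDA (illustrative sketch).
    docs: list of documents, each a list of term indices in [0, V)."""
    n_mk = np.zeros((len(docs), K))  # topic counts per document
    n_kt = np.zeros((K, V))          # term counts per topic
    n_k = np.zeros(K)                # total tokens per topic
    z = [rng.integers(K, size=len(doc)) for doc in docs]  # random initialization
    for m, doc in enumerate(docs):   # build the initial counts
        for n, t in enumerate(doc):
            k = z[m][n]
            n_mk[m, k] += 1
            n_kt[k, t] += 1
            n_k[k] += 1
    for _ in range(iters):
        for m, doc in enumerate(docs):
            for n, t in enumerate(doc):
                k = z[m][n]          # remove token i from the counts
                n_mk[m, k] -= 1
                n_kt[k, t] -= 1
                n_k[k] -= 1
                # full conditional P(Z_i = k | Z_-i, W), up to a constant
                p = (n_kt[:, t] + beta) / (n_k + V * beta) * (n_mk[m] + alpha)
                k = rng.choice(K, p=p / p.sum())
                z[m][n] = k          # reassign and restore the counts
                n_mk[m, k] += 1
                n_kt[k, t] += 1
                n_k[k] += 1
    phi = (n_kt + beta) / (n_k[:, None] + V * beta)
    theta = (n_mk + alpha) / (n_mk.sum(axis=1, keepdims=True) + K * alpha)
    return phi, theta
```

Note that, exactly as described above, the sampler only ever touches the count arrays and the state variable z; the topic and word distributions are recovered from the counts at the end.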
With one simulated sample of the posterior distribution for P(Z|W; α, β), ϕ and θ can be estimated from

\[
\hat{\phi}_{k,t} = \frac{n_k^{(t)} + \beta}{\sum_{t=1}^{V} n_k^{(t)} + V\beta}, \qquad
\hat{\theta}_{m,k} = \frac{n_m^{(k)} + \alpha}{\sum_{k=1}^{K} n_m^{(k)} + K\alpha}.
\]

In the analysis of the main paper the average of the estimated \hat{\theta} and \hat{\phi} from the 10 last samples of the stored Gibbs simulations is used to construct the daily news topic frequencies. Because the topics are not identified across samples, the estimates of \hat{\theta} and \hat{\phi} cannot be combined across samples for an analysis that relies on the content of specific topics. However, statistics insensitive to permutation of the underlying topics can be computed by aggregating across samples, see Griffiths and Steyvers (2004).
Before estimation three parameters need to be predefined: the number of topics and the two parameter vectors of the Dirichlet priors, α and β. Here, symmetric Dirichlet priors, with α and β each having a single value, are used. In turn, α is set as a function of the number of topics, and β as a function of the number of unique words in the vocabulary. The choice of K is discussed in Section F.2. In general, lower (higher) values for α and β will result in more (less) decisive topic associations. The values for the Dirichlet hyper-parameters also reflect a compromise between having few topics per document and having few words per topic. In essence, the prior specification used here is the same as the one advocated by Griffiths and Steyvers (2004).
Using the posterior estimates from the LDA model, the frequency with which each topic is represented in the newspaper for a specific day is computed. This is done by first collapsing all the articles in the newspaper for one specific day into one document.
Following Heinrich (2009) and Hansen et al. (2018), a procedure for querying documents outside the set on which the LDA is estimated is then implemented. In short, this corresponds to using the same Gibbs simulations as described above, but with the difference that the sampler is run with the estimated parameters \Phi = \{\phi_k\}_{k=1}^{K} and the hyper-parameter α held constant.
Denote by \tilde{W} the vector of words in the newly formed document. Topic assignments, \tilde{Z}, for this document can then be estimated by first initializing the algorithm by randomly assigning topics to words, and then performing a number of Gibbs iterations using

\[
P(\tilde{Z}_i = k \mid \tilde{Z}_{-i}, \tilde{W}; \Phi, \alpha) \propto \hat{\phi}_{k,t} \left(\tilde{n}_{m,-i}^{(k)} + \alpha\right), \tag{37}
\]

where \tilde{n}_m^{(k)} counts the word tokens in the new document assigned to the k-th topic. Since \hat{\phi}_{k,t} does not need to be estimated when sampling from (37), fewer iterations are needed to form the topic assignment index for the new document than when learning both the topic and word distributions. Here 2000 iterations are performed, and only the average of every 10th draw is used for the final inference. After sampling, the topic distribution can be estimated as before:

\[
\hat{\tilde{\theta}}_{m,k} = \frac{\tilde{n}_m^{(k)} + \alpha}{\sum_{k=1}^{K} \tilde{n}_m^{(k)} + K\alpha}.
\]
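A sketch of this querying step, holding the estimated word distributions fixed (illustrative only; names and the exact loop structure are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def query_document(words, phi, alpha, iters=200):
    """Estimate the topic distribution of an unseen document, with the
    topic-word distributions phi (shape K x V) held constant."""
    K = phi.shape[0]
    z = rng.integers(K, size=len(words))         # random initial assignments
    n_k = np.bincount(z, minlength=K).astype(float)
    for _ in range(iters):
        for i, t in enumerate(words):
            n_k[z[i]] -= 1                       # leave token i out
            p = phi[:, t] * (n_k + alpha)        # fixed word distributions
            z[i] = rng.choice(K, p=p / p.sum())
            n_k[z[i]] += 1
    return (n_k + alpha) / (len(words) + K * alpha)
```

Because phi is not re-estimated, each sweep is cheap, which is why far fewer iterations suffice than in the full estimation step.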

G.3 Time-varying Autoregressive models
To model the time-varying autoregressive processes we follow the general setup used in Primiceri (2005), where the parameters follow independent random walks. Letting y_t denote the dependent variable, the model structure can be written as

\[
y_t = \rho_t y_{t-1} + \sigma_t w_t, \qquad w_t \sim N(0, 1), \tag{39a}
\]
\[
\rho_t = \rho_{t-1} + e_{\rho,t}, \qquad e_{\rho,t} \sim N(0, \sigma_\rho^2), \tag{39b}
\]
\[
\log(\sigma_t) = \log(\sigma_{t-1}) + e_{\sigma,t}, \qquad e_{\sigma,t} \sim N(0, \sigma_\sigma^2). \tag{39c}
\]

This system contains two time-varying state variables, ρ_t and log(σ_t), and two hyper-parameters, σ_ρ² and σ_σ². The system is estimated using Carter and Kohn's multimove Gibbs sampling approach (Carter and Kohn (1994)) together with the mixture proposal by Kim et al. (1998) to estimate the stochastic volatility part of the model.
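To fix ideas, the data-generating process just described can be simulated as follows (a sketch; the function name and default parameter values are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_tvp_ar(T, sigma2_rho=0.01**2, sigma2_sigma=0.03**2):
    """Simulate a time-varying AR(1) with stochastic volatility:
    y_t = rho_t y_{t-1} + sigma_t w_t, with random-walk rho_t and log(sigma_t)."""
    y = np.zeros(T)
    rho = np.zeros(T)
    log_sigma = np.zeros(T)
    for t in range(1, T):
        rho[t] = rho[t - 1] + np.sqrt(sigma2_rho) * rng.standard_normal()
        log_sigma[t] = log_sigma[t - 1] + np.sqrt(sigma2_sigma) * rng.standard_normal()
        y[t] = rho[t] * y[t - 1] + np.exp(log_sigma[t]) * rng.standard_normal()
    return y, rho, log_sigma
```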
We specify the initial values ρ_0 ∼ N(0, 1), y_0 ∼ N(0, 1), and log(σ_0) ∼ N(0, 1). The priors for the hyper-parameters are all from the inverse-Wishart distribution:

\[
\sigma_\rho^2 \sim IW\left(T_{\sigma_\rho^2},\; T_{\sigma_\rho^2} \cdot \kappa_{\sigma_\rho^2}^2\right), \qquad T_{\sigma_\rho^2} = 10, \quad \kappa_{\sigma_\rho^2} = 0.01,
\]
\[
\sigma_\sigma^2 \sim IW\left(T_{\sigma_\sigma^2},\; T_{\sigma_\sigma^2} \cdot \kappa_{\sigma_\sigma^2}^2\right), \qquad T_{\sigma_\sigma^2} = 10, \quad \kappa_{\sigma_\sigma^2} = 0.03,
\]

where the first element in each prior distribution is the degrees-of-freedom parameter, and the second the scale parameter. We note that for the inverse-Wishart distribution the prior scale matrix has the interpretation of the prior sum of squared residuals; therefore, each scale matrix is multiplied by the degrees-of-freedom parameter. Also, for the inverse-Wishart prior to be proper, the degrees-of-freedom parameter must be larger than the dimension of the scale matrix. This is the case in all our prior specifications.
Since (39), conditional on y^T, σ^T, and σ_ρ², constitutes a conditional linear state space system, ρ^T can be drawn using Carter and Kohn's multimove Gibbs sampling approach (Carter and Kohn (1994)).
Second, conditional on ρ^T, we can construct the transformation

\[
y_t^{*} = y_t - \rho_t y_{t-1} = \sigma_t w_t, \tag{42}
\]

which, after squaring and taking logarithms, can be written as

\[
\log\left((y_t^{*})^2\right) = 2\log(\sigma_t) + e_t, \tag{43}
\]

where e_t = log(w_t²). Now, equation (43), together with the law of motion for log(σ_t) in (39), is a new state space system, albeit non-Gaussian. Accordingly, we use the mixture approximation proposed in Kim et al. (1998), together with Carter and Kohn's multimove Gibbs sampling approach, to sample σ^T.
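The transformation into the log-squared observation equation is straightforward to illustrate. The small offset below is a common numerical safeguard against taking the log of zero, not part of the paper's derivation:

```python
import numpy as np

def log_squared_observations(y, rho, offset=1e-6):
    """Compute y_t* = y_t - rho_t y_{t-1} and return log((y_t*)^2),
    i.e. the observation series 2 log(sigma_t) + e_t of the SV step."""
    y_star = y[1:] - rho[1:] * y[:-1]
    return np.log(y_star**2 + offset)
```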
Finally, conditional on y T , σ T and ρ T , the hyper-parameters σ 2 σ and σ 2 ρ are drawn from the inverse-Wishart distribution.