Structural transmissions among investor attention, stock market volatility and trading volumes

We employ data-based approaches to identify the transmission of structural shocks among investor attention (measured by Google search queries), realised volatilities and trading volumes in the stock markets of the United States, the United Kingdom and Germany. The two identification approaches adopted for the structural vector autoregressive analysis are based on independent component analysis and on the informational content of disproportional variance changes. Our results show robust evidence that investors' attention affects both volatilities and trading volumes contemporaneously, whereas the latter two variables lack immediate impacts on investors' attention. Some movements in investors' attention can be traced back to market sentiment.


| INTRODUCTION
The Google search engine has become an integral tool for finding information for more and more people around the globe. The aggregate search frequency in Google provides a direct measure of individual/retail investors' attention (Da et al., 2011). When an individual searches for DOW in Google, she/he certainly pays attention to it. Empirical analysis has shown that stock return volatilities and trading volumes are positively associated with Google search queries (Vlastakis & Markellos, 2012). Moreover, changes in search queries today can help to explain future changes in return volatility. Dimpfl and Jank (2016) show that search queries Granger-cause volatility and that including search queries in models of realised volatility improves out-of-sample volatility forecasts. It is, however, unclear what the propagation mechanism of the shocks is. Does a volatility shock trigger search queries (investors' attention), and/or is it increased investor attention (reflected in a positive shock to search queries) that triggers more trading and thereby higher volatility?
On the one hand, there are several theoretical underpinnings for the impact of investors' attention on volatility. If investors pay more attention, new information is quickly incorporated into prices and, thus, can induce high return volatility (Andrei & Hasler, 2015). Moreover, as retail investors are often regarded as uninformed noise traders, their trading can lead to excessive volatility of asset prices according to the noise trader model (De Long et al., 1990). Similarly, in agent-based models, exogenous shocks to fundamental prices can be interpreted by noise traders as a potential future trend (Lux & Marchesi, 1999). When there is a large fraction of noise-trader agents in the market, the volatility of the stock becomes larger. Thus, the higher the volume of search queries, the more likely it is that retail investors are actively trading and the larger the volatilities of the relevant stocks. On the other hand, volatile movements in the stock markets are frequently featured in the news, especially in downturn periods. This could attract retail investors' attention and increase the count of search queries for the stock indices. The recursive structural vector autoregressive (SVAR) model in Dimpfl and Jank (2016), for example, builds upon the hierarchical assumption of an immediate impact of volatility on search queries. This paper contributes to the literature by estimating the contemporaneous relationship between search queries and return volatilities. For this purpose, ad hoc impositions of triangular structures (e.g., in terms of lower triangular Cholesky factors) on SVAR models are ill-suited, as they leave no room for the data to object to the model-implied hierarchy. In this paper, we use data-driven identification approaches, which let the data determine the latent structural relationships. Our analysis is based on daily Google search queries for US, UK and German stock market indices from 2006 to 2011.
Our data exhibit both deviations from a conditional multivariate Gaussian model and conditional changes in the covariance structure. Therefore, we exploit the uniqueness of independent structural shocks (Matteson & Tsay, 2017) and the informational content of disproportional variance changes of the model-implied structural shocks (Bouakez & Normandin, 2010; Lanne & Saikkonen, 2007; Normandin & Phaneuf, 2004) for SVAR identification.
Results from both identification approaches and the three markets point to the same evidence: shocks in Google search queries affect return volatilities immediately, whereas shocks in volatilities exert only a mild (if any) instantaneous effect on search queries. Therefore, what underlies the positive correlation observed between search queries and volatility is increased investor attention, which triggers more trading and, thus, higher volatility. Introducing the trading volume as a third variable into the SVARs confirms that search queries also affect the trading volume simultaneously. The results from this paper also provide a guide for the order of the variables in recursive SVARs, namely, search queries as the first variable and realised volatility as the second in trivariate SVARs. With a different ordering of the variables, the model-implied impulse responses risk providing a misleading perspective on impact and dynamic relations within the triad of search queries, volatilities and volumes.
After highlighting the contemporaneous impact of retail investors' attention on stock market volatility, we further explore whether market sentiment is partially behind movements in investors' attention. We estimate bivariate SVARs using Google search queries on DOW and the FEARS sentiment index from Da et al. (2015), the latter of which is available only for the United States. The results show that changes in market sentiment have a significant contemporaneous impact on variations in retail investors' attention. De Long et al. (1990) demonstrate that changes in investors' sentiment can lead to more noise trading and excess volatility if uninformed noise traders base their trading decisions on sentiment. Da et al. (2015) confirm the positive contemporaneous relationship between sentiment and market volatility empirically. Our results are in line with the view that retail investors' attention could be part of this transmission channel.
The remainder of this paper is organised as follows: Section 2 provides a brief formalisation of the SVAR model and sketches the data-based identification schemes. Section 3 introduces the data and provides some preliminary empirical analyses. Section 4 addresses structural and dynamic empirical relationships in the triad of search queries, realised volatilities and trading volumes. Section 5 looks at the relationship between search queries and market sentiment. Section 6 summarises our main findings and concludes.

| DATA-BASED IDENTIFICATION OF SVARS
This section provides an outline of the vector autoregressive (VAR) model in its reduced form and in SVAR representation. The identification problem is described and subsequently we sketch two alternative data-based identification schemes.

| The structural VAR
Consider a p-th order autoregressive model for the K-dimensional system of random variables y_t, that is,
$$A(L)\,y_t = \nu + u_t, \tag{1}$$
$$A(L)\,y_t = \nu + B\varepsilon_t, \tag{2}$$
with vector-valued intercept terms ν, K × K parameter matrices A_i, the backshift operator L such that $A(L) = I_K - A_1L - \dots - A_pL^p$ and $Ly_t = y_{t-1}$, and I_K denoting the identity matrix. By assumption, the model is stationary, that is, det(A(z)) ≠ 0 for all |z| ≤ 1. The representations in Equations (1) and (2) differ in terms of their stochastic model components. Reduced-form residuals u_t in Equation (1) have mean zero (E(u_t) = 0) and are contemporaneously correlated with covariance Σ. Residuals ε_t in Equation (2) are orthogonal with E(ε_t) = 0 and Cov[ε_t] = I_K. By implication, u_t = Bε_t, and the covariance matrix Σ admits the decomposition Σ = BB′, where B is a nonsingular K × K parameter matrix. Orthogonalised residuals ε_t qualify as 'structural shocks' if an identified parameter matrix B benefits from a sound theoretical underpinning of its implied effect structure, which is typically summarised in the form of impulse response functions (IRFs). Unlike reduced-form residuals u_t, the structural shocks ε_t cannot be recovered uniquely by means of ordinary least squares estimation. As the space of potential covariance decompositions Σ = BB′ is infinite, further assumptions are required to identify the structural parameters in B.

To design a space of alternative covariance decompositions, let Q denote a rotation matrix (Q ≠ I_K, QQ′ = I_K) which is parameterised with (vector-valued) rotation angle(s) θ, that is, Q = Q(θ). Moreover, D is the lower triangular Cholesky factor of Σ, such that Σ = DD′. Then, a space of covariance decompositions results from rewriting the reduced-form covariance as
$$\Sigma = DD' = DQ(\theta)Q(\theta)'D' = B(\theta)B(\theta)', \qquad \mathcal{B} = \{B(\theta) = DQ(\theta)\}, \tag{4}$$
where the representation B(θ) = DQ(θ) points to B(θ) as a specific member of ℬ. The prime aim of SVAR analysis is to identify a particular structural matrix B = B(θ), as this matrix formalises the instantaneous impacts of the structural shocks ε_t on the observable variables in u_t (or y_t). Henceforth, the dependence of the structural parameter matrix B on θ is suppressed whenever appropriate.
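For K = 2 the rotation is governed by a single angle θ. The following minimal Python/NumPy sketch (the covariance values are made up for illustration, and `rotation`/`candidate_B` are hypothetical helper names) shows that every member B(θ) = DQ(θ) of ℬ reproduces the same Σ while implying different impact effects, which is why Σ alone cannot pin down B. For K > 2, Q(θ) would typically be built as a product of such plane rotations.

```python
import numpy as np

def rotation(theta: float) -> np.ndarray:
    """2 x 2 rotation matrix Q(theta); Q @ Q.T is the identity."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def candidate_B(sigma: np.ndarray, theta: float) -> np.ndarray:
    """Member B(theta) = D Q(theta) of the space of covariance decompositions."""
    D = np.linalg.cholesky(sigma)  # lower triangular factor, sigma = D D'
    return D @ rotation(theta)

# Hypothetical reduced-form covariance (illustrative numbers only)
sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])
for theta in (0.0, 0.3, 1.2):
    B = candidate_B(sigma, theta)
    # every candidate reproduces sigma, yet the impact effects in B differ
    print(theta, np.allclose(B @ B.T, sigma), np.round(B, 3))
```

Setting θ = 0 returns the Cholesky factor D itself, i.e. the recursive model is just one point in ℬ.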

| Marginal effects
Typical elements of the structural matrix B, denoted b_ij, quantify the instantaneous impact of shocks ε_jt on reduced-form residuals u_it (or y_it). A reformulation of model (2) allows for an explicit formalisation of the implied contemporaneous linkages among the variables in y_t, that is,
$$B^{-1}y_t = B^{-1}\left(\nu + A_1y_{t-1} + \dots + A_py_{t-p}\right) + \varepsilon_t. \tag{5}$$
Unlike the model in Equation (2), the left-hand side of Equation (5) is explicit on the marginal effect patterns that involve the variables in y_t contemporaneously. To provide these effects in normalised form, define Ω_t as the information set comprising the process information up to time t, that is, Ω_t = {y_t, y_{t−1}, y_{t−2}, …}. With b^{(ij)} denoting a typical element of B^{−1}, we obtain for the bivariate case after normalisation
$$y_{1t} = \omega^{(1)}_{t-1} - \frac{b^{(12)}}{b^{(11)}}\,y_{2t} + \frac{\varepsilon_{1t}}{b^{(11)}}, \qquad y_{2t} = \omega^{(2)}_{t-1} - \frac{b^{(21)}}{b^{(22)}}\,y_{1t} + \frac{\varepsilon_{2t}}{b^{(22)}}. \tag{6}$$
Estimates of ω^{(1)}_{t−1} and ω^{(2)}_{t−1} in Equation (6) are obtained from the VAR parameters and information in Ω_{t−1}. If B were observed or properly identified, one could directly recover the causal effects that are typically the focus of single-equation regression models. Conditional on Ω_{t−1}, the marginal effect of y_2t on y_1t is measured by −b^{(12)}/b^{(11)}, and the marginal effect of y_1t on y_2t is measured by −b^{(21)}/b^{(22)}.
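The mapping from B to the normalised marginal effects in Equation (6) is mechanical once B is identified. The sketch below assumes a hypothetical 2 × 2 estimate, and `marginal_effects` is an illustrative name rather than a function from any package. It inverts B and divides each row of B^{−1} by its diagonal element, which yields −b^{(ij)}/b^{(ii)} in every off-diagonal entry. The same formula carries over to K = 3.

```python
import numpy as np

def marginal_effects(B: np.ndarray) -> np.ndarray:
    """Contemporaneous marginal effects implied by a structural matrix B.

    Entry [i, j] (i != j) is -b^(ij) / b^(ii), the effect of y_jt on y_it,
    where b^(ij) denotes an element of B^{-1}. The diagonal is set to zero.
    """
    B_inv = np.linalg.inv(B)
    M = -B_inv / np.diag(B_inv)[:, None]  # divide row i by b^(ii)
    np.fill_diagonal(M, 0.0)
    return M

# Hypothetical bivariate estimate, variables ordered (SQ, RV)
B = np.array([[0.20, 0.01],
              [0.30, 0.25]])
M = marginal_effects(B)
print(M[1, 0])  # effect of SQ on RV, approx. 0.30 / 0.20 = 1.5
print(M[0, 1])  # effect of RV on SQ, approx. 0.01 / 0.25 = 0.04
```

With all variables in logarithms, such entries read as elasticities conditional on Ω_{t−1}.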

| Identification
For purposes of structural analysis, triangular model structures (i.e., Cholesky factors) have become popular (Sims, 1980). However, to justify the choice B = D, one has to make a priori assumptions that motivate a particular hierarchical model. Kilian and Lütkepohl (2017) provide an up-to-date textbook treatment of identification schemes in SVAR analysis. In this study, we avoid setting a priori triangular model structures and instead use data-based identification schemes to uncover structural transmission patterns.
Two specific statistical properties of ε_t have become prominent in identifying structural information, namely, (i) independently distributed and non-Gaussian shocks and (ii) shocks with informative (i.e., nonproportional) changes in variance. These properties have motivated two classes of data-based identification. In this study, we identify structural models by means of one representative of each class. First, we exploit the independence property following Matteson and Tsay (2017). Second, we focus on the assumption of generalised autoregressive conditionally heteroskedastic (GARCH) structural shocks as suggested, for example, by Normandin and Phaneuf (2004), Bouakez and Normandin (2010) and Lanne et al. (2017).1 Below, we briefly outline the two alternative (or complementary) identification schemes employed in this study.
(i) Independent components. Independence-based identification builds upon a fundamental result of Comon (1994) stating that a vector of independent components ε_t allows the unique recovery of B from reduced-form residuals u_t if at most one independent component ε_it exhibits a Gaussian distribution. For an intuitive illustration, assume that a false member of the class of covariance decompositions ℬ in Equation (4), denoted B̃ ≠ B, is used for the structural analysis. Then, the corresponding structural shocks read as
$$\tilde{\varepsilon}_t = \tilde{B}^{-1}u_t = \tilde{B}^{-1}B\,\varepsilon_t.$$
By implication, if the elements in ε_t are independent and non-Gaussian by assumption, the elements ε̃_{k,t} in ε̃_t obtain as linear combinations of the ε_{k,t} and, therefore, lack independence. Put differently, to recover the true independent, non-Gaussian shocks ε_t from reduced-form residuals u_t, it is essential to employ the correct structural matrix B. On the basis of the uniqueness of independent non-Gaussian components, several approaches have been suggested for SVAR identification. These approaches differ with respect to the investigated space of structural matrices (e.g., the comparison of alternative recursive structures in Moneta et al., 2013) and the rigidity of the underlying parametric model assumptions (e.g., the imposition of distributional assumptions to enable (pseudo) ML estimation in Gouriéroux et al., 2017; Lanne et al., 2017). As our daily data provide a large amount of sample information and exhibit heterogeneous second-order properties, we refrain from parametric pseudo ML estimation. Instead, we pursue a semiparametric estimation by targeting implied shocks ε_t which provide the weakest evidence against the null hypothesis of independence in terms of a suitable test statistic. More specifically, we follow Matteson and Tsay (2017), who suggest obtaining an estimate of the structural parameters from solving the minimisation problem
$$\hat{B}_{DC} = D\,Q(\hat{\theta}), \qquad \hat{\theta} = \arg\min_{\theta}\; \mathcal{U}_T\big(\varepsilon_t(\theta)\big), \qquad \varepsilon_t(\theta) = B(\theta)^{-1}u_t,$$
where the degree of dependence 𝒰_T is quantified in terms of the so-called distance covariance statistic of Székely et al. (2007). Henceforth, we denote the structural matrix estimates based on an independence assumption as B_DC.
1 For the implementation of data-based identification and further computations, we use the R package 'svars' (https://CRAN.R-project.org/package=svars; Lange et al., 2019; see also Lange et al., 2020). We use the modules 'id.dc' (independence of non-Gaussian shocks) and 'id.garch' (conditionally heteroskedastic shocks). The module 'id.dc' builds upon the function steadyICA from the R package steadyICA (Risk et al., 2015).
(ii) Heteroskedastic shocks. Rigobon (2003) pioneered the identification of heteroskedastic structural shocks.2 Going beyond the stylised covariance break model of Rigobon (2003), the second identification scheme that we employ in this study builds upon patterns of GARCH variances (Bollerslev, 1987) as proposed by Normandin and Phaneuf (2004), Lanne and Saikkonen (2007) and Bouakez and Normandin (2010). In this framework, time-varying covariances are formulated as
$$\Sigma_{t|t-1} = B\,\Lambda_{t|t-1}B', \qquad \Lambda_{t|t-1} = \operatorname{diag}\big(s^2_{1,t|t-1},\dots,s^2_{K,t|t-1}\big),$$
where Λ_{t|t−1} is a diagonal matrix and s²_{k,t|t−1} denotes GARCH-type conditional variance processes capturing the conditional second-order properties of the structural shocks. Assuming a parsimonious GARCH(1,1) specification with unconditional shock variances normalised to unity (Cov[ε_t] = I_K), the conditional variances exhibit a dynamic structure as
$$s^2_{k,t|t-1} = (1-\gamma_k-g_k) + \gamma_k\,\varepsilon^2_{k,t-1} + g_k\,s^2_{k,t-1|t-2}, \qquad k = 1,\dots,K. \tag{9}$$
Under suitable distributional and parametric restrictions (γ_k > 0, g_k ≥ 0, γ_k + g_k < 1), the GARCH processes ε_{k,t} are covariance stationary (Milunovich & Yang, 2013). Sentana and Fiorentini (2001) have shown that the structural parameters in B can be determined uniquely by means of (quasi) ML estimation if at least K − 1 structural shocks exhibit dynamic GARCH-type variance patterns. Henceforth, we denote the structural matrix estimates based on changes of variance as B_GARCH.
Irrespective of the applied identification scheme, any identified matrix B is unique only up to the ordering and signs of its columns. With b_{·i} denoting a typical column of B, it is immediate to observe that the reduced-form covariance allows for the representation
$$\Sigma = BB' = \sum_{i=1}^{K} b_{\cdot i}\,b_{\cdot i}' = \sum_{i=1}^{K} (-b_{\cdot i})(-b_{\cdot i})',$$
so that any permutation of the columns of B and any change of their signs imply the same covariance BB′ = Σ. Hence, the explicit exposition of structural parameter estimates requires a suitable guideline. We follow the convention of documenting the impact effects of positive shocks. In case a particular structural matrix candidate obtained from SVAR estimation has a negative diagonal element (i.e., b_ii < 0), b_{·i} is multiplied by minus one. To achieve a unique column ordering, we choose from the set of alternative column orderings the one which implies the largest sum of diagonal elements of B (Lütkepohl & Netšunajev, 2017). Hence, an identified structural shock is supposed to exert the strongest effect on the particular variable with which it is primarily associated.
2 For simplicity of demonstration of this identification scheme, assume that there are two distinct covariances (denoted Σ^(1) and Σ^(2)) that characterise the reduced-form residuals u_t across two disjoint subsamples. From these subsamples, one can estimate two sets of K(K + 1)/2 empirical (co)variances. The two covariance matrices allow for a reparameterisation as Σ^(1) = BB′ and Σ^(2) = BΛB′, where Λ is a diagonal matrix. This representation comprises K² + K = K(K + 1) unknown parameters that formalise the structural model. Hence, it becomes possible to map the estimated (co)variances one-to-one into the parameter space of the structural model. For a unique mapping, however, it is essential that the diagonal elements in Λ are distinct from each other. Otherwise, single structural shocks remain unidentified.

| DATA AND PRELIMINARY ANALYSIS
In this section, we introduce the data and conduct a preliminary analysis with recursive SVARs.
Our analysis focuses on the US stock market index (the Dow Jones Industrial Average, DJIA), the German stock market index (DAX) and the UK stock market index (FTSE 100). The daily realised volatilities for the three indices are obtained from the Oxford-Man Institute.3 Realised volatility is the sum of the squared intraday log-price changes of the index over 10-min intervals (Andersen et al., 2001; Barndorff-Nielsen & Shephard, 2002). Daily data on trading volumes have been obtained from Datastream; trading volume is the total number of shares of the underlying stocks of the corresponding index traded per day, in millions. Google search queries for the keyword "DOW" (US search queries) from 3 July 2006 to 30 December 2011 are from Dimpfl and Jank (2016). Search queries for the keywords "FTSE" (UK stock market) and "DAX" (German stock market) from 3 July 2006 to 30 June 2011 are from Dimpfl and Jank (2011).4 These are the longest daily search query series that we can obtain for each market, and they were downloaded directly from Google Trends.5 The time periods of available search query data determine the sample periods of the SVAR models in the analysis. Table 1 presents descriptive statistics for search queries (SQ), realised volatilities (RV) and trading volume (VO). For RV, the raw data are heavily skewed and exhibit excess kurtosis. Applying the log transformation, however, reduces the skewness to close to zero and the kurtosis to close to three. The log transformation also reduces the high skewness and kurtosis of SQ. We use all variables in natural logarithms in our analysis. The estimated AR coefficient matrices of the (reduced-form) VAR models are similar to those in Dimpfl and Jank (2011) and Dimpfl and Jank (2016); they are not reported here for space considerations.
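The realised volatility series are taken ready-made from the Oxford-Man Institute. The snippet below only illustrates the stated definition, assuming one trading day of index levels already sampled on a 10-minute grid. The function name and the price values are hypothetical. The final log transform mirrors the use of natural logarithms in the analysis.

```python
import numpy as np

def realised_volatility(prices: np.ndarray) -> float:
    """Daily realised measure: sum of squared intraday log-price changes.

    `prices` is assumed to hold the index level on a 10-minute grid within
    one trading day (e.g. prices_1min[::10] if one-minute quotes are available).
    """
    r = np.diff(np.log(prices))  # 10-minute log-price changes
    return float(np.sum(r ** 2))

# Hypothetical intraday index levels on a 10-minute grid
day = np.array([100.0, 101.0, 100.0, 102.0])
rv = realised_volatility(day)
log_rv = np.log(rv)  # the analysis works with natural logarithms
```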
The linear relation between structural shocks and observable variables is unique if the former are independent and non-Gaussian (more precisely, if at most one of them is Gaussian). Hence, the applicability of identification by independent components relies on the testable assumption of non-Gaussianity. Unreported results from Jarque-Bera tests of the null hypothesis of joint Gaussianity of the reduced-form model residuals (p-values < 0.01%) indicate that the model residuals are at odds with the assumption of joint Gaussianity.6 As alternative recursive models imply quite distinct hierarchical/causal patterns of shock transmission, it is of interest to determine to what extent these transmissions apply to independent shocks or only to orthogonalised model residuals. Accordingly, independence diagnostics provide indicative information on alternative variable orderings in the VAR model. We use distance covariance statistics to assess the dependence of orthogonalised model residuals implied by lower triangular covariance factors under two and six alternative variable orderings in bi- and trivariate VARs, respectively.7 Given strong evidence against multivariate Gaussian models, it is not surprising that most variable orderings in lower triangular models yield orthogonalised residuals which lack independence.
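The shocks compared across orderings in this preliminary analysis are Cholesky-orthogonalised residuals for each permutation of the variables: two in the bivariate case, six in the trivariate case. The sketch below generates them for hypothetical heavy-tailed residuals (the mixing matrix and `recursive_shocks` are illustrative assumptions). It does not reproduce the distance covariance test itself. It only makes explicit that every ordering delivers uncorrelated shocks, so the orderings can be told apart only by a test of independence, not of correlation.

```python
import itertools
import numpy as np

def recursive_shocks(u: np.ndarray, order: tuple) -> np.ndarray:
    """Orthogonalised residuals eps_t = D^{-1} u_t for a given variable ordering,
    with D the lower triangular Cholesky factor of the reordered covariance."""
    u_o = u[:, list(order)]                      # reorder the variables
    D = np.linalg.cholesky(np.cov(u_o, rowvar=False))
    return np.linalg.solve(D, u_o.T).T           # solve D eps_t = u_t for all t

# Hypothetical heavy-tailed residuals for (SQ, RV, VO)
rng = np.random.default_rng(0)
mix = np.array([[1.0, 0.0, 0.0],
                [0.8, 1.0, 0.0],
                [0.3, 0.2, 1.0]])
u = rng.standard_t(df=5, size=(500, 3)) @ mix.T

orders = list(itertools.permutations(range(3)))  # six trivariate orderings
shocks = {order: recursive_shocks(u, order) for order in orders}
# Each entry is uncorrelated by construction; whether its columns are
# independent is what a distance covariance test then has to decide.
```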
As can be seen in Table 2, for all markets, orderings with SQ not in the first position yield strong evidence against the null hypothesis of independence (p-values < 1%). In the few cases where the null hypothesis of independence cannot be rejected, SQ is in the first position throughout and RV is ordered second. Hence, among the potential hierarchical models, the particular order in which shocks to SQ have an immediate impact on the remaining variable(s) of the dynamic system seems most in line with the assumption of independent shocks. Specifically, for such triangular covariance decompositions, the p-values for the German market with the order (SQ, RV) in the bivariate VAR and the order (SQ, RV, VO) in the trivariate VAR are in excess of 10% and 5%, respectively. The p-value for the US market with the order (SQ, RV, VO) is about 5%. With these exceptions, however, hierarchical models generally fail to yield structural shocks which can reasonably be considered independent. Hence, it is of further interest to investigate whether unrestricted covariance decompositions allow the retrieval of unique independent structural shocks.

| DATA-DRIVEN SVARS
In this section, we discuss model selection and present structural parameter estimates and results from the data-driven SVARs. We also show some further model diagnostics which underpin the informational content of independent components and disproportional covariance changes for model identification. As the identified structural parameter matrices B_DC and B_GARCH are quite similar for all markets and VAR dimensions, the following discussion of empirical results refers mainly to the implications of B_DC. We provide further diagnostic outcomes after the discussion of estimation results.

| Structural implications of identified models
The estimation of the structural relations confirms that retail investors' attention (measured by Google search queries) affects stock return volatility contemporaneously, whereas effects of volatility on search queries are negligible. This evidence is robust and can be found for all three stock indices and both identification schemes. Table 3 summarises the estimated structural relations for shocks in SQ and RV. All matrices are close to lower triangular: numerical values of the upper right estimates are very close to zero, albeit statistically significant in some cases. This result supports the variable ordering for a recursive SVAR model (SQ first, RV second) indicated by the preliminary analysis.
Our results are consistent with existing theories. In the noise trader model of De Long et al. (1990), asset prices become excessively volatile when noise traders are present, such that they move more than can be explained by changes in fundamental values. In the agent-based model of Lux and Marchesi (1999), exogenous shocks to fundamental prices can be interpreted by noise traders as a potential future trend. If there is a large fraction of noise-trader agents in the market, the volatility of the stock increases. In the model of Andrei and Hasler (2015), when investors pay more attention to news, new information is quickly incorporated into prices and, thus, induces high return volatility. All these theoretical models are consistent with our results. Google search queries approximate the retail investors' (noise traders') attention. The higher the volume of search queries, the more interest (retail) investors show. As more (retail) investors trade, volatility increases.

Introducing trading volumes as a third variable into our system confirms that shocks in search queries affect trading volumes on impact. Table 4 summarises the estimated structural relations (B matrices) of the trivariate SVARs comprising SQ, RV and VO. The relationships between the first two variables (SQ and RV) documented in Table 4 are very close to those characterising the bivariate systems (documented in Table 3). Whereas shocks in SQ affect RV contemporaneously (see estimates of b_21), shocks in RV exert only weak impacts on SQ (see estimates of b_12). Now consider impacts on VO. Shocks in SQ affect VO significantly in all three markets (see estimates of b_31). Moreover, there is some evidence of significant impacts of shocks in RV on VO for the Dow Jones and the DAX as implied by B_DC (see estimates of b_32). Turning to the impacts of shocks to trading volumes (see estimates in b_{·3}), we do not find any significant impact of shocks in VO on SQ. This result is intuitive. Information about trading volumes is not a popular topic in the mass media; hence, changes in trading volumes would not immediately draw the attention of retail/noise investors and thereby affect search queries. Shocks in VO show significant impacts only on RV for the FTSE. The weak indications of impacts of VO on RV are consistent with evidence from the literature that information on trading volumes does not improve the accuracy of volatility forecasts (e.g., Brooks, 1998).
Although estimates of the structural matrix B demonstrate the instantaneous effects of structural shocks on the variables of a dynamic system, their numerical interpretation is limited. In contrast, the model-implied marginal effects displayed in Equation (6) allow for a direct interpretation of effects among the variables conditional on the history Ω_{t−1}.
TABLE 4 Estimated structural relations in trivariate structural vector autoregressive models. This table reports the estimated structural relations (B matrices) of the trivariate SVAR models, with standard errors in parentheses. Search queries (SQ) are the first, realised volatility (RV) the second and trading volume (VO) the third variable in the analysed series y_t. Parameter estimates significant at the 5% level are highlighted.
Table 5 summarises the model-implied marginal effects for all markets and models.8 As all variables are measured in natural logarithms, the documented estimates allow for an interpretation as elasticities conditional on Ω_{t−1}. Changes in SQ have an effect on RV that is up to about twice as large: when the Google search volume on the stock index (relative to the total search volume) increases by 1%, RV increases by 1.3%-2.1%, depending on the market. This result is found in both bivariate and trivariate SVARs. In addition, when the relative search volume increases by 1%, VO of the corresponding index increases mildly, by about 0.2%-0.4%.

8 Let ω^{(k)}_{t−1}, k = 1, …, K, denote quantities that are available from sample information Ω_{t−1} and estimated reduced-form model parameters, and let b^{(ij)} denote the elements of B^{−1}. Then, bivariate structural vector autoregressive models (SVARs) have the following structure:
$$y_{1t} = \omega^{(1)}_{t-1} - \frac{b^{(12)}}{b^{(11)}}\,y_{2t} + \frac{\varepsilon_{1t}}{b^{(11)}}, \qquad y_{2t} = \omega^{(2)}_{t-1} - \frac{b^{(21)}}{b^{(22)}}\,y_{1t} + \frac{\varepsilon_{2t}}{b^{(22)}}.$$
Structural relations in trivariate SVARs are of the form:
$$y_{kt} = \omega^{(k)}_{t-1} - \sum_{j\neq k}\frac{b^{(kj)}}{b^{(kk)}}\,y_{jt} + \frac{\varepsilon_{kt}}{b^{(kk)}}, \qquad k = 1, 2, 3.$$
The marginal representations involve a rescaling of the elements of the structural shock vectors ε_{k,t} by the structural volatilities.
FIGURE 1 Impulse response functions (IRFs) for the trivariate structural vector autoregressive (SVAR) model. This figure displays IRFs for the trivariate SVAR model (implied by B_DC for the Dow Jones) and the 95% bootstrap confidence intervals.
9 The R package 'svars' supports the analyst with tools of bootstrap inference which are commonly used in (structural) VAR analysis. We opt for a recursive-design moving block bootstrap approach. Brüggemann et al. (2016) have shown the asymptotic validity of moving block bootstrap designs for inferential analysis in SVAR models. The chosen block length is 30, which corresponds to between 2.17% (Dow Jones) and 2.38% (FTSE) of the overall available sample information. The number of bootstrap replications is 1999.
Next, we look at the longer-term implications of the identified contemporaneous structural relationships. These can be observed through IRFs, which trace the effects of the identified structural shocks on the variables of the system over time. Figure 1 shows the IRFs for the trivariate SVAR model as implied by B_DC for the Dow Jones.9 A shock in SQ has a significant impact on RV for up to around 90 days and on VO for up to around 35 days. The magnitude of the impact on RV is almost five times larger than that on VO. A shock in RV also has a significant impact on VO, lasting for about 10 days. None of the remaining pairings of variables shows significant IRFs. This evidence confirms that the contemporaneous relationships among the variables dominate the subsequent dynamics (IRFs). It is then not surprising that a recursive SVAR model with a variable ordering different from the one suggested by the data-driven approach produces different, and potentially misleading, IRFs. Figure 2 shows the IRFs from a recursive SVAR model using the ordering (RV, SQ, VO). This structure implies that RV has an impact on the other two variables and that SQ has an impact on VO. Indeed, the IRFs (Row 1, Column 2) show that RV has a lasting significant impact on SQ for up to 40 days, which is entirely different from the corresponding IRFs in Figure 1, which show no significant impacts. This result is purely driven by the assumption that RV affects SQ in the recursive structure.
Considering the IRFs of RV to shocks in SQ, the initial zero response (due to the assumed recursive structure) gives way to significant responses from 10 days onwards, which is due to lagged impacts of SQ on RV (significant AR coefficients). Moreover, excluding contemporaneous impacts of SQ on RV appears to weaken the magnitude of the IRFs compared with the results from the unrestricted structural model displayed in Figure 1.
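Why the assumed ordering matters can be seen from how structural IRFs are built: Θ_h = Φ_h B, where Φ_h are the reduced-form moving-average matrices. The sketch below uses a hypothetical VAR(1) for (SQ, RV) for brevity, with Φ_h = A_1^h. The paper's models have more lags, which would require the companion form. The code compares the Cholesky factor under ordering (SQ, RV) with the one under ordering (RV, SQ) mapped back to the original variable order. Under the second ordering, the impact response of RV to an SQ shock is zero by construction, as in the recursive model discussed above. All numbers and the name `structural_irf` are illustrative assumptions.

```python
import numpy as np

def structural_irf(A1: np.ndarray, B: np.ndarray, horizon: int) -> np.ndarray:
    """IRFs of a VAR(1): Theta_h = A1^h B; entry [h, i, j] is the response
    of variable i to a unit structural shock j after h periods."""
    K = B.shape[0]
    irf = np.empty((horizon + 1, K, K))
    Phi = np.eye(K)  # reduced-form MA coefficient Phi_0 = I_K
    for h in range(horizon + 1):
        irf[h] = Phi @ B  # Theta_h = Phi_h B
        Phi = A1 @ Phi    # Phi_{h+1} = A1 Phi_h for a VAR(1)
    return irf

# Hypothetical VAR(1) slope matrix and residual covariance for (SQ, RV)
A1 = np.array([[0.6, 0.05],
               [0.2, 0.70]])
sigma = np.array([[1.0, 0.8],
                  [0.8, 1.5]])
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])  # swaps SQ and RV

B_sq_first = np.linalg.cholesky(sigma)                  # ordering (SQ, RV)
B_rv_first = P @ np.linalg.cholesky(P @ sigma @ P) @ P  # ordering (RV, SQ), mapped back

irf_a = structural_irf(A1, B_sq_first, 20)
irf_b = structural_irf(A1, B_rv_first, 20)
print(irf_a[0, 1, 0], irf_b[0, 1, 0])  # impact of an SQ shock on RV: nonzero vs zero
```

Both matrices reproduce the same Σ, yet they imply different impact and dynamic responses. This is why the ordering of a recursive model cannot be chosen innocuously.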

| Further diagnostics
This subsection provides further diagnostics which highlight the informational content of the adopted data-driven SVAR models. We first discuss results from independence tests applied to model implied shocks and subsequently comment on the informational content of estimated GARCH models as given in Equation (9).
Considering the results of independence tests in Table 6, it turns out that estimating structural shocks in a data-based manner yields independent shocks in most cases. Unlike the lower triangular models, structural models implied by B_DC obtain p-values of the distance covariance test in excess of 10% for four out of six specifications. Although one should be careful in interpreting these supremum p-values in the usual way as evidence in favour of the null hypothesis, it seems that the identified shocks are not only orthogonal but also independent and, in this sense, unique. Subjecting the structural shocks identified by means of patterns of conditional heteroskedasticity to independence testing yields outcomes largely in line with those for the independence-based identification. For three systems (bivariate: DAX and FTSE; trivariate: FTSE), the hypothesis of independent structural shocks cannot be rejected at the 10% significance level. Sentana and Fiorentini (2001) have shown that assuming conditionally heteroskedastic structural shocks allows the full identification of the structural model if at least K − 1 processes ε_{k,t} are well described by (G)ARCH processes. For the emergence of stylised patterns of volatility clustering, it is essential that the news response parameter (i.e., γ in Equation 9) is positive (and significant). To diagnose whether the SVAR model is fully identified under conditionally heteroskedastic shocks, Table 7 documents GARCH parameter estimates and respective standard errors. As all documented estimates of γ are significant at conventional levels, it follows that the respective SVARs are fully identified for both dimensions (K = 2, 3) and all markets.
Da et al. (2015) confirm the positive contemporaneous relationship between sentiment and market volatility empirically. This section shows that retail investors' attention can be part of this transmission channel: changes in market sentiment have a significant impact on variations in retail investors' attention.
We adopt the FEARS sentiment index of Da et al. (2015) in SVAR models. This index is more transparent than market-based measures and is available at a higher frequency than survey-based sentiment measures. It reveals market-level sentiment by aggregating the internet search volume of queries related to households' sentiment about economic conditions. The index is constructed from the search queries with the strongest historical correlations with the market, such as gold prices, recession, GDP, bankruptcy and unemployment. However, data for this index are only available for the United States. They can be obtained from Joseph Engelberg's website (footnote 10). We use the FEARS index based on the top 25, 30 and 35 search terms, denoted FEARS25, FEARS30 and FEARS35, respectively. Each index is calculated as the sum of the daily log changes of the top search terms. Each of these changes is adjusted (winsorized, deseasonalized and standardised; see Da et al., 2015 for details). We winsorized and deseasonalized the daily changes of SQ on DOW in the same way (footnote 11) to obtain an adjusted SQ growth series, denoted SQG. Table 8 provides descriptive statistics of the series. The various FEARS indices have similar distributions, with minima of around −1.6 and maxima of around 3. The SQG series has a minimum of around −0.33 and a maximum of about 0.43. We then estimate bivariate SVAR models with SQG and FEARS. The estimated marginal effects (see Equation 6) are shown in Table 9.
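The adjustment steps described above can be sketched in Python with pandas. This is a minimal sketch under stated assumptions, not the exact procedure of Da et al. (2015):

- The 2.5% per-tail winsorization level is an assumed choice.
- Deseasonalizing by regression on weekday and month dummies is also an assumed choice.
- The function names are hypothetical.
- The data in the usage example are synthetic.

Passing `standardise=False` mirrors the SQG treatment described in the text.

```python
import numpy as np
import pandas as pd


def winsorize(s: pd.Series, level: float = 0.025) -> pd.Series:
    """Clip both tails at the given quantile (the level is an assumed choice)."""
    return s.clip(lower=s.quantile(level), upper=s.quantile(1 - level))


def deseasonalize(s: pd.Series) -> pd.Series:
    """Residuals of an OLS regression on weekday and month dummies (assumed seasonal model)."""
    cal = pd.DataFrame({"dow": s.index.dayofweek, "month": s.index.month})
    dummies = pd.get_dummies(cal, columns=["dow", "month"], drop_first=True, dtype=float)
    X = np.column_stack([np.ones(len(s)), dummies.to_numpy()])  # intercept + dummies
    beta, *_ = np.linalg.lstsq(X, s.to_numpy(), rcond=None)
    return pd.Series(s.to_numpy() - X @ beta, index=s.index)


def adjusted_log_change(level: pd.Series, standardise: bool = True) -> pd.Series:
    """Daily log change, then winsorize, deseasonalize and (optionally) standardise."""
    out = deseasonalize(winsorize(np.log(level).diff().dropna()))
    return out / out.std() if standardise else out


def fears_like_index(search_volumes: pd.DataFrame) -> pd.Series:
    """Sum of adjusted, standardised log changes across search terms (one per column)."""
    adjusted = [adjusted_log_change(search_volumes[c]) for c in search_volumes.columns]
    return pd.concat(adjusted, axis=1).sum(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dates = pd.date_range("2006-01-01", periods=600, freq="D")
    # Synthetic, strictly positive search volumes for three illustrative terms
    svi = pd.DataFrame(
        50 * np.exp(0.05 * rng.standard_normal((600, 3)).cumsum(axis=0)),
        index=dates,
        columns=["gold prices", "recession", "unemployment"],
    )
    fears = fears_like_index(svi)
    sqg = adjusted_log_change(svi["recession"], standardise=False)  # SQG-style: no standardisation
    print(fears.describe())
    print(sqg.describe())
```

Standardisation matters only when several search terms are summed, so that no single term dominates the aggregate. This is why a single-term series such as SQG can skip that step.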
We have robust evidence for the impact of FEARS on SQG. The estimated marginal effects are significant. They vary between 0.08 and 0.12, depending on the FEARS index (FEARS25, FEARS30 or FEARS35) and the identification method (DC or GARCH). As both variables are growth rates of internet search terms, this indicates that a 1% increase in the aggregated search terms revealing sentiment leads to an increase of around 0.1% in searches for DOW in the United States. In contrast, the impacts of SQG on FEARS are subject to high estimation uncertainty and lack significance. The evidence from the bivariate SVAR models confirms the contemporaneous impact of market sentiment on retail investors' attention.

| CONCLUSION
This paper fills a gap in the literature on the relationship between investor attention and stock market activity. It identifies the underlying structural transmission among Google search queries, realised volatilities and trading volumes in the US, German and UK markets. We adopt data-based approaches to structural VAR identification. The a priori imposition of triangular (i.e., hierarchical) model structures restricts the structural model parameters. Data-based identification, by contrast, allows these parameters to be estimated in an unrestricted manner. We consider two identification strategies that provide complementary information. The first is identification through the independence of non-Gaussian structural shocks. The second is identification via conditionally heteroskedastic structural shocks. Our results show that investor attention plays an important role in stock markets. Shocks in investor attention affect volatilities and trading volumes immediately, whereas shocks in volatilities and trading volumes do not exert an instant impact on investor attention. Our results are largely robust across the three markets, across alternative identification schemes and across bivariate and trivariate SVAR models. Our analysis does not fully support the assumption of a hierarchical model. Nevertheless, our results provide important guidance on the hierarchical ordering of the variables if a recursive SVAR model were used. Finally, our bivariate SVAR models with the FEARS indices and the growth rates of search queries on DOW in the United States support the view that market sentiment has an impact on retail investors' attention.
10 https://rady.ucsd.edu/faculty/directory/engelberg/pub/portfolios/research.htm
11 We do not standardise this series, noting that standardisation is used in Da et al. (2015) to make their series of search terms comparable. As we have only one search term regarding a stock index (SQ), this step is not necessary.